Highlights of this release are support for weights in all aggregate functions, choice() with probabilities which can be different per individual (e.g. depending on gender), the ability to load globals mid-simulation, the possibility to do simulations without any .h5 output file, global constants and a lot of minor improvements and fixes.
Download this release from the download section.
defining functions without any argument without using parentheses is now a warning. Closes issue 162. In a future release, this will be an error. The goal of this change is both to make models more explicit and consistent (a function definition always has parentheses) and to make LIAM2 internal code simpler in the long run.
Given that literally all models ever written need to be updated and doing so by hand would take a lot of time, the LIAM2 model upgrader can be used to automatically upgrade model files to the new syntax (the original version of your model(s) will be saved in a .bak file).
If you only have a few model files and you use the Windows bundle, you could open each of your models in turn and use the LIAM2: upgrade model command in the Macro menu of Notepad++.
If you do not use the bundle, or have many models/files to upgrade, you should rather use the command line:
liam2 upgrade <pattern_for_your_models>
For example:
liam2 upgrade examples/*.yml
liam2 upgrade */*.yml
Note that if you are using the Windows bundle the executable is not liam2 but liam2/main.exe
added support for weights in all aggregate functions (closes issue 226):
- avg(income, filter=WORKING, weights=weight)
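To make the effect of the weights argument concrete, here is a minimal pure-Python sketch of a weighted average restricted by a filter. It is only an illustration of the arithmetic, not LIAM2's implementation; the helper name weighted_avg and the sample data are assumptions made for this example.

```python
def weighted_avg(values, weights, mask):
    """Weighted mean of values over the rows where mask is True."""
    total = sum(v * w for v, w, m in zip(values, weights, mask) if m)
    weight_sum = sum(w for w, m in zip(weights, mask) if m)
    return total / weight_sum


# hypothetical sample data: three individuals, the third one is not working
income = [1000.0, 2000.0, 3000.0]
working = [True, True, False]
weight = [1.0, 3.0, 10.0]

# (1000 * 1 + 2000 * 3) / (1 + 3)
result = weighted_avg(income, weight, working)
```

Individuals excluded by the filter contribute neither to the numerator nor to the sum of weights.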
implemented choice with a different probability per individual (closes issue 211):
- p0: if(gender, 0.1, 0.3)
- p5: if(gender, 0.2, 0.4)
- p10: if(gender, 0.7, 0.3)
- intchoice: choice([0, 5, 10], [p0, p5, p10])
# the same using a global array (choices must be the first dimension)
# this particular case will become easier/nicer in a future release
- global_choice: choice(ARRAY2D.pvalues[0], ARRAY2D[:, gender * 1])
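The idea of a per-individual probability can be sketched in plain Python as follows. This is an illustration under assumptions (the function choice_per_individual is hypothetical and LIAM2's own implementation may differ): one uniform number is drawn per individual and compared against that individual's cumulative probabilities.

```python
import random


def choice_per_individual(options, probabilities, rng):
    """Draw one option per individual.

    probabilities[k][i] is the probability that individual i gets options[k].
    """
    num_individuals = len(probabilities[0])
    result = []
    for i in range(num_individuals):
        u = rng.random()
        cumulative = 0.0
        chosen = options[-1]  # fallback guards against rounding in the sum
        for option, probs in zip(options, probabilities):
            cumulative += probs[i]
            if u < cumulative:
                chosen = option
                break
        result.append(chosen)
    return result


# same probabilities as in the example above, for three individuals
gender = [True, False, True]
p0 = [0.1 if g else 0.3 for g in gender]
p5 = [0.2 if g else 0.4 for g in gender]
p10 = [0.7 if g else 0.3 for g in gender]
intchoice = choice_per_individual([0, 5, 10], [p0, p5, p10], random.Random(0))
```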
implemented simulation without any user-visible .h5 output file (closes issue 220). One can use an output section without declaring any “file”, or with file: '':
output:
    file: ''
In that case, LIAM2 will create a hidden minimal output file and delete it automatically at the end of the simulation. For people who do not use the .h5 output file, this can substantially reduce disk usage and slightly improve performance when using large datasets. The minimal output file will contain only the fields used in lag expressions going back in time more than one period (because those are not kept in memory).
implemented load() function which can load both arrays and tables in the middle of a simulation.
- array: load('param/mig.csv', type=float)
- table: load('param/othertable.csv',
              fields=[('PERIOD', int), ('INTFIELD', int), ('FLOATFIELD', float)])
the above array and table variables can then be used, within the function, exactly as if they were respectively a global array or a global table. The only difference is that they are local to the function and thus are discarded when the function terminates. This can be used as a way to transfer groupby arrays from one entity to the other, which is otherwise not possible at the moment.
implemented align(link=) to use proportions in combination with the Chenard algorithm (closes issue 216).
implemented a way of declaring global constants (based on pull request 206 by Mahdi Ben Jelloul). For example:
globals:
    MY_BOOL_CONSTANT: True
    MY_FLOAT_CONSTANT: 3.1415
    MY_INT_CONSTANT: 42
    MY_STRING_CONSTANT: "hello"
chart functions gained xmin, xmax, ymin, ymax optional arguments to provide manual bounds for axes. By default, they are automatically inferred from the data as before (closes issue 209).
implemented totals argument to groupby to specify whether or not totals should be computed. Defaults to True like before.
added support for a new “msg” argument to all assertion functions. It specifies a custom message to append to the normal message displayed when the assertion fails. This is most useful when the condition of the assertion is complex. The message can be any expression or tuple of expressions, and it is only evaluated if the assertion fails (closes issue 208).
- assertTrue(all(age >= 0), msg="we have persons with negative age!")
will display:
AssertionError: all((age >= 0)) is not True: we have persons with negative age!
instead of just:
AssertionError: all((age >= 0)) is not True
Using dump(), csv(dump()) or breakpoint() as the msg argument can be useful.
- assertTrue(all(age < 150), msg=("we have abnormally old persons", dump(filter=age >= 150)))
- assertTrue(all(total_income >= 0), msg=breakpoint())
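The point that the message is only evaluated when the assertion fails can be illustrated with a small Python sketch. The names assert_true and expensive_message are hypothetical; passing the message as a callable is simply one way to defer its evaluation and is not necessarily how LIAM2 does it.

```python
def assert_true(condition, msg=None):
    """Raise AssertionError; msg is a zero-argument callable used only on failure."""
    if condition:
        return
    text = "condition is not True"
    if msg is not None:
        text += ": " + str(msg())
    raise AssertionError(text)


calls = []


def expensive_message():
    # stands in for something costly such as a dump of the offending rows
    calls.append("evaluated")
    return "we have persons with negative age!"


assert_true(True, msg=expensive_message)  # passes: the message is never built
try:
    assert_true(False, msg=expensive_message)
except AssertionError as exc:
    failure_text = str(exc)
```

This is why an expensive dump() or an interactive breakpoint() costs nothing as long as the assertion holds.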
implemented assertRaises to check for expected errors.
implemented access to the error function (erf) if the scipy package is installed (which is NOT the case in the Windows bundle). Thanks to Mahdi Ben Jelloul (pull request 234).
improve installation/getting started instructions and include installation instructions in the documentation (instead of only providing them in the source archive). It now includes a section concerning installation on Mac OS X. Closes issue 192. Thanks to Paul Williamson for his help concerning installation on Macs.
use “Read the Docs” theme for the documentation as it is nicer.
better error messages when trying to use data that is not ordered by period (display row and period) or which contains duplicated ids for a period (show row in addition to period and id).
better error message when a field default value is not of the field type.
When installing LIAM2 via python setup.py install, a liam2 script will be created in the Python installation Scripts directory (which is in the system PATH in most cases), so that one can use liam2 on the command line without specifying its path. For example:
liam2 run model.yml
the view command can be called without any file argument. This will launch the embedded ViTables without opening any file. In other words, one can now use liam2 view or python main.py view without any extra argument (closes issue 194).
made charts (matplotlib) work even without PyQt installed (i.e. fallback to the Tk backend).
avoid evaluating assertion arguments when using assertions: skip. Previously, only the final test was skipped.
improved check on the sum of probabilities in sidewalk alignment (pull request 197). Thanks to Mahdi Ben Jelloul.
misc improvements to the code, test models and the documentation, some of which done by Mahdi Ben Jelloul.
messages when an assertion concerning float values fails now contain all available decimals, instead of rounding at the 12th decimal.
fixed running the bundled LIAM2 when another Python distribution is installed in the PATH of the system (closes issue 222).
fixed csv(dump()) rounding float values at the 12th decimal instead of using all available precision when the missing argument is used (closes issue 252).
fixed bug which made it impossible to override an existing field or global definition when importing another model. It was using the original/imported field definition and ignoring the overridden definition (closes issue 264).
fixed fields declared as initialdata: False to load data from the input file anyway if a field with the same name existed in the corresponding table (closes issue 227). Additionally, those fields used the type from the input table instead of the one declared (if different).
fixed output: False fields acting like temporary globals (i.e. being wiped at the end of each period) instead of like a field (closes issue 230). This had the indirect consequence of having both a field and a temporary variable with the same name, which confused dump().
fixed importing models using relative paths in some cases. Also makes the display of the imported model path nicer in that case (pull request 200). Thanks to Mahdi Ben Jelloul.
fixed skip_shows: True in simulation file being ignored.
fixed --skiptimings=False being ignored if timings: True was specified in the simulation file.
fixed subsetting an array created by indexing a global with a field when the result is an array.
- array: global[field1, :, field2]
- first_item: array[0]
fixed show() and csv() on arrays created using some combinations of groupby and global arrays.
fixed using .transpose() without argument on a LabeledArray (a global or the result of a groupby).
fixed running LIAM2 in a debugger in some cases.
fixed some random number generator functions being referenced twice in the documentation index.
fixed alignment when take or leave filters are a single constant for all individuals (set as a Python bool).
fixed remove() failing if an array temporary variable is defined in the same function (closes issue 222).
fixed expressions with both a “non-simple” operation (everything except + - * / and where) and a function call failing to evaluate (closes issue 186).
Highlights of this release are support for the “sidewalk” alignment method, support for default values for fields and some basic help in the interactive console.
Download this release from the download section.
This is a bug fix only release.
Download this release from the download section.
This is a bug fix only release.
Download this release from the download section.
Highlights of this release are improved demonstration models, better Linux support, better automated tests and the usual assortment of fixes and miscellaneous improvements.
Download this release from the download section.
This release fixes a few more bugs I remembered about just after I finished the 0.10.1 release.
Download this release from the download section.
This is a minor, mostly bug fix release.
Download this release from the download section.
I will be giving a presentation titled “LIAM2 - overview and recently added features” at the Fifth World Congress of the International Microsimulation Association, 2-4 September 2015, Luxembourg.
This presentation will be divided into two parts: first a quick overview of what LIAM2 is, then a description of all the new features since the LIAM2 course at the IMA conference in 2013 and, more briefly, those since our initial presentation in 2011. Each of those features will be briefly explained with examples of how they were used in real models.
LIAM2 is a free, open source, user-friendly modelling and simulation framework. It is made as generic as possible so that it can be used to develop almost any type of discrete-time dynamic microsimulation model with cross-sectional dynamic ageing (i.e. all individuals are simulated at the same time for one period, then for the next period, etc.). LIAM2 is clearly aiming to free “modellers” from having to develop or care about having state-of-the-art methods for data-handling or expression evaluation and yet be able to handle relatively large datasets at a reasonable speed. For example, a model like MIDAS in Belgium simulated over 60 years with 2.2 million individuals initially could be developed in a user-friendly environment and is run in less than 4 hours. To date, LIAM2 has been adopted by modellers in at least 7 countries.
In the context of the InGRID network, the Luxembourg Institute of Socio-Economic Research (LISER, formerly CEPS/INSTEAD) is organizing a workshop on “Elaborating a discrete-time dynamic microsimulation model with LIAM2, an open source development tool”.
The workshop is free of charge and will take place from 16 to 18 November 2015 in Luxembourg. Travel costs and accommodation are reimbursed up to certain amounts. See http://inclusivegrowth.be/events/call24/call24 for all the details.
Highlights of this release are the implementation of while loops and calling user-defined functions, potentially with arguments. Please also see the complete release notes below for a potential migration issue concerning code defined outside of functions/procedures.
Download this release from the download section.
improved our error handling code to display the (part of the) line where the error occurred in more cases and to not strip the traceback (error.log) of some important information in the cases where the line was already displayed.
configured the bundle editor (Notepad++) to display some warning and error lines in red in the console log. Additionally, when the error message contains a filename and/or a line number (this is currently too rare unfortunately), it is now clickable (to jump directly to the file/line).
defining a process outside of a function has been deprecated because it is ambiguous. For example, this code will now trigger a warning and will be an error in a future version:
entities:
    person:
        fields:
            agegroup: int
        processes:
            agegroup: 10 * trunc(age / 10)
simulation:
    processes:
        - person: [agegroup]
It should be replaced by (or possibly moved into another existing function):
entities:
    person:
        fields:
            agegroup: int
        processes:
            compute_agegroup:
                - agegroup: 10 * trunc(age / 10)
simulation:
    processes:
        - person: [compute_agegroup]
If this construct was used to have a temporary field (i.e. the field was not declared in the fields section) accessible from several functions, like:
entities:
    person:
        processes:
            tempfield: 0
            func1:
                tempfield: count()
            func2:
                otherfield: tempfield + 1
One should now declare that field with output: False instead.
entities:
    person:
        fields:
            tempfield: {type: int, initialdata: False, output: False}
        processes:
            func1:
                tempfield: count()
            func2:
                otherfield: tempfield + 1
This closes issue 124.
made bcolz optional (which is only useful for interpolating the dataset during import). Thanks to Mahdi Ben Jelloul (pull request 161).
allow simulations with no processes section (but an init section).
reworked a few sections of the documentation.
trying to set a value to an unknown variable in new/clone produces a warning instead of being silently ignored (or even crashing with --debug).
This is a remake of the 0.9.1 release with a correct version number and other packaging fixes.
Download this release from the download section.
This is a bugfix only release.
Download it from the download section.
Highlights of this release are many new random number generators, performance improvements, new matching algorithms, nicer internal code and many other smaller fixes and improvements.
Download this release from the download section.
added support for most of the random number generators provided by Numpy which were not already supported by LIAM2: beta, chisquare, dirichlet, exponential, f, gamma, geometric, hypergeometric, laplace, lognormal, multivariate_normal, noncentral_chisquare, noncentral_f, pareto, power, rayleigh, standard_cauchy, standard_exponential, standard_gamma, standard_normal, standard_t, triangular, vonmises, wald, weibull, zipf, binomial, logseries, negative_binomial, poisson and multinomial (see the random functions section for details). Closes issue 137.
added the rank_matching() function as an alternative method to match two sets of individuals. Based on pull request 136 from Alexis Eidelman.
added an optional “algo” argument to the matching() function, which can be set to either “onebyone” or “byvalue”.
This code is based on the optimized_matching work from pull request 144 by Alexis Eidelman.
added the possibility to automatically generate an order in matching() by using the special value ‘EDtM’ for its orderby argument. Based on pull request 136 from Alexis Eidelman.
added an optional ‘pool_size’ argument to matching(). If used, the best match for an individual is looked for in a random subset of size pool_size. Based on pull request 136 from Alexis Eidelman.
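As a rough sketch of what searching in a random pool means (an illustration only, with a hypothetical helper best_match_in_pool and made-up data, not the code from the pull request): instead of scoring every candidate, only a random subset of pool_size candidates is scored and the best of those is kept.

```python
import random


def best_match_in_pool(candidates, score, pool_size, rng):
    """Return the highest-scoring candidate among a random subset."""
    pool = rng.sample(candidates, min(pool_size, len(candidates)))
    return max(pool, key=score)


candidate_ages = [25, 31, 40, 58, 63]
target_age = 42


def closeness(age):
    # higher score = closer in age to the individual being matched
    return -abs(age - target_age)


# a pool as large as the candidate set always finds the overall best match
match_full = best_match_in_pool(candidate_ages, closeness, 5, random.Random(0))
# a smaller pool is cheaper but may miss it
match_small = best_match_in_pool(candidate_ages, closeness, 2, random.Random(0))
```

The trade-off is speed against match quality: smaller pools mean fewer score evaluations per individual.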
fixed the “view” command (to launch ViTables – via F9 for example) in the 64 bit bundle. This was a regression in 0.8.2. Closes issue 147.
fixed computing most expressions involving arrays with more than one dimension. It only worked if all the arrays involved were based on the same “source” array (which was the case in our tests).
assertEqual fails gracefully when comparing two arrays with different shapes.
fixed global fields colliding with fields with the same name from other (global) tables.
fixed expressions like:
if(filter_expr, align(..., array[scalar], ...), False)
and made all if(expr, GLOBAL[scalar_value], ...) expressions faster in the process.
fixed a rare problem with some expressions using scalars returned by aggregate functions.
The source code for LIAM2 is now hosted on GitHub:
https://github.com/liam2/liam2/
This offers many advantages over what we did previously (create a snapshot of the code each time a new version is released):
Enjoy!
improved the performance and memory usage by changing the internal memory layout. Most operations are now faster. new(), remove(), “merging data” (for retrospective simulations) and writing data at the end of each period are now slower. In our model, this translates to a peak memory usage 20% smaller and a 35% overall speed increase. However, if your model has a low processes/variables ratio, it may very well be slower overall with this version. If it is your case, please contact us.
changed the syntax for all aggregate functions: grpxxx(...) should now be xxx(...). For example, grpsum(age) should now be: sum(age). The old syntax is still valid but it is deprecated (it will be removed in a later version). A special note for grpmin() and grpmax(), which become min() and max() respectively even though those functions already existed. The meaning is deduced from the number of “non-keyword” arguments:
min(expr1, expr2): minimum between expr1 and expr2 (for each individual)
min(expr): (aggregate) minimum value of “expr” over all individuals
min(expr, filter=filter_expr): (aggregate) minimum value of “expr” over individuals satisfying the filter
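The dispatch rule described above (the meaning depends on the number of non-keyword arguments) can be sketched in Python. The function liam_min is hypothetical and only mimics the rule on plain lists; it is not LIAM2's code.

```python
def liam_min(*args, filter=None):
    """Hypothetical dispatcher mimicking the rule described above."""
    if len(args) == 2:
        # two positional arguments: element-wise minimum, per individual
        expr1, expr2 = args
        return [min(a, b) for a, b in zip(expr1, expr2)]
    (expr,) = args
    if filter is None:
        # one positional argument: aggregate over all individuals
        return min(expr)
    # one positional argument plus a filter: aggregate over selected individuals
    return min(v for v, keep in zip(expr, filter) if keep)


per_individual = liam_min([1, 5], [3, 2])
overall = liam_min([4, 1, 7])
filtered = liam_min([4, 1, 7], filter=[True, False, True])
```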
A tool to automatically upgrade models to the new syntax is provided. In Notepad++, you should use the LIAM2: upgrade model command in the Macro menu.
You can also run it via the command line:
main upgrade model.yml [output.yml]
See main upgrade --help for details.
changed the syntax for all one2many link functions: xxxlink(link_name, ...) should now be link_name.xxx(...). For example, countlink(persons) should now be: persons.count(). The old syntax is still valid but it is deprecated (it will be removed in a later version). As for aggregate functions, one can upgrade one's models automatically with the “upgrade” command.
the “period” argument of value_for_period can now be a scalar expression (it must have the same value for all individuals).
when the output directory does not exist, LIAM2 will now try to create it.
when debug mode is on, print the position in the random sequence before and after operations which use random numbers.
entities are loaded/stored for each period in alphabetical order instead of randomly. This has no influence on the results but produces nicer log files.
deprecated the “predictor” keyword. If you need several processes to write to the same variable, you should use procedures instead.
using invalid indexes in “global arrays” no longer crashes if they are properly enclosed in an if() expression. For example, if you have an array “by_age” with values for indices from 0 to 99, the following code will now work as expected:
if(age < 50, by_age[age + 50], 0.5)
Periodic globals are unaffected (they always return “missing” when out of bounds).
fixed link expressions which span 3 (or more) different entities.
fixed using show() on a scalar created by summing a “global array”.
fixed the progress bar of matching() when the number of individuals is different in the two sets.
fixed storing a copy of a (declared) field (without any modification) in a temporary “backup” variable. The temporary variable was not a copy but an alias to the same data, so if the field was modified afterwards, the temporary variable was also modified implicitly.
As an example, the following code failed before the fix:
# age is a field
- backup: age
# modify age (this also modified backup!)
- age: age + 1
# failed because "backup" was equal to "age"
- assertEqual(age, backup + 1)
This only affected assignment of “pure” fields, not expressions nor temporary variables, for example, the following code worked fine (because backup stores an expression, not a simple field):
- backup: age * 1
- age: age + 1
- assertEqual(age, backup + 1)
and this code worked too (because temp is a temporary variable, not a field):
- temp: age + 1
- backup: temp
- temp: temp + 1
- assertEqual(temp, backup + 1)
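The alias-versus-copy distinction behind this bug can be reproduced with plain Python lists. This is only an analogy (LIAM2 stores fields in arrays, and the variable names below are made up for the example).

```python
age = [10, 20, 30]

# Buggy behaviour: "backup" is just another name for the same data
backup_alias = age
# Fixed behaviour: "backup" owns an independent copy
backup_copy = list(age)

# modify age in place, as an assignment to a field would
for i in range(len(age)):
    age[i] += 1

alias_followed_change = backup_alias == age  # the alias was modified too
copy_kept_old_values = backup_copy == [10, 20, 30]
```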
globals handling has been vastly improved:
multiple tables: one can now define several tables in globals and not only the “periodic” table.
These should be imported in the import file and declared in the simulation file in the exact same way that periodic globals are.
Their usage within a simulation is a bit different though: whereas periodic global variables can be used without prefixing, other globals need to be prefixed with the name of their table. For example, if one has declared a global table named “othertable”:
othertable:
    fields:
        - INTFIELD: int
        - FLOATFIELD: float
its fields can be used like this:
my_variable: othertable.INTFIELD * 10
These other global tables need not contain a PERIOD column. When using such a table, LIAM2 will not automatically subtract the “base period” from the index, which means that to access a particular row, you have to use its row index (0 based).
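The difference between the two kinds of indexing can be sketched as follows. The base period, the column contents and the helper names are all assumptions made for this illustration.

```python
BASE_PERIOD = 2000  # hypothetical first period of the periodic table
MINR = [500.0, 510.0, 520.0]  # periodic global: values for 2000, 2001, 2002
INTFIELD = [7, 8, 9]  # column of a table without a PERIOD column


def periodic_value(column, period, base_period=BASE_PERIOD):
    """Periodic globals: the base period is subtracted from the index."""
    return column[period - base_period]


def table_value(column, row):
    """Other global tables: the index is the 0-based row number."""
    return column[row]


minr_2001 = periodic_value(MINR, 2001)
second_row = table_value(INTFIELD, 1)
```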
n-dimensional globals: in addition to tables, globals can now be n-dimensional arrays. The file format for those should be the same as for alignment files. They should be declared like this:
MYARRAY: {type: float}
globals can now be used in all situations instead of only in simple expressions and only for the “current” period. Namely, it makes globals available in: link functions, temporal functions (lag, value_for_period, ...), matching(), new() and in (all the different flavours of) the interactive console.
alignment has been vastly improved:
align_abs is a new function with the same arguments as align, which can be used to align to absolute numbers per category, instead of proportions. Combined with other improvements in this release, this allows maximum flexibility for computing alignment targets on the fly (see below).
align on a linked entity (a.k.a. immigration): in addition to the arguments of align, align_abs also has an optional “link” argument, which makes it work on the linked entities. The link argument must be a one2many link. For example, it can be used to take as many households as necessary to get as close as possible to a particular distribution of persons. When the link argument is in effect, the function uses the “Chenard” algorithm.
In this form, align_abs also supports two extra arguments:
renamed the “probabilities” argument of align to “proportions”
the “proportions” argument of align() is now much more versatile, as all the following are now accepted:
a single scalar, for aligning with a constant proportion.
a list of scalars, for aligning with constant proportions per category. (this used to be the only supported format for this argument)
an expression returning a single scalar.
an expression returning an n-dimensional array. expressions and possible_values will be retrieved from that array, so you can simply use:
align(score, array_expr)
a list of expressions returning scalars [expr1, expr2].
a string (in which case, it is treated as a filename). The “fname” argument is still provided for backward compatibility.
added an optional “frac_need” argument to align() to control how “fractional needs” are handled. It can take any of three values: “uniform” (default), “cutoff” or “round”.
“uniform” draws a random number (u) from a uniform distribution and adds one individual if u < fractional_need. “uniform” is the default behavior.
“round” simply rounds needs to the nearest integer. In other words, one individual is added for a category if the fractional need for that category is >= 0.5.
“cutoff” tries to match the total need as closely as possible (at the expense of a slight loss of precision for individual categories) by searching for the “cutoff point” that yields:
count(frac_need >= cutoff) == sum(frac_need)
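The three ways of handling fractional needs can be sketched in Python as follows. This is one possible reading of the description above, not LIAM2's implementation: the function integer_needs is hypothetical, and the “cutoff” branch picks the categories with the largest fractional needs, which amounts to choosing a cutoff point such that the number of categories at or above it matches the (rounded) sum of fractional needs.

```python
import math
import random


def integer_needs(needs, method="uniform", rng=None):
    """Turn fractional per-category needs into whole numbers of individuals."""
    whole = [math.floor(n) for n in needs]
    frac = [n - w for n, w in zip(needs, whole)]
    if method == "uniform":
        rng = rng or random.Random()
        extra = [rng.random() < f for f in frac]
    elif method == "round":
        extra = [f >= 0.5 for f in frac]
    elif method == "cutoff":
        # give one extra individual to the categories with the largest
        # fractional needs, as many as the (rounded) sum of those needs
        num_extra = round(sum(frac))
        ranked = sorted(range(len(frac)), key=lambda i: frac[i], reverse=True)
        chosen = set(ranked[:num_extra])
        extra = [i in chosen for i in range(len(frac))]
    else:
        raise ValueError("unknown method: %s" % method)
    return [w + int(e) for w, e in zip(whole, extra)]


needs = [1.25, 2.25, 0.375, 3.75]  # total need: 7.625
by_round = integer_needs(needs, "round")
by_cutoff = integer_needs(needs, "cutoff")
by_uniform = integer_needs(needs, "uniform", rng=random.Random(0))
```

With these sample needs, “round” takes 7 individuals in total while “cutoff” takes 8, which is closer to the total need of 7.625, at the cost of giving one individual to a category whose fractional need is below 0.5.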
changed the order of align() arguments: proportions is now the second argument, instead of filter, which means you can omit the “fname” or “proportions” keywords and write something like:
align(score, 'my_csv_file.csv')
made align() (and by extension logit_regr) always return False for individuals outside the filter, instead of trying to modify the target variable only where the filter is True. That feature seemed like a good idea on paper but had a very confusing side-effect: the result was different when it was stored in an existing variable than in a new temporary variable.
it is no longer possible to use expressions in alignment files. If you need to align on an expression (instead of a simple variable), you should specify the expression in the alignment function. eg:
align(0.0, fname='al_p_dead.csv', expressions=[gender, age + 1])
the result of a groupby can be used in expressions. This can be used, for example, to compute alignment targets on the fly.
implemented explore on data files (.h5), so that one can, for example, explore the input dataset.
added skip_na (defaults to True) argument to all aggregate functions to specify whether or not missing values (nan for float expressions, -1 for integer expressions) should be ignored.
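A minimal sketch of what skipping missing values means for an average, using the conventions stated above (nan for floats, -1 for integers). The helpers is_missing and avg are hypothetical and operate on plain lists.

```python
import math


def is_missing(value):
    """Missing is nan for floats and -1 for integers."""
    if isinstance(value, float):
        return math.isnan(value)
    return value == -1


def avg(values, skip_na=True):
    """Average of values, optionally ignoring missing ones."""
    if skip_na:
        values = [v for v in values if not is_missing(v)]
    return sum(values) / len(values)


int_avg = avg([20, -1, 40])
float_avg = avg([1.5, float("nan"), 2.5])
unskipped = avg([1.5, float("nan")], skip_na=False)
```

Without skipping, a single nan propagates to the result, and a -1 in an integer column is silently treated as a real value.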
macros can now be used in the interactive console.
added “globals” command in the interactive console to list the available globals.
added qshow() command to show an expression “textual form” in addition to its value. Example:
qshow(grpavg(age))
will display:
grpavg(age): 38.5277057298
added optional “pvalues” argument to groupby() to manually provide the “axis” values to compute the expression on, instead of having groupby compute the combination of all the unique values present in the dataset for each column.
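The effect of providing axis values manually can be illustrated with a simple count per category. This is a sketch under assumptions: groupby_count is a hypothetical one-dimensional stand-in for groupby, and reporting 0 for values absent from the data is an assumption of this example.

```python
from collections import Counter


def groupby_count(column, pvalues=None):
    """Count individuals per category, optionally on a fixed set of values."""
    counts = Counter(column)
    axis = sorted(counts) if pvalues is None else pvalues
    return {value: counts.get(value, 0) for value in axis}


agegroup = [10, 10, 30]
inferred = groupby_count(agegroup)
explicit = groupby_count(agegroup, pvalues=[10, 20, 30])
```

Fixing the axis this way gives the result the same shape in every period, even when some categories are empty.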
improved the documentation, in part thanks to the corrections and suggestions from Alexis Eidelman.
added a “known issues” section to the documentation.
grpmin and grpmax ignore missing values (nan and -1) by default like other aggregate functions.
grpavg ignores -1 values for integer expressions like other aggregate functions.
made the operator precedence for “and”, “or” and “not” more sensible, which means that, for example:
age > 10 and age < 20
is now equivalent to:
(age > 10) and (age < 20)
instead of raising an error.
many2one links are now ~30% faster for large datasets.
during import, when a column is entirely empty and its type is not specified manually, assume a float column instead of failing to import.
allow “id” and “period” columns to be defined explicitly (even though they are still implicit by default).
allow “period” in any dimension in alignment files, not only in the last one.
disabled all warnings for x/0 and 0/0. This is not an ideal situation, but it is still an improvement because they appeared in LIAM2 code and not in user code and as such confused users more than anything.
the “num_periods” argument of lag: lag(age, num_periods) can now be a scalar expression (it must have the same value for all individuals).
changed output format of groupby to match input format for alignments.
added a warning in grpgini when all values (for the filter) are zeros.
when an unrecoverable error happens, save the technical error log to the output directory (for run and explore commands) instead of the directory from where liam2 was run and display on the console where the file has been saved.
better error message when an input file has inconsistent row lengths.
better error message when using a one2many function in a groupby expression.
LIAM2 was presented at the Third General Conference of the International Microsimulation Association, 8-10 June, Stockholm.
You can download the slides here.
added support for retrospective simulation (ie simulating periods for which we already have some data): at the start of each simulated period, if there is any data in the input file for that period, it is “merged” with the result of the last simulated period. If there is any conflict, the data in the input file has priority.
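The merge rule can be sketched with dictionaries keyed by individual id. This is an illustration only: the function merge_period and the data are hypothetical, and resolving conflicts field by field is an assumption of this example.

```python
def merge_period(simulated, from_input):
    """Merge two {id: {field: value}} tables; input data wins on conflict."""
    merged = {ind_id: dict(fields) for ind_id, fields in simulated.items()}
    for ind_id, fields in from_input.items():
        merged.setdefault(ind_id, {}).update(fields)
    return merged


simulated = {1: {"age": 31, "income": 1500.0}, 2: {"age": 46, "income": 0.0}}
from_input = {2: {"income": 2200.0}, 3: {"age": 0, "income": 0.0}}
merged = merge_period(simulated, from_input)
```

Individual 1 keeps its simulated values, individual 2 gets its income from the input data, and individual 3 only exists in the input data.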
added “clone” function which creates new individuals by copying all fields from their “origin” individuals, except for the fields which are given a value manually.
added breakpoint function, which launches the interactive console during a simulation. Two more console commands are available in that mode:
The breakpoint function takes an optional period argument so that it triggers only for that specific period.
added “tsum” function, which sums an expression over the whole lifetime of individuals. It returns an integer when summing integer or boolean expressions, and a float for float expressions.
implemented using the value of a periodic global at a specific period. That period can be either a constant (eg “MINR[2005]”) or an expression (eg “MINR[period - 10]” or “MINR[year_of_birth + 20]”)
added “trunc” function which takes a float expression and returns an int (dropping everything after the decimal point)
made integer division (int / int) return floats. eg 1/2 = 0.5 instead of 0.
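Python 3 follows the same conventions for truncation and division, so the two changes above can be illustrated directly (the variable names are made up for the example).

```python
import math

age = 37
# same computation as the agegroup example: 37 / 10 = 3.7, truncated to 3
agegroup = 10 * math.trunc(age / 10)

# trunc drops the decimals (rounds towards zero), unlike floor
truncated = math.trunc(-3.7)
floored = math.floor(-3.7)

# dividing two integers yields a float; floor division must be requested
true_division = 1 / 2
floor_division = 1 // 2
```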
processes which do not return any value (csv and show) do not need to be named anymore when they are inside of a procedure.
the array used to run the first period is constructed by merging the individuals present in all previous periods.
print timing for sub-processes in procedures. This is quite verbose but makes debugging performance problems/regressions easier.
made error messages more understandable in some cases.
manually flush the “console” output every time we write to it, not only within the interactive console, as some environments (namely when using the notepad++ bundle) do not flush the buffer themselves.
disable compression of the output/simulation file, as it hurts performance quite a bit (the simulation time can be increased by more than 60%). Previously, it was using the same compression settings as the input file.
allowed align() to work on a constant. eg:
align(0.0, fname='al_p_dead_m.csv')
made the “tavg” function work with boolean and float expressions in addition to integer expressions
allowed links to be used in expression given in the “new” function to initialise the fields of the new individuals.
using “__parent__” in the new() function is no longer necessary.
made the “init” section optional (it was never intended to be mandatory).
added progress bar for copying table.
optimised some parts for speed, making the whole simulation roughly as fast as 0.1 even though more work is done.