Released on 2013-06-18.
improved the performance and memory usage by changing the internal memory layout. Most operations are now faster, but new(), remove(), “merging data” (for retrospective simulations) and writing data at the end of each period are now slower. In our model, this translates to a 20% smaller peak memory usage and a 35% overall speed increase. However, if your model has a low processes/variables ratio, it may very well be slower overall with this version. If that is the case for you, please contact us.
changed the syntax for all aggregate functions: grpxxx(...) should now be xxx(...). For example, grpsum(age) should now be: sum(age). The old syntax is still valid but it is deprecated (it will be removed in a later version). A special note for grpmin() and grpmax(), which become min() and max() respectively even though those functions already existed. The meaning is deduced from the number of “non-keyword” arguments:
min(expr1, expr2): minimum between expr1 and expr2 (for each individual)
min(expr): (aggregate) minimum value of “expr” over all individuals
min(expr, filter=expr2): (aggregate) minimum value of “expr” over individuals satisfying the filter
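As a rough illustration of that dispatch rule, the Python sketch below picks the per-individual or the aggregate meaning from the number of positional arguments. The name liam_min and the use of plain lists as columns are assumptions made for this example; it is not LIAM2's implementation.

```python
def liam_min(*args, filter=None):
    """Hypothetical stand-in for min(): the meaning depends on the
    number of positional (non-keyword) arguments."""
    if len(args) == 2:
        # two positional arguments: minimum for each individual
        expr1, expr2 = args
        return [min(a, b) for a, b in zip(expr1, expr2)]
    if len(args) == 1:
        # one positional argument: aggregate over the (filtered) individuals
        (expr,) = args
        if filter is None:
            return min(expr)
        return min(v for v, keep in zip(expr, filter) if keep)
    raise TypeError("expected one or two positional arguments")


age = [30, 12, 45]
other = [25, 40, 50]
print(liam_min(age, other))                       # [25, 12, 45]
print(liam_min(age))                              # 12
print(liam_min(age, filter=[True, False, True]))  # 30
```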
A tool to automatically upgrade models to the new syntax is provided. In Notepad++, you should use the Liam2: upgrade model command in the Macro menu.
You can also run it via the command line:
main upgrade model.yml [output.yml]
See main upgrade --help for details.
changed the syntax for all one2many link functions: xxxlink(link_name, ...) should now be link_name.xxx(...). For example, countlink(persons) should now be: persons.count(). The old syntax is still valid but it is deprecated (it will be removed in a later version). As for aggregate functions, models can be upgraded automatically with the “upgrade” command.
the “period” argument of value_for_period can now be a scalar expression (it must have the same value for all individuals).
when the output directory does not exist, Liam2 will now try to create it.
when debug mode is on, print the position in the random sequence before and after operations which use random numbers.
entities are loaded/stored for each period in alphabetical order instead of randomly. This has no influence on the results but produces nicer log files.
deprecated the “predictor” keyword. If you need several processes to write to the same variable, you should use procedures instead.
using invalid indexes in “global arrays” does not crash anymore if they are properly enclosed in an if() expression. For example, if you have an array “by_age” with values for indices from 0 to 99, the following code will now work as expected:
if(age < 50, by_age[age + 50], 0.5)
Periodic globals are unaffected (they always return “missing” when out of bounds).
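The idea behind this change can be pictured with a small Python sketch. It assumes, purely for illustration, that the index expression is computed for every individual and that an out-of-range index should only be an error when the condition actually selects it. The function name and the table values are made up; this is not LIAM2's code.

```python
def guarded_lookup(cond, indices, table, otherwise):
    """Return table[i] where cond is true and `otherwise` elsewhere,
    without failing on out-of-range indices that are never selected."""
    result = []
    for keep, i in zip(cond, indices):
        in_bounds = 0 <= i < len(table)
        # an out-of-range index only matters when the condition selects it
        if keep and not in_bounds:
            raise IndexError(f"index {i} is out of bounds")
        result.append(table[i] if keep else otherwise)
    return result


by_age = [a / 100 for a in range(100)]  # made-up values for indices 0 to 99
age = [10, 49, 50, 80]
cond = [a < 50 for a in age]
indices = [a + 50 for a in age]         # 100 and 130 are out of bounds
result = guarded_lookup(cond, indices, by_age, 0.5)
print(result)                           # [0.6, 0.99, 0.5, 0.5]
```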
fixed link expressions which span 3 (or more) different entities.
fixed using show() on a scalar created by summing a “global array”.
fixed the progress bar of matching() when the number of individuals is different in the two sets.
Released on 2013-05-21.
fixed storing a copy of a (declared) field (without any modification) in a temporary “backup” variable. The temporary variable was not a copy but an alias to the same data, so if the field was modified afterwards, the temporary variable was also modified implicitly.
As an example, the following code failed before the fix:
# age is a field
- backup: age
# modify age (this also modified backup!)
- age: age + 1
# failed because "backup" was equal to "age"
- assertEqual(age, backup + 1)
This only affected the assignment of “pure” fields, not expressions or temporary variables. For example, the following code worked fine (because backup stores an expression, not a simple field):
- backup: age * 1
- age: age + 1
- assertEqual(age, backup + 1)
and this code worked too (because temp is a temporary variable, not a field):
- temp: age + 1
- backup: temp
- temp: temp + 1
- assertEqual(temp, backup + 1)
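The difference between an alias and a copy can be reproduced in plain Python. The sketch below uses lists in place of LIAM2's columns and updates the data in place; both choices are assumptions made only to illustrate the kind of bug described above.

```python
age = [30, 41, 52]

# alias: both names refer to the same list (the behaviour before the fix)
backup_alias = age
# copy: an independent snapshot (the behaviour after the fix)
backup_copy = list(age)

# modify age in place
for i in range(len(age)):
    age[i] += 1

print(backup_alias)  # [31, 42, 53]: changed along with age
print(backup_copy)   # [30, 41, 52]: still holds the original values
```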
Released on 2013-03-27.
Released on 2013-03-15.
globals handling has been vastly improved:
multiple tables: one can now define several tables in globals and not only the “periodic” table.
These should be imported in the import file and declared in the simulation file in the exact same way that periodic globals are.
Their usage within a simulation is a bit different though: whereas periodic global variables can be used without prefixing, other globals need to be prefixed with the name of their table. For example, if one has declared a global table named “othertable”:
othertable:
fields:
- INTFIELD: int
- FLOATFIELD: float
its fields can be used like this:
my_variable: othertable.INTFIELD * 10
These other global tables need not contain a PERIOD column. When using such a table, LIAM2 will not automatically subtract the “base period” from the index, which means that to access a particular row, you have to use its row index (0 based).
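The indexing difference can be sketched in Python as follows. The base period, the table contents and the helper names are all hypothetical; only the rule itself (periodic globals are looked up by period, other tables by 0-based row index) comes from the text above.

```python
BASE_PERIOD = 2000  # hypothetical first period of the periodic table

periodic = {"MINR": [60, 60, 61, 62]}    # one row per period, made-up values
othertable = {"INTFIELD": [10, 20, 30]}  # no PERIOD column


def periodic_value(name, period):
    # the base period is subtracted to turn a period into a row index
    return periodic[name][period - BASE_PERIOD]


def table_value(table, name, row):
    # other tables are accessed directly by their 0-based row index
    return table[name][row]


print(periodic_value("MINR", 2002))            # 61
print(table_value(othertable, "INTFIELD", 2))  # 30
```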
n-dimensional globals: in addition to tables, globals can now be n-dimensional arrays. The file format for those should be the same as for alignment files. They should be declared like this:
MYARRAY: {type: float}
globals can now be used in all situations instead of only in simple expressions and only for the “current” period. Namely, it makes globals available in: link functions, temporal functions (lag, value_for_period, ...), matching(), new() and in (all the different flavours of) the interactive console.
alignment has been vastly improved:
align_abs is a new function with the same arguments as align which can be used to align to absolute numbers per category, instead of proportions. Combined with other improvements in this release, this allows maximum flexibility for computing alignment targets on the fly (see below).
align on a linked entity (a.k.a. immigration): in addition to the arguments of align, align_abs also has an optional “link” argument, which makes it work on the linked entities. The link argument must be a one2many link. For example, it can be used to take as many households as necessary to get as close as possible to a particular distribution of persons. When the link argument is in effect, the function uses the “Chenard” algorithm.
In this form, align_abs also supports two extra arguments:
renamed the “probabilities” argument of align to “proportions”
the “proportions” argument of align() is now much more versatile, as all the following are now accepted:
a single scalar, for aligning with a constant proportion.
a list of scalars, for aligning with constant proportions per category. (this used to be the only supported format for this argument)
an expression returning a single scalar.
an expression returning an n-dimensional array. expressions and possible_values will be retrieved from that array, so you can simply use:
align(score, array_expr)
a list of expressions returning scalars [expr1, expr2].
a string (in which case, it is treated as a filename). The “fname” argument is still provided for backward compatibility.
added an optional “frac_need” argument to align() to control how “fractional needs” are handled. It can take any of three values: “uniform” (default), “cutoff” or “round”.
“uniform” draws a random number (u) from a uniform distribution and adds one individual if u < fractional_need. “uniform” is the default behavior.
“round” simply rounds needs to the nearest integer. In other words, one individual is added for a category if the fractional need for that category is >= 0.5.
“cutoff” tries to match the total need as closely as possible (at the expense of a slight loss of precision for individual categories) by searching for the “cutoff point” that yields:
count(frac_need >= cutoff) == sum(frac_need)
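The three strategies can be compared with a short Python sketch. It is an illustration under stated assumptions, not LIAM2's code: needs are plain floats per category, the function names are invented, and the cutoff search is replaced by a sort that gives the extra individuals to the largest fractional needs (which selects the same categories when there are no ties).

```python
import math
import random


def split(needs):
    """Split each need into its integer part and its fractional part."""
    whole = [math.floor(n) for n in needs]
    frac = [n - w for n, w in zip(needs, whole)]
    return whole, frac


def handle_uniform(needs, rng):
    whole, frac = split(needs)
    # add one individual when a uniform draw falls below the fractional need
    return [w + (1 if rng.random() < f else 0) for w, f in zip(whole, frac)]


def handle_round(needs):
    whole, frac = split(needs)
    # add one individual when the fractional need is >= 0.5
    return [w + (1 if f >= 0.5 else 0) for w, f in zip(whole, frac)]


def handle_cutoff(needs):
    whole, frac = split(needs)
    # number of extra individuals required to match the total need
    extra = round(sum(frac))
    # the categories with the largest fractional needs get them
    order = sorted(range(len(frac)), key=lambda i: frac[i], reverse=True)
    result = list(whole)
    for i in order[:extra]:
        result[i] += 1
    return result


needs = [2.75, 1.125, 0.625, 3.5]  # total need is exactly 8
print(handle_round(needs))         # [3, 1, 1, 4] (total 9)
print(handle_cutoff(needs))        # [3, 1, 1, 3] (total 8)
print(handle_uniform(needs, random.Random(0)))  # varies with the random draws
```

With these made-up needs, “round” overshoots the total by one individual, while “cutoff” matches it at the cost of under-serving the last category.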
changed the order of align() arguments: proportions is now the second argument, instead of filter, which means you can omit the “fname” or “proportions” keywords and write something like:
align(score, 'my_csv_file.csv')
made align() (and by extension logit_regr) always return False for individuals outside the filter, instead of trying to modify the target variable only where the filter is True. That feature seemed like a good idea on paper but had a very confusing side-effect: the result was different when it was stored in an existing variable than in a new temporary variable.
it is no longer possible to use expressions in alignment files. If you need to align on an expression (instead of a simple variable), you should specify the expression in the alignment function. eg:
align(0.0, fname='al_p_dead.csv', expressions=[gender, age + 1])
the result of a groupby can be used in expressions. This can be used, for example, to compute alignment targets on the fly.
implemented explore on data files (.h5), so that one can, for example, explore the input dataset.
added skip_na (defaults to True) argument to all aggregate functions to specify whether or not missing values (nan for float expressions, -1 for integer expressions) should be ignored.
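A minimal Python sketch of the idea, assuming plain lists as columns and a hypothetical aggregate_sum helper (neither is LIAM2's API): missing values are dropped before aggregating unless skip_na is switched off.

```python
import math


def is_missing(value):
    # missing values: nan for floats, -1 for integers
    if isinstance(value, float):
        return math.isnan(value)
    return value == -1


def aggregate_sum(values, skip_na=True):
    """Hypothetical stand-in for an aggregate with a skip_na argument."""
    if skip_na:
        values = [v for v in values if not is_missing(v)]
    return sum(values)


print(aggregate_sum([3, -1, 4]))                 # 7
print(aggregate_sum([3, -1, 4], skip_na=False))  # 6 (-1 is treated as a value)
print(aggregate_sum([1.5, float("nan"), 2.0]))   # 3.5
```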
macros can now be used in the interactive console.
added “globals” command in the interactive console to list the available globals.
added qshow() command to show the textual form of an expression in addition to its value. Example:
qshow(grpavg(age))
will display:
grpavg(age): 38.5277057298
added optional “pvalues” argument to groupby() to manually provide the “axis” values to compute the expression on, instead of having groupby compute the combination of all the unique values present in the dataset for each column.
improved the documentation, in part thanks to the corrections and suggestions from Alexis Eidelman.
added a “known issues” section to the documentation.
grpmin and grpmax ignore missing values (nan and -1) by default like other aggregate functions.
grpavg ignores -1 values for integer expressions like other aggregate functions.
made the operator precedence for “and”, “or” and “not” more sensible, which means that, for example:
age > 10 and age < 20
is now equivalent to:
(age > 10) and (age < 20)
instead of raising an error.
many2one links are now ~30% faster for large datasets.
during import, when a column is entirely empty and its type is not specified manually, assume a float column instead of failing to import.
allow “id” and “period” columns to be defined explicitly (even though they are still implicit by default).
allow “period” in any dimension in alignment files, not only in the last one.
disabled all warnings for x/0 and 0/0. This is not an ideal situation, but it is still an improvement because they appeared in LIAM2 code and not in user code and as such confused users more than anything.
the “num_periods” argument of lag: lag(age, num_periods) can now be a scalar expression (it must have the same value for all individuals).
changed output format of groupby to match input format for alignments.
added a warning in grpgini when all values (for the filter) are zeros.
when an unrecoverable error happens, save the technical error log to the output directory (for run and explore commands) instead of the directory from where liam2 was run and display on the console where the file has been saved.
better error message when an input file has inconsistent row lengths.
better error message when using a one2many function in a groupby expression.
Released on 2012-11-28.
Released on 2012-10-25.
Released on 2011-12-02.
Released on 2011-11-25.
Released on 2011-06-29.
Released on 2011-06-20.
Released on 2011-06-07.
added support for retrospective simulation (ie simulating periods for which we already have some data): at the start of each simulated period, if there is any data in the input file for that period, it is “merged” with the result of the last simulated period. If there is any conflict, the data in the input file has priority.
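The merge rule can be sketched in Python as below. Keying rows by individual id and merging field by field are assumptions made for the example; the only behaviour taken from the text is that input data wins when both sources provide a value.

```python
def merge_period(simulated, input_data):
    """Merge rows keyed by individual id; input values win on conflict."""
    merged = {ident: dict(fields) for ident, fields in simulated.items()}
    for ident, fields in input_data.items():
        # fields present in the input file overwrite the simulated ones
        merged.setdefault(ident, {}).update(fields)
    return merged


simulated = {1: {"age": 31, "work": True}, 2: {"age": 12, "work": False}}
input_data = {2: {"age": 13}, 3: {"age": 0, "work": False}}
merged = merge_period(simulated, input_data)
print(merged[2])  # {'age': 13, 'work': False}
```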
added “clone” function which creates new individuals by copying all fields from their “origin” individuals, except for the fields which are given a value manually.
added breakpoint function, which launches the interactive console during a simulation. Two more console commands are available in that mode:
The breakpoint function takes an optional period argument so that it triggers only for that specific period.
added “tsum” function, which sums an expression over the whole lifetime of individuals. It returns an integer when summing integer or boolean expressions, and a float for float expressions.
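The return-type rule mirrors what Python's own sum() does, which the following sketch relies on. The helper name lifetime_sum and the per-period lists are hypothetical and only stand in for one individual's history.

```python
def lifetime_sum(values_by_period):
    # sum() yields an int for int/bool inputs and a float for float inputs
    return sum(values_by_period)


worked = [False, True, True, True]       # boolean expression, one value per period
income = [0.0, 1200.5, 1300.0, 1250.25]  # float expression, one value per period
print(lifetime_sum(worked))              # 3
print(lifetime_sum(income))              # 3750.75
```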
implemented using the value of a periodic global at a specific period. That period can be either a constant (eg “MINR[2005]”) or an expression (eg “MINR[period - 10]” or “MINR[year_of_birth + 20]”)
added “trunc” function which takes a float expression and returns an int (dropping everything after the decimal point)
made integer division (int / int) return floats. eg 1/2 = 0.5 instead of 0.
processes which do not return any value (csv and show) do not need to be named anymore when they are inside of a procedure.
the array used to run the first period is constructed by merging the individuals present in all previous periods.
print timing for sub-processes in procedures. This is quite verbose but makes debugging performance problems/regressions easier.
made error messages more understandable in some cases.
manually flush the “console” output every time we write to it, not only within the interactive console, as some environments (namely the Notepad++ bundle) do not flush the buffer themselves.
disable compression of the output/simulation file, as it hurts performance quite a bit (the simulation time can be increased by more than 60%). Previously, it was using the same compression settings as the input file.
allowed align() to work on a constant. eg:
align(0.0, fname='al_p_dead_m.csv')
made the “tavg” function work with boolean and float expressions in addition to integer expressions
allowed links to be used in expressions given to the “new” function to initialise the fields of the new individuals.
using “__parent__” in the new() function is no longer necessary.
made the “init” section optional (it was never intended to be mandatory).
added progress bar for copying table.
optimised some parts for speed, making the whole simulation roughly as fast as 0.1 even though more work is done.
First semi-public release, released on 2011-02-24.