Version 0.9.1.1 released

This is a remake of the 0.9.1 release with a correct version number and other packaging fixes.

Download this release from the download section.

Fixes

  • fixed the version number (release 0.9.1 reported itself as 0.9).
  • fixed various small problems with the release script.
  • made the 32-bit version use exactly the same versions of third-party packages as the 64-bit version.

Version 0.9.1 released

This is a bugfix only release.

Download it from the download section.

Fixes

  • Alignment underflows and overflows are displayed again, even when the debug option is not set (closes issue 155).
  • Always load all declared entities even if there is no process executed for them (but produce a warning in that case). This used to cause a problem if an entity had no process but was used through links (closes issue 89).
  • Fixed dump when the filter is False (a scalar), for example in dump(filter=period == 2100) (closes issue 142).
  • Fixed the declared return type of many random distributions, which caused a problem if they were used with a filter or within an if() expression. This change also fixed cont_regr when both mult and filter arguments are used (closes issue 153).

Version 0.9 released

Highlights of this release include many new random number generators, performance improvements, new matching algorithms, cleaner internal code and many other smaller fixes and improvements.

Download this release from the download section.

New features

  • added support for most of the random number generators provided by Numpy which were not already supported by LIAM2: beta, chisquare, dirichlet, exponential, f, gamma, geometric, hypergeometric, laplace, lognormal, multivariate_normal, noncentral_chisquare, noncentral_f, pareto, power, rayleigh, standard_cauchy, standard_exponential, standard_gamma, standard_normal, standard_t, triangular, vonmises, wald, weibull, zipf, binomial, logseries, negative_binomial, poisson and multinomial (see the random functions section for details). Closes issue 137.

  • added the rank_matching() function as an alternative method to match two sets of individuals. Based on pull request 136 from Alexis Eidelman.

  • added an optional “algo” argument to the matching() function, which can be set to either “onebyone” or “byvalue”.

    • “onebyone” is the current default and should give the same result as previous versions.
    • “byvalue” groups individuals by their value for all the variables involved in both the score and orderby expressions, and matches groups together. Depending on whether individuals have distinct combinations of values or not, this can be much faster than matching each individual in turn. Using this option is highly encouraged, as it is much faster in most cases and scales better (O(N1g*N2g) instead of O(N1*N2), where N1g and N2g are the numbers of combinations of values in each set and N1 and N2 are the numbers of individuals in each set). However, the results are NOT exactly the same as in previous versions, even though both are correct. This means that simulation results will be harder to compare against those obtained with previous versions. This will become the default value in version 0.10. Please also note that this option is only available if the C extensions are installed. In our test models on actual data, this option ran from 50% faster to 3x faster.

    This code is based on the optimized_matching work from pull request 144 by Alexis Eidelman.

  • added the possibility to automatically generate an order in matching() by using the special value ‘EDtM’ for its orderby argument. Based on pull request 136 from Alexis Eidelman.

  • added an optional ‘pool_size’ argument to matching(). If used, the best match for an individual is looked for in a random subset of size pool_size. Based on pull request 136 from Alexis Eidelman.
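
The complexity gain of “byvalue” can be illustrated with a small Python sketch (illustrative only, not LIAM2's actual implementation; all names and data are made up): both sets are grouped by their combination of values, and scores are then computed once per pair of groups instead of once per pair of individuals.

```python
from collections import defaultdict

def match_by_value(set1, set2, score):
    """Greedy one-way matching that scores value *groups* instead of
    individuals (a sketch of the 'byvalue' idea, not LIAM2's code)."""
    groups1, groups2 = defaultdict(list), defaultdict(list)
    for ind, values in set1.items():
        groups1[values].append(ind)
    for ind, values in set2.items():
        groups2[values].append(ind)

    matches = {}
    for v1, ids1 in groups1.items():
        # scores are computed once per pair of groups:
        # O(N1g * N2g) instead of O(N1 * N2)
        ranked = sorted(groups2, key=lambda v2: score(v1, v2), reverse=True)
        for ind in ids1:
            # take the best-scoring group with candidates left
            for v2 in ranked:
                if groups2[v2]:
                    matches[ind] = groups2[v2].pop()
                    break
    return matches

# made-up data: values are (age, works) combinations
set1 = {1: (30, True), 2: (30, True), 3: (40, False)}
set2 = {10: (31, True), 11: (31, True), 12: (41, False)}
matches = match_by_value(set1, set2,
                         lambda a, b: -abs(a[0] - b[0]) - (a[1] != b[1]))
```

When many individuals share the same combination of values (N1g much smaller than N1, N2g much smaller than N2), the number of score evaluations drops accordingly, which is where the speedup comes from.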

Miscellaneous improvements

  • updated bundled dependencies to their latest version. The numpy upgrade to version 1.9 brings some performance improvements in various areas (our test simulation runs approximately 15% faster overall).
  • large internal refactoring
    • it is now easier to define new functions (there is much less code to write).
    • all arguments to all functions can now be expressions. Closes issue 5.
    • cleaner variable scopes. Eliminates a whole class of potential problems when using two fields with the same name but a different entity (via a link) in the same expression. Closes issue 41.
  • cache some internal structures so that they are not recomputed over and over, which improves overall performance by a few percent in some cases, especially when computing many “small” expressions, as is often the case in one-by-one matching() (which improved in our tests by 10-20%).
  • remove() can now be called without a filter argument (it removes all individuals)
  • better and more consistent error messages when calling functions with incorrect arguments (too few, too many, ...)
  • use input/path as the base directory for loading .csv globals (those using an explicit “path”) instead of using the directory of the HDF input file.
  • nicer string representation of some expressions (this only affects qshow and groupby).
  • the --versions command-line argument now also shows versions for optional dependencies (if present).
  • improved many tests, especially the ones for matching().

Fixes

  • fixed the “view” command (to launch ViTables – via F9 for example) in the 64 bit bundle. This was a regression in 0.8.2. Closes issue 147.

  • fixed computing most expressions involving arrays with more than one dimension. It only worked if all the arrays involved were based on the same “source” array (which was the case in our tests).

  • assertEqual now fails gracefully when comparing two arrays with different shapes.

  • fixed global fields colliding with fields with the same name from other (global) tables.

  • fixed expressions like:

    if(filter_expr, align(..., array[scalar], ...), False)
    

    and made all if(expr, GLOBAL[scalar_value], ...) expressions faster in the process.

  • fixed a rare problem with some expressions using scalars returned by aggregate functions.

Version 0.8.2 released

New features

  • allow loading globals directly from .csv files, instead of having them in the input HDF file. One can mix both approaches (have some globals in the “input” HDF file while others are in .csv files). This is more practical when you have a relatively stable input dataset for entities but need to change the globals often, or have several variants with different globals. See the globals section for details. This closes issue 30.
  • added a new simulation option category: “logging”. This is the new home for the existing “timings” option and for a new “level” option, which sets the verbosity of logging to one of periods, procedures (the default) or processes (which corresponds to the behavior of earlier versions).

Miscellaneous improvements

  • decreased memory usage for models with many periods. LIAM2 used to keep the index of individuals of past periods in memory forever, which had approximately the same effect as adding one column for each period. Since that index is only useful when going back in time more than one period, it is now flushed to disk for periods < period - 1. This change has two consequences: it decreases memory use for models with many periods, but slows down models going back in time more than one period. This closes issue 130.
  • the “top times” at the end of the simulation now also include the % of total.
  • after each period, a very rough estimate of the remaining time for the simulation is displayed (closes issue 127 in combination with the logging/level option).
  • updated all dependencies provided in the bundle.
  • improved the release script.

Fixes

  • Using links or other “non-simple” variables in the score expression of the matching() function was a lot slower and more memory-hungry than necessary because some “system temporary variables” kept accumulating. It is still a lot slower than it should be, though; see issue 128.

Version 0.8.1 released

New features

  • added the gumbel() function to draw random numbers from the Gumbel distribution (also known as the Smallest Extreme Value (SEV) distribution). Thanks to Gijs Dekkers for the patch.

Fixes

  • fixed a performance regression in 0.8 when using very large negative numbers as indices for a global (e.g. MINR[bad_index] where bad_index contains several -2147483648 values). Closes issue 121.
  • fixed the (debug) interactive console to not produce a useless warning if a global temporary was run before entering the console (regression in 0.8). Closes issue 120.
  • added missing documentation for assertFalse and assertNanEqual.

Version 0.8 released

New features

  • added a few functions to create charts (courtesy of matplotlib): bar, plot, pie, stackplot, boxplot and scatter. As with all other functions in LIAM2, they are available both during a simulation and in the interactive console. The charts can either be visualized directly or saved to a file. See the charts section for details.
  • added a “view” command line option to LIAM2 to open ViTables (an hdf5 viewer) as well as a corresponding menu entry and keyboard shortcut (F9) in Notepad++. It is meant to be used when editing a model file, and it will open both the input dataset and the result file (if any).
  • documented the boolean aggregate functions all() and any(), which were added in 0.7.
  • added the assertFalse assert function.

Miscellaneous improvements

  • improved the first few demonstration models quite a bit. A recommended read for all users.
  • added details to the documentation of align(), based on suggestions from Alexis Eidelman.
  • made a few more error messages a bit more useful by displaying the line where the error occurred.
  • sped up global[array_expr].
  • give a hint to use assertNanEqual when it would make a failing assertEqual pass.
  • implemented global[slice_expr] (e.g. MINR[period: period + 2]). When the slice bounds are arrays (different for each individual) and the slice length is not constant (not the same for all individuals), it returns a special array with an extremely limited set of supported operations: only aggregates on axis=1 are implemented.
  • include the documentation only in HTML Help format (.chm) in the bundle. .pdf and “normal” html are still available as separate downloads on the website.
  • removed the predictor keyword support (it now raises an exception instead of a warning).
  • adapted the release script since our move to git and converted it to Python.
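
The variable-length slice behaviour of global[slice_expr] can be emulated with plain numpy. This is an illustrative sketch with made-up data, not LIAM2's internal implementation:

```python
import numpy as np

# hypothetical periodic global, one value per period
MINR = np.arange(2000, 2010) * 0.01

# per-individual slice bounds: each individual gets MINR[start:stop]
starts = np.array([0, 2, 5])
stops = np.array([2, 6, 6])   # slice lengths differ: 2, 4 and 1

# only aggregates on axis=1 (one value per individual) are supported,
# e.g. the sum of each individual's slice
row_sums = np.array([MINR[a:b].sum() for a, b in zip(starts, stops)])
```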

Fixes

  • fixed the “syntax upgrade” script by removing a special case for grpmin and grpmax as it was in fact not needed and caused problems when the expression being aggregated contained parentheses.

LIAM2 is on GitHub

The source code for LIAM2 is now hosted on GitHub:

https://github.com/liam2/liam2/

This offers many advantages over what we did previously (create a snapshot of the code each time a new version is released):

  • one can follow the development as it happens (see what changed at any time, browse the source code at any revision, ...)
  • much easier integration of external contributions
  • more visibility for LIAM2
  • and much more...

Enjoy!

Version 0.7 released

New features

  • implemented imports so that simulation files can be split and reused. This can be used to simply split a large model file into smaller files, or (more interestingly) to create simulation variants without having to duplicate the common parts. This feature was inspired by some code from Alexis Eidelman. For details see the Importing other models section.
  • added new logit and logistic functions. They were previously used internally but not available to modellers.
  • added two new debugging features: autodump and autodiff. autodump will dump all (non-scalar) variables (including temporaries) at the end of each procedure in a separate hdf5 file. It can be used stand-alone for debugging, or in combination with autodiff. Autodiff will gather all variables at the end of each procedure and compare them with the values stored previously by autodump in another run of the model (or a variant of it). This can be used to precisely compare two versions/variants of a model and see exactly where they start to differ.
  • added new assert functions:
    • assertIsClose to check that two results are “almost” equal tolerating small value differences (for example due to rounding differences).
    • assertEquiv to check that two results are equal tolerating differences in shape (though they must be compatible).
    • assertNanEqual to check that two arrays are equal even in the presence of nans (because normally nan != nan).
  • added a new “timings” option to hide timings from the simulation log, so that two simulation logs are more easily comparable (for example with “diff” tools like WinMerge).
  • added a menu entry in Notepad++ to run a simulation in “debug mode”.
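
The need for assertNanEqual can be seen with plain numpy (an illustrative sketch, not LIAM2 internals): since nan != nan, plain element-wise equality rejects arrays that are identical except for nans in matching positions.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([1.0, np.nan, 3.0])

# plain element-wise equality fails on the nan entries
assert not (a == b).all()

# a nan-aware comparison treats nans in the same position as equal
assert np.array_equal(a, b, equal_nan=True)

# "almost" equality within a tolerance (the assertIsClose idea)
assert np.allclose([1.0000001], [1.0])
```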

Miscellaneous improvements

  • improved the performance and memory usage by changing the internal memory layout. Most operations are now faster. new(), remove(), “merging data” (for retrospective simulations) and writing data at the end of each period are now slower. In our model, this translates into a 20% smaller peak memory usage and a 35% overall speed increase. However, if your model has a low processes/variables ratio, it may well be slower overall with this version. If that is your case, please contact us.

  • changed the syntax for all aggregate functions: grpxxx(...) should now be xxx(...). For example, grpsum(age) should now be: sum(age). The old syntax is still valid but deprecated (it will be removed in a later version). A special note for grpmin() and grpmax(), which become min() and max() respectively, even though those functions already existed. The meaning is deduced from the number of “non-keyword” arguments:

    min(expr1, expr2)

    minimum between expr1 and expr2 (for each individual)

    min(expr)

    (aggregate) minimum value of “expr” over all individuals

    min(expr1, filter=expr2)

    (aggregate) minimum value of “expr” over individuals satisfying the filter

    A tool to automatically upgrade models to the new syntax is provided. In Notepad++, you should use the LIAM2: upgrade model command in the Macro menu.

    You can also run it via the command line:

    main upgrade model.yml [output.yml]
    

    see main upgrade --help for details.

  • changed the syntax for all one2many link functions: xxxlink(link_name, ...) should now be link_name.xxx(...). For example, countlink(persons) should now be: persons.count(). The old syntax is still valid but deprecated (it will be removed in a later version). As with aggregate functions, models can be upgraded automatically with the “upgrade” command.

  • the “period” argument of value_for_period can now be a scalar expression (it must have the same value for all individuals).

  • when the output directory does not exist, LIAM2 will now try to create it.

  • when debug mode is on, print the position in the random sequence before and after operations which use random numbers.

  • entities are loaded/stored for each period in alphabetical order instead of randomly. This has no influence on the results but produces nicer log files.

  • deprecated the “predictor” keyword. If you need several processes to write to the same variable, you should use procedures instead.

Fixes

  • using invalid indexes in “global arrays” no longer crashes if they are properly enclosed in an if() expression. For example, if you have an array “by_age” with values for indices from 0 to 99, the following code now works as expected:

    if(age < 50, by_age[age + 50], 0.5)
    

    Periodic globals are unaffected (they always return “missing” when out of bounds).
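
One way to picture why the enclosing if() makes this safe is index clipping combined with element-wise selection, as in the following numpy sketch (an assumed mechanism shown for illustration; not necessarily what LIAM2 does internally):

```python
import numpy as np

by_age = np.linspace(0.0, 0.99, 100)   # made-up values for ages 0..99
age = np.array([20, 80])

# by_age[age + 50] alone would crash: age + 50 == [70, 130] and 130
# is out of bounds. Clipping the index first makes the lookup safe
# for the rows excluded by the filter...
idx = np.clip(age + 50, 0, len(by_age) - 1)
# ...and the filter then selects which value each individual gets
result = np.where(age < 50, by_age[idx], 0.5)
```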

  • fixed link expressions which span 3 (or more) different entities.

  • fixed using show() on a scalar created by summing a “global array”.

  • fixed the progress bar of matching() when the number of individuals is different in the two sets.

Version 0.6.2 released

Fixes

  • fixed storing a copy of a (declared) field (without any modification) in a temporary “backup” variable. The temporary variable was not a copy but an alias to the same data, so if the field was modified afterwards, the temporary variable was also modified implicitly.

    As an example, the following code failed before the fix:

    # age is a field
    - backup: age
    # modify age (this also modified backup!)
    - age: age + 1
    # failed because "backup" was equal to "age"
    - assertEqual(age, backup + 1)
    

    This only affected assignments of “pure” fields, not expressions or temporary variables. For example, the following code worked fine (because backup stores an expression, not a simple field):

    - backup: age * 1
    - age: age + 1
    - assertEqual(age, backup + 1)
    

    and this code worked too (because temp is a temporary variable, not a field):

    - temp: age + 1
    - backup: temp
    - temp: temp + 1
    - assertEqual(temp, backup + 1)
    
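
The underlying cause is ordinary array aliasing, which can be reproduced with plain numpy (an illustrative sketch, not LIAM2's code):

```python
import numpy as np

age = np.array([20, 30])
backup = age            # an alias: both names share the same buffer
age += 1                # the in-place update also changes 'backup'
assert backup[0] == 21

backup = age.copy()     # a real copy, unaffected by later updates
age += 1
assert backup[0] == 21 and age[0] == 22
```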

Version 0.6.1 released

Miscellaneous improvements

  • when importing an nd-array, skip cells containing only spaces in addition to empty cells.

Fixes

  • fixed using non-scalar values (e.g. fields) as indices of n-dimensional arrays, and generally made indexing n-dimensional arrays more robust.
  • fixed choice so that it refuses to run when the sum of probabilities is != 1 and the “error” is > 1e-6, as it should. This check existed in past versions but was accidentally removed in version 0.5.
  • fixed choice to warn when the sum of probabilities is > 1 (and the error is <= 1e-6). Previously, it only warned when the sum was < 1.
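
The validation described in the last two items can be sketched as follows (a hypothetical helper written for illustration, not LIAM2's actual code): refuse to run when the error exceeds the tolerance, and merely warn when the sum is slightly off in either direction.

```python
import warnings

def check_probabilities(probs, tol=1e-6):
    """Validate that probabilities sum to 1 within a tolerance
    (a sketch of the behaviour described above)."""
    error = sum(probs) - 1.0
    if abs(error) > tol:
        # sum is too far from 1: refuse to run
        raise ValueError("the sum of probabilities must be 1, got %r"
                         % sum(probs))
    elif error != 0:
        # within tolerance: warn whether the sum is > 1 or < 1
        warnings.warn("probabilities sum to %r instead of 1" % sum(probs))
```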