Code architecture
#################

This document is meant for people who want to know more about the internals of
Liam2, for example to add or modify some functionality. Readers should already
be familiar with how the program is used (on the user side).

Concepts
========

Here is a brief description of the most important concepts needed to understand
the code of Liam2, as well as where those concepts are implemented.

Simulation
----------

file: simulation.py

The *Simulation* class takes care of loading the simulation file (it delegates
much of the work to entities), prepares the data source and simulates each
period in turn (running each process in turn).

Entity
------

file: entities.py

The *Entity* class stores all there is to know about each entity: fields,
links, processes and data. It serves as a glue class between everything: data,
processes, ...

When entities are created, they are added to a global registry in registry.py.

Process
-------

file: process.py

The *Process* class stores user processes.

The most common kind of process is the *Assignment*, which computes the value
of an expression and stores the result in a variable.

The *Compute* class is used as an alternative to *Assignment* when the user
does not store the result of an expression which has a side effect but *does*
have a return value (as opposed to Actions).

Another very common process is the *ProcessGroup* (a.k.a. procedures), which
runs a list of processes in order.

Action
------

file: actions.py

Actions are processes which do not have any result (that can be stored in
variables), but have side effects. Examples include: csv(), show(), remove(),
breakpoint()

Expressions
-----------

file: expr.py (and many others)

Expressions are the meat of the code. The *Expr* class is the base class for
all expressions in Liam2. It defines all the basic operators on expressions
(arithmetic, logical, comparison), but it should not be inherited from
directly.

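
To illustrate the idea of a base class that defines operators on expressions,
here is a minimal, self-contained sketch. It is not the code from expr.py: the
names *SketchExpr*, *SketchVariable*, *SketchBinaryOp* and the *evaluate*
helper are hypothetical, and the context is assumed to be a plain dictionary.
The sketch only shows the general technique of overloading Python operators so
that writing an expression builds a tree which can be evaluated later.

```python
import operator


class SketchExpr:
    """Hypothetical base class: operators build a tree instead of computing."""

    def __add__(self, other):
        return SketchBinaryOp("+", operator.add, self, other)

    def __mul__(self, other):
        return SketchBinaryOp("*", operator.mul, self, other)

    def __gt__(self, other):
        return SketchBinaryOp(">", operator.gt, self, other)

    def __and__(self, other):
        return SketchBinaryOp("&", operator.and_, self, other)

    def evaluate(self, context):
        raise NotImplementedError


class SketchVariable(SketchExpr):
    """Leaf node: looks up its value by name in the context."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, context):
        return context[self.name]

    def __str__(self):
        return self.name


class SketchBinaryOp(SketchExpr):
    """Inner node: applies a binary operator to its two children."""

    def __init__(self, symbol, func, left, right):
        self.symbol = symbol
        self.func = func
        self.left = left
        self.right = right

    def evaluate(self, context):
        return self.func(evaluate(self.left, context),
                         evaluate(self.right, context))

    def __str__(self):
        return f"({self.left} {self.symbol} {self.right})"


def evaluate(node, context):
    """Evaluate a node; plain Python constants evaluate to themselves."""
    if isinstance(node, SketchExpr):
        return node.evaluate(context)
    return node


age = SketchVariable("age")
income = SketchVariable("income")

# Nothing is computed here: the operators only build a tree of nodes.
expr = (age > 18) & (income * 2 + 100 > 1000)

print(expr)  # ((age > 18) & (((income * 2) + 100) > 1000))
print(expr.evaluate({"age": 30, "income": 500}))  # True
```

Separating the construction of the tree from its evaluation is what makes it
possible, in principle, to evaluate the same expression against a different
context (for example another period or another entity).
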

file: exprbases.py

Liam2 provides many different base classes to inherit from when implementing
a new function:

* NumexprFunction: base class for functions which are implemented as-is in
  numexpr, e.g. abs, log, exp

* CompoundExpression: base class for expressions which can be expressed in
  terms of other "liam2" expressions, e.g. min, max, zeroclip

* EvaluableExpression: base class for all other expressions (those that do
  not exist in numexpr and cannot be expressed in terms of other liam2
  expressions). These expressions need to be pre-evaluated and stored in a
  (hidden) temporary variable before being fed to numexpr, and this is what
  EvaluableExpression does. One should only inherit from this class directly
  if none of the subclasses below applies.

  a) NumpyFunction: subclass for functions which are implemented as-is in
     NumPy. Should not be used directly.

     * NumpyCreateArray: subclass for functions which create arrays out of
       nothing (usually random functions).
     * NumpyChangeArray: subclass for functions which take an array as input
       and give another array as output (e.g. clip, round).
     * NumpyAggregate: subclass for aggregate functions, e.g. count, min,
       max, std, median.

  b) FunctionExpression: subclass for functions (which take one expression as
     argument), e.g. trunc, lag, duration, ...

     * FilteredExpression: subclass for functions which also take a filter
       argument, e.g. align, sum, avg, gini.

Liam2's current expressions are implemented in the following files:

alignment.py
    handles the align() and align_abs() functions
align_link.py
    the core algorithm (an implementation of Chenard's) for align_abs(link=)
exprmisc.py
    all expressions which are not defined in another file.
groupby.py
    handles groupby()
links.py
    contains all link-related code:

    * the *Link* class stores the definition of links
    * the *LinkValue* class handles ManyToOne links
    * link functions to handle OneToMany links: countlink, sumlink, avglink,
      minlink and maxlink
matching.py
    handles the matching() function
regressions.py
    handles all the regression functions: logit_score, logit_regr, cont_regr,
    clip_regr, log_regr
tfunc.py
    handles all time-related functions: value_for_period, lag, duration, tavg
    and tsum

Context
-------

file: context.py

A context is a data structure used to keep track of "contextual" information:
what is the "current" entity, what is the "current" period, what is the
"current" dataset. The context is passed around to the evaluation
functions/methods.

A context must present a simple dictionary interface (key: value). There are
a few keys with special meanings:

``period``
    should be the period currently being evaluated
``__len__``
    if present, should be an int representing the number of rows in the
    context
``__entity__``
    the current entity
``__globals__``
    if present, should be a dictionary of global tables ('periodic', ...)

The most commonly used kind of context is the *EntityContext*, which provides
a context interface to an Entity.

Other files
===========

Main code
---------

config.py
    stores some global configuration variables
console.py
    handles the interactive console
cpartition.pyx
    Cython source to speed up our partitioning function (group_indices_nd),
    which is used in groupby and alignment.
cpartition.c
    generated from cpartition.pyx using Cython
cpartition.pyd
    cpartition.c compiled
cutils.pyx
    Cython source to speed up some commonly used utility functions.
cutils.c
    generated from cutils.pyx using Cython
cutils.pyd
    cutils.c compiled
data.py
    handles loading, indexing, checking, merging, copying or modifying
    (adding or removing fields) tables (or subsets of them). It tries to
    provide a uniform interface to different data sources, but it is a work
    in progress.
exprparser.py
    parsing code for expressions
importer.py
    code to import csv files into our own hdf5 "subformat" by reading an
    "import file" (in yaml).
khash.h
    generic hash table from Klib, used in cpartition.pyx
    (see https://github.com/attractivechaos/klib)
main.py
    the main script. It reads command line arguments and calls the
    corresponding code (run, import, explore) in simulation.py (run/explore)
    or importer.py (import).
partition.py
    handles partitioning objects depending on the possible values of their
    columns.
registry.py
    global registry of entities
utils.py
    miscellaneous support functions

Standalone scripts
------------------

diff_h5.py
    diff two liam2 files
dropfields_h5.py
    copy a subset of a liam2 file (excluding specified columns)
filter_h5.py
    copy a subset of a liam2 file (all rows matching a specified condition)
merge_h5.py
    merge two liam2 files

Build scripts
-------------

build_exe.py
    generic script to make executables (for the standalone scripts)
setup.py
    compiles the Cython extensions to pyd and makes an .exe for the main
    liam2 executable (using cx_Freeze)