Here is a brief description of the most important concepts needed to
understand the code of Liam2, as well as where those concepts are implemented.
Entity
file: entities.py
The Entity class stores all there is to know about each entity: fields,
links, processes and data. It serves as a glue class between everything
(data, processes, ...).
When entities are created, they are added to a global registry in registry.py.
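The registration step can be pictured with a minimal sketch. The names used here (EntitySketch, EntityRegistrySketch, entity_registry) and the attribute layout are illustrative assumptions, not the actual contents of entities.py or registry.py.

```python
class EntityRegistrySketch(dict):
    """Hypothetical global registry mapping entity names to entity objects."""

    def add(self, entity):
        # Refuse duplicates so that each entity name stays unambiguous.
        if entity.name in self:
            raise ValueError("entity %r is already registered" % entity.name)
        self[entity.name] = entity


# Module-level singleton playing the role of the global registry.
entity_registry = EntityRegistrySketch()


class EntitySketch(object):
    """Hypothetical stand-in for Entity: glue between fields, links,
    processes and data."""

    def __init__(self, name, fields=None, links=None, processes=None):
        self.name = name
        self.fields = dict(fields or {})
        self.links = dict(links or {})
        self.processes = dict(processes or {})
        self.data = None  # would be filled in later, once data is loaded
        # Creating an entity registers it as a side effect.
        entity_registry.add(self)


person = EntitySketch("person", fields={"age": int, "gender": bool})
```

With this layout, any other module can look an entity up by name (entity_registry["person"]) without holding a direct reference to it.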
Process
file: process.py
The Process class stores user-defined processes. The most common kind of
process is the Assignment, which computes the value of an expression and
stores the result in a variable.
The Compute class is used as an alternative to Assignment when the user does
not store the result of an expression that has a side effect but also has a
return value (as opposed to Actions, which have no return value).
Another very common process is the ProcessGroup (a.k.a. procedure), which
runs a list of processes in order.
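The relationship between these three kinds of process can be sketched as follows. This is a simplified illustration under stated assumptions: the *Sketch classes, the run(context) method and the use of plain callables as expressions are all hypothetical and do not reproduce the interfaces in process.py.

```python
class ProcessSketch(object):
    """Hypothetical base class: anything that can be run against a context."""

    def run(self, context):
        raise NotImplementedError()


class AssignmentSketch(ProcessSketch):
    """Evaluates an expression and stores the result in a variable."""

    def __init__(self, name, expr):
        self.name = name
        self.expr = expr  # assumption: a plain callable taking the context

    def run(self, context):
        context[self.name] = self.expr(context)


class ComputeSketch(ProcessSketch):
    """Evaluates an expression for its side effect; the result is dropped."""

    def __init__(self, expr):
        self.expr = expr

    def run(self, context):
        self.expr(context)


class ProcessGroupSketch(ProcessSketch):
    """Runs a list of processes in order."""

    def __init__(self, processes):
        self.processes = list(processes)

    def run(self, context):
        for process in self.processes:
            process.run(context)


log = []


def record_age(context):
    # Has a side effect (appending to log) and also returns a value.
    log.append(context["age"])
    return len(log)


context = {"age": 30}
ageing = ProcessGroupSketch([
    AssignmentSketch("age", lambda ctx: ctx["age"] + 1),
    AssignmentSketch("adult", lambda ctx: ctx["age"] >= 18),
    ComputeSketch(record_age),
])
ageing.run(context)
```

Because the group runs its members in order, the second assignment and record_age both see the already incremented age.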
Action
file: actions.py
Actions are processes which do not have any result (that can be stored in
variables), but have side effects. Examples include csv(), show(), remove()
and breakpoint().
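As a rough illustration of "side effect, no result", here is a hypothetical action in the spirit of show(). The class name, its constructor arguments and the run(context) method are assumptions made for this sketch and are not taken from actions.py.

```python
import io


class ShowActionSketch(object):
    """Hypothetical action: writes a value to a stream and returns nothing."""

    def __init__(self, expr, stream):
        self.expr = expr      # assumption: a plain callable taking the context
        self.stream = stream  # any object with a write() method

    def run(self, context):
        # Side effect only: there is deliberately no return statement,
        # so there is nothing that could be stored in a variable.
        self.stream.write("%s\n" % (self.expr(context),))


out = io.StringIO()
result = ShowActionSketch(lambda ctx: ctx["age"] * 2, out).run({"age": 21})
```

After this runs, result is None while out holds the text that was written.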
Expressions
file: expr.py (and many others)
Expressions are the meat of the code. The Expr class is the base class for
all expressions in Liam2. It defines all the basic operators on expressions
(arithmetic, logical, comparison), but it should not be inherited from
directly.
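Defining the operators on a base class means that combining expressions builds a new expression object instead of computing a value immediately. The sketch below illustrates that general technique only; ExprSketch, BinaryOpSketch, VariableSketch and as_string are hypothetical names and are not the definitions found in expr.py.

```python
class ExprSketch(object):
    """Hypothetical base class: operators return new expression nodes."""

    def __add__(self, other):
        return BinaryOpSketch("+", self, other)

    def __mul__(self, other):
        return BinaryOpSketch("*", self, other)

    def __gt__(self, other):
        return BinaryOpSketch(">", self, other)

    def __and__(self, other):
        return BinaryOpSketch("&", self, other)


def as_string(node):
    # Plain Python constants (ints, floats, ...) are rendered with repr().
    return node.as_string() if isinstance(node, ExprSketch) else repr(node)


class VariableSketch(ExprSketch):
    def __init__(self, name):
        self.name = name

    def as_string(self):
        return self.name


class BinaryOpSketch(ExprSketch):
    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right

    def as_string(self):
        return "(%s %s %s)" % (as_string(self.left), self.op,
                               as_string(self.right))


age = VariableSketch("age")
income = VariableSketch("income")
# Nothing is computed here: the operators only assemble a tree of nodes.
expr = (age + 1 > 18) & (income * 2 > 1000)
print(expr.as_string())  # (((age + 1) > 18) & ((income * 2) > 1000))
```

The resulting tree can then be inspected, simplified or turned into a string for an evaluation engine at a later stage.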
file: exprbases.py
Liam2 provides many different base classes to inherit from when implementing
a new function:
- NumexprFunction: base class for functions which are implemented
  as-is in numexpr, e.g. abs, log, exp.
- CompoundExpression: base class for expressions which can be expressed in
  terms of other “liam2” expressions, e.g. min, max, zeroclip.
- EvaluableExpression: base class for all other expressions (those that do not
  exist in numexpr and cannot be expressed in terms of other liam2
  expressions). These expressions need to be pre-evaluated and stored in
  a (hidden) temporary variable before being fed to numexpr, and this is what
  EvaluableExpression does. One should only inherit from this class directly
  if none of the subclasses below applies.
  - NumpyFunction: subclass for functions which are implemented
    as-is in NumPy. Should not be used directly.
    - NumpyCreateArray: subclass for functions which create arrays out of
      nothing (usually random functions).
    - NumpyChangeArray: subclass for functions which take an array as input
      and give another array as output (e.g. clip, round).
    - NumpyAggregate: subclass for aggregate functions, e.g. count, min,
      max, std, median.
  - FunctionExpression: subclass for functions which take one expression as
    argument, e.g. trunc, lag, duration, ...
    - FilteredExpression: subclass for functions which also take a filter
      argument, e.g. align, sum, avg, gini.
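The pre-evaluation step described for EvaluableExpression can be sketched as follows. This is an illustration under assumptions, not the code in exprbases.py: the class and method names are hypothetical, the temporary variable naming scheme is invented, and Python's built-in eval() is used as a stand-in for the numexpr engine so that the example has no third-party dependency.

```python
import itertools
import statistics

_temp_ids = itertools.count()


class EvaluableSketch(object):
    """Hypothetical expression that the string-based engine cannot compute."""

    def evaluate(self, context):
        raise NotImplementedError()

    def as_string(self, context):
        # Pre-evaluate, park the result in a hidden temporary variable and
        # hand only the *name* of that variable to the engine.
        temp_name = "__temp_%d" % next(_temp_ids)
        context[temp_name] = self.evaluate(context)
        return temp_name


class MedianSketch(EvaluableSketch):
    def __init__(self, name):
        self.name = name

    def evaluate(self, context):
        return statistics.median(context[self.name])


def engine_evaluate(expr_string, context):
    # Stand-in for the numexpr engine: evaluates a string against the
    # variables stored in the context.
    return eval(expr_string, {"__builtins__": {}}, context)


context = {"age": [10, 40, 20], "threshold": 5}
expr_string = "%s + threshold" % MedianSketch("age").as_string(context)
result = engine_evaluate(expr_string, context)
```

The engine never sees the median computation itself, only a variable name whose value was computed beforehand.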
Liam2's current expressions are implemented in the following files:
- alignment.py
  - handles the align() and align_abs() functions
- align_link.py
  - the core algorithm (an implementation of Chenard’s) for align_abs(link=)
- exprmisc.py
  - all expressions which are not defined in another file
- groupby.py
  - handles groupby()
- links.py
  contains all link-related code:
  - the Link class stores the definition of links
  - the LinkValue class handles ManyToOne links
  - link functions to handle OneToMany links: countlink, sumlink, avglink,
    minlink and maxlink
- matching.py
  - handles the matching() function
- regressions.py
  - handles all the regression functions: logit_score, logit_regr, cont_regr,
    clip_regr, log_regr
- tfunc.py
  - handles all time-related functions: value_for_period, lag, duration, tavg
    and tsum
Context
file: context.py
A context is a data structure used to keep track of “contextual” information:
what is the “current” entity, what is the “current” period, what is the
“current” dataset. The context is passed around to the evaluation
functions/methods.
A context must present a simple dictionary interface (key: value). There are
a few keys with special meanings:
- period
  - should be the period currently being evaluated
- __len__
  - if present, should be an int representing the number of rows in the context
- __entity__
  - current entity
- __globals__
  - if present, should be a dictionary of global tables (‘periodic’, ...)
The most commonly used kind of context is the EntityContext, which provides
a context interface to an Entity.
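The dictionary interface and the special keys can be pictured with a small read-only mapping. Everything in this sketch is an assumption made for illustration: EntityContextSketch is not the EntityContext class from context.py, the entity is represented by a plain string, and the retirement_age entry of the ‘periodic’ table is invented.

```python
from collections.abc import Mapping


class EntityContextSketch(Mapping):
    """Hypothetical read-only context exposing an entity's columns by name."""

    def __init__(self, entity, columns, nrows, period, global_tables=None):
        self._columns = dict(columns)
        self._special = {
            "period": period,
            "__entity__": entity,
            "__len__": nrows,
        }
        # '__globals__' is optional, so it is only set when provided.
        if global_tables is not None:
            self._special["__globals__"] = global_tables

    def __getitem__(self, key):
        if key in self._special:
            return self._special[key]
        return self._columns[key]

    def __iter__(self):
        yield from self._special
        yield from self._columns

    def __len__(self):
        # Mapping protocol: the number of keys, not the number of rows.
        # The number of rows is available under the '__len__' key.
        return len(self._special) + len(self._columns)


def is_retired(context):
    # An evaluation function only needs the key: value interface.
    limit = context["__globals__"]["periodic"]["retirement_age"]
    return [age >= limit for age in context["age"]]


context = EntityContextSketch(
    entity="person",
    columns={"age": [30, 64, 70]},
    nrows=3,
    period=2020,
    global_tables={"periodic": {"retirement_age": 65}},
)
flags = is_retired(context)
```

Since is_retired() relies only on key lookups, a plain dict with the same keys could be passed instead of the wrapper.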