quantpylib.simulator.gene

quantpylib.simulator.gene provides features for numerical computation over market and non-market variables, including a no-code mathematical parser-evaluator that computes trading signals and indicators from well-defined, formulaic Python str objects. The parser-evaluator is exposed via the quantpylib.simulator.gene.Gene class API, which internally uses a tree data structure to encode trading formulas. The quantpylib.simulator.gene.GeneticAlpha class combines this parser-evaluator with the backtest engine of our quantpylib.simulator.alpha.Alpha class to provide a no-code solution for backtesting trading strategies. The GeneticAlpha class extends the Alpha class and implements all the methods needed for signal computation, forecast generation, position sizing, risk management, volatility targeting and backtest logic. All the performance metrics and hypothesis-testing suites available to Alpha objects are available to any GeneticAlpha instance via the same function signatures.

GeneticAlpha

(Bases: quantpylib.simulator.alpha.Alpha )

Parameters:

Name Type Description Default
genome Gene or str

Genome representation as Gene object or in mathematical string format.

required
**kwargs

Backtest parameters required to instantiate quantpylib.simulator.alpha.Alpha objects.

{}

Parameters in **kwargs are required and passed into quantpylib.simulator.alpha.Alpha and are as follows:

Parameters:

Name Type Description Default
start datetime

Start of backtest simulation. If not tz-aware, assumed UTC. If not given, assume min of OHLC dataset in dfs.

None
end datetime

End of backtest simulation. If not tz-aware, assumed UTC. If not given, assume max of OHLC dataset in dfs.

None
dfs dict

inst : OHLCV/other Dataframes used for computations. Default is an empty dictionary.

{}
instruments list

List of traded instruments.

[]
execrates ndarray

Execution rates for each instrument. Default is None.

None
commrates ndarray

Commission rates for each instrument. Default is None.

None
longswps ndarray

Long annualized swap/funding rates for each instrument. Positive swaps means long positions incur swap fees. Default is None.

None
shortswps ndarray

Short annualized swap/funding rates for each instrument. Positive swaps means short positions incur swap fees. Default is None.

None
granularity Period

The granularity of each trading signal evaluation. Datapoints of lower granularity than specified are ignored. Last known datapoint of multiple entries in the same granularity interval is taken. Default is Period.DAILY.

DAILY
positional_inertia float

Parameter controlling position change inertia. Default is 0.

0
portfolio_vol float

Target portfolio volatility. Default is 0.20, representing 20% annualized volatility.

0.2
weekend_trading bool

Indicates if there is weekend trading, such as in cryptocurrency markets. Defaults to False.

False
around_the_clock bool

Indicates if there is 24H trading, such as in cryptocurrency and fx markets. Defaults to False.

False
currency_denomination str

Currency denomination for the portfolio. Default is "USD".

'USD'
starting_capital float

Amount of capital the backtest begins with. Defaults to 10000.0.

10000.0

Notes: execrates, commrates, longswps and shortswps are expressed as decimals. For example, execrates = [0.001, 0.005, ...] encodes that 0.1% of the notional value transacted is deducted as execution costs for the first instrument, 0.5% for the second instrument, and so on. commrates specifies commissions in the same units (as a fraction of notional value), as do the overnight swap rates for both long and short positions.
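The decimal convention in the notes above can be illustrated with a small sketch. The function name transaction_costs and the sample trade sizes are hypothetical, and the library's internal cost accounting may differ from this simple per-trade calculation.

```python
def transaction_costs(notional_traded, execrates, commrates):
    """Return (execution, commission) cost lists, one entry per instrument.

    Rates are decimals: 0.001 means 0.1% of the notional value transacted.
    Hypothetical helper for illustration only, not part of quantpylib.
    """
    exec_costs = [abs(n) * r for n, r in zip(notional_traded, execrates)]
    comm_costs = [abs(n) * r for n, r in zip(notional_traded, commrates)]
    return exec_costs, comm_costs


# Hypothetical trade sizes in portfolio currency; negative means a sale.
notional = [10_000.0, -2_000.0]
exec_costs, comm_costs = transaction_costs(
    notional, execrates=[0.001, 0.005], commrates=[0.0005, 0.0005]
)
print(exec_costs, comm_costs)
```

Costs are charged on the absolute notional, so buys and sells of the same size incur the same fee in this sketch.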

run_simulation async

Runs the entire backtest.

Parameters:

Name Type Description Default
verbose

(boolean, optional) flag to print out backtest simulation information at runtime.

False

Returns:

Type Description
DataFrame

A DataFrame containing backtest statistics, including contracts held throughout the backtest, portfolio weights, portfolio leverage, nominal exposure, execution costs, commissions, swaps, PnL, portfolio capital and so on.

get_performance_measures

Computes the performance metrics for the trading strategy.

Returns:

Type Description
dict

A dictionary containing various performance metrics:

  • "cum_ret": Cumulative returns over time.
  • "log_ret": Logarithmic returns.
  • "max_dd": Maximum drawdown.
  • "1y_roll_dd": One-year rolling drawdown.
  • "1y_roll_max_dd": One-year rolling maximum drawdown.
  • "sortino": Sortino ratio.
  • "sharpe": Sharpe ratio.
  • "mean_ret": Mean return per annum.
  • "median_ret": Median return per annum.
  • "stdev_ret": Standard deviation of returns per annum.
  • "var_ret": Variance of returns per annum.
  • "skew_ret": Skewness of returns.
  • "kurt_exc": Excess kurtosis of returns.
  • "cagr": Compound annual growth rate.
  • "3y_roll_cagr": Three-year rolling compound annual growth rate.
  • "3y_roll_calmar": Three-year rolling Calmar ratio.
  • "omega(0)": Omega ratio.
  • "ulcer": Ulcer index.
  • "VaR95": Value at Risk at 95% confidence level.
  • "cVaR95": Conditional Value at Risk at 95% confidence level.
  • "gain_to_pain": Gain-to-pain ratio.
  • "w_summary": Summary statistics of weights.
  • "directionality": Market long bias directionality.
  • "parity_distance": Distance from a 1/n equal weight portfolio.
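To make a few of the metrics listed above concrete, here is a minimal sketch that computes cumulative returns, maximum drawdown, Sharpe ratio and CAGR from a list of simple per-period returns. The function name basic_measures, the 252-period annualization factor, the zero risk-free rate and the negative sign convention for drawdown are all assumptions; the values returned by get_performance_measures may be defined differently.

```python
import math


def basic_measures(returns, periods_per_year=252):
    """Compute a few illustrative metrics from simple per-period returns.

    Assumptions (not taken from the library): zero risk-free rate,
    sample standard deviation, drawdown reported as a negative fraction.
    """
    # Cumulative return path: running product of (1 + r).
    cum, path = 1.0, []
    for r in returns:
        cum *= 1.0 + r
        path.append(cum)

    # Maximum drawdown: largest fractional drop from a running peak.
    peak, max_dd = 1.0, 0.0
    for value in path:
        peak = max(peak, value)
        max_dd = min(max_dd, value / peak - 1.0)

    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)  # sample variance
    stdev = math.sqrt(var)
    sharpe = mean / stdev * math.sqrt(periods_per_year) if stdev > 0 else float("nan")

    years = n / periods_per_year
    cagr = path[-1] ** (1.0 / years) - 1.0
    return {"cum_ret": path, "max_dd": max_dd, "sharpe": sharpe, "cagr": cagr}


stats = basic_measures([0.10, -0.50, 0.20])
print(stats["max_dd"])
```

In this toy input the equity path is 1.1, 0.55, 0.66, so the deepest drop from the running peak is one half.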

hypothesis_tests async

Conducts Monte Carlo permutation hypothesis tests, reporting p-values, on the performance of the trading strategy represented by the object instance.

Parameters:

Name Type Description Default
num_decision_shuffles int

Number of decision shuffles for Monte Carlo permutation tests. Default is 1000.

1000
num_data_shuffles int

Number of data shuffles for permutation tests. Default is 10 (computationally expensive).

10

Returns:

Type Description
dict

A dictionary containing the results of hypothesis tests.

  • 'timer_p': p-value from asset-timing test that shuffles time-series asset returns.
  • 'picker_p': p-value from asset-picking shuffler test that shuffles cross-sectional asset returns.
  • 'trader_p1': p-value from decision-making test that shuffles both time-series and cross-sectional asset returns.
  • 'trader_p2': p-value from decision-making test that shuffles market data.

Gene

Represents a formulaic alpha expression used to encode trading rules.

This class internally represents a trading rule as a tree data structure, where each node can either be a terminal (leaf) node or a functional node. Terminal nodes represent data points or constants, while functional nodes represent operations on their child nodes.

str_to_gene staticmethod

Converts a string representation of a gene/formulaic alpha into a Gene object.

Parameters:

Name Type Description Default
strgene str

The string representation of the gene.

required

Returns:

Type Description
Gene

A Gene object representing the gene.

Raises:

Type Description
AssertionError

If strgene representation is misspecified by syntax rules.

Notes

This is the recommended and most intuitive way of creating a Gene object.
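To illustrate how a formula string maps onto a tree, the following toy parser turns strings such as mean_5(minus(close,open)) into nested dictionaries. It is a sketch only: the function names are hypothetical, the syntax is inferred from the examples in the List of Primitives, and the library's own parser and node representation may differ. Like str_to_gene, it signals malformed input with an AssertionError.

```python
def parse_formula(text):
    """Parse a formula like 'mean_5(minus(close,open))' into a nested dict tree."""
    text = text.replace(" ", "")
    node, pos = _parse_node(text, 0)
    assert pos == len(text), f"unexpected trailing text at position {pos}"
    return node


def _parse_node(text, pos):
    # Read the token up to the next structural character.
    start = pos
    while pos < len(text) and text[pos] not in "(),":
        pos += 1
    token = text[start:pos]
    assert token, f"expected a primitive name at position {start}"

    # Split 'mean_5' into primitive 'mean' and space '5'.
    prim, _, space = token.partition("_")
    node = {"prim": prim, "space": space or None, "children": []}

    if pos < len(text) and text[pos] == "(":
        pos += 1  # consume '('
        while pos < len(text) and text[pos] != ")":
            child, pos = _parse_node(text, pos)
            node["children"].append(child)
            if pos < len(text) and text[pos] == ",":
                pos += 1  # consume ','
        assert pos < len(text), "missing closing parenthesis"
        pos += 1  # consume ')'
    return node, pos


tree = parse_formula("mean_5(minus(close,open))")
print(tree["prim"], tree["space"], [c["prim"] for c in tree["children"]])
```

The root of the resulting tree is the outermost function, and each argument becomes a child node, so terminals such as close and open end up as leaves.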

__init__

Initializes a Gene object.

Parameters:

Name Type Description Default
prim str

The primary function/terminal of the gene.

required
space str

The space value associated with the gene, specifying details of the primitive. For example, in the context of financial trading, this could represent parameters such as window size or lookback period of a rolling correlation function. Defaults to None.

None
is_terminal bool

Indicates whether the gene is a terminal node.

required
parent Gene

The parent gene. Defaults to None.

None
children list

The list of child genes. Defaults to an empty list.

[]

The prim primitives supported by our library, their behavior and their interpretations are listed in the List of Primitives section below.

evaluate_node

Recursively evaluates a node in the formulaic alpha expression. When called on the root node in the gene representation, this function evaluates the entire formulaic expression.

Parameters:

Name Type Description Default
insts list

The list of instrument names.

required
dfs dict

A dictionary containing pricing/alternative data DataFrames for each instrument.

required
idx Index

The index for alignment.

required

Returns:

Type Description
DataFrame

The evaluated node with DataFrame.index=idx and DataFrame.columns=insts.

dfs should contain data for all terminals required to evaluate the gene representation. For OHLCV terminals, dfs should contain key-value pairs of the form (ticker: pd.DataFrame). Each pd.DataFrame provided should have a timezone-aware DatetimeIndex. For instance:
{
    'BRKB':
                                open      high       low        close        adj_close volume
    datetime
    2000-01-03 00:00:00+00:00  1825.0000  1829.0000  1741.0000  1765.0000    35.3000   873500     
    2000-01-04 00:00:00+00:00  1725.0000  1733.0000  1695.0000  1704.0000    34.0800  1380000     
    2000-01-05 00:00:00+00:00  1707.0000  1773.0000  1695.0000  1732.0000    34.6400   997000     
    2000-01-06 00:00:00+00:00  1745.0000  1804.0000  1727.0000  1804.0000    36.0800   917000     
    2000-01-07 00:00:00+00:00  1830.0000  1848.0000  1805.0000  1820.0000    36.4000  1001500     
    ...                              ...        ...        ...        ...        ...      ...     
    2009-12-24 00:00:00+00:00  3281.9999  3295.9899  3274.9999  3286.9999    65.7400   607600     
    2009-12-28 00:00:00+00:00  3279.9999  3289.9999  3274.9999  3285.3699    65.7074  1080250     
    2009-12-29 00:00:00+00:00  3284.9999  3289.6899  3269.9999  3279.9999    65.6000  1105300     
    2009-12-30 00:00:00+00:00  3282.9999  3289.6499  3279.9999  3289.6499    65.7930   560350     
    2009-12-31 00:00:00+00:00  3289.9999  3300.9899  3279.9999  3285.9999    65.7200   972900 
} 
For non-OHLCV terminals, dfs should be supplemented with key-value pairs of the form (ticker_terminal: pd.Series). Each pd.Series provided should have a timezone-aware DatetimeIndex. For instance:
{
    'BRKB_earnings' : pd.Series(
            index=[2000-01-03 00:00:00+00:00, ..., 2009-12-30 00:00:00+00:00],
            data=[...]
    ),
    'BRKB_sentiment' : pd.Series(
            index=[2010-01-03 00:00:00+00:00, ..., 2016-12-30 00:00:00+00:00],
            data=[...]
    )
}
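The recursive evaluation described for evaluate_node can be pictured with a toy evaluator over plain Python lists. This is a simplified sketch for a single instrument: the (prim, children) tuple representation, the evaluate function and the three supported primitives are assumptions made for illustration, whereas the library evaluates DataFrames aligned on idx across all instruments.

```python
def evaluate(node, data):
    """Recursively evaluate a (prim, children) tree against a dict of series."""
    prim, children = node
    if not children:
        # Terminal node: look up the raw series by name.
        return list(data[prim])

    # Functional node: evaluate every child first, then combine the results.
    args = [evaluate(child, data) for child in children]
    if prim == "minus":
        return [a - b for a, b in zip(*args)]
    if prim == "plus":
        return [sum(vals) for vals in zip(*args)]
    if prim == "abs":
        return [abs(a) for a in args[0]]
    raise ValueError(f"unsupported primitive: {prim}")


data = {"open": [10.0, 11.0, 12.0], "close": [10.5, 10.0, 12.5]}
tree = ("abs", [("minus", [("close", []), ("open", [])])])  # abs(minus(close,open))
result = evaluate(tree, data)
print(result)  # [0.5, 1.0, 0.5]
```

Calling evaluate on the root evaluates the whole expression, because each functional node evaluates its children before applying its own operation.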

make_dot

Generate a DOT language representation of the tree structure rooted at this node.

Returns:

Type Description
str

A string containing the DOT language representation of the tree.

Notes

This method uses pre-order traversal to generate the DOT representation of the tree rooted at the current node. Each node in the tree corresponds to a vertex in the DOT graph, and each edge represents the parent-child relationship between nodes.

The generated DOT string can be rendered into a graphical visualization using graphviz or other tools that support the DOT language.
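As a rough sketch of the pre-order approach described in the notes, the function below emits a DOT digraph for a small tree of (label, children) tuples. The tuple representation, the function name tree_to_dot and the n0, n1, ... vertex identifiers are assumptions for illustration; the exact DOT text produced by make_dot may look different.

```python
def tree_to_dot(tree):
    """Return a DOT digraph string for a (label, children) tree.

    Nodes are numbered in pre-order: a vertex is emitted for the current
    node before any of its children are visited.
    """
    lines = ["digraph G {"]
    counter = [0]  # mutable counter shared by the nested function

    def visit(node, parent_id):
        label, children = node
        node_id = counter[0]
        counter[0] += 1
        lines.append(f'  n{node_id} [label="{label}"];')
        if parent_id is not None:
            # One edge per parent-child relationship.
            lines.append(f"  n{parent_id} -> n{node_id};")
        for child in children:
            visit(child, node_id)

    visit(tree, None)
    lines.append("}")
    return "\n".join(lines)


dot = tree_to_dot(("minus", [("close", []), ("open", [])]))
print(dot)
```

The printed string can be saved to a file and rendered with the Graphviz dot command-line tool.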

height

Return the maximum distance from the current node to any descendant leaf node.

depth

Return the distance from current node to the root node.

size

Return the number of nodes in the graphical representation of the formulaic alpha.
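The three tree measures above can be sketched on a minimal node class. The Node class here is hypothetical, and it assumes that a leaf has height 0, the root has depth 0, and size counts the nodes in the subtree rooted at the current node; the library's conventions may differ.

```python
class Node:
    """Toy tree node used only to illustrate height, depth and size."""

    def __init__(self, label, children=None):
        self.label = label
        self.parent = None
        self.children = children or []
        for child in self.children:
            child.parent = self

    def height(self):
        # Longest downward path to a leaf; a leaf has height 0.
        if not self.children:
            return 0
        return 1 + max(child.height() for child in self.children)

    def depth(self):
        # Number of edges between this node and the root.
        return 0 if self.parent is None else 1 + self.parent.depth()

    def size(self):
        # This node plus every node below it.
        return 1 + sum(child.size() for child in self.children)


# mean_5(minus(close,open))
close, open_ = Node("close"), Node("open")
root = Node("mean_5", [Node("minus", [close, open_])])
print(root.height(), close.depth(), root.size())  # 2 2 4
```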

pre_ord_apply

Apply a function to each node in the tree using pre-order traversal.

This method traverses the tree in a pre-order fashion, meaning it applies the function to the current node before recursively traversing its children. The function is applied to each node along with any additional keyword arguments provided.

Parameters:

Name Type Description Default
func callable

A function to be applied to each node in the tree.

required
**kwargs

Additional keyword arguments to be passed to the function.

{}
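A pre-order apply can be sketched in a few lines on a toy node class. The Node class and the collect callback are hypothetical, and the sketch assumes the callback receives the node as its first positional argument followed by the keyword arguments; the library's calling convention is not stated here and may differ.

```python
class Node:
    """Toy tree node used only to illustrate pre-order application."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def pre_ord_apply(self, func, **kwargs):
        # Visit the current node first, then recurse into each child in order.
        func(self, **kwargs)
        for child in self.children:
            child.pre_ord_apply(func, **kwargs)


def collect(node, out):
    """Example callback: record the label of each visited node."""
    out.append(node.label)


# plus(abs(close),open)
tree = Node("plus", [Node("abs", [Node("close")]), Node("open")])
visited = []
tree.pre_ord_apply(collect, out=visited)
print(visited)  # ['plus', 'abs', 'close', 'open']
```

Because the parent is visited before its children, the labels come out in the same left-to-right order as they appear in the formula string.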

List of Primitives

An Op value of idx_op means union_idx_op by default, and also when the primitive is explicitly paired with the un Space value. It means intersect_idx_op when the primitive is paired with the ix Space value. Examples are plus(open,close), plus_un(open,close) and plus_ix(open,close). The different operators and their behavior are documented here.

Primitive Space Op Terminal Args Meaning Example
const int,float - Yes - represents a constant numerical value of x const_3.14
open - - Yes - open price open
high - - Yes - high price high
low - - Yes - low price low
close - - Yes - close price close
volume - - Yes - trade volume volume
* - - Yes - custom variable * (e.g. epsEst, sentiment)
abs - self_idx_op2 No 1 absolute value abs(minus(close,open))
neg - self_idx_op2 No 1 negation neg(minus(close,open))
log - self_idx_op2 No 1 natural logarithm (replacing inf with NaN) log(volume)
sign - self_idx_op2 No 1 sign function sign(minus(close,open))
tanh - self_idx_op2 No 1 tanh function tanh(cszscre(logret_1()))
sigmoid - self_idx_op2 No 1 sigmoid function sigmoid(cszscre(logret_1()))
recpcal - self_idx_op2 No 1 reciprocal (replacing inf with NaN) recpcal(close)
pow int self_idx_op2 No 1 power function (replacing inf with NaN) pow_2(close)
csrank - all_idx_op No 1 cross-sectional rank (smallest=1, ties receive the average rank) csrank(volume)
cszscre - all_idx_op No 1 cross-sectional Z-score cszscre(volume)
ls int,float / int,float all_idx_op No 1 -1 for values below 25 percentile and +1 for values above 75 percentile ls_25/75(volume)
delta int self_idx_op No 1 time-series change in variable over time delta_1(close)
delay int self_idx_op No 1 time-series delay by specified number of periods delay_1(close)
forward int self_idx_op No 1 time-series lookahead by specified number of periods forward_1(close)
sum int self_idx_op No 1 sum of time-series values sum_5(volume)
prod int self_idx_op No 1 product of time-series values prod_5(volume)
mean, sma int self_idx_op No 1 simple mean of time-series values mean_5(volume)
ema, ewma int self_idx_op No 1 exponentially weighted moving average ewma_5(volume)
median int self_idx_op No 1 median of time-series values median_5(volume)
std int self_idx_op No 1 standard deviation of time-series values std_5(volume)
var int self_idx_op No 1 variance of time-series values var_5(volume)
skew int self_idx_op No 1 skewness of time-series values skew_5(volume)
kurt int self_idx_op No 1 kurtosis of time-series values kurt_5(volume)
tsrank int self_idx_op No 1 time-series rank tsrank_5(volume)
tsmax int self_idx_op No 1 maximum value over time tsmax_5(volume)
tsmin int self_idx_op No 1 minimum value over time tsmin_5(volume)
tsargmax int self_idx_op No 1 index of maximum value over time tsargmax_5(volume)
tsargmin int self_idx_op No 1 index of minimum value over time tsargmin_5(volume)
tszscre int self_idx_op No 1 time-series Z-score tszscre_5(volume)
max -,un,ix idx_op No >=2 maximum over arguments max_ix(open,close,high)
plus -,un,ix idx_op No >=2 sum over arguments plus_un(open,close,high)
minus -,un,ix idx_op No 2 subtraction minus(high,low)
mult -,un,ix idx_op No 2 multiplication mult(open,close)
div -,un,ix idx_op No 2 division div(open,close)
and -,un,ix idx_op No 2 logical AND and(gt(high,low),lt(high,low))
or -,un,ix idx_op No 2 logical OR or(gt(high,low),lt(high,low))
eq -,un,ix idx_op No 2 logical EQUALS eq(gt(high,low),lt(high,low))
gt -,un,ix idx_op No 2 greater-than comparison gt(open,close)
lt -,un,ix idx_op No 2 less-than comparison lt(open,close)
ite -,un,ix idx_op No 3 if-then-else operation ite(or(gt(high,low),lt(high,low)),const_1,const_-1)
cor int slow_idx_op No 2 rolling-correlation cor_12(volume,close)
kentau int slow_idx_op No 2 rolling-Kendall's tau correlation kentau_12(volume,close)
cov int slow_idx_op No 2 rolling-covariance cov_12(volume,close)
dot int slow_idx_op No 2 rolling-dot product dot_12(volume,close)
wmean, wma int slow_idx_op No 2 weighted moving average wmean_12(close,weights)
grssret int - Pseudo 0 period gross returns grssret_12()
logret int - Pseudo 0 period log returns logret_12()
netret int - Pseudo 0 period net returns (gross returns - 1) netret_12()
volatility int - Pseudo 0 volatility (standard deviation of log returns) volatility_12()
rsi int - Pseudo 0 relative strength index indicator rsi_12()
mvwap int - Pseudo 0 moving volume-weighted average price indicator mvwap_12()
obv int - Pseudo 0 on-balance volume indicator obv_12()
atr int - Pseudo 0 average true range indicator atr_12()
tr - - Pseudo 0 true range indicator tr()
adx int - Pseudo 0 average directional movement index adx_12()
addv int - Pseudo 0 average daily dollar volume addv_12()
mac int / int - No 1 moving average crossover indicator function for fast/slow mac_20/50
vma, vwma int - No 1 volume weighted moving average vma_20(close)
vwvar int - No 1 volume weighted variance vwvar_20(logret_1())
vwstd int - No 1 volume weighted standard deviation vwstd_20(logret_1())
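The difference between the un and ix Space values described above the table can be pictured with two series that only partly overlap in time. This is a conceptual sketch using plain dictionaries keyed by date string: the function names are hypothetical, and filling non-overlapping dates with NaN under the union is an assumption, since the library's handling of missing values is not described here.

```python
def plus_union(a, b):
    """Add two {timestamp: value} series over the union of their timestamps.

    Assumption: a timestamp missing from either input yields NaN.
    """
    nan = float("nan")
    return {t: a.get(t, nan) + b.get(t, nan) for t in sorted(set(a) | set(b))}


def plus_intersect(a, b):
    """Add two {timestamp: value} series over their common timestamps only."""
    return {t: a[t] + b[t] for t in sorted(set(a) & set(b))}


open_ = {"2000-01-03": 1.0, "2000-01-04": 2.0}
close = {"2000-01-04": 2.5, "2000-01-05": 3.0}

union_result = plus_union(open_, close)          # three dates, two of them NaN
intersect_result = plus_intersect(open_, close)  # {'2000-01-04': 4.5}
print(union_result, intersect_result)
```

The union keeps every date seen in either input, while the intersection keeps only dates present in both, which gives a shorter but fully populated result.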