quantpylib.simulator.models
quantpylib.simulator.models houses powerful features for statistical analysis involving market and non-market variables. It features the quantpylib.simulator.models.GeneticRegression class, an abstraction layer written on top of the quantpylib.simulator.gene.Gene class and statsmodels.formula.api, to perform no-code regression analysis using simple string specifications.
An example scenario for quantitative analysis is a momentum study on the impact of standardized returns on forward returns. We may specify such a regression study with the following regression formula:
forward_1(logret_1()) ~ div(logret_25(),volatility_25()) + tsargmax_16(close)
The GeneticRegression class enables this in multiple steps:
- Parse the formula into blocks:
  - b0: forward_1(logret_1())
  - b1: div(logret_25(),volatility_25())
  - b2: tsargmax_16(close)
- Construct the equivalent regression specification: b0 ~ b1 + b2
- Evaluate each block using our evaluator-parser in the Gene class.
- Pass the evaluated blocks and regression specification into statsmodels for regression analysis.
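The first two steps can be sketched in plain Python. This is only an illustration of the idea, not the library's parser: the helper name split_formula is hypothetical, and it assumes that terms are separated by the operators ~ + * : wherever they appear outside parentheses.

```python
def split_formula(formula: str):
    """Map each top-level term to a block name and build the block-level spec.

    Hypothetical helper for illustration; not part of quantpylib.
    """
    operators = set("~+*:")
    blocks = {}        # block name -> original term text
    spec_parts = []    # block names and operators, in order
    term = []          # characters of the term currently being read
    depth = 0          # current parenthesis nesting depth

    def flush():
        text = "".join(term).strip()
        term.clear()
        if text:
            name = f"b{len(blocks)}"
            blocks[name] = text
            spec_parts.append(name)

    for ch in formula:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if depth == 0 and ch in operators:
            # An operator outside any parentheses ends the current term.
            flush()
            spec_parts.append(ch)
        else:
            term.append(ch)
    flush()
    return blocks, " ".join(spec_parts)


formula = "forward_1(logret_1()) ~ div(logret_25(),volatility_25()) + tsargmax_16(close)"
blocks, spec = split_formula(formula)
print(blocks)
print(spec)
```

Tracking the parenthesis depth keeps commas and nested calls such as div(logret_25(),volatility_25()) inside a single block.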
This design lets the user leverage the well-tested and familiar statistical package statsmodels, while enhancing the expressive capabilities of the formulaic language specialized for trading analysis. The full list of primitives (constants and functions) is documented here.
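As a rough sketch of the final step, a block-level specification such as b0 ~ b1 + b2 can be handed to statsmodels.formula.api.ols once each block is a column of numbers. The data below is synthetic and stands in for evaluated blocks; how GeneticRegression arranges its evaluated blocks internally is not shown here and is an assumption of this example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-ins for evaluated blocks (placeholder data, not market data).
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "b1": rng.normal(size=n),
    "b2": rng.normal(size=n),
})
# Response built from known coefficients plus small noise.
data["b0"] = 0.5 + 2.0 * data["b1"] - 1.0 * data["b2"] + rng.normal(scale=0.1, size=n)

# The block-level specification is an ordinary patsy formula.
results = smf.ols("b0 ~ b1 + b2", data=data).fit()
print(results.params)
```

The fitted object is a regular statsmodels results object, so the usual summary(), params and resid attributes are available.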
We also provide additional convenience methods for data binning and aggregation, diagnostics and plotting. Their specifications are given below.
The following notations apply in the documentation:
- y, b0 : response variable
- x[*], b[1..] : independent variable(s)
- y^ : fitted response
- uCI : upper confidence interval
- lCI : lower confidence interval
- res : residuals
- res# : (z-score) normalized residuals
- res+ : internally studentized residuals
- res* : externally studentized residuals
- PRP : partial regression plot
- CCPR : component-component plus residual plot
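For reference, the residual notations above correspond to the following textbook definitions. This assumes the library follows the standard forms: h_ii is the leverage of observation i (the i-th diagonal entry of the hat matrix), n is the number of observations, p is the number of fitted parameters, and s_(i) is the residual standard error estimated with observation i left out. The exact normalization used for res# is an assumption.

```latex
\begin{aligned}
\text{res}   &: \quad e_i = y_i - \hat{y}_i \\
\text{res\#} &: \quad z_i = \frac{e_i - \bar{e}}{\hat{\sigma}_e} \\
\text{res+}  &: \quad r_i = \frac{e_i}{s\,\sqrt{1 - h_{ii}}},
               \qquad s^2 = \frac{1}{n - p}\sum_{j=1}^{n} e_j^2 \\
\text{res*}  &: \quad t_i = \frac{e_i}{s_{(i)}\,\sqrt{1 - h_{ii}}}
\end{aligned}
```

Here \bar{e} and \hat{\sigma}_e denote the sample mean and sample standard deviation of the residuals.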
Bin
Bases: Enum
Enumeration representing different binning methods.
Attributes:
Name | Type | Description |
---|---|---|
WIDTH | | Binning method where each bin has an equal interval length. |
OBSERVATIONS | | Binning method where each bin contains an equal number of observations. |
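The difference between the two binning methods can be illustrated with a small standalone sketch. The functions width_bins and observation_bins are hypothetical helpers written for this example and are not the library's implementation; they only show how the same values fall into different bins under each rule.

```python
def width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal length."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value would land on index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def observation_bins(values, n_bins):
    """Assign values to n_bins bins holding (nearly) equal numbers of observations."""
    order = sorted(range(len(values)), key=values.__getitem__)
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_bins // len(values)
    return labels


values = [1, 2, 3, 4, 5, 100]
by_width = width_bins(values, 2)
by_observations = observation_bins(values, 2)
print(by_width)          # one outlier sits alone in the upper interval
print(by_observations)   # three observations in each bin
```

Equal-width bins are sensitive to outliers, while equal-observation bins keep the sample size per bin balanced.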
GeneticRegression
__init__(formula='forward_1(logret_1()) ~ div(logret_25(),volatility_25())', intercept=True, df=None, start=None, end=None, dfs={}, instruments=[], granularity=Period.DAILY, build=True)
Initializes a GeneticRegression object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
formula | str | The regression formula. The formula describes the statistical model being analysed, and is closely inspired by the formula mini-language used in R and S. The model formula should consist of valid string representations of | 'forward_1(logret_1()) ~ div(logret_25(),volatility_25())' |
start | datetime | Start of the period for regression analysis. If not tz-aware, UTC is assumed. If not given, the earliest timestamp of the datasets in dfs is used. | None |
end | datetime | End of the period for regression analysis. If not tz-aware, UTC is assumed. If not given, the latest timestamp of the datasets in dfs is used. | None |
dfs | dict | Dictionary mapping each instrument (inst) to the OHLCV/other DataFrame used for computations. Default is an empty dictionary. | {} |
instruments | list | List of instruments used in the regression analysis. | [] |
granularity | Period | The granularity of the regression analysis. Datapoints of lower granularity than specified are ignored. When multiple entries fall in the same granularity interval, the last known datapoint is taken. Default is Period.DAILY. | DAILY |
build | bool | Whether to evaluate the formulaic blocks upon initialization. Default is True. | True |
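Two of the rules in the table, treating naive datetimes as UTC and keeping the last known datapoint per granularity interval, can be sketched with the standard library alone. The helpers as_utc and last_per_day are hypothetical and written only for this illustration; the example assumes daily granularity and buckets by UTC calendar day, which may differ from the library's internals.

```python
from datetime import datetime, timezone


def as_utc(ts: datetime) -> datetime:
    """Treat a naive datetime as UTC; convert an aware one to UTC."""
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def last_per_day(points):
    """Keep the last known value for each UTC calendar day.

    points is an iterable of (timestamp, value) pairs in any order.
    """
    daily = {}
    for ts, value in sorted(points, key=lambda p: as_utc(p[0])):
        # Later timestamps overwrite earlier ones within the same day.
        daily[as_utc(ts).date()] = value
    return daily


points = [
    (datetime(2024, 1, 1, 16), 101.5),
    (datetime(2024, 1, 1, 9), 100.0),
    (datetime(2024, 1, 2, 16), 99.0),
]
daily = last_per_day(points)
print(daily)
```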
build()
Evaluates the formulaic (dependent and independent) blocks to be used as regression variables, using the initialized formula and the dataframes provided. If build=False at initialization, then build() needs to be called before any of the regression methods, such as ols, are called.
diagnose()
Diagnoses the regression model for multicollinearity and other issues.
Returns:
Type | Description |
---|---|
dict | A dictionary containing the following: |
ols(axis='flatten', bins=0, binned_by=Bin.OBSERVATIONS, bin_block='b0', selector=None, aggregator=lambda x: np.mean(winsorize(x, limits=(0.05, 0.05))))
Performs Ordinary Least Squares (OLS) regression analysis.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
axis | str | The axis along which the regression analysis is performed. Possible values are | 'flatten' |
bins | int | The number of bins for grouping the data. If 0, no binning is performed. Defaults to 0. | 0 |
binned_by | Bin | The method used for binning the data. Possible values are Bin.WIDTH and Bin.OBSERVATIONS. | OBSERVATIONS |
bin_block | str | The block used for binning the data. Defaults to 'b0'. | 'b0' |
selector | str or datetime | | None |
aggregator | callable or dict | Used for aggregating data within each bin. If a callable is provided, all of the blocks are aggregated using this function. Different aggregators can be provided for different blocks by providing a dictionary containing | lambda x: np.mean(winsorize(x, limits=(0.05, 0.05))) |
Returns:
Type | Description |
---|---|
RegressionResults | The statsmodels results of the OLS regression analysis. |
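To see what the default aggregator does within a bin, the following pure-Python sketch computes a winsorized mean. The function winsorized_mean is a hypothetical stand-in intended to mirror np.mean(winsorize(x, limits=(0.05, 0.05))) with SciPy's default inclusive limits; it is an approximation for illustration, not the code the library runs.

```python
def winsorized_mean(values, limits=(0.05, 0.05)):
    """Mean after clamping the lowest and highest fractions of the data.

    limits gives the fraction of observations to clamp at each tail.
    """
    ordered = sorted(values)
    n = len(ordered)
    k_low = int(limits[0] * n)    # number of observations clamped at the bottom
    k_high = int(limits[1] * n)   # number of observations clamped at the top
    low = ordered[k_low]
    high = ordered[n - k_high - 1]
    clipped = [min(max(v, low), high) for v in values]
    return sum(clipped) / n


# Twenty observations in one bin, with a single extreme value.
bin_values = list(range(1, 20)) + [1000]
plain_mean = sum(bin_values) / len(bin_values)
robust_mean = winsorized_mean(bin_values)
print(plain_mean, robust_mean)
```

With 5% limits on 20 observations, one value is clamped at each tail, so the single outlier no longer dominates the bin's aggregate.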
parse_formula(formula)
staticmethod
Obtains the block-formula mapping and the derived patsy formula describing the regression model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
formula | str | The regression formula. The formula describes the statistical model being analyzed and follows the syntax of the patsy formula. Supports [ ~ + * : ] operators. | required |
Returns:
Name | Type | Description |
---|---|---|
tuple | | A tuple containing |
Examples:
plot(fit=True, diagnostics=True, influence=True, leverage=True)
Plots various diagnostic plots for the regression model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fit | bool | If | True |
diagnostics | bool | If | True |
influence | bool | If | True |
leverage | bool | If | True |