quantpylib.simulator.operators
The quantpylib.simulator.operators module is a helper module of operators used for data alignment when evaluating the functions of the Gene class in quantpylib.simulator.gene. The semantics of each function, and how it operates on its arguments, depend on the class of operators the primitive belongs to. See the list of primitives here.
For example, we may want to use as our trading signal the sum of facebook_sentiment and twitter_sentiment, where the former is sampled every day and the latter only weekly. The formula plus(facebook_sentiment,twitter_sentiment) can have different interpretations depending on how we align the data at times when one or both of the arguments is nan.
last-known basis: missing values in each DataFrame are forward filled before the relevant operation is applied.
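The two interpretations can be illustrated with plain pandas. This is a minimal sketch on made-up data, not the library's implementation: the values, dates, and variable names below are assumptions chosen only to show how the result differs when nan is propagated versus when missing values are forward filled.

```python
import numpy as np
import pandas as pd

# Hypothetical data: facebook_sentiment is daily, twitter_sentiment is weekly.
idx = pd.date_range("2022-01-03", periods=10, freq="D")
facebook_sentiment = pd.Series(np.arange(10, dtype=float), index=idx)
twitter_sentiment = pd.Series(np.nan, index=idx)
twitter_sentiment.iloc[[0, 7]] = [100.0, 200.0]  # one observation per week

# Interpretation 1: add only where both arguments are observed.
strict_sum = facebook_sentiment + twitter_sentiment

# Interpretation 2 (last-known basis): forward fill, then add.
last_known_sum = facebook_sentiment.ffill() + twitter_sentiment.ffill()

print(strict_sum.notna().sum())      # number of dates with a signal
print(last_known_sum.notna().sum())
```

Under the first interpretation the signal exists only on the two dates where the weekly series has a value; under the second it exists on every date.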
all_idx_op(op, chress, **kwargs)
Apply a specified operation to each row of the provided DataFrame. Missing arguments are filled on a last-known basis.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
op | callable | The operation to be applied to each row of the DataFrame. It must accept a pandas Series as input and return a pandas Series/numpy.ndarray of same length. | required |
chress | list | singleton list of DataFrames containing time series data. Each DataFrame should have columns corresponding to the instance names provided in 'insts'. | required |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame containing the result of the operation applied to each row of the provided DataFrame. |
Examples:
import numpy as np
import pandas as pd
import scipy.stats
from quantpylib.simulator.operators import all_idx_op

def test_all_idx_op():
    data = {'inst1': [3, 5, 2, np.nan, 8, np.nan, np.nan, 10, np.nan, 7],
            'inst2': [np.nan, 5, 12, 9, 15, np.nan, 8, 13, 14, 11]
    }
    df = pd.DataFrame(data, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))
    # Rank the instruments within each row; gaps are filled with the last known value first.
    result = all_idx_op(lambda x: scipy.stats.rankdata(x, method="average", nan_policy="omit"), [df])
    expected_data = {'inst1': [1.0, 1.5, 1.0, 1.0, 1.0, 1.0, 1.5, 1.0, 1.0, 1.0],
                     'inst2': [np.nan, 1.5, 2.0, 2.0, 2.0, 2.0, 1.5, 2.0, 2.0, 2.0]}
    expected_result = pd.DataFrame(expected_data, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))
    assert isinstance(result, pd.DataFrame)
    assert result.index.equals(df.index)
    assert result.columns.equals(df.columns)
    pd.testing.assert_frame_equal(result, expected_result)
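The behaviour described for all_idx_op (forward fill, then apply the operation row by row) can be approximated with plain pandas. This is only a sketch of the documented semantics and not the library's code; it uses Series.rank in place of scipy.stats.rankdata so that it has no SciPy dependency.

```python
import numpy as np
import pandas as pd

data = {'inst1': [3, 5, 2, np.nan, 8, np.nan, np.nan, 10, np.nan, 7],
        'inst2': [np.nan, 5, 12, 9, 15, np.nan, 8, 13, 14, 11]}
df = pd.DataFrame(data, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))

# Last-known basis: fill gaps with the most recent observation.
filled = df.ffill()

# Apply the operation to each row; here, an average-method rank across instruments.
# Values that are still nan after the forward fill stay nan in the output.
sketch = filled.apply(lambda row: row.rank(method="average"), axis=1)
print(sketch)
```

On this data the sketch gives the same values as expected_data in the test above.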
intersect_idx_op(op, insts, aligner, chress, **kwargs)
Perform an intersection operation on index labels and apply a given operation across multiple arrays or pandas DataFrames.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
op | callable | The operation to be applied across arrays/DataFrames. | required |
insts | list | List of instance names or identifiers. | required |
aligner | DataFrame | A DataFrame with a pandas datetime index property to align the result DataFrame with. | required |
chress | list | list of DataFrames or numbers.Number containing time series data. Each DataFrame should have columns corresponding to the instance names provided in 'insts'. | required |
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame containing the result of the operation applied to the input arrays/DataFrames. |
Examples:
import numpy as np
import pandas as pd
from quantpylib.simulator.operators import intersect_idx_op

def test_intersect_idx_op():
    data1 = {'inst1': [np.nan, 2.0, 3, np.nan, 5],
             'inst2': [6, np.nan, 8, 9, 10]}
    data2 = {'inst1': [1.0, 2, np.nan, np.nan, 5],
             'inst2': [11, 12, 13, np.nan, 15]}
    df1 = pd.DataFrame(data1, index=pd.date_range(start='2022-01-01', periods=5, freq='D'))
    df2 = pd.DataFrame(data2, index=pd.date_range(start='2022-01-01', periods=5, freq='D'))
    aligner = pd.DataFrame(index=pd.date_range(start='2022-01-01', periods=5, freq='D'))
    result = intersect_idx_op(np.add, ['inst1', 'inst2'], aligner, [df1, df2])
    # A value is produced only on dates where both arguments are observed.
    expected_data = {
        'inst1': [np.nan, 4.0, np.nan, np.nan, 10],
        'inst2': [17.0, np.nan, 21, np.nan, 25]
    }
    expected_result = pd.DataFrame(
        expected_data, index=pd.date_range(start='2022-01-01', periods=5, freq='D')
    )
    assert isinstance(result, pd.DataFrame)
    assert result.index.equals(aligner.index)
    assert result.columns.equals(expected_result.columns)
    pd.testing.assert_frame_equal(result, expected_result)
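The intersection semantics can be mimicked in plain pandas by keeping only the cells where every argument is observed. The sketch below is an illustration under that assumption, not the library's implementation.

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start='2022-01-01', periods=5, freq='D')
df1 = pd.DataFrame({'inst1': [np.nan, 2.0, 3, np.nan, 5],
                    'inst2': [6, np.nan, 8, 9, 10]}, index=idx)
df2 = pd.DataFrame({'inst1': [1.0, 2, np.nan, np.nan, 5],
                    'inst2': [11, 12, 13, np.nan, 15]}, index=idx)

# Intersection: a cell is valid only if both arguments are observed there.
both = df1.notna() & df2.notna()

# Apply the operation, then blank out cells outside the intersection.
sketch = np.add(df1, df2).where(both)
print(sketch)
```

For np.add the mask is redundant because nan already propagates through addition; it matters for operations that would otherwise return a number when one input is missing.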
self_idx_op(op, insts, aligner, win, chress, **kwargs)
Apply a specified operation to each instance in the provided time series data, using a rolling window. Missing arguments are ignored.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
op | callable | The operation to be applied to each instance in the time series data. It must accept a pandas Series as input and return a scalar value. | required |
insts | list of str | List of instance names or identifiers. | required |
aligner | DataFrame | A DataFrame with a pandas datetime index property to align the results. | required |
win | int | Window size for the rolling computation. | required |
chress | list | singleton list of DataFrames containing time series data. Each DataFrame should have columns corresponding to the instance names provided in 'insts'. | required |
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame containing the result of the operation applied to each instance, aligned with the provided aligner. |
Examples:
import numpy as np
import pandas as pd
from quantpylib.simulator.operators import self_idx_op

def test_self_idx_op():
    data = {'inst1': [1, 2, 3, np.nan, 5, np.nan, np.nan, 8, np.nan, 10],
            'inst2': [6, np.nan, 8, 9, 10, np.nan, 12, 13, 14, 15]}
    df = pd.DataFrame(data, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))
    aligner = pd.DataFrame(index=pd.date_range(start='2022-01-01', end='2022-01-10', freq='D'))
    # Rolling mean over the last 3 observed values of each instrument; nan rows are skipped.
    result = self_idx_op(np.mean, ['inst1', 'inst2'], aligner, 3, [df])
    expected_data = {
        'inst1': [
            np.nan, np.nan, 2.0, np.nan, 3.333333, np.nan, np.nan, 5.333333, np.nan, 7.666667
        ],
        'inst2': [
            np.nan, np.nan, np.nan, 7.666667, 9.0, np.nan, 10.333333, 11.666667, 13.0, 14.0
        ]
    }
    expected_result = pd.DataFrame(expected_data, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))
    assert isinstance(result, pd.DataFrame)
    assert result.index.equals(aligner.index)
    assert result.columns.equals(expected_result.columns)
    pd.testing.assert_frame_equal(result, expected_result)
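"Missing arguments are ignored" means the window counts observed values, not calendar rows. A rough pandas equivalent, offered as a sketch of the documented behaviour only, is to drop nan per instrument, roll over what remains, and reindex back to the aligner. The helper name rolling_ignore_missing is hypothetical.

```python
import numpy as np
import pandas as pd

def rolling_ignore_missing(df, op, win, index):
    """Hypothetical helper: roll `op` over the last `win` observed values per column."""
    cols = {}
    for name in df.columns:
        observed = df[name].dropna()                # ignore missing rows
        rolled = observed.rolling(win).apply(op)    # op receives a pandas Series
        cols[name] = rolled.reindex(index)          # unobserved dates become nan
    return pd.DataFrame(cols, index=index)

idx = pd.date_range(start='2022-01-01', periods=10, freq='D')
df = pd.DataFrame({'inst1': [1, 2, 3, np.nan, 5, np.nan, np.nan, 8, np.nan, 10],
                   'inst2': [6, np.nan, 8, 9, 10, np.nan, 12, 13, 14, 15]}, index=idx)

sketch = rolling_ignore_missing(df, np.mean, 3, idx)
print(sketch)
```

For inst1, the value on 2022-01-05 is the mean of 2, 3 and 5: the nan on 2022-01-04 does not occupy a slot in the window.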
self_idx_op2(op, chress, **kwargs)
Apply a specified unary transformation to the provided time series data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
op | callable | The operation to be applied to the provided time series data. It must accept a pandas DataFrame as input and return a pandas DataFrame. | required |
chress | list | singleton list of DataFrames containing time series data. Each DataFrame should have columns corresponding to the instance names provided in 'insts'. | required |
Returns:
Type | Description |
---|---|
DataFrame | Result of applying the operation to the provided time series data. The output DataFrame will have the same index and columns as the input DataFrame. |
Examples:
import numpy as np
import pandas as pd
from quantpylib.simulator.operators import self_idx_op2

def test_self_idx_op2():
    data = {'inst1': [1, 2, 3, np.nan, 5, np.nan, np.nan, 8, np.nan, 10],
            'inst2': [6, np.nan, 8, 9, 10, np.nan, 12, 13, 14, 15]}
    df = pd.DataFrame(data, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))
    # Unary transformation applied to the whole DataFrame; nan cells stay nan.
    result = self_idx_op2(lambda x: -1 * x, [df])
    expected_data = {'inst1': [-1, -2, -3, np.nan, -5, np.nan, np.nan, -8, np.nan, -10],
                     'inst2': [-6, np.nan, -8, -9, -10, np.nan, -12, -13, -14, -15]
    }
    expected_result = pd.DataFrame(
        expected_data, index=pd.date_range(start='2022-01-01', periods=10, freq='D')
    )
    assert isinstance(result, pd.DataFrame)
    assert result.index.equals(df.index)
    assert result.columns.equals(df.columns)
    pd.testing.assert_frame_equal(result, expected_result)
slow_idx_op(op, insts, aligner, win, chress, **kwargs)
Apply a specified operation to aligned time series data, potentially handling unequal indices. For each inst, the reference index is that of the argument with the fewest data points. Missing arguments are filled on a last-known basis.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
op | callable | The operation to be applied across time series data. It must accept numpy arrays as input. | required |
insts | list of str | List of instance names or identifiers. | required |
aligner | DataFrame | A DataFrame with a pandas datetime index property to determine the common index for alignment. | required |
win | int | Window size for rolling computation. Used for operations that involve a rolling window, such as rolling correlation. | required |
chress | list | list of DataFrames containing time series data. Each DataFrame should have columns corresponding to the instance names provided in 'insts'. | required |
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame containing the result of the operation applied to the aligned time series data. |
Examples:
import numpy as np
import pandas as pd
from quantpylib.simulator.operators import slow_idx_op

def test_slow_idx_op():
    data1 = {
        'inst1': [np.nan, 2, 3, np.nan, 5, 6, np.nan, 8, 9, 10],  # 7 data points
        'inst2': [6, np.nan, 8, 9, 10, np.nan, 12, 13, np.nan, 15]  # 7 data points
    }
    data2 = {
        'inst1': [1, 2, np.nan, 4, 5, 6, 7, 8, 9, 10],  # 9 data points
        'inst2': [11, 12, 13, np.nan, 15, 16, 17, 18, np.nan, 20]  # 8 data points
    }
    df1 = pd.DataFrame(data1, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))
    df2 = pd.DataFrame(data2, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))
    aligner = pd.DataFrame(index=pd.date_range(start='2022-01-01', end='2022-01-10', freq='D'))
    # With win=1 each argument is a length-1 array, so [0] extracts the scalar sum.
    result = slow_idx_op(lambda a, b: np.add(a, b)[0], ['inst1', 'inst2'], aligner, 1, [df1, df2])
    expected_data = {
        'inst1': [np.nan, 4.0, 5.0, np.nan, 10.0, 12.0, np.nan, 16.0, 18.0, 20.0],  # min(7,9)=7, choose df1.inst1.index as reference
        'inst2': [17.0, np.nan, 21.0, 22.0, 25.0, np.nan, 29.0, 31.0, np.nan, 35.0]  # min(7,8)=7, choose df1.inst2.index as reference
    }
    expected_result = pd.DataFrame(expected_data, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))
    assert isinstance(result, pd.DataFrame)
    assert result.index.equals(aligner.index)
    assert result.columns.equals(expected_result.columns)
    pd.testing.assert_frame_equal(result, expected_result)
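The reference-index rule can be sketched in plain pandas as follows. This is one reading of the description above, written to reproduce the test's expected values, and should not be taken as the library's implementation: the helper name slow_sketch and the windowing details for win greater than 1 are assumptions.

```python
import numpy as np
import pandas as pd

def slow_sketch(op, win, frames):
    """Hypothetical helper illustrating reference-index alignment per instrument."""
    out = {}
    for name in frames[0].columns:
        series = [f[name] for f in frames]
        # Reference index: dates observed by the argument with the fewest data points.
        ref = min(series, key=lambda s: s.count()).dropna().index
        # Every argument is read on a last-known basis at the reference dates.
        arrays = [s.ffill().reindex(ref).to_numpy() for s in series]
        values = [
            op(*(a[i - win + 1 : i + 1] for a in arrays)) if i >= win - 1 else np.nan
            for i in range(len(ref))
        ]
        out[name] = pd.Series(values, index=ref, dtype=float)
    # Dates outside an instrument's reference index become nan.
    return pd.DataFrame(out).reindex(frames[0].index)

idx = pd.date_range(start='2022-01-01', periods=10, freq='D')
df1 = pd.DataFrame({'inst1': [np.nan, 2, 3, np.nan, 5, 6, np.nan, 8, 9, 10],
                    'inst2': [6, np.nan, 8, 9, 10, np.nan, 12, 13, np.nan, 15]}, index=idx)
df2 = pd.DataFrame({'inst1': [1, 2, np.nan, 4, 5, 6, 7, 8, 9, 10],
                    'inst2': [11, 12, 13, np.nan, 15, 16, 17, 18, np.nan, 20]}, index=idx)

sketch = slow_sketch(lambda a, b: np.add(a, b)[0], 1, [df1, df2])
print(sketch)
```

On 2022-01-03, inst1 is 3 in df1 and missing in df2, so the sketch uses df2's last known value of 2 and returns 5, as in the test.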
union_idx_op(op, insts, aligner, chress, **kwargs)
Perform a union operation on index labels and apply a given operation across multiple arrays or pandas DataFrames. Missing arguments are filled on a last-known basis.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
op | callable | The operation to be applied across arrays/DataFrames. | required |
insts | list | List of instance names or identifiers. | required |
aligner | DataFrame | A DataFrame with a pandas datetime index property to align the result DataFrame with. | required |
chress | list | list of DataFrames or numbers.Number containing time series data. Each DataFrame should have columns corresponding to the instance names provided in 'insts'. | required |
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame containing the result of the operation applied to the input arrays/DataFrames. |
Examples:
import numpy as np
import pandas as pd
from quantpylib.simulator.operators import union_idx_op

def test_union_idx_op():
    data1 = {'inst1': [np.nan, 2.0, 3, np.nan, 5],
             'inst2': [6, np.nan, 8, 9, 10]}
    data2 = {'inst1': [1.0, 2, np.nan, np.nan, 5],
             'inst2': [11, 12, 13, np.nan, 15]}
    df1 = pd.DataFrame(data1, index=pd.date_range(start='2022-01-01', periods=5, freq='D'))
    df2 = pd.DataFrame(data2, index=pd.date_range(start='2022-01-01', periods=5, freq='D'))
    aligner = pd.DataFrame(index=pd.date_range(start='2022-01-01', periods=5, freq='D'))
    result = union_idx_op(np.add, ['inst1', 'inst2'], aligner, [df1, df2])
    # A value is produced on dates where at least one argument is observed;
    # the other argument is taken on a last-known basis.
    expected_data = {
        'inst1': [np.nan, 4.0, 5, np.nan, 10],
        'inst2': [17.0, 18, 21, 22, 25]
    }
    expected_result = pd.DataFrame(expected_data, index=pd.date_range(start='2022-01-01', periods=5, freq='D'))
    assert isinstance(result, pd.DataFrame)
    assert result.index.equals(aligner.index)
    assert result.columns.equals(expected_result.columns)
    pd.testing.assert_frame_equal(result, expected_result)
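The union semantics can be approximated in plain pandas by forward filling each argument, applying the operation, and keeping only the cells where at least one argument was actually observed. This sketch reproduces the expected values in the test above and is an illustration, not the library's implementation.

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start='2022-01-01', periods=5, freq='D')
df1 = pd.DataFrame({'inst1': [np.nan, 2.0, 3, np.nan, 5],
                    'inst2': [6, np.nan, 8, 9, 10]}, index=idx)
df2 = pd.DataFrame({'inst1': [1.0, 2, np.nan, np.nan, 5],
                    'inst2': [11, 12, 13, np.nan, 15]}, index=idx)

# Union: a cell is valid if at least one argument is observed there.
either = df1.notna() | df2.notna()

# Last-known basis: forward fill both arguments before applying the operation.
sketch = np.add(df1.ffill(), df2.ffill()).where(either)
print(sketch)
```

inst1 on 2022-01-04 stays nan because neither argument is observed on that date, and on 2022-01-01 because df1 has no earlier value to carry forward.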