
quantpylib.simulator.operators

quantpylib.simulator.operators is a helper module of operators that handle data alignment when evaluating the functions used by the Gene class in quantpylib.simulator.gene. The semantics of each function, and how it operates on the arguments provided, depend on the class of operators the primitive belongs to. See the list of primitives here.

For example, we may want to use as our trading signal the sum of facebook_sentiment and twitter_sentiment, where the former is sampled every day and the latter only weekly. The formula plus(facebook_sentiment,twitter_sentiment) can have different interpretations depending on how we want to align the data on dates where one or both of the arguments are nan.

📝 last-known basis: this phrase means forward filling the DataFrames and then applying the relevant operation.
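
The two interpretations of plus(facebook_sentiment,twitter_sentiment) can be sketched in plain pandas. The values below are hypothetical and the snippet does not call quantpylib; it only illustrates, under those assumptions, what forward filling before applying the operation looks like.

```python
import numpy as np
import pandas as pd

# Hypothetical data: facebook_sentiment is sampled daily, twitter_sentiment weekly.
idx = pd.date_range(start="2022-01-03", periods=8, freq="D")
facebook_sentiment = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], index=idx)
twitter_sentiment = pd.Series(
    [1.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 2.0], index=idx
)

# Strict interpretation: the sum is nan wherever either argument is nan.
strict_sum = facebook_sentiment + twitter_sentiment

# Last-known basis: forward fill each argument, then apply the operation.
last_known_sum = facebook_sentiment.ffill() + twitter_sentiment.ffill()

print(pd.DataFrame({"strict": strict_sum, "last_known": last_known_sum}))
```

Under the strict interpretation only the two dates with a twitter_sentiment observation carry a signal, while the last-known interpretation produces a value on every date.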

all_idx_op(op, chress, **kwargs)

Apply a specified operation to each row of the provided DataFrame. Missing arguments are applied on a last-known basis.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| op | callable | The operation to be applied to each row of the DataFrame. It must accept a pandas Series as input and return a pandas Series or numpy.ndarray of the same length. | required |
| chress | list | Singleton list containing a DataFrame of time series data. The DataFrame should have columns corresponding to the instance names provided in 'insts'. | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | DataFrame containing the result of the operation applied to each row of the provided DataFrame. |

Examples:

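import numpy as np
import pandas as pd
import scipy.stats

from quantpylib.simulator.operators import all_idx_op

def test_all_idx_op():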
    data = {'inst1': [3,      5, 2,  np.nan, 8,   np.nan, np.nan, 10, np.nan, 7],
            'inst2': [np.nan, 5, 12, 9,      15,  np.nan, 8,      13, 14,     11]
            }
    df = pd.DataFrame(data, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))
    result = all_idx_op(lambda x: scipy.stats.rankdata(x, method="average", nan_policy="omit"), [df])
    expected_data = {'inst1':[1.0,    1.5, 1.0, 1.0, 1.0, 1.0, 1.5, 1.0, 1.0, 1.0],
                    'inst2': [np.nan, 1.5, 2.0, 2.0, 2.0, 2.0, 1.5, 2.0, 2.0, 2.0]}
    expected_result = pd.DataFrame(expected_data, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))

    assert isinstance(result, pd.DataFrame)
    assert result.index.equals(df.index)
    assert result.columns.equals(df.columns)
    pd.testing.assert_frame_equal(result, expected_result)
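
The expected values in the example above can be reproduced, approximately, with plain pandas: forward fill first, then apply the operation row by row. This is a sketch of the semantics only, not quantpylib's implementation; the helper name `rank_row` is hypothetical, and pandas' own `Series.rank` stands in for `scipy.stats.rankdata`.

```python
import numpy as np
import pandas as pd

data = {'inst1': [3,      5, 2,  np.nan, 8,  np.nan, np.nan, 10, np.nan, 7],
        'inst2': [np.nan, 5, 12, 9,      15, np.nan, 8,      13, 14,     11]}
df = pd.DataFrame(data, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))

def rank_row(row):
    # Hypothetical op: takes one row as a Series, returns a Series of the same length.
    # nan entries stay nan, mirroring nan_policy="omit".
    return row.rank(method="average")

# Last-known basis: forward fill every column, then apply the op to each row.
filled = df.ffill()
approx = filled.apply(rank_row, axis=1)
print(approx)
```

On 2022-01-06 both instruments are nan in the raw data, yet both receive a rank because their last known values (8 and 15) are carried forward.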

intersect_idx_op(op, insts, aligner, chress, **kwargs)

Perform an intersection operation on index labels and apply a given operation across multiple arrays or pandas DataFrames.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| op | callable | The operation to be applied across arrays/DataFrames. | required |
| insts | list | List of instance names or identifiers. | required |
| aligner | DataFrame | A DataFrame with a pandas datetime index property to align the result DataFrame with. | required |
| chress | list | list of DataFrames or numbers.Number containing time series data. Each DataFrame should have columns corresponding to the instance names provided in 'insts'. | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | A DataFrame containing the result of the operation applied to the input arrays/DataFrames. |

Examples:

import numpy as np
import pandas as pd

from quantpylib.simulator.operators import intersect_idx_op

def test_intersect_idx_op():
    data1 = {'inst1': [np.nan, 2.0,    3,      np.nan, 5], 
             'inst2': [6,      np.nan, 8,      9,      10]}

    data2 = {'inst1': [1.0,    2,      np.nan, np.nan, 5], 
             'inst2': [11,     12,     13,     np.nan, 15]}
    df1 = pd.DataFrame(data1, index=pd.date_range(start='2022-01-01', periods=5, freq='D'))
    df2 = pd.DataFrame(data2, index=pd.date_range(start='2022-01-01', periods=5, freq='D'))
    aligner = pd.DataFrame(index=pd.date_range(start='2022-01-01', periods=5, freq='D'))
    result = intersect_idx_op(np.add, ['inst1', 'inst2'], aligner, [df1, df2])    
    expected_data = {
        'inst1': [np.nan, 4.0,    np.nan, np.nan, 10], 
        'inst2': [17.0,   np.nan, 21,     np.nan, 25]
    }
    expected_result = pd.DataFrame(
        expected_data, index=pd.date_range(start='2022-01-01', periods=5,freq='D')
    )

    assert isinstance(result, pd.DataFrame)
    assert result.index.equals(aligner.index)
    assert result.columns.equals(expected_result.columns)
    pd.testing.assert_frame_equal(result, expected_result)
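
For intuition, the intersection semantics in the example above match what plain element-wise pandas arithmetic gives: a result exists only on dates where every argument has a value, and nothing is forward filled. The snippet below is a sketch under that reading and does not call quantpylib.

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start='2022-01-01', periods=5, freq='D')
df1 = pd.DataFrame({'inst1': [np.nan, 2.0,    3,      np.nan, 5],
                    'inst2': [6,      np.nan, 8,      9,      10]}, index=idx)
df2 = pd.DataFrame({'inst1': [1.0,    2,      np.nan, np.nan, 5],
                    'inst2': [11,     12,     13,     np.nan, 15]}, index=idx)
aligner = pd.DataFrame(index=idx)
insts = ['inst1', 'inst2']

# No forward filling: a nan in either argument propagates to the result.
approx = np.add(df1[insts], df2[insts]).reindex(aligner.index)
print(approx)
```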

self_idx_op(op, insts, aligner, win, chress, **kwargs)

Apply a specified operation to each instance in the provided time series data, using a rolling window. Missing arguments are ignored.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| op | callable | The operation to be applied to each instance in the time series data. It must accept a pandas Series as input and return a scalar value. | required |
| insts | list of str | List of instance names or identifiers. | required |
| aligner | DataFrame | A DataFrame with a pandas datetime index property to align the results. | required |
| win | int | Window size for the rolling computation. | required |
| chress | list | singleton list of DataFrames containing time series data. Each DataFrame should have columns corresponding to the instance names provided in 'insts'. | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | A DataFrame containing the result of the operation applied to each instance, aligned with the provided aligner. |

Examples:

import numpy as np
import pandas as pd

from quantpylib.simulator.operators import self_idx_op

def test_self_idx_op():
    data = {'inst1': [1, 2,      3, np.nan, 5,  np.nan, np.nan, 8,  np.nan, 10],
            'inst2': [6, np.nan, 8, 9,      10, np.nan, 12,     13, 14,     15]}
    df = pd.DataFrame(data, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))
    aligner = pd.DataFrame(index=pd.date_range(start='2022-01-01', end='2022-01-10', freq='D'))
    result = self_idx_op(np.mean, ['inst1', 'inst2'], aligner, 3, [df])

    expected_data = {
        'inst1': [
            np.nan, np.nan, 2.0,    np.nan,   3.333333, np.nan, np.nan,    5.333333,  np.nan, 7.666667
        ],
        'inst2': [
            np.nan, np.nan, np.nan, 7.666667, 9.0,      np.nan, 10.333333, 11.666667, 13.0,   14.0
        ]
    }
    expected_result = pd.DataFrame(expected_data, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))

    assert isinstance(result, pd.DataFrame)
    assert result.index.equals(aligner.index)
    assert result.columns.equals(expected_result.columns)
    pd.testing.assert_frame_equal(result, expected_result)
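
"Missing arguments are ignored" means the window counts observations rather than calendar rows. The sketch below reproduces the expected values of the example above in plain pandas by dropping nan values per instance before rolling; the helper name `rolling_ignore_missing` is hypothetical and this is not quantpylib's implementation.

```python
import numpy as np
import pandas as pd

def rolling_ignore_missing(op, insts, aligner, win, df):
    # Hypothetical helper: roll over observed values only, then align to the aligner.
    cols = {}
    for inst in insts:
        observed = df[inst].dropna()  # missing rows do not occupy window slots
        cols[inst] = observed.rolling(win).apply(op, raw=False)  # op receives a Series
    return pd.DataFrame(cols).reindex(aligner.index)

data = {'inst1': [1, 2,      3, np.nan, 5,  np.nan, np.nan, 8,  np.nan, 10],
        'inst2': [6, np.nan, 8, 9,      10, np.nan, 12,     13, 14,     15]}
idx = pd.date_range(start='2022-01-01', periods=10, freq='D')
df = pd.DataFrame(data, index=idx)
aligner = pd.DataFrame(index=idx)

approx = rolling_ignore_missing(np.mean, ['inst1', 'inst2'], aligner, 3, df)
print(approx)
```

For inst1 the value on 2022-01-05 is the mean of 2, 3 and 5, the three most recent observations, even though 2022-01-04 is missing.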

self_idx_op2(op, chress, **kwargs)

Apply a specified unary transformation to the provided time series data.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| op | callable | The operation to be applied to the provided time series data. It must accept a pandas DataFrame as input and return a pandas DataFrame. | required |
| chress | list | singleton list of DataFrames containing time series data. Each DataFrame should have columns corresponding to the instance names provided in 'insts'. | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | Result of applying the operation to the provided time series data. The output DataFrame will have the same index and columns as the input DataFrame. |

Examples:

import numpy as np
import pandas as pd

from quantpylib.simulator.operators import self_idx_op2

def test_self_idx_op2():
    data = {'inst1': [1, 2,      3, np.nan, 5,  np.nan, np.nan, 8,  np.nan, 10],
            'inst2': [6, np.nan, 8, 9,      10, np.nan, 12,     13, 14,     15]}
    df = pd.DataFrame(data, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))
    result = self_idx_op2(lambda x: -1 * x, [df])

    expected_data = {'inst1':[-1, -2,     -3, np.nan, -5,  np.nan, np.nan, -8,  np.nan, -10],
                    'inst2': [-6, np.nan, -8, -9,     -10, np.nan, -12,    -13, -14,    -15]
                }
    expected_result = pd.DataFrame(
        expected_data, index=pd.date_range(start='2022-01-01', periods=10, freq='D')
    )

    assert isinstance(result, pd.DataFrame)
    assert result.index.equals(df.index)
    assert result.columns.equals(df.columns)
    pd.testing.assert_frame_equal(result, expected_result)

slow_idx_op(op, insts, aligner, win, chress, **kwargs)

Apply a specified operation to aligned time series data, potentially handling unequal indices. For each inst, the reference index is that of the argument with the smallest number of data points. Missing arguments are applied on a last-known basis.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| op | callable | The operation to be applied across time series data. It must accept numpy arrays as input. | required |
| insts | list of str | List of instance names or identifiers. | required |
| aligner | DataFrame | A DataFrame with a pandas datetime index property to determine the common index for alignment. | required |
| win | int | Window size for rolling computation. Used for operations that involve a rolling window, such as rolling correlation. | required |
| chress | list | list of DataFrames containing time series data. Each DataFrame should have columns corresponding to the instance names provided in 'insts'. | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | A DataFrame containing the result of the operation applied to the aligned time series data. |

Examples:

import numpy as np
import pandas as pd

from quantpylib.simulator.operators import slow_idx_op

def test_slow_idx_op():
    data1 = {
        'inst1': [np.nan, 2,      3,      np.nan, 5,  6,      np.nan, 8,  9,      10], #7 data points
        'inst2': [6,      np.nan, 8,      9,      10, np.nan, 12,     13, np.nan, 15] #7 data points
    }
    data2 = {
        'inst1': [1,      2,      np.nan, 4,      5,  6,      7,      8, 9,       10], #9 data points
        'inst2': [11,     12,     13,     np.nan, 15, 16,     17,     18, np.nan, 20] #8 data points
    }
    df1 = pd.DataFrame(data1, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))
    df2 = pd.DataFrame(data2, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))
    aligner = pd.DataFrame(index=pd.date_range(start='2022-01-01', end='2022-01-10', freq='D'))
    result = slow_idx_op(lambda a,b:np.add(a,b)[0], ['inst1', 'inst2'], aligner, 1, [df1, df2])
    expected_data = {
        'inst1': [np.nan, 4.0,    5.0,  np.nan, 10.0, 12.0,   np.nan, 16.0, 18.0,   20.0], #min(7,9)=7, choose df1.inst1.index as reference
        'inst2': [17.0,   np.nan, 21.0, 22.0,   25.0, np.nan, 29.0,   31.0, np.nan, 35.0] #min(7,8)=7, choose df1.inst2.index as reference
    }
    expected_result = pd.DataFrame(expected_data, index=pd.date_range(start='2022-01-01', periods=10, freq='D'))
    assert isinstance(result, pd.DataFrame)
    assert result.index.equals(aligner.index)
    assert result.columns.equals(expected_result.columns)
    pd.testing.assert_frame_equal(result, expected_result)
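
The reference-index rule can be sketched in plain pandas as follows. This is one reading of the description that happens to reproduce the expected values of the example above; it is not quantpylib's implementation, the helper name `last_known_on_sparsest` is hypothetical, and the way the window is sliced is an assumption.

```python
import numpy as np
import pandas as pd

def last_known_on_sparsest(op, insts, aligner, win, frames):
    # Hypothetical helper illustrating the alignment rule for each instance.
    cols = {}
    for inst in insts:
        series = [frame[inst] for frame in frames]
        # Reference index: dates observed by the argument with the fewest data points.
        reference = min(series, key=lambda s: s.count()).dropna().index
        # Last-known basis: forward fill every argument, then sample on the reference.
        sampled = [s.ffill().reindex(reference).to_numpy() for s in series]
        # Assumed windowing: op sees the trailing `win` samples of every argument.
        values = [
            op(*(arr[end - win:end] for arr in sampled)) if end >= win else np.nan
            for end in range(1, len(reference) + 1)
        ]
        cols[inst] = pd.Series(values, index=reference, dtype=float)
    return pd.DataFrame(cols).reindex(aligner.index)

idx = pd.date_range(start='2022-01-01', periods=10, freq='D')
nan = np.nan
df1 = pd.DataFrame({'inst1': [nan, 2, 3, nan, 5, 6, nan, 8, 9, 10],
                    'inst2': [6, nan, 8, 9, 10, nan, 12, 13, nan, 15]}, index=idx)
df2 = pd.DataFrame({'inst1': [1, 2, nan, 4, 5, 6, 7, 8, 9, 10],
                    'inst2': [11, 12, 13, nan, 15, 16, 17, 18, nan, 20]}, index=idx)
aligner = pd.DataFrame(index=idx)

approx = last_known_on_sparsest(
    lambda a, b: np.add(a, b)[0], ['inst1', 'inst2'], aligner, 1, [df1, df2]
)
print(approx)
```

On 2022-01-03, df2.inst1 is missing, so its last known value (2) is added to df1.inst1 (3) to give 5. Dates absent from the reference index, such as 2022-01-04 for inst1, stay nan.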

union_idx_op(op, insts, aligner, chress, **kwargs)

Perform a union operation on index labels and apply a given operation across multiple arrays or pandas DataFrames. Missing arguments are applied on a last-known basis.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| op | callable | The operation to be applied across arrays/DataFrames. | required |
| insts | list | List of instance names or identifiers. | required |
| aligner | DataFrame | A DataFrame with a pandas datetime index property to align the result DataFrame with. | required |
| chress | list | list of DataFrames or numbers.Number containing time series data. Each DataFrame should have columns corresponding to the instance names provided in 'insts'. | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | A DataFrame containing the result of the operation applied to the input arrays/DataFrames. |

Examples:

import numpy as np
import pandas as pd

from quantpylib.simulator.operators import union_idx_op

def test_union_idx_op():
    data1 = {'inst1': [np.nan, 2.0,    3,      np.nan, 5], 
             'inst2': [6,      np.nan, 8,      9,      10]}
    data2 = {'inst1': [1.0,    2,      np.nan, np.nan, 5], 
             'inst2': [11,     12,     13,     np.nan, 15]}
    df1 = pd.DataFrame(data1, index=pd.date_range(start='2022-01-01', periods=5, freq='D'))
    df2 = pd.DataFrame(data2, index=pd.date_range(start='2022-01-01', periods=5, freq='D'))
    aligner = pd.DataFrame(index=pd.date_range(start='2022-01-01', periods=5, freq='D'))

    result = union_idx_op(np.add, ['inst1', 'inst2'], aligner, [df1, df2])

    expected_data = {
        'inst1': [np.nan, 4.0, 5,  np.nan, 10], 
        'inst2': [17.0,   18,  21, 22,     25]
    }
    expected_result = pd.DataFrame(expected_data, index=pd.date_range(start='2022-01-01', periods=5, freq='D'))

    assert isinstance(result, pd.DataFrame)
    assert result.index.equals(aligner.index)
    assert result.columns.equals(expected_result.columns)
    pd.testing.assert_frame_equal(result, expected_result)
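
The union semantics in the example above can be approximated in plain pandas: a result is produced on every date where at least one argument has an observation, with the other arguments taken on a last-known basis. This sketch reproduces the expected values of the example but is not quantpylib's implementation.

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start='2022-01-01', periods=5, freq='D')
df1 = pd.DataFrame({'inst1': [np.nan, 2.0,    3,      np.nan, 5],
                    'inst2': [6,      np.nan, 8,      9,      10]}, index=idx)
df2 = pd.DataFrame({'inst1': [1.0,    2,      np.nan, np.nan, 5],
                    'inst2': [11,     12,     13,     np.nan, 15]}, index=idx)
aligner = pd.DataFrame(index=idx)

# Union of observed dates: keep a date if either argument has a value there.
observed = df1.notna() | df2.notna()

# Forward fill both arguments, apply the op, then mask dates outside the union.
approx = np.add(df1.ffill(), df2.ffill()).where(observed).reindex(aligner.index)
print(approx)
```

inst1 stays nan on 2022-01-01 because df1 has no earlier value to carry forward, and on 2022-01-04 because neither argument is observed that day.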