admin

Guidelines to re-use model results for backtests

Programming


When developing a trading algorithm, we may have trained a model for one financial instrument and want to apply it to other sectors or markets. In such cases, re-using a pre-trained model in strategy backtests can save considerable time by avoiding repeated model calibration. However, if we handle it improperly, we are likely to fall into the trap of "Look Ahead Bias"!


What is "Look Ahead Bias"?

Look-ahead bias refers to making trade decisions based on data/information that would only be available in the future.

For backtesting, it is crucial that we only use information that would have been available at the time of the trade. For example, trading on a yearly earnings figure that was only released a quarter after the trade date biases the results in favor of the desired outcome, and the reported strategy performance becomes unreliable.

Here are some considerations for identifying a potential look-ahead bias.

  • When was the data released?
  • At what time was the data observable and available to us?
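
The two questions above amount to a point-in-time filter: at each decision time, only records whose release time has already passed may be used. A minimal Python sketch, assuming hypothetical earnings records with a released_at field (the data and the latest_known helper are illustrative only, not part of the ALGOGENE API):

```python
from datetime import datetime

# Hypothetical records: the period each figure describes and when it was published.
earnings = [
    {"period_end": datetime(2019, 12, 31), "released_at": datetime(2020, 3, 15), "eps": 1.20},
    {"period_end": datetime(2020, 12, 31), "released_at": datetime(2021, 3, 15), "eps": 1.45},
]


def latest_known(records, as_of):
    """Return the most recently released record that was public at `as_of`, else None."""
    visible = [r for r in records if r["released_at"] <= as_of]
    return max(visible, key=lambda r: r["released_at"]) if visible else None


# On 2021-01-10 the FY2020 figure is not yet published, so only FY2019 may be used.
trade_time = datetime(2021, 1, 10)
known = latest_known(earnings, trade_time)
print(known["eps"])  # 1.2
```

Filtering on the release time rather than on the period end is the point here: the FY2020 figure describes a period that has already ended on the trade date, but it was not observable yet.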

Load/Dump model on ALGOGENE Web IDE

As an event-streaming backtest engine, ALGOGENE largely eliminates look-ahead bias by feeding data into our strategy script in chronological order, as if replaying a recording. On the other hand, ALGOGENE also lets us save and re-use customized models for backtesting on the Web IDE. This flexibility, however, can reintroduce look-ahead bias if we use it carelessly. The questions above therefore serve as a guide for judging whether a re-used model is logically sound.

Now, let's see how to load/dump a model on the ALGOGENE IDE. Upon account registration on the platform, each user is automatically assigned a partition in ALGOGENE's cloud environment. All we need to do is save the model to our assigned cloud directory (i.e. self.evt.path_lib) and then retrieve it in another backtest process. The following example shows how to do this programmatically; note that it may not be logically sound as a strategy backtest.

Suppose we want to find the mathematical relationship between 3 financial instruments, defined as Y = f(X1, X2). We use the 'keras' library (https://www.tensorflow.org/guide/keras/save_and_serialize) to fit the model on the past 100 daily observations. In the first example below, we create a 'model_1' directory and dump the results there. In the second example, 'model_1' is retrieved for another backtest.


Source Code

Saving a model: see line #51 of the script below (the self.model.save call)

from AlgoAPI import AlgoAPIUtil, AlgoAPI_Backtest
from datetime import datetime, timedelta
import tensorflow as tf
from tensorflow import keras


def get_model():
    # Create a simple model with 2 input variables
    inputs = keras.Input(shape=(2,))
    outputs = keras.layers.Dense(1)(inputs)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

class AlgoEvent:
    def __init__(self):
        self.lasttime = datetime(2000,1,1)
        self.isSaved = False
        self.numOfObs = 100 
        self.arr_Y, self.arr_X1, self.arr_X2 = [], [], []
        self.model = get_model()

    def start(self, mEvt):
        # get my selected financial instruments
        self.myinstrument_Y = mEvt['subscribeList'][0]
        self.myinstrument_X1 = mEvt['subscribeList'][1]
        self.myinstrument_X2 = mEvt['subscribeList'][2]

        # start backtest
        self.evt = AlgoAPI_Backtest.AlgoEvtHandler(self, mEvt)
        self.evt.start()

    def on_bulkdatafeed(self, isSync, bd, ab):
        if isSync and not self.isSaved:
            # get new day price
            if bd[self.myinstrument_Y]['timestamp'] >= self.lasttime + timedelta(hours=24):
                self.lasttime = bd[self.myinstrument_Y]['timestamp']

                # append observation
                self.arr_Y.append(bd[self.myinstrument_Y]['lastPrice'])
                self.arr_X1.append(bd[self.myinstrument_X1]['lastPrice'])
                self.arr_X2.append(bd[self.myinstrument_X2]['lastPrice'])

                if len(self.arr_Y) >= self.numOfObs:
                    # Train the model on the collected observations (inputs: X1, X2; target: Y)
                    train_input = [[self.arr_X1[i], self.arr_X2[i]] for i in range(0,self.numOfObs)]
                    train_target = [[self.arr_Y[i]] for i in range(0,self.numOfObs)]
                    self.model.fit(train_input, train_target)

                    # save the model
                    self.model.save(self.evt.path_lib+"model_1")

                    self.isSaved = True

    def on_marketdatafeed(self, md, ab):
        pass

    def on_orderfeed(self, of):
        pass

    def on_newsdatafeed(self, nd):
        pass

    def on_dailyPLfeed(self, pl):
        pass

    def on_openPositionfeed(self, op, oo, uo):
        pass

Loading a model: see line #24 of the script below (the keras.models.load_model call)

from AlgoAPI import AlgoAPIUtil, AlgoAPI_Backtest
from datetime import datetime, timedelta
import tensorflow as tf
from tensorflow import keras


def get_model():
    # Create a simple model with 2 variables
    inputs = keras.Input(shape=(2,))
    outputs = keras.layers.Dense(1)(inputs)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

class AlgoEvent:
    def __init__(self):
        pass

    def start(self, mEvt):
        # start backtest
        self.evt = AlgoAPI_Backtest.AlgoEvtHandler(self, mEvt)

        # load my model_1
        self.model = keras.models.load_model(self.evt.path_lib+"model_1")

        self.evt.start()

    def on_marketdatafeed(self, md, ab):
        # use self.model for new market data feed ...
        pass
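
One way to check that a re-used model is not applied to the same period it was trained on is to store the training cut-off time next to the model and compare it with each incoming timestamp. The sketch below is only an illustration using the Python standard library: the metadata file name (model_1_meta.json) and the helper functions are hypothetical, and a temporary directory stands in for self.evt.path_lib.

```python
import json
import os
import tempfile
from datetime import datetime


def save_training_cutoff(path_lib, model_name, last_train_time):
    """Write the timestamp of the last training observation next to the model."""
    meta_path = os.path.join(path_lib, model_name + "_meta.json")
    with open(meta_path, "w") as f:
        json.dump({"trained_until": last_train_time.isoformat()}, f)
    return meta_path


def load_training_cutoff(path_lib, model_name):
    """Read back the training cut-off saved by save_training_cutoff."""
    meta_path = os.path.join(path_lib, model_name + "_meta.json")
    with open(meta_path) as f:
        return datetime.fromisoformat(json.load(f)["trained_until"])


def is_out_of_sample(feed_time, trained_until):
    """True only if the data point arrives after the model's training window."""
    return feed_time > trained_until


# Stand-in for self.evt.path_lib (assumption: any writable directory works the same way).
demo_dir = tempfile.mkdtemp()
save_training_cutoff(demo_dir, "model_1", datetime(2020, 6, 30))
cutoff = load_training_cutoff(demo_dir, "model_1")

print(is_out_of_sample(datetime(2020, 3, 1), cutoff))  # False: inside the training window
print(is_out_of_sample(datetime(2021, 1, 4), cutoff))  # True: after the training window
```

In a backtest script, a check like is_out_of_sample could be applied to the timestamp of each data feed before the loaded model is used for a trade decision.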

 
admin
Since the system release in 2021.04, users can interact directly with their assigned cloud directory. 
The self.evt.path_lib mentioned above is equivalent to /lib on the cloud server. 
From [My History] > [Custom File Viewer], you can directly upload, download, edit, and delete your data and models. 

[Screenshot: Custom File Viewer]

 
Bee Bee
Original Posted by - admin: Since the system release in 2021.04, users can interact directly with their assigned cloud directory. 
The self.evt.path_lib mentioned above is equivalent to /lib on the cloud server. 
From [My History] > [Custom File Viewer], you can directly upload, download, edit, and delete your data and models. 


Hi, I have trained a model on my local machine. Can I include it in a backtest? 
 
admin
Original Posted by - Bee Bee: Hi, I have trained a model on my local machine. Can I include it in a backtest? 
Yes. First upload the trained model and library to the cloud directory /lib, say as /lib/m1.
Then, in the backtest script, the model can be imported or loaded from self.evt.path_lib+"m1"
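
As a rough illustration of that round trip, the sketch below dumps a stand-in model with Python's pickle module and loads it back from a path built the same way as self.evt.path_lib+"m1". Everything in it is an assumption for demonstration: the "model" is just a dictionary of linear coefficients, pickle is one possible serialization format, and a temporary directory plays the role of the cloud /lib folder.

```python
import os
import pickle
import tempfile

# On the local machine: a stand-in "model" holding the coefficients of
# Y = intercept + b1*X1 + b2*X2 (hypothetical values).
local_model = {"intercept": 0.5, "coef": [2.0, -1.0]}

# A temporary directory stands in for the cloud /lib folder (self.evt.path_lib).
path_lib = tempfile.mkdtemp() + os.sep
with open(path_lib + "m1", "wb") as f:
    pickle.dump(local_model, f)

# In the backtest script: load the uploaded file back.
with open(path_lib + "m1", "rb") as f:
    model = pickle.load(f)


def predict(model, x1, x2):
    """Evaluate the stand-in linear model for one observation."""
    b1, b2 = model["coef"]
    return model["intercept"] + b1 * x1 + b2 * x2


print(predict(model, 3.0, 4.0))  # 2.5
```

A model saved by a library such as keras would be uploaded in that library's own format and loaded with its own loader, as in the load example earlier in this thread.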