When developing a trading algorithm, we may have trained a model for one financial instrument and want to apply it to other sectors or markets. In such cases, re-using a pre-trained model in strategy backtests can greatly reduce the time spent on repeated model calibration. However, if we handle it improperly, we are likely to fall into the trap of "Look Ahead Bias"!
What is "Look Ahead Bias"?
Look-ahead bias refers to making trade decisions based on data/information that would only be available in the future.
For backtesting, it is crucial that we only use information that would have been available at the time of the trade. For example, using a yearly earnings figure that was only released a quarter later will potentially bias the results in favor of the desired outcome, and the performance reported for such a strategy cannot be trusted.
Here are some considerations for identifying a potential look-ahead bias.
- When was the data released?
- At what time was the data observable and available to us?
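The two questions above can be turned into a simple point-in-time filter. The following is a minimal sketch, not part of any ALGOGENE API: the record layout, the field names (`period_end`, `released`, `eps`), the figures, and the helper `available_at` are all hypothetical and only illustrate the idea of filtering by release time rather than by the period the data describes.

```python
from datetime import datetime

# Hypothetical records: each figure carries the period it describes and
# the time it was actually published.
earnings = [
    {"period_end": datetime(2020, 12, 31), "released": datetime(2021, 3, 15), "eps": 2.1},
    {"period_end": datetime(2021, 12, 31), "released": datetime(2022, 3, 14), "eps": 2.6},
]

def available_at(records, as_of):
    """Return only the records that had been published by `as_of`."""
    return [r for r in records if r["released"] <= as_of]

# On 2022-01-10 the FY2021 figure describes a finished period,
# but it had not been released yet, so it must not be used.
visible = available_at(earnings, datetime(2022, 1, 10))
latest = max(visible, key=lambda r: r["released"])
print(latest["eps"])  # 2.1, the FY2020 figure
```

Filtering on `period_end` instead of `released` would let the FY2021 figure through and introduce exactly the bias described above.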
Load/Dump model on ALGOGENE Web IDE
As an event-streaming backtest engine, ALGOGENE largely eliminates look-ahead bias by feeding data into our strategy script in chronological order, as if replaying a recording. On the other hand, ALGOGENE also lets us save and re-use customized models for backtesting on the Web IDE. This flexibility, however, can expose us to look-ahead bias that we inadvertently introduce ourselves. The questions above therefore serve as a guide for judging whether a re-used model is logically sound.
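To illustrate what "replaying" data in chronological order means, here is a generic sketch. It is not ALGOGENE's engine: the function `replay`, the handler `on_data`, and the tick layout are hypothetical names chosen for illustration.

```python
from datetime import datetime

def replay(events, handler):
    """Feed events to `handler` one at a time, oldest first."""
    for event in sorted(events, key=lambda e: e["timestamp"]):
        handler(event)

seen = []

def on_data(event):
    # When this runs, the strategy knows only the prices already in `seen`
    # plus the current event; later events have not been delivered yet.
    seen.append(event["price"])

# Deliberately unordered input to show that delivery order follows timestamps.
ticks = [
    {"timestamp": datetime(2021, 1, 3), "price": 102.0},
    {"timestamp": datetime(2021, 1, 1), "price": 100.0},
    {"timestamp": datetime(2021, 1, 2), "price": 101.0},
]
replay(ticks, on_data)
print(seen)  # [100.0, 101.0, 102.0]
```

A model loaded from disk sits outside this stream, which is why its training period has to be checked separately.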
Now, let's see how to load/dump a model on the ALGOGENE IDE. Upon account registration, each user is automatically assigned a partition on ALGOGENE's cloud environment. All we need to do is save the model to our assigned cloud directory (i.e. self.evt.path_lib), and then retrieve it in another backtest process. The following example shows how to do this programmatically; note that it is not necessarily logically sound as a backtesting strategy.
Suppose we want to find the mathematical relationship between 3 financial instruments, defined as Y = f(X1, X2). We use the 'keras' library (https://www.tensorflow.org/guide/keras/save_and_serialize) to fit the model on the past 100 daily observations. In the first example below, we create a 'model_1' directory and dump the results there. In the second example, 'model_1' is retrieved for another backtest.
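As a reading aid, the network built by get_model() in the source code (a single Dense(1) unit with no activation on two inputs, compiled with a mean squared error loss) amounts to a linear form of f. Written out for the 100 observations, with w_1, w_2, b and the time index t as notation introduced here for illustration:

```latex
\hat{Y}_t = w_1 X_{1,t} + w_2 X_{2,t} + b,
\qquad
\min_{w_1,\, w_2,\, b} \; \frac{1}{100} \sum_{t=1}^{100} \left( Y_t - \hat{Y}_t \right)^2
```

Whether the optimizer gets close to this minimum depends on the training settings, so the fitted weights should be treated as an approximation.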
Source Code
Saving a model: see the call to self.model.save(...) at the end of on_bulkdatafeed.

```python
from AlgoAPI import AlgoAPIUtil, AlgoAPI_Backtest
from datetime import datetime, timedelta
import numpy as np
import tensorflow as tf
from tensorflow import keras

def get_model():
    # Create a simple model with 2 input variables
    inputs = keras.Input(shape=(2,))
    outputs = keras.layers.Dense(1)(inputs)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

class AlgoEvent:
    def __init__(self):
        self.lasttime = datetime(2000,1,1)
        self.isSaved = False
        self.numOfObs = 100
        self.arr_Y, self.arr_X1, self.arr_X2 = [], [], []
        self.model = get_model()

    def start(self, mEvt):
        # get my selected financial instruments
        self.myinstrument_Y = mEvt['subscribeList'][0]
        self.myinstrument_X1 = mEvt['subscribeList'][1]
        self.myinstrument_X2 = mEvt['subscribeList'][2]

        # start backtest
        self.evt = AlgoAPI_Backtest.AlgoEvtHandler(self, mEvt)
        self.evt.start()

    def on_bulkdatafeed(self, isSync, bd, ab):
        if isSync and not self.isSaved:
            # get new day price (>= so that bars exactly 24 hours apart are not skipped)
            if bd[self.myinstrument_Y]['timestamp'] >= self.lasttime + timedelta(hours=24):
                self.lasttime = bd[self.myinstrument_Y]['timestamp']

                # append observation
                self.arr_Y.append(bd[self.myinstrument_Y]['lastPrice'])
                self.arr_X1.append(bd[self.myinstrument_X1]['lastPrice'])
                self.arr_X2.append(bd[self.myinstrument_X2]['lastPrice'])

                if len(self.arr_Y) >= self.numOfObs:

                    # Train the model on the first numOfObs observations
                    train_input = np.array(
                        [[self.arr_X1[i], self.arr_X2[i]] for i in range(self.numOfObs)],
                        dtype="float32")
                    train_target = np.array(
                        [[self.arr_Y[i]] for i in range(self.numOfObs)],
                        dtype="float32")
                    # note: fit() runs a single epoch unless epochs is specified
                    self.model.fit(train_input, train_target)

                    # save the model to our assigned cloud directory
                    self.model.save(self.evt.path_lib+"model_1")
                    self.isSaved = True

    def on_marketdatafeed(self, md, ab):
        pass

    def on_orderfeed(self, of):
        pass

    def on_newsdatafeed(self, nd):
        pass

    def on_dailyPLfeed(self, pl):
        pass

    def on_openPositionfeed(self, op, oo, uo):
        pass
```
Loading a model: see the call to keras.models.load_model(...) in start.

```python
from AlgoAPI import AlgoAPIUtil, AlgoAPI_Backtest
from datetime import datetime, timedelta
import tensorflow as tf
from tensorflow import keras

def get_model():
    # Create a simple model with 2 variables
    # (kept from the first example; load_model below does not call it)
    inputs = keras.Input(shape=(2,))
    outputs = keras.layers.Dense(1)(inputs)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

class AlgoEvent:
    def __init__(self):
        pass

    def start(self, mEvt):
        # start backtest
        self.evt = AlgoAPI_Backtest.AlgoEvtHandler(self, mEvt)

        # load my model_1 from our assigned cloud directory
        self.model = keras.models.load_model(self.evt.path_lib+"model_1")

        self.evt.start()

    def on_marketdatafeed(self, md, ab):
        # use self.model for new market data feed
        # ...
        pass
```
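One way to apply the look-ahead questions to a re-used model is to record when its training data ended and compare that with the start of the new backtest. The sketch below is an illustration under stated assumptions, not an ALGOGENE feature: the sidecar file name `model_1_cutoff.json` and the three helper functions are hypothetical, a temporary directory stands in for self.evt.path_lib, and it assumes ordinary file I/O is permitted in that directory.

```python
import json
import os
import tempfile
from datetime import datetime

CUTOFF_FILE = "model_1_cutoff.json"  # hypothetical sidecar file name

def save_training_cutoff(lib_dir, last_obs_time):
    """Record the timestamp of the last observation used for training."""
    with open(os.path.join(lib_dir, CUTOFF_FILE), "w") as f:
        json.dump({"last_obs_time": last_obs_time.isoformat()}, f)

def load_training_cutoff(lib_dir):
    """Read back the recorded training cutoff as a datetime."""
    with open(os.path.join(lib_dir, CUTOFF_FILE)) as f:
        return datetime.fromisoformat(json.load(f)["last_obs_time"])

def is_free_of_lookahead(lib_dir, backtest_start):
    """True only if every training observation predates the backtest."""
    return load_training_cutoff(lib_dir) < backtest_start

# A temporary directory stands in for self.evt.path_lib in this sketch.
lib_dir = tempfile.mkdtemp()
save_training_cutoff(lib_dir, datetime(2020, 6, 30))

print(is_free_of_lookahead(lib_dir, datetime(2021, 1, 1)))  # True: backtest starts after training ended
print(is_free_of_lookahead(lib_dir, datetime(2020, 1, 1)))  # False: model has already seen this period
```

In the saving script, the cutoff would be self.lasttime at the moment the model is saved; in the loading script, the check could run on the first data timestamp received, before the model is used for any trade decision.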