ALGOGENE | Time Series Analysis - Autoregressive Model

admin


Quantitative Model

What is an Autoregressive Model?

An autoregressive (AR) model predicts future behavior based on past results. It is used for forecasting when there is some correlation between values in a time series and the values that precede and succeed them. You only use past data to model the behavior, hence the name autoregressive (the Greek prefix auto means 'self'). The process is basically a linear regression of the data in the current series against one or more past values in the same series.

In an AR model, as in a 'regular' linear regression, the value of the outcome variable (Y) at some point t in time is directly related to predictor variables. Where simple linear regression and AR models differ is that in an AR model the predictors include previous values of Y itself: Y at time t depends on Y at t − 1, t − 2, and so on.

The AR process is an example of a stochastic process, which has a degree of uncertainty or randomness built in. Because of this randomness, you may be able to predict future trends reasonably well from past data, but you will never get 100% accuracy.

AR models are also called conditional models, Markov models, or transition models.

AR(p) Models

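An AR(p) model is an autoregressive model in which the p most recent lagged values of y_t, namely y_{t-1}, …, y_{t-p}, are used as predictor variables. A lag is the value of the series a fixed number of periods earlier; lags capture how results from one time period affect the following periods.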

The value of p is called the order. For example, an AR(1) is a 'first-order autoregressive process': the outcome variable at time t is related only to the value one period earlier (i.e. the value of the variable at t − 1). A second- or third-order AR process relates y_t to values up to two or three periods earlier.

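The AR(p) model is defined by the equation:

$$y_t = \delta + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t$$

where:

$y_{t-1}, y_{t-2}, \dots, y_{t-p}$ are the past values of the series (the lags),
$\phi_1, \dots, \phi_p$ are the autoregressive coefficients,
$\varepsilon_t$ is a random error term (white noise),
and $\delta$ is a constant defined by the following equation:

$$\delta = \left(1 - \phi_1 - \phi_2 - \cdots - \phi_p\right)\mu$$

where $\mu$ is the process mean (assuming the process is stationary).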
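To make the definition concrete, here is a minimal Python sketch that simulates an AR(p) process directly from the equation above. The helper `simulate_ar` is hypothetical, and it assumes Gaussian white noise $\varepsilon_t \sim N(0, \sigma^2)$ and stationary coefficients, so that the mean $\mu = \delta / (1 - \phi_1 - \cdots - \phi_p)$ exists.

```python
import random


def simulate_ar(delta, phis, n, sigma=1.0, seed=None):
    """Simulate n values of an AR(p) process (hypothetical helper).

    y_t = delta + phis[0]*y_{t-1} + ... + phis[p-1]*y_{t-p} + eps_t,
    where eps_t ~ N(0, sigma^2) is Gaussian white noise (an assumption).
    The p warm-up values are set to the mean mu = delta / (1 - sum(phis)),
    which only exists for a stationary choice of coefficients.
    """
    rng = random.Random(seed)
    p = len(phis)
    mu = delta / (1.0 - sum(phis))
    y = [mu] * p  # warm-up lags, dropped from the returned path
    for _ in range(n):
        # y[-1 - i] is the lag-(i + 1) value, i.e. y_{t-1-i}
        ar_part = sum(phi * y[-1 - i] for i, phi in enumerate(phis))
        y.append(delta + ar_part + rng.gauss(0.0, sigma))
    return y[p:]


# AR(2) with delta = 2, phi_1 = 0.5, phi_2 = 0.3, so mu = 2 / (1 - 0.8) = 10
path = simulate_ar(2.0, [0.5, 0.3], n=50_000, seed=1)
print(sum(path) / len(path))  # close to 10, but not exactly 10
```

Because of the noise term, the sample mean of a long path only approximates $\mu$; setting `sigma=0.0` removes the randomness and the path stays exactly at the mean.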

Parameter Estimation

In the equation, all of the $y_t$ terms are observed and known to us. What we need to estimate are the coefficients, i.e. $\delta, \phi_1, \phi_2, \dots, \phi_p$.

(A) Least Squares Regression

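One estimation method is to formulate the problem as a least squares regression, predicting each value $y_t$ from the p previous values of the same series. A general multiple linear regression is written as:

$$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \cdots + \beta_p X_{i,p} + \varepsilon_i$$

For an AR(p) model, each observation i corresponds to a time t, with $Y_i = y_t$, $X_{i,j} = y_{t-j}$, $\beta_0 = \delta$ and $\beta_j = \phi_j$. Stacking all observations gives the matrix form $Y = X\beta + \varepsilon$, where each row of $X$ is $(1, y_{t-1}, \dots, y_{t-p})$.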
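A minimal Python sketch of how a series becomes regression data for AR(p): each row pairs the response $y_t$ with a leading 1 (which multiplies the intercept $\delta$) and the lags $y_{t-1}, \dots, y_{t-p}$, so a series of length n yields n − p rows. The helper name `lag_design` is hypothetical.

```python
def lag_design(y, p):
    """Build AR(p) regression data from a series y (hypothetical helper).

    Returns (X, Y) where each row of X is [1, y[t-1], ..., y[t-p]]
    (the leading 1 multiplies the intercept delta) and Y[i] = y[t],
    for t = p, ..., len(y) - 1. The first p values only serve as lags.
    """
    X = [[1.0] + [y[t - j] for j in range(1, p + 1)] for t in range(p, len(y))]
    Y = [y[t] for t in range(p, len(y))]
    return X, Y


# First four closing prices from the example below, in AR(2) layout
X, Y = lag_design([10, 11.3, 10.71, 10.87], p=2)
print(X)  # [[1.0, 11.3, 10], [1.0, 10.71, 11.3]]
print(Y)  # [10.71, 10.87]
```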

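We then minimize the sum of squared errors:

$$\begin{aligned}
L(\beta) &:= \sum_i \varepsilon_i^2 = \lVert X\beta - Y \rVert^2 \\
&= (X\beta - Y)^\top (X\beta - Y) \\
&= Y^\top Y - Y^\top X\beta - \beta^\top X^\top Y + \beta^\top X^\top X\beta
\end{aligned}$$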

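Since $L(\beta)$ is a convex function of $\beta$, its minimum lies where the gradient is zero. Taking the partial derivative with respect to $\beta$ (the two middle terms are the same scalar, since $Y^\top X\beta = \beta^\top X^\top Y$):

$$\frac{\partial L(\beta)}{\partial \beta} = \frac{\partial}{\partial \beta}\left(Y^\top Y - 2\beta^\top X^\top Y + \beta^\top X^\top X\beta\right) = -2X^\top Y + 2X^\top X\beta$$

Setting this gradient to zero gives the normal equations $X^\top X\beta = X^\top Y$. Provided $X^\top X$ is invertible, the optimal parameters are:

$$\hat{\beta} = (X^\top X)^{-1} X^\top Y$$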
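The closed-form solution translates almost line for line into NumPy. This is a sketch under our own naming (`fit_ar_ols` is hypothetical, not an ALGOGENE or library API). It solves the linear system $X^\top X \beta = X^\top Y$ with `numpy.linalg.solve` instead of explicitly inverting $X^\top X$, which gives the same $\hat{\beta}$ but is numerically better behaved.

```python
import numpy as np


def fit_ar_ols(y, p):
    """Least-squares estimate of (delta, phi_1, ..., phi_p) for an AR(p) model.

    Builds the design matrix X with rows [1, y_{t-1}, ..., y_{t-p}] and the
    response Y = y_t, then solves the normal equations X^T X beta = X^T Y.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Column j (j = 1..p) holds the lag-j values y_{t-j} for t = p..n-1
    lags = [y[p - j : n - j] for j in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lags)
    Y = y[p:]
    # Solving the system avoids forming (X^T X)^{-1} explicitly
    return np.linalg.solve(X.T @ X, X.T @ Y)
```

Applied to the 20 closing prices in the example below with p = 1, this returns approximately $\delta = 1.319141$ and $\phi_1 = 0.898255$, matching the values reported there.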

Example

Suppose we collected the previous 20 daily closing prices of stock A:

10, 11.3, 10.71, 10.87, 10.48, 11.36, 10.49, 10.57, 10.58, 9.42, 9.11, 9.75, 11.14, 9.72, 10.73, 10.57, 11.91, 13.09, 14.34, 14.09

Now we want to use an AR(1) model, $y_t = \delta + \phi_1 y_{t-1} + \varepsilon_t$, to explain this series.

Using the least squares estimation method above on the 19 pairs $(y_{t-1}, y_t)$ for $t = 2, \dots, 20$, we obtain $\delta = 1.319141067$ and $\phi_1 = 0.898255165$,

i.e. $\hat{y}_t = 1.319141067 + 0.898255165\, y_{t-1}$
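For p = 1 the normal equations reduce to the familiar simple-regression formulas: $\phi_1$ is the sum of cross-deviations of $(y_{t-1}, y_t)$ divided by the sum of squared deviations of $y_{t-1}$, and $\delta = \bar{y}_t - \phi_1 \bar{y}_{t-1}$, with both means taken over $t = 2, \dots, 20$. The standard-library sketch below (variable names are our own) reproduces the estimates above.

```python
prices = [10, 11.3, 10.71, 10.87, 10.48, 11.36, 10.49, 10.57, 10.58, 9.42,
          9.11, 9.75, 11.14, 9.72, 10.73, 10.57, 11.91, 13.09, 14.34, 14.09]

# Pair each price with the previous day's price: (y_{t-1}, y_t), t = 2..20
prev, curr = prices[:-1], prices[1:]
n = len(curr)
mean_prev = sum(prev) / n
mean_curr = sum(curr) / n

# Closed-form OLS for a single predictor plus intercept
s_xy = sum((x - mean_prev) * (z - mean_curr) for x, z in zip(prev, curr))
s_xx = sum((x - mean_prev) ** 2 for x in prev)
phi1 = s_xy / s_xx
delta = mean_curr - phi1 * mean_prev

print(f"delta = {delta:.6f}, phi1 = {phi1:.6f}")  # delta = 1.319141, phi1 = 0.898255
```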

We can then calculate the expected value of $y_t$ given the observed value at time $t-1$, i.e. $E(y_t \mid y_{t-1})$:

Seq	y_t	Estimated y_t
1	10	-
2	11.3	10.3016927176918
3	10.71	11.4694244322527
4	10.87	10.939453884875
5	10.48	11.0831747112825
6	11.36	10.7328551969143
7	10.49	11.5233197421555
8	10.57	10.7418377485647
9	10.58	10.8136981617685
10	9.42	10.8226807134189
11	9.11	9.78070472196458
12	9.75	9.50224562080005
13	11.14	10.07712892643
14	9.72	11.3257036058452
15	10.73	10.0501812714786
16	10.57	10.957418988176
17	11.91	10.8136981617685
18	13.09	12.0173600829313
19	14.34	13.0773011776866
20	14.09	14.2001201339952
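Each "Estimated y_t" entry is the fitted equation evaluated at the previous day's actual price. A short Python sketch that rebuilds the table, plugging in the rounded coefficients reported above (the helper name `fitted_values` is illustrative):

```python
def fitted_values(y, delta, phi1):
    """One-step-ahead AR(1) fits E(y_t | y_{t-1}) = delta + phi1 * y_{t-1}.

    The first observation has no previous value, so its entry is None.
    """
    return [None] + [delta + phi1 * prev for prev in y[:-1]]


prices = [10, 11.3, 10.71, 10.87, 10.48, 11.36, 10.49, 10.57, 10.58, 9.42,
          9.11, 9.75, 11.14, 9.72, 10.73, 10.57, 11.91, 13.09, 14.34, 14.09]
est = fitted_values(prices, delta=1.319141067, phi1=0.898255165)
for seq, (actual, fit) in enumerate(zip(prices, est), start=1):
    print(seq, actual, "-" if fit is None else f"{fit:.6f}")
# e.g. the Seq 2 row prints: 2 11.3 10.301693
```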

We can also apply the formula iteratively, feeding each forecast back in as $y_{t-1}$, to forecast the next stock prices:

Seq	y_t	Estimated y_t
21	-	13.9755563427335
22	-	13.8727567364869
23	-	13.7804164592112
24	-	13.6974713282064
25	-	13.6229654358659
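Multi-step forecasts reuse each prediction as the next $y_{t-1}$, because the future prices are not yet observed. A minimal sketch (the helper `forecast_ar1` is hypothetical), again using the rounded coefficients. Because $|\phi_1| < 1$, the forecasts decay geometrically toward the process mean $\mu = \delta / (1 - \phi_1) \approx 12.97$; the last observed price 14.09 sits above that level, which is why the forecasts in the table drift downward.

```python
def forecast_ar1(last_value, delta, phi1, steps):
    """Iterated AR(1) forecasts: each step reuses the previous forecast as y_{t-1}."""
    forecasts = []
    y_prev = last_value
    for _ in range(steps):
        y_prev = delta + phi1 * y_prev
        forecasts.append(y_prev)
    return forecasts


# Last observed closing price (Seq 20) is 14.09
for seq, value in enumerate(forecast_ar1(14.09, 1.319141067, 0.898255165, 5), start=21):
    print(seq, f"{value:.6f}")
# first line: 21 13.975556, last line: 25 13.622965
```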

Now you understand the statistical theory behind an autoregressive model. Go on to the next post to see how to implement it as a trading strategy!


Posted on : 2019-11-28 13:45:04.036000
