

Time Series Analysis


1. Stationary ARMA models

Andrew Lesniewski

Baruch College
New York

Fall 2019


Outline

1 Basic concepts

2 Autoregressive models

3 Moving average models


Time series
A time series is a sequence of data points Xt indexed by a discrete set of (ordered)
dates t, where −∞ < t < ∞.
Each Xt can be a simple number or a complex multi-dimensional object (vector,
matrix, higher dimensional array, or more general structure).
We will be assuming that the times t are equally spaced throughout, and denote
the time increment by h (e.g. second, day, month). Unless specified otherwise,
we will be choosing the units of time so that h = 1.
Typically, time series exhibit significant irregularities, which may have their origin
either in the nature of the underlying quantity or imprecision in observation (or
both).
Examples of time series commonly encountered in finance include:
(i) prices,
(ii) returns,
(iii) index levels,
(iv) trading volumes,
(v) open interests,
(vi) macroeconomic data (inflation, new payrolls, unemployment, GDP,
housing prices, . . . )


Time series
For modeling purposes, we assume that the elements of a time series are
random variables on some underlying probability space.
Time series analysis is a set of mathematical methodologies for analyzing
observed time series, whose purpose is to extract useful characteristics of the
data.
These methodologies fall into two broad categories:
(i) non-parametric, where the stochastic law of the time series is not explicitly
specified;
(ii) parametric, where the stochastic law of the time series is assumed to be
given by a model with a finite (and preferably tractable) number of
parameters.
The results of time series analysis are used for various purposes such as
(i) data interpretation,
(ii) forecasting,
(iii) smoothing,
(iv) back filling, ...
We begin with stationary time series.


Stationarity and ergodicity

A time series (model) is stationary, if for any times t1 < . . . < tk and any τ the
joint probability distribution of (Xt1 +τ , . . . , Xtk +τ ) is identical with the joint
probability distribution of (Xt1 , . . . , Xtk ).
In other words, the joint probability distribution of (Xt1 , . . . , Xtk ) remains the
same if each observation time ti is shifted by the same amount (time translation
invariance).
For a stationary time series, the expected value E(Xt ) is independent of t and is
called the (ensemble) mean of Xt . We will denote its value by µ.
A stationary time series model is ergodic if

    \lim_{T \to \infty} \frac{1}{T} \sum_{1 \le k \le T} X_{t+k} = \mu,   (1)

i.e. if the time average of Xt is equal to its mean.


The limit in (1) is usually understood in the sense of mean square convergence.


Stationarity and ergodicity

Ergodicity is a desired property of a financial time series, as we are always faced
with a single realization of a process rather than an ensemble of alternative
outcomes.
The notions of stationarity and ergodicity are hard to verify in practice. In
particular, there is no practical statistical test for ergodicity.
For this reason, a weaker but more practical concept of stationarity has been
introduced.


Autocovariance and stationarity

A time series is covariance-stationary (a.k.a. weakly stationary), if:


(i) E(Xt ) = µ is a constant,
(ii) For any τ , the autocovariance Cov(Xs , Xt ) is time translation invariant,

Cov(Xs+τ , Xt+τ ) = Cov(Xs , Xt ), (2)

i.e. Cov(Xs , Xt ) depends only on the difference t − s. We will denote it by Γt−s .
For a covariance-stationary series, Γ−t = Γt (show it!).
Notice that Γ0 = Var(Xt ).


Autocovariance and stationarity

The autocorrelation function (ACF) of a time series is defined as

    R_{s,t} = \frac{\mathrm{Cov}(X_s, X_t)}{\sqrt{\mathrm{Var}(X_s)}\,\sqrt{\mathrm{Var}(X_t)}}.   (3)

For a covariance-stationary time series, Rs,t = Rs−t,0 , i.e. the ACF is a function of
the difference s − t only.
We will write Rt = Rt,0 , and note that

    R_t = \frac{\Gamma_t}{\Gamma_0}.   (4)


Autocovariance and stationarity

Note that µ, Γ, and R are usually unknown, and are estimated from sample data.
The estimated sample mean \hat\mu, autocovariance \hat\Gamma, and autocorrelation \hat R are
calculated as follows.
Consider a finite sample x1 , . . . , xT . Then

    \hat\mu = \frac{1}{T} \sum_{t=1}^{T} x_t,

    \hat\Gamma_t =
    \begin{cases}
      \frac{1}{T} \sum_{j=t+1}^{T} (x_j - \hat\mu)(x_{j-t} - \hat\mu), & \text{for } t = 0, 1, \ldots, T-1,\\[4pt]
      \hat\Gamma_{-t}, & \text{for } t = -1, \ldots, -(T-1),
    \end{cases}
    \qquad (5)

    \hat R_t = \frac{\hat\Gamma_t}{\hat\Gamma_0}.

These quantities are called the sample mean, sample autocovariance, and
sample ACF, respectively.
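
A minimal Python sketch of these estimators, following the conventions of (5) (the function name sample_acf and the choice of max_lag are assumptions made for illustration):

import numpy as np

def sample_acf(x, max_lag):
    # Sample mean, sample autocovariance, and sample ACF as in (5)
    x = np.asarray(x, dtype=float)
    T = len(x)
    mu_hat = x.mean()
    gamma_hat = np.array([np.sum((x[t:] - mu_hat) * (x[:T - t] - mu_hat)) / T
                          for t in range(max_lag + 1)])
    return mu_hat, gamma_hat, gamma_hat / gamma_hat[0]

For example, mu_hat, gamma_hat, acf_hat = sample_acf(x, 20) returns the first 20 sample autocovariances and autocorrelations of a series x.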


Autocovariance and stationarity

Usually, \hat R_t is a biased estimator of Rt , with the bias going to zero as 1/T for
T → ∞.
Notice that this method allows us to compute up to T − 1 estimated sample
autocorrelations.
One can use the above estimators to test the hypothesis H0 : Rt = 0 versus
Ha : Rt ≠ 0.
The relevant t-stat is

    r = \frac{\hat R_t}{\sqrt{\frac{1}{T}\big(1 + 2 \sum_{i=1}^{t-1} \hat R_i^2\big)}}.

If Xt is a stationary Gaussian time series with Rs = 0 for s > t, this t-stat is
asymptotically normally distributed as T → ∞.
We thus reject H0 with confidence 1 − α if |r| > Zα/2 , where Zα/2 is the
1 − α/2 percentile of the standard normal distribution.


Autocovariance and stationarity

Another test, the Portmanteau test, allows us to test jointly for the presence of
several autocorrelations, i.e. H0 : R1 = . . . = Rk = 0, versus Ha : Ri ≠ 0, for
some 1 ≤ i ≤ k.
The relevant test statistic is defined as

    Q^*(k) = T \sum_{i=1}^{k} \hat R_i^2.

Under the assumption that Xt is i.i.d., Q^*(k) is asymptotically distributed
according to χ2 (k).
The power of the test is increased if we replace the statistic above with the
Ljung-Box statistic:

    Q(k) = T(T+2) \sum_{i=1}^{k} \frac{\hat R_i^2}{T - i}.

H0 is rejected if Q(k) is greater than the 1 − α percentile of the χ2 (k) distribution.
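
As an illustration, a small Python sketch of the Ljung-Box statistic built on the sample ACF (it reuses the sample_acf sketch above; the lag k and significance level are assumptions):

import numpy as np
from scipy.stats import chi2

def ljung_box(x, k):
    # Ljung-Box statistic Q(k) computed from the sample ACF
    T = len(x)
    _, _, r = sample_acf(x, k)
    return T * (T + 2) * np.sum(r[1:k + 1] ** 2 / (T - np.arange(1, k + 1)))

# Reject H0 at level alpha = 0.05 if Q(k) exceeds the 1 - alpha percentile of chi2(k):
# Q = ljung_box(x, k=10); critical = chi2.ppf(0.95, df=10)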


Models of time series

For practical applications, it is convenient to model a time series as a
discrete-time stochastic process with a small number of parameters.
Time series models typically have the following structure:

Xt = pt + mt + εt , (6)

where the three components on the RHS have the following meaning:
pt is a periodic function called the seasonality,
mt is a slowly varying process called the trend,
εt is a stochastic component called the error or disturbance.
Classic linear time series models fall into three broad categories:
autoregressive,
moving average,
integrated,
and their combinations.


White noise
The source of randomness in the models discussed in these lectures is white
noise. It is a process specified as follows:

Xt = εt , (7)

where εt ∼ N(0, σ2 ) are i.i.d. (= independent, identically distributed) normal
random variables.
Note that

    E(\varepsilon_t) = 0,
    \qquad
    \mathrm{Cov}(\varepsilon_s, \varepsilon_t) =
    \begin{cases}
      \sigma^2, & \text{if } s = t,\\
      0, & \text{otherwise}.
    \end{cases}
    \qquad (8)

The white noise process is stationary and ergodic (show it!).


The white noise process with linear drift

Xt = at + b + εt , a 6= 0, (9)

is not stationary, as E(Xt ) = at + b.


Autoregressive model AR(1)

The first class of models that we consider is that of the autoregressive models AR(p).
Their key characteristic is that the current observation is directly correlated with
the lagged p observations.
The simplest among them is AR(1), the autoregressive model with a single lag.
The model is specified as follows:

Xt = α + βXt−1 + εt . (10)

Here, α, β ∈ R, and εt ∼ N(0, σ 2 ) is a white noise.


A particular case of the AR(1) model is the random walk model, namely

Xt = Xt−1 + εt ,

in which the current value of X is the previous value plus a “white noise”
disturbance.


Autoregressive model AR(1)


The graph below shows a simulated AR(1) time series with the following choice
of parameters: α = 0.1, β = 0.3, σ = 0.005.


Autoregressive model AR(1)


Here is the code snippet used to generate this graph in Python:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_model import ARMA

alpha=0.1
beta=0.3
sigma=0.005

#Simulate AR(1)
T=250
x0=alpha/(1-beta)
x=np.zeros(T+1)
x[0]=x0
eps=np.random.normal(0.0,sigma,T)
for i in range(1,T+1):
    x[i]=alpha+beta*x[i-1]+eps[i-1]

#Take a look at the simulated time series
plt.plot(x)
plt.show()


Autoregressive model AR(1)

Let us investigate the circumstances under which an AR(1) process is
covariance-stationary.
For µ = E(Xt ) to be independent of t we must have, from (10),

    \mu = \alpha + \beta\mu.

This equation has a solution iff β ≠ 1 (except for the random walk case
corresponding to α = 0, β = 1). In this case,

    \mu = \frac{\alpha}{1 - \beta}.   (11)

Let us now compute the autocovariance. To this end, we rewrite (10) as

Xt − µ = β(Xt−1 − µ) + εt . (12)

Notice that the two terms on the RHS of this equation are independent of each
other.


Autoregressive model AR(1)

For Γ0 = Var(Xt ) to be independent of t, (12) implies that

    \Gamma_0 = \beta^2 \Gamma_0 + \sigma^2,

and so

    \Gamma_0 = \frac{\sigma^2}{1 - \beta^2}.   (13)

Since Γ0 > 0, this equation implies that |β| < 1.
Multiplying (12) by Xt−1 − µ and taking expectations, we find that Γ1 = βΓ0 . Iterating, we find that

    \Gamma_k = \beta^k \Gamma_0,   (14)

with Γ0 given by (13). The autocorrelation function thus decays exponentially fast
as a function of the lag between two observations.
In conclusion, the condition for an AR(1) process to be covariance-stationary is
that |β| < 1.


Autoregressive model AR(1)

The AR(1) model with |β| < 1 has a natural interpretation that can be gleaned from the
following “explicit” representation of Xt . Namely, iterating (10) we find that:

    X_t = \alpha + \beta X_{t-1} + \varepsilon_t
        = \alpha(1 + \beta) + \beta^2 X_{t-2} + \varepsilon_t + \beta\varepsilon_{t-1}
        = \ldots   (15)
        = \alpha(1 + \beta + \ldots + \beta^{L-1}) + \beta^L X_{t-L} + \varepsilon_t + \beta\varepsilon_{t-1} + \ldots + \beta^{L-1}\varepsilon_{t-L+1}
        = \mu(1 - \beta^L) + \beta^L X_{t-L} + \sqrt{\Gamma_0 (1 - \beta^{2L})}\, \xi_t,

where ξt ∼ N(0, 1).
This implies that

    E(X_t \,|\, X_{t-L}) = \mu(1 - \beta^L) + \beta^L X_{t-L},
    \qquad
    \mathrm{Var}(X_t \,|\, X_{t-L}) = \Gamma_0 (1 - \beta^{2L}).   (16)


Autoregressive model AR(1)

Since β^L → 0 exponentially fast, for large L we have

    X_t \approx \mu + \sqrt{\Gamma_0}\, \xi_t.   (17)

In other words, the AR(1) model describes a mean-reverting time series. After a
large number of observations, Xt takes the form (17), i.e. it is equal to its mean
value plus a Gaussian noise.
The rate of convergence to this limit is given by |β|: the smaller this value, the
faster Xt reaches its limit behavior.
The next question is: given a set of observations, how do we determine the
values of the parameters α, β, and σ in (10)?


Maximum likelihood estimation

Maximum likelihood estimation (MLE) is a commonly used method of estimating
the parameters of a statistical model given a set of observations.
It is based on the premise that the best choice of the parameter values should
maximize the likelihood of making the observations given these parameters.
Given a statistical model with parameters θ = (θ1 , . . . , θd ), and a set of data
y = (y1 , . . . , yN ), we construct the likelihood function L(θ|y), which links the
model with the data in such a way as if the data were drawn from the assumed
model.
In practice, L(θ|y ) is the joint probability density function (PDF) p(y|θ) under the
model, evaluated at the observed values.
In particular, if the observations yi are independent, then

    L(\theta|y) = \prod_{i=1}^{N} p(y_i|\theta),   (18)

where p(yi |θ) denotes the PDF of a single observation.


Maximum likelihood estimation

The value θ∗ that maximizes L(θ|y) serves as the best fit between the model
specification and the data.
It is usually more convenient to consider the log likelihood function (LLF)
− log L(θ|y). Then, θ∗ is the value at which the LLF attains its minimum.
As an illustration, consider a sample y = (y1 , . . . , yN ) drawn from the normal
distribution N(µ, σ2 ). Its likelihood function is given by

    L(\theta|y) = (2\pi\sigma^2)^{-N/2} \prod_{i=1}^{N} \exp\Big(-\frac{(y_i - \mu)^2}{2\sigma^2}\Big),   (19)

and the LLF is

    -\log L(\theta|y) = \frac{N}{2} \log \sigma^2 + \frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - \mu)^2 + \text{const}.   (20)


Maximum likelihood estimation

Taking the µ and σ derivatives and setting them to 0, we readily find that the
MLE estimates of µ and σ are

    \mu^* = \frac{1}{N} \sum_{i=1}^{N} y_i,
    \qquad
    (\sigma^*)^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - \mu^*)^2,   (21)

respectively.
Note that, while µ∗ is unbiased, the estimator σ ∗ is biased (N in the denominator
above, rather than the usual N − 1).
The fact that the MLE estimator of a parameter is biased is a common
occurrence. One can show, however, that MLE estimators are consistent, i.e. in
the limit N → ∞ they converge to the appropriate value.
Going forward, we will use the notation \hat\theta rather than θ∗ for the MLE estimators.
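
For instance, a quick numerical check of (21) in Python (the simulated sample y below is an assumption used only for illustration):

import numpy as np

y = np.random.normal(1.0, 2.0, size=1000)        # hypothetical sample from N(mu, sigma^2)
mu_star = y.mean()                               # MLE of mu, as in (21)
sigma_star = np.sqrt(np.mean((y - mu_star)**2))  # MLE of sigma; note the N (not N - 1) denominator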


MLE for AR(1)


Consider now the AR(1) model and a time series of data x0 , . . . , xT , believed to
be drawn from this model. The easiest way to construct the likelihood function is
to focus on the conditional PDF p(x1 , . . . , xT |x0 , θ). This leads to the conditional
MLE method.
Let

    \hat\varepsilon_t = x_t - \alpha - \beta x_{t-1},   (22)

for t = 1, . . . , T , be the disturbances implied by the data. According to the
model specification, each \hat\varepsilon_t is independently drawn from N(0, σ2 ), and thus

    p(x_1, \ldots, x_T \,|\, x_0, \theta)
      = \frac{1}{(2\pi\sigma^2)^{T/2}} \exp\Big(-\frac{1}{2\sigma^2} \sum_{t=1}^{T} \hat\varepsilon_t^2\Big)
      = \frac{1}{(2\pi\sigma^2)^{T/2}} \exp\Big(-\frac{1}{2\sigma^2} \sum_{t=1}^{T} (x_t - \alpha - \beta x_{t-1})^2\Big).   (23)

Hence the LLF is given by

    -\log L(\theta|x) = \frac{T}{2} \log \sigma^2 + \frac{1}{2\sigma^2} \sum_{t=0}^{T-1} (x_{t+1} - \alpha - \beta x_t)^2 + \text{const}.   (24)


MLE for AR(1)


Minimizing this function yields:

    \begin{pmatrix} \hat\alpha \\ \hat\beta \end{pmatrix}
      = \begin{pmatrix} T & \sum_{t=0}^{T-1} x_t \\ \sum_{t=0}^{T-1} x_t & \sum_{t=0}^{T-1} x_t^2 \end{pmatrix}^{-1}
        \begin{pmatrix} \sum_{t=0}^{T-1} x_{t+1} \\ \sum_{t=0}^{T-1} x_t x_{t+1} \end{pmatrix},
    \qquad
    \hat\sigma^2 = \frac{1}{T} \sum_{t=1}^{T} (x_t - \hat\alpha - \hat\beta x_{t-1})^2.   (25)

This can also be rewritten explicitly as

    \hat\beta = \frac{\sum_{t=0}^{T-1} (x_t - \hat x)(x_{t+1} - \hat x_+)}{\sum_{t=0}^{T-1} (x_t - \hat x)^2},
    \qquad
    \hat\alpha = \hat x_+ - \hat\beta \hat x,   (26)

where

    \hat x = \frac{1}{T} \sum_{t=0}^{T-1} x_t,
    \qquad
    \hat x_+ = \frac{1}{T} \sum_{t=0}^{T-1} x_{t+1}.   (27)


MLE for AR(1)

The exact MLE method attempts to infer the likelihood of x0 from the probability
distribution. Since x0 ∼ N(µ, Γ0 ),

    p(x_0|\theta) = \sqrt{\frac{1 - \beta^2}{2\pi\sigma^2}} \exp\Big(-\frac{(x_0 - \alpha/(1-\beta))^2}{2\sigma^2/(1-\beta^2)}\Big).   (28)

On the other hand, for t = 1, . . . , T ,

    p(x_t \,|\, x_{t-1}, \ldots, x_1, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{(x_t - \alpha - \beta x_{t-1})^2}{2\sigma^2}\Big).   (29)

From the definition of conditional probability we have the following identity:

    p(x_0, x_1, \ldots, x_T \,|\, \theta) = p(x_0|\theta) \prod_{t=1}^{T} p(x_t \,|\, x_{t-1}, \ldots, x_1, \theta).   (30)


MLE for AR(1)

Therefore, the LLF is given by

    -\log L(\theta|x) = \frac{1}{2} \log \frac{\sigma^2}{1-\beta^2} + \frac{T}{2} \log \sigma^2
        + \frac{(x_0 - \alpha/(1-\beta))^2}{2\sigma^2/(1-\beta^2)}
        + \frac{1}{2\sigma^2} \sum_{t=1}^{T} (x_t - \alpha - \beta x_{t-1})^2 + \text{const}.   (31)

Unlike the conditional case, the minimum of the exact LLF cannot be calculated
in closed form, and the calculation has to be done by means of a numerical
search.
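
A minimal sketch of such a numerical search for the exact AR(1) MLE, using scipy.optimize.minimize (the starting point and the log-parameterization of σ are assumptions made for illustration):

import numpy as np
from scipy.optimize import minimize

def exact_nllf_ar1(params, x):
    # Exact negative log likelihood (31) for AR(1), up to an additive constant
    alpha, beta, log_sigma = params
    sigma2 = np.exp(2 * log_sigma)
    if abs(beta) >= 1:
        return np.inf                       # enforce covariance-stationarity
    T = len(x) - 1                          # x holds x_0, ..., x_T
    nll = 0.5 * np.log(sigma2 / (1 - beta**2)) + 0.5 * T * np.log(sigma2)
    nll += (x[0] - alpha / (1 - beta))**2 * (1 - beta**2) / (2 * sigma2)
    nll += np.sum((x[1:] - alpha - beta * x[:-1])**2) / (2 * sigma2)
    return nll

# hypothetical starting point (alpha, beta, log sigma)
res = minimize(exact_nllf_ar1, np.array([0.1, 0.3, np.log(0.005)]),
               args=(x,), method='Nelder-Mead')
alphaMLE, betaMLE, sigmaMLE = res.x[0], res.x[1], np.exp(res.x[2])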


MLE for AR(1)

Here is a Python code snippet implementing the MLE for AR(1):

#Conditional MLE estimate
y=x[0:T]
yp=x[1:(T+1)]
m=np.sum(y)/T
mp=np.sum(yp)/T
betaCMLE=np.inner(y-m,yp-mp)/np.inner(y-m,y-m)
alphaCMLE=mp-betaCMLE*m
sigmaCMLE=np.sqrt(np.inner(yp-betaCMLE*y-alphaCMLE,
                           yp-betaCMLE*y-alphaCMLE)/T)

Alternatively, one can use statsmodels functions:

#MLE estimate with statsmodels
model=ARMA(x,order=(1,0)).fit(method='mle')
alphaMLE=model.params[0]
betaMLE=model.params[1]
sigmaMLE=np.std(model.resid)


Second order autoregressive model AR(2)

A second order autoregressive model AR(2) model is specified as follows:

Xt = α + β1 Xt−1 + β2 Xt−2 + εt , (32)

where α, β1 , β2 ∈ R, and εt ∼ N(0, σ 2 ) is a white noise.


Under this specification, the state variable depends on its two lags (rather than
one lag as in AR(1)).
Let us determine the conditions under which the model is covariance-stationary.
From the requirement that E(Xt ) = µ,

    \mu = \frac{\alpha}{1 - \beta_1 - \beta_2},   (33)

and so we can rewrite (32) in the following form:

    X_t - \mu = \beta_1 (X_{t-1} - \mu) + \beta_2 (X_{t-2} - \mu) + \varepsilon_t.   (34)

Second order autoregressive model AR(2)

Multiplying (34) by Xt−j − µ, for j = 0, 1, 2, and calculating expectations, we find
that

    \Gamma_k =
    \begin{cases}
      \beta_1 \Gamma_1 + \beta_2 \Gamma_2 + \sigma^2, & \text{if } k = 0,\\
      \beta_1 \Gamma_{k-1} + \beta_2 \Gamma_{k-2}, & \text{if } k = 1, 2.
    \end{cases}
    \qquad (35)

This identity is called the Yule-Walker equation for the autocovariance.
Dividing (35) by Γ0 yields the Yule-Walker equation for the autocorrelation:

    R_k = \beta_1 R_{k-1} + \beta_2 R_{k-2},   (36)

for k = 1, 2.
This equation allows us to calculate explicitly the ACF for AR(2).
Namely, plugging in k = 1 and remembering that R−1 = R1 yields
R1 = β1 + β2 R1 , or

    R_1 = \frac{\beta_1}{1 - \beta_2}.   (37)

Second order autoregressive model AR(2)

Plugging in k = 2 yields R2 = β1 R1 + β2 , or

    R_2 = \beta_2 + \frac{\beta_1^2}{1 - \beta_2}.   (38)

Finally, setting k = 0 in (35) yields

    \Gamma_0 = (\beta_1 R_1 + \beta_2 R_2)\,\Gamma_0 + \sigma^2.   (39)

Solving this, we obtain

    \Gamma_0 = \frac{(1 - \beta_2)\,\sigma^2}{(1 + \beta_2)\big((1 - \beta_2)^2 - \beta_1^2\big)}.   (40)
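
A direct Python transcription of (37), (38), and (40) (the parameter values below are assumptions chosen inside the stationarity region):

import numpy as np

beta1, beta2, sigma = 0.5, 0.3, 1.0     # hypothetical AR(2) parameters
R1 = beta1 / (1 - beta2)                                  # eq. (37)
R2 = beta2 + beta1**2 / (1 - beta2)                       # eq. (38)
Gamma0 = ((1 - beta2) * sigma**2
          / ((1 + beta2) * ((1 - beta2)**2 - beta1**2)))  # eq. (40)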


Lag operators and characteristic roots


We have not yet addressed the question under what conditions an AR(2) time
series is covariance-stationary. We will now introduce the concepts that settle
this issue and allow us to formulate stationarity criteria for more general
models.
Let us define the lag operator L as a (linear) mapping:

LXt = Xt−1 . (41)

In other words, the lag operator shifts the time index back by one unit.
Applying the lag operator k times shifts the time index by k units:

Lk Xt = Xt−k . (42)

We refer to Lk as the k-th power of L.


Finally, if ψ(z) = ψ0 + ψ1 z + . . . + ψn z n is a polynomial in z, we associate with it
an operator ψ(L) defined by

ψ(L) = ψ0 + ψ1 L + . . . + ψn Ln . (43)


Lag operators and characteristic roots


Notice that equation (32) can be stated as

    \psi(L) X_t = \alpha + \varepsilon_t,   (44)

where ψ(z) = 1 − β1 z − β2 z2 .
Solving this equation amounts to finding the inverse ψ(L)−1 of ψ(L):

    X_t = \frac{\alpha}{\psi(1)} + \psi(L)^{-1} \varepsilon_t.   (45)

Suppose that we can write ψ(L)−1 as an infinite series

    \psi(L)^{-1} = \sum_{j=0}^{\infty} \gamma_j L^j,   (46)

with

    \sum_{j=0}^{\infty} |\gamma_j| < \infty.   (47)


Lag operators and characteristic roots


Then

    X_t = \frac{\alpha}{\psi(1)} + \sum_{j=0}^{\infty} \gamma_j \varepsilon_{t-j},   (48)

with

    E(X_t) = \frac{\alpha}{\psi(1)},   (49)

and

    \mathrm{Cov}(X_t, X_{t+k}) = \sigma^2 \sum_{j=0}^{\infty} \gamma_j \gamma_{j+k}, \quad \text{for } k \ge 0,   (50)

independently of t. The series is thus covariance-stationary.
In the case of AR(1), ψ(L) = 1 − βL, and it is clear that the geometric series does
the job:

    (1 - \beta L)^{-1} = \sum_{j=0}^{\infty} \beta^j L^j.   (51)

Condition (47) holds as long as |β| < 1. Another way of saying this is that the
root z1 = 1/β of 1 − βz lies outside of the unit circle.


Lag operators and characteristic roots

Now, suppose that ψ(z) is a polynomial with non-zero roots z1 , . . . , zn . Then

    \psi(L) = c \prod_{j=1}^{n} (1 - z_j^{-1} L),   (52)

where c is the constant c = (-1)^n \psi_n \prod_{j=1}^{n} z_j .
If each of the roots zj (they may be complex) lies outside of the unit circle, i.e.
|z_j^{-1}| < 1, then we can invert ψ(L) by applying (51) to each factor in (52).
It is not hard to verify that the convergence criterion (47) holds, and thus the time
series is stationary.
We can summarize these arguments by stating that a time series model given by
the lag form equation (44) is covariance-stationary if the roots of the polynomial
ψ(z) lie outside of the unit circle.
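
In Python, this criterion can be checked directly with numpy's polynomial root finder (the coefficients below are assumptions):

import numpy as np

betas = [0.5, 0.3]                        # hypothetical AR coefficients beta_1, ..., beta_p
coeffs = [1.0] + [-b for b in betas]      # psi(z) = 1 - beta_1 z - ... - beta_p z^p, increasing powers
roots = np.roots(coeffs[::-1])            # np.roots expects decreasing powers of z
stationary = np.all(np.abs(roots) > 1.0)  # all roots outside the unit circle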


General autoregressive model AR(p)

The p-th order autoregressive model AR(p) model is specified as follows:

Xt = α + β1 Xt−1 + . . . + βp Xt−p + εt , (53)

where α, βj ∈ R, and εt ∼ N(0, σ 2 ) is a white noise.


For covariance-stationarity, the requirement that E(Xt ) = µ yields

    \mu = \frac{\alpha}{1 - \beta_1 - \ldots - \beta_p}.   (54)

Furthermore, we require that the roots of the characteristic polynomial
ψ(z) = 1 − β1 z − . . . − βp z^p lie outside of the unit circle.
We can rewrite (53) in the following form:

    X_t - \mu = \beta_1 (X_{t-1} - \mu) + \ldots + \beta_p (X_{t-p} - \mu) + \varepsilon_t.   (55)


General autoregressive model AR(p)

Multiplying this equation by Xt−j − µ, for j = 0, . . . , p, and calculating
expectations yields the Yule-Walker equation for the autocovariance:

    \Gamma_k =
    \begin{cases}
      \beta_1 \Gamma_1 + \cdots + \beta_p \Gamma_p + \sigma^2, & \text{if } k = 0,\\
      \beta_1 \Gamma_{k-1} + \ldots + \beta_p \Gamma_{k-p}, & \text{if } k = 1, \ldots, p.
    \end{cases}
    \qquad (56)

Dividing (56) by Γ0 yields the Yule-Walker equation for the autocorrelation:

    R_k = \beta_1 R_{k-1} + \ldots + \beta_p R_{k-p},   (57)

for k = 1, . . . , p.
Note that the autocorrelations satisfy essentially the same equation as the
process defining Xt .
The ACF Rk can be found by solving the Yule-Walker equation, and is
expressed in terms of the roots of the characteristic polynomial.
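
One way to obtain these values numerically is the ArmaProcess class in statsmodels (a sketch; the coefficient values are assumptions, and the arima_process module is assumed to be available in the installed statsmodels version):

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

betas = np.array([0.5, 0.3, -0.1])     # hypothetical AR(3) coefficients
ar_poly = np.r_[1.0, -betas]           # lag polynomial psi: 1 - beta_1 L - ... - beta_p L^p
process = ArmaProcess(ar_poly, np.array([1.0]))
acf_vals = process.acf(lags=10)        # theoretical R_0, R_1, ... solving the Yule-Walker recursion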


Choosing the number of lags in AR(p)

In practice, the number of lags p is unknown, and has to be determined
empirically.
This can be done by regressing the variable on its lagged values with
p = 1, 2, . . . , and assessing the impact of each added lag on the fit.
It is important not to overfit the model (“torture it until it confesses”) by adding too
many lags.
Useful quantitative guides for model selection are various information criteria.
The Akaike information criterion is defined as follows:

    \mathrm{AIC} = 2k - 2 \log L(\hat\theta|x).   (58)

Here k = #θ is the number of model parameters, and −log L(\hat\theta|x) denotes the
optimized (minimal) value of the LLF.
According to this criterion, among the candidate models the one with the
lowest value of AIC is preferred.


Choosing the number of lags in AR(p)

This is in contrast with picking the model whose optimized LLF is the lowest: this
may be the result of overfitting. The AIC criterion penalizes the number of
parameters, and thus discourages overfitting.
Another popular information criterion is the Bayesian information criterion (a.k.a.
the Schwarz criterion), which is defined as follows:

    \mathrm{BIC} = k \log(N) - 2 \log L(\hat\theta|x),   (59)

where N = #x is the number of data points.
According to this criterion, the model with the smallest value of BIC is the
preferred model.
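
A sketch of lag selection by information criterion, reusing the ARMA class imported in the earlier snippets (the candidate range of p is an assumption, and the fitted result is assumed to expose aic and bic attributes, as it does in the statsmodels version used above):

#Choose p for an AR(p) model by minimizing the information criterion
ic = {}
for p in range(1, 6):                          # hypothetical candidate lags
    fit = ARMA(x, order=(p, 0)).fit(method='mle')
    ic[p] = fit.aic                            # use fit.bic for the Schwarz criterion
pBest = min(ic, key=ic.get)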


Moving average model MA(1)


The moving average model MA(1) is specified as follows:

    X_t = \mu + \varepsilon_t + \theta \varepsilon_{t-1},   (60)

where µ and θ are constants, and εt is white noise.
The key feature of the MA(1) model is that its disturbances are autocorrelated
with lag 1.
The expected value of Xt is

    E(X_t) = \mu,   (61)

as E(εt ) = 0, for all t.
Its variance is

    E\big((X_t - \mu)^2\big) = E\big((\varepsilon_t + \theta \varepsilon_{t-1})^2\big)
        = E(\varepsilon_t^2) + 2\theta E(\varepsilon_t \varepsilon_{t-1}) + \theta^2 E(\varepsilon_{t-1}^2)
        = (1 + \theta^2)\,\sigma^2.


Moving average model MA(1)


For the first autocovariance, we have

    E\big((X_t - \mu)(X_{t-1} - \mu)\big) = E\big((\varepsilon_t + \theta \varepsilon_{t-1})(\varepsilon_{t-1} + \theta \varepsilon_{t-2})\big)
        = \theta \sigma^2.

All autocovariances with lag ≥ 2 are zero (show it!).
As a result, MA(1) is (unlike AR(1)) always covariance-stationary, with

    \Gamma_t =
    \begin{cases}
      (1 + \theta^2)\sigma^2, & \text{if } t = 0,\\
      \theta \sigma^2, & \text{if } |t| = 1,\\
      0, & \text{if } |t| \ge 2.
    \end{cases}
    \qquad (62)

Consequently, the first autocorrelation R1 = Γ1 /Γ0 is given by

    R_1 = \frac{\theta}{1 + \theta^2},   (63)

with all higher order autocorrelations equal to zero.


Moving average model MA(1)


The graph below shows a simulated MA(1) time series with the following choice
of parameters: µ = 1.1, θ = 0.6, σ = 0.5.


Moving average model MA(1)


Here is the code snippet used to generate this graph in Python:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_model import ARMA

mu=1.1
theta=0.6
sigma=0.5

#Simulate MA(1)
T=250
x0=mu
x=np.zeros(T+1)
x[0]=x0
eps=np.random.normal(0.0,sigma,T+1)
for i in range(1,T+1):
    x[i]=mu+eps[i]+theta*eps[i-1]

#Take a look at the simulated time series
plt.plot(x)
plt.show()


MLE for MA(1)


As in the case of AR(1), there are two natural approaches to MLE of an MA(1)
model: conditional on the initial value of ε, and exact.
We begin with the conditional MLE method, which is somewhat easier.
Since the value of ε0 cannot be calculated from the observed data, we are free
to set it arbitrarily; we choose ε0 = 0. All the probabilities calculated below are
conditional on this choice.
We then have, for t = 1, . . . , T ,

    \varepsilon_t = x_t - \mu - \theta \varepsilon_{t-1},   (64)

and so the conditional PDF of xt is

    p(x_t \,|\, x_{t-1}, \ldots, x_1, \varepsilon_0 = 0, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{\varepsilon_t^2}{2\sigma^2}\Big).   (65)

This expression is deceptively simple: in reality, εt is a nested function of all xs
with s ≤ t.
The likelihood function of the sample x1 , . . . , xT is given by the product of the
probabilities above, and so

    L(\theta \,|\, x, \varepsilon_0 = 0) = \prod_{t=1}^{T} p(x_t \,|\, x_{t-1}, \ldots, x_1, \varepsilon_0 = 0, \theta).   (66)


MLE for MA(1)


The log likelihood thus has the following form:

    -\log L(\theta \,|\, x, \varepsilon_0 = 0) = \frac{T}{2} \log \sigma^2 + \frac{1}{2\sigma^2} \sum_{t=1}^{T} \varepsilon_t^2 + \text{const}.   (67)

This is a quadratic function of the xt 's. It is cumbersome to write down explicitly,
but easy to code in a programming language. Its minimum is easiest to find by
means of a numerical search.
In case |θ| < 1, the impact of the choice ε0 = 0 phases out as we iterate
through time steps. For |θ| > 1 the impact of this choice accumulates, and the
method cannot be used.
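
A minimal sketch of this conditional numerical search using scipy.optimize.minimize (the starting values and the log-parameterization of σ are assumptions made for illustration):

import numpy as np
from scipy.optimize import minimize

def cond_nllf_ma1(params, x):
    # Conditional negative log likelihood (67) for MA(1), with eps_0 = 0
    mu, theta, log_sigma = params
    sigma2 = np.exp(2 * log_sigma)
    eps = np.zeros(len(x) + 1)                       # eps[0] is the conditioning value eps_0 = 0
    for t in range(1, len(x) + 1):
        eps[t] = x[t - 1] - mu - theta * eps[t - 1]  # recursion (64)
    return 0.5 * len(x) * np.log(sigma2) + np.sum(eps[1:]**2) / (2 * sigma2)

# x holds the observed sample x_1, ..., x_T; the starting point is hypothetical
res = minimize(cond_nllf_ma1, np.array([1.0, 0.5, np.log(0.5)]),
               args=(x,), method='Nelder-Mead')
muCMLE, thetaCMLE, sigmaCMLE = res.x[0], res.x[1], np.exp(res.x[2])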
For the exact MLE method, we notice that the joint PDF of x is given by

    p(x|\theta) = \frac{1}{(2\pi)^{T/2} \det(\Omega)^{1/2}} \exp\Big(-\frac{1}{2} (x - \mu)^T \Omega^{-1} (x - \mu)\Big),   (68)

and thus

    -\log L(\theta|x) = \frac{1}{2} \log \det(\Omega) + \frac{1}{2} (x - \mu)^T \Omega^{-1} (x - \mu).   (69)


MLE for MA(1)

Here, Ω is a band diagonal matrix:

    \Omega = \sigma^2
    \begin{pmatrix}
      1+\theta^2 & \theta & 0 & \ldots & 0\\
      \theta & 1+\theta^2 & \theta & \ldots & 0\\
      0 & \theta & 1+\theta^2 & \ldots & 0\\
      \vdots & \vdots & \vdots & \ddots & \vdots\\
      0 & 0 & 0 & \ldots & 1+\theta^2
    \end{pmatrix}.
    \qquad (70)

The numerics of minimizing (69) can be handled either by (i) a clever triangular
factorization of Ω, or by the Kalman filter method (we will discuss Kalman filters
later in this course).
Unlike the conditional MLE method, the exact method does not suffer from
instabilities if |θ| ≥ 1.
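
For small samples one can also evaluate (69) by brute force, building Ω explicitly; the sketch below is not the efficient factorization or Kalman filter approach, and it assumes x is a one-dimensional numpy array:

import numpy as np

def exact_nllf_ma1(params, x):
    # Exact negative log likelihood (69) for MA(1), up to an additive constant
    mu, theta, sigma = params
    T = len(x)
    Omega = sigma**2 * ((1 + theta**2) * np.eye(T)
                        + theta * (np.eye(T, k=1) + np.eye(T, k=-1)))
    resid = x - mu
    _, logdet = np.linalg.slogdet(Omega)
    return 0.5 * logdet + 0.5 * resid @ np.linalg.solve(Omega, resid)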


MLE for MA(1)

Here is the Python code snippet implementing the MLE for MA(1) using
statsmodels:
#MLE estimate with statsmodels
model=ARMA(x,order=(0,1)).fit(method='mle')
muMLE=model.params[0]
thetaMLE=model.params[1]
sigmaMLE=np.std(model.resid)


General moving average model MA(q)

A q-th order moving average model MA(q) is specified as follows:

    X_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q},   (71)

where µ and θj are constants, and εt is white noise.
In other words, the MA(q) model fluctuates around µ with disturbances that
are autocorrelated up to lag q.
The expected value of Xt is

    E(X_t) = \mu,   (72)

while its autocovariance is

    \Gamma_j =
    \begin{cases}
      (1 + \theta_1^2 + \ldots + \theta_q^2)\,\sigma^2, & \text{if } j = 0,\\
      (\theta_j + \theta_{j+1}\theta_1 + \ldots + \theta_q \theta_{q-j})\,\sigma^2, & \text{if } j = 1, \ldots, q,\\
      0, & \text{if } j > q.
    \end{cases}
    \qquad (73)
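
A direct Python transcription of (73) (the function name and the example coefficients are assumptions):

import numpy as np

def ma_autocov(thetas, sigma, j):
    # Theoretical autocovariance Gamma_j of an MA(q) process, eq. (73)
    th = np.r_[1.0, np.asarray(thetas, dtype=float)]   # (1, theta_1, ..., theta_q)
    q = len(th) - 1
    if j > q:
        return 0.0
    return sigma**2 * np.sum(th[j:] * th[:q + 1 - j])

# example: hypothetical MA(2) coefficients theta = (0.6, 0.2), sigma = 0.5
g0 = ma_autocov([0.6, 0.2], 0.5, 0)    # (1 + 0.6**2 + 0.2**2) * 0.25
g1 = ma_autocov([0.6, 0.2], 0.5, 1)    # (0.6 + 0.2*0.6) * 0.25
g3 = ma_autocov([0.6, 0.2], 0.5, 3)    # 0.0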


ARMA(p, q) model

A mixed autoregressive moving average model ARMA(p, q) is specified as
follows:

    X_t = \alpha + \beta_1 X_{t-1} + \ldots + \beta_p X_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q},   (74)

where α, βj , and θk are constants, and εt is white noise.
The equation above has the following lag operator representation:

    \psi(L) X_t = \alpha + \varphi(L) \varepsilon_t,   (75)

where

    \psi(z) = 1 - \beta_1 z - \ldots - \beta_p z^p,
    \qquad
    \varphi(z) = 1 + \theta_1 z + \ldots + \theta_q z^q.   (76)

The process (74) is covariance-stationary if the roots of ψ lie outside of the unit
circle.


ARMA(p, q) model
In this case, we can write the model in the form

    X_t = \mu + \gamma(L) \varepsilon_t,   (77)

where µ = α/ψ(1), and γ(L) = ψ(L)−1 ϕ(L). Explicitly, γ(L) is an infinite series:

    \gamma(L) = \sum_{j=0}^{\infty} \gamma_j L^j,   (78)

with

    \sum_{j=0}^{\infty} |\gamma_j|^2 < \infty.   (79)

This form of the model specification is called the moving average form.
The parameters of ARMA models are estimated by means of the MLE method. The
complexity of the computation required to minimize the LLF increases with the
number of parameters.
Information criteria, such as AIC or BIC, remain useful quantitative guides for
model selection.

Forecasting time series with ARMA(p, q)


An important function of time series analysis is making predictions about future
values of the observed data, i.e. forecasting.
The data-based forecasting problem can be formulated as follows: given the
observations X1:t = X1 , . . . , Xt , what is the best forecast X_{t+1|1:t} of Xt+1 ?
In mathematical terms, the problem requires minimizing a suitable loss function.
We choose to minimize the mean squared error (MSE) given by

    E\big((X_{t+1} - X_{t+1|1:t})^2\big).   (80)

We claim that X_{t+1|1:t} is, indeed, given by the conditional expected value:

    X_{t+1|1:t} = E_t(X_{t+1}).   (81)

Here Et denotes expectation, conditional on the information up to time t,

Et (·) = E( · |X1:t ). (82)


Forecasting time series with ARMA(p, q)

Indeed, if Z is any random variable measurable with respect to the information
set generated by X1:t , then

    E\big((X_{t+1} - Z)^2\big) = E\big((X_{t+1} - E_t(X_{t+1}) + E_t(X_{t+1}) - Z)^2\big)
        = E\big((X_{t+1} - E_t(X_{t+1}))^2\big) + E\big((E_t(X_{t+1}) - Z)^2\big)
          + 2\, E\big((X_{t+1} - E_t(X_{t+1}))(E_t(X_{t+1}) - Z)\big).

We argue that the cross term above is zero. Indeed,

    E_t\big((X_{t+1} - E_t(X_{t+1}))(E_t(X_{t+1}) - Z)\big)
        = E_t\big(X_{t+1} - E_t(X_{t+1})\big)\,\big(E_t(X_{t+1}) - Z\big)
        = \big(E_t(X_{t+1}) - E_t(X_{t+1})\big)\,\big(E_t(X_{t+1}) - Z\big)
        = 0.

Since E(·) = E(Et (·)) (the tower property of conditional expectation), the cross term
has zero expectation, and the claim follows.


Forecasting time series with ARMA(p, q)

As a result,

    E\big((X_{t+1} - Z)^2\big) = E\big((X_{t+1} - E_t(X_{t+1}))^2\big) + E\big((E_t(X_{t+1}) - Z)^2\big),

which attains its minimum at Z = Et (Xt+1 ). This proves (81).
The argument above is, in fact, quite general, and it easily extends to general
k-period forecasts X_{t+k|1:t} . Minimizing the corresponding MSE yields:

    X_{t+k|1:t} = E_t(X_{t+k}).   (83)

Later we will generalize this method to time series models with more complex
structure.


Forecasting time series with ARMA(p, q)

As an example, a single-period forecast in an AR(1) model is

    X_{t+1|1:t} = E_t(X_{t+1})
                = E_t(\alpha + \beta X_t + \varepsilon_{t+1})   (84)
                = \alpha + \beta X_t.

The forecast error is εt+1 , and so the variance of the forecast error is σ2 .
Likewise, a single-period forecast in an AR(p) model is

    X_{t+1|1:t} = \alpha + \beta_1 X_t + \ldots + \beta_p X_{t-p+1},   (85)

with forecast error εt+1 , and forecast error variance σ2 .


Forecasting time series with ARMA(p, q)


A two-period forecast in an AR(1) model is given by

    X_{t+2|1:t} = E_t(X_{t+2})
                = E_t(\alpha + \beta X_{t+1} + \varepsilon_{t+2})   (86)
                = (1 + \beta)\alpha + \beta^2 X_t.

The error of the two-period forecast is εt+2 + βεt+1 ; its variance is (1 + β2 )σ2 .
A one-period forecast in an MA(1) model is

    X_{t+1|1:t} = E_t(X_{t+1})
                = E_t(\mu + \varepsilon_{t+1} + \theta \varepsilon_t)   (87)
                = \mu + \theta \varepsilon_t.

The forecast error is εt+1 , and its variance is σ2 .


These calculations can be generalized to produce a general formula for a
multi-period forecast in an ARMA(p, q) model. This result is known as the
Wiener-Kolmogorov prediction formula and its discussion can be found in [1].
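
In practice, multi-period forecasts and their standard errors can also be produced directly from a fitted model; here is a sketch reusing the ARMA class from the earlier snippets (the horizon is an assumption, and the (forecast, stderr, conf_int) return signature is that of the statsmodels version used above):

#Forecast k periods ahead from a fitted ARMA(1, 1) model
model = ARMA(x, order=(1, 1)).fit(method='mle')
forecasts, stderrs, conf_int = model.forecast(steps=5)   # hypothetical horizon k = 5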


References

[1] Hamilton, J. D.: Time Series Analysis, Princeton University Press (1994).

[2] Tsay, R. S.: Analysis of Financial Time Series, Wiley (2010).
