Autoregressive models
Moving average models
Andrew Lesniewski
Baruch College
New York
Fall 2019
Outline
1 Basic concepts
2 Autoregressive models
3 Moving average models
Time series
A time series is a sequence of data points Xt indexed by a discrete set of (ordered)
dates t, where −∞ < t < ∞.
Each Xt can be a simple number or a complex multi-dimensional object (vector,
matrix, higher dimensional array, or more general structure).
We will be assuming that the times t are equally spaced throughout, and denote
the time increment by h (e.g. second, day, month). Unless specified otherwise,
we will be choosing the units of time so that h = 1.
Typically, time series exhibit significant irregularities, which may have their origin
either in the nature of the underlying quantity or imprecision in observation (or
both).
Examples of time series commonly encountered in finance include:
(i) prices,
(ii) returns,
(iii) index levels,
(iv) trading volumes,
(v) open interests,
(vi) macroeconomic data (inflation, new payrolls, unemployment, GDP,
housing prices, . . . )
Time series
For modeling purposes, we assume that the elements of a time series are
random variables on some underlying probability space.
Time series analysis is a set of mathematical methodologies for analyzing
observed time series, whose purpose is to extract useful characteristics of the
data.
These methodologies fall into two broad categories:
(i) non-parametric, where the stochastic law of the time series is not explicitly
specified;
(ii) parametric, where the stochastic law of the time series is assumed to be
given by a model with a finite (and preferably tractable) number of
parameters.
The results of time series analysis are used for various purposes such as
(i) data interpretation,
(ii) forecasting,
(iii) smoothing,
(iv) back filling, ...
We begin with stationary time series.
A time series (model) is stationary if, for any times t_1 < \ldots < t_k and any τ, the
joint probability distribution of (X_{t_1+τ}, \ldots, X_{t_k+τ}) is identical to the joint
probability distribution of (X_{t_1}, \ldots, X_{t_k}).
In other words, the joint probability distribution of (Xt1 , . . . , Xtk ) remains the
same if each observation time ti is shifted by the same amount (time translation
invariance).
For a stationary time series, the expected value E(Xt ) is independent of t and is
called the (ensemble) mean of Xt . We will denote its value by µ.
A stationary time series model is ergodic if

\lim_{T \to \infty} \frac{1}{T} \sum_{1 \le k \le T} X_{t+k} = \mu, (1)

i.e. the time average of a single realization converges to the ensemble mean.
The autocovariance of a time series is \Gamma_{s,t} = \mathrm{Cov}(X_s, X_t), and its autocorrelation function (ACF) is

R_{s,t} = \frac{\mathrm{Cov}(X_s, X_t)}{\sqrt{\mathrm{Var}(X_s)\,\mathrm{Var}(X_t)}}. (3)
For covariance-stationary time series, Rs,t = Rs−t,0 , i.e. the ACF is a function of
the difference s − t only.
We will write \Gamma_t = \Gamma_{t,0} and R_t = R_{t,0}, and note that

R_t = \frac{\Gamma_t}{\Gamma_0}. (4)
Note that µ, Γ, and R are usually unknown, and are estimated from sample data.
The estimated sample mean \hat{\mu}, autocovariance \hat{\Gamma}, and autocorrelation \hat{R} are
calculated as follows.
Consider a finite sample x_1, \ldots, x_T. Then

\hat{\mu} = \frac{1}{T} \sum_{t=1}^{T} x_t,

\hat{\Gamma}_t =
\begin{cases}
\frac{1}{T} \sum_{j=t+1}^{T} (x_j - \hat{\mu})(x_{j-t} - \hat{\mu}), & \text{for } t = 0, 1, \ldots, T-1, \\
\hat{\Gamma}_{-t}, & \text{for } t = -1, \ldots, -(T-1),
\end{cases} (5)

\hat{R}_t = \frac{\hat{\Gamma}_t}{\hat{\Gamma}_0}.
These quantities are called the sample mean, sample autocovariance, and
sample ACF, respectively.
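As an illustration, here is a minimal Python sketch (not from the original notes; the helper names are our own) of the estimators (5), assuming x is a one-dimensional array holding the observations x_1, \ldots, x_T:

import numpy as np

def sample_acovf(x, max_lag):
    # Sample autocovariance (5): Gamma_hat_t = (1/T) * sum_{j=t+1}^{T} (x_j - mu_hat)(x_{j-t} - mu_hat)
    x = np.asarray(x, dtype=float)
    T = len(x)
    xc = x - x.mean()
    return np.array([np.sum(xc[t:] * xc[:T - t]) / T for t in range(max_lag + 1)])

def sample_acf(x, max_lag):
    # Sample ACF: R_hat_t = Gamma_hat_t / Gamma_hat_0
    gamma = sample_acovf(x, max_lag)
    return gamma / gamma[0]

For example, sample_acf(x, 20) returns \hat{R}_0, \ldots, \hat{R}_{20}.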
Usually, \hat{R}_t is a biased estimator of R_t, with the bias going to zero as 1/T for
T → ∞.
Notice that this method allows us to compute up to T − 1 estimated sample
autocorrelations.
One can use the above estimators to test the hypothesis H_0 : R_t = 0 versus
H_a : R_t \ne 0.
The relevant t-stat is

r = \frac{\hat{R}_t}{\sqrt{\frac{1}{T}\left(1 + 2\sum_{i=1}^{t-1} \hat{R}_i^2\right)}}.
Another test, the Portmanteau test, allows us to test jointly for the presence of
several autocorrelations, i.e. H_0 : R_1 = \ldots = R_k = 0, versus H_a : R_i \ne 0, for
some 1 ≤ i ≤ k.
The relevant test statistic is defined as

Q^*(k) = T \sum_{i=1}^{k} \hat{R}_i^2.
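A minimal sketch of computing Q^*(k) in Python (our own helper; under H_0, Q^*(k) is approximately chi-square distributed with k degrees of freedom for large T):

import numpy as np

def portmanteau_q(x, k):
    # Q*(k) = T * sum_{i=1}^{k} R_hat_i^2, with R_hat computed as in (5)
    x = np.asarray(x, dtype=float)
    T = len(x)
    xc = x - x.mean()
    gamma = np.array([np.sum(xc[i:] * xc[:T - i]) / T for i in range(k + 1)])
    r = gamma / gamma[0]
    return T * np.sum(r[1:] ** 2)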
A time series can often be decomposed as

X_t = p_t + m_t + \varepsilon_t, (6)
where the three components on the RHS have the following meaning:
pt is a periodic function called the seasonality,
mt is a slowly varying process called the trend,
εt is a stochastic component called the error or disturbance.
Classic linear time series models fall into three broad categories:
autoregressive,
moving average,
integrated,
and their combinations.
White noise
The source of randomness in the models discussed in these lectures is white
noise. It is a process specified as follows:
X_t = \varepsilon_t, (7)

where

E(\varepsilon_t) = 0,
\mathrm{Cov}(\varepsilon_s, \varepsilon_t) =
\begin{cases}
\sigma^2, & \text{if } s = t, \\
0, & \text{otherwise}.
\end{cases} (8)
A simple example of a non-stationary time series is the linear trend model

X_t = at + b + \varepsilon_t, \quad a \ne 0, (9)

where \varepsilon_t is white noise.
The first class of models that we consider is the class of autoregressive models AR(p).
Their key characteristic is that the current observation is directly correlated with
the p lagged observations.
The simplest among them is AR(1), the autoregressive model with a single lag.
The model is specified as follows:
Xt = α + βXt−1 + εt . (10)
A special case is the random walk,

X_t = X_{t-1} + \varepsilon_t,

in which the current value of X is the previous value plus a “white noise”
disturbance.
Assuming that the process is stationary and taking expectations of both sides of (10), we find that the mean satisfies

\mu = \alpha + \beta\mu.

This equation has a solution iff \beta \ne 1 (except for the random walk case
corresponding to \alpha = 0, \beta = 1). In this case,

\mu = \frac{\alpha}{1 - \beta}. (11)
Using (11), we can rewrite (10) as

X_t - \mu = \beta(X_{t-1} - \mu) + \varepsilon_t. (12)
Notice that the two terms on the RHS of this equation are independent of each
other.
Computing the variance of both sides of (12), we find that

\Gamma_0 = \beta^2 \Gamma_0 + \sigma^2,

and so

\Gamma_0 = \frac{\sigma^2}{1 - \beta^2}. (13)
Since Γ0 > 0, this equation implies that |β| < 1.
Multiplying (12) by X_{t-1} - \mu and taking expectations, we find that \Gamma_1 = \beta\Gamma_0. Iterating, we find that

\Gamma_k = \beta^k \Gamma_0. (14)
The AR(1) with |β| < 1 has a natural interpretation that can be gleaned from the
following “explicit” representation of Xt . Namely, iterating (10) we find that:
X_t = \alpha + \beta X_{t-1} + \varepsilon_t
    = \alpha(1 + \beta) + \beta^2 X_{t-2} + \varepsilon_t + \beta\varepsilon_{t-1}
    = \ldots (15)
    = \alpha(1 + \beta + \ldots + \beta^{L-1}) + \beta^L X_{t-L} + \varepsilon_t + \beta\varepsilon_{t-1} + \ldots + \beta^{L-1}\varepsilon_{t-L+1}
    = \mu(1 - \beta^L) + \beta^L X_{t-L} + \sqrt{\Gamma_0(1 - \beta^{2L})}\,\xi_t,

where \xi_t has mean zero and unit variance.
In other words, the AR(1) model describes a mean reverting time series. After a
large number of observations, X_t is (approximately) equal to its mean value \mu plus a
Gaussian noise with variance \Gamma_0.
The rate of convergence to this limit is given by |β|: the smaller this value, the
faster Xt reaches its limit behavior.
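A small simulation sketch (our own, with arbitrarily chosen parameter values) illustrating this mean reversion: the sample mean approaches \mu = \alpha/(1-\beta) and the lag one sample autocorrelation approaches \beta:

import numpy as np

np.random.seed(0)
alpha, beta, sigma, T = 1.0, 0.6, 0.5, 100000
eps = np.random.normal(0.0, sigma, T)   # Gaussian white noise
x = np.empty(T)
x[0] = alpha / (1.0 - beta)             # start at the stationary mean
for t in range(1, T):
    x[t] = alpha + beta * x[t - 1] + eps[t]

xc = x - x.mean()
print(x.mean(), alpha / (1.0 - beta))                    # sample mean vs mu
print(np.sum(xc[1:] * xc[:-1]) / np.sum(xc ** 2), beta)  # lag one sample ACF vs beta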
The next question is: given a set of observations, how do we determine the
values of the parameters α, β, and σ in (10)?
A standard approach is the method of maximum likelihood (MLE). For a sample y = (y_1, \ldots, y_N) of independent observations with probability density p(\,\cdot\,|\theta), the likelihood function is

L(\theta|y) = \prod_{i=1}^{N} p(y_i|\theta). (18)
The value θ∗ that maximizes L(θ|y) serves as the best fit between the model
specification and the data.
It is usually more convenient to consider the log likelihood function (LLF)
− log L(θ|y). Then, θ∗ is the value at which the LLF attains its minimum.
As an illustration, consider a sample y = (y1 , . . . , yN ) drawn from the normal
distribution N(µ, σ 2 ). Its likelihood function is given by
L(\theta|y) = (2\pi\sigma^2)^{-N/2} \prod_{i=1}^{N} \exp\left(-\frac{(y_i - \mu)^2}{2\sigma^2}\right), (19)

and so

-\log L(\theta|y) = \frac{N}{2}\log\sigma^2 + \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - \mu)^2 + \mathrm{const}. (20)
Taking the \mu and \sigma derivatives and setting them to 0, we readily find that the
MLE estimates of \mu and \sigma are

\mu^* = \frac{1}{N}\sum_{i=1}^{N} y_i,

(\sigma^*)^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - \mu^*)^2, (21)

respectively.
Note that, while µ∗ is unbiased, the estimator σ ∗ is biased (N in the denominator
above, rather than the usual N − 1).
The fact that the MLE estimator of a parameter is biased is a common
occurrence. One can show, however, that MLE estimators are consistent, i.e. in
the limit N → ∞ they converge to the true parameter value.
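A quick numerical check of (21) (our own sketch, with arbitrary parameter choices): the MLE estimates coincide with the sample mean and the biased (denominator N) sample variance:

import numpy as np

np.random.seed(0)
y = np.random.normal(2.0, 3.0, 1000)        # sample from N(mu = 2, sigma = 3)
mu_star = y.mean()                          # (1/N) * sum of y_i
sigma2_star = np.mean((y - mu_star) ** 2)   # N (not N - 1) in the denominator
print(mu_star, sigma2_star, y.var(ddof=0))  # y.var(ddof=0) reproduces sigma2_star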
Going forward, we will use the notation θb rather than θ∗ for the MLE estimators.
where

\bar{x} = \frac{1}{T}\sum_{t=0}^{T-1} x_t, \qquad \bar{x}_+ = \frac{1}{T}\sum_{t=0}^{T-1} x_{t+1}. (27)
The exact MLE method also accounts for the likelihood of the initial observation x_0.
Since x_0 \sim N(\mu, \Gamma_0),

p(x_0|\theta) = \sqrt{\frac{1-\beta^2}{2\pi\sigma^2}}\, \exp\left(-\frac{(x_0 - \alpha/(1-\beta))^2}{2\sigma^2/(1-\beta^2)}\right). (28)

The conditional densities of the subsequent observations are

p(x_t|x_{t-1}, \ldots, x_1, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\left(-\frac{(x_t - \alpha - \beta x_{t-1})^2}{2\sigma^2}\right). (29)
The joint probability density of the sample is thus

p(x_0, x_1, \ldots, x_T|\theta) = p(x_0|\theta)\prod_{t=1}^{T} p(x_t|x_{t-1}, \ldots, x_1, \theta). (30)
-\log L(\theta|x) = \frac{1}{2}\log\frac{\sigma^2}{1-\beta^2} + \frac{T}{2}\log\sigma^2 + \frac{(x_0 - \alpha/(1-\beta))^2}{2\sigma^2/(1-\beta^2)} + \frac{1}{2\sigma^2}\sum_{t=1}^{T}(x_t - \alpha - \beta x_{t-1})^2 + \mathrm{const}. (31)
Unlike the conditional case, the minimum of the exact LLF cannot be calculated
in closed form, and the calculation has to be done by means of a numerical
search.
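A minimal sketch of such a numerical search (our own illustration, assuming scipy is available): simulate an AR(1) path and minimize the exact LLF (31) with a derivative-free optimizer. The simulated data, starting point, and parameterization of \sigma are arbitrary choices:

import numpy as np
from scipy.optimize import minimize

np.random.seed(0)
x = np.empty(500)
x[0] = 2.0
for t in range(1, 500):                      # simulate AR(1) with alpha = 1, beta = 0.5, sigma = 0.3
    x[t] = 1.0 + 0.5 * x[t - 1] + np.random.normal(0.0, 0.3)

def exact_neg_llf(params, x):
    # Exact LLF (31) for AR(1); sigma is parameterized via its logarithm to keep it positive
    alpha, beta, log_sigma = params
    if abs(beta) >= 1.0:
        return np.inf                        # outside the stationarity region
    sigma2 = np.exp(2.0 * log_sigma)
    T = len(x) - 1                           # observations are x_0, x_1, ..., x_T
    resid = x[1:] - alpha - beta * x[:-1]
    llf = 0.5 * np.log(sigma2 / (1.0 - beta ** 2)) + 0.5 * T * np.log(sigma2)
    llf += (x[0] - alpha / (1.0 - beta)) ** 2 / (2.0 * sigma2 / (1.0 - beta ** 2))
    llf += np.sum(resid ** 2) / (2.0 * sigma2)
    return llf

res = minimize(exact_neg_llf, x0=[0.0, 0.0, 0.0], args=(x,), method='Nelder-Mead')
alpha_hat, beta_hat, sigma_hat = res.x[0], res.x[1], np.exp(res.x[2])
print(alpha_hat, beta_hat, sigma_hat)        # should be close to 1.0, 0.5, 0.3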
For the AR(2) model, X_t = \alpha + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \varepsilon_t, the mean is

\mu = \frac{\alpha}{1 - \beta_1 - \beta_2}. (33)
The autocorrelations satisfy the Yule-Walker equation

R_k = \beta_1 R_{k-1} + \beta_2 R_{k-2}, (36)

for k = 1, 2.
This equation allows us to calculate the ACF of AR(2) explicitly.
Namely, plugging in k = 1 and remembering that R_{-1} = R_1 yields R_1 = \beta_1 + \beta_2 R_1, or

R_1 = \frac{\beta_1}{1 - \beta_2}. (37)

Plugging in k = 2 yields R_2 = \beta_1 R_1 + \beta_2, or

R_2 = \beta_2 + \frac{\beta_1^2}{1 - \beta_2}. (38)
The variance of the AR(2) process is given by

\Gamma_0 = \frac{(1 - \beta_2)\sigma^2}{(1 + \beta_2)\left((1 - \beta_2)^2 - \beta_1^2\right)}. (40)
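A small simulation check of (37), (38), and (40) (our own sketch; the AR(2) coefficients are arbitrary but satisfy the stationarity conditions):

import numpy as np

np.random.seed(0)
beta1, beta2, sigma, T = 0.5, 0.3, 1.0, 200000
eps = np.random.normal(0.0, sigma, T)
x = np.zeros(T)
for t in range(2, T):
    x[t] = beta1 * x[t - 1] + beta2 * x[t - 2] + eps[t]   # alpha = 0, so mu = 0

R1 = beta1 / (1.0 - beta2)                                 # (37)
R2 = beta2 + beta1 ** 2 / (1.0 - beta2)                    # (38)
Gamma0 = (1.0 - beta2) * sigma ** 2 / ((1.0 + beta2) * ((1.0 - beta2) ** 2 - beta1 ** 2))  # (40)

xc = x - x.mean()
print(np.sum(xc[1:] * xc[:-1]) / np.sum(xc ** 2), R1)      # sample vs theoretical R_1
print(np.sum(xc[2:] * xc[:-2]) / np.sum(xc ** 2), R2)      # sample vs theoretical R_2
print(xc.var(), Gamma0)                                     # sample vs theoretical Gamma_0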
The lag operator L is defined by L X_t = X_{t-1}; in other words, the lag operator shifts the time index back by one unit.
Applying the lag operator k times shifts the time index by k units:

L^k X_t = X_{t-k}. (42)

More generally, we consider polynomials in the lag operator:

\psi(L) = \psi_0 + \psi_1 L + \ldots + \psi_n L^n. (43)
For instance, the AR(2) model can be written in the form

\psi(L) X_t = \alpha + \varepsilon_t, (44)

where \psi(z) = 1 - \beta_1 z - \beta_2 z^2.
Solving this equation amounts to finding the inverse \psi(L)^{-1} of \psi(L):

X_t = \frac{\alpha}{\psi(1)} + \psi(L)^{-1}\varepsilon_t. (45)
We look for the inverse in the form of a series

\psi(L)^{-1} = \sum_{j=0}^{\infty} \gamma_j L^j, (46)

with

\sum_{j=0}^{\infty} |\gamma_j| < \infty. (47)
The resulting process is covariance stationary, with

E(X_t) = \frac{\alpha}{\psi(1)}, (49)

and

\mathrm{Cov}(X_t, X_{t+k}) = \sigma^2 \sum_{j=0}^{\infty} \gamma_j \gamma_{j+k}, \quad \text{for } k \ge 0. (50)
For the AR(1) model, \psi(z) = 1 - \beta z, and

(1 - \beta L)^{-1} = \sum_{j=0}^{\infty} \beta^j L^j. (51)

Condition (47) then holds as long as |\beta| < 1. Another way of saying this is that the
root z_1 = 1/\beta of 1 - \beta z lies outside of the unit circle.
In general, let z_1, \ldots, z_n denote the roots of the polynomial \psi(z). Then

\psi(L) = c \prod_{j=1}^{n} (1 - z_j^{-1} L), (52)

where c is the constant c = (-1)^n \psi_n \prod_{j=1}^{n} z_j.
If each of the roots zj (they may be complex) lies outside of the unit circle, i.e.
|zj−1 | < 1, then we can invert ψ(L) by applying (51) to each factor in (52).
It is not hard to verify that the convergence criterion (47) is then satisfied, and thus the time
series is stationary.
We can summarize these arguments by stating that a time series model given by
the lag form equation (44) is covariance stationary if the roots of the polynomial
ψ(z) lie outside of the unit circle.
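A minimal sketch of this stationarity check (our own helper): form the coefficients of \psi(z) = 1 - \beta_1 z - \ldots - \beta_p z^p and verify with numpy.roots that all roots lie outside the unit circle. numpy.roots expects coefficients ordered from the highest power down:

import numpy as np

def is_covariance_stationary(betas):
    # betas = [beta_1, ..., beta_p] for psi(z) = 1 - beta_1 z - ... - beta_p z^p
    coeffs = np.r_[[-b for b in betas[::-1]], 1.0]   # [-beta_p, ..., -beta_1, 1]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_covariance_stationary([0.5, 0.3]))   # True: both roots lie outside the unit circle
print(is_covariance_stationary([0.5, 0.6]))   # False: one root lies inside the unit circle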
For the general AR(p) model, X_t = \alpha + \beta_1 X_{t-1} + \ldots + \beta_p X_{t-p} + \varepsilon_t, the mean is

\mu = \frac{\alpha}{1 - \beta_1 - \ldots - \beta_p}. (54)
R_k = \beta_1 R_{k-1} + \ldots + \beta_p R_{k-p}, (57)

for k = 1, \ldots, p.
Note that the autocorrelations satisfy essentially the same equation as the
process defining Xt .
The ACF values R_k can be found as the solution of the Yule-Walker equations, and are
expressed in terms of the roots of the characteristic polynomial.
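A minimal sketch of using the Yule-Walker equations for estimation (our own helper, assuming scipy is available): plug the sample ACF values into (57) and solve the resulting linear system for \beta_1, \ldots, \beta_p:

import numpy as np
from scipy.linalg import toeplitz

def yule_walker_ar(x, p):
    # Solve R_k = beta_1 R_{k-1} + ... + beta_p R_{k-p}, k = 1, ..., p, with sample ACF values
    x = np.asarray(x, dtype=float)
    T = len(x)
    xc = x - x.mean()
    gamma = np.array([np.sum(xc[k:] * xc[:T - k]) / T for k in range(p + 1)])
    r = gamma / gamma[0]
    A = toeplitz(r[:p])               # symmetric matrix with entries R_{|i-j|}
    return np.linalg.solve(A, r[1:p + 1])

# Example: betas_hat = yule_walker_ar(x, p=2) for an observed series x.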
This is in contrast with picking the model whose optimized LLF is the lowest: this
may be the result of overfitting. The AIC criterion penalizes the number of
parameters, and thus discourages overfitting.
Another popular information criterion is the Bayesian information criterion (a.k.a.
the Schwarz criterion), defined as BIC = -2\log\hat{L} + k\log T, where k is the number of
estimated parameters and T is the sample size.
The simplest moving average model is MA(1), specified as follows:

X_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}, (60)

where \varepsilon_t is white noise.
Its lag one autocorrelation is

R_1 = \frac{\theta}{1 + \theta^2}. (63)
The conditional MLE method sets \varepsilon_0 = 0 and computes the remaining disturbances recursively via \varepsilon_t = x_t - \mu - \theta\varepsilon_{t-1}. The conditional LLF is then

-\log L(\theta|x, \varepsilon_0 = 0) = \frac{T}{2}\log\sigma^2 + \frac{1}{2\sigma^2}\sum_{t=1}^{T}\varepsilon_t^2 + \mathrm{const}. (67)
The exact MLE method uses the fact that, for Gaussian white noise, x = (x_1, \ldots, x_T) is jointly normal with mean vector \mu = (\mu, \ldots, \mu) and covariance matrix \Omega:

p(x|\theta) = \frac{1}{(2\pi)^{T/2}\det(\Omega)^{1/2}}\, \exp\left(-\frac{1}{2}(x - \mu)^T \Omega^{-1}(x - \mu)\right), (68)

and thus

-\log L(\theta|x) = \frac{1}{2}\log\det(\Omega) + \frac{1}{2}(x - \mu)^T \Omega^{-1}(x - \mu) + \mathrm{const}. (69)
\Omega = \sigma^2
\begin{pmatrix}
1+\theta^2 & \theta & 0 & \ldots & 0 \\
\theta & 1+\theta^2 & \theta & \ldots & 0 \\
0 & \theta & 1+\theta^2 & \ldots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \ldots & 1+\theta^2
\end{pmatrix}. (70)
The numerics of minimizing (69) can be handled either by (i) a clever triangular
factorization of Ω, or by the Kalman filter method (we will discuss Kalman filters
later in this course).
Unlike the conditional MLE method, the exact method does not suffer from
instabilities if |θ| ≥ 1.
Here is the Python code snippet implementing the MLE for MA(1) using
statsmodels:
# MLE estimate of the MA(1) parameters with statsmodels
import numpy as np
from statsmodels.tsa.arima_model import ARMA

model = ARMA(x, order=(0, 1)).fit(method='mle')   # order = (p, q) = (0, 1), i.e. MA(1)
muMLE = model.params[0]          # constant term, i.e. the estimated mean mu
thetaMLE = model.params[1]       # estimated MA coefficient theta
sigmaMLE = np.std(model.resid)   # estimate of sigma from the residuals
ARMA(p, q) model
The ARMA(p, q) model combines the autoregressive and moving average specifications:

\psi(L) X_t = \alpha + \varphi(L)\varepsilon_t,

where

\psi(z) = 1 - \beta_1 z - \ldots - \beta_p z^p,
\varphi(z) = 1 + \theta_1 z + \ldots + \theta_q z^q. (76)

This process is covariance stationary if the roots of \psi lie outside of the unit
circle.
ARMA(p, q) model
In this case, we can write the model in the form
Xt = µ + γ(L)εt , (77)
where µ = α/ψ(1), and γ(L) = ψ(L)−1 ϕ(L). Explicitly, γ(L) is an infinite series:
\gamma(L) = \sum_{j=0}^{\infty} \gamma_j L^j, (78)

with

\sum_{j=0}^{\infty} |\gamma_j|^2 < \infty. (79)
This form of the model specification is called the moving average form.
The parameters of ARMA models are estimated by means of the MLE method. The
complexity of the computation required to minimize the LLF increases with the
number of parameters.
Information criteria, such as AIC or BIC, remain useful quantitative guides for
model selection.
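A minimal sketch of AIC-based order selection (our own illustration, written against the same 2019-era statsmodels ARMA interface as the MA(1) snippet above; the simulated data and the candidate order ranges are arbitrary):

import numpy as np
from statsmodels.tsa.arima_model import ARMA

np.random.seed(0)
T = 500
eps = np.random.normal(0.0, 1.0, T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + eps[t] + 0.3 * eps[t - 1]   # simulate an ARMA(1, 1) path

best = None
for p in range(3):
    for q in range(3):
        try:
            fit = ARMA(x, order=(p, q)).fit(method='mle', disp=0)
        except Exception:
            continue                                     # skip orders that fail to converge
        if best is None or fit.aic < best[0]:
            best = (fit.aic, p, q)
print(best)                                              # smallest AIC and the selected (p, q)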
Forecasting
Suppose that we have observed X_1, \ldots, X_t, and wish to forecast the next value X_{t+1}.
We denote the forecast by X^*_{t+1|1:t}, and choose it so as to minimize the mean squared error

E\left[\left(X_{t+1} - X^*_{t+1|1:t}\right)^2\right]. (80)

We claim that X^*_{t+1|1:t} is, indeed, given by the conditional expected value:

X^*_{t+1|1:t} = E_t(X_{t+1}). (81)
As a result,

X^*_{t+k|1:t} = E_t(X_{t+k}). (83)
Later we will generalize this method to time series models with more complex
structure.
For the AR(1) model, the one period forecast is

X^*_{t+1|1:t} = E_t(X_{t+1}) = E_t(\alpha + \beta X_t + \varepsilon_{t+1}) = \alpha + \beta X_t. (84)
The forecast error is εt+1 , and so the variance of the forecast error is σ 2 .
Likewise, the single period forecast in an AR(p) model is

X^*_{t+1|1:t} = \alpha + \beta_1 X_t + \ldots + \beta_p X_{t-p+1}, (85)

with forecast error \varepsilon_{t+1}, and the variance of the forecast error is \sigma^2.
The two period forecast in the AR(1) model is

X^*_{t+2|1:t} = E_t(X_{t+2}) = E_t(\alpha + \beta X_{t+1} + \varepsilon_{t+2}) = (1 + \beta)\alpha + \beta^2 X_t. (86)
The error of the two period forecast is εt+2 + βεt+1 ; its variance is (1 + β 2 )σ 2 .
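A tiny numerical illustration of the AR(1) forecast formulas (84) and (86) (our own sketch; parameter values and the last observation are arbitrary):

alpha, beta, sigma = 1.0, 0.6, 0.5
x_t = 3.0                                              # last observed value X_t

forecast_1 = alpha + beta * x_t                        # one period forecast (84)
forecast_2 = (1.0 + beta) * alpha + beta ** 2 * x_t    # two period forecast (86)
var_1 = sigma ** 2                                     # variance of the one period forecast error
var_2 = (1.0 + beta ** 2) * sigma ** 2                 # variance of the two period forecast error
print(forecast_1, var_1, forecast_2, var_2)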
A one period forecast in an MA(1) model is
X^*_{t+1|1:t} = E_t(X_{t+1}) = E_t(\mu + \varepsilon_{t+1} + \theta\varepsilon_t) = \mu + \theta\varepsilon_t. (87)