
RS – EC2 - Lecture 13

Lecture 13
Time Series:
Stationarity, AR(p) & MA(q)

Time Series: Introduction


• In the early 1970s, it was discovered that simple time series models
forecast better than the complicated multivariate macro models then
popular in the 1960s (FRB-MIT-Penn). See Nelson (1972).

• The tools? Simple univariate (ARIMA) models, popularized by the


textbook of Box & Jenkins (1970).

Q: What is a time series? A time series yt is a process observed in
sequence over time, t = 1, ..., T:  YT = {y1, y2, y3, ..., yT}.

• Main Issue with Time Series: Dependence.


Given the sequential nature of Yt, we expect yt & yt-1 to be dependent.
Depending on assumptions, classical results (based on LLN & CLT)
may not be valid.


Time Series: Introduction


• Usually, time series models are separated into two categories:
– Univariate (yt ∊ R, i.e., scalar)  ⟹  primary model: autoregressions (ARs).
– Multivariate (yt ∊ Rm, i.e., vector-valued)  ⟹  primary model: vector autoregressions (VARs).

• In time series, {..., y1, y2, y3, ..., yT} are jointly distributed RVs. We want to
model the conditional expectation:
E[yt|Ft-1]
where Ft-1 = {yt-1, yt-2, yt-3, ...} is the past history of the series.

Time Series: Introduction


• Two popular models for E[yt|Ft-1]:
– An autoregressive (AR) process models E[yt|Ft-1] with lagged
dependent variables.
– A moving average (MA) process models E[yt|Ft-1] with lagged
errors.

• Usually, E[yt|Ft-1] has been modeled as a linear process. But,


recently, non-linearities have become more common.

• In general, we assume the error term, εt, has mean 0, constant variance σ²,
and is uncorrelated with its own past and with past information. We call a
process like this a white noise (WN) process. We denote it as
εt ~ WN(0, σ²)


Time Series: Introduction


• We want to select an appropriate time series model to forecast yt. In
this class, the choices are AR(p), MA(q) or ARMA(p, q).

• Steps for forecasting:


(1) Identify the appropriate model. That is, determine p, q.
(2) Estimate the model.
(3) Test the model.
(4) Forecast.

• In this lecture, we go over the statistical theory (stationarity,
ergodicity and the MDS CLT), the main models (AR, MA & ARMA) and the
tools that will help us describe and identify a proper model.

CLM Revisited: Time Series


With autocorrelated data, we get dependent observations. Recall the AR(1) error:
εt = ρ εt-1 + ut

The independence assumption (A2’) is violated. The LLN and the
CLT cannot be easily applied in this context. We need new tools
and definitions.

We will introduce the concepts of stationarity and ergodicity. The


ergodic theorem will give us a counterpart to the LLN.

To get asymptotic distributions, we also need a CLT for dependent


variables, using the concept of mixing and stationarity. Or we can
rely on the martingale CLT.


Time Series – Stationarity


• Consider the joint probability distribution of the collection of RVs:

F(zt1, zt2, ..., ztn) = P(Zt1 ≤ zt1, Zt2 ≤ zt2, ..., Ztn ≤ ztn)

Then, we say that a process is:

1st-order stationary if F(zt1) = F(zt1+k) for any t1, k

2nd-order stationary if F(zt1, zt2) = F(zt1+k, zt2+k) for any t1, t2, k

Nth-order stationary if F(zt1, ..., ztN) = F(zt1+k, ..., ztN+k) for any t1, ..., tN, k

• Definition. A process is strongly (strictly) stationary if it is an Nth-order
stationary process for any N.

Time Series – Moments


• The moments describe a distribution. We calculate the moments as
usual:
E(Zt) = μt = ∫ zt f(zt) dzt
Var(Zt) = σt² = E[(Zt − μt)²] = ∫ (zt − μt)² f(zt) dzt
Cov(Zt1, Zt2) = E[(Zt1 − μt1)(Zt2 − μt2)] = γ(t1 − t2)
ρ(t1, t2) = γ(t1 − t2) / √(σt1² σt2²)

Note: γ(t1 − t2) is called the auto-covariance function – think of it as a
function of k = t1 − t2. γ(0) is the variance.

• Stationarity requires all these moments to be independent of time.

• If the moments are time dependent, we say the series is non-stationary.


Time Series – Moments


• For a strictly stationary process: μt = μ and σt² = σ², because
F(zt1) = F(zt1+k)  ⟹  μt1 = μt1+k = μ,
provided that E(Zt) < ∞ and E(Zt²) < ∞. Then,
F(zt1, zt2) = F(zt1+k, zt2+k)  ⟹
Cov(Zt1, Zt2) = Cov(Zt1+k, Zt2+k)  ⟹
γ(t1, t2) = γ(t1 + k, t2 + k).
Let t1 = t − k and t2 = t. Then,
γ(t1, t2) = γ(t − k, t) = γ(t, t + k) = γk

The correlation between any two RVs depends on the time difference.

Time Series – Weak Stationarity

• A process is said to be Nth-order weakly stationary if all its joint
moments up to order N exist and are time invariant.

• A covariance stationary process (or 2nd-order weakly stationary process) has:
- constant mean
- constant variance
- a covariance function that depends only on the time difference between the RVs.

That is, Zt is covariance stationary if:
E(Zt) = constant
Var(Zt) = constant
Cov(Zt1, Zt2) = E[(Zt1 − μt1)(Zt2 − μt2)] = γ(t1 − t2) = f(t1 − t2)


Time Series – Weak Stationarity


Examples: For all, assume εt ~ WN(0, σ²).

1) yt = Φ yt-1 + εt
E[yt] = 0 (assuming Φ ≠ 1)
Var[yt] = σ²/(1 − Φ²) (assuming |Φ| < 1)
E[yt yt-1] = Φ E[yt-1²]
⟹ stationary: the moments are not time dependent.

2) yt = μ + yt-1 + εt  ⟹  yt = μ t + Σj=0 to t-1 εt-j + y0
E[yt] = μ t + y0
Var[yt] = Σj=0 to t-1 σ² = σ² t
⟹ non-stationary: the moments are time dependent.

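As a quick numerical check of the two examples above, here is a minimal simulation sketch in Python/NumPy (not part of the original notes; the coefficient value, drift and sample size are arbitrary choices). It simulates a stationary AR(1) and a random walk with drift and compares the sample variance of early and late subsamples.

```python
import numpy as np

rng = np.random.default_rng(0)
T, phi, mu, sigma = 5000, 0.5, 0.1, 1.0

eps = rng.normal(0.0, sigma, T)
ar1 = np.zeros(T)   # example 1: yt = phi*y(t-1) + eps_t, stationary if |phi| < 1
rw = np.zeros(T)    # example 2: yt = mu + y(t-1) + eps_t, random walk with drift
for t in range(1, T):
    ar1[t] = phi * ar1[t-1] + eps[t]
    rw[t] = mu + rw[t-1] + eps[t]

# AR(1): similar variance in both halves; RW: variance grows with t (Var[yt] = sigma^2 * t)
print("AR(1): var 1st half =", round(ar1[:T//2].var(), 2), " var 2nd half =", round(ar1[T//2:].var(), 2))
print("RW   : var 1st half =", round(rw[:T//2].var(), 2), " var 2nd half =", round(rw[T//2:].var(), 2))
print("Theoretical AR(1) variance:", round(sigma**2 / (1 - phi**2), 2))
```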
Stationary Series
Examples:
yt = .08 + εt + .4 εt-1,   εt ~ WN
yt = .13 yt-1 + εt

[Figure: % change in USD/GBP, 1978:I–2011:IV. x-axis: time; y-axis: % change.]


Non-Stationary Series
Examples:
yt = α t + φ1 yt-1 + φ2 yt-2 + εt,   εt ~ WN
yt = μ + yt-1 + εt    (RW with drift)

[Figure: US CPI Prices, 1978:I–2011:IV. x-axis: time; y-axis: US CPI index level.]

Time Series – Ergodicity


• We want to allow as much dependence as the LLN will allow.
• But, stationarity is not enough, as the following example shows:

• Example: Let {Ut} be a sequence of i.i.d. RVs uniformly distributed


on [0, 1] and let Z be N(0,1) independent of {Ut}.
Define Yt = Z + Ut. Then Yt is stationary (why?), but
Ȳn = (1/n) Σt=1 to n Yt  does not converge to E(Yt) = 1/2;
instead, Ȳn →p Z + 1/2.

The problem is that there is too much dependence in the sequence {Yt}
(because of Z). In fact, the correlation between Y1 and Yt is always
positive for any value of t.
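A small simulation sketch (not in the original notes; the sample sizes are arbitrary) makes the point concrete: the time average of Yt = Z + Ut settles near Z + 1/2, not near E(Yt) = 1/2, no matter how large n gets.

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal()                   # a single draw of Z ~ N(0,1), shared by the whole path
for n in (100, 10_000, 1_000_000):
    U = rng.uniform(0.0, 1.0, n)   # i.i.d. U[0,1] sequence
    Y = Z + U                      # stationary, but not ergodic for the mean
    print(f"n={n:>9}: time average = {Y.mean():.4f},  Z + 1/2 = {Z + 0.5:.4f},  E(Yt) = 0.5")
```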


Time Series – Ergodicity of the Mean


• We want to estimate the mean of the process {Zt}, μ(Zt). But, we
need to distinguish between the ensemble average and the time average:

- Ensemble average:  z̄ = (1/m) Σi=1 to m Zi   (an average across m realizations)

- Time-series average:  z̄ = (1/n) Σt=1 to n Zt   (an average over time, one realization)

Q: Which estimator is the most appropriate?
A: The ensemble average. But it is impossible to calculate: we only observe
one realization of {Zt}.

• Q: Under which circumstances can we use the time average (based on only one
realization of {Zt})? Is the time average an unbiased and consistent
estimator of the mean? The Ergodic Theorem gives us the answer.

Time Series – Ergodicity of the Mean


• Recall the sufficient conditions for consistency of an estimator: the
estimator is asymptotically unbiased and its variance asymptotically
collapses to zero.

1. Q: Is the time average asymptotically unbiased? Yes.
E(z̄) = (1/n) Σt E(Zt) = (1/n) Σt μ = μ

2. Q: Is the variance going to zero as n grows? It depends.

var(z̄) = (1/n²) Σt=1 to n Σs=1 to n Cov(Zt, Zs) = (γ0/n²) Σt=1 to n Σs=1 to n ρt−s
= (γ0/n²) [(ρ0 + ρ1 + ... + ρn−1) + (ρ1 + ρ0 + ρ1 + ... + ρn−2) + ...
+ (ρ−(n−1) + ρ−(n−2) + ... + ρ0)]


Time Series – Ergodicity of the Mean


n 1
0 0
  (1 
k
var( z )  2
( n  k ) k  ) k
n k   ( n  1)
n k
n

0
 (1 
k ?
lim var( z )  lim ) k 
 0
n  n  n n
k

• If the Zt were uncorrelated, the variance of the time average would


be O(n-1). Since independent random variables are necessarily
uncorrelated (but not vice versa), we have just recovered a form of the
LLN for independent data.

Q: How can we make the remaining part, the sum over the upper
triangle of the covariance matrix, go to zero as well?
A: We need to impose conditions on ρk: conditions weaker than "they
are all zero," but strong enough to exclude the sequence-of-identical-
copies case.

Time Series – Ergodicity of the Mean


• We use two inequalities to put upper bounds on the variance of the
time average:
n 1 n t n 1 n t n 1 

 t 1 k 1
k  
t 1 k 1
k   
t 1 k 1
k

Covariances can be negative, so we upper-bound the sum of the actual


covariances by the sum of their magnitudes. Then, we extend the inner
sum so it covers all lags. This might of course be infinite (sequence-of-
identical-copies).
• Definition: A covariance-stationary process is ergodic for the mean if
plim z̄ = E(Zt) = μ.

Ergodicity Theorem: A sufficient condition for ergodicity for the mean is
γk → 0 as k → ∞.


Time Series – Ergodicity of 2nd Moments

• A sufficient condition to ensure ergodicity for second moments is:
Σk |γk| < ∞

A process which is ergodic in the first and second moments is usually
referred to as ergodic in the wide sense.

• Ergodicity under Gaussian Distribution
If {Zt} is a stationary Gaussian process, Σk |γk| < ∞
is sufficient to ensure ergodicity for all moments.

Note: Recall that only the first two moments are needed to describe
the normal distribution.

Time Series – Ergodicity – Theorems


• We state two theorems essential to the analysis of stationary time
series. They are difficult to prove in general.

Theorem I
If yt is strictly stationary and ergodic and xt = f(yt, yt-1, yt-2, ...) is an RV,
then xt is strictly stationary and ergodic.

Theorem II (Ergodic Theorem)
If yt is strictly stationary and ergodic and E|yt| < ∞, then as T → ∞:
(1/T) Σt yt  →p  E[yt]

• These results allow us to consistently estimate parameters using


time-series moments.


Time Series - MDS


• Definition: εt is a martingale difference sequence (MDS) if
E[εt| Ft-1]=0.

• Regression errors are naturally a MDS. Some time-series processes


may be a MDS as a consequence of optimizing behaviour. For
example, most asset pricing models imply that asset returns should be
the sum of a constant plus a MDS.

• Useful property: εt is uncorrelated with any function of the lagged
information Ft-1. Then, for k > 0, E[yt-k εt] = 0.

Time Series – MDS CLT


Theorem (MDS CLT)
If ut is a strictly stationary and ergodic MDS and E[ut ut'] = Ω < ∞, then as T → ∞:
(1/√T) Σt ut  →d  N(0, Ω)

• Application: Let xt = {yt-1, yt-2, ...}, a vector of lagged yt’s.
Then (xt εt) is an MDS. We can apply the MDS CLT Theorem. Then,
(1/√T) Σt xt εt  →d  N(0, Ω),   Ω = E[xt xt' εt²]

• Like in the derivation of asymptotic distribution of OLS, the above


result is the key to establish the asymptotic distribution in a time series
context.


Autoregressive (AR) Process


• We want to model the conditional expectation of yt:
E[yt|Ft-1]
where Ft-1 = {yt-1, yt-2 , yt-3, ...} is the past history of the series. We
assume the error term, εt = yt – E[yt|Ft-1], follows a WN(0,σ2).

• An AR process models E[yt|Ft-1] with lagged dependent variables.

• The most common models are AR models. An AR(1) model


involves a single lag, while an AR(p) model involves p lags.

Example: A linear AR(p) model (the most popular in practice):
yt = μ + φ1 yt-1 + φ2 yt-2 + ... + φp yt-p + εt
with E[εt|Ft-1] = 0.

AR Process – Lag Operator


• Define the operator L as
L^k zt = zt-k
• It is usually called the lag operator, but it can also produce lead (forward)
variables, for negative values of k. For example:
L^(-3) zt = zt+3

• Also note that if c is a constant, then L c = c.

• Sometimes the notation for L when working as a lag operator is B


(backshift operator), and when working as a forward operator is F.

• Important application: Differencing
Δzt = (1 − L) zt = zt − zt-1
Δ^d zt = (1 − L)^d zt

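In code, differencing is just adjacent subtraction. A one-line NumPy illustration (the series z below is an arbitrary example, not data from the notes):

```python
import numpy as np

z = np.array([100.0, 102.0, 101.0, 105.0])
dz = np.diff(z)        # (1 - L) z_t = z_t - z_{t-1}
d2z = np.diff(z, n=2)  # (1 - L)^2 z_t, second difference
print(dz, d2z)
```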

Autoregressive (AR) Process


• Let’s work with the linear AR(p) model:
yt = μ + φ1 yt-1 + φ2 yt-2 + ... + φp yt-p + εt = μ + Σi=1 to p φi yt-i + εt

yt = μ + Σi=1 to p φi L^i yt + εt        (L: lag operator)

• We can write this process as:
Φ(L) yt = μ + εt,   where Φ(L) = 1 − φ1 L − φ2 L² − ... − φp L^p
Φ(L) is called the autoregressive polynomial of yt. Note that
yt = Φ(L)⁻¹ (μ + εt)
delivers an infinite sum on the εt-j’s  ⟹  an MA(∞) process!

• Q: Can we do this inversion?

AR Process - Stationarity
• Let’s compute moments of yt using the infinite sum (assume μ = 0):
E[yt] = Φ(L)⁻¹ E[εt] = 0        (provided Φ(L) ≠ 0)
Var[yt] = Φ(L)⁻² Var[εt] = Φ(L)⁻² σ²        (provided Φ(L)² ≠ 0)
E[yt yt-j] = γ(j) = E[(φ1 yt-1 yt-j + φ2 yt-2 yt-j + ... + φp yt-p yt-j + εt yt-j)]
           = φ1 γ(j − 1) + φ2 γ(j − 2) + ... + φp γ(j − p)

where, abusing notation,
Φ(L)⁻² = 1/(1 − φ1² L − φ2² L² − ... − φp² L^p)

Using the fundamental theorem of algebra, Φ(z) can be factored as
Φ(z) = (1 − r1⁻¹ z)(1 − r2⁻¹ z) ... (1 − rp⁻¹ z)
where the r1, ..., rp ∈ C are the roots of Φ(z). If the φj coefficients are
all real, the roots are either real or come in complex conjugate pairs.


AR Process – Stationarity
Theorem: The linear AR(p) process is strictly stationary and ergodic
if and only if |rj|>1 for all j, where |rj| is the modulus of the
complex number rj.

• We usually say “all roots lie outside the unit circle.”

Note: If one of the rj’s equals 1, Φ(L) (and yt) has a unit root –i.e.,
Φ(1) = 0. This is a special case of non-stationarity.

• Recall (L)-1 produces an infinite sum on the εt-j’s. If this sum does
not explode, we say the process is stable.

• If the process is stable, we can calculate δyt/δεt-j: How much yt is


affected today by an innovation (a shock) t  j periods ago. We call this
the impulse response function (IRF).

AR Process – Example: AR(1)


Example: AR(1) process
yt = μ + φ yt-1 + εt
E[yt] = (μ + E[εt])/(1 − φ) = μ/(1 − φ) = μ*        ⟸ requires φ ≠ 1 (r1 = 1/φ ≠ 1)
Var[yt] = Var[εt]/(1 − φ²) = σ²/(1 − φ²);  since σ² > 0  ⟹  |φ| < 1 (|r1| > 1)

Note: 1/(1 − φ^i) = Σj=0 to ∞ φ^(i·j),   i = 1, 2.

These infinite sums will not explode (stable process) if
|φ| < 1  ⟹  stationarity condition.

Under this condition, we can calculate the impulse response function:
δyt/δεt-j = φ^j


AR Process – Example: AR(1)


• The autocovariance function is:
γk = Cov(Yt, Yt-k) = E[(Yt − μ)(Yt-k − μ)]
γk = E[(φ(Yt-1 − μ) + εt)(Yt-k − μ)]
γk = φ E[(Yt-1 − μ)(Yt-k − μ)] + E[εt (Yt-k − μ)]
γk = φ γk-1 + E[εt (Yt-k − μ)] = φ γk-1        (k ≥ 1)

• There is a recursive formula for γk:
γ1 = φ γ0
γ2 = φ γ1 = φ² γ0
γk = φ^k γ0

• Again, when |φ| < 1, the autocovariances do not explode as k
increases. There is an exponential decay towards zero.
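The sketch below (illustrative only; φ = 0.8, σ = 1 and T = 100,000 are arbitrary choices) checks the recursion γk = φ^k γ0 by comparing sample autocovariances of a simulated AR(1) with their theoretical values.

```python
import numpy as np

rng = np.random.default_rng(2)
T, phi, sigma = 100_000, 0.8, 1.0

y = np.zeros(T)
eps = rng.normal(0.0, sigma, T)
for t in range(1, T):
    y[t] = phi * y[t-1] + eps[t]

gamma0 = sigma**2 / (1 - phi**2)   # theoretical variance of the AR(1)
ybar = y.mean()
for k in range(5):
    sample_gamma = np.mean((y[k:] - ybar) * (y[:T-k] - ybar))   # sample autocovariance at lag k
    print(f"k={k}: sample gamma_k = {sample_gamma:6.3f}   theory phi^k * gamma0 = {phi**k * gamma0:6.3f}")
```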

AR Process – Example: AR(1)

• Note: γk = φ^k γ0
– when 0 < φ < 1  ⟹  all autocovariances are positive.
– when −1 < φ < 0  ⟹  the sign of γk alternates, beginning with a
negative value (γ1 < 0).

• The AR(1) process has the Markov property: the distribution of Yt
given {Yt-1, Yt-2, ...} is the same as the distribution of Yt given {Yt-1}.


AR Process – Example: AR(2)


Example: AR(2) process
yt = μ + φ1 yt-1 + φ2 yt-2 + εt  ⟹  (1 − φ1 L − φ2 L²) yt = μ + εt

We can invert (1 − φ1 L − φ2 L²) to get the MA(∞) process.

• Stationarity Check
– E[yt] = μ/(1 − φ1 − φ2) = μ*  ⟹  requires φ1 + φ2 ≠ 1.
– Var[yt] = σ²/(1 − φ1² − φ2²)  ⟹  requires φ1² + φ2² < 1
Stationarity condition: |λ1|, |λ2| < 1 (see the eigenvalue check below).

• The analysis can be simplified by rewriting the AR(2) in matrix form
as an AR(1):
[yt; yt-1] = [μ; 0] + [φ1  φ2; 1  0] [yt-1; yt-2] + [εt; 0]   ⟺   ỹt = μ̃ + A ỹt-1 + ε̃t

Note: Now, we check the eigenvalues λi (i = 1, 2) of A for the stationarity conditions.

AR Process – Stationarity
ỹt = μ̃ + A ỹt-1 + ε̃t  ⟹  ỹt = [I − AL]⁻¹ (μ̃ + ε̃t)

Note: Recall [I − F]⁻¹ = Σj=0 to ∞ F^j = I + F + F² + ...

Checking that [I − AL] is not singular is the same as checking that A^j does
not explode. The stability of the system can be determined by the
eigenvalues of A. That is, get the λi’s and check if |λi| < 1 for all i.

A = [φ1  φ2; 1  0]   ⟹   |A − λI| = det [φ1 − λ   φ2; 1   −λ] = λ² − φ1 λ − φ2

• If |λi| < 1 for all i = 1, 2, yt is stable (it does not explode) and
stationary. Then:
φ1 + φ2 < 1,   φ2 − φ1 < 1,   |φ2| < 1

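A short sketch of this eigenvalue check (the AR(2) coefficient values below are arbitrary illustrations, not taken from the notes): build the companion matrix A and verify that all |λi| < 1.

```python
import numpy as np

def is_stationary(phis):
    """Stability check for an AR(p) via the eigenvalues of its companion matrix A."""
    p = len(phis)
    A = np.zeros((p, p))
    A[0, :] = phis                # first row: (phi_1, ..., phi_p)
    A[1:, :-1] = np.eye(p - 1)    # identity block on the subdiagonal
    lam = np.linalg.eigvals(A)
    return bool(np.all(np.abs(lam) < 1)), lam

for phis in ([0.5, 0.3], [0.9, 0.3], [1.1, -0.2]):
    ok, lam = is_stationary(phis)
    print(phis, "-> stationary:", ok, " |eigenvalues| =", np.round(np.abs(lam), 3))
```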

AR Process – Stationarity
• The autocovariance function is given by:
γk = E[(Yt − μ)(Yt-k − μ)]
   = E[(φ1(Yt-1 − μ) + φ2(Yt-2 − μ) + εt)(Yt-k − μ)]
   = φ1 γk-1 + φ2 γk-2 + E[εt (Yt-k − μ)]

• Again a recursive formula. Let’s get the first autocovariances:
γ0 = φ1 γ1 + φ2 γ2 + E[εt (Yt − μ)] = φ1 γ1 + φ2 γ2 + σ²

γ1 = φ1 γ0 + φ2 γ1 + E[εt (Yt-1 − μ)] = φ1 γ0 + φ2 γ1   ⟹   γ1 = φ1 γ0 / (1 − φ2)

γ2 = φ1 γ1 + φ2 γ0 + E[εt (Yt-2 − μ)] = φ1 γ1 + φ2 γ0   ⟹   γ2 = [φ1²/(1 − φ2) + φ2] γ0

AR Process – Stationarity
• The AR(2) in matrix AR(1) form is called Vector AR(1) or VAR(1).

Nice property: The VAR(1) is Markov -i.e., forecasts depend only on


today’s data.

• It is straightforward to apply the VAR formulation to any AR(p)
process. We can also use the same eigenvalue conditions to check
the stationarity of AR(p) processes.


AR Process – Causality
• The AR(p) model:
Φ(L) yt = μ + εt,   where Φ(L) = 1 − φ1 L − φ2 L² − ... − φp L^p

Then, yt = Φ(L)⁻¹ (μ + εt)  ⟹  an MA(∞) process!

• But, we need to make sure that we can invert the polynomial Φ(L).

• When Φ(L) ≠ 0, we say the process yt is causal (strictly speaking, a
causal function of {εt}).

Definition: A linear process {yt} is causal if there is a
ψ(L) = 1 + ψ1 L + ψ2 L² + ...
with Σj=0 to ∞ |ψj| < ∞
such that yt = ψ(L) εt.

AR Process – Causality
Example: AR(1) process:
Φ(L) yt = μ + εt,  where Φ(L) = 1 − φ1 L

Then, yt is causal if and only if:
|φ1| < 1,
or, equivalently, the root r1 of the polynomial Φ(z) = 1 − φ1 z satisfies |r1| > 1.

• Q: How do we calculate the ψ coefficients for an AR(p)?
A: Matching coefficients. For the AR(1):
yt = μ* + [1/(1 − φ1 L)] εt = μ* + Σi=0 to ∞ φ1^i L^i εt
   = μ* + (1 + φ1 L + φ1² L² + ...) εt        ⟹   ψi = φ1^i,  i ≥ 0


AR Process – Calculating the ψ’s


• Example: AR(2) – calculating the ψ’s by matching coefficients.

(1 − φ1 L − φ2 L²) yt = μ + εt   ⟹   yt = μ* + ψ(L) εt,  with ψ(L) = (1 − φ1 L − φ2 L²)⁻¹

ψ0 = 1
ψ1 = φ1
ψ2 = φ1² + φ2
ψ3 = φ1³ + 2 φ1 φ2
ψj = φ1 ψj-1 + φ2 ψj-2,   j ≥ 2

We can solve these linear difference equations in several ways (see the sketch below):
- numerically,
- by guessing the form of a solution and using an inductive proof, or
- by using the theory of linear difference equations.
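A minimal sketch of the numerical route (the φ values are arbitrary): compute the ψ weights of an AR(2) directly from the recursion ψj = φ1 ψj-1 + φ2 ψj-2.

```python
import numpy as np

def psi_weights(phi1, phi2, n):
    """MA(infinity) weights of an AR(2), via psi_j = phi1*psi_{j-1} + phi2*psi_{j-2}."""
    psi = np.zeros(n)
    psi[0] = 1.0          # psi_0 = 1
    if n > 1:
        psi[1] = phi1     # psi_1 = phi_1
    for j in range(2, n):
        psi[j] = phi1 * psi[j-1] + phi2 * psi[j-2]
    return psi

print(np.round(psi_weights(0.5, 0.3, 8), 4))
# check: psi_2 = phi1^2 + phi2 = 0.55, psi_3 = phi1^3 + 2*phi1*phi2 = 0.425
```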

AR Process – Estimation and Properties


• Define
xt = (1, yt-1, yt-2, ..., yt-p)'
φ = (μ, φ1, φ2, ..., φp)'
Then the model can be written as yt = xt'φ + εt.

• The OLS estimator is b = φ̂ = (X'X)⁻¹ X'y.

• Recall that ut = xt εt is an MDS. It is also strictly stationary and ergodic, so
(1/T) Σt xt εt = (1/T) Σt ut  →p  E[ut] = 0.

• The vector xt is strictly stationary and ergodic, and by Theorem I so
is xt xt'. Then, by the Ergodic Theorem,
(1/T) Σt xt xt'  →p  E[xt xt'] = Q

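A sketch of this regression in Python/NumPy (the simulated AR(2) DGP and the sample size are arbitrary choices): build xt = (1, yt-1, ..., yt-p) and run least squares.

```python
import numpy as np

rng = np.random.default_rng(3)
T, mu, phis = 2000, 0.2, np.array([0.6, -0.2])   # an AR(2) used as the DGP
p = len(phis)

y = np.zeros(T)
for t in range(p, T):
    y[t] = mu + phis @ y[t-p:t][::-1] + rng.normal()   # uses y[t-1], ..., y[t-p]

# Regressor matrix: a constant plus p lags of y
X = np.column_stack([np.ones(T - p)] + [y[p-1-i:T-1-i] for i in range(p)])
Y = y[p:]
b = np.linalg.lstsq(X, Y, rcond=None)[0]             # b = (X'X)^(-1) X'Y
print("true (mu, phi1, phi2):", mu, *phis)
print("OLS  (mu, phi1, phi2):", np.round(b, 3))
```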

AR Process – Estimation and Properties


• Consistency
Putting together the previous results, the OLS estimator can be
rewritten as:
1
1  1 
b  ˆ  ( X ' X ) 1 X ' y    
T

 t
xt xt ' 



T

 x '  
t
t t

xt

Then,
1
1  1 
b  
T

 t
xt xt ' 



T

 x '     Q
t
t t
p 1
0 

 the OLS estimator is consistent.

AR Process – Asymptotic Distribution


• Asymptotic Normality
We apply the MDS CLT to xtεt. Then, it is straightforward to derive
the asymptotic distribution of the estimator (similar to the OLS case):

Theorem: If the AR(p) process yt is strictly stationary and ergodic
and E[yt⁴] < ∞, then as T → ∞:
√T (b − φ)  →d  N(0, Q⁻¹ Ω Q⁻¹),   Ω = E[xt xt' εt²]

• Identical in form to the asymptotic distribution of OLS in cross-
section regression  ⟹  asymptotic inference is the same.

• The asymptotic covariance matrix is estimated just as in the cross-


section case: The sandwich estimator.


AR Process – Bootstrap
• So far, we constructed the bootstrap sample by randomly resampling
from the data values (yt, xt). This created an i.i.d. bootstrap sample.

• This is inappropriate for time-series, since we have dependence.

• There are two popular methods to bootstrap time series.


(1) Model-Based (Parametric) Bootstrap
(2) Block Resampling Bootstrap

AR Process – Bootstrap
(1) Model-Based (Parametric) Bootstrap
1. Estimate b and the residuals e.
2. Fix an initial condition {y-p+1*, y-p+2*, ..., y0*}.
3. Simulate i.i.d. draws e* from the empirical distribution of the
residuals {e1, e2, e3, ..., eT}.
4. Create the bootstrap series yt* by the recursive formula
yt* = μ̂ + φ̂1 yt-1* + φ̂2 yt-2* + ... + φ̂p yt-p* + et*

Pros: Simple. Similar to the usual bootstrap.

Cons: This construction imposes homoskedasticity on the errors e*,
which may differ from the properties of the actual e. It also
imposes the AR(p) model as the DGP.
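A sketch of steps 1–4 for an AR(1) (illustrative only; the AR order, the number of bootstrap replications B, and the simulated DGP are arbitrary choices made here, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(4)
T, phi_true, B = 400, 0.6, 999

# Simulated AR(1) data, then step 1: estimate phi by OLS (no constant, for simplicity) and get residuals
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi_true * y[t-1] + rng.normal()
X, Y = y[:-1], y[1:]
phi_hat = (X @ Y) / (X @ X)
resid = Y - phi_hat * X

phi_star = np.empty(B)
for b in range(B):
    e_star = rng.choice(resid, size=T, replace=True)   # step 3: i.i.d. draws from the residuals
    y_star = np.zeros(T)
    y_star[0] = y[0]                                   # step 2: fix the initial condition
    for t in range(1, T):
        y_star[t] = phi_hat * y_star[t-1] + e_star[t]  # step 4: recursive bootstrap series
    Xs, Ys = y_star[:-1], y_star[1:]
    phi_star[b] = (Xs @ Ys) / (Xs @ Xs)                # re-estimate on each bootstrap sample

print("phi_hat =", round(phi_hat, 3),
      " bootstrap SE =", round(phi_star.std(ddof=1), 3),
      " 95% CI =", np.round(np.percentile(phi_star, [2.5, 97.5]), 3))
```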


AR Process – Bootstrap
(2) Block Resampling
1. Divide the sample into T/m blocks of length m.
2. Resample complete blocks. For each simulated sample, draw T/m
blocks.
3. Paste the blocks together to create the bootstrap time-series yt*.

Pros: It allows for arbitrary stationary serial correlation,


heteroskedasticity, and for model misspecification.

Cons: It may be sensitive to the block length, and the way that the
data are partitioned into blocks. May not work well in small samples.
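A sketch of non-overlapping block resampling (the block length m and the simulated series are arbitrary choices; moving-block and circular-block variants also exist):

```python
import numpy as np

rng = np.random.default_rng(5)

def block_bootstrap(y, m, rng):
    """Resample non-overlapping blocks of length m and paste them together."""
    n_blocks = len(y) // m
    blocks = y[:n_blocks * m].reshape(n_blocks, m)     # rows are the blocks
    picks = rng.integers(0, n_blocks, size=n_blocks)   # draw T/m blocks with replacement
    return blocks[picks].ravel()

# Example: AR(1) data, then one bootstrap replicate of the sample mean
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.5 * y[t-1] + rng.normal()
y_star = block_bootstrap(y, m=25, rng=rng)
print("original mean =", round(y.mean(), 3), " bootstrap replicate mean =", round(y_star.mean(), 3))
```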

Moving Average Process


• An MA process models E[yt|Ft-1] with lagged error terms. An
MA(q) model involves q lags.

• We keep the white noise assumption for εt.

Example: A linear MA(q) model:
yt = μ + εt + θ1 εt-1 + θ2 εt-2 + ... + θq εt-q = μ + Σi=1 to q θi εt-i + εt

yt = μ + Σi=1 to q θi L^i εt + εt = μ + θ(L) εt,   θ(L) = 1 + θ1 L + θ2 L² + ... + θq L^q

• Q: Is yt stationary? Check the moments.


Moving Average Process – Stationarity


• Q: Is yt stationary? Check the moments. WLOG, assume μ = 0.
E[yt] = 0
Var[yt] = (1 + θ1² + θ2² + ... + θq²) σ²
γ(j) = E[yt yt-j] = E[(θ1 εt-1 yt-j + θ2 εt-2 yt-j + ... + θq εt-q yt-j + εt yt-j)]
     = σ² Σk=j to q θk θk-j  (with θ0 = 1),  for |j| ≤ q;   γ(j) = 0 otherwise.

• It is easy to verify that the sums are finite  ⟹  MA(q) is stationary.

• Note that an MA(q) process can generate an AR process:
yt = μ + θ(L) εt   ⟹   θ(L)⁻¹ yt = μ* + εt
• θ(L)⁻¹ is an infinite-order polynomial in L. That is, we get an AR(∞):
(1 + π1 L + π2 L² + ...) yt = μ* + εt

MA Process – Invertibility
• We need to make sure that θ(L)⁻¹ is defined: we require θ(L) ≠ 0.
When this condition is met, we can write εt as a causal function of yt.
We say the MA is invertible. For this to hold, we require:
Σj=0 to ∞ |πj| < ∞

Definition: A linear process {yt} is invertible (strictly speaking, an
invertible function of {εt}) if there is a
π(L) = 1 + π1 L + π2 L² + ...
with Σj=0 to ∞ |πj| < ∞
such that εt = π(L) yt.


MA Process – Example: MA(1)


• Example: MA(1) process:
yt = μ + θ(L) εt,   θ(L) = 1 + θ1 L

- Moments
E[Yt] = μ
γ0 = Var[yt] = (1 + θ1²) σ²
γ1 = Cov[yt, yt-1] = θ1 σ²
γk = 0 for k > 1

Note: The autocovariance function is zero after lag 1.

– Invertibility: If |θ1| < 1, we can write (1 + θ1 L)⁻¹ yt + μ* = εt:
(1 − θ1 L + θ1² L² − ... + (−θ1)^j L^j + ...) yt + μ* = εt   ⟹   μ* + Σi πi L^i yt = εt

MA Process – Example: MA(2)


• Example: MA(2) process:
yt = μ + θ(L) εt,   θ(L) = 1 + θ1 L + θ2 L²
- Moments
E[Yt] = μ
γk = σ² (1 + θ1² + θ2²),   k = 0
   = σ² (θ1 + θ1 θ2),      k = 1
   = σ² θ2,                k = 2
   = 0,                    k > 2

Note: the autocovariance function is zero after lag 2.


MA Process – Example: MA(2)


- Invertibility: The roots of λ² + θ1 λ + θ2 = 0 must all lie inside the unit
circle. It can be shown that the invertibility conditions for an MA(2) process are:

θ1 + θ2 > −1
θ1 − θ2 < 1
|θ2| < 1

MA Process – Estimation
• MA processes are more complicated to estimate. In particular, there are
nonlinearities. Consider an MA(1):
yt = εt + θ εt-1

The autocorrelation is ρ1 = θ/(1 + θ²). Then, the MM estimate of θ
satisfies:
r1 = θ̂/(1 + θ̂²)   ⟹   θ̂ = [1 ± √(1 − 4 r1²)] / (2 r1)
• A nonlinear solution, and difficult to work with.

• Alternatively, if |θ| < 1, we can try a ∈ (−1, 1), compute
εt(a) = yt − a yt-1 + a² yt-2 − ...
and look (numerically) for the least-squares estimator
θ̂ = arg min_a { ST(a) = Σt=1 to T εt²(a) }

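A sketch of both routes for the MA(1) (illustrative; the true θ, the sample size and the grid for a are arbitrary choices): the MM solution from r1, and a crude numerical least-squares search over a ∈ (−1, 1).

```python
import numpy as np

rng = np.random.default_rng(6)
T, theta = 2000, 0.5
eps = rng.normal(size=T + 1)
y = eps[1:] + theta * eps[:-1]          # MA(1): y_t = eps_t + theta * eps_{t-1}

# Method of moments: r1 = theta/(1 + theta^2)  =>  theta = (1 - sqrt(1 - 4 r1^2)) / (2 r1)
r1 = np.corrcoef(y[1:], y[:-1])[0, 1]
disc = max(1.0 - 4.0 * r1**2, 0.0)      # guard: no real solution if |r1| > 1/2
theta_mm = (1 - np.sqrt(disc)) / (2 * r1)   # pick the invertible root (|theta| < 1)

# Numerical least squares: eps_t(a) = y_t - a*eps_{t-1}(a) = y_t - a*y_{t-1} + a^2*y_{t-2} - ...
def ssr(a, y):
    e = np.zeros_like(y)
    e[0] = y[0]
    for t in range(1, len(y)):
        e[t] = y[t] - a * e[t-1]
    return np.sum(e**2)

grid = np.linspace(-0.99, 0.99, 199)
theta_ls = grid[np.argmin([ssr(a, y) for a in grid])]
print("true theta =", theta, " MM estimate =", round(theta_mm, 3), " LS (grid) =", round(theta_ls, 3))
```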

The Wold Decomposition


Theorem – Wold (1938).
Any covariance stationary {yt} has an infinite-order moving-average
representation:
yt = Σj=0 to ∞ ψj εt-j + κt,   ψ0 = 1,   Σj=0 to ∞ ψj² < ∞,   εt ~ WN(0, σ²)

where κt is a deterministic term (perfectly forecastable), say κt = μ.

• yt is a linear combination of innovations over time.

• A stationary process can be represented as an MA(∞) plus a


deterministic “trend.”

The Wold Decomposition


Example:
Let xt = yt − κt. Then, check the moments:

E[xt] = E[yt − κt] = Σj=0 to ∞ ψj E[εt-j] = 0

E[xt²] = Σj=0 to ∞ ψj² E[εt-j²] = σ² Σj=0 to ∞ ψj² < ∞

E[xt xt-j] = E[(εt + ψ1 εt-1 + ψ2 εt-2 + ...)(εt-j + ψ1 εt-j-1 + ψ2 εt-j-2 + ...)]
           = σ² (ψj + ψ1 ψj+1 + ψ2 ψj+2 + ...) = σ² Σk=0 to ∞ ψk ψk+j

⟹ xt is a covariance stationary process.


ARMA Process
• A combination of AR(p) and MA(q) processes produces an
ARMA(p, q) process:
yt = μ + φ1 yt-1 + φ2 yt-2 + ... + φp yt-p + εt + θ1 εt-1 + θ2 εt-2 + ... + θq εt-q
   = μ + Σi=1 to p φi yt-i + Σi=1 to q θi L^i εt + εt

⟹  Φ(L) yt = μ + θ(L) εt

• Usually, we insist that Φ(L) ≠ 0, θ(L) ≠ 0 and that the polynomials Φ(L),
θ(L) have no common factors. This implies it is not a lower-order
ARMA model.

ARMA Process
Example: Common factors.
Suppose we have the following ARMA(2, 3) model: Φ(L) yt = θ(L) εt,
with
Φ(L) = 1 − .6L − .3L²
θ(L) = 1 + .4L − .9L² − .3L³ = (1 − .6L − .3L²)(1 + L)

This model simplifies to: yt = (1 + L) εt  ⟹  an MA(1) process.

• Pure AR representation: Π(L) (yt − μ) = at,  with Π(L) = Φp(L)/θq(L)

• Pure MA representation: (yt − μ) = Ψ(L) at,  with Ψ(L) = θq(L)/Φp(L)

• Special ARMA(p, q) cases:  – p = 0: MA(q)
                             – q = 0: AR(p)


ARMA: Stationarity, Causality and Invertibility


Theorem: If Φ(L) and θ(L) have no common factors, a (unique)
stationary solution to Φ(L) yt = θ(L) εt exists if and only if
Φ(z) = 1 − φ1 z − φ2 z² − ... − φp z^p ≠ 0  for |z| = 1.

This ARMA(p, q) model is causal if and only if
Φ(z) ≠ 0  for |z| ≤ 1.

This ARMA(p, q) model is invertible if and only if
θ(z) = 1 + θ1 z + θ2 z² + ... + θq z^q ≠ 0  for |z| ≤ 1.

• Note: Real data cannot be exactly modeled using a finite number of
parameters. We choose p, q to create a good approximate model.
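A sketch of the root check with numpy.roots (the ARMA(2,1) coefficients below are arbitrary illustrations): causality requires all roots of Φ(z) outside the unit circle, invertibility all roots of θ(z) outside the unit circle.

```python
import numpy as np

def roots_outside_unit_circle(coefs):
    """Roots of 1 + c1*z + ... + ck*z^k; np.roots expects the highest power first."""
    poly = np.array([1.0] + list(coefs))[::-1]
    r = np.roots(poly)
    return np.abs(r), bool(np.all(np.abs(r) > 1))

# ARMA(2,1): Phi(z) = 1 - 0.5 z - 0.3 z^2 (pass the -phi's), theta(z) = 1 + 0.4 z
phi_mod, causal = roots_outside_unit_circle([-0.5, -0.3])
theta_mod, invertible = roots_outside_unit_circle([0.4])
print("|roots of Phi(z)|   =", np.round(phi_mod, 3), "-> causal:", causal)
print("|roots of theta(z)| =", np.round(theta_mod, 3), "-> invertible:", invertible)
```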

ARMA Process – SDE Representation


• Consider the ARMA(p, q) model:
Φ(L)(yt − μ) = θ(L) εt
Let xt = yt − μ and wt = θ(L) εt.

Then, xt = φ1 xt-1 + φ2 xt-2 + ... + φp xt-p + wt

⟹ xt is a p-th-order linear stochastic difference equation (SDE).

Example: 1st-order SDE (AR(1)):  xt = φ xt-1 + εt
Recursive solution (Wold form):
xt = φ^(t+1) x-1 + Σi=0 to t φ^i εt-i
where x-1 is an initial condition.


ARMA Process – Dynamic Multiplier


• The dynamic multiplier measures the effect of εt on subsequent
values of xt. That is, it is the first derivative of the Wold representation:
δxt+j/δεt = δxj/δε0 = ψj

For an AR(1) process:
δxt+j/δεt = δxj/δε0 = φ^j

• That is, the dynamic multiplier for any linear SDE depends only on
the length of time j, not on time t.

ARMA Process – Impulse Response Function


• The impulse-response function (IRF) is the sequence of dynamic multipliers,
as a function of time, from a one-time change in the innovation, εt.

• Usually, the IRF is represented with a graph that measures the effect
of the innovation, εt, on yt over time:
δyt+j/δεt, δyt+j+1/δεt, δyt+j+2/δεt, ...  =  ψj, ψj+1, ψj+2, ...

• Once we estimate the ARMA coefficients, it is easy to draw an IRF.
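A sketch of how the IRF can be traced out numerically (the ARMA(2,1) coefficients below are arbitrary): feed a one-time unit shock ε0 = 1 (all other ε's zero) through yt = φ1 yt-1 + φ2 yt-2 + εt + θ1 εt-1 and record the responses ψj.

```python
import numpy as np

def irf(phis, thetas, horizon):
    """Response of y_{t+j} to a one-time unit shock in eps_t, for j = 0, ..., horizon-1."""
    p, q = len(phis), len(thetas)
    eps = np.zeros(horizon)
    eps[0] = 1.0                                   # unit impulse at j = 0
    y = np.zeros(horizon)
    for j in range(horizon):
        ar_part = sum(phis[i] * y[j-1-i] for i in range(p) if j-1-i >= 0)
        ma_part = eps[j] + sum(thetas[i] * eps[j-1-i] for i in range(q) if j-1-i >= 0)
        y[j] = ar_part + ma_part
    return y

print(np.round(irf(phis=[0.5, 0.3], thetas=[0.4], horizon=8), 4))
# For a pure AR(1) with phi = 0.6 the responses are simply 0.6**j:
print(np.round(irf(phis=[0.6], thetas=[], horizon=6), 4))
```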


ARMA Process – Addition


• Q: If we add two ARMA processes, what order do we get?

• Adding MA processes
xt = A(L) εt
zt = C(L) ut
yt = xt + zt = A(L) εt + C(L) ut
- Under independence:
γy(j) = E[yt yt-j] = E[(xt + zt)(xt-j + zt-j)]
      = E[xt xt-j + zt zt-j] = γx(j) + γz(j)
- Then, γ(j) = 0 for j > max(qx, qz)   ⟹   yt is ARMA(0, max(qx, qz))

- Implication: MA(2) + MA(1) = MA(2)

ARMA Process – Addition


• Q: If we add two AR processes, what order do we get?

• Adding AR processes
(1 − A(L)) xt = εt
(1 − C(L)) zt = ut
yt = xt + zt = ?
- Rewrite the system as:
(1 − C(L))(1 − A(L)) xt = (1 − C(L)) εt
(1 − A(L))(1 − C(L)) zt = (1 − A(L)) ut
(1 − A(L))(1 − C(L)) yt = (1 − C(L)) εt + (1 − A(L)) ut = εt + ut − [C(L) εt + A(L) ut]

- Then, yt is ARMA(px + pz, max(px, pz))

