• We will discuss the modern approach to economic time series
– We will start with a class of specifications aimed at predicting the behavior of a variable using only information contained in the past values of this variable and possibly past values of the error term
– y(t) = f(y(t-1), y(t-2), …, y(t-p), e(t-1), e(t-2), …, e(t-q))
– This is different from multivariate models, in which we try to predict the behavior of a variable using the past or current behavior of other variables
– y(t) = f(x1(t-1), x1(t-2), …, x1(t-p), x2(t-1), x2(t-2), …, x2(t-p), …, xn(t-1), xn(t-2), …, xn(t-p))
• Univariate time series models are usually atheoretical, but they can produce good forecasts
• They are also useful because we may not have other variables at the same frequency
– For instance, we want to predict stock returns but we do not have daily data for fundamentals
• Economists started using this modern approach to
time series analysis when they realized that
purely statistical models used by statisticians
were much better at forecasting than structural
models used by economists
• Economists did not use to worry about the
stationarity of the series they used
– However, a series of results showed that standard
econometric techniques applied to non-stationary data
could yield results that did not make any sense
• By the way, what does stationarity mean?
• Think of time series analysis as an attempt to estimate a difference equation with a stochastic component.
– The problem is that we have no clue about the form of this difference equation
– The difficult thing is to choose which equation we should estimate:
• y(t) = a·y(t-1) + u(t)
• y(t) = a·y(t-1) + b·y(t-2) + u(t)
• y(t) = a(1)y(t-1) + … + a(n)y(t-n) + u(t)
• y(t) = a·y(t-1) + b·u(t) + c·u(t-1)
• y(t) = a(1)y(t-1) + … + a(n)y(t-n) + b(1)u(t) + … + b(n)u(t-n+1)
Objective of this class
• Learn how to forecast
• 3 steps
1. Take a time series
2. Use the Box and Jenkins methodology to
identify the characteristics of this series (its
moving average and autoregressive
components)
3. Forecast
We need some definitions
• They will not make sense at the beginning
– Hopefully, they will make sense later on
– So bear with me
• Stationarity
• Autocovariance and Autocorrelation
• White Noise Process
• Moving average (MA)
• Autoregressive process (AR)
• Wold Decomposition
• Random walk
• Partial autocorrelation function (PACF)
Stationarity
• For the moment we will work with stationary
processes
• A strictly stationary process is a process in
which the probability measure for the sequence
{y(t)} is the same as that for the sequence
{y(t+k)} and this holds for all possible k.
– This means that the distribution of the variable is the
same at any point in time.
• We will use a weaker concept of stationarity:
weak stationarity or covariance stationarity.
• A series is covariance (or weakly) stationary if:
1. E(y(t)) = μ
2. E[(y(t) − μ)²] = σ² < ∞
3. E[(y(t) − μ)(y(t-s) − μ)] = E[(y(t-k) − μ)(y(t-s-k) − μ)] = γ(s) for s = 0, 1, 2, 3, …
– The first condition states that the series needs to have a constant mean
– The second condition states that the series needs to have a constant and finite variance
– The third condition says that the autocovariance depends only on the distance (s) between two observations
Is y(t) = b·t + u(t) stationary? (It is not: its mean b·t changes with t, violating the first condition.)
Strict versus Weak
• Strict stationarity requires that the joint
distribution of any group of observations does
not depend on t.
• In general weak stationarity does not imply
strong stationarity because weak stationarity is
only concerned with the first two moments of the
distribution (the mean and covariances).
• An exception is a series that is normally
distributed
– In this case, weak stationarity implies strong
stationarity because the normal distribution is fully
characterized by its first two moments.
Why do we care?
• Nonstationary series are much more difficult to analyze
than stationary series.
• In standard econometrics, you learn about things like the
central limit theorem and consistency of estimators.
Those concepts can be generalized in straightforward
ways to apply to stationary time series.
• For nonstationary series, the distributions of estimators
are much more complex and are not simple
generalizations of the familiar distributions of estimators
for cross-sectional data.
• The most common approach is to transform a series so
that it is stationary and then proceed from there.
– Not so when we do cointegration
For the moment let’s assume we
work with stationary series
• Next class, we’ll learn how to test for stationarity
• Let’s talk about autocovariance and
autocorrelation
Remember the autocovariance:
E[(y(t) − μ)(y(t-s) − μ)] = E[(y(t-k) − μ)(y(t-s-k) − μ)] = γ(s) for s = 0, 1, 2, 3, …
• Since the autocovariance depends on the units of {y(t)}, we normalize the autocovariance by the variance of the series. This is the autocorrelation (AC):
τ(s) = γ(s)/γ(0)
• The AC is like a correlation and is bounded between -1 and +1. Of course τ(0) = 1.
– A plot of τ(s) versus s is called the autocorrelation function (ACF) or correlogram
• Note that if y(t) is stationary with normally distributed errors, then the sample autocorrelation is also approximately normally distributed:
τ̂(s) ~ approximately N(0, 1/T)
• where T is the sample size.
• This is cool, because we can use the above result to build confidence intervals for the autocorrelation coefficient
• The 95% confidence interval is:
τ̂(s) ± 1.96/√T
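A quick check of this in Stata (a sketch; corr y l.y is only an approximation to the first sample autocorrelation, since it demeans the two series separately):
* approximate first-order autocorrelation and the half-width of the 95% band
corr y l.y
display 1.96/sqrt(_N)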
• We can also have a joint test that all the autocorrelation coefficients up to m (the maximum lag length) are jointly equal to zero.
• The original test was the Box–Pierce Q statistic:
Q = T·Σ_{k=1}^{m} τ̂(k)²
• A small-sample refinement is the Ljung–Box statistic, which is what we compute below:
Q* = T(T+2)·Σ_{k=1}^{m} τ̂(k)²/(T−k)
• Under the null, both are distributed χ² with m degrees of freedom.
• ac y
[Figure: ACF of y over 40 lags, declining toward zero; Bartlett's formula for MA(q) 95% confidence bands]
• Note that we can do the calculation by hand
• The first 3 autocorrelations are: 0.695, 0.494, 0.333
• The confidence intervals are given by τ̂(s) ± 1.96/√T:
– 0.695 ± 1.96/1000^0.5 = 0.695 ± 0.06
– 0.494 ± 0.06
– 0.333 ± 0.06
• Q* (Ljung–Box) for m = 3 is
– 1000·1002·(0.695²/999 + 0.494²/998 + 0.333²/997) = 841
– We reject the null that all of the first 3 autocorrelations are zero
• Is this the number produced by Stata?
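We can check with Stata's portmanteau test (a sketch; wntestq computes the Ljung–Box Q, so it should reproduce roughly the same number):
wntestq y, lags(3)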
White Noise
• A white noise process is a process with constant mean, constant variance, and zero autocovariance except at lag zero.
– Or: in a white noise process each observation is uncorrelated with all other values in the sequence.
• So y(t) is a white noise process if:
1. E(y(t)) = μ
2. var(y(t)) = σ² < ∞
3. γ(s) = 0 for all s ≠ 0
White Noise
• STATA has a white noise test
. wntestq y, lag(20)
. wntestq e, lag(20)
Moving average
• The MA is the simplest time series process
• Let u be a WN process with E(u) = 0 and var(u) = σ²
• Then the following process is a moving average of order q, written MA(q):
y(t) = μ + u(t) + θ1·u(t-1) + θ2·u(t-2) + … + θq·u(t-q)
Moving average
• So an MA process is just a linear combination of WN processes
• Lag operator: L·y(t) = y(t-1) and L^i·y(t) = y(t-i)
• So:
y(t) = u(t) + Σ_{i=1}^{q} θi·L^i·u(t)
• Or y(t) = θ(L)·u(t), where:
θ(L) = 1 + θ1·L + θ2·L² + θ3·L³ + … + θq·L^q
Moving average
• From now on, we will set μ = 0
• Example: MA(2) process y(t) = u(t) + θ1·u(t-1) + θ2·u(t-2)
• The mean of y(t) is E(y(t)) = E(u(t)) + θ1·E(u(t-1)) + θ2·E(u(t-2)) = 0
• The variance is var(y(t)) = E[(y(t) − E(y(t)))(y(t) − E(y(t)))]
• Since E(y(t)) = 0, var(y(t)) = E[y(t)·y(t)]:
var(y(t)) = E[(u(t) + θ1·u(t-1) + θ2·u(t-2))(u(t) + θ1·u(t-1) + θ2·u(t-2))]
var(y(t)) = E[u(t)² + θ1²·u(t-1)² + θ2²·u(t-2)² + θ1θ2·u(t-1)u(t-2) + θ1·u(t-1)u(t) + θ2·u(t-2)u(t) + …] = (1 + θ1² + θ2²)·σ²
• The autocorrelations are:
τ(1) = γ(1)/γ(0) = (θ1 + θ1θ2)/(1 + θ1² + θ2²)
τ(2) = γ(2)/γ(0) = θ2/(1 + θ1² + θ2²)
τ(3) = 0, and τ(s) = 0 for all s > 2
In general
• If you have an MA(q) process
• y(t) = μ + e(t) + θ1·e(t-1) + θ2·e(t-2) + … + θq·e(t-q)
– E(y(t)) = μ
– var(y(t)) = (1 + θ1² + θ2² + … + θq²)·σ²
– γ(1) = (θ1 + θ1θ2 + θ2θ3 + … + θ(q-1)θq)·σ²
– γ(2) = (θ2 + θ1θ3 + θ2θ4 + … + θ(q-2)θq)·σ²
– γ(q) = θq·σ²
– γ(m) = 0 for all m > q
• Again with STATA:
clear
set obs 1000
gen e = invnorm(uniform())
gen t = _n-1
tsset t
gen y = 0 if t==0
replace y = 0 if t==1
replace y = e-0.5*l.e+0.25*l2.e if t>1
corrgram y, lag(20)
• We expect:
τ(1) = γ(1)/γ(0) = (-0.5 - 0.5·0.25)/(1 + 0.5² + 0.25²) = -0.48
τ(2) = 0.25/(1 + 0.5² + 0.25²) = 0.19
[corrgram output omitted]
• ac y
[Figure: ACF of the simulated MA(2) series over 40 lags — spikes at lags 1 and 2, roughly zero afterwards; Bartlett's formula for MA(q) 95% confidence bands]
Autoregressive processes
• An AR process is a process in which the current value of y depends only on past values of y plus an error term. An AR(p) is defined as:
y(t) = μ + φ1·y(t-1) + φ2·y(t-2) + φ3·y(t-3) + … + φp·y(t-p) + u(t)
• or y(t) = μ + Σ_{i=1}^{p} φi·y(t-i) + u(t)
• or y(t) = μ + Σ_{i=1}^{p} φi·L^i·y(t) + u(t)
Our friend the lag operator
• or φ(L)·y(t) = μ + u(t) with:
φ(L) = 1 − φ1·L − φ2·L² − φ3·L³ − … − φp·L^p
Back to stationarity
• Non-stationary AR models have explosive behavior: a shock today will not die out with time but will become bigger and bigger
• Start with an AR(p) model with μ = 0: φ(L)·y(t) = u(t)
• The model is stationary if it is possible to write:
y(t) = φ(L)^(-1)·u(t)
The Wold Decomposition
• Any stationary series can be decomposed into the sum of two unrelated processes
– A deterministic part
– A stochastic part represented by an MA(∞)
• This is called the WOLD DECOMPOSITION THEOREM
• So, an AR(p) process with no constant and no other terms can be expressed as an MA(∞)
• The Wold decomposition of φ(L)·y(t) = u(t) is y(t) = ψ(L)·u(t)
• With:
ψ(L) = φ(L)^(-1) = (1 − φ1·L − φ2·L² − φ3·L³ − … − φp·L^p)^(-1)
Back to stationarity
• Note that φ(L)^(-1)·u(t) can be written as an infinite MA (MA(∞)):
y(t) = u(t) + a1·u(t-1) + a2·u(t-2) + a3·u(t-3) + …
• If the process is stationary, the coefficients of the MA will decline with lag length. If the process is not stationary, the MA coefficients will not converge to zero
• An AR(p) process is stationary if the roots of the "characteristic equation" all lie outside the unit circle:
1 − φ1·z − φ2·z² − φ3·z³ − … − φp·z^p = 0
Example I
• y(t)=by(t-1)+u(t) => y(t)(1-bL)=u(t)
• The characteristic equation is 1-bz=0 and the
root is z=1/b
• If -1 < b < 1, then the root lies outside the unit circle and the process is stationary. If not, the model is not stationary
• For instance, a random walk is defined as: y(t) = y(t-1) + u(t)
• Here b = 1 and z = 1: the RW is NOT stationary
Example II
• y(t) = 3y(t-1) + 2.75y(t-2) + 0.75y(t-3) + u(t)
• y(t)·(1 − 3L − 2.75L² − 0.75L³) = u(t)
• The characteristic equation is
• 1 − 3z − 2.75z² − 0.75z³ = 0
• ….mmmm are you good with cubic equations?
Example II
• The characteristic equation is
• 1 − 3z − 2.75z² − 0.75z³ = 0 → Non-stationary
• ….mmmm are you good with cubic equations?
. mata
mata (type end to exit)
: z = polyroots((1, -3, -2.75, -0.75))
: z
[output: the three roots of the cubic]
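To read the output, look at the moduli of the roots (my addition; in Mata, abs() of a complex vector returns the moduli):
: abs(z)
Any modulus less than or equal to one flags non-stationarity; here one real root is about 0.27, well inside the unit circle, which is why the process is non-stationary.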
Example
• Compute the unconditional mean, variance and ACF of a simple AR(1) process y(t) = μ + φ1·y(t-1) + u(t)
E(y(t)) = μ + φ1·E(y(t-1))
E(y(t-1)) = μ + φ1·E(y(t-2))
E(y(t)) = μ + φ1·μ + φ1²·E(y(t-2)) = μ(1 + φ1) + φ1²·E(y(t-2))
E(y(t-2)) = μ + φ1·E(y(t-3))
E(y(t)) = μ(1 + φ1) + φ1²·[μ + φ1·E(y(t-3))] = μ(1 + φ1 + φ1²) + φ1³·E(y(t-3))
E(y(t)) = μ(1 + φ1 + φ1² + … + φ1^(n-1)) + φ1^n·E(y(t-n))
If |φ1| < 1 (i.e. if the model is stationary), lim_{n→∞} φ1^n·E(y(t-n)) = 0, so E(y(t)) = μ/(1 − φ1)

With μ = 0 for simplicity:
y(t) = φ1·y(t-1) + u(t)
y(t)·(1 − φ1·L) = u(t)
From the Wold decomposition:
y(t) = (1 − φ1·L)^(-1)·u(t)
y(t) = (1 + φ1·L + φ1²·L² + …)·u(t)
y(t) = u(t) + φ1·u(t-1) + φ1²·u(t-2) + φ1³·u(t-3) + …
As long as |φ1| < 1 this sum will converge.
Remember the formula of the variance:
var(y(t)) = E[(y(t) − E(y(t)))(y(t) − E(y(t)))]
• Variance and autocorrelations (assume μ = 0):
γ(0) = σ²/(1 − φ1²),  τ(0) = γ(0)/γ(0) = 1
γ(1) = φ1·σ²/(1 − φ1²),  τ(1) = γ(1)/γ(0) = φ1
γ(2) = φ1²·σ²/(1 − φ1²),  τ(2) = γ(2)/γ(0) = φ1²
γ(3) = φ1³·σ²/(1 − φ1²),  τ(3) = φ1³
…
γ(s) = φ1^s·σ²/(1 − φ1²),  τ(s) = φ1^s
For higher order processes getting the ACF is messier
• The mean and the autocorrelations of an AR process can be obtained as follows
• Unconditional mean:
E(y(t)) = μ/(1 − φ1 − φ2 − φ3 − … − φp)
[Figure: sample ACF of a simulated AR process, declining over 40 lags; Bartlett's formula for MA(q) 95% confidence bands]
A piece of advice about simulation
• The first observations of simulated series are influenced by the choice of y(0).
• When simulating data from an autoregressive process, it is nice to create a series much longer than what you actually want and then discard or "burn" the observations at the beginning of the series.
• In general, the larger the autoregressive coefficients (the more persistent the process), the longer the burn-in period should be
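A minimal sketch of this in Stata, in the style of the simulations above (the 1,000-observation burn-in length is an arbitrary choice):
clear
set obs 11000
gen e = invnorm(uniform())
gen t = _n-1
tsset t
gen y = 0 if t==0
replace y = 0.9*l.y + e if t>0
drop if t<1000   // "burn" the start-up observations influenced by y(0)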
THE ACF for an AR(2) process
y(t) = φ1·y(t-1) + φ2·y(t-2) + u(t); we will derive the Yule–Walker equations.
Multiply the AR process by y(t-s) for s = 0, 1, 2, 3, … and take expectations:
E[y(t)·y(t)] = φ1·E[y(t-1)·y(t)] + φ2·E[y(t-2)·y(t)] + E[u(t)·y(t)]
E[y(t)·y(t-1)] = φ1·E[y(t-1)·y(t-1)] + φ2·E[y(t-2)·y(t-1)] + E[u(t)·y(t-1)]
E[y(t)·y(t-2)] = φ1·E[y(t-1)·y(t-2)] + φ2·E[y(t-2)·y(t-2)] + E[u(t)·y(t-2)]
…
E[y(t)·y(t-s)] = φ1·E[y(t-1)·y(t-s)] + φ2·E[y(t-2)·y(t-s)] + E[u(t)·y(t-s)]

THE ACF for an AR(2) process
Because of stationarity: E[y(t)·y(t-s)] = E[y(t-k)·y(t-k-s)] = γ(s)
Moreover E[u(t)·y(t)] = σ² and E[u(t)·y(t-s)] = 0. So:
γ(0) = φ1·γ(1) + φ2·γ(2) + σ²
γ(1) = φ1·γ(0) + φ2·γ(1)   (A)
γ(2) = φ1·γ(1) + φ2·γ(0)   (B)
γ(3) = φ1·γ(2) + φ2·γ(1)   (C)
…
γ(s) = φ1·γ(s-1) + φ2·γ(s-2)   (D)
Note the strange pattern at γ(0), γ(1), γ(2). This is because γ(-1) = γ(1) and γ(-2) = γ(2).
Now we can divide (A), (B), (C), (D) by γ(0) to get autocorrelations:
τ(1) = φ1·τ(0) + φ2·τ(1)   (i)  — this is Yule–Walker
τ(2) = φ1·τ(1) + φ2·τ(0)   (ii)
τ(3) = φ1·τ(2) + φ2·τ(1)   (iii)
…
τ(s) = φ1·τ(s-1) + φ2·τ(s-2)   (iv)

Solving (i) (remember τ(0) = 1): τ(1) = φ1/(1 − φ2)
Solving (ii) and (i):
τ(2) = φ1²/(1 − φ2) + φ2
τ(3) = φ1·[φ1²/(1 − φ2) + φ2] + φ2·φ1/(1 − φ2)
….

For a general AR(p) the Yule–Walker system is:
τ(1) = φ1 + τ(1)·φ2 + τ(2)·φ3 + … + τ(p-1)·φp
τ(2) = τ(1)·φ1 + φ2 + τ(1)·φ3 + … + τ(p-2)·φp
τ(3) = τ(2)·φ1 + τ(1)·φ2 + φ3 + … + τ(p-3)·φp
…
τ(p) = τ(p-1)·φ1 + τ(p-2)·φ2 + τ(p-3)·φ3 + … + φp
THE ACF for an AR(2) process
• It gets messier for higher orders
• But, given the solutions for τ(1) and τ(2), all further autocorrelations must satisfy the difference equation in (iv)
set obs 1000
gen e = invnorm(uniform())
gen t = _n-1
tsset t
gen y = 0 if t==0
replace y = 0 if t==1
replace y = 0.7*l.y-0.49*l2.y+e if t>1
corrgram y, lag(15)
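Using the Yule–Walker results, with φ1 = 0.7 and φ2 = -0.49 we expect (my calculation):
τ(1) = φ1/(1 − φ2) = 0.7/1.49 ≈ 0.47
τ(2) = φ1·τ(1) + φ2 ≈ 0.7·0.47 − 0.49 ≈ -0.16
τ(3) = φ1·τ(2) + φ2·τ(1) ≈ -0.34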
[Figure: sample ACF of the simulated AR(2) — oscillating decay; Bartlett's formula for MA(q) 95% confidence bands]
But what is the PACF?
• PACF: partial autocorrelation function
• In an AR(1) process, y(t) and y(t-2) are correlated even though y(t-2) does not appear directly in the model.
– We just saw that the correlation between y(t) and y(t-2) is equal to the correlation between y(t) and y(t-1) (φ1) multiplied by the correlation between y(t-1) and y(t-2) (φ1), so that τ(2) = τ(1)·τ(1) = φ1²
– All such indirect correlations are present in the ACF of any autoregressive process
• The partial autocorrelation eliminates the effects of the intervening values y(t-1) through y(t-s+1)
– Clearly, at lag 1 the PACF and the ACF are identical because there is no intervening value to eliminate
• So, in an AR(1) process the PACF between y(t) and y(t-2) is equal to zero, but the ACF is equal to φ1²
Formulae for the PACF
τ(11) = τ(1)
τ(22) = (τ(2) − τ(1)²)/(1 − τ(1)²)
Computing higher order PACFs is complicated
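A quick check for the AR(1) case (my calculation): with τ(1) = φ1 and τ(2) = φ1²,
τ(22) = (φ1² − φ1²)/(1 − φ1²) = 0
consistent with the PACF of an AR(1) cutting off after lag 1.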
Invertibility
• The concept of invertibility is similar to that of stationarity, but when we talk about MA processes we usually talk about invertibility, and when we talk about AR processes we talk about stationarity
• An MA(q) model is invertible if the roots of its characteristic equation lie outside the unit circle (the same condition as for stationarity)
• This condition prevents the model from exploding under an AR(∞) representation
• Invertibility for an MA(2) model
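A sketch of the check, reusing the polyroots approach from Example II (the coefficients 0.5 and 0.25 are illustrative assumptions): for y(t) = u(t) + θ1·u(t-1) + θ2·u(t-2), the characteristic equation is 1 + θ1·z + θ2·z² = 0, and the model is invertible if both roots lie outside the unit circle.
mata
z = polyroots((1, 0.5, 0.25))   // 1 + 0.5z + 0.25z^2 = 0
abs(z)                          // both moduli equal 2 > 1, so this MA(2) is invertible
end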
What have we learnt so far?
• If we have an AR(p) model
– The ACF declines exponentially
– The PACF goes to zero at lag p+1
• If we have an MA(q) model
– The ACF goes to zero at lag q+1
– The PACF declines exponentially
clear
set obs 10000
gen e = invnorm(uniform())
gen t = _n-1
tsset t
gen y = 0 if t==0
replace y = 0 if t==1
replace y = 0 if t==2
gen y_ar= y
gen y_ma= y
replace y_ar = 0.9*l.y_ar+e if t>0   // AR(1) with phi1 = 0.9
replace y_ma = e+0.9*l.e if t>0      // MA(1) with theta1 = 0.9
corrgram y_ar, lag(20)
[Figure: ACF of y_ar — smooth geometric decay starting near 0.9; Bartlett's formula for MA(q) 95% confidence bands]
pac y_ar
[Figure: PACF of y_ar — a single large spike at lag 1, roughly zero afterwards; 95% confidence bands (se = 1/sqrt(n))]
(same simulation as before)
corrgram y_ma, lag(20)
[Figure: ACF of y_ma — a single spike of about 0.5 at lag 1, roughly zero afterwards; Bartlett's formula for MA(q) 95% confidence bands]
[Figure: PACF of y_ma — oscillating decay; 95% confidence bands (se = 1/sqrt(n))]
What have we learnt so far?
• If we have an AR(p) model
– The ACF declines geometrically
– The PACF goes to zero at lag p+1
• If we have an MA(q) model
– The ACF goes to zero at lag q+1
– The PACF declines geometrically
• If these were the only two possibilities,
life would be very easy
ARMA
• ARMA models, i.e., models which have both an AR and an MA component, complicate our life!
• ARMA(p,q):
y(t) = μ + φ1·y(t-1) + φ2·y(t-2) + φ3·y(t-3) + … + φp·y(t-p) + u(t) + θ1·u(t-1) + θ2·u(t-2) + θ3·u(t-3) + … + θq·u(t-q)
or, more concisely:
φ(L)·y(t) = μ + θ(L)·u(t)
with
φ(L) = 1 − φ1·L − φ2·L² − φ3·L³ − … − φp·L^p
θ(L) = 1 + θ1·L + θ2·L² + θ3·L³ + … + θq·L^q
ARMA
• ARMA models, i.e., models which have both an AR and an MA component, complicate our life!
• In fact, they don’t really complicate our life.
• Think about it!
• We know that under certain conditions on the parameters there exist
fundamental relationships between autoregressive and moving-average
processes, as long as we are willing to consider an infinite number of
parameters.
• However, we favor models that can characterize the properties of data
using as few parameters as possible.
• Models that contain both autoregressive and moving-average terms often
allow us to obtain such parsimonious models.
• Essentially, an ARMA model allows us to capture the dynamics of our data
using fewer parameters than would be necessary if we restricted ourselves
to pure AR or MA processes
– Having fewer parameters to estimate is particularly helpful in time-series
analysis, where the correlation among observations reduces the effective
number of observations.
ARMA
• We now know that a distinguishing factor
between autoregressive and moving-average
processes is how shocks to a series affect future
realizations.
• With an MA(q) process, a shock at time t has
absolutely no effect on the series in periods t + q
+ 1 and beyond.
• With an AR(p) process, the effects of a shock
decay gradually over time.
• We will use these characteristics to differentiate
series based on their moving-average and
autoregressive properties
ARMA
• The ACF and PACF of an ARMA(p,q) model are a combination of those of AR and MA models
• Both will be declining geometrically
• After lag q, the ACF is dominated by the AR component and starts decaying towards zero
• After lag p, the PACF is dominated by the MA component and starts decaying towards zero
ARMA(1,1)
• This is the workhorse specification
– (if the series is stationary)
• Calculate the autocovariances with Yule–Walker, for y(t) = φ1·y(t-1) + θ1·u(t-1) + u(t):
E[y(t)·y(t)] = φ1·E[y(t-1)·y(t)] + θ1·E[u(t-1)·y(t)] + E[u(t)·y(t)]  →  γ(0) = φ1·γ(1) + θ1·(θ1 + φ1)·σ² + σ²
E[y(t)·y(t-1)] = φ1·E[y(t-1)·y(t-1)] + θ1·E[u(t-1)·y(t-1)] + E[u(t)·y(t-1)]  →  γ(1) = φ1·γ(0) + θ1·σ²
E[y(t)·y(t-2)] = φ1·E[y(t-1)·y(t-2)] + θ1·E[u(t-1)·y(t-2)] + E[u(t)·y(t-2)]  →  γ(2) = φ1·γ(1)
E[y(t)·y(t-s)] = φ1·E[y(t-1)·y(t-s)] + θ1·E[u(t-1)·y(t-s)] + E[u(t)·y(t-s)]  →  γ(s) = φ1·γ(s-1)
Solving the first 2 equations simultaneously:
γ(0) = (1 + θ1² + 2·φ1·θ1)·σ²/(1 − φ1²)
γ(1) = (1 + φ1·θ1)·(φ1 + θ1)·σ²/(1 − φ1²)
and
τ(1) = (1 + φ1·θ1)·(φ1 + θ1)/(1 + θ1² + 2·φ1·θ1), and τ(s) = φ1·τ(s-1) for s ≥ 2
• Example: y(t) = -0.7·y(t-1) - 0.7·u(t-1) + u(t)
τ(1) = (1 + φ1·θ1)·(φ1 + θ1)/(1 + θ1² + 2·φ1·θ1) = (1 + 0.49)·(-0.7 - 0.7)/(1 + 0.49 + 2·0.49) = -0.844
τ(2) = φ1·τ(1) = -0.7·(-0.844) = 0.591
τ(3) = φ1·τ(2) = -0.7·(0.591) = -0.414, τ(4) = 0.29, τ(5) = -0.203,
τ(6) = 0.142, τ(7) = -0.099, τ(8) = 0.07
[corrgram of the simulated series omitted]
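A sketch of the simulation presumably behind the next plots, in the same style as before:
clear
set obs 1000
gen e = invnorm(uniform())
gen t = _n-1
tsset t
gen y = 0 if t==0
replace y = -0.7*l.y - 0.7*l.e + e if t>0   // ARMA(1,1) with phi1 = theta1 = -0.7
ac y
pac y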
[Figures: ACF and PACF of the ARMA(1,1) example — the ACF oscillates in sign while decaying, starting near -0.844 and matching the τ values above; the PACF decays geometrically with negative values; Bartlett's formula for MA(q) bands for the ACF, se = 1/sqrt(n) bands for the PACF]
Cheatsheet (from Enders, Table 2.1)

| Process | ACF | PACF |
| White noise | All τ(s) = 0 | All τ(ss) = 0 |
| AR(1) with φ1 > 0 | Exponential decay: τ(s) = φ1^s | τ(11) = τ(1); τ(ss) = 0 for s > 1 |
| AR(1) with φ1 < 0 | Oscillating decay | Same as above |
| AR(p) | Exponential decay (it can be oscillating) | Different from zero until lag p, then zero |
| MA(1) with θ1 > 0 | Positive spike at lag 1; τ(s) = 0 for s > 1 | Oscillating decay |
| MA(1) with θ1 < 0 | Negative spike at lag 1; τ(s) = 0 for s > 1 | Exponential decay |
| MA(q) | Goes to zero at lag q+1 | Exponential decay |
| ARMA(1,1) with φ1 > 0 | Exponential decay from lag 1; sign τ(1) = sign(φ1 + θ1) | Oscillating decay from lag 1; τ(11) = τ(1) |
| ARMA(1,1) with φ1 < 0 | Oscillating decay from lag 1; sign τ(1) = sign(φ1 + θ1) | Exponential decay from lag 1; τ(11) = τ(1); sign τ(ss) = sign τ(11) |
| ARMA(p,q) | Decay starts at lag q | Decay starts at lag p |
How to do it!
The Box-Jenkins approach
• BJ developed a systematic way to estimate ARMA models
• Three steps:
1. Identification
2. Estimation
3. Diagnostic checking
• Note that we are still assuming that the process is stationary
– Step 0 is to test for stationarity
• Remember: parsimonious models are best; adding lags improves the R2, but it does not necessarily improve forecasting performance (it can actually reduce it)
– More parameters mean fewer degrees of freedom
– Too many parameters overfit historical accidents that are unlikely to occur again
Identification
• In this step we want to find out the order of the model
– Plot the data and see if there is something strange (trend, outliers, missing
values, structural breaks)
– Compute and plot ACF and PACF
• The ACF and PACF are useful for selecting the number of autoregressive
and moving average terms.
– Recall that for an AR(p) process, the first p partial autocorrelations will be significant and the (p+1)-st and later ones will be zero, while the ACF decays slowly. Moving-average processes show gradually decaying partial autocorrelations and an ACF that goes to zero at lag q+1.
• Unfortunately, real-world data don't look as nice as the generated data we used so far, and it will be difficult to infer the order of the process by just looking at the graph
– Typically a series is represented by a mixed ARMA process, in which case the autocorrelation function decays either more slowly or more quickly than we would expect from a pure autoregressive process.
– In these cases, we fit several candidate models with varying numbers of autoregressive and moving-average terms and then employ information criteria to select a final model.
Identification
• Information criteria
• The most popular IC are
– Akaike's information criterion (AIC)
– Schwarz's Bayesian information criterion
(SBIC)
– Hannan-Quinn information criterion (HQIC)
Identification
• Each IC adds a punishment for more parameters to a measure of fit
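In one common parameterization (the one used by Enders; taken here as an assumption about the formulas on the original slides), with k estimated parameters and T observations:
AIC = T·ln(SSR) + 2k
SBIC = T·ln(SSR) + k·ln(T)
HQIC = T·ln(SSR) + 2k·ln(ln(T))
In all three, the second term is the punishment for adding parameters; pick the model that minimizes the criterion.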
Estimation
• Before computing the IC you need to estimate the model.
• This is easy: you just tell the computer to do it
– (not so easy without a computer; it is not generally done with OLS but with MLE)
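In Stata this is one line per candidate model (a sketch; estat ic is the built-in way to get the AIC/BIC just discussed):
arima y, ar(1) ma(1)
estat ic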
Diagnostic checking
• Check the R2
• Look at whether the φ's and θ's are statistically significant
• How long does it take to converge?
– If convergence takes time, there may be a problem
• Compute the IC
• Plot the residuals and look for outliers
• Check if the residuals are serially correlated (do this by looking at the ACF and PACF of the residuals)
– If you get several residual correlations that are marginally significant (or close to being significant) and a Q statistic which is barely significant at 10%, be suspicious
• Evaluate forecasting properties
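A sketch of the residual checks in Stata (variable names are illustrative):
predict res, residuals     // residuals from the arima fit above
corrgram res, lag(20)      // ACF/PACF of the residuals
wntestq res                // portmanteau test: are the residuals white noise?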
Forecasting
• Assume you have an AR(1)
• You estimate y(t) = μ + φ·y(t-1) + u(t)
• And use it to predict y(t+1). How?
– Et(y(t+1)) = μ + φ·y(t)
– Et(y(t+2)) = μ + φ·Et(y(t+1)) = μ + φ·μ + φ²·y(t)
– Et(y(t+s)) = μ + φ·Et(y(t+s-1)) = μ·(1 + φ + … + φ^(s-1)) + φ^s·y(t)
• But the quality of the forecasts declines with s.
• If the series is stationary, when s goes to infinity Et(y(t+s)) = μ/(1 − φ)
– The forecast converges to the unconditional mean
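A sketch of how forecasts like those in the next figure can be produced (the cutoff 9980 is taken from the figure legend; everything else is illustrative):
arima y if t<9980, ar(1)
predict yhat, dynamic(9980)   // one-step predictions before t=9980, dynamic forecasts after
tsline y yhat if t>9900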
[Figure: in-sample forecasts vs out-of-sample forecasts — y and the xb prediction, dyn(9980)]
Forecasting
• Forecasting errors are driven by both parameter uncertainty and the Mafalda effect (we don't know the future, and therefore the parameters may change)
How good are my forecasts?
• Strategy:
– You have T observations
– Estimate different types of models using T-H
observations
– Use the results to forecast the H observations that are
not included in the model
– Compare model performance
• Note: you can do this by keeping the same window length (rolling window) or by adding one observation at a time (recursive window)
• Lengthening the window gives you more info, but if one model picks up a big error in the past and then does a better job (maybe there was structural change), a rolling window may be preferable
How good are my forecasts?
• How do you compare model performance?
• Having a small mean squared prediction error (MSPE) is good. If your forecasting errors are e(i):
MSPE = (1/H)·Σ_{i=1}^{H} e(i)²
• Assume you have two models. Call MSPE1 and MSPE2
their MSPEs.
• You find that MSPE1>MSPE2 but you don't know whether
it is significantly bigger
– You can check this with an F-test with (H,H) degrees of freedom
– F=MSPE1/MSPE2
How good are my forecasts?
• That F-test looked cool, but it requires three assumptions:
1. The forecast errors have zero mean and are normally distributed
2. The forecast errors are serially uncorrelated
3. The forecast errors are contemporaneously uncorrelated with each other
• These 3 assumptions rarely hold.
– Forecast errors are often serially correlated
– The fact that u is normal does not guarantee that e is normal.
– The forecast errors of two different models are often
contemporaneously correlated with each other.
• Under these conditions the ratio of MSPEs does not have an F distribution
• What to do?
Granger–Newbold test
• This test does not require contemporaneously uncorrelated forecast errors
• Define e1 and e2 as the forecast errors of the first and second model
• Use them to build x(t) = e1(t) + e2(t) and z(t) = e1(t) - e2(t)
• If the first two assumptions are valid, under the null hypothesis of equal forecast accuracy x(t) and z(t) should be uncorrelated
• Note that E(x(t)·z(t)) = E(e1(t)² − e2(t)²), so a correlation r_xz > 0 indicates MSPE1 > MSPE2 (and vice versa)
• We can test this because Granger and Newbold showed that
GN = r_xz/√((1 − r_xz²)/(H − 1)) ~ t with H−1 degrees of freedom
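A minimal sketch of the test in Stata (assuming e1 and e2 hold the H out-of-sample forecast errors):
gen x = e1 + e2
gen z = e1 - e2
corr x z
scalar r = r(rho)
scalar GN = r/sqrt((1-r^2)/(r(N)-1))   // r(N) = number of observations used
display GN                             // compare with t(H-1) critical values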
Diebold and Mariano generalize this to an arbitrary loss function g(e). Define the mean loss differential:
d̄ = (1/H)·Σ_{i=1}^{H} [g(e1(i)) − g(e2(i))]
• If the {d(i)} series is serially uncorrelated:
var(d̄) = γ(0)/(H − 1)
and d̄/√(γ(0)/(H − 1)) ~ t with H−1 degrees of freedom
• You can install the DMARIANO ado file in STATA
– But it only offers quadratic (default) and absolute loss functions
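Hypothetical usage (a sketch; check the package help for the exact option names):
ssc install dmariano
dmariano y f_model1 f_model2, maxlag(4)   // actual series first, then the two forecast series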
• In the presence of serial correlation in the {d(i)} series, things become more complicated.
• Let γ(i) be the i-th autocovariance of the {d(i)} sequence.
• If the first q values of γ(i) are different from zero, the variance of d̄ can be approximated with:
[γ(0) + 2·(γ(1) + … + γ(q))]/(H − 1)
• yielding the Diebold and Mariano (DM) statistic:
DM = d̄/√([γ(0) + 2·(γ(1) + … + γ(q))]/(H − 1)) ~ t with H−1 degrees of freedom
Now, let's really do it
• Load the data and graph
• use DATA_UV1.dta
• tsline y
[Figure: time series plot of y, ranging roughly between -20 and 20 — nothing surprising here]
[Figure: ACF of y over 40 lags, Bartlett's formula for MA(q) 95% confidence bands — it oscillates but is still significant at 30 lags]
Look at the PACF
[Figure: PACF of y, 95% confidence bands (se = 1/sqrt(n)) — not significant after the 4th lag]
• So far, no clear pattern; this is unlikely to be a simple AR or MA process, but let's use the parsimony principle and start from the simplest model, an AR(1):
arima y, ar(1)
ARIMA regression
OPG
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
y
_cons -.011453 .6223732 -0.02 0.985 -1.231282 1.208376
ARMA
ar
L1. .809582 .0198779 40.73 0.000 .7706219 .848542
Let's look at an AR(2)
arima y, ar(1/2)
ARIMA regression
OPG
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
y
_cons .0047896 .2597757 0.02 0.985 -.5043614 .5139407
ARMA
ar
L1. 1.381413 .0225699 61.21 0.000 1.337177 1.42565
L2. -.7050654 .0225957 -31.20 0.000 -.7493521 -.6607787
[corrgram of the residuals omitted]
NOPE!
Let's look at an MA(2)
arima y, ma(1/2)
ARIMA regression
OPG
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
y
_cons -.0005962 .2795364 -0.00 0.998 -.5484774 .5472851
ARMA
ma
L1. 1.042007 .0267381 38.97 0.000 .9896009 1.094412
L2. .5975604 .0258416 23.12 0.000 .5469117 .648209
[corrgram of the residuals omitted]
NOPE!
Let's try an ARMA(1,1)
ARIMA regression
OPG
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
y
_cons -.0049327 .5317221 -0.01 0.993 -1.047089 1.037223
ARMA
ar
L1. .7152491 .02563 27.91 0.000 .6650151 .7654831
ma
L1. .4877043 .0318154 15.33 0.000 .4253472 .5500613
[corrgram of the residuals omitted]
NOPE!
Wasn't this the workhorse????
Let's try an ARMA(2,1)
ARIMA regression
OPG
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
y
_cons .0099372 .1232032 0.08 0.936 -.2315366 .2514111
ARMA
ar
L1. 1.613196 .0160236 100.68 0.000 1.58179 1.644602
L2. -.8993578 .0142444 -63.14 0.000 -.9272763 -.8714393
ma
L1. -.533875 .0343623 -15.54 0.000 -.6012238 -.4665261
Amazing t-stats!
Let's look at the residuals
[corrgram of the residuals omitted]
Let's try an ARMA(2,2)
ARIMA regression
OPG
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
y
_cons .0061418 .1439669 0.04 0.966 -.2760281 .2883117
ARMA
ar
L1. 1.604461 .0150945 106.29 0.000 1.574877 1.634046
L2. -.9129829 .0140284 -65.08 0.000 -.9404781 -.8854877
ma
L1. -.6066721 .0356254 -17.03 0.000 -.6764967 -.5368476
L2. .2036361 .0333804 6.10 0.000 .1382118 .2690604
mmm!
Let's look at the residuals
[corrgram of the residuals omitted]
Ah Ah!
Let's try an ARMA(1,2)
ARIMA regression
OPG
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
y
_cons -.0017848 .507626 -0.00 0.997 -.9967136 .993144
ARMA
ar
L1. .6333353 .0328677 19.27 0.000 .5689159 .6977548
ma
L1. .5780074 .0362004 15.97 0.000 .507056 .6489589
L2. .4068361 .0343752 11.84 0.000 .339462 .4742102
Better before
Let's look at the residuals
[corrgram of the residuals omitted]

Summary of the estimates across all the models tried:

| | AR(1) | AR(2) | MA(1) | MA(2) | ARMA(1,1) | ARMA(2,1) | ARMA(2,2) | ARMA(1,2) |
| _cons | -.0115 | .0048 | -.0014 | -.0006 | -.0049 | .0099 | .0061 | -.0018 |
| L.ar | .8096 | 1.3814 | | | .7152 | 1.6132 | 1.6045 | .6333 |
| L2.ar | | -.7051 | | | | -.8994 | -.9130 | |
| L.ma | | | .8002 | 1.0420 | .4877 | -.5339 | -.6067 | .5780 |
| L2.ma | | | | .5976 | | | .2036 | .4068 |
| sigma | 3.7451 | 2.6516 | 4.1199 | 3.3489 | 3.2058 | 2.3831 | 2.3460 | 2.9645 |
local i = 1
while `i' <=T {
arima y if t<1000-T+`i', ar(1)
predict temp, dynamic(1000-T+`i')
replace fe_ar1 = y-temp if t==900+`i'+2
drop temp
arima y if t<1000-T+`i', ar(1/2)
predict temp , dynamic(1000-T+`i')
replace fe_ar2 = y-temp if t==1000-T+`i'+2
drop temp
arima y if t<1000-T+`i', ar(1) ma(1)
predict temp , dynamic(1000-T+`i')
replace fe_arma11 = y-temp if t==1000-T+`i'+2
drop temp
arima y if t<900+`i', ar(1/2) ma(1)
predict temp , dynamic(1000-T+`i')
replace fe_arma21 = y-temp if t==1000-T+`i'+2
drop temp
arima y if t<1000-T+`i', ar(1/2) ma(1/2)
predict temp , dynamic(1000-T+`i')
replace fe_arma22 = y-temp if t==1000-T+`i'+2
drop temp
arima y if t<1000-T+`i', ar(1) ma(1/2)
predict temp , dynamic(1000-T+`i')
replace fe_arma12 = y-temp if t==1000-T+`i'+2
drop temp
local i = `i' + 1
}

gen fe_ar1_sq = fe_ar1^2
gen fe_ar2_sq = fe_ar2^2
gen fe_arma11_sq = fe_arma11^2
gen fe_arma21_sq = fe_arma21^2
gen fe_arma22_sq = fe_arma22^2
gen fe_arma12_sq = fe_arma12^2

gen fe_ar1_sqcw = fe_ar1cw^2
gen fe_ar2_sqcw = fe_ar2cw^2
gen fe_arma11_sqcw = fe_arma11cw^2
gen fe_arma21_sqcw = fe_arma21cw^2
gen fe_arma22_sqcw = fe_arma22cw^2
gen fe_arma12_sqcw = fe_arma12cw^2

egen MSE_AR1_3 = mean(fe_ar1_sq)
egen MSE_AR2_3 = mean(fe_ar2_sq)
egen MSE_ARMA11_3 = mean(fe_arma11_sq)
egen MSE_ARMA21_3 = mean(fe_arma21_sq)
egen MSE_ARMA22_3 = mean(fe_arma22_sq)
egen MSE_ARMA12_3 = mean(fe_arma12_sq)

egen MSE_AR1CW_3 = mean(fe_ar1_sqcw)
egen MSE_AR2CW_3 = mean(fe_ar2_sqcw)
egen MSE_ARMA11CW_3 = mean(fe_arma11_sqcw)
egen MSE_ARMA21CW_3 = mean(fe_arma21_sqcw)
egen MSE_ARMA22CW_3 = mean(fe_arma22_sqcw)
egen MSE_ARMA12CW_3 = mean(fe_arma12_sqcw)
Now let's try with forecasting
• At this point we could do F test with the ratios of
MSE
• Do Granger and Newbold
• Do Diebold and Mariano
• The latter is particularly useful if our loss function
is not quadratic (maybe underpredictions are
more costly than overpredictions, maybe there
is a threshold above which the error does not
matter…)
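For instance, once the MSEs above are computed (a sketch; the [1] subscripts are needed because egen created constant variables, not scalars):
display MSE_AR1_3[1]/MSE_ARMA11_3[1]   // F ratio; compare with F(H,H) critical values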
Seasonality
• Monthly, quarterly, and daily data may have a seasonal
effect
• For instance, consumption and money supply are higher
in December and in the 4th quarter of the year, and stock
returns are often lower on Friday.
• One way to deal with seasonality is to regress the series on quarter, month, or day dummies and then work with the residuals of the series.
• This is what is often done when you get seasonally adjusted data
• But seasonal patterns may remain (especially if you use
subsets of data which may have more or less
pronounced seasonal patterns)
Seasonality
• Instead of using a two-step procedure (first you remove the seasonal effect and then you estimate the model), there is now consensus that it is better to estimate the ARMA and seasonal coefficients jointly
• The procedure is similar to Box and Jenkins, but remember that the ACF and PACF may exhibit seasonal behavior
• For instance, with quarterly data we may have
y(t) = φ4·y(t-4) + u(t)   (a)
y(t) = θ4·u(t-4) + u(t)   (b)
– In model (a) (AR) the ACF is τ(i) = φ4^(i/4) if i/4 is an integer and τ(i) = 0 otherwise: decaying spikes at lags 4, 8, 12, …
– In model (b) (MA) the ACF has a spike at lag 4 and all other values are zero
Seasonality
Models of the type:
y(t) = φ1·y(t-1) + θ1·u(t-1) + θ4·u(t-4) + u(t)
y(t) = φ1·y(t-1) + φ4·y(t-4) + θ1·u(t-1) + u(t)   (a)
y(t) = φ1·y(t-1) + φ4·y(t-4) + θ1·u(t-1) + θ4·u(t-4) + u(t)
exhibit additive seasonality (the seasonal coefficients are added to the process). Models of the type:
(1 − φ1·L)·y(t) = (1 + θ1·L)(1 + θ4·L⁴)·u(t)
(1 − φ1·L)(1 − φ4·L⁴)·y(t) = (1 + θ1·L)·u(t)   (b)
(1 − φ1·L)(1 − φ4·L⁴)·y(t) = (1 + θ1·L)(1 + θ4·L⁴)·u(t)
exhibit multiplicative seasonality
Seasonality
Multiplying out the multiplicative models:
(1 − φ1·L)·y(t) = (1 + θ1·L)(1 + θ4·L⁴)·u(t)
y(t) = φ1·y(t-1) + u(t) + θ1·u(t-1) + θ4·u(t-4) + θ1·θ4·u(t-5)
(1 − φ1·L)(1 − φ4·L⁴)·y(t) = (1 + θ1·L)·u(t)
y(t) = φ1·y(t-1) + φ4·y(t-4) − φ1·φ4·y(t-5) + u(t) + θ1·u(t-1)
(1 − φ1·L)(1 − φ4·L⁴)·y(t) = (1 + θ1·L)(1 + θ4·L⁴)·u(t)
y(t) = φ1·y(t-1) + φ4·y(t-4) − φ1·φ4·y(t-5) + u(t) + θ1·u(t-1) + θ4·u(t-4) + θ1·θ4·u(t-5)
Seasonality
• Equation (a) differs from (b) because the latter allows the MA term at
lag 1 to interact with the MA term at lag 4
• Equation (b) can be rewritten as
125
Example
• I am using quarterly data (not seasonally adjusted) for US money supply (M1) for the 1960:q1–2002:q1 period
[Figure: M1 level (rising from about 200 to 1200) and M1 growth rate (roughly -.02 to .06), 1960q1–2000q1 — the growth rate looks stationary, but there seems to be a cyclical (seasonal) pattern, with growth higher in some quarters]
. char quarter[omit] 4
You can omit quarter 4!
. xi: reg grM1 i.quarter
i.quarter _Iquarter_1-4 (naturally coded; _Iquarter_4 omitted)
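To follow the two-step route described earlier, the deseasonalized series would then be the residuals from this regression (a sketch):
predict grM1_sa, residuals   // growth of M1 purged of the quarterly dummies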
. corrgram grM1, lag(12)
[corrgram output omitted]
[Figures: autocorrelations and partial autocorrelations of m1 growth, over 40 lags; Bartlett's formula for MA(q) bands for the ACFs, se = 1/sqrt(n) bands for the PACFs]
Try with 4 different models
arima m1, ar(1) ma(4)
est store arma1_4
arima m1, ar(1) mar(1,4)
est store ar_14
arima m1, ma(1) mma(1,4)
est store ma_14
arima m1, ar(1) ma(1) mar(1,4) mma(1,4)
est store arma_14_14
The models with mar()/mma() terms are multiplicative
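To compare them on AIC/BIC (a sketch of how the claim below was presumably obtained):
estimates stats arma1_4 ar_14 ma_14 arma_14_14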
. est table arma1_4 ar_14 ma_14 arma_14_14, b(%7.4f) se(%7.4f) drop(_cons)

| | arma1_4 | ar_14 | ma_14 | arma_14_14 |
| ARMA: L.ar | 0.5371 (0.0576) | 0.4845 (0.0587) | | 0.7335 (0.1046) |
| ARMA: L.ma | | | 0.4487 (0.0727) | -0.2835 (0.1349) |
| ARMA: L4.ma | -0.7513 (0.0469) | | | |
| ARMA4: L.ar | | -0.4292 (0.0685) | | 0.0012 (0.1058) |
| ARMA4: L.ma | | | -0.7326 (0.0485) | -0.7611 (0.0644) |
legend: b/se

According to BIC and AIC the best model is the second
[corrgram output omitted]
. corrgram resma1_4, lag(13)
[corrgram output omitted]