
Univariate Time Series

Diego Vílchez

Loosely based on Ugo Panizza's lecture notes
1
Why do we need time series
techniques?
• When dealing with “normal” econometrics, we often use
the mean and standard deviation to describe the central
tendency and spread of a variable, and we use
covariances and correlations to describe the degree to
which two or more variables move together.
• With time-series data, we must be more careful.
– A key feature of most time series is that observations that are
close together in time are typically correlated.
• That is, the value of xt is often correlated with xt-1, xt-2, xt+1, and so on.
– Moreover, many time series exhibit trending behavior.
• If we regress one variable that exhibits a time trend on another variable that also has a
time trend, the two variables may look highly correlated, even if they are in fact
completely unrelated in any substantive sense.
– Finally, with cross-sectional data, we assume that the sample
mean is an estimate of the population mean
• With time-series data the population mean may not even exist.
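To make the trending-variables point above concrete, here is a minimal simulation sketch in Stata (hypothetical variable names; two independent random walks that "trend" by accident):

clear
set obs 500
set seed 12345
gen t = _n
tsset t
gen x = sum(invnorm(uniform()))   // random walk #1
gen z = sum(invnorm(uniform()))   // random walk #2, unrelated to x by construction
regress z x                       // the t-statistic on x is typically "significant" anyway

This is the classic spurious-regression problem: the regression output looks impressive even though z and x are completely unrelated.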

2
• We will discuss the modern approach to economic time
series
– We will start with a class of specifications aimed at predicting the
behavior of a variable using only information contained in the past
values of this variable and possibly past values of the error term
– yt=f(yt-1, yt-2, ..yt-p, et-1,et-2, ..et-q)
– This is different from multivariate models in which we try to predict
the behavior of a variable using the past or current behavior of
other variables
– yt=f(x1, t-1, x1, t-2, .. x1, t-p, x2, t-1, x2, t-2, .. x2, t-p,…xn, t-1, xn, t-2, .. xn, t-p)
• Univariate time series models are usually atheoretical, but they can produce good forecasts.
• They are also useful because we may not have other
variables at the same frequency
– For instance, we want to predict stock returns but we do not have
daily data for fundamentals

3
• Economists started using this modern approach to
time series analysis when they realized that
purely statistical models used by statisticians
were much better at forecasting than structural
models used by economists
• Economists did not use to worry about the
stationarity of the series they used
– However, a series of results showed that standard
econometric techniques applied to non-stationary data
could yield results that did not make any sense
• By the way, what does stationarity mean?

4
• Think of time series analysis as an attempt
of estimating a difference equation with a
stochastic component.
– The problem is that we have no clue about the
form of this difference equation
– The difficult thing is to choose which equation
we should estimate:
• y(t)=ay(t-1)+u(t)
• y(t)=ay(t-1)+by(t-2)+u(t)
• y(t)=a(1)y(t-1)+… +a(n)y(t-n)+u(t)
• y(t)=ay(t-1)+bu(t)+cu(t-1)
• y(t)=a(1)y(t-1)+… +a(n)y(t-n)+b(1)u(t)+..+b(n)(u(t-n+1))

5
Objective of this class
• Learn how to forecast
• 3 steps
1. Take a time series
2. Use the Box and Jenkins methodology to
identify the characteristics of this series (its
moving average and autoregressive
components)
3. Forecast

6
We need some definitions
• They will not make sense at the beginning
– Hopefully, they will make sense later on
– So bear with me
• Stationarity
• Autocovariance and Autocorrelation
• White Noise Process
• Moving average (MA)
• Autoregressive process (AR)
• Wold Decomposition
• Random walk
• Partial autocorrelation function (PACF)

7
Stationarity
• For the moment we will work with stationary
processes
• A strictly stationary process is a process in
which the probability measure for the sequence
{y(t)} is the same as that for the sequence
{y(t+k)} and this holds for all possible k.
– This means that the distribution of the variable is the
same at any point in time.
• We will use a weaker concept of stationarity:
weak stationarity or covariance stationarity.
8
• A series is covariance (or weakly) stationary if:

1. $E(y_t) = \mu$  (and not $\mu_t$)
2. $E(y_t - \mu)^2 = \sigma^2 < \infty$  (and not $\sigma^2_t$)
3. $E[(y_t - \mu)(y_{t-s} - \mu)] = E[(y_{t-k} - \mu)(y_{t-s-k} - \mu)] = \gamma_s$ for $s = 0, 1, 2, 3, \dots$
– The first condition states that the series needs to have
constant mean
– The second condition states that the series needs to have
constant and finite variance
– The third condition says that the autocovariance depends only
on the distance (s) between two observations.

Is $y_t = bt + u_t$ stationary?
9
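A quick check of the question above, using only the three conditions just listed (assuming $u_t$ is white noise with mean zero and variance $\sigma^2$):

$E(y_t) = E(bt + u_t) = bt$

The mean depends on $t$, so condition 1 fails: $y_t = bt + u_t$ is not covariance stationary, even though its variance ($\sigma^2$) is constant.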
Strict versus Weak
• Strict stationarity requires that the joint
distribution of any group of observations does
not depend on t.
• In general weak stationarity does not imply
strong stationarity because weak stationarity is
only concerned with the first two moments of the
distribution (the mean and covariances).
• An exception is a series that is normally
distributed
– In this case, weak stationarity implies strong
stationarity because the normal distribution is fully
characterized by its first two moments.

10
Why do we care?
• Nonstationary series are much more difficult to analyze
than stationary series.
• In standard econometrics, you learn about things like the
central limit theorem and consistency of estimators.
Those concepts can be generalized in straightforward
ways to apply to stationary time series.
• For nonstationary series, the distributions of estimators
are much more complex and are not simple
generalizations of the familiar distributions of estimators
for cross-sectional data.
• The most common approach is to transform a series so
that it is stationary and then proceed from there.
– Not so when we do cointegration

11
For the moment let’s assume we
work with stationary series
• Next class, we’ll learn how to test for stationarity
• Let’s talk about autocovariance and
autocorrelation

Remember the autocovariance:
$E[(y_t - \mu)(y_{t-s} - \mu)] = E[(y_{t-k} - \mu)(y_{t-s-k} - \mu)] = \gamma_s$ for $s = 0, 1, 2, 3, \dots$

12
• Since the autocovariance depends on the units of
{y(t)}, we normalize the autocovariance by the
variance of the series. This is the
autocorrelation (AC):
$\tau_s = \dfrac{\gamma_s}{\gamma_0}$
• The AC is a correlation and is bounded
between -1 and +1. Of course $\tau_0 = 1$.
– A plot of $\tau_s$ versus s is called the autocorrelation
function (ACF) or correlogram

13
• Note that if y(t) is stationary with normally
distributed errors, then the sample
autocorrelation is also approximately normally
distributed
$\hat{\tau}_s \sim \text{approximately } N(0, 1/T)$
• Where T is the sample size.
• This is cool, because we can use the above
result to build confidence intervals for the
autocorrelation coefficient
• The 95% confidence interval is: $\hat{\tau}_s \pm \dfrac{1.96}{\sqrt{T}}$
14
• We can also have a joint test that all the
autocorrelation coefficients up to m (maximum lag
length) are jointly equal to zero.
• The original test was the Box-Pierce Q statistic:
$Q = T\sum_{k=1}^{m}\hat{\tau}_k^2$
• Since each $\sqrt{T}\,\hat{\tau}_k$ is approximately standard normal, Q is
distributed as a chi-squared variable with m degrees of freedom.
• However, the above statistic has poor small-sample
properties, and we now use the Ljung-Box Q*:
$Q^* = T(T+2)\sum_{k=1}^{m}\frac{\hat{\tau}_k^2}{T-k}$
15
• With STATA life is easy
– set obs 1000
– gen e = invnorm(uniform())
– gen t = _n-1
– tsset t
– gen y = 0 if t==0
– replace y = 0.7*l.y+e if t>0
– corrgram y, lag(15)

LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.6952 0.6953 484.82 0.0000


2 0.4939 0.0205 729.78 0.0000
3 0.3334 -0.0335 841.48 0.0000
4 0.2238 -0.0035 891.85 0.0000
5 0.1456 -0.0074 913.2 0.0000
6 0.1209 0.0515 927.92 0.0000
7 0.1140 0.0340 941.03 0.0000
8 0.1257 0.0476 957 0.0000
9 0.1232 0.0071 972.35 0.0000
10 0.1024 -0.0190 982.96 0.0000
11 0.0546 -0.0535 985.99 0.0000
12 0.0128 -0.0256 986.15 0.0000
13 -0.0070 0.0087 986.2 0.0000
14 0.0011 0.0340 986.2 0.0000
15 0.0108 0.0083 986.32 0.0000

16
• ac y
[Figure: ACF of y (ac y), lags 0 to 40, with Bartlett's formula for MA(q) 95% confidence bands]

17
• Note that we can do the calculation by hand
• The first 3 autocorrelations are: 0.695, 0.494,
0.333
• The confidence intervals are given by $\hat{\tau}_s \pm \dfrac{1.96}{\sqrt{T}}$
– 0.695 ± 1.96/1000^0.5 = 0.695 ± 0.06;
– 0.494 ± 0.06;
– 0.333 ± 0.06
• Q* for m = 3 is
– 1000 × 1002 × (0.695²/999 + 0.494²/998 + 0.333²/997) ≈ 841
– We reject the null that all of the first 3
autocorrelations are zero

18
Is this the number produced by Stata?
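A minimal check in Stata that the hand calculation matches the Q reported by corrgram at lag 3 (using the autocorrelations printed above):

display 1000*1002*(0.6952^2/999 + 0.4939^2/998 + 0.3334^2/997)   // ≈ 841.5, the Q at lag 3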
White Noise
• A white noise process is a process with
constant mean, constant variance and zero
autocovariance except at lag zero.
– Or, in a white noise process each observation is
uncorrelated with all other values in the sequence.
• So y(t) is a white noise process if:
• 1. $E(y_t) = \mu$
• 2. $\text{var}(y_t) = \sigma^2 < \infty$
• 3. $\gamma_s = 0$ for all $s \neq 0$

19
White Noise
• STATA has a white noise test
. wntestq y, lag(20)

Portmanteau test for white noise

Portmanteau (Q) statistic = 13885.0178


Prob > chi2(20) = 0.0000

. wntestq e, lag(20)

Portmanteau test for white noise

Portmanteau (Q) statistic = 14.2543


Prob > chi2(20) = 0.8174

• The tests confirm that y is not WN but e is WN


– This is just the Ljung-Box Q* test with a null
hypothesis that the autocorrelation function has no
significant elements at all lags>0

20
Moving average
• The MA is the simplest time series process
• Let u be a WN with E(u) = 0 and Var(u) = σ²
• Then, the following process is a moving
average of order q, written as MA(q):
$y_t = \mu + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \theta_3 u_{t-3} + \dots + \theta_q u_{t-q}$
• or
$y_t = \mu + \sum_{i=1}^{q}\theta_i u_{t-i} + u_t$

21
Moving average
• So an MA process is just a linear combination
of WN processes
• Lag operator: L y(t) = y(t-1) and L^i y(t) = y(t-i)
• So:
$y_t = \mu + \sum_{i=1}^{q}\theta_i L^i u_t + u_t$
• Or $y_t = \mu + \theta(L) u_t$, where:
$\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \theta_3 L^3 + \dots + \theta_q L^q$
22
Moving average
• From now on, we will set μ = 0
• Example: MA(2) process $y_t = u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2}$
• The mean of $y_t$ is $E(y_t) = E(u_t) + \theta_1 E(u_{t-1}) + \theta_2 E(u_{t-2}) = 0$
• The variance is $\text{var}(y_t) = E[(y_t - E(y_t))(y_t - E(y_t))]$
• Since $E(y_t) = 0$, $\text{var}(y_t) = E[y_t y_t]$
$\text{var}(y_t) = E[(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2})(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2})]$
$\text{var}(y_t) = E[u_t^2 + \theta_1^2 u_{t-1}^2 + \theta_2^2 u_{t-2}^2 + \theta_1\theta_2 u_{t-1}u_{t-2} + \theta_1 u_{t-1}u_t + \theta_2 u_{t-2}u_t + \dots]$
• But E(cross products) = 0 because u is WN
$\text{var}(y_t) = E[u_t^2 + \theta_1^2 u_{t-1}^2 + \theta_2^2 u_{t-2}^2]$
$\text{var}(y_t) = \gamma_0 = \sigma^2 + \theta_1^2\sigma^2 + \theta_2^2\sigma^2 = \sigma^2(1 + \theta_1^2 + \theta_2^2)$
23
Moving average
• What about the autocovariances?
$\gamma_1 = E[(y_t - E(y_t))(y_{t-1} - E(y_{t-1}))] = E[y_t y_{t-1}]$
$\gamma_1 = E[(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2})(u_{t-1} + \theta_1 u_{t-2} + \theta_2 u_{t-3})]$
$\gamma_1 = E[\theta_1 u_{t-1}^2 + \theta_1\theta_2 u_{t-2}^2] = \sigma^2(\theta_1 + \theta_1\theta_2)$
$\gamma_2 = E[(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2})(u_{t-2} + \theta_1 u_{t-3} + \theta_2 u_{t-4})] = E[\theta_2 u_{t-2}^2] = \sigma^2\theta_2$
$\gamma_3 = E[(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2})(u_{t-3} + \theta_1 u_{t-4} + \theta_2 u_{t-5})] = 0$

24
Moving average
• What about the autocorrelations?
$\tau_1 = \dfrac{\theta_1 + \theta_1\theta_2}{1 + \theta_1^2 + \theta_2^2}$
$\tau_2 = \dfrac{\theta_2}{1 + \theta_1^2 + \theta_2^2}$
$\tau_3 = \dfrac{0}{1 + \theta_1^2 + \theta_2^2} = 0$
$\tau_s = 0$ for $s > 2$

25
In general
• If you have a MA(q) process
$y_t = \mu + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \dots + \theta_q u_{t-q}$
– $E(y_t) = \mu$
– $\text{var}(y_t) = (1 + \theta_1^2 + \theta_2^2 + \dots + \theta_q^2)\sigma^2$
– $\gamma_1 = (\theta_1 + \theta_1\theta_2 + \theta_2\theta_3 + \dots + \theta_{q-1}\theta_q)\sigma^2$
– $\gamma_2 = (\theta_2 + \theta_1\theta_3 + \theta_2\theta_4 + \dots + \theta_{q-2}\theta_q)\sigma^2$
– $\gamma_q = \theta_q\sigma^2$
– $\gamma_m = 0$ for all $m > q$

26
• Again with STATA. We expect:
τ(1) = (θ1 + θ1θ2)/(1 + θ1² + θ2²) = (-0.5 - 0.5×0.25)/(1 + 0.5² + 0.25²) = -0.48
τ(2) = θ2/(1 + θ1² + θ2²) = 0.25/(1 + 0.5² + 0.25²) = 0.19
– clear
– set obs 1000
– gen e = invnorm(uniform())
– gen t = _n-1
– tsset t
– gen y = 0 if t==0
– replace y = 0 if t==1
– replace y = e-0.5*l.e+0.25*l2.e if t>1
– corrgram y, lag(20)

-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 -0.4994 -0.4995 250.15 0.0000


2 0.2644 0.0200 320.34 0.0000
3 -0.0574 0.1096 323.65 0.0000
4 0.0271 0.0454 324.39 0.0000
5 -0.0152 -0.0135 324.62 0.0000
6 0.0248 0.0080 325.24 0.0000
7 -0.0113 0.0096 325.37 0.0000
8 -0.0260 -0.0428 326.05 0.0000
9 0.0121 -0.0268 326.2 0.0000
10 -0.0130 -0.0057 326.37 0.0000
11 0.0240 0.0310 326.96 0.0000
12 -0.0388 -0.0197 328.48 0.0000
13 0.0182 -0.0214 328.82 0.0000
14 0.0180 0.0373 329.15 0.0000
15 -0.0386 -0.0148 330.66 0.0000
16 0.0460 0.0124 332.81 0.0000
17 0.0023 0.0466 332.82 0.0000
18 -0.0117 0.0069 332.96 0.0000
19 0.0311 0.0197 333.94 0.0000
20 -0.0264 -0.0118 334.66 0.0000

27
• ac y
[Figure: ACF of the simulated MA(2) series y, lags 0 to 40, with Bartlett's formula for MA(q) 95% confidence bands]

28
Autoregressive processes
• An AR process is a process in which the current
value of y depends only on past values of y plus
an error term. An AR(p) is defined as:
$y_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \dots + \phi_p y_{t-p} + u_t$
• or  $y_t = \mu + \sum_{i=1}^{p}\phi_i y_{t-i} + u_t$
• or  $y_t = \mu + \sum_{i=1}^{p}\phi_i L^i y_t + u_t$  (our friend the lag operator)
• or  $\phi(L) y_t = \mu + u_t$, with:
$\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \phi_3 L^3 - \dots - \phi_p L^p$
29
Back to stationarity
• Non-stationary AR models have explosive behavior, a
shock today will not die out with time but become bigger
and bigger
• Start with an AR(p) model with μ = 0: $\phi(L) y_t = u_t$
• The model is stationary if it is possible to write:
$y_t = \phi(L)^{-1} u_t$
• with the coefficients of $\phi(L)^{-1}$ converging to zero.
• Another way to say this is that stationarity requires
declining autocorrelations as the lag length increases

30
The Wold's Decomposition
• Any stationary series can be decomposed into the sum
of two unrelated processes
– A deterministic part
– A stochastic part represented by an MA(∞)
• This is called the WOLD‘S DECOMPOSITION
THEOREM
• So, an AR(p) process with no constant and no other
terms can be expressed as a MA
• The Wold decomposition of $\phi(L) y_t = u_t$ is $y_t = \psi(L) u_t$
• with:
$\psi(L) = \phi(L)^{-1} = (1 - \phi_1 L - \phi_2 L^2 - \phi_3 L^3 - \dots - \phi_p L^p)^{-1}$
31
Back to stationarity
• Note that $\phi(L)^{-1} u_t$ can be written as an infinite MA (MA(∞)):
$u_t + a_1 u_{t-1} + a_2 u_{t-2} + a_3 u_{t-3} + \dots$
• If the process is stationary, the coefficients of the
MA will decline with lag length. If the process is
not stationary, the MA coefficients will not
converge to zero
• An AR(p) process is stationary if the roots of the
"characteristic equation" all lie outside the unit
circle:
$1 - \phi_1 z - \phi_2 z^2 - \phi_3 z^3 - \dots - \phi_p z^p = 0$
32
Example I
• y(t)=by(t-1)+u(t) => y(t)(1-bL)=u(t)
• The characteristic equation is 1-bz=0 and the
root is z=1/b
• If -1 < b < 1, the root lies outside the unit circle
and the process is stationary. If not, the model is
not stationary
• For instance, a random walk is defined as:
y(t) = y(t-1) + u(t)
• Here b = 1 and z = 1: the root lies on the unit circle, so the RW is NOT stationary
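A minimal simulation sketch contrasting the two cases (hypothetical example, in the same style as the other simulations in these notes):

clear
set obs 1000
set seed 12345
gen e = invnorm(uniform())
gen t = _n-1
tsset t
gen y_st = 0 if t==0
replace y_st = 0.5*l.y_st + e if t>0   // b = 0.5: root z = 2 outside the unit circle, stationary
gen y_rw = 0 if t==0
replace y_rw = l.y_rw + e if t>0       // b = 1: root z = 1 on the unit circle, a random walk
corrgram y_rw, lag(15)                 // autocorrelations of the random walk die out very slowly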

33
Example II
• y(t)=3y(t-1)+2.75y(t-2)+0.75y(t-3)+u(t)
• y(t)(1-3L-2.75L2-0.75L3)=u(t)
• The characteristic equation is
• 1 - 3z - 2.75z² - 0.75z³ = 0
• ….mmmm are you good with cubic
equations?

34
Example II
• The characteristic equation is
• 1 - 3z - 2.75z² - 0.75z³ = 0   (non-stationary, as the roots below show)
• ….mmmm are you good with cubic equations?
. mata
mata (type end to exit)
: z = polyroots((1, -3, -2.75, -0.75))

: z
1 2 3

1 -1.96560837 - 1.08461394i -1.96560837 + 1.08461394i .264550072

– If you have a complex solution, the modulus of the complex


number (|a+bi|=(a^2+b^2)^0.5) needs to lie outside the unit circle
: abs(z)
1 2 3

1 2.24499525 2.24499525 .2645500719

35
Example
• Compute the unconditional mean, variance and ACF of
a simple AR(1) process $y_t = \mu + \phi_1 y_{t-1} + u_t$
$E(y_t) = \mu + \phi_1 E(y_{t-1})$
$E(y_{t-1}) = \mu + \phi_1 E(y_{t-2})$
$E(y_t) = \mu + \phi_1\mu + \phi_1^2 E(y_{t-2}) = \mu(1 + \phi_1) + \phi_1^2 E(y_{t-2})$
$E(y_{t-2}) = \mu + \phi_1 E(y_{t-3})$
$E(y_t) = \mu(1 + \phi_1) + \phi_1^2\mu + \phi_1^3 E(y_{t-3}) = \mu(1 + \phi_1 + \phi_1^2) + \phi_1^3 E(y_{t-3})$
$E(y_t) = \mu(1 + \phi_1 + \phi_1^2 + \dots + \phi_1^{n-1}) + \phi_1^n E(y_{t-n})$
If $|\phi_1| < 1$ (i.e. if the model is stationary), $\lim_{n\to\infty} \phi_1^n E(y_{t-n}) = 0$
$E(y_t) = \mu(1 + \phi_1 + \phi_1^2 + \dots) = \dfrac{\mu}{1 - \phi_1}$
36
• Variance (assume μ = 0)
$y_t = \phi_1 y_{t-1} + u_t$
$y_t(1 - \phi_1 L) = u_t$
From the Wold decomposition:
$y_t = (1 - \phi_1 L)^{-1} u_t = (1 + \phi_1 L + \phi_1^2 L^2 + \dots) u_t$
$y_t = u_t + \phi_1 u_{t-1} + \phi_1^2 u_{t-2} + \phi_1^3 u_{t-3} + \dots$
As long as $|\phi_1| < 1$ this sum will converge.
Remember the formula of the variance:
$\text{var}(y_t) = E\{[y_t - E(y_t)][y_t - E(y_t)]\}$
37
• Variance (assume μ = 0)
$\text{var}(y_t) = E[y_t - E(y_t)]^2$; since with $\mu = 0$, $E(y_t) = 0$:
$\text{var}(y_t) = E[y_t^2]$. Use the Wold decomposition and get:
$\text{var}(y_t) = E[(u_t + \phi_1 u_{t-1} + \phi_1^2 u_{t-2} + \phi_1^3 u_{t-3} + \dots)(u_t + \phi_1 u_{t-1} + \phi_1^2 u_{t-2} + \phi_1^3 u_{t-3} + \dots)]$
Remember that $E(u_{t-k} u_{t-s}) = 0$ if $s \neq k$:
$\text{var}(y_t) = E[u_t^2 + \phi_1^2 u_{t-1}^2 + \phi_1^4 u_{t-2}^2 + \phi_1^6 u_{t-3}^2 + \dots]$
$\text{var}(y_t) = \sigma^2[1 + \phi_1^2 + \phi_1^4 + \phi_1^6 + \dots]$, and if $|\phi_1| < 1$:
$\text{var}(y_t) = \dfrac{\sigma^2}{1 - \phi_1^2}$
38
• Autocovariances (assume μ = 0)
$\gamma_1 = E\{[y_t - E(y_t)][y_{t-1} - E(y_{t-1})]\}$; since with $\mu = 0$, $E(y_t) = 0$:
$\gamma_1 = E[y_t y_{t-1}]$. Use the Wold decomposition and get:
$\gamma_1 = E[(u_t + \phi_1 u_{t-1} + \phi_1^2 u_{t-2} + \phi_1^3 u_{t-3} + \dots)(u_{t-1} + \phi_1 u_{t-2} + \phi_1^2 u_{t-3} + \phi_1^3 u_{t-4} + \dots)]$
Remember that $E(u_{t-k} u_{t-s}) = 0$ if $s \neq k$:
$\gamma_1 = E[\phi_1 u_{t-1}^2 + \phi_1^3 u_{t-2}^2 + \phi_1^5 u_{t-3}^2 + \dots] = \sigma^2\phi_1[1 + \phi_1^2 + \phi_1^4 + \dots]$, and if $|\phi_1| < 1$:
$\gamma_1 = \dfrac{\phi_1\sigma^2}{1 - \phi_1^2}$
Using the same procedure:
$\gamma_2 = \dfrac{\phi_1^2\sigma^2}{1 - \phi_1^2}$, $\gamma_3 = \dfrac{\phi_1^3\sigma^2}{1 - \phi_1^2}$, and in general $\gamma_s = \dfrac{\phi_1^s\sigma^2}{1 - \phi_1^2}$, i.e. $\gamma_s = \phi_1\gamma_{s-1}$
The autocovariances decay at rate $\phi_1$.
39
• Getting the autocorrelation functions is now a
piece of cake:
$\tau_0 = \dfrac{\gamma_0}{\gamma_0} = 1$
$\tau_1 = \dfrac{\gamma_1}{\gamma_0} = \dfrac{\phi_1\sigma^2/(1-\phi_1^2)}{\sigma^2/(1-\phi_1^2)} = \phi_1, \qquad \tau_2 = \dfrac{\gamma_2}{\gamma_0} = \phi_1^2$
$\tau_3 = \dfrac{\gamma_3}{\gamma_0} = \phi_1^3, \qquad \tau_s = \dfrac{\gamma_s}{\gamma_0} = \phi_1^s$

40
For higher order processes
getting the ACF is messier
• The mean and the autocorrelations of an AR
process can be obtained as follows
• Unconditional mean: $E(y_t) = \dfrac{\mu}{1 - \phi_1 - \phi_2 - \phi_3 - \dots - \phi_p}$
• The correlogram can be obtained from the Yule-Walker equations:
$\tau_1 = \phi_1 + \tau_1\phi_2 + \tau_2\phi_3 + \dots + \tau_{p-1}\phi_p$
$\tau_2 = \tau_1\phi_1 + \phi_2 + \tau_1\phi_3 + \dots + \tau_{p-2}\phi_p$
$\tau_3 = \tau_2\phi_1 + \tau_1\phi_2 + \phi_3 + \dots + \tau_{p-3}\phi_p$
....
$\tau_p = \tau_{p-1}\phi_1 + \tau_{p-2}\phi_2 + \tau_{p-3}\phi_3 + \dots + \phi_p$
• It is a bit of a mess
• But we will see an example soon
41
In this example σ² = 1 and μ = 0, so we expect:
var(y(t)) = 1/(1 - 0.7²) = 1.961,  τ(1) = 0.7,  τ(2) = 0.7² = 0.49,  τ(3) = 0.7³ = 0.34
• With STATA
– set obs 1000
– gen e = invnorm(uniform())
– gen t = _n-1
– tsset t
– gen y = 0 if t==0
– replace y = 0.7*l.y+e if t>0
– corrgram y, lag(15)

LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.6952 0.6953 484.82 0.0000


2 0.4939 0.0205 729.78 0.0000
3 0.3334 -0.0335 841.48 0.0000
4 0.2238 -0.0035 891.85 0.0000
5 0.1456 -0.0074 913.2 0.0000
6 0.1209 0.0515 927.92 0.0000
7 0.1140 0.0340 941.03 0.0000
8 0.1257 0.0476 957 0.0000
9 0.1232 0.0071 972.35 0.0000
10 0.1024 -0.0190 982.96 0.0000
11 0.0546 -0.0535 985.99 0.0000
12 0.0128 -0.0256 986.15 0.0000
13 -0.0070 0.0087 986.2 0.0000
14 0.0011 0.0340 986.2 0.0000
15 0.0108 0.0083 986.32 0.0000

42
[Figure: ACF of the simulated AR(1) series y, lags 0 to 40, with Bartlett's formula for MA(q) 95% confidence bands]

43
A piece of advice about simulation
• The first observations of simulated series are
influenced by the choice of y(0).
• When simulating data from an autoregressive
process, it is nice to create a series much longer
than what you actually want and then discard or
"burn" the observations at the beginning of the
series.
• In general, the larger φ (in absolute value), the longer the burn-in
period needed
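A minimal burn-in sketch in Stata (assuming we want 1,000 usable observations from a persistent AR(1) with φ = 0.9):

clear
set obs 1200                       // 200 extra observations to "burn"
set seed 12345
gen e = invnorm(uniform())
gen t = _n-1
tsset t
gen y = 0 if t==0
replace y = 0.9*l.y + e if t>0     // y(0) = 0 is arbitrary; early observations reflect it
drop if t < 200                    // discard the burn-in period before using the series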

44
The ACF for an AR(2) process
$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + u_t$; we will derive the Yule-Walker equations.
Multiply the AR process by $y_{t-s}$ for $s = 0, 1, 2, 3, \dots$ and take expectations:
$E[y_t y_t] = \phi_1 E[y_{t-1} y_t] + \phi_2 E[y_{t-2} y_t] + E[u_t y_t]$
$E[y_t y_{t-1}] = \phi_1 E[y_{t-1} y_{t-1}] + \phi_2 E[y_{t-2} y_{t-1}] + E[u_t y_{t-1}]$
$E[y_t y_{t-2}] = \phi_1 E[y_{t-1} y_{t-2}] + \phi_2 E[y_{t-2} y_{t-2}] + E[u_t y_{t-2}]$
...
$E[y_t y_{t-s}] = \phi_1 E[y_{t-1} y_{t-s}] + \phi_2 E[y_{t-2} y_{t-s}] + E[u_t y_{t-s}]$
45
THE ACF for an AR(2) process
Because of stationarity: $E[y_t y_{t-s}] = E[y_{t-k} y_{t-k-s}] = \gamma_s$
Moreover $E[u_t y_t] = \sigma^2$ and $E[u_t y_{t-s}] = 0$ for $s > 0$. So:
$\gamma_0 = \phi_1\gamma_1 + \phi_2\gamma_2 + \sigma^2$
$\gamma_1 = \phi_1\gamma_0 + \phi_2\gamma_1$  (A)
$\gamma_2 = \phi_1\gamma_1 + \phi_2\gamma_0$  (B)
$\gamma_3 = \phi_1\gamma_2 + \phi_2\gamma_1$  (C)
...
$\gamma_s = \phi_1\gamma_{s-1} + \phi_2\gamma_{s-2}$  (D)
Note the strange pattern at $\gamma_0, \gamma_1, \gamma_2$. This is because $\gamma_{-1} = \gamma_1$ and $\gamma_{-2} = \gamma_2$. 46
Now we can divide (A), (B), (C), (D) by $\gamma_0$
to get autocorrelations:
$\tau_1 = \phi_1\tau_0 + \phi_2\tau_1$   (i)   (this is Yule-Walker)
$\tau_2 = \phi_1\tau_1 + \phi_2\tau_0$   (ii)
$\tau_3 = \phi_1\tau_2 + \phi_2\tau_1$   (iii)
...
$\tau_s = \phi_1\tau_{s-1} + \phi_2\tau_{s-2}$   (iv)
(These are the AR(2) case of the general Yule-Walker equations shown earlier.)
Solving (i) (remember $\tau_0 = 1$):
$\tau_1 = \phi_1/(1 - \phi_2)$
Solving (ii) and (iii):
$\tau_2 = \phi_1^2/(1 - \phi_2) + \phi_2$
$\tau_3 = \phi_1[\phi_1^2/(1 - \phi_2) + \phi_2] + \phi_2\phi_1/(1 - \phi_2)$
47
THE ACF for AR(2) process
• It gets messier for higher orders
• But, given the solutions for τ1 and τ2, all
autocorrelation functions need to satisfy
the difference equation in (iv)

48
– set obs 1000
– gen e = invnorm(uniform())
– gen t = _n-1
– tsset t
– gen y = 0 if t==0
– replace y = 0 if t==1
– replace y = 0.7*l.y-0.49*l2.y+e if t>1
– corrgram y, lag(15)

LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.4597 0.4599 2114.3 0.0000


2 -0.1759 -0.4913 2423.9 0.0000
3 -0.3394 0.0119 3576.4 0.0000
4 -0.1345 0.0116 3757.4 0.0000
5 0.0690 -0.0170 3805.1 0.0000
6 0.1005 -0.0056 3906.1 0.0000
7 0.0306 0.0033 3915.5 0.0000
8 -0.0210 0.0059 3919.9 0.0000
9 -0.0292 -0.0109 3928.4 0.0000
10 -0.0313 -0.0302 3938.2 0.0000
11 -0.0258 -0.0060 3944.9 0.0000
12 -0.0017 0.0042 3944.9 0.0000
13 0.0198 -0.0022 3948.9 0.0000
14 0.0126 -0.0109 3950.5 0.0000
15 -0.0088 -0.0037 3951.3 0.0000
16 -0.0086 0.0128 3952 0.0000
17 0.0091 0.0057 3952.8 0.0000
18 0.0082 -0.0125 3953.5 0.0000
19 -0.0117 -0.0095 3954.9 0.0000
20 -0.0189 0.0007 3958.4 0.0000

49
[Figure: ACF of the simulated AR(2) series y, lags 0 to 40, with Bartlett's formula for MA(q) 95% confidence bands]

50
But what is PACF?
• PACF: Partial autocorrelation function
• In an AR(1) process y(t) and y(t-2) are correlated even
though y(t-2) does not appear directly in the model.
– We just saw that the correlation between y(t) and y(t-2) is equal
to the correlation between y(t) and y(t-1) (φ1) multiplied by the
correlation between y(t-1) and y(t-2) (φ1), so that τ2 = τ1·τ1 = φ1²
– All such indirect correlations are present in the ACF of any
autoregressive process
• The partial autocorrelation eliminates the effects of any
intervening values y(t-1) through y(t-s+1)
– Clearly, at lag 1 the PACF and the ACF are identical because
there is no intervening coefficient to eliminate
• So, in an AR(1) process the PACF between y(t) and y(t-2) is equal to zero but the ACF is equal to φ1²
51
Formulae for PACF
$\tau_{11} = \tau_1$
$\tau_{22} = \dfrac{\tau_2 - \tau_1^2}{1 - \tau_1^2}$
Computing higher-order PACFs is complicated, but computer programs do it for you. If you really want to know, you can use the Yule-Walker equations and get:
$\tau_{ss} = \dfrac{\tau_s - \sum_{j=1}^{s-1}\tau_{s-1,j}\,\tau_{s-j}}{1 - \sum_{j=1}^{s-1}\tau_{s-1,j}\,\tau_j}, \quad \text{for } s = 3, 4, 5, \dots$
where $\tau_{sj} = \tau_{s-1,j} - \tau_{ss}\,\tau_{s-1,s-j}$, for $j = 1, 2, 3, \dots, s-1$
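As a sanity check of the formula for τ22, plug in the AR(1) correlogram reported earlier (AC(1) = 0.6952, AC(2) = 0.4939):

display (0.4939 - 0.6952^2)/(1 - 0.6952^2)   // ≈ 0.0205, essentially the PAC at lag 2 reported by corrgram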


52
• In general for AR(p) the PACF goes to zero for all
lags greater than p.
– This is a useful feature for identifying AR(p) processes
• If instead we have an MA process the PACF
never goes to zero
$y_t = u_t + \theta u_{t-1}$; if $|\theta| < 1$ the MA is invertible and has an infinite-order
AR representation:
$\dfrac{y_t}{1 + \theta L} = u_t$, i.e. $y_t = \theta y_{t-1} - \theta^2 y_{t-2} + \theta^3 y_{t-3} - \dots + u_t$
– Hence, the PACF will not go to zero because y(t) will be
correlated with all of its own lags

53
Invertibility
• The concept of invertibility is similar to that
of stationarity, but when we talk about MA
we usually talk about invertibility and when
we talk about AR we talk about stationarity
• An MA(q) model is invertible if the roots of
the characteristic equation are outside the
unit circle (same condition as stationarity)
• This condition prevents the AR(∞) representation
of the model from exploding
54
• Invertibility for an MA(2) model:
$y_t = u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} = \theta(L) u_t$
If the process is invertible, it can be expressed as an AR(∞):
$y_t = c_1 y_{t-1} + c_2 y_{t-2} + c_3 y_{t-3} + c_4 y_{t-4} + \dots + u_t$, i.e. $y_t = \sum_{i=1}^{\infty} c_i y_{t-i} + u_t$
• So, the value of y(t) will depend on all of its
past values and the PACF will not die out

55
What have we learnt so far?
• If we have an AR(p) model
– The ACF declines exponentially
– The PACF goes to zero at lag p+1
• If we have an MA(q) model
– The ACF goes to zero at lag q+1
– The PACF declines exponentially

56
clear
set obs 10000
gen e = invnorm(uniform())
gen t = _n-1
tsset t
gen y = 0 if t==0
replace y = 0 if t==1
replace y = 0 if t==2
gen y_ar= y
gen y_ma= y
replace y_ar = 0.9*l.y_ar+e if t>0
replace y_ma = e+0.9*l.e if t>0
corrgram y_ar, lag(20)

LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.8977 0.8977 8060.7 0.0000


2 0.8021 -0.0192 14497 0.0000
3 0.7153 -0.0069 19616 0.0000
4 0.6369 -0.0048 23675 0.0000
5 0.5664 -0.0030 26885 0.0000
6 0.5008 -0.0150 29396 0.0000
7 0.4427 0.0004 31358 0.0000
8 0.3910 -0.0015 32888 0.0000
9 0.3435 -0.0085 34069 0.0000
10 0.3011 -0.0030 34977 0.0000
11 0.2634 -0.0015 35671 0.0000
12 0.2323 0.0102 36211 0.0000
13 0.2020 -0.0151 36620 0.0000
14 0.1738 -0.0083 36923 0.0000
15 0.1502 0.0046 37149 0.0000
16 0.1305 0.0044 37319 0.0000
17 0.1156 0.0113 37453 0.0000
18 0.1045 0.0104 37563 0.0000
19 0.0942 -0.0017 37652 0.0000
20 0.0885 0.0173 37730 0.0000

57
[Figure: ACF of y_ar, lags 0 to 40, with Bartlett's formula for MA(q) 95% confidence bands]

58
pac y_ar
[Figure: PACF of y_ar, lags 0 to 40, 95% confidence bands, se = 1/sqrt(n)]
59
clear
set obs 10000
gen e = invnorm(uniform())
gen t = _n-1
tsset t
gen y = 0 if t==0
replace y = 0 if t==1
replace y = 0 if t==2
gen y_ar= y
gen y_ma= y
replace y_ar = 0.9*l.y_ar+e if t>0
replace y_ma = e+0.9*l.e if t>0
corrgram y_ma, lag(20)

LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.5051 0.5051 2552 0.0000


2 0.0070 -0.3331 2552.5 0.0000
3 -0.0037 0.2412 2552.6 0.0000
4 -0.0017 -0.1879 2552.7 0.0000
5 0.0004 0.1524 2552.7 0.0000
6 -0.0088 -0.1416 2553.4 0.0000
7 -0.0109 0.1142 2554.6 0.0000
8 -0.0059 -0.1030 2555 0.0000
9 -0.0088 0.0781 2555.8 0.0000
10 -0.0177 -0.0897 2558.9 0.0000
11 -0.0160 0.0702 2561.5 0.0000
12 0.0001 -0.0533 2561.5 0.0000
13 -0.0013 0.0367 2561.5 0.0000
14 -0.0168 -0.0541 2564.3 0.0000
15 -0.0239 0.0246 2570.1 0.0000
16 -0.0261 -0.0476 2576.9 0.0000
17 -0.0183 0.0322 2580.3 0.0000
18 -0.0120 -0.0416 2581.7 0.0000
19 -0.0100 0.0306 2582.7 0.0000
20 0.0116 -0.0008 2584 0.0000

60
[Figure: ACF of y_ma, lags 0 to 40, with Bartlett's formula for MA(q) 95% confidence bands]

61
[Figure: PACF of y_ma, lags 0 to 40, 95% confidence bands, se = 1/sqrt(n)]

62
What have we learnt so far?
• If we have an AR(p) model
– The ACF declines geometrically
– The PACF goes to zero at lag p+1
• If we have an MA(q) model
– The ACF goes to zero at lag q+1
– The PACF declines geometrically
• If these were the only two possibilities,
life would be very easy
63
ARMA
• ARMA models, i.e., models which have an AR
and an MA component complicate our life!
• ARMA(p,q):
$y_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \dots + \phi_p y_{t-p} + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \theta_3 u_{t-3} + \dots + \theta_q u_{t-q} + u_t$
or, more concisely:
$\phi(L) y_t = \mu + \theta(L) u_t$
with
$\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \phi_3 L^3 - \dots - \phi_p L^p$
$\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \theta_3 L^3 + \dots + \theta_q L^q$
64
ARMA
• ARMA models, i.e., models which have an AR and an MA component
complicate our life!
• In fact, they don’t really complicate our life.
• Think about it!
• We know that under certain conditions on the parameters there exist
fundamental relationships between autoregressive and moving-average
processes, as long as we are willing to consider an infinite number of
parameters.
• However, we favor models that can characterize the properties of data
using as few parameters as possible.
• Models that contain both autoregressive and moving-average terms often
allow us to obtain such parsimonious models.
• Essentially, an ARMA model allows us to capture the dynamics of our data
using fewer parameters than would be necessary if we restricted ourselves
to pure AR or MA processes
– Having fewer parameters to estimate is particularly helpful in time-series
analysis, where the correlation among observations reduces the effective
number of observations.

65
ARMA
• We now know that a distinguishing factor
between autoregressive and moving-average
processes is how shocks to a series affect future
realizations.
• With an MA(q) process, a shock at time t has
absolutely no effect on the series in periods t + q
+ 1 and beyond.
• With an AR(p) process, the effects of a shock
decay gradually over time.
• We will use these characteristics to differentiate
series based on their moving-average and
autoregressive properties
66
ARMA
• The ACF and PACF of an ARMA(p,q) model
are a combination of those of the AR and MA
models
• Both will be declining geometrically
• After lag q, the ACF is dominated by the AR
process and starts decaying towards zero
• After lag p, the PACF is dominated by the MA
process and starts decaying towards zero

67
ARMA(1,1)
• This is the workhorse specification
– (if the series is stationary)
• Calculate autocorrelations with Yule-Walker:
$E(y_t y_t) = \phi_1 E(y_{t-1} y_t) + \theta_1 E(u_{t-1} y_t) + E(u_t y_t) \;\Rightarrow\; \gamma_0 = \phi_1\gamma_1 + \theta_1(\phi_1 + \theta_1)\sigma^2 + \sigma^2$
$E(y_t y_{t-1}) = \phi_1 E(y_{t-1} y_{t-1}) + \theta_1 E(u_{t-1} y_{t-1}) + E(u_t y_{t-1}) \;\Rightarrow\; \gamma_1 = \phi_1\gamma_0 + \theta_1\sigma^2$
$E(y_t y_{t-2}) = \phi_1 E(y_{t-1} y_{t-2}) + \theta_1 E(u_{t-1} y_{t-2}) + E(u_t y_{t-2}) \;\Rightarrow\; \gamma_2 = \phi_1\gamma_1$
$E(y_t y_{t-s}) = \phi_1 E(y_{t-1} y_{t-s}) + \theta_1 E(u_{t-1} y_{t-s}) + E(u_t y_{t-s}) \;\Rightarrow\; \gamma_s = \phi_1\gamma_{s-1}$
Solving the first two equations simultaneously:
$\gamma_0 = \sigma^2\,\dfrac{1 + \theta_1^2 + 2\phi_1\theta_1}{1 - \phi_1^2}$
$\gamma_1 = \sigma^2\,\dfrac{(1 + \phi_1\theta_1)(\phi_1 + \theta_1)}{1 - \phi_1^2}$, and
$\tau_1 = \dfrac{(1 + \phi_1\theta_1)(\phi_1 + \theta_1)}{1 + \theta_1^2 + 2\phi_1\theta_1}$, and $\tau_s = \phi_1\tau_{s-1}$ for $s \geq 2$
68
• Example: y(t) = -0.7y(t-1) - 0.7u(t-1) + u(t)
$\tau_1 = \dfrac{(1 + \phi_1\theta_1)(\phi_1 + \theta_1)}{1 + \theta_1^2 + 2\phi_1\theta_1} = \dfrac{(1 + 0.49)(-0.7 - 0.7)}{1 + 0.49 + 2 \times 0.49} = -0.844$
$\tau_2 = \phi_1\tau_1 = -0.7 \times (-0.844) = 0.591$
$\tau_3 = \phi_1\tau_2 = -0.7 \times 0.591 = -0.414, \quad \tau_4 = 0.29, \quad \tau_5 = -0.203$
$\tau_6 = 0.142, \quad \tau_7 = -0.099, \quad \tau_8 = 0.07$
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 -0.8445 -0.8446 7133.8 0.0000


2 0.5837 -0.4515 10542 0.0000
3 -0.4012 -0.2933 12152 0.0000
4 0.2744 -0.2143 12906 0.0000
5 -0.1883 -0.1667 13261 0.0000
6 0.1294 -0.1325 13428 0.0000
7 -0.0866 -0.0950 13503 0.0000
8 0.0550 -0.0836 13533 0.0000
9 -0.0341 -0.0708 13545 0.0000
10 0.0240 -0.0427 13551 0.0000

69
[Figure: ACF of the simulated ARMA(1,1) series, lags 0 to 40, with Bartlett's formula for MA(q) 95% confidence bands]

70
[Figure: PACF of the simulated ARMA(1,1) series, lags 0 to 40, 95% confidence bands, se = 1/sqrt(n)]

71
(1  11 )(1  1 ) (1  0.491 )(0.7  0.7)
t1    0.844
1  1  211
2
1  0.49  2 * 0.49
t 2  1t 1  0.7 * (0.844)  0.591
t 3  1t 2  0.7 * (0.591)  0.414,t 4  0.29,t 5  0.203
t 6  0.142,t 7  0.099,t 8  0.07

Same but with 100 obs (before we had 10,000)

LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 -0.7905 -0.8060 64.386 0.0000


2 0.4489 -0.4827 85.358 0.0000
3 -0.2564 -0.3723 92.273 0.0000
4 0.1969 -0.1451 96.391 0.0000
5 -0.1532 0.0048 98.91 0.0000
6 0.0991 0.0300 99.977 0.0000
7 -0.0457 0.0693 100.21 0.0000
8 0.0209 0.0654 100.25 0.0000
9 -0.0421 -0.0942 100.45 0.0000
10 0.0509 -0.1875 100.75 0.0000
72
ARMA(2,1) model
y(t)=1.6y(t-1)-0.9y(t-2)+0.5u(t-1) +u(t)

LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.8479 0.8482 7191 0.0000


2 0.4559 -0.9365 9270.2 0.0000
3 -0.0320 0.3837 9280.4 0.0000
4 -0.4582 -0.1852 11381 0.0000
5 -0.7009 0.0831 16297 0.0000
6 -0.7068 -0.0471 21296 0.0000
7 -0.4998 0.0123 23797 0.0000
8 -0.1656 -0.0081 24072 0.0000
9 0.1814 0.0001 24401 0.0000
10 0.4357 0.0025 26302 0.0000
11 0.5315 0.0102 29130 0.0000
12 0.4581 0.0035 31232 0.0000
13 0.2574 0.0272 31895 0.0000
14 0.0054 0.0121 31896 0.0000
15 -0.2163 -0.0132 32364 0.0000
16 -0.3461 -0.0153 33565 0.0000
17 -0.3581 -0.0077 34850 0.0000
18 -0.2648 -0.0053 35552 0.0000
19 -0.1075 0.0036 35668 0.0000
20 0.0592 -0.0096 35703 0.0000

73
[Figure: ACF of the simulated ARMA(2,1) series, lags 0 to 40, with Bartlett's formula for MA(q) 95% confidence bands]

74
[Figure: PACF of the simulated ARMA(2,1) series, lags 0 to 40, 95% confidence bands, se = 1/sqrt(n)]

75
Cheatsheet
Process | ACF | PACF
White noise | All τ(s) = 0 for s > 0 | All τ(ss) = 0
AR(1) with φ1 > 0 | Exponential decay: τ(s) = φ1^s | τ(11) = τ(1); τ(ss) = 0 for s > 1
AR(1) with φ1 < 0 | Oscillating decay: τ(s) = φ1^s | Same as above
AR(p) | Exponential decay (it can be oscillating) | Different from zero until lag p and then zero
MA(1) with θ > 0 | Positive spike at lag 1; τ(s) = 0 for s > 1 | Oscillating decay
MA(1) with θ < 0 | Negative spike at lag 1; τ(s) = 0 for s > 1 | Exponential decay
MA(q) | Goes to zero at lag q+1 | Exponential decay
ARMA(1,1) with φ1 > 0 | Exponential decay from lag 1; sign τ(1) = sign(φ1 + θ1) | Oscillating decay from lag 1; τ(11) = τ(1)
ARMA(1,1) with φ1 < 0 | Oscillating decay from lag 1; sign τ(1) = sign(φ1 + θ1) | Exponential decay from lag 1; τ(11) = τ(1) and sign τ(ss) = sign τ(11)
ARMA(p,q) | Decay starts at lag q | Decay starts at lag p
76
From Enders: Tab 2.1
How to do it!
The Box-Jenkins approach
• BJ developed a systematic way to estimate ARMA
• Three steps:
1. Identification
2. Estimation
3. Diagnostic
• Note that we are still assuming that the process is
stationary
– Step 0 is to test for stationarity
• Remember: parsimonious models are best; adding lags
improves the R² but it does not necessarily improve
forecasting performance (it can actually reduce it)
– More parameters mean fewer degrees of freedom
– Too many parameters overfit historical accidents that are
unlikely to occur again

77
Identification
• In this step we want to find out the order of the model
– Plot the data and see if there is something strange (trend, outliers, missing
values, structural breaks)
– Compute and plot ACF and PACF
• The ACF and PACF are useful for selecting the number of autoregressive
and moving average terms.
– Recall that for an AR(p) process, the first p partial autocorrelations will be
significant and the (p+1)-st and later ones will be zero, while the ACF decays
slowly. Moving-average processes show gradually decaying partial
autocorrelations and have an ACF that goes to zero after lag q.
• Unfortunately, real-world data don't look as nice as the generated data we
used so far, and it will be difficult to infer the order of the process by just
looking at the graph
– Typically a series is represented by a mixed ARMA process, in which case the
autocorrelation function decays either more slowly or more quickly than we would
expect from a pure autoregressive process.
– In these cases, we fit several candidate models with varying numbers of
autoregressive and moving-average terms and then employ information criteria
to select a final model.

78
Identification
• Information criteria
• The most popular IC are
– Akaike's information criterion (AIC)
– Schwarz's Bayesian information criterion
(SBIC)
– Hannan-Quinn information criterion (HQIC)

79
Identification
• AIC = T·ln(sum of squared residuals) + 2k
• SBIC = T·ln(sum of squared residuals) + k·ln(T)
• HQIC = T·ln(sum of squared residuals) + 2k·ln(ln(T))
(the last term in each criterion is the punishment for more parameters)
– T = number of observations
– k = number of parameters to be estimated
• k = p + q, or k = p + q + 1 if there is a constant
• The lower the IC, the better the model
• Note that the IC are decreasing in T
– You need to make sure to estimate the models keeping T
constant
– When you add lags T becomes smaller and the IC becomes smaller, but
this does not mean that the model is better
– Example: you have 100 observations. An AR(1) model will use 99
obs. and an AR(2) model will use 98. You need to compare the
models using 98 obs.
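A minimal sketch of this in Stata, assuming the simulated y from above (note that Stata's est stats / estat ic report the likelihood-based AIC = -2lnL + 2k and BIC rather than the SSR-based formulas on this slide, but the model-ranking logic is the same; make sure the candidate models share the same estimation sample):

arima y, ar(1)
est store ar1
arima y, ar(1/2)
est store ar2
arima y, ar(1) ma(1)
est store arma11
est stats ar1 ar2 arma11     // lower AIC/BIC indicates the preferred model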
80
Identification
• But which criteria should you be using?
• The SBIC has better large sample properties
while the AIC tend to be biased towards
selecting an overparametrized model (AIC is
inconsistent, in the sense that as the sample
size tends to infinity, AIC will not select the right
model with probability one)
– (SBIC has a stiffer penalty: k is multiplied by ln(T); with
100 obs, ln(T) = 4.6)
– HQIC is somewhere in between
• It is good when the 3 criteria give similar
orderings
– But sometimes, they don’t
81
Identification
• Another clue that a model is misspecified
is the iteration log produced by arima.
– If arima requires many iterations to converge
and you receive many not concave or backed
up messages, try refitting the model with
fewer moving-average lags.
– Autoregressive terms are relatively easy to
estimate, but ARIMA models with incorrect
moving-average specifications are particularly
problematic.
• In fitting a linear regression, having an irrelevant
regressor simply leads to a coefficient near zero;
with nonlinear models such as ARIMA, however,
trying to estimate irrelevant parameters is not so
easy. 82
Identification
• While statisticians frequently frown on ad-hoc
specification searches, in the context of ARIMA
modeling, fitting several competing models and
choosing the "best" one is often justified.
– Typically, ARIMA models are used for forecasting and
smoothing purposes, so selecting the model which by
some measure fits the data best is a natural course of
action.
• Finally, selecting the proper ARIMA model
specification is as much art as it is science
– As you fit more models to data that interest you, you
will become more adept in quickly picking an
appropriate model.

83
Estimation
• Before computing IC you need to estimate
the model,
• This is easy, you just tell the computer to
do it
– (not so easy without a computer, it is not
generally done with OLS but with MLE)

84
Diagnostic
• Check the R²
• Look at whether the φ and θ coefficients are statistically significant
• How long does it take to converge?
– If convergence takes time, there may be a problem
• Compute the IC
• Plot the residuals and look for outliers
• Check if the residuals are serially correlated (do this by
looking at ACF and PACF of the residuals)
– If you get several residuals correlation that are marginally
significant (or close to being significant) and a Q statistics which
is barely significant at 10%, be suspicious
• Evaluate forecasting properties
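A minimal sketch of the residual checks listed above (hypothetical, after fitting e.g. arima y, ar(1) ma(1)):

predict uhat, residuals      // model residuals
tsline uhat                  // eyeball outliers
corrgram uhat, lag(20)       // residual ACF/PACF and Ljung-Box Q
wntestq uhat, lag(20)        // portmanteau white-noise test on the residuals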
85
Forecasting
• Assume you have an AR(1)
• You estimate y(t) = μ + φy(t-1) + u(t)
• And use it to predict y(t+1). How?
– $E_t(y_{t+1}) = \mu + \phi y_t$
– $E_t(y_{t+2}) = \mu + \phi E_t(y_{t+1}) = \mu + \phi\mu + \phi^2 y_t$
– $E_t(y_{t+s}) = \mu + \phi E_t(y_{t+s-1}) = \mu(1 + \phi + \dots + \phi^{s-1}) + \phi^s y_t$
• But the quality of the forecasts declines with s.
• If the series is stationary, when s goes to infinity $E_t(y_{t+s}) \to \mu/(1-\phi)$
– The forecast converges to the unconditional mean
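A minimal sketch of how such forecasts are produced in Stata (assuming a long simulated AR(1) like the one plotted on the next slide; variable names are hypothetical, and dynamic() switches from one-step-ahead to dynamic forecasts at the date given):

arima y, ar(1)
predict y_onestep, xb            // one-step-ahead (in-sample) predictions
predict y_dyn, dynamic(9980)     // dynamic forecasts from t = 9980 onward
tsline y y_onestep y_dyn if t>9950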

86
[Figure: in-sample forecasts (before t = 9980) and out-of-sample forecasts (after t = 9980), t = 9950 to 10000; plotted series: y, xb prediction, dyn(9980)]

87
Forecasting
(Note: forecasting errors are driven by both parameter uncertainty and the "Mafalda effect": we don't know the future, and therefore the parameters may change.)
• The j-steps-ahead forecasting error is $e_t(j) = y_{t+j} - E_t(y_{t+j})$
• For an AR(1), the j-steps-ahead forecasting error is
$e_t(j) = u_{t+j} + \phi u_{t+j-1} + \phi^2 u_{t+j-2} + \phi^3 u_{t+j-3} + \dots + \phi^{j-1} u_{t+1}$
• Clearly $E(e_t(j)) = 0$
• $\text{var}(e_t(j)) = \sigma^2[1 + \phi^2 + \phi^4 + \phi^6 + \dots + \phi^{2(j-1)}]$
• The variance is an increasing function of j.
• For j that goes to infinity:
$\text{var}(e_t(j)) = \sigma^2/(1 - \phi^2)$
• If {u_t} is normally distributed we can build confidence intervals for the forecasts
– The 95% CI for the one-step-ahead forecast is $\mu + \phi y_t \pm 1.96\sigma$
– The 95% CI for the two-steps-ahead forecast is $\mu(1+\phi) + \phi^2 y_t \pm 1.96\sigma\sqrt{1 + \phi^2}$
– The 95% CI for the 3-steps-ahead forecast is $\mu(1+\phi+\phi^2) + \phi^3 y_t \pm 1.96\sigma\sqrt{1 + \phi^2 + \phi^4}$
• This can be generalized to ARMA(p,q) processes but it's messy

88
How good are my forecasts?
• Strategy:
– You have T observations
– Estimate different types of models using T-H
observations
– Use the results to forecast the H observations that are
not included in the model
– Compare model performance
• Note: you can do this by keeping the same window length
(rolling window) or by adding one observation at a time
(recursive window)
• Lengthening the window gives you more information, but if one
model picked up a big error in the past and then does a better job
(maybe because of structural change), a rolling window may be
preferable

89
How good are my forecasts?
• How do you compare model performance?
• Having a small mean square prediction error (MSPE) is
good. If your forecasting error is e:
$\text{MSPE} = \dfrac{1}{H}\sum_{i=1}^{H} e_i^2$
• Assume you have two models. Call MSPE1 and MSPE2
their MSPEs.
• You find that MSPE1>MSPE2 but you don't know whether
it is significantly bigger
– You can check this with an F-test with (H,H) degrees of freedom
– F=MSPE1/MSPE2
90
How good are my forecasts?
• That F-test looked cool, but it requires three assumptions:
1. The forecasts errors have zero mean and they are normally
distributed
2. The forecast errors are serially uncorrelated
3. The forecast errors are contemporaneously uncorrelated with each
other
• These 3 assumptions rarely hold.
– Forecast errors are often serially correlated
– The fact that u is normal does not guarantee that e is normal.
– The forecast errors of two different models are often
contemporaneously correlated with each other.
• Under these conditions the ratio of MSPES does not have
an F distribution
• What to do?

91
Granger-Newbold test
• This test does not require contemporaneously uncorrelated
forecasts errors
• Define e1 and e2 the forecasts errors of the first and second model
• Use them to build xt= e1t + e2t and zt= e1t - e2t
• If the first two assumptions are valid, under the null hypothesis of
equal forecast accuracy, xt and zt should be uncorrelated
• Note that $E(x_t z_t) = E(e_{1t}^2 - e_{2t}^2)$, so if the correlation $r_{xz} > 0$ then
MSPE1 > MSPE2 (and vice versa)
• We can test this because Granger and Newbold showed that
$GN = \dfrac{r_{xz}}{\sqrt{(1 - r_{xz}^2)/(H-1)}}$
has a t-distribution with H-1 degrees of freedom ($r_{xz}$ is the sample
value of the correlation) 92
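A minimal sketch of the Granger-Newbold calculation in Stata (assuming fe_1 and fe_2 hold the two models' one-step-ahead forecast errors over the H evaluation periods; variable names are hypothetical):

gen x = fe_1 + fe_2
gen z = fe_1 - fe_2
corr x z                                      // sample correlation r_xz over the H forecast periods
local r = r(rho)
local H = r(N)
display "GN = " `r'/sqrt((1-`r'^2)/(`H'-1))   // compare with a t distribution with H-1 d.o.f.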
Diebold and Mariano
• The GN test still requires the first two
assumptions
• The GN test makes sense only if the loss
function is quadratic
– Maybe you care about absolute values
– Maybe you care about errors in one direction
• Diebold and Mariano developed a test that
relaxes all 3 assumptions and
allows for non-quadratic loss functions
93
Diebold and Mariano
• Assume that you are working with one-step ahead forecast errors
• Call g(e1i) the loss from a forecast error of model 1 in period i
• The differential loss from using model 1 rather than model 2 in period i is $d_i = g(e_{1i}) - g(e_{2i})$
• The mean loss is
$\bar{d} = \dfrac{1}{H}\sum_{i=1}^{H}\left[ g(e_{1i}) - g(e_{2i}) \right]$
• Under the null of equal forecast accuracy, the expected value of $\bar{d}$ is 0
• Since $\bar{d}$ is an average, the CLT applies and $\bar{d}$ has an (asymptotically) normal
distribution
• If we can find its variance we have a test

94
• If the {d_i} series is serially uncorrelated:
$\text{var}(\bar{d}) = \gamma_0/(H-1)$
and $\dfrac{\bar{d}}{\sqrt{\gamma_0/(H-1)}} \sim t$ with H-1 degrees of freedom
• You can install the DMARIANO ado file in STATA
– But only has quadratic (default) and absolute loss functions
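A hypothetical usage sketch of the user-written command (an assumption about the exact syntax; check help dmariano after installing):

ssc install dmariano
dmariano y yhat_model1 yhat_model2, maxlag(3)   // hypothetical forecast-series names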

95
• In the presence of serial correlation in the {d_i} series, things
become more complicated.
• Let $\gamma_i$ be the i-th autocovariance of the {d_i} sequence.
• If the first q values of $\gamma_i$ are different from zero, the
variance of $\bar{d}$ can be approximated with:
$[\gamma_0 + 2(\gamma_1 + \gamma_2 + \dots + \gamma_q)]/(H-1)$
• yielding the Diebold and Mariano (DM) statistic:
$DM = \dfrac{\bar{d}}{\sqrt{[\gamma_0 + 2(\gamma_1 + \gamma_2 + \dots + \gamma_q)]/(H-1)}} \sim t$ with H-1 degrees of freedom
96
Now, let's really do it
• Load the data and graph
• use DATA_UV1.dta
• tsline y
[Figure: tsline y, the series y plotted over t = 0 to 1000, ranging roughly from -20 to 20. Nothing surprising here.]
97
Look at the correlogram
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.8102 0.8102 658.46 0.0000


2 0.4141 -0.7064 830.61 0.0000
3 -0.0751 -0.4389 836.27 0.0000
4 -0.5020 -0.1794 1089.8 0.0000
5 -0.7340 0.0253 1632.3 0.0000
6 -0.7228 0.0023 2158.9 0.0000
7 -0.4899 0.0159 2401.1 0.0000
8 -0.1309 -0.0308 2418.4 0.0000
9 0.2365 0.0099 2474.9 0.0000
10 0.4975 -0.0061 2725.5 0.0000
11 0.5825 -0.0021 3069.3 0.0000
12 0.4897 0.0541 3312.5 0.0000
13 0.2578 -0.0068 3380 0.0000
14 -0.0277 -0.0007 3380.8 0.0000
15 -0.2771 0.0010 3458.9 0.0000
16 -0.4245 -0.0318 3642.4 0.0000
17 -0.4339 -0.0093 3834.2 0.0000
18 -0.3210 -0.0360 3939.4 0.0000
19 -0.1216 0.0401 3954.5 0.0000
20 0.0868 -0.0603 3962.2 0.0000

Q is always highly significant: it's not a white noise.
The AC oscillates but decays slowly; it looks like an AR.
No clear pattern in the PACF.
98
Look at the ACF
[Figure: ACF of y, lags 0 to 40, with Bartlett's formula for MA(q) 95% confidence bands]
99
It oscillates but it is still significant at 30 lags
Look at the PACF
[Figure: PACF of y, lags 0 to 40, 95% confidence bands, se = 1/sqrt(n)]
100
Not significant after the 4th lag
• So far, no clear pattern; this is unlikely to
be a simple AR or MA process, but let's
use the parsimony principle and start from
the simplest model, an AR(1)
arima y, ar(1)
ARIMA regression

Sample: 0 - 999 Number of obs = 1000


Wald chi2(1) = 1658.75
Log likelihood = -2739.901 Prob > chi2 = 0.0000

OPG
y Coef. Std. Err. z P>|z| [95% Conf. Interval]

y
_cons -.011453 .6223732 -0.02 0.985 -1.231282 1.208376

ARMA
ar
L1. .809582 .0198779 40.73 0.000 .7706219 .848542

/sigma 3.745091 .0901549 41.54 0.000 3.56839 3.921791

Standard deviation of the WN error

There seems to be no constant, but the AR coefficient is highly significant 101


Let's look at the errors
predict uar1, yr     // residuals
corrgram uar1

LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.5720 0.5722 328.12 0.0000


2 0.2628 -0.0960 397.48 0.0000
3 -0.1545 -0.3974 421.47 0.0000
4 -0.5125 -0.4180 685.7 0.0000
5 -0.6506 -0.2619 1111.9 0.0000
6 -0.5988 -0.2029 1473.4 0.0000
7 -0.3487 -0.1192 1596.1 0.0000
8 -0.0338 -0.1274 1597.2 0.0000
9 0.2757 -0.0876 1674.1 0.0000
10 0.4674 -0.0747 1895.2 0.0000
11 0.4805 -0.1079 2129.1 0.0000
12 0.3796 -0.0237 2275.2 0.0000
13 0.1536 -0.0269 2299.2 0.0000
14 -0.0881 -0.0222 2307.1 0.0000
15 -0.2698 0.0076 2381.1 0.0000
16 -0.3699 -0.0176 2520.4 0.0000
17 -0.3338 0.0063 2634 0.0000
18 -0.2381 -0.0640 2691.8 0.0000
19 -0.0338 0.0362 2693 0.0000
20 0.1009 -0.0761 2703.4 0.0000
21 0.2556 0.0434 2770.2 0.0000

The errors are not White Noise!

102
Let's look at an AR(2)

arima y, ar(1/2)

ARIMA regression

Sample: 0 - 999 Number of obs = 1000


Wald chi2(2) = 4713.57
Log likelihood = -2395.296 Prob > chi2 = 0.0000

OPG
y Coef. Std. Err. z P>|z| [95% Conf. Interval]

y
_cons .0047896 .2597757 0.02 0.985 -.5043614 .5139407

ARMA
ar
L1. 1.381413 .0225699 61.21 0.000 1.337177 1.42565
L2. -.7050654 .0225957 -31.20 0.000 -.7493521 -.6607787

/sigma 2.65155 .0618434 42.88 0.000 2.530339 2.772761

The fit is still good


Let's look at the residuals

103
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.4784 0.4784 229.56 0.0000


2 0.4554 0.2938 437.82 0.0000
3 -0.1672 -0.6559 465.92 0.0000
4 -0.3384 -0.4257 581.09 0.0000
5 -0.6482 -0.1241 1004.1 0.0000
6 -0.5392 0.0083 1297.2 0.0000
7 -0.4193 0.0169 1474.6 0.0000
8 -0.0737 0.0010 1480.1 0.0000
9 0.1770 -0.0225 1511.8 0.0000
10 0.4219 0.0095 1692 0.0000
11 0.4535 -0.0105 1900.3 0.0000
12 0.3994 0.0041 2062.1 0.0000
13 0.1998 0.0615 2102.6 0.0000
14 -0.0341 -0.0317 2103.8 0.0000
15 -0.2134 0.0322 2150.2 0.0000
16 -0.3614 -0.0400 2283.2 0.0000
17 -0.3214 0.0124 2388.5 0.0000
18 -0.2835 -0.0577 2470.5 0.0000
19 -0.0616 0.0188 2474.3 0.0000
20 0.0452 -0.0096 2476.4 0.0000

NOPE!

104
Let's look at an MA(2)

arima y, ma(1/2)
ARIMA regression

Sample: 0 - 999 Number of obs = 1000


Wald chi2(2) = 1534.91
Log likelihood = -2628.304 Prob > chi2 = 0.0000

OPG
y Coef. Std. Err. z P>|z| [95% Conf. Interval]

y
_cons -.0005962 .2795364 -0.00 0.998 -.5484774 .5472851

ARMA
ma
L1. 1.042007 .0267381 38.97 0.000 .9896009 1.094412
L2. .5975604 .0258416 23.12 0.000 .5469117 .648209

/sigma 3.348943 .080088 41.82 0.000 3.191973 3.505913

The fit is still good


Let's look at the residuals

105
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.2644 0.2645 70.129 0.0000


2 0.2659 0.2107 141.1 0.0000
3 0.0649 -0.0524 145.33 0.0000
4 -0.4820 -0.6116 379.03 0.0000
5 -0.3853 -0.3213 528.5 0.0000
6 -0.4606 -0.1212 742.32 0.0000
7 -0.3351 0.0443 855.65 0.0000
8 -0.0311 -0.0090 856.63 0.0000
9 0.1419 -0.0041 876.98 0.0000
10 0.3296 0.0029 986.93 0.0000
11 0.3641 -0.0235 1121.2 0.0000
12 0.3033 0.0099 1214.5 0.0000
13 0.1527 0.0177 1238.2 0.0000
14 -0.0276 0.0379 1238.9 0.0000
15 -0.1875 -0.0148 1274.7 0.0000
16 -0.2651 0.0032 1346.3 0.0000
17 -0.2603 0.0127 1415.3 0.0000
18 -0.2262 -0.0785 1467.6 0.0000
19 -0.0200 0.0461 1468 0.0000
20 0.0196 -0.0616 1468.4 0.0000

NOPE!

106
Let's try an ARMA(1,1)

arima y, ar(1) ma(1)

ARIMA regression

Sample: 0 - 999 Number of obs = 1000


Wald chi2(2) = 1689.22
Log likelihood = -2584.704 Prob > chi2 = 0.0000

OPG
y Coef. Std. Err. z P>|z| [95% Conf. Interval]

y
_cons -.0049327 .5317221 -0.01 0.993 -1.047089 1.037223

ARMA
ar
L1. .7152491 .02563 27.91 0.000 .6650151 .7654831
ma
L1. .4877043 .0318154 15.33 0.000 .4253472 .5500613

/sigma 3.205827 .0757257 42.33 0.000 3.057407 3.354246

t-stats are lower but still good


Let's look at the residuals

107
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.1667 0.1667 27.859 0.0000


2 0.3150 0.2955 127.45 0.0000
3 -0.1511 -0.2678 150.39 0.0000
4 -0.3434 -0.4521 269.01 0.0000
5 -0.4440 -0.3274 467.49 0.0000
6 -0.4306 -0.2297 654.43 0.0000
7 -0.2326 -0.1237 709.03 0.0000
8 -0.0361 -0.0932 710.35 0.0000
9 0.1885 -0.0858 746.28 0.0000
10 0.3337 -0.0331 858.98 0.0000
11 0.3175 -0.0803 961.12 0.0000
12 0.2800 -0.0309 1040.6 0.0000
13 0.1058 0.0109 1052 0.0000
14 -0.0652 -0.0161 1056.3 0.0000
15 -0.1696 0.0292 1085.5 0.0000
16 -0.2817 -0.0174 1166.3 0.0000
17 -0.1987 0.0402 1206.6 0.0000
18 -0.2120 -0.0703 1252.5 0.0000
19 0.0217 0.0340 1252.9 0.0000

NOPE!
Wasn’t this the workhorse????

108
Let's try an ARMA(2,1)

arima y, ar(1/2) ma(1)

ARIMA regression

Sample: 0 - 999 Number of obs = 1000


Wald chi2(3) = 15389.47
Log likelihood = -2288.854 Prob > chi2 = 0.0000

OPG
y Coef. Std. Err. z P>|z| [95% Conf. Interval]

y
_cons .0099372 .1232032 0.08 0.936 -.2315366 .2514111

ARMA
ar
L1. 1.613196 .0160236 100.68 0.000 1.58179 1.644602
L2. -.8993578 .0142444 -63.14 0.000 -.9272763 -.8714393
ma
L1. -.533875 .0343623 -15.54 0.000 -.6012238 -.4665261

/sigma 2.383053 .0535526 44.50 0.000 2.278091 2.488014

Amazing t-stats!
Let's look at the residuals

109
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 -0.0791 -0.0791 6.2775 0.0122


2 0.1197 0.1142 20.668 0.0000
3 0.0724 0.0917 25.94 0.0000
4 -0.0372 -0.0397 27.333 0.0000
5 0.0140 -0.0122 27.529 0.0000
6 -0.0225 -0.0200 28.041 0.0001
7 0.0373 0.0415 29.448 0.0001
8 0.0118 0.0228 29.589 0.0002
9 0.0435 0.0407 31.501 0.0002
10 0.0488 0.0450 33.91 0.0002
11 -0.0340 -0.0380 35.084 0.0002
12 0.0185 -0.0044 35.429 0.0004
13 -0.0273 -0.0209 36.186 0.0006
14 -0.0392 -0.0371 37.748 0.0006
15 -0.0038 -0.0071 37.763 0.0010
16 -0.0336 -0.0233 38.916 0.0011

Much better, but still, look at Q!

110
Let's try an ARMA(2,2)

arima y, ar(1/2) ma(1/2)

ARIMA regression

Sample: 0 - 999 Number of obs = 1000


Wald chi2(4) = 17950.76
Log likelihood = -2273.229 Prob > chi2 = 0.0000

OPG
y Coef. Std. Err. z P>|z| [95% Conf. Interval]

y
_cons .0061418 .1439669 0.04 0.966 -.2760281 .2883117

ARMA
ar
L1. 1.604461 .0150945 106.29 0.000 1.574877 1.634046
L2. -.9129829 .0140284 -65.08 0.000 -.9404781 -.8854877
ma
L1. -.6066721 .0356254 -17.03 0.000 -.6764967 -.5368476
L2. .2036361 .0333804 6.10 0.000 .1382118 .2690604

/sigma 2.345969 .052197 44.94 0.000 2.243665 2.448273

mmm!
Let's look at the residuals

111
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 -0.0005 -0.0005 .00029 0.9863


2 -0.0050 -0.0050 .02508 0.9875
3 0.0197 0.0197 .41426 0.9373
4 -0.0434 -0.0435 2.311 0.6788
5 0.0044 0.0046 2.3302 0.8018
6 -0.0203 -0.0212 2.745 0.8401
7 0.0178 0.0196 3.0664 0.8788
8 -0.0073 -0.0096 3.1195 0.9266
9 0.0219 0.0234 3.6033 0.9355
10 0.0253 0.0226 4.2514 0.9353
11 -0.0417 -0.0399 6.0171 0.8722
12 0.0143 0.0124 6.224 0.9044
13 -0.0116 -0.0102 6.3603 0.9321
14 -0.0177 -0.0150 6.6777 0.9464
15 0.0165 0.0135 6.9551 0.9589
16 -0.0044 -0.0028 6.9752 0.9737
17 0.0244 0.0223 7.5835 0.9747
18 -0.0572 -0.0587 10.918 0.8978
19 0.0174 0.0171 11.226 0.9160
20 -0.1160 -0.1212 24.977 0.2023

Ah Ah!

112
Let's try an ARMA(1,2)

arima y, ar(1) ma(1/2)

ARIMA regression

Sample: 0 - 999 Number of obs = 1000


Wald chi2(3) = 1659.19
Log likelihood = -2506.617 Prob > chi2 = 0.0000

OPG
y Coef. Std. Err. z P>|z| [95% Conf. Interval]

y
_cons -.0017848 .507626 -0.00 0.997 -.9967136 .993144

ARMA
ar
L1. .6333353 .0328677 19.27 0.000 .5689159 .6977548
ma
L1. .5780074 .0362004 15.97 0.000 .507056 .6489589
L2. .4068361 .0343752 11.84 0.000 .339462 .4742102

/sigma 2.964544 .0689436 43.00 0.000 2.829417 3.099671

Better before
Let's look at the residuals

113
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.1059 0.1059 11.245 0.0008


2 0.0851 0.0747 18.513 0.0001
3 0.0723 0.0570 23.768 0.0000
4 -0.3610 -0.3878 154.87 0.0000
5 -0.3853 -0.3813 304.34 0.0000
6 -0.3076 -0.2988 399.7 0.0000
7 -0.2010 -0.1354 440.48 0.0000
8 -0.0470 -0.1159 442.71 0.0000
9 0.1719 -0.0505 472.6 0.0000
10 0.2769 -0.0269 550.21 0.0000
11 0.2391 -0.0722 608.12 0.0000
12 0.2397 -0.0116 666.38 0.0000
13 0.0915 -0.0059 674.88 0.0000
14 -0.0642 0.0078 679.07 0.0000
15 -0.1423 0.0131 699.68 0.0000

It was definitely better before


The strongest candidate seems to be ARMA(2,2), followed by ARMA(2,1)
Let's look at all of them (easy to do because I typed est store)
114
est table ar1 ar2 ma1 ma2 arma11 arma21 arma22 arma12

Variable ar1 ar2 ma1 ma2 arma11 arma21 arma22 arma12

y
_cons -.01145297 .00478961 -.0014219 -.00059618 -.00493273 .00993724 .00614182 -.0017848

ARMA
L.ar .80958196 1.3814135 .71524908 1.6131958 1.6044613 .63333531
L2.ar -.70506538 -.89935778 -.91298289
L.ma .80017243 1.0420065 .48770427 -.53387498 -.60667211 .57800741
L2.ma .59756038 .20363609 .40683611

sigma
_cons 3.7450909 2.6515503 4.1199273 3.348943 3.2058267 2.3830526 2.3459689 2.9645442

What about the information criteria?


est stats ar1 ar2 ma1 ma2 arma11 arma21 arma22 arma12
STATA very smart

ARMA(2,2) and ARMA(2,1) are the best with ARMA(2,2) being


slightly better. In fact, the process was an ARMA(2,1) 115
Now let's try with forecasting

gen fe_ar1=.
gen fe_ar2=.
gen fe_arma11=.
gen fe_arma21=.
gen fe_arma22=.
gen fe_arma12=.

gen fe_ar1cw=.
gen fe_ar2cw=.
gen fe_arma11cw=.
gen fe_arma21cw=.
gen fe_arma22cw=.
gen fe_arma12cw=.

***WE SET H=50
****CALCULATING ONE-STEP AHEAD MSE**
gen T = 50

local i = 1
while `i' <= T {
arima y if t<1000-T+`i', ar(1)
predict temp
replace fe_ar1 = y-temp if t==1000-T+`i'
drop temp
arima y if t<1000-T+`i', ar(1/2)
predict temp
replace fe_ar2 = y-temp if t==1000-T+`i'
drop temp
arima y if t<1000-T+`i', ar(1) ma(1)
predict temp
replace fe_arma11 = y-temp if t==1000-T+`i'
drop temp
arima y if t<1000-T+`i', ar(1/2) ma(1)
predict temp
replace fe_arma21 = y-temp if t==1000-T+`i'
drop temp
arima y if t<1000-T+`i', ar(1/2) ma(1/2)
predict temp
replace fe_arma22 = y-temp if t==1000-T+`i'
drop temp
arima y if t<1000-T+`i', ar(1) ma(1/2)
predict temp
replace fe_arma12 = y-temp if t==1000-T+`i'
drop temp
arima y if t>=`i'-1 & t<1000-T+`i', ar(1)
predict temp
replace fe_ar1cw = y-temp if t==1000-T+`i'
drop temp
arima y if t>=`i'-1 & t<1000-T+`i', ar(1/2)
predict temp
replace fe_ar2cw = y-temp if t==1000-T+`i'
drop temp
arima y if t>=`i'-1 & t<1000-T+`i', ar(1) ma(1)
predict temp
replace fe_arma11cw = y-temp if t==1000-T+`i'
drop temp
arima y if t>=`i'-1 & t<1000-T+`i', ar(1/2) ma(1)
predict temp
replace fe_arma21cw = y-temp if t==1000-T+`i'
drop temp
arima y if t>=`i'-1 & t<1000-T+`i', ar(1/2) ma(1/2)
predict temp
replace fe_arma22cw = y-temp if t==1000-T+`i'
drop temp
arima y if t>=`i'-1 & t<1000-T+`i', ar(1) ma(1/2)
predict temp
replace fe_arma12cw = y-temp if t==1000-T+`i'
drop temp
local i = `i' + 1
}
116
predict temp
gen fe_ar1_sq = fe_ar1^2
gen fe_ar2_sq = fe_ar2^2
gen fe_arma11_sq = fe_arma11^2
gen fe_arma21_sq = fe_arma21^2
gen fe_arma22_sq = fe_arma22^2
gen fe_arma12_sq = fe_arma12^2

gen fe_ar1_sqcw = fe_ar1cw^2
gen fe_ar2_sqcw = fe_ar2cw^2
gen fe_arma11_sqcw = fe_arma11cw^2
gen fe_arma21_sqcw = fe_arma21cw^2
gen fe_arma22_sqcw = fe_arma22cw^2
gen fe_arma12_sqcw = fe_arma12cw^2

egen MSE_AR1 = mean(fe_ar1_sq)
egen MSE_AR2 = mean(fe_ar2_sq)
egen MSE_ARMA11 = mean(fe_arma11_sq)
egen MSE_ARMA21 = mean(fe_arma21_sq)
egen MSE_ARMA22 = mean(fe_arma22_sq)
egen MSE_ARMA12 = mean(fe_arma12_sq)

egen MSE_AR1CW = mean(fe_ar1_sqcw)
egen MSE_AR2CW = mean(fe_ar2_sqcw)
egen MSE_ARMA11CW = mean(fe_arma11_sqcw)
egen MSE_ARMA21CW = mean(fe_arma21_sqcw)
egen MSE_ARMA22CW = mean(fe_arma22_sqcw)
egen MSE_ARMA12CW = mean(fe_arma12_sqcw)

****CALCULATING 3-STEPS AHEAD mse**

drop fe_*

gen fe_ar1=.
gen fe_ar2=.
gen fe_arma11=.
gen fe_arma21=.
gen fe_arma22=.
gen fe_arma12=.

gen fe_ar1cw=.
gen fe_ar2cw=.
gen fe_arma11cw=.
gen fe_arma21cw=.
gen fe_arma22cw=.
gen fe_arma12cw=.

117
local i = 1
while `i' <= T {
arima y if t<1000-T+`i', ar(1)
predict temp, dynamic(1000-T+`i')
replace fe_ar1 = y-temp if t==1000-T+`i'+2
drop temp
arima y if t<1000-T+`i', ar(1/2)
predict temp, dynamic(1000-T+`i')
replace fe_ar2 = y-temp if t==1000-T+`i'+2
drop temp
arima y if t<1000-T+`i', ar(1) ma(1)
predict temp, dynamic(1000-T+`i')
replace fe_arma11 = y-temp if t==1000-T+`i'+2
drop temp
arima y if t<1000-T+`i', ar(1/2) ma(1)
predict temp, dynamic(1000-T+`i')
replace fe_arma21 = y-temp if t==1000-T+`i'+2
drop temp
arima y if t<1000-T+`i', ar(1/2) ma(1/2)
predict temp, dynamic(1000-T+`i')
replace fe_arma22 = y-temp if t==1000-T+`i'+2
drop temp
arima y if t<1000-T+`i', ar(1) ma(1/2)
predict temp, dynamic(1000-T+`i')
replace fe_arma12 = y-temp if t==1000-T+`i'+2
drop temp
local i = `i' + 1
}

gen fe_ar1_sq = fe_ar1^2
gen fe_ar2_sq = fe_ar2^2
gen fe_arma11_sq = fe_arma11^2
gen fe_arma21_sq = fe_arma21^2
gen fe_arma22_sq = fe_arma22^2
gen fe_arma12_sq = fe_arma12^2

gen fe_ar1_sqcw = fe_ar1cw^2
gen fe_ar2_sqcw = fe_ar2cw^2
gen fe_arma11_sqcw = fe_arma11cw^2
gen fe_arma21_sqcw = fe_arma21cw^2
gen fe_arma22_sqcw = fe_arma22cw^2
gen fe_arma12_sqcw = fe_arma12cw^2

egen MSE_AR1_3 = mean(fe_ar1_sq)
egen MSE_AR2_3 = mean(fe_ar2_sq)
egen MSE_ARMA11_3 = mean(fe_arma11_sq)
egen MSE_ARMA21_3 = mean(fe_arma21_sq)
egen MSE_ARMA22_3 = mean(fe_arma22_sq)
egen MSE_ARMA12_3 = mean(fe_arma12_sq)

egen MSE_AR1CW_3 = mean(fe_ar1_sqcw)
egen MSE_AR2CW_3 = mean(fe_ar2_sqcw)
egen MSE_ARMA11CW_3 = mean(fe_arma11_sqcw)
egen MSE_ARMA21CW_3 = mean(fe_arma21_sqcw)
egen MSE_ARMA22CW_3 = mean(fe_arma22_sqcw)
egen MSE_ARMA12CW_3 = mean(fe_arma12_sqcw)

118
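A short note on how the three-steps-ahead errors are produced above (a sketch of my reading, with an illustrative cut-off date): with the dynamic() option, predict uses the actual values of y up to the specified date and its own previous predictions from that date onwards, so the prediction evaluated two periods after the dynamic date is a three-steps-ahead forecast.

* illustrative: estimate through t=949, then forecast dynamically from t=950
arima y if t<950, ar(1/2) ma(1)
predict yhat, dynamic(950)
* yhat at t=952 is a three-steps-ahead forecast (it uses predicted y at 950 and 951)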
Now let's try with forecasting

    Variable |   Obs        Mean    Std. Dev.        Min        Max
-------------+------------------------------------------------------
     MSE_AR1 |  1000    12.02818           0    12.02818   12.02818
     MSE_AR2 |  1000    6.201813           0    6.201813   6.201813
  MSE_ARMA11 |  1000    8.336815           0    8.336815   8.336815
  MSE_ARMA21 |  1000    5.647309           0    5.647309   5.647309
  MSE_ARMA22 |  1000    5.321001           0    5.321001   5.321001
  MSE_ARMA12 |  1000     8.36657           0     8.36657    8.36657
   MSE_AR1CW |  1000    12.03326           0    12.03326   12.03326
   MSE_AR2CW |  1000    6.194459           0    6.194459   6.194459
MSE_ARMA11CW |  1000    8.333846           0    8.333846   8.333846
MSE_ARMA21CW |  1000    5.656081           0    5.656081   5.656081
MSE_ARMA22CW |  1000    5.327425           0    5.327425   5.327425
MSE_ARMA12CW |  1000    8.314061           0    8.314061   8.314061
   MSE_AR1_3 |  1000    9.962931           0    9.962931   9.962931
   MSE_AR2_3 |  1000     19.3157           0     19.3157    19.3157
MSE_ARMA11_3 |  1000    39.73622           0    39.73622   39.73622
MSE_ARMA21_3 |  1000    19.59711           0    19.59711   19.59711
MSE_ARMA22_3 |  1000    17.27826           0    17.27826   17.27826
MSE_ARMA12_3 |  1000    34.89569           0    34.89569   34.89569

Best one-step-ahead: MSE_ARMA22 (lowest MSE). Best three-steps-ahead: MSE_AR1_3 (lowest MSE).

119
Now let's try with forecasting
• At this point we could run an F-test on the ratio of the two MSEs
• Or use the Granger–Newbold test
• Or the Diebold–Mariano test (a minimal Stata sketch follows this slide)
• The latter is particularly useful if our loss function is not quadratic (maybe
underpredictions are more costly than overpredictions, maybe there is a
threshold above which the error does not matter…)

120
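A minimal sketch of a Diebold–Mariano-type comparison under quadratic loss, done "by hand" as a HAC regression of the loss differential on a constant. It assumes the data are already tsset, uses the one-step-ahead forecast-error variables generated earlier (fe_arma22 and fe_ar1), and the lag length in newey is illustrative:

* loss differential between the two competing forecasts (quadratic loss)
gen d_loss = fe_arma22^2 - fe_ar1^2
* regress d_loss on a constant with Newey-West standard errors;
* the t-test on the constant is the Diebold-Mariano test of equal predictive accuracy
newey d_loss, lag(2)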
Seasonality
• Monthly, quarterly, and daily data may have a seasonal effect
• For instance, consumption and money supply are higher in December and
in the 4th quarter of the year, and stock returns are often lower on Fridays
• One way to deal with seasonality is to regress the series on quarter, month,
or day dummies and then work with the residuals of the series (see the
sketch after this slide)
• This is essentially what is done when you get seasonally adjusted data
• But seasonal patterns may remain (especially if you use subsets of the data,
which may have more or less pronounced seasonal patterns)

121
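A minimal sketch of the two-step dummy-variable adjustment described above, using the quarterly growth series from the M1 example later in these slides (the variable names grM1 and quarter are taken from that example):

* step 1: regress the series on seasonal dummies
xi: reg grM1 i.quarter
* step 2: work with the residuals, i.e. the "seasonally adjusted" series
predict grM1_sa, residuals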
Seasonality
• Instead of using a two-step procedure (first remove the seasonal effect,
then estimate the model), there is now consensus that it is better to
estimate the ARMA and seasonal coefficients jointly
• The procedure is similar to Box and Jenkins, but remember that the ACF
and PACF may exhibit seasonal behavior
• For instance, with quarterly data we may have
  y(t) = a4 y(t-4) + u(t)   (a)
  y(t) = b4 u(t-4) + u(t)   (b)
  – In model (a) (AR) the ACF decays at lags 4, 8, 12, …:
    r(i) = a4^(i/4) if i/4 is an integer, and r(i) = 0 otherwise
  – In model (b) (MA) the ACF has a spike at lag 4 and all other
    values are zero (a simulation sketch follows this slide)
122
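A minimal simulation sketch of the two patterns above; the coefficient value, sample size, and seed are all illustrative:

* standalone simulation (run on a fresh dataset)
clear
set obs 500
set seed 12345
gen t = _n
tsset t
gen u = rnormal()
* seasonal MA at lag 4: ACF should spike at lag 4 only
gen y_ma = u + 0.8*L4.u
* seasonal AR at lag 4: ACF should decay at lags 4, 8, 12, ...
gen y_ar = u
replace y_ar = 0.8*L4.y_ar + u if t > 4
corrgram y_ma, lags(16)
corrgram y_ar, lags(16)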
Seasonality
Models of the type:
  y(t) = a1 y(t-1) + b1 u(t-1) + b4 u(t-4) + u(t)                      (a)
  y(t) = a1 y(t-1) + a4 y(t-4) + b1 u(t-1) + u(t)
  y(t) = a1 y(t-1) + a4 y(t-4) + b1 u(t-1) + b4 u(t-4) + u(t)
exhibit additive seasonality (the seasonal coefficients are added to the process).
Models of the type:
  (1 - a1 L) y(t) = (1 + b1 L)(1 + b4 L^4) u(t)                        (b)
  (1 - a1 L)(1 - a4 L^4) y(t) = (1 + b1 L) u(t)
  (1 - a1 L)(1 - a4 L^4) y(t) = (1 + b1 L)(1 + b4 L^4) u(t)
exhibit multiplicative seasonality
123
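In Stata's arima command (used later in these slides) the two families correspond to different options; a minimal sketch for a quarterly series y, with illustrative lag choices:

* additive seasonality: the lag-4 terms enter the AR/MA polynomials directly
arima y, ar(1 4) ma(1)
* multiplicative seasonality: (1 - a1 L)(1 - a4 L^4) on the AR side and
* (1 + b1 L)(1 + b4 L^4) on the MA side
arima y, ar(1) ma(1) mar(1,4) mma(1,4)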
Seasonality
Models of the type:
  (1 - a1 L) y(t) = (1 + b1 L)(1 + b4 L^4) u(t)
expand to
  y(t) = a1 y(t-1) + u(t) + b1 u(t-1) + b4 u(t-4) + b1 b4 u(t-5)
Models of the type:
  (1 - a1 L)(1 - a4 L^4) y(t) = (1 + b1 L) u(t)
expand to
  y(t) = a1 y(t-1) + a4 y(t-4) - a1 a4 y(t-5) + u(t) + b1 u(t-1)
And models of the type:
  (1 - a1 L)(1 - a4 L^4) y(t) = (1 + b1 L)(1 + b4 L^4) u(t)
expand to
  y(t) = a1 y(t-1) + a4 y(t-4) - a1 a4 y(t-5) + u(t) + b1 u(t-1) + b4 u(t-4) + b1 b4 u(t-5)

124
Seasonality
• Equation (a) differs from (b) because the latter allows the MA term at
lag 1 to interact with the MA term at lag 4
• Equation (b) can be rewritten as
  y(t) = a1 y(t-1) + b1 u(t-1) + b4 u(t-4) + b1 b4 u(t-5) + u(t)
  – By estimating 3 coefficients we can capture the effect of the MA terms
    at lags 1, 4, and 5
• Of course, we would get a better fit by estimating
  y(t) = a1 y(t-1) + b1 u(t-1) + b4 u(t-4) + b5 u(t-5) + u(t)
• But the multiplicative model is more parsimonious and is preferable if
b1 b4 is similar to b5

125
Example
• I am using quarterly data (not seasonally adjusted) for US money supply
(M1) for the 1960:q1-2002:q1 period
• The growth rate looks stationary, but there seems to be a cyclical pattern:
the 4th quarter is always higher

[Figure: time-series plot of Money Supply (Bil USD) and Growth of M1, quarterly, 1960q1 to 2002q1]

126
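A minimal sketch of how the series in the figure could be constructed and plotted; the quarter-on-quarter log-difference definition of the growth rate is an assumption, and the variable names m1, grM1, and date are taken from the slides:

tsset date
* growth rate of M1 (assumed here to be the quarter-on-quarter log difference)
gen grM1 = ln(m1) - ln(L.m1)
* level on one axis, growth rate on a second axis
twoway (tsline m1) (tsline grM1, yaxis(2))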
This is the formal test
. xi: reg grM1 i.quarter
i.quarter _Iquarter_1-4 (naturally coded; _Iquarter_1 omitted)

      Source |       SS       df       MS              Number of obs =     168
-------------+------------------------------           F(  3,   164) =   30.66
       Model |  .014976823     3  .004992274           Prob > F      =  0.0000
    Residual |  .026703604   164  .000162827           R-squared     =  0.3593
-------------+------------------------------           Adj R-squared =  0.3476
       Total |  .041680427   167  .000249583           Root MSE      =  .01276

What am I testing here? (the joint significance of the quarter dummies, i.e., whether there is a seasonal pattern)

grM1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

_Iquarter_2 .0119477 .0027845 4.29 0.000 .0064496 .0174459


_Iquarter_3 .0118625 .0027845 4.26 0.000 .0063643 .0173606
_Iquarter_4 .0266308 .0027845 9.56 0.000 .0211326 .0321289
_cons .0000663 .001969 0.03 0.973 -.0038215 .0039541

. char quarter[omit] 4
You can omit quarter 4!
. xi: reg grM1 i.quarter
i.quarter _Iquarter_1-4 (naturally coded; _Iquarter_4 omitted)

      Source |       SS       df       MS              Number of obs =     168
-------------+------------------------------           F(  3,   164) =   30.66
       Model |  .014976823     3  .004992274           Prob > F      =  0.0000
    Residual |  .026703604   164  .000162827           R-squared     =  0.3593
-------------+------------------------------           Adj R-squared =  0.3476
       Total |  .041680427   167  .000249583           Root MSE      =  .01276

grM1 Coef. Std. Err. t P>|t| [95% Conf. Interval]

_Iquarter_1 -.0266308 .0027845 -9.56 0.000 -.0321289 -.0211326


_Iquarter_2 -.014683 .0027845 -5.27 0.000 -.0201812 -.0091849
_Iquarter_3 -.0147683 .0027845 -5.30 0.000 -.0202665 -.0092701
_cons .0266971 .001969 13.56 0.000 .0228093 .0305849

127
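For an explicit joint test of the seasonal dummies after either regression above, a minimal sketch (the _Iquarter_* names are the ones created by xi in the output):

* F-test that all quarter dummies are jointly zero (no deterministic seasonality)
testparm _Iquarter_*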
. corrgram grM1, lag(12)

-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.1456 0.1457 3.6243 0.0569


2 0.2057 0.1888 10.904 0.0043
3 0.0496 -0.0039 11.33 0.0101
4 0.6476 0.6474 84.364 0.0000
5 -0.0061 -0.3103 84.37 0.0000
6 0.1076 -0.0381 86.41 0.0000
7 -0.0636 -0.0627 87.128 0.0000
8 0.5030 0.2225 132.3 0.0000
9 -0.1535 -0.3051 136.53 0.0000
10 -0.0387 -0.1149 136.8 0.0000
11 -0.1412 0.0913 140.43 0.0000
12 0.4072 0.1468 170.78 0.0000

. corrgram m1, lag(12)

-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.3749 0.3757 23.475 0.0000


2 0.1611 0.0242 27.837 0.0000
3 0.0400 -0.0282 28.107 0.0000
4 -0.2695 -0.3494 40.463 0.0000
5 0.0085 0.2904 40.476 0.0000
6 0.0710 0.0507 41.344 0.0000
7 -0.0719 -0.1785 42.242 0.0000
8 -0.0788 -0.2434 43.325 0.0000
9 -0.1196 0.1479 45.836 0.0000
10 -0.2192 -0.1525 54.327 0.0000
11 -0.0944 -0.0658 55.913 0.0000
12 -0.1057 -0.2064 57.915 0.0000

128
[Figure: correlograms of m1 up to lag 40 (autocorrelations and partial autocorrelations), with 95% confidence bands (Bartlett's formula for MA(q) for the ACF; se = 1/sqrt(n) for the PACF)]

129
Try with 4 different models
arima m1, ar(1) ma(4)
est store arma1_4
arima m1, ar(1) mar(1,4)
est store ar_14
arima m1, ma(1) mma(1,4)
est store ma_14
arima m1, ar(1) ma(1) mar(1,4) mma(1,4)
est store arma_14_14
The mar() and mma() options introduce multiplicative seasonal terms
. est table arma1_4 ar_14 ma_14 arma_14_14, b(%7.4f) se(%7.4f) drop(_cons)

    Variable | arma1_4    ar_14     ma_14    arma~14
-------------+---------------------------------------
ARMA         |
        L.ar |  0.5371    0.4845              0.7335
             |  0.0576    0.0587              0.1046
        L.ma |                      0.4487   -0.2835
             |                      0.0727    0.1349
       L4.ma | -0.7513
             |  0.0469
ARMA4        |
        L.ar |           -0.4292              0.0012
             |            0.0685              0.1058
        L.ma |                     -0.7326   -0.7611
             |                      0.0485    0.0644
-------------+---------------------------------------
                                       legend: b/se

According to the AIC and BIC below, the best model is arma1_4

. est stats arma1_4 ar_14 ma_14 arma_14_14

Model Obs ll(null) ll(model) df AIC BIC

arma1_4 164 . 521.0791 4 -1034.158 -1021.759


ar_14 164 . 506.4956 4 -1004.991 -992.5918
ma_14 164 . 513.9429 4 -1019.886 -1007.486
arma_14_14 164 . 522.4548 6 -1032.91 -1014.31
130
Note: N=Obs used in calculating BIC; see [R] BIC note
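The residual series used in the correlograms on the next two slides (resarma1_4, resar1_4, resma1_4, resarma1_4_1_4) are not generated in the code shown above; presumably they come from predict after each estimation, along the lines of:

arima m1, ar(1) ma(4)
predict resarma1_4, residuals
arima m1, ar(1) mar(1,4)
predict resar1_4, residuals
arima m1, ma(1) mma(1,4)
predict resma1_4, residuals
arima m1, ar(1) ma(1) mar(1,4) mma(1,4)
predict resarma1_4_1_4, residuals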
. corrgram resarma1_4, lag(13)

-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 -0.0582 -0.0587 .56652 0.4516


2 0.0308 0.0292 .72595 0.6956
3 0.0902 0.0989 2.1006 0.5518
4       0.0156   0.0246    2.1421  0.7096
5       0.1192   0.1328    4.5731  0.4702
This looks good!
6 0.0652 0.0731 5.3065 0.5051
7 -0.0128 -0.0140 5.3348 0.6192
8 -0.0478 -0.0965 5.7339 0.6770
9 0.0301 0.0016 5.8929 0.7506
10 -0.1484 -0.1939 9.7877 0.4593
11 0.0334 0.0138 9.9858 0.5317
12 -0.0791 -0.0910 11.106 0.5198
13 -0.0951 -0.0607 12.737 0.4683

. corrgram resar1_4, lag(13)

-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 -0.0394 -0.0396 .25928 0.6106


2 0.0608 0.0609 .88043 0.6439
3 0.0686 0.0763 1.6765 0.6422
4 -0.1020 -0.1128 3.4459 0.4862
5 0.1092 0.1089 5.4892 0.3591 Not so good!
6 0.0874 0.1134 6.8061 0.3391
7 -0.0341 -0.0255 7.0083 0.4280
8 -0.2444 -0.3582 17.431 0.0259
9 0.0126 0.0276 17.459 0.0420
10 -0.1573 -0.1362 21.833 0.0160
11 0.0372 0.0723 22.08 0.0238
12 -0.1096 -0.2434 24.233 0.0189
13 -0.1272 -0.0124 27.15 0.0119

131
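The Q and Prob>Q columns above are portmanteau (Ljung–Box-type) white-noise tests on the residuals; an equivalent standalone check would be something like:

* portmanteau test for white noise in the residuals of models 1 and 2
wntestq resarma1_4, lags(13)
wntestq resar1_4, lags(13)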
. corrgram resma1_4, lag(13)

-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.0885 0.0899 1.3079 0.2528


2 0.2843 0.2853 14.892 0.0006
3 0.1649 0.1337 19.492 0.0002
4       0.0935   0.0051    20.979  0.0003
5       0.1558   0.0938    25.133  0.0001
This is very bad!
6 0.0705 0.0178 25.989 0.0002
7 0.0274 -0.0649 26.118 0.0005
8 -0.0725 -0.1697 27.036 0.0007
9 0.0128 0.0065 27.065 0.0014
10 -0.1747 -0.1811 32.457 0.0003
11 -0.0193 0.0173 32.523 0.0006
12 -0.1381 -0.0609 35.939 0.0003
13 -0.1209 -0.0417 38.574 0.0002

. corrgram resarma1_4_1_4, lag(13)

-1 0 1 -1 0 1
LAG AC PAC Q Prob>Q [Autocorrelation] [Partial Autocor]

1 0.0152 0.0152 .03834 0.8448


2 -0.0608 -0.0625 .65996 0.7189
3 0.0108 0.0178 .67953 0.8780
4 -0.0049 -0.0107 .68363 0.9533
5 0.1049 0.1215 2.5663 0.7665
6       0.0624   0.0584    3.2371  0.7786
7      -0.0239  -0.0104    3.3357  0.8523
good!
8 -0.0476 -0.0579 3.7312 0.8805
9 0.0071 0.0040 3.7401 0.9277
10 -0.1378 -0.1832 7.0992 0.7161
11 0.0223 0.0239 7.1877 0.7837
12 -0.0644 -0.1068 7.9314 0.7905
13 -0.0871 -0.0644 9.298 0.7501

So, only models 1 and 4 have well-behaved residuals. The information criteria are very close (the AIC and BIC marginally favor model 1, while the log-likelihood is slightly higher for model 4), so there is not much to choose between them. Before deciding what to do, I would check their forecasting power.
132
