
Addis Ababa University

College of Business and Economics


Department of Economics
Econ 654: Research Methods & Applied Econometrics

1. Univariate Time Series Analysis & Forecasting

Fantu Guta Chemrie (PhD)


1. Univariate Time Series Analysis and Forecasting
1.1 Expectations, Stationarity, Ergodicity & WN Process
1.2 Autoregressive and Moving Average Processes
1.3 Use of Lag operator
1.4 Invertibility of Lag Polynomials
1.5 Problem of Common Roots of Lag Polynomials
1.6 Autocovariance Generating Function
1.7 Trend Stationarity
1.8 Testing for Omitted Serial Correlation
1.9 Stationarity and Unit Roots
1.10 Estimation of ARMA Models
1.11 Model Selection and Diagnostic Checking
1.12 Predicting with ARMA models



1. Univariate Time Series Analysis and Forecasting
1.1 Expectations, Stationarity, Ergodicity and White Noise Process

A time series y_t is a process observed in sequence over time, t = 1, . . . , T.
To indicate the dependence on time, we adopt new notation, using the subscript t to denote the individual observation and T to denote the number of observations.
Because of the sequential nature of time series, we expect that y_t and y_{t-1} are not independent, so the classical assumptions of linear regression models are not valid.
We can separate time series into two categories: univariate (y_t ∈ R is a scalar) and multivariate (y_t ∈ R^n is vector-valued).
The primary models for univariate time series are autoregressions (ARs); the primary models for multivariate time series are vector autoregressions (VARs).
Suppose we observed a sample of size T of some random variable Y_t: {y_t}_{t=1}^T.
The observed sample represents T particular numbers, but these T numbers are only one possible outcome of the underlying stochastic process¹ that generated the data.
¹A stochastic process can be defined as a function on T × Ω, that is, X : T × Ω → R such that X : (t, ω) → X(t, ω) = X_t(ω). For a given t ∈ T, X_t(·) : Ω → R is a random variable. For a given ω ∈ Ω, X(·, ω) : T → R is a function mapping the index set T to R. Such a function is called a sample function (a sample path, a realization or a trajectory).
Indeed, even if we were to imagine having observed the process {y_t}_{t=-∞}^{∞}, this infinite sequence would still be viewed as a single realization from a time series process.
Imagine we have N computers generating sequences {y_t^{(1)}}_{t=-∞}^{∞}, {y_t^{(2)}}_{t=-∞}^{∞}, . . . , {y_t^{(N)}}_{t=-∞}^{∞}, and consider selecting the observation associated with date t from each sequence: {y_t^{(1)}, y_t^{(2)}, . . . , y_t^{(N)}}.
This would be described as a sample of N realizations of the random variable Y_t.
This random variable has some density function, denoted by f_{Y_t}(y_t), which is the unconditional density of Y_t.
The expectation of the t-th observation of the time series refers to the mean of this probability distribution, provided it exists:

    E(Y_t) = ∫_{-∞}^{∞} y_t f_{Y_t}(y_t) dy_t        (1.1)


We might view this as the probability limit of the ensemble average:

    E(Y_t) = plim_{N→∞} (1/N) ∑_{i=1}^{N} y_t^{(i)}        (1.2)

Sometimes for emphasis the expectation E(Y_t) is called the unconditional mean of Y_t, denoted by µ_t.
Note that this notation allows the general possibility that the mean can be a function of the date of the observation t.
The variance of the random variable Y_t (denoted by γ_{0t}) is similarly defined as:

    γ_{0t} = E(Y_t − µ_t)² = ∫_{-∞}^{∞} (y_t − µ_t)² f_{Y_t}(y_t) dy_t        (1.3)

The autocovariance at lag k of Y_t (denoted by γ_{kt}) is defined as:

    γ_{kt} = E[(Y_t − µ_t)(Y_{t−k} − µ_{t−k})]        (1.4)
           = ∫_{-∞}^{∞} · · · ∫_{-∞}^{∞} (y_t − µ_t)(y_{t−k} − µ_{t−k}) f_{Y_t, Y_{t−1}, ..., Y_{t−k}}(y_t, y_{t−1}, . . . , y_{t−k}) dy_t dy_{t−1} · · · dy_{t−k}

Again we may also think of the autocovariance at lag k as the probability limit of an ensemble average:

    γ_{kt} = plim_{N→∞} (1/N) ∑_{i=1}^{N} (y_t^{(i)} − µ_t)(y_{t−k}^{(i)} − µ_{t−k})        (1.5)

Stationarity and Ergodicity

Definition
1.1.1 {y_t} is covariance (weakly) stationary if E(y_t) = µ is independent of t, and cov(y_t, y_{t−k}) = γ_k is independent of t for all k. γ_k is the autocovariance function at lag k.
      ρ_k = γ_k/γ_0 = corr(y_t, y_{t−k}) is the autocorrelation function at lag k.
1.1.2 {y_t} is strictly stationary if the joint distribution of (y_t, . . . , y_{t−k}) is the same as the joint distribution of (y_{t+τ}, . . . , y_{t+τ−k}) for all possible τ.

We have viewed expectations of a time series in terms of ensemble averages such as (1.2) and (1.5).
These definitions may seem a bit contrived, since usually all one has available is a single realization of size T from the process, which may be denoted by {y_1^{(1)}, y_2^{(1)}, . . . , y_T^{(1)}}.
From these observations, we would calculate the sample mean ȳ.
This, of course, is not an ensemble average but rather a time average:

    ȳ = (1/T) ∑_{t=1}^{T} y_t^{(1)}        (1.6)

Whether time averages such as (1.6) eventually converge to the ensemble average for a stationary process has to do with ergodicity.

Definition
1.1.3 A stationary time series is ergodic if γ_k → 0 as k → ∞.

The following two theorems are essential to the analysis of stationary time series. Their proofs are rather difficult, however.
Theorem
1.1.1 If y_t is strictly stationary and ergodic and x_t = f(y_t, y_{t−1}, . . .) is a random variable, then x_t is strictly stationary and ergodic.
1.1.2 (Ergodic Theorem). If y_t is strictly stationary and ergodic and E(|y_t|) < ∞, then as T → ∞, (1/T) ∑_{t=1}^{T} y_t →p E(y_t).

This allows us to consistently estimate parameters
using time-series moments:
The sample mean: µ̂ = (1/T) ∑_{t=1}^{T} y_t.
The sample autocovariance: γ̂_k = (1/T) ∑_{t=1}^{T} (y_t − µ̂)(y_{t−k} − µ̂).
The sample autocorrelation: ρ̂_k = γ̂_k / γ̂_0.
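As a small illustration (not part of the slides), these three estimators can be computed directly with numpy; starting the autocovariance sum at t = k + 1 is an implementation detail assumed here:

```python
import numpy as np

def sample_mean(y):
    """mu_hat = (1/T) * sum of y_t."""
    return y.mean()

def sample_autocovariance(y, k):
    """gamma_hat_k = (1/T) * sum_{t=k+1}^{T} (y_t - mu_hat)(y_{t-k} - mu_hat)."""
    T = len(y)
    mu = y.mean()
    return np.sum((y[k:] - mu) * (y[:T - k] - mu)) / T

def sample_autocorrelation(y, k):
    """rho_hat_k = gamma_hat_k / gamma_hat_0."""
    return sample_autocovariance(y, k) / sample_autocovariance(y, 0)

# usage on an arbitrary series
rng = np.random.default_rng(0)
y = rng.normal(size=500)
print(sample_mean(y), sample_autocorrelation(y, 1))
```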
Theorem
1.1.3 If y_t is strictly stationary and ergodic and E(y_t²) < ∞, then as T → ∞:
1). µ̂ →p µ.
2). γ̂_k →p γ_k.
3). ρ̂_k →p ρ_k.

Proof.
Proof of Theorem 1.1.3. Part (1) is a direct consequence of the Ergodic Theorem.
For part (2), note that

    γ̂_k = (1/T) ∑_{t=1}^{T} y_t y_{t−k} − (1/T) ∑_{t=1}^{T} y_t µ̂ − (1/T) ∑_{t=1}^{T} y_{t−k} µ̂ + µ̂²

By Theorem 1.1.1 above, the sequence y_t y_{t−k} is strictly stationary and ergodic, and it has a finite mean by the assumption that E(y_t²) < ∞.
Thus an application of the Ergodic Theorem yields:

    (1/T) ∑_{t=1}^{T} y_t y_{t−k} →p E(y_t y_{t−k})

Hence,

    γ̂_k →p E(y_t y_{t−k}) − µ² − µ² + µ² = E(y_t y_{t−k}) − µ² = γ_k
A covariance stationary process is said to be ergodic for the mean if ȳ converges in probability to E(y_t) as T → ∞.
Alternatively, if the autocovariances of a covariance-stationary process satisfy ∑_{j=1}^{∞} |γ_j| < ∞, then {y_t} is ergodic for the mean.
L^p can be defined as the set of random variables X with E(|X|^p) < ∞ (convergence in mean of order p).
When p = 2, the random variables X satisfying E(X²) < ∞ are said to be square integrable.
The set of square integrable real random variables X is a normed linear space with norm ‖X‖ = [E(X²)]^{1/2}.

Definition
1.1.4 Two random variables X and Y such that E(X²) < ∞ and E(Y²) < ∞ are said to be orthogonal with respect to L² if E(XY) = 0.

Definition
1.1.5 A sequence of variables x_n satisfying E(x_n²) < ∞ converges to a variable x with respect to L² (or converges in mean square) if ‖x_n − x‖ → 0 as n → ∞.

The function γ : k → γ_k, k an integer, is called the autocovariance function. This function is
i). Even: that is, for all k, γ_{−k} = γ_k;
ii). Nonnegative definite, since

    ∑_{j=1}^{n} ∑_{k=1}^{n} φ_j φ_k γ(t_j − t_k) = var(∑_{j=1}^{n} φ_j x_{t_j}) ≥ 0,

for all positive integers n, all real numbers φ_j and φ_k, and all integers t_j.

Theorem
1.1.4 If x_t is a stationary process, and if (φ_i, i an integer) forms a sequence of absolutely summable real numbers with ∑_{i=−∞}^{∞} |φ_i| < ∞, then the new variable y_t = ∑_{i=−∞}^{∞} φ_i x_{t−i} defines a new stationary process.

Proof.
The series formed by φ_i x_{t−i} is convergent in mean square since

    ∑_{i=−∞}^{∞} ‖φ_i x_{t−i}‖ = ∑_{i=−∞}^{∞} |φ_i| ‖x_{t−i}‖ = (γ_0 + µ²)^{1/2} ∑_{i=−∞}^{∞} |φ_i| < ∞

The expression ∑_{i=−∞}^{∞} φ_i x_{t−i} is an element of L², so y_t so defined is square integrable.
The moments of {y_t} can be written as

    E(y_t) = E(∑_{i=−∞}^{∞} φ_i x_{t−i}) = ∑_{i=−∞}^{∞} φ_i E(x_{t−i}) = µ_x ∑_{i=−∞}^{∞} φ_i = µ_y

    cov(y_t, y_{t+k}) = cov(∑_{i=−∞}^{∞} φ_i x_{t−i}, ∑_{j=−∞}^{∞} φ_j x_{t+k−j})
                      = ∑_{i=−∞}^{∞} ∑_{j=−∞}^{∞} φ_i φ_j cov(x_{t−i}, x_{t+k−j})
                      = ∑_{i=−∞}^{∞} ∑_{j=−∞}^{∞} φ_i φ_j γ_x(k + i − j) = γ_y(k)

Both the mean and the covariances are independent of t.
White Noise Process:

The building block for all the processes considered in this unit is a sequence {e_t}_{t=−∞}^{∞} whose elements have mean zero and variance σ²:

    E(e_t) = 0,   E(e_t²) = σ²        (1.7)

and for which the e's are uncorrelated across time:

    E(e_t e_τ) = 0 for t ≠ τ        (1.8)

A process satisfying (1.7) and (1.8) is described as a white noise process.
One may on occasion wish to replace (1.8) with the slightly stronger condition that the e's are independent across time:

    e_t and e_τ are independent for t ≠ τ        (1.9)

A process satisfying (1.7) and (1.9) is called an independent white noise process.
Finally, if (1.7) and (1.9) hold along with

    e_t ~ N(0, σ²)        (1.10)

then we have the Gaussian white noise process.

1.2 Autoregressive and Moving Average Processes

In time series, the series {. . . , y_1, y_2, . . . , y_T, . . .} are jointly random.
We consider the conditional expectation E(y_t | F_{t−1}), where F_{t−1} = {y_{t−1}, y_{t−2}, . . .} is the history of the series or the information set at time t − 1.
An autoregressive (AR) model specifies that only a finite number of past lags matter:

    E(y_t | F_{t−1}) = E(y_t | y_{t−1}, y_{t−2}, . . . , y_{t−k})

A linear AR model (the most common type used in practice) specifies linearity:

    E(y_t | F_{t−1}) = ρ_0 + ρ_1 y_{t−1} + ρ_2 y_{t−2} + · · · + ρ_k y_{t−k}

Letting e_t = y_t − E(y_t | F_{t−1}), we have the AR model

    y_t = ρ_0 + ρ_1 y_{t−1} + ρ_2 y_{t−2} + · · · + ρ_k y_{t−k} + e_t
    E(e_t | F_{t−1}) = 0

The last property defines a special time-series process.

Definition
1.2.1 e_t is a martingale difference sequence (MDS) if E(e_t | F_{t−1}) = 0.
Regression errors are naturally MDS.
Some time-series processes may be a MDS as a
consequence of optimizing behavior.

Example
Some versions of the life-cycle hypothesis imply that
either changes in consumption or consumption
growth rates should be a MDS.
Most asset pricing models imply that asset returns
should be the sum of a constant plus a MDS.

The MDS property for the regression error plays the same role in a time-series regression as does the conditional mean-zero property for the regression error in a cross-section regression.
In fact, it is even more important in the time-series context, as it is difficult to derive distribution theories without this property.
A useful property of an MDS is that e_t is uncorrelated with any function of the lagged information F_{t−1}. Thus for k > 0, E(e_t | F_{t−k}) = 0.
1.2.1 Stationary AR(1) Process

A first-order autoregressive process, denoted AR(1), satisfies the following difference equation:

    y_t = α + φ y_{t−1} + e_t        (1.2.1)

where {e_t} is a white noise sequence.
If |φ| ≥ 1, the consequences of the e's for y accumulate rather than die out over time, and hence there does not exist a covariance stationary process for y_t with finite variance that satisfies (1.2.1).
In the case where |φ| < 1, there is a covariance stationary process for y_t satisfying (1.2.1).
By repeated backward substitution, (1.2.1) can be expressed as

    y_t = α/(1 − φ) + e_t + φ e_{t−1} + φ² e_{t−2} + · · ·        (1.2.2)

This can be viewed as an MA(∞) process.
When |φ| < 1, ∑_{j=0}^{∞} |φ|^j = 1/(1 − |φ|).
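A brief numpy sketch (my own illustration, with arbitrary parameter values) of the backward substitution in (1.2.2): the AR(1) recursion and the truncated MA(∞) sum with weights φ^j give essentially the same value when |φ| < 1.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, phi, T = 1.0, 0.8, 200
e = rng.normal(size=T + 500)          # extra draws so the recursion "forgets" y_0

# AR(1) recursion: y_t = alpha + phi*y_{t-1} + e_t
y = np.zeros(T + 500)
for t in range(1, T + 500):
    y[t] = alpha + phi * y[t - 1] + e[t]

# truncated MA(infinity) form (1.2.2): y_t ~ alpha/(1-phi) + sum_j phi^j e_{t-j}
J = 200
t = T + 499
ma_approx = alpha / (1 - phi) + sum(phi**j * e[t - j] for j in range(J))
print(y[t], ma_approx)                # nearly identical when |phi| < 1
```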
The remainder of this discussion of the first-order AR process assumes that |φ| < 1.
This ensures that the MA(∞) representation exists and that the AR(1) process is ergodic for the mean.

Theorem
1.2.1 If |φ| < 1 then y_t is strictly stationary and ergodic.

Taking expectations of both sides of (1.2.2) yields the mean of the AR(1) process:

    E(y_t) = µ = α/(1 − φ)        (1.2.3)

The variance of a stationary AR(1) process is

    γ_0 = E(y_t − µ)² = E(e_t + φ e_{t−1} + φ² e_{t−2} + · · ·)²
        = (1 + φ² + φ⁴ + φ⁶ + · · ·) σ² = σ²/(1 − φ²)        (1.2.4)

The covariance at lag k of a stationary AR(1) process is

    γ_k = E[(y_t − µ)(y_{t−k} − µ)]
        = E[(e_t + φ e_{t−1} + · · · + φ^k e_{t−k} + φ^{k+1} e_{t−k−1} + · · ·)(e_{t−k} + φ e_{t−k−1} + φ² e_{t−k−2} + · · ·)]
        = (φ^k + φ^{k+2} + φ^{k+4} + · · ·) σ² = φ^k σ²/(1 − φ²)        (1.2.5)

From (1.2.4) and (1.2.5) it follows that the autocorrelation function at lag k of a stationary AR(1) is

    ρ_k = γ_k/γ_0 = φ^k        (1.2.6)
The moments for a stationary AR(1) process were derived above by viewing it as an MA(∞) process.
The second way to arrive at the same results is to assume that the process is covariance stationary and calculate the moments directly from (1.2.1).
Taking expectations of both sides of (1.2.1),

    E(y_t) = α + φ E(y_{t−1}) + E(e_t)        (1.2.7)

Assuming that the process is covariance stationary,

    E(y_t) = E(y_{t−1}) = µ        (1.2.8)

Substituting (1.2.8) into (1.2.7),

    µ = α + φµ + 0  ⟹  µ = α/(1 − φ)        (1.2.9)

To find the second moment of y_t in an analogous manner, use (1.2.9) to rewrite (1.2.1) as

    y_t = µ(1 − φ) + φ y_{t−1} + e_t
    (y_t − µ) = φ(y_{t−1} − µ) + e_t        (1.2.10)

Now square both sides of (1.2.10) and take expectations:
    E(y_t − µ)² = φ² E(y_{t−1} − µ)² + 2φ E[(y_{t−1} − µ) e_t] + E(e_t²)        (1.2.11)

Recall from (1.2.2) that (y_{t−1} − µ) is a linear function of e_{t−1}, e_{t−2}, . . ., i.e.,

    (y_{t−1} − µ) = e_{t−1} + φ e_{t−2} + φ² e_{t−3} + · · ·

But e_t is uncorrelated with e_{t−1}, e_{t−2}, . . ., so e_t must be uncorrelated with (y_{t−1} − µ).
Hence, the second term on the right hand side of (1.2.11) is zero:

    E[(y_{t−1} − µ) e_t] = 0        (1.2.12)

Again, assuming covariance stationarity, we have

    E(y_t − µ)² = E(y_{t−1} − µ)² = γ_0        (1.2.13)

Substituting (1.2.12) and (1.2.13) into (1.2.11),

    γ_0 = φ² γ_0 + σ²  ⟹  γ_0 = σ²/(1 − φ²)

Similarly, we could multiply both sides of (1.2.10) by (y_{t−k} − µ) and take expectations:

    E[(y_t − µ)(y_{t−k} − µ)] = φ E[(y_{t−1} − µ)(y_{t−k} − µ)] + E[(y_{t−k} − µ) e_t]        (1.2.14)


But the term (y_{t−k} − µ) is a linear function of e_{t−k}, e_{t−k−1}, . . ., which, for k > 0, is uncorrelated with e_t.
Thus, for k > 0, the last term on the right hand side of (1.2.14) is zero.
Hence, for k > 0, (1.2.14) becomes:

    γ_k = φ γ_{k−1}  ⟹  γ_k = φ^k γ_0        (1.2.15)

The data for the simulated AR(1) processes with parameter φ equal to 0.5 and 0.9 are depicted in
the following figures, combined with their autocorrelation functions.
All series are standardized to have zero mean and unit variance.
If we compare the AR series with φ = 0.5 and φ = 0.9, it appears that the latter process is smoother, that is, it has a higher degree of persistence.
The autocorrelation functions show an exponential decay in both cases, although it takes large lags for the ACF of the φ = 0.9 series to become close to zero.
For example, after 15 periods, the effect of a shock is still 0.9^15 ≈ 0.21 of its original effect.
For the φ = 0.5 series, the effect at lag 15 is virtually zero.
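The comparison can be reproduced with a minimal simulation sketch such as the following (illustrative only; it does not generate the figures on the slides):

```python
import numpy as np

def simulate_ar1(phi, T, burn=200, seed=0):
    """Simulate y_t = phi*y_{t-1} + e_t with Gaussian white noise e_t."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=T + burn)
    y = np.zeros(T + burn)
    for t in range(1, T + burn):
        y[t] = phi * y[t - 1] + e[t]
    return y[burn:]                      # drop burn-in so the start-up value is forgotten

def acf(y, max_lag):
    """Sample autocorrelations rho_hat_k = gamma_hat_k / gamma_hat_0."""
    y = y - y.mean()
    g0 = np.mean(y * y)
    return np.array([np.mean(y[k:] * y[:len(y) - k]) / g0 for k in range(max_lag + 1)])

for phi in (0.5, 0.9):
    y = simulate_ar1(phi, T=2000)
    print(f"phi = {phi}: sample ACF at lag 15 = {acf(y, 15)[-1]:.3f}, "
          f"theoretical phi^15 = {phi**15:.3f}")
```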

[Figures: simulated AR(1) series with φ = 0.5 and φ = 0.9, together with their autocorrelation functions.]
1.2.2 The Second-Order Autoregressive Process

A second-order autoregression, denoted AR(2), satisfies:

    y_t = α + φ_1 y_{t−1} + φ_2 y_{t−2} + e_t        (1.2.16)

Or, in lag operator notation,

    (1 − φ_1 L − φ_2 L²) y_t = α + e_t        (1.2.17)

The difference equation (1.2.16) is stable provided that the roots of

    1 − φ_1 z − φ_2 z² = 0        (1.2.18)

lie outside the unit circle.
When this condition is satisfied, the AR(2) process turns out to be covariance stationary, and the inverse of the AR operator in (1.2.17) is given by

    ψ(L) = (1 − φ_1 L − φ_2 L²)^{−1} = ψ_0 + ψ_1 L + ψ_2 L² + · · ·        (1.2.19)

The values of the ψ_j's can be found from the fact that

    (1 − φ_1 L − φ_2 L²)(ψ_0 + ψ_1 L + ψ_2 L² + ψ_3 L³ + · · ·) = 1

From this it follows that

    ψ_0 = 1,  ψ_1 = φ_1,  ψ_j = φ_1 ψ_{j−1} + φ_2 ψ_{j−2} for j ≥ 2

Multiplying both sides of (1.2.17) by ψ(L) gives

    y_t = ψ(L) α + ψ(L) e_t        (1.2.20)

One can easily show that

    ψ(L) α = α/(1 − φ_1 − φ_2)        (1.2.21)

and

    ∑_{j=0}^{∞} |ψ_j| < ∞        (1.2.22)

Proof.
Taking expectation of both sides of (1.2.16) yields

E (yt ) = α + φ1 E (yt 1) + φ2 E (yt 2 )

Assuming that the process is covariance stationary,


we obtain

F. Guta (CoBE) Econ 654 September, 2017 49 / 296


Proof.
α
E (yt ) = µ =
1 φ1 φ2
Taking expectation of both sides of (1.2.20) yields

E (yt ) = ψ (L) α

The preceding two expression imply that

α
ψ (L) α = = ψ ( 1) α
1 φ1 φ2

This proves the result in (1.2.21) for a covariance


F. Guta (CoBE) Econ 654 September, 2017 50 / 296
Proof.
stationary second-order autoregressive process.

Proof of (1.2.22) proceeds as follows:


Factorizing a second-order polynomial in the lag
operator,

1 1
1 φ1 L φ2 L2 = [(1 λ1 L) (1
λ2 L)]
c1 c2
= +
(1 λ1 L) (1 λ2 L)
λ1 λ2
Where c1 = and c2 = .
λ1 λ2 λ2 λ1
F. Guta (CoBE) Econ 654 September, 2017 51 / 296
Proof.
Assuming that the eigenvalues are inside the unit
circle or the roots of the reverse characteristic
polynomial lie outside the unit circle, one would
express the second order polynomial in the lag
operator as
∞ ∞
= c1 ∑ (λ1 L) + c2 ∑ (λ2 L)j
2 1 j
1 φ1 L φ2 L
j =0 j =0

= ∑ c1 λj1 + c2 λj2 Lj
j =0

F. Guta (CoBE) Econ 654 September, 2017 52 / 296


Proof.
From this last expression it follows that
ψj = c1 λj1 + c2 λj2 .

Hence,

∞ ∞ ∞ ∞
∑ ψj = ∑ c1 λj1 + c2 λj2 jc1 j ∑ jλ1 jj + jc2 j ∑ jλ2 jj
j =0 j =0 j =0 j =0
jc1 j jc2 j
= + <∞
1 j λ1 j 1 j λ2 j

This proves the expression in (1.2.22).

F. Guta (CoBE) Econ 654 September, 2017 53 / 296


To find the second moments, write (1.2.16) as

    y_t = µ(1 − φ_1 − φ_2) + φ_1 y_{t−1} + φ_2 y_{t−2} + e_t

or

    (y_t − µ) = φ_1 (y_{t−1} − µ) + φ_2 (y_{t−2} − µ) + e_t        (1.2.23)

Multiplying both sides of (1.2.23) by (y_{t−k} − µ) and taking expectations produces

    γ_k = φ_1 γ_{k−1} + φ_2 γ_{k−2} for k = 1, 2, . . .        (1.2.24)

Hence, the autocovariances follow the same second-order difference equation as does the process for y_t.
An AR(2) process is covariance stationary if the roots of the second-order polynomial in the lag operator lie outside the unit circle.
If both roots are real and lie outside the unit circle, the autocovariance function γ_k is the sum of two decaying exponential functions of k.
When both roots are complex with modulus outside the unit circle, the autocovariance function γ_k is a damped sinusoidal function.
The autocorrelations are found by dividing both sides of (1.2.24) by γ_0:

    ρ_k = φ_1 ρ_{k−1} + φ_2 ρ_{k−2} for k = 1, 2, . . .        (1.2.25)

In particular, setting k = 1 yields

    ρ_1 = φ_1 + φ_2 ρ_1,  or  ρ_1 = φ_1/(1 − φ_2)        (1.2.26)

For k = 2,

    ρ_2 = φ_1 ρ_1 + φ_2        (1.2.27)

The variance of a covariance stationary second-order AR can be found by multiplying both sides of (1.2.23) by (y_t − µ) and taking expectations:

    E(y_t − µ)² = φ_1 E[(y_{t−1} − µ)(y_t − µ)] + φ_2 E[(y_{t−2} − µ)(y_t − µ)] + E[e_t (y_t − µ)]

or

    γ_0 = φ_1 γ_1 + φ_2 γ_2 + σ²        (1.2.28)

Equation (1.2.28) can be written as:

    γ_0 = φ_1 ρ_1 γ_0 + φ_2 ρ_2 γ_0 + σ²        (1.2.29)

Substituting (1.2.26) and (1.2.27) into (1.2.29) gives

    γ_0 = (1 − φ_2) σ² / {(1 + φ_2)[(1 − φ_2)² − φ_1²]}
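A quick numeric check (the values φ_1 = 0.5, φ_2 = 0.3, σ² = 1 are my own illustrative choices) of formulas (1.2.26)-(1.2.29) against a long simulated AR(2):

```python
import numpy as np

phi1, phi2, sigma2 = 0.5, 0.3, 1.0

# theoretical moments from (1.2.26)-(1.2.29)
rho1 = phi1 / (1 - phi2)
rho2 = phi1 * rho1 + phi2
gamma0 = (1 - phi2) * sigma2 / ((1 + phi2) * ((1 - phi2) ** 2 - phi1 ** 2))

# long simulation of y_t = phi1*y_{t-1} + phi2*y_{t-2} + e_t
rng = np.random.default_rng(0)
T = 200_000
e = rng.normal(scale=np.sqrt(sigma2), size=T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + e[t]

yc = y[1000:] - y[1000:].mean()          # drop burn-in
print("gamma0:", gamma0, "vs", yc.var())
print("rho1:  ", rho1, "vs", np.mean(yc[1:] * yc[:-1]) / yc.var())
print("rho2:  ", rho2, "vs", np.mean(yc[2:] * yc[:-2]) / yc.var())
```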


1.2.3 The pth-order Autoregressive Process

A pth-order autoregression, denoted AR(p), satisfies

    y_t = α + φ_1 y_{t−1} + φ_2 y_{t−2} + · · · + φ_p y_{t−p} + e_t        (1.2.30)

Provided that the roots of

    1 − φ_1 z − φ_2 z² − · · · − φ_p z^p = 0        (1.2.31)

all lie outside the unit circle, one can easily verify that a covariance stationary representation of the form

    y_t = µ + ψ(L) e_t        (1.2.32)

exists, where

    ψ(L) = (1 − φ_1 L − φ_2 L² − · · · − φ_p L^p)^{−1} and ∑_{j=0}^{∞} |ψ_j| < ∞

Theorem
1.3.1 The AR(p) process is strictly stationary and ergodic if and only if |λ_k| < 1 (the eigenvalues of the companion matrix) for all k.
Assuming that the stationarity condition is satisfied, one alternative way to find the mean is to take expectations of (1.2.30):

    µ = α/(1 − φ_1 − φ_2 − · · · − φ_p)        (1.2.33)

Using (1.2.33), one can easily rewrite equation (1.2.30) as

    y_t − µ = φ_1 (y_{t−1} − µ) + φ_2 (y_{t−2} − µ) + · · · + φ_p (y_{t−p} − µ) + e_t        (1.2.34)

One can easily find the autocovariances by multiplying both sides of (1.2.34) by (y_{t−k} − µ) and taking expectations:

    γ_k = φ_1 γ_{k−1} + φ_2 γ_{k−2} + · · · + φ_p γ_{k−p}   for k = 1, 2, . . .
    γ_0 = φ_1 γ_1 + φ_2 γ_2 + · · · + φ_p γ_p + σ²          for k = 0        (1.2.35)

Using the fact that γ_{−k} = γ_k, the system of equations in (1.2.35) can be solved for γ_0, γ_1, . . . , γ_{p−1} as functions of σ², φ_1, φ_2, . . . , φ_p.
It can be shown that the (p × 1) vector (γ_0, γ_1, . . . , γ_{p−1})′ is given by the first p elements of the first column of the (p² × p²) matrix σ² [I_{p²} − (F ⊗ F)]^{−1}, where F is the (p × p) companion matrix defined as

    F = [ φ_1  φ_2  · · ·  φ_{p−1}  φ_p ]
        [  1    0   · · ·     0      0  ]
        [  0    1   · · ·     0      0  ]
        [  ⋮    ⋮    ⋱        ⋮      ⋮  ]
        [  0    0   · · ·     1      0  ]

and ⊗ indicates the Kronecker product.

Dividing (1.2.35) by γ_0 produces the Yule-Walker equations:

    ρ_k = φ_1 ρ_{k−1} + φ_2 ρ_{k−2} + · · · + φ_p ρ_{k−p} for k = 1, 2, . . .        (1.2.36)

Hence, the autocovariances and the autocorrelations follow the same pth-order difference equation as does the process (1.2.30) itself.
For distinct roots, their solutions take the form

    γ_k = g_1 λ_1^k + g_2 λ_2^k + · · · + g_p λ_p^k        (1.2.37)

where the eigenvalues (λ_1, λ_2, . . . , λ_p) are the solutions to

    λ^p − φ_1 λ^{p−1} − φ_2 λ^{p−2} − · · · − φ_p = 0
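As an illustration (a sketch following the companion-matrix claim above, with function names of my own), the autocovariances of a stationary AR(p) can be computed numerically from σ²[I_{p²} − (F ⊗ F)]^{−1} and then extended with the recursion in (1.2.35):

```python
import numpy as np

def ar_autocovariances(phi, sigma2=1.0, n_lags=10):
    """Autocovariances of a stationary AR(p): vec(Gamma) = sigma2*(I - F kron F)^{-1} e_1,
    where Gamma is the p x p variance matrix of (y_t, ..., y_{t-p+1}) and F is the
    companion matrix; higher lags follow the recursion gamma_k = sum_j phi_j gamma_{k-j}."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    F = np.zeros((p, p))
    F[0, :] = phi
    F[1:, :-1] = np.eye(p - 1)
    vecSigma = np.linalg.solve(np.eye(p * p) - np.kron(F, F),
                               sigma2 * np.eye(p * p)[:, 0])
    Gamma = vecSigma.reshape(p, p)
    gam = list(Gamma[0, :])              # gamma_0, ..., gamma_{p-1}
    for k in range(p, n_lags + 1):       # recursion (1.2.35) for k >= p
        gam.append(phi @ np.array(gam[-1:-p - 1:-1]))
    return np.array(gam[:n_lags + 1])

gam = ar_autocovariances([0.5, 0.3], sigma2=1.0, n_lags=5)
print(gam[0], gam / gam[0])              # gamma_0 and the autocorrelations rho_k
```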

1.2.4 Stationary MA(1) process

Let {e_t} be a white noise process and consider the process

    y_t = µ + e_t + θ e_{t−1}        (1.2.38)

where µ and θ could be any constants.
This time series process is called a first-order moving average process, denoted MA(1).
The term "moving average" comes from the fact that y_t is constructed from a weighted sum of the two most recent values of e.
The expectation of y_t is given by

    E(y_t) = µ + E(e_t) + θ E(e_{t−1}) = µ        (1.2.39)

We used the symbol µ for the constant term in (1.2.38) in anticipation of the result that this constant term turns out to be the mean of the process.
The variance of y_t is:

    E(y_t − µ)² = E(e_t + θ e_{t−1})²
                = E(e_t²) + 2θ E(e_t e_{t−1}) + θ² E(e_{t−1}²)
                = (1 + θ²) σ²        (1.2.40)


The autocovariance at lag 1 is

    E[(y_t − µ)(y_{t−1} − µ)] = E[(e_t + θ e_{t−1})(e_{t−1} + θ e_{t−2})] = θ σ²        (1.2.41)

Autocovariances at lags larger than one are all zero:

    E[(y_t − µ)(y_{t−k} − µ)] = E[(e_t + θ e_{t−1})(e_{t−k} + θ e_{t−k−1})] = 0 for k > 1        (1.2.42)

Since the mean and autocovariances are not functions of time, an MA(1) process is covariance stationary regardless of the magnitude of θ.
Furthermore, an MA(1) process satisfies the condition that

    ∑_{k=0}^{∞} |γ_k| = (1 + θ²) σ² + |θ| σ² < ∞

Hence, if {e_t} is a Gaussian white noise process, then the MA(1) process in (1.2.38) is ergodic for all moments.
The autocorrelation at lag k of a covariance-stationary process (denoted ρ_k) is defined as its autocovariance at lag k divided by the variance:

    ρ_k = γ_k/γ_0        (1.2.43)

Notice also that the autocorrelation at lag 0 is equal to unity for any covariance-stationary process by definition.
From (1.2.40) and (1.2.41), the first-order autocorrelation for an MA(1) process is given by:

    ρ_1 = θ σ²/[(1 + θ²) σ²] = θ/(1 + θ²)        (1.2.44)

Higher order autocorrelations are all zero.
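A short simulation check of (1.2.44) (illustrative only, with an arbitrary θ of my own choosing):

```python
import numpy as np

theta, T = 0.6, 100_000
rng = np.random.default_rng(0)
e = rng.normal(size=T + 1)
y = e[1:] + theta * e[:-1]               # y_t = e_t + theta*e_{t-1} (mu = 0)

yc = y - y.mean()
rho = [np.mean(yc[k:] * yc[:len(yc) - k]) / yc.var() for k in range(4)]
print("sample ACF:", np.round(rho, 3))
print("theory    :", [1.0, round(theta / (1 + theta**2), 3), 0.0, 0.0])
```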

1.2.5 Stationary MA(q) process

A qth -order moving average process, denoted by

MA(q ), is characterized by

yt = µ + et + θ 1 et 1 + θ 2 et 2 + + θ q et q (1.2.45)

where fet g is a white noise process and


(θ 1 , θ 2 , . . . , θ q ) could be any real numbers.

The mean of the process in (1.2.45) is given by:


F. Guta (CoBE) Econ 654 September, 2017 73 / 296
E ( yt ) = µ + E ( e t ) + θ 1 E ( e t 1 ) + θ2 E ( et 2) + + θ q E ( et q)

=µ (1.2.46)

The variance of an MA(q ) process is:

γ0 = E ( yt µ )2 = E ( et + θ 1 et 1 + θ 2 et 2 + + θ q et q)
2

= 1 + θ 21 + θ 22 + + θ 2q σ2 (1.2.47)

For k = 1, 2, . . . , q,

γk = E f(et + θ 1 et 1 + θ 2 et 2 + + θ q et q) (1.2.48)

et k + θ 1 et k 1 + θ 2 et k 2 + + θ q et k q

F. Guta (CoBE) Econ 654 September, 2017 74 / 296


= E θ k e2t k + θ k +1 θ 1 e2t k 1 + θ k +2 θ 2 e2t k 2 + + θq θq 2
k et q

Terms involving e’s at di¤erent dates have been


dropped because their product has expectation zero,
and θ 0 is de…ned to be unity.
For k > q, there are no e’s with common dates in
the de…nition of γk , and so the expectation is zero.
Hence,
8
>
> 2
< ( θ k + θ k +1 θ 1 + θ k +2 θ 2 + + θq θq k)σ for k = 1, 2, . . . , q
γk = (1.2.49)
>
>
: 0 for k > q

For any values of (θ 1 , θ 2 , . . . , θ q ), the MA(q )


F. Guta (CoBE) Econ 654 September, 2017 75 / 296
process is thus covariance-stationary as

∑k =0 jγk j < ∞ and furthermore, for Gaussian et
the MA(q ) process is covariance-stationary and also
ergodic for all moments.
The autocorrelation function is zero after q lags.

1.2.6 The Infinite-Order Moving Average Process

The MA(q) process can be written as

    y_t = µ + ∑_{j=0}^{q} ψ_j e_{t−j} with ψ_0 = 1

Consider the process that results as q → ∞:

    y_t = µ + ∑_{j=0}^{∞} ψ_j e_{t−j} = µ + ψ_0 e_t + ψ_1 e_{t−1} + ψ_2 e_{t−2} + · · · with ψ_0 = 1        (1.2.50)

This is described as an MA(∞) process.
The infinite sequence in (1.2.50) generates a well defined covariance-stationary process provided that

    ∑_{j=0}^{∞} ψ_j² < ∞        (1.2.51)

It is often convenient to work with a slightly stronger condition than (1.2.51), given by:

    ∑_{j=0}^{∞} |ψ_j| < ∞        (1.2.52)

A sequence of numbers {ψ_j}_{j=0}^{∞} satisfying (1.2.51) is said to be square summable, whereas a sequence satisfying (1.2.52) is said to be absolutely summable.
Absolute summability implies square summability, but the converse does not hold.
Proof.
Proof of the result in (1.2.51):
We need to show that square-summability of a
moving average coe¢ cients implies that the
F. Guta (CoBE) Econ 654 September, 2017 78 / 296
Proof.
MA(∞) representation in (1.2.50) generates a
mean square convergent random variable.

For a stochastic process such as (1.2.50), the


question is whether ∑Tj=0 ψj et j converges in mean
square to some random variable yt as T " ∞.
To prove this we use the Cauchy Criterion for a
stochastic process which states that ∑j∞=0 ψj et j

converges if and only if, for any ε > 0, there exists a


suitably large integer N such that for any integer
F. Guta (CoBE) Econ 654 September, 2017 79 / 296
Proof.
M>N
h i2
∑ ∑
M N
E j =0
ψj et j j =0
ψj et j <ε (1.2.53)

The expression on the left hand side of (1.2.53) can


written

h i2
∑ j =N +1 ψ j e t
M
E j = ψ2M + ψ2M 1 + + ψ2N +1 σ2
h i
∑j =0 ψ2j ∑j =0 ψ2j
M N
= σ2 (1.2.54)

F. Guta (CoBE) Econ 654 September, 2017 80 / 296


Proof.
But if ∑j∞=0 ψ2j converges as required by (1.2.51),
then by the Cauchy Criterion the right hand side of
(1.2.54) may be made as small as desired by choice
of a suitably large N.
Hence, the in…nite series in (1.2.50) converges in
mean square provided that (1.2.51) is satis…ed.
Proof of the result that absolute summability
implies square summability:
Absolute summability of the moving average
F. Guta (CoBE) Econ 654 September, 2017 81 / 296
Proof.
coe¢ cients implies square summability.
n o∞
Suppose that ψj is absolutely summable.
j =0
Then there exists an N < ∞ such that ψj < 1 for
all j N. Then
∞ N 1 ∞ N 1 ∞
∑ ψ2j = ∑ ψ2j + ∑ ψ2j < ∑ ψ2j + ∑ ψj
j =0 j =0 j =N j =0 j =N

But ∑Nj =01 ψ2j is …nite, since N is …nite, and


n o

∑j =N ψj is …nite since ψj is absolutely
F. Guta (CoBE) Econ 654 September, 2017 82 / 296
Proof.
summable.

Hence, ∑j∞=0 ψ2j < ∞ , establishing the result that


(1.2.52) implies (1.2.51).

The mean and variance of an MA(∞) process with


absolutely summable coe¢ cients can be calculated
from a simple extrapolation of the results for an
MA(q ) process:
F. Guta (CoBE) Econ 654 September, 2017 83 / 296
E (yt ) = lim E (µ + ψ0 et + ψ1 et 1 + ψ2 et 2 + + ψT et T) (1.2.55)
T !∞

γ0 = E (yt µ )2

2
= lim E (ψ0 et + ψ1 et 1 + ψ2 et 2 + + ψT et T)
T !∞

= lim ψ20 + ψ21 + ψ22 + + ψ2T σ2 (1.2.56)


T !∞

γk = E (yt µ) (yt k µ) (1.2.57)

= ψ k ψ 0 + ψ k +1 ψ 1 + ψ k +2 ψ 2 + ψ k +3 ψ 3 + σ2

F. Guta (CoBE) Econ 654 September, 2017 84 / 296


Moreover, an MA(∞) process with absolutely
summable coe¢ cients has absolutely summable
autocovariance:

∑ j γk j < ∞ (1.2.58)
k =0

Hence an MA(∞) process satisfying (1.2.52) is


ergodic for the mean. Proof of this result is as
follows:
Proof.
Write (1.2.57) as γk = σ2 ∑j∞=0 ψj +k ψj . Then
F. Guta (CoBE) Econ 654 September, 2017 85 / 296
Proof.
jγk j = σ2 ∑j∞=0 ψj +k ψj σ2 ∑j∞=0 ψj +k ψj

Hence,

∞ ∞ ∞ ∞ ∞
∑ j γk j σ2 ∑∑ ψ j +k ψ j = σ 2 ∑ ψj ∑ ψ j +k
k =0 k =0 j =0 j =0 k =0

But there exists an M < ∞ such that


∞ ∞
∑j =0 ψj < M, and therefore ∑k =0 ψj +k < M for
j = 0, 1, 2, . . . . Therefore,
∞ ∞
∑ k =0 j γ k j < σ 2 ∑ j =0 ψ j M < σ2 M 2 < ∞.
F. Guta (CoBE) Econ 654 September, 2017 86 / 296
Proof.
Hence (1.2.51) holds and the process is ergodic for
the mean.

If the e’s are Gaussian, then the process is ergodic


for all moments.

1.2.7 Stationary ARMA(p,q) Process

An ARMA(p, q) process includes both autoregressive and moving average terms, defined as:

    y_t = α + φ_1 y_{t−1} + · · · + φ_p y_{t−p} + e_t + θ_1 e_{t−1} + · · · + θ_q e_{t−q}        (1.2.59)

Or, in lag operator form,

    (1 − φ_1 L − φ_2 L² − · · · − φ_p L^p) y_t = α + (1 + θ_1 L + θ_2 L² + · · · + θ_q L^q) e_t        (1.2.60)

Provided that the roots of

    1 − φ_1 z − φ_2 z² − · · · − φ_p z^p = 0        (1.2.61)

all lie outside the unit circle, both sides of (1.2.60) can be divided by (1 − φ_1 L − φ_2 L² − · · · − φ_p L^p) to obtain y_t = µ + ψ(L) e_t, where

    ψ(L) = (1 + θ_1 L + θ_2 L² + · · · + θ_q L^q)/(1 − φ_1 L − φ_2 L² − · · · − φ_p L^p),
    ∑_{j=0}^{∞} |ψ_j| < ∞ and µ = α/(1 − φ_1 − φ_2 − · · · − φ_p)

Hence, stationarity of an ARMA(p, q) process depends entirely on the autoregressive parameters φ_1, φ_2, . . . , φ_p and not on the moving average parameters (θ_1, θ_2, . . . , θ_q).

It is often convenient to write the ARMA process

(1.2.59) in terms of deviations from the mean:

yt µ = φ1 (yt 1 µ) + + φp (yt p µ)

+ et + θ 1 et 1 + + θ q et q (1.2.62)

Autocovariances are found by multiplying both sides


of (1.2.62) by (yt k µ) and taking expectations.
F. Guta (CoBE) Econ 654 September, 2017 90 / 296
For k > q, the resulting equations take the form

γk = φ1 γk 1 + φ2 γk 2 + + φp γk p (1.2.63)

for k = q + 1, q + 2, . . .

Hence, after q lags the autocovariance function γk


(and the autocorrelation function ρk ) follow the
pth -order di¤erence equation governed by the
autoregressive parameters.
Note that (1.2.63) does not hold for k q, owing
F. Guta (CoBE) Econ 654 September, 2017 91 / 296
to the correlation between θ k et k and yt k.

Therefore, an ARMA(p, q ) process will have more


complicated autocovariances for lags 1 through q
than would the corresponding AR (p ) process.
For k > q with distinct autoregressive roots, the
autocovariances will be given by:

γk = h1 λk1 + h2 λk2 + + hp λkp (1.2.64)

This takes the same form as the autocovariances for


an AR (p ) process, though because the initial
F. Guta (CoBE) Econ 654 September, 2017 92 / 296
conditions γ0 , γ1 , . . . , γq di¤er for the ARMA and
AR processes, the parameters hk in (1.2.64) will not
be the same as the parameters gk in (1.2.37).
There is a potential for overparameterization with
ARMA processes.
Consider, for instance, a simple white noise process

yt = et (1.2.65)

Suppose we multiply both sides of (1.2.65) by


(1 ρL):
F. Guta (CoBE) Econ 654 September, 2017 93 / 296
(1 ρL) yt = (1 ρL) et (1.2.66)

If (1.2.65) is a valid parameterization, then so is


(1.2.66) for any value of ρ. Hence, (1.2.66) might
be described as an ARMA(1, 1) process with
φ1 = ρ and θ 1 = ρ.

It is important to avoid such a parameterization.

Since any value of ρ in (1.2.66) describes the data


equally well, we will obviously get into trouble trying

F. Guta (CoBE) Econ 654 September, 2017 94 / 296


to estimate the parameter ρ in (1.2.66) by
maximum likelihood.
If we are using an ARMA(1, 1) model in which θ 1 is
close to φ1 , then the data might better be
modeled as simple white noise process.
A related overparameterization can arise with an
ARMA(p, q ) model.
Consider factoring the lag polynomial in (1.2.60) as
p q
∏ (1 λk L) (yt µ) = ∏ (1 η k L)et (1.2.67)
k =1 k =1
F. Guta (CoBE) Econ 654 September, 2017 95 / 296
Assume jλi j < 1 for all i, so that the process is
covariance-stationary.

If the autoregressive operator

1 φ1 L φ2 L2 φp Lp and the moving average

operator 1 + θ 1 L + θ 2 L2 + + θ q Lq have any roots in

common, say λi = η j for some i and j, then both

sides of (1.2.67) can be divided by (1 λi L):


p q
∏ (1 λk L) (yt µ) = ∏ (1 η k L) et (1.2.68)
k =1, k 6=i k =1, k 6=j
F. Guta (CoBE) Econ 654 September, 2017 96 / 296
Or,

p 1 q 1
1 φ1 L φp 1L ( yt µ) = 1 + θ 1 L + + θq 1L et

where
p
1 φ1 L φp 1L
p 1
= ∏ (1 λk L)
k =1, k 6=i
q
1 + θ1 L + + θq 1L
q 1
= ∏ (1 η k L)
k =1, k 6=j

The stationary ARMA(p, q ) process satisfying


(1.2.60) is identical to the stationary
F. Guta (CoBE) Econ 654 September, 2017 97 / 296
ARMA(p 1, q 1) process satisfying (1.2.68).
Wold's Decomposition Theorem
All of the covariance-stationary processes considered in this section can be written in the form

    y_t = µ + ∑_{j=0}^{∞} ψ_j e_{t−j}        (1.2.69)

where e_t is the white noise error one would make in forecasting y_t as a linear function of lagged y, and where ∑_{j=0}^{∞} ψ_j² < ∞ with ψ_0 = 1.
Proposition (Wold's Decomposition). Any zero-mean covariance-stationary process y_t can be represented in the form

    y_t = ∑_{j=0}^{∞} ψ_j e_{t−j} + κ_t        (1.2.70)

where ψ_0 = 1 and ∑_{j=0}^{∞} ψ_j² < ∞.
The term e_t is white noise and represents the error made in forecasting y_t on the basis of a linear function of lagged y:

    e_t = y_t − E(y_t | y_{t−1}, y_{t−2}, . . .)        (1.2.71)
F. Guta (CoBE) Econ 654 September, 2017 99 / 296
The value of κ t is uncorrelated with et j for any j,
though κ t can be predicted arbitrarily well from a
linear function of past values of y:

κ t = E ( κ t j yt 1 , yt 2 , . . . )

The term κ t is called the linearly deterministic


component of yt , while ∑j∞=0 ψj et j is called the
linearly indeterministic component.
If κ t = 0, then the process is called purely linearly
indeterministic.
F. Guta (CoBE) Econ 654 September, 2017 100 / 296
This proposition was …rst proved by Wold (1938).
The proposition relies on stable second moments of
y but makes no use of higher moments.
It thus describes only optimal linear forecasts of y.
Finding the Wold representation requires …tting an
in…nite number of parameters fψ1 , ψ2 , . . .g to the
data.
With a …nite number of observations on
fy1 , y2 , . . . , yT g this will never be possible.
F. Guta (CoBE) Econ 654 September, 2017 101 / 296
As a practical matter, we therefore, need to make
some additional assumptions about the nature of
f ψ1 , ψ2 , . . . g.
A typical assumption is that ψ (L) can be expressed
as the ratio of two …nite-order polynomials:

θ (L) 1 + θ 1 L + θ 2 L2 + + θ q Lq
∑ ψj Lj = φ (L)
=
1 φ1 L φ2 L2 φp Lp
j =0

F. Guta (CoBE) Econ 654 September, 2017 102 / 296


1.3 Use of Lag operator

An algebraic construct which is useful for the


analysis of autoregressive models is the lag operator.

De…nition
1.3.1 The lag operator L satis…es Lyt = yt 1.

De…ning L2 = L.L, we see that


L2 yt = Lyt 1 = yt 2 . In general, Lk yt = yt k.

The AR (1) model can be written in the format


F. Guta (CoBE) Econ 654 September, 2017 103 / 296
yt = α + φyt 1 + et

Or (1 φL) yt = α + et
The operator Φ (L) = 1 φL is a polynomial in the
lag operator L.
We say that the root of the polynomial is 1/φ,
since Φ (z ) = 0 when z = 1/φ.
We call Φ (L) the autoregressive polynomial of yt .
From Theorem 1.2.1, an AR (1) is stationary i¤
jφj < 1.
F. Guta (CoBE) Econ 654 September, 2017 104 / 296
Note that an equivalent way to say this is that an
AR (1) is stationary i¤ the root of the autoregressive
polynomial lies outside the unit circle or larger than
one (in absolute value).
We can write a general AR (p ) model as

Φ (L) yt = et

where Φ (L) is a polynomial of order p in the lag


operator L, usually referred to as a lag polynomial,
given by
F. Guta (CoBE) Econ 654 September, 2017 105 / 296
Φ (L) = 1 φ1 L φ2 L2 φp Lp
We can interpret the lag polynomial as a …lter that,
if applied to a time series, produces a new series.
So the …lter Φ (L) applied to an AR (p ) process
produces a white noise process et .
It is relatively easy to manipulate lag polynomials.
Example
Transforming a series by two such polynomials one after the

other is the same as transforming the series once by a

polynomial that is the product of the two original ones.


F. Guta (CoBE) Econ 654 September, 2017 106 / 296
This way we can de…ne the inverse of a …lter, which
is naturally given by the inverse of the polynomial.
Thus the inverse of Φ (L), denoted by Φ 1
(L), is
de…ned so as to satisfy Φ 1
(L) Φ (L) = 1.
If Φ (L) is a …nite-order polynomial in L, its inverse
will be one of in…nite order.

For the AR (1) case we …nd



(1 φ L) 1
= ∑ φj Lj provided that jφj < 1 (1.3.1)
j =0

F. Guta (CoBE) Econ 654 September, 2017 107 / 296


This is similar to the result that the in…nite sum
∞ 1
∑j =0 φj equals to (1 φ) if jφj < 1, while it
does not converge for jφj 1.
In general, the inverse of a polynomial Φ (L) exists
if it satis…es certain conditions on its parameters, in
which case we call Φ (L) invertible.
For example the AR (1) model can be written as
1 1
(1 φL) (1 φL) yt = (1 φL) et . Or
∞ ∞
yt = ∑ φ L et = ∑ φj et
j j
j (1.3.2)
j =0 j =0
F. Guta (CoBE) Econ 654 September, 2017 108 / 296
Under appropriate conditions, the converse is also
possible and we can write a moving average model
in autoregressive form.
Using the lag operator, we can write the MA(1)
process as
yt = (1 + θL) et

and the general MA(q ) process as

yt = θ (L) et

where θ (L) = 1 + θ 1 L + θ 2 L2 + + θ q Lq .
F. Guta (CoBE) Econ 654 September, 2017 109 / 296
1
Now, if θ (L) exists, we can write

1
θ (L) yt = et

which in general, will be an AR model with in…nite


order.

For the MA(1) case, we use,


(1 + θ L) 1
= ∑( θ )j Lj , provided that jθ j < 1 (1.3.3)
j =0

Consequently, an MA(1) model can be written as


F. Guta (CoBE) Econ 654 September, 2017 110 / 296

yt = θ ∑j =0 ( θ ) yt
j
j 1 + et (1.3.4)

A necessary condition for the in…nite AR


representation (AR (∞)) to exist is that the MA
polynomial is invertible, which, in the MA(1) case,
requires that jθ j < 1.

Particularly for making predictions conditional upon


an observed past, the AR representations are very
convenient.

F. Guta (CoBE) Econ 654 September, 2017 111 / 296


The MA representations are often convenient to
determine variances and covariances.
For a more parsimonious representation, we may
want to work with an ARMA model that contains
both an autoregressive and moving average part.
The general ARMA model can be written as

Φ (L) yt = θ (L) et

which (if the AR lag polynomial is invertible) can be


F. Guta (CoBE) Econ 654 September, 2017 112 / 296
written in MA(∞) representation as

yt = Φ 1
(L) θ (L) et

or (if the MA lag polynomial is invertible) can be


written in AR (∞) form as

θ 1
(L) Φ (L) yt = et

Both Φ 1
(L) θ (L) and θ 1
(L) Φ (L) are lag
polynomials of in…nite lag length, with restrictions
on the coe¢ cients.
F. Guta (CoBE) Econ 654 September, 2017 113 / 296
1.4 Invertibility of Lag Polynomials

As we have seen before, the …rst order lag


polynomial 1 φL is invertible if jφj < 1.
This condition can be generalized to higher-order
lag polynomials.
Consider the case of a second-order lag polynomial,
given by 1 φ1 L φ2 L2 .
Generally we can …nd values λ1 and λ2 such that
the polynomial can be written as
F. Guta (CoBE) Econ 654 September, 2017 114 / 296
1 φ1 L φ2 L2 = (1 λ1 L) (1 λ2 L) (1.4.1)

It can easily be veri…ed that λ1 and λ2 can be


solved from λ1 + λ2 = φ1 and λ1 λ2 = φ2 .
The conditions for invertibility of the second-order
polynomial are just the conditions that both the
…rst-order polynomials (1 λ1 L) and (1 λ2 L) are
invertible.
Thus, the requirement for invertibility is that both
jλ1 j < 1 and jλ2 j < 1.
F. Guta (CoBE) Econ 654 September, 2017 115 / 296
These requirements can be formulated in terms of
the so-called reverse characteristic equation.

(1 λ1 z ) (1 λ2 z ) = 0 (1.4.2)

This equation has two solutions, z1 and z2 , referred


to as the reverse characteristic roots.
The requirement jλi j < 1 corresponds to jzi j > 1.
If any solution satis…es jzi j 1, the corresponding
polynomial is noninvertible.
F. Guta (CoBE) Econ 654 September, 2017 116 / 296
A solution that equals unity is referred to as a unit
root.
The presence of a unit root in the lag polynomial
Φ (L) can be detected relatively easily, without
solving the characteristic equation, by nothing that
the polynomial Φ (z ) evaluated at z = 1 is zero if
p
∑j =1 φj = 1.
Thus, the presence of a …rst unit root can be
veri…ed by checking whether the polynomial
coe¢ cients sum to one.
F. Guta (CoBE) Econ 654 September, 2017 117 / 296
If the sum exceeds one, the polynomial is not
invertible.
As an example, consider the following AR(2) model

    y_t = 1.2 y_{t−1} − 0.32 y_{t−2} + e_t        (1.4.3)

The AR(2) model in equation (1.4.3) can be written as

    (1 − 0.8L)(1 − 0.4L) y_t = e_t        (1.4.4)

with reverse characteristic equation

    1 − 1.2z + 0.32z² = (1 − 0.8z)(1 − 0.4z) = 0        (1.4.5)

The solutions (reverse characteristic roots) are z_1 = 1/0.8 = 1.25 and z_2 = 1/0.4 = 2.5, which are both larger than one.
Consequently, the AR polynomial in (1.4.3) is invertible.
Note that the following AR(1) model

    y_t = 1.2 y_{t−1} + e_t        (1.4.6)

describes a noninvertible AR process.
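Numerically, the reverse characteristic roots can be checked with numpy.roots; the helper below is my own illustrative sketch:

```python
import numpy as np

def ar_roots_outside_unit_circle(phi):
    """Roots of 1 - phi_1 z - ... - phi_p z^p; the AR polynomial is invertible
    (the process stationary) iff all roots lie outside the unit circle."""
    coefs = np.r_[-np.asarray(phi)[::-1], 1.0]   # highest power first: -phi_p, ..., -phi_1, 1
    roots = np.roots(coefs)
    return roots, np.all(np.abs(roots) > 1)

print(ar_roots_outside_unit_circle([1.2, -0.32]))   # (1.4.3): roots 1.25 and 2.5 -> invertible
print(ar_roots_outside_unit_circle([1.2]))          # (1.4.6): root 1/1.2 < 1 -> noninvertible
```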

Invertibility of a lag polynomial is very important for


several reasons.

For moving average models, or more generally,


models with moving average component,
invertibility of the MA is important for estimation
and prediction.

F. Guta (CoBE) Econ 654 September, 2017 120 / 296


For models with an autoregressive part, the AR
polynomial is invertible if and only if the process is
stationary.

1.5 Problem of Common Roots of Lag Polynomials

Decomposing the moving average and


autoregressive polynomials into products of linear
functions in L also shows the problem of common
roots or cancelling roots.
This means that the AR and MA parts of the model
F. Guta (CoBE) Econ 654 September, 2017 121 / 296
have roots that are identical and the corresponding
linear functions in L cancel out.

To illustrate this, consider the model described by

    (1 − φ_1 L − φ_2 L²) y_t = (1 + θL) e_t        (1.5.1)

We can write this as

    (1 − λ_1 L)(1 − λ_2 L) y_t = (1 + θL) e_t        (1.5.2)

Now, if θ = −λ_1, we can divide both sides by (1 + θL) = (1 − λ_1 L) to obtain

    (1 − λ_2 L) y_t = e_t        (1.5.3)

which describes exactly the same process as (1.5.2).
Thus, in the case of one cancelling root, an ARMA(p, q) model can be written equivalently as an ARMA(p − 1, q − 1) model.
As an example, consider the following ARMA(2, 1) model

    y_t = y_{t−1} − 0.25 y_{t−2} + e_t − 0.5 e_{t−1}        (1.5.4)

which can be rewritten as

    (1 − 0.5L)² y_t = (1 − 0.5L) e_t        (1.5.5)

This reduces to an AR(1) model given by

    (1 − 0.5L) y_t = e_t        (1.5.6)

which describes the same process as (1.5.4).
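To see the cancelling root of (1.5.4) numerically, a small sketch (illustration only) computing the roots of its AR and MA polynomials:

```python
import numpy as np

# ARMA(2,1) from (1.5.4): (1 - L + 0.25 L^2) y_t = (1 - 0.5 L) e_t
ar_poly = [0.25, -1.0, 1.0]      # coefficients of 0.25 z^2 - z + 1 (highest power first)
ma_poly = [-0.5, 1.0]            # coefficients of -0.5 z + 1

print("AR roots:", np.roots(ar_poly))   # both equal 2.0, i.e. 1/0.5 (a double root)
print("MA roots:", np.roots(ma_poly))   # 2.0 as well: a common (cancelling) root
```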

The problem of common roots illustrates why it may


be problematic, in practice, to estimate an ARMA
model with an AR and an MA part of a high order.

F. Guta (CoBE) Econ 654 September, 2017 124 / 296


The reason is that identi…cation and estimation are
hard if roots of the MA and AR polynomials are
almost identical.
In this case, a simpli…ed ARMA(p 1, q 1)
model will yield an almost equivalent representation.

1.6 Autocovariance Generating Function

For each covariance-stationary process yt we


calculated the sequence of autocovariances

γj j= ∞
.
F. Guta (CoBE) Econ 654 September, 2017 125 / 296
If this sequence is absolutely summable, then one
way of summarizing the autocovariances is through
a scalar-valued function called the
autocovariance-generating function:

gy (z ) = ∑ γj z j (1.6.1)
j= ∞

The argument of this function is taken to be a


complex scalar.
Of particular interest as an argument for the
autocovariance-generating function is any value of z
F. Guta (CoBE) Econ 654 September, 2017 126 / 296
that lies on the complex unit circle,

z = e i ω = cos ω i sin ω
p
where i = 1 and ω is the radian angle that z
makes with the real axis.
If the autocovariance-generating function is

evaluated at z = e and divided by 2π, the
resulting function of ω,
1 1 ∞
2π j =∑∞ j
i ωj
Sy (ω ) = gy (z ) = γe

F. Guta (CoBE) Econ 654 September, 2017 127 / 296
is called the population spectrum of y.

For a process with absolutely summable


autocovariances, the function Sy (ω ) exits and can
be used to calculate all of the autocovariances.
This means that if two di¤erent processes share the
same autocovariance-generating function, then the
two processes exhibit the identical sequence of
autocovariances.
As an example of calculating an autocovariance-
F. Guta (CoBE) Econ 654 September, 2017 128 / 296
generating function, consider the MA(1) process.
From equations (1.2.40) to (1.2.42), its
autocovariance-generating function is
h i
gy (z ) = θσ2 z 1
+ 1 + θ 2 σ2 z 0 + θσ2 z 1 = σ2 θ z 1
+ 1 + θ2 + θz

This expression could alternatively be written as

gy (z ) = σ2 (1 + θz ) 1 + θz 1
(1.6.2)

The autocovariance generating function for MA(q )


might be calculated as
F. Guta (CoBE) Econ 654 September, 2017 129 / 296
g y (z ) = σ 2 1 + θ 1 z + θ 2 z 2 + + θq z q 1 + θ1 z 1 + θ2 z 2 + + θq z q (1.6.3 )

This conjecture can be veri…ed by carrying out the


multiplication in (1.6.3) and collecting terms by
power of z:

1 + θ1 z + θ2 z 2 + + θq z q 1 + θ1 z 1
+ θ2 z 2
+ + θq z q

= θ q z q + (θ q 1 + θq θ1 ) z q 1
+ (θ q 2 + θq 1 θ1 + θq θ2 ) z q 2
+

+ (θ 1 + θ 2 θ 1 + θ 3 θ 2 + + θq θq 1) z
1
+ 1 + θ 21 + θ 22 + + θ 2q z 0

1 q
+ (θ 1 + θ 2 θ 1 + θ 3 θ 2 + + θq θq 1) z + + θq z (1.6.4)

F. Guta (CoBE) Econ 654 September, 2017 130 / 296


Comparison of (1.6.4) with (1.2.47) or (1.2.49)
con…rms that the coe¢ cient on z k in (1.6.3) is
indeed the k th autocovariance.
This method for …nding gy (z ) extends to the
MA(∞) case. If

yt = µ + θ (L) et (1.6.5)

with

θ (L) = 1 + θ 1 L + θ 2 L2 + (1.6.6)
F. Guta (CoBE) Econ 654 September, 2017 131 / 296
and

∑j =0 jθ j j < ∞, (1.6.7)

then
gy (z ) = σ2 θ (z ) θ z 1
(1.6.8)

For example, the stationary AR (1) process can be


written as
1
yt µ = (1 φL) et

which is in the form of (1.6.5) with


1
θ (L) = (1 φL) .
F. Guta (CoBE) Econ 654 September, 2017 132 / 296
The autocovariance-generating function for AR (1)
process could therefore be calculated from
σ2
gy (z ) = (1.6.9)
(1 φz 1 ) φz ) (1
To verify this claim, expand out the terms in
(1.6.9):
σ2
= σ 2 1 + φz + φ2 z 2 + 1 + φz 1
+ φ2 z 2
+
(1 φz ) (1 φz 1 )

From which the coe¢ cient on z k is


2 k k +1 k +2 2 σ 2 φk
σ φ +φ φ+φ φ + =
1 φ2
F. Guta (CoBE) Econ 654 September, 2017 133 / 296
This indeed yields the k th autocovariance as earlier
calculated in equation (1.2.5).
The autocovariance-generating function for a
stationary ARMA(p, q ) process can be written as:
σ2 1 + θ 1 z + θ 2 z 2 + + θq z q 1 + θ1 z 1 + θ2 z 2 + + θq z q
g y (z ) = (1.6.10)
1 φ1 z φ2 z2 φp zp 1 φ1 z 1 φ2 z 2 φp z p

Filters

Sometimes the data are …ltered, or treated in a


particular way before they are analyzed, and we
would like to summarize the e¤ects of this
F. Guta (CoBE) Econ 654 September, 2017 134 / 296
treatment on the autocovariances.
This calculation is particularly simple using the
autocovariance-generating function.
For example, suppose that the original data yt were
generated from an MA(1) process,

yt = (1 + θL) et (1.6.11)

with autocovariance-generating function given by


(1.6.2).
Let’s say that the data as actually analyzed, xt ,
F. Guta (CoBE) Econ 654 September, 2017 135 / 296
represent the change in yt over its value the
previous period:

xt = yt yt 1 = (1 L) yt (1.6.12)

Substituting (1.6.11) into (1.6.12), the observed


data can be characterized as the following MA(2)
process,

xt = (1 L) (1 + θ L) et = 1 + ( θ 1) L θ L2 et

= 1 + θ 1 L + θ 2 L2 et (1.6.13)
F. Guta (CoBE) Econ 654 September, 2017 136 / 296
with θ 1 = (θ 1) and θ 2 = θ.

The autocovariance-generating function of the

observed xt series can be calculated by direct

application of (1.6.3):

gx (z ) = σ2 1 + θ 1 z + θ 2 z 2 1 + θ1z 1
+ θ2z 2
(1.6.14)

It is often instructive, however, to keep the


polynomial (1 + θ 1 z + θ 2 z 2 ) in its factored form of
the …rst line of (1.6.13),
F. Guta (CoBE) Econ 654 September, 2017 137 / 296
(1 + θ 1 z + θ 2 z 2 ) = (1 z ) (1 + θz ) ,
in which case (1.6.14) could be written as

gx (z ) = σ2 (1 z ) (1 + θz ) 1 z 1
1 + θz 1

1
= (1 z) 1 z gy (z ) (1.6.15)

Of course, (1.6.14) and (1.6.15) represent the


identical function of z, and which way we choose to
write it is simply a matter of convenience.
Applying the …lter (1 L) to yt thus results in
multiplying its autocovariance-generating function
F. Guta (CoBE) Econ 654 September, 2017 138 / 296
1
by (1 z ) (1 z ).
This principle readily generalizes.
Suppose that the original data series fyt g satis…es
(1.6.5) through (1.6.7). Let’s say the data are
…ltered according to

xt = h (L) yt (1.6.16)

with h (L) = ∑j∞= ∞ hj Lj and ∑j∞= ∞ jhj j < ∞.


Substituting (1.6.5) into (1.6.16) , the observed
data xt are then generated by
F. Guta (CoBE) Econ 654 September, 2017 139 / 296
xt = h (1) µ + h (L) θ (L) et = µ + θ (L) et .

where µ = h (1) µ and θ (L) = h (L) θ (L).

The sequence of coe¢ cients associated with the



compound operator θ j j= ∞
turns out to be
absolutely summable and the
autocovariance-generating function of xt can
accordingly be calculated as

gx (z ) = σ 2 θ (z ) θ z 1
= σ 2 h (z ) θ (z ) h z 1
θ z 1

1
= h (z ) h z gy (z ) (1.6.17)
F. Guta (CoBE) Econ 654 September, 2017 140 / 296
Applying the …lter h(L) to a series thus results in
multiplying its autocovariance-generating function
1
by h (z ) h (z ).

1.7 Trend Stationarity

    y_t = µ_0 + µ_1 t + S_t        (1.7.1)
    S_t = ρ_1 S_{t−1} + ρ_2 S_{t−2} + · · · + ρ_k S_{t−k} + e_t        (1.7.2)

Or,

    y_t = µ_0 + µ_1 t + ρ_1 S_{t−1} + ρ_2 S_{t−2} + · · · + ρ_k S_{t−k} + e_t        (1.7.3)

There are two essentially equivalent ways to estimate the autoregressive parameters (ρ_1, ρ_2, . . . , ρ_k).
You can estimate (1.7.3) by OLS.
You can estimate (1.7.1)-(1.7.2) sequentially by OLS.
That is, first estimate (1.7.1), get the residual Ŝ_t, and then perform regression (1.7.2) replacing S_t with Ŝ_t.
This procedure is sometimes called Detrending.
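A minimal sketch of this two-step detrending procedure in numpy, applied to simulated data with illustrative parameter values of my own: regress y_t on (1, t), then regress the residual on its lag.

```python
import numpy as np

# simulate a trend-stationary series (1.7.1)-(1.7.2) with an AR(1) deviation S_t
rng = np.random.default_rng(0)
T, mu0, mu1, rho = 300, 2.0, 0.05, 0.7
e = rng.normal(size=T)
S = np.zeros(T)
for t in range(1, T):
    S[t] = rho * S[t - 1] + e[t]
t_idx = np.arange(T)
y = mu0 + mu1 * t_idx + S

# Step 1: OLS of y on (1, t) to estimate the trend, giving residuals S_hat
X = np.column_stack([np.ones(T), t_idx])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
S_hat = y - X @ beta

# Step 2: OLS of S_hat on its own lag to estimate rho
rho_hat = np.linalg.lstsq(S_hat[:-1, None], S_hat[1:], rcond=None)[0][0]
print(beta, rho_hat)
```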

Seasonal Effects

There are three popular methods to deal with seasonal data.

1). Include dummy variables for each season. This presumes that "seasonality" does not change over the sample.
2). Use "seasonally adjusted" data. The seasonal factor is typically estimated by a two-sided weighted average of the data for that season in neighboring years.
    Thus the seasonally adjusted data is a "filtered" series.
    This is a flexible approach which can extract a wide range of seasonal factors.
    The seasonal adjustment, however, also alters the time-series correlations of the data.
3). First apply a seasonal differencing operator.
    If s is the number of seasons (typically s = 4 or s = 12),

        ∆_s y_t = y_t − y_{t−s},

    the season-to-season change.

The series ∆_s y_t is clearly free of seasonality.
But the long-run trend is also eliminated, and perhaps this was of relevance.

1.8 Testing for Omitted Serial Correlation

For simplicity, let the null hypothesis be an AR(1):

    y_t = α + ρ y_{t−1} + u_t        (1.8.1)

We are interested in the question of whether the error u_t is serially correlated.
We model this as an AR(1):

    u_t = θ u_{t−1} + e_t        (1.8.2)

with e_t an MDS.
The hypothesis of no omitted serial correlation is

    H_0: θ = 0 vs H_1: θ ≠ 0

We want to test H_0 against H_1.
To combine (1.8.1) and (1.8.2), we take (1.8.1) and lag the equation once:

    y_{t−1} = α + ρ y_{t−2} + u_{t−1}

We then multiply this by θ and subtract from (1.8.1), to find

    y_t − θ y_{t−1} = α − αθ + ρ y_{t−1} − θρ y_{t−2} + u_t − θ u_{t−1},

or

    y_t = α(1 − θ) + (ρ + θ) y_{t−1} − θρ y_{t−2} + e_t = AR(2)

Thus under H_0, y_t is an AR(1), and under H_1 it is an AR(2).
H_0 may be expressed as the restriction that the coefficient on y_{t−2} is zero.
An appropriate test of H_0 against H_1 is therefore a Wald test that the coefficient on y_{t−2} is zero (a simple exclusion test).
In general, if the null hypothesis is that y_t is an AR(k), and the alternative is that the error is an AR(m), this is the same as saying that under the alternative y_t is an AR(k + m), and this is equivalent to the restriction that the coefficients on y_{t−k−1}, y_{t−k−2}, . . . , y_{t−k−m} are jointly zero.
An appropriate test is the Wald test of this restriction.
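A hedged sketch of this exclusion test on simulated data (the parameter values are my own): estimate the AR(2) by OLS and inspect the t statistic on y_{t−2}, whose square is the Wald statistic for this single restriction.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients with conventional (homoskedastic) standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

# illustrative data: an AR(1) under the null
rng = np.random.default_rng(0)
T, rho = 500, 0.6
y = np.zeros(T)
for t in range(1, T):
    y[t] = 1.0 + rho * y[t - 1] + rng.normal()

# fit the alternative AR(2) and test the exclusion of y_{t-2}
Y = y[2:]
X = np.column_stack([np.ones(T - 2), y[1:-1], y[:-2]])   # const, y_{t-1}, y_{t-2}
beta, se = ols(X, Y)
t_stat = beta[2] / se[2]        # t statistic for H0: coefficient on y_{t-2} is zero
print(beta, t_stat)
```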

1.9 Stationarity and Unit Roots

Stationarity of a stochastic process requires that the


variances and autocovariances are …nite and
independent of time.
F. Guta (CoBE) Econ 654 September, 2017 149 / 296
It is easily veri…ed that …nite-order MA processes
are stationary by construction as they correspond to
a weighted sum of white noise processes.
However, this result breaks down if we allow the MA
coe¢ cients to vary over time, as in

yt = et + g (t ) et 1 (1.9.1)

where g (t ) is some deterministic function of t.


Now, we have
2
E yt2 = σ2 + g (t ) σ2
F. Guta (CoBE) Econ 654 September, 2017 150 / 296
which is not independent of t. Consequently, the
process in (1.9.1) is non-stationary.
Stationarity of AR or ARMA processes is less trivial.
Consider, for example, the following AR(1) process

    y_t = φ y_{t−1} + e_t,        (1.9.2)

with φ = 1.
Taking variances on both sides gives var(y_t) = var(y_{t−1}) + σ², which has no solution for the variance of the process consistent with stationarity, unless σ² = 0, in which case an infinity of solutions exists.
The process in (1.9.2) is a first-order autoregressive process with a unit root (φ = 1), usually referred to as a random walk without drift.
The unconditional variance of y_t does not exist, i.e., it is infinite, and the process is non-stationary.
In fact, for any value of φ with |φ| ≥ 1, (1.9.2) describes a non-stationary process.
F. Guta (CoBE) Econ 654 September, 2017 152 / 296
We can formalize the above result as follows.
The AR(1) process is stationary if and only if the polynomial 1 − φL is invertible, i.e., if the root of the reverse characteristic equation 1 − φz = 0 lies outside the unit circle.
This result is straightforwardly generalized to
arbitrary ARMA models.
The ARMA(p, q ) model

ψ (L) yt = θ (L) et (1.9.3)


F. Guta (CoBE) Econ 654 September, 2017 153 / 296
corresponds to a stationary process if and only if the
solutions z1 , . . . , zp to ψ (z ) = 0 lie outside the unit
circle, that is, when the AR polynomial is invertible.
For example, the ARMA(2, 1) process given by

yt = 1.2yt−1 − 0.2yt−2 + et − 0.5et−1 (1.9.4)

is non-stationary because z = 1 is a solution to


1 1.2z + 0.2z 2 = 0.
A special case that is of particular interest arises
when one root is exactly equal to one, while the
F. Guta (CoBE) Econ 654 September, 2017 154 / 296
other roots are larger than one.
If this arises, we can write the process for yt as

ψ*(L) (1 − L) yt = ψ*(L) ∆yt = θ(L) et (1.9.5)
where ψ*(L) is an invertible polynomial in L of order p − 1, and ∆ = 1 − L is the first-difference operator.
Because the roots of the AR polynomial are the solutions to ψ*(z)(1 − z) = 0, there is one solution z = 1, or in other words a single unit root.
F. Guta (CoBE) Econ 654 September, 2017 155 / 296
Equation (1.9.5) thus shows that ∆yt can be
described as a stationary ARMA model if the
process for yt has one unit root.
Consequently, we can eliminate the non-stationarity
by transforming the series into …rst-di¤erences.
Writing the process in (1.9.4) as

(1 0.2L) (1 L) yt = (1 0.5L) et

shows that ∆yt is described as a stationary


ARMA(1, 1) process given by
F. Guta (CoBE) Econ 654 September, 2017 156 / 296
∆yt = 0.2∆yt 1 + et 0.5et 1

A series that becomes stationary after


…rst-di¤erencing is said to be integrated of order
one, denoted by I (1).
If ∆yt is described by stationary ARMA(p, q ) model,
we say that yt is described by an autoregressive
integrated moving average (ARIMA) model of order
p, 1, q, or in short an ARIMA(p, 1, q ) model.
First di¤erencing quite often transforms a
F. Guta (CoBE) Econ 654 September, 2017 157 / 296
non-stationary series into a stationary series.

In particular this may be the case for aggregate


economic series or their natural logarithms.
In some cases, taking …rst-di¤erences is insu¢ cient
to produce stationary series and another di¤erencing
step is required.
In this case the stationary series is given by:

∆ (∆yt ) = ∆2 yt = ∆yt ∆yt 1

If the series must be di¤erenced twice before it


F. Guta (CoBE) Econ 654 September, 2017 158 / 296
becomes stationary, then it is said to be integrated
of order two, denoted by I (2), and it must have two
unit roots.
Thus, a series yt is I (2) if ∆yt is non-stationary but
∆2 yt is stationary.
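A short numpy sketch with artificial series built only for the example: one round of differencing turns an I(1) series into a stationary one, while an I(2) series needs two rounds.

import numpy as np

e = np.random.normal(size=500)
y1 = np.cumsum(e)              # artificial I(1) series: a random walk
y2 = np.cumsum(np.cumsum(e))   # artificial I(2) series

d1 = np.diff(y2)               # Delta y_t: still non-stationary (itself a random walk)
d2 = np.diff(y2, n=2)          # Delta^2 y_t: stationary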
In general, the main di¤erence between I (0) and
I (1) processes can be summarized as follows.
An I (0) series ‡uctuates around its mean with a
…nite variance that does not depend on time, while
an I (1) series wanders widely.
F. Guta (CoBE) Econ 654 September, 2017 159 / 296
Typically, an I (0) series is mean reverting, as there
is a tendency in the long run to return to its mean.
Furthermore, an I (0) series has a limited memory of
its past behavior (implying that the e¤ects of a
particular random innovation are only transitory).
However, an I (1) series has in…nitely long memory
(implying that an innovation will permanently a¤ect
the process).
These aspects becomes clear from the
F. Guta (CoBE) Econ 654 September, 2017 160 / 296
autocorrelation functions: for an I (0) series the
autocorrelation function declines rapidly as the lag
length increases, while for the I (1) process the
estimated autocorrelation coe¢ cients decay to zero
very slowly.

The last property makes the presence of a unit root


an interesting question from an economic point of
view.
In models with unit roots, shocks (which may be
F. Guta (CoBE) Econ 654 September, 2017 161 / 296
due to policy interventions) have persistent e¤ects
that last forever, while, in the case of stationary
models, shocks can only have a temporary e¤ect.

The long-run e¤ect of a shock is not necessarily of


the same magnitude as the short-run e¤ect.
The fact that the autocorrelations of a stationary
series die out rapidly may help in determining the
number of times di¤erencing is needed to achieve
stationarity.
F. Guta (CoBE) Econ 654 September, 2017 162 / 296
In addition, a number of formal unit root tests have been proposed in the literature.

1.9.1 Testing for Unit Roots in a AR(1) Model

Consider the AR (1) process

yt = µ + ρyt 1 + et (1.9.6)

where ρ = 1 corresponds to a unit root.

It seems obvious to use the estimate ρ̂ for ρ from an OLS procedure (which is consistent, irrespective of the true value of ρ) and the corresponding standard
error to test the null hypothesis of a unit root.

However, as shown in the seminal paper of Dickey


and Fuller (1979), under the null hypothesis that
ρ = 1 the standard t-ratio does not have a
t-distribution, not even asymptotically.
The reason for this is the non-stationarity of the
process invalidating standard results on the
distribution of the OLS estimator b
ρ.
F. Guta (CoBE) Econ 654 September, 2017 164 / 296
To test for the presence of a unit root in AR (1) the
null and the alternative hypothesis are given by

H0 : ρ = 1
H1 : ρ < 1

It is possible to use the standard t-statistic to test the above hypothesis, where the t-statistic is given by
τ̂µ = (ρ̂ − 1) / s.e.(ρ̂) (1.9.7)
where s.e.(ρ̂) denotes the usual OLS standard
error. Critical Values, however, have to be taken
from the appropriate distribution, which under the
null hypothesis of non-stationarity is nonstandard.

In particular, the distribution is skewed to the left


(with a long left-hand tail) so that critical values are
smaller than those for (the normal approximation
of) the t-distribution.
Using a 5% signi…cance level in a one-tailed test of
H0 : ρ = 1 (a unit root) against H1 : ρ < 1
F. Guta (CoBE) Econ 654 September, 2017 166 / 296
(stationary), the correct critical value in large samples is −2.86 rather than −1.65 for the normal approximation.
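A hedged numpy sketch of τ̂µ in (1.9.7), computed from the equivalent regression of ∆yt on a constant and yt−1 (packaged implementations such as statsmodels' adfuller exist; the hand-rolled version only shows what is being calculated). The resulting t-ratio is compared with −2.86, not −1.65.

import numpy as np

def df_tau_mu(y):
    # OLS of Delta y_t on [1, y_{t-1}]; return the t-ratio on y_{t-1}
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    b = np.linalg.lstsq(X, dy, rcond=None)[0]
    e = dy - X @ b
    s2 = e @ e / (len(dy) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return b[1] / se            # reject a unit root at 5% if this is below -2.86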

Usually, a slightly more convenient regression


procedure is used, in which case, the model is
written as:

∆yt = µ + (ρ 1) yt 1 + et (1.9.8)

from which the τ̂µ-statistic for H0 : ρ − 1 = 0 is identical to the τ̂µ above.
F. Guta (CoBE) Econ 654 September, 2017 167 / 296
The reason for this is that the least squares method
is invariant to linear transformations of the model.
Under the null hypothesis of a unit root the above
model turns out to be

∆yt = µ + et (1.9.9)

which is known as a random walk with drift, where


µ is the drift parameter.
In the model for the level variable yt , µ corresponds
to a linear time trend as (1.9.9) implies that
F. Guta (CoBE) Econ 654 September, 2017 168 / 296
E (∆yt ) = µ.

Hence for a given initial value y0 , E (yt ) = y0 + µt.


This shows that the interpretation of the intercept
term in (1.9.9) depends on the presence of a unit
root.
In the stationary case, µ re‡ects the nonzero mean
of the series, while in the unit root case it re‡ects a
deterministic trend in yt .
Because in the latter case …rst-di¤erencing produces
F. Guta (CoBE) Econ 654 September, 2017 169 / 296
a stationary time series, the process for yt is referred
to as di¤erence stationary.
It is also possible that non-stationarity is caused by
the presence of a deterministic time trend in the
process, rather than by the presence of a unit root.
This happens when the AR (1) model is extended to

yt = µ + γt + ρyt 1 + et (1.9.10)

with jρj < 1 and γ 6= 0.


In this case we have a non-stationary process
F. Guta (CoBE) Econ 654 September, 2017 170 / 296
because of the linear trend γt.

This non-stationarity can be removed by regressing


yt upon a constant and t, and then considering the
residuals of this regression, or by simply including t
as an additional variable in the model.
The process for yt in this case is referred to as being
trend stationary.
Non-stationary process may thus be characterized by
the presence of a deterministic trend or a stochastic
F. Guta (CoBE) Econ 654 September, 2017 171 / 296
trend implied by the presence of a unit root or both.
It is possible to test whether yt follows a random
walk against the alternative that it follows the trend
stationary process as in (1.9.10).
This can be tested by running the regression

∆yt = µ + γt + (ρ 1) yt 1 + et (1.9.11)

The null hypothesis one would like to test is that


the process is a random walk given by
H0 : µ = γ = ρ 1 = 0.
F. Guta (CoBE) Econ 654 September, 2017 172 / 296
Instead of testing this joint hypothesis, it is quite
common to use the t-ratio on ρ̂ − 1, denoted by τ̂τ ,
assuming that the other restrictions in the null
hypothesis are satis…ed.
Although the null hypothesis is still the same as in
the previous unit root test, the testing regression is
di¤erent and thus we have di¤erent distribution of
the test statistic.
It should be noted that if the unit root hypothesis

F. Guta (CoBE) Econ 654 September, 2017 173 / 296


ρ 1 = 0 is rejected, we cannot conclude that the
process for yt is likely to be stationary.

Under the alternative hypothesis γ may be nonzero


so that the process for yt is not stationary (but only
trend stationary).
If a graphical inspection of the series indicates a
clear positive or negative trend, it is most
appropriate to perform the Dickey-Fuller test with a
trend.
F. Guta (CoBE) Econ 654 September, 2017 174 / 296
This implies that the alternative hypothesis allows
the process to exhibit a linear deterministic trend.
If we are unable to reject the presence of a unit
root, it does not necessarily mean that it is true.
It could just be that there is insu¢ cient information
in the data to reject it.
Kwiatkowski, Phillips, Schmidt and Shin (1992)
propose an alternative test where stationarity is the
null hypothesis and the existence of a unit root is
the alternative.
F. Guta (CoBE) Econ 654 September, 2017 175 / 296
This test is usually referred to as the KPSS test.
The basic idea is that a time series is decomposed
into the sum of a deterministic time trend, a
random walk and a stationary error term (typically
not a white noise).
The null hypothesis (of trend stationarity) speci…es
that the variance of the random walk component is
zero.
The test is actually a Lagrange multiplier test, and
F. Guta (CoBE) Econ 654 September, 2017 176 / 296
computation of the test statistic is fairly simple.

First run an auxiliary regression of yt upon an


intercept and a time trend t.
Next, save the OLS residuals et and compute the partial sums St = ∑_{s=1}^{t} es for all t.
Then the test statistic is given by
KPSS = T⁻² ∑_{t=1}^{T} St² / σ̂² (1.9.12)
where σ̂² is an estimator of the error variance.
F. Guta (CoBE) Econ 654 September, 2017 177 / 296
This latter estimator σ̂² may involve corrections for
autocorrelations based on the Newey-West formula.
The asymptotic distribution is nonstandard, and
Kwiatkowski, Phillips, Schmidt and Shin (1992)
report a 5% critical value of 0.146.
If the null hypothesis is stationary rather than trend
stationary, the trend term should be dropped from
the auxiliary regression.
The test statistic is computed in the same fashion,
F. Guta (CoBE) Econ 654 September, 2017 178 / 296
but the 5% critical value is 0.463.
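A compact numpy sketch of the KPSS computation just described; for simplicity σ̂² is taken here as the plain residual variance, without the Newey-West correction mentioned above.

import numpy as np

def kpss_stat(y, trend=True):
    # auxiliary regression on an intercept (and a time trend), partial sums of the
    # residuals, and the statistic (1.9.12)
    T = len(y)
    X = np.column_stack([np.ones(T), np.arange(T)]) if trend else np.ones((T, 1))
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    S = np.cumsum(e)
    sigma2 = e @ e / T
    return np.sum(S ** 2) / (T ** 2 * sigma2)
# compare with 0.146 (trend-stationarity null) or 0.463 (level-stationarity null) at 5%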

1.9.2 Testing for Unit Roots in Higher-order AR Models

A test for a unit root in higher-order AR processes


can easily be obtained by extending the
Dickey-Fuller test procedures.
The general strategy is that lagged differences, such as ∆yt−1, ∆yt−2, . . . , are included in the regression, such that its error term corresponds to white noise. This leads to the so-called augmented
Dickey-Fuller tests (ADF tests), for which the same
asymptotic critical values hold as those of the
Dickey-Fuller tests.
Consider the AR (k ) model

ρ(L) yt = µ + et (1.9.13)
where ρ(L) = 1 − ρ1 L − ρ2 L² − · · · − ρk L^k

As we discussed before, yt has a unit root when


ρ (1) = 0, or ρ1 + ρ2 + + ρk = 1.
F. Guta (CoBE) Econ 654 September, 2017 180 / 296
In this case, yt is non-stationary.
The ergodic theorem and MDS CLT do not apply,
and test statistics are asymptotically non-normal.
A helpful way to write the above equation is using
the so-called Dickey-Fuller (DF) reparametrization:

∆yt = µ + α0 yt−1 + α1 ∆yt−1 + · · · + αk−1 ∆yt−(k−1) + et (1.9.14)

These models are equivalent linear transformations


of one another.
The DF parameterization is convenient because the
F. Guta (CoBE) Econ 654 September, 2017 181 / 296
parameter α0 summarizes the information about the unit root, since ρ(1) = −α0 .

To see this, observe that the lag polynomial for yt computed from (1.9.14) is
(1 − L) − α0 L − α1 (L − L²) − · · · − αk−1 (L^{k−1} − L^k)
But this must equal ρ(L), as the models are equivalent.
Thus
ρ(1) = (1 − 1) − α0 − α1 (1 − 1) − · · · − αk−1 (1 − 1) = −α0

Hence, the hypothesis of a unit root in yt can be


stated as
H0 : α0 = 0

Note that the model is stationary if α0 < 0.


So the natural alternative is

H1 : α0 < 0

Under H0 , the model for yt is
∆yt = µ + α1 ∆yt−1 + · · · + αk−1 ∆yt−(k−1) + et
which is an AR(k − 1) in the first-difference ∆yt .

Thus if yt has a (single) unit root, then ∆yt is a


stationary AR process.
Because of this property, we say that if yt is
non-stationary but ∆d yt is stationary, then yt is
“integrated of order d”, or I (d ).
Thus a time series with unit root is I (1).
Since α0 is the parameter of a linear regression, the
F. Guta (CoBE) Econ 654 September, 2017 184 / 296
natural test statistic is the t-statistic for H0 from
OLS estimation of (1.9.14).

Indeed, this is the most popular unit root test, and


is called the Augmented Dickey-Fuller (ADF) test
for a unit root.
It would seem natural to assess the signi…cance of
the ADF statistic using the normal table.
However, under H0 , yt is non-stationary, so
conventional normal asymptotics are invalid.
F. Guta (CoBE) Econ 654 September, 2017 185 / 296
An alternative asymptotic framework has been
developed to deal with non-stationary data.
We do not have the time to develop this theory in
detail, but simply assert the main results.
Theorem 1.9.1 (Dickey-Fuller Theorem): Assume α0 = 0. As T → ∞,
T α̂0 →d (1 − α1 − α2 − · · · − αk−1 ) ρµ
ADF = α̂0 / se(α̂0 ) →d τµ
The limiting distributions ρµ and τ µ are non-normal.
They are skewed to the left, and have negative
means.
The first result states that α̂0 converges to its true value (of zero) at rate T , rather than the conventional rate of T^{1/2} .
This is called a “super-consistent” rate of
convergence.

F. Guta (CoBE) Econ 654 September, 2017 187 / 296


The second result states that the t-statistic for b
α0
converges to a limiting distribution which is
non-normal, but does not depend on the parameters.
This distribution has been extensively tabulated,
and may be used for testing the hypothesis H0 .

Note: The standard error se (b


α0 ) is the conventional
(“homoscedastic”) standard error.

But the theorem does not require an assumption of


homoscedasticity.
F. Guta (CoBE) Econ 654 September, 2017 188 / 296
Thus the Dickey-Fuller test is robust to
heteroscedasticity.
Since the alternative hypothesis is one-sided, the
ADF test rejects H0 in favour of H1 when ADF < c,
where c is the critical value from the ADF table.
If the test rejects H0 , this means that the evidence
points to yt being stationary.
If the test does not reject H0 , a common conclusion
is that the data suggests that yt is non-stationary.
F. Guta (CoBE) Econ 654 September, 2017 189 / 296
However, this is not really a correct conclusion.
All we can say is that there is insu¢ cient evidence
to conclude whether the data are stationary or not.
We have described the test for the setting with an
intercept.
Another popular setting includes as well a linear
time trend. This model is

∆yt = µ1 + µ2 t + α0 yt 1 + α1 ∆yt 1 + + αk 1 ∆ y t (k 1 ) + et (1.9.15)

This is natural when the alternative hypothesis is


F. Guta (CoBE) Econ 654 September, 2017 190 / 296
that the series is stationary about a linear time
trend.

If the series has a linear trend (e.g. GDP, Stock


Prices), then the series itself is non- stationary, but
it may be stationary around the linear time trend.
In this context, it is a waste of time to …t an AR
model to the level of the series without a time
trend, as the AR model cannot conceivably describe
this data.
F. Guta (CoBE) Econ 654 September, 2017 191 / 296
The natural solution is to include a time trend in
the …tted OLS equation.
When conducting the ADF test, this means that it
is computed as the t-ratio for α0 from OLS
estimation of (1.9.15). If a time trend is included,
the test procedure is the same, but di¤erent critical
values are required.
The ADF test has a di¤erent distribution when the
time trend has been included, and a di¤erent table
should be consulted.
F. Guta (CoBE) Econ 654 September, 2017 192 / 296
If too many lags are included, this will somewhat
reduce the power of the tests, but, if too few lags
are included, the asymptotic distributions from the
table are simply invalid (because of autocorrelation
in the residuals), and the tests may lead to seriously
biased conclusions.
It is possible to use the model selection criterion or
statistical signi…cance of the additional variables to
select the lag length in the ADF tests.
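The sketch below (numpy, for illustration only) runs the ADF regression (1.9.14) for a given number of lagged differences and returns the t-ratio on yt−1 together with an AIC value, so that the lag length can be chosen as just described.

import numpy as np

def adf_stat(y, k):
    # Delta y_t on [1, y_{t-1}, Delta y_{t-1}, ..., Delta y_{t-k+1}] as in (1.9.14)
    dy = np.diff(y)
    rows, target = [], []
    for t in range(k - 1, len(dy)):
        rows.append([1.0, y[t]] + [dy[t - j] for j in range(1, k)])
        target.append(dy[t])
    X, z = np.array(rows), np.array(target)
    b = np.linalg.lstsq(X, z, rcond=None)[0]
    e = z - X @ b
    s2 = e @ e / (len(z) - X.shape[1])
    tstat = b[1] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    aic = np.log(e @ e / len(z)) + 2 * X.shape[1] / len(z)
    return tstat, aic   # compare tstat with the ADF table; choose k with the smallest aic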

F. Guta (CoBE) Econ 654 September, 2017 193 / 296


A regression of the form (1.9.14) can also be used
to test for a unit root in a general (invertible)
ARMA model.
Said and Dickey (1984) argue that when,
theoretically, one lets the number of lags in the
regression grow with the sample size, the same
asymptotic distributions hold and the ADF tests are
also valid for an ARMA model.
The argument essentially is that an ARMA model

F. Guta (CoBE) Econ 654 September, 2017 194 / 296


(with invertible MA polynomial) can be written as
an in…nite order autoregressive process.

This explains why, when testing for unit roots,


people usually do not worry about MA components.
Phillips and Perron (1988) have suggested an
alternative to the augmented Dickey-Fuller tests.
Instead of adding additional lags in the regressions
to obtain an error term that has no autocorrelation,
they stick to the original Dickey-Fuller regressions
F. Guta (CoBE) Econ 654 September, 2017 195 / 296
but make nonparametric adjustments to the
Dickey-Fuller test statistics to take into account of
potential autocorrelation pattern in the errors.

Phillips-Perron tests are nonparametric in nature


and applicable to a wide class of weakly dependent
and heterogeneously distributed innovations.
If the ADF test does not reject the null hypothesis
of one unit root, the presence of a second unit root
may be tested by estimating the regression of ∆2 yt
F. Guta (CoBE) Econ 654 September, 2017 196 / 296
on ∆yt−1, ∆²yt−1, . . . , ∆²yt−p+1 , and comparing the t-ratio of the coefficient on ∆yt−1 with the appropriate critical value.
Alternatively, the presence of two unit roots may be tested jointly by estimating the regression of ∆²yt on yt−1, ∆yt−1, ∆²yt−1, . . . , ∆²yt−p+1 , and computing the usual F-statistic for testing the joint significance of yt−1 and ∆yt−1 .
This test statistic, under the null hypothesis of a
F. Guta (CoBE) Econ 654 September, 2017 197 / 296
double unit root, has a distribution that is not the
usual F -distribution.

Critical values of this distribution are tabulated by


Hasza and Fuller (1979).
A stochastic process may be non-stationary for
other reasons than the presence of unit roots.
A linear deterministic trend is one example and
structural breaks in the series may also mimic the
presence of unit root non-stationarity.
F. Guta (CoBE) Econ 654 September, 2017 198 / 296
Without going into details, it may be mentioned
that the recent literature on unit roots also includes
discussions on stochastic unit roots, seasonal unit
roots, fractional integration and panel data unit
roots.
A stochastic unit root implies that a process is
characterized by a root that is not constant but
stochastic and vary around unity.
Such a process can be stationary for some periods

F. Guta (CoBE) Econ 654 September, 2017 199 / 296


and explosive for others (see Granger and Swanson,
1977).

A seasonal unit root arises if a series becomes


stationary after seasonal di¤erencing.
Fractional integration starts from the idea that a
series may be integrated of order d, where d is not
an integer.
If d ≥ 1/2, the process is non-stationary and said to be fractionally integrated.
F. Guta (CoBE) Econ 654 September, 2017 200 / 296
Finally, panel data unit root tests involve tests for
unit roots in multiple series, for example GDP in
ten di¤erent countries.

1.10 Estimation of ARMA Models

Suppose we know that the data series,y1 , y2 , . . . , yT


, is generated by an ARMA process of order p, q.
Depending upon the speci…cation of the model, and
the distributional assumptions we are willing to

F. Guta (CoBE) Econ 654 September, 2017 201 / 296


make, we can estimate the unknown parameters by
ordinary or nonlinear least squares or by maximum
likelihood.

1.10.1 Least Squares

The least squares approach chooses the model


parameters such that the residual sum of squares is
minimal.
This is particularly easy for models in autoregressive
form.
F. Guta (CoBE) Econ 654 September, 2017 202 / 296
Consider the AR (p) model

yt = α + φ1 yt 1 + + φp yt p + et (1.10.1)

where et is a white noise error term that is uncorrelated with anything dated t − 1 or before.
Consequently, we have
E(yt−j et ) = 0 for j = 1, 2, . . . , p

that is, error terms and explanatory variables are


contemporaneously uncorrelated and OLS applied
F. Guta (CoBE) Econ 654 September, 2017 203 / 296
to (1.10.1) provides consistent estimates.
Estimation of an autoregressive model is thus not
di¤erent from a linear regression model with lagged
dependent variables.
Estimation of the AR(p) process
Let Xt = (1, yt−1 , yt−2 , . . . , yt−p )′ and β = (α, φ1 , φ2 , . . . , φp )′.
Then the AR(p) model can be written as yt = Xt′ β + et .
The OLS estimator of the model is then given by:
β̂ = (X′X)⁻¹ X′y
To study the properties of β̂, it is helpful to define the process ut = Xt et .
Note that ut is an MDS, since
E(ut | Ft−1 ) = E(Xt et | Ft−1 ) = Xt E(et | Ft−1 ) = 0

By Theorem 1.1.1, it is also strictly stationary and


ergodic. Hence,
F. Guta (CoBE) Econ 654 September, 2017 205 / 296
(1/T) ∑_{t=1}^{T} Xt et = (1/T) ∑_{t=1}^{T} ut →p E(ut ) = 0 (1.10.2)
The vector Xt is strictly stationary and ergodic, and by Theorem 1.1.1, so is Xt Xt′ .
Therefore by the Ergodic Theorem,
(1/T) ∑_{t=1}^{T} Xt Xt′ →p E(Xt Xt′ ) = Q
Combined with (1.10.2) and the continuous mapping theorem, we see that
β̂ = β + [(1/T) ∑_{t=1}^{T} Xt Xt′ ]⁻¹ [(1/T) ∑_{t=1}^{T} Xt et ] →p β + Q⁻¹ · 0 = β
F. Guta (CoBE) Econ 654 September, 2017 206 / 296
We have shown the following:
Theorem 1.10.1: If the AR(p) process is strictly stationary and ergodic and E(yt²) < ∞, then β̂ →p β as T → ∞.

Asymptotic Distribution
Theorem 1.10.2 (MDS CLT): If ut is a strictly stationary and ergodic MDS and E(ut ut′ ) = Ω < ∞, then as T → ∞,
(1/√T) ∑_{t=1}^{T} ut →d N(0, Ω).
Since Xt et is an MDS, we can apply Theorem 1.10.2 to see that
(1/√T) ∑_{t=1}^{T} Xt et →d N(0, Ω), where Ω = E(Xt Xt′ et²)

Theorem 1.10.3: If the AR(p) process yt is strictly stationary and ergodic and E(yt⁴) < ∞, then as T → ∞,
√T (β̂ − β) →d N(0, Q⁻¹ Ω Q⁻¹)

F. Guta (CoBE) Econ 654 September, 2017 208 / 296


This is identical in form to the asymptotic
distribution of OLS estimator in cross-section
regression.
In particular, the asymptotic covariance matrix is
estimated just as in the cross-section case.
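A hedged numpy sketch of this OLS estimator together with the covariance matrix Q⁻¹ΩQ⁻¹ of Theorem 1.10.3, where Ω is estimated by the average of the outer products Xt Xt′ êt², just as in the cross-section case.

import numpy as np

def ar_ols(y, p):
    # OLS of y_t on [1, y_{t-1}, ..., y_{t-p}] with the sandwich covariance estimate
    X = np.array([[1.0] + [y[t - j] for j in range(1, p + 1)] for t in range(p, len(y))])
    z = np.asarray(y[p:], dtype=float)
    beta = np.linalg.lstsq(X, z, rcond=None)[0]
    e = z - X @ beta
    n = len(z)
    Q = X.T @ X / n
    Omega = (X * e[:, None]).T @ (X * e[:, None]) / n
    Qinv = np.linalg.inv(Q)
    V = Qinv @ Omega @ Qinv / n          # estimated asymptotic covariance of beta-hat
    return beta, np.sqrt(np.diag(V))     # coefficient estimates and standard errors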

Estimation of MA Process

For moving average models, estimation is somewhat


more complicated.
Suppose that we have an MA(1) model
F. Guta (CoBE) Econ 654 September, 2017 209 / 296
yt = α + et + θet−1
Because et−1 is not observed, we cannot apply regression here.
In theory, ordinary least squares would minimize
S(θ, α) = ∑_{t=2}^{T} (yt − α − θet−1 )²

A possible solution arises if we write et−1 in this expression as a function of observed yt 's.
This is only possible if the MA polynomial is invertible.
In this case we can use
et−1 = ∑_{j=0}^{∞} (−θ)^j (yt−1−j − α)
and write
S(θ, α) = ∑_{t=2}^{T} [yt − α − θ ∑_{j=0}^{∞} (−θ)^j (yt−1−j − α)]²
In practice, yt is not observed for t = 0, −1, . . . , so we have to cut off the infinite sum in the above expression to obtain an approximate sum of squares
S̃(θ, α) = ∑_{t=2}^{T} [yt − α − θ ∑_{j=0}^{t−2} (−θ)^j (yt−1−j − α)]² (1.10.3)

Because, asymptotically, if T goes to infinity, the difference between S(θ, α) and S̃(θ, α) disappears, minimizing (1.10.3) with respect to α and θ gives consistent estimators α̂ and θ̂.
Unfortunately, (1.10.3) is a higher-order polynomial in θ and thus has very many local minima.
Therefore, minimizing (1.10.3) analytically is complicated.
However, as we know that −1 < θ < 1, a grid search can be performed.
F. Guta (CoBE) Econ 654 September, 2017 212 / 296
The resulting nonlinear least squares estimators for
α and θ are consistent and asymptotically normal.
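A sketch of such a grid search in Python/numpy. The cut-off sum in (1.10.3) is computed here through the recursion et = yt − α − θet−1 with e0 = 0, which is algebraically equivalent; α is simply fixed at the sample mean to keep the example short, whereas in practice it would be re-optimized as well.

import numpy as np

def ma1_css(y, alpha, theta):
    # conditional sum of squares: e_t = y_t - alpha - theta * e_{t-1}, starting from e_0 = 0
    e_prev, s = 0.0, 0.0
    for yt in y:
        e = yt - alpha - theta * e_prev
        s += e * e
        e_prev = e
    return s

def ma1_grid_search(y, n_grid=199):
    # evaluate the sum of squares over a grid of theta in (-1, 1) and pick the minimum
    alpha = float(np.mean(y))
    grid = np.linspace(-0.99, 0.99, n_grid)
    ss = [ma1_css(y, alpha, th) for th in grid]
    return alpha, float(grid[int(np.argmin(ss))])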
1.10.2 Maximum Likelihood Estimation
Consider an ARMA model of the form
yt = α + φ1 yt−1 + · · · + φp yt−p + et + θ1 et−1 + · · · + θq et−q (1.10.4)
with et white noise:
E(et ) = 0 (1.10.5)
E(et eτ ) = σ² if t = τ, and 0 otherwise (1.10.6)
F. Guta (CoBE) Econ 654 September, 2017 213 / 296
This section explores how to estimate the values of
α, φ1 , . . . , φp , θ 1 , . . . , θ q , σ2 on the basis of
observations on y.
The primary principle on which estimation will be
based is maximum likelihood.
Let θ = (α, φ1 , . . . , φp , θ1 , . . . , θq , σ²)′ denote the
vector of population parameters.
Suppose we have observed a sample of size T
(y1 , . . . , yT ).
F. Guta (CoBE) Econ 654 September, 2017 214 / 296
The approach will be to calculate the probability
density

fyT ,...,y1 (y1 , . . . , yT ; θ) (1.10.7)

which might be viewed as the probability of having


observed this particular sample.
The maximum likelihood estimate (MLE ) of θ is
the value for which this sample is most likely to
have been observed; that is, it is the value of θ that
maximizes (1.10.7).
F. Guta (CoBE) Econ 654 September, 2017 215 / 296
This approach requires specifying a particular
distribution for the white noise process et .
Typically we will assume that et is Gaussian white
noise:
et i.i.d.N 0, σ2 (1.10.8)

This assumption is strong, however, the estimates


of θ that results from it will often turn out to be a
sensible for non-Gaussian processes as well.
Finding the MLE conceptually involves two steps.
F. Guta (CoBE) Econ 654 September, 2017 216 / 296
First, the likelihood function (1.10.7) must be
calculated.
Second, values of θ must be found that maximize
this function.
The Likelihood Function for a Gaussian AR(1)
Process
A Gaussian AR (1) process takes the form

Yt = α + φYt 1 + et (1.10.9)

with et i.i.d.N (0, σ2 ).


F. Guta (CoBE) Econ 654 September, 2017 217 / 296
For this case, the vector of parameters to be estimated consists of θ = (α, φ, σ²)′.
Consider the probability distribution of Y1 , the first observation in the sample.
From equations (1.2.3) and (1.2.4) this is a random variable with mean E(Y1 ) = µ = α/(1 − φ) and variance E(Y1 − µ)² = σ²/(1 − φ²).
Since {et }_{t=−∞}^{∞} is Gaussian, Y1 is also Gaussian.

F. Guta (CoBE) Econ 654 September, 2017 218 / 296


Hence, the density of the first observation takes the form
fY1 (y1 ; θ) = fY1 (y1 ; α, φ, σ²) (1.10.10)
= [1/√(2πσ²/(1 − φ²))] exp{ −[y1 − α/(1 − φ)]² / [2σ²/(1 − φ²)] }

Next consider the distribution of the second


observation Y2 conditional on observing Y1 = y1 .
From (1.10.9),
Y2 = α + φY1 + e2 (1.10.11)
Conditional on Y1 = y1 , Y2 | Y1 = y1 ∼ N(α + φy1 , σ²). Hence,
fY2|Y1 (y2 | y1 ; θ) = [1/√(2πσ²)] exp{ −(y2 − α − φy1 )² / (2σ²) } (1.10.12)

The joint density of observations 1 and 2 is then


just the product of (1.10.10) and (1.10.12):

fY2 ,Y1 (y2 , y1 ; θ) = f Y2 jY1 ( y2 j y1 ; θ) fY1 (y1 ; θ)

Similarly, distribution of the third observation


conditional on the …rst two is
F. Guta (CoBE) Econ 654 September, 2017 220 / 296
fY3|Y2,Y1 (y3 | y2 , y1 ; θ) = [1/√(2πσ²)] exp{ −(y3 − α − φy2 )² / (2σ²) }

from which,

fY3 ,Y2 ,Y1 (y3 , y2 , y1 ; θ) = f Y3 jY2 ,Y1 ( y3 j y2 , y1 ; θ)

f Y2 jY1 ( y2 j y1 ; θ) fY1 (y1 ; θ)

In general, the values of Y1 , Y2 , . . . , Yt−1 matter for Yt only through the value of Yt−1 , and the density of observation t conditional on the preceding t − 1 observations is given by
fYt|Yt−1,...,Y1 (yt | yt−1 , . . . , y1 ; θ) = fYt|Yt−1 (yt | yt−1 ; θ) (1.10.13)
= [1/√(2πσ²)] exp{ −(yt − α − φyt−1 )² / (2σ²) }

The joint density of the first t observations is then
fYt,...,Y1 (yt , . . . , y1 ; θ) = fYt|Yt−1 (yt | yt−1 ; θ) · fYt−1,...,Y1 (yt−1 , . . . , y1 ; θ) (1.10.14)
The likelihood of the complete sample can thus be calculated as
fYT,...,Y1 (yT , . . . , y1 ; θ) = fY1 (y1 ; θ) ∏_{t=2}^{T} fYt|Yt−1 (yt | yt−1 ; θ) (1.10.15)

F. Guta (CoBE) Econ 654 September, 2017 222 / 296


The log likelihood function (denoted by ℓ(θ)) can be found by taking the log of (1.10.15):
ℓ(θ) = ln fY1 (y1 ; θ) + ∑_{t=2}^{T} ln fYt|Yt−1 (yt | yt−1 ; θ) (1.10.16)
Substituting (1.10.10) and (1.10.13) into (1.10.16), the log likelihood for a sample of size T from a Gaussian AR(1) process is seen to be
ℓ(θ) = −(1/2) ln 2π − (1/2) ln[σ²/(1 − φ²)] − {y1 − α/(1 − φ)}² / [2σ²/(1 − φ²)]
− [(T − 1)/2] ln 2π − [(T − 1)/2] ln σ² − [1/(2σ²)] ∑_{t=2}^{T} (yt − α − φyt−1 )² (1.10.17)
F. Guta (CoBE) Econ 654 September, 2017 223 / 296
The MLE b
θ is the value for which (1.10.17) is
maximized.
In principle, this requires di¤erentiating (1.10.17)
and setting the result equal to zero.
In practice, when an attempt is made to carry this
out, the result is a system of nonlinear equations in
θ and (y1 , . . . , yT ) for which there is no simple
solution for θ in terms of (y1 , . . . , yT ).
Maximization of (1.10.17) thus requires iterative or
numerical procedures.
F. Guta (CoBE) Econ 654 September, 2017 224 / 296
Conditional ML Estimates of AR(1) Process
An alternative to numerical optimization of the exact likelihood function is to regard the value of y1 as deterministic and maximize the likelihood conditioned on the first observation,
fYT,...,Y2|Y1 (yT , . . . , y2 | y1 ; θ) = ∏_{t=2}^{T} fYt|Yt−1 (yt | yt−1 ; θ) (1.10.18)
the objective function then being to maximize
ln fYT,...,Y2|Y1 (yT , . . . , y2 | y1 ; θ) = −[(T − 1)/2] ln 2π − [(T − 1)/2] ln σ² − [1/(2σ²)] ∑_{t=2}^{T} (yt − α − φyt−1 )² (1.10.19)
F. Guta (CoBE) Econ 654 September, 2017 225 / 296
Maximization of (1.10.19) with respect to α and φ

is equivalent to minimization of

∑_{t=2}^{T} (yt − α − φyt−1 )² (1.10.20)

which is achieved by OLS regression of yt on a


constant and its own lagged value.

The conditional maximum likelihood estimates of α


and φ are therefore given by
F. Guta (CoBE) Econ 654 September, 2017 226 / 296
(α̂, φ̂)′ = [ T − 1 , ∑_{t=2}^{T} yt−1 ; ∑_{t=2}^{T} yt−1 , ∑_{t=2}^{T} yt−1² ]⁻¹ ( ∑_{t=2}^{T} yt , ∑_{t=2}^{T} yt−1 yt )′
The conditional maximum likelihood estimate of the innovation variance is found by differentiating (1.10.19) with respect to σ² and setting the result equal to zero:
−(T − 1)/(2σ̂²) + [1/(2σ̂⁴)] ∑_{t=2}^{T} (yt − α̂ − φ̂yt−1 )² = 0
or σ̂² = [1/(T − 1)] ∑_{t=2}^{T} (yt − α̂ − φ̂yt−1 )² .
In contrast to the exact maximum likelihood
F. Guta (CoBE) Econ 654 September, 2017 227 / 296
estimates, the conditional maximum likelihood
estimates are thus trivial to compute.

Moreover, if the sample size T is su¢ ciently large,


the …rst observation makes a negligible contribution
to the total likelihood.
The exact MLE and the conditional MLE turn out to have the same large-sample distribution, provided that |φ| < 1.
When jφj > 1, the conditional MLE continues to
F. Guta (CoBE) Econ 654 September, 2017 228 / 296
provide consistent estimates, whereas maximization
of (1.10.17) does not.

This is because (1.10.17) is derived from (1.10.10),


which does not actually describe the density of Y1
when jφj > 1.
For these reasons, in most applications the
parameters of an autoregression are estimated by
OLS (conditional ML) rather than exact maximum
likelihood.
F. Guta (CoBE) Econ 654 September, 2017 229 / 296
Conditional ML Estimates of AR(p) Process
A Gaussian AR (p ) process takes the form

Yt = α + φ1 Yt 1 + + φp Yt p + et (1.10.21)

with et i.i.d.N (0, σ2 ).


In this case , the vector of parameters to be
0
estimated consists of θ = α, φ1 , . . . , φp , σ2 .
Maximization of the exact log likelihood for AR (p )
process must be accomplished numerically.
In contrast, the log of the likelihood conditional on
F. Guta (CoBE) Econ 654 September, 2017 230 / 296
the first p observations assumes the simple form
ln fYT,...,Yp+1|Yp,...,Y1 (yT , . . . , yp+1 | yp , . . . , y1 ; θ)
= −[(T − p)/2] ln 2π − [(T − p)/2] ln σ² − [1/(2σ²)] ∑_{t=p+1}^{T} (Yt − α − φ1 Yt−1 − · · · − φp Yt−p )² (1.10.22)
The values of α, φ1 , . . . , φp that maximize (1.10.22) are the same as those that minimize
∑_{t=p+1}^{T} (Yt − α − φ1 Yt−1 − · · · − φp Yt−p )² (1.10.23)
F. Guta (CoBE) Econ 654 September, 2017 231 / 296
Thus, the conditional maximum likelihood estimates
of these parameters can be obtained from an OLS
regression of yt on a constant and p of its own
lagged values.

The conditional maximum likelihood estimate of σ² turns out to be
σ̂² = [1/(T − p)] ∑_{t=p+1}^{T} (Yt − α̂ − φ̂1 Yt−1 − · · · − φ̂p Yt−p )²

The exact maximum likelihood estimates and the


F. Guta (CoBE) Econ 654 September, 2017 232 / 296
conditional maximum likelihood estimates again
have the same large sample distributions.
Conditional LF for a Gaussian MA(1) Process

Calculation of the likelihood function for a moving


average process is simple if we condition on initial
values for the e‘s .
Consider the Gaussian MA(1) process

Yt = µ + et + θet 1 (1.10.24)

with et i.i.d.N (0, σ2 ).


F. Guta (CoBE) Econ 654 September, 2017 233 / 296
Let θ ≡ (µ, θ, σ²)′ denote the vector of parameters to be estimated.
If the value of et−1 were known with certainty, then
Yt | et−1 ∼ N(µ + θet−1 , σ²)
or
fYt|et−1 (yt | et−1 ; θ) = [1/(σ√2π)] exp{ −(yt − µ − θet−1 )² / (2σ²) } (1.10.25)

Suppose that we knew for certain that e0 = 0.


F. Guta (CoBE) Econ 654 September, 2017 234 / 296
Then Y1 | e0 = 0 ∼ N(µ, σ²).
Moreover, given observation of y1 , the value of e1 is then known with certainty as well: e1 = y1 − µ.
Allowing application of (1.10.25) again:
fY2|Y1,e0=0 (y2 | y1 , e0 = 0; θ) = [1/(σ√2π)] exp{ −(y2 − µ − θe1 )² / (2σ²) }
Since e1 is known with certainty, e2 can be calculated from e2 = y2 − µ − θe1 .
Proceeding in this fashion, it is clear that given
knowledge that e0 = 0, the full sequence
F. Guta (CoBE) Econ 654 September, 2017 235 / 296
fe1 , e2 , . . . , eT g can be calculated from
fy1 , y2 , . . . , yT g by iterating on

et = yt − µ − θet−1 (1.10.26)

for t = 1, 2, . . . , T , starting from e0 = 0.

The conditional density of the t-th observation can then be calculated from (1.10.25) as
fYt|Yt−1,...,Y1,e0=0 (yt | yt−1 , . . . , y1 , e0 = 0; θ) (1.10.27)
= fYt|et−1 (yt | et−1 ; θ) = [1/(σ√2π)] exp{ −et² / (2σ²) }

F. Guta (CoBE) Econ 654 September, 2017 236 / 296


The sample likelihood would then be the product of these individual densities:
fYT,...,Y1|e0=0 (yT , . . . , y1 | e0 = 0; θ) = fY1|e0=0 (y1 | e0 = 0; θ) ∏_{t=2}^{T} fYt|Yt−1,...,Y1,e0=0 (yt | yt−1 , . . . , y1 , e0 = 0; θ)
The conditional log likelihood is
ℓ(θ) = ln fYT,...,Y1|e0=0 (yT , . . . , y1 | e0 = 0; θ)
= −(T/2) ln 2π − (T/2) ln σ² − [1/(2σ²)] ∑_{t=1}^{T} et² (1.10.28)

F. Guta (CoBE) Econ 654 September, 2017 237 / 296


For a particular numerical value of θ, we thus
calculate the sequence of e‘s implied by the data
from (1.10.26).
The conditional log likelihood function (1.10.28) is
then a function of the sum of squares of these e‘s.
The log likelihood is a complicated function of µ
and θ, so that an analytical expression for the ML
estimates of µ and θ is not readily calculated.
Hence, even the conditional MLEs for MA(1)
F. Guta (CoBE) Econ 654 September, 2017 238 / 296
process must be found by numerical optimization.
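For a given parameter value the objective is cheap to evaluate; a short numpy sketch of (1.10.26) and (1.10.28) that a numerical optimizer could then maximize over (µ, θ, σ²):

import numpy as np

def ma1_conditional_loglik(y, mu, theta, sigma2):
    # build e_1, ..., e_T by the recursion (1.10.26) with e_0 = 0,
    # then evaluate the conditional log likelihood (1.10.28)
    e, e_prev = np.empty(len(y)), 0.0
    for t, yt in enumerate(y):
        e[t] = yt - mu - theta * e_prev
        e_prev = e[t]
    T = len(y)
    return -T / 2 * np.log(2 * np.pi) - T / 2 * np.log(sigma2) - (e @ e) / (2 * sigma2)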
Iteration on (1.10.26) from an arbitrary value of e0 will result in
et = (yt − µ) − θ(yt−1 − µ) + θ²(yt−2 − µ) − · · · + (−1)^{t−1} θ^{t−1} (y1 − µ) + (−1)^t θ^t e0

If jθ j is substantially less than unity, the e¤ect of


imposing e0 = 0 will quickly die out and the
conditional likelihood (1.10.27) will give a good
approximation to the unconditional likelihood for a
F. Guta (CoBE) Econ 654 September, 2017 239 / 296
reasonably large sample size.

By contrast, if jθ j > 1, the consequence of


imposing e0 = 0 accumulate over time.
The conditional approach is not reasonable in this
case.
If numerical optimization of (1.10.28) results in a
value of θ that exceeds one in absolute value, the
results must be discarded.
The numerical optimization should be attempted
F. Guta (CoBE) Econ 654 September, 2017 240 / 296
again with the reciprocal of b
θ used as a starting
value for the numerical search procedure.
Conditional LF for a Gaussian MA(q) Process

For the MA(q ) process

Yt = µ + et + θ 1 et 1 + + θ q et q (1.10.29)

A simple approach is to condition on the assumption


that the first q values for e were all zero:
e0 = e−1 = · · · = e−q+1 = 0 (1.10.30)
F. Guta (CoBE) Econ 654 September, 2017 241 / 296
From these starting values we can iterate on
et = Yt − µ − θ1 et−1 − · · · − θq et−q (1.10.31)
for t = 1, 2, . . . , T .
Let e0 denote the (q × 1) vector (e0 , e−1 , . . . , e−q+1 )′.
The conditional log likelihood is then
ℓ(θ) = ln fYT,...,Y1|e0=0 (yT , . . . , y1 | e0 = 0; θ)
= −(T/2) ln 2π − (T/2) ln σ² − [1/(2σ²)] ∑_{t=1}^{T} et² (1.10.32)
F. Guta (CoBE) Econ 654 September, 2017 242 / 296
where θ = (µ, θ1 , . . . , θq , σ²)′.
Again, expression (1.10.32) is useful only if all values of z for which 1 + θ1 z + θ2 z² + · · · + θq z^q = 0 lie outside the unit circle.
Conditional LF for a Gaussian ARMA(p,q) Process

A Gaussian ARMA(p, q ) process takes the form

Yt = α + φ1 Yt 1 + + φp Yt p

+ et + θ 1 et 1 + + θ q et q (1.10.33)
F. Guta (CoBE) Econ 654 September, 2017 243 / 296
where et i.i.d.N (0, σ2 ).

The objective is to estimate the vector of population parameters θ ≡ (α, φ1 , . . . , φp , θ1 , . . . , θq , σ²)′.
The approximation to the likelihood for an
autoregression is conditioned on the initial values of
the y’s.
The approximation to the likelihood function for a
moving average process conditioned on initial values
for the e‘s.
F. Guta (CoBE) Econ 654 September, 2017 244 / 296
A common approach to the likelihood function for
an ARMA(p, q ) process conditions on both y’s and
e‘s.
Taking initial values for y0 ≡ (y0 , y−1 , . . . , y−p+1 )′ and e0 ≡ (e0 , e−1 , . . . , e−q+1 )′ as given, the sequence {e1 , e2 , . . . , eT } can be calculated from {y1 , y2 , . . . , yT } by iterating on
et = Yt − α − φ1 Yt−1 − · · · − φp Yt−p − θ1 et−1 − · · · − θq et−q (1.10.34)
F. Guta (CoBE) Econ 654 September, 2017 245 / 296
for t = 1, 2, . . . , T .

The conditional log likelihood is then
ℓ(θ) = ln fYT,...,Y1|Y0,e0 (yT , . . . , y1 | y0 , e0 )
= −(T/2) ln 2π − (T/2) ln σ² − [1/(2σ²)] ∑_{t=1}^{T} et² (1.10.35)

One option is to set initial y’s and e’s equal to their


expected values.
That is, set ys = α/(1 − φ1 − · · · − φp ) for s = 0, −1, . . . , −p + 1, set es = 0 for s = 0, −1, . . . , −q + 1, and then proceed with the iteration in (1.10.34) for t = 1, 2, . . . , T .

Alternatively, Box and Jenkins (1976) recommended


setting e’s to zero but y’s equal to their actual
values.
Thus, the iteration on (1.10.34) is started at date
t = p + 1 with y1 , y2 , . . . , yp set to the observed
values and ep = ep 1 = = ep q +1 = 0.
Then the conditional log likelihood is calculated as
F. Guta (CoBE) Econ 654 September, 2017 247 / 296
ℓ(θ) = ln f (yT , . . . , yp+1 | yp , . . . , y1 , ep = · · · = ep−q+1 = 0; θ)
= −[(T − p)/2] ln 2π − [(T − p)/2] ln σ² − [1/(2σ²)] ∑_{t=p+1}^{T} et²

As in the case for the moving average processes,


these approximations should be used only if all
values of z satisfying

1 + θ1z + θ2z 2 + + θq z q = 0

lie outside the unit circle.

1.11 Model Selection and Diagnostic Checking


F. Guta (CoBE) Econ 654 September, 2017 248 / 296
Most of the time there are no economic reasons to
choose a particular speci…cation of the model.
Consequently, to a large extent the data will
determine which time series model is appropriate.
Before estimating any model, it is common to
estimate autocorrelation and partial autocorrelation
coe¢ cients directly from the data.
Often it gives some idea about which models might
be appropriate.
F. Guta (CoBE) Econ 654 September, 2017 249 / 296
After one or more models are estimated, their
quality can be judged by checking whether the
residuals are more or less white noise, and by
comparing them with alternative speci…cations.
These comparisons can be based on statistical
signi…cance tests or the use of particular model
selection criteria.

The Autocorrelation Function

The autocorrelation function (ACF ) describes the


F. Guta (CoBE) Econ 654 September, 2017 250 / 296
correlation between yt and its lag yt−k as a function of k.
The k-th order autocorrelation coefficient is defined as
ρ̂k = [1/(T − k)] ∑_{t=k+1}^{T} (yt − ȳ)(yt−k − ȳ) / [(1/T) ∑_{t=1}^{T} (yt − ȳ)²] (1.11.1)
where ȳ = (1/T) ∑_{t=1}^{T} yt denotes the sample average.
Alternatively, it can be estimated by regressing yt
on a constant and yt k which will give a slightly
di¤erent estimator as the summation in the
F. Guta (CoBE) Econ 654 September, 2017 251 / 296
numerator and denominator will be over the same
set of observations.

It will usually not be the case that ρ̂k is zero for an MA model of order q < k.
But we can use ρ̂k to test the hypothesis that ρk = 0 using the result that, asymptotically,
√T (ρ̂k − ρk ) →d N(0, νk )
where νk = 1 + 2ρ1² + 2ρ2² + · · · + 2ρq² if q < k.


F. Guta (CoBE) Econ 654 September, 2017 252 / 296
Testing MA(k − 1) versus MA(k) is done by testing ρk = 0 and comparing the test statistic
√T ρ̂k / √(1 + 2ρ̂1² + 2ρ̂2² + · · · + 2ρ̂²k−1 ) (1.11.2)

with critical values from the standard normal


distribution.

Typically, two-standard error bounds for ρ̂k based on the estimated variance 1 + 2ρ̂1² + 2ρ̂2² + · · · + 2ρ̂²k−1

are graphically displayed in the plot of the sample


F. Guta (CoBE) Econ 654 September, 2017 253 / 296
autocorrelation function.

The order of a moving average can in this way be


determined from an inspection of the sample ACF .
At least it will give us a reasonable value for q to
start with, and diagnostic checking should indicate
whether it is appropriate or not.
For autoregressive models the ACF is less helpful.
For the AR(1) model we have seen that the autocorrelation coefficients do not cut off at a finite lag length.
Instead they go to zero exponentially, corresponding to ρk = φ^k .
For higher-order AR models, the autocorrelation
function is more complex.
An alternative source of information that is helpful
is provided by the partial autocorrelation function.

The Partial Autocorrelation Function

F. Guta (CoBE) Econ 654 September, 2017 255 / 296


The k-th order sample partial autocorrelation coefficient is defined as an estimate for φk in an AR(k) model.
We denote this by φ̂k . So, estimating
yt = α + φ1 yt−1 + φ2 yt−2 + · · · + φk yt−k + et
yields φ̂k , the estimated coefficient for φk in the AR(k) model.
The partial autocorrelation coefficient measures the additional correlation between yt and yt−k after controlling for the effects of the other regressors yt−1 , . . . , yt−k+1 .

Obviously if the true model is an AR (p ) process,


then estimating an AR (k ) model by OLS gives
consistent estimators for the model parameters if
k p.
Consequently, we have

bk = 0 if k > p
plim φ (1.11.3)

Moreover, it can be shown that the asymptotic


F. Guta (CoBE) Econ 654 September, 2017 257 / 296
distribution is normal, i.e.,
√T (φ̂k − 0) →d N(0, 1) if k > p (1.11.4)

Consequently, the partial autocorrelation coe¢ cients


can be used to determine the order of AR process.
Testing an AR (k 1) model versus and AR (k )
model implies testing the null hypothesis that
φk = 0.
Under the null hypothesis that the model is
F. Guta (CoBE) Econ 654 September, 2017 258 / 296
AR(k − 1), the appropriate standard error of φ̂k based on (1.11.4) is 1/√T , so that φk = 0 is rejected if |√T φ̂k | > 1.96.

This way one can look at the PACF and test for
each lag whether the partial autocorrelation
coe¢ cient is zero.
For a genuine AR (p ) model the partial
autocorrelation will be close to zero after the pth
lag.
F. Guta (CoBE) Econ 654 September, 2017 259 / 296
For a moving average model it can be shown that
the partial autocorrelations do not have a cut-o¤
point but tail o¤ to zero just like the
autocorrelations in an autoregressive model.

In summary the AR (p ) process is described by:

an ACF that is in…nite in extent (it tails o¤).


a PACF that is (close to) zero for lags larger than p.

For an MA(q ) process we have:

an ACF that is (close to) 0 for lags larger than q.


F. Guta (CoBE) Econ 654 September, 2017 260 / 296
a PACF that is in…nite in extent (tails o¤).
In the absence of any of these two situations, a
combined ARMA model may provide a parsimonious
representation of the data.

Diagnostic Checking

As a last step in model-building cycle, some checks


on the model adequacy are required.
Possibilities are doing a residual analysis and
over…tting the speci…ed model.
F. Guta (CoBE) Econ 654 September, 2017 261 / 296
For example, if an ARMA(p, q ) model is chosen, we
could also estimate an ARMA(p + 1, q ) and an
ARMA(p, q + 1) models and test the signi…cance of
the additional parameters.
A residual analysis is usually based on the fact that
the residuals of an adequate model should
approximately be white noise.
A plot of the residuals can be a useful tool in
checking for outliers.

F. Guta (CoBE) Econ 654 September, 2017 262 / 296


Moreover, the estimated residual autocorrelations
are usually examined.
For a white noise series the autocorrelations are
zero.
Therefore the significance of the residual autocorrelations is often checked by comparing them with approximate two-standard error bounds ±2/√T .
To check the overall acceptability of the residual
autocorrelations, the Ljung-Box (1978)
portmanteau test statistic
F. Guta (CoBE) Econ 654 September, 2017 263 / 296
Q = T (T + 2) ∑_{k=1}^{K} (T − k)⁻¹ rk² (1.11.5)
is often used, where rk are the estimated autocorrelation coefficients of the residuals êt , and
K is the number chosen by the researcher.

Values of Q for di¤erent K may be computed in the


residual analysis.
For an ARMA(p, q ) process, the statistic is
approximately Chi-squared distributed with
K p q (where K > p + q) degrees of freedom
F. Guta (CoBE) Econ 654 September, 2017 264 / 296
under the null hypothesis that the ARMA(p, q ) is
correctly speci…ed.
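A numpy sketch of the Q statistic in (1.11.5), computed from a vector of model residuals; the value is compared with a chi-squared critical value with K − p − q degrees of freedom.

import numpy as np

def ljung_box_q(resid, K):
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    T = len(e)
    denom = e @ e
    # residual autocorrelations r_1, ..., r_K
    r = np.array([(e[k:] @ e[:-k]) / denom for k in range(1, K + 1)])
    return T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, K + 1)))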
If the model is rejected at this stage, the
model-building cycle has to be repeated.
Criteria for Model Selection
Economic theory does not provide any guidance to
the appropriate choice of models, some additional
criteria can be used to choose from alternative
models that are acceptable from statistical point of
view.
F. Guta (CoBE) Econ 654 September, 2017 265 / 296
What is the appropriate choice of p in practice?
This is a problem of model selection.
One approach to model selection is to choose p
based on a Wald test.
Another is to minimize the AIC or BIC information
criterion assuming constant is included in the
AR (p ) model, e.g.
AIC(p) = ln σ̂²(p) + 2(p + 1)/T (1.11.6)
where σ̂²(p) is the estimated residual variance
F. Guta (CoBE) Econ 654 September, 2017 266 / 296
from an AR (p ).
One ambiguity in de…ning the AIC criterion is that
the sample available for estimation changes as p
changes.
If you increase p, you need more initial conditions.
This can induce strange behaviour into the AIC .
The best remedy is to fix an upper value p̄, reserve the first p̄ observations as initial conditions, and then calculate the models AR(1), AR(2), . . . , AR(p̄) on this unified sample.
F. Guta (CoBE) Econ 654 September, 2017 267 / 296
Alternatively one can use the BIC given by

BIC(p) = ln σ̂²(p) + [(p + 1)/T ] ln T (1.11.7)
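A sketch of this order-selection step in numpy: every AR(p), p = 1, . . . , p_max, is fitted on the unified sample that reserves the first p_max observations, and (1.11.6) and (1.11.7) are computed for each order.

import numpy as np

def ar_order_selection(y, p_max):
    # AIC and BIC for AR(1), ..., AR(p_max), all estimated on the same unified sample
    T = len(y)
    crit = {}
    for p in range(1, p_max + 1):
        X = np.array([[1.0] + [y[t - j] for j in range(1, p + 1)]
                      for t in range(p_max, T)])
        z = np.asarray(y[p_max:], dtype=float)
        b = np.linalg.lstsq(X, z, rcond=None)[0]
        e = z - X @ b
        n = len(z)
        sigma2 = e @ e / n
        crit[p] = (np.log(sigma2) + 2 * (p + 1) / n,          # AIC, eq. (1.11.6)
                   np.log(sigma2) + (p + 1) * np.log(n) / n)  # BIC, eq. (1.11.7)
    return crit   # pick the order with the smallest AIC or BIC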

If one is to choose between alternative ARMA(p, q )

models, the AIC and BIC information criteria are

given by

AIC(p + q) = ln σ̂²(p + q) + 2(p + q + 1)/T
BIC(p + q) = ln σ̂²(p + q) + [(p + q + 1)/T ] ln T
F. Guta (CoBE) Econ 654 September, 2017 268 / 296
Both criteria are likelihood based and represent a
di¤erent trade-o¤ between ‘…t’, as measured by the
loglikelihood value, and ‘parsimony’, as measured
by the number of free parameters, p + q + 1,
(assuming the models include a constant).
Usually the model with the smallest AIC or BIC
value is preferred, although one can choose to
deviate from this if the di¤erences in criterion values
are small for a subset of the models.

F. Guta (CoBE) Econ 654 September, 2017 269 / 296


While the two criterion di¤er in the trade-o¤
between …t and parsimony, the BIC criterion can be
preferred because it has the property that it will
almost surely select the true model, if T ! ∞,
provided that the true model is in the class of
ARMA(p, q ) models for relatively small values of p
and q.
The AIC criterion tends to result asymptotically in
overparameterized models.

F. Guta (CoBE) Econ 654 September, 2017 270 / 296


1.12 Predicting with ARMA models

The main goal of building a time series model is


predicting the future path of economic variables.
ARMA models usually perform quite well in
prediction and often outperform more complicated
structural models.
However, ARMA models do not provide any
economic insight in one’s predictions and are unable
to forecast under alternative economic scenarios.
F. Guta (CoBE) Econ 654 September, 2017 271 / 296
The Optimal Predictor

Suppose we are interested in predicting YT +h at


time T , the value of YT h-periods ahead.
A predictor for YT +h will based on an information
set, denoted by FT , that contains the information
that is available and potentially used at the time of
making the forecast.
Ideally it contains all the information that is
observed and known at time T .
F. Guta (CoBE) Econ 654 September, 2017 272 / 296
In univariate time series modeling we will usually
assume that the information set at any point t in
time contains the value of Yt and all its lags. Thus
we have

FT = {Y−∞ , . . . , YT−1 , YT } (1.12.1)

b T +hjT (the predictor for


In general, the predictor Y
YT +h as of time T ) is a function of the information
set FT .
Our criterion for choosing a predictor from the
F. Guta (CoBE) Econ 654 September, 2017 273 / 296
many possible ones is to minimize the expected quadratic prediction error
E[(YT+h − ŶT+h|T )² | FT ] (1.12.2)
where E( · | FT ) denotes the conditional expectation given the information set FT .

It can be shown that the best predictor for YT +h


given the information set at time T is the
conditional mean of YT +h given the information set.
F. Guta (CoBE) Econ 654 September, 2017 274 / 296
We denote this optimal predictor as
ŶT+h|T ≡ E(YT+h | FT ) (1.12.3)

The conditional expectation of YT +h given an


information set FT0 , where FT0 is a subset of ,FT is
b T +hjT based on FT .
at best as good as Y
In line with this intuition, it holds that, the more
information one uses to determine the predictor, the
better the predictor will be.

F. Guta (CoBE) Econ 654 September, 2017 275 / 296


For example, E ( YT +h j YT , YT 1,... ) will usually be
a better predictor than E ( YT +h j YT ) or E (YT +h )
(an empty information set).
To simplify things, assume that the parameters of
the ARMA model for YT are known.
In practice, one would simply replace the unknown
parameters by their consistent estimates.
Now, how do we determine these conditional
expectations when YT follows an ARMA process?
F. Guta (CoBE) Econ 654 September, 2017 276 / 296
To simplify the notation, consider forecasting yT+h , noting that ŶT+h|T = µ + ŷT+h|T .
As a first example, consider an AR(1) process where
yT+1 = φyT + eT+1
Consequently,
ŷT+1|T = E(yT+1 | yT , yT−1 , . . .) = φyT + E(eT+1 | yT , yT−1 , . . .) = φyT (1.12.4)
F. Guta (CoBE) Econ 654 September, 2017 277 / 296
To predict two periods ahead (h = 2), we write
yT+2 = φyT+1 + eT+2
from which it follows
ŷT+2|T = E(yT+2 | yT , . . .) = φE(yT+1 | yT , . . .) + E(eT+2 | yT , . . .) = φ²yT (1.12.5)
In general, we obtain ŷT+h|T = φ^h yT .


Thus, the last observed value yT contains all the
F. Guta (CoBE) Econ 654 September, 2017 278 / 296
information to determine the predictor for any
future value.

When h is large, the predictor ŷT+h|T converges to 0 (the unconditional expectation of yt ), provided that |φ| < 1.
With a nonzero mean, the best predictor for YT+h is directly obtained as ŶT+h|T = µ + ŷT+h|T = µ + φ^h (YT − µ).
Note that this differs from φ^h YT .
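In code the AR(1) predictor is a one-liner; a small numpy sketch with illustrative parameter values, showing the reversion of the forecast towards the unconditional mean µ as the horizon h grows:

import numpy as np

def ar1_forecast(y_T, mu, phi, h):
    # optimal h-step-ahead predictor: mu + phi**h * (Y_T - mu)
    return mu + phi ** h * (y_T - mu)

mu, phi, y_T = 2.0, 0.8, 3.5                               # illustrative values
path = [ar1_forecast(y_T, mu, phi, h) for h in range(1, 11)]
# the sequence approaches mu = 2.0 as h increases, since |phi| < 1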
F. Guta (CoBE) Econ 654 September, 2017 279 / 296
As a second example, consider an MA(1) process where
yt = et + θet−1
Then we have
E(yT+1 | yT , . . .) = θE(eT | yT , . . .) = θeT
where, implicitly, we assumed that eT is observed (contained in the information set FT ).
Assuming that the MA process is invertible, we can write eT = ∑_{j=0}^{∞} (−θ)^j yT−j and determine the
F. Guta (CoBE) Econ 654 September, 2017 280 / 296
one-period ahead predictor as
ŷT+1|T = θ ∑_{j=0}^{∞} (−θ)^j yT−j (1.12.6)

Predicting two periods ahead gives
ŷT+2|T = E(yT+2 | yT , yT−1 , . . .) = E(eT+2 | yT , . . .) + θE(eT+1 | yT , . . .) = 0 (1.12.7)

which shows that the MA(1) model is


uninformative for predicting two periods ahead: the
best predictor is simply the (unconditional)
F. Guta (CoBE) Econ 654 September, 2017 281 / 296
expected value of yt , normalized at zero.

This also follows from the autocorrelation function


of the process, because the ACF is zero after one
lag.
That is, the ‘memory’of the process is only one
period.

For the general ARMA(p, q ) model

yt = φ1 yt 1 + + φp yt p + et + θ 1 et 1 + + θ q et q

F. Guta (CoBE) Econ 654 September, 2017 282 / 296


We can derive the following recursive formula to determine the optimal predictors:
ŷT+h|T = φ1 ŷT+h−1|T + · · · + φp ŷT+h−p|T + êT+h|T + θ1 êT+h−1|T + · · · + θq êT+h−q|T (1.12.8)
where êT+k|T is the optimal predictor for eT+k at time T , and
ŷT+k|T = yT+k if k ≤ 0
êT+k|T = 0 if k > 0
êT+k|T = eT+k if k ≤ 0
where the latter innovation can be solved from the autoregressive representation of the model.

For this we have used the fact that the process is


stationary and invertible, in which case the
information set fyT , yT 1 , . . .g is equivalent to
f eT , eT 1 , . . . g.

That is, if all et ’s are known from ∞ to T , then


all yt ’s are known from ∞ to T , and vice versa.
F. Guta (CoBE) Econ 654 September, 2017 284 / 296
To illustrate this, consider an ARMA(1, 1) model that takes the form
yt = φyt−1 + et + θet−1
The optimal predictor for yT+1 is then given by
ŷT+1|T = φyT + êT+1|T + θeT = φyT + θeT
Assuming invertibility,
yt − φyt−1 = (1 + θL) et
can be written as
et = (1 + θL)⁻¹ (yt − φyt−1 ) = ∑_{j=0}^{∞} (−θ)^j (yt−j − φyt−1−j )
We can write for the one-period-ahead predictor
ŷT+1|T = φyT + θ ∑_{j=0}^{∞} (−θ)^j (yT−j − φyT−j−1 ) (1.12.9)
Predicting two periods ahead gives
ŷT+2|T = φŷT+1|T + êT+2|T + θêT+1|T = φŷT+1|T (1.12.10)

Note that this does not equal to φ2 yT .

Prediction Accuracy
F. Guta (CoBE) Econ 654 September, 2017 286 / 296
In addition to prediction, it is important to know how accurate this prediction is.
To judge forecasting precision, we define the prediction error as YT+h − ŶT+h|T = yT+h − ŷT+h|T and the expected quadratic prediction error as
Ch = E(yT+h − ŷT+h|T )² = var(yT+h | FT ) (1.12.11)

F. Guta (CoBE) Econ 654 September, 2017 287 / 296


where the latter step follows from the fact that

ybT +hjT = E ( yT +h j FT )

Determining Ch , corresponding to the variance of


the h period-ahead prediction error, is relatively
easy with the moving average representation.
To start with the simplest case, consider an MA(1)
model.
Then we have
F. Guta (CoBE) Econ 654 September, 2017 288 / 296
C1 = var(yT+1 | yT , yT−1 , . . .) = var(eT+1 + θeT | eT , eT−1 , . . .) = var(eT+1 ) = σ²

Alternatively, we explicitly solve for the predictor,


which is ybT +1jT = θeT , and determine the variance
of yT +1 ybT +1jT = eT +1 , which gives the same
result.
For the two-period-ahead predictor we have
C2 = var(yT+2 | yT , yT−1 , . . .) = var(eT+2 + θeT+1 | eT , eT−1 , . . .) = (1 + θ²) σ²
F. Guta (CoBE) Econ 654 September, 2017 289 / 296
As one would expect, the accuracy of the prediction
decreases if we predict further into the future.
It will not, however, increase any further if h is
increased beyond 2.
This becomes clear if we compare the expected
quadratic prediction error with that of a simple
unconditional predictor:

ybT +hjT = E (yT +h ) = E (eT +h + θeT +h 1 ) = 0

For this predictor we have


F. Guta (CoBE) Econ 654 September, 2017 290 / 296
Ch = E(yT+h − 0)² = var(yT+h ) = var(eT+h + θeT+h−1 ) = (1 + θ²) σ²

Consequently, this gives an upper bound on the


inaccuracy of the predictors.
The MA(1) model thus gives more efficient predictors only if one predicts one period ahead.
More general ARMA models, however, will also yield efficiency gains for predictions further ahead.
Suppose the general model is ARMA(p, q), which we can write as an MA(∞) model, with αj coefficients to be determined:
yt = ∑_{j=0}^{∞} αj et−j with α0 = 1

The h-period-ahead predictor (in terms of the et 's) is given by
ŷT+h|T = E(yT+h | yT , yT−1 , . . .) = ∑_{j=0}^{∞} αj E(eT+h−j | eT , eT−1 , . . .) = ∑_{j=h}^{∞} αj eT+h−j

F. Guta (CoBE) Econ 654 September, 2017 292 / 296


such that
yT+h − ŷT+h|T = ∑_{j=0}^{h−1} αj eT+h−j
Consequently, we have
E(yT+h − ŷT+h|T )² = σ² ∑_{j=0}^{h−1} αj² (1.12.12)

This shows how the variance of the forecast errors


can easily be determined from the coe¢ cients of the
moving average representation of the model.
F. Guta (CoBE) Econ 654 September, 2017 293 / 296
Recall that, for the computation of the predictor,
the autoregressive representation was most
convenient.
As an illustration, consider the AR(1) model, where αj = φ^j .
The expected quadratic prediction errors are given by
C1 = σ², C2 = σ²(1 + φ²), C3 = σ²(1 + φ² + φ⁴), etc.

F. Guta (CoBE) Econ 654 September, 2017 294 / 296


For h = ∞, we have
C∞ = σ²(1 + φ² + φ⁴ + · · ·) = σ²/(1 − φ²)
which is the unconditional variance of yt .
Consequently, the informational value contained in
AR (1) process slowly decays over time.
In the long run the predictor equals the
unconditional predictor, being the mean of the yt
series.
In practical cases, the parameters in ARMA models
F. Guta (CoBE) Econ 654 September, 2017 295 / 296
are unknown, and replaced by their estimated
values.

This introduces additional uncertainty in the


predictors.
Usually, however, this uncertainty is ignored.
The motivation is that the additional variance that
arises because of estimation error disappears
asymptotically as the sample size becomes
su¢ ciently large.
F. Guta (CoBE) Econ 654 September, 2017 296 / 296
