
Advanced Econometrics II

Master in Economics and Finance, BGSE

Handout 2: Serial Correlation

Laura Mayoral
IAE and Barcelona GSE
Barcelona, Winter 2021
Goal

This handout extends the single-equation GMM model (Hayashi, Chapter 3) to incorporate serial correlation.

Serial correlation arises in the context of time series data. Thus, in the following, we will use the subindex “t” (= time) rather than “i”.

Serial correlation: correlation between a random variable and its lagged values.

In the GMM context: we will allow for serial correlation in the product of the vector of instruments and the error term.
The problem

When time series data is employed, serial correlation is likely


to be the norm, not the exception

What are the properties of the estimators (OLS, GMM,. . . )


under serial correlation of Xt εt ?

Still consistent?

Same asymptotic distribution?

Still consistent (provided X remains exogenous)

But different asymptotic distribution (more precisely, different


Avar)
To develop the asymptotic distribution of the GMM estima-
tor, so far we have assumed that Xt εt is a martingale difference
sequence.

Recall that by definition of m.d.s.,

E(Xt εt | Xt−1 εt−1 , Xt−2 εt−2 , . . . ) = 0

which implies that the process {Xt εt } is non-autocorrelated. (Note:


use the law of iterated expectations to check this).

We will now relax this hypothesis: allow autocorrelation in


{Xt εt }

Then: we need to use a new C.L.T for serially dependent pro-


cesses!
This handout has two parts:

Basic background in time series analysis. Limit theorems for


weakly dependent processes

GMM with serial correlation


Basic elements in Time Series Analysis.

Limit theory for weakly dependent processes


Outline
I. Univariate Time Series Models:

1. Autocovariance and autocorrelation function

2. Strict and Weak Stationarity. Ergodicity.

3. Modelling weakly dependent processes: The Wold theorem (MA


processes)

4. AR and ARMA processes

5. Limit theorems for stationary and ergodic processes

6. Estimation of the long run variance.

II. Multivariate Time Series Models.


Time series data

A time series is a set of observations

y1 , y2 , . . . , yt , . . . , yT ,

where t is the time index.

Time series data come with a natural temporal ordering


Random Sampling vs. Dependent stochastic
processes

In cross-sectional analysis: observations (yi , xi ), i = {1, ..., N }


are (typically assumed to be) randomly drawn from a fixed popu-
lation. N observations from the same distribution. No ordering.

Random sampling implies that observations from different units


are independently distributed.

Time series observations (yt , xt ) , t = {1, ...T } are in general


non-independent.

−→ Dependence among observations is a key feature in time series


variables.
Univariate vs Multivariate time series processes

Univariate time series process: {yt }, where yt is a random variable

Multivariate time series process: {yt }, where yt is a vector of random variables: (y1t , . . . , ykt )′

We begin by analysing key aspects of univariate time series


processes
Very similar in multivariate time series (notation is a bit more
complicated)
The autocovariance function

Consider a univariate time series process {Xt }

The autocovariance function is a measure of linear dependence between elements of the sequence {Xt , t ∈ Z}

It extends the concept of the covariance matrix (computed for a finite number of random variables) to the case of an infinite collection of random variables.

Definition 1 The autocovariance function. If {Xt , t ∈ T } is a pro-


cess such that V ar (Xt ) < ∞ for each t ∈ T , then the autocovariance
function γX (., .) of Xt is defined by

γX (r, s) = Cov (Xr , Xs )


= E [(Xr − E (Xr ))(Xs − E (Xs ))], r, s ∈ T .
Weak Stationarity and Strict stationarity

Stationarity is a crucial concept.

There are two basic definitions of stationarity:

strict and

weak (or second-order) stationarity.

In both cases, stationarity imposes stability over time, either in the joint distributions or in the moments of the random variables in the stochastic process.
Strict stationarity

Definition 2 (Finite-dimensional distributions). Let T be the set of all vectors {t = (t1 , ..., tn )′ ∈ T^n : t1 < t2 < ... < tn , n = 1, 2, ...}. Then the finite-dimensional distribution functions of {Xt , t ∈ T } are the functions {Ft (.), t ∈ T } defined for t = (t1 , ..., tn )′ by

Ft (x) = P (Xt1 ≤ x1 , ..., Xtn ≤ xn ),    x = (x1 , ..., xn )′ ∈ R^n
Definition 3 (First, second and n-th order stationarity). The time series {Xt , t ∈ Z} is said to be first-order, second-order and n-th order stationary, respectively, if

Ft (xt1 ) = Ft (xt1 +h ),    for any t1 , h;

Ft (xt1 , xt2 ) = Ft (xt1 +h , xt2 +h ),    for any t1 , t2 , h;

Ft (xt1 , xt2 , ..., xtn ) = Ft (xt1 +h , xt2 +h , ..., xtn +h ),    for any t1 , t2 , ..., tn , h.

Definition 4 (Strict stationarity). The time series {Xt , t ∈ Z} is said to be strictly stationary if the joint distributions of (Xt1 , ..., Xtk )′ and (Xt1 +h , ..., Xtk +h )′ are the same for all positive integers k and for all t1 , ..., tk , h ∈ Z. In other words, {Xt , t ∈ Z} is strictly stationary if it is n-th order stationary for any n.
Interpretation:

This means that the graphs over two equal-length time intervals
of a realisation of the time series should exhibit similar statistical
characteristics.

Joint finite dimensional distributions are difficult to work with.


The following concept introduces a notion of stationarity that can
be characterized by only looking at first and second moments.
Weak Stationarity

Definition 5 (Weak stationarity). The time series {Xt , t ∈ Z} is said to be weakly stationary if

i) E(Xt²) < ∞ for all t ∈ Z;

ii) E(Xt ) = m for all t ∈ Z;

iii) γX (r, s) = γX (r + t, s + t) for all r, s, t ∈ Z.

This concept of stationarity is usually referred to in the litera-


ture as second-order stationarity, weak stationarity or covariance
stationarity.
Notice that stationarity also requires the variance of Xt to be constant. If Xt is stationary, then

V ar (Xr ) = γX (r, r ) = γX (r + t, r + t) = V ar (Xr +t ),


for all r, t ∈ Z.

(Weak) stationarity basically means that the mean and the variance are finite and constant and that the autocovariance function only depends on h, the distance between observations.
Stationarity and the autocovariance function

If {Xt , t ∈ Z} is stationary, then γX (r, s) = γX (r − s, 0) for all r,


s ∈ Z. Then, for stationary processes one can define the autoco-
variance as a function of only one parameter, that is

γX (h) = Cov (Xt+h , Xt ) for all t, h ∈ Z.

The function γX (.) will be referred to as the autocovariance


function of the process {Xt } and γX (h) is the value of this function
at lag h.

Notation: γX (h) or simply γh will denote the h-th autocovari-


ance of Xt .
If γ (.) is the autocovariance function of a stationary process,
then it verifies

i) γ (0) ≥ 0
ii) |γ (h)| ≤ γ (0) for all h ∈ Z
iii) γ (−h) = γ (h) for all h ∈ Z
The autocorrelation function

Definition 6 (Autocorrelation function, ACF). For a stationary process {Xt }, the autocorrelation function at lag h is defined as

ρX (h) = γX (h)/γX (0) = Corr (Xt+h , Xt ),    for all t, h ∈ Z.
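To make the definition concrete, the sample counterparts of γX (h) and ρX (h) can be computed as in the following minimal sketch (Python with numpy assumed; the function names are illustrative, not part of the handout):

```python
import numpy as np

def sample_autocovariance(x, h):
    """gamma_hat(h) = (1/T) * sum_{t=h+1}^{T} (x_t - xbar)(x_{t-h} - xbar)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    xc = x - x.mean()
    return np.dot(xc[h:], xc[:T - h]) / T

def sample_autocorrelation(x, h):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return sample_autocovariance(x, h) / sample_autocovariance(x, 0)

# For white noise the sample autocorrelations at h >= 1 should be close to zero
rng = np.random.default_rng(0)
eps = rng.standard_normal(1000)
print([round(sample_autocorrelation(eps, h), 3) for h in range(5)])
```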
The relation between Weak and Strict Stationarity

Strict stationarity implies weak stationarity, provided the first and second moments of the variables exist, but the converse of this statement is not true in general.

Taking k = 1 in Definition 4, it is clear that all the variables have the same distribution, which implies that the mean and the variance are the same for all variables (provided they exist).

Taking k = 2 in Definition 4, it follows that Cov (Xt1 , Xt2 ) = Cov (Xt1 +h , Xt2 +h ) for all t1 , t2 , h ∈ Z, implying condition iii) in the weak stationarity definition.
However, it is easy to find counterexamples where a process is
stationary but not strictly stationary.

Example 1 Consider a sequence of independent random variables


such that if t < T1 , Xt follows an exponential distribution with mean
and variance equal to 1 and if t ≥ T1 , Xt is normally distributed
with mean and variance equal to 1. {Xt } is weakly stationary but it is not
strictly stationary because Xt and Xt∗ have different distributions
if t < T1 and t∗ ≥ T1 .
The relation between Weak and Strict Stationarity, II

There is one important case where both concepts of stationarity are equivalent.

Definition 7 (Gaussian Time series) The process {Xt } is a Gaus-


sian time series if and only if the distribution functions of {Xt } are
all multivariate normal.

If {Xt , t ∈ Z} is a stationary Gaussian time series, then it is also strictly stationary, since for all n ∈ {1, 2, ...} and for all h, t1 , t2 , ... ∈ Z, the random vectors (Xt1 , ..., Xtn )′ and (Xt1 +h , ..., Xtn +h )′ have the same mean and covariance matrix, and hence they have the same distribution.
Ergodicity

Ergodicity is a condition that restricts the memory of the pro-


cess.

A loose definition of ergodicity is that the process is asymptot-


ically independent.

That is, for sufficiently large n, Yt and Yt+n are nearly independent. A more formal definition is provided below.

All of these definitions essentially say that the effect of current


events eventually disappears.
Definition 8 (Ergodicity for the mean). A covariance stationary process Yt is said to be ergodic for the mean if Ȳ = T⁻¹ Σ_{t=1}^{T} Yt converges in probability to E(Yt ).

Definition 9 (Ergodicity for the second moments). A covariance stationary process is said to be ergodic for the second moments if

(T − j)⁻¹ Σ_{t=j+1}^{T} (Yt − µ)(Yt−j − µ) →p γj ,    for all j.
Sufficient conditions for ergodicity:

• If γ(n) → 0 as n → ∞, then {Yt } is ergodic for the mean. (Proof: Brockwell and Davis, p. 219)

• If Σ_{j=0}^{∞} |γ(j)| < ∞, then {Yt } is ergodic for second moments. (Proof: Brockwell and Davis, p. 220)

• Furthermore, if {Yt } is a stationary Gaussian process and Σ_{j=0}^{∞} |γj | < ∞, then the process is ergodic for all moments.
Some examples of stationary processes

Example 2 iid sequences.


The sequence {εt } is i.i.d (independent and identically dis-
tributed) if all the variables are independent and share the same
univariate distribution.

Clearly, an iid sequence is strictly stationary and provided the


first and second order moments exist, it is also weakly stationary.

Example 3 White noise process.


The process {εt } is called white noise if it is weakly stationary with E(εt ) = 0 and autocovariance function

γε (h) = σ²  if h = 0,    γε (h) = 0  if h ≠ 0.
The white noise process is important because it is used as a building block for more general processes, as can be seen in some of the examples below.

An i.i.d. sequence with zero mean and variance σ² is also white noise. The converse is not true in general. Furthermore, a white noise process might not be strictly stationary.
Example 4 Martingale difference sequence, m.d.s.
A process {εt }, with E (εt ) = 0 is called a martingale difference
sequence if
E (εt |εt−1 , εt−2 , ...) = 0, for t ≥ 2.

Exercise:
Show that if E (εt ) = 0 and the second order moments exist then,

εt is i.i.d. ⇒ εt is m.d.s. ⇒ εt is white noise

but the converse implications are not true in general.


Example 5 Moving average of order one.
The process {Xt } is called a moving average of order 1, or
MA(1), if {Xt } is defined as

Xt = εt + θεt−1 ,

where {εt } is a white noise process. Xt is stationary for any value of θ.

Example 6 Moving average of order q.


The process {Xt } is called a moving average of order q, or MA(q), if {Xt } is defined as

Xt = εt + θ1 εt−1 + · · · + θq εt−q ,

where {εt } is a white noise process. Xt is stationary for any values of θ1 , . . . , θq .
Example 7 Autoregression of order 1.
The process {Xt } is called an autoregressive process of order
1 if {Xt } is defined as

Xt = φXt−1 + εt ,

where {εt } is a white noise process. Xt is stationary provided |φ| < 1.

Example 8 Autoregression of order p.


The process {Xt } is called an autoregressive process of order
p if {Xt } is defined as

Xt = φ1 Xt−1 + · · · + φp Xt−p + εt ,

where {εt } is a white noise process. Xt is stationary provided all the roots of the polynomial Φ(L) = 1 − φ1 L − · · · − φp L^p are larger than 1 in absolute value.
Some graphs
The graphs below correspond to simulated data.
[Figure: IID process, time series plot and sample autocorrelation function]

[Figure: AR(1) process, φ = 0.8, time series plot and sample autocorrelation function]

[Figure: MA(1) process, θ = 0.8, time series plot and sample autocorrelation function]
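The figures themselves are not reproduced here, but series and sample ACFs like the ones described above can be generated along these lines (a sketch assuming numpy and matplotlib are available; parameter values match the captions):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(123)
T = 300
eps = rng.standard_normal(T)

iid = eps                                             # i.i.d. N(0,1)
ma1 = eps + 0.8 * np.concatenate(([0.0], eps[:-1]))   # MA(1), theta = 0.8
ar1 = np.zeros(T)                                     # AR(1), phi = 0.8
for t in range(1, T):
    ar1[t] = 0.8 * ar1[t - 1] + eps[t]

def acf(x, max_lag=25):
    """Sample autocorrelations rho_hat(0), ..., rho_hat(max_lag)."""
    x = x - x.mean()
    g0 = np.dot(x, x) / x.size
    return np.array([np.dot(x[h:], x[:x.size - h]) / x.size / g0 for h in range(max_lag + 1)])

fig, axes = plt.subplots(3, 2, figsize=(10, 8))
for row, (name, series) in enumerate([("iid", iid), ("AR(1)", ar1), ("MA(1)", ma1)]):
    axes[row, 0].plot(series)
    axes[row, 0].set_title(name)
    axes[row, 1].stem(acf(series))
    axes[row, 1].set_title(name + " sample ACF")
plt.tight_layout()
plt.show()
```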
Some examples of non-stationary processes

Example 9 A trended process.

Xt = βt + εt ,

where t = 1, . . . , T , so that βt is a deterministic time trend.

Example 10 A random walk process

Xt = Xt−1 + εt , t≥0

Example 11 A process with a break

Xt = εt , t < k
Xt = µ + εt , t ≥ k

where µ ≠ 0.
And some more graphs
[Figure: Random walk, time series plot and sample autocorrelation function]

[Figure: Trend-stationary process, time series plot and sample autocorrelation function]
The Lag Operator

The lag operator L maps a sequence {xt } into a sequence {yt }


such that
yt = Lxt = xt−1 , for all t.

If we apply L repeatedly on a process, for instance L (L (Lxt )) ,


we will use the convention

L (L (Lxt )) = L3 xt = xt−3 .

We can also form polynomials, ap (L) = 1 + a1 L + a2 L2 + ... +


ap Lp , such that

ap (L) xt = xt + a1 xt−1 + ... + ap xt−p .


Modelling Serial correlation: Linear processes

A fundamental result: The Wold Theorem

Let {yt } be a stationary time series with E (yt ) = µ and var (yt ) < ∞.
A fundamental decomposition result is the following
Wold Representation theorem:


yt = µ + Σ_{j=0}^{∞} ψj εt−j                    (1)

With:

ψ0 = 1

(square-summability) Σ_{j=0}^{∞} ψj² < ∞

εt is a white noise process with zero mean and variance σ²


Implications

VERY IMPORTANT: any stationary process can ALWAYS be


written as a linear process

The resulting process is an MA(∞) process: a linear combination of current and past values of a white noise process

Very easy to manipulate analytically


Moments of the MA(∞) process

mean: E(yt ) = µ

variance: var(yt ) = σ² Σ_{j=0}^{∞} ψj²

Square-summability is a stationarity condition: Σ_{j=0}^{∞} ψj² < ∞

Often a more demanding condition is required (one that is NOT guaranteed by the Wold Theorem!):

Absolute summability: Σ_{j=0}^{∞} |ψj | < ∞

Note: absolute summability ⇒ square-summability (see Hamilton, Appendix 3.A). [The converse is not true in general.]
Autocovariances of the MA(∞) process

γj = E[(yt − µ)(yt−j − µ)]

   = E[(Σ_{k=0}^{∞} ψk εt−k )(Σ_{h=0}^{∞} ψh εt−j−h )]

   = σ² Σ_{k=0}^{∞} ψj+k ψk ,    j = 0, 1, 2, . . .
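For an MA(1), for example, the formula gives γ0 = σ²(1 + θ²), γ1 = σ²θ and γj = 0 for j ≥ 2, which can be checked by simulation (a small sketch assuming numpy; the parameter values are illustrative):

```python
import numpy as np

theta, sigma2, T = 0.8, 1.0, 200_000
rng = np.random.default_rng(1)
eps = rng.normal(scale=np.sqrt(sigma2), size=T + 1)
x = eps[1:] + theta * eps[:-1]          # MA(1): x_t = eps_t + theta * eps_{t-1}

def gamma_hat(x, j):
    """Sample autocovariance at lag j."""
    xc = x - x.mean()
    return np.dot(xc[j:], xc[:x.size - j]) / x.size

print("gamma_0: theory", sigma2 * (1 + theta**2), "sample", round(gamma_hat(x, 0), 3))
print("gamma_1: theory", sigma2 * theta,          "sample", round(gamma_hat(x, 1), 3))
print("gamma_2: theory", 0.0,                     "sample", round(gamma_hat(x, 2), 3))
```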
Ergodicity for the mean (MA(∞) process)

A stationary linear process is ergodic for the mean

Recall that a sufficient condition for this is γ(n) → 0 as n → ∞


Ergodicity for second moments of the MA(∞) process

Recall that ergodicity for second moments requires

Σ_{j=0}^{∞} |γj | < ∞

It can be shown that Σ_{j=0}^{∞} |ψj | < ∞ implies Σ_{j=0}^{∞} |γj | < ∞

Proof: See Appendix 3.A (Hamilton)

Thus, a stationary process with Σ_{j=0}^{∞} |ψj | < ∞ is ergodic for second moments.
MA(q) processes
The Wold representation theorem allows us to write any sta-
tionary process as a (potentially infinite) linear combination of a
white noise process: MA(∞).

If the number of terms in this linear combination is finite:


MA(q) process

You can check that in a MA(q) process the first q autocorre-


lations are different from zero and the rest are equal to zero.
An alternative representation: AR processes

From the Wold representation (MA(∞)) it’s possible to obtain an alternative representation:

Autoregressive representation (AR(∞) process)

We will obtain this representation by “inverting” the MA poly-


nomial.

To do this properly, we will define first the concept of “Filter”


of a process.
Filters

Given a sequence of real numbers (α0 , α1 , . . . ), define a filter as


a polynomial in L:

α(L) = α0 + α1 L + α2 L² + . . .

If you apply this filter to a process {xt }, you get:

α(L)xt = α0 xt + α1 xt−1 + α2 xt−2 + . . . = Σ_{j=0}^{∞} αj xt−j
Inversion

L−1 is the inverse of L, such that L−1 (L) xt = xt .

Lag polynomials can also be inverted.

The inverse of a polynomial φp (L) is given by the coefficients αi of α(L) = φp (L)⁻¹ = 1 + α1 L + α2 L² + ..., such that

φp (L) φp (L)⁻¹ = 1.
Example
Let p=1. Find the inverse of φ1 (L) = (1 − φL) .

This amounts to finding the αi ’s that verify

(1 − φL)(1 + α1 L + α2 L² + ...) = 1.
Matching terms in L^j , it follows that

−φ + α1 = 0 =⇒ α1 = φ,
−φα1 + α2 = 0 =⇒ α2 = φ².

...

Therefore

(1 − φL)⁻¹ = 1 + Σ_{j=1}^{∞} φ^j L^j ,    provided |φ| < 1.
It is easy to check that (1 + Σ_{j=1}^{∞} φ^j L^j ) is the inverse of (1 − φL) since:

(1 − φL)(1 + Σ_{j=1}^{k} φ^j L^j ) = 1 − φ^{k+1} L^{k+1} → 1 as k → ∞
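This truncation argument can also be checked numerically by multiplying the two coefficient vectors, using convolution as polynomial multiplication in the lag operator (a sketch assuming numpy; the values of φ and k are illustrative):

```python
import numpy as np

phi, k = 0.8, 40
ar_poly = np.array([1.0, -phi])            # coefficients of (1 - phi*L), ordered by power of L
inv_trunc = phi ** np.arange(k + 1)        # 1 + phi*L + ... + phi^k L^k

product = np.convolve(ar_poly, inv_trunc)  # coefficients of (1 - phi*L)(1 + ... + phi^k L^k)
print(product[:3])    # approximately [1, 0, 0, ...]
print(product[-1])    # leftover term -phi**(k+1), negligible for |phi| < 1
```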
Example 12 Let p=2. Find the inverse of φ2 (L) = 1 − φ1 L − φ2 L².

If p > 1, we can invert the polynomial φp (L) by first factoring it and, then, using the formula for p=1. For example, let 1/λ1 and 1/λ2 be the roots of φ2 (L). Then,

1 − φ1 L − φ2 L² = (1 − λ1 L)(1 − λ2 L)

Provided |λ1 | , |λ2 | < 1,

(1 − φ1 L − φ2 L²)⁻¹ = (1 − λ1 L)⁻¹ (1 − λ2 L)⁻¹ = (Σ_{j=0}^{∞} λ1^j L^j )(Σ_{j=0}^{∞} λ2^j L^j )
AR(∞) processes

Consider the MA(∞) representation of a stationary linear pro-


cess:

xt = ψ(L) εt

where ψ(L) = 1 + ψ1 L + ψ2 L² + . . .

By inverting ψ(L) it’s possible to obtain an alternative representation for xt

Denote: φ(L) = ψ(L)⁻¹ = 1 − φ1 L − φ2 L² − . . .

For the inverse to exist: the roots of ψ (L) have to be larger


than 1 in absolute value (invertibility condition)
Then, provided the roots of ψ (L) are larger than 1 in absolute
value (i.e., xt is invertible) then

xt = ψ ( L ) εt ⇒

xt = φ1 xt−1 + φ2 xt−2 + · · · + εt ,

which can also be written as

φ ( L ) xt = εt

where φ(L) = ψ (L)−1

This is the autoregressive representation (of order ∞) of xt

AR representation: lags of the dependent variable plus a white


noise process.
Note: about invertibility

In time series a process is called invertible if an AR represen-


tation exists

Then,

An AR process is always invertible (obviously, as it’s already


written in AR form)

A MA process is invertible provided the roots of ψ (L) are larger


than 1 in absolute value
AR(p) processes

If the AR representation contains a finite number (p) of lags:


AR(p) process

xt = µ + φ1 xt−1 + φ2 xt−2 + · · · + φp xt−p + εt

In contrast to MA process, AR processes are NOT always sta-


tionary

For instance: a random walk is a non-stationary AR(1) process:

xt = xt−1 + εt

(the variance of this process is NOT constant).


Stationarity condition for AR processes

The roots of the polynomial φ(L) need to be larger than 1 in


absolute value

[Typical wording of this condition in time series books: “The roots of φ(L) have to lie outside the unit circle”]

An example (where this condition doesn’t hold): random walk

yt = yt−1 + εt

φ(L) = 1 − L; root=1; not stationary

The theory that follows doesn’t apply to this type of processes:


only applies to stationary and ergodic ones.
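A sketch of how the stationarity condition can be checked numerically for given AR coefficients: build the polynomial φ(z) = 1 − φ1 z − ... − φp z^p and verify that all its roots are larger than 1 in absolute value (numpy assumed; the helper name and example coefficients are illustrative):

```python
import numpy as np

def is_stationary(ar_coefs):
    """ar_coefs = [phi_1, ..., phi_p] from x_t = phi_1 x_{t-1} + ... + phi_p x_{t-p} + eps_t.
    True if all roots of phi(z) = 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    poly = np.concatenate(([1.0], -np.asarray(ar_coefs, dtype=float)))  # [1, -phi_1, ..., -phi_p]
    roots = np.roots(poly[::-1])       # np.roots expects the highest power first
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary([0.8]))        # AR(1), phi = 0.8 -> True
print(is_stationary([1.0]))        # random walk     -> False (unit root)
print(is_stationary([0.5, 0.3]))   # AR(2)           -> True
```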
ARMA(p,q) processes

The MA(∞) and AR(∞) processes are not very useful in prac-
tice: both have an infinite number of coefficients!

Consider the following stationary process:

xt = ψ ( L ) εt
The following approximation writes the MA polynomial (with an infinite number of terms) as a ratio of two finite-order polynomials:

ψ(L) ≈ θq (L) / φp (L)
Then,

φp (L)xt = θq (L)εt

xt is an ARMA(p, q) process.

it has p autoregressive terms and a moving average component


of order q:

Main advantage of ARMA(p,q): it depends on a finite number


of parameters

ARMA(p,q) is the most popular way of modelling (univariate)


serial correlation.
ARMA:

is stationary if the AR stationarity condition is verified (all the


roots of φ(L) outside of the unit circle)

is invertible (=admits an AR representation) if the roots of the


MA polynomial θ (L) lie outside of the unit circle
Estimation

Assume the model is correctly specified

AR(p) models: OLS

MA(q) models: since the error term εt is not observable: max-


imum likelihood

ARMA(p,q): maximum likelihood (same reason!)
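In practice these estimators are implemented in standard software; assuming statsmodels is available, an ARMA(p, q) can be estimated by (Gaussian) maximum likelihood roughly as follows (a sketch with simulated data and illustrative parameter values, not the handout’s own code):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate an ARMA(1,1): x_t = 0.7 x_{t-1} + eps_t + 0.3 eps_{t-1}
rng = np.random.default_rng(0)
T = 2000
eps = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.7 * x[t - 1] + eps[t] + 0.3 * eps[t - 1]

# An ARMA(p, q) is an ARIMA(p, 0, q); estimation is by maximum likelihood
res = ARIMA(x, order=(1, 0, 1)).fit()
print(res.params)   # constant, AR(1) and MA(1) coefficients, innovation variance
```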


Limit Theorems for the sample mean

Consider the sample mean associated to a stochastic process


{yt }:

ȳT = T⁻¹ Σ_{t=1}^{T} yt

The Law of Large Numbers and the Central Limit Theorem are
the most important results for computing the limit of this sequence.

There are different versions of these theorems that differ in the


allowed degree of dependence
for i.i.d. processes
for m.d.s. processes
for stationary processes
Limit theorems: i.i.d and m.d.s cases
For completeness (you already know this!)

Theorem 1 (Weak law of large numbers for iid sequences) If {yt }


is an i.i.d sequence of random variables with finite mean µ then
ȳT = T⁻¹ Σ_{t=1}^{T} yt →p µ

If we further assume that var(yt ) = σ² < ∞, then a simple proof can be provided. By Chebyshev’s inequality:

P(|T⁻¹ Σ_{t=1}^{T} yt − µ| > ε) ≤ var(T⁻¹ Σ_{t=1}^{T} yt )/ε²

                              = T⁻² Σ_{t=1}^{T} var(yt )/ε²

                              = T σ²/(T² ε²) → 0.
Theorem 2 (Central limit theorem for i.i.d. sequences) If {yt } is a sequence of iid(µ, σ²) random variables then

√T (ȳT − µ)/σ →d N(0, 1).

A more general version of this theorem can be stated as follows.

Theorem 3 (Central limit theorem for martingale difference sequences) Let {yt } be a martingale difference sequence. If a) E(yt²) = σt² > 0 with T⁻¹ Σ_{t=1}^{T} σt² → σ², b) E(|yt |^r ) < ∞ for some r > 2 and all t, and c) T⁻¹ Σ_{t=1}^{T} yt² →p σ², then √T ȳT →d N(0, σ²).
Limit theorems for stationary processes

Theorem 4 If {yt } is stationary with mean µ and autocovariance function γ(.), then

i) E(ȳT ) = µ

ii) If γ(T ) → 0 as T → ∞, then Var(ȳT ) = E(ȳT − µ)² → 0.

iii) If Σ_{h=−∞}^{∞} |γ(h)| < ∞, then T E(ȳT − µ)² → Σ_{h=−∞}^{∞} γ(h).

Proof. See Brockwell and Davis (1991), p. 219.


Notice that i) and ii) imply that ȳT converges in mean square
to µ. [Recall that mean square convergence implies convergence
in probability]

thus, the sample mean is consistent provided γ (T ) → 0 (re-


member that this is the ergodicity condition).

Then, this theorem presents a weak Law of Large Numbers


for stationary and ergodic processes.
Theorem 5 (Central limit theorem for dependent processes) Let {yt } be a stationary sequence given by yt = µ + Σ_{j=0}^{∞} ψj εt−j , where {εt } is an iid(0, σ²) sequence of random variables, Σ_{j=0}^{∞} |ψj | < ∞ and Σ_{j=0}^{∞} ψj ≠ 0. Then

√T (ȳT − µ) →d N(0, Σ_{j=−∞}^{∞} γj ).

Proof. See Brockwell and Davis (1991), Section 7.3.

The limit of T E(ȳT − µ)², Σ_{j=−∞}^{∞} γ(j), is called the long run variance of ȳT .
Remarks
Remark 1
Notice this theorem implies that weak stationarity is not enough!
why?

εt is assumed to be i.i.d.

Under this assumption yt is strictly stationary! (see Hayashi,


proposition 6.1.d)

Remark 2
Alternative versions of the C.L.Ts for dependent processes ex-
ist. See Hayashi, Theorem 6.10

Main difference between the two theorems: a different sufficient


condition for ergodicity for second moments (Gordin’s condition).
Remark 3

Notice that the long run variance (LRV)

LRV = Σ_{j=−∞}^{∞} γ(j)

can also be written as (since γj = γ−j )

LRV = γ0 + 2 Σ_{j=1}^{∞} γ(j)

    = σ² ψ(1)² = σ² (Σ_{j=0}^{∞} ψj )²

[See Hamilton p. 62 for the proof of this last equality.]
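As a concrete check of this identity: for an AR(1) with coefficient φ we have ψj = φ^j and γj = σ² φ^j /(1 − φ²), and both expressions equal σ²/(1 − φ)² (a small numerical sketch; the parameter values are illustrative):

```python
import numpy as np

phi, sigma2 = 0.8, 1.0

# LRV from the autocovariances: gamma_0 + 2 * sum_{j>=1} gamma_j
gammas = sigma2 * phi ** np.arange(1, 200) / (1 - phi**2)
lrv_from_gammas = sigma2 / (1 - phi**2) + 2 * gammas.sum()

# LRV from the MA(inf) weights: sigma^2 * (sum_j psi_j)^2 = sigma^2 / (1 - phi)^2
lrv_from_psi = sigma2 * (1.0 / (1 - phi)) ** 2

print(lrv_from_gammas, lrv_from_psi)   # both approximately 25.0
```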


A bit of intuition behind the LRV formula
Estimation of the LRV
Remarks

Parametric estimates require us to specify a model of yt

If the estimates of the coefficients in the LRV are consistent,


then the estimated LRV is consistent as well

But notice that this requires correct specification!

Parametric estimates are in general more efficient than non-


parametric ones

But: provided the model is correctly specified!


Non-parametric estimation of the LRV

Problems

1. How to pick q?

2. q must grow with T in order for the estimated LRVq to converge in probability to the LRV.

3. In finite samples, the estimated LRVq might have bad properties; it can even be negative.
Kernel-based estimators

Non-parametric estimator of the LRV

The idea is:


1) estimate non-parametrically the autocovariance function (up to a number, q [bandwidth or truncation parameter]).

2) Compute the estimate of the LRV as the weighted average of these autocovariances. How? The kernel provides these weights.

Advantage: these weighting schemes improve the properties of


the LRV estimator (more specifically, yield positive estimates!)
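A hand-coded sketch of the most common version of this idea, the Bartlett (Newey-West) kernel estimator, for a univariate series (numpy assumed; the bandwidth rule in the example is only a common rule of thumb, not the only choice):

```python
import numpy as np

def newey_west_lrv(x, q):
    """Bartlett-kernel (Newey-West) estimate of the long-run variance:
    LRV_hat = gamma_hat(0) + 2 * sum_{j=1}^{q} (1 - j/(q+1)) * gamma_hat(j)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    T = xc.size
    gamma = lambda j: np.dot(xc[j:], xc[:T - j]) / T
    return gamma(0) + 2.0 * sum((1.0 - j / (q + 1)) * gamma(j) for j in range(1, q + 1))

# Example: AR(1) with phi = 0.8 and sigma^2 = 1, so the true LRV is 1/(1-0.8)^2 = 25
rng = np.random.default_rng(7)
T, phi = 50_000, 0.8
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + rng.standard_normal()
for q in (5, 20, 100):
    print(q, round(newey_west_lrv(y, q), 2))   # downward-biased for small q, approaches 25 as q grows
```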
Kernel-based Estimators
Examples of Kernel Weight Functions
Remarks

Newey and West (1994, Econometrica) gave Monte Carlo evidence that the choice of bandwidth, q(T ), is more important than the choice of kernel.

See Hall (2005), section 3.5 for more details.


Multivariate Time Series

So far we’ve focused on univariate time series processes

Recall that our goal is to compute the CLT for the sample mean of xt εt , which is a k × 1 vector!

Thus, we need to extend the previous concepts to the multi-


variate set-up

Notation is more complicated but concepts are very similar!


Multivariate Time Series
Stationarity and Ergodicity
Means and var-cov matrix
Autocovariance of order k
Sample counterparts of the population covariances
Multivariate Wold Representation
Limit theorems for multivariate ergodic pro-
cesses

The theorems stated in the univariate case extend to the mul-


tivariate case in a similar way

LLN: the requirements are 1) weak stationarity and 2) covari-


ance function → 0 (ergodicity for the mean)

CLT: the requirements are 1) strict stationarity and ergodic-


ity for second moments (absolute summability of the matrices of
coefficients of the Wold representation)

See Brockwell and Davis, Propositions 11.2.1. and 11.2.2 for


further details.
A popular multivariate model: VAR(p) model

VAR: Vector Autoregressive Model

Made famous in Chris Sims’s paper “Macroeconomics and Reality” (Sims is the 2011 Nobel Laureate)

Natural extension of univariate AR(p) processes to the multi-


variate set-up

Has proven to be very useful for describing the dynamic behavior


of economic time series
A quick introduction to VAR(p) models
Stationarity Condition
Central Limit Theorem and LRV
Estimation of LRV

As in the univariate case, two ways

Parametric: estimate the VAR and use the coefficients of the


VAR to estimate the LRV

Non-parametric: estimate the covariances without imposing


any parametric condition (i.e., not assuming that the process fol-
lows a VAR(p))
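A sketch of the parametric route, under the assumption that the vector process is well approximated by a VAR(p) and that statsmodels’ VAR class is available: fit the VAR and form LRV = Ψ(1) Σu Ψ(1)′ with Ψ(1) = (I − A1 − ... − Ap)⁻¹ (the function name and example values below are illustrative):

```python
import numpy as np
from statsmodels.tsa.api import VAR

def var_parametric_lrv(data, p):
    """Parametric LRV from a fitted VAR(p): Psi(1) Sigma_u Psi(1)',
    with Psi(1) = (I - A_1 - ... - A_p)^{-1}."""
    res = VAR(data).fit(p)
    k = data.shape[1]
    a_sum = res.coefs.sum(axis=0)          # A_1 + ... + A_p; res.coefs has shape (p, k, k)
    psi1 = np.linalg.inv(np.eye(k) - a_sum)
    sigma_u = np.asarray(res.sigma_u)      # innovation covariance matrix
    return psi1 @ sigma_u @ psi1.T

# Illustrative use with simulated bivariate VAR(1) data
rng = np.random.default_rng(3)
A = np.array([[0.5, 0.1], [0.0, 0.3]])
T = 5000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.standard_normal(2)
print(var_parametric_lrv(y, p=1))
```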
Parametric Estimate of LRV using VAR(p) model
Parametric Estimate of LRV using VAR(p) model, II
Non-Parametric estimator of the LRV
Regression with autocorrelated errors.

OLS and GMM


OLS with autocorrelated errors
Assume that {gt } = {xt εt } is a stationary and ergodic (for second moments) process, given by

gt = Ψ(L)ηt ,    ηt ∼ i.i.d.(0, Σ)

Then,

T⁻¹ Σ_{t=1}^{T} xt εt →p 0

T^{−1/2} Σ_{t=1}^{T} xt εt →d N(0, LRV)

With

S = LRV = Γ0 + Σ_{j=1}^{∞} (Γj + Γj′ ) = Ψ(1) Σ Ψ(1)′

Γ0 = E(gt gt′ ) = E(xt xt′ εt² ),    Γj = E(gt gt−j′ ) = E(xt xt−j′ εt εt−j )
Asymptotics with autocorrelated errors
OLS with autocorrelated errors

Bottom line:

(Provided the x’s are still exogenous!) the OLS estimator is


still consistent

However, the standard errors need to be adjusted: HAC stan-


dard errors!
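Assuming statsmodels is available, HAC (Newey-West) standard errors for OLS can be requested directly; a sketch with a simulated persistent regressor and persistent errors (parameter values are illustrative):

```python
import numpy as np
import statsmodels.api as sm

# y_t = 1 + 0.5 x_t + e_t, with both x_t and e_t persistent AR(1) processes
rng = np.random.default_rng(0)
T = 1000
x = np.zeros(T)
e = np.zeros(T)
for t in range(1, T):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()
    e[t] = 0.7 * e[t - 1] + rng.standard_normal()
y = 1.0 + 0.5 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                                          # classical (iid) standard errors
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 8})   # Newey-West / HAC standard errors
print(ols.bse)   # understate the sampling variability here, since x_t * e_t is autocorrelated
print(hac.bse)   # robust to heteroskedasticity and autocorrelation
```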
HAC standard errors
Are the x’s still exogenous?

Serial correlation in the residuals can make the regressors en-


dogenous!

Example:

yt = αyt−1 + εt

if εt is white noise, then yt−1 is “predetermined” (exogenous, uncorrelated with εt )

But if εt is autocorrelated, then yt−1 can become endogenous.


An example: εt is AR(1)

εt = βεt−1 + ηt

It’s clear that corr(yt−1 , εt ) = corr(αyt−2 + εt−1 , εt ) ≠ 0

In fact, the model is misspecified. Notice that

εt = ηt /(1 − βL)

Which implies:

(1 − βL)yt = α(1 − βL)yt−1 + ηt

That is, yt is an AR(2) process (with white noise residuals)
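A small Monte Carlo sketch of this inconsistency (numpy assumed; parameter values are illustrative): with α = 0.5 and β = 0.5, the OLS regression of yt on yt−1 recovers the first autocorrelation of the implied AR(2), roughly (α + β)/(1 + αβ) = 0.8, rather than α = 0.5.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, beta, T, reps = 0.5, 0.5, 1000, 500
estimates = []
for _ in range(reps):
    eta = rng.standard_normal(T)
    eps = np.zeros(T)
    y = np.zeros(T)
    for t in range(1, T):
        eps[t] = beta * eps[t - 1] + eta[t]   # AR(1) error
        y[t] = alpha * y[t - 1] + eps[t]      # regression with a lagged dependent variable
    ylag, ycur = y[:-1], y[1:]
    estimates.append(np.dot(ylag, ycur) / np.dot(ylag, ylag))   # OLS slope of y_t on y_{t-1}
print(np.mean(estimates))   # close to 0.8, not the true alpha = 0.5
```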


Conclusion: Serial correlation with lagged dependent
variables

Be careful with lagged dependent variables in the presence of


serial correlation in the residuals.

OLS can become inconsistent

Try to specify the model in such a way that the residuals look
uncorrelated (you can test for this)

Serial correlation in the residuals doesn’t always mean that lagged dependent variables will become endogenous (see Wooldridge (Introductory Econometrics), Chapter 12 for an example).
GMM with Serially Correlated errors
Recall:

(δ̂ − δ0 ) = (Sxz′ Ŵ Sxz )⁻¹ Sxz′ Ŵ ḡ

GMM is consistent (ḡ →p 0 by the LLN)

The asymptotic distribution is different (as the variance of ḡ is


different under serial correlation)
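A stylized sketch of how the efficient weighting changes: Ŝ must now be a HAC (long-run variance) estimate of the variance of √T ḡ rather than a heteroskedasticity-only estimate. The two-step linear IV-GMM below is illustrative code written under that assumption, not Hayashi’s notation verbatim.

```python
import numpy as np

def hac_S(G, q):
    """Bartlett-kernel estimate of S, the long-run variance of the moments g_t (G is T x L)."""
    G = G - G.mean(axis=0)
    T = G.shape[0]
    S = G.T @ G / T
    for j in range(1, q + 1):
        Gamma_j = G[j:].T @ G[:T - j] / T
        S += (1.0 - j / (q + 1)) * (Gamma_j + Gamma_j.T)
    return S

def two_step_gmm_hac(y, X, Z, q=8):
    """Linear GMM with moments g_t = z_t (y_t - x_t' delta) and a HAC second-step weight."""
    T = y.size
    Sxz, szy = Z.T @ X / T, Z.T @ y / T
    W1 = np.linalg.inv(Z.T @ Z / T)                               # first-step weight
    d1 = np.linalg.solve(Sxz.T @ W1 @ Sxz, Sxz.T @ W1 @ szy)
    G = Z * (y - X @ d1)[:, None]                                 # moment contributions g_t
    W2 = np.linalg.inv(hac_S(G, q))                               # efficient weight: S_hat^{-1}
    d2 = np.linalg.solve(Sxz.T @ W2 @ Sxz, Sxz.T @ W2 @ szy)
    avar = np.linalg.inv(Sxz.T @ W2 @ Sxz)                        # Avar(delta_hat) when W = S^{-1}
    return d2, np.sqrt(np.diag(avar) / T)                         # estimates and HAC standard errors
```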
Estimation of the LRV
GMM with Serially Correlated errors
Wrapping up. . .

If working with time series data, serial correlation is very likely

Provided the instruments/regressors continue to be exogenous

OLS/GMM will still be consistent

The asymptotic distribution will be different (Avar(δ̂) will be different)

New standard errors are needed, robust to heteroskedasticity and autocorrelation

Parametric and non-parametric estimators of the LRV


References
Hayashi, Chapter 6.
Hamilton, Chapters 1 and 7.
Brockwell, P. J., and R. A. Davis (1991), Chapters 1 and 6.
Stock and Watson, Chapter 14.
