
Advanced Econometrics II

Master in Economics and Finance, BGSE

Handout 2: Serial Correlation

Laura Mayoral
IAE and Barcelona GSE
Barcelona, Winter 2021
Goal

This handout extends the single-equation GMM model (Hayashi, Chapter 3) to incorporate serial correlation.

Serial correlation arises in the context of time series data. Thus, in the following, we will use the subindex “t” (= time) rather than “i”.

Serial correlation: correlation between a random variable and its lagged values.

In the GMM context: we will allow for serial correlation in the product of the vector of instruments and the error term.
The problem

When time series data is employed, serial correlation is likely


to be the norm, not the exception

What are the properties of the estimators (OLS, GMM,. . . )


under serial correlation of Xt εt ?

Still consistent?

Same asymptotic distribution?

Still consistent (provided X remains exogenous)

But different asymptotic distribution (more precisely, different


Avar)
To develop the asymptotic distribution of the GMM estima-
tor, so far we have assumed that Xt εt is a martingale difference
sequence.

Recall that by definition of m.d.s.,

E(Xt εt | Xt−1 εt−1 , Xt−2 εt−2 , . . . ) = 0

which implies that the process {Xt εt } is non-autocorrelated. (Note:


use the law of iterated expectations to check this).

We will now relax this hypothesis: allow autocorrelation in


{Xt εt }

Then: we need to use a new C.L.T for serially dependent pro-


cesses!
This handout has two parts:

Basic background in time series analysis. Limit theorems for


weakly dependent processes

GMM with serial correlation


Basic elements in Time Series Analysis.

Limit theory for weakly dependent processes


Outline
I. Univariate Time Series Models:

1. Autocovariance and autocorrelation function

2. Strict and Weak Stationarity. Ergodicity.

3. Modelling weakly dependent processes: The Wold theorem (MA


processes)

4. AR and ARMA processes

5. Limit theorems for stationary and ergodic processes

6. Estimation of the long run variance.

II. Multivariate Time Series Models.


Time series data

A time series is a set of observations

y1 , y2 , . . . , yt , . . . , yT ,

where t is the time index.

Time series data come with a natural temporal ordering


Random Sampling vs. Dependent stochastic
processes

In cross-sectional analysis: observations (yi , xi ), i = {1, ..., N }


are (typically assumed to be) randomly drawn from a fixed popu-
lation. N observations from the same distribution. No ordering.

Random sampling implies that observations from different units


are independently distributed.

Time series observations (yt , xt ) , t = {1, ...T } are in general


non-independent.

−→ Dependence among observations is a key feature in time series


variables.
Univariate vs Multivariate time series processes

Univariate time series process: {yt }, where yt is a random variable

Multivariate time series process: {yt }, where yt is a vector of random variables: (y1t , . . . , ykt )′

We begin by analysing key aspects of univariate time series


processes
Very similar in multivariate time series (notation is a bit more
complicated)
The autocovariance function

Consider a univariate time series process {Xt }

The autocovariance function is a measure of linear dependence between elements of the sequence {Xt , t ∈ Z}

It extends the concept of the covariance matrix (computed for a finite number of random variables) to the case of an infinite collection of random variables.

Definition 1 The autocovariance function. If {Xt , t ∈ T } is a pro-


cess such that V ar (Xt ) < ∞ for each t ∈ T , then the autocovariance
function γX (., .) of Xt is defined by

γX (r, s) = Cov (Xr , Xs )


= E [(Xr − E (Xr ))(Xs − E (Xs ))], r, s ∈ T .
Weak Stationarity and Strict stationarity

Stationarity is a crucial concept.

There are two basic definitions of stationarity:

strict and

weak (or second-order) stationarity.

In both cases, stationarity imposes stability over time, either in the joint distributions or in the moments of the random variables in the stochastic process.
Strict stationarity

Definition 2 (Finite-dimensional distributions). Let T be the set of all vectors {t = (t1 , ..., tn )′ ∈ T^n : t1 < t2 < ... < tn , n = 1, 2, ...}. Then the finite-dimensional distribution functions of {Xt , t ∈ T } are the functions {Ft (.), t ∈ T } defined for t = (t1 , ..., tn )′ by

Ft (x) = P (Xt1 ≤ x1 , ..., Xtn ≤ xn ),    x = (x1 , ..., xn )′ ∈ R^n
Definition 3 (First, second and n-th order stationarity). The time series {Xt , t ∈ Z} is said to be first-order, second-order and n-th order stationary, respectively, if

Ft (xt1 ) = Ft (xt1 +h ),    for any t1 , h;

Ft (xt1 , xt2 ) = Ft (xt1 +h , xt2 +h ),    for any t1 , t2 , h;

Ft (xt1 , xt2 , ..., xtn ) = Ft (xt1 +h , xt2 +h , ..., xtn +h ),    for any t1 , t2 , ..., tn , h.

Definition 4 (Strict stationarity). The time series {Xt , t ∈ Z} is said to be strictly stationary if the joint distributions of (Xt1 , ..., Xtk )′ and (Xt1 +h , ..., Xtk +h )′ are the same for all positive integers k and for all t1 , ..., tk , h ∈ Z. In other words, {Xt , t ∈ Z} is strictly stationary if it is n-th order stationary for any n.
Interpretation:

This means that the graphs over two equal-length time intervals
of a realisation of the time series should exhibit similar statistical
characteristics.

Joint finite dimensional distributions are difficult to work with.


The following concept introduces a notion of stationarity that can
be characterized by only looking at first and second moments.
Weak Stationarity

Definition 5 (Weak stationarity). The time series {Xt , t ∈ Z} is said to be weakly stationary if

i) E(Xt²) < ∞ for all t ∈ Z;

ii) E(Xt ) = m for all t ∈ Z;

iii) γX (r, s) = γX (r + t, s + t) for all r, s, t ∈ Z.

This concept of stationarity is usually referred to in the litera-


ture as second-order stationarity, weak stationarity or covariance
stationarity.
Notice that stationarity also requires the variance of Xt to be constant. If Xt is stationary, then

V ar (Xr ) = γX (r, r ) = γX (r + t, r + t) = V ar (Xr +t ),


for all r, t ∈ Z.

(Weak) stationarity basically means that the mean and the variance are finite and constant and that the autocovariance function only depends on h, the distance between observations.
Stationarity and the autocovariance function

If {Xt , t ∈ Z} is stationary, then γX (r, s) = γX (r − s, 0) for all r,


s ∈ Z. Then, for stationary processes one can define the autoco-
variance as a function of only one parameter, that is

γX (h) = Cov (Xt+h , Xt ) for all t, h ∈ Z.

The function γX (.) will be referred to as the autocovariance


function of the process {Xt } and γX (h) is the value of this function
at lag h.

Notation: γX (h) or simply γh will denote the h-th autocovari-


ance of Xt .
If γ (.) is the autocovariance function of a stationary process,
then it verifies

i) γ (0) ≥ 0
ii) |γ (h)| ≤ γ (0) for all h ∈ Z
iii) γ (−h) = γ (h) for all h ∈ Z
The autocorrelation function

Definition 6 (Autocorrelation function, ACF). For a stationary process {Xt }, the autocorrelation function at lag h is defined as

ρX (h) = γX (h)/γX (0) = Corr (Xt+h , Xt ),    for all t, h ∈ Z.
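To make the definition concrete, the sample counterparts of γX (h) and ρX (h) can be computed as in the following minimal sketch (Python with numpy assumed; the function names are illustrative, not part of the handout):

```python
import numpy as np

def sample_autocovariance(x, h):
    """gamma_hat(h) = (1/T) * sum_{t=h+1}^{T} (x_t - xbar)(x_{t-h} - xbar)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    xc = x - x.mean()
    return np.dot(xc[h:], xc[:T - h]) / T

def sample_autocorrelation(x, h):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return sample_autocovariance(x, h) / sample_autocovariance(x, 0)

# For white noise the sample autocorrelations at h >= 1 should be close to zero
rng = np.random.default_rng(0)
eps = rng.standard_normal(1000)
print([round(sample_autocorrelation(eps, h), 3) for h in range(5)])
```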
The relation between Weak and Strict Stationarity

Strict stationarity implies weak stationarity, provided the first and second moments of the variables exist, but the converse of this statement is not true in general.

Taking k = 1 in Definition 4, it is clear that all the variables have the same distribution, which implies that the mean and the variance are the same for all variables (provided they exist).

Taking k = 2 in Definition 4, it follows that Cov (Xt1 , Xt2 ) = Cov (Xt1 +h , Xt2 +h ) for all t1 , t2 , h ∈ Z, implying condition iii) in the weak stationarity definition.
However, it is easy to find counterexamples where a process is
stationary but not strictly stationary.

Example 1 Consider a sequence of independent random variables


such that if t < T1 , Xt follows an exponential distribution with mean
and variance equal to 1 and if t ≥ T1 , Xt is normally distributed
with mean and variance equal to 1. {Xt } is weakly stationary but it is not
strictly stationary because Xt and Xt∗ have different distributions
if t < T1 and t∗ ≥ T1 .
The relation between Weak and Strict Stationarity, II

There is one important case where both concepts of stationarity are equivalent.

Definition 7 (Gaussian Time series) The process {Xt } is a Gaus-


sian time series if and only if the distribution functions of {Xt } are
all multivariate normal.

If {Xt , t ∈ Z} is a stationary Gaussian time series, then it is also strictly stationary, since for all n ∈ {1, 2, ...} and for all h, t1 , t2 , ... ∈ Z, the random vectors (Xt1 , ..., Xtn )′ and (Xt1 +h , ..., Xtn +h )′ have the same mean and covariance matrix, and hence they have the same distribution.
Ergodicity

Ergodicity is a condition that restricts the memory of the pro-


cess.

A loose definition of ergodicity is that the process is asymptot-


ically independent.

That is, for sufficiently large n, Yt and Yt+n are nearly independent. A more formal definition is provided below.

All of these definitions essentially say that the effect of current


events eventually disappears.
Definition 8 (Ergodicity for the mean). A covariance stationary process Yt is said to be ergodic for the mean if Ȳ = T⁻¹ Σ_{t=1}^{T} Yt converges in probability to E(Yt ).

Definition 9 (Ergodicity for the second moments). A covariance stationary process is said to be ergodic for the second moments if

(T − j)⁻¹ Σ_{t=j+1}^{T} (Yt − µ)(Yt−j − µ) →p γj ,    for all j.
Sufficient conditions for ergodicity:

• If γ(n) → 0 as n → ∞, then {Yt } is ergodic for the mean. (Proof: Brockwell and Davis, p. 219)

• If Σ_{j=0}^{∞} |γ(j)| < ∞, then {Yt } is ergodic for second moments. (Proof: Brockwell and Davis, p. 220)

• Furthermore, if {Yt } is a stationary Gaussian process and Σ_{j=0}^{∞} |γj | < ∞, then the process is ergodic for all moments.
Some examples of stationary processes

Example 2 iid sequences.


The sequence {εt } is i.i.d (independent and identically dis-
tributed) if all the variables are independent and share the same
univariate distribution.

Clearly, an iid sequence is strictly stationary and provided the


first and second order moments exist, it is also weakly stationary.

Example 3 White noise process.


The process {εt } is called white noise if it is weakly stationary with E(εt ) = 0 and autocovariance function

γε (h) = σ²  if h = 0,    γε (h) = 0  if h ≠ 0.
The white noise process is important because it is used as a building block for more general processes, as can be seen in some of the examples below.

An i.i.d. sequence with zero mean and variance σ² is also white noise. The converse is not true in general. Furthermore, a white noise process might not be strictly stationary.
Example 4 Martingale difference sequence, m.d.s.
A process {εt }, with E (εt ) = 0 is called a martingale difference
sequence if
E (εt |εt−1 , εt−2 , ...) = 0, for t ≥ 2.

Exercise:
Show that if E (εt ) = 0 and the second order moments exist then,

εt is i.i.d. ⇒ εt is m.d.s. ⇒ εt is white noise

but the converse implications are not true in general.


Example 5 Moving average of order one.
The process {Xt } is called a moving average of order 1, or
MA(1), if {Xt } is defined as

Xt = εt + θεt−1 ,

where {εt } is a white noise process. Xt is stationary for any value of θ.

Example 6 Moving average of order q.


The process {Xt } is called a moving average of order q, or MA(q), if {Xt } is defined as

Xt = εt + θ1 εt−1 + · · · + θq εt−q ,

where {εt } is a white noise process. Xt is stationary for any values of θ1 , . . . , θq .
Example 7 Autoregression of order 1.
The process {Xt } is called an autoregressive process of order
1 if {Xt } is defined as

Xt = φXt−1 + εt ,

where {εt } is a white noise process. Xt is stationary provided |φ| < 1.

Example 8 Autoregression of order p.


The process {Xt } is called an autoregressive process of order
p if {Xt } is defined as

Xt = φ1 Xt−1 + · · · + φp Xt−p + εt ,

where {εt } is a white noise process. Xt is stationary provided all the roots of the polynomial Φ(L) = 1 − φ1 L − · · · − φp L^p are larger than 1 in absolute value.
Some graphs
The graphs below correspond to simulated data.
[Figure: IID process, time series plot and sample autocorrelation function]

[Figure: AR(1) process, φ = 0.8, time series plot and sample autocorrelation function]

[Figure: MA(1) process, θ = 0.8, time series plot and sample autocorrelation function]
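The figures themselves are not reproduced here, but series and sample ACFs like the ones described above can be generated along these lines (a sketch assuming numpy and matplotlib are available; parameter values match the captions):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(123)
T = 300
eps = rng.standard_normal(T)

iid = eps                                             # i.i.d. N(0,1)
ma1 = eps + 0.8 * np.concatenate(([0.0], eps[:-1]))   # MA(1), theta = 0.8
ar1 = np.zeros(T)                                     # AR(1), phi = 0.8
for t in range(1, T):
    ar1[t] = 0.8 * ar1[t - 1] + eps[t]

def acf(x, max_lag=25):
    """Sample autocorrelations rho_hat(0), ..., rho_hat(max_lag)."""
    x = x - x.mean()
    g0 = np.dot(x, x) / x.size
    return np.array([np.dot(x[h:], x[:x.size - h]) / x.size / g0 for h in range(max_lag + 1)])

fig, axes = plt.subplots(3, 2, figsize=(10, 8))
for row, (name, series) in enumerate([("iid", iid), ("AR(1)", ar1), ("MA(1)", ma1)]):
    axes[row, 0].plot(series)
    axes[row, 0].set_title(name)
    axes[row, 1].stem(acf(series))
    axes[row, 1].set_title(name + " sample ACF")
plt.tight_layout()
plt.show()
```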
Some examples of non-stationary processes

Example 9 A trended process.

Xt = βt + εt ,

where t = 1, . . . , T , so that βt is a deterministic time trend.

Example 10 A random walk process

Xt = Xt−1 + εt , t≥0

Example 11 A process with a break

Xt = εt , t < k
Xt = µ + εt , t ≥ k

where µ ≠ 0.
And some more graphs
[Figure: Random walk, time series plot and sample autocorrelation function]

[Figure: Trend-stationary process, time series plot and sample autocorrelation function]
The Lag Operator

The lag operator L maps a sequence {xt } into a sequence {yt }


such that
yt = Lxt = xt−1 , for all t.

If we apply L repeatedly on a process, for instance L (L (Lxt )) ,


we will use the convention

L (L (Lxt )) = L3 xt = xt−3 .

We can also form polynomials, ap (L) = 1 + a1 L + a2 L2 + ... +


ap Lp , such that

ap (L) xt = xt + a1 xt−1 + ... + ap xt−p .


Modelling Serial correlation: Linear processes

A fundamental result: The Wold Theorem

Let {yt } be a stationary time series with E (yt ) = µ and var (yt ) < ∞.
A fundamental decomposition result is the following
Wold Representation theorem:


yt = µ + Σ_{j=0}^{∞} ψj εt−j                    (1)

With:

ψ0 = 1

(square-summability) Σ_{j=0}^{∞} ψj² < ∞

εt is a white noise process with zero mean and variance σ²


Implications

VERY IMPORTANT: any stationary process can ALWAYS be


written as a linear process

The resulting process is an MA(∞) process: a linear combination of current and past values of a white noise process

Very easy to manipulate analytically


Moments of the MA(∞) process

mean: E(yt ) = µ

variance: var(yt ) = σ² Σ_{j=0}^{∞} ψj²

Square-summability is a stationarity condition: Σ_{j=0}^{∞} ψj² < ∞

Often a more demanding condition is required (one that is NOT guaranteed by the Wold Theorem!):

Absolute summability: Σ_{j=0}^{∞} |ψj | < ∞

Note: absolute summability ⇒ square-summability (see Hamilton, Appendix 3.A). [The converse is not true in general.]
Autocovariances of the MA(∞) process

γj = E[(yt − µ)(yt−j − µ)]

   = E[(Σ_{k=0}^{∞} ψk εt−k )(Σ_{h=0}^{∞} ψh εt−j−h )]

   = σ² Σ_{k=0}^{∞} ψj+k ψk ,    j = 0, 1, 2, . . .
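For an MA(1), for example, the formula gives γ0 = σ²(1 + θ²), γ1 = σ²θ and γj = 0 for j ≥ 2, which can be checked by simulation (a small sketch assuming numpy; the parameter values are illustrative):

```python
import numpy as np

theta, sigma2, T = 0.8, 1.0, 200_000
rng = np.random.default_rng(1)
eps = rng.normal(scale=np.sqrt(sigma2), size=T + 1)
x = eps[1:] + theta * eps[:-1]          # MA(1): x_t = eps_t + theta * eps_{t-1}

def gamma_hat(x, j):
    """Sample autocovariance at lag j."""
    xc = x - x.mean()
    return np.dot(xc[j:], xc[:x.size - j]) / x.size

print("gamma_0: theory", sigma2 * (1 + theta**2), "sample", round(gamma_hat(x, 0), 3))
print("gamma_1: theory", sigma2 * theta,          "sample", round(gamma_hat(x, 1), 3))
print("gamma_2: theory", 0.0,                     "sample", round(gamma_hat(x, 2), 3))
```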
Ergodicity for the mean (MA(∞) process)

A stationary linear process is ergodic for the mean

Recall that a sufficient condition for this is γ(n) → 0 as n → ∞


Ergodicity for second moments of the MA(∞) process

Recall that ergodicity for second moments requires

Σ_{j=0}^{∞} |γj | < ∞

It can be shown that Σ_{j=0}^{∞} |ψj | < ∞ implies Σ_{j=0}^{∞} |γj | < ∞

Proof: See Appendix 3.A (Hamilton)

Thus, a stationary process with Σ_{j=0}^{∞} |ψj | < ∞ is ergodic for second moments.
MA(q) processes
The Wold representation theorem allows us to write any sta-
tionary process as a (potentially infinite) linear combination of a
white noise process: MA(∞).

If the number of terms in this linear combination is finite:


MA(q) process

You can check that in a MA(q) process the first q autocorre-


lations are different from zero and the rest are equal to zero.
An alternative representation: AR processes

From the Wold representation (MA(∞)) it’s possible to obtain an alternative representation:

Autoregressive representation (AR(∞) process)

We will obtain this representation by “inverting” the MA poly-


nomial.

To do this properly, we will define first the concept of “Filter”


of a process.
Filters

Given a sequence of real numbers (α0 , α1 , . . . ), define a filter as


a polynomial in L:

α(L) = α0 + α1 L + α2 L² + . . .

If you apply this filter to a process {xt }, you get:

α(L)xt = α0 xt + α1 xt−1 + α2 xt−2 + . . . = Σ_{j=0}^{∞} αj xt−j
Inversion

L−1 is the inverse of L, such that L−1 (L) xt = xt .

Lag polynomials can also be inverted.

The inverse of a polynomial φp (L) is given by the coefficients αi of α(L) = φp (L)⁻¹ = 1 + α1 L + α2 L² + ..., such that

φp (L) φp (L)⁻¹ = 1.
Example
Let p=1. Find the inverse of φ1 (L) = (1 − φL) .

This amounts to finding the αi ’s that verify

(1 − φL)(1 + α1 L + α2 L² + ...) = 1.
Matching terms in L^j , it follows that

−φ + α1 = 0 =⇒ α1 = φ,
−φα1 + α2 = 0 =⇒ α2 = φ².

...

Therefore

(1 − φL)⁻¹ = 1 + Σ_{j=1}^{∞} φ^j L^j ,    provided |φ| < 1.
It is easy to check that (1 + Σ_{j=1}^{∞} φ^j L^j ) is the inverse of (1 − φL) since:

(1 − φL)(1 + Σ_{j=1}^{k} φ^j L^j ) = 1 − φ^{k+1} L^{k+1} → 1 as k → ∞
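This truncation argument can also be checked numerically by multiplying the two coefficient vectors, using convolution as polynomial multiplication in the lag operator (a sketch assuming numpy; the values of φ and k are illustrative):

```python
import numpy as np

phi, k = 0.8, 40
ar_poly = np.array([1.0, -phi])            # coefficients of (1 - phi*L), ordered by power of L
inv_trunc = phi ** np.arange(k + 1)        # 1 + phi*L + ... + phi^k L^k

product = np.convolve(ar_poly, inv_trunc)  # coefficients of (1 - phi*L)(1 + ... + phi^k L^k)
print(product[:3])    # approximately [1, 0, 0, ...]
print(product[-1])    # leftover term -phi**(k+1), negligible for |phi| < 1
```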
Example 12 Let p=2. Find the inverse of φ2 (L) = 1 − φ1 L − φ2 L².

If p > 1, we can invert the polynomial φp (L) by first factoring it and, then, using the formula for p=1. For example, let 1/λ1 and 1/λ2 be the roots of φ2 (L). Then,

1 − φ1 L − φ2 L² = (1 − λ1 L)(1 − λ2 L)

Provided |λ1 | , |λ2 | < 1,

(1 − φ1 L − φ2 L²)⁻¹ = (1 − λ1 L)⁻¹ (1 − λ2 L)⁻¹ = (Σ_{j=0}^{∞} λ1^j L^j )(Σ_{j=0}^{∞} λ2^j L^j )
AR(∞) processes

Consider the MA(∞) representation of a stationary linear pro-


cess:

xt = ψ(L) εt

where ψ(L) = 1 + ψ1 L + ψ2 L² + . . .

By inverting ψ(L) it’s possible to obtain an alternative representation for xt

Denote: φ(L) = ψ(L)⁻¹ = 1 − φ1 L − φ2 L² − . . .

For the inverse to exist: the roots of ψ (L) have to be larger


than 1 in absolute value (invertibility condition)
Then, provided the roots of ψ (L) are larger than 1 in absolute
value (i.e., xt is invertible) then

xt = ψ ( L ) εt ⇒

xt = φ1 xt−1 + φ2 xt−2 + · · · + εt ,

which can also be written as

φ ( L ) xt = εt

where φ(L) = ψ (L)−1

This is the autoregressive representation (of order ∞) of xt

AR representation: lags of the dependent variable plus a white


noise process.
Note: about invertibility

In time series a process is called invertible if an AR represen-


tation exists

Then,

An AR process is always invertible (obviously, as it’s already


written in AR form)

A MA process is invertible provided the roots of ψ (L) are larger


than 1 in absolute value
AR(p) processes

If the AR representation contains a finite number (p) of lags:


AR(p) process

xt = µ + φ1 xt−1 + φ2 xt−2 + · · · + φp xt−p + εt

In contrast to MA process, AR processes are NOT always sta-


tionary

For instance: a random walk is a non-stationary AR(1) process:

xt = xt−1 + εt

(the variance of this process is NOT constant).


Stationarity condition for AR processes

The roots of the polynomial φ(L) need to be larger than 1 in


absolute value

[Typical wording of this condition in time series books: “The roots of φ(L) have to lie outside the unit circle”]

An example (where this condition doesn’t hold): random walk

yt = yt−1 + εt

φ(L) = 1 − L; root=1; not stationary

The theory that follows doesn’t apply to this type of processes:


only applies to stationary and ergodic ones.
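A sketch of how the stationarity condition can be checked numerically for given AR coefficients: build the polynomial φ(z) = 1 − φ1 z − ... − φp z^p and verify that all its roots are larger than 1 in absolute value (numpy assumed; the helper name and example coefficients are illustrative):

```python
import numpy as np

def is_stationary(ar_coefs):
    """ar_coefs = [phi_1, ..., phi_p] from x_t = phi_1 x_{t-1} + ... + phi_p x_{t-p} + eps_t.
    True if all roots of phi(z) = 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    poly = np.concatenate(([1.0], -np.asarray(ar_coefs, dtype=float)))  # [1, -phi_1, ..., -phi_p]
    roots = np.roots(poly[::-1])       # np.roots expects the highest power first
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary([0.8]))        # AR(1), phi = 0.8 -> True
print(is_stationary([1.0]))        # random walk     -> False (unit root)
print(is_stationary([0.5, 0.3]))   # AR(2)           -> True
```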
ARMA(p,q) processes

The MA(∞) and AR(∞) processes are not very useful in prac-
tice: both have an infinite number of coefficients!

Consider the following stationary process:

xt = ψ ( L ) εt
The following approximation writes the MA polynomial (with an infinite number of terms) as a ratio of two finite-order polynomials:

ψ(L) ≈ θq (L) / φp (L)
Then,

φp (L)xt = θq (L)εt

xt is an ARMA(p, q) process.

it has p autoregressive terms and a moving average component


of order q:

Main advantage of ARMA(p,q): it depends on a finite number


of parameters

ARMA(p,q) is the most popular way of modelling (univariate)


serial correlation.
ARMA:

is stationary if the AR stationarity condition is verified (all the


roots of φ(L) outside of the unit circle)

is invertible (=admits an AR representation) if the roots of the


MA polynomial θ (L) lie outside of the unit circle
Estimation

Assume the model is correctly specified

AR(p) models: OLS

MA(q) models: since the error term εt is not observable: max-


imum likelihood

ARMA(p,q): maximum likelihood (same reason!)
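In practice these estimators are implemented in standard software; assuming statsmodels is available, an ARMA(p, q) can be estimated by (Gaussian) maximum likelihood roughly as follows (a sketch with simulated data and illustrative parameter values, not the handout’s own code):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate an ARMA(1,1): x_t = 0.7 x_{t-1} + eps_t + 0.3 eps_{t-1}
rng = np.random.default_rng(0)
T = 2000
eps = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.7 * x[t - 1] + eps[t] + 0.3 * eps[t - 1]

# An ARMA(p, q) is an ARIMA(p, 0, q); estimation is by maximum likelihood
res = ARIMA(x, order=(1, 0, 1)).fit()
print(res.params)   # constant, AR(1) and MA(1) coefficients, innovation variance
```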


Limit Theorems for the sample mean

Consider the sample mean associated to a stochastic process


{yt }:

ȳT = T⁻¹ Σ_{t=1}^{T} yt

The Law of Large Numbers and the Central Limit Theorem are
the most important results for computing the limit of this sequence.

There are different versions of these theorems that differ in the


allowed degree of dependence
for i.i.d. processes
for m.d.s. processes
for stationary processes
Limit theorems: i.i.d and m.d.s cases
For completeness (you already know this!)

Theorem 1 (Weak law of large numbers for iid sequences) If {yt }


is an i.i.d sequence of random variables with finite mean µ then
ȳT = T⁻¹ Σ_{t=1}^{T} yt →p µ

If we further assume that var(yt ) = σ² < ∞, then a simple proof can be provided. By Chebyshev’s inequality:

P(|T⁻¹ Σ_{t=1}^{T} yt − µ| > ε) ≤ var(T⁻¹ Σ_{t=1}^{T} yt )/ε²

                              = T⁻² Σ_{t=1}^{T} var(yt )/ε²

                              = T σ²/(T² ε²) → 0.
Theorem 2 (Central limit theorem for i.i.d. sequences) If {yt } is a sequence of iid(µ, σ²) random variables then

√T (ȳT − µ)/σ →d N(0, 1).

A more general version of this theorem can be stated as follows.

Theorem 3 (Central limit theorem for martingale difference sequences) Let {yt } be a martingale difference sequence. If a) E(yt²) = σt² > 0 with T⁻¹ Σ_{t=1}^{T} σt² → σ², b) E(|yt |^r ) < ∞ for some r > 2 and all t, and c) T⁻¹ Σ_{t=1}^{T} yt² →p σ², then √T ȳT →d N(0, σ²).
Limit theorems for stationary processes

Theorem 4 If {yt } is stationary with mean µ and autocovariance function γ(.), then

i) E(ȳT ) = µ

ii) If γ(T ) → 0 as T → ∞, then Var(ȳT ) = E(ȳT − µ)² → 0.

iii) If Σ_{h=−∞}^{∞} |γ(h)| < ∞, then T E(ȳT − µ)² → Σ_{h=−∞}^{∞} γ(h).

Proof. See Brockwell and Davis (1991), p. 219.


Notice that i) and ii) imply that ȳT converges in mean square
to µ. [Recall that mean square convergence implies convergence
in probability]

thus, the sample mean is consistent provided γ (T ) → 0 (re-


member that this is the ergodicity condition).

Then, this theorem presents a weak Law of Large Numbers


for stationary and ergodic processes.
Theorem 5 (Central limit theorem for dependent processes) Let {yt } be a stationary sequence given by yt = µ + Σ_{j=0}^{∞} ψj εt−j , where {εt } is an iid(0, σ²) sequence of random variables, Σ_{j=0}^{∞} |ψj | < ∞ and Σ_{j=0}^{∞} ψj ≠ 0. Then

√T (ȳT − µ) →d N(0, Σ_{j=−∞}^{∞} γj ).

Proof. See Brockwell and Davis (1991), Section 7.3.

The limit of T E(ȳT − µ)², Σ_{j=−∞}^{∞} γ(j), is called the long run variance of ȳT .
Remarks
Remark 1
Notice this theorem implies that weak stationarity is not enough!
why?

εt is assumed to be i.i.d.

Under this assumption yt is strictly stationary! (see Hayashi,


proposition 6.1.d)

Remark 2
Alternative versions of the C.L.Ts for dependent processes ex-
ist. See Hayashi, Theorem 6.10

Main difference between the two theorems: a different sufficient


condition for ergodicity for second moments (Gordin’s condition).
Remark 3

Notice that the long run variance (LRV)

LRV = Σ_{j=−∞}^{∞} γ(j)

can also be written as (since γj = γ−j )

LRV = γ0 + 2 Σ_{j=1}^{∞} γ(j)

    = σ² ψ(1)² = σ² (Σ_{j=0}^{∞} ψj )²

[See Hamilton p. 62 for the proof of this last equality.]
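As a concrete check of this identity: for an AR(1) with coefficient φ we have ψj = φ^j and γj = σ² φ^j /(1 − φ²), and both expressions equal σ²/(1 − φ)² (a small numerical sketch; the parameter values are illustrative):

```python
import numpy as np

phi, sigma2 = 0.8, 1.0

# LRV from the autocovariances: gamma_0 + 2 * sum_{j>=1} gamma_j
gammas = sigma2 * phi ** np.arange(1, 200) / (1 - phi**2)
lrv_from_gammas = sigma2 / (1 - phi**2) + 2 * gammas.sum()

# LRV from the MA(inf) weights: sigma^2 * (sum_j psi_j)^2 = sigma^2 / (1 - phi)^2
lrv_from_psi = sigma2 * (1.0 / (1 - phi)) ** 2

print(lrv_from_gammas, lrv_from_psi)   # both approximately 25.0
```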


A bit of intuition behind the LRV formula
Estimation of the LRV
Remarks

Parametric estimates require us to specify a model of yt

If the estimates of the coefficients in the LRV are consistent,


then the estimated LRV is consistent as well

But notice that this requires correct specification!

Parametric estimates are in general more efficient than non-


parametric ones

But: provided the model is correctly specified!


Non-parametric estimation of the LRV

Problems

1. How to pick q?

2. q must grow with T in order for the estimated LRVq to converge in probability to the LRV.

3. In finite samples, the estimated LRVq might have bad properties; it can even be negative.
Kernel-based estimators

Non-parametric estimator of the LRV

The idea is:


1) estimate non-parametrically the autocovariance function (up to a number, q [bandwidth or truncation parameter]).

2) Compute the estimate of the LRV as the weighted average of these autocovariances. How? The kernel provides these weights.

Advantage: these weighting schemes improve the properties of


the LRV estimator (more specifically, yield positive estimates!)
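A hand-coded sketch of the most common version of this idea, the Bartlett (Newey-West) kernel estimator, for a univariate series (numpy assumed; the bandwidth rule in the example is only a common rule of thumb, not the only choice):

```python
import numpy as np

def newey_west_lrv(x, q):
    """Bartlett-kernel (Newey-West) estimate of the long-run variance:
    LRV_hat = gamma_hat(0) + 2 * sum_{j=1}^{q} (1 - j/(q+1)) * gamma_hat(j)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    T = xc.size
    gamma = lambda j: np.dot(xc[j:], xc[:T - j]) / T
    return gamma(0) + 2.0 * sum((1.0 - j / (q + 1)) * gamma(j) for j in range(1, q + 1))

# Example: AR(1) with phi = 0.8 and sigma^2 = 1, so the true LRV is 1/(1-0.8)^2 = 25
rng = np.random.default_rng(7)
T, phi = 50_000, 0.8
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + rng.standard_normal()
for q in (5, 20, 100):
    print(q, round(newey_west_lrv(y, q), 2))   # downward-biased for small q, approaches 25 as q grows
```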
Kernel-based Estimators
Examples of Kernel Weight Functions
Remarks

Newey and West (1994, Econometrica) gave Monte Carlo evidence that the choice of bandwidth, q(T ), is more important than the choice of kernel.

See Hall (2005), section 3.5 for more details.


Multivariate Time Series

So far we’ve focused on univariate time series processes

Recall that our goal is to compute the CLT for the sample mean of xt εt , which is a k × 1 vector!

Thus, we need to extend the previous concepts to the multi-


variate set-up

Notation is more complicated but concepts are very similar!


Multivariate Time Series
Stationarity and Ergodicity
Means and var-cov matrix
Autocovariance of order k
Sample counterparts of the population covariances
Multivariate Wold Representation
Limit theorems for multivariate ergodic pro-
cesses

The theorems stated in the univariate case extend to the mul-


tivariate case in a similar way

LLN: the requirements are 1) weak stationarity and 2) covari-


ance function → 0 (ergodicity for the mean)

CLT: the requirements are 1) strict stationarity and ergodic-


ity for second moments (absolute summability of the matrices of
coefficients of the Wold representation)

See Brockwell and Davis, Propositions 11.2.1. and 11.2.2 for


further details.
A popular multivariate model: VAR(p) model

VAR: Vector Autoregressive Model

Made famous in Chris Sims’s paper “Macroeconomics and Reality” (Sims is the 2011 Nobel Laureate)

Natural extension of univariate AR(p) processes to the multi-


variate set-up

Has proven to be very useful for describing the dynamic behavior


of economic time series
A quick introduction to VAR(p) models
Stationarity Condition
Central Limit Theorem and LRV
Estimation of LRV

As in the univariate case, two ways

Parametric: estimate the VAR and use the coefficients of the


VAR to estimate the LRV

Non-parametric: estimate the covariances without imposing


any parametric condition (i.e., not assuming that the process fol-
lows a VAR(p))
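A sketch of the parametric route, under the assumption that the vector process is well approximated by a VAR(p) and that statsmodels’ VAR class is available: fit the VAR and form LRV = Ψ(1) Σu Ψ(1)′ with Ψ(1) = (I − A1 − ... − Ap)⁻¹ (the function name and example values below are illustrative):

```python
import numpy as np
from statsmodels.tsa.api import VAR

def var_parametric_lrv(data, p):
    """Parametric LRV from a fitted VAR(p): Psi(1) Sigma_u Psi(1)',
    with Psi(1) = (I - A_1 - ... - A_p)^{-1}."""
    res = VAR(data).fit(p)
    k = data.shape[1]
    a_sum = res.coefs.sum(axis=0)          # A_1 + ... + A_p; res.coefs has shape (p, k, k)
    psi1 = np.linalg.inv(np.eye(k) - a_sum)
    sigma_u = np.asarray(res.sigma_u)      # innovation covariance matrix
    return psi1 @ sigma_u @ psi1.T

# Illustrative use with simulated bivariate VAR(1) data
rng = np.random.default_rng(3)
A = np.array([[0.5, 0.1], [0.0, 0.3]])
T = 5000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.standard_normal(2)
print(var_parametric_lrv(y, p=1))
```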
Parametric Estimate of LRV using VAR(p) model
Parametric Estimate of LRV using VAR(p) model, II
Non-Parametric estimator of the LRV
Regression with autocorrelated errors.

OLS and GMM


OLS with autocorrelated errors
Assume that {gt } = {xt εt } is a stationary and ergodic (for second moments) process, given by

gt = Ψ(L)ηt ,    ηt ∼ i.i.d.(0, Σ)

Then,

T⁻¹ Σ_{t=1}^{T} xt εt →p 0

T^{−1/2} Σ_{t=1}^{T} xt εt →d N(0, LRV)

With

S = LRV = Γ0 + Σ_{j=1}^{∞} (Γj + Γj′ ) = Ψ(1) Σ Ψ(1)′

Γ0 = E(gt gt′ ) = E(xt xt′ εt² ),    Γj = E(gt gt−j′ ) = E(xt xt−j′ εt εt−j )
Asymptotics with autocorrelated errors
OLS with autocorrelated errors

Bottom line:

(Provided the x’s are still exogenous!) the OLS estimator is


still consistent

However, the standard errors need to be adjusted: HAC stan-


dard errors!
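Assuming statsmodels is available, HAC (Newey-West) standard errors for OLS can be requested directly; a sketch with a simulated persistent regressor and persistent errors (parameter values are illustrative):

```python
import numpy as np
import statsmodels.api as sm

# y_t = 1 + 0.5 x_t + e_t, with both x_t and e_t persistent AR(1) processes
rng = np.random.default_rng(0)
T = 1000
x = np.zeros(T)
e = np.zeros(T)
for t in range(1, T):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()
    e[t] = 0.7 * e[t - 1] + rng.standard_normal()
y = 1.0 + 0.5 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                                          # classical (iid) standard errors
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 8})   # Newey-West / HAC standard errors
print(ols.bse)   # understate the sampling variability here, since x_t * e_t is autocorrelated
print(hac.bse)   # robust to heteroskedasticity and autocorrelation
```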
HAC standard errors
Are the x’s still exogenous?

Serial correlation in the residuals can make the regressors en-


dogenous!

Example:

yt = αyt−1 + εt

if εt is white noise, then yt−1 is “predetermined” (exogenous, uncorrelated with εt )

But if εt is autocorrelated, then yt−1 can become endogenous.


An example: εt is AR(1)

εt = βεt−1 + ηt

It’s clear that corr(yt−1 , εt ) = corr(αyt−2 + εt−1 , εt ) ≠ 0

In fact, the model is misspecified. Notice that

εt = ηt /(1 − βL)

Which implies:

(1 − βL)yt = α(1 − βL)yt−1 + ηt

That is, yt is an AR(2) process (with white noise residuals)
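A small Monte Carlo sketch of this inconsistency (numpy assumed; parameter values are illustrative): with α = 0.5 and β = 0.5, the OLS regression of yt on yt−1 recovers the first autocorrelation of the implied AR(2), roughly (α + β)/(1 + αβ) = 0.8, rather than α = 0.5.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, beta, T, reps = 0.5, 0.5, 1000, 500
estimates = []
for _ in range(reps):
    eta = rng.standard_normal(T)
    eps = np.zeros(T)
    y = np.zeros(T)
    for t in range(1, T):
        eps[t] = beta * eps[t - 1] + eta[t]   # AR(1) error
        y[t] = alpha * y[t - 1] + eps[t]      # regression with a lagged dependent variable
    ylag, ycur = y[:-1], y[1:]
    estimates.append(np.dot(ylag, ycur) / np.dot(ylag, ylag))   # OLS slope of y_t on y_{t-1}
print(np.mean(estimates))   # close to 0.8, not the true alpha = 0.5
```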


Conclusion: Serial correlation with lagged dependent
variables

Be careful with lagged dependent variables in the presence of


serial correlation in the residuals.

OLS can become inconsistent

Try to specify the model in such a way that the residuals look
uncorrelated (you can test for this)

Serial correlation in the residuals doesn’t always mean that lagged dependent variables will become endogenous (see Wooldridge (Introductory Econometrics), Chapter 12 for an example).
GMM with Serially Correlated errors
Recall:

(δ̂ − δ0 ) = (Sxz′ Ŵ Sxz )⁻¹ Sxz′ Ŵ ḡ

GMM is consistent (ḡ →p 0 by the LLN)

The asymptotic distribution is different (as the variance of ḡ is


different under serial correlation)
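A stylized sketch of how the efficient weighting changes: Ŝ must now be a HAC (long-run variance) estimate of the variance of √T ḡ rather than a heteroskedasticity-only estimate. The two-step linear IV-GMM below is illustrative code written under that assumption, not Hayashi’s notation verbatim.

```python
import numpy as np

def hac_S(G, q):
    """Bartlett-kernel estimate of S, the long-run variance of the moments g_t (G is T x L)."""
    G = G - G.mean(axis=0)
    T = G.shape[0]
    S = G.T @ G / T
    for j in range(1, q + 1):
        Gamma_j = G[j:].T @ G[:T - j] / T
        S += (1.0 - j / (q + 1)) * (Gamma_j + Gamma_j.T)
    return S

def two_step_gmm_hac(y, X, Z, q=8):
    """Linear GMM with moments g_t = z_t (y_t - x_t' delta) and a HAC second-step weight."""
    T = y.size
    Sxz, szy = Z.T @ X / T, Z.T @ y / T
    W1 = np.linalg.inv(Z.T @ Z / T)                               # first-step weight
    d1 = np.linalg.solve(Sxz.T @ W1 @ Sxz, Sxz.T @ W1 @ szy)
    G = Z * (y - X @ d1)[:, None]                                 # moment contributions g_t
    W2 = np.linalg.inv(hac_S(G, q))                               # efficient weight: S_hat^{-1}
    d2 = np.linalg.solve(Sxz.T @ W2 @ Sxz, Sxz.T @ W2 @ szy)
    avar = np.linalg.inv(Sxz.T @ W2 @ Sxz)                        # Avar(delta_hat) when W = S^{-1}
    return d2, np.sqrt(np.diag(avar) / T)                         # estimates and HAC standard errors
```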
Estimation of the LRV
GMM with Serially Correlated errors
Wrapping up. . .

If working with time series data, serial correlation is very likely

Provided the instruments/regressors continue to be exogenous

OLS/GMM will still be consistent

The asymptotic distribution will be different (Avar(δ̂) will be different)

New standard errors are needed, robust to heteroskedasticity and autocorrelation

Parametric and non-parametric estimators of the LRV


References
Hayashi, Chapter 6.
Hamilton, Chapters 1 and 7.
Brockwell, P. J., and R. A. Davis (1991), Chapters 1 and 6.
Stock and Watson, Chapter 14.
