Geneve Chapitre2

Chapter 2.
Dynamic panel data models

School of Economics and Management - University of Geneva
Christophe Hurlin, University of Orléans
April 2018
C. Hurlin (University of Orléans) Advanced Econometrics II April 2018 1 / 209

1. Introduction
Definition (Dynamic panel data model)

We now consider a dynamic panel data model, in the sense that it contains (at least) one lagged dependent variable. For simplicity, let us consider
$$y_{it} = \gamma y_{i,t-1} + \beta' x_{it} + \alpha_i + \varepsilon_{it}$$
for $i = 1, .., n$ and $t = 1, .., T$. $\alpha_i$ is the (unobserved) individual effect (a time-specific effect $\lambda_t$ can be added in the same way), and $\varepsilon_{it}$ is the idiosyncratic error term with $E(\varepsilon_{it}) = 0$, $E(\varepsilon_{it}\varepsilon_{js}) = \sigma^2_{\varepsilon}$ if $j = i$ and $t = s$, and $E(\varepsilon_{it}\varepsilon_{js}) = 0$ otherwise.

Remark
In a dynamic panel model, the choice between a fixed-effects formulation and a random-effects formulation has implications for estimation that are of a different nature than those associated with the static model.

Dynamic panel issues
1 If lagged dependent variables appear as explanatory variables, strict exogeneity of the regressors no longer holds. The LSDV estimator is no longer consistent when n tends to infinity and T is fixed.
2 The initial values of a dynamic process raise another problem. It turns out that with a random-effects formulation, the interpretation of a model depends on the assumption made about the initial observation.
3 The consistency property of the MLE and the GLS estimator also depends on the way in which T and n tend to infinity.

Introduction
The outline of this chapter is the following:

Section 1: Introduction
Section 2: Dynamic panel bias
Section 3: The IV (Instrumental Variable) approach
Subsection 3.1: Reminder on IV and 2SLS
Subsection 3.2: Anderson and Hsiao (1982) approach
Section 4: The GMM (Generalized Method of Moment) approach
Subsection 4.1: General presentation of GMM
Subsection 4.2: Application to dynamic panel data models

Section 2
The Dynamic Panel Bias

2. The dynamic panel bias
Objectives
1 Introduce the AR(1) panel data model.
2 Derive the semi-asymptotic bias of the LSDV estimator.
3 Understand the sources of the dynamic panel bias or Nickell’s bias.
4 Evaluate the magnitude of this bias in a simple AR(1) model.
5 Assess this bias by Monte Carlo simulations.

Dynamic panel bias
1 The LSDV estimator is consistent for the static model whether the effects are fixed or random.
2 On the contrary, the LSDV estimator is inconsistent for a dynamic panel data model with individual effects, whether the effects are fixed or random.

Definition (Nickell’s bias)

The bias of the LSDV estimator in a dynamic model is generally known as the dynamic panel bias or Nickell’s bias (1981).
Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects, Econometrica, 49, 1399–1416.
Anderson, T.W., and C. Hsiao (1982). Formulation and Estimation of Dynamic Models Using Panel Data, Journal of Econometrics, 18, 47–82.

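Definition (AR(1) panel data model)

Consider the simple AR(1) model
$$y_{it} = \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it}$$
for $i = 1, .., n$ and $t = 1, .., T$. For simplicity, let us assume that the individual effect $\alpha_i$ absorbs the common intercept $\alpha$ (it is equal to $\alpha$ plus a centered individual component), to avoid imposing the restriction that $\sum_{i=1}^{n} \alpha_i = 0$ or $E(\alpha_i) = 0$ in the case of random individual effects.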

Assumptions
1 The autoregressive parameter $\gamma$ satisfies $|\gamma| < 1$.
2 The initial condition $y_{i0}$ is observable.
3 The error term satisfies $E(\varepsilon_{it}) = 0$, $E(\varepsilon_{it}\varepsilon_{js}) = \sigma^2_{\varepsilon}$ if $j = i$ and $t = s$, and $E(\varepsilon_{it}\varepsilon_{js}) = 0$ otherwise.

Dynamic panel bias

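In this AR(1) panel data model, we will show that
$$\operatorname{plim}_{n \to \infty} \hat{\gamma}_{LSDV} \neq \gamma \quad \text{(dynamic panel bias)}$$
$$\operatorname{plim}_{n,T \to \infty} \hat{\gamma}_{LSDV} = \gamma$$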

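The LSDV estimator is defined by (cf. chapter 1)
$$\hat{\alpha}_i = \bar{y}_i - \hat{\gamma}_{LSDV}\, \bar{y}_{i,-1}$$
$$\hat{\gamma}_{LSDV} = \left( \sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})^2 \right)^{-1} \left( \sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})(y_{it} - \bar{y}_i) \right)$$
with
$$\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it} \qquad \bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it} \qquad \bar{y}_{i,-1} = \frac{1}{T}\sum_{t=1}^{T} y_{i,t-1}$$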

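Definition (bias)
The bias of the LSDV estimator is defined by:
$$\hat{\gamma}_{LSDV} - \gamma = \left( \sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})^2 \right)^{-1} \left( \sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})(\varepsilon_{it} - \bar{\varepsilon}_i) \right)$$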

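The bias of the LSDV estimator can be rewritten as:
$$\hat{\gamma}_{LSDV} - \gamma = \frac{\sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})(\varepsilon_{it} - \bar{\varepsilon}_i) / (nT)}{\sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})^2 / (nT)}$$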

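Let us consider the numerator. Because the $\varepsilon_{it}$ are (1) uncorrelated with $\alpha_i$ and (2) independently and identically distributed, we have
$$\operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})(\varepsilon_{it} - \bar{\varepsilon}_i)$$
$$= \underbrace{\operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{t=1}^{T} \sum_{i=1}^{n} y_{i,t-1}\varepsilon_{it}}_{N_1} - \underbrace{\operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{t=1}^{T} \sum_{i=1}^{n} y_{i,t-1}\bar{\varepsilon}_i}_{N_2}$$
$$\quad - \underbrace{\operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{t=1}^{T} \sum_{i=1}^{n} \bar{y}_{i,-1}\varepsilon_{it}}_{N_3} + \underbrace{\operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{t=1}^{T} \sum_{i=1}^{n} \bar{y}_{i,-1}\bar{\varepsilon}_i}_{N_4}$$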

Theorem (Weak law of large numbers, Khinchine)

If $\{X_i\}$, for $i = 1, .., m$, is a sequence of i.i.d. random variables with $E(X_i) = \mu < \infty$, then the sample mean converges in probability to $\mu$:
$$\frac{1}{m} \sum_{i=1}^{m} X_i \xrightarrow{p} E(X_i) = \mu$$
or
$$\operatorname{plim}_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} X_i = E(X_i) = \mu$$

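By application of the WLLN (Khinchine’s theorem)
$$N_1 = \operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} y_{i,t-1}\varepsilon_{it} = E(y_{i,t-1}\varepsilon_{it})$$
Since (1) $y_{i,t-1}$ only depends on $\varepsilon_{i,t-1}, \varepsilon_{i,t-2}, ...$ and (2) the $\varepsilon_{it}$ are uncorrelated, we have
$$E(y_{i,t-1}\varepsilon_{it}) = 0$$
and finally
$$N_1 = \operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} y_{i,t-1}\varepsilon_{it} = 0$$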

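For the second term $N_2$, we have:
$$N_2 = \operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} y_{i,t-1}\bar{\varepsilon}_i = \operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{i=1}^{n} \bar{\varepsilon}_i \sum_{t=1}^{T} y_{i,t-1}$$
$$= \operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{i=1}^{n} \bar{\varepsilon}_i\, T\, \bar{y}_{i,-1} \quad \text{as } \bar{y}_{i,-1} = \frac{1}{T} \sum_{t=1}^{T} y_{i,t-1}$$
$$= \operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{\varepsilon}_i \bar{y}_{i,-1}$$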

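In the same way:
$$N_3 = \operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} \bar{y}_{i,-1}\varepsilon_{it} = \operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{i=1}^{n} \bar{y}_{i,-1} \sum_{t=1}^{T} \varepsilon_{it} = \operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i,-1}\bar{\varepsilon}_i$$
$$N_4 = \operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} \bar{y}_{i,-1}\bar{\varepsilon}_i = \operatorname{plim}_{n \to \infty} \frac{1}{nT}\, T \sum_{i=1}^{n} \bar{y}_{i,-1}\bar{\varepsilon}_i = \operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i,-1}\bar{\varepsilon}_i$$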

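The numerator of the bias expression can be rewritten as
$$\operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})(\varepsilon_{it} - \bar{\varepsilon}_i)$$
$$= \underbrace{0}_{N_1} - \underbrace{\operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{\varepsilon}_i \bar{y}_{i,-1}}_{N_2} - \underbrace{\operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i,-1}\bar{\varepsilon}_i}_{N_3} + \underbrace{\operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i,-1}\bar{\varepsilon}_i}_{N_4}$$
$$= -\operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i,-1}\bar{\varepsilon}_i$$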

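Solution
The numerator of the expression of the LSDV bias satisfies:
$$\operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})(\varepsilon_{it} - \bar{\varepsilon}_i) = -\operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i,-1}\bar{\varepsilon}_i$$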

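Remark
$$\hat{\gamma}_{LSDV} - \gamma = \frac{\sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})(\varepsilon_{it} - \bar{\varepsilon}_i) / (nT)}{\sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})^2 / (nT)}$$
$$\operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})(\varepsilon_{it} - \bar{\varepsilon}_i) = -\operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i,-1}\bar{\varepsilon}_i$$
If this plim is not null, then the LSDV estimator $\hat{\gamma}_{LSDV}$ is biased when n tends to infinity and T is fixed.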

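Let us examine this plim
$$\operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i,-1}\bar{\varepsilon}_i$$
We know that
$$y_{it} = \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it}$$
$$= \gamma^2 y_{i,t-2} + \alpha_i (1 + \gamma) + \varepsilon_{it} + \gamma \varepsilon_{i,t-1}$$
$$= \gamma^3 y_{i,t-3} + \alpha_i (1 + \gamma + \gamma^2) + \varepsilon_{it} + \gamma \varepsilon_{i,t-1} + \gamma^2 \varepsilon_{i,t-2}$$
$$= ...$$
$$= \gamma^t y_{i0} + \frac{1 - \gamma^t}{1 - \gamma}\alpha_i + \varepsilon_{it} + \gamma \varepsilon_{i,t-1} + \gamma^2 \varepsilon_{i,t-2} + ... + \gamma^{t-1}\varepsilon_{i1}$$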

For any time t, we have:
$$y_{it} = \varepsilon_{it} + \gamma \varepsilon_{i,t-1} + \gamma^2 \varepsilon_{i,t-2} + ... + \gamma^{t-1}\varepsilon_{i1} + \frac{1 - \gamma^t}{1 - \gamma}\alpha_i + \gamma^t y_{i0}$$
For $y_{i,t-1}$, we have:
$$y_{i,t-1} = \varepsilon_{i,t-1} + \gamma \varepsilon_{i,t-2} + \gamma^2 \varepsilon_{i,t-3} + ... + \gamma^{t-2}\varepsilon_{i1} + \frac{1 - \gamma^{t-1}}{1 - \gamma}\alpha_i + \gamma^{t-1} y_{i0}$$
Summing $y_{i,t-1}$ over t, we get:
$$\sum_{t=1}^{T} y_{i,t-1} = \varepsilon_{i,T-1} + \frac{1 - \gamma^2}{1 - \gamma}\varepsilon_{i,T-2} + ... + \frac{1 - \gamma^{T-1}}{1 - \gamma}\varepsilon_{i1} + \frac{(T - 1) - T\gamma + \gamma^T}{(1 - \gamma)^2}\alpha_i + \frac{1 - \gamma^T}{1 - \gamma} y_{i0}$$
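Proof: We have (each line corresponds to a date)
$$\sum_{t=1}^{T} y_{i,t-1} = y_{i,T-1} + y_{i,T-2} + .. + y_{i,1} + y_{i,0}$$
$$= \varepsilon_{i,T-1} + \gamma \varepsilon_{i,T-2} + .. + \gamma^{T-2}\varepsilon_{i1} + \frac{1 - \gamma^{T-1}}{1 - \gamma}\alpha_i + \gamma^{T-1} y_{i0}$$
$$\quad + \varepsilon_{i,T-2} + \gamma \varepsilon_{i,T-3} + ... + \gamma^{T-3}\varepsilon_{i1} + \frac{1 - \gamma^{T-2}}{1 - \gamma}\alpha_i + \gamma^{T-2} y_{i0}$$
$$\quad + ..$$
$$\quad + \varepsilon_{i,1} + \frac{1 - \gamma^{1}}{1 - \gamma}\alpha_i + \gamma y_{i0}$$
$$\quad + y_{i0}$$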
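Proof (cont’d): For the individual effect $\alpha_i$, we have
$$\frac{\alpha_i}{1 - \gamma}\left[ (1 - \gamma) + (1 - \gamma^2) + ... + (1 - \gamma^{T-1}) \right]$$
$$= \frac{\alpha_i}{1 - \gamma}\left[ T - 1 - \gamma - \gamma^2 - .. - \gamma^{T-1} \right]$$
$$= \frac{\alpha_i}{1 - \gamma}\left[ T - \frac{1 - \gamma^T}{1 - \gamma} \right]$$
$$= \frac{\alpha_i \left( T - T\gamma - 1 + \gamma^T \right)}{(1 - \gamma)^2}$$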

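So, we have
$$\bar{y}_{i,-1} = \frac{1}{T}\sum_{t=1}^{T} y_{i,t-1}$$
$$= \frac{1}{T}\left( \varepsilon_{i,T-1} + \frac{1 - \gamma^2}{1 - \gamma}\varepsilon_{i,T-2} + ... + \frac{1 - \gamma^{T-1}}{1 - \gamma}\varepsilon_{i1} + \frac{T - T\gamma - 1 + \gamma^T}{(1 - \gamma)^2}\alpha_i + \frac{1 - \gamma^T}{1 - \gamma} y_{i0} \right)$$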

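Finally, the plim is equal to
$$\operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i,-1}\bar{\varepsilon}_i = \operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \frac{1}{T}\left( \varepsilon_{i,T-1} + \frac{1 - \gamma^2}{1 - \gamma}\varepsilon_{i,T-2} + ... + \frac{1 - \gamma^{T-1}}{1 - \gamma}\varepsilon_{i1} + \frac{T - T\gamma - 1 + \gamma^T}{(1 - \gamma)^2}\alpha_i + \frac{1 - \gamma^T}{1 - \gamma} y_{i0} \right) \times \frac{1}{T}\left( \varepsilon_{i1} + ... + \varepsilon_{iT} \right)$$
Because the $\varepsilon_{it}$ are i.i.d. (and uncorrelated with $\alpha_i$ and $y_{i0}$), only the products of identical error terms have a nonzero expectation. By a law of large numbers, we have:
$$\operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i,-1}\bar{\varepsilon}_i = \frac{\sigma^2_{\varepsilon}}{T^2}\left( \frac{1 - \gamma}{1 - \gamma} + \frac{1 - \gamma^2}{1 - \gamma} + ... + \frac{1 - \gamma^{T-1}}{1 - \gamma} \right) = \frac{\sigma^2_{\varepsilon}}{T^2}\, \frac{T - T\gamma - 1 + \gamma^T}{(1 - \gamma)^2}$$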

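Theorem
If the error terms $\varepsilon_{it}$ are i.i.d. $(0, \sigma^2_{\varepsilon})$, we have:
$$\operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})(\varepsilon_{it} - \bar{\varepsilon}_i) = -\operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i,-1}\bar{\varepsilon}_i = -\frac{\sigma^2_{\varepsilon}}{T^2}\, \frac{T - T\gamma - 1 + \gamma^T}{(1 - \gamma)^2}$$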

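By similar manipulations, we can show that the denominator of $\hat{\gamma}_{LSDV}$ converges to:
$$\operatorname{plim}_{n \to \infty} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})^2 = \frac{\sigma^2_{\varepsilon}}{1 - \gamma^2}\left( 1 - \frac{1}{T} - \frac{2\gamma}{(1 - \gamma)^2}\, \frac{T - T\gamma - 1 + \gamma^T}{T^2} \right)$$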

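So, we have:
$$\operatorname{plim}_{n \to \infty} (\hat{\gamma}_{LSDV} - \gamma) = \operatorname{plim}_{n \to \infty} \frac{\frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})(\varepsilon_{it} - \bar{\varepsilon}_i)}{\frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} (y_{i,t-1} - \bar{y}_{i,-1})^2}$$
$$= \frac{-\dfrac{\sigma^2_{\varepsilon}}{T^2}\, \dfrac{T - T\gamma - 1 + \gamma^T}{(1 - \gamma)^2}}{\dfrac{\sigma^2_{\varepsilon}}{1 - \gamma^2}\left( 1 - \dfrac{1}{T} - \dfrac{2\gamma}{(1 - \gamma)^2}\, \dfrac{T - T\gamma - 1 + \gamma^T}{T^2} \right)}$$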

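This semi-asymptotic bias can be rewritten as:
$$\operatorname{plim}_{n \to \infty} (\hat{\gamma}_{LSDV} - \gamma) = -\frac{T - T\gamma - 1 + \gamma^T}{\dfrac{1 - \gamma}{1 + \gamma}\left( T^2 - T - \dfrac{2\gamma}{(1 - \gamma)^2}\left( T - T\gamma - 1 + \gamma^T \right) \right)}$$
$$= -\frac{(1 + \gamma)\left( T - T\gamma - 1 + \gamma^T \right)}{(1 - \gamma)\left( T^2 - T - \dfrac{2\gamma}{(1 - \gamma)^2}\left( T - T\gamma - 1 + \gamma^T \right) \right)}$$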

Fact
If T also tends to infinity, then the numerator converges to zero, and the denominator converges to a nonzero constant $\sigma^2_{\varepsilon} / (1 - \gamma^2)$, hence the LSDV estimators of $\gamma$ and $\alpha_i$ are consistent.
Fact
If T is fixed, then the denominator is a nonzero constant, and $\hat{\gamma}_{LSDV}$ and $\hat{\alpha}_i$ are inconsistent estimators when n is large.

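Theorem (Dynamic panel bias)

In a dynamic panel AR(1) model with individual effects, the semi-asymptotic bias (with n) of the LSDV estimator of the autoregressive parameter is equal to:
$$\operatorname{plim}_{n \to \infty} (\hat{\gamma}_{LSDV} - \gamma) = -\frac{(1 + \gamma)\left( T - T\gamma - 1 + \gamma^T \right)}{(1 - \gamma)\left( T^2 - T - \dfrac{2\gamma}{(1 - \gamma)^2}\left( T - T\gamma - 1 + \gamma^T \right) \right)}$$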

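Theorem (Dynamic panel bias)

For an AR(1) model, the dynamic panel bias can be rewritten as:
$$\operatorname{plim}_{n \to \infty} (\hat{\gamma}_{LSDV} - \gamma) = -\frac{1 + \gamma}{T - 1}\left( 1 - \frac{1}{T}\, \frac{1 - \gamma^T}{1 - \gamma} \right) \times \left[ 1 - \frac{2\gamma}{(1 - \gamma)(T - 1)}\left( 1 - \frac{1 - \gamma^T}{T(1 - \gamma)} \right) \right]^{-1}$$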
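As a quick sanity check of the expression in the theorem above (a worked special case added for illustration, not part of the original slides), one can evaluate it for $T = 2$ and for $\gamma = 0$. Both cases reduce to simple closed forms, which makes the order of magnitude of the bias easy to see.

```latex
\begin{aligned}
T = 2:\quad & 1 - \frac{1}{T}\,\frac{1-\gamma^{T}}{1-\gamma} = 1 - \frac{1+\gamma}{2} = \frac{1-\gamma}{2} \\
& \operatorname{plim}_{n\to\infty}(\hat{\gamma}_{LSDV} - \gamma)
  = -(1+\gamma)\,\frac{1-\gamma}{2}
    \left[ 1 - \frac{2\gamma}{1-\gamma}\cdot\frac{1-\gamma}{2} \right]^{-1}
  = -\frac{(1+\gamma)(1-\gamma)}{2\,(1-\gamma)}
  = -\frac{1+\gamma}{2} \\
\gamma = 0:\quad & \operatorname{plim}_{n\to\infty}(\hat{\gamma}_{LSDV} - \gamma)
  = -\frac{1}{T-1}\left( 1 - \frac{1}{T} \right) = -\frac{1}{T}
\end{aligned}
```

With $T = 2$ and $\gamma = 0.5$, for instance, the semi-asymptotic bias is $-0.75$, larger in absolute value than the parameter itself.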

Fact
The dynamic bias of $\hat{\gamma}_{LSDV}$ is caused by having to eliminate the individual effects $\alpha_i$ from each observation, which creates a correlation of order $(1/T)$ between the explanatory variables and the residuals in the transformed model
$$(y_{it} - \bar{y}_i) = \gamma \Big( y_{i,t-1} - \underbrace{\bar{y}_{i,-1}}_{\text{depends on past values of } \varepsilon_{it}} \Big) + \Big( \varepsilon_{it} - \underbrace{\bar{\varepsilon}_i}_{\text{depends on past values of } \varepsilon_{it}} \Big)$$

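Intuition of the dynamic bias
$$(y_{it} - \bar{y}_i) = \gamma (y_{i,t-1} - \bar{y}_{i,-1}) + (\varepsilon_{it} - \bar{\varepsilon}_i)$$
with $cov(\bar{y}_{i,-1}, \bar{\varepsilon}_i) \neq 0$ since
$$cov(\bar{y}_{i,-1}, \bar{\varepsilon}_i) = cov\left( \frac{1}{T}\sum_{t=1}^{T} y_{i,t-1},\ \frac{1}{T}\sum_{t=1}^{T} \varepsilon_{it} \right) = \frac{1}{T^2}\, cov\left( (y_{i1} + ... + y_{i,T-1}),\ (\varepsilon_{i1} + ... + \varepsilon_{iT}) \right)$$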

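$$(y_{it} - \bar{y}_i) = \gamma (y_{i,t-1} - \bar{y}_{i,-1}) + (\varepsilon_{it} - \bar{\varepsilon}_i) \quad \text{with } cov(\bar{y}_{i,-1}, \bar{\varepsilon}_i) \neq 0$$
If we approximate $y_{it}$ by $\varepsilon_{it}$ (in fact $y_{it}$ also depends on $\varepsilon_{i,t-1}, \varepsilon_{i,t-2}, ...$) then we have
$$cov(\bar{y}_{i,-1}, \bar{\varepsilon}_i) = \frac{1}{T^2}\, cov\left( (y_{i1} + ... + y_{i,T-1}),\ (\varepsilon_{i1} + ... + \varepsilon_{iT}) \right)$$
$$\simeq \frac{1}{T^2}\left( cov(\varepsilon_{i,1}, \varepsilon_{i,1}) + ... + cov(\varepsilon_{i,T-1}, \varepsilon_{i,T-1}) \right) \simeq \frac{(T - 1)\, \sigma^2_{\varepsilon}}{T^2} \neq 0$$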

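$$(y_{it} - \bar{y}_i) = \gamma (y_{i,t-1} - \bar{y}_{i,-1}) + (\varepsilon_{it} - \bar{\varepsilon}_i) \quad \text{with } cov(\bar{y}_{i,-1}, \bar{\varepsilon}_i) \neq 0$$
If we approximate $y_{it}$ by $\varepsilon_{it}$ then we have
$$cov(\bar{y}_{i,-1}, \bar{\varepsilon}_i) \simeq \frac{(T - 1)\, \sigma^2_{\varepsilon}}{T^2}$$
By taking into account all the interaction terms, we have shown that
$$\operatorname{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i,-1}\bar{\varepsilon}_i = cov(\bar{y}_{i,-1}, \bar{\varepsilon}_i) = \frac{\sigma^2_{\varepsilon}}{T^2}\, \frac{T - T\gamma - 1 + \gamma^T}{(1 - \gamma)^2}$$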

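Remarks
$$\operatorname{plim}_{n \to \infty} (\hat{\gamma}_{LSDV} - \gamma) = -\frac{1 + \gamma}{T - 1}\left( 1 - \frac{1}{T}\, \frac{1 - \gamma^T}{1 - \gamma} \right) \times \left[ 1 - \frac{2\gamma}{(1 - \gamma)(T - 1)}\left( 1 - \frac{1 - \gamma^T}{T(1 - \gamma)} \right) \right]^{-1}$$
1 When T is large, the right-hand-side variables become asymptotically uncorrelated.
2 For small T, this bias is always negative if $\gamma > 0$.
3 The bias does not go to zero as $\gamma$ goes to zero.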
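The remarks above can be checked numerically. The following is a minimal Python sketch (the function names `nickell_bias` and `nickell_bias_expanded` are illustrative choices, not from the slides) that evaluates the two equivalent expressions of the semi-asymptotic bias given in the theorems.

```python
def nickell_bias(gamma: float, T: int) -> float:
    """Semi-asymptotic bias (n -> infinity, T fixed) of the LSDV estimator of gamma."""
    if not (-1.0 < gamma < 1.0) or T < 2:
        raise ValueError("requires |gamma| < 1 and T >= 2")
    # b = 1 - (1/T) (1 - gamma^T) / (1 - gamma)
    b = 1.0 - (1.0 - gamma**T) / (T * (1.0 - gamma))
    numerator = -(1.0 + gamma) / (T - 1.0) * b
    denominator = 1.0 - 2.0 * gamma * b / ((1.0 - gamma) * (T - 1.0))
    return numerator / denominator


def nickell_bias_expanded(gamma: float, T: int) -> float:
    """Same bias, written with the term a = T - T*gamma - 1 + gamma^T."""
    a = T - T * gamma - 1.0 + gamma**T
    inner = T**2 - T - 2.0 * gamma * a / (1.0 - gamma) ** 2
    return -(1.0 + gamma) * a / ((1.0 - gamma) * inner)


if __name__ == "__main__":
    # Bias for gamma = 0.5 and the time dimensions used in the figures
    for T in (10, 30, 50, 100):
        print(T, nickell_bias(0.5, T), nickell_bias_expanded(0.5, T))
```

Evaluating `nickell_bias(0.0, T)` returns $-1/T$, which illustrates the third remark: the bias does not vanish when $\gamma$ goes to zero.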

[Figure: Dynamic panel bias. Semi-asymptotic bias of the LSDV estimator (vertical axis) as a function of γ between 0 and 1 (horizontal axis), for T = 10, T = 30, T = 50 and T = 100.]

[Figure: four panels (T = 10, T = 30, T = 50, T = 100) comparing the true value of γ with the plim of the LSDV estimator, for γ between 0 and 1. The gap between the two curves is the semi-asymptotic bias.]

[Figure: Dynamic bias in % of the true value. Relative bias (in %) as a function of γ between 0.1 and 0.9, for T = 10, T = 30, T = 50 and T = 100.]

Monte Carlo experiments

How can we check these semi-asymptotic formulas with Monte Carlo simulations?

Step 1: parameters
Let us assume that $\gamma = 0.5$, $\sigma^2_{\varepsilon} = 1$ and $\varepsilon_{it} \overset{i.i.d.}{\sim} N(0, 1)$.
Simulate the n individual effects $\alpha_i$ once and for all. For instance, we can use a uniform distribution
$$\alpha_i \sim U_{[-1,1]}$$

Step 2: Monte Carlo pseudo samples
Simulate n (typically n = 1,000) i.i.d. sequences $\{\varepsilon_{it}\}_{t=1}^{T}$ for a given value of T (typically T = 10).
Generate n sequences $\{y_{it}\}_{t=1}^{T}$ for $i = 1, .., n$ with the model:
$$y_{it} = \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it}$$
Repeat step 2 S times in order to generate S = 5,000 sequences $\{y_{it}^{(s)}\}_{t=1}^{T}$ for $s = 1, .., S$, for each cross-section unit $i = 1, ..., n$.

Step 3: LSDV estimates on pseudo series
For each pseudo sample $s = 1, ..., S$, consider the empirical model
$$y_{it}^{s} = \gamma y_{i,t-1}^{s} + \alpha_i + \mu_{it} \qquad i = 1, .., n \quad t = 1, ..., T$$
and compute the LSDV estimate $\hat{\gamma}^{s}_{LSDV}$.
Compute the average bias of the LSDV estimator $\hat{\gamma}_{LSDV}$ based on the S Monte Carlo simulations
$$\text{av.bias} = \frac{1}{S}\sum_{s=1}^{S} \left( \hat{\gamma}^{s}_{LSDV} - \gamma \right)$$

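Step 4: Semi-asymptotic bias

1 Repeat this experiment for various cross-section dimensions n: when n increases, the average bias should converge to
$$\operatorname{plim}_{n \to \infty} (\hat{\gamma}_{LSDV} - \gamma) = -\frac{1 + \gamma}{T - 1}\left( 1 - \frac{1}{T}\, \frac{1 - \gamma^T}{1 - \gamma} \right) \times \left[ 1 - \frac{2\gamma}{(1 - \gamma)(T - 1)}\left( 1 - \frac{1 - \gamma^T}{T(1 - \gamma)} \right) \right]^{-1}$$
2 Repeat this experiment for various time dimensions T: when T increases, the average bias should converge to 0.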
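The four steps above can be sketched in Python with NumPy as follows. This is an illustrative sketch rather than the code used to produce the slides. Two choices are assumptions added here: the process is started from a burn-in period so that the initial observation is (approximately) drawn from the stationary distribution, and the default values of n and S are smaller than those quoted above to keep the run time short. The function names are hypothetical.

```python
import numpy as np


def nickell_bias(gamma, T):
    """Theoretical semi-asymptotic bias of the LSDV estimator (n -> infinity, T fixed)."""
    b = 1.0 - (1.0 - gamma**T) / (T * (1.0 - gamma))
    return (-(1.0 + gamma) / (T - 1.0) * b) / (
        1.0 - 2.0 * gamma * b / ((1.0 - gamma) * (T - 1.0))
    )


def simulate_lsdv(gamma=0.5, n=500, T=10, S=200, burn=50, seed=0):
    """Return the S LSDV estimates of gamma obtained on simulated AR(1) panels."""
    rng = np.random.default_rng(seed)
    # Step 1: individual effects drawn once and for all, alpha_i ~ U[-1, 1]
    alpha = rng.uniform(-1.0, 1.0, size=n)
    estimates = np.empty(S)
    for s in range(S):
        # Step 2: simulate the errors and build y_it recursively (with burn-in, an assumption)
        eps = rng.standard_normal((n, burn + T + 1))
        y = np.empty((n, burn + T + 1))
        y[:, 0] = alpha / (1.0 - gamma) + eps[:, 0]
        for t in range(1, burn + T + 1):
            y[:, t] = gamma * y[:, t - 1] + alpha + eps[:, t]
        y = y[:, burn:]                      # keep y_{i0}, ..., y_{iT}
        # Step 3: LSDV (within) estimator of gamma
        lag, cur = y[:, :-1], y[:, 1:]       # y_{i,t-1} and y_{it} for t = 1..T
        lag_d = lag - lag.mean(axis=1, keepdims=True)
        cur_d = cur - cur.mean(axis=1, keepdims=True)
        estimates[s] = (lag_d * cur_d).sum() / (lag_d**2).sum()
    return estimates


if __name__ == "__main__":
    gamma, T = 0.5, 10
    est = simulate_lsdv(gamma=gamma, T=T)
    # Step 4: compare the Monte Carlo average bias with the theoretical value
    print("average bias     :", est.mean() - gamma)
    print("theoretical bias :", nickell_bias(gamma, T))
```

Increasing `n` should bring the average bias closer to the theoretical value, while increasing `T` should bring both toward zero.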





[Figure: Histogram of the LSDV estimates for γ = 0.5, T = 10 and n = 1000. Horizontal axis: LSDV estimate of γ (from 0.30 to 0.38); vertical axis: number of simulations.]

[Figure: Theoretical semi-asymptotic bias and Monte Carlo average bias as functions of the sample size n (from 0 to 1000).]

Question: What is the importance of the dynamic bias in micro-panels?

"Macroeconomists should not dismiss the LSDV bias as insignificant. Even with a time dimension T as large as 30, we find that the bias may be equal to as much as 20% of the true value of the coefficient of interest." (Judson and Owen, 1999, page 10)
Judson, R.A., and A. Owen (1999). Estimating dynamic panel data models: a guide for macroeconomists, Economics Letters, 65(1), 9–15.

Finite sample results (Monte Carlo simulations)
n     T     γ     Avg. $\hat{\gamma}_{LSDV}$   Avg. bias
10    10    0.5   0.3282                       -0.1718
50    10    0.5   0.3317                       -0.1683
100   10    0.5   0.3338                       -0.1662
10    50    0.5   0.4671                       -0.0329
50    50    0.5   0.4688                       -0.0321
100   50    0.5   0.4694                       -0.0306

Finite sample results (Monte Carlo simulations)
n     T     γ     Avg. $\hat{\gamma}_{LSDV}$   Avg. bias
10    10    0.3   0.3686                       0.0686
50    10    0.3   0.3743                       0.0743
100   10    0.3   0.3753                       0.0753
10    50    0.3   0.3134                       0.0134
50    50    0.3   0.3133                       0.0133
100   50    0.3   0.3142                       0.0142

Fact (smearing effect)

The LSDV estimator for the dynamic individual-effects model remains biased with the introduction of exogenous variables if T is small; for details of the derivation, see Nickell (1981); Kiviet (1995).
$$y_{it} = \alpha + \gamma y_{i,t-1} + \beta' x_{it} + \alpha_i + \varepsilon_{it}$$
In this case, both estimators $\hat{\gamma}_{LSDV}$ and $\hat{\beta}_{LSDV}$ are biased.

What are the solutions?

A consistent estimator of $\gamma$ can be obtained by using:
1 ML or FIML (but additional assumptions on $y_{i0}$ are necessary)
2 Feasible GLS (but additional assumptions on $y_{i0}$ are necessary)
3 Bias-corrected LSDV (Kiviet, 1995)
4 IV approach (Anderson and Hsiao, 1982)
5 GMM approach (Arellano and Bond, 1991)


Key Concepts Section 2
1 AR(1) panel data model
2 Semi-asymptotic bias
3 Dynamic panel bias (Nickell’s bias)
4 Monte Carlo experiments
5 Magnitude of the dynamic panel bias

Section 3
The Instrumental Variable (IV) approach

Subsection 3.1
Reminder on IV and 2SLS

3.1 Reminder on IV and 2SLS
Objectives
1 Define the endogeneity bias and the smearing effect.
2 Define the notion of instrument or instrumental variable.
3 Introduce the exogeneity and relevance properties of an instrument.
4 Introduce the notion of just-identified and over-identified systems.
5 Define the IV estimator and its asymptotic variance.
6 Define the 2SLS estimator and its asymptotic variance.
7 Define the notion of weak instrument.

Consider the (population) multiple linear regression model:
$$y = X\beta + \varepsilon$$
$y$ is an $N \times 1$ vector of observations $y_j$ for $j = 1, .., N$
$X$ is an $N \times K$ matrix of K explanatory variables $x_{jk}$ for $k = 1, .., K$ and $j = 1, .., N$
$\beta = (\beta_1 .. \beta_K)'$ is a $K \times 1$ vector of parameters
$\varepsilon$ is an $N \times 1$ vector of error terms $\varepsilon_j$ with (spherical disturbances)
$$V(\varepsilon \mid X) = \sigma^2 I_N$$

Endogeneity: we assume that assumption A3 (exogeneity) is violated:
$$E(\varepsilon \mid X) \neq 0_{N \times 1}$$
with
$$\operatorname{plim} \frac{1}{N} X'\varepsilon = E(x_j \varepsilon_j) = \gamma \neq 0_{K \times 1}$$

Theorem (Bias of the OLS estimator)

If the regressors are endogenous, i.e. $E(\varepsilon \mid X) \neq 0$, the OLS estimator of $\beta$ is biased
$$E(\hat{\beta}_{OLS}) \neq \beta$$
where $\beta$ denotes the true value of the parameters. This bias is called the endogeneity bias.

Theorem (Inconsistency of the OLS estimator)

If the regressors are endogenous with $\operatorname{plim} N^{-1} X'\varepsilon = \gamma$, the OLS estimator of $\beta$ is inconsistent
$$\operatorname{plim} \hat{\beta}_{OLS} = \beta + Q^{-1}\gamma$$
where $Q = \operatorname{plim} N^{-1} X'X$.

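Proof: Given the definition of the OLS estimator:
$$\hat{\beta}_{OLS} = (X'X)^{-1} X'y = (X'X)^{-1} X'(X\beta + \varepsilon) = \beta + (X'X)^{-1} X'\varepsilon$$
We have:
$$\operatorname{plim} \hat{\beta}_{OLS} = \beta + \left( \operatorname{plim} \frac{1}{N} X'X \right)^{-1} \operatorname{plim} \frac{1}{N} X'\varepsilon = \beta + Q^{-1}\gamma \neq \beta$$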

Remarks
$$\operatorname{plim} \hat{\beta}_{OLS} = \beta + Q^{-1}\gamma$$
1 The implication is that even though only one of the variables in X is correlated with $\varepsilon$, all of the elements of $\hat{\beta}_{OLS}$ are inconsistent, not just the estimator of the coefficient on the endogenous variable.
2 This effect is called the smearing effect: the inconsistency due to the endogeneity of the one variable is smeared across all of the least squares estimators.

Example (Endogeneity, OLS estimator and smearing)
Consider the multiple linear regression model
$$y_i = 0.4 + 0.5 x_{i1} - 0.8 x_{i2} + \varepsilon_i$$
where $\varepsilon_i$ is i.i.d. with $E(\varepsilon_i) = 0$. We assume that the vector of variables defined by $w_i = (x_{i1} : x_{i2} : \varepsilon_i)$ has a multivariate normal distribution with
$$w_i \sim N(0_{3 \times 1}, \Delta)$$
with
$$\Delta = \begin{pmatrix} 1 & 0.3 & 0 \\ 0.3 & 1 & 0.5 \\ 0 & 0.5 & 1 \end{pmatrix}$$
It means that $Cov(\varepsilon_i, x_{i1}) = 0$ ($x_1$ is exogenous) but $Cov(\varepsilon_i, x_{i2}) = 0.5$ ($x_2$ is endogenous) and $Cov(x_{i1}, x_{i2}) = 0.3$ ($x_1$ is correlated with $x_2$).

Example (Endogeneity, OLS estimator and smearing (cont’d))

Write a Matlab code to (1) generate S = 1,000 samples $\{y_i, x_{i1}, x_{i2}\}_{i=1}^{N}$ of size N = 10,000. (2) For each simulated sample, determine the OLS estimators of the model
$$y_i = \beta_1 + \beta_2 x_{i1} + \beta_3 x_{i2} + \varepsilon_i$$
Denote $\hat{\beta}_s = (\hat{\beta}_{1s}\ \hat{\beta}_{2s}\ \hat{\beta}_{3s})'$ the OLS estimates obtained from the simulation $s \in \{1, .., S\}$. (3) Compare the true value of the parameters in the population (DGP) to the average OLS estimates obtained for the S simulations.
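The exercise asks for Matlab code; the sketch below shows one possible solution in Python with NumPy instead, so it is an illustration of the experiment rather than the author's own code. The function names are hypothetical. The helper `ols_plim` computes the theoretical limit $\beta + Q^{-1}\gamma$ of the two slope estimators implied by $\Delta$, for comparison with the simulated averages.

```python
import numpy as np

BETA = np.array([0.4, 0.5, -0.8])            # true parameters of the DGP
DELTA = np.array([[1.0, 0.3, 0.0],           # covariance of (x1, x2, eps)
                  [0.3, 1.0, 0.5],
                  [0.0, 0.5, 1.0]])


def simulate_ols(S=1000, N=10_000, seed=0):
    """Average OLS estimates over S simulated samples of size N."""
    rng = np.random.default_rng(seed)
    estimates = np.empty((S, 3))
    for s in range(S):
        w = rng.multivariate_normal(np.zeros(3), DELTA, size=N)
        X = np.column_stack([np.ones(N), w[:, 0], w[:, 1]])
        y = X @ BETA + w[:, 2]                # eps is the third component of w
        estimates[s] = np.linalg.lstsq(X, y, rcond=None)[0]
    return estimates.mean(axis=0)


def ols_plim():
    """Theoretical plim of the two slope estimators: beta + Sigma_xx^{-1} Cov(x, eps)."""
    sigma_xx = DELTA[:2, :2]
    cov_x_eps = DELTA[:2, 2]
    return BETA[1:] + np.linalg.solve(sigma_xx, cov_x_eps)


if __name__ == "__main__":
    print("true parameters      :", BETA)
    print("average OLS estimates:", simulate_ols(S=200))
    print("plim of the slopes   :", ols_plim())
```

The coefficient on $x_{i1}$ is also estimated inconsistently even though $x_{i1}$ is exogenous, because $x_{i1}$ is correlated with the endogenous regressor $x_{i2}$: this is the smearing effect.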



Question: What is the solution to the endogeneity issue?
The use of instruments..

Definition (Instruments)
Consider a set of H variables $z_h \in \mathbb{R}^N$ for $h = 1, .., H$. Denote Z the $N \times H$ matrix $(z_1 : .. : z_H)$. These variables are called instruments or instrumental variables if they satisfy two properties:
(1) Exogeneity: They are uncorrelated with the disturbance.
$$E(\varepsilon \mid Z) = 0_{N \times 1}$$
(2) Relevance: They are correlated with the independent variables, X.
$$E(x_{jk} z_{jh}) \neq 0$$
for $h \in \{1, .., H\}$ and $k \in \{1, .., K\}$.

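Assumptions: The instrumental variables satisfy the following properties.

Well behaved data:
$$\operatorname{plim} \frac{1}{N} Z'Z = Q_{ZZ} \quad \text{a finite } H \times H \text{ positive definite matrix}$$
Relevance:
$$\operatorname{plim} \frac{1}{N} Z'X = Q_{ZX} \quad \text{a finite } H \times K \text{ matrix with rank } K$$
Exogeneity:
$$\operatorname{plim} \frac{1}{N} Z'\varepsilon = 0_{H \times 1}$$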

Definition (Instrument properties)

We assume that the H instruments are linearly independent:
$$E(Z'Z) \text{ is non singular}$$
or equivalently
$$\operatorname{rank} E(Z'Z) = H$$

The exogeneity condition
$$E(\varepsilon_j \mid z_j) = 0 \implies E(\varepsilon_j z_j) = 0_H$$
can be expressed as an orthogonality condition or moment condition
$$E\Big( \underset{(H,1)}{z_j}\ \underset{(1,1)}{(y_j - x_j'\beta)} \Big) = \underset{(H,1)}{0_H}$$
So, we have H equations and K unknown parameters $\beta$.

Definition (Identification)
The system is identified if there exists a unique vector $\beta$ such that:
$$E\left( z_j (y_j - x_j'\beta) \right) = 0$$
where $z_j = (z_{j1} .. z_{jH})'$. For that, we have the following conditions:

(1) If H < K the model is not identified.
(2) If H = K the model is just-identified.
(3) If H > K the model is over-identified.

Remark
1 Under-identification: fewer equations (H) than unknowns (K).
2 Just-identification: the number of equations equals the number of unknowns (unique solution) => IV estimator
3 Over-identification: more equations than unknowns. Two equivalent solutions:
  1 Select K linear combinations of the instruments to have a unique solution => Two-Stage Least Squares (2SLS)
  2 Set the sample analog of the moment conditions as close as possible to zero, i.e. minimize the distance between the sample analog and zero given a metric (optimal metric or optimal weighting matrix?) => Generalized Method of Moments (GMM).


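Assumption: Consider a just-identified model
$$H = K$$

Motivation of the IV estimator

By definition of the instruments:
$$\operatorname{plim} \frac{1}{N} Z'\varepsilon = \operatorname{plim} \frac{1}{N} Z'(y - X\beta) = 0_{K \times 1}$$
So, we have:
$$\operatorname{plim} \frac{1}{N} Z'y = \left( \operatorname{plim} \frac{1}{N} Z'X \right) \beta$$
or equivalently
$$\beta = \left( \operatorname{plim} \frac{1}{N} Z'X \right)^{-1} \operatorname{plim} \frac{1}{N} Z'y$$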

Definition (Instrumental Variable (IV) estimator)

If H = K, the Instrumental Variable (IV) estimator $\hat{\beta}_{IV}$ of the parameters $\beta$ is defined as:
$$\hat{\beta}_{IV} = (Z'X)^{-1} Z'y$$

Definition (Consistency)
Under the assumption that $\operatorname{plim} N^{-1} Z'\varepsilon = 0$, the IV estimator $\hat{\beta}_{IV}$ is consistent:
$$\hat{\beta}_{IV} \xrightarrow{p} \beta$$
where $\beta$ denotes the true value of the parameters.

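Proof: By definition:
$$\hat{\beta}_{IV} = (Z'X)^{-1} Z'y = \beta + \left( \frac{1}{N} Z'X \right)^{-1} \frac{1}{N} Z'\varepsilon$$
So, we have:
$$\operatorname{plim} \hat{\beta}_{IV} = \beta + \left( \operatorname{plim} \frac{1}{N} Z'X \right)^{-1} \operatorname{plim} \frac{1}{N} Z'\varepsilon$$
Under the assumption of exogeneity of the instruments
$$\operatorname{plim} \frac{1}{N} Z'\varepsilon = 0_{K \times 1}$$
So, we have
$$\operatorname{plim} \hat{\beta}_{IV} = \beta$$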

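Definition (Asymptotic distribution)

Under some regularity conditions, the IV estimator $\hat{\beta}_{IV}$ is asymptotically normally distributed:
$$\sqrt{N}\left( \hat{\beta}_{IV} - \beta \right) \xrightarrow{d} N\left( 0_{K \times 1},\ \sigma^2\, Q_{ZX}^{-1} Q_{ZZ} Q_{XZ}^{-1} \right)$$
where
$$\underset{K \times K}{Q_{ZZ}} = \operatorname{plim} \frac{1}{N} Z'Z \qquad \underset{K \times K}{Q_{ZX}} = \operatorname{plim} \frac{1}{N} Z'X \qquad Q_{XZ} = Q_{ZX}'$$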

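Definition (Asymptotic variance covariance matrix)

The asymptotic variance covariance matrix of the IV estimator $\hat{\beta}_{IV}$ is defined as:
$$V_{asy}\left( \hat{\beta}_{IV} \right) = \frac{\sigma^2}{N}\, Q_{ZX}^{-1} Q_{ZZ} Q_{XZ}^{-1}$$
with $Q_{XZ} = Q_{ZX}'$. A consistent estimator is given by
$$\hat{V}_{asy}\left( \hat{\beta}_{IV} \right) = \hat{\sigma}^2\, (Z'X)^{-1} (Z'Z) (X'Z)^{-1}$$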

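Remarks
1 If the system is just identified (H = K), $Z'X$ is a square matrix and
$$(X'Z)^{-1} = \left( (Z'X)^{-1} \right)' \qquad Q_{XZ} = Q_{ZX}'$$
so the estimator can also be written as
$$\hat{V}_{asy}\left( \hat{\beta}_{IV} \right) = \hat{\sigma}^2\, (Z'X)^{-1} (Z'Z) \left( (Z'X)^{-1} \right)'$$
2 As usual, the estimator of the variance of the error terms is:
$$\hat{\sigma}^2 = \frac{\hat{\varepsilon}'\hat{\varepsilon}}{N - K} = \frac{1}{N - K} \sum_{i=1}^{N} \left( y_i - x_i'\hat{\beta}_{IV} \right)^2$$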
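The IV estimator and its estimated asymptotic variance covariance matrix can be sketched in Python with NumPy as follows. The function `iv_estimator` follows the formulas above for the just-identified case; the data generating process in `make_endogenous_sample` (one endogenous regressor, one instrument, true slope equal to 2) is a hypothetical example introduced only for illustration.

```python
import numpy as np


def iv_estimator(y, X, Z):
    """Just-identified IV estimator (H = K) and its estimated asymptotic covariance."""
    N, K = X.shape
    if Z.shape != (N, K):
        raise ValueError("just-identified case: Z must have the same shape as X")
    ZX = Z.T @ X
    beta_iv = np.linalg.solve(ZX, Z.T @ y)           # (Z'X)^{-1} Z'y
    resid = y - X @ beta_iv                          # residuals are computed with X, not Z
    sigma2 = resid @ resid / (N - K)
    ZX_inv = np.linalg.inv(ZX)
    cov = sigma2 * ZX_inv @ (Z.T @ Z) @ ZX_inv.T     # sigma2 (Z'X)^{-1} Z'Z (X'Z)^{-1}
    return beta_iv, cov


def make_endogenous_sample(N=20_000, seed=0):
    """Hypothetical DGP: y = 1 + 2 x + eps, x correlated with eps, z a valid instrument."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(N)
    u = rng.standard_normal(N)
    eps = 0.5 * u + np.sqrt(0.75) * rng.standard_normal(N)   # Cov(x, eps) = 0.5
    x = 0.8 * z + u
    y = 1.0 + 2.0 * x + eps
    X = np.column_stack([np.ones(N), x])
    Z = np.column_stack([np.ones(N), z])
    return y, X, Z


if __name__ == "__main__":
    y, X, Z = make_endogenous_sample()
    beta_iv, cov = iv_estimator(y, X, Z)
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    print("OLS:", beta_ols)
    print("IV :", beta_iv, "std. errors:", np.sqrt(np.diag(cov)))
```

In this hypothetical design the OLS slope converges to a value above 2 because of the endogeneity of x, while the IV slope is consistent for 2.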

Relevant instruments
1 Our analysis thus far has focused on the "identification" condition for IV estimation, that is, the "exogeneity assumption," which produces
$$\operatorname{plim} \frac{1}{N} Z'\varepsilon = 0_{K \times 1}$$
2 A growing literature has argued that greater attention needs to be given to the relevance condition
$$\operatorname{plim} \frac{1}{N} Z'X = Q_{ZX} \quad \text{a finite } H \times K \text{ matrix with rank } K$$
with H = K in the case of a just-identified model.

Relevant instruments (cont’d)
$$\operatorname{plim} \frac{1}{N} Z'X = Q_{ZX} \quad \text{a finite } H \times K \text{ matrix with rank } K$$
1 Strictly speaking, this condition is sufficient to determine the asymptotic properties of the IV estimator.
2 However, the common case of "weak instruments," in which this condition is only barely true, has attracted considerable scrutiny.

Definition (Weak instrument)

A weak instrument is an instrumental variable which is only slightly correlated with the right-hand-side variables X. In the presence of weak instruments, the quantity $Q_{ZX}$ is close to zero and we have
$$\frac{1}{N} Z'X \simeq 0_{H \times K}$$

Fact (IV estimator and weak instruments)

In the presence of weak instruments, the IV estimator $\hat{\beta}_{IV}$ has a poor precision (large variance). For $Q_{ZX} \simeq 0_{H \times K}$, the asymptotic variance tends to be very large, since:
$$V_{asy}\left( \hat{\beta}_{IV} \right) = \frac{\sigma^2}{N}\, Q_{ZX}^{-1} Q_{ZZ} Q_{XZ}^{-1}$$
As soon as $N^{-1} Z'X \simeq 0_{H \times K}$, the estimated asymptotic variance covariance matrix is also very large since
$$\hat{V}_{asy}\left( \hat{\beta}_{IV} \right) = \hat{\sigma}^2\, (Z'X)^{-1} (Z'Z) (X'Z)^{-1}$$

Assumption: Consider an over-identified model
$$H > K$$

Introduction
If Z contains more variables than X, then much of the preceding derivation is unusable, because $Z'X$ will be $H \times K$ with
$$\operatorname{rank}(Z'X) = K < H$$
So, the matrix $Z'X$ has no inverse and we cannot compute the IV estimator as:
$$\hat{\beta}_{IV} = (Z'X)^{-1} Z'y$$

Introduction (cont’d)
The crucial assumption in the previous section was the exogeneity assumption
$$\operatorname{plim} \frac{1}{N} Z'\varepsilon = 0_{H \times 1}$$
1 That is, every column of Z is asymptotically uncorrelated with $\varepsilon$.
2 That also means that every linear combination of the columns of Z is also uncorrelated with $\varepsilon$, which suggests that one approach would be to choose K linear combinations of the columns of Z.

Introduction (cont’d)
Which linear combinations should we choose?
One choice is the projection of the columns of X onto the column space of Z:
$$\hat{X} = Z (Z'Z)^{-1} Z'X$$
With this choice of instrumental variables, $\hat{X}$ in place of Z, we have
$$\hat{\beta}_{2SLS} = \left( \hat{X}'X \right)^{-1} \hat{X}'y = \left( X'Z (Z'Z)^{-1} Z'X \right)^{-1} X'Z (Z'Z)^{-1} Z'y$$

Definition (Two-stage Least Squares (2SLS) estimator)

The Two-stage Least Squares (2SLS) estimator of the parameters $\beta$ is defined as:
$$\hat{\beta}_{2SLS} = \left( \hat{X}'X \right)^{-1} \hat{X}'y$$
where $\hat{X} = Z (Z'Z)^{-1} Z'X$ corresponds to the projection of the columns of X onto the column space of Z, or equivalently by
$$\hat{\beta}_{2SLS} = \left( X'Z (Z'Z)^{-1} Z'X \right)^{-1} X'Z (Z'Z)^{-1} Z'y$$

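Remark
By definition
$$\hat{\beta}_{2SLS} = \left( \hat{X}'X \right)^{-1} \hat{X}'y$$
Since
$$\hat{X} = Z (Z'Z)^{-1} Z'X = P_Z X$$
where $P_Z$ denotes the projection matrix on the columns of Z. Reminder: $P_Z$ is symmetric and $P_Z P_Z' = P_Z$. So, we have
$$\hat{\beta}_{2SLS} = \left( X'P_Z' X \right)^{-1} \hat{X}'y = \left( X'P_Z' P_Z X \right)^{-1} \hat{X}'y = \left( \hat{X}'\hat{X} \right)^{-1} \hat{X}'y$$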

Definition (Two-stage Least Squares (2SLS) estimator)

The Two-stage Least Squares (2SLS) estimator of the parameters $\beta$ can also be defined as:
$$\hat{\beta}_{2SLS} = \left( \hat{X}'\hat{X} \right)^{-1} \hat{X}'y$$
It corresponds to the OLS estimator obtained in the regression of y on $\hat{X}$.

The 2SLS estimator can then be computed in two steps, first by computing $\hat{X}$, then by the least squares regression. That is why it is called the two-stage LS estimator.

A procedure to get the 2SLS estimator is the following

Step 1: Regress each explanatory variable $x_k$ (for $k = 1, .., K$) on the H instruments.
$$x_{kj} = \alpha_1 z_{1j} + \alpha_2 z_{2j} + .. + \alpha_H z_{Hj} + v_j$$
Step 2: Compute the OLS estimators $\hat{\alpha}_h$ and the fitted values $\hat{x}_{kj}$
$$\hat{x}_{kj} = \hat{\alpha}_1 z_{1j} + \hat{\alpha}_2 z_{2j} + .. + \hat{\alpha}_H z_{Hj}$$
Step 3: Regress the dependent variable y on the fitted values $\hat{x}_{kj}$:
$$y_j = \beta_1 \hat{x}_{1j} + \beta_2 \hat{x}_{2j} + .. + \beta_K \hat{x}_{Kj} + \varepsilon_j$$
The 2SLS estimator $\hat{\beta}_{2SLS}$ then corresponds to the OLS estimator obtained in this model.
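The three steps above can be sketched in Python with NumPy and compared with the closed-form expression of the 2SLS estimator. The function names and the over-identified data generating process (two instruments for one endogenous regressor, true slope equal to 2) are hypothetical choices made for this illustration.

```python
import numpy as np


def two_stage_least_squares(y, X, Z):
    """2SLS computed in two stages: X_hat = P_Z X, then OLS of y on X_hat."""
    # Steps 1 and 2: regress every column of X on Z and compute the fitted values
    first_stage_coef = np.linalg.lstsq(Z, X, rcond=None)[0]
    X_hat = Z @ first_stage_coef
    # Step 3: OLS regression of y on the fitted values
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


def two_sls_closed_form(y, X, Z):
    """2SLS from the formula (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'y."""
    ZtZ_inv_ZtX = np.linalg.solve(Z.T @ Z, Z.T @ X)
    ZtZ_inv_Zty = np.linalg.solve(Z.T @ Z, Z.T @ y)
    return np.linalg.solve(X.T @ Z @ ZtZ_inv_ZtX, X.T @ Z @ ZtZ_inv_Zty)


def make_overidentified_sample(N=20_000, seed=0):
    """Hypothetical DGP: y = 1 + 2 x + eps, with instruments z1 and z2 (H = 3 > K = 2)."""
    rng = np.random.default_rng(seed)
    z1, z2, u = rng.standard_normal((3, N))
    eps = 0.5 * u + np.sqrt(0.75) * rng.standard_normal(N)
    x = 0.6 * z1 + 0.6 * z2 + u
    y = 1.0 + 2.0 * x + eps
    X = np.column_stack([np.ones(N), x])
    Z = np.column_stack([np.ones(N), z1, z2])
    return y, X, Z


if __name__ == "__main__":
    y, X, Z = make_overidentified_sample()
    print("two stages :", two_stage_least_squares(y, X, Z))
    print("closed form:", two_sls_closed_form(y, X, Z))
```

Note that the constant appears in both X and Z, so its column is reproduced exactly in the fitted values. Standard errors should not be taken from the second-stage OLS output, since its residuals are computed with the fitted values instead of X.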

Theorem
If any column of X also appears in Z, i.e. if one or more explanatory (exogenous) variables are used as instruments, then that column of X is reproduced exactly in $\hat{X}$.

Example (Explanatory variables used as instruments)

Suppose that the regression contains K variables, only one of which, say, the K-th, is correlated with the disturbances, i.e. $E(x_{Ki}\varepsilon_i) \neq 0$. We can use a set of instrumental variables $z_1, ..., z_J$ plus the other $K - 1$ variables that certainly qualify as instrumental variables in their own right. So,
$$Z = (z_1 : .. : z_J : x_1 : .. : x_{K-1})$$
Then
$$\hat{X} = (x_1 : .. : x_{K-1} : \hat{x}_K)$$
where $\hat{x}_K$ denotes the projection of $x_K$ on the columns of Z.

Key Concepts SubSection 3.1
1 Endogeneity bias and smearing effect.
2 Instrument or instrumental variable.
3 Exogeneity and relevance properties of an instrument.
4 Instrumental Variable (IV) estimator.
5 Two-Stage Least Squares (2SLS) estimator.
6 Weak instrument.

Subsection 3.2
Anderson and Hsiao (1982) IV approach

3.2 Anderson and Hsiao (1982) IV approach
Objectives
1 Introduce the IV approach of Anderson and Hsiao (1982).
2 Describe their 4-step estimation procedure.
3 Introduce the first difference transformation of the dynamic model.
4 Describe their choice of instruments.

Consider a dynamic panel data model with random individual effects:
$$y_{it} = \gamma y_{i,t-1} + \beta' x_{it} + \rho' \omega_i + \alpha_i + \varepsilon_{it}$$
$\alpha_i$ are the (unobserved) individual effects,
$x_{it}$ is a vector of $K_1$ time-varying explanatory variables,
$\omega_i$ is a vector of $K_2$ time-invariant variables.

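Assumption: we assume that the component error term $v_{it} = \varepsilon_{it} + \alpha_i$ satisfies

$E(\varepsilon_{it}) = 0$, $E(\alpha_i) = 0$
$E(\varepsilon_{it}\varepsilon_{js}) = \sigma^2_{\varepsilon}$ if $j = i$ and $t = s$, 0 otherwise.
$E(\alpha_i \alpha_j) = \sigma^2_{\alpha}$ if $j = i$, 0 otherwise.
$E(\alpha_i x_{it}) = 0$, $E(\alpha_i \omega_i) = 0$ (exogeneity assumption for $\omega_i$)
$E(\varepsilon_{it} x_{it}) = 0$, $E(\varepsilon_{it} \omega_i) = 0$ (exogeneity assumption for $x_{it}$)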

The $K_1 + K_2 + 3$ parameters to estimate are

1 $\gamma$ the autoregressive parameter,
2 $\beta$ the $K_1 \times 1$ vector of parameters for the time-varying explanatory variables,
3 $\rho$ the $K_2 \times 1$ vector of parameters for the time-invariant variables,
4 $\sigma^2_{\varepsilon}$ and $\sigma^2_{\alpha}$ the variances of the error terms.

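Remark
If the vector $\omega_i$ includes a constant term, the associated parameter can be interpreted as the mean of the (random) individual effects
$$\alpha_i^* = \mu + \alpha_i \qquad E(\alpha_i) = 0$$
$$\underset{(K_2,1)}{\omega_i} = \begin{pmatrix} 1 \\ z_{i2} \\ ... \\ z_{iK_2} \end{pmatrix} \qquad \underset{(K_2,1)}{\rho} = \begin{pmatrix} \mu \\ \rho_2 \\ ... \\ \rho_{K_2} \end{pmatrix}$$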

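Vectorial form:
$$y_i = y_{i,-1}\gamma + X_i \beta + \omega_i' \rho\, e + \alpha_i e + \varepsilon_i$$
$\varepsilon_i$, $y_i$ and $y_{i,-1}$ are $T \times 1$ vectors (T is the adjusted sample size),
$X_i$ is a $T \times K_1$ matrix of time-varying explanatory variables,
$\omega_i$ is a $K_2 \times 1$ vector of time-invariant variables,
$e$ is the $T \times 1$ unit vector, and
$$E(\alpha_i) = 0 \qquad E(\alpha_i x_{it}') = 0 \qquad E(\alpha_i \omega_i') = 0$$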

In the dynamic panel data models context:

The Instrumental Variable (IV) approach was first proposed by Anderson and Hsiao (1982).
They propose an IV procedure with 2 choices of instruments and 4 steps to estimate $\gamma$, $\beta$, $\rho$ and $\sigma^2_{\varepsilon}$.
Anderson, T.W., and C. Hsiao (1982). Formulation and Estimation of Dynamic Models Using Panel Data, Journal of Econometrics, 18, 47–82.

The Anderson and Hsiao (1982) IV approach

1 First step: first difference transformation
2 Second step: choice of instruments and IV estimation of $\gamma$ and $\beta$
3 Third step: estimation of $\rho$
4 Fourth step: estimation of the variances $\sigma^2_{\alpha}$ and $\sigma^2_{\varepsilon}$



First step: first difference transformation

Taking the first difference of the model, we obtain for $t = 2, .., T$
$$(y_{it} - y_{i,t-1}) = \gamma (y_{i,t-1} - y_{i,t-2}) + \beta' (x_{it} - x_{i,t-1}) + \varepsilon_{it} - \varepsilon_{i,t-1}$$
The first difference transformation leads to the loss of one observation.
But it eliminates the individual effects (like the Within transformation).



Second step: choice of the instruments and IV estimation

A valid instrument $z_{it}$ should satisfy
$$E(z_{it}(\varepsilon_{it} - \varepsilon_{i,t-1})) = 0 \quad \text{(exogeneity property)}$$
$$E(z_{it}(y_{i,t-1} - y_{i,t-2})) \neq 0 \quad \text{(relevance property)}$$

Anderson and Hsiao (1982) propose two valid instruments:

1 First instrument: $z_{i,t} = y_{i,t-2}$
$$E(y_{i,t-2}(\varepsilon_{it} - \varepsilon_{i,t-1})) = 0 \quad \text{(exogeneity property)}$$
$$E(y_{i,t-2}(y_{i,t-1} - y_{i,t-2})) \neq 0 \quad \text{(relevance property)}$$
2 Second instrument: $z_{i,t} = (y_{i,t-2} - y_{i,t-3})$
$$E((y_{i,t-2} - y_{i,t-3})(\varepsilon_{it} - \varepsilon_{i,t-1})) = 0 \quad \text{(exogeneity property)}$$
$$E((y_{i,t-2} - y_{i,t-3})(y_{i,t-1} - y_{i,t-2})) \neq 0 \quad \text{(relevance property)}$$

Remarks
The initial first differences model includes $K_1 + 1$ regressors.
The regressor $(y_{i,t-1} - y_{i,t-2})$ is endogenous.
The regressors $(x_{it} - x_{i,t-1})$ are assumed to be exogenous.

Definition (Instruments)
Anderson and Hsiao (1982) consider two sets of $K_1 + 1$ instruments; in both cases the system is just identified (IV estimator):
$$\underset{(K_1+1,1)}{z_{it}} = \Big( \underset{(1,1)}{y_{i,t-2}} : \underset{(1,K_1)}{(x_{it} - x_{i,t-1})'} \Big)'$$
$$\underset{(K_1+1,1)}{z_{it}} = \Big( \underset{(1,1)}{(y_{i,t-2} - y_{i,t-3})} : \underset{(1,K_1)}{(x_{it} - x_{i,t-1})'} \Big)'$$

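IV estimator with the first set of instruments
$$\begin{pmatrix} \hat{\gamma}_{IV} \\ \hat{\beta}_{IV} \end{pmatrix} = (Z'X)^{-1} Z'y = \left[ \sum_{i=1}^{n} \sum_{t=2}^{T} \begin{pmatrix} y_{i,t-2}(y_{i,t-1} - y_{i,t-2}) & y_{i,t-2}(x_{it} - x_{i,t-1})' \\ (x_{it} - x_{i,t-1})(y_{i,t-1} - y_{i,t-2}) & (x_{it} - x_{i,t-1})(x_{it} - x_{i,t-1})' \end{pmatrix} \right]^{-1} \left[ \sum_{i=1}^{n} \sum_{t=2}^{T} \begin{pmatrix} y_{i,t-2} \\ x_{it} - x_{i,t-1} \end{pmatrix} (y_{i,t} - y_{i,t-1}) \right]$$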

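IV estimator with the second set of instruments
$$\begin{pmatrix} \hat{\gamma}_{IV} \\ \hat{\beta}_{IV} \end{pmatrix} = (Z'X)^{-1} Z'y = \left[ \sum_{i=1}^{n} \sum_{t=3}^{T} \begin{pmatrix} (y_{i,t-2} - y_{i,t-3})(y_{i,t-1} - y_{i,t-2}) & (y_{i,t-2} - y_{i,t-3})(x_{it} - x_{i,t-1})' \\ (x_{it} - x_{i,t-1})(y_{i,t-1} - y_{i,t-2}) & (x_{it} - x_{i,t-1})(x_{it} - x_{i,t-1})' \end{pmatrix} \right]^{-1} \left[ \sum_{i=1}^{n} \sum_{t=3}^{T} \begin{pmatrix} y_{i,t-2} - y_{i,t-3} \\ x_{it} - x_{i,t-1} \end{pmatrix} (y_{i,t} - y_{i,t-1}) \right]$$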

3. Instrumental variable (IV) estimators
Remarks
1 The first estimator (with $z_{it} = y_{i,t-2}$) has an advantage over the second one (with $z_{it} = y_{i,t-2} - y_{i,t-3}$), in that the minimum number of time periods required is two, whereas the second one requires $T \geq 3$.
2 In practice, if $T \geq 3$, the choice between both depends on the correlations between $(y_{i,t-1} - y_{i,t-2})$ and $y_{i,t-2}$ or $(y_{i,t-2} - y_{i,t-3})$ => relevance assumption.
Anderson, T.W., and C. Hsiao (1981). Estimation of Dynamic Models with Error Components, Journal of the American Statistical Association, 76, 598–606.



Third step
$$y_{it} = \gamma y_{i,t-1} + \beta' x_{it} + \rho' \omega_i + \alpha_i + \varepsilon_{it}$$
Given the estimates $\hat\gamma_{IV}$ and $\hat\beta_{IV}$, we can deduce an estimate of $\rho$, the vector of parameters for the time-invariant variables $\omega_i$.
Let us consider the following equation:
$$\bar y_i - \hat\gamma_{IV}\,\bar y_{i,-1} - \hat\beta_{IV}'\,\bar x_i = \rho' \omega_i + v_i, \qquad i = 1, \dots, n$$
with $v_i = \alpha_i + \bar\varepsilon_i$.
The parameter vector $\rho$ can simply be estimated by OLS.

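Definition (parameters of time-invariant variables)

A consistent estimator of the parameters $\rho$ is given by
$$\underset{(K_2,\,1)}{\hat\rho} = \left( \sum_{i=1}^{n} \omega_i \omega_i' \right)^{-1} \left( \sum_{i=1}^{n} \omega_i h_i \right)$$
with $h_i = \bar y_i - \hat\gamma_{IV}\,\bar y_{i,-1} - \hat\beta_{IV}'\,\bar x_i$.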



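Fourth step: estimation of the variances
Definition
Given $\hat\gamma_{IV}$, $\hat\beta_{IV}$, and $\hat\rho$, we can estimate the variances as follows:
$$\hat\sigma^2_\varepsilon = \frac{1}{2n(T-1)} \sum_{t=2}^{T} \sum_{i=1}^{n} \hat\varepsilon_{it}^2$$
$$\hat\sigma^2_\alpha = \frac{1}{n} \sum_{i=1}^{n} \left( \bar y_i - \hat\gamma_{IV}\,\bar y_{i,-1} - \hat\beta_{IV}'\,\bar x_i - \hat\rho' \omega_i \right)^2 - \frac{1}{T}\,\hat\sigma^2_\varepsilon$$
with
$$\hat\varepsilon_{it} = (y_{i,t} - y_{i,t-1}) - \hat\gamma_{IV}\,(y_{i,t-1} - y_{i,t-2}) - \hat\beta_{IV}'\,(x_{i,t} - x_{i,t-1})$$
The factor 2 appears because $\hat\varepsilon_{it}$ is the residual of the first-differenced equation, and $V(\varepsilon_{it} - \varepsilon_{i,t-1}) = 2\sigma^2_\varepsilon$.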

Theorem
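The instrumental-variable estimators of $\gamma$, $\beta$, and $\sigma^2_\varepsilon$ are consistent when $n$ (correction of the Nickell bias), or $T$, or both tend to infinity:
$$\operatorname*{plim}_{n,T \to \infty} \hat\gamma_{IV} = \gamma \qquad \operatorname*{plim}_{n,T \to \infty} \hat\beta_{IV} = \beta \qquad \operatorname*{plim}_{n,T \to \infty} \hat\sigma^2_\varepsilon = \sigma^2_\varepsilon$$
The estimators of $\rho$ and $\sigma^2_\alpha$ are consistent only when $n$ goes to infinity:
$$\operatorname*{plim}_{n \to \infty} \hat\rho = \rho \qquad \operatorname*{plim}_{n \to \infty} \hat\sigma^2_\alpha = \sigma^2_\alpha$$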

Key Concepts SubSection 3.2
1 Anderson and Hsiao (1982) IV approach.

2 The 4 steps of the estimation procedure.
3 First-difference transformation of the dynamic panel model.
4 Two choices of instrument.

Section 4
Generalized Method of Moment (GMM)

4. The GMM approach
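Let us consider the same dynamic panel data model as in section 3:
$$y_{it} = \gamma y_{i,t-1} + \beta' x_{it} + \rho' \omega_i + \alpha_i + \varepsilon_{it}$$
$\alpha_i$ are the (unobserved) individual effects,
$x_{it}$ is a vector of $K_1$ time-varying explanatory variables,
$\omega_i$ is a vector of $K_2$ time-invariant variables.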

4. The GMM approach
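Assumptions: we assume that the components of the error term $v_{it} = \varepsilon_{it} + \alpha_i$ satisfy

$E(\varepsilon_{it}) = 0$, $E(\alpha_i) = 0$
$E(\varepsilon_{it}\varepsilon_{js}) = \sigma^2_\varepsilon$ if $j = i$ and $t = s$, 0 otherwise.
$E(\alpha_i \alpha_j) = \sigma^2_\alpha$ if $j = i$, 0 otherwise.
$E(\alpha_i x_{it}) = 0$, $E(\alpha_i \omega_i) = 0$ (exogeneity assumption for $\omega_i$)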

4. The GMM approach
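Definition (First difference model)

The GMM estimation method is based on a model in first differences, in order to sweep out the individual effects $\alpha_i$ and the time-invariant variables $\omega_i$:
$$(y_{it} - y_{i,t-1}) = \gamma\,(y_{i,t-1} - y_{i,t-2}) + \beta'\,(x_{it} - x_{i,t-1}) + \varepsilon_{it} - \varepsilon_{i,t-1}$$
for $t = 2, \dots, T$.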

4. The GMM approach
Intuition of the moment conditions

Notice that $y_{i,t-2}$ and $(y_{i,t-2} - y_{i,t-3})$ are not the only valid instruments for $(y_{i,t-1} - y_{i,t-2})$.
All the lagged variables $y_{i,t-2-j}$, for $j \geq 0$, satisfy
$E\left(y_{i,t-2-j}\,(\varepsilon_{i,t} - \varepsilon_{i,t-1})\right) = 0$ (exogeneity property)
$E\left(y_{i,t-2-j}\,(y_{i,t-1} - y_{i,t-2})\right) \neq 0$ (relevance property)

Therefore, they are all legitimate instruments for $(y_{i,t-1} - y_{i,t-2})$.

4. The GMM approach
Intuition of the moment conditions

The $m + 1$ conditions
$$E\left(y_{i,t-2-j}\,(\varepsilon_{i,t} - \varepsilon_{i,t-1})\right) = 0 \quad \text{for } j = 0, 1, \dots, m$$
can be used as moment conditions in order to estimate
$$\theta = \left(\beta, \gamma, \rho, \sigma^2_\alpha, \sigma^2_\varepsilon\right)$$
Arellano, M., and S. Bond (1991). "Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations," Review of Economic Studies, 58, 277–297.

4. The GMM approach
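Remark: The moment conditions
$$E\left(y_{i,t-2-j}\,(\varepsilon_{i,t} - \varepsilon_{i,t-1})\right) = 0 \quad \text{for } j = 0, 1, \dots, m$$
mean that there exists a vector of parameters (true value)
$$\theta_0 = \left(\beta_0', \gamma_0, \rho_0', \sigma^2_{\alpha 0}, \sigma^2_{\varepsilon 0}\right)'$$
such that
$$E\left(y_{i,t-2-j}\,\left(\Delta y_{it} - \gamma_0\,\Delta y_{i,t-1} - \beta_0'\,\Delta x_{it}\right)\right) = 0$$
where $\Delta = (1 - L)$ and $L$ denotes the lag operator.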

4. The GMM approach
We consider two alternative assumptions on the explanatory variables $x_{it}$:
1 The explanatory variables $x_{it}$ are strictly exogenous.

2 The explanatory variables $x_{it}$ are pre-determined.

4. The GMM approach
Assumption: exogeneity
We assume that the time-varying explanatory variables $x_{it}$ are strictly exogenous in the sense that:
$$E\left(x_{it}'\,\varepsilon_{is}\right) = 0 \quad \forall (t, s)$$

4. The GMM approach
Definition (moment conditions)

For each period, we have the following orthogonality conditions
$$E(q_{it}\,\Delta\varepsilon_{it}) = 0, \quad t = 2, \dots, T$$
$$\underset{(t-1+TK_1,\,1)}{q_{it}} = \left(y_{i0}, y_{i1}, \dots, y_{i,t-2}, x_i'\right)'$$
where $x_i = \left(x_{i1}', \dots, x_{iT}'\right)'$, $\Delta = (1 - L)$ and $L$ denotes the lag operator.

4. The GMM approach
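Example (moment conditions)

The condition $E(q_{it}\,\Delta\varepsilon_{it}) = 0$, with $q_{it} = (y_{i0}, y_{i1}, \dots, y_{i,t-2}, x_i')'$, at time $t = 2$ becomes
$$E\Big(\underset{(1+TK_1,\,1)}{q_{i2}}\;\underset{(1,\,1)}{\Delta\varepsilon_{i2}}\Big) = E\left( \begin{pmatrix} y_{i0} \\ x_i \end{pmatrix} (\varepsilon_{i2} - \varepsilon_{i1}) \right) = \underset{(1+TK_1,\,1)}{0}$$
where $x_i = \left(x_{i1}', \dots, x_{iT}'\right)'$. At time $t = 3$, we have
$$E\Big(\underset{(2+TK_1,\,1)}{q_{i3}}\;\underset{(1,\,1)}{\Delta\varepsilon_{i3}}\Big) = E\left( \begin{pmatrix} y_{i0} \\ y_{i1} \\ x_i \end{pmatrix} (\varepsilon_{i3} - \varepsilon_{i2}) \right) = \underset{(2+TK_1,\,1)}{0}$$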

4. The GMM approach
Under the exogeneity assumption, what is the number of moment conditions?
$$E(q_{it}\,\Delta\varepsilon_{it}) = 0, \quad t = 2, \dots, T$$

Time      Number of moment conditions
t = 2     $1 + TK_1$
t = 3     $2 + TK_1$
...       ...
t = T     $T - 1 + TK_1$
total     $T(T-1)(K_1 + 1/2)$

4. The GMM approach
Proof: the total number of moment conditions is equal to
$$
\begin{aligned}
r &= (1 + TK_1) + (2 + TK_1) + \dots + (T - 1 + TK_1) \\
  &= T(T-1)K_1 + 1 + 2 + \dots + (T-1) \\
  &= T(T-1)K_1 + \frac{T(T-1)}{2} \\
  &= T(T-1)\left(K_1 + \frac{1}{2}\right)
\end{aligned}
$$
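The count can be checked numerically by summing the size of $q_{it}$ over $t = 2, \dots, T$ and comparing it with the closed form. The sketch below uses illustrative function names and `fractions.Fraction` so that the factor $1/2$ stays exact.

```python
from fractions import Fraction


def count_moments_exogenous(T, K1):
    """Sum the sizes of q_it = (y_i0, ..., y_{i,t-2}, x_i')' over t = 2..T."""
    return sum((t - 1) + T * K1 for t in range(2, T + 1))


def closed_form_exogenous(T, K1):
    """Closed form r = T (T - 1) (K1 + 1/2), kept exact with Fraction."""
    return T * (T - 1) * (K1 + Fraction(1, 2))


if __name__ == "__main__":
    for T, K1 in [(10, 0), (50, 0), (10, 1)]:
        print(T, K1, count_moments_exogenous(T, K1), closed_form_exogenous(T, K1))
```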

4. The GMM approach
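Stacking the $T - 1$ first-differenced equations in matrix form, we have
$$\underset{(T-1,\,1)}{\Delta y_i} = \underset{(T-1,\,1)}{\Delta y_{i,-1}}\,\underset{(1,\,1)}{\gamma} + \underset{(T-1,\,K_1)}{\Delta X_i}\,\underset{(K_1,\,1)}{\beta} + \underset{(T-1,\,1)}{\Delta\varepsilon_i}, \qquad i = 1, \dots, n$$
where:
$$
\underset{(T-1,\,1)}{\Delta y_i} = \begin{pmatrix} y_{i2} - y_{i1} \\ y_{i3} - y_{i2} \\ \vdots \\ y_{iT} - y_{i,T-1} \end{pmatrix}
\qquad
\underset{(T-1,\,1)}{\Delta y_{i,-1}} = \begin{pmatrix} y_{i1} - y_{i0} \\ y_{i2} - y_{i1} \\ \vdots \\ y_{i,T-1} - y_{i,T-2} \end{pmatrix}
$$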

4. The GMM approach
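In the stacked system of $T - 1$ first-differenced equations
$$\underset{(T-1,\,1)}{\Delta y_i} = \underset{(T-1,\,1)}{\Delta y_{i,-1}}\,\underset{(1,\,1)}{\gamma} + \underset{(T-1,\,K_1)}{\Delta X_i}\,\underset{(K_1,\,1)}{\beta} + \underset{(T-1,\,1)}{\Delta\varepsilon_i}, \qquad i = 1, \dots, n$$
the remaining terms are:
$$
\underset{(T-1,\,K_1)}{\Delta X_i} = \begin{pmatrix} x_{i2}' - x_{i1}' \\ x_{i3}' - x_{i2}' \\ \vdots \\ x_{iT}' - x_{i,T-1}' \end{pmatrix}
\qquad
\underset{(T-1,\,1)}{\Delta\varepsilon_i} = \begin{pmatrix} \varepsilon_{i2} - \varepsilon_{i1} \\ \varepsilon_{i3} - \varepsilon_{i2} \\ \vdots \\ \varepsilon_{iT} - \varepsilon_{i,T-1} \end{pmatrix}
$$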

4. The GMM approach
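Definition (moment conditions)

The conditions $E(q_{it}\,\Delta\varepsilon_{it}) = 0$ for $t = 2, \dots, T$ can be written as
$$E\Big(\underset{(r,\,T-1)}{W_i}\;\underset{(T-1,\,1)}{\Delta\varepsilon_i}\Big) = \underset{(r,\,1)}{0}$$
$$
W_i = \begin{pmatrix}
\underset{(1+TK_1,\,1)}{q_{i2}} & 0 & \cdots & 0 \\
0 & \underset{(2+TK_1,\,1)}{q_{i3}} & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & \underset{(T-1+TK_1,\,1)}{q_{iT}}
\end{pmatrix}
$$
where $r = T(T-1)(K_1 + 1/2)$ is the number of moment conditions.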

4. The GMM approach
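Example (moment conditions, vectorial form)

At time $t = 2$, we have
$$E(q_{i2}\,\Delta\varepsilon_{i2}) = E\left( \begin{pmatrix} y_{i0} \\ x_i \end{pmatrix} (\varepsilon_{i2} - \varepsilon_{i1}) \right) = 0$$
In vectorial form, the first block of rows of $W_i$ gives the first set of $1 + TK_1$ moment conditions:
$$
E\left( \begin{pmatrix} \underset{(1+TK_1,\,1)}{q_{i2}} & 0 & \cdots & 0 \end{pmatrix}
\begin{pmatrix} \varepsilon_{i2} - \varepsilon_{i1} \\ \varepsilon_{i3} - \varepsilon_{i2} \\ \vdots \\ \varepsilon_{iT} - \varepsilon_{i,T-1} \end{pmatrix} \right) = 0
$$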

4. The GMM approach
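Example (moment conditions, vectorial form)

At time $t = 3$, we have
$$E(q_{i3}\,\Delta\varepsilon_{i3}) = E\left( \begin{pmatrix} y_{i0} \\ y_{i1} \\ x_i \end{pmatrix} (\varepsilon_{i3} - \varepsilon_{i2}) \right) = 0$$
In vectorial form, the second block of rows of $W_i$ gives the second set of $2 + TK_1$ moment conditions:
$$
E\left( \begin{pmatrix} 0 & \underset{(2+TK_1,\,1)}{q_{i3}} & \cdots & 0 \end{pmatrix}
\begin{pmatrix} \varepsilon_{i2} - \varepsilon_{i1} \\ \varepsilon_{i3} - \varepsilon_{i2} \\ \vdots \\ \varepsilon_{iT} - \varepsilon_{i,T-1} \end{pmatrix} \right) = 0
$$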
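To make the block-diagonal layout of $W_i$ concrete, here is a minimal sketch that builds it for one individual in the special case $K_1 = 0$ (no $x_{it}$, an assumption made only to keep the example short). The function names are illustrative; `moment_vector` evaluates $W_i(\Delta y_i - \gamma\,\Delta y_{i,-1})$ for a trial value of $\gamma$.

```python
def instrument_matrix(y):
    """Build W_i (r rows, T-1 columns) for K1 = 0 from y = [y_i0, ..., y_iT]."""
    T = len(y) - 1
    rows = []
    for t in range(2, T + 1):          # one diagonal block per equation dated t
        col = t - 2                    # column of that equation in W_i
        for lag_value in y[: t - 1]:   # q_it = (y_i0, ..., y_{i,t-2})'
            row = [0.0] * (T - 1)
            row[col] = lag_value
            rows.append(row)
    return rows


def moment_vector(y, gamma):
    """Evaluate W_i (dy_i - gamma * dy_{i,-1}) for a trial value of gamma."""
    T = len(y) - 1
    resid = [(y[t] - y[t - 1]) - gamma * (y[t - 1] - y[t - 2])
             for t in range(2, T + 1)]
    return [sum(w * e for w, e in zip(row, resid))
            for row in instrument_matrix(y)]


if __name__ == "__main__":
    y_i = [1.0, 2.0, 4.0, 7.0]         # y_i0, ..., y_i3, so T = 3
    for row in instrument_matrix(y_i):
        print(row)
    print(moment_vector(y_i, gamma=0.5))
```

With $T = 3$ the matrix has $r = T(T-1)/2 = 3$ rows: one for $q_{i2} = (y_{i0})$ and two for $q_{i3} = (y_{i0}, y_{i1})'$.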

4. The GMM approach
Example
For $T = 10$ and $K_1 = 0$ (without explanatory variables), we have
$$r = \frac{T(T-1)}{2} = 45 \text{ orthogonality conditions}$$
Example
For $T = 50$ and $K_1 = 0$ (without explanatory variables), we have
$$r = \frac{T(T-1)}{2} = 1225 \text{ orthogonality conditions!}$$

4. The GMM approach
[Figure: number of orthogonal conditions (vertical axis, 0 to 5000) as a function of $T$ (horizontal axis, 0 to 100).]

4. The GMM approach
Assumption: pre-determination
We assume that the time-varying explanatory variables $x_{it}$ are pre-determined in the sense that:
$$E\left(x_{it}'\,\varepsilon_{is}\right) = 0 \quad \text{if } t \leq s$$

4. The GMM approach
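In this case, we have
$$E(q_{it}\,\Delta\varepsilon_{it}) = 0, \quad t = 2, \dots, T$$
$$\underset{((t-1)(1+K_1),\,1)}{q_{it}} = \left(y_{i0}, y_{i1}, \dots, y_{i,t-2}, x_{i1}', \dots, x_{i,t-1}'\right)'$$
The block of $x$ variables now stops at $t - 1$, not at $T$.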

4. The GMM approach
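Definition
The conditions $E(q_{it}\,\Delta\varepsilon_{it}) = 0$ for $t = 2, \dots, T$ can be written as
$$E\Big(\underset{(r,\,T-1)}{W_i}\;\underset{(T-1,\,1)}{\Delta\varepsilon_i}\Big) = \underset{(r,\,1)}{0}$$
$$
W_i = \begin{pmatrix}
\underset{(1+K_1,\,1)}{q_{i2}} & 0 & \cdots & 0 \\
0 & \underset{(2+2K_1,\,1)}{q_{i3}} & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & \underset{(T-1+(T-1)K_1,\,1)}{q_{iT}}
\end{pmatrix}
$$
where $r = T(T-1)(K_1 + 1)/2$ is the number of moment conditions.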

4. The GMM approach
Proof: the total number of moment conditions is equal to
$$
\begin{aligned}
r &= (1 + K_1) + (2 + 2K_1) + \dots + \left((T-1) + (T-1)K_1\right) \\
  &= (1 + K_1)\left(1 + 2 + \dots + (T-1)\right) \\
  &= (1 + K_1)\,\frac{T(T-1)}{2}
\end{aligned}
$$
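The two counting rules can be compared directly. The sketch below (illustrative function names) sums the size of $q_{it}$ over $t$ under each assumption; for any $K_1 \geq 1$ the strictly exogenous case yields more conditions, because every $q_{it}$ then contains all $T$ vectors $x_{i1}, \dots, x_{iT}$.

```python
def count_moments_exogenous(T, K1):
    """q_it holds t-1 lagged y values and all T vectors x_i1, ..., x_iT."""
    return sum((t - 1) + T * K1 for t in range(2, T + 1))


def count_moments_predetermined(T, K1):
    """q_it holds t-1 lagged y values and the t-1 vectors x_i1, ..., x_{i,t-1}."""
    return sum((t - 1) * (1 + K1) for t in range(2, T + 1))


if __name__ == "__main__":
    K1 = 1
    for T in (10, 50, 100):
        print(T, count_moments_exogenous(T, K1), count_moments_predetermined(T, K1))
```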

4. The GMM approach
[Figure: number of orthogonal conditions for $K_1 = 1$ (vertical axis, 0 to 15000) as a function of $T$ (horizontal axis, 0 to 100), with one curve for X exogenous and one for X pre-determined.]

4. The GMM approach
Fact
Whatever the assumption made on the explanatory variables, the number of orthogonality conditions (moments) $r$ is much larger than the number of parameters, i.e. $K_1 + 1$. Thus, Arellano and Bond (1991) suggest a generalized method of moments (GMM) estimator.
Arellano, M., and S. Bond (1991). "Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations," Review of Economic Studies, 58, 277–297.

4. The GMM approach
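We will exploit the moment conditions
$$E(W_i\,\Delta\varepsilon_i) = 0$$
to estimate by GMM the parameters $\theta = (\gamma, \beta')'$ in
$$\Delta y_i = \Delta y_{i,-1}\,\gamma + \Delta X_i\,\beta + \Delta\varepsilon_i, \qquad i = 1, \dots, n$$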

Subsection 4.1
GMM: a general presentation

4.1 GMM: a general presentation
Definition
The standard method of moments estimator consists of solving for the unknown parameter vector $\theta$ by equating the theoretical moments with their empirical counterparts or estimates.

1 Suppose that there exist relations $m(y_t; \theta)$ such that
$$E\left(m(y_t; \theta_0)\right) = 0$$
where $\theta_0$ is the true value of $\theta$ and $m(y_t; \theta_0)$ is an $r \times 1$ vector.

2 Let $\hat m(y, \theta)$ be the sample estimate of $E(m(y_t; \theta))$ based on $n$ independent observations of $y_t$:
$$\hat m(y, \theta) = \frac{1}{n} \sum_{t=1}^{n} m(y_t; \theta)$$
3 Then the method of moments consists in finding $\hat\theta$ such that
$$\hat m(y, \hat\theta) = 0$$

Intuition of the GMM
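Consider the moment conditions such that
$$E\left(m(y_t; \theta_0)\right) = 0$$
Under some regularity assumptions, $\forall \theta \in \Theta$
$$\hat m(y, \theta) = \frac{1}{n} \sum_{t=1}^{n} m(y_t; \theta) \xrightarrow{p} E\left(m(y_t; \theta)\right)$$
In particular
$$\hat m(y, \theta_0) \xrightarrow{p} E\left(m(y_t; \theta_0)\right) = 0$$
So, the GMM consists in finding $\hat\theta$ such that
$$\hat m(y, \hat\theta) = 0 \implies \hat\theta \xrightarrow{p} \theta_0$$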

Fact (just identified system)

If the number $r$ of equations (moments) is equal to the dimension $a$ of $\theta$, it is in general possible to solve for $\hat\theta$ uniquely. The system is just identified.

Example (classical method of moments)

Consider a random variable $y_t \sim t(v)$. We want to estimate $v$ from an i.i.d. sample $\{y_1, \dots, y_n\}$. We know that:
$$\mu_2 = E\left(y_t^2\right) = V(y_t) = \frac{v}{v - 2}$$
If $\mu_2$ is known, then $v$ can be identified as:
$$v = \frac{2E\left(y_t^2\right)}{E\left(y_t^2\right) - 1}$$

Now let us consider the sample variance $\hat\mu_2$:
$$\hat\mu_2 = \frac{1}{n} \sum_{t=1}^{n} y_t^2 \xrightarrow{p} \mu_2$$
So, a consistent estimate (classical method of moments) of $v$ is defined by:
$$\hat v = \frac{2\hat\mu_2}{\hat\mu_2 - 1}$$


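Another way to write the problem consists in defining
$$m(y_t; v) = y_t^2 - \frac{v}{v - 2}$$
By definition, we have:
$$E\left(m(y_t; v)\right) = E\left(y_t^2 - \frac{v}{v - 2}\right) = 0$$

The moment condition ($r = 1$) is
$$E\left(m(y_t; v)\right) = E\left(y_t^2 - \frac{v}{v - 2}\right) = 0$$
The empirical counterpart is
$$\hat m(y; v) = \frac{1}{n} \sum_{t=1}^{n} m(y_t; v) = \frac{1}{n} \sum_{t=1}^{n} y_t^2 - \frac{v}{v - 2}$$
So, the estimator $\hat v$ of the classical method of moments is defined by:
$$\hat m(y; \hat v) = 0 \iff \hat v = \frac{2\hat\mu_2}{\hat\mu_2 - 1} \xrightarrow{p} v = \frac{2E\left(y_t^2\right)}{E\left(y_t^2\right) - 1}$$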
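The just-identified example can be tried on simulated data. The sketch below is illustrative only: it draws Student variables as $N(0,1)/\sqrt{\chi^2(v)/v}$ using the standard library, with the true $v$, the sample size and the seed chosen arbitrarily, and then applies $\hat v = 2\hat\mu_2 / (\hat\mu_2 - 1)$.

```python
import math
import random


def draw_student_t(v, n, seed=0):
    """Draw n values from t(v) as N(0,1) / sqrt(chi2(v) / v)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(v / 2.0, 2.0)  # chi-square with v degrees of freedom
        sample.append(z / math.sqrt(chi2 / v))
    return sample


def mm_degrees_of_freedom(sample):
    """Method of moments: solve mu2_hat = v / (v - 2) for v."""
    mu2_hat = sum(y * y for y in sample) / len(sample)
    return 2.0 * mu2_hat / (mu2_hat - 1.0)


if __name__ == "__main__":
    ys = draw_student_t(v=10, n=200_000, seed=1)
    print("estimated v:", mm_degrees_of_freedom(ys))
```

The estimate is only defined when $\hat\mu_2 > 1$, and it becomes very noisy when $\hat\mu_2$ is close to 1, that is, when $v$ is large.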

Definition (over-identified system)

If the number of moments $r$ is greater than the dimension of $\theta$, the system of nonlinear equations $\hat m(y; \hat\theta) = 0$ has, in general, no solution. The system is over-identified.

If the system is over-identified, it is then necessary to minimize some norm (or distance measure) of $\hat m(y; \theta)$ in order to find $\hat\theta$:
$$q(y, \theta) = \hat m(y; \theta)'\,S^{-1}\,\hat m(y; \theta)$$
where $S^{-1}$ is a positive definite (weighting) matrix.

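Example (weighting matrix)

Consider a random variable $y_t \sim t(v)$. We want to estimate $v$ from an i.i.d. sample $\{y_1, \dots, y_n\}$. We know that:
$$\mu_2 = E\left(y_t^2\right) = \frac{v}{v - 2}$$
$$\mu_4 = E\left(y_t^4\right) = \frac{3v^2}{(v - 2)(v - 4)}$$
The two moment conditions ($r = 2$) can be written as
$$E\left(m(y_t; v)\right) = E\begin{pmatrix} y_t^2 - \dfrac{v}{v - 2} \\ y_t^4 - \dfrac{3v^2}{(v - 2)(v - 4)} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$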


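In general, there is no single value $\hat v$ that satisfies both equations, i.e. such that
$$\hat m(y; \hat v) = \frac{1}{n} \sum_{t=1}^{n} m(y_t; \hat v) = \begin{pmatrix} \dfrac{1}{n} \sum_{t=1}^{n} y_t^2 - \dfrac{\hat v}{\hat v - 2} \\ \dfrac{1}{n} \sum_{t=1}^{n} y_t^4 - \dfrac{3\hat v^2}{(\hat v - 2)(\hat v - 4)} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
So, we have to impose some weights on the two moment conditions:
$$\hat m(y; v)'\,S^{-1}\,\hat m(y; v)$$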


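Let us assume that
$$S^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$$
then we have
$$\hat m(y; v)'\,S^{-1}\,\hat m(y; v) = \left( \frac{1}{n} \sum_{t=1}^{n} y_t^2 - \frac{v}{v - 2} \right)^2 + 2 \left( \frac{1}{n} \sum_{t=1}^{n} y_t^4 - \frac{3v^2}{(v - 2)(v - 4)} \right)^2$$
It is now possible to find the value $\hat v$ that minimizes $\hat m(y; v)'\,S^{-1}\,\hat m(y; v)$; the minimum is in general not exactly zero.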
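The weighted criterion of this example can be minimized numerically. The sketch below uses a plain grid search over $v > 4$ (so that $\mu_4$ exists) instead of a numerical optimizer; the grid bounds, the step and the function names are arbitrary choices made for illustration, and the weights $(1, 2)$ are those of the example.

```python
import math
import random


def sample_moments(sample):
    """Return the empirical second and fourth moments."""
    n = len(sample)
    mu2 = sum(y ** 2 for y in sample) / n
    mu4 = sum(y ** 4 for y in sample) / n
    return mu2, mu4


def criterion(v, mu2_hat, mu4_hat, weights=(1.0, 2.0)):
    """m_hat' S^{-1} m_hat with a diagonal S^{-1} = diag(weights)."""
    g1 = mu2_hat - v / (v - 2.0)
    g2 = mu4_hat - 3.0 * v ** 2 / ((v - 2.0) * (v - 4.0))
    return weights[0] * g1 ** 2 + weights[1] * g2 ** 2


def gmm_grid(mu2_hat, mu4_hat, lo=4.1, hi=60.0, step=0.01):
    """Grid-search minimizer of the criterion over lo <= v <= hi."""
    steps = int(round((hi - lo) / step))
    grid = (lo + k * step for k in range(steps + 1))
    return min(grid, key=lambda v: criterion(v, mu2_hat, mu4_hat))


if __name__ == "__main__":
    rng = random.Random(3)
    v_true = 10
    ys = [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(v_true / 2.0, 2.0) / v_true)
          for _ in range(100_000)]
    mu2_hat, mu4_hat = sample_moments(ys)
    print("GMM estimate of v:", gmm_grid(mu2_hat, mu4_hat))
```

Because the fourth moment reacts much more strongly to $v$ than the second one, this particular weighting lets the fourth-moment condition dominate the estimate.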

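Definition (GMM estimator)

The GMM estimator $\hat\theta$ minimizes the following criterion
$$\hat\theta = \arg\min_{\theta \in \mathbb{R}^a} \underset{(1,\,1)}{q(y, \theta)} = \arg\min_{\theta \in \mathbb{R}^a} \underset{(1,\,r)}{\hat m(y; \theta)'}\;\underset{(r,\,r)}{S^{-1}}\;\underset{(r,\,1)}{\hat m(y; \theta)}$$
where $S^{-1}$ is the optimal weighting matrix.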

What is the optimal weighting matrix?
$$\hat\theta = \arg\min_{\theta \in \mathbb{R}^a} \underset{(1,\,1)}{q(y, \theta)} = \arg\min_{\theta \in \mathbb{R}^a} \underset{(1,\,r)}{\hat m(y; \theta)'}\;\underset{(r,\,r)}{S^{-1}}\;\underset{(r,\,1)}{\hat m(y; \theta)}$$
The optimal choice of $S$ (if there is no autocorrelation of $m(y; \theta_0)$) turns out to be
$$\underset{(r,\,r)}{S} = E\Big(\underset{(r,\,1)}{m(y; \theta_0)}\;\underset{(1,\,r)}{m(y; \theta_0)'}\Big)$$
The matrix $S$ corresponds to the variance-covariance matrix of the vector $m(y; \theta_0)$.

Definition (Optimal weighting matrix)

In the general case, the optimal weighting matrix is the inverse of the long-run variance-covariance matrix of $m(y_t; \theta_0)$:
$$\underset{(r,\,r)}{S} = \sum_{j=-\infty}^{\infty} E\Big(\underset{(r,\,1)}{m(y_t; \theta_0)}\;\underset{(1,\,r)}{m(y_{t-j}; \theta_0)'}\Big)$$

Remark
The optimal weighting matrix is $S^{-1}$, where
$$S = \sum_{j=-\infty}^{\infty} E\left(m(y_t; \theta_0)\,m(y_{t-j}; \theta_0)'\right)$$
We can replace the unknown value $\theta_0$ by the GMM estimator $\hat\theta$, and $S$ becomes
$$S = \sum_{j=-\infty}^{\infty} E\left(m(y_t; \hat\theta)\,m(y_{t-j}; \hat\theta)'\right)$$

Problem 1: How to estimate $S$?

$$S = \sum_{j=-\infty}^{\infty} E\left(m(y_t; \hat\theta)\,m(y_{t-j}; \hat\theta)'\right)$$
A first (too) simple solution consists in using the empirical counterparts of the variance and covariances:
$$\hat S = \sum_{j=-(n-2)}^{n-2} \hat\Gamma_j$$
$$\hat\Gamma_j = \frac{1}{n} \sum_{t=j+2}^{n} m(y_t; \hat\theta)\,m(y_{t-j}; \hat\theta)'$$
But this estimator may not be positive definite.

Solution (Non-parametric kernel estimators)

A positive definite kernel estimator for $S$ has been proposed by Newey and West (1987) and is defined as
$$\hat S = \hat\Gamma_0 + \sum_{j=1}^{q} \left(1 - \frac{j}{q + 1}\right) \left(\hat\Gamma_j + \hat\Gamma_j'\right)$$
$$\hat\Gamma_j = \frac{1}{n} \sum_{t=j+2}^{n} m(y_t; \hat\theta)\,m(y_{t-j}; \hat\theta)'$$
where $q$ is a bandwidth parameter and $K(j) = 1 - j/(q + 1)$ is a Bartlett kernel function.

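Example (Newey and West kernel estimator)

$$\hat S = \hat\Gamma_0 + \sum_{j=1}^{q} \left(1 - \frac{j}{q + 1}\right) \left(\hat\Gamma_j + \hat\Gamma_j'\right)$$
If $q = 2$ then we have
$$\hat S = \hat\Gamma_0 + \frac{2}{3} \left(\hat\Gamma_1 + \hat\Gamma_1'\right) + \frac{1}{3} \left(\hat\Gamma_2 + \hat\Gamma_2'\right)$$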
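The Newey and West formula translates almost line by line into code. The following sketch works on a list of $r$-dimensional moment vectors $m(y_t; \hat\theta)$ stored as plain Python lists; it sums each $\hat\Gamma_j$ over all pairs $(t, t-j)$ available in the supplied sample, which is an indexing choice made here for illustration. The function names are illustrative.

```python
def autocovariance(moments, j):
    """Gamma_j = (1/n) * sum over available t of m_t m_{t-j}'."""
    n, r = len(moments), len(moments[0])
    gamma = [[0.0] * r for _ in range(r)]
    for t in range(j, n):
        for a in range(r):
            for b in range(r):
                gamma[a][b] += moments[t][a] * moments[t - j][b] / n
    return gamma


def newey_west(moments, q):
    """S_hat = Gamma_0 + sum_{j=1..q} (1 - j/(q+1)) (Gamma_j + Gamma_j')."""
    r = len(moments[0])
    s = autocovariance(moments, 0)
    for j in range(1, q + 1):
        weight = 1.0 - j / (q + 1.0)       # Bartlett weight
        g = autocovariance(moments, j)
        for a in range(r):
            for b in range(r):
                s[a][b] += weight * (g[a][b] + g[b][a])
    return s


if __name__ == "__main__":
    m = [[1.0], [-1.0], [1.0], [-1.0]]     # toy scalar moment series
    print(newey_west(m, q=2))              # uses weights 2/3 and 1/3
```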

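Other estimators => other kernel functions
$$\hat S = \hat\Gamma_0 + \sum_{j=1}^{q} K\left(\frac{j}{q + 1}\right) \left(\hat\Gamma_j + \hat\Gamma_j'\right)$$
1 Gallant (1987): Parzen kernel

$$K(u) = \begin{cases} 1 - 6|u|^2 + 6|u|^3 & \text{if } 0 \leq |u| \leq 1/2 \\ 2\,(1 - |u|)^3 & \text{if } 1/2 \leq |u| \leq 1 \\ 0 & \text{otherwise} \end{cases}$$
2 Andrews (1991): quadratic spectral kernel
$$K(u) = \frac{3}{(6\pi u/5)^2} \left( \frac{\sin(6\pi u/5)}{6\pi u/5} - \cos(6\pi u/5) \right)$$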
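The kernels above are simple scalar functions of $u = j/(q+1)$. A minimal sketch, with illustrative function names, including the Bartlett kernel $K(u) = 1 - |u|$ used by Newey and West and the limit value $K(0) = 1$ for the quadratic spectral kernel:

```python
import math


def bartlett(u):
    """Bartlett kernel: 1 - |u| on [-1, 1], 0 outside."""
    return max(0.0, 1.0 - abs(u))


def parzen(u):
    """Parzen kernel as defined piecewise above."""
    a = abs(u)
    if a <= 0.5:
        return 1.0 - 6.0 * a ** 2 + 6.0 * a ** 3
    if a <= 1.0:
        return 2.0 * (1.0 - a) ** 3
    return 0.0


def quadratic_spectral(u):
    """Quadratic spectral kernel; the value at u = 0 is the limit, 1."""
    if u == 0.0:
        return 1.0
    x = 6.0 * math.pi * u / 5.0
    return 3.0 / x ** 2 * (math.sin(x) / x - math.cos(x))


def kernel_weights(kernel, q):
    """Weights K(j / (q + 1)) for j = 1, ..., q."""
    return [kernel(j / (q + 1.0)) for j in range(1, q + 1)]


if __name__ == "__main__":
    for k in (bartlett, parzen, quadratic_spectral):
        print(k.__name__, kernel_weights(k, q=4))
```

Unlike the Bartlett and Parzen kernels, the quadratic spectral kernel does not vanish for $|u| > 1$, so truncating the sum at $j = q$ is then an approximation.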

Problem 2
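$$\hat\theta = \arg\min_{\theta \in \mathbb{R}^a} \hat m(y; \theta)'\,S^{-1}\,\hat m(y; \theta)$$
$$S = \sum_{j=-\infty}^{\infty} E\left(m(y_t; \hat\theta)\,m(y_{t-j}; \hat\theta)'\right)$$
1 In order to compute $\hat\theta$, we have to know $S^{-1}$.
2 In order to compute $S$, we have to know $\hat\theta$: a circularity issue.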

Solutions
1 Two-step GMM: Hansen (1982)

2 Iterative GMM: Ferson and Foerster (1994)
3 Continuous-updating GMM: Hansen, Heaton and Yaron (1996),
Stock and Wright (2000), Newey and Smith (2003), Ma (2002).

Two-step GMM
Step 1: put the same weight on the $r$ moment conditions by using an identity weighting matrix
$$S_0 = I_r$$
Compute a first consistent (but not efficient) estimator $\hat\theta_0$:
$$\hat\theta_0 = \arg\min_{\theta \in \mathbb{R}^a} \hat m(y; \theta)'\,S_0^{-1}\,\hat m(y; \theta) = \arg\min_{\theta \in \mathbb{R}^a} \hat m(y; \theta)'\,\hat m(y; \theta)$$

Two-step GMM
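Step 2: Compute the estimator $\hat S_1$ of the optimal weighting matrix
$$\hat S_1 = \hat\Gamma_0 + \sum_{j=1}^{q} K\left(\frac{j}{q + 1}\right) \left(\hat\Gamma_j + \hat\Gamma_j'\right)$$
$$\hat\Gamma_j = \frac{1}{n} \sum_{t=j+2}^{n} m(y_t; \hat\theta_0)\,m(y_{t-j}; \hat\theta_0)'$$
Finally, compute the efficient GMM estimator $\hat\theta_1$ as
$$\hat\theta_1 = \arg\min_{\theta \in \mathbb{R}^a} \hat m(y; \theta)'\,\hat S_1^{-1}\,\hat m(y; \theta)$$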
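The two steps can be sketched on the Student example with two moment conditions. The code below is an illustration under explicit assumptions: the sample is i.i.d., so only $\hat\Gamma_0$ is used for $\hat S_1$ (no kernel terms), the minimization is done by grid search over $v > 4$, and the data are simulated with the standard library. Function names, grid bounds and simulation settings are all arbitrary.

```python
import math
import random


def draw_student_t(v, n, seed=0):
    """Draw n values from t(v) as N(0,1) / sqrt(chi2(v) / v)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(v / 2.0, 2.0) / v)
            for _ in range(n)]


def theoretical_moments(v):
    """E(y^2) and E(y^4) for a t(v) variable, valid for v > 4."""
    return v / (v - 2.0), 3.0 * v ** 2 / ((v - 2.0) * (v - 4.0))


def argmin_on_grid(objective, lo=4.1, hi=60.0, step=0.01):
    steps = int(round((hi - lo) / step))
    return min((lo + k * step for k in range(steps + 1)), key=objective)


def two_step_gmm(sample):
    n = len(sample)
    mu2 = sum(y ** 2 for y in sample) / n
    mu4 = sum(y ** 4 for y in sample) / n

    def m_bar(v):
        t2, t4 = theoretical_moments(v)
        return mu2 - t2, mu4 - t4

    # Step 1: identity weighting matrix S_0 = I_2
    v0 = argmin_on_grid(lambda v: sum(g * g for g in m_bar(v)))

    # Step 2: S_1 = (1/n) sum m(y_t; v0) m(y_t; v0)'  (Gamma_0 only, i.i.d. case)
    t2, t4 = theoretical_moments(v0)
    s11 = sum((y ** 2 - t2) ** 2 for y in sample) / n
    s12 = sum((y ** 2 - t2) * (y ** 4 - t4) for y in sample) / n
    s22 = sum((y ** 4 - t4) ** 2 for y in sample) / n
    det = s11 * s22 - s12 ** 2
    inv11, inv12, inv22 = s22 / det, -s12 / det, s11 / det   # 2x2 inverse

    def q1(v):
        g1, g2 = m_bar(v)
        return inv11 * g1 * g1 + 2.0 * inv12 * g1 * g2 + inv22 * g2 * g2

    v1 = argmin_on_grid(q1)
    return v0, v1


if __name__ == "__main__":
    first_step, second_step = two_step_gmm(draw_student_t(10, 100_000, seed=7))
    print("step 1:", first_step, " step 2:", second_step)
```

Iterative GMM would simply repeat step 2, recomputing $\hat S$ at the latest estimate until the estimate stops changing.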

Subsection 4.2
Application to dynamic panel data models

4.2 Application to dynamic panel data models
Various GMM estimators (i.e. moment conditions) have been proposed for
dynamic panel data models
1 Arellano and Bond (1991): GMM estimator

2 Arellano and Bover (1995): GMM estimator
3 Ahn and Schmidt (1995): GMM estimator
4 Blundell and Bond (2000): a system GMM estimator



Problem
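Let us consider the dynamic panel data model in first differences
$$\Delta y_i = \Delta y_{i,-1}\,\gamma + \Delta X_i\,\beta + \Delta\varepsilon_i, \qquad i = 1, \dots, n$$
We want to estimate the $K_1 + 1$ parameters $\theta = (\gamma, \beta')'$.
For that, we have $r = T(T-1)(K_1 + 1/2)$ moment conditions (if $x_{it}$ are strictly exogenous)
$$E(W_i\,\Delta\varepsilon_i) = E\left(W_i\,(\Delta y_i - \Delta y_{i,-1}\,\gamma - \Delta X_i\,\beta)\right) = 0_r$$

Let us denote
$$m(y_i, x_i; \theta) = W_i\,(\Delta y_i - \Delta y_{i,-1}\,\gamma - \Delta X_i\,\beta)$$
with
$$E\left(m(y_i, x_i; \theta)\right) = 0_r$$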

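Definition (Arellano and Bond (1991) GMM estimator)

The Arellano and Bond GMM estimator of $\theta = (\gamma, \beta')'$ is
$$\hat\theta = \arg\min_{\theta \in \mathbb{R}^{K_1+1}} \left( \frac{1}{n} \sum_{i=1}^{n} m(y_i, x_i; \theta) \right)' S^{-1} \left( \frac{1}{n} \sum_{i=1}^{n} m(y_i, x_i; \theta) \right)$$
or equivalently
$$\hat\theta = \arg\min_{\theta \in \mathbb{R}^{K_1+1}} \left( \frac{1}{n} \sum_{i=1}^{n} \Delta\varepsilon_i'\,W_i' \right) S^{-1} \left( \frac{1}{n} \sum_{i=1}^{n} W_i\,\Delta\varepsilon_i \right)$$
with $S = E\left(m(y_i, x_i; \theta_0)\,m(y_i, x_i; \theta_0)'\right)$.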

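Under the assumption of non-autocorrelation, the optimal weighting matrix can be expressed as
$$S = E\left( \frac{1}{n^2} \sum_{i=1}^{n} W_i\,\Delta\varepsilon_i\,\Delta\varepsilon_i'\,W_i' \right)$$
In the general case, $S$ is the long-run variance-covariance matrix of $n^{-2} \sum_{i=1}^{n} W_i\,\Delta\varepsilon_i\,\Delta\varepsilon_i'\,W_i'$.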
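As a rough illustration of how these moment conditions are used, the sketch below computes a first-step GMM estimate of $\gamma$ with the identity weighting matrix ($S = I_r$), not the efficient Arellano and Bond estimator. It assumes $K_1 = 0$ (no $x_{it}$) and simulated Gaussian data; all names and simulation settings are illustrative. With a scalar parameter and $S = I_r$, minimizing $(a - \gamma b)'(a - \gamma b)$ has the closed form $\hat\gamma = b'a / b'b$, where $a = \frac{1}{n}\sum_i W_i \Delta y_i$ and $b = \frac{1}{n}\sum_i W_i \Delta y_{i,-1}$.

```python
import random


def simulate_ar1_panel(n, T, gamma, sigma_alpha=1.0, sigma_eps=1.0,
                       burn_in=50, seed=0):
    """Simulate y_it = gamma*y_{i,t-1} + alpha_i + eps_it; rows are [y_i0..y_iT]."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, sigma_alpha)
        y = alpha / (1.0 - gamma)
        path = []
        for t in range(burn_in + T + 1):
            y = gamma * y + alpha + rng.gauss(0.0, sigma_eps)
            if t >= burn_in:
                path.append(y)
        panel.append(path)
    return panel


def first_step_gmm(panel):
    """GMM with S = I on E[y_is (dy_it - gamma dy_{i,t-1})] = 0 for s <= t-2."""
    n = len(panel)
    T = len(panel[0]) - 1
    r = T * (T - 1) // 2
    a = [0.0] * r   # (1/n) sum_i W_i dy_i
    b = [0.0] * r   # (1/n) sum_i W_i dy_{i,-1}
    for y in panel:
        k = 0
        for t in range(2, T + 1):
            dy, dy_lag = y[t] - y[t - 1], y[t - 1] - y[t - 2]
            for s in range(t - 1):   # instruments y_i0, ..., y_{i,t-2}
                a[k] += y[s] * dy / n
                b[k] += y[s] * dy_lag / n
                k += 1
    # Closed-form minimizer of (a - gamma b)'(a - gamma b)
    return sum(bk * ak for ak, bk in zip(a, b)) / sum(bk * bk for bk in b)


if __name__ == "__main__":
    data = simulate_ar1_panel(n=8000, T=5, gamma=0.5, sigma_alpha=0.5, seed=11)
    print("first-step GMM estimate of gamma:", first_step_gmm(data))
```

A second step would estimate $S$ from the first-step residuals and re-minimize the criterion with $\hat S^{-1}$, which requires an $r \times r$ matrix inverse and is omitted here.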



In addition to the previous moment conditions, Arellano and Bover (1995) also note that $E(\bar v_i) = E(\bar\varepsilon_i + \alpha_i) = 0$, where
$$\bar v_i = \bar y_i - \gamma\,\bar y_{i,-1} - \beta'\,\bar x_i - \rho'\,\omega_i$$
Therefore, if instruments $\tilde q_i$ exist (for instance, the constant 1 is a valid instrument) such that
$$E(\tilde q_i\,\bar v_i) = 0$$
then a more efficient GMM estimator can be derived by incorporating this additional moment condition.
Arellano, M., and O. Bover (1995). "Another Look at the Instrumental Variable Estimation of Error-Components Models," Journal of Econometrics, 68, 29–51.

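Definition
Arellano and Bond (1991) consider the following moment conditions
$$E\left(m(y_i, x_i; \theta)\right) = E\left(W_i\,(\Delta y_i - \Delta y_{i,-1}\,\gamma - \Delta X_i\,\beta)\right) = 0$$
Definition
Arellano and Bover (1995) consider additional moment conditions
$$E\left(m(y_i, x_i; \theta)\right) = E\left(\tilde q_i\,\left(\bar y_i - \gamma\,\bar y_{i,-1} - \beta'\,\bar x_i - \rho'\,\omega_i\right)\right) = 0$$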



Apart from the previous linear moment conditions, Ahn and Schmidt (1995) note that the homoscedasticity condition on $E\left(\varepsilon_{it}^2\right)$ implies the following $T - 2$ linear conditions
$$E\left(y_{it}\,\Delta\varepsilon_{i,t+1} - y_{i,t+1}\,\Delta\varepsilon_{i,t+2}\right) = 0, \quad t = 1, \dots, T - 2$$
Combining these restrictions with the previous ones leads to a more efficient GMM estimator.
Ahn, S.C., and P. Schmidt (1995). "Efficient Estimation of Models for Dynamic Panel Data," Journal of Econometrics, 68, 5–27.



Definition (system GMM)

The system GMM (Blundell and Bond) was designed to tackle the weak instrument problem. It consists in considering both the equation in levels and the equation in first differences. The difference equation uses
$$E\left(y_{i,t-s}\,\Delta\varepsilon_{it}\right) = 0 \qquad E\left(x_{i,t-s}\,\Delta\varepsilon_{it}\right) = 0$$
and the following additional moments are exploited for the level equation:
$$E\left(\Delta y_{i,t-s}\,(\alpha_i + \varepsilon_{it})\right) = 0 \qquad E\left(\Delta x_{i,t-s}\,(\alpha_i + \varepsilon_{it})\right) = 0$$
Blundell, R., and S. Bond (2000). "GMM Estimation with Persistent Panel Data: An Application to Production Functions," Econometric Reviews, 19(3), 321–340.

Remarks
1 While it is theoretically possible to add moment conditions to improve the asymptotic efficiency of GMM, it is doubtful how much efficiency gain one can achieve by using a huge number of moment conditions in a finite sample.
2 Moreover, if higher-moment conditions are used, the estimator can be very sensitive to outlying observations.

Remarks
1 Through a simulation study, Ziliak (1997) has found that the downward bias in GMM is quite severe as the number of moment conditions expands, outweighing the gains in efficiency.
2 The strategy of exploiting all the moment conditions for estimation is actually not recommended for panel data applications. For further discussions, see Judson and Owen (1999), Kiviet (1995), and Wansbeek and Bekker (1996).

End of Chapter 2
Christophe Hurlin (University of Orléans)
