
Hayashi Chapter 3

Bo E. Honoré
Princeton University



Motivation: Measurement Error.

Suppose that
yi = α + xi∗ β + εi
but that xi∗ is observed with error so

xi = xi∗ + vi

It is natural to assume that xi∗ , vi and εi are independent.



The equation of interest can be written as

yi = α + xi β + (εi − vi β)

It is clear that E[(εi − vi β) xi] ≠ 0 (unless β = 0), so the OLS estimator will be biased.



On the other hand, suppose that x̃i is correlated with xi∗ but uncorrelated with vi and
εi . Then
cov (yi , x̃i ) = cov (α + xi β + (εi − vi β) , x̃i ) = cov (xi , x̃i ) β
or

β = cov(yi, x̃i) / cov(xi, x̃i)
So there is hope...



One more example

Suppose
yi = β0 + β1 xi + ui
but that ui is correlated with xi.

Example
yi log–wages
xi education
ui ability plus “random stuff”



Recall
yi = β0 + β1 xi + ui

We need something
that moves xi
that does not move ui
because we can then use its relationship with yi to learn about β1.

Let’s call it zi .

In the “returns to education” example, zi could be distance to the closest college.



Mathematically

cov(yi, zi) = cov(β0 + β1 xi + ui, zi)
= cov(β0, zi) + cov(β1 xi, zi) + cov(ui, zi)
= 0 + β1 cov(xi, zi) + 0

so

β1 = cov(yi, zi) / cov(xi, zi)



From

β1 = cov(yi, zi) / cov(xi, zi)

it is clear that
cov(xi, zi) has to be non–zero
a little randomness matters a lot (and asymmetrically) if cov(xi, zi) is close to 0.

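The ratio formula above can be checked directly in a short simulation. The following Python sketch is not from the slides; the data-generating process and all coefficient values are made up purely for illustration.

```python
# Minimal simulation of the ratio-of-covariances IV idea: beta1 = cov(y, z) / cov(x, z).
# The only requirements on the (hypothetical) DGP are that z moves x but not u.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                         # instrument
u = rng.normal(size=n)                         # error term
x = 0.8 * z + 0.5 * u + rng.normal(size=n)     # x is endogenous: corr(x, u) != 0
y = 1.0 + 2.0 * x + u                          # true beta1 = 2

cov = lambda a, b: np.cov(a, b)[0, 1]
print(cov(y, z) / cov(x, z))    # IV: approx 2.0
print(cov(y, x) / np.var(x))    # OLS slope: noticeably off because of corr(x, u)
```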


The problem is that this argument does not generalize easily to higher dimensions or to
settings with other explanatory variables.

So let us think about it differently.

Suppose that
yi = β0 + β1 xi + ui
and
xi = γ0 + γ1 zi + vi .



Suppose that we know γ0 and γ1. Then

yi = β0 + β1 (γ0 + γ1 zi + vi) + ui
= β0 + β1 (γ0 + γ1 zi) + (ui + β1 vi)
= β0 + β1 x̂i + (ui + β1 vi)

where x̂i = γ0 + γ1 zi.



Of course, we do not know γ0 and γ1 . But we can estimate them!

2SLS:
Estimate γ0 and γ1 by regressing

xi = γ0 + γ1 zi + vi

Construct

x̂i = γ̂0 + γ̂1 zi

Estimate β0 and β1 by regressing

yi = β0 + β1 x̂i + errori

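Below is a self-contained Python sketch of these two regression steps on simulated data. Everything in it (the data-generating process, the helper function, the coefficient values) is hypothetical and only meant to illustrate the mechanics with one instrument.

```python
# Two-stage least squares by literally running two OLS regressions.
import numpy as np

def ols(D, y):
    """Least-squares coefficients of y on the columns of D."""
    return np.linalg.lstsq(D, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 100_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 1.0 + 0.8 * z + 0.5 * u + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 2.0 * x + u                               # true (beta0, beta1) = (1, 2)

D1 = np.column_stack([np.ones(n), z])
gamma_hat = ols(D1, x)                              # first stage: x on (1, z)
x_hat = D1 @ gamma_hat                              # fitted values x_hat
beta_hat = ols(np.column_stack([np.ones(n), x_hat]), y)   # second stage
print(beta_hat)                                     # approx [1.0, 2.0]
```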


For this to work, we need
γ0 + γ1 zi cannot be a constant. In other words, γ1 ≠ 0 (zi and xi are correlated)
E [zi ui ] = 0



The general case

yi = β0 + β1 xi1 + β2 xi2 + · · · + βk xik
+ β1+k wi1 + β2+k wi2 + · · · + βk+r wir + ui

with

E [wij ui ] = 0, for j = 1, . . . , r
E [zij ui ] = 0, for j = 1, . . . , m
E [ui ] = 0



2SLS:

For each j, regress xij on

1, zi1, zi2, . . . , zim, wi1, wi2, . . . , wir.

Get fitted values x̂ij.

Run the regression

yi = β0 + β1 x̂i1 + β2 x̂i2 + · · · + βk x̂ik
+ β1+k wi1 + β2+k wi2 + · · · + βk+r wir + errori



Why does this work?
Let us think about an example (k = 1, r = 1, m = 1):

yi = β0 + β1 xi1 + β2 wi1 + ui
with

E [ui ] = 0,
E [wi1 ui ] = 0,
E [zi1 ui ] = 0.

You first run the first–stage regression

xi1 = γ0 + γ1 zi1 + γ2 wi1 + vi (1)



The sum of squared errors that you minimize is

∑_{i=1}^n (xi1 − (g0 + g1 zi1 + g2 wi1))²

The first–order conditions for this minimization are

−2 ∑_{i=1}^n (xi1 − (γ̂0 + γ̂1 zi1 + γ̂2 wi1)) = 0
−2 ∑_{i=1}^n (xi1 − (γ̂0 + γ̂1 zi1 + γ̂2 wi1)) zi1 = 0
−2 ∑_{i=1}^n (xi1 − (γ̂0 + γ̂1 zi1 + γ̂2 wi1)) wi1 = 0



or

(1/n) ∑_{i=1}^n (xi1 − x̂i1) = 0
(1/n) ∑_{i=1}^n (xi1 − x̂i1) zi1 = 0
(1/n) ∑_{i=1}^n (xi1 − x̂i1) wi1 = 0



Now let us re–write our equation of interest as

yi = β0 + β1 xi1 + β2 wi1 + ui (2)
= β0 + β1 x̂i1 + β2 wi1 + ui + β1 (xi1 − x̂i1)
= β0 + β1 x̂i1 + β2 wi1 + vi

Warning: The next step is a little sloppy, but making it more precise would add
confusion.

In (2), both wi1 and zi1 are uncorrelated with both ui and (xi1 − x̂i1). This implies that
x̂i1 and wi1 are both uncorrelated with vi. So (2) is a valid regression.

Note: This is why we include wi1 in (1). Without it, we could not know that wi1 is
uncorrelated with (xi1 − x̂i1).

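The orthogonality claims above are sample identities, so they are easy to verify numerically. The sketch below (hypothetical simulated data, not from the slides) checks that the first-stage residual xi1 − x̂i1 is orthogonal to the constant, to zi1 and to wi1.

```python
# Check the OLS first-order conditions: the first-stage residual is orthogonal
# (in sample, up to machine precision) to every first-stage regressor.
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
z = rng.normal(size=n)
w = rng.normal(size=n)
u = rng.normal(size=n)
x1 = 0.7 * z + 0.3 * w + 0.5 * u + rng.normal(size=n)

D = np.column_stack([np.ones(n), z, w])                    # first-stage regressors
resid = x1 - D @ np.linalg.lstsq(D, x1, rcond=None)[0]     # x1 - x1_hat

print(resid.mean(), np.mean(resid * z), np.mean(resid * w))   # all ~0
```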


GMM: Let’s Do It Right (Hayashi Chapter 3)

Assumptions
Assumption 3.1: yi = zi′δ + εi but E[zi εi] ≠ 0
Assumption 3.2: (yi, zi, xi) ergodic and stationary.
Assumption 3.3: E[gi] = E[xi εi] = E[xi (yi − zi′δ)] = 0
Assumption 3.4: E[xi zi′] has rank L, where dim(zi) = L ≤ K = dim(xi)



With these,

E[xi yi] = E[xi (zi′δ + εi)] = E[xi zi′] δ + E[xi εi] = E[xi zi′] δ.

By Assumption 3.4 (the rank condition), this system of equations has a unique solution for δ.

Suppose L = K. Then

E[xi yi] = E[xi zi′] δ ⟹ δ = E[xi zi′]⁻¹ E[xi yi] = Σxz⁻¹ σxy.

This suggests

δ̂ = ( (1/n) ∑_{i=1}^n xi zi′ )⁻¹ (1/n) ∑_{i=1}^n xi yi = Sxz⁻¹ sxy

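A sample-analogue version of the last display is a one-liner in matrix form. The sketch below uses Hayashi's convention that the rows of X are the instruments xi′ and the rows of Z are the regressors zi′; the simulated data are hypothetical.

```python
# Just-identified IV / method-of-moments estimator: delta_hat = S_xz^{-1} s_xy.
import numpy as np

def iv_just_identified(X, Z, y):
    """(X'Z/n)^{-1} (X'y/n): X holds the instruments, Z the regressors."""
    n = len(y)
    return np.linalg.solve(X.T @ Z / n, X.T @ y / n)

rng = np.random.default_rng(3)
n = 100_000
zvar = rng.normal(size=n)                          # excluded instrument
eps = rng.normal(size=n)
endog = 0.8 * zvar + 0.5 * eps + rng.normal(size=n)
y = 1.0 + 2.0 * endog + eps

Z = np.column_stack([np.ones(n), endog])           # regressors z_i
X = np.column_stack([np.ones(n), zvar])            # instruments x_i (L = K = 2)
print(iv_just_identified(X, Z, y))                 # approx [1.0, 2.0]
```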


Clearly

δ̂ = ( (1/n) ∑_{i=1}^n xi zi′ )⁻¹ (1/n) ∑_{i=1}^n xi yi →a.s. E[xi zi′]⁻¹ E[xi yi] = δ



and

δ̂ = ( (1/n) ∑_{i=1}^n xi zi′ )⁻¹ (1/n) ∑_{i=1}^n xi yi
= ( (1/n) ∑_{i=1}^n xi zi′ )⁻¹ (1/n) ∑_{i=1}^n xi (zi′δ + εi)
= δ + ( (1/n) ∑_{i=1}^n xi zi′ )⁻¹ (1/n) ∑_{i=1}^n xi εi

or



√n (δ̂ − δ) = ( (1/n) ∑_{i=1}^n xi zi′ )⁻¹ (1/√n) ∑_{i=1}^n xi εi

With
Assumption 3.5: gi = xi εi is a m.d.s. with S = E[gi gi′] finite
we have

√n (δ̂ − δ) = ( (1/n) ∑_{i=1}^n xi zi′ )⁻¹ (1/√n) ∑_{i=1}^n xi εi →d E[xi zi′]⁻¹ · N(0, S)



What if we have more instruments than we need?
Let

gn(δ̃) = (1/n) ∑_{i=1}^n xi (yi − zi′δ̃) = sxy − Sxz δ̃

It makes sense to estimate δ by making gn(δ̃) close to 0:

δ̂(Ŵ) = arg min_δ̃ J(δ̃, Ŵ)
= arg min_δ̃ n · gn(δ̃)′ Ŵ gn(δ̃)
= arg min_δ̃ n · (sxy − Sxz δ̃)′ Ŵ (sxy − Sxz δ̃)



Simple algebra gives

δ̂(Ŵ) = (Sxz′ Ŵ Sxz)⁻¹ Sxz′ Ŵ sxy
= δ + (Sxz′ Ŵ Sxz)⁻¹ Sxz′ Ŵ (1/n) ∑_{i=1}^n xi εi

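The closed form above is easy to evaluate for any candidate weighting matrix. Here is a small Python sketch with an over-identified simulated design (two instruments, one endogenous regressor); the data and names are hypothetical, and Ŵ is simply the identity matrix.

```python
# Linear GMM with a given weighting matrix W:
# delta_hat(W) = (S_xz' W S_xz)^{-1} S_xz' W s_xy.
import numpy as np

def gmm_linear(X, Z, y, W):
    n = len(y)
    S_xz, s_xy = X.T @ Z / n, X.T @ y / n
    return np.linalg.solve(S_xz.T @ W @ S_xz, S_xz.T @ W @ s_xy)

rng = np.random.default_rng(4)
n = 100_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(size=n)
x = 0.6 * z1 + 0.4 * z2 + 0.5 * eps + rng.normal(size=n)
y = 1.0 + 2.0 * x + eps

Z = np.column_stack([np.ones(n), x])            # regressors (L = 2)
X = np.column_stack([np.ones(n), z1, z2])       # instruments (K = 3)
print(gmm_linear(X, Z, y, np.eye(3)))           # consistent for any symmetric p.d. W
```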


We will assume that Ŵ →p W and W = W′. Then

δ̂(Ŵ) →p δ

and

√n (δ̂(Ŵ) − δ) = (Sxz′ Ŵ Sxz)⁻¹ Sxz′ Ŵ (1/√n) ∑_{i=1}^n xi εi
→d (Σxz′ W Σxz)⁻¹ Σxz′ W · N(0, S)



or

√n (δ̂(Ŵ) − δ) →d N( 0, (Σxz′ W Σxz)⁻¹ Σxz′ W S W Σxz (Σxz′ W Σxz)⁻¹ )

Variance minimized if

W = S⁻¹ = E[εi² xi xi′]⁻¹



This suggests

Ŵ = Ŝ⁻¹, where Ŝ = (1/n) ∑_{i=1}^n (yi − zi′δ̂(Ŵ1))² xi xi′

for some arbitrary Ŵ1.

√n (δ̂(Ŝ⁻¹) − δ) →d N( 0, (Σxz′ S⁻¹ Σxz)⁻¹ )

Can show

J(δ̂(Ŝ⁻¹), Ŝ⁻¹) →d χ²(K − L)

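A compact sketch of this two-step procedure, including the J statistic, is given below. The heteroskedastic data-generating process is invented for illustration; the first step uses the identity matrix as the arbitrary Ŵ1.

```python
# Two-step efficient GMM and the over-identification (J) test; here K - L = 1.
import numpy as np
from scipy import stats

def gmm_linear(X, Z, y, W):
    n = len(y)
    S_xz, s_xy = X.T @ Z / n, X.T @ y / n
    return np.linalg.solve(S_xz.T @ W @ S_xz, S_xz.T @ W @ s_xy)

rng = np.random.default_rng(5)
n = 50_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(size=n) * (1.0 + 0.5 * np.abs(z1))          # heteroskedastic error
x = 0.6 * z1 + 0.4 * z2 + 0.5 * rng.normal(size=n) + 0.3 * eps
y = 1.0 + 2.0 * x + eps

Z = np.column_stack([np.ones(n), x])                          # regressors (L = 2)
X = np.column_stack([np.ones(n), z1, z2])                     # instruments (K = 3)

delta1 = gmm_linear(X, Z, y, np.eye(X.shape[1]))              # step 1: arbitrary W
e1 = y - Z @ delta1
S_hat = (X * e1[:, None] ** 2).T @ X / n                      # (1/n) sum e_i^2 x_i x_i'
delta2 = gmm_linear(X, Z, y, np.linalg.inv(S_hat))            # step 2: efficient GMM

g_n = X.T @ (y - Z @ delta2) / n
J = n * g_n @ np.linalg.inv(S_hat) @ g_n                      # J statistic
print(delta2, J, stats.chi2.sf(J, X.shape[1] - Z.shape[1]))   # estimate, J, p-value
```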


Two–Stage Least Squares

Example: Suppose homoskedasticity:

E[εi² | xi] = σ²

Then

S = E[εi² xi xi′] = E[ E[εi² | xi] xi xi′ ] = σ² E[xi xi′].

Suppose we know σ²; then

Ŝ = σ² (1/n) ∑_{i=1}^n xi xi′

and



   
δ̂(Ŵopt) = δ̂(Ŝ⁻¹)
= (Sxz′ Ŝ⁻¹ Sxz)⁻¹ Sxz′ Ŝ⁻¹ sxy
= [ ( (1/n) ∑_{i=1}^n zi xi′ ) ( σ² (1/n) ∑_{i=1}^n xi xi′ )⁻¹ ( (1/n) ∑_{i=1}^n xi zi′ ) ]⁻¹
( (1/n) ∑_{i=1}^n zi xi′ ) ( σ² (1/n) ∑_{i=1}^n xi xi′ )⁻¹ ( (1/n) ∑_{i=1}^n xi yi )



 ! ! −1 !  −1
 n n n 
= ∑ zi xi0 ∑ xi xi0 ∑ xi zi0
i =1 i =1 i =1
 
! ! −1 !
n n n
∑ zi xi0 ∑ xi xi0 ∑ xi yi
i =1 i =1 i =1
  −1 0  −1 0  −1 0
= Z 0X X 0X X Z Z X X 0X X y
 −1 0
= Z 0 PZ Z Py
−1 0 0
Z 0 P 0 PZ

= ZPy



= ( (PZ)′ (PZ) )⁻¹ (PZ)′ y
= ( Ẑ′ Ẑ )⁻¹ Ẑ′ y
= ( Z′P′PZ )⁻¹ Z′P′Py
= ( (PZ)′ (PZ) )⁻¹ (PZ)′ Py
= ( Ẑ′ Ẑ )⁻¹ Ẑ′ ŷ

where Ẑ = PZ and ŷ = Py are the fitted values from regressing the columns of Z, and y, on X.

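These matrix identities can be verified numerically. The sketch below builds the projection matrix P from a small simulated data set (hypothetical design, kept small because P is n × n) and checks that (Ẑ′Ẑ)⁻¹Ẑ′y and (Z′PZ)⁻¹Z′Py give the same answer.

```python
# 2SLS in matrix form: P = X (X'X)^{-1} X', Z_hat = P Z, delta = (Z_hat'Z_hat)^{-1} Z_hat'y.
import numpy as np

rng = np.random.default_rng(6)
n = 2_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(size=n)
x = 0.6 * z1 + 0.4 * z2 + 0.5 * eps + rng.normal(size=n)
y = 1.0 + 2.0 * x + eps

Z = np.column_stack([np.ones(n), x])            # regressors (include the endogenous x)
X = np.column_stack([np.ones(n), z1, z2])       # instruments

P = X @ np.linalg.solve(X.T @ X, X.T)           # projection onto the columns of X
Z_hat = P @ Z
delta_a = np.linalg.solve(Z_hat.T @ Z_hat, Z_hat.T @ y)
delta_b = np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)
print(delta_a, np.allclose(delta_a, delta_b))   # approx [1.0, 2.0], True
```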


Classic Example: Measurement Error (again)

Suppose that
yi = α + xi∗ β + εi
but that xi∗ is observed with error, so

xi = xi∗ + vi

It is natural to assume that xi∗, vi and εi are independent. The equation of interest can
be written as
yi = α + xi β + (εi − vi β)
It is clear that E[(εi − vi β) xi] ≠ 0 (unless β = 0), so the OLS estimator will be biased.



Indeed, it can be shown that

plim β̂ = β · σ²x∗ / (σ²x∗ + σ²v)

so in this case the bias is unambiguously towards 0. plims can also be calculated in the
case of many x–variables, but they are messy.
Possible instrument: a second measurement of xi∗.

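Both the attenuation factor and the second-measurement instrument are easy to see in a simulation. The sketch below is illustrative only; the variances are chosen so that the attenuation factor is 1/1.25 = 0.8.

```python
# Classical measurement error: OLS is attenuated toward 0, while a second noisy
# measurement of x* works as an instrument.
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
beta = 2.0
x_star = rng.normal(size=n)                     # var(x*) = 1
v1 = rng.normal(scale=0.5, size=n)              # var(v) = 0.25
v2 = rng.normal(scale=0.5, size=n)              # independent second measurement error
eps = rng.normal(size=n)

y = 1.0 + beta * x_star + eps
x = x_star + v1                                  # observed regressor
x2 = x_star + v2                                 # second measurement, used as instrument

cov = lambda a, b: np.cov(a, b)[0, 1]
print(cov(y, x) / np.var(x))     # approx beta * 1 / (1 + 0.25) = 1.6
print(cov(y, x2) / cov(x, x2))   # approx 2.0
```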


Supply and Demand

 
ln(q^d) = β1 + ln(p) β2 + I β3 + ε1 (Demand)
ln(q^s) = α1 + ln(p) α2 + w α3 + r α4 + ε2 (Supply)

So if q^d = q^s then

ln(p) = [ (α1 + w α3 + r α4 + ε2) − (β1 + I β3 + ε1) ] / (β2 − α2)

so ln(p) is correlated with both ε1 and ε2.



So
OLS regression of (Demand) will not correctly estimate the demand function, and
OLS regression of (Supply) will not correctly estimate the supply function.
But
we can use w and r as instruments and estimate (Demand) by 2SLS
(over–identified), and
we can use I as an instrument and estimate (Supply) by 2SLS (just–identified).



LIML (not in Hayashi)

yi = Y1i′ γ1 + x1i′ β1 + ui,   E[x1i ui] = 0, E[x2i ui] = 0

Y1i = x1i′ Π1 + x2i′ Π2 + Vi,   E[x1i Vi] = 0, E[x2i Vi] = 0

(the second equation is without loss of generality).

Or

yi = (x1i′ Π1 + x2i′ Π2)′ γ1 + x1i′ β1 + (ui + Vi′ γ1)

2SLS estimates (Π1, Π2) first and then (γ1, β1).



Back to

yi = Y1i′ γ1 + x1i′ β1 + ui
Y1i = x1i′ Π1 + x2i′ Π2 + Vi

LIML is the MLE under the assumption that (ui, Vi′) is normally distributed.



One can show that, for γ1, this is the same as minimizing (over γ1) the F–statistic for
testing β2 = 0 in

yi − Y1i′ γ1 = x1i′ β1 + x2i′ β2 + ui



With MX = I − X (X′X)⁻¹ X′ and M1 = I − X1 (X1′X1)⁻¹ X1′, the LIML estimator is

( β̂1^LIML , γ̂1^LIML )′ = [ X1′X1 , X1′Y1 ; Y1′X1 , Y1′(I − κ̂ MX) Y1 ]⁻¹ [ X1′y ; Y1′(I − κ̂ MX) y ]

(“;” separates the rows of the partitioned matrices), where κ̂ is the smallest eigenvalue of



(Y′ MX Y)^{-1/2} (Y′ M1 Y) (Y′ MX Y)^{-1/2}

where M1 = I − X1 (X1′X1)⁻¹ X1′ (as above) and Y = [ y ⋮ Y1 ].



Estimators of the form

( β̂1(k) , γ̂1(k) )′ = [ X1′X1 , X1′Y1 ; Y1′X1 , Y1′(I − k MX) Y1 ]⁻¹ [ X1′y ; Y1′(I − k MX) y ]

are called K–class estimators. LIML is a K–class estimator with k = κ̂ and 2SLS is a
K–class estimator with k = 1.

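To make the κ̂ and K-class formulas concrete, here is a Python sketch on simulated data (the design, coefficient values and variable names are all invented). It uses the fact that the smallest eigenvalue of (Y′MXY)^{-1/2}(Y′M1Y)(Y′MXY)^{-1/2} equals the smallest eigenvalue of (Y′MXY)⁻¹(Y′M1Y).

```python
# kappa_hat and the K-class estimator: k = kappa_hat gives LIML, k = 1 gives 2SLS.
import numpy as np

rng = np.random.default_rng(8)
n = 1_000
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])       # included exogenous x1
X2 = rng.normal(size=(n, 3))                                  # excluded instruments x2
eps = rng.normal(size=n)
Y1 = (X2 @ np.array([0.5, 0.4, 0.3]) + X1 @ np.array([0.2, 0.1])
      + 0.5 * eps + rng.normal(size=n)).reshape(-1, 1)        # one endogenous regressor
y = X1 @ np.array([1.0, 0.5]) + 2.0 * Y1[:, 0] + eps          # beta1 = (1, 0.5), gamma1 = 2

X = np.column_stack([X1, X2])
annihilator = lambda A: np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)
MX, M1 = annihilator(X), annihilator(X1)

Ymat = np.column_stack([y, Y1])                               # Y = [y : Y1]
kappa = np.linalg.eigvals(np.linalg.solve(Ymat.T @ MX @ Ymat,
                                          Ymat.T @ M1 @ Ymat)).real.min()

def k_class(k):
    A = np.block([[X1.T @ X1, X1.T @ Y1],
                  [Y1.T @ X1, Y1.T @ (np.eye(n) - k * MX) @ Y1]])
    b = np.concatenate([X1.T @ y, Y1.T @ (np.eye(n) - k * MX) @ y])
    return np.linalg.solve(A, b)                              # (beta1_hat(k), gamma1_hat(k))

print(kappa)             # slightly above 1
print(k_class(kappa))    # LIML: approx [1.0, 0.5, 2.0]
print(k_class(1.0))      # 2SLS: approx [1.0, 0.5, 2.0]
```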


It can be shown that all K–class estimators for which √n (k − 1) →p 0 have the same
asymptotic distribution.

Hence LIML and 2SLS are asymptotically equivalent.


(Davidson and MacKinnon)



Are the asymptotics useful?

2SLS regresses

yi on ( x1i′ Π̂1 + x2i′ Π̂2 ) and x1i.

Suppose that there are many instruments.
Then ( x1i′ Π̂1 + x2i′ Π̂2 ) will be biased in the direction of Y1i.
But we do not want to regress

yi on Y1i and x1i.



The idea that one wants to minimize (over γ1) the F–statistic for testing β2 = 0 in

yi − Y1i′ γ1 = x1i′ β1 + x2i′ β2 + ui

makes sense in small samples even if dim(x2i) is large (at least if ui is homoskedastic).

A different, but related, problem is that of weak instruments.



Consider the following simpler model in matrix notation:

y = Y γ + u
Y = Z π + v

where γ is one–dimensional. Let the instruments be denoted by Z.

γ̂2SLS = [ Y′Z (Z′Z)⁻¹ Z′y ] / [ Y′Z (Z′Z)⁻¹ Z′Y ] = γ + [ Y′Z (Z′Z)⁻¹ Z′u ] / [ Y′Z (Z′Z)⁻¹ Z′Y ]



Now

Z′Y = Z′Z π + Z′v

and, letting Q = Z′Z/n,

γ̂2SLS − γ = [ (√n π′Q + v′Z/√n) Q⁻¹ (Z′u/√n) ] / [ (√n π′Q + v′Z/√n) Q⁻¹ (Qπ√n + Z′v/√n) ]

Now if Z is independent of (u, v), then Z′u/√n and Z′v/√n are approximately
distributed like N(0, Q σu²) and N(0, Q σv²), respectively.

These are exact if (u, v) is normal.

We can therefore further write



γ̂2SLS − γ
= [ (√n π′Q + Q^{1/2} σv zv) Q^{-1/2} σu zu ] / [ (√n π′Q + Q^{1/2} σv zv) Q⁻¹ (Qπ√n + Q^{1/2} σv zv) ]
= [ σv (√n π′Q^{1/2}/σv + zv) σu zu ] / [ σv (√n π′Q^{1/2}/σv + zv) (Q^{1/2}π√n/σv + zv) σv ]

where zu and zv are (not independent) standard normals.

γ̂2SLS − γ is therefore approximately normal if √n π′Q^{1/2}/σv is big relative to a
standard normal.

Hence 2SLS will perform poorly if π is small (weak instruments). The same is true for
LIML.
Rule of Thumb for Weak Instruments

yi = β0 + β1 xi + β2 wi1 + β3 wi2 + · · · + β1+r wir + ui


with
E [wij ui ] = 0, E [zij ui ] = 0, and E [ui ] = 0

First stage in 2SLS: Regress xi on

1, zi1 , zi2 . . . zim , wi1 , wi2 . . . wir .

Worry if F–stat associated with the z’s is less than 10.

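One way to compute that diagnostic is as an F-test of "the coefficients on all excluded instruments are zero" in the first-stage regression, comparing restricted and unrestricted sums of squared residuals. The Python sketch below is illustrative; the deliberately small first-stage coefficients produce a weak-instrument situation.

```python
# First-stage F-statistic for the excluded instruments (rule of thumb: worry if < 10).
import numpy as np

def first_stage_F(x, Zexcl, Wincl):
    """F-test of 'coefficients on the excluded instruments are all zero'."""
    n = len(x)
    unrestricted = np.column_stack([np.ones(n), Zexcl, Wincl])
    restricted = np.column_stack([np.ones(n), Wincl])
    ssr = lambda D: np.sum((x - D @ np.linalg.lstsq(D, x, rcond=None)[0]) ** 2)
    q = Zexcl.shape[1]                                  # number of restrictions (m)
    dof = n - unrestricted.shape[1]
    return ((ssr(restricted) - ssr(unrestricted)) / q) / (ssr(unrestricted) / dof)

rng = np.random.default_rng(9)
n = 1_000
Zexcl = rng.normal(size=(n, 2))                          # excluded instruments z
Wincl = rng.normal(size=(n, 1))                          # included exogenous regressor w
x = Zexcl @ np.array([0.05, 0.05]) + Wincl[:, 0] + rng.normal(size=n)  # weak first stage
print(first_stage_F(x, Zexcl, Wincl))                    # typically well below 10
```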


Invalid instruments and weak instruments

Simple model
yi = β0 + xi β1 + ui
Use zi and a constant as instruments.
Simple algebra gives
β̂1 = β1 + [ (1/n) ∑ (ui − ū)(zi − z̄) ] / [ (1/n) ∑ (xi − x̄)(zi − z̄) ] →p β1 + cov(ui, zi) / cov(xi, zi)

So a slight correlation between u and z can give serious bias if cov (xi , zi ) is small.



Parameter Heterogeneity (IV)
Simplest case

yi = β0 + xi β1i + ui (equation of interest)


xi = δ0 + zi δ1i + vi (first stage of TSLS)

To simplify things, suppose:


β1i and δ1i are distributed independently of (ui , vi , zi )
E[ui | zi] = 0, E[vi | zi] = 0 and E[δ1i] ≠ 0

Can show

β̂1^2SLS →p cov(yi, zi) / cov(xi, zi) = E[δ1i β1i] / E[δ1i]

(a weighted average of the β1i)
Parameter Heterogeneity (IV), continued.

yi = β0 + xi β1i + ui (equation of interest)


xi = δ0 + zi δ1i + vi (first stage of TSLS)

We had

β̂1^2SLS →p E[δ1i β1i] / E[δ1i]

TSLS estimates the causal effect for those individuals for whom zi is most
influential (those with large δ1i).
LIML not as nice!



Review: Efficient GMM

Recall that

δ̂(Ŵ) = arg min_δ̃ n · gn(δ̃)′ Ŵ gn(δ̃)

where

gn(δ̃) = (1/n) ∑_{i=1}^n xi (yi − zi′δ̃)

and the optimal choice of Ŵ is

S⁻¹ = E[εi² xi xi′]⁻¹



We therefore defined a two–step estimator based on minimizing

n · gn(δ̃)′ Ŝ(δ̂(Ŵ1))⁻¹ gn(δ̃)

over δ̃, where

Ŝ(d) = (1/n) ∑_{i=1}^n (yi − zi′d)² xi xi′



Continuously Updating GMM Estimator (CUE)

θ̂CUE = arg min_δ̃ n · gn(δ̃)′ Ŝ(δ̃)⁻¹ gn(δ̃)

by Hansen, Heaton, and Yaron (1996).

No ad-hoc first–stage weighting matrix.

Invariant to rotation of gi by R_{m×m} (or R_{m×m}(δ̃)):
replacing gi(δ̃) with R(δ̃) gi(δ̃) leads to numerically the same estimator as long
as R(δ̃) has full rank.



What is CUE in the linear IV model? Here

gi(δ̃) ≡ (yi − zi′δ̃) xi

so

θ̂CUE = arg min_δ̃ [ (1/n) ∑_{i=1}^n (yi − zi′δ̃) xi ]′ [ (1/n) ∑_{i=1}^n (yi − zi′δ̃)² xi xi′ ]⁻¹ [ (1/n) ∑_{i=1}^n (yi − zi′δ̃) xi ].



With homoskedasticity

S = σ² E[xi xi′]

so we could define

Ŝ(δ̃) = [ (1/n) ∑_{i=1}^n (yi − zi′δ̃)² ] (1/n) ∑_{i=1}^n xi xi′

and the Continuously Updating GMM Estimator would then be

θ̂CUE = arg min_δ̃ { [ (1/n) ∑_{i=1}^n (yi − zi′δ̃) xi ]′ [ (1/n) ∑_{i=1}^n xi xi′ ]⁻¹ [ (1/n) ∑_{i=1}^n (yi − zi′δ̃) xi ] } / [ (1/n) ∑_{i=1}^n (yi − zi′δ̃)² ].

This happens to be the LIML estimator.

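Unlike 2SLS, the CUE does not have a closed form in general, but its objective is cheap to evaluate and can be minimized numerically. The sketch below (simulated homoskedastic data, hypothetical names) minimizes the general CUE objective starting from 2SLS; with this design the two estimates should be very close, consistent with the LIML connection above.

```python
# Continuously updating GMM for the linear IV model, minimized numerically.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(10)
n = 5_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(size=n)
x = 0.6 * z1 + 0.4 * z2 + 0.5 * eps + rng.normal(size=n)
y = 1.0 + 2.0 * x + eps

Z = np.column_stack([np.ones(n), x])          # regressors z_i
X = np.column_stack([np.ones(n), z1, z2])     # instruments x_i

def cue_objective(delta):
    e = y - Z @ delta
    g_n = X.T @ e / n                          # g_n(delta)
    S_hat = (X * e[:, None] ** 2).T @ X / n    # S_hat(delta) = (1/n) sum e_i^2 x_i x_i'
    return n * g_n @ np.linalg.solve(S_hat, g_n)

Z_hat = X @ np.linalg.lstsq(X, Z, rcond=None)[0]          # fitted values of Z on X
delta_2sls = np.linalg.lstsq(Z_hat, y, rcond=None)[0]     # 2SLS as starting value
res = minimize(cue_objective, delta_2sls, method="Nelder-Mead")
print(delta_2sls, res.x)                                  # both approx [1.0, 2.0]
```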
