
Hayashi Chapter 3

Bo E. Honoré
Princeton University



Motivation: Measurement Error.

Suppose that
yi = α + xi∗ β + εi
but that xi∗ is observed with error so

xi = xi∗ + vi

It is natural to assume that xi∗ , vi and εi are independent.



The equation of interest can be written as

yi = α + xi β + (εi − vi β)

It is clear that E[(εi − vi β) xi] ≠ 0 (unless β = 0), so the OLS estimator will be biased.



On the other hand, suppose that x̃i is correlated with xi∗ but uncorrelated with vi and
εi . Then
cov (yi , x̃i ) = cov (α + xi β + (εi − vi β) , x̃i ) = cov (xi , x̃i ) β
or

β = cov(yi, x̃i) / cov(xi, x̃i)
So there is hope...



One more example

Suppose
yi = β0 + β1 xi + ui
but that ui is correlated with xi.

Example
yi log–wages
xi education
ui ability plus “random stuff”



Recall
yi = β0 + β1 xi + ui

We need something
that moves xi
that does not move ui
because we can then use its relationship with yi to learn about β1.

Let’s call it zi .

In the “returns to education” example, zi could be distance to the closest college.



Mathematically

cov(yi, zi) = cov(β0 + β1 xi + ui, zi)
= cov(β0, zi) + cov(β1 xi, zi) + cov(ui, zi)
= 0 + β1 cov(xi, zi) + 0

so

β1 = cov(yi, zi) / cov(xi, zi)



From

β1 = cov(yi, zi) / cov(xi, zi)

it is clear that
cov(xi, zi) has to be non–zero
a little randomness matters a lot (and asymmetrically) if cov(xi, zi) is close to 0.

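The ratio formula above can be checked directly in a short simulation. The following Python sketch is not from the slides; the data-generating process and all coefficient values are made up purely for illustration.

```python
# Minimal simulation of the ratio-of-covariances IV idea: beta1 = cov(y, z) / cov(x, z).
# The only requirements on the (hypothetical) DGP are that z moves x but not u.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                         # instrument
u = rng.normal(size=n)                         # error term
x = 0.8 * z + 0.5 * u + rng.normal(size=n)     # x is endogenous: corr(x, u) != 0
y = 1.0 + 2.0 * x + u                          # true beta1 = 2

cov = lambda a, b: np.cov(a, b)[0, 1]
print(cov(y, z) / cov(x, z))    # IV: approx 2.0
print(cov(y, x) / np.var(x))    # OLS slope: noticeably off because of corr(x, u)
```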


The problem is that this argument does not generalize easily to higher dimensions or to
settings with other explanatory variables.

So let us think about it differently.

Suppose that
yi = β0 + β1 xi + ui
and
xi = γ0 + γ1 zi + vi .



Suppose that we know γ0 and γ1. Then

yi = β0 + β1 (γ0 + γ1 zi + vi) + ui
= β0 + β1 (γ0 + γ1 zi) + (ui + β1 vi)
= β0 + β1 x̂i + (ui + β1 vi)

where x̂i = γ0 + γ1 zi.



Of course, we do not know γ0 and γ1 . But we can estimate them!

2SLS:
Estimate γ0 and γ1 by regressing

xi = γ0 + γ1 zi + vi

Construct

x̂i = γ̂0 + γ̂1 zi

Estimate β0 and β1 by regressing

yi = β0 + β1 x̂i + errori

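Below is a self-contained Python sketch of these two regression steps on simulated data. Everything in it (the data-generating process, the helper function, the coefficient values) is hypothetical and only meant to illustrate the mechanics with one instrument.

```python
# Two-stage least squares by literally running two OLS regressions.
import numpy as np

def ols(D, y):
    """Least-squares coefficients of y on the columns of D."""
    return np.linalg.lstsq(D, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 100_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 1.0 + 0.8 * z + 0.5 * u + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 2.0 * x + u                               # true (beta0, beta1) = (1, 2)

D1 = np.column_stack([np.ones(n), z])
gamma_hat = ols(D1, x)                              # first stage: x on (1, z)
x_hat = D1 @ gamma_hat                              # fitted values x_hat
beta_hat = ols(np.column_stack([np.ones(n), x_hat]), y)   # second stage
print(beta_hat)                                     # approx [1.0, 2.0]
```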


For this to work, we need
γ0 + γ1 zi cannot be a constant. In other words, γ1 ≠ 0 (zi and xi are correlated)
E [zi ui ] = 0



The general case

yi = β0 + β1 xi1 + β2 xi2 + · · · + βk xik
+ β1+k wi1 + β2+k wi2 + · · · + βk+r wir + ui

with

E [wij ui ] = 0, for j = 1, . . . , r
E [zij ui ] = 0, for j = 1, . . . , m
E [ui ] = 0



2SLS:

For each j, regress xij on

1, zi1, zi2, . . . , zim, wi1, wi2, . . . , wir.

Get fitted values x̂ij.

Run the regression

yi = β0 + β1 x̂i1 + β2 x̂i2 + · · · + βk x̂ik
+ β1+k wi1 + β2+k wi2 + · · · + βk+r wir + errori



Why does this work?
Let us think about an example (k = 1, r = 1, m = 1):

yi = β0 + β1 xi1 + β2 wi1 + ui
with

E [ui ] = 0,
E [wi1 ui ] = 0,
E [zi1 ui ] = 0.

You first run the first–stage regression

xi1 = γ0 + γ1 zi1 + γ2 wi1 + vi (1)



The sum of squared errors that you minimize is

∑_{i=1}^n (xi1 − (g0 + g1 zi1 + g2 wi1))²

The first–order conditions for this minimization are

−2 ∑_{i=1}^n (xi1 − (γ̂0 + γ̂1 zi1 + γ̂2 wi1)) = 0
−2 ∑_{i=1}^n (xi1 − (γ̂0 + γ̂1 zi1 + γ̂2 wi1)) zi1 = 0
−2 ∑_{i=1}^n (xi1 − (γ̂0 + γ̂1 zi1 + γ̂2 wi1)) wi1 = 0



or

(1/n) ∑_{i=1}^n (xi1 − x̂i1) = 0
(1/n) ∑_{i=1}^n (xi1 − x̂i1) zi1 = 0
(1/n) ∑_{i=1}^n (xi1 − x̂i1) wi1 = 0



Now let us re–write our equation of interest as

yi = β0 + β1 xi1 + β2 wi1 + ui (2)
= β0 + β1 x̂i1 + β2 wi1 + ui + β1 (xi1 − x̂i1)
= β0 + β1 x̂i1 + β2 wi1 + vi

Warning: The next step is a little sloppy, but making it more precise would add
confusion.

In (2), both wi1 and zi1 are uncorrelated with both ui and (xi1 − x̂i1). This implies that
x̂i1 and wi1 are both uncorrelated with vi. So (2) is a valid regression.

Note: This is why we include wi1 in (1). Without it, we could not know that wi1 is
uncorrelated with (xi1 − x̂i1).

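The orthogonality claims above are sample identities, so they are easy to verify numerically. The sketch below (hypothetical simulated data, not from the slides) checks that the first-stage residual xi1 − x̂i1 is orthogonal to the constant, to zi1 and to wi1.

```python
# Check the OLS first-order conditions: the first-stage residual is orthogonal
# (in sample, up to machine precision) to every first-stage regressor.
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
z = rng.normal(size=n)
w = rng.normal(size=n)
u = rng.normal(size=n)
x1 = 0.7 * z + 0.3 * w + 0.5 * u + rng.normal(size=n)

D = np.column_stack([np.ones(n), z, w])                    # first-stage regressors
resid = x1 - D @ np.linalg.lstsq(D, x1, rcond=None)[0]     # x1 - x1_hat

print(resid.mean(), np.mean(resid * z), np.mean(resid * w))   # all ~0
```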


GMM: Let’s Do It Right (Hayashi Chapter 3)

Assumptions
Assumption 3.1: yi = zi′δ + εi but E[zi εi] ≠ 0
Assumption 3.2: (yi, zi, xi) ergodic and stationary.
Assumption 3.3: E[gi] = E[xi εi] = E[xi (yi − zi′δ)] = 0
Assumption 3.4: E[xi zi′] has rank L, where dim(zi) = L ≤ K = dim(xi)



With these,

E[xi yi] = E[xi (zi′δ + εi)] = E[xi zi′] δ + E[xi εi] = E[xi zi′] δ.

By Assumption 3.4 (the rank condition), this system of equations has a unique solution for δ.

Suppose L = K. Then

E[xi yi] = E[xi zi′] δ ⟹ δ = E[xi zi′]⁻¹ E[xi yi] = Σxz⁻¹ σxy.

This suggests

δ̂ = ( (1/n) ∑_{i=1}^n xi zi′ )⁻¹ (1/n) ∑_{i=1}^n xi yi = Sxz⁻¹ sxy

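A sample-analogue version of the last display is a one-liner in matrix form. The sketch below uses Hayashi's convention that the rows of X are the instruments xi′ and the rows of Z are the regressors zi′; the simulated data are hypothetical.

```python
# Just-identified IV / method-of-moments estimator: delta_hat = S_xz^{-1} s_xy.
import numpy as np

def iv_just_identified(X, Z, y):
    """(X'Z/n)^{-1} (X'y/n): X holds the instruments, Z the regressors."""
    n = len(y)
    return np.linalg.solve(X.T @ Z / n, X.T @ y / n)

rng = np.random.default_rng(3)
n = 100_000
zvar = rng.normal(size=n)                          # excluded instrument
eps = rng.normal(size=n)
endog = 0.8 * zvar + 0.5 * eps + rng.normal(size=n)
y = 1.0 + 2.0 * endog + eps

Z = np.column_stack([np.ones(n), endog])           # regressors z_i
X = np.column_stack([np.ones(n), zvar])            # instruments x_i (L = K = 2)
print(iv_just_identified(X, Z, y))                 # approx [1.0, 2.0]
```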


Clearly

δ̂ = ( (1/n) ∑_{i=1}^n xi zi′ )⁻¹ (1/n) ∑_{i=1}^n xi yi →a.s. E[xi zi′]⁻¹ E[xi yi] = δ



and

δ̂ = ( (1/n) ∑_{i=1}^n xi zi′ )⁻¹ (1/n) ∑_{i=1}^n xi yi
= ( (1/n) ∑_{i=1}^n xi zi′ )⁻¹ (1/n) ∑_{i=1}^n xi (zi′δ + εi)
= δ + ( (1/n) ∑_{i=1}^n xi zi′ )⁻¹ (1/n) ∑_{i=1}^n xi εi

or



√n (δ̂ − δ) = ( (1/n) ∑_{i=1}^n xi zi′ )⁻¹ (1/√n) ∑_{i=1}^n xi εi

With
Assumption 3.5: gi = xi εi is a m.d.s. with S = E[gi gi′] finite
we have

√n (δ̂ − δ) = ( (1/n) ∑_{i=1}^n xi zi′ )⁻¹ (1/√n) ∑_{i=1}^n xi εi →d E[xi zi′]⁻¹ · N(0, S)



What if we have more instruments than we need?
Let

gn(δ̃) = (1/n) ∑_{i=1}^n xi (yi − zi′δ̃) = sxy − Sxz δ̃

It makes sense to estimate δ by making gn(δ̃) close to 0:

δ̂(Ŵ) = arg min_δ̃ J(δ̃, Ŵ)
= arg min_δ̃ n · gn(δ̃)′ Ŵ gn(δ̃)
= arg min_δ̃ n · (sxy − Sxz δ̃)′ Ŵ (sxy − Sxz δ̃)



Simple algebra gives

δ̂(Ŵ) = (Sxz′ Ŵ Sxz)⁻¹ Sxz′ Ŵ sxy
= δ + (Sxz′ Ŵ Sxz)⁻¹ Sxz′ Ŵ (1/n) ∑_{i=1}^n xi εi

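The closed form above is easy to evaluate for any candidate weighting matrix. Here is a small Python sketch with an over-identified simulated design (two instruments, one endogenous regressor); the data and names are hypothetical, and Ŵ is simply the identity matrix.

```python
# Linear GMM with a given weighting matrix W:
# delta_hat(W) = (S_xz' W S_xz)^{-1} S_xz' W s_xy.
import numpy as np

def gmm_linear(X, Z, y, W):
    n = len(y)
    S_xz, s_xy = X.T @ Z / n, X.T @ y / n
    return np.linalg.solve(S_xz.T @ W @ S_xz, S_xz.T @ W @ s_xy)

rng = np.random.default_rng(4)
n = 100_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(size=n)
x = 0.6 * z1 + 0.4 * z2 + 0.5 * eps + rng.normal(size=n)
y = 1.0 + 2.0 * x + eps

Z = np.column_stack([np.ones(n), x])            # regressors (L = 2)
X = np.column_stack([np.ones(n), z1, z2])       # instruments (K = 3)
print(gmm_linear(X, Z, y, np.eye(3)))           # consistent for any symmetric p.d. W
```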


We will assume that Ŵ →p W and W = W′. Then

δ̂(Ŵ) →p δ

and

√n (δ̂(Ŵ) − δ) = (Sxz′ Ŵ Sxz)⁻¹ Sxz′ Ŵ (1/√n) ∑_{i=1}^n xi εi
→d (Σxz′ W Σxz)⁻¹ Σxz′ W · N(0, S)



or

√n (δ̂(Ŵ) − δ) →d N( 0, (Σxz′ W Σxz)⁻¹ Σxz′ W S W Σxz (Σxz′ W Σxz)⁻¹ )

Variance minimized if

W = S⁻¹ = E[εi² xi xi′]⁻¹



This suggests

Ŵ = Ŝ⁻¹, where Ŝ = (1/n) ∑_{i=1}^n (yi − zi′δ̂(Ŵ1))² xi xi′

for some arbitrary Ŵ1.

√n (δ̂(Ŝ⁻¹) − δ) →d N( 0, (Σxz′ S⁻¹ Σxz)⁻¹ )

Can show

J(δ̂(Ŝ⁻¹), Ŝ⁻¹) →d χ²(K − L)

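A compact sketch of this two-step procedure, including the J statistic, is given below. The heteroskedastic data-generating process is invented for illustration; the first step uses the identity matrix as the arbitrary Ŵ1.

```python
# Two-step efficient GMM and the over-identification (J) test; here K - L = 1.
import numpy as np
from scipy import stats

def gmm_linear(X, Z, y, W):
    n = len(y)
    S_xz, s_xy = X.T @ Z / n, X.T @ y / n
    return np.linalg.solve(S_xz.T @ W @ S_xz, S_xz.T @ W @ s_xy)

rng = np.random.default_rng(5)
n = 50_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(size=n) * (1.0 + 0.5 * np.abs(z1))          # heteroskedastic error
x = 0.6 * z1 + 0.4 * z2 + 0.5 * rng.normal(size=n) + 0.3 * eps
y = 1.0 + 2.0 * x + eps

Z = np.column_stack([np.ones(n), x])                          # regressors (L = 2)
X = np.column_stack([np.ones(n), z1, z2])                     # instruments (K = 3)

delta1 = gmm_linear(X, Z, y, np.eye(X.shape[1]))              # step 1: arbitrary W
e1 = y - Z @ delta1
S_hat = (X * e1[:, None] ** 2).T @ X / n                      # (1/n) sum e_i^2 x_i x_i'
delta2 = gmm_linear(X, Z, y, np.linalg.inv(S_hat))            # step 2: efficient GMM

g_n = X.T @ (y - Z @ delta2) / n
J = n * g_n @ np.linalg.inv(S_hat) @ g_n                      # J statistic
print(delta2, J, stats.chi2.sf(J, X.shape[1] - Z.shape[1]))   # estimate, J, p-value
```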


Two–Stage Least Squares

Example: Suppose homoskedasticity:

E[εi² | xi] = σ²

Then

S = E[εi² xi xi′] = E[ E[εi² | xi] xi xi′ ] = σ² E[xi xi′].

Suppose we know σ²; then

Ŝ = σ² (1/n) ∑_{i=1}^n xi xi′

and



   
δ̂(Ŵopt) = δ̂(Ŝ⁻¹)
= (Sxz′ Ŝ⁻¹ Sxz)⁻¹ Sxz′ Ŝ⁻¹ sxy
= [ ( (1/n) ∑_{i=1}^n zi xi′ ) ( σ² (1/n) ∑_{i=1}^n xi xi′ )⁻¹ ( (1/n) ∑_{i=1}^n xi zi′ ) ]⁻¹
( (1/n) ∑_{i=1}^n zi xi′ ) ( σ² (1/n) ∑_{i=1}^n xi xi′ )⁻¹ ( (1/n) ∑_{i=1}^n xi yi )



 ! ! −1 !  −1
 n n n 
= ∑ zi xi0 ∑ xi xi0 ∑ xi zi0
i =1 i =1 i =1
 
! ! −1 !
n n n
∑ zi xi0 ∑ xi xi0 ∑ xi yi
i =1 i =1 i =1
  −1 0  −1 0  −1 0
= Z 0X X 0X X Z Z X X 0X X y
 −1 0
= Z 0 PZ Z Py
−1 0 0
Z 0 P 0 PZ

= ZPy



= ( (PZ)′ (PZ) )⁻¹ (PZ)′ y
= ( Ẑ′ Ẑ )⁻¹ Ẑ′ y
= ( Z′P′PZ )⁻¹ Z′P′Py
= ( (PZ)′ (PZ) )⁻¹ (PZ)′ Py
= ( Ẑ′ Ẑ )⁻¹ Ẑ′ ŷ

where Ẑ = PZ and ŷ = Py are the fitted values from regressing the columns of Z, and y, on X.

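These matrix identities can be verified numerically. The sketch below builds the projection matrix P from a small simulated data set (hypothetical design, kept small because P is n × n) and checks that (Ẑ′Ẑ)⁻¹Ẑ′y and (Z′PZ)⁻¹Z′Py give the same answer.

```python
# 2SLS in matrix form: P = X (X'X)^{-1} X', Z_hat = P Z, delta = (Z_hat'Z_hat)^{-1} Z_hat'y.
import numpy as np

rng = np.random.default_rng(6)
n = 2_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(size=n)
x = 0.6 * z1 + 0.4 * z2 + 0.5 * eps + rng.normal(size=n)
y = 1.0 + 2.0 * x + eps

Z = np.column_stack([np.ones(n), x])            # regressors (include the endogenous x)
X = np.column_stack([np.ones(n), z1, z2])       # instruments

P = X @ np.linalg.solve(X.T @ X, X.T)           # projection onto the columns of X
Z_hat = P @ Z
delta_a = np.linalg.solve(Z_hat.T @ Z_hat, Z_hat.T @ y)
delta_b = np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)
print(delta_a, np.allclose(delta_a, delta_b))   # approx [1.0, 2.0], True
```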


Classic Example: Measurement Error (again)

Suppose that
yi = α + xi∗ β + εi
but that xi∗ is observed with error, so

xi = xi∗ + vi

It is natural to assume that xi∗, vi and εi are independent. The equation of interest can
be written as
yi = α + xi β + (εi − vi β)
It is clear that E[(εi − vi β) xi] ≠ 0 (unless β = 0), so the OLS estimator will be biased.



Indeed, it can be shown that

plim β̂ = β · σ²x∗ / (σ²x∗ + σ²v)

so in this case the bias is unambiguously towards 0. plims can also be calculated in the
case of many x–variables, but they are messy.
Possible instrument: a second measurement of xi∗.

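Both the attenuation factor and the second-measurement instrument are easy to see in a simulation. The sketch below is illustrative only; the variances are chosen so that the attenuation factor is 1/1.25 = 0.8.

```python
# Classical measurement error: OLS is attenuated toward 0, while a second noisy
# measurement of x* works as an instrument.
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
beta = 2.0
x_star = rng.normal(size=n)                     # var(x*) = 1
v1 = rng.normal(scale=0.5, size=n)              # var(v) = 0.25
v2 = rng.normal(scale=0.5, size=n)              # independent second measurement error
eps = rng.normal(size=n)

y = 1.0 + beta * x_star + eps
x = x_star + v1                                  # observed regressor
x2 = x_star + v2                                 # second measurement, used as instrument

cov = lambda a, b: np.cov(a, b)[0, 1]
print(cov(y, x) / np.var(x))     # approx beta * 1 / (1 + 0.25) = 1.6
print(cov(y, x2) / cov(x, x2))   # approx 2.0
```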


Supply and Demand

 
ln(q^d) = β1 + ln(p) β2 + I β3 + ε1 (Demand)
ln(q^s) = α1 + ln(p) α2 + w α3 + r α4 + ε2 (Supply)

So if q^d = q^s then

ln(p) = [ (α1 + w α3 + r α4 + ε2) − (β1 + I β3 + ε1) ] / (β2 − α2)

so ln(p) is correlated with both ε1 and ε2.



So
OLS regression of (Demand) will not correctly estimate the demand function, and
OLS regression of (Supply) will not correctly estimate the supply function.
But
we can use w and r as instruments and estimate (Demand) by 2SLS
(over–identified), and
we can use I as an instrument and estimate (Supply) by 2SLS (just–identified).



LIML (not in Hayashi)

yi = Y1i′ γ1 + x1i′ β1 + ui,   E[x1i ui] = 0, E[x2i ui] = 0

Y1i = x1i′ Π1 + x2i′ Π2 + Vi,   E[x1i Vi] = 0, E[x2i Vi] = 0

(the second equation is without loss of generality).

Or

yi = (x1i′ Π1 + x2i′ Π2)′ γ1 + x1i′ β1 + (ui + Vi′ γ1)

2SLS estimates (Π1, Π2) first and then (γ1, β1).



Back to

yi = Y1i′ γ1 + x1i′ β1 + ui
Y1i = x1i′ Π1 + x2i′ Π2 + Vi

LIML is the MLE under the assumption that (ui, Vi′) is normally distributed.



One can show that, for γ1, this is the same as minimizing (over γ1) the F–statistic for
testing β2 = 0 in

yi − Y1i′ γ1 = x1i′ β1 + x2i′ β2 + ui



With MX = I − X (X′X)⁻¹ X′ and M1 = I − X1 (X1′X1)⁻¹ X1′, the LIML estimator is

( β̂1^LIML , γ̂1^LIML )′ = [ X1′X1 , X1′Y1 ; Y1′X1 , Y1′(I − κ̂ MX) Y1 ]⁻¹ [ X1′y ; Y1′(I − κ̂ MX) y ]

(“;” separates the rows of the partitioned matrices), where κ̂ is the smallest eigenvalue of



(Y′ MX Y)^{-1/2} (Y′ M1 Y) (Y′ MX Y)^{-1/2}

where M1 = I − X1 (X1′X1)⁻¹ X1′ (as above) and Y = [ y ⋮ Y1 ].



Estimators of the form

( β̂1(k) , γ̂1(k) )′ = [ X1′X1 , X1′Y1 ; Y1′X1 , Y1′(I − k MX) Y1 ]⁻¹ [ X1′y ; Y1′(I − k MX) y ]

are called K–class estimators. LIML is a K–class estimator with k = κ̂ and 2SLS is a
K–class estimator with k = 1.

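To make the κ̂ and K-class formulas concrete, here is a Python sketch on simulated data (the design, coefficient values and variable names are all invented). It uses the fact that the smallest eigenvalue of (Y′MXY)^{-1/2}(Y′M1Y)(Y′MXY)^{-1/2} equals the smallest eigenvalue of (Y′MXY)⁻¹(Y′M1Y).

```python
# kappa_hat and the K-class estimator: k = kappa_hat gives LIML, k = 1 gives 2SLS.
import numpy as np

rng = np.random.default_rng(8)
n = 1_000
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])       # included exogenous x1
X2 = rng.normal(size=(n, 3))                                  # excluded instruments x2
eps = rng.normal(size=n)
Y1 = (X2 @ np.array([0.5, 0.4, 0.3]) + X1 @ np.array([0.2, 0.1])
      + 0.5 * eps + rng.normal(size=n)).reshape(-1, 1)        # one endogenous regressor
y = X1 @ np.array([1.0, 0.5]) + 2.0 * Y1[:, 0] + eps          # beta1 = (1, 0.5), gamma1 = 2

X = np.column_stack([X1, X2])
annihilator = lambda A: np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)
MX, M1 = annihilator(X), annihilator(X1)

Ymat = np.column_stack([y, Y1])                               # Y = [y : Y1]
kappa = np.linalg.eigvals(np.linalg.solve(Ymat.T @ MX @ Ymat,
                                          Ymat.T @ M1 @ Ymat)).real.min()

def k_class(k):
    A = np.block([[X1.T @ X1, X1.T @ Y1],
                  [Y1.T @ X1, Y1.T @ (np.eye(n) - k * MX) @ Y1]])
    b = np.concatenate([X1.T @ y, Y1.T @ (np.eye(n) - k * MX) @ y])
    return np.linalg.solve(A, b)                              # (beta1_hat(k), gamma1_hat(k))

print(kappa)             # slightly above 1
print(k_class(kappa))    # LIML: approx [1.0, 0.5, 2.0]
print(k_class(1.0))      # 2SLS: approx [1.0, 0.5, 2.0]
```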


It can be shown that all K–class estimators for which √n (k − 1) →p 0 have the same
asymptotic distribution.

Hence LIML and 2SLS are asymptotically equivalent.


(Davidson and MacKinnon)



Are the asymptotics useful?

2SLS regresses

yi on ( x1i′ Π̂1 + x2i′ Π̂2 ) and x1i.

Suppose that there are many instruments.
Then ( x1i′ Π̂1 + x2i′ Π̂2 ) will be biased in the direction of Y1i.
But we do not want to regress

yi on Y1i and x1i.



The idea that one wants to minimize (over γ1) the F–statistic for testing β2 = 0 in

yi − Y1i′ γ1 = x1i′ β1 + x2i′ β2 + ui

makes sense in small samples even if dim(x2i) is large (at least if ui is homoskedastic).

A different, but related, problem is that of weak instruments.



Consider the following simpler model in matrix notation:

y = Y γ + u
Y = Z π + v

where γ is one–dimensional. Let the instruments be denoted by Z.

γ̂2SLS = [ Y′Z (Z′Z)⁻¹ Z′y ] / [ Y′Z (Z′Z)⁻¹ Z′Y ] = γ + [ Y′Z (Z′Z)⁻¹ Z′u ] / [ Y′Z (Z′Z)⁻¹ Z′Y ]



Now

Z′Y = Z′Z π + Z′v

and, letting Q = Z′Z/n,

γ̂2SLS − γ = [ (√n π′Q + v′Z/√n) Q⁻¹ (Z′u/√n) ] / [ (√n π′Q + v′Z/√n) Q⁻¹ (Qπ√n + Z′v/√n) ]

Now if Z is independent of (u, v), then Z′u/√n and Z′v/√n are approximately
distributed like N(0, Q σu²) and N(0, Q σv²), respectively.

These are exact if (u, v) is normal.

We can therefore further write



γ̂2SLS − γ
= [ (√n π′Q + Q^{1/2} σv zv) Q^{-1/2} σu zu ] / [ (√n π′Q + Q^{1/2} σv zv) Q⁻¹ (Qπ√n + Q^{1/2} σv zv) ]
= [ σv (√n π′Q^{1/2}/σv + zv) σu zu ] / [ σv (√n π′Q^{1/2}/σv + zv) (Q^{1/2}π√n/σv + zv) σv ]

where zu and zv are (not independent) standard normals.

γ̂2SLS − γ is therefore approximately normal if √n π′Q^{1/2}/σv is big relative to a
standard normal.

Hence 2SLS will perform poorly if π is small (weak instruments). The same is true for
LIML.
Rule of Thumb for Weak Instruments

yi = β0 + β1 xi + β2 wi1 + β3 wi2 + · · · + β1+r wir + ui


with
E [wij ui ] = 0, E [zij ui ] = 0, and E [ui ] = 0

First stage in 2SLS: Regress xi on

1, zi1 , zi2 . . . zim , wi1 , wi2 . . . wir .

Worry if F–stat associated with the z’s is less than 10.

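One way to compute that diagnostic is as an F-test of "the coefficients on all excluded instruments are zero" in the first-stage regression, comparing restricted and unrestricted sums of squared residuals. The Python sketch below is illustrative; the deliberately small first-stage coefficients produce a weak-instrument situation.

```python
# First-stage F-statistic for the excluded instruments (rule of thumb: worry if < 10).
import numpy as np

def first_stage_F(x, Zexcl, Wincl):
    """F-test of 'coefficients on the excluded instruments are all zero'."""
    n = len(x)
    unrestricted = np.column_stack([np.ones(n), Zexcl, Wincl])
    restricted = np.column_stack([np.ones(n), Wincl])
    ssr = lambda D: np.sum((x - D @ np.linalg.lstsq(D, x, rcond=None)[0]) ** 2)
    q = Zexcl.shape[1]                                  # number of restrictions (m)
    dof = n - unrestricted.shape[1]
    return ((ssr(restricted) - ssr(unrestricted)) / q) / (ssr(unrestricted) / dof)

rng = np.random.default_rng(9)
n = 1_000
Zexcl = rng.normal(size=(n, 2))                          # excluded instruments z
Wincl = rng.normal(size=(n, 1))                          # included exogenous regressor w
x = Zexcl @ np.array([0.05, 0.05]) + Wincl[:, 0] + rng.normal(size=n)  # weak first stage
print(first_stage_F(x, Zexcl, Wincl))                    # typically well below 10
```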


Invalid instruments and weak instruments

Simple model
yi = β0 + xi β1 + ui
Use zi and a constant as instruments.
Simple algebra gives
β̂1 = β1 + [ (1/n) ∑ (ui − ū)(zi − z̄) ] / [ (1/n) ∑ (xi − x̄)(zi − z̄) ] →p β1 + cov(ui, zi) / cov(xi, zi)

So a slight correlation between u and z can give serious bias if cov (xi , zi ) is small.



Parameter Heterogeneity (IV)
Simplest case

yi = β0 + xi β1i + ui (equation of interest)


xi = δ0 + zi δ1i + vi (first stage of TSLS)

To simplify things, suppose:


β1i and δ1i are distributed independently of (ui , vi , zi )
E[ui | zi] = 0, E[vi | zi] = 0 and E[δ1i] ≠ 0

Can show

β̂1^2SLS →p cov(yi, zi) / cov(xi, zi) = E[δ1i β1i] / E[δ1i]

(a weighted average of the β1i)
Parameter Heterogeneity (IV), continued.

yi = β0 + xi β1i + ui (equation of interest)


xi = δ0 + zi δ1i + vi (first stage of TSLS)

We had

β̂1^2SLS →p E[δ1i β1i] / E[δ1i]

TSLS estimates the causal effect for those individuals for whom zi is most
influential (those with large δ1i).
LIML not as nice!



Review: Efficient GMM

Recall that

δ̂(Ŵ) = arg min_δ̃ n · gn(δ̃)′ Ŵ gn(δ̃)

where

gn(δ̃) = (1/n) ∑_{i=1}^n xi (yi − zi′δ̃)

and the optimal choice of Ŵ is

S⁻¹ = E[εi² xi xi′]⁻¹



We therefore defined a two–step estimator based on minimizing

n · gn(δ̃)′ Ŝ(δ̂(Ŵ1))⁻¹ gn(δ̃)

over δ̃, where

Ŝ(d) = (1/n) ∑_{i=1}^n (yi − zi′d)² xi xi′



Continuously Updating GMM Estimator (CUE)

θ̂CUE = arg min_δ̃ n · gn(δ̃)′ Ŝ(δ̃)⁻¹ gn(δ̃)

by Hansen, Heaton, and Yaron (1996).

No ad-hoc first–stage weighting matrix.

Invariant to rotation of gi by R_{m×m} (or R_{m×m}(δ̃)):
replacing gi(δ̃) with R(δ̃) gi(δ̃) leads to numerically the same estimator as long
as R(δ̃) has full rank.



What is CUE in the linear IV model? Here

gi(δ̃) ≡ (yi − zi′δ̃) xi

so

θ̂CUE = arg min_δ̃ [ (1/n) ∑_{i=1}^n (yi − zi′δ̃) xi ]′ [ (1/n) ∑_{i=1}^n (yi − zi′δ̃)² xi xi′ ]⁻¹ [ (1/n) ∑_{i=1}^n (yi − zi′δ̃) xi ].



With homoskedasticity

S = σ² E[xi xi′]

so we could define

Ŝ(δ̃) = [ (1/n) ∑_{i=1}^n (yi − zi′δ̃)² ] (1/n) ∑_{i=1}^n xi xi′

and the Continuously Updating GMM Estimator would then be

θ̂CUE = arg min_δ̃ { [ (1/n) ∑_{i=1}^n (yi − zi′δ̃) xi ]′ [ (1/n) ∑_{i=1}^n xi xi′ ]⁻¹ [ (1/n) ∑_{i=1}^n (yi − zi′δ̃) xi ] } / [ (1/n) ∑_{i=1}^n (yi − zi′δ̃)² ].

This happens to be the LIML estimator.

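Unlike 2SLS, the CUE does not have a closed form in general, but its objective is cheap to evaluate and can be minimized numerically. The sketch below (simulated homoskedastic data, hypothetical names) minimizes the general CUE objective starting from 2SLS; with this design the two estimates should be very close, consistent with the LIML connection above.

```python
# Continuously updating GMM for the linear IV model, minimized numerically.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(10)
n = 5_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(size=n)
x = 0.6 * z1 + 0.4 * z2 + 0.5 * eps + rng.normal(size=n)
y = 1.0 + 2.0 * x + eps

Z = np.column_stack([np.ones(n), x])          # regressors z_i
X = np.column_stack([np.ones(n), z1, z2])     # instruments x_i

def cue_objective(delta):
    e = y - Z @ delta
    g_n = X.T @ e / n                          # g_n(delta)
    S_hat = (X * e[:, None] ** 2).T @ X / n    # S_hat(delta) = (1/n) sum e_i^2 x_i x_i'
    return n * g_n @ np.linalg.solve(S_hat, g_n)

Z_hat = X @ np.linalg.lstsq(X, Z, rcond=None)[0]          # fitted values of Z on X
delta_2sls = np.linalg.lstsq(Z_hat, y, rcond=None)[0]     # 2SLS as starting value
res = minimize(cue_objective, delta_2sls, method="Nelder-Mead")
print(delta_2sls, res.x)                                  # both approx [1.0, 2.0]
```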
