Bo E. Honoré
Princeton University
Suppose that
yi = α + xi∗ β + εi
but that xi∗ is observed with error so
xi = xi∗ + vi
yi = α + xi β + (εi − vi β)
Suppose
yi = β0 + β1 xi + ui
but that ui varies with xi .
Example
yi log–wages
xi education
ui ability plus “random stuff”
We need something
that moves xi
that does not move ui
because we can then use its relationship with yi to learn about β1.
Let’s call it zi .
so
$$\beta_1 = \frac{\operatorname{cov}(y_i, z_i)}{\operatorname{cov}(x_i, z_i)}$$
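The ratio above follows in one step from the model yi = β0 + β1 xi + ui. A short sketch, assuming the two requirements on zi are read as cov(ui, zi) = 0 (it does not move ui) and cov(xi, zi) ≠ 0 (it moves xi):

```latex
\begin{aligned}
\operatorname{cov}(y_i, z_i)
  &= \operatorname{cov}(\beta_0 + \beta_1 x_i + u_i,\; z_i) \\
  &= \beta_1 \operatorname{cov}(x_i, z_i) + \operatorname{cov}(u_i, z_i) \\
  &= \beta_1 \operatorname{cov}(x_i, z_i)
\end{aligned}
```

Dividing both sides by cov(xi, zi) gives the expression for β1.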
Suppose that
yi = β0 + β1 xi + ui
and
xi = γ0 + γ1 zi + vi .
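$$\begin{aligned}
y_i &= \beta_0 + \beta_1(\gamma_0 + \gamma_1 z_i + v_i) + u_i \\
    &= \beta_0 + \beta_1(\gamma_0 + \gamma_1 z_i) + (u_i + \beta_1 v_i) \\
    &= \beta_0 + \beta_1 \hat{x}_i + (u_i + \beta_1 v_i)
\end{aligned}$$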
2SLS:
Estimate γ0 and γ1 by regressing
$$x_i = \gamma_0 + \gamma_1 z_i + v_i$$
Construct
$$\hat{x}_i = \hat{\gamma}_0 + \hat{\gamma}_1 z_i$$
Estimate β0 and β1 by regressing
$$y_i = \beta_0 + \beta_1 \hat{x}_i + \text{error}_i$$
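The two stages can be checked on simulated data. The following is a minimal sketch rather than a reference implementation: the parameter values, the `ability` variable that makes xi endogenous, and the helper `ols` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta0, beta1 = 1.0, 2.0      # hypothetical structural parameters
gamma0, gamma1 = 0.5, 1.0    # hypothetical first-stage parameters

z = rng.normal(size=n)               # instrument: moves x, does not move u
ability = rng.normal(size=n)         # unobserved, enters both x and y
v = ability + rng.normal(size=n)     # first-stage error
u = ability + rng.normal(size=n)     # structural error, correlated with v
x = gamma0 + gamma1 * z + v
y = beta0 + beta1 * x + u


def ols(regressor, outcome):
    """Regress outcome on a constant and one regressor; return (intercept, slope)."""
    design = np.column_stack([np.ones_like(regressor), regressor])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[0], coef[1]


# OLS of y on x is contaminated by the correlation between x and u.
_, beta1_ols = ols(x, y)

# Stage 1: regress x on z and form the fitted values.
gamma0_hat, gamma1_hat = ols(z, x)
x_hat = gamma0_hat + gamma1_hat * z

# Stage 2: regress y on the fitted values.
beta0_2sls, beta1_2sls = ols(x_hat, y)

# With one instrument, the 2SLS slope equals cov(y, z) / cov(x, z).
beta1_ratio = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]

print(f"OLS slope:   {beta1_ols:.3f}")
print(f"2SLS slope:  {beta1_2sls:.3f}")
print(f"cov ratio:   {beta1_ratio:.3f}")
```

In this design the OLS slope should settle above the true value of 2, while the 2SLS slope and the covariance ratio coincide and sit close to 2. The standard errors printed by a naive second-stage regression would not be valid, because they are based on the wrong residuals.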
with
E [wij ui ] = 0, for j = 1, . . . , r
E [zij ui ] = 0, for j = 1, . . . , m
E [ui ] = 0
yi = β0 + β1 xi1 + β2 wi1 + ui
with
E [ui ] = 0,
E [wi1 ui ] = 0,
E [zi1 ui ] = 0.
Warning: The next step is a little sloppy, but making it more precise would add
confusion.
In (2), both wi1 and zi1 are uncorrelated with both ui and (xi1 − x̂i1). Because x̂i1 is a linear function of wi1 and zi1, this implies that
x̂i1 and wi1 are both uncorrelated with ui and with (xi1 − x̂i1). So (2) is a valid regression.
Note: This is why we include wi1 in (1). Without it, we could not know that wi1 is
uncorrelated with (xi1 − x̂i1).
Assumptions
Assumption 3.1:
yi = zi′δ + εi but E[zi εi] ≠ 0
Assumption 3.2: (yi, zi, xi) ergodic and stationary.
Assumption 3.3: E[gi] = E[xi εi] = E[xi (yi − zi′δ)] = 0
Assumption 3.4: E[xi zi′] has rank L, where dim(zi) = L ≤ K = dim(xi)
Suppose L = K. Then
$$E[x_i y_i] = E[x_i z_i']\,\delta \;\Longrightarrow\; \delta = E[x_i z_i']^{-1} E[x_i y_i] = \Sigma_{xz}^{-1}\sigma_{xy}.$$
This suggests
$$\hat{\delta} = \left(\frac{1}{n}\sum_{i=1}^{n} x_i z_i'\right)^{-1} \frac{1}{n}\sum_{i=1}^{n} x_i y_i = S_{xz}^{-1} s_{xy}$$
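or
$$\begin{aligned}
\hat{\delta}(\hat{W}) &= \arg\min_{\tilde{\delta}}\; J(\tilde{\delta}, \hat{W}) \\
 &= \arg\min_{\tilde{\delta}}\; n \cdot g_n(\tilde{\delta})'\,\hat{W}\, g_n(\tilde{\delta}) \\
 &= \arg\min_{\tilde{\delta}}\; n \cdot (s_{xy} - S_{xz}\tilde{\delta})'\,\hat{W}\,(s_{xy} - S_{xz}\tilde{\delta})
\end{aligned}$$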
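and
$$\begin{aligned}
\sqrt{n}\,\bigl(\hat{\delta}(\hat{W}) - \delta\bigr)
 &= \bigl(S_{xz}'\hat{W}S_{xz}\bigr)^{-1} S_{xz}'\hat{W} \left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i \varepsilon_i\right) \\
 &\xrightarrow{d} \bigl(\Sigma_{xz}' W \Sigma_{xz}\bigr)^{-1} \Sigma_{xz}' W \cdot N(0, S)
\end{aligned}$$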
Variance minimized if
$$W = S^{-1} = E[\varepsilon_i^2 x_i x_i']^{-1}$$
Can show
$$J\bigl(\hat{\delta}(\hat{S}^{-1}), \hat{S}^{-1}\bigr) \xrightarrow{d} \chi^2(K - L)$$
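The pieces above (the estimator for a given weight matrix, the efficient weight, and the J statistic) can be put together in a two-step procedure. The sketch below is one common way to do it, not necessarily the procedure intended here: it uses the closed form of the minimizer for a given weight matrix, starts from a weight based on the instrument second moments, and then re-weights with an estimate of S built from first-step residuals. The simulated design, including the form of heteroskedasticity, is an assumption for illustration. Notation follows this section: `x` holds the K = 3 instruments and `z` the L = 1 regressor.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
delta_true = 1.0                       # hypothetical true coefficient

x = rng.normal(size=(n, 3))            # instruments x_i (K = 3)
common = rng.normal(size=n)            # makes the regressor endogenous
z = (x @ np.array([1.0, 0.5, 0.5]) + common + rng.normal(size=n)).reshape(n, 1)
# Error is mean-independent of x but heteroskedastic in the first instrument.
eps = (common + rng.normal(size=n)) * np.sqrt(0.5 + x[:, 0] ** 2)
y = z[:, 0] * delta_true + eps

S_xz = x.T @ z / n                     # K x L
s_xy = x.T @ y / n                     # K
S_xx = x.T @ x / n                     # K x K


def gmm(W):
    """Closed-form minimizer of n * (s_xy - S_xz d)' W (s_xy - S_xz d)."""
    A = S_xz.T @ W
    return np.linalg.solve(A @ S_xz, A @ s_xy)


# Step 1: any positive definite weight gives a consistent estimate.
delta_1 = gmm(np.linalg.inv(S_xx))

# Step 2: estimate S from first-step residuals and use its inverse as weight.
resid = y - z @ delta_1
S_hat = (x * resid[:, None] ** 2).T @ x / n
W_eff = np.linalg.inv(S_hat)
delta_2 = gmm(W_eff)

# J statistic at the efficient estimate; compare with chi-square(K - L = 2).
g = s_xy - S_xz @ delta_2
J = n * g @ W_eff @ g

print(f"first-step estimate:  {delta_1[0]:.3f}")
print(f"efficient estimate:   {delta_2[0]:.3f}")
print(f"J statistic:          {J:.3f}")
```

Because the moment conditions hold by construction in this simulation, the J statistic should look like a draw from a chi-square distribution with 2 degrees of freedom.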
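Under conditional homoskedasticity,
$$E[\varepsilon_i^2 \mid x_i] = \sigma^2, \qquad \hat{S} = \sigma^2\,\frac{1}{n}\sum_{i=1}^{n} x_i x_i'$$
and the estimator with weight matrix $\hat{S}^{-1}$ becomes
$$\hat{\delta} = \bigl(S_{xz}' S_{xx}^{-1} S_{xz}\bigr)^{-1} S_{xz}' S_{xx}^{-1} s_{xy}, \qquad S_{xz}' = \frac{1}{n}\sum_{i=1}^{n} z_i x_i', \quad S_{xx} = \frac{1}{n}\sum_{i=1}^{n} x_i x_i', \quad s_{xy} = \frac{1}{n}\sum_{i=1}^{n} x_i y_i$$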
Suppose that
yi = α + xi∗ β + εi
but that xi∗ is observed with error so
xi = xi∗ + vi
It is natural to assume that xi∗ , vi and εi are independent. The equation of interest can
be written as
yi = α + xi β + (εi − vi β)
It is clear that E[(εi − vi β) xi] ≠ 0 (unless β = 0), so the OLS estimator will be biased:
$$\operatorname{plim}\hat{\beta} = \beta\,\frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_v^2}$$
so in this case the bias is unambiguously towards 0. Plims can also be calculated in the
case of many x-variables, but they are messy.
Possible instrument: second measurement of xi∗ .
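Both the attenuation formula and the second-measurement instrument can be illustrated by simulation. This is a sketch under assumed values: the parameters are arbitrary, and the second measurement `x2` is assumed to have an error that is independent of the first measurement error and of εi.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
alpha, beta = 1.0, 2.0                 # hypothetical true parameters
sigma_xstar, sigma_v = 1.0, 1.0        # std. dev. of true regressor and of the error

x_star = rng.normal(scale=sigma_xstar, size=n)    # true, unobserved regressor
eps = rng.normal(size=n)
x = x_star + rng.normal(scale=sigma_v, size=n)    # observed measurement
x2 = x_star + rng.normal(scale=sigma_v, size=n)   # second, independent measurement
y = alpha + beta * x_star + eps

# OLS slope of y on the mismeasured x.
beta_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Probability limit implied by the attenuation formula.
beta_plim = beta * sigma_xstar**2 / (sigma_xstar**2 + sigma_v**2)

# IV slope using the second measurement as instrument for the first.
beta_iv = np.cov(y, x2)[0, 1] / np.cov(x, x2)[0, 1]

print(f"OLS slope:        {beta_ols:.3f}")
print(f"attenuation plim: {beta_plim:.3f}")
print(f"IV slope:         {beta_iv:.3f}")
```

With equal variances for the true regressor and the measurement error, the formula says OLS recovers only half of β, while the IV slope should be close to β itself.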
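$$\ln(q^d) = \beta_1 + \ln(p)\,\beta_2 + I\beta_3 + \varepsilon_1 \quad \text{(Demand)}$$
$$\ln(q^s) = \alpha_1 + \ln(p)\,\alpha_2 + w\alpha_3 + r\alpha_4 + \varepsilon_2 \quad \text{(Supply)}$$
So if $q^d = q^s$ then
$$\ln(p) = \frac{(\alpha_1 + w\alpha_3 + r\alpha_4 + \varepsilon_2) - (\beta_1 + I\beta_3 + \varepsilon_1)}{\beta_2 - \alpha_2}$$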
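The equilibrium price depends on both error terms, which is why ln(p) cannot be treated as exogenous in either equation. As an illustration, under the added assumptions (not stated above) that I, w and r are uncorrelated with ε1 and ε2, and that ε1 and ε2 are uncorrelated with each other with variances σ1² and σ2²:

```latex
\operatorname{cov}(\ln p,\, \varepsilon_1) = \frac{-\sigma_1^2}{\beta_2 - \alpha_2},
\qquad
\operatorname{cov}(\ln p,\, \varepsilon_2) = \frac{\sigma_2^2}{\beta_2 - \alpha_2}
```

Both covariances are nonzero, so OLS applied to either equation is inconsistent. The variables excluded from an equation (w and r for demand, I for supply) are the natural candidates for instruments.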
Or
$$y_i = \bigl(x_{1i}'\Pi_1 + x_{2i}'\Pi_2\bigr)'\gamma_1 + x_{1i}'\beta_1 + u_i + V_i'\gamma_1,$$
$$y_i = Y_{1i}'\gamma_1 + x_{1i}'\beta_1 + u_i$$
$$Y_{1i} = x_{1i}'\Pi_1 + x_{2i}'\Pi_2 + V_i$$
LIML is the MLE under the assumption that (ui, Vi′) is normally distributed.
where $\hat{\kappa}$ is the smallest eigenvalue of
and
$$M_1 = I - X_1 (X_1' X_1)^{-1} X_1' \quad \text{and} \quad \mathbf{Y} = [\, y \;\vdots\; Y_1 \,]$$
2SLS regresses
yi on x1i′Π̂1 + x2i′Π̂2 and x1i.
Suppose that there are many instruments.
Then x1i′Π̂1 + x2i′Π̂2 will be biased in the direction of Y1i.
But we do not want to regress
makes sense in small samples even if dim(x2i ) is large (at least if ui is homoskedastic).
y = Yγ+u
Y = Zπ + v
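$$= \frac{\sigma_v\left(\sqrt{n}\,\pi' Q^{1/2}/\sigma_v + z_v\right)\sigma_u z_u}{\sigma_v\left(\sqrt{n}\,\pi' Q^{1/2}/\sigma_v + z_v\right)\left(Q^{1/2}\pi\sqrt{n}/\sigma_v + z_v\right)\sigma_v}$$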
Hence 2SLS will perform poorly if π is small (weak instruments). The same is true for
LIML.
Bo Honoré (Princeton University) Instrumental Variables 48 / 57
Rule of Thumb for Weak Instruments
Simple model
yi = β0 + xi β1 + ui
Use zi and a constant as instruments.
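Simple algebra gives
$$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum (u_i - \bar{u})(z_i - \bar{z})}{\frac{1}{n}\sum (x_i - \bar{x})(z_i - \bar{z})} \xrightarrow{p} \beta_1 + \frac{\operatorname{cov}(u_i, z_i)}{\operatorname{cov}(x_i, z_i)}$$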
So a slight correlation between u and z can give serious bias if cov (xi , zi ) is small.
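The size of this effect is easy to see numerically. The sketch below holds a small correlation between u and z fixed and compares a strong and a weak first stage. The design (x = πz + noise, u = ρz + noise, with the specific values of π and ρ) and the function names are assumptions for illustration. The asymptotic bias in this design is ρ/π.

```python
import numpy as np


def iv_slope(y, x, z):
    """IV estimate of the slope using z and a constant as instruments."""
    return np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]


def iv_bias(pi, rho, n=1_000_000, seed=3):
    """Simulate one large sample and return the IV estimate minus the true slope."""
    rng = np.random.default_rng(seed)
    beta0, beta1 = 1.0, 2.0                # hypothetical true parameters
    z = rng.normal(size=n)
    u = rho * z + rng.normal(size=n)       # cov(u, z) = rho: slightly invalid instrument
    x = pi * z + rng.normal(size=n)        # cov(x, z) = pi: strength of the instrument
    y = beta0 + beta1 * x + u
    return iv_slope(y, x, z) - beta1


rho = 0.02
bias_strong = iv_bias(pi=1.0, rho=rho)     # asymptotic bias rho / pi = 0.02
bias_weak = iv_bias(pi=0.05, rho=rho)      # asymptotic bias rho / pi = 0.4

print(f"bias with strong instrument: {bias_strong:.3f}")
print(f"bias with weak instrument:   {bias_weak:.3f}")
```

The same small violation of exogeneity is close to harmless when cov(xi, zi) is large and becomes a large bias when cov(xi, zi) is small.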
We had
$$\hat{\beta}_1^{2SLS} \xrightarrow{p} \frac{E[\delta_{1i}\beta_{1i}]}{E[\delta_{1i}]}$$
TSLS estimates the causal effect for those individuals for whom Zi is most
influential (those with large δ1i ).
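The weighted-average interpretation can be checked by simulation. The sketch below assumes a random-coefficient design that is not spelled out above: xi = 0.5 + δ1i zi + vi and yi = 1 + β1i xi + ui, with zi independent of (δ1i, β1i, ui, vi), and with β1i constructed to be larger for individuals with larger δ1i.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

z = rng.normal(size=n)                       # instrument, independent of everything else
delta1 = rng.uniform(0.0, 2.0, size=n)       # individual first-stage effects
beta1 = 1.0 + delta1                         # causal effects, larger where z matters more
v = rng.normal(size=n)
u = rng.normal(size=n)

x = 0.5 + delta1 * z + v
y = 1.0 + beta1 * x + u

beta_iv = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]          # 2SLS with one instrument
weighted_effect = np.mean(delta1 * beta1) / np.mean(delta1)  # sample analogue of the limit
average_effect = np.mean(beta1)

print(f"2SLS estimate:           {beta_iv:.3f}")
print(f"E[d1*b1]/E[d1] (sample): {weighted_effect:.3f}")
print(f"E[b1] (sample):          {average_effect:.3f}")
```

In this design the 2SLS estimate tracks the δ1i-weighted average of the causal effects, which lies above the unweighted average because the individuals most affected by zi also have the largest effects.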
LIML not as nice!
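Recall that
$$\hat{\delta}(\hat{W}) = \arg\min_{\tilde{\delta}}\; n \cdot g_n(\tilde{\delta})'\,\hat{W}\, g_n(\tilde{\delta})$$
where
$$g_n(\tilde{\delta}) = \frac{1}{n}\sum_{i=1}^{n} x_i\,(y_i - z_i'\tilde{\delta})$$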
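where
$$\hat{S}(d) = \frac{1}{n}\sum (y_i - z_i'd)^2\, x_i x_i'$$
$$\hat{\theta}_{CUE} = \arg\min_{\tilde{\delta}}\; n \cdot g_n(\tilde{\delta})'\,\hat{S}(\tilde{\delta})^{-1} g_n(\tilde{\delta})$$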
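so
$$\hat{\theta}_{CUE} = \arg\min_{\tilde{\delta}} \left[\frac{1}{n}\sum_{i=1}^{n} (y_i - z_i'\tilde{\delta})\,x_i\right]' \left[\frac{1}{n}\sum_{i=1}^{n} (y_i - z_i'\tilde{\delta})^2 x_i x_i'\right]^{-1} \left[\frac{1}{n}\sum_{i=1}^{n} (y_i - z_i'\tilde{\delta})\,x_i\right].$$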
so we could define
$$\hat{S}(\tilde{\delta}) = \left(\frac{1}{n}\sum_{i=1}^{n} (y_i - z_i'\tilde{\delta})^2\right) \left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right)$$
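and the Continuously Updating GMM Estimator would then be
$$\hat{\theta}_{CUE} = \arg\min_{\tilde{\delta}} \frac{\left[\frac{1}{n}\sum_{i=1}^{n} (y_i - z_i'\tilde{\delta})\,x_i\right]' \left[\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right]^{-1} \left[\frac{1}{n}\sum_{i=1}^{n} (y_i - z_i'\tilde{\delta})\,x_i\right]}{\frac{1}{n}\sum_{i=1}^{n} (y_i - z_i'\tilde{\delta})^2}.$$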
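Both versions of the continuously updating objective can be evaluated directly. The sketch below is an illustration under assumptions: it uses a simulated design with one regressor and three instruments, and it minimizes by a plain grid search over a hypothetical interval [0, 2], which is only practical because δ is a scalar here. A numerical optimizer would normally replace the grid.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000
delta_true = 1.0                           # hypothetical true coefficient

x = rng.normal(size=(n, 3))                # instruments x_i
common = rng.normal(size=n)                # makes the regressor endogenous
z = x @ np.array([1.0, 0.5, 0.5]) + common + rng.normal(size=n)   # regressor z_i
y = delta_true * z + common + rng.normal(size=n)

S_xx = x.T @ x / n


def cue_objective(d):
    """n * g_n(d)' S_hat(d)^{-1} g_n(d), with S_hat re-evaluated at each d."""
    resid = y - z * d
    g = x.T @ resid / n
    S_hat = (x * resid[:, None] ** 2).T @ x / n
    return n * g @ np.linalg.solve(S_hat, g)


def cue_objective_homoskedastic(d):
    """Ratio form: g_n(d)' S_xx^{-1} g_n(d) divided by the mean squared residual."""
    resid = y - z * d
    g = x.T @ resid / n
    return (g @ np.linalg.solve(S_xx, g)) / np.mean(resid ** 2)


grid = np.linspace(0.0, 2.0, 2001)
delta_cue = grid[np.argmin([cue_objective(d) for d in grid])]
delta_cue_homo = grid[np.argmin([cue_objective_homoskedastic(d) for d in grid])]

print(f"CUE estimate:                 {delta_cue:.3f}")
print(f"CUE estimate, homoskedastic:  {delta_cue_homo:.3f}")
```

The errors in this simulation are homoskedastic, so the two objectives should give similar estimates. Unlike two-step GMM, the weight matrix changes with every candidate value of δ, so the objective is no longer quadratic and has no closed-form minimizer in general.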