1 Instrumental Variables
The IV Estimator
Identification under IV Estimation
Variance of the IV Estimator
Consistency of the IV Estimator
$$\text{Exogenous: } \operatorname{plim}\frac{X_1'u}{N} = E(X_1'u) = 0$$
$$\text{Endogenous: } \operatorname{plim}\frac{X_2'u}{N} = E(X_2'u) \neq 0$$
The key idea is to find some variables $Z$ (an $N \times P$ matrix, one column per instrument) which have two properties:
I They are uncorrelated with the error term $u$, hence
$$\operatorname{plim}\left(\frac{1}{N}Z'u\right) = E(Z'u) = 0$$
I They are correlated with the potentially endogenous regressors $X$, hence
$$\operatorname{plim}\left(\frac{1}{N}Z'X\right) = E(Z'X) = \Sigma_{ZX} \neq 0$$
We will call these variables instruments.
Let’s start with the SLR model as usual. For $y = \beta_0 + \beta_1 x + u$, and given our assumptions, the IV estimator for $\beta_1$ is obtained by replacing the population moments with the sample moments:
$$\hat\beta_1 = \frac{\sum_{i=1}^n (z_i - \bar z)(y_i - \bar y)}{\sum_{i=1}^n (z_i - \bar z)(x_i - \bar x)}$$
When $z$ is equal to $x$, IV reduces to OLS; i.e., OLS is a particular case of IV in which the regressors are used as instruments for themselves.
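The sample-moment formula above is easy to check on simulated data. The sketch below (all data-generating numbers are invented for illustration) builds an endogenous $x$ through a shared shock, then shows that the IV slope recovers the true $\beta_1$ while setting $z = x$ reproduces the (biased) OLS slope:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                        # instrument
e = rng.normal(size=n)                        # shared shock -> endogeneity
x = 0.8 * z + 0.6 * e + rng.normal(size=n)    # x correlated with z and (via e) with u
u = 0.9 * e + rng.normal(size=n)              # error correlated with x
y = 1.0 + 2.0 * x + u                         # true beta1 = 2

def iv_slope(z, x, y):
    """beta1_hat = sum (z_i - zbar)(y_i - ybar) / sum (z_i - zbar)(x_i - xbar)"""
    zd = z - z.mean()
    return zd @ (y - y.mean()) / (zd @ (x - x.mean()))

beta_iv = iv_slope(z, x, y)    # close to the true value 2
beta_ols = iv_slope(x, x, y)   # z = x reduces IV to OLS; biased upward here
```

With `cov(x, u) > 0` in this design, OLS overestimates the slope while IV does not.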
$$\frac{Z'\hat u}{N} = 0 \;\Rightarrow\; Z'(y - X\hat\beta_{IV}) = 0 \;\Rightarrow\; Z'y = Z'X\hat\beta_{IV}$$
If $Z'X$ is invertible, we have:
$$\hat\beta_{IV} = (Z'X)^{-1}Z'y$$
If we set $Z = X$ we get the OLS estimator. Minimizing the weighted residual sum of squares is the same as minimizing the covariance between the instruments and the error terms.
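The matrix formula $(Z'X)^{-1}Z'y$ can be sketched in a few lines of numpy (the model below, including a constant plus one endogenous regressor instrumented by itself plus one external instrument, is an invented example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
z1 = rng.normal(size=n)
e = rng.normal(size=n)
x1 = z1 + 0.5 * e + rng.normal(size=n)   # endogenous: shares the shock e with u
u = 0.5 * e + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + u                   # true coefficients: (1, 2)

X = np.column_stack([np.ones(n), x1])    # regressors: constant + endogenous x1
Z = np.column_stack([np.ones(n), z1])    # instruments: constant + z1
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)    # (Z'X)^{-1} Z'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # the Z = X special case: OLS
```

The exogenous constant instruments itself, exactly as in the $Z = [X_1, Z_2]$ construction below.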
Exact Identification
# of exogenous instruments (Z2 ) = # of endogenous variables (X2 )
⇒ Z = [X1 , Z2 ]
IV Identification
$$\beta_1 = \frac{\partial y_i / \partial z_i}{\partial x_i / \partial z_i}$$
$$\widehat{Var}(\hat\beta_{IV}) = \frac{\hat u'\hat u}{N}\,(Z'X)^{-1}Z'Z(X'Z)^{-1}$$
(a correction for degrees of freedom can be done, but it is not relevant if we think in terms of asymptotic variance)
Since $\hat\beta_{IV} = (Z'X)^{-1}Z'y = \beta + (Z'X)^{-1}Z'u$, we have $\hat\beta_{IV} - \beta = (Z'X)^{-1}Z'u$.
When $Z = X$, the variance of the IV estimator is equal to the variance of the OLS estimator; in general, the IV estimator produces larger variances than the OLS estimator. We are not going to prove it for the general case, but in the simple case one can show that
$$Var_{IV}(\hat\beta_1) = \frac{\sigma^2}{N\sigma_x^2\,\rho_{x,z}^2} = Var_{OLS}(\hat\beta_1)\,\frac{1}{R_{x,z}^2}$$
The standard error (the square root of the variance) in the IV case differs from OLS only in the $R_{x,z}^2$ from regressing $x$ on $z$. Since $R_{x,z}^2 < 1$, IV standard errors are larger.
What is the intuition behind this result?
I To identify the effect of $X$ on $y$, we are only using the part of the variability in $X$ which is induced by the instrument
I Less variability in $X$ means lower precision in the estimate of $\partial y/\partial X$
The stronger the correlation between $Z$ and $X$, the smaller the IV standard errors. Thus:
$$Var(\hat\beta_{IV}) = \sigma_u^2\,(Z'X)^{-1}Z'Z(X'Z)^{-1} = \frac{\sigma_u^2\,var(z)}{n\,cov(x,z)^2} = \frac{\sigma_u^2}{n}\,\frac{1}{var(x)}\,\frac{var(x)\,var(z)}{cov(x,z)^2} = \frac{\sigma_u^2}{n\,var(x)}\,\frac{1}{\rho_{xz}^2} = Var(\hat\beta_{OLS})\,\frac{1}{\rho_{xz}^2}$$
where $\rho_{xz}$ is the correlation coefficient between $x$ and $z$. Since $0 \le \rho_{xz}^2 \le 1$, it follows that $Var(\hat\beta_{IV}) \ge Var(\hat\beta_{OLS})$. Note that when we have instruments with low power, $\rho_{xz}^2 \to 0$ and $Var(\hat\beta_{IV}) \to \infty$. Thus an indication of low power of the instruments is effectively high standard errors of the IV estimates.
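The relation $Var(\hat\beta_{IV}) = Var(\hat\beta_{OLS})/\rho_{xz}^2$ can be illustrated by a small Monte Carlo. In the sketch below (all numbers, including $\rho = 0.7$, are invented) $x$ is deliberately kept exogenous, so both estimators are consistent and only their precision differs:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, rho = 500, 2_000, 0.7

def slope(w, x, y):
    """IV slope using w as the instrument; w = x gives the OLS slope."""
    wd = w - w.mean()
    return wd @ (y - y.mean()) / (wd @ (x - x.mean()))

b_ols = np.empty(reps)
b_iv = np.empty(reps)
for r in range(reps):
    z = rng.normal(size=n)
    x = rho * z + np.sqrt(1 - rho**2) * rng.normal(size=n)  # corr(x, z) = rho
    y = 2.0 * x + rng.normal(size=n)                        # u independent of x
    b_ols[r] = slope(x, x, y)
    b_iv[r] = slope(z, x, y)

ratio = b_iv.var() / b_ols.var()   # ≈ 1 / rho^2
```

With $\rho = 0.7$ the empirical variance ratio should hover around $1/0.49 \approx 2$, matching the formula.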
$$\operatorname{plim}\hat\beta_{IV} = \beta + \operatorname{plim}\left(\tfrac{1}{N}Z'X\right)^{-1}\operatorname{plim}\left(\tfrac{1}{N}Z'u\right) = \beta + \Sigma_{ZX}^{-1}\times 0 = \beta$$
$$\lim_{N\to\infty} Var_{IV}(\hat\beta_{IV}) = \lim_{N\to\infty}\frac{1}{N}\,\sigma^2\,\underbrace{\left(\tfrac{1}{N}Z'X\right)^{-1}}_{=\Sigma_{ZX}^{-1}}\,\underbrace{\left(\tfrac{1}{N}Z'Z\right)}_{=\Sigma_{ZZ}}\,\underbrace{\left(\tfrac{1}{N}X'Z\right)^{-1}}_{=\Sigma_{XZ}^{-1}} = 0$$
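Consistency can also be seen directly in simulation: the sketch below (invented data-generating values) shows the IV estimation error shrinking toward zero as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(3)

def iv_slope(z, x, y):
    zd = z - z.mean()
    return zd @ (y - y.mean()) / (zd @ (x - x.mean()))

errors = {}
for n in (200, 2_000, 20_000, 200_000):
    z = rng.normal(size=n)
    e = rng.normal(size=n)
    x = z + 0.5 * e + rng.normal(size=n)         # endogenous regressor
    y = 2.0 * x + 0.5 * e + rng.normal(size=n)   # true slope = 2
    errors[n] = abs(iv_slope(z, x, y) - 2.0)
# errors[n] concentrates around 0 as n grows (consistency)
```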
2 A very instructive lecture on IV and identification is: Angrist, J. D. and A. B. Krueger, 2001, "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments," Journal of Economic Perspectives 15 (4), 69–85.
1 Finding instruments is difficult, mainly because there is no way we can ever be sure that the variables we have proposed as instruments are actually good ones, i.e. that the exclusion restrictions hold. Remember: $\operatorname{plim}(Z'u/N) = 0$ is an act of faith.
2 Moreover, as we briefly mentioned above and discuss in detail in the next lecture, if the instruments are poorly correlated with $X$, IV can be very misleading: even worse than OLS!
$$y = \hat X\beta + u$$
The OLS estimator for $y = \hat X\beta + u$ is³
$$\hat\beta_{2SLS} = (\hat X'\hat X)^{-1}\hat X'y = [(Z(Z'Z)^{-1}Z'X)'(Z(Z'Z)^{-1}Z'X)]^{-1}(Z(Z'Z)^{-1}Z'X)'y$$
$$= [X'Z\,\underbrace{(Z'Z)^{-1}Z'Z}_{=I}\,(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'y = [X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'y$$
3 In the exactly identified case, $X'Z$ and $Z'Z$ are square, invertible matrices, and $(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$.
The estimates produced by 2SLS are the same as those produced by IV, but the standard errors that you obtain from the second-stage OLS will not be the same as the correct standard errors of the IV estimates. In general they will be larger. One can show that:
$$Var(\hat\beta_{2SLS}) = \sigma^2(Z'X)^{-1}Z'Z(X'Z)^{-1} + Var(\hat v\beta)\,(Z'X)^{-1}Z'Z(X'Z)^{-1} \;\ge\; \sigma^2(Z'X)^{-1}Z'Z(X'Z)^{-1} = Var(\hat\beta_{IV})$$
I The reason is that the variable used in the second stage as an explanatory variable ($\hat X$, replacing $X$) is itself a random variable (a generated regressor), and this inflates the variance
I Hence the standard errors need to be adjusted
I As usual, you won’t have to do it: STATA does it for you!
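The generated-regressor problem is easy to exhibit numerically. In this sketch (invented numbers), running 2SLS "by hand" and computing $\hat\sigma^2$ from the second-stage residuals $y - \hat X\hat\beta$ overstates the error variance, because those residuals contain $\hat v\beta$; the correct IV residuals use the original $X$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
z = rng.normal(size=n)
e = rng.normal(size=n)
x = z + 0.7 * e + rng.normal(size=n)
u = 0.7 * e + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)     # first-stage fitted values
b = np.linalg.solve(Xhat.T @ Xhat, Xhat.T @ y)   # 2SLS = IV coefficients

sigma2_wrong = np.sum((y - Xhat @ b) ** 2) / n   # second-stage residuals: u + vhat*beta
sigma2_right = np.sum((y - X @ b) ** 2) / n      # correct IV residuals: u only
# sigma2_wrong > sigma2_right: the generated regressor inflates the variance
```

Software 2SLS routines report standard errors built from `sigma2_right`, not the naive second-stage one.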
$$y = X\beta + u$$
We use the fact that $X = \hat X + \hat v$ and rewrite the model as
$$y = (\hat X + \hat v)\beta + u = \hat X\beta + \underbrace{(u + \hat v\beta)}_{\text{error in the 2nd stage of 2SLS}} = \hat X\beta + \varepsilon$$
which is the regression equation we use in the second stage of 2SLS. The variance of $\hat\beta_{2SLS}$ which we get from this regression (and which will appear in your output) is the inflated one given above, where the equality sign applies to the case where there is a perfect fit in the first stage (so that $var(\hat v) = 0$).
The OLS estimator for $y = \hat X\beta + u$ is:⁴
$$\hat\beta_{2SLS} = (\hat X'\hat X)^{-1}\hat X'y = [(\underbrace{Z(Z'Z)^{-1}Z'}_{P_Z}X)'(\underbrace{Z(Z'Z)^{-1}Z'}_{P_Z}X)]^{-1}(\underbrace{Z(Z'Z)^{-1}Z'}_{P_Z}X)'y$$
$$= (X'P_Z'P_ZX)^{-1}X'P_Z'P_Zy = (X'P_ZX)^{-1}X'P_Zy$$
4 The matrix $P_Z$ is symmetric and idempotent, hence $P_Z'P_Z = P_Z$.
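The projection-matrix form is straightforward to verify numerically. The sketch below (toy data, invented values) builds $P_Z = Z(Z'Z)^{-1}Z'$ explicitly and checks that $(X'P_ZX)^{-1}X'P_Zy$ coincides with the simple IV formula $(Z'X)^{-1}Z'y$ in the just-identified case:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2_000
z = rng.normal(size=n)
e = rng.normal(size=n)
x = z + 0.5 * e + rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * e + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)               # projection onto col(Z)

b_pz = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)   # (X'PZ X)^{-1} X'PZ y
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)             # (Z'X)^{-1} Z'y
# b_pz and b_iv coincide (up to floating point) when Z'X is square and invertible
```

In practice one never materializes the $n \times n$ matrix $P_Z$; it is shown here only to mirror the algebra.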
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u \qquad (1)$$
So, we can think of the 2SLS estimator as a Weighted Least Squares estimator, where $P_Z$ is a particular matrix of weights.
The weights are derived from the regression coefficients of the first-stage regression(s). Hence, we assign a larger weight to the instruments that are more strongly correlated with $X$.
Thus the optimal matrix $P_Z$ has a simple interpretation: it is the matrix that transforms the endogenous regressors $X$ into their predicted values from the first-stage regression.
This new matrix of regressors ($\tilde X = P_ZX$) satisfies the OLS assumptions because it is a linear combination of the instruments.