
Unit 12: Instrumental variables (IV) and 2SLS

C. Zulehner: Introductory Econometrics 1 / 32


Outline

1 Instrumental Variables
The IV Estimator
Identification under IV Estimation
Variance of the IV Estimator
Consistency of the IV estimator

2 The Power of IV: Examples


Addressing errors-in-variables with IV
Dealing with unobserved heterogeneity: An Example

3 The Two-Stage Least Squares Estimator



1. The Instrumental Variables (IV) Estimator
Our regression model is:

$$y_{N\times 1} = X_{N\times K}\,\beta_{K\times 1} + u_{N\times 1}$$

We suspect that $E(X'u) = \operatorname{plim}\left(\frac{1}{N}X'u\right) \neq 0 \Rightarrow \operatorname{plim}\hat\beta \neq \beta$; therefore, the OLS estimator is inconsistent
We have named four different causes for this endogeneity problem
In many cases, only some of the variables in X suffer from endogeneity
So partition $X = [X_1\ X_2]$ and define:
Exogenous: $\operatorname{plim}\left(\frac{X_1'u}{N}\right) = E(X_1'u) = 0$
Endogenous: $\operatorname{plim}\left(\frac{X_2'u}{N}\right) = E(X_2'u) \neq 0$



Endogeneity - Remarks

Our belief that $\operatorname{plim}(X_2'u/N) \neq 0$ is a suspicion, not a certainty
- We use economic reasoning, common sense, and intuition to form our suspicion
- We will never know for sure whether the suspicion is justified
- We cannot test whether $\operatorname{plim}(X_2'u/N) = 0$, as u is not observable
We look for an alternative estimator that can deliver consistent estimates in the presence of endogeneity
We will call this estimator the Instrumental Variable (IV) estimator



The Instruments

The key idea is to find some variables ($Z_{N\times P}$) which have two properties:
- They are uncorrelated with the error term u, hence
  $$\operatorname{plim}\left(\frac{1}{N}Z'u\right) = E(Z'u) = 0$$
- They are correlated with the potentially endogenous regressors X, hence
  $$\operatorname{plim}\left(\frac{1}{N}Z'X\right) = E(Z'X) = \Sigma_{ZX} \neq 0$$
We will call these variables instruments



1.1 The IV Estimator

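Let's start with the SLR model as usual. For $y = \beta_0 + \beta_1 x + u$, and given our assumptions,
$$\operatorname{Cov}(z, y) = \operatorname{Cov}(z, \beta_0 + \beta_1 x + u) = \beta_1 \operatorname{Cov}(z, x) + \operatorname{Cov}(z, u)$$
Since $\operatorname{Cov}(z, u) = 0$ and $\operatorname{Cov}(z, x) \neq 0$,
$$\beta_1 = \frac{\operatorname{Cov}(z, y)}{\operatorname{Cov}(z, x)}$$
Then the IV estimator for $\beta_1$ is obtained by replacing the population moments with the sample moments:
$$\hat\beta_1 = \frac{\sum_{i=1}^n (z_i - \bar z)(y_i - \bar y)}{\sum_{i=1}^n (z_i - \bar z)(x_i - \bar x)}$$
When z is equal to x, IV reduces to OLS, i.e. OLS is the particular case of IV in which the regressors are used as instruments for themselves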
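The sample formula can be checked on simulated data. The sketch below assumes a hypothetical data-generating process (the coefficients 0.8, 0.6 and the true $\beta_1 = 2$ are illustrative choices, not from the lecture) in which x is correlated with u while z is not. It computes $\hat\beta_1$ as the ratio of sample covariances and shows that using x as its own instrument gives back OLS.

```python
import numpy as np

def iv_slope(y, x, z):
    """Simple IV slope: sample Cov(z, y) divided by sample Cov(z, x)."""
    zc = z - z.mean()
    return np.sum(zc * (y - y.mean())) / np.sum(zc * (x - x.mean()))

def ols_slope(y, x):
    """OLS slope: the IV slope with x used as an instrument for itself."""
    return iv_slope(y, x, x)

# Hypothetical data-generating process (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                      # instrument, independent of u
u = rng.normal(size=n)                      # structural error
x = 0.8 * z + 0.6 * u + rng.normal(size=n)  # endogenous: Cov(x, u) = 0.6
y = 1.0 + 2.0 * x + u                       # true beta_1 = 2

print("OLS:", ols_slope(y, x))    # plim = 2 + Cov(x, u) / Var(x) = 2 + 0.6 / 2 = 2.3
print("IV: ", iv_slope(y, x, z))  # plim = 2
```

With this design OLS converges to 2.3 rather than 2, while the IV slope stays close to the true value.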



The IV Estimator - The general case

The key assumption (moment condition) that we want to exploit is
$$\operatorname{plim}\left(\frac{1}{N}Z'u\right) = E(Z'u) = 0$$
- Note that this is a property of the entire population
We can impose its sample analog:
$$\frac{Z'\hat u}{N} = 0 \Rightarrow Z'(y - X\hat\beta_{IV}) = 0 \Rightarrow Z'y = Z'X\hat\beta_{IV}$$
If $Z'X$ is invertible, we have:

Instrumental Variable (IV) estimator

$$\hat\beta_{IV} = (Z'X)^{-1}Z'y$$
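In matrix form the estimator is a single linear solve. A minimal NumPy sketch, assuming a hypothetical design with a constant, one endogenous regressor x and one instrument z (the constant serves as its own instrument); `iv_estimator` is an illustrative helper name, not a library function:

```python
import numpy as np

def iv_estimator(y, X, Z):
    """IV estimator (Z'X)^{-1} Z'y; requires P = K and an invertible Z'X."""
    # Solving the K x K system Z'X b = Z'y is more stable than inverting Z'X.
    return np.linalg.solve(Z.T @ X, Z.T @ y)

# Hypothetical data: one endogenous regressor x and one instrument z.
rng = np.random.default_rng(1)
n = 50_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.6 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])   # regressors: constant and x
Z = np.column_stack([np.ones(n), z])   # instruments: constant (for itself) and z (for x)

b_iv = iv_estimator(y, X, Z)
b_ols = iv_estimator(y, X, X)          # Z = X gives back OLS
print("IV: ", b_iv)
print("OLS:", b_ols)
```

With a constant in both X and Z, the slope in `b_iv` coincides with the covariance-ratio formula of the simple regression case.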



The IV Estimator - Weighted LS derivation
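The alternative strategy is to minimize the residual sum of squares, weighting observations with the instruments, i.e. find the $\hat\beta$ that minimizes:
$$\begin{aligned}
(Z'u)'(Z'u) = u'ZZ'u &= (y - X\beta)'ZZ'(y - X\beta) \\
&= y'ZZ'y - 2\beta'X'ZZ'y + \beta'X'ZZ'X\beta
\end{aligned}$$
The FOC are
$$\begin{aligned}
-2X'ZZ'y + 2X'ZZ'X\hat\beta_{IV} &= 0 \\
X'ZZ'X\hat\beta_{IV} &= X'ZZ'y \\
Z'X\hat\beta_{IV} &= Z'y \qquad \text{(premultiplying by } (X'Z)^{-1}\text{)} \\
\hat\beta_{IV} &= (Z'X)^{-1}Z'y
\end{aligned}$$
Replacing the weight matrix $ZZ'$ with the identity I gives back the OLS objective $u'u$. Minimizing this weighted residual sum of squares amounts to making the sample covariance between the instruments and the residuals, $Z'\hat u$, as close to zero as possible (exactly zero in the exactly identified case)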
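A quick numerical check of the derivation, as a sketch on hypothetical simulated data: solving the first-order conditions $X'ZZ'Xb = X'ZZ'y$ gives the same vector as $(Z'X)^{-1}Z'y$, and the criterion $(Z'u)'(Z'u)$ is numerically zero at that point when $P = K$. The helper `weighted_rss` is an illustrative name.

```python
import numpy as np

# Hypothetical data, exactly identified (P = K = 2).
rng = np.random.default_rng(2)
n = 1_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.6 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

def weighted_rss(b):
    """The criterion (Z'u)'(Z'u) = u'ZZ'u evaluated at coefficients b."""
    m = Z.T @ (y - X @ b)
    return m @ m

# Solve the first-order conditions X'ZZ'X b = X'ZZ'y directly.
b_foc = np.linalg.solve(X.T @ Z @ Z.T @ X, X.T @ Z @ Z.T @ y)
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)

print(np.allclose(b_foc, b_iv))   # same estimator
print(weighted_rss(b_iv))         # numerically zero: Z'u_hat = 0 when P = K
```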



Comments on IV - Exact Identification

If we set Z = X, we again get the OLS estimator

To get the IV estimator, $Z'_{P\times N}X_{N\times K}$ has to be invertible:
- Z must have the same number of variables as X, i.e. P = K
- The instruments must be linearly independent (Z must have full column rank) and correlated with X, so that $Z'X$ has full rank
In this case we have what is called exact identification. What does this mean?
If $X = [X_1, X_2]$, where $X_1$ are exogenous and $X_2$ are endogenous variables, then
- Z will include all $X_1$ variables as instruments for themselves
- We need to replace the $X_2$ variables with the same number of outside instruments $Z_2$

Exact Identification
# of exogenous instruments ($Z_2$) = # of endogenous variables ($X_2$)
$\Rightarrow Z = [X_1, Z_2]$
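The construction $Z = [X_1, Z_2]$ can be sketched as follows, assuming a hypothetical model with a constant, one exogenous regressor x1, one endogenous regressor x2 and one outside instrument z2. The code checks the two requirements listed above (P = K and full rank of $Z'X$) before solving.

```python
import numpy as np

# Hypothetical data: x1 exogenous, x2 endogenous, z2 an outside instrument for x2.
rng = np.random.default_rng(3)
n = 20_000
x1 = rng.normal(size=n)
z2 = rng.normal(size=n)
u = rng.normal(size=n)
x2 = 0.5 * x1 + 0.9 * z2 + 0.7 * u + rng.normal(size=n)
y = 1.0 + 1.5 * x1 - 1.0 * x2 + u

X1 = np.column_stack([np.ones(n), x1])   # exogenous block (constant and x1)
X = np.column_stack([X1, x2])            # X = [X1, X2]
Z = np.column_stack([X1, z2])            # Z = [X1, Z2]: X1 instruments itself

K, P = X.shape[1], Z.shape[1]
print("P == K:", P == K)                               # exact identification
print("rank(Z'X):", np.linalg.matrix_rank(Z.T @ X))   # must equal K for Z'X to be invertible
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
print(b_iv)                                            # close to [1.0, 1.5, -1.0]
```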



1.2 Identification under IV Estimation

The intuitive idea behind IV is as follows:
- X varies in response to u (because $\operatorname{plim}(X'u/N) \neq 0$)
- But X also varies in response to Z
- Z does not vary as u changes (because $\operatorname{plim}(Z'u/N) = 0$)
- Exploit the variation in X that is due to the variation in Z to identify the effect of X on y
You see again the centrality of the orthogonality condition $E(Z'u) = 0$, which replaces $E(X'u) = 0$
Therefore we solve the endogeneity problem by looking for variables (instruments) which are exogenous!
We will see that this can be very useful



More formal intuition: Identification
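The effect we want to measure is
$$\frac{\partial y_i}{\partial x_i} = \beta_i + \frac{\partial u_i}{\partial x_i}$$
If the assumptions of OLS are met ($\partial u_i / \partial x_i = 0$), we can identify the true parameter $\beta_i$; otherwise, we do not get identification
In the IV case we have
$$\frac{\partial y_i}{\partial z_i} = \beta_i \frac{\partial x_i}{\partial z_i} + \frac{\partial u_i}{\partial z_i}$$
If $z_i$ is a good instrument, $\partial u_i / \partial z_i = 0$, and then

IV Identification
$$\beta_i = \frac{\partial y_i / \partial z_i}{\partial x_i / \partial z_i}$$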
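The sample counterpart of this ratio is the slope of the reduced form (y regressed on z) divided by the slope of the first stage (x regressed on z), an approach often called indirect least squares. A sketch on hypothetical simulated data shows that this ratio equals the simple IV estimate exactly, because both slopes share the denominator $\sum (z_i - \bar z)^2$.

```python
import numpy as np

def ols_slope(y, x):
    """Slope coefficient from an OLS regression of y on a constant and x."""
    xc = x - x.mean()
    return np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)

# Hypothetical data-generating process.
rng = np.random.default_rng(4)
n = 10_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.6 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

reduced_form = ols_slope(y, z)   # sample analogue of dy/dz
first_stage = ols_slope(x, z)    # sample analogue of dx/dz
zc = z - z.mean()
iv = np.sum(zc * (y - y.mean())) / np.sum(zc * (x - x.mean()))

print(reduced_form / first_stage, iv)   # identical up to rounding
```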



1.3 Variance of the IV Estimator
The homoskedasticity assumption in this case is $E(u_i^2 \mid z_i) = Var(u_i) = \sigma^2$ or, more generally, $E(uu' \mid Z) = \sigma^2 I$
We can then derive the variance of the IV estimator (treating Z and X as given):¹
$$\begin{aligned}
Var(\hat\beta_{IV}) &= E[(\hat\beta_{IV} - \beta)(\hat\beta_{IV} - \beta)'] \\
&= E[((Z'X)^{-1}Z'u)((Z'X)^{-1}Z'u)'] = E[(Z'X)^{-1}Z'uu'Z(X'Z)^{-1}] \\
&= (Z'X)^{-1}Z'\underbrace{E[uu']}_{=\sigma^2 I}Z(X'Z)^{-1} = \sigma^2 (Z'X)^{-1}Z'Z(X'Z)^{-1}
\end{aligned}$$
This can be estimated by
$$\widehat{Var}(\hat\beta_{IV}) = \hat\sigma^2 (Z'X)^{-1}Z'Z(X'Z)^{-1}, \qquad \hat\sigma^2 = \frac{\hat u'\hat u}{N}, \quad \hat u = y - X\hat\beta_{IV}$$
A degrees-of-freedom correction (dividing by N − K instead of N) can be applied, but it is not relevant if we think in terms of asymptotic variance

¹ Since $\hat\beta_{IV} = (Z'X)^{-1}Z'y = \beta + (Z'X)^{-1}Z'u \Rightarrow \hat\beta_{IV} - \beta = (Z'X)^{-1}Z'u$
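The estimated variance translates directly into code. A minimal sketch, assuming homoskedastic errors and an exactly identified hypothetical design; the helper `iv_with_se` is an illustrative name. Note that the residuals are computed with the original X, and that passing Z = X reproduces the usual OLS variance $\hat\sigma^2(X'X)^{-1}$.

```python
import numpy as np

def iv_with_se(y, X, Z):
    """IV estimates with homoskedastic standard errors (exactly identified case).

    Var_hat = sigma2_hat (Z'X)^{-1} Z'Z (X'Z)^{-1}, where sigma2_hat = u_hat'u_hat / N
    and u_hat = y - X b_IV uses the ORIGINAL regressors X.
    """
    n = len(y)
    ZX_inv = np.linalg.inv(Z.T @ X)
    b = ZX_inv @ (Z.T @ y)
    u_hat = y - X @ b
    sigma2 = (u_hat @ u_hat) / n                 # no degrees-of-freedom correction
    V = sigma2 * ZX_inv @ (Z.T @ Z) @ ZX_inv.T   # ZX_inv.T equals (X'Z)^{-1}
    return b, np.sqrt(np.diag(V))

# Hypothetical data: x endogenous, z a valid instrument.
rng = np.random.default_rng(5)
n = 50_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.6 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

b_iv, se_iv = iv_with_se(y, X, Z)
b_ols, se_ols = iv_with_se(y, X, X)   # Z = X: reduces to OLS with sigma2 (X'X)^{-1}
print("IV: ", b_iv, se_iv)
print("OLS:", b_ols, se_ols)
```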



Remarks

When Z = X, the variance of the IV estimator is equal to the variance of the OLS estimator
In general, the IV estimator has a larger variance than the OLS estimator
We are not going to prove this for the general case, but we show in the simple case that
$$Var_{IV}(\hat\beta_1) = \frac{\sigma^2}{N\sigma_x^2\rho_{x,z}^2} = Var_{OLS}(\hat\beta_1)\,\frac{1}{R_{x,z}^2}$$
The IV variance differs from the OLS variance only by the factor $1/R_{x,z}^2$, where $R_{x,z}^2 = \rho_{x,z}^2$ is the $R^2$ from regressing x on z; the standard errors (the square roots of the variances) therefore differ by the factor $1/|\rho_{x,z}|$. Since $R_{x,z}^2 \le 1$, IV standard errors are at least as large as OLS standard errors
What is the intuition behind this result?
- To identify the effect of X on y, we are only using the part of the variability in X that is induced by the instrument
- Less variability in X means lower precision in the estimate of $\partial y / \partial X$
The stronger the correlation between Z and X, the smaller the IV standard errors



Variance of IV vs OLS - Formally
Suppose for simplicity that Z and X are mean-zero vectors (i.e., we have a single regressor and a single instrument). Thus
$$Z'X = X'Z = \sum_{i=1}^n z_i x_i = n\,cov(z, x), \qquad Z'Z = \sum_{i=1}^n z_i^2 = n\,var(z), \qquad X'X = \sum_{i=1}^n x_i^2 = n\,var(x)$$
thus:
$$\begin{aligned}
Var(\hat\beta_{IV}) &= \sigma_u^2 (Z'X)^{-1}Z'Z(X'Z)^{-1} = \frac{\sigma_u^2}{n}\,\frac{var(z)}{cov(x, z)^2} \\
&= \frac{\sigma_u^2}{n}\,\frac{1}{var(x)}\,\frac{var(x)\,var(z)}{cov(x, z)^2} = \frac{\sigma_u^2}{n}\,\frac{1}{var(x)}\,\frac{1}{\rho_{xz}^2} = Var(\hat\beta_{OLS})\,\frac{1}{\rho_{xz}^2}
\end{aligned}$$
where $\rho_{xz}$ is the correlation coefficient between x and z. Since $0 \le \rho_{xz}^2 \le 1$, it follows that $Var(\hat\beta_{IV}) \ge Var(\hat\beta_{OLS})$. Note that when we have instruments with low power, $\rho_{xz}^2 \to 0$ and $Var(\hat\beta_{IV}) \to \infty$. Thus very high standard errors of the IV estimates are an indication of instruments with low power.
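The factor $1/\rho_{xz}^2$ is an exact algebraic identity in the sample once x and z are demeaned, which a short sketch can confirm. The first-stage coefficient 0.5 and the common error variance $\sigma_u^2 = 1$ are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5_000
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)          # hypothetical first stage
z, x = z - z.mean(), x - x.mean()         # mean-zero, as assumed above

sigma2_u = 1.0                            # assumed error variance, common to both
var_iv = sigma2_u * (z @ z) / (z @ x) ** 2    # sigma^2 (Z'X)^{-1} Z'Z (X'Z)^{-1}
var_ols = sigma2_u / (x @ x)                  # sigma^2 (X'X)^{-1}
rho = np.corrcoef(x, z)[0, 1]

print(var_iv / var_ols, 1 / rho ** 2)     # the two ratios coincide
```

Shrinking the first-stage coefficient towards zero makes `rho` small and the ratio explode, which is the low-power case described above.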



1.4 Consistency of the IV estimator

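The IV estimator is:
$$\hat\beta_{IV} = (Z'X)^{-1}Z'y = (Z'X)^{-1}Z'(X\beta + u) = \beta + (Z'X)^{-1}Z'u = \beta + \left(\frac{Z'X}{N}\right)^{-1}\frac{Z'u}{N}$$
Taking the plim
$$\operatorname{plim}\hat\beta_{IV} = \beta + \underbrace{\operatorname{plim}\left(\frac{Z'X}{N}\right)^{-1}}_{=\Sigma_{ZX}^{-1}}\ \underbrace{\operatorname{plim}\left(\frac{Z'u}{N}\right)}_{=E(Z'u) = 0} = \beta + \Sigma_{ZX}^{-1}\times 0 = \beta$$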



Moreover, the variance of the IV estimator tends to zero as N → ∞
$$\lim_{N\to\infty} Var(\hat\beta_{IV}) = \lim_{N\to\infty}\frac{\sigma^2}{N}\underbrace{\left(\frac{1}{N}Z'X\right)^{-1}}_{\to\,\Sigma_{ZX}^{-1}}\underbrace{\left(\frac{1}{N}Z'Z\right)}_{\to\,\Sigma_{ZZ}}\underbrace{\left(\frac{1}{N}X'Z\right)^{-1}}_{\to\,\Sigma_{XZ}^{-1}} = 0$$
The instrumental variable (IV) estimator is consistent

Note: While IV is not efficient if $E(X'u) = 0$ (i.e. in the absence of endogeneity), IV is consistent both when $E(X'u) = 0$ and when $E(X'u) \neq 0$
This is the key idea that we will use to "test" for endogeneity (see next lecture)
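Consistency can be illustrated with a small simulation over increasing sample sizes. The sketch assumes a hypothetical data-generating process with one endogenous regressor and a valid instrument; `simulate` and `slope` are illustrative helper names. The OLS estimate settles on the wrong value, while the IV estimate settles on the true $\beta = 2$.

```python
import numpy as np

def simulate(n, rng, beta=2.0):
    """One draw from an assumed DGP with an endogenous x and a valid instrument z."""
    z = rng.normal(size=n)
    u = rng.normal(size=n)
    x = 0.8 * z + 0.6 * u + rng.normal(size=n)
    y = 1.0 + beta * x + u
    return y, x, z

def slope(y, x, w):
    """Cov(w, y) / Cov(w, x): IV with instrument w (OLS when w is x)."""
    wc = w - w.mean()
    return np.sum(wc * (y - y.mean())) / np.sum(wc * (x - x.mean()))

rng = np.random.default_rng(7)
for n in [100, 1_000, 10_000, 100_000, 1_000_000]:
    y, x, z = simulate(n, rng)
    print(f"N={n:>9,}  OLS={slope(y, x, x):.3f}  IV={slope(y, x, z):.3f}")
# In this design OLS approaches 2 + Cov(x, u) / Var(x) = 2.3, while IV approaches 2.
```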



2. The Power of IV: When and how

IV can address any case where the orthogonality condition fails

What should we do if we suspect one of the above causes? Should we rush to find an instrument?²
It depends on the nature of the failure of the orthogonality condition
It is better first to try to fix the problem by eliminating its cause
- If a variable is omitted, first try to obtain it or a proxy for it
- Unobserved heterogeneity can sometimes be dealt with using dummy variables plus some other source of variability
- If a variable is measured with error, try to obtain a better-measured variable
- Reverse causality (& sample selection): very hard to deal with by trying to eliminate the cause; relying on IV is unavoidable

² A very instructive reading on IV and identification is: Angrist, J. D. and A. B. Krueger, 2001, "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments," Journal of Economic Perspectives 15, 4, 69–85.



Why be so careful?

1 Finding instruments is difficult, mainly because we can never be sure that the variables we have proposed as instruments are actually good ones, i.e. that the exclusion restrictions hold. Remember that $\operatorname{plim}(Z'u/N) = 0$ is an act of faith
2 Moreover, as briefly mentioned above and discussed in detail in the next lecture, if the instruments are poorly correlated with X, IV can be very misleading: even worse than OLS!



2.1 Addressing errors-in-variables with IV

Remember the classical errors-in-variables problem where we observe $x_1$ instead of $x_1^*$
- where $x_1 = x_1^* + e_1$, and $e_1$ is uncorrelated with $x_1^*$ and $x_2, \dots, x_k$
If there is a z such that $Corr(z, u) = 0$ and $Corr(z, x_1) \neq 0$, then IV will remove the attenuation bias
- Notice that z can also be measured with error; what matters is that the error in the instrument is uncorrelated with the error in $x_1$

The use of checks in Italy

Last week, we saw the measurement-error bias when we perturbed the social capital variable (participation in referenda) with a random error
We use as an instrument a dummy for blood donation
- It is correlated with participation in referenda (social awareness)
- It should not be correlated with the error term in the use-of-checks equation, nor with the measurement error that we randomly generated
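The attenuation bias and its IV fix can be reproduced in a short simulation. This sketch does not use the blood-donation data; as a hypothetical instrument it uses a second noisy measurement of $x_1^*$ whose error is independent of $e_1$, which matches the sub-point above that the instrument may itself be mismeasured. All parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200_000
x_star = rng.normal(size=n)                # true regressor (unobserved)
u = rng.normal(size=n)
y = 0.5 + 1.0 * x_star + u                 # true beta_1 = 1

x1 = x_star + rng.normal(size=n)           # observed with error e1
z = x_star + rng.normal(size=n)            # instrument, measured with an independent error

def slope(y, x, w):
    """Cov(w, y) / Cov(w, x); OLS when w is x."""
    wc = w - w.mean()
    return np.sum(wc * (y - y.mean())) / np.sum(wc * (x - x.mean()))

print("OLS:", slope(y, x1, x1))  # attenuated towards Var(x*)/(Var(x*) + Var(e1)) = 0.5
print("IV: ", slope(y, x1, z))   # close to 1
```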



The use of checks in Italy
Dependent variable      Use of checks     Use of checks     Social capital    Use of checks
Specification           correct SC        SC w/ large error w/ large error    IV
Social capital          0.9440            0.0127                              1.3759
                        (5.785)**         (1.681)                             (4.266)**
Blood donation                                              1.9287
                                                            (4.530)**
Age                     -0.0034           -0.0033           0.0004            -0.0038
                        (-7.740)**        (-7.406)**        (2.285)**         (-7.140)**
Married                 0.0589            0.0511            0.0157            0.0336
                        (5.111)**         (3.930)**         (1.802)*          (2.413)**
Male                    0.0197            0.0192            -0.0127           0.0356
                        (2.110)**         (1.814)*          (-1.136)          (1.959)*
Years of education      0.0287            0.0283            -0.0001           0.0290
                        (15.852)**        (15.528)**        (-0.134)          (12.986)**
Household wealth        0.0447            0.0410            -0.0109           0.0598
                        (1.429)           (1.209)           (-0.791)          (1.645)
Household income        0.0057            0.0063            0.0005            0.0052
                        (14.645)**        (14.403)**        (2.343)**         (10.456)**
Judicial Efficiency     -0.0130           -0.0481           -0.0259           0.0036
                        (-1.138)          (-3.647)**        (-4.917)**        (0.258)
Constant                -0.4778           0.3845            0.8063            -0.8407
                        (-2.995)**        (4.890)**         (32.351)**        (-3.174)**
Number of observations  32442             32442             32442             32442
R-squared adjusted      0.277             0.263             0.015             .

Robust t-statistics clustered at the provincial level in parentheses



2.2 Dealing with unobserved heterogeneity

The use of checks in Italy


Social capital varies across provinces
One objection is that it may be capturing other variables that are specific to the province, that are omitted from the regression and that are correlated with social capital
One can try to address this issue by adding other controls that vary by province and that may matter for the regression (and are possibly correlated with social capital)
1 Judicial inefficiency
2 GDP per capita
But there could be others that one omits, either because one does not think of them or because they are not easily measurable



Dummies may solve the problem
The use of checks in Italy
One way to solve the problem is to add a set of province dummies:
- $P_j = 1$ if the household is located in province j, zero otherwise
- These dummies control for ANY variable that varies across provinces (but is constant over time!) and that may matter for the choice of using checks
However, the dummies absorb all the variation across provinces of residence. Hence the effects of all other variables that are province-specific and do not vary over time are not identified
Therefore, one has to find some other source of variation for variables such as social capital
- One can use the social capital of the province of origin
- This works because some households originally come from a different province than the one they now live in
- This requires the additional assumption that an individual's behavior depends not only on the social capital of the place where he or she lives but also on the social capital of the place where he or she was born



The use of checks in Italy
Dependent variable      Use of checks     Use of checks     Use of checks
Specification           No dummies        No dummies        Prov. dummies
Social capital          0.9440
                        (5.785)**
Social capital - origin                   0.6433            0.2450
                                          (4.602)**         (2.672)**
Age                     -0.0034           -0.0036           -0.0035
                        (-7.740)**        (-7.787)**        (-8.317)**
Married                 0.0589            0.0537            0.0657
                        (5.111)**         (4.415)**         (5.666)**
Male                    0.0197            0.0235            0.0286
                        (2.110)**         (2.286)**         (3.993)**
Years of education      0.0287            0.0283            0.0276
                        (15.852)**        (15.122)**        (13.480)**
Household wealth        0.0447            0.0331            0.0455
                        (1.429)           (1.010)           (1.535)
Household income        0.0057            0.0060            0.0053
                        (14.645)**        (15.104)**        (11.515)**
Judicial Efficiency     -0.0130           -0.0285
                        (-1.138)          (-2.249)**
Constant                -0.4778           -0.1648           0.0513
                        (-2.995)**        (-1.056)          (0.791)
Number of observations  32442             31961             31961
R-squared adjusted      0.277             0.273             0.300

Robust t-statistics clustered at the regional level in parentheses



3. The Two-Stage Least Squares Estimator

IV estimates are numerically equivalent to a two-stage procedure where
1 We run an OLS regression of the endogenous variables on the instruments (i.e. $X = Z\delta + \eta$) and save the predicted values $\hat X = Z\hat\delta$
2 We then regress y on $\hat X$, again by OLS:
$$y = \hat X\beta + u$$
In the exactly identified case, the Two-Stage Least Squares (2SLS) estimator $\hat\beta_{2SLS}$ is numerically identical to $\hat\beta_{IV}$ (see the derivation below)
Hence, the 2SLS estimator, being identical to the IV estimator, is also consistent
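The two stages can be run literally with least squares and compared with $(Z'X)^{-1}Z'y$. A minimal sketch on a hypothetical exactly identified design (constant, exogenous x1, endogenous x2, instrument z2). Every column of X is regressed on Z; the constant and x1 belong to Z, so their fitted values reproduce them exactly.

```python
import numpy as np

# Hypothetical data: x1 exogenous, x2 endogenous, z2 instrument for x2.
rng = np.random.default_rng(9)
n = 20_000
x1 = rng.normal(size=n)
z2 = rng.normal(size=n)
u = rng.normal(size=n)
x2 = 0.5 * x1 + 0.9 * z2 + 0.7 * u + rng.normal(size=n)
y = 1.0 + 1.5 * x1 - 1.0 * x2 + u

X = np.column_stack([np.ones(n), x1, x2])
Z = np.column_stack([np.ones(n), x1, z2])     # exactly identified: P = K = 3

# Stage 1: regress every column of X on Z and keep the fitted values.
delta_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)
X_hat = Z @ delta_hat                          # constant and x1 are reproduced exactly

# Stage 2: OLS of y on X_hat.
b_2sls, *_ = np.linalg.lstsq(X_hat, y, rcond=None)

b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
print(np.allclose(b_2sls, b_iv))
```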



2SLS and IV Estimator - Derivation
1 The predicted value from the regression $X = Z\delta + \eta$ is
$$\hat X = Z\hat\delta = Z\underbrace{(Z'Z)^{-1}Z'X}_{\text{OLS estimator of }\delta}$$
2 The OLS estimator for $y = \hat X\beta + u$ is³
$$\begin{aligned}
\hat\beta_{2SLS} &= (\hat X'\hat X)^{-1}\hat X'y \\
&= [(Z(Z'Z)^{-1}Z'X)'(Z(Z'Z)^{-1}Z'X)]^{-1}(Z(Z'Z)^{-1}Z'X)'y \\
&= [X'Z(Z'Z)^{-1}\underbrace{Z'Z(Z'Z)^{-1}}_{=I}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'y \\
&= (Z'X)^{-1}(Z'Z)\underbrace{(X'Z)^{-1}X'Z}_{=I}(Z'Z)^{-1}Z'y \\
&= (Z'X)^{-1}Z'y = \hat\beta_{IV}
\end{aligned}$$

³ In the exactly identified case, $X'Z$ and $Z'Z$ are square, invertible matrices, so $(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$



Standard Errors in 2SLS

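Although the estimates produced by 2SLS are the same as those produced by IV, the standard errors that you obtain from the second-stage OLS are not the correct standard errors of the IV estimates
In general they differ; under the simplification used in the derivation below (u and the first-stage residual $\hat\eta$ treated as uncorrelated), they are larger:
$$\begin{aligned}
Var(\hat\beta_{2SLS}) &= \sigma^2(Z'X)^{-1}Z'Z(X'Z)^{-1} + Var(\hat\eta\beta)(Z'X)^{-1}Z'Z(X'Z)^{-1} \\
&> \sigma^2(Z'X)^{-1}Z'Z(X'Z)^{-1} = Var(\hat\beta_{IV})
\end{aligned}$$
- The reason is that the variable used in the second stage as an explanatory variable ($\hat X$, which replaces X) is itself a random variable (a generated regressor): the second-stage error contains $\hat\eta\beta$ in addition to u, and this inflates the estimated variance
- Hence the standard errors need to be adjusted, by computing the residuals with X rather than $\hat X$
- As usual, you won't have to do it yourself: Stata does it for you!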
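The adjustment amounts to using the original X when computing the residuals. A sketch on hypothetical simulated data compares the naive standard errors from a manual second-stage OLS with the corrected IV standard errors; in this particular design the naive ones come out larger, as in the derivation, though the size of the gap depends on the data-generating process.

```python
import numpy as np

# Hypothetical data: x endogenous, z a valid instrument (assumed DGP).
rng = np.random.default_rng(10)
n = 20_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.6 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

# Manual two-step procedure.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
b, *_ = np.linalg.lstsq(X_hat, y, rcond=None)      # second-stage OLS = IV estimate
XtX_hat_inv = np.linalg.inv(X_hat.T @ X_hat)

# Naive: residuals from the second-stage regression, y - X_hat b.
e_naive = y - X_hat @ b
se_naive = np.sqrt(np.diag(e_naive @ e_naive / n * XtX_hat_inv))

# Corrected: same formula, but residuals built with the ORIGINAL X.
u_hat = y - X @ b
se_iv = np.sqrt(np.diag(u_hat @ u_hat / n * XtX_hat_inv))

print("naive second-stage SE:", se_naive)
print("corrected IV SE:      ", se_iv)
```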



Derivation of the Variance of 2SLS
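Now, from
$$y = X\beta + u$$
we use the fact that $X = \hat X + \hat v$, where $\hat v$ is the first-stage residual, and rewrite the model as
$$y = (\hat X + \hat v)\beta + u = \hat X\beta + \underbrace{(u + \hat v\beta)}_{\text{error in the second stage of 2SLS}} = \hat X\beta + \varepsilon$$
which is the regression equation we use in the second stage of 2SLS. The variance of $\hat\beta_{2SLS}$ that we get from this regression (and which will appear in your output) is, treating u and $\hat v\beta$ as uncorrelated,
$$\begin{aligned}
Var(\hat\beta_{2SLS}) &= \sigma_\varepsilon^2(\hat X'\hat X)^{-1} = \sigma_\varepsilon^2(Z'X)^{-1}Z'Z(X'Z)^{-1} \\
&= \underbrace{\sigma_u^2(Z'X)^{-1}Z'Z(X'Z)^{-1}}_{Var(\hat\beta_{IV})} + var(\hat v\beta)(Z'X)^{-1}Z'Z(X'Z)^{-1} \ge Var(\hat\beta_{IV})
\end{aligned}$$
where the equality holds when there is a perfect fit in the first stage (so that $var(\hat v) = 0$).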



Overidentification

If we have more instruments than endogenous variables, the model is overidentified
- In practice it is good to have more instruments than strictly needed, because this increases the precision of the estimates
- But be careful! (see below and next lecture)
In case of overidentification, there are several estimates of the “structural parameters” that we can obtain
- If we have one endogenous variable and two instruments, we can obtain one IV estimate using the first instrument and another IV estimate using the second
- We can even think of several possible combinations of the two instruments
What should we then do?
- Disregarding instruments (that is, disregarding identifying restrictions) sounds inefficient
- For this reason we will use all instruments but give more weight to the “better” ones
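The situation described above can be simulated: with one endogenous regressor and two instruments, each instrument alone yields its own just-identified estimate, while 2SLS combines both. The sketch assumes a hypothetical design in which z1 is a strong and z2 a weaker instrument; `iv_just` and `tsls` are illustrative helper names, and `tsls` computes $(X'P_ZX)^{-1}X'P_Zy$ through the first-stage fitted values.

```python
import numpy as np

def iv_just(y, x, z):
    """Just-identified IV slope using a single instrument z."""
    zc = z - z.mean()
    return np.sum(zc * (y - y.mean())) / np.sum(zc * (x - x.mean()))

def tsls(y, X, Z):
    """2SLS: (X'P_Z X)^{-1} X'P_Z y, computed via first-stage fitted values."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # X_hat = P_Z X
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

rng = np.random.default_rng(11)
n = 5_000
z1, z2, u = rng.normal(size=(3, n))
x = 1.0 * z1 + 0.3 * z2 + 0.6 * u + rng.normal(size=n)   # z1 strong, z2 weaker
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])     # overidentified: P = 3 > K = 2
print("IV with z1 only:", iv_just(y, x, z1))
print("IV with z2 only:", iv_just(y, x, z2))
print("2SLS with both: ", tsls(y, X, Z)[1])
```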



Overidentification and IV Estimator

In case of overidentification, we can therefore think of a matrix $P_Z$ that gives more weight to the more “informative” instruments
The IV estimate is then obtained by minimizing the weighted residual sum of squares, i.e. by finding the $\hat\beta_{IV}$ that minimizes:
$$u'P_Z u = (y - X\beta)'P_Z(y - X\beta)$$
The FOC is:
$$-2X'P_Z y + 2X'P_Z X\hat\beta_{IV} = 0 \Rightarrow \hat\beta_{IV} = (X'P_Z X)^{-1}X'P_Z y$$
But where should the weights come from?

The 2SLS estimator gives us a clever way of assigning these weights



2SLS Estimator - Another notation
The predicted value from the regression $X = Z\delta + \eta$ is:
$$\hat X = Z\hat\delta = Z(Z'Z)^{-1}Z'X = P_Z X$$
The OLS estimator for $y = \hat X\beta + u$ is:⁴
$$\begin{aligned}
\hat\beta_{2SLS} &= (\hat X'\hat X)^{-1}\hat X'y \\
&= [(\underbrace{Z(Z'Z)^{-1}Z'}_{P_Z}X)'(\underbrace{Z(Z'Z)^{-1}Z'}_{P_Z}X)]^{-1}(\underbrace{Z(Z'Z)^{-1}Z'}_{P_Z}X)'y \\
&= (X'P_Z'P_Z X)^{-1}X'P_Z'y \\
&= (X'P_Z X)^{-1}X'P_Z y
\end{aligned}$$
The matrix $P_Z$ is also known as a “projection” matrix: since we have more instruments than endogenous variables, we have to “reduce” their dimension, i.e. we project the endogenous variables onto the space spanned by the instruments, which leaves exactly as many constructed regressors as there are columns of X

⁴ The matrix $P_Z$ is symmetric and idempotent, hence $P_Z' = P_Z$ and $P_Z'P_Z = P_Z$
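The properties of $P_Z$ used in this derivation can be checked numerically on a small hypothetical sample (kept small because $P_Z$ is an N×N matrix). The sketch confirms that $P_Z$ is symmetric and idempotent and that $(X'P_ZX)^{-1}X'P_Zy$ equals the second-stage OLS on $\hat X = P_ZX$.

```python
import numpy as np

rng = np.random.default_rng(12)
n = 500                                       # small n: P_Z is an n x n matrix
z1, z2, u = rng.normal(size=(3, n))
x = z1 + 0.5 * z2 + 0.6 * u + rng.normal(size=n)   # hypothetical first stage
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])

P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)       # Z (Z'Z)^{-1} Z'
print(np.allclose(P_Z, P_Z.T), np.allclose(P_Z @ P_Z, P_Z))   # symmetric, idempotent

b_proj = np.linalg.solve(X.T @ P_Z @ X, X.T @ P_Z @ y)   # (X'P_Z X)^{-1} X'P_Z y
X_hat = P_Z @ X                                          # first-stage fitted values
b_two_step = np.linalg.lstsq(X_hat, y, rcond=None)[0]    # second-stage OLS
print(np.allclose(b_proj, b_two_step))
```

In applied work $P_Z$ is never formed explicitly for large N; computing $\hat X$ by least squares, as in the two-step version, gives the same result at a fraction of the memory cost.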



What is 2SLS doing?
Imagine we have the “structural” model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u \qquad (1)$$
- We have two potentially endogenous variables $x_1$ and $x_2$ and one exogenous variable $x_3$
- We therefore need two instruments $z_1$ and $z_2$
2SLS uses all available instruments in the best way: it calculates weights for a linear combination of all the exogenous variables by running a regression!
In our example, we run two first-stage regressions, and in each of them we use a linear combination of all exogenous variables and instruments:
$$x_1 = \delta_0 + \delta_1 z_1 + \delta_2 z_2 + \delta_3 x_3 + \eta_1$$
$$x_2 = \alpha_0 + \alpha_1 z_1 + \alpha_2 z_2 + \alpha_3 x_3 + \eta_2$$
In the second stage, we use $\hat x_1$ and $\hat x_2$ instead of $x_1$ and $x_2$ and obtain the consistent estimates $\hat\beta_1$ and $\hat\beta_2$
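The example can be run end to end on simulated data. The sketch assumes a hypothetical data-generating process for equation (1) with true coefficients (1, 2, −1, 0.5); both first stages use all exogenous variables (constant, $z_1$, $z_2$, $x_3$), exactly as written above, and the second stage replaces $x_1$, $x_2$ by their fitted values.

```python
import numpy as np

rng = np.random.default_rng(13)
n = 50_000
z1, z2, x3, u = rng.normal(size=(4, n))
# Hypothetical DGP: x1 and x2 are endogenous (they load on u), x3 is exogenous.
x1 = 0.8 * z1 + 0.2 * z2 + 0.3 * x3 + 0.5 * u + rng.normal(size=n)
x2 = 0.1 * z1 + 0.9 * z2 - 0.4 * x3 - 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + u       # equation (1)

const = np.ones(n)
W = np.column_stack([const, z1, z2, x3])            # all exogenous variables

# First stages: regress x1 and x2 on ALL exogenous variables.
delta_hat = np.linalg.lstsq(W, x1, rcond=None)[0]
alpha_hat = np.linalg.lstsq(W, x2, rcond=None)[0]
x1_hat, x2_hat = W @ delta_hat, W @ alpha_hat

# Second stage: replace x1 and x2 by their fitted values.
X_2nd = np.column_stack([const, x1_hat, x2_hat, x3])
beta_hat = np.linalg.lstsq(X_2nd, y, rcond=None)[0]
print(beta_hat)   # approximately [1.0, 2.0, -1.0, 0.5]
```

Leaving $x_3$ out of the first stages would break the orthogonality of the second-stage residuals with $x_3$, which is why every exogenous variable appears in both first-stage regressions.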



Overidentification and 2SLS Estimator

So, we can think of the 2SLS estimator as a weighted least squares estimator in which $P_Z$ is a particular matrix of weights
The weights are derived from the regression coefficients of the first-stage regression(s)
Hence, we assign a larger weight to the instruments that are more strongly correlated with X
Thus the optimal matrix $P_Z$ has a simple interpretation: it is the matrix that transforms the endogenous regressors X into their predicted values from the first-stage regression
This new matrix of regressors ($\hat X = P_Z X$) satisfies the OLS orthogonality condition ($\operatorname{plim}(\hat X'u/N) = 0$) because it is a linear combination of the instruments
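One way to see the weighting at work, sketched on hypothetical data: the first-stage coefficients are the weights that combine the instruments into a single constructed instrument $\hat x$, the stronger instrument receives the larger weight, and a just-identified IV that uses $\hat x$ as its only instrument reproduces the 2SLS slope exactly.

```python
import numpy as np

rng = np.random.default_rng(14)
n = 20_000
z1, z2, u = rng.normal(size=(3, n))
x = 1.0 * z1 + 0.3 * z2 + 0.6 * u + rng.normal(size=n)   # z1 is the stronger instrument
y = 1.0 + 2.0 * x + u

Z = np.column_stack([np.ones(n), z1, z2])
delta_hat = np.linalg.lstsq(Z, x, rcond=None)[0]   # first-stage weights on (1, z1, z2)
x_hat = Z @ delta_hat                              # the single "combined" instrument
print("first-stage weights:", delta_hat)           # weight on z1 exceeds weight on z2

# Just-identified IV with x_hat as the instrument reproduces the 2SLS slope.
xc = x_hat - x_hat.mean()
b_combined = np.sum(xc * (y - y.mean())) / np.sum(xc * (x - x.mean()))

X = np.column_stack([np.ones(n), x])
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)[1]
print(b_combined, b_2sls)
```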

