
Handout 8: Instrumental variable I

Yichong Zhang1

1 School of Economics

Singapore Management University


Threats to Internal Validity

I Internal validity:
I A statistical analysis is internally valid if the statistical
inferences about causal effects are valid for the population
being studied.
I We know that internal validity hinges on two things:
1. The estimator of the causal effect should be consistent (and
ideally unbiased)
2. Hypothesis tests should have the desired significance level (i.e.
you should be using the correct standard errors)
Threats to Internal Validity

I We will continue to focus on threats to consistency


I In the context of simple univariate OLS, we know that:
$$\hat\beta_1 = \beta_1 + \frac{s_{X,U}}{s_X^2} \xrightarrow{p} \beta_1 + \frac{\sigma_{X,U}}{\sigma_X^2}.$$

I We know that $\hat\beta_1$ will be inconsistent if $\sigma_{X,U} \neq 0$.


I So when might this occur?
Some threats to internal validity

I Measurement error in the regressors


I Omitted (and unobserved) variables
I Simultaneous causality
I X causes Y , but Y in turn causes X
I Today we’ll discuss Instrumental Variables (IV), a method
that can help to address these problems
Simultaneous causality

I Consider the test scores and student-teacher ratio example.


I We assumed that there was a causal relationship running from
STR to TestScore (i.e. that lower STR’s caused higher
TestScore through a better learning environment).
I But what if the school board responds to low average test
scores by hiring more teachers (i.e., to lower STR) for those
school districts?

STRi = γ0 + γ1 TestScorei + vi

I Then the causality runs both ways. But why is this a problem?
I It leads to correlation between STR and the error term. Let’s
see why.
Simultaneous causality

I Suppose there’s an omitted factor that leads to low TestScore


but is “not directly correlated” with STR.
I Because of the school board’s actions, it will also lead to a
decrease in STR.
I In particular, a negative error (ui ) in

TestScorei = β 0 + β 1 STRi + ui

reduces TestScore, which then reduces STR (because more


teachers are hired), so there will be a positive correlation
between STR and u, which violates OLS Assumption 1.
I The problem that some unobserved factor affects both the
dependent variable and some of the regressors is called an
endogeneity problem
I here u has a causal effect on both TestScore and STR
Simultaneous causality

I The canonical example of simultaneous causality concerns


demand (or supply) estimation.
I Suppose we are interested in estimating the demand for a
product (say beer).
I The demand function relates the quantity demanded to price

Qi = β 0 + β 1 Pi + ui , β1 < 0

where the error term ui captures unobserved demand factors


like income and tastes.
I So why not use OLS to estimate the demand function?
Simultaneous causality: Perfectly Competitive Case
I There’s also a supply function that relates the quantity firms
are willing to supply at a given price

Qi = α0 + α1 Pi + α2 Zi + vi , α1 > 0

where the error term vi captures unobserved supply shifters.


I The only thing distinguishing the two functions is Zi , which
might be a supply factor like the price of hops (an input to
making beer).
I The equilibrium P and Q are given by the intersection of these
two relations, making it impossible to estimate the demand
(or supply) equation by simply regressing Q on P (and Z ).
I The reason is that P will be correlated with u, for the same
reason as the class size example discussed above.
I Let’s see how it works in this setting.
Simultaneous causality
I The demand and supply functions

Demand : Qi = β 0 + β 1 Pi + ui , β1 < 0
Supply : Qi = α0 + α1 Pi + α2 Zi + vi , α1 > 0

are called the structural equations since each can be derived


from economic theory and has a causal interpretation.
I The variables Qi and Pi are called endogenous variables
(since they are determined inside the system), while Zi is
called an exogenous variable (since it is assumed to be
determined outside the system).
I Through clever substitution, we can solve for the endogenous
variables in terms of the exogenous variable Zi (and the
remaining parameters).
I This will illustrate why we have an endogeneity problem, and
suggest how we will obtain a solution to it.
Simultaneous causality (Cont’d)

I In particular, if we set the two equations equal, solve for Pi , and then plug this Pi into the demand equation to get Qi , we obtain the following reduced form equations

$$P_i = \frac{\alpha_0 - \beta_0}{\beta_1 - \alpha_1} + \frac{\alpha_2}{\beta_1 - \alpha_1} Z_i + \frac{v_i - u_i}{\beta_1 - \alpha_1}$$
$$Q_i = \frac{\beta_1 \alpha_0 - \beta_0 \alpha_1}{\beta_1 - \alpha_1} + \frac{\beta_1 \alpha_2}{\beta_1 - \alpha_1} Z_i + \frac{\beta_1 v_i - \alpha_1 u_i}{\beta_1 - \alpha_1}$$

which show how Pi is correlated with the error terms in both the structural supply and demand equations.
I Reduced form equations only have exogenous variables and
parameters on the right-hand side.
I This is direct evidence of the endogeneity problem caused by
simultaneous causality.
Simultaneous causality (Cont’d)
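I However, by re-writing the reduced form equations

$$P_i = \frac{\alpha_0 - \beta_0}{\beta_1 - \alpha_1} + \frac{\alpha_2}{\beta_1 - \alpha_1} Z_i + \frac{v_i - u_i}{\beta_1 - \alpha_1}$$
$$Q_i = \frac{\beta_1 \alpha_0 - \beta_0 \alpha_1}{\beta_1 - \alpha_1} + \frac{\beta_1 \alpha_2}{\beta_1 - \alpha_1} Z_i + \frac{\beta_1 v_i - \alpha_1 u_i}{\beta_1 - \alpha_1}$$

more compactly as

$$P_i = \pi_{10} + \pi_{11} Z_i + \varepsilon_{1i}$$
$$Q_i = \pi_{20} + \pi_{21} Z_i + \varepsilon_{2i}$$

we can see an easy way to estimate β1 : $\pi_{21} = \beta_1 \pi_{11}$, so

$$\beta_1 = \frac{\pi_{21}}{\pi_{11}}$$

I So if we have consistent estimates of π21 and π11 we can get a consistent estimate of β1 .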
Simultaneous causality (Cont’d)
I Since Zi is exogenous (by assumption), we can just use OLS
to estimate the πs.
I Using the univariate OLS formulae from Chapter 4, the OLS estimates of π21 and π11 are simply $\hat\pi_{21} = s_{ZQ}/s_Z^2$ and $\hat\pi_{11} = s_{ZP}/s_Z^2$, so

$$\hat\beta_1 = \frac{\hat\pi_{21}}{\hat\pi_{11}} = \frac{s_{ZQ}}{s_{ZP}}$$

I This estimation approach is called Indirect Least Squares (ILS) and it is a special case of Instrumental Variables (IV) estimation.
I However, from the reduced form equations, it is clear that there is no way to “indirectly” estimate α1 , so we can’t estimate the supply equation.
I We will see why in a bit, but first let’s see how we can use IV
to address endogeneity problems more generally.
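As a rough illustration of the ILS idea, the sketch below simulates the beer market from the structural equations and compares OLS of Q on P with the ratio s_ZQ / s_ZP. All parameter values, the sample size, and the helper name `cov` are assumptions chosen only for this example; they are not taken from the handout.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical structural parameters (illustrative values only)
beta0, beta1 = 10.0, -1.0                  # demand: Q = beta0 + beta1*P + u
alpha0, alpha1, alpha2 = 2.0, 1.0, -1.0    # supply: Q = alpha0 + alpha1*P + alpha2*Z + v

Z = rng.normal(size=n)   # exogenous supply shifter (e.g. price of hops)
u = rng.normal(size=n)   # unobserved demand shock
v = rng.normal(size=n)   # unobserved supply shock

# Equilibrium price from the reduced form, then quantity from the demand curve
P = (alpha0 - beta0 + alpha2 * Z + v - u) / (beta1 - alpha1)
Q = beta0 + beta1 * P + u

def cov(a, b):
    """Sample covariance (1/n version; the scaling cancels in the ratios)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

beta1_ols = cov(P, Q) / cov(P, P)   # inconsistent: P is correlated with u
beta1_ils = cov(Z, Q) / cov(Z, P)   # pi21_hat / pi11_hat

print(f"true beta1 = {beta1}, OLS = {beta1_ols:.3f}, ILS = {beta1_ils:.3f}")
```

In this design the OLS slope settles well away from the true demand slope, while the ILS ratio stays close to it.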
Instrumental Variables
I Instrumental Variables (IV) is a general way to obtain a
consistent estimator of the unknown coefficients of the
population regression function

Yi = β 0 + β 1 Xi + ui

when X is correlated with u.


I What’s the intuition?
I Think of X as having two parts, one that is correlated with u
and one that isn’t.
I IV isolates the part of X which is uncorrelated with u (the
“good” part) allowing us to disregard the variations in X that
bias the OLS estimates (the “bad” part).
I How? By using other variables (instruments) that are
correlated with Y only through X , and are uncorrelated with u.
I In our example the price of hops affects the price of beer, and so affects the quantity of beer demanded, but it is assumed not to be correlated with the unobserved demand shock for beer.
Instrumental Variables (Cont’d)

I A valid instrument Zi must satisfy two conditions:


1. Instrument relevance

Cov (Zi , Xi ) ≠ 0 (Usually easy to satisfy)

2. Instrument exogeneity

Cov (Zi , ui ) = 0 (Usually hard to satisfy)

IV–some examples

I Education
ln wage = β 0 + β 1 educ + u

I Problem : ability is in u and affects education.


I Possible IVs?
I Distance to nearest college, tuition subsidies, construction of
new schools...
IV – some examples, cont’d

I Labor supply consequences of child-bearing:

labor supply = β 0 + β 1 third child birth + u

I Problem: ability, income, and education are in u and affect the decision to have children.
I Possible IVs
I Sex composition of the first two children as IV. If they are the same sex, the parents are more likely to have a third child. Sex composition is assumed to be independent of ability.
The Two Stage Least Squares (2SLS) Estimator

I The most popular method of IV is 2SLS.


I How does 2SLS work?
I Suppose that
Yi = β 0 + β 1 Xi + ui (1)
where Cov (Xi , ui ) ≠ 0 (so we violate OLS Assumption 1).
I The 2SLS estimator can be calculated in two stages.
I The first stage decomposes X into two components (both of
which are correlated with Y ):
1. A problematic component that may be correlated with u, and
2. A problem free component that varies because of the
instrument and is uncorrelated with u.
I The second stage uses the problem-free component to
estimate β 1 .
2SLS (Cont’d)

I Specifically, the first stage assumes there is a population


regression linking X and the instrument Z :

Xi = π0 + π1 Zi + vi (2)

where we assume that

I Cov (Zi , ui ) = 0 (exogeneity), and
I Cov (Zi , Xi ) ≠ 0 (relevance, i.e. π1 ≠ 0).
I Note that this is a “reduced form” equation like the ones we derived above.
2SLS (Cont’d)

I The first stage decomposes Xi into two components:


1. The problematic component: vi (the variation in X not
explained by Z ).
2. The problem free component: π0 + π1 Zi .
I In the second stage we use the problem free component to
estimate:
Yi = β 0 + β 1 Xi + ui
by replacing Xi with π0 + π1 Zi + vi .

Yi = β 0 + β 1 (π0 + π1 Zi ) + ( β 1 vi + ui ).
2SLS procedures

I First stage:
X = π0 + π1 Z + V .
I Regress X on Z and obtain the OLS estimators π̂0 and π̂1 .
I Second stage:

Yi = β 0 + β 1 Xi + ui
= β 0 + β 1 (π0 + π1 Zi ) + ( β 1 vi + ui )

I Regress Y on X̂ where X̂ = π̂0 + π̂1 Z .
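The two stages can be sketched numerically as follows. This is a minimal illustration on simulated data, not the handout's own code: the data-generating process, the true slope of 3, and the helper `ols_fit` are all assumptions made for the example.

```python
import numpy as np

def ols_fit(x, y):
    """Return (intercept, slope) from a univariate OLS regression of y on x."""
    slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y.mean() - slope * x.mean(), slope

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical data: V is correlated with U, so X is endogenous
Z = rng.normal(size=n)
U = rng.normal(size=n)
V = 0.8 * U + rng.normal(size=n)
X = 1.0 + 0.5 * Z + V
Y = 2.0 + 3.0 * X + U            # true beta1 = 3

# First stage: regress X on Z and form the fitted values
pi0_hat, pi1_hat = ols_fit(Z, X)
X_hat = pi0_hat + pi1_hat * Z

# Second stage: regress Y on the fitted values
_, beta1_2sls = ols_fit(X_hat, Y)

# Naive OLS of Y on X, for comparison
_, beta1_ols = ols_fit(X, Y)

print(f"2SLS slope = {beta1_2sls:.3f}, OLS slope = {beta1_ols:.3f}")
```

Note that the standard errors printed by an ordinary second-stage OLS routine would not be valid here, because they ignore that X̂ is itself estimated.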


Do we need E (V |Z ) = 0 for the first stage?

I No. E (V |Z ) = 0 is needed for a causal interpretation of π1 .


I OLS in general decomposes the dependent variable into two parts: one part that can be explained by the regressors and one part that cannot.
I What does OLS calculate without correct specification?
I $\hat\pi_1 = \frac{s_{XZ}}{s_Z^2} \xrightarrow{p} \pi_1 = \frac{Cov(X,Z)}{\sigma_Z^2}$,
$\hat\pi_0 = \bar X - \hat\pi_1 \bar Z \xrightarrow{p} \pi_0 = E(X) - \frac{Cov(X,Z)}{\sigma_Z^2} E(Z)$.
I Just let V be X − π0 − π1 Z = X − E (X ) − π1 (Z − E (Z )).
With this V , by construction, we have

X = π0 + π1 Z + V .

I Double check: Cov (V , Z ) = Cov (X , Z ) − π1 Cov (Z , Z ) = 0.


(Note, we do not assume E (V |Z ) = 0.)
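A small numerical check of this point, using sample analogues of π0 and π1: the sketch assumes a deliberately nonlinear relation between X and Z (an illustrative choice, not from the handout), so the linear first stage is misspecified and E(V|Z) ≠ 0, yet the constructed V is still uncorrelated with Z.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

Z = rng.normal(size=n)
X = Z**2 + Z + rng.normal(size=n)   # hypothetical nonlinear relation between X and Z

# Linear projection coefficients (sample versions of pi1 and pi0)
pi1 = np.cov(X, Z, ddof=1)[0, 1] / np.var(Z, ddof=1)
pi0 = X.mean() - pi1 * Z.mean()
V = X - pi0 - pi1 * Z

# Cov(V, Z) is zero by construction, up to floating-point rounding
cov_VZ = np.cov(V, Z, ddof=1)[0, 1]

# But the mean of V among observations with large |Z| is far from zero,
# so E(V | Z) = 0 fails
mean_V_large_Z = V[np.abs(Z) > 1.5].mean()

print(f"Cov(V, Z) = {cov_VZ:.2e}, mean of V when |Z| > 1.5 = {mean_V_large_Z:.2f}")
```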
Why is the second stage of 2SLS consistent?

Y = β 0 + β 1 ( π0 + π1 Z ) + ( β 1 V + U )

I Pretend for the moment that we know π0 and π1 .


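I If I regress Y on π0 + π1 Z , I obtain the OLS estimator β̂1 of β1 :

$$\hat\beta_1 = \frac{S_{\pi_0+\pi_1 Z,\,Y}}{S^2_{\pi_0+\pi_1 Z}} \xrightarrow{p} \frac{\sigma_{\pi_0+\pi_1 Z,\,Y}}{\sigma^2_{\pi_0+\pi_1 Z}} = \beta_1 + \frac{\sigma_{\pi_0+\pi_1 Z,\,\beta_1 V+U}}{\sigma^2_{\pi_0+\pi_1 Z}} = \beta_1 .$$

I To see the last equality,

$$\sigma_{\pi_0+\pi_1 Z,\,\beta_1 V+U} = \sigma_{\pi_1 Z,\,\beta_1 V+U} = \pi_1 \beta_1 \sigma_{Z,V} + \pi_1 \sigma_{Z,U} = 0 .$$

I $\sigma_{Z,V} = 0$ by first stage OLS. $\sigma_{Z,U} = 0$ by exogeneity of the instrument.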
I We do not know π0 and π1 . Replace by their OLS estimators
(π̂0 , π̂1 ) from the first stage.
Intuition behind 2SLS

Y = β 0 + β 1 X + U, X = π0 + π1 Z + V ,
Y = β 0 + β 1 ( π0 + π1 Z ) + ( β 1 V + U )
I Predicted values represent variation in X that is “good” in
that it is driven only by factors (Z ) that are uncorrelated with
U.
I Specifically, the predicted value is a linear function of Z , which is uncorrelated with U.
I Why not just use the regressors that are exogenous? Why do we need Z ?
I Answer: we cannot rely only on the exogenous regressors already in the equation. If the predicted value of an endogenous X were built from them alone, it would be perfectly collinear with them in the second stage.
Consistency of 2SLS

I So how do we know that 2SLS yields a consistent estimate of


β1 ?
I Let’s work through the math.
I Suppose we are interested in estimating

Yi = β 0 + β 1 Xi + ui (1)

where E [ui | Xi ] ≠ 0, but we have a valid instrument Z .


I Recall that 2SLS takes place in two stages
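I First stage: Estimate

$$\hat X_i = \hat\pi_0 + \hat\pi_1 Z_i$$

I Second stage: Estimate the slope β1 by regressing Yi on $\hat X_i$, so that

$$\hat\beta_1 = \frac{s_{\hat X,Y}}{s^2_{\hat X}} = \frac{s_{\hat\pi_0+\hat\pi_1 Z,\,Y}}{s^2_{\hat\pi_0+\hat\pi_1 Z}} = \frac{\hat\pi_1 s_{Z,Y}}{\hat\pi_1^2 s_Z^2}$$

I Further note $\hat\pi_1 = s_{X,Z}/s_Z^2$. Therefore,

$$\hat\beta_1 = \frac{\hat\pi_1 s_{Z,Y}}{\hat\pi_1^2 s_Z^2} = \frac{s_{Z,Y}}{\hat\pi_1 s_Z^2} = \frac{s_{Z,Y}}{s_{Z,X}} .$$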
Consistency of 2SLS (Cont’d)

Note 2SLS may be biased, so we will only look at consistency.


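I Note Y = β0 + β1 X + U, so

$$\hat\beta_1 = \frac{s_{Z,Y}}{s_{Z,X}} \xrightarrow{p} \frac{\sigma_{Z,Y}}{\sigma_{Z,X}} = \frac{\sigma_{Z,\,\beta_0+\beta_1 X+U}}{\sigma_{Z,X}} = \beta_1 + \frac{\sigma_{Z,U}}{\sigma_{Z,X}} .$$

I By IV exogeneity: Cov (Z , U ) = 0. Therefore,

$$\hat\beta_1 \xrightarrow{p} \beta_1 .$$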
2SLS (Cont’d)

I Now let’s look back at our supply and demand example.


I Recall that we were interested in estimating the demand
equation
Qi = β 0 + β 1 Pi + ui
but had an endogeneity problem, since prices and quantities
are determined by the intersection of demand with supply

Qi = α0 + α1 Pi + α2 Zi + vi

I However, the supply equation now provides us with a natural


instrument for Pi in the demand equation, namely the supply
shifter Zi .
2SLS (Cont’d)
I How do we know that Zi is relevant? The reduced form equation

$$P_i = \frac{\alpha_0 - \beta_0}{\beta_1 - \alpha_1} + \frac{\alpha_2}{\beta_1 - \alpha_1} Z_i + \frac{v_i - u_i}{\beta_1 - \alpha_1}$$

shows that Zi is related to Pi as long as α2 ≠ 0.

I Since Zi is exogenous by assumption, we can estimate the slope of the demand equation using 2SLS

$$\hat\beta_1^{2SLS} = \frac{s_{ZQ}}{s_{ZP}}$$

I Note that this is exactly the same estimator we found before.


I Again, since there is no exogenous demand shifter in the
demand equation, we cannot estimate the supply equation: it
isn’t identified.
Sampling distribution of 2SLS

I Asymptotic normality:

$$\sqrt{n}\,(\hat\beta_1 - \beta_1) \xrightarrow{d} N(0, \sigma^2),$$

where $\sigma^2 = \frac{Var[(Z - \mu_Z) U]}{Cov(Z,X)^2}$.
I SE (β̂1 ):
We can consistently estimate σ² by σ̂² and then

$$SE(\hat\beta_1) = \hat\sigma / \sqrt{n}$$
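One plausible way to build σ̂² is to replace each population moment in the formula with its sample analogue, using residuals computed from the actual X (not the first-stage fitted values). The sketch below does this on simulated data; the data-generating process is an assumption for illustration, and this plug-in estimator is one natural choice rather than necessarily the exact formula used by any particular software package.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

# Hypothetical data with an endogenous X; true beta1 = 2
Z = rng.normal(size=n)
U = rng.normal(size=n)
X = 0.5 * Z + 0.8 * U + rng.normal(size=n)
Y = 1.0 + 2.0 * X + U

zc = Z - Z.mean()
s_zx = np.mean(zc * (X - X.mean()))
s_zy = np.mean(zc * (Y - Y.mean()))

beta1_hat = s_zy / s_zx
beta0_hat = Y.mean() - beta1_hat * X.mean()

# Residuals use the actual X, not X_hat
u_hat = Y - beta0_hat - beta1_hat * X

# Plug-in estimate of sigma^2 = Var[(Z - mu_Z) U] / Cov(Z, X)^2
sigma2_hat = np.mean((zc * u_hat) ** 2) / s_zx**2
se_beta1 = np.sqrt(sigma2_hat / n)

print(f"beta1_hat = {beta1_hat:.4f}, SE = {se_beta1:.4f}")
```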
2SLS with a single endogenous regressor

I In many applications, there is only a single endogenous


regressor (X ), but several exogenous ones (W ’s):

Yi = β 0 + β 1 X1i + β 2 W1i + ... + β r +1 Wri + ui

where endogenous means “correlated with u” and


exogenous means “uncorrelated with u”.
I Example: demand equation with price, and some exogenous
shifters to the level of demand as regressors...
2SLS with a single endogenous regressor: identification

I In order to use 2SLS, we need at least one instrument (Z ) for


the endogenous regressor X1i .
I With no instruments, the equation is “under-identified” (i.e., we cannot consistently estimate β1 ).
I With 1 instrument, the equation is “exactly-identified”.
I With > 1 instrument, the equation is “over-identified” (i.e.,
we could still consistently estimate β 1 if we had fewer
instruments).
I With m ≥ 1 instruments, the 2SLS estimator is computed by
1. Regressing X1i on all of the instruments (Z1i , ..., Zmi ) and all of the included exogenous regressors (W1i , ..., Wri ) using OLS and computing the predicted value $\hat X_{1i}$.
2. Regressing Yi on $\hat X_{1i}$ and the included exogenous regressors (W1i , ..., Wri ) using OLS.
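The two steps above can be sketched with least-squares calls from NumPy. The function name `tsls`, the simulated data, and the coefficient values are assumptions made for this illustration. The sketch returns point estimates only, since standard errors from a plain second-stage OLS would not be valid.

```python
import numpy as np

def tsls(y, x_endog, W, Z):
    """2SLS point estimates with one endogenous regressor.

    W: (n, r) included exogenous regressors; Z: (n, m) excluded instruments.
    Returns [beta0, beta1, beta2, ..., beta_{r+1}].
    """
    n = len(y)
    const = np.ones((n, 1))

    # Step 1: regress X on the constant, all instruments, and all W's
    first_stage = np.hstack([const, Z, W])
    coef1, *_ = np.linalg.lstsq(first_stage, x_endog, rcond=None)
    x_hat = first_stage @ coef1

    # Step 2: regress Y on the constant, X_hat, and the W's
    second_stage = np.hstack([const, x_hat[:, None], W])
    coef2, *_ = np.linalg.lstsq(second_stage, y, rcond=None)
    return coef2

rng = np.random.default_rng(5)
n = 100_000

# Hypothetical over-identified design: one control, two instruments
W = rng.normal(size=(n, 1))
Z = rng.normal(size=(n, 2))
u = rng.normal(size=n)
x = 0.6 * Z[:, 0] + 0.4 * Z[:, 1] + 0.5 * W[:, 0] + 0.7 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x - 1.0 * W[:, 0] + u

beta = tsls(y, x, W, Z)
print(beta)   # estimates of (beta0, beta1, beta2); true values are (1, 2, -1)
```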
Empirical Example: Demand for Cigarettes

$$\widehat{\ln Q_i} = \underset{(1.53)}{9.72} - \underset{(0.32)}{1.08}\,\ln P_i$$
2SLS in EViews

I Download cigarette data from http://wps.aw.com/aw_stock_ie_3/178/45691/11696965.cw/.
I Load the data in EViews.
I Gen real salestax = taxs/cpi - tax/cpi.
I Equation – TSLS regress log(packpc) on log(avgprs/cpi),
using log(avgprs/cpi) as IV, choosing robust standard error.
I Compare with your OLS estimates.
I Now TSLS regress log(packpc) on log(avgprs), using salestax as IV, choosing robust standard error.
I Now TSLS regress log(packpc) on log(avgprs), using salestax as IV, controlling for log(income/(pop*cpi)), and choosing robust standard error.
2SLS in Multiple Regression

I In general, there can be multiple endogenous regressors (X ’s)


and exogenous regressors (W ’s)

Yi = β 0 + β 1 X1i + ... + β k Xki + β k +1 W1i + ... + β k +r Wri + ui

I Assumptions (Key Concept 12.4 in S&W)


1. Controls are exogenous: E (ui |W1i , · · · , Wri ) = 0.
2. Data are i.i.d.: (X1i , · · · , Xki , W1i , · · · , Wri , Z1i , · · · , Zmi , Yi ) are i.i.d. draws from their joint distribution.
3. No outliers: X ’s, W ’s, Z ’s, and Y have nonzero finite fourth
moments.
4. IV: m ≥ k and
I Relevance: (X1i , · · · , Xki ) can be explained by (Z1i , · · · , Zmi )
after controlling for (W1i , · · · , Wri ).
I Exogeneity: cov (Z1i , ui ) = · · · = cov (Zmi , ui ) = 0.
How to implement 2SLS

The general 2SLS estimator can still be computed in two stages:


1. Regress each Xji on all of the instruments (Z1i , ..., Zmi ) and the included exogenous regressors (W1i , ..., Wri ) using OLS. Compute the predicted values $\hat X_{1i}, ..., \hat X_{ki}$ from these k regressions.

2. Regress Yi on the predicted values $\hat X_{1i}, ..., \hat X_{ki}$ and the included exogenous regressors (W1i , ..., Wri ) using OLS.
Instrument Strength and Exogeneity

I Whether IV regression is useful in a given application hinges


on whether the instruments are valid.
I So how do we know if our instruments are valid?
I Recall that validity depends on the relevance (Cov (Zi , Xi ) ≠ 0) and exogeneity (Cov (Zi , ui ) = 0) assumptions.
I Let’s look at the consequences of violating each one and see
how to test for violations.
Instrument Relevance (Strength)

Y = β 0 + β 1 X + U, X = π0 + π1 Z + V .

I With relevance the issue is not just whether the instrument is relevant, but how relevant. Consider

$$\hat\beta_1^{2SLS} = \frac{s_{ZY}}{s_{ZX}} = \frac{\widehat{Cov}(Z_i, Y_i)}{\widehat{Cov}(Z_i, X_i)}, \qquad \sigma^2_{\hat\beta_1^{2SLS}} = \frac{1}{n} \frac{var[(Z_i - \mu_Z) u_i]}{[Cov(Z_i, X_i)]^2}$$

I A Cov (Zi , Xi ) close to zero will make the estimator explode.
I So the more variation in X that is explained by the
instrument, the more information is available for use in IV
regression, i.e., you want π1 to be large (in magnitude).
I It also turns out that relevance affects the quality of the normal approximation used for inference: a more relevant instrument plays a role much like a larger sample size.
I As a result, with a nearly irrelevant instrument the estimator may explode while the reported standard errors do not, so inference is misleading. This problem is known as “weak instruments”.
Checking Instrument Strength (Relevance)

I Checking for weak instruments with a single endogenous


regressor:
I Rule of thumb: If the F -statistic for testing the null hypothesis
that the coefficients on the instruments are all zero in the first
stage regression is less than 10, you have weak instruments1 .
I Note: if you only have one instrument, you can just use
F = t 2 (so you’d like a t-stat on the instrument > 3.2).
I So what should you do if you have weak instruments?
I Find better ones (or drop the weak ones if you are
over-identified).
I Or use a different technique...

1 The intuition for the cutoff of 10 here is a bit complicated and involves the asymptotic bias of the 2SLS coefficients and how much bias you should be willing to tolerate (for more details, see Appendix 12.5).
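The rule of thumb can be sketched in code as follows. The function `first_stage_F` is a hypothetical helper written for this example, and it computes the homoskedasticity-only F-statistic from restricted and unrestricted sums of squared residuals; a heteroskedasticity-robust version would differ. The simulated "strong" and "weak" first stages are assumptions for illustration.

```python
import numpy as np

def first_stage_F(x, Z, W=None):
    """Homoskedasticity-only F-statistic for H0: all coefficients on Z are zero
    in the first-stage regression of x on a constant, Z, and (optionally) W."""
    n = len(x)
    const = np.ones((n, 1))
    restricted = const if W is None else np.hstack([const, W])
    unrestricted = np.hstack([restricted, Z])

    def ssr(A):
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        resid = x - A @ coef
        return resid @ resid

    m = Z.shape[1]                       # number of instruments being tested
    dof = n - unrestricted.shape[1]      # residual degrees of freedom
    return ((ssr(restricted) - ssr(unrestricted)) / m) / (ssr(unrestricted) / dof)

rng = np.random.default_rng(6)
n = 1_000
Z = rng.normal(size=(n, 1))
noise = rng.normal(size=n)

x_strong = 0.5 * Z[:, 0] + noise    # instrument explains a fair share of X
x_weak = 0.02 * Z[:, 0] + noise     # instrument barely moves X

F_strong = first_stage_F(x_strong, Z)
F_weak = first_stage_F(x_weak, Z)
print(f"F (strong) = {F_strong:.1f}, F (weak) = {F_weak:.1f}")
```

With a single instrument this F equals the square of the homoskedasticity-only t-statistic on the instrument. In a design like the weak one above the statistic will typically fall well below 10.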


Instrument Exogeneity

I If the instruments are not exogenous (Cov (Zi , ui ) ≠ 0), then the $\hat X$’s will be correlated with u and 2SLS will be inconsistent.
I Defeats the purpose of IV since it can’t isolate the “good part” of X .
Checking Instrument Exogeneity
I Can we test for the exogeneity of the instruments?
I If the number of endogenous variables equals the number of
available instruments (i.e. you are exactly identified), then you
can’t (have to use economic theory and intuition/judgement)!
I If you have more instruments than endogenous variables (i.e.
you are over-identified), then you can.
