wk06 IV

Instrumental Variables I
Yiran Xie
School of Economics, University of Sydney
September 7, 2022
Yiran Xie (School of Economics, University of Sydney) IV September 7, 2022 1 / 34

Table of contents
1. Motivation
Endogeneity
Sources of endogeneity
2. Instrumental Variables
IV Intuition
Model setup
IV estimator
2SLS and Optimal GMM Estimators

References:
Chapter 4.8, Microeconometrics: Methods and Applications, Cameron and Trivedi (2005)
Chapter 6.1-6.3, Microeconometrics Using Stata, Cameron and Trivedi (2009)


Motivation Endogeneity
Motivation
We consider linear models of the form
$$y_i = x_i'\beta + u_i$$
For the estimates to be informative, we need $\hat\beta \approx \beta$
Under the classical assumptions we can guarantee:

Unbiasedness: $E[\hat\beta] = \beta$
Consistency: $\hat\beta \xrightarrow{p} \beta$

Endogeneity
One of the crucial assumptions is orthogonality
x is exogenous when it is uncorrelated with the error term:
E [xu] = 0
Otherwise it is endogenous
Stronger notions of exogeneity require either:

u to be independent of x, or
conditional mean independence, E[u|x] = 0
Exogeneity guarantees that the factors not accounted for in the model (u) do not
interfere with the estimation of β

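Endogeneity bias
The OLS estimator
$$\hat\beta = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'u = \beta + \left(\frac{1}{N}\sum_{i=1}^{N} x_i x_i'\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^{N} x_i u_i\right)$$
When $E[x_i u_i] \neq 0$,
$$\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i u_i = E[x_i u_i] \neq 0 \;\Rightarrow\; \operatorname{plim}\hat\beta \neq \beta$$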
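A minimal NumPy simulation can make the plim argument concrete. The data-generating process is an assumption for illustration only: x and u share a common component, so E[x_i u_i] = 1 and E[x_i^2] = 2, and the slide's decomposition implies plim β̂ = β + 1/2.

```python
import numpy as np

# Hypothetical DGP: x and u share a common component, so E[x u] != 0
rng = np.random.default_rng(0)
N = 200_000
beta = 1.0
common = rng.normal(size=N)          # drives both x and u
x = rng.normal(size=N) + common      # Var(x) = 2
u = rng.normal(size=N) + common      # Cov(x, u) = Var(common) = 1
y = beta * x + u

# OLS without intercept: (sum x_i^2)^{-1} sum x_i y_i
beta_hat = (x @ y) / (x @ x)
# Sample analogue of E[x_i u_i]
mean_xu = (x @ u) / N

# Theory: plim beta_hat = beta + E[xu] / E[x^2] = 1 + 1/2 = 1.5
print(f"beta_hat = {beta_hat:.3f}, (1/N) sum x_i u_i = {mean_xu:.3f}")
```

Increasing N does not shrink the gap between β̂ and β; it only makes β̂ settle more tightly around 1.5.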

Simple model: y = βx + u (no intercept)
y : earnings, x: years of schooling, u: error term
Standard regression:
Assumes that x is uncorrelated with the error u
Then the only effect of x on y is a direct effect via the term βx
⇒ β̂ is consistent
[Path diagram: x → y]

Simple model: y = βx + u (no intercept)

u embodies all factors other than schooling that affect earnings, e.g., ability
High ability (u) directly increases earnings (y), since y = βx + u.
Ability is correlated with years of schooling:

schooling (x) is likely to be higher for those with high ability (u).
β̂ combines the direct effect of schooling on earnings

with the indirect effect that people with more schooling tend to have higher
ability and thus higher earnings
⇒ β̂ is biased (and inconsistent)
[Path diagram: x → y, with u also affecting y and correlated with x]

Why do we care?
Orthogonality is the most important and most delicate assumption
Failures of other assumptions can be tested (to some degree)

Data from a model with a high degree of endogeneity can look completely
normal
Endogeneity can only be established or assumed through common sense and theory
If other assumptions fail, we can still estimate β consistently, and we can
make valid inference with minor adjustments (e.g., robust standard errors)
Under endogeneity, OLS cannot estimate β consistently
If there is strong endogeneity bias, our estimated models can be poor
descriptions of reality

Motivation Sources of endogeneity
Sources of Endogeneity
Omitted variables
Simultaneous equations
Selection bias
Measurement error

Sources of Endogeneity: Omitted Variable Bias

The most common source of endogeneity:
We cannot account for all determinants of yi, so the remaining ones end up in ui
If one of these unobserved determinants is correlated with x
⇒ Omitted variable bias
Recall the model

Suppose the data come from (with all classical assumptions satisfied):
yi = β0 + β1 x1i + αx2i + ui
And instead we omit x2 (often because we cannot observe it):
yi = β0 + β1 x1i + ei, where ei = αx2i + ui
Let δ be the slope of the linear projection of x2 on x1 (x2i = δ0 + δx1i + vi).
Then plim β̂1 = β1 + αδ, so the OLS estimator of β1 is consistent only if

α = 0: x2 is not a determinant of y, or
δ = 0: x2 is not correlated with x1
e.g., the returns to schooling (y: earnings, x1: years of schooling, x2: ability)
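The omitted-variable result plim β̂1 = β1 + αδ can be checked with a small simulation. All coefficient values below are hypothetical, and x2 is generated as δ·x1 plus independent noise so that δ is exactly the projection slope.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
beta0, beta1, alpha = 1.0, 0.5, 2.0   # hypothetical true coefficients
delta = 0.8                            # hypothetical slope of x2 on x1

x1 = rng.normal(size=N)
x2 = delta * x1 + rng.normal(size=N)   # x2 correlated with x1 through delta
y = beta0 + beta1 * x1 + alpha * x2 + rng.normal(size=N)


def ols(X, y):
    """OLS coefficients (X'X)^{-1} X'y computed with a linear solve."""
    return np.linalg.solve(X.T @ X, X.T @ y)


ones = np.ones(N)
b_full = ols(np.column_stack([ones, x1, x2]), y)   # x2 included
b_short = ols(np.column_stack([ones, x1]), y)      # x2 omitted

print("full model  beta1_hat:", round(b_full[1], 3))   # near beta1 = 0.5
print("short model beta1_hat:", round(b_short[1], 3))  # near beta1 + alpha*delta = 2.1
```

Setting either `alpha` or `delta` to zero makes the two estimates agree, matching the two consistency conditions above.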
Sources of Endogeneity: Simultaneous Equations
Or, in some cases, reverse causality (y1 causes y2, and y2 causes y1):
y1i = β0 + β1 y2i + γxi + ui

y2i = α0 + α1 y1i + θzi + ei
Notice that a change in ui leads to a change in y1i, which in turn leads to a
change in y2i (whenever α1 ≠ 0). Hence ui and y2i are correlated!
e.g., y1: quantity, y2: price;

y1: crime rates, y2: police presence/surveillance
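A simulation sketch of the two-equation system shows the correlation between y2 and u directly. The parameter values are assumptions for illustration (a downward-sloping y1 equation with β1 = -1 and α1 = 0.5), and u, e, x, z are drawn independently. Solving the system gives y1 = (β0 + β1α0 + β1θz + γx + u + β1e)/(1 - β1α1), so Cov(y2, u) = α1 Var(u)/(1 - β1α1) = 1/3 here.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
# Hypothetical parameter values
b0, b1, g = 1.0, -1.0, 1.0    # y1 = b0 + b1*y2 + g*x + u
a0, a1, th = 0.5, 0.5, 1.0    # y2 = a0 + a1*y1 + th*z + e

x, z = rng.normal(size=N), rng.normal(size=N)
u, e = rng.normal(size=N), rng.normal(size=N)

# Reduced form: solve both equations jointly for y1, then recover y2
den = 1.0 - b1 * a1
y1 = (b0 + b1 * a0 + b1 * th * z + g * x + u + b1 * e) / den
y2 = a0 + a1 * y1 + th * z + e

# Sample covariance between the regressor y2 and the structural error u
cov_y2_u = np.cov(y2, u)[0, 1]
print(f"Cov(y2, u) = {cov_y2_u:.3f}, theory a1*Var(u)/den = {a1 / den:.3f}")
```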

Sources of Endogeneity: Selection Bias
Suppose we want to estimate the effect of a treatment denoted by a dummy
variable d:
yi = β0 + β1 di + ui
We want to estimate β1
Can we use
β̂1 = mean(yi | di = 1) − mean(yi | di = 0)?

Sources of Endogeneity: Selection Bias
Notice that
$$\begin{aligned}
E[\hat\beta_1] &= E[\text{mean}(y_i|d_i=1)] - E[\text{mean}(y_i|d_i=0)] \\
&= E[y_i|d_i=1] - E[y_i|d_i=0] \\
&= (\beta_0 + \beta_1 + E[u_i|d_i=1]) - (\beta_0 + E[u_i|d_i=0]) \\
&= \beta_1 + (E[u_i|d_i=1] - E[u_i|d_i=0])
\end{aligned}$$
We obtain consistent estimates only if E[ui|di = 1] = E[ui|di = 0]

That is, if the selection criterion for the treatment is orthogonal to u
e.g., the long-term consequences of military service for people who will not
pursue military careers
y: wage, d: the choice to participate in voluntary service
self-selection leads to endogeneity
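The decomposition above holds exactly in any sample: the difference in means equals β1 plus the difference in mean errors between the two groups. The sketch below assumes a hypothetical self-selection rule in which people with higher u are more likely to take the treatment.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000
beta0, beta1 = 10.0, 1.0   # hypothetical intercept and true treatment effect

u = rng.normal(size=N)
# Hypothetical self-selection: treatment is more likely when u is high
d = (u + rng.normal(size=N) > 0).astype(float)
y = beta0 + beta1 * d + u

diff_means = y[d == 1].mean() - y[d == 0].mean()
selection_bias = u[d == 1].mean() - u[d == 0].mean()

print(f"difference in means    = {diff_means:.3f}")
print(f"beta1 + selection bias = {beta1 + selection_bias:.3f}")
```

Because selection depends on u, the naive difference in means overstates β1 by roughly the gap in average u between the treated and untreated groups.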

Sources of Endogeneity: Measurement error
x is measured with error

Consider the model
yi = β0 + β1 xi + ui
Suppose that instead of observing x, we observe x∗ = x + µ

Hence the model that we actually estimate is:
yi = β0 + β1 xi∗ + ei
ei = ui − β1 µi
Since x∗ contains µ, x∗ is correlated with e even when µ is uncorrelated with x
(if µ is uncorrelated with both x and u, Cov(x∗, e) = −β1 Var(µ) ≠ 0 whenever β1 ≠ 0)
In many cases, it is also sensible to assume that the measurement error is
correlated with the variable being measured (x)
e.g., y: health, x: wealth, x∗: self-reported wealth

greater measurement error for richer people, E[xµ] ≠ 0
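A simulation sketch of the simplest (classical) case, where µ is independent of x and u by assumption, shows both the correlation between x∗ and e and the resulting attenuation of the OLS slope toward zero, with plim β̂1 = β1 Var(x)/(Var(x) + Var(µ)). The wealth example above is non-classical (E[xµ] ≠ 0); this sketch does not model that case.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200_000
beta0, beta1 = 1.0, 2.0                 # hypothetical true coefficients
x = rng.normal(size=N)                  # true regressor, Var(x) = 1
mu = rng.normal(scale=0.5, size=N)      # classical error, Var(mu) = 0.25
y = beta0 + beta1 * x + rng.normal(size=N)
x_star = x + mu                         # observed, error-ridden regressor

# OLS slope of y on x_star (with intercept) = Cov(x*, y) / Var(x*)
slope = np.cov(x_star, y)[0, 1] / np.var(x_star, ddof=1)

# Composite error of the estimated model: e = y - b0 - b1*x* = u - b1*mu
e = y - beta0 - beta1 * x_star
cov_xstar_e = np.cov(x_star, e)[0, 1]

print(f"slope = {slope:.3f}, theory = {beta1 / (1 + 0.25):.3f}")        # about 1.6
print(f"Cov(x*, e) = {cov_xstar_e:.3f}, theory = {-beta1 * 0.25:.3f}")  # about -0.5
```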


Instrumental Variables IV Intuition
IV Intuition
Consider the model
y = β0 + β1 x + u
We wish to estimate the effect of x on y , namely β1

But x and u are correlated
If we could isolate variation in x that is independent of u, we could then
estimate β1

IV Intuition
Assume there exists an instrument z that has the properties

changes in z do not directly lead to changes in y
changes in z are associated with changes in x
The path analysis diagram becomes
[Path diagram: z → x → y]
Note: z does not directly cause y, though z and y are correlated via the
indirect path: z is correlated with x, which in turn determines y.
Formally, z is an instrument for regressor x if

z is uncorrelated with the error u; and
z is correlated with the regressor x.

IV Intuition
A one unit change in the instrument z is associated with

0.2 more years of schooling (x) and
$500 increase in annual earnings (y ) (due to z ↑⇒ x ↑⇒ y ↑.)
Then 0.2 years extra schooling is associated with $500 extra earnings.
So a one-year increase in schooling is associated with a $500/0.2 = $2,500
increase in earnings.
The causal estimate of β is therefore 2,500.

Mathematically, we estimated the changes dx/dz and dy/dz and calculated the
causal estimator as
$$\beta_{IV} = \frac{dy/dz}{dx/dz}$$
dy/dz is estimated by OLS of y on z, with slope estimate $(z'z)^{-1}z'y$
dx/dz is estimated by OLS of x on z, with slope estimate $(z'z)^{-1}z'x$
The IV estimator is
$$\hat\beta_{IV} = \frac{(z'z)^{-1}z'y}{(z'z)^{-1}z'x} = (z'x)^{-1}z'y = \left(\sum_{i=1}^{N} z_i x_i\right)^{-1}\sum_{i=1}^{N} z_i y_i$$
(Not a general formula: it applies only to this simplified model with one regressor and one instrument z, with no intercept or with all variables measured as deviations from their means.)
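The ratio-of-slopes construction can be checked numerically. In this sketch the data are hypothetical: an unobserved "ability" term enters both x and u (so OLS is biased upward), while z shifts x but is independent of u. The ratio dy/dz ÷ dx/dz and the direct formula (z'x)^{-1}z'y coincide exactly, and both are close to the true β. On the slide's numbers the same ratio is 500/0.2 = 2,500.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 200_000
beta = 2.0                                   # hypothetical true effect of x on y
z = rng.normal(size=N)                       # instrument, independent of u
ability = rng.normal(size=N)                 # unobserved, enters both x and u
x = 0.5 * z + ability + rng.normal(size=N)   # relevance: z shifts x
u = ability + rng.normal(size=N)             # endogeneity: Cov(x, u) = 1
y = beta * x + u

dy_dz = (z @ y) / (z @ z)      # OLS slope of y on z: (z'z)^{-1} z'y
dx_dz = (z @ x) / (z @ z)      # OLS slope of x on z: (z'z)^{-1} z'x
beta_iv_ratio = dy_dz / dx_dz
beta_iv = (z @ y) / (z @ x)    # (z'x)^{-1} z'y
beta_ols = (x @ y) / (x @ x)   # OLS, inconsistent here (plim about 2.44)

print(f"IV (ratio) = {beta_iv_ratio:.3f}, IV (direct) = {beta_iv:.3f}, OLS = {beta_ols:.3f}")
```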

Application: Angrist and Krueger(1991)
Angrist, Joshua D. and Alan B. Krueger. 1991. "Does Compulsory School
Attendance Affect Schooling and Earnings?" The Quarterly Journal of
Economics 106 (4): 979-1014.
They use quarter of birth as an instrument for education to estimate the
returns to schooling.

Why Quarter of Birth?

In most U.S. states, students were required to attend school until age 16 (at
least during 1938-1967)
The age at which children start school depends on their birthday, so, under
compulsory schooling laws, the grade in which a student can legally drop out
also depends on their birthday.

Is Schooling related to Quarter of Birth?

Is Earnings related to Quarter of Birth?

Instrumental Variables Model setup
Model setup
A more general regression model:
y1i = y2i′ β1 + x1i′ β2 + ui, i = 1, ..., N
y1: scalar dependent variable

y2: m × 1 vector of m endogenous regressors
x1: K1 × 1 vector of K1 exogenous regressors (including an intercept)
ui: uncorrelated with x1 but correlated with y2
Assume there are at least m instrumental variables x2 for y2:
y2ji = x1i′ π1j + x2i′ π2j + vji, j = 1, ..., m
x1: the exogenous regressors serve as instruments for themselves

x2: at least m additional instruments for the m endogenous variables

Model setup
The model can be more simply written as
yi = xi′ β + ui
xi′ = [y2i′ x1i′ ] combines endogenous and exogenous variables

y instead of y1 – simply a change of notation
The vector of instruments zi′ = [x1i′ x2i′ ]

x1 : the (ideal) instrument for itself
x2 : the instrument for y2
Should satisfy the moment condition:
E (zi ui ) = 0
In sum, we regress y on x using instruments z

Find the Estimators
We have the population moment condition
E [zi (yi − xi′ β)] = 0
Method of moments: solve the corresponding sample moment condition
$$\sum_{i=1}^{N} z_i (y_i - x_i'\beta) = 0$$
i.e., choose β̂ so that this equation holds exactly when possible, or as closely as possible otherwise

Instrumental Variables IV estimator
Instrumental variables (IV) estimator
In the just-identified case (#instruments = #regressors = k),
solving the k equations in k unknowns $\sum_i z_i(y_i - x_i'\beta) = 0$
gives the instrumental variables (IV) estimator
$$\hat\beta_{IV} = \left(\sum_{i=1}^{N} z_i x_i'\right)^{-1}\sum_{i=1}^{N} z_i y_i = (Z'X)^{-1}Z'y$$
estimate using Stata command ivregress 2sls
Often just one regressor in xi is endogenous (i.e. correlated with ui )

Then one variable in zi is the instrument for this endogenous regressor
the remaining entries in zi are the exogenous variables
i.e. exogenous variables are instruments for themselves
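A minimal NumPy sketch of the just-identified case with one endogenous regressor: the regressor vector is [y2, 1, x1] and the instrument vector is [1, x1, x2], so the intercept and x1 instrument themselves and x2 instruments y2. The data-generating process and all coefficient values are hypothetical; u is made correlated with the first-stage error v so that OLS is biased.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 200_000
ones = np.ones(N)
x1 = rng.normal(size=N)               # exogenous regressor
x2 = rng.normal(size=N)               # excluded instrument for y2
v = rng.normal(size=N)                # first-stage error
u = 0.8 * v + rng.normal(size=N)      # structural error, correlated with v
y2 = 0.5 * x1 + 1.0 * x2 + v          # endogenous regressor
y1 = 2.0 * y2 + 1.0 - 1.0 * x1 + u    # hypothetical true coefficients (2, 1, -1)

X = np.column_stack([y2, ones, x1])   # x_i' = [y2i, 1, x1i]
Z = np.column_stack([ones, x1, x2])   # z_i' = [1, x1i, x2i]

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y1)    # (Z'X)^{-1} Z'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y1)   # for comparison

print("IV: ", beta_iv.round(3))
print("OLS:", beta_ols.round(3))
```

Using `np.linalg.solve` rather than forming the inverse of Z'X explicitly is the usual numerically safer choice; the result is the same estimator.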

Instrumental Variables IV estimator
What if we have more instruments than regressors?
In the over-identified case (#instruments > #regressors)
we generally cannot solve $\sum_i z_i(y_i - x_i'\beta) = 0$ exactly
because we have more equations than unknowns:
dim(z) equations in only dim(x) unknowns
Simply drop additional instruments to get to the just-identified case?

Valid but not efficient
Use the generalized method of moments (GMM) estimator instead

Instrumental Variables 2SLS and Optimal GMM Estimators
Generalized method of moments estimator
The estimator works for over-identified cases

It minimizes a quadratic form in $\sum_i z_i(y_i - x_i'\beta)$:
$$Q(\beta) = \left[\frac{1}{N}\sum_i z_i(y_i - x_i'\beta)\right]' W_N \left[\frac{1}{N}\sum_i z_i(y_i - x_i'\beta)\right] = N^{-2}(Z'u)'W_N(Z'u) = N^{-2}u'(ZW_NZ')u$$
where $u = y - X\beta$.
The symmetric full-rank weighting matrix $W_N$ does not depend on β.
Then ∂Q(β)/∂β = 0 yields the GMM estimator
$$\hat\beta_{GMM} = \left[\left(\sum_i x_i z_i'\right) W_N \left(\sum_i z_i x_i'\right)\right]^{-1}\left(\sum_i x_i z_i'\right) W_N \left(\sum_i z_i y_i\right) = (X'ZW_NZ'X)^{-1}X'ZW_NZ'y$$
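The GMM formula translates directly into a few lines of NumPy. The sketch below (simulated data, hypothetical coefficients) checks that in a just-identified model the estimate equals (Z'X)^{-1}Z'y for two different weighting matrices, while in an over-identified model different choices of W_N give different, though still consistent, estimates.

```python
import numpy as np


def gmm(X, Z, y, W):
    """Linear GMM estimator (X'Z W Z'X)^{-1} X'Z W Z'y for a fixed weighting matrix W."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))


rng = np.random.default_rng(7)
N = 5_000
z1, z2, v = rng.normal(size=(3, N))
u = 0.8 * v + rng.normal(size=N)      # error correlated with v
x = z1 + 0.5 * z2 + v                 # endogenous regressor
y = 1.0 + 2.0 * x + u                 # hypothetical true beta = (1, 2)
X = np.column_stack([np.ones(N), x])

# Just-identified: instruments [1, z1]; GMM reproduces IV for any W
Zj = np.column_stack([np.ones(N), z1])
A = rng.normal(size=(2, 2))
W_any = A @ A.T + np.eye(2)           # an arbitrary symmetric positive-definite W
b_iv = np.linalg.solve(Zj.T @ X, Zj.T @ y)
b_just_I = gmm(X, Zj, y, np.eye(2))
b_just_any = gmm(X, Zj, y, W_any)

# Over-identified: instruments [1, z1, z2]; the choice of W now matters
Zo = np.column_stack([np.ones(N), z1, z2])
b_over_I = gmm(X, Zo, y, np.eye(3))
b_over_2sls = gmm(X, Zo, y, np.linalg.inv(Zo.T @ Zo))

print(b_iv, b_just_I, b_just_any)
print(b_over_I, b_over_2sls)
```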

2SLS and Optimal GMM

The variance of $\hat\beta_{GMM}$ is smallest when the weighting matrix $W_N$ is
consistent for $(Var[Z'u])^{-1}$ (the optimal weighting matrix)
In the just-identified case, however, GMM = IV for any $W_N$.
For homoskedastic errors, $Var[Z'u] = \sigma^2 Z'Z$

The two-stage least squares (2SLS) estimator sets $W_N = (Z'Z)^{-1}$, which yields
$$\hat\beta_{2SLS} = (X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'y$$
Estimate using Stata command ivregress 2sls,
but use a robust VCE to guard against heteroskedastic errors.
For heteroskedastic errors, $Var[Z'u] = Z'\Omega Z = \sum_i \sigma_i^2 z_i z_i'$, where $\Omega = \operatorname{diag}(\sigma_i^2)$
The optimal GMM estimator for heteroskedastic errors sets
$$W_N = \hat S^{-1} = \left(\sum_i \hat u_i^2 z_i z_i'\right)^{-1}, \qquad \hat u_i = y_i - x_i'\hat\beta_{2SLS}$$
which yields
$$\hat\beta_{OGMM} = (X'Z\hat S^{-1}Z'X)^{-1}X'Z\hat S^{-1}Z'y$$
Estimate using Stata command ivregress gmm
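The two-step procedure (2SLS first, then re-weighting with Ŝ^{-1} built from the 2SLS residuals) can be sketched in NumPy as follows. The heteroskedastic error form and all coefficients are assumptions for illustration; this is a point-estimate sketch only and omits the robust variance computation that ivregress reports.

```python
import numpy as np


def gmm(X, Z, y, W):
    """Linear GMM estimator (X'Z W Z'X)^{-1} X'Z W Z'y for a fixed weighting matrix W."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))


rng = np.random.default_rng(8)
N = 20_000
z1, z2, z3, v, e = rng.normal(size=(5, N))
x = z1 + 0.5 * z2 + 0.5 * z3 + v              # endogenous regressor
u = (0.8 * v + e) * np.sqrt(0.5 + z1**2)      # hypothetical heteroskedastic error
y = 1.0 + 2.0 * x + u                         # hypothetical true beta = (1, 2)
X = np.column_stack([np.ones(N), x])
Z = np.column_stack([np.ones(N), z1, z2, z3]) # 4 instruments, 2 regressors

# Step 1: 2SLS, i.e. GMM with W_N = (Z'Z)^{-1}
b_2sls = gmm(X, Z, y, np.linalg.inv(Z.T @ Z))

# Step 2: S_hat = sum_i uhat_i^2 z_i z_i' from 2SLS residuals, then W_N = S_hat^{-1}
uhat = y - X @ b_2sls
S_hat = (Z * uhat[:, None] ** 2).T @ Z
b_ogmm = gmm(X, Z, y, np.linalg.inv(S_hat))

print("2SLS:       ", b_2sls)
print("optimal GMM:", b_ogmm)
```

Note that the residuals in step 2 use the original X, not first-stage fitted values.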

More on 2SLS
2SLS gets its name because it can be computed in two stages.

Structural equation: y = Xβ + u
First-stage equation: X = ZΠ + V
2SLS can be computed as follows
First stage: X̂ as the prediction from OLS of X on Z
$$\hat\Pi = (Z'Z)^{-1}Z'X, \qquad \hat X = Z\hat\Pi = Z(Z'Z)^{-1}Z'X$$
Structural stage: OLS of y on X̂
$$\hat\beta_{2SLS} = (\hat X'\hat X)^{-1}\hat X'y = (X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'y$$
since $P_Z = Z(Z'Z)^{-1}Z'$ is symmetric and idempotent, so $\hat X'\hat X = X'P_ZX$ and $\hat X'y = X'P_Zy$
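The equivalence between the two-stage computation and the one-shot formula can be verified numerically. The simulated data below are hypothetical; only the algebra matters.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 1_000
z1, z2, v = rng.normal(size=(3, N))
x = z1 + 0.5 * z2 + v                                  # endogenous regressor
y = 1.0 + 2.0 * x + 0.8 * v + rng.normal(size=N)      # hypothetical DGP
X = np.column_stack([np.ones(N), x])
Z = np.column_stack([np.ones(N), z1, z2])

# First stage: OLS of each column of X on Z, then fitted values X_hat
Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
X_hat = Z @ Pi_hat
# Second stage: OLS of y on X_hat
b_two_stage = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# One-shot formula (X'Z(Z'Z)^{-1}Z'X)^{-1} X'Z(Z'Z)^{-1}Z'y
XZ = X.T @ Z
b_formula = np.linalg.solve(XZ @ np.linalg.solve(Z.T @ Z, Z.T @ X),
                            XZ @ np.linalg.solve(Z.T @ Z, Z.T @ y))

print(b_two_stage, b_formula)
```

The point estimates agree, but standard errors printed by a naive second-stage OLS regression are not valid: they are based on residuals y − X̂β̂ instead of y − Xβ̂. This is one reason to use a dedicated command such as ivregress rather than running the two stages by hand.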

Data Example: Drug expenditures
use mus06data.dta
Drug expenditures for U.S. elderly (ldrugexp) regressed on the
endogenous private health insurance dummy (hi_empunion) and the
exogenous regressors defined by global x2list.
