Yichong Zhang1
1 School of Economics
- Internal validity:
  - A statistical analysis is internally valid if the statistical inferences about causal effects are valid for the population being studied.
- We know that internal validity hinges on two things:
  1. The estimator of the causal effect should be consistent (and ideally unbiased).
  2. Hypothesis tests should have the desired significance level (i.e., you should be using the correct standard errors).
Threats to Internal Validity
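Simultaneous causality
- So far we have assumed that causality runs from STR to test scores:
$\mathit{TestScore}_i = \beta_0 + \beta_1 \mathit{STR}_i + u_i$
- Suppose test scores also affect STR:
$\mathit{STR}_i = \gamma_0 + \gamma_1 \mathit{TestScore}_i + v_i$
- Then the causality runs both ways. But why is this a problem?
- It leads to correlation between $\mathit{STR}_i$ and the error term $u_i$. Let's see why.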
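To see the correlation between STR and the error term concretely, here is a minimal simulation sketch. It solves the two equations for STR and then checks the sample covariance between STR and $u$. All parameter values (`beta1 = -2`, `gamma1 = -0.01`, the error variances) are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
import numpy as np

def simulate_simultaneity(n=100_000, beta0=700.0, beta1=-2.0,
                          gamma0=25.0, gamma1=-0.01, seed=0):
    """Simulate TestScore = beta0 + beta1*STR + u and
    STR = gamma0 + gamma1*TestScore + v (hypothetical parameter values)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, 10.0, n)   # error in the test score equation
    v = rng.normal(0.0, 1.0, n)    # error in the STR equation

    # Substitute the first equation into the second and solve for STR:
    # STR = (gamma0 + gamma1*beta0 + gamma1*u + v) / (1 - gamma1*beta1)
    denom = 1.0 - gamma1 * beta1
    str_ratio = (gamma0 + gamma1 * beta0 + gamma1 * u + v) / denom
    test_score = beta0 + beta1 * str_ratio + u

    # STR depends on u through gamma1, so Cov(STR, u) is not zero.
    cov_str_u = np.cov(str_ratio, u)[0, 1]
    # OLS slope of TestScore on STR.
    ols_slope = np.cov(str_ratio, test_score)[0, 1] / np.var(str_ratio, ddof=1)
    return cov_str_u, ols_slope

cov_str_u, ols_slope = simulate_simultaneity()
print("Cov(STR, u):", cov_str_u)
print("OLS slope:", ols_slope, "(true beta1 is -2.0)")
```

In this sketch the OLS slope settles away from the true $\beta_1$ even with a large sample, which is what inconsistency means.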
Demand: $Q_i = \beta_0 + \beta_1 P_i + u_i, \quad \beta_1 < 0$
Supply: $Q_i = \alpha_0 + \alpha_1 P_i + \alpha_2 Z_i + v_i, \quad \alpha_1 > 0$
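Solving the demand and supply equations for $P_i$ and $Q_i$ gives the reduced form:
$P_i = \frac{\alpha_0 - \beta_0}{\beta_1 - \alpha_1} + \frac{\alpha_2}{\beta_1 - \alpha_1} Z_i + \frac{v_i - u_i}{\beta_1 - \alpha_1}$
$Q_i = \frac{\beta_1 \alpha_0 - \beta_0 \alpha_1}{\beta_1 - \alpha_1} + \frac{\beta_1 \alpha_2}{\beta_1 - \alpha_1} Z_i + \frac{\beta_1 v_i - \alpha_1 u_i}{\beta_1 - \alpha_1}$
which can be written more compactly as
$P_i = \pi_{10} + \pi_{11} Z_i + \varepsilon_{1i}$
$Q_i = \pi_{20} + \pi_{21} Z_i + \varepsilon_{2i}$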
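The reduced form can be checked numerically. The sketch below generates data from the reduced-form equations and compares two slopes: the OLS slope of $Q$ on $P$, and the ratio $\hat{\pi}_{21} / \hat{\pi}_{11}$. Since $\pi_{21} = \beta_1 \alpha_2 / (\beta_1 - \alpha_1)$ and $\pi_{11} = \alpha_2 / (\beta_1 - \alpha_1)$, that ratio equals the demand slope $\beta_1$. The parameter values and the standard normal draws for $Z$, $u$, $v$ are assumptions made for illustration only.

```python
import numpy as np

def simulate_market(n=200_000, beta0=10.0, beta1=-1.0,
                    alpha0=2.0, alpha1=1.0, alpha2=1.0, seed=1):
    """Draw (Z, P, Q) from the reduced form of the demand/supply system.
    All parameter values are hypothetical."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)   # supply shifter
    u = rng.normal(size=n)   # demand error
    v = rng.normal(size=n)   # supply error
    d = beta1 - alpha1
    p = (alpha0 - beta0) / d + (alpha2 / d) * z + (v - u) / d
    q = ((beta1 * alpha0 - beta0 * alpha1) / d
         + (beta1 * alpha2 / d) * z
         + (beta1 * v - alpha1 * u) / d)
    return z, p, q

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

z, p, q = simulate_market()
pi11_hat = slope(z, p)            # reduced-form slope for price
pi21_hat = slope(z, q)            # reduced-form slope for quantity
ols_slope = slope(p, q)           # naive regression of Q on P
ratio = pi21_hat / pi11_hat       # ratio of reduced-form slopes

print("OLS slope of Q on P:", ols_slope)
print("pi21_hat / pi11_hat:", ratio, "(true beta1 is -1.0)")
```

With these particular values the naive OLS slope mixes the demand and supply curves, while the ratio of reduced-form slopes lands close to the demand slope.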
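$Y_i = \beta_0 + \beta_1 X_i + u_i$
A valid instrument $Z_i$ must satisfy two conditions:
1. Instrument relevance: $\mathrm{Cov}(Z_i, X_i) \neq 0$
2. Instrument exogeneity: $\mathrm{Cov}(Z_i, u_i) = 0$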
IV: some examples
- Education
$\ln(\mathit{wage}) = \beta_0 + \beta_1 \mathit{educ} + u$
- Problem: ability is in $u$ and affects education.
- Possible IVs?
  - Distance to nearest college, tuition subsidies, construction of new schools...
$X_i = \pi_0 + \pi_1 Z_i + v_i \quad (2)$
$Y_i = \beta_0 + \beta_1 (\pi_0 + \pi_1 Z_i) + (\beta_1 v_i + u_i).$
2SLS procedures
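- First stage:
$X = \pi_0 + \pi_1 Z + V.$
  - Regress $X$ on $Z$ and obtain the OLS estimators $\hat{\pi}_0$ and $\hat{\pi}_1$.
- Second stage:
$Y_i = \beta_0 + \beta_1 X_i + u_i = \beta_0 + \beta_1 (\pi_0 + \pi_1 Z_i) + (\beta_1 v_i + u_i)$
  - Regress $Y_i$ on the predicted value $\hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i$; the OLS slope from this regression is the 2SLS estimator $\hat{\beta}_1$.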
$Y = \beta_0 + \beta_1 X + U, \quad X = \pi_0 + \pi_1 Z + V,$
$Y = \beta_0 + \beta_1 (\pi_0 + \pi_1 Z) + (\beta_1 V + U)$
- Predicted values represent variation in $X$ that is "good" in that it is driven only by factors ($Z$) that are uncorrelated with $U$.
- Specifically, the predicted value is a linear function of $Z$, which is uncorrelated with $U$.
- Why not build the predicted value from the exogenous $X$'s alone? Why do we need $Z$?
- Answer: the exogenous $X$'s alone are not enough. A predicted value built only from them would be perfectly collinear with those same regressors in the second stage whenever the model has an endogenous $X$.
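The two stages can be sketched by hand with NumPy. This is an illustration under assumed data, not the lecture's own code: the data-generating process below (an unobserved `ability` term that enters both $X$ and the error, a true $\beta_1$ of 0.5, a first-stage slope of 0.8) is hypothetical, and the helper names `ols_fit` and `two_stage_least_squares` are made up for this example.

```python
import numpy as np

def ols_fit(x, y):
    """Return (intercept, slope) from OLS of y on a single regressor x."""
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z."""
    # First stage: regress X on Z and form the predicted values.
    pi0_hat, pi1_hat = ols_fit(z, x)
    x_hat = pi0_hat + pi1_hat * z
    # Second stage: regress Y on the predicted values.
    return ols_fit(x_hat, y)

# Hypothetical data: ability is unobserved, raises X, and sits in the error u.
rng = np.random.default_rng(2)
n = 100_000
z = rng.normal(size=n)
ability = rng.normal(size=n)
x = 1.0 + 0.8 * z + 0.5 * ability + rng.normal(size=n)
u = ability + rng.normal(size=n)
y = 2.0 + 0.5 * x + u

_, beta1_ols = ols_fit(x, y)
_, beta1_2sls = two_stage_least_squares(y, x, z)
print("OLS slope: ", beta1_ols)
print("2SLS slope:", beta1_2sls, "(true beta1 is 0.5)")
```

Running the two stages manually like this gives the right point estimate, but the standard errors printed by an ordinary second-stage OLS routine would not be valid, because they ignore that the regressor was itself estimated.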
Consistency of 2SLS
$Y_i = \beta_0 + \beta_1 X_i + u_i \quad (1)$
$\hat{\beta}_1 = \frac{\hat{\pi}_1 s_{Z,Y}}{\hat{\pi}_1^2 s_Z^2} = \frac{s_{Z,Y}}{s_{Z,X}}.$
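The step from this sample formula to consistency can be sketched as follows. The derivation assumes i.i.d. data with finite second moments (so that sample covariances converge to population covariances), instrument relevance $\mathrm{Cov}(Z_i, X_i) \neq 0$, and instrument exogeneity $\mathrm{Cov}(Z_i, u_i) = 0$; the first of these assumptions is not stated in this excerpt.

```latex
\begin{align*}
\hat{\beta}_1 = \frac{s_{Z,Y}}{s_{Z,X}}
  &\xrightarrow{p} \frac{\mathrm{Cov}(Z_i, Y_i)}{\mathrm{Cov}(Z_i, X_i)} \\
  &= \frac{\mathrm{Cov}(Z_i,\ \beta_0 + \beta_1 X_i + u_i)}{\mathrm{Cov}(Z_i, X_i)} \\
  &= \beta_1 + \frac{\mathrm{Cov}(Z_i, u_i)}{\mathrm{Cov}(Z_i, X_i)} \\
  &= \beta_1 .
\end{align*}
```

The last equality uses exogeneity for the numerator and relevance to keep the denominator away from zero.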
Consistency of 2SLS (Cont’d)
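$Q_i = \alpha_0 + \alpha_1 P_i + \alpha_2 Z_i + v_i$
$P_i = \frac{\alpha_0 - \beta_0}{\beta_1 - \alpha_1} + \frac{\alpha_2}{\beta_1 - \alpha_1} Z_i + \frac{v_i - u_i}{\beta_1 - \alpha_1}$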
- Asymptotic normality:
$\sqrt{n} (\hat{\beta}_1 - \beta_1) \xrightarrow{d} N(0, \sigma^2),$
where $\sigma^2 = \frac{\mathrm{Var}[(Z - \mu_Z) U]}{\mathrm{Cov}(Z, X)^2}$.
- $SE(\hat{\beta}_1)$: we can consistently estimate $\sigma^2$ by $\hat{\sigma}^2$, and then
$SE(\hat{\beta}_1) = \hat{\sigma} / \sqrt{n}$
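One way to compute this standard error is to replace each population quantity in $\sigma^2$ with its sample analogue. The sketch below does that, using residuals built from the actual $X$ rather than the first-stage predicted values. The function name `iv_slope_and_se` and the simulated data (true $\beta_1$ of 0.5) are assumptions for illustration, not part of the lecture.

```python
import numpy as np

def iv_slope_and_se(y, x, z):
    """IV/2SLS slope with one instrument, and its plug-in standard error."""
    n = len(y)
    s_zx = np.cov(z, x)[0, 1]
    s_zy = np.cov(z, y)[0, 1]
    beta1_hat = s_zy / s_zx
    beta0_hat = y.mean() - beta1_hat * x.mean()
    # Residuals use the actual X, not the first-stage predicted values.
    u_hat = y - beta0_hat - beta1_hat * x
    # Sample analogue of Var[(Z - mu_Z) U] / Cov(Z, X)^2.
    sigma2_hat = np.var((z - z.mean()) * u_hat, ddof=1) / s_zx**2
    se = np.sqrt(sigma2_hat / n)
    return beta1_hat, se

# Hypothetical data with an endogenous regressor.
rng = np.random.default_rng(3)
n = 50_000
z = rng.normal(size=n)
ability = rng.normal(size=n)
x = 1.0 + 0.8 * z + 0.5 * ability + rng.normal(size=n)
u = ability + rng.normal(size=n)
y = 2.0 + 0.5 * x + u

beta1_hat, se = iv_slope_and_se(y, x, z)
print("beta1_hat:", beta1_hat)
print("SE:", se)
print("95% CI:", (beta1_hat - 1.96 * se, beta1_hat + 1.96 * se))
```

Because the variance is estimated from the products $(Z_i - \bar{Z}) \hat{u}_i$ directly, this version does not impose homoskedasticity.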
2SLS with a single endogenous regressor
$\widehat{\ln Q_i} = \underset{(1.53)}{9.72} - \underset{(0.32)}{1.08} \ln P_i$
2SLS in EViews
$Y = \beta_0 + \beta_1 X + U, \quad X = \pi_0 + \pi_1 Z + V.$
1 The intuition for the cutoff of 10 here is a bit complicated and involves the asymptotic bias of the 2SLS coefficients and how much bias you should be willing to