1 Endogeneity
   Failure of Exogeneity
   Detecting Endogeneity
2 Omitted Variables
   The General Case
   Consequences of Omitted Variables: An Example
3 Measurement Error
   “Classical” Measurement Error
   The General Case
   An Informative Special Case
   Measurement Error in the Dependent Variable
Until now, we mainly looked at the failure of assumptions MLR.2, MLR.3, and
MLR.5
- We argued that multicollinearity, heteroscedasticity, and correlated errors
within clusters (holding the other assumptions constant) only result in a loss
of efficiency
- Standard errors are inflated, but point estimates are “correct”, i.e. the
estimates are consistent
Assumption MLR.1 (linearity) also does not seem too restrictive, as we can
often linearize the model. Alternatively, we can use non-linear estimation
methods (not covered in this lecture)
The crucial assumption of our model is MLR.4: the orthogonality or conditional
independence assumption
E (u|X) = 0
We will now address the issues related to the failure of this central
assumption more generally:
1 What happens if it does not hold?
2 What, besides omitted variable bias, can cause its failure?
3 How do we know if it fails to hold?
Remember β̂ = β + (X′X)⁻¹X′u ⇒ β̂ − β = (X′X)⁻¹X′u. Taking probability limits,

plim β̂ = β + plim((1/N) X′X)⁻¹ · plim((1/N) X′u) = β + Q⁻¹ E[xᵢ′uᵢ] ≠ β

since plim (1/N) X′X = E[xᵢ′xᵢ] = Q (so the first factor converges to Q⁻¹)
and plim (1/N) X′u = E[xᵢ′uᵢ] ≠ 0. The resulting bias term can be of either
sign: Q⁻¹ plim(X′u/N) ≷ 0
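A short simulation illustrates this inconsistency (all parameter values here are made up for the sketch): x and u share a common shock, so Cov(x, u) = 0.5 while Var(x) = 1.25, and the OLS slope converges to β + 0.5/1.25 = 2.4 rather than the true β = 2.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
beta = 2.0

# Build an endogenous regressor: x and u share the shock v,
# so E(u | x) != 0 and MLR.4 fails
z = rng.normal(size=N)
v = rng.normal(size=N)
x = z + 0.5 * v                      # Var(x) = 1 + 0.25 = 1.25
u = v + rng.normal(size=N)           # Cov(x, u) = 0.5
y = beta * x + u

# OLS of y on a constant and x
X = np.column_stack([np.ones(N), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0][1]

# plim beta_hat = beta + Cov(x, u) / Var(x) = 2 + 0.4 = 2.4
print(beta_hat)
```

No matter how large N gets, the estimate stays near 2.4, not 2: more data does not cure endogeneity.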
Key: when exogeneity fails, the estimator cannot identify the true effect of X
on y. Non-identification here means the inability to deliver consistent
estimates
Intuition: we are trying to measure ∂y/∂X, but what we recover is equal to

∂y/∂X = β + ∂u/∂X ≠ β
If Q⁻¹ plim(X′u/N) is sufficiently large, this can generate large differences
between β̂ and β
- Depending on the sign of the correlation between X and u, the signs of β̂
and β can even differ!
If we are sure that the estimate is consistent, even if it is inefficient (has
a high standard error), we can still focus on the size of the average effect
and draw some useful conclusions (though the high variance warns us to be
cautious)
If the estimate is inconsistent, we cannot draw any conclusion
Knowing the direction of the bias can be informative:
- If β̂ > 0 and plim(X′u/N) < 0, then the estimate is a lower bound of the
true effect
- Try to guess the sign and size of the bias whenever possible
To sign the bias exactly in the general case, we have to know all correlations
among the x’s and the omitted factor
Typically, we explicitly measure the bias only if we can (safely) assume that
the other x’s are uncorrelated with the omitted factor
So, why is an omitted variable a cause of endogeneity?
- If we omit x2 (or Z) from our equation, it is captured by the error term u
- If x1 and x2 are correlated, then E(u|x1) ≠ 0
- In this case, omitting x2 makes our assumption MLR.4 fail and the OLS
estimates inconsistent!
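A quick numerical check of this mechanism (coefficients and covariances chosen arbitrarily for the sketch): the short regression of y on x1 alone recovers β1 + β2·Cov(x1, x2)/Var(x1) rather than β1.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
b1, b2 = 1.0, 3.0

x2 = rng.normal(size=N)
x1 = 0.8 * x2 + rng.normal(size=N)   # Cov(x1, x2) = 0.8, Var(x1) = 1.64
y = b1 * x1 + b2 * x2 + rng.normal(size=N)

# Short regression omitting x2: x2 is absorbed into the error term,
# which is now correlated with x1
X = np.column_stack([np.ones(N), x1])
b1_short = np.linalg.lstsq(X, y, rcond=None)[0][1]

# plim b1_short = b1 + b2 * Cov(x1, x2) / Var(x1) = 1 + 3 * 0.8 / 1.64
print(b1_short)
```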
with yj = ln Yj, kj = ln Kj, lj = ln Lj and β0 + εj = ln Aj
The error term, εj, includes:
- technology or management differences, measurement errors, variation in
external factors (weather, machine breakdowns, labor problems)
Observed inputs may be correlated with the unobserved shock, and therefore OLS
will yield biased and inconsistent estimates:
- capital and labor are chosen by the firm
- if the firm has knowledge of εj (or some part of it) when making input
decisions, the choices will likely be correlated with εj
- this was already noted by Marschak and Andrews (Econometrica, 1944)
y = X∗β + u with E(u|X∗) = 0
Instead of the exact variables X∗, we observe X, which is measured with error:
X = X∗ + e
The measurement error is “classical” when E(e|X∗) = 0
Since X∗ = X − e, we can re-write our model as:
y = Xβ − eβ + u
- Now X is by construction correlated with the composite error u − eβ
- This leads to inconsistency of the OLS estimates
- We want to estimate E(y|X∗), but we can only estimate E(y|X)
- Which type of bias will we have, and how large will it be?
plim β̂ = β + Cov(X, u − eβ)/Var(X)
       = β − β Cov(e, X∗ + e)/Var(X)
       = β − β (Cov(e, X∗) + Cov(e, e))/Var(X)      with Cov(e, X∗) = 0, Cov(e, e) = Var(e)
       = β (1 − Var(e)/(Var(X∗) + Var(e)))
       = β Var(X∗)/(Var(X∗) + Var(e))

The attenuation factor Var(X∗)/(Var(X∗) + Var(e)) < 1, so β̂ is biased
towards zero
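The attenuation result can be verified by simulation (variances chosen arbitrarily for the sketch): with Var(X∗) = 1 and Var(e) = 0.5 the attenuation factor is 2/3, so a true β = 2 is estimated near 4/3.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
beta = 2.0
var_xstar, var_e = 1.0, 0.5

x_star = rng.normal(scale=var_xstar**0.5, size=N)
e = rng.normal(scale=var_e**0.5, size=N)   # classical: independent of x_star and u
x = x_star + e                             # observed, mismeasured regressor
y = beta * x_star + rng.normal(size=N)

X = np.column_stack([np.ones(N), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Theory: plim beta_hat = beta * Var(x*) / (Var(x*) + Var(e)) = 2 * 2/3
print(beta_hat, beta * var_xstar / (var_xstar + var_e))
```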
Consider the previous model with classical measurement error, but assume now
that X is multi-dimensional
Call Ω the covariance matrix of the measurement error e. Moreover, assume

plim (1/N) X∗′X∗ = Ω∗XX and plim (1/N) X∗′e = 0
Denominator:

plim (1/N)(X∗ + e)′(X∗ + e)
= plim [(1/N) X∗′X∗ + (1/N) X∗′e + (1/N) e′X∗ + (1/N) e′e] = Ω∗XX + Ω

since the two cross terms have plim = 0
Numerator:

plim (1/N)(X∗ + e)′(X∗β + u)
= plim [(1/N) X∗′X∗β + (1/N) e′X∗β + (1/N) X∗′u + (1/N) e′u] = Ω∗XX β

since the last three terms have plim = 0. Combining the two:

plim β̂ = (Ω∗XX + Ω)⁻¹ Ω∗XX β
But, in this general case, it is hard to say anything about the direction of
the bias on any single coefficient: it depends on the vector of coefficients
and on the covariance matrices of the regressors and of the measurement errors
If both Ω∗XX and Ω are diagonal, then all coefficients are biased towards zero
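The matrix limit plim β̂ = (Ω∗XX + Ω)⁻¹ Ω∗XX β can be evaluated directly for illustrative matrices (the numbers below are arbitrary; only the first regressor is mismeasured):

```python
import numpy as np

beta = np.array([1.0, 3.0])

# Illustrative covariance of the true regressors (Oxx) and of the
# measurement error (Omega): only the first regressor carries error
Oxx = np.array([[1.0, 0.4],
                [0.4, 2.0]])
Omega = np.diag([0.5, 0.0])

# plim of the OLS estimator: (Oxx + Omega)^{-1} Oxx beta
plim_beta_hat = np.linalg.solve(Oxx + Omega, Oxx @ beta)
print(plim_beta_hat)
```

With these values the first coefficient is attenuated below 1, while the second is pushed above 3 because it picks up part of β1 through the nonzero covariance.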
Consider the two-variable case. One variable, x1, is measured with error; the
other, x2, is measured without error. The two matrices are then

Ω = [ σe²  0 ]          Ω∗XX = [ σ1²  σ12 ]
    [ 0    0 ]                 [ σ12  σ2² ]

Applying plim β̂ = (Ω∗XX + Ω)⁻¹ Ω∗XX β gives

plim β̂1 = β1 (σ1²σ2² − σ12²) / (σ1²σ2² − σ12² + σe²σ2²)
plim β̂2 = β2 + β1 σe²σ12 / (σ1²σ2² − σ12² + σe²σ2²)
When ρ12 ≠ 0, this attenuation bias is worse than when x2 is excluded (check!)
Intuition: x2 soaks up some of the signal in x1, leaving relatively more noise
in what remains
One implication is that adding extra regressors may lead to worse estimates:
omitted variable bias is reduced, but attenuation bias is increased
Compare with the one-regressor case:

plim β̂1 = β1 + Cov(x1, u − β1e)/Var(x1) = β1 − β1 σe²/(σ1² + σe²)
        = β1 (1 − σe²/(σ1² + σe²)) = β1 σ1²/(σ1² + σe²)
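The “check!” above can be done numerically: for illustrative variances with σ12 ≠ 0 (values below are arbitrary), the attenuation factor on β̂1 with x2 included is strictly smaller than the one-regressor factor σ1²/(σ1² + σe²).

```python
# Attenuation factors on beta_1, with and without x2 in the regression
# (illustrative variances; s12 != 0 is the interesting case)
s1, s2, se = 1.0, 2.0, 0.5   # Var(x1*), Var(x2), Var(e)
s12 = 0.4                    # Cov(x1*, x2)

factor_with_x2 = (s1 * s2 - s12**2) / (s1 * s2 - s12**2 + se * s2)
factor_alone = s1 / (s1 + se)

print(factor_with_x2, factor_alone)
assert factor_with_x2 < factor_alone   # attenuation is worse with x2 included
```

Algebraically the gap is driven by the term σe²σ12²: cross-multiplying the two factors shows they coincide exactly when σ12 = 0.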
Suppose now that it is the dependent variable that is measured with error: we
observe y = y∗ + e, where the true model is y∗ = Xβ + u, so that

y = Xβ + e + u
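A small simulation (assuming the error e in y is classical, i.e. independent of X and u) shows why this case is more benign: OLS remains consistent, and the measurement error only inflates the residual variance.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000
beta = 2.0

x = rng.normal(size=N)
u = rng.normal(size=N)
e = rng.normal(scale=2.0, size=N)    # error in measuring y, independent of x

y_obs = beta * x + e + u             # observed y = X beta + e + u

X = np.column_stack([np.ones(N), x])
beta_hat = np.linalg.lstsq(X, y_obs, rcond=None)[0][1]
print(beta_hat)   # close to the true beta = 2: e is uncorrelated with x
```

The cost shows up only in precision: the composite error e + u has a larger variance than u alone, so standard errors grow but no bias arises.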