1 Endogeneity
   Failure of Exogeneity
   Detecting Endogeneity
2 Omitted Variables
   The General Case
   Consequences of Omitted Variables: An Example
3 Measurement Error
   “Classical” Measurement Error
   The General Case
   An Informative Special Case
   Measurement Error in the Dependent Variable
Until now, we mainly looked at the failure of assumptions MLR.2, MLR.3, and
MLR.5
- We argued that multicollinearity, heteroscedasticity, and correlated errors
within clusters (holding the other assumptions constant) only result in a loss
of efficiency
- Standard errors are inflated, but point estimates are “correct”, i.e. the
estimates are consistent
Assumption MLR.1 (linearity) also does not seem too restrictive, as we can
often linearize the model. Alternatively, we can use non-linear estimation
methods (not covered in this lecture)
The crucial assumption of our model is MLR.4: the orthogonality or conditional
independence assumption
E (u|X) = 0
We will now address the issues related to the failure of this central
assumption more generally:
1 What happens if it does not hold?
2 What, besides omitted variable bias, can cause its failure?
3 How do we know if it fails to hold?
Remember β̂ = β + (X′X)⁻¹X′u ⇒ β̂ − β = (X′X)⁻¹X′u. Taking probability limits,

plim β̂ = β + plim((1/N) X′X)⁻¹ · plim((1/N) X′u) = β + Q⁻¹ E[xᵢ′uᵢ] ≠ β

since plim (1/N) X′X = E[xᵢ′xᵢ] = Q (so the first factor converges to Q⁻¹)
and plim (1/N) X′u = E[xᵢ′uᵢ] ≠ 0. The resulting bias term can be of either
sign: Q⁻¹ plim(X′u/N) ≷ 0
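A short simulation illustrates this inconsistency (all parameter values here are made up for the sketch): x and u share a common shock, so Cov(x, u) = 0.5 while Var(x) = 1.25, and the OLS slope converges to β + 0.5/1.25 = 2.4 rather than the true β = 2.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
beta = 2.0

# Build an endogenous regressor: x and u share the shock v,
# so E(u | x) != 0 and MLR.4 fails
z = rng.normal(size=N)
v = rng.normal(size=N)
x = z + 0.5 * v                      # Var(x) = 1 + 0.25 = 1.25
u = v + rng.normal(size=N)           # Cov(x, u) = 0.5
y = beta * x + u

# OLS of y on a constant and x
X = np.column_stack([np.ones(N), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0][1]

# plim beta_hat = beta + Cov(x, u) / Var(x) = 2 + 0.4 = 2.4
print(beta_hat)
```

No matter how large N gets, the estimate stays near 2.4, not 2: more data does not cure endogeneity.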
Key: when exogeneity fails, the estimator cannot identify the true effect of X
on y. Non-identification here means the inability to deliver consistent
estimates
Intuition: we are trying to measure ∂y/∂X, but what we recover is equal to

∂y/∂X = β + ∂u/∂X ≠ β
If Q⁻¹ plim(X′u/N) is sufficiently large, this can generate large differences
between β̂ and β
- Depending on the sign of the correlation between X and u, the signs of β̂
and β can even differ!
If we are sure that the estimate is consistent, even if it is inefficient (has
a high standard error), we can still focus on the size of the average effect
and draw some useful conclusions (though the high variance warns us to be
cautious)
If the estimate is inconsistent, we cannot draw any conclusion
Knowing the direction of the bias can be informative:
- If β̂ > 0 and plim(X′u/N) < 0, then the estimate is a lower bound of the
true effect
- Try to guess the sign and size of the bias whenever possible
To sign the bias exactly in the general case, we have to know all correlations
among the x’s and the omitted factor
Typically, we explicitly measure the bias only if we can (safely) assume that
the other x’s are uncorrelated with the omitted factor
So, why is an omitted variable a cause of endogeneity?
- If we omit x2 (or Z) from our equation, it is captured by the error term u
- If x1 and x2 are correlated, then E(u|x1) ≠ 0
- In this case, omitting x2 makes our assumption MLR.4 fail and the OLS
estimates inconsistent!
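A quick numerical check of this mechanism (coefficients and covariances chosen arbitrarily for the sketch): the short regression of y on x1 alone recovers β1 + β2·Cov(x1, x2)/Var(x1) rather than β1.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
b1, b2 = 1.0, 3.0

x2 = rng.normal(size=N)
x1 = 0.8 * x2 + rng.normal(size=N)   # Cov(x1, x2) = 0.8, Var(x1) = 1.64
y = b1 * x1 + b2 * x2 + rng.normal(size=N)

# Short regression omitting x2: x2 is absorbed into the error term,
# which is now correlated with x1
X = np.column_stack([np.ones(N), x1])
b1_short = np.linalg.lstsq(X, y, rcond=None)[0][1]

# plim b1_short = b1 + b2 * Cov(x1, x2) / Var(x1) = 1 + 3 * 0.8 / 1.64
print(b1_short)
```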
with yj = ln Yj, kj = ln Kj, lj = ln Lj and β0 + εj = ln Aj
The error term, εj, includes:
- technology or management differences, measurement errors, variation in
external factors (weather, machine breakdowns, labor problems)
Observed inputs may be correlated with the unobserved shock, and therefore OLS
will yield biased and inconsistent estimates:
- capital and labor are chosen by the firm
- if the firm has knowledge of εj (or some part of it) when making input
decisions, the choices will likely be correlated with εj
- this was already noted by Marschak and Andrews (Econometrica, 1944)
y = X∗β + u with E(u|X∗) = 0
Instead of the exact variables X∗, we observe X, which is measured with error:
X = X∗ + e
The measurement error is “classical” when E(e|X∗) = 0
Since X∗ = X − e, we can re-write our model as:
y = Xβ − eβ + u
- Now X is by construction correlated with the composite error u − eβ
- This leads to inconsistency of the OLS estimates
- We want to estimate E(y|X∗), but we can only estimate E(y|X)
- Which type of bias will we have, and how large will it be?
plim β̂ = β + Cov(X, u − eβ)/Var(X)
       = β − β Cov(e, X∗ + e)/Var(X)
       = β − β (Cov(e, X∗) + Cov(e, e))/Var(X)      with Cov(e, X∗) = 0, Cov(e, e) = Var(e)
       = β (1 − Var(e)/(Var(X∗) + Var(e)))
       = β Var(X∗)/(Var(X∗) + Var(e))

The attenuation factor Var(X∗)/(Var(X∗) + Var(e)) < 1, so β̂ is biased
towards zero
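The attenuation result can be verified by simulation (variances chosen arbitrarily for the sketch): with Var(X∗) = 1 and Var(e) = 0.5 the attenuation factor is 2/3, so a true β = 2 is estimated near 4/3.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
beta = 2.0
var_xstar, var_e = 1.0, 0.5

x_star = rng.normal(scale=var_xstar**0.5, size=N)
e = rng.normal(scale=var_e**0.5, size=N)   # classical: independent of x_star and u
x = x_star + e                             # observed, mismeasured regressor
y = beta * x_star + rng.normal(size=N)

X = np.column_stack([np.ones(N), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Theory: plim beta_hat = beta * Var(x*) / (Var(x*) + Var(e)) = 2 * 2/3
print(beta_hat, beta * var_xstar / (var_xstar + var_e))
```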
Consider the previous model with classical measurement error, but assume now
that X is multi-dimensional
Call Ω the covariance matrix of the measurement error e. Moreover, assume

plim (1/N) X∗′X∗ = Ω∗XX and plim (1/N) X∗′e = 0
Denominator:

plim (1/N)(X∗ + e)′(X∗ + e)
= plim [(1/N) X∗′X∗ + (1/N) X∗′e + (1/N) e′X∗ + (1/N) e′e] = Ω∗XX + Ω

since the two cross terms have plim = 0
Numerator:

plim (1/N)(X∗ + e)′(X∗β + u)
= plim [(1/N) X∗′X∗β + (1/N) e′X∗β + (1/N) X∗′u + (1/N) e′u] = Ω∗XX β

since the last three terms have plim = 0. Combining the two:

plim β̂ = (Ω∗XX + Ω)⁻¹ Ω∗XX β
But, in this general case, it is hard to say anything about the direction of
the bias on any single coefficient: it depends on the vector of coefficients
and on the covariance matrices of the regressors and of the measurement errors
If both Ω∗XX and Ω are diagonal, then all coefficients are biased towards zero
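The matrix limit plim β̂ = (Ω∗XX + Ω)⁻¹ Ω∗XX β can be evaluated directly for illustrative matrices (the numbers below are arbitrary; only the first regressor is mismeasured):

```python
import numpy as np

beta = np.array([1.0, 3.0])

# Illustrative covariance of the true regressors (Oxx) and of the
# measurement error (Omega): only the first regressor carries error
Oxx = np.array([[1.0, 0.4],
                [0.4, 2.0]])
Omega = np.diag([0.5, 0.0])

# plim of the OLS estimator: (Oxx + Omega)^{-1} Oxx beta
plim_beta_hat = np.linalg.solve(Oxx + Omega, Oxx @ beta)
print(plim_beta_hat)
```

With these values the first coefficient is attenuated below 1, while the second is pushed above 3 because it picks up part of β1 through the nonzero covariance.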
Consider the two-variable case. One variable, x1, is measured with error; the
other, x2, is measured without error. The two matrices are then

Ω = [ σe²  0 ]          Ω∗XX = [ σ1²  σ12 ]
    [ 0    0 ]                 [ σ12  σ2² ]

Applying plim β̂ = (Ω∗XX + Ω)⁻¹ Ω∗XX β gives

plim β̂1 = β1 (σ1²σ2² − σ12²) / (σ1²σ2² − σ12² + σe²σ2²)
plim β̂2 = β2 + β1 σe²σ12 / (σ1²σ2² − σ12² + σe²σ2²)
When ρ12 ≠ 0, this attenuation bias is worse than when x2 is excluded (check!)
Intuition: x2 soaks up some of the signal in x1, leaving relatively more noise
in what remains
One implication is that adding extra regressors may lead to worse estimates:
omitted variable bias is reduced, but attenuation bias is increased
Compare with the one-regressor case:

plim β̂1 = β1 + Cov(x1, u − β1e)/Var(x1) = β1 − β1 σe²/(σ1² + σe²)
        = β1 (1 − σe²/(σ1² + σe²)) = β1 σ1²/(σ1² + σe²)
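The “check!” above can be done numerically: for illustrative variances with σ12 ≠ 0 (values below are arbitrary), the attenuation factor on β̂1 with x2 included is strictly smaller than the one-regressor factor σ1²/(σ1² + σe²).

```python
# Attenuation factors on beta_1, with and without x2 in the regression
# (illustrative variances; s12 != 0 is the interesting case)
s1, s2, se = 1.0, 2.0, 0.5   # Var(x1*), Var(x2), Var(e)
s12 = 0.4                    # Cov(x1*, x2)

factor_with_x2 = (s1 * s2 - s12**2) / (s1 * s2 - s12**2 + se * s2)
factor_alone = s1 / (s1 + se)

print(factor_with_x2, factor_alone)
assert factor_with_x2 < factor_alone   # attenuation is worse with x2 included
```

Algebraically the gap is driven by the term σe²σ12²: cross-multiplying the two factors shows they coincide exactly when σ12 = 0.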
Suppose now that it is the dependent variable that is measured with error: we
observe y = y∗ + e, where the true model is y∗ = Xβ + u, so that

y = Xβ + e + u
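A small simulation (assuming the error e in y is classical, i.e. independent of X and u) shows why this case is more benign: OLS remains consistent, and the measurement error only inflates the residual variance.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000
beta = 2.0

x = rng.normal(size=N)
u = rng.normal(size=N)
e = rng.normal(scale=2.0, size=N)    # error in measuring y, independent of x

y_obs = beta * x + e + u             # observed y = X beta + e + u

X = np.column_stack([np.ones(N), x])
beta_hat = np.linalg.lstsq(X, y_obs, rcond=None)[0][1]
print(beta_hat)   # close to the true beta = 2: e is uncorrelated with x
```

The cost shows up only in precision: the composite error e + u has a larger variance than u alone, so standard errors grow but no bias arises.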