
Omitted Variable Bias

OLS estimates the causal relationship from X to Y

It is possible that the direction of causality goes both ways: X to Y and Y to X

(A) Simultaneity

E.g. Impact of smoking on health

Does smoking determine health outcomes, or do health outcomes determine smoking behaviour?
(B) Omitted variable bias
E.g. Impact of schooling on earnings

Observed association between the outcome variable (Y) and the explanatory variable (X) can be misleading

→ it partly reflects omitted factors that are related to both variables

If these factors could be measured and held constant in a regression
→ omitted variable bias would be eliminated
→ in practice this is difficult
Innate ability of one's parents affects the earnings and schooling of their children
→ we cannot perfectly control for ability, which is essentially unobservable
Formally:

A key assumption of OLS is that X and u are not correlated

This assumption is violated if:

• there are omitted variables which determine both X and Y and which we cannot control for
In other words, β is not identified: we cannot deduce it from the joint distribution of X and Y
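A minimal Python simulation can make the bias concrete. It is a sketch under hypothetical assumptions (a true return of 0.08, with ability raising schooling by 2 years and log wages by 0.10 per standard deviation); none of these numbers come from data. Ability is generated but left out of the regression, so it ends up in the error term and is correlated with educ. The OLS slope then settles near 0.08 + 0.10 × 2/8 = 0.105 instead of 0.08.

```python
import random

random.seed(0)
n = 20_000
beta1 = 0.08  # hypothetical true return to one year of schooling

# Unobserved ability raises both schooling and log wages
ability = [random.gauss(0, 1) for _ in range(n)]
educ = [12 + 2 * a + random.gauss(0, 2) for a in ability]
lwage = [1 + beta1 * s + 0.10 * a + random.gauss(0, 0.3)
         for s, a in zip(educ, ability)]


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


# Simple-regression OLS slope of lwage on educ, with ability omitted
b_ols = cov(educ, lwage) / cov(educ, educ)
# Omitted variable bias: 0.10 * Cov(educ, ability) / Var(educ) = 0.10 * 2 / 8
print(f"true beta1 = {beta1}, OLS slope = {b_ols:.3f} (expected plim 0.105)")
```

Setting the 0.10 ability effect on wages to zero should bring the OLS slope back to about 0.08.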
Ways to address this:

(1) Experiments which randomly assign X so that it is no longer correlated with u

• E.g. a job training program which conducts a social experiment that randomly assigns training to a subset of individuals

• Random assignment ensures that participation in the program is not correlated with omitted personal or social factors
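The logic of random assignment can be checked with a small simulation. All numbers are hypothetical: a training effect of 0.05 on log earnings and an unobserved "motivation" variable that also raises earnings. When people self-select into training, the treated-minus-control difference in mean earnings also picks up motivation; when a coin flip assigns training, it does not.

```python
import random

random.seed(1)
n = 20_000
true_effect = 0.05  # hypothetical effect of training on log earnings

# Unobserved motivation raises earnings and, under self-selection, enrolment
motivation = [random.gauss(0, 1) for _ in range(n)]
self_selected = [1 if m + random.gauss(0, 1) > 0 else 0 for m in motivation]
randomized = [1 if random.random() < 0.5 else 0 for _ in range(n)]


def outcome(d, m):
    """Log earnings: training effect plus a motivation effect plus noise."""
    return 2.0 + true_effect * d + 0.2 * m + random.gauss(0, 0.3)


def diff_in_means(d, y):
    treated = [yi for di, yi in zip(d, y) if di == 1]
    control = [yi for di, yi in zip(d, y) if di == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


y_sel = [outcome(d, m) for d, m in zip(self_selected, motivation)]
y_rand = [outcome(d, m) for d, m in zip(randomized, motivation)]

est_sel = diff_in_means(self_selected, y_sel)    # biased upward by motivation
est_rand = diff_in_means(randomized, y_rand)     # close to true_effect
print(f"self-selected: {est_sel:.3f}, randomized: {est_rand:.3f}")
```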

In practice, randomization is often not feasible

It is not easy to run social experiments on a population (outside of a lab)


(2) Instrumental Variables

Suppose we have a third variable Z (“the instrument”) which is correlated with X but not with u

Hence Z is uncorrelated with the omitted variables and the regression error

The instrumental variable technique allows us to estimate the coefficient of interest consistently (free of bias caused by the omitted variables) without having data on the omitted variables

Intuitively, instrumental variables uses only part of the variability in X (the part which is uncorrelated with u) to estimate the relationship between X and Y
Classic example

Estimation of demand and supply elasticities

Observed data on quantities and prices reflect a set of equilibrium points on both the demand and supply curves

Consequently, an OLS regression of quantities on prices fails to identify, that is, trace out, either the supply or the demand relationship

We can solve this problem by finding certain “curve shifters” (now called instrumental variables)
Find additional factors which affect demand conditions without affecting supply conditions, and vice versa

Example of linseed oil:

• For the demand curve shifter we can use the price of substitute goods (cottonseed)

• For the supply curve shifter we can use factors that affect costs (yield per acre), such as weather patterns
Intuitively:

• weather-related shifts (which shift the supply curve) are used to trace out the demand curve

• changes in the price of substitute goods are used to shift the demand curve so as to trace out the supply curve
EXAMPLE ON SUPPLY/DEMAND
[from: Stock and Watson, Introduction to Econometrics, chapter 12]

• Simultaneous causality bias in the OLS regression of quantities on prices arises because price and quantity are determined by the interaction of demand and supply!

• The interaction between demand and supply could produce a scatter of equilibrium points that traces out neither curve, which is not useful for our purposes!

• But what if only supply shifts?

• TSLS estimates the demand curve by isolating shifts in price and quantity that arise from shifts in supply; Z is a variable that shifts supply but not demand.
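The supply/demand logic can be sketched with a simulated market. This is an illustration under assumed parameters (demand slope −1, supply slope 0.5, and a "weather" variable that shifts supply only), not the Stock and Watson data. OLS of quantity on price mixes shifts in both curves, while the ratio Cov(Z, q)/Cov(Z, p), with Z the supply shifter, recovers the demand slope.

```python
import random

random.seed(2)
n = 20_000
b_d, b_s, c = 1.0, 0.5, 1.0  # hypothetical demand slope, supply slope, weather effect

p, q, z = [], [], []
for _ in range(n):
    u_d, u_s = random.gauss(0, 1), random.gauss(0, 1)
    w = random.gauss(0, 1)  # weather: shifts supply only
    # Demand: q = 10 - b_d*p + u_d ; Supply: q = 2 + b_s*p + c*w + u_s
    price = (10 - 2 + u_d - u_s - c * w) / (b_d + b_s)  # market-clearing price
    p.append(price)
    q.append(10 - b_d * price + u_d)
    z.append(w)


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


b_ols = cov(p, q) / cov(p, p)  # mixes both curves: plim is -0.5 in this design
b_iv = cov(z, q) / cov(z, p)   # uses only supply shifts: plim is -b_d = -1
print(f"OLS slope {b_ols:.2f}, IV slope {b_iv:.2f}")
```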
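Instrumental Variables Method

First stage estimation: regress X on the instrument Z

$$X = \pi_0 + \pi_1 Z + v$$

Obtain predicted values:

$$\hat{X} = \hat{\pi}_0 + \hat{\pi}_1 Z$$

The predicted value, $\hat{X}$, depends only on Z (the part of X driven by the instrument) and hence is, in large samples, not correlated with u
Second stage estimation: regress Y on the predicted values

$$Y = \beta_0 + \beta_1 \hat{X} + \text{error}$$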
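A minimal Python sketch of the two stages on simulated data. The data-generating process (the confounder `conf`, the first-stage coefficient 0.8 and the true slope 0.5) is a hypothetical assumption. It runs the first-stage regression of X on Z, forms X̂, regresses Y on X̂, and checks that the resulting slope equals Cov(Z, Y)/Cov(Z, X); OLS on X is shown for contrast.

```python
import random

random.seed(3)
n = 20_000
beta0, beta1 = 2.0, 0.5  # hypothetical structural parameters

z = [random.gauss(0, 1) for _ in range(n)]
conf = [random.gauss(0, 1) for _ in range(n)]  # omitted factor, part of u
x = [1 + 0.8 * zi + ci + random.gauss(0, 1) for zi, ci in zip(z, conf)]
y = [beta0 + beta1 * xi + ci + random.gauss(0, 1) for xi, ci in zip(x, conf)]


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def simple_ols(xs, ys):
    """Return (intercept, slope) of an OLS regression of ys on xs."""
    slope = cov(xs, ys) / cov(xs, xs)
    return mean(ys) - slope * mean(xs), slope


# First stage: regress X on Z and form the predicted values X-hat
pi0, pi1 = simple_ols(z, x)
x_hat = [pi0 + pi1 * zi for zi in z]

# Second stage: regress Y on X-hat
_, b1_2s = simple_ols(x_hat, y)

b1_ols = simple_ols(x, y)[1]        # inconsistent: plim 0.5 + 1/2.64
b1_ratio = cov(z, y) / cov(z, x)    # Cov(Z,Y)/Cov(Z,X)
print(f"OLS {b1_ols:.3f}, two-stage {b1_2s:.3f}, ratio {b1_ratio:.3f}")
```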
Econ 495 - Econometric Review 1

Contents

4 Instrumental Variables

4.1 Single endogenous variable – One continuous instrument

4.2 Single endogenous variable – more than one continuous instrument

4.3 Testing for Endogeneity and Overidentifying Restrictions

4 Instrumental Variables

4.1 Single endogenous variable – One continuous instrument

• Instrumental Variables (IV) estimation is used when a model

$$Y = \beta_0 + \beta_1 X + u \qquad (1)$$

has an endogenous X, that is, whenever $\mathrm{Cov}(X, u) \neq 0$

• In particular, IV can be used to address the problem of omitted variable bias

• For example, we are concerned that educ in a wage equation may be an endogenous variable and that the OLS coefficient over-estimates the returns to education

. use c:\data\card; /* sample of men in 1976 as in Wooldridge example 15.4 */

. reg lwage educ exper expersq black south smsa smsa66 reg661-reg668 ;

Source | SS df MS Number of obs = 3010
-------------+------------------------------ F( 15, 2994) = 85.48
Model | 177.695591 15 11.8463727 Prob > F = 0.0000
Residual | 414.946054 2994 .138592536 R-squared = 0.2998
-------------+------------------------------ Adj R-squared = 0.2963
Total | 592.641645 3009 .196956346 Root MSE = .37228

------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .0746933 .0034983 21.35 0.000 .0678339 .0815527
exper | .084832 .0066242 12.81 0.000 .0718435 .0978205
expersq | -.002287 .0003166 -7.22 0.000 -.0029079 -.0016662
black | -.1990123 .0182483 -10.91 0.000 -.2347927 -.1632318
south | -.147955 .0259799 -5.69 0.000 -.1988952 -.0970148
smsa | .1363845 .0201005 6.79 0.000 .0969724 .1757967
smsa66 | .0262417 .0194477 1.35 0.177 -.0118905 .0643739
reg661 | -.1185698 .0388301 -3.05 0.002 -.194706 -.0424335
reg662 | -.0222026 .0282575 -0.79 0.432 -.0776088 .0332036
reg663 | .0259703 .0273644 0.95 0.343 -.0276846 .0796251
reg664 | -.0634942 .0356803 -1.78 0.075 -.1334546 .0064662
reg665 | .0094551 .0361174 0.26 0.794 -.0613623 .0802725
reg666 | .0219476 .0400984 0.55 0.584 -.0566755 .1005708
reg667 | -.0005887 .0393793 -0.01 0.988 -.077802 .0766245
reg668 | -.1750058 .0463394 -3.78 0.000 -.265866 -.0841456
_cons | 4.739377 .0715282 66.26 0.000 4.599127 4.879626
------------------------------------------------------------------------------

• Additionally, IV can be used to solve the classic errors-in-variables problem

• But what is an instrumental variable?

• In order for a variable, Z, to serve as a valid instrument for X, the following must be true

• Assumption 1: Exclusion Restriction. The instrument must be exogenous, that is, uncorrelated with the error term, $\mathrm{Cov}(Z, u) = 0$

• Assumption 2: Instrument Relevance. The instrument must be correlated with the endogenous variable X, that is, $\mathrm{Cov}(Z, X) \neq 0$

• How do we know that Z is a valid instrument?

• The main problem is that, since u is unobserved, we have to use common sense and economic theory to decide whether it makes sense to assume $\mathrm{Cov}(Z, u) = 0$

• In the case of multiple instruments, we can use the overidentification test below

• However, we can test whether $\mathrm{Cov}(Z, X) \neq 0$

• We simply test $H_0: \pi_1 = 0$ in the regression

$$X = \pi_0 + \pi_1 Z + v \qquad (2)$$

• This regression is called the first-stage regression

• Card (1995) used proximity to a four-year college, nearc4, as an instrument for education

. reg educ nearc4 exper expersq black south smsa smsa66 reg661-reg668 ;

Source | SS df MS Number of obs = 3010
-------------+------------------------------ F( 15, 2994) = 182.13
Model | 10287.6179 15 685.841194 Prob > F = 0.0000
Residual | 11274.4622 2994 3.76568542 R-squared = 0.4771
-------------+------------------------------ Adj R-squared = 0.4745
Total | 21562.0801 3009 7.16586243 Root MSE = 1.9405

------------------------------------------------------------------------------
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nearc4 | .3198989 .0878638 3.64 0.000 .1476194 .4921785
exper | -.4125334 .0336996 -12.24 0.000 -.4786101 -.3464566
expersq | .0008686 .0016504 0.53 0.599 -.0023674 .0041046
black | -.9355287 .0937348 -9.98 0.000 -1.11932 -.7517377
south | -.0516126 .1354284 -0.38 0.703 -.3171548 .2139296
smsa | .4021825 .1048112 3.84 0.000 .1966732 .6076918
smsa66 | .0254805 .1057692 0.24 0.810 -.1819071 .2328682
reg661 | -.210271 .2024568 -1.04 0.299 -.6072395 .1866975
reg662 | -.2889073 .1473395 -1.96 0.050 -.5778042 -.0000105
reg663 | -.2382099 .1426357 -1.67 0.095 -.5178838 .0414639
reg664 | -.093089 .1859827 -0.50 0.617 -.4577559 .2715779
reg665 | -.4828875 .1881872 -2.57 0.010 -.8518767 -.1138982
reg666 | -.5130857 .2096352 -2.45 0.014 -.9241293 -.1020421
reg667 | -.4270887 .2056208 -2.08 0.038 -.8302611 -.0239163
reg668 | .3136204 .2416739 1.30 0.194 -.1602434 .7874841
_cons | 16.84852 .2111222 79.80 0.000 16.43456 17.26248
------------------------------------------------------------------------------

. test nearc4;

( 1) nearc4 = 0

F( 1, 2994) = 13.26
Prob > F = 0.0003

• Rule of thumb: you need to worry about weak instruments if the first-stage F-statistic is less than 10
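A sketch of the first-stage F test with a single instrument and no other regressors (a simplification: the Stata first stage above also includes exper, black and the other controls). With one restriction, F is the square of the t statistic on π1, which matches the output above: 3.64² ≈ 13.25, versus F = 13.26. The two simulated instruments and their strengths are hypothetical.

```python
import math
import random

random.seed(4)


def first_stage_f(z, x):
    """F statistic for H0: pi1 = 0 in X = pi0 + pi1*Z + v (one instrument).

    With a single restriction, F equals the squared t statistic on pi1.
    """
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    pi1 = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)) / szz
    pi0 = mx - pi1 * mz
    ssr = sum((xi - pi0 - pi1 * zi) ** 2 for zi, xi in zip(z, x))
    se_pi1 = math.sqrt(ssr / (n - 2) / szz)
    t = pi1 / se_pi1
    return t * t, t


n = 1_000
z = [random.gauss(0, 1) for _ in range(n)]
x_strong = [0.5 * zi + random.gauss(0, 1) for zi in z]  # hypothetical strong instrument
x_weak = [0.01 * zi + random.gauss(0, 1) for zi in z]   # hypothetical weak instrument

f_strong, t_strong = first_stage_f(z, x_strong)
f_weak, t_weak = first_stage_f(z, x_weak)
for label, f in [("strong", f_strong), ("weak", f_weak)]:
    verdict = "worry about weak IV" if f < 10 else "passes rule of thumb"
    print(f"{label}: F = {f:.1f} -> {verdict}")
```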

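• Given equation (1) and our assumptions 1 and 2,

$$\mathrm{Cov}(Z, Y) = \mathrm{Cov}[Z, (\beta_0 + \beta_1 X + u)] = \beta_1 \mathrm{Cov}(Z, X) + \mathrm{Cov}(Z, u) = \beta_1 \mathrm{Cov}(Z, X),$$

since $\mathrm{Cov}(Z, u) = 0$ by assumption 1, so (dividing by $\mathrm{Cov}(Z, X) \neq 0$, which assumption 2 guarantees)

$$\beta_1 = \frac{\mathrm{Cov}(Z, Y)}{\mathrm{Cov}(Z, X)}$$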

• Equivalently, this yields the conditions for a method-of-moments estimator

$$E(u) = E(Y - \beta_0 - \beta_1 X) = m_1 = 0$$
$$E(Zu) = E[Z(Y - \beta_0 - \beta_1 X)] = m_2 = 0$$

Either way, the IV estimator for β1 is

$$\hat{\beta}_1^{IV} = \frac{\sum_{i=1}^{n} (Z_i - \bar{Z})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (Z_i - \bar{Z})(X_i - \bar{X})} \qquad (3)$$
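To check numerically that the method-of-moments conditions and formula (3) give the same estimator, the sketch below solves the two sample moment conditions as a 2×2 linear system (Cramer's rule) and compares the slope with the covariance ratio. The simulated data are hypothetical.

```python
import random

random.seed(5)
n = 5_000
z = [random.gauss(0, 1) for _ in range(n)]
conf = [random.gauss(0, 1) for _ in range(n)]  # omitted factor inside u
x = [1 + 0.8 * zi + ci + random.gauss(0, 1) for zi, ci in zip(z, conf)]
y = [2 + 0.5 * xi + ci + random.gauss(0, 1) for xi, ci in zip(x, conf)]

# Sample analogue of the two moment conditions:
#   sum(Y - b0 - b1*X) = 0  and  sum(Z*(Y - b0 - b1*X)) = 0
# i.e. [[n, sum X], [sum Z, sum Z*X]] @ [b0, b1] = [sum Y, sum Z*Y]
sx, sy, sz = sum(x), sum(y), sum(z)
szx = sum(zi * xi for zi, xi in zip(z, x))
szy = sum(zi * yi for zi, yi in zip(z, y))
det = n * szx - sx * sz
b1_mm = (n * szy - sz * sy) / det   # Cramer's rule
b0_mm = (sy * szx - sx * szy) / det

# Equation (3): ratio of cross-products in deviation form
zbar, xbar, ybar = sz / n, sx / n, sy / n
num = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
den = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
b1_formula = num / den

print(f"method of moments: b0 = {b0_mm:.3f}, b1 = {b1_mm:.4f}; formula (3): {b1_formula:.4f}")
```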

. ivreg lwage (educ=nearc4) exper expersq black south smsa smsa66 reg661-reg668 ;

Instrumental variables (2SLS) regression

Source | SS df MS Number of obs = 3010
-------------+------------------------------ F( 15, 2994) = 51.01
Model | 141.146813 15 9.40978752 Prob > F = 0.0000
Residual | 451.494832 2994 .150799877 R-squared = 0.2382
-------------+------------------------------ Adj R-squared = 0.2343
Total | 592.641645 3009 .196956346 Root MSE = .38833

------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1315038 .0549637 2.39 0.017 .0237335 .2392742
exper | .1082711 .0236586 4.58 0.000 .0618824 .1546598
expersq | -.0023349 .0003335 -7.00 0.000 -.0029888 -.001681
black | -.1467757 .0538999 -2.72 0.007 -.2524603 -.0410912
south | -.1446715 .0272846 -5.30 0.000 -.19817 -.091173
smsa | .1118083 .031662 3.53 0.000 .0497269 .1738898
smsa66 | .0185311 .0216086 0.86 0.391 -.0238381 .0609003
reg661 | -.1078142 .0418137 -2.58 0.010 -.1898007 -.0258278
reg662 | -.0070465 .0329073 -0.21 0.830 -.0715696 .0574767
reg663 | .0404445 .0317806 1.27 0.203 -.0218694 .1027585
reg664 | -.0579172 .0376059 -1.54 0.124 -.1316532 .0158189
reg665 | .0384577 .0469387 0.82 0.413 -.0535777 .130493
reg666 | .0550887 .0526597 1.05 0.296 -.0481642 .1583416
reg667 | .026758 .0488287 0.55 0.584 -.0689832 .1224992
reg668 | -.1908912 .0507113 -3.76 0.000 -.2903238 -.0914586
_cons | 3.773965 .934947 4.04 0.000 1.940762 5.607169
------------------------------------------------------------------------------
Instrumented: educ
Instruments: exper expersq black south smsa smsa66 reg661 reg662 reg663
reg664 reg665 reg666 reg667 reg668 nearc4
------------------------------------------------------------------------------

• Notice that $\hat{\beta}_{educ}^{IV} = 0.132 > \hat{\beta}_{educ}^{OLS} = 0.075$ (see the LATE discussion below)

• Which estimator should we prefer, IV or OLS?



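• When the regressor is endogenous, $\mathrm{Cov}(X, u) \neq 0$, IV estimation (with a valid instrument) is consistent, while OLS is inconsistent:

$$\mathrm{plim}\, \hat{\beta}_1^{OLS} = \beta_1 + \mathrm{Corr}(X, u) \cdot \frac{\sigma_u}{\sigma_X} \qquad (4)$$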
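Equation (4) can be checked by simulation. In this hypothetical design X = c + e and u = 0.5c + w, with c, e and w independent standard normals, so Var(X) = 2, Var(u) = 1.25 and Cov(X, u) = 0.5, and (4) predicts a probability limit of β1 + 0.25.

```python
import math
import random

random.seed(6)
n = 20_000
beta0, beta1 = 1.0, 2.0  # hypothetical

# X = c + e and u = 0.5*c + w share the common component c, so Cov(X, u) = 0.5
c = [random.gauss(0, 1) for _ in range(n)]
x = [ci + random.gauss(0, 1) for ci in c]
u = [0.5 * ci + random.gauss(0, 1) for ci in c]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# Population moments implied by the design above
var_x, var_u, cov_xu = 2.0, 1.25, 0.5
corr_xu = cov_xu / math.sqrt(var_x * var_u)
plim_ols = beta1 + corr_xu * math.sqrt(var_u) / math.sqrt(var_x)  # equation (4): 2.25

mx, my = sum(x) / n, sum(y) / n
b_ols = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
print(f"equation (4) predicts {plim_ols:.3f}; simulated OLS slope {b_ols:.3f}")
```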

• But because the first-stage R² is less than 1, IV standard errors are larger than OLS standard errors

• The stronger the correlation between Z and X (a strong instrument), the smaller the IV standard errors

• On the other hand, if the instrument is weak, the IV standard errors will be large
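A small Monte Carlo sketch of the precision point. In the homoskedastic case the asymptotic variance of the IV slope is roughly σu² / (n σX² ρ²XZ), that is, the OLS variance divided by the squared correlation between X and Z, so a weak instrument inflates the sampling spread. The first-stage coefficients 1.0 (strong) and 0.1 (weak), the sample size and the number of replications are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(7)
beta1 = 0.5  # hypothetical true slope


def iv_estimate(pi1, n):
    """Draw one sample with first-stage coefficient pi1 and return the IV slope."""
    z = [random.gauss(0, 1) for _ in range(n)]
    c = [random.gauss(0, 1) for _ in range(n)]  # confounder in both X and u
    x = [pi1 * zi + ci + random.gauss(0, 1) for zi, ci in zip(z, c)]
    y = [beta1 * xi + ci + random.gauss(0, 1) for xi, ci in zip(x, c)]
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    return szy / szx


def iqr(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


reps, n = 200, 500
strong = [iv_estimate(1.0, n) for _ in range(reps)]
weak = [iv_estimate(0.1, n) for _ in range(reps)]
# The interquartile range is used because weak-IV estimates have very fat tails
print(f"IQR of IV estimates: strong {iqr(strong):.3f}, weak {iqr(weak):.3f}")
```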

• When the instrument is not really exogenous, i.e. if our assumption that $\mathrm{Cov}(Z, u) = 0$ is false,

• then the IV estimator will be inconsistent, too:

$$\mathrm{plim}\, \hat{\beta}_1^{IV} = \beta_1 + \frac{\mathrm{Corr}(Z, u)}{\mathrm{Corr}(Z, X)} \cdot \frac{\sigma_u}{\sigma_X} \qquad (5)$$

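• We will prefer IV if $|\mathrm{Corr}(Z, u)/\mathrm{Corr}(Z, X)| < |\mathrm{Corr}(X, u)|$, i.e. if the asymptotic bias in (5) is smaller in absolute value than the one in (4).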
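Equations (4) and (5) can be turned into a small comparison helper. The correlations in the usage lines are hypothetical; the point is that even a slightly invalid instrument can produce more bias than OLS when it is weak, because Corr(Z, u) is divided by Corr(Z, X).

```python
def ols_asy_bias(corr_xu, sd_u, sd_x):
    """Asymptotic bias of OLS from equation (4)."""
    return corr_xu * sd_u / sd_x


def iv_asy_bias(corr_zu, corr_zx, sd_u, sd_x):
    """Asymptotic bias of IV from equation (5)."""
    return corr_zu / corr_zx * sd_u / sd_x


def prefer_iv(corr_xu, corr_zu, corr_zx, sd_u=1.0, sd_x=1.0):
    """True when IV has the smaller absolute asymptotic bias."""
    return (abs(iv_asy_bias(corr_zu, corr_zx, sd_u, sd_x))
            < abs(ols_asy_bias(corr_xu, sd_u, sd_x)))


# Hypothetical correlations: X mildly endogenous, Z only slightly invalid
print(prefer_iv(corr_xu=0.2, corr_zu=0.05, corr_zx=0.5))  # IV bias 0.1 < 0.2 -> True
print(prefer_iv(corr_xu=0.2, corr_zu=0.05, corr_zx=0.1))  # IV bias 0.5 > 0.2 -> False
```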



• Potential problems with IV estimation:

– IV can be very biased (much more than OLS) when the instrument is not truly exogenous

– Even instruments that are randomly assigned can be invalid if they affect the outcome in some way other than through X (e.g. when the assignment is not double-blind)

– IV estimates are not always representative of the whole population, but may capture a local average treatment effect (LATE), e.g. proximity to school may affect lower-income students more, and compulsory schooling may affect marginal students

– Specification searching and publication bias tend to inflate reported IV estimates: because IV standard errors are larger, only large estimates reach statistical significance.

4.2 Single endogenous variable – more than one continuous instrument

• Consider the following structural model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u_1 \qquad (6)$$

where X1 is an endogenous variable and X2 is an exogenous variable

• Suppose now that we have two exogenous variables excluded from equation (6)

$$X_1 = \pi_0 + \pi_1 Z_1 + \pi_2 Z_2 + v \qquad (7)$$

where Z1 and Z2 are valid instruments in that they do not appear in the structural model and are uncorrelated with the structural error term u1, but are correlated with X1

• With more than one instrument, the IV estimator is also called the two-stage least squares (2SLS) estimator

• In our returns to education example, we can add proximity to a two-year college, nearc2

. reg educ nearc4 nearc2 exper expersq black south smsa smsa66 reg661-reg668 ;

Source | SS df MS Number of obs = 3010
-------------+------------------------------ F( 16, 2993) = 170.99
Model | 10297.1164 16 643.569774 Prob > F = 0.0000
Residual | 11264.9637 2993 3.76377002 R-squared = 0.4776
-------------+------------------------------ Adj R-squared = 0.4748
Total | 21562.0801 3009 7.16586243 Root MSE = 1.94

------------------------------------------------------------------------------
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nearc4 | .3205819 .0878425 3.65 0.000 .148344 .4928197
nearc2 | .1229986 .0774256 1.59 0.112 -.0288142 .2748114
exper | -.4122915 .0336914 -12.24 0.000 -.4783521 -.3462309
expersq | .0008479 .00165 0.51 0.607 -.0023874 .0040832
black | -.9451729 .0939073 -10.06 0.000 -1.129302 -.7610434
south | -.0419115 .1355316 -0.31 0.757 -.3076561 .2238331
smsa | .4013708 .1047858 3.83 0.000 .1959113 .6068303
smsa66 | .0000782 .1069445 0.00 0.999 -.2096139 .2097704
reg661 | -.1687829 .2040832 -0.83 0.408 -.5689405 .2313747
reg662 | -.269031 .1478324 -1.82 0.069 -.5588944 .0208325
reg663 | -.1902114 .1457652 -1.30 0.192 -.4760216 .0955987
reg664 | -.037715 .1891745 -0.20 0.842 -.4086403 .3332102
reg665 | -.4371387 .1903306 -2.30 0.022 -.8103307 -.0639467
reg666 | -.5022265 .2096933 -2.40 0.017 -.9133841 -.0910688
reg667 | -.3775317 .207922 -1.82 0.070 -.7852162 .0301529
reg668 | .3820043 .2454171 1.56 0.120 -.0991991 .8632076
_cons | 16.77306 .2163481 77.53 0.000 16.34885 17.19727
------------------------------------------------------------------------------

. predict peduc;
(option xb assumed; fitted values)
. predict reseduc, res;

. test nearc4=nearc2=0;

( 1) nearc4 - nearc2 = 0
( 2) nearc4 = 0

F( 2, 2993) = 7.89
Prob > F = 0.0004

• Here nearc2 is a weak instrument (t = 1.59), and adding it lowers the joint first-stage F-statistic to 7.89, below the rule-of-thumb value of 10, so we would prefer the IV estimate that uses only nearc4

• In a more general case, we could use either Z1 or Z2 as an instrument


• But the best instrument is a linear combination of all of the exogenous variables, including X2:

$$X_1^{*} = \pi_0 + \pi_1 Z_1 + \pi_2 Z_2 + \pi_3 X_2$$

• We can estimate $X_1^{*}$ by regressing X1 on Z1, Z2 and X2, a regression that is an example of a reduced form equation:

$$X_1 = \pi_0 + \pi_1 Z_1 + \pi_2 Z_2 + \pi_3 X_2 + v_2$$

• If we use the fitted values $\hat{X}_1$ as an instrument for X1 in the structural model, we will get the same coefficient as 2SLS (see ivreg2 below)

• The name two-stage least squares comes from the fact that this estimation strategy can be done in two steps: first regress X1 on all exogenous variables to obtain $\hat{X}_1$, then regress Y on $\hat{X}_1$ and X2
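A sketch of 2SLS with two instruments, computed in two steps on simulated data. The design is hypothetical: Z1 and Z2 enter the first stage with coefficients 0.6 and 0.3, and there is no X2, which keeps the first stage a 2×2 system. It also checks that using X̂1 as a single instrument for X1 reproduces the 2SLS slope: the first-stage residuals are orthogonal to X̂1, so Cov(X̂1, X1) = Var(X̂1).

```python
import random

random.seed(8)
n = 20_000
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
c = [random.gauss(0, 1) for _ in range(n)]  # confounder shared by X1 and u1
x = [0.6 * a + 0.3 * b + ci + random.gauss(0, 1) for a, b, ci in zip(z1, z2, c)]
y = [1 + 0.5 * xi + ci + random.gauss(0, 1) for xi, ci in zip(x, c)]  # true beta1 = 0.5


def mean(v):
    return sum(v) / len(v)


def centered(v):
    m = mean(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Stage 1: regress X on Z1, Z2 (with intercept) via 2x2 normal equations on centered data
c1, c2, cx = centered(z1), centered(z2), centered(x)
s11, s12, s22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
s1x, s2x = dot(c1, cx), dot(c2, cx)
det = s11 * s22 - s12 ** 2
p1 = (s22 * s1x - s12 * s2x) / det
p2 = (s11 * s2x - s12 * s1x) / det
mx = mean(x)
x_hat = [mx + p1 * a + p2 * b for a, b in zip(c1, c2)]

# Stage 2: OLS of Y on X-hat
ch, cy = centered(x_hat), centered(y)
b1_2sls = dot(ch, cy) / dot(ch, ch)

# Same number from using X-hat as a single instrument for X: Cov(X-hat,Y)/Cov(X-hat,X)
b1_iv_xhat = dot(ch, cy) / dot(ch, cx)
print(f"2SLS slope {b1_2sls:.4f}; IV with X-hat as instrument {b1_iv_xhat:.4f}")
```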

. ivreg lwage (educ=peduc) exper expersq black south smsa smsa66 reg661-reg668;

Instrumental variables (2SLS) regression

Source | SS df MS Number of obs = 3010
-------------+------------------------------ F( 15, 2994) = 47.07
Model | 100.86894 15 6.724596 Prob > F = 0.0000
Residual | 491.772705 2994 .16425274 R-squared = 0.1702
-------------+------------------------------ Adj R-squared = 0.1660
Total | 592.641645 3009 .196956346 Root MSE = .40528

------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1570594 .0525783 2.99 0.003 .0539662 .2601526
exper | .1188149 .0228061 5.21 0.000 .0740977 .1635321
expersq | -.0023565 .0003475 -6.78 0.000 -.0030379 -.0016751
black | -.1232778 .0521501 -2.36 0.018 -.2255313 -.0210242
south | -.1431945 .0284448 -5.03 0.000 -.1989678 -.0874212
smsa | .100753 .0315193 3.20 0.001 .0389512 .1625548
smsa66 | .0150626 .022336 0.67 0.500 -.0287328 .058858
reg661 | -.102976 .0434224 -2.37 0.018 -.1881167 -.0178353
reg662 | -.0002286 .0337943 -0.01 0.995 -.066491 .0660337
reg663 | .0469556 .032649 1.44 0.150 -.0170612 .1109725
reg664 | -.0554084 .0391828 -1.41 0.157 -.1322364 .0214196
reg665 | .0515042 .0475678 1.08 0.279 -.0417647 .144773
reg666 | .0699968 .0533049 1.31 0.189 -.0345212 .1745148
reg667 | .0390596 .0497499 0.79 0.432 -.0584878 .136607
reg668 | -.1980371 .052535 -3.77 0.000 -.3010454 -.0950287
_cons | 3.339686 .894538 3.73 0.000 1.585715 5.093658
------------------------------------------------------------------------------
Instrumented: educ
Instruments: exper expersq black south smsa smsa66 reg661 reg662 reg663
reg664 reg665 reg666 reg667 reg668 peduc
------------------------------------------------------------------------------

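• While the coefficients are the same, the standard errors from doing 2SLS by hand (an OLS second stage on the fitted values) are incorrect, because the residual variance is then computed with the fitted values instead of educ, so let STATA do it for you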
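Why by-hand standard errors go wrong can be seen in a short sketch: the second-stage OLS coefficients are the 2SLS coefficients, but the residuals in that regression are built with X̂ instead of X. The sketch uses the homoskedastic formula σ̂² / Σ(X̂ − mean)² in both cases and changes only the residuals. The simulated design is hypothetical; in other designs the naive standard error can be too small rather than too large.

```python
import math
import random

random.seed(9)
n = 5_000
z = [random.gauss(0, 1) for _ in range(n)]
c = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + ci + random.gauss(0, 1) for zi, ci in zip(z, c)]
y = [2 + 0.5 * xi + ci + random.gauss(0, 1) for xi, ci in zip(x, c)]


def mean(v):
    return sum(v) / len(v)


# First stage and fitted values (X-hat has the same mean as X)
mz, mx, my = mean(z), mean(x), mean(y)
pi1 = sum((a - mz) * (b - mx) for a, b in zip(z, x)) / sum((a - mz) ** 2 for a in z)
x_hat = [mx + pi1 * (a - mz) for a in z]

# Second stage by hand: OLS of Y on X-hat
sxx_hat = sum((h - mx) ** 2 for h in x_hat)
b1 = sum((h - mx) * (yi - my) for h, yi in zip(x_hat, y)) / sxx_hat
b0 = my - b1 * mx


def slope_se(resid):
    """Homoskedastic SE for b1 using sigma^2 = SSR/(n-2) and the X-hat variation."""
    s2 = sum(r * r for r in resid) / (n - 2)
    return math.sqrt(s2 / sxx_hat)


# Naive: residuals from the second-stage regression, built with X-hat
se_naive = slope_se([yi - b0 - b1 * h for yi, h in zip(y, x_hat)])
# Correct 2SLS: same coefficients, but residuals built with the actual X
se_2sls = slope_se([yi - b0 - b1 * xi for yi, xi in zip(y, x)])
print(f"b1 = {b1:.3f}; naive SE {se_naive:.4f} vs 2SLS SE {se_2sls:.4f}")
```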

• Some economists like to interpret the first stage of 2SLS as a way to “purge” X1 of its correlation with u1 before doing the second stage regression

• The method extends to multiple endogenous variables (say k), but we need at least as many excluded exogenous variables (instruments) as there are endogenous variables in the structural equation

• The standard IV (one instrument) and 2SLS (one linear combination of many instruments) estimators are special cases of the GMM estimator, where l instruments give us a set of l moments $E[Z_j u] = m_j = 0$, $j = 1, \ldots, l$.

• In a just-identified model (l = k), GMM will be identical to IV, but when there are more instruments than endogenous variables (l > k), we will need a weighting matrix that accounts for the correlations among the moment conditions when errors are not i.i.d.

• When we use ivreg2 with the option gmm, STATA will compute 2SLS residuals in a first step and use these residuals to compute a weighting matrix that gives the most efficient feasible estimate.

• Thus for overidentified models, the GMM approach makes more efficient use of the information in the l moment conditions than the standard 2SLS approach, which reduces them to k instruments, and it is efficient in the presence of heteroskedasticity of unknown form.
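A sketch of two-step efficient GMM for one endogenous regressor and two instruments, under simplifying assumptions: the simulated data have mean zero and there is no intercept, so the estimator is a scalar, b = a′Wc / a′Wa, where a and c are the sample means of Z·X and Z·Y. Step 1 uses W = (Z′Z/n)⁻¹, which reproduces 2SLS; step 2 uses the inverse of a heteroskedasticity-robust estimate of the moment covariance built from the 2SLS residuals. This mirrors the idea described above but is not the ivreg2 implementation, and the data-generating process is hypothetical.

```python
import math
import random

random.seed(10)
n = 20_000
beta = 0.5  # hypothetical true coefficient

z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
c = [random.gauss(0, 1) for _ in range(n)]
x = [0.6 * a + 0.3 * b + ci + random.gauss(0, 1) for a, b, ci in zip(z1, z2, c)]
# Heteroskedastic error: its variance depends on |Z1|, but E[Z*u] = 0 still holds
u = [ci + random.gauss(0, 1) * (1 + abs(a)) for ci, a in zip(c, z1)]
y = [beta * xi + ui for xi, ui in zip(x, u)]
Z = list(zip(z1, z2))


def inv2(m):
    (p, q), (r, s) = m
    d = p * s - q * r
    return [[s / d, -q / d], [-r / d, p / d]]


def quad(a, w, b):
    """Return a' W b for 2-vectors a, b and a 2x2 matrix W."""
    return sum(a[i] * w[i][j] * b[j] for i in range(2) for j in range(2))


# Sample moments: g(b) = c_zy - b * a_zx, with a_zx = mean(Z*X), c_zy = mean(Z*Y)
a_zx = [sum(zi[j] * xi for zi, xi in zip(Z, x)) / n for j in range(2)]
c_zy = [sum(zi[j] * yi for zi, yi in zip(Z, y)) / n for j in range(2)]


def gmm_slope(w):
    """Minimiser of g(b)' W g(b) for a scalar b."""
    return quad(a_zx, w, c_zy) / quad(a_zx, w, a_zx)


# Step 1: W = (Z'Z/n)^-1 gives 2SLS
zz = [[sum(zi[i] * zi[j] for zi in Z) / n for j in range(2)] for i in range(2)]
b_2sls = gmm_slope(inv2(zz))

# Step 2: heteroskedasticity-robust weight S^-1 built from the 2SLS residuals
res = [yi - b_2sls * xi for xi, yi in zip(x, y)]
s_mat = [[sum(r * r * zi[i] * zi[j] for r, zi in zip(res, Z)) / n for j in range(2)]
         for i in range(2)]
s_inv = inv2(s_mat)
b_gmm = gmm_slope(s_inv)

# Hansen J statistic: n * g' S^-1 g, chi-square with l - k = 1 df under H0
g = [c_zy[j] - b_gmm * a_zx[j] for j in range(2)]
hansen_j = n * quad(g, s_inv, g)
p_value = math.erfc(math.sqrt(hansen_j / 2))  # chi-square(1) upper tail
print(f"2SLS {b_2sls:.4f}, two-step GMM {b_gmm:.4f}, J = {hansen_j:.3f} (p = {p_value:.3f})")
```

With l = 2 instruments and k = 1 endogenous regressor, the J statistic has one degree of freedom, as in the ivreg2 output below.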

. ivreg2 lwage (educ=nearc2 nearc4) exper expersq black south smsa smsa66
reg661-reg668, gmm ;

GMM estimation
--------------

Number of obs = 3010
F( 15, 2994) = 51.43
Prob > F = 0.0000
Total (centered) SS = 592.6416447 Centered R2 = 0.1760
Total (uncentered) SS = 118616.3653 Uncentered R2 = 0.9959
Residual SS = 488.3650354 Root MSE = .4028

------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1552102 .052387 2.96 0.003 .0525336 .2578867
exper | .1179614 .0228779 5.16 0.000 .0731215 .1628013
expersq | -.0023521 .0003674 -6.40 0.000 -.0030721 -.001632
black | -.1257875 .0514422 -2.45 0.014 -.2266124 -.0249627
south | -.1431677 .0301873 -4.74 0.000 -.2023336 -.0840018
smsa | .1014907 .0313552 3.24 0.001 .0400356 .1629459
smsa66 | .0146266 .0211085 0.69 0.488 -.0267452 .0559984
reg661 | -.1048084 .0425444 -2.46 0.014 -.188194 -.0214228
reg662 | -.0006335 .0345211 -0.02 0.985 -.0682936 .0670267
reg663 | .0464749 .0335225 1.39 0.166 -.0192279 .1121778
reg664 | -.0546612 .0408873 -1.34 0.181 -.1347988 .0254764
reg665 | .0507403 .0506229 1.00 0.316 -.0484788 .1499593
reg666 | .0670444 .0533888 1.26 0.209 -.0375958 .1716845
reg667 | .0369472 .0513968 0.72 0.472 -.0637886 .137683
reg668 | -.1992538 .0522223 -3.82 0.000 -.3016077 -.0968999
_cons | 3.372118 .8904517 3.79 0.000 1.626865 5.117371
------------------------------------------------------------------------------
Anderson canon. corr. LR statistic (identification/IV relevance test): 15.834
Chi-sq(2) P-val = 0.0004
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments): 1.269
Chi-sq(1) P-val = 0.2600
------------------------------------------------------------------------------
Instrumented: educ
Included instruments: exper expersq black south smsa smsa66 reg661 reg662 reg663
reg664 reg665 reg666 reg667 reg668
Excluded instruments: nearc2 nearc4
------------------------------------------------------------------------------

4.3 Testing for Endogeneity and Overidentifying Restrictions

• Since OLS is preferred to IV when there is no endogeneity problem (OLS is then more efficient), we would like to be able to test for endogeneity

• If we do not have endogeneity, both OLS and IV are consistent



• The idea of the Hausman test is to see whether the estimates from OLS and IV are significantly different

• If X1 is endogenous, then v2 (from the first-stage equation) and u1 (from the structural model) will be correlated

• This test is easily done by including the residual from the first stage in the OLS regression

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \delta \hat{v}_2 + \text{error}$$

and testing $H_0: \delta = 0$, i.e. that there is no correlation between u1 and v2 (X1 is exogenous), using a t statistic
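A sketch of this regression-based (control function) endogeneity test on simulated data, where the hypothetical design makes X endogenous through a shared component c. It also confirms a feature visible in the Stata output: the coefficient on X in the augmented regression equals the 2SLS estimate (.1570594 in both). The small Gauss-Jordan helper is only there to keep the sketch self-contained.

```python
import math
import random

random.seed(11)
n = 5_000
z = [random.gauss(0, 1) for _ in range(n)]
c = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + ci + random.gauss(0, 1) for zi, ci in zip(z, c)]
y = [2 + 0.5 * xi + ci + random.gauss(0, 1) for xi, ci in zip(x, c)]  # X endogenous via c


def inverse(m):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    k = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[k:] for row in a]


def ols(rows, ys):
    """OLS coefficients and homoskedastic standard errors."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, ys)) for i in range(k)]
    xtx_inv = inverse(xtx)
    b = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    ssr = sum((yi - sum(bj * rj for bj, rj in zip(b, r))) ** 2 for r, yi in zip(rows, ys))
    s2 = ssr / (len(ys) - k)
    return b, [math.sqrt(s2 * xtx_inv[i][i]) for i in range(k)]


# First stage: X on (1, Z); keep the residual v-hat
(pi0, pi1), _ = ols([[1.0, zi] for zi in z], x)
v_hat = [xi - pi0 - pi1 * zi for xi, zi in zip(x, z)]

# Augmented OLS: Y on (1, X, v-hat); the t statistic on v-hat tests exogeneity
b, se = ols([[1.0, xi, vi] for xi, vi in zip(x, v_hat)], y)
t_delta = b[2] / se[2]

# 2SLS (single instrument) for comparison: Cov(Z,Y)/Cov(Z,X)
mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
b_iv = (sum((a - mz) * (d - my) for a, d in zip(z, y))
        / sum((a - mz) * (d - mx) for a, d in zip(z, x)))
print(f"coef on X = {b[1]:.4f} (2SLS {b_iv:.4f}), delta = {b[2]:.3f}, t = {t_delta:.2f}")
```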

. reg lwage educ reseduc exper expersq black south smsa smsa66 reg661-reg668 ;

Source | SS df MS Number of obs = 3010
-------------+------------------------------ F( 16, 2993) = 80.37
Model | 178.100803 16 11.1313002 Prob > F = 0.0000
Residual | 414.540842 2993 .138503455 R-squared = 0.3005
-------------+------------------------------ Adj R-squared = 0.2968
Total | 592.641645 3009 .196956346 Root MSE = .37216

------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1570594 .0482814 3.25 0.001 .0623912 .2517275
reseduc | -.0828005 .0484086 -1.71 0.087 -.177718 .0121169
exper | .1188149 .0209423 5.67 0.000 .0777521 .1598776
expersq | -.0023565 .0003191 -7.38 0.000 -.0029822 -.0017308
black | -.1232778 .0478882 -2.57 0.010 -.2171749 -.0293806
south | -.1431945 .0261202 -5.48 0.000 -.1944098 -.0919791
smsa | .100753 .0289435 3.48 0.001 .0440018 .1575042
smsa66 | .0150626 .0205106 0.73 0.463 -.0251538 .0552789
reg661 | -.102976 .0398738 -2.58 0.010 -.1811588 -.0247932
reg662 | -.0002286 .0310325 -0.01 0.994 -.0610759 .0606186
reg663 | .0469556 .0299809 1.57 0.117 -.0118296 .1057408
reg664 | -.0554084 .0359807 -1.54 0.124 -.1259578 .0151411
reg665 | .0515041 .0436804 1.18 0.238 -.0341426 .1371509
reg666 | .0699968 .0489487 1.43 0.153 -.0259797 .1659734
reg667 | .0390596 .0456842 0.85 0.393 -.050516 .1286352
reg668 | -.1980371 .0482417 -4.11 0.000 -.2926273 -.1034468
_cons | 3.339687 .821434 4.07 0.000 1.729054 4.950319
------------------------------------------------------------------------------

• With |t| = 1.71 (p = 0.087), there is moderate evidence of a negative correlation between u1 and v2 (the coefficient on reseduc is negative), so we conclude that educ is moderately endogenous

• Alternatively, you can use the STATA command hausman IV OLS, after running est store OLS following the OLS regression and est store IV following the IV regression

. hausman IV OLS, constant sigmamore;

Note: the rank of the differenced variance matrix (1) does not equal the number
of coefficients being tested (16); be sure this is what you expect, or
there may be problems computing the test. Examine the output of your
estimators for anything unexpected and possibly consider scaling your
variables so that the coefficients are on a similar scale.

---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| IV OLS Difference S.E.
-------------+----------------------------------------------------------------
educ | .1570594 .0746933 .0823661 .0481701
exper | .1188149 .084832 .0339828 .0198741
expersq | -.0023565 -.002287 -.0000694 .0000406
black | -.1232778 -.1990123 .0757345 .0442917
south | -.1431945 -.147955 .0047605 .0027841
smsa | .100753 .1363845 -.0356315 .0208383
smsa66 | .0150626 .0262417 -.0111791 .0065379
reg661 | -.102976 -.1185698 .0155938 .0091197
reg662 | -.0002286 -.0222026 .0219739 .012851
reg663 | .0469556 .0259703 .0209854 .0122728
reg664 | -.0554084 -.0634942 .0080858 .0047288
reg665 | .0515041 .0094551 .0420491 .0245915
reg666 | .0699968 .0219476 .0480492 .0281005
reg667 | .0390596 -.0005887 .0396483 .0231875
reg668 | -.1980371 -.1750058 -.0230313 .0134693
_cons | 3.339687 4.739377 -1.39969 .818579
------------------------------------------------------------------------------
b = consistent under Ho and Ha; obtained from ivreg
B = inconsistent under Ha, efficient under Ho; obtained from regress

Test: Ho: difference in coefficients not systematic

chi2(1) = (b-B)’[(V_b-V_B)^(-1)](b-B)
= 2.92
Prob>chi2 = 0.0873
(V_b-V_B is not positive definite)

• We reject the null hypothesis that educ is exogenous (uncorrelated with the error term) at the 10% level of significance (p = 0.087).
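Because the differenced variance matrix has rank 1, the statistic reduces to a scalar calculation on the educ row, which can be reproduced from the numbers printed above. The result is close to the square of the control-function t statistic on reseduc (1.71² ≈ 2.92).

```python
import math

# Numbers copied from the hausman output above (educ row)
b_iv, b_ols = 0.1570594, 0.0746933
se_diff = 0.0481701  # sqrt(V_b - V_B) for educ

h_stat = ((b_iv - b_ols) / se_diff) ** 2
p_value = math.erfc(math.sqrt(h_stat / 2))  # upper tail of chi-square(1)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")  # H = 2.92, p = 0.0873
```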

• Now, is our instrument correlated with the error term?

• If there is just one instrument for our endogenous variable, we cannot test whether the instrument is uncorrelated with the error: the model is just identified

• However, if we have multiple instruments, it is possible to test the overidentifying restrictions to see whether some of the instruments are correlated with the error, provided we are fairly sure that at least one instrument is valid (exogenous)

• With ivreg2, STATA will perform the test for you

• The idea is to regress the 2SLS residual $\hat{u}_1$ on all exogenous variables, including the instrumental variables

• If the instruments are in fact exogenous, the coefficients on the instruments and on the included exogenous variables in a regression of $\hat{u}_1$ on them should all be zero

• This is formally tested with either a chi-square statistic (n·R² from this regression, the Sargan statistic) or a J-statistic, to test H0 that all IVs are uncorrelated with u1.
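A simulated sketch of the n·R² (Sargan) version of the test, under hypothetical assumptions: two instruments, one endogenous regressor and no other covariates. In the second run Z2 enters Y directly, which violates the exclusion restriction, so the statistic should become large. The p-value uses the chi-square distribution with 2 − 1 = 1 degree of freedom.

```python
import math
import random

random.seed(12)
n = 5_000


def mean(v):
    return sum(v) / len(v)


def centered(v):
    m = mean(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def fit_two(c1, c2, cy):
    """Slopes from regressing centered cy on centered c1, c2 (intercept implied)."""
    s11, s12, s22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def sargan(direct_effect_z2):
    """n * R^2 from regressing 2SLS residuals on (1, Z1, Z2)."""
    z1 = [random.gauss(0, 1) for _ in range(n)]
    z2 = [random.gauss(0, 1) for _ in range(n)]
    c = [random.gauss(0, 1) for _ in range(n)]
    x = [0.6 * a + 0.3 * b + ci + random.gauss(0, 1) for a, b, ci in zip(z1, z2, c)]
    # direct_effect_z2 != 0 makes Z2 an invalid instrument (it enters Y directly)
    y = [1 + 0.5 * xi + direct_effect_z2 * b + ci + random.gauss(0, 1)
         for xi, b, ci in zip(x, z2, c)]
    c1, c2, cx, cy = centered(z1), centered(z2), centered(x), centered(y)
    p1, p2 = fit_two(c1, c2, cx)                    # first stage
    ch = [p1 * a + p2 * b for a, b in zip(c1, c2)]  # centered X-hat
    b1 = dot(ch, cy) / dot(ch, ch)                  # 2SLS slope
    res = [yi - b1 * xi for xi, yi in zip(cx, cy)]  # 2SLS residuals (mean zero)
    g1, g2 = fit_two(c1, c2, res)                   # auxiliary regression
    explained = [g1 * a + g2 * b for a, b in zip(c1, c2)]
    return n * dot(explained, explained) / dot(res, res)


stat_valid = sargan(0.0)
stat_invalid = sargan(0.3)
for label, stat in [("valid", stat_valid), ("invalid", stat_invalid)]:
    p = math.erfc(math.sqrt(stat / 2))  # chi-square(1) upper tail
    print(f"{label} Z2: Sargan = {stat:.2f}, p = {p:.4f}")
```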

. ivreg2 lwage (educ=nearc2 nearc4) exper expersq black south smsa smsa66
reg661-reg668 ;

Instrumental variables (2SLS) regression
----------------------------------------

Number of obs = 3010
F( 15, 2993) = 47.07
Prob > F = 0.0000
Total (centered) SS = 592.6416447 Centered R2 = 0.1702
Total (uncentered) SS = 118616.3653 Uncentered R2 = 0.9959
Residual SS = 491.7726451 Root MSE = .4

------------------------------------------------------------------------------
lwage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1570594 .0524383 3.00 0.003 .0542822 .2598366
exper | .1188149 .0227454 5.22 0.000 .0742348 .163395
expersq | -.0023565 .0003466 -6.80 0.000 -.0030358 -.0016772
black | -.1232778 .0520112 -2.37 0.018 -.225218 -.0213376
south | -.1431945 .0283691 -5.05 0.000 -.1987968 -.0875921
smsa | .100753 .0314355 3.21 0.001 .0391406 .1623654
smsa66 | .0150626 .0222765 0.68 0.499 -.0285986 .0587238
reg661 | -.102976 .0433068 -2.38 0.017 -.1878558 -.0180962
reg662 | -.0002286 .0337043 -0.01 0.995 -.0662879 .0658306
reg663 | .0469556 .0325621 1.44 0.149 -.016865 .1107763
reg664 | -.0554084 .0390786 -1.42 0.156 -.132001 .0211842
reg665 | .0515041 .0474412 1.09 0.278 -.0414789 .1444872
reg666 | .0699968 .0531631 1.32 0.188 -.0342009 .1741945
reg667 | .0390596 .0496175 0.79 0.431 -.0581889 .136308
reg668 | -.1980371 .0523952 -3.78 0.000 -.3007297 -.0953444
_cons | 3.339687 .8921571 3.74 0.000 1.591091 5.088283
------------------------------------------------------------------------------
Sargan statistic (overidentification test of all instruments): 1.248
Chi-sq(1) P-val = 0.26391
------------------------------------------------------------------------------
Instrumented: educ
Instruments: exper expersq black south smsa smsa66 reg661 reg662 reg663
reg664 reg665 reg666 reg667 reg668 nearc2 nearc4
------------------------------------------------------------------------------

. predict res1, res; /* doing it the long way */

. regress res1 exper expersq black south smsa smsa66 reg661-reg668 nearc4 nearc2 ;

Source | SS df MS Number of obs = 3010
-------------+------------------------------ F( 16, 2993) = 0.08
Model | .203922708 16 .012745169 Prob > F = 1.0000
Residual | 491.568732 2993 .16423947 R-squared = 0.0004
-------------+------------------------------ Adj R-squared = -0.0049
Total | 491.772655 3009 .163433917 Root MSE = .40526

------------------------------------------------------------------------------
res1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
exper | .0000312 .0070379 0.00 0.996 -.0137685 .0138309
expersq | -3.43e-06 .0003447 -0.01 0.992 -.0006793 .0006724
black | -.0008853 .0196167 -0.05 0.964 -.0393489 .0375784
south | .0011448 .0283118 0.04 0.968 -.0543678 .0566574
smsa | .0006683 .0218892 0.03 0.976 -.0422511 .0435877
smsa66 | -.0005942 .0223401 -0.03 0.979 -.0443978 .0432093
reg661 | .0061073 .0426319 0.14 0.886 -.0774835 .089698
reg662 | .0032348 .0308814 0.10 0.917 -.0573161 .0637857
reg663 | .0060228 .0304496 0.20 0.843 -.0536814 .065727
reg664 | .007307 .0395175 0.18 0.853 -.0701773 .0847913
reg665 | .0054381 .039759 0.14 0.891 -.0725197 .0833959
reg666 | -.0003375 .0438038 -0.01 0.994 -.0862261 .0855511
reg667 | .0052685 .0434338 0.12 0.903 -.0798946 .0904316
reg668 | .0083152 .0512663 0.16 0.871 -.0922056 .108836
nearc4 | -.0080835 .0183498 -0.44 0.660 -.044063 .0278961
nearc2 | .0165189 .0161738 1.02 0.307 -.015194 .0482318
_cons | -.0064297 .045194 -0.14 0.887 -.095044 .0821847
------------------------------------------------------------------------------

. gen overid2=_N*e(r2);

. di overid2;
1.2481526

. gen pval=chi2tail(1,overid2);

. di pval;
.26390561

. test nearc4=nearc2=0;

( 1) nearc4 - nearc2 = 0
( 2) nearc4 = 0

F( 2, 2993) = 0.62
Prob > F = 0.5376

. gen Jstat=r(df)*r(F);

. di Jstat;
1.2416182

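• Therefore, the instruments for proximity to a four-year and a two-year college (nearc4 and nearc2) pass the overidentification test (p-value of about 0.26, so we cannot reject the overidentifying restriction). However, this may not be the preferred specification, since nearc2 is a weak instrument.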
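The two "long way" numbers above can be reproduced by hand. The following is a minimal Python sketch that assumes only the sums of squares, sample size and F statistic printed in the Stata output. It computes two statistics:

- the Sargan statistic, N·R² from the regression of the 2SLS residuals on all exogenous variables;
- the J statistic, q·F, where F tests that the q = 2 excluded instruments (nearc4, nearc2) have zero coefficients.

Both are compared with a chi-square distribution whose degrees of freedom equal the number of overidentifying restrictions (2 instruments - 1 endogenous regressor = 1). The helper names are hypothetical. The closed-form tail probability only covers 1 or 2 degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))  # P(Z^2 > x) for a standard normal Z
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

def sargan_nr2(model_ss, total_ss, n_obs):
    """Hypothetical helper: N times the R-squared of the regression of the
    2SLS residuals on all exogenous variables (R^2 = model SS / total SS)."""
    return n_obs * model_ss / total_ss

def j_from_f(f_stat, n_excluded):
    """Hypothetical helper: number of excluded instruments times the F statistic
    testing that their coefficients are jointly zero."""
    return n_excluded * f_stat

# Inputs copied from the Stata output above
sargan = sargan_nr2(model_ss=0.203922708, total_ss=491.772655, n_obs=3010)
j_stat = j_from_f(f_stat=0.62, n_excluded=2)  # F as displayed (rounded); r(F) gives 1.2416182
overid_df = 2 - 1  # excluded instruments (nearc4, nearc2) minus endogenous regressors (educ)

print(f"Sargan N*R^2 = {sargan:.4f}, p = {chi2_sf(sargan, overid_df):.4f}")  # Stata: 1.2481526, .26390561
print(f"J = q*F      = {j_stat:.4f}, p = {chi2_sf(j_stat, overid_df):.4f}")
```

The two statistics differ slightly because N·R² and q·F are different, asymptotically equivalent summaries of the same auxiliary regression. Both give p-values of roughly 0.26.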
TRADE ANALYSIS
(TOPALOVA and KHANDELWAL, TK)
• “Trade Liberalization and Firm Productivity: the Case of India” is a relevant paper for exploring how to address endogeneity in an empirical trade model.
• The paper examines the effects of Indian trade reform on firm-level productivity.
• Endogeneity concerns for the productivity effect of trade policy:
1. Governments may reduce tariffs only after domestic firms have improved their productivity, which would produce a spurious relationship between trade and productivity.
2. Protection may be selective across industries (tariffs may be adjusted in response to industry productivity levels).
• If policy decisions on tariff changes across industries were indeed based on expected future productivity or on industry lobbying, isolating the impact of the tariff changes would be difficult. Simply comparing productivity in liberalized industries with productivity in non-liberalized industries could then produce a spurious correlation between total factor productivity (TFP) growth and trade policies.
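To see why a naive comparison can mislead, here is a small simulation under purely hypothetical assumptions. Tariffs have no causal effect on productivity at all, but the government cuts tariffs more in industries whose productivity has already grown (concern 1 above). A regression of productivity growth on the tariff change still finds a clearly negative slope: with these made-up parameters it is about -0.5/(0.5² + 1) = -0.4 in large samples. The function names and parameters are illustrative only.

```python
import random

def ols_slope(x, y):
    """Slope coefficient from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_reverse_causality(n_industries=5000, response=0.5, seed=1):
    """Hypothetical industries where tariffs have NO effect on productivity,
    but tariffs are cut more where productivity has already grown."""
    rng = random.Random(seed)
    tariff_change, prod_growth = [], []
    for _ in range(n_industries):
        g = rng.gauss(0.0, 1.0)                   # productivity growth happens first
        dt = -response * g + rng.gauss(0.0, 1.0)  # policy reacts: bigger cut where g is high
        tariff_change.append(dt)
        prod_growth.append(g)
    return tariff_change, prod_growth

dt, g = simulate_reverse_causality()
naive = ols_slope(dt, g)
print(f"naive slope of productivity growth on tariff change: {naive:.3f}")
```

The negative slope would be read as "tariff cuts raise productivity", even though, by construction, causality runs only from productivity to tariffs.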
TRADE ANALYSIS (TK)

• Starting in 1991, India drastically reduced tariffs over a short period of time and narrowed the dispersion in tariffs across sectors. Since the reform was rapid, comprehensive, and externally imposed (by the IMF), it is reasonable to assume that the changes in the level of protection were unrelated to firm- and industry-level productivity.
• However, by the time the government announced the export-import policy in the Ninth Plan (1997-2002), the sweeping reforms outlined in the previous plan had already been undertaken, and external pressure for further reform had abated.
• It is therefore more difficult to isolate the causal impact of the tariff changes in this later period.
TRADE ANALYSIS (TK)

• The authors address the concern of possible endogeneity of trade policy in 3 ways:
1. Examining the extent to which tariffs moved together.
  – Tariff movements were uniform until 1997 and less uniform afterwards, indicating a more pronounced problem of endogenous trade protection in the second period.
2. Testing whether protection correlates with industry characteristics (employment, output, average wage, concentration, etc.).
  – No statistically significant correlation is found (an indication of exogeneity).
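Check 2 amounts to regressing protection on an industry characteristic and asking whether the slope is statistically different from zero. The following is a simplified, hypothetical version with one characteristic and a bivariate regression; the authors' actual specification is not given in these notes. It returns the slope, its conventional standard error and the t statistic. The function name and the example numbers are made up.

```python
import math

def slope_t_stat(x, y):
    """OLS of y on x with an intercept; returns (slope, standard error, t)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)  # homoskedastic standard error of the slope
    return slope, se, slope / se

# Made-up numbers purely to exercise the function (not from the paper)
employment = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical industry characteristic
tariff_cut = [2.0, 4.0, 5.0, 4.0, 5.0]  # hypothetical change in protection
slope, se, t = slope_t_stat(employment, tariff_cut)
print(f"slope={slope:.3f} se={se:.3f} t={t:.2f}")
```

In large samples, |t| < 1.96 means we cannot reject zero correlation at the 5% level. That is the pattern the notes describe as evidence of exogeneity.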
INSTITUTIONS AND ECONOMIC DEVELOPMENT

Notes from: “The Colonial Origins of Comparative Development” (Acemoglu et al.)

What are the fundamental causes of the large differences in income per capita across countries?

Differences in institutions and property rights have received considerable attention as an explanation.

This view is supported by cross-country correlations between measures of property rights and economic development.
At some level, it is obvious that institutions matter:
• North and South Korea
• East and West Germany

In each case, one part of the country stagnated under central planning and collective ownership, while the other prospered with private property and a market economy.

To estimate the impact of institutions on economic performance, we need a source of exogenous variation in institutions (an instrument).
The authors propose a theory of institutional differences among countries colonized by Europeans, and exploit this theory to derive a possible source of exogenous variation.

The theory rests on three premises:

(1) Different types of colonization policies created different sets of institutions

• At one extreme, colonizers did not settle and set up extractive institutions:
  – they did not introduce protection for private property
  – they did not provide checks and balances against government expropriation
  – the main purpose was to transfer as many of the colony's resources as possible to the colonizer
  – e.g. Latin America and the Belgian Congo
• At the other extreme, colonizers settled and replicated European institutions:
  – strong emphasis on private property and checks against government power
  – e.g. Australia, New Zealand, Canada and the U.S.


(2) Colonization strategy was influenced by the feasibility of settlement
• In places where the disease environment was not favorable to European settlement, the formation of an extractive state was more likely

(3) The colonial state and institutions persisted even after independence.
Based on these three premises:
• use the mortality rates of the first European settlers as an instrument for current institutions in these countries

settler mortality → settlements → early institutions → current institutions → current economic performance
[Figure: Log GDP per capita, PPP, 1995 against log of settler mortality; one point per former colony, labelled by country code]
Colonies where Europeans faced higher mortality rates are today
substantially poorer than colonies that were healthy for Europeans

The theory implies:
• this relationship reflects the effect of settler mortality working through the institutions brought by Europeans
• it assumes there is no direct effect of settler mortality on economic performance today
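With a single instrument and no controls, the IV estimate of the effect of institutions is a ratio of two slopes:

- the reduced-form slope (income on settler mortality, the relationship in the scatter plot);
- divided by the first-stage slope (institutions on settler mortality).

The simulation below uses made-up parameters, not the paper's data. An unobserved factor raises both institutions and income, so OLS is biased upward. The ratio recovers the true α = 1 as long as mortality has no direct effect on income. Setting the hypothetical `direct` parameter to a nonzero value violates that exclusion restriction and biases the IV estimate as well.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

def simulate_colonies(n=20000, alpha=1.0, direct=0.0, seed=2):
    """Hypothetical data: log mortality m, institutions r, log income y."""
    rng = random.Random(seed)
    m_list, r_list, y_list = [], [], []
    for _ in range(n):
        m = rng.gauss(0.0, 1.0)                 # log settler mortality (instrument)
        u = rng.gauss(0.0, 1.0)                 # omitted factor affecting both r and y
        r = -0.6 * m + u + rng.gauss(0.0, 1.0)  # first stage: institutions
        y = alpha * r + direct * m + u + rng.gauss(0.0, 1.0)  # direct != 0 breaks exclusion
        m_list.append(m)
        r_list.append(r)
        y_list.append(y)
    return m_list, r_list, y_list

def ols_and_iv(m, r, y):
    """Return (OLS slope, first-stage slope, reduced-form slope, IV = ratio)."""
    ols = cov(r, y) / cov(r, r)
    first_stage = cov(r, m) / cov(m, m)
    reduced_form = cov(y, m) / cov(m, m)
    return ols, first_stage, reduced_form, reduced_form / first_stage

ols, fs, rf, iv = ols_and_iv(*simulate_colonies())
print(f"OLS {ols:.3f}  first stage {fs:.3f}  reduced form {rf:.3f}  IV {iv:.3f}")
_, _, _, iv_bad = ols_and_iv(*simulate_colonies(direct=0.3))
print(f"IV when mortality also affects income directly: {iv_bad:.3f}")
```

With these parameters, OLS converges to about 1 + 1/2.36 ≈ 1.42. The IV estimate with a direct effect of 0.3 converges to 1 + 0.3/(-0.6) = 0.5, illustrating why the "no direct effect" assumption is essential.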
Under these assumptions:

Regress current performance on current institutions, instrumenting the latter with settler mortality rates.

Focus on property rights and checks against government power:
• use an index of protection against the risk of expropriation as a proxy for institutions
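Estimation Strategy

$\log y_i = \mu + \alpha R_i + X_i'\gamma + \varepsilon_i$

where $y_i$ is income per capita in country $i$, $R_i$ is protection against expropriation (institutions), and $X_i$ is a vector of other control variables (geography, legal origins).

• $R_i$ is not an exogenous variable
  – many omitted variables determine both $y_i$ and $R_i$
  – an OLS regression would therefore suffer from omitted variable bias

First stage estimation:

$R_i = \zeta + \beta \log M_i + X_i'\delta + v_i$

where $M_i$ is the settler mortality rate.

Second stage estimation:

$\log y_i = \mu + \alpha \hat{R}_i + X_i'\gamma + \varepsilon_i$

where $\hat{R}_i$ is the fitted value of $R_i$ from the first stage.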
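The two stages can be run literally as two OLS regressions:

1. Regress institutions on log settler mortality and the controls.
2. Regress log income on the fitted institutions and the controls.

The Python sketch below does this on simulated data. Every coefficient, and the single control standing in for the vector of controls, is a hypothetical choice, not an estimate from the paper. A small hand-written normal-equations solver keeps it to the standard library; the helper names are illustrative.

```python
import random

def solve(a, b):
    """Solve the square linear system a @ beta = b (Gauss-Jordan, partial pivoting)."""
    k = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(k):
            if i != c:
                f = m[i][c] / m[c][c]
                m[i] = [v - f * w for v, w in zip(m[i], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]

def ols(rows, y):
    """OLS coefficients from the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def two_stage(log_y, inst, log_m, ctrl):
    """First stage: inst on (1, log_m, ctrl). Second stage: log_y on (1, fitted inst, ctrl)."""
    first = ols([[1.0, m, c] for m, c in zip(log_m, ctrl)], inst)
    fitted = [first[0] + first[1] * m + first[2] * c for m, c in zip(log_m, ctrl)]
    second = ols([[1.0, rh, c] for rh, c in zip(fitted, ctrl)], log_y)
    return first, second

# Simulated data; every coefficient below is a hypothetical choice
rng = random.Random(3)
n, alpha = 20000, 1.0
ctrl = [rng.gauss(0, 1) for _ in range(n)]   # stands in for a control such as geography
log_m = [rng.gauss(0, 1) for _ in range(n)]  # log settler mortality
u = [rng.gauss(0, 1) for _ in range(n)]      # omitted variable present in both equations
inst = [-0.6 * m + 0.5 * c + ui + rng.gauss(0, 1) for m, c, ui in zip(log_m, ctrl, u)]
log_y = [alpha * r + 0.3 * c + ui + rng.gauss(0, 1) for r, c, ui in zip(inst, ctrl, u)]

first, second = two_stage(log_y, inst, log_m, ctrl)
naive = ols([[1.0, r, c] for r, c in zip(inst, ctrl)], log_y)
print(f"first stage, coefficient on log mortality: {first[1]:.3f}")
print(f"OLS alpha: {naive[1]:.3f}   two-stage alpha: {second[1]:.3f}")
```

The second-stage point estimate equals the 2SLS estimate. However, the standard errors a plain second-stage OLS would report are not valid, because its residuals use the fitted rather than the actual institutions variable. For inference, a dedicated IV command (such as ivreg2 or ivregress in Stata) should be used instead.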
[Figure: Log GDP per capita, PPP, 1995 against average expropriation risk 1985-95; one point per former colony, labelled by country code]
[Figure: Average expropriation risk 1985-95 against log of settler mortality; one point per former colony, labelled by country code]
