* Instrumental Variables in Stata

* Copyright 2013 by Ani Katchova

clear all
set more off

use C:\Econometrics\Data\iv_health

* Define dependent variable y1, endogenous variable y2
* Define exogenous variables x1 and instrumental variables x2
* Define alternative set of instruments x2alt for overidentified case
* Define exogenous variables x12 for eq2, instrumental variable x22 for eq2
global y1list logmedexpense
global y2list healthinsu
global x1list illnesses age logincome
global x2list ssiratio
global x2listalt ssiratio firmlocation
global x1list2 illnesses
global x2list2 firmlocation

describe $y1list $y2list $x1list $x2list
summarize $y1list $y2list $x1list $x2list

* OLS regression
regress $y1list $y2list $x1list

* 2SLS estimation
ivregress 2sls $y1list ($y2list = $x2list) $x1list, first

* 2SLS estimation - overidentified
ivregress 2sls $y1list ($y2list = $x2listalt) $x1list, first

* 2SLS estimation (details)
regress $y2list $x2list $x1list
predict y2hat, xb
regress $y1list y2hat $x1list
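* Hedged check, not part of the original do-file: the hand-computed second step
* should reproduce the ivregress 2sls point estimate for the endogenous regressor
* (about -0.8522 in the log below), but its standard errors are not valid because
* OLS treats y2hat as data rather than as an estimate.
* b_manual and b_ivreg are illustrative scalar names.
scalar b_manual = _b[y2hat]
quietly ivregress 2sls $y1list ($y2list = $x2list) $x1list
scalar b_ivreg = _b[$y2list]
display "manual: " b_manual "   ivregress: " b_ivreg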

* Durbin-Wu-Hausman test of endogeneity
quietly ivregress 2sls $y1list ($y2list = $x2list) $x1list, first
estat endogenous

quietly regress $y2list $x2list $x1list
quietly predict v1hat, resid
quietly regress $y1list $y2list $x1list v1hat
test v1hat

* Test of overidentifying restrictions
quietly ivregress gmm $y1list ($y2list = $x2listalt) $x1list, wmatrix(robust)
estat overid

* IV estimation with binary endogenous regressor (first step is probit model)
treatreg $y1list $x1list, treat($y2list = $x2list $x1list)
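* Note, not part of the original do-file: from Stata 13 onward treatreg is
* documented under the name etregress. A sketch of the equivalent call, assuming
* Stata 13 or later, is left commented out so that the log below is unchanged:
* etregress $y1list $x1list, treat($y2list = $x2list $x1list)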

* Weak instruments
* Correlations of endogenous regressors with instruments
correlate $y2list $x2listalt

* Weak instrument tests - just-identified model
quietly ivregress 2sls $y1list ($y2list = $x2list) $x1list, vce(robust)
estat firststage, forcenonrobust

* Weak instrument tests - overidentified model (one overidentifying restriction)
quietly ivregress gmm $y1list ($y2list = $x2listalt) $x1list, vce(robust)
estat firststage, forcenonrobust

* Systems of equations

* 2SLS estimation
reg3 ($y1list $y2list $x1list $x2list)($y2list $y1list $x1list2 $x2list2), 2sls

* 3SLS estimation
reg3 ($y1list $y2list $x1list $x2list)($y2list $y1list $x1list2 $x2list2)
. * Instrumental Variables in Stata
. * Copyright 2013 by Ani Katchova
.
. clear all

. set more off

.
. use C:\Econometrics\Data\iv_health

.
. * Define dependent variable y1, endogenous variable y2
. * Define exogenous variables x1 and instrumental variables x2
. * Define alternative set of instruments x2alt for overidentified case
. * Define exogenous variables x12 for eq2, instrumental variable x22 for eq2
. global y1list logmedexpense

. global y2list healthinsu

. global x1list illnesses age logincome

. global x2list ssiratio

. global x2listalt ssiratio firmlocation

. global x1list2 illnesses

. global x2list2 firmlocation

.
. describe $y1list $y2list $x1list $x2list

              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------------
logmedexpense   float   %9.0g                 log(drugexp)
healthinsu      byte    %8.0g                 =1 if individual has supplemental health
                                                insurance through employer
illnesses       byte    %8.0g                 number of illnesses
age             byte    %8.0g                 Age
logincome       float   %9.0g                 log(income)
ssiratio        float   %9.0g                 SSI/Income ratio

. summarize $y1list $y2list $x1list $x2list

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
logmedexpe~e | 10089 6.481361 1.362052 0 10.18017
healthinsu | 10089 .3821984 .4859488 0 1
illnesses | 10089 1.860938 1.292858 0 9
age | 10089 75.05174 6.682109 65 91
logincome | 10089 2.743275 .9131433 -6.907755 5.744476
-------------+--------------------------------------------------------
ssiratio | 10089 .5365438 .3678175 0 9.25062

.
. * OLS regression
. regress $y1list $y2list $x1list

Source | SS df MS Number of obs = 10089
-------------+------------------------------ F( 4, 10084) = 534.37
Model | 3273.16162 4 818.290405 Prob > F = 0.0000
Residual | 15441.9546 10084 1.53133227 R-squared = 0.1749
-------------+------------------------------ Adj R-squared = 0.1746
Total | 18715.1162 10088 1.85518599 Root MSE = 1.2375

------------------------------------------------------------------------------
logmedexpe~e | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
healthinsu | .0749595 .0260124 2.88 0.004 .02397 .125949
illnesses | .440653 .0095721 46.04 0.000 .4218897 .4594162
age | -.0025946 .001879 -1.38 0.167 -.0062777 .0010886
logincome | .0172363 .0137865 1.25 0.211 -.009788 .0442607
_cons | 5.780127 .150891 38.31 0.000 5.48435 6.075903
------------------------------------------------------------------------------

.
. * 2SLS estimation
. ivregress 2sls $y1list ($y2list = $x2list) $x1list, first

First-stage regressions
-----------------------

Number of obs = 10089
F( 4, 10084) = 185.08
Prob > F = 0.0000
R-squared = 0.0684
Adj R-squared = 0.0680
Root MSE = 0.4691

------------------------------------------------------------------------------
healthinsu | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
illnesses | .011351 .0036336 3.12 0.002 .0042285 .0184736
age | -.0085302 .0007125 -11.97 0.000 -.0099268 -.0071337
logincome | .0544246 .0056429 9.64 0.000 .0433634 .0654858
ssiratio | -.1997539 .0141579 -14.11 0.000 -.2275062 -.1720017
_cons | .9591576 .0568776 16.86 0.000 .8476662 1.070649
------------------------------------------------------------------------------

Instrumental variables (2SLS) regression Number of obs = 10089
Wald chi2(4) = 1910.33
Prob > chi2 = 0.0000
R-squared = 0.0709
Root MSE = 1.3128

------------------------------------------------------------------------------
logmedexpe~e | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
healthinsu | -.852201 .1983369 -4.30 0.000 -1.240934 -.4634679
illnesses | .4485123 .0102903 43.59 0.000 .4283437 .4686808
age | -.0117975 .0027882 -4.23 0.000 -.0172622 -.0063327
logincome | .0976929 .0224588 4.35 0.000 .0536744 .1417113
_cons | 6.589839 .2346179 28.09 0.000 6.129996 7.049681
------------------------------------------------------------------------------
Instrumented: healthinsu
Instruments: illnesses age logincome ssiratio

.
. * 2SLS estimation - overidentified
. ivregress 2sls $y1list ($y2list = $x2listalt) $x1list, first

First-stage regressions
-----------------------

Number of obs = 10089
F( 5, 10083) = 155.21
Prob > F = 0.0000
R-squared = 0.0715
Adj R-squared = 0.0710
Root MSE = 0.4684

------------------------------------------------------------------------------
healthinsu | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
illnesses | .0117912 .0036286 3.25 0.001 .0046785 .0189039
age | -.0079491 .0007184 -11.06 0.000 -.0093573 -.0065409
logincome | .0509146 .0056665 8.99 0.000 .039807 .0620221
ssiratio | -.1909688 .0142168 -13.43 0.000 -.2188365 -.163101
firmlocation | .1156546 .0200232 5.78 0.000 .0764051 .1549041
_cons | .9124637 .0573591 15.91 0.000 .8000285 1.024899
------------------------------------------------------------------------------

Instrumental variables (2SLS) regression Number of obs = 10089
Wald chi2(4) = 1863.60
Prob > chi2 = 0.0000
R-squared = 0.0429
Root MSE = 1.3324

------------------------------------------------------------------------------
logmedexpe~e | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
healthinsu | -.9696236 .1863391 -5.20 0.000 -1.334841 -.6044057
illnesses | .4495077 .0104242 43.12 0.000 .4290766 .4699387
age | -.012963 .002727 -4.75 0.000 -.0183079 -.0076181
logincome | .1078825 .0218155 4.95 0.000 .0651249 .1506401
_cons | 6.692387 .2286487 29.27 0.000 6.244244 7.14053
------------------------------------------------------------------------------
Instrumented: healthinsu
Instruments: illnesses age logincome ssiratio firmlocation
.
. * 2SLS estimation (details)
. regress $y2list $x2list $x1list

Source | SS df MS Number of obs = 10089
-------------+------------------------------ F( 4, 10084) = 185.08
Model | 162.932961 4 40.7332402 Prob > F = 0.0000
Residual | 2219.30988 10084 .220082297 R-squared = 0.0684
-------------+------------------------------ Adj R-squared = 0.0680
Total | 2382.24284 10088 .236146197 Root MSE = .46913

------------------------------------------------------------------------------
healthinsu | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ssiratio | -.1997539 .0141579 -14.11 0.000 -.2275062 -.1720017
illnesses | .011351 .0036336 3.12 0.002 .0042285 .0184736
age | -.0085302 .0007125 -11.97 0.000 -.0099268 -.0071337
logincome | .0544246 .0056429 9.64 0.000 .0433634 .0654858
_cons | .9591576 .0568776 16.86 0.000 .8476662 1.070649
------------------------------------------------------------------------------

. predict y2hat, xb

. regress $y1list y2hat $x1list

Source | SS df MS Number of obs = 10089
-------------+------------------------------ F( 4, 10084) = 538.15
Model | 3292.26263 4 823.065659 Prob > F = 0.0000
Residual | 15422.8536 10084 1.52943808 R-squared = 0.1759
-------------+------------------------------ Adj R-squared = 0.1756
Total | 18715.1162 10088 1.85518599 Root MSE = 1.2367

------------------------------------------------------------------------------
logmedexpe~e | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y2hat | -.8522011 .1868427 -4.56 0.000 -1.21845 -.4859521
illnesses | .4485123 .0096939 46.27 0.000 .4295103 .4675143
age | -.0117975 .0026266 -4.49 0.000 -.0169461 -.0066488
logincome | .0976929 .0211572 4.62 0.000 .0562204 .1391653
_cons | 6.589839 .2210212 29.82 0.000 6.156593 7.023084
------------------------------------------------------------------------------

.
. * Durbin-Wu-Hausman test of endogeneity
. quietly ivregress 2sls $y1list ($y2list = $x2list) $x1list, first

. estat endogenous

Tests of endogeneity
Ho: variables are exogenous

Durbin (score) chi2(1) = 25.0914 (p = 0.0000)
Wu-Hausman F(1,10083) = 25.139 (p = 0.0000)

.
. quietly regress $y2list $x2list $x1list

. quietly predict v1hat, resid

. quietly regress $y1list $y2list $x1list v1hat

. test v1hat

( 1) v1hat = 0

F( 1, 10083) = 25.14
Prob > F = 0.0000

.
. * Test of overidentifying restrictions
. quietly ivregress gmm $y1list ($y2list = $x2listalt) $x1list, wmatrix(robust)

. estat overid

Test of overidentifying restriction:

Hansen's J chi2(1) = 2.14311 (p = 0.1432)

.
. * IV estimation with binary endogenous regressor (first step is probit model)
. treatreg $y1list $x1list, treat($y2list = $x2list $x1list)

Iteration 0: log likelihood = -22788.44
Iteration 1: log likelihood = -22776.656
Iteration 2: log likelihood = -22775.117
Iteration 3: log likelihood = -22775.111
Iteration 4: log likelihood = -22775.111

Treatment-effects model -- MLE Number of obs = 10089
Wald chi2(4) = 1909.65
Log likelihood = -22775.111 Prob > chi2 = 0.0000

-------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
logmedexpense |
illnesses | .4533646 .0110531 41.02 0.000 .4317009 .4750284
age | -.0174793 .002292 -7.63 0.000 -.0219716 -.012987
logincome | .1473667 .0171907 8.57 0.000 .1136736 .1810599
healthinsu | -1.42463 .0812462 -17.53 0.000 -1.58387 -1.265391
_cons | 7.089755 .1860256 38.11 0.000 6.725152 7.454359
--------------+----------------------------------------------------------------
healthinsu |
ssiratio | -.4833678 .0343442 -14.07 0.000 -.5506812 -.4160545
illnesses | .0346595 .0099313 3.49 0.000 .0151946 .0541245
age | -.0237617 .0019748 -12.03 0.000 -.0276323 -.0198912
logincome | .152461 .0148235 10.29 0.000 .1234075 .1815145
_cons | 1.243892 .1568575 7.93 0.000 .9364568 1.551327
--------------+----------------------------------------------------------------
/athrho | .7859563 .0436984 17.99 0.000 .7003089 .8716036
/lnsigma | .3552498 .015163 23.43 0.000 .3255307 .3849688
--------------+----------------------------------------------------------------
rho | .6561122 .024887 .6045638 .702188
sigma | 1.426537 .0216306 1.384765 1.469568
lambda | .9359682 .0485072 .8408959 1.031041
-------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0): chi2(1) = 90.32 Prob > chi2 = 0.0000

.
.
. * Weak instruments
. * Correlations of endogenous regressors with instruments
. correlate $y2list $x2listalt
(obs=10089)

| health~u ssiratio firmlo~n
-------------+---------------------------
healthinsu | 1.0000
ssiratio | -0.2124 1.0000
firmlocation | 0.1198 -0.1904 1.0000

.
. * Weak instrument tests - just-identified model
. quietly ivregress 2sls $y1list ($y2list = $x2list) $x1list, vce(robust)

. estat firststage, forcenonrobust

First-stage regression summary statistics
--------------------------------------------------------------------------
| Adjusted Partial Robust
Variable | R-sq. R-sq. R-sq. F(1,10084) Prob > F
-------------+------------------------------------------------------------
healthinsu | 0.0684 0.0680 0.0194 68.881 0.0000
--------------------------------------------------------------------------

Minimum eigenvalue statistic = 199.065

Critical Values # of endogenous regressors: 1
Ho: Instruments are weak # of excluded instruments: 1
---------------------------------------------------------------------
| 5% 10% 20% 30%
2SLS relative bias | (not available)
-----------------------------------+---------------------------------
| 10% 15% 20% 25%
2SLS Size of nominal 5% Wald test | 16.38 8.96 6.66 5.53
LIML Size of nominal 5% Wald test | 16.38 8.96 6.66 5.53
---------------------------------------------------------------------

.
. * Weak instrument tests - overidentified model (one overidentifying restriction)
. quietly ivregress gmm $y1list ($y2list = $x2listalt) $x1list, vce(robust)
. estat firststage, forcenonrobust

First-stage regression summary statistics
--------------------------------------------------------------------------
| Adjusted Partial Robust
Variable | R-sq. R-sq. R-sq. F(2,10083) Prob > F
-------------+------------------------------------------------------------
healthinsu | 0.0715 0.0710 0.0226 58.8742 0.0000
--------------------------------------------------------------------------

Minimum eigenvalue statistic = 116.533

Critical Values # of endogenous regressors: 1
Ho: Instruments are weak # of excluded instruments: 2
---------------------------------------------------------------------
| 5% 10% 20% 30%
2SLS relative bias | (not available)
-----------------------------------+---------------------------------
| 10% 15% 20% 25%
2SLS Size of nominal 5% Wald test | 19.93 11.59 8.75 7.25
LIML Size of nominal 5% Wald test | 8.68 5.33 4.42 3.92
---------------------------------------------------------------------

.
.
. * Systems of equations
.
. * 2SLS estimation
. reg3 ($y1list $y2list $x1list $x2list)($y2list $y1list $x1list2 $x2list2), 2sls

Two-stage least-squares regression
----------------------------------------------------------------------
Equation Obs Parms RMSE "R-sq" F-Stat P
----------------------------------------------------------------------
logmedexpe~e 10089 5 1.487932 -0.1928 299.26 0.0000
healthinsu 10089 3 .5536873 -0.2978 40.12 0.0000
----------------------------------------------------------------------

-------------------------------------------------------------------------------
| Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
logmedexpense |
healthinsu | -1.6725 .5499929 -3.04 0.002 -2.750531 -.5944692
illnesses | .4578235 .0131069 34.93 0.000 .432133 .4835141
age | -.0187948 .0052074 -3.61 0.000 -.0290017 -.0085879
logincome | .1423373 .0348757 4.08 0.000 .0739781 .2106966
ssiratio | -.163858 .1186859 -1.38 0.167 -.396492 .068776
_cons | 7.376635 .5575224 13.23 0.000 6.283846 8.469424
--------------+----------------------------------------------------------------
healthinsu |
logmedexpense | .2348304 .0820282 2.86 0.004 .0740484 .3956123
illnesses | -.0995706 .036138 -2.76 0.006 -.170404 -.0287371
firmlocation | .2828365 .0269203 10.51 0.000 .2300705 .3356025
_cons | -.9720765 .4658588 -2.09 0.037 -1.885198 -.0589552
-------------------------------------------------------------------------------
Endogenous variables: logmedexpense healthinsu
Exogenous variables: illnesses age logincome ssiratio firmlocation
------------------------------------------------------------------------------

.
. * 3SLS estimation
. reg3 ($y1list $y2list $x1list $x2list)($y2list $y1list $x1list2 $x2list2)

Three-stage least-squares regression
----------------------------------------------------------------------
Equation Obs Parms RMSE "R-sq" chi2 P
----------------------------------------------------------------------
logmedexpe~e 10089 5 1.468607 -0.1627 1497.49 0.0000
healthinsu 10089 3 .5535775 -0.2978 120.41 0.0000
----------------------------------------------------------------------

-------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
logmedexpense |
healthinsu | -1.599042 .5498181 -2.91 0.004 -2.676666 -.5214188
illnesses | .4563656 .0131028 34.83 0.000 .4306845 .4820466
age | -.0177463 .0052054 -3.41 0.001 -.0279487 -.0075439
logincome | .1359816 .0348627 3.90 0.000 .067652 .2043113
ssiratio | -.1339564 .1186419 -1.13 0.259 -.3664902 .0985774
_cons | 7.273976 .5573174 13.05 0.000 6.181654 8.366298
--------------+----------------------------------------------------------------
healthinsu |
logmedexpense | .2348304 .0820119 2.86 0.004 .0740899 .3955708
illnesses | -.0995706 .0361308 -2.76 0.006 -.1703857 -.0287554
firmlocation | .2828365 .026915 10.51 0.000 .2300841 .3355888
_cons | -.9720765 .4657664 -2.09 0.037 -1.884962 -.0591911
-------------------------------------------------------------------------------
Endogenous variables: logmedexpense healthinsu
Exogenous variables: illnesses age logincome ssiratio firmlocation
------------------------------------------------------------------------------
