IV Example

ECONOMETRIA

Uploaded by

vgh

IV_example

October 27, 2021

1 Instrumental variables example


1.1 Medical expenses
We want to study the factors that influence medical expenses (y1 = logmedexpense), given the endogenous
regressor of having health insurance (y2 = healthinsu) and the exogenous regressors illnesses,
age, and income (x1list). The instruments are the SSI/income ratio (ssiratio) and firm multiple
locations (firmlocation). Data are from the Medical Expenditure Panel Survey (MEPS).

[1]: use iv_health, clear

* Define dependent variable y1, endogenous variable y2


global y1list logmedexpense
global y2list healthinsu
* Define exogenous variables x1 and instrumental variables x2
global x1list illnesses age logincome
global x2list ssiratio
* Define alternative set of instruments x2alt for overidentified case
global x2listalt ssiratio firmlocation
describe $y1list $y2list $x1list $x2list
summarize $y1list $y2list $x1list $x2list

              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
logmedexpense   float   %9.0g                 log(drugexp)
healthinsu      byte    %8.0g                 =1 if individual has supplemental
                                                health insurance through
                                                employer
illnesses       byte    %8.0g                 number of illnesses
age             byte    %8.0g                 Age
logincome       float   %9.0g                 log(income)
ssiratio        float   %9.0g                 SSI/Income ratio

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
logmedexpe~e |     10,089    6.481361    1.362052          0   10.18017
  healthinsu |     10,089    .3821984    .4859488          0          1
   illnesses |     10,089    1.860938    1.292858          0          9
         age |     10,089    75.05174    6.682109         65         91
   logincome |     10,089    2.743275    .9131433  -6.907755   5.744476
-------------+---------------------------------------------------------
    ssiratio |     10,089    .5365438    .3678175          0    9.25062

Estimate the following model by OLS:

$$\text{logmedexpense}_i = \beta_1 + \beta_2\,\text{healthinsu}_i + \beta_3\,\text{illnesses}_i + \beta_4\,\text{age}_i + \beta_5\,\text{logincome}_i + u_i$$

[2]: regress $y1list $y2list $x1list

Source | SS df MS Number of obs = 10,089


-------------+---------------------------------- F(4, 10084) = 534.37
Model | 3273.16162 4 818.290405 Prob > F = 0.0000
Residual | 15441.9546 10,084 1.53133227 R-squared = 0.1749
-------------+---------------------------------- Adj R-squared = 0.1746
Total | 18715.1162 10,088 1.85518599 Root MSE = 1.2375

------------------------------------------------------------------------------
logmedexpe~e | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
healthinsu | .0749595 .0260124 2.88 0.004 .02397 .125949
illnesses | .440653 .0095721 46.04 0.000 .4218897 .4594162
age | -.0025946 .001879 -1.38 0.167 -.0062777 .0010886
logincome | .0172363 .0137865 1.25 0.211 -.009788 .0442607
_cons | 5.780127 .150891 38.31 0.000 5.48435 6.075903
------------------------------------------------------------------------------
The OLS coefficient on healthinsu is 0.075. Because the dependent variable is in logs, predicted
medical expenses for individuals with health insurance are about 7.8% higher than for individuals
without it (100·(exp(0.075) − 1) ≈ 7.8), ceteris paribus.
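
Because the dependent variable is in logs, the coefficient on a dummy variable is only approximately a percentage change. The exact conversion is 100·(exp(b) − 1). A minimal Python sketch of that conversion follows, applied to the OLS coefficient and confidence bounds copied from the table above. The helper name `pct_effect` is illustrative and not part of Stata.

```python
import math

def pct_effect(b):
    """Exact percentage change in y implied by a dummy coefficient b in a log-y model."""
    return 100.0 * math.expm1(b)

# OLS coefficient and 95% confidence bounds for healthinsu, copied from the output above
b_ols, ci_low, ci_high = 0.0749595, 0.02397, 0.125949

print(f"point estimate: {pct_effect(b_ols):.2f}%")
print(f"95% interval:   {pct_effect(ci_low):.2f}% to {pct_effect(ci_high):.2f}%")
```

For small coefficients the exact figure stays close to 100·b, which is why 0.075 reads as roughly 7.8% rather than 7.5%.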
Using ssiratio as an instrument for healthinsu, estimate the model by 2SLS and also report the first stage:

[3]: ivregress 2sls $y1list ($y2list = $x2list) $x1list, first

First-stage regressions
-----------------------

Number of obs = 10,089
F( 4, 10084) = 185.08
Prob > F = 0.0000
R-squared = 0.0684
Adj R-squared = 0.0680
Root MSE = 0.4691

------------------------------------------------------------------------------
healthinsu | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
illnesses | .011351 .0036336 3.12 0.002 .0042285 .0184736
age | -.0085302 .0007125 -11.97 0.000 -.0099268 -.0071337
logincome | .0544246 .0056429 9.64 0.000 .0433634 .0654858
ssiratio | -.1997539 .0141579 -14.11 0.000 -.2275062 -.1720017
_cons | .9591576 .0568776 16.86 0.000 .8476662 1.070649
------------------------------------------------------------------------------

Instrumental variables (2SLS) regression Number of obs = 10,089


Wald chi2(4) = 1910.33
Prob > chi2 = 0.0000
R-squared = 0.0709
Root MSE = 1.3128

------------------------------------------------------------------------------
logmedexpe~e | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
healthinsu | -.852201 .1983369 -4.30 0.000 -1.240934 -.4634679
illnesses | .4485123 .0102903 43.59 0.000 .4283437 .4686808
age | -.0117975 .0027882 -4.23 0.000 -.0172622 -.0063327
logincome | .0976929 .0224588 4.35 0.000 .0536744 .1417113
_cons | 6.589839 .2346179 28.09 0.000 6.129996 7.049681
------------------------------------------------------------------------------
Instrumented: healthinsu
Instruments: illnesses age logincome ssiratio
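Durbin-Wu-Hausman test of endogeneity:

[4]: * Built-in endogeneity tests (run after the ivregress command above)
estat endogenous
* Regression-based version: save the first-stage residuals
regress $y2list $x2list $x1list
predict v1hat, resid
* Add the residuals to the structural equation and test their significance
quietly regress $y1list $y2list $x1list v1hat
test v1hat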

Tests of endogeneity
Ho: variables are exogenous

Durbin (score) chi2(1) = 25.0914 (p = 0.0000)
Wu-Hausman F(1,10083) = 25.139 (p = 0.0000)

Source | SS df MS Number of obs = 10,089


-------------+---------------------------------- F(4, 10084) = 185.08
Model | 162.932961 4 40.7332402 Prob > F = 0.0000
Residual | 2219.30988 10,084 .220082297 R-squared = 0.0684
-------------+---------------------------------- Adj R-squared = 0.0680
Total | 2382.24284 10,088 .236146197 Root MSE = .46913

------------------------------------------------------------------------------
healthinsu | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ssiratio | -.1997539 .0141579 -14.11 0.000 -.2275062 -.1720017
illnesses | .011351 .0036336 3.12 0.002 .0042285 .0184736
age | -.0085302 .0007125 -11.97 0.000 -.0099268 -.0071337
logincome | .0544246 .0056429 9.64 0.000 .0433634 .0654858
_cons | .9591576 .0568776 16.86 0.000 .8476662 1.070649
------------------------------------------------------------------------------

( 1) v1hat = 0

F( 1, 10083) = 25.14
Prob > F = 0.0000
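The Durbin-Wu-Hausman test asks whether the OLS and 2SLS coefficient estimates differ by more than
sampling error. The null hypothesis that healthinsu is exogenous is rejected (Wu-Hausman
F(1, 10083) = 25.14, p < 0.001), and the regression-based version with v1hat gives the same F
statistic. We therefore treat health insurance as an endogenous regressor and use an instrumental
variables approach.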
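
To make the "compares OLS and 2SLS" idea concrete, the sketch below rebuilds a Hausman-type contrast for the healthinsu coefficient from numbers printed in the regress and ivregress outputs above. It assumes that the Durbin score statistic corresponds to the squared coefficient difference divided by the difference in variances, with the OLS error variance RSS/N used for both estimators. That assumption reproduces the printed value here to within rounding, but treat it as a rough check rather than a description of Stata's internals. All variable names are illustrative.

```python
# Numbers copied from the regress and ivregress outputs above
n = 10089
b_ols, se_ols = 0.0749595, 0.0260124   # OLS estimate and s.e. for healthinsu
b_iv, se_iv = -0.852201, 0.1983369     # 2SLS estimate and s.e. for healthinsu
rss_ols, df_ols = 15441.9546, 10084    # OLS residual sum of squares and its d.f.
rmse_iv = 1.3128                       # 2SLS Root MSE (ivregress divides by N)

# Put both variances on a common error-variance estimate: sigma2 = RSS_ols / N
sigma2 = rss_ols / n
var_ols = se_ols**2 * (df_ols / n)        # undo the N - k degrees-of-freedom scaling
var_iv = se_iv**2 * sigma2 / rmse_iv**2   # replace the 2SLS error variance with sigma2

hausman = (b_iv - b_ols) ** 2 / (var_iv - var_ols)
print(f"Hausman-type contrast: {hausman:.2f} (Durbin score reported: 25.09)")
```

The gap between the two coefficients is large relative to the extra sampling variance of 2SLS, which is what drives the rejection.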
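Without ivregress, compute the 2SLS coefficients manually in two steps:

[5]: * First stage: regress healthinsu on the instrument and the exogenous regressors
regress $y2list $x2list $x1list
predict y2hat, xb
* Second stage: replace healthinsu with its first-stage fitted values
regress $y1list y2hat $x1list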

Source | SS df MS Number of obs = 10,089


-------------+---------------------------------- F(4, 10084) = 185.08
Model | 162.932961 4 40.7332402 Prob > F = 0.0000
Residual | 2219.30988 10,084 .220082297 R-squared = 0.0684
-------------+---------------------------------- Adj R-squared = 0.0680
Total | 2382.24284 10,088 .236146197 Root MSE = .46913

------------------------------------------------------------------------------
healthinsu | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ssiratio | -.1997539 .0141579 -14.11 0.000 -.2275062 -.1720017
illnesses | .011351 .0036336 3.12 0.002 .0042285 .0184736
age | -.0085302 .0007125 -11.97 0.000 -.0099268 -.0071337
logincome | .0544246 .0056429 9.64 0.000 .0433634 .0654858
_cons | .9591576 .0568776 16.86 0.000 .8476662 1.070649
------------------------------------------------------------------------------

Source | SS df MS Number of obs = 10,089


-------------+---------------------------------- F(4, 10084) = 538.15
Model | 3292.26263 4 823.065659 Prob > F = 0.0000
Residual | 15422.8536 10,084 1.52943808 R-squared = 0.1759
-------------+---------------------------------- Adj R-squared = 0.1756
Total | 18715.1162 10,088 1.85518599 Root MSE = 1.2367

------------------------------------------------------------------------------
logmedexpe~e | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y2hat | -.8522011 .1868427 -4.56 0.000 -1.21845 -.4859521
illnesses | .4485123 .0096939 46.27 0.000 .4295103 .4675143
age | -.0117975 .0026266 -4.49 0.000 -.0169461 -.0066488
logincome | .0976929 .0211572 4.62 0.000 .0562204 .1391653
_cons | 6.589839 .2210212 29.82 0.000 6.156593 7.023084
------------------------------------------------------------------------------
After instrumenting, medical expenses for individuals with health insurance are predicted to be
about 57.3% lower than for individuals without it (100·(exp(−0.852) − 1) ≈ −57.3), ceteris paribus.
Note that the 2SLS estimate (−0.852) differs sharply from the OLS estimate (0.075), including in sign.
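
The manual second stage reproduces the ivregress coefficient but not its standard errors. The reason is that regress computes the residual variance using y2hat, whereas 2SLS uses residuals built from the actual healthinsu. Both procedures share the same cross-product matrix of fitted regressors, so the standard errors should differ only by the ratio of the two Root MSE values. The following is a rough Python check using numbers copied from the two outputs above, with illustrative variable names.

```python
se_manual = 0.1868427     # s.e. of y2hat from the manual second stage
rmse_manual = 1.2367      # Root MSE of the manual second stage
rmse_2sls = 1.3128        # Root MSE reported by ivregress 2sls
se_ivregress = 0.1983369  # s.e. of healthinsu reported by ivregress 2sls

# Rescale the manual standard error by the ratio of residual standard deviations
se_rescaled = se_manual * rmse_2sls / rmse_manual

print(f"rescaled manual s.e.: {se_rescaled:.4f}")
print(f"ivregress s.e.:       {se_ivregress:.4f}")
```

The two values agree to about four decimal places, so the manual two-step route is fine for coefficients, but inference should rely on the ivregress standard errors.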
Alternatively, using both ssiratio and firmlocation as instruments for healthinsu, estimate by 2SLS:

[6]: ivregress 2sls $y1list ($y2list = $x2listalt) $x1list, first

First-stage regressions
-----------------------

Number of obs = 10,089


F( 5, 10083) = 155.21
Prob > F = 0.0000
R-squared = 0.0715
Adj R-squared = 0.0710
Root MSE = 0.4684

------------------------------------------------------------------------------
healthinsu | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
illnesses | .0117912 .0036286 3.25 0.001 .0046785 .0189039
age | -.0079491 .0007184 -11.06 0.000 -.0093573 -.0065409
logincome | .0509146 .0056665 8.99 0.000 .039807 .0620221
ssiratio | -.1909688 .0142168 -13.43 0.000 -.2188365 -.163101
firmlocation | .1156546 .0200232 5.78 0.000 .0764051 .1549041
_cons | .9124637 .0573591 15.91 0.000 .8000285 1.024899
------------------------------------------------------------------------------

Instrumental variables (2SLS) regression Number of obs = 10,089


Wald chi2(4) = 1863.60
Prob > chi2 = 0.0000
R-squared = 0.0429
Root MSE = 1.3324

------------------------------------------------------------------------------
logmedexpe~e | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
healthinsu | -.9696236 .1863391 -5.20 0.000 -1.334841 -.6044057
illnesses | .4495077 .0104242 43.12 0.000 .4290766 .4699387
age | -.012963 .002727 -4.75 0.000 -.0183079 -.0076181
logincome | .1078825 .0218155 4.95 0.000 .0651249 .1506401
_cons | 6.692387 .2286487 29.27 0.000 6.244244 7.14053
------------------------------------------------------------------------------
Instrumented: healthinsu
Instruments: illnesses age logincome ssiratio firmlocation
With two instruments instead of one, the coefficient on having health insurance changes only
modestly, from -0.852 to -0.970.
Test of overidentifying restrictions:

[7]: ivregress 2sls $y1list ($y2list = $x2listalt) $x1list


estat overid

Instrumental variables (2SLS) regression Number of obs = 10,089


Wald chi2(4) = 1863.60
Prob > chi2 = 0.0000
R-squared = 0.0429
Root MSE = 1.3324

------------------------------------------------------------------------------
logmedexpe~e | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
healthinsu | -.9696236 .1863391 -5.20 0.000 -1.334841 -.6044057
illnesses | .4495077 .0104242 43.12 0.000 .4290766 .4699387
age | -.012963 .002727 -4.75 0.000 -.0183079 -.0076181
logincome | .1078825 .0218155 4.95 0.000 .0651249 .1506401
_cons | 6.692387 .2286487 29.27 0.000 6.244244 7.14053
------------------------------------------------------------------------------
Instrumented: healthinsu
Instruments: illnesses age logincome ssiratio firmlocation

Tests of overidentifying restrictions:

Sargan (score) chi2(1) = 2.37696 (p = 0.1231)


Basmann chi2(1) = 2.37611 (p = 0.1232)
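The tests of overidentifying restrictions do not reject the null hypothesis that the instruments
are valid (Sargan p = 0.123, Basmann p = 0.123). Failing to reject is consistent with valid
instruments but does not prove validity.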
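
With two instruments and one endogenous regressor there is one overidentifying restriction, so both statistics are compared with a chi-squared distribution with one degree of freedom. For that distribution the upper-tail probability equals erfc(sqrt(x/2)), which allows the printed p-values to be reproduced with the Python standard library. This is a sketch for checking the output, and the function name `chi2_1df_pvalue` is illustrative.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail probability P(X > stat) for a chi-squared variable with 1 d.f."""
    return math.erfc(math.sqrt(stat / 2.0))

# Statistics copied from estat overid
sargan, basmann = 2.37696, 2.37611

p_sargan = chi2_1df_pvalue(sargan)
p_basmann = chi2_1df_pvalue(basmann)
print(f"Sargan p = {p_sargan:.4f}, Basmann p = {p_basmann:.4f}")
```

Both p-values are above conventional significance levels, which is why the null is not rejected.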
Test for weak instruments in both specifications:

[8]: * Weak instrument tests - just-identified model
ivregress 2sls $y1list ($y2list = $x2list) $x1list, vce(robust)
estat firststage, forcenonrobust
* Weak instrument tests - overidentified model (two instruments)
ivregress 2sls $y1list ($y2list = $x2listalt) $x1list, vce(robust)
estat firststage, forcenonrobust

Instrumental variables (2SLS) regression Number of obs = 10,089


Wald chi2(4) = 1994.79
Prob > chi2 = 0.0000
R-squared = 0.0709
Root MSE = 1.3128

------------------------------------------------------------------------------
| Robust
logmedexpe~e | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
healthinsu | -.852201 .2113027 -4.03 0.000 -1.266347 -.4380553
illnesses | .4485123 .0100689 44.54 0.000 .4287776 .468247
age | -.0117975 .0029007 -4.07 0.000 -.0174828 -.0061121
logincome | .0976929 .0233306 4.19 0.000 .0519657 .14342
_cons | 6.589839 .245398 26.85 0.000 6.108867 7.07081
------------------------------------------------------------------------------
Instrumented: healthinsu
Instruments: illnesses age logincome ssiratio

First-stage regression summary statistics


--------------------------------------------------------------------------
| Adjusted Partial Robust
Variable | R-sq. R-sq. R-sq. F(1,10084) Prob > F
-------------+------------------------------------------------------------
healthinsu | 0.0684 0.0680 0.0194 68.881 0.0000
--------------------------------------------------------------------------

Minimum eigenvalue statistic = 199.065

Critical Values # of endogenous regressors: 1


Ho: Instruments are weak # of excluded instruments: 1
---------------------------------------------------------------------
| 5% 10% 20% 30%
2SLS relative bias | (not available)
-----------------------------------+---------------------------------
| 10% 15% 20% 25%
2SLS Size of nominal 5% Wald test | 16.38 8.96 6.66 5.53
LIML Size of nominal 5% Wald test | 16.38 8.96 6.66 5.53
---------------------------------------------------------------------

Instrumental variables (2SLS) regression Number of obs = 10,089


Wald chi2(4) = 1938.88
Prob > chi2 = 0.0000
R-squared = 0.0429
Root MSE = 1.3324

------------------------------------------------------------------------------
| Robust
logmedexpe~e | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
healthinsu | -.9696236 .1987108 -4.88 0.000 -1.35909 -.5801575
illnesses | .4495077 .0102219 43.98 0.000 .4294731 .4695422
age | -.012963 .0028378 -4.57 0.000 -.018525 -.007401
logincome | .1078825 .0227967 4.73 0.000 .0632018 .1525632
_cons | 6.692387 .2388115 28.02 0.000 6.224325 7.160449
------------------------------------------------------------------------------
Instrumented: healthinsu
Instruments: illnesses age logincome ssiratio firmlocation

First-stage regression summary statistics


--------------------------------------------------------------------------
| Adjusted Partial Robust
Variable | R-sq. R-sq. R-sq. F(2,10083) Prob > F
-------------+------------------------------------------------------------
healthinsu | 0.0715 0.0710 0.0226 58.8742 0.0000
--------------------------------------------------------------------------

Minimum eigenvalue statistic = 116.533

Critical Values # of endogenous regressors: 1


Ho: Instruments are weak # of excluded instruments: 2
---------------------------------------------------------------------
| 5% 10% 20% 30%
2SLS relative bias | (not available)
-----------------------------------+---------------------------------
| 10% 15% 20% 25%
2SLS Size of nominal 5% Wald test | 19.93 11.59 8.75 7.25
LIML Size of nominal 5% Wald test | 8.68 5.33 4.42 3.92
---------------------------------------------------------------------

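The weak-instrument check looks at the first-stage F statistic for the joint significance of the
excluded instruments. The robust F is 68.9 in the model with one instrument and 58.9 in the model
with two instruments, both well above the rule-of-thumb value of 10. The minimum eigenvalue
statistics (199.1 and 116.5) also exceed the largest critical values reported in the tables (16.38
and 19.93). Therefore, the instruments are not weak.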
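
With a single excluded instrument, the non-robust first-stage F statistic is the square of the instrument's t statistic, and with one endogenous regressor it coincides with the minimum eigenvalue statistic. The short Python check below uses the ssiratio coefficient and standard error from the first-stage table of the just-identified model. Variable names are illustrative.

```python
# First-stage coefficient and standard error of ssiratio (just-identified model)
coef, se = -0.1997539, 0.0141579

# With one excluded instrument, the non-robust F equals t squared
f_nonrobust = (coef / se) ** 2
print(f"non-robust first-stage F: {f_nonrobust:.2f}")

# Values reported by estat firststage, for comparison
min_eigenvalue = 199.065
robust_f = 68.881
critical_10pct_size = 16.38  # 2SLS critical value, 10% size of nominal 5% Wald test

print(f"exceeds critical value: {f_nonrobust > critical_10pct_size}")
print(f"robust F above 10:      {robust_f > 10}")
```

The robust F (68.9) is smaller than the non-robust one because it uses heteroskedasticity-robust standard errors, but the conclusion is the same under either version.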
