
# Homework # 2

ECO 7427
Prof. Sarah Hamersma

1. For your data work this week, I would like you to do exercise 5.4 in Wooldridge (all
parts). The data are available on the shared drive in the Economics department or on
Wooldridge's website at: http://www.msu.edu/~ec/faculty/wooldridge/book2.htm
The paper you will replicate is Card (1995). It was published in a book, but there are copies
of the working paper version online (NBER #4483) for your reference.

Wooldridge Ch. 5.4

Here is my Stata code, with the verbal answers to the questions embedded in it.
------------------------------------------------------------------------------
       log:  G:\Wooldridge5-4.log
  log type:  text
 opened on:  16 Feb 2005, 11:03:07
.
.
. * Sarah Hamersma
. * 2/15/05
. * program name: Wooldridge5-4.do
.
. * This program provides an answer key to Wooldridge question 5.4
.
. #delimit ;
delimiter now ;
. use "H:\Wooldridge Data\CARD.DTA", clear;
. * Part a ;
. gen logwage = log(wage);
. regress logwage educ exper expersq black south smsa reg661 reg662
>   reg663 reg664 reg665 reg666 reg667 reg668 smsa66;
      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 15,  2994) =   85.48
       Model |  177.695591    15  11.8463727           Prob > F      =  0.0000
    Residual |  414.946054  2994  .138592536           R-squared     =  0.2998
-------------+------------------------------           Adj R-squared =  0.2963
       Total |  592.641645  3009  .196956346           Root MSE      =  .37228

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0746933   .0034983    21.35   0.000     .0678339    .0815527
       exper |    .084832   .0066242    12.81   0.000     .0718435    .0978205
     expersq |   -.002287   .0003166    -7.22   0.000    -.0029079   -.0016662
       black |  -.1990123   .0182483   -10.91   0.000    -.2347927   -.1632318
       south |   -.147955   .0259799    -5.69   0.000    -.1988952   -.0970148
        smsa |   .1363845   .0201005     6.79   0.000     .0969724    .1757967
      reg661 |  -.1185698   .0388301    -3.05   0.002     -.194706   -.0424335
      reg662 |  -.0222026   .0282575    -0.79   0.432    -.0776088    .0332036
      reg663 |   .0259703   .0273644     0.95   0.343    -.0276846    .0796251
      reg664 |  -.0634942   .0356803    -1.78   0.075    -.1334546    .0064662
      reg665 |   .0094551   .0361174     0.26   0.794    -.0613623    .0802725
      reg666 |   .0219476   .0400984     0.55   0.584    -.0566755    .1005708
      reg667 |  -.0005887   .0393793    -0.01   0.988     -.077802    .0766245
      reg668 |  -.1750058   .0463394    -3.78   0.000     -.265866   -.0841456
      smsa66 |   .0262417   .0194477     1.35   0.177    -.0118905    .0643739
       _cons |   4.739377   .0715282    66.26   0.000     4.599127    4.879626
------------------------------------------------------------------------------
. * This lines up perfectly with Card's Table 2, column 2. The only
> * difference is that Card uses "expersq/100" as his regressor, so
> * his coefficient is exactly 100 times the size of ours (but this
> * affects the std error the same way, so the significance level
> * is identical). This is a useful place to note that it can be easier
> * for the reader if you scale variables for which the coefficient
> * would be very small, to make it easier to interpret, which
> * is what Card did. ;
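Card's rescaling is just a linear reparameterization. A quick check (a sketch on simulated data with made-up parameter values, not the Card data) confirms that dividing a regressor by 100 multiplies its coefficient and standard error by exactly 100, leaving the t-statistic unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
exper = rng.uniform(0, 20, n)
X = np.column_stack([np.ones(n), exper, exper**2])
y = 5 + 0.08 * exper - 0.002 * exper**2 + rng.normal(0, 0.4, n)

def ols(X, y):
    # OLS coefficients and conventional standard errors
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, se

b1, se1 = ols(X, y)

# rescale the squared term as Card does: expersq / 100
X2 = X.copy()
X2[:, 2] = X[:, 2] / 100
b2, se2 = ols(X2, y)

print(b2[2] / b1[2])                    # coefficient is 100x larger
print(se2[2] / se1[2])                  # so is the standard error
print(b1[2] / se1[2], b2[2] / se2[2])   # t-statistics are identical
```

The same logic applies to any linear rescaling: significance is unaffected, only the units of interpretation change.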

. * Part b ;
. regress educ exper expersq black south smsa reg661 reg662
>   reg663 reg664 reg665 reg666 reg667 reg668 smsa66 nearc4;
      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 15,  2994) =  182.13
       Model |  10287.6179    15  685.841194           Prob > F      =  0.0000
    Residual |  11274.4622  2994  3.76568542           R-squared     =  0.4771
-------------+------------------------------           Adj R-squared =  0.4745
       Total |  21562.0801  3009  7.16586243           Root MSE      =  1.9405

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |  -.4125334   .0336996   -12.24   0.000    -.4786101   -.3464566
     expersq |   .0008686   .0016504     0.53   0.599    -.0023674    .0041046
       black |  -.9355287   .0937348    -9.98   0.000     -1.11932   -.7517377
       south |  -.0516126   .1354284    -0.38   0.703    -.3171548    .2139296
        smsa |   .4021825   .1048112     3.84   0.000     .1966732    .6076918
      reg661 |   -.210271   .2024568    -1.04   0.299    -.6072395    .1866975
      reg662 |  -.2889073   .1473395    -1.96   0.050    -.5778042   -.0000105
      reg663 |  -.2382099   .1426357    -1.67   0.095    -.5178838    .0414639
      reg664 |   -.093089   .1859827    -0.50   0.617    -.4577559    .2715779
      reg665 |  -.4828875   .1881872    -2.57   0.010    -.8518767   -.1138982
      reg666 |  -.5130857   .2096352    -2.45   0.014    -.9241293   -.1020421
      reg667 |  -.4270887   .2056208    -2.08   0.038    -.8302611   -.0239163
      reg668 |   .3136204   .2416739     1.30   0.194    -.1602433    .7874841
      smsa66 |   .0254805   .1057692     0.24   0.810    -.1819071    .2328682
      nearc4 |   .3198989   .0878638     3.64   0.000     .1476194    .4921785
       _cons |   16.84852   .2111222    79.80   0.000     16.43456    17.26248
------------------------------------------------------------------------------
. * This lines up with Card as well. The coefficient seems reasonable
> * for those living near a college, and it contributes to the regression
> * meaningfully. One way to assess this would be with an F-test of the
> * effect of including the instruments; since there is only one
> * instrument, we can simply look at a t-test of the instrument in the
> * first-stage regression, and we can see that it has statistical
> * explanatory power (see Wooldridge, top of p. 105). The size and
> * significance suggest a reasonably strong instrument. ;
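The t-test and F-test really are interchangeable here: with a single excluded instrument, the first-stage F-statistic for excluding it equals the square of its t-statistic (so the t of 3.64 above corresponds to F of about 13.2). A small check on simulated data (not the Card data) illustrates the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
z = rng.normal(size=n)            # a single instrument
x = 0.3 * z + rng.normal(size=n)  # endogenous-style regressor

# first stage: regress x on a constant and z, get the t-stat on z
X = np.column_stack([np.ones(n), z])
b, *_ = np.linalg.lstsq(X, x, rcond=None)
resid = x - X @ b
s2 = resid @ resid / (n - 2)
se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t = b[1] / se

# F-test of excluding z: compare restricted (constant-only) RSS to full RSS
rss_r = np.sum((x - x.mean()) ** 2)
rss_u = resid @ resid
F = (rss_r - rss_u) / (rss_u / (n - 2))

print(t**2, F)  # with one instrument, F = t^2 exactly
```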

. * Part c ;
. ivreg logwage exper expersq black south smsa reg661 reg662
>   reg663 reg664 reg665 reg666 reg667 reg668 smsa66 (educ = nearc4);
Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 15,  2994) =   51.01
       Model |  141.146813    15  9.40978752           Prob > F      =  0.0000
    Residual |  451.494832  2994  .150799877           R-squared     =  0.2382
-------------+------------------------------           Adj R-squared =  0.2343
       Total |  592.641645  3009  .196956346           Root MSE      =  .38833

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1315038   .0549637     2.39   0.017     .0237335    .2392742
       exper |   .1082711   .0236586     4.58   0.000     .0618824    .1546598
     expersq |  -.0023349   .0003335    -7.00   0.000    -.0029888     -.001681
       black |  -.1467757   .0538999    -2.72   0.007    -.2524603   -.0410912
       south |  -.1446715   .0272846    -5.30   0.000      -.19817     -.091173
        smsa |   .1118083    .031662     3.53   0.000     .0497269    .1738898
      reg661 |  -.1078142   .0418137    -2.58   0.010    -.1898007   -.0258278
      reg662 |  -.0070465   .0329073    -0.21   0.830    -.0715696    .0574767
      reg663 |   .0404445   .0317806     1.27   0.203    -.0218694    .1027585
      reg664 |  -.0579172   .0376059    -1.54   0.124    -.1316532    .0158189
      reg665 |   .0384577   .0469387     0.82   0.413    -.0535777     .130493
      reg666 |   .0550887   .0526597     1.05   0.296    -.0481642    .1583416
      reg667 |    .026758   .0488287     0.55   0.584    -.0689832    .1224992
      reg668 |  -.1908912   .0507113    -3.76   0.000    -.2903238   -.0914586
      smsa66 |   .0185311   .0216086     0.86   0.391    -.0238381    .0609003
       _cons |   3.773965    .934947     4.04   0.000     1.940762    5.607169
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq black south smsa reg661 reg662 reg663 reg664
               reg665 reg666 reg667 reg668 smsa66 nearc4
------------------------------------------------------------------------------
. * The new estimate of the return to education is almost twice as high
> * (13% vs. 7.5% before). The 95% confidence interval here is
> * (.024, .239); the earlier one was (.068, .082). We have a lot less
> * precision with the IV procedure. However, this lack of precision is
> * appropriate. The precision in part (a) is false in the sense that the
> * estimates are not even consistent, since educ is endogenous (plus
> * they are biased). When we account for the endogeneity, the two-stage
> * IV procedure results in less precision but gives us consistency of
> * the estimate (2SLS is still biased in finite samples, but the bias
> * shrinks as the sample grows). ;
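The consistency-versus-precision tradeoff can be seen in a small Monte Carlo sketch. Everything below is simulated with hypothetical parameter values (an unobserved "ability" term correlating educ with the error), not the Card data: OLS is biased but tight, while 2SLS centers on the true coefficient with much more sampling variability.

```python
import numpy as np

rng = np.random.default_rng(42)

def one_draw(n=3000, beta=0.075):
    # educ is endogenous: it shares the unobserved 'ability' with log wages
    ability = rng.normal(size=n)
    nearc = rng.binomial(1, 0.5, n).astype(float)   # binary instrument
    educ = 12 + 0.4 * nearc + ability + rng.normal(size=n)
    logw = 4.7 + beta * educ + 0.2 * ability + rng.normal(0, 0.3, n)

    X = np.column_stack([np.ones(n), educ])
    Z = np.column_stack([np.ones(n), nearc])
    b_ols = np.linalg.lstsq(X, logw, rcond=None)[0][1]

    # 2SLS: replace educ with its first-stage fitted values
    educ_hat = Z @ np.linalg.lstsq(Z, educ, rcond=None)[0]
    Xh = np.column_stack([np.ones(n), educ_hat])
    b_iv = np.linalg.lstsq(Xh, logw, rcond=None)[0][1]
    return b_ols, b_iv

draws = np.array([one_draw() for _ in range(200)])
print(draws.mean(axis=0))  # OLS biased upward; IV centered near 0.075
print(draws.std(axis=0))   # IV is much noisier across samples
```

(In this design the omitted-variable bias pushes OLS up; in the Card application the suspected bias runs the other way, but the precision lesson is the same.)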

. * Part d ;
. regress educ exper expersq black south smsa reg661 reg662
>   reg663 reg664 reg665 reg666 reg667 reg668 smsa66 nearc4 nearc2;
      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 16,  2993) =  170.99
       Model |  10297.1164    16  643.569774           Prob > F      =  0.0000
    Residual |  11264.9637  2993  3.76377002           R-squared     =  0.4776
-------------+------------------------------           Adj R-squared =  0.4748
       Total |  21562.0801  3009  7.16586243           Root MSE      =    1.94

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |  -.4122915   .0336914   -12.24   0.000    -.4783521   -.3462309
     expersq |   .0008479     .00165     0.51   0.607    -.0023874    .0040832
       black |  -.9451729   .0939073   -10.06   0.000    -1.129302   -.7610434
       south |  -.0419115   .1355316    -0.31   0.757    -.3076561    .2238331
        smsa |   .4013708   .1047858     3.83   0.000     .1959113    .6068303
      reg661 |  -.1687829   .2040832    -0.83   0.408    -.5689404    .2313747
      reg662 |   -.269031   .1478324    -1.82   0.069    -.5588944    .0208325
      reg663 |  -.1902114   .1457652    -1.30   0.192    -.4760216    .0955987
      reg664 |   -.037715   .1891745    -0.20   0.842    -.4086403    .3332102
      reg665 |  -.4371387   .1903306    -2.30   0.022    -.8103307   -.0639467
      reg666 |  -.5022265   .2096933    -2.40   0.017    -.9133841   -.0910688
      reg667 |  -.3775317    .207922    -1.82   0.070    -.7852162    .0301529
      reg668 |   .3820043   .2454171     1.56   0.120    -.0991991    .8632076
      smsa66 |   .0000782   .1069445     0.00   0.999    -.2096139    .2097704
      nearc4 |   .3205819   .0878425     3.65   0.000      .148344    .4928197
      nearc2 |   .1229986   .0774256     1.59   0.112    -.0288142    .2748114
       _cons |   16.77306   .2163481    77.53   0.000     16.34885    17.19727
------------------------------------------------------------------------------
. * The variable nearc4 seems to be more strongly related to educ, and
> * the relationship is more precisely estimated for nearc4. ;
. ivreg logwage exper expersq black south smsa reg661 reg662
>   reg663 reg664 reg665 reg666 reg667 reg668 smsa66 (educ = nearc4 nearc2);

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 15,  2994) =   47.07
       Model |     100.869    15  6.72459998           Prob > F      =  0.0000
    Residual |  491.772645  2994   .16425272           R-squared     =  0.1702
-------------+------------------------------           Adj R-squared =  0.1660
       Total |  592.641645  3009  .196956346           Root MSE      =  .40528

------------------------------------------------------------------------------
     logwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1570594   .0525782     2.99   0.003     .0539662    .2601525
       exper |   .1188149   .0228061     5.21   0.000     .0740977     .163532
     expersq |  -.0023565   .0003475    -6.78   0.000    -.0030379   -.0016751
       black |  -.1232778     .05215    -2.36   0.018    -.2255313   -.0210243
       south |  -.1431945   .0284448    -5.03   0.000    -.1989678   -.0874212
        smsa |    .100753   .0315193     3.20   0.001     .0389512    .1625548
      reg661 |   -.102976   .0434224    -2.37   0.018    -.1881167   -.0178353
      reg662 |  -.0002286   .0337943    -0.01   0.995     -.066491    .0660337
      reg663 |   .0469556    .032649     1.44   0.150    -.0170612    .1109724
      reg664 |  -.0554084   .0391828    -1.41   0.157    -.1322364    .0214196
      reg665 |   .0515041   .0475678     1.08   0.279    -.0417647     .144773
      reg666 |   .0699968   .0533049     1.31   0.189    -.0345212    .1745148
      reg667 |   .0390596   .0497499     0.79   0.432    -.0584878     .136607
      reg668 |  -.1980371    .052535    -3.77   0.000    -.3010454   -.0950287
      smsa66 |   .0150626    .022336     0.67   0.500    -.0287328     .058858
       _cons |   3.339687   .8945377     3.73   0.000     1.585716    5.093658
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq black south smsa reg661 reg662 reg663 reg664
               reg665 reg666 reg667 reg668 smsa66 nearc4 nearc2
------------------------------------------------------------------------------
. * The new estimate of the return to educ is even higher, at 15.7%.
> * There is again sufficient precision to say with confidence that there
> * is a positive effect of education on earnings, but the confidence
> * interval is still fairly wide at (.054, .260). It's notable, though,
> * that the bottom end of the interval is not that much lower (in
> * magnitude) than the point estimate in the OLS where we ignored
> * endogeneity. Although we cannot be certain, this seems to suggest
> * that the uncorrected OLS estimate was likely biased downward. ;

. * Part e ;
. regress iq nearc4;

      Source |       SS       df       MS              Number of obs =    2061
-------------+------------------------------           F(  1,  2059) =   12.13
       Model |  2869.62905     1  2869.62905           Prob > F      =  0.0005
    Residual |  487188.423  2059  236.614096           R-squared     =  0.0059
-------------+------------------------------           Adj R-squared =  0.0054
       Total |  490058.052  2060  237.892258           Root MSE      =  15.382

------------------------------------------------------------------------------
          iq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |     2.5962   .7454966     3.48   0.001     1.134195    4.058206
       _cons |   100.6106   .6274557   160.35   0.000     99.38014    101.8412
------------------------------------------------------------------------------
. * They are correlated - probably not that shocking, given profs' smart
> * kids! But IQ may affect wages directly, so nearc4 could be picking up
> * IQ effects, which would make nearc4 correlated with the error in the
> * outcome equation (that would be a violation of our IV assumptions).
> * But wait...there's still part f. ;

. * Part f ;
. regress iq nearc4 smsa66 reg661 reg662 reg669;
      Source |       SS       df       MS              Number of obs =    2061
-------------+------------------------------           F(  5,  2055) =   12.79
       Model |  14792.5727     5  2958.51453           Prob > F      =  0.0000
    Residual |   475265.48  2055   231.27274           R-squared     =  0.0302
-------------+------------------------------           Adj R-squared =  0.0278
       Total |  490058.052  2060  237.892258           Root MSE      =  15.208

------------------------------------------------------------------------------
          iq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |   .8680808   .8216913     1.06   0.291    -.7433537    2.479515
      smsa66 |   1.354527   .8027961     1.69   0.092    -.2198513    2.928906
      reg661 |   4.768099   1.546809     3.08   0.002     1.734623    7.801576
      reg662 |    5.80812   .9017539     6.44   0.000     4.039673    7.576566
      reg669 |   1.844655   1.151703     1.60   0.109    -.4139708    4.103281
       _cons |   99.38472   .7016631   141.64   0.000     98.00868    100.7608
------------------------------------------------------------------------------
. * I'm not sure why we didn't use the whole set of dummies here...anyway,
> * this is good - IQ and nearc4 no longer appear to be partially correlated.
> * Or, at least, there is not a strong enough correlation for us to be
> * able to measure it precisely. The point here is that it is important
> * for us to control for 1966 location and regional dummies in the outcome
> * equation because these soak up the effects of IQ in a way that allows
> * the instrument to end up uncorrelated with the error in the outcome
> * equation (which is required in order for us to legitimately use IV). ;
. log close;
       log:  G:\Wooldridge5-4.log
  log type:  text
 closed on:  16 Feb 2005, 11:03:08

2. Please write me a description of the distinction between a proxy and an
instrument. Specifically, tell me about differences in the assumptions required for each to
be valid, and differences in the type of situation in which it would be useful to use such a
variable.
Proxy:
Use a proxy when you need a representative for an omitted variable for which you
don't have direct data (such as using IQ to proxy for ability in a returns-to-education
context). You place it directly into the regression to represent the
variable you don't have data for. The interpretation of the coefficient is the
predictive power of the proxy on the outcome. Note that we still cannot measure
the effect of the omitted variable; a proxy just acts as a control so that the other
coefficients in the regression aren't biased.
Assumptions:
a) uncorrelated with the error in the outcome equation (this is also called
"redundant" in the structural equation: if we had the real variable, this
one would be redundant)
b) correlated with the omitted variable
- more specifically, it should be closely related enough to the omitted
variable that the other Xs have no power for predicting the omitted
variable once the proxy is taken into account (though there's no way
to check this exactly, since we don't have data on the omitted
variable)
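A small simulation may help fix ideas. All variable names and parameter values below are hypothetical: omitting "ability" biases the coefficient on educ, and adding a noisy IQ proxy as a control removes most, though not all, of that bias.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
ability = rng.normal(size=n)
educ = 12 + 0.8 * ability + rng.normal(size=n)    # educ correlated with ability
iq = 100 + 10 * ability + rng.normal(0, 5, n)     # noisy proxy for ability
logw = 1.0 + 0.075 * educ + 0.10 * ability + rng.normal(0, 0.3, n)

def slope_on_educ(controls):
    # OLS coefficient on educ with a given list of control variables
    X = np.column_stack([np.ones(n), educ] + controls)
    return np.linalg.lstsq(X, logw, rcond=None)[0][1]

b_short = slope_on_educ([])     # omitted-variable bias pushes this up
b_proxy = slope_on_educ([iq])   # the proxy soaks up most of the ability effect
print(b_short, b_proxy)
```

Because the proxy is noisy, a little bias survives; only a perfect proxy (the real variable) would remove it entirely.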
Instrument:
Use an instrument when you have data on an endogenous variable that you think
is correlated with some other omitted variable in your outcome equation, causing
the regression estimates to be biased. An instrument is used to represent the
endogenous variable (NOT the omitted one), and if we consider the single-variable
case, the instrument can be put into the outcome equation directly. While we
could look at the coefficient on the instrument, we are typically interested in the
effect of the endogenous variable, which we get by dividing the coefficient on the
instrument in the outcome equation by the coefficient on the instrument from the
first stage.
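This ratio is the indirect least squares (or Wald) estimator, and in the just-identified case it coincides exactly with 2SLS. A sketch on simulated data (all parameter values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
z = rng.binomial(1, 0.5, n).astype(float)         # binary instrument
ability = rng.normal(size=n)                      # unobserved confounder
x = 1.0 + 0.5 * z + ability + rng.normal(size=n)  # endogenous regressor
y = 2.0 + 0.3 * x + 0.4 * ability + rng.normal(size=n)

def slope(w, v):
    # OLS slope of v on a constant and w
    return np.cov(w, v)[0, 1] / np.var(w, ddof=1)

first_stage  = slope(z, x)            # effect of the instrument on x
reduced_form = slope(z, y)            # effect of the instrument on y
b_ils = reduced_form / first_stage    # indirect least squares ratio

# the same number via 2SLS: regress y on the first-stage fitted values
x_hat = np.poly1d(np.polyfit(z, x, 1))(z)
b_2sls = slope(x_hat, y)
print(b_ils, b_2sls)  # identical in the just-identified case
```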

Assumptions:
a) uncorrelated with the error in the outcome equation (this is also called
"redundant" in the structural equation: if we had a clean version of the
endogenous regressor, without its implicit correlation with some other
omitted variable, then the instrument would be redundant)
b) correlated with the endogenous variable (and NOT with the omitted
variable that is causing the endogenous regressor to be endogenous; if it is
correlated with the omitted variable, it will fail to meet assumption (a))
In terms of comparing the two, one clear similarity is in the assumptions
(particularly the first one). However, a clear difference is that they are used to fix
different problems. In one case (instrument) we have a variable of interest, but it
is correlated with some omitted variable, preventing us from estimating the effect
properly. We want a representative that will get rid of the endogenous part of the
variable of interest. In the case of a proxy, controlling for the omitted variable
itself is of interest, and we are looking for a way to do this with some substitute
because the data are not available. In a very practical sense, these are distinct in
that there can be no "first stage" in a proxy setting, because we do not have data
on the variable we are trying to represent (and if we did, we wouldn't need the
proxy!).
3. Regarding Lott's work: I would like you to tell me if you think this (his website
defense) is a sufficient argument for choosing not to use clustering in the analysis.
Do your best to convince me of your position by explaining why the analysis does or
does not need clustering.
This was a hard question. I gave substantial partial credit for wrong answers that were well
reasoned. A strong answer would address:
a) when clustering standard errors is still needed, even with dummies
b) what dummies can and cannot successfully fix
c) why clustering will make SEs bigger even if it's unneeded
John Lott's analysis uses county-level data from several states and looks at the impact of
state-level treatments. Note that he does not use individual data at all; the unit of
observation is the county. This means that when he refers to using county fixed effects, this is
equivalent to an individual fixed effect from the perspective of his sample, where each
observation is a county. He argues that including county fixed effects implicitly includes
state fixed effects. This argument is correct. However, this only moves us one step closer to
the real question: Does including state fixed effects mean you don't need clustering at the
state level? The answer is that you still may need clustering.

State fixed effects are an important component of an analysis that uses state-level treatments.
There may be correlated outcomes Y within a state that are not picked up by observable Xs.
This can be thought of as an omitted variables (endogeneity) problem, so if this is the case
and we do not include state fixed effects, our estimates of the treatment effect will be biased
and inconsistent (not to mention the standard errors!). Including a state fixed effect allows
us to explain some of this variation. Econometrically, it will force the expected value of the
residuals within each state to be zero (if they averaged something else, this would have been
incorporated into the estimate of the fixed effect by construction).
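This mechanical property is easy to verify: in any OLS regression that includes a full set of group dummies, the residuals average exactly zero within every group. A sketch with simulated state/county data (the setup is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
n_states, per = 10, 30                       # 10 "states", 30 "counties" each
state = np.repeat(np.arange(n_states), per)
n = n_states * per
x = rng.normal(size=n)
state_effect = rng.normal(size=n_states)[state]
y = 1.5 * x + state_effect + rng.normal(size=n)

# regress y on x plus a full set of state dummies (no separate constant)
D = (state[:, None] == np.arange(n_states)[None, :]).astype(float)
X = np.column_stack([x, D])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# within every state, the residuals average zero by construction
within_means = np.array([resid[state == s].mean() for s in range(n_states)])
print(np.abs(within_means).max())  # numerically zero
```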
Suppose that these state fixed effects properly fix the point estimates (i.e., there is no longer
an omitted variables problem). What does the error structure look like now? Well, within
each state there are several counties. We can estimate a regression and look at the residuals
within each state; they will average zero (as noted above), but depending on the state they
might be spread widely or distributed narrowly around zero. This is a heteroskedasticity
problem; solve it with the robust option to fix your standard errors.
Where does the clustering come in? It is worth noting that the clustering problem would
have been HUGE if we ignored the fixed effects to start with, so including them does
make the problem smaller (which is why some of our intuition suggested that it could fix the
problem). However, it may still remain. The issue is that we have controlled only for a very
specific form of correlation among observations within a state: a form of correlation in
which every observation in the state has a common (state-level) component of variance and
a random component that is individual-specific (or, in Lott's case, county-specific). We have
assumed all states have this same within-state correlation structure. It is conceivable,
though, that there are other correlations among counties in a state that are not picked up
by this very simple model of correlation. Wooldridge, in his paper "Cluster-Sample Methods
in Applied Econometrics," says that an example would be something that is somehow related
to the other Xs in the regression, such as if people within certain states tend to have certain
Xs that are related to certain error-term patterns, which could cause a complication in the
relationships among the errors within a state. Clustering the standard errors, along with
making them robust (Stata does this automatically), will address this problem. (However,
let me note that in the case described by Wooldridge it seems there may also be an
endogeneity problem if there is some nonrandom relationship between Xs and error terms.)
Mitchell Petersen's paper "Estimating Standard Errors in Finance Panel Data Sets:
Comparing Approaches" also addresses this issue and gives a nice example of a situation in
which clustering is still needed in the presence of fixed effects. He examines the use of
various standard error corrections in the presence of different types of error correlation.
Some key insights are on pages 6-8, Section IV (starting page 23), and Section V (starting
page 26). I have pasted the most transparent part of the paper for our purposes below.
Petersen's point is that adding firm (or, in our case, county) fixed effects will fix everything IF
the only correlation among counties is a fixed, time-invariant component. (This echoes
Wooldridge.) He notes that this will fail if there is a gradually changing firm effect. I would
add that the same may be true for geographically based correlation (nearby counties may
be more correlated with each other than distant counties, even within a state).

"Once we include the firm effects, the OLS standard errors are unbiased .... The clustered standard errors
are unbiased with and without the fixed effects (see Kezdi, 2004, for examples where the clustered standard
errors are too large in a fixed effect model). This conclusion, however, depends on the firm effect being fixed. If
the firm effect decays over time, the firm dummies no longer fully capture the within cluster dependence and
OLS standard errors are still biased (see Table 5 - Panel A, columns II-IV). In these simulations, the firm
effect decays over time (in column II, 61 percent of the firm effect dissipates after 9 years). Once the firm
effect is temporary, the OLS standard errors again underestimate the true standard errors even when firm
dummies are included in the regression (Wooldridge, 2003, Baker, Stein, and Wurgler, 2003)." (p. 28)

If it happens that you are certain that the error structure of your data is perfectly picked up
by fixed effects, you will not need to cluster. Moreover, you will not want to cluster.
Why? Your standard errors will get unnecessarily large. But why should they change,
especially given the way Moulton (1990) presented the formula for the adjustment (which
seems to imply that if there is no correlation, no adjustment is made)? The intuition here is
that any time we allow for more flexibility of estimation, it costs us something. Estimating
these flexible standard errors causes a loss of efficiency; you can think of it as using up
some of our observations (degrees of freedom) to calculate these special standard errors.
This would lead us to want to KNOW whether we need clustering, since we wouldn't want
to use it unnecessarily.
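Moulton's approximation for the inflation of the standard error of a cluster-level regressor is often written as sqrt(1 + (m - 1)*rho), where m is the cluster size and rho is the intraclass correlation of the errors; with rho = 0 the factor is 1, so no adjustment is implied. A quick sketch (the cluster size of 67 below is a hypothetical number, not taken from Lott's data):

```python
import math

def moulton_factor(m, rho):
    """Approximate SE inflation factor for a regressor fixed within clusters,
    with m observations per cluster and intraclass correlation rho
    (Moulton 1990)."""
    return math.sqrt(1 + (m - 1) * rho)

# with no within-cluster correlation, no adjustment is implied
print(moulton_factor(67, 0.0))   # 1.0
# even modest correlation inflates SEs a lot when clusters are large
print(moulton_factor(67, 0.05))  # about 2.07
```

This is why "a little" residual within-state correlation is not harmless: with many counties per state, even a small rho can roughly double the correct standard errors.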
A newer (2004) paper by Lott posted on his website contains an appendix with his argument
for why any correlation in his errors is taken care of by his fixed effects (though he now
includes clustering throughout the paper, for comparability with other work and to be
conservative about his estimates). He does a test for correlation of errors to argue his point
(a test that another econometrics colleague and I have found to be weak at best). This
seems a step in the right direction, the idea being that one must still make some kind of
argument for choosing NOT to cluster, even when fixed effects are included. The
argument that fixed effects are included is not itself a sufficient reason to avoid
clustering.