This action might not be possible to undo. Are you sure you want to continue?
Analysis of Cross Section and Panel
Data, by Jeffrey M. Wooldridge, MIT Press, 2002.
The empirical examples are
solved using various versions of Stata, with some dating back to Stata 4.0.
Partly out of laziness, but also because it is useful for students to see
computer output, I have included Stata output in most cases rather than type
tables.
In some cases, I do more hand calculations than are needed in current
versions of Stata.
Currently, there are some missing solutions.
I will update the solutions
occasionally to fill in the missing solutions, and to make corrections.
some problems I have given answers beyond what I originally asked.
For
Please
report any mistakes or discrepencies you might come across by sending me email at wooldri1@msu.edu.
CHAPTER 2
dE(yx1,x2)
dE(yx1,x2)
= b1 + b4x2 and
= b2 + 2b3x2 + b4x1.
dx1
dx2
2
b. By definition, E(ux1,x2) = 0. Because x2 and x1x2 are just functions
2.1. a.


of (x1,x2), it does not matter whether we also condition on them:
E(ux1,x2,x2,x1x2) = 0.
2
c. All we can say about Var(ux1,x2) is that it is nonnegative for all x1
and x2:
E(ux1,x2) = 0 in no way restricts Var(ux1,x2).
2.3. a. y = b0 + b1x1 + b2x2 + b3x1x2 + u, where u has a zero mean given x1
and x2:
b.
E(ux1,x2) = 0.
We can say nothing further about u.
dE(yx1,x2)/dx1 = b1 + b3x2.
Because E(x2) = 0, b1 =
1
E[dE(yx1,x2)/dx1].
Similarly, b2 = E[dE(yx1,x2)/dx2].
c. If x1 and x2 are independent with zero mean then E(x1x2) = E(x1)E(x2)
= 0.
Further, the covariance between x1x2 and x1 is E(x1x2Wx1) = E(x1x2) =
2
2
E(x1)E(x2) (by independence) = 0.
A similar argument shows that the
covariance between x1x2 and x2 is zero.
But then the linear projection of
x1x2 onto (1,x1,x2) is identically zero.
Now just use the law of iterated
projections (Property LP.5 in Appendix 2A):
L(y1,x1,x2) = L(b0 + b1x1 + b2x2 + b3x1x21,x1,x2)
= b0 + b1x1 + b2x2 + b3L(x1x21,x1,x2)
= b0 + b1x1 + b2x2.
d. Equation (2.47) is more useful because it allows us to compute the
partial effects of x1 and x2 at any values of x1 and x2.
Under the
assumptions we have made, the linear projection in (2.48) does have as its
slope coefficients on x1 and x2 the partial effects at the population average
values of x1 and x2  zero in both cases  but it does not allow us to
obtain the partial effects at any other values of x1 and x2.
Incidentally,
the main conclusions of this problem go through if we allow x1 and x2 to have
any population means.
2.5. By definition, Var(u1x,z) = Var(yx,z) and Var(u2x) = Var(yx).
2
assumption, these are constant and necessarily equal to s1
Var(u2), respectively.
By
_ Var(u1) and s22 _
2
But then Property CV.4 implies that s2
> s21.
This
simple conclusion means that, when error variances are constant, the error
variance falls as more explanatory variables are conditioned on.
2.7. Write the equation in error form as
2
y = g(x) + zB + u, E(ux,z) = 0.
Take the expected value of this equation conditional only on x:
E(yx) = g(x) + [E(zx)]B,
and subtract this from the first equation to get
y  E(yx) = [z  E(zx)]B + u
~
~
or y = zB + u.
~
~
Because z is a function of (x,z), E(uz) = 0 (since E(ux,z) =
~ ~
~
0), and so E(yz) = zB.
This basic result is fundamental in the literature on
estimating partial linear models.
First, one estimates E(yx) and E(zx)
using very flexible methods, typically, socalled nonparametric methods.
~
Then, after obtaining residuals of the form yi
^
E(zixi),
B
^
~
_ yi  E(y
ixi) and zi _ zi  
~
~
is estimated from an OLS regression yi on zi, i = 1,...,N.
Under
general conditions, this kind of nonparametric partiallingout procedure leads
to a
rNconsistent, asymptotically normal estimator of B.

See Robinson (1988)
and Powell (1994).
CHAPTER 3
3.1. To prove Lemma 3.1, we must show that for all e > 0, there exists be <
and an integer Ne such that P[xN
following fact:
since xN
p
L
But
We use the
a, for any e > 0 there exists an integer Ne such
that P[xN  a > 1] < e for all N
Definition 3.3(1).]
> be] < e, all N > Ne.
8
> Ne .
[The existence of Ne is implied by
xN = xN  a + a < xN  a + a (by the triangle
inequality), and so
xN  a < xN  a.
It follows that P[xN 
< P[xN  a > 1].
Therefore, in Definition 3.3(3) we can take be
a > 1]
_ a + 1
(irrespective of the value of e) and then the existence of Ne follows from
Definition 3.3(1).
3
1)/se(q) = 3/2 = 1. This follows immediately from Lemma 3.7.39 and se(g^) = 1/2.m) ~a Normal(0.g)]  2 ^ = (1/q) Avar[rN(q .  is used: ^2 s = (N  The asymptotic i=1 ^ standard error of yN is simply s/rN. and so Avar[rN(yN .  ^ c.yN)2. g = log(4) ^ When q = 4 ~ 1.q)].     c. ^ b. By the CLT. of course. the null of interest can also be stated as H0: g = 4 . the asymptotic standard error of g is generally dg(^q)/dqWse(^q). Because g = log(q). we need a consistent estimator of s.Avar[rN(g .s2).m)] = N(s /N) = s .q)].g)] = [dg(q)/dq] Avar[rN(q . The asymptotic t statistic for testing H0: q = 1 is (q . e. ^ ^ and se(q) = 2. se(g) = se(q)/q. or s/rN. 3.m)] by N.  d.   e. ^ ^ d.  2 Avar(yN) = s /N. Since Var(yN) = s /N.  As expected.   3. this coincides with the actual variance of yN.5.3. To obtain the asymptotic standard error of yN.which is. the unbiased estimator of s 2 S (yi . We Obtain Avar(yN) by dividing Avar[rN(yN . In the scalar case.1 because g(xN) p L g(c). for g(q) = log(q). and so ^ ^ plim[log(q)] = log[plim(q)] = log(q) = g.m)] = s2.    Therefore. For q > 0 the natural logarithim is a continuous function.  2  2 rN(yN .   When g(q) = ^ log(q) . and then s^ is the positive square root.5. a. Var[rN(yN .3. ^ ^ ^ Therefore. We use the delta method to find Avar[rN(g . The asymptotic standard deviation of yN is the square root of its asymptotic variance.g)]. a. ^ ^ ^ 2 ^ if g = g(q) then Avar[rN(g .  In the scalar case. 1 N 1) Typically. 2  b. continuously differentiable .
^ Avar[rN(G  where G(Q) = G)] ~ = G(Q)V1G(Q)’. whereas the t statistic based on q is. By assumption.V1 is positive semidefinite. Now.78. V2 .0.s. Now. using the Wald test. the percentage difference is 100W[exp(b1) . ~ Avar[rN(G  G)] G)] = G(Q)V2G(Q)’. The lesson is that. at best. marginally significant. ^ .1] = g(b1). This leads to a ^ very strong rejection of H0.1. 3.d. b. By the delta method. finding the proportionate difference in this expectation at married = 1 and married = 0 (with all else equal) gives exp(b1) . all other factors cancel out. Avar[rN(G  Dqg(Q) is Q * P. CHAPTER 4 4.1. E(wagex) = E[exp(u)x]exp(b0 + b1married + b2educ + zG). Therefore. Therefore.Avar[rN(G  G)] = G(Q)(V2 . if u and x are independent Therefore E(wagex) = d0exp(b0 + b1married + b2educ + zG). and therefore G(Q)(V2 V1)G(Q)’ is p. we need the derivative of g with 5 . say.1].5) = 2. we can change the outcome of hypotheses tests by using nonlinear transformations. Exponentiating equation (4. ^ The t statistic based on g is about 1. This completes the proof. where x denotes all explanatory variables. Thus. a. then E[exp(u)x] = E[exp(u)] = d0.39/(.49) gives wage = exp(b0 + b1married + b2educ + zG + u) = exp(u)exp(b0 + b1married + b2educ + zG).V1)G(Q)’.9. Since q1 = 100W[exp(b1) .
11.29).5. But. all else fixed.199. 2 b. in which case OLS is consistent. se(b1) = ^ ^ . Var(ux) = E(u ^ ^ Therefore. Not in general. because E(x’z) = 0.1 = exp(b2Deduc) .50) as E(yw) = wD. It could be that E(x’u) = 0. For the estimated version of equation (4.006. with upper block E(x’x) and lower block E(z ).039. say educ0 and educ1.3. generally. where w = (x. and Var(ux) is constant. 2 the upper K * K block gives 6 Inverting E(w’w) and focusing on . E(w’w) is block diagonal. c.1.[E(ux)]2. The proportionate change in expected wage from educ0 to educ1 is [exp(b2educ1) . Write equation (4. 4. it follows by Theorem 4. ^ q2 we set Deduc = 4. ^ Using the same arguments in part (b).dg/db1 = 100Wexp(b1). 4. We can evaluate the conditional expectation in part (a) at two levels of education.z). the usual standard errors would not be valid unless E(ux) = 0. se(b2) = . if E(ux) $ 0.educ0)] . q1 = 22. D) Since Var(yw) = is s [E(w’w)] .01 and se(q1) = 4.g)’.1] and ^ ^ ^ se(q2) = 100WDeducexp(b2Deduc)se(b2) ^ ^ d. b1 = . 2 ^ s . The conditional variance can always be written as x) .2 that Avar rN(D  ^ ^ (B’.where 2 1 ^ D = Importantly. b2 = . For ^ ^ Then q2 = 29. a.7 and se(q2) = 3. q2 = 100W[exp(b2Deduc) . then E(u2x) $ Var(ux).065. respect to b1: ^ The asymptotic standard error of q1 ^ using the delta method is obtained as the absolute value of dg/db1 times ^ se(b1): ^ ^ ^ se(q1) = [100Wexp(b1)]Wse(b1).exp(b2educ0)]/exp(b2educ0) = exp[b2(educ1 .76.
E(v x’x) . 1 Because [E(x’x)] 1 s E(x’x) is p.Avar rN(B . other things equal. the equation y = xB + v generally violates the 2 Unless E(z homoskedasticity assumption OLS. 2 2 2 1 is positive definite.z) = 0 and E(u x. E(v x’x) . a. It is helpful to write y = xB + v  _ y .Avar rN(B . Avar So.B). E(v x’x) = E[E(v 2 2 _ E(z2x). 2 2 2 2 4.7.B) .3). it suffices to show that E(v x’x) 2 To this end.  Next. ~ 1 2 1 rN(B . One important omitted factor in u is family income: students that come from wealthier families tend to do better in school. where we use E(zux. Therefore. which.z) = zE(ux.s E(x’x)][E(x’x)] .d.B)   = [E(x’x)] E(v x’x)[E(x’x)] 1 2 1 = [E(x’x)] E(v x’x)[E(x’x)] 1 2 1 . ~ rN(B .E(yx.B) = [E(x’x)] E(v x’x)[E(x’x)] .3.  Now we can show Avar ~ ^ rN(B . let h(x) iterated expectations.z). Then by the law of x)x’x] = g2E[h(x)x’x] + s2E(x’x). 2 = h $ 0.Avar ^ 2 1 rN(B .z) = Var(yx.s [E(x’x)] E(x’x)[E(x’x)] 2 1 1 = [E(x’x)] [E(v x’x) . without further assumptions.B) is positive semidefinite by   writing Avar ~ ^ rN(B . which is positive definite.s E(x’x) = g E[h(x)x’x].s.B) . we need to find Avar where v = gz + u and u E(x’v) = 0.s [E(x’x)] 2 1 . Family income and PC ownership are positively correlated because the probability of owning a PC increases with family income. when g 2 2 2 a positive definite matrix except by fluke.B) = s [E(x’x)] . x) = g2E(z2x) + E(u2x) + 2gE(zux) = g2E(z2x) + 2 Further. 2 2 2 x) is constant. Because E(x’z) = 0 and E(x’u) = 0. 7 Another factor in u . is actually x) = E(z2) 2 In particular.s E(x’x) = g h E(x’x). E(v s .z) = s . if E(z > 0 (in which case y = xB + v satisfies the homoskedasticity assumption OLS.
b3 is likely to have an upward bias because of the positive correlation between u and PC. a. This may also be correlated with PC: a student who had more exposure with computers in high school may be more likely to own a computer. b. where sw 1 1 = sd(w1) and sw = sd(w). average teacher salary. as zip code is often part of school records. If we write the linear projection u = d0 + d1hsGPA + d2SAT + d3PC + r then the bias is upward if d3 is greater than zero.is quality of high school. w1 = log(y1). but it is not clearcut because of the other explanatory variables in the equation. This measures the partial correlation between u (say.9. If family income is not available sometimes level of parents’ education is.w)/Var(w1). But Corr(w1.w) = Cov(w1.w)/(sw sw). by assumption. The coefficient on log(y1) changes. and it is likely to be positive. c. Clearly. Then the population slope coefficient in a simple regression is always a1 = Cov(w1. Another possibility is to use average house value in each student’s home zip code. Var(w) = Var(w1). expenditure per student. let w = log(y). the intercept and slope estimates on x will be the same. and since a correlation coefficient is always between 1 1 8 . Just subtract log(y1) from both sides: Dlog(y) = b0 + xB + (a1 . and so on. If data on family income can be collected then it can be included in the equation. family income) and PC. But.1)log(y1) + u. Proxies for high school quality might be facultystudent ratios. For simplicity. ^ b. 4.w)/(sw sw). so we can write a1 = Cov(w1.
89964382 Residual  121.2087073 .006125 .1921449 .000 .28 0.36251 lwage  Coef.2662 0. We can see from the t statistics that these variables are going to be 9 . we have an even lower estimated return to education.0031183 .000 . 4.0002 a.002 .0109248 .0305676 urban  .1334913 . 925) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 935 37.559489 925 .0032308 3. test iq kww ( 1) ( 2) iq = 0.0000 0. When we used no proxy the estimated return was about 6.000 .066 0.0074608 _cons  5.534 0.002 .177362188 Number of obs F( 9.863 0.1157839 . Thus.0064117 .000 .0157246 married  .039 .506 0. Std.0024457 4.079 0.0051059 kww  .2685059 south  .000 4.938 0.1758226 .0018521 2.000 .0010128 3. 925) = Prob > F = 8.0820295 .59 0.2286334 black  .0389094 4.5%. reg lwage exper tenure married south urban black educ iq kww Source  SS df MS +Model  44.and 1. and with only IQ as a proxy it was about 5.0967944 9 4. Err.947 0.467 0.1230118 . b.1303995 .0 F( 2.003826 .0 kww = 0.127776 40.001 .0011306 .0355856 .0640893 iq  .175644 .0399014 3.131415664 +Total  165. but it is still practically nontrivial and statistically very significant.128 0. Here is some Stata output obtained to answer this question: .0127522 .0001911 .4%.0269095 6. the result follows.426408 .11. The estimated return to education using both IQ and KWW as proxies for ability is about 5%.0190927 tenure  .007262 6.0498375 . Interval] +exper  .2591 .656283 934 . t P>t [95% Conf.0262222 3.924879 5.0520917 educ  .268 0.
Std.15 0.13.4162 0.4946899 lprbconv  .28 0.725921 4.18405574 +Total  26.69 0. 85) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 90 15. Using the 90 counties for 1987 gives . c.42902 lcrmrte  Coef.009923 Because of the loglog functional form. Blacks are estimated to earn about 13% less than nonblacks.000 . The F test verifies this. The wage differential between nonblacks and blacks does not disappear.1153163 6.2507964 .6377519 .000 . b. all coefficients are elasticities.4014499 _cons  4. To add the previous year’s crime rate we first generate the lag: . 4. a. reg lcrmrte lprbarr lprbconv lprbpris lavgsen if d87 Source  SS df MS +Model  11.4725112 . with pvalue = .2486073 . and both are practically and statistically significant.0764213 . The elasticities of crime with respect to the arrest and conviction probabilities are the sign we expect.77 0.641 . holding all other factors fixed.6447379 85 .jointly significant.000 5.441 .4315307 11.1634732 0.570136 lavgsen  .7239696 .3888 .28 0.47 0.9532493 .1596698 .301120202 Number of obs F( 4. gen lcrmr_1 = lcrmrte[_n1] if d87 (540 missing values generated) . Err. The elasticities with respect to the probability of serving a prison term and the average sentence length are positive but are statistically insignificant.867922 .2064441 0. t P>t [95% Conf.0000 0.799698 89 .78874002 Residual  15. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lcrmr_1 if d87 10 .0831078 5.1549601 4 2.3072706 lprbpris  . Interval] +lprbarr  .0002.
799698 89 .20251 lcrmrte  Coef.0465999 0. and the latter is almost statistically significant at the 5% level against a twosided alternative (pvalue = . the elasticity with respect to the lagged crime rate is large and very statistically significant. The conviction Adding the lagged crime rate changes the signs of the elasticities with respect to prbpris and avgsen.0627624 2.3130986 2.67099462 Residual  3.28 0.389257 .) c.3549731 5 4.0000 0.301120202 Number of obs F( 14.25 0.004 . t P>t [95% Conf.3098523 .038930942 +Total  26.1520228 .8707 . Std.0539921 lprbpris  . Err.1439946 There are some notable changes in the coefficients on the original variables.0386768 . (The elasticity is also statistically different from unity.95 0.0452114 17.0036684 lcrmr_1  .1266874 .81 0.056 .0602325 lprbconv  .000 .94 0.8697208 _cons  .3077141 . but still have signs predicted by a deterrenteffect story.3232625 .7798129 .204 . Interval] +11 .799698 89 . 75) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 90 43.91982063 75 .056).0698876 lavgsen  .1313457 . 84) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 90 113. Interval] +lprbarr  .19731 lcrmrte  Coef.8715 0.8911 0. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lcrmr_1 lwconlwloc if d87 Source  SS df MS +Model  23.016 1.90 0.6899051 .7666256 .0782915 1. The elasticities with respect to prbarr and prbconv are much smaller now.4447249 84 .8798774 14 1. Std. Err. Adding the logs of the nine wage variables gives the following: . Not surprisingly.70570553 Residual  2.0000 0.301120202 Number of obs F( 5.Source  SS df MS +Model  23.83 0.8638 .45 0.04100863 +Total  26.0988505 1.409 . t P>t [95% Conf.1850424 . probability is no longer statistically significant.
1674273 .4749687 .37 0.336).792525 1.056 7.0987371 . which is appended to the "reg" command.253707 .11 0.3291546 0.039 .634 .50 0.692009 .011 .3361278 .1964974 0.05 0.43 0.911 . These are with respect to the wage in construction ( .1525615 .0641312 .0686327 lwtuc  .0 0.0847427 1.672 .61 0.3732769 . (This F statistic is the heteroskedasticityrobust Wald statistic divided by the number of restrictions being tested.187 .7153665 lwfir  .049728 1. nine in this 12 .957472 1. d.which also have the largest absolute t statistics have the opposite sign.8509887 lwcon  .2815703 lwmfg  .2079524 .8248172 lwsta  .1024014 2. testparm lwconlwloc ( ( ( ( ( ( ( ( ( 1) 2) 3) 4) 5) 6) 7) 8) 9) lwcon lwtuc lwtrd lwfir lwser lwmfg lwfed lwsta lwloc F( = = = = = = = = = 0.3317244 lwtrd  .1725122 .2850008 .618724 _cons  3.364317 . 75) = Prob > F = 1.1069592 .2155553 .62 0.lprbarr  .4522947 lwloc  .113 .09 0.0 0.0659533 2.19 and pvalue = .0 0.19 0.1960546 .277 .032.37 0.0306994 lprbpris  .134327 0.173 .408 . Using the "robust" option in Stata.2034619 .1127542 .1375459 .1775178 1. gives the heteroskedasiticityrobust F statistic as F = 2.2072112 0.6386344 .0683639 .6396942 .3038978 .0 0.94 0.0560619 .11 0.3079171 lwser  .0835258 .7453414 .2453134 1.48 0.0 0.3350201 lwfed  .2317449 1.0844647 2.175 .1186099 0.285) and the wage for federal employees (.0411265 lprbconv  .4195493 .0 0.000 .023 .849 .0 0. The two largest elasticities .0 9.0530331 14.0277923 lcrmr_1  .6926951 .0115614 lavgsen  .32 0.0395089 .0 0.0369855 .1643 The nine wage variables are jointly insignificant even at the 15% level.33 0.83 0. the elasticities are not consistently positive or negative. Plus.
2 The statement "Var(ui) = s are nonrandom (or B = Var(yi) for all i" assumes that the regressors = 0. Cov(xB. plim(R ) = 1 2 2 2 . which is not a very interesting case). a. Therefore. the error changes. Cov(xB. Because each xj has finite second moment. Since Var(u) But each xj is uncorrelated with u. or sy = Var(xB) + su. 2 Therefore. 2 2 b. The population Rsquared depends on only the unconditional variances of u and y. so we should allow for error variances to change across different models for the same response variable.(SSR/N)/(SST/N).) In the vast majority of economic applications. Suppose that an element of the error term.[plim(SSR/N)]/[plim(SST/N)] = 1 . (It gets smaller.SSR/SST = 1 .u) = 0.su/sy = r . The derivation in part (c) assumed nothing about Var(ux).example. regardless of the nature of heteroskedasticity in Var(ux).15. so Therefore. it makes no sense to think we have access to the entire set of factors that one would ever want to control for. the usual Rsquared consistently estimates the population Rsquared.u) is welldefined. The division by the number of restrictions turns the asymptotic chi square statistic into one that roughly has an F distribution. This is nonsense when we view the xi as random draws along with yi. 13 . which is uncorrelated with each xj. 8. suddenly becomes observed. When we add z to the regressor list.) 4. say z. This is another example of how the assumption of nonrandom regressors can lead to counterintuitive conclusions. Var(y) = Var(xB) + Var(u). 2 where we use the fact that SSR/N is a consistent estimator of su and SST/N is 2 a consistent estimator of sy. Neither Rsquared nor the adjusted Rsquared has desirable finitesample properties. and so does the error variance. d. Write R = 1 . 2 c.plim[(SSR/N)/(SST/N)] = 1 . Var(xB) < < 8.
on average. the residuals are just z1 since v2 is N orthogonal in sample to z. 5.r1)’ be OLS estimator from (5. a.1.3. ^ y2. because i=1 ^ ^ ^ ^ we can write y2 = y2 + v2. Define x1 ^ ^ ^ _ (z1.a1)’. or eat less nutritious meals. B^1 can also be obtained by partitioned regression: ^ (i) Regress x1 onto v2 and save the residuals. Basic economics says that packs should be negatively correlated with cigarette price.) Further. There may be unobserved health factors correlated with smoking behavior that affect infant birth weight. S z’i1^vi2 = 0. so the only analysis we can do in any generality involves asymptotics. where B^1 ^ ^ = (D’ 1 . women who smoke during pregnancy may. But the 2SLS estimator of B1 is obtained ^ exactly from the OLS regression y1 on z1.y2) and x2 _ v^2. (More precisely.54).such as unbiasedness.y2). but we must be a little careful. ^ In other words. although the correlation might be small (especially because price is aggregated at the state level). the ^ residuals from regressing y2 onto v2 are simply the first stage fitted values. Using the hint. y2. The statement in the problem is simply wrong. (ii) Regress y1 onto ¨ x1. One component of cigarette price is the state tax on 14 . CHAPTER 5 5.52). ¨ x1 = (z1. At first glance it seems that cigarette price should be exogenous in equation (5. where y2 and v2 are orthogonal in sample. and let B _ (B ’1 . For example. ^ ^ But when we regress z1 onto v2. drink more coffee or alcohol. b. say ¨ x1.
000 3.0064486 .001 .601 0.0460328 parity  .056 0.890 0.) 15 .cigarettes. standard errors.333819 2.2588289 17.233 0. Quality of health care is in u.264 .0837281 .0036171 . . t P>t [95% Conf.063646 . Std.0218813 213.770361 1383 .0050562 .0000 0.000 4.960122 4.734 0.0298205 .4203336 1387 .0056646 2.39 0.718542 .086275 0.017779 1. OLS is followed by 2SLS (IV.0262407 .0570128 1.975601  (Note that Stata automatically shifts endogenous explanatory variables to the beginning of the list when report coefficients. Interval] +packs  .632694 4. reg lbwght male parity lfaminc packs (male parity lfaminc cigprice) Source  SS df MS +Model  91. reg lbwght male parity lfaminc packs Source  SS df MS +Model  1.000 . Std. and so maybe cigarette price fails the exogeneity requirement for an IV.262 0.7971063 1.0219322 0.1173139 .8375067 Residual  141.0290032 packs  . on average. .0258414 lfaminc  .035179819 +Total  50.600 0.094 .65369 1383 .0481949 .0417848 lfaminc  .1754869 _cons  4.4203336 1387 .32017 lbwght  Coef.0646972 parity  . in this case): .0322 . c. Interval] +male  .0350 0.0055837 3.116 0.0012391 .009 . States that have lower taxes on cigarettes may also have lower quality of health care.0171209 4. 1383) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = (2SLS) 1388 2.18756 lbwght  Coef.76664363 4 .009 .0147292 . and so on.55 0.0100894 2.102509299 +Total  50.3500269 4 22.044263 .0501423 _cons  4.0070964 .0490 .677 0.467861 . Err.675618 .441660908 Residual  48.681 0.955 .0180498 .928031 male  .463 1. t P>t [95% Conf. 1383) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 1388 12.036352079 Number of obs F( 4.036352079 Number of obs F( 4. Err.
3414234 The reduced form estimates show that cigprice does not significantly affect packs.041 .051 0.766 .0007459 .0047261 .55) because each is uncorrelated with u1.29448 packs  Coef.298 0. z1 and z2 are exogenous in (5.The difference between OLS and IV in the estimated effect of packs on bwght is huge.0358264 .696129 1387 . and is not statistically significant.089182501 Number of obs F( 4.0007291 .0355724 cigprice  .317 . 16 Unfortunately. the coefficient on cigprice is not the sign we expect. Err.86 0.0007763 1. cigprice fails as an IV for packs because cigprice is not partially correlated with packs (with a sensible sign for the correlation). 5. Std. Thus.0276 .321 0.0355692 lfaminc  .086716615 +Total  123. y2 . With the OLS estimate.001 0. Under the null hypothesis that q and z2 are uncorrelated. 1383) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 1388 10.000777 . We can see the problem with IV by estimating the reduced form for packs: .4%. in fact. one more pack of cigarettes is estimated to reduce bwght by about 8. This is separate from the problem that cigprice may not truly be exogenous in the birth weight equation.0305 0. The sign and size of the smoking effect are not realistic.0526374 . reg packs male parity lfaminc cigprice Source  SS df MS +Model  3.94176277 Residual  119.044 0.0000 0. and is statistically significant. t P>t [95% Conf.0666084 .1040005 1.929078 1383 .187 . Interval] +male  .0263742 parity  .0088802 2. The IV estimate has the opposite sign.76705108 4 .0022999 _cons  .000 .0086991 6.0697023 .5. is huge in magnitude. d.0181491 .0158539 0.1374075 .
This is the sense in which identification With a single endogenous variable.z1. v (by definition of redundancy). Further. we might fail to reject H0: J1 = 0 when z2 and q are correlated . z2 does not produce a consistent estimator of 0 on z2 even when E(z’ 2 q) = 0. pK+M must be different from zero. b.7.xK.is correlated with u1. .in which case we incorrectly conclude that the elements in z2 are valid as instruments...h1a1..x1.45).45) we get y = b0 + b1x1 + .. cannot be tested.. in the linear projection q1 = p0 + p1x1 + .zM) to get consistent of the bj and h1.h1a1. that ^ J 1 We could find from this regression is statistically different from zero even when q and z2 are uncorrelated . Or. The point of this exercise is that one cannot simply add instrumental variable candidates in the structural equation and then test for significance of these variables using OLS..56) Now. a.. and so the regression of y1 on z1. + pK+MzM + r1.. they are uncorrelated with the structural error.. where h1 _ (1/d1).in which case we would incorrectly conclude that z2 is not a valid IV candidate. we can estimate (5. We need family background variables to be redundant in the log(wage) 17 . at least one of pK+1. we must take a stand that at least one element of z2 is uncorrelated with q. y2. Since each xj is also uncorrelated with v . + bKxK + h1q1 + v .. we have assumed that the zh are uncorrelated with a1. If we plug q = (1/d1)q1 .. + pKxK + pK+1z1 + .(1/d1)a1 into equation (5.z2.56) by 2SLS using instruments (1. 5. since the zh are redundant in (5. (5....... Given all of the zero correlation assumptions. what we need for identification is that at least one of the zh appears in the reduced form for q1. More formally.
70 .07 0.0241444 urban  .1901012 .046 .1225442 .000 .192 .208996 713 . 713) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 722 25.equation once ability (and other factors.392231 .2819033 south  .811916 721 .1451 .81 0.0162185 .175883378 Number of obs F( 8.1546 0.0000 0.0305692 tenure  . The idea here is that family background may influence ability but should have no partial effect on log(wage) once ability has been accounted for.150363248 +Total  126.00 0.2635832 exper  .468913 9.035254 .0367425 1.047992 .0154368 .45036497 Residual  107.54 0. once the xj have been netted out. reg lwage exper tenure educ married south urban black iq (exper tenure educ married south urban black meduc feduc sibs) Instrumental variables (2SLS) regression Source  SS df MS +Model  19.2513311 black  .0003044 .013 . t P>t [95% Conf.35 0. say IQ.725 .70 0.38777 lwage  Coef.0161809 . have been controlled for.000 .537 .1835294 .0077077 2.0083503 .0982991 .48 0.RAW gives the following results: .000 .0467592 4.0327986 5.0400269 .000 3. 713) = 722 25. we need family background variables to be correlated with the indicator. reg lwage exper tenure educ married south urban black kww (exper tenure educ married south urban black meduc feduc sibs) Instrumental variables (2SLS) regression Source  SS df MS +18 Number of obs = F( 8.0015979 .31 0.0261982 0.0076754 . q1.62 0.6029198 8 2.1201284 .0240867 _cons  4. For the rank condition to hold. Interval] +iq  . Err.0676158 married  .1869376 . This is likely to be true if we think that family background and ability are (partially) correlated. c.0030956 2.0040076 4. Applying the procedure to the data set in NLS80. such as educ and exper).0137529 educ  . Std.471616 .551 5.1138678 0.05 0.
0046184 .0411598 3.0125238 educ  . and sibs.1551341 .2179041 . because data are missing on meduc and feduc.635 . or they might be correlated with a1. This would allow us to use all 935 observations.150058361 +Total  126.0545067 tenure  . and then use the binary indicators as instruments along with meduc.000 4.002.1468 .0565198 .477538 Residual  106.38737 lwage  Coef. Std.03 0.0249441 .02 0.098 .66 0.537362 Even though there are 935 men in the sample.004 . (In both firststage regressions.991612 713 .176 . t P>t [95% Conf.0761549 married  .000 .1563 0.02 0.820304 8 2.0201147 _cons  5.0150576 1. and sibs have pvalues below .36 0. so it seems the family background variables are sufficiently partially correlated with the ability indicators.9.0067471 1.898273 5. so that b4 = b3 + q4.0529759 3.85 0.0286399 urban  .091887 .0675914 .06 0.0063783 .1484003 .b3. What we could do is define binary indicators for whether the corresponding variable is missing.307 .Model  19. the F statistic for joint significance of meduc.0068682 .0322147 2.811916 721 . only 722 are used for the estimation.1605273 .0239933 . The return to education is estimated to be small and insignificant whether IQ or KWW used is used as the indicator.0000 0. Define q4 = b4 . the equation and rearranging gives 19 Plugging this expression into . feduc.0037739 1. Interval] +kww  .0260808 . feduc. This could be because family background variables do not satisfy the appropriate redundancy condition.61 0.47 0.003 .0051145 .0893695 0.) 5.0022947 .2292093 black  .309 .1627592 32.1330137 exper  .217818 .0424452 .175883378 Prob > F Rsquared Adj Rsquared Root MSE = = = = 0.2645347 south  . set the missing values to zero.0255051 1. Err.
let y2 be the linear projection of y2 on z2. * Now. Following the hint. E(z’u1) = 0. The key consistency condition is that each explanatory is orthogonal to the composite error. y2. In a simple regression model with a single IV. 5.x) _ * = i i 7i=1 i 8 7i=1 i 8 20 . a. a1a2 + u1. just estimate the latter equation by 2 2SLS using exper. and so E(z’ 1 r2) = 0 and E(y2r2) = 0. where E(z’r2) = 0.13. and assume that is known. 0 0 Effectively. E(y2a2) = 0 by construction.and secondstage regressions. is that E(z’ 1 a2) Therefore. 0 Further. * that P2 The problem $ 0 necessarily because z1 was not included in the linear projection for y2. general. 0 5._z)(y ._y)*/& SN (z . r2 is uncorrelated with z. OLS will be inconsistent for all parameters in * Contrast this with 2SLS when y2 is the projection on z1 and z2: = y2 + r2 = zP2 + r2. * The lesson is that one must be very careful if manually carrying out 2SLS by explicitly doing the first.) 0 Plugging in y2 = y2 + a2 gives y1 = z1D1 + a1y2 + a1a2 + u1. exper . the IV estimate of the ^ slope can be written as b1 = & SN (z . Now. let a2 L2 be the projection error. log(wage) = b0 + b1exper + b2exper = b0 + b1exper + b2exper where totcoll = twoyr + fouryr. we regress y1 on z1. ^ We can use the t statistic on q4 to test H0: q4 = 0 against H1: q4 > 0.1 show that the argument carries over to the case when L2 is estimated.11. (The results on generated regressors in Section 6. dist2yr and dist4yr as the full set of instruments. assumption.1._z)(x . By y2 The second step regression (assuming is known) is essentially y1 = z1D1 + a1y2 + a1r2 + u1.2 + b3(twoyr + fouryr) + q4fouryr + u 2 + b3 totcoll + q4fouryr + u.
 b.y). If x is also binary . x0 = 0. (When eligibility is Generally.(N0/N)y0   = (N0/N)(y1 .x1 is the  fraction of observations receiving treatment when zi = 1 and x0 is the fraction receiving treatment when zi = 0.   Next. So the difference in the mean response between the z = 1 and z = 0 groups gets divided by the difference in participation rates across the two groups. Taking the ratio proves the result. as a weighted average: clear.   The same argument shows that the denominator is (N0N1/N)(x1 .& SN z (y . In L(xz) = z^. ) .15. a. suppose xi = 1 if person i participates in a job training program. So.&7 S zi*8_y = N1y1 .y = [(N . 0 is the L1 x K2 zero matrix. then 21 ^11 has .) participation rates when z = 1 and z = 0.y0). i=1 i=1 i=1 N where N1 = S zi is the number of observations in the sample with zi = 1 and     i=1   y1 is the average of the yi over the observations with zi = 1. where the notation should be     Straightforward algebra shows that y1 ._y)*/& SN z (x . Then x1 is the fraction of people  participating in the program out of those made eligibile.N1y = N1(y1 .x0).   So the numerator of the IV estimate is (N0N1/N)(y1 . we can write ^ = (^ 9^ 2 11 12 0 IK identity matrix. and x0 is the fraction of people participating who are not eligible.representing some "treatment" ._y) = S ziyi . K1.x0 is the difference in   necessary for participation.12. the vector z1 does not appear in L(xjz). Now the numerator can be written as 7i=1 i i 8 7i=1 i i 8 N N N S zi(yi .N1)/N]y1 . 5. write y  y = (N0/N)y0 + (N1/N)y1. If for some xj. x1 .y0). and let zi = 1 if person i is eligible for participation in the program. and ^12 is K2 x As in Problem 5. the rank condition holds if and only if rank(^) = K._x)*. where IK 2 20 ^11 2 is the K2 x K2 is L1 x K1.
j = 1. b. which means that the second row of zeros. It cannot have rank K. rank is a K1 x K1 diagonal matrix with nonzero diagonal elements.. only one of them turned out to be partially correlated with x1 and x2. while we began with two instruments.a column which is entirely zeros. c. a necessary condition for the rank condition is that no columns of ^11 be exactly zero. we see that if 2 20 ^11 is diagonal with all nonzero is lower triangular with all nonzero diagonal elements. ^ can be written as which means rank(^) < K. ^ = K. Suppose K1 = 2 and L1 = 2. ^ is all Intuitively.K1. but z2 appears in neither reduced form. But then that column of a linear combination of the last K2 elements of ^. predict v2hat. Then Looking at ^11 ^ = diagonals then Therefore.1. resid 22 . Here is abbreviated Stata output for testing the null hypothesis that educ is exogenous: ... qui reg educ nearc4 nearc2 exper expersq black south smsa reg661reg668 smsa66 . which means that at least one zh must appear in the reduced form of each xj. where z1 appears in the reduced form form both x1 and x2. we can simply reorder the elements of z1 to ensure this is the case.. Therefore. CHAPTER 6 6. Without loss of generality. ^11 Then the 2 x 2 matrix has zeros in its second row. a. in that case. we assume that zj appears in the reduced form for xj. (^ 11 12 9^ 2 ^ 0 IK ) .
179 0.0828005 ..729054 4.1811588 .0017308 black  .0436804 1.0118296 .100753 .540 0.1286352 reg668  .0610759 .253 0. predict uhat1.1570594 .0478882 2.710 0.994 .1598776 expersq  .000 1.1575042 reg661  .000 .238 .0359807 1.0515041 . which is not significant at the 5% level against a twosided alternative.855 0. reg uhat1 exper expersq black south smsa reg661reg668 smsa66 nearc4 nearc2 Source  SS df MS Number of obs = 23 3010 .2926273 .066 0.821434 4.0205106 0. t P>t [95% Conf.673 0.000 .1057408 reg664  .0484086 1.0003191 7.1188149 . Std.0552789 v2hat  .0259797 .000 .001 .1659733 reg667  .430 0.574 0.583 0.1980371 .339687 .463 .) b.105 0.0209423 5. that educ is endogenous.0341426 . Err. I would call this marginal evidence (Depending on the application or purpose of a study.0247932 reg662  .384 0.000 .0310325 0.0777521 .0293807 south  .0919791 smsa  .0456842 0.482 0.0699968 .1944098 . The negative correlation between u1 and educ is essentially the same finding that the 2SLS estimated return to education is larger than the OLS estimate.0002286 .010 .1034468 smsa66  .0390596 .1259578 .950319 ^ The t statistic on v2 is 1.0482814 3.0029822 .734 0.010 . resid Now.0623912 .0489487 1.1371509 reg666  .124 .0554084 . qui reg lwage educ exper expersq black south smsa reg661reg668 smsa66 (nearc4 nearc2 exper expersq black south smsa reg661reg668 smsa66) .177718 .0121169 _cons  3.050516 .0261202 5.0606186 reg663  .087 .1232778 .007 0. we regress the 2SLS residuals on all exogenous variables: .566 0.393 .0023565 .0251538 .0440018 .0299809 1.71.71 as evidence for or against endogeneity.153 .000 . To test the single overidentifying restiction we obtain the 2SLS residuals: . the same researcher may take t = 1.0289435 3. reg lwage educ exper expersq black south smsa reg661reg668 smsa66 v2hat lwage  Coef.0482417 4.0150626 .1431945 .0398738 2.102976 .0151411 reg665  .2171749 . Interval] +educ  .0469556 .117 .481 0.2517275 exper  . In any case.
27332168 2 The pvalue.0004 1.163433913 F( 16.012745177 Residual  491.. .273.08 = 1. exper. A potential problem is that prices reflect food quality and that features of the food other than calories and protein appear in the disturbance u1. educ. b. 2993) Prob > F Rsquared Adj Rsquared Root MSE = 0. is about .164239466 +Total  491. exper . di 3010*. exper . di chiprob(1. c. pM.+Model  . exper. First. We need prices to satisfy two requirements. a. v21 and v22.568721 2993 . While this is easy to test for each by estimating the two reduced forms. We would first estimate the two reduced forms for calories and protein 2 by regressing each on a constant. and the M prices.1. we must also assume prices are exogenous in the productivity equation. the rank condition could still be violated (although see Problem 15.0000 = 0. In addition. ^ ^ We obtain the residuals.. v22 and do a joint 24 ..2) . calories and protein must be partially correlated with prices of food.203922832 16 .3. obtained from a c1 distribution.0049 = . educ. v21.772644 3009 .5c). prices vary because of things like transportation costs that are not systematically related to regional variations in individual productivity.0004 = 0. p1.40526 The test statistic is the sample size times the Rsquared from this regression: . Since there are two endogenous explanatory variables we need at least two prices.204 . 6. Ideally. Then we would run the 2 ^ ^ regression log(produc) on 1. so the instruments pass the overidentification test.
S (hi . 6.2uixi(B  B) + so 1/2 N N S (hi . N i=1 i=1 We are done with this part if we show N 1/2 N N S (hi .B)(B .Mh)’(xi t xi)*8{vec[rN(B . 1/2 N i=1 i=1 S (hi . E(ux) = 0.s ). i h i i 8 7 (6. absorb the intercept in x. ^ [xi(B N B)]2.Mh)’ = Op(1) by the central limit theorem and s^2 . N 1/2& 7N The third term can be written as ^ ^ 1/2 S (hi . ^2 In these tests.^s2) = N1/2 S (hi .Mh)’u^2i = N1/2 S (hi  i=1 M 2 h)’ui + op(1). So N Therefore. ^2 ^2 So ui .s2) = Op(1)Wop(1) = op(1). We could use a standard F test or use a heteroskedasticityrobust test. so i=1 far we have 1/2 N N ^2 2 S h’i (u^2i .B) 7 i=1 N & 1/2 S (h .Mh)’u2i i=1 i=1 & 1/2 N ^ . which means that asymptotically.) N (In any case. as in Problem 4. the law of large numbers  B)’x’i = (xi   implies that the sample average is op(1).B) = 7 i=1 8 Op(1) and. s is implictly SSR/N .5. so y = xB + u.Mh)’xi8*(B .Mh)’u^2i = N1/2 S (hi .Mh)’xi] = 0. a.s ) + op(1).s2 = Next. N i=1 1/2 N op(1).s has a zero sample average.Mh)’xi rN(B .2 N S ui(hi . Dropping the "2" the second term can & 1 N * ^ ^ be written as N S ui(hi .B)’]}. we can write ui = ui .B)’]} = N WOp(1)WOp(1).B)rN(B .there is no degrees of Var(ux) = s .M )’(x t x )*{vec[(B ^ ^ + N .B) = op(1)WOp(1) because rN(B . For simplicity. under E(uixi) = 0. the df adjustment makes no difference 1/2 N ^2 ^2 1/2 N ^2 ^2 S (hi .s ) = N i . 2 freedom adjustment.Mh)’(u i .Mh)’(u S h’i (u i . i=1 1 N   where we again use the fact that sample averages are Op(1) by the law of large ^ numbers and vec[rN(B  B)rN(B^   B)’] = Op(1). E[ui(hi .4.B)(B .B)’].Mh)’(s^2 . i=1 ^2 2 ^ Now.^ ^ significance test on v21 and v22.40) i=1 ^ where the expression for the third term follows from [xi(B  B)]2 ^ = xi(B  B)(B^ ^ ^ t xi)vec[(B . 25 We have shown that the last two .
terms in (6.40) are op(1), which proves part (a).
1/2 N
S h’i (u^2i  ^s2) is Var[(hi
b. By part (a), the asymptotic variance of N
i=1
M

2
h)’(ui
2 2
4
2uis
2
 s )] =
+ s .
2
E[(ui
2 2
 s ) (hi 
Mh)’(hi

Mh)].
2
Under the null, E(uixi) = Var(uixi) = s
2
2
2
xi] = k2  s4 _ h2.
2 2
2
2 2
standard iterated expectations argument gives E[(ui  s ) (hi 2
2 2
Mh)}
Mh)’(hi

Mh)]xi}
2
[since hi = h(xi)] = h E[(hi 
show.
2
2 2
= E{E[(ui  s )
Mh)’(hi

4
= ui 
[since E(uixi) = 0 is
assumed] and therefore, when we add (6.27), E[(ui  s )
= E{E[(ui  s ) (hi 
2 2
Now (ui  s )
Mh)].
Mh)’(hi

A
Mh)]
xi](hi  Mh)’(hi 
This is what we wanted to
(Whether we do the argument for a random draw i or for random variables
representing the population is a matter of taste.)
c. From part (b) and Lemma 3.8, the following statistic has an asymptotic
2
cQ distribution:
&N1/2 SN (u^2  s^2)h *{h2E[(h  M )’(h  M )]}1&N1/2 SN h’(u
^2
^2 *
i
i8
i
h
i
h
i
i  s )8.
7
7
i=1
i=1
N ^2
^2
Using again the fact that S (ui  s ) = 0, we can replace hi with hi  h in

i=1
the two vectors forming the quadratic form.
Then, again by Lemma 3.8, we can
replace the matrix in the quadratic form with a consistent estimator, which is
^2& 1
h N
^2
1
where h = N
N ^2
^2 2
S (u
i  s ) .
7
N
S (hi  h)’(hi  h)*8,
i=1


The computable statistic, after simple algebra,
i=1
can be written as
& SN (u^2  s^2)(h  h)*& SN (h  h)’(h  h)*1& SN (h  h)’(u
^2
^2 * ^2
i
i
i  s )8/h .
7i=1 i
87i=1 i
8 7i=1 i




^2
^2
Now h is just the total sum of squares in the ui, divided by N.
The numerator
^2
of the statistic is simply the explained sum of squares from the regression ui
on 1, hi, i = 1,...,N.
Therefore, the test statistic is N times the usual
^2
2
(centered) Rsquared from the regression ui on 1, hi, i = 1,...,N, or NRc.
2
2 2
d. Without assumption (6.37) we need to estimate E[(ui  s ) (hi 
Mh)]
generally.
Hopefully, the approach is by now pretty clear.
26
Mh)’(hi
We replace
the population expected value with the sample average and replace any unknown
parameters (under H0).
B,
2
s , and
Mh
in this case  with their consistent estimators
&
7
^2
^2 *
S h’i (u
i  s )8
i=1
1/2 N
So a generally consistent estimator of Avar N
is
N
1 N
S (u^2i  s^2)2(hi  h)’(hi  h),


i=1
and the test statistic robust to heterokurtosis can be written as
& SN (u
^2
^2
*& SN (u^2  ^s2)2(h  h)’(h  h)*1
 s )(hi  h)
i
i
i
7i=1
87i=1 i
8
N
&
^2
^2 *
W7 S (hi  h)’(ui  s )8,




i=1
which is easily seen to be the explained sum of squares from the regression of
^2
^2
1 on (ui  s )(hi  h), i = 1,...,N (without an intercept).

Since the total
sum of squares, without demeaning, is N = (1 + 1 + ... + 1) (N times), the
statistic is equivalent to N  SSR0, where SSR0 is the sum of squared
residuals.
6.7. a. The simple regression results are
. reg lprice ldist if y81
Source 
SS
df
MS
+Model  3.86426989
1 3.86426989
Residual  17.5730845
140 .125522032
+Total  21.4373543
141 .152037974
Number of obs
F( 1,
140)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
142
30.79
0.0000
0.1803
0.1744
.35429
lprice 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+ldist 
.3648752
.0657613
5.548
0.000
.2348615
.4948889
_cons 
8.047158
.6462419
12.452
0.000
6.769503
9.324813
This regression suggests a strong link between housing price and distance from
the incinerator (as distance increases, so does housing price).
27
The elasticity
is .365 and the t statistic is 5.55.
However, this is not a good causal
regression:
the incinerator may have been put near homes with lower values to
begin with.
If so, we would expect the positive relationship found in the
simple regression even if the new incinerator had no effect on housing prices.
b. The parameter d3 should be positive:
after the incinerator is built a
house should be worth more the farther it is from the incinerator.
Here is my
Stata session:
. gen y81ldist = y81*ldist
. reg lprice y81 ldist y81ldist
Source 
SS
df
MS
+Model  24.3172548
3 8.10575159
Residual  37.1217306
317 .117103251
+Total  61.4389853
320 .191996829
Number of obs
F( 3,
317)
Prob > F
Rsquared
Adj Rsquared
Root MSE
=
=
=
=
=
=
321
69.22
0.0000
0.3958
0.3901
.3422
lprice 
Coef.
Std. Err.
t
P>t
[95% Conf. Interval]
+y81  .0113101
.8050622
0.014
0.989
1.59525
1.57263
ldist 
.316689
.0515323
6.145
0.000
.2153006
.4180775
y81ldist 
.0481862
.0817929
0.589
0.556
.1127394
.2091117
_cons 
8.058468
.5084358
15.850
0.000
7.058133
9.058803
The coefficient on ldist reveals the shortcoming of the regression in part (a).
This coefficient measures the relationship between lprice and ldist in 1978,
before the incinerator was even being rumored.
The effect of the incinerator
is given by the coefficient on the interaction, y81ldist.
While the direction
of the effect is as expected, it is not especially large, and it is
statistically insignificant anyway.
Therefore, at this point, we cannot reject
the null hypothesis that building the incinerator had no effect on housing
prices.
28
638 1. Err.0469214 .041028709 +Total  61.677871 309 .0248165 4.675 0.229847 .0101699 .191996829 Number of obs F( 11.926 0.0222128 larea  .000 .041817 .027479 3.1593143 lintst  .000 .43282858 Residual  12.000 .0132713 . t P>t [95% Conf.096088 .062) and the t statistic is larger.0000 0.185185 5.0517205 1. Adding the variables listed in the problem gives .1588297 age  .489 0.4389853 320 .241 0.006 .c.3213518 1.0046178 agesq  .0591504 . The Stata results are .0617759 . reg ldurat afchnge highearn afhigh male married headconstruc if ky Source  SS df MS +Model  358.0611683 .7863 .0187723 3.627 0.3548562 .0000315 8. a.953 0.4556655 lland  .41206 5334 1.0000486 rooms  .189519 .04 0.0073939 .7937 0. 6.300 0.0512328 6.471 0.0866424 . reg lprice y81 ldist y81ldist lintst lintstsq larea lland age agesq rooms baths Source  SS df MS +Model  48.0357625 .305525 1.109999 . but the interaction is still statistically insignificant.56381928 +29 Number of obs F( 14.37 0.774032 1.3262647 2.002 .4877198 0.095 .0387 .7611143 11 4. 309) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 321 108.0958867 .000 .214 .20256 lprice  Coef.0151265 .1499564 _cons  2.7298249 ldist  .000 .0000 0.744 0. Std.195 1.6029852 Residual  8341.0805715 baths  .9.796236 The incinerator effect is now larger (the elasticity is about .69e06 3.0495705 1.0014108 5.151 0.246 0.0000144 .441793 14 25.0412 0.0171015 2.9633332 .003 . Using these models and this two years of data we must conclude the evidence that housing prices were adversely affected by the new incinerator is somewhat weak.605315 lintstsq  .432 0. Interval] +y81  .1884113 y81ldist  . 5334) Prob > F Rsquared Adj Rsquared = = = = = 5349 16.2540468 .
178539 . The low Rsquared means that making predictions of log(durat) would be very difficult given the factors we have included in the regression: the variation in the unobservables pretty much swamps the explained variation.24 0.2308768 .1011794 1.1614899 1.7673372 .32 0. Std.2076305 .1606709 .1904371 lowextr  .000 .813 .1%. The small Rsquared.76 0.3208922 . on the order of 4.93 0.0085967 .2604634 neck  .1987962 head  .2772035 afhigh  .0695248 3.98 0.85385 5348 1.1404816 .33).2727118 .0945798 .0466737 . and even more statistically significant than in equation (6. Err. Provided the Kentucky change is a good natural experiment.245922 . However. the low Rsquared does not mean we have a biased or consistent estimator of the effect of the policy change.13 0. or 3.0986824 highearn  . 30 With over 5.29 0.0804827 construc  .0409038 3.1264514 .033 .2408591 .0872651 .0086352 .001 .08 0. the OLS estimator is consistent.0445498 2.74 0.0106274 .0774276 .40 0.028 .03779 1.196 .2699126 .1757598 .0979407 .0803101 occdis  .1090163 1.933 .2505 ldurat  Coef.Total  8699. Adding the other explanatory variables only slightly increased the standard error on the interaction term.0391228 3.454054 The estimated coefficient on the interaction term is actually higher now.1852766 .16 0.5139003 .000 1.62674904 Root MSE = 1.3671738 male  .12 0.5864988 upextr  .240 . means that we cannot explain much of the variation in time on workers compensation using the variables included in the regression.1101967 .2117581 _cons  1.18 0.6859052 manuf  . we .246 .0518063 2. Interval] +afchnge  .0449167 0.000 observations.078 . t P>t [95% Conf.1015267 0.1202911 .340168 lowback  .002 .9% if we used the adjusted Rsquared.210769 1.20 0.0198141 trunk  .1292776 3. b.0517462 3.1220995 . is often the case in the social sciences: This it is very difficult to include the multitude of factors that can affect something like durat.0454027 .000 .001 .67 0.0743161 .095 .0106049 married  .1023262 1.376892 .1061677 11.
60 0.626/1.0973808 . The following is Stata output that I will use to answer the first three parts: .1691388 .3765 ldurat  Coef. . Unfortunately.992074 8 16.1104176 . Std. reg ldurat afchnge highearn afhigh if mi Source  SS df MS +Model  34.0118 0. reg lwage y85 educ y85educ exper expersq union female y85fem Source  SS df MS +Model  135.9990092 31 Number of obs = F( 8.0004 0.96981 1520 1. 1520) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 1524 6.1541699 1.0847879 1. 6.0379348 . because of the many fewer observations.000 1.301485 1. Interval] +afchnge  . Err. Using the data for Michigan to estimate the simple model gives .4943988 _cons  1.524) ~ 1.1919906 .192.0000 .251 .92 larger than that for Kentucky.0567172 24.523989 The coefficient on the interaction term. the t statistic is insignificant at the 10% level against a onesided alternative.15 0.213 . the ratio of The difference in the KY and MI cases shows the importance of a large sample size for this kind of policy analysis.3850177 3 11.91356194 Number of obs F( 3.05 0.0098 1. 1075) = Prob > F = 1084 99.89471698 +Total  2914.2636945 highearn  .412737 . Asymptotic theory predicts that the standard error for Michigan will be about 1/2 (5.25 0.0689329 . c.23. although the 95% confidence interval is pretty wide.35483 1523 1.can get a reasonably precise estimate of the effect.3762124 afhigh  .1055676 1.11.91 0. standard errors is about 2.109 . is remarkably similar to that for Kentucky.80 0.4616726 Residual  2879. In fact. t P>t [95% Conf.
2021319 . But the t statistic is only significant at about the 10% level against a twosided alternative.95 0.85 percentage points. The return to another year of education increased by about .4219 .022.0184605 .383.125075 .0225846 . Err. this is suggestive of some closing of wage differentials between men and women at given levels of education and workforce experience.0878212 y85educ  . or 1.0000775 5. between 1978 and 1985.0295843 .1426888 . the coefficient on y85 becomes about .085052 .3885663 .642295  a. Only the coefficient on y85 changes if wages are measured in 1978 dollars.091167 1083 . Still.1178062 .036584 expersq  . I just took the squared OLS residuals and regressed those on the year dummy.29 0.000 .65 0. The coefficient is about .4262 0.000 . you can check that when 1978 wages are used.0616206 .0185.2755707 .185729 _cons  . The coefficient on y85fem is positive and shows that the estimated gender gap declined by about 8.0035673 8. c.) d. In fact.0093542 1.049 .0302945 6.2615749 female  .91. 32 So .29463635 Rsquared = Adj Rsquared = Root MSE = 0.67 0.19 0.000 .098 .0003994 . which shows a significant fall in real wages for given productivity characteristics and gender over the sevenyear period.0002473 union  . y85.0366215 8. b. which gives a t statistic of about 1.97 0.000 .036815 exper  .3167086 .341 .000106 .0747209 .15 0. which is marginally significant at the 5% level against a twosided alternative. To answer this question. Std.051309 1.66 0.4127 lwage  Coef.4589329 . The t statistic is 1.1237817 0.244851 y85fem  . (But see part e for the proper interpretation of the coefficient.000 .3606874 educ  . t P>t [95% Conf.0066764 11.170324738 +Total  319.000 .97.Residual  183.099094 1075 .0934485 4.0005516 . Interval] +y85  .5 percentage points.042 with a standard error of about .0156251 .91 0.
0747209 . q0 is the coefficient on 12). educ.642295 So the growth in nominal wages for a man with educ = 12 is about .000 .0302945 6.12) .000 .29463635 Number of obs F( 8.339) 1.170324738 +Total  319.0002473 union  . e. Err.0616206 .0225846 . the coefficient d0 is the growth in nominal wages for a male with no years of education! with 12 years of education.0005516 .29 0.80 0.185729 _cons  . For a male A simple way to obtain ^ ^ ^ the standard error of q0 = d0 + 12d1 is to replace y85Weduc with y85W(educ Simple algebra shows that.0878212 y85educ0  . In Stata we have .000 .036584 expersq  .339.9990092 Residual  183.2615749 female  .0003994 .4060659 educ  .0340099 9. Interval] +y85  .3 to 40.there is some evidence that the variance of the unexplained part of log wages (or log real wages) has increased over time. [We could use the more accurate estimate. obtained from exp(.19 0.000 . t P>t [95% Conf. we want q0 _ d0 + 12d1.098 .0066764 11.000 .98 0.0035673 8. As the equation is written in the problem.3167086 .244851 y85fem  .085052 .2021319 . gen y85educ0 = y85*(educ .992074 8 16.15 0.051309 1.1426888 .67 0.9%.0000 0. Std.0295843 .66 0.4127 lwage  Coef.6.0000775 5.] The 95% confidence interval goes from about 27.2755707 .000 .4219 .3885663 .3393326 .0934485 4. or 33.0156251 .97 0.0184605 .091167 1083 . 1075) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 1084 99.000106 . 33 . in the new model.4262 0.4589329 . reg lwage y85 educ y85educ0 exper expersq union female y85fem Source  SS df MS +Model  135.0093542 1.91 0.000 .2725993 .099094 1075 .036815 exper  .049 .65 0.0366215 8.
1. SGLS. and E(uiguihx’ igxih) = E(uiguih)E(x’ igxih) = 0..CHAPTER 7 7. 0 2 s2 G E(x’ iGxiG)8 0 W When this matrix is inverted.1). &s2 0 1 E(x’ i1xi1) 2 1 0 W E(X’ i ) Xi) = 2 W 2 7 0 0 Therefore. we use the result under SGLS. asymptotic variance of what we wanted to show. and SGLS.3. This follows if the asymptotic variance matrix is block diagonal (see Section 3.2. Thus.3.B) = [E(X’ i ) Xi)] . where the blocking is by the parameter vector for each equation. all g $ h. 34 This shows that the . SGLS. 7 i=1 i i8 ^ & 1 N X *1Wplim &N1 SN X’u * = B + A1W0 = B.G. it is also block diagonal. plim &N1 SN X’X *1 = A1.1..5). and SGLS... Write (with probability approaching one) B^ = B + &N1 SN X’X *1&N1 SN X’u *. 7 i=1 i i8 7 i=1 i i8 From SOLS.1.2. implies that E(uigx’ igxig) = 2 In the SUR model with diagonal ).4: To establish block diagonality. ) plim B = B + plim N S X’ 7 i=1 i i8 7 i=1 i i8 Further. the WLLN implies that plim 7. 7 i=1 i i8 &N1 SN X’u * = 0.3. from Theorem 7. Now. a. we can use the special form of Xi for SUR (see Example 7. the weak law of large numbers. it suffices to show that the GLS estimators for different equations are asymptotically uncorrelated. we have * 2 2. Since OLS equationbyequation is the same as GLS when ) is diagonal. and Slutsky’s Theorem. . under SOLS.^ 1 1 Avar rN(B .3 sg2E(x’igxig) for all g = 1. the fact that )1 is diagonal.
7. GLS and FGLS are asymptotically equivalent (regardless of the structure of = B^GLS when ) and )) whether or not SGLS. a.. Note that 1 &^ 1 & N *1 N ^ 2) t 7 S x’i xi*82 = ) t &7 S x’i xi*8 .2.. i=1 82 2 W 2 7 2 N 2 2 SN x’y 2 i iG8 7 S x’i yiG8 7 i=1 i=1 Straightforward multiplication shows that the right hand side of the equation ^ ^ ^ ^ is just the vector of stacked Bg.B ^ ^ .6 for one way to impose general linear restrictions.3 does not hold.5.1 and SGLS. Under SGLS.. system OLS and GLS are the same. 7 i=1 8 i=1 Therefore.BGLS) = op(1).53). then rN(BSOLS . B^SOLS But. 7. is diagonal. First. we can either construct the Wald statistic or we can use the weighted sum of squared residuals form of the statistic as in (7. To test any linear hypothesis. .52) or (7. OLS and FGLS are asymptotically equivalent. consider E(uituis). g = 1. When ) is diagonal in a SUR system.. 35 2 are easily found since E(uit) = Now. This is easy with the hint.BFGLS) = op(1). c. See Problem 7.b. and .^ ^ rN( FGLS . model with the restriction B1 = For the restricted SSR we must estimate the B2 imposed. the diagonal elements of E[E(uitxit)] = 2 ) st2 by iterated expectations. if Thus.3 holds.G. even if )^ is estimated in an unrestricted fashion and even if the system homoskedasticity assumption SGLS. B^ = 1* &^ N ^ 1 2) t &7 S x’i xi*8 2() 7 i=1 8 & SN x’y * & SN x’y * i i12 i i12 2i=1 2i=1 2 2 1 2 2 & N * t IK)2 WW 2 = 2IG t &7 S x’i xi*8 22 W WW 22. ) 7. where Bg is the OLS estimator for equation g.
T S x’its2 t uit. Thus.. t=1s=1 Under (7. .t1 + uit. First consider the terms for s T T 2 S S s2 t ss E(uituisx’ itxis).. First. which says that xi..2. say. t $ s.uis. E(uitxit. 1 2 2 s2 T x’ iT)’. SGLS. xis.79). It follows that E(X’ i ) uiu’ i ) Xi) = 1 1 T 1 S s2 t E(x’ itxit) = E(X’ i ) Xi).s2 x’ i2.1. SGLS. since is diagonal. . Applying the law of iterated expectations (LIE) again we have E(uituis) = E[E(uituisuis)] = E[E(uituis)uis)] = 0.t+1 = yit is correlated with uit. X’ i) = (s1 x’ i1. t=1 36 Next. E(uituisx’ itxis) = 0.1 holds whenever there is feedback from yit to )1 However. The GLS estimator is 1 N N B* _ &7 S X’i )1Xi*8 &7 S X’i )1yi*8 i=1 i=1 & SN ST s2x’ x *1& SN ST s2x’ y *...Under (7. and so E(X’ i ) uiu’ i ) Xi) = 1 1 $ t. E(uitx’ itxit) = E[E(uitx’ itxitxit)] = E[E(uitxit)x’ itxit)] 2 2 2 = E[stx’ itxit] = 2 st2E(x’itxit).T. for each t.1 Generally. and so by the LIE. yit with uit. s > t. If.80).80). 7i=1t=1 t it it8 7i=1t=1 t it it8 = b0 + b1yi.80). E(uituis) = 0 since uis take s < t without loss of generality. and so t=1 = 0 Thus.. X’ i ) ui = 1 T E(X’ i ) ui) = S s2 t E(x’ ituit) t=1 1 since E(x’ ituit) = 0 under (7. GLS is consistent in this case without SGLS. does not hold. since )1 is diagonal.xis) = 0. if s < t. d.. t = 1. b. is a subset of the conditioning information in (7. then yit is clearly correlated = c.
In particular.67 0.^ ^ e. We can obtain valid standard errors..K as a degreesoffreedom adjustment. if ^ ) is taken to be the diagonal matrix with s^t2 as the diagonal. we can use the standard errors and test statistics reported by a standard OLS regression pooled across i and t....606623 107 2. run pooled regression across all i and t. inference is very easy.8509 .2. 7. FGLS reduces Thus. th t Now.T. and F statistics from (7..0000 0. i = 1. standard errors obtained from (7.2296502 103 . t = 1. then the FGLS statistics are easily shown to be identical to the statistics obtained by performing pooled OLS on the equation ^ ^ (yit/st) = (xit/st)B + errorit. I first test for serial correlation before computing the fully robust standard errors: . First.8565 0. let uit denote the pooled OLS residuals. for each t. We have verified the assumptions under which standard FGLS statistics have nice properties (although we relaxed SGLS. t statistics..376973 4 46.5942432 Residual  31.1). The Stata session follows. define N ^2 ^2 st = N1 S ^uit i=1 (We might replace N with N ... to pooled OLS.03370676 37 Number of obs F( 4. by ^2 st Lp s2t as N L 8. note that the ^2 st should be obtained from the pooled OLS residuals for the unrestricted model...53) are valid..9. Then. Then.51) are asymptotically valid. For F testing. If s2t = s2 for all t = 1. reg lscrap d89 grant grant_1 lscrap_1 if year != 1987 Source  SS df MS +Model  186. f.T.55064 .N. g.) standard arguments.. and F statistics from this weighted least squares analysis.303200488 +Total  217. 103) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 108 153.
6186631 Residual  15.3232679 lscrap_1  .028 0.5958904 _cons  .2123137 . t P>t [95% Conf.1723924 . but neither is strongly statistically significant.1746251 0.07976741 Number of obs F( 4.0769918 grant_1  .0378247 .2120579 lscrap_1  .9518152 _cons  .0000 0. gen uhat_1 = uhat[_n1] if d89 (417 missing values generated) .47 0.7530202 49 .8571 0.1153893 .0357963 24.1257443 1.173 .1146314 2.666 0.875 .227673 53 2. resid (363 missing values generated) .0000 .4217765 .606 0. and its lag.8055569 1.8454 .035384 uhat_1  .3532078 . Err.1224292 grant  .809828 . 49) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 54 73. are now the expected sign.567 lscrap  Coef. Now test for AR(1) serial correlation: . The results are certainly different from when we omit the lag of log(scrap). Interval] +grant  .4628854 .24 0.077 0.097 0.321490208 +Total  110.083 . reg lscrap d89 grant grant_1 lscrap_1 if year != 1987.158 0. Interval] +d89  .1576739 1.1073226 .0021646 .0276544 .770 0.lscrap  Coef.3785767 . Std.048 . Err.0165089 .8808216 .4500385 grant_1  .0883283 0. predict uhat.2790328 .371 0.426703 .9204706 .939 .1199127 0.420 0.338 . robust cluster(fcode) Regression with robust standard errors Number of obs = F( 4.138043 The estimated effect of grant.000 .675 .000 .4746525 4 23.507 . reg lscrap grant grant_1 lscrap_1 uhat_1 if d89 Source  SS df MS +Model  94.232525 .1610378 0.0371354 . The variable grant would be if we use a 10% significance level and a onesided test.215732 0.962 0.4170208 . t P>t [95% Conf. Std.0571831 16. 53) = Prob > F = 38 108 77.
735673 618 .3450708 .14 0.3795728 39 . t P>t [95% Conf.1142922 grant  .45 0. 53) = Prob > F = 1.380342 629 . Err.65 0.0660522 grant_1  .0672268 3. Interval] +lprbarr  .0 grant_1 = 0. and the fully robust standard errors are much larger than the nonrobust ones.000 .0 F( 2.8565 . test grant grant_1 ( 1) ( 2) grant = 0.2517165 lscrap_1  .3266 7. a.644669 11 10.1155314 .60 0.5974413 .5624 .1145118 1.01 0. Err.0000 0.1188807 1.7513821 1. The following Stata output should be selfexplanatory. . 618) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 630 74.153 .0367657 19.8808216 .49 0.000 .2475521 .0371354 . Interval] +d89  . making both more statistically significant.55064  Robust lscrap  Coef.1790052 0. However.1153893 .216278 .0645344 13.6949699 Residual  88. Std.143585231 +Total  206.42 0.1073226 . grant and grant1 are jointly insignificant: .7917042 .37893 lcrmrte  Coef.010261 _cons  .000 .1420073 The robust standard errors for grant and grant1 are actually smaller than the usual ones.318 .1723924 .5700 0.4663616 .Rsquared Root MSE Number of clusters (fcode) = 54 = = 0.328108652 Number of obs F( 11. t P>t [95% Conf.6473024 lprbconv  .000 .0893147 0. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82d87 Source  SS df MS +Model  117.679 .5456589 .11.682 0.551 .7195033 .4108369 .4938765 lprbpris  .570 0. There is strong evidence of positive serial correlation in the static model.0263683 20. Std.694 0.
37893  Robust lcrmrte  Coef.02746 28. Interval] +uhat_1  .056127934 +Total  76.275 0.0696601 d84  .7918085 .15563 .7195033 .0579205 1.0000 0.057931 0.082293 .135 .3070248 .0578218 0.1087542 .878 0.056899 0.19 0.929 .061 .74e10 .1387815 .lavgsen  .5017347 40 .7378666 .576438 1.0051371 .23691 uhat  Coef. gen uhat_1 = uhat[_n1] if year > 81 (90 missing values generated) .4249525 d82  . Err.181 .0270426 .5700 .3659886 .057923 1.089 0.0300252 12. 538) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 540 831.2516253 8.6680407 1 46.8457504 _cons  1. I obtain the fully robust standard errors: .46 0.1925835 .0714718 d87  . t P>t [95% Conf.635 . 89) = Prob > F = Rsquared = Root MSE = Number of clusters (county) = 90 630 37.728 0. t P>t [95% Conf.475 0. resid .0101951 0.338 0.0364927 d86  .0867575 . reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82d87. robust cluster(county) Regression with robust standard errors Number of obs = F( 11.835 0.142606437 Number of obs F( 1.6680407 Residual  30.000 .1968286 538 .0269872 lpolpc  .1095979 6.451 .467 . predict uhat.0200271 Because of the strong serial correlation.9372719 .6071 0.1189026 d83  .000 2. Err.588149 .000 . Interval] +lprbarr  .6064 .000 1.1566662 .0420791 .189 0. Std.56 0.043503 .0200271 .1086284 .222504 .498 0.0576243 0. Std.0846963 _cons  2.0583244 1.000 . reg uhat uhat_1 Source  SS df MS +Model  46.0000 0.0780454 .8648693 539 .000 .755 0.2005023 .0049957 d85  .
4057025 lprbpris  .321 0.14 0.003 .1205245 lcrmrt_1  .121078 3.0960793 lprbpris  .8442885 Residual  16.1103509 .562 0.154268 539 .026896 1.0431201 d87  .lprbconv  .5456589 .000 .078524 .27 0.2475521 .75 0.68 0.78 0.0270426 .0164261 6.1865956 .0867575 .41 0.755 .6856152 .0271816 2.3113499 .1217691 lprbconv  .1337606 d83  .1324195 0.0085982 . Std.2119007 .006 0.0108203 .0267299 2.480 .1546683 .273 0.0124338 d84  .818 .031945255 +Total  180.6065681 d82  .018 3.1062619 . Further.8647054 2.0420159 .000 .0780454 .3641423 .8670945 528 .0428788 0.0391758 2. the lagged crime rate is very significant.784 0.334237975 Number of obs F( 11.0165096 7.9064 0.0440833 d86  . when we add the lag of log(crmrte): .082293 .101492 .8637879 _cons  .0948522 d87  .179 0.1668349 . including it makes all other coefficients much smaller in magnitude.007 .1152298 .0570255 lavgsen  . Interval] +lprbarr  .8263047 .174924 .015 .0678438 .0051371 .045 .306 0.0267623 2.000 .0137298 .445 .749 . We lose the first year.7888214 . t P>t [95% Conf.98 0.4638255 lavgsen  .230 0.0385625 2.0233448 d84  .2906166 .1088453 2.0229405 7.046 .470 0.02 0.430 0.0781181 d83  .329 .0014224 d86  .0107492 .033643 1.1272783 .1174537 .0704368 7.000 .1378348 lpolpc  .199 .71 0. 1981.800445 .29 0.1609444 .119 . 528) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 540 464.0612797 .02 0.0309127 d85  .312 0.77 0.17873 lcrmrte  Coef.0000 0. Err. drop uhat uhat_1 b.0011145 d85  .229651 Not surprisingly.0536882 . gen lcrmrt_1 = lcrmrte[_n1] if year > 81 (90 missing values generated) .287174 11 14.0345003 0.0381447 0.0190806 43.1087542 .043503 .0671272 .030387 3.0304828 .1130321 0.1285118 .0268172 0.1028353 .0487502 _cons  2.014 .3659886 .000 .1254092 .0367296 0.9044 .792 0. 41 The .0692234 .000 .0420791 .0649438 . reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d83d87 lcrmrt_1 Source  SS df MS +Model  163.889 .0312787 .025 .0555355 lpolpc  .
gen uhat_1 = uhat[_n1] if year > 82 (180 missing values generated) .1746053 .986.0088345 42 .1292903 .0352873 0. although it is insignificant. t P>t [95% Conf. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d84d87 lcrmrt_1 uhat_1 From this regression the coefficient on uhat1 is only .000 .000 .0000 0. however.087 0.0165559 d84  .0729231 .000 .071157 .533423 20 8.0238458 7.9042 . Err.32 0.580 .0652494 .variable log(prbpris) now has a negative sign. Std. which means that there is little evidence of serial correlation (especially since ^ r is practically small).1050704 . resid (90 missing values generated) .0195318 .1277591 lprbconv  .0172627 6.011 . We still get a positive relationship between size of police force and crime rate.0311719 3.554 0.1005518 lprbpris  .059 with t statistic .17895 lcrmrte  Coef.1108926 .0169096 7.0287165 2.03202475 +Total  180. I will not correct the standard errors. and the magnitudes are pretty small in all cases: .0497918 lavgsen  . 519) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 540 255. Interval] +lprbarr  . predict uhat.17667116 Residual  16. Thus.911 0.1389838 d83  .557 0.154268 539 .542 0.0286922 2.1337714 .023 .334237975 Number of obs F( 20. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d83d87 lcrmrt_1 lwconlwloc Source  SS df MS +Model  163.6208452 519 . None of the log(wage) variables is statistically significant.1216644 .000 .049654 lpolpc  . d.166991 .0888553 .9077 0. c.2214516 .272 0.322 0. There is no evidence of serial correlation in the model with a lagged dependent variable: .1721313 .
0474615 .114 0.2201867 .Xib)*8.0 0.0326156 0.039408 lwloc  .338 .0 9.429 .29319 _cons  .582 0.0487983 lwtuc  .181 .0564908 lwmfg  .0258059 .154 0.721 0. 519) = Prob > F = 0.341 0..0439875 0.310 1.871 0.6335887 1.23). 8 W7i=1 db i=1 Evaluating the derivative at the solution B^ gives & SN Z’X *’^W& SN Z’(y .0922683 lwser  .6009076 .5663 CHAPTER 8 8.012903 ..016 0.1173893 .471 .098539 lwfir  ..0121236 .172 . Letting Q(b) denote the objective function in (8.0498207 .276 0.0318995 0.0223995 0.0466549 .2639275 lwsta  .0418004 1. 7i=1 i i8 7i=1 i i In terms of full data matrices.8087768 .88852 .1070534 . 43 . we can write.0355801 lwfed  . it follows from multivariable calculus that dQ(b)’ N ^& N .294 .000 .877 .928 0.0465632 .354 .0 0..958 0.0389325 1. ^ ^ ^ (X’ZWZ’X)B = (X’ZWZ’Y).266 .783 .0034567 .038269 d86  .0330676 ..85 0.8496525 lwcon  .478 .791 0.0355555 .561 .0221872 0.6438061 .0 0.1054249 .0392516 0.0306847 .0371746 0.0 0.X B ^ * i )8 = 0.= 2&7 S Z’i Xi*’ S Z’i (yi .0405482 lwtrd  .0 0.1..0903894 ...0898807 .710 0.0208067 38.0296003 ..0742918 .0263763 . after simple algebra..1003172 0.051 0.0409046 . test lwcon lwtuc lwtrd lwfir lwser lwmfg lwfed lwsta lwloc ( ( ( ( ( ( ( ( ( 1) 2) 3) 4) 5) 6) 7) 8) 9) lwcon lwtuc lwtrd lwfir lwser lwmfg lwfed lwsta lwloc F( = = = = = = = = = 0..1286819 lcrmrt_1  .0 0.d85  ..767901 .1009652 .368 0...0660699 1.0798526 1.0283133 .0 0.0994076 d87  .0961124 .0 0.
^2 = [E(s’s)] E(s’r). idempotent matrix IL . Then we must show that ^2 = 0. it is now 44 .24). But. verifies the first part of the hint. First. E(s’r) = 0. we can always write x as its linear projection plus an error: * x * + e. and this = 0.L(xz). E(z’x) = E(z’x ).7 in Chapter 2). it is necessarily itself 1 positive semidefinite. th is block diagonal with g block Z’ g Xg. 8. th is a block diagonal matrix with g denotes the N ^2& N ^ ^ S Z’i ) Zi = Z’(IN t ))Z i=1 * ^2 block sg S z’ 7i=1 igzig8 _ sgZ’g Zg. let h _ h(z). where s 1 _ h . where ^1 is M * K and ^2 is Q * K.D(D’D) D’. and so r is uncorrelated with all functions of z. [IL . where x = z^ and E(z’e) = 0.3. 8. Z’X Using these facts. When )^ is diagonal and Zi has the form in (8. E(rz) = 0. This follows directly from the hint.(C’WC)(C’W*WC) (C’WC) can be written as 1 1 C’* 1/2 where D _ *1/2WC. x = Therefore. Since this is a matrix quadratic form in the L * L symmetric. from the twostep projection theorem (see Property LP. 8. by the assumption that E(xz) = L(xz).15).E(xz). which * To verify the second step. Therefore. Now.h) = z^1 + h^2.B^ Solving for gives (8.7. Straightforward matrix algebra shows that (C’* C) .5. r is also equal to x .D(D’D) D’]* 1 1/2 C. and write the linear projection as L(yz. s is simply a function of z since h shows that ^2 _ h(z). But Therefore. Further. where Zg N * Lg matrix of instruments for the gth equation.L(hz) and r _ x .
. 45 . x1 = (z1.. 1 1 1 This is just the system 2SLS estimator or.straightforward to show that the 3SLS estimator consists of [X’ g Zg(Z’ g Zg) Z’ g Xg] X’ g Zg(Z’ g Zg) Z’ g Yg stacked from g = 1. and this leads to the OLS estimator. b.9. If y2 is binary then E(y2z) = P(y2 = 1z) = F(z). equivalently. If E(ux) = 0 and E(u 2 = x) = s2 then the optimal instruments are s2E(xx) s2x. s2 and E(xizi) = zi^. )(z) Var(u1z) = s21.F(z)]. Further. Without the i subscript. and so the optimal IVs are [z1. 2SLS.E(y2z)]. 8. 2 Dropping the division by s21 clearly does not affect the optimal instruments. 2SLS equationbyequation. a..11. 2SLS estimator has the same asymptotic variance whether ^ or ^ ^ The is used.5 when G = 1.G.E(y2z)]. and so 2SLS is asymptotically efficient.. The constant multiple s2 clearly has no effect on the optimal IV estimator. * If E(uizi) = 2 1 w(zi) = E(ui2zi). the the optimal instruments are s2zi^.5. This is a simple application of Theorem 8. 8. with G = 1: zi = [w(zi)] E(xizi). The optimal instruments are given in Theorem 8. except that ^ These are the optimal IVs underlying is replaced with its rNconsistent OLS estimator. = It follows that the optimal instruments are (1/s1)[z1.y2) and so E(x1z) = [z1. so the optimal instruments are zi^.
and the parameters in a twoequation system modeling one in terms of the other. d. No. b. We we can certainly be interested in the causal effect of alcohol consumption on productivity. (In fact.) The simultaneous equations model recognizes that cities choose law enforcement expenditures in part on what they expect the crime rate to be. 46 One’s hourly wage is . Yes.1. a. These are both choice variables of the firm. c. If we want to know how a change in the price of foreign technology affects foreign technology (FT) purchases. we could use a simple regression analysis. We can certainly think of an exogenous change in law enforcement expenditures causing a reduction in crime.CHAPTER 9 9. but we should use a SUR model where neither is an explanatory variable in the other’s equation. OLS could be inconsistent for estimating the tradeoff. What causal inference could one draw from this? We may be interested in the tradeoff between wages and benefits. but then either of these can be taken as the dependent variable and the analysis would be by OLS. why would we want to hold fixed R&D spending? Clearly FT purchases and R&D spending are simultaneously chosen. An SEM is a convenient way to allow expenditures to depend on unobservables (to the econometrician) that affect crime. Yes. and therefore wage. have no economic meaning. But it is not a simultaneity problem. and vice versa. and we are certainly interested in such thought experiments. if we have omitted some important factors or have a measurement error problem. If we could do the appropriate experiment. No. where expenditures are assigned randomly across cities. then we could estimate the crime equation by OLS. Of course.
3. the only variable excluded from the support equation is the variable mremarr. this equation is identified if and only if d21 $ 0. finc. These are both chosen by the firm. It makes no sense to think about how exogenous changes in one would affect the other.2. Each equation can be estimated by 2SLS using instruments 1. presumably to maximize It makes no sense to hold advertising expenditures fixed while looking at how other variables affect price markup. profits. a. e. 9. mremarr. we need d11 $ 0 or d13 $ 0. No. When the property tax changes. a family will generally adjust expenditure in all categories.determined by the demand for skills. suppose that we look at the effects of changes in local property tax rates. First. obtain the reduced form for visits: 47 . fremarr. First. that is. This ensures that there is an exogenous variable shifting the mother’s reaction function that does not also shift the father’s reaction function. We can apply part b of Problem 9. b. We would not want to hold fixed family saving and then measure the effect of changing property taxes on housing expenditures. f. The visits equation is identified if and only if at least one of finc and fremarr actually appears in the support equation. since the support equation contains one endogenous variable. A SUR system with property tax as an explanatory variable seems to be the appropriate model. c. Further. dist. These are choice variables by the same household. No. alcohol consumption is determined by individual behavior.
There is one overidentifying restriction in the visits equation. as in part b. the sample size times the usual Rsquared from this regression is distributed asymptotically as c21 under the null hypothesis that all instruments are exogenous. 48 . visits. mremarr. Next. fremarr. dist. ^ Let u2 be the 2SLS residuals.d14). Then. ^ regress finc (or fremarr) on support. v2 ^ and do a (heteroskedasticityrobust) t test that the coefficient on v2 is zero. dist. (SSR0 is just the usual sum of squared residuals. fremarr. Then. run the simple regression (without intercept) of 1 on u2r1. A heteroskedasticityrobust test is also easy to obtain. d. ^ ^ Then. N  SSR0 from this regression is asymptotically c21 under H0. mremarr. ^ Estimate this equation by OLS. Assuming homoskedasticity of u2. ^ Let support denote the fitted values from the reduced form regression for support. v2. ^ say r1. and save the residuals. finc.g12. finc. run the auxiliary regression ^ u2 on 1.d12.5. run the OLS regression ^ support on 1. dist. a.) 9. assuming that d11 and d12 are both different from zero. and save the residuals.d13. the easiest way to test the overidentifying restriction is to first estimate the visits equation by 2SLS. Let B1 denote the 7 * 1 vector of parameters in the first equation with only the normalization restriction imposed: B’1 = (1.d11.visits = p20 + p21finc + p22fremarr + p23dist + p24mremarr + v2.g13. If this test rejects we conclude that visits is in fact endogenous in the support equation.
d22 = and so R1B becomes d32 * 2. .1 14 7 13 d23 d22 + d24 . the first column of R1B is zero. there are just enough instruments to estimate this equation.g31 8 0.7. and G . Now. It is easy to see how to estimate the first equation under the given assumptions.z4) + u1.The restrictions d12 = 0 and d13 + d14 = 1 are obtained by choosing R1 = &0 0 71 0 0 0 0 0 1 0 0* . After simple algebra we get y1 . &0 0 R1B = 2 70 g21 d33 Identification requires g21 $ 0 and d32 $ 0. Note that. we need to check the rank condition. + d34 .z4). This equation can be estimated by 2SLS using instruments (z1. g31 = 0. the order condition is satisfied. we need at least two elements in z(2) and/or z(3) that are not also in z(1).z4 = g12y2 + g13y3 + d11z1 + d13(z3 .z2. Because alcohol and educ are endogenous in the first equation. we use the constraints in the remainder of the system to get the expression for R1B with all information imposed. if we just count instruments. Letting B denote the 7 * 3 matrix of all structural parameters with only the three normalizations.1 = 2. d23 = 0. But g23 = 0. 49 Ideally. 9.g21 d33 d32 * 2. a. Next.g31 8 By definition of the constraints on the first equation. d24 = 0. straightforward matrix multiplication gives R1B = & d12 2d + d . + d34 .z3. Set d14 = 1 . and g32 = 0.d13 and plug this into the equation. 18 0 1 Because R1 has two rows. b.
we should not make any exclusion restrictions in the reduced form for educ.000 1454.0895 79. The matrix of instruments for each i is ( 2z i Zi = 2 0 2 0 9 d.6455 103.11 0.9.176 119.0070942 .000 .29): .0267 51.95137 1.49 0.0208906 . Std.28121 8.67 0.933 431.143 .169 3. z(3) = z.5673 134.396957 7.000 831.4078 age  12.87188 0.2685 1.451518 0.89 0. Then use these as instruments in a 2SLS analysis.35 0.362 2.84729 3.128 +lwage  hours  .1426539 exper  .000201 .0142782 1. z P>z [95% Conf. Err. a.0002109 0.0151452 7. reg3 (hours lwage educ age kidslt6 kidsge6 nwifeinc) (lwage hours educ exper expersq) Threestage least squares regression Equation Obs Parms RMSE "Rsq" chi2 P hours 428 6 1368. Interval] +hours  lwage  1676.educi) 0 ) 2 2.46 0.82352 nwifeinc  .0006143 educ  .135 463.28) and (9. Let z denote all nonredundant exogenous variables in the system.799 535. 0 (zi.47 3555.132745 _cons  2504.1129699 .0002123 .915 6. zi2 0 0 That is.0832858 .137 28.95 0.8919 4.0488753 50 .8577 2522. b.3678943 3.49 0.6892584 0.1145 34.0000  Coef.1032 21.7287 62.we have at least one such element in z(2) and at least one such element in z(3).911094 kidslt6  200.63986 35. Here is my Stata output for the 3SLS estimation of (9.009 educ  205.0000 lwage 428 4 .340 .000 306.59414 kidsge6  48. 9.261529 1.46 0.95 0.47351 3.53608 0. c.
e.expersq  . We can just estimate the reduced form E(y2z1. Given Or. I know of no econometrics packages that conveniently allow system estimation using different instruments for different equations. we could just use 2SLS on ^ d11. 9. Consistency of OLS for p11 does not hinge on the validity of the exclusion restrictions in the structural model. this is identical to 2SLS since it is just identified. and g^21.1081241 Endogenous variables: hours lwage Exogenous variables: educ age kidslt6 kidsge6 nwifeinc exper expersq  b.) So our estimate of g21 provided we have not p11 = dE(y2z)/dz1 will be inconsistent in any case. c. We can estimate the system by 3SLS. Unfortunately.021 1. identified if and only if The second equation is d11 $ 0.z2.g^12g^21). of course). for the second equation.0002943 . To be added.31 0. it can be seen that p11 = d11/(1 .260 . course. whereas using an SEM does.0002614 1.3045904 2. Whether we estimate the parameters by 2SLS or 3SLS.302097 .z3) by ordinary least squares. a.13 0. Since z2 and z3 are both omitted from the first equation.7051103 . b. (Since we are estimating the second equation by 2SLS. we will still consistently estimate misspecified this equation.0008066 . we just need d22 $ 0 or d23 $ 0 (or both. we would form p^11 = d^11/(1 .000218 _cons  . d.g12g21). we obtain a more efficient 51 Of . we will generally inconsistently estimate d11 and g12.11. each equation. After substitution and straightforward algebra. ^g12. f. if the SEM is correctly specified.
Here is my Stata output.489 This shows that log(land) is very statistically significant in the RF for Smaller countries are more open.747 0. Here is my Stata output: . d22 $ 0.41783 52 .61916 57.021 . Interval] +lpcinc  . First. The first equation is identified if.230002 Number of obs F( 2. Err.715 2.4487 0. then OLS: .294 0.79 0.180527 5.0845 15.4217 113 575.342 0.953679 _cons  117.61387 Residual  63064.083 3. and only if. (This is the rank condition.89934 15.194 111 568.412473 3.796 open  Coef.1441212 2.49324 0.145892 +Total  65073.8483 7.1936 2 14303.505435 lland  7.4387 17. t P>t [95% Conf.8142162 9.0134 23. 9. c.68006 148.9902 113 564.7966 111 316. open.0968 Residual  35151. Err. reg inf open lpcinc (lland lpcinc) Source  SS df MS +Model  2009.000 9. Std.015081 0. 111) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = (2SLS) 114 2.5464812 1.) b.4012 1. reg open lpcinc lland Source  SS df MS +Model  28606.682852 +Total  63757. Interval] +open  .388 0.000 85.0657 0.3758247 2.187 0.0000 0.22775 2 1004.836 inf  Coef.368841 _cons  26.567103 . Std. 2SLS.0519014 lpcinc  .870989 Number of obs F( 2.366 0.13. a. 111) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 114 45. t P>t [95% Conf.3374871 .0309 0.617192 4.17 0.estimator of the reduced form parameters by imposing the restrictions in estimating p11.852 3.6230728 .
it also You might want to test to see if open is endogenous.273 0.4936 111 559.1060 .7527 110 595.09 0.870989 Number of obs F( 2. 24. Std.70715 +Total  65073.102 5. Err.40 inf  Coef. Err.63 0.026122 55.110342 Residual  65487.993 3.23419 The 2SLS estimate is notably larger in magnitude.343207 +Total  65073.009 0.0281 23. but we will go ahead. has a larger standard error.027556 lpcinc  . we need an IV for it.92812 2 1472. gen opensq = open^2 .025 . 110) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = (2SLS) 114 2.931692 _cons  25.896555 3.658 inf  Coef. Interval] +open  . 2 A regression of open Since is a natural 2 on log(land). 111) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 114 2. 2 log(land) is partially correlated with open. reg inf open opensq lpcinc (lland llandsq lpcinc) Source  SS df MS +Model  414.96406 Residual  62127.2150695 . Std.0764 0. The Stata output for 2SLS is . t P>t [95% Conf.651 0.0946289 2. t P>t [95% Conf. d.4217 113 575..975267 0.331026 3 138.402583 . reg inf open lpcinc Source  SS df MS +Model  2945. Not surprisingly. and log(pcinc) 2 gives a heteroskedasticityrobust t statistic on [log(land)] This is borderline.0175683 1. .4217 113 575.0453 0. of about 2. [log(land)] . [log(land)] candidate.20522 1. If we add g13open2 to the equation.10403 15. Interval] +53 . gen llandsq = lland^2 .870989 Number of obs F( 3.
4387 17.37 0.8142162 9.715 2.230 0.069134 0. the estimate would be significant at about the 6.801467 81.72804 Residual  61330. gen openhsq = openh^2 .6205699 1.593929 4.5066092 2. Std.54102  The squared term indicates that the impact of open on inf diminishes.932 0.68006 148. t P>t [95% Conf.4487 0. Interval] +lpcinc  .39 0. Err. reg inf openh openhsq lpcinc Source  SS df MS +Model  3743.1936 2 14303.7966 111 316.245 0.0174527 lpcinc  .36141 2.612 inf  Coef. fitted values) .open  1.953679 _cons  117.0000 0.18411 3 1247.180527 5.198637 .028 4.547615 +Total  65073.489 .682852 +Total  63757.131 .0845 15.0049828 1. reg open lpcinc lland Source  SS df MS +Model  28606. Std.2376 110 557.505435 lland  7.0311868 opensq  .49324 0. t P>t [95% Conf.9902 113 564.29 0. 111) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 114 45.412473 3.0318 23.0879 0.000 85.4217 113 575.521 0.567103 .056 2.0575 0.5464812 1.807 3.0022966 . predict openh (option xb assumed.607147 _cons  43.5% level against a onesided alternative. 110) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 114 2.230002 Number of obs F( 2.428461 .24 0. Interval] +54 .17124 19.17 0. Err.0075781 .870989 Number of obs F( 3.000 9.0968 Residual  35151. Here is the Stata output for implementing the method described in the problem: .8483 7. e.796 open  Coef.
a. could easily be correlated with tax rates because tax rates are. I would start with a fixed effects analysis to allow arbitrary correlation between all timevarying explanatory variables and ci.78391  Qualitatively. Standard investment theories suggest that. anyway.047 .984 3.12. But the forbidden regression implemented in this part is uncessary. If only a cross section were available. This is often a difficult task.lland) is linear and.5394132 1. d. c.1. at least to a certain extent. (Actually. Something like "average" county economic climate. less robust.02 0. this is done by using T . E(openlpcinc. and we cannot trust the standard errors. larger marginal tax rates decrease investment.5727026 77. which affects investment. as shown in Problem 9.01 0.0059682 1. Putting the unobserved effect ci in the equation is a simple way to account for timeconstant features of a county that affect investment and might also be correlated with the tax variable. it is important to allow for these by including separate time intercepts.1 time period dummies.01 0.933799 .17831 19. ceteris paribus.050927 _cons  39.0178777 lpcinc  .openh  . both methods are consistent.48041 2. we would have to find an instrument for the tax variable that is uncorrelated with ci and correlated with the tax rate.023302 0.0060502 .60 0. Since investment is likely to be affected by macroeconomic factors.0412172 2. CHAPTER 10 10. the results are similar to the correct IV method from part d. b.204181 openhsq  .313 . selected by state and local officials. 55 . If g13 = 0.0057774 .8648092 .968493 4.112 1.
¨ xi1 = xi1 . If taxit and disasterit do not have lagged effects on investment. with first Duit are serially uncorrelated. In either case. state officials might look at the levels of past investment in determining future tax policy. 56 . taxit. yi = (yi1 + yi2)/2. in which case I might use first differencing instead. differencing it is easy to test whether the changes Remember. I have no strong intuition for the likely serial correlation properties of the {uit}. Let xi = (xi1 + xi2)/2. I would compute the fully robust standard errors along with the usual ones. a. e. this is not a worry for the disaster variable: It is safe to say that disasters are not determined by past investment. and disasterit in the sense that these are uncorrelated with the errors uis for all t and s. in which case I would use standard fixed effects.xi. it seems more likely that the uit are positively autocorrelated. since a larger base means a smaller rate can achieve the same amount of revenue.3. these results can be compared with those from an FE analysis). this might not be much of a problem. presumably. rates: This could be similar to setting property tax sometimes property tax rates are set depending on recent housing values.doing pooled OLS is a useful initial exercise. However. Given that we allow taxit to be correlated with ci. 10. But it cannot be ruled out ahead of time. then the only possible violation of the strict exogeneity assumption is if future values of these variables are correlated with uit. especially if there is a target level of tax revenue the officials are are trying to achieve. future natural On the other hand. These might have little serial correlation because we have allowed for ci. Such an analysis assumes strict exogeneity of zit.
¨ ¨ . and similarly for ¨ yi1 and y i2 For T = 2 the fixed effects estimator can be written as B^FE = & SN (x¨’ x¨ + x¨’ x¨ )*1& SN (x¨’ ¨y + ¨x’ y¨ )*. xi2 = xi2 .2. ^ _ Dyi . we have ^ ^ ui1 = Dyi/2 ..1’).xi1)/2 = Dxi/2 ¨ yi1 = (yi1 .xi.DxiBFD)/2 _ ei/2. N N ^2 2 ^2 S (u^i1 + ui2) = (1/2) S ei.¨ xi1BFE and ui2 = ¨ yi2 . Let ui1 = ¨ yi1 .N.x i2 FE be the fixed effects residuals for the two time periods for cross section observation i.(Dxi/2)BFD = (Dyi ^ ui2 = ^ where ei ^ ^ DxiB FD)/2 _ ei/2 ^ ^ ^ Dyi/2 . Since we know the variance estimate for fixed effects is the SSR 57 . i = 1. Since B^FE and using the representations in (4. ¨ xi1 = (xi1 ..DxiB FD are the first difference residuals. i2 i2 8 i2 i2 8 7i=1 i1 i1 7i=1 i1 i1 Now. and so B^FE & SN Dx’Dx /2*1& SN Dx’Dy /2* 7i=1 i i 8 7i=1 i i 8 & SN Dx’Dx *1& SN Dx’Dy * = B ^ = FD.yi1)/2 = Dyi/2.xi2)/2 = Dxi/2 ¨ xi2 = (xi2 . i=1 i=1 This shows that the sum of squared residuals from the fixed effects regression is exactly one have the sum of squared residuals from the first difference regression. by simple algebra. ¨ ¨ ¨ ¨ x’ i1xi1 + x’ i2xi2 = Dx’D i xi/4 + Dx’D i xi/4 = Dx’D i xi/2 ¨ ¨ ¨ y ¨ x’ i1yi1 + x’ i2 i2 = Dx’D i yi/4 + Dx’D i yi/4 = Dx’D i yi/2.yi2)/2 = Dyi/2 ¨ yi2 = (yi2 ... Therefore. 7i=1 i i8 7i=1 i i8 = ^ ^ ^ ^ ¨ B b. = B^FD.(Dxi/2)B FD = (Dyi . Therefore.
10.5.K. a. This is easy since the variance matrix estimate for fixed effects is 1 N N *1 = ^s2& SN Dx’Dx *1. and the variance estimate for first difference is the SSR divided by N . which implies that E[(ciu’ i )xi) = 0 by interated expecations.3b. $ Therefore. that is. h(xi) + s2u. E(viv’ i xi) = E(cixi)jTj’ T + E(uiu’ i xi) = h(xi)jTj’ T + 2 where h(xi) _ Var(cixi) = E(c2ixi) (by RE. and in fact all other test statistics (F statistics) will be numerically identical using the two approaches. E(uiu’ i xi. Write viv’ i = cijTj’ T + uiu’ i + jT(ciu’ i ) + (ciui)j’ T.30) 58 . The RE estimator is still consistent and rNasymptotically normal without assumption RE. E(uixi. by iterated expectations). the standard errors. su7 S (x¨’i1¨xi1 + ¨x’i2¨xi2)*8 = (^s2e/2)&7 S Dx’D x /2 i i 8 e7 i i8 i=1 i=1 i=1 ^2& which is the variance matrix estimator for first difference. ^2 su = ^s2e/2 (contrary to what the problem asks you so show). while the variances and covariances depend on xi in general. the error variance from fixed effects is always half the size as the error variance for first difference estimation.3a. 2 Under RE. to show is that the variance matrix estimates of B^FE and What I wanted you B^FD are identical. Thus. b.divided by N .1b). This shows that the conditional variance matrix of vi given xi has the same covariance for all t s. Therefore.K (when T = 2). Under RE. they do not depend on time separately. h(xi). but the usual random effects variance estimator of B^RE is no longer valid because E(viv’ i xi) does not have the form (10. s2uIT. and the same variance for all t. which implies that E(uiu’ i xi) = suIT (again.ci) = 0.ci) = 2 s2uIT.1.
0121797 crsgpa  1.103 .0392381 1.4785 chi2( 10) = Prob > chi2 = 512.963 0.000 2.000 .000 . xtreg trmgpa spring crsgpa frstsem season sat verbmath hsperc hssize black female.3566599 4.843 0.864 0.000 .2380173 .630 0.261 .1145132 . xtreg trmgpa spring crsgpa frstsem season.4779937 .0108977 .0001771 9.0612948 5.0020523 verbmath  .000 .1210044 .3718544 .0012426 6. fe 59 .810 0.0930877 11.3581529 .7.0371605 1.0013582 . z P>z [95% Conf.0029948 . and I compute the nonrobust regressionbased statistic from equation (11.0017052 .0328061 sat  .124 0. with timevarying variables only. * random effects estimation .4088283 .0000 trmgpa  Coef.8999166 1.1629538 hsperc  .000167 black  .0060268 hssize  .000 . Err.2067 0.3684048 .445 0.1334868 . The robust variance matrix estimator given in (7. re sd(u_id) sd(e_id_t) sd(e_id_t + u_id) = = = .43396 1.0000775 .2348189 .050 0.0681573 3.264814 frstsem  .(because it depends on xi).082365 .960 . Interval] +spring  . X) = 0 (assumed) (theta = 0.035879 . .5390 0.1012331 female  .0001248 0.73492 .0599542 0.000322 .16351 0.0440992 . iis id .1205028 season  .0606536 .1575199 .77 0. * fixed effects estimation. I provide annotated Stata output.632 0.621 0.0084622 .4782886 _cons  1. Std. 10.49) should be used in obtaining standard errors or Wald statistics.534 .5526448 corr(u_id.79): .335 .3862) Randomeffects GLS regression Number of obs = 732 n = 366 T = 2 Rsq within between overall = = = 0. tis term .627 0.
gen bhssize = . * timevarying variables.332 0.386. * Obtaining the regressionbased Hausman test is a bit tedious. * Now obtain GLS transformations for both timeconstant and . gen bsat = . .1186538 9.366 0. gen bvrbmth = .614*sat .614 .0893 Fixedeffects (within) regression Number of obs = 732 n = 366 T = 2 Rsq within between overall F( = = = 0..0333 0.614*hsperc . by(id) . gen bhsperc = . .420747 . egen afrstsem = mean(frstsem).681 0.61 0.614*verbmath .0391404 1. 362) = Prob > F = 23. egen atrmgpa = mean(trmgpa). by(id) . Interval] +spring  .3305004 2.399 0.2069 0.0613 4.386 . Note that lamdahat = . Err.9073506 1.sd(u_id) sd(e_id_t) sd(e_id_t + u_id) = = = .1225172 . egen aseason = mean(season).679133 .1208637 id  F(365.0657817 .187 0. Xb) = 0. egen acrsgpa = mean(crsgpa).1482218 season  .000 . di 1 .020 1. Std.0414748 1.0249165 _cons  .792693 corr(u_id.614 0.173 . by(id) .0000 trmgpa  Coef.140688 . by(id) .0128523 . by(id) . compute the timeaverages for all of the timevarying variables: . egen aspring = mean(spring).7708056 .614*hssize 60 First.362) = 5.0566454 .1382072 .4088283 .0111895 crsgpa  1. t P>t [95% Conf.094 .852 .0688364 0.1427528 .374025 frstsem  .000 (366 categories) .614 . gen bone = .
000 .0000775 .0109013 ..103 .. Err. gen bspring = spring . reg btrmgpa bone bspring bcrsgpa bfrstsem bseason bsat bvrbmth bhsperc bhssize bblack bfemale acrsgpa afrstsem aseason.964 0.000 2. gen bcrsgpa = crsgpa . 721) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 732 862.386*atrmgpa .864 0.163434 bhsperc  . * effects.734843 . gen bfemale = .632 0. * These are the RE estimates.0000 0.0371666 1.1575166 .082336 .811 0.0599604 0.0017052 .9294 0.1207046 bseason  . Std.0001674 bblack  .0012424 6.0003224 .336 . nocons Source  SS df MS +Model  1584.535 .114731 .0612839 5.4784686 .. * Now add the time averages of the variables that change across i and t . gen btrmgpa = trmgpa . .000 .2378363 ..1010359 bfemale  .0329558 bsat  .0013577 .060651 .0001247 0.1211368 .4784672 .3686049 . gen bblack = . t P>t [95% Conf.000177 9. * Check to make sure that pooled OLS on transformed data is random .614*female .000 .632 0.386*aseason .0123167 bcrsgpa  1.844 0. subject to rounding error.1634784 0.359125 721 .0060231 bhssize  . * to perform the Hausman test: .050 0.10163 11 144.1336187 .3284983 Number of obs F( 11.3581524 .386*afrstsem .626 0.67 0. gen bseason = season . .621 0.0020528 bvrbmth  .9283 .40858 btrmgpa  Coef.034666 bspring  .386*aspring .009239 Residual  120.0930923 11. gen bfrstsem = frstsem .0440905 .123 0..0392441 1.000 .435019 1. nocons 61 .46076 732 2.8995719 1.446 0.0084622 ..262 .960 .614*black .2348204 .000 .3566396 4.1669336 +Total  1704.386*acrsgpa .0681441 3.265101 bfrstsem  . Interval] +bone  1. reg btrmgpa bone bspring bcrsgpa bfrstsem bseason bsat bvrbmth bhsperc bhssize bblack bfemale.0029868 .
173 .000 .961 0.0016681 .0012551 6.569 0.171981 Residual  120.0 F( 3.006 2.536 0. * Thus.1654425 0.0794119 0. For comparison. 718) = Prob > F = 0.337 .4754216 acrsgpa  .441186 .0 afrstsem = 0.1316462 .40773 14 113.000 .2322278 .000 . Std.0084655 .366 0.0685972 3.61 0.423761 .140688 .053023 718 .167204767 +Total  1704.1959815 .0391479 1.1234835 0.0480418 .3794684 .85 0.2241405 .0110764 bcrsgpa  1.000167 bblack  .3357016 . c24 It would have been easy to make the regressionbased test robust to any violation of RE.0711669 4.0001804 9.000 .0688496 0.531 . based on a distribution (using Stata 7.592 .373683 bfrstsem  .0001249 0.Source  SS df MS +Model  1584.3284983 Number of obs F( 14.9282 . 718) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 732 676.9296 0. the usual form of the Hausman test.627 0.0896965 0. gives pvalue = .1281327 afrstsem  .9076934 1.0000 0.770.0795867 . robust .187 0. * significance levels.680 0. we fail to reject the random effects assumptions even at very large .1426398 .148023 bseason  .1931626 bhsperc  . test acrsgpa afrstsem aseason ( 1) ( 2) ( 3) acrsgpa = 0.40891 btrmgpa  Coef.001314 .4063367 bspring  .000 .46076 732 2.0247967 bsat  .5182286 2.3567312 .1380874 .2447934 .3: cluster(id)" to the regression command.717 0. which includes spring among the coefficients tested.0657817 .0). Interval] +bone  1.0109296 .0 aseason = 0.1142992 .1223184 .745 0.747 0.1280569 aseason  .0003236 .0566454 .796 0.1101184 bfemale  .0763206 .612 0.0000783 .0414828 1.426 .852 .4564551 .6085 .0060013 bhssize  .093 .0020223 bvrbmth  . t P>t [95% Conf. 62 add ". Err.926 0.355 .247 0.1186766 9.0128523 .
. and estimating the equation by random effects. in the sum of squared residuals. c. "fixed effects weighted least squares. I should have In other words. excluding the dummy variables. substituting back into the 63 .11. we can justify this procedure with fixed T In particular. and d: To be added. is another case where "estimating" the fixed effects leads to an estimator of properties. Parts b. it produces a estimator of B. The short answer is: as N L 8. B with good (As usual with fixed T. write  yit = xitB + wiX + rit. rNconsistent. there is no sense in which we can estimate the ci consistently.. but it is mostly just algebra. We can estimate this equation The actual calculation for this example is to be added.10. the covariates that change across i and t. 10. t = 1.13. by random effects and test H0: X = 0. The simplest way to compute a Hausman test is to just add the time averages of all explanatory variables. where xit includes an overall intercept along with time dummies. First.yi) and b.. we can "concentrate" the ai out ^ by finding ai(b) as a function of (xi. To be added.) Verifying this claim takes much more work. a. 10." where the weights are known functions of exogenous variables (including xi and possible other covariates that do not appear in the conditional mean). as well as wit. Yes.T. The Stata output follows. asymptotically normal Therefore.9. done a better job of spelling this out in the text..
. it is .N.. 7i=1t=1 it it8 7i=1t=1 it it8 (10.w . ~ When we plug this in for yit in (10. . and then minimizing with respect to b only.82) and divide by N in the appropriate places we get 64 .y.w . _ (uit . we can study the asymptotic (N this equation from yit = xitB ~ where uit .wi)/rh. _ (yit ..w .. Note that yi and xi are simply weighted averages. y and x are the usual time it i i averages. i=1t=1 ..xib.wi)/rh.(xit ... i = 1. where ui _ wi S uit/hit . ^ ^ Now we can plug each ai(b) into the SSR to get the problem solved by B: min K beR N T S S [(yit .it Then B^ can be expressed in usual pooled OLS form: B^ = & SN ST x~’ x~ *1& SN ST x~’ y~ *.^ai . t=1 which gives ^ & ai(b) = wi T * & T * S yit/hit .wi)/rh.... T & * . _ (xit .w Note carefully how the initial yit are weighted by 1/hit to obtain yi. First.w . Given ^ L 8) properties of B .. Subtracting 7t=1 8 (10.y...82). and a similar 7t=1 8 7t=1 8 . define yit (xit .wi S xit/hit b 7t=1 8 7t=1 8 w w _ yi .w .sum of squared residuals.yi) on w ~ Equivalently.... Straightforward algebra gives the first order conditions for each i as T S (yit ..wi)b]2/hit...u.wi) .it ..xitb)/hit = 0.w If h equals the same constant for all t.82) .w definition holds for xi..w But this is just a pooled weighted least squares regression of (yit .w ..w & T * easy to show that yi = xiB + ci + ui. all t = 1..xi) with weights 1/hit.w & T * where wi _ 1/ S (1/hit) > 0 and yi _ wi S yit/hit ... but  where the usual 1/rhit weighting shows up in the sum of squared residuals on the timedemeaned data (where the demeaming is a weighted average).it + ~ ~ ci + uit for all t gives yit = xitB + ~ uit.x.....T.x. ~ xit ..
~2 2 .B) = suA . i = 1.. _ S E(x~’it~xit) and B _ Var&7 S ~x’ituit/rh.83) B^. (10.hi. t = 1.it 8 t=1 t=1 If we assume that Cov(uit.2su2E[(wi/hit)] + su2E[(wi/hit)]. and so rN( .. in addition to the A variance assumption Var(uitxi. t = 1.2E[(uitui)/hit] su2 . 7 i=1t=1 it it8 7 i=1t=1 it it8 T ~ ~ T ~ Straightforward algebra shows that S x’ ituit = S x’ ituit/rhit.. where the law of 65 .. B^ = B + t=1 t=1 and so we have the convenient expression B^ = B + &N1 SN ST x~’ x~ *1&N1 SN ST x~’ u /rh.T.uisxi...hi).. The same subtleties that arise in estimating effects estimator crop up here as well.hi.T.H) = B.w ^ say rit. Then.. 7 i=1t=1 it it8 7 i=1t=1 it it it8 From (10.w Now E(uit) = E[(uit/hit)] . s2u for the usual fixed Assume the zero conditional covariance assumption and correct variance specification in the previous paragraph.N....N.ci) = 0. are estimating uit = (uit . As long & S E(x~’ ~x )* = K.ui)/rhit (in the sense that we obtain rit ~ from uit by replacing w 2 + E[(ui) /hit] = B with B^).hi... The asymptotic variance is generally Avar .84) ^ ~ .. note that the residuals from the pooled OLS regression ~ ~ yit on xit....B ^ 2 1 su2A.. we can use the usual proof to show it it 8 7t=1 ^ ^ plim(B) = B. including xit. t $ s.B ^ 1 1 rN( .83) that B is rNasymptotically normal under T as we assume rank mild assumptions. ~ So E(x’ ituit) = 0.* .*.. where T T .&N1 SN ST x~’ x~ *1&N1 SN ST x~’ u~ *. then it is easily shown that B .ci) = 0.B) = A BA .. which means uit is uncorrelated with any ~ function of (xi. i = 1. (We can even show that E(BX..ci) = = s2uhit. Why? We assumed that E(uitxi.) ^ It is also clear from (10.....83) we can immediately read off the consistency of (10.
  w 2x ,h ] = s2w has
i
i
u i
iterated expectations is applied several times, and E[(ui)
been used.
~2
Therefore, E(uit) =
su2[1  E(wi/hit)], t = 1,...,T, and so
T
S E(u~2it) = s2u{T  E[wiWSTt=1(1/hit)]} = s2u(T  1).
t=1
This contains the usual result for the within transformation as a special
case.
A consistent estimator of
su2 is SSR/[N(T  1)  K], where SSR is the
usual sum of squared residuals from (10.84), and the subtraction of K is
optional.
^
The estimator of Avar(B) is then
^2&
1
N T
su7 S S ~x’it~xit*8 .
i=1t=1
If we want to allow serial correlation in the {uit}, or allow
Var(uitxi,hi,ci)
$ s2uhit, then we can just apply the robust formula for the
pooled OLS regression (10.84).
CHAPTER 11
11.1. a. It is important to remember that, any time we put a variable in a
regression model (whether we are using cross section or panel data), we are
controlling for the effects of that variable on the dependent variable.
The
whole point of regression analysis is that it allows the explanatory variables
to be correlated while estimating ceteris paribus effects.
Thus, the
inclusion of yi,t1 in the equation allows progit to be correlated with
yi,t1, and also recognizes that, due to inertia, yit is often strongly
related to yi,t1.
An assumption that implies pooled OLS is consistent is
E(uitzi,xit,yi,t1,progit) = 0, all t,
66
which is implied by but is weaker than dynamic completeness.
Without
additional assumptions, the pooled OLS standard errors and test statistics
need to be adjusted for heteroskedasticity and serial correlation (although
the later will not be present under dynamic completeness).
b. As we discussed in Section 7.8.2, this statement is incorrect.
Provided our interest is in E(yitzi,xit,yi,t1,progit), we do not care about
serial correlation in the implied errors, nor does serial correlation cause
inconsistency in the OLS estimators.
c. Such a model is the standard unobserved effects model:
yit = xitB +
d1progit + ci + uit,
t=1,2,...,T.
We would probably assume that (xit,progit) is strictly exogenous; the weakest
form of strict exogeneity is that (xit,progit) is uncorrelated with uis for
all t and s.
Then we could estimate the equation by fixed effects or first
differencing.
If the uit are serially uncorrelated, FE is preferred.
We
could also do a GLS analysis after the fixed effects or firstdifferencing
transformations, but we should have a large N.
d. A model that incorporates features from parts a and c is
yit = xitB +
d1progit + r1yi,t1 + ci + uit,
t = 1,...,T.
Now, program participation can depend on unobserved city heterogeneity as well
as on lagged yit (we assume that yi0 is observed).
differencing are both inconsistent as N
Fixed effects and first
L 8 with fixed T.
Assuming that E(uitxi,progi,yi,t1,yi,t2,...,yi0) = 0, a consistent
procedure is obtained by first differencing, to get
yit =
At time t and
yi,tj for j
DxitB + d1Dprogit + r1Dyi,t1 + Duit,
t=2,...,T.
Dxit, Dprogit can be used as there own instruments, along with
> 2.
Either pooled 2SLS or a GMM procedure can be used.
67
Under
strict exogeneity, past and future values of xit can also be used as
instruments.
11.3. Writing yit =
bxit + ci + uit  brit, the fixed effects estimator ^bFE
can be written as
2
N T
N T
b + 7&N1 S S (xit  x  i)*8 &7N1 S S (xit  x  i)(uit  u  i  b(rit  r i)*8.
i=1t=1
i=1t=1
*
*
*
Now, xit  xi = (xit  xi) + (rit  ri). Then, because E(ritxi,ci) = 0 for
*
  *
all t, (x
 x ) and (r
 r ) are uncorrelated, and so
it
i
it
i

  *
*

Var(xit  xi) = Var(xit  xi) + Var(rit  ri), all t.


Similarly, under (11.30), (xit  xi) and (uit  ui) are uncorrelated for all


*
  *


Now E[(xit  xi)(rit  ri)] = E[{(xit  xi) + (rit  ri)}(rit  ri)] =
t.

Var(rit  ri).
By the law of large numbers and the assumption of constant
variances across t,
N
T
T
S S (xit  x  i) Lp S Var(xit  x i) = T[Var(x*it  x *i) + Var(rit  r i)]
1 N
i=1t=1
t=1
and
N
T
S S (xit  x  i)(uit  u  i  b(rit  r i) Lp TbVar(rit  r i).
1 N
i=1t=1
Therefore,
plim
^
bFE
Var(rit  r i )
&
*
= b  b 7[Var(x*  x   * ) + Var(r  r   )]8
it
i
it
i
it
i
it
i
Var(rit  r i )
&
*
= b 1   .
7
*
*
8
[Var(x
 x ) +
Var(r
 r )]
11.5. a. E(vizi,xi) = Zi[E(aizi,xi) 
A]
+ E(uizi,xi) = Zi(A 
A)
+ 0 = 0.
Next, Var(vizi,xi) = ZiVar(aizi,xi)Z’
i + Var(uizi,xi) + Cov(ai,uizi,xi) +
Cov(ui,aizi,xi) = ZiVar(aizi,xi)Z’
i + Var(uizi,xi) because ai and ui are
uncorrelated, conditional on (zi,xi), by FE.1’ and the usual iterated
68
Var(vizi.2. We can easily make the RE analysis fully robust to an arbitrary Var(vixi. From part a. b. Davidson and MacKinnon (1993.4)]. t = 1..will be invalid.2. provided ^ ) L 8 for rNasymptotically normal. provided the rank condition. It need ^ ^ not be the case that Var(vixi. denote the pooled OLS estimator from this equation. c. we know that E(vixi. the usual random effects inference .49). Therefore. When Lt = L/T  B^ (along with ^ along with B.xit). a feasible GLS analysis with any ^ ) will be consistent converges in probability to a nonsingular matrix as N L 8. By standard results on partitioned regression [for example. From part a. there is conditional heteroskedasticity.expectations argument. as in equation (7. B^ can be obtained by the following twostep procedure: 69 . which shows that the conditional variance depends on zi.zi). Naturally. we are applying FGLS to the equation yi = ZiA + XiB + vi. and so the usual RE estimator is consistent (as N fixed T) and RE. If we use the usual RE analysis.60) to get yit = xitB + xiL Let A L) + vit. Assumption (Remember.that is.. and we estimate 11. we expand the set of explanatory variables to (zit.zi) depends on zi unless we restrict almost all elements of constant in zit). for all t. * to be zero (all but those corresponding the the Therefore.zi) = 0. based on the usual RE variance matrix estimator .T.7.xi) = Zi*Z’ i + s2uIT under the assumptions given. or even that Var(vi) = plim()).zi) = plim()). Section 1... we can rearrange (11. Unlike in the standard random effects model. where vi = Zi(ai  A) + ui. we know that Var(vixi. holds.
Given that the FE estimator can ^ be obtained by pooled OLS of yit on (xit .S S x’i xi8 7 S S x’i xit*8 7i=1t=1 i=1t=1 N . just as before.25) and simplify. and this rules out timeit it 8 7t=1 The condition rank constant instruments. If we plug these choices of C.^ = S Tx’ i xi = S S x’ i xi. But ^ . su2IT (by the usual iterated expectations argument).B) = su{E(X i i)[E(Z’ i Zi)] E(Z’ i Xi)} ..1)su. We can apply Problem 8. This completes the proof. in equation (8. and save the 1 * K vectors of ^ residuals.N.’i xit = S x. But if the rank condition holds.80).xi).B ^ 2 1 ¨ ¨ 1 ¨’Z ¨ ¨ ¨ rN( . The argument is very similar to the case of the fixed effects T estimator.T..9.b. (i) Regress xit on xi across all t and i. c. a. it it 8 7t=1 timedemeaned equation: rank This clearly fails if xit contains any timeconstant explanatory variables (across all i.N T . t=1 70 ^ ¨ ¨ ^ If uit = y it . 2 2 S E(u¨it ) = (T . ^ ^ The OLS vector on rit is B.. W = [E(Z’ i Zi)] . t=1 b. i = 1. we can always redefine T zit so that S E(z¨’it¨zit) has full rank.xi for all t and i..25). ^ (ii) Regress yit on rit across all t and i. we obtain Avar . and i=1t=1 11.xiIK N T N T S S x..xitB .. t = 1.xi. where A key point is that ¨ Z’ i ui = (QTZi)’(QTui) = Z’ i QTui = Z’ i i QT is the T x T timedemeaning matrix defined in Chapter 10. We can apply the results on GMM estimation in Chapter 8. say rit. First.8. and * into (8.*1& N T ... and ¨’¨ ¨ ¨ E(Z i uiu’ i Zi). as usual).. B^ We want to show that is the FE estimator. & ST E(z¨’ ¨z )* = L is also needed. In 1 ¨’¨ ¨ ¨ particular.’i S xit i=1t=1 i=1 t=1 i=1 = xit . and so rit = xit .. and so * = ¨ E(uiu’ i Zi) = ¨’u u’¨ E(Z i i i Zi) = ¨’¨ s2uE(Z i Zi). it suffices to show that rit =  xit . W.xi N T ... * = " ¨ u .& rit = xit . take C = E(Z i Xi). as we are applying pooled 2SLS to the & ST E(z¨’ ¨x )* = K... Under (11..
This proves that the 2SLS estimates of and (11.. B^ ^ and the coefficient on rit. then [N(T 1 N 1)] T S S ^u2it is a consistent estimator of s2u. cN.xiB.) e.are the pooled 2SLS residuals applied to the timedemeaned data. as ^ would usually be the case. where B is the IV estimator 71 ... Therefore. . . dNi results in time demeaning.1 (which is purely algebraic.. B from the pooled regression yit on d1i. the 2SLS estimator of all parameters in (11. Typically.1) would be i=1t=1 replaced by N(T . xit. again by partial regression ^ and the fact that regressing on d1i. dNi. Now. say from this last regression can be obtained by first partialling out the dummy variables. say rit.81).. some entries in rit are identically zero for all t and i. and obtain the residuals. Now consider the 2SLS estimator of B from (11. As we know from Chapter 10. This is equivalent to ^ first regressing ¨ xit on ¨ zit and saving the residuals. . by algebra of partial regression.79). d. dNi. including B...^ ^ for xit). second. and xit as the IVs ^ .. sit. this partialling out is equivalent to time demeaning all variables.. where we use the fact ^ that the time average of rit for each i is identically zero. B^ and D^ can be ¨ . and then ^ running the OLS regression ¨ yit on ¨ xit. by writing down the first order condition for the 2SLS ^ estimates from (11. zit across all t and i. can be obtained as follows: first run the regression xit on d1i. say sit.. ^ rit.81) (with dni as their own instruments. ^ ..1) . B from (11. and so applies immediately to pooled 2SLS).K as a degrees of freedom adjustment. ^ ^ ^ obtain c1. . N(T . it is easy to show that ci = yi . sit = ^ rit for all i and t... But we can simply drop those without changing any other steps in the argument.81) are identical. ^ obtained from the pooled regression ¨ yit on x it rit. But. First. ^ D.. dNi. d1i.79) (If some elements of xit are included in zit. From Problem 5..
cgrant_1[_n1] if d89 (314 missing values generated) . Because the N dummy variables are explicitly included in (11. messy estimator in equation (8. W = (Z Z/N) .81). gen ccgrnt = cgrant .¨ XiB. 51) Prob > F Rsquared Adj Rsquared Root MSE = 54 = 0. Std.xiB) .97 = 0.49516731 +Total  26. The 2SLS procedure is inconsistent as N L 8 with fixed T. 7 i=1 i i i i8 g.(xit . Alternatively.81) .11. reg cclscrap ccgrnt ccgrnt_1 Source  SS df MS +Model  .81) (and also (11.27) should be used.79)).0366 = 0.958448372 2 . gen ccgrnt_1 = cgrant_1 .cgrant[_n1] if d89 (314 missing values generated) . gen cclscrap = clscrap . where ^ 1 ^ ^ ^ ¨’¨ X and Z are replaced with ¨ X and ¨ Z.^ ¨ are computed as yit .0012 = . as is any IV method that uses timedemeaning to eliminate the unobserved effect. Differencing twice and using the resulting cross section is easily done in Stata.70368 cclscrap  Coef. I can used fixed effects on the first differences: .79). Interval] 72 . This is because the timedemeaned IVs will generally be correlated with some elements of ui (usually. The general.clscrap[_n1] if d89 (417 missing values generated) .xi)B = y it ^ ¨ xitB. Therefore.2535328 51 .479224186 Residual  25. Err. ui = ¨ yi .xitB = (yit .yi) .494565682 Number of obs F( 2.3868 = 0. the 2SLS residuals from (11.(yi . 11.from (11.2119812 53 .^ ^ . all elements). the degrees of freedom in estimating s2u from part c are properly calculated. f. and * = &N1 SN ¨Z’u^ ^u’¨Z *. t P>t [95% Conf. which are exactly the 2SLS residuals from (11.
114748 1. 51) = Prob > F = 1.2632934 0.6635913 1. To be added.689 0.4544153 . 73 .594 0.1564748 .689 0. 11.6099015 .6635913 1.04 0. Std. Xb) = 0.6850584 cgrant_1  .555 .2240491 .341 .097 .594 0.4011 Fixedeffects (within) regression Number of obs = 108 n = 54 T = 2 Rsq within between overall F( = = = 0. Interval] +d89  .3721087 .2377385 .0476 0.6099016 .7122094 corr(u_fcode.6343411 0.0448014 cgrant  .961 0.6850584 ccgrnt_1  .0050 3.1564748 .3721087 . fe sd(u_fcode) sd(e_fcode_t) sd(e_fcode_t + u_fcode) = = = .2377384 .961 0. The joint F test for the 53 different intercepts is significant at the 5% level.4975778 .1407362 1. t P>t [95% Conf.0063171 fcode  F(53.056 .953 0.3826 clscrap  Coef.6343411 0. To be added.0448014  .883394 _cons  .674 0. Err.555 .15.5202783 .and they are very imprecise.+ccgrnt  .883394 _cons  .1407363 1.341 .2632934 0. It does cast doubt on the standard unobserved effects model without a random growth term. xtreg clscrap d89 cgrant cgrant_1.5202783 .033 (54 categories) The estimates from the random growth model are pretty bad .097 .51) = 1.the estimates on the grant variables are of the wrong sign . 11.0577 0. so it is hard to know what to make of this.509567 .13.
B ^ rN( FE .B) N 1/2 N = N S (si .CA1X¨’i ui. A.A) = N S [(si .A ^ rN( . and B with ^ their consistent estimators. 1 1 By definition. If we replace A.A ^ 1/2 rN( . N i=1 which implies by the central limit theorem and the asymptotic equivalence lemma that . where ri _ (si .XiB) . i=1 _ E[(Z’i Zi) Z’i Xi] and si _ (Z’i Zi) Z’i (yi .A) .^ .11.CA1N1/2 S X¨’i ui + op(1) N i=1 where C = A.55). E(si) By combining terms in the sum we have . we get exactly (11. To obtain (11.B) = Simple algebra and standard properties of Op(1) and op(1) give .A) .A) . sincethe ui are the FE residuals.17.A) = N S [(Z’i Zi)1Z’i (yi .XiB).A) is asymptotically normal with zero mean and variance E(rir’ i ).CA1X¨’i ui] + op(1). .54) and the representation 1 A (N 1/2 N ¨ Si=1X’i ui) + op(1).A] i=1 & 1 N 1 * . 74 . C.N S (Z’ 7 i=1 i Zi) Z’i Xi8rN(BFE .A ^ 1/2 rN( .55). we use (11.
We need the gradient of m(xi.m(x. the gradient of the mean function evaluated under the null is restricted NLS estimator. we first regress mixi2 on mixi1 and * K2 residuals.Q)]E(ux) + E{[m(x.Q) = exp(x1Q1 + x2Q2)x. The approximate elasticity is ^ dlog[E(y z)]/dlog(z1) = d[q^1 + ^ q2log(z1) + ^q3z2]/dlog(z1) = ^q2. where ~ ~ ui = yi . Dbm(x. ^ ^ b. Since d^E(yz)/dz2 = exp[^q1 + ^q2log(z1) + ^q3z2 + ^q4z22]W(^q3 + 2^q4z2).4) with respect to x.1.5. we can compute the usual LM 2 ~ ~ ~ statistic as NRu from the regression ui on mixi1. Since ^ q3/(2^q4).Qo) . Then we compute the statistic as in regression (12. the first term does not depend on Q minimized at = Qo Q..N.Q)]2 2 = E(u x) + [m(x. This is approximated by 100Wdlog[E(yz)]/dz2 = 100Wq3. 12.Qo) . i = 1.m(x. Take the conditional expectation of equation (12. c. and the second term is clearly (although not uniquely. Dq~mi = exp(xi1Q~1)xi _ ~mixi.72).Q)] 2 x} x) + 0 + [m(x. where ~Q1 is the From (12.m(x.mi.75). obtain the 1 ~ ~ For the robust test. mixi2. Dqm(x..m(x. in general).Q)]2..CHAPTER 12 12.Qo) .. By the chain rule. and use E(ux) = 0: E{[y .m(x. 2 = E(u Now. 12. * the turning point is z2 = d.Qo) .Q)] 2 x] = E(u2x) + 2[m(x. a. r~i.Q) = g[xB + d1(xB)2 + d2(xB)3]W[x + 2d1(xB)2 + 75 .Q) evaluated under the null hypothesis.3.
2 Ddm(x.the nuisance Then. so that E(uigxi) = Further.Qo). ) .G) = Dqm(xi.(xiB) ]. and so its unconditional expectation is zero.m(xig. So Each Dgsj(wi. Then. _ yig .Q)’) ui(Q) 1 where.Q~) = g(xiB )[(xiB) . First.Q) = g[xB + d1(xB)2 + d2(xB)3]W[(xB)2.Q. ~ ~ 2 ~ 3 Ddm(xi. giW(xiB) . * 1 vector containing the uig. _ 1.Qo. 76 . )o is 1 N N ^ ^ Qg S ^^ui^^u’i i=1 is consistent for Qog as N 8.37).. and we get RESET. do NLS for each g. This shows that we do not have to adjust for the firststage estimation of )o. the usual LM statistic can be 2 ~ ~ ~ ~ 2 ~ ~ 3 obtained as NRu from the regression ui on gixi. ^ ^ be the vector of nonlinear least Let u i That is. hopefully.G) is a _ ui. g = 1. For each i and g.. where ~ gi ~ _ g(xiB ).3d2(xB) ].Qo. which has the same consequence. too. by standard arguments.G) is a linear combination of ui(Q). where the linear combination is a function Since E(uixi) = 0. With this definition.Q. This part involves several steps.G)xi] = 0.G.G). If G(W) is the identity function.~Q) = g(xiB )xi and Therefore. Then Let B~ denote ~ Dbm(xi. element of s(wi.Qo. a consistent estimator of )^ _ because each NLS estimator.. define uig 0. one can verify the hint directly. the score for observation i is s(wi.7. giW(xiB) . and collect the residuals. g(W) 12. let G be the vector of distinct elements of parameters in the context of twostep Mestimation. the NLS estimator with d1 = d2 = 0 imposed. even though the actual derivatives are complicated. let ui be the G Then E(uiu’ i xi) = E(uiu’ i) = squares residuals.(xB)3]. and I will sketch how each one goes. a. L b. Alternatively. linear combination of ui(Qo) of (xi. E[Dgsj(wi. we can verify condition (12.. )o. the notation is clear.
Go)’] = E[Dqmi(Qo)’)o uiu’ i )o 1 = E{E[Dqmi(Qo)’)o uiu’ i )o 1 1 1 1 Next.Go)].G) = Q can be written * P matrix. The = {E[Dqmi(Qo)’)o 1 Dqmi(Qo)].Q) + [IP t ui(Q)’]F(xi.Go)si(Qo. Hessian itself is complicated.Q)’)1Dqm(xi. but its expected value is not. 1 So. So.Q.G) with respect to Hi(Q. where P is the total number of parameters.Next.Q.Go)xi] = = verified (12. As usual. where F(xi. not on yi.37) and that Ao = Bo. and divide the result by N to get Avar(Q): &N1 SN D m (^Q)’) ^ 1 ^ *1 D mi(Q) q i q 7 i=1 8 /N N 1 & S D m (Q^)’) ^ 1 = Dqmi(^Q)*8 . c. E[Hi(Qo. we have to derive Ao Dqmi(Qo)] Dqmi(Qo)xi]} = E[Dqmi(Qo)’)o E(uiu’ i xi))o = E[Dqmi(Qo)’)o 1 1 Dqmi(Qo)] )o)1 o Dqmi(Qo)] = E[Dqmi(Qo)’)o 1 Dqmi(Qo)]. we have Therefore. Dqmi(Qo)’)1 o Dqmi(Qo).3. from Theorem 12. and show that Bo = Ao.G) is a GP Q.Go)’]: E[si(Qo.Q. (I implicityly assumed that there are no crossequation restrictions imposed in the nonlinear SUR estimation. note that Dqmi(Qo) is a blockdiagonal matrix with blocks Dqgmig(Qog). Avar rN(^Q  Dqmi(Qo)]}1.Qo. d.G). q i 7 ^ ^ Avar(Q) = i=1 The estimate ) ^ can be based on the multivariate NLS residuals or can be updated after the nonlinear SUR estimates have been obtained.G) depends on xi. First.) 77 If )o . 1 = Ao with respect to Dqmi(Qo)’)1 o Dqmi(Qo) + [IP t E(uixi)’]F(xi. _ E[Hi(Qo.Go) Now iterated expectations gives Ao = E[Dqmi(Qo)’)o Qo) The Jacobian Dqm(xi.Go)si(Qo. a 1 * Pg matrix. of si(Q. we derive Bo _ E[si(Qo. we replace expectations with sample averages and unknown ^ ^ parameters. that involves Jacobians of the rows of )1Dqmi(Q) The key is that F(xi.
and the gradients differ across g.d. which does not .Qog) = exp(xiQog) then Dqgmg(xi.Qog) varies across g. 2 2 o o s2 oG DqGm iG’DqGmiG 9 0 ^ Taking expectations and inverting the result shows that Avar rN(Qg . is the same in all equations .7. 360.Med(yx) = 78 a . a. mg(xi.Qog) = xi For example. see p. If u and x are independent.. The first hint given in Problem 7. the blocks are not the same even when the same regressors appear in each equation.Qog) = 0 2 W W W 0 2  s2og[E(Dqgmoig’Dqgmoig)]1.a very Dqgmg(xi. We cannot say anything in general about Med(yx). if Dqgmg(xi.9. But. and Med(ux) could be a general function of x.G. since Med(yx) = m(x.. then E(ux) and Med(ux) are both constants. g = 1.. even when the same regressors appear in each equation. so is its inverse..Qog) = exp(xiQog)xi. I cannot see a nonlinear analogue of Theorem 7. for all g. While this G The key is that Xi is replaced * P matrix has a blockdiagonal form. b. unless Qog restrictive assumption  In the linear case. (Note also that the nonlinear SUR estimators are asymptotically uncorrelated across equations.) These asymptotic variances are easily seen to be the same as those for nonlinear least squares on each equation.Bo) + Med(ux).5 does not extend readily to nonlinear models. as described in part d.is blockdiagonal. e. say a and d. with Dqm(xi. Then E(yx) .Qo). 12. Standard matrix multiplication shows that ( 2 o so1 Dq1mi1 ’Dq 1 m oi1 0 W W W 2 Dqmi(Qo)’)1 o Dqmi(Qo) = ) 0 2 2 2 0 WW W 2 2 .
Provided m(x.depend on x.Bo) .B)]’[yi . That is.Bo) m(xi. a.Bo) m(xi.B) = XiB for Xi a G B $ Bo. Bo must uniquely minimize E[q(wi. 79 The key is that.in addition to the regularity conditions.B)]’[m(xi. which I will ignore .B)]}’{ui + [m(xi. When u and x are independent. We can apply the results on twostep Mestimation.Bo) . b. 12.11. Therefore.B)]} > 0.m(xi.Bo) . Generally.W) is twice continuously differentiable. .Bo) .B)]’[m(xi.m(xi.B)]} = E({ui + [m(xi. the condition is (Bo  B)’E(X’i Xi)(Bo  B) > 0. B $ Bo." at least when only the mean and median are in the running.B)]} = E(uiu’ i ) + E{[m(xi.m(xi.B)]} because E(uixi) = 0.B)]}) = E(u’ i ui) + 2E{[m(xi. where m(xi. and there is no ambiguity about what is "the effect of xj on y.m(xi.m(xi. * K matrix. we could interpret large differences between LAD and NLS as perhaps indicating an outlier problem.the identification condition. we need .3. For consistency of the MNLS estimator. and Ao = E[Dqmi(Bo)’Dqmi(Bo)]. and this holds provided E(X’ i Xi) is positive definite.Bo) .B)]’ui} + E{[m(xi. the identification assumption is that E{[m(xi. there are no problems in applying Theorem 12. But it could just be that u and x are not independent.m(xi. In a linear model. Bo = E[Dqmi(Bo)’uiu’D i qmi(Bo)] These can be consistently estimated in the obvious way after obtain the MNLS estimators. c.Bo) .B)]’[m(xi. Then.B)] = E{[yi  m(xi. the partial effects of xj on the conditional mean and conditional median are the same.Bo) m(xi.m(xi.
37).7: E[si(Bo.m(xi.Do)xi] = 0.B)]’[W(xi.Bo. B As and the second term is minimized we would have to assume it is uniquely minimized. In particular.m(xi.Bo) m(xi.m(xi. i=1 converges uniformly in probability to E{[yi .m(xi. But we can use an argument very similar to the unweighted case to show E{[yi .Do). as always). It follows easily that E[Ddsi(Bo. we can use an argument very similar to the nonlinear SUR case in Problem 12.Do)] [yi . 1 where E(uixi) = 0 is used to show the crossproduct term. 1 which is just to say that the usual consistency proof can be used provided we verify identification. when Var(yixi) = Var(uixi) = W(xi.B)]’[W(xi. 1 before. 2E{[m(xi.B)]’[Wi(Do)] [m(xi.underl general regularity conditions. under E(yixi) = m(xi. To get the asymptotic variance.Do)si(Bo. N 1 N S [yi .Do) for some function G(xi.Do) = (IP t ui)’G(xi.m(xi.B)]} = E{u’ i [Wi(Do)] ui} 1 1 + E{[m(xi. 1 80 Dbmi(Bo)] Dbmi(Bo)] . First. is zero (by iterated expectations. we can write Ddsi(Bo. we can ignore preliminary estimation of provided we have a Do rNconsistent estimator.B)]’[Wi(^D)]1[yi .m(xi.B)]/2.37) holds. This means that.Do)’] = E[Dbmi(Bo)’[Wi(Do)] uiu’ i [Wi(Do)] 1 1 = E{E[Dbmi(Bo)’[Wi(Do)] uiu’ i [Wi(Do)] 1 1 Dbmi(Bo)xi]} = E[Dbmi(Bo)’[Wi(Do)] E(uiu’ i xi)[Wi(Do)] 1 1 = E{Dbmi(Bo)’[Wi(Do)] ]Dbmi(Bo)}. the first term does not depend on at Bo.  To obtain the asymptotic variance when the conditional variance matrix is correctly specified.Bo.7.Bo) .Do)] [yi .Bo) .m(xi.B)]}/2.B)]’[Wi(Do)] ui}. it can be shown that condition (12.Do). that is.m(xi.Bo). we proceed as in Problem 12. which implies (12.B)]}.
Bo) = Ao . the asymptotic variance is affected because Ao Bo.Do)] = E{Dbm(xi. and the expression for Bo no longer holds. The consistency argument in part b did not use the fact that W(x. No. we estimate Avar ^ ^1^^1 rN(B . through.B )’[Wi(D)] uiu’ i [Wi(D)] Dbm(xi.D) is correctly specified for Var(yx).Q)]}. for some complicated function F(xi.Bo) + (IP t ui)’]F(xi. evaluated at (Bo. f(yixi. still works.B).Bo)’[Wi(Do)]1Dbm(xi. and  a consistent estimator of Ao is ^ 1 A = N N ^ ^ 1 ^ S Dbm(xi.Q)] $ exp{E[log f(yixi. Qo Therefore. the Hessian (with respect to Hi(Bo.yi). Qe$ where the expectation is over the joint distribution of (xi.Do) = B).Bo.Q)] > exp{E[log 81 In .Do).Bo)} = Bo.Do) that depends only on xi.Bo)’[Wi(Do)]1Dbm(xi. We know that Qo solves max E[log f(yixi. because exp(W) is an increasing function. from the usual results on Mestimation. of course. Exactly the same derivation goes But.Q)]} over $. fact.Q)]. Avar ^ 1 rN(B . i=1 c.  CHAPTER 13 13. Jensen’s inequality tells us that E[f(yixi. Taking expectations gives Ao _ E[Hi(Bo.Bo) in the usual way: A BA .Do).B). can be written as Dbm(xi. i=1 Now. of course. also maximizes exp{E[log The problem is that the expectation and the exponential function cannot be interchanged: E[f(yixi. ^ B = N $ The estimator of Ao in part b To consistently estimate Bo we use 1 N ^ ^ 1^ ^ ^ 1 ^ S Dbm(xi.1.Bo.Now.B )’[Wi(D)] Dbm(xi. Therefore.
g 1 E[si(Fo)si(Fo)’xi] = E{[G(Qo)’] si(Qo)si(Qo)’[G(Qo)] g g 1 1 xi} = [G(Qo)’] E[si(Qo)si(Qo)’xi][G(Qo)] 1 1 = [G(Qo)’] Ai(Qo)[G(Qo)] .xi] = ri2E[li1(Q)yi2. Since si(Fo) = [G(Qo)’] si(Qo). In part b. 13.Q)]}. The expected Hessian form of the statistic is given in the second ~g ~g part of equation (13. 13. but where it is based initial on si and Ai: & SN ~sg*’& SN A~g*1& SN ~sg* LMg = 7i=1 i8 7i=1 i8 7i=1 i8 & SN ~G’1~s *’& SN ~G’1~A ~G1*1& SN ~G’1s~ * = i8 7 i i8 7i=1 8 7i=1 i=1 N ’ N 1 N & S s~ * G~1G~& S A~ * G~’G~’1& S ~s * = 7i=1 i8 7i=1 i8 7i=1 i8 N N 1 N & S ~s *’& S ~A * & S ~s * = LM. Parts a and b essentially appear in Section 15.5. for all (yi2.xi).f(yixi. Since ri2 is a function of (yi2.x.xi).Q).Qo). a. The joint density is simply g(y1y2. and therefore Qo maximizes E[ri2li1(Q)].xi].xi).7. 1 b.Qo)Wh(y2x. Qo maximizes E[li1(Q)yi2. we know that.xi]. E[ri2li1(Q)yi2.3. and . The log likelihood for observation i is Q) _ li( log g(yi1yi2.Q) + log h(yi2xi.36). First. = i i i 7i=1 8 7i=1 8 7i=1 8 13. 82 Qo maximizes E[li2(Q)]. we just replace 1 Qo with ~g ~ 1 ~ ~ 1 Ai = [G(Q)’] Ai(Q)[G(Q)] Q~ and Fo with F~: _ ~G’1~Ai~G1. since ri2 > 1.xi. Qo maximizes E[ri2li1(Q)yi2. and we would use this in a standard MLE analysis (conditional on xi).xi] for all (yi2. c. a. b. Similary.4.
expectation.xi). E[si1(Qo)yi2.Qo) by {E[Hi(Q)]}1.Q). E[si2(Qo)si2(Qo)’] = E[Hi2(Qo)]. Further. where si1(Q) _ Dqli1(Q)’ and si2(Q) _ Dqli2(Q)’. (13. by iterated expectatins. Combining all the pieces. which means we can estimate the asymptotic variance of 83 rN(^Q . and so its transpose also has zero conditional expectation.70) Since ri2 is a function of (yi2. we have shown that E[si(Qo)si(Qo)’] = E[ri2Hi1(Qo)] . E[si1(Qo)si1(Qo)’yi2. where Hi1(Q) = Dqsi1(Q). c. As usual.Q).xi] = 0 and. byt the conditional IM equality for the density g(y1y2.xi].E[Hi2(Qo)] = {E[ri2Dqsi1(Q) + = E[Dqli(Q)] 2 Dqsi2(Q)] _ E[Hi(Q)]. Now. where Hi2(Q) = Dqsi2(Q). E[si(Qo)si(Qo)’] = E[ri2si1(Qo)si1(Qo)’] + E[si2(Qo)si2(Qo)’] + E[ri2si1(Qo)si2(Qo)’] + E[ri2si2(Qo)si1(Qo)’]. we can put ri2 inside both expectations in (13. this implies zero unconditional We have shown E[si(Qo)si(Qo)’] = E[ri2si1(Qo)si1(Qo)’] + E[si2(Qo)si2(Qo)’]. Therefore.70).xi] = 0. since ri2 and si2(Q) are functions of (yi2. E[ri2si1(Qo)si1(Qo)’] = E[ri2Hi1(Qo)].so it follows that Qo maximizes ri2li1(Q) + li2(Q).xi] = E[Hi1(Qo)yi2. it follows that E[ri2si1(Qo)si2(Qo)’yi2. we have to assume or verify uniqueness. So we have verified that an unconditional IM equality holds.x. by the unconditional information matrix equality for the density h(y2x.  . Now by the usual conditional MLE theory. The score is si(Q) = ri2si1(Q) + si2(Q).xi). Then. For identification.
Similarly. Instead. Now. i=1 where the notation should be obvious. Answer: B are P 1 B .A We use a basic fact about positive definite matrices: if A and * P positive definite matrices.xi]. e. Interestingly.s. for which we can use iterated expectations. because E[Ai1(Qo) + Ai2(Qo)] . N This implies that.s. (yi2. the asymptotic variance would be {E[Ai1(Qo) + Ai2(Qo)]} .to consistently estimate the asymptotic variance of the partial MLE. But. we can still used the conditional expectations of the hessians . the asymptotic variance of the partial MLE is {E[ri2Ai1(Qo) + Ai2(Qo)]} .Qo) is  1 N N S (ri2H^i1 + H^i2).d.d.but conditioned on different sets of variables. 1 If we could use the entire random sample for both terms. even though we do not have a true conditional maximum likelihood problem.E[ri2Ai1(Qo) + Ai2(Qo)] 84 . if and only if 1 is positive definite. N S ^Ai2 is consistent for E[Hi2(Qo)] i=1 by the usual iterated expectations argument. Bonus Question: Show that if we were able to use the entire random sample. under general regularity 1 N S ri2^Ai1 consistently estimates E[ri2Hi1(Qo)]. and xi in the other . as we showed in part d. From part c. and ri2 is a function of (yi2.{E[Ai1(Qo) + is p.xi). then A .d. definition. since Ai1(Qo) _ E[Hi1(Qo)yi2. 1 Ai2(Qo)]} 1 But {E[ri2Ai1(Qo) + Ai2(Qo)]} 1 . This i=1 completes what we needed to show. by 1 N _ E[Hi2(Qo)xi]. it follows that E[ri2Ai1(Qo)] = E[ri2Hi1(Qo)]. conditions. this estimator need not be positive definite. we can break the problem into needed consistent estimators of E[ri2Hi1(Qo)] and E[Hi2(Qo)]. as we discussed in Chapters 12 and 13. Ai2(Qo) Since.B is p. one consistent estimator of rN(^Q . the result conditional MLE would be more efficient than the partial MLE based on the selected sample.xi) in one case.
To be added. To be added. (since Ai1(Qo) is p.3.s.= E[(1 . one could try to use the optimal instruments derived in section 14. regression y2 on x2 consistently estimates 85 D2. to find analytically if b. We can see this by obtaining E(y1x): E(y1x) = x1D1 + Now.x2). single equation GMM estimator. these are difficult. Even under homoskedasticity. Otherwise. we cannot find E(y1x) without more assumptions. the parameter g2 does not appear in the model. g1 = 0.s.d. E(y22x) $ [E(y2x)] 2. 13. course. in fact.ri2)Ai1(Qo)] is p. 2SLS using the given list of instruments is the efficient. g g2 $ 1. If g2 $ 1.5.11. While the the twostep NLS estimator of . we would consistently estimate D1 by OLS. if not impossible. a.ri2 > 0.these would generally improve efficiency if g2 $ 1. using instruments (x1. No. If E(u2x) = 2 s22.d. and 1 . CHAPTER 14 14.35) is by 2SLS.9.1. Finally. The simplest way to estimate (14. when g = x1D1 + g g1E(y22x) + E(u1x) g g1E(y22x). 13. so we cannot write E(y1x) = x1D1 + g g1(xD2) 2. c. if we knew Of g1 = 0. the optimal weighting matrix that allows heteroskedasticity of unknown form should be used. Nonlinear functions of these can be added to the instrument list .
2.L’ 3 . take A _ G’o %oGo and s(wi) _ * The optimal score function is s (wi) Ro(xi)’)o(xi) r(wi.3.Qo)’)o(xi) Ro(xi)] * 1 = G’ o %oE[Z’ i E{r(wi.Qo)’xi})o(xi) Ro(xi)] 1 = G’ o %oE[Z’ i )o(xi))o(xi) Ro(xi)] = G’ o %oGo = A.3.Qo)r(wi.63). where we suppress its dependence on xi.") When D1 and g2.L’ 1 . So.(L2 + Therefore.Qo)r(wi.L’ 2 . the plugin method works: it is just the usual 2SLS estimator. Then the asymptotic variance of the GMM estimator has the form (14.L’3 ]’. * (1 + 4K) matrix H defined . pt0 + xiPt + vit.10) with Go = E[Z’ i Ro(xi)].54). (This is an g2 = 1. P2 = [L’ 1 . in (14. P3 = [L’ 1 . Let Zi be the G * G matrix of optimal instruments in (14. 1 14. P1 = [(L1 + B)’.Qo).B’)’. (xiD2) will not be consistent for example of a "forbidden regression. * 1. With the restrictions imposed on we have pt0 = j. and then P is the 3 + 9K * 1 vector obtained by Let Q = (j.L’2 .3.Qo). * 14.(L3 + = HQ for the (3 + 9K) by 86 B)’]’.5. we can write P B)’.L’3 ]’.L’ 2 .57) with r = 1: E[s(wi)s (wi)’] = G’ o %oE[Z’ i r(wi. t = 1. We can write the unrestricted linear projection as yit = where Pt is 1 + 3K stacking the the Pt Pt.^ g2 yi1 on xi1. G’ o %oZ’ i r(wi. t = 1.2. 1 _ Now we can verify (14. %o function of xi and let * L matrix that is a Let Zi be a G be the probability limit of the weighting matrix.
we know that E(rir’ i xi) =  ljTvi.7.p. assuming H’% H is nonsingular . where ri = vi ˇ ˇ Therefore. the minimization problem becomes ^ ^1 ^ min (P . 10.which occurs w. The first order condition is easily seen to be ^1 ^ ^ 2H’% (P . as described in the hint. The choices of si1.55). E(si1s’ i1) = E(X’ i rir’ i Xi) = iterated expectations argument. we just need to verify (14. and RE. QeR P where it is assumed that no restrictions are placed on Q. 14. ¨ ˇ But si2s’ i1 = X’ i uir’ i X i. With h(Q) = HQ. or ^1 Therefore.HQ).55) and (14.we have 1 Q^ ^1 1 ^1^ = (H’% H) H’% P. r _ su2.&1 0 0 0 0 * 0 IK 0 0 0 0 IK 0 2 2 2 0 1 0 H = 0 0 1 0 0 70 2 2 2 2 2 2 2 2 2 2 0 0 IK 0 0 0 IK 0 0 0 0 0 IK 0 0 0 IK 0 IK 0 0 0 IK 0 0 0 IK IK 0 2 2 2 0 0 0 .is nonsingular .1. We have to verify equations (14. X’ i ri = X’ i (vi 87 r.2. su2E(Xˇ’i Xˇi) _ su2A1 by the usual This means that. and A2 are given in the hint.1.HQ)’% (P .56) for this choice of ¨ ¨ Now. in (14.HQ) = 0 ^1 ^ ^1^ (H’% H)Q = H’% P. IK 0 0 0 0 IK8 2 2 2 2 2 2 2 2 2 2 14. RE.a.9.  Now.3. from Chapter s2uIT under RE. A1.56) for the random effects and fixed effects estimators. Now. ljTvi) = ¨X’i vi = ¨X’i (cijT + ui) =  . when H’%o H . si2 (with added i subscripts for clarity).
56) with r . This verifies (14. ¨ ˇ ¨ ˇ So si2s’ i1 = X’ i rir’ i Xi and therefore E(si2s’ i1xi) = X’ i E(rir’ i xi)Xi = It follows that E(si2s’ i1) = ¨ ˇ ¨ note that X’ i Xi = X’ i (Xi = su2E(X¨’i Xˇi). 88 To finish off the proof.  s2u. su2X¨’i Xˇi.¨ X’ i ui. ljTxi) = ¨X’i Xi = ¨X’i ¨Xi.
and these are necessarily in [0. Therefore.the coefficient on dm is obtained from the regression yi on dmi. If P(y = 1z1.3.z2. M. d1 . of course. this is estimated as ^ ^ ^ ^ ^ 2 (g1 + 2g2z2)Wf(z1D1 + g1z2 + g2z2).d1) = the partial effect of z2 is dP(y = 1z1.in the first case. In the model P(y = 1z1. for given z.z2) = (g1 + 2g2z2)Wf(z1D1 + dz 2 2  g1z2 + g2z22).. g3d1)Wf(z1D1 + g1z2 + g2d1 + g3z2d1).z2..d1 = 0) = F[z1D1 + (g1 + g3)z2 + g2] .d1 = 1) .1]. where. But this is easily seen to be the fraction of yi in the sample falling into category m.. and all i .d1) = (g1 + dz 2  F(z1D1 + g1z2 + g2d1 + g3z2d1). a. 15. b. m = 2.z2) = F(z1D1 + g1z2+ g2z2) then dP(y = 1z1.1. i = 1. the overall intercept is the cell frequency for the first category. The fitted values for each category will be the same...CHAPTER 15 15.F(z1D1 + g1z2). b. and the coefficient on dm becomes the difference in cell frequency between category m and category one. a.P(y = 1z. to estimate these effects at given z and .dkiWdmi = 0 for k $ m. Again..N. . the fitted values are just the cell frequencies.. The effect of d1 is measured as the difference in the probabilities at d1 = 1 and d1 = 0: P(y = 1z. the estimates are the probit estimates. If we drop d1 but add an overall intercept.we 89 . Since the regressors are all orthogonal by construction .
If P(y = 1z. for P(y = 1z). qz ~ Normal(0. 5 2 2 g1z2 + 1 has a standard normal distribution independent of z. Testing H0: r1 = 0 is most easily done using the score or LM test because. this is what we can estimate ====================================== P(y = 1z) = along with D1.ez) = 0 by independence between e and (z. Because P(y = 1z) depends only on g1. Because q is assumed independent of z. ^fi = f(zi1^D1).ez) = g12z22 + 1 because Cov(q. ====================================== It follows that 5 F&7z1D1/r g21z22 + 18*.5.) F(z1D1 + g1z2q) then dP(y = 1z. this follows because E(rz) = 2 2 E(ez) = 0. ================================================ . ^ui = yi ^ 90 5 Fi. with respect to all probit parameters. we would require the full variance matrix of the probit estimates as well as the gradient of the expression of interest.) g1 = 2 and g1 = 2 give exactly the same model This is why we define r1 = g21. Define Let D1 ^ denote the probit estimates under the null that Fi = F(zi1^D1). (15. g1z2 + (Not with respect to the zj. and e is independent of (z. (For example. r/r g1z2E(qz) + Thus. Var(rz) = g12z22Var(qz) + Var(ez) + 2g1z2Cov(q.q) =  assuming that z2 is not functionally related to z1.q) with a standard normal distribution. Also. Write y = z1D1 + r.just replace the parameters with their probit estimates. 1 1 1 1 2 dz 2 15. under H0. such as (g1 + 2g2z2)Wf(z1D1 + g2z22). a. c. * b. where r = g1z2q + e.q) = g qWf(z D + g z q).q). Thus. we have a standard probit model. We would apply the delta method from Chapter 3.F^i) ^  r1 = 0.90) 2 c.g1z2 + 1). and use average or other interesting values of z. and ~ui _ u^i/r F^i(1 .
0215953 .0014738 . &r z2 + 1*3/2f(z D /r5g2z2 + 1). the 2 score statistic can be obtained as NRu from the regression ~ ui 5 5 2 ^ f^izi1/r F^i(1 . t P>t [95% Conf.1543802 .83 0.17 0.0303561 .000 .0160927 22.34 0.0171986 0. d. ================================================ on 2 a under H0.184405163 +Total  545. Err.0063417 0.0089326 .0824 0.1617183 .0028698 .0308539 . The model can be estimated by MLE using the formulation with place of g21.000 .816514 2724 .0205592 4.000 .0020613 .90) with respect to r1 is.844422 2716 .0365936 _cons  .0128344 inc86  .0048884 0.55 0. The following Stata output is for part a: .0235044 6.3925382 91 .1133329 avgsen  .65 0. 2716) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 2725 30.0035024 .43 0.1954275 .0489454 .42942 arr86  Coef.0892586 .0209336 7. NRu ~ ================================================ c21.007524 ptime86  .673 .867 .48 0.37 0.329428 .1156299 . is simply only other quantity needed is the gradient with respect to null estimates.2078066 hispan  . (zi1D^1)zi2 fi/r F^i(1 .0044679 4.20037317 Number of obs F( 8. r1 in But this is not a standard probit estimation.0012248 . 15.000127 9. The r1 evaluated at the But the partial derivative of (15.000 .0159374 tottime  .0116466 .3609831 .0009759 black  . with respect to The gradient of the mean function in (15. 1 i2 7 1 i2 8 9 i1 1 0 ^ ^ 2 ^ When we evaluate this at r1 = 0 and D1 we get (zi1D1)(zi2/2)fi.000 . a. ========================================== (zi1D1)(zi2/2) 2 Then.7. for each i. Std. D1.581 .F^i).9720916 8 5.88 0.F^i). Interval] +pcnv  .62151145 Residual  500.(the standardized residuals).0000 0.42 0.90) evaluated under the null estimates.0797 . reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60 Source  SS df MS +Model  44. ^ fizi1 .000 .1295718 born60  .
In fact.0 tottime = 0.18 0.0027532 7.0269938 ..1617183 .000 .0167081 21.84 0.0171596 0.0210689 4.1116622 .61 0.42942  Robust arr86  Coef.154(.0012248 .0001141 10. Err. There are no important differences between the usual and robust standard errors.0479459 . test avgsen tottime ( 1) ( 2) avgsen = 0.0307774 . test avgsen tottime 92 .33 0.0824 .0020613 .000 . so the probability of arrest falls by about 7.0161967 inc86  .3282214 . in a couple of cases the robust standard errors are notably smaller.14 0.0062244 ptime86  .59 0.0042256 0.552 .626 . t P>t [95% Conf.3609831 . reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60.000 . qui reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60 . The robust statistic and its pvalue are gotten by using the "test" command after appending "robust" to the regression command: .7 points.000 .2117743 hispan  .001001 black  .17 0.867 . b.0080423 .018964 8. 2716) = Prob > F = 0.49 0.0058876 0.5) = .000 .010347 .0014487 .0150471 tottime  .24 0.036517 _cons  .000 .0 F( 2.0035024 .0255279 6.1171948 avgsen  .0028698 .0892586 . robust Regression with robust standard errors Number of obs F( 8.59 0.077. Interval] +pcnv  .0000 0.25 to .0215953 .1305714 born60  .1915656 .8320 .1543802 .3937449 The estimated effect from increasing pcnv from . 2716) Prob > F Rsquared Root MSE = = = = = 2725 37. Std.73 0.75 is about .
548 .6322936 3.0212318 0.000 .0046346 . Min Max +avgsen  2725 . The probit model is estimated as follows: . 2716) = Prob > F = 0.0556843 0.45 0. Interval] +pcnv  . probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60 Iteration Iteration Iteration Iteration 0: 1: 2: 3: log log log log likelihood likelihood likelihood likelihood = = = = 1608.1629135 .607019 0 63.508031 0 59.0168844 0.0004777 9.0076486 .45 0.1203466 _cons  .18 0.3255516 .0459949 inc86  .67 0.0407414 .387156 1.( 1) ( 2) avgsen = 0.20 0.0 tottime = 0.0 F( 2.60 0.62721 0 541 93 .2911005 .6406 = = = = 2725 249.09 0.4143791 .0000 0. sum avgsen tottime ptime86 inc86 Variable  Obs Mean Std.8387523 4.0979318 .3157 1483.5529248 .2 tottime  2725 .0720778 7. hispan = 0. Std.0812017 .0512999 6.0654027 4.8360 c.4 ptime86  2725 . born60 = 1.0774 arr86  Coef.000 . we must compute the difference in the normal cdf at the two different values of pcnv.4192875 born60  . and at the average values of the remaining variables: .96705 66.6458 1483.6076635 hispan  .6941947 .70 0.4666076 .017963 4.000 .0543531 tottime  .1837 1486.0127395 .3138331 .1164085 .0055709 .12 0.0036983 black  .000 .950051 0 12 inc86  2725 54.651 .028874 . black = 1.52 0.0254442 ptime86  . z P>z [95% Conf.213287 Now.840 .6406 Probit estimates Number of obs LR chi2(8) Prob > chi2 Pseudo R2 Log likelihood = 1483.0719687 6.000 .4116549 avgsen  . Err.48 0.000 .0112074 . Dev.
Adding the quadratic terms gives .75 .0046*54.97 + . di normprob(.839 ..5 . di . for the men who were arrested.553*.0127*.387 . di 1903/1970 .313 + .467 + .1174364 .10.10181543 This last command shows that the probability falls by about . we first generate the predicted values of arr86 as described on page 465: .. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60 pcnvsq pt86sq inc86sq 94 .3% of the time. Unfortunately.96598985 . gen arr86h = phat > .normprob(. predict phat (option p assumed. which is somewhat larger than the effect obtained from the LPM. di 78/755 .0112 . e. To obtain the percent correctly predicted for each outcome.117) . the probit predicts correctly about 96.553*..6% of the time..632 . The overall percent correctly predicted is quite high.0812*.. but we cannot very well predict the outcome we would most like to predict.. d.10331126 For men who were not arrested.0076*. Pr(arr86)) . tab arr86h arr86  arr86 arr86h  0 1  Total ++0  1903 677  2580 1  67 78  145 ++Total  1970 755  2725 .25 . the probit is correct only about 10.117) .
77 0.0965905 pcnvsq  .0733798 5.568 .1474522 .0213253 ptime86  .798 .337362 .8005 = = = = 2725 336.002 1.62 0.0078094 .1837 1452.0000171 _cons  . The quadratic in pcnv means that.Iteration Iteration Iteration Iteration Iteration Iteration Iteration Iteration 0: 1: 2: 3: 4: 5: 6: 7: log log log log log log log log likelihood likelihood likelihood likelihood likelihood likelihood likelihood likelihood = = = = = = = = 1608.2167615 .2089 1444.8005 1439.0244972 0. test pcnvsq pt86sq inc86sq ( 1) ( 2) ( 3) pcnvsq = 0.63e07 .2929913 . which does not make much sense.00 0.8535 1440.0562665 6.28e06 2.217/(2*.0620105 tottime  .0224234 4.3978727 born60  .0139969 .97 0.16 0.059554 inc86sq  8.1349163 .4368131 .57 0.067082 3. z P>z [95% Conf.75e06 4.056957 .0058786 .3250042 pt86sq  .0199703 0.580635 hispan  . there is actually a positive relationship between probability of arrest and pcnv.2270817 note: 51 failures and 0 successes completely determined.000 . The turning point is easily found as .1047 arr86  Coef.1256351 .8005 Probit estimates Number of obs LR chi2(11) Prob > chi2 Pseudo R2 Log likelihood = 1439.041 3.0000 0.7449712 .1438485 5.0178158 .2937968 .89 0.2714575 3. Std.0145223 .0000 The quadratics are individually and jointly significant.7273198 avgsen  .0 inc86sq = 0.8166 1439.0 pt86sq = 0. Err. at low levels of pcnv.4476423 .04 0.0340166 .1035031 .97 0.54 0. which means that there is an estimated deterrent effect over most of the range of pcnv.18 0.000 .0039478 black  .26 0.4630333 1.000 . 95 . Interval] +pcnv  .026909 inc86  .0009851 5.405 .95 0.2663945 .000 .83 0.268 1439.857) ~ .000 .372 .3151 1441.0 chi2( 3) = Prob > chi2 = 38.2604937 0. .127.000 .8570512 .0566913 0.389098 .
of course . Therefore. this condition must be checked...N. where x1 = 1. = yilog(xiB) + (1 .T. which is only welldefined for 0 < xiB < 1.yi)log(1 . When we add the standard assumption for pooled probit ... 15.. we can use values of the loglikelihood to choose among different models for P(y = 1x) when y is binary.. a.maximizes the KLIC. for each i.. c. independence assumption: independent. the loglikelihood function is well ^ defined only if 0 < xiB < 1 for all i = 1.. We really need to make two assumptions. li(B) Then... asymptotically the true density will produce the highest average log likelihood function.yTxi) = f1(y1xi)WWWfT(yTxi).15. especially if N is large.. during the iterations to obtain the MLE. For any possible estimate B^. It may be impossible to find an estimate that satisfies these inequalities for every observation. just as we can use an Rsquared to choose among different functional forms for E(yx).then 96 .. (yi1. Let P(y = 1x) = xB. This follows from the KLIC: the true density of y given x  evaluated at the true values...... t = 1.that D(yitxit) follows a probit model .. b. the joint density (conditional on xi) is the product of the marginal densities (each conditional on xi).yiT) are This allows us to write f(y1. The first is a conditional given xi = (xi1.xiT).11. So.xiB). Since the MLEs are consistent for the unknown parameters..9. exogeneity assumptiond: The second assumption is a strict D(yitxi) = D(yitxit).. that is.
c. Once we have the estimates.  and in the latter we have N ^ ^ ^ ^ q _ N1 S {[F(d^0 + d^1 + d^2 + d^3 + xiG ) . We would have to use the delta method to obtain a valid standard error for either ^ q or ~q.. both before and after the policy change. we need to compute the "differenceindifferences" estimate.f(y1. and x using all observations. The estimated probabilities for the treatment and control groups. say x. will be identical across models. or averaging the differences across xi. d2.. dB. Let d2 be a binary indicator for the second time period.F(d0 + d2 + xiG)] ~ i=1 ^ ^ ^ . Both are estimates of the difference..[F(d0 + d1 + xiG)  F(^d0 + xi^G)]}. d2WdB. 15.[F(d0 + d1 + xG)   F(^d0 + x^G)]. between groups B and A.13. b. a.yTxi) = T p [G(xitB)]yt[1 . of the change in the response probability over time.F(^d0 + ^d2 + x^G)]  ^ ^ ^ . and let dB be an indicator for the treatment group. we have q^ _ [F(^d0 + ^d1 + ^d2 + ^d3 + x^G) .G(xitB)]1yt. We would estimate all parameters from a probit of y on 1. If there are no covariates. where x is a vector of covariates. Then a probit model to evaluate the treatment effect is P(y = 1x) = F(d0 + d1d2 + d2dB + d3d2WdB + xG). 97 . t=1 and so pooled probit is conditional MLE.. there is no point in using any method other than a straight comparison of means. In the former case.  which requires either plugging in a value for x.
. We would be assuming that the underlying GPA is normally distributed conditional on x. 1 2 G b..17.. 98 .. this depends only on the observed data. a.. We obtain the joint density by the product rule..Go) = f1(y1x.15.c. 15. equivalently.. c. ordered probit with known cut points. since we have independence conditional on (x.Go)WWWfG(yGx.Go) = i p fg(ygx..19.yGx...we estimate The estimated coefficients are interpreted as if we had done a linear regression with actual GPAs. Because c appears in each D(ygx...... The density of (y1.yGx.) s2.. (Clearly a conditional normal distribution for the GPAs is at best an approximation. We should use an interval regression model.c. but we only observe interval coded data.Go) h(cx.c. and the unknown parameters. i 3 87g=1 g ig i 8 4 log As expected..yG) given x is obtained by integrating out with respect to the distribution of c given x: 8& G * g(y1. Along with the bj ..Do)dc.c): f(y1.D)dc$. 7 8 8 g=1 where c g is a dummy argument of integration. To be added. 15. (xi.c)..including an intercept .c.yiG). The log likelihood for each i is # 8i& pG f (y x . y1.Gg)*h(cx .15..Go).yi1.Go)f2(y1x.yG are dependent without conditioning on c.c.c.
where ui might contain unobserved ability. Under H0.s ) = 1[yi = log(c)]Wlog(1 . P(yi _ log(ti*). if xi contains something like education. then the 99 .xiBxi] = 1 As c L 8. B2 1 f[(yi . < yxi) = P(yi* < yxi). y < log(c). To test H0: statistic. the longer we wait to censor. f(yxi) = 1  This is because. LR is c2K2. for y Thus. Thus. = 0.F{[log(c) . F{[log(c) . y = log(c) sf[(y . P[log(ti) = log(c)xi] = P[log(ti) > log(c)xi] * = P[ui > log(c) . The assumption that ui is independent of ci means that the decision to censor an individual (or other economic unit) is not related to unobservables * affecting ti. in something like an unemployment duration equation.xiB)/s]}. _ log(ti) (given xi) when ti < c is the same as the b.xiB]/s}.ci) has the same form as the density of yi given xi above.xiB]/s}) f(yxi) = 1  + 1[yi < log(c)]Wlog{s d. which is treated as exogenous. the density of yi given (xi. a. e. This simply says that. we do not wait longer to censor people of lower ability. The LR statistic is distributed asymptotically as LR = 2(Lur  Lr). which is just Normal(xiB.s2). Since ui is independent of (xi. except that ci replaces c.xiB]/s} L F{[log(c) . 2 c. Note that ci can be related to xi. The density of yi * density of yi < log(c). li(B.1.CHAPTER 16 16. I would probably use the likelihood ratio This requires estimating the model with all variables. and then the model without x2.xiB]/s}. the less likely it is that we observe a censored observation. the density for yi = log(ti) is F{[log(c) . Thus.xiB)/s]. 1. and so P[log(ti) = log(c)xi] L 0 as c L 8.ci).
Taking the derivative of this cdf with respect to y gives the pdf of yi conditional on xi for values of y strictly between a1 and a2: * b.xiB)/s] F[(a1 .F[(a2 .xB)/s] 100 F[(a1 .a1 .a2)/s] = a1F[(a1 .xB) * E(y = xB + sE[(u/s)x. Since y = y a2).xB)/s < u/s < (a2 . P(yi < yxi) = P(y*i < yxi) = F[(y .xB < u < a2 .xiB)/s].xB)/s] + E(yx. Next.xB.a1 < y < a2). a. and a1 < y (1/s)f[(y . we can easily get E(yx) by using the following: E(yx) = a1P(y = a1x) + E(yx.a1 < y* < a2) = xB + E(ux.xB)/s] . Therefore.a1 < y < a2) = E(y * when a1 < y = xB + u.xiB)/s] F[(a2 .xB)/s] + a2F[(xB .xB)/s] = xB + s{f[(a1 . * But y < a2. Similarly. P(yi = a1xi) = P(yi * = < a1xi) = P[(ui/s) < (a1 .3.(a1 .xiB)/s].a1 < y* < < a2 if and only if a1 .xB)/s]} . for a1 < y < a2.F[(a1 .a1 < y < a2)WP(a1 < y < a2x) + a2P(y2 = a2x) = a1F[(a1 . 16.xB)/s]  f[(a2 .xB < u < a2 . * * x.xB)/s]}/{F[(a2 . P(yi = a2xi) = P(yi * = P[(ui/s) = > a2xi) = P(xiB + ui > a2xi) > (a2 . E(yx.a1 < y < a2)W{F[(a2 .xiB)/s] = 1 . using the hint.xiB)/s].xiB)/s].censoring time can depend on education. x.xB)/s]} = E(yx. Now.
f. or strictly between the endpoints. As a shorthand.xiB)/s]} + 1[yi = a2]log{F[(xiB .57). at the right endpoint. f2 _ f[(a2 .xiB)/s]}.xiB)/s]. and F2 _ F[(a2 .a2)/s]} + 1[a1 < yi < a2]log{(1/s)f[(yi . and so it * c.xB)/s]. on x in the subpopulation Generally.] d. The linear regression of yi on xi using only those yi such that a1 < yi < a2 * consistently estimates the linear projection of y * for which a1 < y < a2. We get the loglikelihood immediately from part a: li(q) = 1[yi = a1]log{F[(a1 . Note how the indicator function selects out the appropriate density for each of the three possible cases: at the left endpoint.xB)/s]} + a2F[(xB . the regression on the restricted subsample could consistently estimate B up to a common scale coefficient. B^ and s^2. x.a2)/s]. After obtaining the maximum likelihood estimates these into the formulas in part b. there is no reason to think that this will have any simple relationship to the parameter vector B. just plug The expressions can be evaluated at interesting values of x.a1 < y* < a2) $ xB. f1 _ f[(a1 . dE(yx) = (a /s)f b + (a /s)f b 1 1 j 2 2 j dx j + (F2 . We can show this by bruteforce differentiation of equation (16.f2)]bj  101 Then . write a2)/s].xB)/s]} (16.57) s{f[(a1 . [In some restrictive cases.f[(a2 .xB)/s] + F[(a1 .+ (xB)W{F[(a2 .xB)/s] . From part b it is clear that E(y would be a fluke if OLS on the restricted sample consistently estimated B.xB)/s] = f[(xB  F1 _ F[(a1 .xB)/s].F1)bj + [(xB/s)(f1 . e.
since we act as if we were able to do OLS on an uncensored sample.  at. which is necessarily between zero and one.58) ^ ^ F[(a1 .xB)/s. in (16.xB)/s]f1}bj . respectively. terms cancel except (F2  Careful inspection shows that all F1)bj. does not make sense to directly compare the magnitude of By the way. where the first two parts are the derivatives of the first and third terms. g. The scale factor is simply the probability that a standard normal random variable falls in the interval [(a1 . there is no sense in which ^ s is "ancillary. which is the expression we wanted to be left with. Of course.xiB)/s]  all i to obtain the average partial effect.57).xiB )/s]} across In either case.xB)/s]f2}bj. We could evaluate these partial effects ^ ^ Or. but it is often roughly true. The partial effects on E(yx) are given in part f. x.{[(a2 . and the last two lines are obtained from differentiating the second term in E(yx). we expect ^ ^ gj ~ ^rWb j. Intepretating the results is even easier.(a2 . ^ ^ ^ F[(a1 . where 0 < ^ r < 1 is the scale factor.xB )/s]}bj. (16. 102 .xB)/s]. These are estimated as ^ ^ {F[(a2 . the analysis is essentially the same but a1 and a2 are replaced with ai1 and ai2." h. note that It ^ bj with that of ^gj.+ {[(a1 . the scaled ^ bj Generally. For data censoring where the censoring points might change with i. say. ^ s appears in the partial effects along with the ^bj. we could average {F[(a2 . this approximation need not be very good in a partiular application.xB)/s] where the estimates are the MLEs. can be compared to the ^ gj.
949 0.0678666 0.2788372 .5.3604 .2455481 nrtheast  .0083783 9.0284978 . Interval] +exper  .000 1.258 .492 .547 0.0022495 .19384436 Residual  170.0523598 4.423 0. t P>t [95% Conf.0287099 .082204 .005544 .132288 11 9.0899016 .1825451 .000 .839786 604 .0050939 .0061263 educ  .000 .384 .635 0.442231015 Number of obs F( 11. The results from OLS estimation of the linear model are .3768401 .0696015 .3718 0.858 0.560 .2556765 .0986582 tenure  .0112981 .0058343 educ  .0025859 .3518203 b.1772515 3.0614223 nrthcen  .325 0.672 .1490686 .66616 = 616 = 283.583 0.871 0.0538339 1. Std.1900971 male  .3547274 white  .811 0.0035481 7.0657498 .0029666 .0492621 .710 0.0088168 9.186 .0746602 1. Interval] +exper  . Std.53183 hrbens  Coef.048028 .812 0.0041162 0.0132201 age  .6999244 .16.000 .1038129 union  .0000 0.131 0.552 0.021225 .010294 .0351612 married  .468 .057 .1042321 tenure  .726 0.0037237 7.0044362 0.0103333 .0834306 .0046627 0.206 .079 .2282836 .0869168 .0040631 .084021 south  .50 0.265 0.0510187 1.0551672 4.1608084 . tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union.86 = 0.0000 = 0.0673714 0.364019 white  .0281931 .0499022 7.972074 615 .000 .0737578 1.2084814 male  .946 0.078604 1.0477021 .1027574 . 604) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 616 32.251898 .4748429 _cons  .054929 .2145 hrbens  Coef.762 0. reg hrbens exper age educ tenure married male white nrtheast nrthcen south union Source  SS df MS +Model  101.0029862 .282847328 +Total  271. ll(0) Tobit Estimates Number of obs chi2(11) Prob > chi2 Pseudo R2 Log Likelihood = 519.2538105 103 .000 .000 .0360227 married  .909 0. Err.0115164 age  .1473341 .0994408 . t P>t [95% Conf.688 0.000 . a.585 .098923 . The Tobit estimates are .021397 .0043435 0. Err.
0306652 .0040294 .8137158 .753 0.327 0.0002604 104 .95 = 0.0768576 1. summary: 41 leftcensored observations at hrbens<=0 575 uncensored observations The Tobit and OLS estimates are similar because only 41 of 616 observations.000 .0005524 .354 .000 .000 .1187017 union  .1880725 4.3874497 .230 0.928 0.0008445 .0693418 0.033717 .0709243 0.7% of the sample.0522697 7.239 .4033519 .0043428 0.1708394 .0631812 .18307 .0775035 1.581 0.629 .0698213 0.1536597 .037525 .1012841 nrthcen  . Interval] +exper  .0743625 nrthcen  .0324014 .2562597 .728 .0086957 9.000 .1503703 .5551027 .0973362 tenure  .0581357 .685 0.0489422 .597 0.0787463 married  . the parameter "_se" is ^ s.351 0. 2 c.2870843 .493 .0714831 .0125583 .000 .1034053 south  . ll(0) Tobit Estimates Number of obs chi2(13) Prob > chi2 Pseudo R2 Log Likelihood = 503.2416193 nrtheast  .252 0.4878151 expersq  .0778461 .0480194 .0085253 3.348 0.180 0.0246854 .0000 = 0.2300547 .0539178 4.1891572 . Again.1639731 .316 .004 0.1753675 male  .715 0.000 1.717 0.0044995 educ  .540 0.197323 .177 .3621491 white  .000 .0912729 south  . or about 6.632 0.1146022 union  .000 . Std.051105 7. this reflects that the scale factor is always less than unity. t P>t [95% Conf.5060039 _cons  .483 0.62108 = 616 = 315. Here is what happens when exper and tenure 2 are included: .3006999 .0165773 (Ancillary parameter) Obs. the Tobit estimates are all slightly larger in magnitude.2388 hrbens  Coef.0906783 .y > 0). tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union expersq tenuresq.0802587 . as we know.047408 age  .nrtheast  .0139224 .0713965 0.0760238 0.801 . Err.0602628 . As expected.528 .0528969 1. You should ignore the phrase "Ancillary parameter" (which essentially means "subordinate") associated with "_se" as it is misleading for corner solution applications: ^2 s appears directly in ^E(yx) and ^E(yx.0001487 3. have hrbens = 0.4443616 +_se  .017479 .0104947 5.
0506381 6.3257878 .2035085 .319 1.0963657 .408 .99 = 0.828 0.349246 .2148662 .997 0.056 0.0400045 nrthcen  .387704 .7536342 ind6  .0501776 1.0007188 .343 0.0547462 .0438005 .368639 0.295 0.000 .009 0.0034182 .0046942 educ  .7310468 .633 0.000 1.1853532 5.09766 = 616 = 388.053115 .0789402 .0963403 tenure  .3739442 0.0735678 1. summary: 41 leftcensored observations at hrbens<=0 575 uncensored observations Both squared terms are very signficant.6276261 ind4  .2632871 nrtheast  .4947348 ind5  .0081297 3.888 0.207 0.1667934 .231545 . Err.0088598 8.243 0.0655859 0.7117618 .0003863 3.06154 .107 .0013026 .0908226 union  .993 .4137686 expersq  .261 0.1317401 .0721422 1.105 1.330 0.3682535 1.tenuresq  .1016799 .0001623 tenuresq  .0335907 .372 0.0427534 age  .001 .278 .168 1.0099413 5. d.091 0. and we use ind1 as the base industry: . Interval] +exper  .3731778 .0004098 3.000 .0209362 .109 0.3669437 0.2351539 .0556864 4.108095 .002 .615 0.2411059 .159 .0004405 . There are nine industries.5796409 +_se  .091 0.5083107 .7377754 ind8  .002134 .3504717 white  .8203574 .165 1.579 0.0033643 .624 0.3742017 0.376006 1.0001417 3.0256812 .0013291 .000 .04645 .0726393 married  .9436572 .390 0.000 . Std.0585521 south  .373072 0.794 .276 .2433643 .5099298 .5750527 .0115306 .910 0.380 0.527 .086 0.4137824 1.0724782 .1188029 .1532928 male  .0005242 _cons  .2375989 +_se  .0161572 (Ancillary parameter) Obs.5418171 .563 .0379854 .127675 ind9  . so they should be included in the model.0020613 .0108205 .0151907 (Ancillary parameter) 105 .000544 ind2  .3716415 0.955 .0000 = 0. ll(0) Tobit Estimates Number of obs chi2(21) Prob > chi2 Pseudo R2 Log Likelihood = 467. t P>t [95% Conf.3143174 .3617389 ind3  .2940 hrbens  Coef.214924 ind7  .001 .6107854 .0267869 .0041306 0.000 .9650425 .307673 .0667174 1.3948746 _cons  .375 1. tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union expersq tenuresq ind2ind9.409 0.
Certainly several estimates on the industry dummies are economically significant.0 0. in this example. where F(Wx) is the cdf 106 .0 8. it is roughly legitimate to use the parameter estimates as the partial effects. This is somewhat unusual for dummy variables that are necessarily orthogonal (so that there is not a multicollinearity problem among them).66 0.098) = 73. summary: 41 leftcensored observations at hrbens<=0 575 uncensored observations . a.0 0.0 0.0 0.0000 Each industry dummy variable is individually insignificant at even the 10% level. with a worker in. test ind2 ind3 ind4 ind5 ind6 ind7 ind8 ind9 ( ( ( ( ( ( ( ( 1) 2) 3) 4) 5) 6) 7) 8) ind2 ind3 ind4 ind5 ind6 ind7 ind8 ind9 F( = = = = = = = = 0.0 0. A more general case is done in Section Briefly. 17.Obs. The likelihood ratio statistic is 2(503. with so few observations at zero.467. [Remember. This follows because the densities conditional on y > 0 are identical for the Tobit model and Cragg’s model. then the density of y given x and y > 0 is f(Wx)/[1 . notice that this is roughly 8 (= number of restrictions) times the F statistic.0 0. industry eight earning about 61 cents less per hour in benefits than comparable worker in industry one.] 16. the pvalue for the LR statistic is also essentially zero.0 0.F(0x)]. if f(Wx) is the continuous density of y given x.621 .046. say.7. 595) = Prob > F = 9. but the joint Wald test says that they are jointly very significant.3.
and this 1 is exactly the density specified for Cragg’s model given y > 0. a. a2 = 10. A twolimit Tobit model. the upper limit of 10 is an arbitrary corner imposed by law. c. we get that f(yx. which would be the percentage invested in the absense of any restrictions. Then.9.f[(a2 .3(b).from (16. we have E(yx) = (xB)W{F[(a2 .8): log[E(yx)] = log[P(y > 0x)] + log[E(yx.xB)/s] + F(xB/s)} s{f(xB/s) . c. there would be no upper bound required (since we would not have to worry about 100 percent of income being invested in a pension plan). From Problem 16.y > 0) = F(xG)[xB + sl(xB/s)].y > 0)].a2)/s].of y given x. with a1 = 0.xB)/s]} + a2F[(xB . Taking the derivative of this function with respect to a2 gives 107 .8) we have E(yx) = F(xG)WE(yx. with a1 = 0. b. This follows very generally . we can think of an underlying variable. One can imagine that some people at the corner y = 10 would choose y > 10 if they could. of the kind analyzed in Problem 16. 16. When f is the normal pdf with mean xB and variance s2. From (6. So.3.not just for Cragg’s model or the Tobit model . The lower limit at zero is logically necessary considering the kind of response: the smallest percentage of one’s income that can be invested in a pension plan is zero.y > 0) = {F(xB/s)} {f[(y . is appropriate.xB)/s]/s} for the Tobit model. On the other hand. b. If we take the partial derivative with respect to log(x1) we clearly get the sum of the elasticities.
d. and 108 . t=1 1 T   Of course.xB)/s] + = F[(xB ..a2)/s]. If yi < 10 for i = 1. B^ and For a given value of x. (16..dE(yx)/da2 = (xB/s)Wf[(a2 .13. t=1 S Pt8*X + xiX + ai _ j + xiX + ai. we would compute ^ s are the MLEs.a2)/s] F[(xB . OLS always consistently estimates the parameters of a linear projection .. ^ ^ F[(xB .N.a2)/s] . That is why a linear regression analysis is always a reasonable first step for binary outcomes. and Var(x) has full rank K . and count outcomes. We simply have & 7 ci = . any aggregate time dummies explicitly get  swept out of xi in this case but would usually be included in xit.xB)/s] + [(a2 . This extension has no practical effect on how we estimate an unobserved effects Tobit or probit model. corner solution outcomes. 16.xB)/s]Wf[(a2 .59) We can plug in a2 = 10 to obtain the approximate effect of increasing the cap from 10 to 11.T where T j _ 7&T1 S Pt8*X. 16..(a2/s)f[(xB .regardless of the nature of y or x. where We might evaluate this expression at the sample average of x or at other interesting values (such as across gender or race).11. No. B^ and ^ s are just the usual Tobit estimates with the "censoring" at zero.provided the second moments of y and the xj are finite.10)/s]. provided there is not true data censoring. An interesting followup question would have been: What if we standardize each xit by its crosssectional mean and variance at time t. or how we estimate a variety of unobserved effects panel data models with conditional normal heterogeneity.
This is a rather simple twostep estimation t=1 method.sa) (where. given that a fire has occured. but accounting for the sample variation in cumbersome.N}. then there is no problem. CHAPTER 17 17. 1/2 r t = 1...assume ci is related to the mean and variance of the standardized vectors. again.1. for each random draw i Then.Pt))t1/2. But then a twostage analysis is appropriate. let zit In _ (xit . 2  from the population.. zit would not contain aggregate time dummies). 109 . one could estimate estimate Pt for each t using the cross section observations {xit: i = 1. form ^ zit P^t and )^ t. given building and neighborhood characteristics.  Then.2..Pt). ci = j + S xirLr + ai... in which case one might ignore the sampling error in the firststage estimates.. This is the kind of scenario that is handled by Chamberlain’s more general assumption concerning T the relationship between ci and xi: ) X/T.. we might assume cixi ~ Normal(j + ziX. P^t and )^ t would be It may be possible to use a much larger to obtain P^t and )^ t.T.. 16. usual sample means and sample variance matrices. If you are interested in the effects of things like age of the building and neighborhood demographics on fire damage. say and rNasymptotically normal..15. To be added. We simply need a random sample of buildings that actually caught on fire. where Lr = r=1 Alternatively. and proceed t with the usual Tobit (or probit) unobserved effects analysis that includes the 1 time averages ^ zi = T  T S ^zit. and )t The are consistent ^ 1/2 ^ _ ) (xit .2.T. t = 1.. other words. You might want to supplement this with an analysis of the probability that buildings catch fire..
but If we use an IV approach. the procedure is to replace a1W(zD2 + v2) + u1 a1W(zD2) + (u1 + a1v2). density f(yxi.3. is p(yxi.v3) = z1D1 + where E[(u1 + a1W(zD2) + g1v3.81) is u1 + a1v2.B.81) its (17.G). F(a2(xi)xi.5. where B Let yi given xi have is the vector indexing E(yixi) and set of parameters (usually a single variance parameter).17. and a2(xi) was a function of family size (which determines the official poverty level).B.G) . If we replace y2 with y2. D2 in (17. a1(xi) = 8.B. E(y1z.v3) to be linear in v3 (in particular. If the selection correction is going to work.G)  In the Hausman and Wise (1977) study.81) rNconsistent estimator.v3) is independent of z with a trivariate normal distribution.v2. D^2. G is another Then the density of yi given xi.B. (17. a1(xi) < y < a2(xi). a1v2)v3] = g1v3 by normality. when si = 1[a1(xi) < yi < a2(xi)]. it cannot depend on z). we need to see what happens when y2 = zD2 + v2 is plugged into the structural mode: y1 = z1D1 + = z1D1 + So.F(a 1 (xi)xi. We can get by with less than this. a1v2 Then we can write E(y1z.82) is that (u1. ^ 17. This is essentially given in equation (17. we need the expected value of u1 + given (z. si = 1. we need assume 110 .si=1) = f(yx i . yi = log(incomei).82) A sufficient condition for (17.G) .14).y3 = 1) = z1D1 + Conditioning on y3 = 1 gives a1W(zD2) + g1l(zD3). the nature of v2 is restricted.  The key is to note how the error term in (17.
cannot be consistently estimated using the OLS procedure. zD2. where v3 _ u3 + a1v1 + a2v2. D2.zP3 + v3).nothing about v2 except for the usual linear projection assumption. Given D1 ^ Then.z3) can contain no exact linear dependencies. if we cannot write y2 = zD2 + v2. zi3 using all observations. This is why 2SLS is generally preferred. (zD1. As a practical matter. a2. v3 is indepdent of z and normally distributed.83) y3 = max(0. and z3. Thus.84) where y1 is observed only when y3 > 0. if we knew D1 and D2. (ziD2). then the OLS alternative will not be consistent. ^ ^ form ziD1 and ziD2 for each observation i in the sample. and Estimation of Estimation of D1 D2 is simple: just use OLS using the follows exactly as in Procedure 17.zP3 + v3).7. we could consistently a1. estimate Thus. where v2 is independent of z and approximately normal. For identification. ^a2.3 using the system y1 = zD1 + v1 (17. a. Substitute the reduced forms for y1 and y2 into the third equation: y3 = max(0. Under the assumptions given. 17. a1. obtain and D^2. (17.zD2. and D3 from a Tobit of y3 on zD1. equations where y2 is binary. or is some other variable that exhibits nonnormality. consistent estimators are obtained by using initial consistent D1 estimators of entire sample. and D^3 from the Tobit ^ yi3 ^ ^ on (ziD1). Necessary is that there must be at least two elements in z not 111 . >From the usual argument.a1(zD1) + a2(zD2) + z3D3 + v3) _ max(0.
problem. there is no sample selection bias because we have specified the conditional expectation for the population of interest. This is not very different from part a. We only need to obtain a random sample from the subpopulation with y > 0. For example.y > 0) = the NLS estimator of B Let w = 1[y > 0]. The only difference is that Then follow the steps from part a. E(yx) = P(y > 0x)WE(yx. We need to estimate the variance of u3. 112 . F(xG)Wexp(xB). Again. or conditional means.also in z3. x follows a probit model with P(w = 1x) = d. for the two parts. and the probit estimator of So we would plug in G. b. Not when you specify the conditional distributions. e.3. I think. you have specified the distribution of y given x and y > 0. w = 1[xG + v > 0]. 17. We would use a standard probit model. It is most easily done in a generalized method of moments framework. s23. Then w given F(xG). by definition. c. NLS is generally consistent and rNasymptotically  normal.9. a. b.11. There is no sample selection problem because. there is no sample selection Confusion arises. Obtaining the correct asymptotic variance matrix is complicated. 17. If we have a random sample from that population. To be added. D2 must be estimated using Procedure 17. By definition. c. when two part models are specified with unobservables that may be correlated. we could write y = wWexp(xB + u).
w) . we can write log(y) = xB + u. This twostep procedure reveals a potential problem with the model that allows u and v to be correlated: adding the inverse Mills ratio means that we are adding a nonlinear function of x. which we warned about in this chapter. one would try to find a variable that affects the fixed costs of being employed that does not affect the choice of hours.v) = 0.v) is multivariate normal. If we assume (u. if u and v are independent .w = 1] = xB + A twostep strategy for estimating probit of wi on xi to get G ^ and and l(xi^G). correlated. Ideally. once we absorb E[exp(u)] into the intercept). 113 While this would be a little less . A standard t ^ r is a simple test of Cov(u.so that u is independent of (x. In other words.we have E(yx.v) is independent of x. ^ ^ l(xi^G) to obtain B . estimate a Then. First. If we make the usual linearity assumption.w) = wWexp(xB)E[exp(u)x.w = 1).so that w = 0 6 y = 0. is pretty clear. E(uv) = rv and assume a standard normal distribution for v then we have the usual inverse Mills ratio added to the linear model: E[log(y)x. The interesting twist here is if u and v are Given w = 1. Then. run the regression log(yi) on xi. r.w = 1] = xB + E(ux. with mean zero. which implies the specification in part b (by setting w = 1. statistic on G B rl(xG). So E[log(y)x. Assume that (u. In labor economics. identification of B comes entirely from the nonlinearity of the IMR. where twopart models are used to allow for fixed costs of entering the labor market.w] = wWexp(xB)E[exp(u)]. then we can use a full maximum likelihood procedure. we would have a variable that affects P(w = 1x) that can be excluded from xB. using the yi > 0 observations.
A similar example is given in Section 19.13. the underlying variable y of interest has a conditional normal distribution in the population. For one.5. b. The point is to obtain partial effects of interest. We cannot use censored Tobit because that requires observing x when whatever the value of y. particularly.y > 0) = exp(x. the partial effects are not straightforward to obtain. Here. Usually. where E[exp(u)x. we can use truncated Tobit: distribution of y given x and y > 0. we use the Notice that our reason for using truncated Tobit differs from the usual application. Then.44). 114 . the parameters. Instead. we need sufficient price variation for the population that consumes some of the good. we can multiply this expectation by P(w = 1x) = that we cannot simply look at B F(xG). rank E(x’xy > 0) = K. we can Even with a full set of assumptions. equation (19. 17. a. making full distributional assumptions has a subtle advantage: then compute partial effects on E(yx) and E(yx. y given x follows a standard Tobit model in the population (for a corner solution outcome).B)WE[exp(u)x.robust.w = 1)]. see. E(yx. This is very different from the sample selection model. Provided x varies enough in the subpopulation where y > 0 such that b.y > 0).2. Given such variation.w = 1)] can be obtained under joint normality. In the case where an element of x is a derived price. we can estimate E(yx) = F(xB/s)xB + sf(xB/s) because we have made the assumption that y given x follows a Tobit in the full population.
a.07 0.  First. E(y0) = E(yw = 1).0008734 0.934 .0189577 .CHAPTER 18 18.y0) = [E(y0w = 1) . but I did not do so: .0005467 . probit train re74 re75 age agesq nodegree married black hisp Iteration Iteration Iteration Iteration 0: 1: 2: 3: log log log log likelihood likelihood likelihood likelihood = 302.524 .1726192 0.0017837 nodegree  .1041242 agesq  .37 0.4298464 black  . Err.0534045 0.19 0.234 .2271609 0.06748 = = = = 445 16.7389742 .004 .0501979 .1515457 2.5). on average. Interval] +re74  .091519 .   and so the bias is given by the first term.1052176 . 18.3006019 115 .2468083 .0016399 . If E(y0w = 1) < E(y0w = 0). those who participate in the program would have had lower average earnings without training than those who chose not to participate.0122825 re75  . Std.170 .06748 Probit estimates Number of obs LR chi2(8) Prob > chi2 Pseudo R2 Log likelihood = 294.0371871 . and.64 0.0266 train  Coef. The following Stata session estimates a using the three different regression approaches.0415 0.5).3. leads to an underestimate of the impact of the program. This is a form of sample selection.1. E(y1 .0159392 1.090319 age  .596 . It would have made sense to add unem74 and unem75 to the vector x. z P>z [95% Conf.0000719 .992 .08 0.0271086 1.0159447 .1 = 294. This follows from equation (18.E(y0w = 0)] + ATE1.53 0.1449258 married  . b.06748 = 294.44195 .01 0. by (18.1446253 . E(y1) = E(yw = 1) and  Therefore.5898524 .92 0.07642 = 294.
072 . Min Max +phat  445 .8154273 0. sum phat Variable  Obs Mean Std.45 0. Std. t P>t [95% Conf.13 0.0123 .1030629 _cons  .21153806 +Total  94.000 .826664 .84 0.0095 .8224719 444 . Err.4793509 1. Err.1066934 .0994803 3.0190 0.966 .0217247 phat  .0244515 441 .80 0.719599 .4155321 .hisp  .0212673 .4998223 442 .2378099 0. Std.779 1.661695 . t P>t [95% Conf.213564126 Number of obs F( 3.369752 1. Interval] +train  .015 . Interval] +train  . Pr(train)) .79802041 3 .3579151 .3184939 .661324802 Residual  93.0375 0. 442) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 445 3. gen traphat0 = train*(phat . reg unem78 train phat traphat0 Source  SS df MS +Model  1.3151992 0.104 1.60 0.6738951 .1624018 .1638736 .95 0.129489 1. reg unem78 train phat Source  SS df MS +Model  1.63 0.3009852 .340 .103972 .4775317 .213564126 Number of obs F( 2.195208 ..5534283 .37 0.8224719 444 .28 0. 441) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 445 2.50 0.0449 0.599340137 Residual  93.45928 unem78  Coef.0934459 .0450374 2.134 1.04 0.4572254 _cons  .210939799 +Total  94.1987593 .0181789 phat  .2284561 . Dev.9204644 traphat0  .3079227 1.45993 unem78  Coef.0101531 .0139 0.416) . predict phat (option p assumed.222497 _cons  .5004545 .4877173 .3226496 2 .233225 .018 .045039 2.110242 . reg unem78 train re74 re75 age agesq nodegree married black hisp 116 .
0659889 .0296401 .716 .1105582 . in this example.3368413  In all three cases.0114269 age  .07 0.0131441 .2342579 .0053889 0.0923609 black  .06748 Probit estimates Number of obs LR chi2(8) Prob > chi2 Pseudo R2 Log likelihood = 294.1979868 .11. 18.2905718 0.025669 .0040 0.180637 .444 .636 .109 .7246235 435 .5.1078464 0.11: participating in job training is estimated to reduce the unemployment probability by about . of course.0304127 .1516412 .0189565 1.0444832 2.75 0.633 .36 0.0415 0.1 = 294.06748 = = = = 445 16.Source  SS df MS +Model  5.213564126 Number of obs F( 9.566427604 Residual  89.3408202 hisp  .451 .0094371 0. Of course.206263502 +Total  94.0001139 nodegree  .0676704 agesq  . so we are not surprised that different methods lead to roughly the same estimate.06748 = 294.1726761 _cons  .22 0.0538 0.0815002 2.1502777 married  .0231295 re74  . probit train re74 re75 age agesq nodegree married black hisp Iteration Iteration Iteration Iteration 0: 1: 2: 3: log log log log likelihood likelihood likelihood likelihood = 302.60 0. Interval] +train  .0392887 .0266 117 . Std.0080391 re75  .0204538 . is to use a probit model for unem78 on train and x.60 0.8224719 444 . a.0003098 1.49 0.007121 .0004949 .75 0.07642 = 294.027 .0620734 0.8053572 .0068449 . training status was randomly assigned. I used the following Stata session to answer all parts: .48 0. 435) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 445 2. t P>t [95% Conf.0342 .47 0. the average treatment effect is estimated to be right around .77 0.0550176 0. An alternative.421 .2512535 .0011038 .013 .0421444 .111 .45416 unem78  Coef. Err.09784844 9 .81 0.0025525 .
19 0.484255158 Residual  .28 0. 436) Prob > F Rsquared Adj Rsquared Root MSE = 445 =69767.1998802 .47144 0.7397788 agesq  .1453799 0.0016399 .55 0.train  Coef.197362 Residual  18821. Err.759 .467 .0113738 . Std.42 0.64 0. Pr(train)) .5004545 .92 0.0045238 0.53 0.0360 0.157 5.9410e06 +Total  3.1030629 _cons  .4668602 .0159447 .5898524 .050672 1.37 0.40 0.43 0.0122825 re75  .0863775 .103972 .170 .776258 9 78.31 0.6566 444 43.1052176 .0064086 nodegree  1.0624611 .01 0.826664 .0024826 .44 = 0.0271086 1.210237 2.934 .104 1.779 1.00263 .927734 married  .87404126 8 . Err.2284561 .9992 = .668 .05 0.93248 27. t P>t [95% Conf.8154273 0.73 0. predict phat (option p assumed.0534045 0.2468083 .091519 .00172 0.2232733 .482387 6. z P>z [95% Conf.524 .369752 1.08 0.2814839 0.5779 re78  Coef.1515457 2.936 7.7389742 .0763 0.257878 .0371871 .108893 black  2.9992 = 0.2746971 0.63 0.003026272 436 6.004 .0000719 .992 .963 2. Interval] +re74  .662979 4.098774 0.583 .1041242 agesq  .43 0.1446253 .656719 0.596 . 435) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 445 1. Std.75 0.3006019 hisp  .0159392 1.008732134 118 Number of obs F( 8.31125 35.0017837 nodegree  .688 17. reg re78 train re74 re75 age agesq nodegree married black hisp (phat re74 re75 age agesq nodegree married black hisp) Instrumental variables (2SLS) regression Source  SS df MS +Model  703.613857 11.2953534 3.4298464 black  .0008734 0.0699177 18.8517046 hisp  .00 0.0189577 .89168 _cons  4.1602 .44195 .1449258 married  .87706754 444 .0005467 .08 0.9767041 Number of obs F( 9.554259 1. Interval] +train  .0000 = 0.997 35.1726192 0.670 7.203087 1.3400184 .0161 6.3079227 1.203039 0.367622 3.45109 re74  .234 .8804 435 43.0501979 .2686905 +Total  19525.6396151 age  .2271609 0.090319 age  .3481955 re75  . reg phat re74 re75 age agesq nodegree married black hisp Source  SS df MS +Model  3.
000 .04 0.1719806 married  .000316 546. But E(wWvx.070.0352802 . To be added.0003207 .99 0.z. t P>t [95% Conf.1726018 .9992.82 0.5907578 .000 . a.z] = E[E(wx.z] = xWexp(p0 + xP1 + zP2) where x = E[exp(p3v)Wv]. Std. we will replace wWv with its expectation given (x. y = h0 + xG + bw + wW(x  J)D + u + wWv + e.14 0.00036 98.000 .625.0562315 . Generally.0001046 agesq  .0359877 black  .5874586 .000 . which means there is virtually no separate ^ variation in Fi that cannot be explained by xi. ^ c.66).0140283 age  .0138135 .1732229 . 18.v)x. again.v)Wvx.00) suggests severe collinearity among the instruments. it is not a good idea.2.0069301 .80e06 16. much smaller than when we used either linear regression or the propensity score in a regression in Example 18. Err.z) and an error.0006238 294.0553027 hisp  . The collinearity suspected in part b is confirmed by regressing Fi on the xi: the Rsquared is .z) = E[E(wWvx.0016786 351.0000328 nodegree  .1826192 _cons  .z.000 .0069914 .7. a = 1.0004726 118.9.0000293 1.000 . d. 18.1838453 . The IV estimate of a is very small .z] = E[exp(p0 + xP1 + zP2 + p3v)Wvx.0139209 . ^ (When we do not instrument for train.000 .01 0.594057 b. Interval] +re74  .0000546 254.0571603 . and we have 119 . We can start with equation (18.0345727 . and.1850713 .93 0.000 ..) The very large standard error (18.71 0.92 0.640.00011 2.0005368 .0068687 re75  .phat  Coef. se = . This example illustrates why trying to achieve identification off of a nonlinearity can be fraught with problems.004 .0000258 .0000312 222.31 0.
z) = h0 + xG + bw + wW(x  J)D + rWg + qwWg. E(rx.. Now. [Note that we do not need to replace p0 with a different constant. In effect.. as is implied in the statement of the problem.z) = rWg and E(vg.. where we have used the fact that w is a function of (g.z) = 0.66) conditional on (g. The last equation suggests a twostep procedure.z) + E(eg. i = 1. c.  As usual.] So we can write y = h0 + xG + bw + wW(x  J)D + xE(wx.z).z) in the estimating equation.x.z) = 0. The ATE b is not identified by the IV estimator applied to the extended equation. p0 + xiP1 + ziP2 + gi.z) = h0 + xG + bw + wW(x  J)D + E(ug. gi.z) are valid as instruments. assume we can write w = exp(p0 + xP1 + zP2 + g).x. define r = u + [w  Given the assumptions. E(wWvx.z): E(yv. These are standard linearity assumptions under independence of (u. If h _ h(x.. A standard joint significant test .N. the coefficient on wi is the consistent estimator of b.N.z)] + e.z) is any function of (x. becaue we need to include E(wx.z). wigi. wi..x.for example.z) + r. an Ftype 120 .z) and E(eg. OLS regression log(wi) on 1. E(rx.x.. This is not what I intended to ask. no other functions of (x. we can consistently estimate p0.x. Then we take the expected value of (18. i = 1. i = 1.z) + wE(vgx. gi.x. xi.x).x.h) = L(wq) = q because q = E(wx.z).. xi. wi(xi .z).z) = 0..v.g) and (x.z) = qWg... where E(ug. What I should have said is. we In the second step.used the assumption that v is independent of (x.x.q. and P2 from the From this regression. zi.x. This is a clear weakness of the approach. the average treatment effect.. First. ^ need the residuals. b. since log(wi) = P1. L(w1. run the regression ^ ^ yi on 1..N.
3. which. Err. 2 The 2 + m .7745021 . The following is Stata output used to answer parts a through f.6722235 white  .1.1605158 4. b.log(m).702 3.782321 0.003 . when evaluated at mo.83 0.m for m > 0.test .561503 2.49943 lincome  . The first = 0.829893 .011 5.089585 121 .412 cigs  Coef.1. Write q(m) _ Then dq(m)/dm = mo/m .459461 0. so m = mo uniquely sets the The second derivative of q(m) is mom 2 > 0 for all m > 0.0446 13.280003 Number of obs F( 7.883 12.683 806 188.15 0. CHAPTER 19 19. gives 2mo + < 0.865621 1.8690144 .0529 0.424067 2.246 799 179.1671677 3.m 3 second derivative is 2mom 2 mo 2 = mo 1 _ E[li(m)] = mo/m . which is uniquely solved by m = mo. 799) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 807 6. For the exponential case.4594197 1.19 0.299532 restaurn  2. Interval] +lcigpric  . so the sufficient second order condition is satisfied.5017533 . reg cigs lcigpric lincome restaurn white educ age agesq Source  SS df MS +Model  8029. This is a simple problem in univariate calculus.0000 0. The answers are given below.00 0. Std.233 .38 0.8509044 5.117406 2.43631 7 1147.5592363 1. derivative to zero.06233 Residual  143724.7287636 1.000 .059019 .20124 10. t P>t [95% Conf.1736136 age  .305594 educ  . a.880158 +Total  151753. . molog(m) .on the last two terms effectively tests the null hypothesis that w is exogenous. 19.38 0. q(m) 2 order condition is mom .56 0.
07 0.10 0.862469 .1380317 5.597972 1.38 0. 799) = Prob > F = 1.005 4.865621 1.41 0.16145 .918 53.146247 educ  .519 Poisson regression Number of obs LR chi2(7) 122 = = 807 1068. test lcigpric lincome ( 1) ( 2) lcigpric = 0.912 50.52632 48.0017481 5.70 .888 12. Err. test lcigpric lincome ( 1) ( 2) lcigpric = 0.0 lincome = 0. reg cigs lcigpric lincome restaurn white educ age agesq.8509044 6.0090686 . Std.4899 .8690144 .11 0. robust Regression with robust standard errors Number of obs F( 7.682435 24.22 0.8687741 white  . t P>t [95% Conf.5017533 .14 0.002 .0056373 _cons  2.000 .054396 0.0 F( 2.3441 .8346 log likelihood = 8111.682435 25.5035545 1.5191 log likelihood = 8111.042796 restaurn  2.agesq  .378283 0.000 .61 0.82 0.000 .0000 0.04545 agesq  .7353 11.71 0.26472 2.0062048 _cons  2.0014589 6.86134 .017275 2.0335 lincome  .7745021 .1829532 age  .22073 0.685 3.90194 0.0 lincome = 0.0090686 .0 F( 2.09 0.1624097 3. Interval] +lcigpric  .19 0.22621 44. 799) = Prob > F = 0.0124999 .412  Robust cigs  Coef. poisson cigs lcigpric lincome restaurn white educ age agesq Iteration 0: Iteration 1: Iteration 2: log likelihood = 8111.8205533 .147 .0119324 . 799) Prob > F Rsquared Root MSE = = = = = 807 9.0529 13.5592363 1.3047671 2.45 0.
0049694 22.1407338 2.33 0.1434779 restaurn  .0703561 .1142571 . Std.460 .74 0.0008677 _cons  .000 .1059607 .1686685 0.886 5.027457 5.6463244 0.0013708 .14 0.0012592 _cons  .16 0.000057 24. Err.0618 cigs  Coef. Err.0754414 .07 0.3636059 .160812 lincome  .372733 1.13 0.257 . glm cigs lcigpric lincome restaurn white educ age agesq.65 0.002 . z P>z [95% Conf.0000 0.1045172 .1750847 lincome  .0510802 age  .96 0.000 .12272 cigs  Coef.0042564 13.1083 = 8111.140 .Log likelihood = Prob > chi2 Pseudo R2 8111.0677648 .000 . family(poisson) sca(x2) Iteration Iteration Iteration Iteration 0: 1: 2: 3: log log log log likelihood likelihood likelihood likelihood = 8380.58 0.10 0.000 .158158 agesq  .6394391 .000 .518 .0202811 5.1285444 .0970243 . Std.3964493 2.0218208 age  .46933 16232.8068952 1.6139626 0.519022 = 14698.519 = 8111.519 Generalized linear models Optimization : ML: NewtonRaphson Deviance Pearson = = No.48 0.1142571 .820355 (Standard errors scaled using square root of Pearson X2based dispersion) * The estimate of sigma is 123 .31628 = 20.000 .70987 Variance function: V(u) = u Link function : g(u) = ln(u) Standard errors : OIM [Poisson] [Log] Log likelihood BIC AIC = 8111.6454 = 8111.0191849 3.743 .1037275 .1037275 . of obs Residual df Scale param (1/df) Deviance (1/df) Pearson 14752.0639772 . z P>z [95% Conf.10 0.0552011 .1239969 agesq  .599794 .0014825 .0181421 educ  .0552012 .4248021 .0594225 . Interval] +lcigpric  .870 1.010 .76735 0.1433932 0.0002567 5.519 = = 0.92274 = = = = = 807 799 1 18.65 0.0594225 .3964494 .3857854 .34 0.11 0.001874 .0914144 1.3870061 .3024098 white  .3636059 .2753831 educ  .1059607 .99 0.46367 20.0877728 white  .0312231 11.000 .0223989 5.2828965 restaurn  .0374207 1. Interval] +lcigpric  .0013708 .
0452489 age  .5469381 .098 .6454 = 8111.544 .46367 20.0015543 .9765587 .000 . * This is the usual LR statistic.2907 log likelihood = 8125.519)/(20.32) 4.14 0. di 2*(8125. Err. Std.95 0.46933 16232.037371 1..618 log likelihood = 8125.48 0.32: The GLM version is obtained by .5077711 .32) 1. Interval] +restaurn  . .14 0.000 .3555118 .291 .0000553 26.0013374 _cons  .0040652 13.0000 0.8111.31628 .000 .0048175 25.1211174 . family(poisson) robust Iteration Iteration Iteration Iteration 0: 1: 2: 3: log log log log likelihood likelihood likelihood likelihood = 8380. of obs Residual df Scale param (1/df) Deviance (1/df) Pearson 14752.09 0.65 0.8111.1305594 agesq  .2906 Poisson regression Number of obs LR chi2(5) Prob > chi2 Pseudo R2 Log likelihood = 8125.70987 Variance function: V(u) = u Link function : g(u) = ln(u) Standard errors : Sandwich [Poisson] [Log] 124 = = = = = 807 799 1 18. poisson cigs restaurn white educ age agesq Iteration 0: Iteration 1: Iteration 2: log likelihood = 8125.0618025 .4150564 .0532166 . di sqrt(20.2940107 white  . z P>z [95% Conf.519 = 8111.0308796 11.1095991 6.1083 = 8111.1116754 .0114433 educ  .519 Generalized linear models Optimization : ML: NewtonRaphson Deviance Pearson = = No.0014458 .519) 27.16 0.7617484 .3545336 .0611842 .0602 cigs  Coef.2906 = = = = 807 1041. glm cigs lcigpric lincome restaurn white educ age agesq.291 .1350483 . * dividing by 20. di 2*(8125.000 .000 .
25 0. using the usual and heteroskedasticityrobust tests (pvalues = . Err.0595355 .46).213 .3964493 2.000 .0002446 5.3636059 . b.59 0.010 .0884937 white  . the income variable.12272  Robust cigs  Coef.083299 1.then log(cigpric) becomes much more significant (but using the incorrect standard errors).0217798 age  .0008914 _cons  .140366 2. Both estimates are elasticities: the estimate price elasticity is . although the coefficient estimates are the expected sign. they are significantly correlated. Incidentally.519022 = 14698. and.106 and the estimated income elasticity is .715328 a.1037275 .97704 0. too. if you drop restaurn .a binary indicator for restaurant smoking restrictions at the state level . based on the usual Poisson standard errors.09 0.0726427 . It does not matter whether we use the usual or robust standard errors. not surprisingly.104.3752553 .1632959 0. both cigpric and restaurn vary only at the state level.0594225 .6387182 .0192058 3. is very significant: t = 5.0970653 .23134 .344.34 0.0552011 .000 . (States that have restaurant smoking restrictions also have higher average prices. .264853 educ  . Neither the price nor income variable is significant at any reasonable significance level.438442 6.1059607 .60 0. Std. respectively).0212322 5.1143/(2*.0018503 . In this data set. z P>z [95% Conf.894 5. on the order of 2.6681827 0.0013708 .9%.2669906 restaurn  . While the price variable is still very insignificant (pvalue = .490.203653 lincome  .38 0.1558715 agesq  . The two variables are jointly insignificant.874 1.92274 AIC = 20.1142571 .415575 1.Log likelihood BIC = 8111.11.735 . Interval] +lcigpric  . di .13 0.00137) 41.16 0.002 .) 125 .
This means all of the Poisson standard errors should be multiplied by this factor. there is no race effect. and that on lincome falls to 1. 126 .54.519) = 27. A double hurdle model . The usual LR statistic is 2(8125.which separates the initial decision to smoke at all from the decision of how much to smoke . and then to model P(y = 0x) as a logit or probit.19 with the usual standard errors). One approach is to model D(yx. so QLR = 1.32. is certainly worth investigating. f. and the age variables are still significant. We simply compute the turning point for the quadratic: = 1143/(2*. in fact. As expected.16).72.^ c. while the LR statistic shows strong significance.) d.8111. using the maximum likelihood standard errors is very misleading in this example. which is a 2 very large value in a c2 distribution (pvalue ~ 0).13 . With the GLM standard errors. g.seems like a good idea. with the option "sca(x2). the restaurant restriction variable. conditional on the other covariates.51).much more in line with the linear model t statistic (1.00137) ^ ^ bage/(2bage2) ~ 41. as is done using the "glm" command in Stata.51. Using the robust standard errors does not significantly change any conclusions.291 . Clearly. most explanatory variables become slightly more significant than when we use the GLM standard errors.y It > 1) as a truncated Poisson distribution." The t statistic on lcigpric is now very small (. The GLM estimate of s is s = 4. the QLR statistic shows that the variables are jointly insignificant. education. The QLR statistic ^2 divides the usual LR statistic by s = 20. ^ the adjustment by s > 1 that makes the most difference. it is Having fully robust standard errors has no additional effect.36 (pvalue ~ . e. (Interestingly. In this example.
ci)xi] + Cov[E(yitxi. This works because.ci)xi] = E[ciexp(xitB)xi] + Var[ciexp(xitB)xi] = exp(a + xitB) + t [exp(xitB)] . First. of ~ ~ ~ ~2 ~ ~ 2 uit . B by...ci)xi] = 0 + Cov[ciexp(xitB).yit on [exp(xitB)] . Var(yitxi) = E[Var(yitxi.yirxi) = E[Cov(yit. say. Var(yixi). A similar. 2 and t . under H0. but we are maintaining full We have enough assumptions to derive * T conditional variance matrix of yi given xi under H0. B. We just use iterated expectations: E(yitxi) = E[E(yitxi. general expression holds for conditional covariances: Cov(yit.. E(uitxi) = E(uitxit) 2 = exp(a + xitB) + t [exp(xitB)] . 2 also use the many covariance terms in estimating t 127 2 because t [We could 2 = . obtain ~ ~ ~ ~ ~ ~ ~ ~ consistent estimators a.5..   G b. Var(yixi) depends on a.N.ci)..yirxi.yit.ciexp(xirB)xi] = t exp(xitB)exp(xirB). 2 2 where t 2 _ Var(ci) and we have used E(cixi) = exp(a) under H0.19. under H0. Let yit = exp(a + ~ ~ ~ ~ ~ ~ 2 xitB) and uit = yit . ~2 Call this estimator t . pooled Poisson QMLE.ci)xi] + Var[E(yitxi.. a. A consistent estimator of t can be obtained from estimate. 2 So.ci)xi] = E(cixi)exp(xitB) = exp(a + xiG)exp(xitB) = exp(a + xitB + xiG).E(yitxit). where uit 2 2 _ yit . the T = 0. t = 1. i = 1. First.T. We are explicitly testing H0: independence of ci and xi under H0.E(yirxi. a simple pooled regression. through the origin. all of which we can It is natural to use a score test of H0: G = 0...
E{[uit/exp(xitB)][uir/exp(xirB)]}, all t
2
2
Next, we construct the T
$ r.
* T weighting matrix for observation i, as in
~
~
Section 19.6.3; see also Problem 12.11. The matrix Wi(D) = W(xi,D) has
~
~
~
~2
~ 2
diagonal elements yit + t [exp(xitB)] , t = 1,...,T and offdiagonal elements
~
~
~2
~
~
~ ~
t exp(xitB)exp(xirB), t $ r. Let a, B be the solutions to
N
~ 1
min (1/2) S [yi  m(xi,a,B)]’[Wi(D)] [yi  m(xi,a,B)],
i=1
a,B
where m(xi,a,B) has t
th
element exp(a + xitB).
Since Var(yixi) = W(xi,D),
this is a MWNLS estimation problem with a correctly specified conditional
variance matrix.
Therefore, as shown in Problem 12.1, the conditional
information matrix equality holds.
To obtain the score test in the context
of MWNLS, we need the score of the comditional mean function, with respect to
all parameters, evaluated under H0.
Let
Q _
Then, we can apply equation (12.69).
(a,B’,G’)’ denote the full vector of conditional mean
parameters, where we want to test H0:
G
= 0.
The unrestricted conditional
mean function, for each t, is
mt(xi,Q) = exp(a + xitB + xiG).

Taking the gradient and evaluating it under H0 gives
~
Dqmt(xi,Q~) = exp(a~ + xitB
)[1,xit,xi],

which would be 1
* (1 + 2K) without any redundancies in xi.

Usually, xit
would contain year dummies or other aggregate effects, and these would be

dropped from xi; we do not make that explicit here.
T
Let
DqM(xi,~Q) denote the
* (1 + 2K) matrix obtained from stacking the Dqmt(xi,Q~) from t = 1,...,T.
Then the score function, evaluate at the null estimates
~
Q _
~ ~ ~
(a,B’,G’)’, is
~
~
~ 1~
si(Q) = DqM(xi,Q)’[Wi(D)] ui,
~
where ui is the T
* 1 vector with elements ~uit _ yit  exp(~a + xitB~ ).
128
The
estimated conditional Hessian, under H0, is
~
1
A = N
N
S DqM(xi,Q~)’[Wi(~D)]1DqM(xi,~Q),
i=1
a (1 + 2K)
* (1 + 2K) matrix.
The score or LM statistic is therefore
& S D M(x ,~Q)’[W (~D)]1~u *’& SN D M(x ,~Q)’[W (~D)]1D M(x ,~Q)*1
i
i
i8 7
q i
i
q i 8
7i=1 q
i=1
N
W&7 S DqM(xi,Q~)’[Wi(~D)]1~ui*8.
N
LM =
i=1
a
2
Under H0, and the full set of maintained assumptions, LM ~ cK.
If only J < K

elements of xi are included, then the degrees of freedom gets reduced to J.
In practice, we might want a robust form of the test that does not
require Var(yixi) = W(xi,D) under H0, where W(xi,D) is the matrix described
above.
This variance matrix was derived under pretty restrictive
assumptions.
~
A fully robust form is given in equation (12.68), where si(Q)
~
~
1
and A are as given above, and B = N
N
S si(~Q)si(~Q)’.
Since the restrictions
i=1
are written as
matrix is K
G
= 0, we take c(Q) =
G,
~
and so C = [0IK], where the zero
* (1 + K).
c. If we assume (19.60), (19.61) and ci = aiexp(a + xiG) where aixi ~

Gamma(d,d), then things are even easier  at least if we have software that
estimates random effects Poisson models.
Under these assumptions, we have
yitxi,ai ~ Poisson[aiexp(a + xitB + xiG)]

yit, yir are independent conditional on (xi,ai), t
$ r
aixi ~ Gamma(d,d).
In other words, the full set of random effects Poisson assumptions holds, but
where the mean function in the Poisson distribution is aiexp(a + xitB + xiG).


In practice, we just add the (nonredundant elements of) xi in each time
period, along with a constant and xit, and carry out a random effects Poisson
analysis.
We can test H0:
G
= 0 using the LR, Wald, or score approaches.
Any of these wouldbe asymptotically efficient.
129
But none is robust because we
have used a full distribution for yi given xi.
19.7. a. First, for each t, the density of yit given (xi = x, ci = c) is
yt
f(ytx,c;Bo) = exp[cWm(xt,Bo)][cWm(xt,Bo)] /yt!,
yt = 0,1,2,....
Multiplying these together gives the joint density of (yi1,...,yiT) given (xi
= x, ci = c).
Taking the log, plugging in the observed data for observation
i, and dropping the factorial term gives
T
S {cim(xit,B) + yit[log(ci) + log(m(xit,B))]}.
t=1
b. Taking the derivative of li(ci,B) with respect to ci, setting the
result to zero, and rerranging gives
T
(ni/ci) =
S m(xit,B).
t=1
Letting ci(B) denote the solution as a function of
ni/Mi(B), where Mi(B)
B,
we have ci(B) =
T
_ S m(xit,B).
The second order sufficient condition
t=1
for a maximum is easily seen to hold.
c. Plugging the solution from part b into li(ci,B) gives
li[ci(B),B]
= [ni/Mi(B)]Mi(B) +
= ni + nilog(ni) +
T
=
T
S yit{log[ni/Mi(B)] + log[m(xit,B)]
t=1
T
S yit{log[m(xit,B)/Mi(B)]
t=1
S yitlog[pt(xi,B)] + (ni  1)log(ni),
t=1
because pt(xi,B) = m(xit,B)/Mi(B) [see equation (19.66)].
N
d. From part c it follows that if we maximize
S
i=1
li(ci,B)
with respect to
(c1,...,cN)  that is, we concentrate out these parameters  we get exactly
N
li[ci(B),B].
i=1
S
B
depend on
N
But, except for the term
S (ni  1)log(ni)  which does not
i=1
 this is exactly the conditional log likelihood for the
conditional multinomial distribution obtained in Section 19.6.4.
Therefore,
this is another case where treating the ci as parameters to be estimated leads
us to a
rNconsistent, asymptotically normal estimator of Bo.

130
95396289 4 1.7087769 .99 0. replace atndrte = atndrte/100 (680 real changes made) .14287 atndrte  Coef.1].2040379 frosh  .014485 0.0936415 . I first converted the dependent variable to be in [0. Dev.0000 0.1599947 .76 0.029059989 Number of obs F( 4. Min Max +atndrteh  680 .000 . glm atndrte ACT priGPA frosh soph.086443 . . rather than [0.0169202 . This is required to easily use the "glm" command in Stata. count if atndrteh > 1 12 . predict atndrteh (option xb assumed.1820163 .0517097 .448 . Err.0417257 16.001681 10.0136196 priGPA  .0394496 _cons  . t P>t [95% Conf.92 0.0174327 .003 .0110085 .0177377 .64509 223.9.0173019 2.64937 223.6268492 . I will use the following Stata output.19.07 0.7907046 .99 0. family(binomial) sca(x2) note: atndrte has noninteger values Iteration Iteration Iteration Iteration 0: 1: 2: 3: log log log log likelihood likelihood likelihood likelihood = = = = 226.100].48849072 Residual  13.020411511 +Total  19. Std.64937 Generalized linear models No. Interval] +ACT  . fitted values) .0856818 soph  .0112156 16.7777696 675 . reg atndrte ACT priGPA frosh soph Source  SS df MS +Model  5.000 .2976 . 675) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 680 72.7317325 679 .4846666 1.8170956 .000 . sum atndrteh Variable  Obs Mean Std. of obs 131 = 680 .0202207 .64983 223.3017 0.23 0.
7622 .6724981 atndrte  Coef.0000 atndh  0.75991253 .84673249 .1266718 = = = = 675 1 .7371358 85. di exp(. di exp(.44 0. di (..0922209 .000 .98 0.1114*30 + 1. Min Max +atndh  680 .093199 1.7622 .847 .0771321 16. Err.5725 1.13 0.7621699 ..008 .6122622 soph  .113436 3.1114*25 + 1.244*3)/(1 + exp(.0891901 priGPA  1.1676013 .322713 (Standard errors scaled using square root of Pearson X2based dispersion) .2778463 _cons  .0928127 .01607824 .3499525 . z P>z [95% Conf.Optimization : ML: NewtonRaphson Deviance Pearson = = Residual df Scale param (1/df) Deviance (1/df) Pearson 285.66 0. Interval] +ACT  ..760 . di .1267746 = . predicted mean atndrte) .1335703 . predict atndh (option mu assumed.. corr atndrte atndh (obs=680)  atndrte atndh +atndrte  1.3899318 .0000 132 .201627 1.1114*30 + 1.0965356 .0944066 0.6493665 = 253. sum atndh Variable  Obs Mean Std.000 1.7622 .244*3)/(1 + exp(.395552 frosh  .0113217 9.244375 . Dev.087 . Std.244*3)) .1113802 .9697185 .84 0.57283238 Variance function: V(u) = u*(1u) Link function : g(u) = ln(u/(1u)) Standard errors : OIM [Bernoulli] [Logit] Log likelihood BIC AIC = 223.8170956 .2859966 2.7622 .001 .4233143 .1114*25 + 1..1268)^2 .244*3)) .326 .
2 standard errors that account for s The < 1 are given by the GLM output. The GLM standard errors are given in the output. This Rsquared is about . Naturally.5 percentage points.328. the estimated fall in atndrte is about . you will get the usual MLE standard errors. or 8..182 higher. we know that an increase in ACT score.7 percentage points.more than a one standard deviation increase . In other words.5725)^2 . are much too large. and so the logistic functional form does fit better than the linear model. The coefficient on ACT means that if the ACT score increases by 5 points . For the logistic functional form. 133 . say. remember that the parameters in the logistic functional form are not chosen to maximize an Rsquared. I computed the squared correlation between atndrtei and ^ E(atndrteixi). or 18.then the attendance rate is estimated to fall by about .0161. (If you omit the "sca(x2)" option in the "glm" command. The calculation shows that when ACT increases from 25 to 30. obtained.017(5) = . holding year and prior GPA fixed. b. actually reduces predicted attendance rate. Since the coefficient on ACT is negative. This is very similar to that found using the linear model. the usual MLE standard errors. The Rsquared for the linear model is about . the attendance rate is predicted to be about . And. from the expected Hessian of the quasilog likelihood. d. these changes do not always make sense when starting at extreme values of atndrte. none less than zero. ^ Note that s ~ .32775625 a.) c. There are 12 fitted values greater than one. The coefficient on priGPA means that if prior GPA is one point higher.2 percentage points.087. or about 8. di (.085.302.
Q)] = S log[1 .ai. To be added.F(b . So no two real numbers for a and b maximize the log likelihood. d. To be added.19. the loglikelihood is maximized by minimizing exp(b) across b. It is not possible to estimate duration models from flow data when all durations are right censored. i=1 But then.aixi)]/[1 .si = 1) = P(ti * 134 > cixi.ci.aixi)]. exp(b) L 0. the Weibull loglikelihood with complete censoring is exp(b) N S cai . for any a > 0.F(tixi.si = 1) = P(t*i < txi.Q)] a b.aixi) (because t < b  ai) = [F(txi) . For the Weibull case. a.t*i > b .F(b . is simply f(txi)/[1 . SOLUTIONS TO CHAPTER 20 PROBLEMS 20.ai) = P(t*i < t.ai) = P(t*i > . If all durations in the sample are censored.F(cixi. and so the loglikelihood is N N i=1 i=1 S log[1 . with respect to t.aixi)/P(ti * > b .aixi)].aixi) = P(ti * < txi)/P(t*i > b . P(ti < txi.5.F(b .3.ci.11. N loglikelihood is  i=1 c. we can choose any a > 0 so that N S cai > 0.1. a. F(txi.Q) = 1 . b. Without covariates. So plugging any value a into the log likelihood will lead to b getting more and more negative without bound. The derivative of the cdf in part a. di = 0 for all i.exp[exp(xiB)t ]. * 20. c.t*i > b .ai. and so the S exp(xiB)cai .t*i > b . P(ti = cixi. 20. But as b L 8. i=1 Since ci > 0.
32). the density is k(axi)[1  F(cixi)].cixi)/P(ti * > b . is P(ti ai. For t = ci.aixi) (because ci > b . First.22) and D(aici. the density of (ai.ai) = [1 .di.xi) when t < ci.xi) does not depend * on ci and is given by k(axi)f(txi) for 0 < a < b and 0 < t < 8. conditional on xi. the probability of * observing the random draw (ai.9. 20. (20. Putting in the parameters and taking the log gives (20. a. by (20. for all combinations (a. the density of (ai.56) results in more efficient estimators provided we have the two densities correctly specified.30) requires us to only specify f(Wxi). which is exactly (20.ti). To be added.F(cixi)]/[1 . the observation is uncensored.7. b.t) such that si = 1.xi) = D(aixi).56).F(b  aixi)]. We suppress the parameters in the densities.ti) given (ci.F(cixi)] /P(si = 1xi). that is. Using the log likelihood (20.xi.ci.xi) and si = 1 is d (1 . > b  From the standard result for densities for truncated distributions.di) k(axi)[f(txi)] i[1 . 20.xi). Now. This is also the conditional density of (ai. by the usual right censoring argument. 135 .ti) given (ci. We have the usual tradeoff between robustness and efficiency.ti) given (ci.
This action might not be possible to undo. Are you sure you want to continue?
We've moved you to where you read on your other device.
Get the full title to continue listening from where you left off, or restart the preview.