Solutions to Econometric Analysis of Cross Section and Panel Data, by Jeffrey M. Wooldridge, MIT Press, 2002.

The empirical examples are solved using various versions of Stata, with some dating back to Stata 4.0. Partly out of laziness, but also because it is useful for students to see computer output, I have included Stata output in most cases rather than typing out tables. In some cases, I do more hand calculations than are needed in current versions of Stata.

Currently, there are some missing solutions. I will update the solutions occasionally to fill in the missing solutions and to make corrections. For some problems I have given answers beyond what I originally asked. Please report any mistakes or discrepancies you might come across by sending me email at wooldri1@msu.edu.
CHAPTER 2
2.1. a. ∂E(y|x1,x2)/∂x1 = β1 + β4x2 and ∂E(y|x1,x2)/∂x2 = β2 + 2β3x2 + β4x1.

b. By definition, E(u|x1,x2) = 0. Because x2^2 and x1x2 are just functions of (x1,x2), it does not matter whether we also condition on them: E(u|x1,x2,x2^2,x1x2) = 0.

c. All we can say about Var(u|x1,x2) is that it is nonnegative for all x1 and x2: E(u|x1,x2) = 0 in no way restricts Var(u|x1,x2).
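The two partial-effect formulas in part (a) are easy to sanity-check numerically. The Python sketch below compares them with a central-difference approximation; the coefficient values are invented purely for illustration.

```python
# Finite-difference check of the partial effects in Problem 2.1:
# E(y|x1,x2) = b0 + b1*x1 + b2*x2 + b3*x2**2 + b4*x1*x2.
# Coefficient values are arbitrary, for illustration only.
b0, b1, b2, b3, b4 = 1.0, 0.5, -0.3, 0.2, 0.7

def cond_mean(x1, x2):
    return b0 + b1*x1 + b2*x2 + b3*x2**2 + b4*x1*x2

def num_deriv(f, x, h=1e-6):
    # central-difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2*h)

x1, x2 = 1.5, -2.0
dx1_analytic = b1 + b4*x2             # dE(y|x1,x2)/dx1 from the solution
dx2_analytic = b2 + 2*b3*x2 + b4*x1   # dE(y|x1,x2)/dx2 from the solution

assert abs(dx1_analytic - num_deriv(lambda a: cond_mean(a, x2), x1)) < 1e-5
assert abs(dx2_analytic - num_deriv(lambda a: cond_mean(x1, a), x2)) < 1e-5
```

Because the conditional mean is quadratic in each argument, the central difference is essentially exact here.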
2.3. a. y = β0 + β1x1 + β2x2 + β3x1x2 + u, where u has a zero mean given x1 and x2: E(u|x1,x2) = 0. We can say nothing further about u.

b. ∂E(y|x1,x2)/∂x1 = β1 + β3x2. Because E(x2) = 0, β1 = E[∂E(y|x1,x2)/∂x1]. Similarly, β2 = E[∂E(y|x1,x2)/∂x2].

c. If x1 and x2 are independent with zero means, then E(x1x2) = E(x1)E(x2) = 0. Further, the covariance between x1x2 and x1 is E(x1x2·x1) = E(x1^2 x2) = E(x1^2)E(x2) (by independence) = 0. A similar argument shows that the covariance between x1x2 and x2 is zero. But then the linear projection of x1x2 onto (1,x1,x2) is identically zero. Now just use the law of iterated projections (Property LP.5 in Appendix 2A):

L(y|1,x1,x2) = L(β0 + β1x1 + β2x2 + β3x1x2|1,x1,x2)
             = β0 + β1x1 + β2x2 + β3·L(x1x2|1,x1,x2)
             = β0 + β1x1 + β2x2.
d. Equation (2.47) is more useful because it allows us to compute the
partial effects of x1 and x2 at any values of x1 and x2.
Under the
assumptions we have made, the linear projection in (2.48) does have as its
slope coefficients on x1 and x2 the partial effects at the population average
values of x1 and x2 (zero in both cases), but it does not allow us to
obtain the partial effects at any other values of x1 and x2.
Incidentally,
the main conclusions of this problem go through if we allow x1 and x2 to have
any population means.
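The projection result in part (c) shows up cleanly in a simulation. The Python sketch below (all parameter values invented) regresses y on (1, x1, x2) when x1 and x2 are independent with zero means; the interaction term projects to zero, so the fitted slopes are close to (β0, β1, β2).

```python
# Simulation sketch of Problem 2.3(c): when x1 and x2 are independent with
# zero means, the linear projection of y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + u
# onto (1, x1, x2) has intercept b0 and slopes b1 and b2.
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
b0, b1, b2, b3 = 1.0, 0.5, -0.3, 0.8

x1 = rng.normal(size=N)   # zero mean, independent of x2
x2 = rng.normal(size=N)
u = rng.normal(size=N)
y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + u

X = np.column_stack([np.ones(N), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# x1*x2 projects to zero on (1, x1, x2), so coef is close to (b0, b1, b2).
assert np.allclose(coef, [b0, b1, b2], atol=0.02)
```

The `atol` cushion only covers sampling error; the population projection coefficients are exactly (β0, β1, β2).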
2.5. By definition, Var(u1|x,z) = Var(y|x,z) and Var(u2|x) = Var(y|x). By assumption, these are constant and necessarily equal to σ1^2 ≡ Var(u1) and σ2^2 ≡ Var(u2), respectively. But then Property CV.4 implies that σ2^2 ≥ σ1^2. This simple conclusion means that, when error variances are constant, the error variance falls as more explanatory variables are conditioned on.
2.7. Write the equation in error form as

y = g(x) + zB + u,  E(u|x,z) = 0.

Take the expected value of this equation conditional only on x:

E(y|x) = g(x) + [E(z|x)]B,

and subtract this from the first equation to get

y - E(y|x) = [z - E(z|x)]B + u,

or ỹ = z̃B + u, where ỹ ≡ y - E(y|x) and z̃ ≡ z - E(z|x). Because z̃ is a function of (x,z), E(u|z̃) = 0 (since E(u|x,z) = 0), and so E(ỹ|z̃) = z̃B. This basic result is fundamental in the literature on estimating partial linear models. First, one estimates E(y|x) and E(z|x) using very flexible methods, typically so-called nonparametric methods.
Then, after obtaining the residuals ỹi ≡ yi - Ê(yi|xi) and z̃i ≡ zi - Ê(zi|xi), B is estimated from an OLS regression of ỹi on z̃i, i = 1,...,N. Under general conditions, this kind of nonparametric partialling-out procedure leads to a √N-consistent, asymptotically normal estimator of B. See Robinson (1988) and Powell (1994).
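A minimal Python sketch of this partialling-out recipe, with a made-up model y = sin(2x) + βz + u and low-order polynomial regressions standing in for the flexible nonparametric first-stage estimates of E(y|x) and E(z|x):

```python
# Partialling-out sketch for the partial linear model y = g(x) + z*beta + u.
# Everything here (g, the polynomial first stage, sample size) is invented
# for illustration; Robinson (1988) uses genuinely nonparametric first stages.
import numpy as np

rng = np.random.default_rng(1)
N = 50_000
beta = 0.7

x = rng.uniform(-2, 2, size=N)
z = 0.5*x + rng.normal(size=N)        # z is correlated with x
u = rng.normal(scale=0.5, size=N)
y = np.sin(2*x) + beta*z + u          # g(x) = sin(2x)

def poly_resid(w, x, deg=5):
    """Residuals from regressing w on a degree-`deg` polynomial in x,
    a crude stand-in for w - Ehat(w|x)."""
    X = np.vander(x, deg + 1)
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    return w - X @ coef

y_t = poly_resid(y, x)   # yi - Ehat(yi|xi)
z_t = poly_resid(z, x)   # zi - Ehat(zi|xi)

beta_hat = (z_t @ y_t) / (z_t @ z_t)  # OLS of y_t on z_t, no intercept
assert abs(beta_hat - beta) < 0.02
```

The point is only that regressing the two residuals on each other recovers β while never modeling g(x) parametrically.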
CHAPTER 3
3.1. To prove Lemma 3.1, we must show that for all ε > 0, there exists bε < ∞ and an integer Nε such that P[|xN| > bε] < ε, all N ≥ Nε. We use the following fact: since xN →p a, for any ε > 0 there exists an integer Nε such that P[|xN - a| > 1] < ε for all N ≥ Nε. [The existence of Nε is implied by Definition 3.3(1).] But

|xN| = |xN - a + a| ≤ |xN - a| + |a|

(by the triangle inequality), and so |xN| - |a| ≤ |xN - a|. It follows that

P[|xN| - |a| > 1] ≤ P[|xN - a| > 1].

Therefore, in Definition 3.3(3) we can take bε ≡ |a| + 1 (irrespective of the value of ε), and then the existence of Nε follows from Definition 3.3(1).
3.3. a. By the CLT, √N(ȳN - μ) ~a Normal(0, σ^2), and so Avar[√N(ȳN - μ)] = σ^2.

b. We obtain Avar(ȳN) by dividing Avar[√N(ȳN - μ)] by N. Therefore, Avar(ȳN) = σ^2/N. As expected, this coincides with the actual variance of ȳN: since Var(ȳN) = σ^2/N, Var[√N(ȳN - μ)] = N(σ^2/N) = σ^2.

c. The asymptotic standard deviation of ȳN is the square root of its asymptotic variance, or σ/√N.

d. To obtain the asymptotic standard error of ȳN, we need a consistent estimator of σ. Typically, the unbiased estimator of σ^2 is used: σ̂^2 = (N - 1)^-1 Σ_{i=1}^N (yi - ȳN)^2, and then σ̂ is the positive square root. The asymptotic standard error of ȳN is simply σ̂/√N.

3.5. a. For θ > 0 the natural logarithm is a continuous function, and so plim[log(θ̂)] = log[plim(θ̂)] = log(θ) = γ. This follows immediately from Lemma 3.1 because g(xN) →p g(c).

b. We use the delta method to find Avar[√N(γ̂ - γ)]. In the scalar case, if γ̂ = g(θ̂), then Avar[√N(γ̂ - γ)] = [dg(θ)/dθ]^2 Avar[√N(θ̂ - θ)]. When g(θ) = log(θ) (which is, of course, continuously differentiable), Avar[√N(γ̂ - γ)] = (1/θ)^2 Avar[√N(θ̂ - θ)].

c. In the scalar case, the asymptotic standard error of γ̂ is generally |dg(θ̂)/dθ|·se(θ̂). Therefore, for g(θ) = log(θ), se(γ̂) = se(θ̂)/θ̂. When θ̂ = 4 and se(θ̂) = 2, γ̂ = log(4) ≈ 1.39 and se(γ̂) = 1/2.

d. The asymptotic t statistic for testing H0: θ = 1 is (θ̂ - 1)/se(θ̂) = 3/2 = 1.5.
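The delta-method numbers in parts (c) and (d) can be reproduced mechanically. In the Python sketch below, the only inputs are the values quoted in the text (θ̂ = 4, se(θ̂) = 2); the exact t statistic for H0: γ = 0 is 2.7726, which matches the rounded calculation 1.39/(.5) = 2.78 in part (e).

```python
# Numeric version of Problem 3.5: se(log(theta_hat)) = se(theta_hat)/theta_hat.
import math

theta_hat, se_theta = 4.0, 2.0          # values given in the problem
gamma_hat = math.log(theta_hat)         # about 1.386
se_gamma = se_theta / theta_hat         # delta method: (1/theta)*se(theta)

t_theta = (theta_hat - 1.0) / se_theta  # t statistic for H0: theta = 1
t_gamma = gamma_hat / se_gamma          # t statistic for H0: gamma = log(1) = 0

assert round(gamma_hat, 2) == 1.39
assert se_gamma == 0.5
assert t_theta == 1.5
assert round(t_gamma, 2) == 2.77
```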
e. Because γ = log(θ), the null of interest can also be stated as H0: γ = 0. The t statistic based on γ̂ is about 1.39/(.5) = 2.78. This leads to a very strong rejection of H0, whereas the t statistic based on θ̂ is, at best, marginally significant. The lesson is that, when using the Wald test, we can change the outcome of hypotheses tests by using nonlinear transformations.

3.9. By the delta method,

Avar[√N(Γ̂ - Γ)] = G(Θ)V1·G(Θ)' and Avar[√N(Γ̃ - Γ)] = G(Θ)V2·G(Θ)',

where G(Θ) ≡ ∇Θ g(Θ) is Q x P. By assumption, V2 - V1 is positive semidefinite, and therefore G(Θ)(V2 - V1)G(Θ)' is p.s.d. But

Avar[√N(Γ̃ - Γ)] - Avar[√N(Γ̂ - Γ)] = G(Θ)(V2 - V1)G(Θ)'.

This completes the proof.

CHAPTER 4

4.1. a. Exponentiating equation (4.49) gives

wage = exp(β0 + β1married + β2educ + zγ + u) = exp(u)exp(β0 + β1married + β2educ + zγ).

Therefore,

E(wage|x) = E[exp(u)|x]·exp(β0 + β1married + β2educ + zγ),

where x denotes all explanatory variables. Now, if u and x are independent, then E[exp(u)|x] = E[exp(u)] = δ0, say. Therefore, E(wage|x) = δ0·exp(β0 + β1married + β2educ + zγ). Now, finding the proportionate difference in this expectation at married = 1 and married = 0 (with all else equal), all other factors cancel out, leaving exp(β1) - 1. Thus, the percentage difference is 100·[exp(β1) - 1].

b. Since θ1 = 100·[exp(β1) - 1] = g(β1), we need the derivative of g with respect to β1: dg/dβ1 = 100·exp(β1). The asymptotic standard error of θ̂1 using the delta method is obtained as the absolute value of dg/dβ̂1 times se(β̂1):

se(θ̂1) = [100·exp(β̂1)]·se(β̂1).

c. We can evaluate the conditional expectation in part (a) at two levels of education, say educ0 and educ1. The proportionate change in expected wage from educ0 to educ1 is

[exp(β2educ1) - exp(β2educ0)]/exp(β2educ0) = exp[β2(educ1 - educ0)] - 1 = exp(β2Δeduc) - 1.

d. Using the same arguments as in part (b), θ2 = 100·[exp(β2Δeduc) - 1] and

se(θ̂2) = 100·|Δeduc|·exp(β̂2Δeduc)·se(β̂2).

For the estimated version of equation (4.29), β̂1 = .199, se(β̂1) = .039, β̂2 = .065, and se(β̂2) = .006. Therefore, θ̂1 = 22.01 and se(θ̂1) = 4.76. For θ̂2 we set Δeduc = 4; then θ̂2 = 29.7 and se(θ̂2) = 3.11.

4.3. a. Not in general. The conditional variance can always be written as Var(u|x) = E(u^2|x) - [E(u|x)]^2; if E(u|x) ≠ 0, then E(u^2|x) ≠ Var(u|x).

b. It could be that E(x'u) = 0, in which case OLS is consistent, and Var(u|x) is constant. But, generally, the usual standard errors would not be valid unless E(u|x) = 0.

4.5. Write equation (4.50) as E(y|w) = wD, where w = (x,z). Since Var(y|w) = σ^2, it follows by Theorem 4.2 that Avar √N(D̂ - D) is σ^2[E(w'w)]^-1, where D̂ = (B̂', γ̂)'. Importantly, because E(x'z) = 0, E(w'w) is block diagonal, with upper block E(x'x) and lower block E(z^2). Inverting E(w'w) and focusing on the upper K x K block gives

Avar √N(B̂ - B) = σ^2[E(x'x)]^-1.
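The θ̂1 and θ̂2 figures in Problem 4.1(d) follow mechanically from the quoted estimates. A Python check (inputs are the values from the text; the exact computation gives 22.02 for θ̂1, which the text reports as 22.01):

```python
# Delta-method computations from Problem 4.1: theta = 100*[exp(b) - 1] with
# se(theta) = 100*|deduc|*exp(b*deduc)*se(b). Estimates quoted in the text.
import math

b1, se_b1 = 0.199, 0.039
theta1 = 100 * (math.exp(b1) - 1)
se_theta1 = 100 * math.exp(b1) * se_b1

b2, se_b2, d_educ = 0.065, 0.006, 4
theta2 = 100 * (math.exp(b2 * d_educ) - 1)
se_theta2 = 100 * abs(d_educ) * math.exp(b2 * d_educ) * se_b2

assert round(theta1, 1) == 22.0
assert round(se_theta1, 2) == 4.76
assert round(theta2, 1) == 29.7
assert round(se_theta2, 2) == 3.11
```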
4.7. a. It is helpful to write y = xB + v, where v ≡ γz + u. Because E(x'z) = 0 and E(x'u) = 0, we have E(x'v) = 0. Therefore, without further assumptions,

Avar √N(B̃ - B) = [E(x'x)]^-1 E(v^2 x'x) [E(x'x)]^-1.

Further, where we use E(zu|x,z) = zE(u|x,z) = 0 and E(u^2|x,z) = σ^2,

E(v^2|x) = γ^2 E(z^2|x) + E(u^2|x) + 2γE(zu|x) = γ^2 E(z^2|x) + σ^2.

Let h(x) ≡ E(z^2|x). Then, by the law of iterated expectations,

E(v^2 x'x) = E[E(v^2|x) x'x] = γ^2 E[h(x) x'x] + σ^2 E(x'x).

Unless E(z^2|x) is constant, the equation y = xB + v generally violates the homoskedasticity assumption OLS.3.

b. When z is included in the regression, Avar √N(B̂ - B) = σ^2[E(x'x)]^-1. Now we can show Avar √N(B̃ - B) - Avar √N(B̂ - B) is positive semidefinite by writing

Avar √N(B̃ - B) - Avar √N(B̂ - B)
  = [E(x'x)]^-1 E(v^2 x'x) [E(x'x)]^-1 - σ^2[E(x'x)]^-1
  = [E(x'x)]^-1 E(v^2 x'x) [E(x'x)]^-1 - σ^2[E(x'x)]^-1 E(x'x) [E(x'x)]^-1
  = [E(x'x)]^-1 [E(v^2 x'x) - σ^2 E(x'x)] [E(x'x)]^-1.

Because [E(x'x)]^-1 is positive definite, it suffices to show that E(v^2 x'x) - σ^2 E(x'x) is p.s.d. But E(v^2 x'x) - σ^2 E(x'x) = γ^2 E[h(x) x'x], which, when γ ≠ 0, is actually a positive definite matrix except by fluke. In particular, if E(z^2|x) = E(z^2) = h > 0 (in which case y = xB + v satisfies the homoskedasticity assumption OLS.3), then E(v^2 x'x) - σ^2 E(x'x) = γ^2 h E(x'x), which is positive definite.

4.9. a. One important omitted factor in u is family income: students that come from wealthier families tend to do better in school. Family income and PC ownership are positively correlated because the probability of owning a PC increases with family income. Another factor in u is quality of high school. This may also be correlated with PC: a student who had more exposure to computers in high school may be more likely to own a computer. If we write the linear projection

u = δ0 + δ1hsGPA + δ2SAT + δ3PC + r,

then the bias is upward if δ3 is greater than zero. This measures the partial correlation between u (say, family income) and PC, and it is likely to be positive. Thus, β̂3 is likely to have an upward bias because of the positive correlation between u and PC.

b. If data on family income can be collected, then it can be included in the equation. If family income is not available, sometimes the level of parents' education is. Another possibility is to use average house value in each student's home zip code, as zip code is often part of school records.

c. Proxies for high school quality might be faculty-student ratios, average teacher salary, expenditure per student, and so on.

[The number of the following problem was lost in the extraction.]

a. For simplicity, let w = log(y) and w1 = log(y1). Then the population slope coefficient in a simple regression is always a1 = Cov(w1,w)/Var(w1).

b. But Corr(w1,w) = Cov(w1,w)/(sw1·sw), where sw1 = sd(w1) and sw = sd(w), so we can write a1 = Corr(w1,w)·(sw/sw1). By assumption, Var(w) = Var(w1), and since a correlation coefficient is always between -1 and 1, the result follows.

c. Just subtract log(y1) from both sides: Δlog(y) = β0 + xB + (a1 - 1)log(y1) + u. Clearly, the intercept and slope estimates on x will be the same. The coefficient on log(y1) changes, but it is not clear-cut because of the other explanatory variables in the equation.
4.11. a. Here is some Stata output obtained to answer this question:

. reg lwage exper tenure married south urban black educ iq kww

[regression table garbled in the original extraction; key estimates are discussed below]

The estimated return to education using both IQ and KWW as proxies for ability is about 5%. When we used no proxy the estimated return was about 6.5%, and with only IQ as a proxy it was about 5.4%. Thus, we have an even lower estimated return to education, but it is still practically nontrivial and statistically very significant.

. test iq kww

( 1)  iq = 0.0
( 2)  kww = 0.0

      F(  2,   925) = [value garbled in extraction]
           Prob > F = 0.0002
b. We can see from the t statistics that these variables are going to be jointly significant. The F test verifies this, with p-value = .0002.

c. The wage differential between nonblacks and blacks does not disappear: blacks are estimated to earn about 13% less than nonblacks, holding all other factors fixed.

4.13. a. Because of the log-log functional form, all coefficients are elasticities. Using the 90 counties for 1987 gives:

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen if d87

[regression table garbled in the original extraction]

The elasticities of crime with respect to the arrest and conviction probabilities are the sign we expect, and both are practically and statistically significant. The elasticities with respect to the probability of serving a prison term and the average sentence length are positive but are statistically insignificant.

b. To add the previous year's crime rate we first generate the lag:

. gen lcrmr_1 = lcrmrte[_n-1] if d87
(540 missing values generated)

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lcrmr_1 if d87

[regression table garbled in the original extraction]

There are some notable changes in the coefficients on the original variables. The elasticities with respect to prbarr and prbconv are much smaller now, but still have signs predicted by a deterrent-effect story. The conviction probability is no longer statistically significant. Adding the lagged crime rate changes the signs of the elasticities with respect to prbpris and avgsen, and the latter is almost statistically significant at the 5% level against a two-sided alternative (p-value = .056). Not surprisingly, the elasticity with respect to the lagged crime rate is large and very statistically significant. (The elasticity is also statistically different from unity.)

c. Adding the logs of the nine wage variables gives the following:

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lcrmr_1 lwcon-lwloc if d87

[regression table garbled in the original extraction]

. testparm lwcon-lwloc

( 1)  lwcon = 0.0
 ...
( 9)  lwloc = 0.0

      F(  9,    75) = [value garbled in extraction]
           Prob > F = 0.1643

The nine wage variables are jointly insignificant even at the 15% level. Plus, the elasticities are not consistently positive or negative. The two largest elasticities, which also have the largest absolute t statistics, have the opposite sign: these are with respect to the wage in construction (-.285) and the wage for federal employees (.336).

d. Using the "robust" option in Stata, which is appended to the "reg" command, gives the heteroskedasticity-robust F statistic as F = 2.19 and p-value = .032. (This F statistic is the heteroskedasticity-robust Wald statistic divided by the number of restrictions being tested, nine in this example. The division by the number of restrictions turns the asymptotic chi-square statistic into one that roughly has an F distribution.)
4.15. a. Write R^2 = 1 - SSR/SST = 1 - (SSR/N)/(SST/N). Therefore,

plim(R^2) = 1 - plim[(SSR/N)/(SST/N)] = 1 - [plim(SSR/N)]/[plim(SST/N)] = 1 - σu^2/σy^2 = ρ^2,

where we use the fact that SSR/N is a consistent estimator of σu^2 and SST/N is a consistent estimator of σy^2.

b. Because each xj has finite second moment, Var(xB) < ∞ and Cov(xB,u) is well-defined. But each xj is uncorrelated with u, so Cov(xB,u) = 0. Therefore, Var(y) = Var(xB) + Var(u), or σy^2 = Var(xB) + σu^2.

c. The statement "Var(ui) = σ^2 = Var(yi) for all i" assumes that the regressors are nonrandom (or B = 0, which is not a very interesting case). This is nonsense when we view the xi as random draws along with yi. This is another example of how the assumption of nonrandom regressors can lead to counterintuitive conclusions. The population R-squared depends on only the unconditional variances of u and y. Therefore, the usual R-squared consistently estimates the population R-squared, regardless of the nature of heteroskedasticity in Var(u|x).

d. The derivation in part (c) assumed nothing about Var(u|x). Suppose that an element of the error term, say z, suddenly becomes observed. When we add z to the regressor list, the error changes, and so does the error variance. (It gets smaller.) In the vast majority of economic applications, it makes no sense to think we have access to the entire set of factors that one would ever want to control for, so we should allow for error variances to change across different models for the same response variable. Neither R-squared nor the adjusted R-squared has desirable finite-sample properties.
CHAPTER 5

5.1. a. Define x1 ≡ (z1,y2) and x2 ≡ v̂2, and let B̂ ≡ (B̂1', ρ̂1)' be the OLS estimator from (5.52), where B̂1 = (D̂1', α̂1)'. Using the hint, B̂1 can also be obtained by partitioned regression:

(i) Regress x1 onto v̂2 and save the residuals, say ẍ1.
(ii) Regress y1 onto ẍ1.

But when we regress z1 onto v̂2, the residuals are just z1, since v̂2 is orthogonal in sample to z. (More precisely, Σ_{i=1}^N z_{i1}'v̂_{i2} = 0.) Further, because we can write y2 = ŷ2 + v̂2, where ŷ2 and v̂2 are orthogonal in sample, the residuals from regressing y2 onto v̂2 are simply the first-stage fitted values, ŷ2. In other words, ẍ1 = (z1,ŷ2). But the 2SLS estimator of B1 is obtained exactly from the OLS regression y1 on z1, ŷ2.

b. The statement in the problem is simply wrong: 2SLS estimators do not have desirable finite-sample properties, such as unbiasedness, so the only analysis we can do in any generality involves asymptotics.

5.3. a. There may be unobserved health factors correlated with smoking behavior that affect infant birth weight. For example, women who smoke during pregnancy may, on average, drink more coffee or alcohol, or eat less nutritious meals.

b. Basic economics says that packs should be negatively correlated with cigarette price, although the correlation might be small (especially because price is aggregated at the state level). At first glance it seems that cigarette price should be exogenous in equation (5.52), but we must be a little careful. One component of cigarette price is the state tax on
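The conclusion of Problem 5.1(a) can be verified numerically: 2SLS equals OLS on the first-stage fitted values, and adding the first-stage residual v̂2 as an extra regressor reproduces exactly the same coefficients on (1, z1, y2). The data-generating process in this Python sketch is invented for illustration.

```python
# 2SLS vs. the control-function regression of Problem 5.1, on simulated data.
import numpy as np

rng = np.random.default_rng(2)
N = 10_000
z1 = rng.normal(size=N)                  # included exogenous variable
z2 = rng.normal(size=N)                  # excluded instrument
c = rng.normal(size=N)                   # common factor -> endogeneity
y2 = 1.0*z1 + 1.5*z2 + c + rng.normal(size=N)
y1 = 2.0 + 0.5*z1 - 1.0*y2 + 0.8*c + rng.normal(size=N)

Z = np.column_stack([np.ones(N), z1, z2])   # instruments
X = np.column_stack([np.ones(N), z1, y2])   # regressors (y2 endogenous)

# 2SLS: OLS of y1 on the first-stage fitted values (1, z1, y2_hat).
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b_2sls = np.linalg.lstsq(Xhat, y1, rcond=None)[0]

# Control-function route: add the first-stage residual v2_hat as a regressor.
v2 = y2 - Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]
W = np.column_stack([X, v2])
b_cf = np.linalg.lstsq(W, y1, rcond=None)[0]

assert np.allclose(b_2sls, b_cf[:3])     # numerically identical coefficients
assert abs(b_2sls[2] + 1.0) < 0.05       # consistent for the coefficient on y2
```

The identity follows from the partitioned-regression argument in the text: partialling v̂2 out of (1, z1, y2) leaves (1, z1, ŷ2).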
cigarettes. States that have lower taxes on cigarettes may also have lower quality of health care, on average. Quality of health care is in u, and so maybe cigarette price fails the exogeneity requirement for an IV.

c. OLS is followed by 2SLS (IV, in this case):

. reg lbwght male parity lfaminc packs

[regression table garbled in the original extraction]

. reg lbwght male parity lfaminc packs (male parity lfaminc cigprice)

[regression table garbled in the original extraction]

(Note that Stata automatically shifts endogenous explanatory variables to the beginning of the list when reporting coefficients, standard errors, and so on.)

The difference between OLS and IV in the estimated effect of packs on bwght is huge. With the OLS estimate, one more pack of cigarettes is estimated to reduce bwght by about 8.4%, and this is statistically significant. The IV estimate has the opposite sign, is huge in magnitude, and is not statistically significant. The sign and size of the smoking effect are not realistic.

d. We can see the problem with IV by estimating the reduced form for packs:

. reg packs male parity lfaminc cigprice

[regression table garbled in the original extraction]

The reduced form estimates show that cigprice does not significantly affect packs; in fact, the coefficient on cigprice is not the sign we expect. Thus, cigprice fails as an IV for packs because cigprice is not partially correlated with packs (with a sensible sign for the correlation). This is separate from the problem that cigprice may not truly be exogenous in the birth weight equation.
5.5. Under the null hypothesis that q and z2 are uncorrelated, z1 and z2 are exogenous in (5.55) because each is uncorrelated with u1. Unfortunately, y2 is correlated with u1, and so the regression of y1 on z1, y2, z2 does not produce a consistent estimator of 0 on z2 even when E(z2'q) = 0. We could find that Ĵ1 is statistically different from zero even when q and z2 are uncorrelated, in which case we would incorrectly conclude that z2 is not a valid IV candidate. Or, we might fail to reject H0: J1 = 0 when z2 and q are correlated, in which case we incorrectly conclude that the elements in z2 are valid as instruments. The point of this exercise is that one cannot simply add instrumental variable candidates in the structural equation and then test for significance of these variables using OLS. This is the sense in which identification cannot be tested: with a single endogenous variable, we must take a stand that at least one element of z2 is uncorrelated with q.

5.7. a. If we plug q = (1/δ1)q1 - (1/δ1)a1 into equation (5.45) we get

y = β0 + β1x1 + ... + βKxK + η1q1 + v,   (5.56)

where η1 ≡ (1/δ1) and v ≡ u - η1a1. Now, since the zh are redundant in (5.45), they are uncorrelated with the structural error, u (by definition of redundancy). Further, we have assumed that the zh are uncorrelated with a1. Since each xj is also uncorrelated with u - η1a1, we can estimate (5.56) by 2SLS using instruments (1,x1,...,xK,z1,...,zM) to get consistent estimators of the βj and η1. For identification, what we need is that at least one of the zh appears in the reduced form for q1. More formally, in the linear projection

q1 = π0 + π1x1 + ... + πKxK + π_{K+1}z1 + ... + π_{K+M}zM + r1,

at least one of π_{K+1},...,π_{K+M} must be different from zero.

b. We need family background variables to be redundant in the log(wage)
equation once ability (and other factors, such as educ and exper) have been controlled for. The idea here is that family background may influence ability but should have no partial effect on log(wage) once ability has been accounted for. For the rank condition to hold, we need the family background variables to be correlated with the indicator, q1, say IQ, once the xj have been netted out. This is likely to be true if we think that family background and ability are (partially) correlated.

c. Applying the procedure to the data set in NLS80.RAW gives the following results:

. reg lwage exper tenure educ married south urban black iq (exper tenure educ married south urban black meduc feduc sibs)

[2SLS output garbled in the original extraction]

. reg lwage exper tenure educ married south urban black kww (exper tenure educ married south urban black meduc feduc sibs)

[2SLS output garbled in the original extraction]

Even though there are 935 men in the sample, only 722 are used for the estimation, because data are missing on meduc and feduc. The return to education is estimated to be small and insignificant whether IQ or KWW is used as the indicator. This could be because the family background variables do not satisfy the appropriate redundancy condition, or they might be correlated with a1. (In both first-stage regressions, the F statistics for joint significance of meduc, feduc, and sibs have p-values below .002, so it seems the family background variables are sufficiently partially correlated with the ability indicators.)

What we could do is define binary indicators for whether the corresponding variable is missing, set the missing values to zero, and then use the binary indicators as instruments along with meduc, feduc, and sibs. This would allow us to use all 935 observations.

5.9. Define θ4 = β4 - β3, so that β4 = β3 + θ4. Plugging this expression into the equation and rearranging gives
log(wage) = β0 + β1exper + β2exper^2 + β3(twoyr + fouryr) + θ4fouryr + u
          = β0 + β1exper + β2exper^2 + β3totcoll + θ4fouryr + u,

where totcoll = twoyr + fouryr. Now, just estimate the latter equation by 2SLS using exper, exper^2, dist2yr and dist4yr as the full set of instruments. We can use the t statistic on θ̂4 to test H0: θ4 = 0 against H1: θ4 > 0.

5.11. Following the hint, let y2^0 be the linear projection of y2 on z2, let a2 be the projection error, and assume that the projection is known. (The results on generated regressors in Section 6.1 show that the argument carries over to the case when it is estimated.) Plugging in y2 = y2^0 + a2 gives

y1 = z1D1 + α1y2^0 + α1a2 + u1.

Effectively, we regress y1 on z1, y2^0. The key consistency condition is that each explanatory variable is orthogonal to the composite error, α1a2 + u1. By assumption, E(z'u1) = 0. Further, E(y2^0 a2) = 0 by construction. The problem is that E(z1'a2) ≠ 0 necessarily, because z1 was not included in the linear projection for y2. Therefore, OLS will be inconsistent for all parameters in general. Contrast this with 2SLS when y2* is the projection on z1 and z2: y2 = y2* + r2 = zP2 + r2, where E(z'r2) = 0. The second-step regression (assuming P2 is known) is essentially

y1 = z1D1 + α1y2* + α1r2 + u1.

Now, r2 is uncorrelated with z, and so E(z1'r2) = 0 and E(y2*r2) = 0. The lesson is that one must be very careful if manually carrying out 2SLS by explicitly doing the first- and second-stage regressions.

5.13. a. In a simple regression model with a single IV, the IV estimate of the slope can be written as

β̂1 = [Σ_{i=1}^N (zi - z̄)(yi - ȳ)] / [Σ_{i=1}^N (zi - z̄)(xi - x̄)].

Now the numerator can be written as

Σ_{i=1}^N (zi - z̄)(yi - ȳ) = Σ_{i=1}^N zi(yi - ȳ) = Σ_{i=1}^N ziyi - (Σ_{i=1}^N zi)ȳ = N1ȳ1 - N1ȳ = N1(ȳ1 - ȳ),

where N1 = Σ_{i=1}^N zi is the number of observations in the sample with zi = 1 and ȳ1 is the average of the yi over the observations with zi = 1. Next, write ȳ as a weighted average: ȳ = (N0/N)ȳ0 + (N1/N)ȳ1, where the notation should be clear. Straightforward algebra shows that

ȳ1 - ȳ = [(N - N1)/N]ȳ1 - (N0/N)ȳ0 = (N0/N)(ȳ1 - ȳ0).

So the numerator of the IV estimate is (N0N1/N)(ȳ1 - ȳ0). The same argument shows that the denominator is (N0N1/N)(x̄1 - x̄0). Taking the ratio proves the result.

b. If x is also binary, representing some "treatment", x̄1 is the fraction of observations receiving treatment when zi = 1 and x̄0 is the fraction receiving treatment when zi = 0. So the difference in the mean response between the z = 1 and z = 0 groups gets divided by the difference in participation rates across the two groups. For example, suppose xi = 1 if person i participates in a job training program, and let zi = 1 if person i is eligible for participation in the program. Then x̄1 is the fraction of people participating in the program out of those made eligible, and x̄0 is the fraction of people participating who are not eligible. (When eligibility is necessary for participation, x̄0 = 0.) Generally, x̄1 - x̄0 is the difference in participation rates when z = 1 and z = 0.

5.15. a. As in Problem 5.12, we can write

Π = [ Π11     0    ]
    [ Π12  I_{K2}  ],

where I_{K2} is the K2 x K2 identity matrix, 0 is the L1 x K2 zero matrix, Π11 is L1 x K1, and Π12 is K2 x K1. Therefore, rank(Π) = K if and only if rank(Π11) = K1.
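The algebra in 5.13(a) says that, with a binary instrument, the IV estimator is exactly the ratio of mean differences (the Wald estimator). A Python check on simulated, made-up data:

```python
# Problem 5.13(a): IV with a binary instrument equals the Wald estimator
# (ybar1 - ybar0)/(xbar1 - xbar0). Simulated data, invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
N = 5000
z = rng.integers(0, 2, size=N).astype(float)               # binary instrument
x = ((0.3 + 0.4*z + rng.normal(size=N)) > 0.5).astype(float)  # binary treatment
y = 1.0 + 2.0*x + rng.normal(size=N)

num = ((z - z.mean()) * (y - y.mean())).sum()
den = ((z - z.mean()) * (x - x.mean())).sum()
b_iv = num / den                                           # simple IV slope

wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
assert np.isclose(b_iv, wald)
```

The equality is an algebraic identity, so it holds in every sample, not just on average.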
5.15. Write the linear projection as L(x|z) = zΠ and partition

  Π = ( Π11    0   )
      ( Π12  I_K2 ),

where Π11 is L1 x K1, Π12 is K2 x K1, 0 is the L1 x K2 zero matrix, and I_K2 is the K2 x K2 identity matrix. The rank condition holds if and only if rank(Π) = K.

a. If, for some xj, the vector z1 does not appear in L(xj|z), then Π11 has a column which is entirely zeros. But then that column of Π is a linear combination of the last K2 columns of Π, which means rank(Π) < K. Therefore, a necessary condition for the rank condition is that no column of Π11 be exactly zero, which means that at least one element of z1 must appear in the reduced form of each xj, j = 1, ..., K1.

b. Suppose K1 = 2 and L1 = 2, where z1 appears in the reduced form for both x1 and x2, but z2 appears in neither reduced form. Then the 2 x 2 matrix Π11 has zeros in its second row, which means that the corresponding row of Π is entirely zeros. Then Π cannot have rank K, so the rank condition fails. Intuitively, while we began with two instruments, only one of them turned out to be partially correlated with x1 and x2.

c. Without loss of generality, we assume that zj appears in the reduced form for xj; we can simply reorder the elements of z1 to ensure this is the case. Then, if each zj appears only in the reduced form of xj, Π11 is a K1 x K1 diagonal matrix with nonzero diagonal elements. Looking at the partition of Π above, we see that if Π11 is diagonal with all nonzero diagonal elements, Π is triangular with all nonzero diagonal elements. Therefore, rank(Π) = K.

CHAPTER 6

6.1. a. Here is abbreviated Stata output for testing the null hypothesis that educ is exogenous:

. qui reg educ nearc4 nearc2 exper expersq black south smsa reg661-reg668 smsa66

. predict v2hat, resid

. reg lwage educ exper expersq black south smsa reg661-reg668 smsa66 v2hat

(output abbreviated)
------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1570594   .0482814     3.25   0.001      .0623912    .2517275
       v2hat |  -.1575042   .0919791    -1.71   0.087
------------------------------------------------------------------------------

The t statistic on v̂2 is -1.71, which is not significant at the 5% level against a two-sided alternative. The negative correlation between u1 and educ is essentially the same finding that the 2SLS estimated return to education is larger than the OLS estimate. In any case, I would call this marginal evidence that educ is endogenous. (Depending on the application or purpose of a study, the same researcher may take t = -1.71 as evidence for or against endogeneity.)

b. To test the single overidentifying restriction we obtain the 2SLS residuals:

. qui reg lwage educ exper expersq black south smsa reg661-reg668 smsa66 (nearc4 nearc2 exper expersq black south smsa reg661-reg668 smsa66)

. predict uhat1, resid

Now, we regress the 2SLS residuals on all exogenous variables:

. reg uhat1 exper expersq black south smsa reg661-reg668 smsa66 nearc4 nearc2

(output abbreviated: Number of obs = 3010, F(16, 2993) = 0.08, R-squared = 0.0004, Root MSE = .40526)

The test statistic is the sample size times the R-squared from this regression:

. di 3010*.0004
1.204

. di chiprob(1, 1.2)
.27332168

The p-value, obtained from a chi-square(1) distribution, is about .273, so the instruments pass the overidentification test.

6.3. a. We need the prices to satisfy two requirements. First, calories and protein must be partially correlated with the prices of food. While this is easy to test for each of calories and protein by estimating the two reduced forms, the rank condition could still be violated (although see Problem 15.5c). In addition, we must assume prices are exogenous in the productivity equation. Ideally, prices vary because of things like transportation costs that are not systematically related to regional variations in individual productivity. A potential problem is that prices reflect food quality and that features of the food other than calories and protein appear in the disturbance u1.

b. Since there are two endogenous explanatory variables, we need at least two prices.

c. We would first estimate the two reduced forms for calories and protein by regressing each on a constant, educ, exper, exper^2, and the M prices, p1, ..., pM. We obtain the residuals, v̂21 and v̂22. Then we would run the regression log(produc) on 1, educ, exper, exper^2, calories, protein, v̂21, v̂22 and do a joint significance test on v̂21 and v̂22. We could use a standard F test or use a heteroskedasticity-robust test.

6.5. a. For simplicity, absorb the intercept in x, so y = xβ + u, E(u|x) = 0, and, under H0, Var(u|x) = σ². In these tests, σ̂² is implicitly SSR/N: there is no degrees-of-freedom adjustment. (In any case, the df adjustment makes no difference asymptotically.) Write ûᵢ = uᵢ − xᵢ(β̂ − β), so that ûᵢ² = uᵢ² − 2uᵢxᵢ(β̂ − β) + [xᵢ(β̂ − β)]². Therefore,

  N^(-1/2) Σᵢ (hᵢ − μ_h)'(ûᵢ² − σ̂²)
    = N^(-1/2) Σᵢ (hᵢ − μ_h)'(uᵢ² − σ²)
      − 2[N^(-1) Σᵢ uᵢ(hᵢ − μ_h)'xᵢ]√N(β̂ − β)
      + N^(-1/2)[N^(-1) Σᵢ (hᵢ − μ_h)'(xᵢ ⊗ xᵢ)]{vec[√N(β̂ − β)√N(β̂ − β)']}
      − [N^(-1/2) Σᵢ (hᵢ − μ_h)'](σ̂² − σ²),                          (6.40)

where the expression for the third term follows from [xᵢ(β̂ − β)]² = (xᵢ ⊗ xᵢ)vec[(β̂ − β)(β̂ − β)']. Dropping the "-2", the second term is op(1)·Op(1) = op(1): E[uᵢ(hᵢ − μ_h)'xᵢ] = 0 under E(uᵢ|xᵢ) = 0, so by the law of large numbers the sample average is op(1), while √N(β̂ − β) = Op(1). The third term is N^(-1/2)·Op(1)·Op(1) = op(1), because the law of large numbers implies that the sample average is Op(1) and vec[√N(β̂ − β)√N(β̂ − β)'] = Op(1). Finally, N^(-1/2) Σᵢ (hᵢ − μ_h)' = Op(1) by the central limit theorem and σ̂² − σ² = op(1), so the fourth term is Op(1)·op(1) = op(1). We have shown that the last two terms in (6.40) are op(1), which proves part (a):

  N^(-1/2) Σᵢ (hᵢ − μ_h)'(ûᵢ² − σ̂²) = N^(-1/2) Σᵢ (hᵢ − μ_h)'(uᵢ² − σ²) + op(1).
b. By part (a), the asymptotic variance of N^(-1/2) Σᵢ (hᵢ − μ_h)'(ûᵢ² − σ̂²) is

  Var[(hᵢ − μ_h)'(uᵢ² − σ²)] = E[(uᵢ² − σ²)²(hᵢ − μ_h)'(hᵢ − μ_h)].

Under the null, E(uᵢ²|xᵢ) = Var(uᵢ|xᵢ) = σ² [since E(uᵢ|xᵢ) = 0 is assumed]. Now (uᵢ² − σ²)² = uᵢ⁴ − 2σ²uᵢ² + σ⁴, and therefore, when we add (6.37),

  E[(uᵢ² − σ²)²|xᵢ] = κ₂ − 2σ²·σ² + σ⁴ = κ₂ − σ⁴ ≡ η².

A standard iterated expectations argument gives

  E[(uᵢ² − σ²)²(hᵢ − μ_h)'(hᵢ − μ_h)] = E{E[(uᵢ² − σ²)²|xᵢ](hᵢ − μ_h)'(hᵢ − μ_h)}
                                      = η² E[(hᵢ − μ_h)'(hᵢ − μ_h)]

[since hᵢ = h(xᵢ)]. This is what we wanted to show. (Whether we do the argument for a random draw i or for random variables representing the population is a matter of taste.)

c. From part (b) and Lemma 3.8, the following statistic has an asymptotic chi-square distribution with Q degrees of freedom:

  [N^(-1/2) Σᵢ (ûᵢ² − σ̂²)hᵢ]{η² E[(hᵢ − μ_h)'(hᵢ − μ_h)]}^(-1)[N^(-1/2) Σᵢ hᵢ'(ûᵢ² − σ̂²)].

Using the fact that Σᵢ (ûᵢ² − σ̂²) = 0, we can replace hᵢ with hᵢ − h̄ in the two vectors forming the quadratic form. Then, again by Lemma 3.8, we can replace the matrix in the quadratic form with a consistent estimator, which is

  η̂²[N^(-1) Σᵢ (hᵢ − h̄)'(hᵢ − h̄)],  where η̂² = N^(-1) Σᵢ (ûᵢ² − σ̂²)².

The computable statistic, after simple algebra, can be written as

  [Σᵢ (ûᵢ² − σ̂²)(hᵢ − h̄)][Σᵢ (hᵢ − h̄)'(hᵢ − h̄)]^(-1)[Σᵢ (hᵢ − h̄)'(ûᵢ² − σ̂²)]/η̂².

Now η̂² is just the total sum of squares in the ûᵢ², divided by N. The numerator of the statistic is simply the explained sum of squares from the regression ûᵢ² on 1, hᵢ, i = 1, ..., N. Therefore, the test statistic is N times the usual (centered) R-squared from the regression ûᵢ² on 1, hᵢ, i = 1, ..., N, or N·Rc².

d. Without assumption (6.37) we need to estimate E[(uᵢ² − σ²)²(hᵢ − μ_h)'(hᵢ − μ_h)] generally. Hopefully, the approach is by now pretty clear: we replace the population expected value with the sample average and replace any unknown parameters (β, σ², and μ_h in this case) with their consistent estimators (under H0). So a generally consistent estimator of Avar[N^(-1/2) Σᵢ hᵢ'(ûᵢ² − σ̂²)] is

  N^(-1) Σᵢ (ûᵢ² − σ̂²)²(hᵢ − h̄)'(hᵢ − h̄),

and the test statistic robust to heterokurtosis can be written as

  [Σᵢ (ûᵢ² − σ̂²)(hᵢ − h̄)][Σᵢ (ûᵢ² − σ̂²)²(hᵢ − h̄)'(hᵢ − h̄)]^(-1)[Σᵢ (hᵢ − h̄)'(ûᵢ² − σ̂²)],

which is easily seen to be the explained sum of squares from the regression of 1 on (ûᵢ² − σ̂²)(hᵢ − h̄), i = 1, ..., N (without an intercept). Since the total sum of squares, without demeaning, is N = (1 + 1 + ... + 1) (N times), the statistic is equivalent to N − SSR₀, where SSR₀ is the sum of squared residuals.
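The N·Rc² statistic from part (c) is easy to compute directly. The following sketch is an addition, not part of the original solutions: plain Python with hypothetical residuals and a single test function hᵢ, so the auxiliary regression of ûᵢ² on (1, hᵢ) has a closed-form simple-regression solution.

```python
# Compute the homokurtosis-based statistic N * R_c^2 from part (c):
# regress uhat_i^2 on an intercept and one test function h_i, and take
# N times the centered R-squared. Data below are hypothetical.

uhat = [0.5, -1.2, 0.3, 2.1, -0.4, 0.8, -1.5, 0.2, 1.1, -0.6]
h = [1.0, 2.0, 0.5, 3.0, 1.5, 1.0, 2.5, 0.5, 2.0, 1.0]  # e.g. h_i = x_i^2

N = len(uhat)
u2 = [u * u for u in uhat]          # uhat_i^2; sigma^2-hat is their mean
u2bar = sum(u2) / N
hbar = sum(h) / N

# Simple regression of u2 on (1, h): explained and total sums of squares
s_hh = sum((hi - hbar) ** 2 for hi in h)
s_hu = sum((hi - hbar) * (ui - u2bar) for hi, ui in zip(h, u2))
ess = s_hu ** 2 / s_hh              # explained sum of squares
tss = sum((ui - u2bar) ** 2 for ui in u2)

stat = N * ess / tss                # N * R_c^2; here chi-square with 1 df
assert 0.0 <= stat <= N             # R-squared lies in [0, 1]
```

Under H0 (and the homokurtosis assumption), `stat` would be compared to a chi-square critical value with Q degrees of freedom, Q = 1 here because a single test function is used.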
6.7. a. The simple regression results are
. reg lprice ldist if y81
      Source |       SS       df       MS                  Number of obs =     142
-------------+------------------------------               F(  1,   140) =   30.79
       Model |  3.86426989     1  3.86426989               Prob > F      =  0.0000
    Residual |  17.5730845   140  .125522032               R-squared     =  0.1803
-------------+------------------------------               Adj R-squared =  0.1744
       Total |  21.4373543   141  .152037974               Root MSE      =  .35429

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ldist |   .3648752   .0657613     5.548  0.000      .2348615    .4948889
       _cons |   8.047158   .6462419    12.452  0.000      6.769503    9.324813
------------------------------------------------------------------------------
This regression suggests a strong link between housing price and distance from
the incinerator (as distance increases, so does housing price).
The elasticity
is .365 and the t statistic is 5.55.
However, this is not a good causal
regression:
the incinerator may have been put near homes with lower values to
begin with.
If so, we would expect the positive relationship found in the
simple regression even if the new incinerator had no effect on housing prices.
b. The parameter d3 should be positive:
after the incinerator is built a
house should be worth more the farther it is from the incinerator.
Here is my
Stata session:
. gen y81ldist = y81*ldist
. reg lprice y81 ldist y81ldist
      Source |       SS       df       MS                  Number of obs =     321
-------------+------------------------------               F(  3,   317) =   69.22
       Model |  24.3172548     3  8.10575159               Prob > F      =  0.0000
    Residual |  37.1217306   317  .117103251               R-squared     =  0.3958
-------------+------------------------------               Adj R-squared =  0.3901
       Total |  61.4389853   320  .191996829               Root MSE      =   .3422

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y81 |  -.0113101   .8050622    -0.014  0.989      -1.59525     1.57263
       ldist |    .316689   .0515323     6.145  0.000      .2153006    .4180775
    y81ldist |   .0481862   .0817929     0.589  0.556     -.1127394    .2091117
       _cons |   8.058468   .5084358    15.850  0.000      7.058133    9.058803
------------------------------------------------------------------------------
The coefficient on ldist reveals the shortcoming of the regression in part (a).
This coefficient measures the relationship between lprice and ldist in 1978,
before the incinerator was even being rumored.
The effect of the incinerator
is given by the coefficient on the interaction, y81ldist.
While the direction
of the effect is as expected, it is not especially large, and it is
statistically insignificant anyway.
Therefore, at this point, we cannot reject
the null hypothesis that building the incinerator had no effect on housing
prices.
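The difference-in-differences logic behind the y81*ldist interaction can be made concrete with a binary "near the incinerator" indicator in place of the continuous ldist. This is an illustration I have added, not the regression in the text, and the data are hypothetical; in the saturated two-group, two-period model the OLS coefficient on the interaction equals the difference-in-differences of the four cell means.

```python
# In the saturated model
#   lprice = b0 + d0*y81 + b1*near + d3*y81*near + u,
# the OLS coefficient d3 equals the change in mean log price for homes
# near the incinerator minus the change for homes far away.
# Hypothetical log prices; 'near' is a stand-in for the continuous ldist.

data = [  # (y81, near, lprice)
    (0, 0, 11.2), (0, 0, 11.0), (0, 1, 10.6), (0, 1, 10.4),
    (1, 0, 11.5), (1, 0, 11.7), (1, 1, 10.8), (1, 1, 10.9),
]

def cell_mean(y81, near):
    vals = [p for t, n, p in data if t == y81 and n == near]
    return sum(vals) / len(vals)

d3 = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
assert abs(d3 - (-0.15)) < 1e-9   # near homes appreciated 0.15 less in logs
```

With the continuous ldist of the text, the interaction coefficient is interpreted analogously: the change over time in the elasticity of price with respect to distance.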
c. Adding the variables listed in the problem gives

. reg lprice y81 ldist y81ldist lintst lintstsq larea lland age agesq rooms baths

      Source |       SS       df       MS                  Number of obs =     321
-------------+------------------------------               F( 11,   309) =  108.04
       Model |  48.7611143    11  4.43282858               Prob > F      =  0.0000
    Residual |   12.677871   309  .041028709               R-squared     =  0.7937
-------------+------------------------------               Adj R-squared =  0.7863
       Total |  61.4389853   320  .191996829               Root MSE      =  .20256

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    y81ldist |   .0617759   .0495705     1.25   0.214     -.0357625    .1593143
------------------------------------------------------------------------------
(other rows omitted)

The incinerator effect is now larger (the elasticity is about .062) and the t statistic is larger, but the interaction is still statistically insignificant. Using these models and these two years of data, we must conclude the evidence that housing prices were adversely affected by the new incinerator is somewhat weak.

6.9. a. The Stata results are

. reg ldurat afchnge highearn afhigh male married head-construc if ky

      Source |       SS       df       MS                  Number of obs =    5349
-------------+------------------------------               F( 14,  5334) =   16.37
       Model |  358.441793    14  25.6029852               Prob > F      =  0.0000
    Residual |  8341.41206  5334  1.56381928               R-squared     =  0.0412
-------------+------------------------------               Adj R-squared =  0.0387
       Total |  8699.85385  5348  1.62674904               Root MSE      =  1.2505

------------------------------------------------------------------------------
      ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      afhigh |   .2308768   .0695248     3.32   0.001      .0945798    .3671738
------------------------------------------------------------------------------
(other rows omitted)

The estimated coefficient on the interaction term is actually higher now, and even more statistically significant than in equation (6.33). Adding the other explanatory variables only slightly increased the standard error on the interaction term.

b. The small R-squared, on the order of 4.1%, or 3.9% if we used the adjusted R-squared, means that we cannot explain much of the variation in time on workers compensation using the variables included in the regression. This is often the case in the social sciences: it is very difficult to include the multitude of factors that can affect something like durat. The low R-squared means that making predictions of log(durat) would be very difficult given the factors we have included in the regression: the variation in the unobservables pretty much swamps the explained variation. However, the low R-squared does not mean we have a biased or inconsistent estimator of the effect of the policy change. Provided the Kentucky change is a good natural experiment, the OLS estimator is consistent. With over 5,000 observations, we can get a reasonably precise estimate of the effect.
c. Using the data for Michigan to estimate the simple model gives

. reg ldurat afchnge highearn afhigh if mi

      Source |       SS       df       MS                  Number of obs =    1524
-------------+------------------------------               F(  3,  1520) =    6.05
       Model |  34.3850177     3  11.4616726               Prob > F      =  0.0004
    Residual |  2879.96981  1520  1.89471698               R-squared     =  0.0118
-------------+------------------------------               Adj R-squared =  0.0098
       Total |  2914.35483  1523  1.91356194               Root MSE      =  1.3765

------------------------------------------------------------------------------
      ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      afhigh |   .1919906   .1541699     1.25   0.213     -.1104176    .4943988
       _cons |   1.412737   .0567172    24.91   0.000      1.301485    1.523989
------------------------------------------------------------------------------
(other rows omitted)

The coefficient on the interaction term, .192, is remarkably similar to that for Kentucky. Unfortunately, because of the many fewer observations, the t statistic is insignificant at the 10% level against a one-sided alternative, and the 95% confidence interval is pretty wide. Asymptotic theory predicts that the standard error for Michigan will be about (5,626/1,524)^(1/2) ≈ 1.92 times larger than that for Kentucky; in fact, the ratio of standard errors is about 2.2. The difference in the KY and MI cases shows the importance of a large sample size for this kind of policy analysis.

6.11. The following is Stata output that I will use to answer the first three parts:

. reg lwage y85 educ y85educ exper expersq union female y85fem
      Source |       SS       df       MS                  Number of obs =    1084
-------------+------------------------------               F(  8,  1075) =   99.80
       Model |  135.992074     8  16.9990092               Prob > F      =  0.0000
    Residual |  183.099094  1075  .170324738               R-squared     =  0.4262
-------------+------------------------------               Adj R-squared =  0.4219
       Total |  319.091167  1083   .29463635               Root MSE      =   .4127

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y85 |   .1178062   .1237817     0.95   0.341      -.125075    .3606874
        educ |   .0747209   .0066764    11.19   0.000      .0616206    .0878212
     y85educ |   .0184605   .0093542     1.97   0.049       .000106     .036815
       exper |   .0295843   .0035673     8.29   0.000      .0225846     .036584
     expersq |  -.0003994   .0000775    -5.15   0.000     -.0005516   -.0002473
       union |   .2021319   .0302945     6.67   0.000      .1426888    .2615749
      female |  -.3167086   .0366215    -8.65   0.000     -.3885663    -.244851
      y85fem |    .085052    .051309     1.66   0.098     -.0156251     .185729
       _cons |   .4589329   .0934485     4.91   0.000      .2755707     .642295
------------------------------------------------------------------------------

a. The return to another year of education increased by about .0185, or 1.85 percentage points, between 1978 and 1985. The t statistic is 1.97, which is marginally significant at the 5% level against a two-sided alternative.

b. The coefficient on y85fem is positive and shows that the estimated gender gap declined by about 8.5 percentage points. But the t statistic is only significant at about the 10% level against a two-sided alternative. Still, this is suggestive of some closing of wage differentials between men and women at given levels of education and workforce experience.

c. Only the coefficient on y85 changes if wages are measured in 1978 dollars. In fact, you can check that when 1978 wages are used, the coefficient on y85 becomes about -.383, which shows a significant fall in real wages for given productivity characteristics and gender over the seven-year period.
d. To answer this question, I just took the squared OLS residuals and regressed those on the year dummy, y85. The coefficient is about .042 with a standard error of about .022, which gives a t statistic of about 1.91. So there is some evidence that the variance of the unexplained part of log wages (or log real wages) has increased over time.

e. As the equation is written in the problem, the coefficient d0 is the growth in nominal wages for a male with no years of education! For a male with 12 years of education, we want q0 ≡ d0 + 12*d1. A simple way to obtain the standard error of q̂0 = d̂0 + 12*d̂1 is to replace y85*educ with y85*(educ - 12). Simple algebra shows that, in the new model, q0 is the coefficient on y85.
. This follows if the asymptotic variance matrix is block diagonal (see Section 3.^ 1 1 Avar rN(B . 34 This shows that the . we use the result under SGLS. .G. SGLS.3 sg2E(x’igxig) for all g = 1. 0 2 s2 G E(x’ iGxiG)8 0 W When this matrix is inverted. Write (with probability approaching one) B^ = B + &N1 SN X’X *1&N1 SN X’u *. we can use the special form of Xi for SUR (see Example 7. and E(uiguihx’ igxih) = E(uiguih)E(x’ igxih) = 0. from Theorem 7. it is also block diagonal.4: To establish block diagonality. 7 i=1 i i8 7 i=1 i i8 From SOLS.3.. and Slutsky’s Theorem.CHAPTER 7 7. SGLS. the WLLN implies that plim 7.1. we have * 2 2.1. all g $ h. 7 i=1 i i8 &N1 SN X’u * = 0. implies that E(uigx’ igxig) = 2 In the SUR model with diagonal ). Thus. where the blocking is by the parameter vector for each equation.2.1). asymptotic variance of what we wanted to show. plim &N1 SN X’X *1 = A1. 7 i=1 i i8 ^ & 1 N X *1Wplim &N1 SN X’u * = B + A1W0 = B. a. ) plim B = B + plim N S X’ 7 i=1 i i8 7 i=1 i i8 Further.2.1. under SOLS.B) = [E(X’ i ) Xi)] .3. Now. and SGLS. and SGLS. the fact that )1 is diagonal.. &s2 0 1 E(x’ i1xi1) 2 1 0 W E(X’ i ) Xi) = 2 W 2 7 0 0 Therefore. the weak law of large numbers.3.5). it suffices to show that the GLS estimators for different equations are asymptotically uncorrelated.. Since OLS equationbyequation is the same as GLS when ) is diagonal.
CHAPTER 7

7.1. a. Write (with probability approaching one)

  β̂ = β + [N⁻¹ Σᵢ Xᵢ'Xᵢ]⁻¹[N⁻¹ Σᵢ Xᵢ'uᵢ].

Under SOLS.2, the weak law of large numbers and Slutsky's Theorem give plim [N⁻¹ Σᵢ Xᵢ'Xᵢ]⁻¹ = A⁻¹, and under SOLS.1, plim [N⁻¹ Σᵢ Xᵢ'uᵢ] = 0. Thus,

  plim β̂ = β + A⁻¹·0 = β.

b. Since OLS equation by equation is the same as GLS when Ω is diagonal, it suffices to show that the GLS estimators for different equations are asymptotically uncorrelated. This follows if the asymptotic variance matrix is block diagonal, where the blocking is by the parameter vector for each equation. To establish block diagonality, we use the result from Theorem 7.4: under SGLS.1, SGLS.2, and SGLS.3, Avar √N(β̂ − β) = [E(Xᵢ'Ω⁻¹Xᵢ)]⁻¹. Now, we can use the special form of Xᵢ for SUR (see Example 7.1), the fact that Ω⁻¹ is diagonal, and SGLS.3, which implies that E(u²_ig x'_ig x_ig) = σ²_g E(x'_ig x_ig) for all g = 1, ..., G, and E(u_ig u_ih x'_ig x_ih) = E(u_ig u_ih)E(x'_ig x_ih) = 0, all g ≠ h. Therefore,

  E(Xᵢ'Ω⁻¹Xᵢ) = diag[σ₁⁻² E(x'_i1 x_i1), ..., σ_G⁻² E(x'_iG x_iG)].

When this matrix is inverted, it is also block diagonal. This shows that the asymptotic variance of √N(β̂ − β) is block diagonal, which is what we wanted to show.

7.3. a. This is easy with the hint. Note that

  [Ω̂⁻¹ ⊗ (Σᵢ xᵢ'xᵢ)]⁻¹ = Ω̂ ⊗ (Σᵢ xᵢ'xᵢ)⁻¹.

Therefore,

  β̂ = [Ω̂ ⊗ (Σᵢ xᵢ'xᵢ)⁻¹](Ω̂⁻¹ ⊗ I_K)[(Σᵢ xᵢ'y_i1)', ..., (Σᵢ xᵢ'y_iG)']'
     = [I_G ⊗ (Σᵢ xᵢ'xᵢ)⁻¹][(Σᵢ xᵢ'y_i1)', ..., (Σᵢ xᵢ'y_iG)']'.

Straightforward multiplication shows that the right hand side of the equation is just the vector of stacked β̂_g, where β̂_g is the OLS estimator for equation g.

b. Under SGLS.1 and SGLS.2, GLS and FGLS are asymptotically equivalent whether or not SGLS.3 holds, so √N(β̂_GLS − β̂_FGLS) = op(1). When Ω is diagonal in a SUR system, system OLS and GLS are the same. Thus, √N(β̂_SOLS − β̂_FGLS) = op(1): even if Ω̂ is estimated in an unrestricted fashion and even if the system homoskedasticity assumption SGLS.3 does not hold, OLS and FGLS are asymptotically equivalent.

c. To test any linear hypothesis, we can either construct the Wald statistic or we can use the weighted sum of squared residuals form of the statistic as in (7.52) or (7.53). For the restricted SSR we must estimate the model with the restriction β₁ = β₂ imposed. See Problem 7.6 for one way to impose general linear restrictions.
7.5. a. The GLS estimator is

  β* ≡ [Σᵢ Xᵢ'Ω⁻¹Xᵢ]⁻¹[Σᵢ Xᵢ'Ω⁻¹yᵢ] = [Σᵢ Σₜ σₜ⁻² x'_it x_it]⁻¹[Σᵢ Σₜ σₜ⁻² x'_it y_it].

The diagonal elements of Ω are easily found, since E(u²_it) = E[E(u²_it|x_it)] = σₜ² by iterated expectations.

b. First, since Ω⁻¹ is diagonal, Xᵢ'Ω⁻¹ = (σ₁⁻²x'_i1, σ₂⁻²x'_i2, ..., σ_T⁻²x'_iT), so Xᵢ'Ω⁻¹uᵢ = Σₜ σₜ⁻² x'_it u_it, and

  E(Xᵢ'Ω⁻¹uᵢ) = Σₜ σₜ⁻² E(x'_it u_it) = 0,

since E(x'_it u_it) = 0 under (7.80). Thus, GLS is consistent in this case without SGLS.1. Generally, SGLS.1 does not hold whenever there is feedback from y_it to future regressors. If, say, y_it = b0 + b1·y_i,t-1 + u_it, then x_i,t+1 = y_it is clearly correlated with u_it, and SGLS.1 fails.

c. Since Ω⁻¹ is diagonal,

  E(Xᵢ'Ω⁻¹uᵢuᵢ'Ω⁻¹Xᵢ) = Σₜ Σₛ σₜ⁻²σₛ⁻² E(u_it u_is x'_it x_is).

First consider the terms with s ≠ t; take s < t without loss of generality. The conditioning information (x_it, u_is, x_is) is a subset of the conditioning information in (7.80), so E(u_it|x_it, u_is, x_is) = 0. Applying the law of iterated expectations (LIE), E(u_it u_is x'_it x_is) = 0, s ≠ t. (The same argument gives E(u_it u_is) = E[E(u_it u_is|u_is)] = E[E(u_it|u_is)u_is] = 0.) Next, for each t,

  E(u²_it x'_it x_it) = E[E(u²_it|x_it) x'_it x_it] = σₜ² E(x'_it x_it).

It follows that

  E(Xᵢ'Ω⁻¹uᵢuᵢ'Ω⁻¹Xᵢ) = Σₜ σₜ⁻² E(x'_it x_it) = E(Xᵢ'Ω⁻¹Xᵢ),

and therefore Avar √N(β* − β) = [E(Xᵢ'Ω⁻¹Xᵢ)]⁻¹.

d. Let û_it denote the pooled OLS residuals. For each t, define

  σ̂ₜ² = N⁻¹ Σᵢ û²_it.

(We might replace N with N − K as a degrees-of-freedom adjustment.) By standard arguments, σ̂ₜ² →p σₜ² as N → ∞.
e. We have verified the assumptions under which standard FGLS statistics have nice properties (although we relaxed SGLS.1). In particular, the FGLS standard errors, t statistics, and F statistics are asymptotically valid.

f. If Ω̂ is taken to be the diagonal matrix with the σ̂ₜ² as diagonal elements, the FGLS statistics are easily shown to be identical to the statistics obtained by performing pooled OLS on the equation

  (y_it/σ̂ₜ) = (x_it/σ̂ₜ)β + error_it,  t = 1, ..., T, i = 1, ..., N.

We can obtain valid standard errors, t statistics, and F statistics from this weighted least squares analysis. For F testing, note that the σ̂ₜ² should be obtained from the pooled OLS residuals for the unrestricted model.

g. If σₜ² = σ² for all t = 1, ..., T, FGLS reduces to pooled OLS. Thus, inference is very easy: we can use the standard errors and test statistics reported by a standard OLS regression pooled across i and t.

7.9. The Stata session follows. I first test for serial correlation before computing the fully robust standard errors:

. reg lscrap d89 grant grant_1 lscrap_1 if year != 1987

      Source |       SS       df       MS                  Number of obs =     108
-------------+------------------------------               F(  4,   103) =  153.67
       Model |  186.376973     4  46.5942432               Prob > F      =  0.0000
    Residual |  31.2296502   103  .303200488               R-squared     =  0.8565
-------------+------------------------------               Adj R-squared =  0.8509
       Total |  217.606623   107  2.03370676               Root MSE      =  .55064
------------------------------------------------------------------------------
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lscrap_1 |   .8808216   .0357963    24.61   0.000       .809828    .9518152
------------------------------------------------------------------------------
(other rows omitted)

The results are certainly different from when we omit the lag of log(scrap). The estimated effect of grant, and its lag, are now the expected sign, but neither is strongly statistically significant; the variable grant would be statistically significant if we use a 10% significance level and a one-sided test. Now test for AR(1) serial correlation:

. predict uhat, resid
(363 missing values generated)

. gen uhat_1 = uhat[_n-1] if d89
(417 missing values generated)

. reg lscrap grant grant_1 lscrap_1 uhat_1 if d89

(output abbreviated: Number of obs = 54)

Next, I compute the fully robust standard errors:

. reg lscrap d89 grant grant_1 lscrap_1 if year != 1987, robust cluster(fcode)

(Regression with robust standard errors; Number of obs = 108, Number of clusters (fcode) = 54, R-squared = 0.8565, Root MSE = .55064)
The robust standard errors for grant and grant_1 are actually smaller than the usual ones, making both more statistically significant. However, grant and grant_1 are jointly insignificant:

. test grant grant_1

 ( 1)  grant = 0.0
 ( 2)  grant_1 = 0.0

(The F statistic has 2 and 53 degrees of freedom and is insignificant at the usual levels.)

7.11. a. The following Stata output should be self-explanatory. There is strong evidence of positive serial correlation in the static model:

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87

      Source |       SS       df       MS                  Number of obs =     630
-------------+------------------------------               F( 11,   618) =   74.49
       Model |  117.644669    11  10.6949699               Prob > F      =  0.0000
    Residual |   88.735673   618  .143585231               R-squared     =  0.5700
-------------+------------------------------               Adj R-squared =  0.5624
       Total |  206.380342   629  .328108652               Root MSE      =  .37893

(coefficient table omitted)
. predict uhat, resid

. gen uhat_1 = uhat[_n-1] if year > 81
(90 missing values generated)

. reg uhat uhat_1

(output abbreviated: Number of obs = 540, R-squared = 0.6071)
------------------------------------------------------------------------------
        uhat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      uhat_1 |   .7918085     .02746    28.83   0.000      .7378666    .8457504
------------------------------------------------------------------------------

Because of the strong serial correlation, I obtain the fully robust standard errors:

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87, robust cluster(county)

(Regression with robust standard errors; Number of obs = 630, Number of clusters (county) = 90, R-squared = 0.5700, Root MSE = .37893)

The fully robust standard errors are much larger than the nonrobust ones.
b. We lose the first year, 1981, when we add the lag of log(crmrte):

. gen lcrmrt_1 = lcrmrte[_n-1] if year > 81
(90 missing values generated)

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d83-d87 lcrmrt_1

      Source |       SS       df       MS                  Number of obs =     540
-------------+------------------------------               F( 11,   528) =  464.68
       Model |  163.287174    11  14.8442885               Prob > F      =  0.0000
    Residual |  16.8670945   528  .031945255               R-squared     =  0.9064
-------------+------------------------------               Adj R-squared =  0.9044
       Total |  180.154268   539  .334237975               Root MSE      =  .17873

------------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcrmrt_1 |   .8263047   .0190806    43.31   0.000      .7888214    .8637879
------------------------------------------------------------------------------
(other rows omitted)

Not surprisingly, the lagged crime rate is very significant. Further, including it makes all other coefficients much smaller in magnitude.
000 .1005518 lprbpris  .17895 lcrmrte  Coef. None of the log(wage) variables is statistically significant.0352873 0.0729231 .1746053 . predict uhat. Err.0286922 2.533423 20 8.059 with t statistic .154268 539 .334237975 Number of obs F( 20.0165559 d84  .1277591 lprbconv  .0000 0. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d83d87 lcrmrt_1 lwconlwloc Source  SS df MS +Model  163.000 .580 .0169096 7. Thus.166991 . which means that there is little evidence of serial correlation (especially since ^ r is practically small).0195318 .087 0.0088345 42 . t P>t [95% Conf.1050704 .1216644 . and the magnitudes are pretty small in all cases: .0311719 3.2214516 .1292903 .32 0.0172627 6. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d84d87 lcrmrt_1 uhat_1 From this regression the coefficient on uhat1 is only .1108926 . resid (90 missing values generated) . c.986.071157 .1389838 d83  .023 .011 .variable log(prbpris) now has a negative sign.554 0. d. Interval] +lprbarr  . There is no evidence of serial correlation in the model with a lagged dependent variable: .322 0.542 0.272 0.557 0. I will not correct the standard errors. however. We still get a positive relationship between size of police force and crime rate. Std.0287165 2.1337714 .000 .9077 0.03202475 +Total  180.0238458 7.0497918 lavgsen  .049654 lpolpc  .0652494 . gen uhat_1 = uhat[_n1] if year > 82 (180 missing values generated) . although it is insignificant.0888553 .17667116 Residual  16.1721313 .000 . 519) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 540 255.6208452 519 .911 0.9042 .
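The residual-on-lagged-residual check used in parts a and c is easy to reproduce. The following Python sketch is illustrative only, using simulated data rather than the county crime panel, but it shows why the slope in the pooled regression of the OLS residuals on their own lag recovers the AR(1) parameter of the errors:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 90, 7                        # units and periods, as in a short panel
x = rng.normal(size=(N, T))
# AR(1) errors within each unit, so pooled OLS residuals inherit the correlation
u = np.empty((N, T))
u[:, 0] = rng.normal(size=N)
for t in range(1, T):
    u[:, t] = 0.8 * u[:, t - 1] + rng.normal(size=N)
y = 1.0 + 0.5 * x + u

# pooled OLS
X = np.column_stack([np.ones(N * T), x.ravel()])
beta = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
uhat = (y.ravel() - X @ beta).reshape(N, T)

# regress uhat_t on uhat_{t-1}, pooling across units and dropping the first period
u_lag = uhat[:, :-1].ravel()
u_cur = uhat[:, 1:].ravel()
rho = (u_lag @ u_cur) / (u_lag @ u_lag)     # close to the true value 0.8
```

The analogous Stata step in the solution above is `reg uhat uhat_1`.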
d. None of the log(wage) variables is statistically significant, and the magnitudes are pretty small in all cases:

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d83-d87 lcrmrt_1 lwcon-lwloc

[regression output omitted]

The nine wage variables are also jointly insignificant:

. test lwcon lwtuc lwtrd lwfir lwser lwmfg lwfed lwsta lwloc

[joint test output: F(9, 519), Prob > F = 0.5663]

CHAPTER 8

8.1. Letting Q(b) denote the objective function in (8.23), it follows from multivariable calculus that

dQ(b)/db' = -2 [ SUM_{i=1}^N Zi'Xi ]' W^ [ SUM_{i=1}^N Zi'(yi - Xi b) ].

Evaluating the derivative at the solution B^ gives

[ SUM_{i=1}^N Zi'Xi ]' W^ [ SUM_{i=1}^N Zi'(yi - Xi B^) ] = 0.

In terms of full data matrices, we can write, after simple algebra,

(X'Z W^ Z'X) B^ = (X'Z W^ Z'Y).

Solving for B^ gives (8.24).

8.3. This follows directly from the hint. Straightforward matrix algebra shows that

(C'Lambda^{-1}C) - (C'WC)(C'W Lambda WC)^{-1}(C'WC)

can be written as C'Lambda^{-1/2}[I_L - D(D'D)^{-1}D']Lambda^{-1/2}C, where D = Lambda^{1/2}WC. Since this is a matrix quadratic form in the L x L symmetric, idempotent matrix I_L - D(D'D)^{-1}D', it is necessarily itself positive semidefinite.

8.5. First, we can always write x as its linear projection plus an error: x = x* + e, where x* = zPi and E(z'e) = 0. Therefore, E(x|z) = x* by the assumption that E(x|z) = L(x|z), which verifies the first part of the hint. To verify the second step, let h = h(z), and write the linear projection as

L(x|z,h) = zPi1 + hPi2,

where Pi1 is M x K and Pi2 is Q x K. Then we must show that Pi2 = 0. From the two-step projection theorem (see Property LP.7 in Chapter 2),

Pi2 = [E(s's)]^{-1} E(s'r),

where s = h - L(h|z) and r = x - L(x|z). But r is also equal to x - E(x|z), and so E(r|z) = 0; therefore, r is uncorrelated with all functions of z. Now, s is simply a function of z, since h = h(z). Therefore, E(s'r) = 0, and this shows that Pi2 = 0.

8.7. When the estimated error variance matrix is diagonal and Zi has the form in (8.15), SUM_{i=1}^N Zi'(diagonal matrix)Zi is a block diagonal matrix with gth block s2g(SUM_{i=1}^N zig'zig) = s2g Zg'Zg, where s2g is the estimated error variance for equation g and Zg denotes the N x Lg matrix of instruments for the gth equation. Further, Z'X is block diagonal with gth block Zg'Xg. Using these facts, it is now straightforward to show that the 3SLS estimator consists of

[Xg'Zg(Zg'Zg)^{-1}Zg'Xg]^{-1} Xg'Zg(Zg'Zg)^{-1}Zg'Yg

stacked from g = 1,...,G. This is just the system 2SLS estimator or, equivalently, 2SLS equation by equation.

8.9. a. The optimal instruments are given in Theorem 8.5 with G = 1: zi* = [w(zi)]^{-1} E(xi|zi), where w(zi) = E(ui^2|zi). If E(ui^2|zi) = s2 and E(xi|zi) = ziPi, the optimal instruments are s^{-2}ziPi. The constant multiple s^{-2} clearly has no effect on the optimal IV estimator, so the optimal instruments are ziPi. These are the optimal IVs underlying 2SLS, except that Pi is replaced with its root-N-consistent OLS estimator. The 2SLS estimator has the same asymptotic variance whether Pi or Pi^ is used, and so 2SLS is asymptotically efficient.

b. If E(u|x) = 0 and E(u^2|x) = s2, then the optimal instruments are s^{-2}E(x|x) = s^{-2}x, and this leads to the OLS estimator.

8.11. a. This is a simple application of Theorem 8.5 when G = 1. Without the i subscript, x1 = (z1,y2), and so E(x1|z) = [z1, E(y2|z)]. Further, Var(u1|z) = s21. It follows that the optimal instruments are (1/s21)[z1, E(y2|z)]. Dropping the division by s21 clearly does not affect the optimal instruments, and so the optimal IVs are [z1, E(y2|z)].

b. If y2 is binary then E(y2|z) = P(y2 = 1|z) = F(z), and so the optimal IVs are [z1, F(z)].
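The first-order condition derived in Problem 8.1, and its 2SLS special case from Problem 8.9a in which W^ = (Z'Z/N)^{-1}, can be checked numerically. This Python sketch uses simulated data; all names and numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, L = 500, 2, 3                   # L > K: overidentified
Z = rng.normal(size=(N, L))           # instruments
X = Z @ rng.normal(size=(L, K)) + 0.5 * rng.normal(size=(N, K))
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + rng.normal(size=N)

W = np.linalg.inv(Z.T @ Z / N)        # 2SLS weighting matrix

def Q(b):
    """GMM objective: [Z'(y - Xb)]' W [Z'(y - Xb)]."""
    g = Z.T @ (y - X @ b)
    return g @ W @ g

# closed form from the first-order condition: (X'Z W Z'X) b = X'Z W Z'y
A = X.T @ Z @ W @ Z.T @ X
beta_hat = np.linalg.solve(A, X.T @ Z @ W @ Z.T @ y)

# the gradient -2 (X'Z)' W Z'(y - Xb) vanishes at beta_hat
grad = -2 * (X.T @ Z) @ W @ Z.T @ (y - X @ beta_hat)
```

Because the objective is a convex quadratic in b, the estimator from the first-order condition is the global minimizer, which the assertions below confirm at perturbed points.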
CHAPTER 9

9.1. a. No. These are both choice variables of the firm, and the parameters in a two-equation system modeling one in terms of the other, and vice versa, have no economic meaning. If we want to know how a change in the price of foreign technology affects foreign technology (FT) purchases, we could use a simple regression analysis. Why would we want to hold fixed R&D spending? Clearly FT purchases and R&D spending are simultaneously chosen, but we should use a SUR model where neither is an explanatory variable in the other's equation.

b. Yes. We can certainly think of an exogenous change in law enforcement expenditures causing a reduction in crime, and we are certainly interested in such thought experiments. If we could do the appropriate experiment, where expenditures are assigned randomly across cities, then we could estimate the crime equation by OLS. The simultaneous equations model recognizes that cities choose law enforcement expenditures in part on what they expect the crime rate to be. An SEM is a convenient way to allow expenditures to depend on unobservables (to the econometrician) that affect crime.

c. No. What causal inference could one draw from this? We may be interested in the tradeoff between wages and benefits, but then either of these can be taken as the dependent variable and the analysis would be done by OLS. Of course, if we have omitted some important factors or have a measurement error problem, OLS could be inconsistent for estimating the tradeoff. But it is not a simultaneity problem.

d. Yes. We can certainly be interested in the causal effect of alcohol consumption on productivity, and therefore wage. One's hourly wage is determined by the demand for skills; alcohol consumption is determined by individual behavior.

e. No. These are both chosen by the firm, presumably to maximize profits. It makes no sense to hold advertising expenditures fixed while looking at how other variables affect price markup.

f. No. These are choice variables by the same household. It makes no sense to think about how exogenous changes in one would affect the other. Further, suppose that we look at the effects of changes in local property tax rates. We would not want to hold fixed family saving and then measure the effect of changing property taxes on housing expenditures. When the property tax changes, a family will generally adjust expenditure in all categories. A SUR system with property tax as an explanatory variable seems to be the appropriate model.

9.3. a. We can apply part b of Problem 9.2. First, the only variable excluded from the support equation is the variable mremarr; since the support equation contains one endogenous variable, visits, this equation is identified if and only if d21 != 0, that is, if mremarr actually appears in the visits equation. This ensures that there is an exogenous variable shifting the mother's reaction function that does not also shift the father's reaction function. The visits equation is identified if and only if at least one of finc and fremarr actually appears in the support equation; that is, we need d11 != 0 or d13 != 0.

b. Each equation can be estimated by 2SLS using instruments 1, finc, fremarr, dist, mremarr.

c. First, obtain the reduced form for visits:
d13. the easiest way to test the overidentifying restriction is to first estimate the visits equation by 2SLS. Let B1 denote the 7 * 1 vector of parameters in the first equation with only the normalization restriction imposed: B’1 = (1. ^ say r1. a. ^ Let u2 be the 2SLS residuals. dist. and save the residuals. as in part b. v2. dist. v2 ^ and do a (heteroskedasticityrobust) t test that the coefficient on v2 is zero.g12.g13.5. run the simple regression (without intercept) of 1 on u2r1.d14). ^ regress finc (or fremarr) on support. Then.) 9. Next. Then. mremarr. ^ ^ Then. and save the residuals. 48 . ^ Estimate this equation by OLS.visits = p20 + p21finc + p22fremarr + p23dist + p24mremarr + v2. fremarr. visits. fremarr. If this test rejects we conclude that visits is in fact endogenous in the support equation.d12. mremarr. finc. run the auxiliary regression ^ u2 on 1. finc. the sample size times the usual Rsquared from this regression is distributed asymptotically as c21 under the null hypothesis that all instruments are exogenous.d11. Assuming homoskedasticity of u2. ^ Let support denote the fitted values from the reduced form regression for support. d. run the OLS regression ^ support on 1. dist. (SSR0 is just the usual sum of squared residuals. There is one overidentifying restriction in the visits equation. assuming that d11 and d12 are both different from zero. A heteroskedasticityrobust test is also easy to obtain. N  SSR0 from this regression is asymptotically c21 under H0.
9.5. a. Let B1 denote the 7 x 1 vector of parameters in the first equation with only the normalization restriction imposed: B1' = (1,g12,g13,d11,d12,d13,d14). The restrictions d12 = 0 and d13 + d14 = 1 are obtained by choosing

R1 = [  0  0  0  0  1  0  0 ]
     [ -1  0  0  0  0  1  1 ].

Because R1 has two rows, and G - 1 = 2, the order condition is satisfied. Now, we need to check the rank condition. Letting B denote the 7 x 3 matrix of all structural parameters with only the three normalizations imposed, straightforward matrix multiplication gives

R1B = [ d12             d22               d32             ]
      [ d13 + d14 - 1   d23 + d24 - g21   d33 + d34 - g31 ].

By definition of the constraints on the first equation, the first column of R1B is zero. Next, we use the constraints in the remainder of the system to get the expression for R1B with all information imposed. But g23 = 0, d23 = 0, d24 = 0, g31 = 0, g32 = 0, and d22 = 0, and so R1B becomes

R1B = [ 0    0     d32       ]
      [ 0  -g21    d33 + d34 ].

Identification requires g21 != 0 and d32 != 0.

b. It is easy to see how to estimate the first equation under the given assumptions. Set d14 = 1 - d13 and plug this into the equation. After simple algebra we get

y1 - z4 = g12*y2 + g13*y3 + d11*z1 + d13*(z3 - z4) + u1.

This equation can be estimated by 2SLS using instruments (z1,z2,z3,z4). Note that, if we just count instruments, there are just enough instruments to estimate this equation.

9.7. a. Because alcohol and educ are endogenous in the first equation, we need at least two elements in z(2) and/or z(3) that are not also in z(1). Ideally,
we have at least one such element in z(2) and at least one such element in z(3).

b. Let z denote all nonredundant exogenous variables in the system. Then use these as instruments in a 2SLS analysis.

c. The matrix of instruments for each i is

Zi = [ zi    0            0  ]
     [ 0    (zi,educi)    0  ]
     [ 0     0            zi ].

d. Ideally, we should not make any exclusion restrictions in the reduced form for educ; that is, z(3) = z.

9.9. a. Here is my Stata output for the 3SLS estimation of (9.28) and (9.29):

. reg3 (hours lwage educ age kidslt6 kidsge6 nwifeinc) (lwage hours educ exper expersq)

Three-stage least squares regression

[output: 428 observations; endogenous variables: hours, lwage; exogenous variables: educ age kidslt6 kidsge6 nwifeinc exper expersq. In the hours equation the coefficient on lwage is about 1676.9 (se = 431.2); in the lwage equation the coefficient on hours is small and imprecisely estimated.]

Parts b through f: To be added.

9.11. a. Since z2 and z3 are both omitted from the first equation, the first equation is identified if d22 != 0 or d23 != 0 (or both, of course). The second equation is identified if and only if d11 != 0.

b. We can estimate each equation by 2SLS; for the second equation, which is just identified, this is what we would use anyway. We can also estimate the system by 3SLS.

c. We can just estimate the reduced form E(y2|z1,z2,z3) by ordinary least squares. Consistency of OLS for p11 does not hinge on the validity of the exclusion restrictions in the structural model, whereas using an SEM does. Of course, if the SEM is correctly specified, we obtain a more efficient estimator of the reduced form parameters by imposing the restrictions in estimating p11.

d. After substitution and straightforward algebra, it can be seen that p11 = d11/(1 - g12g21). Given the structural estimates, we would form p11^ = d11^/(1 - g12^g21^).

e. Whether we estimate the parameters by 2SLS or 3SLS, we will still consistently estimate g21, provided we have not misspecified that equation. But if the first equation is misspecified, we will generally inconsistently estimate d11 and g12, and so our estimate of p11 = dE(y2|z)/dz1 will be inconsistent in any case.

f. Unfortunately, I know of no econometrics packages that conveniently allow system estimation using different instruments for different equations.
41783 52 .230002 Number of obs F( 2.68006 148. Std.747 0. then OLS: .294 0.0968 Residual  35151.22775 2 1004.796 open  Coef.368841 _cons  26. Interval] +open  .194 111 568. Err.49324 0.9902 113 564.0000 0.083 3.953679 _cons  117.7966 111 316. t P>t [95% Conf.6230728 .000 85.617192 4.567103 .412473 3.852 3. Here is my Stata output. c.13.0309 0.0519014 lpcinc  .4012 1. 111) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = (2SLS) 114 2.682852 +Total  63757.388 0. Interval] +lpcinc  . and only if. reg open lpcinc lland Source  SS df MS +Model  28606.89934 15.3374871 .187 0.015081 0.8142162 9.366 0.61916 57.4487 0. 2SLS.342 0. reg inf open lpcinc (lland lpcinc) Source  SS df MS +Model  2009.79 0. Std.180527 5.) b.5464812 1. t P>t [95% Conf.000 9.0134 23.61387 Residual  63064.145892 +Total  65073.17 0. First. Err.estimator of the reduced form parameters by imposing the restrictions in estimating p11. a. (This is the rank condition.1441212 2.0845 15. Here is my Stata output: .4387 17.4217 113 575.715 2. open. The first equation is identified if.870989 Number of obs F( 2.1936 2 14303.0657 0. d22 $ 0.505435 lland  7.3758247 2. 111) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 114 45.8483 7.836 inf  Coef.489 This shows that log(land) is very statistically significant in the RF for Smaller countries are more open.021 . 9.
reg inf open lpcinc Source  SS df MS +Model  2945.70715 +Total  65073.870989 Number of obs F( 3. Std.2150695 . but we will go ahead. [log(land)] .40 inf  Coef.0453 0.975267 0.110342 Residual  65487.1060 .0175683 1.896555 3. reg inf open opensq lpcinc (lland llandsq lpcinc) Source  SS df MS +Model  414. gen opensq = open^2 . t P>t [95% Conf.273 0.09 0. of about 2. d. Interval] +53 ..10403 15.870989 Number of obs F( 2. it also You might want to test to see if open is endogenous. Not surprisingly. t P>t [95% Conf. and log(pcinc) 2 gives a heteroskedasticityrobust t statistic on [log(land)] This is borderline.009 0.4217 113 575.63 0. 2 log(land) is partially correlated with open. 110) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = (2SLS) 114 2.331026 3 138. 2 A regression of open Since is a natural 2 on log(land). we need an IV for it.7527 110 595. If we add g13open2 to the equation. [log(land)] candidate.343207 +Total  65073. has a larger standard error. Err. The Stata output for 2SLS is .993 3.027556 lpcinc  .026122 55.651 0. .4936 111 559.0946289 2.931692 _cons  25.102 5. 111) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 114 2.402583 .658 inf  Coef. Std. Interval] +open  .0281 23.92812 2 1472.96406 Residual  62127. 24.4217 113 575.20522 1.23419 The 2SLS estimate is notably larger in magnitude.025 .0764 0. Err. gen llandsq = lland^2 .
521 0.37 0.056 2.68006 148.39 0. Here is the Stata output for implementing the method described in the problem: .49324 0.54102  The squared term indicates that the impact of open on inf diminishes.4387 17. Std.4487 0.0968 Residual  35151.8142162 9.open  1.5% level against a onesided alternative.801467 81.028 4.715 2.0174527 lpcinc  .180527 5. t P>t [95% Conf.932 0. reg open lpcinc lland Source  SS df MS +Model  28606.682852 +Total  63757.870989 Number of obs F( 3.4217 113 575.612 inf  Coef.8483 7.17 0. Err.17124 19. Interval] +54 .0845 15. the estimate would be significant at about the 6. t P>t [95% Conf.000 85. reg inf openh openhsq lpcinc Source  SS df MS +Model  3743.567103 .131 .0049828 1.0311868 opensq  .953679 _cons  117.505435 lland  7.069134 0.796 open  Coef.412473 3. e.29 0. fitted values) .24 0. 110) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 114 2. Std.000 9.230 0.245 0. gen openhsq = openh^2 .5066092 2.6205699 1. Err. 111) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 114 45.1936 2 14303.9902 113 564.36141 2.489 .0000 0.607147 _cons  43.7966 111 316.72804 Residual  61330.198637 .230002 Number of obs F( 2.5464812 1.2376 110 557.0022966 .428461 . Interval] +lpcinc  .547615 +Total  65073.807 3.0879 0.0318 23. predict openh (option xb assumed.593929 4.0575 0.0075781 .18411 3 1247.
Qualitatively, the results are similar, at least to a certain extent, to the correct IV method from part d. If g13 = 0, E(open|lpcinc,lland) is linear and, as shown in Problem 9.12, both methods are consistent. But the forbidden regression implemented in this part is unnecessary, less robust, and we cannot trust the standard errors, anyway.

CHAPTER 10

10.1. a. Standard investment theories suggest that, ceteris paribus, larger marginal tax rates decrease investment. Putting the unobserved effect ci in the equation is a simple way to account for time-constant features of a county that affect investment and might also be correlated with the tax variable. Something like "average" county economic climate, which affects investment, could easily be correlated with tax rates because tax rates are, at least to a certain extent, selected by state and local officials. If only a cross section were available, we would have to find an instrument for the tax variable that is uncorrelated with ci and correlated with the tax rate. This is often a difficult task.

b. Since investment is likely to be affected by macroeconomic factors, it is important to allow for these by including separate time intercepts; this is done by using T - 1 time period dummies.

c. I would start with a fixed effects analysis to allow arbitrary correlation between all time-varying explanatory variables and ci. (Actually,
doing pooled OLS is a useful initial exercise; these results can be compared with those from an FE analysis.) Such an analysis assumes strict exogeneity of zit, taxit, and disasterit in the sense that these are uncorrelated with the errors uis for all t and s. I have no strong intuition for the likely serial correlation properties of the {uit}. These might have little serial correlation because we have allowed for ci, in which case I would use standard fixed effects. However, it seems more likely that the uit are positively autocorrelated, in which case I might use first differencing instead. In either case, I would compute the fully robust standard errors along with the usual ones. Remember, with first differencing it is easy to test whether the changes Duit are serially uncorrelated.

d. If taxit and disasterit do not have lagged effects on investment, then the only possible violation of the strict exogeneity assumption is if future values of these variables are correlated with uit. Presumably, this is not a worry for the disaster variable: it is safe to say that future natural disasters are not determined by past investment. On the other hand, state officials might look at the levels of past investment in determining future tax policy, especially if there is a target level of tax revenue the officials are trying to achieve. This could be similar to setting property tax rates: sometimes property tax rates are set depending on recent housing values, since a larger base means a smaller rate can achieve the same amount of revenue. Given that we allow taxit to be correlated with ci, this might not be much of a problem. But it cannot be ruled out ahead of time.

10.3. a. Let xbar_i = (xi1 + xi2)/2 and ybar_i = (yi1 + yi2)/2,
i = 1,...,N, denote the time averages for T = 2. Then the time-demeaned variables are

xddot_i1 = xi1 - xbar_i = (xi1 - xi2)/2 = -Dxi/2,   xddot_i2 = xi2 - xbar_i = (xi2 - xi1)/2 = Dxi/2,

and similarly yddot_i1 = -Dyi/2 and yddot_i2 = Dyi/2. For T = 2 the fixed effects estimator can be written as

B^FE = [ SUM_i (xddot_i1'xddot_i1 + xddot_i2'xddot_i2) ]^{-1} [ SUM_i (xddot_i1'yddot_i1 + xddot_i2'yddot_i2) ].

Now, by simple algebra,

xddot_i1'xddot_i1 + xddot_i2'xddot_i2 = Dxi'Dxi/4 + Dxi'Dxi/4 = Dxi'Dxi/2,
xddot_i1'yddot_i1 + xddot_i2'yddot_i2 = Dxi'Dyi/4 + Dxi'Dyi/4 = Dxi'Dyi/2.

Therefore,

B^FE = [ SUM_i Dxi'Dxi/2 ]^{-1} [ SUM_i Dxi'Dyi/2 ] = [ SUM_i Dxi'Dxi ]^{-1} [ SUM_i Dxi'Dyi ] = B^FD.

b. Let u^i1 = yddot_i1 - xddot_i1 B^FE and u^i2 = yddot_i2 - xddot_i2 B^FE be the fixed effects residuals for the two time periods for cross section observation i. Since B^FE = B^FD, and using the representations above, we have

u^i1 = -Dyi/2 - (-Dxi/2)B^FD = -(Dyi - Dxi B^FD)/2 = -e^i/2,
u^i2 = Dyi/2 - (Dxi/2)B^FD = (Dyi - Dxi B^FD)/2 = e^i/2,

where e^i = Dyi - Dxi B^FD are the first difference residuals. Therefore,

SUM_i (u^i1^2 + u^i2^2) = (1/2) SUM_i e^i^2.

This shows that the sum of squared residuals from the fixed effects regression is exactly one half the sum of squared residuals from the first difference regression.
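The two T = 2 equivalences just derived, B^FE = B^FD and SSR(FD) = 2*SSR(FE), can be confirmed numerically. A small, illustrative Python simulation:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200
c = rng.normal(size=N)                      # unobserved effect
x = rng.normal(size=(N, 2)) + c[:, None]    # regressor correlated with c
y = 2.0 * x + c[:, None] + rng.normal(size=(N, 2))

# fixed effects: demean within each cross-section unit
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_fe = (xd.ravel() @ yd.ravel()) / (xd.ravel() @ xd.ravel())
ssr_fe = np.sum((yd - b_fe * xd) ** 2)

# first differencing
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
b_fd = (dx @ dy) / (dx @ dx)
ssr_fd = np.sum((dy - b_fd * dx) ** 2)
```

Pooled OLS on the undifferenced data would be biased here because x is correlated with c; the within and differencing transformations both remove c, and with T = 2 they are the same estimator.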
Since the variance estimate for fixed effects is the SSR divided by N - K, and the variance estimate for first difference is the SSR divided by N - K (when T = 2), the error variance estimate from fixed effects is always half the size of the error variance estimate from first difference estimation: su2^ = se2^/2 (contrary to what the problem asks you to show). What I wanted you to show is that the variance matrix estimates of B^FE and B^FD are identical. This is easy, since the variance matrix estimate for fixed effects is

su2^ [ SUM_i (xddot_i1'xddot_i1 + xddot_i2'xddot_i2) ]^{-1} = (se2^/2) [ SUM_i Dxi'Dxi/2 ]^{-1} = se2^ [ SUM_i Dxi'Dxi ]^{-1},

which is the variance matrix estimator for first difference. Therefore, the standard errors, and in fact all other test statistics (F statistics), will be numerically identical using the two approaches.

10.5. a. Write vivi' = ci^2 jTjT' + uiui' + jT(ciui') + (ciui)jT'. Under RE.1, E(ui|xi,ci) = 0, which implies that E[(ciui')|xi] = 0 by iterated expectations. Under RE.3a, E(uiui'|xi,ci) = su2 IT, which implies that E(uiui'|xi) = su2 IT (again, by iterated expectations). Therefore,

E(vivi'|xi) = E(ci^2|xi)jTjT' + E(uiui'|xi) = h(xi)jTjT' + su2 IT,

where h(xi) = Var(ci|xi) = E(ci^2|xi) (by RE.1b). This shows that the conditional variance matrix of vi given xi has the same covariance for all t != s, h(xi), and the same variance for all t, h(xi) + su2. Therefore, while the variances and covariances depend on xi in general, they do not depend on time separately.

b. The RE estimator is still consistent and root-N-asymptotically normal without assumption RE.3b, but the usual random effects variance estimator of B^RE is no longer valid because E(vivi'|xi) does not have the form (10.30)
(because it depends on xi). The robust variance matrix estimator given in (7.49) should be used in obtaining standard errors or Wald statistics.

10.7. I provide annotated Stata output, and I compute the nonrobust regression-based statistic from equation (11.79):

. * random effects estimation

. iis id

. tis term

. xtreg trmgpa spring crsgpa frstsem season sat verbmath hsperc hssize black female, re

[random effects output: 732 observations, 366 students, T = 2; theta = .3862]

. * fixed effects estimation, with time-varying variables only

. xtreg trmgpa spring crsgpa frstsem season, fe

[fixed effects output omitted]

. * Obtaining the regression-based Hausman test is a bit tedious. First, compute the time averages for all of the time-varying variables:

. egen atrmgpa = mean(trmgpa), by(id)

. egen aspring = mean(spring), by(id)

. egen acrsgpa = mean(crsgpa), by(id)

. egen afrstsem = mean(frstsem), by(id)

. egen aseason = mean(season), by(id)

. * Now obtain the GLS transformations for both time-constant and time-varying variables. Note that theta-hat = .386:

. di 1 - .386
.614

. gen bone = .614

. gen bsat = .614*sat

. gen bvrbmth = .614*verbmath

. gen bhsperc = .614*hsperc

. gen bhssize = .614*hssize

. gen bblack = .614*black

. gen bfemale = .614*female

. gen btrmgpa = trmgpa - .386*atrmgpa

. gen bspring = spring - .386*aspring

. gen bcrsgpa = crsgpa - .386*acrsgpa

. gen bfrstsem = frstsem - .386*afrstsem

. gen bseason = season - .386*aseason

. * Check to make sure that pooled OLS on the transformed data is random effects:

. reg btrmgpa bone bspring bcrsgpa bfrstsem bseason bsat bvrbmth bhsperc bhssize bblack bfemale, nocons

[output omitted; these are the RE estimates, subject to rounding error.]

. * Now add the time averages of the variables that change across i and t to perform the Hausman test:

. reg btrmgpa bone bspring bcrsgpa bfrstsem bseason bsat bvrbmth bhsperc bhssize bblack bfemale acrsgpa afrstsem aseason, nocons

[output omitted]

. test acrsgpa afrstsem aseason

( 1) acrsgpa = 0.0
( 2) afrstsem = 0.0
( 3) aseason = 0.0

F( 3, 718) = 0.61
Prob > F = 0.6085

Thus, we fail to reject the random effects assumptions even at very large significance levels. For comparison, the usual form of the Hausman test (using Stata 7.0), which includes spring among the coefficients tested, gives a similar conclusion. It would have been easy to make the regression-based test robust to any violation of RE.3: add ", robust cluster(id)" to the regression command.
10.9. a. We can write

yit = xitB + wiX + rit,  t = 1,...,T,

where xit includes an overall intercept along with time dummies and the covariates that change across i and t, and wi contains the time-constant covariates, excluding the dummy variables. We can estimate this equation by random effects and test H0: X = 0. I should have done a better job of spelling this out in the text. The actual calculation for this example is to be added.

Parts b, c, and d: To be added.

10.11. To be added.

10.13. This is another case where "estimating" the fixed effects leads to an estimator of B with good properties. The short answer is: as N grows with T fixed, "fixed effects weighted least squares," where the weights are known functions of exogenous variables (including xi and possibly other covariates that do not appear in the conditional mean), produces a root-N-consistent, asymptotically normal estimator of B. In other words, we can justify this procedure with fixed T. (As usual with fixed T, there is no sense in which we can estimate the ci consistently.) Verifying this claim takes much more work, but it is mostly just algebra. First, we can "concentrate" the ai out by finding ai(b) as a function of (xi,yi) and b, substituting back into the
Straightforward algebra gives the first order conditions for each i as

  Σ_{t=1}^T (y_it − â_i − x_it·b)/h_it = 0,

which gives

  â_i(b) = w_i·Σ_{t=1}^T (y_it/h_it) − [w_i·Σ_{t=1}^T (x_it/h_it)]·b ≡ ȳ_i^w − x̄_i^w·b,

where w_i ≡ 1/[Σ_{t=1}^T (1/h_it)] > 0 and ȳ_i^w ≡ w_i·Σ_{t=1}^T (y_it/h_it); a similar definition holds for x̄_i^w. Note that ȳ_i^w and x̄_i^w are simply weighted averages. Note carefully how the initial y_it are weighted by 1/h_it to obtain ȳ_i^w. If h_it equals the same constant for all t, ȳ_i^w and x̄_i^w are the usual time averages.

Now we can plug each â_i(b) into the SSR to get the problem solved by B̂:

  min_{b∈R^K} Σ_{i=1}^N Σ_{t=1}^T [(y_it − ȳ_i^w) − (x_it − x̄_i^w)·b]²/h_it.

But this is just a pooled weighted least squares regression of (y_it − ȳ_i^w) on (x_it − x̄_i^w) with weights 1/h_it. Equivalently, define ỹ_it ≡ (y_it − ȳ_i^w)/√h_it and x̃_it ≡ (x_it − x̄_i^w)/√h_it, all t = 1,...,T, i = 1,...,N. Then B̂ can be expressed in the usual pooled OLS form:

  B̂ = (Σ_{i=1}^N Σ_{t=1}^T x̃_it′x̃_it)⁻¹(Σ_{i=1}^N Σ_{t=1}^T x̃_it′ỹ_it).  (10.82)

This is a fixed effects estimator, but where the 1/√h_it weighting shows up in the sum of squared residuals on the time-demeaned data (where the demeaning is a weighted average).

Given (10.82), we can study the asymptotic (N → ∞) properties of B̂. First, it is easy to show that ȳ_i^w = x̄_i^w·B + c_i + ū_i^w, where ū_i^w ≡ w_i·Σ_{t=1}^T (u_it/h_it). Subtracting this equation from y_it = x_it·B + c_i + u_it for all t gives y_it − ȳ_i^w = (x_it − x̄_i^w)·B + u_it − ū_i^w, or ỹ_it = x̃_it·B + ũ_it, where ũ_it ≡ (u_it − ū_i^w)/√h_it. When we plug this in for ỹ_it in (10.82) and divide by N in the appropriate places we get
  B̂ = B + (N⁻¹·Σ_{i=1}^N Σ_{t=1}^T x̃_it′x̃_it)⁻¹(N⁻¹·Σ_{i=1}^N Σ_{t=1}^T x̃_it′ũ_it).  (10.83)

From (10.83) we can immediately read off the consistency of B̂. Why? We assumed that E(u_it | x_i, h_i, c_i) = 0, which means ũ_it is uncorrelated with any function of (x_i, h_i), including x̃_it. So E(x̃_it′ũ_it) = 0, t = 1,...,T. As long as we assume rank[Σ_{t=1}^T E(x̃_it′x̃_it)] = K, we can use the usual proof to show plim(B̂) = B. (We can even show that E(B̂ | X, H) = B.) It is also clear from (10.83) that B̂ is √N-asymptotically normal under mild assumptions. The asymptotic variance is generally Avar √N(B̂ − B) = A⁻¹·B·A⁻¹, where A ≡ Σ_{t=1}^T E(x̃_it′x̃_it) and B ≡ Var(Σ_{t=1}^T x̃_it′u_it/√h_it). Straightforward algebra shows that Σ_{t=1}^T x̃_it′ũ_it = Σ_{t=1}^T x̃_it′u_it/√h_it, and so we have the convenient expression

  B̂ = B + (N⁻¹·Σ_{i=1}^N Σ_{t=1}^T x̃_it′x̃_it)⁻¹(N⁻¹·Σ_{i=1}^N Σ_{t=1}^T x̃_it′u_it/√h_it).  (10.84)

If we assume that Cov(u_it, u_is | x_i, h_i, c_i) = 0, t ≠ s, in addition to the variance assumption Var(u_it | x_i, h_i, c_i) = σ_u²·h_it, then it is easily shown that B = σ_u²·A, and so Avar √N(B̂ − B) = σ_u²·A⁻¹.

The same subtleties that arise in estimating σ_u² for the usual fixed effects estimator crop up here as well. Assume the zero conditional covariance assumption and the correct variance specification in the previous paragraph. Then, note that the residuals from the pooled OLS regression ỹ_it on x̃_it, say r̃_it, are estimating ũ_it = (u_it − ū_i^w)/√h_it (in the sense that we obtain r̃_it from ũ_it by replacing B with B̂). Now

  E(ũ_it²) = E[(u_it/√h_it)²] − 2E[(u_it·ū_i^w)/h_it] + E[(ū_i^w)²/h_it]
      = σ_u² − 2σ_u²·E(w_i/h_it) + σ_u²·E(w_i/h_it),

where the law of iterated expectations is applied several times, and E[(ū_i^w)² | x_i, h_i] = σ_u²·w_i has been used.
Therefore, E(ũ_it²) = σ_u²·[1 − E(w_i/h_it)], t = 1,...,T, and so

  Σ_{t=1}^T E(ũ_it²) = σ_u²·{T − E[w_i·Σ_{t=1}^T (1/h_it)]} = σ_u²·(T − 1).
This contains the usual result for the within transformation as a special
case.
A consistent estimator of σ_u² is SSR/[N(T − 1) − K], where SSR is the usual sum of squared residuals from the pooled regression ỹ_it on x̃_it, and the subtraction of K is optional.
The estimator of Avar(B̂) is then

  σ̂_u²·(Σ_{i=1}^N Σ_{t=1}^T x̃_it′x̃_it)⁻¹.
If we want to allow serial correlation in the {u_it}, or allow Var(u_it | x_i, h_i, c_i) ≠ σ_u²·h_it, then we can just apply the robust sandwich formula to the pooled OLS regression of ỹ_it on x̃_it.
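The concentration argument above can be checked numerically: weighted least squares of y_it on unit dummies and x_it (weights 1/h_it) must give the same slope as pooled OLS on the weighted-demeaned, rescaled variables ỹ_it and x̃_it. A sketch with simulated data (all names and parameter values are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 6, 4
ids = np.repeat(np.arange(N), T)
x = rng.normal(size=N * T)
h = rng.uniform(0.5, 2.0, size=N * T)      # known variance weights h_it (assumed)
y = 2.0 * x + np.repeat(rng.normal(size=N), T) + rng.normal(size=N * T) * np.sqrt(h)

# (1) WLS with unit intercepts: OLS after dividing y, the dummies, and x by sqrt(h).
D = (ids[:, None] == np.arange(N)[None, :]).astype(float)
w = 1.0 / np.sqrt(h)
A = np.column_stack([D, x]) * w[:, None]
coef_dummy = np.linalg.lstsq(A, y * w, rcond=None)[0][-1]

# (2) Concentrated version: subtract the 1/h-weighted unit averages,
#     rescale by 1/sqrt(h), and run pooled OLS without intercept.
def wavg(v):
    return np.array([np.average(v[ids == i], weights=1.0 / h[ids == i]) for i in range(N)])

ytil = (y - wavg(y)[ids]) / np.sqrt(h)
xtil = (x - wavg(x)[ids]) / np.sqrt(h)
coef_demean = (xtil @ ytil) / (xtil @ xtil)
```

The two slope estimates agree up to floating-point error, which is exactly the algebraic claim that substituting â_i(b) back into the SSR yields the pooled regression of ỹ_it on x̃_it.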
CHAPTER 11
11.1. a. It is important to remember that, any time we put a variable in a
regression model (whether we are using cross section or panel data), we are
controlling for the effects of that variable on the dependent variable.
The
whole point of regression analysis is that it allows the explanatory variables
to be correlated while estimating ceteris paribus effects.
Thus, the
inclusion of yi,t1 in the equation allows progit to be correlated with
yi,t1, and also recognizes that, due to inertia, yit is often strongly
related to yi,t1.
An assumption that implies pooled OLS is consistent is

  E(u_it | z_i, x_it, y_{i,t-1}, prog_it) = 0, all t,
which is implied by but is weaker than dynamic completeness.
Without additional assumptions, the pooled OLS standard errors and test statistics need to be adjusted for heteroskedasticity and serial correlation (although the latter will not be present under dynamic completeness).
b. As we discussed in Section 7.8.2, this statement is incorrect.
Provided our interest is in E(y_it | z_i, x_it, y_{i,t-1}, prog_it), we do not care about
serial correlation in the implied errors, nor does serial correlation cause
inconsistency in the OLS estimators.
c. Such a model is the standard unobserved effects model:

  y_it = x_it·B + d1·prog_it + c_i + u_it,  t = 1,2,...,T.
We would probably assume that (xit,progit) is strictly exogenous; the weakest
form of strict exogeneity is that (xit,progit) is uncorrelated with uis for
all t and s.
Then we could estimate the equation by fixed effects or first
differencing.
If the uit are serially uncorrelated, FE is preferred.
We
could also do a GLS analysis after the fixed effects or first-differencing
transformations, but we should have a large N.
d. A model that incorporates features from parts a and c is

  y_it = x_it·B + d1·prog_it + r1·y_{i,t-1} + c_i + u_it,  t = 1,...,T.

Now, program participation can depend on unobserved city heterogeneity as well as on lagged y_it (we assume that y_i0 is observed). Fixed effects and first differencing are both inconsistent as N → ∞ with fixed T.
Assuming that E(u_it | x_i, prog_i, y_{i,t-1}, y_{i,t-2}, ..., y_i0) = 0, a consistent procedure is obtained by first differencing, to get

  Δy_it = Δx_it·B + d1·Δprog_it + r1·Δy_{i,t-1} + Δu_it,  t = 2,...,T.

At time t, Δx_it and Δprog_it can be used as their own instruments, along with y_{i,t-j} for j ≥ 2. Either pooled 2SLS or a GMM procedure can be used.
Under
strict exogeneity, past and future values of xit can also be used as
instruments.
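The first-differencing-plus-IV idea in part d can be illustrated on a simple AR(1) panel with an unobserved effect. The sketch below (simulated data; the design and parameter values are assumptions, not the problem's data) uses y_{i,t-2} as an instrument for Δy_{i,t-1} and compares the IV estimate with the inconsistent OLS estimate on the differenced equation:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, rho = 20000, 6, 0.5
c = rng.normal(size=N)                      # unobserved effect
y = np.zeros((N, T))
y[:, 0] = c + rng.normal(size=N)
for t in range(1, T):
    y[:, t] = rho * y[:, t - 1] + c + rng.normal(size=N)

dy  = y[:, 3:] - y[:, 2:-1]     # Delta y_it
dy1 = y[:, 2:-1] - y[:, 1:-2]   # Delta y_{i,t-1}; correlated with Delta u_it
z   = y[:, 1:-2]                # instrument: y_{i,t-2}, uncorrelated with Delta u_it

rho_iv  = (z * dy).sum() / (z * dy1).sum()      # pooled IV (just identified)
rho_ols = (dy1 * dy).sum() / (dy1 * dy1).sum()  # inconsistent OLS on differences
```

With a large cross section, the IV estimate is close to the true rho, while the OLS estimate on the differenced equation is badly biased toward zero and below.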
11.3. Writing y_it = b·x_it + c_i + u_it − b·r_it, the fixed effects estimator b̂_FE can be written as

  b̂_FE = b + [N⁻¹·Σ_{i=1}^N Σ_{t=1}^T (x_it − x̄_i)²]⁻¹·[N⁻¹·Σ_{i=1}^N Σ_{t=1}^T (x_it − x̄_i)(u_it − ū_i − b(r_it − r̄_i))].

Now, x_it − x̄_i = (x*_it − x̄*_i) + (r_it − r̄_i). Then, because E(r_it | x*_i, c_i) = 0 for all t, (x*_it − x̄*_i) and (r_it − r̄_i) are uncorrelated, and so

  Var(x_it − x̄_i) = Var(x*_it − x̄*_i) + Var(r_it − r̄_i), all t.

Similarly, under (11.30), (x_it − x̄_i) and (u_it − ū_i) are uncorrelated for all t. Now

  E[(x_it − x̄_i)(r_it − r̄_i)] = E[{(x*_it − x̄*_i) + (r_it − r̄_i)}(r_it − r̄_i)] = Var(r_it − r̄_i).

By the law of large numbers and the assumption of constant variances across t,

  N⁻¹·Σ_{i=1}^N Σ_{t=1}^T (x_it − x̄_i)² →p Σ_{t=1}^T Var(x_it − x̄_i) = T·[Var(x*_it − x̄*_i) + Var(r_it − r̄_i)]

and

  N⁻¹·Σ_{i=1}^N Σ_{t=1}^T (x_it − x̄_i)(u_it − ū_i − b(r_it − r̄_i)) →p −T·b·Var(r_it − r̄_i).

Therefore,

  plim b̂_FE = b − b·{Var(r_it − r̄_i)/[Var(x*_it − x̄*_i) + Var(r_it − r̄_i)]}
       = b·{1 − Var(r_it − r̄_i)/[Var(x*_it − x̄*_i) + Var(r_it − r̄_i)]}.
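The plim expression above can be checked by simulation. The sketch below (illustrative variances; the design is an assumption) uses T = 2, where the within transformation scales both Var(x*_it) and Var(r_it) by (1 − 1/T) = 1/2, so the attenuation factor is 1.5/(1.5 + 0.5) = 0.75:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, beta = 200000, 2, 1.0
xstar = rng.normal(scale=np.sqrt(3.0), size=(N, T))  # true regressor, Var = 3 (assumed)
r = rng.normal(size=(N, T))                          # measurement error, Var = 1 (assumed)
c = rng.normal(size=(N, 1))                          # unobserved effect
u = rng.normal(size=(N, T))
y = beta * xstar + c + u
x = xstar + r                                        # mismeasured regressor

xd = x - x.mean(axis=1, keepdims=True)               # within transformation
yd = y - y.mean(axis=1, keepdims=True)
b_fe = (xd * yd).sum() / (xd * xd).sum()

# Var(x*_it - xbar*_i) = 3/2 and Var(r_it - rbar_i) = 1/2, so:
plim_formula = beta * (1 - 0.5 / (1.5 + 0.5))        # attenuated limit, 0.75
```

The simulated within estimator lands very close to the formula's value rather than the true beta, illustrating the attenuation bias.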
11.5. a. E(v_i | z_i, x_i) = Z_i·[E(a_i | z_i, x_i) − A] + E(u_i | z_i, x_i) = Z_i·(A − A) + 0 = 0. Next,

  Var(v_i | z_i, x_i) = Z_i·Var(a_i | z_i, x_i)·Z_i′ + Var(u_i | z_i, x_i) + Z_i·Cov(a_i, u_i | z_i, x_i) + Cov(u_i, a_i | z_i, x_i)·Z_i′
       = Z_i·Var(a_i | z_i, x_i)·Z_i′ + Var(u_i | z_i, x_i),

because a_i and u_i are uncorrelated, conditional on (z_i, x_i), by FE.1′ and the usual iterated
expectations argument.

b. If we use the usual RE analysis, we are expanding the set of explanatory variables to (z_it, x_it). From part a, we know that E(v_i | x_i, z_i) = 0, and so the usual RE estimator is consistent (as N → ∞ with fixed T) and √N-asymptotically normal, provided the rank condition, Assumption RE.2, holds. Unlike in the standard random effects model, there is conditional heteroskedasticity: Var(v_i | z_i, x_i) depends on z_i unless we restrict almost all elements of Var(a_i) to be zero (all but those corresponding to the constant in z_it). Therefore, the usual random effects inference, that is, inference based on the usual RE variance matrix estimator, will be invalid. Naturally, a feasible GLS analysis with any Ω̂ will be consistent provided Ω̂ converges in probability to a nonsingular matrix as N → ∞; it need not be the case that Var(v_i | x_i, z_i) = plim(Ω̂), or even that Var(v_i) = plim(Ω̂). We can easily make the RE analysis fully robust to an arbitrary Var(v_i | x_i, z_i), as in equation (7.49).

c. From part a, Var(v_i | z_i, x_i) = Z_i·Λ·Z_i′ + σ_u²·I_T under the assumptions given, where Λ ≡ Var(a_i). Therefore, we are applying FGLS to the equation y_i = Z_i·A + X_i·B + v_i, where v_i = Z_i·(a_i − A) + u_i.

11.7. a. When λ_t = λ/T, t = 1,...,T, we can rearrange (11.60) to get

  y_it = x_it·B + x̄_i·λ + v_it,  t = 1,...,T.

b. Let λ̂ (along with B̂) denote the pooled OLS estimator from this equation. By standard results on partitioned regression [for example, Davidson and MacKinnon (1993, Section 1.4)], B̂ can be obtained by the following two-step procedure:
If we plug these choices of C. We can apply the results on GMM estimation in Chapter 8. and i=1t=1 11. as we are applying pooled 2SLS to the & ST E(z¨’ ¨x )* = K. t=1 70 ^ ¨ ¨ ^ If uit = y it .^ = S Tx’ i xi = S S x’ i xi. We can apply Problem 8..8. say rit.. t=1 b.25) and simplify. and * into (8.. Under (11.’i xit = S x.N.B) = su{E(X i i)[E(Z’ i Zi)] E(Z’ i Xi)} . a.. Given that the FE estimator can ^ be obtained by pooled OLS of yit on (xit . take C = E(Z i Xi). it it 8 7t=1 timedemeaned equation: rank This clearly fails if xit contains any timeconstant explanatory variables (across all i.. B^ We want to show that is the FE estimator.xi N T . W = [E(Z’ i Zi)] . and this rules out timeit it 8 7t=1 The condition rank constant instruments. and so rit = xit ..xi. * = " ¨ u .. First. The argument is very similar to the case of the fixed effects T estimator. and so * = ¨ E(uiu’ i Zi) = ¨’u u’¨ E(Z i i i Zi) = ¨’¨ s2uE(Z i Zi). (i) Regress xit on xi across all t and i. t = 1.& rit = xit .*1& N T .B ^ 2 1 ¨ ¨ 1 ¨’Z ¨ ¨ ¨ rN( . In 1 ¨’¨ ¨ ¨ particular. su2IT (by the usual iterated expectations argument). c. we obtain Avar . ^ ^ The OLS vector on rit is B.S S x’i xi8 7 S S x’i xit*8 7i=1t=1 i=1t=1 N . we can always redefine T zit so that S E(z¨’it¨zit) has full rank.xi for all t and i. in equation (8. This completes the proof...xiIK N T N T S S x. & ST E(z¨’ ¨z )* = L is also needed.xitB .1)su. it suffices to show that rit =  xit ... But if the rank condition holds. and save the 1 * K vectors of ^ residuals.xi). But ^ . and ¨’¨ ¨ ¨ E(Z i uiu’ i Zi).b. ^ (ii) Regress yit on rit across all t and i. W.’i S xit i=1t=1 i=1 t=1 i=1 = xit . where A key point is that ¨ Z’ i ui = (QTZi)’(QTui) = Z’ i QTui = Z’ i i QT is the T x T timedemeaning matrix defined in Chapter 10.80)...25).. i = 1. 2 2 S E(u¨it ) = (T .T. just as before.9. as usual)..N T .
If û_it = ÿ_it − ẍ_it·B̂ are the pooled 2SLS residuals applied to the time-demeaned data, then [N(T − 1)]⁻¹·Σ_{i=1}^N Σ_{t=1}^T û_it² is a consistent estimator of σ_u²; typically, N(T − 1) would be replaced by N(T − 1) − K as a degrees of freedom adjustment.

d. Now consider the 2SLS estimator of B from (11.81), with d1i, ..., dNi acting as their own instruments along with z_it. From Problem 5.1 (which is purely algebraic, and so applies immediately to pooled 2SLS), the 2SLS estimates can be obtained as follows: first run the regression x_it on d1i, ..., dNi, z_it across all t and i, and obtain the residuals, say s_it; second, obtain the estimates from the pooled regression with s_it in place of the endogenous elements of x_it. By the algebra of partial regression, and the fact that regressing on d1i, ..., dNi results in time demeaning, this first step is equivalent to regressing ẍ_it on z̈_it and saving the residuals, so that s_it = r̂_it for all i and t, where r̂_it are the corresponding residuals from the time-demeaned procedure in (11.79). (If some elements of x_it are included in z_it, as would usually be the case, some entries in r̂_it are identically zero for all t and i. But we can simply drop those without changing any other steps in the argument.) This proves that the 2SLS estimates of B from (11.79) and (11.81) are identical.

e. By writing down the first order conditions for the 2SLS estimates from (11.81), it is easy to show that ĉ_i = ȳ_i − x̄_i·B̂, i = 1,...,N, where B̂ is the IV estimator from (11.79). Therefore, the 2SLS residuals from (11.81) are computed as

  y_it − ĉ_i − x_it·B̂ = (y_it − ȳ_i) − (x_it − x̄_i)·B̂ = ÿ_it − ẍ_it·B̂,

which are exactly the 2SLS residuals from (11.79), where we use the fact that the time average of r̂_it for each i is identically zero. Because the N dummy variables are explicitly included in (11.81), the degrees of freedom in estimating σ_u² from part c are properly calculated.
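The partialling-out fact used repeatedly here (regressing on the unit dummies d1i, ..., dNi is the same as time demeaning) is easy to confirm numerically. A sketch with simulated, illustrative data:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 5, 3
ids = np.repeat(np.arange(N), T)
x = rng.normal(size=N * T)
y = 1.5 * x + np.repeat(rng.normal(size=N), T) + rng.normal(size=N * T)

# Slope on x from OLS of y on the N unit dummies plus x ...
D = (ids[:, None] == np.arange(N)[None, :]).astype(float)
b_dummy = np.linalg.lstsq(np.column_stack([D, x]), y, rcond=None)[0][-1]

# ... equals pooled OLS on the time-demeaned variables.
xbar = np.array([x[ids == i].mean() for i in range(N)])
ybar = np.array([y[ids == i].mean() for i in range(N)])
xd, yd = x - xbar[ids], y - ybar[ids]
b_within = (xd @ yd) / (xd @ xd)
```

This is the Frisch-Waugh partitioned-regression identity specialized to unit dummies, so the two slopes agree exactly up to rounding.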
f. The 2SLS procedure is inconsistent as N → ∞ with fixed T, as is any IV method that uses time demeaning to eliminate the unobserved effect. This is because the time-demeaned IVs will generally be correlated with some elements of u_i (usually, all elements).

g. The general, messy estimator in equation (8.27) should be used, where X and Z are replaced with Ẍ and Z̈, W = (Z̈′Z̈/N)⁻¹, and Λ̂ = N⁻¹·Σ_{i=1}^N Z̈_i′û_iû_i′Z̈_i.

11.11. Differencing twice and using the resulting cross section is easily done in Stata:

. gen cclscrap = clscrap - clscrap[_n-1] if d89
(417 missing values generated)

. gen ccgrnt = cgrant - cgrant[_n-1] if d89
(314 missing values generated)

. gen ccgrnt_1 = cgrant_1 - cgrant_1[_n-1] if d89
(314 missing values generated)

. reg cclscrap ccgrnt ccgrnt_1

      Source |       SS       df       MS              Number of obs =      54
-------------+------------------------------           F(  2,    51) =    0.97
       Model |  .958448372     2  .479224186           Prob > F      =  0.3868
    Residual |  25.2535328    51   .49516731           R-squared     =  0.0366
-------------+------------------------------           Adj R-squared = -0.0012
       Total |  26.2119812    53  .494565682           Root MSE      =  .70368

------------------------------------------------------------------------------
    cclscrap |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ccgrnt |   .1564748   .2632934     0.594   0.555    -.3721087    .6850584
    ccgrnt_1 |   .6099016   .6343411     0.961   0.341    -.6635913    1.883394
       _cons |  -.2240491    .114748    -1.953   0.056    -.4544153    .0063171
------------------------------------------------------------------------------
Alternatively, I can use fixed effects on the first differences:

. xtreg clscrap d89 cgrant cgrant_1, fe

[Fixed-effects (within) output: 108 observations, 54 groups, T = 2. The coefficient estimates on cgrant and cgrant_1, and their standard errors, are identical to those from the twice-differenced regression above.]

The joint F test for the 53 different intercepts is significant at the 5% level (p-value = 0.033). The estimates from the random growth model are pretty bad (the estimates on the grant variables are of the wrong sign) and they are very imprecise, so it is hard to know what to make of this. It does cast doubt on the standard unobserved effects model without a random growth term.

11.13. To be added.

11.15. To be added.
11.17. By definition,

  √N(Â − A) = N^(−1/2)·Σ_{i=1}^N (s_i − A) − [N⁻¹·Σ_{i=1}^N (Z_i′Z_i)⁻¹Z_i′X_i]·√N(B̂_FE − B),

where s_i ≡ (Z_i′Z_i)⁻¹Z_i′(y_i − X_i·B) and E(s_i) = A. To obtain (11.55), we use (11.54) and the representation √N(B̂_FE − B) = A⁻¹·(N^(−1/2)·Σ_{i=1}^N Ẍ_i′u_i) + op(1). Simple algebra and standard properties of Op(1) and op(1) give

  √N(Â − A) = N^(−1/2)·Σ_{i=1}^N [(s_i − A) − C·A⁻¹·Ẍ_i′u_i] + op(1),

where C ≡ E[(Z_i′Z_i)⁻¹Z_i′X_i]. By combining terms in the sum we have

  √N(Â − A) = N^(−1/2)·Σ_{i=1}^N r_i + op(1), where r_i ≡ (s_i − A) − C·A⁻¹·Ẍ_i′u_i,

which implies, by the central limit theorem and the asymptotic equivalence lemma, that √N(Â − A) is asymptotically normal with zero mean and variance E(r_i·r_i′). If we replace A, C, and B with their consistent estimators, we get exactly (11.55).
CHAPTER 12

12.1. Take the conditional expectation of equation (12.4) with respect to x, and use E(u|x) = 0:

  E{[y − m(x,Q)]² | x} = E(u² | x) + 2·[m(x,Qo) − m(x,Q)]·E(u|x) + [m(x,Qo) − m(x,Q)]²
       = E(u² | x) + [m(x,Qo) − m(x,Q)]².

Now, the first term does not depend on Q, and the second term is clearly minimized at Q = Qo (although not uniquely, in general).

12.3. a. The approximate elasticity is ∂log[Ê(y|z)]/∂log(z1) = ∂[θ̂1 + θ̂2·log(z1) + θ̂3·z2]/∂log(z1) = θ̂2.

b. The approximate semielasticity is 100·∂log[Ê(y|z)]/∂z2 = 100·θ̂3.

c. Since ∂Ê(y|z)/∂z2 = exp[θ̂1 + θ̂2·log(z1) + θ̂3·z2 + θ̂4·z2²]·(θ̂3 + 2θ̂4·z2), the turning point is z2* = −θ̂3/(2θ̂4).

12.5. a. Here ∇qm(x,Q) = exp(x1·Q1 + x2·Q2)·x, so the gradient of the mean function evaluated under the null hypothesis H0: Q2 = 0 is ∇q m̃_i = exp(x_i1·Q̃1)·x_i ≡ m̃_i·x_i, where Q̃1 is the restricted NLS estimator. With ũ_i = y_i − m̃_i, we can compute the usual LM statistic as N·R²_u from the regression ũ_i on m̃_i·x_i1, m̃_i·x_i2, i = 1,...,N.

b. For the robust test, we first regress m̃_i·x_i2 on m̃_i·x_i1 and obtain the 1 × K2 residuals, r̃_i. Then we compute the statistic as in regression (12.75).

c. We need the gradient of m(x,Q) = G[x·B + d1·(x·B)² + d2·(x·B)³], where g(·) denotes the derivative of G(·). By the chain rule,

  ∇Bm(x,Q) = g[x·B + d1·(x·B)² + d2·(x·B)³]·[x + 2d1·(x·B)·x + 3d2·(x·B)²·x]

and ∇dm(x,Q) = g[x·B + d1·(x·B)² + d2·(x·B)³]·[(x·B)², (x·B)³].
Under H0: d1 = d2 = 0, let B̃ denote the restricted NLS estimator. Then ∇Bm(x_i,Q̃) = g(x_i·B̃)·x_i and ∇dm(x_i,Q̃) = g(x_i·B̃)·[(x_i·B̃)², (x_i·B̃)³], where g̃_i ≡ g(x_i·B̃). Therefore, the usual LM statistic can be obtained as N·R²_u from the regression

  ũ_i on g̃_i·x_i, g̃_i·(x_i·B̃)², g̃_i·(x_i·B̃)³.

If G(·) is the identity function, g(·) ≡ 1, and we get RESET.

12.7. This problem involves several steps, and I will sketch how each one goes; hopefully, the notation is clear even though the actual derivatives are complicated.

a. For each g, define u_ig ≡ y_ig − m(x_ig,Qog), so that E(u_ig | x_i) = 0, g = 1,...,G. Let u_i be the G × 1 vector containing the u_ig; then E(u_iu_i′ | x_i) = E(u_iu_i′) ≡ Ωo. Let û_i be the vector of nonlinear least squares residuals; that is, do NLS for each g and collect the residuals. Then, by standard arguments, Ω̂ = N⁻¹·Σ_{i=1}^N û_iû_i′ is consistent for Ωo as N → ∞, because each NLS estimator is consistent.

b. Let G be the vector of distinct elements of Ω, the nuisance parameters in the context of two-step M-estimation. The score for observation i is

  s(w_i,Q,G) = −∇qm(x_i,Q)′·Ω⁻¹·u_i(Q),

where u_i(Q) ≡ y_i − m(x_i,Q). Each element of ∇gs_j(w_i,Q,G) is a linear combination of u_i(Q), where the linear combination is a function of (x_i,G). Since E(u_i | x_i) = 0, E[∇gs_j(w_i,Qo,Go) | x_i] = 0, and so its unconditional expectation is zero, too. This shows that we do not have to adjust for the first-stage estimation of Ωo. Alternatively, we can verify condition (12.37), which has the same consequence.
c. First, we derive Bo ≡ E[s_i(Qo,Go)s_i(Qo,Go)′]:

  Bo = E[∇qm_i(Qo)′Ωo⁻¹·u_iu_i′·Ωo⁻¹∇qm_i(Qo)]
    = E{E[∇qm_i(Qo)′Ωo⁻¹·u_iu_i′·Ωo⁻¹∇qm_i(Qo) | x_i]}
    = E[∇qm_i(Qo)′Ωo⁻¹·E(u_iu_i′ | x_i)·Ωo⁻¹∇qm_i(Qo)]
    = E[∇qm_i(Qo)′Ωo⁻¹·Ωo·Ωo⁻¹∇qm_i(Qo)] = E[∇qm_i(Qo)′Ωo⁻¹∇qm_i(Qo)].

Next, we have to derive Ao ≡ E[H_i(Qo,Go)] and show that Ao = Bo. The Jacobian of s_i(Q,G) with respect to Q can be written as

  H_i(Q,G) = ∇qm(x_i,Q)′Ω⁻¹∇qm(x_i,Q) + [I_P ⊗ u_i(Q)′]·F(x_i,Q,G),

where P is the total number of parameters and F(x_i,Q,G) is a GP × P matrix that involves Jacobians of the rows of −Ω⁻¹∇qm(x_i,Q). The key is that F(x_i,Q,G) depends on x_i, not on y_i. So the Hessian itself is complicated, but its expected value is not: since E(u_i | x_i) = 0, iterated expectations gives Ao = E[∇qm_i(Qo)′Ωo⁻¹∇qm_i(Qo)]. So we have verified (12.37), and Ao = Bo.

d. From Theorem 12.3, Avar √N(Q̂ − Qo) = Ao⁻¹ = {E[∇qm_i(Qo)′Ωo⁻¹∇qm_i(Qo)]}⁻¹. As usual, we replace expectations with sample averages and unknown parameters with estimates, and divide the result by N, to get

  Avâr(Q̂) = [Σ_{i=1}^N ∇qm_i(Q̂)′Ω̂⁻¹∇qm_i(Q̂)]⁻¹.

The estimate Ω̂ can be based on the multivariate NLS residuals or can be updated after the nonlinear SUR estimates have been obtained. (I implicitly assumed that there are no cross-equation restrictions imposed in the nonlinear SUR estimation.)

e. First, note that ∇qm_i(Qo) is a block-diagonal matrix with blocks ∇qg m_ig(Qog), each a 1 × Pg vector. If Ωo is diagonal, standard matrix multiplication shows that ∇qm_i(Qo)′Ωo⁻¹∇qm_i(Qo) is block diagonal, with g-th block σog⁻²·∇qg m°ig′∇qg m°ig. Taking expectations and inverting the result shows that

  Avar √N(Q̂g − Qog) = σog²·[E(∇qg m°ig′∇qg m°ig)]⁻¹, g = 1,...,G.

These asymptotic variances are easily seen to be the same as those for nonlinear least squares on each equation. (Note also that the nonlinear SUR estimators are then asymptotically uncorrelated across equations.) Theorem 7.5 does not extend readily to nonlinear models: the gradients differ across g even when the same regressors appear in each equation, unless Qog is the same for all g, a very restrictive assumption. For example, if mg(x_i,Qog) = exp(x_i·Qog), then ∇qg mg(x_i,Qog) = exp(x_i·Qog)·x_i, which varies across g, so the blocks are not the same even when the same regressors appear in each equation.

12.9. a. We cannot say anything in general about Med(y|x), since Med(y|x) = m(x,Bo) + Med(u|x), and Med(u|x) could be a general function of x.

b. If u and x are independent, then E(u|x) and Med(u|x) are both constants, say a and d. Then E(y|x) − Med(y|x) = a − d, which does not depend on x. In that case the partial effects of x_j on the conditional mean and conditional median are the same, and there is no ambiguity about what is "the effect of x_j on y," at least when only the mean and median are in the running.

c. Generally, we could interpret large differences between LAD and NLS as perhaps indicating an outlier problem. But it could just be that u and x are not independent.

12.11. a. For consistency of the MNLS estimator, Bo must uniquely minimize E{[y_i − m(x_i,B)]′[y_i − m(x_i,B)]}. Writing y_i − m(x_i,B) = u_i + [m(x_i,Bo) − m(x_i,B)],

  E{[y_i − m(x_i,B)]′[y_i − m(x_i,B)]} = E(u_i′u_i) + 2E{[m(x_i,Bo) − m(x_i,B)]′u_i} + E{[m(x_i,Bo) − m(x_i,B)]′[m(x_i,Bo) − m(x_i,B)]}
       = E(u_i′u_i) + E{[m(x_i,Bo) − m(x_i,B)]′[m(x_i,Bo) − m(x_i,B)]},

because the cross-product term is zero by E(u_i | x_i) = 0 and iterated expectations (as always). The first term does not depend on B, so the identification assumption is that

  E{[m(x_i,Bo) − m(x_i,B)]′[m(x_i,Bo) − m(x_i,B)]} > 0, B ≠ Bo.

In a linear model, where m(x_i,B) = X_i·B for X_i a G × K matrix, the condition is (Bo − B)′E(X_i′X_i)(Bo − B) > 0, B ≠ Bo, and this holds provided E(X_i′X_i) is positive definite.

b. Provided m(x,·) is twice continuously differentiable, there are no problems in applying Theorem 12.3. Here, Ao = E[∇bm_i(Bo)′∇bm_i(Bo)] and Bo = E[∇bm_i(Bo)′u_iu_i′∇bm_i(Bo)]. These can be consistently estimated in the obvious way after obtaining the MNLS estimators.

c. For the weighted estimator, we can apply the results on two-step M-estimation. Under general regularity conditions,

  N⁻¹·Σ_{i=1}^N [y_i − m(x_i,B)]′[W_i(D̂)]⁻¹[y_i − m(x_i,B)]

converges uniformly in probability to E{[y_i − m(x_i,B)]′[W_i(Do)]⁻¹[y_i − m(x_i,B)]}, which is just to say that the usual consistency proof can be used provided we verify identification. But we can use an argument very similar to the unweighted case to show

  E{[y_i − m(x_i,B)]′[W_i(Do)]⁻¹[y_i − m(x_i,B)]} = E{u_i′[W_i(Do)]⁻¹u_i} + E{[m(x_i,Bo) − m(x_i,B)]′[W_i(Do)]⁻¹[m(x_i,Bo) − m(x_i,B)]}.

As before, the first term does not depend on B, and the second term is minimized at B = Bo; as always, we would have to assume it is uniquely minimized. For the limiting distribution, we can ignore preliminary estimation of Do provided we have a √N-consistent estimator: ∇ds_i(Bo,Do) = (I_P ⊗ u_i)′·G(x_i,Do) for some function G(x_i,Do) that depends only on x_i, so E[∇ds_i(Bo,Do) | x_i] = 0, which implies (12.37). To obtain the asymptotic variance when the conditional variance matrix is correctly specified, that is, when Var(y_i|x_i) = Var(u_i|x_i) = W(x_i,Do), we proceed as in Problem 12.7:

  E[s_i(Bo,Do)s_i(Bo,Do)′] = E{∇bm_i(Bo)′[W_i(Do)]⁻¹u_iu_i′[W_i(Do)]⁻¹∇bm_i(Bo)}
       = E{∇bm_i(Bo)′[W_i(Do)]⁻¹E(u_iu_i′|x_i)[W_i(Do)]⁻¹∇bm_i(Bo)}
       = E{∇bm_i(Bo)′[W_i(Do)]⁻¹∇bm_i(Bo)},

and the Hessian, evaluated at (Bo,Do), can be written as ∇bm_i(Bo)′[W_i(Do)]⁻¹∇bm_i(Bo) + (I_P ⊗ u_i)′·F(x_i,Bo,Do) for some complicated function F(x_i,Bo,Do) that depends only on x_i, whose expectation is zero (by iterated expectations, as always). Therefore Ao = Bo, Avar √N(B̂ − Bo) = {E[∇bm_i(Bo)′[W_i(Do)]⁻¹∇bm_i(Bo)]}⁻¹, and a consistent estimator of Ao is

  Â = N⁻¹·Σ_{i=1}^N ∇bm(x_i,B̂)′[W_i(D̂)]⁻¹∇bm(x_i,B̂).

The consistency argument did not use the fact that W(x,D) is correctly specified for Var(y|x). Exactly the same derivation goes through, of course, but the asymptotic variance is affected because then Ao ≠ Bo, and we estimate Avar √N(B̂ − Bo) in the usual sandwich way, Â⁻¹B̂Â⁻¹, where

  B̂ = N⁻¹·Σ_{i=1}^N ∇bm(x_i,B̂)′[W_i(D̂)]⁻¹û_iû_i′[W_i(D̂)]⁻¹∇bm(x_i,B̂).

CHAPTER 13

13.1. No. We know that Qo maximizes E[log f(y_i | x_i; Q)] over Θ, where the expectation is over the joint distribution of (x_i,y_i). Because exp(·) is an increasing function, Qo therefore also maximizes exp{E[log f(y_i | x_i; Q)]} over Θ. The problem is that the expectation and the exponential function cannot be interchanged. In fact, Jensen's inequality tells us that E[f(y_i | x_i; Q)] > exp{E[log f(y_i | x_i; Q)]}.
Q). 13. g 1 E[si(Fo)si(Fo)’xi] = E{[G(Qo)’] si(Qo)si(Qo)’[G(Qo)] g g 1 1 xi} = [G(Qo)’] E[si(Qo)si(Qo)’xi][G(Qo)] 1 1 = [G(Qo)’] Ai(Qo)[G(Qo)] . a. c. b. The log likelihood for observation i is Q) _ li( log g(yi1yi2. 82 Qo maximizes E[li2(Q)].Qo)Wh(y2x. we know that. E[ri2li1(Q)yi2. and therefore Qo maximizes E[ri2li1(Q)]. 13.xi]. Qo maximizes E[ri2li1(Q)yi2. and . we just replace 1 Qo with ~g ~ 1 ~ ~ 1 Ai = [G(Q)’] Ai(Q)[G(Q)] Q~ and Fo with F~: _ ~G’1~Ai~G1. First.5.3. Qo maximizes E[li1(Q)yi2.f(yixi.xi] for all (yi2. a.xi] = ri2E[li1(Q)yi2.xi. Parts a and b essentially appear in Section 15.4. but where it is based initial on si and Ai: & SN ~sg*’& SN A~g*1& SN ~sg* LMg = 7i=1 i8 7i=1 i8 7i=1 i8 & SN ~G’1~s *’& SN ~G’1~A ~G1*1& SN ~G’1s~ * = i8 7 i i8 7i=1 8 7i=1 i=1 N ’ N 1 N & S s~ * G~1G~& S A~ * G~’G~’1& S ~s * = 7i=1 i8 7i=1 i8 7i=1 i8 N N 1 N & S ~s *’& S ~A * & S ~s * = LM.Qo).Q)]}. Since ri2 is a function of (yi2. The joint density is simply g(y1y2. In part b. Since si(Fo) = [G(Qo)’] si(Qo). for all (yi2. and we would use this in a standard MLE analysis (conditional on xi).xi). since ri2 > 1. 1 b.36).xi]. The expected Hessian form of the statistic is given in the second ~g ~g part of equation (13. = i i i 7i=1 8 7i=1 8 7i=1 8 13.xi). Similary.xi).7.x.Q) + log h(yi2xi.
E[si1(Qo)si1(Qo)’yi2. byt the conditional IM equality for the density g(y1y2. So we have verified that an unconditional IM equality holds. where Hi2(Q) = Dqsi2(Q).xi). c. by iterated expectatins. Further. by the unconditional information matrix equality for the density h(y2x. we have shown that E[si(Qo)si(Qo)’] = E[ri2Hi1(Qo)] . since ri2 and si2(Q) are functions of (yi2.xi]. where si1(Q) _ Dqli1(Q)’ and si2(Q) _ Dqli2(Q)’.xi] = 0. E[si(Qo)si(Qo)’] = E[ri2si1(Qo)si1(Qo)’] + E[si2(Qo)si2(Qo)’] + E[ri2si1(Qo)si2(Qo)’] + E[ri2si2(Qo)si1(Qo)’]. it follows that E[ri2si1(Qo)si2(Qo)’yi2.xi] = E[Hi1(Qo)yi2. expectation.xi] = 0 and.x.xi). For identification. E[si2(Qo)si2(Qo)’] = E[Hi2(Qo)].Q). and so its transpose also has zero conditional expectation. which means we can estimate the asymptotic variance of 83 rN(^Q . Combining all the pieces. we can put ri2 inside both expectations in (13. we have to assume or verify uniqueness.70) Since ri2 is a function of (yi2. where Hi1(Q) = Dqsi1(Q). E[si1(Qo)yi2. Then.Qo) by {E[Hi(Q)]}1.70). The score is si(Q) = ri2si1(Q) + si2(Q). this implies zero unconditional We have shown E[si(Qo)si(Qo)’] = E[ri2si1(Qo)si1(Qo)’] + E[si2(Qo)si2(Qo)’].  . Now by the usual conditional MLE theory. E[ri2si1(Qo)si1(Qo)’] = E[ri2Hi1(Qo)].Q). Now. (13. As usual. Therefore.so it follows that Qo maximizes ri2li1(Q) + li2(Q).E[Hi2(Qo)] = {E[ri2Dqsi1(Q) + = E[Dqli(Q)] 2 Dqsi2(Q)] _ E[Hi(Q)].
and ri2 is a function of (yi2.s. and xi in the other . if and only if 1 is positive definite.A We use a basic fact about positive definite matrices: if A and * P positive definite matrices. as we showed in part d. 1 If we could use the entire random sample for both terms. we can break the problem into needed consistent estimators of E[ri2Hi1(Qo)] and E[Hi2(Qo)]. the result conditional MLE would be more efficient than the partial MLE based on the selected sample. Ai2(Qo) Since. From part c.s. i=1 where the notation should be obvious. the asymptotic variance of the partial MLE is {E[ri2Ai1(Qo) + Ai2(Qo)]} .E[ri2Ai1(Qo) + Ai2(Qo)] 84 . Instead. for which we can use iterated expectations.d.xi]. it follows that E[ri2Ai1(Qo)] = E[ri2Hi1(Qo)]. conditions. Interestingly. Now. because E[Ai1(Qo) + Ai2(Qo)] .Qo) is  1 N N S (ri2H^i1 + H^i2). (yi2.to consistently estimate the asymptotic variance of the partial MLE. one consistent estimator of rN(^Q .d.{E[Ai1(Qo) + is p.d. even though we do not have a true conditional maximum likelihood problem. under general regularity 1 N S ri2^Ai1 consistently estimates E[ri2Hi1(Qo)]. N This implies that. This i=1 completes what we needed to show. e. the asymptotic variance would be {E[Ai1(Qo) + Ai2(Qo)]} . N S ^Ai2 is consistent for E[Hi2(Qo)] i=1 by the usual iterated expectations argument.but conditioned on different sets of variables.B is p. since Ai1(Qo) _ E[Hi1(Qo)yi2. then A . we can still used the conditional expectations of the hessians . Bonus Question: Show that if we were able to use the entire random sample. as we discussed in Chapters 12 and 13.xi). Answer: B are P 1 B . But. Similarly. by 1 N _ E[Hi2(Qo)xi]. definition. this estimator need not be positive definite. 1 Ai2(Qo)]} 1 But {E[ri2Ai1(Qo) + Ai2(Qo)]} 1 .xi) in one case.
c.d. we would consistently estimate D1 by OLS. using instruments (x1. Finally. 13. If g2 $ 1. if not impossible. We can see this by obtaining E(y1x): E(y1x) = x1D1 + Now.ri2 > 0. 2SLS using the given list of instruments is the efficient. so we cannot write E(y1x) = x1D1 + g g1(xD2) 2. To be added. No.1.these would generally improve efficiency if g2 $ 1. If E(u2x) = 2 s22. Even under homoskedasticity.11.ri2)Ai1(Qo)] is p. a.= E[(1 . and 1 . when g = x1D1 + g g1E(y22x) + E(u1x) g g1E(y22x). regression y2 on x2 consistently estimates 85 D2. the parameter g2 does not appear in the model. CHAPTER 14 14. Nonlinear functions of these can be added to the instrument list . to find analytically if b. the optimal weighting matrix that allows heteroskedasticity of unknown form should be used. single equation GMM estimator.x2). if we knew Of g1 = 0.3. 13. one could try to use the optimal instruments derived in section 14. Otherwise. g1 = 0. The simplest way to estimate (14. (since Ai1(Qo) is p.5. While the the twostep NLS estimator of .s. g g2 $ 1. in fact.9.s. course. these are difficult.d. E(y22x) $ [E(y2x)] 2. To be added.35) is by 2SLS. we cannot find E(y1x) without more assumptions.
14.5. Let Zi be a G x L matrix that is a function of xi, and let %o be the probability limit of the weighting matrix.  Then the asymptotic variance of the GMM estimator has the form (14.10) with Go = E[Zi'Ro(xi)].  For this estimator, take A = Go'%oGo and s(wi) = Go'%oZi'r(wi,Qo).  Let Zi* be the matrix of optimal instruments in (14.63), and let the optimal score function be s*(wi) = Ro(xi)')o(xi)^-1 r(wi,Qo), where we suppress the dependence of )o on xi.  Now we can verify (14.57) with r = 1:

  E[s(wi)s*(wi)'] = Go'%oE[Zi'r(wi,Qo)r(wi,Qo)')o(xi)^-1 Ro(xi)]
                  = Go'%oE[Zi'E{r(wi,Qo)r(wi,Qo)'|xi})o(xi)^-1 Ro(xi)]
                  = Go'%oE[Zi')o(xi))o(xi)^-1 Ro(xi)]
                  = Go'%oGo = A.

14.7. a. We can write the unrestricted linear projection as

  yit = pt0 + xi*Pt + vit,  t = 1,2,3,

where xi = (xi1,xi2,xi3) is 1 x 3K and Pt is 3K x 1; P is the (3 + 9K) x 1 vector obtained by stacking pt0 and Pt, t = 1,2,3.  With the restrictions imposed we have pt0 = j, t = 1,2,3, and

  P1 = [(L1 + B)', L2', L3']',
  P2 = [L1', (L2 + B)', L3']',
  P3 = [L1', L2', (L3 + B)']'.

Let Q = (j, L1', L2', L3', B')', a (1 + 4K) x 1 vector.  Then P = HQ for the (3 + 9K) x (1 + 4K) matrix H defined below.
With columns corresponding to (j, L1, L2, L3, B), the matrix H is

      | 1   0   0   0   0  |
      | 0   IK  0   0   IK |
      | 0   0   IK  0   0  |
      | 0   0   0   IK  0  |
      | 1   0   0   0   0  |
  H = | 0   IK  0   0   0  |
      | 0   0   IK  0   IK |
      | 0   0   0   IK  0  |
      | 1   0   0   0   0  |
      | 0   IK  0   0   0  |
      | 0   0   IK  0   0  |
      | 0   0   0   IK  IK |

b. With h(Q) = HQ, the minimization problem becomes

  min_{Q in R^(1+4K)} (P^ - HQ)'%^-1(P^ - HQ),

where it is assumed that no restrictions are placed on Q.  The first order condition is easily seen to be

  -2H'%^-1(P^ - HQ^) = 0,  or  (H'%^-1H)Q^ = H'%^-1P^.

Therefore, assuming H'%^-1H is nonsingular — which occurs w.p.1 when H'%o^-1H is nonsingular — we have

  Q^ = (H'%^-1H)^-1 H'%^-1 P^.

14.9. We have to verify equations (14.55) and (14.56) for the random effects and fixed effects estimators.  The choices of si1, si2 (with added i subscripts for clarity), A1, and A2 are given in the hint.  From Chapter 10, under RE.1-RE.3 we know that E(ri*ri'|xi) = su2*IT, where ri = vi - l*jT*v-bar_i.  Therefore,

  E(si1*si1') = E(X~i'ri*ri'X~i) = su2*E(X~i'X~i) = su2*A1

by the usual iterated expectations argument.  This verifies (14.55) with r = su2.  Next, si2*si1' = X..i'ui*ri'X~i.  Now, X..i'vi = X..i'(ci*jT + ui) = X..i'ui, and X..i'ri = X..i'(vi - l*jT*v-bar_i) = X..i'vi, since the within transformation annihilates jT.
So si2*si1' = X..i'ri*ri'X~i, and therefore E(si2*si1'|xi) = X..i'E(ri*ri'|xi)X~i = su2*X..i'X~i.  It follows that E(si2*si1') = su2*E(X..i'X~i).  To finish off the proof, note that X..i'X~i = X..i'(Xi - l*jT*x-bar_i) = X..i'Xi = X..i'X..i.  Therefore, E(si2*si1') = su2*E(X..i'X..i) = su2*A2.  This verifies (14.56) with r = su2.
CHAPTER 15

15.1. a. If P(y = 1|z1,z2) = F(z1*D1 + g1*z2 + g2*z2^2), then

  dP(y = 1|z1,z2)/dz2 = (g1 + 2*g2*z2)*f(z1*D1 + g1*z2 + g2*z2^2).

For given z, this is estimated as (g^1 + 2*g^2*z2)*f(z1*D^1 + g^1*z2 + g^2*z2^2), where, of course, the estimates are the probit estimates.

b. In the model P(y = 1|z1,z2,d1) = F(z1*D1 + g1*z2 + g2*d1 + g3*z2*d1), the partial effect of z2 is

  dP(y = 1|z1,z2,d1)/dz2 = (g1 + g3*d1)*f(z1*D1 + g1*z2 + g2*d1 + g3*z2*d1).

The effect of d1 is measured as the difference in the probabilities at d1 = 1 and d1 = 0:

  P(y = 1|z,d1 = 1) - P(y = 1|z,d1 = 0) = F[z1*D1 + (g1 + g3)*z2 + g2] - F(z1*D1 + g1*z2).

Again, to estimate these effects at given z, we just replace the parameters with their probit estimates.

15.3. a. Since the regressors are all orthogonal by construction — dki*dmi = 0 for k ≠ m, and all i — the coefficient on dm is obtained from the simple regression of yi on dmi, i = 1,...,N.  But this is easily seen to be the fraction of yi in the sample falling into category m.  Therefore, the fitted values are just the cell frequencies, and these are necessarily in [0,1].

b. The fitted values for each category will be the same:  the estimates are the probit estimates, but the fitted probabilities are again the cell frequencies.  If we drop d1 but add an overall intercept, the overall intercept becomes the cell frequency for the first category, and the coefficient on dm becomes the difference in cell frequency between category m and category one, m = 2,...,M.
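The quadratic partial effect in 15.1a is easy to check numerically.  A minimal Python sketch (not part of the original solutions, which use Stata; the coefficient values below are made up purely for illustration) compares the analytical expression against a finite difference of the response probability:

```python
import math

def norm_cdf(z):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    # standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def probit_pe_z2(z1d1, g1, g2, z2):
    """Partial effect of z2 in P(y=1|z) = F(z1*D1 + g1*z2 + g2*z2^2):
    by the chain rule, (g1 + 2*g2*z2)*f(index)."""
    index = z1d1 + g1 * z2 + g2 * z2 ** 2
    return (g1 + 2.0 * g2 * z2) * norm_pdf(index)

# check against a centered finite difference of the probability
z1d1, g1, g2, z2, h = 0.2, 0.5, -0.1, 1.0, 1e-6
fd = (norm_cdf(z1d1 + g1*(z2+h) + g2*(z2+h)**2)
      - norm_cdf(z1d1 + g1*(z2-h) + g2*(z2-h)**2)) / (2*h)
assert abs(probit_pe_z2(z1d1, g1, g2, z2) - fd) < 1e-6
```

The same pattern (analytical derivative versus finite difference) applies to the interaction model in part b.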
We can use average or other interesting values of z.

c. We would apply the delta method from Chapter 3.  Thus, we would require the full variance matrix of the probit estimates as well as the gradient of the expression of interest, such as (g1 + 2*g2*z2)*f(z1*D1 + g1*z2 + g2*z2^2), with respect to all probit parameters.

15.5. a. If P(y = 1|z,q) = F(z1*D1 + g1*z2*q), then

  dP(y = 1|z,q)/dz2 = g1*q*f(z1*D1 + g1*z2*q),

assuming that z2 is not functionally related to z1.  (The derivative is with respect to z2, not the other zj.)

b. Write y* = z1*D1 + r, where r = g1*z2*q + e.  Because q is assumed independent of z, and e is independent of (z,q), E(r|z) = g1*z2*E(q|z) + E(e|z) = 0.  Also,

  Var(r|z) = g1^2*z2^2*Var(q|z) + Var(e|z) + 2*g1*z2*Cov(q,e|z) = g1^2*z2^2 + 1,

because Cov(q,e|z) = 0 by independence between e and (z,q), and q|z ~ Normal(0,1).  Thus, r/sqrt(g1^2*z2^2 + 1) has a standard normal distribution independent of z.  It follows that

  P(y = 1|z) = F[z1*D1/sqrt(g1^2*z2^2 + 1)].                         (15.90)

c. Because P(y = 1|z) depends only on g1^2, this is what we can estimate along with D1.  (For example, g1 = 2 and g1 = -2 give exactly the same model for P(y = 1|z).)  This is why we define r1 = g1^2.  Testing H0: r1 = 0 is most easily done using the score or LM test because, under H0, we have a standard probit model.  Let D^1 denote the probit estimates under the null that r1 = 0.  Define F^i = F(zi1*D^1), f^i = f(zi1*D^1), u^i = yi - F^i, and u~i = u^i/sqrt(F^i(1 - F^i)) (the standardized residuals).
The gradient of the mean function in (15.90) with respect to D1, evaluated at the null estimates, is simply f^i*zi1.  The only other quantity needed is the gradient with respect to r1, evaluated at the null estimates.  For each i, the partial derivative of (15.90) with respect to r1 is

  -(zi1*D1)(zi2^2/2)*[r1*zi2^2 + 1]^(-3/2)*f(zi1*D1/sqrt(r1*zi2^2 + 1)).

When we evaluate this at r1 = 0 and D^1, we get -(zi1*D^1)(zi2^2/2)*f^i.  Then, the score statistic can be obtained as N*Ru^2 from the regression

  u~i on f^i*zi1/sqrt(F^i(1 - F^i)), -(zi1*D^1)(zi2^2/2)*f^i/sqrt(F^i(1 - F^i)).

Under H0, N*Ru^2 ~a chi-square(1).

d. The model can be estimated by MLE using the formulation with r1 in place of g1^2.  But this is not a standard probit estimation.

15.7. a. The following Stata output is for part a:

. reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60

      Source |       SS       df       MS          Number of obs =    2725
-------------+------------------------------      F(  8,  2716) =   30.48
       Model |  44.9720916     8  5.62151145      Prob > F      =  0.0000
    Residual |  500.844422  2716  .184405163      R-squared     =  0.0824
-------------+------------------------------      Adj R-squared =  0.0797
       Total |  545.816514  2724   .20037317      Root MSE      =  .42942

------------------------------------------------------------------------------
       arr86 |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |  -.1543802   .0209336    -7.37   0.000    -.1954275   -.1133329
      avgsen |   .0035024   .0063417     0.55   0.581    -.0089326    .0159374
     tottime |   .0020613   .0048884     0.42   0.673     -.007524    .0116466
     ptime86 |  -.0215953   .0044679    -4.83   0.000    -.0303561   -.0128344
       inc86 |  -.0012248    .000127    -9.65   0.000    -.0014738   -.0009759
       black |   .1617183   .0235044     6.88   0.000     .1156299    .2078066
      hispan |   .0892586   .0205592     4.34   0.000     .0489454    .1295718
      born60 |  -.0028698   .0171986    -0.17   0.867    -.0365936    .0308539
       _cons |   .3609831   .0160927    22.43   0.000      .329428    .3925382
------------------------------------------------------------------------------
The estimated effect from increasing pcnv from .25 to .75 is about -.154(.5) = -.077, so the probability of arrest falls by about 7.7 points.

b. There are no important differences between the usual and robust standard errors.  In fact, in a couple of cases the robust standard errors are notably smaller.  The robust statistic and its p-value are gotten by using the "test" command after appending "robust" to the regression command:

. reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60, robust

Regression with robust standard errors                 Number of obs =   2725
                                                       F(  8,  2716) =  37.59
                                                       Prob > F      = 0.0000
                                                       R-squared     = 0.0824
                                                       Root MSE      = .42942

------------------------------------------------------------------------------
             |               Robust
       arr86 |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |  -.1543802    .018964    -8.14   0.000    -.1915656   -.1171948
      avgsen |   .0035024   .0058876     0.59   0.552    -.0080423    .0150471
     tottime |   .0020613   .0042256     0.49   0.626    -.0062244     .010347
     ptime86 |  -.0215953   .0027532    -7.84   0.000    -.0269938   -.0161967
       inc86 |  -.0012248   .0001141   -10.73   0.000    -.0014487    -.001001
       black |   .1617183   .0255279     6.33   0.000     .1116622    .2117743
      hispan |   .0892586   .0210689     4.24   0.000     .0479459    .1305714
      born60 |  -.0028698   .0171596    -0.17   0.867     -.036517    .0307774
       _cons |   .3609831   .0167081    21.61   0.000     .3282214    .3937449
------------------------------------------------------------------------------

. test avgsen tottime

 ( 1)  avgsen = 0.0
 ( 2)  tottime = 0.0

       F(  2,  2716) =    0.18
            Prob > F =  0.8320

. qui reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60

. test avgsen tottime
 ( 1)  avgsen = 0.0
 ( 2)  tottime = 0.0

       F(  2,  2716) =    0.18
            Prob > F =  0.8360

c. The probit model is estimated as follows:

. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60

Iteration 0:  log likelihood = -1608.1837
Iteration 1:  log likelihood = -1486.3157
Iteration 2:  log likelihood = -1483.6458
Iteration 3:  log likelihood = -1483.6406

Probit estimates                                       Number of obs =   2725
                                                       LR chi2(8)    = 249.09
                                                       Prob > chi2   = 0.0000
Log likelihood = -1483.6406                            Pseudo R2     = 0.0774

------------------------------------------------------------------------------
       arr86 |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |  -.5529248   .0720778    -7.67   0.000    -.6941947   -.4116549
      avgsen |   .0127395   .0212318     0.60   0.548     -.028874    .0543531
     tottime |   .0076486   .0168844     0.45   0.651    -.0254442    .0407414
     ptime86 |  -.0812017    .017963    -4.52   0.000    -.1164085   -.0459949
       inc86 |  -.0046346   .0004777    -9.70   0.000    -.0055709   -.0036983
       black |   .4666076   .0719687     6.48   0.000     .3255516    .6076635
      hispan |   .2911005   .0654027     4.45   0.000     .1629135    .4192875
      born60 |  -.0112074   .0556843    -0.20   0.840    -.1203466    .0979318
       _cons |  -.3138331   .0512999    -6.12   0.000    -.4143791   -.2132871
------------------------------------------------------------------------------

Now, we must compute the difference in the normal cdf at the two different values of pcnv, with black = 1, hispan = 0, born60 = 1, and at the average values of the remaining variables:

. sum avgsen tottime ptime86 inc86

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      avgsen |    2725    .6322936    3.508031          0       59.2
     tottime |    2725    .8387523    4.607019          0       63.4
     ptime86 |    2725     .387156    1.950051          0         12
       inc86 |    2725    54.96705    66.62721          0        541
. di normprob(-.313 - .553*.25 + .0127*.632 + .0076*.839 - .0812*.387 - .0046*54.97 + .467 - .0112) - normprob(-.313 - .553*.75 + .0127*.632 + .0076*.839 - .0812*.387 - .0046*54.97 + .467 - .0112)
.10181543

This last command shows that the probability falls by about .102, which is somewhat larger than the effect obtained from the LPM.

d. To obtain the percent correctly predicted for each outcome, we first generate the predicted values of arr86 as described on page 465:

. predict phat
(option p assumed; Pr(arr86))

. gen arr86h = phat > .5

. tab arr86h arr86

           |         arr86
    arr86h |         0          1 |     Total
-----------+----------------------+----------
         0 |      1903        677 |      2580
         1 |        67         78 |       145
-----------+----------------------+----------
     Total |      1970        755 |      2725

. di 1903/1970
.96598985

. di 78/755
.10331126

For men who were not arrested, the probit predicts correctly about 96.6% of the time; unfortunately, for the men who were arrested, the probit is correct only about 10.3% of the time.  The overall percent correctly predicted is quite high, but we cannot very well predict the outcome we would most like to predict.

e. Adding the quadratic terms gives:

. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60 pcnvsq pt86sq inc86sq
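The hand calculation with Stata's normprob() is just arithmetic with the standard normal cdf, so it can be replicated in any language.  A short Python sketch (an illustration, not part of the original solutions; it uses the rounded probit estimates and sample means quoted above):

```python
import math

def normprob(z):
    # Stata's normprob(): the standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def index(pcnv):
    # probit index at black = 1, hispan = 0, born60 = 1, other
    # regressors at their sample means (rounded estimates)
    return (-.313 - .553*pcnv + .0127*.632 + .0076*.839
            - .0812*.387 - .0046*54.97 + .467 - .0112)

# estimated drop in the arrest probability as pcnv goes from .25 to .75
drop = normprob(index(.25)) - normprob(index(.75))
assert 0.09 < drop < 0.11   # roughly .10, as in the text
```

Because the rounded inputs differ slightly from the full-precision estimates, the computed drop agrees with the Stata result only to about two decimal places.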
Iteration 0:  log likelihood = -1608.1837
Iteration 1:  log likelihood = -1452.8166
Iteration 2:  log likelihood = -1444.3151
Iteration 3:  log likelihood = -1441.2089
Iteration 4:  log likelihood =  -1440.268
Iteration 5:  log likelihood = -1439.8535
Iteration 6:  log likelihood = -1439.8005
Iteration 7:  log likelihood = -1439.8005

Probit estimates                                       Number of obs =   2725
                                                       LR chi2(11)   = 336.77
                                                       Prob > chi2   = 0.0000
Log likelihood = -1439.8005                            Pseudo R2     = 0.1047

------------------------------------------------------------------------------
       arr86 |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |   .2167615   .2604937     0.83   0.405    -.2937968    .7273198
      avgsen |   .0139969   .0244972     0.57   0.568    -.0340166    .0620105
     tottime |  -.0178158   .0199703    -0.89   0.372    -.0569568    .0213253
     ptime86 |   .7449712   .1438485     5.18   0.000     .4630333    1.026909
       inc86 |  -.0058786   .0009851    -5.97   0.000    -.0078094   -.0039478
       black |   .4368131   .0733798     5.95   0.000     .2929913    .5806349
      hispan |   .2663945    .067082     3.97   0.000     .1349163    .3978727
      born60 |  -.0145223   .0566913    -0.26   0.798    -.1256351    .0965905
      pcnvsq |  -.8570512   .2714575    -3.16   0.002    -1.389098   -.3250042
      pt86sq |  -.1035031   .0224234    -4.62   0.000    -.1474522    -.059554
     inc86sq |  -8.75e-06   4.28e-06    -2.04   0.041    -.0000171   -3.63e-07
       _cons |   -.337362   .0562665    -6.00   0.000    -.4476423   -.2270817
------------------------------------------------------------------------------
note: 51 failures and 0 successes completely determined.

. test pcnvsq pt86sq inc86sq

 ( 1)  pcnvsq = 0.0
 ( 2)  pt86sq = 0.0
 ( 3)  inc86sq = 0.0

           chi2(  3) =   38.54
         Prob > chi2 =  0.0000

The quadratics are individually and jointly significant.  The quadratic in pcnv means that, at low levels of pcnv, there is actually a positive relationship between the probability of arrest and pcnv, which does not make much sense.  The turning point is easily found as .217/(2*.857) ≈ .127, which means that there is an estimated deterrent effect over most of the range of pcnv.
15.9. a. Let P(y = 1|x) = xB, where x1 = 1.  For each i, the log likelihood is

  li(B) = yi*log(xi*B) + (1 - yi)*log(1 - xi*B),

which is only well-defined for 0 < xi*B < 1.  Therefore, the log-likelihood function is well-defined only if 0 < xi*B^ < 1 for all i = 1,...,N.  For any possible estimate B^, this condition must be checked during the iterations to obtain the MLE.

b. It may be impossible to find an estimate that satisfies these inequalities for every observation, especially if N is large.

c. So, just as we can use an R-squared to choose among different functional forms for E(y|x), we can use values of the log-likelihood to choose among different models for P(y = 1|x) when y is binary.  This follows from the KLIC:  the true density of y given x — evaluated at the true values, of course — maximizes the KLIC.  Since the MLEs are consistent for the unknown parameters, asymptotically the true density will produce the highest average log-likelihood function.

15.11. We really need to make two assumptions.  The first is a conditional independence assumption:  conditional on xi = (xi1,...,xiT), (yi1,...,yiT) are independent.  This allows us to write the joint density (conditional on xi) as the product of the marginal densities (each conditional on xi):

  f(y1,...,yT|xi) = f1(y1|xi)***fT(yT|xi).

The second assumption is a strict exogeneity assumption:  D(yit|xi) = D(yit|xit), t = 1,...,T.  We then add the standard assumption for pooled probit — that D(yit|xit) follows a probit model.
Under these assumptions,

  f(y1,...,yT|xi) = prod_{t=1}^T [G(xit*B)]^{yt} [1 - G(xit*B)]^{1-yt},

and so pooled probit is conditional MLE.

15.13. a. Let d2 be a binary indicator for the second time period, and let dB be an indicator for the treatment group, where x is a vector of covariates.  Then a probit model to evaluate the treatment effect is

  P(y = 1|x) = F(d0 + d1*d2 + d2*dB + d3*d2*dB + xG).

We would estimate all parameters from a probit of y on 1, d2, dB, d2*dB, and x using all observations.  Once we have the estimates, we need to compute the "difference-in-differences" estimate, which requires either plugging in a value for x, say x-bar, or averaging the differences across xi.  In the former case, we have

  q^ = [F(d^0 + d^1 + d^2 + d^3 + x-bar*G^) - F(d^0 + d^2 + x-bar*G^)]
       - [F(d^0 + d^1 + x-bar*G^) - F(d^0 + x-bar*G^)],

and in the latter we have

  q~ = N^-1 S_{i=1}^N {[F(d^0 + d^1 + d^2 + d^3 + xi*G^) - F(d^0 + d^2 + xi*G^)]
       - [F(d^0 + d^1 + xi*G^) - F(d^0 + xi*G^)]}.

Both are estimates of the difference, between groups B and A, of the change in the response probability over time.

b. We would have to use the delta method to obtain a valid standard error for either q^ or q~.

c. If there are no covariates, there is no point in using any method other than a straight comparison of means:  the estimated probabilities for the treatment and control groups, both before and after the policy change, will be identical across models.
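The averaged difference-in-differences estimator q~ is a simple loop over the sample.  Here is a minimal Python sketch (an illustration only — the function name and all numerical inputs below are invented, not estimates from any data set):

```python
import math

def Phi(z):
    # standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def did_effect(d0, d1, d2, d3, g, x_rows):
    """Average difference-in-differences effect from probit estimates:
    mean over i of [Phi(d0+d1+d2+d3+x*g) - Phi(d0+d2+x*g)]
                 - [Phi(d0+d1+x*g)       - Phi(d0+x*g)]."""
    total = 0.0
    for x in x_rows:
        xg = sum(a * b for a, b in zip(x, g))
        total += ((Phi(d0 + d1 + d2 + d3 + xg) - Phi(d0 + d2 + xg))
                  - (Phi(d0 + d1 + xg) - Phi(d0 + xg)))
    return total / len(x_rows)

# made-up estimates and covariate rows, purely for illustration
est = did_effect(-0.2, 0.3, 0.1, 0.25, [0.5, -0.4], [[1.0, 0.2], [0.3, 1.1]])
```

Note that when the interaction coefficient is zero the expression is identically zero for every xi, which is a useful sanity check on an implementation.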
15.15. We should use an interval regression model — equivalently, ordered probit with known cut points.  We would be assuming that the underlying GPA is normally distributed conditional on x, but we only observe interval-coded data.  Along with the bj — including an intercept — we estimate s^2.  The estimated coefficients are interpreted as if we had done a linear regression with actual GPAs.  (Clearly a conditional normal distribution for the GPAs is at best an approximation.)

15.17. To be added.

15.19. a. The density of (y1,...,yG) given x is obtained by integrating out with respect to the distribution of c given x:

  g(y1,...,yG|x;Go,Do) = integral of [prod_{g=1}^G fg(yg|x,c;Go)]*h(c|x;Do) dc,

where c is a dummy argument of integration.  Because c appears in each D(yg|x,c), y1,...,yG are dependent without conditioning on c.

b. We obtain the joint density by the product rule, since we have independence conditional on (x,c):

  f(y1,...,yG|x,c;Go) = f1(y1|x,c;Go)*f2(y2|x,c;Go)***fG(yG|x,c;Go).

c. The log likelihood for each i is

  log{ integral of [prod_{g=1}^G fg(yig|xi,c;Go)]*h(c|xi;Do) dc }.

As expected, this depends only on the observed data, (xi,yi1,...,yiG), and the unknown parameters.
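In practice the integral over c in 15.19c has no closed form and is approximated numerically (Gauss-Hermite quadrature is the usual choice).  As a rough illustration of the idea — not the manual's method — here is a Python sketch that approximates the likelihood contribution for binary outcomes with probit marginals and c ~ Normal(0, sigma_c^2), using a crude equally spaced grid:

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def npdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def loglik_i(y, xb, sigma_c, grid=400, span=8.0):
    """log of  integral over c of
       prod_g Phi(xb_g + c)^{y_g} * (1 - Phi(xb_g + c))^{1-y_g}
       * (1/sigma_c)*npdf(c/sigma_c) dc,
    approximated by a midpoint rule on [-span*sigma_c, span*sigma_c]."""
    lo = -span * sigma_c
    step = (2.0 * span * sigma_c) / grid
    total = 0.0
    for k in range(grid):
        c = lo + (k + 0.5) * step
        p = 1.0
        for yg, xbg in zip(y, xb):
            pr = Phi(xbg + c)
            p *= pr if yg == 1 else (1.0 - pr)
        total += p * npdf(c / sigma_c) / sigma_c * step
    return math.log(total)

# sanity check: as sigma_c -> 0 the integral collapses to the product at c = 0
assert abs(loglik_i((1, 0), (0.0, 0.0), 1e-4) - math.log(0.25)) < 1e-4
```

The grid, span, and probit marginals here are all illustrative assumptions; production code would use adaptive quadrature and the actual fg densities.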
CHAPTER 16

16.1. a. P[log(ti) = log(c)|xi] = P[log(ti*) >= log(c)|xi] = P[ui >= log(c) - xi*B|xi] = 1 - F{[log(c) - xi*B]/s}.  As c goes to infinity, F{[log(c) - xi*B]/s} goes to one, and so P[log(ti) = log(c)|xi] goes to zero.  This simply says that, the longer we wait to censor, the less likely it is that we observe a censored observation.

b. Since ui is independent of xi, the density of yi* = log(ti*) given xi is Normal(xi*B, s^2).  The density of yi = log(ti) given xi when ti < c is the same as the density of yi* given xi, because, for y < log(c), P(yi <= y|xi) = P(yi* <= y|xi).  Thus, the density for yi = log(ti) is

  f(y|xi) = 1 - F{[log(c) - xi*B]/s},        y = log(c);
  f(y|xi) = (1/s)*f[(y - xi*B)/s],           y < log(c).

It follows that

  li(B,s^2) = 1[yi = log(c)]*log(1 - F{[log(c) - xi*B]/s})
              + 1[yi < log(c)]*log{(1/s)*f[(yi - xi*B)/s]}.

c. The density of yi given (xi,ci) has the same form as the density above, except that ci replaces c.  Note that ci can be related to xi, which is treated as exogenous.  The assumption that ui is independent of ci means that the decision to censor an individual (or other economic unit) is not related to unobservables affecting ti*.  Thus, in something like an unemployment duration equation, where ui might contain unobserved ability, we do not wait longer to censor people of lower ability.  If xi contains something like education, the censoring time can depend on education.

d. To test H0: B2 = 0, I would probably use the likelihood ratio statistic.  This requires estimating the model with all variables, and then the model without x2.  The LR statistic is LR = 2(Lur - Lr).  Under H0, LR is distributed asymptotically as chi-square(K2).
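The log-likelihood in 16.1b is simple to code directly.  A minimal Python sketch of the contribution for one observation (an illustration, not part of the original solutions; function and argument names are mine):

```python
import math

def Phi(z):
    # standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_phi(z):
    # log of the standard normal density
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def ll_censored(y, xb, sigma, logc):
    """li = 1[y = log c]*log(1 - Phi((log c - xb)/sigma))
          + 1[y < log c]*log((1/sigma)*phi((y - xb)/sigma))"""
    if y >= logc:   # censored at log(c)
        return math.log(1.0 - Phi((logc - xb) / sigma))
    return log_phi((y - xb) / sigma) - math.log(sigma)
```

For example, with xb = 0, sigma = 1, and log(c) = 0, a censored observation contributes log(1 - Phi(0)) = log(0.5); summing such terms over i and maximizing over (B, sigma) gives the censored-normal MLE.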
16.3. a. P(yi = a1|xi) = P(yi* <= a1|xi) = P[(ui/s) <= (a1 - xi*B)/s] = F[(a1 - xi*B)/s].  Similarly,

  P(yi = a2|xi) = P(yi* >= a2|xi) = P(xi*B + ui >= a2|xi) = 1 - F[(a2 - xi*B)/s].

Next, for a1 < y < a2, P(yi <= y|xi) = P(yi* <= y|xi) = F[(y - xi*B)/s].  Taking the derivative of this cdf with respect to y gives the pdf of yi conditional on xi for values of y strictly between a1 and a2:  (1/s)*f[(y - xi*B)/s].

b. Since y = y* when a1 < y* < a2, E(y|x, a1 < y < a2) = E(y*|x, a1 < y* < a2) = xB + E(u|x, a1 - xB < u < a2 - xB).  Now, using the hint,

  E(u|x, a1 - xB < u < a2 - xB)
    = s*E[(u/s)|x, (a1 - xB)/s < u/s < (a2 - xB)/s]
    = s*{f[(a1 - xB)/s] - f[(a2 - xB)/s]}/{F[(a2 - xB)/s] - F[(a1 - xB)/s]},

so that

  E(y|x, a1 < y < a2) = xB + s*{f[(a1 - xB)/s] - f[(a2 - xB)/s]}/{F[(a2 - xB)/s] - F[(a1 - xB)/s]}.

Now, we can easily get E(y|x) by using the following:

  E(y|x) = a1*P(y = a1|x) + E(y|x, a1 < y < a2)*P(a1 < y < a2|x) + a2*P(y = a2|x)
         = a1*F[(a1 - xB)/s] + (xB)*{F[(a2 - xB)/s] - F[(a1 - xB)/s]}
           + s*{f[(a1 - xB)/s] - f[(a2 - xB)/s]} + a2*F[(xB - a2)/s].        (16.57)
c. The linear regression of yi on xi using only those yi such that a1 < yi < a2 consistently estimates the linear projection of y* on x in the subpopulation for which a1 < y* < a2.  Generally, there is no reason to think that this will have any simple relationship to the parameter vector B.  [In some restrictive cases, the regression on the restricted subsample could consistently estimate B up to a common scale coefficient.]  From part b it is clear that E(y|x, a1 < y < a2) ≠ xB, and so it would be a fluke if OLS on the restricted sample consistently estimated B.

d. We get the log-likelihood immediately from part a:

  li(q) = 1[yi = a1]*log{F[(a1 - xi*B)/s]} + 1[yi = a2]*log{F[(xi*B - a2)/s]}
          + 1[a1 < yi < a2]*log{(1/s)*f[(yi - xi*B)/s]}.

Note how the indicator function selects out the appropriate density for each of the three possible cases:  at the left endpoint, at the right endpoint, or strictly between the endpoints.

e. After obtaining the maximum likelihood estimates B^ and s^2, just plug these into the formulas in part b.  The expressions can be evaluated at interesting values of x.

f. We can show this by brute-force differentiation of equation (16.57).  As a shorthand, write F1 = F[(a1 - xB)/s], F2 = F[(a2 - xB)/s], f1 = f[(a1 - xB)/s], and f2 = f[(a2 - xB)/s].  Then

  dE(y|x)/dxj = -(a1/s)*f1*bj + (F2 - F1)*bj + [(xB)/s](f1 - f2)*bj
                + {[(a1 - xB)/s]*f1 - [(a2 - xB)/s]*f2}*bj + (a2/s)*f2*bj,

where the first and last terms are the derivatives of a1*F1 and a2*F[(xB - a2)/s], the second and third come from differentiating (xB)*(F2 - F1), and the remaining term comes from differentiating s*(f1 - f2).  Careful inspection shows that all terms cancel except (F2 - F1)*bj, which is the expression we wanted to be left with.
g. The partial effects on E(y|x) are given in part f.  These are estimated as

  {F[(a2 - x*B^)/s^] - F[(a1 - x*B^)/s^]}*b^j,                         (16.58)

where the estimates are the MLEs.  We could evaluate these partial effects at, say, x-bar.  Or, we could average {F[(a2 - xi*B^)/s^] - F[(a1 - xi*B^)/s^]} across all i to obtain the average partial effect.  The scale factor is simply the probability that a standard normal random variable falls in the interval [(a1 - xB)/s, (a2 - xB)/s], which is necessarily between zero and one.  In either case, s^ appears in the partial effects along with the b^j; there is no sense in which s^ is "ancillary."  It does not make sense to directly compare the magnitude of b^j with that of the g^j (the OLS estimates from the restricted subsample); the scaled b^j can be compared to the g^j.  Generally, we expect g^j ≈ r^*b^j, where 0 < r^ < 1 is the scale factor.  Of course, this approximation need not be very good in a particular application, but it is often roughly true.

h. For data censoring where the censoring points might change with i, the analysis is essentially the same, but a1 and a2 are replaced with ai1 and ai2.  Interpreting the results is even easier, since we act as if we were able to do OLS on an uncensored sample.
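The cancellation argument in part f is easy to confirm numerically:  the derivative of (16.57) with respect to xj really does reduce to (F2 - F1)*bj.  A short Python check (an illustration only; the parameter values are made up):

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Ey(xb, s, a1, a2):
    """E(y|x) for the two-limit Tobit, equation (16.57)."""
    F1, F2 = Phi((a1 - xb) / s), Phi((a2 - xb) / s)
    f1, f2 = phi((a1 - xb) / s), phi((a2 - xb) / s)
    return a1 * F1 + xb * (F2 - F1) + s * (f1 - f2) + a2 * Phi((xb - a2) / s)

# partial effect of xj should be (F2 - F1)*bj; compare to a finite difference
xb, s, a1, a2, bj, h = 0.4, 1.3, 0.0, 10.0, 0.7, 1e-6
fd = (Ey(xb + h * bj, s, a1, a2) - Ey(xb - h * bj, s, a1, a2)) / (2 * h)
pe = (Phi((a2 - xb) / s) - Phi((a1 - xb) / s)) * bj
assert abs(fd - pe) < 1e-6
```

The factor multiplying bj is exactly the scale factor discussed above, so the same function also delivers the average partial effect by looping over the xi.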
16.5. a. The results from OLS estimation of the linear model are:

. reg hrbens exper age educ tenure married male white nrtheast nrthcen south union

      Source |       SS       df       MS          Number of obs =     616
-------------+------------------------------      F( 11,   604) =   32.50
       Model |  101.132288    11  9.19384436      Prob > F      =  0.0000
    Residual |  170.839786   604  .282847328      R-squared     =  0.3718
-------------+------------------------------      Adj R-squared =  0.3604
       Total |  271.972074   615  .442231015      Root MSE      =  .53183

------------------------------------------------------------------------------
      hrbens |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0029862   .0043435     0.69   0.492     -.005544    .0115164
         age |  -.0022495   .0041162    -0.55   0.583    -.0103333    .0058343
        educ |   .0040631   .0025859     1.57   0.117    -.0010153    .0091415
      tenure |   .0281931   .0035481     7.95   0.000      .021225    .0351612
     married |   .0899016   .0510187     1.76   0.079     -.010294    .1900971
        male |   .2556765   .0538339     4.75   0.000     .1499517    .3614013
       white |   .2788372   .0737578     3.78   0.000     .1339915    .4236829
    nrtheast |  -.0614223   .0678666    -0.91   0.366    -.1947099    .0718653
     nrthcen |  -.0492621   .0673714    -0.73   0.465    -.1815772     .083053
       south |   -.048028   .0657498    -0.73   0.465    -.1771587    .0811027
       union |   .1608084   .0523598     3.07   0.002     .0579729    .2636439
       _cons |  -.6999244   .1027574    -6.81   0.000    -.9017343   -.4981145
------------------------------------------------------------------------------

b. The Tobit estimates are:

. tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union, ll(0)

Tobit Estimates                                        Number of obs =     616
                                                       chi2(11)      =  283.86
                                                       Prob > chi2   =  0.0000
Log Likelihood = -519.66616                            Pseudo R2     =  0.2145

------------------------------------------------------------------------------
      hrbens |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0025859   .0044362     0.58   0.560    -.0061263    .0112981
         age |  -.0029666   .0046627    -0.64   0.525    -.0121234    .0061902
        educ |   .0050939   .0030310     1.68   0.093    -.0008586    .0110464
      tenure |   .0287099   .0037237     7.71   0.000      .021397    .0360227
     married |   .0973362   .0775035     1.26   0.210    -.0548707    .2495431
        male |   .2870843   .0581357     4.94   0.000     .1729117    .4012569
       white |   .3006999   .0768576     3.91   0.000     .1497568    .4516430
    nrtheast |  -.0693418   .0760238    -0.91   0.362    -.2186474    .0799638
     nrthcen |   -.047408   .0698213    -0.68   0.497    -.1845319    .0897159
       south |  -.0912729   .0693418    -1.32   0.188    -.2274556    .0449098
       union |   .1708394   .0539178     3.17   0.002     .0649528    .2767260
       _cons |  -.8137158   .1187017    -6.86   0.000   -1.0468353   -.5805963
-------------+----------------------------------------------------------------
         _se |   .5551027   .0165773          (Ancillary parameter)
------------------------------------------------------------------------------

Obs. summary:        41  left-censored observations at hrbens<=0
                    575  uncensored observations

You should ignore the phrase "Ancillary parameter" (which essentially means "subordinate") associated with "_se", as it is misleading for corner solution applications:  s^ appears directly in E^(y|x) and E^(y|x, y > 0).  The Tobit and OLS estimates are similar because only 41 of 616 observations, or about 6.7% of the sample, have hrbens = 0.  As expected, the Tobit estimates are all slightly larger in magnitude; as we know, this reflects that the scale factor is always less than unity.

c. Here is what happens when exper^2 and tenure^2 are included:

. tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union expersq tenuresq, ll(0)

Tobit Estimates                                        Number of obs =     616
                                                       chi2(13)      =  315.95
                                                       Prob > chi2   =  0.0000
Log Likelihood = -503.62108                            Pseudo R2     =  0.2388

------------------------------------------------------------------------------
      hrbens |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0306652   .0104947     2.92   0.004     .0100559    .0512745
         age |  -.0040294   .0046942    -0.86   0.391    -.0132489    .0051901
        educ |   .0043428   .0029666     1.46   0.144    -.0014718    .0101574
      tenure |    .056957   .0073126     7.79   0.000     .0425952    .0713188
     married |   .0787463   .0760238     1.04   0.301    -.0705741    .2280667
        male |   .2562597   .0602628     4.25   0.000     .1379083    .3746111
       white |   .3874497   .0802587     4.83   0.000     .2298267    .5450727
    nrtheast |  -.0480194   .0709243    -0.68   0.499    -.1873177    .0912789
     nrthcen |  -.0489422   .0713965    -0.69   0.493    -.1891598    .0912754
       south |  -.1034053   .0667174    -1.55   0.122    -.2344322    .0276216
       union |   .1536597   .0528969     2.90   0.004     .0497781    .2575413
     expersq |  -.0005524   .0001487    -3.71   0.000    -.0008445   -.0002604
    tenuresq |  -.0013291   .0004098    -3.24   0.001     -.002134   -.0005242
       _cons |  -.9436572   .1853532    -5.09   0.000    -1.307673   -.5796409
-------------+----------------------------------------------------------------
         _se |   .5418171   .0161572          (Ancillary parameter)
------------------------------------------------------------------------------

Obs. summary:        41  left-censored observations at hrbens<=0
                    575  uncensored observations

Both squared terms are very significant, so they should be included in the model.

d. There are nine industries, and we use ind1 as the base industry:

. tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union expersq tenuresq ind2-ind9, ll(0)

Tobit Estimates                                        Number of obs =     616
                                                       chi2(21)      =  388.99
                                                       Prob > chi2   =  0.0000
Log Likelihood = -467.09766                            Pseudo R2     =  0.2940

------------------------------------------------------------------------------
      hrbens |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0209362   .0108205     1.93   0.053    -.0003144    .0421868
         age |  -.0041306   .0047351    -0.87   0.384    -.0134303    .0051691
        educ |   .0033643   .0029863     1.13   0.260    -.0025013    .0092299
      tenure |    .053115   .0099413     5.34   0.000     .0335907    .0726393
     married |   .0724782   .0751566     0.96   0.335     -.075128    .2200844
        male |   .2351539   .0598632     3.93   0.000      .117586    .3527218
       white |   .3617389   .0795827     4.55   0.000     .2057597    .5177181
    nrtheast |  -.0714831   .0763844    -0.94   0.350    -.2214970    .0785308
     nrthcen |  -.0556864   .0718210    -0.78   0.438    -.1967378    .0853650
       south |  -.0908226   .0707180    -1.28   0.200    -.2294272    .0477820
       union |   .1503703   .0527479     2.85   0.005     .0469863    .2537543
     expersq |   -.000544   .0001623    -3.35   0.001    -.0008628   -.0002252
    tenuresq |  -.0013026   .0004405    -2.96   0.003    -.0021678   -.0004374
        ind2 |  -.2433643   .3716415    -0.65   0.513    -.9732778    .4865492
        ind3 |  -.0963657   .3739442    -0.26   0.797    -.8308017    .6380703
        ind4 |  -.2632871   .3742017    -0.70   0.482    -.9982289    .4716547
        ind5 |  -.3504717   .3669437    -0.96   0.340   -1.0711577    .3702143
        ind6 |  -.2148662    .373072    -0.58   0.565    -.9475887    .5178563
        ind7 |  -.3143174   .3731778    -0.84   0.400   -1.0472477    .4186129
        ind8 |  -.6107854   .3758023    -1.63   0.104   -1.3473444    .1257736
        ind9 |  -.3948746   .3790823    -1.04   0.298   -1.1394028    .3496536
       _cons |  -.9650425   .4137824    -2.33   0.020   -1.7776677   -.1524173
-------------+----------------------------------------------------------------
         _se |   .5083107   .0151907          (Ancillary parameter)
------------------------------------------------------------------------------

Obs. summary:        41  left-censored observations at hrbens<=0
                    575  uncensored observations
. test ind2 ind3 ind4 ind5 ind6 ind7 ind8 ind9

 ( 1)  ind2 = 0.0
 ( 2)  ind3 = 0.0
 ( 3)  ind4 = 0.0
 ( 4)  ind5 = 0.0
 ( 5)  ind6 = 0.0
 ( 6)  ind7 = 0.0
 ( 7)  ind8 = 0.0
 ( 8)  ind9 = 0.0

       F(  8,   595) =    9.66
            Prob > F =  0.0000

Each industry dummy variable is individually insignificant at even the 10% level, but the joint Wald test says that they are jointly very significant.  This is somewhat unusual for dummy variables that are necessarily orthogonal (so that there is not a multicollinearity problem among them).  The likelihood ratio statistic is 2(503.621 - 467.098) = 73.046; notice that this is roughly 8 (= number of restrictions) times the F statistic, and the p-value for the LR statistic is also essentially zero.  Certainly several estimates on the industry dummies are economically significant, with a worker in, say, industry eight earning about 61 cents less per hour in benefits than a comparable worker in industry one.  [Remember, with so few observations at zero, it is roughly legitimate to use the parameter estimates as the partial effects.]

16.7. a. This follows because the densities conditional on y > 0 are identical for the Tobit model and Cragg's model.  Briefly, if f(.|x) is the continuous density of y given x, then the density of y given x and y > 0 is f(.|x)/[1 - F(0|x)], where F(.|x) is the cdf of y given x.  (A more general case is done in Section 17.3.)  For the Tobit model, the density of y given x and y > 0 is

  {F(xB/s)}^-1 * {f[(y - xB)/s]/s},

and this is exactly the density specified for Cragg's model given y > 0.
b. From (16.8) we have

  E(y|x) = F(xG)*E(y|x, y > 0) = F(xG)*[xB + s*l(xB/s)].

c. From part b,

  log[E(y|x)] = log[P(y > 0|x)] + log[E(y|x, y > 0)].

This follows very generally — not just for Cragg's model or the Tobit model.  If we take the partial derivative with respect to log(x1), we clearly get the sum of the elasticities.

16.9. a. A two-limit Tobit model, of the kind analyzed in Problem 16.3, is appropriate, with a1 = 0 and a2 = 10.  We can think of an underlying variable, which would be the percentage invested in the absence of any restrictions.  The lower limit at zero is logically necessary considering the kind of response:  the smallest percentage of one's income that can be invested in a pension plan is zero.  On the other hand, the upper limit of 10 is an arbitrary corner imposed by law; one can imagine that some people at the corner y = 10 would choose y > 10 if they could.  Without the legal cap, there would be no upper bound required (since we would not have to worry about 100 percent of income being invested in a pension plan).

b. From Problem 16.3(b), with a1 = 0 and a2 = 10, we have

  E(y|x) = (xB)*{F[(10 - xB)/s] - F(-xB/s)} + s*{f(xB/s) - f[(10 - xB)/s]} + 10*F[(xB - 10)/s].

c. The effect of increasing the cap is obtained by taking the derivative of E(y|x) with respect to a2.
Taking the derivative of this function with respect to a2 gives

  dE(y|x)/da2 = (xB/s)*f[(a2 - xB)/s] + [(a2 - xB)/s]*f[(a2 - xB)/s]
                - (a2/s)*f[(xB - a2)/s] + F[(xB - a2)/s]
              = F[(xB - a2)/s],                                      (16.59)

since f[(a2 - xB)/s] = f[(xB - a2)/s] and the first three terms cancel.  We can plug in a2 = 10 to obtain the approximate effect of increasing the cap from 10 to 11.  For a given value of x, we would compute F[(x*B^ - 10)/s^], where B^ and s^ are the MLEs.  We might evaluate this expression at the sample average of x or at other interesting values (such as across gender or race).

d. If yi < 10 for all i = 1,...,N, B^ and s^ are just the usual Tobit estimates with the "censoring" at zero.

16.11. No.  OLS always consistently estimates the parameters of a linear projection — provided the second moments of y and the xj are finite, and Var(x) has full rank K — regardless of the nature of y or x.  That is why a linear regression analysis is always a reasonable first step for binary outcomes, corner solution outcomes, and count outcomes, provided there is not true data censoring.

16.13. We simply have

  ci = (T^-1 S_{t=1}^T Pt)*X + x-bar_i*X + ai = j + x-bar_i*X + ai,

where j = (T^-1 S_{t=1}^T Pt)*X.  Of course, any aggregate time dummies explicitly get swept out of x-bar_i in this case but would usually be included in xit.  This extension has no practical effect on how we estimate an unobserved effects Tobit or probit model, or how we estimate a variety of unobserved effects panel data models with conditional normal heterogeneity.

An interesting follow-up question would have been:  What if we standardize each xit by its cross-sectional mean and variance at time t,
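The cap-raising effect F[(xB - a2)/s] in (16.59) can be verified numerically against the E(y|x) formula from 16.9b.  A short Python sketch (an illustration, not part of the original solutions; the parameter values are made up):

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Ey(xb, s, a2):
    # E(y|x) for the two-limit Tobit with a1 = 0 (Problem 16.9b)
    return (xb * (Phi((a2 - xb) / s) - Phi(-xb / s))
            + s * (phi(xb / s) - phi((a2 - xb) / s))
            + a2 * Phi((xb - a2) / s))

def cap_effect(xb, s, a2=10.0):
    # dE(y|x)/da2 = Phi((xb - a2)/s): the probability that the desired
    # contribution rate exceeds the current cap
    return Phi((xb - a2) / s)

# numerical check of the derivative in (16.59)
xb, s, a2, h = 8.0, 3.0, 10.0, 1e-5
fd = (Ey(xb, s, a2 + h) - Ey(xb, s, a2 - h)) / (2 * h)
assert abs(fd - cap_effect(xb, s, a2)) < 1e-6
```

Averaging cap_effect(xi*B^, s^) across the sample, or evaluating it at interesting x values, gives the estimated effect of moving the cap from 10 to 11.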
assume ci is related to the mean and variance of the standardized vectors? In other words, let

zit ≡ (xit - Pt)*Ωt^(-1/2),  t = 1,...,T,

where Pt ≡ E(xit) and Ωt ≡ Var(xit), and we might assume ci|xi ~ Normal(j + z̄iX, sa^2) (where, again, zit would not contain aggregate time dummies). Then one could estimate Pt and Ωt for each t using the cross-section observations {xit: i = 1,...,N}; the usual sample means and sample variance matrices, say P^t and Ω^t, are consistent and √N-asymptotically normal. Then, for each random draw i from the population, form z^it ≡ (xit - P^t)*Ω^t^(-1/2), t = 1,...,T, and proceed with the usual Tobit (or probit) unobserved effects analysis that includes the time averages z̄^i = T^(-1) Σ_{t=1}^T z^it. This is a rather simple two-step estimation method, but accounting for the sample variation in P^t and Ω^t would be cumbersome. It may be possible to use a much larger sample to obtain P^t and Ω^t, in which case one might ignore the sampling error in the first-stage estimates. This is the kind of scenario that is handled by Chamberlain's more general assumption concerning the relationship between ci and xi:

ci = j + Σ_{r=1}^T xir*Lr + ai,  where Lr = Ωr^(-1/2)*X/T.

16.15. To be added.

CHAPTER 17

17.1. If you are interested in the effects of things like age of the building and neighborhood demographics on fire damage, given that a fire has occurred, then there is no problem: we simply need a random sample of buildings that actually caught on fire. You might want to supplement this with an analysis of the probability that buildings catch fire, given building and neighborhood characteristics; but then a two-stage analysis is appropriate.
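The two-limit Tobit expectation and the cap derivative from the pension-plan problem above (corners at 0 and a2 = 10) can be checked numerically. This is a minimal sketch: the index value xb, the scale s, and the cap below are made-up numbers; only the formulas come from the solution.

```python
from scipy.stats import norm

def two_limit_tobit_mean(xb, s, a2):
    """E(y|x) for a two-limit Tobit with corners at 0 and a2 (formula from part b)."""
    z2 = (a2 - xb) / s
    return (xb * (norm.cdf(z2) - norm.cdf(-xb / s))
            + s * (norm.pdf(xb / s) - norm.pdf(z2))
            + a2 * norm.cdf(-z2))

xb, s, a2 = 6.0, 4.0, 10.0          # hypothetical index x*B, sigma, and legal cap

# Central-difference derivative of E(y|x) with respect to the cap a2 ...
h = 1e-6
numeric = (two_limit_tobit_mean(xb, s, a2 + h)
           - two_limit_tobit_mean(xb, s, a2 - h)) / (2 * h)

# ... should equal F[(xB - a2)/s], as in equation (16.59).
analytic = norm.cdf((xb - a2) / s)
print(numeric, analytic)
```

The three phi terms in the analytic derivative cancel exactly, which is why only the F[(xB - a2)/s] term survives.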
17.3. Let yi given xi have density f(y|xi,B,G), where B is the vector indexing E(yi|xi) and G is another set of parameters (usually a single variance parameter). Then the density of yi given xi, when si = 1[a1(xi) < yi < a2(xi)] = 1, is

p(y|xi, si = 1) = f(y|xi,B,G)/{F[a2(xi)|xi,B,G] - F[a1(xi)|xi,B,G]},  a1(xi) < y < a2(xi).

In the Hausman and Wise (1977) study, yi = log(incomei), a1(xi) = -infinity, and a2(xi) was a function of family size (which determines the official poverty level).

17.5. a. The key is to note how the error term in (17.81) arises when y2 = zD2 + v2 is plugged into the structural model:

y1 = z1D1 + a1*(zD2 + v2) + u1 = z1D1 + a1*(zD2) + (u1 + a1v2).

So the error term in (17.81) is u1 + a1v2. A sufficient condition for (17.82) is that (u1,v2,v3) is independent of z with a trivariate normal distribution. Then, to obtain E(y1|z,v3), we need the expected value of u1 + a1v2 given (z,v3); by normality, E[(u1 + a1v2)|z,v3] = g1v3, and so

E(y1|z,v3) = z1D1 + a1*(zD2) + g1v3.

Conditioning on y3 = 1 gives

E(y1|z, y3 = 1) = z1D1 + a1*(zD2) + g1*l(zD3).

This is essentially given in equation (17.14). So the procedure is to replace a1*(zD2 + v2) + u1 with a1*(zD2) + (u1 + a1v2); replacing y2 with the first-stage fitted value zD^2 yields a √N-consistent estimator. We can get by with less than full trivariate normality: we need E(u1 + a1v2|z,v3) to be linear in v3 (in particular, it cannot depend on z). But if the selection correction is going to work, the nature of v2 is restricted.

b. If we use an IV approach, we need assume
nothing about v2 except for the usual linear projection assumption. Thus, if we cannot write y2 = zD2 + v2, where v2 is independent of z and normally distributed (for example, in equations where y2 is binary, or y2 is some other variable that exhibits nonnormality), then the parameters cannot be consistently estimated using the OLS-based correction. This is why 2SLS is generally preferred.

17.7. a. Substitute the reduced forms for y1 and y2 into the third equation:

y3 = max(0, a1*(zD1) + a2*(zD2) + z3D3 + v3) ≡ max(0, zP3 + v3),

where v3 ≡ u3 + a1v1 + a2v2. Under the assumptions given, v3 is independent of z and normally distributed. Thus, if we knew D1 and D2, we could consistently estimate a1, a2, and D3 from a Tobit of y3 on (zD1), (zD2), and z3. As a practical matter, consistent estimators are obtained by using initial consistent estimators of D1 and D2 in the system

y1 = zD1 + v1                                  (17.83)
y3 = max(0, zP3 + v3),                         (17.84)

where y1 is observed only when y3 > 0. Estimation of D2 is simple: just use OLS using the entire sample. Estimation of D1 follows exactly as in Procedure 17.3, since y1 is observed only when y3 > 0. Given D^1 and D^2, form ziD^1 and ziD^2 for each observation i in the sample; then obtain a^1, a^2, and D^3 from the Tobit of yi3 on (ziD^1), (ziD^2), and zi3, using all observations. For identification, (zD1, zD2, z3) can contain no exact linear dependencies. From the usual argument, it is necessary that there be at least two elements in z not
also in z3.

b. The only difference is that D2 must also be estimated using Procedure 17.3; then follow the steps from part a.

c. Obtaining the correct asymptotic variance matrix is complicated: among other things, we need to estimate the variance of u3, s3^2. It is most easily done in a generalized method of moments framework.

17.9. To be added.

17.11. a. There is no sample selection problem because, by definition, you have specified the distribution of y given x and y > 0. If we have a random sample from that population, the NLS estimator of B is generally consistent and √N-asymptotically normal. We only need to obtain a random sample from the subpopulation with y > 0.

b. We would use a standard probit model. Let w = 1[y > 0]; by definition, w = 1[xG + v > 0], so w given x follows a probit model with P(w = 1|x) = F(xG).

c. By definition, E(y|x) = P(y > 0|x)*E(y|x, y > 0) = F(xG)*exp(xB). So we would plug in the NLS estimator of B and the probit estimator of G.

d. Not when you specify the conditional distributions, or conditional means, for the two parts; this is not very different from part a.

e. Again, there is no sample selection bias, because we have specified the conditional expectation for the population of interest. Confusion arises, I think, when two-part models are specified with unobservables that may be correlated. For example, we could write

y = w*exp(xB + u).
In labor economics, where two-part models are used to allow for fixed costs of entering the labor market, one would try to find a variable that affects the fixed costs of being employed that does not affect the choice of hours. Assume that (u,v) is independent of x, with v having a zero mean, so that w = 0 implies y = 0 and, given w = 1, we can write log(y) = xB + u. The interesting twist here is if u and v are correlated. First, if u and v are independent, so that u is independent of (x,w), then

E(y|x,w) = w*exp(xB)*E[exp(u)|x,w] = w*exp(xB)*E[exp(u)],

which implies the specification in part b (by setting w = 1, once we absorb E[exp(u)] into the intercept). If we make the usual linearity assumption, E(u|v) = rv, and assume a standard normal distribution for v, then we have the usual inverse Mills ratio added to the linear model:

E[log(y)|x,w = 1] = xB + E(u|x,w = 1) = xB + r*l(xG).

A two-step strategy for estimating G and B is pretty clear: first, estimate a probit of wi on xi to get G^ and l(xiG^); then, using the yi > 0 observations, run the regression log(yi) on xi, l(xiG^) to obtain B^ and r^. A standard t statistic on r^ is a simple test of Cov(u,v) = 0. This two-step procedure reveals a potential problem with the model that allows u and v to be correlated: adding the inverse Mills ratio means that we are adding a nonlinear function of x, so identification of B comes entirely from the nonlinearity of the IMR, which we warned about in this chapter. Ideally, we would have a variable that affects P(w = 1|x) that can be excluded from xB. Alternatively, if we assume (u,v) is multivariate normal, then we can use a full maximum likelihood procedure. While this would be a little less
robust, it would be more efficient.

17.13. a. Here, y given x follows a standard Tobit model in the population (for a corner solution outcome); it is the sampling scheme that only retains observations with y > 0. We cannot use censored Tobit because that requires observing x whatever the value of y. Instead, we use truncated Tobit: the distribution of y given x and y > 0. Notice that our reason for using truncated Tobit differs from the usual application: usually, the underlying variable y of interest has a conditional normal distribution in the population, and this is very different from the sample selection model. A similar example is given in Section 19.2.

b. Provided x varies enough in the subpopulation where y > 0 such that rank E(x'x|y > 0) = K, we can consistently estimate the parameters. Even with a full set of assumptions, the point is to obtain partial effects of interest, and the parameters alone do not give them: we cannot simply look at B. Making full distributional assumptions has a subtle advantage: we can then compute partial effects on E(y|x) and E(y|x, y > 0). For one, because we have made the assumption that y given x follows a Tobit in the full population, we can estimate

E(y|x) = F(xB/s)*xB + s*f(xB/s).

In the case where an element of x is a derived price, we need sufficient price variation for the population that consumes some of the good. Given such variation, E(y|x, y > 0) = exp(xB)*E[exp(u)|x, y > 0] can be obtained under joint normality [see, particularly, equation (19.44)], and we can multiply this expectation by P(w = 1|x) = F(xG).
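The two-step strategy from Problem 17.11 (probit for the participation indicator, then OLS of log(y) on x and the inverse Mills ratio using the y > 0 observations) can be sketched on simulated data. Everything below is invented for illustration: the coefficient values, the error correlation of .5, and the extra variable z, which plays the role of the ideal excluded variable that shifts P(w = 1) but not the outcome.

```python
# Minimal simulation sketch of the probit + inverse-Mills-ratio two-step.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
z = rng.normal(size=n)                                  # excluded instrument
u, v = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n).T
w = (0.5 + 1.0 * x + 1.0 * z + v > 0).astype(float)     # participation equation
logy = 1.0 + 2.0 * x + u                                # outcome, used only if w = 1

# Step 1: probit of w on (1, x, z) by maximum likelihood.
X1 = np.column_stack([np.ones(n), x, z])
def negll(g):
    q = 2 * w - 1
    return -np.sum(norm.logcdf(q * (X1 @ g)))
ghat = minimize(negll, np.zeros(3), method="BFGS").x

# Step 2: OLS of log(y) on (1, x, lambda-hat) over the w = 1 subsample.
xg = X1 @ ghat
imr = norm.pdf(xg) / norm.cdf(xg)                       # inverse Mills ratio
sel = w == 1
X2 = np.column_stack([np.ones(n), x, imr])[sel]
bhat, *_ = np.linalg.lstsq(X2, logy[sel], rcond=None)
print(bhat)   # roughly [intercept 1.0, slope 2.0, rho 0.5]
```

The coefficient on the inverse Mills ratio estimates Cov(u,v), so its t statistic is the simple test of correlation between the two errors described above.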
92 0. probit train re74 re75 age agesq nodegree married black hisp Iteration Iteration Iteration Iteration 0: 1: 2: 3: log log log log likelihood likelihood likelihood likelihood = 302.0017837 nodegree  .5898524 .01 0.0159447 .0534045 0.1 = 294.53 0.1726192 0.06748 = = = = 445 16.0371871 .170 .y0) = [E(y0w = 1) .5).992 .1449258 married  .   and so the bias is given by the first term. This follows from equation (18.07 0. z P>z [95% Conf. 18.0159392 1.0271086 1. but I did not do so: .090319 age  . E(y1) = E(yw = 1) and  Therefore.  First.CHAPTER 18 18.2271609 0.06748 = 294.0266 train  Coef. E(y1 .091519 .07642 = 294. This is a form of sample selection.0415 0.5). Interval] +re74  . The following Stata session estimates a using the three different regression approaches.1446253 . Std.0501979 . E(y0) = E(yw = 1).3.64 0.44195 . If E(y0w = 1) < E(y0w = 0).08 0.0122825 re75  .0189577 .0000719 .2468083 . on average.596 .37 0.0008734 0.1515457 2.1052176 . leads to an underestimate of the impact of the program.004 .3006019 115 .234 .19 0.4298464 black  .524 .934 .0005467 .06748 Probit estimates Number of obs LR chi2(8) Prob > chi2 Pseudo R2 Log likelihood = 294.7389742 .E(y0w = 0)] + ATE1. b.1. a. Err.1041242 agesq  . It would have made sense to add unem74 and unem75 to the vector x. those who participate in the program would have had lower average earnings without training than those who chose not to participate. and.0016399 . by (18.
0450374 2.966 .045039 2.779 1. reg unem78 train phat Source  SS df MS +Model  1.661695 .63 0.80 0.210939799 +Total  94.416) .1638736 .5534283 .4572254 _cons  .104 1. 442) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 445 3.3009852 .103972 .84 0.60 0.8224719 444 .369752 1. t P>t [95% Conf.213564126 Number of obs F( 3.129489 1.0934459 .1624018 . Err.9204644 traphat0  .3184939 .45 0. Pr(train)) ..0449 0.0095 .340 .95 0.015 .0375 0.599340137 Residual  93.826664 .4793509 1.45993 unem78  Coef.3151992 0.195208 .233225 .0101531 .110242 .2378099 0.hisp  .6738951 .2284561 . t P>t [95% Conf.0139 0. gen traphat0 = train*(phat .222497 _cons  . Interval] +train  .37 0. 441) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 445 2.21153806 +Total  94.4775317 .018 .8154273 0.1066934 .0212673 .134 1.1030629 _cons  .4155321 .79802041 3 .213564126 Number of obs F( 2.13 0.5004545 .072 .4998223 442 . reg unem78 train re74 re75 age agesq nodegree married black hisp 116 .0190 0.0217247 phat  .3079227 1. Interval] +train  .28 0. Err.3226496 2 .0123 .0244515 441 . reg unem78 train phat traphat0 Source  SS df MS +Model  1.4877173 . sum phat Variable  Obs Mean Std.3579151 . Std.8224719 444 . Min Max +phat  445 .0181789 phat  .000 .50 0.45928 unem78  Coef.04 0. Dev.661324802 Residual  93.1987593 . predict phat (option p assumed.0994803 3.719599 . Std.
in this example.0550176 0.0068449 .1726761 _cons  .36 0. Err. 435) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 445 2.1979868 .0392887 .444 .1516412 .8224719 444 .1105582 .0114269 age  .111 .0053889 0.1078464 0. so we are not surprised that different methods lead to roughly the same estimate. Std.206263502 +Total  94.0659889 .566427604 Residual  89. t P>t [95% Conf.0923609 black  .22 0.716 . the average treatment effect is estimated to be right around .0538 0.0131441 . An alternative.636 .027 .07642 = 294.77 0.0004949 .2905718 0.Source  SS df MS +Model  5.06748 = 294.007121 .013 .0304127 .45416 unem78  Coef. probit train re74 re75 age agesq nodegree married black hisp Iteration Iteration Iteration Iteration 0: 1: 2: 3: log log log log likelihood likelihood likelihood likelihood = 302.0421444 .06748 = = = = 445 16.09784844 9 .180637 .109 .0342 .5. Interval] +train  .213564126 Number of obs F( 9.451 .633 . is to use a probit model for unem78 on train and x.11.06748 Probit estimates Number of obs LR chi2(8) Prob > chi2 Pseudo R2 Log likelihood = 294.0676704 agesq  .025669 . 18.0011038 .0815002 2.0296401 . of course.75 0.7246235 435 .0025525 .2342579 .0444832 2.0620734 0. I used the following Stata session to answer all parts: .3408202 hisp  .11: participating in job training is estimated to reduce the unemployment probability by about . training status was randomly assigned.8053572 .3368413  In all three cases.0266 117 .48 0.2512535 .60 0.421 .0094371 0.0003098 1.0080391 re75  .0040 0.47 0.07 0. a.75 0.60 0.49 0.0204538 .0231295 re74  . Of course.0001139 nodegree  .81 0.0189565 1.0415 0.1 = 294.1502777 married  .
43 0.9767041 Number of obs F( 9.927734 married  .0113738 .93248 27.55 0.1041242 agesq  .0371871 .583 . 435) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 445 1.050672 1.1515457 2.53 0.05 0.43 0.9992 = .6396151 age  . Std.482387 6.091519 .0159447 .6566 444 43.5898524 .004 .003026272 436 6.0360 0. 436) Prob > F Rsquared Adj Rsquared Root MSE = 445 =69767.524 .63 0.759 .997 35.9992 = 0.75 0.01 0.28 0.779 1.776258 9 78.92 0.656719 0.098774 0.369752 1.170 .197362 Residual  18821.5779 re78  Coef.1726192 0.train  Coef.8154273 0.257878 .5004545 .31 0.4298464 black  . reg re78 train re74 re75 age agesq nodegree married black hisp (phat re74 re75 age agesq nodegree married black hisp) Instrumental variables (2SLS) regression Source  SS df MS +Model  703.0161 6.1602 .688 17.2271609 0. Std.103972 .0624611 .104 1. t P>t [95% Conf.2686905 +Total  19525.9410e06 +Total  3.08 0.00172 0.7389742 .934 .3400184 .8517046 hisp  .008732134 118 Number of obs F( 8.0045238 0.554259 1.08 0.2468083 .37 0.1052176 .203039 0. Err.0122825 re75  .2232733 .87706754 444 . z P>z [95% Conf.0000719 .0064086 nodegree  1.0024826 .31125 35.0699177 18.992 .467 .203087 1.89168 _cons  4.2814839 0.87404126 8 .670 7.2746971 0.668 .4668602 .210237 2.8804 435 43.596 .47144 0.45109 re74  .00 0.1998802 .0763 0.7397788 agesq  .108893 black  2.0501979 .44 = 0.00263 .40 0. reg phat re74 re75 age agesq nodegree married black hisp Source  SS df MS +Model  3.1449258 married  .44195 .0534045 0. Interval] +re74  .367622 3.157 5. Pr(train)) .234 .42 0.1030629 _cons  .0000 = 0.0017837 nodegree  .0016399 .73 0.2284561 .613857 11.0189577 .3481955 re75  .662979 4.0863775 .0159392 1.19 0. Interval] +train  .963 2.484255158 Residual  .090319 age  .64 0.0271086 1.936 7.1446253 .3079227 1.3006019 hisp  .2953534 3.0008734 0. predict phat (option p assumed.826664 .0005467 .1453799 0. Err.
92 0.0553027 hisp  .1838453 .00036 98. 18.0000293 1.000 .0006238 294.0069301 . se = .000 .z] = E[exp(p0 + xP1 + zP2 + p3v)Wvx.z. a.80e06 16.000316 546. We can start with equation (18.0000328 nodegree  .9992. The IV estimate of a is very small .004 .0571603 .0001046 agesq  . which means there is virtually no separate ^ variation in Fi that cannot be explained by xi.000 .0359877 black  . again. d.0139209 .v)x. y = h0 + xG + bw + wW(x  J)D + u + wWv + e.0138135 .0004726 118.0016786 351. much smaller than when we used either linear regression or the propensity score in a regression in Example 18. This example illustrates why trying to achieve identification off of a nonlinearity can be fraught with problems.31 0.) The very large standard error (18. and.1826192 _cons  .v)Wvx.93 0.1726018 .594057 b.z] = xWexp(p0 + xP1 + zP2) where x = E[exp(p3v)Wv].0003207 .0068687 re75  .0562315 .phat  Coef.070.1719806 married  .01 0.z.0140283 age  .14 0. t P>t [95% Conf.5874586 . a = 1.0005368 .000 .z) and an error. and we have 119 .000 .2.000 .00) suggests severe collinearity among the instruments.66).1732229 .0352802 . The collinearity suspected in part b is confirmed by regressing Fi on the xi: the Rsquared is .5907578 .640.00011 2.0000258 .0000546 254.0069914 .z) = E[E(wWvx.000 .7. it is not a good idea.04 0. we will replace wWv with its expectation given (x. To be added. ^ (When we do not instrument for train. Err..99 0.000 . Generally. ^ c.z] = E[E(wx.0345727 . Std.0000312 222. Interval] +re74  .71 0. But E(wWvx.625.82 0.9.1850713 . 18.
where we have used the fact that w is a function of (g,x,z). So we can write

y = h0 + xG + bw + w*(x - J)D + k*E(w|x,z) + r,

for a constant k, where r ≡ u + [w*v - E(w*v|x,z)] + e and, given the assumptions, E(r|x,z) = 0. But the ATE b is not identified by the IV estimator applied to the extended equation: if h ≡ h(x,z) is any function of (x,z), then L(w|1,x,z,h) = L(w|q) = q, because q ≡ E(w|x,z). In effect, no other functions of (x,z) are valid as instruments, because we need to include E(w|x,z) in the estimating equation. This is a clear weakness of the approach.

b. This is not what I intended to ask. What I should have said is: assume we can write w = exp(p0 + xP1 + zP2 + g), where g is independent of (x,z), as is implied in the statement of the problem, and where E(u|g,x,z) = rho*g and E(v|g,x,z) = eta*g. These are standard linearity assumptions under independence of (u,v,g) and (x,z). Then we take the expected value of (18.66) conditional on (g,x,z):

E(y|g,x,z) = h0 + xG + bw + w*(x - J)D + rho*g + eta*w*g,

where, again, we use the fact that w is a function of (g,x,z). The last equation suggests a two-step procedure. First, since log(wi) = p0 + xiP1 + ziP2 + gi, we can consistently estimate p0, P1, and P2 from the OLS regression log(wi) on 1, xi, zi, i = 1,...,N. [Note that we do not need to replace p0 with a different constant.] From this regression we need the residuals, g^i, i = 1,...,N. In the second step, run the regression

yi on 1, xi, wi, wi*(xi - x̄), g^i, wi*g^i,   i = 1,...,N.

The coefficient on wi is the consistent estimator of b, the average treatment effect.

c. A standard joint significance test, an F-type
299532 restaurn  2.011 5. For the exponential case.1. t P>t [95% Conf.7745021 . which.003 .83 0.8509044 5.3.412 cigs  Coef.on the last two terms effectively tests the null hypothesis that w is exogenous.829893 .880158 +Total  151753.5017533 .1605158 4.56 0. 799) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 807 6.059019 .8690144 .00 0. so the sufficient second order condition is satisfied.1. . 19.683 806 188. The first = 0.0000 0. so m = mo uniquely sets the The second derivative of q(m) is mom 2 > 0 for all m > 0.561503 2.0529 0. CHAPTER 19 19. 2 The 2 + m . The answers are given below.1736136 age  .246 799 179.089585 121 .305594 educ  .38 0. molog(m) .5592363 1.log(m).m 3 second derivative is 2mom 2 mo 2 = mo 1 _ E[li(m)] = mo/m . The following is Stata output used to answer parts a through f. gives 2mo + < 0.19 0.1671677 3. b. which is uniquely solved by m = mo. Write q(m) _ Then dq(m)/dm = mo/m . when evaluated at mo.702 3.280003 Number of obs F( 7.233 .883 12. q(m) 2 order condition is mom . Err.43631 7 1147.test .15 0.782321 0.865621 1. This is a simple problem in univariate calculus.20124 10.38 0.7287636 1.4594197 1.459461 0.0446 13.000 .m for m > 0. a.117406 2. Std. derivative to zero. Interval] +lcigpric  . reg cigs lcigpric lincome restaurn white educ age agesq Source  SS df MS +Model  8029.49943 lincome  .6722235 white  .06233 Residual  143724.424067 2.
test lcigpric lincome ( 1) ( 2) lcigpric = 0.1829532 age  .90194 0.16145 .82 0.07 0.002 .000 .0056373 _cons  2.0000 0.14 0.685 3.017275 2.682435 25.682435 24.61 0. robust Regression with robust standard errors Number of obs F( 7.5592363 1.412  Robust cigs  Coef.7353 11.04545 agesq  .09 0.005 4.1624097 3.0 F( 2.agesq  .11 0. test lcigpric lincome ( 1) ( 2) lcigpric = 0.000 .0 F( 2.0119324 .597972 1.042796 restaurn  2.912 50. 799) Prob > F Rsquared Root MSE = = = = = 807 9.22 0. reg cigs lcigpric lincome restaurn white educ age agesq.865621 1.0062048 _cons  2.3047671 2. Std.0335 lincome  .0124999 .918 53. poisson cigs lcigpric lincome restaurn white educ age agesq Iteration 0: Iteration 1: Iteration 2: log likelihood = 8111.8346 log likelihood = 8111.71 0.26472 2.147 .8690144 .86134 .22073 0.000 .888 12.45 0.22621 44.146247 educ  .38 0.0 lincome = 0. 799) = Prob > F = 0.0090686 .8509044 6.19 0.5191 log likelihood = 8111.378283 0.41 0.5017533 .3441 . 799) = Prob > F = 1.862469 .519 Poisson regression Number of obs LR chi2(7) 122 = = 807 1068. Err.0017481 5.5035545 1.8687741 white  .70 .1380317 5. t P>t [95% Conf. Interval] +lcigpric  .7745021 .0 lincome = 0.8205533 .10 0.4899 .0014589 6.52632 48.054396 0.0529 13.0090686 .
Interval] +lcigpric  .519022 = 14698.46367 20.0223989 5.0618 cigs  Coef.2753831 educ  .0677648 .1037275 .0914144 1.0594225 .58 0.1037275 .519 = 8111. of obs Residual df Scale param (1/df) Deviance (1/df) Pearson 14752.14 0.10 0.33 0.0008677 _cons  .160812 lincome  .48 0.10 0.886 5.3636059 .519 Generalized linear models Optimization : ML: NewtonRaphson Deviance Pearson = = No.1059607 .1407338 2.34 0.46933 16232.74 0.820355 (Standard errors scaled using square root of Pearson X2based dispersion) * The estimate of sigma is 123 .1285444 .000 .6394391 .0013708 .99 0.0510802 age  .1750847 lincome  .0049694 22.0181421 educ  .3870061 .3024098 white  . glm cigs lcigpric lincome restaurn white educ age agesq.0639772 .1142571 . Err.6454 = 8111.0374207 1.870 1.0552011 .000 .31628 = 20.0312231 11.92274 = = = = = 807 799 1 18.6463244 0.0013708 . family(poisson) sca(x2) Iteration Iteration Iteration Iteration 0: 1: 2: 3: log log log log likelihood likelihood likelihood likelihood = 8380.000 .2828965 restaurn  .0012592 _cons  .65 0. z P>z [95% Conf.4248021 .001874 . Interval] +lcigpric  .1434779 restaurn  .000 .518 . Std.0754414 .027457 5.3857854 .000057 24.257 .0000 0.599794 .6139626 0. Err.65 0.002 .0042564 13.000 .11 0.140 .1142571 .0703561 .3636059 .1239969 agesq  .0191849 3.12272 cigs  Coef.0552012 .Log likelihood = Prob > chi2 Pseudo R2 8111.0594225 .1059607 .0014825 .158158 agesq  .0218208 age  .8068952 1.0970243 .96 0.0202811 5. Std.3964493 2.1083 = 8111.13 0.743 .010 .1686685 0.000 .07 0.1433932 0.372733 1.0877728 white  . z P>z [95% Conf.1045172 .460 .76735 0.70987 Variance function: V(u) = u Link function : g(u) = ln(u) Standard errors : OIM [Poisson] [Log] Log likelihood BIC AIC = 8111.3964494 .519 = = 0.0002567 5.16 0.000 .
48 0.519 = 8111. .14 0.32) 4.31628 .3545336 .0014458 .000 .0040652 13. Err.098 .6454 = 8111.291 .46367 20. di sqrt(20.037371 1.32) 1.2906 = = = = 807 1041. Interval] +restaurn  .5469381 .1350483 .46933 16232. di 2*(8125.1305594 agesq  .70987 Variance function: V(u) = u Link function : g(u) = ln(u) Standard errors : Sandwich [Poisson] [Log] 124 = = = = = 807 799 1 18. * dividing by 20.4150564 .2907 log likelihood = 8125.0618025 .519 Generalized linear models Optimization : ML: NewtonRaphson Deviance Pearson = = No.1116754 .1095991 6. family(poisson) robust Iteration Iteration Iteration Iteration 0: 1: 2: 3: log log log log likelihood likelihood likelihood likelihood = 8380.2940107 white  .0000553 26. z P>z [95% Conf..544 .0013374 _cons  .000 .3555118 . di 2*(8125.95 0.0000 0.1211174 .0015543 .2906 Poisson regression Number of obs LR chi2(5) Prob > chi2 Pseudo R2 Log likelihood = 8125.8111.0602 cigs  Coef.000 .7617484 .0452489 age  .8111. glm cigs lcigpric lincome restaurn white educ age agesq.1083 = 8111.32: The GLM version is obtained by .0114433 educ  . poisson cigs restaurn white educ age agesq Iteration 0: Iteration 1: Iteration 2: log likelihood = 8125.65 0.5077711 .0532166 . * This is the usual LR statistic.291 .16 0.14 0.000 .9765587 .0611842 .519)/(20.519) 27.0048175 25. Std.618 log likelihood = 8125.0308796 11.09 0.000 . of obs Residual df Scale param (1/df) Deviance (1/df) Pearson 14752.
is very significant: t = 5.11, based on the usual Poisson standard errors.

a. Neither the price nor income variable is significant at any reasonable significance level, although the coefficient estimates are the expected sign. It does not matter whether we use the usual or robust standard errors. The two variables are jointly insignificant, too, using the usual and heteroskedasticity-robust tests (p-values = .344 and .490, respectively). Incidentally, if you drop restaurn (a binary indicator for restaurant smoking restrictions at the state level), then log(cigpric) becomes much more significant (but using the incorrect standard errors). In this data set, both cigpric and restaurn vary only at the state level, and, not surprisingly, they are significantly correlated. (States that have restaurant smoking restrictions also have higher average prices.)

b. Both estimates are elasticities: the estimated price elasticity is -.106 and the estimated income elasticity is .104.
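Several of the hand calculations reported for this problem (the LR statistic, its quasi-likelihood-ratio adjustment, the GLM estimate of s, and the age turning point) can be reproduced directly from the numbers in the Stata output above:

```python
# Reproducing the "di" hand calculations; the log likelihoods, dispersion
# estimate, and age coefficients are taken from the Stata output above.
llr_restricted, llr_unrestricted = -8125.291, -8111.519
sigma2_hat = 20.32                       # Pearson-based dispersion estimate

lr = 2 * (llr_unrestricted - llr_restricted)   # usual LR statistic
qlr = lr / sigma2_hat                          # quasi-LR adjustment
sigma_hat = sigma2_hat ** 0.5                  # GLM estimate of sigma

b_age, b_agesq = 0.1143, -0.00137
turning_point = -b_age / (2 * b_agesq)         # peak of the age quadratic

print(round(lr, 2), round(qlr, 2), round(sigma_hat, 2), round(turning_point, 1))
```

Dividing the LR statistic by the dispersion estimate is what turns a strongly "significant" 27.54 into an insignificant 1.36, the point made in part e below.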
c. The GLM estimate of s is s^ = 4.51. This means all of the Poisson standard errors should be multiplied by this factor, as is done using the "glm" command in Stata (with the option "sca(x2)"). With the GLM standard errors, the t statistic on lcigpric is now very small (.13), and that on lincome falls to 1.51, much more in line with the linear model t statistic (1.19 with the usual standard errors). Clearly, using the maximum likelihood standard errors is very misleading in this example.

d. Using the robust standard errors does not significantly change any conclusions. (Interestingly, most explanatory variables become slightly more significant than when we use the GLM standard errors.) So it is the adjustment by s^ > 1 that makes the most difference; having fully robust standard errors has no additional effect.

e. The usual LR statistic is 2(8125.291 - 8111.519) = 27.54, which is a very large value in a chi-square distribution with two degrees of freedom (p-value ~ 0). The QLR statistic divides the usual LR statistic by s^2 = 20.32, so QLR = 1.36 (p-value ~ .51). The QLR statistic shows that the variables are jointly insignificant, while the LR statistic shows strong significance.

f. As expected, the restaurant restriction variable, education, and the age variables are still significant. In fact, conditional on the other covariates, there is no race effect.

g. We simply compute the turning point for the quadratic: b^age/(2*b^age2) = .1143/(2*.00137) ~ 41.7.

A double hurdle model, which separates the initial decision to smoke at all from the decision of how much to smoke, seems like a good idea and is certainly worth investigating. One approach is to model D(y|x, y ≥ 1) as a truncated Poisson distribution, and then to model P(y = 0|x) as a logit or probit.

19.5. a. We just use iterated expectations:

E(yit|xi) = E[E(yit|xi,ci)|xi] = E(ci|xi)exp(xitB)
          = exp(a + x̄iG)exp(xitB) = exp(a + xitB + x̄iG).

b. We are explicitly testing H0: G = 0, but we are maintaining full independence of ci and xi under H0. We have enough assumptions to derive Var(yi|xi), the T × T conditional variance matrix of yi given xi, under H0. First,

Var(yit|xi) = E[Var(yit|xi,ci)|xi] + Var[E(yit|xi,ci)|xi]
            = E[ci*exp(xitB)|xi] + Var[ci*exp(xitB)|xi]
            = exp(a + xitB) + t^2*[exp(xitB)]^2,

where t^2 ≡ Var(ci) and we have used E(ci|xi) = exp(a) under H0. A similar, general expression holds for conditional covariances: for t ≠ r,

Cov(yit,yir|xi) = E[Cov(yit,yir|xi,ci)|xi] + Cov[E(yit|xi,ci),E(yir|xi,ci)|xi]
                = 0 + Cov[ci*exp(xitB), ci*exp(xirB)|xi] = t^2*exp(xitB)exp(xirB).

So, under H0, Var(yi|xi) depends on a, B, and t^2, all of which we can estimate. It is natural to use a score test of H0: G = 0. First, under H0, obtain consistent estimators ~a, ~B by, say, pooled Poisson QMLE; this works because, under H0, E(yit|xi) = E(yit|xit) = exp(a + xitB). Let ~yit = exp(~a + xit~B) and ~uit = yit - ~yit. A consistent estimator of t^2 can be obtained from a simple pooled regression, through the origin, of ~uit^2 - ~yit on [exp(xit~B)]^2. Call this estimator ~t^2. This works because, under H0, E(uit^2|xi) = exp(a + xitB) + t^2*[exp(xitB)]^2, where uit ≡ yit - E(yit|xit). [We could also use the many covariance terms in estimating t^2 because t^2 =
. obtain ~ ~ ~ ~ ~ ~ ~ ~ consistent estimators a.. through the origin. of ~ ~ ~ ~2 ~ ~ 2 uit ... 2 2 where t 2 _ Var(ci) and we have used E(cixi) = exp(a) under H0.. the T = 0. general expression holds for conditional covariances: Cov(yit. We just use iterated expectations: E(yitxi) = E[E(yitxi. Var(yitxi) = E[Var(yitxi. B. 2 also use the many covariance terms in estimating t 127 2 because t [We could 2 = . First. We are explicitly testing H0: independence of ci and xi under H0.yit on [exp(xitB)] . all of which we can It is natural to use a score test of H0: G = 0.ci).19.yit.ci)xi] + Var[E(yitxi. say.ciexp(xirB)xi] = t exp(xitB)exp(xirB).ci)xi] = E[ciexp(xitB)xi] + Var[ciexp(xitB)xi] = exp(a + xitB) + t [exp(xitB)] . under H0. Var(yixi).N. Let yit = exp(a + ~ ~ ~ ~ ~ ~ 2 xitB) and uit = yit .T.yirxi) = E[Cov(yit.yirxi. but we are maintaining full We have enough assumptions to derive * T conditional variance matrix of yi given xi under H0.ci)xi] + Cov[E(yitxi. This works because. pooled Poisson QMLE.E(yirxi. under H0. a simple pooled regression. ~2 Call this estimator t .. i = 1. First.E(yitxit). A consistent estimator of t can be obtained from estimate. t = 1. Var(yixi) depends on a. where uit 2 2 _ yit ..ci)xi] = 0 + Cov[ciexp(xitB).ci)xi] = E(cixi)exp(xitB) = exp(a + xiG)exp(xitB) = exp(a + xitB + xiG). a.. 2 So.   G b. 2 and t . B by.5. A similar. E(uitxi) = E(uitxit) 2 = exp(a + xitB) + t [exp(xitB)] .
E{[uit/exp(xitB)][uir/exp(xirB)]}, all t
2
2
Next, we construct the T
$ r.
* T weighting matrix for observation i, as in
~
~
Section 19.6.3; see also Problem 12.11. The matrix Wi(D) = W(xi,D) has
~
~
~
~2
~ 2
diagonal elements yit + t [exp(xitB)] , t = 1,...,T and offdiagonal elements
~
~
~2
~
~
~ ~
t exp(xitB)exp(xirB), t $ r. Let a, B be the solutions to
N
~ 1
min (1/2) S [yi  m(xi,a,B)]’[Wi(D)] [yi  m(xi,a,B)],
i=1
a,B
where m(xi,a,B) has t
th
element exp(a + xitB).
Since Var(yixi) = W(xi,D),
this is a MWNLS estimation problem with a correctly specified conditional
variance matrix.
Therefore, as shown in Problem 12.1, the conditional
information matrix equality holds.
To obtain the score test in the context
of MWNLS, we need the score of the comditional mean function, with respect to
all parameters, evaluated under H0.
Let
Q _
Then, we can apply equation (12.69).
(a,B’,G’)’ denote the full vector of conditional mean
parameters, where we want to test H0:
G
= 0.
The unrestricted conditional
mean function, for each t, is
mt(xi,Q) = exp(a + xitB + xiG).

Taking the gradient and evaluating it under H0 gives
~
Dqmt(xi,Q~) = exp(a~ + xitB
)[1,xit,xi],

which would be 1
* (1 + 2K) without any redundancies in xi.

Usually, xit
would contain year dummies or other aggregate effects, and these would be

dropped from xi; we do not make that explicit here.
T
Let
DqM(xi,~Q) denote the
* (1 + 2K) matrix obtained from stacking the Dqmt(xi,Q~) from t = 1,...,T.
Then the score function, evaluate at the null estimates
~
Q _
~ ~ ~
(a,B’,G’)’, is
~
~
~ 1~
si(Q) = DqM(xi,Q)’[Wi(D)] ui,
~
where ui is the T
* 1 vector with elements ~uit _ yit  exp(~a + xitB~ ).
128
The
estimated conditional Hessian, under H0, is
~
1
A = N
N
S DqM(xi,Q~)’[Wi(~D)]1DqM(xi,~Q),
i=1
a (1 + 2K)
* (1 + 2K) matrix.
The score or LM statistic is therefore
& S D M(x ,~Q)’[W (~D)]1~u *’& SN D M(x ,~Q)’[W (~D)]1D M(x ,~Q)*1
i
i
i8 7
q i
i
q i 8
7i=1 q
i=1
N
W&7 S DqM(xi,Q~)’[Wi(~D)]1~ui*8.
N
LM =
i=1
a
2
Under H0, and the full set of maintained assumptions, LM ~ cK.
If only J < K

elements of xi are included, then the degrees of freedom gets reduced to J.
In practice, we might want a robust form of the test that does not
require Var(yixi) = W(xi,D) under H0, where W(xi,D) is the matrix described
above.
This variance matrix was derived under pretty restrictive
assumptions.
~
A fully robust form is given in equation (12.68), where si(Q)
~
~
1
and A are as given above, and B = N
N
S si(~Q)si(~Q)’.
Since the restrictions
i=1
are written as
matrix is K
G
= 0, we take c(Q) =
G,
~
and so C = [0IK], where the zero
* (1 + K).
c. If we assume (19.60), (19.61), and ci = ai·exp(α + x̄iγ), where ai|xi ~ Gamma(δ,δ), then things are even easier, at least if we have software that estimates random effects Poisson models. Under these assumptions, we have

yit|xi,ai ~ Poisson[ai·exp(α + xitβ + x̄iγ)],
yit, yir are independent conditional on (xi,ai), t ≠ r,
ai|xi ~ Gamma(δ,δ).

In other words, the full set of random effects Poisson assumptions holds, but where the mean function in the Poisson distribution is ai·exp(α + xitβ + x̄iγ). In practice, we just add the (nonredundant elements of) x̄i in each time period, along with a constant and xit, and carry out a random effects Poisson analysis. We can test H0: γ = 0 using the LR, Wald, or score approaches. Any of these would be asymptotically efficient, but none is robust because we have used a full distribution for yi given xi.
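The logic of adding x̄i can be sanity-checked by simulation. The sketch below is not the random effects Poisson estimation described above (which the text carries out in Stata); it uses a pooled Poisson QMLE with invented parameter values, which is enough to see that including a constant, xit, and x̄i consistently recovers γ, because E(ai|xi) = 1 implies E(yit|xi) = exp(α + xitβ + x̄iγ).

```python
# Simulation sketch (all parameter values invented): pooled Poisson QMLE
# with regressors (1, x_it, xbar_i) under c_i = a_i*exp(alpha + xbar_i*gamma),
# a_i | x_i ~ Gamma(delta, delta) with mean one.
import math, random

random.seed(7)

def rpoisson(lam):
    # Knuth's method
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

N, T = 1500, 2
alpha0, beta0, gamma0, delta0 = 0.1, 0.3, 0.5, 2.0
rows = []  # (z, y) with z = (1, x_it, xbar_i)
for i in range(N):
    x = [random.gauss(0, 1) for _ in range(T)]
    xbar = sum(x) / T
    ai = random.gammavariate(delta0, 1.0 / delta0)  # mean-one heterogeneity
    for t in range(T):
        lam = ai * math.exp(alpha0 + beta0 * x[t] + gamma0 * xbar)
        rows.append(([1.0, x[t], xbar], rpoisson(lam)))

# Pooled Poisson QMLE for (alpha, beta, gamma) via Newton's method
theta = [0.0, 0.0, 0.0]
for _ in range(100):
    S = [0.0] * 3
    H = [[0.0] * 3 for _ in range(3)]
    for z, y in rows:
        m = math.exp(sum(tj * zj for tj, zj in zip(theta, z)))
        for j in range(3):
            S[j] += (y - m) * z[j]
            for k in range(3):
                H[j][k] += m * z[j] * z[k]
    # Gauss-Jordan solve H * step = S
    M = [H[r][:] + [S[r]] for r in range(3)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [e / piv for e in M[c]]
        for r in range(3):
            if r != c:
                f = M[r][c]
                M[r] = [e - f * ec for e, ec in zip(M[r], M[c])]
    step = [M[r][3] for r in range(3)]
    theta = [tj + sj for tj, sj in zip(theta, step)]
    if sum(abs(s) for s in step) < 1e-10:
        break

print([round(t, 2) for t in theta])  # roughly (0.1, 0.3, 0.5)
```

The pooled estimator is consistent but not efficient here, and its usual standard errors are invalid because of the ai-induced serial correlation, which is exactly why the text prefers the random effects Poisson analysis.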
19.7. a. First, for each t, the density of yit given (xi = x, ci = c) is

f(yt|x,c;βo) = exp[−c·m(xt,βo)][c·m(xt,βo)]^{yt}/yt!,  yt = 0,1,2,....

Multiplying these together gives the joint density of (yi1,...,yiT) given (xi = x, ci = c). Taking the log, plugging in the observed data for observation i, and dropping the factorial term gives

li(ci,β) = Σ_{t=1}^T {−ci·m(xit,β) + yit[log(ci) + log(m(xit,β))]}.

b. Taking the derivative of li(ci,β) with respect to ci, setting the result to zero, and rearranging gives

(ni/ci) = Σ_{t=1}^T m(xit,β).

Letting ci(β) denote the solution as a function of β, we have ci(β) = ni/Mi(β), where Mi(β) ≡ Σ_{t=1}^T m(xit,β). The second order sufficient condition for a maximum is easily seen to hold.

c. Plugging the solution from part b into li(ci,β) gives

li[ci(β),β] = −[ni/Mi(β)]Mi(β) + Σ_{t=1}^T yit{log[ni/Mi(β)] + log[m(xit,β)]}
           = −ni + ni·log(ni) + Σ_{t=1}^T yit·log[m(xit,β)/Mi(β)]
           = Σ_{t=1}^T yit·log[pt(xi,β)] + ni[log(ni) − 1],

because pt(xi,β) = m(xit,β)/Mi(β) [see equation (19.66)].

d. From part c it follows that if we maximize Σ_{i=1}^N li(ci,β) with respect to (c1,...,cN), that is, we concentrate out these parameters, we get exactly Σ_{i=1}^N li[ci(β),β]. But, except for the term Σ_{i=1}^N ni[log(ni) − 1], which does not depend on β, this is exactly the conditional log likelihood for the conditional multinomial distribution obtained in Section 19.6.4. Therefore, this is another case where treating the ci as parameters to be estimated leads us to a √N-consistent, asymptotically normal estimator of βo.
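The concentration algebra in part c can be verified numerically for a single unit. In the sketch below the covariate values, counts, and β are made up for illustration; the check confirms that the concentrated objective equals the multinomial log likelihood plus the β-free term ni[log(ni) − 1].

```python
# Numerical check of 19.7(c) for one unit with T = 3 (all inputs made up).
import math

beta = 0.4
x = [0.5, -1.0, 1.5]                    # x_i1, x_i2, x_i3
y = [3, 1, 5]                           # y_i1, y_i2, y_i3
m = [math.exp(beta * xt) for xt in x]   # m(x_it, beta) = exp(beta*x_it)
M = sum(m)                              # M_i(beta)
n = sum(y)                              # n_i

def li(c):
    # l_i(c, beta) = sum_t { -c*m_t + y_t*[log(c) + log(m_t)] }
    return sum(-c * mt + yt * (math.log(c) + math.log(mt))
               for mt, yt in zip(m, y))

c_hat = n / M                           # c_i(beta) = n_i / M_i(beta)
# c_hat maximizes l_i(., beta):
assert li(c_hat) > li(c_hat * 0.9) and li(c_hat) > li(c_hat * 1.1)

multinomial = sum(yt * math.log(mt / M) for mt, yt in zip(m, y))
gap = li(c_hat) - multinomial
print(abs(gap - n * (math.log(n) - 1.0)) < 1e-9)  # → True
```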
19.9. a. I will use the following Stata output. I first converted the dependent variable to be in [0,1], rather than [0,100]; this is required to easily use the "glm" command in Stata.

. replace atndrte = atndrte/100
(680 real changes made)

. reg atndrte ACT priGPA frosh soph

      Source |       SS       df       MS              Number of obs =    680
-------------+------------------------------          F(  4,   675) =  72.92
       Model |  5.95396289     4  1.48849072          Prob > F      = 0.0000
    Residual |  13.7777696   675  .020411511          R-squared     = 0.3017
-------------+------------------------------          Adj R-squared = 0.2976
       Total |  19.7317325   679  .029059989          Root MSE      = .14287

     atndrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |  -.0169202    .001681   -10.07   0.000    -.0202207   -.0136196
      priGPA |   .1820163   .0112156    16.23   0.000     .1599947    .2040379
       frosh |  -.0517097   .0173019    -2.99   0.003    -.0856818   -.0177377
        soph |  -.0110085    .014485    -0.76   0.448    -.0394496    .0174327
       _cons |   .7087769   .0417257    16.99   0.000     .6268492    .7907046

Since the coefficient on ACT is negative, we know that an increase in ACT score, holding year and prior GPA fixed, actually reduces the predicted attendance rate. The coefficient on ACT means that if the ACT score increases by 5 points (more than a one standard deviation increase) then the attendance rate is estimated to fall by about .017(5) = .085, or 8.5 percentage points. The coefficient on priGPA means that if prior GPA is one point higher, the attendance rate is predicted to be about .182 higher, or 18.2 percentage points. Naturally, these changes do not always make sense when starting at extreme values of atndrte.

. predict atndrteh
(option xb assumed; fitted values)

. count if atndrteh > 1
  12

There are 12 fitted values greater than one, and none less than zero.

b. For the logistic functional form I use the "glm" command:

. glm atndrte ACT priGPA frosh soph, family(binomial) sca(x2)
note: atndrte has noninteger values

[Condensed output: No. of obs = 680, residual df = 675; variance function V(u) = u*(1-u) [Bernoulli]; link g(u) = ln(u/(1-u)) [Logit]; log likelihood = -223.6494; standard errors scaled using square root of Pearson X2-based dispersion, with (1/df) Pearson = .322713. Coefficients (scaled standard errors): ACT -.1113802 (.0113217); priGPA 1.244375 (.0771321); frosh -.3899318 (.113436); soph -.0922209 (.0944066); _cons .7371358.]

The GLM standard errors are given in the output; they account for the fact that the estimated dispersion, about .32, is well below one. (If you omit the "sca(x2)" option in the "glm" command, you will get the usual MLE standard errors, obtained from the expected Hessian of the quasi-log likelihood; the usual MLE standard errors are much too large.)

c. For the logistic functional form, the calculation shows that when ACT increases from 25 to 30, holding priGPA fixed at 3, the estimated fall in atndrte is about .087, or 8.7 percentage points. This is very similar to that found using the linear model.

d. I computed the squared correlation between atndrtei and the fitted values from the logistic model:

. predict atndh
(option mu assumed; predicted mean atndrte)

. corr atndrte atndh
(obs=680): correlation = .5725

. di (.5725)^2
.32775625

This R-squared is about .328, while the R-squared for the linear model is about .302, and so the logistic functional form does fit better than the linear model. But remember that the parameters in the logistic functional form are not chosen to maximize an R-squared.
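The part c calculation has the shape sketched below. The slopes are the rounded logistic estimates, while the intercept values are assumptions for illustration (the garbled output does not pin the intercept down reliably); the second intercept shows that, unlike in the linear model, the implied change depends on the starting level, which is why such changes can stop making sense at extreme values of atndrte.

```python
# Change in the logistic mean when ACT goes from 25 to 30 at priGPA = 3.
# Slopes are the rounded estimates from the text; intercepts are hypothetical.
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

b_act, b_gpa = -0.1114, 1.244
results = []
for b0 in (0.74, 2.0):  # two hypothetical intercepts
    e25 = logistic(b0 + b_act * 25 + b_gpa * 3.0)
    e30 = logistic(b0 + b_act * 30 + b_gpa * 3.0)
    results.append((e25, e30, e25 - e30))
    print(round(e25, 3), round(e30, 3), round(e25 - e30, 3))
```

With the first intercept the drop is close to nine percentage points; with the larger intercept the same five-point ACT change implies a much smaller drop, because the starting level is nearer one.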
SOLUTIONS TO CHAPTER 20 PROBLEMS

20.1. a. P(ti < t|xi,ai) = P(t*i < t|xi,ai,t*i > b − ai)
= P(t*i < t, t*i > b − ai|xi)/P(t*i > b − ai|xi)
= [F(t|xi) − F(b − ai|xi)]/[1 − F(b − ai|xi)],  t > b − ai.

b. The derivative of the cdf in part a, with respect to t, is simply f(t|xi)/[1 − F(b − ai|xi)].

c. P(ti = ci|xi,ai,ci,si = 1) = P(t*i > ci|xi,t*i > b − ai) = [1 − F(ci|xi)]/[1 − F(b − ai|xi)] (because ci > b − ai).

20.3. a. If all durations in the sample are censored, di = 0 for all i, and so the log likelihood is

Σ_{i=1}^N log[1 − F(ci|xi;θ)].

b. For the Weibull case, F(t|xi;θ) = 1 − exp[−exp(xiβ)t^α], and so log[1 − F(ci|xi;θ)] = −exp(xiβ)ci^α; the log likelihood is −Σ_{i=1}^N exp(xiβ)ci^α.

c. Without covariates, the Weibull log likelihood with complete censoring is

−exp(β) Σ_{i=1}^N ci^α.

Since ci > 0, for any α > 0 we have Σ_{i=1}^N ci^α > 0, and so the log likelihood is maximized by minimizing exp(β) across β. But as β → −∞, exp(β) → 0. So plugging any value α > 0 into the log likelihood will lead to β getting more and more negative without bound, and no two real numbers for α and β maximize the log likelihood.

d. It is not possible to estimate duration models from flow data when all durations are right censored.

20.5. To be added.
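The nonidentification argument for the all-censored Weibull case is easy to see numerically: for any fixed shape parameter, the log likelihood increases toward zero as the intercept decreases, so no maximizer exists. A small sketch with hypothetical censoring times:

```python
# For fixed shape a > 0, loglik(b) = -exp(b) * sum(c_i ** a) is strictly
# increasing as b decreases, approaching (but never attaining) zero.
import math

c = [2.0, 0.7, 1.3, 3.1]   # hypothetical censoring times, every spell censored
a = 1.5                    # any fixed Weibull shape parameter

def loglik(b):
    return -math.exp(b) * sum(ci ** a for ci in c)

vals = [loglik(b) for b in (0.0, -5.0, -10.0, -20.0)]
print([round(v, 6) for v in vals])  # increases toward 0, never attains it
```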
20.7. To be added.

20.9. a. We suppress the parameters in the densities. First, conditional on xi, the density of (ai,t*i) given (ci,xi) does not depend on ci and is given by k(a|xi)f(t|xi) for 0 < a < b and 0 < t < ∞, by (20.22) and D(ai|ci,xi) = D(ai|xi). Next, by the usual right censoring argument, the density of (ai,ti) given (ci,xi) is k(a|xi)f(t|xi) when t < ci, that is, when the observation is uncensored; for t = ci, the density is k(a|xi)[1 − F(ci|xi)]. Together, the density of (ai,ti) given (ci,xi) is

k(a|xi)[f(t|xi)]^di[1 − F(ci|xi)]^(1−di).

Now, from the standard result for densities for truncated distributions, the density of (ai,ti) given (ci,xi) and si = 1 is, for all combinations (a,t) such that si = 1,

k(a|xi)[f(t|xi)]^di[1 − F(ci|xi)]^(1−di)/P(si = 1|xi),

which is exactly (20.32). Putting in the parameters and taking the log gives (20.56).

b. We have the usual tradeoff between robustness and efficiency. Using the log likelihood (20.56) results in more efficient estimators provided we have the two densities correctly specified, while (20.30) requires us to only specify f(·|xi).

20.11. To be added.
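The right-censoring construction in 20.9a can be checked numerically: for a hypothetical Weibull duration distribution, the continuous part of the density on (0, ci) plus the point mass 1 − F(ci) at ci must account for all the probability. A stdlib-only sketch (the distribution and censoring point are invented):

```python
# Check that the censored-duration density integrates to one:
# integral of f(t) over (0, c) plus the atom 1 - F(c) at t = c.
import math

lam, a_shape, cens = 0.8, 1.5, 2.0   # hypothetical Weibull and censoring point

def F(t):
    return 1.0 - math.exp(-lam * t ** a_shape)

def f(t):
    return lam * a_shape * t ** (a_shape - 1.0) * math.exp(-lam * t ** a_shape)

n = 100000
h = cens / n
cont = sum(f((j + 0.5) * h) for j in range(n)) * h  # midpoint rule on (0, cens)
total = cont + (1.0 - F(cens))
print(round(total, 6))  # → 1.0
```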