Analysis of Cross Section and Panel Data, by Jeffrey M. Wooldridge, MIT Press, 2002.
The empirical examples are solved using various versions of Stata, with some dating back to Stata 4.0. Partly out of laziness, but also because it is useful for students to see computer output, I have included Stata output in most cases rather than type tables. In some cases, I do more hand calculations than are needed in current versions of Stata.
Currently, there are some missing solutions. I will update the solutions occasionally to fill in the missing solutions, and to make corrections. For some problems I have given answers beyond what I originally asked. Please report any mistakes or discrepancies you might come across by sending me email at wooldri1@msu.edu.
CHAPTER 2
2.1. a. $\partial E(y|x_1,x_2)/\partial x_1 = \beta_1 + \beta_4x_2$ and $\partial E(y|x_1,x_2)/\partial x_2 = \beta_2 + 2\beta_3x_2 + \beta_4x_1$.
b. By definition, $E(u|x_1,x_2) = 0$. Because $x_2^2$ and $x_1x_2$ are just functions of $(x_1,x_2)$, it does not matter whether we also condition on them: $E(u|x_1,x_2,x_2^2,x_1x_2) = 0$.
c. All we can say about $Var(u|x_1,x_2)$ is that it is nonnegative for all $x_1$ and $x_2$: $E(u|x_1,x_2) = 0$ in no way restricts $Var(u|x_1,x_2)$.
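As a quick sanity check on part (a), the sketch below compares the two analytical partial effects with central finite differences of the conditional mean. It assumes the model behind those derivatives is $E(y|x_1,x_2) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_2^2 + \beta_4x_1x_2$ (the form the stated derivatives imply), and the coefficient values are hypothetical.

```python
# Hypothetical coefficient values, chosen only for illustration.
B0, B1, B2, B3, B4 = 1.0, 0.5, -2.0, 0.3, 0.8


def cond_mean(x1, x2):
    """E(y|x1,x2) = b0 + b1*x1 + b2*x2 + b3*x2**2 + b4*x1*x2 (assumed model form)."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x2 ** 2 + B4 * x1 * x2


def pe_x1(x1, x2):
    """Analytical partial effect of x1 from part (a): b1 + b4*x2."""
    return B1 + B4 * x2


def pe_x2(x1, x2):
    """Analytical partial effect of x2 from part (a): b2 + 2*b3*x2 + b4*x1."""
    return B2 + 2 * B3 * x2 + B4 * x1


def central_diff(f, x1, x2, wrt, h=1e-5):
    """Two-sided finite-difference derivative of f with respect to 'x1' or 'x2'."""
    if wrt == "x1":
        return (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    return (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)


for x1, x2 in [(0.0, 0.0), (1.5, -2.0), (-3.0, 4.0)]:
    print(f"(x1, x2) = ({x1}, {x2}): "
          f"dE/dx1 = {pe_x1(x1, x2):.4f} (numeric {central_diff(cond_mean, x1, x2, 'x1'):.4f}), "
          f"dE/dx2 = {pe_x2(x1, x2):.4f} (numeric {central_diff(cond_mean, x1, x2, 'x2'):.4f})")
```

Because the conditional mean is quadratic, the central difference reproduces the analytical partial effects up to rounding error; note that each partial effect depends on where it is evaluated.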
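2.3. a. $y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2 + u$, where u has a zero mean given $x_1$ and $x_2$: $E(u|x_1,x_2) = 0$. We can say nothing further about u.
b. $\partial E(y|x_1,x_2)/\partial x_1 = \beta_1 + \beta_3x_2$. Because $E(x_2) = 0$, $\beta_1 = E[\partial E(y|x_1,x_2)/\partial x_1]$. Similarly, $\beta_2 = E[\partial E(y|x_1,x_2)/\partial x_2]$.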
c. If $x_1$ and $x_2$ are independent with zero mean then $E(x_1x_2) = E(x_1)E(x_2) = 0$. Further, the covariance between $x_1x_2$ and $x_1$ is $E(x_1x_2\cdot x_1) = E(x_1^2x_2) = E(x_1^2)E(x_2)$ (by independence) $= 0$. A similar argument shows that the covariance between $x_1x_2$ and $x_2$ is zero. But then the linear projection of $x_1x_2$ onto $(1,x_1,x_2)$ is identically zero. Now just use the law of iterated projections (Property LP.5 in Appendix 2A):
$$L(y|1,x_1,x_2) = L(\beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2|1,x_1,x_2)$$
$$= \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3L(x_1x_2|1,x_1,x_2)$$
$$= \beta_0 + \beta_1x_1 + \beta_2x_2.$$
d. Equation (2.47) is more useful because it allows us to compute the partial effects of $x_1$ and $x_2$ at any values of $x_1$ and $x_2$. Under the assumptions we have made, the linear projection in (2.48) does have as its slope coefficients on $x_1$ and $x_2$ the partial effects at the population average values of $x_1$ and $x_2$ (zero in both cases), but it does not allow us to obtain the partial effects at any other values of $x_1$ and $x_2$. Incidentally, the main conclusions of this problem go through if we allow $x_1$ and $x_2$ to have any population means.
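A small simulation makes part (c) concrete. It assumes, hypothetically, that $x_1$ and $x_2$ are independent standard normals and picks made-up values for $\beta_0,\dots,\beta_3$. It checks that the sample covariances of $x_1x_2$ with $x_1$ and with $x_2$ are near zero, then fits the linear projection of y on $(1,x_1,x_2)$ by OLS (normal equations solved with a small Gaussian-elimination helper). The fitted slopes should be close to $\beta_1$ and $\beta_2$, the average partial effects from part (b).

```python
import random


def cov(a, b):
    """Sample covariance (divisor n) of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


random.seed(1)
B0, B1, B2, B3 = 1.0, 0.5, -0.7, 2.0          # hypothetical parameters
n = 20000
x1 = [random.gauss(0, 1) for _ in range(n)]   # independent, mean zero
x2 = [random.gauss(0, 1) for _ in range(n)]
inter = [a * b for a, b in zip(x1, x2)]
y = [B0 + B1 * a + B2 * b + B3 * ab + random.gauss(0, 1)
     for a, b, ab in zip(x1, x2, inter)]

print("Cov(x1*x2, x1):", round(cov(inter, x1), 4))
print("Cov(x1*x2, x2):", round(cov(inter, x2), 4))
coef = ols([[1.0, a, b] for a, b in zip(x1, x2)], y)
print("L(y|1,x1,x2) coefficients:", [round(c, 3) for c in coef])
```

The interaction term is simply absorbed into the projection error here, which is why omitting it does not move the slopes; with nonzero means for $x_1$ or $x_2$ the projection slopes would change, even though the average partial effects are still well defined.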
2.5. By definition, $Var(u_1|x,z) = Var(y|x,z)$ and $Var(u_2|x) = Var(y|x)$. By assumption, these are constant and necessarily equal to $\sigma_1^2 \equiv Var(u_1)$ and $\sigma_2^2 \equiv Var(u_2)$, respectively. But then Property CV.4 implies that $\sigma_2^2 \ge \sigma_1^2$. This simple conclusion means that, when error variances are constant, the error variance falls (weakly) as more explanatory variables are conditioned on.
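One way to see the inequality is through the conditional variance decomposition. The display below is a sketch that assumes Property CV.4 is (or follows from) the law of total variance applied conditional on x; it is not a quotation of the property:
$$Var(y|x) = E[Var(y|x,z)\,|\,x] + Var[E(y|x,z)\,|\,x].$$
With $Var(y|x,z) = \sigma_1^2$ and $Var(y|x) = \sigma_2^2$ both constant, this becomes
$$\sigma_2^2 = \sigma_1^2 + Var[E(y|x,z)\,|\,x] \ge \sigma_1^2,$$
with equality exactly when $E(y|x,z) = E(y|x)$, that is, when z has no explanatory power for y once x is controlled for.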
2.7. Write the equation in error form as
$$y = g(x) + z\beta + u, \qquad E(u|x,z) = 0.$$
Take the expected value of this equation conditional only on x:
$$E(y|x) = g(x) + [E(z|x)]\beta,$$
and subtract this from the first equation to get
$$y - E(y|x) = [z - E(z|x)]\beta + u,$$
or $\tilde y = \tilde z\beta + u$. Because $\tilde z$ is a function of $(x,z)$, $E(u|\tilde z) = 0$ (since $E(u|x,z) = 0$), and so $E(\tilde y|\tilde z) = \tilde z\beta$. This basic result is fundamental in the literature on estimating partial linear models. First, one estimates $E(y|x)$ and $E(z|x)$ using very flexible methods, typically, so-called nonparametric methods. Then, after obtaining residuals of the form $\hat{\tilde y}_i \equiv y_i - \hat E(y_i|x_i)$ and $\hat{\tilde z}_i \equiv z_i - \hat E(z_i|x_i)$, $\beta$ is estimated from an OLS regression of $\hat{\tilde y}_i$ on $\hat{\tilde z}_i$, $i = 1,\dots,N$. Under general conditions, this kind of nonparametric partialling-out procedure leads to a $\sqrt N$-consistent, asymptotically normal estimator of $\beta$. See Robinson (1988) and Powell (1994).
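The two-step partialling-out procedure above can be sketched in a few lines. This is a minimal illustration, not Robinson's exact estimator: it assumes a scalar z, uses a Nadaraya-Watson (Gaussian kernel) smoother with a fixed, hand-picked bandwidth for both $E(y|x)$ and $E(z|x)$, applies no trimming, and simulates data from a hypothetical design with $g(x) = x^2$ and $\beta = 2$. The helper names `nw_fit` and `robinson_beta` are illustrative only.

```python
import math
import random


def nw_fit(x, v, h):
    """Nadaraya-Watson estimate of E(v|x) at each sample point (Gaussian kernel, bandwidth h)."""
    fitted = []
    for x0 in x:
        w = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
        fitted.append(sum(wi * vi for wi, vi in zip(w, v)) / sum(w))
    return fitted


def robinson_beta(y, z, x, h):
    """Partial out E(y|x) and E(z|x), then regress the y-residual on the z-residual."""
    y_res = [yi - fi for yi, fi in zip(y, nw_fit(x, y, h))]
    z_res = [zi - fi for zi, fi in zip(z, nw_fit(x, z, h))]
    # OLS through the origin with a scalar z: sum(z_res * y_res) / sum(z_res ** 2)
    return sum(a * b for a, b in zip(z_res, y_res)) / sum(b * b for b in z_res)


random.seed(0)
BETA = 2.0                                   # hypothetical true coefficient on z
n = 500
x = [random.random() for _ in range(n)]
z = [math.sin(2 * math.pi * xi) + random.gauss(0, 1) for xi in x]   # E(z|x) is nonlinear
y = [xi ** 2 + BETA * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]  # g(x) = x**2

beta_hat = robinson_beta(y, z, x, h=0.05)
print("partialling-out estimate of beta:", round(beta_hat, 3))
```

The unknown function g(x) never has to be specified: it is removed along with $E(y|x)$. The fixed bandwidth h = 0.05 is only a convenience for this sketch.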
CHAPTER 3
3.1. To prove Lemma 3.1, we must show that for all $\varepsilon > 0$, there exists $b_\varepsilon < \infty$ and an integer $N_\varepsilon$ such that $P[|x_N| > b_\varepsilon] < \varepsilon$, all $N > N_\varepsilon$. We use the following fact: since $x_N \xrightarrow{p} a$, for any $\varepsilon > 0$ there exists an integer $N_\varepsilon$ such that $P[|x_N - a| > 1] < \varepsilon$ for all $N > N_\varepsilon$. [The existence of $N_\varepsilon$ is implied by Definition 3.3(1).] But $|x_N| = |x_N - a + a| \le |x_N - a| + |a|$ (by the triangle inequality), and so $|x_N| - |a| \le |x_N - a|$. It follows that $P[|x_N| - |a| > 1] \le P[|x_N - a| > 1]$. Therefore, in Definition 3.3(3) we can take $b_\varepsilon \equiv |a| + 1$ (irrespective of the value of $\varepsilon$) and then the existence of $N_\varepsilon$ follows from Definition 3.3(1).
3.3. This follows immediately from Lemma 3.1 because $g(x_N) \xrightarrow{p} g(c)$.
3.5. a. Since $Var(\bar y_N) = \sigma^2/N$, $Var[\sqrt N(\bar y_N - \mu)] = N(\sigma^2/N) = \sigma^2$.
b. By the CLT, $\sqrt N(\bar y_N - \mu) \overset{a}{\sim} Normal(0,\sigma^2)$, and so $Avar[\sqrt N(\bar y_N - \mu)] = \sigma^2$.
c. We obtain $Avar(\bar y_N)$ by dividing $Avar[\sqrt N(\bar y_N - \mu)]$ by N. Therefore, $Avar(\bar y_N) = \sigma^2/N$. As expected, this coincides with the actual variance of $\bar y_N$.
d. The asymptotic standard deviation of $\bar y_N$ is the square root of its asymptotic variance, or $\sigma/\sqrt N$.
e. To obtain the asymptotic standard error of $\bar y_N$, we need a consistent estimator of $\sigma$. Typically, the unbiased estimator of $\sigma^2$ is used: $\hat\sigma^2 = (N-1)^{-1}\sum_{i=1}^N (y_i - \bar y_N)^2$, and then $\hat\sigma$ is the positive square root. The asymptotic standard error of $\bar y_N$ is simply $\hat\sigma/\sqrt N$.
3.7. a. For $\theta > 0$ the natural logarithm is a continuous function, and so $plim[\log(\hat\theta)] = \log[plim(\hat\theta)] = \log(\theta) = \gamma$.
b. We use the delta method to find $Avar[\sqrt N(\hat\gamma - \gamma)]$. In the scalar case, if $\hat\gamma = g(\hat\theta)$ then $Avar[\sqrt N(\hat\gamma - \gamma)] = [dg(\theta)/d\theta]^2 Avar[\sqrt N(\hat\theta - \theta)]$. When $g(\theta) = \log(\theta)$, which is, of course, continuously differentiable, $Avar[\sqrt N(\hat\gamma - \gamma)] = (1/\theta)^2 Avar[\sqrt N(\hat\theta - \theta)]$.
c. In the scalar case, the asymptotic standard error of $\hat\gamma$ is generally $|dg(\hat\theta)/d\theta|\cdot se(\hat\theta)$. Therefore, for $g(\theta) = \log(\theta)$, $se(\hat\gamma) = se(\hat\theta)/\hat\theta$. When $\hat\theta = 4$ and $se(\hat\theta) = 2$, $\hat\gamma = \log(4) \approx 1.39$ and $se(\hat\gamma) = 1/2$.
d. The asymptotic t statistic for testing $H_0: \theta = 1$ is $(\hat\theta - 1)/se(\hat\theta) = 3/2 = 1.5$.
e. Because $\gamma = \log(\theta)$, the null of interest can also be stated as $H_0: \gamma = 0$.
The t statistic based on $\hat\gamma$ is about $1.39/(.5) = 2.78$. This leads to a very strong rejection of $H_0$, whereas the t statistic based on $\hat\theta$ is, at best, marginally significant. The lesson is that, using the Wald test, we can change the outcome of hypothesis tests by using nonlinear transformations.
3.9. By the delta method,
$$Avar[\sqrt N(\hat\gamma - \gamma)] = G(\theta)V_1G(\theta)', \qquad Avar[\sqrt N(\tilde\gamma - \gamma)] = G(\theta)V_2G(\theta)',$$
where $G(\theta) = \nabla_\theta g(\theta)$ is $Q \times P$. Therefore,
$$Avar[\sqrt N(\tilde\gamma - \gamma)] - Avar[\sqrt N(\hat\gamma - \gamma)] = G(\theta)(V_2 - V_1)G(\theta)'.$$
By assumption, $V_2 - V_1$ is positive semidefinite, and therefore $G(\theta)(V_2 - V_1)G(\theta)'$ is p.s.d. This completes the proof.
CHAPTER 4
4.1. a. Exponentiating equation (4.49) gives
$$wage = \exp(\beta_0 + \beta_1 married + \beta_2 educ + z\gamma + u) = \exp(u)\exp(\beta_0 + \beta_1 married + \beta_2 educ + z\gamma).$$
Therefore,
$$E(wage|x) = E[\exp(u)|x]\exp(\beta_0 + \beta_1 married + \beta_2 educ + z\gamma),$$
where x denotes all explanatory variables. Now, if u and x are independent then $E[\exp(u)|x] = E[\exp(u)] = \delta_0$, say. Therefore
$$E(wage|x) = \delta_0\exp(\beta_0 + \beta_1 married + \beta_2 educ + z\gamma).$$
Now, finding the proportionate difference in this expectation at married = 1 and married = 0 (with all else equal) gives $\exp(\beta_1) - 1$; all other factors cancel out. Thus, the percentage difference is $100\cdot[\exp(\beta_1) - 1]$.
b. Since $\theta_1 = 100\cdot[\exp(\beta_1) - 1] = g(\beta_1)$, we need the derivative of g with
respect to $\beta_1$: $dg/d\beta_1 = 100\cdot\exp(\beta_1)$. The asymptotic standard error of $\hat\theta_1$ using the delta method is obtained as the absolute value of $dg/d\beta_1$ times $se(\hat\beta_1)$:
$$se(\hat\theta_1) = [100\cdot\exp(\hat\beta_1)]\cdot se(\hat\beta_1).$$
c. We can evaluate the conditional expectation in part (a) at two levels of education, say $educ_0$ and $educ_1$, all else fixed. The proportionate change in expected wage from $educ_0$ to $educ_1$ is
$$[\exp(\beta_2educ_1) - \exp(\beta_2educ_0)]/\exp(\beta_2educ_0) = \exp[\beta_2(educ_1 - educ_0)] - 1 = \exp(\beta_2\Delta educ) - 1.$$
Using the same arguments in part (b), $\hat\theta_2 = 100\cdot[\exp(\hat\beta_2\Delta educ) - 1]$ and
$$se(\hat\theta_2) = 100\cdot|\Delta educ|\exp(\hat\beta_2\Delta educ)\,se(\hat\beta_2).$$
d. For the estimated version of equation (4.29), $\hat\beta_1 = .199$, $se(\hat\beta_1) = .039$, $\hat\beta_2 = .065$, $se(\hat\beta_2) = .006$. Therefore, $\hat\theta_1 = 22.01$ and $se(\hat\theta_1) = 4.76$. For $\hat\theta_2$ we set $\Delta educ = 4$. Then $\hat\theta_2 = 29.7$ and $se(\hat\theta_2) = 3.11$.
4.3. a. Not in general. The conditional variance can always be written as $Var(u|x) = E(u^2|x) - [E(u|x)]^2$; if $E(u|x) \ne 0$, then $E(u^2|x) \ne Var(u|x)$.
b. It could be that $E(x'u) = 0$, in which case OLS is consistent, and $Var(u|x)$ is constant. But, generally, the usual standard errors would not be valid unless $E(u|x) = 0$.
4.5. Write equation (4.50) as $E(y|w) = w\delta$, where $w = (x,z)$. Since $Var(y|w) = \sigma^2$, it follows by Theorem 4.2 that $Avar\,\sqrt N(\hat\delta - \delta)$ is $\sigma^2[E(w'w)]^{-1}$, where $\hat\delta = (\hat\beta',\hat\gamma)'$. Importantly, because $E(x'z) = 0$, $E(w'w)$ is block diagonal, with upper block $E(x'x)$ and lower block $E(z^2)$. Inverting $E(w'w)$ and focusing on the upper $K \times K$ block gives
$$Avar\,\sqrt N(\hat\beta - \beta) = \sigma^2[E(x'x)]^{-1}.$$
Next, we need to find $Avar\,\sqrt N(\tilde\beta - \beta)$. It is helpful to write $y = x\beta + v$, where $v = \gamma z + u$ and $u \equiv y - E(y|x,z)$. Because $E(x'z) = 0$ and $E(x'u) = 0$, $E(x'v) = 0$. Further,
$$E(v^2|x) = \gamma^2E(z^2|x) + E(u^2|x) + 2\gamma E(zu|x) = \gamma^2E(z^2|x) + \sigma^2,$$
where we use $E(zu|x,z) = zE(u|x,z) = 0$ and $E(u^2|x,z) = Var(y|x,z) = \sigma^2$. Unless $E(z^2|x)$ is constant, the equation $y = x\beta + v$ generally violates the homoskedasticity assumption OLS.3. So, without further assumptions,
$$Avar\,\sqrt N(\tilde\beta - \beta) = [E(x'x)]^{-1}E(v^2x'x)[E(x'x)]^{-1}.$$
Now we can show $Avar\,\sqrt N(\tilde\beta - \beta) - Avar\,\sqrt N(\hat\beta - \beta)$ is positive semidefinite by writing
$$Avar\,\sqrt N(\tilde\beta - \beta) - Avar\,\sqrt N(\hat\beta - \beta) = [E(x'x)]^{-1}E(v^2x'x)[E(x'x)]^{-1} - \sigma^2[E(x'x)]^{-1}$$
$$= [E(x'x)]^{-1}E(v^2x'x)[E(x'x)]^{-1} - \sigma^2[E(x'x)]^{-1}E(x'x)[E(x'x)]^{-1}$$
$$= [E(x'x)]^{-1}[E(v^2x'x) - \sigma^2E(x'x)][E(x'x)]^{-1}.$$
Because $[E(x'x)]^{-1}$ is p.d., it suffices to show that $E(v^2x'x) - \sigma^2E(x'x)$ is p.s.d. To this end, let $h(x) \equiv E(z^2|x)$. Then by the law of iterated expectations, $E(v^2x'x) = E[E(v^2|x)x'x] = \gamma^2E[h(x)x'x] + \sigma^2E(x'x)$. Therefore, $E(v^2x'x) - \sigma^2E(x'x) = \gamma^2E[h(x)x'x]$, which, when $\gamma \ne 0$, is actually a positive definite matrix except by fluke. In particular, if $E(z^2|x) = E(z^2) = \eta^2 > 0$ (in which case $y = x\beta + v$ satisfies the homoskedasticity assumption OLS.3), $E(v^2x'x) - \sigma^2E(x'x) = \gamma^2\eta^2E(x'x)$, which is positive definite.
4.7. a. One important omitted factor in u is family income: students that come from wealthier families tend to do better in school, other things equal. Family income and PC ownership are positively correlated because the probability of owning a PC increases with family income. Another factor in u
is quality of high school. This may also be correlated with PC: a student who had more exposure with computers in high school may be more likely to own a computer.
b. $\hat\beta_3$ is likely to have an upward bias because of the positive correlation between u and PC, but it is not clear-cut because of the other explanatory variables in the equation. If we write the linear projection
$$u = \delta_0 + \delta_1hsGPA + \delta_2SAT + \delta_3PC + r$$
then the bias is upward if $\delta_3$ is greater than zero. This measures the partial correlation between u (say, family income) and PC, and it is likely to be positive.
c. If data on family income can be collected then it can be included in the equation. If family income is not available sometimes level of parents' education is. Another possibility is to use average house value in each student's home zip code, as zip code is often part of school records. Proxies for high school quality might be faculty-student ratios, expenditure per student, average teacher salary, and so on.
4.9. a. Just subtract $\log(y_{-1})$ from both sides:
$$\Delta\log(y) = \beta_0 + x\beta + (\alpha_1 - 1)\log(y_{-1}) + u.$$
Clearly, the intercept and slope estimates on x will be the same. The coefficient on $\log(y_{-1})$ changes.
b. For simplicity, let $w = \log(y)$, $w_1 = \log(y_{-1})$. Then the population slope coefficient in a simple regression is always $\alpha_1 = Cov(w_1,w)/Var(w_1)$. But, by assumption, $Var(w) = Var(w_1)$, so we can write $\alpha_1 = Cov(w_1,w)/(\sigma_{w_1}\sigma_w)$, where $\sigma_{w_1} = sd(w_1)$ and $\sigma_w = sd(w)$. But $Corr(w_1,w) = Cov(w_1,w)/(\sigma_{w_1}\sigma_w)$, and since a correlation coefficient is always between $-1$
and 1, the result follows.
4.11. Here is some Stata output obtained to answer this question:

. reg lwage exper tenure married south urban black educ iq kww

      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------           F(  9,   925) =   37.28
       Model |  44.0967944     9  4.89964382           Prob > F      =  0.0000
    Residual |  121.559489   925  .131415664           R-squared     =  0.2662
-------------+------------------------------           Adj R-squared =  0.2591
       Total |  165.656283   934  .177362188           Root MSE      =  .36251

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0127522   .0032308      3.947   0.000      .0064117    .0190927
      tenure |   .0109248   .0024457      4.467   0.000       .006125    .0157246
     married |   .1921449   .0389094      4.938   0.000      .1157839    .2685059
       south |  -.0820295   .0262222     -3.128   0.002     -.1334913   -.0305676
       urban |   .1758226   .0269095      6.534   0.000      .1230118    .2286334
       black |  -.1303995   .0399014     -3.268   0.001     -.2087073   -.0520917
        educ |   .0498375    .007262      6.863   0.000      .0355856    .0640893
          iq |   .0031183   .0010128      3.079   0.002      .0011306    .0051059
         kww |    .003826   .0018521      2.066   0.039      .0001911    .0074608
       _cons |   5.175644    .127776     40.506   0.000      4.924879    5.426408
------------------------------------------------------------------------------

. test iq kww

 ( 1)  iq = 0.0
 ( 2)  kww = 0.0

       F(  2,   925) =    8.59
            Prob > F =    0.0002

a. The estimated return to education using both IQ and KWW as proxies for ability is about 5%. When we used no proxy the estimated return was about 6.5%, and with only IQ as a proxy it was about 5.4%. Thus, we have an even lower estimated return to education, but it is still practically nontrivial and statistically very significant.
b. We can see from the t statistics that these variables are going to be
jointly significant. The F test verifies this, with p-value = .0002.
c. The wage differential between nonblacks and blacks does not disappear. Blacks are estimated to earn about 13% less than nonblacks, holding all other factors fixed.
4.13. a. Using the 90 counties for 1987 gives

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen if d87

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  4,    85) =   15.15
       Model |  11.1549601     4  2.78874002           Prob > F      =  0.0000
    Residual |  15.6447379    85   .18405574           R-squared     =  0.4162
-------------+------------------------------           Adj R-squared =  0.3888
       Total |   26.799698    89  .301120202           Root MSE      =  .42902

------------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.7239696   .1153163    -6.28   0.000    -.9532493   -.4946899
    lprbconv |  -.4725112   .0831078    -5.69   0.000    -.6377519   -.3072706
    lprbpris |   .1596698   .2064441     0.77   0.441    -.2507964     .570136
     lavgsen |   .0764213   .1634732     0.47   0.641    -.2486073    .4014499
       _cons |  -4.867922   .4315307   -11.28   0.000    -5.725921   -4.009923
------------------------------------------------------------------------------

Because of the log-log functional form, all coefficients are elasticities. The elasticities of crime with respect to the arrest and conviction probabilities are the sign we expect, and both are practically and statistically significant. The elasticities with respect to the probability of serving a prison term and the average sentence length are positive but are statistically insignificant.
b. To add the previous year's crime rate we first generate the lag:

. gen lcrmr_1 = lcrmrte[_n-1] if d87
(540 missing values generated)

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lcrmr_1 if d87
      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  5,    84) =  113.90
       Model |  23.3549731     5  4.67099462           Prob > F      =  0.0000
    Residual |   3.4447249    84   .04100863           R-squared     =  0.8715
-------------+------------------------------           Adj R-squared =  0.8638
       Total |   26.799698    89  .301120202           Root MSE      =  .20251

------------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.1850424   .0627624    -2.95   0.004    -.3098523   -.0602325
    lprbconv |  -.0386768   .0465999    -0.83   0.409    -.1313457    .0539921
    lprbpris |  -.1266874   .0988505    -1.28   0.204    -.3232625    .0698876
     lavgsen |  -.1520228   .0782915    -1.94   0.056    -.3077141    .0036684
     lcrmr_1 |   .7798129   .0452114    17.25   0.000     .6899051    .8697208
       _cons |  -.7666256   .3130986    -2.45   0.016    -1.389257   -.1439946
------------------------------------------------------------------------------

There are some notable changes in the coefficients on the original variables. The elasticities with respect to prbarr and prbconv are much smaller now, but still have signs predicted by a deterrent-effect story. The conviction probability is no longer statistically significant. Adding the lagged crime rate changes the signs of the elasticities with respect to prbpris and avgsen, and the latter is almost statistically significant at the 5% level against a two-sided alternative (p-value = .056). Not surprisingly, the elasticity with respect to the lagged crime rate is large and very statistically significant. (The elasticity is also statistically different from unity.)
c. Adding the logs of the nine wage variables gives the following:

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lcrmr_1 lwcon-lwloc if d87

      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F( 14,    75) =   43.81
       Model |  23.8798774    14  1.70570553           Prob > F      =  0.0000
    Residual |  2.91982063    75  .038930942           R-squared     =  0.8911
-------------+------------------------------           Adj R-squared =  0.8707
       Total |   26.799698    89  .301120202           Root MSE      =  .19731

------------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.1725122   .0659533    -2.62   0.011    -.3038978   -.0411265
    lprbconv |  -.0683639    .049728    -1.37   0.173    -.1674273    .0306994
    lprbpris |  -.2155553   .1024014    -2.11   0.039    -.4195493   -.0115614
     lavgsen |  -.1960546   .0844647    -2.32   0.023     -.364317   -.0277923
     lcrmr_1 |   .7453414   .0530331    14.05   0.000     .6396942    .8509887
       lwcon |  -.2850008   .1775178    -1.61   0.113    -.6386344    .0686327
       lwtuc |   .0641312    .134327     0.48   0.634    -.2034619    .3317244
       lwtrd |    .253707   .2317449     1.09   0.277    -.2079524    .7153665
       lwfir |  -.0835258   .1964974    -0.43   0.672    -.4749687    .3079171
       lwser |   .1127542   .0847427     1.33   0.187    -.0560619    .2815703
       lwmfg |   .0987371   .1186099     0.83   0.408    -.1375459    .3350201
       lwfed |   .3361278   .2453134     1.37   0.175    -.1525615    .8248172
       lwsta |   .0395089   .2072112     0.19   0.849    -.3732769    .4522947
       lwloc |  -.0369855   .3291546    -0.11   0.911    -.6926951     .618724
       _cons |  -3.792525   1.957472    -1.94   0.056    -7.692009    .1069592
------------------------------------------------------------------------------

. testparm lwcon-lwloc

 ( 1)  lwcon = 0.0
 ( 2)  lwtuc = 0.0
 ( 3)  lwtrd = 0.0
 ( 4)  lwfir = 0.0
 ( 5)  lwser = 0.0
 ( 6)  lwmfg = 0.0
 ( 7)  lwfed = 0.0
 ( 8)  lwsta = 0.0
 ( 9)  lwloc = 0.0

       F(  9,    75) =    1.50
            Prob > F =    0.1643

The nine wage variables are jointly insignificant even at the 15% level. Plus, the elasticities are not consistently positive or negative. The two largest elasticities, which also have the largest absolute t statistics, have the opposite sign. These are with respect to the wage in construction (-.285) and the wage for federal employees (.336).
d. Using the "robust" option in Stata, which is appended to the "reg" command, gives the heteroskedasticity-robust F statistic as F = 2.19 and p-value = .032. (This F statistic is the heteroskedasticity-robust Wald statistic divided by the number of restrictions being tested, nine in this
example.) The division by the number of restrictions turns the asymptotic chi-square statistic into one that roughly has an F distribution.
4.15. a. Because each $x_j$ has finite second moment, $Var(x\beta) < \infty$. Since $Var(u) < \infty$, $Cov(x\beta,u)$ is well-defined. But each $x_j$ is uncorrelated with u, so $Cov(x\beta,u) = 0$. Therefore, $Var(y) = Var(x\beta) + Var(u)$, or $\sigma_y^2 = Var(x\beta) + \sigma_u^2$.
b. This is nonsense when we view the $x_i$ as random draws along with $y_i$. The statement "$Var(u_i) = \sigma^2 = Var(y_i)$ for all i" assumes that the regressors are nonrandom (or $\beta = 0$, which is not a very interesting case). This is another example of how the assumption of nonrandom regressors can lead to counterintuitive conclusions. Suppose that an element of the error term, say z, which is uncorrelated with each $x_j$, suddenly becomes observed. When we add z to the regressor list, the error changes, and so does the error variance. (It gets smaller.) In the vast majority of economic applications, it makes no sense to think we have access to the entire set of factors that one would ever want to control for, so we should allow for error variances to change across different models for the same response variable.
c. Write $R^2 = 1 - SSR/SST = 1 - (SSR/N)/(SST/N)$. Therefore,
$$plim(R^2) = 1 - plim[(SSR/N)/(SST/N)] = 1 - [plim(SSR/N)]/[plim(SST/N)] = 1 - \sigma_u^2/\sigma_y^2 = \rho^2,$$
where we use the fact that $SSR/N$ is a consistent estimator of $\sigma_u^2$ and $SST/N$ is a consistent estimator of $\sigma_y^2$.
d. The derivation in part (c) assumed nothing about $Var(u|x)$. The population R-squared depends on only the unconditional variances of u and y. Therefore, regardless of the nature of heteroskedasticity in $Var(u|x)$, the usual R-squared consistently estimates the population R-squared. Neither R-squared nor the adjusted R-squared has desirable finite-sample properties,
such as unbiasedness, so the only analysis we can do in any generality involves asymptotics. The statement in the problem is simply wrong.
CHAPTER 5
5.1. Define $x_1 \equiv (z_1,y_2)$ and $x_2 \equiv \hat v_2$, and let $\hat\beta \equiv (\hat\beta_1',\hat\rho_1)'$ be the OLS estimator from (5.52), where $\hat\beta_1 = (\hat\delta_1',\hat\alpha_1)'$. Using the hint, $\hat\beta_1$ can also be obtained by partitioned regression:
(i) Regress $x_1$ onto $\hat v_2$ and save the residuals, say $\ddot x_1$.
(ii) Regress $y_1$ onto $\ddot x_1$.
But when we regress $z_1$ onto $\hat v_2$, the residuals are just $z_1$ since $\hat v_2$ is orthogonal in sample to z. (More precisely, $\sum_{i=1}^N z_{i1}'\hat v_{i2} = 0$.) Further, because we can write $y_2 = \hat y_2 + \hat v_2$, where $\hat y_2$ and $\hat v_2$ are orthogonal in sample, the residuals from regressing $y_2$ onto $\hat v_2$ are simply the first-stage fitted values, $\hat y_2$. In other words, $\ddot x_1 = (z_1,\hat y_2)$. But the 2SLS estimator of $\beta_1$ is obtained exactly from the OLS regression of $y_1$ on $z_1, \hat y_2$.
5.3. a. There may be unobserved health factors correlated with smoking behavior that affect infant birth weight. For example, women who smoke during pregnancy may, on average, drink more coffee or alcohol, or eat less nutritious meals.
b. Basic economics says that packs should be negatively correlated with cigarette price, although the correlation might be small (especially because price is aggregated at the state level). At first glance it seems that cigarette price should be exogenous in equation (5.54), but we must be a little careful. One component of cigarette price is the state tax on
cigarettes. States that have lower taxes on cigarettes may also have lower quality of health care, on average. Quality of health care is in u, and so maybe cigarette price fails the exogeneity requirement for an IV.
c. OLS is followed by 2SLS (IV, in this case):

. reg lbwght male parity lfaminc packs

      Source |       SS       df       MS              Number of obs =    1388
-------------+------------------------------           F(  4,  1383) =   12.55
       Model |  1.76664363     4  .441660908           Prob > F      =  0.0000
    Residual |    48.65369  1383  .035179819           R-squared     =  0.0350
-------------+------------------------------           Adj R-squared =  0.0322
       Total |  50.4203336  1387  .036352079           Root MSE      =  .18756

------------------------------------------------------------------------------
      lbwght |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .0262407   .0100894      2.601   0.009      .0064486    .0460328
      parity |   .0147292   .0056646      2.600   0.009      .0036171    .0258414
     lfaminc |   .0180498   .0055837      3.233   0.001      .0070964    .0290032
       packs |  -.0837281   .0171209     -4.890   0.000     -.1173139   -.0501423
       _cons |   4.675618   .0218813    213.681   0.000      4.632694    4.718542
------------------------------------------------------------------------------

. reg lbwght male parity lfaminc packs (male parity lfaminc cigprice)

                                                          (2SLS)
      Source |       SS       df       MS              Number of obs =    1388
-------------+------------------------------           F(  4,  1383) =    2.39
       Model | -91.3500269     4 -22.8375067           Prob > F      =  0.0490
    Residual |  141.770361  1383  .102509299           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  50.4203336  1387  .036352079           Root MSE      =  .32017

------------------------------------------------------------------------------
      lbwght |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
-------------+----------------------------------------------------------------
       packs |   .7971063   1.086275      0.734   0.463     -1.333819    2.928031
        male |   .0298205    .017779      1.677   0.094     -.0050562    .0646972
      parity |  -.0012391   .0219322     -0.056   0.955      -.044263    .0417848
     lfaminc |    .063646   .0570128      1.116   0.264     -.0481949    .1754869
       _cons |   4.467861   .2588289     17.262   0.000      3.960122    4.975601
------------------------------------------------------------------------------

(Note that Stata automatically shifts endogenous explanatory variables to the beginning of the list when reporting coefficients, standard errors, and so on.)
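The OLS and 2SLS estimates of the packs coefficient above differ sharply, and the mechanics are easier to see in a controlled example. The sketch below simulates a hypothetical design (all coefficients are made up, and there are no other covariates, unlike the birth weight equation) in which an unobserved factor q enters both the regressor x and the error, while a single instrument z shifts x but not y directly. It computes the OLS slope and the just-identified IV slope Cov(z, y)/Cov(z, x), which is the one-instrument special case of 2SLS, using only the Python standard library.

```python
import random


def cov(a, b):
    """Sample covariance (divisor n) of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n


def ols_slope(x, y):
    """Simple-regression OLS slope of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV slope with one instrument: Cov(z, y) / Cov(z, x)."""
    return cov(z, y) / cov(z, x)


random.seed(42)
n = 20000
ALPHA = -0.1                                     # hypothetical true effect of x on y
q = [random.gauss(0, 1) for _ in range(n)]       # unobserved factor, ends up in the error
z = [random.gauss(0, 1) for _ in range(n)]       # instrument, independent of q
x = [0.5 * zi + 0.8 * qi + random.gauss(0, 1) for zi, qi in zip(z, q)]
y = [ALPHA * xi - 0.5 * qi + random.gauss(0, 1) for xi, qi in zip(x, q)]

print("first-stage slope of x on z:", round(ols_slope(z, x), 3))
print("OLS slope:", round(ols_slope(x, y), 3))
print("IV slope: ", round(iv_slope(z, x, y), 3))
```

With a strong, exogenous instrument the IV slope settles near the true value while OLS stays biased (here toward roughly -0.31). Shrinking the first-stage coefficient 0.5 toward zero makes Cov(z, x) nearly zero, and the IV estimate then becomes erratic, which is the weak-instrument situation the reduced form for packs points to.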
The difference between OLS and IV in the estimated effect of packs on bwght is huge. With the OLS estimate, one more pack of cigarettes is estimated to reduce bwght by about 8.4%, and is statistically significant. The IV estimate has the opposite sign, is huge in magnitude, and is not statistically significant. The sign and size of the smoking effect are not realistic.
d. We can see the problem with IV by estimating the reduced form for packs:

. reg packs male parity lfaminc cigprice

      Source |       SS       df       MS              Number of obs =    1388
-------------+------------------------------           F(  4,  1383) =   10.86
       Model |  3.76705108     4   .94176277           Prob > F      =  0.0000
    Residual |  119.929078  1383  .086716615           R-squared     =  0.0305
-------------+------------------------------           Adj R-squared =  0.0276
       Total |  123.696129  1387  .089182501           Root MSE      =  .29448

------------------------------------------------------------------------------
       packs |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.0047261   .0158539     -0.298   0.766     -.0358264    .0263742
      parity |   .0181491   .0088802      2.044   0.041      .0007291    .0355692
     lfaminc |  -.0526374   .0086991     -6.051   0.000     -.0697023   -.0355724
    cigprice |    .000777   .0007763      1.001   0.317     -.0007459    .0022999
       _cons |   .1374075   .1040005      1.321   0.187     -.0666084    .3414234
------------------------------------------------------------------------------

The reduced form estimates show that cigprice does not significantly affect packs; in fact, the coefficient on cigprice is not the sign we expect. Thus, cigprice fails as an IV for packs because cigprice is not partially correlated with packs (with a sensible sign for the correlation). This is separate from the problem that cigprice may not truly be exogenous in the birth weight equation.
5.5. Under the null hypothesis that q and $z_2$ are uncorrelated, $z_1$ and $z_2$ are exogenous in (5.55) because each is uncorrelated with $u_1$. Unfortunately, $y_2$
is correlated with $u_1$, and so the regression of $y_1$ on $z_1$, $y_2$, $z_2$ does not produce a consistent estimator of 0 on $z_2$ even when $E(z_2'q) = 0$. We could find that $\hat J_1$ from this regression is statistically different from zero even when q and $z_2$ are uncorrelated, in which case we would incorrectly conclude that $z_2$ is not a valid IV candidate. Or, we might fail to reject $H_0: J_1 = 0$ when $z_2$ and q are correlated, in which case we incorrectly conclude that the elements in $z_2$ are valid as instruments.
The point of this exercise is that one cannot simply add instrumental variable candidates in the structural equation and then test for significance of these variables using OLS. This is the sense in which identification cannot be tested. With a single endogenous variable, we must take a stand that at least one element of $z_2$ is uncorrelated with q.
5.7. a. If we plug $q = (1/\delta_1)q_1 - (1/\delta_1)a_1$ into equation (5.45) we get
$$y = \beta_0 + \beta_1x_1 + \dots + \beta_Kx_K + \eta_1q_1 + v - \eta_1a_1, \qquad (5.56)$$
where $\eta_1 \equiv (1/\delta_1)$. Now, since the $z_h$ are redundant in (5.45), they are uncorrelated with the structural error, v (by definition of redundancy). Further, we have assumed that the $z_h$ are uncorrelated with $a_1$. Since each $x_j$ is also uncorrelated with $v - \eta_1a_1$, we can estimate (5.56) by 2SLS using instruments $(1,x_1,\dots,x_K,z_1,z_2,\dots,z_M)$ to get consistent estimators of the $\beta_j$ and $\eta_1$.
Given all of the zero correlation assumptions, what we need for identification is that at least one of the $z_h$ appears in the reduced form for $q_1$. More formally, in the linear projection
$$q_1 = \pi_0 + \pi_1x_1 + \dots + \pi_Kx_K + \pi_{K+1}z_1 + \dots + \pi_{K+M}z_M + r_1,$$
at least one of $\pi_{K+1},\dots,\pi_{K+M}$ must be different from zero.
b. We need family background variables to be redundant in the log(wage)
equation once ability (and other factors, such as educ and exper) have been controlled for. The idea here is that family background may influence ability but should have no partial effect on log(wage) once ability has been accounted for. For the rank condition to hold, we need family background variables to be correlated with the indicator, $q_1$, say IQ, once the $x_j$ have been netted out. This is likely to be true if we think that family background and ability are (partially) correlated.
c. Applying the procedure to the data set in NLS80.RAW gives the following results:

. reg lwage exper tenure educ married south urban black iq (exper tenure educ married south urban black meduc feduc sibs)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     722
-------------+------------------------------           F(  8,   713) =   25.81
       Model |  19.6029198     8  2.45036497           Prob > F      =  0.0000
    Residual |  107.208996   713  .150363248           R-squared     =  0.1546
-------------+------------------------------           Adj R-squared =  0.1451
       Total |  126.811916   721  .175883378           Root MSE      =  .38777

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |   .0154368   .0077077     2.00   0.046     .0003044    .0305692
       exper |   .0162185   .0040076     4.05   0.000     .0083503    .0240867
      tenure |   .0076754   .0030956     2.48   0.013     .0015979    .0137529
        educ |   .0161809   .0261982     0.62   0.537     -.035254    .0676158
     married |   .1901012   .0467592     4.07   0.000     .0982991    .2819033
       south |   -.047992   .0367425    -1.31   0.192    -.1201284    .0241444
       urban |   .1869376   .0327986     5.70   0.000     .1225442    .2513311
       black |   .0400269   .1138678     0.35   0.725    -.1835294    .2635832
       _cons |   4.471616    .468913     9.54   0.000     3.551001    5.392231
------------------------------------------------------------------------------

. reg lwage exper tenure educ married south urban black kww (exper tenure educ married south urban black meduc feduc sibs)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     722
-------------+------------------------------           F(  8,   713) =   25.
1551341 .004 .0424452 .000 . because data are missing on meduc and feduc.000 4.1484003 .0046184 .098 .176 . Define q4 = b4 .1627592 32.0249441 .0068682 .003 . (In both firststage regressions. and sibs. feduc.1468 .85 0. What we could do is define binary indicators for whether the corresponding variable is missing.9.991612 713 .1563 0.38737 lwage  Coef.811916 721 .) 5.0201147 _cons  5.0545067 tenure  .175883378 Prob > F Rsquared Adj Rsquared Root MSE = = = = 0.0255051 1.0260808 . so that b4 = b3 + q4.2179041 .47 0.307 .0022947 .217818 .0037739 1. and then use the binary indicators as instruments along with meduc.02 0.1330137 exper  .61 0. and sibs have pvalues below .0322147 2. This would allow us to use all 935 observations.0051145 .2645347 south  .0761549 married  . Std. the F statistic for joint significance of meduc.0675914 .03 0.0239933 .2292093 black  . the equation and rearranging gives 19 Plugging this expression into .150058361 +Total  126.Model  19.0000 0. Err.0286399 urban  .002. This could be because family background variables do not satisfy the appropriate redundancy condition.0565198 .635 . only 722 are used for the estimation.0150576 1. so it seems the family background variables are sufficiently partially correlated with the ability indicators.b3.0067471 1.820304 8 2. Interval] +kww  .309 . or they might be correlated with a1.36 0.06 0.0529759 3.0125238 educ  .898273 5.02 0. set the missing values to zero. t P>t [95% Conf.091887 . feduc.537362 Even though there are 935 men in the sample.477538 Residual  106.0411598 3.0893695 0.0063783 .1605273 .66 0. The return to education is estimated to be small and insignificant whether IQ or KWW used is used as the indicator.
$$\log(wage) = \beta_0 + \beta_1exper + \beta_2exper^2 + \beta_3(twoyr + fouryr) + \theta_4fouryr + u$$
$$= \beta_0 + \beta_1exper + \beta_2exper^2 + \beta_3totcoll + \theta_4fouryr + u,$$
where totcoll = twoyr + fouryr. Now, just estimate the latter equation by 2SLS using exper, exper², dist2yr, and dist4yr as the full set of instruments. We can use the t statistic on $\hat\theta_4$ to test $H_0: \theta_4 = 0$ against $H_1: \theta_4 > 0$.
5.11. Following the hint, let $y_2^0 = z_2\lambda_2$ be the linear projection of $y_2$ on $z_2$, let $a_2$ be the projection error, and assume that $\lambda_2$ is known. (The results on generated regressors in Section 6.1 show that the argument carries over to the case when $\lambda_2$ is estimated.) Plugging in $y_2 = y_2^0 + a_2$ gives
$$y_1 = z_1\delta_1 + \alpha_1y_2^0 + \alpha_1a_2 + u_1.$$
Effectively, we regress $y_1$ on $z_1$, $y_2^0$. The key consistency condition is that each explanatory variable is orthogonal to the composite error, $\alpha_1a_2 + u_1$. By assumption, $E(z'u_1) = 0$. Further, $E(y_2^0a_2) = 0$ by construction. The problem is that, in general, $E(z_1'a_2) \ne 0$ because $z_1$ was not included in the linear projection for $y_2$. Therefore, OLS will be inconsistent for all parameters in general. Contrast this with 2SLS when $y_2^*$ is the projection on $z_1$ and $z_2$: $y_2 = y_2^* + r_2 = z\pi_2 + r_2$, where $E(z'r_2) = 0$. The second step regression (assuming that $\pi_2$ is known) is essentially
$$y_1 = z_1\delta_1 + \alpha_1y_2^* + \alpha_1r_2 + u_1.$$
Now, $r_2$ is uncorrelated with z, and so $E(z_1'r_2) = 0$ and $E(y_2^*r_2) = 0$. The lesson is that one must be very careful if manually carrying out 2SLS by explicitly doing the first- and second-stage regressions.
5.13. a. In a simple regression model with a single IV, the IV estimate of the slope can be written as
$$\hat\beta_1 = \frac{\sum_{i=1}^N (z_i - \bar z)(y_i - \bar y)}{\sum_{i=1}^N (z_i - \bar z)(x_i - \bar x)}$$
$$= \frac{\sum_{i=1}^N z_i(y_i - \bar y)}{\sum_{i=1}^N z_i(x_i - \bar x)}.$$
Now the numerator can be written as
$$\sum_{i=1}^N z_i(y_i - \bar y) = \sum_{i=1}^N z_iy_i - \Big(\sum_{i=1}^N z_i\Big)\bar y = N_1\bar y_1 - N_1\bar y = N_1(\bar y_1 - \bar y),$$
where $N_1 = \sum_{i=1}^N z_i$ is the number of observations in the sample with $z_i = 1$ and $\bar y_1$ is the average of the $y_i$ over the observations with $z_i = 1$. Next, write $\bar y$ as a weighted average: $\bar y = (N_0/N)\bar y_0 + (N_1/N)\bar y_1$, where the notation should be clear. Straightforward algebra shows that $\bar y_1 - \bar y = [(N - N_1)/N]\bar y_1 - (N_0/N)\bar y_0 = (N_0/N)(\bar y_1 - \bar y_0)$. So the numerator of the IV estimate is $(N_0N_1/N)(\bar y_1 - \bar y_0)$. The same argument shows that the denominator is $(N_0N_1/N)(\bar x_1 - \bar x_0)$. Taking the ratio proves the result.
b. If x is also binary, representing some "treatment," $\bar x_1$ is the fraction of observations receiving treatment when $z_i = 1$ and $\bar x_0$ is the fraction receiving treatment when $z_i = 0$. So, suppose $x_i = 1$ if person i participates in a job training program, and let $z_i = 1$ if person i is eligible for participation in the program. Then $\bar x_1$ is the fraction of people participating in the program out of those made eligible, and $\bar x_0$ is the fraction of people participating who are not eligible. (When eligibility is necessary for participation, $\bar x_0 = 0$.) Generally, $\bar x_1 - \bar x_0$ is the difference in participation rates when z = 1 and z = 0. So the difference in the mean response between the z = 1 and z = 0 groups gets divided by the difference in participation rates across the two groups.
5.15. In $L(x|z) = z\Pi$, we can write
$$\Pi = \begin{pmatrix} \Pi_{11} & 0 \\ \Pi_{12} & I_{K_2} \end{pmatrix},$$
where $I_{K_2}$ is the $K_2 \times K_2$ identity matrix, 0 is the $L_1 \times K_2$ zero matrix, $\Pi_{11}$ is $L_1 \times K_1$, and $\Pi_{12}$ is $K_2 \times K_1$. As in Problem 5.12, the rank condition holds if and only if $rank(\Pi) = K$.
a. If for some $x_j$, the vector $z_1$ does not appear in $L(x_j|z)$, then $\Pi_{11}$ has
a column which is entirely zeros. But then that column of $\Pi$ can be written as a linear combination of the last $K_2$ columns of $\Pi$, which means $\mathrm{rank}(\Pi) < K$. Therefore, a necessary condition for the rank condition is that no column of $\Pi_{11}$ be exactly zero, which means that at least one $z_h$ must appear in the reduced form of each $x_j$, $j = 1,\dots,K_1$.
b. Suppose $K_1 = 2$ and $L_1 = 2$, where $z_1$ appears in the reduced form for both $x_1$ and $x_2$, but $z_2$ appears in neither reduced form. Then the $2 \times 2$ matrix $\Pi_{11}$ has zeros in its second row, which means that the second row of $\Pi$ is all zeros. It cannot have rank $K$ in that case. Intuitively, while we began with two instruments, only one of them turned out to be partially correlated with $x_1$ and $x_2$.
c. Without loss of generality, we assume that $z_j$ appears in the reduced form for $x_j$; we can simply reorder the elements of $z_1$ to ensure this is the case. Then $\Pi_{11}$ is a $K_1 \times K_1$ diagonal matrix with nonzero diagonal elements. Looking at
$\Pi = \begin{pmatrix} \Pi_{11} & 0 \\ \Pi_{12} & I_{K_2} \end{pmatrix}$,
we see that if $\Pi_{11}$ is diagonal with all nonzero diagonals then $\Pi$ is lower triangular with all nonzero diagonal elements. Therefore, $\mathrm{rank}(\Pi) = K$.
CHAPTER 6
6.1. a. Here is abbreviated Stata output for testing the null hypothesis that educ is exogenous:
. qui reg educ nearc4 nearc2 exper expersq black south smsa reg661-reg668 smsa66
. predict v2hat, resid
. reg lwage educ exper expersq black south smsa reg661-reg668 smsa66 v2hat

------------------------------------------------------------------------------
   lwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    educ |   .1570594   .0482814      3.253   0.001       .0623912    .2517275
   exper |   .1188149   .0209423      5.673   0.000       .0777521    .1598776
 expersq |  -.0023565   .0003191     -7.384   0.000      -.0029822   -.0017308
   black |  -.1232778   .0478882     -2.574   0.010      -.2171749   -.0293807
   south |  -.1431945   .0261202     -5.482   0.000      -.1944098   -.0919791
    smsa |    .100753   .0289435      3.481   0.001       .0440018    .1575042
  reg661 |   -.102976   .0398738     -2.583   0.010      -.1811588   -.0247932
  reg662 |  -.0002286   .0310325     -0.007   0.994      -.0610759    .0606186
  reg663 |   .0469556   .0299809      1.566   0.117      -.0118296    .1057408
  reg664 |  -.0554084   .0359807     -1.540   0.124      -.1259578    .0151411
  reg665 |   .0515041   .0436804      1.179   0.238      -.0341426    .1371509
  reg666 |   .0699968   .0489487      1.430   0.153      -.0259797    .1659733
  reg667 |   .0390596   .0456842      0.855   0.393       -.050516    .1286352
  reg668 |  -.1980371   .0482417     -4.105   0.000      -.2926273   -.1034468
  smsa66 |   .0150626   .0205106      0.734   0.463      -.0251538    .0552789
   v2hat |  -.0828005   .0484086     -1.710   0.087       -.177718    .0121169
   _cons |   3.339687    .821434      4.066   0.000       1.729054    4.950319
------------------------------------------------------------------------------

The t statistic on $\hat v_2$ is -1.71, which is not significant at the 5% level against a two-sided alternative. The negative correlation between $u_1$ and educ is essentially the same finding that the 2SLS estimated return to education is larger than the OLS estimate. In any case, I would call this marginal evidence that educ is endogenous. (Depending on the application or purpose of a study, the same researcher may take t = -1.71 as evidence for or against endogeneity.)
b. To test the single overidentifying restriction we obtain the 2SLS residuals:
. qui reg lwage educ exper expersq black south smsa reg661-reg668 smsa66 (nearc4 nearc2 exper expersq black south smsa reg661-reg668 smsa66)
. predict uhat1, resid
Now, we regress the 2SLS residuals on all exogenous variables:
. reg uhat1 exper expersq black south smsa reg661-reg668 smsa66 nearc4 nearc2

  Source |       SS       df       MS                  Number of obs =    3010
---------+------------------------------               F( 16,  2993) =    0.08
   Model |  .203922832    16  .012745177               Prob > F      =  1.0000
Residual |  491.568721  2993  .164239466               R-squared     =  0.0004
---------+------------------------------               Adj R-squared = -0.0049
   Total |  491.772644  3009  .163433913               Root MSE      =  .40526

The test statistic is the sample size times the R-squared from this regression:
. di 3010*.0004
1.204
. di chiprob(1,1.2)
.27332168
The p-value, obtained from a $\chi^2_1$ distribution, is about .273, so the instruments pass the overidentification test.
6.3. a. We need prices to satisfy two requirements. First, calories and protein must be partially correlated with prices of food. While this is easy to test for each by estimating the two reduced forms, the rank condition could still be violated (although see Problem 15.5c). Second, prices must be exogenous in the productivity equation. Ideally, prices vary because of things like transportation costs that are not systematically related to regional variations in individual productivity. A potential problem is that prices reflect food quality and that features of the food other than calories and protein appear in the disturbance $u_1$.
b. Since there are two endogenous explanatory variables we need at least two prices.
c. We would first estimate the two reduced forms for calories and protein by regressing each on a constant, exper, exper$^2$, educ, and the M prices, $p_1, \dots, p_M$. We obtain the residuals, $\hat v_{21}$ and $\hat v_{22}$. Then we would run the regression log(produc) on 1, exper, exper$^2$, educ, $\hat v_{21}$, $\hat v_{22}$ and do a joint
s2 = Next. a.Mh)’xi8*(B . N 1/2& 7N The third term can be written as ^ ^ 1/2 S (hi . 2 freedom adjustment. 25 We have shown that the last two .s2) = Op(1)Wop(1) = op(1). ^2 In these tests.B) = 7 i=1 8 Op(1) and. i=1 1 N   where we again use the fact that sample averages are Op(1) by the law of large ^ numbers and vec[rN(B  B)rN(B^   B)’] = Op(1).s ) = N i . E[ui(hi . N i=1 1/2 N op(1).Mh)’(xi t xi)*8{vec[rN(B . the df adjustment makes no difference 1/2 N ^2 ^2 1/2 N ^2 ^2 S (hi .there is no degrees of Var(ux) = s .40) i=1 ^ where the expression for the third term follows from [xi(B  B)]2 ^ = xi(B  B)(B^ ^ ^ t xi)vec[(B . Dropping the "2" the second term can & 1 N * ^ ^ be written as N S ui(hi .^s2) = N1/2 S (hi .B)rN(B .B)(B .s ). ^2 ^2 So ui .s has a zero sample average. We could use a standard F test or use a heteroskedasticityrobust test. which means that asymptotically.Mh)’(s^2 . as in Problem 4.Mh)’u^2i = N1/2 S (hi .Mh)’u^2i = N1/2 S (hi  i=1 M 2 h)’ui + op(1).B)’]. E(ux) = 0. so y = xB + u. the law of large numbers  B)’x’i = (xi   implies that the sample average is op(1).B)’]}.M )’(x t x )*{vec[(B ^ ^ + N . S (hi .Mh)’xi rN(B . For simplicity. i=1 ^2 2 ^ Now.Mh)’xi] = 0. ^ [xi(B N B)]2.^ ^ significance test on v21 and v22.B)’]} = N WOp(1)WOp(1). So N Therefore. 6.B) = op(1)WOp(1) because rN(B .2 N S ui(hi . N i=1 i=1 We are done with this part if we show N 1/2 N N S (hi .5.s ) + op(1).Mh)’u2i i=1 i=1 & 1/2 N ^ . i h i i 8 7 (6.4.2uixi(B  B) + so 1/2 N N S (hi .B) 7 i=1 N & 1/2 S (h .Mh)’(u i . we can write ui = ui . so i=1 far we have 1/2 N N ^2 2 S h’i (u^2i . under E(uixi) = 0. s is implictly SSR/N .Mh)’(u S h’i (u i . absorb the intercept in x.B)(B .) N (In any case.Mh)’ = Op(1) by the central limit theorem and s^2 . 1/2 N i=1 i=1 S (hi .
terms in (6.40) are op(1), which proves part (a).
b. By part (a), the asymptotic variance of $N^{-1/2}\sum_{i=1}^N h_i'(\hat u_i^2 - \hat\sigma^2)$ is $\mathrm{Var}[(h_i - \mu_h)'(u_i^2 - \sigma^2)] = E[(u_i^2 - \sigma^2)^2(h_i - \mu_h)'(h_i - \mu_h)]$. Now $(u_i^2 - \sigma^2)^2 = u_i^4 - 2u_i^2\sigma^2 + \sigma^4$. Under the null, $E(u_i^2|x_i) = \mathrm{Var}(u_i|x_i) = \sigma^2$ [since $E(u_i|x_i) = 0$ is assumed] and therefore, when we add (6.27), $E[(u_i^2 - \sigma^2)^2|x_i] = \kappa^2 - \sigma^4 \equiv \eta^2$. A standard iterated expectations argument gives
$E[(u_i^2 - \sigma^2)^2(h_i - \mu_h)'(h_i - \mu_h)] = E\{E[(u_i^2 - \sigma^2)^2(h_i - \mu_h)'(h_i - \mu_h)|x_i]\}$
$\qquad = E\{E[(u_i^2 - \sigma^2)^2|x_i](h_i - \mu_h)'(h_i - \mu_h)\}$ [since $h_i = h(x_i)$]
$\qquad = \eta^2 E[(h_i - \mu_h)'(h_i - \mu_h)]$.
This is what we wanted to show. (Whether we do the argument for a random draw $i$ or for random variables representing the population is a matter of taste.)
c. From part (b) and Lemma 3.8, the following statistic has an asymptotic $\chi^2_Q$ distribution:
$\left(N^{-1/2}\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)h_i\right)\{\eta^2 E[(h_i - \mu_h)'(h_i - \mu_h)]\}^{-1}\left(N^{-1/2}\sum_{i=1}^N h_i'(\hat u_i^2 - \hat\sigma^2)\right)$.
Using again the fact that $\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2) = 0$, we can replace $h_i$ with $h_i - \bar h$ in the two vectors forming the quadratic form. Then, again by Lemma 3.8, we can replace the matrix in the quadratic form with a consistent estimator, which is
$\hat\eta^2\left(N^{-1}\sum_{i=1}^N (h_i - \bar h)'(h_i - \bar h)\right)$, where $\hat\eta^2 = N^{-1}\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)^2$.
The computable statistic, after simple algebra, can be written as
$\left(\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)(h_i - \bar h)\right)\left(\sum_{i=1}^N (h_i - \bar h)'(h_i - \bar h)\right)^{-1}\left(\sum_{i=1}^N (h_i - \bar h)'(\hat u_i^2 - \hat\sigma^2)\right)\Big/\hat\eta^2$.
Now $\hat\eta^2$ is just the total sum of squares in the $\hat u_i^2$, divided by $N$. The numerator of the statistic is simply the explained sum of squares from the regression $\hat u_i^2$ on 1, $h_i$, $i = 1,\dots,N$. Therefore, the test statistic is $N$ times the usual (centered) R-squared from the regression $\hat u_i^2$ on 1, $h_i$, $i = 1,\dots,N$, or $NR_c^2$.
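The equivalence between the quadratic form and $NR_c^2$ is easy to confirm numerically. The sketch below is illustrative only: it assumes a single variable in $h_i$ (so $Q = 1$), uses made-up numbers standing in for OLS residuals, and defines hypothetical helpers `het_lm_stat` and `chi2_1_pvalue`; the p-value uses the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$.

```python
import math

def het_lm_stat(uhat, h):
    """LM test for heteroskedasticity with a single h (Q = 1).

    Returns (quad_form, n_r2): the quadratic form from part (c) and
    N times the centered R-squared from regressing uhat^2 on 1, h.
    """
    n = len(uhat)
    u2 = [e * e for e in uhat]
    sigma2_hat = sum(u2) / n                 # SSR/N, no df adjustment
    h_bar = sum(h) / n
    dev_u2 = [v - sigma2_hat for v in u2]    # sums to zero by construction
    dev_h = [v - h_bar for v in h]

    s_uh = sum(a * b for a, b in zip(dev_u2, dev_h))
    s_hh = sum(b * b for b in dev_h)
    tss_u2 = sum(a * a for a in dev_u2)
    eta2_hat = tss_u2 / n                    # TSS of uhat^2 divided by N

    quad_form = s_uh * (1.0 / s_hh) * s_uh / eta2_hat

    # Simple regression of uhat^2 on 1, h: ESS = slope^2 * s_hh.
    slope = s_uh / s_hh
    ess = slope * slope * s_hh
    n_r2 = n * ess / tss_u2
    return quad_form, n_r2

def chi2_1_pvalue(x):
    """Upper-tail probability P(chi2_1 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# Made-up residuals and h values, for illustration only.
uhat = [0.5, -1.2, 0.3, 2.0, -0.7, 1.5, -2.2, 0.1]
h = [1.0, 2.0, 1.5, 3.0, 2.5, 3.5, 4.0, 1.0]
stat, n_r2 = het_lm_stat(uhat, h)
print(stat, n_r2, chi2_1_pvalue(stat))
```

With the same helper, `chi2_1_pvalue(1.2)` is about .2733, matching the `chiprob(1,1.2)` value in Problem 6.1(b).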
d. Without assumption (6.37) we need to estimate $E[(u_i^2 - \sigma^2)^2(h_i - \mu_h)'(h_i - \mu_h)]$ generally. Hopefully, the approach is by now pretty clear. We replace the population expected value with the sample average and replace any unknown parameters ($\beta$, $\sigma^2$, and $\mu_h$ in this case) with their consistent estimators (under $H_0$). So a generally consistent estimator of $\mathrm{Avar}\left(N^{-1/2}\sum_{i=1}^N h_i'(\hat u_i^2 - \hat\sigma^2)\right)$ is
$N^{-1}\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)^2(h_i - \bar h)'(h_i - \bar h)$,
and the test statistic robust to heterokurtosis can be written as
$\left(\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)(h_i - \bar h)\right)\left(\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)^2(h_i - \bar h)'(h_i - \bar h)\right)^{-1}\left(\sum_{i=1}^N (h_i - \bar h)'(\hat u_i^2 - \hat\sigma^2)\right)$,
which is easily seen to be the explained sum of squares from the regression of 1 on $(\hat u_i^2 - \hat\sigma^2)(h_i - \bar h)$, $i = 1,\dots,N$ (without an intercept). Since the total sum of squares, without demeaning, is $N = (1 + 1 + \dots + 1)$ ($N$ times), the statistic is equivalent to $N - SSR_0$, where $SSR_0$ is the sum of squared residuals.
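A companion sketch for part (d), again assuming $Q = 1$ and made-up numbers, with a hypothetical helper `robust_het_stat`. It builds $w_i = (\hat u_i^2 - \hat\sigma^2)(h_i - \bar h)$, evaluates the heterokurtosis-robust quadratic form directly, and then computes $N - SSR_0$ from the no-intercept regression of 1 on $w_i$; the two results should coincide.

```python
def robust_het_stat(uhat, h):
    """Heterokurtosis-robust LM statistic with a single h (Q = 1).

    Returns (quad_form, n_minus_ssr0); the two agree algebraically.
    """
    n = len(uhat)
    u2 = [e * e for e in uhat]
    sigma2_hat = sum(u2) / n
    h_bar = sum(h) / n
    # w_i = (uhat_i^2 - sigma2_hat) * (h_i - h_bar)
    w = [(a - sigma2_hat) * (b - h_bar) for a, b in zip(u2, h)]

    s_w = sum(w)
    s_ww = sum(x * x for x in w)
    quad_form = s_w * (1.0 / s_ww) * s_w

    # Regress 1 on w with no intercept, then form N - SSR0.
    coef = s_w / s_ww
    ssr0 = sum((1.0 - coef * x) ** 2 for x in w)
    return quad_form, n - ssr0

# Made-up residuals and h values, for illustration only.
uhat = [0.5, -1.2, 0.3, 2.0, -0.7, 1.5, -2.2, 0.1]
h = [1.0, 2.0, 1.5, 3.0, 2.5, 3.5, 4.0, 1.0]
q, alt = robust_het_stat(uhat, h)
print(q, alt)
```

Because the regression has no intercept, the statistic is an uncentered explained sum of squares, so it always lies between 0 and $N$.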
6.7. a. The simple regression results are
. reg lprice ldist if y81
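  Source |       SS       df       MS                  Number of obs =     142
---------+------------------------------               F(  1,   140) =   30.79
   Model |  3.86426989     1  3.86426989               Prob > F      =  0.0000
Residual |  17.5730845   140  .125522032               R-squared     =  0.1803
---------+------------------------------               Adj R-squared =  0.1744
   Total |  21.4373543   141  .152037974               Root MSE      =  .35429

------------------------------------------------------------------------------
  lprice |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   ldist |   .3648752   .0657613      5.548   0.000       .2348615    .4948889
   _cons |   8.047158   .6462419     12.452   0.000       6.769503    9.324813
------------------------------------------------------------------------------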
This regression suggests a strong link between housing price and distance from
the incinerator (as distance increases, so does housing price).
The elasticity
is .365 and the t statistic is 5.55.
However, this is not a good causal
regression:
the incinerator may have been put near homes with lower values to
begin with.
If so, we would expect the positive relationship found in the
simple regression even if the new incinerator had no effect on housing prices.
b. The parameter $\delta_3$ should be positive: after the incinerator is built a house should be worth more the farther it is from the incinerator. Here is my Stata session:
. gen y81ldist = y81*ldist
. reg lprice y81 ldist y81ldist
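  Source |       SS       df       MS                  Number of obs =     321
---------+------------------------------               F(  3,   317) =   69.22
   Model |  24.3172548     3  8.10575159               Prob > F      =  0.0000
Residual |  37.1217306   317  .117103251               R-squared     =  0.3958
---------+------------------------------               Adj R-squared =  0.3901
   Total |  61.4389853   320  .191996829               Root MSE      =   .3422

------------------------------------------------------------------------------
  lprice |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     y81 |  -.0113101   .8050622     -0.014   0.989       -1.59525     1.57263
   ldist |    .316689   .0515323      6.145   0.000       .2153006    .4180775
y81ldist |   .0481862   .0817929      0.589   0.556      -.1127394    .2091117
   _cons |   8.058468   .5084358     15.850   0.000       7.058133    9.058803
------------------------------------------------------------------------------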
The coefficient on ldist reveals the shortcoming of the regression in part (a).
This coefficient measures the relationship between lprice and ldist in 1978,
before the incinerator was even being rumored.
The effect of the incinerator
is given by the coefficient on the interaction, y81ldist.
While the direction
of the effect is as expected, it is not especially large, and it is
statistically insignificant anyway.
Therefore, at this point, we cannot reject
the null hypothesis that building the incinerator had no effect on housing
prices.
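Because the regression in part (b) interacts y81 with both the intercept and ldist, its OLS estimates are numerically the same as fitting lprice on ldist separately for each year: the ldist coefficient is the 1978 slope and the y81ldist coefficient is the 1981 slope minus the 1978 slope. The sketch below illustrates this with constructed data (not the actual incinerator sample), writing the model as lprice = d0 + d1·y81 + d2·ldist + d3·y81·ldist + u. The labels d0, d1 and d2 are assumed for the sketch; d3 plays the role of the parameter $\delta_3$ discussed in the text.

```python
def simple_ols(x, y):
    """Intercept and slope from regressing y on 1, x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Constructed data with known answers (not the incinerator data):
# 1978: lprice = 8.0 + 0.30*ldist; 1981: lprice = 7.9 + 0.35*ldist.
ldist_78 = [8.5, 9.0, 9.5, 10.0, 10.5]
lprice_78 = [8.0 + 0.30 * d for d in ldist_78]
ldist_81 = [8.4, 9.1, 9.6, 10.2, 10.6]
lprice_81 = [7.9 + 0.35 * d for d in ldist_81]

a78, b78 = simple_ols(ldist_78, lprice_78)
a81, b81 = simple_ols(ldist_81, lprice_81)

# Implied coefficients of lprice = d0 + d1*y81 + d2*ldist + d3*y81*ldist + u
d0, d1 = a78, a81 - a78
d2, d3 = b78, b81 - b78
print(round(d2, 4), round(d3, 4), round(d1, 4))
```

With real data the per-year fits contain noise, but the identity between the interaction coefficient and the difference in slopes still holds exactly.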
c. Adding the variables listed in the problem gives
. reg lprice y81 ldist y81ldist lintst lintstsq larea lland age agesq rooms baths

  Source |       SS       df       MS                  Number of obs =     321
---------+------------------------------               F( 11,   309) =  108.04
   Model |  48.7611143    11  4.43282858               Prob > F      =  0.0000
Residual |   12.677871   309  .041028709               R-squared     =  0.7937
---------+------------------------------               Adj R-squared =  0.7863
   Total |  61.4389853   320  .191996829               Root MSE      =  .20256

------------------------------------------------------------------------------
  lprice |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     y81 |   -.229847   .4877198     -0.471   0.638      -1.189519    .7298249
   ldist |   .0866424   .0517205      1.675   0.095      -.0151265    .1884113
y81ldist |   .0617759   .0495705      1.246   0.214      -.0357625    .1593143
  lintst |   .9633332   .3262647      2.953   0.003       .3213518    1.605315
lintstsq |  -.0591504   .0187723     -3.151   0.002       -.096088   -.0222128
   larea |   .3548562   .0512328      6.926   0.000       .2540468    .4556655
   lland |    .109999   .0248165      4.432   0.000       .0611683    .1588297
     age |  -.0073939   .0014108     -5.241   0.000      -.0101699   -.0046178
   agesq |   .0000315   8.69e-06      3.627   0.000       .0000144    .0000486
   rooms |   .0469214   .0171015      2.744   0.006       .0132713    .0805715
   baths |   .0958867    .027479      3.489   0.001        .041817    .1499564
   _cons |   2.305525   1.774032      1.300   0.195      -1.185185    5.796236
------------------------------------------------------------------------------

The incinerator effect is now larger (the elasticity is about .062) and the t statistic is larger, but the interaction is still statistically insignificant. Using these models and these two years of data, we must conclude that the evidence that housing prices were adversely affected by the new incinerator is somewhat weak.
6.9. a. The Stata results are
. reg ldurat afchnge highearn afhigh male married head-construc if ky

  Source |       SS       df       MS                  Number of obs =    5349
---------+------------------------------               F( 14,  5334) =   16.37
   Model |  358.441793    14  25.6029852               Prob > F      =  0.0000
Residual |  8341.41206  5334  1.56381928               R-squared     =  0.0412
---------+------------------------------               Adj R-squared =  0.0387
   Total |  8699.85385  5348  1.62674904               Root MSE      =  1.2505

------------------------------------------------------------------------------
  ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
 afchnge |   .0106274   .0449167     0.24   0.813    -.0774276    .0986824
highearn |   .1757598   .0517462     3.40   0.001     .0743161    .2772035
  afhigh |   .2308768   .0695248     3.32   0.001     .0945798    .3671738
    male |  -.0979407   .0445498    -2.20   0.028    -.1852766   -.0106049
 married |   .1220995   .0391228     3.12   0.002     .0454027    .1987962
    head |  -.5139003   .1292776    -3.98   0.000    -.7673372   -.2604634
    neck |   .2699126   .1614899     1.67   0.095    -.0466737    .5864988
  upextr |   -.178539   .1011794    -1.76   0.078     -.376892    .0198141
   trunk |   .1264514   .1090163     1.16   0.246    -.0872651     .340168
 lowback |  -.0085967   .1015267    -0.08   0.933    -.2076305    .1904371
 lowextr |  -.1202911   .1023262    -1.18   0.240    -.3208922    .0803101
  occdis |   .2727118    .210769     1.29   0.196    -.1404816    .6859052
   manuf |  -.1606709   .0409038    -3.93   0.000    -.2408591   -.0804827
construc |   .1101967   .0518063     2.13   0.033     .0086352    .2117581
   _cons |   1.245922   .1061677    11.74   0.000      1.03779    1.454054
------------------------------------------------------------------------------

The estimated coefficient on the interaction term is actually higher now, and even more statistically significant than in equation (6.33). Adding the other explanatory variables only slightly increased the standard error on the interaction term.
b. The small R-squared, on the order of 4.1%, or 3.9% if we used the adjusted R-squared, means that we cannot explain much of the variation in time on workers' compensation using the variables included in the regression. This is often the case in the social sciences: it is very difficult to include the multitude of factors that can affect something like durat. The low R-squared means that making predictions of log(durat) would be very difficult given the factors we have included in the regression: the variation in the unobservables pretty much swamps the explained variation. However, the low R-squared does not mean we have a biased or inconsistent estimator of the effect of the policy change. Provided the Kentucky change is a good natural experiment, the OLS estimator is consistent. With over 5,000 observations, we
can get a reasonably precise estimate of the effect, although the 95% confidence interval is pretty wide.
c. Using the data for Michigan to estimate the simple model gives
. reg ldurat afchnge highearn afhigh if mi

  Source |       SS       df       MS                  Number of obs =    1524
---------+------------------------------               F(  3,  1520) =    6.05
   Model |  34.3850177     3  11.4616726               Prob > F      =  0.0004
Residual |  2879.96981  1520  1.89471698               R-squared     =  0.0118
---------+------------------------------               Adj R-squared =  0.0098
   Total |  2914.35483  1523  1.91356194               Root MSE      =  1.3765

------------------------------------------------------------------------------
  ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
 afchnge |   .0973808   .0847879     1.15   0.251    -.0689329    .2636945
highearn |   .1691388   .1055676     1.60   0.109    -.0379348    .3762124
  afhigh |   .1919906   .1541699     1.25   0.213    -.1104176    .4943988
   _cons |   1.412737   .0567172    24.91   0.000     1.301485    1.523989
------------------------------------------------------------------------------

The coefficient on the interaction term, .192, is remarkably similar to that for Kentucky. Unfortunately, because of the many fewer observations, the t statistic is insignificant at the 10% level against a one-sided alternative. Asymptotic theory predicts that the standard error for Michigan will be about $(5{,}626/1{,}524)^{1/2} \approx 1.92$ times larger than that for Kentucky. In fact, the ratio of standard errors is about 2.23. The difference in the KY and MI cases shows the importance of a large sample size for this kind of policy analysis.
6.11. The following is Stata output that I will use to answer the first three parts:
. reg lwage y85 educ y85educ exper expersq union female y85fem

  Source |       SS       df       MS                  Number of obs =    1084
---------+------------------------------               F(  8,  1075) =   99.80
   Model |  135.992074     8  16.9990092               Prob > F      =  0.0000
Residual |  183.099094  1075  .170324738               R-squared     =  0.4262
---------+------------------------------               Adj R-squared =  0.4219
   Total |  319.091167  1083   .29463635               Root MSE      =   .4127

------------------------------------------------------------------------------
   lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
     y85 |   .1178062   .1237817     0.95   0.341     -.125075    .3606874
    educ |   .0747209   .0066764    11.19   0.000     .0616206    .0878212
 y85educ |   .0184605   .0093542     1.97   0.049      .000106     .036815
   exper |   .0295843   .0035673     8.29   0.000     .0225846     .036584
 expersq |  -.0003994   .0000775    -5.15   0.000    -.0005516   -.0002473
   union |   .2021319   .0302945     6.67   0.000     .1426888    .2615749
  female |  -.3167086   .0366215    -8.65   0.000    -.3885663    -.244851
  y85fem |    .085052    .051309     1.66   0.098    -.0156251     .185729
   _cons |   .4589329   .0934485     4.91   0.000     .2755707     .642295
------------------------------------------------------------------------------

a. The return to another year of education increased by about .0185, or 1.85 percentage points, between 1978 and 1985. The t statistic is 1.97, which is marginally significant at the 5% level against a two-sided alternative.
b. Only the coefficient on y85 changes if wages are measured in 1978 dollars. In fact, you can check that when 1978 wages are used, the coefficient on y85 becomes about -.383, which shows a significant fall in real wages for given productivity characteristics and gender over the seven-year period. (But see part e for the proper interpretation of the coefficient.)
c. The coefficient on y85fem is positive and shows that the estimated gender gap declined by about 8.5 percentage points. But the t statistic is only significant at about the 10% level against a two-sided alternative. Still, this is suggestive of some closing of wage differentials between men and women at given levels of education and workforce experience.
d. To answer this question, I just took the squared OLS residuals and regressed those on the year dummy, y85. The coefficient is about .042 with a standard error of about .022, which gives a t statistic of about 1.91. So
there is some evidence that the variance of the unexplained part of log wages (or log real wages) has increased over time.
e. As the equation is written in the problem, the coefficient $\delta_0$ is the growth in nominal wages for a male with no years of education! For a male with 12 years of education, we want $\theta_0 \equiv \delta_0 + 12\delta_1$. A simple way to obtain the standard error of $\hat\theta_0 = \hat\delta_0 + 12\hat\delta_1$ is to replace y85·educ with y85·(educ - 12). Simple algebra shows that, in the new model, $\theta_0$ is the coefficient on y85. In Stata we have
. gen y85educ0 = y85*(educ - 12)
. reg lwage y85 educ y85educ0 exper expersq union female y85fem

  Source |       SS       df       MS                  Number of obs =    1084
---------+------------------------------               F(  8,  1075) =   99.80
   Model |  135.992074     8  16.9990092               Prob > F      =  0.0000
Residual |  183.099094  1075  .170324738               R-squared     =  0.4262
---------+------------------------------               Adj R-squared =  0.4219
   Total |  319.091167  1083   .29463635               Root MSE      =   .4127

------------------------------------------------------------------------------
   lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
     y85 |   .3393326   .0340099     9.98   0.000     .2725993    .4060659
    educ |   .0747209   .0066764    11.19   0.000     .0616206    .0878212
y85educ0 |   .0184605   .0093542     1.97   0.049      .000106     .036815
   exper |   .0295843   .0035673     8.29   0.000     .0225846     .036584
 expersq |  -.0003994   .0000775    -5.15   0.000    -.0005516   -.0002473
   union |   .2021319   .0302945     6.67   0.000     .1426888    .2615749
  female |  -.3167086   .0366215    -8.65   0.000    -.3885663    -.244851
  y85fem |    .085052    .051309     1.66   0.098    -.0156251     .185729
   _cons |   .4589329   .0934485     4.91   0.000     .2755707     .642295
------------------------------------------------------------------------------

So the growth in nominal wages for a man with educ = 12 is about .339, or 33.9%. [We could use the more accurate estimate, obtained from exp(.339) - 1.] The 95% confidence interval goes from about 27.3 to 40.6.
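The arithmetic in part (e) can be cross-checked with a few lines of Python, as a sketch: $\hat\delta_0 + 12\hat\delta_1$ from the first regression should reproduce the y85 coefficient in the reparameterized one (up to rounding), and $100[\exp(\hat\theta_0) - 1]$ gives the exact percentage change mentioned in brackets. The 95% limits below use a normal critical value of 1.96, an assumption that differs slightly from Stata's t-based limits.

```python
import math

# Estimates from the two regressions in Problem 6.11.
delta0_hat = 0.1178062       # y85, original model
delta1_hat = 0.0184605       # y85educ, original model
theta0_reparam = 0.3393326   # y85, after replacing y85*educ with y85*(educ - 12)
se_theta0 = 0.0340099

theta0_hat = delta0_hat + 12 * delta1_hat        # should match theta0_reparam

# Normal-approximation 95% interval (Stata uses a t critical value).
lower = theta0_reparam - 1.96 * se_theta0
upper = theta0_reparam + 1.96 * se_theta0

pct_approx = 100 * theta0_reparam                   # log-point approximation
pct_exact = 100 * (math.exp(theta0_reparam) - 1)    # exact percentage change
print(round(theta0_hat, 5), round(pct_approx, 1), round(pct_exact, 1))
print(round(100 * lower, 1), round(100 * upper, 1))
```

The exact figure, about 40.4%, is noticeably larger than the 33.9% log-point approximation, which is why the text flags it.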
CHAPTER 7
7.1. Write (with probability approaching one)
$\hat\beta = \beta + \left(N^{-1}\sum_{i=1}^N X_i'X_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N X_i'u_i\right)$.
From SOLS.2, the weak law of large numbers, and Slutsky's Theorem, $\mathrm{plim}\left(N^{-1}\sum_{i=1}^N X_i'X_i\right)^{-1} = A^{-1}$. Further, under SOLS.1, the WLLN implies that $\mathrm{plim}\left(N^{-1}\sum_{i=1}^N X_i'u_i\right) = 0$. Thus,
$\mathrm{plim}\,\hat\beta = \beta + \mathrm{plim}\left(N^{-1}\sum_{i=1}^N X_i'X_i\right)^{-1}\cdot\mathrm{plim}\left(N^{-1}\sum_{i=1}^N X_i'u_i\right) = \beta + A^{-1}\cdot 0 = \beta$.
7.3. a. Since OLS equation-by-equation is the same as GLS when $\Omega$ is diagonal, it suffices to show that the GLS estimators for different equations are asymptotically uncorrelated. This follows if the asymptotic variance matrix is block diagonal (see Section 3.5), where the blocking is by the parameter vector for each equation. To establish block diagonality, we use the result under SGLS.1, SGLS.2, and SGLS.3 from Theorem 7.4:
$\mathrm{Avar}\,\sqrt N(\hat\beta - \beta) = [E(X_i'\Omega^{-1}X_i)]^{-1}$.
Now, we can use the special form of $X_i$ for SUR (see Example 7.1), the fact that $\Omega^{-1}$ is diagonal, and SGLS.3. In the SUR model with diagonal $\Omega$, SGLS.3 implies that $E(u_{ig}^2 x_{ig}'x_{ig}) = \sigma_g^2 E(x_{ig}'x_{ig})$ for all $g = 1,\dots,G$, and $E(u_{ig}u_{ih}x_{ig}'x_{ih}) = E(u_{ig}u_{ih})E(x_{ig}'x_{ih}) = 0$, all $g \neq h$. Therefore, we have
$E(X_i'\Omega^{-1}X_i) = \begin{pmatrix} \sigma_1^{-2}E(x_{i1}'x_{i1}) & & 0 \\ & \ddots & \\ 0 & & \sigma_G^{-2}E(x_{iG}'x_{iG}) \end{pmatrix}$.
When this matrix is inverted, it is also block diagonal. This shows that the asymptotic variance matrix is block diagonal, which is what we wanted to show.
b. To test any linear hypothesis, we can either construct the Wald statistic or we can use the weighted sum of squared residuals form of the statistic as in (7.52) or (7.53). For the restricted SSR we must estimate the model with the restriction $\beta_1 = \beta_2$ imposed. See Problem 7.6 for one way to impose general linear restrictions.
c. When $\Omega$ is diagonal in a SUR system, system OLS and GLS are the same. Under SGLS.1 and SGLS.2, GLS and FGLS are asymptotically equivalent (regardless of the structure of $\Omega$) whether or not SGLS.3 holds. But, if $\hat\beta_{SOLS} = \hat\beta_{GLS}$ and $\sqrt N(\hat\beta_{FGLS} - \hat\beta_{GLS}) = o_p(1)$, then $\sqrt N(\hat\beta_{SOLS} - \hat\beta_{FGLS}) = o_p(1)$. Thus, OLS and FGLS are asymptotically equivalent, even if $\hat\Omega$ is estimated in an unrestricted fashion and even if the system homoskedasticity assumption SGLS.3 does not hold.
7.5. This is easy with the hint. Note that
$\left[\hat\Omega^{-1} \otimes \left(\sum_{i=1}^N x_i'x_i\right)\right]^{-1} = \hat\Omega \otimes \left(\sum_{i=1}^N x_i'x_i\right)^{-1}$.
Therefore,
$\hat\beta = \left[\hat\Omega \otimes \left(\sum_{i=1}^N x_i'x_i\right)^{-1}\right](\hat\Omega^{-1} \otimes I_K)\begin{pmatrix} \sum_{i=1}^N x_i'y_{i1} \\ \vdots \\ \sum_{i=1}^N x_i'y_{iG} \end{pmatrix} = \left[I_G \otimes \left(\sum_{i=1}^N x_i'x_i\right)^{-1}\right]\begin{pmatrix} \sum_{i=1}^N x_i'y_{i1} \\ \vdots \\ \sum_{i=1}^N x_i'y_{iG} \end{pmatrix}$.
Straightforward multiplication shows that the right hand side of the equation is just the vector of stacked $\hat\beta_g$, where $\hat\beta_g$ is the OLS estimator for equation $g$.
7.7. a. First, the diagonal elements of $\Omega$ are easily found since $E(u_{it}^2) = E[E(u_{it}^2|x_{it})] = \sigma_t^2$ by iterated expectations. Now, consider $E(u_{it}u_{is})$,
and take $s < t$ without loss of generality. Under (7.80), $E(u_{it}|u_{is}) = 0$ since $u_{is}$ is a subset of the conditioning information in (7.80). Applying the law of iterated expectations (LIE) again we have $E(u_{it}u_{is}) = E[E(u_{it}u_{is}|u_{is})] = E[E(u_{it}|u_{is})u_{is}] = 0$.
b. The GLS estimator is
$\beta^* \equiv \left(\sum_{i=1}^N X_i'\Omega^{-1}X_i\right)^{-1}\left(\sum_{i=1}^N X_i'\Omega^{-1}y_i\right) = \left(\sum_{i=1}^N\sum_{t=1}^T \sigma_t^{-2}x_{it}'x_{it}\right)^{-1}\left(\sum_{i=1}^N\sum_{t=1}^T \sigma_t^{-2}x_{it}'y_{it}\right)$.
c. If, say, $y_{it} = \beta_0 + \beta_1 y_{i,t-1} + u_{it}$, then $y_{it}$ is clearly correlated with $u_{it}$, which says that $x_{i,t+1} = y_{it}$ is correlated with $u_{it}$. Thus, SGLS.1 does not hold. Generally, SGLS.1 fails whenever there is feedback from $y_{it}$ to $x_{is}$, $s > t$. However, since $\Omega^{-1}$ is diagonal, $X_i'\Omega^{-1}u_i = \sum_{t=1}^T \sigma_t^{-2}x_{it}'u_{it}$, and so
$E(X_i'\Omega^{-1}u_i) = \sum_{t=1}^T \sigma_t^{-2}E(x_{it}'u_{it}) = 0$
since $E(x_{it}'u_{it}) = 0$ under (7.80). Thus, GLS is consistent in this case without SGLS.1.
d. First, since $\Omega^{-1}$ is diagonal, $X_i'\Omega^{-1} = (\sigma_1^{-2}x_{i1}', \sigma_2^{-2}x_{i2}', \dots, \sigma_T^{-2}x_{iT}')$, and so
$E(X_i'\Omega^{-1}u_iu_i'\Omega^{-1}X_i) = \sum_{t=1}^T\sum_{s=1}^T \sigma_t^{-2}\sigma_s^{-2}E(u_{it}u_{is}x_{it}'x_{is})$.
First consider the terms for $s \neq t$. Under (7.80), if $s < t$, $E(u_{it}|x_{it}, u_{is}, x_{is}) = 0$, and so by the LIE, $E(u_{it}u_{is}x_{it}'x_{is}) = 0$, $t \neq s$. Next, for each $t$, using (7.79),
$E(u_{it}^2 x_{it}'x_{it}) = E[E(u_{it}^2 x_{it}'x_{it}|x_{it})] = E[E(u_{it}^2|x_{it})x_{it}'x_{it}] = E[\sigma_t^2 x_{it}'x_{it}] = \sigma_t^2 E(x_{it}'x_{it})$, $t = 1, 2, \dots, T$.
It follows that
$E(X_i'\Omega^{-1}u_iu_i'\Omega^{-1}X_i) = \sum_{t=1}^T \sigma_t^{-2}E(x_{it}'x_{it}) = E(X_i'\Omega^{-1}X_i)$.
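As a sketch of the estimator in part (b) and of the weighted least squares equivalence used in part (f), the code below assumes a single regressor with no intercept (K = 1), a made-up panel with N = 3 and T = 2, and known $\sigma_t^2$ (in practice these would be the $\hat\sigma_t^2$ of part (e)). The hypothetical helpers `gls_diag` and `pooled_ols_scaled` compute $\beta^*$ from the double-sum formula and from pooled OLS on $(y_{it}/\sigma_t, x_{it}/\sigma_t)$; the two must agree.

```python
def gls_diag(x, y, sigma2):
    """beta* from part (b) with one regressor and no intercept.

    x[i][t], y[i][t] hold unit i in period t; sigma2[t] is sigma_t^2.
    """
    n, t_len = len(x), len(sigma2)
    num = sum(x[i][t] * y[i][t] / sigma2[t] for i in range(n) for t in range(t_len))
    den = sum(x[i][t] ** 2 / sigma2[t] for i in range(n) for t in range(t_len))
    return num / den

def pooled_ols_scaled(x, y, sigma2):
    """Pooled OLS (no intercept) of y_it/sigma_t on x_it/sigma_t."""
    xs, ys = [], []
    for i in range(len(x)):
        for t, s2 in enumerate(sigma2):
            s = s2 ** 0.5
            xs.append(x[i][t] / s)
            ys.append(y[i][t] / s)
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)

# Made-up panel: N = 3 units, T = 2 periods, known variances.
x = [[1.0, 2.0], [0.5, 1.5], [2.0, 0.5]]
y = [[1.2, 2.9], [0.4, 2.5], [2.1, 0.2]]
sigma2 = [1.0, 4.0]
b_gls = gls_diag(x, y, sigma2)
b_wls = pooled_ols_scaled(x, y, sigma2)
print(b_gls, b_wls)
```

Dividing each observation by $\sigma_t$ turns every weight $\sigma_t^{-2}$ in the GLS sums into an ordinary cross product, which is why the two routines give the same number.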
Then. t statistics. by ^2 st Lp s2t as N L 8.. t = 1.8509 . reg lscrap d89 grant grant_1 lscrap_1 if year != 1987 Source  SS df MS +Model  186.03370676 37 Number of obs F( 4..T. inference is very easy. If s2t = s2 for all t = 1. i = 1.8565 0. I first test for serial correlation before computing the fully robust standard errors: .K as a degreesoffreedom adjustment. FGLS reduces Thus. f.51) are asymptotically valid.2296502 103 .N..303200488 +Total  217. th t Now.9. standard errors obtained from (7.1).. Then. The Stata session follows. run pooled regression across all i and t.. and F statistics from (7. 103) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 108 153.5942432 Residual  31. then the FGLS statistics are easily shown to be identical to the statistics obtained by performing pooled OLS on the equation ^ ^ (yit/st) = (xit/st)B + errorit.. g. We have verified the assumptions under which standard FGLS statistics have nice properties (although we relaxed SGLS.. For F testing.. note that the ^2 st should be obtained from the pooled OLS residuals for the unrestricted model.55064 . We can obtain valid standard errors.376973 4 46.T. First. define N ^2 ^2 st = N1 S ^uit i=1 (We might replace N with N .53) are valid.) standard arguments.606623 107 2..^ ^ e. to pooled OLS..2. for each t.67 0.. if ^ ) is taken to be the diagonal matrix with s^t2 as the diagonal. and F statistics from this weighted least squares analysis. let uit denote the pooled OLS residuals.. 7. we can use the standard errors and test statistics reported by a standard OLS regression pooled across i and t. In particular.0000 0.
[Coefficient table not legible in the source.]
The estimated effect of grant, and its lag, are now the expected sign, but neither is strongly statistically significant. The variable grant would be if we use a 10% significance level and a one-sided test. The results are certainly different from when we omit the lag of log(scrap).
Now test for AR(1) serial correlation:
. predict uhat, resid
(363 missing values generated)
. gen uhat_1 = uhat[_n-1] if d89
(417 missing values generated)
. reg lscrap grant grant_1 lscrap_1 uhat_1 if d89
[Output not legible in the source.]
. reg lscrap d89 grant grant_1 lscrap_1 if year != 1987, robust cluster(fcode)
[Output not legible in the source.]
The robust standard errors for grant and grant_1 are actually smaller than the usual ones, making both more statistically significant. However, grant and grant_1 are jointly insignificant:
. test grant grant_1
 ( 1)  grant = 0.0
 ( 2)  grant_1 = 0.0
       F(  2,    53) =    1.14
            Prob > F =    0.3266
7.11. a. The following Stata output should be self-explanatory. There is strong evidence of positive serial correlation in the static model, and the fully robust standard errors are much larger than the nonrobust ones.
. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87

  Source |       SS       df       MS                  Number of obs =     630
---------+------------------------------               F( 11,   618) =   74.49
   Model |  117.644669    11  10.6949699               Prob > F      =  0.0000
Residual |   88.735673   618  .143585231               R-squared     =  0.5700
---------+------------------------------               Adj R-squared =  0.5624
   Total |  206.380342   629  .328108652               Root MSE      =  .37893

[Coefficient table not legible in the source.]
. predict uhat, resid
. gen uhat_1 = uhat[_n-1] if year > 81
(90 missing values generated)
. reg uhat uhat_1

  Source |       SS       df       MS                  Number of obs =     540
---------+------------------------------               F(  1,   538) =  831.46
   Model |  46.6680407     1  46.6680407               Prob > F      =  0.0000
Residual |  30.1968286   538  .056127934               R-squared     =  0.6071
---------+------------------------------               Adj R-squared =  0.6064
   Total |  76.8648693   539  .142606437               Root MSE      =  .23691

[Coefficient table not legible in the source.]
Because of the strong serial correlation, I obtain the fully robust standard errors:
. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87, robust cluster(county)
[Output not legible in the source.]
. drop uhat uhat_1
b. We lose the first year, 1981, when we add the lag of log(crmrte):
. gen lcrmrt_1 = lcrmrte[_n-1] if year > 81
(90 missing values generated)
. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d83-d87 lcrmrt_1

  Source |       SS       df       MS                  Number of obs =     540
---------+------------------------------               F( 11,   528) =  464.68
   Model |  163.287174    11  14.8442885               Prob > F      =  0.0000
Residual |  16.8670945   528  .031945255               R-squared     =  0.9064
---------+------------------------------               Adj R-squared =  0.9044
   Total |  180.154268   539  .334237975               Root MSE      =  .17873

[Coefficient table not legible in the source.]
Not surprisingly, the lagged crime rate is very significant. Further, including it makes all other coefficients much smaller in magnitude. The
1337714 .023 . I will not correct the standard errors.17895 lcrmrte  Coef.000 .580 . 519) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 540 255.911 0.0088345 42 .32 0.087 0.000 .0888553 .0352873 0.166991 .557 0. None of the log(wage) variables is statistically significant.011 .1216644 .071157 . however.0195318 .542 0. Thus.554 0.1005518 lprbpris  . d.0311719 3.1277591 lprbconv  .0729231 . There is no evidence of serial correlation in the model with a lagged dependent variable: . We still get a positive relationship between size of police force and crime rate.272 0.000 .1050704 . reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d84d87 lcrmrt_1 uhat_1 From this regression the coefficient on uhat1 is only . and the magnitudes are pretty small in all cases: . Std.1389838 d83  .0000 0.1292903 . which means that there is little evidence of serial correlation (especially since ^ r is practically small). reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d83d87 lcrmrt_1 lwconlwloc Source  SS df MS +Model  163.0169096 7.0652494 .0172627 6.154268 539 .0165559 d84  .986.000 . t P>t [95% Conf.1108926 . predict uhat. gen uhat_1 = uhat[_n1] if year > 82 (180 missing values generated) .0287165 2.03202475 +Total  180.9077 0.322 0. although it is insignificant.6208452 519 .17667116 Residual  16.0497918 lavgsen  . Interval] +lprbarr  .9042 .1746053 . c.533423 20 8.variable log(prbpris) now has a negative sign.059 with t statistic .334237975 Number of obs F( 20.0238458 7. Err.1721313 .2214516 . resid (90 missing values generated) .049654 lpolpc  .0286922 2.
0409046 ..0898807 .0389325 1.= 2&7 S Z’i Xi*’ S Z’i (yi .478 .1003172 0.471 ..0498207 .0 0.0 0.928 0.1070534 .368 0.721 0.0208067 38.0418004 1. 519) = Prob > F = 0.051 0....0439875 0.0306847 .0330676 . it follows from multivariable calculus that dQ(b)’ N ^& N .0922683 lwser  .0034567 .0 0.294 .. 7i=1 i i8 7i=1 i i In terms of full data matrices. after simple algebra.0405482 lwtrd  .172 .561 .767901 .0355801 lwfed  .29319 _cons  .6438061 .0326156 0.0961124 ..0121236 . 8 W7i=1 db i=1 Evaluating the derivative at the solution B^ gives & SN Z’X *’^W& SN Z’(y .181 .1054249 .1.114 0.098539 lwfir  .Xib)*8.88852 .8496525 lwcon  .871 0.0223995 0.. we can write.958 0..0465632 . 43 .1286819 lcrmrt_1  .038269 d86  ..0296003 .8087768 .266 .0 0.710 0.0798526 1. ^ ^ ^ (X’ZWZ’X)B = (X’ZWZ’Y).2201867 ...0564908 lwmfg  .000 .X B ^ * i )8 = 0.0355555 .0 0.0 0.0 9.0283133 ..276 0.85 0.5663 CHAPTER 8 8.338 .310 1.354 .791 0.0474615 . test lwcon lwtuc lwtrd lwfir lwser lwmfg lwfed lwsta lwloc ( ( ( ( ( ( ( ( ( 1) 2) 3) 4) 5) 6) 7) 8) 9) lwcon lwtuc lwtrd lwfir lwser lwmfg lwfed lwsta lwloc F( = = = = = = = = = 0..6335887 1.341 0.039408 lwloc  .016 0.6009076 .582 0.0 0.0466549 .d85  .783 .0660699 1. Letting Q(b) denote the objective function in (8.0371746 0.0392516 0.877 .012903 .23).0903894 .0742918 .0263763 .1173893 .0221872 0.0258059 .154 0.429 .0318995 0.0994076 d87  .2639275 lwsta  .0487983 lwtuc  .1009652 .0 0.
Solving for $\hat{\beta}$ gives (8.24).

8.3. First, we can always write $x$ as its linear projection plus an error: $x = x^* + e$, where $x^* = z\Pi$ and $E(z'e) = 0$. Therefore, $E(z'x) = E(z'x^*)$, which verifies the first part of the hint. To verify the second step, let $h \equiv h(z)$, and write the linear projection as $L(x|z,h) = z\Pi_1 + h\Pi_2$, where $\Pi_1$ is $M \times K$ and $\Pi_2$ is $Q \times K$. Then we must show that $\Pi_2 = 0$. But, from the two-step projection theorem (see Property LP.7 in Chapter 2), $\Pi_2 = [E(s's)]^{-1}E(s'r)$, where $s \equiv h - L(h|z)$ and $r \equiv x - L(x|z)$. Now, by the assumption that $E(x|z) = L(x|z)$, $r$ is also equal to $x - E(x|z)$. Therefore, $E(r|z) = 0$, and so $r$ is uncorrelated with all functions of $z$. But $s$ is simply a function of $z$ since $h \equiv h(z)$. Therefore, $E(s'r) = 0$, and this shows that $\Pi_2 = 0$.

8.5. This follows directly from the hint. Straightforward matrix algebra shows that
$$(C'\Lambda^{-1}C) - (C'WC)(C'W\Lambda WC)^{-1}(C'WC) = C'\Lambda^{-1/2}[I_L - D(D'D)^{-1}D']\Lambda^{-1/2}C,$$
where $D \equiv \Lambda^{1/2}WC$. Since this is a matrix quadratic form in the $L \times L$ symmetric, idempotent matrix $I_L - D(D'D)^{-1}D'$, it is necessarily itself positive semidefinite.

8.7. When $\hat{\Omega}$ is diagonal and $Z_i$ has the form in (8.15),
$$\sum_{i=1}^N Z_i'\hat{\Omega}Z_i = Z'(I_N \otimes \hat{\Omega})Z$$
is a block diagonal matrix with $g$th block $\hat{\sigma}_g^2\left(\sum_{i=1}^N z_{ig}'z_{ig}\right) \equiv \hat{\sigma}_g^2 Z_g'Z_g$, where $Z_g$ denotes the $N \times L_g$ matrix of instruments for the $g$th equation. Further, $Z'X$ is block diagonal with $g$th block $Z_g'X_g$. Using these facts, and noting that each $\hat{\sigma}_g^2$ cancels within its own block, it is now straightforward to show that the 3SLS estimator consists of
$$[X_g'Z_g(Z_g'Z_g)^{-1}Z_g'X_g]^{-1}X_g'Z_g(Z_g'Z_g)^{-1}Z_g'y_g$$
stacked from $g = 1, \ldots, G$. This is just the system 2SLS estimator or, equivalently, 2SLS equation by equation.

8.9. a. If $E(u|x) = 0$ and $E(u^2|x) = \sigma^2$, then the optimal instruments are $\sigma^{-2}E(x|x) = \sigma^{-2}x$, and this leads to the OLS estimator.

b. The optimal instruments are given in Theorem 8.5, with $G = 1$: $z_i^* = [\omega(z_i)]^{-1}E(x_i|z_i)$, where $\omega(z_i) = E(u_i^2|z_i)$. If $E(u_i^2|z_i) = \sigma^2$ and $E(x_i|z_i) = z_i\Pi$, then the optimal instruments are $\sigma^{-2}z_i\Pi$. The constant multiple $\sigma^{-2}$ clearly has no effect on the optimal IV estimator, so the optimal instruments are $z_i\Pi$. These are the optimal IVs underlying 2SLS, except that $\Pi$ is replaced with its $\sqrt{N}$-consistent OLS estimator. The 2SLS estimator has the same asymptotic variance whether $\Pi$ or $\hat{\Pi}$ is used, and so 2SLS is asymptotically efficient.

8.11. This is a simple application of Theorem 8.5 when $G = 1$. Without the $i$ subscript, $x_1 = (z_1, y_2)$ and so $E(x_1|z) = [z_1, E(y_2|z)]$. Further, $\Omega(z) = \mathrm{Var}(u_1|z) = \sigma_1^2$. It follows that the optimal instruments are $(1/\sigma_1^2)[z_1, E(y_2|z)]$. Dropping the division by $\sigma_1^2$ clearly does not affect the optimal instruments. If $y_2$ is binary then $E(y_2|z) = P(y_2 = 1|z) = F(z)$, and so the optimal IVs are $[z_1, F(z)]$.
CHAPTER 9

9.1. a. No. What causal inference could one draw from this? We may be interested in the tradeoff between wages and benefits, but then either of these can be taken as the dependent variable and the analysis would be by OLS. Of course, if we have omitted some important factors or have a measurement error problem, OLS could be inconsistent for estimating the tradeoff. But it is not a simultaneity problem.

b. Yes. We can certainly think of an exogenous change in law enforcement expenditures causing a reduction in crime, and we are certainly interested in such thought experiments. If we could do the appropriate experiment, where expenditures are assigned randomly across cities, then we could estimate the crime equation by OLS. (In fact, we could use a simple regression analysis.) The simultaneous equations model recognizes that cities choose law enforcement expenditures in part on what they expect the crime rate to be. An SEM is a convenient way to allow expenditures to depend on unobservables (to the econometrician) that affect crime.

c. No. These are both choice variables of the firm, and the parameters in a two-equation system modeling one in terms of the other, and vice versa, have no economic meaning. If we want to know how a change in the price of foreign technology affects foreign technology (FT) purchases, why would we want to hold fixed R&D spending? Clearly FT purchases and R&D spending are simultaneously chosen, but we should use a SUR model where neither is an explanatory variable in the other's equation.

d. Yes. We can certainly be interested in the causal effect of alcohol consumption on productivity, and therefore wage. One's hourly wage is determined by the demand for skills; alcohol consumption is determined by individual behavior.

e. No. These are choice variables by the same household. It makes no sense to think about how exogenous changes in one would affect the other. Further, suppose that we look at the effects of changes in local property tax rates. We would not want to hold fixed family saving and then measure the effect of changing property taxes on housing expenditures. When the property tax changes, a family will generally adjust expenditure in all categories. A SUR system with property tax as an explanatory variable seems to be the appropriate model.

f. No. These are both chosen by the firm, presumably to maximize profits. It makes no sense to hold advertising expenditures fixed while looking at how other variables affect price markup.

9.3. a. First, the only variable excluded from the support equation is the variable mremarr; since the support equation contains one endogenous variable, this equation is identified if and only if $\delta_{21} \neq 0$. This ensures that there is an exogenous variable shifting the mother's reaction function that does not also shift the father's reaction function.

b. The visits equation is identified if and only if at least one of finc and fremarr actually appears in the support equation; that is, we need $\delta_{11} \neq 0$ or $\delta_{13} \neq 0$.

c. Each equation can be estimated by 2SLS using instruments 1, finc, fremarr, dist, mremarr.

d. We can apply part b of Problem 9.1. First, obtain the reduced form for visits:
$$visits = \pi_{20} + \pi_{21}finc + \pi_{22}fremarr + \pi_{23}dist + \pi_{24}mremarr + v_2.$$
Estimate this equation by OLS, and save the residuals, $\hat{v}_2$. Then, run the OLS regression
support on 1, visits, finc, fremarr, dist, $\hat{v}_2$
and do a (heteroskedasticity-robust) $t$ test that the coefficient on $\hat{v}_2$ is zero. If this test rejects we conclude that visits is in fact endogenous in the support equation.

e. There is one overidentifying restriction in the visits equation, assuming that $\delta_{11}$ and $\delta_{13}$ are both different from zero. Assuming homoskedasticity of $u_2$, the easiest way to test the overidentifying restriction is to first estimate the visits equation by 2SLS, as in part c. Let $\hat{u}_2$ be the 2SLS residuals. Then, run the auxiliary regression
$\hat{u}_2$ on 1, finc, fremarr, dist, mremarr;
the sample size times the usual R-squared from this regression is distributed asymptotically as $\chi^2_1$ under the null hypothesis that all instruments are exogenous.

A heteroskedasticity-robust test is also easy to obtain. Let $\widehat{support}$ denote the fitted values from the reduced form regression for support. Next, regress finc (or fremarr) on $\widehat{support}$, mremarr, dist, and save the residuals, say $\hat{r}_1$. Then, run the simple regression (without intercept) of 1 on $\hat{u}_2\hat{r}_1$; $N - SSR_0$ from this regression is asymptotically $\chi^2_1$ under $H_0$. ($SSR_0$ is just the usual sum of squared residuals.)

9.5. a. Let $\beta_1$ denote the $7 \times 1$ vector of parameters in the first equation with only the normalization restriction imposed:
$$\beta_1' = (-1, \gamma_{12}, \gamma_{13}, \delta_{11}, \delta_{12}, \delta_{13}, \delta_{14}).$$
The restrictions $\delta_{12} = 0$ and $\delta_{13} + \delta_{14} = 1$ are obtained by choosing
$$R_1 = \begin{pmatrix} 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix}.$$
Because $R_1$ has two rows, and $G - 1 = 2$, the order condition is satisfied. Now, we need to check the rank condition. Letting $B$ denote the $7 \times 3$ matrix of all structural parameters with only the three normalizations (the second and third columns normalized the same way, for example $(\gamma_{21}, -1, \gamma_{23}, \delta_{21}, \ldots, \delta_{24})'$), straightforward matrix multiplication gives
$$R_1B = \begin{pmatrix} \delta_{12} & \delta_{22} & \delta_{32} \\ \delta_{13} + \delta_{14} - 1 & \gamma_{21} + \delta_{23} + \delta_{24} & \gamma_{31} + \delta_{33} + \delta_{34} \end{pmatrix}.$$
By definition of the constraints on the first equation, the first column of $R_1B$ is zero. Next, we use the constraints in the remainder of the system to get the expression for $R_1B$ with all information imposed. But $\gamma_{23} = 0$, $\delta_{22} = 0$, $\delta_{23} = 0$, $\delta_{24} = 0$, $\gamma_{31} = 0$, and $\gamma_{32} = 0$, and so $R_1B$ becomes
$$R_1B = \begin{pmatrix} 0 & 0 & \delta_{32} \\ 0 & \gamma_{21} & \delta_{33} + \delta_{34} \end{pmatrix}.$$
Identification requires $\mathrm{rank}(R_1B) = G - 1 = 2$, that is, $\gamma_{21} \neq 0$ and $\delta_{32} \neq 0$.

b. It is easy to see how to estimate the first equation under the given assumptions. Set $\delta_{14} = 1 - \delta_{13}$ and plug this into the equation. After simple algebra we get
$$y_1 - z_4 = \gamma_{12}y_2 + \gamma_{13}y_3 + \delta_{11}z_1 + \delta_{13}(z_3 - z_4) + u_1.$$
This equation can be estimated by 2SLS using instruments $(z_1, z_2, z_3, z_4)$. Note that, if we just count instruments, there are just enough instruments to estimate this equation.

9.7. a. Because alcohol and educ are endogenous in the first equation, we need at least two elements in $z_{(2)}$ and/or $z_{(3)}$ that are not also in $z_{(1)}$. Ideally,
799 535.0006143 educ  . b.1032 21. we should not make any exclusion restrictions in the reduced form for educ.29): .000 831.0895 79.84729 3.63986 35. Std.911094 kidslt6  200.67 0. c.0002123 .0151452 7.135 463.0070942 .261529 1.1426539 exper  .0208906 .8919 4. 0 (zi.8577 2522.5673 134. z(3) = z. Here is my Stata output for the 3SLS estimation of (9.11 0.362 2.0142782 1. reg3 (hours lwage educ age kidslt6 kidsge6 nwifeinc) (lwage hours educ exper expersq) Threestage least squares regression Equation Obs Parms RMSE "Rsq" chi2 P hours 428 6 1368. The matrix of instruments for each i is ( 2z i Zi = 2 0 2 0 9 d.0000 lwage 428 4 .9. a.009 educ  205. zi2 0 0 That is.176 119. z P>z [95% Conf.143 .46 0.340 .59414 kidsge6  48.132745 _cons  2504.35 0.95137 1.2685 1.6455 103.128 +lwage  hours  .89 0.6892584 0.3678943 3.53608 0.educi) 0 ) 2 2.7287 62.87188 0.47 3555. Interval] +hours  lwage  1676.933 431.000 1454.000 306.915 6. Err.28) and (9.169 3.46 0. Then use these as instruments in a 2SLS analysis.0488753 50 .47351 3.95 0.0267 51.451518 0.1129699 .396957 7.0000  Coef.000 .0832858 .000201 .137 28. 9.95 0.0002109 0.82352 nwifeinc  .we have at least one such element in z(2) and at least one such element in z(3).28121 8.49 0.49 0.4078 age  12. Let z denote all nonredundant exogenous variables in the system.1145 34.
g12g21). After substitution and straightforward algebra. we obtain a more efficient 51 Of .302097 . e. 9. We can estimate the system by 3SLS. I know of no econometrics packages that conveniently allow system estimation using different instruments for different equations.expersq  . Unfortunately. Consistency of OLS for p11 does not hinge on the validity of the exclusion restrictions in the structural model. we will still consistently estimate misspecified this equation.0008066 . Whether we estimate the parameters by 2SLS or 3SLS. for the second equation. b.) So our estimate of g21 provided we have not p11 = dE(y2z)/dz1 will be inconsistent in any case. we just need d22 $ 0 or d23 $ 0 (or both.31 0.7051103 . it can be seen that p11 = d11/(1 . we would form p^11 = d^11/(1 .11. To be added. We can just estimate the reduced form E(y2z1.z3) by ordinary least squares.0002614 1. of course). (Since we are estimating the second equation by 2SLS.13 0. each equation. f.g^12g^21). if the SEM is correctly specified. course. ^g12.0002943 .1081241 Endogenous variables: hours lwage Exogenous variables: educ age kidslt6 kidsge6 nwifeinc exper expersq  b.z2.3045904 2. we could just use 2SLS on ^ d11.260 . Given Or. we will generally inconsistently estimate d11 and g12. whereas using an SEM does. and g^21. identified if and only if The second equation is d11 $ 0. c. d.000218 _cons  . Since z2 and z3 are both omitted from the first equation. a. this is identical to 2SLS since it is just identified.021 1.
61916 57. 111) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = (2SLS) 114 2.021 .13.294 0. Here is my Stata output. and only if. then OLS: .4217 113 575.682852 +Total  63757.230002 Number of obs F( 2. 9.368841 _cons  26.4387 17.) b. Here is my Stata output: . c.852 3.1936 2 14303.505435 lland  7. Std.5464812 1.715 2. reg open lpcinc lland Source  SS df MS +Model  28606.083 3.0845 15. open. 2SLS.4487 0. Interval] +lpcinc  . d22 $ 0. reg inf open lpcinc (lland lpcinc) Source  SS df MS +Model  2009.9902 113 564.388 0.953679 _cons  117.194 111 568.0309 0.0968 Residual  35151.145892 +Total  65073.000 85.796 open  Coef.747 0.41783 52 . First.22775 2 1004.0134 23.79 0. t P>t [95% Conf. Interval] +open  . a.000 9.366 0.180527 5. (This is the rank condition. The first equation is identified if.412473 3. 111) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 114 45. Err.89934 15.342 0.4012 1. Err.8483 7.015081 0.3758247 2.8142162 9.187 0.61387 Residual  63064.17 0.870989 Number of obs F( 2.0000 0.836 inf  Coef.7966 111 316.489 This shows that log(land) is very statistically significant in the RF for Smaller countries are more open.0519014 lpcinc  .0657 0.567103 .1441212 2.estimator of the reduced form parameters by imposing the restrictions in estimating p11.3374871 .68006 148.6230728 .617192 4. Std. t P>t [95% Conf.49324 0.
.4217 113 575.0281 23.96406 Residual  62127. we need an IV for it.110342 Residual  65487. gen llandsq = lland^2 . 2 A regression of open Since is a natural 2 on log(land).09 0. of about 2.2150695 .402583 . If we add g13open2 to the equation. 111) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 114 2. has a larger standard error.931692 _cons  25. gen opensq = open^2 .1060 .4936 111 559.10403 15. Interval] +open  .0175683 1.0946289 2. reg inf open opensq lpcinc (lland llandsq lpcinc) Source  SS df MS +Model  414. 24. t P>t [95% Conf.40 inf  Coef. Err. Interval] +53 . reg inf open lpcinc Source  SS df MS +Model  2945. and log(pcinc) 2 gives a heteroskedasticityrobust t statistic on [log(land)] This is borderline.870989 Number of obs F( 2. d. [log(land)] . [log(land)] candidate.4217 113 575. t P>t [95% Conf. Std.027556 lpcinc  . Std.273 0.651 0.870989 Number of obs F( 3.026122 55. The Stata output for 2SLS is .0453 0.23419 The 2SLS estimate is notably larger in magnitude.993 3.0764 0.. but we will go ahead.896555 3.7527 110 595.343207 +Total  65073.658 inf  Coef.63 0.009 0. Not surprisingly. Err.975267 0.331026 3 138. it also You might want to test to see if open is endogenous.70715 +Total  65073. 110) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = (2SLS) 114 2. 2 log(land) is partially correlated with open.20522 1.102 5.025 .92812 2 1472.
0879 0.68006 148. the estimate would be significant at about the 6. Std. fitted values) . 110) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 114 2.0318 23.428461 .0049828 1.230002 Number of obs F( 2.17 0.612 inf  Coef.37 0. reg open lpcinc lland Source  SS df MS +Model  28606. Err.4387 17. t P>t [95% Conf.028 4.0022966 .000 85.547615 +Total  65073. 111) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 114 45. reg inf openh openhsq lpcinc Source  SS df MS +Model  3743.72804 Residual  61330.8142162 9.5464812 1.8483 7.9902 113 564.056 2.open  1. Interval] +lpcinc  .0968 Residual  35151.230 0.245 0. Here is the Stata output for implementing the method described in the problem: .807 3. predict openh (option xb assumed. gen openhsq = openh^2 .801467 81.54102  The squared term indicates that the impact of open on inf diminishes.505435 lland  7.2376 110 557.131 .932 0.180527 5.39 0.0575 0. t P>t [95% Conf.870989 Number of obs F( 3.49324 0.0000 0.607147 _cons  43. Std.567103 .4487 0.4217 113 575. e.18411 3 1247.198637 .17124 19.36141 2.412473 3.24 0.0075781 .0845 15.1936 2 14303.5066092 2.6205699 1. Err.5% level against a onesided alternative.682852 +Total  63757.29 0.7966 111 316. Interval] +54 .069134 0.715 2.953679 _cons  117.796 open  Coef.489 .0174527 lpcinc  .000 9.521 0.593929 4.0311868 opensq  .
Since investment is likely to be affected by macroeconomic factors. E(openlpcinc.5727026 77.02 0. If g13 = 0. less robust. If only a cross section were available. d. b.lland) is linear and. and we cannot trust the standard errors.984 3.1. the results are similar to the correct IV method from part d.17831 19.0059682 1. Standard investment theories suggest that. could easily be correlated with tax rates because tax rates are.48041 2.313 . But the forbidden regression implemented in this part is uncessary.023302 0.204181 openhsq  . CHAPTER 10 10. as shown in Problem 9. I would start with a fixed effects analysis to allow arbitrary correlation between all timevarying explanatory variables and ci.112 1. larger marginal tax rates decrease investment.60 0.933799 .8648092 . selected by state and local officials.0060502 . This is often a difficult task. both methods are consistent. 55 .5394132 1. Putting the unobserved effect ci in the equation is a simple way to account for timeconstant features of a county that affect investment and might also be correlated with the tax variable.050927 _cons  39.78391  Qualitatively.968493 4. (Actually. a. ceteris paribus.12.0178777 lpcinc  .047 .0412172 2. c. which affects investment. anyway.openh  .0057774 . this is done by using T .01 0. we would have to find an instrument for the tax variable that is uncorrelated with ci and correlated with the tax rate.1 time period dummies. Something like "average" county economic climate. it is important to allow for these by including separate time intercepts.01 0. at least to a certain extent.
doing pooled OLS is a useful initial exercise; these results can be compared with those from an FE analysis.) Such an analysis assumes strict exogeneity of $z_{it}$, $tax_{it}$, and $disaster_{it}$ in the sense that these are uncorrelated with the errors $u_{is}$ for all $t$ and $s$.

I have no strong intuition for the likely serial correlation properties of the $\{u_{it}\}$. These might have little serial correlation because we have allowed for $c_i$, in which case I would use standard fixed effects. However, it seems more likely that the $u_{it}$ are positively autocorrelated, in which case I might use first differencing instead. In either case, I would compute the fully robust standard errors along with the usual ones. Remember, with first differencing it is easy to test whether the changes $\Delta u_{it}$ are serially uncorrelated.

e. If $tax_{it}$ and $disaster_{it}$ do not have lagged effects on investment, then the only possible violation of the strict exogeneity assumption is if future values of these variables are correlated with $u_{it}$. It is safe to say that this is not a worry for the disaster variable: presumably, future natural disasters are not determined by past investment. On the other hand, state officials might look at the levels of past investment in determining future tax policy, especially if there is a target level of tax revenue the officials are trying to achieve. This could be similar to setting property tax rates: sometimes property tax rates are set depending on recent housing values, since a larger base means a smaller rate can achieve the same amount of revenue. Given that we allow $tax_{it}$ to be correlated with $c_i$, this might not be much of a problem. But it cannot be ruled out ahead of time.

10.3. a. Let $\bar{x}_i = (x_{i1} + x_{i2})/2$, $\bar{y}_i = (y_{i1} + y_{i2})/2$, $\ddot{x}_{i1} = x_{i1} - \bar{x}_i$, $\ddot{x}_{i2} = x_{i2} - \bar{x}_i$, and similarly for $\ddot{y}_{i1}$ and $\ddot{y}_{i2}$. For $T = 2$ the fixed effects estimator can be written as
$$\hat{\beta}_{FE} = \left(\sum_{i=1}^N (\ddot{x}_{i1}'\ddot{x}_{i1} + \ddot{x}_{i2}'\ddot{x}_{i2})\right)^{-1}\left(\sum_{i=1}^N (\ddot{x}_{i1}'\ddot{y}_{i1} + \ddot{x}_{i2}'\ddot{y}_{i2})\right).$$
Now, by simple algebra,
$$\ddot{x}_{i1} = (x_{i1} - x_{i2})/2 = -\Delta x_i/2, \qquad \ddot{x}_{i2} = (x_{i2} - x_{i1})/2 = \Delta x_i/2,$$
$$\ddot{y}_{i1} = (y_{i1} - y_{i2})/2 = -\Delta y_i/2, \qquad \ddot{y}_{i2} = (y_{i2} - y_{i1})/2 = \Delta y_i/2.$$
Therefore,
$$\ddot{x}_{i1}'\ddot{x}_{i1} + \ddot{x}_{i2}'\ddot{x}_{i2} = \Delta x_i'\Delta x_i/4 + \Delta x_i'\Delta x_i/4 = \Delta x_i'\Delta x_i/2,$$
$$\ddot{x}_{i1}'\ddot{y}_{i1} + \ddot{x}_{i2}'\ddot{y}_{i2} = \Delta x_i'\Delta y_i/4 + \Delta x_i'\Delta y_i/4 = \Delta x_i'\Delta y_i/2,$$
and so
$$\hat{\beta}_{FE} = \left(\sum_{i=1}^N \Delta x_i'\Delta x_i/2\right)^{-1}\left(\sum_{i=1}^N \Delta x_i'\Delta y_i/2\right) = \left(\sum_{i=1}^N \Delta x_i'\Delta x_i\right)^{-1}\left(\sum_{i=1}^N \Delta x_i'\Delta y_i\right) = \hat{\beta}_{FD}.$$

b. Let $\hat{u}_{i1} = \ddot{y}_{i1} - \ddot{x}_{i1}\hat{\beta}_{FE}$ and $\hat{u}_{i2} = \ddot{y}_{i2} - \ddot{x}_{i2}\hat{\beta}_{FE}$ be the fixed effects residuals for the two time periods for cross section observation $i$. Since $\hat{\beta}_{FE} = \hat{\beta}_{FD}$, and using the representations in (4.1'), we have
$$\hat{u}_{i1} = -\Delta y_i/2 - (-\Delta x_i/2)\hat{\beta}_{FD} = -(\Delta y_i - \Delta x_i\hat{\beta}_{FD})/2 \equiv -\hat{e}_i/2,$$
$$\hat{u}_{i2} = \Delta y_i/2 - (\Delta x_i/2)\hat{\beta}_{FD} = (\Delta y_i - \Delta x_i\hat{\beta}_{FD})/2 \equiv \hat{e}_i/2,$$
where $\hat{e}_i \equiv \Delta y_i - \Delta x_i\hat{\beta}_{FD}$ are the first difference residuals, $i = 1, \ldots, N$. Therefore,
$$\sum_{i=1}^N (\hat{u}_{i1}^2 + \hat{u}_{i2}^2) = (1/2)\sum_{i=1}^N \hat{e}_i^2.$$
This shows that the sum of squared residuals from the fixed effects regression is exactly one half the sum of squared residuals from the first difference regression. Since we know the variance estimate for fixed effects is the SSR divided by $N - K$ (when $T = 2$), and the variance estimate for first difference is the SSR divided by $N - K$, the error variance from fixed effects is always half the size of the error variance for first difference estimation, that is, $\hat{\sigma}_u^2 = \hat{\sigma}_e^2/2$ (contrary to what the problem asks you to show). What I wanted you to show is that the variance matrix estimates of $\hat{\beta}_{FE}$ and $\hat{\beta}_{FD}$ are identical. This is easy since the variance matrix estimate for fixed effects is
$$\hat{\sigma}_u^2\left(\sum_{i=1}^N (\ddot{x}_{i1}'\ddot{x}_{i1} + \ddot{x}_{i2}'\ddot{x}_{i2})\right)^{-1} = (\hat{\sigma}_e^2/2)\left(\sum_{i=1}^N \Delta x_i'\Delta x_i/2\right)^{-1} = \hat{\sigma}_e^2\left(\sum_{i=1}^N \Delta x_i'\Delta x_i\right)^{-1},$$
which is the variance matrix estimator for first difference. Thus, the standard errors, and in fact all other test statistics (F statistics), will be numerically identical using the two approaches.

10.5. a. Write $v_iv_i' = c_i^2 j_Tj_T' + u_iu_i' + j_T(c_iu_i') + (c_iu_i)j_T'$. Under RE.1, $E(u_i|x_i,c_i) = 0$, which implies that $E[(c_iu_i')|x_i] = 0$ by iterated expectations. Under RE.3a, $E(u_iu_i'|x_i,c_i) = \sigma_u^2 I_T$, which implies that $E(u_iu_i'|x_i) = \sigma_u^2 I_T$ (again, by iterated expectations). Therefore,
$$E(v_iv_i'|x_i) = E(c_i^2|x_i)j_Tj_T' + E(u_iu_i'|x_i) = h(x_i)j_Tj_T' + \sigma_u^2 I_T,$$
where $h(x_i) \equiv \mathrm{Var}(c_i|x_i) = E(c_i^2|x_i)$ (by RE.1b). This shows that the conditional variance matrix of $v_i$ given $x_i$ has the same covariance for all $t \neq s$, $h(x_i)$, and the same variance for all $t$, $h(x_i) + \sigma_u^2$. Therefore, while the variances and covariances depend on $x_i$ in general, they do not depend on time separately.

b. The RE estimator is still consistent and $\sqrt{N}$-asymptotically normal without assumption RE.3b, but the usual random effects variance estimator of $\hat{\beta}_{RE}$ is no longer valid because $E(v_iv_i'|x_i)$ does not have the form (10.30)
77 0.1334868 .627 0. and I compute the nonrobust regressionbased statistic from equation (11.0612948 5.0681573 3. Err. X) = 0 (assumed) (theta = 0.3718544 .0440992 .000 2.0599542 0.000167 black  .4782886 _cons  1.261 . re sd(u_id) sd(e_id_t) sd(e_id_t + u_id) = = = .050 0.630 0.035879 .(because it depends on xi).3684048 .5390 0. I provide annotated Stata output.0084622 .0121797 crsgpa  1.79): . fe 59 . 10.3566599 4.124 0.43396 1.16351 0.1575199 . with timevarying variables only.5526448 corr(u_id.335 .2067 0.534 . The robust variance matrix estimator given in (7.1629538 hsperc  .0001248 0.0000 trmgpa  Coef.264814 frstsem  .0000775 .3581529 .843 0.2348189 .73492 . .0930877 11.960 .4779937 .0013582 .3862) Randomeffects GLS regression Number of obs = 732 n = 366 T = 2 Rsq within between overall = = = 0.000 .082365 .0029948 . z P>z [95% Conf.0328061 sat  .0020523 verbmath  .2380173 .1145132 .103 .0001771 9.49) should be used in obtaining standard errors or Wald statistics. iis id . xtreg trmgpa spring crsgpa frstsem season.0017052 .000322 .000 .0060268 hssize  .621 0.0392381 1.632 0.4785 chi2( 10) = Prob > chi2 = 512.000 .0012426 6. tis term . Std.864 0.4088283 . * fixed effects estimation.000 . Interval] +spring  . * random effects estimation .000 .7.0371605 1.0108977 . xtreg trmgpa spring crsgpa frstsem season sat verbmath hsperc hssize black female.445 0.1205028 season  .1012331 female  .963 0.0606536 .1210044 .8999166 1.810 0.
614*hssize 60 First. di 1 . * Now obtain GLS transformations for both timeconstant and .3305004 2.374025 frstsem  .sd(u_id) sd(e_id_t) sd(e_id_t + u_id) = = = .1482218 season  ..187 0. Interval] +spring  . Xb) = 0.366 0.4088283 .332 0.1382072 .000 (366 categories) .140688 . Std.614*verbmath .852 . Note that lamdahat = .1186538 9.681 0.614*sat .614 .0000 trmgpa  Coef.0613 4.386. * Obtaining the regressionbased Hausman test is a bit tedious.679133 .614 . by(id) . by(id) . gen bone = . t P>t [95% Conf.0128523 . .000 .792693 corr(u_id.61 0.9073506 1.399 0.0111895 crsgpa  1. egen acrsgpa = mean(crsgpa). 362) = Prob > F = 23.1208637 id  F(365.0657817 . Err. . by(id) .0249165 _cons  . by(id) .0414748 1. gen bvrbmth = .362) = 5.1427528 .0566454 .7708056 .2069 0. egen afrstsem = mean(frstsem).614 0. gen bhsperc = . gen bhssize = . * timevarying variables.1225172 .0893 Fixedeffects (within) regression Number of obs = 732 n = 366 T = 2 Rsq within between overall F( = = = 0. egen atrmgpa = mean(trmgpa).0391404 1. compute the timeaverages for all of the timevarying variables: .0688364 0.614*hsperc . egen aseason = mean(season). egen aspring = mean(spring). by(id) .0333 0.386 .094 .173 .420747 .020 1. gen bsat = .
050 0.3686049 .446 0.46076 732 2. * Now add the time averages of the variables that change across i and t .10163 11 144. nocons Source  SS df MS +Model  1584. Std. * Check to make sure that pooled OLS on transformed data is random .009239 Residual  120. gen bspring = spring .1211368 ..844 0.4784672 . Interval] +bone  1.1010359 bfemale  .1669336 +Total  1704.0029868 .67 0. reg btrmgpa bone bspring bcrsgpa bfrstsem bseason bsat bvrbmth bhsperc bhssize bblack bfemale.435019 1. gen bfrstsem = frstsem .2348204 .621 0.960 .0681441 3..632 0.0371666 1.0612839 5.336 .082336 .000 .0123167 bcrsgpa  1.0084622 . gen bblack = .3581524 . . * to perform the Hausman test: .163434 bhsperc  . gen bseason = season .0012424 6.0000775 .123 0.000 .0003224 .262 .2378363 .060651 ..000 2.811 0.0017052 .000 .1207046 bseason  .734843 .000 . .964 0.386*atrmgpa .1634784 0.614*black . nocons 61 .000 .632 0.9294 0. 721) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 732 862. gen bfemale = .0930923 11. reg btrmgpa bone bspring bcrsgpa bfrstsem bseason bsat bvrbmth bhsperc bhssize bblack bfemale acrsgpa afrstsem aseason.114731 . gen btrmgpa = trmgpa .9283 .0109013 . t P>t [95% Conf.614*female .386*afrstsem .535 .386*aspring .0013577 .1575166 . * effects.4784686 .1336187 . * These are the RE estimates.864 0.103 .40858 btrmgpa  Coef.359125 721 .386*aseason .0599604 0.034666 bspring  . Err..000177 9..265101 bfrstsem  .8995719 1.386*acrsgpa .0392441 1.0001674 bblack  .0440905 .3566396 4..0329558 bsat  .3284983 Number of obs F( 11.0000 0. gen bcrsgpa = crsgpa .626 0. subject to rounding error.0020528 bvrbmth  .0001247 0.0060231 bhssize  .
c24 It would have been easy to make the regressionbased test robust to any violation of RE.2241405 .0).1931626 bhsperc  .0 afrstsem = 0.0001804 9.1101184 bfemale  .167204767 +Total  1704.000 .355 .926 0.3357016 .0 F( 3.0685972 3.40773 14 113.0001249 0.187 0.148023 bseason  .3567312 .4063367 bspring  . gives pvalue = .4564551 .1654425 0.441186 .093 . t P>t [95% Conf.852 .745 0.61 0.173 .0003236 .1281327 afrstsem  .0391479 1.0 aseason = 0.46076 732 2.531 .140688 .1316462 .40891 btrmgpa  Coef. 718) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 732 676.627 0.0763206 .680 0.001314 .2447934 .0896965 0.0414828 1.3284983 Number of obs F( 14.0060013 bhssize  .3794684 .5182286 2.0247967 bsat  .1280569 aseason  . Interval] +bone  1.0016681 .1959815 . based on a distribution (using Stata 7.1186766 9.85 0.000167 bblack  .4754216 acrsgpa  . * significance levels.961 0.0020223 bvrbmth  . test acrsgpa afrstsem aseason ( 1) ( 2) ( 3) acrsgpa = 0.592 .0566454 .569 0.770.9296 0.6085 . For comparison.0012551 6.612 0.000 .796 0.747 0. which includes spring among the coefficients tested.0711669 4.366 0.373683 bfrstsem  .0794119 0.247 0. 62 add ". Err.0000 0.1142992 .1234835 0.171981 Residual  120.2322278 .0000783 .006 2.0795867 .1223184 .337 .0657817 . Std.426 .0110764 bcrsgpa  1.423761 .9282 .3: cluster(id)" to the regression command. we fail to reject the random effects assumptions even at very large .536 0. the usual form of the Hausman test.0084655 .1426398 . robust . 718) = Prob > F = 0.Source  SS df MS +Model  1584.0109296 .0480418 .053023 718 .1380874 .000 .717 0.0688496 0.000 .000 .9076934 1. * Thus.0128523 .
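In Problem 10.7 the quasi-demeaning weight $\hat{\lambda} = .386$ (so $1 - \hat{\lambda} = .614$) is read off the theta that Stata reports. The sketch below shows where that number comes from, assuming the usual random effects formula for a balanced panel, $\theta = 1 - [\sigma_u^2/(\sigma_u^2 + T\sigma_c^2)]^{1/2}$, with $\sigma_c$ taken from sd(u_id) and $\sigma_u$ from sd(e_id_t) in the RE output (Stata labels the unobserved effect $u_i$ and the idiosyncratic error $e_{it}$). The helpers `re_theta` and `quasi_demean` are hypothetical names, not Stata commands.

```python
import math


def re_theta(sigma_c, sigma_u, T):
    """Random effects quasi-demeaning parameter for a balanced panel with T periods.

    sigma_c: standard deviation of the unobserved effect c_i (Stata's sd(u_id)).
    sigma_u: standard deviation of the idiosyncratic error (Stata's sd(e_id_t)).
    """
    return 1.0 - math.sqrt(sigma_u ** 2 / (sigma_u ** 2 + T * sigma_c ** 2))


def quasi_demean(panel, theta):
    """Map {id: [v_i1, ..., v_iT]} to {id: [v_it - theta * vbar_i]}.

    theta = 0 leaves the data unchanged (pooled OLS); theta = 1 gives the
    within (fixed effects) transformation.
    """
    out = {}
    for unit, values in panel.items():
        vbar = sum(values) / len(values)
        out[unit] = [v - theta * vbar for v in values]
    return out


if __name__ == "__main__":
    theta = re_theta(0.3718544, 0.4088283, T=2)
    print(round(theta, 4))  # compare with Stata's reported theta = 0.3862
    # A column of ones (or any time-constant variable w_i) is scaled by 1 - theta.
    print(quasi_demean({1: [1.0, 1.0]}, theta))
```

Applying this transformation to every variable, including a column of ones and the time-constant regressors, and running pooled OLS without an intercept reproduces the random effects point estimates; that is the check carried out above with bone, bspring, and the other transformed variables.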
10.9. a. The simplest way to compute a Hausman test is to just add the time averages of all explanatory variables, excluding the dummy variables, and estimate the equation by random effects. I should have done a better job of spelling this out in the text. In other words, write
$$y_{it} = x_{it}\beta + \bar{w}_i\xi + r_{it}, \quad t = 1,\ldots,T,$$
where $x_{it}$ includes an overall intercept along with time dummies, as well as $w_{it}$, the covariates that change across $i$ and $t$. We can estimate this equation by random effects and test $H_0\colon \xi = 0$. The actual calculation for this example is to be added.
Parts b, c, and d: To be added.

10.11. To be added.

10.13. The short answer is: Yes, we can justify this procedure with fixed $T$ as $N \to \infty$. In particular, it produces a $\sqrt{N}$-consistent, asymptotically normal estimator of $\beta$. Therefore, "fixed effects weighted least squares," where the weights are known functions of exogenous variables (including $x_i$ and possible other covariates that do not appear in the conditional mean), is another case where "estimating" the fixed effects leads to an estimator of $\beta$ with good properties. (As usual with fixed $T$, there is no sense in which we can estimate the $c_i$ consistently.) Verifying this claim takes much more work, but it is mostly just algebra.

First, in the sum of squared residuals, we can "concentrate" the $a_i$ out by finding $\hat{a}_i(b)$ as a function of $(x_i, y_i)$ and $b$, substituting back into the sum of squared residuals, and then minimizing with respect to $b$ only. Straightforward algebra gives the first order conditions for each $i$ as
$$\sum_{t=1}^T (y_{it} - \hat{a}_i - x_{it}b)/h_{it} = 0,$$
which gives
$$\hat{a}_i(b) = w_i\Big(\sum_{t=1}^T y_{it}/h_{it}\Big) - w_i\Big(\sum_{t=1}^T x_{it}/h_{it}\Big)b \equiv \bar{y}_i^w - \bar{x}_i^w b,$$
where $w_i \equiv 1/\sum_{t=1}^T (1/h_{it}) > 0$ and $\bar{y}_i^w \equiv w_i\sum_{t=1}^T y_{it}/h_{it}$, and a similar definition holds for $\bar{x}_i^w$. Note that $\bar{y}_i^w$ and $\bar{x}_i^w$ are simply weighted averages. If $h_{it}$ equals the same constant for all $t$, $\bar{y}_i^w$ and $\bar{x}_i^w$ are the usual time averages.

Now we can plug each $\hat{a}_i(b)$ into the SSR to get the problem solved by $\hat\beta$:
$$\min_{b \in \mathbb{R}^K} \sum_{i=1}^N\sum_{t=1}^T [(y_{it} - \bar{y}_i^w) - (x_{it} - \bar{x}_i^w)b]^2/h_{it}.$$
But this is just a pooled weighted least squares regression of $(y_{it} - \bar{y}_i^w)$ on $(x_{it} - \bar{x}_i^w)$ with weights $1/h_{it}$. Equivalently, define $\tilde{y}_{it} \equiv (y_{it} - \bar{y}_i^w)/\sqrt{h_{it}}$ and $\tilde{x}_{it} \equiv (x_{it} - \bar{x}_i^w)/\sqrt{h_{it}}$, all $t = 1,\ldots,T$, $i = 1,\ldots,N$. Then $\hat\beta$ can be expressed in the usual pooled OLS form:
$$\hat\beta = \Big(\sum_{i=1}^N\sum_{t=1}^T \tilde{x}_{it}'\tilde{x}_{it}\Big)^{-1}\Big(\sum_{i=1}^N\sum_{t=1}^T \tilde{x}_{it}'\tilde{y}_{it}\Big). \qquad (10.82)$$
Note carefully how the initial $y_{it}$ are weighted by $1/h_{it}$ to obtain $\bar{y}_i^w$, but the usual $1/\sqrt{h_{it}}$ weighting shows up in the sum of squared residuals on the time-demeaned data (where the demeaning is a weighted average).

Given (10.82), we can study the asymptotic ($N \to \infty$) properties of $\hat\beta$. First, it is easy to show that $\bar{y}_i^w = \bar{x}_i^w\beta + c_i + \bar{u}_i^w$, where $\bar{u}_i^w \equiv w_i\sum_{t=1}^T u_{it}/h_{it}$. Subtracting this equation from $y_{it} = x_{it}\beta + c_i + u_{it}$ for all $t$ gives $\tilde{y}_{it} = \tilde{x}_{it}\beta + \tilde{u}_{it}$, where $\tilde{u}_{it} \equiv (u_{it} - \bar{u}_i^w)/\sqrt{h_{it}}$. When we plug this in for $\tilde{y}_{it}$ in (10.82) and divide by $N$ in the appropriate places, we get
$$\hat\beta = \beta + \Big(N^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde{x}_{it}'\tilde{x}_{it}\Big)^{-1}\Big(N^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde{x}_{it}'\tilde{u}_{it}\Big).$$
Straightforward algebra shows that $\sum_{t=1}^T \tilde{x}_{it}'\tilde{u}_{it} = \sum_{t=1}^T \tilde{x}_{it}'u_{it}/\sqrt{h_{it}}$, $i = 1,\ldots,N$, and so we have the convenient expression
$$\hat\beta = \beta + \Big(N^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde{x}_{it}'\tilde{x}_{it}\Big)^{-1}\Big(N^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde{x}_{it}'u_{it}/\sqrt{h_{it}}\Big). \qquad (10.83)$$
From (10.83) we can immediately read off the consistency of $\hat\beta$. Why? We assumed that $E(u_{it}|x_i, h_i, c_i) = 0$, which means $u_{it}$ is uncorrelated with any function of $(x_i, h_i)$, including $\tilde{x}_{it}$. So $E(\tilde{x}_{it}'u_{it}) = 0$, $t = 1,\ldots,T$. As long as we assume $\mathrm{rank}\,\sum_{t=1}^T E(\tilde{x}_{it}'\tilde{x}_{it}) = K$, we can use the usual proof to show $\mathrm{plim}(\hat\beta) = \beta$. (We can even show that $E(\hat\beta|X, H) = \beta$.)

It is also clear from (10.83) that $\hat\beta$ is $\sqrt{N}$-asymptotically normal under mild assumptions. The asymptotic variance is generally
$$\mathrm{Avar}\,\sqrt{N}(\hat\beta - \beta) = A^{-1}BA^{-1},$$
where
$$A \equiv \sum_{t=1}^T E(\tilde{x}_{it}'\tilde{x}_{it}) \quad\text{and}\quad B \equiv \mathrm{Var}\Big(\sum_{t=1}^T \tilde{x}_{it}'u_{it}/\sqrt{h_{it}}\Big).$$
If we assume that $\mathrm{Cov}(u_{it}, u_{is}|x_i, h_i, c_i) = 0$, $t \neq s$, in addition to the variance assumption $\mathrm{Var}(u_{it}|x_i, h_i, c_i) = \sigma_u^2 h_{it}$, then it is easily shown that $B = \sigma_u^2 A$, and so $\mathrm{Avar}\,\sqrt{N}(\hat\beta - \beta) = \sigma_u^2 A^{-1}$.

The same subtleties that arise in estimating $\sigma_u^2$ for the usual fixed effects estimator crop up here as well. Assume the zero conditional covariance assumption and correct variance specification in the previous paragraph. Then, note that the residuals from the pooled OLS regression
$$\tilde{y}_{it} \text{ on } \tilde{x}_{it}, \quad t = 1,\ldots,T;\ i = 1,\ldots,N, \qquad (10.84)$$
say $\hat{r}_{it}$, are estimating $\tilde{u}_{it} = (u_{it} - \bar{u}_i^w)/\sqrt{h_{it}}$ (in the sense that we obtain $\hat{r}_{it}$ from $\tilde{u}_{it}$ by replacing $\beta$ with $\hat\beta$). Now
$$E(\tilde{u}_{it}^2) = E[(u_{it}^2/h_{it})] - 2E[(u_{it}\bar{u}_i^w)/h_{it}] + E[(\bar{u}_i^w)^2/h_{it}] = \sigma_u^2 - 2\sigma_u^2E[(w_i/h_{it})] + \sigma_u^2E[(w_i/h_{it})],$$
where the law of iterated expectations is applied several times, and $E[(\bar{u}_i^w)^2|x_i, h_i] = \sigma_u^2 w_i$ has been used. Therefore, $E(\tilde{u}_{it}^2) = \sigma_u^2[1 - E(w_i/h_{it})]$, $t = 1,\ldots,T$, and so
$$\sum_{t=1}^T E(\tilde{u}_{it}^2) = \sigma_u^2\Big\{T - E\Big[w_i\sum_{t=1}^T (1/h_{it})\Big]\Big\} = \sigma_u^2(T - 1).$$
This contains the usual result for the within transformation as a special case. A consistent estimator of $\sigma_u^2$ is $SSR/[N(T - 1) - K]$, where $SSR$ is the usual sum of squared residuals from (10.84), and the subtraction of $K$ is optional. The estimator of $\mathrm{Avar}(\hat\beta)$ is then
$$\hat\sigma_u^2\Big(\sum_{i=1}^N\sum_{t=1}^T \tilde{x}_{it}'\tilde{x}_{it}\Big)^{-1}.$$
If we want to allow serial correlation in the $\{u_{it}\}$, or allow $\mathrm{Var}(u_{it}|x_i, h_i, c_i) \neq \sigma_u^2 h_{it}$, then we can just apply the robust formula for the pooled OLS regression (10.84).
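The weighted within transformation in 10.13 can be checked numerically. The following is a minimal sketch in plain Python (standard library only) for a single regressor. The simulated design is an assumption made for illustration and is not part of the problem: $c_i$ and $e_{it}$ are standard normal, $x_{it} = c_i + e_{it}$, the known weights $h_{it}$ are drawn uniformly on $[0.5, 3]$, and $u_{it}$ is normal with variance $\sigma_u^2 h_{it}$. The function names `simulate` and `fe_wls` are hypothetical. The code computes (10.82), the degrees-of-freedom-adjusted estimator $SSR/[N(T-1) - K]$ with $K = 1$, and the implied standard error.

```python
import random


def simulate(n_units=3000, n_periods=4, beta=2.0, sigma_u=1.5, seed=1):
    """Hypothetical panel: y_it = beta*x_it + c_i + u_it, Var(u_it | x_i, h_i, c_i) = sigma_u**2 * h_it."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_units):
        c = rng.gauss(0.0, 1.0)
        h = [rng.uniform(0.5, 3.0) for _ in range(n_periods)]      # known weights h_it
        x = [c + rng.gauss(0.0, 1.0) for _ in range(n_periods)]    # x_it correlated with c_i
        y = [beta * x[t] + c + rng.gauss(0.0, sigma_u * h[t] ** 0.5) for t in range(n_periods)]
        data.append((y, x, h))
    return data


def fe_wls(data, k=1):
    """Fixed effects weighted least squares for one regressor, following (10.82)."""
    y_tilde, x_tilde = [], []
    for y, x, h in data:
        w = 1.0 / sum(1.0 / ht for ht in h)                  # w_i = 1 / sum_t (1/h_it)
        y_bar = w * sum(yt / ht for yt, ht in zip(y, h))     # weighted time averages
        x_bar = w * sum(xt / ht for xt, ht in zip(x, h))
        for yt, xt, ht in zip(y, x, h):
            y_tilde.append((yt - y_bar) / ht ** 0.5)         # demean, then weight by 1/sqrt(h_it)
            x_tilde.append((xt - x_bar) / ht ** 0.5)
    sxx = sum(v * v for v in x_tilde)
    b = sum(u * v for u, v in zip(y_tilde, x_tilde)) / sxx
    ssr = sum((u - b * v) ** 2 for u, v in zip(y_tilde, x_tilde))
    n, t = len(data), len(data[0][0])
    sigma2 = ssr / (n * (t - 1) - k)                        # SSR / [N(T - 1) - K]
    return b, sigma2, (sigma2 / sxx) ** 0.5


beta_hat, sigma2_hat, se_hat = fe_wls(simulate())
print(round(beta_hat, 3), round(sigma2_hat, 3), round(se_hat, 4))
```

With $N = 3000$ and $T = 4$ the estimate should be close to $\beta = 2$ and $\hat\sigma_u^2$ close to $\sigma_u^2 = 2.25$. Dividing SSR by $NT$ instead of $N(T-1)$ would understate $\sigma_u^2$ by roughly the factor $(T-1)/T$, because $\sum_t E(\tilde{u}_{it}^2) = \sigma_u^2(T-1)$.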
CHAPTER 11
11.1. a. It is important to remember that, any time we put a variable in a
regression model (whether we are using cross section or panel data), we are
controlling for the effects of that variable on the dependent variable.
The
whole point of regression analysis is that it allows the explanatory variables
to be correlated while estimating ceteris paribus effects.
Thus, the
inclusion of $y_{i,t-1}$ in the equation allows $prog_{it}$ to be correlated with
$y_{i,t-1}$, and also recognizes that, due to inertia, $y_{it}$ is often strongly
related to $y_{i,t-1}$.
An assumption that implies pooled OLS is consistent is
$$E(u_{it}|z_i, x_{it}, y_{i,t-1}, prog_{it}) = 0, \quad \text{all } t,$$
which is implied by but is weaker than dynamic completeness.
Without
additional assumptions, the pooled OLS standard errors and test statistics
need to be adjusted for heteroskedasticity and serial correlation (although
the latter will not be present under dynamic completeness).
b. As we discussed in Section 7.8.2, this statement is incorrect.
Provided our interest is in $E(y_{it}|z_i, x_{it}, y_{i,t-1}, prog_{it})$, we do not care about
serial correlation in the implied errors, nor does serial correlation cause
inconsistency in the OLS estimators.
c. Such a model is the standard unobserved effects model:
$$y_{it} = x_{it}\beta + \delta_1 prog_{it} + c_i + u_{it}, \quad t = 1,2,\ldots,T.$$
We would probably assume that $(x_{it}, prog_{it})$ is strictly exogenous; the weakest
form of strict exogeneity is that $(x_{it}, prog_{it})$ is uncorrelated with $u_{is}$ for
all $t$ and $s$.
Then we could estimate the equation by fixed effects or first
differencing.
If the $u_{it}$ are serially uncorrelated, FE is preferred.
We
could also do a GLS analysis after the fixed effects or first-differencing
transformations, but we should have a large $N$.
d. A model that incorporates features from parts a and c is
$$y_{it} = x_{it}\beta + \delta_1 prog_{it} + \rho_1 y_{i,t-1} + c_i + u_{it}, \quad t = 1,\ldots,T.$$
Now, program participation can depend on unobserved city heterogeneity as well
as on lagged $y_{it}$ (we assume that $y_{i0}$ is observed). Fixed effects and first
differencing are both inconsistent as $N \to \infty$ with fixed $T$.
Assuming that $E(u_{it}|x_i, prog_i, y_{i,t-1}, y_{i,t-2}, \ldots, y_{i0}) = 0$, a consistent
procedure is obtained by first differencing, to get
$$\Delta y_{it} = \Delta x_{it}\beta + \delta_1\Delta prog_{it} + \rho_1\Delta y_{i,t-1} + \Delta u_{it}, \quad t = 2,\ldots,T.$$
At time $t$, $\Delta x_{it}$ and $\Delta prog_{it}$ can be used as their own instruments, along with
$y_{i,t-j}$ for $j \geq 2$.
Either pooled 2SLS or a GMM procedure can be used.
Under
strict exogeneity, past and future values of $x_{it}$ can also be used as
instruments.
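A small simulation can illustrate part d. This is a sketch under simplifying assumptions that are not in the problem: there are no $x_{it}$ or $prog_{it}$, so the model is $y_{it} = \rho_1 y_{i,t-1} + c_i + u_{it}$ with $\rho_1 = 0.5$ and standard normal $c_i$ and $u_{it}$. After first differencing, pooled OLS of $\Delta y_{it}$ on $\Delta y_{i,t-1}$ is inconsistent, while using the single instrument $y_{i,t-2}$ (the $j = 2$ case) gives a simple IV estimator, which is a special case of the pooled 2SLS procedure described above. The function names are hypothetical.

```python
import random


def simulate_dynamic_panel(n_units, n_periods, rho, seed=0, burn_in=50):
    """Simulate y_it = rho*y_{i,t-1} + c_i + u_it and return y_i0, ..., y_iT for each unit."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        c = rng.gauss(0.0, 1.0)            # unobserved effect c_i
        y = 0.0
        series = []
        for s in range(burn_in + n_periods + 1):
            y = rho * y + c + rng.gauss(0.0, 1.0)
            if s >= burn_in:               # drop the burn-in so y_i0 is roughly stationary
                series.append(y)
        panel.append(series)
    return panel


def fd_pooled_ols(panel):
    """Pooled OLS of dy_it on dy_{i,t-1}: inconsistent because dy_{i,t-1} is correlated with du_it."""
    num = den = 0.0
    for y in panel:
        for t in range(2, len(y)):
            dy, dy_lag = y[t] - y[t - 1], y[t - 1] - y[t - 2]
            num += dy_lag * dy
            den += dy_lag * dy_lag
    return num / den


def fd_iv(panel):
    """Simple IV for dy_{i,t-1} using the level y_{i,t-2}, pooled over i and t."""
    num = den = 0.0
    for y in panel:
        for t in range(2, len(y)):
            dy, dy_lag, z = y[t] - y[t - 1], y[t - 1] - y[t - 2], y[t - 2]
            num += z * dy
            den += z * dy_lag
    return num / den


panel = simulate_dynamic_panel(n_units=5000, n_periods=6, rho=0.5, seed=1)
rho_ols = fd_pooled_ols(panel)
rho_iv = fd_iv(panel)
print(round(rho_ols, 3), round(rho_iv, 3))
```

In this design the OLS estimator on the differenced equation converges to $\rho_1 - (1 + \rho_1)/2 = -0.25$, because $\mathrm{Cov}(\Delta y_{i,t-1}, \Delta u_{it}) = -\sigma_u^2$, whereas the IV estimate should be close to $0.5$.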
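11.3. Writing $y_{it} = \beta x_{it} + c_i + u_{it} - \beta r_{it}$, the fixed effects estimator $\hat\beta_{FE}$
can be written as
$$\hat\beta_{FE} = \beta + \Big[N^{-1}\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar{x}_i)^2\Big]^{-1}\Big[N^{-1}\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar{x}_i)\{u_{it} - \bar{u}_i - \beta(r_{it} - \bar{r}_i)\}\Big].$$
Now, $x_{it} - \bar{x}_i = (x_{it}^* - \bar{x}_i^*) + (r_{it} - \bar{r}_i)$. Then, because $E(r_{it}|x_i^*, c_i) = 0$ for
all $t$, $(x_{it}^* - \bar{x}_i^*)$ and $(r_{it} - \bar{r}_i)$ are uncorrelated, and so
$$\mathrm{Var}(x_{it} - \bar{x}_i) = \mathrm{Var}(x_{it}^* - \bar{x}_i^*) + \mathrm{Var}(r_{it} - \bar{r}_i), \quad \text{all } t.$$
Similarly, under (11.30), $(x_{it} - \bar{x}_i)$ and $(u_{it} - \bar{u}_i)$ are uncorrelated for all $t$.
Now $E[(x_{it} - \bar{x}_i)(r_{it} - \bar{r}_i)] = E[\{(x_{it}^* - \bar{x}_i^*) + (r_{it} - \bar{r}_i)\}(r_{it} - \bar{r}_i)] = \mathrm{Var}(r_{it} - \bar{r}_i)$.
By the law of large numbers and the assumption of constant
variances across $t$,
$$N^{-1}\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar{x}_i)^2 \xrightarrow{p} \sum_{t=1}^T \mathrm{Var}(x_{it} - \bar{x}_i) = T[\mathrm{Var}(x_{it}^* - \bar{x}_i^*) + \mathrm{Var}(r_{it} - \bar{r}_i)]$$
and
$$N^{-1}\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar{x}_i)\{u_{it} - \bar{u}_i - \beta(r_{it} - \bar{r}_i)\} \xrightarrow{p} -T\beta\,\mathrm{Var}(r_{it} - \bar{r}_i).$$
Therefore,
$$\mathrm{plim}\,\hat\beta_{FE} = \beta - \beta\,\frac{\mathrm{Var}(r_{it} - \bar{r}_i)}{\mathrm{Var}(x_{it}^* - \bar{x}_i^*) + \mathrm{Var}(r_{it} - \bar{r}_i)} = \beta\Big[1 - \frac{\mathrm{Var}(r_{it} - \bar{r}_i)}{\mathrm{Var}(x_{it}^* - \bar{x}_i^*) + \mathrm{Var}(r_{it} - \bar{r}_i)}\Big].$$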
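A quick Monte Carlo check of this attenuation formula, written as a plain-Python sketch. The design is an assumption made for illustration: $x_{it}^* = a_i + e_{it}$ with $a_i$ correlated with $c_i$, independent normal $e_{it}$ and $r_{it}$ (classical measurement error), and $\beta = 1$. With i.i.d. $e_{it}$ and $r_{it}$, both time-demeaned variances carry the factor $(1 - 1/T)$, so the plim reduces to $\beta\sigma_e^2/(\sigma_e^2 + \sigma_r^2)$, which is $0.5$ when $\sigma_e = \sigma_r = 1$. The function names are hypothetical.

```python
import random


def fe_slope_with_measurement_error(n_units, n_periods, beta, sd_e, sd_r, seed=0):
    """Within (FE) slope when x_it = x*_it + r_it is observed instead of x*_it."""
    rng = random.Random(seed)
    sxx = sxy = 0.0
    for _ in range(n_units):
        c = rng.gauss(0.0, 1.0)
        a = c + rng.gauss(0.0, 1.0)                   # time-constant part of x*, correlated with c_i
        xs, ys = [], []
        for _ in range(n_periods):
            x_star = a + rng.gauss(0.0, sd_e)
            ys.append(beta * x_star + c + rng.gauss(0.0, 1.0))
            xs.append(x_star + rng.gauss(0.0, sd_r))  # mismeasured regressor
        xbar, ybar = sum(xs) / n_periods, sum(ys) / n_periods
        for x, y in zip(xs, ys):                      # time-demean within each unit
            sxx += (x - xbar) ** 2
            sxy += (x - xbar) * (y - ybar)
    return sxy / sxx


def fe_plim(beta, sd_e, sd_r):
    """beta * [1 - Var(r - rbar) / (Var(x* - x*bar) + Var(r - rbar))]; the (1 - 1/T) factors cancel."""
    return beta * (1.0 - sd_r ** 2 / (sd_e ** 2 + sd_r ** 2))


b_fe = fe_slope_with_measurement_error(5000, 4, beta=1.0, sd_e=1.0, sd_r=1.0, seed=2)
print(round(b_fe, 3), fe_plim(1.0, 1.0, 1.0))
```

Note that differencing out $c_i$ does nothing to remove the bias: only the within variation of $x_{it}^*$ relative to that of $r_{it}$ matters.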
11.5. a. $E(v_i|z_i,x_i) = Z_i[E(a_i|z_i,x_i) - \alpha] + E(u_i|z_i,x_i) = Z_i(\alpha - \alpha) + 0 = 0$. Next,
$$\mathrm{Var}(v_i|z_i,x_i) = Z_i\mathrm{Var}(a_i|z_i,x_i)Z_i' + \mathrm{Var}(u_i|z_i,x_i) + Z_i\mathrm{Cov}(a_i,u_i|z_i,x_i) + \mathrm{Cov}(u_i,a_i|z_i,x_i)Z_i' = Z_i\mathrm{Var}(a_i|z_i,x_i)Z_i' + \mathrm{Var}(u_i|z_i,x_i)$$
because $a_i$ and $u_i$ are uncorrelated, conditional on $(z_i,x_i)$, by FE.1' and the usual iterated expectations argument. Therefore, $\mathrm{Var}(v_i|z_i,x_i) = Z_i\Lambda Z_i' + \sigma_u^2 I_T$ under the assumptions given, which shows that the conditional variance depends on $z_i$. Unlike in the standard random effects model, there is conditional heteroskedasticity.

b. If we use the usual RE analysis, we are applying FGLS to the equation $y_i = Z_i\alpha + X_i\beta + v_i$, where $v_i = Z_i(a_i - \alpha) + u_i$. From part a, we know that $E(v_i|x_i,z_i) = 0$, and so the usual RE estimator is consistent (as $N \to \infty$ for fixed $T$) and $\sqrt{N}$-asymptotically normal, provided the rank condition, Assumption RE.2, holds. (Remember, a feasible GLS analysis with any $\hat\Omega$ will be consistent provided $\hat\Omega$ converges in probability to a nonsingular matrix as $N \to \infty$. It need not be the case that $\mathrm{Var}(v_i|x_i,z_i) = \mathrm{plim}(\hat\Omega)$, or even that $\mathrm{Var}(v_i) = \mathrm{plim}(\hat\Omega)$.)
From part a, we know that $\mathrm{Var}(v_i|x_i,z_i)$ depends on $z_i$ unless we restrict almost all elements of $\Lambda$ to be zero (all but those corresponding to the constant in $z_{it}$). Therefore, the usual random effects inference, that is, inference based on the usual RE variance matrix estimator, will be invalid.

c. We can easily make the RE analysis fully robust to an arbitrary $\mathrm{Var}(v_i|x_i,z_i)$, as in equation (7.49). Naturally, we expand the set of explanatory variables to $(z_{it}, x_{it})$, and we estimate $\alpha$ along with $\beta$.

11.7. When $\lambda_t = \lambda/T$ for all $t$, we can rearrange (11.60) to get
$$y_{it} = x_{it}\beta + \bar{x}_i\lambda + v_{it}, \quad t = 1,\ldots,T.$$
Let $\hat\beta$ (along with $\hat\lambda$) denote the pooled OLS estimator from this equation. By standard results on partitioned regression [for example, Davidson and MacKinnon (1993, Section 1.4)], $\hat\beta$ can be obtained by the following two-step procedure:
(i) Regress $x_{it}$ on $\bar{x}_i$ across all $t$ and $i$, and save the $1 \times K$ vectors of residuals, say $\hat{r}_{it}$, $t = 1,\ldots,T$, $i = 1,\ldots,N$.
(ii) Regress $y_{it}$ on $\hat{r}_{it}$ across all $t$ and $i$. The OLS vector on $\hat{r}_{it}$ is $\hat\beta$.
We want to show that $\hat\beta$ is the FE estimator. Given that the FE estimator can be obtained by pooled OLS of $y_{it}$ on $(x_{it} - \bar{x}_i)$, it suffices to show that $\hat{r}_{it} = x_{it} - \bar{x}_i$ for all $t$ and $i$. But
$$\hat{r}_{it} = x_{it} - \bar{x}_i\Big(\sum_{i=1}^N\sum_{t=1}^T \bar{x}_i'\bar{x}_i\Big)^{-1}\Big(\sum_{i=1}^N\sum_{t=1}^T \bar{x}_i'x_{it}\Big)$$
and $\sum_{i=1}^N\sum_{t=1}^T \bar{x}_i'x_{it} = \sum_{i=1}^N \bar{x}_i'\sum_{t=1}^T x_{it} = \sum_{i=1}^N T\bar{x}_i'\bar{x}_i = \sum_{i=1}^N\sum_{t=1}^T \bar{x}_i'\bar{x}_i$, and so $\hat{r}_{it} = x_{it} - \bar{x}_iI_K = x_{it} - \bar{x}_i$. This completes the proof.

11.9. a. We can apply the results on GMM estimation in Chapter 8. In particular, in equation (8.25), take $C = E(\ddot{Z}_i'\ddot{X}_i)$, $W = [E(\ddot{Z}_i'\ddot{Z}_i)]^{-1}$, and $\Lambda = E(\ddot{Z}_i'\ddot{u}_i\ddot{u}_i'\ddot{Z}_i)$. A key point is that $\ddot{Z}_i'\ddot{u}_i = (Q_TZ_i)'(Q_Tu_i) = Z_i'Q_Tu_i = \ddot{Z}_i'u_i$, where $Q_T$ is the $T \times T$ time-demeaning matrix defined in Chapter 10. Under (11.80), $E(u_iu_i'|Z_i) = \sigma_u^2I_T$ (by the usual iterated expectations argument), and so $\Lambda = E(\ddot{Z}_i'u_iu_i'\ddot{Z}_i) = \sigma_u^2E(\ddot{Z}_i'\ddot{Z}_i)$. If we plug these choices of $C$, $W$, and $\Lambda$ into (8.25) and simplify, we obtain
$$\mathrm{Avar}\,\sqrt{N}(\hat\beta - \beta) = \sigma_u^2\{E(\ddot{X}_i'\ddot{Z}_i)[E(\ddot{Z}_i'\ddot{Z}_i)]^{-1}E(\ddot{Z}_i'\ddot{X}_i)\}^{-1}.$$

b. The rank condition is $\mathrm{rank}\,\sum_{t=1}^T E(\ddot{z}_{it}'\ddot{x}_{it}) = K$, as we are applying pooled 2SLS to the time-demeaned equation. This clearly fails if $x_{it}$ contains any time-constant explanatory variables (across all $i$, as usual). The condition $\mathrm{rank}\,\sum_{t=1}^T E(\ddot{z}_{it}'\ddot{z}_{it}) = L$ is also needed, and this rules out time-constant instruments. But if the rank condition holds, we can always redefine $z_{it}$ so that $\sum_{t=1}^T E(\ddot{z}_{it}'\ddot{z}_{it})$ has full rank.

c. The argument is very similar to the case of the fixed effects estimator. First, $\sum_{t=1}^T E(\ddot{u}_{it}^2) = (T - 1)\sigma_u^2$, just as before. If $\hat{u}_{it} = \ddot{y}_{it} - \ddot{x}_{it}\hat\beta$ are the pooled 2SLS residuals applied to the time-demeaned data, then $[N(T - 1)]^{-1}\sum_{i=1}^N\sum_{t=1}^T \hat{u}_{it}^2$ is a consistent estimator of $\sigma_u^2$. Typically, $N(T - 1)$ would be replaced by $N(T - 1) - K$ as a degrees of freedom adjustment.

d. From Problem 5.1 (which is purely algebraic, and so applies immediately to pooled 2SLS), the 2SLS estimator of all parameters in (11.81), including $\beta$, can be obtained as follows: first run the regression $x_{it}$ on $d1_i, \ldots, dN_i, z_{it}$ across all $t$ and $i$, and obtain the residuals, say $\hat{r}_{it}$; second, obtain $\hat{c}_1, \ldots, \hat{c}_N, \hat\beta$ from the pooled regression $y_{it}$ on $d1_i, \ldots, dN_i, x_{it}, \hat{r}_{it}$. Now, by algebra of partial regression, $\hat\beta$ and the coefficient on $\hat{r}_{it}$, say $\hat\delta$, from this last regression can be obtained by first partialling out the dummy variables, $d1_i, \ldots, dN_i$. As we know from Chapter 10, this partialling out is equivalent to time demeaning all variables. Therefore, $\hat\beta$ and $\hat\delta$ can be obtained from the pooled regression $\ddot{y}_{it}$ on $\ddot{x}_{it}, \hat{r}_{it}$, where we use the fact that the time average of $\hat{r}_{it}$ for each $i$ is identically zero.
Now consider the 2SLS estimator of $\beta$ from (11.79). This is equivalent to first regressing $\ddot{x}_{it}$ on $\ddot{z}_{it}$ and saving the residuals, say $\hat{s}_{it}$, and then running the OLS regression $\ddot{y}_{it}$ on $\ddot{x}_{it}, \hat{s}_{it}$. But, again by partial regression and the fact that regressing on $d1_i, \ldots, dN_i$ results in time demeaning, $\hat{s}_{it} = \hat{r}_{it}$ for all $i$ and $t$. This proves that the 2SLS estimates of $\beta$ from (11.79) and (11.81) are identical. (If some elements of $x_{it}$ are included in $z_{it}$, as would usually be the case, some entries in $\hat{r}_{it}$ are identically zero for all $t$ and $i$. But we can simply drop those without changing any other steps in the argument.)

e. First, by writing down the first order condition for the 2SLS estimates from (11.81) (with the $dn_i$ as their own instruments, and $\hat{x}_{it}$ as the IVs for $x_{it}$), it is easy to show that $\hat{c}_i = \bar{y}_i - \bar{x}_i\hat\beta$, where $\hat\beta$ is the IV estimator from (11.81) (and also (11.79)). Therefore, the 2SLS residuals from (11.81) are computed as
$$y_{it} - \hat{c}_i - x_{it}\hat\beta = (y_{it} - \bar{y}_i) - (x_{it} - \bar{x}_i)\hat\beta = \ddot{y}_{it} - \ddot{x}_{it}\hat\beta,$$
which are exactly the 2SLS residuals from (11.79). Because the $N$ dummy variables are explicitly included in (11.81), the degrees of freedom in estimating $\sigma_u^2$ from part c are properly calculated.

f. The general, messy estimator in equation (8.27) should be used, where $X$ and $Z$ are replaced with $\ddot{X}$ and $\ddot{Z}$, $\hat{W} = (\ddot{Z}'\ddot{Z}/N)^{-1}$, $\hat{u}_i = \ddot{y}_i - \ddot{X}_i\hat\beta$, and
$$\hat\Lambda = N^{-1}\sum_{i=1}^N \ddot{Z}_i'\hat{u}_i\hat{u}_i'\ddot{Z}_i.$$

g. The 2SLS procedure is inconsistent as $N \to \infty$ with fixed $T$, as is any IV method that uses time-demeaning to eliminate the unobserved effect. This is because the time-demeaned IVs will generally be correlated with some elements of $u_i$ (usually, all elements).

11.11. Differencing twice and using the resulting cross section is easily done in Stata. Alternatively, I can use fixed effects on the first differences:

. gen ccgrnt = cgrant - cgrant[_n-1] if d89
(314 missing values generated)

. gen ccgrnt_1 = cgrant_1 - cgrant_1[_n-1] if d89
(314 missing values generated)

. gen cclscrap = clscrap - clscrap[_n-1] if d89
(417 missing values generated)

. reg cclscrap ccgrnt ccgrnt_1

  Source |       SS       df       MS                  Number of obs =      54
---------+------------------------------               F(  2,    51) =    0.97
   Model |  .958448372     2  .479224186               Prob > F      =  0.3868
Residual |  25.2535328    51   .49516731               R-squared     =  0.0366
---------+------------------------------               Adj R-squared = -0.0012
   Total |  26.2119812    53  .494565682               Root MSE      =  .70368

------------------------------------------------------------------------------
cclscrap |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  ccgrnt |   .1564748   .2632934      0.594   0.555      -.3721087    .6850584
ccgrnt_1 |   .6099015   .6343411      0.961   0.341      -.6635913    1.883394
   _cons |  -.2377384   .1407363     -1.689   0.097      -.5202783    .0448014
------------------------------------------------------------------------------

. xtreg clscrap d89 cgrant cgrant_1, fe

Fixed-effects (within) regression                     Number of obs =      108
                                                                  n =       54
                                                                  T =        2
R-sq:  within  = 0.0577
       between = 0.0476
       overall = 0.0050
                                                      F(  3,    51) =     1.04
sd(u_fcode)             =   .509567                   Prob > F      =   0.3826
sd(e_fcode_t)           =  .4975778
sd(e_fcode_t + u_fcode) =  .7122094

corr(u_fcode, Xb)       =  0.4011

------------------------------------------------------------------------------
 clscrap |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     d89 |  -.2377385   .1407362     -1.689   0.097      -.5202783    .0448014
  cgrant |   .1564748   .2632934      0.594   0.555      -.3721087    .6850584
cgrant_1 |   .6099016   .6343411      0.961   0.341      -.6635913    1.883394
   _cons |  -.2240491    .114748     -1.953   0.056      -.4544153    .0063171
---------+--------------------------------------------------------------------
   fcode |   F(53,51) =     1.674     0.033           (54 categories)

The estimates from the random growth model are pretty bad: the estimates on the grant variables are of the wrong sign, and they are very imprecise. The joint F test for the 53 different intercepts is significant at the 5% level, so it is hard to know what to make of this. It does cast doubt on the standard unobserved effects model without a random growth term.

11.13. To be added.

11.15. To be added.
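To see why the double-differenced regression in 11.11 removes a unit-specific trend while plain first differencing does not, here is a plain-Python simulation sketch. It does not use the scrap-rate data; the design is an assumption for illustration: three periods, $y_{it} = c_i + g_it + \beta x_{it} + u_{it}$ with $\beta = -0.5$, and a regressor that trends with the unit's growth rate, $x_{it} = g_it + e_{it}$, with all draws standard normal. Pooled OLS on first differences still has $g_i$ in the error, while the double difference eliminates it. The function names are hypothetical.

```python
import random


def simulate_random_growth(n_units=4000, beta=-0.5, seed=3):
    """Hypothetical 3-period panel: y_it = c_i + g_i*t + beta*x_it + u_it, with x_it = g_i*t + e_it."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        c = rng.gauss(0.0, 1.0)
        g = rng.gauss(0.0, 1.0)                             # unit-specific growth rate
        xs = [g * t + rng.gauss(0.0, 1.0) for t in (1, 2, 3)]
        ys = [c + g * t + beta * x + rng.gauss(0.0, 1.0) for t, x in zip((1, 2, 3), xs)]
        panel.append((ys, xs))
    return panel


def ols_slope(xs, ys):
    """Slope from an OLS regression of ys on a constant and xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


panel = simulate_random_growth()

# First differences remove c_i but leave g_i in the error, and g_i is correlated with the differenced x.
fd_x = [xs[t] - xs[t - 1] for ys, xs in panel for t in (1, 2)]
fd_y = [ys[t] - ys[t - 1] for ys, xs in panel for t in (1, 2)]
b_fd = ols_slope(fd_x, fd_y)

# Differencing the first differences also removes g_i (one observation per unit with T = 3).
dd_x = [(xs[2] - xs[1]) - (xs[1] - xs[0]) for ys, xs in panel]
dd_y = [(ys[2] - ys[1]) - (ys[1] - ys[0]) for ys, xs in panel]
b_dd = ols_slope(dd_x, dd_y)
print(round(b_fd, 3), round(b_dd, 3))
```

Under these assumptions the first-difference slope converges to $\beta + \mathrm{Var}(g_i)/[\mathrm{Var}(g_i) + 2] = \beta + 1/3$, while the double-difference slope is consistent for $\beta$.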
11.17. To obtain (11.55), we use (11.54) and the representation
$$\sqrt{N}(\hat\beta_{FE} - \beta) = A^{-1}\Big(N^{-1/2}\sum_{i=1}^N \ddot{X}_i'u_i\Big) + o_p(1).$$
Simple algebra and standard properties of $O_p(1)$ and $o_p(1)$ give
$$\sqrt{N}(\hat\alpha - \alpha) = N^{-1/2}\sum_{i=1}^N [(Z_i'Z_i)^{-1}Z_i'(y_i - X_i\beta) - \alpha] - \Big[N^{-1}\sum_{i=1}^N (Z_i'Z_i)^{-1}Z_i'X_i\Big]\sqrt{N}(\hat\beta_{FE} - \beta)$$
$$= N^{-1/2}\sum_{i=1}^N (s_i - \alpha) - CA^{-1}N^{-1/2}\sum_{i=1}^N \ddot{X}_i'u_i + o_p(1),$$
where $C \equiv E[(Z_i'Z_i)^{-1}Z_i'X_i]$ and $s_i \equiv (Z_i'Z_i)^{-1}Z_i'(y_i - X_i\beta)$. By definition, $E(s_i) = \alpha$. By combining terms in the sum we have
$$\sqrt{N}(\hat\alpha - \alpha) = N^{-1/2}\sum_{i=1}^N [(s_i - \alpha) - CA^{-1}\ddot{X}_i'u_i] + o_p(1),$$
which implies by the central limit theorem and the asymptotic equivalence lemma that $\sqrt{N}(\hat\alpha - \alpha)$ is asymptotically normal with zero mean and variance $E(r_ir_i')$, where $r_i \equiv (s_i - \alpha) - CA^{-1}\ddot{X}_i'u_i$. If we replace $\alpha$, $C$, $A$, and $\beta$ with their consistent estimators, we get exactly (11.55), since the $\hat{u}_i$ are the FE residuals.
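CHAPTER 12

12.1. Take the conditional expectation of equation (12.4) with respect to $x$, and use $E(u|x) = 0$:
$$E\{[y - m(x,\theta)]^2|x\} = E(u^2|x) + 2[m(x,\theta_o) - m(x,\theta)]E(u|x) + E\{[m(x,\theta_o) - m(x,\theta)]^2|x\}$$
$$= E(u^2|x) + 0 + [m(x,\theta_o) - m(x,\theta)]^2 = E(u^2|x) + [m(x,\theta_o) - m(x,\theta)]^2.$$
Now, the first term does not depend on $\theta$, and the second term is clearly minimized at $\theta = \theta_o$ (although not uniquely, in general).

12.3. a. The approximate elasticity is
$$\partial\log[\hat{E}(y|z)]/\partial\log(z_1) = \partial[\hat\theta_1 + \hat\theta_2\log(z_1) + \hat\theta_3z_2]/\partial\log(z_1) = \hat\theta_2.$$
b. This is approximated by $100\cdot\partial\log[\hat{E}(y|z)]/\partial z_2 = 100\cdot\hat\theta_3$.
c. We need the gradient of the mean function evaluated under the null hypothesis. Writing the mean as $m(x,\theta) = \exp(x_1\theta_1 + x_2\theta_2)$, we have $\nabla_\theta m(x,\theta) = \exp(x_1\theta_1 + x_2\theta_2)x$, and so $\nabla_\theta\tilde{m}_i = \exp(x_{i1}\tilde\theta_1)x_i \equiv \tilde{m}_ix_i$, where $\tilde\theta_1$ is the restricted NLS estimator. From (12.72), we can compute the usual LM statistic as $NR_u^2$ from the regression $\tilde{u}_i$ on $\tilde{m}_ix_{i1}$, $\tilde{m}_ix_{i2}$, $i = 1,\ldots,N$, where $\tilde{u}_i = y_i - \tilde{m}_i$. For the robust test, we first regress $\tilde{m}_ix_{i2}$ on $\tilde{m}_ix_{i1}$ and obtain the $1 \times K_2$ residuals, $\tilde{r}_i$. Then we compute the statistic as in regression (12.75).
d. Since $\partial\hat{E}(y|z)/\partial z_2 = \exp[\hat\theta_1 + \hat\theta_2\log(z_1) + \hat\theta_3z_2 + \hat\theta_4z_2^2]\cdot(\hat\theta_3 + 2\hat\theta_4z_2)$, the turning point is $z_2^* = -\hat\theta_3/(2\hat\theta_4)$.

12.5. We need the gradient of $m(x_i,\theta)$ evaluated under the null hypothesis. By the chain rule,
$$\nabla_\beta m(x,\theta) = g[x\beta + \delta_1(x\beta)^2 + \delta_2(x\beta)^3]\cdot[x + 2\delta_1(x\beta)x + 3\delta_2(x\beta)^2x],$$
$$\nabla_\delta m(x,\theta) = g[x\beta + \delta_1(x\beta)^2 + \delta_2(x\beta)^3]\cdot[(x\beta)^2, (x\beta)^3].$$
Let $\tilde\beta$ denote the NLS estimator with $\delta_1 = \delta_2 = 0$ imposed. Then $\nabla_\beta m(x_i,\tilde\theta) = g(x_i\tilde\beta)x_i$ and $\nabla_\delta m(x_i,\tilde\theta) = g(x_i\tilde\beta)[(x_i\tilde\beta)^2, (x_i\tilde\beta)^3]$. Therefore, the usual LM statistic can be obtained as $NR_u^2$ from the regression $\tilde{u}_i$ on $\tilde{g}_ix_i$, $\tilde{g}_i(x_i\tilde\beta)^2$, $\tilde{g}_i(x_i\tilde\beta)^3$, where $\tilde{g}_i \equiv g(x_i\tilde\beta)$. If $G(\cdot)$ is the identity function, $g(\cdot) \equiv 1$, and we get RESET.

12.7. a. For each $i$ and $g$, define $u_{ig} \equiv y_{ig} - m(x_{ig},\theta_{og})$, so that $E(u_{ig}|x_i) = 0$, $g = 1,\ldots,G$. Further, let $u_i$ be the $G \times 1$ vector containing the $u_{ig}$. Then $E(u_iu_i'|x_i) = E(u_iu_i') = \Omega_o$. Let $\hat{u}_i$ be the vector of nonlinear least squares residuals. That is, do NLS for each $g$, and collect the residuals. Then, by standard arguments, a consistent estimator of $\Omega_o$ is
$$\hat\Omega \equiv N^{-1}\sum_{i=1}^N \hat{u}_i\hat{u}_i'$$
because each NLS estimator, $\hat\theta_g$, is consistent for $\theta_{og}$ as $N \to \infty$.

b. This part involves several steps, and I will sketch how each one goes. First, let $\gamma$ be the vector of distinct elements of $\Omega$, the nuisance parameters in the context of two-step M-estimation. Then, the score for observation $i$ is
$$s(w_i,\theta,\gamma) = -\nabla_\theta m(x_i,\theta)'\Omega^{-1}u_i(\theta),$$
where, hopefully, the notation is clear. With this definition, we can verify condition (12.37), even though the actual derivatives are complicated. Each element of $s(w_i,\theta,\gamma)$ is a linear combination of $u_i(\theta)$. So $\nabla_\gamma s_j(w_i,\theta_o,\gamma)$ is a linear combination of $u_i(\theta_o) \equiv u_i$, where the linear combination is a function of $(x_i,\theta_o,\gamma)$. Since $E(u_i|x_i) = 0$, $E[\nabla_\gamma s_j(w_i,\theta_o,\gamma)|x_i] = 0$, and so its unconditional expectation is zero, too. This shows that we do not have to adjust for the first-stage estimation of $\Omega_o$. Alternatively, one can verify the hint directly, which has the same consequence.
Next, we derive $B_o \equiv E[s_i(\theta_o,\gamma_o)s_i(\theta_o,\gamma_o)']$:
$$E[s_i(\theta_o,\gamma_o)s_i(\theta_o,\gamma_o)'] = E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}u_iu_i'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)]$$
$$= E\{E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}u_iu_i'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)|x_i]\} = E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}E(u_iu_i'|x_i)\Omega_o^{-1}\nabla_\theta m_i(\theta_o)]$$
$$= E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\Omega_o\Omega_o^{-1}\nabla_\theta m_i(\theta_o)] = E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)].$$
Next, we have to derive $A_o \equiv E[H_i(\theta_o,\gamma_o)]$, and show that $B_o = A_o$. The Hessian itself is complicated, but its expected value is not. The Jacobian of $s_i(\theta,\gamma)$ with respect to $\theta$ can be written
$$H_i(\theta,\gamma) = \nabla_\theta m(x_i,\theta)'\Omega^{-1}\nabla_\theta m(x_i,\theta) + [I_P \otimes u_i(\theta)']F(x_i,\theta,\gamma),$$
where $F(x_i,\theta,\gamma)$ is a $GP \times P$ matrix, where $P$ is the total number of parameters, that involves Jacobians of the rows of $\Omega^{-1}\nabla_\theta m_i(\theta)$ with respect to $\theta$. The key is that $F(x_i,\theta,\gamma)$ depends on $x_i$, not on $y_i$. So,
$$E[H_i(\theta_o,\gamma_o)|x_i] = \nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o) + [I_P \otimes E(u_i|x_i)']F(x_i,\theta_o,\gamma_o) = \nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o).$$
Now iterated expectations gives $A_o = E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)]$. So, we have verified (12.37) and that $A_o = B_o$. Therefore, from Theorem 12.3,
$$\mathrm{Avar}\,\sqrt{N}(\hat\theta - \theta_o) = A_o^{-1} = \{E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)]\}^{-1}.$$

c. As usual, we replace expectations with sample averages and unknown parameters with their estimates, and divide the result by $N$ to get $\widehat{\mathrm{Avar}}(\hat\theta)$:
$$\widehat{\mathrm{Avar}}(\hat\theta) = \Big[N^{-1}\sum_{i=1}^N \nabla_\theta m_i(\hat\theta)'\hat\Omega^{-1}\nabla_\theta m_i(\hat\theta)\Big]^{-1}\Big/N = \Big[\sum_{i=1}^N \nabla_\theta m_i(\hat\theta)'\hat\Omega^{-1}\nabla_\theta m_i(\hat\theta)\Big]^{-1}.$$
The estimate $\hat\Omega$ can be based on the multivariate NLS residuals or can be updated after the nonlinear SUR estimates have been obtained.

d. First, note that $\nabla_\theta m_i(\theta_o)$ is a block-diagonal matrix with blocks $\nabla_{\theta_g}m_{ig}(\theta_{og})$, a $1 \times P_g$ matrix. (I implicitly assumed that there are no cross-equation restrictions imposed in the nonlinear SUR estimation.) If $\Omega_o$ is diagonal, so is its inverse. Standard matrix multiplication shows that
$$\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o) = \begin{pmatrix} \sigma_{o1}^{-2}\nabla_{\theta_1}m_{i1}^{o\prime}\nabla_{\theta_1}m_{i1}^o & 0 & \cdots & 0\\ 0 & \ddots & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \sigma_{oG}^{-2}\nabla_{\theta_G}m_{iG}^{o\prime}\nabla_{\theta_G}m_{iG}^o \end{pmatrix}.$$
Taking expectations and inverting the result shows that $\mathrm{Avar}\,\sqrt{N}(\hat\theta_g - \theta_{og}) = \sigma_{og}^2[E(\nabla_{\theta_g}m_{ig}^{o\prime}\nabla_{\theta_g}m_{ig}^o)]^{-1}$, $g = 1,\ldots,G$. (Note also that the nonlinear SUR estimators are asymptotically uncorrelated across equations.) These asymptotic variances are easily seen to be the same as those for nonlinear least squares on each equation; see p. 360.

e. I cannot see a nonlinear analogue of Theorem 7.7. The first hint given in Problem 7.5 does not extend readily to nonlinear models, even when the same regressors appear in each equation. The key is that $X_i$ is replaced with $\nabla_\theta m(x_i,\theta_o)$. While this $G \times P$ matrix has a block-diagonal form, as described in part d, the blocks are not the same even when the same regressors appear in each equation. In the linear case, $\nabla_{\theta_g}m_g(x_i,\theta_{og}) = x_i$ for all $g$. But, unless $\theta_{og}$ is the same in all equations (a very restrictive assumption), $\nabla_{\theta_g}m_g(x_i,\theta_{og})$ varies across $g$. For example, if $m_g(x_i,\theta_{og}) = \exp(x_i\theta_{og})$ then $\nabla_{\theta_g}m_g(x_i,\theta_{og}) = \exp(x_i\theta_{og})x_i$, and the gradients differ across $g$.

12.9. a. We cannot say anything in general about $\mathrm{Med}(y|x)$, since $\mathrm{Med}(y|x) = m(x,\beta_o) + \mathrm{Med}(u|x)$, and $\mathrm{Med}(u|x)$ could be a general function of $x$.
b. If $u$ and $x$ are independent, then $E(u|x)$ and $\mathrm{Med}(u|x)$ are both constants, say $\alpha$ and $\delta$. Then $E(y|x) - \mathrm{Med}(y|x) = \alpha - \delta$, which does not depend on $x$.
c. When $u$ and $x$ are independent, the partial effects of $x_j$ on the conditional mean and conditional median are the same, and there is no ambiguity about what is "the effect of $x_j$ on $y$," at least when only the mean and median are in the running. Then, we could interpret large differences between LAD and NLS as perhaps indicating an outlier problem. But it could just be that $u$ and $x$ are not independent.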
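Parts b and c of 12.9 can be illustrated with a small simulation. The sketch below is an illustrative assumption, not part of the problem: the model is linear, $y = 1 + 2x + u$, with a binary regressor and $u$ drawn independently of $x$ from a shifted Exponential(1) distribution, so $E(u) = 0$ and $\mathrm{Med}(u) = \log 2 - 1$. With $x \in \{0, 1\}$ the mean and median regressions are saturated, so the OLS slope is the difference in group means and the LAD slope is the difference in group medians. Both should estimate the same partial effect, and $E(y|x) - \mathrm{Med}(y|x)$ should be close to $1 - \log 2$ in both groups.

```python
import math
import random
import statistics


def mean_and_median_slopes(n_per_group=20000, a=1.0, b=2.0, seed=4):
    """y = a + b*x + u with x in {0, 1} and u ~ Exponential(1) - 1, drawn independently of x."""
    rng = random.Random(seed)
    groups = {x: [a + b * x + rng.expovariate(1.0) - 1.0 for _ in range(n_per_group)] for x in (0, 1)}
    # Saturated models: OLS slope = difference in means, LAD slope = difference in medians.
    mean_slope = statistics.fmean(groups[1]) - statistics.fmean(groups[0])
    median_slope = statistics.median(groups[1]) - statistics.median(groups[0])
    # E(y|x) - Med(y|x) = E(u) - Med(u) should not depend on x.
    gap0 = statistics.fmean(groups[0]) - statistics.median(groups[0])
    gap1 = statistics.fmean(groups[1]) - statistics.median(groups[1])
    return mean_slope, median_slope, gap0, gap1


mean_slope, median_slope, gap0, gap1 = mean_and_median_slopes()
theoretical_gap = 1.0 - math.log(2.0)    # E(u) - Med(u) for u ~ Exponential(1) - 1
print(mean_slope, median_slope, gap0, gap1, theoretical_gap)
```

If $u$ were instead generated with a spread that depends on $x$, the two slopes would generally differ, which is the case part c warns about.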
12.11. a. For consistency of the MNLS estimator, we need, in addition to the regularity conditions, which I will ignore, the identification condition. That is, $\beta_o$ must uniquely minimize $E[q(w_i,\beta)] = E\{[y_i - m(x_i,\beta)]'[y_i - m(x_i,\beta)]\}$. But
$$E\{[y_i - m(x_i,\beta)]'[y_i - m(x_i,\beta)]\} = E(\{u_i + [m(x_i,\beta_o) - m(x_i,\beta)]\}'\{u_i + [m(x_i,\beta_o) - m(x_i,\beta)]\})$$
$$= E(u_i'u_i) + 2E\{[m(x_i,\beta_o) - m(x_i,\beta)]'u_i\} + E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[m(x_i,\beta_o) - m(x_i,\beta)]\}$$
$$= E(u_i'u_i) + E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[m(x_i,\beta_o) - m(x_i,\beta)]\}$$
because $E(u_i|x_i) = 0$. Therefore, the identification assumption is that
$$E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[m(x_i,\beta_o) - m(x_i,\beta)]\} > 0, \quad \beta \neq \beta_o.$$
In a linear model, where $m(x_i,\beta) = X_i\beta$ for $X_i$ a $G \times K$ matrix, the condition is $(\beta_o - \beta)'E(X_i'X_i)(\beta_o - \beta) > 0$, $\beta \neq \beta_o$, and this holds provided $E(X_i'X_i)$ is positive definite.

b. Provided $m(x,\cdot)$ is twice continuously differentiable, there are no problems in applying Theorem 12.3. Generally, $B_o = E[\nabla_\beta m_i(\beta_o)'u_iu_i'\nabla_\beta m_i(\beta_o)]$ and $A_o = E[\nabla_\beta m_i(\beta_o)'\nabla_\beta m_i(\beta_o)]$. These can be consistently estimated in the obvious way after obtaining the MNLS estimators.
We can apply the results on two-step M-estimation. The key is that, under general regularity conditions,
$$N^{-1}\sum_{i=1}^N [y_i - m(x_i,\beta)]'[W_i(\hat\delta)]^{-1}[y_i - m(x_i,\beta)]/2$$
converges uniformly in probability to $E\{[y_i - m(x_i,\beta)]'[W(x_i,\delta_o)]^{-1}[y_i - m(x_i,\beta)]\}/2$, which is just to say that the usual consistency proof can be used provided we verify identification. But we can use an argument very similar to the unweighted case to show
$$E\{[y_i - m(x_i,\beta)]'[W_i(\delta_o)]^{-1}[y_i - m(x_i,\beta)]\} = E\{u_i'[W_i(\delta_o)]^{-1}u_i\} + E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[W_i(\delta_o)]^{-1}[m(x_i,\beta_o) - m(x_i,\beta)]\},$$
where $E(u_i|x_i) = 0$ is used to show the cross-product term, $2E\{[m(x_i,\beta_o) - m(x_i,\beta)]'[W_i(\delta_o)]^{-1}u_i\}$, is zero (by iterated expectations, as always). As before, the first term does not depend on $\beta$ and the second term is minimized at $\beta_o$; we would have to assume it is uniquely minimized.
To get the asymptotic variance, we proceed as in Problem 12.7. First, it can be shown that condition (12.37) holds. In particular, we can write $\nabla_\delta s_i(\beta_o,\delta_o) = (I_P \otimes u_i)'G(x_i,\beta_o,\delta_o)$ for some function $G(x_i,\beta_o,\delta_o)$. It follows easily that $E[\nabla_\delta s_i(\beta_o,\delta_o)|x_i] = 0$, which implies (12.37). This means that, under $E(y_i|x_i) = m(x_i,\beta_o)$, we can ignore preliminary estimation of $\delta_o$ provided we have a $\sqrt{N}$-consistent estimator.
To obtain the asymptotic variance when the conditional variance matrix is correctly specified, that is, when $\mathrm{Var}(y_i|x_i) = \mathrm{Var}(u_i|x_i) = W(x_i,\delta_o)$, we can use an argument very similar to the nonlinear SUR case in Problem 12.7:
$$E[s_i(\beta_o,\delta_o)s_i(\beta_o,\delta_o)'] = E[\nabla_\beta m_i(\beta_o)'[W_i(\delta_o)]^{-1}u_iu_i'[W_i(\delta_o)]^{-1}\nabla_\beta m_i(\beta_o)]$$
$$= E\{E[\nabla_\beta m_i(\beta_o)'[W_i(\delta_o)]^{-1}u_iu_i'[W_i(\delta_o)]^{-1}\nabla_\beta m_i(\beta_o)|x_i]\}$$
$$= E[\nabla_\beta m_i(\beta_o)'[W_i(\delta_o)]^{-1}E(u_iu_i'|x_i)[W_i(\delta_o)]^{-1}\nabla_\beta m_i(\beta_o)] = E\{\nabla_\beta m_i(\beta_o)'[W_i(\delta_o)]^{-1}\nabla_\beta m_i(\beta_o)\}.$$
Now, the Hessian (with respect to $\beta$), evaluated at $(\beta_o,\delta_o)$, can be written as
$$H_i(\beta_o,\delta_o) = \nabla_\beta m(x_i,\beta_o)'[W_i(\delta_o)]^{-1}\nabla_\beta m(x_i,\beta_o) + (I_P \otimes u_i)'F(x_i,\beta_o,\delta_o),$$
for some complicated function $F(x_i,\beta_o,\delta_o)$ that depends only on $x_i$. Taking expectations gives
$$A_o \equiv E[H_i(\beta_o,\delta_o)] = E\{\nabla_\beta m(x_i,\beta_o)'[W_i(\delta_o)]^{-1}\nabla_\beta m(x_i,\beta_o)\} = B_o.$$
Therefore, from the usual results on M-estimation, $\mathrm{Avar}\,\sqrt{N}(\hat\beta - \beta_o) = A_o^{-1}$, and a consistent estimator of $A_o$ is
$$\hat{A} = N^{-1}\sum_{i=1}^N \nabla_\beta m(x_i,\hat\beta)'[W_i(\hat\delta)]^{-1}\nabla_\beta m(x_i,\hat\beta).$$

c. The consistency argument in part b did not use the fact that $W(x,\delta)$ is correctly specified for $\mathrm{Var}(y|x)$. Exactly the same derivation goes through. But, of course, the asymptotic variance is affected because $A_o \neq B_o$, and the expression for $B_o$ no longer holds. The estimator of $A_o$ in part b still works. To consistently estimate $B_o$ we use
$$\hat{B} = N^{-1}\sum_{i=1}^N \nabla_\beta m(x_i,\hat\beta)'[W_i(\hat\delta)]^{-1}\hat{u}_i\hat{u}_i'[W_i(\hat\delta)]^{-1}\nabla_\beta m(x_i,\hat\beta).$$
Now, we estimate $\mathrm{Avar}\,\sqrt{N}(\hat\beta - \beta_o)$ in the usual way: $\hat{A}^{-1}\hat{B}\hat{A}^{-1}$.

CHAPTER 13

13.1. No. We know that $\theta_o$ solves
$$\max_{\theta \in \Theta} E[\log f(y_i|x_i;\theta)],$$
where the expectation is over the joint distribution of $(x_i,y_i)$. Therefore, because $\exp(\cdot)$ is an increasing function, $\theta_o$ also maximizes $\exp\{E[\log f(y_i|x_i;\theta)]\}$ over $\Theta$. The problem is that the expectation and the exponential function cannot be interchanged: $E[f(y_i|x_i;\theta)] \neq \exp\{E[\log f(y_i|x_i;\theta)]\}$. In fact, Jensen's inequality tells us that $E[f(y_i|x_i;\theta)] > \exp\{E[\log f(y_i|x_i;\theta)]\}$.
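For a concrete instance of why maximizing $E[f(y|x;\theta)]$ is not the same as maximizing $E[\log f(y|x;\theta)]$, consider an example that is not in the text: $y$ is Exponential(1), there are no covariates, and the model is the exponential density $f(y;\lambda) = \lambda e^{-\lambda y}$. Then $E[\log f(y;\lambda)] = \log\lambda - \lambda$, which is maximized at the true value $\lambda = 1$, whereas $E[f(y;\lambda)] = \lambda E(e^{-\lambda y}) = \lambda/(1 + \lambda)$ is strictly increasing in $\lambda$ and has no maximizer at all. The sketch below simply evaluates these closed forms on a grid; the function names are hypothetical.

```python
import math


def expected_log_density(lam):
    """E[log f(y; lam)] = log(lam) - lam * E(y) for f(y; lam) = lam*exp(-lam*y) and y ~ Exponential(1)."""
    return math.log(lam) - lam


def expected_density(lam):
    """E[f(y; lam)] = lam * E[exp(-lam*y)] = lam / (1 + lam) for y ~ Exponential(1)."""
    return lam / (1.0 + lam)


grid = [0.25 * k for k in range(1, 41)]           # lam = 0.25, 0.5, ..., 10.0
lam_mle = max(grid, key=expected_log_density)      # population target of MLE
values = [expected_density(lam) for lam in grid]
increasing = all(a < b for a, b in zip(values, values[1:]))
jensen_gap = expected_density(1.0) - math.exp(expected_log_density(1.0))
print(lam_mle, increasing, jensen_gap)
```

At the true value the gap $E[f] - \exp\{E[\log f]\} = 1/2 - e^{-1}$ is strictly positive, as Jensen's inequality requires.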
13.3. Parts a and b essentially appear in Section 15.4.

13.5. a. Since $s_i^g(\phi_o) = [G(\theta_o)']^{-1}s_i(\theta_o)$,
$$E[s_i^g(\phi_o)s_i^g(\phi_o)'|x_i] = E\{[G(\theta_o)']^{-1}s_i(\theta_o)s_i(\theta_o)'[G(\theta_o)]^{-1}|x_i\} = [G(\theta_o)']^{-1}E[s_i(\theta_o)s_i(\theta_o)'|x_i][G(\theta_o)]^{-1} = [G(\theta_o)']^{-1}A_i(\theta_o)[G(\theta_o)]^{-1}.$$
b. We just replace $\theta_o$ with $\tilde\theta$ and $\phi_o$ with $\tilde\phi$:
$$\tilde{A}_i^g = [G(\tilde\theta)']^{-1}A_i(\tilde\theta)[G(\tilde\theta)]^{-1} \equiv \tilde{G}'^{-1}\tilde{A}_i\tilde{G}^{-1}.$$
c. The expected Hessian form of the statistic is given in the second part of equation (13.36), but where it is based on $\tilde{s}_i^g$ and $\tilde{A}_i^g$:
$$LM^g = \Big(\sum_{i=1}^N \tilde{s}_i^g\Big)'\Big(\sum_{i=1}^N \tilde{A}_i^g\Big)^{-1}\Big(\sum_{i=1}^N \tilde{s}_i^g\Big) = \Big(\sum_{i=1}^N \tilde{G}'^{-1}\tilde{s}_i\Big)'\Big(\sum_{i=1}^N \tilde{G}'^{-1}\tilde{A}_i\tilde{G}^{-1}\Big)^{-1}\Big(\sum_{i=1}^N \tilde{G}'^{-1}\tilde{s}_i\Big)$$
$$= \Big(\sum_{i=1}^N \tilde{s}_i\Big)'\tilde{G}^{-1}\tilde{G}\Big(\sum_{i=1}^N \tilde{A}_i\Big)^{-1}\tilde{G}'\tilde{G}'^{-1}\Big(\sum_{i=1}^N \tilde{s}_i\Big) = \Big(\sum_{i=1}^N \tilde{s}_i\Big)'\Big(\sum_{i=1}^N \tilde{A}_i\Big)^{-1}\Big(\sum_{i=1}^N \tilde{s}_i\Big) = LM.$$

13.7. a. The joint density is simply $g(y_1|y_2,x;\theta_o)\cdot h(y_2|x;\theta_o)$. The log likelihood for observation $i$ is
$$\ell_i(\theta) \equiv \log g(y_{i1}|y_{i2},x_i;\theta) + \log h(y_{i2}|x_i;\theta),$$
and we would use this in a standard MLE analysis (conditional on $x_i$).
b. First, we know that $\theta_o$ maximizes $E[\ell_{i1}(\theta)|y_{i2},x_i]$ for all $(y_{i2},x_i)$. Since $r_{i2}$ is a function of $(y_{i2},x_i)$, $E[r_{i2}\ell_{i1}(\theta)|y_{i2},x_i] = r_{i2}E[\ell_{i1}(\theta)|y_{i2},x_i]$; since $r_{i2} \geq 0$, $\theta_o$ maximizes $E[r_{i2}\ell_{i1}(\theta)|y_{i2},x_i]$ for all $(y_{i2},x_i)$, and therefore $\theta_o$ maximizes $E[r_{i2}\ell_{i1}(\theta)]$. Similarly, $\theta_o$ maximizes $E[\ell_{i2}(\theta)]$, and so it follows that $\theta_o$ maximizes $E[r_{i2}\ell_{i1}(\theta) + \ell_{i2}(\theta)]$. For identification, we have to assume or verify uniqueness.
c. The score is $s_i(\theta) = r_{i2}s_{i1}(\theta) + s_{i2}(\theta)$, where $s_{i1}(\theta) \equiv \nabla_\theta\ell_{i1}(\theta)'$ and $s_{i2}(\theta) \equiv \nabla_\theta\ell_{i2}(\theta)'$. Therefore,
$$E[s_i(\theta_o)s_i(\theta_o)'] = E[r_{i2}s_{i1}(\theta_o)s_{i1}(\theta_o)'] + E[s_{i2}(\theta_o)s_{i2}(\theta_o)'] + E[r_{i2}s_{i1}(\theta_o)s_{i2}(\theta_o)'] + E[r_{i2}s_{i2}(\theta_o)s_{i1}(\theta_o)'].$$
Now by the usual conditional MLE theory, $E[s_{i1}(\theta_o)|y_{i2},x_i] = 0$ and, since $r_{i2}$ and $s_{i2}(\theta)$ are functions of $(y_{i2},x_i)$, it follows that $E[r_{i2}s_{i1}(\theta_o)s_{i2}(\theta_o)'|y_{i2},x_i] = 0$, and so its transpose also has zero conditional expectation. As usual, this implies zero unconditional expectation. We have shown
$$E[s_i(\theta_o)s_i(\theta_o)'] = E[r_{i2}s_{i1}(\theta_o)s_{i1}(\theta_o)'] + E[s_{i2}(\theta_o)s_{i2}(\theta_o)']. \qquad (13.70)$$
Now, by the conditional IM equality for the density $g(y_1|y_2,x;\theta)$, $E[s_{i1}(\theta_o)s_{i1}(\theta_o)'|y_{i2},x_i] = -E[H_{i1}(\theta_o)|y_{i2},x_i]$, where $H_{i1}(\theta) = \nabla_\theta s_{i1}(\theta)$. Since $r_{i2}$ is a function of $(y_{i2},x_i)$, we can put $r_{i2}$ inside both conditional expectations and then apply iterated expectations: $E[r_{i2}s_{i1}(\theta_o)s_{i1}(\theta_o)'] = -E[r_{i2}H_{i1}(\theta_o)]$. Similarly, by the unconditional information matrix equality for the density $h(y_2|x;\theta)$, $E[s_{i2}(\theta_o)s_{i2}(\theta_o)'] = -E[H_{i2}(\theta_o)]$, where $H_{i2}(\theta) = \nabla_\theta s_{i2}(\theta)$. Combining all the pieces, we have shown that
$$E[s_i(\theta_o)s_i(\theta_o)'] = -E[r_{i2}H_{i1}(\theta_o)] - E[H_{i2}(\theta_o)] = -E[r_{i2}\nabla_\theta s_{i1}(\theta_o) + \nabla_\theta s_{i2}(\theta_o)] \equiv -E[H_i(\theta_o)],$$
where $H_i(\theta)$ now denotes the Hessian of $r_{i2}\ell_{i1}(\theta) + \ell_{i2}(\theta)$. So we have verified that an unconditional IM equality holds, which means we can estimate the asymptotic variance of $\sqrt{N}(\hat\theta - \theta_o)$ by $\{-E[H_i(\theta_o)]\}^{-1}$.
d. From part c, one consistent estimator of $\mathrm{Avar}\,\sqrt{N}(\hat\theta - \theta_o)$ is
$$\Big[-N^{-1}\sum_{i=1}^N (r_{i2}\hat{H}_{i1} + \hat{H}_{i2})\Big]^{-1},$$
where the notation should be obvious. But, as we discussed in Chapters 12 and 13, this estimator need not be positive definite. Instead, we can break the problem into needed consistent estimators of $-E[r_{i2}H_{i1}(\theta_o)]$ and $-E[H_{i2}(\theta_o)]$, for which we can use iterated expectations. Since, by definition, $A_{i1}(\theta_o) \equiv -E[H_{i1}(\theta_o)|y_{i2},x_i]$, and $r_{i2}$ is a function of $(y_{i2},x_i)$, it follows that $E[r_{i2}A_{i1}(\theta_o)] = -E[r_{i2}H_{i1}(\theta_o)]$. This implies that, under general regularity conditions, $N^{-1}\sum_{i=1}^N r_{i2}\hat{A}_{i1}$ consistently estimates $-E[r_{i2}H_{i1}(\theta_o)]$. Similarly, since $A_{i2}(\theta_o) \equiv -E[H_{i2}(\theta_o)|x_i]$, $N^{-1}\sum_{i=1}^N \hat{A}_{i2}$ is consistent for $-E[H_{i2}(\theta_o)]$ by the usual iterated expectations argument. This completes what we needed to show. Interestingly, even though we do not have a true conditional maximum likelihood problem, we can still use the conditional expectations of the Hessians, but conditioned on different sets of variables, $(y_{i2},x_i)$ in one case and $x_i$ in the other, to consistently estimate the asymptotic variance of the partial MLE.
e. Bonus Question: Show that if we were able to use the entire random sample, the resulting conditional MLE would be more efficient than the partial MLE based on the selected sample.
Answer: We use a basic fact about positive definite matrices: if $A$ and $B$ are $P \times P$ positive definite matrices, then $A - B$ is p.s.d. if and only if $B^{-1} - A^{-1}$ is p.s.d. Now, as we showed in part d, the asymptotic variance of the partial MLE is $\{E[r_{i2}A_{i1}(\theta_o) + A_{i2}(\theta_o)]\}^{-1}$. If we could use the entire random sample for both terms, the asymptotic variance would be $\{E[A_{i1}(\theta_o) + A_{i2}(\theta_o)]\}^{-1}$. But $\{E[r_{i2}A_{i1}(\theta_o) + A_{i2}(\theta_o)]\}^{-1} - \{E[A_{i1}(\theta_o) + A_{i2}(\theta_o)]\}^{-1}$ is p.s.d. because
$$E[A_{i1}(\theta_o) + A_{i2}(\theta_o)] - E[r_{i2}A_{i1}(\theta_o) + A_{i2}(\theta_o)] = E[(1 - r_{i2})A_{i1}(\theta_o)]$$
is p.s.d. (since $A_{i1}(\theta_o)$ is p.s.d. and $1 - r_{i2} \geq 0$).

13.9. To be added.

13.11. To be added.

CHAPTER 14

14.1. a. The simplest way to estimate (14.35) is by 2SLS, using instruments $(x_1, x_2)$. Nonlinear functions of these can be added to the instrument list; these would generally improve efficiency if $\gamma_2 \neq 1$. If $E(u_2^2|x) = \sigma_2^2$, 2SLS using the given list of instruments is the efficient single equation GMM estimator. Otherwise, the optimal weighting matrix that allows heteroskedasticity of unknown form should be used. Finally, one could try to use the optimal instruments derived in Section 14.5.3. Even under homoskedasticity, these are difficult, if not impossible, to find analytically if $\gamma_2 \neq 1$.
b. When $\gamma_1 = 0$, the parameter $\gamma_2$ does not appear in the model. Of course, if we knew $\gamma_1 = 0$, we would consistently estimate $\delta_1$ by OLS.
c. No. We can see this by obtaining $E(y_1|x)$:
$$E(y_1|x) = x_1\delta_1 + \gamma_1E(y_2^{\gamma_2}|x) + E(u_1|x) = x_1\delta_1 + \gamma_1E(y_2^{\gamma_2}|x).$$
Now, when $\gamma_2 \neq 1$, $E(y_2^{\gamma_2}|x) \neq [E(y_2|x)]^{\gamma_2}$, so we cannot write $E(y_1|x) = x_1\delta_1 + \gamma_1(x\delta_2)^{\gamma_2}$; in fact, we cannot find $E(y_1|x)$ without more assumptions. While the regression $y_2$ on $x$ consistently estimates $\delta_2$, the two-step NLS estimator of $y_{i1}$ on $x_{i1}$, $(x_i\hat\delta_2)^{\gamma_2}$ will not be consistent for $\delta_1$ and $\gamma_2$. (This is an example of a "forbidden regression.") When $\gamma_2 = 1$, the plug-in method works: it is just the usual 2SLS estimator.
P1 = [(L1 + B)’.Qo). 1 _ Now we can verify (14.Qo).Qo)r(wi.L’3 ]’. and then P is the 3 + 9K * 1 vector obtained by Let Q = (j. * 14. %o function of xi and let * L matrix that is a Let Zi be a G be the probability limit of the weighting matrix. We can write the unrestricted linear projection as yit = where Pt is 1 + 3K stacking the the Pt Pt.(L2 + Therefore. P3 = [L’ 1 .(L3 + = HQ for the (3 + 9K) by 86 B)’]’. take A _ G’o %oGo and s(wi) _ * The optimal score function is s (wi) Ro(xi)’)o(xi) r(wi.Qo)’)o(xi) Ro(xi)] * 1 = G’ o %oE[Z’ i E{r(wi. we can write P B)’.3.Qo)’xi})o(xi) Ro(xi)] 1 = G’ o %oE[Z’ i )o(xi))o(xi) Ro(xi)] = G’ o %oGo = A. With the restrictions imposed on we have pt0 = j. pt0 + xiPt + vit.3.54). Then the asymptotic variance of the GMM estimator has the form (14.63). (This is an g2 = 1. G’ o %oZ’ i r(wi.L’ 2 . t = 1. * 1.L’ 3 .B’)’. So.2. Let Zi be the G * G matrix of optimal instruments in (14. * (1 + 4K) matrix H defined .L’3 ]’.") When D1 and g2.Qo)r(wi.L’ 2 .L’ 1 .57) with r = 1: E[s(wi)s (wi)’] = G’ o %oE[Z’ i r(wi.^ g2 yi1 on xi1. in (14.L’2 .2. (xiD2) will not be consistent for example of a "forbidden regression. P2 = [L’ 1 . the plugin method works: it is just the usual 2SLS estimator. t = 1.5. 1 14.3. where we suppress its dependence on xi.10) with Go = E[Z’ i Ro(xi)].
2. when H’%o H . we know that E(rir’ i xi) =  ljTvi. A1.55) and (14.7. from Chapter s2uIT under RE. 10.1. r _ su2. X’ i ri = X’ i (vi 87 r. and RE.p. RE. where ri = vi ˇ ˇ Therefore. the minimization problem becomes ^ ^1 ^ min (P . The first order condition is easily seen to be ^1 ^ ^ 2H’% (P .1.HQ) = 0 ^1 ^ ^1^ (H’% H)Q = H’% P.HQ)’% (P . su2E(Xˇ’i Xˇi) _ su2A1 by the usual This means that.  Now.56) for the random effects and fixed effects estimators. E(si1s’ i1) = E(X’ i rir’ i Xi) = iterated expectations argument. With h(Q) = HQ.55). Now. ¨ ˇ But si2s’ i1 = X’ i uir’ i X i. 14. assuming H’% H is nonsingular . or ^1 Therefore.which occurs w.we have 1 Q^ ^1 1 ^1^ = (H’% H) H’% P. ljTvi) = ¨X’i vi = ¨X’i (cijT + ui) =  . si2 (with added i subscripts for clarity).is nonsingular . QeR P where it is assumed that no restrictions are placed on Q.a.56) for this choice of ¨ ¨ Now. and A2 are given in the hint. IK 0 0 0 0 IK8 2 2 2 2 2 2 2 2 2 2 14.&1 0 0 0 0 * 0 IK 0 0 0 0 IK 0 2 2 2 0 1 0 H = 0 0 1 0 0 70 2 2 2 2 2 2 2 2 2 2 0 0 IK 0 0 0 IK 0 0 0 0 0 IK 0 0 0 IK 0 IK 0 0 0 IK 0 0 0 IK IK 0 2 2 2 0 0 0 .3.9. in (14. We have to verify equations (14. we just need to verify (14. as described in the hint.HQ). The choices of si1.
56) with r .  s2u. su2X¨’i Xˇi. ljTxi) = ¨X’i Xi = ¨X’i ¨Xi. This verifies (14. ¨ ˇ ¨ ˇ So si2s’ i1 = X’ i rir’ i Xi and therefore E(si2s’ i1xi) = X’ i E(rir’ i xi)Xi = It follows that E(si2s’ i1) = ¨ ˇ ¨ note that X’ i Xi = X’ i (Xi = su2E(X¨’i Xˇi).¨ X’ i ui. 88 To finish off the proof.
In the model P(y = 1z1.N.dkiWdmi = 0 for k $ m. But this is easily seen to be the fraction of yi in the sample falling into category m. the overall intercept is the cell frequency for the first category. the fitted values are just the cell frequencies.. 15.. If we drop d1 but add an overall intercept. Since the regressors are all orthogonal by construction .d1 = 0) = F[z1D1 + (g1 + g3)z2 + g2] .z2. Again. m = 2.3. i = 1.d1) = (g1 + dz 2  F(z1D1 + g1z2 + g2d1 + g3z2d1).. d1 . g3d1)Wf(z1D1 + g1z2 + g2d1 + g3z2d1). The fitted values for each category will be the same. and these are necessarily in [0.CHAPTER 15 15.. Therefore. and all i ..the coefficient on dm is obtained from the regression yi on dmi.z2) = F(z1D1 + g1z2+ g2z2) then dP(y = 1z1. for given z.P(y = 1z. The effect of d1 is measured as the difference in the probabilities at d1 = 1 and d1 = 0: P(y = 1z. M.z2) = (g1 + 2g2z2)Wf(z1D1 + dz 2 2  g1z2 + g2z22). a. b. this is estimated as ^ ^ ^ ^ ^ 2 (g1 + 2g2z2)Wf(z1D1 + g1z2 + g2z2).. where. the estimates are the probit estimates. a. to estimate these effects at given z and .in the first case. of course. b. and the coefficient on dm becomes the difference in cell frequency between category m and category one. If P(y = 1z1.d1 = 1) . .1].z2.F(z1D1 + g1z2).d1) = the partial effect of z2 is dP(y = 1z1..we 89 .1.
Because P(y = 1z) depends only on g1. this is what we can estimate ====================================== P(y = 1z) = along with D1. we have a standard probit model.ez) = g12z22 + 1 because Cov(q. we would require the full variance matrix of the probit estimates as well as the gradient of the expression of interest. Thus. where r = g1z2q + e. Define Let D1 ^ denote the probit estimates under the null that Fi = F(zi1^D1). We would apply the delta method from Chapter 3. and ~ui _ u^i/r F^i(1 . under H0. Also. (For example. g1z2 + (Not with respect to the zj.g1z2 + 1). Write y = z1D1 + r. a. * b. for P(y = 1z).90) 2 c.q) =  assuming that z2 is not functionally related to z1. this follows because E(rz) = 2 2 E(ez) = 0.ez) = 0 by independence between e and (z. ================================================ .q).F^i) ^  r1 = 0. Testing H0: r1 = 0 is most easily done using the score or LM test because.q) with a standard normal distribution. Because q is assumed independent of z. ^fi = f(zi1^D1). r/r g1z2E(qz) + Thus. (15.q) = g qWf(z D + g z q). and use average or other interesting values of z.just replace the parameters with their probit estimates. with respect to all probit parameters. qz ~ Normal(0. ====================================== It follows that 5 F&7z1D1/r g21z22 + 18*. and e is independent of (z. 5 2 2 g1z2 + 1 has a standard normal distribution independent of z. ^ui = yi ^ 90 5 Fi.) F(z1D1 + g1z2q) then dP(y = 1z. If P(y = 1z.) g1 = 2 and g1 = 2 give exactly the same model This is why we define r1 = g21. c. Var(rz) = g12z22Var(qz) + Var(ez) + 2g1z2Cov(q. such as (g1 + 2g2z2)Wf(z1D1 + g2z22).5. 1 1 1 1 2 dz 2 15.
is simply only other quantity needed is the gradient with respect to null estimates. Err.0171986 0.0215953 .0063417 0.37 0.1617183 .0205592 4. (zi1D^1)zi2 fi/r F^i(1 . r1 in But this is not a standard probit estimation.184405163 +Total  545. t P>t [95% Conf. 15. with respect to The gradient of the mean function in (15.844422 2716 .0797 .0116466 . &r z2 + 1*3/2f(z D /r5g2z2 + 1).0009759 black  .0489454 .7. D1.62151145 Residual  500.000 . Std.867 .F^i).000 .0044679 4.90) with respect to r1 is.1156299 .0128344 inc86  .0048884 0.0209336 7.3609831 .48 0.55 0.42942 arr86  Coef.673 .90) evaluated under the null estimates. 1 i2 7 1 i2 8 9 i1 1 0 ^ ^ 2 ^ When we evaluate this at r1 = 0 and D1 we get (zi1D1)(zi2/2)fi.65 0. Interval] +pcnv  .83 0. The r1 evaluated at the But the partial derivative of (15.2078066 hispan  .0012248 .0892586 . for each i.0160927 22.20037317 Number of obs F( 8.9720916 8 5. The model can be estimated by MLE using the formulation with place of g21.17 0. NRu ~ ================================================ c21. the 2 score statistic can be obtained as NRu from the regression ~ ui 5 5 2 ^ f^izi1/r F^i(1 .88 0.000 .007524 ptime86  .43 0. ========================================== (zi1D1)(zi2/2) 2 Then. reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60 Source  SS df MS +Model  44.000127 9. a.329428 .F^i).0308539 .3925382 91 .0000 0.34 0.000 .0089326 .0159374 tottime  .42 0.816514 2724 .0824 0. ^ fizi1 .000 . d.0014738 .0235044 6.000 .1954275 .0303561 .1295718 born60  . The following Stata output is for part a: .0365936 _cons  .581 .1133329 avgsen  .(the standardized residuals). 2716) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 2725 30. ================================================ on 2 a under H0.1543802 .0028698 .0035024 .0020613 .
Std.59 0.626 .17 0. Interval] +pcnv  . In fact.0001141 10.3282214 .3609831 ..1116622 .001001 black  .0 tottime = 0.61 0. t P>t [95% Conf.1305714 born60  .0210689 4. There are no important differences between the usual and robust standard errors.0479459 .0000 0.0042256 0.7 points.0028698 .0161967 inc86  .0171596 0.0824 .1543802 . reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60. test avgsen tottime 92 . in a couple of cases the robust standard errors are notably smaller.8320 .42942  Robust arr86  Coef.000 .24 0.33 0.3937449 The estimated effect from increasing pcnv from . robust Regression with robust standard errors Number of obs F( 8.010347 .000 .14 0.000 .154(.0167081 21. test avgsen tottime ( 1) ( 2) avgsen = 0.1915656 .18 0. The robust statistic and its pvalue are gotten by using the "test" command after appending "robust" to the regression command: . qui reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60 .73 0.000 . 2716) Prob > F Rsquared Root MSE = = = = = 2725 37.0 F( 2.0062244 ptime86  .000 .0012248 .59 0.2117743 hispan  .0020613 .0150471 tottime  .0014487 .0035024 .75 is about .0892586 .0215953 .1617183 .000 .077. 2716) = Prob > F = 0.018964 8.0080423 .25 to .5) = .552 .84 0. so the probability of arrest falls by about 7.0269938 .0255279 6.49 0.0027532 7.0058876 0.0307774 . Err.1171948 avgsen  . b.036517 _cons  .867 .
4116549 avgsen  .0168844 0.0 F( 2.8360 c.4666076 .4192875 born60  .000 .000 .09 0. Interval] +pcnv  .1164085 .6406 Probit estimates Number of obs LR chi2(8) Prob > chi2 Pseudo R2 Log likelihood = 1483. Dev.18 0.20 0.0127395 .1837 1486. we must compute the difference in the normal cdf at the two different values of pcnv.1203466 _cons  . Min Max +avgsen  2725 .4 ptime86  2725 . probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60 Iteration Iteration Iteration Iteration 0: 1: 2: 3: log log log log likelihood likelihood likelihood likelihood = = = = 1608.0 tottime = 0.0556843 0.0000 0.60 0.0254442 ptime86  .0076486 . Err.8387523 4.000 .2 tottime  2725 .5529248 .0407414 .67 0. Std.0036983 black  .3157 1483.0459949 inc86  .028874 .1629135 .213287 Now.0719687 6.387156 1.548 .000 .6076635 hispan  .000 .651 .0654027 4.45 0.6322936 3.12 0.0812017 . hispan = 0.0046346 .840 .0720778 7.0979318 .0543531 tottime  .6941947 . The probit model is estimated as follows: . z P>z [95% Conf.3255516 . and at the average values of the remaining variables: .6458 1483.0774 arr86  Coef.508031 0 59.0112074 .0004777 9.3138331 .2911005 .607019 0 63.950051 0 12 inc86  2725 54.017963 4.4143791 .000 .0212318 0.70 0.0512999 6.45 0.( 1) ( 2) avgsen = 0.0055709 . 2716) = Prob > F = 0. born60 = 1. sum avgsen tottime ptime86 inc86 Variable  Obs Mean Std.6406 = = = = 2725 249.62721 0 541 93 .48 0.96705 66. black = 1.52 0.
di 1903/1970 .6% of the time.0046*54.839 .117) . which is somewhat larger than the effect obtained from the LPM.467 + .. d. gen arr86h = phat > .97 + .632 . predict phat (option p assumed. tab arr86h arr86  arr86 arr86h  0 1  Total ++0  1903 677  2580 1  67 78  145 ++Total  1970 755  2725 .10. di normprob(..25 . e.1174364 . we first generate the predicted values of arr86 as described on page 465: .75 .0127*. Adding the quadratic terms gives .. the probit is correct only about 10..0812*. but we cannot very well predict the outcome we would most like to predict.10181543 This last command shows that the probability falls by about .117) .3% of the time. di . To obtain the percent correctly predicted for each outcome. the probit predicts correctly about 96.5 .553*. Unfortunately.313 + .normprob(. di 78/755 .96598985 .0112 . The overall percent correctly predicted is quite high.0076*. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60 pcnvsq pt86sq inc86sq 94 . Pr(arr86)) .387 .10331126 For men who were not arrested. for the men who were arrested..553*..
0039478 black  .389098 .0000 The quadratics are individually and jointly significant.1256351 .0009851 5.8570512 .8005 Probit estimates Number of obs LR chi2(11) Prob > chi2 Pseudo R2 Log likelihood = 1439.1438485 5.0224234 4.1474522 . at low levels of pcnv.7273198 avgsen  .4476423 .0145223 .1047 arr86  Coef.57 0. The turning point is easily found as .0566913 0.026909 inc86  .0965905 pcnvsq  . Interval] +pcnv  .75e06 4.000 .127.18 0.2270817 note: 51 failures and 0 successes completely determined.0 chi2( 3) = Prob > chi2 = 38.89 0.2937968 .000 .3978727 born60  .0244972 0. 95 .16 0. which does not make much sense.067082 3.3250042 pt86sq  .0139969 .8005 = = = = 2725 336.268 1439.4630333 1.0000171 _cons  .000 .000 .337362 . The quadratic in pcnv means that.28e06 2.0213253 ptime86  .7449712 . Std. there is actually a positive relationship between probability of arrest and pcnv.002 1.83 0.26 0.0562665 6.0078094 .3151 1441.041 3.97 0.2167615 .0199703 0. test pcnvsq pt86sq inc86sq ( 1) ( 2) ( 3) pcnvsq = 0.1349163 .0000 0. which means that there is an estimated deterrent effect over most of the range of pcnv.056957 .857) ~ .62 0.2663945 .1035031 . .8166 1439.405 .0 pt86sq = 0.0340166 .0620105 tottime  .95 0.0178158 .63e07 .97 0.000 .00 0.0 inc86sq = 0.059554 inc86sq  8.54 0.2089 1444.580635 hispan  .Iteration Iteration Iteration Iteration Iteration Iteration Iteration Iteration 0: 1: 2: 3: 4: 5: 6: 7: log log log log log log log log likelihood likelihood likelihood likelihood likelihood likelihood likelihood likelihood = = = = = = = = 1608.0733798 5. Err.217/(2*.1837 1452.2929913 .2714575 3.372 . z P>z [95% Conf.8535 1440.8005 1439.000 .77 0.0058786 .2604937 0.4368131 .798 .568 .04 0.
yi)log(1 .yiT) are This allows us to write f(y1... Let P(y = 1x) = xB... = yilog(xiB) + (1 . li(B) Then..N. where x1 = 1. It may be impossible to find an estimate that satisfies these inequalities for every observation. This follows from the KLIC: the true density of y given x  evaluated at the true values. the loglikelihood function is well ^ defined only if 0 < xiB < 1 for all i = 1. we can use values of the loglikelihood to choose among different models for P(y = 1x) when y is binary.maximizes the KLIC.yTxi) = f1(y1xi)WWWfT(yTxi). Therefore. that is. We really need to make two assumptions. which is only welldefined for 0 < xiB < 1... (yi1.15... When we add the standard assumption for pooled probit .that D(yitxit) follows a probit model ... this condition must be checked.. Since the MLEs are consistent for the unknown parameters. a. for each i. So. independence assumption: independent. c.. of course .. especially if N is large. 15. For any possible estimate B^.11. The first is a conditional given xi = (xi1. during the iterations to obtain the MLE.T. exogeneity assumptiond: The second assumption is a strict D(yitxi) = D(yitxit).... just as we can use an Rsquared to choose among different functional forms for E(yx)..then 96 . t = 1..9. asymptotically the true density will produce the highest average log likelihood function. the joint density (conditional on xi) is the product of the marginal densities (each conditional on xi).xiT).xiB).. b.
we have q^ _ [F(^d0 + ^d1 + ^d2 + ^d3 + x^G) .. and let dB be an indicator for the treatment group. a. t=1 and so pooled probit is conditional MLE. will be identical across models. d2. and x using all observations.[F(d0 + d1 + xG)   F(^d0 + x^G)]. Then a probit model to evaluate the treatment effect is P(y = 1x) = F(d0 + d1d2 + d2dB + d3d2WdB + xG). there is no point in using any method other than a straight comparison of means. say x.G(xitB)]1yt.13. c.f(y1.  and in the latter we have N ^ ^ ^ ^ q _ N1 S {[F(d^0 + d^1 + d^2 + d^3 + xiG ) . 97 . or averaging the differences across xi. We would estimate all parameters from a probit of y on 1.  which requires either plugging in a value for x. between groups B and A.yTxi) = T p [G(xitB)]yt[1 .F(^d0 + ^d2 + x^G)]  ^ ^ ^ . If there are no covariates. where x is a vector of covariates. dB.[F(d0 + d1 + xiG)  F(^d0 + xi^G)]}.. d2WdB.. Let d2 be a binary indicator for the second time period. Both are estimates of the difference. both before and after the policy change. We would have to use the delta method to obtain a valid standard error for either ^ q or ~q. of the change in the response probability over time.. Once we have the estimates. b.F(d0 + d2 + xiG)] ~ i=1 ^ ^ ^ . The estimated probabilities for the treatment and control groups. we need to compute the "differenceindifferences" estimate. 15. In the former case.
We should use an interval regression model... but we only observe interval coded data..Go) h(cx.. i 3 87g=1 g ig i 8 4 log As expected. To be added.Gg)*h(cx .c.Go) = i p fg(ygx. Along with the bj ... (Clearly a conditional normal distribution for the GPAs is at best an approximation.... ordered probit with known cut points.we estimate The estimated coefficients are interpreted as if we had done a linear regression with actual GPAs.c.D)dc$.c. 98 .Go)f2(y1x. Because c appears in each D(ygx. and the unknown parameters.Go) = f1(y1x.c): f(y1.c.yGx. (xi.yG are dependent without conditioning on c. this depends only on the observed data.Go)WWWfG(yGx. c. 1 2 G b.Do)dc.15.. equivalently.Go). 7 8 8 g=1 where c g is a dummy argument of integration.yi1....including an intercept .. a. The log likelihood for each i is # 8i& pG f (y x ..c)..c.. 15..15. We would be assuming that the underlying GPA is normally distributed conditional on x. 15... We obtain the joint density by the product rule. The density of (y1.yGx.yG) given x is obtained by integrating out with respect to the distribution of c given x: 8& G * g(y1.yiG).17.) s2. since we have independence conditional on (x.c. y1.19.
e. where ui might contain unobserved ability. in something like an unemployment duration equation. a.xiB]/s}) f(yxi) = 1  + 1[yi < log(c)]Wlog{s d.s ) = 1[yi = log(c)]Wlog(1 .xiBxi] = 1 As c L 8. y = log(c) sf[(y . 1. if xi contains something like education. _ log(ti) (given xi) when ti < c is the same as the b.1. f(yxi) = 1  This is because. The density of yi * density of yi < log(c).xiB]/s}. the longer we wait to censor. Under H0. F{[log(c) . Thus.ci) has the same form as the density of yi given xi above.xiB)/s]. P[log(ti) = log(c)xi] = P[log(ti) > log(c)xi] * = P[ui > log(c) . B2 1 f[(yi . y < log(c). P(yi _ log(ti*). the density for yi = log(ti) is F{[log(c) . for y Thus. Thus.CHAPTER 16 16.xiB)/s]}. the density of yi given (xi. The assumption that ui is independent of ci means that the decision to censor an individual (or other economic unit) is not related to unobservables * affecting ti. LR is c2K2.ci).xiB]/s} L F{[log(c) .s2). = 0. This simply says that. and then the model without x2. To test H0: statistic.xiB]/s}. we do not wait longer to censor people of lower ability. which is treated as exogenous. 2 c. the less likely it is that we observe a censored observation. except that ci replaces c.F{[log(c) . < yxi) = P(yi* < yxi). which is just Normal(xiB. The LR statistic is distributed asymptotically as LR = 2(Lur  Lr). Note that ci can be related to xi. li(B. then the 99 . and so P[log(ti) = log(c)xi] L 0 as c L 8. I would probably use the likelihood ratio This requires estimating the model with all variables. Since ui is independent of (xi.
for a1 < y < a2.F[(a2 .xiB)/s] F[(a1 . x. we can easily get E(yx) by using the following: E(yx) = a1P(y = a1x) + E(yx.a1 < y < a2) = E(y * when a1 < y = xB + u.xB)/s < u/s < (a2 .xB < u < a2 .xB < u < a2 .a1 . Next. and a1 < y (1/s)f[(y .a1 < y* < a2) = xB + E(ux.xB)/s]  f[(a2 .xiB)/s]. P(yi = a1xi) = P(yi * = < a1xi) = P[(ui/s) < (a1 .xiB)/s] = 1 . Therefore.F[(a1 .xB)/s] 100 F[(a1 . E(yx.xB)/s]} = E(yx. * * x.(a1 .xiB)/s]. P(yi < yxi) = P(y*i < yxi) = F[(y . Taking the derivative of this cdf with respect to y gives the pdf of yi conditional on xi for values of y strictly between a1 and a2: * b. 16. Since y = y a2). * But y < a2.3.xB)/s]} .xB)/s] + E(yx. Now.xiB)/s] F[(a2 .xB)/s] + a2F[(xB .a1 < y < a2).xB.a1 < y < a2)W{F[(a2 . using the hint.xB) * E(y = xB + sE[(u/s)x. a.censoring time can depend on education.xB)/s] = xB + s{f[(a1 .xB)/s] .xB)/s]}/{F[(a2 .a2)/s] = a1F[(a1 .a1 < y* < < a2 if and only if a1 . Similarly.xiB)/s]. P(yi = a2xi) = P(yi * = P[(ui/s) = > a2xi) = P(xiB + ui > a2xi) > (a2 .xiB)/s].a1 < y < a2)WP(a1 < y < a2x) + a2P(y2 = a2x) = a1F[(a1 .
f2 _ f[(a2 . f. The linear regression of yi on xi using only those yi such that a1 < yi < a2 * consistently estimates the linear projection of y * for which a1 < y < a2. there is no reason to think that this will have any simple relationship to the parameter vector B.xiB)/s]. f1 _ f[(a1 .57) s{f[(a1 . or strictly between the endpoints.xB)/s] + F[(a1 .F1)bj + [(xB/s)(f1 . As a shorthand. write a2)/s].xB)/s] = f[(xB  F1 _ F[(a1 .xB)/s]} + a2F[(xB .] d.57).xB)/s]} (16. After obtaining the maximum likelihood estimates these into the formulas in part b. B^ and s^2. [In some restrictive cases.xB)/s].xiB)/s]} + 1[yi = a2]log{F[(xiB . at the right endpoint.a2)/s].xB)/s].f2)]bj  101 Then .a1 < y* < a2) $ xB. Note how the indicator function selects out the appropriate density for each of the three possible cases: at the left endpoint. From part b it is clear that E(y would be a fluke if OLS on the restricted sample consistently estimated B.xB)/s] . just plug The expressions can be evaluated at interesting values of x.xiB)/s]}. We can show this by bruteforce differentiation of equation (16.f[(a2 . the regression on the restricted subsample could consistently estimate B up to a common scale coefficient.+ (xB)W{F[(a2 . We get the loglikelihood immediately from part a: li(q) = 1[yi = a1]log{F[(a1 .a2)/s]} + 1[a1 < yi < a2]log{(1/s)f[(yi . and F2 _ F[(a2 . e. on x in the subpopulation Generally. x. and so it * c. dE(yx) = (a /s)f b + (a /s)f b 1 1 j 2 2 j dx j + (F2 .
we could average {F[(a2 . and the last two lines are obtained from differentiating the second term in E(yx). the scaled ^ bj Generally.xB )/s]}bj.xiB)/s]  all i to obtain the average partial effect. which is the expression we wanted to be left with. where 0 < ^ r < 1 is the scale factor. ^ ^ ^ F[(a1 . note that It ^ bj with that of ^gj. since we act as if we were able to do OLS on an uncensored sample. These are estimated as ^ ^ {F[(a2 . We could evaluate these partial effects ^ ^ Or.{[(a2 . ^ s appears in the partial effects along with the ^bj.(a2 ." h. say. The scale factor is simply the probability that a standard normal random variable falls in the interval [(a1 .  at. this approximation need not be very good in a partiular application.58) ^ ^ F[(a1 . The partial effects on E(yx) are given in part f. For data censoring where the censoring points might change with i. the analysis is essentially the same but a1 and a2 are replaced with ai1 and ai2. (16. respectively.xB)/s] where the estimates are the MLEs. but it is often roughly true. which is necessarily between zero and one. can be compared to the ^ gj.+ {[(a1 .xB)/s. we expect ^ ^ gj ~ ^rWb j.xB)/s].57). there is no sense in which ^ s is "ancillary. in (16. where the first two parts are the derivatives of the first and third terms. 102 . Intepretating the results is even easier. Of course.xB)/s]f1}bj .xB)/s]f2}bj. x.xiB )/s]} across In either case. terms cancel except (F2  Careful inspection shows that all F1)bj. g. does not make sense to directly compare the magnitude of By the way.
2145 hrbens  Coef.16.0360227 married  .1473341 .0673714 0.251898 .0281931 .0834306 .000 .3718 0.909 0. ll(0) Tobit Estimates Number of obs chi2(11) Prob > chi2 Pseudo R2 Log Likelihood = 519.0088168 9.0614223 nrthcen  .688 0. Err.1490686 .054929 .3518203 b.423 0.282847328 +Total  271.131 0.0678666 0. Std.0737578 1.0994408 .021225 .0103333 . The Tobit estimates are .0029862 .0538339 1.057 .66616 = 616 = 283.3547274 white  .1042321 tenure  .0287099 .972074 615 .0115164 age  .3604 .0061263 educ  .0083783 9.1900971 male  .0132201 age  .560 .0022495 .0043435 0.084021 south  .2282836 .005544 .000 .0035481 7.0058343 educ  .2788372 .949 0.946 0.762 0.0041162 0. t P>t [95% Conf.468 .0657498 .0696015 .53183 hrbens  Coef.0510187 1.0986582 tenure  .0869168 .0040631 .0284978 .048028 .021397 .1027574 .2538105 103 .000 .583 0.2084814 male  .1825451 .839786 604 .871 0.0000 = 0. tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union.3768401 .0746602 1. a.0523598 4. Err.6999244 .000 .384 .442231015 Number of obs F( 11. The results from OLS estimation of the linear model are .86 = 0.000 .0050939 .547 0.1608084 . Interval] +exper  .325 0. Interval] +exper  .010294 .1038129 union  . 604) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 616 32.50 0.078604 1.0112981 .5.265 0.082204 .710 0.0477021 .000 .672 .186 .0499022 7.552 0.635 0.0492621 .079 . reg hrbens exper age educ tenure married male white nrtheast nrthcen south union Source  SS df MS +Model  101.364019 white  .0029666 .2556765 .858 0.19384436 Residual  170.098923 .811 0.492 . Std.585 .000 1.258 .132288 11 9.0351612 married  .0025859 . t P>t [95% Conf.206 .0551672 4.0000 0.726 0.812 0.4748429 _cons  .1772515 3.0899016 .0044362 0.0037237 7.000 .2455481 nrtheast  .0046627 0.
0324014 .4878151 expersq  .0044995 educ  .0104947 5.0973362 tenure  .1503703 .715 0. the Tobit estimates are all slightly larger in magnitude.632 0.0906783 .4033519 .351 0.4443616 +_se  .2300547 .000 .5060039 _cons  .327 0.0001487 3.051105 7.0139224 .0581357 .0008445 .230 0. summary: 41 leftcensored observations at hrbens<=0 575 uncensored observations The Tobit and OLS estimates are similar because only 41 of 616 observations.000 .0775035 1.0631812 .7% of the sample.0165773 (Ancillary parameter) Obs.1753675 male  .62108 = 616 = 315.0040294 .0043428 0. Std. ll(0) Tobit Estimates Number of obs chi2(13) Prob > chi2 Pseudo R2 Log Likelihood = 503.033717 .000 . have hrbens = 0.316 . Here is what happens when exper and tenure 2 are included: .493 .354 .1012841 nrthcen  .2562597 .0086957 9.1639731 .597 0.1034053 south  .0787463 married  .3874497 .0005524 .004 0.0306652 .239 .1891572 .0246854 .1146022 union  .197323 .nrtheast  .0709243 0.1536597 .0522697 7.685 0. Again.0528969 1.0802587 .5551027 .348 0. the parameter "_se" is ^ s.717 0. Interval] +exper  . this reflects that the scale factor is always less than unity.0480194 .528 . tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union expersq tenuresq.0713965 0.037525 .0743625 nrthcen  . or about 6.017479 .18307 .0489422 . As expected.928 0.0539178 4.2870843 .95 = 0.483 0.1187017 union  .0085253 3.000 .000 1.177 .252 0.0602628 .3006999 .728 .0778461 .2388 hrbens  Coef.0714831 .0768576 1.0760238 0.047408 age  .801 .753 0. Err.540 0. 2 c. as we know.0125583 .000 . You should ignore the phrase "Ancillary parameter" (which essentially means "subordinate") associated with "_se" as it is misleading for corner solution applications: ^2 s appears directly in ^E(yx) and ^E(yx.1708394 .1880725 4.180 0.000 .0002604 104 .581 0.2416193 nrtheast  .3621491 white  .0693418 0.0698213 0.y > 0).0000 = 0.8137158 .629 . t P>t [95% Conf.000 .0912729 south  .
000 1.2411059 .2632871 nrtheast  .993 .0088598 8.000 .108095 .000544 ind2  .091 0.373072 0.053115 .0726393 married  .0427534 age  .0963657 .372 0.tenuresq  .4137686 expersq  .5796409 +_se  .0655859 0.159 .165 1.0004098 3.0005242 _cons  .9650425 .390 0.091 0.0033643 .0081297 3.3504717 white  .278 .0789402 .0438005 .0046942 educ  .000 .408 .4137824 1.0041306 0.7117618 . Std.307673 .0161572 (Ancillary parameter) Obs.0335907 .0151907 (Ancillary parameter) 105 .2940 hrbens  Coef.0267869 . and we use ind1 as the base industry: .3669437 0.579 0.3143174 . Err.243 0.8203574 .99 = 0.3731778 .1853532 5.06154 .001 . so they should be included in the model.105 1.0506381 6.276 .0400045 nrthcen  .888 0.7377754 ind8  .368639 0.0501776 1.009 0.7310468 .1016799 . tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union expersq tenuresq ind2ind9.527 .1532928 male  .0963403 tenure  .3257878 .04645 .3739442 0.997 0.0908226 union  .0667174 1.349246 .109 0. ll(0) Tobit Estimates Number of obs chi2(21) Prob > chi2 Pseudo R2 Log Likelihood = 467.127675 ind9  .330 0.086 0. There are nine industries.7536342 ind6  .1667934 .261 0.5099298 . d.3682535 1.0556864 4.0000 = 0.231545 .376006 1.615 0.3742017 0.387704 .0013291 .001 .910 0.0256812 .380 0.955 .0115306 .168 1.002134 .633 0.2351539 .0099413 5.0034182 .002 .000 .5083107 . summary: 41 leftcensored observations at hrbens<=0 575 uncensored observations Both squared terms are very signficant.0004405 .2035085 .0585521 south  .09766 = 616 = 388.0007188 .214924 ind7  .319 1.5750527 .1317401 .9436572 .0013026 .0735678 1.3716415 0.0209362 .2433643 .794 . Interval] +exper  .0379854 . t P>t [95% Conf.6107854 .624 0.000 .6276261 ind4  .3617389 ind3  .295 0.000 .107 .0724782 .0108205 .828 0.2148662 .056 0.0721422 1.563 .4947348 ind5  .343 0.3948746 _cons  .409 0.0003863 3.0001623 tenuresq  .207 0.0547462 .1188029 .2375989 +_se  .5418171 .0001417 3.0020613 .375 1.
0 0.046.0 0. Certainly several estimates on the industry dummies are economically significant. but the joint Wald test says that they are jointly very significant. The likelihood ratio statistic is 2(503.621 .0 0.0000 Each industry dummy variable is individually insignificant at even the 10% level. in this example. notice that this is roughly 8 (= number of restrictions) times the F statistic. if f(Wx) is the continuous density of y given x. then the density of y given x and y > 0 is f(Wx)/[1 . say.0 8. This is somewhat unusual for dummy variables that are necessarily orthogonal (so that there is not a multicollinearity problem among them). industry eight earning about 61 cents less per hour in benefits than comparable worker in industry one.0 0.0 0. This follows because the densities conditional on y > 0 are identical for the Tobit model and Cragg’s model.Obs. with so few observations at zero.66 0.098) = 73. 17. it is roughly legitimate to use the parameter estimates as the partial effects. the pvalue for the LR statistic is also essentially zero. A more general case is done in Section Briefly.0 0.0 0.F(0x)].467. with a worker in. summary: 41 leftcensored observations at hrbens<=0 575 uncensored observations .3. [Remember. 595) = Prob > F = 9. a.7.] 16. where F(Wx) is the cdf 106 . test ind2 ind3 ind4 ind5 ind6 ind7 ind8 ind9 ( ( ( ( ( ( ( ( 1) 2) 3) 4) 5) 6) 7) 8) ind2 ind3 ind4 ind5 ind6 ind7 ind8 ind9 F( = = = = = = = = 0.
If we take the partial derivative with respect to log(x1) we clearly get the sum of the elasticities. a2 = 10.y > 0)]. One can imagine that some people at the corner y = 10 would choose y > 10 if they could.8): log[E(yx)] = log[P(y > 0x)] + log[E(yx. b.from (16. When f is the normal pdf with mean xB and variance s2. 16.a2)/s]. Taking the derivative of this function with respect to a2 gives 107 .y > 0) = {F(xB/s)} {f[(y . c.xB)/s]/s} for the Tobit model. The lower limit at zero is logically necessary considering the kind of response: the smallest percentage of one’s income that can be invested in a pension plan is zero. This follows very generally . a. Then. and this 1 is exactly the density specified for Cragg’s model given y > 0. the upper limit of 10 is an arbitrary corner imposed by law.y > 0) = F(xG)[xB + sl(xB/s)]. c.xB)/s]} + a2F[(xB . we get that f(yx.f[(a2 . with a1 = 0. we can think of an underlying variable. of the kind analyzed in Problem 16.of y given x.3(b). which would be the percentage invested in the absense of any restrictions.8) we have E(yx) = F(xG)WE(yx. So.not just for Cragg’s model or the Tobit model .xB)/s] + F(xB/s)} s{f(xB/s) . there would be no upper bound required (since we would not have to worry about 100 percent of income being invested in a pension plan). b. is appropriate. From Problem 16. we have E(yx) = (xB)W{F[(a2 .9.3. with a1 = 0. On the other hand. A twolimit Tobit model. From (6.
dE(yx)/da2 = (xB/s)Wf[(a2 .a2)/s]. An interesting followup question would have been: What if we standardize each xit by its crosssectional mean and variance at time t. or how we estimate a variety of unobserved effects panel data models with conditional normal heterogeneity.N. B^ and For a given value of x. and Var(x) has full rank K . We simply have & 7 ci = . (16. No.xB)/s] + [(a2 . we would compute ^ s are the MLEs. B^ and ^ s are just the usual Tobit estimates with the "censoring" at zero. That is why a linear regression analysis is always a reasonable first step for binary outcomes.13. any aggregate time dummies explicitly get  swept out of xi in this case but would usually be included in xit.T where T j _ 7&T1 S Pt8*X... and count outcomes. 16. provided there is not true data censoring.(a2/s)f[(xB .xB)/s] + = F[(xB . If yi < 10 for i = 1. This extension has no practical effect on how we estimate an unobserved effects Tobit or probit model.xB)/s]Wf[(a2 . where We might evaluate this expression at the sample average of x or at other interesting values (such as across gender or race).regardless of the nature of y or x. OLS always consistently estimates the parameters of a linear projection .11. corner solution outcomes.a2)/s] . t=1 1 T   Of course.59) We can plug in a2 = 10 to obtain the approximate effect of increasing the cap from 10 to 11.a2)/s] F[(xB .provided the second moments of y and the xj are finite. t=1 S Pt8*X + xiX + ai _ j + xiX + ai. d. and 108 . ^ ^ F[(xB .10)/s]... 16.
assume $c_i$ is related to the mean and variance of the standardized vectors? In other words, let $z_{it} \equiv (x_{it} - \pi_t)\Omega_t^{-1/2}$, $t = 1, \dots, T$, for each random draw $i$ from the population. Then, we might assume $c_i|x_i \sim \mathrm{Normal}(\psi + \bar{z}_i\xi, \sigma_a^2)$ (where, again, $z_{it}$ would not contain aggregate time dummies). This is the kind of scenario that is handled by Chamberlain's more general assumption concerning the relationship between $c_i$ and $x_i$: $c_i = \psi + \sum_{r=1}^{T}x_{ir}\lambda_r + a_i$, where $\lambda_r = \Omega_r^{-1/2}\xi/T$, $r = 1, 2, \dots, T$. Alternatively, one could estimate $\pi_t$ and $\Omega_t$ for each $t$ using the cross section observations $\{x_{it}: i = 1, 2, \dots, N\}$. The usual sample means and sample variance matrices, say $\hat\pi_t$ and $\hat\Omega_t$, are consistent and $\sqrt{N}$-asymptotically normal. Then, form $\hat{z}_{it} \equiv (x_{it} - \hat\pi_t)\hat\Omega_t^{-1/2}$, and proceed with the usual Tobit (or probit) unobserved effects analysis that includes the time averages $\bar{\hat{z}}_i = T^{-1}\sum_{t=1}^{T}\hat{z}_{it}$. This is a rather simple two-step estimation method, but accounting for the sample variation in $\hat\pi_t$ and $\hat\Omega_t$ would be cumbersome. It may be possible to use a much larger sample to obtain $\hat\pi_t$ and $\hat\Omega_t$, in which case one might ignore the sampling error in the first-stage estimates.
16.15. To be added.
CHAPTER 17
17.1. If you are interested in the effects of things like age of the building and neighborhood demographics on fire damage, given that a fire has occurred, then there is no problem. We simply need a random sample of buildings that actually caught on fire. You might want to supplement this with an analysis of the probability that buildings catch fire, given building and neighborhood characteristics. But then a two-stage analysis is appropriate.
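Problem 16.9(c) claims that $dE(y|x)/da_2 = \Phi[(x\beta - a_2)/\sigma]$. This can be checked numerically. The Python sketch below uses only the standard library. It assumes a scalar `xb` standing in for the index $x\beta$, and the function names are illustrative. It codes the two-limit Tobit mean with $a_1 = 0$, then compares a central finite difference in $a_2$ with the closed form (16.59).

```python
import math


def std_normal_cdf(z):
    """Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z):
    """phi(z), the standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def two_limit_tobit_mean(xb, sigma, a2):
    """E(y|x) for a two-limit Tobit with lower limit a1 = 0 and upper limit a2."""
    return (xb * (std_normal_cdf((a2 - xb) / sigma) - std_normal_cdf(-xb / sigma))
            + sigma * (std_normal_pdf(xb / sigma) - std_normal_pdf((a2 - xb) / sigma))
            + a2 * std_normal_cdf((xb - a2) / sigma))


def cap_effect_closed_form(xb, sigma, a2):
    """Closed form (16.59): dE(y|x)/da2 = Phi[(xb - a2)/sigma]."""
    return std_normal_cdf((xb - a2) / sigma)


def cap_effect_numeric(xb, sigma, a2, h=1e-5):
    """Central finite difference of the two-limit Tobit mean with respect to a2."""
    return (two_limit_tobit_mean(xb, sigma, a2 + h)
            - two_limit_tobit_mean(xb, sigma, a2 - h)) / (2.0 * h)
```

For example, with illustrative values `xb = 6`, `sigma = 3` and a cap of 10, both functions return about $\Phi(-4/3)$. Letting `a2` grow very large recovers the one-limit Tobit mean $\Phi(x\beta/\sigma)x\beta + \sigma\phi(x\beta/\sigma)$.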
17.3. This is essentially given in equation (17.14). Let $y_i$ given $x_i$ have density $f(y|x_i;\beta,\gamma)$, where $\beta$ is the vector indexing $E(y_i|x_i)$ and $\gamma$ is another set of parameters (usually a single variance parameter). Then the density of $y_i$ given $x_i$, $s_i = 1$, when $s_i = 1[a_1(x_i) < y_i < a_2(x_i)]$, is
$$p(y|x_i, s_i = 1) = \frac{f(y|x_i;\beta,\gamma)}{F(a_2(x_i)|x_i;\beta,\gamma) - F(a_1(x_i)|x_i;\beta,\gamma)}, \qquad a_1(x_i) < y < a_2(x_i).$$
In the Hausman and Wise (1977) study, $y_i = \log(\mathrm{income}_i)$, $a_1(x_i) = -\infty$, and $a_2(x_i)$ was a function of family size (which determines the official poverty level).
17.5. If we replace $y_2$ with $\hat{y}_2$, we need to see what happens when $y_2 = z\delta_2 + v_2$ is plugged into the structural model:
$$y_1 = z_1\delta_1 + \alpha_1\cdot(z\delta_2 + v_2) + u_1 = z_1\delta_1 + \alpha_1\cdot(z\delta_2) + (u_1 + \alpha_1 v_2). \qquad (17.81)$$
So, the procedure is to replace $\delta_2$ in (17.81) with its $\sqrt{N}$-consistent estimator, $\hat\delta_2$. The key is to note that the error term in (17.81) is $u_1 + \alpha_1 v_2$. If the selection correction is going to work, we need the expected value of $u_1 + \alpha_1 v_2$ given $(z, v_3)$ to be linear in $v_3$ (in particular, it cannot depend on $z$). Then we can write
$$E(y_1|z, v_3) = z_1\delta_1 + \alpha_1\cdot(z\delta_2) + \gamma_1 v_3,$$
where $E[(u_1 + \alpha_1 v_2)|v_3] = \gamma_1 v_3$ by normality. Conditioning on $y_3 = 1$ gives
$$E(y_1|z, y_3 = 1) = z_1\delta_1 + \alpha_1\cdot(z\delta_2) + \gamma_1\lambda(z\delta_3). \qquad (17.82)$$
A sufficient condition for (17.82) is that $(u_1, v_2, v_3)$ is independent of $z$ with a trivariate normal distribution. We can get by with less than this, but the nature of $v_2$ is restricted. If we use an IV approach, we need assume
nothing about $v_2$ except for the usual linear projection assumption. As a practical matter, if we cannot write $y_2 = z\delta_2 + v_2$, where $v_2$ is independent of $z$ and approximately normal, then the OLS alternative will not be consistent. Thus, equations where $y_2$ is binary, or is some other variable that exhibits nonnormality, cannot be consistently estimated using the OLS procedure. This is why 2SLS is generally preferred.
17.7. a. Substitute the reduced forms for $y_1$ and $y_2$ into the third equation:
$$y_3 = \max(0, \alpha_1(z\delta_1) + \alpha_2(z\delta_2) + z_3\delta_3 + v_3) \equiv \max(0, z\pi_3 + v_3),$$
where $v_3 \equiv u_3 + \alpha_1 v_1 + \alpha_2 v_2$. Under the assumptions given, $v_3$ is independent of $z$ and normally distributed. Thus, if we knew $\delta_1$ and $\delta_2$, we could consistently estimate $\alpha_1$, $\alpha_2$, and $\delta_3$ from a Tobit of $y_3$ on $z\delta_1$, $z\delta_2$, and $z_3$. From the usual argument, consistent estimators are obtained by using initial consistent estimators of $\delta_1$ and $\delta_2$. Estimation of $\delta_2$ is simple: just use OLS on the entire sample. Estimation of $\delta_1$ follows exactly as in Procedure 17.3 using the system
$$y_1 = z\delta_1 + v_1 \qquad (17.83)$$
$$y_3 = \max(0, z\pi_3 + v_3), \qquad (17.84)$$
where $y_1$ is observed only when $y_3 > 0$. Given $\hat\delta_1$ and $\hat\delta_2$, form $z_i\hat\delta_1$ and $z_i\hat\delta_2$ for each observation $i$ in the sample. Then, obtain $\hat\alpha_1$, $\hat\alpha_2$, and $\hat\delta_3$ from the Tobit of $y_{i3}$ on $(z_i\hat\delta_1)$, $(z_i\hat\delta_2)$, $z_{i3}$ using all observations. For identification, $(z\delta_1, z\delta_2, z_3)$ can contain no exact linear dependencies. A necessary condition is that there must be at least two elements in $z$ not
also in $z_3$. Obtaining the correct asymptotic variance matrix is complicated. It is most easily done in a generalized method of moments framework.
b. This is not very different from part a. The only difference is that $\delta_2$ must be estimated using Procedure 17.3. Then follow the steps from part a.
c. We need to estimate the variance of $u_3$, $\sigma_3^2$.
17.9. To be added.
17.11. a. There is no sample selection problem because, by definition, you have specified the distribution of $y$ given $x$ and $y > 0$. We only need to obtain a random sample from the subpopulation with $y > 0$.
b. Again, there is no sample selection bias because we have specified the conditional expectation for the population of interest. If we have a random sample from that population, NLS is generally consistent and $\sqrt{N}$-asymptotically normal.
c. We would use a standard probit model. Let $w = 1[y > 0]$. Then $w$ given $x$ follows a probit model with $P(w = 1|x) = \Phi(x\gamma)$.
d. $E(y|x) = P(y > 0|x)\cdot E(y|x, y > 0) = \Phi(x\gamma)\cdot\exp(x\beta)$. So we would plug in the NLS estimator of $\beta$ and the probit estimator of $\gamma$.
e. Not when you specify the conditional distributions, or conditional means, for the two parts. By definition, there is no sample selection problem. Confusion arises, I think, when two-part models are specified with unobservables that may be correlated. For example, we could write $y = w\cdot\exp(x\beta + u)$, $w = 1[x\gamma + v > 0]$,
so that $w = 0 \Rightarrow y = 0$. Assume that $(u, v)$ is independent of $x$. Then, if $u$ and $v$ are independent, so that $u$ is independent of $(x, w)$, we have $E(y|x, w) = w\cdot\exp(x\beta)E[\exp(u)|x, w] = w\cdot\exp(x\beta)E[\exp(u)]$. This implies the specification in part b (by setting $w = 1$, once we absorb $E[\exp(u)]$ into the intercept). The interesting twist here is if $u$ and $v$ are correlated. Given $w = 1$, we can write $\log(y) = x\beta + u$. So $E[\log(y)|x, w = 1] = x\beta + E(u|x, w = 1)$. If we make the usual linearity assumption, $E(u|v) = \rho v$, and assume a standard normal distribution for $v$, then we have the usual inverse Mills ratio added to the linear model:
$$E[\log(y)|x, w = 1] = x\beta + \rho\lambda(x\gamma).$$
A two-step strategy for estimating $\gamma$ and $\beta$ is pretty clear. First, estimate a probit of $w_i$ on $x_i$ to get $\hat\gamma$ and $\lambda(x_i\hat\gamma)$. Then, using the $y_i > 0$ observations, run the regression $\log(y_i)$ on $x_i$, $\lambda(x_i\hat\gamma)$ to obtain $\hat\beta$ and $\hat\rho$. A standard $t$ statistic on $\hat\rho$ is a simple test of $\mathrm{Cov}(u, v) = 0$.
This two-step procedure reveals a potential problem with the model that allows $u$ and $v$ to be correlated: adding the inverse Mills ratio means that we are adding a nonlinear function of $x$. In other words, identification of $\beta$ comes entirely from the nonlinearity of the IMR, which we warned about in this chapter. Ideally, we would have a variable that affects $P(w = 1|x)$ that can be excluded from $x\beta$. In labor economics, two-part models are used to allow for fixed costs of entering the labor market. There, one would try to find a variable that affects the fixed costs of being employed but does not affect the choice of hours.
If we assume $(u, v)$ is multivariate normal, with mean zero, then we can use a full maximum likelihood procedure. While this would be a little less
robust, making full distributional assumptions has a subtle advantage: we can then compute partial effects on $E(y|x)$ and $E(y|x, y > 0)$. Even with a full set of assumptions, the partial effects are not straightforward to obtain. For one, $E(y|x, y > 0) = \exp(x\beta)\cdot E[\exp(u)|x, w = 1]$, where $E[\exp(u)|x, w = 1]$ can be obtained under joint normality. A similar example is given in Section 19.5.2; see, particularly, equation (19.44). Then, we can multiply this expectation by $P(w = 1|x) = \Phi(x\gamma)$. The point is that we cannot simply look at $\beta$ to obtain partial effects of interest. This is very different from the sample selection model.
17.13. a. We cannot use censored Tobit because that requires observing $x$ whatever the value of $y$. Instead, we can use truncated Tobit: we use the distribution of $y$ given $x$ and $y > 0$. Notice that our reason for using truncated Tobit differs from the usual application. Usually, the underlying variable $y$ of interest has a conditional normal distribution in the population. Here, $y$ given $x$ follows a standard Tobit model in the population (for a corner solution outcome).
b. Provided $x$ varies enough in the subpopulation where $y > 0$ such that $\mathrm{rank}\,E(x'x|y > 0) = K$, we can estimate the parameters. In the case where an element of $x$ is a derived price, we need sufficient price variation for the population that consumes some of the good. Given such variation, we can estimate $E(y|x) = \Phi(x\beta/\sigma)x\beta + \sigma\phi(x\beta/\sigma)$ because we have made the assumption that $y$ given $x$ follows a Tobit in the full population.
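Problem 17.11(e) moves from $E(u|v) = \rho v$ to $E(u|x, w = 1) = \rho\lambda(x\gamma)$. That step relies on $E(v|v > -x\gamma) = \lambda(x\gamma) = \phi(x\gamma)/\Phi(x\gamma)$ when $v$ is standard normal. The Python sketch below uses only the standard library, and its helper names are illustrative rather than part of any econometrics package. It checks that identity by Simpson's-rule integration. It also codes the resulting conditional mean for scalar indices `xb` ($x\beta$) and `xg` ($x\gamma$).

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(z):
    """lambda(z) = phi(z) / Phi(z)."""
    return std_normal_pdf(z) / std_normal_cdf(z)


def truncated_mean_numeric(z, upper=12.0, n=20000):
    """E(v | v > -z) for v ~ Normal(0, 1), via composite Simpson's rule on [-z, upper].

    The common factor h/3 cancels in the ratio, so it is omitted; n must be even.
    The upper limit 12 truncates a tail whose mass is negligible.
    """
    a = -z
    h = (upper - a) / n
    num = 0.0
    den = 0.0
    for k in range(n + 1):
        v = a + k * h
        weight = 1.0 if k in (0, n) else (4.0 if k % 2 == 1 else 2.0)
        num += weight * v * std_normal_pdf(v)
        den += weight * std_normal_pdf(v)
    return num / den


def log_y_selected_mean(xb, xg, rho):
    """E[log(y) | x, w = 1] = x*beta + rho * lambda(x*gamma)."""
    return xb + rho * inverse_mills(xg)
```

In an actual two-step fit, $\hat\gamma$ would come from a probit. Then $\lambda(x_i\hat\gamma)$ would be added as a regressor in the OLS of $\log(y_i)$ on $x_i$ over the $y_i > 0$ observations. That estimation step is not shown here.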
CHAPTER 18
18.1. a. This follows from equation (18.5). First, $E(y_1|w = 1) = E(y|w = 1)$ and $E(y_0|w = 0) = E(y|w = 0)$. Therefore, by (18.5), $E(y|w = 1) - E(y|w = 0) = [E(y_0|w = 1) - E(y_0|w = 0)] + ATE_1$, and so the bias is given by the first term.
b. If $E(y_0|w = 1) < E(y_0|w = 0)$, those who participate in the program would have had lower average earnings without training than those who chose not to participate. This is a form of sample selection, and, on average, it leads to an underestimate of the impact of the program.
18.3. The following Stata session estimates $\alpha$ using the three different regression approaches. It would have made sense to add unem74 and unem75 to the vector x, but I did not do so:

. probit train re74 re75 age agesq nodegree married black hisp

Iteration 0:  log likelihood =     -302.1
Iteration 1:  log likelihood = -294.07642
Iteration 2:  log likelihood = -294.06748
Iteration 3:  log likelihood = -294.06748

Probit estimates                                  Number of obs   =        445
                                                  LR chi2(8)      =      16.07
                                                  Prob > chi2     =     0.0415
Log likelihood = -294.06748                       Pseudo R2       =     0.0266

------------------------------------------------------------------------------
   train |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    re74 |  -.0189577   .0159392     -1.19    0.234      -.0501979    .0122825
    re75 |   .0371871   .0271086      1.37    0.170      -.0159447     .090319
     age |   .0005467   .0534045      0.01    0.992      -.1041242    .1052176
   agesq |  -.0000719   .0008734     -0.08    0.934      -.0017837    .0016399
nodegree |    -.44195   .1515457     -2.92    0.004      -.7389742   -.1449258
 married |    .091519   .1726192      0.53    0.596      -.2468083    .4298464
   black |  -.1446253   .2271609     -0.64    0.524      -.5898524    .3006019
    hisp |  -.5004545   .3079227     -1.63    0.104      -1.103972    .1030629
   _cons |   .2284561   .8154273      0.28    0.779      -1.369752    1.826664
------------------------------------------------------------------------------
0450374 2.661324802 Residual  93.3009852 . 442) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 445 3. t P>t [95% Conf.95 0.661695 .129489 1. Std.79802041 3 .4775317 .0095 .3579151 .0244515 441 .6738951 .195208 .210939799 +Total  94.4877173 .04 0.2378099 0.015 .37 0. Std.1638736 .072 . Err.5004545 . Interval] +train  .0449 0.8154273 0.2284561 .1030629 _cons  .1066934 .21153806 +Total  94.369752 1.3151992 0.0375 0.0934459 .60 0.0123 . Pr(train)) .213564126 Number of obs F( 2.9204644 traphat0  .966 .0212673 .45993 unem78  Coef..84 0.13 0.8224719 444 .0217247 phat  .000 .4998223 442 .826664 . reg unem78 train re74 re75 age agesq nodegree married black hisp 116 .45928 unem78  Coef.018 . Err.222497 _cons  .4793509 1.104 1. t P>t [95% Conf.110242 .63 0. Dev.134 1.8224719 444 .1624018 .719599 .80 0.28 0.0181789 phat  .45 0.340 .213564126 Number of obs F( 3.0994803 3. gen traphat0 = train*(phat . reg unem78 train phat Source  SS df MS +Model  1.4572254 _cons  .5534283 .045039 2. Interval] +train  .0139 0.103972 .0101531 . Min Max +phat  445 .3184939 . 441) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 445 2.50 0.599340137 Residual  93. predict phat (option p assumed. sum phat Variable  Obs Mean Std.4155321 .0190 0. reg unem78 train phat traphat0 Source  SS df MS +Model  1.779 1.416) .233225 .3079227 1.1987593 .3226496 2 .hisp  .
      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  9,   435) =    2.75
       Model |  5.09784844     9  .566427604           Prob > F      =  0.0040
    Residual |  89.7246235   435  .206263502           R-squared     =  0.0538
-------------+------------------------------           Adj R-squared =  0.0342
       Total |  94.8224719   444  .213564126           Root MSE      =  .45416

------------------------------------------------------------------------------
      unem78 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |  -.1105582   .0444832     -2.49    0.013      -.1979868   -.0231295
        re74 |  -.0025525   .0053889     -0.47    0.636      -.0131441    .0080391
        re75 |   -.007121   .0094371     -0.75    0.451       -.025669    .0114269
         age |   .0304127   .0189565      1.60    0.109      -.0068449    .0676704
       agesq |  -.0004949   .0003098     -1.60    0.111      -.0011038    .0001139
    nodegree |   .0421444   .0550176      0.77    0.444      -.0659889    .1502777
     married |  -.0296401   .0620734     -0.48    0.633      -.1516412    .0923609
       black |    .180637   .0815002      2.22    0.027       .0204538    .3408202
        hisp |  -.0392887   .1078464     -0.36    0.716      -.2512535    .1726761
       _cons |   .2342579   .2905718      0.81    0.421      -.3368413    .8053572
------------------------------------------------------------------------------

In all three cases, the average treatment effect is estimated to be right around -.11: participating in job training is estimated to reduce the unemployment probability by about .11. Of course, in this example, training status was randomly assigned, so we are not surprised that different methods lead to roughly the same estimate. An alternative, of course, is to use a probit model for unem78 on train and x.
18.5. a. I used the following Stata session to answer all parts:

. probit train re74 re75 age agesq nodegree married black hisp

Iteration 0:  log likelihood =     -302.1
Iteration 1:  log likelihood = -294.07642
Iteration 2:  log likelihood = -294.06748
Iteration 3:  log likelihood = -294.06748

Probit estimates                                  Number of obs   =        445
                                                  LR chi2(8)      =      16.07
                                                  Prob > chi2     =     0.0415
Log likelihood = -294.06748                       Pseudo R2       =     0.0266
00 0.92 0.43 0. t P>t [95% Conf.89168 _cons  4.93248 27.008732134 118 Number of obs F( 8.1726192 0.1041242 agesq  .8517046 hisp  .08 0.3481955 re75  . Interval] +train  .613857 11.6566 444 43.004 .0534045 0.19 0.45109 re74  .934 .0161 6.2686905 +Total  19525.662979 4.0017837 nodegree  .656719 0.0000719 .203087 1.234 .1998802 .9992 = .87404126 8 .0159447 .2746971 0.2232733 .9410e06 +Total  3.0699177 18. Std.257878 .8804 435 43.157 5.554259 1.992 .1446253 . predict phat (option p assumed.3006019 hisp  . Pr(train)) .2284561 .2814839 0.0189577 .963 2.3400184 .369752 1.197362 Residual  18821.688 17.2468083 .467 . reg re78 train re74 re75 age agesq nodegree married black hisp (phat re74 re75 age agesq nodegree married black hisp) Instrumental variables (2SLS) regression Source  SS df MS +Model  703. Err. 436) Prob > F Rsquared Adj Rsquared Root MSE = 445 =69767.7389742 . Std.090319 age  .203039 0.0000 = 0.6396151 age  .0016399 .104 1.28 0.668 .0113738 .0024826 .0045238 0.210237 2.0624611 .1052176 .1602 . Err.05 0.43 0.936 7. reg phat re74 re75 age agesq nodegree married black hisp Source  SS df MS +Model  3. 435) Prob > F Rsquared Adj Rsquared Root MSE = = = = = = 445 1.42 0.1453799 0.759 .108893 black  2.098774 0.0122825 re75  .0371871 .0271086 1.0863775 .5898524 .4668602 .170 .050672 1.63 0.0005467 .3079227 1.44 = 0.482387 6.47144 0.01 0.1030629 _cons  .5779 re78  Coef.484255158 Residual  .44195 .091519 .87706754 444 .003026272 436 6.0008734 0.53 0.997 35.927734 married  .1449258 married  .train  Coef. z P>z [95% Conf.670 7.9767041 Number of obs F( 9.2953534 3.9992 = 0.779 1. Interval] +re74  .00263 .103972 .4298464 black  .31125 35.7397788 agesq  .5004545 .08 0.40 0.596 .0159392 1.1515457 2.00172 0.73 0.64 0.0763 0.826664 .0064086 nodegree  1.367622 3.55 0.37 0.31 0.0360 0.75 0.583 .0501979 .2271609 0.776258 9 78.8154273 0.524 .
The collinearity suspected in part b is confirmed by regressing Fi on the xi: the Rsquared is .004 . much smaller than when we used either linear regression or the propensity score in a regression in Example 18.14 0.04 0.) The very large standard error (18. t P>t [95% Conf.640..0345727 .0139209 .070.594057 b. Interval] +re74  .1732229 .0352802 .0000312 222.v)x.93 0.0000258 .000 .0003207 . again.99 0.1719806 married  .01 0.1850713 . ^ (When we do not instrument for train.31 0.v)Wvx. The IV estimate of a is very small .0571603 .0068687 re75  . y = h0 + xG + bw + wW(x  J)D + u + wWv + e. a = 1.00) suggests severe collinearity among the instruments. d.000 .z] = E[exp(p0 + xP1 + zP2 + p3v)Wvx.0005368 .92 0.00011 2.80e06 16.5874586 .0004726 118.000 .0000293 1. a. 18.0001046 agesq  . To be added.9.z) = E[E(wWvx.z.000 . We can start with equation (18.z] = xWexp(p0 + xP1 + zP2) where x = E[exp(p3v)Wv].0140283 age  .0000328 nodegree  .0138135 .0553027 hisp  .000316 546. Std.66). Generally.0069914 .0016786 351.phat  Coef.000 . But E(wWvx.9992.2.z.5907578 .1838453 . which means there is virtually no separate ^ variation in Fi that cannot be explained by xi.00036 98.0069301 .71 0. and we have 119 .z) and an error. ^ c.7. se = . and.000 . This example illustrates why trying to achieve identification off of a nonlinearity can be fraught with problems.0000546 254.1826192 _cons  .000 .0006238 294. 18.1726018 .82 0.000 .0562315 .z] = E[E(wx.625. it is not a good idea. we will replace wWv with its expectation given (x. Err.0359877 black  .
used the assumption that $v$ is independent of $(x, z)$. [Note that we do not need to replace $\pi_0$ with a different constant, as is implied in the statement of the problem.] So we can write
$$y = \eta_0 + x\gamma + \beta w + w\cdot(x - \psi)\delta + \xi E(w|x,z) + r,$$
where $r \equiv u + [w\cdot v - E(w\cdot v|x,z)] + e$. Given the assumptions, $E(r|x,z) = 0$.
b. The ATE $\beta$ is not identified by the IV estimator applied to the extended equation. If $h \equiv h(x,z)$ is any function of $(x,z)$, then $L(w|1, x, q, h) = L(w|q) = q$ because $q = E(w|x,z)$. In effect, because we need to include $E(w|x,z)$ in the estimating equation, no other functions of $(x,z)$ are valid as instruments. This is a clear weakness of the approach.
c. This is not what I intended to ask. What I should have said is, assume we can write $w = \exp(\pi_0 + x\pi_1 + z\pi_2 + g)$, where $E(u|g,x,z) = \rho\cdot g$ and $E(v|g,x,z) = \theta\cdot g$. These are standard linearity assumptions under independence of $(u, v, g)$ and $(x, z)$. Then we take the expected value of (18.66) conditional on $(g, x, z)$:
$$E(y|g,x,z) = \eta_0 + x\gamma + \beta w + w\cdot(x - \psi)\delta + E(u|g,x,z) + w\cdot E(v|g,x,z) + E(e|g,x,z) = \eta_0 + x\gamma + \beta w + w\cdot(x - \psi)\delta + \rho\cdot g + \theta w\cdot g,$$
where we have used the fact that $w$ is a function of $(g, x, z)$ and $E(e|g,x,z) = 0$. The last equation suggests a two-step procedure. First, since $\log(w_i) = \pi_0 + x_i\pi_1 + z_i\pi_2 + g_i$, we can consistently estimate $\pi_0$, $\pi_1$, and $\pi_2$ from the OLS regression $\log(w_i)$ on $1, x_i, z_i$, $i = 1, \dots, N$. From this regression, we need the residuals, $\hat{g}_i$, $i = 1, \dots, N$. In the second step, run the regression
$$y_i \ \text{on} \ 1, x_i, w_i, w_i(x_i - \bar{x}), \hat{g}_i, w_i\hat{g}_i, \qquad i = 1, \dots, N.$$
As usual, the coefficient on $w_i$ is the consistent estimator of $\beta$, the average treatment effect. A standard joint significance test, for example, an F-type
test on the last two terms, effectively tests the null hypothesis that $w$ is exogenous.
CHAPTER 19
19.1. a. This is a simple problem in univariate calculus. Write $q(\mu) \equiv \mu_o\log(\mu) - \mu$ for $\mu > 0$. Then $dq(\mu)/d\mu = \mu_o/\mu - 1$, so $\mu = \mu_o$ uniquely sets the derivative to zero. The second derivative of $q(\mu)$ is $-\mu_o\mu^{-2} < 0$ for all $\mu > 0$, so the sufficient second order condition is satisfied.
b. For the exponential case, $q(\mu) \equiv E[l_i(\mu)] = -\mu_o/\mu - \log(\mu)$. The first order condition is $\mu_o\mu^{-2} - \mu^{-1} = 0$, which is uniquely solved by $\mu = \mu_o$. The second derivative is $-2\mu_o\mu^{-3} + \mu^{-2}$, which, when evaluated at $\mu_o$, gives $-2\mu_o^{-2} + \mu_o^{-2} = -\mu_o^{-2} < 0$.
19.3. The following is Stata output used to answer parts a through f. The answers are given below.

. reg cigs lcigpric lincome restaurn white educ age agesq

      Source |       SS       df       MS              Number of obs =     807
-------------+------------------------------           F(  7,   799) =    6.38
       Model |  8029.43631     7  1147.06233           Prob > F      =  0.0000
    Residual |  143724.246   799  179.880158           R-squared     =  0.0529
-------------+------------------------------           Adj R-squared =  0.0446
       Total |  151753.683   806  188.280003           Root MSE      =  13.412

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.8509044   5.782321     -0.15    0.883      -12.20124    10.49943
     lincome |   .8690144   .7287636      1.19    0.233       -.561503    2.299532
    restaurn |  -2.865621   1.117406     -2.56    0.011      -5.059019   -.6722235
       white |  -.5592363   1.459461     -0.38    0.702      -3.424067    2.305594
        educ |  -.5017533   .1671677     -3.00    0.003       -.829893   -.1736136
         age |   .7745021   .1605158      4.83    0.000       .4594197    1.089585
       agesq |  -.0090686   .0017481     -5.19    0.000      -.0124999   -.0056373
       _cons |  -2.682435   24.22073     -0.11    0.912      -50.22621    44.86134
------------------------------------------------------------------------------
0 lincome = 0.000 .042796 restaurn  2.862469 .07 0.38 0.0335 lincome  .5191 log likelihood = 8111.0062048 _cons  2.86134 .1624097 3.002 .054396 0.412  Robust cigs  Coef.1380317 5.0017481 5.865621 1.10 0.4899 .22621 44.5035545 1.61 0.16145 .918 53.0 F( 2.017275 2.71 0.19 0.26472 2.8205533 .22 0. Err.3441 .5592363 1.0124999 .000 .90194 0.04545 agesq  .3047671 2.147 . robust Regression with robust standard errors Number of obs F( 7. 799) = Prob > F = 1. test lcigpric lincome ( 1) ( 2) lcigpric = 0.22073 0.005 4. poisson cigs lcigpric lincome restaurn white educ age agesq Iteration 0: Iteration 1: Iteration 2: log likelihood = 8111.8346 log likelihood = 8111.597972 1.682435 25.52632 48.519 Poisson regression Number of obs LR chi2(7) 122 = = 807 1068.685 3.8690144 . reg cigs lcigpric lincome restaurn white educ age agesq.0 lincome = 0. test lcigpric lincome ( 1) ( 2) lcigpric = 0.45 0.70 .41 0.0090686 .0119324 . Interval] +lcigpric  .0529 13.0014589 6. 799) = Prob > F = 0.1829532 age  .0 F( 2.82 0.888 12.146247 educ  .0090686 .000 .09 0. 799) Prob > F Rsquared Root MSE = = = = = 807 9.8509044 6.14 0.agesq  .8687741 white  .0000 0.0056373 _cons  2.5017533 . Std.7353 11.912 50. t P>t [95% Conf.7745021 .682435 24.378283 0.11 0.
1407338 2.257 .160812 lincome  . z P>z [95% Conf. Std. Err.6454 = 8111.10 0.3870061 . Interval] +lcigpric  .0218208 age  .1059607 .0014825 .518 .12272 cigs  Coef.74 0.1239969 agesq  .4248021 .6139626 0.519 Generalized linear models Optimization : ML: NewtonRaphson Deviance Pearson = = No.46933 16232.599794 .16 0.000 .0552011 .0374207 1.1686685 0.13 0.0042564 13.0970243 .002 .07 0. Interval] +lcigpric  .1433932 0.0914144 1.6463244 0.46367 20.6394391 .027457 5.76735 0.1750847 lincome  .0049694 22.0639772 .0703561 .519 = = 0.0013708 .65 0.000 .0008677 _cons  .3964493 2.820355 (Standard errors scaled using square root of Pearson X2based dispersion) * The estimate of sigma is 123 .0223989 5.000 .33 0. Err.519 = 8111.886 5.1285444 .0002567 5.31628 = 20.3636059 .58 0.743 .1142571 .0202811 5.010 . z P>z [95% Conf.1083 = 8111.1037275 .000057 24.99 0.1045172 .0191849 3. Std.0510802 age  .870 1.3964494 .34 0.158158 agesq  .140 .48 0.519022 = 14698.0594225 .0754414 .0012592 _cons  .000 .0312231 11.14 0.92274 = = = = = 807 799 1 18. glm cigs lcigpric lincome restaurn white educ age agesq.96 0.0013708 .0552012 .000 .1142571 .0677648 .1059607 .2753831 educ  .Log likelihood = Prob > chi2 Pseudo R2 8111.0618 cigs  Coef.0000 0.0877728 white  .1434779 restaurn  .65 0.3024098 white  . of obs Residual df Scale param (1/df) Deviance (1/df) Pearson 14752.3857854 .000 .460 .2828965 restaurn  . family(poisson) sca(x2) Iteration Iteration Iteration Iteration 0: 1: 2: 3: log log log log likelihood likelihood likelihood likelihood = 8380.10 0.372733 1.3636059 .001874 .0181421 educ  .70987 Variance function: V(u) = u Link function : g(u) = ln(u) Standard errors : OIM [Poisson] [Log] Log likelihood BIC AIC = 8111.8068952 1.11 0.0594225 .000 .1037275 .
* dividing by 20.0014458 .519) 27.1116754 .1211174 ..0308796 11.000 . z P>z [95% Conf.0013374 _cons  .9765587 .46367 20.000 .31628 .0532166 .95 0.2906 Poisson regression Number of obs LR chi2(5) Prob > chi2 Pseudo R2 Log likelihood = 8125.0602 cigs  Coef.0452489 age  .2906 = = = = 807 1041.0000 0.3555118 . * This is the usual LR statistic.519 Generalized linear models Optimization : ML: NewtonRaphson Deviance Pearson = = No.70987 Variance function: V(u) = u Link function : g(u) = ln(u) Standard errors : Sandwich [Poisson] [Log] 124 = = = = = 807 799 1 18.16 0.0048175 25.6454 = 8111. poisson cigs restaurn white educ age agesq Iteration 0: Iteration 1: Iteration 2: log likelihood = 8125. glm cigs lcigpric lincome restaurn white educ age agesq. di sqrt(20.65 0.32) 1.000 .0618025 .2940107 white  .618 log likelihood = 8125. .291 .1350483 .14 0.48 0. family(poisson) robust Iteration Iteration Iteration Iteration 0: 1: 2: 3: log log log log likelihood likelihood likelihood likelihood = 8380. of obs Residual df Scale param (1/df) Deviance (1/df) Pearson 14752. di 2*(8125.1095991 6.7617484 .0015543 .46933 16232.32: The GLM version is obtained by .0040652 13. Err.0611842 .000 .291 .5077711 . Interval] +restaurn  .32) 4.037371 1.544 .098 .1083 = 8111.519)/(20.0114433 educ  .2907 log likelihood = 8125.1305594 agesq  .8111. Std.14 0.5469381 . di 2*(8125.09 0.519 = 8111.000 .4150564 .8111.3545336 .0000553 26.
based on the usual Poisson standard errors. Std.1558715 agesq  .00137) 41.0192058 3.2669906 restaurn  . they are significantly correlated.0212322 5.60 0. Err.1059607 .000 .000 .104.0002446 5.0595355 .1632959 0.1142571 . While the price variable is still very insignificant (pvalue = .0018503 .25 0. is very significant: t = 5.a binary indicator for restaurant smoking restrictions at the state level . on the order of 2.Log likelihood BIC = 8111. z P>z [95% Conf.735 . and.213 .874 1.3964493 2. not surprisingly.34 0.16 0. The two variables are jointly insignificant.3752553 .then log(cigpric) becomes much more significant (but using the incorrect standard errors). .6387182 .0970653 . using the usual and heteroskedasticityrobust tests (pvalues = .106 and the estimated income elasticity is .519022 = 14698. di .) 125 .1037275 . too. (States that have restaurant smoking restrictions also have higher average prices. although the coefficient estimates are the expected sign.0884937 white  .438442 6. In this data set.010 . Both estimates are elasticities: the estimate price elasticity is .715328 a.0008914 _cons  .490.0726427 .0217798 age  .083299 1.415575 1.0552011 .3636059 .344. the income variable.1143/(2*. Neither the price nor income variable is significant at any reasonable significance level.92274 AIC = 20.6681827 0.0594225 . It does not matter whether we use the usual or robust standard errors.12272  Robust cigs  Coef.38 0.9%.002 .23134 .46). both cigpric and restaurn vary only at the state level.203653 lincome  .0013708 .09 0. b.264853 educ  .13 0.59 0. Interval] +lcigpric  .11.140366 2. Incidentally.97704 0. if you drop restaurn . respectively).894 5.
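Part b treats the Poisson coefficients on lcigpric and lincome (about -.106 and .104) as elasticities. That works because, with an exponential mean $E(y|x) = \exp(\cdots + \beta_1\log(\mathrm{price}) + \beta_2\log(\mathrm{income}))$, we have $\partial\log E(y|x)/\partial\log(\mathrm{price}) = \beta_1$. The Python sketch below confirms this numerically. Its price and income values are made up for illustration, and the function names are hypothetical.

```python
import math


def exp_mean(b_price, b_income, price, income, const=0.0):
    """Exponential conditional mean with logged regressors."""
    return math.exp(const + b_price * math.log(price) + b_income * math.log(income))


def elasticity_numeric(f, x, h=1e-6):
    """d log f(x) / d log x, by a central difference in log(x)."""
    lx = math.log(x)
    return (math.log(f(math.exp(lx + h))) - math.log(f(math.exp(lx - h)))) / (2.0 * h)
```

Because $\log E(y|x)$ is exactly linear in $\log(\mathrm{price})$ here, the numerical elasticity matches the coefficient at any evaluation point.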
c. The GLM estimate of $\sigma$ is $\hat\sigma = 4.51$. This means all of the Poisson standard errors should be multiplied by this factor, as is done using the "glm" command in Stata with the option "sca(x2)". The t statistic on lcigpric is now very small (-.16), and that on lincome falls to 1.13. This is much more in line with the linear model t statistic (1.19 with the usual standard errors). Clearly, using the maximum likelihood standard errors is very misleading in this example. With the GLM standard errors, the restaurant restriction variable, education, and the age variables are still significant. (Interestingly, there is no race effect, conditional on the other covariates.)
d. The usual LR statistic is $2(8125.291 - 8111.519) = 27.54$, which is a very large value in a $\chi^2_2$ distribution (p-value $\approx$ 0). The QLR statistic divides the usual LR statistic by $\hat\sigma^2 = 20.32$, so QLR = 1.36 (p-value $\approx$ .51). As expected, the QLR statistic shows that the variables are jointly insignificant, while the LR statistic shows strong significance.
e. Using the robust standard errors does not significantly change any conclusions; in fact, most explanatory variables become slightly more significant than when we use the GLM standard errors. In this example, it is the adjustment by $\hat\sigma > 1$ that makes the most difference. Having fully robust standard errors has no additional effect.
f. We simply compute the turning point for the quadratic: $\hat\beta_{age}/(-2\hat\beta_{age^2}) = .1143/(2(.00137)) \approx 41.72$.
g. A double hurdle model, which separates the initial decision to smoke at all from the decision of how much to smoke, seems like a good idea and is certainly worth investigating. One approach is to model $D(y|x, y \ge 1)$ as a truncated Poisson distribution, and then to model $P(y = 0|x)$ as a logit or probit.
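The statistics in parts d and f are simple arithmetic on numbers reported above. The Python sketch below reproduces them; the function names are illustrative. It uses the fact that the upper-tail probability of a $\chi^2_2$ variable is exactly $\exp(-x/2)$, so no statistics library is needed.

```python
import math


def lr_statistic(llf_restricted, llf_unrestricted):
    """Usual likelihood ratio statistic 2 * (L_unrestricted - L_restricted)."""
    return 2.0 * (llf_unrestricted - llf_restricted)


def chi2_2df_pvalue(stat):
    """Upper-tail p-value of a chi-square with 2 degrees of freedom: exp(-stat/2)."""
    return math.exp(-stat / 2.0)


def quadratic_turning_point(b_linear, b_square):
    """x where b_linear*x + b_square*x**2 peaks (b_square < 0): -b_linear / (2*b_square)."""
    return -b_linear / (2.0 * b_square)


# Values reported in parts d and f of Problem 19.3.
lr = lr_statistic(-8125.291, -8111.519)
qlr = lr / 20.32                       # divide by sigma-hat squared from the GLM fit
tp = quadratic_turning_point(0.1143, -0.00137)
```

The gap between `chi2_2df_pvalue(lr)` (essentially zero) and `chi2_2df_pvalue(qlr)` (about .51) is the whole story of part d.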
19.5. a. We just use iterated expectations:
$$E(y_{it}|x_i) = E[E(y_{it}|x_i, c_i)|x_i] = E(c_i|x_i)\exp(x_{it}\beta) = \exp(\alpha + \bar{x}_i\gamma)\exp(x_{it}\beta) = \exp(\alpha + x_{it}\beta + \bar{x}_i\gamma).$$
b. We are explicitly testing $H_0: \gamma = 0$, but we are maintaining full independence of $c_i$ and $x_i$ under $H_0$. We have enough assumptions to derive $\mathrm{Var}(y_i|x_i)$, the $T \times T$ conditional variance matrix of $y_i$ given $x_i$ under $H_0$. First,
$$\mathrm{Var}(y_{it}|x_i) = E[\mathrm{Var}(y_{it}|x_i, c_i)|x_i] + \mathrm{Var}[E(y_{it}|x_i, c_i)|x_i] = E[c_i\exp(x_{it}\beta)|x_i] + \mathrm{Var}[c_i\exp(x_{it}\beta)|x_i] = \exp(\alpha + x_{it}\beta) + \tau^2[\exp(x_{it}\beta)]^2,$$
where $\tau^2 \equiv \mathrm{Var}(c_i)$ and we have used $E(c_i|x_i) = \exp(\alpha)$ under $H_0$. A similar, general expression holds for conditional covariances:
$$\mathrm{Cov}(y_{it}, y_{ir}|x_i) = E[\mathrm{Cov}(y_{it}, y_{ir}|x_i, c_i)|x_i] + \mathrm{Cov}[E(y_{it}|x_i, c_i), E(y_{ir}|x_i, c_i)|x_i] = 0 + \mathrm{Cov}[c_i\exp(x_{it}\beta), c_i\exp(x_{ir}\beta)|x_i] = \tau^2\exp(x_{it}\beta)\exp(x_{ir}\beta).$$
So, under $H_0$, $\mathrm{Var}(y_i|x_i)$ depends on $\alpha$, $\beta$, and $\tau^2$, all of which we can estimate. It is natural to use a score test of $H_0: \gamma = 0$. First, obtain consistent estimators $\tilde\alpha$, $\tilde\beta$ by, say, pooled Poisson QMLE. Let $\tilde{y}_{it} = \exp(\tilde\alpha + x_{it}\tilde\beta)$ and $\tilde{u}_{it} = y_{it} - \tilde{y}_{it}$. A consistent estimator of $\tau^2$ can be obtained from a simple pooled regression, through the origin, of $\tilde{u}_{it}^2 - \tilde{y}_{it}$ on $[\exp(x_{it}\tilde\beta)]^2$, $t = 1, \dots, T$; $i = 1, \dots, N$. Call this estimator $\tilde\tau^2$. This works because, under $H_0$, $E(u_{it}^2|x_i) = E(u_{it}^2|x_{it}) = \exp(\alpha + x_{it}\beta) + \tau^2[\exp(x_{it}\beta)]^2$, where $u_{it} \equiv y_{it} - E(y_{it}|x_{it})$. [We could also use the many covariance terms in estimating $\tau^2$ because $\tau^2 = E\{[u_{it}/\exp(x_{it}\beta)][u_{ir}/\exp(x_{ir}\beta)]\}$, all $t \ne r$.]
Next, we construct the $T \times T$ weighting matrix for observation $i$, as in Section 19.6.3; see also Problem 12.11. The matrix $W_i(\tilde\delta) = W(x_i, \tilde\delta)$ has diagonal elements $\tilde{y}_{it} + \tilde\tau^2[\exp(x_{it}\tilde\beta)]^2$, $t = 1, \dots, T$, and off-diagonal elements $\tilde\tau^2\exp(x_{it}\tilde\beta)\exp(x_{ir}\tilde\beta)$, $t \ne r$. Let $\tilde\alpha$, $\tilde\beta$ be the solutions to
$$\min_{\alpha,\beta}\ (1/2)\sum_{i=1}^{N}[y_i - m(x_i, \alpha, \beta)]'[W_i(\tilde\delta)]^{-1}[y_i - m(x_i, \alpha, \beta)],$$
where $m(x_i, \alpha, \beta)$ has $t$th element $\exp(\alpha + x_{it}\beta)$. Since $\mathrm{Var}(y_i|x_i) = W(x_i, \delta)$, this is an MWNLS estimation problem with a correctly specified conditional variance matrix. Therefore, as shown in Problem 12.1, the conditional information matrix equality holds.
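The first-step quantities in part b are easy to compute directly. The sketch below is a pure-Python illustration with hypothetical function names. It assumes the pooled Poisson fitted values $\tilde{y}_{it}$ and the factors $\exp(x_{it}\tilde\beta)$ have already been computed. It returns the through-the-origin estimate $\tilde\tau^2$ and builds one unit's $T \times T$ matrix $W_i(\tilde\delta)$.

```python
def tau2_through_origin(y, yhat, exb):
    """Pooled OLS through the origin of (u~_it^2 - y~_it) on [exp(x_it b~)]^2.

    y, yhat, exb: flat lists over all (i, t) pairs holding
    y_it, y~_it = exp(a~ + x_it b~), and exp(x_it b~).
    """
    num = 0.0
    den = 0.0
    for y_it, yh_it, e_it in zip(y, yhat, exb):
        dep = (y_it - yh_it) ** 2 - yh_it     # u~_it^2 - y~_it
        reg = e_it ** 2                       # [exp(x_it b~)]^2
        num += reg * dep
        den += reg * reg
    return num / den


def weighting_matrix(yhat_i, exb_i, tau2):
    """T x T matrix W_i: diagonal y~_it + tau2*e_it^2, off-diagonal tau2*e_it*e_ir."""
    T = len(yhat_i)
    W = [[tau2 * exb_i[t] * exb_i[r] for r in range(T)] for t in range(T)]
    for t in range(T):
        W[t][t] += yhat_i[t]
    return W
```

A production version would use a matrix library. Plain lists are used here only to keep the sketch self-contained.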
To obtain the score test in the context of MWNLS, we need the score of the conditional mean function, with respect to all parameters, evaluated under $H_0$. Then, we can apply equation (12.69). Let $\theta \equiv (\alpha, \beta', \gamma')'$ denote the full vector of conditional mean parameters, where we want to test $H_0: \gamma = 0$. The unrestricted conditional mean function, for each $t$, is
$$m_t(x_i, \theta) = \exp(\alpha + x_{it}\beta + \bar{x}_i\gamma).$$
Taking the gradient and evaluating it under $H_0$ gives
$$\nabla_\theta m_t(x_i, \tilde\theta) = \exp(\tilde\alpha + x_{it}\tilde\beta)[1, x_{it}, \bar{x}_i],$$
which would be $1 \times (1 + 2K)$ without any redundancies in $\bar{x}_i$. Usually, $x_{it}$ would contain year dummies or other aggregate effects, and these would be dropped from $\bar{x}_i$; we do not make that explicit here. Let $\nabla_\theta M(x_i, \tilde\theta)$ denote the $T \times (1 + 2K)$ matrix obtained from stacking the $\nabla_\theta m_t(x_i, \tilde\theta)$ from $t = 1, \dots, T$. Then the score function, evaluated at the null estimates $\tilde\theta \equiv (\tilde\alpha, \tilde\beta', \tilde\gamma')'$ with $\tilde\gamma = 0$, is
$$s_i(\tilde\theta) = \nabla_\theta M(x_i, \tilde\theta)'[W_i(\tilde\delta)]^{-1}\tilde{u}_i,$$
where $\tilde{u}_i$ is the $T \times 1$ vector with elements $\tilde{u}_{it} \equiv y_{it} - \exp(\tilde\alpha + x_{it}\tilde\beta)$. The estimated conditional Hessian, under $H_0$, is
$$\tilde{A} = N^{-1}\sum_{i=1}^{N}\nabla_\theta M(x_i, \tilde\theta)'[W_i(\tilde\delta)]^{-1}\nabla_\theta M(x_i, \tilde\theta),$$
a $(1 + 2K) \times (1 + 2K)$ matrix. The score or LM statistic is therefore
$$LM = \Big(\sum_{i=1}^{N}\nabla_\theta M(x_i, \tilde\theta)'[W_i(\tilde\delta)]^{-1}\tilde{u}_i\Big)'\Big(\sum_{i=1}^{N}\nabla_\theta M(x_i, \tilde\theta)'[W_i(\tilde\delta)]^{-1}\nabla_\theta M(x_i, \tilde\theta)\Big)^{-1}\Big(\sum_{i=1}^{N}\nabla_\theta M(x_i, \tilde\theta)'[W_i(\tilde\delta)]^{-1}\tilde{u}_i\Big).$$
Under $H_0$, and the full set of maintained assumptions, $LM \overset{a}{\sim} \chi^2_K$. If only $J < K$ elements of $\bar{x}_i$ are included, then the degrees of freedom are reduced to $J$.
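The $N^{-1}$ factors cancel, so the LM statistic is just a quadratic form in the summed scores. The sketch below assumes the per-unit pieces $\nabla_\theta M_i'W_i^{-1}\tilde{u}_i$ and $\nabla_\theta M_i'W_i^{-1}\nabla_\theta M_i$ are already available as plain Python lists. The helper names are hypothetical, and a small Gauss-Jordan solver stands in for a linear-algebra library.

```python
def solve_linear(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting (A square, nonsingular)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(b[k])] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def lm_statistic(scores, hessian_terms):
    """LM = (sum_i s_i)' (sum_i A_i)^{-1} (sum_i s_i).

    scores: per-unit vectors D_i' W_i^{-1} u~_i.
    hessian_terms: per-unit matrices D_i' W_i^{-1} D_i.
    """
    p = len(scores[0])
    s = [sum(si[k] for si in scores) for k in range(p)]
    A = [[sum(Ai[j][k] for Ai in hessian_terms) for k in range(p)] for j in range(p)]
    z = solve_linear(A, s)
    return sum(sk * zk for sk, zk in zip(s, z))
```

The resulting value would be compared with $\chi^2_K$ critical values (or $\chi^2_J$ if only $J$ elements of $\bar{x}_i$ are used).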
In practice, we might want a robust form of the test that does not require $\mathrm{Var}(y_i|x_i) = W(x_i, \delta)$ under $H_0$, where $W(x_i, \delta)$ is the matrix described above. This variance matrix was derived under pretty restrictive assumptions. A fully robust form is given in equation (12.68), where $s_i(\tilde\theta)$ and $\tilde{A}$ are as given above, and $\tilde{B} = N^{-1}\sum_{i=1}^{N}s_i(\tilde\theta)s_i(\tilde\theta)'$. Since the restrictions are written as $\gamma = 0$, we take $c(\theta) = \gamma$, and so $\tilde{C} = [0 \mid I_K]$, where the zero matrix is $K \times (1 + K)$.
c. If we assume (19.60), (19.61) and $c_i = a_i\exp(\alpha + \bar{x}_i\gamma)$ where $a_i|x_i \sim \mathrm{Gamma}(\delta,\delta)$, then things are even easier, at least if we have software that estimates random effects Poisson models. Under these assumptions, we have

$y_{it}|x_i,a_i \sim \mathrm{Poisson}[a_i\exp(\alpha + x_{it}\beta + \bar{x}_i\gamma)]$
$y_{it}, y_{ir}$ are independent conditional on $(x_i,a_i)$, $t \ne r$
$a_i|x_i \sim \mathrm{Gamma}(\delta,\delta)$.

In other words, the full set of random effects Poisson assumptions holds, but where the mean function in the Poisson distribution is $a_i\exp(\alpha + x_{it}\beta + \bar{x}_i\gamma)$. In practice, we just add the (nonredundant elements of) $\bar{x}_i$ in each time period, along with a constant and $x_{it}$, and carry out a random effects Poisson analysis. We can test H0: $\gamma = 0$ using the LR, Wald, or score approaches. Any of these would be asymptotically efficient. But none is robust because we have used a full distribution for $y_i$ given $x_i$.
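The practical recipe in part c amounts to augmenting the regressors in every period. A small sketch, assuming the panel is stored as a nested list (units, then periods, then the K regressors); the helper name is hypothetical, and it does not drop redundant columns such as time averages of year dummies, which would still have to be removed before the random effects Poisson estimation.

```python
def augment_with_time_averages(x):
    """Build rows [1, x_it, xbar_i] for every unit i and period t.

    x: list over units of lists over periods of length-K regressor rows.
    Redundant columns (e.g. averages of year dummies) are NOT removed here.
    """
    rows = []
    for unit in x:
        n_periods = len(unit)
        k = len(unit[0])
        # time average of each regressor for this unit
        xbar = [sum(row[j] for row in unit) / n_periods for j in range(k)]
        for row in unit:
            rows.append([1.0] + list(row) + xbar)
    return rows
```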
19.7. a. First, for each t, the density of $y_{it}$ given $(x_i = x, c_i = c)$ is

$$f(y_t|x,c;\beta_o) = \exp[-c\,m(x_t,\beta_o)][c\,m(x_t,\beta_o)]^{y_t}/y_t!, \quad y_t = 0,1,2,\ldots.$$

Multiplying these together gives the joint density of $(y_{i1},\ldots,y_{iT})$ given $(x_i = x, c_i = c)$. Taking the log, plugging in the observed data for observation i, and dropping the factorial term gives

$$\ell_i(c_i,\beta) = \sum_{t=1}^T \{-c_i m(x_{it},\beta) + y_{it}[\log(c_i) + \log(m(x_{it},\beta))]\}.$$
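b. Taking the derivative of $\ell_i(c_i,\beta)$ with respect to $c_i$, setting the result to zero, and rearranging gives

$$n_i/c_i = \sum_{t=1}^T m(x_{it},\beta),$$

where $n_i \equiv \sum_{t=1}^T y_{it}$. Letting $c_i(\beta)$ denote the solution as a function of $\beta$, we have $c_i(\beta) = n_i/M_i(\beta)$, where $M_i(\beta) \equiv \sum_{t=1}^T m(x_{it},\beta)$. The second order sufficient condition for a maximum is easily seen to hold: the second derivative, $-n_i/c_i^2$, is negative when $n_i > 0$.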
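A quick numerical check of part b for one unit, using made-up values of $y_{it}$ and $m(x_{it},\beta)$ at some fixed β: $c_i(\beta) = n_i/M_i(\beta)$ should give a higher value of $\ell_i$ than nearby values of $c_i$. The function names are illustrative, and the sketch assumes $n_i > 0$ so that $\log(c_i)$ is defined.

```python
import math


def ell_i(c, y, m):
    """Poisson log likelihood for one unit given c_i, factorial term dropped."""
    return sum(-c * mt + yt * (math.log(c) + math.log(mt)) for yt, mt in zip(y, m))


def c_hat(y, m):
    """Maximizer c_i(beta) = n_i / M_i(beta) from the first order condition."""
    return sum(y) / sum(m)
```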
c. Plugging the solution from part b into $\ell_i(c_i,\beta)$ gives

$$\begin{aligned}
\ell_i[c_i(\beta),\beta] &= -[n_i/M_i(\beta)]M_i(\beta) + \sum_{t=1}^T y_{it}\{\log[n_i/M_i(\beta)] + \log[m(x_{it},\beta)]\} \\
&= -n_i + n_i\log(n_i) + \sum_{t=1}^T y_{it}\log[m(x_{it},\beta)/M_i(\beta)] \\
&= \sum_{t=1}^T y_{it}\log[p_t(x_i,\beta)] + n_i[\log(n_i) - 1],
\end{aligned}$$

because $p_t(x_i,\beta) = m(x_{it},\beta)/M_i(\beta)$ [see equation (19.66)].

d. From part c it follows that if we maximize $\sum_{i=1}^N \ell_i(c_i,\beta)$ with respect to $(c_1,\ldots,c_N)$, that is, we concentrate out these parameters, we get exactly $\sum_{i=1}^N \ell_i[c_i(\beta),\beta]$. But, except for the term $\sum_{i=1}^N n_i[\log(n_i) - 1]$, which does not depend on $\beta$, this is exactly the conditional log likelihood for the conditional multinomial distribution obtained in Section 19.6.4. Therefore, this is another case where treating the $c_i$ as parameters to be estimated leads us to a $\sqrt{N}$-consistent, asymptotically normal estimator of $\beta_o$.
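The identity in part c can be checked numerically. The two functions below (hypothetical names) compute the concentrated log likelihood directly and through the multinomial form plus the β-free term $n_i[\log(n_i) - 1]$; the inputs are made-up values with $n_i > 0$ and every $m(x_{it},\beta) > 0$.

```python
import math


def concentrated_ll(y, m):
    """l_i[c_i(beta), beta], plugging c_i(beta) = n_i / M_i(beta) into l_i."""
    c = sum(y) / sum(m)
    return sum(-c * mt + yt * (math.log(c) + math.log(mt)) for yt, mt in zip(y, m))


def multinomial_ll_plus_const(y, m):
    """sum_t y_it log p_t + n_i (log n_i - 1), with p_t = m_t / M_i."""
    n = sum(y)
    total = sum(m)
    return sum(yt * math.log(mt / total) for yt, mt in zip(y, m)) + n * (math.log(n) - 1)
```

Rescaling all the $m(x_{it},\beta)$ by a common factor leaves the concentrated log likelihood unchanged, which reflects that β enters only through the probabilities $p_t(x_i,\beta)$.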
19.9. I will use the following Stata output. I first converted the dependent variable to be in [0,1], rather than [0,100]. This is required to easily use the "glm" command in Stata.

. replace atndrte = atndrte/100
(680 real changes made)

. reg atndrte ACT priGPA frosh soph

      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  4,   675) =   72.92
       Model |  5.95396289     4  1.48849072           Prob > F      =  0.0000
    Residual |  13.7777696   675  .020411511           R-squared     =  0.3017
-------------+------------------------------           Adj R-squared =  0.2976
       Total |  19.7317325   679  .029059989           Root MSE      =  .14287

------------------------------------------------------------------------------
     atndrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |  -.0169202    .001681   -10.07   0.000    -.0202207   -.0136196
      priGPA |   .1820163   .0112156    16.23   0.000     .1599947    .2040379
       frosh |   .0517097   .0173019     2.99   0.003     .0177377    .0856818
        soph |   .0110085    .014485     0.76   0.448    -.0174327    .0394496
       _cons |   .7087769   .0417257    16.99   0.000     .6268492    .7907046
------------------------------------------------------------------------------

. predict atndrteh
(option xb assumed; fitted values)

. sum atndrteh

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
    atndrteh |     680    .8170956   .0936415   .4846666   1.086443

. count if atndrteh > 1
  12

. glm atndrte ACT priGPA frosh soph, family(binomial) sca(x2)
note: atndrte has non-integer values

Iteration 0:   log likelihood = -226.64509
Iteration 1:   log likelihood = -223.64983
Iteration 2:   log likelihood = -223.64937
Iteration 3:   log likelihood = -223.64937

Generalized linear models                          No. of obs      =       680
Optimization     : ML: Newton-Raphson              Residual df     =       675
                                                   Scale param     =         1
Deviance         =  285.7371358                    (1/df) Deviance =  .4233143
Pearson          =  85.57283238                    (1/df) Pearson  =  .1267746

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]
Standard errors  : OIM

Log likelihood   = -223.6493665                    AIC             =  .6724981

------------------------------------------------------------------------------
     atndrte |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |  -.1113802   .0113217    -9.84   0.000    -.1335703   -.0891901
      priGPA |   1.244375   .0771321    16.13   0.000     1.093199    1.395552
       frosh |   .3899318    .113436     3.44   0.001     .1676013    .6122622
        soph |   .0928127   .0944066     0.98   0.326    -.0922209    .2778463
       _cons |   .7621699   .2859966     2.66   0.008      .201627    1.322713
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion)

. di exp(.7622 - .1114*30 + 1.244*3)/(1 + exp(.7622 - .1114*30 + 1.244*3))
.75991253

. di exp(.7622 - .1114*25 + 1.244*3)/(1 + exp(.7622 - .1114*25 + 1.244*3))
.84673249

. predict atndh
(option mu assumed; predicted mean atndrte)

. sum atndh

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       atndh |     680    .8170956   .0965356   .3499525   .9697185

. corr atndrte atndh
(obs=680)

             |  atndrte    atndh
-------------+------------------
     atndrte |   1.0000
       atndh |   0.5725   1.0000

. di (.5725)^2
.32775625
a. The coefficient on ACT means that if the ACT score increases by 5 points, more than a one standard deviation increase, then the attendance rate is estimated to fall by about .017(5) = .085, or 8.5 percentage points. The coefficient on priGPA means that if prior GPA is one point higher, the attendance rate is predicted to be about .182 higher, or 18.2 percentage points. Naturally, these changes do not always make sense when starting at extreme values of atndrte. There are 12 fitted values greater than one; none is less than zero.

b. The GLM standard errors are given in the output. Note that $\hat\sigma^2 \approx .127$ (the "(1/df) Pearson" entry), so $\hat\sigma \approx .356$. In other words, the usual MLE standard errors, obtained, say, from the expected Hessian of the quasi-log likelihood, are much too large. The standard errors that account for $\hat\sigma^2 < 1$ are given by the GLM output. (If you omit the "sca(x2)" option in the "glm" command, you will get the usual MLE standard errors.)

c. Since the coefficient on ACT is negative, we know that an increase in ACT score, holding year and prior GPA fixed, actually reduces the predicted attendance rate. The calculation (with priGPA = 3 and frosh = soph = 0) shows that when ACT increases from 25 to 30, the estimated fall in atndrte is about .087, or about 8.7 percentage points. This is very similar to that found using the linear model.

d. The R-squared for the linear model is about .302. For the logistic functional form, I computed the squared correlation between $atndrte_i$ and $\hat{E}(atndrte_i|x_i)$. This R-squared is about .328, and so the logistic functional form does fit better than the linear model. And, remember that the parameters in the logistic functional form are not chosen to maximize an R-squared.
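The `di` calculations can be reproduced outside Stata. This sketch uses the rounded logit coefficients exactly as in the `di` lines (constant .7622, ACT -.1114, priGPA 1.244, with priGPA = 3 and frosh = soph = 0); the helper names are illustrative. It also includes a small function for the squared-correlation R-squared used in part d.

```python
import math

# Rounded logit estimates, as typed into the Stata "di" commands
CONS, B_ACT, B_PRIGPA = 0.7622, -0.1114, 1.244


def logistic(z):
    """exp(z) / (1 + exp(z)), the logit mean function."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_atndrte(act, prigpa=3.0):
    """Predicted attendance rate with frosh = soph = 0."""
    return logistic(CONS + B_ACT * act + B_PRIGPA * prigpa)


def squared_corr(y, yhat):
    """Squared sample correlation between y and fitted values (an R-squared)."""
    n = len(y)
    my, mh = sum(y) / n, sum(yhat) / n
    sxy = sum((a - my) * (b - mh) for a, b in zip(y, yhat))
    sxx = sum((a - my) ** 2 for a in y)
    syy = sum((b - mh) ** 2 for b in yhat)
    return sxy * sxy / (sxx * syy)


change_logit = predicted_atndrte(30) - predicted_atndrte(25)  # logit: ACT 25 -> 30
change_linear = -0.017 * 5                                    # linear model, 5 ACT points
```

Here `change_logit` is about -.087 and `change_linear` is -.085, matching parts a and c.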
SOLUTIONS TO CHAPTER 20 PROBLEMS

20.1. a. If all durations in the sample are censored, $d_i = 0$ for all i, and so the log-likelihood is $\sum_{i=1}^N \log[1 - F(c_i|x_i;\theta)]$.

b. For the Weibull case, $F(t|x_i;\theta) = 1 - \exp[-\exp(x_i\beta)t^\alpha]$, and so the log-likelihood is $-\sum_{i=1}^N \exp(x_i\beta)c_i^\alpha$.

c. Without covariates, the Weibull log-likelihood with complete censoring is $-\exp(\beta)\sum_{i=1}^N c_i^\alpha$. Since $c_i > 0$, $\sum_{i=1}^N c_i^\alpha > 0$ for any $\alpha > 0$. But then, for any $\alpha > 0$, the log-likelihood is maximized by minimizing $\exp(\beta)$ across $\beta$. But as $\beta \to -\infty$, $\exp(\beta) \to 0$. So plugging any value $\alpha$ into the log likelihood will lead to $\beta$ getting more and more negative without bound. So no two real numbers for $\alpha$ and $\beta$ maximize the log likelihood.

d. It is not possible to estimate duration models from flow data when all durations are right censored.

20.3. To be added.

20.5. a. For $b - a_i < t < c_i$,
$$P(t_i \le t|x_i,a_i,c_i,s_i = 1) = P(t_i^* \le t|x_i,t_i^* > b - a_i) = \frac{P(b - a_i < t_i^* \le t|x_i)}{P(t_i^* > b - a_i|x_i)} = \frac{F(t|x_i) - F(b - a_i|x_i)}{1 - F(b - a_i|x_i)}.$$

b. The derivative of the cdf in part a, with respect to t, is simply $f(t|x_i)/[1 - F(b - a_i|x_i)]$.

c. $P(t_i = c_i|x_i,a_i,c_i,s_i = 1) = P(t_i^* > c_i|x_i)/P(t_i^* > b - a_i|x_i)$ (because $c_i > b - a_i$) $= [1 - F(c_i|x_i)]/[1 - F(b - a_i|x_i)]$.

d. To be added.
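Two small numerical illustrations. The first evaluates the completely censored Weibull log likelihood without covariates, $-\exp(\beta)\sum_i c_i^\alpha$, at a fixed α and increasingly negative β, showing that it rises toward 0 without ever attaining a maximum. The second checks, for an assumed exponential duration distribution $F(t) = 1 - \exp(-\lambda t)$ (our choice, not part of the problem), that $f(t|x_i)/[1 - F(b - a_i|x_i)]$ is the derivative of the truncated cdf under stock sampling. All function names and parameter values are hypothetical.

```python
import math


def weibull_censored_ll(alpha, beta, c):
    """Log likelihood when every duration is censored, no covariates."""
    return -math.exp(beta) * sum(ci ** alpha for ci in c)


def truncated_cdf(t, a, b, lam):
    """[F(t) - F(b - a)] / [1 - F(b - a)] for an assumed exponential F."""
    cdf = lambda s: 1.0 - math.exp(-lam * s)
    return (cdf(t) - cdf(b - a)) / (1.0 - cdf(b - a))


def truncated_density(t, a, b, lam):
    """f(t) / [1 - F(b - a)]; for the exponential, 1 - F(s) = exp(-lam * s)."""
    return lam * math.exp(-lam * t) / math.exp(-lam * (b - a))
```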
Now. the density of (ai. We have the usual tradeoff between robustness and efficiency. 20.7.56) results in more efficient estimators provided we have the two densities correctly specified.9.xi) does not depend * on ci and is given by k(axi)f(txi) for 0 < a < b and 0 < t < 8. To be added.ti) given (ci.F(cixi)] /P(si = 1xi). the observation is uncensored. We suppress the parameters in the densities. the density is k(axi)[1  F(cixi)]. 135 . is P(ti ai.ai) = [1 .32).xi.xi) and si = 1 is d (1 .aixi) (because ci > b .xi). This is also the conditional density of (ai. a. for all combinations (a.22) and D(aici.di) k(axi)[f(txi)] i[1 .xi) when t < ci.ti). > b  From the standard result for densities for truncated distributions. Using the log likelihood (20.F(b  aixi)]. the density of (ai.t) such that si = 1. by (20.F(cixi)]/[1 . Putting in the parameters and taking the log gives (20. which is exactly (20.cixi)/P(ti * > b . For t = ci. conditional on xi. 20. that is.30) requires us to only specify f(Wxi).56). First.ti) given (ci.di. b.ti) given (ci. (20. by the usual right censoring argument. the probability of * observing the random draw (ai.ci.xi) = D(aixi).