1. We apply the Heckit model to the mroz data, hoping to estimate the marginal effect of
educ on wage. The issue here is that we observe wage only when the person is employed
(in the labor force, inlf=1); for unemployed persons, inlf=0 and wage is unobserved,
recorded as a missing value.
. sum wage

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        wage |       428    4.177682    3.310282      .1282         25
Notice that the whole sample includes 753 persons, but we observe wage only for the 428
employed persons. The remaining 325 persons are unemployed and their wage is a missing
value, which the command sum automatically excludes.
2. It is easy to show that the employed and unemployed groups are NOT comparable, at
least in terms of educ:
On average, the employed have .8619554 more years of education than the unemployed.
This difference is statistically significant, with t-value = 5.23.
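A t-value like this one comes from a two-sample comparison of means. The sketch below assumes the pooled-variance version of the test (the default of Stata's ttest with the by() option) and uses made-up education values, not the mroz data, so its numbers will not reproduce the 5.23 above.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group0):
    """Difference in means (group1 - group0) and its pooled-variance t statistic."""
    n1, n0 = len(group1), len(group0)
    diff = mean(group1) - mean(group0)
    # Pooled estimate of the common variance (equal-variance assumption).
    sp2 = ((n1 - 1) * variance(group1) + (n0 - 1) * variance(group0)) / (n1 + n0 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n0))
    return diff, diff / se


# Hypothetical years of education, NOT the mroz data.
educ_employed = [12, 13, 14, 16, 12]
educ_unemployed = [10, 12, 11, 12, 12]
diff, t = pooled_t(educ_employed, educ_unemployed)
print(f"difference = {diff:.3f}, t = {t:.2f}")
```

With real data, a large t (in absolute value) says the gap in mean education is unlikely to be due to chance alone.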
3. Given that the observable educ differs across the two groups, it is natural to suspect
that unobserved factors such as ability may also differ across the two groups. Thus, if
the population is everyone (the employed plus the unemployed), the sample consisting
of only the employed is NOT a random (iid) sample. In this case, the Heckit model can
be used to address the non-random sampling issue.
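4. The Heckit model consists of an outcome equation and a selection equation; y is
observed only when s = 1:

$$ y = \beta_1 x_1 + u, \qquad E(u \mid x_1) = 0 \tag{1} $$

$$ s = \begin{cases} 1, & \text{if } \gamma_1 x_1 + \gamma_2 x_2 + v \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{2} $$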
5. Let ϕ and Φ denote the pdf and cdf of the standard normal distribution. We need some
preliminary results that you may have learned in a mathematical statistics class:
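One standard way to state these results, and the conditional mean they lead to, is sketched below. It assumes the usual Heckit setup: (u, v) is independent of (x1, x2), v ~ N(0, 1), and E(u | v) = ρσv, where σ is the standard deviation of u and ρ = corr(u, v) (this linear form holds when u and v are jointly normal). The symbol λ(·) for the inverse Mills ratio is notation introduced here for the sketch.

```latex
% (a) Mean of a truncated standard normal: for z ~ N(0,1) and a constant c,
E(z \mid z > -c) = \frac{\phi(c)}{\Phi(c)} \equiv \lambda(c)

% (b) s = 1 exactly when v >= -(\gamma_1 x_1 + \gamma_2 x_2); by iterated expectations
%     and E(u | v) = \rho\sigma v,
\begin{aligned}
E(y \mid x_1, x_2, s = 1)
  &= \beta_1 x_1 + E\bigl(u \mid v \ge -(\gamma_1 x_1 + \gamma_2 x_2)\bigr) \\
  &= \beta_1 x_1 + \rho\sigma \, E\bigl(v \mid v \ge -(\gamma_1 x_1 + \gamma_2 x_2)\bigr) \\
  &= \beta_1 x_1 + \rho\sigma \, \lambda(\gamma_1 x_1 + \gamma_2 x_2)
\end{aligned}
```

Under these assumptions the coefficient on the inverse Mills ratio is ρσ, which is zero when ρ = 0, that is, when the unobservables in the selection and outcome equations are uncorrelated.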
6. Given those results, it follows that
7. Equation (10) makes it clear that we would get a biased estimate of β1 if we ignored
the inverse Mills ratio, which is the omitted variable in this context. Over much of its
range, the inverse Mills ratio is well approximated by a linear function of its argument.
So without x2 (a variable that affects selection but not y), there would be severe
multicollinearity between x1 and the inverse Mills ratio.
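The near-linearity claim can be checked numerically. This Python sketch evaluates λ(c) = ϕ(c)/Φ(c) on a grid of index values from -2 to 2 (a range chosen only for illustration) and computes its correlation with c. If the selection index contained only x1, the inverse Mills ratio would be almost a linear function of x1.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mu = 0, sigma = 1


def inverse_mills(c):
    """Inverse Mills ratio lambda(c) = phi(c) / Phi(c)."""
    return STD_NORMAL.pdf(c) / STD_NORMAL.cdf(c)


def corr(xs, ys):
    """Sample (Pearson) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Index values gamma * x on an illustrative grid from -2 to 2.
grid = [-2 + 0.05 * k for k in range(81)]
imr = [inverse_mills(c) for c in grid]
r = corr(grid, imr)
print(f"corr(index, inverse Mills ratio) = {r:.3f}")
```

A correlation this close to -1 is why an excluded variable such as x2 is needed to separate the effect of x1 from the selection correction.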
8. James Heckman, the 2000 Nobel laureate in economics, suggested a two-step procedure:
in step one, estimate γ̂ by a probit model (using both employed and unemployed persons)
and compute the inverse Mills ratio; in step two, run a linear regression (using the
employed only) that includes both x1 and the inverse Mills ratio, i.e., regression (10).
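A minimal Python sketch of the two steps on simulated data. All parameter values are hypothetical, the probit is fit by Fisher scoring written out by hand (not any package's routine), and, unlike Stata's heckman, the second-step standard errors are not corrected. It also runs the naive OLS that omits the inverse Mills ratio, to show the omitted-variable bias from item 7.

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def probit(Z, s, iters=30, tol=1e-8):
    """Probit MLE by Fisher scoring (Newton steps using the expected information)."""
    k = len(Z[0])
    g = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for z, si in zip(Z, s):
            xb = sum(gj * zj for gj, zj in zip(g, z))
            p = min(max(N01.cdf(xb), 1e-12), 1 - 1e-12)
            d = N01.pdf(xb)
            w = d * (si - p) / (p * (1 - p))  # score contribution weight
            h = d * d / (p * (1 - p))         # information contribution weight
            for i in range(k):
                score[i] += w * z[i]
                for j in range(k):
                    info[i][j] += h * z[i] * z[j]
        step = solve(info, score)
        g = [gj + sj for gj, sj in zip(g, step)]
        if max(abs(sj) for sj in step) < tol:
            break
    return g


# Simulated data; every parameter value here is hypothetical.
random.seed(0)
n, b0, b1 = 5000, 1.0, 0.5
g_true = [0.2, 0.8, -1.0]   # selection index: intercept, x1, x2 (x2 is excluded from y)
rho, sigma = 0.8, 1.0       # corr(u, v) and sd(u)
Z, s, X1, Y = [], [], [], []
for _ in range(n):
    x1, x2, v = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    u = sigma * (rho * v + math.sqrt(1 - rho ** 2) * random.gauss(0, 1))
    Z.append([1.0, x1, x2])
    s.append(1 if g_true[0] + g_true[1] * x1 + g_true[2] * x2 + v >= 0 else 0)
    X1.append(x1)
    Y.append(b0 + b1 * x1 + u)  # treated as observed only when s == 1

# Step 1: probit on the whole sample, then the inverse Mills ratio.
g_hat = probit(Z, s)
sel = [i for i in range(n) if s[i] == 1]
xb = {i: sum(a * c for a, c in zip(g_hat, Z[i])) for i in sel}  # only selected rows are needed
imr = {i: N01.pdf(xb[i]) / N01.cdf(xb[i]) for i in sel}

# Step 2: OLS on the selected sample, with and without the inverse Mills ratio.
heckit = ols([[1.0, X1[i], imr[i]] for i in sel], [Y[i] for i in sel])
naive = ols([[1.0, X1[i]] for i in sel], [Y[i] for i in sel])
print("two-step:", [round(c, 3) for c in heckit])
print("naive OLS:", [round(c, 3) for c in naive])
```

In this design the coefficient on the inverse Mills ratio estimates ρσ = 0.8, and the naive slope on x1 is pulled well below the true 0.5.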
9. Going back to the mroz data, we estimate the first-step probit model and compute the
inverse Mills ratio using
_cons | -1.978889 .2934994 -6.74 0.000 -2.554137 -1.403641
------------------------------------------------------------------------------
. predict yhat, xb
. gen imr = normalden(yhat)/normal(yhat)
. label variable imr "inverse Mills ratio"
Notice that: (1) the dependent variable in the probit regression is the dummy variable
inlf, which determines the selection outcome, i.e., whether wage is observed; (2) the
whole sample is used (N=753); (3) kidslt6 has a significant effect on the decision to
work or not: having kids younger than 6 lowers the probability of being in the labor
force; (4) the option xb in the command predict is important. Without it, you would
get the predicted probability Φ(γ̂x) rather than γ̂x.
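Point (4) can be illustrated with a small Python sketch of the same formula as gen imr = normalden(yhat)/normal(yhat). The index value 0.3 is hypothetical; the point is that plugging the probability Φ(γ̂x) into the formula, instead of the linear index γ̂x, gives a different and wrong number.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def inverse_mills_ratio(index):
    """phi(index) / Phi(index), the analogue of normalden(yhat)/normal(yhat)."""
    return N01.pdf(index) / N01.cdf(index)


xb = 0.3                           # hypothetical linear index gamma_hat * x (predict with xb)
prob = N01.cdf(xb)                 # what predict returns after probit without the xb option
right = inverse_mills_ratio(xb)
wrong = inverse_mills_ratio(prob)  # mistake: feeding the probability into the formula
print(f"IMR from xb: {right:.4f}; IMR from probability: {wrong:.4f}")
```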
10. In the second step, we regress log wage on educ, exper, expersq, and the inverse Mills
ratio, dropping kidslt6 (i.e., kidslt6 is the excluded variable x2 in (10)).
Now the sample size becomes 428, because the dependent variable in the linear regression,
lwage, is non-missing only for the employed.
11. The command heckman implements the two-step procedure automatically. It also
accounts for the fact that a variable generated in the first step, the inverse Mills
ratio, is used as a regressor in the second step, by adjusting the standard errors and
test statistics. Thus the heckman command is strongly recommended in practice:
. heckman lwage educ exper expersq, select(educ exper expersq kidslt6) twostep
Heckman selection model -- two-step estimates Number of obs = 753
(regression model with sample selection) Censored obs = 325
Uncensored obs = 428
Wald chi2(3) = 28.84
Prob > chi2 = 0.0000
------------------------------------------------------------------------------
lwage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lwage |
educ | .0977484 .0202663 4.82 0.000 .0580271 .1374696
exper | .027473 .0247111 1.11 0.266 -.0209599 .0759059
expersq | -.0005259 .0005784 -0.91 0.363 -.0016595 .0006076
_cons | -.176171 .5499718 -0.32 0.749 -1.254096 .9017539
-------------+----------------------------------------------------------------
select |
educ | .1152243 .0228636 5.04 0.000 .0704124 .1600361
exper | .1232176 .018005 6.84 0.000 .0879285 .1585066
expersq | -.0024388 .0005845 -4.17 0.000 -.0035843 -.0012932
kidslt6 | -.4954733 .1002735 -4.94 0.000 -.6920058 -.2989408
_cons | -1.978889 .2934991 -6.74 0.000 -2.554137 -1.403642
-------------+----------------------------------------------------------------
mills |
lambda | -.183683 .2720596 -0.68 0.500 -.71691 .3495439
-------------+----------------------------------------------------------------
rho | -0.27175
sigma | .67593798
------------------------------------------------------------------------------
Notice that the coefficient on the inverse Mills ratio is reported as lambda (λ). It is
insignificant, with z = -0.68, which means that, for this problem, selection bias is not a
significant issue. Another sign of minor selection bias is that rho = -0.27175, the
estimated correlation coefficient between u and v, is relatively close to zero.
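Stata's heckman output reports lambda as the product rho × sigma, so the three numbers in the output should agree. A quick Python check using only the printed values:

```python
# Values copied from the heckman two-step output above.
rho = -0.27175
sigma = 0.67593798
reported_lambda = -0.183683

implied_lambda = rho * sigma
print(f"rho * sigma = {implied_lambda:.6f} vs reported lambda = {reported_lambda}")
```

The small discrepancy in the sixth decimal comes from rho being printed with only five digits.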
12. Because this is a log-level model, we can interpret β1 as follows: holding experience
constant, one more year of education is associated with an approximately 9.77 percent
increase in wage. Since the inverse Mills ratio has been controlled for, this estimate is
free of selection bias.
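The 9.77 percent figure uses the usual log-level approximation 100·β1. As a sketch (not part of the original notes), the exact percentage change implied by a log-level model for a one-unit change in educ is 100·(exp(β1) - 1); the snippet computes both from the reported coefficient.

```python
import math

beta_educ = 0.0977484  # educ coefficient from the heckman two-step output

approx_pct = 100 * beta_educ                 # log-level approximation used in the text
exact_pct = 100 * (math.exp(beta_educ) - 1)  # exact % change in wage for one more year of educ
print(f"approximate: {approx_pct:.2f}%, exact: {exact_pct:.2f}%")
```

The two differ by about half a percentage point here; the gap grows with the size of the coefficient.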
13. Exercise: what is wrong with this result? Why do we see similar estimates of β1 with
and without the inverse Mills ratio (.1074896 ≈ .0977484)?
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1074896 .0141465 7.60 0.000 .0796837 .1352956
exper | .0415665 .0131752 3.15 0.002 .0156697 .0674633
expersq | -.0008112 .0003932 -2.06 0.040 -.0015841 -.0000382
_cons | -.5220406 .1986321 -2.63 0.009 -.9124667 -.1316144
------------------------------------------------------------------------------
Tobit Model (optional)
1. The Tobit model is similar to the Heckit model, except that instead of having many
missing values of y, we have many so-called corner-solution values.
2. For the mroz data, a person works positive hours only if employed; otherwise hours is
zero. Thus the variable hours equals zero (the corner solution) for a nontrivial fraction
of the sample but is roughly continuously distributed over positive values. Statistically
speaking, hours follows a hybrid distribution: a mixture of a discrete and a continuous
distribution. The histogram below clearly shows hours piling up at zero.
[Figure: histogram of hours, with Density on the vertical axis]
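3. The Tobit model can be written in terms of a latent variable y*:

$$ y^* = \beta x + u, \qquad u \sim N(0, \sigma^2) \tag{11} $$

$$ y = \max(0, y^*) \tag{12} $$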
Equation (11) shows that β measures the partial effect of x on y*, not on y, so we need
to be cautious when interpreting β. A second implication is that a linear regression of
y* on x is infeasible, because y* is unobserved.
4. It follows that
$$ \Pr(y = 0) = \Pr(y^* < 0) = \Pr(u < -\beta x) = 1 - \Pr(u < \beta x) = 1 - \Phi\left(\frac{\beta x}{\sigma}\right) $$
This implies that it is a bad idea to ignore the observations with hours = 0, because
those observations carry information about β and can be used to estimate it.
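A small Python simulation makes the formula concrete: draw the latent y* from (11), censor it as in (12), and compare the share of zeros with 1 - Φ(βx/σ). The values β = 0.5, σ = 1 and x = 1 are hypothetical.

```python
import random
from statistics import NormalDist

random.seed(1)
beta, sigma, x = 0.5, 1.0, 1.0  # hypothetical parameter values and a fixed x
draws = 100_000

# Latent y* = beta*x + u (equation (11)), observed y = max(0, y*) (equation (12)).
y = [max(0.0, beta * x + random.gauss(0, sigma)) for _ in range(draws)]
share_zero = sum(1 for v in y if v == 0.0) / draws
theory = 1 - NormalDist().cdf(beta * x / sigma)
print(f"simulated Pr(y=0) = {share_zero:.4f}, formula = {theory:.4f}")
```

Because the share of zeros moves with β, the zero observations help pin β down.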
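5. There are two options for estimating the tobit model. The first option is to apply MLE
to the whole sample. The log likelihood for the i-th observation is

$$ \log(f_i) = \begin{cases} \log\left[1 - \Phi\left(\dfrac{\beta x_i}{\sigma}\right)\right], & \text{if } y_i = 0 \\[2ex] \log\left[\dfrac{1}{\sigma}\,\phi\left(\dfrac{y_i - \beta x_i}{\sigma}\right)\right], & \text{if } y_i > 0 \end{cases} \tag{13} $$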
Then MLE maximizes $\sum_i \log(f_i)$. The result of column (2) in Table 17.2 is
0 right-censored observations
Here Stata calls the corner-solution value zero a left-censored value, which is specified
with the lower-limit option ll().
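A minimal Python sketch of the log likelihood (13), assuming a single regressor with no intercept for brevity. It writes log[1 - Φ(βx/σ)] as log Φ(-βx/σ) and uses math.erfc so the far tails do not round to log(0). On hypothetical simulated data with β = 1 and σ = 1, the sum of log(f_i) should be larger at the true values than at other parameter values, which is what MLE exploits.

```python
import math
import random

LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)


def norm_cdf(z):
    """Standard normal cdf via erfc, accurate in the far tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def tobit_loglik(beta, sigma, xs, ys):
    """Sum of log f_i from equation (13) for y* = beta*x + u (one regressor, no intercept)."""
    total = 0.0
    for x, y in zip(xs, ys):
        if y == 0:
            # log[1 - Phi(beta x / sigma)] written as log Phi(-beta x / sigma)
            total += math.log(norm_cdf(-beta * x / sigma))
        else:
            z = (y - beta * x) / sigma
            # log[(1/sigma) phi(z)]
            total += -0.5 * z * z - LOG_SQRT_2PI - math.log(sigma)
    return total


# Hypothetical simulated sample with true beta = 1, sigma = 1.
random.seed(3)
xs = [random.gauss(0, 1) for _ in range(2000)]
ys = [max(0.0, x + random.gauss(0, 1)) for x in xs]

ll_true = tobit_loglik(1.0, 1.0, xs, ys)
ll_other_beta = tobit_loglik(0.5, 1.0, xs, ys)
ll_other_sigma = tobit_loglik(1.0, 2.0, xs, ys)
print(ll_true, ll_other_beta, ll_other_sigma)
```

An actual estimator would hand this function to a numerical optimizer; Stata's tobit does the maximization internally.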
So, as in the probit model, in the tobit model β multiplied by a scale factor gives the
marginal effect of x on y (the marginal effect of x on y* is just β). In practice we use
the average partial effect (APE) scale factor $n^{-1}\sum_i \Phi(\hat\beta x_i/\hat\sigma)$:
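. * yhat may already exist from the Heckit part
. capture drop yhat
. predict yhat, xb
. * divide the index by sigma-hat (1122.022)
. replace yhat = yhat/1122.022
. gen p = normal(yhat)
. qui sum p
. dis "factor is " r(mean)
factor is .58866336
. dis "average marginal effect of educ on hours is " (.58866336)*(80.64561)
average marginal effect of educ on hours is 47.473116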
That is, the average partial effect of educ on hours is (.58866336)(80.64561) ≈ 47.47
hours.
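The same computation in a short Python sketch: the scale factor is the sample average of Φ(x_i β̂/σ̂), and the APE is that factor times the coefficient. The two index values in the toy check are hypothetical; the final line reproduces the arithmetic from the Stata session above.

```python
from statistics import NormalDist


def ape_scale_factor(xb_values, sigma):
    """Average of Phi(x_i * beta_hat / sigma_hat) over the sample."""
    cdf = NormalDist().cdf
    return sum(cdf(xb / sigma) for xb in xb_values) / len(xb_values)


# Toy check with two hypothetical index values, using sigma_hat = 1122.022 from the notes.
toy_factor = ape_scale_factor([0.0, 1122.022], 1122.022)  # = (Phi(0) + Phi(1)) / 2

# The arithmetic in the notes: APE of educ = scale factor * tobit coefficient on educ.
factor, b_educ = 0.58866336, 80.64561
ape_educ = factor * b_educ
print(f"toy factor = {toy_factor:.4f}, APE of educ = {ape_educ:.6f}")
```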
7. The second option is to apply nonlinear least squares to the subsample in which hours
are positive. As in the Heckit model, one can show that the inverse Mills ratio should
be included:
$$ E(y \mid x, y > 0) = \beta x + E(u \mid u > -\beta x) = \beta x + \sigma\,\frac{\phi(\beta x/\sigma)}{\Phi(\beta x/\sigma)} \tag{14} $$
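Equation (14) can be checked by simulation: generate y = βx + u, keep only the positive draws, and compare their mean with the right-hand side of (14). The index value βx = 0.5 and σ = 2 are hypothetical.

```python
import random
from statistics import NormalDist

N01 = NormalDist()
random.seed(2)
beta_x, sigma = 0.5, 2.0  # hypothetical values of the index beta*x and of sigma
draws = 200_000

# Keep only the draws with y = beta*x + u > 0 (the positive-hours subsample).
positive = [y for y in (beta_x + random.gauss(0, sigma) for _ in range(draws)) if y > 0]
simulated = sum(positive) / len(positive)

c = beta_x / sigma
formula = beta_x + sigma * N01.pdf(c) / N01.cdf(c)  # right-hand side of (14)
print(f"simulated E(y | y > 0) = {simulated:.4f}, formula (14) = {formula:.4f}")
```

The correction term σϕ(βx/σ)/Φ(βx/σ) is what a regression of y on x over the positive subsample alone would omit.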
8. To sum up, the tobit model can be used to account for corner solutions. The downside
is that β becomes harder to interpret, because it measures the partial effect on y*
rather than on y.