
Heckit Model

1. We apply the Heckit model to the mroz data, hoping to estimate the marginal effect of
educ on wage. The issue is that we observe wage only when the person is employed
(in the labor force, inlf=1); for unemployed persons, inlf=0, and wage is unobserved,
recorded as a missing value.

. import excel "I:\420\420_mroz.xls", sheet("Sheet1") firstrow clear


. label define inlfl 0 "Unemployed" 1 "Employed"
. label value inlf inlfl
. tab inlf
inlf | Freq. Percent Cum.
------------+-----------------------------------
Unemployed | 325 43.16 43.16
Employed | 428 56.84 100.00
------------+-----------------------------------
Total | 753 100.00

. sum wage
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
wage | 428 4.177682 3.310282 .1282 25

Notice that the whole sample includes 753 persons. However, we observe wage only
for the 428 employed persons. For the remaining 325 unemployed persons wage is a
missing value, and missing values are automatically excluded by the sum command.

2. It is easy to show that the employed and unemployed groups are NOT comparable, at
least in terms of educ:

. reg educ inlf, nohe


------------------------------------------------------------------------------
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
inlf | .8619554 .1649094 5.23 0.000 .5382172 1.185694
_cons | 11.79692 .1243283 94.89 0.000 11.55285 12.041
------------------------------------------------------------------------------

On average, the employed have .8619554 more years of education than the unemployed.
This difference is statistically significant, with a t-value of 5.23.
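
Equivalently, one could compare mean education across the two groups with a two-sample
t-test; a minimal sketch (output omitted):

. ttest educ, by(inlf)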

3. Given that the observable educ differs across the two groups, it is natural to suspect
that unobserved factors such as ability may also differ across the two groups. Thus, if
the population is everyone (the employed plus the unemployed), the sample consisting
of only the employed is NOT a random (iid) sample. In this case, the Heckit model can
be used to address this non-random sampling issue.

4. In terms of statistics, we are ultimately seeking E(y|x)—here y is wage and x is educ.
But if we use the sample of the employed only, we are effectively estimating
E(y|x, inlf = 1). In general, the two conditional means are different unless inlf is
irrelevant. To fix ideas, consider a system of two models:

y = β1 x1 + u,   E(u|x1) = 0                                    (1)

s = 1 if γ1 x1 + γ2 x2 + v ≥ 0,   s = 0 otherwise               (2)

where s is a dummy variable representing the selection outcome—here, s = inlf. wage is
observed only when γ1 x1 + γ2 x2 + v ≥ 0 (when a person chooses to be in the labor force).
Notice that we need at least one variable x2 that matters for selection but is excluded
from the regression for wage.

5. Let ϕ and Φ denote the pdf and cdf of the standard normal distribution. We need some
preliminary results that you may learn in a mathematical statistics class:

if u and v follow a bivariate standard normal distribution with correlation ρ ⇒ E(u|v) = ρv   (3)

if v follows the standard normal distribution ⇒ E(v|v > c) = ϕ(c)/[1 − Φ(c)]                  (4)

ϕ(−c) = ϕ(c),   1 − Φ(−c) = Φ(c)                                                              (5)
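
As a quick numerical check of result (4), one can simulate a standard normal variable and
compare its truncated mean with ϕ(c)/[1 − Φ(c)]; a minimal sketch, using an arbitrary
seed, sample size, and cutoff c = 0.5 (the theoretical value is about 1.14):

. preserve
. clear
. set obs 100000
. set seed 12345
. gen v = rnormal()
. sum v if v > 0.5
. dis "theory: " normalden(0.5)/(1 - normal(0.5))
. restore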

6. Given those results, it follows that

E(y|x1, s = 1) = β1 x1 + E(u|s = 1)                                          (6)
             = β1 x1 + E(u|γ1 x1 + γ2 x2 + v ≥ 0)                            (7)
             = β1 x1 + ρ E(v|v ≥ −γ1 x1 − γ2 x2)                             (8)
             = β1 x1 + ρ ϕ(−γ1 x1 − γ2 x2)/[1 − Φ(−γ1 x1 − γ2 x2)]           (9)
             = β1 x1 + ρ ϕ(γ1 x1 + γ2 x2)/Φ(γ1 x1 + γ2 x2)                   (10)

where λ(γ1 x1 + γ2 x2) ≡ ϕ(γ1 x1 + γ2 x2)/Φ(γ1 x1 + γ2 x2) is called the inverse Mills ratio.

7. Equation (10) makes it clear that we would get a biased estimate of β1 if we ignored
the inverse Mills ratio—the omitted variable in this context. In practice, the inverse
Mills ratio is often well approximated by a linear function of its argument. So without
x2, the inverse Mills ratio would be nearly collinear with x1 in the wage regression.
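
One rough way to gauge this, after imr has been computed in point 9 below, is to regress
the inverse Mills ratio on the wage-equation regressors within the selected sample and
check how high the R-squared is; a sketch:

. reg imr educ exper expersq if inlf == 1

A high R-squared means imr is close to a linear combination of x1, which is exactly the
multicollinearity concern; the excluded variable (kidslt6) is what keeps the relationship
from being perfect.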

8. James Heckman, the 2000 Nobel laureate in economics, suggests a two-step procedure: in
step one, estimate γ̂ with a probit model (using both employed and unemployed persons)
and compute the inverse Mills ratio; in step two, run a linear regression (using the
employed only) that includes both x1 and the inverse Mills ratio, i.e., regression (10).

9. Go back to the mroz data. We estimate the first-step probit model and compute the
inverse Mills ratio using

. probit inlf educ exper expersq kidslt6


Probit regression Number of obs = 753
LR chi2(4) = 161.37
Prob > chi2 = 0.0000
Log likelihood = -434.19024 Pseudo R2 = 0.1567
------------------------------------------------------------------------------
inlf | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1152242 .0228636 5.04 0.000 .0704124 .1600361
exper | .1232176 .018005 6.84 0.000 .0879285 .1585067
expersq | -.0024388 .0005845 -4.17 0.000 -.0035843 -.0012932
kidslt6 | -.4954733 .1002736 -4.94 0.000 -.692006 -.2989405
_cons | -1.978889 .2934994 -6.74 0.000 -2.554137 -1.403641
------------------------------------------------------------------------------
. predict yhat, xb
. gen imr = normalden(yhat)/normal(yhat)
. label variable imr "inverse Mills ratio"

Notice that: (1) the dependent variable in the probit regression is the dummy variable
inlf, which determines the selection outcome and whether wage is observed; (2) the
whole sample is used (N=753); (3) kidslt6 has a significant effect on the decision of
working vs. not working—having kids younger than 6 lowers the probability of being in
the labor force; (4) the xb option of the predict command is important. Without it, you
would get the predicted probability Φ(γ̂x) rather than the index γ̂x.
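
To see point (4) concretely, one can also compute the predicted probability and verify
that it equals Φ evaluated at the index; a minimal sketch (phat and check are arbitrary
variable names):

. predict phat, pr
. gen check = normal(yhat)
. sum phat check

The two variables should have identical summary statistics.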

10. In the second step, we regress log wage on educ, exper, expersq, and the inverse Mills
ratio (but drop kidslt6; i.e., kidslt6 is the excluded variable x2 in (10)):

. reg lwage educ exper expersq imr


Source | SS df MS Number of obs = 428
-------------+------------------------------ F( 4, 423) = 19.80
Model | 35.2240136 4 8.80600341 Prob > F = 0.0000
Residual | 188.103427 423 .444688953 R-squared = 0.1577
-------------+------------------------------ Adj R-squared = 0.1498
Total | 223.327441 427 .523015084 Root MSE = .66685
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .0977484 .0202379 4.83 0.000 .057969 .1375278
exper | .027473 .0247323 1.11 0.267 -.0211406 .0760866
expersq | -.0005259 .0005781 -0.91 0.363 -.0016623 .0006104
imr | -.1836831 .2727257 -0.67 0.501 -.7197494 .3523833
_cons | -.1761709 .5506569 -0.32 0.749 -1.258535 .9061936
------------------------------------------------------------------------------

Now the sample size becomes 428, because the dependent variable in the linear regression,
lwage, is non-missing only for the employed.

11. There is a command heckman that implements the two-step procedure automatically.
It also accounts for the fact that a variable generated in the first step, the inverse Mills
ratio, is used as a regressor in the second step, by adjusting the standard errors and
test statistics. Thus, the heckman command is strongly recommended in practice:

. heckman lwage educ exper expersq, select(educ exper expersq kidslt6) twostep
Heckman selection model -- two-step estimates Number of obs = 753
(regression model with sample selection) Censored obs = 325
Uncensored obs = 428
Wald chi2(3) = 28.84
Prob > chi2 = 0.0000
------------------------------------------------------------------------------
lwage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lwage |
educ | .0977484 .0202663 4.82 0.000 .0580271 .1374696
exper | .027473 .0247111 1.11 0.266 -.0209599 .0759059
expersq | -.0005259 .0005784 -0.91 0.363 -.0016595 .0006076
_cons | -.176171 .5499718 -0.32 0.749 -1.254096 .9017539
-------------+----------------------------------------------------------------
select |
educ | .1152243 .0228636 5.04 0.000 .0704124 .1600361
exper | .1232176 .018005 6.84 0.000 .0879285 .1585066
expersq | -.0024388 .0005845 -4.17 0.000 -.0035843 -.0012932
kidslt6 | -.4954733 .1002735 -4.94 0.000 -.6920058 -.2989408
_cons | -1.978889 .2934991 -6.74 0.000 -2.554137 -1.403642
-------------+----------------------------------------------------------------
mills |
lambda | -.183683 .2720596 -0.68 0.500 -.71691 .3495439
-------------+----------------------------------------------------------------
rho | -0.27175
sigma | .67593798
------------------------------------------------------------------------------

Notice that the coefficient on the inverse Mills ratio is reported as lambda. It is
insignificant, with a z-value of -0.68. That means that, for this problem, selection bias
is not a serious issue. Another signal of minor selection bias is that rho = -0.27175, the
estimated correlation coefficient between u and v, is fairly close to zero here.
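
As a side note, running heckman without the twostep option fits the same model by full
maximum likelihood, which estimates rho and sigma jointly and is generally more efficient
when the normality assumption holds; a sketch, if one wants to compare:

. heckman lwage educ exper expersq, select(educ exper expersq kidslt6)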

12. Because this is a log-level model, we can interpret β1 as follows: holding experience
constant, one more year of education is associated with approximately a 9.77 percent
increase in wage. Since the inverse Mills ratio has been controlled for, this estimate is
free of selection bias.
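
Since the coefficient is not tiny, one can also compute the exact percentage change implied
by the log-level form, 100[exp(β̂1) − 1], which is about 10.27 percent here; a one-line check:

. dis 100*(exp(.0977484) - 1)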

13. Exercise: what is wrong with this result? Why do we see similar estimates of β1 with
and without the inverse Mills ratio (.1074896 ≈ .0977484)?

. reg lwage educ exper expersq

Source | SS df MS Number of obs = 428


-------------+------------------------------ F( 3, 424) = 26.29
Model | 35.0222967 3 11.6740989 Prob > F = 0.0000
Residual | 188.305144 424 .444115906 R-squared = 0.1568
-------------+------------------------------ Adj R-squared = 0.1509
Total | 223.327441 427 .523015084 Root MSE = .66642

------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1074896 .0141465 7.60 0.000 .0796837 .1352956
exper | .0415665 .0131752 3.15 0.002 .0156697 .0674633
expersq | -.0008112 .0003932 -2.06 0.040 -.0015841 -.0000382
_cons | -.5220406 .1986321 -2.63 0.009 -.9124667 -.1316144
------------------------------------------------------------------------------

Tobit Model (optional)
1. The Tobit model is similar to the Heckit model—instead of having many missing values
for y, we have many so-called corner-solution values.

2. For the mroz data, a person works positive hours only if employed; otherwise hours is
zero. Thus, the variable hours equals zero (the corner solution) for a nontrivial fraction
of the sample, but is roughly continuously distributed over positive values. In statistical
terms, hours follows a hybrid distribution, a mixture of a discrete and a continuous
distribution. The histogram below clearly shows that hours piles up at zero.
[Histogram of hours over 0 to 5000: the density piles up at hours = 0 and is spread out
over positive values.]
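
The histogram can be reproduced with the histogram command; a minimal sketch (the exact
bin width used in the original figure is unknown):

. histogram hours, density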

3. Suppose the decision of working or not depends on an unobserved latent variable y*,
which quantifies the utility obtained from working. A person works only if the perceived
utility is positive. More explicitly, the Tobit model is made up of a system of two
equations:

y* = βx + u,   u ∼ N(0, σ²)                                    (11)
y = max(0, y*)                                                 (12)

Equation (11) shows that β measures the partial effect of x on y*, not y. That means we
need to be cautious when interpreting β. The second implication is that a linear regression
is in general infeasible because y* is unobserved.

4. It follows that
Pr(y = 0) = Pr(y* < 0) = Pr(u < −βx) = 1 − Pr(u < βx) = 1 − Φ(βx/σ)

This implies that it is a bad idea to ignore the observations with hours=0, because those
observations can be used to estimate β.
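
For the mroz data, the corner share is substantial; a quick count, using the full sample
already in memory:

. count if hours == 0
. dis "share at the corner: " r(N)/_N

This should show 325 of the 753 observations (about 43 percent) at hours = 0, matching
the left-censored count reported by tobit below.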

5. There are two options for estimating the tobit model. The first option is to apply MLE
to the whole sample. The log likelihood for the i-th observation is
log(fi) = log[1 − Φ(βxi/σ)]             if yi = 0
        = log[(1/σ) ϕ((yi − βxi)/σ)]    if yi > 0              (13)

Then MLE maximizes Σi log(fi). The result of column (2) in Table 17.2 is

. tobit hours nwi educ exper expersq age k*, ll(0)


Tobit regression Number of obs = 753
LR chi2(7) = 271.59
Prob > chi2 = 0.0000
Log likelihood = -3819.0946 Pseudo R2 = 0.0343
------------------------------------------------------------------------------
hours | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nwifeinc | -8.814243 4.459096 -1.98 0.048 -17.56811 -.0603724
educ | 80.64561 21.58322 3.74 0.000 38.27453 123.0167
exper | 131.5643 17.27938 7.61 0.000 97.64231 165.4863
expersq | -1.864158 .5376615 -3.47 0.001 -2.919667 -.8086479
age | -54.40501 7.418496 -7.33 0.000 -68.96862 -39.8414
kidslt6 | -894.0217 111.8779 -7.99 0.000 -1113.655 -674.3887
kidsge6 | -16.218 38.64136 -0.42 0.675 -92.07675 59.64075
_cons | 965.3053 446.4358 2.16 0.031 88.88528 1841.725
-------------+----------------------------------------------------------------
/sigma | 1122.022 41.57903 1040.396 1203.647
------------------------------------------------------------------------------
Obs. summary: 325 left-censored observations at hours<=0
428 uncensored observations
0 right-censored observations

Here Stata calls the corner-solution value zero a left-censored value, which is specified
by the lower-limit option ll(0).

6. Tedious math can prove that

dE(y|x)/dx = β Φ(βx/σ)

So, as in the probit model, in the tobit model β multiplied by a scale factor gives the
marginal effect of x on y (the marginal effect of x on y* is just β). In practice we use
the APE scale factor (1/n) Σi Φ(βxi/σ):

* compute Φ(xβ/σ) for each observation and average it (sigma = 1122.022 from the tobit output)
* note: if yhat still exists from the Heckit section above, drop or rename it first
predict yhat, xb
replace yhat = yhat/1122.022
gen p = normal(yhat)
qui sum p
dis "factor is " r(mean)
factor is .58866336
dis "average marginal effect of educ on hours is " (.58866336)*(80.64561)
average marginal effect of educ on hours is 47.473116

For example, the average marginal (partial) effect of educ on hours is (.58866336)(80.64561)
= 47.47: one more year of education increases expected hours worked by about 47 hours.

7. The second option is to fit nonlinear least squares to the subsample in which hours is
positive. Similar to the Heckit model, we can show that an inverse Mills ratio term
should be included:

E(y|x, y > 0) = βx + E(u|u > −βx) = βx + σ ϕ(βx/σ)/Φ(βx/σ)          (14)

8. To sum up, the tobit model can be used to account for corner solutions. The downside
is that β becomes harder to interpret.
