
Tobit models

Econ 60303
Bill Evans

Example: Bias in censored models
• Bivariate regression
• x_i and ε_i are drawn from N(0,1)
• y_i = α + x_i β + ε_i
• Let α = 0 and β = 1 (the 45° line) and construct y
• Estimate y_i = α + x_i β + ε_i

• Consider three LHS variables
• y1 is as reported (no censoring)
• y2=min(1,y1)
– Censored 23.9% of the time
• y3=min(0.25,y1)
– Censored 41.8% of the time
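
The censoring experiment described above can be sketched in a few lines. This is a minimal Python illustration (the slides themselves use Stata), not the code behind the reported numbers: the sample size, the seed, and the helper name `ols` are assumptions, so the estimates will differ slightly from the ones on the slides.

```python
import random


def ols(x, y):
    """Return (intercept, slope) from a bivariate least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


random.seed(60303)  # arbitrary seed (assumption)
n = 5000            # assumed sample size; the slides do not state it

# x and eps are N(0,1); alpha = 0 and beta = 1, so y1 = x + eps
x = [random.gauss(0, 1) for _ in range(n)]
y1 = [xi + random.gauss(0, 1) for xi in x]

# Censor from above at 1 and at 0.25
y2 = [min(1.0, v) for v in y1]
y3 = [min(0.25, v) for v in y1]

results = {}
for name, y, cap in (("y1", y1, None), ("y2", y2, 1.0), ("y3", y3, 0.25)):
    intercept, slope = ols(x, y)
    share_censored = 0.0 if cap is None else sum(v >= cap for v in y) / n
    results[name] = (intercept, slope, share_censored)
    print(name, round(intercept, 3), round(slope, 3), round(share_censored, 3))
```

The OLS slope should fall further below 1 as the share of censored observations rises.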

Figure 1: Plot of X and Y1
[Scatter plot of Y against X with the line labeled "True"; X axis from -4 to 4, Y axis from -5 to 4]
Figure 1: Plot of X and Y2
[Scatter plot of Y against X with the lines labeled "True" and "OLS"; X axis from -4 to 4, Y axis from -5 to 4]
Figure 1: Plot of X and Y3
[Scatter plot of Y against X with the lines labeled "True" and "OLS"; X axis from -4 to 4, Y axis from -5 to 4]
OLS Estimate of α and β

                           Dependent variable         Ratio, β_Yj/β_Y1
                           Y1       Y2       Y3         Y2       Y3
α                        0.027   -0.189   -0.432
β                        1.023    0.755    0.565      0.738    0.553
Fraction censored        0        0.239    0.418
1 - fraction censored             0.761    0.582
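
The last two rows suggest that the OLS slope shrinks roughly in proportion to the share of uncensored observations (0.738 against 0.761 for Y2, 0.553 against 0.582 for Y3). A sketch of why, assuming x and the latent outcome y* are jointly normal, as they are in this simulation, and censoring from above at a point c. It uses Stein's lemma, and the symbols y*, c, and g are introduced here for illustration only:

```latex
\begin{align*}
y_i^{*} &= \alpha + \beta x_i + \varepsilon_i, \qquad
y_i = g(y_i^{*}), \qquad g(t) = \min(c,\, t) \\
\operatorname{Cov}(x, y) &= \operatorname{Cov}\bigl(x,\, g(y^{*})\bigr)
  = E\bigl[g'(y^{*})\bigr]\,\operatorname{Cov}(x, y^{*})
  && \text{(Stein's lemma)} \\
g'(t) &= \mathbf{1}\{t < c\}
  \;\Longrightarrow\; E\bigl[g'(y^{*})\bigr] = \Pr(y^{*} < c) \\
\operatorname{plim}\,\hat{\beta}_{OLS}
  &= \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
  = \Pr(y^{*} < c)\,\beta
\end{align*}
```

Under these assumptions the OLS slope on the censored variable converges to β times the uncensored share, which is why the ratio column tracks 1 minus the fraction censored.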

         OLS using Y1   Tobit using Y2   Tobit using Y3

β           1.0229          1.0078           0.9960
           (0.027)         (0.036)          (0.041)

α           0.027           0.0133          -0.0001
           (0.031)         (0.033)          (0.004)

(standard errors in parentheses)
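
To see why Tobit recovers β ≈ 1 where OLS does not, it can help to look at the likelihood a Tobit estimator maximizes when the outcome is censored from above. The following is a minimal Python sketch under stated assumptions, not the routine used for the table: the function names, seed, and sample size are hypothetical, and instead of a full maximization it holds α and σ at their true values (0 and 1) and searches a coarse grid for β.

```python
import math
import random


def norm_cdf(z):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def tobit_loglik_upper(alpha, beta, sigma, x, y, cap):
    """Log-likelihood of a bivariate Tobit model censored from above at cap."""
    total = 0.0
    for xi, yi in zip(x, y):
        mu = alpha + beta * xi
        if yi >= cap:
            # Censored: we only know the latent y* >= cap, which has
            # probability 1 - Phi((cap - mu)/sigma) = Phi((mu - cap)/sigma)
            total += math.log(norm_cdf((mu - cap) / sigma))
        else:
            # Uncensored: ordinary normal density of the residual
            total += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total


random.seed(2)  # arbitrary seed (assumption)
n = 5000        # assumed sample size
cap = 1.0       # same censoring point as y2 = min(1, y1)
x = [random.gauss(0, 1) for _ in range(n)]
y = [min(cap, xi + random.gauss(0, 1)) for xi in x]

# Coarse grid over beta with alpha = 0 and sigma = 1 held at their true values
beta_grid = [0.5 + 0.05 * k for k in range(21)]
best_beta = max(beta_grid, key=lambda b: tobit_loglik_upper(0.0, b, 1.0, x, y, cap))

ll_true = tobit_loglik_upper(0.0, 1.0, 1.0, x, y, cap)
ll_ols_like = tobit_loglik_upper(-0.189, 0.755, 1.0, x, y, cap)
print(round(best_beta, 2), ll_true > ll_ols_like)
```

The values -0.189 and 0.755 are the OLS estimates for Y2 reported earlier; the censored likelihood should prefer the true parameters to these attenuated ones.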

Example from CPS
• Data from the 1987 CPS outgoing rotation group
• Households are in the CPS for the same four months in each of two consecutive years (April-July 1987 and 1988)
• ¼ leave the sample temporarily or permanently each month
• In these months, respondents answer detailed questions about current employment
• Union status
• Usual hours, hours of overtime
• Usual weekly earnings
• In each survey, weekly earnings are 'topcoded'
• In the data we use (1987), earnings are topcoded at $999

• Sample: 25% random sample of full-time/full-year male workers, ages 21-64

-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
age             byte   %9.0g                  age in years
race            byte   %9.0g                  1=white, non-hisp, 2=black,
                                                n.h, 3=hisp
educ            byte   %9.0g                  years of education
unionm          byte   %9.0g                  1=union member, 2=otherwise
smsa            byte   %9.0g                  1=live in 19 largest smsa,
                                                2=other smsa, 3=non smsa
region          byte   %9.0g                  1=east, 2=midwest, 3=south,
                                                4=west
earnwke         int    %9.0g                  usual weekly earnings
-------------------------------------------------------------------------------

Need a variable that identifies which observations are censored (topcode below).

. gen earnwkl=ln(earnwke);
. gen union=unionm==1;
. gen topcode=earnwke==999;
. gen black=race==2;
. gen hispanic=race==3;
. * get frequencies of topcode (fraction of obs topcoded);
. tabulate topcode;

      =1 if |
 earnwkl is |
   topcoded |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     18,474       92.81       92.81
          1 |      1,432        7.19      100.00
------------+-----------------------------------
      Total |     19,906      100.00

. * run simple regression on topcoded data;
. reg earnwkl age age2 educ black hispanic union;

[results deleted]

. * run tobit model;
. * here, ul specifies that the dependent variable is;
. * topcoded above (upper censoring);
. tobit earnwkl age age2 educ black hispanic union, ul;

Tobit regression                                  Number of obs   =      19906
                                                  LR chi2(6)      =    7309.06
                                                  Prob > chi2     =     0.0000
Log likelihood = -13207.534                       Pseudo R2       =     0.2167

------------------------------------------------------------------------------
     earnwkl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0703864     .00214    32.89   0.000     .0661919     .074581
        age2 |  -.0006948   .0000262   -26.55   0.000    -.0007461   -.0006435
        educ |   .0757658   .0012172    62.25   0.000       .07338    .0781515
       black |  -.2200147    .011795   -18.65   0.000    -.2431339   -.1968954
    hispanic |  -.1058161   .0141638    -7.47   0.000    -.1335783   -.0780539
       union |   .1191111   .0077791    15.31   0.000     .1038634    .1343588
       _cons |   3.499009   .0421806    82.95   0.000     3.416332    3.581686
-------------+----------------------------------------------------------------
      /sigma |   .4530426   .0023983                      .4483418    .4577434
------------------------------------------------------------------------------
  Obs. summary:          0  left-censored observations
                     18474     uncensored observations
                      1432 right-censored observations at earnwkl>=6.906755

Note: /sigma is similar to the root MSE from an OLS regression.
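* q = share of workers earning 750 or more who are topcoded at 999;
egen q=mean(topcode) if earnwke>=750;
* tail parameter implied by q = (750/999)^alpha under a Pareto upper tail;
gen alpha=ln(q)/(ln(750) - ln(999));
* expected earnings given that earnings exceed 999;
gen ey_y999=999*alpha/(alpha-1);
sum q alpha ey_y999;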

. sum q alpha ey_y999;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           q |      3277     .436985           0    .436985    .436985
       alpha |      3277    2.887721           0   2.887721   2.887721
     ey_y999 |      3277     1528.21           0    1528.21    1528.21

E[Y | Y>c] = αc/(α-1)

α = 2.89

E[Y | Y>999] = (2.89)(999)/(1.89) ≈ 1528
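
The arithmetic above can be reproduced directly. This is a small Python sketch (the slides use Stata), assuming the upper tail of earnings is Pareto, so that Pr(Y ≥ 999 | Y ≥ 750) = (750/999)^α; the function name `pareto_tail_mean` is hypothetical, and the input q = 0.436985 is the value reported in the summary output.

```python
import math


def pareto_tail_mean(q, lower, cap):
    """Return (alpha, E[Y | Y > cap]) implied by a Pareto upper tail.

    q is the share of observations at or above `lower` that are also at or
    above `cap`, so q = (lower / cap) ** alpha.
    """
    alpha = math.log(q) / (math.log(lower) - math.log(cap))
    if alpha <= 1.0:
        # The Pareto mean above the cap is finite only when alpha > 1
        raise ValueError("alpha must exceed 1 for a finite conditional mean")
    return alpha, cap * alpha / (alpha - 1.0)


alpha, mean_above_cap = pareto_tail_mean(q=0.436985, lower=750, cap=999)
print(round(alpha, 3), round(mean_above_cap, 1))
```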


OLS/Tobit when Income is Topcoded at $999

              OLS       Tobit      QF        OLS/Tobit
Age          0.0679    0.0704    0.0723       0.964
Age2        -6.8E-4   -6.9E-4   -7.1E-4       0.985
Educ         0.0701    0.0757    0.0796       0.926
Black       -0.2130   -0.2200   -0.2252       0.968
Hispanic    -0.1096   -0.1058   -0.1049       1.036
Union        0.1316    0.1191    0.1078       1.105
. * artificially topcode wages at 750;
. gen top750=earnwke>=750;
. gen earnwkl3=top750*ln(750) + (1-top750)*ln(earnwke);
. * run regression on model with artificially topcoded wages;
. reg earnwkl3 age age2 educ black hispanic union;
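
For readers less familiar with the Stata idiom above, the artificial topcoding step amounts to flagging earnings at or above the cap and replacing their log with the log of the cap. This is a minimal Python sketch; the function name and the four example earnings values are made up for illustration and are not from the CPS extract.

```python
import math


def topcode_log_earnings(earnings, cap=750.0):
    """Return (flags, log earnings) after topcoding earnings at cap."""
    # 1 if the observation is at or above the cap, else 0 (mirrors top750)
    flags = [1 if e >= cap else 0 for e in earnings]
    # Same formula as earnwkl3: flag*ln(cap) + (1 - flag)*ln(earnings)
    logs = [f * math.log(cap) + (1 - f) * math.log(e)
            for f, e in zip(flags, earnings)]
    return flags, logs


weekly = [300.0, 749.0, 750.0, 999.0]  # hypothetical weekly earnings
top750, earnwkl3 = topcode_log_earnings(weekly)
print(top750, [round(v, 4) for v in earnwkl3])
```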
OLS/Tobit when Income is Topcoded at $750

              OLS       Tobit      QF        OLS/Tobit
Age          0.0635    0.0704    0.0750       0.902
Age2        -6.4E-4   -6.9E-4   -7.4E-4       0.927
Educ         0.0614    0.0755    0.0817       0.813
Black       -0.2013   -0.2211   -0.2326       0.910
Hispanic    -0.1151   -0.1054   -0.1053       1.092
Union        0.1493    0.1318    0.1161       1.132