Econ 60303
Bill Evans
Example: Bias in censored models
• Bivariate regression
• x_i and ε_i are drawn from N(0,1)
• y_i = α + x_i β + ε_i
• Let α = 0 and β = 1 (the 45° line) and construct y
• Estimate y_i = α + x_i β + ε_i
• Consider three LHS variables
• y1 is as reported (no censoring)
• y2 = min(1, y1)
– censored 23.9% of the time
• y3 = min(0.25, y1)
– censored 41.8% of the time
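The setup above can be sketched as a small simulation. This is a Python illustration, not the author's code; the sample size `n` and the `seed` are assumptions, since the slides do not state them. It draws x and ε from N(0,1), builds y1 with α = 0 and β = 1, censors from above at 1 and at 0.25, and fits OLS to each version.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of a bivariate least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n=20000, seed=60303):
    """Simulate y1 = x + e and its two censored versions (n and seed are assumed)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y1 = [xi + rng.gauss(0, 1) for xi in x]   # alpha = 0, beta = 1
    y2 = [min(1.0, v) for v in y1]            # censored from above at 1
    y3 = [min(0.25, v) for v in y1]           # censored from above at 0.25
    results = {}
    for name, y, cap in (("y1", y1, None), ("y2", y2, 1.0), ("y3", y3, 0.25)):
        alpha_hat, beta_hat = ols(x, y)
        # Share of observations whose uncensored value reaches the cap.
        censored = 0.0 if cap is None else sum(v >= cap for v in y1) / n
        results[name] = {"alpha": alpha_hat, "beta": beta_hat, "censored": censored}
    return results


if __name__ == "__main__":
    for name, r in simulate().items():
        print(f"{name}: alpha={r['alpha']:.3f} beta={r['beta']:.3f} "
              f"censored={r['censored']:.3f}")
```

The OLS slope moves further below the true value of 1 as the share of censored observations grows, which is the bias this example illustrates.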
Figure 1: Plot of X and Y1
[Scatter plot: X on the horizontal axis (-4 to 4), Y on the vertical axis (-5 to 4), with the "True" line drawn.]
Figure 1: Plot of X and Y2
[Scatter plot: X on the horizontal axis (-4 to 4), Y on the vertical axis (-5 to 4), with the "True" line and the fitted "OLS" line drawn.]
Figure 1: Plot of X and Y3
[Scatter plot: X on the horizontal axis (-4 to 4), Y on the vertical axis (-5 to 4), with the "True" line and the fitted "OLS" line drawn.]
OLS Estimates of α and β
Dependent variable: Y1, Y2, Y3
Ratio: β_Yj / β_Y1
OLS using Y1    Tobit using Y2    Tobit using Y3
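For reference, the Tobit estimator compared with OLS here maximizes a likelihood that treats censored and uncensored observations differently. The slides do not write it out; the following is the standard form for censoring from above, where the censoring point c (1 for Y2, 0.25 for Y3) and the latent-variable notation y_i^* are assumptions introduced for this sketch.

```latex
% Latent model and observation rule (censoring from above at c)
y_i^* = \alpha + x_i \beta + \varepsilon_i, \qquad
\varepsilon_i \sim N(0, \sigma^2), \qquad
y_i = \min(c, \, y_i^*)

% Log likelihood: density for uncensored obs, tail probability for censored obs
\ln L(\alpha, \beta, \sigma)
  = \sum_{i:\, y_i < c}
      \ln\!\left[ \frac{1}{\sigma}\,
        \phi\!\left( \frac{y_i - \alpha - x_i \beta}{\sigma} \right) \right]
  + \sum_{i:\, y_i = c}
      \ln\!\left[ 1 - \Phi\!\left( \frac{c - \alpha - x_i \beta}{\sigma} \right) \right]
```

Here φ and Φ are the standard normal density and distribution function. Uncensored observations contribute as in an ordinary normal regression, while each censored observation contributes only the probability that the latent value is at or above c.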
Example from CPS
• Data from the 1987 CPS outgoing rotation group
• Households are in the CPS for the same four months in each of two consecutive years (April-July 1987 and April-July 1988)
• ¼ leave the sample temporarily or permanently each month
• In these months, respondents answer detailed questions about current employment
• Union status
• Usual hours, hours of overtime
• Usual weekly earnings
• In each survey, weekly earnings are ‘topcoded’
• In the data we use (1987), earnings are topcoded at $999
• Sample: 25% random sample of full-time, full-year male workers aged 21-64
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
age             byte   %9.0g                  age in years
race            byte   %9.0g                  1=white, non-hisp, 2=black,
                                                n.h, 3=hisp
educ            byte   %9.0g                  years of education
unionm          byte   %9.0g                  1=union member, 2=otherwise
smsa            byte   %9.0g                  1=live in 19 largest smsa,
                                                2=other smsa, 3=non smsa
region          byte   %9.0g                  1=east, 2=midwest, 3=south,
                                                4=west
earnwke         int    %9.0g                  usual weekly earnings
-------------------------------------------------------------------------------
Need a variable that identifies which obs are censored (topcode).
The Percent column of the tabulate output gives the fraction of obs topcoded.
. gen earnwkl=ln(earnwke);
. gen union=unionm==1;
. gen topcode=earnwke==999;
. gen black=race==2;
. gen hispanic=race==3;
. * get frequency of topcode;
. tabulate topcode;
      =1 if |
 earnwkl is |
   topcoded |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     18,474       92.81       92.81
          1 |      1,432        7.19      100.00
------------+-----------------------------------
      Total |     19,906      100.00
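The variable construction above can be mirrored outside Stata. The following Python sketch is an illustration only: the variable names follow the slides, but `add_flags` is a hypothetical helper and the four records are made up, not CPS data.

```python
import math


def add_flags(rows, cap=999):
    """Add log earnings and 0/1 indicators to each record.

    rows: dicts with keys earnwke, unionm and race, coded as in the
    variable list; cap is the topcode value ($999 in the 1987 data).
    """
    flagged = []
    for r in rows:
        flagged.append({
            **r,
            "earnwkl": math.log(r["earnwke"]),      # log weekly earnings
            "union": int(r["unionm"] == 1),         # 1 = union member
            "topcode": int(r["earnwke"] == cap),    # 1 = censored at the topcode
            "black": int(r["race"] == 2),
            "hispanic": int(r["race"] == 3),
        })
    return flagged


# Hypothetical records, for illustration only (not CPS data).
rows = [
    {"earnwke": 450, "unionm": 1, "race": 1},
    {"earnwke": 999, "unionm": 2, "race": 2},
    {"earnwke": 620, "unionm": 2, "race": 3},
    {"earnwke": 999, "unionm": 1, "race": 1},
]
flagged = add_flags(rows)
share_topcoded = sum(r["topcode"] for r in flagged) / len(flagged)

# The share reported in the tabulate output, recomputed from its counts.
reported_share = 1432 / 19906
```

In the made-up records half of the observations are topcoded; in the actual sample the reported counts give 1,432 / 19,906, or about 7.19%.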
. * run simple regression on topcoded data;
. reg earnwkl age age2 educ black hispanic union;
[results omitted]
Tobit regression                                  Number of obs   =      19906
                                                  LR chi2(6)      =    7309.06
                                                  Prob > chi2     =     0.0000
Log likelihood = -13207.534                       Pseudo R2       =     0.2167
------------------------------------------------------------------------------
     earnwkl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0703864     .00214    32.89   0.000     .0661919     .074581
        age2 |  -.0006948   .0000262   -26.55   0.000    -.0007461   -.0006435
        educ |   .0757658   .0012172    62.25   0.000       .07338    .0781515
       black |  -.2200147    .011795   -18.65   0.000    -.2431339   -.1968954
    hispanic |  -.1058161   .0141638    -7.47   0.000    -.1335783   -.0780539
       union |   .1191111   .0077791    15.31   0.000     .1038634    .1343588
       _cons |   3.499009   .0421806    82.95   0.000     3.416332    3.581686
-------------+----------------------------------------------------------------
      /sigma |   .4530426   .0023983                      .4483418    .4577434
------------------------------------------------------------------------------
  Obs. summary:          0  left-censored observations
                     18474     uncensored observations
                      1432 right-censored observations at earnwkl>=6.906755
/sigma is similar to the RMSE in an OLS regression
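To make the reported log likelihood concrete, here is a minimal Python sketch of the objective that a Tobit with right censoring maximizes. It is an illustration under the standard normal-error assumption, not the routine Stata uses; `tobit_loglik` and its arguments are hypothetical names, and `xb` stands for the fitted index (constant plus coefficients times regressors) for each observation.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_sf(z):
    """Upper-tail probability 1 - Phi(z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tobit_loglik(y, xb, sigma, cap):
    """Log likelihood for outcomes right-censored at cap.

    y: observed outcomes (equal to cap when censored)
    xb: fitted index value for each observation
    sigma: standard deviation of the error term (must be positive)
    """
    total = 0.0
    for yi, mi in zip(y, xb):
        if yi >= cap:
            # Censored: only know the latent value is at or above the cap.
            total += math.log(norm_sf((cap - mi) / sigma))
        else:
            # Uncensored: ordinary normal density contribution.
            total += norm_logpdf((yi - mi) / sigma) - math.log(sigma)
    return total


# The censoring point in the output above is the log of the $999 topcode.
cap = math.log(999)
```

With the log earnings and fitted values from the CPS sample, evaluating this function at the reported coefficients and /sigma should correspond to the log likelihood in the output, although that has not been checked here because the data are not part of the slides.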
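* q = fraction topcoded among workers earning at least $750 per week;
egen q=mean(topcode) if earnwke>=750;
* Pareto shape parameter implied by q;
gen alpha=ln(q)/(ln(750) - ln(999));
* expected earnings given topcoded: 999*alpha/(alpha-1);
gen ey_y999=999*alpha/(alpha-1);
sum q alpha ey_y999;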
α = 2.89
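The calculation above rests on treating the upper tail of earnings as Pareto. Under that assumption, P(y ≥ 999 | y ≥ 750) = (750/999)^α, which gives the formula for α, and E[y | y ≥ 999] = 999·α/(α − 1). The Python sketch below works through the arithmetic using the reported α = 2.89; the function names are hypothetical.

```python
import math


def pareto_shape(q, lower=750.0, topcode=999.0):
    """Shape parameter alpha implied by q = P(y >= topcode | y >= lower)."""
    return math.log(q) / (math.log(lower) - math.log(topcode))


def mean_above_topcode(alpha, topcode=999.0):
    """E[y | y >= topcode] for a Pareto tail; finite only when alpha > 1."""
    if alpha <= 1:
        raise ValueError("the conditional mean is not finite for alpha <= 1")
    return topcode * alpha / (alpha - 1.0)


if __name__ == "__main__":
    alpha = 2.89  # value reported in the slides
    print(f"E[y | y >= 999] = {mean_above_topcode(alpha):.2f}")
    # Share topcoded among those earning >= 750 that is consistent with alpha.
    print(f"implied q = {(750.0 / 999.0) ** alpha:.3f}")
```

With α = 2.89 the implied mean of topcoded weekly earnings is 999 × 2.89 / 1.89, roughly $1,528, which is one candidate value to assign to topcoded observations.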