
Part 18: Ordered Outcomes [1/55]

Econometric Analysis of Panel Data

William Greene
Department of Economics
University of South Florida

18. Ordered Outcomes and Interval Censoring

Ordered Discrete Outcomes

• E.g.: taste test, credit rating, course grade, preference scale
• Underlying random preferences:
  • Existence of an underlying continuous preference scale
  • Mapping to observed choices
• Strength of preferences is reflected in the discrete outcome
• Censoring and discrete measurement
• The nature of ordered data

Ordered Choices at IMDb



Health Satisfaction (HSAT)


Self administered survey: Health Care Satisfaction? (0 – 10)

Continuous Preference Scale



Modeling Ordered Choices

• Random utility (allowing a panel data setting):
  $U_{it} = \alpha + \beta' x_{it} + \varepsilon_{it} = a_{it} + \varepsilon_{it}$
• Observe outcome j if utility is in region j.
• Probability of outcome = probability of cell:
  $\mathrm{Pr}[Y_{it} = j] = F(\mu_j - a_{it}) - F(\mu_{j-1} - a_{it})$

Ordered Probability Model


$y^* = \beta' x + \varepsilon$; we assume x contains a constant term.
$y = 0$ if $y^* \le 0$
$y = 1$ if $0 < y^* \le \mu_1$
$y = 2$ if $\mu_1 < y^* \le \mu_2$
$y = 3$ if $\mu_2 < y^* \le \mu_3$
...
$y = J$ if $\mu_{J-1} < y^* \le \mu_J$
In general: $y = j$ if $\mu_{j-1} < y^* \le \mu_j$, $j = 0, 1, \ldots, J$,
with $\mu_{-1} = -\infty$, $\mu_0 = 0$, $\mu_J = +\infty$, and $\mu_{j-1} < \mu_j$, $j = 1, \ldots, J$.

Combined Outcomes for Health Satisfaction



Ordered Probabilities

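$\mathrm{Prob}[y = j] = \mathrm{Prob}[\mu_{j-1} < y^* \le \mu_j]$
$= \mathrm{Prob}[\mu_{j-1} < \beta' x + \varepsilon \le \mu_j]$
$= \mathrm{Prob}[\beta' x + \varepsilon \le \mu_j] - \mathrm{Prob}[\beta' x + \varepsilon \le \mu_{j-1}]$
$= \mathrm{Prob}[\varepsilon \le \mu_j - \beta' x] - \mathrm{Prob}[\varepsilon \le \mu_{j-1} - \beta' x]$
$= F[\mu_j - \beta' x] - F[\mu_{j-1} - \beta' x]$
where $F[\cdot]$ is the CDF of $\varepsilon$.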
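The cell probabilities can be sketched in a few lines of Python. This is a minimal illustration, assuming a standard normal F (ordered probit); the function name `ordered_probs` and the numerical values of the index and thresholds are hypothetical and not taken from the slides.

```python
from statistics import NormalDist

def ordered_probs(index, cutpoints, cdf=NormalDist().cdf):
    """Return [Prob(y=0), ..., Prob(y=J)] for a latent index beta'x.

    cutpoints holds the finite thresholds mu_0 = 0 < mu_1 < ... < mu_{J-1};
    mu_{-1} = -inf and mu_J = +inf are implied.
    """
    # F evaluated at each finite threshold, padded with F(+inf) = 1 on top
    # and F(-inf) = 0 at the bottom.
    upper = [cdf(mu - index) for mu in cutpoints] + [1.0]
    lower = [0.0] + upper[:-1]
    return [u - l for u, l in zip(upper, lower)]

# Hypothetical index and thresholds, chosen only for illustration.
probs = ordered_probs(index=0.8, cutpoints=[0.0, 0.7, 1.5, 2.4])
print([round(p, 4) for p in probs])
```

Because each probability is a difference of adjacent CDF values, the J+1 probabilities are positive whenever the thresholds are strictly increasing, and they sum to one by construction.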

Coefficients
• What are the coefficients in the ordered probit model? There is no conditional mean function.
  $\partial \mathrm{Prob}[y = j \mid x] / \partial x_k = [f(\mu_{j-1} - \beta' x) - f(\mu_j - \beta' x)]\, \beta_k$
  Magnitude depends on the scale factor and the coefficient.
  Sign depends on the densities at the two points!
• What does it mean that a coefficient is "significant?"

Partial Effects in the Ordered Choice Model


Assume that βk is positive and that xk increases.
β’x increases, so μj − β’x shifts to the left for all 5 cells.
Prob[y=0] decreases.
Prob[y=1] decreases: the mass shifted out is larger than the mass shifted in.
Prob[y=3] increases: same reason in reverse.
Prob[y=4] must increase.
When βk > 0, an increase in xk decreases Prob[y=0] and increases Prob[y=J]. Intermediate cells are ambiguous, but there is only one sign change in the marginal effects from 0 to 1 to … to J.
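The single sign change can be checked numerically. The sketch below assumes an ordered probit with five cells; the function name `partial_effects` and all numerical values are hypothetical, chosen only to illustrate the pattern.

```python
from statistics import NormalDist

def partial_effects(index, cutpoints, beta_k, pdf=NormalDist().pdf):
    """dProb(y=j)/dx_k = [f(mu_{j-1} - b'x) - f(mu_j - b'x)] * beta_k, j = 0..J."""
    # Densities at the finite thresholds; f(-inf) = f(+inf) = 0 at the ends.
    dens = [0.0] + [pdf(mu - index) for mu in cutpoints] + [0.0]
    return [(dens[j] - dens[j + 1]) * beta_k for j in range(len(dens) - 1)]

# Hypothetical index, thresholds, and a positive coefficient.
effects = partial_effects(index=0.8, cutpoints=[0.0, 0.7, 1.5, 2.4], beta_k=0.5)
signs = [e > 0 for e in effects]
sign_changes = sum(a != b for a, b in zip(signs, signs[1:]))
print([round(e, 4) for e in effects], sign_changes)
```

The effects sum to zero across cells because the probabilities sum to one, so any mass leaving the low cells must show up in the high cells.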

Partial Effects of 8 Years of Education



An Ordered Probability Model for Health Satisfaction
+---------------------------------------------+
| Ordered Probability Model |
| Dependent variable HSAT |
| Number of observations 27326 |
| Underlying probabilities based on Normal |
| Cell frequencies for outcomes |
| Y Count Freq Y Count Freq Y Count Freq |
| 0 447 .016 1 255 .009 2 642 .023 |
| 3 1173 .042 4 1390 .050 5 4233 .154 |
| 6 2530 .092 7 4231 .154 8 6172 .225 |
| 9 3061 .112 10 3192 .116 |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Index function for probability
Constant 2.61335825 .04658496 56.099 .0000
FEMALE -.05840486 .01259442 -4.637 .0000 .47877479
EDUC .03390552 .00284332 11.925 .0000 11.3206310
AGE -.01997327 .00059487 -33.576 .0000 43.5256898
HHNINC .25914964 .03631951 7.135 .0000 .35208362
HHKIDS .06314906 .01350176 4.677 .0000 .40273000
Threshold parameters for index
Mu(1) .19352076 .01002714 19.300 .0000
Mu(2) .49955053 .01087525 45.935 .0000
Mu(3) .83593441 .00990420 84.402 .0000
Mu(4) 1.10524187 .00908506 121.655 .0000
Mu(5) 1.66256620 .00801113 207.532 .0000
Mu(6) 1.92729096 .00774122 248.965 .0000
Mu(7) 2.33879408 .00777041 300.987 .0000
Mu(8) 2.99432165 .00851090 351.822 .0000
Mu(9) 3.45366015 .01017554 339.408 .0000

Ordered Probability Partial Effects


+----------------------------------------------------+
| Marginal effects for ordered probability model |
| M.E.s for dummy variables are Pr[y|x=1]-Pr[y|x=0] |
| Names for dummy variables are marked by *. |
+----------------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
These are the effects on Prob[Y=00] at means.
*FEMALE .00200414 .00043473 4.610 .0000 .47877479
EDUC -.00115962 .986135D-04 -11.759 .0000 11.3206310
AGE .00068311 .224205D-04 30.468 .0000 43.5256898
HHNINC -.00886328 .00124869 -7.098 .0000 .35208362
*HHKIDS -.00213193 .00045119 -4.725 .0000 .40273000
These are the effects on Prob[Y=01] at means.
*FEMALE .00101533 .00021973 4.621 .0000 .47877479
EDUC -.00058810 .496973D-04 -11.834 .0000 11.3206310
AGE .00034644 .108937D-04 31.802 .0000 43.5256898
HHNINC -.00449505 .00063180 -7.115 .0000 .35208362
*HHKIDS -.00108460 .00022994 -4.717 .0000 .40273000
... repeated for all 11 outcomes
These are the effects on Prob[Y=10] at means.
*FEMALE -.01082419 .00233746 -4.631 .0000 .47877479
EDUC .00629289 .00053706 11.717 .0000 11.3206310
AGE -.00370705 .00012547 -29.545 .0000 43.5256898
HHNINC .04809836 .00678434 7.090 .0000 .35208362
*HHKIDS .01181070 .00255177 4.628 .0000 .40273000
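The EDUC rows in this table can be recomputed from the reported estimates. The sketch below assumes the effects are evaluated at the sample means of all regressors, with μ0 = 0 and a standard normal density; the variable names are illustrative. The dummy-variable rows are differences in probabilities rather than derivatives, so they are not reproduced here.

```python
from statistics import NormalDist

phi = NormalDist().pdf

# Estimates and sample means copied from the ordered probit output tables.
coef = {"Constant": 2.61335825, "FEMALE": -0.05840486, "EDUC": 0.03390552,
        "AGE": -0.01997327, "HHNINC": 0.25914964, "HHKIDS": 0.06314906}
mean = {"Constant": 1.0, "FEMALE": 0.47877479, "EDUC": 11.3206310,
        "AGE": 43.5256898, "HHNINC": 0.35208362, "HHKIDS": 0.40273000}
# mu_0 = 0 followed by Mu(1), ..., Mu(9).
mu = [0.0, 0.19352076, 0.49955053, 0.83593441, 1.10524187, 1.66256620,
      1.92729096, 2.33879408, 2.99432165, 3.45366015]

index = sum(coef[k] * mean[k] for k in coef)        # beta'x at the means

# Densities at mu_{-1} = -inf, mu_0, ..., mu_9, mu_10 = +inf.
dens = [0.0] + [phi(m - index) for m in mu] + [0.0]
educ_effect = [(dens[j] - dens[j + 1]) * coef["EDUC"] for j in range(11)]

# Compare with the table entries -.00115962, -.00058810 and .00629289.
print(educ_effect[0], educ_effect[1], educ_effect[10])
```

The three printed values should land close to the reported EDUC effects on Prob[Y=00], Prob[Y=01], and Prob[Y=10].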

Ordered Probit Marginal Effects



Analysis of Model Implications

• Partial Effects
• Fit Measures
• Predicted Probabilities
  • Averaged: They match sample proportions.
  • By observation
  • Segments of the sample
  • Related to particular variables

Fit Measures
• There is no single “dependent variable” to explain.
• There is no sum of squares or other measure of “variation” to explain.
• Predictions of the model relate to a set of J+1 probabilities, not a single variable.
• How to explain fit?
  • Based on the underlying regression
  • Based on the likelihood function
  • Based on prediction of the outcome variable

Log Likelihood Based Fit Measures



A Somewhat Better Fit



Zero Inflated Ordered Probit


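Behavioral Regime (Latent Class) = "Participation"
$p_{it}^* = z_{it}'\alpha + u_{it}$, $p_{it} = 1[p_{it}^* > 0]$ (PROBIT Model)
Nonparticipants ($p_{it} = 0$) always report $y_{it} = 0$.
Participants ($p_{it} = 1$) report $y_{it} = 0, 1, 2, \ldots, J$ (Ordered)
Consumer Behavior (Ordered Outcome)
$y_{it}^* = x_{it}'\beta + \varepsilon_{it}$
$y_{it} = 0$ if $y_{it}^* \le 0$; $\mathrm{Prob}[y_{it} = 0] = \Phi[-x_{it}'\beta]$
$y_{it} = 1$ if $0 < y_{it}^* \le \mu_1$; $\mathrm{Prob}[y_{it} = 1] = \Phi[\mu_1 - x_{it}'\beta] - \Phi[-x_{it}'\beta]$
$y_{it} = 2$ if $\mu_1 < y_{it}^* \le \mu_2$; $\mathrm{Prob}[y_{it} = 2] = \Phi[\mu_2 - x_{it}'\beta] - \Phi[\mu_1 - x_{it}'\beta]$
...
$y_{it} = J$ if $y_{it}^* > \mu_{J-1}$; $\mathrm{Prob}[y_{it} = J] = 1 - \Phi[\mu_{J-1} - x_{it}'\beta]$
Implied Probabilities
$\mathrm{Prob}[y_{it} = 0] = \mathrm{Prob}[p_{it} = 0] + \mathrm{Prob}[p_{it} = 1]\, \mathrm{Prob}[y_{it} = 0 \mid p_{it} = 1]$
$\mathrm{Prob}[y_{it} = j > 0] = \mathrm{Prob}[p_{it} = 1]\, \mathrm{Prob}[y_{it} = j \mid p_{it} = 1]$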
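The implied probabilities combine a probit for participation with an ordered probit for participants. The sketch below is one way to code that combination, assuming the two disturbances are independent (which is what multiplying the two probabilities implies); the function name `ziop_probs` and the numerical inputs are hypothetical.

```python
from statistics import NormalDist

Phi = NormalDist().cdf

def ziop_probs(split_index, order_index, cutpoints):
    """Zero-inflated ordered probit cell probabilities for outcomes 0..J.

    split_index : index of the participation probit
    order_index : index x'beta of the ordered probit for participants
    cutpoints   : mu_1 < ... < mu_{J-1}; mu_0 = 0 is added here
    """
    p_participate = Phi(split_index)
    cdf_vals = [Phi(mu - order_index) for mu in [0.0] + list(cutpoints)] + [1.0]
    # Ordered probit probabilities conditional on participation.
    ordered = [cdf_vals[0]] + [b - a for a, b in zip(cdf_vals, cdf_vals[1:])]
    probs = [p_participate * p for p in ordered]
    probs[0] += 1.0 - p_participate      # nonparticipants always report 0
    return probs

# Hypothetical indexes and thresholds for a four-outcome example.
probs = ziop_probs(split_index=0.3, order_index=0.5, cutpoints=[0.8, 1.6])
print([round(p, 4) for p in probs])
```

The zero cell receives mass from two sources, which is how the model accommodates a share of zeros larger than a single ordered probit would predict.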

Teenage Smoking
Harris, M. and Zhao, Z., "Modelling Tobacco Consumption with a Zero
Inflated Ordered Probit Model," (Monash University - under review,
Journal of Econometrics, 2005)
"How often do you currently smoke cigarettes, pipes or other tobacco
products in the last 12 months?"
0 = Not at all (76%)
1 = Less frequently than weekly (4%)
2 = Daily, less than 20/day (13.8%)
3 = Daily, more than 20/day (6.2%)
Splitting Equation: Young & Female, Log(Age), Male, married, Working,
Unemployed, English speaking, ...
Smoking Equation: Prices of alcohol, marijuana, tobacco, Age, Sex,
Married, English speaking, ...

Interval Censored Data



Interval Censored Data

$y_{it}^* = x_{it}'\beta + \varepsilon_{it}$
$y_{it} = 0$ if $y_{it}^* \le a_0$
$y_{it} = 1$ if $a_0 < y_{it}^* \le a_1$
$y_{it} = 2$ if $a_1 < y_{it}^* \le a_2$
...
$y_{it} = J-1$ if $a_{J-2} < y_{it}^* \le a_{J-1}$
$y_{it} = J$ if $y_{it}^* > a_{J-1}$
$a_j$ are known censoring thresholds

Income Data

Interval Censored Income Data

Income interval:  0-.15   .15-.25   .25-.30   .30-.35   .35-.40   .40+
Coded outcome:    0       1         2         3         4         5

How do these differ from the health satisfaction data?



Interval Censored Data

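$y_{it}^* = x_{it}'\beta + \varepsilon_{it}$, $\varepsilon_{it} \sim N[0, \sigma^2]$
$y_{it} = 0$ if $y_{it}^* \le a_0$; $\mathrm{Prob}[y_{it} = 0] = \Phi\left(\frac{a_0 - x_{it}'\beta}{\sigma}\right)$
$y_{it} = 1$ if $a_0 < y_{it}^* \le a_1$; $\mathrm{Prob}[y_{it} = 1] = \Phi\left(\frac{a_1 - x_{it}'\beta}{\sigma}\right) - \Phi\left(\frac{a_0 - x_{it}'\beta}{\sigma}\right)$
$y_{it} = j$ if $a_{j-1} < y_{it}^* \le a_j$; $\mathrm{Prob}[y_{it} = j] = \Phi\left(\frac{a_j - x_{it}'\beta}{\sigma}\right) - \Phi\left(\frac{a_{j-1} - x_{it}'\beta}{\sigma}\right)$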
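Because the thresholds are known, σ is identified and the log likelihood is a sum of log cell probabilities. The following is a minimal sketch, assuming normal disturbances; the thresholds are those of the income example, while the function names, index values, and σ are hypothetical.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf

def cell_prob(j, xb, sigma, bounds):
    """Prob[y = j] = Phi((a_j - xb)/sigma) - Phi((a_{j-1} - xb)/sigma).

    bounds = [a_0, ..., a_{J-1}] are the known censoring thresholds; the
    bottom cell is open below and the top cell is open above.
    """
    upper = Phi((bounds[j] - xb) / sigma) if j < len(bounds) else 1.0
    lower = Phi((bounds[j - 1] - xb) / sigma) if j > 0 else 0.0
    return upper - lower

def log_likelihood(y, xb, sigma, bounds):
    """Sum of log cell probabilities over observations (y_i, x_i'beta)."""
    return sum(math.log(cell_prob(j, v, sigma, bounds)) for j, v in zip(y, xb))

# Thresholds from the income example; index values and sigma are hypothetical.
income_bounds = [0.15, 0.25, 0.30, 0.35, 0.40]
probs = [cell_prob(j, 0.30, 0.12, income_bounds) for j in range(6)]
ll = log_likelihood([0, 3, 5], [0.20, 0.31, 0.45], 0.12, income_bounds)
print([round(p, 4) for p in probs], round(ll, 4))
```

In an actual application this function would be maximized over β and σ; here it is only evaluated at fixed values.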

Interval Censored Data Model


+---------------------------------------------+
| Limited Dependent Variable Model - CENSORED |
| Dependent variable INCNTRVL |
| Iterations completed 10 |
| Akaike IC=15285.458 Bayes IC=15317.663 |
| Finite sample corrected AIC =15285.471 |
| Censoring Thresholds for the 6 cells: |
| Lower Upper Lower Upper |
| 1 ******* .15 2 .15 .25 |
| 3 .25 .30 4 .30 .35 |
| 5 .35 .40 6 .40 ******* |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Primary Index Equation for Model
Constant .09855610 .01405518 7.012 .0000
AGE -.00117933 .00016720 -7.053 .0000 46.7491906
EDUC .01728507 .00092143 18.759 .0000 10.9669624
MARRIED .09317316 .00441004 21.128 .0000 .75458666
Sigma .11819820 .00169166 69.871 .0000
OLS Standard error of e = .1558463
Constant .07968461 .01698076 4.693 .0000
AGE -.00105530 .00020911 -5.047 .0000 46.7491906
EDUC .02096821 .00108429 19.338 .0000 10.9669624
MARRIED .09198074 .00540896 17.005 .0000 .75458666

The Interval Censored Data Model

• What are the marginal effects?
• How do you predict the dependent variable?
• Does the model fit the “data?”

Panel Data Models



Fixed Effects in Ordered Probit


The fixed effects model (FEM) is feasible, but it still has the incidental parameters (IP) problem.
The model also does not allow time-invariant variables. (True for all FE models.)

+---------------------------------------------+
| FIXED EFFECTS OrdPrb Model for HSAT |
| Probability model based on Normal |
| Unbalanced panel has 7293 individuals. |
| Bypassed 1626 groups with inestimable a(i). |
| Ordered probit (normal) model |
| LHS variable = values 0,1,...,10 |
+---------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
---------+Index function for probability
AGE | -.07112929 .00272163 -26.135 .0000 43.9209856
HHNINC | .30440707 .06911872 4.404 .0000 .35112607
HHKIDS | -.05314566 .02759325 -1.926 .0541 .40921377
MU(1) | .32488357 .02036536 15.953 .0000
MU(2) | .84482743 .02736195 30.876 .0000
MU(3) | 1.39401405 .03002759 46.424 .0000
MU(4) | 1.82295281 .03102039 58.766 .0000
MU(5) | 2.69905015 .03228035 83.613 .0000
MU(6) | 3.12710938 .03273985 95.514 .0000
MU(7) | 3.79215121 .03344945 113.370 .0000
MU(8) | 4.84337386 .03489854 138.784 .0000
MU(9) | 5.57234230 .03629839 153.515 .0000

Incidental Parameters Problem


Table 9.1 Monte Carlo Analysis of the Bias of the MLE in Fixed
Effects Discrete Choice Models (Means of empirical sampling
distributions, N = 1,000 individuals, R = 200 replications)

Solution to IP in Ordered Choice Model


Fixed effects ordered logit with individual specific thresholds:
  $\mathrm{Prob}[y_{i,t} = d_{i,t} \mid x_{i,t}] = \Lambda[\mu_{d_{i,t},i} - x_{i,t}'\beta - \alpha_i] - \Lambda[\mu_{d_{i,t}-1,i} - x_{i,t}'\beta - \alpha_i]$
This can be transformed into J-1 binary choice models:
  $\mathrm{Prob}[y_{i,t} \ge k \mid x_{i,t}] = \Lambda[x_{i,t}'\beta + \alpha_i - \mu_{k-1,i}] = \Lambda[x_{i,t}'\beta + \delta_{i,k}]$
(The individual specific thresholds part is meaningless.)
The resulting model is a fixed effects logit model.
(1) Use the "Chamberlain" fixed effects estimator to estimate $\beta$.
(2) This provides multiple estimators of $\beta$. If $y_{i,t} \ge k$, then it is also $\ge k-1$. E.g., suppose J=5. If Y=3, then Y is greater than or equal to 0, 1, 2 and 3.
(3) How to reconcile the multiple estimates of $\beta$? Use minimum distance estimation (MDE) after estimating them.
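The data preparation behind steps (1) and (2) can be sketched as follows. This is only an illustration of the dichotomization, not an estimator: the function name `binary_splits`, the toy panel, and the coding of cut points as k = 1,...,J for outcomes 0,...,J are assumptions made here. Individuals whose binary sequence never changes are dropped because they contribute nothing to a conditional fixed effects logit likelihood.

```python
def binary_splits(panel, J):
    """Build one binary panel per cut point k: d_it(k) = 1[y_it >= k].

    panel maps an individual id to the list [y_i1, ..., y_iT], y in 0..J.
    Only individuals whose binary sequence varies over time are kept.
    """
    splits = {}
    for k in range(1, J + 1):
        kept = {}
        for i, ys in panel.items():
            d = [int(y >= k) for y in ys]
            if 0 < sum(d) < len(d):      # varies over time for this cut
                kept[i] = d
        splits[k] = kept
    return splits

# Hypothetical three-person panel with outcomes on a 0..5 scale.
panel = {"a": [3, 3, 4], "b": [0, 2, 5], "c": [1, 1, 1]}
splits = binary_splits(panel, J=5)
for k, groups in splits.items():
    print(k, groups)
```

Each of the binary panels would then be passed to a conditional logit routine, producing one estimate of β per cut point to be reconciled in step (3).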

Two Studies
• Ferrer-i-Carbonell, A. and Frijters, P., “How Important is Methodology for the Estimates of the Determinants of Happiness?” Working paper, University of Amsterdam, 2004.
• Das, M. and van Soest, A., “A Panel Data Model for Subjective Information in Household Income Growth,” Journal of Economic Behavior and Organization, 40, 1999, 409-426.

Omitted Heterogeneity in the Ordered Probability Model

$y_{it}^* = x_{it}'\beta + u_i + \varepsilon_{it}$
$y_{it} = j$ if $\mu_{j-1} < y_{it}^* \le \mu_j$
$\mathrm{Prob}[y_{it} = j] = \mathrm{Prob}[x_{it}'\beta + u_i + \varepsilon_{it} \le \mu_j] - \mathrm{Prob}[x_{it}'\beta + u_i + \varepsilon_{it} \le \mu_{j-1}]$
$= \Phi\left(\frac{\mu_j - x_{it}'\beta}{\sqrt{1 + \sigma_u^2}}\right) - \Phi\left(\frac{\mu_{j-1} - x_{it}'\beta}{\sqrt{1 + \sigma_u^2}}\right)$
Ignoring the heterogeneity produces estimates of the scaled coefficients and threshold parameters.
Marginal effects are
$\frac{\partial \mathrm{Prob}[y_{it} = j]}{\partial x_{it}} = \left[\phi\left(\frac{\mu_{j-1} - x_{it}'\beta}{\sqrt{1 + \sigma_u^2}}\right) - \phi\left(\frac{\mu_j - x_{it}'\beta}{\sqrt{1 + \sigma_u^2}}\right)\right] \frac{\beta}{\sqrt{1 + \sigma_u^2}}$

Does the scaling erase the bias due to ignoring the heterogeneity?

Random Effects Ordered Probit


+---------------------------------------------+
| Random Effects Ordered Probability Model |
| Log likelihood function -7350.039 |
| Number of parameters 10 |
| Akaike IC=14720.078 Bayes IC=14784.488 |
| Log likelihood function -7570.099 |
| Number of parameters 9 |
| Akaike IC=15158.197 Bayes IC=15216.166 |
| Chi squared 440.1194 |
| Degrees of freedom 1 |
| Prob[ChiSqd > value] = .0000000 |
| Underlying probabilities based on Normal |
| Unbalanced panel has 2721 individuals. |
+---------------------------------------------+
The log likelihood rises by about 220 (from -7570.099 to -7350.039) when the random effect is added.
AIC falls from 15158.197 to 14720.078.
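The statistics in the box follow directly from the two log likelihoods. A short check, using only the reported values (the variable names are illustrative):

```python
import math

loglik_re, k_re = -7350.039, 10           # random effects model
loglik_pooled, k_pooled = -7570.099, 9    # pooled model

# Likelihood ratio statistic for the one added parameter (sigma).
lr_stat = 2.0 * (loglik_re - loglik_pooled)

# AIC = 2k - 2 logL for each model.
aic_re = 2 * k_re - 2 * loglik_re
aic_pooled = 2 * k_pooled - 2 * loglik_pooled

# With 1 degree of freedom, P[chi2 > c] = erfc(sqrt(c / 2)).
p_value = math.erfc(math.sqrt(lr_stat / 2.0))

print(round(lr_stat, 3), round(aic_re, 3), round(aic_pooled, 3), p_value)
```

The computed chi squared and AIC values agree with the reported 440.1194, 14720.078, and 15158.197 up to rounding of the log likelihoods.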

Random Effects Ordered Probit


+---------+--------------+----------------+--------+---------+
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] |
+---------+--------------+----------------+--------+---------+
Index function for probability
Constant 2.30977026 .19358195 11.932 .0000
AGE -.01871746 .00209003 -8.956 .0000
LOGINC .18063717 .04447407 4.062 .0000
EDUC .05189883 .01138694 4.558 .0000
MARRIED .16934087 .05625235 3.010 .0026
Threshold parameters for index model
Mu(01) .37231012 .02099440 17.734 .0000
Mu(02) 1.02152648 .02996734 34.088 .0000
Mu(03) 1.90942649 .03834274 49.799 .0000
Mu(04) 3.13364227 .05394482 58.090 .0000
Std. Deviation of random effect
Sigma .86357820 .03459713 24.961 .0000
+---------+--------------+----------------+--------+--------+
Index function for probability
Constant 1.73092403 .13201381 13.112 .0000
AGE -.01459464 .00141680 -10.301 .0000
LOGINC .17731072 .03283610 5.400 .0000
EDUC .03956549 .00760040 5.206 .0000
MARRIED .09513703 .03850569 2.471 .0135
Threshold parameters for index
Mu(1) .27875355 .01454454 19.166 .0000
Mu(2) .76803748 .01708019 44.967 .0000
Mu(3) 1.44624995 .01794090 80.612 .0000
Mu(4) 2.37085047 .02336295 101.479 .0000
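The two sets of estimates above give a rough check on the scaling result for omitted heterogeneity: the pooled estimates should approximate the random effects estimates divided by the square root of (1 + σ²). The sketch below assumes the first block is the random effects model and the second is the pooled model; the comparison is only suggestive, since both sets are estimates.

```python
import math

sigma_u = 0.86357820   # Std. Deviation of random effect

re_est = {"Constant": 2.30977026, "AGE": -0.01871746, "LOGINC": 0.18063717,
          "EDUC": 0.05189883, "MARRIED": 0.16934087, "Mu(1)": 0.37231012,
          "Mu(2)": 1.02152648, "Mu(3)": 1.90942649, "Mu(4)": 3.13364227}
pooled = {"Constant": 1.73092403, "AGE": -0.01459464, "LOGINC": 0.17731072,
          "EDUC": 0.03956549, "MARRIED": 0.09513703, "Mu(1)": 0.27875355,
          "Mu(2)": 0.76803748, "Mu(3)": 1.44624995, "Mu(4)": 2.37085047}

scale = math.sqrt(1.0 + sigma_u ** 2)
scaled = {name: value / scale for name, value in re_est.items()}

for name in re_est:
    print(f"{name:9s} RE/scale = {scaled[name]:9.5f}   pooled = {pooled[name]:9.5f}")
```

In this sample the scaled thresholds line up closely with the pooled thresholds, while some slope coefficients (LOGINC, MARRIED) differ more noticeably.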

RE Ordered Probit Fits Worse


+---------------------------------------------------------------------------+
| Cross tabulation of predictions. Row is actual, column is predicted. |
| Model = Probit . Prediction is number of the most probable cell. |
+-------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| Actual|Row Sum| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
+-------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| 0| 447| 0| 0| 0| 163| 284| 0|
| 1| 255| 0| 0| 0| 77| 178| 0|
| 2| 642| 0| 0| 0| 177| 465| 0|
| 3| 1173| 0| 0| 0| 255| 918| 0|
| 4| 1390| 0| 0| 0| 285| 1105| 0|
| 5| 726| 0| 0| 0| 88| 638| 0| Random Effects Model
+-------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|Col Sum| 4633| 0| 0| 0| 1045| 3588| 0| 0| 0| 0| 0|
+-------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| 0| 447| 1| 0| 0| 135| 311| 0|
| 1| 255| 0| 0| 0| 66| 189| 0|
| 2| 642| 2| 0| 0| 141| 499| 0|
| 3| 1173| 1| 0| 0| 212| 960| 0|
| 4| 1390| 1| 0| 0| 217| 1172| 0|
| 5| 726| 1| 0| 0| 68| 657| 0| Pooled Model
+-------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|Col Sum| 4633| 6| 0| 0| 839| 3788| 0| 0| 0| 0| 0|
+-------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
+---------------------------------------------+
| Random Coefficients OrdProbs Model |
| Log likelihood function -7399.789 |
| Number of parameters 14 |
| Akaike IC=14827.577 Bayes IC=14917.751 |
| LHS variable = values 0,1,..., 5 |
| Simulation based on 10 Halton draws |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Means for random parameters
Constant 2.20558990 .09383245 23.506 .0000
AGE -.01777008 .00100651 -17.655 .0000 46.7491906
LOGINC .22137632 .02324751 9.523 .0000 -1.23143358
EDUC .04993003 .00533564 9.358 .0000 10.9669624
MARRIED .15204526 .02732037 5.565 .0000 .75458666
Scale parameters for dists. of random parameters
Constant .73499851 .01269198 57.910 .0000
AGE .00450991 .00023099 19.524 .0000
LOGINC .18122682 .00982249 18.450 .0000
EDUC .00242171 .00098524 2.458 .0140
MARRIED .17686840 .01274872 13.873 .0000
Threshold parameters for probabilities
MU(1) .35236133 .01417318 24.861 .0000
MU(2) .96740071 .01930160 50.120 .0000
MU(3) 1.81667039 .02269549 80.045 .0000
MU(4) 2.99534033 .02813426 106.466 .0000

A Dynamic Ordered Probit Model



Model for Self Assessed Health

• British Household Panel Survey (BHPS)
  • Waves 1-8, 1991-1998
  • Self assessed health on 0,1,2,3,4 scale
  • Sociological and demographic covariates
  • Dynamics: inertia in reporting of top scale
• Dynamic ordered probit model
  • Balanced panel: analyze dynamics
  • Unbalanced panel: examine attrition

Data

Variable of Interest

Dynamic Ordered Probit Model


Latent Regression - Random Utility
  $h_{it}^* = x_{it}'\beta + H_{i,t-1}'\gamma + \alpha_i + \varepsilon_{it}$
  $x_{it}$ = relevant covariates and control variables
  $H_{i,t-1}$ = 0/1 indicators of reported health status in previous period
  $H_{i,t-1}(j) = 1[\text{individual } i \text{ reported } h_{i,t-1} = j \text{ in previous period}]$, $j = 0, \ldots, 4$
  (It would not be appropriate to include $h_{i,t-1}$ itself in the model, as this is a label, not a measure.)
Ordered Choice Observation Mechanism
  $h_{it} = j$ if $\mu_{j-1} < h_{it}^* \le \mu_j$, $j = 0, 1, 2, 3, 4$
Ordered Probit Model: $\varepsilon_{it} \sim N[0, 1]$
Random Effects with Mundlak Correction and Initial Conditions
  $\alpha_i = \alpha_0 + \alpha_1' H_{i,1} + \alpha_2' \bar{x}_i + u_i$, $u_i \sim N[0, \sigma^2]$
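The lagged state indicators in the dynamic model enter as a set of dummy variables rather than as the numeric label itself. A minimal sketch of their construction for one individual follows; the function name `lagged_state_dummies` and the example history are hypothetical.

```python
def lagged_state_dummies(history, n_states=5):
    """history = [h_i1, ..., h_iT] for one individual, coded 0..n_states-1.

    Returns one row per period t = 2, ..., T holding the 0/1 indicators
    H_{i,t-1}(j) = 1[h_{i,t-1} = j], j = 0, ..., n_states-1.
    """
    return [[int(prev == j) for j in range(n_states)] for prev in history[:-1]]

# Hypothetical health history over four waves.
rows = lagged_state_dummies([4, 4, 3, 1])
for t, row in enumerate(rows, start=2):
    print(t, row)
```

In estimation one of the five indicators would be dropped as the omitted category, and the first-period state would supply the initial conditions term.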

Dynamics

Estimated Partial Effects by Model



Partial Effect for a Category

These are 4 dummy variables for state in the previous period. Using
first differences, the 0.234 estimated for SAHEX means transition from
EXCELLENT in the previous period to GOOD in the previous period,
where GOOD is the omitted category. Likewise for the other 3 previous
state variables. The margin from ‘POOR’ to ‘GOOD’ was not interesting
in the paper. The better margin would have been from EXCELLENT to
POOR, which would have (EX,POOR) change from (1,0) to (0,1).

Ordered Choice Model


Extensions

Nested Random Effects

• Winkelmann, R., “Subjective Well Being and the Family: Results from an Ordered Probit Model with Multiple Random Effects,” IZA Discussion Paper 1016, Bonn, 2004.
  • GSOEP, T=14 years
  • 21,168 person-years
  • 7,485 family-years
  • 1,309 families
  • Y=subjective well being (0 to 10)
  • Age, Sex, Employment status, health, log income, family size, time trend

Nested RE Ordered Probit

• $y^*_{i,j,t} = x_{i,j,t}'\beta + a_j$ (family) $+\, u_{i,j}$ (individual in family) $+\, v_{i,j,t}$ (unique factor)
• Ordered probit formulation.
• Model is estimated by nested simulation over $u_{i,j}$ in $a_j$.

Log Likelihood for Nested Effects-1


Ordered Probit Model, based on normality
$v_{i,j,t}$ is the disturbance in the model.
Conditioned on $a_j$ and $u_{i,j}$, the probability of the outcome is
  $\mathrm{Prob}[y_{i,j,t} = d \mid x_{i,j,t}, a_j, u_{i,j}] = \Phi[\mu_d - x_{i,j,t}'\beta - a_j - u_{i,j}] - \Phi[\mu_{d-1} - x_{i,j,t}'\beta - a_j - u_{i,j}]$
  $= f(d_{i,j,t}, x_{i,j,t}, a_j, u_{i,j})$
For the individual observed T times, the joint probability is
  $\mathrm{Prob}[\mathbf{y}_{i,j} = \mathbf{d}_{i,j} \mid X_{i,j}, a_j, u_{i,j}] = \prod_{t=1}^{T} f(d_{i,j,t}, x_{i,j,t}, a_j, u_{i,j})$
The unconditional probability for the individual is, then,
  $\mathrm{Prob}[\mathbf{y}_{i,j} = \mathbf{d}_{i,j} \mid X_{i,j}, a_j] = \int_{u_{i,j}} \prod_{t=1}^{T} f(d_{i,j,t}, x_{i,j,t}, a_j, u_{i,j})\, h(u_{i,j})\, du_{i,j}$

Log Likelihood for Nested Effects-2


For the individual observed T times, the joint probability is
  $\mathrm{Prob}[\mathbf{y}_{i,j} = \mathbf{d}_{i,j} \mid X_{i,j}, a_j, u_{i,j}] = \prod_{t=1}^{T} f(d_{i,j,t}, x_{i,j,t}, a_j, u_{i,j})$
The unconditional probability for the individual is, then,
  $\mathrm{Prob}[\mathbf{y}_{i,j} = \mathbf{d}_{i,j} \mid X_{i,j}, a_j] = \int_{u_{i,j}} \prod_{t=1}^{T} f(d_{i,j,t}, x_{i,j,t}, a_j, u_{i,j})\, h(u_{i,j})\, du_{i,j}$
For family j with $N_j$ members (i = 1, ..., $N_j$), the conditional probability is
  $\mathrm{Prob}[\mathbf{y}_{1,j} = \mathbf{d}_{1,j}, \mathbf{y}_{2,j} = \mathbf{d}_{2,j}, \ldots \mid X_{1,j}, X_{2,j}, \ldots, a_j] = \prod_{i=1}^{N_j} \int_{u_{i,j}} \prod_{t=1}^{T} f(d_{i,j,t}, x_{i,j,t}, a_j, u_{i,j})\, h(u_{i,j})\, du_{i,j}$
The unconditional probability is
  $\mathrm{Prob}[\mathbf{y}_{1,j} = \mathbf{d}_{1,j}, \mathbf{y}_{2,j} = \mathbf{d}_{2,j}, \ldots \mid X_{1,j}, X_{2,j}, \ldots] = \int_{a_j} \left[ \prod_{i=1}^{N_j} \int_{u_{i,j}} \prod_{t=1}^{T} f(d_{i,j,t}, x_{i,j,t}, a_j, u_{i,j})\, h(u_{i,j})\, du_{i,j} \right] h(a_j)\, da_j$

Log Likelihood for Nested Effects-3


The unconditional probability for family j is
  $\mathrm{Prob}[\mathbf{y}_{1,j} = \mathbf{d}_{1,j}, \mathbf{y}_{2,j} = \mathbf{d}_{2,j}, \ldots \mid X_{1,j}, X_{2,j}, \ldots] = \int_{a_j} \left[ \prod_{i=1}^{N_j} \int_{u_{i,j}} \prod_{t=1}^{T} f(d_{i,j,t}, x_{i,j,t}, a_j, u_{i,j})\, h(u_{i,j})\, du_{i,j} \right] h(a_j)\, da_j$
The log likelihood is
  $\log L = \sum_{j=1}^{N} \log \mathrm{Prob}[\mathbf{y}_{1,j} = \mathbf{d}_{1,j}, \mathbf{y}_{2,j} = \mathbf{d}_{2,j}, \ldots \mid X_{1,j}, X_{2,j}, \ldots]$
  $= \sum_{j=1}^{N} \log \int_{a_j} \left[ \prod_{i=1}^{N_j} \int_{u_{i,j}} \prod_{t=1}^{T} f(d_{i,j,t}, x_{i,j,t}, a_j, u_{i,j})\, h(u_{i,j})\, du_{i,j} \right] h(a_j)\, da_j$
where N is the number of families.

Log Likelihood for Nested Effects-4

Transform normal variables to standardized form: $u_{i,j} = \sigma_u w_{i,j}$; $a_j = \sigma_a v_j$.
$w_{i,j}$ and $v_j$ are standard normal variables now, with density $\phi(\cdot)$. The Jacobians of the transformation cancel the $1/\sigma_u$ and $1/\sigma_a$ in the normal densities.
Let $P_{i,j,t}(v, w) = \Phi[\mu_{d_{i,j,t}} - x_{i,j,t}'\beta - \sigma_a v - \sigma_u w] - \Phi[\mu_{d_{i,j,t}-1} - x_{i,j,t}'\beta - \sigma_a v - \sigma_u w]$. Then
  $\log L = \sum_{j=1}^{N} \log \int_{v_j} \left[ \prod_{i=1}^{N_j} \int_{w_{i,j}} \prod_{t=1}^{T} P_{i,j,t}(v_j, w_{i,j})\, \phi(w_{i,j})\, dw_{i,j} \right] \phi(v_j)\, dv_j$
Winkelmann evaluated this with nested Hermite quadratures. This is somewhat more complicated than necessary.

Log Likelihood for Nested Effects-5


Integration is a summing operation that may be commuted.
With $P_{i,j,t}(v, w) = \Phi[\mu_{d_{i,j,t}} - x_{i,j,t}'\beta - \sigma_a v - \sigma_u w] - \Phi[\mu_{d_{i,j,t}-1} - x_{i,j,t}'\beta - \sigma_a v - \sigma_u w]$ and $\phi(\cdot)$ the standard normal density,
  $\log L = \sum_{j=1}^{N} \log \int_{v_j} \left[ \prod_{i=1}^{N_j} \int_{w_{i,j}} \prod_{t=1}^{T} P_{i,j,t}(v_j, w_{i,j})\, \phi(w_{i,j})\, dw_{i,j} \right] \phi(v_j)\, dv_j$
  $= \sum_{j=1}^{N} \log \int_{v_j} \int_{w_{1,j}} \cdots \int_{w_{N_j,j}} \left[ \prod_{i=1}^{N_j} \prod_{t=1}^{T} P_{i,j,t}(v_j, w_{i,j}) \right] \prod_{i=1}^{N_j} \phi(w_{i,j})\, dw_{i,j}\; \phi(v_j)\, dv_j$
The integration may be replaced by summation of simulated draws:
  $\log L_S = \sum_{j=1}^{N} \log \frac{1}{M} \sum_{m=1}^{M} \frac{1}{R} \sum_{r=1}^{R} \prod_{i=1}^{N_j} \prod_{t=1}^{T} P_{i,j,t}(v_{j,m}, w_{i,j,m,r})$
Many draws, and a lot of time, but not particularly complicated. Programming is quite simple.
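The simulated log likelihood for one family can be sketched as below. This is an illustration under stated assumptions, not Winkelmann's or any package's implementation: it uses pseudo-random standard normal draws from Python's `random` module rather than quadrature or Halton sequences, and the function names, data layout, and numerical values are all hypothetical.

```python
import math
import random
from statistics import NormalDist

Phi = NormalDist().cdf

def cell_prob(d, index, mu):
    """Prob[y = d] for outcomes 0..J, with mu = [mu_0, ..., mu_{J-1}]."""
    upper = Phi(mu[d] - index) if d < len(mu) else 1.0
    lower = Phi(mu[d - 1] - index) if d > 0 else 0.0
    return upper - lower

def family_sim_loglik(family, beta, mu, sigma_a, sigma_u, M=200, R=20, seed=0):
    """Simulated log likelihood contribution of one family.

    family is a list of members; each member is a list of (d_t, x_t) pairs,
    x_t being a list of covariates. For each of M family draws v, R sets of
    individual draws w are nested inside, one w per member per set.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(M):
        v = rng.gauss(0.0, 1.0)                  # family effect draw
        inner = 0.0
        for _ in range(R):
            prod = 1.0
            for member in family:
                w = rng.gauss(0.0, 1.0)          # individual effect draw
                for d, x in member:
                    xb = sum(b * xk for b, xk in zip(beta, x))
                    prod *= cell_prob(d, xb + sigma_a * v + sigma_u * w, mu)
            inner += prod
        total += inner / R
    return math.log(total / M)

# Hypothetical family: two members, observed for two and one periods.
mu = [0.0, 0.8, 1.6]
beta = [0.2, 0.5]
family = [[(1, [1.0, 0.5]), (2, [1.0, 0.7])], [(0, [1.0, -0.2])]]
print(family_sim_loglik(family, beta, mu, sigma_a=0.5, sigma_u=0.7))
```

The full simulated log likelihood is the sum of this quantity over families. Setting both standard deviations to zero collapses the expression to the pooled ordered probit log likelihood, which gives a convenient check on the code.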

Generalizing the Ordered Probit with Heterogeneous Thresholds

Index = $\beta' x_i$
Threshold parameters
Standard model: $\mu_{-1} = -\infty$, $\mu_0 = 0$, $\mu_j > \mu_{j-1} > 0$, $\mu_J = +\infty$
Preference scale and thresholds are homogeneous.
A generalized model (Pudney and Shields, JAE, 2000):
$\mu_{ij} = \alpha_j + \gamma_j' z_i$
Note the identification problem. If $z_{ik}$ is also in $x_i$ (same variable), then $\mu_{ij} - \beta' x_i = \alpha_j + \gamma_{jk} z_{ik} - \beta_k z_{ik} + \ldots$ It is no longer clear whether the variable is in x or z (or both).

Generalized Ordered Probit-1


Pudney, S. and M. Shields, "Gender, Race, Pay and Promotion in the British Nursing Profession," Journal of Applied Econometrics, 15(4), July 2000.
Ordered Probit Kernel
$y_{it}^* = x_{it}'\beta + \varepsilon_{it}$
$y_{it} = 0$ if $y_{it}^* \le 0$; $\mathrm{Prob}[y_{it} = 0] = \Phi[-x_{it}'\beta]$
$y_{it} = 1$ if $0 < y_{it}^* \le \mu_1$; $\mathrm{Prob}[y_{it} = 1] = \Phi[\mu_1 - x_{it}'\beta] - \Phi[-x_{it}'\beta]$
$y_{it} = 2$ if $\mu_1 < y_{it}^* \le \mu_2$; $\mathrm{Prob}[y_{it} = 2] = \Phi[\mu_2 - x_{it}'\beta] - \Phi[\mu_1 - x_{it}'\beta]$
...
$y_{it} = J$ if $y_{it}^* > \mu_{J-1}$; $\mathrm{Prob}[y_{it} = J] = 1 - \Phi[\mu_{J-1} - x_{it}'\beta]$

Heterogeneous Thresholds and Latent Regression
(Y = Grade (rank); Z = Sex, Race; X = Experience, Education, Training, History, Marital Status, Age)
$\mu_{ij} = z_i'\delta_j$
$\mathrm{Prob}[y_{it} = j] = \Phi[z_i'\delta_j - x_{it}'(\beta + \gamma_j)] - \Phi[z_i'\delta_{j-1} - x_{it}'(\beta + \gamma_{j-1})]$
Problems:
(1) Coefficients on variables in both Z and X are unidentified.
(2) How do you make sure that $\mu_j - \mu_{j-1}$ is positive?

Generalized Ordered Probit-2


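$y_{it}^* = x_{it}'\beta + \varepsilon_{it}$
$y_{it} = 0$ if $y_{it}^* \le 0$; $\mathrm{Prob}[y_{it} = 0] = \Phi[-x_{it}'\beta]$
$y_{it} = 1$ if $0 < y_{it}^* \le \mu_1$; $\mathrm{Prob}[y_{it} = 1] = \Phi[\mu_1 - x_{it}'\beta] - \Phi[-x_{it}'\beta]$
$y_{it} = 2$ if $\mu_1 < y_{it}^* \le \mu_2$; $\mathrm{Prob}[y_{it} = 2] = \Phi[\mu_2 - x_{it}'\beta] - \Phi[\mu_1 - x_{it}'\beta]$
...
$y_{it} = J$ if $y_{it}^* > \mu_{J-1}$; $\mathrm{Prob}[y_{it} = J] = 1 - \Phi[\mu_{J-1} - x_{it}'\beta]$

$\mu_{ij} = \exp(\theta_j + \delta' z_i)$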

A G.O.P Model
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
Index function for probability
Constant 1.73737318 .13231824 13.130 .0000
AGE -.01458121 .00141601 -10.297 .0000 46.7491906
LOGINC .17724352 .03275857 5.411 .0000 -1.23143358
EDUC .03897560 .00780436 4.994 .0000 10.9669624
MARRIED .09391821 .03761091 2.497 .0125 .75458666
Estimates of t(j) in mu(j)=exp[t(j)+d*z]
Theta(1) -1.28275309 .06080268 -21.097 .0000
Theta(2) -.26918032 .03193086 -8.430 .0000
Theta(3) .36377472 .02109406 17.245 .0000
Theta(4) .85818206 .01656304 51.813 .0000
Threshold covariates mu(j)=exp[t(j)+d*z]
FEMALE .00987976 .01802816 .548 .5837

How do we interpret the result for FEMALE?
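One way to read the FEMALE coefficient is to convert the estimates back into thresholds, using the functional form printed in the output, mu(j) = exp[t(j) + d*z]. The sketch below uses only the reported Theta and FEMALE estimates; the variable and function names are illustrative.

```python
import math

theta = [-1.28275309, -0.26918032, 0.36377472, 0.85818206]   # Theta(1)..Theta(4)
delta_female = 0.00987976                                     # threshold covariate

def thresholds(female):
    """mu(j) = exp[theta(j) + delta * FEMALE] for j = 1, ..., 4."""
    return [math.exp(t + delta_female * female) for t in theta]

mu_male, mu_female = thresholds(0), thresholds(1)
ratio = math.exp(delta_female)   # each threshold is scaled by this for women

for j, (m, f) in enumerate(zip(mu_male, mu_female), start=1):
    print(f"mu({j}): male = {m:.5f}, female = {f:.5f}")
print(f"ratio = {ratio:.5f}")
```

Under this form, every threshold for women is about 1 percent higher than for men, and with a standard error of .018 on a coefficient of .0099 that shift is not statistically distinguishable from zero.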



Hierarchical Ordered Probit


Index = $\beta' x_i$
Threshold parameters
Standard model: $\mu_{-1} = -\infty$, $\mu_0 = 0$, $\mu_j > \mu_{j-1} > 0$, $\mu_J = +\infty$
Preference scale and thresholds are homogeneous.
A generalized model (Harris and Zhao (2000), NLOGIT (2007)):
$\mu_{ij} = \exp[\alpha_j + \gamma_j' z_i]$
An internally consistent restricted modification:
$\mu_{ij} = \exp[\alpha_j + \gamma' z_i]$, $\alpha_j = \alpha_{j-1} + \exp(\theta_j)$
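The restricted form guarantees ordered thresholds for every individual, whatever the values of the free parameters. A small sketch follows; the function name `hopit_thresholds`, the starting value `alpha_1`, and all numbers are hypothetical.

```python
import math

def hopit_thresholds(alpha_1, theta, gamma, z):
    """mu_ij = exp(alpha_j + gamma'z_i), with alpha_j = alpha_{j-1} + exp(theta_j).

    alpha_1 is the first intercept, theta holds the unrestricted parameters
    for the remaining intercepts, gamma is common to all thresholds.
    """
    alphas = [alpha_1]
    for t in theta:
        alphas.append(alphas[-1] + math.exp(t))   # strictly increasing
    shift = sum(g * zk for g, zk in zip(gamma, z))
    return [math.exp(a + shift) for a in alphas]

# Hypothetical parameter values and covariates.
mus = hopit_thresholds(alpha_1=-1.0, theta=[-0.5, 0.3, -2.0],
                       gamma=[0.1, -0.4], z=[1.0, 2.0])
print([round(m, 4) for m in mus])
```

Since exp(θj) is positive the intercepts increase with j, and because the same γ'z shifts every threshold, the ordering cannot be reversed by the covariates.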

Ordered Choice Model



HOPit Model

Appendix:
Applications

Differential Item Functioning



A Vignette Random Effects Model



Vignettes

A Sample Selection Model



A Bivariate Latent Class Correlated Generalised Ordered Probit Model with an Application to Modelling Observed Obesity Levels

William Greene
Stern School of Business, New York University

With Mark Harris, Bruce Hollingsworth, Pushkar Maitra


Monash University

Stern Economics Working Paper 08-18.


http://w4.stern.nyu.edu/emplibrary/ObesityLCGOPpaperReSTAT.pdf
Economics Letters, 2014

300 Million People Worldwide. International Obesity Task Force: www.iotf.org



Costs of Obesity
• In the US more people are obese than smoke or use illegal drugs
• Obesity is a major risk factor for non-communicable diseases like heart problems and cancer
• Obesity is also associated with:
  • lower wages and productivity, and absenteeism
  • low self-esteem
• An economic problem. It is costly to society:
  • USA costs are around 4-8% of all annual health care expenditure (US $100 billion)
  • Canada, 5%; France, 1.5-2.5%; and New Zealand 2.5%

Measuring Obesity
• An individual’s weight given their height should lie within a certain range
• Body Mass Index (BMI)
  • BMI = weight (kg) / [height (m)]²
• World Health Organization guidelines:
  • Underweight: BMI < 18.5
  • Normal: 18.5 < BMI < 25
  • Overweight: 25 < BMI < 30
  • Obese: BMI > 30
  • Morbidly obese: BMI > 40

1 kg = 2.2 pounds; 1 meter = 39.37 inches
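The BMI formula and the WHO bands translate directly into code. In this sketch the function names are hypothetical, and assigning a value that falls exactly on a boundary to the higher band is an assumption made here, since the guidelines as listed use strict inequalities on both sides.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index = weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2

def who_category(value):
    """Map a BMI value to the WHO band; boundary values go to the higher band."""
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Normal"
    if value < 30:
        return "Overweight"
    if value < 40:
        return "Obese"
    return "Morbidly obese"

# Hypothetical person: 80 kg and 1.80 m.
value = bmi(80.0, 1.80)
print(round(value, 2), who_category(value))
```

Converting the continuous index to bands in this way is what produces the ordered outcome analyzed in the models that follow.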



Two Latent Classes: Approximately Half of European Individuals



Modeling BMI Outcomes

• Grossman-type health production function: Health Outcomes = f(inputs)
• Existing literature assumes BMI is an ordinal, not cardinal, representation of individuals.
  • Weight-related health status
  • Do not assume a one-to-one relationship between BMI levels and (weight-related) health status levels
• Translate BMI values into an ordinal scale using WHO guidelines
• Preserves underlying ordinal nature of the BMI index but recognizes that individuals within a so-defined weight range are of an (approximately) equivalent (weight-related) health status level

Conversion to a Discrete Measure

• Measurement issues: Tendency to under-report BMI
  • women tend to under-estimate/report weight;
  • men over-report height.
• Using bands should alleviate this
• Allows focus on discrete ‘at risk’ groups

A Censored Regression Model for BMI


Simple Regression Approach Based on Actual BMI:
$BMI^* = \beta' x + \varepsilon$, $\varepsilon \sim N[0, \sigma^2]$, $\sigma^2 = 1$
True BMI = weight proxy is unobserved
Interval Censored Regression Approach
WT = 0 if BMI* < 25 (Normal)
     1 if 25 < BMI* < 30 (Overweight)
     2 if BMI* > 30 (Obese)

• Inadequate accommodation of heterogeneity
• Inflexible reliance on WHO classification
• Rigid measurement by the guidelines

Heterogeneity in the BMI Ranges

• Boundaries set by the WHO are narrowly defined for all individuals
• Strictly defined WHO definitions may consequently push individuals into inappropriate categories
• We allow flexibility at the margins of these intervals
• Following Pudney and Shields (2000), we therefore consider Generalised Ordered Choice models: boundary parameters are now functions of observed personal characteristics

Generalized Ordered Probit Approach

A Latent Regression Model for True BMI
$BMI_i^* = \beta' x_i + \varepsilon_i$, $\varepsilon_i \sim N[0, \sigma^2]$, $\sigma^2 = 1$
Observation Mechanism for Weight Type
$WT_i$ = 0 if $BMI_i^* < 0$ (Normal)
         1 if $0 < BMI_i^* < \mu(w_i)$ (Overweight)
         2 if $\mu(w_i) < BMI_i^*$ (Obese)

Latent Class Modeling


 Several ‘types’ or ‘classes’. Obesity may be due to genetic reasons (the FTO gene) or lifestyle factors
 Distinct sets of individuals may have differing reactions to various policy tools and/or characteristics
 The observer does not know from the data which class an individual is in.
 Suggests a latent class approach for health outcomes (Deb and Trivedi, 2002, and Bago d’Uva, 2005)

Latent Class Application


 Two class model (considering FTO gene):
 More classes make class interpretations much more difficult
 Parametric models proliferate parameters
 Endogenous class membership: Two classes allow us to correlate the equations driving class membership and observed weight outcomes via unobservables.

Heterogeneous Class Probabilities


 πj = Prob(class=j) = governor of a detached natural process. Homogeneous.
 πij = Prob(class=j|zi,individual i)
Now possibly a behavioral aspect of the process, no longer “detached” or “natural”
 Nagin and Land 1993, “Criminal Careers…

Endogeneity of Class Membership


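Class Membership (Probit):
$$
C^* = \gamma' z_i + u_i, \qquad C = \mathbf{1}[C^* > 0]
$$

BMI | Class = 0, 1:
$$
BMI^* = \beta_c' x_i + \varepsilon_{c,i}, \qquad \text{BMI group} = \mathrm{OP}\left[BMI^*, \mu(\delta_c' w_i)\right]
$$

Endogeneity:
$$
\begin{pmatrix} u_i \\ \varepsilon_{c,i} \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_c \\ \rho_c & 1 \end{pmatrix} \right]
$$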

Bivariate Ordered Probit (one variable is binary).

Full information maximum likelihood.



Model Components
 x: determines observed weight levels within classes
For observed weight levels we use lifestyle factors such
as marital status and exercise levels
 z: determines latent classes
For latent class determination we use genetic proxies
such as age, gender and ethnicity: the things we
can’t change
 w: determines position of boundary parameters within
classes
For the boundary parameters we have: weight-training
intensity and age (BMI inappropriate for the aged?), and
pregnancy (small numbers and length of term unknown)

Data
 US National Health Interview Survey
(2005); conducted by the National
Center for Health Statistics
 Information on self-reported height and
weight levels, BMI levels
 Demographic information
 Split sample (30,000+) by gender

Outcome Probabilities
 Class 0 dominated by normal and overweight probabilities: the ‘normal weight’ class
 Class 1 dominated by probabilities at top end of the scale: the ‘non-normal weight’ class
 Unobservables for weight class membership are negatively correlated with those determining weight levels

Classification (Latent Probit) Model



BMI Ordered Choice Model


 Conditional on class membership, lifestyle factors
 Marriage comfort factor only for normal class women
 Both classes associated with income, education
 Exercise effects similar in magnitude
 Exercise intensity only important for ‘non-normal’ class
 Home ownership only important for ‘non-normal’ class, and negative:
result of differing socioeconomic status distributions across classes?

Effects of Aging on Weight Class



Effect of Education on Probabilities



Effect of Income on Probabilities



Obesity
 The International Obesity Taskforce (http://www.iotf.org) calls obesity one
of the most important medical and public health problems of our time.
 Defined as a condition of excess body fat; associated with a large number
of debilitating and life-threatening disorders
 Health experts argue that given an individual’s height, their weight should
lie within a certain range
 Most common measure = Body Mass Index (BMI):
 Weight (kg) / height (m)²
 WHO guidelines:
 BMI < 18.5 are underweight
 18.5 ≤ BMI < 25 are normal
 25 ≤ BMI < 30 are overweight
 BMI ≥ 30 are obese
 Around 300 million people worldwide are obese, a figure likely to rise
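
The BMI formula and the WHO bands above can be sketched in a few lines. This is only an illustration: the function names `bmi` and `who_category` are made up for this example, and the treatment of values exactly on a boundary (18.5, 25, 30 assigned to the higher band) is an assumption based on the usual reading of the WHO cutoffs.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def who_category(bmi_value):
    """Map a BMI value to a WHO band (boundary values go to the higher band)."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"


# Hypothetical individual: 80 kg, 1.75 m tall.
example_bmi = bmi(80.0, 1.75)
example_band = who_category(example_bmi)
print(round(example_bmi, 1), example_band)
```

Banding like this discards within-band variation, which is the motivation for the ordered and interval-censored models that follow.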

Models for BMI


Simple Regression Approach Based on Actual BMI:
BMI* = β′x + ε, ε ~ N[0, σ²]
No accommodation of heterogeneity
Rigid measurement by the guidelines
Interval Censored Regression Approach
WT = 0 if BMI* < 25 Normal
     1 if 25 ≤ BMI* < 30 Overweight
     2 if BMI* ≥ 30 Obese
Inadequate accommodation of heterogeneity
Inflexible reliance on WHO classification
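
A minimal sketch of the interval-censored approach, assuming the cutoffs 25 and 30 are treated as known and σ is left as a free parameter (with known cutoffs the scale of the latent variable can be estimated). The names `interval_probs` and `log_likelihood` are illustrative only, and `xb` stands for the index β′x of one individual.

```python
from math import erf, log, sqrt


def norm_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def interval_probs(xb, sigma, cuts=(25.0, 30.0)):
    """Probabilities of WT = 0, 1, 2 when BMI* = xb + eps, eps ~ N[0, sigma^2]."""
    below_first = norm_cdf((cuts[0] - xb) / sigma)
    below_second = norm_cdf((cuts[1] - xb) / sigma)
    return [below_first, below_second - below_first, 1.0 - below_second]


def log_likelihood(wt, xb, sigma):
    """Sum of log cell probabilities over observations (wt[i] in {0, 1, 2})."""
    return sum(log(interval_probs(xb_i, sigma)[wt_i]) for wt_i, xb_i in zip(wt, xb))


# Hypothetical data: three individuals with indices 23, 27.5 and 32.
print(log_likelihood([0, 1, 2], [23.0, 27.5, 32.0], sigma=2.5))
```

Every individual faces the same two cutoffs here, which is the rigidity the slide points to.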

An Ordered Probit Approach


A Latent Regression Model for “True BMI”
BMI* = β′x + ε, ε ~ N[0, σ²], σ² = 1
“True BMI” = a proxy for weight is unobserved
Observation Mechanism for Weight Type
WT = 0 if BMI* ≤ 0 Normal
     1 if 0 < BMI* ≤ μ Overweight
     2 if BMI* > μ Obese

A Basic Ordered Probit Model


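$$
\begin{aligned}
\text{Prob}(WT_i = 0 \mid x_i) &= \text{Prob}(BMI_i^* \le 0) \\
&= \text{Prob}(\beta' x_i + \varepsilon_i \le 0) \\
&= \Phi(-\beta' x_i) \\
\text{Prob}(WT_i = 1 \mid x_i) &= \text{Prob}(0 < BMI_i^* \le \mu) \\
&= \text{Prob}(\beta' x_i + \varepsilon_i \le \mu) - \text{Prob}(\beta' x_i + \varepsilon_i \le 0) \\
&= \Phi(\mu - \beta' x_i) - \Phi(-\beta' x_i) \\
\text{Prob}(WT_i = 2 \mid x_i) &= \text{Prob}(BMI_i^* > \mu) \\
&= \text{Prob}(\beta' x_i + \varepsilon_i > \mu) \\
&= 1 - \text{Prob}(\beta' x_i + \varepsilon_i \le \mu) \\
&= 1 - \Phi(\mu - \beta' x_i)
\end{aligned}
$$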

Latent Class Modeling


 Irrespective of observed weight category, individuals can be
thought of as being in one of several ‘types’ or ‘classes’, e.g. an obese
individual may be so due to genetic reasons or due to lifestyle
factors
 These distinct sets of individuals are likely to have differing reactions
to various policy tools and/or characteristics
 The observer does not know from the data which class an
individual is in.
 Suggests use of a latent class approach
 Growing use in explaining health outcomes (Deb and Trivedi,
2002, and Bago d’Uva, 2005)

A Latent Class Model

For modeling purposes, class membership is distributed with a discrete distribution,

Prob(individual i is a member of class = c) = πic = πc

Prob(WTi = j | xi) = Σc Prob(WTi = j | xi, class = c) Prob(class = c).

Probabilities in the Latent Class Model

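$$
\text{Prob}(WT_i = j \mid x_i) = \sum_c \pi_c \left[ \Phi(\mu_{j,c} - \beta_c' x_i) - \Phi(\mu_{j-1,c} - \beta_c' x_i) \right]
$$

There are two classes labeled c = 0 and c = 1.

$$
\frac{\partial\, \text{Prob}(WT_i = j \mid x_i)}{\partial x_i} = \sum_c \pi_c \left[ \phi(\mu_{j-1,c} - \beta_c' x_i) - \phi(\mu_{j,c} - \beta_c' x_i) \right] \beta_c
$$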
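
A minimal sketch of the mixture probabilities and their partial effects for three outcomes. It assumes each class c is described by a hypothetical tuple `(pi_c, beta_c, mu_c)`, with class thresholds −∞, 0, `mu_c`, +∞; all function names are illustrative.

```python
from math import erf, exp, inf, pi, sqrt


def norm_cdf(z):
    """Standard normal CDF (erf handles +/- infinity)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density (evaluates to 0.0 at +/- infinity)."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def cutpoints(mu_c):
    """Thresholds mu_{-1}, mu_0, mu_1, mu_2 for one class."""
    return [-inf, 0.0, mu_c, inf]


def mixture_prob(j, x, classes):
    """Prob(WT = j | x) as a class-probability-weighted sum of ordered probit cells."""
    total = 0.0
    for pi_c, beta_c, mu_c in classes:
        xb = sum(b * v for b, v in zip(beta_c, x))
        cuts = cutpoints(mu_c)
        total += pi_c * (norm_cdf(cuts[j + 1] - xb) - norm_cdf(cuts[j] - xb))
    return total


def mixture_partial_effect(j, x, classes):
    """Derivative of Prob(WT = j | x) with respect to each element of x."""
    effects = [0.0] * len(x)
    for pi_c, beta_c, mu_c in classes:
        xb = sum(b * v for b, v in zip(beta_c, x))
        cuts = cutpoints(mu_c)
        scale = pi_c * (norm_pdf(cuts[j] - xb) - norm_pdf(cuts[j + 1] - xb))
        for k, b in enumerate(beta_c):
            effects[k] += scale * b
    return effects


# Hypothetical two-class parameters and one observation.
classes = [(0.6, [0.5, -0.2], 1.0), (0.4, [1.0, 0.3], 1.8)]
x = [1.0, 2.0]
print([mixture_prob(j, x, classes) for j in range(3)])
print(mixture_partial_effect(2, x, classes))
```

Because the cell probabilities sum to one, the partial effects for a given regressor sum to zero across the three outcomes.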

Class Assignment
Class membership may relate to demographics such as age and sex.

Probit Model for Class Membership


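$$
\begin{aligned}
\text{Prob}(\text{Class}_i = 1 \mid w_i) &= \pi_{i1} = \Phi(\gamma' w_i) \\
\text{Prob}(\text{Class}_i = 0 \mid w_i) &= 1 - \pi_{i1} = 1 - \Phi(\gamma' w_i) = \Phi(-\gamma' w_i) \\
\text{Prob}(\text{Class}_i = c \mid w_i) &= \Phi[(2c - 1)\gamma' w_i]
\end{aligned}
$$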

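Generalized Ordered Probit – Latent Classes and Variable Thresholds

Basic Ordered Choice Model

$$
\begin{aligned}
\text{Prob}(WT_i = 0 \mid x_i, \text{class} = c) &= \Phi(-\beta_c' x_i) \\
\text{Prob}(WT_i = 1 \mid x_i, \text{class} = c) &= \Phi(\mu_{i,c} - \beta_c' x_i) - \Phi(-\beta_c' x_i) \\
\text{Prob}(WT_i = 2 \mid x_i, \text{class} = c) &= 1 - \Phi(\mu_{i,c} - \beta_c' x_i)
\end{aligned}
$$

Heterogeneity in Threshold Parameter

$$
\mu_{i,c} = \exp(\theta_c + \delta_c' z_i)
$$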
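
The pieces of this model (probit class probabilities, class-specific ordered probit cells, and an exponential threshold) can be combined as in the sketch below. It is only an illustration under stated assumptions: the coefficient names `gamma`, `theta_c` and `delta_c` are labels chosen here, the function names are made up, and class membership is treated as exogenous, so the correlation between the class and outcome unobservables is ignored.

```python
from math import erf, exp, sqrt


def norm_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))


def class_prob(c, w, gamma):
    """Probit class probability, Prob(class = c | w) for c in {0, 1}."""
    return norm_cdf((2 * c - 1) * dot(gamma, w))


def threshold(z, theta_c, delta_c):
    """Upper threshold; the exponential keeps it above the lower threshold at 0."""
    return exp(theta_c + dot(delta_c, z))


def probs_given_class(x, z, beta_c, theta_c, delta_c):
    """Ordered probit cell probabilities within one class."""
    xb = dot(beta_c, x)
    mu = threshold(z, theta_c, delta_c)
    p0 = norm_cdf(-xb)
    p1 = norm_cdf(mu - xb) - p0
    p2 = 1.0 - norm_cdf(mu - xb)
    return [p0, p1, p2]


def outcome_probs(x, z, w, gamma, params):
    """Unconditional probabilities; params[c] = (beta_c, theta_c, delta_c)."""
    total = [0.0, 0.0, 0.0]
    for c in (0, 1):
        weight = class_prob(c, w, gamma)
        cond = probs_given_class(x, z, *params[c])
        total = [t + weight * p for t, p in zip(total, cond)]
    return total


# Hypothetical parameter values and one observation.
params = {0: ([0.4, -0.1], 0.2, [0.05]), 1: ([0.9, 0.2], 0.5, [-0.10])}
print(outcome_probs(x=[1.0, 2.0], z=[1.5], w=[1.0, 0.3], gamma=[0.2, -0.5], params=params))
```

The exponential form guarantees a positive threshold for any value of z, so the ordering of the cells is preserved without constraints on the parameters.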

Data
 US National Health Interview Survey (2005);
conducted by the National Center for Health
Statistics
 Information on self-reported height and weight
levels, BMI levels
 Demographic information
 Remove those underweight
 Split sample (30,000+) by gender

Model Components
 x: determines observed weight levels within classes
For observed weight levels we use lifestyle factors such as marital
status and exercise levels
 z: determines latent classes
For latent class determination we use genetic proxies such as age,
gender and ethnicity: the things we can’t change
 w: determines position of boundary parameters within classes
For the boundary parameters we have: weight-training
intensity and age (BMI inappropriate for the aged?), and pregnancy
(small numbers and length of term unknown)

Different Normalizations
 NLOGIT
 Y = 0,1,…,J, U* = α + β’x + ε
 One overall constant term, α
 J-1 “cutpoints;” μ-1 = -∞, μ0 = 0, μ1,… μJ-1, μJ = + ∞
 Stata
 Y = 1,…,J+1, U* = β’x + ε
 No overall constant, α=0
 J “cutpoints;” μ0 = -∞, μ1,… μJ, μJ+1 = + ∞
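
The two normalizations describe the same model, so estimates can be mapped from one to the other. A minimal sketch, assuming the Stata-style cutpoints are supplied as a plain list ordered from lowest to highest; the function names are illustrative and are not part of either package. Under the NLOGIT form the lowest cell is U* ≤ 0, i.e. β′x + ε ≤ −α, so the first Stata-style cutpoint corresponds to −α and each later one to μj − α. The slopes β are unchanged.

```python
def stata_to_nlogit(cutpoints):
    """Convert cutpoints (no constant) to a constant alpha and thresholds mu_1, mu_2, ..."""
    first = cutpoints[0]
    alpha = -first
    mus = [k - first for k in cutpoints[1:]]
    return alpha, mus


def nlogit_to_stata(alpha, mus):
    """Convert a constant and thresholds (with mu_0 = 0) back to cutpoints."""
    return [-alpha] + [m - alpha for m in mus]


# Hypothetical cutpoint estimates for a four-outcome model.
alpha, mus = stata_to_nlogit([-1.2, 0.3, 1.1])
print(alpha, mus)
print(nlogit_to_stata(alpha, mus))
```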

α̂ and μ̂j

α̂ and μ̂j − α̂
