
Logit, Probit and Tobit: Models for Categorical and Limited Dependent Variables

By Rajulton Fernando

Presented at the PLCS/RDC Statistics and Data Series at Western

March 23, 2011

Introduction
In social science research, categorical data are often collected through surveys.
Categorical variables are nominal and ordinal variables. They take only a few values that do NOT have a metric.

A) Binary Case. Many dependent variables of interest take only two values (a dichotomous variable), denoting an event or non-event, coded as 1 and 0 respectively. Some examples:
- The labor force status of a person.
- Voting behavior of a person (in favor of a new policy).
- Whether a person got married or divorced.
- Whether a person was involved in criminal behaviour, etc.

Introduction
With such variables, we can build models that describe the response probabilities, say P(yi = 1), of the dependent variable yi.
For a sample of N independently and identically distributed observations i = 1, ..., N and a (K+1)-dimensional vector xi of explanatory variables, the probability that yi takes the value 1 is modeled as

$$P(y_i = 1 \mid x_i) = F(x_i'\beta) = F(z_i)$$

where $\beta$ is a (K+1)-dimensional column vector of parameters.

The transformation function F is crucial. It maps the linear combination $z_i = x_i'\beta$ into [0,1] and in general satisfies $F(-\infty) = 0$, $F(+\infty) = 1$, and $\partial F(z)/\partial z > 0$ [that is, it is a cumulative distribution function].

The Logit and Probit Models


When the transformation function F is the logistic function, the response probabilities are given by
$$P(y_i = 1 \mid x_i) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}$$

And when the transformation function F is the cumulative distribution function (cdf) of the standard normal distribution, the response probabilities are given by
$$P(y_i = 1 \mid x_i) = \Phi(x_i'\beta) = \int_{-\infty}^{x_i'\beta} \frac{1}{\sqrt{2\pi}}\, e^{-s^2/2}\, ds$$

The logit and probit models are almost identical (see the Figure on the next slide), and the choice of model is arbitrary, although the logit model has certain advantages (simplicity and ease of interpretation).
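As a quick numerical illustration (a sketch; the value 0.5 for z is arbitrary), both response functions can be evaluated in Stata with the built-in invlogit() and normal() functions:

. * logistic cdf at z = 0.5 (about .6225)
. display invlogit(0.5)

. * normal cdf at z = 0.5/1.6 (about .6227); rescaling z makes the two curves nearly coincide
. display normal(0.5/1.6)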

Source: J.S. Long, 1997

The Logit and Probit Models


However, the parameters of the two models are scaled differently. The parameter estimates in a logistic regression tend to be 1.6 to 1.8 times higher than they are in a corresponding probit model.
The probit and logit models are estimated by maximum likelihood (ML), assuming independence across observations. The ML estimator of $\beta$ is consistent and asymptotically normally distributed.
However, the estimation rests on the strong assumption that the latent error term is normally distributed and homoscedastic. If homoscedasticity is violated, there is no easy solution.
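(As a check against the illustration later in this deck: the logistic coefficients for age, education, married and children are .058, .098, .742 and .764, while the corresponding probit estimates are .035, .058, .431 (reversing the sign of the [married=0] contrast) and .447, giving ratios of roughly 1.7 in each case.)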

The Logit and Probit Models


Note: The response function (logistic or probit) is an S-shaped function, which implies that a fixed change in X has a smaller impact on the probability when the probability is near zero than when it is near the middle. Thus, it is a non-linear response function.
How to interpret the coefficients: In both models,
If b > 0, p increases as X increases.
If b < 0, p decreases as X increases.
As mentioned above, b cannot be interpreted as a simple slope as in ordinary regression, because the rate at which the curve ascends or descends changes according to the value of X. In other words, it is not a constant change as in ordinary regression. The greatest rate of change is at p = 0.5.

The Logit and Probit Models


In the logit model, we can interpret b as an effect on the odds. That is, every unit increase in X results in a multiplicative effect of $e^b$ on the odds.
Example: If b = 0.25, then $e^{0.25}$ = 1.28. Thus, when X changes by one unit, the odds increase by a factor of 1.28, or change by 28%.

- In the probit model, use the Z-score terminology. For every unit increase in X, the Z-score (or the probit of success) increases by b units. [Or, we can also say that a unit increase in X changes Z by b standard deviation units.]
- If you like, you can convert the z-score to probabilities using the normal table.
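A minimal Stata sketch of both interpretations (the values 0.25 and 1.96 are just the example above and an arbitrary z-score):

. * multiplicative effect on the odds for a logit coefficient b = 0.25 (about 1.284)
. display exp(0.25)

. * probability implied by a probit z-score of 1.96 (about .975)
. display normal(1.96)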

Models for Polytomous Data


B) Polytomous Case
Here we need to distinguish between purely nominal variables and really ordinal variables. When the variable is purely nominal, we can extend the dichotomous logit model, using one of the categories as reference and modeling the other responses j = 1, 2, ..., m-1 compared to the reference.
Example: In the case of 3 categories, using the 3rd category as the reference, logit p1 = ln(p1/p3) and logit p2 = ln(p2/p3), which will give two sets of parameter estimates.
$$P(y = 1) = \frac{\exp(\beta_1' x)}{1 + \exp(\beta_1' x) + \exp(\beta_2' x)}$$
$$P(y = 2) = \frac{\exp(\beta_2' x)}{1 + \exp(\beta_1' x) + \exp(\beta_2' x)}$$
$$P(y = 3) = \frac{1}{1 + \exp(\beta_1' x) + \exp(\beta_2' x)}$$
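In Stata, this polytomous (multinomial) logit can be fitted with mlogit; a minimal sketch, where the outcome y3 and predictor x are hypothetical placeholders:

. * category 3 serves as the reference (base) outcome
. mlogit y3 x, baseoutcome(3)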

Polytomous Case
When the variable is really ordinal, we use cumulative ordinal logits (or probits). The logits in this model are for cumulative categories at each point, contrasting categories above with categories below. Example: Suppose Y has 4 categories; then,
$$\text{logit}(p_1) = \ln\frac{p_1}{1 - p_1} = a_1 + bX$$
$$\text{logit}(p_1 + p_2) = \ln\frac{p_1 + p_2}{1 - p_1 - p_2} = a_2 + bX$$
$$\text{logit}(p_1 + p_2 + p_3) = \ln\frac{p_1 + p_2 + p_3}{1 - p_1 - p_2 - p_3} = a_3 + bX$$

Since these are cumulative logits, the probabilities are attached to being in category j and lower. Since the right side changes only in the intercepts, and not in the slope coefficient, this model is known as the proportional odds model. Thus, in ordered logistic regression, we need to test the assumption of proportionality as well.
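In Stata, this model can be fitted with ologit; a minimal sketch with hypothetical variables yord and x, followed by the user-written brant command (from Long and Freese's spost package) to test proportionality:

. * cumulative (proportional odds) logit
. ologit yord x

. * test the proportional odds assumption (requires installing spost)
. brant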

Ordinal Logistic
a1, a2, a3 are the intercepts that satisfy the property a1 < a2 < a3, interpreted as thresholds of the latent variable.
Interpretation of parameter estimates depends on the software used! Check the software manual.
If the RHS = a + bX, a positive coefficient is associated more with lower-order categories and a negative coefficient is associated more with higher-order categories.
If the RHS = a - bX, a negative coefficient is associated more with lower-order categories and a positive coefficient is associated more with higher-order categories.

Model for Limited Dependent Variable


C) Tobit Model. This model is for a metric dependent variable that is limited, in the sense that we observe it only if it is above or below some cut-off level. For example:
- Wages may be limited from below by the minimum wage.
- The donation amount given to charity.
- Top-coding income at, say, $300,000.
- Time use and leisure activity of individuals.
- Extramarital affairs.

It is also called the censored regression model. Censoring can be from below or from above, also called left and right censoring. [Do not confuse the term censoring with the one used in dynamic modeling.]

The Tobit Model


The model is called Tobit because it was first proposed by Tobin (1958) and involves aspects of probit analysis; the term was coined by Goldberger for "Tobin's probit". Reasoning behind:
- If we include the censored observations as y = 0, the censored observations on the left will pull down the end of the line, resulting in underestimates of the intercept and overestimates of the slope.
- If we exclude the censored observations and just use the observations for which y > 0 (that is, truncating the sample), it will overestimate the intercept and underestimate the slope.
- The degree of bias in both will increase as the number of observations that take on the value of zero increases. (See Figure on next slide.)

Source: J.S. Long

The Tobit Model


The Tobit model uses all of the information, including information on censoring, and provides consistent estimates. It is also a nonlinear model, similar to the probit model. It is estimated using maximum likelihood estimation techniques. The likelihood function for the tobit model takes the form:
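A standard form of the tobit likelihood for left-censoring at zero (the case estimated in the illustration below) is:

$$L(\beta, \sigma) = \prod_{y_i > 0} \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) \prod_{y_i = 0} \Phi\!\left(-\frac{x_i'\beta}{\sigma}\right)$$

where $\phi$ and $\Phi$ denote the standard normal pdf and cdf, respectively.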

This is an unusual function: it consists of two terms, the first for non-censored observations (the pdf), and the second for censored observations (the cdf).

The Tobit Model


The estimated tobit coefficients are the marginal effects of a change in xj on y*, the unobservable latent variable, and can be interpreted in the same way as in a linear regression model. But such an interpretation may not be useful, since we are interested in the effect of X on the observable y (or the change in the censored outcome).
It can be shown that the change in y is found by multiplying the coefficient by Pr(a < y* < b), that is, the probability of being uncensored. Since this probability is a fraction, the marginal effect is actually attenuated. Here, a and b denote the lower and upper censoring points. For example, in left censoring, the limits are a = 0, b = +∞.
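For instance, using the tobit estimates from the illustration below: the coefficient for age is about .052 and Pr(lwf > 0) evaluated at the means is about .819, so the marginal effect of age on the expected observed outcome is roughly .052 × .819 ≈ .043, noticeably smaller than the latent-variable coefficient.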

Illustrations for logit, probit and tobit models, using womenwk.dta from Baum, available at http://www.stata-press.com/data/imeus/womenwk.dta
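A minimal way to load the data directly in Stata (a sketch; the descriptive statistics below were produced in SPSS, and the derived variables wagefull, lw and lwf are assumed to be present in this copy of the file):

. use http://www.stata-press.com/data/imeus/womenwk.dta, clear
. summarize age education married children wagefull wage lw work lwf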
Descriptive Statistics

Variable             N     Minimum  Maximum  Mean      Std. Deviation
age                  2000   20       59      36.21      8.287
education            2000   10       20      13.08      3.046
married              2000    0        1        .67       .470
children             2000    0        5       1.64      1.399
wagefull             2000  -1.68    45.81    21.3118    7.01204
wage                 1343   5.88    45.81    23.6922    6.30537
lw                   1343   1.77     3.82     3.1267     .28651
work                 2000    0        1        .67       .470
lwf                  2000    .00     3.82     2.0996    1.48752
Valid N (listwise)   1343

Binary Logistic Regression


Model Summary

Step  -2 Log likelihood  Cox & Snell R Square  Nagelkerke R Square
1     2055.829a          .212                  .295

a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Hosmer and Lemeshow Test

Step  Chi-square  df  Sig.
1     6.491       8   .592

Variables in the Equation

                   B       S.E.   Wald     df  Sig.   Exp(B)
Step 1a  age       .058    .007   64.359   1   .000   1.060
         education .098    .019   27.747   1   .000   1.103
         married   .742    .126   34.401   1   .000   2.100
         children  .764    .052   220.110  1   .000   2.148
         Constant  -4.159  .332   156.909  1   .000   .016

a. Variable(s) entered on step 1: age, education, married, children.

Binary Probit Regression (in SPSS, use the ordinal regression menu and select probit link function. Ignore the test of parallel lines, etc.)

Model Fitting Information

Model           -2 Log Likelihood  Chi-Square  df  Sig.
Intercept Only  1645.024
Final           1166.702           478.322     4   .000

Link function: Probit.

Parameter Estimates

                                                                 95% Confidence Interval
                       Estimate  Std. Error  Wald     df  Sig.   Lower Bound  Upper Bound
Threshold  [work = 0]  2.037     .209        94.664   1   .000   1.626        2.447
Location   age         .035      .004        67.301   1   .000   .026         .043
           education   .058      .011        28.061   1   .000   .037         .080
           children    .447      .029        243.907  1   .000   .391         .503
           [married=0] -.431     .074        33.618   1   .000   -.577        -.285
           [married=1] 0a        .           .        0   .      .            .

Link function: Probit.
a. This parameter is set to zero because it is redundant.

Tobit regression cannot be done in SPSS. Use Stata. Here are the Stata commands. First, fit a simple OLS regression of the variable lwf (just to check):

. regress lwf age married children education


      Source |       SS       df       MS              Number of obs =    2000
-------------+------------------------------           F(  4,  1995) =  134.21
       Model |  937.873188     4  234.468297           Prob > F      =  0.0000
    Residual |  3485.34135  1995  1.74703827           R-squared     =  0.2120
-------------+------------------------------           Adj R-squared =  0.2105
       Total |  4423.21454  1999  2.21271363           Root MSE      =  1.3218

------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0363624     .003862     9.42   0.000     .0287885    .0439362
     married |   .3188214    .0690834     4.62   0.000     .1833381    .4543046
    children |   .3305009    .0213143    15.51   0.000     .2887004    .3723015
   education |   .0843345    .0102295     8.24   0.000     .0642729    .1043961
       _cons |  -1.077738    .1703218    -6.33   0.000    -1.411765   -.7437105
------------------------------------------------------------------------------

. tobit lwf age married children education, ll(0)
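(The option ll(0) tells Stata that lwf is left-censored at zero.)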

Tobit regression                                  Number of obs   =       2000
                                                  LR chi2(4)      =     461.85
                                                  Prob > chi2     =     0.0000
Log likelihood = -3349.9685                       Pseudo R2       =     0.0645

------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .052157    .0057457     9.08   0.000     .0408888    .0634252
     married |   .4841801    .1035188     4.68   0.000     .2811639    .6871964
    children |   .4860021    .0317054    15.33   0.000     .4238229    .5481812
   education |   .1149492    .0150913     7.62   0.000     .0853529    .1445454
       _cons |  -2.807696    .2632565   -10.67   0.000    -3.323982   -2.291409
-------------+----------------------------------------------------------------
      /sigma |   1.872811     .040014                      1.794337    1.951285
------------------------------------------------------------------------------
  Obs. summary:        657  left-censored observations at lwf<=0
                      1343     uncensored observations
                         0  right-censored observations

. mfx compute, predict(pr(0,.))


Marginal effects after tobit
      y  = Pr(lwf>0) (predict, pr(0,.))
         =  .81920975
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0073278      .00083    8.84   0.000   .005703  .008952     36.208
married* |   .0706994      .01576    4.48   0.000   .039803  .101596      .6705
children |   .0682813      .00479   14.26   0.000   .058899  .077663     1.6445
educat~n |   .0161499      .00216    7.48   0.000   .011918  .020382     13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(e(0,.))


Marginal effects after tobit
      y  = E(lwf|lwf>0) (predict, e(0,.))
         =  2.3102021
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0314922      .00347    9.08   0.000   .024695   .03829     36.208
married* |   .2861047      .05982    4.78   0.000   .168855  .403354      .6705
children |   .2934463      .01908   15.38   0.000   .256041  .330852     1.6445
educat~n |   .0694059      .00912    7.61   0.000   .051531  .087281     13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
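A side note, not part of the original output: in Stata 11 and later, mfx has been superseded by the margins command; an equivalent sketch for the last computation is:

. * marginal effects on E(lwf | lwf>0), evaluated at the means (as mfx does)
. margins, dydx(*) predict(e(0,.)) atmeans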
