
J. Scott Long

Advanced Quantitative Techniques in the Social Sciences

LIST OF ADVISORY BOARD MEMBERS

Peter Bentler, Departments of Psychology and Statistics, UCLA
Bengt Muthén, Graduate School of Education and Information Sciences, UCLA
David, Departments of Geography and Statistics, UCLA
Read, Departments of Anthropology and Statistics, UCLA
Edward Leamer, Departments of Economics and Statistics, UCLA
Donald Ylvisaker, Departments of Mathematics and Statistics, UCLA

Regression Models for Categorical and Limited Dependent Variables

J. Scott Long

Advanced Quantitative Techniques in the Social Sciences Series 7

VOLUMES IN THE SERIES

1. HIERARCHICAL LINEAR MODELS: Applications and Data Analysis Methods
   Anthony S. Bryk and Stephen W. Raudenbush
2. MULTIVARIATE ANALYSIS OF CATEGORICAL DATA: Theory
   John P. Van de Geer
3. MULTIVARIATE ANALYSIS OF CATEGORICAL DATA: Applications
   John P. Van de Geer
4. STATISTICAL MODELS FOR ORDINAL VARIABLES
   Clifford C. Clogg and Edward S. Shihadeh
5. FACET THEORY: Form and Content
   Ingwer Borg and Samuel Shye
6. LATENT CLASS AND DISCRETE LATENT TRAIT MODELS: Similarities and Differences
   Ton Heinen
7. REGRESSION MODELS FOR CATEGORICAL AND LIMITED DEPENDENT VARIABLES
   J. Scott Long
8. LOG-LINEAR MODELS FOR EVENT HISTORIES
   Jeroen K. Vermunt
9. MULTIVARIATE TAXOMETRIC PROCEDURES: Distinguishing Types From Continua
   Niels G. Waller and Paul E. Meehl
10. STRUCTURAL EQUATION MODELING: Foundations and Extensions
    David Kaplan

SAGE Publications
International Educational and Professional Publisher
Thousand Oaks  London  New Delhi
Copyright 1997 by Sage Publications, Inc.

All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

For information address:

SAGE Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: order@sagepub.com

SAGE Publications Ltd.
6 Bonhill Street
London EC2A 4PU
United Kingdom

SAGE Publications India Pvt. Ltd.
M-32 Market
Greater Kailash I
New Delhi 110 048 India

Printed in the United States of America

To VALERIE AND MEGAN

Library of Congress Cataloging-in-Publication Data

Long, J. Scott
  Regression models for categorical and limited dependent variables / author, J. Scott Long.
    p. cm. - (Advanced quantitative techniques in the social sciences ; v. 7)
  Includes bibliographical references and index.
  ISBN (cloth : alk. paper)
  I. Title. II. Series: Advanced quantitative techniques in the social sciences ; 7.
  96-35710

Production Coordinator: Astrid Virding
Cover Design: Lesa Valdez
Book Design: Ravi Balasuriya
Print Buyer: Anna Chin
Contents

List of Figures xi
List of Tables xv
Series Editor's Introduction xix
Preface xxiii
Acknowledgments xxv
Abbreviations and Notation xxvii
1. Introduction 1
1.1. Linear and Nonlinear Models 3
1.2. Organization 6
1.3. Orientation 9
1.4. Bibliographic Notes 10
2. Continuous Outcomes: The Linear Regression Model 11
2.1. The Linear Regression Model 11
2.2. Interpreting Regression Coefficients 14
2.3. Estimation by Ordinary Least Squares 18
2.4. Nonlinear Linear Regression Models 20
2.5. Violations of the Assumptions 22
2.6. Maximum Likelihood Estimation 25
2.7. Conclusions 33
2.8. Bibliographic Notes 33
3. Binary Outcomes: The Linear Probability, Probit, and Logit Models 34
3.1. The Linear Probability Model 35
3.2. A Latent Variable Model for Binary Variables 40
3.3. Identification 47
3.4. A Nonlinear Probability Model 50
3.5. ML Estimation 52
3.6. Numerical Methods for ML Estimation 54
3.7. Interpretation Using Predicted Probabilities 61
3.8. Interpretation Using Odds Ratios 79
3.9. Conclusions 83
3.10. Bibliographic Notes 83
4. Hypothesis Testing and Goodness of Fit 85
4.1. Hypothesis Testing 85
4.2. Residuals and Influence 98
4.3. Scalar Measures of Fit 102
4.4. Conclusions 112
4.5. Bibliographic Notes 113
5. Ordinal Outcomes: Ordered Logit and Ordered Probit Analysis 114
5.1. A Latent Variable Model for Ordinal Variables 116
5.2. Identification 122
5.3. Estimation 123
5.4. Interpretation 127
5.5. The Parallel Regression Assumption 140
5.6. Related Models for Ordinal Data 145
5.7. Conclusions 146
5.8. Bibliographic Notes 147
6. Nominal Outcomes: Multinomial Logit and Related Models 148
6.1. Introduction to the Multinomial Logit Model 149
6.2. The Multinomial Logit Model 151
6.3. ML Estimation 156
6.4. Computing and Testing Other Contrasts 158
6.5. Two Useful Tests 160
6.6. Interpretation 164
6.7. The Conditional Logit Model 178
6.8. Independence of Irrelevant Alternatives 182
6.9. Related Models 184
6.10. Conclusions 185
6.11. Bibliographic Notes 186
7. Limited Outcomes: The Tobit Model 187
7.1. The Problem of Censoring 188
7.2. Truncated and Censored Distributions 192
7.3. The Tobit Model for Censored Outcomes 196
7.4. Estimation 204
7.5. Interpretation 206
7.6. Extensions 211
7.7. Conclusions 216
7.8. Bibliographic Notes 216
8. Count Outcomes: Regression Models for Counts 217
8.1. The Poisson Distribution 218
8.2. The Poisson Regression Model 221
8.3. The Negative Binomial Regression Model 230
8.4. Models for Truncated Counts 239
8.5. Zero Modified Count Models 242
8.6. Comparisons Among Count Models 247
8.7. Conclusions 249
8.8. Bibliographic Notes 249
9. Conclusions 251
9.1. Links Using Latent Variable Models 252
9.2. The Generalized Linear Model 257
9.3. Similarities Among Probability Models 258
9.4. Event History Analysis 258
9.5. Log-Linear Models 259
A. Answers to Exercises 264
References 274
Author Index 283
Subject Index 287
About the Author 297
List of Figures

1.1 Effects of Continuous and Dummy Variables in Linear and Nonlinear Models 4
2.1 Simple Linear Regression Model With the Distribution of y Given x 13
2.2 Identification of the Intercept in the Linear Regression Model 23
2.3 Probability of s = 3 for Different Values of π 26
2.4 Maximum Likelihood Estimation of μ From a Normal Distribution 28
2.5 Maximum Likelihood Estimation for the Linear Regression Model 30
3.1 Linear Probability Model for a Single Independent Variable 36
3.2 The Distribution of y* Given x in the Binary Response Model 41
3.3 Normal and Logistic Distributions 43
3.4 Probability of Observed Values in the Binary Response Model 44
3.5 Computing Pr(y = 1 | x) in the Binary Response Model 44
3.6 Plot of y* and Pr(y = 1 | x) in the Binary Response Model 46
3.7 Complementary Log-Log and Log-Log Models 52
3.8 Effects of Changing the Slope and Intercept on the Binary Response Model: Pr(y = 1 | x) = F(α + βx) 63
3.9 Plot of Probit Model: Pr(y = 1 | x, z) = Φ(1.0 + 1.0x + 0.75z) 64
3.10 Probability of Labor Force Participation by Age and Wife's Education 67
3.11 Probability of Labor Force Participation by Age and Family Income for Women Without Some College Education 68
3.12 Marginal Effect in the Binary Response Model 73
3.13 Marginal Versus Discrete Change in Nonlinear Models 76
4.1 Sampling Distribution for a z-Statistic 86
4.2 Wald, Likelihood Ratio, and Lagrange Multiplier Tests 88
4.3 Sampling Distribution of a Chi-Square Statistic With 5 Degrees of Freedom 89
4.4 Index Plot of Standardized Pearson Residuals 100
4.5 Index Plot of Cook's Influence Statistics 101
5.1 Regression of a Latent Variable Compared to the Regression of the Observed Variable y 118
5.2 Distribution of y* Given x for the Ordered Regression Model 120
5.3 Predicted and Cumulative Probabilities for Women in 1989 132
5.4 Illustration of the Parallel Regression Assumption 141
6.1 Discrete Change Plot for the Multinomial Logit Model of Occupational Attainment. Control Variables Are Held at Their Means. Jobs Are Classified as: M = Menial; C = Craft; B = Blue Collar; W = White Collar; and P = Professional 168
6.2 Odds Ratio Plot for a Hypothetical Binary Logit Model 172
6.3 Odds Ratio Plot of Coefficients for a Hypothetical Multinomial Logit Model With Three Outcomes 174
6.4 Odds Ratio Plot for a Multinomial Logit Model of Occupational Attainment. Jobs Are Classified as: M = Menial; C = Craft; B = Blue Collar; W = White Collar; and P = Professional 175
6.5 Enhanced Odds Ratio Plot With the Size of Letters Corresponding to the Magnitude of the Discrete Change in the Probability. Discrete Changes Are Computed With All Variables Held at Their Means. Jobs Are Classified as: M = Menial; C = Craft; B = Blue Collar; W = White Collar; and P = Professional 176
6.6 Enhanced Odds Ratio Plot for the Multinomial Logit Model of Attitudes Toward Working Mothers. Discrete Changes Were Computed With All Variables Held at Their Means. Categories Are: 1 = Strongly Disagree; 2 = Disagree; 3 = Agree; and 4 = Strongly Agree 178
7.1 Censored and Truncated Variables 188
7.2 Regression Model With and Without Censoring and Truncation 190
7.3 Normal Distribution With Truncation and Censoring 192
7.4 Inverse Mills Ratio 195
7.5 Probability of Being Censored in the Tobit Model 198
7.6 Probability of Being Censored by Gender, Fellowship Status, and Prestige of Doctoral Department 200
7.7 Expected Values of y*, y | y > τ, and y in the Tobit Model 202
7.8 Maximum Likelihood Estimation for the Tobit Model 204
8.1 Poisson Probability Distribution 219
8.2 Distribution of Observed and Predicted Counts of Articles 220
8.3 Distribution of Counts for the Poisson Regression Model 222
8.4 Comparisons of the Mean Predicted Probabilities From the Poisson and Negative Binomial Regression Models 229
8.5 Probability Density Function for the Gamma Distribution 232
8.6 Comparisons of the Negative Binomial and Poisson Distributions 234
8.7 Distribution of Counts for the Negative Binomial Regression Model 235
8.8 Probability of 0's From the Poisson and Negative Binomial Regression Models 238
8.9 Comparison of the Predictions From Four Count Models 248
9.1 Similarities Between the Tobit and Probit Models 253
9.2 Similarities Among the Ordinal Regression, Two-Limit Tobit, and Grouped Regression Models 255
List of Tables

2.1 Descriptive Statistics for the First Academic Job Example 19
2.2 Linear Regression of the Prestige of the First Academic Job 20
3.1 Descriptive Statistics for the Labor Force Participation Example 37
3.2 Linear Probability Model of Labor Force Participation 38
3.3 Logit and Probit Analyses of Labor Force Participation 49
3.4 Probabilities of Labor Force Participation Over the Range of Each Independent Variable for the Probit Model 66
3.5 Probability of Employment by College Attendance and the Number of Young Children for the Probit Model 69
3.6 Standardized and Unstandardized Probit Coefficients for Labor Force Participation 71
3.7 Marginal Effects on the Probability of Being Employed for the Probit Model 74
3.8 Discrete Change in the Probability of Employment for the Probit Model 78
3.9 Factor Change Coefficients for Labor Force Participation for the Probit Model 81
3.10 Factor Change of Two in the Odds With the Corresponding Factor Change and Change in the Probability 82
4.1 Comparing Results From the LR and Wald Tests 97
4.2 Measures of Fit for the Logit and LPM Models 106
4.3 Classification Table of Observed and Predicted Outcomes for a Model 107
4.4 Observed and Predicted Outcomes for the Logit Model of Labor Force Participation 109
4.5 Strength of Evidence Based on the Absolute Value of the Difference in BIC or BIC' 112
4.6 AIC and BIC for the Model 113
5.1 Descriptive Statistics for the Attitudes Toward Working Mothers Example 126
5.2 Comparison of the Linear Regression Model and Different Parameterizations of the Ordered Regression Model 127
5.3 Standardized Coefficients for the Ordered Regression Model 129
5.4 Predicted Probabilities of Outcomes Within the Sample for the Ordered Logit Model 131
5.5 Predicted Probabilities by Sex and Year for the Ordered Logit Model 134
5.6 Marginal Effects on Probabilities for Women in 1989, Computed at the Means of Other Variables, for the Ordered Logit Model 135
5.7 Discrete Change in the Probability of Attitudes About Working Mothers for the Ordered Logit Model 137
5.8 Ordered and Cumulative Logit Regressions 142
5.9 Wald Tests of the Parallel Regression Assumption 144
6.1 Descriptive Statistics for the Occupational Attainment Example 152
6.2 Logit Coefficients for a Multinomial Logit Model of Occupational Attainment 159
6.3 LR and Wald Tests That Each Variable Has No Effect 162
6.4 Discrete Change in Probability for a Multinomial Logit Model of Occupations. Jobs Are Classified as: M = Menial; C = Craft; B = Blue Collar; W = White Collar; and P = Professional 167
6.5 Factor Change in the Odds for Being White 170
6.6 Coefficients From a Hypothetical Binary Logit Model 171
6.7 Coefficients for a Hypothetical Multinomial Logit Model 173
7.1 Censoring and Truncation in the Analysis of the Prestige of the First Academic Job 191
7.2 Hausman and Wise's OLS and ML Estimates From a Sample With Truncation 215
8.1 Descriptive Statistics for the Doctoral Publications Example 227
8.2 Linear Regression, Poisson Regression, and Negative Binomial Regression of Doctoral Publications 228
8.3 Zero Inflated Poisson and Zero Inflated Negative Binomial Regression Models for Doctoral Publications 246
9.1 Death Penalty Verdict by Race of Defendant and Victim 260
Series Editor's Introduction

The tools broadly labeled as "regression" have expanded in number


and power over the past two decades. In the "old days," researchers
trying to link a set of explanatory variables to a single response vari-
able were essentially limited to the general linear model: analysis of
variance, analysis of covariance, and multiple regression. These were
useful tools when the response variable was measured on an equal in-
terval scale. However, in the social and biomedical sciences, few of
the response variables of interest come in equal interval metrics. Re-
sponses to survey questions are often, even typically, categorical (e.g.,
"employed," "unemployed") or ordinal (e.g., "agree," "uncertain," "dis-
agree"). The same holds for the outcomes of people processing and
medical institutions: sick or well, arrested or not, dropped out of school
or not, lived or died, high school diploma or college degree or postgraduate degree, and so on. For these kinds of response variables, the
general linear model is inappropriate and will often give misleading
answers.
The solution within a regression framework is "regression-like" mod-
els, sometimes collected within the framework of the generalized linear
model. The basic idea is still work with a lin,ear combination of explana-
tory variables, but to allow them to be related to the response variables
in a nonlinear way through a "link" function. Then the disturbance is
xix
assumed to have some appropriate distribution, usually not the normal. For example, the log of the odds of some binary outcome (e.g., employed or unemployed) is regressed on the usual linear combination of explanatory variables, with the underlying conditional distribution of the binary outcomes taken to be binomial.

In this book, Scott Long addresses these and related kinds of statistical models. I am very pleased to add Scott Long's Regression Models for Categorical and Limited Dependent Variables to the series. The topics are of both practical and theoretical interest, and Professor Long has done an excellent job of exposition. The book is well suited as a text for graduate students in the social and biomedical sciences. It will also serve as a wonderful reference for researchers.

The core of Professor Long's approach is "statistical modeling." A "model" is a representation of the processes being studied and/or an expression of a scientific theory. A model is not merely a data reduction device. Given the emphasis on modeling, it is especially important that the techniques discussed be used judiciously and that Professor Long's caveats be taken to heart. Thus, even a state-of-the-art statistical model is unlikely to salvage much of use from a seriously flawed dataset. In addition, one must be able to make the case that the statistical model maps well onto the empirical phenomena being studied. And researchers use cause-and-effect language at their peril unless there has been real manipulation of the explanatory variables. Finally, when statistical inference is to be undertaken, the sources of uncertainty have to be articulated in a fashion that is consistent with what the model assumes about how the uncertainty operates.

There is broad agreement about the validity of these principles, but there are real disagreements about what these principles mean in practice. To put it a bit (but only a bit) too starkly, at one extreme there are those who never saw a model they did not like. At the opposite extreme are those who never saw a model they liked. Most researchers fall between these extremes, where the issues often boil down to where the burden of proof lies: for some, a model is acceptable as long as there is no strong evidence to undermine it. For others, a model is unacceptable unless there is strong evidence to support it. I suspect that social and biomedical researchers tend to fall in the first camp and that statisticians tend to fall more in the second camp. However, from this tension has come a range of diagnostic tools that can help (but only help) to determine how sound a model is. Professor Long is to be commended for including a healthy dose of those diagnostics in this book. Practitioners should take them very seriously.

Finally, a word about software. For most of the procedures discussed in this book there exist statistical routines in all of the major statistical packages. This is both a blessing and a curse. The blessing is that minimal computer skills are required. The curse is that minimal computer skills are required. Right answers and wrong answers are easy to obtain. With this in mind, Professor Long discusses some of the most popular software. This too deserves serious study.

RICHARD BERK
Preface

This book is about regression models that are appropriate when the
dependent variable is binary, ordinal, nominal, censored, truncated, or
counted. I refer to these outcomes as categorical and limited dependent
variables (CLDVs, for short). Within the last decade, advances in sta-
tistical software and increases in computing power have made it nearly
as easy to estimate models for CLDVs as the linear regression model.
This is reflected in the rapidly increasing use of these models. Nearly ev-
ery issue of major journals in the social sciences contains examples of
models such as logit, probit, or negative binomial regression. While computational problems have largely been eliminated, the models are more
difficult to learn and to use. There are two quite different reasons for
this. First, the models are nonlinear. As readers will learn well, the nonlinearity of many models for CLDVs makes interpretation of the results
more difficult. With the linear regression model, most of the work is done
when the estimates are obtained. With models for CLDVs, the task of
interpretation is just beginning. Unfortunately, all too often when these
models are used, the substantive meaning of the parameters is incom-
pletely explained, incorrectly explained, or simply ignored. Sometimes
only the statistical significance or possibly the sign is mentioned. A second reason that these models are difficult to learn is that while models
for CLDVs are more complicated than the linear regression model, most
books discuss them briefly, if at all. While hundreds of pages may be devoted to the linear regression model, only a dozen or two pages are devoted to models for CLDVs.

My goal in this book is to provide a unified treatment of the most useful models for categorical and limited dependent variables. Throughout the book, the links among the models are made explicit, and common methods of derivation, interpretation, and testing are applied. Whenever possible, I relate these models to the more familiar linear regression model. While Chapter 2 is a brief review of this model, I assume that readers are familiar with the specification, estimation, and interpretation of the linear regression model.

The best way to learn these models is by seeing them applied to real data and applying them as you read. To that end, I illustrate each model with data from a variety of applications ranging from attitudes toward working mothers to scientific productivity. You may find it useful to reproduce the results presented in the book using your statistical package. To that end, I have placed the data from the book along with programs on my homepage on the World Wide Web (http://www.indiana.edu/~jsl650); or access the Sage Website http://www.sagepub.com/sagepage/authors.HTM for information. While I used GAUSS for most of the computations, I will be adding programs written in Stata, SAS, and LIMDEP. And, a book on using Stata to estimate models for CLDVs is planned.

This book grew out of a course on categorical data analysis taught from 1978 to 1989 at Washington State University and at Indiana University since 1989. Teaching this course is a constant challenge and source of satisfaction. If you find the explanations that follow to be clear, it is the fault of those students who refused to accept unclear explanations. (A few refused to accept clear explanations, but that is a different matter.) Questions from students continually motivated me to find a way to make difficult topics accessible. And, indeed, some of the topics are difficult. While I have sought to present the models fully and clearly with the simplest mathematics possible, some readers will find the mathematics to be a challenge. I hope that these readers will persist, because I have yet to find an interested person who could not master these techniques and use them to learn more about the social world.

J. SCOTT LONG
BLOOMINGTON, IN
JUNE 12, 1996

Acknowledgments

I am indebted to the many people who gave me comments on earlier drafts: Dick Berk, Ken Bollen, Brian Driscoll, Scott Eliason, Lowell Hargens, David James, Bob Kaufman, Herb Smith, Adrian Raftery, Ron Schoenberg, and Yu Xie. Members of the Workshop in Quantitative Methods at Indiana University (Clem Brooks, Bob Carini, Brian Driscoll, Laurie Ervin, David James, Patricia McManus, and Karl Schuessler) gave me feedback that substantially improved the book. Paul Allison, Laurie Ervin, Jacques Hagenaars, Scott Hershberger, and Pravin Trivedi gave me exceptionally detailed and useful advice. Technical Typesetting Inc. did an outstanding job typesetting the book. And, I want to thank C. Deborah Laughton, my editor at Sage, for all that she has done for me and this book. While the suggestions that these people made resulted in a much better book, I am responsible (as they say) for any errors that remain. Research support from the College of Arts and Sciences at Indiana University is gratefully acknowledged.

While planning and writing this book I encountered more than the usual number of problems, few of which were related to the book. My wife Valerie and my daughter Megan shared these challenges with me, and to them I dedicate this book.
Abbreviations and Notation

The following abbreviations and notation are used throughout the book. While I have tried to use consistent notation and to avoid using the same symbol for more than one purpose, there are a few exceptions, such as λ being used for both the inverse Mills ratio and the logistic distribution.

Abbreviations

BRM: binary response model.
cdf: cumulative distribution function.
CLDVs: categorical and limited dependent variables.
CLM: conditional logit model.
IIA: independence of irrelevant alternatives.
LM test: Lagrange multiplier test.
LPM: linear probability model.
LR test: likelihood ratio test.
LRM: linear regression model.
ML: maximum likelihood.
MNLM: multinomial logit model.
NB: negative binomial.
NBRM: negative binomial regression model.
OLS: ordinary least squares.
ORM: ordered regression model.
PRM: Poisson regression model.
ZINB model: zero-inflated negative binomial model.
ZIP model: zero-inflated Poisson model.

Notation

≈: is approximately equal to (e.g., π ≈ 22/7).
D(M): the deviance of the model M.
e: the residual y − ŷ.
E(y | x): the expected value of y given x.
E(y | x, xk): the expected value of y given x and noting the value of xk.
f(·): either the pdf λ(·) or the normal pdf φ(·).
F(·): either the cdf Λ(·) or the normal cdf Φ(·).
G²(Mc | Mu): the likelihood ratio statistic comparing the constrained model Mc to the unconstrained model Mu.
G²(Mβ): the likelihood ratio statistic comparing Mβ to the model with only the intercept or intercepts.
H: the Hessian matrix of second derivatives of the log likelihood function; also used for the hat matrix in Section 4.2.
i: the observation number (e.g., xi).
J: the number of dependent categories in nominal and ordinal models.
k: the variable number (e.g., βk).
K: the number of x's.
L(a | b): the likelihood of parameters a given data b [e.g., L(β | X)].
LR: the likelihood ratio chi-square statistic; the same as G².
Mc: the constrained model (i.e., Mu with added constraints).
MF: the full model with as many parameters as observations.
Mu: the unconstrained model.
Mα: the model with only the intercept or intercepts included.
Mβ: the model with regressors and intercepts included.
N: the sample size.
N(μ, σ²): the normal distribution with mean μ and variance σ².
R²: the coefficient of determination.
s²: the variance of the residual e.
sk: the standard deviation of xk.
t: a t-statistic.
Var(θ̂): the variance-covariance matrix of θ̂.
Var(x): the variance-covariance matrix of the x's.
W: the Wald test statistic; the same as X².
X²: the Wald test statistic; the same as W.
x: the independent variable when there is a single independent variable (e.g., y = α + βx + ε).
xk: the kth independent variable.
xSk: the kth independent variable standardized to have a variance of 1.
xLk: the lower extreme of xk; the minimum of xk if βk is positive; else the maximum.
xUk: the upper extreme of xk; the maximum of xk if βk is positive; else the minimum.
xi: a row vector of independent variables for the ith observation; the ith row of X.
x̄: a row vector containing the means of the independent variables.
X: a matrix of independent variables for the entire sample.
y: the observed dependent variable; in Chapter 7, y is the observed censored variable.
y*: the latent dependent variable.
yS: y standardized to have a variance of 1.
y | y > τ: the truncated variable y given that y is greater than τ.
z: a z-statistic.
zim: a row vector of independent variables for the ith observation for outcome m for the CLM in Chapter 6.
α and β: the intercept and slope when there is a single independent variable (e.g., y = α + βx + ε).
α: the dispersion parameter for the NBRM.
β: a vector of coefficients; β0 is the intercept; βk is the coefficient for xk.
βk: the unstandardized coefficient for xk.
βk,m|n: in the MNLM, the coefficient for the effect of xk on the odds of outcome m versus outcome n.
βm|n: a vector of coefficients βk,m|n in the MNLM.
βSk: the fully standardized coefficient for xk; y and the x's are standardized.
βSxk: the x-standardized coefficient for xk; y is not standardized but xk is.
βSyk: the y-standardized coefficient for xk; y is standardized but the x's are not.
δ: an abbreviation for (xβ − τ)/σ in Chapter 7.
δL: an abbreviation for (τL − xβ)/σ in the two-limit tobit model of Chapter 7.
δU: an abbreviation for (τU − xβ)/σ in the two-limit tobit model of Chapter 7.
Δ̄: the average absolute discrete change.
ΔE(y | x)/Δxk: the discrete change in y for a change in xk, holding other variables constant.
∂E(y | x)/∂xk: the change in y for an infinitesimal change in xk, holding other variables constant; also called the marginal effect.
ε: the error in the regression equation (e.g., y* = α + βx + ε).
θ: a vector of parameters [e.g., θ = (α β σ)′].
λ(·) and Λ(·): the pdf and cdf for the standard logistic distribution with mean 0 and variance π²/3.
λS(·) and ΛS(·): the pdf and cdf for the standardized logistic distribution with mean 0 and variance 1.
λ(·): the inverse Mills ratio, defined as φ(·)/Φ(·); used in Chapter 7.
λi: the inverse Mills ratio for the ith observation.
μ: the mean.
∏ yi: the product y1 × y2 × ⋯.
σ: the standard deviation of ε given x.
σk: the standard deviation of xk.
σy: the standard deviation of y.
τ: the threshold in the tobit, probit, and logit models.
τm: the threshold or cutpoint for the ORM.
τy: the value assigned to censored cases in tobit models.
τL: the lower threshold for the two-limit tobit model.
τU: the upper threshold for the two-limit tobit model.
φ(·) and Φ(·): the pdf and cdf for the standard normal distribution with mean 0 and variance 1.
ψ: the probability of being in a group where the count is always 0. Used with zero modified count models.
Ω(x): the odds of an outcome given x.
Ω(x, xk): the odds of an outcome given x, noting specifically the value of xk.
Ω≤m(x): the odds of outcomes less than or equal to m versus greater than m.
Ωm|n(x): the odds of outcome m versus n given x for the MNLM.

1 Introduction

The linear regression model is the most commonly used statistical method in the social sciences. Hundreds of books describe this model, and thousands of applications can be found. With few exceptions, the regression model assumes that the dependent variable is continuous and has been measured for all cases in the sample. Yet, many outcomes of fundamental interest to social scientists are not continuous or are not observed for all cases. This book considers regression models that are appropriate when the dependent variable is censored, truncated, binary, ordinal, nominal, or count. I refer to these variables as categorical and limited dependent variables (hereafter CLDVs).
A brief review of the literature in the social sciences shows how common CLDVs are. Indeed, continuous dependent variables may be the exception. Here are a few examples:

• Binary variables have two categories and are often used to indicate that an event has occurred or that some characteristic is present. Is an adult a member of the labor force? Did a citizen vote in the last election? Does a high school student decide to go to college? Is a consumer more likely to buy the same brand or to try a new brand? Did someone answer a given question on a survey?
• Ordinal variables have categories that can be ranked. Surveys often ask respondents to indicate their agreement to a statement using the choices strongly agree, agree, disagree, and strongly disagree. Items asking the frequency of occurrence use the categories often, occasionally, seldom, and never. Political orientation may be classified as radical, liberal, and conservative. Educational attainment can be measured in terms of the highest degree received, with the ordinal categories of less than high school, high school, college, and graduate school. Military rank and civil service grade are also ordinal.

• Nominal variables occur when there are multiple outcomes that cannot be ordered. Occupations can be grouped as manual, trade, blue collar, white collar, and professional. Marital status might be coded as single, married, divorced, and widowed. Political parties in European countries can be considered nominal classifications. Studies of brand preference may include choices among unordered alternatives.

• Censored variables occur when the value of a variable is unknown over some range of the variable. The classic example is expenditures for durable goods. Individuals with less income than the price of the cheapest durable good will have zero expenditures. Measures of workers' wages are restricted on the lower end by the minimum wage rate. Variables measured as a percentage, such as the percentage of homes damaged in a natural disaster, are censored below at 0 and above at 100. Censoring can also occur for other reasons. In the 1990 Census, all salaries greater than $140,000 were recorded as $140,000 to ensure confidentiality.

• Count variables indicate the number of times that some event has occurred. How often did a person visit the doctor last year? How many jobs did someone have? How many strikes occurred? How many articles did a scientist publish? How many demonstrations occurred? How many children did a couple have? How many years of formal education were completed? How many newspapers were founded during a given period?

The level of measurement of a variable is not always clear or unambiguous. Indeed, you might disagree with some of the examples given above. Carter notes that "... statements about levels of measurement of a phenomenon cannot be sensibly made in isolation from the theoretical and substantive context in which the [variable] is to be used. Claims that a variable is somehow 'intrinsically' interval (ordinal, nominal) are analytically misleading." Education is a good example. Education can be measured as a binary variable that distinguishes those with a high school education or less from others. Or, it could be ordinal, indicating the highest degree received: junior high, high school, college, or graduate. Or, it can be a count variable indicating the number of years of school completed. Each of these is reasonable and appropriate depending on the substantive purpose of the analysis.

Once the level of the dependent variable is determined, it is important to match the model used to the level of measurement. If the model chosen assumes the wrong level of measurement, the estimator could be biased, inefficient, or simply inappropriate. Fortunately, there are a large number of models specifically designed for CLDVs. Binary logit and probit are appropriate for binary outcomes. The ordered logit and probit models explicitly deal with the ordered nature of the dependent variable. Multinomial logit is appropriate for nominal outcomes. The tobit model is designed for censored outcomes. Furthermore, a variety of models such as Poisson and negative binomial regression can be used for count outcomes. These and related models are the subject of this book.

Until recently, the greatest obstacle in using models for CLDVs was the lack of software that was flexible, stable, and easy to use. This limitation no longer applies since these models can be estimated routinely with standard software. Now, the greatest impediment is the complexity of the models and the difficulty in interpreting the results. The difficulties arise because most models for CLDVs are nonlinear.

1.1. Linear and Nonlinear Models

The linear regression model is linear, while most models for CLDVs are nonlinear. This difference is so basic for understanding the materials in later chapters that I begin with a general overview of the implications of nonlinearity for interpreting the effects of independent variables. Just as the nonlinearities introduced by relativity theory made physical models substantially more complicated than their Newtonian counterparts, the use of nonlinear statistical models has added new complications for the data analyst.

Figure 1.1 shows a linear and a nonlinear model predicting the dependent variable y. Each model has two independent variables: x is continuous and d is dichotomous with values 0 and 1. To keep the example simple, I assume that there is no random error. Panel A plots the linear model

y = α + βx + δd    [1.1]

The solid line beginning at α plots y as x changes when d = 0: y = α + βx. The dashed line beginning at α + δ plots y as x changes when d = 1: y = (α + δ) + βx.
Introduction 5
REGRESSION MODELS

d = 1: y = (α + δ) + βx. The effect of x on y can be computed by taking the partial derivative with respect to x: ∂y/∂x = β.

[Figure 1.1. Effects of Continuous and Dummy Variables in Linear and Nonlinear Models. Panel A: Linear Model; Panel B: Nonlinear Model.]

The partial derivative, often called the marginal effect, is the ratio of the change in y to the change in x, when the change in x is infinitely small, holding d constant. In a linear model, the partial derivative is the same at all values of x and d. Consequently, when x increases by one unit, y increases by β units regardless of the current level of x or d. This is shown in panel A by the four small triangles with bases of length 1 and heights of length β.

The effect of d cannot be computed by taking the partial derivative since d is not continuous. Instead, we measure the discrete change in y as d changes from 0 to 1, holding x constant:

    (α + βx + δ·1) − (α + βx + δ·0) = δ

When d changes from 0 to 1, y changes by δ units regardless of the level of x. This is shown in panel A by the two arrows marking the distance between the solid and dashed lines.

Panel B plots the model

    y = g(α* + β*x + δ*d)    [1.2]

where g is a nonlinear function. For example, for the logit model of Chapter 3, Equation 1.2 becomes

    y = exp(α* + β*x + δ*d) / [1 + exp(α* + β*x + δ*d)]    [1.3]

Interpretation of the effects of x and d is now more complicated. The solid curve for d = 0 and the dashed curve for d = 1 are no longer parallel: Δ1 ≠ Δ4. The effect of a unit change in x differs according to the level of both d and x: Δ2 ≠ Δ3 ≠ Δ5 ≠ Δ6. The partial derivative of y with respect to x is a function of both x and d. In general, the effect of a unit change in a variable depends on the values of all variables in the model and is no longer simply equal to a parameter of the model.
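These complications can be seen numerically. The following minimal sketch evaluates the logit curve of Equation 1.3 at a few points; the parameter values (α* = −2, β* = 1, δ* = 2) are illustrative assumptions, not values from the text.

```python
import math

# A sketch of Equation 1.3, the logit curve.  The parameter values
# (alpha* = -2, beta* = 1, delta* = 2) are illustrative assumptions,
# not values taken from the text.
A, B, D = -2.0, 1.0, 2.0

def p(x, d):
    """Logit probability: exp(A + B*x + D*d) / [1 + exp(A + B*x + D*d)]."""
    z = A + B * x + D * d
    return math.exp(z) / (1.0 + math.exp(z))

def marginal_effect(x, d):
    """Partial derivative of p with respect to x: beta* * p * (1 - p)."""
    return B * p(x, d) * (1.0 - p(x, d))

def discrete_change(x, d):
    """Change in p when x increases by one unit, holding d constant."""
    return p(x + 1.0, d) - p(x, d)
```

At x = 2 with d = 0 the curve passes through p = .5 and the marginal effect reaches its maximum of β*/4; at other values of x both measures take different values, and the marginal effect and discrete change no longer agree, which is exactly the complication that panel B illustrates.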
While Equation 1.2 is nonlinear in y, it is often possible to find some function h that transforms the nonlinear model into a linear model:

    h(y) = α* + β*x + δ*d

For example, we can rewrite Equation 1.3 as

    ln[y/(1 − y)] = α* + β*x + δ*d

(Show this.¹) The dependent variable is now ln[y/(1 − y)], a quantity known as the logit. The logit increases by β* units for every unit increase in x, holding d constant. As with Equation 1.1, this is true regardless of the level of x or d. The problem is that it is often unclear what a unit increase in h(y) means. For example, an increase of β* in the logit is not meaningful to most people.

¹ Exercises for the reader are in italics. Solutions are found in the Appendix.

One of the major difficulties in effectively using models for CLDVs is interpreting the nonlinear effects of the independent variables. An all too common, albeit unnecessary, solution is to talk only about the statistical significance of coefficients without indicating how these parameters relate to changes in the outcome of interest. A key objective of this book is to show how models for CLDVs can be effectively interpreted.

Throughout the book, I use the term "effect" to refer to a change in an outcome for a change in an independent variable, holding all other variables constant. For example, in the probit model the effect of education on labor force participation might be described as: for an additional year of education, the probability of being in the labor force increases, holding all other variables at their means. Or, for count models we might conclude: for each increase in income of $1000, the expected number of children in the family decreases by 5%, holding all other variables constant. The interpretation of an "effect" as causal depends on the nature of the problem being analyzed and the assumptions that a researcher is willing to make. For a detailed discussion of the issues involved in causal inferences, see Sobel (1995) and the literature cited therein.

1.2. Organization

Chapter 2 reviews the linear regression model to highlight issues that are important for the models in later chapters. Maximum likelihood estimation is introduced within this familiar context to make it easier to understand how to apply this method to the models in later chapters. Chapter 3 presents models for binary outcomes. I begin with the linear regression model applied to a binary dependent variable to illustrate how CLDVs can cause violations of the assumptions of the linear regression model. Binary probit and logit are first derived using an unobserved or latent dependent variable. I then show how the same model can be understood as a nonlinear probability model without appealing to a latent variable. Issues of identification are introduced to explain the apparent differences in results from the logit and probit models. Since numerical methods are often necessary for estimating these models, as well as later models, these methods are discussed in some detail. I also introduce a variety of approaches for interpreting the results from nonlinear models. These techniques are the basis for interpreting all of the models in later chapters. Chapter 4 reviews standard statistical tests associated with maximum likelihood estimation, and considers a variety of measures for assessing the fit of a model. Chapter 5 extends the binary logit and probit models to ordered outcomes. While the resulting ordered logit and probit models are simple extensions of their binary counterparts, having additional outcome categories makes interpretation more complex. Chapter 6 presents the multinomial and conditional logit models for nominal outcomes. The greatest difficulty in using these models is the large number of parameters required and the corresponding problems of interpretation. Chapter 7 considers models with censored and truncated dependent variables, with a focus on the tobit model. The tobit model is developed in terms of a latent variable that is mapped to the observed, censored outcome. The chapter ends by considering a number of related models, including models for sample selection bias. Chapter 8 presents models for count outcomes, beginning with the Poisson regression model. Negative binomial regression and zero modified models are considered as alternatives that allow for overdispersion or heteroscedasticity in the data. Chapter 9 compares and contrasts the models from earlier chapters, and discusses the links between these models and models not discussed in the book, such as log-linear and event history models.

The material in this book can be learned most effectively by reading the chapters in order, but it is possible to skip some chapters or to change the order in which others are read. Everyone should read Chapter 2 to learn the basic terminology and notation. Chapter 3 is essential for all that follows since it introduces key concepts, such as latent variables, and methods of interpretation, such as discrete change. Those who are familiar with Wald and likelihood ratio tests can skip that section of Chapter 4. The discussion of assessing fit in Chapter 4 is not needed for later chapters. Chapter 5 on ordinal outcomes can be read after Chapter 6 on nominal outcomes. Chapter 8 on count models builds on the results for truncated distributions in Chapter 7 to develop the zero modified models. However, most of Chapter 8 is accessible without reading Chapter 7.

While each model studied has unique characteristics, there are important similarities among the models that are exploited. First, each model has the same systematic component (McCullagh & Nelder, 1989, pp. 26–): each model enters the independent variables as a linear combination β0 + β1x1 + ... + βKxK. Consequently, in specifying your model you can use all of the "tricks" that you know for entering variables in the linear regression model: nominal variables can be coded as a set of dummy variables; nonlinearities can be introduced by transforming the variables; the effects of an independent variable can differ by group by adding interaction variables; and so on. Second, each model is estimated by maximum likelihood. Once the general characteristics of maximum likelihood are understood and the associated statistical tests are learned, these can be applied to all of the models. Third, the same ideas are used for interpreting each model. Expected values and discrete changes are computed at interesting values of the independent variables and are presented in plots or tables. Finally, whenever possible the mathematical tools used for one model are carried over in the presentation of later models.

Many of these models can be derived in different ways. For example, the logit model can be developed as a latent variable model in which the observed variable is an imperfect measurement of an underlying latent variable. Or, the model can be derived as a discrete choice model in which an individual chooses the outcome that provides the maximum utility. Or, the model can be viewed as a probability model with the characteristic S-shaped relationship between independent variables and the probability of an event. Each of these approaches results in the same formula relating the independent variables to the probability. I show alternative derivations of some models in order to highlight different characteristics of the models. This also serves to link my presentation to the diverse literature in which these models were developed.

Models for CLDVs were often developed independently in different fields, such as engineering, statistics, and econometrics, with very little contact across the fields. Consequently, there is no universally accepted notation or terminology. For example, the ordered logit model of Chapter 5 is also known as the ordinal logit model, the proportional odds model, the parallel regression model, and the grouped continuous model. I have tried to use what appears to be emerging as standard notation within the social sciences. Every effort has been made to keep the notation consistent across chapters. On rare occasions, this has resulted in notation that is different from that commonly used in the literature. To help you keep the notation clear, a table of notation is given on pages xxvii to xxx.

1.3. Orientation

Before ending this chapter, a few words about the general orientation of this book are in order. This is a book about data analysis rather than about statistical theory. The mathematics has been kept as simple as possible without oversimplifying the models in ways that could result in misuse or misunderstanding. The mathematics that is used, however, is essential for understanding the correct application of these models. To master the methods, it is important to work with the equations and to try some derivations on your own. To help you do this, I have included exercises in italics at various points. In the long run, it will be worth your while to think about each of these questions before proceeding. Brief answers to the exercises are given in the Appendix.

Seeing how these models can be applied in substantive research is also important for understanding the models. Accordingly, each chapter includes a substantive example that is used to illustrate the interpretation of each model. You are also encouraged to apply these models to your own data while you are reading. To this end, comments are given about four statistical packages for estimating models for CLDVs: LIMDEP Version 7 (Greene, 1995), Markov Version 2 (Long, 1993), SAS Version 6 (SAS Institute, 1990a), and Stata Version 5 (Stata Corporation, 1997). These comments are not designed to teach you how to use these packages, but rather are general comments about difficulties that might be encountered with any statistical package. While nearly all of the analyses in the book were done with my program Markov (Long, 1993) written in GAUSS (Aptech Systems Inc., 1996), any of these four packages could have been used for most analyses. To help you use these methods, I have placed the data sets, programs, and output for the examples on my homepage (http://www.indiana.edu/~jsl650), or access the Sage Website http://www.sagepub.com/sagepage/authors.HTM for information.

While this book contains what I believe are the most basic and useful methods for the analysis of CLDVs, a number of important topics were excluded due to limitations of space. Topics that have not been discussed include: robust and nonparametric methods of estimation, specification tests (Davidson & MacKinnon, 1993, pp. 522-528; Greene, 1993, pp. 648-650), complex sampling, multiple equation systems (see Browne & Arminger, 1995, for a review), and hierarchical models (Longford,
1995). Additional citations are given in later chapters. While these are important topics, they presuppose the models considered here and are beyond the scope of this book. I chose a fuller treatment of a smaller number of models rather than a less detailed discussion of more methods. Hopefully, this will provide a firm foundation for further study from the vast and growing literature on limited and categorical dependent variables.

1.4. Bibliographic Notes

Each chapter ends with "Bibliographic Notes." These notes present a brief history of the models in that chapter and provide a list of basic sources. There are several alternative sources that deal with some of the models in this book. Maddala (1983) considers dozens of models for CLDVs. Amemiya (1985) reviews extensions to the tobit model, including sample selection models. McCullagh and Nelder (1989) discuss some of the same models from the standpoint of the generalized linear model. King (1989) discusses many of these models with particular application to political science. Agresti (1990) is particularly useful if all of your variables are nominal or ordinal. Liao (1994) considers the interpretation of models within the context of the generalized linear model. ... (1995) provides a comprehensive review of many related models. Finally, Stokes et al. (1995) discuss models for categorical variables in terms of the SAS system.

2. Continuous Outcomes: The Linear Regression Model

This chapter briefly reviews the linear regression model (LRM). While I assume that you are familiar with regression, you should read this chapter carefully since the model is described in a way that facilitates the development of models for categorical and limited dependent variables. Moreover, while the LRM is usually estimated by ordinary least squares, I focus on maximum likelihood estimation since this method is used extensively in later chapters. My discussion of the LRM is by no means comprehensive; for further details, see the references in Section 2.8.

2.1. The Linear Regression Model

The linear regression model can be written as

    yi = β0 + β1xi1 + β2xi2 + ... + βKxiK + εi    [2.1]

where y is the dependent variable, the x's are independent variables, and ε is a stochastic error. The subscript i is the observation number from N random observations. β1 through βK are parameters that indicate the effect of a given x on y. β0 is the intercept which indicates the expected value of y when all of the x's are 0. The model can be written in matrix terms as

    y = Xβ + ε

If we define xi as the ith row of X and

    β = (β0, β1, ..., βK)'

then Equation 2.1 can be written as

    yi = xiβ + εi

A number of assumptions are added to complete the specification of the model. The first set of assumptions concerns the independent variables.

• The relationship between the x's and y is linear. For example, in the simple regression model, y = α + βx + ε.

• The x's are linearly independent. This means that none of the x's is a linear combination of the other x's. More formally, we assume that X is of full rank.

The remaining assumptions concern the distribution of the error ε. The error can be thought of as an unobservable influence on y, or it can be viewed as the combined effect of a number of unobserved variables, each of which is thought to have small effects on y.

• Conditional Mean of ε. The conditional expectation of the error is zero:

    E(ε | x) = 0

This means that for a given x, the errors average to zero. This assumption implies that the expected value of y given x is a linear combination of the x's:

    E(y | x) = xβ

This is shown in Figure 2.1 for the simple regression model y = α + βx + ε. Notice that I use α and β for the parameters of the simple regression model rather than the more cumbersome β0 and β1. The expected value of y given x is drawn as a thick solid line beginning at α and moving up and to the right with slope β.

• Homoscedastic and Uncorrelated Errors. The errors are assumed to be homoscedastic, which means that for a given x, the errors have a constant variance. Formally,

    Var(εi | xi) = σ² for all i

When the variance differs across observations, the errors are heteroscedastic and Var(εi | xi) = σi². The errors are also assumed to be uncorrelated across observations, so that for two observations i and j, the covariance between εi and εj is 0.

In Figure 2.1, the distribution of ε is shown by a dotted curve that should be thought of as coming out of the page into a third dimension.

[Figure 2.1. Simple Linear Regression Model With the Distribution of y Given x]

The higher the curve, the more likely it is to have an error of that value. The errors are homoscedastic since the variance of the error distribution is the same for each x. While the curves are drawn as normal, normality is not required for the errors to be homoscedastic.

• Normality. When the errors are thought of as the combined effects of many small influences, it is reasonable to assume that they are normally distributed when conditioned on the x's. With this assumption, the curves in Figure 2.1 should be thought of as normal.

See the references in Section 2.8 for a more detailed discussion of the assumptions.
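The error assumptions can be made concrete with a small simulation. The sketch below draws values of y for the simple regression model at two fixed values of x; the parameter values (α = 1, β = .5, σ = 2) are arbitrary choices for illustration, not values from the text.

```python
import random
import statistics

# A minimal simulation of the error assumptions: errors with
# E(e | x) = 0 and the same variance at every value of x.  The
# values alpha = 1, beta = 0.5, sigma = 2 are arbitrary.
random.seed(42)
ALPHA, BETA, SIGMA = 1.0, 0.5, 2.0

def draw_y(x, n):
    """Draw n realizations of y = alpha + beta*x + e at a fixed x."""
    return [ALPHA + BETA * x + random.gauss(0.0, SIGMA) for _ in range(n)]

y_at_1 = draw_y(1.0, 20000)
y_at_10 = draw_y(10.0, 20000)

# E(y | x) is close to alpha + beta*x at each x, and the spread of y
# around its conditional mean is the same at both values of x.
mean_1, mean_10 = statistics.mean(y_at_1), statistics.mean(y_at_10)
sd_1, sd_10 = statistics.stdev(y_at_1), statistics.stdev(y_at_10)
```

The conditional means track the regression line (1.5 at x = 1, 6.0 at x = 10), while the two conditional standard deviations are nearly equal, which is what homoscedasticity requires.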
2.2. Interpreting Regression Coefficients

In Chapter 1, partial derivatives and discrete change were used to describe the effects of an independent variable on the dependent outcome. Even though these two measures of change give identical answers for the LRM, I consider both in order to introduce ideas that are critical in later chapters. The subscript i is dropped to simplify the notation.

The derivative of E(y | x) with respect to xk is

    ∂E(y | x)/∂xk = βk

In the LRM, the derivative is the slope of the line relating y and xk, holding all other variables constant. Since the model is linear, the value of the partial is a constant βk that does not depend on the level of any of the x's in the model.

The second approach to interpretation involves computing the discrete change in the value of y for a given change in xk, holding all other variables constant. The notation E(y | x, xk) indicates the expected value of y given x, explicitly noting the value of xk. Thus, E(y | x, xk + 1) is the expected value of y given x when the kth variable equals xk + 1. The discrete change in y for a unit change in xk equals

    ΔE(y | x)/Δxk = E(y | x, xk + 1) − E(y | x, xk)
                  = [β0 + β1x1 + ... + βk(xk + 1) + ... + βKxK] − [β0 + β1x1 + ... + βkxk + ... + βKxK]
                  = βk

This means that when xk increases by one unit, y is expected to change by βk units, holding all other x's constant. In the LRM,

    ∂E(y | x)/∂xk = ΔE(y | x)/Δxk

which allows a simple interpretation of the β's:

• For a unit increase in xk, the expected change in y equals βk, holding all other variables constant.

Since dummy variables are coded as 1 if an observation has some characteristic and else 0, the coefficient for a dummy variable can be interpreted in the same way:

• Having characteristic xk (as opposed to not having the characteristic) results in an expected change of βk in y, holding all other variables constant.

The slope coefficient is represented in Figure 2.1 by small triangles. The base of each triangle is one unit long, with the rise in the triangle equal to β. Thus, for a unit increase in x, whether starting at x2, x3, or any other value of x, y is expected to increase by β units.

2.2.1. Standardized and Semi-Standardized Coefficients

The β coefficients are defined in terms of the original metric of the variables, and are sometimes called metric coefficients or unstandardized coefficients. It is often useful to compute coefficients after some or all of the variables have been standardized to have a unit variance. This is particularly useful for the models introduced in later chapters where the scale of the dependent variable is arbitrary. This section considers coefficients that are standardized for y, standardized for the x's, and fully standardized for both y and the x's.

y-Standardized Coefficients

Let σy be the standard deviation of y. We can standardize y to a variance of 1 by dividing Equation 2.1 by σy:

    y/σy = (β0/σy) + (β1/σy)x1 + ... + (βK/σy)xK + ε/σy
Defining new coefficients, this equation becomes

    y^S = β0^Sy + β1^Sy x1 + ... + βk^Sy xk + ... + βK^Sy xK + ε/σy

where y^S is y standardized to have a unit variance. βk^Sy = βk/σy is a coefficient that is semi-standardized with respect to y or simply a y-standardized coefficient. It is still the case that

    ∂E(y^S | x)/∂xk = ΔE(y^S | x)/Δxk = βk^Sy

For a continuous variable, βk^Sy can be interpreted as:

• For a unit increase in xk, y is expected to change by βk^Sy standard deviations, holding all other variables constant.

For a dummy variable:

• Having characteristic xk (as opposed to not having the characteristic) results in an expected change of βk^Sy standard deviations in y, holding all other variables constant.

x-Standardized Coefficients

Let σk be the standard deviation of xk. Then, dividing each xk by σk and multiplying the corresponding βk by σk,

    y = β0 + (σ1β1)(x1/σ1) + ... + (σKβK)(xK/σK) + ε

Defining new coefficients, this becomes

    y = β0 + β1^Sx x1^S + ... + βk^Sx xk^S + ... + βK^Sx xK^S + ε

where xk^S is xk standardized to have a unit variance, and βk^Sx = σkβk is a coefficient that is semi-standardized with respect to x or simply an x-standardized coefficient. For a continuous variable, βk^Sx can be interpreted as:

• For a standard deviation increase in xk, y is expected to change by βk^Sx units, holding all other variables constant.

Fully Standardized Coefficients

It is also possible to standardize both y and the x's:

    y/σy = (β0/σy) + (σ1β1/σy)(x1/σ1) + ... + (σKβK/σy)(xK/σK) + ε/σy

and, adding new notation,

    y^S = β0^S + β1^F x1^S + ... + βk^F xk^S + ... + βK^F xK^S + ε/σy

where βk^F = (σkβk)/σy is a fully standardized coefficient or a path coefficient. Since βk^F = σk βk^Sy = βk^Sx/σy, the following interpretation applies:

• For a standard deviation increase in xk, y is expected to change by βk^F standard deviations, holding all other variables constant.

Standardized Coefficients for Dummy Variables

For a dummy variable, the meaning of a standard deviation change is unclear. For example, consider the variable MALE defined as 1 for men and 0 for women. Assume that the regression coefficient for MALE equals .5. The effect of MALE changing from 0 to 1 is quite clear: being male increases the dependent variable by .5, holding all other variables constant. Now consider the x-standardized coefficient. Suppose that the standard deviation of MALE is .25. Then the x-standardized coefficient would equal .125 (= .5 × .25). To say that a standard deviation change in a person's gender increases the dependent variable by .125 does not make substantive sense. While fully standardized and x-standardized coefficients for dummy variables are sometimes used to compare the magnitudes of the effects of variables, I do not find such comparisons useful. Consequently, x-standardized and fully standardized coefficients for dummy variables are not used in later chapters.
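The three standardizations are simple rescalings of the metric coefficient. The sketch below applies them to the CIT coefficient, using the standard deviations from Table 2.1; small discrepancies from the printed results reflect rounding of the unstandardized coefficient.

```python
# Converting an unstandardized slope into semi- and fully standardized
# forms.  The inputs echo the CIT coefficient from Tables 2.1 and 2.2
# (beta = .004, sd of CIT = 33.06, sd of JOB = 0.97).
beta = 0.004
sd_x = 33.06   # standard deviation of the independent variable
sd_y = 0.97    # standard deviation of the dependent variable

beta_y = beta / sd_y          # y-standardized coefficient
beta_x = beta * sd_x          # x-standardized coefficient
beta_f = beta * sd_x / sd_y   # fully standardized coefficient
```

The identities βk^F = βk^Sx/σy and βk^F = βk^Sy σk hold exactly, so any one standardized coefficient can be recovered from the others.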
Comparison to Nonlinear Models

The interpretation of the coefficients in the LRM differs in two important respects from the nonlinear models in later chapters. First, in nonlinear models, the effect of xk depends on the value of xk and on the values of the other x's in the model. Second, in nonlinear models, ∂E(·)/∂xk does not equal ΔE(·)/Δxk. It is extremely important to avoid extending the interpretation of the LRM to the models in later chapters.

2.3. Estimation by Ordinary Least Squares

Ordinary least squares (OLS) is the most frequently used method of estimation for the LRM. The OLS estimator of β is the value of β that minimizes the sum of the squared residuals: Σi (yi − xiβ)². The resulting estimator is

    β̂ = (X'X)⁻¹X'y

with the covariance matrix:

    Var(β̂) = [ Var(β̂0)      Cov(β̂0, β̂1)   ...  Cov(β̂0, β̂K)
               Cov(β̂1, β̂0)  Var(β̂1)       ...  Cov(β̂1, β̂K)
               ...
               Cov(β̂K, β̂0)  Cov(β̂K, β̂1)  ...  Var(β̂K) ]

If the assumptions of the model hold, the OLS estimator is the best linear unbiased estimator. This means that if the assumptions hold, the OLS estimator β̂ is an unbiased estimator [i.e., E(β̂) = β] that has the minimum variance among all linear estimators.

To estimate Var(β̂), we need an estimate of the variance of the errors, σ². Defining the residual as ei = yi − xiβ̂, we can use the unbiased estimator:

    s² = [1/(N − K − 1)] Σi ei²

where K is the number of independent variables. This allows us to estimate the covariance matrix as Var̂(β̂) = s²(X'X)⁻¹. If the errors are normal, then

    tk = (β̂k − β*) / σ̂(β̂k)

where σ̂(β̂k) is the estimated standard error of β̂k, has a t-distribution with N − K − 1 degrees of freedom and can be used to test the hypothesis that H0: βk = β*. Without assuming normality, tk has a t-distribution as the sample becomes infinitely large (Greene, 1993, pp. 299-301). Issues involved in testing hypotheses are discussed in Chapter 4.

Example of the LRM: Prestige of the First Job

Long et al. (1980) examined factors that affect the prestige of a scientist's first academic job for a sample of male biochemists. Their primary interest was whether characteristics associated with scientific productivity were more important than characteristics associated with educational background. Here I extend those analyses to include information on female scientists.

The dependent variable is the prestige of the first job (JOB). Prestige is rated on a continuous scale from 1.00 to 5.00, with schools from 1.00 to 1.99 classified as adequate, those from 2.00 to 2.99 as good, those from 3.00 to 3.99 as strong, and those above 3.99 as distinguished. Graduate programs rated below adequate or departments without graduate programs were coded as 1.00. The implications of this decision are considered in Chapter 7 when this example is used to illustrate the tobit model. The independent variables are described in Table 2.1. Our regression model is

    JOB = β0 + β1 FEM + β2 PHD + β3 MENT + β4 FEL + β5 ART + β6 CIT + ε

Table 2.2 presents the estimates of the unstandardized and standardized coefficients. t-values are also presented, but are not discussed until later.

TABLE 2.1 Descriptive Statistics for the First Academic Job Example

                  Standard
Name    Mean      Deviation   Minimum   Maximum   Description
JOB     2.23      0.97        1.00      4.80      Prestige of job (from 1 to 5)
FEM     0.39      0.49        0.00      1.00      1 if female; 0 if male
PHD     3.20      0.95        1.00      4.80      Prestige of Ph.D. department
MENT    45.47     65.53       0.00      532.00    Citations received by mentor
FEL     0.62      0.49        0.00      1.00      1 if held fellowship; else 0
ART     2.28      2.26        0.00      18.00     Number of articles published
CIT     21.72     33.06       0.00      203.00    Number of citations received

NOTE: N = 408.
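The OLS formulas of this section can be sketched for the special case of one independent variable, where β̂ = (X'X)⁻¹X'y reduces to a closed-form expression in sums. The data below are made up for illustration.

```python
# OLS for the simple regression y = b0 + b1*x + e, using the scalar
# special case of (X'X)^{-1} X'y.  The data are made up.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mx = sum(x) / n
my = sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

b1 = sxy / sxx                  # slope estimate
b0 = my - b1 * mx               # intercept estimate
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s2 = sum(e ** 2 for e in resid) / (n - 1 - 1)   # unbiased error variance, K = 1
```

As the theory requires, the residuals from the fitted line sum to zero, and s² divides the residual sum of squares by N − K − 1.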

TABLE 2.2 Regression of the Prestige of the First Academic Job

Variable    β         β^Sx     β^Sy     β^F      t
Constant                                         6.42
FEM         −0.143                               −1.54
PHD         0.260     0.280    0.267             5.53
MENT        0.001     0.078    0.001    0.080    1.69
FEL         0.234              0.240             2.47
ART         0.051     0.023    0.053             0.79
CIT         0.004     0.148    0.005    0.153    2.28

NOTE: β^Sx is an x-standardized coefficient; β^Sy is a y-standardized coefficient; β^F is a fully standardized coefficient; t is the t-test of β.

The coefficients for FEM and CIT can be used to illustrate the interpretation of the coefficients.

• Unstandardized coefficients. Being a female scientist decreases the expected prestige of the first job .14 points on a five-point scale, holding all other variables constant. For every additional citation, the prestige of the first job is expected to increase .004 units, holding all other variables constant. (The effect appears small due to the large standard deviation in CIT.)

• x-Standardized coefficients. For every standard deviation increase in citations, the prestige of the first job is expected to increase by .15 units, holding all other variables constant.

• y-Standardized coefficients. Being a woman decreases the expected prestige of the first job .14 standard deviations, holding all other variables constant. For every additional citation, the prestige of the first job is expected to increase .005 standard deviations, holding all other variables constant. (The unstandardized and y-standardized coefficients are nearly identical since the variance of y is about 1.)

• Fully standardized coefficients. For every standard deviation increase in citations, the prestige of the first job is expected to increase by .15 standard deviations, holding all other variables constant.

Unstandardized and y-standardized coefficients are used to interpret many of the models in later chapters.

2.4. Nonlinear Linear Regression Models

While the LRM is a linear model, nonlinear relationships between the independent variables and the dependent variable can be incorporated by transforming the variables. For example, consider the nonlinear model:

    z = exp(β0 + β1x1 + β2x2 + ε)    [2.2]

If we take the log of both sides,

    ln(z) = β0 + β1x1 + β2x2 + ε

the resulting equation is linear in ln(z) even though it is nonlinear in z. Accordingly, the slope β1 can be interpreted as discussed above: for a unit increase in x1, ln(z) is expected to increase by β1 units, holding x2 constant. Note, however, that a β1 unit increase in ln(z) from 1 to 1 + β1 involves a different change in z than a change in ln(z) from, say, 2 to 2 + β1. This can be seen by taking the derivative of z with respect to x1:¹

    ∂z/∂x1 = ∂exp(β0 + β1x1 + β2x2 + ε)/∂x1
           = exp(β0 + β1x1 + β2x2 + ε)β1
           = zβ1

Thus, even though the expected change in y = ln(z) is the same regardless of the current levels of x1 and x2, the change in z [not ln(z)] depends on the level of z.

Equation 2.2 is an example of a class of nonlinear models known as log-linear models: while z is nonlinearly related to the x's, the log of z is linearly related to the x's. Since the logit models of Chapters 3, 4, and 6 and the count models of Chapter 8 are log-linear models, it is worth considering a simple method of interpretation that can be used for any log-linear model.

¹ This requires the chain rule, ∂f(g(x))/∂x = [∂f(g(x))/∂g(x)][∂g(x)/∂x], and the fact that ∂exp(x)/∂x = exp(x).

) indica tes the value of z when x 1 has a given value. Consider we assume that E(e I x) = O. Consider a simple modification where we
1 to xlI: now assume that E(e Ix) = o. Here o is an unknown, nonzero constant.
We can modify Equation 2.3 so that the new error will have a zero mean:

y=(/30+0)+/3¡x¡ +"'+/3KxK+(e o)
) exp(/3¡) exp(/32x2) exp( e)
= /3'0 + /3Jx¡ + ... + /3KxK + 8*
is the multiplicative factor change in z
for a unit We have subtracted the mean of e o) from e to create a new error
e* with a zero mean. (Show that the mean of e* is O.) To maintain the
exp(/3¡ ) equality, we al so added o which is combined with /30 and relabeled as /3 0,
The resulting equation has all of the properties of the LRM, including a
This leads to the following interpretation: mean of Ofor the error e*. Consequently, we can use OLS to obtain best,
linear, unbiased estimates of /3'0 (not /3() and the /3k's. Thc expected
• For a unit increase in xl' is expectc~d to change by the factor exp(/3¡), value of fio is a combination of the intercept /30 and the mean of e:
al! other variables constant. E(fiü) /30 + o. No matter how large the sample, it is impossible to
disentangle estimates of /30 and o. More formally, /30 and B are not
in z for a unit change in x ¡ can be computed
as identified individually, although their sum /30 + o is identified.
Since the idea of identification is essential for understanding models
Z(X¡)] for CLDVs, it is worth reinforcing the key ideas with Figure 2.2. Assume
-(-) = lOO[exp(/3¡) 1] that the sample data, which are indicated by the dots, are generated by
Z Xl
the model y = a + /3x + e, where 8 is normally distributed with mean o.
This can be as: The solid ¡ine represents E(y I x) = a + /3x. As would be expected, the
unit increase in Xl' is expecte:d to change by lOO[exp(f3]) 1]%,
all other variables constan!.

~ote that other nonlinear models do not have this simple interpretation
m tcrms of a factor or a change.
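The factor-change and percentage-change calculations are easy to check numerically; a minimal sketch with a hypothetical coefficient β₁ = 0.2:

```python
import math

# Hypothetical beta1 = 0.2: factor and percentage change in z per unit increase in x1.
beta1 = 0.2
factor_change = math.exp(beta1)           # multiplicative change: exp(beta1)
pct_change = 100 * (math.exp(beta1) - 1)  # percentage change: 100[exp(beta1) - 1]
print(round(factor_change, 3), round(pct_change, 1))  # 1.221 22.1
```

That is, each unit increase in x₁ multiplies z by about 1.221, a 22.1% increase.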

2.5. Violations of the Assumptions

While a discussion of the consequences of violating the assumptions of the LRM is beyond the scope of my review, I consider two violations that are useful for understanding the models in later chapters.

2.5.1. The Nonzero Conditional Mean of ε

In the LRM

    y = β₀ + β₁x₁ + ··· + β_K x_K + ε    [2.3]

we assume that E(ε | x) = 0. Consider a simple modification where we now assume that E(ε | x) = δ, where δ is an unknown, nonzero constant. We can modify Equation 2.3 so that the new error will have a zero mean:

    y = (β₀ + δ) + β₁x₁ + ··· + β_K x_K + (ε − δ)
      = β₀* + β₁x₁ + ··· + β_K x_K + ε*

We have subtracted the mean of ε (= δ) from ε to create a new error ε* with a zero mean. (Show that the mean of ε* is 0.) To maintain the equality, we also added δ, which is combined with β₀ and relabeled as β₀*. The resulting equation has all of the properties of the LRM, including a mean of 0 for the error ε*. Consequently, we can use OLS to obtain best, linear, unbiased estimates of β₀* (not β₀) and the β_k's. The expected value of β̂₀* is a combination of the intercept β₀ and the mean of ε: E(β̂₀*) = β₀ + δ. No matter how large the sample, it is impossible to disentangle estimates of β₀ and δ. More formally, β₀ and δ are not identified individually, although their sum β₀ + δ is identified.

Since the idea of identification is essential for understanding models for CLDVs, it is worth reinforcing the key ideas with Figure 2.2. Assume that the sample data, which are indicated by the dots, are generated by the model y = α + βx + ε, where ε is normally distributed with mean δ. The solid line represents E(y | x) = α + βx.

Figure 2.2. Identification of the Intercept in the Linear Regression Model
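The identification problem can also be seen in a small simulation in the spirit of Figure 2.2: when the errors have mean δ, OLS recovers β and the sum α + δ, but never α itself. A minimal sketch with hypothetical values α = 1, β = 2, δ = .5:

```python
import random

random.seed(1)

# Simulate y = alpha + beta*x + e where e has a nonzero mean delta.
alpha, beta, delta = 1.0, 2.0, 0.5
xs = [random.uniform(0, 10) for _ in range(10000)]
ys = [alpha + beta * x + random.gauss(delta, 1) for x in xs]

# OLS via the closed-form formulas for simple regression.
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
a = ybar - b * xbar
print(round(b, 2), round(a, 2))  # slope near beta = 2.0; intercept near alpha + delta = 1.5
```

No matter how many observations are drawn, the estimated intercept converges to α + δ = 1.5, not to α = 1.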
As would be expected, the observed data are located approximately δ = E(ε | x) units above the line E(y | x). The OLS estimate of the regression line is the dashed line that runs through the observations, with intercept α̂* and slope β̂. The estimate of the slope appears unaffected by the nonzero mean of the errors, and is approximately equal to β. Consistent with our earlier argument, the estimated intercept is about δ units above the true α as a consequence of the nonzero mean of the errors. While neither α nor δ is identified, the sum α + δ is identified and can be estimated by α̂*.

This example illustrates a number of critical ideas related to the notion of identification. First, a parameter is unidentified when it is impossible to estimate the parameter regardless of the data available. Identification is a limitation of the model that cannot be remedied by increasing the sample size. Second, models become identified by adding assumptions. The intercept is identified if we assume that E(ε | x) = 0; without this assumption it is unidentified. Third, it is possible for some parameters to be identified while others are not. Thus, while β₀ is not identified unless the value of E(ε | x) is assumed, β₁ through β_K are identified without this assumption. Finally, while individual parameters may not be identified, combinations of those parameters may be identified. Here, while neither δ nor β₀ is identified, the sum β₀ + δ is identified. These ideas are important for understanding how we identify the models in later chapters.

2.5.2. The x's and ε Are Correlated

The assumption E(ε | x) = 0 implies that the x's and ε are uncorrelated. In practice, there are several reasons why the x's might be correlated with the errors, including reciprocal effects among variables, measurement error, incorrect functional form, and β's that differ across observations (Kmenta, 1986, pp. 334-350). Here I consider the effect of excluding a variable, since this will help us understand the tobit model in Chapter 7.

If we estimate a model that excludes an independent variable which is correlated with the included independent variables, the OLS estimates are biased and inconsistent. Kmenta (1986, pp. 443-446) shows that this is due to the correlation between the error and the independent variables in the model. To see why they are correlated, assume that y is generated by the model

    y = β₀ + β₁x₁ + β₂x₂ + ε    [2.4]

but that we have estimated the model

    y = β₀ + β₁x₁ + υ    [2.5]

The error υ absorbs the excluded variable x₂ and the original ε: υ = β₂x₂ + ε. If x₁ and x₂ are correlated, then υ and x₁ must be correlated. (Why must this be the case?) Consequently, the OLS estimate of β₁ in Equation 2.5 is a biased and inconsistent estimate of β₁ in Equation 2.4.

2.6. Maximum Likelihood Estimation

If we assume that the errors are normally distributed, the LRM can be estimated by maximum likelihood (ML). While the OLS and ML estimators of β are identical for the LRM, I introduce ML estimation within the familiar context of regression to make it easier to understand the application of ML to the models in later chapters.

2.6.1. Introduction to ML Estimation

Consider the problem of estimating the probability of having a given number of men in your sample. The binomial formula computes the probability of having s men in a sample of size N with the population parameter π indicating the probability of being male:

    Pr(s | π, N) = [N! / (s! (N − s)!)] πˢ (1 − π)^(N−s)    [2.6]

where k! = k · (k − 1) ··· 2 · 1. For example, the probability of having three men in a sample of 10 with the probability of being a male equal to .5 is

    Pr(s = 3 | π = .5, N = 10) = [10! / (3! 7!)] .5³ (1 − .5)⁷ = 0.117

This is a typical problem in probability. We know the formula for the probability distribution and the values of the parameters π and N. We want to know the probability of a particular outcome s. In statistics, we know s and N, but want to estimate π from the sample information. The ML estimate is that value of the parameter that makes the observed data most likely.
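The binomial probability of Equation 2.6, and the idea of choosing the parameter value that makes the observed data most likely, can be sketched as follows (a simple grid search; the values s = 3 and N = 10 are from the example):

```python
import math

def binomial_prob(s, n, pi):
    """Binomial probability from Equation 2.6: Pr(s | pi, N)."""
    return math.comb(n, s) * pi**s * (1 - pi)**(n - s)

# Probability of 3 men in a sample of 10 when pi = .5:
print(round(binomial_prob(3, 10, 0.5), 3))  # 0.117

# The ML estimate holds s and N fixed and searches over pi:
grid = [i / 100 for i in range(1, 100)]
pi_hat = max(grid, key=lambda pi: binomial_prob(3, 10, pi))
print(pi_hat)  # 0.3, which equals s/N
```

The likelihood is largest at π = .3, anticipating the analytic result π̂ = s/N derived below.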
To continue our example, assume that we know that s = 3 and N = 10, but do not know π. What value of π is most likely to have generated the observed data? Figure 2.3 plots the probability of observing three men in 10 tries for all values of π. The resulting curve shows that the maximum probability occurs at π = .3. This is our ML estimate.

Figure 2.3. Pr(s = 3 | π, N = 10) for Different Values of π

2.6.2. The Likelihood Function

Since Equation 2.6 computes the probability of s given the parameters N and π, it is referred to as a probability function: the values of N and π are held constant while s varies. When the same formula is considered as a function of π, with the values of N and s held constant, we refer to it as the likelihood function. The likelihood function for our example is

    L(π | s, N) = [N! / (s! (N − s)!)] πˢ (1 − π)^(N−s)

The maximum likelihood estimate is that value π̂ that maximizes the likelihood of observing the data that were observed. The maximum occurs where the slope of the likelihood function, known as the gradient or score, equals 0:

    ∂L(π | s, N) / ∂π = 0

This is represented in Figure 2.3 by the tangent line with slope 0 located at π = .3.

The value that maximizes the likelihood function also maximizes the log of the likelihood. Since it is easier to solve the gradient of the log likelihood than the likelihood itself, the ML estimate is generally computed by solving the equation

    ∂ln L(π | s, N) / ∂π = 0

For our example,²

    ∂ln L(π | s = 3, N = 10)/∂π = ∂[ln(10!/(3! 7!)) + 3 ln π + 7 ln(1 − π)]/∂π
        = ∂ln(10!/(3! 7!))/∂π + ∂(3 ln π)/∂π + ∂(7 ln(1 − π))/∂π
        = 3/π − 7/(1 − π)

Setting ∂ln L(π | s = 3, N = 10)/∂π = 0 and solving for π results in π̂ = .3 = s/N.

2.6.3. ML Estimation of the Sample Mean

Before estimating the regression model with ML, it is useful to consider the similar but simpler problem of estimating the mean of a standard normal distribution. If y is drawn from a normal distribution with mean μ and a standard deviation of 1, then the probability density function for y is

    f(y | μ, σ = 1) = (1/√(2π)) exp(−(y − μ)²/2)

² We use the chain rule: ∂ln x/∂x = 1/x and ∂ln(1 − x)/∂x = −1/(1 − x).
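The score equation can also be solved numerically; a minimal sketch using bisection on the gradient 3/π − 7/(1 − π), which is positive below the maximum and negative above it:

```python
def score(pi, s=3, n=10):
    """Gradient of the binomial log likelihood: s/pi - (n - s)/(1 - pi)."""
    return s / pi - (n - s) / (1 - pi)

# Bisection: the score is decreasing in pi, so its root is the ML estimate.
lo, hi = 0.01, 0.99
for _ in range(60):
    mid = (lo + hi) / 2
    if score(mid) > 0:
        lo = mid
    else:
        hi = mid
pi_hat = (lo + hi) / 2
print(round(pi_hat, 6))  # 0.3 = s/N
```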
Since μ is unknown, we write the likelihood function as

    L(μ | y_i, σ = 1) = f(y_i | μ, σ = 1)

For three independent observations, the likelihood is the product of the individual likelihoods,

    L(μ | y, σ = 1) = ∏(i=1 to 3) L(μ | y_i, σ = 1) = ∏(i=1 to 3) f(y_i | μ, σ = 1)

and the log likelihood is

    ln L(μ | y, σ = 1) = ∑(i=1 to 3) ln f(y_i | μ, σ = 1)

The ML estimate is the value μ̂ that maximizes this equation.

To get a better sense of how the ML estimate is determined, consider Figure 2.4. Suppose that there are three observations with values 0, 1, and 2. These are represented in the figure as solid circles. The four panels correspond to a sequence of guesses for the value of μ that maximizes the likelihood. In panel A, the normal curve is centered on μ = 2. The likelihood of each point is indicated by a vertical line, with the overall likelihood equal to the product of the lengths of the lines: L(μ = 2 | y) = .005. Panel B computes the likelihood for μ = −1, resulting in L(μ = −1 | y) = .0001. To increase the likelihood, we need a value of μ somewhere between 2 and −1. Panel C shows μ = 1, resulting in L(μ = 1 | y) = .023. When we increase the mean slightly to 1.2 in panel D, the likelihood is reduced to .022. Of our four tries, μ = 1 produces the largest likelihood. Tentatively, we conclude that μ̂_ML = 1.

Figure 2.4. Maximum Likelihood Estimation of μ From a Normal Distribution. Panel A: L(μ = 2 | y) = .005; Panel B: L(μ = −1 | y) = .0001; Panel C: L(μ = 1 | y) = .023; Panel D: L(μ = 1.2 | y) = .022

In practice, ML is more complicated. First, we would usually have more observations. Second, we would often be estimating more than one parameter (e.g., μ and σ). Finally, we would have to consider all possible values of the parameters being estimated, not just the four values in our figure. Still, the general ideas are the same.

2.6.4. ML Estimation for Regression

Maximum likelihood for the LRM is a direct extension of fitting a normal distribution to a set of points. Consider estimating the simple regression y = α + βx + ε using three observations: (x₁, y₁), (x₂, y₂), and (x₃, y₃). Panels A and B of Figure 2.5 compare the likelihoods for two sets of possible estimates. The observed data are indicated by circles. The assumed distribution of y conditional on x is represented by the normal curves, which should be visualized as coming out of the page into a third dimension. The likelihood of an observation for a given pair α and β is indicated by the length of the line from an observation, indicated by a circle, to the normal curve. In panel A, for αₐ and βₐ, we find that (x₃, y₃) is very unlikely, while (x₁, y₁) is quite likely. The likelihood of αₐ and βₐ is the product of the three lines in panel A. Clearly, αₐ and βₐ are not the ML estimates since it is easy to find other estimates that increase the likelihood, such as α_b and β_b in panel B. The ML estimates are those values α̂ and β̂ that make the likelihood as large as possible.

Mathematically, we can develop the ML estimator for the LRM as follows. Since y conditional on x is distributed normally with mean α + βx and variance σ², the pdf for an observation is

    f(y_i | α + βx_i, σ) = [1/(σ√(2π))] exp( −(y_i − [α + βx_i])² / (2σ²) )    [2.7]
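The likelihoods reported in the four panels of Figure 2.4 can be reproduced directly from the definitions above:

```python
import math

def normal_pdf(y, mu, sigma=1.0):
    """Normal pdf: f(y | mu, sigma)."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(mu, ys):
    """L(mu | y): the product of the individual likelihoods."""
    prod = 1.0
    for y in ys:
        prod *= normal_pdf(y, mu)
    return prod

ys = [0.0, 1.0, 2.0]  # the three observations in Figure 2.4
for mu in (2.0, -1.0, 1.0, 1.2):
    print(mu, round(likelihood(mu, ys), 4))
# mu = 1 (the sample mean) gives the largest likelihood, about .023
```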
Figure 2.5. Maximum Likelihood Estimation for the Linear Regression Model. Panel A: Worse Fit; Panel B: Better Fit

Often, the pdf of a normal variable with mean μ and variance σ² is expressed in terms of the pdf of a standardized normal variable φ with mean 0 and variance 1:

    φ(z) = (1/√(2π)) exp(−z²/2)

Using this definition, Equation 2.7 becomes

    f(y_i | α + βx_i, σ) = (1/σ) φ( (y_i − [α + βx_i]) / σ )

and the likelihood equation can be written as

    L(α, β, σ | y, X) = ∏(i=1 to N) (1/σ) φ( (y_i − [α + βx_i]) / σ )

Taking logs,

    ln L(α, β, σ | y, X) = ∑(i=1 to N) ln [ (1/σ) φ( (y_i − [α + βx_i]) / σ ) ]    [2.8]

ML estimates α̂, β̂, and σ̂ are obtained by maximizing this equation. For multiple regression, y = xβ + ε, and the log likelihood has the same form with α + βx_i replaced by x_i β. The likelihood function is maximized when β̂ = (X'X)⁻¹X'y, which is the same as the OLS estimator. Maximum likelihood for the LRM is unusual since a closed-form solution is available. This means that the estimates can be obtained by algebraically solving the gradient of the log likelihood equation for the unknown parameters. Closed-form solutions are not possible for most of the models considered in later chapters and, consequently, iterative methods must be used. This topic is discussed in Chapter 3.

2.6.5. The Variance of ML Estimators

Maximum likelihood can also estimate the variance of the estimators. While the technical details are beyond the scope of our discussion (see Cramer, 1986, pp. 27-28; Davidson & MacKinnon, 1993, pp. 260-267; Eliason, 1993, pp. 40-41), we need a few definitions and results that are used in later chapters.
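The equivalence between the ML and OLS estimates for the LRM can be illustrated numerically. A minimal sketch with three hypothetical observations, comparing the closed-form OLS solution with a coarse grid search over the log likelihood of Equation 2.8 (σ fixed at 1):

```python
import math

# Three hypothetical observations, in the spirit of Figure 2.5.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 3.0, 4.0]

def log_likelihood(a, b, sigma=1.0):
    """ln L(alpha, beta, sigma | y, x) from Equation 2.8, with sigma fixed."""
    ll = 0.0
    for x, y in zip(xs, ys):
        resid = y - (a + b * x)
        ll += -math.log(sigma * math.sqrt(2 * math.pi)) - resid**2 / (2 * sigma**2)
    return ll

# Closed-form OLS estimates for the simple regression.
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b_ols = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
a_ols = ybar - b_ols * xbar

# A grid search over (alpha, beta) peaks at essentially the OLS values.
a_grid = [i / 100 for i in range(-100, 101)]
b_grid = [i / 100 for i in range(0, 301)]
a_ml, b_ml = max(((a, b) for a in a_grid for b in b_grid),
                 key=lambda ab: log_likelihood(*ab))
print((round(a_ols, 2), round(b_ols, 2)), (a_ml, b_ml))
```

The grid maximum agrees with the OLS solution up to the resolution of the grid, illustrating the closed-form result.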
Let θ̂ be the vector containing the parameters being estimated. For example, in the simple regression y = α + βx + ε with Var(ε | x) = σ², θ̂ will contain α̂, β̂, and σ̂. The Hessian is a matrix of second derivatives,

    H(θ̂) = ∂²ln L(θ̂) / ∂θ̂ ∂θ̂'

and is a symmetric matrix. For our example,

    H(θ̂) = | ∂²ln L(θ̂)/∂α∂α   ∂²ln L(θ̂)/∂α∂β   ∂²ln L(θ̂)/∂α∂σ |
            | ∂²ln L(θ̂)/∂β∂α   ∂²ln L(θ̂)/∂β∂β   ∂²ln L(θ̂)/∂β∂σ |
            | ∂²ln L(θ̂)/∂σ∂α   ∂²ln L(θ̂)/∂σ∂β   ∂²ln L(θ̂)/∂σ∂σ |

The Hessian indicates the rate at which the slope of the log likelihood function changes. For example, if ∂²ln L(θ̂)/∂β∂β is small, then the slope of ln L changes slowly as β changes. That is, ln L is nearly flat. It makes sense that if ln L is flat, then it will be difficult to find the maximum of the log likelihood, and this should be reflected in the variance of our estimates, since the variance reflects our certainty about the estimates. Indeed, the Hessian is closely related to the variance of the estimators.

The information matrix is defined as the negative of the expected value of the Hessian: −E[H(θ̂)]. Under very general conditions, the covariance matrix for the ML estimator is the inverse of the information matrix:

    Var(θ̂) = −E[H(θ̂)]⁻¹

For our example,

    Var(θ̂) = | −E(∂²ln L(θ̂)/∂α∂α)   −E(∂²ln L(θ̂)/∂α∂β)   −E(∂²ln L(θ̂)/∂α∂σ) |⁻¹
              | −E(∂²ln L(θ̂)/∂β∂α)   −E(∂²ln L(θ̂)/∂β∂β)   −E(∂²ln L(θ̂)/∂β∂σ) |
              | −E(∂²ln L(θ̂)/∂σ∂α)   −E(∂²ln L(θ̂)/∂σ∂β)   −E(∂²ln L(θ̂)/∂σ∂σ) |

Methods for estimating Var(θ̂) are considered in Chapter 3.

2.6.6. The Properties of ML Estimators

Under very general conditions, the ML estimator has a number of desirable properties. First, the ML estimator is consistent. This means, roughly, that as the sample size grows large, the probability that the ML estimator differs from the true parameter by more than an arbitrarily small amount tends toward 0. Second, the ML estimator is asymptotically efficient, which means that the variance of the ML estimator is the smallest possible among consistent estimators. Finally, the ML estimator is asymptotically normally distributed, which justifies the statistical tests that are discussed in Chapter 4. Notice that these are asymptotic properties, which means that they describe the ML estimator as the sample size approaches ∞. The degree to which they apply in finite samples is discussed in Section 3.5.

2.7. Conclusions

The linear regression model is our point of departure for presenting the models in later chapters. The next chapter begins by showing the problems inherent in using the LRM with a binary dependent variable. These problems lead to a latent regression model that generates the binary logit and probit models.

2.8. Bibliographic Notes

There are hundreds of texts dealing with the linear regression model. In order of increasing difficulty, I recommend Griffiths et al. (1993) for an introductory text; Kmenta (1986), Greene (1993), and Theil (1971) as intermediate texts; and Amemiya (1985) for an advanced treatment. Manski (1995) provides a detailed discussion of the identification problem. Four recommended sources on maximum likelihood, in order of increasing difficulty, are: Eliason (1993), Cramer (1986), Greene (1993, Chapter 12), and Davidson and MacKinnon (1993, Chapter 8).
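The link between the Hessian and the variance can be illustrated with the binomial example from Section 2.6.1, where the analytic variance of π̂ is π(1 − π)/N. A minimal sketch using a numerical second derivative:

```python
import math

def log_lik(pi, s=3, n=10):
    """Binomial log likelihood from Section 2.6.1, dropping the constant term."""
    return s * math.log(pi) + (n - s) * math.log(1 - pi)

# Numerical second derivative (the Hessian) of ln L at the ML estimate pi = .3:
h = 1e-5
pi_hat = 0.3
hessian = (log_lik(pi_hat + h) - 2 * log_lik(pi_hat) + log_lik(pi_hat - h)) / h**2

# With a single parameter, Var = -1/H; compare with the analytic pi(1 - pi)/N.
var_numeric = -1 / hessian
var_analytic = 0.3 * 0.7 / 10
print(round(var_numeric, 4), round(var_analytic, 4))  # both 0.021
```

A sharply curved log likelihood (large negative Hessian) yields a small variance, matching the intuition that a flat log likelihood reflects uncertain estimates.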
BinGly Outcomes 35

able. While 1 do not recommend the LPM, the model iJlustrates the
8ina~y Outcomes: The Linear Probability, problems resulting from a binary dependent variable, and motiva tes our
discussion of the logit and probit models. The probit and logit models
Pro bit, and Logit Models are developed first in terms of the regression of a latent variable. The la-
tent variable is related to the observed, binary variable in a simple way:
if the latent variable is greater than sorne value, the observed variable
is 1; otherwise it is O. This model is linear in the latent variable, but re-
sults in a nonlinear, S-shaped model relating the independent variables
to the probability that an event has occurred. Given the great similarity
between the logit and probit models, 1 refer to them jointly as the bi-
nary response model, abbreviated as BRM. The BRM is also developed
as a nonlinear probability model. Within this context, the complemen-
tary log-Iog model is introduced as an asymmetric altemative to the logit
and probit models.

3.1. The Linear Probability Model

The linear probability model is the regression model applied to a binary


dependent variable. The structural model is
are extrcmely com . h .
and d' mon m t e social sciences
. stu led the decisions by a bank to accep'
D omenclch and McFadden (1975)
¡ where Xi is a vector of values for tbe ith observatíon, 13 is a vector of
the use of pubr . analyzed factors af- parameters, and e is the error termo y = 1 when sorne event occurs,
IC versus pnvate transp t ' t
. and Cnudde (1975) conside d th d .0:- atlOn or commuting. and y = O if the event does not occur. For example, y 1 if a woman
e
em m the 1972 presidential electO re All eCIslon to vote for McGov-
. IOn; en (1991)' . is in the paid labor force, and y = O if she is not. If we have a single
tlle corporate elite to th D .. ' exammed contnbu- independent variable, the model can be written as
·
stu d led tbe pres¡'dent's e emocratlc Party' wh'l R
. . d eCISlon
" to mak d' , . I e agsdale
to the natíon. 01her ontcomes ¡'n l d h e a IscretIonary speech Yi = oc + (3x¡ + ei
. c u e w ether fraud .
a and loan institution (TilIma & P l wa~ commItted by
to remain with the sponso' n l ante 1, 1995); If a trainee de- which is plotted in Figure 3.1. The conditional expectation o[ y given x,
a student . ~~~ e.mp oyer (Gunderson, 1974); and E(y I x) oc + (3x, is shown as a solid lineo Observations are plotted as
1990) E I hlS or. her mentor duríng graduate circles at y = O and y = l.
. Nen a Cllrsory ghnce t .
turns up of add'( , a recent joumals in the soeial To llnderstand the LPM, we must consider the meaníng of E(y Ix).
tercourse d I ¡.onal exan:p~es, ranging from having in- When y is a binary random variable, the uncondítional expectatíon of y
to in the mílítary. roPplI1g out of hlgh school, joining a un ion, is the probability that the event occurs:
1 present four model f 1 . E(y¡) = [1 x Pr(Yí 1)] + [O x Pr(y¡ O)] Pr(y¡ = 1)
comes: tbe probability model (L s or t le .analysls o: binary out-
model b . tI PM), the bmary problt model the
For tbe regression model, we are taking conditional expectations:
is the linear' ne ; ' ~h~ co~plementary log-log model. 'The
10 e app led to a binary dependent vari-
E(Yi Ix¡) = [1 x Pr(Yi = 11 x¡)] + [O x Pr(Yi OIXi)] = Pr(Yi 11 x¡)
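The fact that the expectation of a binary variable equals the probability of a 1 is easy to verify: the sample mean of a binary variable is simply the proportion of 1s. A minimal illustration with hypothetical data:

```python
# The mean of a binary variable equals the proportion of 1s, i.e., Pr(y = 1).
y = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical binary outcomes
mean_y = sum(y) / len(y)
prop_ones = y.count(1) / len(y)
print(mean_y, prop_ones)  # 0.625 0.625
```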
Figure 3.1. Linear Probability Model for a Single Independent Variable

Thus, the conditional expectation of y for a given value of x is the probability that y = 1. This allows us to rewrite the LPM as

    Pr(y_i = 1 | x_i) = x_i β

Having a binary outcome does not affect the interpretation of the parameters that was presented in Chapter 2: for a unit increase in x_k, the expected change in the probability of an event occurring is β_k, holding all other variables constant. Since the model is linear, a given change in x_k results in the same change in the probability regardless of the current value of x, and hence the name linear probability model.

Several authors have presented models in which the dependent variable is whether a married woman is in the paid labor force, illustrating the use of logit, probit, and LPM models. Nakamura and Nakamura (1981, pp. 464-468) use a probit model to compare labor force participation in the United States and Canada. Mroz (1987) compares models for a woman's hours of paid labor, using a probit model to correct for sample selection bias. Berndt (1991) reviews the research in this area.

Our analysis is based on data extracted by Mroz (1987) from the 1976 Panel Study of Income Dynamics.¹ The sample consists of 753 white, married women between the ages of 30 and 60. The dependent variable LFP is 1 if a woman is employed and is 0 otherwise. The independent variables, which are similar to those used by Nakamura and Nakamura (1981), Mroz (1987), and Berndt (1991), are listed in Table 3.1. Our measures of educational attainment are dummy variables indicating whether the husband or wife spent at least one year in college, rather than the more commonly used measures of the number of years of education. This was done to illustrate the interpretation of dummy independent variables.

TABLE 3.1 Descriptive Statistics for the Labor Force Participation Example

Name   Mean    Standard Deviation   Minimum   Maximum   Description
LFP     0.57    0.50    0.00    1.00   1 if wife is in the paid labor force; else 0
K5      0.24    0.52    0.00    3.00   Number of children ages 5 and younger
K618    1.35    1.32    0.00    8.00   Number of children ages 6 to 18
AGE    42.54    8.07   30.00   60.00   Wife's age in years
WC      0.28    0.45    0.00    1.00   1 if wife attended college; else 0
HC      0.39    0.49    0.00    1.00   1 if husband attended college; else 0
LWG     1.10    0.59   -2.05    3.22   Log of wife's estimated wage rate
INC    20.13   11.63   -0.03   96.00   Family income excluding wife's wages

NOTE: N = 753.

The model being estimated is

    LFP = β₀ + β₁K5 + β₂K618 + β₃AGE + β₄WC + β₅HC + β₆LWG + β₇INC + ε

with estimates presented in Table 3.2. Interpretation is straightforward. For example:

• Unstandardized coefficients for continuous variables. For every additional child under 6, the predicted probability of a woman being employed decreases by .30, holding all other variables constant.

• x-standardized coefficients for continuous variables. For a standard deviation increase in family income, the predicted probability of being employed decreases by .08, holding all other variables constant.

• Unstandardized coefficients for dummy variables. If the wife attended college, the predicted probability of being in the labor force increases by .16, holding all other variables constant.

¹ These data were generously made available by Thomas Mroz.
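As a small check on the x-standardized interpretation above, multiplying an unstandardized coefficient by the standard deviation of its variable (Tables 3.1 and 3.2) reproduces the reported x-standardized value up to rounding:

```python
# x-standardized coefficient: beta multiplied by the standard deviation of x.
beta_k5, sd_k5 = -0.295, 0.52  # K5 values from Tables 3.2 and 3.1
beta_x_std = beta_k5 * sd_k5
print(round(beta_x_std, 3))  # -0.153, matching the reported -0.154 up to rounding
```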
TABLE 3.2 Linear Probability Model of Labor Force Participation

Variable     β        βˣ        t
Constant    1.144              9.00
K5         -0.295   -0.154    -8.21
AGE        -0.013   -0.103    -5.02
K618       -0.011   -0.015    -0.80
WC          0.164              3.57
HC          0.019              0.45
LWG         0.123    0.072     4.07
INC        -0.007   -0.080

NOTE: βˣ is an x-standardized coefficient.

There are several things to note about these interpretations. First, the effect of a variable is the same regardless of the values of the other variables and regardless of the current value of the variable being changed. For example, if a woman has four young children, compared to no young children, her predicted probability of employment decreases by 1.18 (= 4 × .295), which is impossible. This problem is considered in the next section. Second, fully standardized and y-standardized coefficients are inappropriate for a binary outcome, and x-standardized coefficients are inappropriate for dummy independent variables.

3.1.1. Problems With the LPM

While the interpretation of the parameters is unaffected by having a binary outcome, several assumptions of the LRM are necessarily violated.

Heteroscedasticity. If a binary random variable has mean μ, then its variance is μ(1 − μ). (Prove this.) Since the expected value of y given x is xβ, the conditional variance of y depends on x according to the equation

    Var(y | x) = Pr(y = 1 | x)[1 − Pr(y = 1 | x)] = xβ(1 − xβ)

which implies that the variance of the errors depends on the x's and is not constant. (Plot Var(y | x) as xβ ranges from −.2 to 1.2.) Since the errors of the LPM are heteroscedastic, the OLS estimator of β is inefficient and the standard errors are biased, resulting in incorrect test statistics.

Goldberger (1964, pp. 248-250) suggested that the LPM could be corrected for heteroscedasticity with a two-step estimator. In the first step, ŷ is estimated by OLS. In the second step, the model is estimated with generalized least squares using Var(e) = ŷ(1 − ŷ) to correct for heteroscedasticity. While this approach increases the efficiency of the estimates, it does not correct for other problems with the LPM. Further, for ŷ < 0 or ŷ > 1, the estimated variance is negative and ad hoc adjustments are required.

Normality. Consider a specific value of x, say x*. In Figure 3.1, E(y | x*) is represented by a diamond on the regression line. ε is the distance from E(y | x*) to the observed value. Since y can only have the values 0 and 1, which are indicated by the open circles, the error must equal either ε₁ = 1 − E(y | x*) or ε₀ = 0 − E(y | x*). Clearly, the errors cannot be normally distributed. Recall that normality is not required for the OLS estimates to be unbiased.

Nonsensical Predictions. The LPM predicts values of y that are negative or greater than 1. Given our interpretation of E(y | x) as Pr(y = 1 | x), this leads to nonsensical predictions for the probabilities. For example, using the means in Table 3.1 and the LPM estimates in Table 3.2, we find that a 35-year-old woman with four young children, who did not attend college nor did her husband, and who is average on other variables, has a predicted probability of being employed of −.48. (Verify this result.) While unreasonable predictions are sometimes used to dismiss the LPM, such predictions at extreme values of the independent variables are also common in regressions with continuous outcomes.

Functional Form. Since the model is linear, a unit increase in x_k results in a constant change of β_k in the probability of an event, holding all other variables constant. The increase is the same regardless of the current value of x. In many applications, this is unrealistic. For example, with the LPM each additional young child decreases the probability of being employed by .295, which implies that a woman with four young children has a probability that is 1.18 less than that of a woman without young children, all other variables being held constant. More realistically, each additional child would have a diminishing effect on the probability. While the first child might decrease the probability by .3, the second child might only decrease the probability by a smaller additional amount, and so on. That is to say, the model should be nonlinear. In general, when the outcome is a probability, it is often substantively reasonable that the effects
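The nonsensical prediction can be sketched numerically. The coefficients below are the rounded values from Table 3.2, so the result is approximate; the profile sets AGE = 35, K5 = 4, WC = HC = 0, and the remaining variables at their Table 3.1 means:

```python
# Predicted "probability" from the LPM using rounded Table 3.2 coefficients.
beta = {"const": 1.144, "K5": -0.295, "K618": -0.011, "AGE": -0.013,
        "WC": 0.164, "HC": 0.019, "LWG": 0.123, "INC": -0.007}
profile = {"const": 1, "K5": 4, "K618": 1.35, "AGE": 35,
           "WC": 0, "HC": 0, "LWG": 1.10, "INC": 20.13}
pred = sum(beta[k] * profile[k] for k in beta)
print(round(pred, 2))  # a negative "probability": an impossible prediction
```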
of the independent variables will have diminishing returns as the predicted probability approaches 0 or 1. In my opinion, the most serious problem with the LPM is its functional form.

The binary response model has an S-shaped relationship between the independent variables and the probability of an event, which addresses the problem with the functional form in the LPM. In the following section I develop this model in terms of a latent dependent variable. Section 3.4 shows how the logit and probit models can also be thought of as nonlinear probability models without appealing to a latent variable. And, in Chapter 6, the models are derived as discrete choice models in which an individual chooses the option that maximizes her utility.

3.2. A Latent Variable Model for Binary Variables

As with the LPM, we have an observed binary variable y. Suppose that there is an unobserved or latent variable y* ranging from −∞ to ∞ that generates the observed y's. Those who have larger values of y* are observed as y = 1, while those with smaller values of y* are observed as y = 0.

Since the notion of a latent variable is central to this approach to deriving the BRM, it is important to understand what is meant by a latent variable. Consider a woman's labor force participation as the observed y. The variable y can only be observed in two states: a woman is in the labor force, or she is not. However, not all women in the labor force are there with the same certainty. One woman might be very close to the decision of leaving the labor force, while another woman could be very firm in her decision. In both cases, we observe the same y = 1. The idea of a latent y* is that there is an underlying propensity to work that generates the observed state. While we cannot directly observe y*, at some point a change in y* results in a change in what we observe, namely, whether a woman is in the labor force. For example, as the number of young children in the family increases, it is reasonable that a woman's propensity to be in the labor force (as opposed to working at home) would decrease. At some point, the propensity would cross a threshold that would result in a decision to leave the labor force.

Can all binary outcomes be viewed as manifestations of a latent variable? Some researchers argue that invoking a latent variable is usually inappropriate, others believe that an underlying latent variable is perfectly reasonable in all cases, while most seem to take a middle ground. Regardless of your assessment of the use of a latent variable, it is important to realize that the derivation and application of the BRM is not dependent on your acceptance of the notion of a latent variable. Section 3.4 shows that the same model can be derived as a nonlinear probability model, without invoking the idea of a latent variable.

The latent y* is assumed to be linearly related to the observed x's through the structural model:

    y*_i = x_i β + ε_i

The latent variable y* is linked to the observed binary variable y by the measurement equation:

    y_i = 1 if y*_i > τ;  y_i = 0 if y*_i ≤ τ    [3.1]

where τ is the threshold or cutpoint. If y* ≤ τ, then y = 0. If y* crosses the threshold τ (i.e., y* > τ), then y = 1. For now, we assume that τ = 0; Section 5.2 (p. 122) discusses this identifying assumption in detail.

The link between the latent y* and the observed y is illustrated in Figure 3.2 for the model y* = α + βx + ε. In this figure, y* is on the vertical axis, with the threshold τ = 0 indicated by a horizontal dashed line. The distribution of y* is shown at two values of x, and should be thought of as coming out of the figure into a third dimension. When y* is larger than τ, indicated by the shaded region, we observe y = 1.

Figure 3.2. The Distribution of y* Given x in the Binary Response Model
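The latent-variable mechanism is easy to simulate. A minimal sketch with hypothetical parameters (α = .5, β = −.8, τ = 0) and normal errors, so that increases in x lower the propensity y* and make y = 1 less common:

```python
import random

random.seed(0)

alpha, beta, tau = 0.5, -0.8, 0.0  # hypothetical structural parameters

def draw_y(x):
    """Draw y* = alpha + beta*x + e with standard normal e; observe y = 1 if y* > tau."""
    y_star = alpha + beta * x + random.gauss(0, 1)
    return 1 if y_star > tau else 0

# As x increases, the latent propensity falls and 1s become rarer.
shares = []
for x in (0.0, 1.0, 2.0):
    share = sum(draw_y(x) for _ in range(10000)) / 10000
    shares.append(share)
    print(x, round(share, 2))
```

The simulated proportions of 1s trace out Pr(y = 1 | x), which is exactly the quantity the BRM models.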
For example, at x₁ only a small portion of the cases equal 1, while at x₂ nearly 90% are 1.

Since y* is continuous, the model avoids the problems encountered with the LPM. However, since the dependent variable is unobserved, the model cannot be estimated with OLS. Instead, we use ML estimation, which requires assumptions about the distribution of the errors. Most frequently, the choice is between normal errors, which result in the probit model, and logistic errors, which result in the logit model. As with the LRM, we assume that E(ε | x) = 0.

Since y* is unobserved, we cannot estimate the variance of the errors as we did with the LRM. In the probit model, we assume that Var(ε | x) = 1, and in the logit model that Var(ε | x) = π²/3 ≈ 3.29. ("≈" means "is approximately equal to.") The specific value assumed for the variance is arbitrary in the sense that it cannot be disconfirmed by the data. We choose a value that results in the simplest equation for the distribution of ε.

The logistic and normal distributions are used so frequently for the models in this book that it is worth examining these distributions in detail. The probability density and cumulative distribution functions for the normal and logistic distributions are shown in Figure 3.3. The normal distribution is drawn with a solid line. When ε is normal with E(ε | x) = 0 and Var(ε | x) = 1, the pdf is

    φ(ε) = (1/√(2π)) exp(−ε²/2)

and the cumulative distribution function (hereafter, cdf) is

    Φ(ε) = ∫ from −∞ to ε of (1/√(2π)) exp(−t²/2) dt

The cdf indicates the probability that a random variable is less than or equal to a given value. For example, Φ(0) = Pr(ε ≤ 0) = .5. (Find this point in Figure 3.3.)

Figure 3.3. Normal and Logistic Distributions. Panel A: pdf's for logistic and normal distributions; Panel B: cdf's for logistic and normal distributions

For the logit model, the errors are assumed to have a standard logistic distribution with mean 0 and variance π²/3. This unusual variance is used because it results in a particularly simple equation for the pdf,

    λ(ε) = exp(ε) / [1 + exp(ε)]²

and an even simpler equation for the cdf:

    Λ(ε) = exp(ε) / [1 + exp(ε)]

These distributions are drawn with long dot-dashes in Figure 3.3. The standard logistic pdf is flatter than the normal distribution since it has a larger variance.

If we rescale the logistic distribution to have a unit variance, known as the standardized (not standard) logistic distribution, the logistic and normal cdf's are nearly identical, as shown in panel B of Figure 3.3. However, the pdf and cdf for the standardized logistic distribution with a unit variance are more complicated:

    λ(ε) = γ exp(γε) / [1 + exp(γε)]²  and  Λ(ε) = exp(γε) / [1 + exp(γε)]    [3.2]

where γ = π/√3. Because of the simpler equations for the standard (not standardized) logistic distribution, it is generally used for deriving
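The close agreement between the unit-variance cdf's in panel B of Figure 3.3 can be checked directly; a minimal sketch:

```python
import math

def normal_cdf(e):
    """Standard normal cdf, computed via the error function."""
    return 0.5 * (1 + math.erf(e / math.sqrt(2)))

def std_logistic_cdf(e, gamma=math.pi / math.sqrt(3)):
    """Cdf of the logistic distribution rescaled to unit variance (Equation 3.2)."""
    return math.exp(gamma * e) / (1 + math.exp(gamma * e))

# The two unit-variance cdf's are nearly identical across the range of e:
for e in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(e, round(normal_cdf(e), 3), round(std_logistic_cdf(e), 3))
```

At every point checked, the two cdf's differ by no more than a few hundredths, which is why logit and probit results are rarely distinguishable in practice.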
the logit model. The consequences of assuming different variances for the probit and logit models are considered in Section 3.3.

By assuming a specific form for the distribution of ε, it is possible to compute the probability of y = 1 for a given x. To see this, consider Figure 3.4, where ε is distributed either logistically or normally around E(y* | x) = α + βx. Values of y = 1 are observed for the shaded portion of the error distribution above τ. Even if E(y* | x) is in the shaded region where y = 1 (e.g., at x₂), it is possible to observe a 0 if ε is large and negative. The negative error moves y* into the unshaded region of the curve.

Figure 3.4. Probability of Observed Values in the Binary Response Model

Figure 3.5 illustrates the translation of these ideas into a formula for computing Pr(y = 1 | x). Panel A takes the error distribution from Figure 3.4 and places it on its side. Since y = 1 when y* > 0,

    Pr(y = 1 | x) = Pr(y* > 0 | x)

Substituting y* = xβ + ε, it follows that

    Pr(y = 1 | x) = Pr(xβ + ε > 0 | x)

Subtracting xβ from each side of the inequality corresponds to shifting the x-axis as shown in panel B. Then

    Pr(y = 1 | x) = Pr(ε > −xβ | x)

Since cdf's express the probability of a variable being less than some value, we must change the direction of the inequality. The normal and logistic distributions are symmetric, which means that the shaded area of the distribution greater than −xβ in panel B equals the shaded area less than xβ in panel C. Consequently,

    Pr(y = 1 | x) = Pr(ε ≤ xβ | x)

This is simply the cdf of the error distribution evaluated at xβ. Accordingly,

    Pr(y = 1 | x) = F(xβ)    [3.3]

where F is the normal cdf Φ for the probit model and the logistic cdf Λ for the logit model. The probability of observing an event given x is the cumulative density evaluated at xβ.

Figure 3.5. Computing Pr(y = 1 | x) in the Binary Response Model. Panel A: Original Axis; Panel B: Shift the Axis; Panel C: Flip the Axis

To understand the functional form of the resulting model, consider the BRM for a single independent variable:

    Pr(y = 1 | x) = F(α + βx)    [3.4]

As x increases by one unit, the argument of F increases by β units. Plotting Equation 3.4 corresponds to plotting the cdf of either the normal or the logistic distribution as its argument increases. This is shown in Figure 3.6. Panel A illustrates the error distribution for nine values of x. The region of the distribution where y* > τ corresponds to Pr(y = 1 | x) and has been shaded. Panel B plots Pr(y = 1 | x). At x₁, only a small portion of the tail of the curve crosses the threshold in panel A, resulting in a small value of Pr(y = 1 | x) in panel B. As we move to x₂, the error distribution shifts up slightly. (This shift is exactly β(x₂ − x₁). Why?
What is the amount of the change in the probability?) Since only a small portion of the thin tail moves over the threshold, Pr(y = 1 | x) increases only slightly, as shown in panel B. As we continue to move to the right, thicker regions of the error distribution slide over the threshold, and the increase in Pr(y = 1 | x) becomes larger. After E(y* | x) crosses the threshold, increasingly thinner sections of the distribution cross the threshold, and the value of Pr(y = 1 | x) increases increasingly more slowly as it approaches 1. The curve is the well-known S-curve associated with the BRM.

[Figure 3.6. Plot of y* and Pr(y = 1 | x) in the Binary Response Model. Panel A: Plot of y*; Panel B: Plot of Pr(y = 1 | x)]

Before considering the interpretation of the parameters and how they are related to the predicted probability of an event, we must consider the issue of identification.

3.3. Identification

In specifying the BRM, we made three identifying assumptions: (1) the threshold is 0: τ = 0; (2) the conditional mean of ε is 0: E(ε | x) = 0; and (3) the conditional variance of ε is a constant: Var(ε | x) = 1 in the probit model and Var(ε | x) = π²/3 in the logit model. These assumptions are arbitrary in the sense that they cannot be tested, but they are necessary to identify the model. Identification is an issue that is essential for understanding models with latent variables. Since a latent variable is unobserved, its mean and variance cannot be estimated. For example, in the covariance structure model, commonly referred to as the LISREL model, the variance of a latent variable is unidentified. Assumptions are required to fix the variance to a constant or to link the latent variable to an observed variable (Bollen, 1989, pp. 238-246; Long, 1983, pp. 49-52). In the BRM, the model is not identified until we impose assumptions that determine the mean and variance of y*.

To see the relationship between the variance of the dependent variable and the identification of the β's in a regression model, consider the model y = xβ_y + ε_y, where y is observed. Construct a new dependent variable w = δy, where δ is any nonzero constant. The variance of w equals:

    Var(w) = Var(δy) = δ² Var(y)

For example, if δ = 1/√Var(y), then Var(w) = 1. Since w = δy and y = xβ_y + ε_y, it follows that

    w = δ(xβ_y + ε_y) = x(δβ_y) + δε_y

Therefore, the β's in a regression of w on x are δ times the β's in the regression of y on x. That is,

    β_w = δβ_y   [3.5]

Since the magnitude of the slope depends on the scale of the dependent variable, if we do not know the variance of the dependent variable, then the slope coefficients are not identified.
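The result in Equation 3.5 can be checked by simulation: multiplying the dependent variable by δ multiplies the estimated slope by exactly δ. A minimal Python sketch, with all data and coefficient values invented for illustration:

```python
import random

random.seed(0)

def ols_slope(x, y):
    """OLS slope of y on x: sum of cross-products over sum of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Simulated regression y = 1 + 2x + e (made-up values):
x = [random.gauss(0, 1) for _ in range(500)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]

# Rescale the dependent variable: w = delta * y
delta = 3.0
w = [delta * yi for yi in y]

b_y = ols_slope(x, y)
b_w = ols_slope(x, w)
print(round(b_w / b_y, 6))   # the slope is rescaled by exactly delta
```

The ratio is δ exactly (up to floating-point error), regardless of the simulated data, since the OLS slope is linear in y.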
To apply this result to the BRM and to understand the relationship between the magnitudes of the logit compared to the probit coefficients, we need to compare the structural models for logit and probit. Let

    y*_L = xβ_L + ε_L   and   y*_P = xβ_P + ε_P

where L designates the logit model and P the probit model. Since y*_L and y*_P are unobserved, it is impossible to determine their variances from the data, and β_L and β_P are unidentified. For both models, identification is achieved by assuming the variance of the error ε. Since Var(ε_P | x) = 1 and Var(ε_L | x) = π²/3 (Why?), it follows that ε_L ≈ 1.81 ε_P. The errors are not identical since the logistic and normal distributions with unit variance are only approximately equal (see Figure 3.3). From Equation 3.5, a logit coefficient can be converted to a probit coefficient:

    β_L ≈ 1.81 β_P

where 1.81 ≈ π/√3. This transformation can be used to compare coefficients from a published analysis to comparable coefficients from another analysis, and vice versa.

The approximation β_L ≈ 1.81β_P is based on equating the variances of the logistic and normal distributions. Amemiya (1981) suggested making the cdf's of the logistic and normal distributions as close as possible, not making their variances equal. He proposed that the cdf's were most similar when ε_L ≈ 1.6 ε_P, which led to his approximation: β_L ≈ 1.6β_P. My own calculations indicate that the cdf's are closest when ε_L ≈ 1.7 ε_P, which corresponds to the results in the example that I now consider.

Logit and Probit: Labor Force Participation. While we have not considered estimation, it is useful to examine the logit and probit estimates from our model of labor force participation. The model is

    Pr(LFP = 1 | x) = F(β₀ + β₁K5 + β₂K618 + β₃AGE + β₄WC + β₅HC + β₆LWG + β₇INC)

Estimates are presented in Table 3.3. The first thing to notice is that the log likelihoods and z-tests are nearly identical. This reflects the basic similarity in the structure of the logit and probit models, and the fact that these statistics are unaffected by the assumed variance of the error.

TABLE 3.3  Logit and Probit Analyses of Labor Force Participation

                 Logit              Probit             Ratio
    Variable     β        z         β        z         β       z
    Constant     3.182    4.94      1.918    5.04      1.66    0.98
    K5          -1.463   -7.43     -0.875   -7.70      1.67    0.96
    K618        -0.065   -0.95     -0.039   -0.95      1.67    1.00
    AGE         -0.063   -4.92     -0.038   -4.97      1.66    0.99
    WC           0.807    3.51      0.488    3.60      1.65    0.97
    HC           0.112    0.54      0.057    0.46      1.95
    LWG          0.605    4.01      0.366    4.17      1.65    0.96
    INC         -0.034   -4.20     -0.021   -4.30      1.68    0.98
    −2 ln L                        905.39

    NOTE: N = 753. β is an unstandardized coefficient; z is the z-test for β.

The effects of the identifying assumptions about Var(ε) are seen by taking the ratio of the logit coefficients to the probit coefficients, contained in the columns labeled "Ratio." The logit coefficients are about 1.7 times larger than the corresponding probit coefficients, with the exception of the coefficient for HC, which is the least statistically significant parameter. Clearly, interpretation of the β's must take the effects of the identifying assumptions into account. This issue is now considered.

3.3.1. The Identification of Probabilities

Since the β's are unidentified without assumptions about the mean and variance of ε, the β's are arbitrary in this sense: if we change the identifying assumption regarding Var(ε | x), the β's also change. Consequently, the β's cannot be interpreted directly since they reflect both: (1) the relationship between the x's and y*; and (2) the identifying assumptions. While the identifying assumptions affect the β's, they do not affect Pr(y = 1 | x). More technically, Pr(y = 1 | x) is an estimable function. An estimable function is a function of the parameters that is invariant to the identifying assumptions (Searle, 1971, pp. 180-188).

Consider the logit model where

    Pr(y_i = 1 | x_i) = exp(x_i β) / [1 + exp(x_i β)] = 1 / [1 + exp(−x_i β)]

(Prove the last equality.) The right-hand side is the cdf for the logistic distribution with variance σ² = π²/3. We can standardize ε to have a
unit variance by dividing the structural model by σ:

    y*/σ = x_i (β/σ) + ε_i/σ

Then ε_i/σ has a standardized logistic distribution with cdf (see Equation 3.2):

    Pr(y_i = 1 | x_i) = exp(γ x_i β/σ) / [1 + exp(γ x_i β/σ)]

Since γ = π/√3 = σ, the γ and σ cancel, and the probability of an event is unaffected by the identifying assumption for Var(ε | x). While the specific value assumed for Var(ε | x) affects the β's, it does not affect the quantity that is of interest, namely, the probability that an event occurred. The same result holds for the probit model.

The point is that while the β's are affected by the arbitrary scale assumed for the error, the probabilities are not affected. Consequently, the probabilities can be interpreted without concern about the arbitrary assumption that is made to identify the model. That is to say, the probabilities are estimable functions. Further, any function of the probabilities is also estimable. Most importantly, we can interpret changes in probabilities and odds, which are ratios of probabilities. This is done in Section 3.7, but first we consider an alternative method of deriving the logit and probit models.

3.4. A Nonlinear Probability Model

The BRM can also be derived without appealing to an underlying latent variable. This is done by specifying a nonlinear model relating the x's to the probability of an event. For example, Aldrich and Nelson (1984) derived the logit model by starting with the problem that the LPM can produce values of Pr(y = 1 | x) that are greater than 1 or less than 0. To avoid this problem, they transform Pr(y = 1 | x) into a quantity that ranges from −∞ to ∞. First, the probability is transformed into the odds:

    Pr(y = 1 | x) / Pr(y = 0 | x) = Pr(y = 1 | x) / [1 − Pr(y = 1 | x)]

The odds indicate how often something (e.g., y = 1) happens relative to how often it does not happen (e.g., y = 0), and range from 0 when Pr(y = 1 | x) = 0 to ∞ when Pr(y = 1 | x) = 1. The log of the odds, known as the logit, ranges from −∞ to ∞. This suggests a model that is linear in the logit:

    ln( Pr(y = 1 | x) / [1 − Pr(y = 1 | x)] ) = xβ   [3.6]

This is equivalent to the logit model derived above (Show this.):

    Pr(y = 1 | x) = exp(xβ) / [1 + exp(xβ)]   [3.7]

Other probability models can be constructed by choosing functions of xβ that range from 0 to 1. Cumulative distribution functions have this property and readily provide a number of examples. The cdf for the standard normal distribution results in the probit model:

    Pr(y = 1 | x) = ∫ from −∞ to xβ of (1/√(2π)) exp(−t²/2) dt

Another example is the complementary log-log model (Agresti, 1990, pp. 104-107; McCullagh & Nelder, 1989, p. 108), defined by

    ln(−ln[1 − Pr(y = 1 | x)]) = xβ

or, equivalently,

    Pr(y = 1 | x) = 1 − exp[−exp(xβ)]

Unlike the logit and probit models, the complementary log-log model is asymmetric. In the logit and probit models, if you are at that point on the probability curve where Pr(y = 1 | x) = .5, increasing x by a given amount δ changes the probability by the same amount as if x is decreased by δ. This is not the case for the complementary log-log model, as shown in Figure 3.7. As x increases, the probability increases slowly at the left until it reaches about .2; the change from .8 toward 1 occurs much more rapidly. The log-log model, which is defined as

    Pr(y = 1 | x) = exp[−exp(−xβ)]
[Figure 3.7. Complementary Log-Log and Log-Log Models]

has the opposite pattern. These models can be estimated with GLIM and have links to the proportional hazards model (see Petersen, 1995, p. 499, for details).

3.5. ML Estimation²

To define the likelihood equation, define p_i as the probability of observing whatever value of y was actually observed for a given observation:

    p_i = Pr(y_i = 1 | x_i)        if y_i = 1 is observed
    p_i = 1 − Pr(y_i = 1 | x_i)    if y_i = 0 is observed   [3.8]

where Pr(y_i = 1 | x_i) is defined by Equation 3.3. If the observations are independent, the likelihood equation is:

    L(β | y, X) = ∏ from i=1 to N of p_i   [3.9]

Combining Equations 3.8 and 3.9,

    L(β | y, X) = ∏ over y=1 of Pr(y_i = 1 | x_i) × ∏ over y=0 of [1 − Pr(y_i = 1 | x_i)]

where the index for multiplication indicates that the product is taken over only those cases where y = 1 and y = 0, respectively. The β's are incorporated into the likelihood equation by substituting the right-hand side of Equation 3.3:

    L(β | y, X) = ∏ over y=1 of F(x_i β) × ∏ over y=0 of [1 − F(x_i β)]

Taking logs, we obtain the log likelihood equation:

    ln L(β | y, X) = Σ over y=1 of ln F(x_i β) + Σ over y=0 of ln[1 − F(x_i β)]

Amemiya (1985, pp. 273-274) proves that under conditions that are likely to apply in practice, the likelihood function is globally concave, which ensures the uniqueness of the ML estimates. These estimates are consistent, asymptotically normal, and asymptotically efficient.

3.5.1. Maximum Likelihood and Sample Size

For ML estimation, the desirable properties of consistency, normality, and efficiency are asymptotic. This means that these properties have been proven to hold as the sample size approaches ∞. While ML estimators are not necessarily bad estimators in small samples (indeed, OLS for the linear regression model is an ML estimator that works quite well in small samples), the small-sample behavior of ML estimators for the models in this book is largely unknown. Since alternative estimators with known small-sample properties are generally not available for the models we consider, the practical question is: When is the sample large enough to use the ML estimates and the resulting tests? While I am reluctant to give advice without firm evidence to justify the advice, it seems necessary to add a cautionary note since it is easy to get the impression that ML estimation works well with any sample size. For example, the 32 observations from a study by Spector and Mazzeo (1980) are used frequently to illustrate the logit and probit models, yet 32 is too small of a sample to justify the use of ML. The following guidelines are not hard and fast. They are based on my experience of when the models seem to produce reasonable and robust results and my discussions with other researchers who use these methods.

It is risky to use ML with samples smaller than 100, while samples over 500 seem adequate. These values should be raised depending on characteristics of the model and the data. First, if there are a lot of parameters in the model, more observations are needed. In the literature on the covariance structure model, the rule of at least five observations per parameter is often given. A rule of at least 10 observations per parameter seems reasonable for the models in this book. This rule does not imply that a minimum of 100 is not needed if you have only two variables. Second, if the data are ill conditioned (e.g., independent variables are highly correlated) or if there is little variation in the dependent variable (e.g., nearly all of the outcomes are 1), a larger sample is required. Third, some models seem to require more observations. The ordinal regression model of Chapter 5 is an example. In discussing the use of ML for small samples, Allison (1995, p. 80) makes a useful point: While the standard advice is that with small samples you should accept larger p-values as evidence against the null hypothesis, given that the degree to which ML estimates are normally distributed in small samples is unknown, it is more reasonable to require smaller p-values in small samples.

2. If there is more than one observation for each combination of values of the independent variables, minimum chi-square estimation can be used. Since this requirement is rarely satisfied in social science research, I do not consider this method. See Hanushek and Jackson (1977) or Maddala (1983) for details.
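The log likelihood equation of Section 3.5 translates directly into code. A minimal Python sketch for the logit model, with made-up data and coefficient values chosen purely for illustration:

```python
import math

def logit_cdf(xb):
    """Logistic cdf: F(xb) = 1 / (1 + exp(-xb))."""
    return 1 / (1 + math.exp(-xb))

def log_likelihood(beta, y, X):
    """ln L(beta) = sum over y=1 of ln F(x_i b) + sum over y=0 of ln[1 - F(x_i b)]."""
    total = 0.0
    for yi, xi in zip(y, X):
        xb = sum(b * x for b, x in zip(beta, xi))
        p = logit_cdf(xb)
        total += math.log(p) if yi == 1 else math.log(1 - p)
    return total

# Hypothetical data: a constant (the 1's) and one regressor
y = [0, 0, 1, 1, 1]
X = [(1, 0), (1, 1), (1, 1), (1, 2), (1, 3)]

print(round(log_likelihood((-1.0, 1.0), y, X), 4))
```

At β = (0, 0), every p_i is .5 and ln L = 5 ln(.5), which provides a quick sanity check on the implementation.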

3.6. Numerical Methods for ML Estimation

For the LRM, ML estimates are obtained by setting the gradient of the likelihood to 0 and solving for the parameters using algebra. No explicit solutions are possible with nonlinear models. Consequently, numerical methods are used to find the estimates that maximize the likelihood function. Numerical methods start with a guess of the values of the parameters and iterate to improve on that guess. While it is tempting to dismiss numerical methods as an esoteric topic of no practical concern, programs using numerical methods for estimation sometimes produce incorrect estimates or fail to provide any estimates. To detect and correct such problems, an elementary understanding of numerical methods is useful. I begin with an introduction to numerical methods, followed by advice on using these methods.

3.6.1. Iterative Solutions

Assume that we are trying to estimate the vector of parameters θ. We begin with an initial guess θ₀, called start values, and attempt to improve on this guess by adding a vector ζ₀ of adjustments:

    θ₁ = θ₀ + ζ₀

We proceed by updating the previous iteration according to the equation:

    θ_{n+1} = θ_n + ζ_n

Iterations continue until there is convergence. Roughly, convergence occurs when the gradient of the log likelihood is close to 0 or the estimates do not change from one step to the next. Convergence must occur to obtain the ML estimator θ̂.

The problem is to find a ζ_n that moves the process rapidly toward a solution. It is useful to think of ζ_n as consisting of two parts: ζ_n = D_n γ_n. γ_n is the gradient vector defined as ∂ln L/∂θ_n, which indicates the direction of the change in the log likelihood for a change in the parameters. D_n is a direction matrix that reflects the curvature of the log likelihood function; that is, it indicates how rapidly the gradient is changing. A clearer understanding of these components is gained by examining the simplest methods of maximization.

The Method of Steepest Ascent. The method of steepest ascent lets D = I:

    θ_{n+1} = θ_n + γ_n

An estimate increases if the gradient is positive, and it decreases if the gradient is negative. Iterations stop when the derivative becomes nearly 0. The problem with this approach is that it considers the slope of ln L, but not how quickly the slope is changing. To see why this is a problem, consider two log likelihood functions with the same gradient at a given point but with one function changing shape more quickly than the other. (Sketch these functions.) You should move more gradually for the function that is changing quickly, in order to avoid moving too far. Steepest ascent tends to work poorly since it treats both functions in the same way.

The next three commonly used methods address this problem by adding a direction matrix that assesses how quickly the log likelihood function is changing. They differ in their choice of a direction matrix. In all cases, it takes longer to compute the direction matrix than the identity matrix used with the method of steepest ascent. Usually, the additional computational costs are made up for by the fewer iterations that are required to reach convergence.
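The iterative scheme θ_{n+1} = θ_n + ζ_n can be illustrated with steepest ascent in one dimension. The following toy sketch maximizes an invented concave function standing in for a log likelihood (the function, start value, and step size are all made up for illustration):

```python
# One-parameter illustration: maximize ln L(theta) = -(theta - 2)**2
# by steepest ascent (identity direction matrix) with a fixed step size.

def gradient(theta):                 # d ln L / d theta = -2(theta - 2)
    return -2 * (theta - 2)

theta, step = 0.0, 0.1               # start value and step size (invented)
for _ in range(200):                 # iterate until the gradient is ~0
    g = gradient(theta)
    if abs(g) < 1e-8:                # convergence criterion
        break
    theta += step * g                # theta_{n+1} = theta_n + step * gradient

print(round(theta, 6))               # approaches the maximizer theta = 2
```

Note the fixed step size: a step that is too large for a sharply curved function would overshoot, which is exactly the weakness of steepest ascent that the direction-matrix methods below are designed to fix.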
No one method works best all of the time. An algorithm applied to one set of data may not work, while another algorithm applied to the same data may converge rapidly. For a different set of data, the opposite may occur. In practice, the algorithm used in commercial software often reflects the preferences of the programmer and the ease with which an algorithm can be programmed for a given model.

The rate of change in the slope of ln L is indicated by the second derivatives, which are contained in the Hessian matrix ∂²ln L/∂θ∂θ′. For example, with two parameters θ = (α, β)′, the Hessian is

    ∂²ln L/∂θ∂θ′ = | ∂²ln L/∂α∂α   ∂²ln L/∂α∂β |
                   | ∂²ln L/∂β∂α   ∂²ln L/∂β∂β |

If ∂²ln L/∂α∂α is large relative to ∂²ln L/∂β∂β, the gradient is changing more rapidly as α changes than as β changes. Thus, smaller adjustments to the estimate of α would be indicated. The Newton-Raphson algorithm proceeds according to the equation:

    θ_{n+1} = θ_n − (∂²ln L/∂θ_n∂θ_n′)⁻¹ ∂ln L/∂θ_n

(Why is it reasonable to use the inverse of the Hessian?)

The Method of Scoring. In some cases, the expectation of the Hessian, known as the information matrix, can be easier to compute than the Hessian. The method of scoring uses the information matrix as the direction matrix, which results in

    θ_{n+1} = θ_n + (−E[∂²ln L/∂θ_n∂θ_n′])⁻¹ ∂ln L/∂θ_n

Computing the Hessian and the information matrix can be computationally demanding. Berndt et al. (1974) propose using an outer product approximation to the information matrix:

    −E[∂²ln L/∂θ∂θ′] ≈ Σ from i=1 to N of (∂ln L_i/∂θ)(∂ln L_i/∂θ)′

where ln L_i is the value of the likelihood function evaluated for the ith observation. This approximation is often simpler to compute since only the gradient needs to be evaluated. Iterations proceed according to

    θ_{n+1} = θ_n + [ Σ from i=1 to N of (∂ln L_i/∂θ_n)(∂ln L_i/∂θ_n)′ ]⁻¹ ∂ln L/∂θ_n

which is known as the BHHH (pronounced "B-triple-H") algorithm or the modified method of scoring.

Numerical Derivatives. If you cannot obtain an algebraic solution for the gradient or the Hessian, numerical methods can be used to estimate them. For example, consider a log likelihood based on a single parameter θ. The gradient is approximated by computing the slope of the change in ln L when θ changes by a small amount. If Δ is a small number relative to θ,

    ∂ln L/∂θ ≈ [ln L(θ + Δ) − ln L(θ)] / Δ

Using numerical estimates can greatly increase the time and number of iterations needed, and results can be sensitive to the choice of Δ. Further, different start values can result in different estimates of the Hessian at convergence, which translates into different estimates of the standard errors. Programs that use numerical methods for computing derivatives should only be used if no alternatives are available. When they must be used, you should experiment with different starting values to make sure that the estimates that you obtain are stable.

3.6.2. The Variance of the ML Estimator

In addition to estimating the parameters θ, numerical methods provide estimates of the asymptotic covariance matrix Var(θ̂), which are used for the statistical tests in Chapter 4. The theory of maximum likelihood shows that if the assumptions justifying ML estimation hold, then the asymptotic covariance matrix equals

    Var(θ̂) = (−E[∂²ln L/∂θ∂θ′])⁻¹   [3.10]

In words, the asymptotic covariance equals the inverse of the negative of the expected value of the Hessian, known as the information matrix.
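Newton-Raphson is easy to sketch for a one-parameter case. For an intercept-only logit, the gradient and Hessian have closed forms, and the ML estimate must reproduce the sample proportion, which gives a built-in check. A Python sketch with made-up data (not from the text):

```python
import math

def sigma(b):
    """Logistic cdf of the intercept b."""
    return 1 / (1 + math.exp(-b))

# Intercept-only logit: ln L(b) = sum_i [y_i ln s(b) + (1 - y_i) ln(1 - s(b))]
y = [0, 0, 0, 1, 1]                        # made-up data: mean(y) = 0.4
n, n1 = len(y), sum(y)

b = 0.0                                    # start value
for _ in range(25):
    grad = n1 - n * sigma(b)               # d ln L / d b
    hess = -n * sigma(b) * (1 - sigma(b))  # d^2 ln L / d b^2 (always negative)
    step = -grad / hess                    # Newton-Raphson adjustment
    b += step
    if abs(step) < 1e-12:                  # convergence criterion
        break

# The ML estimate reproduces the sample proportion: s(b_hat) = mean(y)
print(round(sigma(b), 6))                  # → 0.4
```

Because the log likelihood is globally concave here, the quadratically convergent Newton-Raphson steps reach machine precision in a handful of iterations.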
The covariance matrix is often written in an equivalent form using the outer product of the gradients:

    Var(θ̂) = (E[(∂ln L/∂θ)(∂ln L/∂θ)′])⁻¹   [3.11]

In both cases, the expectation is evaluated at θ. Since we only have an estimate of θ, the covariance matrix must be estimated. Three consistent estimators are commonly used.

The first evaluates Equation 3.10 using the ML estimates θ̂:

    Var(θ̂) = (−E[∂²ln L/∂θ̂∂θ̂′])⁻¹

This is easily used with the method of scoring since that algorithm computes the information matrix at each iteration.

The second estimator is obtained by evaluating the negative of the Hessian at θ̂, and is referred to as the observed information matrix rather than the expected information matrix:

    Var(θ̂) = (−Σ from i=1 to N of ∂²ln L_i/∂θ̂∂θ̂′)⁻¹   [3.12]

This estimator is easily used with the Newton-Raphson algorithm. Equation 3.12 shows the link between the curvature of the likelihood function and the variance of the estimator. The size of the variance is inversely related to the second derivative: the smaller the second derivative, the larger the variance. When the second derivative is smaller, the likelihood function is flatter. If the likelihood function is very flat, the variance will be large. This should match your intuition that the flatter the likelihood function, the harder it will be to find the maximum of the likelihood, and the less confidence (i.e., the more variance) you should have in the solution you obtain.

A third estimator, which is related to the BHHH algorithm, is simple to compute since it does not require evaluation of the second derivatives:

    Var(θ̂) = ( Σ from i=1 to N of (∂ln L_i/∂θ̂)(∂ln L_i/∂θ̂)′ )⁻¹

While these estimators of the covariance matrix are asymptotically equivalent, they sometimes provide very different estimates when the sample is small or the data are ill conditioned. Consequently, if you estimate the same model with the same data using two programs that use different estimators, you can get different results.

3.6.3. Problems With Numerical Methods and Possible Solutions

While numerical methods generally work well, there can be problems. First, it may be difficult or impossible to reach convergence. You might get an error such as "Convergence not obtained after 250 iterations." Or, it might not be possible to invert the Hessian when ln L is nearly flat. This generates a message such as "Singularity encountered," "Hessian could not be inverted," or "Hessian was not of full rank." The message might refer to the covariance matrix or the information matrix. Second, sometimes convergence occurs, but the wrong solution is obtained. This occurs when ln L has more than one location where the gradient is 0. The iterative process might locate a saddle point or local maximum, where the gradient is also 0, rather than the global maximum. (Think of a two-humped Bactrian camel. The top of the smaller hump is a local maximum; the low spot between the two humps is a saddle point.) In such cases, the covariance matrix, which should be positive definite, is negative definite. When ln L is globally concave, there is only one solution, and that is a maximum. This is the case for most of the models considered in this book. However, even when the log likelihood is globally concave, it is possible to have false convergence. This can occur when the function is very flat and the precision of the estimates of the gradient is insufficient. This is common when numerical gradients are used and can also be caused by problems with scaling (discussed below). Finally, in some cases, ML estimates do not exist for a particular pattern of data. For example, with a binary outcome and a single binary independent variable, ML estimates are not possible if there is no variation in the independent variable for one of the outcomes. You can try estimating a probit model using: y′ = (0 0 1 1 1) and x′ = (1 0 1 1 0). This works fine, since there are x's equal to 0 and 1 for both y = 1 and y = 0. However, now try to estimate the model for: y′ = (0 0 1 1) and x′ = (1 0 1 1). Your program will "crash" since whenever y = 1, all x's are 1's.

When you cannot get a solution or appear to get the wrong solution, the first thing to check is that the software is estimating the model that you want to estimate. It is easy to make an error in specifying the commands to estimate your model. If the model and commands are correct, there may be problems with the data.

Incorrect variables. Most simply, you may have constructed a variable incorrectly. Be sure to check the descriptive statistics for all variables. My experience suggests that most problems with numerical methods are due to data that have not been "cleaned."
Number of observations. Convergence generally occurs more rapidly when there are more observations and when the ratio of the number of observations to the number of variables is larger. While there is generally little you can do about sample size, it can explain why you are having trouble getting your models to converge.

Scaling of variables. Scaling is a very common cause of problems with numerical methods. The larger the ratio between the largest standard deviation and the smallest standard deviation, the more problems you will have with numerical methods. For example, if you have income measured in dollars, it may have a very large standard deviation relative to other variables. Recoding income to thousands of dollars may solve the problem. My experience suggests that problems are much more likely when the ratio between the largest and smallest standard deviation exceeds 10.

Distribution of outcomes. If a large proportion of cases are censored in the tobit model or if one of the categories of a categorical variable has very few cases, convergence may be difficult. There is little that can be done with such data limitations.

Overall, numerical methods for ML estimation tend to work well when your model is appropriate for your data. In such cases, convergence generally occurs, often within five iterations. If you have too few observations or a poor model, convergence may be a problem. In such cases, adjusting your data can solve the problem. If that fails, you can try using a program that uses a different numerical method. A problem that may be very difficult for one algorithm may work well for another.

While numerical methods generally work well, I heartily endorse Cramer's (1986, p. 10) advice: Check the data, check their transfer into the computer, check the actual computations (preferably by repeating them with a rival program), and always remain suspicious of the results, regardless of their appeal.

3.6.4. Software Issues

There are several issues related to software for logit and probit that should be noted.

The Method of Maximization. Different programs use different methods of numerical maximization. In most cases, estimates of the parameters from the various programs are identical to at least four decimal digits. Estimates of the standard errors and the z-values may differ at the first decimal digit as a result of the different methods used to estimate Var(β̂).

Parameterizations of the Model. A more basic difference is found in the outcome being modeled. While most programs model the probability of a 1, some programs (e.g., SAS) model the probability of a 0. This is a trivial difference if you are aware of what the program is doing. For the BRM,

    Pr(y_i = 0 | x_i) = 1 − Pr(y_i = 1 | x_i) = 1 − F(x_i β) = F(−x_i β)

where the last equality follows from the symmetry of the pdf for the logit and probit models. Thus, all coefficients will have the opposite sign. Note that this will not be the case for the complementary log-log model since it is asymmetric.

With estimates in hand, we can consider the interpretation of the binary response model.

3.7. Interpretation

In this section, I present four methods of interpretation, each of which is generalized to other models in later chapters. First, I show how to present predicted probabilities using graphs and tables. Second, I examine the partial change in y* and in the probability. Third, I use discrete change in the probability to summarize the effects of each variable. Finally, for the logit model, I derive a simple transformation of the parameters that indicates the effect of a variable on the odds that the event occurred.

Since the BRM is nonlinear, no single approach to interpretation can fully describe the relationship between a variable and the outcome probability. You should search for an elegant and concise way to summarize the results that does justice to the complexities of the nonlinear model. For any given application, you may need to try each method before a final approach is determined. For example, you might have to construct a plot of the predicted probabilities before realizing that a single measure of discrete change is sufficient to summarize the effect of a variable. I illustrate these methods with the data on the labor force participation of women. You should be able to replicate many of the results using Tables 3.1 and 3.3, although your answers may differ slightly due to rounding error.
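The sign reversal noted under Parameterizations of the Model follows from symmetry and can be checked numerically. A quick Python sketch (not from the text), which also confirms that the identity fails for the asymmetric complementary log-log cdf:

```python
import math

def logit_cdf(xb):
    return 1 / (1 + math.exp(-xb))

def probit_cdf(xb):
    return 0.5 * (1 + math.erf(xb / math.sqrt(2)))

def cloglog_cdf(xb):
    return 1 - math.exp(-math.exp(xb))

# For symmetric error distributions, 1 - F(xb) = F(-xb): modeling Pr(y = 0)
# simply reverses the sign of every coefficient.
for xb in (-1.5, 0.0, 0.7, 2.0):
    assert abs((1 - logit_cdf(xb)) - logit_cdf(-xb)) < 1e-12
    assert abs((1 - probit_cdf(xb)) - probit_cdf(-xb)) < 1e-12

# The complementary log-log cdf is asymmetric, so the identity fails:
print(abs((1 - cloglog_cdf(0.7)) - cloglog_cdf(-0.7)) > 0.01)  # → True
```

This is why flipping which outcome is modeled is harmless for logit and probit but changes the fit of a complementary log-log model.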
This section examines how the intercept and the slope affect the curve relating a variable to the probability of an event. Understanding how the parameters affect the probability curves is fundamental to each method of interpretation.

3.7.1. The Effects of the Parameters

Consider the BRM with a single x:

    Pr(y = 1 | x) = F(α + βx)

Panel A of Figure 3.8 shows the effect of the intercept on the probability curve. When α = 0, shown by the short dashed line, the curve passes through the point (0, .5). As α gets larger, the curve shifts to the left; as α gets smaller, the curve shifts to the right. (Why does the curve shift in these directions?) When the curve shifts, the slope at a given value of Pr(y = 1 | x) does not change. This idea of shifting, "parallel" curves is used to develop several of the methods presented below. It is also fundamental to the ordinal regression model in Chapter 5.

Panel B of Figure 3.8 shows the effects of changing the slope. Since α = 0, the curves go through the point (0, .5). The smaller the β, the more stretched out the curve. At β = .25, shown by the solid line, the curve increases gradually as it moves from −20 to 20. When β increases to .5, shown by the long dashed line, the curve initially increases more slowly; after x = 0, the increase is more rapid. In general, as β increases, the curve increases more rapidly as x approaches 0. While I have not drawn the curves, when the slope is negative, the curve is rotated 180° around x = 0. For example, if β were negative, the curve would be near 1 at x = −20 and would decrease toward 0 at x = 20.

[Figure 3.8. Effects of Changing the Slope and Intercept on the Binary Response Model: Pr(y = 1 | x) = F(α + βx). Panel A: Effects of Changing α; Panel B: Effects of Changing β]

This idea is also needed to understand how the probability curve generalizes to more than one variable. Figure 3.9 plots the probit model:

    Pr(y = 1 | x, z) = Φ(1 + 1x + .75z)

Similar results hold for the logit model. The surface begins near zero when x = −4 and z = −8. If we fix z = −8, then

    Pr(y = 1 | x, z = −8) = Φ(1 + 1x + [.75 × −8]) = Φ(−5.0 + 1x)

which defines a probability curve along the x-axis. If we increase z by 1, which corresponds to the next curve back along the z-axis, then

    Pr(y = 1 | x, z = −7) = Φ(1 + 1x + [.75 × −7]) = Φ(−4.25 + 1x)

The increase in z causes the curve to shift to the left (see the increase in the intercept from −5.0 to −4.25). The level of z affects the intercept of the curve, but does not affect the slope. Conversely, controlling for x affects the intercept of the curve for z, but not the slope. With these ideas in mind, we can consider specific methods for interpreting the binary response model.
REGRESSION MODELS BinG/y Outcomes 65

Figure 3.9. Plot of Probit Model: Pr(y = 1 | x, z) = Φ(1.0 + 1.0x + 0.75z)

3.7.2. Interpretation Using Predicted Probabilities

The most direct approach for interpretation is to examine the predicted probabilities of an event for different values of the independent variables. When there are more than two independent variables, it is no longer possible to plot the entire probability surface, and a decision must be made about which probabilities to compute and how to present them. A useful first step is to examine the range of predicted probabilities within the sample and the degree to which each variable affects the probabilities. If the range of probabilities is between .2 and .8 (or, more conservatively, between .3 and .7), the relationship between the x's and the probabilities is approximately linear, and simple measures can be used to summarize the results. Or, if the range of the probability is small, the relationship between the x's and the probability will also be approximately linear. For example, the segment of the probability curve between .05 and .10 is nearly linear. These points are illustrated below.

The predicted probability of an event given x for the ith individual is

Pr(y = 1 | x_i) = F(x_i β̂)

The minimum and maximum probabilities in the sample are defined as

min Pr(y = 1 | x) = min_i F(x_i β̂)
max Pr(y = 1 | x) = max_i F(x_i β̂)

where min_i indicates taking the minimum value over all observations, and similarly for max_i. In our example, the predicted probabilities from the probit model range from .01 to .97, which indicates that the nonlinearities that occur below .2 and above .8 need to be taken into account. If the coefficients from the logit model are used, the predicted probabilities range from .01 to .96. This illustrates the similarity between the predictions of the logit and probit models, even for observations that fall in the tail of the distribution. Consequently, in the remainder of this section, only the results from the probit analysis are shown.

Computing the minimum and maximum predicted probabilities requires your software to save each observation's predicted probability for further analysis. If this is not possible, or if you are doing a meta-analysis, the minimum and maximum can be approximated by using the estimated β's and the descriptive statistics. The lower extreme of the variables is defined by setting each variable associated with a positive β to its minimum and each variable associated with a negative β to its maximum. In our example, this involves taking the maximum number of young children (since K5 has a negative effect), the minimum anticipated wage (since LWG has a positive effect), and so on. Formally, let

x̲_k = min_i x_ik if β_k > 0, and max_i x_ik otherwise

and let x̲ be the vector whose kth element is x̲_k. The upper extreme can be defined in a corresponding way, with the values contained in x̄. The minimum and maximum probabilities are computed as

Pr(y = 1 | x̲) = F(x̲ β̂) and Pr(y = 1 | x̄) = F(x̄ β̂)

In our example, the computed probability at the lower extreme is less than .01 and at the upper extreme is .99. While these values are quite close to the minimum and maximum predicted probabilities for the sample, x̲ and x̄ are constructs that do not necessarily approximate any member of the sample. If they differ substantially from any x_i in the sample, then Pr(y = 1 | x̲) and Pr(y = 1 | x̄) will be poor approximations of the probabilities min Pr(y = 1 | x) and max Pr(y = 1 | x).

Warning on the Use of Minimums and Maximums. The use of the minimum or maximum value of a variable can be misleading if there are extreme values in the sample. For example, if our sample includes an extremely wealthy person, the change in the probability when we move

from the minimum to the maximum income would be unrealistically large. Before using the minimum and maximum, you should examine the distribution of each variable. If extreme values are present, you should consider using the 5th percentile and the 95th percentile, for example, rather than the minimum and maximum.

The Effect of Each Variable on the Predicted Probability

The next step is to determine the extent to which change in a variable affects the probability. One way to do this is to allow one variable to vary from its minimum to its maximum, while all other variables are fixed at their means. Let Pr(y = 1 | x̄, x_k) be the probability computed when all variables except x_k are set equal to their means, and x_k equals some value. For example, Pr(y = 1 | x̄, min x_k) is the probability when x_k equals its minimum. The predicted change in the probability as x_k changes from its minimum to its maximum equals

ΔPr(y = 1 | x̄)/Δx_k = Pr(y = 1 | x̄, max x_k) - Pr(y = 1 | x̄, min x_k)

For our example, the results are given in Table 3.4. The range of predicted probabilities can be used to guide further analysis. For example, there is little to be learned by analyzing variables whose range of probabilities is small, such as HC. For variables that have a larger range, the ends of the range affect how interpretation should proceed. For example, the probabilities for AGE range from .75 when age is 30 to .32 when age is 60, which is a region where the probability curve is nearly linear. The range for INC, however, is from .09 to .73, where nonlinearities are present. The implications of these differences are shown in the next section.

TABLE 3.4 Probabilities of Labor Force Participation Over the Range of Each Independent Variable for the Probit Model

Variable   At Minimum   At Maximum   Range
K5         0.66         0.01         0.64
K618       0.60         0.48         0.12
AGE        0.75         0.32         0.43
WC         0.52         0.70         0.18
HC         0.57         0.59         0.02
LWG        0.17         0.83         0.66
INC        0.73         0.09         0.64

Plotting Probabilities Over the Range of a Variable

When there are more than two independent variables, we must examine the effects of one or two variables while the remaining variables are held constant. For example, consider the effects of age and the wife attending college on labor force participation. The effects of both variables can be plotted by holding all other variables at their means and allowing age and college status to vary. To do this, let x_0 contain the mean of all variables, except let WC = 0 and allow AGE to vary. x_1 is defined similarly for WC = 1. Then

Pr(LFP = 1 | AGE, WC = 0) = Φ(x_0 β̂)

is the predicted probability of being in the labor force for women of a given age who did not attend college and who are average on all other characteristics. Pr(LFP = 1 | AGE, WC = 1) can be computed similarly. These probabilities are plotted in Figure 3.10. As suggested by Table 3.4, the relationship between age and the probability of being employed is approximately linear. This allows a very simple interpretation:

• Attending college increases the probability of being employed by about .18 for women of all ages, holding all other variables at their means.

• For each additional 10 years of age, the probability of being employed decreases by about .13, holding all other variables at their means.

Figure 3.10. Probability of Labor Force Participation by Age and Wife's Education
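The range measure behind Table 3.4 takes only a few lines to compute. The sketch below uses an invented coefficient vector and a tiny invented data set, not the book's sample; it only demonstrates the mechanics of varying one variable while holding the others at their means.

```python
# Sketch of the range-of-probability measure: vary one variable from its
# minimum to its maximum while holding the others at their sample means.
# The coefficients and the tiny data set are invented for illustration.
import math

def probit_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

beta0, beta = 0.2, [0.8, -0.6]                  # intercept and slopes
sample = [(0.0, 1.0), (1.0, 2.0), (2.0, 4.0), (3.0, 1.0)]
cols = list(zip(*sample))
means = [sum(c) / len(c) for c in cols]

def prob_with(k, value):
    """Pr(y=1) with variable k set to `value`, the rest at their means."""
    x = list(means)
    x[k] = value
    return probit_cdf(beta0 + sum(b * v for b, v in zip(beta, x)))

# One row per variable: probability at its min, at its max, and the range.
for k, c in enumerate(cols):
    p_min, p_max = prob_with(k, min(c)), prob_with(k, max(c))
    print(k, round(p_min, 2), round(p_max, 2), round(p_max - p_min, 2))
```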

The effect of age was computed by subtracting the predicted probability at age 30 (.85) from that at age 60 (.46) and dividing by 3 (for three changes of 10 years). It would also be appropriate to use the marginal effect at the mean, which is discussed in Section 3.7.4.

The relationship between age and the probability of working was essentially linear, and the plot was superfluous. In other cases, plotting is very useful. Consider the effects of income and age. While we could hold all other variables at their means and draw a three-dimensional plot, it is often more effective to divide one of the variables into groups and plot the results in two dimensions. Figure 3.11 shows the probability of employment as income changes for women aged 30, 40, 50, and 60. The nonlinearities are apparent, with the effect of income decreasing with age. When relationships are nonlinear, plots are often useful for uncovering the relationships, even if they are not used to present the final results.

Figure 3.11. Probability of Labor Force Participation by Age and Family Income for Women Without Some College Education

Tables of Probabilities at Selected Values

You can also use tables to present predicted probabilities. For example, the effects of young children and the wife's education on the probability of employment are shown in Table 3.5.

TABLE 3.5 Probability of Employment by College Attendance and the Number of Young Children for the Probit Model

Number of                    Predicted Probability
Young Children   Did Not Attend   Attended College   Difference
0                0.61             0.78               0.17
1                0.27             0.45               0.18
2                0.07             0.16               0.09
3                0.01             0.03               0.02

The strong, nonlinear effect of young children is clearly evident. It also shows that the effect of attending college decreases as the number of children increases. (The difference in the probability for those attending and not attending college first increases and then decreases. Draw the probability curves that produce this result.)

Another strategy for presenting probabilities is to define combinations of characteristics that correspond to ideal types in the population. For example, in his study of factors that affected the retention of workers by their employer after training programs, Gunderson (1974) defined five "hypothetical trainees" based on combinations of the independent variables: typical, disadvantaged, advantaged, housewife, and teenage entrant. Predicted probabilities of being retained were computed for each hypothetical person. In some situations, this can quickly and convincingly summarize the effects of key variables.

3.7.3. The Partial Change in y*

Measures of partial change can also be used to summarize the effects of each independent variable on the probability of an event occurring. Recall that the logit and probit models are linear in the latent variable:

y* = xβ + ε

Taking the partial derivative with respect to x_k,

∂y*/∂x_k = β_k

Since the model is linear in y*, the partial derivative can be interpreted as:

• For a unit change in x_k, y* is expected to change by β_k units, holding all other variables constant.
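Both the constant effect on y* and the rescaling by the standard deviation of y* (the basis of the standardized coefficients in Table 3.6) can be checked numerically. In the sketch below, only the K5 probit estimate (-0.875) and Var(y*) = 1.328 are taken from Table 3.6; the starting values of xβ are arbitrary.

```python
# The model is linear in y*: a unit change in xk moves x*beta by exactly bk,
# wherever you start. The K5 coefficient (-0.875) and Var(y*) = 1.328 are
# the values reported in Table 3.6; everything else is generic.
import math

b_k5 = -0.875
var_ystar = 1.328                       # beta' Var(x) beta + Var(e), probit

for xb in (-2.0, 0.0, 2.0):             # arbitrary starting values of x*beta
    d_latent = (xb + b_k5) - xb         # always equals b_k5
    print(xb, round(d_latent, 3))

# y*-standardized coefficient: beta / sd(y*); reproduces the tabled -0.759.
print(round(b_k5 / math.sqrt(var_ystar), 3))
```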

The problem with this interpretation is that the variance of y* is unknown, so the meaning of a change of β_k in y* is unclear. This issue is discussed by Winship and Mare (1984, p. 517) and McKelvey and Zavoina (1975, pp. 114-116) with respect to the ordinal regression model, but their concerns apply equally to the BRM. Since the variance of y* changes when variables are added to the model, the magnitudes of the β's will change even if the added variable is uncorrelated with the original variables. This makes it misleading to compare coefficients from equations with different sets of independent variables. (Why is this not a problem with the LRM?) To compare coefficients across equations, McKelvey and Zavoina proposed fully standardized coefficients, while Winship and Mare proposed y*-standardized coefficients. If σ_y* is the unconditional standard deviation of y*, then the y*-standardized coefficient for x_k is

β_k^{Sy*} = β_k / σ_y*

which can be interpreted as:

• For a unit increase in x_k, y* is expected to increase by β_k^{Sy*} standard deviations, holding all other variables constant.

y*-standardized coefficients indicate the effect of an independent variable in its original unit of measurement. This is sometimes preferable for substantive reasons and is essential for binary independent variables. Fully standardized coefficients also standardize the independent variable. If σ_k is the standard deviation of x_k, then the fully standardized coefficient for x_k is

β_k^{S} = σ_k β_k / σ_y* = σ_k β_k^{Sy*}

which can be interpreted as:

• For a standard deviation increase in x_k, y* is expected to increase by β_k^{S} standard deviations, holding all other variables constant.

To estimate these coefficients, we need estimates of β_k, σ_k, and σ_y*. The σ_k's can be computed directly from the observed data. Since y* = xβ + ε, and x and ε are uncorrelated, σ²_y* can be estimated with the quadratic form:

Var(y*) = β̂' Var(x) β̂ + Var(ε)

where Var(x) is the covariance matrix for the x's computed from the observed data; β̂ contains the ML estimates; and Var(ε) = 1 in the probit model and Var(ε) = π²/3 in the logit model.

TABLE 3.6 Standardized and Unstandardized Probit Coefficients for Labor Force Participation

Variable   β        β^{Sy*}   β^{S}    z
K5         -0.875   -0.759    -0.398
K618       -0.039   -0.033    -0.044   -0.95
AGE        -0.038   -0.033    -0.265
WC         0.488    0.424     0.191    3.60
HC         0.057    0.050     0.024    0.46
LWG        0.366    0.317     0.186    4.17
INC        -0.021   -0.018    -0.207   -4.30
Var(y*)    1.328

NOTE: N = 753. β is an unstandardized coefficient; β^{Sy*} is a y*-standardized coefficient; β^{S} is a fully standardized coefficient; z is the z-test.

If you accept the notion that it is meaningful to discuss the latent propensity to work, the fully standardized and y*-standardized coefficients in Table 3.6 can be interpreted just as their counterparts for the LRM.³ For example,

• Each additional young child decreases the mother's propensity to enter the labor market by .76 standard deviations, holding all other variables constant.

• A standard deviation increase in age decreases a woman's propensity to enter the labor market by .27 standard deviations, holding all other variables constant.

³ If you try to reproduce the standardized coefficients in Table 3.6 using the descriptive statistics from Table 3.1, your answers will only match to the first decimal digit due to rounding.

3.7.4. The Partial Change in Pr(y = 1 | x)

The β's can also be used to compute the partial change in the probability of an event. Let

Pr(y = 1 | x) = F(xβ)   [3.13]

where F is either the cdf Φ for the normal distribution or the cdf Λ for the logistic distribution. The corresponding pdf is indicated as f. The partial change in the probability, also called the marginal effect, is
the partial derivative of Equation 3.13 with respect to x_k:

∂Pr(y = 1 | x)/∂x_k = ∂F(xβ)/∂x_k = [dF(xβ)/dxβ][∂xβ/∂x_k] = f(xβ) β_k   [3.14]

For the probit model,

∂Pr(y = 1 | x)/∂x_k = φ(xβ) β_k

and for the logit model,

∂Pr(y = 1 | x)/∂x_k = λ(xβ) β_k = exp(xβ) β_k / [1 + exp(xβ)]² = Pr(y = 1 | x)[1 - Pr(y = 1 | x)] β_k

The marginal effect is the slope of the probability curve relating x_k to Pr(y = 1 | x), holding all other variables constant. The sign of the marginal effect is the same as the sign of β_k, since f(xβ) is always positive. The magnitude of the effect depends on the magnitude of β_k and the value of xβ. This is shown in Figure 3.12, where the solid line graphs Pr(y = 1 | x) and the dashed line graphs the marginal effect. The marginal effect is largest at xβ = 0, which corresponds to Pr(y = 1 | x) = .5. The marginal effect is symmetric around this point, reflecting the symmetry of f.

Figure 3.12. Marginal Effect in the Binary Response Model

The value of the marginal effect depends on the values of the other variables and their coefficients, since f is computed at xβ. Consequently, the marginal effect of x_k depends on the β's for all variables and the levels of all x's. To show how the value of the marginal effect of x_k depends on the level of other variables, consider Figure 3.9, which plots Pr(y = 1 | x, z) for values of x and z. Pick a point (x, z), which corresponds to the intersection of lines within the figure. The partial ∂Pr(y = 1 | x, z)/∂x is the slope of the line parallel to the x-axis at the point (x, z); ∂Pr(y = 1 | x, z)/∂z is the slope of the line parallel to the z-axis at the point (x, z). For example, at (-4, -8), the slope with respect to x is nearly 0. As z increases, the slope with respect to x increases steadily. At (-4, 0), where Pr(y = 1 | x, z) is about .5, the slope is near its maximum. As z continues to increase, the slope gradually decreases. Hanushek and Jackson (1977, p. 189) show this relationship by taking the second derivative:

∂²Pr(y = 1 | x)/∂x_k ∂x_ℓ = β_k β_ℓ Pr(y = 1 | x)[1 - Pr(y = 1 | x)][1 - 2Pr(y = 1 | x)]

The β's can also be used to assess the relative magnitudes of the marginal effects for two variables. From Equation 3.14, the ratio of marginal effects for x_k and x_ℓ is

[∂Pr(y = 1 | x)/∂x_k] / [∂Pr(y = 1 | x)/∂x_ℓ] = f(xβ)β_k / f(xβ)β_ℓ = β_k / β_ℓ

Thus, while the β's are only identified up to a scale factor, their ratio is identified and can be used to compare the effects of independent variables.
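Equation 3.14 and the cancellation in the ratio can be sketched in a few lines. The slopes and the value of xβ below are invented for illustration; only the functional forms of the probit and logit marginal effects come from the text.

```python
# Sketch of Equation 3.14: the marginal effect f(xb)*bk for the probit
# (normal pdf) and logit (P*(1-P)) models, and the scale-free ratio bk/bl.
import math

def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def logistic_p(t):
    return 1.0 / (1.0 + math.exp(-t))

beta = (0.8, -0.4)                       # hypothetical slopes
xb = 0.3                                 # x*beta at the chosen values of x

probit_me = [normal_pdf(xb) * b for b in beta]
p = logistic_p(xb)
logit_me = [p * (1.0 - p) * b for b in beta]
print([round(m, 4) for m in probit_me], [round(m, 4) for m in logit_me])

# f(xb) cancels in the ratio, so only the betas matter (identified ratio).
print(round(probit_me[0] / probit_me[1], 6), beta[0] / beta[1])
```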
Since the value of the marginal effect depends on the levels of all variables, we must decide on which values of the variables to use when computing the effect. One method is to compute the average over all observations:

mean ∂Pr(y = 1 | x)/∂x_k = (1/N) Σ_{i=1}^{N} f(x_i β) β_k

A second method is to compute the marginal effect at the mean of the independent variables:

∂Pr(y = 1 | x̄)/∂x_k = f(x̄β) β_k

The marginal effect at the mean is a popular summary measure for models with continuous variables. It is frequently included in tables of results and is automatically computed by some programs. However, the measure is limited. First, given the nonlinearity of the model, it is difficult to translate the marginal effect into the change in the probability that will occur if there is a discrete change in x_k. Second, the mean may not correspond to any observed values in the sample, so the average over observations might be preferred. Finally, it is unclear whether the measure is appropriate for binary independent variables. For these reasons, I much prefer the measures of discrete change that are discussed in Section 3.7.5.

TABLE 3.7 Marginal Effects for the Probit Model of Labor Force Participation

Table 3.7 contains marginal effects for our example of labor force participation. Two things should be noted. First, the marginal effects averaged over all observations are close to the marginal effects computed when all variables are held at their means. They are close since the predicted probability overall is near .5 in the sample. In general, these two measures of change can be quite different. Second, the marginal effect at the mean for AGE approximates the slope of the lines in Figure 3.10. If an independent variable varies over a region of the probability curve that is nearly linear, the marginal effect can be used to summarize the effect of a unit change in the variable on the probability of an event. However, if the range of an independent variable corresponds to a region of the probability curve that is nonlinear, the marginal effect cannot be used to assess the overall effect of the variable.

3.7.5. Discrete Change in Pr(y = 1 | x)

The change in the predicted probabilities for a discrete change in an independent variable is an alternative to the marginal effect that I find more effective for interpreting the BRM (as well as other models for categorical outcomes). Let Pr(y = 1 | x, x_k) be the probability of an event given x, noting, in particular, the value of x_k. Thus, Pr(y = 1 | x, x_k + δ) is the probability with x_k increased by δ, while the other variables are unchanged. The discrete change in the probability for a change of δ in x_k equals

ΔPr(y = 1 | x)/Δx_k = Pr(y = 1 | x, x_k + δ) - Pr(y = 1 | x, x_k)

The discrete change can be interpreted as:

• For a change in the variable x_k from x_k to x_k + δ, the predicted probability of an event changes by ΔPr(y = 1 | x)/Δx_k, holding all other variables constant.

When interpreting the results of the BRM, it is essential to understand that the partial change does not equal the discrete change:

∂Pr(y = 1 | x)/∂x_k ≠ ΔPr(y = 1 | x)/Δx_k

except in the limit as δ becomes infinitely small (which is, by definition, the partial change). The difference between these two measures is shown in Figure 3.13, which plots a segment of the probability curve. The partial change is the tangent at x_1, and its value corresponds to the solid triangle. For simplicity, assume that δ = 1. The discrete change measures the change in the probability computed at x_1 and x_1 + 1. This is represented by the triangle formed of dashed lines. The discrete and partial changes are not equal since the rate of change in the curve changes

as x changes. While the two measures are not equal, if the change in x_k occurs over a region of the probability curve that is roughly linear, the two measures will be similar. This is the case for the example in Figure 3.10.

Figure 3.13. Partial Versus Discrete Change in Nonlinear Models

The amount of discrete change in the probability for a change in x_k depends on: (1) the amount of change in x_k; (2) the starting value of x_k; and (3) the values of all other variables. For example, if we have two variables x_1 and x_2, the change in Pr(y = 1 | x) when x_1 changes from 1 to 2 does not necessarily equal the change when x_1 goes from 2 to 3. (The changes would be approximately equal if both occurred in the nearly linear region around Pr(y = 1 | x) = .5.) Moreover, the change when x_1 changes from 1 to 2 with x_2 = 1 does not equal the change when x_2 = 2. Thus, the practical questions are which values of the variables to consider and how much change in a variable to examine.

Choosing Values of the Independent Variables

Since the change in the probability for a given change in x_k depends on the levels of all of the x's, we must decide at which values of the variables to compute the change. A common approach is to assess the change for an "average" member of the sample. For example, we could hold all values at their means. If the independent variables are highly skewed, the mean may be misleading, and holding values at the median would be more useful.

Dummy variables require special consideration. If x_d is a dummy variable, x̄_d is the proportion of the sample with x_d = 1. The predicted probability at x̄_d is between the predicted probability at x_d = 1 and x_d = 0. Alternatively, you could compute the predicted probability for each combination of the dummy variables, with the other variables held at their means. In our labor force example, this would require four base probabilities: husband and wife attending college; only the husband attending; only the wife attending; and neither attending. Alternatively, dummy variables could be held at the modal value for each variable. If there is a combination of the independent variables that is of particular substantive interest, those values could be used as a baseline. For example, if you were interested in the effects of education on labor force participation for young women without children, you could hold AGE at 30, K5 at 0, K618 at 0, and all other variables at their means. In the following examples, I hold all variables at their means.

Amounts of Change in the Independent Variables

Discrete change can be computed for any amount of change in an independent variable, holding all other variables at some fixed value. The amount of change that you allow for an independent variable depends on the type of variable and your purpose. Here are some useful options.

A Unit Change in x_k. If x_k increases from x̄_k to x̄_k + 1,

ΔPr(y = 1 | x̄)/Δx_k = Pr(y = 1 | x̄, x̄_k + 1) - Pr(y = 1 | x̄, x̄_k)

By examining the probability curves (see Figure 3.8), it is clear that a unit increase in x_k from its mean will only have the same effect as a unit decrease in x_k from its mean when Pr(y = 1 | x̄) = .5. This implies that if you have two variables such that β_k = -β_ℓ, the effect of a unit increase in x_k will not equal the effect of a unit decrease in x_ℓ. For these reasons, Kaufman (1996) suggested examining a unit increase that is centered around x̄_k. That is,

ΔPr(y = 1 | x̄)/Δx_k = Pr(y = 1 | x̄, x̄_k + 1/2) - Pr(y = 1 | x̄, x̄_k - 1/2)

The centered discrete change can be interpreted as:

• A unit change in x_k that is centered around x̄_k results in a change of ΔPr(y = 1 | x̄)/Δx_k in the predicted probability, holding all other variables at their means.
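The asymmetry that motivates centering is easy to see numerically. In this sketch the linear predictor at the means (0.7) and the slope (0.6) are invented values, chosen only so that the starting probability is away from .5.

```python
# Sketch of the centered unit change: Pr at mean + 1/2 minus Pr at mean - 1/2.
# The linear predictor at the means (0.7) and the slope (0.6) are invented.
import math

def probit_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

xb_means = 0.7                 # x*beta with every variable at its mean
bk = 0.6

up = probit_cdf(xb_means + bk) - probit_cdf(xb_means)          # +1 from mean
down = probit_cdf(xb_means) - probit_cdf(xb_means - bk)        # -1 to mean
centered = probit_cdf(xb_means + bk / 2) - probit_cdf(xb_means - bk / 2)
print(round(up, 3), round(down, 3), round(centered, 3))
```

Away from Pr = .5 the uncentered increase and decrease differ, while the centered change lies between them.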
A Standard Deviation Change in x_k. This idea can be extended to examine the effect of a standard deviation change in x_k:

ΔPr(y = 1 | x̄)/Δx_k = Pr(y = 1 | x̄, x̄_k + s_k/2) - Pr(y = 1 | x̄, x̄_k - s_k/2)

where s_k is the standard deviation of x_k.

A Change From 0 to 1 for Dummy Variables. When computing a discrete change, you must make certain that the change in the variable does not result in values that exceed the variable's range. If x_k is a dummy variable, either x̄_k + 1/2 will exceed 1 or x̄_k - 1/2 will be less than 0 (unless x̄_k = 1/2). Consequently, a preferred measure of change for dummy variables is

ΔPr(y = 1 | x̄)/Δx_k = Pr(y = 1 | x̄, x_k = 1) - Pr(y = 1 | x̄, x_k = 0)

This is the change as x_k goes from 0 to 1, holding all other variables at their means.

The idea of discrete change can be extended in many ways, depending on the application. If a change of a specific amount other than 1 or s_k is of substantive interest, such as the addition of four years of schooling, that amount of change can be used.

Labor Force Participation

Table 3.8 contains measures of discrete change for the probit model of women's labor force participation. Some of the effects can be interpreted as:

• For a woman who is average on all characteristics, an additional young child decreases the probability of employment by .33.

• A standard deviation change in age centered around the mean will decrease the probability of working by .12, holding all other variables at their means.

• If a woman attends college, her probability of being in the labor force is .18 greater than a woman who does not attend college, holding all other variables at their means.

Notice that the discrete change from 0 to 1 for WC and HC is nearly identical to the effect of a unit change. This is a consequence of the near linearity of the probability curve over the range of these variables, and will not necessarily be true in other examples.

TABLE 3.8 Discrete Change in the Probability of Employment for the Probit Model

3.8. Interpretation Using Odds Ratios

Our final method of interpretation takes advantage of the tractable form of the logit model. A simple transformation of the β's in the logit model indicates the factor change in the odds of an event occurring. There is no corresponding transformation of the parameters of the probit model.

From Equation 3.6, the logit model can be written as the log-linear model:

ln Ω(x) = xβ   [3.15]

where

Ω(x) = Pr(y = 1 | x) / Pr(y = 0 | x) = Pr(y = 1 | x) / [1 - Pr(y = 1 | x)]   [3.16]

is the odds of the event given x. ln Ω(x) is the log of the odds, known as the logit. Equation 3.15 shows that the logit model is linear in the logit. Consequently,

∂ ln Ω(x)/∂x_k = β_k

Since the model is linear, β_k can be interpreted as:

• For a unit change in x_k, we expect the logit to change by β_k, holding all other variables constant.
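The identity between the linear predictor and the log odds can be verified directly. The value of xβ below is arbitrary; converting the probability to odds and taking the log recovers it exactly.

```python
# Sketch of Equations 3.15-3.16: the logit model is linear in the log odds.
import math

def logistic_p(t):
    return 1.0 / (1.0 + math.exp(-t))

xb = 0.75                               # any value of x*beta
p = logistic_p(xb)
odds = p / (1.0 - p)                    # Equation 3.16
print(round(p, 4), round(odds, 4), round(math.log(odds), 4))
```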
This interpretation is simple since the effect of a unit change in x_k on the logit does not depend on the level of x_k or on the level of any other variable. Unfortunately, most of us do not have an intuitive understanding of what a change in the logit means. This requires another approach to interpretation.

Taking the exponential of Equation 3.15, the model can be written in terms of odds:

Ω(x) = exp(xβ) = exp(β_0 + β_1 x_1 + ... + β_K x_K) = exp(β_0) exp(β_1 x_1) ... exp(β_k x_k) ... exp(β_K x_K) = Ω(x, x_k)

The last notation makes explicit the value of x_k. To assess the effect of x_k, we want to see how Ω changes when x_k changes by some quantity δ. Most often, we consider δ = 1 or δ = s_k. If we add δ to x_k, the odds become

Ω(x, x_k + δ) = exp(β_0) exp(β_1 x_1) ... exp(β_k x_k) exp(β_k δ) ... exp(β_K x_K)

To compare the odds before and after adding δ to x_k, we take the odds ratio:

Ω(x, x_k + δ) / Ω(x, x_k) = exp(β_k δ)

Therefore, the parameters can be interpreted in terms of odds ratios:

• For a change of δ in x_k, the odds are expected to change by a factor of exp(β_k δ), holding all other variables constant.

For δ = 1, we have:

• Factor change. For a unit change in x_k, the odds are expected to change by a factor of exp(β_k), holding all other variables constant.

If exp(β_k) is greater than 1, you could say that the odds are "exp(β_k) times larger." If exp(β_k) is less than 1, you could say that the odds are "exp(β_k) times smaller." For δ = s_k, we have:

• Standardized factor change. For a standard deviation change in x_k, the odds are expected to change by a factor of exp(β_k s_k), holding all other variables constant.

Notice that the effect of a change in x_k does not depend on the level of x_k or on the level of any other variable.

We can also compute the percentage change in the odds:

100 × [Ω(x, x_k + δ) - Ω(x, x_k)] / Ω(x, x_k) = 100[exp(β_k × δ) - 1]

This quantity can be interpreted as the percentage change in the odds for a δ unit change in x_k, holding all other variables constant.

The factor change and standardized factor change coefficients for the logit model analyzing labor force participation are presented in Table 3.9. Here is how some of the coefficients can be interpreted using factor and percentage changes:

• For each additional young child, the odds of being employed are decreased by a factor of .23, holding all other variables constant. Or, equivalently, for each additional young child, the odds of being employed are decreased by 77%, holding all other variables constant.

• For a standard deviation increase in anticipated wages, the odds of being employed are 1.43 times greater, holding all other variables constant. Or, for a standard deviation increase in anticipated wages, the odds of being employed are 43% greater, holding all other variables constant.

• Being 10 years older decreases the odds by a factor of .53 (= e^(10 × -.063)), holding all other variables constant.

TABLE 3.9 Factor Change Coefficients for Labor Force Participation for the Logit Model

Variable   Logit Coefficient   Factor Change   Standard Factor Change   z-value
Constant   3.182                                                        4.94
K5         -1.463              0.232           0.465                    -7.43
K618       -0.065              0.937           0.918                    -0.95
AGE        -0.063              0.939           0.602                    -4.92
WC         0.807               2.242                                    3.51
HC         0.112               1.118                                    0.54
LWG        0.605               1.831           1.427                    4.01
INC        -0.034              0.966           0.670                    -4.20

.. The oelds ratio i8 a multiplicative coefficient which m h" 3.9. ConcIusions


Itlve" . . , e a n s t at pos-
O d 1 are greater than J, while "negative" effects are between
an . , l . and negative effects should be compared The choice between the logit and probit models is largely one of con-
"le mverse the ( .
fac tor . . or Vlce versa). For example a venience and convention, since the substantive results are indis-
f ") of 2 has
• the s ame magmtu. el e as a negative factor
' tinguishable. Chambcrs and Cox (1967) show that extremely sam-
o ..a
effect than Thus' a of . 1 1/10'md'lcates a stronger pIes are necessary to distinguish whether observations were generated
scale is that to Oft~~ AFr°ther consequence of the muItiplicative from the logit or the probit modeL The availability of software is no
e e. ect on the odds of the event not occur- longer an issue in choosing which model to use. Often the choice is a
take the of th'e efiect on the odds of the event matter of convention. Sorne research areas tend to use logit, while oth-
ers favor probit. For sorne users, the simple interpretation of logit coef-
• 10 years older makes the odds 01' not being in the labor force 1 9
ficients as odds ratios is the deciding factor. In other cases, the need to
times greater, holding all other variables constan!. .
generalize a model may be an issue. For example, multiple-equation sys-
tems involving gualitative dependent variables are based on the probit
When 1 I model, as discussed in Chapter 9. Or, if an analysis also includes egua-
. . d' . . t le oc ds ratio, it is essential to keep the follow-
m mm . A constant . I tions with a nominal dependent variable, the logit model may be pre-
a COlutant m t le odds does not correspond to ferred since the probit model for nominal dependent variables is com-
, or constan! change in the p' b bT Th'
be se en in Table 3.10. While h . . 10 a 1 zty. IS can putationally too demanding. Or, in case-control studies where sampling
of 2, the t e odds are bemg changed by a constant is stratified by the binary outcome, the logit model is required (see Hos-
do not change by a constant factor or a
mer & Lemeshow, 1989, Chapter 6, for details).
is the odds are very small, the factor change in
Many of the ideas presented in this chapter are used to develop and
th e o dd s are egual lo the factor change in the odds
th e pro ba b1T ' interpret models for ordinal and nominal variables in Chapters 5 and
I lty remams essentíally unchanged .
w len a factor change in th dd . . . 6. First, however, Chapter 4 considers hypothesis testing, methods for
ttal to know what the c t1 ¡ e o s, It IS essen- detecting outliers and influential observations, and measures of fit.
the in' urren eve of the odds is. This can be done using
then ,3.7.2 to compute the predicted probability, and
the odds to Equation 3.16.
3.10. Biblíographic Notes

The very early history of these models begins in the 1860s and is discussed by Finney (1971, pp. 38-41). The more recent history of the probit
model involves attempts to model the effects of toxins on insects. Work
by Gaddum (1933) and Bliss (1934) was codified in Finney's influential
Probit Analysis (1971), whose first edition appeared in 1947. The logit
model was championed by Berkson (1944, 1951) in the 1940s as an alter-
native to the probit modeL Cox's (1970) The Analysis of Binary Data was
highly influential in the acceptance of the logit modeL Applications of
the logit and probit models appeared in economics in the 1950s (Cramer,
1991, p. 41). Goldberger's (1964, pp. 248-251) Econometric Theory was
important in establishing these models as standard tools in economics,
while Hanushek and Jackson's (1977) Statistical Methods for Social Sci-
entists was important in disseminating these models to areas outside of
economics.
REGRESSION MODELS

and (1 CI '
. lapter 4) develop the logit and

~l
wlth severaJ alternatives within th f pro-
model Pudne ( 1 9 ' e ramework of the
, .' . y . 89, Chapter 3) derives these mod-
assumptIOns associated with t'l' '"
4) . U I Ity maxJmlzatlon. Hypolhesis Tesling and Goodness 01 Fil
m>h",,",_ 1 . presents both models with special attention
oglt and 1 r
data. the ntpr,w~+ f og- mear models for categoricaJ
been each 01' the met~o~l,e r~sults of th~se models has often
can be found in onc l' s of mterpretatlOn considered in this
treatments that . orm ?r a~other in earHer work. Recent
pp. on mterpretatlOn mclude . Hanushek. an d J ackSon
and pp. 97-117), LJao (1994), Long (1987),

of numericaJ methods see Judge et


. ( 1993 '
. -, pp. 343-357). For details on
, m~tnx, see Cramer (I986, pp. 27-29) Greene
and Davldson and MacKinnon (1993, pp. 263-267).

This chapter begins by reviewing tests of hypotheses that can be used with any model estimated by maximum likelihood. Next, methods for detecting outliers and influential observations for the binary logit and probit models are examined; comparable methods for ordinal and nominal outcomes are not available. The chapter ends with a review of scalar measures for assessing the overall goodness of fit of a model. While some of these measures apply only to the binary response model, most can be adapted to the models in later chapters.

4.1. Hypothesis Testing

ML estimators are distributed asymptotically normally. This means that as the sample size increases, the sampling distribution of an ML estimator becomes approximately normal. For an individual parameter,

    β̂_k ~ N(β_k, Var(β̂_k))

where "~" reads "is distributed asymptotically as." For a vector of parameters,

    β̂ ~ N(β, Var(β̂))
where Var(β̂) is the asymptotic covariance matrix for β̂. For example, with three parameters:

    Var(β̂) = | Var(β̂_1)       Cov(β̂_1, β̂_2)  Cov(β̂_1, β̂_3) |
              | Cov(β̂_2, β̂_1)  Var(β̂_2)       Cov(β̂_2, β̂_3) |
              | Cov(β̂_3, β̂_1)  Cov(β̂_3, β̂_2)  Var(β̂_3)      |

The diagonal elements are the variances of the estimates; the off-diagonal elements are the covariances between the estimates of the parameters.

Consider the hypothesis H0: β_k = β*, where β* is the hypothesized value, often 0. Since σ(β̂_k) is unknown, it must be estimated, which results in the test statistic:

    z = (β̂_k − β*) / σ̂(β̂_k)    [4.1]

Given the asymptotic normality of the ML estimator, if H0 is true, then z is distributed approximately normally with a mean of 0 and a variance of 1 for large samples. The sampling distribution for z, drawn in Figure 4.1, shows the probability of various values of z when H0 is true. For example, the shaded region for z ≥ 1.96 indicates that values of z greater than 1.96 will occur due to sampling variation 2.5% of the time. Similarly, the shaded region on the left indicates how frequently values less than −1.96 will occur. For a two-tailed test, H0 is rejected at the .05 level when z falls in the shaded region of either tail. If past research or theory suggests the sign of the effect, a one-tailed test is used and the null hypothesis is rejected only when z is in the expected tail.

Figure 4.1. Sampling Distribution for a z-Statistic

The test statistic in Equation 4.1 is sometimes considered to have a t-distribution, and the test is referred to as a t-test or a quasi-t-test. Since N is large, which is required for the asymptotic justification of the test, it makes little difference whether a t-distribution or a normal distribution is used. Accordingly, some programs label this statistic a z-test, while other programs label it a t-test.

Example of the z-Test: Labor Force Participation

To test the hypothesis that having young children affects a woman's probability of working, we can use the z-statistic in Table 3.3 for the logit model. Since prior research suggests that the effect is negative, a one-tailed test is used. We conclude that:

• Having young children has a significant effect on the probability of working (z = −7.43, p < .01 for a one-tailed test).

4.1.1. Wald, Likelihood Ratio, and Lagrange Multiplier Tests

It is often useful to test complex hypotheses. For example, you might want to test that several coefficients are simultaneously equal to 0, or that two coefficients are equal. Such hypotheses can be tested with Wald, likelihood ratio (LR), or Lagrange multiplier (LM) tests. These tests can be thought of as a comparison between the estimates obtained after the constraints implied by the hypothesis have been imposed and the estimates obtained without the constraints. This is illustrated in panel A of Figure 4.2, which is based on a figure from Buse (1982).

The log likelihood function for estimating β is drawn as a solid curve. The unconstrained estimator β̂_u maximizes the log likelihood function, with the log likelihood equal to ln L(β̂_u). The hypothesis H0: β = β* imposes the constraint β = β*, so that the constrained estimate β̂_c equals β*. Unless β̂_u is exactly equal to β*, ln L(β̂_c) is smaller than ln L(β̂_u), as shown in the figure. The LR test assesses the constraint by comparing the log likelihood of the unconstrained model, ln L(β̂_u), to the log likelihood of the constrained model, ln L(β̂_c). If the constraint significantly reduces the likelihood, then the null hypothesis is rejected.

The Wald test estimates the model without constraints, and assesses the constraint by considering two things. First, it measures the distance between the unconstrained and the constrained estimates. In our example, this quantity is β̂_u − β̂_c = β̂_u − β*. The larger the distance, the less likely it is that the constraint is true. Second, this distance β̂_u − β̂_c is weighted by the curvature of the log likelihood function, which is indicated by the second derivative ∂² ln L / ∂β². The larger the second derivative, the faster the curve is changing. (What does it mean if the second derivative is 0?) The importance of the shape of the function is illustrated in panel B. The log likelihood drawn with a dashed line is nearly flat, so that the second derivative evaluated at β̂_u is relatively small. When the second derivative is small, the distance between β̂_u and β̂_c might be attributed to sampling variation. The second function, drawn with a solid line, has a larger second derivative, indicating a more rapidly changing slope; with a second derivative of this size, the same distance might be significant. (How would increasing the curvature of the log likelihood function affect W?)

The Lagrange multiplier (LM) test, also known as the score test, estimates only the constrained model, and assesses the slope of the log likelihood function at the constraint. If the hypothesis is true, the slope (known as the score) at the constraint should be close to 0. In panel A of Figure 4.2, the slope is represented by the tangent to the curve drawn with a dashed line, which is labeled ∂ ln L / ∂β. As with the Wald test, the curvature of the log likelihood function at the constraint is used to assess the significance of a nonzero slope.

Figure 4.2. The Wald, Likelihood Ratio, and Lagrange Multiplier Tests. Panel A: Wald, LR, and LM Tests; Panel B: Curvature of the Likelihood Function

When H0 is true, the Wald, LR, and LM tests are asymptotically equivalent. As N increases, the sampling distributions of the three tests converge to the same chi-square distribution with degrees of freedom equal to the number of constraints being tested. Figure 4.3 shows the sampling distribution for a chi-square statistic with 5 degrees of freedom. The area to the right of χ²_p is equal to p, and indicates the probability of observing a value of the test statistic greater than χ²_p if H0 is true. The null hypothesis is rejected at the p level of significance if the test statistic is larger than χ²_p.

Figure 4.3. Sampling Distribution of a Chi-Square Statistic With 5 Degrees of Freedom

It is important to remember that the Wald, LR, and LM tests only have asymptotic justifications. The degree to which these tests approximate a chi-square distribution in small samples is largely unknown. See Section 3.5.1 (p. 53) for guidelines on the sample size needed for using these tests.

With these ideas in mind, we are ready for formal definitions of the Wald and LR tests. The LM test is discussed further in Chapter 7.

4.1.2. The Wald Test

While in its most general form the Wald test can be used to test nonlinear constraints, here we consider only linear constraints of the form:

    Qβ = r    [4.2]
where β is the vector of parameters being tested, Q is a matrix of constants, and r is a vector of constants. While we are usually interested in the slopes and intercept of a model, β could contain other parameters, such as σ in the LRM. By specifying Q and r, a variety of linear constraints can be imposed. For example, consider the probit model

    Pr(y = 1 | x) = Φ(β_0 + β_1 x_1 + β_2 x_2)

To test that β_1 = 0, Equation 4.2 becomes

    (0 1 0) (β_0 β_1 β_2)' = (0)

Or, to test the constraint that β_1 = β_2 = 0,

    | 0 1 0 | (β_0 β_1 β_2)' = | 0 |
    | 0 0 1 |                  | 0 |

The hypothesis Qβ = r can be tested with the Wald statistic:

    W = (Qβ̂ − r)' [Q V̂ar(β̂) Q']⁻¹ (Qβ̂ − r)    [4.3]

W is distributed as chi-square with degrees of freedom equal to the number of constraints (i.e., the number of rows of Q). The Wald statistic consists of two components. Qβ̂ − r at each end of the formula measures the distance between the estimated and hypothesized values. The middle portion reflects the variability in the estimator, or, alternatively, the curvature of the likelihood function. To see this more clearly, consider a simple example.

For the model Pr(y = 1 | x) = Φ(β_0 + β_1 x_1 + β_2 x_2) with H0: β_1 = β*, Qβ̂ − r can be written as

    (0 1 0) (β̂_0 β̂_1 β̂_2)' − β* = β̂_1 − β*    [4.4]

This distance appears at each end of the formula, which squares the distance between the hypothesized value and the estimate. Therefore, negative and positive differences have the same effect on the test statistic. The middle portion of the Wald statistic is

    [Q V̂ar(β̂) Q']⁻¹ = 1 / V̂ar(β̂_1)

which is simply the inverse of the variance. The larger the variance, the smaller the weight given to the distance between the hypothesized and estimated value. Equivalently, the faster the likelihood function is changing in the region around β̂_1, the more significant the difference β̂_1 − β*. (Why should we give less weight when the variance is large?) Combining these results,

    W = (β̂_1 − β*)² / V̂ar(β̂_1)

which is distributed as chi-square with 1 degree of freedom if H0 is true. Notice that W is the square of the z-statistic in Equation 4.1, which corresponds to a chi-square variable with 1 degree of freedom being equal to the square of a normal variable. Some programs, such as SAS, present a single degree of freedom chi-square statistic for individual coefficients, rather than the z-statistic.

The same ideas apply to more complex hypotheses. Consider H0: β_1 = β_2 = 0, which can be written as

    H0: | 0 1 0 | (β_0 β_1 β_2)' = | 0 |
        | 0 0 1 |                  | 0 |

Qβ̂ − r is simply (β̂_1 β̂_2)'. The middle portion of the Wald formula is [Q V̂ar(β̂) Q']⁻¹. To keep the example simple, assume that the estimates are uncorrelated. (In practice, the estimates will be correlated.) Then

    [Q V̂ar(β̂) Q']⁻¹ = | 1/V̂ar(β̂_1)   0           |
                       | 0             1/V̂ar(β̂_2) |

The larger the variance, the less weight is given to the distance between the hypothesized and estimated parameter. Carrying out the algebra, we obtain

    W = Σ_{k=1}^{2} β̂_k² / V̂ar(β̂_k)
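This algebra can be sketched numerically. The estimates and variances below are hypothetical, chosen only to show that with uncorrelated estimates the general matrix form of W in Equation 4.3 reduces to the sum of the squared z-statistics:

```python
import numpy as np

# Hypothetical estimates and covariance matrix (uncorrelated case).
b = np.array([0.25, 1.10, -0.60])      # beta0_hat, beta1_hat, beta2_hat
V = np.diag([0.040, 0.090, 0.025])     # Var(beta_hat) with zero covariances

# H0: beta1 = beta2 = 0, written as Q beta = r.
Q = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.zeros(2)

d = Q @ b - r                           # distance from the hypothesis
W = d @ np.linalg.inv(Q @ V @ Q.T) @ d  # Wald statistic, Equation 4.3

# With uncorrelated estimates, W is just the sum of squared z-statistics.
z1 = b[1] / np.sqrt(V[1, 1])
z2 = b[2] / np.sqrt(V[2, 2])
assert np.isclose(W, z1 ** 2 + z2 ** 2)
```

With correlated estimates, the off-diagonal elements of V make the matrix inverse do real work, but the same three lines compute W.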
With uncorrelated estimates, the Wald statistic is the sum of squared z's. Recall that a chi-square distribution with J degrees of freedom is defined as the sum of J independent, squared normal random variables. When the estimates are correlated, which is normally the case, the resulting formula is more complex, but the general ideas are the same.

Example of the Wald Test: Labor Force Participation

To illustrate the Wald test, consider the logit model:

    Pr(LFP = 1 | x) = Λ(β_0 + β_1 K5 + β_2 K618 + β_3 AGE + β_4 WC + β_5 HC + β_6 LWG + β_7 INC)    [4.5]

Wald Test That a Single Coefficient Is 0. To test H0: β_1 = 0, let

    Q = (0 1 0 0 0 0 0 0) and r = (0)

Then W = 55.14, which is the square of the z-statistic for K5 in Table 3.3. We describe the result as:

• The effect of young children on the probability of entering the labor force is significant at the .01 level (X² = 55.14, df = 1, p < .01).

The z-test is often used rather than W since the Wald statistic has a chi-square distribution.

Wald Test That Two Coefficients Are 0. The hypothesis that the effects of the husband's and wife's education are simultaneously 0 can be written as H0: β_4 = β_5 = 0. To test this hypothesis, let

    Q = | 0 0 0 0 1 0 0 0 | and r = | 0 |
        | 0 0 0 0 0 1 0 0 |         | 0 |

Then W = 17.66 with 2 degrees of freedom. We conclude:

• The hypothesis that the effects of the husband's and wife's education are simultaneously equal to 0 can be rejected at the .01 level (X² = 17.66, df = 2, p < .01).

(How would you specify Q and r to test the hypothesis that all of the coefficients except the intercept are 0?)

Wald Test That Two Coefficients Are Equal. To test that the effect of the husband's education equals the effect of the wife's education, define

    Q = (0 0 0 0 1 −1 0 0) and r = (0)

Substituting these matrices into Equation 4.3 and simplifying results in the usual formula:

    W = (β̂_4 − β̂_5)² / [V̂ar(β̂_4) + V̂ar(β̂_5) − 2 Ĉov(β̂_4, β̂_5)]

Then W = 3.54 with 1 degree of freedom. There is 1 degree of freedom since there is a single restriction, even though that restriction involves two parameters. We conclude:

• The hypothesis that the effects of the husband's and wife's education are equal is marginally significant at the .05 level (X² = 3.54, df = 1, p = .06).

4.1.3. The Likelihood Ratio Test

The LR test can also be used to test constraints on a model. While in its most general form these constraints can be complex and nonlinear, I only consider constraints that involve eliminating one or more regressors from the model. For example, consider the logit models:

    M1: Pr(y = 1 | x) = Λ(β_0 + β_1 x_1 + β_2 x_2)
    M2: Pr(y = 1 | x) = Λ(β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3)
    M3: Pr(y = 1 | x) = Λ(β_0 + β_1 x_1 + β_2 x_2 + β_4 x_4)
    M4: Pr(y = 1 | x) = Λ(β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_4 x_4)

Model M1 is formed from M2 by imposing the constraint β_3 = 0, and M1 is formed from M3 by imposing the constraint β_4 = 0. When one model can be obtained from another model by imposing constraints, the constrained model is said to be nested in the unconstrained model. Thus, M1 is nested in M2 and in M3. However, M2 is not nested in M3, nor is M3 nested in M2. (Which models are nested in M4?)

The LR test is defined as follows. The constrained model Mc with parameters β_c is nested in the unconstrained model Mu with parameters β_u. The null hypothesis is that the constraints imposed to create Mc are true. Let L(Mu) be the value of the likelihood function evaluated at the ML estimates for the unconstrained model, and let L(Mc) be the value of the likelihood function evaluated at the constrained estimates.
The likelihood ratio statistic, hereafter the LR statistic, is

    G² = 2 ln L(Mu) − 2 ln L(Mc)

Under very general conditions, if H0 is true, then G² is asymptotically distributed as chi-square with degrees of freedom equal to the number of constraints. While the LR statistic can be used to compare any pair of nested models, there are two tests that are commonly computed by standard software and are often included in tables presenting the results of models estimated by ML.

The first test compares a given model to the constrained model in which all slope coefficients are equal to 0. This test is frequently referred to as the likelihood ratio chi-square or the LR chi-square. To define the test, let model Mβ be the unconstrained model that includes an intercept, the regressors, and any other parameters in the model (e.g., σ in the LRM). Let Mα be the constrained model that excludes all regressors; the parameters β_0 and σ would be included for the LRM. To test the hypothesis that all of the slope coefficients are equal to 0, we use the test statistic:

    G²(Mβ) = 2 ln L(Mβ) − 2 ln L(Mα)    [4.6]

The notation G²(Mβ) replaces the more cumbersome G²(Mα | Mβ). If the null hypothesis that all slopes are 0 is true, then G²(Mβ) is asymptotically distributed as chi-square with degrees of freedom equal to the number of regressors.

The second test, known as the scaled deviance or simply the deviance, is used within the framework known as the generalized linear model (McCullagh & Nelder, 1989, pp. 33-34). The deviance compares a given model to the full model MF. The full model has one parameter for each observation and can reproduce perfectly the observed data. Since the observed data are perfectly predicted, the likelihood of MF is 1, and the log likelihood is 0. To test whether MF significantly improves the fit over model Mβ, the deviance is defined as

    D(Mβ) = 2 ln L(MF) − 2 ln L(Mβ) = −2 ln L(Mβ)

Since the deviance is −2 times the log likelihood of the given model, its value can be computed from any program that provides the log likelihood of the model being estimated.

While D(Mβ) is sometimes reported as having a chi-square distribution, McCullagh (1986) shows that D(Mβ) has an asymptotic normal distribution as a consequence of the number of parameters in the full model increasing directly with the number of observations. McCullagh and Nelder (1989, pp. 120-122) suggest that when the data are sparse (i.e., when each combination of values of the independent variables occurs only once in the sample), D(Mβ) should not be used as a measure of fit for the model. See Hosmer and Lemeshow (1989) for further details.

G²(Mβ) and D(Mβ) can be used to compare nested models. Consider the unconstrained model Mu and the constrained model Mc. If the values of the likelihood function are known, we could test the constraints on Mu with G²(Mc | Mu) = 2 ln L(Mu) − 2 ln L(Mc). This statistic could also be computed using the LR chi-squares:

    G²(Mu) = 2 ln L(Mu) − 2 ln L(Mα)
    G²(Mc) = 2 ln L(Mc) − 2 ln L(Mα)

Since Mα is the same for both models,

    G²(Mc | Mu) = G²(Mu) − G²(Mc) = 2 ln L(Mu) − 2 ln L(Mc)

This is why G²(Mc | Mu) is often referred to as a difference of chi-square test. Similarly, the deviance can be used to compute the test. If

    D(Mu) = −2 ln L(Mu) and D(Mc) = −2 ln L(Mc)

then

    G²(Mc | Mu) = D(Mc) − D(Mu) = −2 ln L(Mc) − [−2 ln L(Mu)] = 2 ln L(Mu) − 2 ln L(Mc)

Examples of the LR Test: Labor Force Participation

For the unconstrained model in Equation 4.5, the LR chi-square is G²(Mu) = 124.48 and the deviance is D(Mu) = 905.27. These statistics are used for computing the following tests.
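The difference-of-chi-square identity can be checked with simple arithmetic. A minimal sketch using the statistics reported for this example (the constrained values are those reported below for the model that excludes K5):

```python
import math

# LR chi-square and deviance reported for the unconstrained logit model.
G2_u, D_u = 124.48, 905.27
# Values reported for the constrained model that excludes K5.
G2_c, D_c = 58.00, 971.75

lr_from_G2 = G2_u - G2_c   # difference of LR chi-squares
lr_from_D = D_c - D_u      # difference of deviances

# Both routes give the same statistic for H0: beta1 = 0.
assert math.isclose(lr_from_G2, lr_from_D)
assert math.isclose(lr_from_G2, 66.48)

# The log likelihoods themselves are recoverable from the deviances.
lnL_u, lnL_c = -D_u / 2, -D_c / 2
assert math.isclose(2 * lnL_u - 2 * lnL_c, lr_from_G2)
```

Since ln L(Mα) cancels in the subtraction, the intercept-only log likelihood is never needed to form the test.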
LR Test That a Single Coefficient Is 0. To test H0: β_1 = 0, the model M[K5] is estimated, where the subscript indicates that K5 is excluded from the unconstrained model. The LR chi-square and deviance for the constrained model are

    G²(M[K5]) = 58.00 and D(M[K5]) = 971.75

Then

    G²(M[K5] | Mu) = G²(Mu) − G²(M[K5]) = 66.48
                   = D(M[K5]) − D(Mu) = 66.48

We conclude:

• The effect of young children is significant at the .01 level (LRX² = 66.5, df = 1, p < .01).

Notice that I have used LRX² rather than X² in presenting the result. This makes it explicit that a likelihood ratio test is being reported.

LR Test That Two Coefficients Are 0. To test the hypothesis that the effects of the husband's and wife's education are simultaneously 0, H0: β_4 = β_5 = 0, the model M[WC,HC] is estimated, resulting in

    G²(M[WC,HC]) = 105.98 and D(M[WC,HC]) = 923.76

The test statistic is

    G²(M[WC,HC] | Mu) = G²(Mu) − G²(M[WC,HC]) = 18.50
                      = D(M[WC,HC]) − D(Mu) = 18.50

We conclude:

• The hypothesis that the effects of the husband's and wife's education are simultaneously equal to 0 can be rejected at the .01 level (LRX² = 18.5, df = 2, p < .01).

LR Test That All Coefficients Are 0. G²(Mu) = G²(Mα | Mu) can be used to test the hypothesis that none of the regressors affects the probability of entering the labor force. Formally, H0: β_1 = β_2 = ⋯ = β_7 = 0. We conclude:

• The hypothesis that all coefficients except the intercept are 0 can be rejected at the .01 level (LRX² = 124.5, df = 7, p < .01).

While a Wald test could be used to test this hypothesis, the LR test is more commonly used.

4.1.4. Comparing the LR and Wald Tests

Even though the LR and Wald tests are asymptotically equivalent, in finite samples they give different answers, particularly for small samples. In general, it is unclear whether one test is to be preferred to the other. Rothenberg (1984) suggests that neither test is uniformly superior, while Hauck and Donner (1977) suggest that the Wald test is less powerful than the LR test. In practice, the choice of which test to use is often determined by convenience. While the LR test requires the estimation of two models, the computation of the test only involves subtraction. The Wald test requires estimation of only a single model, but the computation of the test involves matrix manipulations. Which test is more convenient depends on the software being used.

Table 4.1 compares the results of the LR and Wald tests for our example based on a sample of 753. For all hypotheses, the conclusions from both tests are the same. Note, however, that the values of the LR statistics are larger than the corresponding Wald statistics.

TABLE 4.1 Comparing Results From the LR and Wald Tests

                              LR Test          Wald Test
    Hypothesis          df    G²      p        W       p
    β_1 = 0             1     66.5    <0.01    55.1    <0.01
    β_4 = β_5 = 0       2     18.5    <0.01    17.7    <0.01
    All slopes = 0      7     124.5   <0.01    95.0    <0.01

4.1.5. Computational Issues

There are two important computational considerations that must be taken into account when computing Wald and LR tests. If they are not, you run the risk of drawing the wrong conclusions from your tests.

Computing the LR Test

The LR test requires using the same sample for all models being compared. Since ML estimation excludes cases with missing data, it is common for the sample size to change when a variable has been excluded. For example, if x_1 has three missing observations that are not missing for any other variables, the usable sample increases by 3 when x_1 is excluded from the model. To ensure that the sample size does not change, you should construct a data set that excludes every observation that has
missing values for any of the variables used in any of the models being tested. Alternatively, missing values can be imputed using the methods discussed in Little and Rubin (1987).

Computing the Wald Test

The matrix computations for the Wald test can accumulate appreciable rounding error if you do not use the full precision of the estimated coefficients and covariance matrix. Practically speaking, this means that you should use a program in which the estimates can be stored and then used for computing the test. Using the rounded values listed in the output can result in incorrect values for the test statistic.

4.2. Residuals and Influence

When assessing a model, it is useful to consider how well the model fits each case and how much influence each case has on the estimates of the parameters. Residuals measure the difference between the model's prediction for a given case and the observed value for that case. Observations that fit poorly can be thought of as outliers. Influence is the effect of an observation on the estimates of the model's parameters or measures of fit. The analysis of residuals and influence is well developed for the LRM, and I assume that you have some familiarity with this material. Here I consider Pregibon's (1981) extensions of these methods to the BRM.

To begin, define π̂_i = Ê(y_i | x_i) = P̂r(y_i = 1 | x_i). The deviations y_i − π̂_i are heteroscedastic, with Var(y_i − π_i | x_i) = π_i(1 − π_i). This suggests the Pearson residual:

    r_i = (y_i − π̂_i) / √(π̂_i (1 − π̂_i))

Large values of r_i suggest a failure of the model to fit a given observation. The r_i's can be used to construct a summary statistic, the Pearson statistic:

    X² = Σ_i r_i²

While X² is sometimes reported as having a chi-square distribution, McCullagh (1986) demonstrated that when the data are sparse (e.g., when there are continuous independent variables), X² has an asymptotic normal distribution with a mean and variance that are difficult to compute. McCullagh and Nelder (1989, pp. 112-122) recommended that X² not be used as an absolute measure of fit. Hosmer and Lemeshow (1989, pp. 140-145) propose an alternative test, constructed by grouping the data, that can be used with sparse data.

While Var(y_i − π_i) = π_i(1 − π_i), Var(y_i − π̂_i) ≠ π̂_i(1 − π̂_i). Consequently, the variance of r_i is not 1. To compute the variance of the estimated residuals, we need what is known as the hat matrix, so named because it transforms the observed y into ŷ in the LRM. For the BRM, Pregibon (1981) derived the hat matrix

    H = V^(1/2) X (X′VX)⁻¹ X′ V^(1/2)

where V is a diagonal matrix with π̂_i(1 − π̂_i) on the diagonal. Since only the diagonal of H is needed, we can use the computationally simpler formula

    h_ii = π̂_i (1 − π̂_i) x_i V̂ar(β̂) x_i′

where x_i is a row vector with values of the independent variables for the ith observation and V̂ar(β̂) is the estimated covariance matrix of the ML estimator β̂. Using 1 − h_ii to estimate the variance of r_i, the standardized Pearson residual is

    r_i^Std = r_i / √(1 − h_ii)

While r^Std is preferred to r, the two residuals are often similar in practice.

An index plot of the standardized residuals against the observation number can be used to search for outliers. Figure 4.4 is an index plot of the standardized residuals for the labor force data. Only half of the observations are shown in order to make the figure clearer. Two observations stand out as extreme and are marked with boxes. Observation 142 has a residual of 3.2; observation 512 has a residual of −2.7. Further analyses of these cases might reveal either incorrectly coded data or some inadequacy in the specification of the model. Cases with large positive or negative residuals should not simply be discarded from the analysis, but rather should be examined to determine why they were fit so poorly.
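These formulas are easy to compute directly. The sketch below uses a small hypothetical design matrix and fitted probabilities (made-up values, not an actual ML fit); it also checks a property implied by the hat matrix formula: the h_ii sum to the number of parameters.

```python
import numpy as np

# Hypothetical design matrix (intercept + one regressor) and fitted
# probabilities from some binary model; all values are made up.
X = np.array([[1.0, -1.5], [1.0, -0.5], [1.0, 0.0],
              [1.0, 0.5], [1.0, 1.5]])
pi = np.array([0.20, 0.35, 0.50, 0.65, 0.80])
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0])

v = pi * (1 - pi)                              # diagonal of V
var_b = np.linalg.inv(X.T @ (v[:, None] * X))  # Var(beta_hat) = (X'VX)^-1

# Diagonal of the hat matrix: h_ii = pi_i (1 - pi_i) x_i Var(b) x_i'
h = v * np.einsum('ij,jk,ik->i', X, var_b, X)
assert np.isclose(h.sum(), X.shape[1])         # trace(H) = number of parameters

# Pearson and standardized Pearson residuals, and the Pearson statistic.
r = (y - pi) / np.sqrt(v)
r_std = r / np.sqrt(1 - h)
X2 = np.sum(r ** 2)
```

Because H is a projection-like matrix, each h_ii lies between 0 and 1, so the standardized residuals are never smaller in magnitude than the raw Pearson residuals.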
Figure 4.4. Index Plot of Standardized Pearson Residuals

While large residuals indicate that an observation is not fit well, they do not indicate whether an observation has a large influence on the estimated parameters or the overall fit. For example, a large residual for the ith observation will not have a large influence on the estimates of β (i.e., removing that observation will not change the estimates very much) if x_i is near the center of the data. Being near the center of the data means that an observation's values for each independent variable are close to that variable's mean in the sample. On the other hand, extreme observations can influence the estimates even when they do not have large residuals. A useful way to detect such observations, known as high leverage points, is to examine the change in β̂ that occurs when the ith observation is deleted. Since it is impractical to estimate the model N times, once with each observation removed, Pregibon (1981) derived an approximation that requires estimating the model only once. The change in β̂ if the ith observation is removed is approximately

    Δβ̂_i = V̂ar(β̂) x_i′ (y_i − π̂_i) / (1 − h_ii)

The standardized change in β̂_k due to the deletion of observation i is known as DFBETA_ik. A large value of DFBETA_ik indicates that the ith observation has a large influence on the estimate of β_k.

A second measure summarizes the effect of removing the ith observation on the entire vector β̂; it is the counterpart to Cook's distance for the LRM:

    C_i = r_i² h_ii / (1 − h_ii)²

Another measure of the impact of a single observation is the change in X² when the ith observation is removed:

    ΔX²_i = r_i² / (1 − h_ii)

Figure 4.5 shows an index plot of C. Comparing this figure to Figure 4.4 illustrates the difference between an outlier and an influential observation. In both figures, observation 142 stands out. However, while observation 554 has a large residual, it has a C of only .06. Analysis of the DFBETA_ik's for observation 142 would indicate which coefficients are being affected.

Methods for plotting residuals and outliers can be extended in many ways, including plots of different diagnostics against one another. Details of these plots are found in Landwehr et al. (1984) and Hosmer and Lemeshow (1989, pp. 149-170). While Lesaffre and Albert (1989) have proposed extensions of these diagnostics to the multinomial logit model, these extensions have not been added to standard software. Diagnostics for logit and probit are included in SAS and Stata.

Figure 4.5. Index Plot of Cook's Influence Statistics
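Continuing the same kind of sketch with hypothetical residuals and hat values, both deletion diagnostics can be computed from r_i and h_ii alone; the assertion checks that the form of C used here agrees with the equivalent form written in terms of the standardized residual.

```python
import numpy as np

# Hypothetical Pearson residuals and hat values for five observations.
r = np.array([0.5, -1.2, 3.2, -1.5, -2.7])
h = np.array([0.10, 0.05, 0.08, 0.60, 0.06])

r_std = r / np.sqrt(1 - h)

C = r ** 2 * h / (1 - h) ** 2   # Cook-type influence statistic
dX2 = r ** 2 / (1 - h)          # change in X^2 when observation i is deleted

# The same C written in terms of the standardized residual.
assert np.allclose(C, r_std ** 2 * h / (1 - h))

# The largest residual (index 2) is not the most influential case:
# the high-leverage observation (index 3) dominates C.
print(np.argmax(np.abs(r)), np.argmax(C))  # 2 3
```

The fourth observation illustrates the point made above: a modest residual combined with high leverage produces a larger C than the biggest outlier.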
4.3. Scalar Measures of Fit

In addition to examining the fit of each observation, it is sometimes useful to have a single number to summarize the overall goodness of fit of a model. Such a measure might aid in comparing competing models and in evaluating a final model. Within a substantive area of application, measures of fit can provide a rough index of whether a model is adequate. For example, if prior models of labor force participation routinely have values of .4 for a given measure of fit, you would expect that a new analysis with a different sample, and perhaps with revised measures of the variables, would result in a similar value for that measure. Substantially larger or smaller values would suggest the need to reassess the decisions made in the new study.

While the desirability of a scalar measure of fit is clear, in practice their use is problematic. First, I am unaware of convincing evidence that selecting a model that maximizes the value of a given measure of fit results in a model that is optimal in any sense other than the model having a larger value of that measure. While measures of fit provide some information, it is only partial information that must be assessed within the context of the theory motivating the analysis, past research, and the estimated parameters of the model being considered. Second, while in the LRM the coefficient of determination R² is the standard measure of fit, there is no clear choice for models with categorical outcomes. There have been numerous attempts to construct a counterpart to R² in the LRM, but no one measure is clearly superior, and none has the advantage of a clear interpretation in terms of explained variation. Other measures have been constructed based on the ability of a model to predict the observed outcome. Finally, measures such as the AIC and BIC, which are useful for comparing nonnested models, are increasingly used. Thus, while I approach scalar measures of fit with some skepticism, their popularity and proliferation make a review useful.

4.3.1. R² in the LRM

Many scalar measures of fit for models with CLDVs are constructed by analogy to the coefficient of determination R² in the LRM. Most commonly, R² is defined as the proportion of the variation in y that is explained by the x's in the model. However, R² can be defined in other ways, each of which produces an identical value for R² in the LRM. Importantly, when these equivalent formulas are applied to models for CLDVs, they often produce different values and thus provide different measures of fit.¹

Let the structural model be y = xβ + ε, with K regressors, an intercept, and N observations. The expected value of y is ŷ = xβ̂, where β̂ is the OLS estimator. The coefficient of determination can be defined in each of the following ways. Derivations of these formulas can be found in Judge et al. (1985, pp. 29-31), Goldberger (1991, pp. 176-179), and Pindyck and Rubinfeld (1991, pp. 61, 76-78, 98-99).

The Percentage of Explained Variation. Let RSS = Σ_i (y_i − ŷ_i)² be the sum of squared residuals, and let TSS = Σ_i (y_i − ȳ)² be the total sum of squares. Then R² is the percentage of TSS explained by the x's:

    R² = (TSS − RSS) / TSS = 1 − RSS/TSS    [4.7]

The Ratio of Var(ŷ) and Var(y). The ratio of the variances of ŷ and y is another definition:

    R² = V̂ar(ŷ) / V̂ar(y)    [4.8]

A Transformation of the Likelihood Ratio. If the errors are assumed to be normal, then R² can be written as

    R² = 1 − [L(Mα) / L(Mβ)]^(2/N)    [4.9]

where L(Mα) is the likelihood for the model with just the intercept, and L(Mβ) is the likelihood for the model including the regressors.

A Transformation of the F-Test. The hypothesis H0: β_1 = ⋯ = β_K = 0 can be tested using an F-test, with the test statistic F. R² can be written in terms of F as

    R² = KF / (KF + N − K − 1)

where K is the number of independent variables.

¹ This is similar to the case in the LRM when there is no intercept. See Judge et al. (1985, pp. 30-31).
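The equivalence of these definitions in the LRM can be verified numerically. A minimal sketch with simulated data (under normal-theory ML, the likelihood ratio in Equation 4.9 collapses to RSS/TSS):

```python
import numpy as np

# Simulated LRM data: N = 40 observations, K = 2 regressors plus an intercept.
rng = np.random.default_rng(0)
N, K = 40, 2
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=N)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b
RSS = np.sum((y - yhat) ** 2)
TSS = np.sum((y - y.mean()) ** 2)

R2_ss = 1 - RSS / TSS                         # Equation 4.7
R2_var = yhat.var() / y.var()                 # Equation 4.8

# Normal-theory ML log likelihoods for the fitted and intercept-only models.
lnL1 = -N / 2 * (np.log(2 * np.pi) + np.log(RSS / N) + 1)
lnL0 = -N / 2 * (np.log(2 * np.pi) + np.log(TSS / N) + 1)
R2_lik = 1 - np.exp((2 / N) * (lnL0 - lnL1))  # Equation 4.9

F = (R2_ss / K) / ((1 - R2_ss) / (N - K - 1))
R2_F = K * F / (K * F + N - K - 1)            # transformation of the F-test

assert np.allclose([R2_var, R2_lik, R2_F], R2_ss)
```

Replacing the linear model with a logit or probit model breaks this agreement: the four formulas then give four different numbers, which is the point of the section that follows.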
REGRESSJON MODELS Hypothesis Testing and Goodness of Fit 105

4.3.2. Pseudo-R 2,s Based on R 2 in tbe LRM


way to compare log likelihoods across different models. Unfortunatcly.
Severa1 for models with CLDVs have been defined by there is no clear interpretation of values other than O and 1, nor IS there
to the formula in the last section. These formulas produce any standard by which to judge if the value is "large enough."
different values in models with categorical outcomes and, consequently, are best thought of as distinct measures.

Efron's "Explained Variation." For binary outcomes, Efron's R² defines ŷ as π̂ = Pr̂(y = 1 | x) and applies Equation 4.7:

    R²_Efron = 1 − Σ_i (y_i − π̂_i)² / Σ_i (y_i − ȳ)²

(For a binary outcome, Σ_i (y_i − ȳ)² = n₀n₁/N, where n₀ is the number of 0's and n₁ is the number of 1's in the sample.)

McFadden's R². McFadden suggested a different analogy to explained variation in the LRM that can be applied to any model estimated with ML. This measure is also referred to as the "likelihood ratio index." In this measure, the log likelihood for model M_α without regressors is thought of as the total sum of squares, while the log likelihood of model M_β with regressors is thought of as the residual sum of squares. By analogy,

    R²_McF = 1 − ln L̂(M_β) / ln L̂(M_α)

If model M_β does not improve on M_α (i.e., the slopes are all 0), R²_McF equals 0, but R²_McF can never equal 1. Like R² for the LRM, R²_McF increases as new variables are added to the model. To compensate, Ben-Akiva and Lerman (1985, p. 167) suggest adjusting for the number of parameters in the model (just as the adjusted R² does in the LRM):

    R̄²_McF = 1 − [ln L̂(M_β) − K*] / ln L̂(M_α)

where K* is the number of parameters. R̄²_McF will increase only if the log likelihood increases by more than 1 for each parameter added to the model. Ben-Akiva and Lerman (1985, p. 167) discuss the logic behind and limitations of these measures. All else being equal, models with a larger value of the index are preferred, and R²_McF provides a convenient way to compare models.

The Ratio of Var(ŷ*) and Var(y*). For models defined in terms of a latent outcome according to y* = xβ + ε, McKelvey and Zavoina (1975, pp. 111-112) proposed a pseudo-R² by analogy to Equation 4.8:

    R²_M&Z = Var̂(ŷ*) / Var̂(y*) = Var̂(ŷ*) / [Var̂(ŷ*) + Var(ε)]

This formula differs from that for the LRM in two respects. First, we are using the estimated variance of the latent variable y* rather than the observed y. Second, the variance of ε is fixed by assumption, rather than being estimated. For the logit model, Var(ε) = π²/3, and for the probit model, Var(ε) = 1. The variance of ŷ* can be computed as

    Var̂(ŷ*) = β̂′ Var̂(x) β̂

where Var̂(x) is the estimated covariance matrix among the x's. This R² was suggested by McKelvey and Zavoina for ordinal outcomes, but it can also be applied to binary and censored outcomes (Laitila, 1993). In simulation studies, Hagle and Mitchell (1992) and Windmeijer (1995) find that R²_M&Z most closely approximates the R² obtained from regressions on the underlying latent variable.

A Transformation of the Likelihood Ratio. If we define M_α as the model with just the intercept, and M_β as the model with the regressors included, by analogy to Equation 4.9 a pseudo-R² can be defined as

    R²_ML = 1 − [L(M_α)/L(M_β)]^(2/N)    [4.10]

Maddala (1983, p. 39) shows that R²_ML can be expressed as a transformation of the likelihood ratio chi-square G² = −2 ln[L(M_α)/L(M_β)]:

    R²_ML = 1 − exp(−G²/N)

which illustrates that measures of fit such as R² and the various pseudo-R²'s are often closely related to tests of hypotheses. See Magee (1990) for other measures of fit based on the Wald and score tests.
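These scalar measures are easy to compute directly from a model's log likelihoods and fitted probabilities. The sketch below uses made-up values (the outcomes, probabilities, log likelihoods, slopes, and covariance matrix are all hypothetical, not those of the text's example); the Cragg and Uhler measure divides R²_ML by its maximum attainable value, 1 − L(M_α)^(2/N).

```python
import numpy as np

# Hypothetical inputs: binary outcomes y, fitted probabilities pi_hat, and
# log likelihoods for the full model (ll_beta) and intercept-only model (ll_alpha)
y = np.array([0, 0, 1, 1, 1, 0, 1, 1])
pi_hat = np.array([0.2, 0.3, 0.6, 0.8, 0.7, 0.4, 0.9, 0.5])
ll_beta, ll_alpha = -3.1, -5.3
N = len(y)

# Efron's R2: Equation 4.7 with y-hat = pi-hat
r2_efron = 1 - np.sum((y - pi_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# McFadden's likelihood ratio index
r2_mcf = 1 - ll_beta / ll_alpha

# Maddala's R2_ML as a transformation of the LR chi-square G2
G2 = -2 * (ll_alpha - ll_beta)
r2_ml = 1 - np.exp(-G2 / N)

# Cragg and Uhler's normed measure: R2_ML divided by its maximum
r2_cu = r2_ml / (1 - np.exp(2 * ll_alpha / N))

# McKelvey and Zavoina's R2 for a probit fit: Var(e) = 1 by assumption
b = np.array([0.8, -0.5])                  # hypothetical probit slopes
Vx = np.array([[1.0, 0.2], [0.2, 2.0]])    # estimated covariance matrix of the x's
r2_mz = (b @ Vx @ b) / (b @ Vx @ b + 1.0)
```

Note that 1 − exp(−G²/N) reproduces 1 − [L(M_α)/L(M_β)]^(2/N) exactly, which is one way to check a program's reported values.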
REGRESSION MODELS · Hypothesis Testing and Goodness of Fit
As the fit of M_α approaches the fit of M_β [i.e., as L(M_β) → L(M_α)], R²_ML approaches 0. Maddala (1983, pp. 39-40) shows that R²_ML reaches a maximum of 1 − L(M_α)^(2/N). This led Cragg and Uhler to suggest the normed measure:

    R²_C&U = R²_ML / max R²_ML = [1 − (L(M_α)/L(M_β))^(2/N)] / [1 − L(M_α)^(2/N)]

Since both R²_ML and R²_C&U are defined in terms of the likelihood function, they can be applied to any model estimated by ML.

Example of Scalar Measures: Labor Force Participation. To illustrate scalar measures of fit, consider two models. Model M₁ has the full set of independent variables: K5, K618, AGE, WC, HC, LWG, and INC. Model M₂ adds a squared age term AGE2 and drops the variables WC, HC, and LWG. The resulting measures of fit for the LPM and logit models are given in Table 4.2. Notice that for a given model many of the measures are identical for the LPM but not for the logit model. (You should try to reproduce these measures using the likelihoods for the full and restricted models.)

TABLE 4.2 Measures of Fit for the Logit and LPM Models
[ln L̂ for the full model, ln L₀ for the intercept-only model, and the R² and pseudo-R² measures for models M₁ and M₂; the numeric entries were garbled in extraction.]

4.3.3. Pseudo-R²'s Using Observed Versus Predicted Values

Another approach to goodness of fit in models with categorical outcomes is to compare the observed values to the predicted values. While I develop this idea for models with two outcomes, it can be easily extended to models with J ordinal or nominal outcomes.

Let the observed y equal 0 or 1. The predicted probability that y = 1 is

    π̂ = F(x β̂)    [4.11]

where F is the cdf for the normal distribution for probit and for the logistic distribution for logit. Define the expected outcome ŷ as

    ŷ = 0 if π̂ ≤ 0.5    and    ŷ = 1 if π̂ > 0.5

which Cramer (1991, p. 90) calls the "maximum probability rule." This allows us to construct a table of observed and predicted values, such as Table 4.3, which is sometimes called a classification table.

TABLE 4.3 Classification Table of Observed and Predicted Outcomes for a Binary Response Model

    Observed Outcome    ŷ = 1              ŷ = 0              Row Total
    y = 1               n₁₁ = correct      n₁₂ = incorrect    n₁₊
    y = 0               n₂₁ = incorrect    n₂₂ = correct      n₂₊
    Column Total        n₊₁                n₊₂                N

The Count R². A simple and seemingly appealing measure based on the table of observed and predicted counts is the proportion of correct predictions, which Maddala (1992, p. 334) refers to as the count R²:

    R²_Count = (1/N) Σ_j n_jj

where the n_jj's are the numbers of correct predictions, located in the diagonal cells of Table 4.3.

The Adjusted Count R². The count R² can give the faulty impression that the model is predicting very well when, in fact, it is not. In a binary model, without knowledge about the independent variables it is possible to correctly predict at least 50% of the cases by choosing the outcome category with the largest percentage of observed cases. For example, 57% of our sample were in the paid labor force. If we predict that all women are working, we would be correct 57% of the time. Accordingly, the count should be adjusted to account for the largest row marginal:

    R²_AdjCount = [Σ_j n_jj − max_r(n_r₊)] / [N − max_r(n_r₊)]

where n_r₊ is the marginal for row r. Since max_r(n_r₊) is the largest row marginal, corresponding to the outcome with the most observed cases, R²_AdjCount is the proportion of correct predictions beyond the number of correct guesses obtained by simply choosing the largest marginal. The adjusted count R² is related to Goodman and Kruskal's λ (Bishop et al., 1975) applied to the classification table. Other measures of association could also be applied to the classification table (Menard, 1995, pp. 24 ff.).

Example of Count Measures: Labor Force Participation. Table 4.4 shows the observed and predicted outcomes from the logit model with the variables K5, K618, AGE, WC, HC, LWG, and INC. The row percentages indicate the percentage of cases with a given observed outcome that were predicted to be either 1's or 0's. They show that the model is better at predicting 1's than 0's. Overall, the count R² is .69, which can be compared to the 57% of the cases that were observed to be 1's. The adjusted count R² is smaller, reflecting only the degree to which knowledge of the independent variables reduces the errors in prediction beyond what is obtained by always guessing the modal category.

TABLE 4.4 Observed and Predicted Outcomes for the Logit Model of Labor Force Participation
[classification table for the 753 cases; among the recoverable entries are column totals of 266 and 487 (35.3% and 64.7%); the remaining entries were garbled in extraction.]

4.3.4. Information Measures

A different approach to assessing the fit of a model and for comparing competing models is based on measures of information. Akaike's information criterion (AIC) is a well-known measure, while the Bayesian information criterion (BIC) is a measure that is gaining increasing popularity. For a general discussion of information-based measures, see Judge et al. (1985, pp. 870-875).

Akaike's Information Criterion (AIC). Akaike's (1973) information criterion is defined as

    AIC = [−2 ln L̂(M_β) + 2P] / N    [4.12]

where L̂(M_β) is the likelihood of the model and P is the number of parameters in the model (e.g., K + 1 in the binary regression model, where K is the number of regressors). While Akaike (1973) formally derives AIC through the comparison of a given model to a set of inferior alternative models, here I only provide a heuristic motivation for the reasonableness of the formula.

L̂(M_β) indicates the likelihood of the data for the model, with larger values indicating a better fit. Correspondingly, −2 ln L̂(M_β) ranges from 0 to +∞, with smaller values indicating a better fit. As the number of parameters in the model becomes larger, −2 ln L̂(M_β) becomes smaller, since more parameters make what is observed more likely. 2P is added to −2 ln L̂(M_β) as a penalty for increasing the number of parameters. Since the number of observations affects −2 ln L̂(M_β), we divide by N to standardize the measure across sample sizes.
All else being equal, the model with the smaller AIC is considered the better-fitting model. AIC can be used to compare models across different samples or to compare nonnested models that cannot be compared with the LR test.

Bayesian Information Criterion (BIC). A second information criterion has been proposed by Raftery. In the literature, BIC is increasingly cited as a measure to assess the overall fit of a model and to allow the comparison of both nested and nonnested models. My presentation is based on Raftery (1996), which derives the formulas given below.

Consider two models. The posterior odds of M₂ relative to M₁ equal Pr(M₂ | data)/Pr(M₁ | data): M₂ would be preferred if the probability of M₂ given the observed data is greater than the probability of M₁. Under the assumption that the prior odds Pr(M₂)/Pr(M₁) of the two models equal 1 (i.e., we have no preference for one model over the other), Bayes's theorem can be used to show that the posterior odds equal the Bayes factor

    B₂₁ = Pr(data | M₂) / Pr(data | M₁)

M₂ would be chosen if the probability of the observed data given M₂ is greater than the probability of the observed data given M₁. Even if neither M₂ nor M₁ is the "true" model, the Bayes factor can be used to choose the model that will, on average, give better predictions (Raftery, 1996, p. 14).

The BIC statistic is a computationally convenient approximation to the Bayes factor. Given N observations, consider model M_k with deviance D(M_k) relative to the saturated model M_s, with df_k equal to the sample size minus the number of parameters in M_k. The first BIC measure compares M_k to what Raftery calls the saturated model:

    BIC_k = D(M_k) − df_k ln N    [4.13]

Since the saturated model has BIC equal to 0 (Why must this be the case?), the saturated model is preferred when BIC_k > 0. When BIC_k < 0, M_k is preferred; the more negative the BIC_k, the better the fit.

A second version of BIC is based on the LR chi-square in Equation 4.6, with df′_k equal to the number of regressors (not the number of parameters) in the model:

    BIC′_k = −G²(M_k) + df′_k ln N    [4.14]

For the null model without any regressors, BIC′ is 0. The null model is preferred when BIC′_k > 0, indicating that M_k includes too many parameters or uninformative variables. When BIC′_k < 0, then M_k is preferred; the more negative the BIC′_k, the better the fit. Basically, BIC′_k assesses whether M_k fits the data well enough to justify the number of parameters that are used.

Either BIC_k or BIC′_k can be used to compare models, whether or not they are nested. Raftery (1996) shows that

    BIC₁ − BIC₂ = BIC′₁ − BIC′₂    [4.15]

Thus, the difference in the BICs from two models indicates which model is more likely to have generated the observed data. Further, the difference in BICs approximates twice the log of the Bayes factor, so that the choice of which BIC measure to use is a matter of convenience.

Based on Equation 4.15, the model with the smaller (i.e., more negative) BIC or BIC′ is preferred. How strong the preference is depends on the magnitude of the difference. Raftery, building on Jeffreys (1961), suggested guidelines for assessing the strength of the evidence for one model over another based on the difference in BIC or BIC′. These are listed in Table 4.5. Since the model with the more negative BIC or BIC′ is preferred, if BIC₁ − BIC₂ < 0, then the first model is preferred; if BIC₁ − BIC₂ > 0, then the second model is preferred.
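The AIC and BIC′ formulas (Equations 4.12 and 4.14) can be sketched as follows. The log likelihoods, parameter counts, and sample size below are hypothetical, not the values reported in the text's tables:

```python
import numpy as np

N = 753
ll = {"M1": -452.6, "M2": -461.7}   # hypothetical log likelihoods
P = {"M1": 8, "M2": 6}              # parameters, including the intercept
ll_null = -514.9                    # intercept-only log likelihood

def aic(ll_k, P_k, N):
    # Equation 4.12: AIC = (-2 ln L + 2P) / N; smaller values are better
    return (-2 * ll_k + 2 * P_k) / N

def bic_prime(ll_k, n_regressors, N, ll_null):
    # Equation 4.14: BIC' = -G2 + df' ln N, with df' = number of regressors;
    # more negative values indicate a better model
    G2 = -2 * (ll_null - ll_k)
    return -G2 + n_regressors * np.log(N)

aic_1, aic_2 = aic(ll["M1"], P["M1"], N), aic(ll["M2"], P["M2"], N)
bicp_1 = bic_prime(ll["M1"], P["M1"] - 1, N, ll_null)
bicp_2 = bic_prime(ll["M2"], P["M2"] - 1, N, ll_null)
diff = bicp_1 - bicp_2   # by Equation 4.15, this equals BIC_1 - BIC_2
```

With these made-up numbers, both criteria favor M1; in general the two criteria need not agree, since AIC and BIC penalize parameters differently.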

TABLE 4.6 AIC and BIC for the Models M₁ and M₂
[for each model: ln L̂, df, P, and the resulting AIC, BIC, and BIC′; the numeric entries were garbled in extraction.]

To see the link between BIC and other measures of fit, consider the formula Raftery (1996) gives for computing BIC′ in the LRM:

    BIC′ = N ln(1 − R²) + p ln N

where p is the number of regressors. This formula can also be used for models with CLDVs by substituting R²_ML from Equation 4.10, since N ln(1 − R²_ML) = −G².

Example of Information Measures: Labor Force Participation. The AIC and BIC measures were computed for the logit model M₁ with the variables K5, K618, AGE, WC, HC, LWG, and INC, and for M₂, which adds a squared age term AGE2 and drops WC, HC, and LWG. Table 4.6 contains the resulting measures along with the quantities used to compute them. (You should verify the listed statistics using the formulas given above.) By both BIC and BIC′, model M₁ is favored:

    BIC₁ − BIC₂ = −4,029.66 − (−4,024.87) = −4.79
    BIC′₁ − BIC′₂ = −78.11 − (−73.32) = −4.79

As required by Equation 4.15, the two differences are equal. Using the guidelines in Table 4.5, the evidence favoring M₁ over M₂ is positive but not strong.

4.4. Conclusions

The tests and measures of fit presented in this chapter are quite general and can be used with the models presented throughout the rest of this book. The methods for detecting outliers and influential observations, however, apply only to models with binary outcomes. While some of the scalar measures of goodness of fit are only appropriate for models with binary outcomes, others apply with minor adjustments to any model estimated with ML.

4.5. Bibliographic Notes

The tests presented in this chapter have a long history. R. A. Fisher introduced the LR test in the 1920s, and A. Wald proposed the Wald test in the 1940s. Further details on these tests can be found in most econometrics texts. Godfrey (1988, pp. 8-20) and Cramer (1986, pp. 30-42) contain thorough discussions of the foundations of these tests. Buse (1982) provides an informative interpretation. Maddala (1992, pp. 118-124) presents an accessible discussion within the context of the linear regression model. Regression diagnostics for the binary response model were developed by Pregibon (1981). Amemiya (1981) and Windmeijer (1995) have reviews of measures of fit. Hosmer and Lemeshow (1989, Chapter 5) provide further details on diagnostics and tests of fit. The AIC was proposed by Akaike (1973). The BIC has been advocated by Raftery in a series of papers summarized in Raftery (1996), developed from Schwarz (1978) and Jeffreys (1961). See Judge et al. (1985, pp. 870-875) for a discussion of these and related measures.

5. Ordinal Outcomes: Ordered Logit and Ordered Probit Analysis

A variable is ordinal when its categories can be ranked from low to high, but the distances between adjacent categories are unknown. Ordinal outcomes are common in the social sciences. McKelvey and Zavoina (1975) studied votes for the 1965 Medicare bill, where each member of the House was classified according to his support for the bill. Another study examined factors affecting the assignment of Navy personnel to jobs that were ordered as medium skilled, highly skilled, and nuclear qualified. Winship and Mare (1984) modeled educational attainment, with education classified as less than 8 years of school, 8 to 12 years, or 13 or more years. Many studies have considered ordinal measures of occupational standing: one study of occupational attainment in China ranked workers from shift leader through middle and higher positions, and Hedström studied organizational rank in Sweden, where workers were classified as having low organizational rank, first-line responsibilities, or middle management positions. Finally, many surveys have respondents indicate their income within ordered categories, such as less than $15,000, between $15,000 and $25,000, and so on, or ask whether they strongly agree, agree, have no opinion, disagree, or strongly disagree with a statement.

Researchers often, and perhaps usually, treat ordinal variables as if they were interval. The categories are numbered sequentially and the LRM is used. This involves the assumption that the intervals between adjacent categories are equal. For example, the distance between strongly agreeing and agreeing is assumed to be the same as the distance between agreeing and having no opinion. Winship and Mare (1984) review the debate between those who argue that the ease of use, simple interpretation, and flexibility of the LRM justify its use with ordinal outcomes and those who argue that the bias introduced by applying the LRM to an ordinal outcome makes this practice unacceptable. Both McKelvey and Zavoina (1975, p. 117) and Winship and Mare (1984, pp. 521 ff.) give examples where regression of an ordinal outcome provides misleading results. Given this, prudent researchers should use models specifically designed for ordinal variables.

Before considering methods for ordinal outcomes, it is important to note that simply because the values of a variable can be ordered does not imply that the variable should be analyzed as ordinal. A variable might be ordered when considered for one purpose, but be unordered or ordered differently when used for another purpose. McCullagh and Nelder (1989, p. 151) make this point with the example of colors. While colors can be arranged according to the electromagnetic spectrum, this does not imply that this ordering is appropriate for all purposes. When consumers buy a car, there is no reason to believe that they prefer colors in an order that moves around the color wheel from red, to orange, to yellow, and so on. Miller and Volker (1985) illustrate this point in their analysis of occupational attainment. Occupational groupings were ordered both by the status of the occupations and by the income of the occupations. The alternative rankings resulted in different conclusions when analyzed with the ordered regression model. Another example is Likert scales, which might reflect two dimensions, intensity and opinion, rather than a single dimension. While in one sense the categories strongly agree, agree, neutral, disagree, and strongly disagree are ordered, if a researcher is interested in the intensity of opinion, then the ordering should be strongly agree or strongly disagree, followed by agree or disagree, followed by neutral. When the proper ordering of a variable is ambiguous, the models for nominal variables in Chapter 6 should be considered in addition to models for ordinal variables.

In this chapter, I focus on the ordered logit and the ordered probit models. Since these models are so closely related, I refer to them together as the ordered regression model, abbreviated as ORM. Within the social sciences, this model was introduced by McKelvey and Zavoina (1975).


5.1. A Latent Variable Model for Ordinal Variables

The ORM can be developed from a measurement model in which a latent variable y* ranging from −∞ to ∞ is mapped to an observed variable y. The observed y is thought of as providing incomplete information about the underlying y* according to the measurement equation:

    y_i = m    if τ_{m−1} ≤ y*_i < τ_m    for m = 1 to J    [5.1]

The τ's are thresholds or cutpoints. The extreme categories 1 and J are defined by open-ended intervals with τ₀ = −∞ and τ_J = ∞. (When J = 2, this reduces to the measurement equation for the binary response model in Chapter 3.)

This mapping from the latent variable to the observed categories is illustrated in the following diagram:

            tau_1       tau_2       tau_3
    -inf -----|-----------|-----------|----- +inf    y*
        y = 1      y = 2       y = 3      y = 4

The solid line represents the latent variable y*; the cutpoints are indicated by the marks labeled τ₁, τ₂, and τ₃; and the values of the observed y over each region of y* are marked below.

As with the binary response model, the structural model is

    y*_i = x_i β + ε_i

where x_i is a row vector with a 1 in the first column for the intercept and the ith observation of x_k in column k + 1, and β is a column vector of structural coefficients with the first element being the intercept β₀. For reasons that will become clear when I discuss identification, the intercept is included for now.

For a single independent variable, the structural model is y* = α + βx + ε. This is plotted in panel A of Figure 5.1. The latent y* is on the vertical axis, with values labeled to give you a sense of its scale. The thresholds τ₁, τ₂, and τ₃ are indicated by dashed lines that divide y* into the four values of the observed y; τ₀ = −∞ lies below and τ₄ = ∞ above. The values of the observed y are shown at the right. The regression line y* = α + βx, with a slope of .1, is shown as a thick line. Since y* is unobserved, α and β cannot be estimated by regressing y on x.

Panel B plots the observed y against x. The observations are constructed from those in panel A by assigning all cases with y* above τ₃ to y = 4, cases with y* between τ₂ and τ₃ to y = 3, and so on. The OLS estimate of the regression of y on x is indicated by the dashed line, with an estimated slope of .026. Regressing y on x does not recover the regression of y* on x, whose slope is four times larger. The regression lines in panels A and B only look similar because the scales of the vertical axes differ.
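The measurement equation 5.1 can be sketched directly in code; the thresholds and latent values below are hypothetical.

```python
import numpy as np

# y = m if tau_{m-1} <= y* < tau_m, with tau_0 = -inf and tau_J = inf implicit
tau = np.array([-2.0, 0.5, 3.0])              # tau_1, tau_2, tau_3 for J = 4
ystar = np.array([-3.2, -1.0, 0.4, 2.9, 7.1])  # hypothetical latent values

# np.digitize counts the thresholds at or below each y*; adding 1 yields
# observed categories coded 1..J
y = np.digitize(ystar, tau) + 1
```

Collapsing y* this way is exactly why the OLS slope in panel B of Figure 5.1 understates the latent slope: the spacing information in y* is discarded.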
REGRFSSION MODELS Ordinal OwnJ/lu:s
¡le¡

[Figure 5.1. Panel A: Regression of Latent y*; Panel B: Regression of Observed y. The vertical axes are marked with the thresholds τ₁, τ₂, τ₃ and the observed categories.]

The correspondence between the two regressions holds only if the thresholds are all about the same distance apart. When this is not the case, the LRM can give very misleading results.

Figure 5.1 also illustrates an important property of the ORM. In panel A, you could add another threshold without changing the structural model. Imagine a horizontal line between τ₁ and τ₂. This would correspond to adding another category to the ordinal scale, such as a category "Neutral" between Disagree and Agree. The regression line for y* on x would not be affected. In panel B, however, the new category would correspond to a new horizontal row of observations, which would affect the results of the regression of y on x.

5.1.1. Distributional Assumptions

As with the BRM, ML estimation can be used to estimate the regression of y* on x. To use ML, we must assume a specific form of the error distribution. Once again, we consider normal and logistic errors. Other distributions for the errors have been proposed, but are not used often (see McCullagh, 1980, p. 115, for details).

For the ordered probit model, ε is distributed normally with mean 0 and variance 1. The pdf is

    φ(ε) = (1/√(2π)) exp(−ε²/2)

and the cdf is

    Φ(ε) = ∫_{−∞}^{ε} (1/√(2π)) exp(−t²/2) dt

For the ordered logit model, ε has a logistic distribution with a mean of 0 and a variance of π²/3. The pdf is

    λ(ε) = exp(ε) / [1 + exp(ε)]²

and the cdf is

    Λ(ε) = exp(ε) / [1 + exp(ε)]

For the rest of this chapter, F represents either Φ or Λ, and f represents either φ or λ.
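A quick numerical check of the two error distributions (a sketch using scipy; nothing here is specific to the text's example):

```python
import numpy as np
from scipy.stats import logistic, norm

# Ordered probit: eps ~ N(0, 1). Ordered logit: eps ~ standard logistic,
# which has mean 0 and variance pi^2 / 3.
assert abs(norm.var() - 1.0) < 1e-9
assert abs(logistic.var() - np.pi ** 2 / 3) < 1e-6

# Both cdfs are symmetric around 0, so F(0) = 0.5 for each
assert abs(norm.cdf(0.0) - 0.5) < 1e-12
assert abs(logistic.cdf(0.0) - 0.5) < 1e-12
```

The larger logistic variance is why ordered logit coefficients run roughly √(π²/3) ≈ 1.8 times their ordered probit counterparts, a ratio that reappears in the discussion of standardized coefficients in Section 5.4.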

As with binary models, the choice between the logit and probit versions of the model is largely a matter of convenience. If you conceive of the errors as normal, use the ordered probit model. On the other hand, interpretation in terms of odds ratios in Section 5.4.5 requires the ordered logit model. Otherwise, the choice is likely to depend on software availability and which model is most common in your area of research.

5.1.2. The Probabilities of Observed Values

Given these assumptions, we can compute the probability of each observed outcome for a given x. To see this, consider Figure 5.2, which illustrates the distribution of y* for three values of x. The errors are distributed around the regression line E(y* | x) = xβ, and the probability of outcome m corresponds to the area of the distribution between the cutpoints τ_{m−1} and τ_m.

[Figure 5.2. The ordered regression model: the distribution of y* around the regression line at three values of x, with the areas between adjacent τ's corresponding to the observed outcomes 1 through 4.]

Formally, the probabilities are derived as follows. First, consider the formula for the probability that y = 1. We observe y = 1 when y* falls between −∞ and τ₁. This implies that

    Pr(y_i = 1 | x_i) = Pr(τ₀ ≤ y*_i < τ₁ | x_i)

Substituting y* = xβ + ε,

    Pr(y_i = 1 | x_i) = Pr(τ₀ ≤ x_i β + ε_i < τ₁ | x_i)

Then, subtracting x_i β within the inequality,

    Pr(y_i = 1 | x_i) = Pr(τ₀ − x_i β ≤ ε_i < τ₁ − x_i β | x_i)

The probability that a random variable is between two values is the difference between the cdf evaluated at these values. Therefore,

    Pr(y_i = 1 | x_i) = F(τ₁ − x_i β) − F(τ₀ − x_i β)

These steps can be generalized to compute the probability of any observed outcome y = m given x:

    Pr(y_i = m | x_i) = F(τ_m − x_i β) − F(τ_{m−1} − x_i β)    [5.3]

When computing Pr(y = 1 | x), the second term drops out since F(τ₀ − xβ) = F(−∞ − xβ) = 0; when computing Pr(y = J | x), the first term equals 1 since F(τ_J − xβ) = F(∞ − xβ) = 1. Thus, for a model with four observed outcomes, such as that shown in Figure 5.2, the formulas for the ordered probit model are:

    Pr(y_i = 1 | x_i) = Φ(τ₁ − α − βx_i)
    Pr(y_i = 2 | x_i) = Φ(τ₂ − α − βx_i) − Φ(τ₁ − α − βx_i)
    Pr(y_i = 3 | x_i) = Φ(τ₃ − α − βx_i) − Φ(τ₂ − α − βx_i)
    Pr(y_i = 4 | x_i) = 1 − Φ(τ₃ − α − βx_i)

For example, given values of α, β, and the thresholds (with τ₂ = 3.5), the following probabilities are obtained for three values of x (Reproduce this table.):

    Predicted Probability    x = [?]    x = 40    x = 80
    Pr(y = 1 | x)            0.68       0.20      0.00
    Pr(y = 2 | x)            0.32       0.77      0.44
    Pr(y = 3 | x)            0.00       0.03      0.47
    Pr(y = 4 | x)            0.00       0.00      0.09

[The first column's value of x and the remaining parameter values were lost in reproduction.]
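Equation 5.3 translates directly into code. The parameters below are hypothetical (only τ₂ = 3.5 echoes the example above, whose remaining parameter values were lost in reproduction):

```python
import numpy as np
from scipy.stats import norm

a, b = 0.0, 0.05                                  # hypothetical alpha, beta
tau = np.array([-np.inf, 1.0, 3.5, 5.0, np.inf])  # tau_0 .. tau_4 for J = 4

def ordered_probit_probs(x):
    # Pr(y = m | x) = Phi(tau_m - a - b*x) - Phi(tau_{m-1} - a - b*x);
    # Phi(-inf) = 0 and Phi(inf) = 1 handle the end categories automatically
    return np.diff(norm.cdf(tau - a - b * x))

for x in (20, 40, 80):
    p = ordered_probit_probs(x)
    assert abs(p.sum() - 1.0) < 1e-12   # each row of such a table sums to one
```

Since F is a cdf, the four probabilities are guaranteed to be nonnegative and to sum to one at every x, just as in the table above.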

The ORM assumes that there is an underlying, continuous latent variable behind the observed ordinal outcome. In some cases this assumption may seem untenable; for academic rank classified as assistant, associate, and full professor, for instance, positing an underlying continuous variable may be questioned. (Debate over whether ordered categories reflect an underlying continuum goes back to G. U. Yule and K. Pearson; see Agresti, 1996, Chapter 10. Of this book's reviewers, one criticized me for treating an ordinal outcome as generated by a latent continuous variable, while another chastised me for suggesting that the model could be used without a latent variable behind it.)

5.2. Identification

The ORM as just presented has more sets of parameters than can be distinguished by the observed data: the model is unidentified. To see this, consider the probability of observing y = m given x for the "true" parameters:

    Pr(y = m | x) = F(τ_m − β₀ − βx) − F(τ_{m−1} − β₀ − βx)    [5.4]

Now add an arbitrary constant δ to the intercept and to every threshold, defining β₀* = β₀ + δ and τ_m* = τ_m + δ. The resulting probability

    Pr(y = m | x) = F(τ_m* − β₀* − βx) − F(τ_{m−1}* − β₀* − βx)    [5.5]

is identical for every m, since the δ's cancel:

    F([τ_m + δ] − [β₀ + δ] − βx) = F(τ_m − β₀ − βx)    [5.6]

Since both sets of parameters give the same value for the probability of an observed outcome, there is no way to choose between the two: a change in the intercept can be exactly offset by a change in the thresholds. While there is an infinite number of constraints that would identify the model, two are commonly used:

1. Assume that τ₁ = 0. This parallels the assumption used for the binary model in Chapter 3.
2. Assume that β₀ = 0.

Both assumptions identify the model by imposing a constraint on one of the parameters. (Are there other constraints that would identify the model?) The different identifying assumptions lead to what are known as different parameterizations of the model. The choice of which parameterization to use is arbitrary and does not affect the estimates of the slopes or the associated significance tests. Further, as shown by Equation 5.6, the predicted probabilities are not affected by the identifying assumption. However, understanding the different parameterizations is important, since different software uses different parameterizations (see Section 5.3.1).
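The lack of identification is easy to verify numerically: shifting the intercept and every threshold by the same constant δ leaves all predicted probabilities unchanged. The values below are hypothetical.

```python
import numpy as np
from scipy.stats import norm

b0, b, x = 0.4, 0.7, 1.3                          # hypothetical parameters
tau = np.array([-np.inf, 0.0, 1.2, 2.5, np.inf])

def probs(b0, tau):
    # Equations 5.4-5.5: category probabilities for a single x
    return np.diff(norm.cdf(tau - b0 - b * x))

d = 5.0
# tau + d leaves the -inf/+inf endpoints unchanged, as it should
assert np.allclose(probs(b0, tau), probs(b0 + d, tau + d))
```

This is why a constraint such as τ₁ = 0 or β₀ = 0 is needed: it pins down δ without affecting any probability.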

5.3. Estimation
Let β be the vector of parameters from the structural model, with the intercept β₀ in the first row, and let τ be the vector containing the threshold parameters. Either β₀ or τ₁ is constrained to 0 to identify the model. From Equation 5.3, the probability of outcome m is

    Pr(y_i = m | x_i, β, τ) = F(τ_m − x_i β) − F(τ_{m−1} − x_i β)

The probability of observing whatever value of y was actually observed for the ith observation is

    p_i = Pr(y_i = m | x_i, β, τ)    if y_i = m    [5.7]

If y_i = m, then p_i from Equation 5.7 equals the probability in Equation 5.3:

    p_i = F(τ_m − x_i β) − F(τ_{m−1} − x_i β)    [5.8]

Combining Equations 5.7 and 5.8, taking logs, and summing over all observations, the log likelihood is

    ln L = Σ_i ln p_i = Σ_{m=1}^{J} Σ_{y_i = m} ln[F(τ_m − x_i β) − F(τ_{m−1} − x_i β)]

where Σ_{y_i = m} indicates summing over all cases where y is observed to equal m. Maddala (1983) presents the gradient used in the numerical maximization and reviews Pratt's (1981) proof that the log likelihood is globally concave, so that Newton-Raphson will converge to the global maximum. The resulting estimates are consistent and asymptotically normal.

5.3.1. Software Issues

There are several issues related to software that should be considered when estimating the ORM.

Parameterization of the Model. The most important issue is knowing which parameterization your program uses. Programs such as LIMDEP assume that τ₁ = 0 and estimate the intercept β₀, while programs such as Markov, SAS's LOGISTIC, and Stata assume that β₀ = 0 and estimate τ₁. The parameterization does not affect estimates of the slopes, but it does affect the intercept and the τ's. In SAS's LOGISTIC, you must use the DESCENDING option in order for the estimates to match the signs of those given here. Without this option, the model is in effect specified as y* = −xβ + ε.

Methods of Numerical Maximization. Different programs use different methods of numerical maximization. Regardless of the method used, programs should produce the same estimates of the parameters up to five significant digits, but the standard errors and test statistics can differ substantially, especially with small samples or with ill-conditioned data.

Failure to Converge. In my experience, the ORM takes longer to converge than other models considered in this book; between five and ten iterations is typical. If the number of cases in a response category is small, the model may fail to converge. When this occurs, estimation can proceed by merging the outcome category with a small number of cases into an adjacent category. The only effect of combining adjacent categories is a loss of efficiency (McCullagh, 1980). This is a consequence of the parallel regression assumption, which is discussed in Section 5.5.

Binary Outcome. With J = 2 and the constraint τ₁ = 0, the ORM is identical to the BRM after a change in notation. However, some programs for the ORM have been optimized for J > 2 and do not work when J = 2.
5.3.2. Example of the ORM and the LRM: Attitudes Toward Working Mothers

In 1977 and 1989, the General Social Survey asked respondents to evaluate the following statement: "A working mother can establish just as warm and secure a relationship with her children as a mother who does not work." Responses were coded in the variable WARM as: 1 = Strongly Disagree (SD); 2 = Disagree (D); 3 = Agree (A); and 4 = Strongly Agree (SA). With a sample of 2,293, the marginal percentages are 13, 32, 37, and 18, respectively. The independent variables used in our analysis are described in Table 5.1. See Clogg and Shihadeh (1994, pp. 158 ff.) for an alternative analysis of the same data.

Table 5.2 contains the estimates from four models. Column 1 contains OLS estimates for the LRM:

    WARM = β₀ + β₁YR89 + β₂MALE + β₃WHITE + β₄AGE + β₅ED + β₆PRST + ε

Column 2 contains estimates from the ordered probit model with the constraint that τ₁ = 0; column 3 contains estimates from the ordered probit model with the constraint that β₀ = 0; and column 4 contains estimates from the ordered logit model with β₀ = 0.

TABLE 5.1 Descriptive Statistics for the Working Mothers Example

    Variable    Description
    WARM        1 = SD; 2 = D; 3 = A; 4 = SA
    YR89        Survey year: 1 = 1989; 0 = 1977
    MALE        1 = male; 0 = female
    WHITE       1 = white; 0 = nonwhite
    AGE         Age in years
    ED          Years of education
    PRST        Occupational prestige

[The table's columns of means and standard deviations were garbled in extraction.]

TABLE 5.2 Comparison of Linear Regression and Alternative Parameterizations of the Ordered Regression Model
[estimates and z-values for the LRM, the two parameterizations of the ordered probit model, and the ordered logit model; the numeric entries were garbled in extraction.]

There are several things to notice before we consider how these results can be interpreted. First, the estimates for the LRM are similar to those for the ordered probit model: the z-values of the LRM estimates are very similar to those from the ordered probit model, and the LRM estimates themselves are close to those for the ordered probit model. To see why, examine the estimates of the τ's. The distances between adjacent thresholds in the ordered probit model are roughly equal, so the implicit assumption of the LRM that the ordinal categories are equally spaced is approximately satisfied here. There is no guarantee that this will generally be the case.

Second, the estimates from the ordered logit and ordered probit models differ in magnitude because the two models assume different variances for the errors; while the coefficients are quite different in size, the differences largely reflect the assumed variances.

Third, the two parameterizations of the ordered probit model produce identical slope estimates, while the intercept and thresholds differ, reflecting the different constraints imposed on the intercept and thresholds in the model.

5.4. Interpretation

If the idea of a continuous, latent variable makes substantive sense, simple interpretations are possible by rescaling the latent variable to a unit variance and computing y*-standardized and fully standardized coefficients. When concern is with the observed categories, regardless of whether a latent variable is assumed, methods from Chapter 3 can be extended to the case of multiple outcomes: the predicted probabilities of the observed outcomes can be presented in tables or plots, partial and discrete changes in the probabilities can be computed, and the ordered logit model can be interpreted in terms of odds ratios.

5.4.1. Partial Change in y*

From the structural model, the partial change in y* with respect to x_k is

    ∂y*/∂x_k = β_k

which can be interpreted as:

• For a unit increase in x_k, y* is expected to change by β_k units, holding all other variables constant.

Since the metric of y* cannot be determined from the observed data, the substantive meaning of a change of β_k units is unclear. As discussed by McKelvey and Zavoina (1975) and Winship and Mare (1984), y*-standardized and fully standardized coefficients can be used instead. If σ_y* is the standard deviation of the latent y*, then the y*-standardized coefficient is

    β_k^(Sy*) = β_k / σ_y*

which can be interpreted as:

• For a unit increase in x_k, y* is expected to change by β_k^(Sy*) standard deviations, holding all other variables constant.

If you are also willing to consider the variability of the x's, the fully standardized coefficient is

    β_k^S = σ_k β_k / σ_y*

The fully standardized coefficient can be interpreted as:

• For a standard deviation increase in x_k, y* is expected to change by β_k^S standard deviations, holding all other variables constant.

As with the BRM, the variance of y* can be estimated using the quadratic form:

    Var̂(y*) = β̂′ Var̂(x) β̂ + Var(ε)    [5.9]

where Var̂(x) is the covariance matrix for the x's computed from the observed data; β̂ contains the ML estimates; and Var(ε) = 1 in the ordered probit model and Var(ε) = π²/3 in the ordered logit model.

The coefficients in Table 5.3 were computed from the coefficients in Table 5.2 and the descriptive statistics from Table 5.1. The variance of y* was estimated using Equation 5.9, giving 3.77 for the ordered logit model and 1.16 for the ordered probit model. Notice that 3.77/1.16 = 3.25, which is close to the ratio of the assumed error variances: Var(ε_logit)/Var(ε_probit) = (π²/3)/1 = 3.29. The difference in the variance of y* for the two models is reflected in the magnitudes of the unstandardized β's, where the coefficients from the ordered logit model are 1.6 to 1.8 times larger than those for the ordered probit model. The fully standardized and y*-standardized coefficients are nearly identical across the two models, since the scale of y* has been eliminated by dividing by σ̂_y*. (Why are they not exactly equal?)

TABLE 5.3 Standardized Coefficients for the Ordered Regression Model
[unstandardized, y*-standardized, and fully standardized coefficients for the ordered logit and ordered probit models; the numeric entries were garbled in extraction.]
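Equation 5.9 and the standardized coefficients can be sketched as follows; the slopes and covariance matrix are hypothetical, not the estimates from Table 5.2:

```python
import numpy as np

b = np.array([0.52, -0.73, -0.35])      # hypothetical ML slope estimates
Vx = np.array([[0.25, 0.01, 0.02],      # hypothetical covariance matrix of x's
               [0.01, 0.25, 0.00],
               [0.02, 0.00, 0.21]])

var_e = np.pi ** 2 / 3                  # ordered logit; use 1.0 for probit
sd_ystar = np.sqrt(b @ Vx @ b + var_e)  # Equation 5.9

sd_x = np.sqrt(np.diag(Vx))
b_ystar_std = b / sd_ystar              # y*-standardized coefficients
b_full_std = b_ystar_std * sd_x         # fully standardized coefficients
```

Dividing by σ̂_y* removes the arbitrary scale of the latent variable, which is why the standardized coefficients are nearly invariant to the choice between logit and probit.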
The y*-standardized coefficients can be interpreted as follows:

• In 1989, support for working mothers was higher than in 1977, holding all other variables constant.
• Each additional year of age decreases support by .01 standard deviations, holding all other variables constant.
• Each additional year of education increases support, holding all other variables constant.

If the idea of an underlying latent variable is inappropriate, or if interest is in the probabilities of the observed outcomes (for example, the probability of strongly agreeing), methods based on the predicted probabilities of the observed outcomes can be used.

5.4.2. Predicted Probabilities

The predicted probability that y = m given x is

    Pr̂(y = m | x) = F(τ̂_m − x β̂) − F(τ̂_{m−1} − x β̂)

These probabilities can be used in a variety of ways to show the relationships between the independent variables and the categories of the dependent variable.

The Mean and Range of Predicted Probabilities. To assess whether there is enough variation in the predicted probabilities to justify further analysis, compute the mean, minimum, and maximum predicted probability of each outcome within the sample:

    mean Pr(y = m | x) = (1/N) Σ_i Pr̂(y_i = m | x_i)
    min Pr(y = m | x) = min_i Pr̂(y_i = m | x_i)
    max Pr(y = m | x) = max_i Pr̂(y_i = m | x_i)

where min_i indicates the minimum predicted probability over all observations, and similarly for max_i. These values are presented in Table 5.4. Consider the outcome SD. Within the sample, the minimum predicted probability of strongly disagreeing is .02 and the maximum is .47, resulting in a range of .45. Similar results are listed for the other categories. In our example, there is sufficient variation in each category to justify further analysis. When the range is too small to be of substantive interest, further analysis is unnecessary.

TABLE 5.4 Predicted Probabilities of Outcomes Within the Sample for the Ordered Logit Model

                SD      D       A       SA
    Minimum    0.02
    Mean       0.13
    Maximum    0.47
    Range      0.45

[the entries for D, A, and SA were garbled in extraction]

Plotting Predicted Probabilities. With a single independent variable, the entire probability curve can be plotted. When there are more variables, the effect of a variable can be examined while the remaining variables are held constant. For example, the effect of age on the probability of each outcome can be plotted by holding all other variables constant and allowing age to vary. To do this, let x* contain a 1 in the first column for the intercept, a 1 in the second column to specify the survey year 1989, a 0 in the third column to select women, and the means for the remaining variables, with age taking a series of values. Then

    Pr̂(WARM = m | x*) = F(τ̂_m − x* β̂) − F(τ̂_{m−1} − x* β̂)

is the predicted probability of outcome m for women in 1989 of a given age who are average on all other characteristics.

These probabilities are plotted in panel A of Figure 5.3. Consider the probability of strongly agreeing, which is indicated by the circles. At age 20, the probability is .39. As age increases, the probability decreases to .25 at age 50 and .15 at age 80. The probability of disagreeing is nearly the mirror image: it begins at .16 at age 20 and ends at .34 at age 80. There is a smaller change in the probability of strongly disagreeing, indicated by the diamonds, which starts at .04 and ends at .12.
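The mean, minimum, and maximum predicted probabilities are a short computation once the N by J matrix of probabilities is formed. Everything below is hypothetical (simulated x's, made-up parameters), shown only to illustrate the computation behind a table like Table 5.4:

```python
import numpy as np
from scipy.stats import logistic

rng = np.random.default_rng(0)
x = rng.normal(size=200)                          # hypothetical regressor
b = 1.1
tau = np.array([-np.inf, -1.0, 0.3, 1.5, np.inf])

# N x J matrix of predicted probabilities from Equation 5.3
P = np.diff(logistic.cdf(tau[None, :] - b * x[:, None]), axis=1)

p_min, p_mean, p_max = P.min(axis=0), P.mean(axis=0), P.max(axis=0)
p_range = p_max - p_min   # a small range suggests there is little to interpret
```

Each row of P sums to one, so the column summaries describe how much each outcome's probability moves across the sample.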

The probability of agreeing, shown by the squares, illustrates a pattern that is unusual relative to binary models: the probability begins at .42, increases, and then decreases. The effect of age on agreeing is not monotonic. This occurs because as age initially increases, more cases move from SA into A than move from A into D, so the probability of A increases. When age is larger, more cases leave A for D than enter A from SA, resulting in a smaller probability. (Study Figure 5.3 until you are convinced of this.)

[Figure 5.3. Predicted and Cumulative Probabilities for Women in 1989. Panel A: predicted probabilities of each outcome by age; Panel B: cumulative probabilities by age.]

Plotting Cumulative Probabilities. The cumulative probability is the probability that the outcome is less than or equal to some value. The cumulative probability of being less than or equal to m is

    Pr(y ≤ m | x) = Σ_{j=1}^{m} Pr(y = j | x) = F(τ_m − xβ)

(Prove this equality.) In our example, Pr(y ≤ 1 | x) would be the probability of strongly disagreeing, Pr(y ≤ 2 | x) the probability of strongly disagreeing or disagreeing, and so on. These probabilities can be plotted to uncover overall trends. The cumulative probabilities from our example are plotted in panel B of Figure 5.3. Notice that the cumulative probabilities "stack" the individual probabilities and show the overall increase with age in negative attitudes toward the statement that a working mother can establish just as warm and secure a relationship with her child as a mother who does not work.

Tables of Predicted Probabilities. Tables can also be used to present predicted probabilities. Table 5.5 presents the predicted probabilities for men and women by year of survey, along with differences by sex within years and across years. The first thing to notice is that men are more likely than women to strongly disagree or disagree with the proposition that a working mother can establish just as warm and secure a relationship with her child as a mother who does not work, and men are less likely to agree and strongly agree. Second, between 1977 and 1989, there was movement for both men and women toward more positive attitudes.
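The equality Pr(y ≤ m | x) = F(τ_m − xβ), which the text asks you to prove, can at least be confirmed numerically (all values hypothetical):

```python
import numpy as np
from scipy.stats import logistic

b, x = 1.1, 0.25
tau = np.array([-1.0, 0.3, 1.5])   # tau_1 .. tau_3 for J = 4

# Category probabilities from Equation 5.3, then cumulative sums
p_cat = np.diff(logistic.cdf(np.r_[-np.inf, tau, np.inf] - b * x))
cum_sums = np.cumsum(p_cat)[:-1]    # Pr(y <= m | x) for m = 1..3

# Direct computation: F(tau_m - x b)
cum_direct = logistic.cdf(tau - b * x)
assert np.allclose(cum_sums, cum_direct)
```

The sum of differences of cdfs telescopes to a single cdf, which is why cumulative probability plots such as panel B of Figure 5.3 are so simple to construct.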

T,\HU: 5.6 at

I'Ílriahle

AUE
l:f)
PRST

The partial change, or marginal effect, is obtained by taking the partial derivative with respect to x_k:

d Pr(y = m | x) / d x_k = [f(tau_{m-1} - xB) - f(tau_m - xB)] beta_k

where f is the probability density function corresponding to F. Since the marginal effect depends on the levels of all variables, we must decide which values of the variables to use when computing the effect. One method averages the effect over all observations:

mean effect = (1/N) SUM_{i=1}^{N} d Pr(y = m | x_i) / d x_k

More commonly, the effect is computed at the mean values of all variables. Or, the marginal effect can be evaluated at other values. For example, Table 5.6 contains the partial changes in the probabilities for women in 1989. These are computed holding MALE at 0 and YR89 at 1, with the other variables at their means.

In general, the marginal effect does not indicate the amount of change in the probability that would be observed for a unit change in x_k. However, if an independent variable varies over a region of the probability curve that is nearly linear, the marginal effect can be used to summarize the effect of a unit change in the variable on the probability of an outcome. For example, given the nearly linear shape of the curve relating age to the probability of disagreeing shown in panel A of Figure 5.3:

• For women in 1989, each additional 10 years of age increases the probability of disagreeing that a mother can establish as warm and secure a relationship with her child as a mother who does not work by .032, holding all other variables constant.

The value .032 is 10 times the marginal effect of age for outcome D. Beware that this interpretation is only reasonable when the probability curve is nearly linear over the region of change. Note also that the sign of the marginal effect is not necessarily the same as the sign of beta_k, since the sign of f(tau_{m-1} - xB) - f(tau_m - xB) changes with the levels of the variables. This is seen in panel A of Figure 5.3 for the curve for agreeing: before about age 40 the marginal effect of age is positive, while after age 40 increasing age decreases the probability of agreeing.

5.4.4. Discrete Change

Interpretation using marginal effects can be problematic when the probability curve is changing rapidly or when an independent variable is a dummy variable. For the ORM, I find that measures of discrete change are much more informative.
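The partial change formula of Section 5.4.3 can be sketched directly and checked against a numerical derivative. The coefficients and cutpoints are again hypothetical illustrations:

```python
import numpy as np

def ordered_logit_probs(x, beta, tau):
    """Pr(y = m | x) for the ordered logit model."""
    cuts = np.concatenate(([-np.inf], tau, [np.inf]))
    F = 1.0 / (1.0 + np.exp(-(cuts - np.dot(x, beta))))
    return np.diff(F)

def marginal_change(x, beta, tau, k):
    """d Pr(y = m | x) / d x_k = [f(tau_{m-1} - xb) - f(tau_m - xb)] * beta_k."""
    cuts = np.concatenate(([-np.inf], tau, [np.inf]))
    F = 1.0 / (1.0 + np.exp(-(cuts - np.dot(x, beta))))
    f = F * (1.0 - F)                  # logistic pdf; equals 0 at the infinite cuts
    return (f[:-1] - f[1:]) * beta[k]

beta = np.array([-0.02, 0.09])         # hypothetical coefficients
tau = np.array([-2.0, 0.0, 2.0])
xbar = np.array([40.0, 12.0])          # evaluate at illustrative "mean" values

me = marginal_change(xbar, beta, tau, k=0)
print(me, me.sum())                    # marginal effects sum to 0 across outcomes
```

That the effects sum to zero across the outcome categories follows from differencing the densities at the padded cutpoints.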

TABLE 5.7 Discrete Change in the Probabilities of Attitudes About Working Mothers for the Ordered Logit Model (rows for YR89, MALE, WHITE, AGE, ED, and PRST; changes from 0 to 1, unit and standard deviation changes centered on the mean, and changes over the range; the numerical entries are not fully recoverable from the scan)

The discrete change in the probability for a change of delta in x_k equals

Delta Pr(y = m | x) / Delta x_k = Pr(y = m | x, x_k + delta) - Pr(y = m | x, x_k)

where Pr(y = m | x, x_k) is the probability of outcome m holding all other variables at x. Since the model is nonlinear, the value of the discrete change depends on three factors: (1) the level of all of the variables that are not changing; (2) the value at which x_k starts; and (3) the amount of change in x_k. Most simply, each continuous variable x_k is held at its mean. For dummy variables, the change might be computed for both values of the variable. For example, we could compute the discrete

change when the variable goes from its minimum to its maximum value. The appropriate amount of change depends on the purpose of the analysis. Useful choices include:

• The total possible effect of x_k is found by letting x_k change from its minimum to its maximum.
• The effect of a dummy variable is obtained by letting x_k change from 0 to 1.
• The effect of a unit change in x_k, centered on the mean, is computed by changing from xbar_k - 1/2 to xbar_k + 1/2.
• The effect of a standard deviation change in x_k is computed by changing from xbar_k - s_k/2 to xbar_k + s_k/2.

Table 5.7 contains measures of discrete change for our example using the ordered logit model. For dummy variables, I present the change in the probability when the variable changes from 0 to 1. For example:

• The probability of strongly disagreeing is .08 higher for men than women, holding all other variables at their means.

For variables that are not dummy variables, I examine the change in the probability for a unit change centered on the mean, for a standard deviation change centered on the mean, and for a change from the minimum to the maximum. For example:

• For each additional year of education, the probability of strongly agreeing increases by .01, holding all other variables constant at their means.
• For a standard deviation increase in age, the probability of disagreeing increases by .05, holding all other variables constant at their means.
• Moving from the minimum to the maximum of prestige changes the predicted probability by .06, holding all other variables constant at their means.

The effects of a variable can be summarized by the average of the absolute values of the changes across all of the outcome categories. The absolute value is taken since the sum of the changes without taking the absolute value is 0. The average absolute discrete change equals

avg |Delta| = (1/J) SUM_{j=1}^{J} | Delta Pr(y = j | x) / Delta x_k |

These values are listed in the last column of the table. Clearly, the respondent's sex, education, and age have the strongest effects on attitudes about whether a working mother can establish as warm a relationship with her child as a mother who does not work.
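A minimal sketch of discrete change and the average absolute discrete change, using hypothetical coefficients rather than the estimates from the text:

```python
import numpy as np

def ordered_logit_probs(x, beta, tau):
    """Pr(y = m | x) for the ordered logit model."""
    cuts = np.concatenate(([-np.inf], tau, [np.inf]))
    F = 1.0 / (1.0 + np.exp(-(cuts - np.dot(x, beta))))
    return np.diff(F)

def discrete_change(x, beta, tau, k, delta):
    """Pr(y = m | x, x_k + delta) - Pr(y = m | x, x_k) for each outcome m."""
    x1 = x.copy()
    x1[k] += delta
    return ordered_logit_probs(x1, beta, tau) - ordered_logit_probs(x, beta, tau)

beta = np.array([-0.02, 0.09])      # hypothetical coefficients
tau = np.array([-2.0, 0.0, 2.0])    # J = 4 categories
xbar = np.array([40.0, 12.0])       # other variables held at illustrative means

dc = discrete_change(xbar, beta, tau, k=0, delta=10.0)   # 10 more years of age
avg_abs = np.mean(np.abs(dc))       # average absolute discrete change
print(dc, dc.sum())                 # the changes sum to 0 across categories
print(avg_abs)
```

Because the probabilities sum to 1 both before and after the change, the discrete changes across the J categories necessarily sum to zero, which is why the summary takes absolute values.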


The analysis of discrete change can be extended in many ways, depending on the substantive questions of the research. If variables are highly skewed, changes computed at the mean can be misleading, and changes computed at other locations may be more useful. If a change of a specific amount is substantively interesting, such as the addition of four years of education, it can be more informative than a standard deviation change. If nominal variables are included, you may want to compute changes separately for the groups defined by those variables.

5.4.5. Modeling Odds in the Ordered Logit Model

When F is logistic, the model is referred to as the proportional odds model (Agresti, 1990, p. 322; McCullagh & Nelder, 1989). This is because the model is often interpreted in terms of cumulative odds. The cumulative probability is the probability that the outcome is less than or equal to m:

Pr(y <= m | x) = SUM_{j=1}^{m} Pr(y = j | x)   for m = 1 to J   [5.10]

The odds that an outcome is less than or equal to m versus greater than m given x are

Omega_{<=m}(x) = Pr(y <= m | x) / Pr(y > m | x)

For example, Omega could be the odds of disagreeing or strongly disagreeing versus agreeing or strongly agreeing. In the ordered logit model, the log odds of an outcome being less than or equal to m versus greater than m have the simple equation (combine Equations 5.2 and 5.10):

ln Omega_{<=m}(x) = tau_m - xB

Discussions of the ordered logit model that do not use a latent variable often begin the development of the model with this equation. In such cases, the model is referred to as the cumulative logit model.

To determine the effect of a change in x_k, consider two values of x_k. For a change of delta, the odds ratio equals

Omega_{<=m}(x, x_k + delta) / Omega_{<=m}(x, x_k) = exp(-delta * beta_k)   [5.11]

This equation is most useful when only a single variable changes, and it can be interpreted as:

• For an increase of delta in x_k, the odds of an outcome being less than or equal to m are changed by the factor exp(-delta * beta_k), holding all other variables constant.

If x_k changes by 1, the odds ratio equals

Omega_{<=m}(x, x_k + 1) / Omega_{<=m}(x, x_k) = exp(-beta_k)

Notice that the factor change equals exp(-beta_k), compared to exp(beta_k) for the binary logit model in Chapter 3. This is because the ordered logit model is parameterized as ln Omega_{<=m}(x) = tau_m - xB, compared to ln Omega(x) = xB for the binary model.

To illustrate the interpretation using odds ratios, consider the coefficient for gender from Table 5.3: the estimate is -.73, so that the factor change is exp(.73) = 2.1. This can be interpreted as:

• The odds of SD versus the combined outcomes D, A, and SA are 2.1 times greater for men than women, holding all other variables constant. Equivalently, the odds of SD and D versus A and SA are 2.1 times greater for men than for women; and the odds of SD, D, and A versus SA are 2.1 times greater.

The coefficient for age is -.022, and the standard deviation of age is 16.8. Thus, 100 x [exp(.022 x 16.8) - 1] = 44, which can be interpreted as:

• For a standard deviation increase in age, the odds of SD versus D, A, and SA are increased by 44%, holding all other variables constant. Equivalently, the odds of SD and D versus A and SA are 44% greater for every standard deviation increase in age; and the odds of SD, D, and A versus SA are 44% greater.
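The key property of Equation 5.11, that the same factor change applies at every cutpoint m, can be verified numerically. The coefficients below are hypothetical:

```python
import numpy as np

def cum_odds(x, beta, tau):
    """Odds of y <= m versus y > m for each cutpoint m in the ordered logit model."""
    Fm = 1.0 / (1.0 + np.exp(-(tau - np.dot(x, beta))))   # Pr(y <= m | x)
    return Fm / (1.0 - Fm)

beta = np.array([-0.02, 0.09])      # hypothetical coefficients
tau = np.array([-2.0, 0.0, 2.0])    # three cutpoints (J = 4)
x = np.array([40.0, 12.0])

d = 10.0                            # change of delta = 10 in the first variable
x2 = x.copy()
x2[0] += d

ratio = cum_odds(x2, beta, tau) / cum_odds(x, beta, tau)
print(ratio)                        # identical for all three cutpoints
print(np.exp(-d * beta[0]))         # = exp(-delta * beta_k)
```

Since ln Omega = tau_m - xB, the cutpoint tau_m cancels in the ratio, which is exactly the proportional odds property tested in the next section.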

It is simple to show that Equation 5.11 implies that the odds ratio for a change in x_k is the same for all values of m. This is known as the proportional odds assumption. In terms of our example, we must consider whether it makes sense that a change in age has the same effect on the odds of SD versus the other categories as it has on the odds of SD, D, and A versus SA. This leads us to a test of the proportional odds assumption, which is also known as the parallel regression assumption.

5.5. The Parallel Regression Assumption

The proportional odds assumption in the ordered logit model corresponds to the idea of parallel regressions in both the ordered logit and ordered probit models. The idea of parallel regressions can be seen by writing the model in terms of the cumulative probability that an outcome is less than or equal to m. From Equation 5.10,

Pr(y <= m | x) = F(tau_m - xB)   [5.12]

The right-hand side is the cumulative distribution function F evaluated at tau_m - xB. Since B is the same for all m, Equation 5.12 defines a set of binary response models with different intercepts. To see this, write the model for y <= 1 as

Pr(y <= 1 | x) = F((tau_1 - beta_0) - SUM_{k=1}^{K} beta_k x_k)

The model for y <= 2 is

Pr(y <= 2 | x) = F((tau_2 - beta_0) - SUM_{k=1}^{K} beta_k x_k)

Comparing this to the model for y <= 1, the intercept changes from tau_1 - beta_0 to tau_2 - beta_0, but the slope coefficients are the same: changing the intercept shifts the probability curve to the left or to the right, but does not change the slope.

For example, Figure 5.4 plots the cumulative probability curves when there are four ordered categories, resulting in three curves with intercepts tau_1 - beta_0, tau_2 - beta_0, and tau_3 - beta_0. To see why these curves are parallel, pick a value of the outcome probability. For example, the probability .5 is indicated by a dotted, horizontal line. When we examine the slope of the three probability curves at this point, we find that the slopes are identical. It is in this sense that the curves are parallel.

Figure 5.4. Illustration of the Parallel Regression Assumption

An Informal Test. We can informally assess the assumption of parallel regressions by estimating J - 1 binary regressions

Pr(y <= m | x) = F(tau_m - xB_m)   for m = 1 to J - 1

without constraining the B_m's to be equal. The first binary regression is for the binary outcome defined as 1 if y <= 1, else 0. The second is for the outcome equal to 1 if y <= 2, else 0. And so on up to the outcome equal to 1 if y <= J - 1. This results in J - 1 estimates of B_m. If the assumption of parallel regressions is true, then the estimates should be approximately equal.
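The informal test can be sketched by simulation: generate data from an ordered logit, so that the parallel regression assumption holds by construction, estimate the J - 1 cumulative binary logits, and compare the slopes. The data and the simple fitting routine below are illustrative, not the survey data used in the text:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
beta, tau = 1.0, np.array([-1.0, 0.0, 1.0])

# simulate y* = x*beta + e with logistic e, then cut into four ordered categories
ystar = x * beta + rng.logistic(size=n)
y = np.searchsorted(tau, ystar) + 1          # y in {1, 2, 3, 4}

def logit_slope(z, x):
    """Slope from a binary logit of z on x, fit by minimizing the negative log likelihood."""
    def nll(b):
        eta = b[0] + b[1] * x
        return np.sum(np.logaddexp(0.0, eta)) - np.sum(eta[z == 1])
    return minimize(nll, np.zeros(2), method="BFGS").x[1]

# one binary logit per cutpoint: the outcome is 1 if y <= m, else 0
slopes = [logit_slope((y <= m).astype(int), x) for m in (1, 2, 3)]
print(slopes)    # under parallel regressions, all three slopes are near -beta = -1
```

If the three slope estimates differed wildly, that would be informal evidence against the parallel regression assumption; the formal score and Wald tests that follow make this comparison precise.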
TABLE 5.8 Binary Logits Used to Informally Assess the Parallel Regression Assumption (the coefficient estimates are not recoverable from the scan)

Each binary logit provides a consistent estimate of the B_m in Equation 5.12, and examining the similarities and differences among the estimates provides an informal assessment of the parallel regression assumption.

A Score Test. A score test (also known as an LM test) of the parallel regression assumption is computed by SAS's LOGISTIC procedure (SAS Institute, 1990b). Recall from Chapter 4 that a score test estimates a constrained model and assesses how the likelihood function would change if the constraint were relaxed. To understand how a score test can be used to test the parallel regression assumption, think of the ORM as a set of binary models in which we impose the constraint that the B_m's are equal across the J - 1 equations:

B_1 = B_2 = ... = B_{J-1}   [5.13]

Thus, we are comparing the constrained model of Equation 5.12 with the unconstrained model

Pr(y <= m | x) = F(tau_m - xB_m)   [5.14]

The score test evaluates how the likelihood of the ORM would change if the constraint in Equation 5.13 was removed. The test statistic is distributed as chi-square with K(J - 2) degrees of freedom. (Show that this is the correct number of degrees of freedom by counting the number of constraints being tested.) For our example, the score test equals 48.4 with 12 degrees of freedom (p < .001). This provides strong evidence that the parallel regression assumption is violated.

A Wald Test. The score test is an omnibus test that does not show whether the parallel regression assumption is violated for all independent variables or only some. A Wald test proposed by Brant (1990) allows both an overall test that all B_m's are equal and tests of the equality of coefficients for individual variables. While this test is not implemented by commercial programs, it is simple, albeit tedious, to compute with programs that include a matrix language (e.g., SAS, LIMDEP, GAUSS). The test is constructed as follows.

1. Estimate the B_m's and Var(B_m). Run J - 1 binary logits on the outcomes defined by whether y <= m, with estimated slopes B_m and covariance matrices Var(B_m). Then compute the fitted probability pi_mi for each observation.

2. Estimate the covariances between the estimates. Let the underline indicate that the constant has been added to the vector. Let W_ml be an N x N diagonal matrix whose ith element is a function of the fitted probabilities pi_mi and pi_li, and let X be the N x (K + 1) matrix with 1's in the first column and the independent variables in the remaining columns. Brant showed that the covariances among the estimates, for m, l = 1 to J - 1, are estimated by deleting the first row and column of

(X' W_mm X)^{-1} (X' W_ml X) (X' W_ll X)^{-1}

3. Compute the omnibus Wald statistic. Stack the estimates as B* = (B_1', ..., B_{J-1}')', with a covariance matrix Var(B*) whose blocks were defined in step 2. The hypothesis of parallel regressions corresponds to D B* = 0, where

D = [ I  -I   0  ...  0
      I   0  -I  ...  0
      ...
      I   0   0  ... -I ]

Here I is a K x K identity matrix and 0 is a K x K matrix of 0's, so that D is a (J - 2)K x (J - 1)K matrix. (Show that this matrix results in the linear combinations B_1 - B_2 through B_1 - B_{J-1}.) The Wald statistic takes the standard form

W = (D B*)' [D Var(B*) D']^{-1} (D B*)

with (J - 2)K degrees of freedom.

4. Construct tests for individual variables. The hypothesis that the coefficients for a single variable x_k are equal across the J - 1 equations can be tested by selecting those rows and columns of D, B*, and Var(B*) corresponding to the coefficients being tested. The resulting statistic has J - 2 degrees of freedom.

For our example, the results of the Wald tests are contained in Table 5.9. The omnibus Wald test is close to the result from the score test. The tests of the equality of coefficients for each variable examined individually show that there is strong evidence for the violation of the parallel regression assumption for some variables but not for others.

TABLE 5.9 Wald Tests of the Parallel Regression Assumption (the numerical entries are not recoverable from the scan)

My experience is that the parallel regression assumption is frequently violated, based on either an informal test, the score test, or the Wald test. When the assumption of parallel regressions is rejected, alternative models should be considered that do not impose the constraint of parallel regressions. Some such models are considered below.

5.6. Related Models for Ordinal Data

While ordered logit and ordered probit are the most frequently used models for ordinal outcomes in the social sciences (with the possible exception of the misuse of the LRM), there are a number of other models that are also available.

5.6.1. The Grouped Regression Model

In the ORM, the observed variable is defined by

y = m   if tau_{m-1} <= y* < tau_m   for m = 1 to J

where the cutpoints are unknown. A similar type of variable occurs when a continuous variable is grouped at known values of tau. For example, income might be measured as

y = 1   if income < $10,000
y = 2   if $10,000 <= income < $20,000
...
y = J   if income >= $100,000

Such variables are often analyzed by recoding their values to the midpoint of each interval, with some reasonable value used for the highest and lowest categories. The problem is that there is only weak justification for the recoded values. Alternatively, such variables are sometimes treated as though they are ordinal and the ORM is used (see Anderson, 1984). However, since the cutpoints are known, they do not need to be estimated. Further, with known cutpoints it is possible to estimate Var(e), which must be assumed in the ORM. Stewart (1983) proposed

an extension of the tobit model (see Chapter 7) and developed both two-stage and ML estimators. The ML estimator is available in LIMDEP (Greene, 1995), Stata, and SAS's LIFEREG procedure.

5.6.2. Other Models for Ordinal Data

The adjacent categories model is

ln [ Pr(y = m | x) / Pr(y = m + 1 | x) ] = tau_m - xB

where the outcome is the log of the odds of category m versus category m + 1. Unlike the ORM, this model is a special case of the multinomial logit model considered in the next chapter. The continuation ratio model (Fienberg, 1980, p. 110) is

ln [ Pr(y = m | x) / Pr(y > m | x) ] = tau_m - xB

where the left-hand side is the log of the odds of category m versus the categories greater than m. In this model, estimates will differ if adjacent categories are combined. Anderson proposed the stereotype model:

ln [ Pr(y = m | x) / Pr(y = J | x) ] = alpha_m + phi_m (xB)

where constraints are imposed on the phi's to ensure ordinality and the effects differ across the outcome categories, thus relaxing the parallel regression assumption; the model is closely related to the multinomial logit model. Agresti (1990, pp. 318-336) and Clogg and Shihadeh (1994) review these models with an emphasis on their relationship to log-linear models. Greenwood and Farewell compare several of these models in an analysis of medical data.

Notes

The ordered probit model grew out of Aitchison and Silvey's (1957) work in bioassay, where the latent continuous variable was an organism's tolerance to some exposure, such as a poison. The tolerance could not be observed, but the state of the organism, such as alive or dead, could be assessed. Their model was limited to a single independent variable. An early development of the ordered logit model can be found in Snell (1964). McKelvey and Zavoina (1975), in a paper written for social scientists, extended the work of Aitchison and Silvey to the case where there are multiple independent variables, and provided a computationally efficient method of estimation. Independently, the ordered logit and probit models were developed by McCullagh (1980), whose models were limited to a single independent variable. His focus was on the ordered logit model, which he referred to as the proportional odds model. McCullagh's work stimulated a great deal of research in biostatistics, all of which seems to be unaware of the earlier work by McKelvey and Zavoina.

Several authors provide reviews of models for ordinal variables. Agresti (1990, pp. 318-336) and Clogg and Shihadeh (1994) discuss models for ordinal variables with particular attention to their relationship to log-linear models. McCullagh and Nelder (1989, Chapter 5) discuss several of these models in the context of the generalized linear model. Winship and Mare (1984) reviewed models for ordinal variables with applications in sociology.

Conclusions

The linear regression model used with ordinal dependent variables can give incorrect results. The ordered regression model is an alternative that is appropriate for ordinal outcomes. While it is straightforward to estimate, the nonlinear relationship between the independent variables and the probabilities of the outcomes makes interpretation difficult. Further, care must be taken to ensure that the dependent variable is actually ordinal and that the parallel regression assumption is reasonable. It can be useful to analyze an outcome both with models for ordinal data and with models for nominal data, which are considered in the next chapter.

6. Nominal Outcomes: Multinomial Logit and Related Models

If a variable is ordinal and a model for nominal variables is used, there is a loss of efficiency, since information is being ignored. On the other hand, when a method for ordinal variables is applied to a nominal variable, the resulting estimates are biased or even nonsensical. If there is any question about the ordinality of the variable, the potential loss of efficiency from using models for nominal outcomes is outweighed by the risk of bias.

This chapter presents the multinomial logit model, the most frequently used model for nominal outcomes, along with related models that incorporate characteristics of the outcomes.
6.1. Introduction to the Multinomial Logit Model

Models for nominal outcomes are used when the categories of the dependent variable cannot be ordered. Nominal outcomes are found in every area of the social sciences. Schmidt and Strauss (1975) examined occupational attainment in an early application of the model. Meng and Miller (1995) studied occupational attainment in China. Arum and Shavit examined the effects of high school vocational education on occupational attainment. Applications are also found in other areas: Hoffman and Duncan compared the conditional logit and multinomial logit models of marital and welfare status, and Spector examined the effects of an experimental teaching method as indicated by grades. Other examples include leaving the home (Goldscheider & DaVanzo), the organizational context of scientific work (Long & McGinnis), and language use in a multilingual society (Stevens).

The multinomial logit model (MNLM) can be thought of as simultaneously estimating binary logits for all comparisons among the outcome categories. Indeed, estimates from binary logits are consistent estimates of the parameters of the MNLM (Begg & Gray, 1984). In this sense, multinomial logit is a simple extension of the binary logit model. The extension is made difficult, however, by the number of comparisons that are involved. With three outcomes, multinomial logit is roughly equivalent to running three binary logits comparing outcomes 1 to 2, 1 to 3, and 2 to 3. With four outcomes, you must add three more comparisons: 1 to 4, 2 to 4, and 3 to 4. Just the notation needed to keep track of the comparisons can be daunting, and the sheer number of comparisons complicates interpretation. To keep the model as simple as possible, I begin with three outcomes and a single independent variable, and present the model as a set of three binary logits. Accordingly, you might want to review Section 3.4 of Chapter 3 before proceeding.

Consider a nominal outcome y with categories A, B, and C, with N_A, N_B, and N_C observations in each category. Assume that there is a single independent variable x. We could examine the relationship between x and y with a series of binary logits. To examine the effect of x on the odds of A versus B, select the N_A + N_B observations with

outcomes A or B, and estimate the logit

ln Omega_{A|B}(x) = beta_{0,A|B} + beta_{1,A|B} x   [6.1]

where Omega_{A|B}(x) is the odds of A versus B. The beta's are subscripted with A|B to indicate that they are from the logit of A versus B. The odds of B versus C can be estimated in the same way. For outcomes B and C, select the observations with those outcomes and estimate

ln Omega_{B|C}(x) = beta_{0,B|C} + beta_{1,B|C} x   [6.2]

Then select the observations with outcomes A or C for the logit of A versus C:

ln Omega_{A|C}(x) = beta_{0,A|C} + beta_{1,A|C} x   [6.3]

Are all three logits necessary? If we know how x affects the odds of A versus B, and how x affects the odds of B versus C, it seems reasonable that this would tell us how x affects the odds of A versus C. Indeed, there is a necessary relationship among the three logits:

ln Omega_{A|B}(x) + ln Omega_{B|C}(x) = ln Omega_{A|C}(x)   [6.4]

(To prove this, use the identity ln(a/b) = ln a - ln b.) Since the left-hand side of Equation 6.4 equals the sum of the left-hand sides of Equations 6.1 and 6.2, and the right-hand side corresponds to Equation 6.3, the equality must also hold for the parameters:

beta_{0,A|B} + beta_{0,B|C} = beta_{0,A|C}   and   beta_{1,A|B} + beta_{1,B|C} = beta_{1,A|C}   [6.5]

Therefore, some of the comparisons are redundant: if you know the results for the binary logit of A versus B, and the results from the binary logit of B versus C, you can derive the results for the logit of A versus C. There is, however, one complication. The equalities in Equation 6.5 describe necessary relationships among the parameters in the population. They will not hold exactly for sample estimates from the three binary logits. (Try this using your own data.) The reason is that the three logits are based on different samples: the first sample has N_A + N_B observations, the second has N_B + N_C observations, and the third has N_A + N_C observations. In the multinomial logit model, all of the logits are estimated simultaneously, which enforces the logical relationship among the parameters and uses the data more efficiently. Nonetheless, thinking of the multinomial logit model as a linked set of binary logits is correct.

6.2. The Multinomial Logit Model

The formal presentation of the MNLM begins by specifying the probability of each outcome as a nonlinear function of the x's. After issues of identification are resolved, I show that the nonlinear probability model leads to a model that is linear in the log of the odds; this is the form of the model that we have just considered. Two methods of interpretation are recommended: discrete change in the probabilities and factor change in the odds. While these methods are basically the same as those used for the binary logit model, the number of probabilities and odds involved requires graphical methods to summarize the results. To make the discussion concrete, I use the example of occupational attainment.

Example of the MNLM: Occupational Attainment

The 1982 General Social Survey asked respondents to indicate their occupation. These occupations were recoded to correspond to the broad categories of occupations that were used by Schmidt and Strauss (1975) in an early application of the MNLM. In a sample of 337 currently employed men, respondents were distributed among the following groups of occupations: menial jobs (M), blue-collar jobs (B), craft jobs (C), white-collar jobs (W), and professional jobs (P). Three independent variables are expected to affect an individual's probability of being in a given occupation: race, which is measured as a dummy variable equal to 1 if the respondent is white, else 0; years of education; and an estimate of
the number of years a man could have been in the labor force. The descriptive statistics and abbreviations for these variables are given in Table 6.1.

TABLE 6.1 Descriptions of Variables for the Occupational Attainment Example

Variable   Description
OCC        Occupation: M = menial; B = blue collar; C = craft; W = white collar; P = professional
WHITE      Race: 1 = white; 0 = nonwhite
ED         Education: number of years of formal education
EXP        Possible years of work experience: age minus years of education minus 5

(The means and standard deviations are not recoverable from the scan.)

6.2.1. The MNLM as a Probability Model

Let y be a nominal variable with J outcomes. The categories are numbered 1 to J, but are not assumed to be ordered. Let Pr(y = m | x) be the probability of observing outcome m given x. The model can be constructed as follows. Assume that Pr(y = m | x) is a function of the linear combination xB_m. The vector B_m includes the intercept and coefficients for the effects of the x's on outcome m. In contrast to the ordered regression model, B_m differs for each outcome. For example, the coefficient for the effect of education on the probability of being a blue-collar worker is different from the coefficient for the effect of education on the probability of being a craft worker.

To ensure that the probabilities are positive, we take the exponential of xB_m. While the result is positive, the sum of the exponentials over the outcomes does not equal 1, which it must for probabilities. In order to make the probabilities sum to 1, we divide exp(xB_m) by the sum over all outcomes:

Pr(y = m | x_i) = exp(x_i B_m) / SUM_{j=1}^{J} exp(x_i B_j)   [6.6]

With this normalization, it follows that SUM_m Pr(y = m | x) = 1, so that the probabilities sum to 1. While the probabilities now sum to 1, the model is not identified, since more than one set of parameters generates the same probabilities of the observed outcomes. To see why this is the case, replace each B_m by B_m + tau. Multiplying the numerator and denominator of Equation 6.6 by the same factor exp(x_i tau) leaves the value of each probability unchanged. While the values of the probabilities are unchanged, the original parameters B_m have been replaced by B_m + tau. Accordingly, for every nonzero tau there is a different set of parameters that results in the same predictions. That is, the model is not identified.

To identify the model, we must impose constraints on the parameters. Two types of constraints are commonly used. First, we could assume that the B_m's sum to 0; this constraint is often used with hierarchical log-linear models. Second, and more commonly for the MNLM, one of the B's is constrained to 0, such as B_1 = 0 or B_J = 0. The choice is arbitrary, and here we assume that

B_1 = 0

Clearly, if we add a nonzero tau to B_1, the assumption that B_1 = 0 is violated. Adding this constraint to the model results in the probability

Pr(y = m | x_i) = exp(x_i B_m) / SUM_{j=1}^{J} exp(x_i B_j),   where B_1 = 0   [6.7]
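Equations 6.6 and 6.7 and the identification problem can be verified numerically. The coefficient values below are hypothetical:

```python
import numpy as np

def mnlm_probs(x, B):
    """Pr(y = m | x) = exp(x b_m) / sum_j exp(x b_j); column m of B holds b_m."""
    xb = x @ B                        # one linear combination per outcome
    e = np.exp(xb - xb.max())         # subtract the max for numerical stability
    return e / e.sum()

K = 3   # intercept plus two x's
B = np.column_stack([np.zeros(K),                   # B_1 = 0 for identification
                     np.array([0.5, -0.2, 0.1]),    # hypothetical B_2
                     np.array([-1.0, 0.3, 0.2])])   # hypothetical B_3
x = np.array([1.0, 2.0, -1.0])        # first element is the constant

p = mnlm_probs(x, B)
print(p, p.sum())                     # probabilities sum to 1

# adding the same tau to every B_m leaves the probabilities unchanged,
# which is exactly why a constraint such as B_1 = 0 is needed:
tau = np.array([0.7, -0.4, 0.2])
print(np.allclose(mnlm_probs(x, B + tau[:, None]), p))
```

The shifted parameters reproduce the same predicted probabilities, so without the constraint the data cannot distinguish between the two parameterizations.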

Since B_1 = 0, it follows that exp(x_i B_1) = 1, and the model is commonly written as

Pr(y = m | x_i) = exp(x_i B_m) / [1 + SUM_{j=2}^{J} exp(x_i B_j)]

where the comparisons are formed with outcome 1 as the base category.

6.2.2. The MNLM as an Odds Model

The model can also be expressed in terms of the odds, as was done for the binary logit model. Dividing the probability of outcome m by the probability of outcome n leads to the odds equation:

Omega_{m|n}(x) = Pr(y = m | x) / Pr(y = n | x) = exp(xB_m) / exp(xB_n) = exp(x[B_m - B_n])

Taking logs shows that the MNLM is linear in the logit:

ln Omega_{m|n}(x) = x(B_m - B_n)

The difference beta_{km} - beta_{kn} is the effect of x_k on the log odds of outcome m versus outcome n. Since the model is linear in the logit, it is simple to compute the partial derivative:

d ln Omega_{m|n}(x) / d x_k = beta_{km} - beta_{kn}

which can be interpreted as: for a unit change in x_k, the log of the odds of outcome m versus outcome n is expected to change by beta_{km} - beta_{kn}, holding all other variables constant. Interpretation in these terms is of limited value, since the effect of a unit change in x_k on the probability of an outcome depends on the levels of all of the variables, and since it is hard to think in units of the log odds. Alternative methods of interpretation are discussed in Section 6.6.

This is the multinomial logit model that was introduced by Theil (1969), who derived the model in much the same way that I have. The model can also be derived as a discrete choice model, which is now considered.

6.2.3. The Multinomial Logit Model as a Discrete Choice Model

In an influential paper, McFadden demonstrated that Luce's (1959) model of choice behavior can be used to derive a variety of econometric models, including the multinomial and conditional logit models. This section presents what is known as the discrete choice model. For a more detailed treatment, see Ben-Akiva and Lerman (1985). The discrete choice model assumes that an individual chooses the outcome that maximizes the utility gained from that choice. For illustration, assume that there are two choices. The utility for choice 1 is u_1 and the utility for choice 2 is u_2. A person chooses alternative 1 when u_1 > u_2 and chooses alternative 2 when u_2 > u_1. We assume that ties do not occur. A person is rational in the sense of choosing the alternative that maximizes the utility derived from the choice. The utility derived from choice m for individual i is
u_im = mu_im + e_im, where mu_im is the average utility associated with choice m for individual i, and e_im is a random error associated with that choice. The probability of choice 1 is the probability that the utility from choice 1 exceeds that of all other choices.

The specific form of the discrete choice model is determined by the assumed distribution of e and the specification of how mu_im, the average utility, is related to measured variables. To obtain the MNLM, let the average utility be a linear combination of the characteristics of an individual:

mu_im = x_i B_m

McFadden showed that the MNLM results if and only if the e's are independent and have a type I extreme-value distribution:

f(e) = exp(-e) exp(-exp(-e))

This distribution looks like a normal curve that is skewed to the right, with a thinner tail on the left and a thicker tail on the right. It has mode 0, a mean of about .58, and standard deviation 1.28. The choice of the distribution is motivated by the tractability and usefulness of the resulting model.

6.3. ML Estimation

However the model is derived, the formula for the probability of an outcome is the same. This probability is the basis for the ML estimator. From Equation 6.7, let Pr(y_i = m | x_i, B_2, ..., B_J) be the probability of y_i = m given x_i, with parameters B_2 through B_J. Let p_i be the probability of observing whatever value of y was actually observed for the ith observation. If the observations are independent, the likelihood is

L(B_2, ..., B_J | y, X) = PROD_{i=1}^{N} p_i

The probabilities are introduced into the likelihood by substituting the right-hand side of Equation 6.7 for p_i:

L = PROD_{m} PROD_{y_i = m} Pr(y_i = m | x_i)

where the inner product is over all cases for which y_i is equal to m. Taking logs, we obtain the log likelihood, which can be maximized with numerical methods to estimate the parameters. In practice, convergence tends to be very quick. The resulting estimates are consistent, asymptotically normal, and asymptotically efficient. It can be shown that under conditions that are likely to hold in practice, the log likelihood function is globally concave, which ensures the uniqueness of the ML estimates.

6.3.1. Software Issues

Different programs analyzing the same data may appear to give different results. This can be understood by considering which contrasts are estimated. Different programs estimate different sets of contrasts B_m - B_n. The contrasts that are estimated are a minimal set in the sense that all other contrasts can be computed from them. LIMDEP and Stata estimate the J - 1 contrasts B_m - B_1 = B_m, for m = 2 to J. SAS's CATMOD estimates the contrasts B_m - B_J for m = 1 to J - 1. (Show that with these contrasts you can compute any other contrast B_m - B_n.) Markov optionally estimates all contrasts B_m - B_n. Unfortunately, some programs do not indicate which contrasts are being estimated.
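The log likelihood maximized in Section 6.3 can be sketched in a few lines. The data here are randomly generated placeholders, not the occupational data:

```python
import numpy as np

def mnlm_loglik(B, X, y):
    """log L = sum_i log Pr(y_i | x_i) for the MNLM; column 0 of B is fixed at 0."""
    xb = X @ B                                                     # N x J
    log_p = xb - np.logaddexp.reduce(xb, axis=1, keepdims=True)    # row-wise log softmax
    return log_p[np.arange(len(y)), y].sum()

# tiny placeholder data: N = 6 observations, a constant plus two x's, J = 3 outcomes
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])
B = np.column_stack([np.zeros(3),                   # B_1 = 0 for identification
                     np.array([0.2, 0.5, -0.3]),
                     np.array([-0.4, 0.1, 0.6])])
y = np.array([0, 1, 2, 2, 1, 0])    # outcomes coded 0..J-1, indexing the columns of B

print(mnlm_loglik(B, X, y))         # a negative number; larger values fit better
```

A numerical optimizer applied to this function over the free columns of B, with the first column held at zero, yields the ML estimates described in the text.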

will TAHLE 6.2 ("()dtkicllts fol' Multínomial Modt:l


lO compare your Attainmt:nt
benchmark. Let y'
no! 6.) The estimates are:

and Other Contrasts

Standard soft\vare estimares a mínimal set of contrasts. For example,


show the J 1 contrasts with the reference out- If our substantive interest in a that was not eomputed
the programo sllch as the effee! of race on a craft versus a
white-eollar ). we need to and test the
ir for all m 1= r [6.81 coeffkícnts.

t new notation for the contrast (Be1'ore pro- Other Contrasts


you should determine which contrasts are being estimated by
The contrasts estímated your software can be used to Assumc that your software cstimates ¡he contrasts for all outcomes
other contrasts that may be of substantive interest. Tc) see how relative to olltcome r. where r stands for the referenee
this is done. consider our model of oeeupational attaínment. variable the program estimates the J 1 r.
In Table to menial occupations. The for
the MNI}~f:
the 01' p versus q ean be by taking the differenee
Attainment
between two of the known parameters:
The coefficients in C¡lIble 6.2 are the standard output from a program
that estima tes the MNLM. The minimal ser of eontrasts involves all pos- [6.91
with outeome M. These eoeffieients eorrespond 10 the
This hnlds sincc

In 81 + 8

WHITE

In \1 f31, JfVVHITE For ¡he effcct of mce the of e versus J~! is


WHlTE f32, + f3:l. P IV/EXP 0.47 1.57 1.10
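The computation of a new contrast and its variance can be sketched as follows. The point estimates 0.47 and 1.57 echo the race example in the text; the variances and covariance are hypothetical values for illustration:

```python
import numpy as np

# estimates of the race coefficient in two contrasts against reference M
b_CM, b_WM = 0.47, 1.57

# Equation 6.9: beta_{C|W} = beta_{C|M} - beta_{W|M}
b_CW = b_CM - b_WM
print(b_CW)                          # approximately -1.10

# variance of the new contrast from the covariance matrix of the estimates
# (these variance/covariance values are hypothetical)
var_CM, var_WM, cov = 0.36, 0.41, 0.19
var_CW = var_CM + var_WM - 2 * cov   # Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b)
z = b_CW / np.sqrt(var_CW)
print(var_CW, z)                     # z-test of the new contrast
```

In practice, the covariance term comes from the off-diagonal of the estimated covariance matrix, which most programs print only on request.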

The variance for the new estimate is

Var(b_{1,C|W}) = Var(b_{1,C|M}) + Var(b_{1,W|M}) - 2 Cov(b_{1,C|M}, b_{1,W|M})

The information needed to compute the new variance is obtained from the covariance matrix for the estimates computed by your software. Usually, you will need to explicitly request that this matrix be printed. The variances are located on the diagonal, with the covariances on the off-diagonal. In order to avoid serious rounding error, you must make the computations using as many decimal digits as are available. Once Var(b_{1,C|W}) is computed, the coefficient for race on the logit of C versus W can be tested with a standard z-test:

z = b_{1,C|W} / sqrt(Var(b_{1,C|W}))

Alternatively, and in my experience more reliably, your software can be "tricked" into computing the missing coefficients. The software that I used for the example computed the contrasts for outcomes 2 through J against category 1. The dependent variable was coded: M = 1, B = 2, C = 3, W = 4, and P = 5. This resulted in the coefficients in Table 6.2. If occupation is recoded so that M = 2, B = 3, C = 4, W = 5, and P = 1, and if the model is reestimated with the recoded outcome variable, the program will estimate the coefficients and standard errors for B_{M|P}, B_{B|P}, and so on. Other contrasts can be estimated with similar recodings of the categories.

6.5. Two Useful Tests

This section presents two tests that are very useful when using the MNLM. The first is a test that the effect of a variable is 0. The second is a test of whether a pair of outcome categories can be combined. Since it is important to understand how these tests can be implemented with the output from your software, the tests are presented in terms of the contrasts with outcome r.

6.5.1. Testing That a Variable Has No Effect

With J dependent categories, there are J - 1 parameters beta_{k,m|r} associated with each variable x_k. The hypothesis that x_k does not affect the dependent variable can be written as

H0: beta_{k,1|r} = ... = beta_{k,J|r} = 0

Since beta_{k,r|r} is necessarily 0, the hypothesis imposes constraints on J - 1 parameters. This hypothesis can be tested with either a Wald or a LR test.

A LR Test. First, estimate the full model M_F that contains all of the variables, with the resulting LR statistic G2_F. Second, estimate the restricted model M_R formed by excluding variable x_k, with the resulting LR test statistic G2_R. This model has J - 1 fewer parameters. Finally, compute the difference G2_{R vs F} = G2_F - G2_R, which is distributed as chi-square with J - 1 degrees of freedom if the hypothesis that x_k does not affect the outcome is true. The practical weakness of this test is that you must estimate the full model and then K restricted models corresponding to excluding each of the x_k's.

A Wald Test. Since the Wald test only requires estimating a single model, it is easier to apply when there are many variables to test. Consequently, it is included in the standard output of most programs for the MNLM. Let b_k = (b_{k,2|1} ... b_{k,J|1})' be the ML estimates for variable x_k from the full model. For simplicity, I have assumed that the software is estimating the coefficients against the reference category 1. If your software uses a different reference category, b_k would simply contain the J - 1 coefficients that were estimated for x_k. Let Var(b_k) be the estimated covariance matrix. The Wald statistic for H0: beta_k = 0 has the standard form:

W_k = b_k' [Var(b_k)]^{-1} b_k

If the null hypothesis is true, then W_k is distributed as chi-square with J - 1 degrees of freedom.
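A sketch of the Wald statistic, with hypothetical estimates and a hypothetical covariance matrix:

```python
import numpy as np
from scipy.stats import chi2

# hypothetical estimates of b_k = (b_{k,2|1}, ..., b_{k,J|1})' for one variable,
# with J - 1 = 4 contrasts against the reference category
b_k = np.array([0.8, 1.2, 0.5, 1.6])
V_k = np.diag([0.09, 0.16, 0.12, 0.25])   # hypothetical covariance matrix

W = b_k @ np.linalg.inv(V_k) @ b_k        # W_k = b' V^{-1} b
df = len(b_k)                              # J - 1 degrees of freedom
p = chi2.sf(W, df)
print(W, p)
```

With a diagonal covariance matrix the statistic reduces to a sum of squared z-ratios; in real output the off-diagonal covariances would generally be nonzero.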
Example of Wald and LR Tests. Table 6.3 contains the Wald and LR tests for each variable from our example. The tests show, for example, that the hypothesis that education has no effect on occupational attainment is rejected at the .01 level. The conclusions from the LR and Wald tests are the same; while the two tests are asymptotically equivalent, the table shows that their values differ in finite samples.

TABLE 6.3 Wald and LR Tests That a Variable Has No Effect (the numerical entries are not recoverable from the scan)

6.5.2. Testing That Two Outcomes Can Be Combined

If none of the independent variables significantly affects the odds of outcome m versus outcome n, we say that m and n are indistinguishable with respect to the variables in the model. The hypothesis that outcomes m and n are indistinguishable corresponds to

H0: beta_{1,m|n} = ... = beta_{K,m|n} = 0   [6.10]

which can be written in terms of the coefficients estimated by your software as

H0: (beta_{1,m|r} - beta_{1,n|r}) = ... = (beta_{K,m|r} - beta_{K,n|r}) = 0   [6.11]

In our example, the hypothesis that P and W are indistinguishable is that each slope coefficient in B_{P|M} minus the corresponding coefficient in B_{W|M} equals 0.

A Wald Test. The hypothesis that m and n are indistinguishable can be tested with a Wald test:

W = (Q B*)' [Q Var(B*) Q']^{-1} (Q B*)

where B* contains the estimates from all of the contrasts in the minimal set and Q imposes the constraints implied by Equation 6.10. (Construct the Q needed to test the hypothesis in Equation 6.11.) This test is cumbersome to set up, which makes the following test far more practical.

A LR Test.2 A simpler LR test can also be used. First, select only those observations with outcomes equal to the two being considered. Second, estimate a binary logit on the new sample. Finally, compute a LR test that all of the slope coefficients (not the intercept) in the binary logit are simultaneously 0. This test is easy to apply since it is part of the standard output from most programs for binary logit.

Example of Wald and LR Tests. The hypothesis that professional and white-collar occupations can be combined can be tested as follows. First, select the 153 individuals who have professional or white-collar jobs. Second, estimate the binary logit:

ln Omega_{P|W}(x) = beta_0 + beta_1 WHITE + beta_2 ED + beta_3 EXP

Third, compute the LR test of H0: beta_1 = beta_2 = beta_3 = 0. For our data, G2_{P|W} = 23.4, df = 3, p < .01. The Wald test gives a similar result: W_{P|W} = 22.2. The hypothesis that professional and white-collar occupations are indistinguishable with respect to race, education, and experience is rejected at the .01 level.

6.5.3. Specification Searches

Given the difficulties of interpretation that are discussed in the next section, it is tempting to search for a more parsimonious model constructed by excluding variables or combining outcome categories. While the two tests presented in this section can be used in a specification search, great care is required. First, both of these tests involve multiple

2. I thank Paul Allison for suggesting this test.


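The Wald statistic above is easy to compute directly from software output. A minimal numerical sketch, in which the estimates and covariance matrix are invented stand-ins for real MNLM output:

```python
# Wald test that x_k has no effect across the J-1 contrasts (Section 6.5.1).
# b_k and V_k are made-up stand-ins for output from an MNLM program.
import numpy as np
from scipy import stats

b_k = np.array([0.42, -0.31, 0.88, 0.15])    # J-1 = 4 estimated contrasts
V_k = np.diag([0.04, 0.05, 0.06, 0.03])      # estimated Var(b_k)

W_k = float(b_k @ np.linalg.inv(V_k) @ b_k)  # b' V^{-1} b
df = len(b_k)                                # J - 1 degrees of freedom
p = stats.chi2.sf(W_k, df)
print(f"W = {W_k:.2f}, df = {df}, p = {p:.4f}")
```

With a non-diagonal covariance matrix the same lines apply unchanged; only the matrix inversion does more work.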

6.6. Interpretation

Interpretation of the MNLM is complicated by the large number of coefficients: each additional outcome adds another set of parameters, and if every contrast among outcomes is examined, the number of comparisons grows rapidly. In this section, I extend to the MNLM the methods of interpretation that were developed for binary and ordinal outcomes.

6.6.1. Predicted Probabilities

After the MNLM is estimated, the predicted probability for a given x is

    Pr(y = m | x) = exp(xβ̂_m) / Σ_{j=1}^{J} exp(xβ̂_j)    [6.12]

where β̂_m is the vector with the coefficients for outcome m. Predicted probabilities can be used in various ways. For continuous variables, you can plot the probabilities over the range of a variable, holding all variables except x_k at some level and letting x_k vary over its range. If you want to compare important groups, you can construct a table of predicted probabilities at combinations of values of the variables. These approaches were illustrated in Chapters 3 and 5 and are not considered further here.

6.6.2. Partial Change

For continuous variables, the partial change in the probability is computed as the derivative of Equation 6.12 with respect to x_k:

    ∂Pr(y = m | x)/∂x_k = Pr(y = m | x) [ β_{k,m} - Σ_{j=1}^{J} β_{k,j} Pr(y = j | x) ]    [6.13]

The partial derivative is the slope of the curve relating x_k to Pr(y = m | x), holding all other variables constant. The value of the marginal effect depends on the values of all of the variables and on the coefficients for each outcome. Most often, it is computed with variables held at their means, with dummy variables held at 0 or 1. Since Equation 6.13 combines all of the β_{k,j}'s, the marginal effect of x_k on outcome m does not need to have the same sign as the corresponding coefficient, and the sign can change as the values of the variables change. For example, at one point the effect of education on having a craft occupation could be positive, while at another point the effect could be negative.
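Equations 6.12 and 6.13 are easy to verify numerically. In this sketch the coefficient matrix B is hypothetical (three outcomes; a constant and two variables), and the analytic marginal effect is checked against a finite-difference approximation:

```python
# Predicted probabilities (Eq. 6.12) and marginal effects (Eq. 6.13)
# for a hypothetical MNLM with three outcomes.
import numpy as np

B = np.array([[0.0, 0.0, 0.0],     # outcome 1 is the base: beta_1 = 0
              [0.5, 0.2, -0.1],    # outcome 2: intercept, b_1, b_2
              [-0.3, 0.6, 0.4]])   # outcome 3

def probs(x):
    # Pr(y = m | x) = exp(x b_m) / sum_j exp(x b_j); x starts with a 1.
    u = np.exp(B @ np.asarray(x))
    return u / u.sum()

x = np.array([1.0, 2.0, -1.0])
p = probs(x)

# Eq. 6.13: dPr(y=m|x)/dx_k = Pr(y=m|x) * (b_km - sum_j b_kj Pr(y=j|x))
k = 1
marg = p * (B[:, k] - B[:, k] @ p)

# Compare with a numerical derivative; the two should agree closely.
eps = 1e-6
x_eps = x.copy(); x_eps[k] += eps
numeric = (probs(x_eps) - p) / eps
print(marg, numeric)
```

The marginal effects across the J outcomes sum to zero, which is one way to see why the sign of a marginal effect need not match the sign of the corresponding coefficient.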

6.6.3. Discrete Change

The discrete change in the probability is computed just as for the ORM: the probability is computed before and after a change in x_k, with the other variables held at fixed values, most often their means. Keep in mind that the amount and even the direction of the change depend on the values at which the independent variables are held constant. Your choice of the amount of change in the variable being assessed depends on the purpose of the analysis: dummy variables should be changed from 0 to 1, while for continuous variables a change of a standard deviation or a change from the minimum to the maximum of the variable is often useful. (See Sections 3.7.5 and 5.4.4.) To summarize the effect of x_k over the J outcomes, the average absolute discrete change can be computed:

    Δ̄ = (1/J) Σ_{j=1}^{J} | ΔPr(y = j | x)/Δx_k |

The absolute value is taken before averaging since the sum of the changes across outcomes would otherwise equal 0.

Discrete Change for Occupational Attainment. Table 6.4 contains estimates of discrete change from our model of occupational attainment. For example, consider the variable WHITE. Holding all other variables at their means, being white decreases the probability of a menial job by .13 and increases the probability of a professional job by .16. By comparison, the average absolute change for a standard deviation change in education is .16, and is .03 for experience. The effect of education is largest for professional occupations, where the change for a standard deviation increase in education is .38.

While it is possible to examine these changes in a table, a discrete change plot quickly summarizes the information. Figure 6.1 shows the change in the probability along the horizontal axis, with each variable listed on the vertical axis. The letters for the outcome categories show the change in the probability of each outcome for a change in the given variable, with all other variables held at their means. (Remember that different results would be obtained if the variables were held at other values.) It is easy to see that the effects of a standard deviation change in education are large, with an increase of over .35 for professional occupations.

TABLE 6.4. Discrete Change in Probability for the Multinomial Logit Model of Occupations. Jobs Are Classified as: M = Menial; C = Craft; B = Blue Collar; W = White Collar; and P = Professional. Rows give changes for WHITE (0 to 1), ED (Δσ and ΔRange), and EXP (Δσ and ΔRange), along with the predicted probability at the mean of the variables.
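The discrete change and the average absolute discrete change can be sketched the same way; the coefficients, means, and standard deviation below are all invented for illustration:

```python
# Discrete change for a standard deviation increase in x_1, and the
# average absolute discrete change (Section 6.6.3), hypothetical MNLM.
import numpy as np

B = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.2, -0.1],
              [-0.3, 0.6, 0.4]])

def probs(x):
    u = np.exp(B @ np.asarray(x))
    return u / u.sum()

x_mean = np.array([1.0, 2.0, -1.0])   # constant and means of x_1, x_2
sd1 = 0.8                             # assumed sd of x_1

x_hi = x_mean.copy(); x_hi[1] += sd1
delta = probs(x_hi) - probs(x_mean)   # change in Pr for each outcome

# Absolute values are taken before averaging: the raw changes sum to 0.
avg_abs = np.abs(delta).mean()
print(delta, avg_abs)
```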

Figure 6.1. Discrete Change Plot for the Multinomial Logit Model of Occupational Attainment, With Other Variables Held at Their Means. Jobs Are Classified as: M = Menial; C = Craft; B = Blue Collar; W = White Collar; and P = Professional

The effects of race are also large, with blacks being less likely than whites to enter the more highly skilled occupations. The expected changes due to a standard deviation change in experience are much smaller and show that experience increases the probabilities of the more highly skilled occupations.

6.6.4. Interpreting Odds Ratios

Examining the change in the probability is a useful way to assess the magnitude of effects in the MNLM, but it is limited in two ways. First, it indicates the change for a particular set of values of the independent variables. At different levels of the variables, the changes will be different. Second, measures of discrete change do not indicate the dynamics among the outcomes. For example, a decrease in education increases the probability of both blue-collar and craft jobs. But how does it affect the odds of a person choosing a craft job relative to a blue-collar job? To answer this type of question, we need to consider the odds formulation of the model.

Recall that the MNLM can be written in terms of the odds as

    Ω_{m|n}(x) = exp(xβ_m)/exp(xβ_n) = exp(xβ_{m|n})

where Ω_{m|n}(x) is the odds of outcome m versus outcome n given x. Expanding xβ_{m|n} leads to

    Ω_{m|n}(x) = exp(β_{0,m|n} + β_{1,m|n} x_1 + ... + β_{K,m|n} x_K)

If x_k is changed by δ, the effect of x_k can be measured by the ratio of the odds before and after the change in x_k:

    Ω_{m|n}(x, x_k + δ) / Ω_{m|n}(x, x_k)

All terms cancel except for exp(β_{k,m|n} δ), which is the odds ratio. The odds ratio can be interpreted as:

• For a change of δ in x_k, the odds of outcome m versus outcome n are expected to change by a factor of exp(β_{k,m|n} δ), holding all other variables constant.

When δ = 1, the unstandardized odds ratio can be interpreted as:

• For a unit change in x_k, the odds are expected to change by a factor of exp(β_{k,m|n}), holding all other variables constant.

When δ is the standard deviation of x_k, the x-standardized odds ratio can be interpreted as:

• For a standard deviation change in x_k, the odds are expected to change by a factor of exp(β_{k,m|n} s_k), holding all other variables constant.

Very importantly, the factor change in the odds for a change in x_k does not depend on the level of x_k or on the level of any other variable. While the interpretation of each odds ratio is simple, the number of comparisons makes the task difficult.
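That invariance is easy to confirm numerically: with hypothetical coefficients, the ratio of the odds before and after a change of δ in x_k equals exp(β_{k,m|n} δ) from any starting point:

```python
# Factor change in the odds of outcome m versus n for a change of
# delta in x_k (Section 6.6.4); coefficients are hypothetical.
import numpy as np

B = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.2, -0.1],
              [-0.3, 0.6, 0.4]])

def odds(x, m, n):
    u = np.exp(B @ np.asarray(x))
    return u[m] / u[n]

m, n, k, delta = 2, 1, 1, 1.5
for x in ([1.0, 2.0, -1.0], [1.0, -4.0, 3.0]):   # two very different x's
    x_new = np.array(x); x_new[k] += delta
    ratio = odds(x_new, m, n) / odds(x, m, n)
    # The ratio equals exp((b_km - b_kn) * delta) at both points.
    print(ratio, np.exp((B[m, k] - B[n, k]) * delta))
```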

6.6.5. Plotting the Coefficients

Interpreting factor changes in the odds is useful, but with nominal outcomes the sheer number of coefficients makes it hard to find the patterns in the results, and if you also keep track of which coefficients are significant, the difficulty increases. Scanning a table of factor changes, where the odds of B relative to A are cut in half by one variable, unaffected by a second, and doubled by a third, quickly becomes unmanageable as variables and outcomes are added. An odds ratio plot summarizes this information graphically.

To construct the plot, each independent variable is placed on its own row, and each outcome category is represented by its letter, with the coefficient plotted as the distance between the letters. If a unit increase in a variable increases the odds of A over B, then A is plotted to the right of B, and vice versa. Figure 6.2 plots the coefficients on a scale in the units of the logit coefficients, with positions measured relative to category A. A is located at 0 on the bottom scale, indicating that a change in a variable does not change the odds of A relative to itself. If a unit increase in x_1 decreases the logit of B versus A, then B is located to the left of A by the size of the coefficient. While the coefficients are plotted relative to outcome A, they could equally be plotted relative to outcome B. Since our interest is in the factor change in the odds, a factor scale is printed at the top of the figure, with each value equal to the exponential of the corresponding logit coefficient. For example, the position of B on the factor scale shows the factor change in the odds of B versus A for a unit increase in x_1.

TABLE 6.7. Logit Coefficients for a Hypothetical Model With Three Outcomes and Variables x_1 Through x_4

Figure 6.2. Odds Ratio Plot for a Hypothetical Model (logit coefficient scale from -0.67 to 1.00 on the bottom axis; factor change scale on top)

The effects relative to B can be examined in a similar way, but even with this approach it is difficult to keep track of all of the comparisons. In the plot, the relative magnitudes of the effects of each variable are shown by the distance between the letters. Effects of equal magnitude but opposite sign, such as those of x_1 and x_4, are indicated by the distance from A to B being the same for both variables but in opposite directions; an effect half as large is indicated by a distance half as large; and a coefficient of 0 is indicated by A being plotted on top of B, as in "A B" for x_3. The intuition is simple: when the letters for two outcomes are at the same location, the variable does not differentiate the two outcomes.

Figure 6.3 plots the coefficients from Table 6.7 relative to A, which is located at 0 on the logit scale and hence at 1 on the factor change scale. The plot makes it immediately clear how the effects of the variables compare across outcomes. We also see that x_3 has a large effect on A versus C, but smaller effects on A versus B and on B versus C. Lines add links among outcomes that are not significantly affected by a variable; I have staggered the vertical spacing of the letters so that these lines can be seen. This spacing is needed only for legibility and has no substantive meaning. The plot shows, for example, that x_4 does not significantly affect the odds of any of the outcomes. You should take some time to study the figure to make sure that you see how it contains all of the information in Table 6.7.

Figure 6.3 shows the coefficients relative to A, which is why the A's are located at 0 on the logit scale and 1 on the factor scale. The information could also be plotted relative to either B or C. Plotting the information relative to outcome B would shift the letters for each variable so that they are located with B at 0 on the bottom scale; for x_1, this would shift all letters to the right by .69 units. The relative positions would remain the same. (Plot the coefficients relative to category C and convince yourself that the information is unchanged.) To more fully appreciate how the factor change coefficients can help you interpret the results, I now use these plots with our model of occupational attainment.
Figure 6.3. Odds Ratio Plot for the Hypothetical Model in Table 6.7 (variables x_1 through x_4 on the vertical axis; logit coefficient scale on the bottom axis)

Odds Ratio Plot: Occupational Attainment

The coefficients for the MNLM for occupational attainment are plotted in Figure 6.4. Race orders the occupations from menial to craft to blue collar to white collar to professional. The dotted lines show that none of the adjacent categories is significantly differentiated by race: for example, being white increases the odds of holding a craft job relative to a menial job, but the effect is not significant. Categories that are farther apart are significantly differentiated: being white significantly increases the odds of a professional job relative to a menial job. For education and experience, x-standardized coefficients have been plotted. The effect of a standard deviation change in education is large, as indicated by the wide spread of the letters. The menial, craft, and blue-collar jobs are not strongly differentiated by the effect of education, while education strongly differentiates the white-collar and professional jobs from the others, as would be expected given the educational requirements of professional jobs. The effects of experience are much weaker than those of either race or education. Experience splits the jobs into two groups: increasing experience increases the odds of white-collar, craft, and professional jobs relative to menial and blue-collar jobs. In comparing the effects of the variables on the various outcomes, it is important to note the different orderings of the outcomes for the different variables. If an ordered logit model had been used, this information would have been lost.

The Importance of the Predicted Probability

When using an odds ratio plot, it is essential to understand that the substantive meaning of a given factor change in the odds is dependent on the predicted probability or odds; this point was discussed in detail for the binary model. For example, if the odds increase by a factor of 10 but the current odds are 1 in 10,000, then the substantive impact is small. Consequently, the odds ratio plot must be interpreted while keeping in mind the base probabilities and the discrete changes in the probabilities. The plot can be modified to convey this information by making the height of the letters in the odds ratio plot proportional

to the discrete change in the probability. The square root of the change is used to set the height, so that the area of the letter is proportional to the size of the change. While I illustrate this enhancement with the MNLM, keep in mind that the same considerations apply to other models for categorical outcomes.

Odds Ratio Plot: Attitudes Toward Working Mothers

Figure 6.6 shows an enhanced odds ratio plot for attitudes toward the question of whether a working mother can establish as warm a relationship with her child as a mother who does not work. The size of the numbers within the plot is proportional to the size of the discrete changes in the probabilities when all variables are held at their means. In 1989, relative to 1977, the odds of the more positive outcomes were greater relative to the more negative outcomes, and the effects of the year of the survey arrange the outcomes as would be expected with an ordinal variable. Gender divides attitudes into those that are positive and those that are negative, with no significant effects within the two groups. Education has weaker effects, although it increases the odds of agreeing relative to strongly disagreeing. In this example, the ordinal arrangement of the outcomes holds for all variables, except where there are no significant differences between adjacent categories. This does not mean that the results are the same as those for the ordered logit model: the MNLM has eliminated the parallel regression restriction that forced the effect of each independent variable to be the same regardless of the outcome category.

Conclusions on the Multinomial Logit Model

While the interpretation of the multinomial logit model is complicated when there are more than a few outcome categories, the methods presented in this section can be used to uncover the essential patterns of effects. In some cases, these methods reveal that the results can be described in a few sentences, and there is little need to present the graphs. In other cases, the patterns are complex and the plots themselves need to be included as part of the presentation of results.
Figure 6.6. Enhanced Odds Ratio Plot for the Multinomial Logit Model of Attitudes Toward Working Mothers. Discrete Changes Were Computed With All Variables Held at Their Means. Categories Are: 1 = Strongly Disagree; 2 = Disagree; 3 = Agree; and 4 = Strongly Agree (variables YR, MALE, AGE, and ED; factor change scale from .22 to 4.48; logit coefficient scale from -1.50 to 1.50)

6.7. The Conditional Logit Model

In the MNLM, each explanatory variable has a different effect on each outcome. For example, the effect of x_k on outcome m is β_km, while the effect on outcome n is β_kn. The conditional logit model (CLM), sometimes referred to as the Luce model or (confusingly) the multinomial logit model, is a closely related model in which the coefficients for a variable are the same for each outcome, but the values of the variables differ for each outcome. For example, if we are trying to explain a commuter's choice of transportation among the options of train, bus, and private automobile, we might consider the amount of time or the cost per trip for each option. The effect of time would be the same for each mode of travel, but the amount of time would differ by the mode of transportation.

The CLM was developed by McFadden and others, largely within the context of research on travel demand. McFadden (1968) used the model to study criteria used by a state highway department to select urban freeway routes. Boskin (1974) examined occupational choice using characteristics of the occupations. A consumer's choice of transportation for a shopping trip was studied by Domencich and McFadden (1975). Hausman and McFadden (1984) analyzed the choice between an electric dryer, gas dryer, and no dryer. Hoffman and Duncan (1988) compared the MNLM and the CLM with specific attention to applications in demography.

In the CLM, the predicted probability is

    Pr(y_i = m | z_i) = exp(z_im γ) / Σ_{j=1}^{J} exp(z_ij γ)    [6.14]

which should be compared to the MNLM:

    Pr(y_i = m | x_i) = exp(x_i β_m) / Σ_{j=1}^{J} exp(x_i β_j)    where β_1 = 0    [6.15]

In Equation 6.15, there are J - 1 parameters β_km for each x_k, but only a single value of x_k for each individual. In Equation 6.14, there is a single γ_k for each variable z_k, but there are J values of the variable for each individual.

An example of how the data are constructed for the CLM is useful for understanding the model. Assume there is a single independent variable z and three outcomes. For four individuals, the data might look as follows:

    i   Outcome m   Outcome Chosen   Variable z_im
    1       1             1            z_11 = 7
    1       2             0            z_12 = 3
    1       3             0            z_13 = 1
    2       1             1            z_21 = 5
    2       2             0            z_22 = 1
    2       3             0            z_23 = 2
    3       1             0            z_31 = 0
    3       2             1            z_32 = 3
    3       3             0            z_33 = 1
    4       1             0            z_41 = 3
    4       2             0            z_42 = 2
    4       3             1            z_43 = 7

For each individual, there are three observations corresponding to the three possible outcomes. The differences in the values of z for the different outcomes determine the probabilities of the various choices.

To reflect this, I have listed the data such that the largest value of z is associated with the outcome that is chosen by the individual.

The models can be compared in terms of the odds form of the model. In the CLM, the odds of m versus n correspond to the difference in the values of the z's associated with the two outcomes:

    Ω_{m|n}(z_i) = exp[(z_im - z_in)γ]

In the MNLM, the odds correspond to the difference in the coefficients for the two outcomes:

    Ω_{m|n}(x_i) = exp[x_i(β_m - β_n)]

Boskin's (1974) application of the CLM to occupational attainment is a useful contrast to our analysis using the MNLM. In the MNLM, we examined how race, education, and experience affected the odds of different occupations. For a given individual, the values of the regressors were the same for all outcomes; for example, a person's race did not change with the choice of an occupation. In Boskin's CLM, the explanatory variables were the costs and benefits associated with each occupation. For example, for each person he computed the present value of the wages in that occupation (wages times the discounted number of hours the person will work in the future). The effect of the present value is the same for each occupation, but the value itself differs by occupation. For a given person, the present value of a professional occupation will generally exceed the present value of a menial job, thus making a professional occupation more likely, all else equal.

The conditional and multinomial models reflect different aspects of the processes by which individuals attain occupations. I suspect that at some point the most useful models for the analysis of nominal outcomes will combine characteristics of the multinomial and conditional models. To see how these models can be combined, we can take advantage of the formal equivalence of the CLM and the MNLM. To illustrate this equivalence, consider a MNLM with a single independent variable x and three dependent categories. To transform this into the CLM, we construct z vectors with four elements:

    z_i1 = (0   0    0   0)
    z_i2 = (1   x_i  0   0)
    z_i3 = (0   0    1   x_i)

The first subscript for z is the observation number; the second is the outcome (either 1, 2, or 3); and the third indexes the variables 1 through 4. z_i1 is a vector of 0's for all observations, which corresponds to the constraint that β_1 = 0. Within z_i2, the first element is always 1, the second is x_i, and the last two elements are always 0. Within z_i3, the first two elements are always 0, the third is always 1, and the last is x_i. To see how this construction of the z's leads to the MNLM, define γ = (β_20, β_21, β_30, β_31)'. Then

    z_i1 γ = (0 × β_20) + (0 × β_21) + (0 × β_30) + (0 × β_31) = 0
    z_i2 γ = (1 × β_20) + (x_i × β_21) + (0 × β_30) + (0 × β_31) = β_20 + β_21 x_i
    z_i3 γ = (0 × β_20) + (0 × β_21) + (1 × β_30) + (x_i × β_31) = β_30 + β_31 x_i

Substituting into the equation for the CLM,

    Pr(y_i = 1 | z_i) = 1 / [1 + exp(β_20 + β_21 x_i) + exp(β_30 + β_31 x_i)]
    Pr(y_i = 2 | z_i) = exp(β_20 + β_21 x_i) / [1 + exp(β_20 + β_21 x_i) + exp(β_30 + β_31 x_i)]
    Pr(y_i = 3 | z_i) = exp(β_30 + β_31 x_i) / [1 + exp(β_20 + β_21 x_i) + exp(β_30 + β_31 x_i)]

which is the MNLM.


biJities are: Pr(


This is neeessary
al 1 (1 1
hus service from the
ear travders to slart
crs would be
Pr( blue bus)
2 (1 )/( 1
tha!

6.7.1. Software Issues ability of a car ean


'ioftware that different colors 01' buses!
The of ¡rrelevan! alternatives is
strietive assumption. McFadden I
the multinomial and eonditional
cases where the outcome he assurned lo be dis-
Irrelcvant tinel and of each decisioll maker.
Similarly, Amemiya (1981, p. that lhe MNLM works well
In lhe multinomial for the mlds of In versus
when the alternatives are dissímilar. Care in lhe model to in-
volve distinct outcomes Ihat are not suhstítutes one another seems
ro be albeit LHJvíce. A formal test

6.8.1. Testing HA
in the IS

Hausman and McFadden (1 a test of the


HA property. A Hausman test is ba!'.ed on ¡he of two esti-
mators of the same One estimator is consisten! and cftkicnt
if the null hypothesis is tme, while the second estimator is consistent
withoul reference ro Ihe bul ineftkient. For both the MNLM and the lhe ML estimator
This known as Ihe indepen- is consisten! andefficient ir [he model is A consis-
HA. While this may tent bUI inefficient estimator obtained
appear 10 be obscure mathematical it has important practical restricted se! 01' outcomes (Ben-Akíva & 1985. p. I
which can be íIlustrated with lhe famous red bus/hlue hus alternatives are irrelevant in the odds for two outeomes. then
tha! attríbuted to McFadden. omitting those alternatives should no! affeet lhe estima tes of the param-
10 work: a private car that elers Ihat affect Ihe two outcomes. For you could estímate the
<lnd a bus with Pr{red bus) = ¡ The coefficients in a MNLM with A, and e
versus the red bus are 1 ( 1 estirnating the two binary A lo e amI B to C. Since
that a new hus eompany that is identical lO the current these estimators do not use aH of ¡he data would
that ¡he are hluc. ITA that the new proha- not be efflciellt.

The test statistic compares the restricted estimates β̂_R, obtained with some outcomes omitted, to the corresponding full-model estimates β̂_F:

    H_IIA = (β̂_R - β̂_F)' [Var(β̂_R) - Var(β̂_F)]^(-1) (β̂_R - β̂_F)

which is asymptotically distributed as chi-square if IIA holds, and which can be computed from the standard output of your software. Hausman and McFadden note that H_IIA can be negative, since the difference between the covariance matrices is not guaranteed to be positive semidefinite; negative values are taken as evidence that IIA holds.

6.9. Related Models

There are several related models that should be mentioned.

The Multinomial Probit Model. The multinomial probit model can be derived by assuming that the errors in the discrete choice model are normally distributed. This model was considered by Aitchison and Bennett (1970) and was applied with three outcome categories by Hausman and Wise (1978). The advantage of the multinomial probit model is that the errors can be correlated across outcomes, which eliminates the IIA restriction, since it is possible to incorporate correlations among errors in a multivariate normal distribution. Unfortunately, the computational burden of evaluating the multidimensional normal integral makes the model impractical in most applications, although McFadden has made progress in this area.

The Stereotype Model. The stereotype model was proposed by Anderson (1984) in response to the restrictive assumption of parallel regressions in the ordered logit model. The model begins with the MNLM and adds the constraint that β_m = φ_m β. This results in the model Pr(y = m | x) = exp(xφ_m β) / Σ_{j=1}^{J} exp(xφ_j β), with the ordering of the outcomes ensured by constraints on the φ's. There is no software designed specifically to estimate the model; DiPrete used a ML program written in GAUSS. The appropriateness of the model can be assessed by examining the estimates from the MNLM to see if the structure of the stereotype model is approximated, an approach taken by Greenwood and Farewell.

The Nested Logit Model. The nested logit model divides the choices into a hierarchy of levels and thus avoids the IIA assumption. Discussions can be found in Amemiya (1981) and Maddala (1983), among others.

Models for Ranked Data. Models for ranked data are also similar to the MNLM. Rank data occur when an individual ranks the alternatives from a set of choices; for example, a person might indicate the rank order of preference for three candidates running for office. References include Allison and Christakis and Hausman and Ruud (1987).

6.10. Conclusions

The multinomial logit model is useful for the analysis of both nominal and ordinal dependent variables. While the model is simple to estimate, interpretation is complicated by the large number of comparisons among outcomes. However, the graphical methods of this chapter can be applied effectively even with a large number of independent variables and dependent categories.

The MNLM was first considered by Gurland et al. (1960). Its use in the social sciences builds on derivations of the model from theories of choice, such as Luce (1959). Aitchison and Bennett (1970) were among the first to consider the multinomial probit model. The MNLM was derived from the assumptions of discrete choice theory by McFadden (1973). Nerlove and Press made an important contribution by distributing Fortran programs for the multinomial logit model.

7. Limited Outcomes: The Tobit Model

In the linear regression model, the values of all variables are known for the entire sample. This chapter considers the situation in which the sample is limited by censoring or truncation. Censoring occurs when we observe the independent variables for the entire sample, but for some observations we have only limited information about the dependent variable. For example, we might know that the dependent variable is less than 100, but not know how much less. Truncation limits the data more severely by excluding observations based on characteristics of the dependent variable. For example, in a truncated sample all cases where the dependent variable is less than 100 would be deleted. While truncation changes the sample, censoring does not.

The classic example of censoring is Tobin's study of household expenditures. A consumer maximizes utility by purchasing durable goods under the constraint that total expenditures do not exceed income. Expenditures for durable goods must at least equal the cost of the least expensive item. If a consumer has only $50 left after other expenses and the least expensive item costs $100, the consumer can spend nothing on durable goods. The outcome is censored since we do not know how much a household would have spent if a durable good could be purchased for less than $100. Many other examples of censored outcomes

can be found: hours worked by wives (Quester & Greene, 1982), scientific publications (Stephan & Levin, 1992), extramarital affairs (Fair, 1978), foreign trade and investment (Eaton & Tamura, 1994), austerity protests in Third World countries (Walton & Ragin, 1990), damage caused by a hurricane (Fronstin & Holtmann, 1994), and IRA contributions (LeClere, 1994). Amemiya (1985, p. 365) lists many additional examples.

Hausman and Wise's (1977) analysis of the New Jersey Negative Income Tax Experiment is an early application of models for truncated data. In this study, families with incomes more than 1.5 times the poverty level were excluded from the sample. Thus, the sample itself is affected and is no longer representative of the population.

Many models have been developed for censoring and truncation. This chapter focuses on the most frequently used model for censoring, the tobit model. Section 7.6 briefly reviews related models for truncation, multiple censoring, and sample selection.

7.1. The Problem of Censoring

Let y* be a dependent variable that is not censored. Panel A of Figure 7.1 shows the distribution of y*, where the height of the curve indicates the relative frequency of a given value of y*. If we do not know the value of y* when y* ≤ 1, corresponding to the shaded region, then y* is a latent variable that cannot be observed over its entire range. The censored variable y is defined as

    y = y*    if y* > 1
    y = 0     if y* ≤ 1

Figure 7.1. Latent, Censored, and Truncated Variables (Panel A: Latent; Panel B: Censored; Panel C: Truncated)

Panel B plots the censored variable y with censored cases stacked at 0. The bar contains the cases from the shaded region in panel A. Panel C plots the truncated variable y | y > 1 (i.e., y given that y > 1), which simply deletes the shaded region from panel A.

To see how censoring and truncation affect the LRM, consider the model y* = 1.2 + .08x + ε, where all of the assumptions of the LRM apply, including the normality of the errors. Panel A of Figure 7.2 shows a sample of 200 with no censoring. The solid line is the OLS estimate ŷ = 1.18 + .08x. If y* were censored below at 1, we would know x for all observations, but observe y* only for y* > 1. In panel B, values of y* at or below 1 are censored, with y = 0 for the censored cases. These are plotted with triangles. The three thick lines are the results of three approaches to estimation.

One way to estimate the parameters is with an OLS regression of y on x for all observations, with the censored data included as 0's. The resulting estimate ŷ = .95 + .11x is the long dashed line in panel B. The censored observations on the left pull down that end of the line, resulting in underestimates of the intercept and overestimates of the slope. This approach to censoring produces inconsistent estimates.

Since including censored observations causes problems, we might use OLS to estimate the regression after truncating the sample to exclude cases with a censored dependent variable. This changes the problem of censoring into the problem of a truncated sample. After deleting the cases at y = 0, the OLS estimate ŷ = 1.41 + .061x overestimates the intercept and underestimates the slope, as shown by the short dashed line. The uncensored observations at the left have pulled the line up, since those observations with large negative errors have been deleted. Truncation causes a correlation between x and ε, which produces inconsistent estimates.

A third approach is to estimate the tobit model, sometimes referred to as the censored regression model. The tobit model uses all of the information, including information about the censoring, and provides consistent estimates of the parameters. ML estimates for the tobit model are shown by the solid line, which is indistinguishable from the estimates in panel A where there is no censoring.

Example of Censoring and Truncation: Prestige of the First Job

Chapter 2 used as an example the regression of the prestige of a scientist's first academic job. (See Table 2.1, p. 19, for a description of the

variables.) The prestige of the first job was unavailable for scientists who took positions in departments without graduate programs. These censored cases were assigned the value 1.0, and OLS was used to estimate the model; the estimates are reproduced in the column "OLS With Censoring" of Table 7.1. Alternatively, we could truncate the sample by deleting the censored cases. The OLS estimates from the truncated sample are in the column "OLS With a Truncated Sample." Finally, tobit estimates are listed in the column "Tobit Analysis."

Figure 7.2. Linear Regression Model With and Without Censoring and Truncation

TABLE 7.1. The Effects of Censoring and Truncation on the Regression of the Prestige of the First Job (variables include FEM, PHD, ART, and CIT)

The most important difference between the results of the tobit analysis and the two OLS analyses concerns the effect of gender. In the tobit analysis, the effect of being a woman is significant and negative. With censoring, the effect is substantially smaller and not significant. In the truncated sample, the effect is positive, although not significant. Thus, a key substantive result is dependent on the method of analysis. Other differences in relative magnitude and level of significance are also found.
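The opposite biases of the two OLS approaches are easy to reproduce by simulation. The error standard deviation and the distribution of x below are assumptions, since the text does not give them:

```python
# Simulation in the spirit of Figure 7.2: y* = 1.2 + .08x + e, censored
# below at 1 with censored values recorded as 0.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
x = rng.uniform(0, 20, n)                 # assumed range of x
y_star = 1.2 + 0.08 * x + rng.normal(0, 0.5, n)

def ols(x, y):
    b = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    return y.mean() - b * x.mean(), b     # intercept, slope

y_cens = np.where(y_star > 1, y_star, 0.0)
keep = y_star > 1

a_cens, b_cens = ols(x, y_cens)                # censored cases included as 0's
a_trunc, b_trunc = ols(x[keep], y_star[keep])  # truncated sample
print(f"censored-in: a = {a_cens:.2f}, b = {b_cens:.3f} (true 1.2, .080)")
print(f"truncated:   a = {a_trunc:.2f}, b = {b_trunc:.3f} (true 1.2, .080)")
```

Including the censored 0's understates the intercept and overstates the slope; truncating does the reverse, as in panel B of Figure 7.2.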
7.2. Truncated and Censored Distributions

Before developing the tobit model, we need some results regarding truncated and censored normal distributions. These distributions are the foundation of most models for truncation and censoring. Results are presented for censoring and truncation on the left, which translates into censoring from below in the tobit model. Comparable formulas for censoring and truncation on the right, and on both the left and right, are available; for more details, see Johnson et al. or Maddala (1983).

7.2.1. The Normal Distribution

If y* is normally distributed with mean μ and variance σ², its pdf is

    f(y* | μ, σ) = [1 / (√(2π) σ)] exp[ −(y* − μ)² / (2σ²) ]        [7.1]

and its cdf is

    Pr(y* ≤ y) = F(y | μ, σ)        [7.2]

When μ = 0 and σ = 1, the standard normal distribution results, which is written using the simplified notation:

    φ(y*) = f(y* | μ = 0, σ = 1)
    Φ(y*) = F(y* | μ = 0, σ = 1)

Since the standard normal distribution is symmetric with a mean of 0, two identities follow that are frequently used to simplify other formulas:

    φ(δ) = φ(−δ)
    Φ(δ) = 1 − Φ(−δ)

Any normal distribution, regardless of its mean μ and variance σ², can be written as a function of the standard normal distribution. For example, Equations 7.1 and 7.2 can be written as

    f(y* | μ, σ) = (1/σ) φ((y* − μ)/σ)
    Pr(y* ≤ y) = Φ((y − μ)/σ)

Figure 7.3. Normal Distribution With Truncation and Censoring. Panel A: Normal; Panel B: Truncated; Panel C: Censored

7.2.2. The Truncated Normal Distribution

When values below τ are deleted, the variable y | y > τ has a truncated normal distribution. In terms of panel A of Figure 7.3, we want to consider the distribution of y* in the unshaded region.
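These quantities and identities are easy to check numerically. A minimal sketch using only the Python standard library (the function names are mine, not the text's):

```python
import math

def pdf_normal(y, mu=0.0, sigma=1.0):
    """f(y | mu, sigma): normal pdf, Equation 7.1."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def cdf_normal(y, mu=0.0, sigma=1.0):
    """Pr(y* <= y | mu, sigma): normal cdf via the error function."""
    z = (y - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# phi and Phi are the standard normal (mu = 0, sigma = 1) versions
phi = lambda d: pdf_normal(d)
Phi = lambda d: cdf_normal(d)

# Symmetry identities used throughout the chapter
assert abs(phi(1.3) - phi(-1.3)) < 1e-12            # phi(d) = phi(-d)
assert abs(Phi(1.3) - (1.0 - Phi(-1.3))) < 1e-12    # Phi(d) = 1 - Phi(-d)

# Any normal pdf is a rescaled standard normal: f(y|mu,sigma) = phi((y-mu)/sigma)/sigma
assert abs(pdf_normal(2.0, 1.0, 3.0) - phi((2.0 - 1.0) / 3.0) / 3.0) < 1e-12
```

Using `math.erf` keeps the sketch free of external dependencies.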

The truncated pdf is created by dividing the normal pdf by the probability of not being truncated, which rescales the pdf so that it integrates to 1 over the region above τ. This is seen in panel B of Figure 7.3 by comparing the truncated distribution, drawn with the solid line, to the original distribution, drawn with the dotted line. More formally, we can write the truncated pdf as

    f(y | y > τ, μ, σ) = f(y | μ, σ) / Pr(y > τ | μ, σ)        for y > τ

Since part of the distribution has been truncated, the effect of truncation on the mean must be taken into account. If y* is normal with mean μ and variance σ², then

    E(y | y > τ) = μ + σ λ((μ − τ)/σ)        [7.3]

where λ is known as the inverse Mills ratio:

    λ(δ) = φ(δ) / Φ(δ)

λ and its components φ and Φ are used so frequently in this chapter that it is worth examining how λ depends on its argument, which indicates how many standard deviations the mean μ is above or below the truncation point τ. To interpret Figure 7.4, assume that the mean μ is fixed and the truncation point τ changes. At the left of the figure, τ exceeds μ and truncation is more extreme; φ is large relative to Φ, generating large values of λ. As τ decreases relative to μ, the amount of truncation decreases. With this change, Φ increases more rapidly than φ, resulting in smaller values of λ. Equation 7.3 shows that as λ approaches 0, the mean of the truncated variable approaches μ.

Figure 7.4. The Inverse Mills Ratio

7.2.3. The Censored Normal Distribution

When y* is censored on the left, observations with values at or below τ are set to the value τy:

    y = y*     if y* > τ
    y = τy     if y* ≤ τ

Often τy equals τ, but other values, such as zero, are also useful. Panel C of Figure 7.3 plots a censored normal distribution, where the censored observations are stacked at τy.
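The truncated mean in Equation 7.3 can be verified by simulation. A sketch under the chapter's notation, with hypothetical values of μ, σ, and τ:

```python
import math, random

def phi(d):
    return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)

def Phi(d):
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

def inv_mills(d):
    """Inverse Mills ratio: lambda(delta) = phi(delta) / Phi(delta)."""
    return phi(d) / Phi(d)

def truncated_mean(mu, sigma, tau):
    """E(y | y > tau) = mu + sigma * lambda((mu - tau)/sigma), Equation 7.3."""
    return mu + sigma * inv_mills((mu - tau) / sigma)

# Check against a simulation that discards draws at or below tau
random.seed(7)
mu, sigma, tau = 1.0, 2.0, 0.5
draws = [random.gauss(mu, sigma) for _ in range(200_000)]
kept = [y for y in draws if y > tau]
sim_mean = sum(kept) / len(kept)
assert abs(sim_mean - truncated_mean(mu, sigma, tau)) < 0.03
```

Truncation from below pulls the mean of the retained observations above μ, exactly as λ predicts.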

The expected value of a censored variable is the sum of the expected values for the uncensored and censored portions of the distribution, each weighted by the probability of falling in that portion:

    E(y) = [Pr(Uncensored) × E(y* | y* > τ)] + [Pr(Censored) × τy]
         = Φ((μ − τ)/σ)[μ + σλ((μ − τ)/σ)] + Φ((τ − μ)/σ)τy        [7.4]

Consider how the expected value depends on τ. As τ approaches −∞, the probability of being censored approaches 0, Φ((μ − τ)/σ) approaches 1, and E(y) approaches the uncensored mean μ. As τ approaches ∞, the probability of being censored approaches 1 and E(y) approaches τy. These results are used below in developing the tobit model.

7.3. The Tobit Model for Censored Outcomes

In this section, I present the tobit model and the implications of censoring in several steps. First, I show the effects of the independent variables on the probability of censoring. Next, I demonstrate the problems associated with OLS estimation with censored data or a truncated sample. These problems lead to the ML estimator for the tobit model. Finally, I consider several methods for interpreting the parameters in the tobit model. Before proceeding, I want to consider a potential source of confusion.

7.3.1. The Distinction Between τ and τy

Many authors assume that τ = τy or τ = 0, resulting in formulas that are simpler than mine. This simplification can lead to confusion and incorrect results since in practice it is often the case that τ ≠ τy ≠ 0. Consequently, I make the distinction between τ and τy: the threshold τ determines whether y* is censored; τy is the value assigned to y if it is censored. While τy is often equal to τ, this is not always appropriate. Consider Tobin's application. The cost of the cheapest durable good (i.e., τ) is not zero, but for censored cases it is most reasonable to code y = τy = 0 since these people did not purchase any durable goods. In my formulas, you can substitute τ = 0 or τ = τy to obtain formulas that match those in other sources. If you use formulas that equate these quantities, it is essential that these restrictions apply to your data.

The structural model for the tobit is

    y*_i = x_i β + ε_i        [7.5]

for all cases. y* is a latent variable that is observed for values greater than τ and is censored for values less than or equal to τ. The observed y is defined by the measurement equation:

    y_i = y*_i     if y*_i > τ
    y_i = τy       if y*_i ≤ τ        [7.6]

Combining Equations 7.5 and 7.6,

    y_i = x_i β + ε_i     if y*_i > τ
    y_i = τy              if y*_i ≤ τ        [7.7]

The same approach can be used in situations where there is censoring from above. For example, if incomes over $100,000 were combined into a single category, income would be censored from above, and

    y_i = x_i β + ε_i     if y*_i < τ
    y_i = τy              if y*_i ≥ τ

Details for censoring from above are given in Section 7.6.1.

7.3.2. The Distribution of Censoring

The probability of being censored depends on the proportion of the distribution of y*, or, equivalently, ε, that falls below τ. The distribution of y given x is shown in panel A of Figure 7.5. E(y*|x) is the solid line, with the distribution of y* shown at three values of x. For example, at x₁ a vertical line is drawn with a normal curve coming out of the page. Censoring occurs when observations fall at or below the line y* = τ, which is indicated by the shaded region of the distribution. As the value of x increases, E(y*|x) increases, causing the proportion of the distribution that is censored to decrease. Thus, the region labeled A is larger than B, which is larger than C.

The probability of a case being censored for a given value of x is the area of the normal curve less than or equal to τ:

    Pr(Censored | x_i) = Pr(y*_i ≤ τ | x_i)

To simplify the formulas that follow, let

    δ = (xβ − τ) / σ

δ is the number of standard deviations that xβ is above τ. (How are Φ(δ) and Φ(−δ) related?) Using this definition,

    Pr(Censored | x_i) = Φ(−δ_i)        [7.8]
    Pr(Uncensored | x_i) = Φ(δ_i)        [7.9]

Equation 7.8 is plotted in panel B of Figure 7.5. The points on the curve labeled A, B, and C correspond to the shaded regions in panel A. At the left, the change in Pr(Censored|x) is gradual as the thin tail moves over the threshold. The probability then decreases rapidly as the fat center of the curve passes over the threshold, and then slowly as the bottom tail passes over the threshold.

The Link Between Tobit and Probit

Deriving the probability of a case being censored is very similar to the derivation of the probability of an event in the probit model of Chapter 3. The structural models for probit and tobit are the same, but the measurement models differ. In the tobit model, we know the value of y* when y* > τ, while in the probit model we only know if y* > τ. Since more information is available in tobit (i.e., we know y* for some observations), estimates of the β's from tobit are more efficient than the estimates that would be obtained from a probit model. Further, since all cases are censored in probit, we have no way to estimate the variance of ε and must assume that Var(ε | x) = 1, while Var(ε | x) can be estimated in the tobit model.
Figure 7.5. Censoring in the Tobit Model

Since ε_i/σ is N(0, 1),

    Pr(Censored | x_i) = Pr(y*_i ≤ τ | x_i) = Pr(ε_i ≤ τ − x_i β | x_i) = Φ((τ − x_i β)/σ)

Example of the Probability of Censoring: Prestige of the First Job

The effects of gender, doctoral prestige, and a postdoctoral fellowship on the probability that the prestige of the first job is censored are illustrated in Figure 7.6. The solid line with open squares shows the probability of censoring for women who were not fellows. Female fellows are less likely to have the prestige of their first job censored (i.e., to have a first job with prestige below 1), as shown by the other solid line.
When doctoral prestige is 1, female fellows have a probability of being censored that is .11 less than that of female nonfellows; when prestige is 5, female fellows have a probability of being censored that is .04 less than that of female nonfellows. Notice that the effect of being a fellow depends on the level of doctoral prestige. The dashed lines show similar results for men. For male nonfellows, the probability decreases from .41 to .08, while for male fellows the probability decreases from .30 to .04. Being female increases the probability of being censored, by .03 when doctoral prestige is 5.

Figure 7.6. Probability of Being Censored by Gender, Fellowship Status, and Ph.D. Prestige

When unranked jobs are coded 1 and kept in the sample, the estimates are biased since these jobs are assigned a higher prestige score than they would have had if the variable were not censored; the estimated effect of being a woman under this coding is shown in the column "OLS with Censored Data" in Table 7.1.

7.3.3. Problems Introduced by Censoring

Censoring and truncation create problems for estimation that are frequently encountered in research on the labor force. Here I deal with these problems and, in the process, present results that are useful for interpreting the tobit model.

A Truncated Sample

The structural model for the latent variable is y* = xβ + ε. Since E(ε | x) = 0, E(y* | x) = xβ. With truncation, our model is

    y_i = x_i β + ε_i        for all i such that y_i > τ

The dependent variable is the truncated variable y | y > τ. Taking expectations,

    E(y | y > τ, x) = xβ + E(ε | y > τ, x)

Since the sample is truncated, E(ε | y > τ, x) is not zero. From Equation 7.3,

    E(y | y > τ, x) = xβ + σλ(δ)        [7.10]
where δ = (xβ − τ)/σ and λ_i is used for λ(δ_i). Equation 7.10 implies the regression model

    y_i = x_i β + σλ_i + ε_i        [7.11]

so that λ may be thought of as another variable in the regression, and σ can be thought of as the slope coefficient for the variable λ. If we estimate β by regressing y on x alone, we have a model that excludes the variable λ. The OLS estimates will be inconsistent.

The geometry of the problem is shown in Figure 7.7. If there were no truncation, the regression line E(y*|x) = xβ would pass through the center of the observations, with points both above and below the line. With truncation at τ, E(y | y > τ, x) is above xβ, since smaller values of y have been excluded. As x moves to the right, λ gets smaller and E(y | y > τ, x) gets closer and closer to xβ. Given the difference between E(y | y > τ, x) and xβ, it is clear why OLS produces inconsistent estimates when the sample is truncated.

Censored Data

A second approach is to analyze the entire sample after assigning τy to censored cases. In Figure 7.7, the censored observations are indicated by the circles located on the line y = τy = 2. Since values of y* below 2 have been set to 2, E(y|x) is above E(y*|x), as shown by the short dashed line. This line is below E(y | y > τ, x) since the censored cases have not been eliminated, but only assigned an unrealistically large value. If we use OLS to estimate the regression using the entire sample after assigning τy to censored observations, the estimates are inconsistent.

More formally, with censoring our model becomes

    y_i = x_i β + ε_i     if y*_i > τ
    y_i = τy              if y*_i ≤ τ        [7.12]

Applying Equation 7.4, the expected value of y given x is the sum of the components for uncensored and censored cases:

    E(y | x) = [Pr(Uncensored | x_i) × E(y | y > τ, x_i)] + [Pr(Censored | x_i) × τy]        [7.13]

Using Equations 7.8 and 7.9 with δ = (xβ − τ)/σ,

    E(y | x) = Φ(δ) E(y | y > τ, x) + Φ(−δ) τy        [7.14]

Substituting results from Equations 7.10 and 7.12, and simplifying,

    E(y | x) = Φ(δ)[xβ + σλ(δ)] + Φ(−δ) τy        [7.15]

E(y | x) is nonlinear in x, so that estimating the regression of y on x results in inconsistent estimates of the parameters for the regression of y* on x. (What happens to Equation 7.15 if Φ(δ) = 1? If Φ(δ) = 0?)
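Equation 7.15 can be checked against a Monte Carlo simulation of the censoring process. A sketch with hypothetical parameter values (xβ, σ, τ, and τy below are illustrative, not from Table 7.1):

```python
import math, random

def phi(d): return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
def Phi(d): return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

def censored_mean(xb, sigma, tau, tau_y):
    """E(y|x) from Equation 7.15: Phi(d)[xb + sigma*lambda(d)] + Phi(-d)*tau_y."""
    d = (xb - tau) / sigma
    lam = phi(d) / Phi(d)
    return Phi(d) * (xb + sigma * lam) + Phi(-d) * tau_y

# Monte Carlo check: censor draws of y* = xb + e at tau, recoding them to tau_y
random.seed(11)
xb, sigma, tau, tau_y = 1.0, 1.5, 0.0, 0.0
ystar = [xb + random.gauss(0.0, sigma) for _ in range(200_000)]
y = [yi if yi > tau else tau_y for yi in ystar]
sim = sum(y) / len(y)
assert abs(sim - censored_mean(xb, sigma, tau, tau_y)) < 0.03
```

The simulated mean of the censored variable matches the analytic formula, while the mean of y* itself is just xβ.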

In the presence of censoring, OLS is inconsistent. One approach to estimating the tobit model is based on Equation 7.11: a two-stage estimator in which λ is estimated in the first stage and y = xβ + σλ is estimated by OLS in the second stage. Since this estimator is less efficient and no easier to compute than the ML estimator, I do not consider it further.

ML estimation for the tobit model involves dividing the observations into two sets. The first set contains uncensored observations, which ML treats in the same way as the LRM. The second set contains censored observations, for which we do not know the specific value of y*; this requires computing the probability of being censored for use in the likelihood equation. Figure 7.8 illustrates the approach for three observations, indicated by solid circles. At each value of x there is a normal curve showing the distribution of y*. For uncensored observations, the distance from the observation to the normal curve is the likelihood of that observation for a given β and σ. The line at τ indicates where censoring occurs. For a censored observation, such as (x₁, y₁), we do not know the value of y* and hence cannot use the height of the normal curve at that point for the likelihood. Since all we know for censored cases is that y* ≤ τ, we use the probability of being censored as the likelihood. This is indicated by the shaded region.

Formally, for uncensored observations,

    y_i = x_i β + ε_i

where ε_i ~ N(0, σ²). As in Equation 2.8, the log likelihood for uncensored observations is

    ln L_U(β, σ²) = Σ_Uncensored ln [(1/σ) φ((y_i − x_i β)/σ)]

In words, ln L_U is the sum of the logs of the heights of the pdf at the observed points (x_i, y_i).

For censored observations, we know x and that y* ≤ τ, so we can compute

    Pr(y*_i ≤ τ | x_i) = Φ((τ − x_i β)/σ)        [7.16]

Thus, for the first observation in Figure 7.8, we are computing the area of the shaded region at or below y = τ rather than the height of the pdf. Using Equation 7.16, we can write that part of the likelihood function that applies to censored observations and take logs:

    ln L_C(β, σ²) = Σ_Censored ln Φ((τ − x_i β)/σ)

Combining the results for censored and uncensored observations,

    ln L(β, σ² | y, X) = Σ_Uncensored ln [(1/σ) φ((y_i − x_i β)/σ)] + Σ_Censored ln Φ((τ − x_i β)/σ)

Figure 7.8. Maximum Likelihood Estimation for the Tobit Model

While this likelihood equation is unusual with its combination of the pdf for uncensored observations and the cdf for censored observations,
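The two-part log likelihood can be written down directly. A minimal sketch for a single regressor, using an artificial four-observation sample (the data and parameter values are illustrative only):

```python
import math

def tobit_loglik(beta0, beta1, sigma, tau, data):
    """Tobit log likelihood: ln[(1/sigma) phi(.)] for uncensored cases,
    ln Phi((tau - xb)/sigma) for censored ones (Equation 7.16)."""
    def phi(d): return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    def Phi(d): return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
    ll = 0.0
    for x, y, censored in data:
        xb = beta0 + beta1 * x
        if censored:
            ll += math.log(Phi((tau - xb) / sigma))
        else:
            ll += math.log(phi((y - xb) / sigma) / sigma)
    return ll

# Tiny artificial sample of (x, y, censored); censored cases carry y = tau
sample = [(0.0, 1.2, False), (1.0, 0.0, True), (2.0, 2.5, False), (3.0, 0.0, True)]
ll = tobit_loglik(0.5, 0.3, 1.0, 0.0, sample)
assert ll < 0.0  # sum of log densities and log probabilities is negative here
```

In practice the two slope parameters and σ would be chosen by numerical maximization of this function.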
the ML estimator has the usual asymptotic properties if the assumptions of the tobit model hold. Programs such as LIMDEP provide ML estimation of the tobit model.

7.4.1. Violations of Assumptions

The derivation of the tobit model assumes that the errors are normal and homoscedastic. In the LRM, if these assumptions are violated, the OLS estimates are unbiased but not efficient. This is not the case in the tobit model. Maddala and Nelson show that the ML estimator of the tobit model is inconsistent if there is heteroscedasticity, and Arabmazar and Schmidt (1981) examine the robustness of the ML estimator to heteroscedasticity. The likelihood equation can be modified to account for heteroscedasticity of a known form; LIMDEP provides ML estimation when the variance is of the form σ_i = σ exp(z_i γ). The ML estimator is also inconsistent when the errors are not normal (Arabmazar & Schmidt, 1982). Estimation of the tobit model under violations of these assumptions is considered further in Chapter 9.

7.5. Interpretation

There are three outcomes that can be of interest in the tobit model: the latent variable y*; the truncated variable y | y > τ; and the censored variable y, each conditional on x. This section presents methods for interpreting changes in the expected values of each of these outcomes.

7.5.1. Change in the Latent Outcome

If the latent outcome is of primary interest, interpretation is simple. The expected value of y* is

    E(y* | x) = xβ

Since this is linear in the x's, we can state:

• For a unit increase in x_k, the expected value of y* increases by β_k units, holding all other variables constant.

Since ∂E(y*|x)/∂x_k = β_k, the effect of x_k does not depend on the value of x_k or of the other x's. Following the arguments in Section 2.2.1, standardized and semi-standardized coefficients can be computed as β_k^S = (σ_k/σ_y*) β_k, where σ_y* is the unconditional standard deviation of y* and σ_k is the standard deviation of x_k. Since y* is a latent variable, σ_y* cannot be computed from the observed data. To deal with this problem, Roncek proposed an approximation to a standardized coefficient that uses the standard deviation of the observed y. Since σ_y* and σ_y are different (Why?), Roncek's approximation should not be used. Instead, the unconditional variance of y* should be estimated with a quadratic form of the type

    Var(y*) = β' Var(x) β + σ²

based on the covariance matrix among the x's and the ML estimates of β and σ².

Example: Prestige of the First Job

The tobit coefficients can be interpreted in the same way as coefficients from the LRM:

• For each unit increase in the prestige of the doctoral department, the prestige of the first job is expected to increase by .32 units, holding all other variables constant.
• For a standard deviation increase in the prestige of the doctoral department, the prestige of the first job is expected to increase by .32 × s_PhD units, holding all variables constant.

Being female and doctoral prestige are statistically significant.

7.5.2. Change in the Truncated Outcome

E(y | y > τ, x) is defined only for those observations that are not truncated. If the dependent variable is expenditures on durable goods, the truncated outcome is how much was spent by people who bought durable goods; those who did not purchase goods are excluded. This outcome may be of interest in its own right. For example, manufacturers of durable goods might want to know how much money consumers will spend, and may be uninterested in what consumers would have spent if durable goods could be purchased for less than the threshold τ. Whether this outcome is of interest depends on the substantive focus of the research.

The expected value of the truncated outcome is, from Equation 7.10,

    E(y | y > τ, x) = xβ + σλ(δ)        [7.17]

The effect of a unit increase in x_k on the expected value of the truncated outcome is the partial derivative of E(y | y > τ, x) with respect to x_k:

    ∂E(y | y > τ, x)/∂x_k = β_k {1 − λ(δ)[λ(δ) + δ]}        [7.18]

A simple proof shows that the quantity in square brackets must fall between 0 and 1, approaching 1 as xβ increases. Thus, ∂E(y | y > τ, x)/∂x_k approaches β_k as xβ increases. This is seen by comparing the solid and long dashed lines in Figure 7.7. For a dichotomous variable, the discrete change as the variable moves from 0 to 1 should be used instead of the derivative:

    E(y | y > τ, x, x_k = 1) − E(y | y > τ, x, x_k = 0)

For both the partial and the discrete change, the effect depends on the level of all of the x's in the model. As a summary measure, the partial or discrete change for each variable with all other variables held at their means is often used.

7.5.3. Change in the Censored Outcome

The censored outcome y equals the latent y* when the dependent variable is observed, and equals τy (usually τ or 0) when the dependent variable is censored. If the dependent variable is spending on durable goods, it is useful to let τy = 0. Then E(y|x) is the actual expected expenditure of those with characteristics x. Those who are censored are included as 0's, which is how much they actually spent.

From Equation 7.15,

    E(y | x) = Φ(δ)xβ + σφ(δ) + Φ(−δ)τy

where δ is (xβ − τ)/σ. The partial derivative with respect to x_k is

    ∂E(y | x)/∂x_k = β_k [Φ(δ) + φ(δ)(τ − τy)/σ]

If τy = τ, then we obtain the simpler result:

    ∂E(y | x)/∂x_k = Φ(δ)β_k = Pr(Uncensored | x)β_k        [7.19]
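Equation 7.19 can be confirmed numerically by comparing the analytic marginal effect with a finite difference of Equation 7.15 (with τy = τ). The parameter values below are hypothetical:

```python
import math

def phi(d): return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
def Phi(d): return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

def E_censored(xb, sigma, tau):
    """E(y|x) from Equation 7.15 with tau_y = tau."""
    d = (xb - tau) / sigma
    return Phi(d) * (xb + sigma * phi(d) / Phi(d)) + Phi(-d) * tau

beta_k, sigma, tau = 0.4, 1.0, 0.0
xb = 0.9          # value of x*beta at the point of evaluation
d = (xb - tau) / sigma

# Analytic marginal effect, Equation 7.19: Phi(delta) * beta_k
analytic = Phi(d) * beta_k

# Numerical check: finite difference of E(y|x) as x_k moves by a small h
h = 1e-6
numeric = (E_censored(xb + beta_k * h, sigma, tau) - E_censored(xb, sigma, tau)) / h
assert abs(analytic - numeric) < 1e-4
```

The effect on the censored outcome is the latent effect β_k attenuated by the probability of being uncensored.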

7.5.4. McDonald and Moffitt's Decomposition

McDonald and Moffitt (1980) proposed a decomposition of the change in the censored outcome. The basic idea is to differentiate Equation 7.13 using the product rule. After a good deal of algebra,

    ∂E(y|x)/∂x_k = Pr(Uncensored|x) × ∂E(y | y > τ, x)/∂x_k
                   + [E(y | y > τ, x) − τy] × ∂Pr(Uncensored|x)/∂x_k

where E(y | y > τ, x) is the expectation of y for an observation being uncensored, and Pr(Uncensored|x) is the probability of being uncensored given x. When τy = 0, this results in the more commonly found version of the decomposition. It is important to realize that if τy is not 0, the simpler formula is not appropriate. The decomposition shows that when x_k changes, it affects the expectation for uncensored cases weighted by the probability of being uncensored, and it affects the probability of being uncensored weighted by the expected value for uncensored cases minus the censoring value. While the decomposition is useful for understanding how changes in the observed cases occur, its usefulness depends on one's interest in y as opposed to y*.

7.6. Extensions

The tobit model has been extended in many ways. Amemiya (1985, pp. 360-411), Berndt (1991), Breen (1996), and Maddala (1983) discuss many of these extensions, most of which can be estimated with LIMDEP. The review in this section is far from comprehensive.

7.6.1. Upper Censoring

The simplest extension of the tobit model with censoring from below is the tobit model with censoring from above:

    y_i = y*_i     if y*_i < τ
    y_i = τy       if y*_i ≥ τ

This model can be obtained from the model with lower censoring simply by changing the sign of y: censoring y from above at τ is identical to censoring −y from below at −τ. Since this simple change has subtle effects on the signs in many formulas, I present key results here. The probability of censoring is

    Pr(Censored | x) = Φ(δ)

where δ = (xβ − τ)/σ. Expected values are

    E(y* | x) = xβ
    E(y | y < τ, x) = xβ − σλ(−δ)
    E(y | x) = Φ(−δ)xβ − σφ(δ) + Φ(δ)τy

The partial derivatives with respect to x_k are

    ∂E(y* | x)/∂x_k = β_k
    ∂E(y | y < τ, x)/∂x_k = β_k [1 + δλ(−δ) − λ(−δ)²]

7.6.2. Upper and Lower Censoring

The two-limit tobit model allows censoring from both above and below:

    y_i = τ_L      if y*_i ≤ τ_L
    y_i = y*_i     if τ_L < y*_i < τ_U
    y_i = τ_U      if y*_i ≥ τ_U

The two-limit model is appropriate when the outcome is a probability or a proportion. Applications include Fronstin and Holtmann (1994), who examined the proportion of homes in a development that were damaged by a hurricane, and Worden (1990) in a study of who will file for bankruptcy. With two limits, the likelihood function includes components for upper censoring, lower censoring, and uncensored observations.

The formulas for truncated and censored outcomes are generalizations of those discussed above. Define

    δ_L = (τ_L − xβ)/σ        δ_U = (τ_U − xβ)/σ

The expected value for the truncated outcome is (Maddala, 1983)

    E(y | τ_L < y* < τ_U, x) = xβ + σ [φ(δ_L) − φ(δ_U)] / [Φ(δ_U) − Φ(δ_L)]

The partial derivative with respect to x_k follows by differentiation. If x_k is dichotomous, the discrete change

    E(y | τ_L < y* < τ_U, x, x_k = 1) − E(y | τ_L < y* < τ_U, x, x_k = 0)

should be used. For the observed outcome:

    E(y | x) = [τ_L Pr(y*_i ≤ τ_L | x_i)] + [E(y | τ_L < y* < τ_U, x) Pr(τ_L < y* < τ_U | x_i)] + [τ_U Pr(y*_i ≥ τ_U | x_i)]
             = Φ(δ_L)τ_L + [Φ(δ_U) − Φ(δ_L)]xβ + σ[φ(δ_L) − φ(δ_U)] + Φ(−δ_U)τ_U

Differentiating results in a simple expression analogous to the one used for the single-limit tobit:

    ∂E(y | x)/∂x_k = [Φ(δ_U) − Φ(δ_L)] β_k

If x_k is dichotomous, use the discrete change E(y | x, x_k = 1) − E(y | x, x_k = 0).

7.6.3. The Truncated Regression Model

Truncation occurs when no information is available for the dependent or independent variables for cases where the dependent variable is beyond some threshold. For example, if you sample only individuals whose income is below the poverty line, income is truncated from above. The truncated regression model is used to analyze these types of data:

    y_i = x_i β + ε_i        for all i such that y_i > τ

This is simply a respecification of the structural equation for the tobit model. The likelihood of each observation is computed in the same way as for uncensored observations in the tobit model, except that the likelihood must be rescaled by the area of the normal distribution that has not been truncated:

    f(y_i | x_i) = (1/σ) φ((y_i − x_i β)/σ) / Φ((x_i β − τ)/σ)

The likelihood function becomes ln L = Σ_i ln f(y_i | x_i). The expected value of y | y > τ and the derivatives are the same as for the tobit model.

The importance of taking truncation into account is illustrated with results from Hausman and Wise's (1977) analysis of the New Jersey Negative Income Tax Experiment. In this study, the sample was truncated to exclude families with incomes more than 1.5 times the poverty level. Table 7.2 presents OLS estimates that ignore the truncation and ML estimates of the truncated regression model. The ML estimates are substantially larger, with coefficients often having larger z-values. These results show how severe the bias introduced by OLS estimation of a truncated sample can be.

TABLE 7.2 Hausman and Wise's OLS and ML Estimates With Truncation

7.6.4. Individually Varying Limits

The tobit model can be extended to allow censoring limits that differ from individual to individual. This extension is closely related to event history analysis and is discussed in Section 9.4.

7.6.5. Models for Sample Selection

Sample selection models generalize the tobit and truncated regression models by explicitly modeling the mechanism that selects observations as being censored or uncensored. There is a vast and growing literature on sample selection models (see Amemiya, 1985, Chapter 10; Maddala, 1983; Manski, 1995, for further details). I consider only the simplest model for sample selection, sometimes known as the Heckman model (1976).

In the tobit and truncated regression models, the structural model is

    y*_i = x_i β + ε_i

Instead of y being observed when y* > τ, assume that y is observed based on the value of a second latent variable z*, where

    z*_i = w_i α + u_i        [7.20]

x and w can have variables in common. y is observed only when z* > 0. To estimate the model, we assume that the errors are normally distributed such that

    (ε_i, u_i) ~ N[ (0, 0)', ( σ_ε²  ρσ_ε ; ρσ_ε  1 ) ]

where Var(u_i) is assumed to be 1 since z* is latent: only the binary outcome z_i = 1 if z*_i > 0, and z_i = 0 otherwise, is observed. Paralleling the derivation for the tobit model (Greene, 1993), we can compute the expected value of the observed y:

    E(y_i | y_i observed) = x_i β + ρσ_ε λ(w_i α)

Regressing y on x for the observed cases produces inconsistent estimates since λ has been omitted. Heckman's two-step estimator involves first estimating α with a probit model of z on w, and then estimating the regression of y on x and the estimated λ.

7.7. Conclusions

This chapter has only touched on the rich set of models that deal with truncation, censoring, and selection. In all of these models, the basic idea is the same: due to some data collection mechanism, data are missing on some of the observations in a systematic way. As a consequence, the LRM produces biased and inconsistent estimates.

7.8. Bibliographic Notes

While censored and truncated distributions have a long history in biostatistics, within the social sciences the structural modeling of censoring and truncation originated with Tobin's (1958) article on household expenditures for durable goods. Indeed, this entire class of models is sometimes referred to as tobit models, a term coined to stand for "Tobin's probit." In the 1970s, a series of generalizations of Tobin's model appeared that stimulated a great deal of applied and theoretical work, including Gronau (1973) and Hausman and Wise (1977). See Amemiya (1985) for an extensive review of this literature; see also Breen (1996).

Count Outcomes: Regression Models for Counts

Variables that count the number of times that something has happened are common in the social sciences. Hausman et al. examined the effect of R&D expenditures on the number of patents received by U.S. companies; Cameron and Trivedi (1986) analyzed factors affecting how frequently a person visited the doctor; Grogger (1990) studied the deterrent effects of capital punishment on daily homicides; and King (1989b) examined the effects of alliances on the number of nations at war. Other count outcomes include derogatory reports in an individual's credit history (Greene, 1994); consumption of beverages (Mullahy, 1986); illnesses caused by pollution (Portney & Mullahy, 1986); party switching by members of the House of Representatives (King, 1988); industrial injuries (Ruser, 1991); the emergence of new companies (Hannan & Freeman, 1989, p. 230); and police arrests (Land, 1992).

Count variables are often treated as though they are continuous, and the linear regression model is applied. The use of the LRM for count outcomes can result in inefficient, inconsistent, and biased estimates. Fortunately, there are a variety of models that deal explicitly with characteristics of count outcomes. The Poisson regression model is the most basic model. With this model, the probability of a count is determined by a Poisson distribution, where the mean of the distribution is a function of the independent variables.

Figure 8.1. The Poisson Distribution. Panel A: μ = 0.8; Panel B: μ = 1.5; Panel C: μ = 2.9; Panel D: μ = 10.5

8.1. The Poisson Distribution

Let y be a random variable indicating the number of times that an event has occurred during an interval of time. y has a Poisson distribution with parameter μ > 0 if

    Pr(y | μ) = exp(−μ) μ^y / y!        for y = 0, 1, 2, ...        [8.1]

Since μ is the expected number of events during a fixed period of time, it can also be thought of as a rate. Figure 8.1 plots the Poisson distribution for four values of μ and illustrates four of its characteristics (see Taylor & Karlin, 1994):

1. μ is the mean of the distribution: E(y) = μ. As μ increases, the mass of the distribution shifts to the right.
2. μ is also the variance: Var(y) = μ. This is known as equidispersion. In practice, count variables often have a variance greater than the mean, which is called overdispersion. The development of many models for count data is an attempt to account for overdispersion.
3. As μ increases, the probability of a 0 decreases. For μ = .8, the probability of a 0 is .45; for μ = 1.5, it is .22; for μ = 2.9, it is .06; and for μ = 10.5, the probability is .00002. For many count variables, there are more observed 0's than predicted by the Poisson distribution.
4. As μ increases, the Poisson distribution approximates a normal distribution. This is shown in panel D, where a normal distribution with a mean and variance of 10.5 has been superimposed on the Poisson distribution.

The Poisson distribution can be derived from a simple stochastic process, known as a Poisson process, where the outcome is the number of times that something has happened (see Taylor & Karlin, 1994, pp. 252-258, for a formal derivation of the Poisson distribution). A critical assumption of a Poisson process is that events are independent. This means that when an event occurs, it does not affect the probability of the event occurring in the future. For example, consider the publication of articles by scientists. The assumption of independence implies that when a scientist publishes a paper, her rate of productivity does not change. Past success does not affect future success.

Example of the Poisson Distribution: Article Counts

Long (1990) examined factors affecting the number of papers published during graduate school by a sample of 915 biochemists. The average number of articles was 1.7, with a variance that exceeds the mean, which indicates that there is overdispersion in the distribution of articles. The form of this overdispersion is shown in Figure 8.2. The observed proportions for each count are indicated by diamonds that are connected by a solid line. The circles show the predicted probabilities from a Poisson distribution with μ = 1.7. Compared to the Poisson distribution, the observed distribution has substantially more 0's, fewer cases in the center of the distribution, and more observations in the upper tail than would be expected if the number of articles were generated by a Poisson process in which all scientists had the same rate of productivity. Of course, the idea that all scientists have the same rate of productivity is unrealistic, which leads us to the idea of heterogeneity.

Figure 8.2. Distribution of Observed and Predicted Counts of Articles

8.1.1. The Idea of Heterogeneity

One explanation for the failure of the Poisson distribution to fit the observed data is that the rate of productivity μ differs across individuals. This is known as heterogeneity. Failure to account for heterogeneity in the rate results in overdispersion in the distribution of the count. For example, suppose that the mean productivity for men is μ + δ, with a corresponding variance of μ + δ, while the mean and variance for women are μ − δ. Publications are assumed to be generated by a Poisson process in which the rate of productivity differs for men and women. What will the marginal distribution look like? Assume there are equal numbers of men and women. Then the mean rate of productivity for the combined sample is the average of the rates for men and women, μ = [(μ + δ) + (μ − δ)]/2, but the variance will exceed μ. (Draw the two conditional distributions and show that the marginal distribution would have a larger variance.) In general, failure to account for heterogeneity among individuals in the rate of a count variable leads to overdispersion in the marginal distribution. This result leads to the Poisson regression model, which introduces heterogeneity based on observed characteristics.

8.2. The Poisson Regression Model

In the Poisson regression model, hereafter the PRM, the number of events y has a Poisson distribution with a conditional mean that depends on an individual's characteristics according to the structural model:

    μ_i = E(y_i | x_i) = exp(x_i β)        [8.2]

Taking the exponential of xβ forces the expected count μ to be positive, which is required for the Poisson distribution. While other relationships between μ and the x's are possible, such as E(y|x) = xβ, they are rarely used.

Panel A of Figure 8.3 illustrates the PRM for a single independent variable x. The relationship between μ and x is shown by a solid line. Since y is a count, it can only have nonnegative integer values. These values are represented by dotted lines, which should be thought of as coming out of the page. The height of each line indicates the probability of a count given x.

The distribution of counts around the conditional mean of y in panel A of Figure 8.3 reflects the characteristics of the Poisson distribution that were discussed using Figure 8.1. Indeed, I constructed Figure 8.3 so that the means at x equal to 0, 5, 10, and 20 correspond to the means in the earlier figure. You can see that as μ increases: (1) the conditional variance of y increases; (2) the proportion of 0's decreases; and (3) the distribution around the expected value becomes approximately normal.

The figure also shows why the PRM can be thought of as a nonlinear regression model with errors ε = y − E(y|x). While the conditional mean of ε is 0, the errors are heteroscedastic since Var(ε|x) = E(y|x) = exp(xβ). Notice, however, that if your data are limited to a range of x where the relationship is approximately linear, the LRM is a reasonable approximation to the PRM. This is shown in panel B, which expands the portion of panel A between x = 15 and x = 20. Over this range, the relationship between μ and x is approximately linear, the conditional distributions are approximately normal, and there is only slight heteroscedasticity.
8.2.1. Estimation

The likelihood function for the PRM is

    L(β | y, X) = ∏_{i=1}^{N} Pr(y_i | μ_i) = ∏_{i=1}^{N} exp(−μ_i) μ_i^{y_i} / y_i!        [8.3]

where μ_i = exp(x_i β). After taking the log, numerical maximization can be used. The gradients and Hessian of the log likelihood are given by Maddala (1983). Since the likelihood function is globally concave, if a maximum is found it will be unique.
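The log of the likelihood in Equation 8.3 can be evaluated directly. A sketch that simulates data from a PRM with hypothetical parameters and confirms that the true parameters fit better than an intercept-only alternative:

```python
import math, random

def pois_loglik(b0, b1, data):
    """ln L for the PRM (Equation 8.3): sum of y*ln(mu) - mu - ln(y!),
    with mu = exp(b0 + b1*x)."""
    ll = 0.0
    for x, y in data:
        mu = math.exp(b0 + b1 * x)
        ll += y * math.log(mu) - mu - math.log(math.factorial(y))
    return ll

def draw_poisson(mu, rng):
    """Inverse-cdf draw from a Poisson distribution."""
    u, y = rng.random(), 0
    p = cum = math.exp(-mu)
    while u > cum:
        y += 1
        p *= mu / y
        cum += p
    return y

rng = random.Random(3)
b0_true, b1_true = -0.25, 0.13   # hypothetical parameter values
data = []
for _ in range(500):
    x = rng.uniform(0.0, 20.0)
    data.append((x, draw_poisson(math.exp(b0_true + b1_true * x), rng)))

# The log likelihood at the true parameters beats a clearly wrong slope
assert pois_loglik(b0_true, b1_true, data) > pois_loglik(b0_true, 0.0, data)
```

In practice β would be found by numerical maximization, which is straightforward here because the log likelihood is globally concave.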
For example, at x = 0 in panel A, μ = exp(−.25) = .78. Using this value of μ,

    Pr(y = 0 | μ) = .46        Pr(y = 1 | μ) = .36
    Pr(y = 2 | μ) = .14        Pr(y = 3 | μ) = .04

Other probabilities can be computed similarly. (Verify these values.)

Figure 8.3. Distribution of Counts for the Poisson Regression Model

8.2.2. Interpretation

The way in which you interpret a count model depends on whether you are interested in the expected value of the count variable or in the distribution of counts. If interest is in the expected count, several methods can be used to compute the change in the expectation for a change in an independent variable. If interest is in the distribution of counts, or perhaps just the probability of a specific count, the probability of a count for a given level of the independent variables can be computed. Each of these methods is now considered.

Factor Change in E(y|x). The factor change in the expected count for a change of δ in x_k equals

    E(y | x, x_k + δ) / E(y | x, x_k) = exp(β_k δ)        [8.4]

Therefore, the parameters can be interpreted as follows:

• For a change of δ in x_k, the expected count changes by a factor of exp(β_k δ), holding all other variables constant.

For specific values of δ:

• Factor change: For a unit change in x_k, the expected count changes by a factor of exp(β_k), holding all other variables constant.
• Standardized factor change: For a standard deviation change in x_k, the expected count changes by a factor of exp(β_k s_k), holding all other variables constant.

Alternatively, the percentage change in the expected count for a unit change in x_k, holding other variables constant, can be computed as

    100 × [E(y | x, x_k + 1) − E(y | x, x_k)] / E(y | x, x_k) = 100 [exp(β_k) − 1]

Notice that the factor or percentage change in the expected count does not depend on the level of x_k or on the level of any other variable.

By contrast, the partial derivative of the expected count with respect to x_k is

    ∂E(y | x)/∂x_k = exp(xβ) β_k = E(y | x) β_k

so that the magnitude of this effect depends on the expected value of y given x: the larger the current E(y|x), the larger the change in E(y|x). Since it depends on the values of all independent variables, the partial change is often computed with all variables held at their means. Because the PRM and the other count models in this chapter are nonlinear, the β's cannot be interpreted as the change in the expected count for a unit change in x_k; for these reasons, the partial change is less informative than the factor change or discrete change, which can be computed directly from the parameters and provide a more detailed interpretation.

Discrete Change in E(y|x). The effect of a variable can also be assessed by computing the discrete change in the expected value of y for a change in x_k from a starting value x_k^S to an ending value x_k^E:

    ΔE(y|x)/Δx_k = E(y | x, x_k = x_k^E) − E(y | x, x_k = x_k^S)

This can be interpreted as:

• For a change in x_k from x_k^S to x_k^E, the expected count changes by ΔE(y|x)/Δx_k, holding all other variables constant.
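The factor and percentage change interpretations can be sketched in a few lines (the coefficient value is hypothetical, not from Table 8.2):

```python
import math

def factor_change(beta_k, delta=1.0):
    """Factor change in E(y|x) for a change of delta in x_k (Equation 8.4)."""
    return math.exp(beta_k * delta)

def percent_change(beta_k, delta=1.0):
    """Percentage change: 100 * [exp(beta_k * delta) - 1]."""
    return 100.0 * (math.exp(beta_k * delta) - 1.0)

# A hypothetical coefficient of 0.0255: a unit increase in x_k multiplies
# the expected count by about 1.026, i.e. roughly a 2.6% increase
b = 0.0255
assert abs(factor_change(b) - 1.0258) < 1e-3
assert abs(percent_change(b) - 2.58) < 0.1

# The factor change does not depend on the current level of E(y|x):
E1, E2 = 2.0, 9.0  # two different baseline expected counts
assert abs(E1 * factor_change(b) / E1 - E2 * factor_change(b) / E2) < 1e-12
```

The last check makes the point in the text concrete: the multiplicative effect is the same regardless of the baseline expected count.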

TABLE 8,1 Statistics for the Doctoral Pllhlicatínn~


the discrete can be compuled
lila
Same

The from its mini- ART ()Oo


mum to its maximum, !,¡¡ART tU¡l!
FEJ1 (UlO
2, The fmm () to
1100
,\1AR
KW5 0.00
¡'fin O.
.\1ENT lUJO

Unlike the factor or percentage change, the magnitude of the discrete change depends on the levels of all variables in the model.

Predicted Probabilities

The parameters can also be used to compute the probability distribution of counts for a given level of the independent variables. For a given x, the probability that y = m is

Pr(y = m | x) = exp(−μ̂) μ̂^m / m!   [8.6]

where μ̂ = exp(xβ̂). The probabilities can be computed for each observation for each count m that is of interest. Then the mean probability at count m can be used to summarize the predictions:

P̄r(y = m) = (1/N) Σ_{i=1}^{N} Pr(y = m | x_i)   [8.7]

The mean probabilities, which control for the independent variables, can be compared to the observed proportions of the sample at each count. This is now illustrated with the data on scientific productivity.

Fitting the Poisson Model: Article Counts

The failure of the univariate Poisson distribution to account for the distribution of article counts could be due to heterogeneity in the characteristics of the scientists. If scientists who differ in their rate of productivity are combined, the univariate distribution of articles will be overdispersed. Research by Long (1990) suggests that marital status, the number of young children, the prestige of the doctoral program, and the number of articles written by a scientist's mentor could affect a scientist's level of publication. Table 8.1 contains descriptive statistics for these variables. Table 8.2 contains estimates from the PRM and the negative binomial model (NBRM) that is considered in Section 8.3. For purposes of comparison, I have also included results from the LRM. By taking the log of the conditional mean E(y | x) = exp(xβ), the PRM can be written as the log-linear model:

ln μ_i = x_i β

This suggests that the PRM can be approximated by applying the LRM:

ln y_i = x_i β + ε_i

However, since ln(0) is undefined, it is necessary to add a positive constant c to y before taking the log. Values of c equal to .5 or .01 are frequently used. This results in the model:

ln(y_i + c) = x_i β + ε_i

King (1988) demonstrates that this approach produces biased estimates of the parameters of the corresponding PRM.
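Equations 8.6 and 8.7 can be sketched directly. The fitted rates below are hypothetical values standing in for μ̂_i = exp(x_i β̂) from an estimated model.

```python
import math

def poisson_pr(m, mu):
    """Pr(y = m | x) for the PRM (Equation 8.6)."""
    return math.exp(-mu) * mu ** m / math.factorial(m)

# Hypothetical fitted rates mu_i = exp(x_i * beta) for four observations.
mu_hat = [0.8, 1.5, 2.6, 1.1]

# Mean predicted probability at each count m (Equation 8.7), to be
# compared with the observed proportion of the sample at each count.
mean_pr = [sum(poisson_pr(m, mu) for mu in mu_hat) / len(mu_hat)
           for m in range(5)]
print([round(p, 3) for p in mean_pr])
```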
REGRESSION MODELS COWlt Outcomes 229

TA8LE 8.2 Binomial The Htandardized coeff1cient can be interpreted as:


• For a standard deviation inerease in the mentor', a seientist's
U?:H PRJf NBRM mean productivity increases al! other variahles constant .
.,un or ART
030; 0.256 (~érij}'
¡hese numbers.)
Ul2 The results just given refer to the multiplicative factor in the
-0.225 -0.216 expected count. It can also be informative to examine the additive
-".11 -2.1'12 in the expected count. For example, the in the
IU55 lUSO for a change in FElvl from O to 1 can be computed
2.53 1.79 First, hold aIl variables at their means for
O.IH5 -0.176 indicating a female scientist, the eXIDe(:tecl
141 -0.135 O, indicating a male scientist, the eX1De¡:Iea
-".61 -3.28
we conclude:
0.013 0.015
OJl13 iUl!5 • Being a fcmale scicntist decreases the .36 artides,
OA9 OA2
holding all other varíahles at theír means.
0.026 0.1129
0.242 0.276
Notice that the change of .36 articles from 1.79 to 1.43 corresponds to
12.73 9.10
0442 the 20% de crease that was computed using the measure of percentage
HAS change. (Verify these values using Table
The results of the PRM can also be interpreted in terms of predicted
0.21
probabilitíes using Equation 8.7. In Figure the observed proportions
are shown by solid diamonds connected by solíd lines. The mean pre-

v
o
the estimates from the LRM ean be of roughly
M
as the eoefficients from the PRM. This is n
more counts are frequent.
The the results of the PRM is by lIsing the
count. For example, the eoefficient foc ...o N
O
...o o
O
• llumher of artielcs hy a !.....
al! othcr variahles constant. o...
o
Or,

• the numher of artieles by 20% o


all other variables constant. 00 2 345 6 7 8 9 o
the effeet of the mcntor's produetivity can be interpreted as: Number of .A ¡cles
the mentor, a scientist's mean productívity Figure 8.4. Comparisons of the Mean Predicted Prohahilities From the Poisson
al! othcr variahles constant. and Ncgative Binomial Regression Modcls
8.3. The Negative Binomial Regression Model

In the PRM, the conditional mean of y is a deterministic function of the x's: μ = exp(xβ). The negative binomial regression model, hereafter NBRM, extends the PRM by adding a parameter that allows the conditional variance of y to exceed the conditional mean. The NBRM can be derived in several ways; I consider the most common motivation of the model in terms of unobserved heterogeneity.

Unobserved heterogeneity is introduced by adding a random error ε to the conditional mean:

μ̃_i = exp(x_i β + ε_i) = exp(x_i β) exp(ε_i) = μ_i δ_i   [8.8]

where δ_i is defined to equal exp(ε_i). ε is a random error that is assumed to be uncorrelated with x. You can think of ε either as the combined effects of unobserved variables that have been omitted from the model (Gourieroux et al., 1984) or as another source of pure randomness (Hausman et al., 1984). In the PRM, variation in μ is introduced through observed heterogeneity: different values of x result in different values of μ, but all individuals with the same x have the same μ. In the NBRM, variation in μ̃ is due both to variation in x among individuals and to unobserved heterogeneity introduced by ε. For a given combination of values of the independent variables, there is a distribution of μ̃'s rather than a single μ.

Recall that the LRM was not identified until an assumption was made about the mean of the error (see Section 2.5.1). For similar reasons, the NBRM is not identified without an assumption about the mean of the error term. The most convenient assumption is that

E(δ_i) = 1   [8.9]

This assumption implies that the expected count after adding the new source of variation is the same as it was for the PRM:

E(μ̃_i) = μ_i E(δ_i) = μ_i

Constraining the mean of the errors in this way makes it possible to compare the conditional mean of the NBRM with that of the PRM, while the variance of the errors is left free.

The distribution of observations given x and δ is still Poisson:

Pr(y_i | x_i, δ_i) = exp(−μ̃_i) μ̃_i^{y_i} / y_i!   [8.10]

However, since δ is unknown, we cannot compute Pr(y | x, δ) and instead need to compute the distribution of y given only x. To compute Pr(y | x) without conditioning on δ, we average Pr(y | x, δ) by the probability of each value of δ. If g is the pdf for δ, then

Pr(y_i | x_i) = ∫_0^∞ Pr(y_i | x_i, δ_i) g(δ_i) dδ_i   [8.11]

To clarify what this important equation is doing, assume that δ has only two values, d_1 and d_2. The counterpart to Equation 8.11 is

Pr(y_i | x_i) = [Pr(y_i | x_i, δ_i = d_1) × Pr(δ_i = d_1)] + [Pr(y_i | x_i, δ_i = d_2) × Pr(δ_i = d_2)]   [8.12]

To solve Equation 8.11, we must specify the form of the pdf for δ. While several distributions have been proposed, the most common assumption is that δ_i has a gamma distribution with parameter ν_i:

g(δ_i) = [ν_i^{ν_i} / Γ(ν_i)] δ_i^{ν_i − 1} exp(−ν_i δ_i)   for ν_i > 0   [8.13]

where the gamma function is defined as Γ(a) = ∫_0^∞ t^{a−1} e^{−t} dt. It can be shown that δ_i has a mean of 1, as required by Equation 8.9, and that Var(δ_i) = 1/ν_i. ν also affects the shape of the distribution: as ν increases, the distribution becomes increasingly bell shaped and centered around 1.

Figure 8.5. Probability Density Function for the Gamma Distribution

The negative binomial, hereafter NB, probability distribution is obtained by solving Equation 8.11, substituting Equation 8.10 for Pr(y | x, δ) and Equation 8.13 for g (see Cameron & Trivedi, 1996, for details):

Pr(y_i | x_i) = [Γ(y_i + ν_i) / (y_i! Γ(ν_i))] [ν_i / (ν_i + μ_i)]^{ν_i} [μ_i / (ν_i + μ_i)]^{y_i}

The expected value of y for the NB distribution is the same as for the Poisson distribution:

E(y_i | x_i) = μ_i   [8.14]

but the conditional variance differs:

Var(y_i | x_i) = μ_i (1 + μ_i / ν_i)   [8.15]

Since μ and ν are positive, the conditional variance of y in the NBRM must exceed the conditional mean exp(xβ). (What must happen to ν to reduce the variance to that of the PRM?)

The larger conditional variance in y increases the relative frequency of low and high counts. This is seen in Figure 8.6, where the Poisson and NB distributions are compared for means of 1 and 10. The NB distribution corrects a number of sources of poor fit that are often found when the Poisson distribution is used. First, the variance of the NB distribution exceeds the variance of the Poisson distribution for a given mean. Second, the increased variance results in substantially larger probabilities for small counts. In panel A, the probability of a zero count increases from .37 for the Poisson distribution to as much as .85 as the variance of the NB distribution increases. Finally, there are slightly larger probabilities for larger counts in the NB distribution.

While the mean structure is fully identified by Equation 8.14, the variance is unidentified in Equation 8.15. The problem is that if ν varies by individual, then there are more parameters than observations. The most common identifying assumption is that ν is the same for all individuals:

ν_i = α^{−1}   for α > 0

This assumption simply states that the variance of δ is constant. (We set the variance to α^{−1} rather than α to simplify the formulas that follow.) α is known as the dispersion parameter, since increasing α increases the conditional variance of y. This is seen by substituting ν_i = α^{−1} into Equation 8.15:

Var(y_i | x_i) = μ_i (1 + α μ_i)   [8.16]

(What happens if α = 0?) Under this parameterization of ν, the conditional variance is quadratic in the mean, which has led Cameron and Trivedi (1986) to call this the Negbin 2 model.

Figure 8.7 shows the effect of the added variation in the NBRM. While the mean structure is identical to that used to illustrate the PRM, namely E(y | x) = exp(.25 + .13x), the distribution around the mean differs. In panel A, α = .5 and the difference from the PRM is subtle.
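The contrast between the two distributions, and the Negbin 2 variance of Equation 8.16, can be sketched numerically. The NB probability below is the Γ-function form given above with ν = α^{−1}; the values of μ and α are illustrative.

```python
import math

def negbin_pr(y, mu, alpha):
    """NB probability with dispersion alpha (nu = 1/alpha), computed
    on the log scale with lgamma for numerical stability."""
    nu = 1.0 / alpha
    ln_p = (math.lgamma(y + nu) - math.lgamma(nu) - math.lgamma(y + 1)
            + nu * math.log(nu / (nu + mu)) + y * math.log(mu / (nu + mu)))
    return math.exp(ln_p)

mu, alpha = 1.0, 1.0                   # illustrative values
poisson_zero = math.exp(-mu)           # Pr(y = 0) under the Poisson
negbin_zero = negbin_pr(0, mu, alpha)  # Pr(y = 0) under the NB, same mean
variance = mu * (1 + alpha * mu)       # Equation 8.16: quadratic in the mean
print(round(poisson_zero, 3), round(negbin_zero, 3), variance)
```

With the same mean of 1, the NB distribution places probability .5 on a zero count versus .37 for the Poisson, illustrating the larger probabilities for small counts discussed above.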

Figure 8.6. Comparison of the Negative Binomial and Poisson Distributions

Figure 8.7. Distribution of Counts for the Negative Binomial Regression Model (Panel A: NBRM with α = 0.5; Panel B: NBRM with α = 1.0)

Compared with the PRM, the differences are most noticeable at x = 0, but can be seen at all values of x. When α is increased to 1 in panel B, the changes are more dramatic: the conditional mode is now 0 for all values of x, and the errors no longer appear normal as μ increases.

8.3.1. Heterogeneity and Contagion

The NB distribution can be derived in a variety of ways, as shown by Feller (1971) and Johnson et al. (1992). The derivation used above is based on unobserved heterogeneity, which is represented by the error ε in Equation 8.8. This derivation dates to the work of Greenwood and Yule in 1920.

Alternatively, the NB distribution can be derived through a process known as contagion, using an approach developed by Eggenberger and Pólya. Contagion occurs when individuals with a given set of x's initially have the same probability of an event, but this probability changes as events occur. Suppose that there are two scientists who have identical characteristics and initially have the same rate of productivity μ. If the first scientist publishes an article, her rate of productivity increases as a result of contagion from the initial publication: she may receive additional resources as a result of the publication, and these resources may increase her rate of publication. The second scientist's rate would stay the same as long as he did not publish. The process is contagious in the sense that success in publishing increases the rate of future publishing. Thus, contagion violates the independence assumption of the Poisson distribution. Heterogeneity and contagion can generate the same NB distribution of counts. Consequently, heterogeneity is sometimes referred to as spurious or apparent contagion, as opposed to true contagion. With cross-sectional data, it is impossible to determine whether the observed distribution of counts arose from true or spurious contagion.

8.3.2. Estimation

The NBRM can be estimated by ML. The likelihood equation is

L(β, α | y, X) = ∏_{i=1}^{N} Pr(y_i | x_i)

where Pr(y_i | x_i) is the NB probability defined above. After taking logs, the likelihood equation can be maximized with numerical methods. Lawless (1987) provides gradients and Hessians.

8.3.3. Testing for Overdispersion

It is important to test for overdispersion if you use the PRM. Even when the mean structure is correctly specified, estimates from the PRM are inefficient in the presence of overdispersion, with standard errors that are biased downward (Cameron & Trivedi, 1986). If software is available to estimate the NBRM, a one-tailed z-test of H0: α = 0 can be used, since when α is zero the NBRM reduces to the PRM. Or, a LR test can be computed: if ln L_PRM is the log likelihood from the PRM and ln L_NBRM the log likelihood from the NBRM, then 2(ln L_NBRM − ln L_PRM) is a test of H0: α = 0. To test at the .05 level, the critical value for the .10 level should be used, since α must be positive. Cameron and Trivedi suggest several tests based on the residuals from the PRM that do not require estimation of the NBRM.

8.3.4. Interpretation

Since the NBRM has the same mean structure as the PRM, the methods of interpretation based on E(y | x) = exp(xβ) from Section 8.2.2 apply without change. Predicted probabilities can be computed with

Pr(y | x) = [Γ(y + α^{−1}) / (y! Γ(α^{−1}))] [α^{−1} / (α^{−1} + μ)]^{α^{−1}} [μ / (α^{−1} + μ)]^y   [8.17]

where μ = exp(xβ).

Fitting the NBRM: Article Counts

Estimates of the NBRM for the published articles example are given in Table 8.2. The parameters can be interpreted in the same way as those from the PRM considered above. There is strong evidence of overdispersion: the dispersion parameter α is positive, with z = 8.45, and the LR test 2(ln L_NBRM − ln L_PRM) is even more highly significant. Notice that the z-values for the NBRM are smaller than those for the PRM, which would be expected with overdispersion. Figure 8.4 shows that the NBRM does a much better job than the PRM in predicting the counts from 0 to 3. Another way to see the differences between the two models is to compare their predictions for the probability of not publishing as the levels of other variables change. When the probability of having zero publications is computed with each variable except the mentor's number of articles held at its mean, the probability of a 0 decreases for both models as the mentor's articles increase, but the predicted probability of a 0 is significantly higher for the NBRM.
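The LR test of Section 8.3.3 is a one-line computation once the two log likelihoods are available. The PRM log likelihood below is a hypothetical value, chosen so that the statistic equals the 180.2 reported for the article-count example.

```python
# One-tailed LR test of H0: alpha = 0 (Section 8.3.3). lnL_prm is a
# hypothetical value; the difference is set to reproduce the LR
# statistic of 180.2 reported for the article-count example.
lnL_prm = -1722.3
lnL_nbrm = lnL_prm + 90.1

lr = 2 * (lnL_nbrm - lnL_prm)
# Because alpha must be nonnegative, a test at the .05 level uses the
# chi-square(1) critical value for the .10 level.
critical = 2.71
print(round(lr, 1), lr > critical)
```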

The NBRM is one of a class of models constructed by mixing the Poisson distribution with a second distribution using Equation 8.11. The mixture of the Poisson and gamma distributions is particularly convenient because it yields the negative binomial distribution in closed form. Generalizing the Negbin 2 model considered above, Cameron and Trivedi propose the Negbin k model in which Var(y | x) = μ + αμ^{2−k}. Setting k = 1 gives Var(y | x) = μ + αμ, which corresponds to replacing the ν of the Negbin 2 model with ν = μ/α; this is known as the Negbin 1 model. Other distributions and mixtures can also be used, such as a Poisson and normal mixture, or the Poisson and inverse Gaussian mixture considered by Dean et al. King (1989a) proposed a generalized event count model that allows for either underdispersion or overdispersion. See Winkelmann (1994, pp. 112-120) for a review.

8.4. Models for Truncated Counts

Zero truncated counts occur when observations enter the sample only after the first count occurs. For example, consider the problem of modeling the number of times a person seeks medical treatment. If the sample is based on the records of those who visited a doctor, a person must visit the doctor at least once before entering the sample. If a sample of scientists is drawn from the authors of papers published in some journal, those without publications are excluded. A study of the number of TVs in a household might be based on a sample of those who returned their card after purchasing a TV; those who do not own TVs are excluded from the sample. Gurmu (1991) and Grogger and Carson (1991) extended the PRM and NBRM to deal with truncated counts. While truncation can occur at any value, I focus on truncation at 0, which occurs most often in practice.

Let y be a Poisson random variable:

Pr(y_i | x_i) = exp(−μ_i) μ_i^{y_i} / y_i!   [8.18]

where μ = exp(xβ). The probabilities of zero and positive counts are

Pr(y_i = 0 | x_i) = exp(−μ_i)   and   Pr(y_i > 0 | x_i) = 1 − exp(−μ_i)   [8.19]

The conditional probability of y events given that y > 0 is computed with the law of conditional probability: Pr(A | B) = Pr(A and B)/Pr(B). From Equations 8.18 and 8.19,

Pr(y_i | y_i > 0, x_i) = Pr(y_i | x_i) / [1 − exp(−μ_i)]   [8.20]

Each probability is increased by the factor [1 − exp(−μ)]^{−1}, which distributes the probability of a zero count across all positive counts in the truncated sample. This forces the truncated pdf to sum to 1.

Grogger and Carson (1991) provide the mean and variance of the truncated variable y | y > 0, which can be derived using methods similar to those used in Chapter 7 for the tobit model (see Johnson et al., 1992, pp. 181-184). Since zero counts are excluded, the expected value is increased by the inverse of the probability of a positive count:

E(y_i | y_i > 0, x_i) = μ_i / [1 − exp(−μ_i)]   [8.21]

As μ increases, the expected value of the zero truncated count converges to the expected value without truncation, since the probability of a zero count approaches 0. With zero counts excluded, the variance is less than the variance without truncation.

The extension to the NBRM is similar. For the NB distribution, the probability of a zero count is Pr(y_i = 0 | x_i) = (1 + αμ_i)^{−1/α}, so that for the truncated NBRM

Pr(y_i | y_i > 0, x_i) = Pr(y_i | x_i) / [1 − (1 + αμ_i)^{−1/α}]   [8.22]

where Pr(y_i | x_i) is the NB probability from Equation 8.17. The conditional mean and variance are given by Grogger and Carson (1991); as for the truncated PRM, the conditional mean is increased by the inverse of the probability of a positive count:

E(y_i | y_i > 0, x_i) = μ_i / [1 − Pr(y_i = 0 | x_i)]   [8.23]

8.4.1. Estimation

Estimation of the truncated Poisson model involves a simple modification to the likelihood for the PRM:

L(β | y, X) = ∏_{i=1}^{N} Pr(y_i | y_i > 0, x_i) = ∏_{i=1}^{N} Pr(y_i | x_i) / [1 − exp(−μ_i)]   [8.24]

Similarly, for the truncated negative binomial regression,

L(β, α | y, X) = ∏_{i=1}^{N} Pr(y_i | y_i > 0, x_i)   [8.25]

where the conditional probability is obtained from Equation 8.22. The log likelihoods can be maximized with numerical methods; Grogger and Carson (1991) provide the gradients and Hessians.

8.4.2. Interpretation

As with the truncated regression model of Chapter 7, interpretation can be in terms of either the untruncated or the truncated count. For both the truncated PRM and the truncated NBRM, the expected value of y without truncation is

E(y | x) = exp(xβ)

Given this, interpretation in terms of partial derivatives, factor change, and discrete change can proceed as discussed in Section 8.2.2. The expected values for the truncated count y | y > 0 can be estimated using Equations 8.21 and 8.23. Estimates of predicted probabilities for y are computed using the estimated β's from the truncated models and Equations 8.6 and 8.17 for the untruncated distributions. Predicted probabilities for the truncated distribution are computed using Equations 8.20 and 8.22.

8.4.3. Overdispersion in Truncated Count Models

Grogger and Carson (1991) make the important point that in the presence of truncation, overdispersion results in biased and inconsistent estimates of the β's and, consequently, of the estimated probabilities. The reason for this is similar to the reason for the inconsistency of the tobit estimator in the presence of heteroscedasticity. Without truncation, the mean structure of the PRM is correct even with overdispersion; thus, the estimates are consistent even though the standard errors are biased downward. In the presence of truncation, the mean structure changes with overdispersion, resulting in inconsistent estimates of β. Gurmu and Trivedi (1992) provide tests for overdispersion in the truncated count model.

8.5. Zero Modified Count Models

The NBRM addresses the underprediction of 0's in the PRM by increasing the conditional variance without changing the conditional mean. Zero modified count models instead change the mean structure to explicitly model the production of zero counts. This is done by assuming that zeros are generated by a different process than positive counts. For example, the PRM and NBRM assume that each scientist has a positive probability of publishing any given number of papers. The probability differs according to a scientist's characteristics, but all scientists have some chance of publishing. Zero modified models allow for a group in which publishing is not possible, so that the probability of a count of 0 is 1. Mixing in this second process increases the conditional variance and the proportion of zero counts.

8.5.1. The With Zeros Model

Mullahy (1986) assumes that the population consists of two groups. A person is in group 1 with probability ψ and in group 2 with probability 1 − ψ, where ψ is an unknown parameter to be estimated. The first group consists of people who must have zero counts. For example, a scientist who will never publish because of the nature of her job would be in this group; a scientist who does not publish but tries to (e.g., by submitting articles) would not be in this group. We do not know whether a given scientist with zero publications is in the first or the second group. If group membership were observed, it could be entered explicitly in the regression; instead, the distinction between the two groups is a latent characteristic of the model.

In the second group, counts are generated by a Poisson process, so zero counts can occur by chance; this corresponds to the scientist who tries to publish but has not. Thus, zeros are generated by two different processes, depending on group membership, and the overall probability of a 0 is a combination of the probabilities of a zero from each group, weighted by the probability of being in each group:

Pr(y_i = 0 | x_i) = ψ + (1 − ψ) exp(−μ_i)   [8.26]

while for positive counts, Pr(y_i | x_i) = (1 − ψ) exp(−μ_i) μ_i^{y_i} / y_i!. I do not consider estimation of the with zeros model; however, it is important to understand its structure, since it is the basis for the zero inflated models.

8.5.2. Zero Inflated Models

Lambert (1992) and Greene (1994) extend the with zeros model to allow ψ to be determined by characteristics of the individual. As in the with zeros model, there are two processes, which I illustrate with the Poisson version of the model, the zero inflated Poisson (ZIP) model. First, membership in the group that must have a zero count is determined with probability ψ_i, computed from a binary logit or probit model as a function of characteristics of the individual:

ψ_i = F(z_i γ)   [8.27]

where F is the cdf of the logistic or normal distribution. See Chapter 3 for details. Second, for those not in the always-zero group, both zero and positive counts can occur, and the count is determined by a PRM or a NBRM.

In Lambert's ZIP(τ) version of the model, the z's are the same as the x's, and the parameters of the binary model are assumed to be a scalar multiple of the count model's parameters: γ = τβ. While the ZIP(τ) model reduces the number of parameters, it is difficult to imagine a substantive process in which one would expect the binary process to be a simple multiple of the count process, and the differences between the β's and the γ's are often of substantive interest. Accordingly, I do not consider the τ versions of the zero inflated models any further.

Combining the binary process and the count process gives the probabilities for the ZIP model:

Pr(y_i = 0 | x_i, z_i) = ψ_i + (1 − ψ_i) exp(−μ_i)

Pr(y_i | x_i, z_i) = (1 − ψ_i) exp(−μ_i) μ_i^{y_i} / y_i!   for y_i > 0   [8.28]

The zero inflated negative binomial (ZINB) model is created by replacing the PRM with the NBRM, with corresponding adjustments to these probabilities. Greene (1994) shows that

E(y | x, z) = [0 × ψ] + [μ × (1 − ψ)] = μ(1 − ψ)

Since the mean structure of the model has been changed by lowering the expected count, the conditional variance is also changed. For the ZIP model,

Var(y | x, z) = μ(1 − ψ)(1 + μψ)

If ψ = 0, this is the variance of the standard PRM; otherwise, the variance exceeds the conditional mean. Similarly, for ψ greater than 0, the variance of the ZINB model exceeds that of the NBRM.

Lambert (1992) estimates the ZIP model with the EM algorithm; Greene (1994) presents results that allow ML estimation using the BHHH algorithm. The likelihood equation is

L(β, γ | y, X, Z) = ∏_{i=1}^{N} Pr(y_i | x_i, z_i)

where Pr(y_i | x_i, z_i) is defined by Equation 8.28 and ψ_i by Equation 8.27. Comparable formulas are available for the other zero inflated models.

For the ZIP model, predictions of a zero count are based on Equation 8.28:

Pr(y = 0 | x, z) = ψ̂ + (1 − ψ̂) exp(−μ̂)

where μ̂ = exp(xβ̂) and ψ̂ = F(zγ̂). The probability of a positive count applies only to the proportion 1 − ψ̂ at risk of the event:

Pr(y | x, z) = (1 − ψ̂) exp(−μ̂) μ̂^y / y!

Similarly, for the ZINB model, the predicted probability of a zero count is ψ̂ plus (1 − ψ̂) times the NB probability of a 0, and the predicted probability for a positive count is (1 − ψ̂) times the NB probability from Equation 8.17. The definitions of μ̂ and ψ̂ are unchanged.

The β parameters are interpreted in the same way as the parameters from the PRM or the NBRM. Furthermore, the γ parameters are interpreted in the same way as the parameters for the binary logit or probit models of Chapter 3, where the outcome event is having a zero count. Thus, a positive coefficient in the binary process increases the probability of being in the group where the probability of a zero count is 1.
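The ZIP probabilities and mean can be sketched directly from the formulas above; μ and ψ are illustrative values rather than estimates.

```python
import math

def zip_pr(y, mu, psi):
    """ZIP probabilities: psi is Pr(membership in the always-zero group)."""
    if y == 0:
        return psi + (1 - psi) * math.exp(-mu)
    return (1 - psi) * math.exp(-mu) * mu ** y / math.factorial(y)

mu, psi = 2.0, 0.3          # illustrative values of exp(x*beta) and F(z*gamma)
p0_zip = zip_pr(0, mu, psi)
p0_poisson = math.exp(-mu)  # Pr(y = 0) under a PRM with the same mu
mean_zip = (1 - psi) * mu   # E(y | x, z) = mu * (1 - psi)
print(round(p0_zip, 3), round(p0_poisson, 3), mean_zip)
```

The zero inflation raises the probability of a 0 relative to the PRM and lowers the conditional mean, as the formulas in the text imply.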

Fitting the Zero Inflated Models: Article Counts

TABLE 8.3 Zero Inflated Poisson and Zero Inflated Negative Binomial Models of Doctoral Publications

Estimates of the ZIP and ZINB models for the article counts are given in Table 8.3. The magnitudes and significance levels of the parameters for the binary process differ from those for the count process, and the variable with the strongest effect in the binary process, which distinguishes potential publishers from nonpublishers, differs as well. None of the other variables is significant in the binary portion of the ZIP model, although being married makes it more likely that a scientist has the potential to publish in the ZINB model.

8.6. Comparisons Among Count Models

We have considered four models for the analysis of count outcomes: the Poisson regression model, the negative binomial regression model, the zero inflated Poisson model, and the zero inflated negative binomial model. One way to compare these models is with the mean predicted probabilities computed using Equation 8.7:

P̄r(y = m) = (1/N) Σ_{i=1}^{N} Pr(y = m | x_i)

Figure 8.9 plots the difference between the observed proportions for each count and the mean probability from the four models. We see immediately that the major failure of the PRM is in predicting the number of zeros, with an underprediction of about .1. The ZIP does much better at predicting zeros, but has poor predictions for counts one through three. The NBRM predicts the zeros very well and also has much better predictions for the counts from one to three. The ZINB slightly overpredicts zeros and underpredicts ones, with predictions similar to those of the NBRM for other counts. Overall, the NBRM provides the most accurate predictions, which are slightly better than those of the ZINB.

We can also test differences between pairs of models. Section 8.3.3 showed that the PRM and the NBRM can be compared by testing the dispersion parameter α. Since the NBRM reduces to the PRM when α = 0, the models are nested. For our example, we found that α was significant, with a Wald test of z_α = 8.45 and a LR test of 180.2. There is clear evidence supporting the NBRM over the PRM. The ZIP and ZINB models are also nested, and we can test H0: α = 0 with either the z-statistic for α in the ZINB or a LR test: 2(ln L_ZINB − ln L_ZIP) = 109.6. There is evidence that the ZINB improves the fit over the ZIP model.

Greene (1994) points out that the PRM and ZIP models are not nested (and similarly for the NBRM and ZINB). For the ZIP model to reduce to the PRM, ψ must equal 0 for every observation. This does not occur simply by setting γ = 0 in Equation 8.27, since then ψ_i = F(z_i 0) = .5, which would imply that half of the population must always have zero counts. Greene suggests a test proposed by Vuong (1989, p. 319) for nonnested models. To define this test, consider two models, where P̂r_1(y_i | x_i) is the predicted probability of observing y_i based on the first model and P̂r_2(y_i | x_i) is based on the second. Define

m_i = ln[ P̂r_1(y_i | x_i) / P̂r_2(y_i | x_i) ]

The Vuong statistic V = √N m̄ / s_m, where m̄ is the mean and s_m the standard deviation of the m_i, tests the hypothesis that E(m) = 0. V is asymptotically normal. If V is greater than the critical value 1.96, the first model is preferred; if V is less than −1.96, the second model is preferred; otherwise, neither model is preferred. For our example, the Vuong statistic comparing the ZIP to the PRM is 5.98, which favors the ZIP model, and the corresponding statistic comparing the ZINB to the NBRM favors the ZINB model.
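The Vuong statistic just defined can be sketched in a few lines. The per-observation predicted probabilities below are hypothetical, standing in for P̂r_1(y_i | x_i) and P̂r_2(y_i | x_i) from two fitted models.

```python
import math

def vuong(p1, p2):
    """Vuong statistic from per-observation predicted probabilities
    under two nonnested models."""
    m = [math.log(a / b) for a, b in zip(p1, p2)]
    n = len(m)
    m_bar = sum(m) / n
    s_m = math.sqrt(sum((mi - m_bar) ** 2 for mi in m) / n)
    return math.sqrt(n) * m_bar / s_m

# Hypothetical predicted probabilities for six observations.
p_model1 = [0.30, 0.25, 0.40, 0.20, 0.35, 0.15]
p_model2 = [0.20, 0.22, 0.30, 0.25, 0.30, 0.10]
v = vuong(p_model1, p_model2)
print(round(v, 2))
```

Here V exceeds 1.96, so the first model would be preferred at the .05 level.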
8.7. Summary

While the Poisson distribution is central to the analysis of counts, and the PRM is the natural starting point for regression models of count outcomes, a number of additional models have been developed. A common motivation is to construct models that correct the conditional variance when the PRM fails to fit. The PRM and the NBRM have the same mean structure, but the NBRM introduces unobserved heterogeneity, which allows the conditional variance to exceed the conditional mean. The zero modified count models mix two processes that generate the counts: the first process generates only zero counts, while the second process generates both zero and positive counts. As a consequence of mixing these two processes, the conditional means for the with zeros and zero inflated models differ from those of the PRM and NBRM. The choice among these models should not rest on fit alone; selecting from a series of models without any theoretical rationale is risky, but when there is reason to believe that there are unobserved sources of heterogeneity along with a separate process generating zeros, a model such as the ZINB makes substantive sense.
8.8. Bibliographic Notes

Counts are common in the social sciences. Indeed, the distribution that became known as the Poisson distribution arose in Poisson's study of judgments in criminal trials, and early applications concerned counts of criminal behavior. Cameron and Trivedi (1986) presented key results and tests for count regression models, and King introduced the count model to political science. The negative binomial and related count models grew out of early work on accident statistics (see the history in Johnson et al., 1992, Chapter 4). Mullahy (1986) proposed the with zeros model. Gurmu (1991) and Grogger and Carson (1991) extended the Poisson model to deal with truncation, and Lambert (1992) developed the extension referred to as the zero inflated Poisson model. Winkelmann (1994) provides a relatively nontechnical introduction to models related to the Poisson distribution.

9

Conclusions

We have considered many models in the last 250 pages. If you are encountering these models for the first time, it may be hard to keep track of the differences and similarities. Still, there are many important models that have not been considered, some of which are very closely related to the models in this book. In this brief chapter, I try to address both issues. First, I summarize the connections among the models from the prior chapters in terms of three distinct but complementary frameworks: the latent variable model, the generalized linear model, and probability models. These frameworks organize the models we have covered and provide further insight into them. Second, I show the connections between the models we have studied and two important classes of models: log-linear models and models for survival analysis. This brief discussion will be most useful for those who are already familiar with log-linear and survival models and who are interested in their connections to the models in this book. Additional links between the models in this book and models for ordinal variables are found in Clogg and Shihadeh (1994). Heinen's (1996) book on latent class and discrete latent trait models also illustrates many links between these models and the models we have discussed.

9.1. Latent Variable Models

Many of the models in this book were based on a structural equation with a latent dependent variable:

y* = xβ + ε   [9.1]

where x is a vector of observed independent variables, β is a vector of structural coefficients, and ε is a random error. For now, assume that ε has a normal distribution. To estimate a model with a latent variable, there must be a link between the latent variable and the observed variable. The nature of this link defines the particular model. To show this, I begin with the linear regression model and then consider how different measurement models lead to other models.

The Linear Regression Model. Most simply, we can assume that the latent variable equals the observed variable:

y = y*   for all observations

This leads to the linear regression model:

y = xβ + ε

OLS or ML can be used to estimate the β's. Since y* is observed directly as y, the scale of the dependent variable is known, and the unstandardized coefficients can be interpreted: for a unit change in x_k, the expected value of y changes by β_k, holding all other variables constant.

The Tobit Model. The tobit model is formed by assuming that when y* is at or below some value τ, we do not know its value, only that y* is at or below τ. The measurement model is

y = y*   if y* > τ
y = τ    if y* ≤ τ

Figure 9.1. Similarities Between the Tobit and Probit Models

This model is illustrated in Figure 9.1. Consider the distribution of y* at x_1, labeled A. Above τ, y is equal to the latent y*; at or below τ, y* is censored, and all we know is that y* ≤ τ. We can also compute the probability of a case being censored, which corresponds to area B in Figure 9.1. For a given x, the proportion that is at or below τ is

Pr(y* ≤ τ | x) = Pr(ε ≤ τ − xβ | x)

The Binary Probit Model. The binary probit model can be thought of as a tobit model in which values both above and below τ are censored. The measurement model is

y = 1   if y* > τ
y = 0   if y* ≤ τ

Therefore, y* is not observed exactly for any observation. Interpretation of the probit model often involves the probability that a 1 was observed (i.e., y* is above the threshold) or that a 0 was observed (i.e., y* is at or below the threshold). Figure 9.1 illustrates the close connection between the probit and tobit models. From Chapter 3,

Pr(y = 0 | x) = Pr(y* ≤ τ | x) = Pr(ε ≤ τ − xβ | x)

and, similarly,

Pr(y = 1 | x) = Pr(y* > τ | x)   [9.4]

This is identical to the probability of an observation being uncensored in the tobit model; for this reason, the probit model is sometimes referred to as "Tobin's probit." The key difference from either the LRM or the tobit model is that in probit the latent y* is never observed directly, which results in a different parameterization: the variance of ε cannot be estimated, and the coefficients of the probit model are identified only by fixing the error variance, conventionally at 1, to reflect the unknown scale of y*. Figure 9.1 shows this for the distribution of y* at x_2: in the tobit model, y* is observed in the upper region of the distribution but is censored below τ, whereas in the probit model outcomes above τ are coded as 1 and those at or below τ are coded as 0.

Other models in this family can be generated by varying the τ's. As the upper threshold τ_U of a two-limit model goes to ∞, we approach the one-limit tobit model from below; as the lower threshold τ_L goes to −∞, censoring from below disappears.
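The measurement models of this section can be summarized in a short sketch: the same latent values are mapped to observed outcomes under the tobit and probit observation rules (the threshold τ and the latent draws are illustrative).

```python
tau = 0.0   # illustrative threshold

def tobit_observe(y_star):
    """Tobit measurement: y* is observed above tau, censored at tau below."""
    return y_star if y_star > tau else tau

def probit_observe(y_star):
    """Probit measurement: only whether y* exceeds tau is observed."""
    return 1 if y_star > tau else 0

latent = [-1.2, -0.3, 0.4, 2.1]   # hypothetical draws of y* = x*beta + e
tobit_y = [tobit_observe(v) for v in latent]
probit_y = [probit_observe(v) for v in latent]
print(tobit_y, probit_y)
```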

The Ordinal Regression Model. In the two-limit model (Figure 9.2), y* is observed exactly between τ_L and τ_U and is censored outside those limits. While we do not observe y* in the censored regions, we do know the values of τ_L and τ_U, which are measured in the same metric as the latent variable. This fixes the scale of y*, and the unstandardized coefficients can be interpreted. The ordered regression model is identical except that the values of the thresholds are unknown: we have no way to link the observed categories to the metric of y*, and consequently there is no way to estimate the variance of the errors, which forces us to assume it. As a result, we can interpret fully standardized and y*-standardized coefficients, but not the unstandardized coefficients. Each of these models has a logit counterpart obtained by assuming that the errors are distributed as logistic.

Systems of equations involving latent variables and limited dependent variables lead to multiple-equation structural models [9.5]. Browne and Arminger (1995) review these models, which are known as mean and covariance structure models with nonmetric dependent variables. Versions for the logit model are not available, since there is no counterpart to the multivariate normal distribution of ε that is used to allow correlations across equations.

9.2. The Generalized Linear Model

Another way to see links among many of the models that we have studied is in terms of the generalized linear model, hereafter GLM (McCullagh & Nelder, 1989). The observed y is assumed to have a random distribution with mean μ. For example, in the LRM, y is assumed to be distributed conditionally normally with mean μ. The systematic component of the GLM assumes that

η = xβ

where η is called the linear predictor. The expected value μ is linked to the linear predictor through the link function g, with g(μ) = η. The distribution of the random component and the link function define the particular model. For example, if the link is the identity function, η = μ, and the errors are normal, we have the LRM:

μ = η = xβ

If y has a binomial distribution with the logit link:

ln[ μ / (1 − μ) ] = η = xβ

the logit model is obtained. Or, with the inverse normal link:

Φ^{−1}(μ) = η = xβ

the probit model is obtained. When y has a Poisson distribution with the log link, the Poisson regression model results:

ln μ = η = xβ

The first program to estimate the generalized linear model was GLIM; SAS's GENMOD and Stata's glm command can also be used.
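The correspondence between links and models can be sketched as a pair of function tables; applying each link to its inverse recovers the linear predictor.

```python
import math

# Link functions g(mu) = eta and their inverses mu = g^{-1}(eta) for
# three of the models discussed: LRM (identity), logit, and Poisson (log).
links = {
    "identity (LRM)": lambda mu: mu,
    "logit": lambda mu: math.log(mu / (1 - mu)),
    "log (Poisson)": lambda mu: math.log(mu),
}
inverses = {
    "identity (LRM)": lambda eta: eta,
    "logit": lambda eta: 1 / (1 + math.exp(-eta)),
    "log (Poisson)": lambda eta: math.exp(eta),
}

eta = 0.5   # a hypothetical value of the linear predictor x*beta
for name in links:
    mu = inverses[name](eta)
    # Applying the link to the implied mean recovers the linear predictor.
    assert abs(links[name](mu) - eta) < 1e-12
print("each link inverts its inverse at eta =", eta)
```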
The GLM framework also clarifies connections among the models. For example, consider the truncated Poisson regression model in which counts at 2 or above are truncated (Winkelmann, 1994). If we define μ = exp(xβ), then only the counts 0 and 1 are observed, with conditional probabilities

Pr(y = 0 | y < 2, x) = exp(−μ) / [exp(−μ) + μ exp(−μ)] = 1 / [1 + exp(xβ)]

Pr(y = 1 | y < 2, x) = μ exp(−μ) / [exp(−μ) + μ exp(−μ)] = exp(xβ) / [1 + exp(xβ)]

which is the binary logit model.

9.4. Event History Analysis

Event history analysis, also known as survival analysis, deals with longitudinal data on the occurrence of events. The dependent variable is how long it takes until an event has happened. While there are many methods of analyzing event history data (Allison, 1995; Kalbfleisch & Prentice, 1980), one method of analyzing duration data is very closely related to the tobit model: the class of accelerated failure time (AFT) models (Allison, 1995, Chapter 4). For example, Daula et al. examined the effects of race on times to promotion in the military. The dependent variable, months in the service at time of promotion, was censored since some soldiers left the service before promotion. This problem differs from the standard tobit model since the time until censoring differs for each individual: because soldiers entered at different times and left the service at different times without promotion, the value of τ differs among the soldiers. The model becomes

y_i = y*_i   if y*_i < τ_i
y_i = τ_i    if y*_i ≥ τ_i

where the censoring points differ by individual and τ is uncorrelated with the x's. Another difference between the tobit model of Chapter 7 and AFT models is that in the tobit model ε is assumed to be normal, while in AFT models many other distributions are allowed, such as the one- or two-parameter extreme value distribution, the gamma distribution, or the logistic distribution. A practical illustration of the link between tobit analysis with individually varying limits and the AFT model is that programs for the AFT model (e.g., SAS's LIFEREG) can be used for tobit analysis if normal errors are specified. Or, if the normality of the error term in tobit is inappropriate, these programs can be used to estimate a tobit model with other error distributions.

9.5. Log-Linear Models

Log-linear models are a large and important class of models for the analysis of contingency tables (Agresti, 1990). The goal of log-linear analysis is to determine whether the distribution of counts among the cells of a table can be explained by some simpler, underlying structure. By comparing models that specify different structures, the researcher can test hypotheses about the interrelationships among the variables representing the rows, columns, and layers of the table.

To illustrate the links between log-linear models and the models in this book, consider Table 9.1, which was originally presented by Radelet (1981) and later examined by Agresti (1990). This three-way table is based on 326 murder cases. The variables are the defendant's race (D), the victim's race (V), and whether the sentence was the death penalty (P). The number of observations for cell D = i, V = j, and P = k is y_ijk. The number of observations is assumed to have a Poisson distribution with mean μ_ijk.
REGRESSION MODELS — Conclusions

TABLE 9.1. Death Penalty Verdict by Race of Defendant and Race of Victim

To see the link between log-linear models and Poisson regression, define three dummy variables that equal 1 to indicate that an observation is in level 2 of a given variable:

x^D = 1 if D = 2, else 0
x^V = 1 if V = 2, else 0
x^P = 1 if P = 2, else 0
Thus, whenever you are in a cell where D = 2, x^D equals 1. For example, for cell (1, 2, 1), x^D = 0, x^V = 1, and x^P = 0. Then

ln μ_ijk = β_0 + β_D x^D + β_V x^V + β_P x^P

specifies a Poisson regression model. Consider several cells of the table. For cell (1, 1, 1),

ln μ_111 = β_0

while for cell (2, 2, 2),

ln μ_222 = β_0 + β_D + β_V + β_P

Notice that the means for all cells where D = 2 include the parameter β_D; all cells where V = 2 include the parameter β_V; and so on. The estimates from this model are identical to those from a log-linear model, where

ln μ_ijk = λ + λ^D_i + λ^V_j + λ^P_k    [9.6]

To identify the model, constraints are imposed on the parameters. Identification in this model is similar to the situation with dummy variables in the LRM. For example, you cannot have one parameter for being a female and another for being a male. As with dummy variables in the LRM, we identify the model by assuming that the parameter for the first level of each variable is fixed at 0:

λ^D_1 = 0;  λ^V_1 = 0;  λ^P_1 = 0

With these constraints, β_0 = λ, β_D = λ^D_2, β_V = λ^V_2, and β_P = λ^P_2. These parameters can be interpreted in the same way as the parameters for the Poisson regression model.

Interactions are added to the model to allow the counts in some combinations of cells to be more likely than would be expected if the variables were independent of one another. For example,

ln μ_ijk = λ + λ^D_i + λ^V_j + λ^P_k + λ^DV_ij + λ^DP_ik + λ^VP_jk    [9.7]

To identify the model, we assume that a λ equals 0 whenever any of its subscripts i, j, or k equals 1. Equation 9.7 corresponds to a Poisson regression model with the interaction variables:

ln μ = β_0 + β_D x^D + β_V x^V + β_P x^P + β_DV x^D x^V + β_DP x^D x^P + β_VP x^V x^P
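The correspondence between Equation 9.6 and the dummy-variable Poisson regression can be checked numerically. The λ values below are hypothetical, chosen only for illustration (they are not estimates from Table 9.1); the check is a minimal sketch of the identity, not the book's estimation procedure:

```python
from math import exp

# Hypothetical lambda values (illustration only, not estimates from
# Table 9.1). Level 1 of each effect is fixed at 0 for identification.
lam = 4.0                       # lambda (constant)
lam_D = {1: 0.0, 2: -0.5}       # lambda^D_i
lam_V = {1: 0.0, 2: 0.3}        # lambda^V_j
lam_P = {1: 0.0, 2: -2.0}       # lambda^P_k

def mu(i, j, k):
    """Cell mean under the main-effects model of Equation 9.6."""
    return exp(lam + lam_D[i] + lam_V[j] + lam_P[k])

def mu_dummy(xD, xV, xP):
    """The equivalent dummy-variable Poisson regression:
    beta_0 = lambda, beta_D = lambda^D_2, and so on."""
    return exp(lam + lam_D[2] * xD + lam_V[2] * xV + lam_P[2] * xP)

# ln mu_111 = beta_0; ln mu_222 = beta_0 + beta_D + beta_V + beta_P
assert abs(mu(1, 1, 1) - mu_dummy(0, 0, 0)) < 1e-12
assert abs(mu(2, 2, 2) - mu_dummy(1, 1, 1)) < 1e-12
```

Any choice of λ values gives the same agreement, since the two parameterizations differ only in notation.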
Notice that the dependent variable has been the number of observations in each cell. Yet our substantive focus is likely to be on the effects of the defendant's and the victim's race on the sentence received. The effects of race can be expressed as the difference between the logs of two counts when P = 2 and P = 1:

ln μ_ij2 − ln μ_ij1 = ln(μ_ij2 / μ_ij1)

This is the log of the odds, or logit, of not giving the death penalty given the races of the defendant and victim. Taking the difference of Equation 9.7 for a given combination of D and V at the two levels of P,

ln μ_ij2 − ln μ_ij1 = (λ^P_2 − λ^P_1) + (λ^DP_i2 − λ^DP_i1) + (λ^VP_j2 − λ^VP_j1)

Since any λ with a subscript equal to 1 is constrained to equal 0 to identify the model, the model becomes

ln μ_ij2 − ln μ_ij1 = λ^P_2 + λ^DP_i2 + λ^VP_j2    [9.8]

To show the link to the logit model, define some new dummy variables:

x_D = 1 if D = 2, else 0
x_V = 1 if V = 2, else 0

Then Equation 9.8 can be written as

ln(μ_ij2 / μ_ij1) = β_0 + β_D x_D + β_V x_V

where

β_0 = λ^P_2;  β_D = λ^DP_22;  β_V = λ^VP_22

This is simply the logit model of Chapter 3, and it can be interpreted with predicted probabilities and factor changes in the odds in the same way. While my discussion of log-linear models oversimplifies a number of important issues, the basic ideas should be clear. See Agresti (1990) for a comprehensive discussion of log-linear models, or (1996) for an excellent introduction.
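The cancellation that produces Equation 9.8 can also be verified numerically. The λ values below are hypothetical (not estimates from the text); every λ with a subscript of 1 is fixed at 0, and the log of the odds for P then depends only on the terms involving P:

```python
from math import exp, log

# Hypothetical lambda values; every lambda with a subscript of 1 is 0.
lam, lam_D2, lam_V2, lam_P2 = 4.0, -0.5, 0.3, -2.0
lam_DV22, lam_DP22, lam_VP22 = 0.2, 0.6, -1.1

def mu(i, j, k):
    """Cell mean under Equation 9.7 (all two-way interactions)."""
    t = lam
    if i == 2:
        t += lam_D2
    if j == 2:
        t += lam_V2
    if k == 2:
        t += lam_P2
    if i == 2 and j == 2:
        t += lam_DV22
    if i == 2 and k == 2:
        t += lam_DP22
    if j == 2 and k == 2:
        t += lam_VP22
    return exp(t)

# Equation 9.8: ln(mu_ij2 / mu_ij1) = lambda^P_2 + lambda^DP_i2 + lambda^VP_j2
for i in (1, 2):
    for j in (1, 2):
        logit = log(mu(i, j, 2) / mu(i, j, 1))
        expected = (lam_P2
                    + (lam_DP22 if i == 2 else 0.0)
                    + (lam_VP22 if j == 2 else 0.0))
        assert abs(logit - expected) < 1e-12
```

The terms involving only D and V (including their interaction) appear in both cell means and cancel in the ratio, which is exactly why the log-linear model implies a logit model for P.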
Appendix A: Answers to Exercises

The appendix contains brief answers to the exercises contained in italics within the text.

Chapter 2: Continuous Outcomes

Page 23: E(y | x) = E(xβ | x) + E(ε | x) = xβ, since E(ε | x) = 0.

Since x₁ and x₂ are assumed to be correlated, x₁ and (β₂x₂ + ε) must be correlated.

Chapter 3: Binary Outcomes

Page 38: Let y be a dummy variable with E(y) = μ. Then

Var(y) = E[(y − μ)²] = E(y² − 2yμ + μ²)

Since y equals 1 or 0, E(y²) = E(y). Therefore, Var(y) = E(y) − 2μE(y) + μ². Substituting E(y) = μ, Var(y) = μ(1 − μ).

Page 38: The dashed ellipse shows …. The dotted lines are located at xβ = 0 and 1:


[Figure: E(y | x) plotted against x (0 to 100), with dotted lines at xβ = 0 and 1.]
Page 39: From Tables 3.1 and 3.2, Pr(y = 1) = 1.144 + (… × 4) + (−.011 × 1.35) + (−.013 × …) + (.164 × 0) + (.019 × 0) + (.123 × 1.10) + (.007 × 20.13) = −.51 based on rounding to three decimal digits. Using the full precision stored by the computer, the probability is −.48. Note that you should use full precision in making these computations.

Page 42: The point is located at (0.0, 0.50).

Page 45: At x₁, y₁* = α + βx₁; at x₂, y₂* = α + βx₂. The change in the expected value of y* is y₂* − y₁* = β(x₂ − x₁).
Page …: Call the second derivative H. The more peaked the ln L function, the larger the absolute value of H and the slower we want to change the estimates; the larger H, the smaller H⁻¹, and hence the smaller the adjustment. It follows that …

Page …: If σ increases, … the distribution is above τ …. That is, the curve in Panel B will shift to the left.

Page 69: The dashed lines mark off the range …; circles mark the … for those who attended …; squares are for those who did not attend. At 0, the curves are almost …, but … to converge.

[Figure: probability plotted against number of children (0 to 3).]

Page 55: The …, so it would have a ….

Page 70: In the LRM, the dependent variable is observed and, consequently, its scale will not change as variables are added to the model.

Page 72: From Equation 3.7,

P = exp(xβ) / [1 + exp(xβ)]  and  1 − P = 1 / [1 + exp(xβ)]

Therefore,

P(1 − P) = exp(xβ) / [1 + exp(xβ)]²

which is the result we need.
…from the symmetry of the probability curve.

Chapter 4: Hypothesis Testing and Goodness of Fit

Page 88: …thereby increasing the absolute value of the second derivative.

Page 91: If the variance is large, we are less confident in the estimate and would want to give it less weight in making a decision.

Page 92: Let 0 be a 7 × 1 vector of 0's; let I be a 7 × 7 identity matrix. Then let Q = [0 I] and r = 0. … are nested in M₄.

Page …: Summing just the cases where y = 0, …; summing the cases where y = 1, …. Combining the two sums, Σ(yᵢ − ȳ)² = n₁ − 2n₁²/N + n₁²N/N² = n₁(1 − n₁/N).

Page 111: This follows since D(M_F) = 0 and df_F = 0.

Chapter 5: Ordinal Outcomes

Page 121: For x = 15,

Pr(y = 1 | x = 15) = Φ[.75 − .052(15)] = 0.68
Pr(y = 2 | x = 15) = Φ[3.5 − .052(15)] − Φ[.75 − .052(15)] = 0.32
Pr(y = 3 | x = 15) = Φ[5.0 − .052(15)] − Φ[3.5 − .052(15)] = 0.00
Pr(y = 4 | x = 15) = 1 − Φ[5.0 − .052(15)] = 0.00

Page 123: Fix any one threshold to any value; or fix the intercept to any value. For example, τ₃ = −13.9 or α = 33.3.

Page 129: When both the logistic and normal distributions are standardized, they are similar in shape but not exactly the same.

Page 133: Consider m = 3. Then

Pr(y ≤ 3 | x) = Pr(y = 1 | x) + Pr(y = 2 | x) + Pr(y = 3 | x)

From Equation 5.6, Pr(y = m | x) = F(τ_m − xβ) − F(τ_{m−1} − xβ). Substituting and noting that F(τ₀ − xβ) = 0,

Pr(y ≤ 3 | x) = F(τ₁ − xβ) − F(τ₀ − xβ) + F(τ₂ − xβ) − F(τ₁ − xβ) + F(τ₃ − xβ) − F(τ₂ − xβ) = F(τ₃ − xβ)

Page 138: Combining Equations 5.2 and 5.10,

Pr(y ≤ m | x) = exp(τ_m − xβ) / [1 + exp(τ_m − xβ)]

so that

Pr(y > m | x) = 1 − exp(τ_m − xβ) / [1 + exp(τ_m − xβ)] = 1 / [1 + exp(τ_m − xβ)]

Dividing,

Pr(y ≤ m | x) / Pr(y > m | x) = exp(τ_m − xβ)

Page 143: Excluding the intercept, β_m has K coefficients for each of the J − 1 binary logits, for a total of (J − 1)K coefficients; β has K coefficients excluding the intercept. Therefore, we are imposing (J − 1)K − K = K(J − 2) constraints.

Chapter 6: Nominal Outcomes

Page 150:

ln[Pr(A | x)/Pr(B | x)] + ln[Pr(B | x)/Pr(C | x)] = [ln Pr(A | x) − ln Pr(B | x)] + [ln Pr(B | x) − ln Pr(C | x)]

Canceling the ln Pr(B | x)'s, we have

ln Pr(A | x) − ln Pr(C | x) = ln[Pr(A | x)/Pr(C | x)]
As discussed later in the chapter, consider three occupations: P (professional), C (craft), and M (menial). The MNLM … variable, ED (education). Then … estimated with two …, the first comparing outcomes P and M, and the second outcomes P and C. Notice that the estimates for ED from the binary logits and the multinomial logit model differ (0.725, 0.607, 0.690, 0.6…).

Page …: … can be computed as …. Then stack all parameters: β* = (β*₁′, …)′. Let Q be a matrix of 0's, 1's, and −1's. The first four columns of Q correspond to the four intercepts, the next four to the coefficients for WHITE, and so on.

Page 170: For …, the … in the odds of being craft versus … is .466.

Page 173: Notice that the B's are lined up. The relative location of the other …

[Figure: outcomes A, B, and C plotted for x₁, x₂, and x₃ on a factor change scale (0.37, 0.61, 4.48) and a logit coefficient scale (−1.50 to 1.50).]

Chapter 7: Limited Outcomes

Page 199: Since the normal distribution is symmetric, Φ(−δ) = 1 − Φ(δ).

Page 203: If Φ(δ) = 1, then φ(δ) = 0 and λ(δ) = 0, so E(y | x) = xβ. That is, there is no censoring. If Φ(δ) = 0, then all cases are censored and E(y | x) = τ_y.

Page 207: Consider a … with two values of x, with the conditional distributions A and B. The marginal distribution of y* would combine A and B to form a single marginal distribution indicated by the two peaks a and b. The marginal distribution has a much larger variance.
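The limiting behavior in the page 203 answer can be sketched with the standard expression for E(y | x) in a tobit model censored from below at τ (as in McDonald & Moffitt, 1980). The function and the numeric values below are illustrative assumptions, not the book's own computations:

```python
from math import erf, exp, pi, sqrt

def phi(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def tobit_mean(xb, sigma, tau=0.0):
    """E(y | x) for a tobit model censored from below at tau:
    Phi(d)(xb + sigma * lam) + [1 - Phi(d)] tau, where d = (xb - tau)/sigma
    and lam = phi(d)/Phi(d) is the inverse Mills ratio."""
    d = (xb - tau) / sigma
    lam = phi(d) / Phi(d)
    return Phi(d) * (xb + sigma * lam) + (1.0 - Phi(d)) * tau

# Phi(d) near 1: almost no censoring, so E(y | x) is close to xb.
assert abs(tobit_mean(6.0, 1.0) - 6.0) < 1e-6
# Phi(d) near 0: almost all cases censored, so E(y | x) is close to tau = 0.
assert abs(tobit_mean(-6.0, 1.0) - 0.0) < 1e-6
```

Between the two extremes, E(y | x) is pulled above xβ by the censoring, which is why marginal effects in the tobit model depend on Φ(δ).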
Chapter 8: Count Outcomes

Page …: Panel A plots conditional distributions. The solid line is for μ = …; the dashed line is μ = 4.5 for men. Panel B plots distributions that are not conditional on x. The solid line is the … distribution with μ = …; the dashed line averages the distributions in Panel A. Notice that the … distribution has greater dispersion than the Poisson distribution.

[Figure: Distribution A and Distribution B, y ranging from 0 to 12.]

Page 233: As … → ∞, Var(y | x) = ….

Page 233: If α = 0, there will be no overdispersion: the variance equals the mean and the negative binomial model reduces to the Poisson regression model.

Page 243: Pr(y = 0 | x) = ψ + (1 − ψ) …, and Pr(y = k | x) = (1 − ψ) … for k > 0.

μ̂ = .779. Using this value, we can compute the probabilities for various counts: Pr(y = 0 | μ̂) = .46; Pr(y = 1 | μ̂) = .36; Pr(y = 2 | μ̂) = .14; Pr(y = 3 | μ̂) = .04.

The unstandardized coefficient for MENT is 0.026, so the percentage change is 100 × [exp(.026 × 1) − 1] = 2.6; the x-standardized coefficient is …, so the percentage change is 100 × [exp(…) − 1] = ….

Let x̄ contain the means of all variables except for FEM = 1. Then xβ̂ includes … (.50 × .155) + (.50 × −.185) …, and E(y | x) = exp(xβ̂) = 1.43. And similarly for ….
References

Aitchison, J., & Silvey, S. D. (1957). The generalization of probit analysis to the case of multiple responses. Biometrika, 44, 131-140.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second international symposium on information theory. Budapest: Akademiai Kiado.
Aldrich, J. H., & Nelson, F. D. (1984). Linear probability, logit, and probit models. Beverly Hills, CA: Sage.
Allison, P. D. (1995). Survival analysis using the SAS system. Cary, NC: SAS Institute, Inc.
Allison, P. D., & Christakis, N. A. (1994). Logit models for sets of ranked items. In P. V. Marsden (Ed.), Sociological methodology (Vol. 24, pp. 199-228). Oxford: Basil Blackwell.
Amemiya, T. (1973). Regression analysis when the dependent variable is truncated normal. Econometrica, 41, 997-1016.
Amemiya, T. (1981). Qualitative response models: A survey. Journal of Economic Literature, 19, 1481-1536.
Berkson, J. (1944). Application of the logistic function to bio-assay. Journal of the American Statistical Association, 39, 357-365.
Berndt, E. R. (1991). The practice of econometrics. Reading, MA: Addison-Wesley.
Berndt, E. R., Hall, B. H., Hall, R. E., & Hausman, J. A. (1974). Estimation and inference in nonlinear structural models. Annals of Economic and Social Measurement, 3, 653-665.
Buse, A. (1982). The likelihood ratio, Wald, and Lagrange multiplier tests: An expository note. American Statistician, 36, 153-157.
Cameron, A. C., & Trivedi, P. K. (1986). Econometric models based on count data: Comparisons and applications of some estimators and tests. Journal of Applied Econometrics, 1, 29-53.
Cameron, A. C., & Trivedi, P. K. (1990). Regression-based tests for overdispersion in the Poisson model. Journal of Econometrics, 46, 347-364.
Cameron, A. C., & Trivedi, P. K. Regression analysis of count data. Unpublished manuscript.
Chambers, E. A., & Cox, D. R. (1967). Discrimination between alternative binary response models. Biometrika, 54, 573-578.
Clogg, C. C., & Shihadeh, E. S. (1994). Statistical models for ordinal variables. Thousand Oaks, CA: Sage.
Coleman, J. S. (1964). Introduction to mathematical sociology. New York: Free Press.
Cox, D. R. (1970). The analysis of binary data. London: Methuen & Co.
Cramer, J. S. (1986). Econometric applications of maximum likelihood methods. Cambridge: Cambridge University Press.
Cramer, J. S. (1991). The logit model. New York: E. Arnold.
Daula, T., Smith, D. A., & Nord, R. (1990). Inequality in the military: Fact or fiction? American Sociological Review, 55, 714-718.
Davidson, R., & MacKinnon, J. G. (1993). Estimation and inference in econometrics. New York: Oxford University Press.
Dean, C., Lawless, J. F., & Willmot, G. E. (1989). A mixed Poisson-inverse Gaussian regression model. Canadian Journal of Statistics, 17, 171-181.
DiPrete, T. A. (1990). Adding covariates to loglinear models for the study of social mobility. American Sociological Review, 55, 757-773.
Domencich, T. A., & McFadden, D. (1975). Urban travel demand: A behavioral analysis. Amsterdam: North-Holland.
Eaton, J., & Tamura, A. (1994). Bilateralism and regionalism in Japanese and U.S. trade and direct foreign investment patterns. Journal of the Japanese and International Economies, 8, 478-510.
Efron, B. (1978). Regression and ANOVA with zero-one data: Measures of residual variation. Journal of the American Statistical Association, 73, 113-121.
Eliason, S. R. (1993). Maximum likelihood estimation: Logic and practice. Newbury Park, CA: Sage.
Fair, R. C. (1978). A theory of extramarital affairs. Journal of Political Economy, 86, 45-61.
Feller, W. (1971). An introduction to probability theory and its applications, Vol. 2 (2nd ed.). New York: John Wiley.
Fienberg, S. E. (1980). The analysis of cross-classified categorical data (2nd ed.). Cambridge, MA: MIT Press.
Finney, D. J. (1971). Probit analysis (3rd ed.). Cambridge: Cambridge University Press.
Fox, J. (1991). Regression diagnostics: An introduction. Newbury Park, CA: Sage.
Fronstin, P., & Holtmann, A. G. The determinants of residential property damage caused by Hurricane Andrew. Southern Economic Journal, 61, 387-397.
Gaddum, J. H. Methods of biological assay depending on a quantal response (Special Report Series, No. 183). London: Medical Research Council.
Godfrey, L. G. (1988). Misspecification tests in econometrics. Cambridge: Cambridge University Press.
Goldberger, A. S. (1964). Econometric theory. New York: John Wiley.
Goldberger, A. S. (1991). A course in econometrics. Cambridge, MA: Harvard University Press.
Goldscheider, F. K., & DaVanzo, J. Pathways to independent living in early adulthood: Marriage, semiautonomy, and premarital residential independence. Demography.
Gourieroux, C., Monfort, A., & Trognon, A. (1984). Pseudo maximum likelihood methods: Applications to Poisson models. Econometrica, 52, 701-720.
Greene, W. H. (1993). Econometric analysis (2nd ed.). New York: Macmillan.
Greene, W. H. (1994). Accounting for excess zeros and sample selection in Poisson and negative binomial regression models (Working Paper). New York: New York University, Stern School of Business, Department of Economics.
Greene, W. H. (1995). LIMDEP. Bellport, NY: Econometric Software.
Greenwood, C., & Farewell, V. (1988). A comparison of regression models for ordinal data in an analysis of transplanted-kidney function. Canadian Journal of Statistics, 325-335.
Griffiths, W. E., Hill, R. C., & Judge, G. G. (1993). Learning and practicing econometrics. New York: John Wiley.
Grogger, J. (1990). The deterrent effect of capital punishment: An analysis of daily homicide counts. Journal of the American Statistical Association, 85, 295-303.
Grogger, J. T., & Carson, R. T. (1991). Models for truncated counts. Journal of Applied Econometrics, 6, 225-238.
Gronau, R. (1973). The effect of children on the housewife's value of time. Journal of Political Economy, 81, S168-S199.
Gunderson, M. (1974). Probit and logit estimates of labor force participation. Industrial Relations, 13, 216-220.
Gurland, J., Lee, I., & Dahm, P. A. (1960). Polychotomous quantal response in biological assay. Biometrics, 16, 382-398.
Gurmu, S. (1991). Tests for detecting overdispersion in the positive Poisson regression model. Journal of Business and Economic Statistics, 9, 215-222.
Gurmu, S., & Trivedi, P. K. (1992). Overdispersion tests for truncated Poisson regression models. Journal of Econometrics, 54, 347-370.
Gurmu, S., & Trivedi, P. K. (1995). Recent developments in models of event counts (Working paper). Charlottesville, VA: University of Virginia, Thomas Jefferson Center for Political Economy.
Hagle, T. M., & Mitchell, G. E., II. (1992). Goodness-of-fit measures for probit and logit. American Journal of Political Science, 36, 762-784.
Hannan, M. T., & Freeman, J. (1989). Organizational ecology. Cambridge, MA: Harvard University Press.
Hanushek, E. A., & Jackson, J. E. (1977). Statistical methods for social scientists. New York: Academic Press.
Hartog, J., Ridder, G., & Visser, M. (1994). Allocation of individuals to job levels under rationing. Journal of Applied Econometrics, 9, 437-451.
Hauck, W. W., & Donner, A. (1977). Wald's test as applied to hypotheses in logit analysis. Journal of the American Statistical Association, 72, 851-853.
Hausman, J. A., Hall, B. H., & Griliches, Z. (1984). Econometric models for count data with an application to the patents-R&D relationship. Econometrica, 52, 909-938.
Hausman, J. A., & McFadden, D. (1984). Specification tests for the multinomial logit model. Econometrica, 52, 1219-1240.
Hausman, J. A., & Ruud, P. A. (1987). Specifying and testing econometric models for rank-ordered data. Journal of Econometrics, 34, 83-104.
Hausman, J. A., & Wise, D. A. (1977). Social experimentation, truncated distributions and efficient estimation. Econometrica, 45, 919-939.
Hausman, J. A., & Wise, D. A. (1978). A conditional probit model for qualitative choice: Discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica, 46, 403-426.
Heckman, J. J. (1974). Shadow prices, market wages, and labor supply. Econometrica, 42, 679-694.
Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5, 475-492.
Johnson, N. L., Kotz, S., & Balakrishnan, N. (1994). Continuous univariate distributions (2nd ed.). New York: John Wiley.
Johnson, N. L., Kotz, S., & Kemp, A. W. (1992). Univariate discrete distributions (2nd ed.). New York: John Wiley.
Judge, G. G., Griffiths, W. E., Hill, R. C., Lütkepohl, H., & Lee, T.-C. (1985). The theory and practice of econometrics (2nd ed.). New York: John Wiley.
Kalbfleisch, J. D., & Prentice, R. L. (1980). The statistical analysis of failure time data. New York: John Wiley.
Kaufman, R. L. (1996). Comparing effects in dichotomous logistic regression: A variety of standardized coefficients. Social Science Quarterly, 77, 90-109.
King, G. (1988). Statistical models for political science event counts: Bias in conventional procedures and evidence for the exponential Poisson regression model. American Journal of Political Science, 32, 838-863.
King, G. (1989). Unifying political methodology: The likelihood theory of statistical inference. Cambridge: Cambridge University Press.
Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34, 1-14.
Lancaster, T. (1990). The econometric analysis of transition data. New York: Cambridge University Press.
Landwehr, J. M., Pregibon, D., & Shoemaker, A. C. (1984). Graphical methods for assessing logistic regression models. Journal of the American Statistical Association, 79, 61-71.
Lawless, J. F. (1987). Negative binomial and mixed Poisson regression. Canadian Journal of Statistics, 15, 209-225.
Lesaffre, E., & Albert, A. (1989). Multiple-group logistic regression diagnostics. Applied Statistics, 38, 425-440.
Liao, T. F. (1994). Interpreting probability models: Logit, probit and other generalized linear models. Thousand Oaks, CA: Sage.
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: John Wiley.
Long, J. S. (1983). Confirmatory factor analysis. Newbury Park, CA: Sage.
Long, J. S. (1987). A graphical method for the interpretation of multinomial logit analysis. Sociological Methods and Research, 15, 420-446.
Long, J. S. (1990). The origins of sex differences in science. Social Forces, 68, 1297-1315.
Long, J. S. (1993). MARKOV: A statistical environment for GAUSS, Version 2. Maple Valley, WA: Aptech Systems, Inc.
Long, J. S., Allison, P. D., & McGinnis, R. (1980). Entrance into the academic career. American Sociological Review, 44, 816-830.
Long, J. S., & McGinnis, R. (1981). Organizational context and scientific productivity. American Sociological Review, 46, 422-442.
Longford, N. T. (1995). Random coefficient models. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences (pp. 519-578). New York: Plenum.
Luce, R. D. (1959). Individual choice behavior. New York: John Wiley.
Maddala, G. S. (1983). Limited-dependent and qualitative variables in econometrics. Cambridge: Cambridge University Press.
Maddala, G. S. (1992). Introduction to econometrics (2nd ed.). New York: Macmillan.
Maddala, G. S., & Nelson, F. D. (1975). Switching regression models with exogenous and endogenous switching. Proceedings of the American Statistical Association (Business and Economics Section), 423-426.
Maddala, G. S., & Trost, R. P. (1982). On measuring discrimination in loan markets. Housing Finance Review, 1, 245-268.
Magee, L. (1990). R² measures based on Wald and likelihood ratio joint significance tests. American Statistician, 44, 250-253.
Manski, C. F. (1995). Identification problems in the social sciences. Cambridge, MA: Harvard University Press.
Marcus, A., & Greene, W. H. (1985). The determinants of rating assignment and performance (Working Paper No. CRC528). Alexandria, VA: Center for Naval Analyses.
McCullagh, P. (1980). Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, Series B, 42, 109-142.
McCullagh, P. (1986). The conditional distribution of goodness-of-fit statistics for discrete data. Journal of the American Statistical Association, 81, 104-107.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). New York: Chapman and Hall.
McDonald, J. F., & Moffitt, R. A. (1980). The uses of tobit analysis. Review of Economics and Statistics, 62, 318-321.
McFadden, D. (1968). The revealed preferences of a government bureaucracy (Working paper). Berkeley, CA: University of California, Department of Economics.
McFadden, D. (1973). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers of econometrics (pp. 105-142). New York: Academic Press.
McFadden, D. (1981). Econometric models of probabilistic choice. In C. F. Manski & D. McFadden (Eds.), Structural analysis of discrete data (pp. 198-272). Cambridge, MA: MIT Press.
Pudney, S. (1989). Modelling individual choice: The econometrics of corners, kinks and holes. Oxford: Basil Blackwell.
Radelet, M. L. (1981). Racial characteristics and the imposition of the death penalty. American Sociological Review, 46, 918-927.
Small, K. A., & Hsiao, C. (1985). Multinomial logit specification tests. International Economic Review, 26, 619-627.
StataCorp. Stata statistical software. College Station, TX: Stata Corporation.
Theil, H. (1970). On the estimation of relationships involving qualitative variables. American Journal of Sociology, 76, 103-154.
Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26, 24-36.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307-333.
Zhang, J., & Hoffman, S. D. (1993). Discrete-choice logit models: Testing the IIA property. Sociological Methods and Research, 22, 193-213.
• THE APPROACH

"…there is a decided emphasis on the application and on the interpretation of the statistical techniques. Long works from the premise that the real difficulty … is the complexity of interpreting non-linear models…"
—Robert L. Kaufman, The Ohio State University
"…categorical and limited dependent variables … a traditional regression perspective that provides an admirably clear … estimation, identification, and the multiplicity of models available … to the researcher to analyze such data."
• THE ORGANIZATION

"…the … thing about this book is how well written it is. The chapters are in a pedagogical sequence … useful repetition of important concepts (e.g., estimation, hypothesis testing) from chapter to chapter. Scott Long has done a terrific job of distilling what we like most from disparate literatures, such as the scalar measures of fit in Chapter 4."

"A major strength of the book is the way that it is organized. The chapter about each technique is written in a highly organized and parallel format. First the statistical basis and assumptions for the particular model is developed, then estimation issues are considered, then issues of testing and interpretation are considered, then variations and extensions are explored."
—Robert L. Kaufman, The Ohio State University

• FOR THE COURSE

"I have been teaching a course on … data to graduate students for close to 20 years, but I have never found a book with which I was happy. J. Scott Long's book, on the other hand, is nearly ideal for our objectives and preferences, and I expect that many other social scientists will feel the same way. I will definitely adopt it the next time I teach the course. It deals with the right topics in the most desirable sequence and it is clearly written."

Advanced Quantitative Techniques in the Social Sciences, Volume 7
ISBN 0-8039-7374-8 hardcover

Visit our website at www.sagepublications.com
