Email:sa.publishers@gmail.com
Web: www.statisticalassociates.com
Table of Contents
Overview
Data examples in this volume
Key Terms and Concepts
Location variables and thresholds
Prediction equations
Ordinal Regression in SPSS
Overview
SPSS inputs
The main “Ordinal Regression” dialog
The Ordinal Regression “Location” dialog
The Ordinal Regression “Options” dialog
The Ordinal Regression “Scale” dialog
The Ordinal Regression “Bootstrap” dialog
The Ordinal Regression “Output” dialog
SPSS outputs
Overview
The parallel lines test
Tests and effect size measures for model goodness of fit
Parameter estimates
Odds ratios
Other output
Ordinal Regression in SAS
Overview
SAS syntax for ordinal regression
SAS output for ordinal regression
The parallel lines test
Testing the global null hypothesis
Parameter estimates
Type 3 Analysis of Effects
Odds ratio estimates
R-square
What are ordinal regression signal-response models (probit link)?
In Stata’s gologit2 partial proportional odds procedure, how are standardized estimates obtained?
What is the SPSS syntax for ordinal regression models?
Acknowledgements
Bibliography
Overview
Ordinal regression, also called the ordered logit model, is used with ordinal
dependent (response) variables, where the independent variables may be
categorical factors or continuous covariates. Ordinal regression avoids the
measurement error inherent in OLS regression using ordinal data. When the
response variable is ordinal rather than nominal in data level, ordinal regression
also has more statistical power than multinomial regression.
Ordinal regression models are sometimes called “cumulative logit models” since
they are a variant on logistic regression, except using the cumulative logit link
rather than the logit link (though other link functions are possible). Ordinal
regression models are also called “proportional odds models” because of their
requirement that the regression lines they generate must be proportional,
meaning parallel, sharing the same regression coefficients but varying in their
intercepts.
The label “proportional odds models” also reflects the fact that the
k–1 regression lines are parallel, hence proportional, and that the b
coefficients may be converted to odds ratios as in logistic regression. The odds
ratio is e, the base of the natural logarithm, raised to the power of b, as
discussed below.
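As a minimal sketch of the conversion from a b coefficient to an odds ratio, the Python snippet below exponentiates a logit coefficient. The function name and the coefficient value are hypothetical, chosen only for illustration, and are not estimates from the example data.

```python
import math

def odds_ratio(b):
    """Convert a logit coefficient b to an odds ratio, exp(b)."""
    return math.exp(b)

# Hypothetical coefficient, not an estimate from the example data.
b = 0.405
print(round(odds_ratio(b), 2))  # roughly 1.5
```

A coefficient of 0 corresponds to an odds ratio of 1, meaning no effect on the odds.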
See also the separate Statistical Associates "Blue Book" volume on Generalized
Linear Models. Ordinal regression is a special case of generalized linear modeling
(GZLM). Identical parameter and model fit estimates can be obtained using the
GZLM procedure, but options vary somewhat between it and the stand-alone
In this example, the dependent variable is "happy". It has three levels: 1=very
happy, 2=pretty happy, 3=not too happy. What is modeled is the log odds of
"happy"=1 vs. "happy" equaling the higher values, 2 or 3. Also modeled is the log
odds of "happy" = 1 or 2 vs. "happy" equaling the higher value, 3. Put another
way, ordinal regression models odds of cumulative counts, not odds of individual
levels of the dependent. For example, for "happy" = 1, we model
ln[prob(happy=1)/prob(happy>1)]. We also model, for "happy" = 2,
ln[prob(happy=1 or 2)/prob(happy>2)]. Etc. The SAS default reverses the
comparisons compared to SPSS, as discussed further below, but this can be
overridden by the researcher.
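The cumulative log odds described above can be sketched in Python as follows. This is only an illustration of the arithmetic on raw category counts, not of model estimation; the function name and the counts are assumptions and are not taken from the example data.

```python
import math

def cumulative_logits(counts):
    """Return ln[prob(Y <= j) / prob(Y > j)] for j = 1 .. k-1.

    counts[i] is the number of cases in category i+1 of the ordinal variable.
    """
    total = sum(counts)
    logits = []
    running = 0
    for c in counts[:-1]:          # the last category has no split above it
        running += c
        p_le = running / total     # prob(Y <= j)
        logits.append(math.log(p_le / (1 - p_le)))
    return logits

# Hypothetical counts for happy = 1, 2, 3 (not the example data).
counts = [300, 550, 150]
for j, logit in enumerate(cumulative_logits(counts), start=1):
    print(f"happy <= {j} vs. happy > {j}: log odds = {logit:.3f}")
```

With three levels of the dependent variable, the function returns two log odds, one for each cumulative split.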
In the example the predictor (location) variables are the main effects of:
Prediction equations
Ordinal regression will result in (k - 1) predictions for the dependent variable,
where k is the number of its categories. For instance, for the case of a dependent
variable with five values, the first prediction will be for the log odds of a score of 1
compared to a score higher than 1 (that is, ln[prob(1)/prob(>1)]). The second
prediction will be the log odds of a score of 1 or 2 compared to a score higher
than 2. Etc. Note that it is unnecessary to have a fifth prediction to predict the
probability of a score from 1 to 5 as by definition that probability is 100%.
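To illustrate why k - 1 predictions are enough, the Python sketch below converts four cumulative log odds into five category probabilities. The function name and the predicted values are hypothetical, and the sketch assumes the cumulative log odds increase from the first split to the last.

```python
import math

def category_probs(cum_logits):
    """Turn k-1 cumulative log odds into k category probabilities."""
    # The inverse logit gives the cumulative probabilities prob(Y <= j).
    cum = [1 / (1 + math.exp(-z)) for z in cum_logits]
    cum.append(1.0)  # prob(Y <= k) is 100% by definition
    probs, prev = [], 0.0
    for c in cum:
        probs.append(c - prev)   # prob(Y = j) = prob(Y <= j) - prob(Y <= j-1)
        prev = c
    return probs

# Hypothetical predicted log odds for a five-category dependent variable.
probs = category_probs([-2.0, -0.5, 0.8, 2.2])
print([round(p, 3) for p in probs])
print(round(sum(probs), 6))
```

The probability of the fifth category falls out as the remainder, which is why no fifth prediction equation is needed.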
Note that "happy" has three levels: 1=very happy, 2=pretty happy, 3=not too
happy. Thus a higher value means less happiness in life. The highest level (3 = not too
happy) is the reference level for comparison purposes when interpreting odds
ratios, discussed further below.
SPSS inputs
The main “Ordinal Regression” dialog
The main ordinal regression dialog box in SPSS is where the dependent variable,
factors, and covariates are entered, as depicted below.
In SPSS the ordinal dependent variable may be coded in numeric or string terms
but coding is assumed to be ascending, with the first category corresponding to
the lowest value. What is predicted is not the raw value of the dependent variable
but some transformation of it. Usually this is the logistic function, using a logit
transformation, but SPSS offers four other transformations discussed below (ex.,
probit is available). Note that predicted values may be saved as variables, as
discussed in the FAQ below.
Factors
In the example below, unhappiness is predicted from sex, marital status, and age
category. The unhappiness variable is coded 1 = very happy, 2 = pretty happy, and
3 = not too happy. Higher values indicate more unhappiness. Sex (1 = male, 2 =
female), marital status (1 = married, 2 = all others), and age category (1 = under
35, 2 = 35-64, 3 = over 64) are entered as categorical factors. Respondent's
highest year of education, total family income, and hours per week watching
television are entered as continuous covariates.
Factor space is the table formed by all the categorical variables plus the ordinal
dependent variable, which is also categorical. By common rule of thumb, no cell in
factor space should have a count of 0 and 80% or more of the cells should have
counts greater than 5. For the example, there are 36 cells in factor space: 3 levels
of happy times 3 levels of agecat3 times 2 levels of marital2 times 2 levels of sex.
In the figure below, factor space is adequate because there are no 0-count cells and
only one cell with a count of five or less. That is the cell for married, not too
happy males under 35.
Why is this important? After computing ordinal regression, the researcher will
want to discuss effect sizes in a general way, generalizing from his or her sample
to the population from which it was taken. Because there are only four cases of
married, not too happy males under 35, the researcher’s generalizations logically
could be footnoted with a note stating the generalizations do not apply to that
group. The “no 0 cells, 80% more than 5” rule is, in effect, a permission to
generalize if this condition is met. Methodologists are tacitly agreeing that if the
rule is met, there are not “too many” gaps in the data to generalize one’s findings.
Factor space for the example is viewed in SPSS by running a crosstabulation of the
three factors (sex, marital status, age category) plus the categorical dependent
variable. (An ordinal variable is a type of categorical variable.) As illustrated
below, no cell is empty and all but one cell contain more than 5 observations, so
cell counts are adequate for ordinal regression analysis. In SPSS, this table is obtained by selecting Analyze >
Descriptive Statistics > Crosstabs, then entering marital2 as the row variable, sex
as the column variable, and agecat3 as the layer variable for layer 1 and the
dependent variable, happy, as the layer variable for layer 2. The SPSS syntax is
this:
CROSSTABS
/TABLES=marital2 BY sex BY agecat3 BY happy
/FORMAT=AVALUE TABLES
/CELLS=COUNT
/COUNT ROUND CELL.
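For readers who want to check the rule of thumb outside SPSS, the following Python sketch counts the cells of factor space and applies the “no 0 cells, 80% more than 5” rule. The function name and the data layout (one tuple of category codes per case) are assumptions made for illustration; only the level codes come from the example.

```python
from collections import Counter
from itertools import product

def check_factor_space(rows, levels):
    """Check the 'no 0 cells, 80% of cells greater than 5' rule of thumb.

    rows   -- iterable of tuples, one per case, giving the level of each
              categorical variable (factors plus the ordinal dependent)
    levels -- list of lists with the possible levels of each variable
    """
    counts = Counter(rows)
    cells = list(product(*levels))          # every cell in factor space
    n_zero = sum(1 for cell in cells if counts[cell] == 0)
    n_over5 = sum(1 for cell in cells if counts[cell] > 5)
    share_over5 = n_over5 / len(cells)
    ok = n_zero == 0 and share_over5 >= 0.80
    return ok, n_zero, share_over5

# Levels as coded in the example: happy (3) x agecat3 (3) x marital2 (2) x sex (2).
levels = [[1, 2, 3], [1, 2, 3], [1, 2], [1, 2]]
print(len(list(product(*levels))))  # 36 cells in factor space
```

The function returns whether the rule is met, the number of empty cells, and the share of cells with counts greater than 5.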
Covariates
The Ordinal Regression “Location” dialog
The location model is the ordinal regression analog to the OLS regression model.
It specifies the predictor variables. That is, "location" simply refers to the
independent variables and their b coefficients. In SPSS the location model is
specified by clicking the "Location" button in the main "Ordinal Regression"
dialog, leading to the screen illustrated below. Even though factors and covariates
were specified in the previous main “Ordinal Regression” dialog discussed above,
the “Location” dialog still is needed because not all predictors may be used for a
given run of a model and because the researcher may choose to create
interaction and nested effects in addition to main effects.
The "Factors/covariates" pane on the left lists any factors or covariates entered in
the initial "Ordinal Regression" dialog. If the “Custom” radio button is selected,
the researcher then uses the variables listed on the left to construct main, nested,
and/or interaction effects, moving them into the "Location model" pane on the
right. In this example, happy (actually unhappiness since higher values correspond
to being less happy) is predicted from sex, marital status, age category, highest
year of school completed (educ), total family income, and hours of television
viewed per day.
It is also possible to select the “Main effects” radio button to get the default
location model. The default model consists of all of the specified covariates and
all of the main factor effects. For this example, it would be identical to the custom
model shown below.
Interactions
Interactions represent the joint effect of two or more variables over and above
the main effects of the variables in the interaction. The “Type” drop-down menu
supports interaction effects (not shown). Simply control-click the variables
wanted, then click on the enter arrow in the “Type” pane to create an interaction
such as sex*marital2. Interactions may be between factors, between covariates,
or between a factor and a covariate. Multi-way interactions are supported, not
just two-way.
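The idea of an interaction term as a product of its component variables can be sketched in Python as below. This is an illustration under the assumption of simple 0/1 indicator coding; the helper names and the four cases are hypothetical, and SPSS constructs its own internal parameterization.

```python
def indicator(values, level):
    """Return a 0/1 indicator column for one level of a factor."""
    return [1 if v == level else 0 for v in values]

def interaction(col_a, col_b):
    """Element-wise product of two columns: the interaction term."""
    return [a * b for a, b in zip(col_a, col_b)]

# Hypothetical cases, using the coding from the example
# (sex: 1 = male, 2 = female; marital2: 1 = married, 2 = all others).
sex = [1, 1, 2, 2]
marital2 = [1, 2, 1, 2]

male = indicator(sex, 1)
married = indicator(marital2, 1)
print(interaction(male, married))  # [1, 0, 0, 0]: only married males
```

The product column is nonzero only for cases that combine both levels, which is why its coefficient captures an effect over and above the two main effects.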
Nested effects
The model may also contain nested effects, including multiple levels of nesting. A
nested effect is one where levels of one variable are dependent on which level of
a second variable they are nested in. For example, cereal(brand) is a nested effect
where the particular levels of cereal (ex., All-Bran, Apple Jacks, Corn Pops, etc.)
depend on which level of brand is being considered (here, Kellogg’s brand).
Nested effects may be more complex. For example, A(B(C)) means that B is nested
within C, and A is nested within B(C). Nesting within an interaction effect is valid
(ex., A(B*C) means that A is nested within B*C). Factors inside a nested effect
must be distinct (ex., A(B*A) is invalid). Factors cannot be nested within a
covariate effect (ex., if A and B are factors and X is a covariate, then X(A) is valid
but A(X) is invalid).
In the next three sections (the Options, Scale, and Bootstrap button dialogs) most
researchers accept the defaults. Readers may skip these sections and go directly to
the section on the “Output” button dialog.
The Ordinal Regression “Options” dialog
The Options button in SPSS is where the estimation options are specified, as in
the figure below:
The Options dialog in SPSS has elements related to the iterative process used in
calculating ordinal regression and also elements related to confidence intervals
and statistical tolerance.
1. Logit. f(x) = log(x / (1 – x)). This is the default (and used in the
example in this module) and is recommended when the dependent
ordinal variable has relatively equal categories. As it offers more
interpretable parameter estimates, including odds ratios as measures
Copyright © 2012 by G. David Garson and Statistical Associates Publishing
The Ordinal Regression “Scale” dialog
When a probit rather than logit link is selected in ordinal regression, the
researcher may need to specify a scale model, which tells the SPSS algorithm how
to allow for different variances in different groups. In the “Scale” dialog, the
researcher may create such a scale model. As shown in the figure below, factors and
covariates are listed, from which the researcher may construct main and
interaction effects.
Scale models are outside the scope of the present volume, which focuses on logit
link ordinal models. For discussion of scale models, see the separate Statistical
Associates “Blue Book” volume on “Probit and Logit Response Models”.
The Ordinal Regression “Bootstrap” dialog
bootstrapping also is used simply because the asymptotic formula seems too
complicated, as for standard errors of the median.
The Ordinal Regression “Output” dialog
The Output button in SPSS is where statistical output and saved variables are
specified. The default outputs are goodness of fit statistics, summary statistics,
and parameter estimates. While not a default, requesting the test of parallel lines
is strongly recommended. Other output options include iteration history,
correlation and covariance of estimates, and outputting to file for each case the
predicted category of the dependent variable, response probabilities, the log-
likelihood, and more. The figure below shows settings for the example which
follows.
SPSS outputs
Overview
In the SPSS example discussed below, the dependent variable is "happy", coded
1=very happy, 2=pretty happy, 3=not too happy. What is modeled is the log odds
of "happy"=1 vs. "happy" equaling the higher values, 2 or 3. Also modeled is the
log odds of "happy" = 1 or 2 vs. "happy" equaling the higher value, 3. Put another
way, ordinal regression models odds of cumulative counts, not odds of individual
levels of the dependent. For example, for "happy" = 1, we model
ln[prob(happy=1)/prob(happy>1)]. We also model, for "happy" = 2,
ln[prob(happy=1 or 2)/prob(happy>2)]. Etc. The SAS default reverses the
comparisons, as discussed further below, but this may be overridden by the
researcher.
In this example, the location (predictor) variables are the main effects of:
The parallel lines test
The parallel lines test is non-significant in a well-fitting model which meets the
parallel lines assumption. The test is a likelihood ratio test of the difference in -2
Log Likelihood between a model constrained to have equal slopes for the
predictor (location) variables and an unconstrained model. Note that the parallel
lines test is available only for the location-only model (not a location-scale model
for unequal variances, discussed below).
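The likelihood ratio arithmetic behind the test can be sketched in Python using only the standard library. The function names, the -2 Log Likelihood values, and the degrees of freedom below are hypothetical inputs for illustration, not SPSS output; the chi-square tail probability is computed with a standard recurrence for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values for df = 1 (odd) or df = 2 (even).
    if df % 2:
        q, a = math.erfc(math.sqrt(half)), 0.5
    else:
        q, a = math.exp(-half), 1.0
    # Recurrence Q(a + 1, x/2) = Q(a, x/2) + (x/2)^a e^(-x/2) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1))
        a += 1.0
    return q

def parallel_lines_test(neg2ll_constrained, neg2ll_general, df):
    """Likelihood ratio chi-square and p-value for the parallel lines test."""
    chi_square = neg2ll_constrained - neg2ll_general
    return chi_square, chi2_sf(chi_square, df)

# Hypothetical -2 Log Likelihood values and degrees of freedom.
chi_square, p = parallel_lines_test(2510.4, 2490.1, df=8)
print(f"chi-square = {chi_square:.1f}, p = {p:.4f}")
```

A small p-value would indicate that the equal-slopes constraint fits significantly worse than the unconstrained model, that is, that the parallel lines assumption is in doubt.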
Although not part of SPSS default output, perhaps the first output table to
examine is that testing the critical parallel lines assumption: that the slopes of the
predictor variables (to be shown as the parameter estimates for the location
variables) are the same for each level of the dependent variable ("happy" in the
example).
In this example, the parallel lines test is significant, meaning that the regression
slopes do differ significantly across levels of the dependent variable, "happy".
When the test of parallel lines returns a finding of significance, there are four
options:
© 2006, 2008, 2010, 2012, 2013, & 2014 by G. David Garson and Statistical Associates
Publishers. Worldwide rights reserved in all languages and all media. Do not copy or post on
other servers, even for educational use. Last updated: 5/12/2014.