
An Introduction to Logistic Regression Analysis and Reporting
Chao-Ying Joanne Peng
Indiana University-Bloomington

Three purposes of this session:
1. Introduces you to the basic concepts of logistic regression.
Logistic regression (LR) constitutes a special class of regression methods for research with dichotomous outcomes.

Three purposes of this session:
2. Provides you with a set of guidelines for what to expect in an article using logistic regression techniques.
What tables, figures, or charts should be included to comprehensively assess the results?
And what assumptions should be verified?

Three purposes of this session:
3. Offers recommendations for appropriate reporting formats of logistic regression results and the minimum observation-to-predictor ratio.

An Introduction to Logistic Regression Analysis and Reporting
Many research problems in education call for the analysis and prediction of a dichotomous outcome: for example, whether a child should be classified as learning disabled (LD), or whether a teenager is prone to engage in risky behaviors.
Traditionally, these research questions were addressed by either ordinary least squares (OLS) regression or linear discriminant function analysis.

An Introduction to Logistic Regression Analysis and Reporting
Both techniques were subsequently found to be less than ideal for handling dichotomous outcomes because of their strict statistical assumptions, i.e., linearity, normality, and continuity for OLS regression, and multivariate normality with equal variances and covariances for discriminant analysis.

An Introduction to Logistic Regression Analysis and Reporting
As an alternative, logistic regression was proposed in the late 1960s and early 1970s (Cabrera, 1994). It became routinely available in statistical packages in the early 1980s.
With the wide availability of sophisticated statistical software installed on high-speed computers, the use of logistic regression is increasing.

Logistic Regression Models
The central mathematical concept that underlies logistic regression is the logit, i.e., the natural log of the odds, ln(odds).
The simplest example of a logit derives from a 2 × 2 contingency table. Consider an instance in which the distribution of a dichotomous outcome variable (a child from an inner-city school recommended for remedial reading classes) is paired with a dichotomous predictor variable (gender).

Table 1. Sample Data for Gender and Recommendation for Remedial Reading

Remedial reading                     Gender
instruction recommended        Boys      Girls      Totals
Yes (coded as 1)                73        15          88
No (coded as 0)                 23        11          34
Totals                          96        26         122

The results yield χ²(df = 1) = 3.43. Alternatively, one might prefer to assess a boy's odds of being recommended for remedial reading instruction, relative to a girl's odds; the result is an odds ratio of 2.33.

odds ratio = (73 / 23) / (15 / 11) = 3.17 / 1.36 = 2.33

Its natural logarithm [i.e., ln(2.33)] equals 0.85, which would be the regression coefficient of the gender predictor if logistic regression were used to model the two outcomes of a remedial recommendation as it relates to gender.
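This arithmetic is easy to verify; a minimal Python sketch (not part of the original analysis) that reproduces the odds ratio and its natural log from the Table 1 counts:

```python
import math

# Cell counts from Table 1
boys_yes, boys_no = 73, 23
girls_yes, girls_no = 15, 11

odds_boys = boys_yes / boys_no         # about 3.17
odds_girls = girls_yes / girls_no      # about 1.36
odds_ratio = odds_boys / odds_girls    # about 2.33

print(round(odds_ratio, 2))            # 2.33
print(round(math.log(odds_ratio), 2))  # 0.84, i.e., the ~0.85 logit quoted above, up to rounding
```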

The simple logistic model has the form:

logit(Y) = natural log(odds) = ln( π / (1 − π) ) = α + βX,        (1)

where π is the probability of the outcome of interest, α is the intercept parameter, β is a regression coefficient (a.k.a. the slope parameter), and X is a predictor.

Figure 1. The relationship of a dichotomous outcome variable, Y (1 = remedial reading recommended, 0 = remedial reading not recommended), with a continuous predictor, READING scores.

For the data in Table 1, the regression coefficient (β) is the logit (= 0.85) previously explained. Taking the antilog of equation (1) on both sides, one derives an equation for predicting the probability of the occurrence of the outcome of interest as follows:

π = P(Y = outcome of interest | X = x, a specific value)
  = e^(α + βx) / (1 + e^(α + βx))
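A minimal Python sketch of this logit-to-probability conversion; the alpha and beta values in the example call are made up for illustration, not fitted estimates:

```python
import math

def predicted_probability(x, alpha, beta):
    """Return pi = e^(alpha + beta*x) / (1 + e^(alpha + beta*x))."""
    logit = alpha + beta * x
    return math.exp(logit) / (1 + math.exp(logit))

# Hypothetical parameter values, purely for illustration:
print(round(predicted_probability(x=70, alpha=0.5, beta=-0.03), 3))  # about 0.168
```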

Extending the logic of the simple logistic regression to multiple predictors (say X1 = reading score and X2 = gender), one may construct a more complex logistic regression for Y (recommendation for remedial reading programs) as follows:

logit(Y) = ln( π / (1 − π) ) = α + β1 X1 + β2 X2.

Therefore,

π = P(Y = outcome of interest | X1 = x1, X2 = x2)
  = e^(α + β1 x1 + β2 x2) / (1 + e^(α + β1 x1 + β2 x2))

Illustration of Logistic Regression Analysis and Reporting
The hypothetical data consisted of 189 inner-city school children's reading scores and gender.
Of these children, 59 (31.22%) were recommended for remedial reading classes while 130 (68.78%) were not. A legitimate research hypothesis posed to the data was: the likelihood that an inner-city school child is recommended for remedial reading instruction is related to both his/her reading score and gender.

Table 2. Description of a Hypothetical Data Set for Logistic Regression

Remedial reading              Total sample    Boys    Girls     Reading scores
instruction recommended?          (N)         (n1)    (n2)      Mean       SD
Yes                                59           36      23      61.07     13.28
No                                130           57      73      66.65     15.86
Summary                           189           93      96      64.91     15.29

Logistic Regression Analysis
The logistic regression analysis was carried out by the LOGISTIC REGRESSION command in SPSS version 13 (SPSS Inc., 2004).
Predicted logit of (REMEDIAL) = 0.534 + (−0.026)READING + (0.648)GENDER
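For readers working outside SPSS, a hedged sketch of how the same model could be fit in Python with statsmodels; the data file name and column names are assumptions, not part of the original analysis:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file with columns REMEDIAL (1/0), READING (score), GENDER (1 = boy, 0 = girl)
df = pd.read_csv("remedial_reading.csv")

model = smf.logit("REMEDIAL ~ READING + GENDER", data=df).fit()
print(model.params)     # intercept and slopes on the logit scale
print(model.summary())  # standard errors and z tests (Wald chi-square = z squared)
```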

Evaluations of the Logistic Regression Model
a) overall model evaluation
b) goodness-of-fit statistics
c) statistical tests of individual predictors
d) validations of predicted probabilities

Overall Model Evaluation

Test                        χ²      df       p
Likelihood ratio test     10.019     2     0.007
Score test                 9.518     2     0.009

Goodness-of-fit Test

Test                                        χ²     df       p
Hosmer-Lemeshow goodness-of-fit test       9.286    8     0.319

R2-type Indices
Cox and Snell R squared = .052
Nagelkerke (Max rescaled) R squared = .073
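Both indices can be recomputed from quantities already reported: the intercept-only (null) log-likelihood follows from the 59/130 split, and the fitted log-likelihood follows from the likelihood ratio chi-square of 10.019. A sketch, under those assumptions:

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    # 1 - (L_null / L_model)^(2/n), written with log-likelihoods
    return 1 - math.exp(2 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    # Cox and Snell R^2 rescaled by its maximum attainable value
    return cox_snell_r2(ll_null, ll_model, n) / (1 - math.exp(2 * ll_null / n))

n = 189
ll_null = 59 * math.log(59 / n) + 130 * math.log(130 / n)  # intercept-only log-likelihood
ll_model = ll_null + 10.019 / 2                            # from the likelihood ratio chi-square

print(round(cox_snell_r2(ll_null, ll_model, n), 3))   # 0.052
print(round(nagelkerke_r2(ll_null, ll_model, n), 3))  # 0.073
```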

Table 3. Logistic Regression Analysis of 189 Children's Referrals for Remedial Reading Programs by SPSS LOGISTIC REGRESSION command (version 12)

Predictor                    β        SE     Wald's χ² (df = 1)      p      e^β (odds ratio)
CONSTANT                   0.534    0.811          0.434           .510    (not applicable)
READING                   −0.026    0.012          4.565           .033          0.974
GENDER (1 = boys,          0.648    0.325          3.976           .046          1.911
  0 = girls)

Table 4. The Observed and the Predicted Frequencies for Remedial Reading Instructions by Logistic Regression with the Cutoff of 0.50

                                  Predicted
Observed                    Yes         No        Percentage correct
Yes                           3         56               5.1%
No                            1        129              99.2%
Overall % correct                                       69.8%

Note.
Sensitivity = 3/(3 + 56) = 5.1%
Specificity = 129/(1 + 129) = 99.2%
False positive = 1/(1 + 3) = 25.0%
False negative = 56/(56 + 129) = 30.3%
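All of the statistics in the note follow directly from the four cell counts; a short Python sketch using the slide's own definitions of false positive and false negative (computed within the predicted columns):

```python
# Cell counts from Table 4 (cutoff = 0.50)
tp, fn = 3, 56    # observed Yes: predicted Yes, predicted No
fp, tn = 1, 129   # observed No:  predicted Yes, predicted No

sensitivity    = tp / (tp + fn)                   # 3/59    = 5.1%
specificity    = tn / (fp + tn)                   # 129/130 = 99.2%
false_positive = fp / (fp + tp)                   # 1/4     = 25.0%
false_negative = fn / (fn + tn)                   # 56/185  = 30.3%
overall        = (tp + tn) / (tp + fn + fp + tn)  # 132/189 = 69.8%
```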

[Figure 2. Predicted probability of being referred for remedial reading instructions versus reading scores; plotting symbols A = 1 observation, B = 2 observations, C = 3 observations, etc. Separate point clouds are shown for boys and girls, with estimated probability (roughly 0.0 to 0.6) on the vertical axis and reading score (roughly 40 to 140) on the horizontal axis.]

Reporting and Interpreting Logistic Regression Results
In addition to Tables 3 and 4 and Figure 2, it is helpful to demonstrate the relationship between the predicted outcome and certain characteristics found in observations.

Table 5. Predicted Probability of Being Referred for Remedial Reading Instructions for 8 Children

Case      READING          GENDER          Intercept    Predicted probability of     Actual outcome
number    (β = −0.026)     (β = 0.648)     (= 0.534)    being referred for           (1 = Yes, 0 = No)
                                                        remedial reading
                                                        instructions
           52.5             Boy             0.5340            0.4530
           85               Boy             0.5340            0.2618
           75               Girl            0.5340            0.1941
           92               Girl            0.5340            0.1250
           60               Boy             0.5340            0.4051                      --
           60               Girl            0.5340            0.2627                      --
          100               Boy             0.5340            0.1934                      --
          100               Girl            0.5340            0.1115                      --
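The hypothetical-profile rows of Table 5 can be reproduced from the coefficients reported in Table 3; a sketch (small differences from the table reflect rounding of the coefficients to three decimals):

```python
import math

# Rounded coefficients from Table 3 (GENDER coded 1 = boy, 0 = girl)
INTERCEPT, B_READING, B_GENDER = 0.534, -0.026, 0.648

def referral_probability(reading, boy):
    logit = INTERCEPT + B_READING * reading + B_GENDER * (1 if boy else 0)
    return math.exp(logit) / (1 + math.exp(logit))

print(round(referral_probability(60, boy=True), 3))   # about 0.407 (Table 5: 0.4051)
print(round(referral_probability(60, boy=False), 3))  # about 0.264 (Table 5: 0.2627)
```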

Interpretation of Regression Coefficients
For each one-point increase on the reading score, the odds of being recommended for remedial reading programs decrease from one to 0.974 [= e^(−0.026), Table 3].
If the increase on the reading score is 10 points, the odds decrease from one to 0.771 [= e^(10 × (−0.026))].
However, when the READING score was held constant, boys were predicted to be referred for remedial reading instruction with greater probability than girls.
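The same arithmetic in a two-line check:

```python
import math

b_reading = -0.026                         # READING coefficient from Table 3
print(round(math.exp(b_reading), 3))       # 0.974: odds multiplier per 1-point increase
print(round(math.exp(10 * b_reading), 3))  # 0.771: odds multiplier per 10-point increase
```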

Guidelines and Recommendations


What Tables, Figures, or Charts Should be
Included to Comprehensively Assess the
Result?
the overall evaluation of the logistic
model
goodness-of-fit statistics
statistical tests of individual predictors
an assessment of the predicted
probabilities

What Assumptions Should Be Verified?
Logistic regression does not assume that the predictor variables are distributed as a multivariate normal distribution with an equal covariance matrix.
It assumes that the binomial distribution describes the distribution of the errors, which equal the actual Y minus the predicted Y; the binomial is also the assumed distribution for the conditional mean of the dichotomous outcome.

What Assumptions Should Be Verified?
The binomial assumption may be tested by the normal z test (Siegel & Castellan, 1988), or taken to be robust as long as the sample is random and, therefore, the observations are independent of each other.

Recommended Reporting Formats of Logistic Regression
In terms of reporting logistic regression results, we recommend presenting:
a complete logistic regression model, including the Y-intercept
odds ratios
a table such as Table 5 to illustrate the relationship between outcomes and observations with profiles of certain characteristics

Recommended Minimum
Observation to Predictor Ratio
The literature has not offered specific rules
applicable to logistic regression.
Several authors of multivariate statistics
recommended a minimum ratio of 10 to 1,
with a minimum sample size of 100 or 50
plus a variable number that is a function of
the number of predictors.

Preview of Next Session
