An Introduction to Logistic
Regression Analysis and
Reporting
Many research problems in education call for
the analysis and prediction of a dichotomous
outcome, for example, whether a child should
be classified as learning disabled (LD), or
whether a teenager is prone to engage in risky
behaviors.
Traditionally, these research questions were
addressed by either ordinary least squares
(OLS) regression or linear discriminant
function analysis.
Both techniques were subsequently
found to be less than ideal for handling
dichotomous outcomes because of their strict
statistical assumptions, i.e., linearity,
normality, and continuity for OLS
regression, and multivariate normality
with equal variances and covariances for
discriminant analysis.
Remedial reading                 Gender
instruction recommended      Boys    Girls    Totals
Yes (coded as 1)               73       15        88
No (coded as 0)                23       11        34
Totals                         96       26       122
\[
\text{odds ratio} = \frac{73/23}{15/11} = \frac{3.17}{1.36} = 2.33
\]
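The odds ratio above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the counts from the gender-by-recommendation table; the variable names are illustrative and not part of the source.

```python
# Counts from the gender-by-recommendation table above.
boys_yes, boys_no = 73, 23
girls_yes, girls_no = 15, 11

odds_boys = boys_yes / boys_no      # odds of a "yes" recommendation for boys
odds_girls = girls_yes / girls_no   # odds of a "yes" recommendation for girls
odds_ratio = odds_boys / odds_girls

print(f"odds (boys)  = {odds_boys:.2f}")
print(f"odds (girls) = {odds_girls:.2f}")
print(f"odds ratio   = {odds_ratio:.2f}")
```

An odds ratio above 1 here means that the odds of being recommended for remedial reading instruction are higher for boys than for girls in this sample.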
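\[
\operatorname{logit}(Y) = \ln\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta X,
\qquad
\pi = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}
\]
\[
\operatorname{logit}(Y) = \ln\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 .
\]
Therefore,
\[
\pi = P(Y = \text{outcome of interest} \mid X_1 = x_1, X_2 = x_2)
    = \frac{e^{\alpha + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\alpha + \beta_1 x_1 + \beta_2 x_2}}
\]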
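The relationship between the logit and the predicted probability can be illustrated in code. This is a minimal sketch, not part of the source: the function names are hypothetical, and the coefficients in the usage lines are made-up values chosen only to exercise the functions.

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Probability corresponding to a log-odds value z."""
    return math.exp(z) / (1.0 + math.exp(z))

def predicted_probability(alpha, betas, xs):
    """Apply the logistic model: inverse logit of alpha + sum(beta_i * x_i)."""
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    return inverse_logit(z)

# Hypothetical coefficients: the linear predictor is 0 + 1*2 - 1*2 = 0.
p = predicted_probability(alpha=0.0, betas=[1.0, -1.0], xs=[2.0, 2.0])
print(p)         # log odds of 0 correspond to even odds
print(logit(p))  # round trip back to the log odds
```

The two functions are inverses of each other, which is why a model that is linear on the logit scale always yields probabilities strictly between 0 and 1.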
Illustration of Logistic
Regression
Analysis and Reporting
The hypothetical data consisted of 189 inner-city
school children's reading scores and gender.
Of these children, 59 (31.22%) were recommended
for remedial reading classes while 130 (68.78%)
were not. A legitimate research hypothesis posed
to the data was: the likelihood that an inner city
school child is recommended for remedial reading
instruction is related to both his/her reading score
and gender.
Remedial reading            Total                        Reading scores
instruction                 sample     Boys    Girls
recommended?                (N)        (n1)    (n2)      Mean      SD
Yes                          59         36      23       61.07     13.28
No                          130         57      73       66.65     15.86
Summary                     189         93      96       64.91     15.29
Overall Model Evaluation
Tests                                     χ²       df     p       OK?
--                                      10.019     --    0.007
Score test                               9.518     --    0.009

Goodness-of-fit Test
Test                                      χ²       df     p       OK?
Hosmer-Lemeshow goodness-of-fit test     9.286     --    0.319
R2-type Indices
Cox and Snell R squared = .052
Nagelkerke (Max rescaled) R squared = .073
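Both indices can be reproduced from quantities reported in this section. The sketch below rests on one assumption that the extracted text does not state: that the statistic 10.019 is the likelihood ratio chi-square of the fitted model against the intercept-only model. The null log-likelihood is computed from the 59/130 outcome split; the variable names are illustrative.

```python
import math

n = 189                 # total sample size
n_yes, n_no = 59, 130   # recommended / not recommended
lr_chi_square = 10.019  # ASSUMED to be the likelihood ratio chi-square

# Log-likelihood of the intercept-only (null) model.
ll_null = n_yes * math.log(n_yes / n) + n_no * math.log(n_no / n)

# Cox and Snell: 1 - (L0 / L1)^(2/n), which equals 1 - exp(-LR / n).
cox_snell = 1.0 - math.exp(-lr_chi_square / n)

# Nagelkerke rescales by the maximum value Cox and Snell can attain.
max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
nagelkerke = cox_snell / max_cox_snell

print(f"Cox and Snell R squared = {cox_snell:.3f}")
print(f"Nagelkerke R squared    = {nagelkerke:.3f}")
```

Under that assumption the results round to the two values reported above, which also explains why the Nagelkerke index is always at least as large as the Cox and Snell index.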
Predictor                     Beta      SE      Wald's χ²     p       e^Beta
                                                (df=1)                (odds ratio)
CONSTANT                      0.534    0.811     0.434       .510    (not applicable)
READING                      -0.026    0.012     4.565       .033     0.974
GENDER (1=boys, 0=girls)      0.648    0.325     3.976       .046     1.911
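The Wald statistics in this table can be approximated from the estimates and their standard errors. The sketch below assumes the usual form, Wald χ² = (Beta / SE)² with df = 1, and takes the READING estimate as negative because its odds ratio (0.974) is below 1. The function name is hypothetical. Because the table reports rounded values, the recomputed READING statistic differs slightly from the printed one.

```python
import math

# (estimate, standard error) as reported in the coefficient table.
# The READING estimate is ASSUMED negative, consistent with e^Beta = 0.974.
coefficients = {
    "CONSTANT": (0.534, 0.811),
    "READING": (-0.026, 0.012),
    "GENDER": (0.648, 0.325),
}

def wald_test(estimate, std_error):
    """Return the Wald chi-square (df = 1) and its p-value."""
    wald = (estimate / std_error) ** 2
    # For df = 1, P(chi-square > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

results = {name: wald_test(b, se) for name, (b, se) in coefficients.items()}
for name, (wald, p_value) in results.items():
    print(f"{name:9s} Wald = {wald:.3f}, p = {p_value:.3f}")
```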
                          Predicted
Observed              Yes        No       Percentage Correct
Yes                     3        56          5.1%
No                      1       129         99.2%
Overall % correct                           69.8%

Note.
Sensitivity=3/(3+56)=5.1%
Specificity=129/(1+129)=99.2%
False Positive=1/(1+3)=25.0%
False Negative=56/(56+129)=30.3%
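The rates in the note follow directly from the four classification counts. This is a minimal sketch using the counts that appear in the note's own formulas; the variable names are illustrative. The false positive and false negative rates are computed relative to the predicted totals, as the note defines them.

```python
# Classification counts implied by the note to the table
# (rows: observed outcome, columns: predicted outcome).
true_pos, false_neg = 3, 56    # observed Yes: predicted Yes / predicted No
false_pos, true_neg = 1, 129   # observed No:  predicted Yes / predicted No

total = true_pos + false_neg + false_pos + true_neg

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (false_pos + true_neg)
# The source defines these two rates relative to the predicted totals.
false_positive = false_pos / (false_pos + true_pos)
false_negative = false_neg / (false_neg + true_neg)
overall_correct = (true_pos + true_neg) / total

rates = {
    "Sensitivity": sensitivity,
    "Specificity": specificity,
    "False positive": false_positive,
    "False negative": false_negative,
    "Overall correct": overall_correct,
}
for label, value in rates.items():
    print(f"{label:16s}{100 * value:.1f}%")
```

The very low sensitivity alongside the high specificity shows that the model, at this cutoff, classifies almost every child as "not recommended".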
Figure 2. Predicted probability of being referred for remedial reading
instruction versus reading scores, plotted separately for boys and girls
(vertical axis: estimated probability, 0.0 to 0.6; horizontal axis: reading
score, 40 to 140). Plotting symbols: A=1 observation, B=2 observations,
C=3 observations, etc.
READING            GENDER            Intercept     Predicted probability     Actual outcome
(Beta = -0.026)    (Beta = 0.648)    (= 0.534)     of being referred for     (1=Yes, 0=No)
                                                   remedial reading
                                                   instruction
52.5               Boy               0.5340        0.4530
85                 Boy               0.5340        0.2618
75                 Girl              0.5340        0.1941
92                 Girl              0.5340        0.1250
60                 Boy               0.5340        0.4051                    --
60                 Girl              0.5340        0.2627                    --
100                Boy               0.5340        0.1934                    --
100                Girl              0.5340        0.1115                    --
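The predicted probabilities above can be approximated by plugging the reported coefficients into the logistic model. The sketch below assumes that the READING coefficient is negative (its odds ratio, 0.974, is below 1) and that GENDER is coded 1 for boys and 0 for girls. The function name is hypothetical. Because the coefficients are rounded to three decimals, the results agree with the tabulated values only to about two decimal places.

```python
import math

INTERCEPT = 0.534
BETA_READING = -0.026   # sign ASSUMED negative, consistent with e^Beta = 0.974
BETA_GENDER = 0.648     # GENDER coded 1 = boy, 0 = girl

def predicted_probability(reading, is_boy):
    """Predicted probability of a remedial reading recommendation."""
    z = INTERCEPT + BETA_READING * reading + BETA_GENDER * (1 if is_boy else 0)
    return 1.0 / (1.0 + math.exp(-z))

# A few of the tabulated cases: (reading score, is_boy).
cases = [(52.5, True), (85, True), (75, False),
         (60, True), (60, False), (100, True), (100, False)]

for reading, is_boy in cases:
    label = "Boy" if is_boy else "Girl"
    print(f"{reading:6.1f}  {label:4s}  {predicted_probability(reading, is_boy):.4f}")
```

At any fixed reading score the predicted probability is higher for boys than for girls, and it falls as the reading score rises.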
Interpretation of Regression
Coefficients
For each one-point increase in the reading score, the
odds of being recommended for remedial
reading programs are multiplied by 0.974
(= e^(-0.026), Table 3), i.e., they decrease.
If the increase in the reading score is 10 points,
the odds are multiplied by 0.771
[= e^(10 × (-0.026))].
However, when the READING score was held
constant, boys were predicted to be referred
for remedial reading instruction with greater
probability than girls.
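The odds multipliers quoted in this interpretation come from exponentiating the coefficients. The sketch below assumes the READING coefficient is negative, as the reported odds ratio of 0.974 implies; the function name is hypothetical.

```python
import math

BETA_READING = -0.026  # sign ASSUMED negative, consistent with e^Beta = 0.974
BETA_GENDER = 0.648    # GENDER coded 1 = boy, 0 = girl

def odds_multiplier(beta, change=1.0):
    """Factor by which the odds change when a predictor changes by `change`."""
    return math.exp(beta * change)

one_point = odds_multiplier(BETA_READING)
ten_points = odds_multiplier(BETA_READING, change=10)
boys_vs_girls = odds_multiplier(BETA_GENDER)

print(f"1-point increase in READING:  odds x {one_point:.3f}")
print(f"10-point increase in READING: odds x {ten_points:.3f}")
print(f"Boys versus girls:            odds x {boys_vs_girls:.3f}")
```

Note that the 10-point multiplier is the 1-point multiplier raised to the tenth power, not ten times the 1-point change, because the model is linear on the log-odds scale.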
Recommended Reporting
Formats of
Logistic Regression
In terms of reporting logistic regression
results, we recommend presenting
Recommended Minimum
Observation to Predictor Ratio
The literature has not offered specific rules
applicable to logistic regression.
Several authors of multivariate statistics
recommended a minimum ratio of 10 to 1,
with a minimum sample size of 100 or 50
plus a variable number that is a function of
the number of predictors.