Chapter 8: Logistic Regression
Aims
When and Why do we Use Logistic
Regression?
Binary logistic regression: predicting membership of two categories.
Multinomial logistic regression: predicting membership of more than two categories.
Outcome
We predict the probability of the outcome occurring:
P(Y) = 1 / (1 + e^-(b0 + b1X1))
b0 and b1 can be thought of in much the same way as in multiple regression.
Note that the normal regression equation forms part of the logistic regression equation.
Outcome
We still predict the probability of the outcome occurring:
P(Y) = 1 / (1 + e^-(b0 + b1X1 + b2X2 + ... + bnXn))
Differences
Note that the multiple regression equation forms part of the logistic regression equation; this part of the equation expands to accommodate additional predictors.
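The equation above can be sketched in Python. This is a minimal illustration, not SPSS output; the function name and the example coefficients are my own.

```python
import math

def predicted_probability(b0, bs, xs):
    """P(Y) = 1 / (1 + e^-(b0 + b1*X1 + ... + bn*Xn)).

    The linear part in the exponent is the familiar multiple
    regression equation; the logistic transform maps it onto (0, 1).
    """
    linear = b0 + sum(b * x for b, x in zip(bs, xs))
    return 1.0 / (1.0 + math.exp(-linear))

# When the linear part is 0, the predicted probability is exactly 0.5.
print(predicted_probability(0.0, [0.0, 0.0], [3.0, 7.0]))  # 0.5
```

Because of the transform, the prediction is always a probability between 0 and 1, no matter how large the linear part gets.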
The log-likelihood statistic:
LL = sum from i = 1 to N of [Yi ln(P(Yi)) + (1 - Yi) ln(1 - P(Yi))]
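The log-likelihood formula can be computed directly. A minimal sketch, assuming observed outcomes y (0 or 1) and the model's predicted probabilities p:

```python
import math

def log_likelihood(y, p):
    """LL = sum over i of [ Y_i*ln(P(Y_i)) + (1 - Y_i)*ln(1 - P(Y_i)) ]."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# A model predicting 0.9 for an observed 1 fits better (LL closer to 0)
# than one predicting 0.5 for the same case.
print(log_likelihood([1], [0.9]))  # about -0.105
print(log_likelihood([1], [0.5]))  # about -0.693
```

Larger (less negative) values indicate a better-fitting model, which is why improvement in fit is assessed by comparing -2LL between models.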
Assessing Changes in Models
chi-square (change) = (-2LL(baseline)) - (-2LL(new))
df = k(new) - k(baseline)
Wald = b / SE(b)
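The two statistics above can be sketched as simple functions (names and example values are mine, for illustration only):

```python
def model_chi_square(ll_baseline, ll_new):
    """chi-square (change) = (-2*LL(baseline)) - (-2*LL(new))."""
    return (-2.0 * ll_baseline) - (-2.0 * ll_new)

def wald(b, se_b):
    """Wald statistic: the coefficient divided by its standard error."""
    return b / se_b

# A baseline LL of -80 improving to -70 gives a chi-square of 20,
# tested against df = k(new) - k(baseline).
print(model_chi_square(-80.0, -70.0))  # 20.0
```

A larger chi-square (relative to its df) means the new predictors significantly improve the model's fit.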
Exp(b) = (odds after a unit change in the predictor) / (odds before a unit change in the predictor)
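A minimal sketch of odds and the odds ratio (function names are mine):

```python
import math

def odds(p):
    """Odds = P(event occurring) / P(event not occurring)."""
    return p / (1.0 - p)

def exp_b(b):
    """Exp(b): the ratio of the odds after a one-unit increase in the
    predictor to the odds before it, i.e. the odds ratio."""
    return math.exp(b)

# b = 0 leaves the odds unchanged: Exp(0) = 1.
print(exp_b(0.0))  # 1.0
# b = ln(2) means each unit increase in the predictor doubles the odds.
```

Values of Exp(b) above 1 mean the odds of the outcome rise as the predictor rises; values below 1 mean they fall.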
Methods of Regression
Forced Entry: All variables entered
simultaneously.
Hierarchical: Variables entered in
blocks.
Blocks should be based on past research or the theory being tested. A good method.
Unique Problems
Incomplete Information
Complete Separation
Overdispersion
Continuous variables
Will your sample include an 80-year-old, highly anxious, Buddhist, left-handed cricket player?
Complete Separation
When the outcome variable can be
perfectly predicted.
[Figure: two panels plotting Probability of Outcome (0.0 to 1.0) against Weight (KG). In the right panel the predicted probability jumps straight from 0 to 1, showing complete separation.]
Overdispersion
Overdispersion is where the
variance is larger than expected
from the model.
This can be caused by violating the
assumption of independence.
This problem makes the standard
errors too small!
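One common way to detect and correct overdispersion is the dispersion parameter, chi-square divided by its degrees of freedom, with standard errors rescaled by its square root. A hedged sketch (function names and example values are mine):

```python
import math

def dispersion(chi_square, df):
    """phi = chi-square / df; values well above 1 suggest overdispersion."""
    return chi_square / df

def corrected_se(se, phi):
    """Inflate a standard error by sqrt(phi) to compensate for
    overdispersion (otherwise the SEs are too small)."""
    return se * math.sqrt(phi)

print(dispersion(40.0, 20.0))   # 2.0
print(corrected_se(0.5, 4.0))   # 1.0
```

Widening the standard errors this way makes significance tests appropriately conservative when the independence assumption is in doubt.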
An Example
Predictors of a treatment intervention.
Participants
113 adults with a medical problem
Outcome:
Cured (1) or not cured (0).
Predictors:
Intervention: intervention or no treatment.
Duration: the number of days before
treatment that the patient had the problem.
Click "Categorical".
With a categorical predictor with more than two categories, you should use the highest number to code your control category and then select "Last" for your indicator contrast. In this data set 1 = cured and 0 = not cured (our control category), so we select "First" as the control (see p. 279).
Highlight both predictors, then click the >a*b> button to enter their interaction.
The Hosmer-Lemeshow test assesses how well the model fits the data.
Output: Step 1
We can say that the odds of a patient who is treated being cured are 3.41
times higher than those of a patient who is not treated, with a 95% CI of
1.561 to 7.480.
The important thing about this confidence interval is that it doesn't cross 1
(both values are greater than 1). This is important because values greater
than 1 mean that as the predictor variable(s) increase, so do the odds of (in
this case) being cured. Values less than 1 mean the opposite: as the
predictor increases, the odds of being cured decrease.
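The confidence interval for an odds ratio is built by adding and subtracting 1.96 standard errors on the log-odds scale, then exponentiating. A sketch using a coefficient and standard error inferred from the reported CI (roughly b = 1.229, SE = 0.40; these exact values are my reconstruction, not quoted from the output):

```python
import math

def odds_ratio_ci(b, se, z=1.96):
    """95% CI for Exp(b): (exp(b - z*SE), exp(b + z*SE))."""
    return math.exp(b - z * se), math.exp(b + z * se)

lo, hi = odds_ratio_ci(1.229, 0.40)
print(round(lo, 2), round(hi, 2))  # roughly 1.56 and 7.49
crosses_one = lo < 1.0 < hi
print(crosses_one)  # False: the interval excludes 1
```

Because the interval excludes 1, we can be reasonably confident the treatment genuinely raises the odds of being cured.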
Classification Plot
The .5 line represents a coin toss: you have a 50/50 chance. Further away from .5 is better.
If the model fits the data, then the histogram should show all of the cases
for which the event has occurred on the right-hand side (C), and all the
cases for which the event hasn't occurred on the left-hand side (N).
This model is better at predicting cured cases than non-cured cases, as the
non-cured cases are closer to the .5 line.
Use the Case Summaries function to create a table of the first 15 cases
showing the values of Cured, Intervention, Duration, the predicted
probability (PRE_1) and the predicted group membership (PGR_1).
Case Summaries
Summary
The overall fit of the final model is shown by the -2 log-likelihood statistic.
If the significance of the chi-square statistic is less than
.05, then the model is a significant fit of the data.
Multinomial logistic
regression
Logistic regression to predict membership of more than
two categories.
It (basically) works in the same way as binary logistic
regression.
The analysis breaks the outcome variable down into a
series of comparisons between two categories.
E.g., if you have three outcome categories (A, B and C),
then the analysis will consist of two comparisons that you
choose:
Compare everything against your first category (e.g. A vs. B and
A vs. C),
Or your last category (e.g. A vs. C and B vs. C),
Or a custom category (e.g. B vs. A and B vs. C).
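The breakdown into pairwise comparisons can be sketched in a few lines (the category labels follow the example above; the code is illustrative, not how SPSS computes the model):

```python
# Three outcome categories with the first as the reference: multinomial
# logistic regression reduces to two binary comparisons against it.
categories = ["A", "B", "C"]
reference = categories[0]
comparisons = [(reference, other) for other in categories if other != reference]
print(comparisons)  # [('A', 'B'), ('A', 'C')]
```

With k categories there are always k - 1 comparisons, each with its own set of coefficients against the chosen reference.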
Predictors:
The content of the chat-up lines were rated for:
Funniness (0 = not funny at all, 10 = the funniest thing that I
have ever heard)
Sexuality (0 = no sexual content at all, 10 = very sexually
direct)
Moral values (0 = the chat-up line does not reflect good
characteristics, 10 = the chat-up line is very indicative of
good characteristics).
Gender of recipient
Output
Interpretation