
Logistic Regression

Chapter 8

Aims
When and Why do we Use Logistic
Regression?
Binary
Multinomial

Theory Behind Logistic Regression


Assessing the Model
Assessing predictors
Things that can go Wrong

Interpreting Logistic Regression


Slide 2

When And Why


To predict an outcome variable that is
categorical from one or more
categorical or continuous predictor
variables.
Used because having a categorical
outcome variable violates the
assumption of linearity in normal
regression.
Slide 3

With One Predictor


$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_{1i})}}$$

Outcome
We predict the probability of the
outcome occurring

b0 and b1
Can be thought of in much the same way
as multiple regression
Note the normal regression equation
forms part of the logistic regression
equation
Slide 4
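A quick numeric sketch (coefficients hypothetical, not taken from the chapter): with b0 = −2 and b1 = 0.5, a case scoring X1 = 6 has

$$P(Y) = \frac{1}{1 + e^{-(-2 + 0.5 \times 6)}} = \frac{1}{1 + e^{-1}} \approx .73$$

so the model predicts a 73% chance of the outcome occurring for that case.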

With Several Predictors


$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_n X_{ni})}}$$

Outcome
We still predict the probability of the
outcome occurring

Differences
Note the multiple regression equation
forms part of the logistic regression
equation
This part of the equation expands to
accommodate additional predictors
Slide 5
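The chapter runs everything in SPSS; as a minimal cross-language sketch (simulated data, all names hypothetical, assuming the numpy and statsmodels packages), the same kind of model can be fitted in Python:

```python
# Minimal sketch: binary logistic regression with several predictors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))                      # two continuous predictors
true_logit = -0.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1]  # b0 + b1*X1 + b2*X2
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit))) # binary outcome (0/1)

model = sm.Logit(y, sm.add_constant(X)).fit()      # maximum likelihood fit
print(model.summary())                             # b, SE, Wald z, 95% CIs
print(np.exp(model.params))                        # Exp(b): the odds ratios
```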

Assessing the Model


$$\text{log-likelihood} = \sum_{i=1}^{N}\Big[Y_i \ln\big(P(Y_i)\big) + (1 - Y_i)\ln\big(1 - P(Y_i)\big)\Big]$$

The Log-likelihood statistic


Analogous to the residual sum of
squares in multiple regression
It is an indicator of how much
unexplained information there is after
the model has been fitted.
Large values indicate poorly fitting
statistical models.
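A minimal sketch of the formula above (hand-picked outcomes and probabilities, purely illustrative):

```python
# Log-likelihood of binary outcomes y given predicted probabilities p.
import numpy as np

def log_likelihood(y, p):
    # sum over cases of Y*ln(P(Y)) + (1 - Y)*ln(1 - P(Y))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1, 0])        # observed outcomes
p = np.array([.8, .3, .6, .9, .2])   # model's predicted probabilities
print(log_likelihood(y, p))          # always <= 0; closer to 0 = better fit
```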

Assessing Changes in
Models

It's possible to calculate a log-likelihood for different models and to
compare these models by looking at
the difference between their log-likelihoods.
$$\chi^2 = 2\big[LL(\text{New}) - LL(\text{Baseline})\big]$$

$$df = k_{\text{New}} - k_{\text{Baseline}}$$
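A sketch of the comparison in code (the log-likelihoods and parameter counts below are made up):

```python
# Likelihood-ratio test between two nested models.
from scipy import stats

ll_baseline, ll_new = -84.0, -74.0     # hypothetical log-likelihoods
k_baseline, k_new = 1, 3               # parameters in each model

chi_sq = 2 * (ll_new - ll_baseline)                    # = 20.0
p_value = stats.chi2.sf(chi_sq, df=k_new - k_baseline)
print(chi_sq, p_value)   # small p: the new model fits significantly better
```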

Assessing Predictors: The Wald Statistic

$$\text{Wald} = \frac{b}{SE_b}$$

Similar to the t-statistic in regression.
Tests the null hypothesis that b = 0.
Is biased when b is large.
Better to look at likelihood-ratio
statistics.
Slide 8
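A worked sketch with made-up numbers: for b = 1.23 and SE_b = 0.40,

$$\text{Wald} = \frac{1.23}{0.40} \approx 3.08$$

Note that SPSS reports the squared version, (b/SE_b)² ≈ 9.46, which is evaluated against a chi-square distribution with 1 df.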

Assessing Predictors: The Odds Ratio or Exp(b)

$$\text{Exp}(b) = \frac{\text{odds after a unit change in the predictor}}{\text{odds before a unit change in the predictor}}$$

Indicates the change in odds


resulting from a unit change in the
predictor.
OR > 1: as the predictor increases, the
probability of the outcome occurring increases.
OR < 1: as the predictor increases, the
probability of the outcome occurring decreases.
Slide 9
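A worked sketch with made-up probabilities: if a unit increase in the predictor moves P(Y) from .50 (odds = .50/.50 = 1.0) to .75 (odds = .75/.25 = 3.0), then

$$\text{Exp}(b) = \frac{3.0}{1.0} = 3.0$$

i.e. the odds of the outcome occurring triple with each unit increase in the predictor.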

Methods of Regression
Forced Entry: All variables entered
simultaneously.
Hierarchical: Variables entered in
blocks.
Blocks should be based on past research, or on the
theory being tested. A good method.

Stepwise: Variables entered on the basis


of statistical criteria (i.e. relative
contribution to predicting outcome).
Should be used only for exploratory
analysis.
Slide 10

Things That Can go Wrong


Assumptions from Linear Regression:
Linearity
Independence of Errors
Multicollinearity

Unique Problems
Incomplete Information
Complete Separation
Overdispersion

Incomplete Information From the Predictors
Categorical Predictors:
Predicting cancer from smoking and eating
tomatoes.
We don't know what happens when non-smokers eat
tomatoes because we have no data in this cell of the
design.

Continuous variables
Will your sample include an 80-year-old, highly anxious,
Buddhist, left-handed cricket player?

Complete Separation
When the outcome variable can be
perfectly predicted.

E.g. predicting whether someone is a burglar,
your teenage son, or your cat based on
weight.
Weight is a perfect predictor of cat/burglar
unless you have a very fat cat indeed!

[Figure: two panels plotting probability of outcome (0–1.0) against weight (20–90 kg); in one panel the two groups do not overlap, so the outcome is perfectly separated.]
Overdispersion
Overdispersion is where the
variance is larger than expected
from the model.
This can be caused by violating the
assumption of independence.
This problem makes the standard
errors too small!
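One common rough check (a general rule of thumb, not specific to this chapter's example) is the dispersion parameter, the model chi-square divided by its degrees of freedom:

$$\hat{\phi} = \frac{\chi^2}{df}$$

Values noticeably greater than 1 suggest overdispersion; the standard errors can then be rescaled by multiplying them by $\sqrt{\hat{\phi}}$.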

An Example
Predictors of a treatment intervention.
Participants
113 adults with a medical problem

Outcome:
Cured (1) or not cured (0).

Predictors:
Intervention: intervention or no treatment.
Duration: the number of days before
treatment that the patient had the problem.

Slide 15

Identify any categorical covariates (predictors).

Click Categorical.

Click First, then Change (see p. 279).

With a categorical predictor that has more than two categories, you could use
the highest number to code your control category and then select Last for your
indicator contrast. In this data set 1 is cured and 0 is not cured (our control
category), therefore we select First as the control (see p. 279).

Enter Interaction Term(s)


You can specify main
effects and
interactions.

Highlight both
predictors,
then click the
>a*b>

If you don't have previous
literature, choose
Stepwise Forward: LR
(LR is Likelihood Ratio).

Save Settings for Logistic Regression

Option Settings for Logistic Regression

Hosmer-Lemeshow
assesses how well the
model fits the data.

Look for outliers: ±2 SD

Request the 95% CI for the odds ratio (the odds of Y occurring).

Output for Step 0, Constant Only

Initially the model will always predict
the outcome category with the highest
frequency; in this case that is
cured.
Large values for -2 Log Likelihood (-2 LL)
indicate a poorly fitting model. The -2 LL will
get smaller as the fit improves.

Example of How to Write the Logistic Regression
Equation from Coefficients

Using the constant only, the model above predicts a 57%
probability of Y occurring.
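That 57% comes straight out of the equation. Working backwards from the reported probability alone (a sketch, not copied from the output):

$$\text{odds} = \frac{.57}{.43} \approx 1.33, \qquad b_0 = \ln(1.33) \approx 0.28, \qquad P(Y) = \frac{e^{0.28}}{1 + e^{0.28}} \approx .57$$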

Output: Step 1

Equation for Step 1

See p. 288 for an example of using the
equation to compute the odds ratio.

We can say that the odds of a patient who is treated being cured are 3.41
times higher than those of a patient who is not treated, with a 95% CI of
1.561 to 7.480.

The important thing about this confidence interval is that it doesn't cross 1
(both values are greater than 1). This is important because values greater
than 1 mean that as the predictor variable(s) increase, so do the odds of (in
this case) being cured. Values less than 1 mean the opposite: as the
predictor increases, the odds of being cured decrease.
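In general, the odds ratio and its confidence interval come directly from the coefficient and its standard error (working backwards from the output here, b = ln(3.41) ≈ 1.23):

$$\text{OR} = e^{b}, \qquad 95\%\ \text{CI} = e^{\,b \pm 1.96 \times SE_b}$$

Because the interval is computed on b and then exponentiated, it crosses 1 exactly when the interval for b crosses 0.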

Output: Step 1

Removing Intervention from the model would have a significant effect on
the predictive ability of the model; in other words, it would be very bad to
remove it.

Classification Plot
The .5 line represents a coin toss: you have a 50/50 chance.
Further away from .5 is better.

If the model fits the data, then the histogram should show all of the cases
for which the event has occurred on the right-hand side (C), and all the
cases for which the event hasn't occurred on the left-hand side (N).
This model is better at predicting cured cases than non-cured
cases, as the non-cured cases are closer to the .5 line.

Choose Analyze > Reports > Case Summaries

Use the Case Summaries function to create a table of the first 15 cases
showing the values of Cured, Intervention, Duration, the predicted
probability (PRE_1) and the predicted group membership (PGR_1).
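SPSS saves these as new columns in the data file; in the Python sketch from the several-predictors slide (assuming the model, X and y objects defined there), the analogous quantities would be:

```python
# Analogues of SPSS's saved PRE_1 (predicted probability) and
# PGR_1 (predicted group), continuing the earlier statsmodels sketch.
import statsmodels.api as sm   # model, X, y as defined earlier

pre = model.predict(sm.add_constant(X))  # predicted P(Y = 1) per case
pgr = (pre >= 0.5).astype(int)           # classify at the .5 cut-off
for row in list(zip(y, pre.round(3), pgr))[:15]:
    print(row)                           # first 15 cases, as in the table
```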

Case Summaries

Summary
The overall fit of the final model is shown by the −2
log-likelihood statistic.
If the significance of the chi-square statistic is less than
.05, then the model is a significant fit of the data.

Check the table labelled Variables in the equation to


see which variables significantly predict the
outcome.
Use the odds ratio, Exp(B), for interpretation.
OR > 1, then as the predictor increases, the odds of
the outcome occurring increase.
OR < 1, then as the predictor increases, the odds of
the outcome occurring decrease.
The confidence interval of the OR should not cross 1!

Check the table labelled Variables not in the equation


to see which variables did not significantly predict
the outcome.

Reporting the Analysis

Multinomial logistic
regression
Logistic regression to predict membership of more than
two categories.
It (basically) works in the same way as binary logistic
regression.
The analysis breaks the outcome variable down into a
series of comparisons between two categories.
E.g., if you have three outcome categories (A, B and C),
then the analysis will consist of two comparisons that you
choose:
Compare everything against your first category (e.g. A vs. B and
A vs. C),
Or your last category (e.g. A vs. C and B vs. C),
Or a custom category (e.g. B vs. A and B vs. C).

The important parts of the analysis and output are much


the same as we have just seen for binary logistic
regression
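A minimal multinomial sketch in Python (simulated data; statsmodels' MNLogit takes the first outcome category as the reference, so here categories 1 and 2 are each compared against 0):

```python
# Multinomial logistic regression: a 3-category outcome is broken down
# into two binary-style comparisons against the reference category.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 2))        # two hypothetical predictors
y = rng.integers(0, 3, size=300)     # outcome categories 0, 1, 2

fit = sm.MNLogit(y, sm.add_constant(X)).fit()
print(fit.summary())   # one block of coefficients per comparison vs. 0
```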

I may not be Fred Flintstone

How successful are chat-up lines?

The chat-up lines used by 348 men and 672 women


in a night-club were recorded.
Outcome:
Whether the chat-up line resulted in one of the
following three events:
The person got no response or the recipient walked away,
The person obtained the recipient's phone number,
The person left the night-club with the recipient.

Predictors:
The content of the chat-up lines was rated for:
Funniness (0 = not funny at all, 10 = the funniest thing that I
have ever heard)
Sexuality (0 = no sexual content at all, 10 = very sexually
direct)
Moral values (0 = the chat-up line does not reflect good
characteristics, 10 = the chat-up line is very indicative of
good characteristics).

Gender of recipient

Output

[SPSS output tables for the multinomial model, shown across four slides.]

Interpretation

Good_Mate: Whether the chat-up line showed signs of good
moral fibre significantly predicted whether you got a phone
number or no response/walked away, b = 0.13, Wald χ²(1) = 6.02,
p < .05.
Funny: Whether the chat-up line was funny did not significantly
predict whether you got a phone number or no response, b =
0.14, Wald χ²(1) = 1.60, p > .05.
Gender: The gender of the person being chatted up significantly
predicted whether they gave out their phone number or gave no
response, b = 1.65, Wald χ²(1) = 4.27, p < .05.
Sex: The sexual content of the chat-up line significantly predicted
whether you got a phone number or no response/walked away, b
= 0.28, Wald χ²(1) = 9.59, p < .01.
Funny × Gender: The success of funny chat-up lines depended on
whether they were delivered to a man or a woman because in
interaction these variables predicted whether or not you got a
phone number, b = 0.49, Wald χ²(1) = 12.37, p < .001.
Sex × Gender: The success of chat-up lines with sexual content
depended on whether they were delivered to a man or a woman
because in interaction these variables predicted whether or not
you got a phone number, b = 0.35, Wald χ²(1) = 10.82, p < .01.

Interpretation

Good_Mate: Whether the chat-up line showed signs of good moral
fibre did not significantly predict whether you went home with the
date or got a slap in the face, b = 0.13, Wald χ²(1) = 2.42, p > .05.
Funny: Whether the chat-up line was funny significantly predicted
whether you went home with the date or no response, b = 0.32,
Wald χ²(1) = 6.46, p < .05.
Gender: The gender of the person being chatted up significantly
predicted whether they went home with the person or gave no
response, b = 5.63, Wald χ²(1) = 17.93, p < .001.
Sex: The sexual content of the chat-up line significantly predicted
whether you went home with the date or got a slap in the face, b =
0.42, Wald χ²(1) = 11.68, p < .01.
Funny × Gender: The success of funny chat-up lines depended on
whether they were delivered to a man or a woman because in
interaction these variables predicted whether or not you went
home with the date, b = 1.17, Wald χ²(1) = 34.63, p < .001.
Sex × Gender: The success of chat-up lines with sexual content
depended on whether they were delivered to a man or a woman
because in interaction these variables predicted whether or not
you went home with the date, b = 0.48, Wald χ²(1) = 8.51, p < .01.

Reporting the Results
