Logistic Regression

Logistic regression
• It is a tool for modeling the effect of one or more risk factors on a binary (dichotomous)
response. Usually this binary response indicates the presence or absence of “a disease”.
• Relative risk (RR): estimates the magnitude of an association between exposure and
“disease”. It indicates the likelihood of developing the disease in the exposed group
relative to those who are not exposed:
$$RR = \frac{\text{Cumulative Incidence (exposed)}}{\text{Cumulative Incidence (unexposed)}}$$
Interpretation:
* $RR = 1$ means no difference between the exposed and the unexposed groups with
respect to the incidence of the disease.
* $RR > 1$ means the exposed group is more at risk of the disease than the unexposed group.
* $RR < 1$ indicates an inverse association between the exposure and the disease.
Current OC use                  Bacteriuria
(exposure/risk factor)      Yes       No      Total
Yes                          27      455        482
No                           77     1831       1908
Total                       104     2286       2390
$$RR = \frac{\text{Cumulative Incidence (exposed)}}{\text{Cumulative Incidence (unexposed)}} = \frac{27/482}{77/1908} = 1.388$$
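As a quick check of the arithmetic above, the relative risk can be computed directly from the four cell counts of the 2×2 table. The following is a minimal Python sketch; the function name `relative_risk` and its argument order are illustrative choices, not part of the original notes.

```python
def relative_risk(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    """Ratio of cumulative incidences: exposed group over unexposed group."""
    incidence_exposed = exp_cases / (exp_cases + exp_noncases)
    incidence_unexposed = unexp_cases / (unexp_cases + unexp_noncases)
    return incidence_exposed / incidence_unexposed


# Counts from the OC use / bacteriuria table.
rr = relative_risk(27, 455, 77, 1831)
print(round(rr, 3))  # 1.388
```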
• The odds ratio is another measure of association between exposure and disease.
Odds:
* Given that a patient is exposed to the risk factor, the odds of him/her getting the disease are
given by
$$\frac{P(\text{disease} \mid \text{exposure})}{P(\text{no disease} \mid \text{exposure})}$$
* Similarly, given that a patient is not exposed to the risk factor, the odds of him/her getting
the disease are given by
$$\frac{P(\text{disease} \mid \text{no exposure})}{P(\text{no disease} \mid \text{no exposure})}$$
Then the odds ratio (OR) is given by:
$$OR = \frac{P(\text{disease} \mid \text{exposure}) \,/\, P(\text{no disease} \mid \text{exposure})}{P(\text{disease} \mid \text{no exposure}) \,/\, P(\text{no disease} \mid \text{no exposure})}$$
Current OC use                  Bacteriuria
                            Yes       No      Total
Yes                          27      455        482
No                           77     1831       1908
Total                       104     2286       2390
$$OR = \frac{P(\text{disease} \mid \text{exposure}) \,/\, P(\text{no disease} \mid \text{exposure})}{P(\text{disease} \mid \text{no exposure}) \,/\, P(\text{no disease} \mid \text{no exposure})} = \frac{(27/482)/(455/482)}{(77/1908)/(1831/1908)} = \frac{27 \times 1831}{77 \times 455} = 1.41$$
Modeling Binary data
We are interested in $P(\text{disease}) = P(y = 1) = p$ and particularly in how this probability relates
to exposure status (or other risk factors).
A linear model of the form
$$P(y = 1) = p = \beta_0 + \beta_1 x$$
will be unsuitable because
• The $\beta_j$ are free to take any value in $(-\infty, \infty)$. This implies that the fitted $P(y = 1)$ can
take any value in $(-\infty, \infty)$, but $P(y = 1)$ takes only values in $[0, 1]$.
• The data are not normally distributed, hence the theory underlying linear models will not
be valid.
• The variance of the observed probability of success for each individual is not constant,
i.e. $\mathrm{Var}(\hat{p}_i)$ depends on $p_i$.
To ensure that $P(y = 1)$ lies in the interval $[0, 1]$, we transform the probability scale from the
range $(0, 1)$ to $(-\infty, \infty)$. We then formulate a linear model for the transformed variable, which
ensures that the fitted probabilities lie between 0 and 1.
The particular transformation that we will concentrate on is the LOGIT transformation:
$$\text{logit}(p) = \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x = \log(\text{odds of success})$$
Once we have estimated the parameters of the model, we can back-transform to obtain an
estimate of $p$ using
$$\hat{p} = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}$$
$\beta_j$ is the log of the odds ratio of disease for exposure relative to non-exposure.
Thus, $e^{\beta_j}$ is the odds ratio of disease among the exposed relative to the non-exposed; it
approximates the relative risk when the disease is rare.
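The logit transformation and its back-transformation described above can be sketched in Python as follows. The coefficients `b0` and `b1` below are hypothetical values chosen only for illustration; they are not estimates from any data in these notes.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to the log odds in (-inf, inf)."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Back-transform a linear predictor to a probability between 0 and 1."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # this branch avoids overflow for large negative eta
    return z / (1.0 + z)


# Hypothetical coefficients, for illustration only.
b0, b1 = -2.0, 0.5
for x in (-10, 0, 4, 10):
    eta = b0 + b1 * x  # linear predictor, free to take any real value
    print(x, round(inv_logit(eta), 4))  # fitted probability stays in (0, 1)
```

However extreme the linear predictor becomes, the back-transformed value remains a valid probability, which is the property the linear model for $p$ lacks.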
Confidence intervals for parameters
A $100(1 - \alpha)\%$ confidence interval for the coefficient $\beta_j$ is
$$\hat{\beta}_j \pm z_{\alpha/2}\, \text{s.e.}(\hat{\beta}_j).$$
The confidence limits for the corresponding odds ratio are obtained by exponentiating the confidence
limits for $\beta_j$. That is,
$$e^{\hat{\beta}_j \pm z_{\alpha/2}\, \text{s.e.}(\hat{\beta}_j)}$$
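The two interval formulas above translate directly into code. This is a minimal sketch using only the Python standard library; the function names and the example estimate and standard error are hypothetical and serve only to illustrate the calculation.

```python
import math
from statistics import NormalDist


def wald_ci(b, se, alpha=0.05):
    """100(1 - alpha)% interval for a coefficient: b +/- z_{alpha/2} * s.e."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return b - z * se, b + z * se


def odds_ratio_ci(b, se, alpha=0.05):
    """Exponentiate the coefficient limits to get limits for the odds ratio."""
    lower, upper = wald_ci(b, se, alpha)
    return math.exp(lower), math.exp(upper)


# Hypothetical estimate and standard error, for illustration only.
b_hat, se_hat = 0.70, 0.25
print(wald_ci(b_hat, se_hat))
print(odds_ratio_ci(b_hat, se_hat))
```

Note that the interval for the odds ratio is not symmetric about $e^{\hat{\beta}_j}$, because the exponential is applied to limits that are symmetric on the log scale.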
Interpretation of the parameters in a linear logistic model
1. For a dichotomous exposure,
$\beta_j$ = log of the odds ratio of disease for exposure relative to non-exposure.
2. For a continuous exposure variable $X$, consider the ratio of the odds of disease for an individual for
whom the value of the continuous exposure is $X = x + 1$, relative to an individual with
$X = x$:
$$\text{odds ratio} = \frac{\exp(\beta_0 + \beta_1 (x + 1))}{\exp(\beta_0 + \beta_1 x)} = \exp(\beta_1)$$
where $\beta_1$ is the change in the log odds when $X$ is increased by one unit.
The estimated change in the log odds when $X$ is increased by $r$ units is $r\hat{\beta}_1$.
The corresponding estimate of the odds ratio is $\exp(r\hat{\beta}_1)$, which is an estimate of the change in the odds of
disease for every increase of $r$ units in the value of the $X$ variable.
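The result for an increase of $r$ units follows from the same cancellation as in the one-unit case. Written out under the model $\text{logit}(p) = \beta_0 + \beta_1 x$ used above:

```latex
\[
\frac{\text{odds}(X = x + r)}{\text{odds}(X = x)}
  = \frac{\exp(\beta_0 + \beta_1 (x + r))}{\exp(\beta_0 + \beta_1 x)}
  = \exp(r \beta_1)
  = \left[\exp(\beta_1)\right]^{r}
\]
```

The ratio does not depend on the starting value $x$, so the same odds ratio applies to any increase of $r$ units.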
Example:
The data below come from a study to determine whether the levels of two proteins, fibrinogen and
$\gamma$-globulin, increase the erythrocyte sedimentation rate (ESR), the rate at which red blood cells settle out of
suspension in blood plasma. The two protein levels are measured in g/l. The response variable
indicates whether we have an unhealthy individual (1), when ESR $\geq$ 20 mm/h, or a healthy individual (0),
when ESR < 20 mm/h.
Gender Fibrinogen Globulin Response

1 2.52 38 0
1 2.56 31 0
0 2.19 33 0
0 2.18 31 0
1 3.41 37 0
0 2.46 36 0
0 3.22 38 0
1 2.21 37 0
0 3.15 39 0
1 2.6 41 0
1 2.29 36 0
0 2.35 29 0
0 5.06 37 1
1 3.34 32 1
1 2.38 37 1
1 3.15 36 0
0 3.53 46 1
0 2.68 34 0
1 2.6 38 0
0 2.23 37 0
1 2.88 30 0
0 2.65 46 0
1 2.09 44 1
0 2.28 36 0
0 2.67 39 0
0 2.29 31 0
0 2.15 31 0
1 2.54 28 0
0 3.93 32 1
0 3.34 30 0
1 2.99 36 0
1 3.22 35 0
Exercise:
Calculate the relative risk of disease (estimated by the odds ratio) related to a 0.5 unit increase in
fibrinogen. Compute a 95% confidence interval for this estimate.
Logistic regression output:

Model Summary
Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       22.874a              .221                    .358
a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.

Variables in the Equation
                          B       S.E.     Wald    df    Sig.    Exp(B)
Step 1a   Fibrinogen     1.942    .985    3.889    1     .049    6.972
          Globulin        .155    .120    1.687    1     .194    1.168
          Constant     -12.864   5.822    4.883    1     .027     .000
a. Variable(s) entered on step 1: Frabrino, Globulin.
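For readers without SPSS, a fit of this kind can be sketched in plain Python with Newton-Raphson maximum likelihood. This is a sketch under stated assumptions: it assumes the 32 rows of the data table are the data behind the printed output, it uses fibrinogen and globulin only (gender is ignored), and Newton-Raphson is a standard choice that is not necessarily the algorithm SPSS uses. All function names are illustrative.

```python
import math

# (fibrinogen, globulin, response) copied from the data table; gender is not used.
DATA = [
    (2.52, 38, 0), (2.56, 31, 0), (2.19, 33, 0), (2.18, 31, 0),
    (3.41, 37, 0), (2.46, 36, 0), (3.22, 38, 0), (2.21, 37, 0),
    (3.15, 39, 0), (2.60, 41, 0), (2.29, 36, 0), (2.35, 29, 0),
    (5.06, 37, 1), (3.34, 32, 1), (2.38, 37, 1), (3.15, 36, 0),
    (3.53, 46, 1), (2.68, 34, 0), (2.60, 38, 0), (2.23, 37, 0),
    (2.88, 30, 0), (2.65, 46, 0), (2.09, 44, 1), (2.28, 36, 0),
    (2.67, 39, 0), (2.29, 31, 0), (2.15, 31, 0), (2.54, 28, 0),
    (3.93, 32, 1), (3.34, 30, 0), (2.99, 36, 0), (3.22, 35, 0),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def predict(beta, xi):
    """Fitted probability from the inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))


def fit_logistic(rows, max_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns (coefficients, standard errors, -2 log likelihood)."""
    X = [[1.0, fib, glob] for fib, glob, _ in rows]  # leading 1.0 is the constant
    y = [resp for _, _, resp in rows]
    k = len(X[0])
    beta = [0.0] * k

    def information(p):
        return [[sum(pi * (1 - pi) * xi[r] * xi[c] for xi, pi in zip(X, p))
                 for c in range(k)] for r in range(k)]

    for _ in range(max_iter):
        p = [predict(beta, xi) for xi in X]
        score = [sum((yi - pi) * xi[j] for xi, yi, pi in zip(X, y, p))
                 for j in range(k)]
        step = solve(information(p), score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break

    p = [predict(beta, xi) for xi in X]
    info = information(p)
    # Standard errors: square roots of the diagonal of the inverse information matrix.
    se = [math.sqrt(solve(info, [float(i == j) for i in range(k)])[j])
          for j in range(k)]
    neg2ll = -2.0 * sum(math.log(pi) if yi else math.log(1.0 - pi)
                        for yi, pi in zip(y, p))
    return beta, se, neg2ll


beta, se, neg2ll = fit_logistic(DATA)
for name, b, s in zip(("Constant", "Fibrinogen", "Globulin"), beta, se):
    print(name, round(b, 3), round(s, 3), round(math.exp(b), 3))
print("-2 Log likelihood", round(neg2ll, 3))
```

If the table matches the data used for the printed run, the coefficients, standard errors and -2 log likelihood should come out close to the SPSS values; small differences would point to a transcription difference in the data rather than to the method.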
Solution:
$\hat{\beta}_{\text{fibrinogen}} = 1.94$ and $\text{s.e.}(\hat{\beta}_{\text{fibrinogen}}) = 0.985$.
Thus, the odds ratio for a 1 unit increase in fibrinogen is
$$e^{1.94} = 6.959.$$
A 95% confidence interval for $\beta_{\text{fibrinogen}}$:
$$(1.94 - 1.96 \times 0.985,\; 1.94 + 1.96 \times 0.985) = (0.0094,\; 3.871)$$
A 95% confidence interval for the OR:
$$(e^{0.0094},\; e^{3.871}) = (1.009,\; 47.99)$$
The effect of a 0.5 unit increase in fibrinogen is
$$\exp(0.5 \times 1.94) = 2.64$$
To compute a 95% CI for this odds ratio, we first compute a 95% CI for
$0.5\,\beta_{\text{fibrinogen}}$:
$$0.5\,\hat{\beta}_{\text{fibrinogen}} \pm z_{\alpha/2} \times 0.5 \times \text{s.e.}(\hat{\beta}_{\text{fibrinogen}})$$
That is,
$$(0.5 \times 1.94 - 1.96 \times 0.5 \times 0.985,\; 0.5 \times 1.94 + 1.96 \times 0.5 \times 0.985) = (0.0047,\; 1.935)$$
Thus, the corresponding 95% CI for the odds ratio is
$$(e^{0.0047},\; e^{1.935}) = (1.0047,\; 6.924)$$
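The hand calculation above can be checked with a short script. This is a minimal sketch: the function name `odds_ratio_for_increase` is illustrative, and it uses the rounded inputs from the solution (1.94 and 0.985) with z = 1.96, so its results differ slightly from the SPSS limits, which are based on unrounded estimates.

```python
import math


def odds_ratio_for_increase(b, se, r=1.0, z=1.96):
    """Odds ratio and Wald limits for an increase of r > 0 units in the covariate."""
    estimate = math.exp(r * b)
    lower = math.exp(r * b - z * r * se)
    upper = math.exp(r * b + z * r * se)
    return estimate, lower, upper


# Rounded fibrinogen estimate and standard error used in the solution.
b_fib, se_fib = 1.94, 0.985

or_1, lo_1, hi_1 = odds_ratio_for_increase(b_fib, se_fib, r=1.0)
or_half, lo_half, hi_half = odds_ratio_for_increase(b_fib, se_fib, r=0.5)

print(round(or_1, 3), round(or_half, 2))  # 6.959 2.64
print(round(lo_1, 3), round(hi_1, 2))
print(round(lo_half, 4), round(hi_half, 3))
```

Scaling both the coefficient and its standard error by $r$ before exponentiating is what keeps the interval consistent with the one-unit case.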
Variables in the Equation
                                                                      95% C.I. for EXP(B)
                          B       S.E.     Wald    df    Sig.    Exp(B)    Lower     Upper
Step 1a   Frabrino       1.942    .985    3.889    1     .049    6.972     1.012    48.034
          Globulin        .155    .120    1.687    1     .194    1.168      .924     1.477
          Constant     -12.864   5.822    4.883    1     .027     .000
a. Variable(s) entered on step 1: Frabrino, Globulin.
HYPOTHESIS TESTING
▪ The Wald statistic for the $\beta$ coefficient is:
$$\text{Wald} = [\beta / \text{s.e.}_{B}]^2$$
which is distributed chi-square with 1 degree of freedom.

▪ The "Partial R" (in SPSS output) is
$$R = \{[(\text{Wald} - 2) / (-2LL(\alpha))]\}^{1/2}$$
Example
Variable      B         S.E.      Wald       R         Sig       t-value
PETS         -0.659     0.2012    10.732    -0.113     0.0011    -3.28
MOBLHOME      1.5583    0.2874    29.39      0.1996    0          5.42
TENURE       -0.02      0.008     6.1238    -0.078     0.0133    -2.48
EDUC          0.0501    0.0468    1.1483     0.0000    0.2839     1.07
Constant     -0.916     0.69      1.7624     1         0.1843    -1.33
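The Wald and partial R columns of the example can be recomputed from B, S.E. and the -2 log likelihood of the beginning model. This is a sketch with two labeled assumptions: the value 687.357 is taken from the model chi-square example given later in these notes and is assumed to belong to the same model, and the sign and zero conventions for R (sign of B, and 0 when Wald is below 2) are inferred from the table rather than stated in the formula.

```python
import math


def wald_statistic(b, se):
    """Wald = (B / s.e.)^2, referred to chi-square with 1 degree of freedom."""
    return (b / se) ** 2


def partial_r(wald, neg2ll_beginning, b):
    """Partial R; sign follows B, and 0 when Wald < 2 (conventions inferred from the table)."""
    if wald < 2.0:
        return 0.0
    r = math.sqrt((wald - 2.0) / neg2ll_beginning)
    return math.copysign(r, b)


NEG2LL_BEGINNING = 687.357  # assumed: -2LL of the beginning (constant-only) model

print(round(wald_statistic(-0.659, 0.2012), 2))  # 10.73
print(round(partial_r(10.732, NEG2LL_BEGINNING, -0.659), 3))  # -0.113
print(round(partial_r(29.39, NEG2LL_BEGINNING, 1.5583), 4))  # 0.1996
print(partial_r(1.1483, NEG2LL_BEGINNING, 0.0501))  # 0.0
```

The small difference between the recomputed Wald value for PETS and the tabulated 10.732 is consistent with the table having been produced from unrounded B and S.E.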
EVALUATING THE PERFORMANCE OF THE MODEL
There are several statistics which can be used for comparing alternative models or evaluating the
performance of a single model:
• Model Chi-Square
• Percent Correct Predictions
• Pseudo-$R^2$
1. MODEL CHI-SQUARE
▪ The model likelihood ratio (LR) statistic is
$$LR[i] = -2[LL(\alpha) - LL(\alpha, \beta)]$$
{Or, as you are reading SPSS printout:
LR[i] = [-2LL (of beginning model)] - [-2LL (of ending model)]}

▪ The LR statistic is distributed chi-square with i degrees of freedom,
where i is the number of independent variables.
▪ Use the “Model Chi-Square” statistic to determine if the overall
model is statistically significant.
Example
Beginning Block Number 1. Method: Enter
-2 Log Likelihood 687.35714
Variable(s) Entered on Step Number
1..   PETS      PETS
      MOBLHOME  MOBLHOME
      TENURE    TENURE
      EDUC      EDUC
Estimation terminated at iteration number 3 because
Log Likelihood decreased by less than .01 percent.
-2 Log Likelihood 641.842
          Chi-Square    df    Sign.
Model     45.515        4     0.0000
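The model chi-square in this printout is the difference between the two -2 log likelihoods, and its significance can be checked without statistical tables. The sketch below uses a closed form for the chi-square upper tail that is valid only for even degrees of freedom, which happens to cover the 4 degrees of freedom here; the function name is illustrative.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total


neg2ll_beginning = 687.35714
neg2ll_ending = 641.842
lr = neg2ll_beginning - neg2ll_ending  # model chi-square
p_value = chi2_upper_tail_even_df(lr, df=4)

print(round(lr, 3))  # 45.515
print(p_value < 0.0001)  # True
```

A p-value this small agrees with the printed significance of 0.0000, so the four variables taken together improve significantly on the constant-only model.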
2. PERCENT CORRECT PREDICTIONS
▪ The "Percent Correct Predictions" statistic assumes that if the estimated p is greater
than or equal to .5 then the event is expected to occur, and not to occur otherwise.
▪ By assigning these probabilities 0s and 1s and comparing these to the actual 0s and
1s, the % correct Yes, % correct No, and overall % correct scores are calculated.
Example
                Predicted
Observed      0       1      % Correct
0           328      24      93.18%
1           139      44      24.04%
Overall                      69.53%
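The percentages in the classification table follow from the four counts. This is a minimal sketch of the rule and the three scores; the function names and the `table[observed][predicted]` layout are illustrative choices.

```python
def classify(p, cutoff=0.5):
    """Predict the event (1) when the estimated probability reaches the cutoff."""
    return 1 if p >= cutoff else 0


def percent_correct(table):
    """table[observed][predicted] holds the counts for a 0/1 outcome."""
    pct_no = 100.0 * table[0][0] / sum(table[0])
    pct_yes = 100.0 * table[1][1] / sum(table[1])
    total = sum(table[0]) + sum(table[1])
    overall = 100.0 * (table[0][0] + table[1][1]) / total
    return pct_no, pct_yes, overall


# Counts from the example table: rows are observed 0 and 1.
counts = [[328, 24], [139, 44]]
pct_no, pct_yes, overall = percent_correct(counts)
print(round(pct_no, 2), round(pct_yes, 2), round(overall, 2))  # 93.18 24.04 69.53
```

The overall score hides the imbalance between the two rows: most observed 0s are predicted correctly, while fewer than a quarter of the observed 1s are.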
3. PSEUDO-$R^2$
▪ One pseudo-$R^2$ statistic is the McFadden's-$R^2$ statistic:
$$\text{McFadden's-}R^2 = 1 - [LL(\alpha, \beta) / LL(\alpha)]$$
{= $1 - [-2LL(\alpha, \beta) / -2LL(\alpha)]$ (from SPSS printout)}
▪ where the $R^2$ is a scalar measure which varies between 0 and (somewhat close to) 1,
much like the $R^2$ in a linear probability (LP) model.
An Example:
Beginning -2 LL          687.36
Ending -2 LL             641.84
Ending/Beginning         0.9338
McF. R2 = 1 - E./B.      0.0662
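The example above reduces to a single ratio of the two -2 log likelihoods. A minimal sketch, with an illustrative function name:

```python
def mcfadden_r2(neg2ll_beginning, neg2ll_ending):
    """McFadden's pseudo-R^2 from the -2LL of the beginning and ending models."""
    return 1.0 - neg2ll_ending / neg2ll_beginning


ratio = 641.84 / 687.36
r2 = mcfadden_r2(687.36, 641.84)
print(round(ratio, 4), round(r2, 4))  # 0.9338 0.0662
```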
