
Logistic regression

• It is a tool for modeling the effect of one or more risk factors on a binary (dichotomous)
response. Usually this binary response indicates the presence or absence of a “disease”.
• Relative risk (RR) estimates the magnitude of an association between exposure and
“disease”. It indicates the likelihood of developing the disease in the exposed group
relative to the unexposed group:
$$\mathrm{RR} = \frac{\text{Cumulative incidence (exposed)}}{\text{Cumulative incidence (unexposed)}}$$
Interpretation:
* $\mathrm{RR} = 1$ means no difference between the exposed and the unexposed groups with
respect to the incidence of the disease.
* $\mathrm{RR} > 1$ means the exposed group is more at risk of the disease than the unexposed group.
* $\mathrm{RR} < 1$ indicates an inverse association between the exposure and the disease.

Current OC use            Bacteriuria
(exposure/risk factor)    Yes     No      Total
Yes                       27      455     482
No                        77      1831    1908
Total                     104     2286    2390

$$\mathrm{RR} = \frac{\text{Cumulative incidence (exposed)}}{\text{Cumulative incidence (unexposed)}} = \frac{27/482}{77/1908} = 1.388$$
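The relative risk calculation above can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative and are not part of the original notes.

```python
# Counts from the OC use / bacteriuria table (exposed = current OC users).
exposed_cases, exposed_total = 27, 482
unexposed_cases, unexposed_total = 77, 1908

# Cumulative incidence = cases / people at risk in each group.
ci_exposed = exposed_cases / exposed_total
ci_unexposed = unexposed_cases / unexposed_total

# Relative risk = ratio of the two cumulative incidences.
rr = ci_exposed / ci_unexposed

print(f"Cumulative incidence (exposed)   = {ci_exposed:.4f}")
print(f"Cumulative incidence (unexposed) = {ci_unexposed:.4f}")
print(f"RR = {rr:.3f}")
```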
• The odds ratio is another measure of association between exposure and disease.
Odds:
* Given that a patient is exposed to the risk factor, the odds of him/her getting the disease are
given by
$$\frac{P(\text{disease} \mid \text{exposure})}{P(\text{no disease} \mid \text{exposure})}$$

* Similarly, given that a patient is not exposed to the risk factor, the odds of him/her getting
the disease are given by

$$\frac{P(\text{disease} \mid \text{no exposure})}{P(\text{no disease} \mid \text{no exposure})}$$

Then the odds ratio (OR) is given by:

$$\mathrm{OR} = \frac{P(\text{disease} \mid \text{exposure}) \,/\, P(\text{no disease} \mid \text{exposure})}{P(\text{disease} \mid \text{no exposure}) \,/\, P(\text{no disease} \mid \text{no exposure})}$$

Current OC use    Bacteriuria
                  Yes     No      Total
Yes               27      455     482
No                77      1831    1908
Total             104     2286    2390

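$$\mathrm{OR} = \frac{P(\text{disease} \mid \text{exposure}) \,/\, P(\text{no disease} \mid \text{exposure})}{P(\text{disease} \mid \text{no exposure}) \,/\, P(\text{no disease} \mid \text{no exposure})} = \frac{(27/482)\,/\,(455/482)}{(77/1908)\,/\,(1831/1908)} = \frac{27 \times 1831}{77 \times 455} = 1.41$$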
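The odds ratio arithmetic can be sketched in Python in the same way. The cell labels `a`, `b`, `c`, `d` are a conventional shorthand introduced here for illustration; they do not appear in the original notes.

```python
# 2x2 table cells: a, b = exposed with / without disease;
#                  c, d = unexposed with / without disease.
a, b = 27, 455
c, d = 77, 1831

# Odds = P(disease | group) / P(no disease | group) within each exposure group.
odds_exposed = (a / (a + b)) / (b / (a + b))
odds_unexposed = (c / (c + d)) / (d / (c + d))

# The group totals cancel, so the odds ratio equals the cross-product ratio ad / bc.
odds_ratio = odds_exposed / odds_unexposed
cross_product = (a * d) / (b * c)

print(f"OR = {odds_ratio:.3f}, cross-product ratio = {cross_product:.3f}")
```

The odds ratio (about 1.41) is close to the relative risk (1.388) here because bacteriuria is uncommon in both groups (roughly 4% to 6%).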
Modeling Binary data

We are interested in $P(\text{disease}) = P(y = 1) = p$, and particularly in how this probability relates
to exposure status (or another risk factor).

A linear model of the form

$$P(y = 1) = p = \beta_0 + \beta_1 x$$

will be unsuitable because:

• The $\beta_j$ are free to take any value in $(-\infty, \infty)$, which implies that the fitted $P(y = 1)$ can
take any value in $(-\infty, \infty)$, whereas $P(y = 1)$ takes values only in $[0, 1]$.
• The data are not normally distributed, hence the theory underlying linear models will not
be valid.
• The variance of the observed probability of success for each individual is not constant:
it depends on $p_i$ (for a single binary observation, $\mathrm{Var}(y_i) = p_i(1 - p_i)$).

To ensure that $P(y = 1)$ lies in the interval $[0, 1]$, we transform the probability scale from the
range $(0, 1)$ to $(-\infty, \infty)$. We then formulate a linear model for the transformed variable, which
ensures that the fitted probabilities lie between 0 and 1.

The particular transformation that we will concentrate on is the logit transformation:

$$\operatorname{logit}(p) = \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x = \log(\text{odds of success})$$

Once we have estimated the parameters of the model, we can back-transform to obtain an
estimate of $p$ using

$$\hat{p} = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}$$

$\beta_j$ is the log of the odds ratio of disease for exposure relative to non-exposure.

Thus, $e^{\hat{\beta}_j}$ is an estimate of the odds ratio of the disease among the exposed relative to the
non-exposed. When the disease is rare, this odds ratio approximates the relative risk.
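The logit transformation and the back-transformation described above can be sketched as follows. The coefficients `b0` and `b1` are hypothetical values chosen only for illustration; they are not estimates from any data in these notes.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the log odds in (-inf, inf)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Map a linear predictor back to a probability in (0, 1)."""
    return math.exp(eta) / (1.0 + math.exp(eta))

# Hypothetical coefficients (assumption, for illustration only).
b0, b1 = -3.0, 0.8

xs = [-10, 0, 2, 5, 10]
fitted = [inv_logit(b0 + b1 * x) for x in xs]

for x, p_hat in zip(xs, fitted):
    # The linear predictor is unbounded, but the fitted probability is not.
    print(f"x = {x:>3}  linear predictor = {b0 + b1 * x:6.2f}  p_hat = {p_hat:.5f}")
```

Whatever value the linear predictor takes, the back-transformed probability stays strictly between 0 and 1, which is the property the plain linear model lacks.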

Confidence intervals for parameters

A $100(1 - \alpha)\%$ confidence interval for the coefficient $\beta_j$ is

$$\hat{\beta}_j \pm z_{\alpha/2}\, \mathrm{s.e.}(\hat{\beta}_j).$$

The confidence limits for the corresponding odds ratio are obtained by exponentiating the confidence
limits for $\beta_j$, that is,

$$e^{\hat{\beta}_j \pm z_{\alpha/2}\, \mathrm{s.e.}(\hat{\beta}_j)}.$$

Interpretation of the parameters in a linear logistic model

1. Dichotomous exposure

$\beta_j$ = log of the odds ratio of disease for exposure relative to non-exposure.

2. Continuous exposure

For a continuous exposure $X$, consider the ratio of the odds of disease for an individual for
whom the value of the continuous exposure is $X = x + 1$, relative to an individual with
$X = x$:

$$\text{odds ratio} = \frac{\exp(\beta_0 + \beta_1 (x + 1))}{\exp(\beta_0 + \beta_1 x)} = \exp(\beta_1)$$

where $\beta_1$ is the change in the logarithm of the odds (that is, the log odds ratio) when $X$ is increased by one unit.

The estimated change in the log odds when $X$ is increased by $r$ units is $r\hat{\beta}_1$.

The corresponding estimate of the odds ratio is $\exp(r\hat{\beta}_1)$, which estimates the multiplicative change in the odds of
disease for every increase of $r$ units in the value of $X$.
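A short numerical check may help here: under the logistic model, the odds ratio for an increase of r units is the same at every starting value of X. The coefficients and the value of `r` below are hypothetical and serve only to illustrate the identity.

```python
import math

# Hypothetical coefficients for one continuous exposure X (assumption, illustration only).
b0, b1 = -4.0, 0.6
r = 2.5  # size of the increase in X

def odds(x):
    """Odds of disease at exposure level x under logit(p) = b0 + b1 * x."""
    return math.exp(b0 + b1 * x)

# Ratio of odds at x + r to odds at x, for several starting values x.
starting_values = [0.0, 1.0, 3.7, 10.0]
ratios = [odds(x + r) / odds(x) for x in starting_values]
expected = math.exp(r * b1)

for x, ratio in zip(starting_values, ratios):
    print(f"x = {x:5.1f}  odds ratio for +{r} units = {ratio:.4f}")
print(f"exp(r * b1) = {expected:.4f}")
```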

Example:

The data below come from a study to determine whether the levels of two proteins, fibrinogen and
$\gamma$-globulin, increase the erythrocyte sedimentation rate (ESR), the rate at which red blood cells settle out of
suspension in blood plasma. The two protein levels are measured in g/l. The response variable
indicates whether we have an unhealthy individual (1), when ESR ≥ 20 mm/h, or a healthy individual (0),
when ESR < 20 mm/h.

Gender Fibrinogen Globulin Response


1 2.52 38 0
1 2.56 31 0
0 2.19 33 0
0 2.18 31 0
1 3.41 37 0
0 2.46 36 0
0 3.22 38 0
1 2.21 37 0
0 3.15 39 0
1 2.6 41 0
1 2.29 36 0
0 2.35 29 0
0 5.06 37 1
1 3.34 32 1
1 2.38 37 1
1 3.15 36 0
0 3.53 46 1
0 2.68 34 0
1 2.6 38 0
0 2.23 37 0
1 2.88 30 0
0 2.65 46 0
1 2.09 44 1

0 2.28 36 0
0 2.67 39 0
0 2.29 31 0
0 2.15 31 0
1 2.54 28 0
0 3.93 32 1
0 3.34 30 0
1 2.99 36 0
1 3.22 35 0

Exercise:

Calculate the odds ratio of disease (used here as the estimate of relative risk) associated with a 0.5 unit
increase in fibrinogen. Compute a 95% confidence interval for this odds ratio.

Logistic regression output:

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      22.874^a            .221                   .358

a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.

Variables in the Equation

                        B         S.E.    Wald    df   Sig.   Exp(B)
Step 1^a   Fibrinogen   1.942     .985    3.889   1    .049   6.972
           Globulin     .155      .120    1.687   1    .194   1.168
           Constant     -12.864   5.822   4.883   1    .027   .000

a. Variable(s) entered on step 1: Frabrino, Globulin.
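As a rough cross-check on the Model Summary, the two R-square values can be recomputed from the reported -2 log likelihood. The sketch below rests on two assumptions that the output itself does not state: that the standard Cox & Snell and Nagelkerke definitions apply, and that the constant-only model predicts the overall proportion of 1s in the data table (6 out of 32 rows).

```python
import math

n, events = 32, 6        # rows in the data table, rows with Response = 1
neg2ll_model = 22.874    # -2 Log likelihood from the Model Summary

# -2 log likelihood of the constant-only model: every fitted p equals events / n.
p0 = events / n
neg2ll_null = -2.0 * (events * math.log(p0) + (n - events) * math.log(1.0 - p0))

# Cox & Snell R^2 = 1 - (L0 / L1)^(2/n), rewritten in terms of -2LL values.
cox_snell = 1.0 - math.exp(-(neg2ll_null - neg2ll_model) / n)

# Nagelkerke R^2 rescales by the largest value Cox & Snell can attain.
nagelkerke = cox_snell / (1.0 - math.exp(-neg2ll_null / n))

print(f"-2LL (constant only) = {neg2ll_null:.3f}")
print(f"Cox & Snell R^2      = {cox_snell:.3f}")
print(f"Nagelkerke R^2       = {nagelkerke:.3f}")
```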

Solution:

$\hat{\beta}_{\text{fibrinogen}} = 1.94$ and $\mathrm{s.e.}(\hat{\beta}_{\text{fibrinogen}}) = 0.985$.

Thus, the odds ratio for a 1 unit increase in fibrinogen is

$$e^{1.94} = 6.959.$$

A 95% confidence interval for $\beta_{\text{fibrinogen}}$:

$$(1.94 - 1.96 \times 0.985,\; 1.94 + 1.96 \times 0.985) = (0.0094,\; 3.8706)$$

A 95% confidence interval for the OR:

$$(e^{0.0094},\; e^{3.8706}) = (1.009,\; 47.97)$$

The effect of a 0.5 unit increase in fibrinogen is

$$\exp(0.5 \times 1.94) = 2.64$$

To compute a 95% confidence interval for this odds ratio, we first compute a 95% CI for
$0.5\,\beta_{\text{fibrinogen}}$:

$$0.5\,\hat{\beta}_{\text{fibrinogen}} \pm z_{\alpha/2} \times 0.5 \times \mathrm{s.e.}(\hat{\beta}_{\text{fibrinogen}})$$

That is, $(0.5 \times 1.94 - 1.96 \times 0.5 \times 0.985,\; 0.5 \times 1.94 + 1.96 \times 0.5 \times 0.985) = (0.0047,\; 1.9353)$.

Thus, the corresponding 95% CI for the odds ratio is

$$(e^{0.0047},\; e^{1.9353}) = (1.0047,\; 6.93)$$
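The solution's arithmetic can be reproduced with a small helper. This is a sketch only: the function name `odds_ratio_ci` is hypothetical, and it uses the rounded coefficient 1.94 from the solution, so the limits differ slightly from the SPSS interval computed on unrounded values.

```python
import math

b, se = 1.94, 0.985   # fibrinogen coefficient and standard error used in the solution
r = 0.5               # increase in fibrinogen

def odds_ratio_ci(b, se, r=1.0, z=1.96):
    """Return (OR, lower, upper) for an r-unit increase by exponentiating the Wald limits."""
    centre = r * b
    half_width = z * r * se
    return (math.exp(centre),
            math.exp(centre - half_width),
            math.exp(centre + half_width))

or_1, lo_1, hi_1 = odds_ratio_ci(b, se, r=1.0)
or_half, lo_half, hi_half = odds_ratio_ci(b, se, r=r)

print(f"1 unit:   OR = {or_1:.3f}, 95% CI = ({lo_1:.3f}, {hi_1:.2f})")
print(f"0.5 unit: OR = {or_half:.3f}, 95% CI = ({lo_half:.4f}, {hi_half:.3f})")
```

Scaling both the coefficient and its standard error by r before exponentiating is what keeps the interval for the 0.5 unit odds ratio consistent with the interval for the 1 unit odds ratio.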

Variables in the Equation

                                                                  95% C.I. for EXP(B)
                      B         S.E.    Wald    df   Sig.   Exp(B)   Lower   Upper
Step 1^a   Frabrino   1.942     .985    3.889   1    .049   6.972    1.012   48.034
           Globulin   .155      .120    1.687   1    .194   1.168    .924    1.477
           Constant   -12.864   5.822   4.883   1    .027   .000

a. Variable(s) entered on step 1: Frabrino, Globulin.

HYPOTHESIS TESTING
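▪ The Wald statistic for the $\beta$ coefficient is:

$$\text{Wald} = \left[\hat{\beta} \,/\, \mathrm{s.e.}(\hat{\beta})\right]^2$$

which is distributed chi-square with 1 degree of freedom.

▪ The "Partial R" (in SPSS output) is

$$R = \left\{\frac{\text{Wald} - 2}{-2LL(\alpha)}\right\}^{1/2}$$

where $-2LL(\alpha)$ is the -2 log likelihood of the model containing only the constant. R takes the sign of the
coefficient and is reported as 0 when the Wald statistic is less than 2.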

Example

Variable    B         S.E.     Wald      R         Sig      t-value
PETS        -0.659    0.2012   10.732    -0.113    0.0011   -3.28
MOBLHOME    1.5583    0.2874   29.39     0.1996    0        5.42
TENURE      -0.02     0.008    6.1238    -0.078    0.0133   -2.48
EDUC        0.0501    0.0468   1.1483    0.0000    0.2839   1.07
Constant    -0.916    0.69     1.7624    1         0.1843   -1.33
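The Wald and Partial R columns of the example can be approximately reproduced as below. This is a sketch under stated assumptions: the helper names are hypothetical, the beginning -2 log likelihood (687.35714) is taken from the model chi-square example later in these notes, and the rule that R carries the sign of B and is 0 when Wald < 2 is assumed from the pattern in the table.

```python
import math

neg2ll_null = 687.35714  # beginning -2 log likelihood (constant-only model)

def wald(b, se):
    """Wald statistic: squared ratio of the coefficient to its standard error."""
    return (b / se) ** 2

def chi2_sf_1df(w):
    """Upper-tail probability of a chi-square(1) variable: P(Z^2 > w) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(w / 2.0))

def partial_r(wald_stat, b, neg2ll_null):
    """Partial R with the sign of b; taken as 0 when Wald < 2 (assumed convention)."""
    if wald_stat < 2.0:
        return 0.0
    return math.copysign(math.sqrt((wald_stat - 2.0) / neg2ll_null), b)

# PETS row: Wald and its p-value from the rounded B and S.E.
w_pets = wald(-0.659, 0.2012)
p_pets = chi2_sf_1df(w_pets)

# Partial R from the Wald values printed in the table.
r_pets = partial_r(10.732, -0.659, neg2ll_null)
r_mobl = partial_r(29.39, 1.5583, neg2ll_null)
r_educ = partial_r(1.1483, 0.0501, neg2ll_null)

print(f"PETS: Wald = {w_pets:.3f}, Sig = {p_pets:.4f}, R = {r_pets:.3f}")
print(f"MOBLHOME: R = {r_mobl:.4f}   EDUC: R = {r_educ:.4f}")
```

Small differences from the printed Wald values are expected because the table shows rounded coefficients and standard errors.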

EVALUATING THE PERFORMANCE OF THE MODEL

There are several statistics which can be used for comparing alternative models or evaluating the
performance of a single model:

• Model Chi-Square
• Percent Correct Predictions
• Pseudo-R²

1. MODEL CHI-SQUARE

▪ The model likelihood ratio (LR) statistic is

$$LR[i] = -2\,[LL(\alpha) - LL(\alpha, \beta)]$$

{Or, as you are reading SPSS printout:

LR[i] = [-2LL (of beginning model)] - [-2LL (of ending model)]}

▪ The LR statistic is distributed chi-square with i degrees of freedom,
where i is the number of independent variables.
▪ Use the “Model Chi-Square” statistic to determine if the overall
model is statistically significant.

Example

Beginning Block Number 1. Method: Enter

-2 Log Likelihood 687.35714

Variable(s) Entered on Step Number
1.. PETS PETS
    MOBLHOME MOBLHOME
    TENURE TENURE
    EDUC EDUC

Estimation terminated at iteration number 3 because
Log Likelihood decreased by less than .01 percent.

-2 Log Likelihood 641.842

         Chi-Square   df   Sign.
Model    45.515       4    0.0000
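The model chi-square in this printout is the difference of the two -2 log likelihood values. The sketch below recomputes it and its p-value; the helper `chi2_sf_even_df` is a hypothetical name and uses a closed form for the chi-square tail that holds only for an even number of degrees of freedom, which is the case here (df = 4).

```python
import math

neg2ll_begin = 687.35714   # constant-only (beginning) model
neg2ll_end = 641.842       # model with PETS, MOBLHOME, TENURE, EDUC
df = 4                     # number of independent variables

# LR = [-2LL of beginning model] - [-2LL of ending model]
lr = neg2ll_begin - neg2ll_end

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

p_value = chi2_sf_even_df(lr, df)

print(f"Model chi-square = {lr:.3f} on {df} df, p = {p_value:.2e}")
```

The p-value is far below .0001, which is why the printout shows it as 0.0000.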

2. PERCENT CORRECT PREDICTIONS

▪ The "Percent Correct Predictions" statistic assumes that if the estimated p is greater
than or equal to .5 then the event is expected to occur and not occur otherwise.
▪ By assigning these probabilities 0s and 1s and comparing these to the actual 0s and
1s, the % correct Yes, % correct No, and overall % correct scores are calculated.

Example

Observed    Predicted 0   Predicted 1   % Correct
0           328           24            93.18%
1           139           44            24.04%
Overall                                 69.53%
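The percentages in this classification table follow directly from the four counts. A minimal sketch, with illustrative names (`table`, `classify`) that are not part of the SPSS output:

```python
# Classification table: outer key = observed outcome, inner key = predicted outcome.
table = {0: {0: 328, 1: 24},
         1: {0: 139, 1: 44}}

def classify(p_hat, cutoff=0.5):
    """Predict 1 when the estimated probability is at or above the cutoff, else 0."""
    return 1 if p_hat >= cutoff else 0

# Percent correct within each observed class, then overall.
pct_correct_no = 100.0 * table[0][0] / sum(table[0].values())
pct_correct_yes = 100.0 * table[1][1] / sum(table[1].values())
total = sum(sum(row.values()) for row in table.values())
pct_overall = 100.0 * (table[0][0] + table[1][1]) / total

print(f"% correct No  = {pct_correct_no:.2f}")
print(f"% correct Yes = {pct_correct_yes:.2f}")
print(f"Overall       = {pct_overall:.2f}")
```

The overall figure is dominated by the larger "No" group, so a model can score well overall while identifying few of the observed 1s, as happens here.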

3. PSEUDO-R²
▪ One pseudo-R² statistic is McFadden's R² statistic:

$$\text{McFadden's } R^2 = 1 - \frac{LL(\alpha, \beta)}{LL(\alpha)}$$
{= 1 - [-2LL(α, β) / -2LL(α)] (from SPSS printout)}

▪ where the R² is a scalar measure which varies between 0 and (somewhat close to) 1,
much like the R² in a linear probability (LP) model.

An Example:

Beginning -2 LL           687.36
Ending -2 LL              641.84
Ending/Beginning          0.9338
McF. R² = 1 - E./B.       0.0662
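The same calculation as a short Python check, using the two -2 LL values from the example (variable names are illustrative):

```python
neg2ll_begin = 687.36   # -2LL of the constant-only (beginning) model
neg2ll_end = 641.84     # -2LL of the fitted (ending) model

# The factor -2 cancels, so the ratio of -2LL values equals LL(alpha, beta) / LL(alpha).
ratio = neg2ll_end / neg2ll_begin
mcfadden_r2 = 1.0 - ratio

print(f"Ending/Beginning = {ratio:.4f}")
print(f"McFadden's R^2   = {mcfadden_r2:.4f}")
```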
