• It is a tool for modeling the effect of one or more risk factors on a binary (dichotomous)
response. Usually this binary response indicates the presence or absence of “a disease”.
• Relative risk (RR): estimates the magnitude of an association between exposure and
“disease” and indicates the likelihood of developing the disease in the exposed group
relative to those who are not exposed:
\[ RR = \frac{\text{Cumulative Incidence (exposed)}}{\text{Cumulative Incidence (unexposed)}} \]
Interpretation:
* $RR = 1$ means no difference between the exposed and the unexposed groups with
respect to the incidence of the disease.
* $RR > 1$ means the exposed group is more at risk of the disease than the unexposed group.
* $RR < 1$ indicates an inverse association between the exposure and the disease.
Current OC use              Bacteriuria
(exposure/risk factor)      Yes       No     Total
Yes                          27      455       482
No                           77     1831      1908
Total                       104     2286      2390
\[ RR = \frac{\text{Cumulative Incidence (exposed)}}{\text{Cumulative Incidence (unexposed)}} = \frac{27/482}{77/1908} = 1.388 \]
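The relative risk calculation can be reproduced in a few lines of Python. This is a minimal sketch using the counts from the oral contraceptive (OC) and bacteriuria table; the function and variable names are illustrative and do not come from the original notes.

```python
# Counts from the 2x2 table: rows = current OC use, columns = bacteriuria.
exposed_cases, exposed_total = 27, 482        # OC users with bacteriuria / all OC users
unexposed_cases, unexposed_total = 77, 1908   # non-users with bacteriuria / all non-users


def relative_risk(a, n1, c, n0):
    """Ratio of cumulative incidences: (a / n1) / (c / n0)."""
    return (a / n1) / (c / n0)


rr = relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total)
print(round(rr, 3))  # 1.388
```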
• The odds ratio is another measure of association between exposure and disease.
Odds:
* Given a patient is exposed to the risk factor, the odds of him/her getting the disease are
given by
\[ \frac{P(\text{disease} \mid \text{exposure})}{1 - P(\text{disease} \mid \text{exposure})} \]
* Similarly, given a patient is not exposed to the risk factor, the odds of him/her getting
the disease are given by
\[ \frac{P(\text{disease} \mid \text{no exposure})}{1 - P(\text{disease} \mid \text{no exposure})} \]
Current OC use      Bacteriuria
                    Yes       No     Total
Yes                  27      455       482
No                   77     1831      1908
Total               104     2286      2390
\[ OR = \frac{(27/482)\,/\,(455/482)}{(77/1908)\,/\,(1831/1908)} = \frac{27 \times 1831}{77 \times 455} = 1.41 \]
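As a check on the arithmetic, the odds ratio can be computed both from the two conditional probabilities and from the cross-product of the table cells. The sketch below uses the table counts; the helper name `odds` is illustrative.

```python
def odds(p):
    """Convert a probability into odds p / (1 - p)."""
    return p / (1 - p)


p_exposed = 27 / 482       # P(disease | exposure)
p_unexposed = 77 / 1908    # P(disease | no exposure)

odds_ratio = odds(p_exposed) / odds(p_unexposed)
cross_product = (27 * 1831) / (77 * 455)   # same quantity from the cell counts

print(round(odds_ratio, 2), round(cross_product, 2))  # 1.41 1.41
```

Here the odds ratio (1.41) is close to the relative risk (1.388) because bacteriuria is fairly uncommon in both groups.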
Modeling Binary data
We are interested in $P(\text{disease}) = P(y = 1) = p$, and particularly in how this probability relates
to exposure status (or other risk factors).
• In a linear model for $p$, the coefficients $\beta_j$ are free to take any value in $(-\infty, \infty)$. This implies that the fitted $P(y = 1)$ can
take any value in $(-\infty, \infty)$, but $P(y = 1)$ takes values only in $[0, 1]$.
• The data are not normally distributed, hence the theory underlying linear models will not
be valid.
• The variance of the observed probability of success for each individual is not constant,
i.e. $\mathrm{Var}(p_i) \neq \sigma^2$.
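Two of the problems listed above can be illustrated numerically. The sketch below assumes a hypothetical straight-line model for the probability (the coefficients are invented for illustration) and uses the standard fact that a single binary outcome with success probability p has variance p(1 - p).

```python
# Hypothetical linear model p = b0 + b1 * x (coefficients invented for illustration).
b0, b1 = 0.1, 0.2

for x in (-2, 0, 2, 6):
    fitted = b0 + b1 * x
    inside = 0 <= fitted <= 1
    print(f"x = {x:>2}: fitted p = {fitted:.2f}, valid probability: {inside}")


def bernoulli_variance(p):
    """Variance of a single 0/1 outcome with success probability p."""
    return p * (1 - p)


# The variance changes with p, so it cannot be one constant for all individuals.
for p in (0.1, 0.5, 0.9):
    print(f"p = {p}: variance = {bernoulli_variance(p):.2f}")
```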
To ensure that $P(y = 1)$ will lie in the interval $[0, 1]$, we transform the probability scale from the
range $(0, 1)$ to $(-\infty, \infty)$. Then we formulate a linear model for the transformed variable, which will
ensure that fitted probabilities lie between 0 and 1.
\[ \operatorname{logit}(p) = \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x = \text{log odds of success} \]
Once we have estimated the parameters of the model, we can back-transform to obtain an
estimate for $p$ using
\[ \hat{p} = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}} \]
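The logit transformation and its inverse can be sketched as a pair of small functions. The coefficient values below are hypothetical and chosen only to show that the back-transformed value stays strictly between 0 and 1, whatever the value of the linear predictor.

```python
import math


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Back-transform a linear predictor eta to a probability."""
    return math.exp(eta) / (1 + math.exp(eta))


b0, b1 = -2.0, 0.8   # hypothetical estimates, not taken from the notes

for x in (-10, 0, 2.5, 10):
    eta = b0 + b1 * x            # linear predictor, unrestricted in sign and size
    p_hat = inv_logit(eta)       # fitted probability
    assert 0 < p_hat < 1
    print(x, round(p_hat, 4))
```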
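Thus, $e^{\hat\beta_j}$ is an estimate of the odds ratio of the disease among the exposed relative to the
non-exposed, which approximates the relative risk when the disease is rare.
A $100(1 - \alpha)\%$ confidence interval for $\beta_j$ is
\[ \hat\beta_j \pm z_{\alpha/2}\,\text{s.e.}(\hat\beta_j). \]
The confidence limits for the corresponding odds ratio are obtained by exponentiating the confidence
limits for $\beta_j$, that is,
\[ e^{\hat\beta_j \pm z_{\alpha/2}\,\text{s.e.}(\hat\beta_j)}. \]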
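\[ \text{odds ratio} = \frac{\exp(\beta_0 + \beta_1 (x + 1))}{\exp(\beta_0 + \beta_1 x)} = \exp(\beta_1) \]
where $\beta_1$ is the change in the logarithm of the odds (the log odds ratio) when $X$ is increased by one unit.
For an increase of $r$ units in $X$, the corresponding estimate of the odds ratio is $\exp(r\hat\beta_1)$, which is an estimate of the change in risk of
disease for every increase of $r$ units in the value of the $X$ variable.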
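A quick numerical check shows that the odds ratio for an r-unit increase does not depend on the starting value of X. The coefficients in this sketch are hypothetical; only the identity odds(x + r) / odds(x) = exp(r * b1) is being illustrated.

```python
import math


def odds_at(x, b0, b1):
    """Odds of disease at covariate value x under logit(p) = b0 + b1 * x."""
    return math.exp(b0 + b1 * x)


b0, b1 = -6.8, 1.9   # hypothetical coefficients for illustration
r = 0.5              # size of the increase in X

for x in (2.0, 3.0, 4.5):
    ratio = odds_at(x + r, b0, b1) / odds_at(x, b0, b1)
    # The intercept and the starting value x cancel, leaving exp(r * b1).
    assert math.isclose(ratio, math.exp(r * b1))
    print(x, round(ratio, 4))
```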
Example:
The data below come from a study to determine whether the levels of two proteins, fibrinogen and
globulin, increase the erythrocyte sedimentation rate (ESR), the rate at which red blood cells settle out of
suspension in blood plasma. The two protein levels are measured in gm/l. The response variable
indicates whether we have a healthy individual (1), when ESR < 20 mm/h, or an unhealthy individual (0),
when ESR ≥ 20 mm/h.
0 2.28 36 0
0 2.67 39 0
0 2.29 31 0
0 2.15 31 0
1 2.54 28 0
0 3.93 32 1
0 3.34 30 0
1 2.99 36 0
1 3.22 35 0
Exercise:
Calculate the relative risk of disease (estimated by the odds ratio) related to a 0.5-unit increase in fibrinogen. Compute a 95%
confidence interval for this relative risk.
[Output table: Model Summary]
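Solution:
\[ \hat\beta_{\text{fibrinogen}} = 1.94 \quad \text{and} \quad \text{s.e.}(\hat\beta_{\text{fibrinogen}}) = 0.985 \]
To compute a 95% CI for this odds ratio, we first compute a 95% CI for $0.5\,\beta_{\text{fibrinogen}}$:
\[ 0.5\,\hat\beta_{\text{fibrinogen}} \pm z_{\alpha/2} \times 0.5 \times \text{s.e.}(\hat\beta_{\text{fibrinogen}}) \]
That is,
\[ (0.5 \times 1.94 - 1.96 \times 0.5 \times 0.985,\; 0.5 \times 1.94 + 1.96 \times 0.5 \times 0.985) = (0.0047,\ 1.935) \]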
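The remaining step of the exercise is to exponentiate the point estimate and the interval limits, as described earlier for odds ratios. The sketch below uses the estimate and standard error quoted in the solution, with 1.96 as the normal quantile; the variable names are illustrative.

```python
import math

beta_hat = 1.94    # estimated fibrinogen coefficient (from the solution)
se = 0.985         # its standard error (from the solution)
r = 0.5            # size of the increase in fibrinogen
z = 1.96           # normal quantile for a 95% interval

# Interval on the log odds scale for r * beta.
log_or = r * beta_hat
half_width = z * r * se
lo, hi = log_or - half_width, log_or + half_width

# Exponentiate to move to the odds ratio scale.
or_hat = math.exp(log_or)
or_lo, or_hi = math.exp(lo), math.exp(hi)

print(f"log scale CI: ({lo:.4f}, {hi:.4f})")
print(f"odds ratio = {or_hat:.2f}, 95% CI = ({or_lo:.2f}, {or_hi:.2f})")
```

Under these inputs the lower limit on the odds ratio scale sits only just above 1, which matches the lower limit of 0.0047 on the log scale being only just above 0.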
HYPOTHESIS TESTING
▪ The Wald statistic for the coefficient $\beta$ is:
\[ \text{Wald} = \left[ \frac{\hat\beta}{\text{s.e.}(\hat\beta)} \right]^2 \]
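As an illustration, the Wald statistic can be computed for the fibrinogen coefficient from the example (estimate 1.94, standard error 0.985). The p-value step is an assumption added here: it uses the usual large-sample reference distribution, chi-square with 1 degree of freedom, which the notes do not state explicitly.

```python
import math


def wald_test(beta_hat, se):
    """Return the Wald statistic and its p-value (chi-square, 1 df; assumed reference)."""
    w = (beta_hat / se) ** 2
    # With 1 df, P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2))
    return w, p_value


w, p = wald_test(1.94, 0.985)
print(f"Wald = {w:.3f}, p = {p:.4f}")
```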
There are several statistics which can be used for comparing alternative models or evaluating the
performance of a single model:
• Model Chi-Square
• Percent Correct Predictions
• Pseudo-$R^2$
1. MODEL CHI-SQUARE
▪ Use the “Model Chi-Square” statistic to determine if the overall
model is statistically significant.
Example
Chi-Square df Sign.
2. PERCENT CORRECT PREDICTIONS
▪ The "Percent Correct Predictions" statistic assumes that if the estimated p is greater
than or equal to .5 then the event is expected to occur, and not to occur otherwise.
▪ By assigning these probabilities 0s and 1s and comparing these to the actual 0s and
1s, the % correct Yes, % correct No, and overall % correct scores are calculated.
Example
Observed    Predicted 0    Predicted 1    % Correct
0               328             24          93.18%
1               139             44          24.04%
Overall                                     69.53%
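The three percentages in the classification table can be recomputed from the four cell counts. This is a minimal sketch; the function names and the table layout (rows observed, columns predicted) are choices made for this example.

```python
def classify(p_hat, cutoff=0.5):
    """Predict 1 when the estimated probability reaches the cutoff, else 0."""
    return 1 if p_hat >= cutoff else 0


def percent_correct(table):
    """table[i][j] = number of cases observed as i and predicted as j."""
    (true_no, false_yes), (false_no, true_yes) = table
    pct_no = 100 * true_no / (true_no + false_yes)
    pct_yes = 100 * true_yes / (false_no + true_yes)
    total = true_no + false_yes + false_no + true_yes
    overall = 100 * (true_no + true_yes) / total
    return pct_no, pct_yes, overall


counts = [[328, 24], [139, 44]]   # counts from the example table
pct_no, pct_yes, overall = percent_correct(counts)
print(round(pct_no, 2), round(pct_yes, 2), round(overall, 2))  # 93.18 24.04 69.53
```

The overall figure hides a large imbalance: the model classifies most observed 0s correctly but fewer than a quarter of the observed 1s.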
3. PSEUDO-$R^2$
▪ One pseudo-$R^2$ statistic is McFadden's $R^2$ statistic:
\[ \text{McFadden's } R^2 = 1 - \frac{LL(\alpha, \beta)}{LL(\alpha)} \quad \left\{ = 1 - \frac{-2LL(\alpha, \beta)}{-2LL(\alpha)} \ \text{(from SPSS printout)} \right\} \]
▪ where $R^2$ is a scalar measure which varies between 0 and (somewhat close to) 1,
much like the $R^2$ in a LP model.
An Example:
Beginning -2 LL 687.36
Ending -2 LL 641.84
Ending/Beginning 0.9338
McF. R2 = 1 - E./B. 0.0662
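The example figures can be reproduced directly from the two -2 LL values. The sketch below assumes that "Beginning" refers to the intercept-only model and "Ending" to the fitted model, as in typical SPSS output. Under that assumption the difference between the two values is the likelihood ratio statistic usually reported as the model chi-square.

```python
def mcfadden_r2(neg2ll_beginning, neg2ll_ending):
    """McFadden's R-squared from the beginning and ending -2 log-likelihoods."""
    return 1 - neg2ll_ending / neg2ll_beginning


beginning = 687.36   # -2 LL before predictors are added (from the example)
ending = 641.84      # -2 LL of the fitted model (from the example)

r2 = mcfadden_r2(beginning, ending)
model_chi_square = beginning - ending   # assumed: likelihood ratio statistic

print(round(ending / beginning, 4))   # 0.9338
print(round(r2, 4))                   # 0.0662
print(round(model_chi_square, 2))     # 45.52
```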