Professional Documents
Culture Documents
Outline
Review of simple and multiple regression
Simple Logistic Regression
The logistic function
Interpretation of coefficients
continuous predictor (X)
dichotomous categorical predictor (X)
categorical predictor with three or more levels (X)
E(Y|X) = o + 1f(x)
Here f(x) is any function of X, e.g.
f(x) = X
E(Y|X) = o + 1X
(line)
f(x) = ln(X) E(Y|X) = o + 1ln(X) (curved)
The key is that E(Y|X) is a linear in the
parameters o and 1 but not necessarily in X.
0 = Estimated Intercept
= y -value at x = 0
y E (Y | X ) 0 1 x
Interpretable only if x = 0 is a
value of particular interest.
1 w units
^
w units
0
1 = Estimated Slope
= Change in y for every
unit increase in x
= estimated change
in the mean of Y for
a unit change in X.
Always interpretable!
Logistic Regression
Models relationship between set of variables Xi
and
dichotomous categorical response variable Y
e.g. Success/Failure, Remission/No Remission
Survived/Died, CHD/No CHD, Low Birth Weight/Normal
Birth Weight, etc
Logistic Regression
Example: Coronary Heart Disease (CD) and Age In
this study sampled individuals were examined for
signs of CD (present = 1 / absent = 0) and the
potential relationship between this outcome and their
age (yrs.) was considered.
This is a portion of the raw data for the 100 subjects who
participated in the study.
Logistic Regression
How can we analyze these data?
Non-pooled t-test
Logistic Regression
Simple Linear Regression?
Logistic Regression
We can group individuals into age classes and look
at the percentage/proportion showing signs of
coronary heart disease.
Logistic Function
P(Success|X)
e o 1 X
P (" Success"| X )
1 e o 1 X
Logit Transformation
The logistic regression model is given by
o 1 X
e
P (Y | X )
1 e o 1 X
which is equivalent to
P (Y | X )
o 1 X
ln
1 P (Y | X )
This is called the
Logit Transformation
Dichotomous Predictor
Consider a dichotomous predictor (X) which
represents the presence of risk (1 = present)
P
e o 1 X
1 P
P(Y 1 | X 1)
e o 1
1 - P(Y 1 | X 1)
P(Y 1 | X 0)
Odds for Disease with Risk Absent
e o
1 - P(Y 1 | X 0)
Odds for Disease with Risk Present
o 1
Odds
for
Disease
with
Risk
Present
e
Therefore the
1
e
odds ratio (OR) Odds for Disease with Risk Absent
e o
Dichotomous Predictor
Therefore, for the odds ratio associated with
ln(OR ) 1
thus the estimated regression coefficient
associated with a 0-1 coded dichotomous
predictor is the natural log of the OR
associated with risk presence!!!
P (Y | X )
P
ln
ln
o 1 X
1 P
1 P (Y | X )
P
e o 1 X
1 P
P(Y 1 | X 1)
e o 1
1 - P(Y 1 | X 1)
P(Y 1 | X -1)
Odds for Disease with Risk Absent
e o 1
1 - P(Y 1 | X -1)
Odds for Disease with Risk Present
o 1
Odds
for
Disease
with
Risk
Present
e
Therefore the
2 1
e
odds ratio (OR) Odds for Disease with Risk Absent
e o 1
Dichotomous Predictor
Therefore, for the odds ratio associated with
2
ln(OR ) 2 1
thus twice the estimated regression
coefficient associated with a +1 / -1 coded
dichotomous predictor is the natural log of
the OR associated with risk presence!!!
Risk Present
1 .335
OR e
2 1
e .670 1.954
2 1
e .670 1.954
(LCL,UCL)
2nd
We estimate that the odds for having a low birth weight infant are between 1.86
and 2.05 times higher for smokers than non-smokers, with 95% confidence.
ln
p
o 1 X 1 2 X 2 p X p
1 p
p
5.31 .111 Age
1 p
p
Odds
e 5.31.111 Age
1 p
ln
Race[ Black ]
Race[Other ]
Calculate the odds for low birth weight for each race (Low, Norm)
White Infants (reference group, missing in parameters)
Odds Ratios for the other factors in the model can be computed
as well. All of which can be prefaced by the adjusting for
statement.
Summary
In logistic regression the response (Y) is a