
Mike Blyth, M.D., M.P.H.

February 2006

A way to look at the effect of numeric* independent variables on a binary (yes-no) dependent variable

*(interval or ratio)

Dependent variable is continuous: interval or ratio (numeric)
Independent variables are also interval or ratio

Examples:
Effect of weight on blood pressure
Effect of drug dose on reticulocyte count

[Figure: plots of the dependent variable against the independent variable]

Dependent variable is a binary (yes/no) outcome
Independent variables are continuous (interval or ratio)

Examples:
Relation of weight and BP to 10-year risk of death
Relation of CD4 count to 1-year risk of AIDS diagnosis


We could just use a categorical analysis, such as a frequency table:

                   AIDS   No AIDS
CD4 > 350           80      20
150 < CD4 < 350     50      50
CD4 < 150           20      80

Problems:
Some information is lost when we collapse the numeric data into categories. This leads to loss of power.
No estimate of the magnitude of the relation

Probability:
p = probability of event
1 - p = probability of not having the event (also called q)
p varies from 0 to 1

Odds
Ratio of the probability of the event to the probability of not having the event:
Odds = p/(1 - p)
When p = 0.5, odds = 1 (or 1:1 odds)
When p = 0.1, odds = 0.1/0.9 = 0.11
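The conversion between probability and odds can be sketched in a few lines of Python. The function names are illustrative and not from the slides.

```python
def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1 - p)

def probability(odds_value):
    """Probability corresponding to the given odds."""
    return odds_value / (1 + odds_value)

print(odds(0.5))            # 1.0, i.e. 1:1 odds
print(round(odds(0.1), 2))  # 0.11
```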

The log odds (also called the logit) is simply the natural logarithm of the odds:
logit = ln(odds) = ln(p/(1-p)) = ln(p) - ln(1-p)

ln (1) = 0, so logit is 0 when odds are 1:1, or probability = 50%


The logit for an event with probability p is the negative of the logit for not having the event: logit(1-p) = -logit(p).
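A short Python check of these two properties. The function name logit is chosen here for illustration.

```python
import math

def logit(p):
    """Natural log of the odds for probability p (requires 0 < p < 1)."""
    return math.log(p / (1 - p))

print(logit(0.5))                          # 0.0, since the odds are 1:1
print(round(logit(0.8), 3), round(logit(0.2), 3))  # equal size, opposite sign
```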

[Figure: relation between probability p (vertical axis, 0 to 1) and logit = ln[p/(1-p)] (horizontal axis, -8 to 8)]
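The curve can be tabulated by inverting the logit, p = 1/(1 + e^(-logit)). This is a minimal sketch; the function name inverse_logit is an assumption made for illustration.

```python
import math

def inverse_logit(z):
    """Probability p whose logit is z."""
    return 1 / (1 + math.exp(-z))

# Tabulate p over the range of logit values shown on the horizontal axis
for z in range(-8, 9, 2):
    print(f"logit = {z:3d}   p = {inverse_logit(z):.3f}")
```

The probability stays between 0 and 1 however large or small the logit becomes, which is why the logit is a convenient scale for a regression equation.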

The linear regression model with one variable is


y = a + bx + e

The logistic regression model with one variable is

logit = a + bx

where logit is defined as ln(p/(1-p))

In other words, the model says the log odds of the event happening are the sum of:
A constant (a)
Some other constant (b) times a numeric risk factor (x) (for example, SBP)

Given the values of the independent variables, the regression equation predicts the log odds.

The statistics program calculates the coefficient b
The coefficient b shows how much the odds change with a change in the independent variable
Positive b: higher risk with higher values
Negative b: lower risk with higher values
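To give a feel for what the statistics program does, here is a minimal sketch of fitting logit = a + bx by maximum likelihood with Newton-Raphson iteration. This is an assumption about a typical approach, not a description of how Epi Info or any particular package works, and the data are made up.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit logit = a + b*x by Newton-Raphson (maximum likelihood)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        ps = [1 / (1 + math.exp(-(a + b * x))) for x in xs]
        # Gradient of the log-likelihood with respect to a and b
        g_a = sum(y - p for y, p in zip(ys, ps))
        g_b = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix entries, weighted by p(1 - p)
        ws = [p * (1 - p) for p in ps]
        h_aa = sum(ws)
        h_ab = sum(w * x for w, x in zip(ws, xs))
        h_bb = sum(w * x * x for w, x in zip(ws, xs))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system directly
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
    return a, b

# Made-up data: x is a numeric risk factor, y is 1 if the event occurred
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_logistic(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Here b comes out positive because the event becomes more common as x increases.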

Hypothetical example given above, examining the relation of BP to risk of stroke/death. The model predicts:

ln(odds) = c + b × SBP, where c is a constant

e^(ln odds) = e^(c + b × SBP)

Odds = e^(c + b × SBP) = e^c × e^(b × SBP)

The coefficient b shows how much the odds change with a change in the independent variable:

Odds = e^c × e^(bx)

In other words,

Odds = constant × (e^b)^x

So e^b is the factor indicating the effect of x on the event. Each one-unit change in x multiplies the odds by a factor of e^b.

Suppose b = 0.693, so e^b = 2
A one-unit change in x will double the odds

Suppose b = -0.693, so e^b = 0.5
A one-unit change in x will halve the odds

If b = 0, e^b = 1, and x has no effect on the odds
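The doubling and halving can be checked numerically. In this sketch the value 0.05 for the constant is a hypothetical choice made only for illustration.

```python
import math

def odds_at(x, constant, b):
    """Odds predicted by the model: constant * (e^b)^x."""
    return constant * math.exp(b) ** x

constant = 0.05  # hypothetical value of e^c
for b in (0.693, -0.693, 0.0):
    factor = odds_at(3, constant, b) / odds_at(2, constant, b)
    print(f"b = {b:6.3f}   a one-unit change in x multiplies the odds by {factor:.2f}")
```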

For the hypothetical example above, the report given by Epi Info is:

Term      Odds Ratio  95% CI           Coefficient  S.E.    Z       P-value
BP        1.0597      1.0220, 1.0987   0.0579       0.0185  3.1319  0.0017
Constant  *           *                -7.2014      2.2994  3.1319  0.0017


Coefficient, or beta, or b, is the slope or magnitude of the effect.

A one-unit change in BP multiplies the odds by 1.0597.



Odds ratio for a one-unit change in the independent variable (e.g. BP). This is the calculated e^b.


95% confidence interval for that odds ratio.

The confidence interval does not include 1, so the effect is statistically significant
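The odds ratio, confidence interval, Z and P-value in the report can be approximately reproduced from the coefficient and its standard error alone. This sketch assumes a normal-approximation (Wald) interval with the multiplier 1.96; whether Epi Info computes it exactly this way is an assumption. Small differences from the report come from rounding of the printed coefficient.

```python
import math

b, se = 0.0579, 0.0185  # coefficient and S.E. for BP from the report

odds_ratio = math.exp(b)
ci_low = math.exp(b - 1.96 * se)   # 1.96: assumed normal multiplier for 95%
ci_high = math.exp(b + 1.96 * se)
z = b / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value

print(f"OR = {odds_ratio:.4f}, 95% CI {ci_low:.4f} to {ci_high:.4f}")
print(f"Z = {z:.3f}, P = {p_value:.4f}")
```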

Single variable:
logit = c + bx
Odds = e^c × (e^b)^x

Multiple variables:
logit = c + b1·x1 + b2·x2 + ... + bn·xn
Odds = e^c × (e^b1)^x1 × (e^b2)^x2 × ... × (e^bn)^xn

Note that the terms multiply their effects on the odds.
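A small sketch showing that adding terms on the logit scale is the same as multiplying factors on the odds scale. The coefficients and values below are hypothetical, chosen only for illustration.

```python
import math

def odds_from_model(constant, coefficients, values):
    """Odds = e^c times the product of (e^b_i)^x_i over all variables."""
    result = math.exp(constant)
    for b, x in zip(coefficients, values):
        result *= math.exp(b) ** x
    return result

c = -5.0           # hypothetical constant
bs = [0.02, 1.1]   # hypothetical coefficients (e.g. SBP, a 0-1 risk factor)
xs = [140, 1]      # hypothetical values of those variables

multiplied = odds_from_model(c, bs, xs)
summed = math.exp(c + sum(b * x for b, x in zip(bs, xs)))
print(multiplied, summed)  # the two agree up to rounding error
```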

Analysis reports a b coefficient for each independent variable. That coefficient is the effect of the given independent variable, separated from the effects of all the other independent variables.

Prospective cohort study of causes of cardiac disease: Evans County Study, 1965
Independent variables = age, gender, race, social index, SBP, diabetes, smoking, cholesterol, and an obesity index
Dependent variable = risk of dying during a 10-year period

Variable        Range        b coeff   SE       p
Constant                     -6.376    1.634    <0.001
Age             40-69 y      0.086     0.115    <0.001
Gender          0=m, 1=f     1.500     0.967    0.121
Age x gender                 -0.043    0.017    0.011
Social index    20-84        -0.056    0.040    0.160
(Soc ind)^2     400-7056     0.0006    0.0003   0.082
SBP             88-310       0.019     0.002    <0.001
Diabetes        0=n, 1=y     1.123     0.261    <0.001
Smoking         0=n, 1=y     0.317     0.157    0.043
Cholesterol     94-546       0.0031    0.0015   0.041
Quetelet index  2.11-8.76    -1.064    0.432    0.014
(Quetelet)^2    4.44-76.8    0.112     0.049    0.022

Cited in Kelsey et al., Methods in Observational Epidemiology, 1986


Variable  Range      b coeff  SE     p
Age       40-69 y    0.086    0.115  <0.001
Gender    0=m, 1=f   1.500    0.967  0.121

The p value indicates statistical significance.
Age is positively correlated with risk of death.
Gender has a positive b coefficient, but the p value is 0.12, indicating that we cannot say that there is a significant relationship.

Variable  Range      b coeff  SE     p
Constant             -6.376   1.634  <0.001
Age       40-69 y    0.086    0.115  <0.001
Gender    0=m, 1=f   1.500    0.967  0.121

Gender is coded as 0 for male, 1 for female.
e^b [e^1.5 = 4.48] is the factor by which the odds change for a 1-unit change in gender, i.e. the OR for females relative to males.
e^b for any dummy variable (coded 0-1) is the adjusted OR for that risk factor, since 1 unit of change = presence vs. absence of the risk factor.
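The same calculation applies to each 0-1 variable in the table. A short sketch using the b coefficients reported above for the three dummy variables:

```python
import math

# b coefficients for the 0-1 (dummy) variables in the Evans County table
dummy_coefficients = {"Gender": 1.500, "Diabetes": 1.123, "Smoking": 0.317}

# Adjusted odds ratio for each dummy variable is e^b
odds_ratios = {name: math.exp(b) for name, b in dummy_coefficients.items()}

for name, value in odds_ratios.items():
    print(f"{name}: adjusted OR = {value:.2f}")
```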

Variable      Range     b coeff  SE      p
Age x gender            -0.043   0.017   0.011
Social index  20-84     -0.056   0.040   0.160
(Soc ind)^2   400-7056  0.0006   0.0003  0.082

Social index squared is included as well as social index itself. Squared terms allow for curvilinear relationships, just as in ordinary regression.
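To see the curve the squared term produces, the two social index coefficients from the table can be combined into their joint contribution to the logit. This is an illustrative sketch; the function name is an assumption.

```python
b_social, b_social_sq = -0.056, 0.0006  # coefficients from the table

def social_index_contribution(s):
    """Contribution of social index s to the logit: b1*s + b2*s^2."""
    return b_social * s + b_social_sq * s ** 2

# The parabola turns where its derivative b1 + 2*b2*s is zero
turning_point = -b_social / (2 * b_social_sq)
print(f"turning point at social index {turning_point:.1f}")

for s in (20, 47, 84):
    print(s, round(social_index_contribution(s), 3))
```

With these coefficients the contribution falls as social index rises to about 47 and then rises again, a pattern a single linear term could not represent.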

Variable      Range             b coeff  SE     p
Age           40-69 y           0.086    0.115  <0.001
Gender        0=m, 1=f          1.500    0.967  0.121
Age x gender  M: 0, F: 40-69    -0.043   0.017  0.011

Age and gender are entered into the model as separate terms.

Age x gender is included to see whether age has a different effect in males than in females.
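Because the product term is zero for males and equal to age for females, the effective age coefficient differs by gender. A minimal sketch using the coefficients from the table (the function name is illustrative):

```python
import math

b_age, b_age_x_gender = 0.086, -0.043  # coefficients from the table

def age_coefficient(gender):
    """Effective coefficient for age; gender is 0 for male, 1 for female."""
    return b_age + b_age_x_gender * gender

for label, gender in (("male", 0), ("female", 1)):
    per_year = math.exp(age_coefficient(gender))
    print(f"{label}: each additional year multiplies the odds by {per_year:.4f}")
```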

With binary (dummy) variables, e^b is the odds ratio. You can compare the strength (slope) of the effect by comparing b.
With numeric variables, b is not a direct measure of strength of effect.
Example: b is quite small for the effect of BP on mortality, because it is the effect of only a 1 mmHg change in BP. BP is still an important factor in mortality because there is a wide range in BP.
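The point can be made concrete with the SBP coefficient from the Evans County table (b = 0.019). The 40 mmHg difference used below is an arbitrary illustrative choice, and the other variables are assumed to be held fixed.

```python
import math

def odds_ratio_for_change(b, delta):
    """Odds ratio for a change of delta units in a numeric variable."""
    return math.exp(b * delta)

b_sbp = 0.019  # SBP coefficient from the table
print(round(odds_ratio_for_change(b_sbp, 1), 4))   # 1 mmHg: close to 1
print(round(odds_ratio_for_change(b_sbp, 40), 2))  # 40 mmHg: roughly doubles the odds
```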

In a prospective cohort study we can use the logistic regression model to predict the probability of the event given the independent variables. We can also derive the relative risk.

In a cross-sectional study we only have the odds ratio.
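For the cohort case, a sketch of how predicted probabilities and a relative risk could be derived from a fitted model. It reuses the coefficients from the hypothetical Epi Info BP example; the BP values 120 and 160 are assumptions chosen only for illustration.

```python
import math

c, b = -7.2014, 0.0579  # constant and BP coefficient from the hypothetical report

def predicted_probability(bp):
    """Probability of the event at a given BP: invert the logit c + b*BP."""
    return 1 / (1 + math.exp(-(c + b * bp)))

p_120 = predicted_probability(120)
p_160 = predicted_probability(160)
relative_risk = p_160 / p_120
odds_ratio = math.exp(b * (160 - 120))

print(f"p(120) = {p_120:.3f}, p(160) = {p_160:.3f}")
print(f"relative risk = {relative_risk:.2f}, odds ratio = {odds_ratio:.2f}")
```

When the event is common, as in this hypothetical example, the odds ratio is much larger than the relative risk, so the two should not be used interchangeably.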

Same principle as with ordinary regression

Forward selection: add one variable at a time until there are no more that make a significant difference
Backward selection: start with all variables, then remove one at a time to see whether each made a significant contribution
Epi Info has suggestions on how to do this
