Logistics LT
LOGISTIC REGRESSION
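[Figure: the logistic function $f(z) = \frac{1}{1 + e^{-z}}$; $f(z)$ rises from 0 through 1/2 to 1 as $z$ increases toward $+\infty$.]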
, Ph.D.(Statistics)
II
.. ()
M.P.H. (Epidemiology)
Grad. Dip. Medical Statistics
Ph.D. (Statistics)
III
(Multivariable analysis)
(Logistic
Regression)
2543
II
David G. Kleinbaum
2 5
III
5 4
1
2
STATA
IV
Chapter 1 (Introduction)
1. Logistic Regression
1.1
1.2
1.3
1.4 Logistic regression
1.5
2. (E) (D)
2.1 Crude analysis
2.2 Stratified analysis
2.2.1 Confounding Effect
2.2.2 Interaction effect
3. Logistic regression
References, Chapter 1
Exercises, Chapter 1
Chapter 2 Logistic Regression
1. Logistic Model
2. Logistic Model
References, Chapter 2
Exercises, Chapter 2
Chapter 3 Odds Ratio Logistic Regression Model
1.
2. OR Additive Model
2.1 0,1
2.2 0,1
2.3
2.4
2.5
2.6
3. OR Multiplicative model
3.1 Second order term
3.2 Third order term
4. OR
5. OR
References, Chapter 3
Exercises, Chapter 3
Chapter 4 Odds Ratio Logistic Regression Model
References, Chapter 4
Exercises, Chapter 4
Chapter 5 Logistic Regression Model
1. Model
1.1 Model
1.2 Model
1.3 Model
1.3.1 Multicollinearity
1.3.2 Multiple Testing
1.3.3 Outlier
1.3.4 Non-linear Relationship
2. Model
3. Model
4. Interaction Effect Confounding Effect
5. Conditional Logistic Regression Unconditional Logistic Regression
References, Chapter 5
Exercises, Chapter 5
Chapter 1
(Introduction)
1. Logistic Regression
2. (Crude analysis)
3. Stratified Analysis
1. Logistic Regression
1. 1
2.
3.
1. Logistic Regression
1.1
1.1
Logistic Regression
LOGISTIC REGRESSION
Exposure (E) ? Disease (D)
Example:
D = Coronary Heart Disease (CHD)
E = Smoking (SMK)
SMK ? CHD
1.2 D
(Dependent Variable) D
E D
SMK CHD Outcome
Response variable (event)
E (Independent
Variable)
Independent variable Dependent variable
predictor Outcome
Explanatory variable Response variable
Predictor Explanatory Variable
(factor)
1.2
(CHD)
(SMK)
1.3
(Effect of smoking on
coronary heart disease)
(factor)
(
,
2542)
Factor of interest Extraneous factors
Extraneous factors
(Extraneous
Extraneous variables
factors)
Covariates
Controlled variables
Confounders ()
1.3
2
1.4 1)
(Extraneous factors) (Matching)
1. (Design stage) (Restriction)
Matched
Restriction in study design
(Randomization) 2)
Randomization
2. (Analysis stage)
Subgroup analysis
Stratified analysis
Multivariable analysis Logistic regression (Subgroup analysis)
(Stratified analysis)
(Multivariable analysis)
Logistic regression
Multivariable analysis
( Dichotomous)
1.5
Logistic regression
2
2.
(E) (D)
Logistic regression
2.1 Crude analysis
1.8
2 X 2
Individual records:
ID    E    D
1.    1    1
2.    1    0
3.    0    1
4.    0    0
5.    1    1
6.    1    0
7.    0    0
8.    0    0
9.    1    1
10.   1    1

General 2 x 2 layout:
            D = 1    D = 0    Total
E = 1         a        b       a+b
E = 0         c        d       c+d
Total        a+c      b+d      n = a+b+c+d

The ten records tabulated:
            D = 1    D = 0    Total
E = 1         4        2        6
E = 0         1        3        4
Total         5        5       10

(Crude analysis) Bivariate analysis
Dichotomous
ID E (1= 0=) D (1= 0=)
Dichotomous
E
D (Contingency table)
22 SMK 0
1
0
1 4
(cell)
1. (Measure of association)
1.1) Relative Risk
1.2) Odds Ratio
2. (Test of association) between E and D
Chi-square test (χ²-test)
Fisher's exact test
McNemar test
Binomial probability test
Note: the chi-square test is valid when no more than 20% of the cells have an expected value below 5; otherwise use Fisher's exact test.
(Measure of association): Relative Risk (RR) for a cohort study; Odds Ratio (OR) for a cross-sectional or case-control study.
1.9
Crude analysis

            CHD = 1   CHD = 0   Total
SMK = 1        42       203      245
SMK = 0         7       107      114
Total          49       310      359

Cohort study: X2 = 7.99, df = 1, p-value = 0.005; RR = 2.8, 95% CI: 1.3 to 6.0
Cross-sectional or case-control study: X2 = 7.99, df = 1, p-value = 0.005; OR = 3.2, 95% CI: 1.4 to 7.1

Here a = 42, b = 203, c = 7, d = 107, so a+b = 245, c+d = 114 and n = 359. In a cohort study the row totals are fixed; in a cross-sectional study only the total n = 359 is fixed.
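$OR = \frac{ad}{bc}$
$95\%\ \mathrm{CI\ of\ OR} = OR \times \exp\left[\pm 1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right]$
RR, for a cohort study: $RR = \frac{a/(a+b)}{c/(c+d)} = 2.8$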
Cross-
sectional Case-control study
RR.
RR. OR
OR 3.2
OR = ad/bc
RR OR
95% [95% Confidence Interval (95%CI.)]
OR 1.4 7.1
1 1
1.4 < 3.2< 7.1
p-value RR OR
(p-value < 0.05) RR
OR
95% CI RR.
OR
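The crude measures above can be checked with a short sketch. It assumes the table layout used in this chapter (rows are exposure, columns are disease, so a and b are the exposed row) and uses the log-based formula with 1.96 for the interval. The function name `crude_measures` is illustrative and not part of the original text.

```python
import math

def crude_measures(a, b, c, d, z=1.96):
    """Return RR, OR and a log-based 95% CI for the OR.

    Assumed layout: a, b = exposed row (D=1, D=0); c, d = unexposed row.
    """
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = odds_ratio * math.exp(-z * se_log_or)
    upper = odds_ratio * math.exp(z * se_log_or)
    return rr, odds_ratio, (lower, upper)

# SMK by CHD counts from the crude analysis example
rr, odds_ratio, ci = crude_measures(42, 203, 7, 107)
print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}, 95% CI = {ci[0]:.2f} to {ci[1]:.2f}")
```

The interval from this formula can differ slightly from the one quoted in the text, which may have been produced by a different method.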
1.10
(E) 1
E D
:
E, C
SMK, SEX
D
CHD
            CHD = 1   CHD = 0
SMK = 1        50        10
SMK = 0        50        90
X2 = 38.10, P < 0.01, OR = 9.0, 95% CI = 4.0 to 20.8

?

            CHD = 1   CHD = 0
SEX = M        90        50
SEX = F        10        50
X2 = 38.10, P < 0.01, OR = 9.0, 95% CI = 4.0 to 20.8
X SMK
CHD SEX CHD
9
9
SMK CHD
SEX CHD
Stratified analysis (2)
Stratum specific OR
Test of homogeneity of OR p-value > 0.05
Stratum Specific OR
(
)
ORMH ORC
Stratum specific OR
ORMH
Confounding effect Confounding effect
ORMH ORC
ORMH ORC ORMH
()
Confounding effect C
Confounder
E D ORMH
ORMH ORC
ORMH
ORC
ORMH
C
ORC ORMH
1.13
Confounding Effect

Crude table:
            CHD = 1   CHD = 0
SMK = 1       140        60
SMK = 0        60       140
ORc = 5.4, 95% CI = 3.5 to 8.3

Stratum 1 (SEX):
            CHD = 1   CHD = 0
SMK = 1        50        10
SMK = 0        50        90
OR1 = 9.0, 95% CI = 4.2 to 19.0

Stratum 2 (SEX):
            CHD = 1   CHD = 0
SMK = 1        90        50
SMK = 0        10        50
OR2 = 9.0, 95% CI = 4.2 to 19.0

Test of homogeneity of odds ratios: p-value = 0.999; ORMH = 9.0 (95% CI: 5.2 to 15.4)

The crude odds ratio (ORC) between SMK and CHD is 5.4. After stratifying by SEX, OR1 in stratum 1 is 9 and OR2 in stratum 2 is 9. Because OR1 = OR2 (test of homogeneity of odds ratios, p-value = 0.999), the stratum-specific odds ratios are summarized by ORMH = 9.0. ORMH differs from ORC (9.0 versus 5.4), which indicates a confounding effect: SEX is a confounder, and the estimate to report is ORMH.
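As a rough check of the stratified example, the sketch below computes the crude, stratum-specific and Mantel-Haenszel odds ratios from the two SEX strata. It uses the standard Mantel-Haenszel estimator, the sum of ad/n divided by the sum of bc/n; the function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """OR for one 2 x 2 table: a, b = exposed row; c, d = unexposed row."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled OR: sum(a*d/n) / sum(b*c/n) over strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# (a, b, c, d) for stratum 1 and stratum 2 of SEX
strata = [(50, 10, 50, 90), (90, 50, 10, 50)]

# Collapsing the strata cell by cell gives the crude table (140, 60, 60, 140)
crude = tuple(sum(cells) for cells in zip(*strata))

or_crude = odds_ratio(*crude)
or_strata = [odds_ratio(*s) for s in strata]
or_mh = mantel_haenszel_or(strata)
print(round(or_crude, 1), or_strata, or_mh)
```

Comparing `or_mh` with `or_crude` is the comparison the text uses to judge a confounding effect.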
"
ORMH = 9 (95% CI: 5.2 to 15.4)" (p-value < 0.001)
1.14
Output from STATA

             SEX |       OR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
               1 |        9      4.241913   19.04442          2.5
               2 |        9      4.241913   19.04442          2.5
-----------------+-------------------------------------------------
           Crude |  5.444444       3.5527   8.343513
    M-H combined |        9      5.251333   15.42465
-----------------+-------------------------------------------------
Test of homogeneity (M-H)      chi2(1) =     0.00  Pr>chi2 = 1.0000

                   Test that combined OR = 1:
                                Mantel-Haenszel chi2(1) =     75.81
                                                Pr>chi2 =    0.0000

The p-value comes from the Mantel-Haenszel chi-square test that ORMH = 1.
STATA ( 1)
1.15
Confounding Effect Confounding Effect
E D
(ORC)
ORc X
ORMH
ORMH ORMH
OR
C
ORMH OR Stratum
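1.17
Interaction Effect

Crude table:
            CHD = 1   CHD = 0
SMK = 1        75        60
SMK = 0        75        90
ORc = 1.5, 95% CI = 0.9 to 2.4

Stratum 1 (ACT):
            CHD = 1   CHD = 0
SMK = 1        50        50
SMK = 0        50        50
OR1 = 1.0, 95% CI = 0.6 to 1.7

Stratum 2 (ACT):
            CHD = 1   CHD = 0
SMK = 1        25        10
SMK = 0        25        40
OR2 = 4.0, 95% CI = 1.7 to 9.6

Test of homogeneity of odds ratios: p-value = 0.009; ORMH = 1.5 (95% CI: 0.9 to 2.4)

The crude odds ratio between smoking (SMK) and CHD is 1.5. After stratifying by ACT, the odds ratio is 1 in the first stratum and 4 in the second. The stratum-specific odds ratios differ (test of homogeneity, p-value = 0.009), so the effect of SMK on CHD depends on the level of ACT. This is an interaction effect: C modifies the effect of E on D and is called an Effect Modifier.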
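The homogeneity check in this example can be sketched with a Woolf-type test, which compares each stratum's ln(OR) with their inverse-variance weighted mean. This is an assumption about the method: the statistic printed by STATA is labelled M-H and may differ slightly from Woolf's. The helper names are illustrative.

```python
import math

def woolf_homogeneity(strata):
    """Woolf-type chi-square for homogeneity of odds ratios across strata."""
    log_ors, weights = [], []
    for a, b, c, d in strata:
        log_ors.append(math.log((a * d) / (b * c)))
        # Weight = 1 / variance of ln(OR)
        weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))
    pooled = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    chi2 = sum(w * (l - pooled) ** 2 for w, l in zip(weights, log_ors))
    return chi2, len(strata) - 1

def chi2_upper_tail_df1(x):
    """P(chi-square with 1 df > x); valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(x / 2))

# (a, b, c, d) for the two ACT strata
strata = [(50, 50, 50, 50), (25, 10, 25, 40)]
chi2, df = woolf_homogeneity(strata)
p_value = chi2_upper_tail_df1(chi2)
print(f"chi2({df}) = {chi2:.2f}, p-value = {p_value:.4f}")
```

A small p-value here points to heterogeneous stratum-specific odds ratios, which is the situation the text describes as an interaction effect.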
1.18
Output from STATA

             ACT |       OR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
               1 |        1      .5754505   1.737769         12.5
               2 |        4      1.663145   9.596975          2.5
-----------------+-------------------------------------------------
           Crude |      1.5      .9504315   2.367332
    M-H combined |      1.5       .949093   2.370685
-----------------+-------------------------------------------------
Test of homogeneity (M-H)      chi2(1) =     6.75  Pr>chi2 = 0.0094

                   Test that combined OR = 1:
                                Mantel-Haenszel chi2(1) =      3.07
                                                Pr>chi2 =    0.0796

STATA ( 1)
E may show a Risk effect (OR > 1) when C = 1 but a Protective effect (OR < 1) when C = 2.
C
C
1.20
ORc Confounding
Interaction effect Confounder Effect
Modifier
Confounder
E D
Effect Modifier
Stratified analysis
E
D Dichotomous
C Dichotomous
Polytomous 2
Strata
C
3. Logistic regression
1.21
Confounding effect
Interaction effect
Logistic Regression Logistic Regression
Confounder
Model Confounder Logistic regression
model
OR Model OR Adjusted Odds
Adjusted Odds Ratio
Ratio
Interaction effect Interaction effect
1.22
E D : Crude Analysis E D
E,C D : Stratified Analysis Crude
E, C1, C2,,Cn D :?
analysis
:
C1 = AGE (C)
C2 = OCC
C3 = SEX Stratified analysis
E, C1, C2,,Cn D
C
Logistic Regression Analysis
1 (AGE)
(OCC) (SEX)
Stratified analysis
Logistic Regression
1.23 Model OR
Logistic Regression Model Model
95% CI
$\operatorname{Logit} P(D=1) = a + \sum_i b_i X_i$
OR Logistic regression
STATA
1.24
(Confounding effect: Crude analysis in STATA, giving the OR and 95% CI)
. cc CHD SMK
                 |   SMK                  |             Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+----------------------
           Cases |       140          60  |        200      0.7000
        Controls |        60         140  |        200      0.3000
-----------------+------------------------+----------------------
           Total |       200         200  |        400      0.5000
                 |                        |
                 |      Point estimate    |  [95% Conf. Interval]
                 |------------------------+----------------------
      Odds ratio |         5.444444       |    3.5527   8.343513  (Cornfield)
 Attr. frac. ex. |         .8163265       |  .7185239   .8801464  (Cornfield)
 Attr. frac. pop |         .5714286       |
                 +-----------------------------------------------
                               chi2(1) =    64.00  Pr>chi2 = 0.0000
Confounder
Crude analysis
ORC = 5.4
------------------------------------------------------------------------------
SEX
CHD | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
SMK | 9 2.473863 7.994 0.000 5.251333 15.42465
SEX | .36 .0989545 -3.717 0.000 .2100533 .6169862
------------------------------------------------------------------------------
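To see where adjusted odds ratios like these come from, the sketch below fits Logit P(CHD=1) = a + b1 SMK + b2 SEX by maximum likelihood with plain Newton-Raphson iterations. It assumes the grouped counts of the SMK by CHD by SEX confounding example (SEX coded 1 and 2); the solver and function names are illustrative and this is not how STATA is implemented internally.

```python
import math

# Grouped data (sex, smk, chd, n) from the confounding example
rows = [
    (1, 1, 1, 50), (1, 1, 0, 10), (1, 0, 1, 50), (1, 0, 0, 90),
    (2, 1, 1, 90), (2, 1, 0, 50), (2, 0, 1, 10), (2, 0, 0, 50),
]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_logit(rows, iterations=25):
    """Newton-Raphson maximum likelihood for (constant, SMK, SEX)."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for sex, smk, chd, n in rows:
            x = [1.0, float(smk), float(sex)]
            p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for i in range(3):
                grad[i] += n * (chd - p) * x[i]          # score vector
                for j in range(3):
                    info[i][j] += n * p * (1 - p) * x[i] * x[j]  # information matrix
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

beta = fit_logit(rows)
or_smk, or_sex = math.exp(beta[1]), math.exp(beta[2])
print(round(or_smk, 2), round(or_sex, 2))
```

Exponentiating the fitted coefficients gives the adjusted odds ratios that the output table reports for SMK and SEX.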
1
. (2541). .
. 3(3) :20-25.
Fleiss, J.L. (1981). Statistical methods for rates and proportions. 2nd edition. New York: John Wiley & Sons.
Kleinbaum, D.G., Kupper, L.L., and Morgenstern, H. (1982). Epidemiologic research: principles and quantitative methods. London: Lifetime Learning Publications.
1
1. 1 1.24 1.27
1.13 STATA
1.1
STATA
STATA
MALE
         CHD+   CHD-   Total
SMK+       50     10      60
SMK-       50     90     140
Total     100    100     200

FEMALE
         CHD+   CHD-   Total
SMK+       90     50     140
SMK-       10     50      60
Total     100    100     200

sex   smk   chd     n
  1     1     1    50
  1     1     0    10
  1     0     1    50
  1     0     0    90
  2     1     1    90
  2     1     0    50
  2     0     1    10
  2     0     0    50

SEX 1=Male, 2=Female
SMK 0=No, 1=Yes
CHD 0=No, 1=Yes
001
400
1. SEX [ ]
[ ]1. [ ]2.
2. SMK [ ]
[ ]0. [ ]1.
3. CHD [ ]
[ ]0. [ ]1.
1.2
STATA ( STATA
STATA )
1.2.1 STATA
(Menu Bar)
(Icon Bar)
1.2.2
.edit <ENTER> STATA Editor
1.2.3 1.1
( STATA var<X> X 1
) STATA Editor
4. 5. OK 6. Data Editor
.expand n <ENTER>
.drop n <ENTER> n
1.2.5
.list <ENTER>
: . records
2. 1.17
2.1 STATA
STATA
EXERCISE+ EXERCISE-
001
.
2.3 1.18
: ..
3. ANC
()
Cohort Study
()
28
1 939 944
65 28
(Neonatal death) 400 ( LOGISTIC.DTA
)
LOGISTIC.DTA
ID 1 465
DEAD 28 0= 1=
AREA 0= 1=
MALPRES 0= 1=
BWT ()
MAGE ()
DCHILD
()
3.1 3.7
3.1 Dichotomous
.................................................................................................. ......................................
3.2 Continuous
........................................................................................................................................
3.3 (D)
..................................................................................................................
3.4 (E)
..........................................................................................................
3.5 (C)
......................................
3.6
(
)
.
1. 28 DEAD[ ]
[ ]1. [ ]0.
2. AREA[ ]
[ ]1. [ ]0.
3. MALPRES[ ]
[ ]1. [ ]0.
4. ................ BWT[ ][ ][ ][ ]
5. .................................... AGE[
][ ]
6. DCHILD[ ]
..........
3.7
3.8.2 STATA
. cc dead area
| area | Proportion
| Exposed Unexposed | Total Exposed
-----------------+------------------------+----------------------
Cases | 37 28 | 65 0.5692
Controls | 204 196 | 400 0.5100
-----------------+------------------------+----------------------
Total | 241 224 | 465 0.5183
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Odds ratio | 1.269608 | .7512221 2.145309 (Cornfield)
Attr. frac. ex. | .2123552 | -.3311642 .5338668 (Cornfield)
Attr. frac. pop | .1208791 |
+-----------------------------------------------
chi2(1) = 0.79 Pr>chi2 = 0.3754
3.9.1.2 STATA
3.9.1.3
ORC = ......................................
OR1 = ......................................
OR2 = ......................................
Woolf's test p-value = ................
ORMH = ......................................
3.9.1.4
3.9.2.1
            DCHILD = 0           DCHILD = 1           (Total)
            DEAD=0   DEAD=1      DEAD=0   DEAD=1      DEAD=0   DEAD=1
AREA = 0      167      21          29        7          196      28
AREA = 1      170      17          34       20          204      37
3.9.2.2 STATA
3.9.2.3
ORC = ......................................
OR1 = ......................................
OR2 = ......................................
Woolf's test p-value = ................
ORMH = ......................................
3.9.2.4
3 = 3000
3.9.3.2 STATA
. gen bwtg = .
(465 missing values generated)
. replace bwtg = 1 if bwt < 2500
(39 real changes made)
. replace bwtg = 2 if bwt >= 2500 & bwt < 3000
(140 real changes made)
. replace bwtg = 3 if bwt >= 3000
(286 real changes made)
. cc dead area, by(bwtg)
3.9.3.4
Logistic Regression
2
:
1. Multivariable analysis
3. Logistic Model
:
1. Logistic Regression Model
1. 1
2.
3.
1. Logistic Model
1
2.1
(E)
E D
E,C1 D (D) (C)
E,C1,C2,C3 D
X
X1, X2, ... Xk D (Product term)
:
X1 = E                      X4 = E*C1
X2 = C1   (Main effect)     X5 = C1*C2   (Product term)
X3 = C2                     X6 = E^2
X1 1 X2 2
Xk K
6 X1
E X2 C1 X3 C2
X4 E C1 X5
C1 C2 X6
E
X4 X5 X6 Product term
X4 X5 Interaction term
Interaction effect
C1 C2 Product term
2.2
(Multi-factorial Outcome)
X1, X2,, XK D
(Mathematical model)
Mathematical Model
(D)
Logistic Model D Dichotomous Dichotomous
Logistic Model
2.3
Y = a + bX
Intercept
(Simple Linear Regression)
() (Slope ) Y = a + bX a Y
Y
Y X 0 Y
y = a+bX Intercept
Simple Linear Regression
a
X Slope b
Y +
Y X
0 Y +1 Y -
(, )
+
Y
Dichotomous 2 1
0
Dichotomous
0 1
2.4
Logistic
Logistic Function: $f(z) = \frac{1}{1 + e^{-z}}$
$0 \le f(z) \le 1$
$e^{-z} = 2.7183^{-z}$
= Exponential(-z)
= EXP(-z)
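A small sketch of the logistic function shows why it suits a dichotomous outcome: whatever the value of z, f(z) stays between 0 and 1. The two-branch form is only a numerical precaution against overflow for large negative z; it is an implementation choice, not something taken from the text.

```python
import math

def logistic(z):
    """Logistic function f(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # safe: z is negative here
    return ez / (1.0 + ez)

for z in (-10, -2, 0, 2, 10):
    print(z, round(logistic(z), 4))
```

At z = 0 the function returns 1/2, and it approaches 0 and 1 as z moves toward minus and plus infinity.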
2.6
Logistic Model:
$f(z) = \frac{1}{1 + e^{-z}}$, with $z = a + \sum b_i x_i$
$P(X) = \frac{1}{1 + e^{-(a + \sum b_i x_i)}}$
a : (Constant)
b : (Coefficient)
(Estimation)
Fit Model
Maximum Likelihood
Fit Model
Maximum Likelihood
2. Logistic Model
2.7 ( Kleinbaum, 1994)
Cohort Study
609
Logistic Model
(CHD) Dichotomous
Y = CHD (0,1)
X1 = SMK (0,1)
X2 = AGE
X3 = ECG (0,1)
n = 609
The outcome CHD is coded 0 or 1. There are 3 independent variables: smoking (SMK), coded 1 or 0; age (AGE); and Electrocardiogram (ECG) status, coded 0 or 1. The follow-up time is 9 years.
Logistic Model:
$P(X) = \frac{1}{1 + e^{-(a + b_1 SMK + b_2 AGE + b_3 ECG)}}$
a b
Maximum Likelihood
Logistic Regression SAS,
BMDP, SPSS, STATA, GLIM, EGRET
STATA ( StataCorp.,
1999)
a = -3.911 b1 = 0.652
b2 = 0.029 b3 = 0.342
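2.8 Logistic Model
$P(X) = \frac{1}{1 + e^{-[-3.911 + 0.652(SMK) + 0.029(AGE) + 0.342(ECG)]}}$
SMK = ?  AGE = ?  ECG = ?
P(X) is the predicted risk of CHD for given values of SMK, AGE and ECG in the Model.

For SMK = 1, AGE = 40, ECG = 0:
$P(X) = \frac{1}{1 + e^{-(-3.911 + 0.652(1) + 0.029(40) + 0.342(0))}} = \frac{1}{1 + e^{-(-2.101)}} = \frac{1}{1 + 8.173} = 0.109$ (Predicted Risk)
(The rounded coefficients give an exponent of -2.099; the value -2.101 presumably reflects unrounded estimates.)
That is, a smoker (SMK = 1) aged 40 with ECG = 0 has a risk of CHD of 0.109, or 109 per 1,000, over the 9 (follow-up time).

For SMK = 0, AGE = 40, ECG = 0:
P(D) = 0.06, or 60 per 1,000, for a non-smoker with the same AGE and ECG.

The ratio of the two risks:
$\frac{P(X)\ \text{Smoker}}{P(X)\ \text{Non-smoker}} = \frac{0.109}{0.060} = 1.82$ = Risk Ratio or Relative Risk (RR)
Because AGE = 40 and ECG = 0 are held fixed in the Model, this RR is adjusted for AGE and ECG (Adjusted RR).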
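The predicted risks can be reproduced with a few lines of code. This sketch assumes the fitted values listed with the model (a = -3.911, b1 = 0.652, b2 = 0.029, b3 = 0.342); the function name `predicted_risk` is illustrative.

```python
import math

# Fitted constant and coefficients as listed for the SMK, AGE, ECG model
A, B_SMK, B_AGE, B_ECG = -3.911, 0.652, 0.029, 0.342

def predicted_risk(smk, age, ecg):
    """P(X) = 1 / (1 + exp(-(a + b1*SMK + b2*AGE + b3*ECG)))."""
    z = A + B_SMK * smk + B_AGE * age + B_ECG * ecg
    return 1.0 / (1.0 + math.exp(-z))

p_smoker = predicted_risk(1, 40, 0)
p_nonsmoker = predicted_risk(0, 40, 0)
rr = p_smoker / p_nonsmoker
print(round(p_smoker, 3), round(p_nonsmoker, 3), round(rr, 2))
```

Changing only the SMK argument while keeping AGE and ECG fixed is what makes the resulting ratio an adjusted RR.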
2.9 RR.
MEASURE OF ASSOCIATION from the Logistic Regression Model
Direct Method: RR, for a Cohort study only. $RR = \frac{P(X_1)}{P(X_0)}$, where computing P(X) requires the constant (a).
Indirect Method: OR (approximates RR), for Cohort / Cross-sectional / Case-control studies. $OR = e^{\sum_i b_i (X_{1i} - X_{0i})}$
X
OR
RR.
()
OR RR.
RR.
Case-
control Cross-sectional
(a)
RR.
2.10
OR from the Logistic Model:
$OR_{X_1, X_0} = e^{\sum_{i=1}^{k} b_i (X_{1i} - X_{0i})}$
Logistic Model: $P(X) = \frac{1}{1 + e^{-(a + \sum b_i x_i)}}$
Logit transformation:
$\operatorname{Logit} P(X) = \ln\left[\frac{P(X)}{1 - P(X)}\right] = \ln \mathrm{Odds}$
$\operatorname{Logit} P(X) = a + \sum b_i X_i$ (Logit Model, a Linear sum)
$\ln \mathrm{Odds} = a + \sum b_i X_i$
$\mathrm{Odds} = e^{a + \sum b_i X_i}$
The OR from the Logistic Regression Model is built from the Odds. Since Logit P(X) is the log (base e) of the Odds, taking the Exponential of $a + \sum b_i X_i$ gives the Odds.
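The step from the logistic function to the linear logit is short enough to write out. The derivation below uses the same symbols as the model, with z standing for the linear sum.

```latex
\begin{aligned}
P(X) &= \frac{1}{1 + e^{-z}}, \qquad z = a + \sum_i b_i X_i \\
1 - P(X) &= \frac{e^{-z}}{1 + e^{-z}} \\
\frac{P(X)}{1 - P(X)} &= \frac{1}{e^{-z}} = e^{z} \\
\operatorname{Logit} P(X) &= \ln\frac{P(X)}{1 - P(X)} = z = a + \sum_i b_i X_i
\end{aligned}
```

The third line is the Odds, so exponentiating the linear sum recovers the Odds directly.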
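2.12 OR from the Logistic Regression Model
X = (SMK, AGE, ECG)
X1 = (SMK = 1, AGE = 40, ECG = 0)
X0 = (SMK = 0, AGE = 40, ECG = 0)
X1 and X0 differ only in SMK; AGE (40) and ECG are fixed.
Logit P(X) = a + b1 SMK + b2 AGE + b3 ECG
$OR_{X_1, X_0} = e^{\sum_{i=1}^{k} b_i (X_{1i} - X_{0i})} = e^{b_1(1-0) + b_2(40-40) + b_3(0-0)} = e^{b_1 + 0 + 0} = e^{b_1}$
b1 = 0.652, $e^{0.652} = 1.92$, OR = 1.92
Because SMK differs by 1 - 0 = 1 between X1 and X0, the OR is the Exponential of b1. Smokers (SMK) have 1.92 times the odds of CHD of non-smokers, with AGE and ECG equal in X1 and X0.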
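The general formula can be written as a small function that compares any two risk profiles. The dictionary layout and the name `odds_ratio` are illustrative; the coefficients are the ones quoted for the SMK, AGE, ECG model.

```python
import math

def odds_ratio(coefs, x1, x0):
    """OR comparing profile x1 with x0: exp(sum_i b_i * (x1_i - x0_i))."""
    return math.exp(sum(b * (x1[name] - x0[name]) for name, b in coefs.items()))

coefs = {"SMK": 0.652, "AGE": 0.029, "ECG": 0.342}
x1 = {"SMK": 1, "AGE": 40, "ECG": 0}
x0 = {"SMK": 0, "AGE": 40, "ECG": 0}

or_smk = odds_ratio(coefs, x1, x0)
print(round(or_smk, 2))
```

Variables with the same value in both profiles contribute zero to the sum, so only the SMK coefficient matters here and the constant a is never needed.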
2
Kleinbaum, D.G. (1994). Logistic Regression: A self-learning text. New York: Springer-Verlag.
StataCorp. (1999). Stata statistical software: Release 6.0. College Station, TX: Stata Corporation.
2
1. 2 3
1.1) Logistic Function :
Output STATA
. logit dead area
------------------------------------------------------------------------------
dead | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | .2387081 .2697124 0.885 0.376 -.2899185 .7673346
_cons | -1.94591 .2020295 -9.632 0.000 -2.341881 -1.54994
------------------------------------------------------------------------------
2.2) OR
(i) ORx1,Xo =
(ii) X1 =.
X0
=.
(iii) ORX1,X0 =
STATA
. logistic dead area
------------------------------------------------------------------------------
dead | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | 1.269608 .342429 0.885 0.376 .7483246 2.154017
------------------------------------------------------------------------------
(iv) .............................................................
2.3) 2 x 2
DEAD
1 0
1
AREA
0 OR = ............................................
------------------------------------------------------------------------------
dead | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | .1419709 .2775134 0.512 0.609 -.4019454 .6858872
dchild | 1.321847 .287817 4.593 0.000 .7577363 1.885958
_cons | -2.255301 .2251246 -10.018 0.000 -2.696537 -1.814065
------------------------------------------------------------------------------
3.2) OR
(I) ORx1,Xo =
DCHILD X0 =..........................
(iii) ORx1,Xo =
STATA
. logistic dead area dchild
------------------------------------------------------------------------------
dead | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | 1.152543 .3198462 0.512 0.609 .6690173 1.985533
dchild | 3.750343 1.079412 4.593 0.000 2.133441 6.592668
------------------------------------------------------------------------------
(iv)
( A N )
------------------------------------------------------------------------------
hicrime[F] | Odds Ratio[G]  Std. Err.[H]  z[I]  P>|z|[J]  [95% Conf. Interval][K]
---------+--------------------------------------------------------------------
maleteen | 1.086959 .0478646 1.894 0.058 .9970804 1.184939
south | .3272305 .4449077 -0.822 0.411 .0227796 4.70068
educ | 1.023187 .5723757 0.041 0.967 .3418133 3.062818
police59 | 1.059909 .0222633 2.770 0.006 1.01716 1.104455
------------------------------------------------------------------------------
. logit
------------------------------------------------------------------------------
hicrime[F] | Coef.[L]  Std. Err.[M]  z[I]  P>|z|[J]  [95% Conf. Interval][N]
---------+--------------------------------------------------------------------
maleteen | .0833837 .0440353 1.894 0.058 -.0029239 .1696914
south | -1.117091 1.359616 -0.822 0.411 -3.781888 1.547707
educ | .0229224 .5594047 0.041 0.967 -1.073491 1.119335
police59 | .0581834 .0210049 2.770 0.006 .0170147 .0993522
_cons | -17.70177 9.495993 -1.864 0.062 -36.31357 .9100364
------------------------------------------------------------------------------
[B] This is the likelihood-ratio chi-square with 4 degrees of freedom. One degree of freedom is used for each predictor variable in the logistic regression model. The likelihood-ratio chi-square is defined as 2(L1 - L0), where L0 represents the log likelihood for the "constant-only" model and L1 is the log likelihood for the full model with constant and predictors. In this example, L0 = -25.573407 (which doesn't show up in the output) and L1 = -18.606959 (which is found in item [D] below). Thus, the likelihood-ratio chi-square = 2*(-18.606959 - (-25.573407)) = 13.93.
[C] This is the p-value associated with the chi-square with 4 degrees of freedom. The value of .0075 indicates that the model as a whole is statistically significant.
[D] This is the value of the log likelihood for the model including the constant and all of the predictors, computed using the maximum-likelihood logit model.
[E] Technically, R2 cannot be computed the same way in logistic regression as it is in OLS regression. The pseudo-R2, in logistic regression, is defined as 1 - L1/L0, where L0 represents the log likelihood for the "constant-only" model and L1 is the log likelihood for the full model with constant and predictors.
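The model-fit quantities in items [B], [C] and [E] can be recomputed from the two log likelihoods quoted in the text. This sketch assumes the pseudo-R2 takes the usual form 1 - L1/L0, and the closed-form tail probability used here holds only for an even number of degrees of freedom; the function name is illustrative.

```python
import math

ll_null = -25.573407   # L0: log likelihood of the constant-only model
ll_full = -18.606959   # L1: log likelihood of the full model
df = 4                 # one degree of freedom per predictor

lr_chi2 = 2 * (ll_full - ll_null)

def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) using the closed form that is valid for even df only."""
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

p_value = chi2_upper_tail_even_df(lr_chi2, df)
pseudo_r2 = 1 - ll_full / ll_null
print(round(lr_chi2, 2), round(p_value, 4), round(pseudo_r2, 4))
```

The first two printed values correspond to the chi-square and p-value discussed in items [B] and [C].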
[F] This column starts with the name of the response variable (hicrime) and then lists the names of the predictor variables (maleteen south educ police59).
[G] The odds ratio column gives the factor by which the odds are expected to change when there is a one unit change in the predictor variable with all of the other variables in the model held constant. An odds ratio close to 1.0 suggests that there is no change due to the predictor variable.
In this example, the odds ratio for police59 is 1.059909. Thus, you would predict that the odds of hicrime would be multiplied by 1.059909 for every one unit increase in police59 when maleteen, south and educ are held constant.
For a more detailed explanation of odds ratios see the Stata FAQ: How do I interpret odds ratios in logistic regression?
[H] The standard error for the odds ratio is obtained from the logistic regression coefficient and its standard error using the formula: Std. Err. (odds ratio) = exp(coef.) x Std. Err. (coef.). For police59, 1.059909 x .0210049 = .0222633.
[I] This column contains the z-statistic testing the logistic coefficient.
In the case of the logit command, z = (coef.)/(Std. Err.). For this example, z(police59) = .0581834/.0210049 = 2.770.
Stata uses the same z-test value computed for the logistic coefficient as the test of the odds ratio.
[J] This column contains the two-tail p-value for the z-test. Stata uses the same p-value, computed testing the hypothesis H0: b = 0, for both the logistic coefficients and the odds ratios.
[K] This column contains the 95% confidence intervals for the odds ratios. Significant effects are suggested when confidence intervals do not contain 1.0. In this example, the only interval that would be considered significant at the .05 level is the one for police59. All of the other confidence intervals contain the value 1.0.
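The columns for police59 can be tied together numerically. The sketch below starts from the coefficient and standard error in the logit output and rebuilds the z-statistic, the odds ratio, its standard error and its 95% confidence interval. It uses 1.96 as the normal quantile, so the last digits may differ slightly from the printed output.

```python
import math

coef, se = 0.0581834, 0.0210049   # police59 row of the logit output

z = coef / se                                    # z-statistic for H0: b = 0
odds_ratio = math.exp(coef)                      # odds ratio = exp(coefficient)
se_or = odds_ratio * se                          # standard error of the odds ratio
ci_or = (math.exp(coef - 1.96 * se),             # 95% CI computed on the log scale,
         math.exp(coef + 1.96 * se))             # then exponentiated

print(round(z, 3), round(odds_ratio, 6), round(se_or, 7))
print(tuple(round(v, 4) for v in ci_or))
```

Because the interval is exponentiated from the coefficient scale, it excludes 1.0 exactly when the coefficient interval excludes 0.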
[L] The coefficient column gives the values for the logistic regression coefficients. These coefficients indicate the amount of change expected in the log odds when there is a one unit change in the predictor variable with all of the other variables in the model held constant. A coefficient close to 0 suggests that there is no change due to the predictor variable.
There is a relationship between the logistic coefficients and the odds ratios: odds ratio = exp(coefficient). In this example the logistic coefficient for police59 is .0581834, and exp(.0581834) = 1.0599094, which is very close to the value of the odds ratio for police59.
Also in this example, the logistic coefficient for police59 is .0581834. Thus, you would predict that the log odds for hicrime would change by .0581834 for every one unit change in police59 when maleteen, south and educ are held constant.
The logistic coefficients can be used in a manner very similar to regression coefficients to generate predicted values. In this example,
predicted log odds = -17.70177 + .0833837*maleteen - 1.117091*south + .0229224*educ + .0581834*police59
You would get the same results if you used the predict command with the xb option.
[M] This column contains the standard error for the logistic regression coefficient, which is used to compute the z-test for the coefficient.
[N] This column contains the 95% confidence intervals for the logistic regression coefficients. Significant effects are suggested when confidence intervals do not contain 0. In this example, the only interval that would be considered significant at the .05 level is the one for police59. All of the other confidence intervals contain the value 0.
:
1.
2. Odds Ratio Model Main effect Additive model
2.1) (Categorical Variable)
2.1.1) 2 (Dichotomous)
2.1.1.1) 0,1
2.1.1.2) (Arbitrary coding of E)
2.1.2) 2
2.2) (Ordinal Scaled Variable)
2.3) (Continuous variable)
2.4) Odds Ratio Exposing factor
3. Odds Ratio Model Interaction term Multiplicative model
3.1) Interaction term (Second order term)
3.2) Interaction term (Third order term)
:
1. 1
2.
3.
1.
3.1
3.2 OR
Logistic Regression Model
:
Risk Profile 2
$OR_{X_1, X_0} = e^{\sum b_i (X_{1i} - X_{0i})}$ Risk Profile
Risk Profile 2
Model
X = (E, C1, C2,..., Ck) Risk Profile
X1 = (E = 1, C1 =?, C2 =?,..., Ck =?)
D
X0 = (E = 0, C1 =?, C2 =?,..., Ck =?) (Fixed)
E Risk
Profile (X1) (X0)
C1 C2
Ck
(Unspecified but fixed)
Ci
E Dichotomous
2. OR Additive Model
3.4
Model :
Logistic Model
Logit P(X) = a + b1E + b2c1 + b3c2 + ... + bk+1ck
Logit transformation
E = x1, c1=x2, c2=x3 ,... ck=xk+1 Xi X1 Xk
K
Logit P(X) = a + b1x1 + b2x2 +... + bkxk OR
OR Xi X
$OR_{X_1, X_0} = e^{\sum b_i (X_{1i} - X_{0i})}$ Risk Profile X
2 OR
Logistic Regression Model Main
effect
Model Additive
model OR
2.1
0,1
OR Model
Main effect
Dichotomous
0 (Non-exposed)
1 (Exposed)
3.5
OR Exponential
OR Model
Additive Model X Dichotomous
X 0 1
Logit P(X) = a + b1x1 + b2x2 +... + bkxk x(0,1)
OR Exponential
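$OR = e^{b}$
Example:
Logit P(X) = -3.911 + 0.652 SMK + 0.029 AGE + 0.342 ECG
$OR_{SMK=1} = e^{0.652} = 1.92$
$OR_{ECG=1} = e^{0.342} = 1.41$
The SMK coefficient of 0.652 gives an OR of 1.92: with AGE and ECG held fixed, smokers have about 1.9 times the odds of CHD of non-smokers. The OR for ECG is read from its coefficient in the same way.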
2.2
0,1
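3.6
OR (Arbitrary coding of E)
Additive Model with X Dichotomous, where X is coded with values other than 0 and 1, for example X(1,-1):
1 (Exposed)
-1 (Non-exposed)
Logit P(X) = -6.7727 + 0.3260 SMK + 0.0322 AGE + 0.0087 ECG, with SMK = 1 or -1
$OR_{SMK=1} = e^{0.3260(1-(-1))} = e^{0.3260(2)} = e^{0.652} = 1.9$ (the same OR as from the Model coded 0,1)
Here X1 and X0 differ by 2 (1-(-1) = 1+1 = 2), so the OR is the Exponential of 2 times the coefficient rather than of the coefficient itself, as it is in a Model coded 0 and 1.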
3.7 Logistic
Regression Model
Coding OR
- E(0,1): $OR = e^{b}$
- E(1,-1): $OR = e^{2b}$
- E(100,0): $OR = e^{100b}$
Model OR
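The coding rule above can be checked numerically. The following is a minimal Python sketch (Python is used only for illustration; the document's own output comes from STATA). It assumes the two SMK coefficients quoted in the examples, 0.652 under 0/1 coding and 0.3260 under 1/-1 coding; the function name `odds_ratio` is hypothetical.

```python
import math

def odds_ratio(b, x1, x0):
    """OR for a single variable: exp(b * (x1 - x0))."""
    return math.exp(b * (x1 - x0))

# SMK coded 0/1 with b = 0.652 (first example)
or_01 = odds_ratio(0.652, 1, 0)
# SMK coded 1/-1 with b = 0.3260 (second example)
or_pm = odds_ratio(0.3260, 1, -1)

print(round(or_01, 2), round(or_pm, 2))
```

Both codings give the same odds ratio because the coefficient shrinks by exactly the factor by which the coded distance between exposed and non-exposed grows.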
2.3
3.8
OR (E)
Additive model X (Categorical variable)
X 2
2 Polytomous
X q OR q-1
: X = 1 = variable 4 (Categories)
2 = E
3 =
4 =
D OR
X 4 (q = 4)
Dummy variable = q - 1 = 4 - 1 = 3 (Reference group) OR
q-1 q
E E
q-1 Model
Dummy Variable
Fit model
Dummy Variable
Dummy variable
3.9
3.11 OR
2
Polytomous Model
:
= CHD (MAR)
= MAR (1 , 2 , 3 , 4 )
= SEX (1 = , 2 = ) 4
= AGE ()
= SMK (0=, 1=) Dichotomous (AGE)
Model :
Logit P(X) = a + b1MAR + b2SEX + b3AGE + b4SMK X
Logit P(X) = a + b1MAR1 + b2MAR2 + b3MAR3 + b4SEX + b5AGE + b6SMK
Dummy variable Model
MAR
(SEX)
(AGE) (SMK)
MAR
3.12 X Polytomous
Dummy Variable Fit Model
1. Polytomous
= CHD
= MAR (1 , 2 , 3 , 4 )
= SEX (1 = , 2 = )
= AGE ()
(MAR) 3 Model
= SMK (0=, 1=)
2. Dummy variable
(Dummy variable):   MAR1   MAR2   MAR3
MAR = 1               1      0      0
MAR = 2               0      1      0
MAR = 3               0      0      1
MAR = 4               0      0      0    Reference group
Dummy Variable
3. Fit Model Logit P(X) = a + b1MAR1 + b2MAR2 + b3MAR3 + b4SEX + b5AGE + b6SMK
Partial method
Reference cell coding
(Reference group)
()
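Reference cell coding can be sketched in a few lines of Python. This is an illustration only, assuming MAR takes the codes 1 to 4 with category 4 as the reference group, as in the dummy-variable table above; the helper name `reference_cell_coding` is hypothetical.

```python
def reference_cell_coding(value, categories, reference):
    """Return the 0/1 dummy variables for one observation.

    One dummy is created for every category except the reference,
    so q categories give q - 1 dummies.
    """
    levels = [c for c in categories if c != reference]
    return {f"MAR{i}": int(value == c) for i, c in enumerate(levels, start=1)}

categories = [1, 2, 3, 4]  # MAR codes; category 4 is assumed to be the reference
for mar in categories:
    print(mar, reference_cell_coding(mar, categories, reference=4))
```

A subject in the reference group has all three dummies equal to 0, which is why each coefficient b1, b2, b3 compares its category with the reference group.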
3.13 OR
OR
ORMAR1 Exponential
$OR_{MAR1} = e^{b_1}$
MAR1 eb1
( SEX, AGE SMK )
OR
X1 = (MAR1=1, MAR2=0, MAR3=0, SEX, AGE, SMK)
X0 = (MAR1=0, MAR2=0, MAR3=1, SEX, AGE, SMK)
[$e^{b_1}$]
$OR_{X_1,X_0} = OR_{MAR1,MAR3} = e^{b_1 - b_3}$
ORMAR1,MAR3 Exponential
(b1)
Note: Fit Model : (b3)
- > 2 categories
- Dummy variable
- Reference group [$e^{b_1 - b_3}$]
2.4
(Ordinal Variable)
(Social Support
3.14 Status) SSU
OR
Additive Model X (Ordinal variable)
:
0
Logit P(X) = a + b1SSU + b2SEX + b3AGE
SSU = (Social Support Status) 4
0=, 1=, 2=, 3=, 4=
OR
$OR_{SSU=2,SSU=0} = \exp[b_1(2-0)] = e^{2b_1}$ .... Fixed AGE SEX
$OR_{SSU=4,SSU=2} = \exp[b_1(4-2)] = e^{2b_1}$ ..... Fixed AGE SEX
: OR SSU CHD
SSU Polytomous
MAR
SSU
(CHD)
SSU Model
SSU
CHD (Linear relationship)
(Dummy variable) OR
( OR = Exp
[bi(X1i-X0i)]
SSU CHD AGE
SEX Odds
Risk Profile 2 AGE
SEX ( Risk Profile)
SSU
3.15 OR
Logit P(X) = 2.6341 - 0.4540SSU + 0.2016AGE + 1.010SEX
$OR_{SSU=2,SSU=0} = \exp[b_1(2-0) + b_2(0) + b_3(0)]$ (AGE and SEX fixed, so their differences are 0)
$= \exp[b_1(2)]$
$= e^{2b_1}$
$= e^{2(-0.4540)}$
$= e^{-0.908}$
$= 0.4$
SSU
2 SSU
2 0 SSU
2 OR
Exponential
SSU Exp2b1
(-
0.454) OR 1 (
0.4)
(Protective factor)
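The SSU calculation can be reproduced directly. A minimal Python sketch, assuming only the SSU coefficient -0.4540 from the fitted model quoted above; the variable names are illustrative.

```python
import math

b_ssu = -0.4540  # SSU coefficient from the fitted model above

# Because SSU enters the model linearly, any two-step difference gives the same OR.
or_2_vs_0 = math.exp(b_ssu * (2 - 0))
or_4_vs_2 = math.exp(b_ssu * (4 - 2))

print(round(or_2_vs_0, 2), round(or_4_vs_2, 2))
```

An odds ratio below 1 is what marks SSU as a protective factor in this model.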
2.5
OR Model
Product term
(Continuous variable)
3.16
(Ordinal variable)
OR
(Systolic
Additive Model X Blood Pressure SBP) 200
(Continuous variables)
mmHg
: Model
Logit P(x) = a+b1SBP+b2SMH+ b3ECG
$OR_{SBP=200,SBP=120} = \exp[b_1(200-120) + b_2(0) + b_3(0)]$
$= \exp[b_1(80) + 0 + 0]$
$= \exp[b_1(80)]$
$= e^{80b_1}$
Risk Profile
OR
200 mmHg
120
mmHg Risk
Profile 0
SBP 200 120
80 OR Exponential
80
SBP b1
ECG
200 mmHg
$e^{80b_1}$
120 mmHg OR
Risk profile
80 mmHg
Linear relationship
OR eb
Dichotomous
eb Odds
Odds CHD
3.17 200 199
Odds
OR
Odds ratio "
OR = eb X1-X0 = 1
mmHg Odds
eb"
(SBP)
(GAGE)
- (Preterm)
- (Normal)
- (Post term)
Mean Median
Fit Model
OR
Fit Model X Model
OR
2.6
OR Main Effect
3.18
Model
OR
Exposure (E)
E1, C1, C2,...,Ck D (D)
E1, E2, E3 ,...,C1 , C2,...,Ck D (C)
Logit P(X) = a+b1SMK+b2AGE+b3SEX
X1 = (SMK = 1, AGE = 60, SEX = 1)
X2 = (SMK = 0, AGE = 40, SEX = 1)
$OR = \exp[b_1(1-0) + b_2(60-40) + b_3(1-1)]$
$= \exp[b_1(1) + b_2(20) + b_3(0)]$
$= \exp[b_1 + 20b_2] = e^{b_1 + 20b_2}$
OR (SMK=1)
60 (AGE=60)
(SEX = 1)
(SMK=0) 40 (AGE=40)
(SEX=1)
SEX
(E)
OR
Risk Profile 2
SMK 1-0 1
AGE 60-40 20
SEX 0
OR Exponential
SMK 20
AGE
60
[$e^{b_1 + 20b_2}$]
40
E 1
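The comparison of two risk profiles that differ in more than one variable can be sketched as a small function. The coefficients below are an assumption: they are borrowed from the earlier SMK/AGE/ECG example (0.652, 0.029, 0.342) purely to have numbers to work with, and the function name `profile_odds_ratio` is hypothetical.

```python
import math

def profile_odds_ratio(coefs, x1, x0):
    """exp(sum_i b_i * (x1_i - x0_i)) over the variables in coefs."""
    return math.exp(sum(b * (x1[name] - x0[name]) for name, b in coefs.items()))

# Assumed coefficients, taken from the earlier SMK/AGE/ECG example
coefs = {"SMK": 0.652, "AGE": 0.029, "ECG": 0.342}

x1 = {"SMK": 1, "AGE": 60, "ECG": 1}  # smoker aged 60
x0 = {"SMK": 0, "AGE": 40, "ECG": 1}  # non-smoker aged 40, same ECG status

or_x1_x0 = profile_odds_ratio(coefs, x1, x0)
print(round(or_x1_x0, 2))  # exp(b_SMK + 20 * b_AGE)
```

Variables with the same value in both profiles drop out of the exponent, which is the sense in which they are held fixed.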
3. OR Multiplicative model
3.19 OR
OR Additive model OR
Additive Model ( Interaction term)
E
- Dichotomous
- >2 Categories
- Ordinal
- Continuous
E 1
( Joint effect )
Multiplicative Model ( Interaction term) Model Product term
Interaction term Model
Multiplicative model
Interaction term
3.20 Model ()
Interaction term
term Second order term
Second order term 2 term
E*C1, E*C2
Third order term
SMK*AGE, SMK*SEX
Third order term 3
E*C1*C2
SMK*AGE*SEX
Third order term
Model
Interaction effect
3.21
Model Main
effect Interaction term
Hierarchically Well-Formulated Models
Model
Second Order term
Logit P(X) = a + b1E + b2c1+b3c2 + b4(E *c1) + b5(E*c2) 4 Model
Third Order term Third order term Model
Logit P(X) = a + b1E + b2c1 + b3c2 + b4(E*c1) + b5(E*c2) Main effect
+b6(c1*c2) + b7(E*c1*c2)
Second order term term
Third order term
Hierarchically Well-Formulated model
Model
OR
OR Interaction term
Risk Profile 2
OR
Interaction term
Second Order term
(SMK)
3.22
(CHD)
OR
Multiplicative model
(Confounder) (AGE)
1 Second order term (SEX)
Logit P(x) = a + b1SMK + b2AGE + b3SEX + b4ECG + b5SBP
(ECG) (SBP)
+ b6(SMK*SEX)
Model
$OR_{SMK(1,0)} = e^{b_1 + b_6 \cdot SEX}$
ORSMK SEX
Effect Modifier
Model Interaction term
SEX Effect Modifier term
SMK*SEX Model OR
SMK CHD Model
Interaction term term
SMK Interaction
term SMK*SEX
ORSMK SEX
3.23 ORSMK
SEX = 1 Risk Profile 2
SEX = 1 ()
Model
X1 = (SMK=1, SEX=1) AGE,ECG, SBP SMK SEX SMK
X2 = (SMK=0, SEX=1) Fixed
() ( Odds
$OR_{SMK(1,0)} = \exp\{b_1(1-0) + b_3(1-1) + b_6[(1 \times 1) - (0 \times 1)]\}$
$= \exp[b_1(1) + b_3(0) + b_6(1-0)]$
$= \exp(b_1 + b_6)$
$= e^{b_1 + b_6}$
Fixed (
)
OR Risk Profile
(X1i-X0i)
2 Interaction
term
$(X_{11}X_{12} - X_{01}X_{02})$
Interaction term Risk Profile
1 (X1) Risk Profile 2 (X2)
ORSMK Exponential
SMK
Interaction term
(SMK*SEX) ORSMK
eb1 + b6 ORSMK
3.24 SEX=0
$OR = e^{b_1}$
SEX = 0 ()
X1 = (SMK=1, SEX=0) AGE, ECG, SBP ECG
X2 = (SMK=0, SEX=0) Fixed
eb1 + b6
$OR_{SMK(1,0)} = \exp\{b_1(1-0) + b_3(0-0) + b_6[(1 \times 0) - (0 \times 0)]\}$
$= \exp[b_1(1) + b_3(0) + b_6(0)]$
$= e^{b_1}$
eb1
3.2 Third order term
3.25
(SMK*ECG) (SEX*ECG)
Main effect SMK SEX
ECG Model AGE
Model
3.26
SEX = 1 ECG = 1
X1 = (SMK=1, SEX=1,ECG=1) AGE SMK CHD
X2 = (SMK=0, SEX=1,ECG=1) SBP Fixed
SEX ECG ORSMK
$OR_{SMK(1,0)} = \exp\{b_1(1-0) + b_3(1-1) + b_4(1-1) + b_6[(1 \times 1) - (0 \times 1)]$
$\quad + b_7[(1 \times 1) - (0 \times 1)] + b_8[(1 \times 1) - (1 \times 1)] + b_9[(1 \times 1 \times 1) - (0 \times 1 \times 1)]\}$
$= \exp\{b_1(1) + b_3(0) + b_4(0) + b_6[(1)-(0)] + b_7[(1)-(0)] + b_8[(1)-(1)] + b_9[(1)-(0)]\}$
$= \exp[b_1 + 0 + 0 + b_6 + b_7 + 0 + b_9]$
$= e^{b_1 + b_6 + b_7 + b_9}$
4 OR
22 (
Effect Modifier)
3.27 ORSMK 4
SEX = 1 ECG = 0
=? Second order term
SEX = 0 ECG = 1
SEX = 0 ECG = 0 } OR SMK
Risk Profile 2 X1
X0
( ORSMK)
OR
2. $OR_{SMK} = e^{b_1 + b_6}$
3. $OR_{SMK} = e^{b_1 + b_7}$
4. $OR_{SMK} = e^{b_1}$
Exponential
OR
OR ECG
eb1
ECG
eb1
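The four stratum-specific results can be generated mechanically: a coefficient enters the exponent of the SMK odds ratio only if its term changes when SMK goes from 0 to 1 with SEX and ECG held fixed. The sketch below assumes the term layout implied by the derivation (b1: SMK, b6: SMK*SEX, b7: SMK*ECG, b9: SMK*SEX*ECG); the names `SMK_TERMS` and `or_smk_exponent` are hypothetical.

```python
from itertools import product

# Assumed layout of the terms that involve SMK
SMK_TERMS = {
    "b1": lambda smk, sex, ecg: smk,
    "b6": lambda smk, sex, ecg: smk * sex,
    "b7": lambda smk, sex, ecg: smk * ecg,
    "b9": lambda smk, sex, ecg: smk * sex * ecg,
}

def or_smk_exponent(sex, ecg):
    """Coefficients whose term differs between SMK=1 and SMK=0 at fixed SEX, ECG."""
    return [name for name, term in SMK_TERMS.items()
            if term(1, sex, ecg) - term(0, sex, ecg) != 0]

for sex, ecg in product([1, 0], repeat=2):
    exponent = " + ".join(or_smk_exponent(sex, ecg))
    print(f"SEX={sex}, ECG={ecg}: OR_SMK = exp({exponent})")
```

Terms that do not contain SMK (AGE, SEX, ECG, SBP, SEX*ECG) are omitted because they take the same value in both profiles and cancel.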
4. OR
3.28 OR Logistic
Regression model
OR
1. Risk Profile
X1 = (X1 =..., X2=..., ..., Xk=...)
X0 = (X1 =..., X2=..., ..., Xk=...) Risk Profile 2
Fixed Risk Profile
2.
Model Profile
$OR_{X_1,X_0} = e^{\sum_i b_i (X_{1i} - X_{0i})}$
bi
3.
bi
OR
Exponential OR
5. OR
OR
Odds (
3.29
) (Exposed
X1) (Non-exposed
OR
OR X1,X0 X0) OR
Odds ( (X1) Model
(X0) X0 (Reference group)
Adjusted OR => Model Adjusted Odds Ratio
=>
=>
OR RR
OR = 1 =>
OR > 1 => Risk factor ( )
OR < 1 => Protective factor ( )
Model"
OR RR
OR > 1
(Risk factor)
OR < 1
(Protective factor)
Protective factor
3.30
OR (
1)
ORX1,X0 1 Protective effect ( Risk)
Protective effect
[Figure: OR scale 0, 0.25, 0.5, 1, 2, 3, 4, 5 with regions labelled Protective effect, No effect, Risk effect]
X1 X0 X0 X1
$OR_{X_0,X_1} = \frac{1}{OR_{X_1,X_0}}$
$OR_{X_1,X_0} = 0.05$, $OR_{X_0,X_1} = 1/0.05 = 20$
OR (
Jaeschke et al., 1995
)
OR 1.05 1.5
()
0.05 0.5
OR
1/0.05 = 20 1/0.5 = 2
( 20
2 )
OR 1
OR
OR
3
Jaeschke, R., Guyatt, G., Shannon, H., Walter, S., Cook, D., and Heddle, N. (1995). Assessing the
effects of treatment: measures of association. Canadian Medical Association Journal. 152:
351-357.
Kleinbaum, D.G. (1994). Logistic Regression: A self-learning text. New York: Springer-Verlag.
1. 1 2.2.3 BWT
BWTG
1.1
Dummy Variable BWTG
------------------------------------------------------------------------------
dead | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | .0911082 .2777323 0.328 0.743 -.4532371 .6354536
bwtd1 | .6092406 .40922 1.489 0.137 -.1928159 1.411297
bwtd3 | -.8628492 .2980022 -2.895 0.004 -1.446923 -.2787755
_cons | -1.483589 .2677199 -5.542 0.000 -2.00831 -.9588672
------------------------------------------------------------------------------
1.3 OR
1.3.1) OR
(I) ORx1,Xo =
(ii)
X1 =
.................................................................................................
X0 =
.................................................................................................
(iii) ORx1,Xo =
1.3.2) OR
(I) ORx1,Xo =
(ii)
X1 =
.................................................................................................
X0 =
.................................................................................................
(iii) ORx1,Xo =
1.3.3) STATA
. logistic dead area bwtd1 bwtd3
------------------------------------------------------------------------------
dead | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | 1.095388 .3042245 0.328 0.743 .6355674 1.887878
bwtd1 | 1.839034 .7525696 1.489 0.137 .8246337 4.101271
bwtd3 | .4219581 .1257445 -2.895 0.004 .2352932 .7567098
------------------------------------------------------------------------------
2.2) STATA
. gen a_mal = area * malpres
------------------------------------------------------------------------------
dead | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | -.3988154 .3230824 -1.234 0.217 -1.032045 .2344146
malpres | .8903152 .8428469 1.056 0.291 -.7616343 2.542265
a_mal | 2.362425 .9739987 2.425 0.015 .4534228 4.271427
_cons | -1.988928 .2091045 -9.512 0.000 -2.398765 -1.57909
------------------------------------------------------------------------------
2.3) OR
28
Logistic Model:
Logit P(X) = -1.989 - 0.399AREA + 0.890MALPRES + 2.362A_MAL
MALPRES = 0
OR(AREA1,0) = Exp{[-1.989-0.399(1)+0.890(0)+2.362(1)(0)]
-[-1.989-0.399(0)+0.890(0)+2.362(0)(0)]}
= Exp(-0.399)
= ..
MALPRES = 1
OR(AREA1,0) = Exp{[-1.989-0.399(1)+0.890(1)+2.362(1)(1)]
-[-1.989-0.399(0)+0.890(1)+2.362(0)(1)]}
= Exp(-0.399 + 2.362)
= Exp(1.963)
=
2.4) OR Stratified analysis
3.9.2.1 1
2.5) OR STATA
OR Exponential Coefficient
( Output 2.3)
. logistic dead area malpres a_mal
------------------------------------------------------------------------------
dead | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | .6711146 .2168253 -1.234 0.217 .3562775 1.264168
malpres | 2.435897 2.053089 1.056 0.291 .4669028 12.70842
a_mal | 10.61667 10.34062 2.425 0.015 1.573689 71.6238
------------------------------------------------------------------------------
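The relation between the coefficient output and the odds-ratio output for the interaction model can be checked by hand. A minimal Python sketch, assuming the coefficients for area and a_mal printed in the earlier coefficient table (-.3988154 and 2.362425); the function name `or_area` is hypothetical.

```python
import math

b_area = -0.3988154   # coefficient of area
b_a_mal = 2.362425    # coefficient of the product term a_mal = area * malpres

def or_area(malpres):
    """OR for area (1 vs 0) at a fixed value of malpres."""
    return math.exp(b_area + b_a_mal * malpres)

print(round(or_area(0), 3))               # stratum malpres = 0
print(round(or_area(1), 3))               # stratum malpres = 1
print(round(or_area(1) / or_area(0), 3))  # ratio of the two stratum-specific ORs
```

The value reported on the a_mal row of the odds-ratio table is the ratio of the two stratum-specific odds ratios, not an odds ratio for any single stratum.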
Main effect
Second order term
Third order term
Odds Ratio
4
Logistic Regression Model
:
1. Odds Ratio
:
1. Logistic Regression
2. Interaction term
3. Interaction term
:
1. 1
2.
3.
Logistic Regression
Model
Logistic Regression 2
4.1 Best Predicted Model
Logistic Regression (
( Case-control study ) Cohort study)
Measure of association ( Coefficient) (Risk
Case Control OR (95%CI) assessment)
(n=....) (n=....) (Unadjusted) (Adjusted)
1. % % (
2. % %
3. % %
Lang and Secic, 1997)
...
Risk assessment
Logistic
Regression
Cohort study
Case-control cross-
sectional study
Model
4.2
OR
Logistic Regression 95%CI p-value
...
Logit P(X) = 1.421 + 1.609SMK + 0.095SEX + 0.301AGE
5 (95% CI OR: 2.1-11.7)
... ( 1)
1 Adjusted Odds Ratio (%)
Case Control OR(95%CI) Crude OR Adjusted OR
(n = 150) (n = 150) (Unadjusted) (Adjusted)
1.
-
-
80.7%
19.3%
30.0%
70.0%
9.7(5.5 17.3) 5(2.1 11.7) (Confidence Interval)
2.
- 73.3% 63.3% 1.6(1.0 2.7) 1.1(0.8 3.7)
95%CI p-value
- 36.7% 33.7%
... ... ... ... ... Adjusted OR
Study design
Case-control study
Lang and Secic (1997)
95%CI OR
OR
95% CI OR
(Lower Limit)
(Upper Limit)
4.3 OR (Precision )
95% Confidence Interval
(Precision )
(95% CI. OR.)
Precision OR Significant risk factor
Lower Limit Estimated OR. Upper Limit
. 30-35
.
.
2-80
0 1 + ()
(Null value)
: OR
. Precision
. .
. Precision
(Sample Size)
(Hypothesis testing)
1
Null value
p-value >0.05
1
(p-value <0.05)
Guyatt et al.(1995)
95% OR
() 0.05
95%
4.4 Z0.05/2 1.96
OR
$100(1-\alpha)\%\ CI\ OR = \exp\left[L \pm Z_{\alpha/2}\sqrt{Var(L)}\right]$
OR. Standard error SE ( Standard Error) OR
EXP(L) = OR. L
square root variance
= 0.05 : -
OR Var (OR)
$95\%\ CI\ OR = \exp\left[L \pm 1.96\sqrt{Var(L)}\right]$
Logistic regression model OR
L OR. : -
$OR = \exp[\sum_i b_i (X_{1i} - X_{0i})]$; for a single variable, $OR = \exp[b_i (X_1 - X_0)]$
$L = \sum_i b_i (X_{1i} - X_{0i})$; for a single variable, $L = b_i (X_1 - X_0)$
$OR = \exp(L)$
$SE(L) = \sqrt{Var(L)}$
L
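The interval formula can be sketched as a short function. This is an illustration only: it assumes the coefficient and standard error of area from the STATA coefficient table shown earlier (-.3988154 and .3230824) and uses 1.96 as the normal quantile; the function name `or_confidence_interval` is hypothetical.

```python
import math

def or_confidence_interval(l, var_l, z=1.96):
    """Return (OR, lower, upper) = exp(L), exp(L -/+ z * sqrt(Var(L)))."""
    se = math.sqrt(var_l)
    return math.exp(l), math.exp(l - z * se), math.exp(l + z * se)

# Assumed inputs: coefficient and standard error of area from the STATA output
b_area, se_area = -0.3988154, 0.3230824
or_est, lower, upper = or_confidence_interval(b_area, se_area ** 2)

print(round(or_est, 3), round(lower, 3), round(upper, 3))
```

The limits are computed on the log-odds scale and then exponentiated, so the interval is not symmetric around the estimated OR.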
4.5 OR
Model
Interaction term
Interaction term
$OR = \exp(b)$, 95% CI OR = $\exp\left[b \pm 1.96\sqrt{Var(b)}\right]$
}
1999).
Interaction
(CHD)
(SMK) (ECG)
(CHL)
(HPT)
SE
(
Chi-
square p-value
5)
OR
OR
SMK 1.82 95% CI
0.91 3.63
OR
(
StataCorp., 1999).
4.6
ORSMK = 1.82
95% CI. ORSMK = 0.91, 3.63
[Figure: scale 0 1 2 3 4 showing Lower Limit, Estimated OR (1.82), Upper Limit, and the Null value]
1 95%
Non Significant (p-value > 0.05)
print out p-value = 0.0896
0.91 3.63
p-value = 0.09 (p-
value = 0.0896)
( precision )
1
(Significance) p-value
0.05
p-value
OR 95%CI
Fit Model
Interaction SMK
CHL OR SMK
CHL CHL
()
200 220 240
CHL 200 ORSMK 26.1
Var(L)
4.9 Var(L)
Var(L ) Model
X1 Dichotomous (1,0) Interaction
Model: Logit P(X) = a + b1X1 + b2X2 + b3X1X2 Model
L = b1X1 + b3(X1X2)
$Var(L) = Var(b_1) + X_2^2\,Var(b_3) + 2X_2\,Cov(b_1,b_3)$
Model: Logit P(X) = a + b1X1 + b2X2 + b3X3 + b4X1X2 + b5X1X3
L = b1X1 + b4X1X2 + b5X1X3
$Var(L) = Var(b_1) + X_2^2\,Var(b_4) + X_3^2\,Var(b_5) + 2X_2\,Cov(b_1,b_4) + 2X_3\,Cov(b_1,b_5) + 2X_2X_3\,Cov(b_4,b_5)$
( Kleinbaum, 1994. 141 - 144)
Interaction term Model
Interaction term
(Second order term ) Var(L)
Interaction term Model
Effect Modifier
(
Kleinbaum, 1994. 141 - 144)
4
Guyatt, G., Jaeschke, R., Heddle, N., Cook, D., Shannon, H., and Walter S. (1995). Interpreting
study results: confidence intervals. Canadian Medical Association Journal. 152:169-173.
Kleinbaum, D.G. (1994). Logistic Regression: A self-learning text. New York: Springer-Verlag.
Lang, T.A., Secic, M. (1997). How to report statistics in medicine: annotated guidelines for
authors, editors, and reviewers. Philadelphia: American College of Physicians.
StataCorp. (1999). Stata statistical software: Release 6.0. College Station. TX: Stata
Corporation.
4
1. 1 (
LOGISTIC.DAT) OR. 95%
1.1)
28 2 x 2
1.1.1) a b c d OR.
95%
1 OR. = ad /bc
1
a b OR. = ........................................................
c d
$100(1-\alpha)\%\ CI\ OR = OR \times \exp\left[\pm Z_{\alpha/2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right]$
95% CI OR = .............................................
1.1.2) STATA 2 2
( epitab cc)
1.2.3
2. OR. (MAGE)
(DEAD) 30 20
2.2) OR MAGE30,20 = eL
L =......................................................
OR. =......................................................
95% CI
95% CI OR = $\exp\left[L \pm 1.96\sqrt{var(L)}\right]$
Var(L) = ........................................................
.................................................................................
3. 3 OR.
(AREA) (DEAD) Interaction effect
(MALPRES) Effect Modifier
3.2) OR.
OR(MALPRES=0) =...............................................
OR(MALPRES=1) =...............................................
------------------------------------------------------------------------------
dead | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | -.3988154 .3230824 -1.234 0.217 -1.032045 .2344146
malpres | .8903152 .8428469 1.056 0.291 -.7616343 2.542265
a_m | 2.362425 .9739987 2.425 0.015 .4534228 4.271427
_cons | -1.988928 .2091045 -9.512 0.000 -2.398765 -1.57909
------------------------------------------------------------------------------
. matrix V = get(VCE)
. matrix list V
symmetric V[4,4]
area malpres a_m _cons
area .10438226
malpres .0437247 .71039085
a_m -.10438226 -.71039085 .94867342 Variance-covariance Matrix
_cons -.0437247 -.0437247 .0437247 .0437247
3.3.2 OR.
. MALPRES = 0
95% CI OR = $\exp\left[-0.399 \pm 1.96\sqrt{0.1044}\right]$
= Exp(-1.032) to Exp(0.234)
= 0.36 to 1.26
. MALPRES = 1
L = bAREA + bA_MAL(MALPRES)
var(L) = var(bAREA) + 2(MALPRES)cov(bAREA, bA_MAL)
+ (MALPRES)^2 var(bA_MAL)
= 0.1044 + [2(1)(-0.1044)] + [(1)^2(0.9487)]
= 0.8443
lincom
. lincom area
( 1) area = 0.0
------------------------------------------------------------------------------
dead | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
(1) | .6711146 .2168253 -1.234 0.217 .3562775 1.264168
------------------------------------------------------------------------------
------------------------------------------------------------------------------
dead | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
(1) | 7.125 6.546829 2.137 0.033 1.176673 43.14337
------------------------------------------------------------------------------
effmod
. effmod dead area, cov( malpres) int( a_m 0)
Disease: dead
Exposure: area
Confounders: malpres
Interaction Terms and Stratum Values:
a_m: 0
Exposed-Unexposed= 1
l= -.39881537
Var(l)= .10438226
Odds Ratio (95% CI) for dead vs. area: 0.671 (0.356, 1.264)
Disease: dead
Exposure: area
Confounders: malpres
Interaction Terms and Stratum Values:
a_m: 1
Exposed-Unexposed= 1
l= 1.9636097
Var(l)= .84429116
Odds Ratio (95% CI) for dead vs. area: 7.125 (1.177, 43.143)
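The values of l and Var(l) printed above can be rebuilt from the variance-covariance matrix. The sketch below assumes the entries of V listed earlier (var(area) = .10438226, var(a_m) = .94867342, cov(area, a_m) = -.10438226) and the fitted coefficients; it uses 1.96, so the upper limit may differ from the STATA output in the last digits. The name `l_and_var` is hypothetical.

```python
import math

# Assumed inputs, copied from the coefficient table and matrix V above
b_area, b_a_m = -0.3988154, 2.362425
var_area = 0.10438226
var_a_m = 0.94867342
cov_area_a_m = -0.10438226

def l_and_var(malpres):
    """L and Var(L) for area (1 vs 0) at a fixed value of malpres."""
    l = b_area + b_a_m * malpres
    var_l = var_area + malpres ** 2 * var_a_m + 2 * malpres * cov_area_a_m
    return l, var_l

l1, var1 = l_and_var(1)
se1 = math.sqrt(var1)
ci1 = (math.exp(l1 - 1.96 * se1), math.exp(l1 + 1.96 * se1))

print(round(l1, 4), round(var1, 4), round(math.exp(l1), 3), ci1)
```

The covariance term is negative here, so ignoring it would overstate Var(L) and widen the interval.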
3.5)
5
Logistic Regression Model
:
1. Model
2. Model
:
1. Model (Model fitting Strategies)
1.1) Model
1.2) Model
1.3) - Multicollinearity
- Multiple testing
- Outlier
- Non-linear relationship
2. Model
2.1) Initial Model
2.2) Model
1. 1
2.
3.
1. Model
Logistic
Regression
5.1
Model Model
Model Fitting Strategy
: (E)
BEST MODEL
Confounder Effect (D)
Modifier
(C) Confounder Effect
:
1. Initial Model Modifier
2. Final Model
3. Effect Modifier OR
4. Confounder
5. Model adequacy
Model
Final Model
Model
Model
1.
2.
3. Effect Modifier
4. Confounder
5.
1.1 Model
Model
2 Model
E D
Risk assessment goal
Model
D E
Risk Profile
Prediction goal
5.2
Cohort study
1. Model E D Risk assessment
2. Model D E Prediction
Model
Prediction goal
Prediction goal
Fit Model Backward Elimination Forward
Backward Stepwise method
Risk assessment goal Inclusion Backward Stepwise Method
Fit Model Risk Assessment Goal
Model
Model
Confounder Effect
Modifier
1.2 Model
Model 3
5.3
Model (Initial Model)
1. Initial Model
Effect
2. Effect Modifier
Modifier Effect Modifier
3. Confounder
precision
confounder
Model
Effect Modifier
Confounder
Precision
Precision
1.3 Model
5.4
Model
Multicollinearity
Multiple testing Outlier
Multicollinearity Non-linear relationship
Multiple testing (ordinal)
Outlier
Non-Linear relationship
(continuous)
1.3.1 Multicollinearity
Multicollinearity
5.5
Model
Multicollinearity Model
Model AGE
: AGEG Model
:
AGE () AGEG
Model AGE () ()
AGEG ()
:
(Unreliable Coefficient)
Model
Multicollinearity
Multicollinearity Fit model
Multicollinearity
Model
(STATA
Multicollinearity )
Model
1.3.2 Multiple Testing
Model
5.6
Multiple testing
Model
Model Model
Model
1.3.3 Outlier
Outlier
5.7
Record
Outlier
record
Outlier
[Figure: scatter plot of Logit P(x) against Age, with the fitted line of slope b and the points marked as outliers]
Fit Model
outlier
Measure of influence
Hosmer and Lemeshow
(1989)
STATA ( Stata Corp, 1999
term)
(20-36 )
5.8
Logit transformation
logit P(X)
Model
Logit P(X)
Risk profile Model
Model
Linear relationship
Crude analysis
1
Model
2. Model
Fit Model
5.9
Initial Model
1
Initial Model
Initial model :
( )
Crude analysis p-value < 0.25
Interaction term Tests of homogeneity of odds ratios Stratum p-
value < 0.25 Stratified analysis
:
(Review literature)
Crude analysis Stratified
analysis p-value < 0.25 (
1)
Initial model
Model Multicollinearity
Non-
linear relationship (
)
Initial
Model
5.10
Main effect Model
:
1. Dichotomous
Interaction term
2. Product term (
2.1) Main effect - Clinically or biologically or socially important
5.11 Model
Hierarchically Well-Formulated
Model
Model Product term
Model Main effect
Product term
Model
Model
Hierarchically Well-Formulated Model (HWF):
1. Logit P(X) = a + b1X1 + b2X2 + b3X3 + b4X1X3
HWF Model
2. Logit P(X) = a + b1X1 + b2X2 + b3X3X4
Not an HWF Model: the Main effects X3 and X4 are not in the Model
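The HWF condition can be stated as a check: every product term in the model must have all of its lower-order components in the model as well. The following Python sketch is one possible way to express that check; the function name `is_hwf` and the tuple representation of terms are assumptions made for illustration.

```python
from itertools import combinations

def is_hwf(terms):
    """terms: iterable of tuples of variable names, e.g. ("X1",) or ("X1", "X3").

    Returns True when every lower-order component of every product term
    is also present in the model.
    """
    present = {frozenset(t) for t in terms}
    for term in present:
        for size in range(1, len(term)):
            for lower in combinations(sorted(term), size):
                if frozenset(lower) not in present:
                    return False
    return True

model_1 = [("X1",), ("X2",), ("X3",), ("X1", "X3")]
model_2 = [("X1",), ("X2",), ("X3", "X4")]

print(is_hwf(model_1), is_hwf(model_2))
```

For a third order term such as X1X2X3 the same check requires X1, X2, X3 and the three second order terms X1X2, X1X3, X2X3.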
3. Model
Model
5.12
2
Model Backward elimination
Model Backward Elimination
Hierarchical
(Initial Model)
Principle Model
3 (Third order term)
Third order term
Model Third
2 (Second order term) order term Second
Main effect
order term Main
effect Model
5.14
Model
Model
Hierarchical Principle
Hierarchical Principle
Model
:
Initial Model : Logit P(X) = a + b1X1+ b2X2+ b3X3+ b4X1X2+ b5X1X3+ b6X2X3+ b7X1X2X3
1. X1X2X3 Model
2. X1X2X3 , X1X3 X2X3 Model
X1X2 X1 X2
X3
Model
Product term
dummy variable
Initial model
1. X1X2X3
Model
2. X1X2X3 , X1X3
X2X3 Model
X1X2
X1 X2
X3
5.15 Model
Fit Model Initial model
2 (1) p-value
5.20 Second order
3. Model 3 : X1X3 term X1X3 Fit Model
Log Likelihood = -59.839966
y Coefficient Std. Error Z P-value [95%Conf. Interval]
X1X3 LR.
x1 | .5128482 .6508643 0.788 0.431 -.7628223 1.788519
x2 | .6767462 .6568664 1.030 0.303 -.6106883 1.964181
Model 2 p-
x3 | .4361177 .597445 0.730 0.465 -.7348529 1.607088 value < 0.05
x4 | -.5383285 .5042896 -1.067 0.286 -1.526718 .450061
_cons | -1.9106 .5103662 -3.744 0.000 -2.910899 -.9103008 Model
LR = -2(-59.84 - (-57.74)) = 4.2
$\chi^2_{df=1}$, p = 0.04 < 0.05
Model X1 X3 Hierarchically well-formulated model X3 Model
X2 Hierarchical Principle
Model
Model 2 X1X3 Model
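The likelihood ratio statistic and its p-value can be reproduced without a statistics package, because for 1 degree of freedom the chi-square tail probability equals erfc(sqrt(x/2)). The sketch below assumes the two log likelihoods quoted in the calculation above (-59.839966 for the model without X1X3 and -57.74 for the model with it); the function name `lr_test_1df` is hypothetical.

```python
import math

def lr_test_1df(ll_reduced, ll_full):
    """LR statistic and p-value for dropping a single term (df = 1)."""
    lr = -2 * (ll_reduced - ll_full)
    p_value = math.erfc(math.sqrt(lr / 2))  # chi-square survival function, df = 1
    return lr, p_value

lr, p = lr_test_1df(-59.839966, -57.74)
print(round(lr, 1), round(p, 2))
```

A p-value below 0.05 means that dropping X1X3 significantly worsens the fit, so the term stays in the model.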
5.21
X2 X4
X2 p-value
4. Interaction Effect
Confounding Effect
Interaction term
Model Model (p-
5.23 value <0.05) Interaction Effect
p>0.05
Interaction term Model
Interaction
Effect
Effect Modifier Interaction Effect
Likelihood
Ratio test
Interaction
Statistical test
Confounding precision Confounder
Statistical test
Validity
Control for Confounding
Validity
Random error Interaction Effect
( OR) confounder
Control Precision ( )
Precision
Precision
Confounding Effect
Potential
Confounder ( Subset
Model) Fit Model
OR OR
Gold Standard Model Subset
OR Gold standard
OR
OR
Gold Standard Model
Model Final Model
Confounding effect
(Subset) Initial
Model Confounder
Model
5.25 Fit Model
Model OR
Confounding assessment Interaction effect Model (Gold
OR Model
Model OR OR Full Model (Gold Standard) standard) precision
Model CI
OR Model
Validity Precision Model Final Model
Validity
Best Model Precise
Model Subset Full
Model
Full Model
Best Model
Model Subset
5.26
Interaction Interaction
Full Model :
Logit P(X) = a+b1X1+b2X2+b3X3+b4X4
Model Main
effect X1, X2, X3, X4
Model   Subset            OR     95%CI
1.      X1, X2, X3, X4    4.3    1.9-6.4
2.      X1, X2, X3        4.0    3.1-5.0
3.      X1, X3, X4        4.6    1.7-5.8
4.      X1, X2, X4        2.6    0.9-4.5
Confounder Best Model
Model : 1, 2, 3 (n
Model : 2
Final Model : Logit P(X) = a+b1X1+b2X2+b3X3 ) Fit Model OR
95% CI OR Model
Model 1 Full Model OR
Gold Standard Model
Subset 4 Model
Valid OR
Gold standard
Model
Precision Final Model
term
Interaction term Model
- Model
- Confounding assessment
5.28
Fit Model Confounding assessment
Interaction term Model
Confounding assessment
-
- (Subjective)
Confounder Model
Potential Confounder
Confounding factors Precision
Validity
Precision
Confounding effect
Interaction
OR Model Subset
Full Model OR Gold
standard
Kleinbaum, 1994 203-218
5. Conditional Logistic Regression
Unconditional Logistic Regression
5.29
ML 2
1. Unconditional method
Mathematical Model
2. Conditional Method
2 Maximum Likelihood (ML.)
Least Square (LS.) Model
(Coefficient)
b Logistic
Regression ML.
Unconditional method
-
- : SAS (LOGIST) BMDP
GLIM SPSS EGRET SPIDA
S-PLUS STATA
Unconditional
Conditional Method
-
- : SAS (DECAN) SAS (PHREG)
5.30 EGRET SPIDA S+ STATA Conditional
OR
(Overestimate Odds Ratio)
5.31
Unconditional
Matched study
Conditional Unconditional Matched study
Unconditional :- Matched study
- Matched design Conditional
-
Conditional :- Matched study
- Matched study Conditional Matched
-
Outcome ( ) data (Dummy variable)
Matched
1
5.32 Model
?
: Hsieh (1989) Hsieh et al. (1998) Logistic
Rule of thumb: regression
Harrell et al. (1984): 1 Outcome 10
Concato et al. (1993): 1 Outcome 10 Hsieh (1989) Hsieh et al.
Feinstein (1996): 1 Outcome 20
(1998) Harrell et al.
Safe rule :- Outcome
Model
(1984): Concato et al. (1993)
Conditional
Conditional
1 Outcome
Memory 10
5
Model
50
5
100
5
Concato, J., Feinstein, A.R., and Holford, T.R., (1993). The risk of determining risk in
multivariable models. Annals of Internal Medicine. 118:201-210.
Feinstein, A.R. (1996). Multivariable analysis: an introduction. Yale university Press: New
Haven.
Harrell, F.E., Lee, K.L., Califf, R.M., Pryor, D.B., and Rosati, R.A. (1984). Regression modelling
strategies for improved prognostic prediction. Statistics in Medicine. 3:143-152.
Hsieh, F. Y. (1989). Sample size tables for logistic regression. Stat Med 8, 795-802.
Hsieh, F. Y., Bloch, D. A., & Larsen, M. D. (1998). A simple method of sample size calculation
for linear and logistic regression. Stat Med 17, 1623-34.
Hosmer, D.W., and Lemeshow, S. (1989). Applied Logistic Regression. New York: John Wiley
& Sons.
Kleinbaum, D.G. (1994). Logistic Regression: A self-learning text. New York Springer-Verlag.
StataCorp. (1999). Stata statistical software: Release 6.0. College Station. TX: Stata
Corporation.
5
1. (Model fitting strategies)
1 4 ( EXAMPLE.DTA
1 4 Polytomous
variable) 2 3
ID 1 465
DEAD 28 0= 1=
AREA 0= 1=
MALPRES 0= 1=
BWT ()
MAGE ()
PLACE 0= 1=
2= 3=()
"summarize"
. summarize
Variable | Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
dead | 465 .1397849 .3471372 0 1
area | 465 .5182796 .5002039 0 1
malpres | 465 .0752688 .2641087 0 1
bwt | 465 3010.695 437.7349 1850 4000
mage | 465 25.52473 5.362298 17 42
place | 465 .2408602 .5273217 0 3
"tab"
. tab dead
"ci"
. ci dead
" 465 65
(neonatal death rate) 14.0% (95%CI: 10.8%-17.1%)".
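The reported rate and interval can be approximated from the counts alone. This is a sketch under an assumption: it uses the normal approximation for a proportion, which is not necessarily the exact method behind the "ci" command, with the counts 65 deaths out of 465 taken from the summary above.

```python
import math

deaths, n = 65, 465

p = deaths / n
se = math.sqrt(p * (1 - p) / n)            # standard error of a proportion
lower, upper = p - 1.96 * se, p + 1.96 * se

print(f"{100 * p:.1f}% (95%CI: {100 * lower:.1f}%-{100 * upper:.1f}%)")
```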
| area |
| Exposed Unexposed | Total
-----------------+------------------------+----------
Cases | 37 28 | 65
Noncases | 204 196 | 400
-----------------+------------------------+----------
Total | 241 224 | 465
| |
Risk | .153527 .125 | .1397849
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Risk difference | .028527 | -.0342996 .0913535
Risk ratio | 1.228216 | .778466 1.937803
Attr. frac. ex. | .1858108 | -.2845776 .4839517
Attr. frac. pop | .1057692 |
Odds ratio | 1.269608 | .7512221 2.145309 (Cornfield)
+-----------------------------------------------
chi2(1) = 0.79 Pr>chi2 = 0.3754
" 241
15.4% 224
12.5%
1.27 (95%CI: 0.8-2.1)
(p-value = 0.375).
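The crude odds ratio for area can be recomputed from the four cell counts of the 2 x 2 table. The sketch below uses OR = ad/bc with a log-based (Woolf) interval; this is an assumption for illustration, since the STATA output above reports Cornfield limits, so the interval differs slightly in the second decimal. The function name `odds_ratio_2x2` is hypothetical.

```python
import math

def odds_ratio_2x2(a, b, c, d, z=1.96):
    """a, b = exposed / unexposed cases; c, d = exposed / unexposed non-cases."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    return or_, or_ * math.exp(-z * se_log), or_ * math.exp(z * se_log)

# Counts from the area table: cases 37 and 28, non-cases 204 and 196
or_area, lo, hi = odds_ratio_2x2(37, 28, 204, 196)
print(round(or_area, 2), round(lo, 2), round(hi, 2))
```

The interval contains 1, which agrees with the non-significant chi-square test in the output.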
| malpres |
| Exposed Unexposed | Total
-----------------+------------------------+----------
Cases | 21 44 | 65
Noncases | 14 386 | 400
-----------------+------------------------+----------
Total | 35 430 | 465
| |
Risk | .6 .1023256 | .1397849
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Risk difference | .4976744 | .3328653 .6624835
Risk ratio | 5.863636 | 3.972854 8.654289
Attr. frac. ex. | .8294574 | .7482918 .8844504
Attr. frac. pop | .2679785 |
Odds ratio | 13.15909 | 6.309044 27.44195 (Cornfield)
+-----------------------------------------------
chi2(1) = 66.67 Pr>chi2 = 0.0000
. gen bwtg = .
(465 missing values generated)
. replace bwtg = 1 if bwt < 2500
(39 real changes made)
. replace bwtg = 2 if bwt >= 2500 & bwt <= 3000
(213 real changes made)
. replace bwtg = 3 if bwt > 3000
(213 real changes made)
| dead
bwtg | 0 1 | Total
-----------+----------------------+----------
1 | 27 12 | 39
| 69.23 30.77 | 100.00
-----------+----------------------+----------
2 | 175 38 | 213
| 82.16 17.84 | 100.00
-----------+----------------------+----------
3 | 198 15 | 213
| 92.96 7.04 | 100.00
-----------+----------------------+----------
Total | 400 65 | 465
| 86.02 13.98 | 100.00
OR 2500
"lintrend"
Model
[Figure: Assessing Linearity Assumption -- Log Odds. Y axis: Log odds of dead (0.00 to -4.00); X axis: Mean of bwt categories (2000.0 to 4000.0)]
Linear relationship
Linear relationship
. replace bwtg = .
(465 real changes made, 465 to missing)
| bwtg |
| Exposed Unexposed | Total
-----------------+------------------------+----------
Cases | 12 53 | 65
Noncases | 27 373 | 400
-----------------+------------------------+----------
Total | 39 426 | 465
| |
Risk | .3076923 .1244131 | .1397849
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Risk difference | .1832792 | .0350754 .3314829
Risk ratio | 2.473149 | 1.449993 4.218275
Attr. frac. ex. | .5956573 | .3103413 .7629363
Attr. frac. pop | .1099675 |
Odds ratio | 3.127883 | 1.513083 6.47943 (Cornfield)
+-----------------------------------------------
chi2(1) = 9.98 Pr>chi2 = 0.0016
. gen mageg = .
(465 missing values generated)
. cs dead mageg, or
| mageg |
| Exposed Unexposed | Total
-----------------+------------------------+----------
Cases | 7 58 | 65
Noncases | 39 361 | 400
-----------------+------------------------+----------
Total | 46 419 | 465
| |
Risk | .1521739 .1384248 | .1397849
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Risk difference | .0137491 | -.0951896 .1226878
Risk ratio | 1.099325 | .5336421 2.264657
Attr. frac. ex. | .0903512 | -.8739152 .558432
Attr. frac. pop | .0097301 |
Odds ratio | 1.117153 | .4874978 2.567507 (Cornfield)
+-----------------------------------------------
chi2(1) = 0.07 Pr>chi2 = 0.7985
| dead
place | 0 1 | Total
-----------+----------------------+----------
0 | 337 38 | 375
| 89.87 10.13 | 100.00
-----------+----------------------+----------
1 | 47 21 | 68
| 69.12 30.88 | 100.00
-----------+----------------------+----------
2 | 11 4 | 15
| 73.33 26.67 | 100.00
-----------+----------------------+----------
3 | 5 2 | 7
| 71.43 28.57 | 100.00
-----------+----------------------+----------
Total | 400 65 | 465
| 86.02 13.98 | 100.00
Model
| dead
place | 0 1 | Total
-----------+----------------------+----------
0 | 337 38 | 375
| 89.87 10.13 | 100.00
-----------+----------------------+----------
1 | 47 21 | 68
| 69.12 30.88 | 100.00
-----------+----------------------+----------
2 | 16 6 | 22
| 72.73 27.27 | 100.00
-----------+----------------------+----------
Total | 400 65 | 465
| 86.02 13.98 | 100.00
OR
. csi 21 38 47 337, or
. csi 6 38 16 337, or
(
) OR 95%CI p-value p-value Model
3 Stratified analysis
| dead
area | 0 1 | Total
-----------+----------------------+----------
0 | 190 26 | 216
| 87.96 12.04 | 100.00
-----------+----------------------+----------
1 | 196 18 | 214
| 91.59 8.41 | 100.00
-----------+----------------------+----------
Total | 386 44 | 430
| 89.77 10.23 | 100.00
| dead
area | 0 1 | Total
-----------+----------------------+----------
0 | 6 2 | 8
| 75.00 25.00 | 100.00
-----------+----------------------+----------
1 | 8 19 | 27
| 29.63 70.37 | 100.00
-----------+----------------------+----------
Total | 14 21 | 35
| 40.00 60.00 | 100.00
------------------------------------------------------------------------------
dead | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | -.5413629 .4338548 -1.248 0.212 -1.391703 .3089768
malpres | .8913886 .9323986 0.956 0.339 -.9360792 2.718856
bwtg | 1.117437 .4577921 2.441 0.015 .2201811 2.014693
mageg | 1.439287 .6143028 2.343 0.019 .2352758 2.643299
Iplace_1 | .5058782 .6178105 0.819 0.413 -.7050082 1.716765
Iplace_2 | 1.306483 .7715727 1.693 0.090 -.2057713 2.818738
a_mal | 2.086607 1.073441 1.944 0.052 -.0172996 4.190513
a_mageg | -1.630821 1.032344 -1.580 0.114 -3.654178 .3925359
Ia_pla_1 | .8395218 .8137985 1.032 0.302 -.7554939 2.434537
Ia_pla_2 | .2971595 1.080756 0.275 0.783 -1.821084 2.415403
_cons | -2.38564 .2742555 -8.699 0.000 -2.923171 -1.848109
------------------------------------------------------------------------------
Log-likelihood Model 0
. lrtest, saving(0)
------------------------------------------------------------------------------
dead | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | -.310655 .355062 -0.875 0.382 -1.006564 .3852538
malpres | .9825991 .9336054 1.052 0.293 -.8472338 2.812432
bwtg | 1.040474 .4492353 2.316 0.021 .159989 1.920959
mageg | 1.55312 .6067453 2.560 0.010 .3639209 2.742319
Iplace_1 | .9748812 .3862543 2.524 0.012 .2178366 1.731926
Iplace_2 | 1.452425 .539014 2.695 0.007 .3959774 2.508873
a_mal | 2.045493 1.077159 1.899 0.058 -.0657009 4.156686
a_mageg | -1.861715 1.004119 -1.854 0.064 -3.829751 .1063217
_cons | -2.487954 .2633255 -9.448 0.000 -3.004063 -1.971846
------------------------------------------------------------------------------
------------------------------------------------------------------------------
dead | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | -.5590873 .3350725 -1.669 0.095 -1.215817 .0976429
malpres | .8722993 .9317147 0.936 0.349 -.9538279 2.698427
bwtg | 1.047556 .4440794 2.359 0.018 .1771763 1.917935
mageg | .7140317 .4804309 1.486 0.137 -.2275954 1.655659
Iplace_1 | .9866509 .3859063 2.557 0.011 .2302884 1.743013
Iplace_2 | 1.478023 .540374 2.735 0.006 .4189098 2.537137
a_mal | 2.246689 1.073616 2.093 0.036 .1424403 4.350939
_cons | -2.382853 .2488897 -9.574 0.000 -2.870668 -1.895038
------------------------------------------------------------------------------
. lrtest, using(1)
Logit: likelihood-ratio test chi2(1) = 3.78
Prob > chi2 = 0.0518
. lrtest, saving(2)
------------------------------------------------------------------------------
dead | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | -.5157685 .3326557 -1.550 0.121 -1.167762 .1362246
malpres | .7955996 .925203 0.860 0.390 -1.017765 2.608964
bwtg | 1.093564 .4429316 2.469 0.014 .2254335 1.961694
Iplace_1 | .8849724 .376117 2.353 0.019 .1477966 1.622148
Iplace_2 | 1.365092 .5319488 2.566 0.010 .3224913 2.407692
a_mal | 2.266141 1.069409 2.119 0.034 .170138 4.362143
_cons | -2.295464 .2367904 -9.694 0.000 -2.759565 -1.831363
------------------------------------------------------------------------------
. lrtest, using(2)
Logit: likelihood-ratio test chi2(1) = 2.01
Prob > chi2 = 0.1565
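The p-values reported by `lrtest` can be reproduced from the chi-square statistics alone. The sketch below uses the identity that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper tail equals erfc(sqrt(x/2)). The function name `chi2_sf_1df` is illustrative, and because the statistics are rounded to two decimals in the output, the recomputed p-values agree only to about three decimals.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability P(chi-square with 1 df > x)."""
    return math.erfc(math.sqrt(x / 2.0))

# Likelihood-ratio statistics as printed in the two lrtest outputs above
p_first = chi2_sf_1df(3.78)
p_second = chi2_sf_1df(2.01)

print(f"chi2(1) = 3.78 -> p = {p_first:.4f}")
print(f"chi2(1) = 2.01 -> p = {p_second:.4f}")
```

Neither test reaches the 0.05 level, which is consistent with dropping the tested term at each step.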
------------------------------------------------------------------------------
dead | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | .5970416 .1986093 -1.550 0.121 .3110624 1.145939
malpres | 2.215769 2.050036 0.860 0.390 .3614018 13.58497
bwtg | 2.984892 1.322103 2.469 0.014 1.252866 7.11136
Iplace_1 | 2.422917 .9113004 2.353 0.019 1.159277 5.063957
Iplace_2 | 3.916082 2.083155 2.566 0.010 1.380563 11.1083
a_mal | 9.642116 10.31136 2.119 0.034 1.185468 78.42503
------------------------------------------------------------------------------
------------------------------------------------------------------------------
dead | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
(1) | 5.756744 5.836508 1.726 0.084 .7891896 41.99257
------------------------------------------------------------------------------
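The single-row output above appears to be a linear combination of coefficients, and it can be checked by hand. The sketch assumes that `a_mal` is the AREA by MALPRES product term, so that the odds ratio for AREA when MALPRES = 1 is exp(b_area + b_a_mal). The covariance between the two coefficients is not shown in the source, so the log-scale standard error is backed out of the reported row with the delta-method relation SE(OR) = OR x SE(log OR) rather than computed from scratch.

```python
import math

# Coefficients from the final model (Coef. table above)
b_area = -0.5157685
b_a_mal = 2.266141   # assumed to be the area x malpres interaction term

# Odds ratio for area when malpres = 1
log_or = b_area + b_a_mal
or_malpres1 = math.exp(log_or)

# Log-scale SE recovered from the reported row: SE(OR) / OR
se_log_or = 5.836508 / 5.756744

lower = math.exp(log_or - 1.96 * se_log_or)
upper = math.exp(log_or + 1.96 * se_log_or)

print(f"OR = {or_malpres1:.4f}, 95% CI: {lower:.3f} to {upper:.2f}")
```

When MALPRES = 0 the interaction term drops out, so the odds ratio for AREA is simply exp(b_area), the value shown in the `area` row of the odds-ratio table.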
7 Summarize findings
465 65 28 14.0%
(95%CI: 10.8% 17.1%) 1
. 28
(p-value = 0.034)
28 1.7 . (95%CI:
0.9 3.2) .
28 5.8
(95%CI: 0.8 42.0)
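The figures quoted in this summary can be re-derived from the outputs above. The sketch assumes the 14.0% interval is a simple Wald (normal approximation) interval for 65 deaths out of 465, and that the 1.7 (0.9 to 3.2) figure is the reciprocal of the `area` odds ratio and its limits; both assumptions reproduce the printed numbers but are not stated in the surviving text.

```python
import math

# Proportion dead with a Wald 95% confidence interval
deaths, n = 65, 465
p = deaths / n
se = math.sqrt(p * (1 - p) / n)
ci_low, ci_high = p - 1.96 * se, p + 1.96 * se
print(f"{100 * p:.1f}% (95% CI: {100 * ci_low:.1f}% to {100 * ci_high:.1f}%)")

# Reciprocal of the area odds ratio and its limits (limits swap places)
or_area, or_lo, or_hi = 0.5970416, 0.3110624, 1.145939
inv_or = 1 / or_area
inv_ci = (1 / or_hi, 1 / or_lo)
print(f"1/OR = {inv_or:.1f} (95% CI: {inv_ci[0]:.1f} to {inv_ci[1]:.1f})")
```

Taking the reciprocal only changes the reference category; the p-value and the strength of evidence are unchanged.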
****************
2.
< 40 577 34
682 57
40 - 59 164 4
245 74
2.1)
2.2) 2.1
2.3) Confounder Effect Modifier
2.4) 2.3
2.5)
3. (Esophageal cancer)
case-control study
case
1 31 2531 Control case
1 2 ESOPH_CA.DTA
1-3 ID 1 = case
4 CASE 1 = case
0 = control
5 ALC1 1 =
2 =
6 ALC2 1 =
2 =
7 SMK 1 =
2 =
8 PARA 1 =
2 =
9 CIGA 0 =
1 = 1-9 /
2 = 10-19 /
3 = 20+ /
. (2541). .
. 3(3) :20-25.
Concato, J., Feinstein, A.R., and Holford, T.R., (1993). The risk of determining risk in
multivariable models. Annals of Internal Medicine. 118:201-210.
Feinstein, A.R. (1996). Multivariable analysis: an introduction. New Haven: Yale University
Press.
Fleiss, J.L. (1981). Statistical methods for rates and proportions. 2nd edition. New York: John
Wiley & Sons.
Guyatt, G., Jaeschke, R., Heddle, N., Cook, D., Shannon, H., and Walter S. (1995). Interpreting
study results: confidence intervals. Canadian Medical Association Journal. 152:169-173.
Harrell, F.E., Lee, K.L., Califf, R.M., Pryor, D.B., and Rosati, R.A. (1984). Regression modelling
strategies for improved prognostic prediction. Statistics in Medicine. 3:143-152.
Hosmer, D.W., and Lemeshow, S. (1989). Applied Logistic Regression. New York: John Wiley
& Sons.
Hsieh, F. Y. (1989). Sample size tables for logistic regression. Stat Med 8, 795-802.
Hsieh, F. Y., Bloch, D. A., & Larsen, M. D. (1998). A simple method of sample size calculation
for linear and logistic regression. Stat Med 17, 1623-34.
Jaeschke, R., Guyatt, G., Shannon, H., Walter, S., Cook, D., and Heddle, N. (1995). Assessing the
effects of treatment: measures of association. Canadian Medical Association Journal.
152:351-357.
Kleinbaum, D.G. (1994). Logistic Regression: A self-learning text. New York: Springer-Verlag.
Kleinbaum, D.G., Kupper, L.L., and Morgenstern, H. (1982). Epidemiologic research: principles
and quantitative methods. London: Lifetime Learning Publications.
Lang, T.A., and Secic, M. (1997). How to report statistics in medicine: annotated guidelines for
authors, editors, and reviewers. Philadelphia: American College of Physicians.
Mazumdar, M., and Glassman, J. R. (2000). Categorizing a prognostic variable: review of
methods, code for easy implementation and applications to decision-making about cancer
treatments. Stat Med 19, 113-32.
StataCorp. (1999). Stata statistical software: Release 6.0. College Station, TX: Stata
Corporation.
1
1
1
1.2.5 400 records
2
2.1
STATA
EXERCISE+
          CHD+   CHD-   Total
SMK+        50     50     100
SMK-        50     50     100
Total      100    100     200

EXERCISE-
          CHD+   CHD-   Total
SMK+        25     10      35
SMK-        25     40      65
Total       50     50     100

exc   smk   chd     n
  1     1     1    50
  1     1     0    50
  1     0     1    50
  1     0     0    50
  2     1     1    25
  2     1     0    10
  2     0     1    25
  2     0     0    40

EXC 1=Always, 2= Not always
SMK 0=No, 1=Yes
CHD 0=No, 1=Yes
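As a check on the data layout, the grouped records (exc, smk, chd, n) can be expanded back into the two 2 x 2 tables and their odds ratios. This is a minimal sketch; the function name `smk_chd_or` is illustrative and not part of the exercise.

```python
# Grouped records: (exc, smk, chd, n), with n the number of subjects in that cell
records = [
    (1, 1, 1, 50), (1, 1, 0, 50), (1, 0, 1, 50), (1, 0, 0, 50),
    (2, 1, 1, 25), (2, 1, 0, 10), (2, 0, 1, 25), (2, 0, 0, 40),
]

def smk_chd_or(rows):
    """Odds ratio of CHD for smokers vs non-smokers from grouped rows."""
    count = {(s, c): 0 for s in (0, 1) for c in (0, 1)}
    for _exc, smk, chd, n in rows:
        count[(smk, chd)] += n
    return (count[(1, 1)] * count[(0, 0)]) / (count[(1, 0)] * count[(0, 1)])

or_exc1 = smk_chd_or([r for r in records if r[0] == 1])   # EXC = 1 (Always)
or_exc2 = smk_chd_or([r for r in records if r[0] == 2])   # EXC = 2 (Not always)
or_crude = smk_chd_or(records)                            # ignoring EXC
total = sum(r[3] for r in records)

print(or_exc1, or_exc2, or_crude, total)
```

The stratum-specific odds ratios differ (1.0 versus 4.0) and the crude value (1.5) lies between them, so these data illustrate effect modification by exercise rather than a single common odds ratio.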
001
300
1. EXC [ ]
[ ]1. [ ]2.
2. SMK [ ]
[ ]0. [ ]1.
3. CHD [ ]
[ ]0. [ ]1.
3.
3.1) Dichotomous DEAD AREA MALPRES
3.6)
1.
1. 28 DEAD[ 1 ]
[X]1. [ ]0.
2. AREA[ 1 ]
[X]1. [ ]0.
3. MALPRES[ 0 ]
[ ]1. [X]0.
4. ..........2600....... BWT[2][6 ][ 0 ][ 0 ]
5. .................30.................... AGE[ 3 ][ 0 ]
6.
....0...... DCHILD[ 0 ]
3.7) 465
3.9.1.4
Stratified analysis DEAD AREA Interaction
effect MALPRES Effect Modifier
28
28
1.49 (95%CI: 0.80 2.79)
28
7.13 (95%CI: 0.93 67.28)
: 1. (95%CI.)
2. OR 1
OR 1
1/OR 1/0.67 = 1.49
OR 1
(protective effect) 1 (risk
effect)
3.9.2.4
28
2
1.
1.1) Logistic Function :
$$f(z) = \frac{1}{1 + e^{-z}}$$
$$\ln\left[\frac{P(X)}{1 - P(X)}\right] = a + \sum b_i x_i$$
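The logistic function f(z) = 1 / (1 + e^(-z)) and the logit ln[P / (1 - P)] are inverses of each other, which is why a model that is linear on the logit scale always yields probabilities between 0 and 1. A small sketch (function names are illustrative):

```python
import math

def logistic(z):
    """Map any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Map a probability in (0, 1) back to the log odds."""
    return math.log(p / (1.0 - p))

for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    p = logistic(z)
    print(f"z = {z:+.1f}  f(z) = {p:.4f}  logit(f(z)) = {logit(p):+.4f}")
```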
2.
2.1) OR Logistic Model
Cohort study
Case-control study
Cross-sectional Study
2.2) OR
(i) $OR_{X_1,X_0} = \exp\left\{\sum\left[b_i(X_{1i} - X_{0i})\right]\right\}$
(ii) X1 AREA=1
X0 AREA=0
2.3) 2 x 2
                DEAD
                1      0
AREA    1      37    204
        0      28    196
OR = (37 x 196) / (204 x 28) = 1.27
3
3.2) OR
(i) $OR_{X_1,X_0} = \exp\left\{\sum\left[b_i(X_{1i} - X_{0i})\right]\right\}$
(ii) DCHILD
X1 (AREA=1, DCHILD )
DCHILD X0
X0 (AREA=0, DCHILD )
4.
Logit P(X) = a + b1AREA + b2MALPRES
( 3 2 (2.1)
3
1.
1.1 BWTG = 2 (Reference group)
(Dummy Variable)
            BWTD1   BWTD3
BWTG = 1      1       0
BWTG = 2      0       0
BWTG = 3      0       1
1.3.1) OR
(i) $OR_{X_1,X_0} = \exp\left\{\sum\left[b_i(X_{1i} - X_{0i})\right]\right\}$
(ii)
X1 (BWTD3=1, )
X0 (BWTD1=0, )
(iii) 3.2 (iii) 2
ORx1,Xo = Exp(0.6092406)
= 1.84
1.3.2) OR
(i) $OR_{X_1,X_0} = \exp\left\{\sum\left[b_i(X_{1i} - X_{0i})\right]\right\}$
(ii)
X1 (BWTD1=1, )
X0 (BWTD1=0, )
(iii) 3.2 (iii) 2
ORx1,Xo = Exp(-0.8628492)
= 0.42
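The dummy-variable coding and the two odds ratios above can be combined in one small sketch. It applies the general formula OR = exp{sum of b_i (X1i - X0i)} with BWTG = 2 as the reference group; the coefficient values are the ones quoted in 1.3.1 and 1.3.2, and the function names are illustrative.

```python
import math

def bwtg_dummies(bwtg):
    """Return (BWTD1, BWTD3) for a birth-weight group, with BWTG = 2 as reference."""
    return (1 if bwtg == 1 else 0, 1 if bwtg == 3 else 0)

# Coefficients quoted in the answers above
b_bwtd1 = -0.8628492
b_bwtd3 = 0.6092406

def or_vs_reference(bwtg):
    """Odds ratio of a group against BWTG = 2, other covariates held fixed."""
    d1, d3 = bwtg_dummies(bwtg)
    ref1, ref3 = bwtg_dummies(2)
    return math.exp(b_bwtd1 * (d1 - ref1) + b_bwtd3 * (d3 - ref3))

for group in (1, 2, 3):
    print(f"BWTG = {group}: dummies = {bwtg_dummies(group)}, OR = {or_vs_reference(group):.2f}")
```

Covariates that take the same value in X1 and X0 contribute zero to the sum, which is why only the dummy variable that differs appears in each exponent.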
2.
2.1) Model Logit transformation
Logit P(X) = a + b1AREA + b2MALPRES + b3AREA*MALPRES
4
1.
1.1.1) OR. 95%
                DEAD
                1      0
AREA    1      37    204
        0      28    196
OR. = (37 x 196) / (204 x 28) = 1.27
$$95\%\ \text{CI of OR} = 1.27 \times \exp\left[\pm 1.96\sqrt{\frac{1}{37} + \frac{1}{204} + \frac{1}{28} + \frac{1}{196}}\right]$$
= 0.75 - 2.15
------------------------------------------------------------------------------
dead | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
area | 1.269608 .342429 0.885 0.376 .7483246 2.154017
------------------------------------------------------------------------------
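The hand calculation and the STATA row above can be compared numerically. The sketch uses the usual large-sample standard error of the log odds ratio, the square root of the sum of reciprocal cell counts; the variable names are illustrative.

```python
import math

# 2 x 2 table: rows AREA = 1 / 0, columns DEAD = 1 / 0
a, b = 37, 204   # AREA = 1: dead, alive
c, d = 28, 196   # AREA = 0: dead, alive

or_crude = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

lower = math.exp(math.log(or_crude) - 1.96 * se_log_or)
upper = math.exp(math.log(or_crude) + 1.96 * se_log_or)

# Standard error on the odds-ratio scale (delta method), as in the Std. Err. column
se_or = or_crude * se_log_or

print(f"OR = {or_crude:.4f}, SE(log OR) = {se_log_or:.4f}, SE(OR) = {se_or:.4f}")
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

The same standard error of about 0.27 is what a logit model of dead on area reports for the area coefficient, so the tabular and model-based intervals coincide.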
1.2.3 :
(.)
28 1.27
transformation
1.2.1)
OR. = Exp(0.239) = 1.27
95% CI of OR. = Exp[0.239 ± 1.96(0.27)]
= Exp(-0.29) to Exp(0.77)
= 0.75 to 2.16
Standard Error (SE.) = 0.27 is taken from the STATA command
.logit dead area
1.2.2) STATA
.logistic dead area
OR. = 1.27
95% CI OR. = 0.75 - 2.15
2.
2.1) STATA Fit Model
.logit dead mage
------------------------------------------------------------------------------
dead | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
mage | .0170007 .0244332 0.696 0.487 -.0308874 .0648888
_cons | -2.254025 .64633 -3.487 0.000 -3.520808 -.9872413
------------------------------------------------------------------------------
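2.2) $OR_{MAGE\,30,20} = e^{L}$
$L = b(MAGE_{30} - MAGE_{20})$
OR. = Exp[(30-20)(0.017)]
= Exp(0.17)
= 1.19
95% CI
$$95\%\ \text{CI of OR} = \exp\left[L \pm 1.96\sqrt{\operatorname{var}(L)}\right]$$
$$L = 10\,b_{MAGE}$$
$$\operatorname{var}(L) = \operatorname{var}(10\,b_{MAGE}) = 10^{2}\operatorname{var}(b_{MAGE})$$
$$\sqrt{\operatorname{var}(L)} = \sqrt{10^{2}\operatorname{var}(b_{MAGE})} = 10 \times \sqrt{\operatorname{var}(b_{MAGE})} = 10 \times SE(b_{MAGE}) = 10 \times 0.024 = 0.24$$
(var = SE. squared; SE.(bMAGE) is from the output in 2.1)
95% CI of OR. = Exp[(0.17) ± 1.96 x 0.24]
= Exp(-0.30) to Exp(0.64)
= 0.74 to 1.90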
: 30 28
1.19 20 (95%CI: 0.74 1.90)
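The same calculation can be scripted for any age difference. The sketch uses the unrounded coefficient and standard error from the `logit dead mage` output, so the limits differ slightly in the second decimal from the hand calculation, which rounds b to 0.017 and SE to 0.024. The function name and its arguments are illustrative.

```python
import math

b_mage = 0.0170007    # coefficient of mage
se_mage = 0.0244332   # its standard error

def or_for_difference(age1, age0, b=b_mage, se=se_mage, z=1.96):
    """Odds ratio and 95% CI comparing age1 with age0 for a continuous covariate."""
    delta = age1 - age0
    L = b * delta              # difference in log odds
    se_L = abs(delta) * se     # sqrt(delta^2 * var(b))
    return math.exp(L), math.exp(L - z * se_L), math.exp(L + z * se_L)

or_10, lower, upper = or_for_difference(30, 20)
print(f"OR (30 vs 20) = {or_10:.2f}, 95% CI: {lower:.2f} to {upper:.2f}")
```

Scaling the comparison from 1 year to 10 years multiplies both the log odds ratio and its standard error by 10, so the z statistic and p-value for mage are unchanged.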
3.
3.2) OR.
OR(MALPRES=0) = Exp(-0.399)
= 0.67
OR(MALPRES=1) = Exp(-0.399 + 2.362)
= 7.12
3.5)
. 28
28 1.49 .
(95%CI: 0.79 2.78)
. 28 7.13
(95%CI: 1.18 43.14)
: OR 1
1
OR=1.49 1 / 0.67 95%CI: 0.79 2.81 1/ 1.26 1/0.36
1.
50 () .
.
. 40002
Download http://web.kku.ac.th/~bandit/data/
2. STATA
STATA
Stata Corporation
702 University Drive East
College Station, TX 77840 USA.
Fax. 409-696-4601
http://www.stata.com
:
.
. . 40002
e-Mail : karawa@kku.ac.th