
Logistic regression analysis

Nguyễn Văn Tuấn
Many medical studies (and experimental science in general) aim primarily to analyze the association between one or more risk factors and the risk of disease. For example, in a study of the association between smoking and lung cancer, the risk factor is smoking and the quantity analyzed is the risk of lung cancer. In epidemiological terms, the former is called a risk factor and the latter the outcome. In such studies the outcome is usually expressed as a binary variable: yes/no, diseased/not diseased, dead/alive, occurred/did not occur, and so on. The risk factors may be continuous variables (such as age, blood pressure, or bone density), binary variables (such as sex), or ordinal variables (such as disease severity ranging from mild through moderate to severe).

The question for studies of this kind is how to estimate the magnitude of association between the risk factor and the disease. Methods such as the linear regression model cannot be applied, because the dependent variable is not continuous but binary. In the 1970s the statistician David R. Cox developed a model called the logistic regression model to analyze binary variables. I will explain how to apply this model through several examples, from simple to complex. I will not discuss the mathematical details of the logistic regression model, and will focus instead on practical aspects and on interpreting the results.

I. Simple logistic regression analysis for a case-control study


Example 1: A study of the association between Agent Orange exposure and prostate cancer. Giri and colleagues (2004) conducted a pilot study to assess the association between exposure to Agent Orange (AO) and prostate cancer risk among American veterans who had served in Vietnam. The investigators identified 47 cases of prostate cancer among veterans who had served there. They then randomly selected 144 veterans who had also served in Vietnam and had been hospitalized for reasons unrelated to cancer; this is the control group. For each group, the investigators searched the medical records and conducted interviews to determine who had been exposed to AO during the war. Of the 47 cancer cases, 11 had been exposed to AO, 29 had not been exposed, and 7 had an unclear history; in the control group, 17 had been exposed, 106 had not been exposed, and for 21 exposure could not be determined. The results are summarized in the following table:

Medical training program, YKHOA.NET Training, Nguyễn Văn Tuấn
Table 1. AO exposure and prostate cancer

                        Cancer     Controls
                        (n=47)     (n=144)
Exposed to AO             11          17
Not exposed to AO         29         106
Unknown                    7          21
Total                     47         144
Note: n is the number of patients. Data source: Giri VN, Cassidy AE, Beebe-Dimmer J, Ellis
LR, Smith DC, Bock CH, Cooney KA. Association between Agent Orange and prostate
cancer: a pilot case-control study. Urology. 2004 Apr;63(4):757-60; discussion 760-1.
Correction in Urology. 2004 Jun;63(6):1213.
To illustrate logistic regression analysis and to simplify the problem, I will combine the "Not exposed to AO" and "Unknown" groups into a single group. (This choice could itself be the subject of a separate analysis!) The table above can therefore be reduced to:

                                   Cancer     Controls
Exposed to AO                        11          17
Not exposed to AO or unknown         36         127

From these data, 23.4% (11/47) of the prostate cancer group had been exposed to AO, whereas the corresponding proportion in the control group was 11.8% (17/144). The question is whether there is an association between AO exposure and prostate cancer. The word "association" can be unpacked into two specific questions:

- What is the risk of prostate cancer among those who were exposed, relative to the risk among those who were not exposed?

- Is the difference in cancer risk between the two groups statistically significant?

The logistic regression model can answer both questions. The key statistic for analyzing data from case-control studies such as this one is the odds ratio (OR). To estimate the OR, I need to explain it step by step.

English has a noun describing risk or chance, "odds", that has no direct equivalent in many other languages (such as French, Italian, Spanish, Chinese, or Vietnamese). For that reason I leave the term untranslated. Briefly, the odds is the ratio of the two values of a binary variable, that is, the number with the event divided by the number without it. The OR is therefore the ratio of two odds; in other words, the OR is a ratio of two ratios! In the example above, we have:

- the odds of cancer in the group exposed to AO are 11/17 = 0.647;

- the odds of cancer in the group not exposed to AO are 36/127 = 0.283;

- and the odds ratio of cancer in the exposed group relative to the unexposed group is OR = 0.647 / 0.283 = 2.28.

In fact, the OR can also be computed directly from a single formula:

\[ OR = \frac{11 \times 127}{17 \times 36} = 2.28 \]

In other words, the risk of prostate cancer (measured as odds) among veterans exposed to AO was about 2.3 times that among veterans not exposed to AO.
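The hand calculation above can be reproduced in a few lines of R. This is only a minimal sketch: the object names (tab, odds_exposed, and so on) are chosen here for illustration and do not come from the original analysis.

```r
# 2x2 table from the reduced data: rows = exposure, columns = outcome
tab <- matrix(c(11, 17,
                36, 127),
              nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("AO", "no AO / unknown"),
                              outcome  = c("cancer", "control")))

# odds of cancer within each exposure group
odds_exposed   <- tab["AO", "cancer"] / tab["AO", "control"]
odds_unexposed <- tab["no AO / unknown", "cancer"] /
                  tab["no AO / unknown", "control"]

# odds ratio, computed in two equivalent ways
or_ratio <- odds_exposed / odds_unexposed
or_cross <- (11 * 127) / (17 * 36)

round(c(odds_exposed = odds_exposed,
        odds_unexposed = odds_unexposed,
        OR = or_ratio), 3)
```

Both routes give the same odds ratio, because the cross-product formula is simply the ratio of the two odds written out in full.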
But this is a study based on a single sample, and the estimate above can vary from sample to sample. Remember that the OR is an estimate of a true OR that we do not know in practice. The true value may lie below 1 or above 1. If the true OR is below 1, those exposed to AO have a lower risk of cancer than those not exposed; a true OR above 1 means that those exposed have a higher risk than those not exposed; and a true OR equal to 1 means that there is no association between AO exposure and prostate cancer.

The second (and perhaps more important) question is therefore whether the association reflected by the OR is statistically significant. In other words, if the study were repeated many times, how much would the OR vary? If the study were repeated (say) 100 times, and 95 of the studies gave OR estimates between 1.1 and 3.8 while 5 gave an OR below 1.1 or above 3.8, we would have evidence to state that the association between AO exposure and prostate cancer is statistically significant.

Put differently, we need to estimate the standard error of the OR and its 95% confidence interval. Because the OR is a ratio, its standard error cannot be estimated directly (or can be, but with great difficulty) and must be obtained by indirect methods. One such method is Woolf's method, which proceeds step by step as follows:

- First, we convert the OR to the natural logarithm scale:

\[ \log OR = \log(OR) = \log(2.28) = 0.824 \]

- Second, we estimate the standard error (denoted SE) of logOR with the following formula:

\[ SE = \sqrt{\frac{1}{11} + \frac{1}{17} + \frac{1}{36} + \frac{1}{127}} = 0.430 \]

- Third, by the normal distribution, the 95% confidence interval of logOR is logOR ± 1.96 × SE, which in this case gives:

\[ 0.824 - 1.96 \times 0.430 = -0.0188 \]
\[ 0.824 + 1.96 \times 0.430 = +1.6668 \]

- Because these values are on the log scale, the fourth step is to convert the 95% confidence interval back to the original ratio scale:

\[ e^{-0.0188} = 0.98 \quad \text{to} \quad e^{1.6668} = 5.30 \]

The analysis shows that the estimated OR is 2.28, with a 95% confidence interval from 0.98 to 5.30. Loosely speaking, if the study were repeated 100 times, about 95 of the studies would give an OR somewhere between just under 1 (0.98) and as high as 5.30.

We now have the answer to the second question. Because the 95% confidence interval extends both below and above 1, we conclude that the association between AO exposure and prostate cancer risk is not statistically significant. I stress that this is only a statistical conclusion; I have not discussed the clinical meaning of these data, which lies outside the scope of this discussion.
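The four steps of Woolf's method can be sketched in R as follows. The variable names are illustrative assumptions, and qnorm(0.975) is used in place of the rounded value 1.96. Because full precision is carried through every step, the last digit may differ slightly from the rounded hand calculation.

```r
# cell counts from the reduced 2x2 table
exp_case   <- 11;  exp_ctrl   <- 17    # exposed: cancer, control
unexp_case <- 36;  unexp_ctrl <- 127   # unexposed: cancer, control

# step 1: odds ratio on the natural log scale
log_or <- log((exp_case * unexp_ctrl) / (exp_ctrl * unexp_case))

# step 2: Woolf standard error of log(OR)
se <- sqrt(1/exp_case + 1/exp_ctrl + 1/unexp_case + 1/unexp_ctrl)

# step 3: 95% confidence interval on the log scale
z <- qnorm(0.975)                      # about 1.96
ci_log <- log_or + c(-1, 1) * z * se

# step 4: back-transform to the odds-ratio scale
ci_or <- exp(ci_log)

round(c(OR = exp(log_or), lower = ci_or[1], upper = ci_or[2]), 2)
```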

II. The logistic regression model


The example above illustrates logistic regression analysis carried out by hand. In fact, logistic regression can be expressed as a general model. Let p be the probability of an event (in the example above, the event is prostate cancer); the odds can then be defined as:

\[ \text{odds} = \frac{p}{1-p} \]

Let x denote AO exposure status, taking two values: 0 for never exposed and 1 for exposed to AO. The logistic regression model states that log(odds) depends on the value of x through a linear function with two parameters:

\[ \log(\text{odds}) = \alpha + \beta x + \varepsilon \]

or,

\[ \log\left(\frac{p}{1-p}\right) = \alpha + \beta x + \varepsilon \qquad [1] \]

Here log(odds), or \(\log\left(\frac{p}{1-p}\right)\), is also called logit(p) (hence the name "logistic"); \(\alpha\) and \(\beta\) are the two parameters to be estimated from the data, and \(\varepsilon\) is the residual, that is, the part not explained by x. The reason for transforming p into logit(p) is that p is confined to the interval between 0 and 1, whereas logit(p) is unbounded and is therefore suitable for analysis with a linear regression model.

The model assumes that \(\varepsilon\) follows a normal distribution with mean 0 and constant variance. Under this assumption, the expected value (mean) of \(\log\left(\frac{p}{1-p}\right)\) for any value of x is \(\alpha + \beta x\) (because the mean of \(\varepsilon\) is 0). In other words, from equation [1], the odds of cancer are:

\[ \text{odds} = \frac{p}{1-p} = e^{\alpha + \beta x + \varepsilon} \qquad [2] \]

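The logistic regression model thus states that the odds of an event (prostate cancer) depend on x (AO exposure status). From equation [2], with \(\varepsilon\) set to its mean of 0, the unexposed group (x = 0) has odds of cancer (odds_0 for short):

\[ \text{odds}_0 = e^{\alpha + \beta \times 0} = e^{\alpha} \qquad [3] \]

and the exposed group (x = 1) has odds of cancer (odds_1):

\[ \text{odds}_1 = e^{\alpha + \beta \times 1} = e^{\alpha + \beta} \qquad [4] \]

The ratio of the two odds is the odds ratio (which is why I render "odds ratio" in Vietnamese as "tỉ số nguy cơ"). The OR can be estimated from [3] and [4] as follows:

\[ OR = \frac{\text{odds}_1}{\text{odds}_0} = \frac{e^{\alpha + \beta}}{e^{\alpha}} = e^{\beta} \qquad [5] \]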

In practice, we do not know the true values of the two parameters \(\alpha\) and \(\beta\) and must estimate them from the observed data. By statistical convention, the estimates of these parameters are denoted with a hat: \(\hat{\alpha}\) and \(\hat{\beta}\). In Example 1, the estimate of \(\beta\) is \(\hat{\beta}\) = 0.824. The OR thus reflects the odds of cancer in the group exposed to AO relative to the odds in the group never exposed to AO. In Example 1, \(e^{\hat{\beta}} = e^{0.824} = 2.28\).
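Equations [3] to [5] can be checked numerically. The sketch below assumes that \(\alpha\) is the log odds in the unexposed group and \(\beta\) the log odds ratio, both computed from the reduced table of Example 1; the helper functions odds() and prob() are hypothetical names introduced only for this illustration.

```r
alpha <- log(36 / 127)                  # log odds in the unexposed group
beta  <- log((11 * 127) / (17 * 36))    # log odds ratio

# odds as a function of exposure status x, equations [3] and [4]
odds <- function(x) exp(alpha + beta * x)

# back-transform odds to a probability: p = odds / (1 + odds)
prob <- function(x) odds(x) / (1 + odds(x))

odds(0)              # odds in the unexposed group
odds(1)              # odds in the exposed group
odds(1) / odds(0)    # equals exp(beta), equation [5]

# proportion of cases in each exposure group of this sample
# (sample proportions only, not population risks, in a case-control design)
prob(c(0, 1))
```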

III. Estimating the parameters of the logistic regression model with R


As shown above, estimating the OR and its 95% confidence interval by hand is simple but rather tedious. With several independent variables x, the calculations become more complex and a manual analysis would take a great deal of time. Today, computers and statistical software provide a very efficient means of analysis. One such package for statistical analysis is simply called R, which I introduced in the book "Phân tích số liệu và tạo biểu đồ bằng R" (Nhà xuất bản Khoa học và Kỹ thuật, TPHCM 2007).

Here I will show how to analyze the data above with R. Before the analysis, the data must be entered in a form that R can read. For ease of reference, the table is shown once more:

                                   Cancer     Controls
Exposed to AO                        11          17
Not exposed to AO or unknown         36         127

Here we have two variables, called ao and cancer for short; each takes two values: 0 (no) and 1 (yes). In the group with ao = 1 (exposed) there are 28 subjects, of whom 11 have cancer; in the group with ao = 0 (not exposed) there are 163 subjects, of whom 36 have cancer. We set up these data in R as follows:
ao <- c(1, 0)
ntotal <- c(28, 163)
cancer <- c(11, 36)
proportion <- cancer/ntotal

Notes:

- Line 1 defines the variable ao with two values, 1 and 0 (the symbol <- is equivalent to the equals sign =);

- Line 2 defines the variable ntotal, indicating that ao=1 has 28 subjects and ao=0 has 163 subjects;

- Line 3 defines the variable cancer, indicating that ao=1 has 11 subjects with cancer and ao=0 has 36;

- Line 4 defines the variable proportion as cancer divided by ntotal, that is, the proportion with cancer in each ao group.

With the data entered, we are ready for the analysis. R has the function glm, which handles logistic regression analysis. Its syntax is described in my book; here I explain it only briefly:
logistic <- glm(proportion ~ ao, family=binomial,
                weights=ntotal)

In this command, we ask R to use glm to model proportion as a function of ao (the symbol ~ means "is modelled by"), with proportion following a binomial distribution because the underlying outcome has only 2 values. We also supply the argument weights=ntotal, which tells R that each row is a summary of ntotal patients (rather than a single patient).

The results are stored in an object named logistic (we could, of course, use any other name we like). We can now view the results by applying summary to the object logistic:
summary(logistic)

Call:
glm(formula = proportion ~ ao, family = "binomial", weights = ntotal)

Deviance Residuals:
[1] 0 0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.2607     0.1888  -6.677 2.44e-11 ***
ao            0.8254     0.4306   1.917   0.0552 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  3.5022e+00 on 1 degrees of freedom
Residual deviance: -2.3093e-14 on 0 degrees of freedom
AIC: 12.933

Number of Fisher Scoring iterations: 3

Table 2. Results of the logistic regression analysis in R.

Notes: The command summary(logistic) gives the results shown in Table 2 above.

(a) The Call section tells us which model was fitted;

(b) Deviance: the second part of the output reports the deviance residuals (the residual \(\varepsilon\) in model [1]).
Deviance Residuals:
[1] 0 0

Deviance reflects the discrepancy between the model and the data (much like the residual mean square in linear regression). For a single-variable model such as this example, the deviance value carries little meaning.
(c) The next section gives the estimates of \(\alpha\) (which R calls Intercept) and \(\beta\) (ao), with the standard error of each estimate:
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.2607     0.1888  -6.677 2.44e-11 ***
ao            0.8254     0.4306   1.917   0.0552 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From these results, \(\hat{\alpha}\) = -1.2607 and \(\hat{\beta}\) = 0.8254. The estimate \(\hat{\beta}\) is positive, indicating a positive association between cancer and ao: the risk of cancer increases as ao increases. However, the z test (the estimate divided by its standard error) shows that the effect of ao does not quite reach statistical significance, with p = 0.055.

Recall that the OR is \(e^{0.8254}\) = 2.28 (equation [5]), the same value we obtained by hand in the previous section. In other words, when ao=1 the odds of cancer are 2.28 times those in the ao=0 group.

(d) The remaining sections give several model statistics that are not relevant to the question at hand, so I will not explain them here.

As shown, the results from R are identical to those from the manual analysis. The advantage of using a computer is time: once the data are entered, all of these calculations take R less than 1 second! R also provides standard errors that are often very difficult to compute in multivariable analysis (which I will discuss in a later article).
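As a sketch of how the odds ratio and its interval might be pulled directly out of a fitted model, the code below refits Example 1 using a two-column response of (cases, non-cases), which is an equivalent way of specifying grouped binomial data, and then exponentiates the coefficient and its Wald interval from confint.default. The object names fit, or_hat and wald_ci are assumptions made for this illustration.

```r
ao     <- c(1, 0)
cancer <- c(11, 36)
ntotal <- c(28, 163)

# equivalent specification: two-column response of (events, non-events)
fit <- glm(cbind(cancer, ntotal - cancer) ~ ao, family = binomial)

# odds ratio: exponentiated coefficient of ao
or_hat <- unname(exp(coef(fit))["ao"])

# Wald 95% confidence interval (estimate +/- 1.96 SE), back-transformed
wald_ci <- unname(exp(confint.default(fit))["ao", ])

or_hat
wald_ci
```

For a single binary predictor the Wald interval coincides with the interval from Woolf's method, since the model standard error of the ao coefficient is the same square-root-of-reciprocals quantity.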

IV. Logistic regression analysis with a continuous variable


In Example 1, both the dependent variable (cancer) and the independent variable (AO exposure) are binary, so the calculations are simple. In many studies, however, the independent variable (the risk factor) is continuous, and examining the association between the two variables is somewhat more involved. In this section I discuss such a case and use R to solve it.

Example 2. A study of the association between fibrinogen and ESR. The erythrocyte sedimentation rate (ESR) is the rate at which red blood cells (erythrocytes) settle in blood plasma. Patients with an ESR above 20 mm/hour are at high risk of rheumatic disease and chronic inflammatory diseases, while an ESR below 20 is regarded as normal. When the ESR rises, the levels of certain blood proteins also rise; one of these proteins is fibrinogen. A study measured ESR and fibrinogen in 32 subjects (Collett D, Jemain AA. Residuals, outliers and influential observations in regression analysis. Sains Malaysiana 1985; 4:493-511), and the investigators found that 6 subjects in this group had an ESR above 20 mm/hour. They wanted to know whether there is an association between fibrinogen and ESR. The data for the 32 subjects are shown in Table 3 below:

Table 3. Fibrinogen and ESR in 32 subjects

id   fibrinogen   esr
 1      2.52       0
 2      2.56       0
 3      2.19       0
 4      2.18       0
 5      3.41       0
 6      2.46       0
 7      3.22       0
 8      2.21       0
 9      3.15       0
10      2.60       0
11      2.29       0
12      2.35       0
16      3.15       0
18      2.68       0
19      2.60       0
20      2.23       0
21      2.88       0
22      2.65       0
24      2.28       0
25      2.67       0
26      2.29       0
27      2.15       0
28      2.54       0
30      3.34       0
31      2.99       0
32      3.32       0
13      5.06       1
14      3.34       1
15      2.38       1
17      3.53       1
23      2.09       1
29      3.93       1

Note: id is the subject's identification number; esr is coded 0 (if ESR is below 20) or 1 (if ESR is above 20).

Let p be the probability that esr=1 and x the level of fibrinogen in the blood. The logistic regression model [1] can be applied to answer the question above:

\[ \log\left(\frac{p}{1-p}\right) = \alpha + \beta x + \varepsilon \qquad [6] \]

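Note that here x is a continuous variable, not a binary one, so the method for estimating the parameters \(\alpha\) and \(\beta\) also differs from Example 1. The main method for estimating the parameters in model [6] is maximum likelihood. It lies outside the scope of this article, so I will not present it here (readers may consult a textbook if needed). I will only mention briefly that maximum likelihood yields the following system of equations:

\[ \sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \frac{1}{1 + e^{-(\alpha + \beta x_i)}} \]
\[ \sum_{i=1}^{n} x_i y_i = \sum_{i=1}^{n} \frac{x_i}{1 + e^{-(\alpha + \beta x_i)}} \]

where \(y_i\) is the dependent variable (esr, with value 0 or 1), \(x_i\) is the independent variable (fibrinogen), and n is the sample size. These equations have no closed-form solution, so the estimates \(\hat{\alpha}\) and \(\hat{\beta}\) are found iteratively; commonly used algorithms are iteratively reweighted least squares and Newton-Raphson. R's glm uses iteratively reweighted least squares (Fisher scoring), which for the logistic model is equivalent to Newton-Raphson.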
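To make the iterative solution concrete, here is a minimal Newton-Raphson sketch for the two likelihood equations, using the fibrinogen data of Table 3. It is written for illustration only (glm is what one would use in practice); the names X, theta, score and info, the starting values of zero, and the convergence tolerance are all assumptions of this sketch.

```r
fibrinogen <- c(2.52, 2.56, 2.19, 2.18, 3.41, 2.46, 3.22, 2.21, 3.15,
                2.60, 2.29, 2.35, 3.15, 2.68, 2.60, 2.23, 2.88, 2.65,
                2.28, 2.67, 2.29, 2.15, 2.54, 3.34, 2.99, 3.32,
                5.06, 3.34, 2.38, 3.53, 2.09, 3.93)
esr <- c(rep(0, 26), rep(1, 6))

X <- cbind(1, fibrinogen)   # design matrix: intercept and fibrinogen
theta <- c(0, 0)            # starting values for (alpha, beta)

for (iter in 1:25) {
  # fitted probabilities under the current parameter values
  p <- as.vector(1 / (1 + exp(-(X %*% theta))))
  # score vector: the two likelihood equations, written as sum(x * (y - p))
  score <- t(X) %*% (esr - p)
  # information matrix: X' W X with weights p * (1 - p)
  info <- t(X) %*% (X * (p * (1 - p)))
  # Newton-Raphson update
  step <- as.vector(solve(info, score))
  theta <- theta + step
  if (max(abs(step)) < 1e-8) break
}

theta                                            # hand-rolled estimates
coef(glm(esr ~ fibrinogen, family = binomial))   # glm, for comparison
```

At convergence the score vector is essentially zero, which is exactly the statement that both equations of the system hold.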
Before the analysis, we need to enter the data into R as follows (the id variable is not needed):
fibrinogen <- c(2.52, 2.56, 2.19, 2.18, 3.41, 2.46, 3.22, 2.21, 3.15,
                2.60, 2.29, 2.35, 3.15, 2.68, 2.60, 2.23, 2.88, 2.65,
                2.28, 2.67, 2.29, 2.15, 2.54, 3.34, 2.99, 3.32,
                5.06, 3.34, 2.38, 3.53, 2.09, 3.93)
esr <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
data <- data.frame(fibrinogen, esr)

Note that the third command asks R to combine the two variables fibrinogen and esr into a data frame named data, for convenience in later analyses.
boxplot(fibrinogen ~ esr, xlab="ESR", ylab="Fibrinogen")
t.test(fibrinogen ~ esr)

The first command asks R to draw a box plot of fibrinogen grouped by esr; the result is shown in Figure 2 below. The second command uses t.test to check whether the difference in fibrinogen between the two ESR groups is statistically significant; the result is shown in Table 4 below:

Welch Two Sample t-test

data: fibrinogen by esr
t = -1.6498, df = 5.331, p-value = 0.1563
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.8666562  0.3907588
sample estimates:
mean in group 0 mean in group 1
       2.650385        3.388333

Table 4. t test comparing fibrinogen between the high and low ESR groups.

[Figure 2. Box plot of the distribution of fibrinogen in the two ESR groups (x-axis: ESR; y-axis: Fibrinogen).]

This simple analysis shows that the mean fibrinogen in subjects with high ESR (esr = 1) is 3.39 g/L, somewhat higher than the mean of 2.65 g/L in the low-ESR group. However, the difference is not statistically significant (p = 0.1563).

We now analyze the data by logistic regression with the glm function in R:
logit.esr <- glm(esr ~ fibrinogen, family="binomial")
summary(logit.esr)

Note that the command is written much as in Example 1. The results of this analysis are shown in Table 5 below:
Call:
glm(formula = esr ~ fibrinogen, family = "binomial")

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9298  -0.5399  -0.4382  -0.3356   2.4794

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -6.8451     2.7703  -2.471   0.0135 *
fibrinogen    1.8271     0.9009   2.028   0.0425 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 30.885 on 31 degrees of freedom
Residual deviance: 24.840 on 30 degrees of freedom
AIC: 28.840

Number of Fisher Scoring iterations: 5

Table 5. Results of the analysis of the association between fibrinogen and ESR

With these results, equation [6] can be written as:

\[ \log\left(\frac{p}{1-p}\right) = -6.8451 + 1.8271x \]

The OR associated with fibrinogen is therefore OR = \(e^{1.827}\) = 6.21 (as explained for equation [5]). In other words, for each 1 g/L increase in fibrinogen, the odds of a high ESR increase 6.21-fold. We can compute the 95% confidence interval of the OR with the following command:
exp(confint(logit.esr, parm="fibrinogen"))

Note: this command computes the exponential (exp) of the 95% confidence interval (confint is short for "confidence interval") of the fibrinogen parameter (parm is short for "parameter") in the fitted object logit.esr. The result is:
   2.5 %    97.5 %
1.403468 54.535954

That is, the 95% confidence interval of the OR associated with fibrinogen ranges from 1.40 to 54.5. Because the entire 95% confidence interval lies above 1, we have evidence that the association between fibrinogen and ESR is statistically significant. Indeed, the p value for this association is 0.0425 (see Table 5).
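With a continuous predictor, the fitted equation can also be turned into predicted probabilities. The sketch below plugs the estimates from Table 5 into the inverse logit (plogis in R); the fibrinogen values 2.5, 3.0 and 4.0 are arbitrary illustrative inputs, not values discussed in the text.

```r
# estimates taken from Table 5
alpha_hat <- -6.8451
beta_hat  <-  1.8271

# illustrative fibrinogen levels (hypothetical inputs)
fib <- c(2.5, 3.0, 4.0)

# predicted probability of a high ESR: inverse logit of the linear predictor
p_hat <- plogis(alpha_hat + beta_hat * fib)
round(data.frame(fibrinogen = fib, p_high_esr = p_hat), 3)

# odds ratio for a change other than one unit, for example half a unit
or_half_unit <- exp(0.5 * beta_hat)
or_half_unit
```

The odds ratio per unit is constant across the range of fibrinogen, but the change in probability is not: the same one-unit increase moves the probability by different amounts depending on where it starts.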

V. Interaction effect


The two examples above introduced logistic regression for studies in which the independent variable may be continuous or non-continuous, but the models were limited to a single independent variable. In many scientific studies, however, investigators wish to assess the association or effect of many independent variables on one dependent variable. In this section I discuss a study with two independent variables and the issue of interaction between them.

Example 3. A study of the role of women in society. In a social survey carried out in 1971-1972, investigators asked men and women whether they agreed or disagreed with the following statement: "Women should take care of running their homes and leave running the country up to men" (Haberman SJ. The analysis of residuals in cross-classified tables. Biometrics 1973;29:205-220). The investigators recorded the education level and sex of each respondent. The results are summarized in Table 6 below.
Table 6. The role of women in society

edu   sex      agree   disagree
 0    Male        4        2
 1    Male        2        0
 2    Male        4        0
 3    Male        6        3
 4    Male        5        5
 5    Male       13        7
 6    Male       25        9
 7    Male       27       15
 8    Male       75       49
 9    Male       29       29
10    Male       32       45
11    Male       36       59
12    Male      115      245
13    Male       31       70
14    Male       28       79
15    Male        9       23
16    Male       15      110
17    Male        3       29
18    Male        1       28
19    Male        2       13
20    Male        3       20
 0    Female      4        2
 1    Female      1        0
 2    Female      0        0
 3    Female      6        1
 4    Female     10        0
 5    Female     14        7
 6    Female     17        5
 7    Female     26       16
 8    Female     91       36
 9    Female     30       35
10    Female     55       67
11    Female     50       62
12    Female    190      403
13    Female     17       92
14    Female     18       81
15    Female      7       34
16    Female     13      115
17    Female      3       28
18    Female      0       21
19    Female      1        2
20    Female      2        4

Note: In the table above, edu is the respondent's education level (measured in years of schooling), and agree and disagree are the numbers of subjects who agreed or disagreed with the statement. For example, the last row of the table means that among women with 20 years of education, 2 agreed and 4 disagreed with the statement.

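The investigators wanted to estimate the effect of sex and education level on the tendency to agree with the statement.

For ease of reference, the data in the table above are first entered into R. The following commands create 4 variables: edu, sex, agree and disagree. Here sex is coded 0 for men and 1 for women, in the row order of Table 6. Two further variables, ntotal (the total number of subjects) and proportion (the proportion of subjects who agreed with the statement), are computed from agree and disagree. The data are stored in a data frame named women.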
edu <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
         11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
         0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
         11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
sex <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
agree <- c(4, 2, 4, 6, 5, 13, 25, 27, 75, 29, 32,
           36, 115, 31, 28, 9, 15, 3, 1, 2, 3,
           4, 1, 0, 6, 10, 14, 17, 26, 91, 30, 55,
           50, 190, 17, 18, 7, 13, 3, 0, 1, 2)
disagree <- c(2, 0, 0, 3, 5, 7, 9, 15, 49, 29, 45,
              59, 245, 70, 79, 23, 110, 29, 28, 13, 20,
              2, 0, 0, 1, 0, 7, 5, 16, 36, 35, 67,
              62, 403, 92, 81, 34, 115, 28, 21, 2, 4)
ntotal <- agree + disagree
proportion <- agree/ntotal
women <- data.frame(edu, sex, agree, disagree, ntotal, proportion)

Before the analysis, we examine the proportion who agreed (the variable proportion) by education level and sex, using the two commands below. The first command plots the proportion who agreed against education level (Figure 3); the second draws a box plot of the proportion who agreed by sex (Figure 4):
plot(proportion ~ edu, ylab="% agreed", pch=ifelse(sex==0,16,21))
boxplot(proportion ~ sex)

[Figure 3. Proportion of subjects who agreed with the statement, by education level (x-axis: edu; y-axis: % agreed). Filled circles are men (sex = 0) and open circles are women (sex = 1).]

[Figure 4. Proportion of subjects who agreed with the statement, by sex. On the horizontal axis, 0 denotes men and 1 denotes women.]

Figure 3 clearly shows an inverse association between the proportion who agreed and education level: the higher the education level, the lower the proportion who agreed. Both figures, however, suggest that the effect of sex is not important, although the proportion of women who agreed appears somewhat higher than that of men.

Let p be the probability of agreeing with the statement. Given the preliminary analysis above, we can consider a simple model in which the probability of agreeing depends on education level and sex. In the language of logistic regression:

\[ \log\left(\frac{p}{1-p}\right) = \alpha + \beta \, edu + \gamma \, sex \qquad [7] \]

And in R (the results are shown in Table 7):
logistic <- glm(proportion ~ sex + edu, family=binomial,
                weights=ntotal)
summary(logistic)

Call:
glm(formula = proportion ~ sex + edu, family = "binomial", weights = ntotal)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.72544  -0.87168  -0.08448   0.88843   3.13315

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.50937    0.18389  13.646   <2e-16 ***
sex         -0.01145    0.08415  -0.136    0.892
edu         -0.27062    0.01541 -17.560   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 451.722 on 40 degrees of freedom
Residual deviance:  64.007 on 38 degrees of freedom
AIC: 208.07

Number of Fisher Scoring iterations: 4

Table 7. Results of the logistic regression analysis of model [7]

These results show a clear effect of education level on the tendency to agree with the statement (p < 0.0001), whereas sex has no appreciable effect (p = 0.892).

Model [7] is also called an additive model (or main effect model), because it states that education level and sex affect the proportion who agreed independently. "Independently" here means that the effect of education level does not depend on sex (and conversely, the effect of sex, if any, does not depend on education level).

In practice this is a simplistic model, because the attitudes and behaviour of men and women may differ even at the same education level. If that is the case, the additive model [7] is no longer adequate. Therefore, before accepting the additive model, we must consider an interaction model between sex and education level. The interaction model states that:

\[ \log\left(\frac{p}{1-p}\right) = \alpha + \beta \, edu + \gamma \, sex + \delta \, (edu \times sex) \qquad [8] \]

In R, this model is written as follows:
interaction <- glm(proportion ~ sex + edu + sex:edu,
                   family=binomial, weights=ntotal)
summary(interaction)
Call:
glm(formula = proportion ~ sex * edu, family = "binomial", weights = ntotal)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.39097  -0.94911   0.03065   0.75927   2.45262

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.09820    0.23550   8.910  < 2e-16 ***
sex          0.90474    0.36007   2.513  0.01198 *
edu         -0.23403    0.02019 -11.592  < 2e-16 ***
sex:edu     -0.08138    0.03109  -2.617  0.00886 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 451.722 on 40 degrees of freedom
Residual deviance:  57.103 on 37 degrees of freedom
AIC: 203.16

Number of Fisher Scoring iterations: 4

Table 8. Results of the logistic regression analysis of the interaction model [8]

These results paint a completely different picture from the additive model: all three terms, sex, edu and the interaction sex:edu (the symbol : denotes interaction in R), are statistically significant. To understand this model, we rewrite model [8] using the estimates in Table 8:

\[ \log\left(\frac{p}{1-p}\right) = 2.098 + 0.905 \, sex - 0.234 \, edu - 0.081 \, (edu \times sex) \]

In the data, the Male rows of Table 6 are entered with sex = 0 and the Female rows with sex = 1. The equation for men (sex = 0) is therefore:

\[ \log\left(\frac{p}{1-p}\right) = 2.098 - 0.234 \, edu \]

The equation for women (sex = 1) is:

\[ \log\left(\frac{p}{1-p}\right) = 2.098 + 0.905 - 0.234 \, edu - 0.081 \, edu = 3.003 - 0.315 \, edu \]

In other words, we have two equations, one for each sex. Among men, each additional year of education gives OR = \(e^{-0.234}\) = 0.79, whereas among women OR = \(e^{-0.315}\) = 0.73.
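The two group-specific equations follow mechanically from the four estimates in Table 8, as the sketch below shows. The helper line_for() and the names sex0 and sex1 are hypothetical, introduced only for this illustration; the groups are labelled by their code in the data (the sex = 0 rows are the Male rows of Table 6).

```r
# estimates taken from Table 8
b0    <-  2.09820   # intercept
b_sex <-  0.90474   # sex
b_edu <- -0.23403   # edu
b_int <- -0.08138   # sex:edu interaction

# intercept and slope of the logit line for a given sex code (0 or 1)
line_for <- function(sex) {
  c(intercept = b0 + b_sex * sex,
    slope     = b_edu + b_int * sex)
}

sex0 <- line_for(0)
sex1 <- line_for(1)
rbind(sex0, sex1)

# odds ratio per additional year of education within each group
exp(c(sex0 = sex0[["slope"]], sex1 = sex1[["slope"]]))

# years of education at which the two fitted lines cross
crossover <- -b_sex / b_int
crossover
```

The crossover point is where b_sex + b_int × edu equals zero, that is, where the advantage in intercept of one group is exactly cancelled by its steeper decline with education.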

Another way to appreciate the difference between the two groups is graphically (the code for these two figures is given in the Notes). The two figures below show the proportion who agreed against education level for men and women, based on the additive model [7] and the interaction model [8]:

[Figure 5. Predicted proportion of subjects who agreed with the statement, by sex, based on the additive model (x-axis: Education; y-axis: % agreed; legend: Fitted (Men), Fitted (Women)). The solid line represents men and the dashed line represents women. Points marked o are women and points marked * are men.]

[Figure 6. Predicted proportion of subjects who agreed with the statement, by sex, based on the interaction model (x-axis: Education; y-axis: % agreed; legend: Fitted (Men), Fitted (Women)). The solid line represents men and the dashed line represents women. Points marked o are women and points marked * are men.]

Figure 5 shows that the two curves for men and women almost coincide, whereas Figure 6 shows that the probability of agreeing differs between men and women and that the difference depends on education level. Among subjects with less than about 10 years of education, women tended to agree more than men; among those with more than about 10 years, men tended to agree more than women. In the context of the question, subjects with little education usually agreed that women should look after the home and leave running the country to men, whereas most of those with more education disagreed, and women's responses differed from men's depending on education level. That is precisely the meaning of an interaction effect!
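One common way to compare the additive and interaction models formally is a likelihood ratio (deviance) test; the text does not carry out this test, so the sketch below is an addition under stated assumptions. It re-enters the data of Table 6 compactly with rep(), and fits both models with a two-column response of (agree, disagree), an equivalent specification in which the one empty cell (0 agree, 0 disagree) simply receives zero weight. The object names additive, interact, dev_drop and p_value are illustrative.

```r
edu <- rep(0:20, times = 2)
sex <- rep(c(0, 1), each = 21)      # 0 = Male rows, 1 = Female rows of Table 6
agree <- c(4, 2, 4, 6, 5, 13, 25, 27, 75, 29, 32,
           36, 115, 31, 28, 9, 15, 3, 1, 2, 3,
           4, 1, 0, 6, 10, 14, 17, 26, 91, 30, 55,
           50, 190, 17, 18, 7, 13, 3, 0, 1, 2)
disagree <- c(2, 0, 0, 3, 5, 7, 9, 15, 49, 29, 45,
              59, 245, 70, 79, 23, 110, 29, 28, 13, 20,
              2, 0, 0, 1, 0, 7, 5, 16, 36, 35, 67,
              62, 403, 92, 81, 34, 115, 28, 21, 2, 4)

# additive model [7] and interaction model [8]
additive <- glm(cbind(agree, disagree) ~ sex + edu, family = binomial)
interact <- glm(cbind(agree, disagree) ~ sex * edu, family = binomial)

# likelihood ratio test: drop in deviance on 1 degree of freedom
anova(additive, interact, test = "Chisq")

dev_drop <- deviance(additive) - deviance(interact)
p_value  <- pchisq(dev_drop, df = 1, lower.tail = FALSE)
c(deviance_drop = dev_drop, p_value = p_value)
```

The drop in deviance is the difference between the residual deviances reported in Tables 7 and 8, so this test and the z test on the sex:edu coefficient address the same question from two angles.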

This example shows that analyzing data by habit, without considering possible interaction effects, can easily lead to wrong conclusions or to overlooking important information. Model building in statistical analysis, and in science generally, is a complex matter that I will discuss in a later part.

In the next article (time permitting) I will discuss multivariable logistic regression, non-linear effects, and the methods and criteria for building a complete logistic model.


Notes:

Terms used in this article

English                       Vietnamese
Logistic regression model     Mô hình hồi qui logistic
Control                       Đối chứng
Variable                      Biến
Continuous variable           Biến liên tục
Discrete variable             Biến không liên tục hay biến rời rạc
Dependent variable            Biến phụ thuộc
Independent variable          Biến độc lập
Maximum likelihood method     Phương pháp hợp lí cực đại
Additive model                Mô hình cộng hưởng
Interaction model             Mô hình tương tác

R code for Figures 5 and 6
# create a plotting function called myplot
myplot <- function(predicted)
{
  f <- women$sex == 1   # TRUE for women (sex = 1), FALSE for men
  plot(women$edu, predicted, type="n",
       ylab="% agreed", xlab="Education", ylim=c(0,1))
  lines(women$edu[!f], predicted[!f], lty=1)   # men: solid line
  lines(women$edu[f], predicted[f], lty=2)     # women: dashed line
  lgtxt <- c("Fitted (Men)", "Fitted (Women)")
  legend("topright", lgtxt, lty=1:2, bty="n")
  y <- women$agree/women$ntotal                # observed proportions
  text(women$edu, y, ifelse(f, "o", "*"), cex=1.25)
}
# Figure 5: additive model
additive <- glm(proportion ~ sex+edu,
                family=binomial, weights=ntotal, data=women)
# newdata=women returns one prediction per row, including the empty cell
p.additive <- predict(additive, newdata=women, type="response")
myplot(p.additive)
# Figure 6: interaction model
interaction <- glm(proportion ~ sex*edu,
                   family=binomial, weights=ntotal, data=women)
p.predicted <- predict(interaction, newdata=women, type="response")
myplot(p.predicted)
