[...] medical records and direct interviews to determine who had been exposed to Agent Orange (AO) during the war. Among the 47 cancer cases, 11 had been exposed to AO, 29 had not, and 7 had an unknown exposure history; in the control group, 17 had been exposed, 106 had not, and exposure could not be determined for 21. The results are summarized in the following table:
Table 1. AO exposure and prostate cancer

                        Cancer     Controls
                        (n=47)     (n=144)
Exposed to AO             11          17
Not exposed to AO         29         106
Unknown                    7          21
Total                     47         144
Note: n is the number of patients. Data source: Giri VN, Cassidy AE, Beebe-Dimmer J, Ellis LR, Smith DC, Bock CH, Cooney KA. Association between Agent Orange and prostate cancer: a pilot case-control study. Urology. 2004 Apr;63(4):757-60; discussion 760-1. Correction in Urology. 2004 Jun;63(6):1213.
To illustrate the regression analysis and to keep the problem simple, I will merge the two groups "Not exposed to AO" and "Unknown" into a single group. (Whether this is appropriate could be the subject of a separate analysis!) The table above can therefore be reduced as follows:
             Exposed to AO    Not exposed to AO or unknown
Cancer            11                     36
Controls          17                    127
From the figures above, 23.4% (11/47) of the prostate cancer group had been exposed to AO, whereas the proportion in the control group was 11.8% (17/144). The question is whether there is an association between AO exposure and prostate cancer. The phrase "association" can be unpacked into two specific questions:
- How large is the risk of prostate cancer among those who were exposed compared with the risk among those who were not?
- Is the difference in cancer risk between the two groups statistically significant?
$$\mathrm{OR} = \frac{11 \times 127}{17 \times 36} = 2.28$$
In other words, the risk of prostate cancer (expressed as odds) among veterans who had been exposed to AO was about 2.3 times that of veterans who had not been exposed.
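The proportions and the odds ratio above can be checked with a few lines of R. This is a minimal sketch; the object names `tab`, `prop_exposed` and `or` are chosen here for illustration and do not come from the original text.

```r
# 2x2 table from Example 1 (rows: exposure, columns: disease status)
tab <- matrix(c(11, 17,
                36, 127),
              nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("AO", "no AO / unknown"),
                              status   = c("cancer", "control")))

# proportion exposed to AO within each disease group (11/47 and 17/144)
prop_exposed <- tab["AO", ] / colSums(tab)

# odds ratio as the cross-product ratio (11 * 127) / (17 * 36)
or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

print(round(prop_exposed, 3))
print(round(or, 2))
```

The cross-product form is used because it is exactly the calculation shown in the text.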
But this is a study based on a single sample, and the estimate above can vary from sample to sample. Remember that the OR is an estimate of a true OR that we do not know in practice. This true value could lie anywhere from below 1 to above 1. If the true OR is below 1, those exposed to AO have a lower risk of cancer than those not exposed; a true OR above 1 means that those exposed have a higher risk than those not exposed; and a true OR equal to 1 means that there is no association between AO exposure and prostate cancer.

Medical training program, YKHOA.NET Training, Nguyen Van Tuan
The second (and perhaps more important) question is therefore whether the association reflected by the OR is statistically significant. In other words, if the study were repeated many times, how much would the OR vary? If the study were repeated, say, 100 times, and 95 of those studies gave OR estimates between 1.1 and 3.8 while 5 gave an OR below 1.1 or above 3.8, then we would have evidence to state that the association between AO exposure and prostate cancer is statistically significant.
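In other words, we need to estimate the standard error of the OR and the 95% confidence interval of the OR. Because the OR is a ratio, its standard error cannot be estimated directly (or can be, but only with great difficulty) and must be obtained by indirect methods. One such method is Woolf's method, which proceeds step by step as follows:

Step one is to convert the OR to the logarithmic scale: $\log \mathrm{OR} = \log(2.28) = 0.824$.

Step two is to estimate the standard error (denoted SE) of logOR with the following formula:

$$\mathrm{SE} = \sqrt{\frac{1}{11} + \frac{1}{17} + \frac{1}{36} + \frac{1}{127}} = 0.430$$

Step three: under the normal distribution, the 95% confidence interval of logOR is $\log \mathrm{OR} \pm 1.96 \times \mathrm{SE}$, which in this case is:

$$0.824 - 1.96 \times 0.430 = -0.0188$$
$$0.824 + 1.96 \times 0.430 = +1.6668$$

Step four is to convert back to the OR scale: the 95% confidence interval of the OR runs from $e^{-0.0188} = 0.98$ to $e^{1.6668} = 5.30$.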
The analysis shows that the estimated OR is 2.28, but its 95% confidence interval ranges from 0.98 to 5.30. In other words, if the study were repeated 100 times, 95 of the studies would give an OR that could be as low as 0.98 (below 1) or as high as 5.30.

This gives us the answer to the second question. Because the 95% confidence interval extends both below and above 1, we conclude that the association between AO exposure and the risk of prostate cancer is not statistically significant. I stress that this is only a statistical conclusion; I have not discussed the clinical meaning of these data because that lies outside the scope of this discussion.
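The confidence interval calculation for the log odds ratio can be sketched in R as follows. The variable names (`a`, `b`, `c0`, `d`, `ci_or`) are illustrative assumptions, and because the code carries full precision rather than the rounded values 0.824 and 0.430, the upper limit may differ from the hand calculation in the second decimal place.

```r
# cell counts of the 2x2 table in Example 1
a  <- 11    # cancer, exposed
b  <- 17    # control, exposed
c0 <- 36    # cancer, not exposed or unknown
d  <- 127   # control, not exposed or unknown

log_or <- log((a * d) / (b * c0))            # log odds ratio
se     <- sqrt(1/a + 1/b + 1/c0 + 1/d)       # standard error of log(OR)

# 95% interval on the log scale, then back-transformed to the OR scale
ci_log <- log_or + c(-1, 1) * qnorm(0.975) * se
ci_or  <- exp(ci_log)

print(round(c(logOR = log_or, SE = se), 3))
print(round(ci_or, 2))
```

Since the lower limit falls just below 1, the sketch leads to the same conclusion as the text.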
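$$\mathrm{odd} = \frac{p}{1-p}$$

$$\log(\mathrm{odd}) = \alpha + \beta x + \varepsilon$$

or,

$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta x + \varepsilon \qquad [1]$$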
Here, log(odd) or $\log\left(\frac{p}{1-p}\right)$ is also called logit(p) (hence the name logistic); $\alpha$ and $\beta$ are the two parameters to be estimated from the data, and $\varepsilon$ is the residual, that is, the part that cannot be explained by x. The reason for transforming p into logit(p) is that p takes values between 0 and 1, whereas logit(p) is unbounded and is therefore suitable for analysis with a linear regression model.
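A small sketch may help show how the logit maps probabilities in (0, 1) onto an unbounded scale and back again. The probabilities in `p` are arbitrary illustrative values; `qlogis` and `plogis` are the base R logit and inverse-logit functions.

```r
# a few illustrative probabilities between 0 and 1
p <- c(0.01, 0.25, 0.5, 0.75, 0.99)

# logit(p) = log(p / (1 - p)); unbounded, and symmetric around 0 at p = 0.5
logit_p <- log(p / (1 - p))

# inverse logit recovers the original probabilities
back <- 1 / (1 + exp(-logit_p))

print(round(cbind(p, logit_p, back), 3))

# base R provides the same transformations
print(all.equal(logit_p, qlogis(p)))
print(all.equal(back, plogis(logit_p)))
```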
The model assumes that $\varepsilon$ follows a normal distribution with mean 0 and constant variance. Under this assumption, the expected value (mean) of $\log\left(\frac{p}{1-p}\right)$ for any value of x is $\alpha + \beta x$ (because the mean of $\varepsilon$ is 0). In other words, the odds of cancer are:

$$\frac{p}{1-p} = e^{\alpha + \beta x + \varepsilon} \qquad [2]$$
[3]
[4]
$$\mathrm{OR} = \frac{\mathrm{odd}_1}{\mathrm{odd}_0} = \frac{e^{\alpha + \beta}}{e^{\alpha}} = e^{\beta} \qquad [5]$$
[...] these parameters are denoted with a hat: $\hat{\alpha}$ and $\hat{\beta}$. As in Example 1, [...] exposed to AO compared with the odds in the group never exposed to AO. In Example 1, $e^{\hat{\beta}} = e^{0.824} = 2.28$.
             Exposed to AO    Not exposed to AO or unknown
Cancer            11                     36
Controls          17                    127

Note:
Once the data have been entered, we are ready to analyze. R provides the function glm for logistic regression analysis. How to write this function is described in my book; here I explain it only briefly:

logistic <- glm(proportion ~ ao, family = binomial, weights = ntotal)
summary(logistic)
Call:
glm(formula = proportion ~ ao, family = "binomial", weights = ntotal)

Deviance Residuals:
[1]  0  0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.2607     0.1888  -6.677 2.44e-11 ***
ao            0.8254     0.4306   1.917   0.0552 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  3.5022e+00  on 1  degrees of freedom
Residual deviance: -2.3093e-14  on 0  degrees of freedom
AIC: 12.933
Deviance, as explained above, reflects the discrepancy between the model and the data (much like the mean square residual in linear regression analysis). For a single model such as this example, the value of the deviance does not mean much.

(c) The next part gives the estimates of $\alpha$ (which R names intercept) and $\beta$ (ao), together with the standard error of each estimate:
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.2607     0.1888  -6.677 2.44e-11 ***
ao            0.8254     0.4306   1.917   0.0552 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
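The data-entry step for Example 1 is not shown in this copy, so the following is a sketch of one way the inputs to glm could be set up so that the estimates match the output above. The names `proportion`, `ao` and `ntotal` come from the glm call in the text; `cancer` and `control`, and the layout with one row per exposure group, are assumptions made here.

```r
# one row per exposure group (assumed layout): ao = 1 exposed, ao = 0 not exposed or unknown
ao      <- c(1, 0)
cancer  <- c(11, 36)    # prostate cancer cases in each exposure group
control <- c(17, 127)   # controls in each exposure group

ntotal     <- cancer + control      # group sizes, used as weights
proportion <- cancer / ntotal       # proportion of cases in each group

logistic <- glm(proportion ~ ao, family = binomial, weights = ntotal)
est <- coef(summary(logistic))      # matrix of estimates and standard errors

# odds ratio and Wald 95% confidence interval for AO exposure
or_ao <- exp(est["ao", "Estimate"])
ci_ao <- exp(est["ao", "Estimate"] + c(-1, 1) * qnorm(0.975) * est["ao", "Std. Error"])

print(round(est, 4))
print(round(c(OR = or_ao, lower = ci_ao[1], upper = ci_ao[2]), 2))
```

Exponentiating the `ao` coefficient gives the same odds ratio as the cross-product calculation, and its standard error agrees with Woolf's formula.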
ESR: 0 (26 subjects), 1 (6 subjects)
[6]
2.56, 2.29, 2.67, 3.34, 2.19, 2.35, 2.29, 2.38,
2.18, 3.15, 2.15, 3.53, 3.41, 2.68, 2.54, 2.09,
esr <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
The first command asks R to draw a box plot of fibrinogen grouped by the variable esr; the result is shown in Figure 2 below. The second command uses t.test in R to check whether the difference in fibrinogen between the two ESR groups is statistically significant; the result is shown in Table 3 below:
[Figure 2: box plot of Fibrinogen (vertical axis) by ESR group (horizontal axis)]
Figure 2. Box plot of the distribution of fibrinogen in the two ESR groups.

sample estimates:
mean in group 0 mean in group 1
       2.650385        3.388333
Table 4. t-test between the high and low ESR groups.
This simple analysis shows that mean fibrinogen in subjects with high ESR (esr = 1) was 3.39 mm/hour, somewhat higher than in the low-ESR group, whose mean fibrinogen was 2.65 mm/hour. But this difference is not statistically significant (p = 0.1563).
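The two commands referred to above (a box plot and a t-test by ESR group) would typically look like the sketch below. Because the full fibrinogen vector is not available in this copy, the sketch uses simulated stand-in values; the data frame `d` and its contents are hypothetical, so the numbers it prints will not match the study's results.

```r
set.seed(1)

# HYPOTHETICAL stand-in data: 26 subjects with esr = 0 and 6 with esr = 1.
# The fibrinogen values are simulated, not the study's measurements.
d <- data.frame(
  esr        = rep(c(0, 1), times = c(26, 6)),
  fibrinogen = c(rnorm(26, mean = 2.65, sd = 0.5),
                 rnorm(6,  mean = 3.39, sd = 1.0))
)

# box plot of fibrinogen grouped by esr
boxplot(fibrinogen ~ esr, data = d, xlab = "ESR", ylab = "Fibrinogen")

# Welch two-sample t-test comparing mean fibrinogen between the two groups
tt <- t.test(fibrinogen ~ esr, data = d)
print(tt$estimate)   # group means
print(tt$p.value)    # two-sided p-value
```

With only 6 subjects in one group the t-test has little power, which is one reason the text goes on to model the data with logistic regression.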
We now analyze the data by logistic regression with the glm function in R as follows:
logit.esr <- glm(esr ~ fibrinogen, family="binomial")
summary(logit.esr)
     3Q      Max
-0.3356   2.4794

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -6.8451     2.7703  -2.471   0.0135 *
fibrinogen    1.8271     0.9009   2.028   0.0425 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

on 31  degrees of freedom
on 30  degrees of freedom
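The coefficients in the output above can be turned into an odds ratio and into predicted probabilities. The sketch below works directly from the reported estimates rather than refitting the model; the names `b0`, `b1`, `se1` and the grid of fibrinogen values in `fib` are illustrative assumptions.

```r
# estimates reported in the glm output above
b0  <- -6.8451   # intercept
b1  <-  1.8271   # coefficient for fibrinogen
se1 <-  0.9009   # standard error of the fibrinogen coefficient

# odds ratio for a one-unit increase in fibrinogen, with a Wald 95% interval
or_per_unit <- exp(b1)
ci_per_unit <- exp(b1 + c(-1, 1) * 1.96 * se1)

# predicted probability of high ESR over a few illustrative fibrinogen values
fib        <- c(2, 3, 4, 5)
p_high_esr <- plogis(b0 + b1 * fib)

print(round(or_per_unit, 2))
print(round(ci_per_unit, 2))
print(round(cbind(fib, p_high_esr), 3))
```

The interval is very wide, reflecting the small sample, but its lower limit lies above 1, in line with the p-value of 0.0425.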
[...] limited to a single independent variable. In many scientific studies, however, there are many independent variables whose association with, or effect on, a dependent variable the researcher wants to assess. In this section I discuss a study with two independent variables and the issue of interaction between independent variables.

Example 3. A study of the role of women in society. In a social survey conducted in 1971-1972, the researchers asked male and female respondents whether they agreed or disagreed with the following statement: "Women should take care of the home and leave running the country to men" (Haberman SJ. The analysis of residuals in cross-classified tables. Biometrics 1973;29:205-220). The researchers recorded the education level and sex of each respondent. The results are summarized in Table 6 below.
Table 6. The role of women in society

edu   sex      agree   disagree
 0    Male        4        2
 1    Male        2        0
 2    Male        4        0
 3    Male        6        3
 4    Male        5        5
 5    Male       13        7
 6    Male       25        9
 7    Male       27       15
 8    Male       75       49
 9    Male       29       29
10    Male       32       45
11    Male       36       59
12    Male      115      245
13    Male       31       70
14    Male       28       79
15    Male        9       23
16    Male       15      110
17    Male        3       29
18    Male        1       28
19    Male        2       13
20    Male        3       20
 0    Female      4        2
 1    Female      1        0
 2    Female      0        0
 3    Female      6        1
 4    Female     10        0
 5    Female     14        7
 6    Female     17        5
 7    Female     26       16
 8    Female     91       36
 9    Female     30       35
10    Female     55       67
11    Female     50       62
12    Female    190      403
13    Female     17       92
14    Female     18       81
15    Female      7       34
16    Female     13      115
17    Female      3       28
18    Female      0       21
19    Female      1        2
20    Female      2        4
Note: In the table above, edu is the respondent's education level (measured in years of schooling), and agree and disagree are the numbers of respondents who agreed or disagreed with the statement. For example, the last row of the table means that among women with 20 years of schooling, 2 agreed and 4 disagreed with the statement.
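edu <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
         0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
sex <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
agree <- c(4, 2, 4, 6, 5, 13, 25, 27, 75, 29, 32, 36, 115, 31, 28, 9, 15, 3, 1, 2, 3,
           4, 1, 0, 6, 10, 14, 17, 26, 91, 30, 55, 50, 190, 17, 18, 7, 13, 3, 0, 1, 2)
disagree <- c(2, 0, 0, 3, 5, 7, 9, 15, 49, 29, 45, 59, 245, 70, 79, 23, 110, 29, 28, 13, 20,
              2, 0, 0, 1, 0, 7, 5, 16, 36, 35, 67, 62, 403, 92, 81, 34, 115, 28, 21, 2, 4)
# derived variables, named as they appear in the glm() calls and in myplot
ntotal <- agree + disagree
proportion <- agree / ntotal
data <- data.frame(edu, sex, agree, disagree, ntotal, proportion)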
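[Figure 3: % agreed (vertical axis, 0.0 to 1.0) plotted against edu (horizontal axis)]
Figure 3. Proportion of respondents agreeing with the statement by education level. Filled circles represent women and open circles represent men.

[Figure 4: % agreed (vertical axis, 0.0 to 1.0) plotted by sex]
Figure 4. Proportion of respondents agreeing with the statement by sex. On the horizontal axis, 0 represents women and 1 represents men.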
$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta\,\mathrm{edu} + \gamma\,\mathrm{sex} \qquad [7]$$
Call:
glm(formula = proportion ~ sex + edu, family = "binomial", weights = ntotal)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.72544  -0.87168  -0.08448   0.88843   3.13315

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.50937    0.18389  13.646   <2e-16 ***
sex         -0.01145    0.08415  -0.136    0.892
edu         -0.27062    0.01541 -17.560   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 451.722  on 40  degrees of freedom
Residual deviance:  64.007  on 38  degrees of freedom
AIC: 208.07
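One way to read the additive model output is to convert each coefficient into an odds ratio. The sketch below uses the reported estimates directly; the variable names are illustrative assumptions.

```r
# estimates and standard errors reported for the additive model
b_edu <- -0.27062; se_edu <- 0.01541
b_sex <- -0.01145; se_sex <- 0.08415

# odds ratio of agreeing per additional year of schooling, with Wald 95% interval
or_edu <- exp(b_edu)
ci_edu <- exp(b_edu + c(-1, 1) * 1.96 * se_edu)

# odds ratio of agreeing for sex = 1 relative to sex = 0, with Wald 95% interval
or_sex <- exp(b_sex)
ci_sex <- exp(b_sex + c(-1, 1) * 1.96 * se_sex)

# Wald z statistic and two-sided p-value for sex, as in the output table
z_sex <- b_sex / se_sex
p_sex <- 2 * pnorm(-abs(z_sex))

print(round(c(OR = or_edu, lower = ci_edu[1], upper = ci_edu[2]), 3))
print(round(c(OR = or_sex, lower = ci_sex[1], upper = ci_sex[2]), 3))
print(round(c(z = z_sex, p = p_sex), 3))
```

Under this model each extra year of schooling lowers the odds of agreeing by roughly a quarter, while the interval for sex comfortably includes 1.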
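$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta\,\mathrm{edu} + \gamma\,\mathrm{sex} + \delta\,\mathrm{edu} \times \mathrm{sex} \qquad [8]$$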
  Median        3Q       Max
 0.03065   0.75927   2.45262

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.09820    0.23550   8.910  < 2e-16 ***
sex          0.90474    0.36007   2.513  0.01198 *
edu         -0.23403    0.02019 -11.592  < 2e-16 ***
sex:edu     -0.08138    0.03109  -2.617  0.00886 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 451.722  on 40  degrees of freedom
Residual deviance:  57.103  on 37  degrees of freedom
AIC: 203.16
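These results paint a completely different picture from the additive model: all three parameters, sex, edu and the interaction sex:edu (the ":" denotes interaction in R), are statistically significant. To understand this model, we need to rewrite model [8] using the estimates in Table 8:

$$\log\left(\frac{p}{1-p}\right) = 2.098 + 0.905\,\mathrm{sex} - 0.234\,\mathrm{edu} - 0.081\,\mathrm{edu} \times \mathrm{sex}$$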
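To see what the interaction term implies, the fitted equation can be evaluated separately for sex = 0 and sex = 1. The sketch below uses the reported coefficients; the helper `logit_agree` and the grid of education values are hypothetical names introduced here for illustration.

```r
# coefficients reported for the interaction model
b <- c(intercept = 2.09820, sex = 0.90474, edu = -0.23403, sex_edu = -0.08138)

# linear predictor (log odds of agreeing) for given education and sex
logit_agree <- function(edu, sex) {
  b[["intercept"]] + b[["sex"]] * sex + b[["edu"]] * edu + b[["sex_edu"]] * edu * sex
}

# sex-specific intercepts and slopes implied by the model
line_sex0 <- c(intercept = b[["intercept"]], slope = b[["edu"]])
line_sex1 <- c(intercept = b[["intercept"]] + b[["sex"]],
               slope     = b[["edu"]] + b[["sex_edu"]])

# predicted probability of agreeing at a few education levels
edu_grid <- c(0, 5, 10, 15, 20)
p_sex0 <- plogis(logit_agree(edu_grid, 0))
p_sex1 <- plogis(logit_agree(edu_grid, 1))

# education level at which the two fitted lines cross
crossover <- -b[["sex"]] / b[["sex_edu"]]

print(round(rbind(line_sex0, line_sex1), 3))
print(round(cbind(edu_grid, p_sex0, p_sex1), 3))
print(round(crossover, 1))
```

The two groups have different intercepts and different slopes, so the sign of the difference between them changes at the crossover, which falls at about 11 years of schooling.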
Another way to appreciate the difference between the two groups is through graphs (see the code for drawing these two figures in the Notes). The two figures below show the proportion agreeing against education level for men and women, based on the additive model [7] and the interaction model [8]:

[Figure 5: % agreed (vertical axis, 0.0 to 1.0) against Education (horizontal axis), with lines for Fitted (Men) and Fitted (Women), additive model]
[Figure 6: % agreed (vertical axis, 0.0 to 1.0) against Education (horizontal axis), with lines for Fitted (Men) and Fitted (Women), interaction model]
Figure 5 shows that the curves for men and women almost coincide, whereas Figure 6 shows that the probability of agreeing with the statement differs between men and women, and that the difference depends on education level. For example, among respondents with less than 10 years of schooling, women tend to agree more than men; but among respondents with more than 10 years of schooling, men tend to agree more than women. In the context of the question, respondents with little education usually agree with the view that women should take care of the home and leave national affairs to men, whereas most highly educated respondents disagree with it, and how women's responses differ from men's depends on education level. That is precisely the meaning of an interaction effect!

This example shows that if we analyze data out of habit, without considering possible interaction effects, it is easy to reach a wrong conclusion or to miss important information. Model building in statistical analysis, and in science generally, is a complex issue, which I will discuss in a later section.
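One common way to compare the additive and interaction models formally is a likelihood-ratio test on the difference in residual deviance. The text itself relies on the Wald test in the coefficient table, so the sketch below is offered as a complementary check, not as the author's method; it uses only the deviances and degrees of freedom reported in the two outputs.

```r
# residual deviance and degrees of freedom reported for each model
dev_additive    <- 64.007; df_additive    <- 38
dev_interaction <- 57.103; df_interaction <- 37

# likelihood-ratio statistic: drop in deviance when the interaction is added
lr_stat <- dev_additive - dev_interaction
lr_df   <- df_additive - df_interaction

# p-value from the chi-squared distribution with lr_df degrees of freedom
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(round(c(LR = lr_stat, df = lr_df, p = p_value), 4))
```

The resulting p-value is close to the Wald p-value reported for sex:edu, so both tests point to the same conclusion about the interaction.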
In the next article (hopefully delivered as promised) I will discuss multivariable logistic regression models, non-linear effects, and the methods and criteria for building a complete logistic model.
Notes:

Terms used in this article

English                       Vietnamese
Logistic regression model     Mô hình hồi qui logistic
Control                       Đối chứng
Variable                      Biến
Continuous variable           Biến liên tục
Discrete variable             Biến không liên tục hay biến rời rạc
Dependent variable            Biến phụ thuộc
Independent variable          Biến độc lập
Maximum likelihood method     Phương pháp hợp lý cực đại
Additive model                Mô hình cộng hưởng
Interaction model             Mô hình tương tác

R code for drawing Figures 5 and 6
# create a plotting function called myplot
myplot <- function(predicted)
{
  f <- data$sex == 1
  plot(data$edu, predicted, type = "n",
       ylab = "% agreed", xlab = "Education", ylim = c(0, 1))
  lines(data$edu[!f], predicted[!f], lty = 1)
  lines(data$edu[f], predicted[f], lty = 2)
  lgtxt <- c("Fitted (Men)", "Fitted (Women)")
  legend("topright", lgtxt, lty = 1:2, bty = "n")
  y <- data$agree / data$ntotal
  text(data$edu, y, ifelse(f, "o", "*"), cex = 1.25)
}
# draw Figure 5: additive model
additive <- glm(proportion ~ sex + edu,
                family = binomial, weights = ntotal, data = data)
# newdata = data returns one fitted value per row, including the row with ntotal = 0
p.additive <- predict(additive, newdata = data, type = "response")
myplot(p.additive)
# draw Figure 6: interaction model
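The code for the interaction model is cut off in this copy. The following self-contained sketch shows how the fit and the plot could be produced by analogy with the additive model. It is an assumption about what the missing code did, not a recovery of it: the data are re-entered from Table 6 using `rep()` for brevity, the model object is named `interaction_fit` here, and the plotting steps are written inline rather than through myplot.

```r
# Table 6 data: 21 education levels for sex = 0, then 21 for sex = 1
edu <- rep(0:20, times = 2)
sex <- rep(c(0, 1), each = 21)
agree <- c(4, 2, 4, 6, 5, 13, 25, 27, 75, 29, 32, 36, 115, 31, 28, 9, 15, 3, 1, 2, 3,
           4, 1, 0, 6, 10, 14, 17, 26, 91, 30, 55, 50, 190, 17, 18, 7, 13, 3, 0, 1, 2)
disagree <- c(2, 0, 0, 3, 5, 7, 9, 15, 49, 29, 45, 59, 245, 70, 79, 23, 110, 29, 28, 13, 20,
              2, 0, 0, 1, 0, 7, 5, 16, 36, 35, 67, 62, 403, 92, 81, 34, 115, 28, 21, 2, 4)
data <- data.frame(edu, sex, agree, disagree)
data$ntotal     <- data$agree + data$disagree
data$proportion <- data$agree / data$ntotal   # NaN for the one row with ntotal = 0

# interaction model: sex * edu expands to sex + edu + sex:edu
interaction_fit <- glm(proportion ~ sex * edu,
                       family = binomial, weights = ntotal, data = data)

# fitted probabilities for every row of the table (42 values)
p.interaction <- predict(interaction_fit, newdata = data, type = "response")

# fitted curves by sex, with observed proportions overlaid
f  <- data$sex == 1
ok <- data$ntotal > 0                          # skip the row with no respondents
plot(data$edu, p.interaction, type = "n",
     ylab = "% agreed", xlab = "Education", ylim = c(0, 1))
lines(data$edu[!f], p.interaction[!f], lty = 1)
lines(data$edu[f],  p.interaction[f],  lty = 2)
legend("topright", c("Fitted (Men)", "Fitted (Women)"), lty = 1:2, bty = "n")
text(data$edu[ok], data$proportion[ok], ifelse(f[ok], "o", "*"), cex = 1.25)

print(round(coef(summary(interaction_fit)), 5))
```

Passing `newdata = data` to predict keeps the fitted vector aligned with the 42 rows of the table even though glm drops the row whose proportion is undefined.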