
11 Analysis of variance
Analysis of variance, as the name suggests, is a family of statistical methods whose focus is the variance (rather than the mean). It belongs to the larger family of methods known as linear models (or general linear models), which also includes the linear regression we met in the previous chapter. In this chapter we get acquainted with using R for analysis of variance. We start with a simple analysis, then move on to two-way analysis of variance and the commonly used non-parametric methods.

11.1 One-way analysis of variance (ANOVA)
Example 1. Table 11.1 below compares galactose levels in 3 groups of patients: group 1 consists of 9 patients with Crohn's disease; group 2 consists of 11 patients with colitis; and group 3 consists of 20 subjects without disease (the control group). The question is whether galactose levels differ among the 3 groups. Let the means of the three groups be $\mu_1$, $\mu_2$ and $\mu_3$. In the language of hypothesis testing, the null hypothesis is:
$H_0: \mu_1 = \mu_2 = \mu_3$
and the alternative hypothesis is: $H_A$: at least one of the three $\mu_j$ (j = 1, 2, 3) differs from the others.
Table 11.1. Galactose levels in 3 groups: Crohn's disease, colitis, and controls

Group 1:          Group 2:          Group 3:
Crohn's disease   colitis           controls
1343              1264              1809   2850
1393              1314              1926   2964
1420              1399              2283   2973
1641              1605              2384   3171
1897              2385              2447   3257
2160              2511              2479   3271
2169              2514              2495   3288
2279              2767              2525   3358
2890              2827              2541   3643
                  2895              2769   3657
                  3011
n = 9             n = 11            n = 20
Mean: 1910        Mean: 2226        Mean: 2804
SD: 516           SD: 688           SD: 527


Note: SD is the standard deviation.


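At first glance, the reader may think that we need to make 3 comparisons (using the t test): between groups 1 and 2, groups 2 and 3, and groups 1 and 3. This approach is not appropriate, because it works with three different variances, and because repeated testing inflates the chance of a false positive (see section 11.2). The most suitable way to make this comparison is analysis of variance, which can compare several groups at the same time (simultaneous comparisons).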

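11.1.1 The analysis of variance model

To illustrate the method, we need some notation. Let $x_{ij}$ be the galactose level of patient i in group j (j = 1, 2, 3). The analysis of variance model states that:

$x_{ij} = \mu + \alpha_j + \varepsilon_{ij}$    [1]

Or, more specifically:
$x_{i1} = \mu + \alpha_1 + \varepsilon_{i1}$
$x_{i2} = \mu + \alpha_2 + \varepsilon_{i2}$
$x_{i3} = \mu + \alpha_3 + \varepsilon_{i3}$

That is, the galactose level of any patient equals the mean of the whole population ($\mu$), plus or minus the effect of group j, measured by the effect coefficient $\alpha_j$, plus an error $\varepsilon_{ij}$. A further assumption is that $\varepsilon_{ij}$ follows a normal distribution with mean 0 and variance $\sigma^2$. The two parameters to be estimated are $\mu$ and $\alpha_j$. As in linear regression, they are estimated by the method of least squares; that is, we look for estimates $\hat{\mu}$ and $\hat{\alpha}_j$ such that $\sum_j \sum_i (x_{ij} - \hat{\mu} - \hat{\alpha}_j)^2$ is as small as possible.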

Returning to the study data above, we have the following summary statistics:

Group          Number of subjects ($n_j$)   Mean ($\bar{x}_j$)   Variance ($s_j^2$)
1 Crohn        9                            1910                 265944
2 Colitis      11                           2226                 473387
3 Controls     20                           2804                 277500
Whole sample   40                           2444

Note that:

$x_{ij} = \bar{x} + (\bar{x}_j - \bar{x}) + (x_{ij} - \bar{x}_j)$    [2]

where $\bar{x}$ is the mean of the whole sample and $\bar{x}_j$ is the mean of group j. In other words, the part $(\bar{x}_j - \bar{x})$ reflects the difference between each group mean and the overall mean, and the part $(x_{ij} - \bar{x}_j)$ reflects the difference between a subject's galactose level and the mean of that subject's group. Accordingly, we have the following sources of variation:

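The total sum of squares for the whole sample is:

$SST = \sum_j \sum_i (x_{ij} - \bar{x})^2$
= (1343 - 2444)^2 + (1393 - 2444)^2 + (1420 - 2444)^2 + ... + (3657 - 2444)^2
= 17817543

The sum of squares reflecting differences between the groups is:

$SSB = \sum_j n_j (\bar{x}_j - \bar{x})^2$
= 9(1910 - 2444)^2 + 11(2226 - 2444)^2 + 20(2804 - 2444)^2
= 5681168

(This figure uses the rounded means; with unrounded means the value is 5683620, which is what R reports.)

The sum of squares reflecting variation within each group is:

$SSW = \sum_j \sum_i (x_{ij} - \bar{x}_j)^2 = \sum_j (n_j - 1) s_j^2$
= (9-1)(265944) + (11-1)(473387) + (20-1)(277500)
= 12133922

It can be shown that SST = SSB + SSW.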
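The identity SST = SSB + SSW can be checked numerically. The following is a minimal sketch that recomputes the three sums of squares from the raw data in Table 11.1; the object names (grand, xbar, sst, ssb, ssw) are illustrative choices made here, not names used elsewhere in the chapter.

```r
# Galactose values from Table 11.1, in group order (Crohn, colitis, controls)
galactose <- c(1343,1393,1420,1641,1897,2160,2169,2279,2890,
               1264,1314,1399,1605,2385,2511,2514,2767,2827,2895,3011,
               1809,2850,1926,2964,2283,2973,2384,3171,2447,3257,
               2479,3271,2495,3288,2525,3358,2541,3643,2769,3657)
group <- factor(rep(1:3, times = c(9, 11, 20)))

grand <- mean(galactose)                    # overall mean
xbar  <- tapply(galactose, group, mean)     # group means
n     <- tapply(galactose, group, length)   # group sizes

sst <- sum((galactose - grand)^2)                  # total
ssb <- sum(n * (xbar - grand)^2)                   # between groups
ssw <- sum((galactose - ave(galactose, group))^2)  # within groups

c(SST = sst, SSB = ssb, SSW = ssw)
all.equal(sst, ssb + ssw)   # TRUE when the decomposition holds
```

Because the means are not rounded here, SSB comes out slightly larger than the hand calculation based on rounded means.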
SSW is computed from every patient in the 3 groups, so the within-group mean square (MSW) is:
MSW = SSW / (N - k) = 12133922 / (40 - 3) = 327944
and the between-group mean square is:
MSB = SSB / (k - 1) = 5683620 / (3 - 1) = 2841810
(using the unrounded SSB). Here N is the total number of patients in the three groups (N = 40), and k = 3 is the number of groups. If the groups differ, we expect MSB to be larger than MSW. To test the hypothesis, we can therefore rely on the F test:

F = MSB / MSW = 8.67    [3]

with k - 1 and N - k degrees of freedom. The calculations above can be presented in an analysis of variance table (ANOVA table) as follows:
Source of variation   Degrees of freedom   Sum of squares   Mean square   F test
Between groups        2                    5683620          2841810       8.6655
Within groups         37                   12133923         327944
Total                 39                   17817543
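As a small worked check of the table, the F statistic and its p value can be reproduced from the two sums of squares alone. This sketch assumes only the figures in the table; pf() is the F distribution function in base R, and the variable names are chosen here for illustration.

```r
# Sums of squares and design sizes taken from the ANOVA table above
ssb <- 5683620     # between groups
ssw <- 12133923    # within groups
k   <- 3           # number of groups
N   <- 40          # total number of patients

msb <- ssb / (k - 1)   # between-group mean square
msw <- ssw / (N - k)   # within-group mean square
f_stat <- msb / msw

# Upper-tail probability of the F distribution with (k-1, N-k) df
p_value <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

c(MSB = msb, MSW = msw, F = f_stat, p = p_value)
```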

11.1.2 One-way analysis of variance with R

All of the calculations above are fairly involved and time consuming. With R, however, they can be done within a second once the data have been prepared properly.
(a) Entering the data. First we need to enter the data into R. The first step is to tell R that we have three groups of patients (1, 2 and 3): group 1 has 9 people, group 2 has 11, and group 3 has 20:
> group <- c(1,1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2,2,2,2,
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3)

For analysis of variance, we must define the variable group as a factor:
> group <- as.factor(group)

Next, we enter the galactose values for each group in the order defined above (calling the object galactose):
> galactose <- c(1343,1393,1420,1641,1897,2160,2169,2279,
2890,1264,1314,1399,1605,2385,2511,2514,
2767,2827,2895,3011,1809,2850,1926,2964,
2283,2973,2384,3171,2447,3257,2479,3271,
2495,3288,2525,3358,2541,3643,2769,3657)

Put the two variables group and galactose into a data frame called data:
> data <- data.frame(group, galactose)
> attach(data)

Once the data are ready, we use the function lm() to carry out the analysis of variance:
> analysis <- lm(galactose ~ group)

In the call above, we tell R that the variable galactose is a function of group, and we call the result analysis.
(b) Results of the analysis of variance. We now use the command anova to see the results:
> anova(analysis)
Analysis of Variance Table

Response: galactose
          Df   Sum Sq Mean Sq F value    Pr(>F)
group      2  5683620 2841810  8.6655 0.0008191 ***
Residuals 37 12133923  327944
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The output has the following columns: Df is the degrees of freedom; Sum Sq is the sum of squares; Mean Sq is the mean square; F value is the F statistic defined in [3] above; and Pr(>F) is the p value associated with the F test.
The row group corresponds to the between-groups sum of squares, and the row Residuals to the within-group sum of squares. Here we have:
SSB = 5683620 and MSB = 2841810
and:
SSW = 12133923 and MSW = 327944
So F = 2841810 / 327944 = 8.6655.
The p value of 0.00082 is evidence that galactose levels differ among the three groups.
(c) Estimates. For more detail on the results, we use the command summary:
> summary(analysis)
Call:
lm(formula = galactose ~ group)

Residuals:
   Min     1Q Median     3Q    Max
-995.5 -437.9  102.0  456.0  979.8

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1910.2      190.9  10.007  4.5e-12 ***
group2         316.3      257.4   1.229 0.226850
group3         894.3      229.9   3.891 0.000402 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 572.7 on 37 degrees of freedom
Multiple R-Squared: 0.319,     Adjusted R-squared: 0.2822
F-statistic: 8.666 on 2 and 37 DF,  p-value: 0.0008191

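In this output, the intercept is the estimate of $\mu$ in model [1]: $\hat{\mu}$ = 1910 with a standard error of 190.9. Because R sets $\alpha_1 = 0$, this is the mean of group 1.
To estimate the parameters $\alpha_j$, R sets $\hat{\alpha}_1 = 0$, so that $\hat{\alpha}_2 = \bar{x}_2 - \bar{x}_1 = 316.3$, with a standard error of 257.4 and a t test of 316.3 / 257.4 = 1.229 with p = 0.2268. In other words, compared with group 1 (Crohn's disease), colitis patients have a mean galactose level that is higher by 316, but this difference is not statistically significant.
Similarly, $\hat{\alpha}_3 = \bar{x}_3 - \bar{x}_1 = 894.3$, with a standard error of 229.9, a t test of 894.3 / 229.9 = 3.89, and p = 0.00040. Compared with Crohn's disease patients, the control group has a galactose level that is higher by 894, and this difference is statistically significant.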
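The reading of the coefficients as "group 1 mean, then differences from group 1" can be verified directly. The sketch below refits the model and compares coef() with the group means computed by tapply(); it assumes R's default treatment contrasts, and the name by_hand is chosen here for illustration.

```r
galactose <- c(1343,1393,1420,1641,1897,2160,2169,2279,2890,
               1264,1314,1399,1605,2385,2511,2514,2767,2827,2895,3011,
               1809,2850,1926,2964,2283,2973,2384,3171,2447,3257,
               2479,3271,2495,3288,2525,3358,2541,3643,2769,3657)
group <- factor(rep(1:3, times = c(9, 11, 20)))

fit  <- lm(galactose ~ group)
xbar <- tapply(galactose, group, mean)

# Intercept = mean of group 1; group2 and group3 = differences from group 1
by_hand <- c(xbar[1], xbar[2] - xbar[1], xbar[3] - xbar[1])

round(cbind(lm = as.numeric(coef(fit)), by_hand = as.numeric(by_hand)), 1)
```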

11.2 Multiple comparisons and adjusting the p value

With k groups, there are k(k-1)/2 possible pairwise comparisons. The example above has 3 groups, so there are 3 possible comparisons (group 1 versus 2, group 1 versus 3, and group 2 versus 3). When k = 10, the number of comparisons rises to 45. As mentioned in chapter 7, when many comparisons are made, the p values computed from the statistical tests no longer carry their original meaning, because the tests can produce false positives (that is, results with p < 0.05 when in reality there is no difference or effect). When there are many comparisons, we therefore need to adjust the p values appropriately.
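To make the growth in the number of comparisons concrete, the following sketch counts the pairwise comparisons for k = 3 and k = 10, and gives a rough family-wise false positive rate. The error-rate formula assumes the tests are independent, which is only an approximation for pairwise comparisons that share groups, so the figures are illustrative rather than exact.

```r
k <- c(3, 10)          # number of groups
m <- choose(k, 2)      # number of pairwise comparisons: k(k-1)/2

alpha <- 0.05
# Chance of at least one false positive among m tests,
# ASSUMING the m tests are independent (an approximation)
fwer <- 1 - (1 - alpha)^m

data.frame(groups = k, comparisons = m, approx_fwer = round(fwer, 3))
```

With 3 comparisons the approximate rate is already about 14%, and with 45 comparisons it is about 90%.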


There are many methods for adjusting p values; the 4 most commonly used are Bonferroni, Scheffé, Holm and Tukey (each named after a statistician). Which method is most appropriate? There is no definitive answer, but the following two points may help the reader decide:

(a) If k < 10, any of the methods can be used to adjust the p values. Personally, I find that Tukey's method is often very useful for comparisons.

(b) If k > 10, the Bonferroni method can become very conservative. Conservative here means that the method will rarely declare a comparison statistically significant, even when the difference is real. In this case, the Tukey, Holm and Scheffé methods can be used.

We will not discuss the theory behind these methods here (the reader can consult statistics textbooks); we only show how to carry out the comparisons in R, ending with Tukey's method.
Returning to the example above, the p values shown so far are not adjusted for multiple comparisons. As discussed in the chapter on p values, such values overstate statistical significance and do not reflect the original threshold (0.05). To adjust for multiple comparisons, we first use the Bonferroni adjustment.
We can use the command pairwise.t.test to obtain the p values for all comparisons among the three groups:
> pairwise.t.test(galactose, group, p.adjust.method="bonferroni")

        Pairwise comparisons using t tests with pooled SD

data:  galactose and group

  1      2
2 0.6805 -
3 0.0012 0.0321

P value adjustment method: bonferroni

These results show that the p value between group 1 (Crohn's disease) and colitis is 0.6805 (not statistically significant); between Crohn's disease and controls it is 0.0012 (statistically significant); and between colitis and controls it is 0.0321 (also statistically significant).
Another method of adjusting p values is the Holm method, which is the default in pairwise.t.test:
> pairwise.t.test(galactose, group)

        Pairwise comparisons using t tests with pooled SD

data:  galactose and group

  1      2
2 0.2268 -
3 0.0012 0.0214

P value adjustment method: holm

The conclusions are the same as with the Bonferroni method.
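The two adjustments can also be applied by hand to the unadjusted p values, which makes it easier to see what each method does. This is a minimal sketch using the base R function p.adjust(); the names raw, p_raw, bonf and holm are chosen here for illustration.

```r
galactose <- c(1343,1393,1420,1641,1897,2160,2169,2279,2890,
               1264,1314,1399,1605,2385,2511,2514,2767,2827,2895,3011,
               1809,2850,1926,2964,2283,2973,2384,3171,2447,3257,
               2479,3271,2495,3288,2525,3358,2541,3643,2769,3657)
group <- factor(rep(1:3, times = c(9, 11, 20)))

# Unadjusted pairwise p values (pooled SD), as a lower-triangular matrix
raw   <- pairwise.t.test(galactose, group, p.adjust.method = "none")$p.value
p_raw <- raw[!is.na(raw)]     # order: 1 vs 2, 1 vs 3, 2 vs 3

bonf <- p.adjust(p_raw, method = "bonferroni")  # multiply by the number of tests
holm <- p.adjust(p_raw, method = "holm")        # step-down version of Bonferroni

round(rbind(raw = p_raw, bonferroni = bonf, holm = holm), 4)
```

Holm's adjustment is never larger than Bonferroni's, which is why it is the less conservative of the two.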


All of the comparisons above use a common (pooled) standard deviation for the three groups. If we want to use a separate standard deviation for each group, the option pool.sd=FALSE does this:
> pairwise.t.test(galactose, group, pool.sd=FALSE)

        Pairwise comparisons using t tests with non-pooled SD

data:  galactose and group

  1      2
2 0.2557 -
3 0.0017 0.0544

P value adjustment method: holm

The overall picture is much the same, although without pooling the comparison between colitis and controls (p = 0.054) no longer falls below 0.05.

11.2.1 Multiple comparisons with Tukey's method

The methods above give only the p values for the comparisons between groups; they do not give the size of the differences or their 95% confidence intervals. To obtain these estimates, we need another function called aov (short for analysis of variance) together with the function TukeyHSD (HSD stands for Honest Significant Difference):
> res <- aov(galactose ~ group)
> TukeyHSD(res)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = galactose ~ group)

$group
        diff        lwr      upr     p adj
2-1 316.3232 -312.09857  944.745 0.4439821
3-1 894.2778  333.07916 1455.476 0.0011445
3-2 577.9545   53.11886 1102.790 0.0281768

These results show that groups 3 and 1 differ by about 894 units, with a 95% confidence interval from 333 to 1455 units. Similarly, the galactose level in the colitis group is lower than in the control group (group 3) by about 578 units, with a 95% confidence interval from 53 to 1103.
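For readers who want to see where these intervals come from, the sketch below rebuilds the interval for the 3-1 comparison by hand. It uses the Tukey-Kramer form for unequal group sizes (difference plus or minus q/sqrt(2) times the standard error based on the residual mean square), which is our reading of how TukeyHSD treats unbalanced data; qtukey() is the studentized range quantile function in base R, and the names mse, q, half and ci are illustrative.

```r
galactose <- c(1343,1393,1420,1641,1897,2160,2169,2279,2890,
               1264,1314,1399,1605,2385,2511,2514,2767,2827,2895,3011,
               1809,2850,1926,2964,2283,2973,2384,3171,2447,3257,
               2479,3271,2495,3288,2525,3358,2541,3643,2769,3657)
group <- factor(rep(1:3, times = c(9, 11, 20)))

fit  <- aov(galactose ~ group)
xbar <- as.numeric(tapply(galactose, group, mean))
n    <- as.numeric(tapply(galactose, group, length))

mse <- deviance(fit) / df.residual(fit)               # residual mean square
q   <- qtukey(0.95, nmeans = 3, df = df.residual(fit))

d31  <- xbar[3] - xbar[1]                             # controls minus Crohn
half <- q / sqrt(2) * sqrt(mse * (1 / n[3] + 1 / n[1]))
ci   <- c(diff = d31, lwr = d31 - half, upr = d31 + half)
round(ci, 3)
```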

[Figure 11.1. Mean differences and 95% family-wise confidence intervals between groups 1 and 2, 1 and 3, and 3 and 2. The horizontal axis is the difference in mean galactose level; the vertical axis lists the three comparisons.]

11.2.2 Graphical analysis

A statistical analysis is not complete without a graph illustrating the results. The commands below draw the mean galactose level and its standard error for each patient group. The chart shows that the Crohn's disease group has the lowest galactose level (although not significantly lower than the colitis group), and that both groups are lower than the control group, a difference that is statistically significant.
> xbar <- tapply(galactose, group, mean)
> s <- tapply(galactose, group, sd)
> n <- tapply(galactose, group, length)
> sem <- s/sqrt(n)
> stripchart(galactose ~ group, method="jitter", jitter=0.05,
             pch=16, vertical=TRUE)
> arrows(1:3, xbar+sem, 1:3, xbar-sem, angle=90, code=3,
         length=0.1)
> lines(1:3, xbar, pch=4, type="b", cex=2)

[Figure 11.2. Galactose levels of group 1 (Crohn's disease patients), group 2 (colitis patients), and group 3 (controls).]

11.3 Non-parametric analysis

The non-parametric method for comparing several groups that corresponds to analysis of variance is the Kruskal-Wallis test. Like the Wilcoxon method for comparing two groups non-parametrically, the Kruskal-Wallis method converts the data to ranks and analyses the differences in ranks between the groups. The function kruskal.test in R carries out this test:
> kruskal.test(galactose ~ group)

        Kruskal-Wallis rank sum test

data:  galactose by group
Kruskal-Wallis chi-squared = 12.1381, df = 2, p-value = 0.002313

The p value from this test is quite low (p = 0.002313), indicating a difference among the three groups, in agreement with the analysis of variance carried out with lm above. One drawback of the Kruskal-Wallis test, however, is that it does not tell us which two groups differ; it gives only an overall p value. In many cases, non-parametric analyses such as the Kruskal-Wallis test are also less efficient than parametric statistical methods.
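The rank-based idea can be made explicit by computing the Kruskal-Wallis statistic by hand. The sketch below uses the textbook formula H = 12/(N(N+1)) * sum(R_j^2 / n_j) - 3(N+1), where R_j is the rank sum of group j; it assumes there are no tied values (true for these data), so no tie correction is applied. The names r, R_j, n_j and H are chosen here for illustration.

```r
galactose <- c(1343,1393,1420,1641,1897,2160,2169,2279,2890,
               1264,1314,1399,1605,2385,2511,2514,2767,2827,2895,3011,
               1809,2850,1926,2964,2283,2973,2384,3171,2447,3257,
               2479,3271,2495,3288,2525,3358,2541,3643,2769,3657)
group <- factor(rep(1:3, times = c(9, 11, 20)))

r   <- rank(galactose)            # ranks over the whole sample
N   <- length(galactose)
R_j <- tapply(r, group, sum)      # rank sum per group
n_j <- tapply(r, group, length)   # group sizes

# Kruskal-Wallis statistic (no tie correction needed: all values are distinct)
H <- 12 / (N * (N + 1)) * sum(R_j^2 / n_j) - 3 * (N + 1)
p <- pchisq(H, df = nlevels(group) - 1, lower.tail = FALSE)

c(H = H, p = p)
```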

11.4 Two-way analysis of variance (ANOVA)

Simple, or one-way, analysis of variance involves a single factor. Two-way analysis of variance, as the name suggests, involves two factors. The method is a straightforward extension of one-way analysis of variance: instead of estimating the variance due to one factor, it estimates the variance due to two factors.
Example 2. To evaluate the effect of a new painting technique, researchers applied the paint to 3 types of material (1, 2 and 3) under two conditions (1, 2). Each combination of condition and material was replicated 3 times. Durability was measured by a durability index (called score here). In total there are 18 observations, as follows:
Table 11.2. Paint durability for 2 conditions and 3 materials

Condition (i)   Material (j)
                1               2               3
1               4.1, 3.9, 4.3   3.1, 2.8, 3.3   3.5, 3.2, 3.6
2               2.7, 3.1, 2.6   1.9, 2.2, 2.3   2.7, 2.3, 2.5

These data can be summarised by the mean for each condition and material, as in the following table:

Table 11.3. Summary of the data from the paint durability experiment

                        Material (j)                 Mean over
Condition (i)           1        2        3          3 materials
Mean
  1                     4.10     3.07     3.43       3.533
  2                     2.80     2.13     2.50       2.478
  Mean of 2 conditions  3.450    2.600    2.967      3.00
Variance
  1                     0.040    0.063    0.043
  2                     0.070    0.043    0.040
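These preliminary calculations suggest that there may be differences (or effects) due to both the condition and the material.
Let $x_{ij}$ be the score for condition i (i = 1, 2) and material j (j = 1, 2, 3). (To simplify matters, we temporarily drop the index k for the replicate.) The two-way analysis of variance model states that:

$x_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$    [4]

Or, more specifically:
$x_{11} = \mu + \alpha_1 + \beta_1 + \varepsilon_{11}$
$x_{12} = \mu + \alpha_1 + \beta_2 + \varepsilon_{12}$
$x_{13} = \mu + \alpha_1 + \beta_3 + \varepsilon_{13}$
$x_{21} = \mu + \alpha_2 + \beta_1 + \varepsilon_{21}$
$x_{22} = \mu + \alpha_2 + \beta_2 + \varepsilon_{22}$
$x_{23} = \mu + \alpha_2 + \beta_3 + \varepsilon_{23}$

Here $\mu$ is the mean of the whole population, and the coefficients $\alpha_i$ (the effect of condition i) and $\beta_j$ (the effect of material j) must be estimated from the data. $\varepsilon_{ij}$ is assumed to follow a normal distribution with mean 0 and variance $\sigma^2$.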
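In two-way analysis of variance, we split the total sum of squares into 3 sources.

The first source is the sum of squares due to the difference between the 2 conditions:

$SS_c = \sum_i n_i (\bar{x}_i - \bar{x})^2$
= 9(3.533 - 3.00)^2 + 9(2.478 - 3.00)^2
= 5.01

The second source is the sum of squares due to differences among the 3 materials:

$SS_m = \sum_j n_j (\bar{x}_j - \bar{x})^2$
= 6(3.45 - 3.00)^2 + 6(2.60 - 3.00)^2 + 6(2.967 - 3.00)^2
= 2.18

The third source is the residual sum of squares, which is what remains of the total sum of squares (7.93) after the two sources above are removed:

$SS_e = SS_T - SS_c - SS_m$ = 7.93 - 5.01 - 2.18 = 0.73

Most of this comes from the variation among replicates within each condition and material combination, $\sum (n_{ij} - 1) s_{ij}^2$ = 2(0.040) + 2(0.063) + 2(0.043) + 2(0.070) + 2(0.043) + 2(0.040) = 0.60; the rest (0.13) is variation of the six cell means that the additive model [4] does not capture.

In these formulas, each condition and material combination is replicated 3 times, so $n_i$ = 9 observations per condition and $n_j$ = 6 observations per material; $\bar{x}$ is the mean of the whole sample, $\bar{x}_i$ is the mean for each condition, and $\bar{x}_j$ is the mean for each material. With a = 2 conditions, b = 3 materials and N = 18 observations, $SS_c$ has a - 1 = 1 degree of freedom, $SS_m$ has b - 1 = 2 degrees of freedom, and $SS_e$ has N - a - b + 1 = 14 degrees of freedom. The mean squares are therefore:

Between the two conditions:   $MS_c = SS_c / (a - 1)$ = 5.01 / 1 = 5.01
Among the three materials:    $MS_m = SS_m / (b - 1)$ = 2.18 / 2 = 1.09
Residual:                     $MS_e = SS_e / (N - a - b + 1)$ = 0.73 / 14 = 0.052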

The difference between the two conditions is therefore tested with $F = MS_c / MS_e$ with 1 and 14 degrees of freedom. Similarly, the difference among the three materials is tested with $F = MS_m / MS_e$ with 2 and 14 degrees of freedom. These analyses can be presented in an analysis of variance table as follows:

Source of variation         Degrees of freedom   Sum of squares   Mean square   F test
Between the 2 conditions    1                    5.01             5.01          95.6
Among the 3 materials       2                    2.18             1.09          20.8
Residual                    14                   0.73             0.052
Total                       17                   7.92
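The entries of this table can be reproduced from the 18 raw scores. The following sketch computes each sum of squares directly; it also computes the variation among replicates within each condition and material combination (ss_within), to show how it relates to the residual of the additive model. All object names are illustrative.

```r
score <- c(4.1,3.9,4.3, 3.1,2.8,3.3, 3.5,3.2,3.6,
           2.7,3.1,2.6, 1.9,2.2,2.3, 2.7,2.3,2.5)
condition <- gl(2, 9, 18)   # 2 conditions, 9 observations each
material  <- gl(3, 3, 18)   # 3 materials, in blocks of 3 replicates

grand <- mean(score)

# Sum of squares for a factor: group size times squared deviation of group mean
ss_factor <- function(f) {
  sum(tapply(score, f, length) * (tapply(score, f, mean) - grand)^2)
}
ss_c <- ss_factor(condition)
ss_m <- ss_factor(material)
ss_t <- sum((score - grand)^2)
ss_e <- ss_t - ss_c - ss_m     # residual of the additive model (14 df)

# Variation among replicates within each of the 6 cells (12 df)
ss_within <- sum((score - ave(score, condition, material))^2)

round(c(SSc = ss_c, SSm = ss_m, SSe = ss_e, SSwithin = ss_within, SST = ss_t), 4)
```

The gap between ss_e and ss_within is the part that an interaction term would absorb.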

11.4.1 Two-way analysis of variance with R

(a) The first step is to enter the data from Table 11.2 into R. We need to organise the data into 4 variables as follows:

Condition   Material   Subject (id)   Score
1           1          1              4.1
1           1          2              3.9
1           1          3              4.3
1           2          4              3.1
1           2          5              2.8
1           2          6              3.3
1           3          7              3.5
1           3          8              3.2
1           3          9              3.6
2           1          10             2.7
2           1          11             3.1
2           1          12             2.6
2           2          13             1.9
2           2          14             2.2
2           2          15             2.3
2           3          16             2.7
2           3          17             2.3
2           3          18             2.5

We can create such a sequence of levels with the function gl (generate levels). Its use can be illustrated as follows:
> gl(9, 1, 18)
[1] 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9
Levels: 1 2 3 4 5 6 7 8 9

In the command above we create the sequence 1, 2, 3, ..., 9 twice (18 numbers in total); each repetition is one group. By contrast, the command:
> gl(4, 9, 36)
 [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4
Levels: 1 2 3 4
creates a sequence with 4 levels (1, 2, 3, 4), each repeated 9 times (36 numbers in total).
So, to create the levels for condition and material, we use:
> condition <- gl(2, 9, 18)
> material <- gl(3, 3, 18)
To create 18 identifiers (from 1 to 18):
> id <- 1:18
And finally the data for score:
> score <- c(4.1,3.9,4.3, 3.1,2.8,3.3, 3.5,3.2,3.6,
             2.7,3.1,2.6, 1.9,2.2,2.3, 2.7,2.3,2.5)
All of these go into a data frame named data:
> data <- data.frame(condition, material, id, score)
> attach(data)
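Before analysing the data, it is worth confirming that the factor levels generated by gl line up with the layout of Table 11.2. The sketch below cross-tabulates the two factors and compares them with an equivalent construction using rep(); this check is an addition for illustration, not part of the original workflow.

```r
condition <- gl(2, 9, 18)
material  <- gl(3, 3, 18)

# Each condition x material combination should appear exactly 3 times
design <- table(condition, material)
design

# The same level patterns written out with rep()
same_condition <- identical(as.integer(condition), rep(1:2, each = 9))
same_material  <- identical(as.integer(material),
                            rep(rep(1:3, each = 3), times = 2))
c(same_condition, same_material)
```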

(b) Analysis and preliminary results. The data are now ready for analysis. For two-way analysis of variance we again use the command lm, with the following arguments:
> twoway <- lm(score ~ condition + material)
> anova(twoway)
Analysis of Variance Table

Response: score
          Df Sum Sq Mean Sq F value    Pr(>F)
condition  1 5.0139  5.0139  95.575 1.235e-07 ***
material   2 2.1811  1.0906  20.788 6.437e-05 ***
Residuals 14 0.7344  0.0525
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The three sources of variation in score are analysed in the table above. Judging by the mean squares, the effect of condition appears to be more important than the effect of material. Both effects are nevertheless statistically significant, since the p values for both factors are very low.
(c) Estimates. We ask R to summarise the estimates with the command summary:
> summary(twoway)
Call:
lm(formula = score ~ condition + material)

Residuals:
     Min       1Q   Median       3Q      Max
-0.32778 -0.16389  0.03333  0.16111  0.32222

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.9778     0.1080  36.841 2.43e-15 ***
condition2   -1.0556     0.1080  -9.776 1.24e-07 ***
material2    -0.8500     0.1322  -6.428 1.58e-05 ***
material3    -0.4833     0.1322  -3.655   0.0026 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.229 on 14 degrees of freedom
Multiple R-Squared: 0.9074,     Adjusted R-squared: 0.8875
F-statistic: 45.72 on 3 and 14 DF,  p-value: 1.761e-07

These results show that, compared with condition 1, condition 2 has a score that is lower by about 1.056 with a standard error of 0.108 and p = 1.24e-07, which is statistically significant. In addition, compared with material 1, the scores for materials 2 and 3 are also clearly lower, with the lowest durability seen for material 2, and the effect of material is also statistically significant.
The value called Residual standard error is estimated from the residual mean square in part (b), that is $\sqrt{0.0525} = 0.229$, which is the estimate of $\sigma$.
The coefficient of determination ($R^2$) tells us that the two factors, condition and material, explain about 91% of the variation in the whole sample. It is computed from the sums of squares in part (b) as follows:

$R^2 = \frac{5.0139 + 2.1811}{5.0139 + 2.1811 + 0.7344} = 0.9074$

Finally, the adjusted $R^2$ reflects the "improvement" brought by the model. To understand this coefficient better, note that the variance of the whole sample is $s^2$ = (5.0139 + 2.1811 + 0.7344) / 17 = 0.4664. After adjusting for the effects of condition and material, this variance falls to 0.0525 (the residual mean square). The two factors therefore reduce the variance by 0.4664 - 0.0525 = 0.4139, and the adjusted $R^2$ is:
Adj $R^2$ = 0.4139 / 0.4664 = 0.8875
That is, after adjusting for condition and material, the variance of score is reduced by about 89%.
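The two coefficients can be recomputed from the sums of squares in the ANOVA table, which is a useful check on the arithmetic. This sketch uses only the figures printed in part (b); the variable names are chosen here for illustration.

```r
# Sums of squares and degrees of freedom from the two-way ANOVA table
ss_condition <- 5.0139
ss_material  <- 2.1811
ss_residual  <- 0.7344
df_residual  <- 14
df_total     <- 17

ss_total <- ss_condition + ss_material + ss_residual

r2 <- (ss_condition + ss_material) / ss_total

var_total    <- ss_total / df_total        # variance of the whole sample
var_residual <- ss_residual / df_residual  # residual mean square
adj_r2 <- (var_total - var_residual) / var_total

round(c(R2 = r2, s2 = var_total, MSe = var_residual, adjR2 = adj_r2), 4)
```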
(d) Interaction effects

To complete the analysis, we need to consider the possibility that the two factors interact (interaction effects). The model for score then becomes:

$x_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ij}$

Note that the term $(\alpha\beta)_{ij}$ in this equation reflects the interaction between the two factors. In R, we simply add the interaction term to the formula:
> anova(twoway <- lm(score ~ condition + material +
                     condition:material))
Analysis of Variance Table

Response: score
                   Df Sum Sq Mean Sq  F value    Pr(>F)
condition           1 5.0139  5.0139 100.2778 3.528e-07 ***
material            2 2.1811  1.0906  21.8111 0.0001008 ***
condition:material  2 0.1344  0.0672   1.3444 0.2972719
Residuals          12 0.6000  0.0500
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this analysis the interaction effect has p = 0.297. There is therefore no evidence of an interaction between material and condition, and we accept model [4], the model without interaction.
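The same conclusion can be reached by comparing the two fitted models directly, which is the approach used later in this chapter for the covariance models. The sketch below fits model [4] and the interaction model and compares them with anova(); the object names additive, interact and cmp are illustrative.

```r
score <- c(4.1,3.9,4.3, 3.1,2.8,3.3, 3.5,3.2,3.6,
           2.7,3.1,2.6, 1.9,2.2,2.3, 2.7,2.3,2.5)
condition <- gl(2, 9, 18)
material  <- gl(3, 3, 18)

additive <- lm(score ~ condition + material)   # model [4]
interact <- lm(score ~ condition * material)   # adds condition:material

# F test for the extra interaction terms (2 degrees of freedom)
cmp <- anova(additive, interact)
cmp

f_int <- cmp$F[2]
p_int <- cmp$`Pr(>F)`[2]
```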
(e) Comparisons between groups. We estimate the differences between the two conditions and among the three materials with the function TukeyHSD applied to an aov fit:
> res <- aov(score ~ condition + material)
> TukeyHSD(res)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = score ~ condition + material)

$condition
         diff       lwr        upr p adj
2-1 -1.055556 -1.287131 -0.8239797 1e-07

$material
          diff         lwr        upr     p adj
2-1 -0.8500000 -1.19610279 -0.5038972 0.0000442
3-1 -0.4833333 -0.82943612 -0.1372305 0.0068648
3-2  0.3666667  0.02056388  0.7127695 0.0374069

Figure 11.3 illustrates these results:
> plot(TukeyHSD(res))

[Figure 11.3. Comparison of the 3 materials by Tukey's method: differences in mean levels of material, with 95% family-wise confidence intervals.]

[Figure 11.4. Mean score for condition 1 (dashed line) and condition 2 across the 3 materials. Horizontal axis: material; vertical axis: mean of score.]

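(f) Charts. To view the effects of the two factors, condition and material, we need a graph which in analysis of variance is called an interaction plot. The function interaction.plot draws this chart (see Figure 11.4); its arguments are, in order, the factor on the horizontal axis, the factor that defines the lines, and the response:
> interaction.plot(material, condition, score)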

11.5 Analysis of covariance (ANCOVA)

Analysis of covariance (abbreviated ANCOVA) is a method that combines the linear regression model with analysis of variance. In linear regression, both the dependent variable (also called the response variable) and the independent variable (or predictor variable) are mostly continuous, such as cholesterol and age. In analysis of variance, the dependent variable is continuous while the independent variable is ordinal or categorical, such as galactose level and patient group in Example 1. In analysis of covariance, the dependent variable is continuous, but the independent variables can be both continuous and categorical.
Example 3. In the study whose results are presented below, the researchers measured the height and age of 18 students from an urban area and 14 students from a rural area.
Table 11.4. Height of students from urban and rural areas

Area    ID   Age (months)   Height (cm)
urban   1    109            137.6
urban   2    113            147.8
urban   3    115            136.8
urban   4    116            140.7
urban   5    119            132.7
urban   6    120            145.4
urban   7    121            135.0
urban   8    124            133.0
urban   9    126            148.5
urban   10   129            148.3
urban   11   130            147.5
urban   12   133            148.8
urban   13   134            133.2
urban   14   135            148.7
urban   15   137            152.0
urban   16   139            150.6
urban   17   141            165.3
urban   18   142            149.9
rural   1    121            139.0
rural   2    121            140.9
rural   3    128            134.9
rural   4    129            149.5
rural   5    131            148.7
rural   6    132            131.0
rural   7    133            142.3
rural   8    134            139.9
rural   9    138            142.9
rural   10   138            147.7
rural   11   138            147.7
rural   12   140            134.6
rural   13   140            135.8
rural   14   140            148.5


The question is whether there is any difference in height between urban and rural children. In other words, does the place of residence affect height, and if so, how large is the effect?
One factor with a large influence on height is age. During the growing years, height increases with age. A comparison of height between two groups can therefore be objective only if the ages of the two groups are comparable. To ensure an objective comparison, we need to analyse the data with a covariance model.
The first step is to enter the data into R with the following commands:
> # create the id sequence
> id <- c(1:18, 1:14)
> # group: 1 = urban, 2 = rural; group must be defined as a factor
> group <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
             2,2,2,2,2,2,2,2,2,2,2,2,2,2)
> group <- as.factor(group)
> # enter the data
> age <- c(109,113,115,116,119,120,121,124,126,129,130,
           133,134,135,137,139,141,142, 121,121,128,
           129,131,132,133,134,138,138,138,140,140,140)
> height <- c(137.6,147.8,136.8,140.7,132.7,145.4,135.0,
              133.0,148.5,148.3,147.5,148.8,133.2,
              148.7,152.0,150.6,165.3,149.9,
              139.0,140.9,134.9,149.5,148.7,131.0,142.3,
              139.9,142.9,147.7,147.7,134.6,135.8,148.5)
> # create a data frame
> data <- data.frame(id, group, age, height)
> attach(data)

Let us look at a few descriptive statistics by computing the mean age and height for each group of students:
> tapply(age, group, mean)
       1        2
126.8333 133.0714
> tapply(height, group, mean)
       1        2
144.5444 141.6714

These results show that the urban students are younger than the rural students by about 6.3 months (126.8 versus 133.1). The urban students are nevertheless taller than the rural students by about 2.8 cm (144.5 versus 141.7). The reader can use a t test to see that the difference in age between the two groups is statistically significant (p = 0.045).

In addition, the following chart shows that there is a correlation between age and height:

[Figure 11.5. Height (cm) and age (months) of the two groups of students, urban and rural. Horizontal axis: age; vertical axis: height.]

Because the two groups differ in age, and age is related to height, we cannot state or compare height between the 2 groups of students without adjusting for age. To adjust for age, we use analysis of covariance.

11.5.1 The analysis of covariance model

Let y be height, x be age, and g be group. The basic ANCOVA model assumes that the relationship between y and x is a straight line, and that the slope (gradient) of this relationship is the same in the two groups. In the notation of linear regression, we have:

$y_1 = \alpha_1 + \beta x + e_1$    in group 1
$y_2 = \alpha_2 + \beta x + e_2$    in group 2.    [5]

where:
$\alpha_1$ is the mean of y when x = 0 in group 1;
$\alpha_2$ is the mean of y when x = 0 in group 2;
$\beta$ is the slope of the relationship between y and x;
$e_1$ and $e_2$ are random variables with mean 0 and variance $\sigma^2$.

Let $\bar{x}$ be the mean age for both groups, and $\bar{x}_1$ and $\bar{x}_2$ the mean ages of groups 1 and 2. As noted above, if $\bar{x}_1 \neq \bar{x}_2$, a comparison of the mean heights of groups 1 and 2 ($\bar{y}_1$ and $\bar{y}_2$) will not be objective, because

$\bar{y}_1 = \alpha_1 + \beta \bar{x}_1 + \bar{e}_1$
$\bar{y}_2 = \alpha_2 + \beta \bar{x}_2 + \bar{e}_2$

and, ignoring the error terms, the difference between the two groups now depends on the coefficient $\beta$:

$\bar{y}_1 - \bar{y}_2 = \alpha_1 - \alpha_2 + \beta(\bar{x}_1 - \bar{x}_2)$

Note that in model [5], $\alpha_1 - \alpha_2$ can be interpreted as the difference in mean height between the two groups if both groups had the same mean age. This difference represents the effect of group when no other factor related to y is involved. To estimate $\alpha_1 - \alpha_2$, we cannot simply take the difference of the two means $\bar{y}_1 - \bar{y}_2$; we must adjust for x. Let $x^*$ be a value common to both groups. The adjusted value of y for group 1 (denoted $\bar{y}_{1a}$) is then:

$\bar{y}_{1a} = \bar{y}_1 - \beta(\bar{x}_1 - x^*)$

$\bar{y}_{1a}$ can be seen as an estimate of the mean height of group 1 (urban) at the value $x = x^*$. Similarly:

$\bar{y}_{2a} = \bar{y}_2 - \beta(\bar{x}_2 - x^*)$

is the estimate of the mean height of group 2 (rural) at the same value $x^*$. From these, we can estimate the effect of urban versus rural residence with the following formula:

$\bar{y}_{1a} - \bar{y}_{2a} = \bar{y}_1 - \bar{y}_2 - \beta(\bar{x}_1 - \bar{x}_2)$

The problem, then, is to estimate $\beta$. It can be shown that the least squares estimate also gives an unbiased estimate of $\alpha_1 - \alpha_2$. Written as a linear model, the covariance model can be described as follows:

$y = \alpha + \beta x + \gamma g + \delta(xg) + e$    [6]

In other words, this model states that the height of a student is influenced by 3 factors: age ($\beta$), urban or rural residence ($\gamma$), and the interaction between the two ($\delta$). If $\delta = 0$ (that is, the interaction is not statistically significant), the model reduces to:

$y = \alpha + \beta x + \gamma g + e$    [7]

If $\gamma = 0$ (that is, the effect of residence is not statistically significant), the model reduces further to:

$y = \alpha + \beta x + e$    [8]

11.5.2 Analysis with R

The discussion above may look rather complicated, but in practice the estimation is very simple in R with the function lm. We will fit the three models [6], [7] and [8]:
> # model 6
> model6 <- lm(height ~ group + age + group:age)
> # model 7
> model7 <- lm(height ~ group + age)
> # model 8
> model8 <- lm(height ~ age)
We can also compare all three models at once with the command anova:
> anova(model6, model7, model8)
Analysis of Variance Table

Model 1: height ~ group + age + group:age
Model 2: height ~ group + age
Model 3: height ~ age
  Res.Df     RSS Df Sum of Sq      F  Pr(>F)
1     28 1270.44
2     29 1338.02 -1    -67.57 1.4893 0.23251
3     30 1545.95 -1   -207.93 4.5827 0.04114 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that "Model 1" is model [6], "Model 2" is model [7], and "Model 3" is model [8]. RSS is the residual sum of squares for each model. The results show the following:

The whole sample has 18 + 14 = 32 students, and model [6] has 4 parameters ($\alpha$, $\beta$, $\gamma$ and $\delta$), so this model has 32 - 4 = 28 degrees of freedom. Its residual sum of squares is 1270.44.

Model [7] has 3 parameters (leaving 29 degrees of freedom), so its residual sum of squares is higher than that of model [6]. From a statistical point of view, however, the residual mean square of this model, 1338.02 / 29 = 46.13, is not very different from that of model [6] (1270.44 / 28 = 45.36), and the p value of 0.2325 is not statistically significant. In other words, dropping the interaction coefficient $\delta$ does not appreciably change the predictive ability of the model.

Model [8] has only 2 parameters (and therefore 30 degrees of freedom), with a residual sum of squares of 1545.95. The residual mean square of this model is 51.53 (1545.95 / 30), which is significantly higher than that of model [7], with p = 0.0411.

From this analysis we see that model [7] is the best of the three, because it needs only 3 parameters to describe the data adequately. We now concentrate on the results of this model.
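The F statistics in the comparison table can be reproduced from the residual sums of squares alone. In the sketch below, each extra sum of squares is divided by the residual mean square of the largest model (model [6]); this choice of denominator is what reproduces the printed values, and it is our reading of the output rather than something the table states. The variable names are illustrative.

```r
# Residual sums of squares and degrees of freedom from the anova() table
rss <- c(model6 = 1270.44, model7 = 1338.02, model8 = 1545.95)
df  <- c(model6 = 28,      model7 = 29,      model8 = 30)

# Residual mean square of the largest model, used as the common denominator
scale <- rss[["model6"]] / df[["model6"]]

extra_ss <- diff(rss)              # increase in RSS when a term is dropped
extra_df <- diff(df)               # 1 degree of freedom each time
f_stat   <- (extra_ss / extra_df) / scale
p_value  <- pf(f_stat, extra_df, df[["model6"]], lower.tail = FALSE)

round(cbind(extra_ss, F = f_stat, p = p_value), 4)
```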
> summary(model7)
Call:
lm(formula = height ~ group + age)

Residuals:
    Min      1Q  Median      3Q     Max
-14.324  -3.285   0.879   3.956  14.866

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  91.8171    17.9294   5.121 1.81e-05 ***
group2       -5.4663     2.5749  -2.123  0.04242 *
age           0.4157     0.1408   2.953  0.00619 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.793 on 29 degrees of freedom
Multiple R-Squared: 0.2588,     Adjusted R-squared: 0.2077
F-statistic: 5.063 on 2 and 29 DF,  p-value: 0.01300

From the parameter estimates above, we see that on average a student's height increases by about 0.42 cm for each month of age. Note that in this output, group2 is the regression coefficient for group 2 (rural); R sets the coefficient for group 1 to 0 to simplify the computation. We therefore have two equations (or two lines) for the two groups of students.
For urban students:
Height = 91.817 + 0.4157(age)
And for rural students:
Height = 91.817 - 5.4663(rural) + 0.4157(age)
In other words, after adjusting for age, the rural students are shorter than the urban students by about 5.5 cm, and this difference is statistically significant (p = 0.0424). (Note that before adjusting for age, the difference was 2.8 cm.)
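The move from a 2.8 cm difference to a 5.5 cm difference follows directly from the adjustment formula in section 11.5.1. The sketch below applies that formula with the slope estimated by model [7] and compares the result with the group2 coefficient; the names raw_diff and adj_diff are chosen here for illustration.

```r
group <- factor(rep(1:2, times = c(18, 14)))   # 1 = urban, 2 = rural
age <- c(109,113,115,116,119,120,121,124,126,129,130,
         133,134,135,137,139,141,142, 121,121,128,
         129,131,132,133,134,138,138,138,140,140,140)
height <- c(137.6,147.8,136.8,140.7,132.7,145.4,135.0,
            133.0,148.5,148.3,147.5,148.8,133.2,
            148.7,152.0,150.6,165.3,149.9,
            139.0,140.9,134.9,149.5,148.7,131.0,142.3,
            139.9,142.9,147.7,147.7,134.6,135.8,148.5)

model7 <- lm(height ~ group + age)
beta   <- coef(model7)[["age"]]               # common slope

ybar <- as.numeric(tapply(height, group, mean))
xbar <- as.numeric(tapply(age, group, mean))

raw_diff <- ybar[1] - ybar[2]                          # urban minus rural, unadjusted
adj_diff <- raw_diff - beta * (xbar[1] - xbar[2])      # adjusted for age

c(raw = raw_diff, adjusted = adj_diff, minus_group2 = -coef(model7)[["group2"]])
```

The urban group is younger, so removing the age effect widens the gap in its favour.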
The following charts illustrate these models:
> par(mfrow=c(2,2))
> plot(age, height, pch=as.character(group),
       main="Model 1")
> abline(144.54, 0)        # mean value for urban
> abline(141.67, 0)        # mean value for rural
> plot(age, height, pch=as.character(group),
       main="Model 2")
> abline(102.63, 0.3138)   # single line for dependence on age
> plot(age, height, pch=as.character(group),
       main="Model 3")
> abline(91.8, 0.416)        # line for urban
> abline(91.8-5.46, 0.416)   # parallel line for rural
> plot(age, height, pch=as.character(group),
       main="Model 4")
> abline(79.7, 0.511)                # line for urban
> abline(79.7+47.08, 0.511-0.399)    # line for rural (different slope)
> par(mfrow=c(1,1))
[Figure 11.6. Four panels of height against age, with points labelled 1 (urban) and 2 (rural). Model 1: height is a function of place of residence but is unrelated to age. Model 2 assumes that height depends on age but does not differ between the urban and rural groups. Model 3 assumes that the relationship between height and age in the urban group is the same as in the rural group (two parallel lines), but that urban students are taller than rural students. Model 4 assumes that the difference in height between the two groups depends on age (that is, there is an interaction between age and place of residence): below 120 months of age the heights of the two groups do not differ much, but above 120 months the urban students are taller than the rural students. The analysis above shows that Model 3 is the best.]

11.6 Analysis of variance for factorial experiments

Example 4. To study the effect of 4 pesticides (1, 2, 3 and 4) and three varieties (B1, B2 and B3) on the yield of orange trees, researchers carried out a factorial experiment. For each variety, 4 trees were selected at random, and the 4 pesticides were assigned (also at random) to these trees, one pesticide per tree. The results (orange yield) for each combination of variety and pesticide are as follows:
Table 11.5. Orange yield for 3 varieties and 4 pesticides
Variety              Pesticide                     Total
(variety)       1       2       3       4
B1             29      50      43      53           175
B2             41      58      42      73           214
B3             66      85      69      85           305
Total         136     193     154     211           694


The model for a factorial experiment is no different from the two-way analysis of variance presented in the previous section. Specifically, the model we consider is:
$$\text{product} = \alpha + \beta(\text{variety}) + \gamma(\text{pesticide}) + \varepsilon$$

Here $\alpha$ is a constant representing the overall mean, $\beta$ is the effect of the three varieties, $\gamma$ is the effect of the 4 pesticides, and $\varepsilon$ is the residual of the model.
We can use the R function aov to estimate these parameters as follows:
# first, enter the data
> variety <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
> pesticide <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
> product <- c(29,50,43,53,41,58,42,73,66,85,69,85)

# define variety and pesticide as factors
> variety <- as.factor(variety)
> pesticide <- as.factor(pesticide)

# combine them into a data frame named data
> data <- data.frame(variety, pesticide, product)

# run the analysis of variance with aov and store the result in the object analysis
> analysis <- aov(product ~ variety + pesticide)
> anova(analysis)

Analysis of Variance Table

Response: product
          Df  Sum Sq Mean Sq F value   Pr(>F)
variety    2 2225.17 1112.58  44.063 0.000259 ***
pesticide  3 1191.00  397.00  15.723 0.003008 **
Residuals  6  151.50   25.25
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
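
The sums of squares in this table can be reproduced by hand. The following is a minimal sketch, assuming the same 12 yield values as above; the variable names (ss_variety, ss_pesticide and so on) are chosen for this illustration only.

```r
# Yield data from Example 4: 3 varieties x 4 pesticides, one tree per cell
product   <- c(29, 50, 43, 53, 41, 58, 42, 73, 66, 85, 69, 85)
variety   <- factor(rep(1:3, each = 4))
pesticide <- factor(rep(1:4, times = 3))

grand <- mean(product)

# Each variety mean is based on 4 trees, each pesticide mean on 3 trees
ss_total     <- sum((product - grand)^2)
ss_variety   <- sum(4 * (tapply(product, variety, mean) - grand)^2)
ss_pesticide <- sum(3 * (tapply(product, pesticide, mean) - grand)^2)
ss_residual  <- ss_total - ss_variety - ss_pesticide

# F ratios: mean square of each factor over the residual mean square
ms_residual <- ss_residual / 6
f_variety   <- (ss_variety / 2) / ms_residual
f_pesticide <- (ss_pesticide / 3) / ms_residual

print(round(c(variety = ss_variety, pesticide = ss_pesticide,
              residual = ss_residual), 2))
print(round(c(variety = f_variety, pesticide = f_pesticide), 3))
```

The printed values should agree with the Sum Sq and F value columns reported by anova(analysis).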

The results above show that both factors, variety and pesticide, affect orange yield, since both p-values are below 0.05. To compare specific pairs of groups, we use the function TukeyHSD as follows:
> TukeyHSD(analysis)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = product ~ variety + pesticide)

$variety
     diff       lwr      upr     p adj
2-1  9.75 -1.152093 20.65209 0.0749103
3-1 32.50 21.597907 43.40209 0.0002363
3-2 22.75 11.847907 33.65209 0.0016627

$pesticide
    diff        lwr       upr     p adj
2-1   19   4.797136 33.202864 0.0140509
3-1    6  -8.202864 20.202864 0.5106152
4-1   25  10.797136 39.202864 0.0036109
3-2  -13 -27.202864  1.202864 0.0704233
4-2    6  -8.202864 20.202864 0.5106152
4-3   19   4.797136 33.202864 0.0140509


The comparison between varieties shows that variety B3 yields about 32.5 units more than variety B1, with a 95% confidence interval from 21.6 to 43.4 (p = 0.0002). Variety B3 is also better than variety B2, with a mean difference of about 22.8 units (p = 0.0017). There is no significant difference between varieties B2 and B1 (p = 0.075).
For the pesticides, the results show that pesticide 4 is more effective than pesticides 1 and 3. In addition, pesticide 2 is more effective than pesticide 1. The other comparisons are not statistically significant. The Tukey plot below illustrates these conclusions.

> plot(TukeyHSD(analysis))

[Figure: 95% family-wise confidence intervals for the differences in mean levels of pesticide (comparisons 2-1, 3-1, 4-1, 3-2, 4-2, 4-3); horizontal axis from -20 to 40.]
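
To see where the Tukey interval widths come from, the sketch below recomputes the half-width of the intervals from the residual mean square. It assumes the standard Tukey formula for equal group sizes (studentized range quantile times the square root of MSE divided by the group size); the variable names are illustrative.

```r
# Quantities taken from the ANOVA table of Example 4
mse    <- 25.25   # residual mean square
df_res <- 6       # residual degrees of freedom

# Each variety mean is based on 4 trees, each pesticide mean on 3 trees
hw_variety   <- qtukey(0.95, nmeans = 3, df = df_res) * sqrt(mse / 4)
hw_pesticide <- qtukey(0.95, nmeans = 4, df = df_res) * sqrt(mse / 3)

# Interval for variety 3 versus variety 1: difference plus or minus half-width
diff_3_1 <- 32.50
print(c(lower = diff_3_1 - hw_variety, upper = diff_3_1 + hw_variety))
print(c(half_width_variety = hw_variety, half_width_pesticide = hw_pesticide))
```

A pairwise difference is significant at the family-wise 5% level exactly when its absolute value exceeds the corresponding half-width.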

11.7 Analysis of variance for Latin square experiments
Example 5. To compare the effects of 2 fertilizers (A and B) combined with 2 cultivation methods (a and b), researchers carried out a Latin square experiment. There are 4 treatment combinations of fertilizer and cultivation method: Aa, Ab, Ba and Bb (coded, respectively, as 1=Aa, 2=Ab, 3=Ba, 4=Bb). The four treatments were applied in 4 field samples (sample = 1, 2, 3, 4) and to 4 crop varieties (variety = 1, 2, 3, 4). In total the experiment has 4 x 4 = 16 observations. The outcome is yield, and the results are summarized in the following table:
Table 11.6. Yield for 2 fertilizers and 2 cultivation methods
Field sample                    Variety (variety)
(sample)          1           2           3           4
1              175 (Aa)    143 (Ba)    128 (Bb)    166 (Ab)
2              170 (Ab)    178 (Aa)    140 (Ba)    131 (Bb)
3              135 (Bb)    173 (Ab)    169 (Aa)    141 (Ba)
4              145 (Ba)    136 (Bb)    165 (Ab)    173 (Aa)


The question is whether the cultivation methods and fertilizers affect yield. To answer it, we need to consider the sources that make yield vary. From the design and the data table above, three main sources of variation are easy to identify:

The first source is the difference between the cultivation and fertilizer treatments;
The second source is the difference between crop varieties;
The third source is the difference between field samples;

What remains is the variation within each field sample and variety. To get an overview of the data, we compute the mean for each group in the following table:
Mean by variety      Mean by field sample    Mean by method
1: 156.25            1: 153.00               1: 173.75
2: 157.50            2: 154.75               2: 168.50
3: 150.50            3: 154.50               3: 142.25
4: 152.75            4: 154.75               4: 132.50
Overall: 154.25      Overall: 154.25         Overall: 154.25


The summary table above lets us compute the sum of squares for each source of variation. We begin with the total sum of squares for the whole experiment (denoted SStotal):

Total sum of squares for the whole experiment:
$$\mathrm{SS}_{total} = (175 - 154.25)^2 + (143 - 154.25)^2 + \dots + (165 - 154.25)^2 + (173 - 154.25)^2 = 4941$$

Sum of squares due to differences between varieties (SSvariety). Note that each variety mean is computed from 4 observations, so each squared deviation must be multiplied by 4:
$$\mathrm{SS}_{variety} = 4(156.25 - 154.25)^2 + 4(157.50 - 154.25)^2 + 4(150.50 - 154.25)^2 + 4(152.75 - 154.25)^2 = 123.5$$
Because there are 4 varieties and one parameter (the overall mean) has been estimated, the degrees of freedom are 4 - 1 = 3. The mean square is therefore 123.5 / 3 = 41.2.

Sum of squares due to differences between field samples (SSsample). Each sample mean is computed from 4 observations, so each squared deviation is again multiplied by 4:
$$\mathrm{SS}_{sample} = 4(153.00 - 154.25)^2 + 4(154.75 - 154.25)^2 + 4(154.50 - 154.25)^2 + 4(154.75 - 154.25)^2 = 8.5$$
With 4 samples and one estimated parameter, the degrees of freedom are 4 - 1 = 3, and the mean square is 8.5 / 3 = 2.8.

Sum of squares due to differences between methods (SSmethod). Each method mean is computed from 4 observations, so each squared deviation is multiplied by 4:
$$\mathrm{SS}_{method} = 4(173.75 - 154.25)^2 + 4(168.50 - 154.25)^2 + 4(142.25 - 154.25)^2 + 4(132.50 - 154.25)^2 = 4801.5$$
With 4 methods and one estimated parameter, the degrees of freedom are 4 - 1 = 3, and the mean square is 4801.5 / 3 = 1600.5.

Residual sum of squares:
$$\mathrm{SS}_{residual} = \mathrm{SS}_{total} - \mathrm{SS}_{method} - \mathrm{SS}_{sample} - \mathrm{SS}_{variety} = 4941.0 - 4801.5 - 8.5 - 123.5 = 7.5$$
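
The hand calculation above can be checked with a few lines of R. This is a sketch only: it enters the 16 yields row by row from Table 11.6 (field sample 1 first), and the helper name ss_factor is an assumption made for the example.

```r
# Yields from Table 11.6, entered row by row (field sample 1, 2, 3, 4)
y <- c(175, 143, 128, 166,
       170, 178, 140, 131,
       135, 173, 169, 141,
       145, 136, 165, 173)
sample  <- factor(rep(1:4, each = 4))    # field sample of each value
variety <- factor(rep(1:4, times = 4))   # variety of each value
method  <- factor(c(1, 3, 4, 2,          # 1=Aa, 2=Ab, 3=Ba, 4=Bb
                    2, 1, 3, 4,
                    4, 2, 1, 3,
                    3, 4, 2, 1))

grand <- mean(y)

# Hypothetical helper: between-level sum of squares (4 values per level)
ss_factor <- function(f) sum(4 * (tapply(y, f, mean) - grand)^2)

ss_total    <- sum((y - grand)^2)
ss_variety  <- ss_factor(variety)
ss_sample   <- ss_factor(sample)
ss_method   <- ss_factor(method)
ss_residual <- ss_total - ss_variety - ss_sample - ss_method

print(c(total = ss_total, variety = ss_variety, sample = ss_sample,
        method = ss_method, residual = ss_residual))
```

The subtraction in the last step is valid because, in a Latin square, the three factors are mutually balanced, so their sums of squares add up without overlap.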

These estimates can be presented in an analysis of variance table as follows:

Source of variation          Degrees of    Sum of      Mean       F test
                             freedom       squares     square
Between 4 field samples          3            8.5         2.8        2.3
Between 4 varieties              3          123.5        41.2       32.9
Between 4 methods                3         4801.5      1600.5     1280.4
Residual                         6            7.5        1.25
Total                           15         4941.0


This simple hand calculation shows that cultivation method and variety have a large effect on yield. To obtain exact p-values, we can use R to run the analysis of variance for the Latin square experiment.
Organizing the data so that R can process it correctly is very important. In short, every observation must be uniquely identified. In this experiment there are 4 varieties and 4 field samples, hence 16 observations. Each of these 16 observations must be labeled with its variety, its field sample and, most importantly, its cultivation method. For example, in Table 11.6, 175 is the yield for method 1 (Aa), variety 1 and sample 1, whereas 173 (bottom right corner of the table) is also the yield for method 1, but for variety 4 and sample 4, and so on.

First, we enter the yield data, row by row from Table 11.6, and call it y:

> y <- c(175, 143, 128, 166,
         170, 178, 140, 131,
         135, 173, 169, 141,
         145, 136, 165, 173)

Next, let variety be the variety, with 4 levels (1,2,3,4), for each value in y (and declare variety to be a factor, that is, a categorical variable):

> variety <- c(1,2,3,4,
               1,2,3,4,
               1,2,3,4,
               1,2,3,4)
> variety <- as.factor(variety)

Let sample be the field sample, with 4 levels (1,2,3,4), for each value in y (and declare sample to be a factor as well):

> sample <- c(1,1,1,1,
              2,2,2,2,
              3,3,3,3,
              4,4,4,4)
> sample <- as.factor(sample)

Enter the method for each value in y; method also has 4 levels (1,2,3,4) and is likewise declared as a factor:

> method <- c(1, 3, 4, 2,
              2, 1, 3, 4,
              4, 2, 1, 3,
              3, 4, 2, 1)
> method <- as.factor(method)

Combine all of the variables above into a data frame called data:

> data <- data.frame(sample, variety, method, y)

Print data to check that the values are correct and properly arranged:

> data
   sample variety method   y
1       1       1      1 175
2       1       2      3 143
3       1       3      4 128
4       1       4      2 166
5       2       1      2 170
6       2       2      1 178
7       2       3      3 140
8       2       4      4 131
9       3       1      4 135
10      3       2      2 173
11      3       3      1 169
12      3       4      3 141
13      4       1      3 145
14      4       2      4 136
15      4       3      2 165
16      4       4      1 173
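
Beyond printing the data frame, the layout can be checked programmatically. The sketch below is one possible check, assuming the coding used above: in a valid Latin square every method must occur exactly once in each field sample and exactly once in each variety. The name is_latin is illustrative.

```r
# Design vectors as coded in the text
sample  <- factor(rep(1:4, each = 4))
variety <- factor(rep(1:4, times = 4))
method  <- factor(c(1, 3, 4, 2,
                    2, 1, 3, 4,
                    4, 2, 1, 3,
                    3, 4, 2, 1))

# Cross-tabulate method against rows (samples) and columns (varieties)
by_sample  <- table(sample, method)
by_variety <- table(variety, method)
print(by_sample)
print(by_variety)

# Every cell of both tables must equal 1 for a Latin square
is_latin <- all(by_sample == 1) && all(by_variety == 1)
print(is_latin)
```

If a single code in method were mistyped, at least one cell would be 0 and another 2, which makes data-entry errors easy to spot before fitting the model.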

We are now ready to use lm or aov to analyze the data. Here we use aov to estimate the sources of variation described above (the results are stored in the object latin):
> latin <- aov(y ~ sample + variety + method)
> summary(latin)
            Df Sum Sq Mean Sq   F value    Pr(>F)
sample       3    8.5     2.8    2.2667 0.1810039
variety      3  123.5    41.2   32.9333 0.0004016 ***
method       3 4801.5  1600.5 1280.4000 8.293e-09 ***
Residuals    6    7.5     1.3
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

All of these results are, of course, the same as those summarized by hand in the analysis of variance table above. Here, however, R also provides the p-values (in the Pr(>F) column) needed for statistical inference. Based on these p-values, we can state that field sample has no significant effect on yield, whereas variety and cultivation method do.
To quantify the differences between cultivation methods and between varieties, we use the function TukeyHSD as follows:
> TukeyHSD(latin)
$variety
     diff        lwr        upr     p adj
2-1  1.25 -1.4867231  3.9867231 0.4528549
3-1 -5.75 -8.4867231 -3.0132769 0.0014152
4-1 -3.50 -6.2367231 -0.7632769 0.0173206
3-2 -7.00 -9.7367231 -4.2632769 0.0004803
4-2 -4.75 -7.4867231 -2.0132769 0.0038827
4-3  2.25 -0.4867231  4.9867231 0.1034761

$method
      diff        lwr        upr     p adj
2-1  -5.25  -7.986723  -2.513277 0.0023016
3-1 -31.50 -34.236723 -28.763277 0.0000001
4-1 -41.25 -43.986723 -38.513277 0.0000000
3-2 -26.25 -28.986723 -23.513277 0.0000004
4-2 -36.00 -38.736723 -33.263277 0.0000000
4-3  -9.75 -12.486723  -7.013277 0.0000730


The comparison between varieties shows significant differences between varieties 3 and 1, 4 and 1, 3 and 2, and 4 and 2.
All comparisons between cultivation methods are statistically significant. But which method gives the highest yield? To answer this question, we use a box plot:
> boxplot(y ~ method, xlab="Methods (1=Aa, 2=Ab, 3=Ba, 4=Bb)", ylab="Production")

[Figure: box plots of Production (vertical axis, about 130 to 180) for each of the four Methods (1=Aa, 2=Ab, 3=Ba, 4=Bb).]

Figure: comparison of yield across the four cultivation methods.
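
The same question can be answered numerically. The sketch below, which assumes the data as coded above, computes the mean yield for each treatment and then, as an extra illustration not carried out in the text, averages separately over fertilizer (A or B) and cultivation method (a or b). The label strings and variable names are choices made for this example.

```r
# Yields and treatment codes from the Latin square example
y <- c(175, 143, 128, 166,
       170, 178, 140, 131,
       135, 173, 169, 141,
       145, 136, 165, 173)
method <- factor(c(1, 3, 4, 2,  2, 1, 3, 4,  4, 2, 1, 3,  3, 4, 2, 1),
                 labels = c("Aa", "Ab", "Ba", "Bb"))

# Mean yield per treatment and the treatment with the highest mean
method_means <- tapply(y, method, mean)
best <- names(which.max(method_means))
print(method_means)
print(best)

# Split each label into its fertilizer (A/B) and cultivation (a/b) part
fertilizer  <- substr(as.character(method), 1, 1)
cultivation <- substr(as.character(method), 2, 2)
fert_means <- tapply(y, fertilizer, mean)
cult_means <- tapply(y, cultivation, mean)
print(fert_means)
print(cult_means)
```

On these data the gap between fertilizers A and B is much larger than the gap between cultivation methods a and b, which is consistent with the Tukey comparisons above.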

11.8 Analysis of variance for cross-over experiments
Example 6. To test the effect of a new drug on sweating (the drug was developed to treat heart disease, and sweating is a side effect), researchers conducted a study in 16 patients. The patients were randomly divided into 2 groups (called AB and BA) of 8 patients each. Patients were followed over two periods: month 1 and month 2. Patients in group AB received the drug in month 1 and a placebo in month 2. Conversely, patients in group BA received the placebo in month 1 and the drug in month 2. The outcome is the time to sweating on the forehead (measured from taking the drug until sweating begins) after taking the drug or the placebo. The results are presented in the following table:
Table 11.7. Results of the study of the sweating effect of a heart disease drug
Group    Patient     Time (minutes) to sweating on the forehead
         (id)        Month 1       Month 2
AB                   A             Placebo
          1           6             4
          3           8             7
          5          12             6
          6           7             8
          9           9            10
         10           6             4
         13          11             6
         15           8             8
BA                   Placebo       A
          2           5             7
          4           9             6
          7           7            11
          8           4             7
         11           9             8
         12           5             4
         14           8             9
         16           9            13


The main question is whether the time to sweating differs between treatment with the drug and treatment with the placebo.
To answer this question, we need an analysis of variance. However, because the study design is rather special (two groups of patients receiving the interventions in two different orders), the methods described above cannot be applied directly. A common approach is to analyze the variance within each group and then compare the two groups. One issue that deserves attention is the possibility of a carry-over effect: in group AB, for example, the response in month 2 may still be influenced by the active drug the patients received in month 1. First, we summarize the data in the following table:
Table 11.8. Summary of results of the study of the sweating effect of a heart disease drug
Group    Patient     Time (minutes) to sweating     Mean for
         (id)        Month 1       Month 2          each patient
AB                   A             Placebo
          1           6             4                5.0
          3           8             7                7.5
          5          12             6                9.0
          6           7             8                7.5
          9           9            10                9.5
         10           6             4                5.0
         13          11             6                8.5
         15           8             8                8.0
         Mean         8.375         6.625            7.50
BA                   Placebo       A
          2           5             7                6.0
          4           9             6                7.5
          7           7            11                9.0
          8           4             7                5.5
         11           9             8                8.5
         12           5             4                4.5
         14           8             9                8.5
         16           9            13               11.0
         Mean         7.000         8.125            7.5625
Mean for the 2 groups 7.6875        7.3750           7.5312
Mean for treatment A = (8.375 + 8.125) / 2 = 8.25
Mean for treatment P (placebo) = (6.625 + 7.000) / 2 = 6.8125


From the summary table above, we can compute several sums of squares:

Sum of squares due to the difference between drug and placebo:
$$\mathrm{SS}_{treat} = 16(8.25 - 7.5312)^2 + 16(6.8125 - 7.5312)^2 = 16.53$$

Sum of squares due to the difference between month 1 and month 2:
$$\mathrm{SS}_{period} = 16(7.6875 - 7.5312)^2 + 16(7.3750 - 7.5312)^2 = 0.781$$

Sum of squares due to the difference between groups AB and BA (sequence):
$$\mathrm{SS}_{seq} = 16(7.50 - 7.5312)^2 + 16(7.5625 - 7.5312)^2 = 0.031$$

Sum of squares due to differences between patients within the same group (AB or BA). Each patient mean is based on 2 measurements, so the squared deviations are multiplied by 2:
$$\mathrm{SS}_{w} = 2\left[(5.0 - 7.50)^2 + (7.5 - 7.50)^2 + (9.0 - 7.50)^2 + \dots + (8.0 - 7.50)^2 + (6.0 - 7.5625)^2 + (7.5 - 7.5625)^2 + (9.0 - 7.5625)^2 + \dots + (11.0 - 7.5625)^2\right] = 103.44$$

Total sum of squares for the whole sample:
$$\mathrm{SS}_{total} = (6 - 7.5312)^2 + (8 - 7.5312)^2 + \dots + (13 - 7.5312)^2 + (9 - 7.5312)^2 = 167.97$$

Remaining (residual) sum of squares:
$$\mathrm{SS}_{res} = 167.97 - 16.53 - 0.781 - 0.031 - 103.44 = 47.19$$
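
These sums of squares can be reproduced in R. The following sketch assumes the 32 measurements of Table 11.7 entered block by block as noted in the comments; the names seq_grp, ss_factor and the other helper variables are introduced for this illustration only (seq_grp is used instead of seq to avoid masking the base function).

```r
# Measurements from Table 11.7, in four blocks of 8 patients
y <- c(6, 8, 12, 7, 9, 6, 11, 8,    # group AB, month 1, drug A
       4, 7, 6, 8, 10, 4, 6, 8,     # group AB, month 2, placebo
       7, 6, 11, 7, 8, 4, 9, 13,    # group BA, month 2, drug A
       5, 9, 7, 4, 9, 5, 8, 9)      # group BA, month 1, placebo
seq_grp <- factor(rep(c(1, 2), each = 16))       # 1 = AB, 2 = BA
period  <- factor(rep(c(1, 2, 2, 1), each = 8))  # month
treat   <- factor(rep(c(1, 2, 1, 2), each = 8))  # 1 = A, 2 = placebo
ids_ab  <- c(1, 3, 5, 6, 9, 10, 13, 15)
ids_ba  <- c(2, 4, 7, 8, 11, 12, 14, 16)
id      <- factor(c(ids_ab, ids_ab, ids_ba, ids_ba))

grand <- mean(y)

# Hypothetical helper: sum of squares for a two-level factor (16 values per level)
ss_factor <- function(f) sum(16 * (tapply(y, f, mean) - grand)^2)
ss_treat  <- ss_factor(treat)
ss_period <- ss_factor(period)
ss_seq    <- ss_factor(seq_grp)

# Between patients within group: 2 measurements per patient
patient_mean  <- as.numeric(tapply(y, id, mean))
patient_group <- as.integer(tapply(as.integer(seq_grp), id, function(g) g[1]))
group_mean    <- as.numeric(tapply(y, seq_grp, mean))
ss_within     <- sum(2 * (patient_mean - group_mean[patient_group])^2)

ss_total <- sum((y - grand)^2)
ss_res   <- ss_total - ss_treat - ss_period - ss_seq - ss_within

print(round(c(treat = ss_treat, period = ss_period, seq = ss_seq,
              within = ss_within, residual = ss_res, total = ss_total), 3))
```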

We can now set up the analysis of variance table as follows:

Table 11.9. Analysis of variance of the data in Table 11.7
Source of variation          Degrees of    Sum of      Mean       F test
                             freedom       squares     square
Between the two treatments       1          16.53       16.53       4.90
Between the two months           1           0.781       0.781      0.23
Between AB and BA                1           0.031       0.031      0.004
Within each group               14         103.44        7.39
Residual                        14          47.19        3.37
Total                           31         167.97


This analysis shows that the difference between drug and placebo is larger than the difference between the two months or between groups AB and BA. The F test of the hypothesis that drug and placebo have the same effect is F = 16.53 / 3.37 = 4.90 with 1 and 14 degrees of freedom. From probability theory, the critical value of F with 1 and 14 degrees of freedom at the 5% level is 4.60. Since 4.90 exceeds 4.60, we can conclude that the time to sweating is longer with the drug than with the placebo.
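The critical value and the corresponding p-value need not be looked up in tables. As a small sketch, the base R functions qf and pf give both directly; the variable names below are illustrative.

```r
# Observed F ratio from the hand calculation (1 and 14 degrees of freedom)
f_obs <- 16.53 / 3.37

# Critical value of F at the 5% significance level
f_crit <- qf(0.95, df1 = 1, df2 = 14)

# Upper-tail probability of the observed F ratio, i.e. the p-value
p_value <- pf(f_obs, df1 = 1, df2 = 14, lower.tail = FALSE)

print(round(c(f_obs = f_obs, f_crit = f_crit, p_value = p_value), 4))
```

Comparing f_obs with f_crit and comparing p_value with 0.05 are two equivalent ways of reaching the same decision.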
All of the hand calculations above merely illustrate how the analysis of variance works in a cross-over experiment. In practice, we can use R to carry out the calculations in the same way as for simpler experiments. The main issue is how to organize the data. R (like many other packages) requires the user to enter the observations one by one, each linked to a patient, a treatment, a month (or period) and a sequence group. This requirement is very important: if the data are organized incorrectly, the results of the analysis may be wrong.
The following describes the steps one by one:
# step 1: enter the data and name the object y
# (rows: AB month 1, AB month 2, BA month 2, BA month 1)
> y <- c(6,8,12,7,9,6,11,8,
         4,7,6,8,10,4,6,8,
         7,6,11,7,8,4,9,13,
         5,9,7,4,9,5,8,9)
# step 2: for each value in step 1, indicate the group, AB or BA (codes 1 and 2)
> seq <- c(1,1,1,1,1,1,1,1,
           1,1,1,1,1,1,1,1,
           2,2,2,2,2,2,2,2,
           2,2,2,2,2,2,2,2)
> seq <- as.factor(seq)
# step 3: for each value in step 1, indicate month 1 or month 2
> period <- c(1,1,1,1,1,1,1,1,
              2,2,2,2,2,2,2,2,
              2,2,2,2,2,2,2,2,
              1,1,1,1,1,1,1,1)
> period <- as.factor(period)
# step 4: for each value in step 1, indicate treatment A or placebo (codes 1 and 2)
> treat <- c(1,1,1,1,1,1,1,1,
             2,2,2,2,2,2,2,2,
             1,1,1,1,1,1,1,1,
             2,2,2,2,2,2,2,2)
> treat <- as.factor(treat)
# step 5: for each value in step 1, give the patient's id
> id <- c(1,3,5,6,9,10,13,15,
          1,3,5,6,9,10,13,15,
          2,4,7,8,11,12,14,16,
          2,4,7,8,11,12,14,16)
> id <- as.factor(id)
# step 6: build a data frame named data and print it for a final check
> data <- data.frame(seq, period, treat, id, y)
> data
   seq period treat id  y
1    1      1     1  1  6
2    1      1     1  3  8
3    1      1     1  5 12
4    1      1     1  6  7
5    1      1     1  9  9
6    1      1     1 10  6
7    1      1     1 13 11
8    1      1     1 15  8
9    1      2     2  1  4
10   1      2     2  3  7
11   1      2     2  5  6
12   1      2     2  6  8
13   1      2     2  9 10
14   1      2     2 10  4
15   1      2     2 13  6
16   1      2     2 15  8
17   2      2     1  2  7
18   2      2     1  4  6
19   2      2     1  7 11
20   2      2     1  8  7
21   2      2     1 11  8
22   2      2     1 12  4
23   2      2     1 14  9
24   2      2     1 16 13
25   2      1     2  2  5
26   2      1     2  4  9
27   2      1     2  7  7
28   2      1     2  8  4
29   2      1     2 11  9
30   2      1     2 12  5
31   2      1     2 14  8
32   2      1     2 16  9


We are now ready to use the R function lm to analyze the data. Note that using lm for the analysis of variance of a cross-over experiment is no different from using it for other experiments. The only difference lies in how the data are organized, as described above. The patient identifier id is included in the model so that between-patient variation is separated from the residual.
> xover <- lm(y ~ treat+seq+period+id)
> anova(xover)
Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value  Pr(>F)
treat      1  16.531  16.531  4.9046 0.04388 *
seq        1   0.031   0.031  0.0093 0.92466
period     1   0.781   0.781  0.2318 0.63764
id        14 103.438   7.388  2.1921 0.07711 .
Residuals 14  47.187   3.371
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


These results are, of course, the same as those obtained by hand above. In summary, the difference between drug and placebo is statistically significant, with F = 4.90 and p = 0.044.
We can also request a 95% confidence interval for the difference between the two treatments using TukeyHSD, as follows (note that TukeyHSD requires a model fitted with aov, not lm):
> TukeyHSD(aov(y ~ treat+seq+period+id))
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ treat + seq + period + id)

$treat
       diff       lwr         upr     p adj
2-1 -1.4375 -2.829658 -0.04534186 0.0438783

$seq
      diff       lwr      upr    p adj
2-1 0.0625 -1.329658 1.454658 0.924656

$period
       diff       lwr      upr     p adj
2-1 -0.3125 -1.704658 1.079658 0.6376395

Note the result:
$treat
       diff       lwr         upr     p adj
2-1 -1.4375 -2.829658 -0.04534186 0.0438783

It shows that, on average, the time to sweating under active treatment is about 1.44 minutes longer than under placebo, with a 95% confidence interval from 0.05 to 2.83 minutes. The comparisons between groups AB and BA (seq) and between month 1 and month 2 (period) are not statistically significant.

11.9 Analysis of variance for repeated measures experiments
Example 7. A pilot study was conducted to evaluate the efficacy of a new vaccine against rheumatoid arthritis. The study included 8 patients, randomly divided into 2 groups. Group 1 consisted of 4 patients treated with the vaccine; group 2 also consisted of 4 patients, who received a placebo (the control group). Patients were followed for 3 months, and each month they were asked about the status of their disease. Disease status was measured by an index ranging from 0 (no efficacy, disease unchanged) to 10 (complete efficacy, disease resolved). The results are summarized in the following table:
Table 11.10. Results of the study of a vaccine against rheumatoid arthritis pain
Group      Patient      Disease index by month
           (id)       Month 1    Month 2    Month 3
Vaccine     1            6          3          0
            2            7          3          1
            3            4          1          2
            4            8          4          3
Placebo     5            6          5          5
            6            9          4          6
            7            5          3          4
            8            6          2          3

The main question is whether there is any difference between the vaccine and placebo groups.
To keep the analysis of variance for repeated measures simple, we avoid mathematical notation and illustrate the method with a few hand calculations that the reader can follow. First, we summarize the data by computing the mean for each patient, each treatment group and each month:
Table 11.11. Summary of the data from the study of a vaccine against rheumatoid arthritis pain
Treatment   id       Disease index by month            Mean
group                1          2          3
Vaccine     1        6          3          0          3.000
            2        7          3          1          3.667
            3        4          1          2          2.333
            4        8          4          3          5.000
            Mean     6.25       2.75       1.50       3.500
            SD       1.71       1.26       1.29
Placebo     5        6          5          5          5.333
            6        9          4          6          6.333
            7        5          3          4          4.000
            8        6          2          3          3.667
            Mean     6.50       3.50       4.50       4.833
            SD       1.73       1.29       1.29
Mean for the two
groups               6.375      3.125      3.000      4.167


From the table above, we can see that there are 5 sources that make the results of the experiment vary:
(a) Between vaccine and placebo (probably the source we most want to know about);
(b) Between the 3 months of follow-up;
(c) Between the three months within each treatment group, which statisticians usually call an interaction; in this case, the interaction between treatment group and time;
(d) Between patients within the same treatment group;
(e) Finally, the residual, that is, the part we cannot explain after accounting for sources (a) to (d).

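First is the sum of squares between the two treatment groups (vaccine and placebo), which we call SStreat:
$$\mathrm{SS}_{treat} = 12(3.500 - 4.167)^2 + 12(4.833 - 4.167)^2 = 10.667$$

Next is the sum of squares between the 3 months of treatment, called SStime:
$$\mathrm{SS}_{time} = 8(6.375 - 4.167)^2 + 8(3.125 - 4.167)^2 + 8(3.000 - 4.167)^2 = 58.583$$

The third source is the sum of squares due to the interaction between treatment and time, called SSint:
$$\mathrm{SS}_{int} = 4(6.25 - 4.167)^2 + 4(2.75 - 4.167)^2 + 4(1.50 - 4.167)^2 + 4(6.50 - 4.167)^2 + 4(3.50 - 4.167)^2 + 4(4.50 - 4.167)^2 - \mathrm{SS}_{treat} - \mathrm{SS}_{time}$$
$$= 77.833 - 10.667 - 58.583 = 8.583$$

The fourth source is the sum of squares due to differences between patients within each treatment group, called SSpatient(treat). Each patient mean is compared with the mean of that patient's own group (3.500 for vaccine, 4.833 for placebo):
$$\mathrm{SS}_{patient(treat)} = 3(3.000 - 3.500)^2 + 3(3.667 - 3.500)^2 + 3(2.333 - 3.500)^2 + 3(5.000 - 3.500)^2 + 3(5.333 - 4.833)^2 + 3(6.333 - 4.833)^2 + 3(4.000 - 4.833)^2 + 3(3.667 - 4.833)^2 = 25.333$$

In addition, the total sum of squares for the whole sample is:
$$\mathrm{SS}_{total} = (6 - 4.167)^2 + (3 - 4.167)^2 + (0 - 4.167)^2 + \dots + (3 - 4.167)^2 = 115.333$$

From these, we can estimate the residual sum of squares:
$$\mathrm{SSE} = \mathrm{SS}_{total} - \mathrm{SS}_{treat} - \mathrm{SS}_{time} - \mathrm{SS}_{patient(treat)} - \mathrm{SS}_{int} = 115.333 - 10.667 - 58.583 - 25.333 - 8.583 = 12.167$$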
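
As a check on the arithmetic, the sketch below recomputes the five sums of squares in R from the 24 measurements of Table 11.10. It is an illustration under the coding described in the comments, and all variable names are choices made for this example.

```r
# Disease index, ordered as: vaccine months 1-3, then placebo months 1-3
y <- c(6, 7, 4, 8,  3, 3, 1, 4,  0, 1, 2, 3,
       6, 9, 5, 6,  5, 4, 3, 2,  5, 6, 4, 3)
treat <- factor(rep(1:2, each = 12))                    # 1 = vaccine, 2 = placebo
time  <- factor(rep(rep(1:3, each = 4), times = 2))     # month
id    <- factor(c(rep(1:4, times = 3), rep(5:8, times = 3)))

grand <- mean(y)
ss_total <- sum((y - grand)^2)

# Main effects: 12 values per treatment, 8 values per month
treat_mean <- as.numeric(tapply(y, treat, mean))
ss_treat <- sum(12 * (treat_mean - grand)^2)
ss_time  <- sum(8 * (tapply(y, time, mean) - grand)^2)

# Interaction: cell means (4 values each) minus the two main effects
cell   <- tapply(y, list(treat, time), mean)
ss_int <- sum(4 * (cell - grand)^2) - ss_treat - ss_time

# Patients within treatment: 3 values per patient, compared with own group mean
patient_mean  <- as.numeric(tapply(y, id, mean))
patient_group <- rep(1:2, each = 4)   # patients 1-4 vaccine, 5-8 placebo
ss_patient <- sum(3 * (patient_mean - treat_mean[patient_group])^2)

ss_error <- ss_total - ss_treat - ss_time - ss_int - ss_patient

print(round(c(treat = ss_treat, time = ss_time, interaction = ss_int,
              patient = ss_patient, error = ss_error, total = ss_total), 3))
```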


As the reader can see, the hand calculations above are rather complicated and error-prone. In R, the same results can be obtained quickly, and the steps are described further below. Before turning to R, the hand calculations can be collected in an analysis of variance table:
Source of variation              Degrees of    Sum of      Mean       F test
                                 freedom       squares     square
Between vaccine and placebo          1          10.667      10.667      2.53
Patients (within treatment)          6          25.333       4.222
Between 3 months                     2          58.583      29.292     28.89
Time x treatment group               2           8.583       4.292      4.23
Residual                            12          12.167       1.014       -
Total                               23         115.333


First, we enter the data for each patient. As with any statistical package, each value must be accompanied by the variables that identify its patient, its group and its time point:

> y <- c(6,7,4,8,
         3,3,1,4,
         0,1,2,3,
         6,9,5,6,
         5,4,3,2,
         5,6,4,3)


For each value above, tell R whether it belongs to the vaccine group (code 1) or the placebo group (code 2). Also tell R that treat is a categorical variable, not a numerical variable:

> treat <- c(1,1,1,1,
             1,1,1,1,
             1,1,1,1,
             2,2,2,2,
             2,2,2,2,
             2,2,2,2)
> treat <- as.factor(treat)

For each value above, tell R which month it belongs to (codes 1, 2, 3), and declare time to be a categorical variable:

> time <- c(1,1,1,1,
            2,2,2,2,
            3,3,3,3,
            1,1,1,1,
            2,2,2,2,
            3,3,3,3)
> time <- as.factor(time)

For each value above, tell R which patient it belongs to (codes 1, 2, 3, ..., 8), and declare id to be a categorical variable:

> id <- c(1,2,3,4, 1,2,3,4, 1,2,3,4, 5,6,7,8, 5,6,7,8,
          5,6,7,8)
> id <- as.factor(id)

Put all the variables into a data frame named data, and check once more that the data are arranged as intended. To repeat: before analyzing any data, it is essential to inspect them carefully to make sure they are organized correctly.

> data <- data.frame(id, time, treat, y)
> data
   id time treat y
1   1    1     1 6
2   2    1     1 7
3   3    1     1 4
4   4    1     1 8
5   1    2     1 3
6   2    2     1 3
7   3    2     1 1
8   4    2     1 4
9   1    3     1 0
10  2    3     1 1
11  3    3     1 2
12  4    3     1 3
13  5    1     2 6
14  6    1     2 9
15  7    1     2 5
16  8    1     2 6
17  5    2     2 5
18  6    2     2 4
19  7    2     2 3
20  8    2     2 2
21  5    3     2 5
22  6    3     2 6
23  7    3     2 4
24  8    3     2 3


We are now ready to use R for the analysis. The main function for analysis of variance is aov. Note how the error structure is specified with another function called Error. Inside Error, we tell R that the measurements are grouped by patient (id), each of whom belongs to one treatment group, and that within each patient they are taken at different times (time). This is written as Error(id/time). Specifically:
> repeated <- aov(y ~ treat*time + Error(id/time))

The command above asks R to fit the model y ~ treat + time + treat:time (note that treat*time is shorthand for treat + time + treat:time), and to split the residual mean square into two parts: one between patients and one between the months of treatment within patients (abbreviated as id/time). All results are stored in the object repeated. We then request a summary of the results from the object repeated:
> summary(repeated)

Error: id
          Df  Sum Sq Mean Sq F value Pr(>F)
treat      1 10.6667 10.6667  2.5263 0.1631
Residuals  6 25.3333  4.2222

Error: id:time
           Df Sum Sq Mean Sq F value    Pr(>F)
time        2 58.583  29.292 28.8904 2.586e-05 ***
treat:time  2  8.583   4.292  4.2329   0.04064 *
Residuals  12 12.167   1.014
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


The first part of the table shows that the overall difference between the vaccine group and the placebo group is not statistically significant (p = 0.16). Can we therefore conclude that the vaccine has no effect on rheumatoid arthritis pain?
The answer is no, because the second part of the analysis of variance table shows an interaction between treat and time (p = 0.041). This means that the difference between vaccine and placebo depends on the month of treatment. Indeed, looking back at Table 11.11, we see that in month 1 the means of the vaccine and placebo groups hardly differ (6.25 and 6.50), but in month 2, and especially in month 3, the two groups differ markedly (in month 3: 1.50 for the vaccine and 4.50 for the placebo). Thus the effect in the treated group grows over time, whereas the placebo group shows a smaller and less consistent change across the 3 months (6.50, 3.50, 4.50). In summary, this pilot study suggests that the vaccine appears to be effective in reducing pain in patients with rheumatoid arthritis.
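
The interaction can be made concrete by tabulating the cell means. The sketch below assumes the same data as above; the factor labels and the name gap are illustrative choices, not part of the original analysis.

```r
# Disease index, ordered as: vaccine months 1-3, then placebo months 1-3
y <- c(6, 7, 4, 8,  3, 3, 1, 4,  0, 1, 2, 3,
       6, 9, 5, 6,  5, 4, 3, 2,  5, 6, 4, 3)
treat <- factor(rep(1:2, each = 12), labels = c("vaccine", "placebo"))
time  <- factor(rep(rep(1:3, each = 4), times = 2))

# Mean index for every combination of treatment group and month
cell <- tapply(y, list(treat = treat, time = time), mean)
print(cell)

# Placebo minus vaccine, month by month: a gap that changes with time
# is what the treat:time interaction term measures
gap <- cell["placebo", ] - cell["vaccine", ]
print(gap)
```

If there were no interaction, the gap would be roughly the same in all three months.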
***
The sections above describe how to carry out analysis of variance for several commonly used experimental designs. The design and analysis of experiments (experimental design) is a fairly specialized field, and the guidance given here cannot cover all the calculations and methods for every type of experiment. Nevertheless, these methods and designs are used very widely in the experimental sciences. R has a package named nlme (linear and nonlinear mixed-effects models) that can also be used for the analyses above and for more complex models with multiple variables and multiple levels. The package can be downloaded free of charge from the R website: http://cran.R-project.org.
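
As a pointer only, the following sketch shows how the repeated measures data of Example 7 might be fitted with nlme. It assumes the nlme package is installed and uses a random intercept for each patient, which is one common specification and not necessarily the one the text has in mind; the object names d and fit are illustrative.

```r
library(nlme)

# Repeated measures data from Example 7 in long format
d <- data.frame(
  y     = c(6, 7, 4, 8,  3, 3, 1, 4,  0, 1, 2, 3,
            6, 9, 5, 6,  5, 4, 3, 2,  5, 6, 4, 3),
  treat = factor(rep(1:2, each = 12)),                  # 1 = vaccine, 2 = placebo
  time  = factor(rep(rep(1:3, each = 4), times = 2)),   # month
  id    = factor(c(rep(1:4, times = 3), rep(5:8, times = 3)))
)

# Fixed effects: treatment, time and their interaction;
# random effect: a separate intercept for each patient
fit <- lme(y ~ treat * time, random = ~ 1 | id, data = d)

# Sequential F tests for the fixed effects
print(anova(fit))
```

For balanced data such as these, the tests from this model should be close to those from the aov call with Error(id/time); mixed models become more useful when data are unbalanced or some measurements are missing.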
