
Clinical Statistics

Correlation Analysis


Nguyễn Văn Tuấn
One of the aims of medical research (and of science in general) is to understand the correlations among clinical or paraclinical factors, and thereby to predict a dependent factor from independent ones. Studying a correlation here involves both quantifying the degree of correlation and building a predictive model. The model is simply a function linking two variables; it must be reasonably reliable and biologically meaningful before it can be applied in clinical practice.
For example, studying the relationship between age and bone mineral density (BMD) means that we want to know how the two variables are correlated and whether age can be used to predict an individual's BMD. In this relationship BMD is the dependent variable and age is the independent variable. In mathematical terms, let BMD be Y and age be X; we want to know the degree of correlation between X and Y and the function that describes this simple relationship. One of the simplest, and probably the most widely used, of such functions is simple linear regression. In this article I explain, in turn, the methods of analysis used to reach these two goals.

1. Summary of theory
To describe the degree of correlation between two variables we need to estimate the coefficient of correlation. And to understand how the coefficient of correlation works, we need to become familiar with the concept of covariance. For a single variable X or Y there are three descriptive statistics: the sample size, the mean, and the variance, which I discussed in the first article of the Clinical Statistics series. To describe the correlation between two variables X and Y, however, we also need the covariance.
Covariance can be understood through an analogy with trigonometry. For a right triangle with hypotenuse c and other sides a and b, Pythagoras' theorem states that the square of the hypotenuse equals the sum of the squares of the other two sides:
$$c^2 = a^2 + b^2$$

Medical training program - YKHOA.NET Training - Nguyễn Văn Tuấn

For a general triangle, however, the relationship between c and the sides a and b is more complicated and is quantified by the cosine of the angle C (the angle opposite side c):
$$c^2 = a^2 + b^2 - 2ab\cos C$$

Similarly, for two variables X and Y that are completely independent of each other, the variance of X + Y equals the variance of X plus the variance of Y:
var(X + Y) = var(X) + var(Y)
where var stands for variance. Note that X + Y is a new variable, and that this formula is the counterpart of Pythagoras' theorem for a right triangle.
If X and Y are correlated, the formula above is replaced by one that involves the covariance:
var(X + Y) = var(X) + var(Y) + 2Cov(X, Y)
where Cov stands for covariance. This formula is the counterpart of the formula for a general triangle, and it also resembles the binomial identity $(x + y)^2 = x^2 + y^2 + 2xy$.
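The identity for the variance of a sum can be checked numerically. The sketch below uses simulated data; the seed, sample size and coefficients are arbitrary assumptions made only for illustration.

```r
# Sketch: check var(X + Y) = var(X) + var(Y) + 2*Cov(X, Y) on simulated data.
set.seed(1)                          # arbitrary seed, for reproducibility only
x <- rnorm(200, mean = 5, sd = 1.5)  # hypothetical variable X
y <- 0.6 * x + rnorm(200)            # Y depends on X, so Cov(X, Y) is not zero

lhs <- var(x + y)                        # variance of the new variable X + Y
rhs <- var(x) + var(y) + 2 * cov(x, y)   # right-hand side of the identity

print(c(lhs = lhs, rhs = rhs))
print(all.equal(lhs, rhs))           # compares the two sides up to rounding error
```

The identity is algebraic, so it holds for the sample variance and covariance of any data set, not only for this simulation.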
That is the concept. To go into the mathematical detail we need some notation. Let $x_i$ and $y_i$ be the observed values of X and Y for individual i. With n subjects, i = 1, 2, 3, ..., n. Let $\bar{x}$ and $\bar{y}$ be the means of the observed x and y, and let $s_x^2$ and $s_y^2$ be their respective variances, defined as:
$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$$

Therefore, if X and Y are independent, we can write:
$$s_{x+y}^2 = s_x^2 + s_y^2 \qquad [1]$$

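But if X and Y are related, the formula above no longer describes the data adequately. We need another index for the relationship between the two variables, obtained by multiplying the deviation of x from its mean, $(x_i - \bar{x})$, by the deviation of y from its mean, $(y_i - \bar{y})$, instead of squaring the deviation of each variable separately as in the variance formulas above. In other words, the product of the two deviations is the covariance. For each individual, the contribution to the covariance is:
$$\mathrm{cov}(x_i, y_i) = (x_i - \bar{x})(y_i - \bar{y})$$
Since we have n subjects, we add up all these products and divide by n - 1, as for the variance:
$$\mathrm{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) \qquad [2]$$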
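The definitions of the variance and of the covariance in [2] translate directly into a few lines of R. The sketch below uses a small set of hypothetical values (not data from the article), chosen so that the arithmetic is easy to follow by hand, and compares the results with R's built-in var and cov.

```r
# Hypothetical toy data, chosen only to keep the arithmetic easy to follow
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)
n <- length(x)

dx <- x - mean(x)                  # deviations of x from its mean
dy <- y - mean(y)                  # deviations of y from its mean

var_x  <- sum(dx^2)    / (n - 1)   # sample variance of x
var_y  <- sum(dy^2)    / (n - 1)   # sample variance of y
cov_xy <- sum(dx * dy) / (n - 1)   # sample covariance, formula [2]

print(c(var_x = var_x, var_y = var_y, cov_xy = cov_xy))
print(c(var(x), var(y), cov(x, y)))   # built-in functions, for comparison
```

Both lines of output should agree, since var and cov also use the divisor n - 1.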

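Formula [2] is the definition of the covariance. From the formulas above we can draw a few preliminary observations:

• A variance is never negative, because it is computed from squares, whereas a covariance can be negative or positive because it is computed from the product of two deviations.

• A positive covariance means that deviations of x from its mean tend to go in the same direction as those of y.

• A negative covariance means that deviations of x from its mean tend to go in the opposite direction to those of y.

• If the covariance is 0, x and y are uncorrelated, that is, there is no linear relationship between them. Independent variables always have zero covariance, but a zero covariance does not by itself guarantee independence.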
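A caution on the last point: a covariance of 0 rules out a linear relationship, not every relationship. In the sketch below (hypothetical values, not data from the article) y is completely determined by x, yet the covariance is zero because the relationship is a symmetric curve rather than a line.

```r
x <- -3:3          # hypothetical values, symmetric around 0
y <- x^2           # y is completely determined by x

print(cov(x, y))   # covariance of x and y
print(cor(x, y))   # correlation of x and y
```

This is one reason to look at a scatter plot before interpreting a covariance or a correlation near zero.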

One way to "standardize" the covariance is to take its ratio to the variances of the two variables, and that ratio is the definition of the coefficient of correlation, usually denoted r:
$$r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}} = \frac{\mathrm{Cov}(x, y)}{s_x s_y} \qquad [3]$$
(Note that the square root of the variance is the standard deviation, that is, $s_x = \sqrt{s_x^2}$ and $s_y = \sqrt{s_y^2}$, so the formula can be written in terms of standard deviations instead of variances.)

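With a little algebra, formula [3] can be rewritten as:
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}} \qquad [4]$$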
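The algebra behind the two forms is short. For the numerator, expand the product and use $\sum x_i = n\bar{x}$ and $\sum y_i = n\bar{y}$; for the denominator, substitute the definitions of $s_x$ and $s_y$. A sketch of the steps, with all sums running over i = 1, ..., n:

```latex
\begin{aligned}
\sum (x_i - \bar{x})(y_i - \bar{y})
  &= \sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\,\bar{x}\,\bar{y} \\
  &= \sum x_i y_i - n\,\bar{x}\,\bar{y} \\
  &= \sum x_i y_i - \frac{1}{n}\sum x_i \sum y_i , \\[6pt]
(n-1)\, s_x s_y
  &= (n-1)\sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}\,
            \sqrt{\frac{\sum (y_i - \bar{y})^2}{n-1}}
   = \sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2} .
\end{aligned}
```

The second form of the numerator is convenient for hand calculation because it needs only the sums of x, of y and of their products.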

This coefficient is also known as Pearson's correlation coefficient, in recognition of the contribution of the eminent statistician Karl Pearson, who developed the theory of correlation around the turn of the 20th century.
If r is positive, x and y vary in the same direction; if r is negative, x and y are inversely related: when x increases y decreases, and vice versa. If r = 1 or r = -1 (Figures 1a and 1b), the relationship between y and x is completely determined: for any value of x we can determine the value of y. If r = 0 (Figure 1c), there is no linear relationship between x and y.

[Figure 1: Relationship between x and y (scatter plots of y against x): (a) r = 1, (b) r = -1, (c) r = 0 (independent), (d) r = 0.80, (e) r = -0.80, and (f) r = 0.001.]

Of course, in experimental science we rarely encounter deterministic relationships like these. Because of measurement error and biological variability, the correlation between x and y usually lies strictly between -1 and 1, as in Figures 1d, 1e and 1f.
How, then, should a coefficient of correlation be interpreted? It can be regarded as an effect size: the larger its absolute value, the more likely the relationship is to be of practical clinical importance. What counts as clinically meaningful, however, depends on the discipline. In fields that demand high precision, a coefficient of correlation must exceed 0.8 to be considered meaningful, whereas in clinical science and public health a coefficient of 0.6 may already be meaningful. Table 2 gives the general conventions for interpreting the coefficient of correlation in clinical medicine and public health.

Table 2. Interpretation of the coefficient of correlation

Coefficient of correlation    Interpretation
0.01 to 0.1                   Very low, negligible correlation
0.2 to 0.3                    Low correlation
0.4 to 0.5                    Moderate correlation
0.6 to 0.7                    High correlation
0.8 and above                 Very high correlation

It should be emphasized once more that these are only reference values, not "gold standards" to be applied mechanically.

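2. A worked example
Example 1: Weight and waist circumference. The following data are taken from a large study (more than 3000 people) in Vietnam on the relationship between anthropometric indices and diabetes. The weight and waist circumference of 15 subjects were measured, with the following results: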
Weight (kg)    Waist (cm)
51.0           71.0
66.0           89.0
47.0           64.0
54.0           74.0
64.0           87.0
75.0           93.0
54.0           66.0
52.0           74.0
53.0           75.0
52.0           72.0
48.0           70.0
46.0           66.0
63.0           81.0
40.0           57.0
90.0           94.0

Note that weight is measured in kg and waist circumference in cm. Figure 2 shows the relationship between the two variables:

[Figure 2. Correlation between waist circumference (waist, horizontal axis) and weight (weight, vertical axis) in 15 randomly selected Vietnamese subjects.]

Applying the formulas above, we can describe the two variables with the following summary statistics:

Mean weight:
$$\bar{x} = \frac{51 + 66 + 47 + \dots + 90}{15} = 57.0 \text{ kg}$$

Variance of weight:
$$s_x^2 = \frac{(51 - 57)^2 + (66 - 57)^2 + (47 - 57)^2 + \dots + (90 - 57)^2}{15 - 1} = 163.6$$

Mean waist circumference:
$$\bar{y} = \frac{71 + 89 + 64 + \dots + 94}{15} = 75.5 \text{ cm}$$

Variance of waist circumference:
$$s_y^2 = \frac{(71 - 75.5)^2 + (89 - 75.5)^2 + (64 - 75.5)^2 + \dots + (94 - 75.5)^2}{15 - 1} = 122.6$$

And the covariance:
$$\mathrm{Cov}(x, y) = \frac{(51 - 57)(71 - 75.5) + (66 - 57)(89 - 75.5) + \dots + (90 - 57)(94 - 75.5)}{14} = 130.8$$

In summary:
n = 15
Mean weight: $\bar{x}$ = 57.0 kg, standard deviation: $s_x$ = 12.8 kg
Mean waist circumference: $\bar{y}$ = 75.5 cm, standard deviation: $s_y$ = 11.1 cm
Covariance of weight and waist circumference: Cov(x, y) = 130.8

Therefore, by formula [3], the coefficient of correlation between weight and waist circumference is:
$$r = \frac{\mathrm{Cov}(x, y)}{s_x s_y} = \frac{130.8}{12.8 \times 11.1} = 0.92$$

By the conventions given above, we can say that in this group of subjects the correlation between weight and waist circumference is very high. If this correlation is replicated in one or more other groups of subjects, waist circumference could be used to predict weight.
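The hand calculation in Example 1 can be cross-checked step by step. The sketch below applies the definitions (deviations from the mean, divisor n - 1, then formula [3]) to the 15 pairs of observations listed above; the variable names are illustrative choices.

```r
weight <- c(51, 66, 47, 54, 64, 75, 54, 52, 53, 52, 48, 46, 63, 40, 90)  # kg
waist  <- c(71, 89, 64, 74, 87, 93, 66, 74, 75, 72, 70, 66, 81, 57, 94)  # cm
n <- length(weight)

dx <- weight - mean(weight)            # deviations of weight from its mean
dy <- waist  - mean(waist)             # deviations of waist from its mean

var_x  <- sum(dx^2)    / (n - 1)       # variance of weight
var_y  <- sum(dy^2)    / (n - 1)       # variance of waist circumference
cov_xy <- sum(dx * dy) / (n - 1)       # covariance, formula [2]
r      <- cov_xy / sqrt(var_x * var_y) # coefficient of correlation, formula [3]

print(round(c(mean_x = mean(weight), mean_y = mean(waist),
              var_x = var_x, var_y = var_y, cov_xy = cov_xy, r = r), 2))
```

The value of r obtained this way is the same as the one returned by cor(weight, waist).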

3. 95% confidence interval of the coefficient of correlation

Like other statistics such as the mean and the standard deviation, the coefficient of correlation is subject to sampling variation. We therefore need to estimate its 95% confidence interval. Recall that we do not know the true coefficient of correlation (that is, the coefficient in the population, denoted $\rho$), so we use r to estimate $\rho$.
To estimate the 95% confidence interval of $\rho$ we need the standard error of r (the standard deviation of its sampling distribution). Statistical theory gives it as $s_r = \sqrt{\frac{1 - r^2}{n - 2}}$. The difficulty, as this formula shows, is that the standard error of r depends on r itself, so the two are not independent. Another, more objective, method is needed. The statistician Ronald A. Fisher (regarded as the "father" of modern statistics and of modern genetics) showed that, instead of working with the standard error of r, we can work with the standard error of a function of r, which achieves this goal.
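The dependence of the standard error of r on r itself is easy to see numerically. The sketch below evaluates the approximation $s_r = \sqrt{(1 - r^2)/(n - 2)}$ for a few values of r at a fixed sample size; the helper name se_r and the chosen values of r are assumptions made for illustration.

```r
# Approximate standard error of r quoted in the text: sqrt((1 - r^2) / (n - 2))
se_r <- function(r, n) sqrt((1 - r^2) / (n - 2))

n  <- 15                       # same sample size as in Example 1
rs <- c(0, 0.3, 0.6, 0.92)     # hypothetical values of r
print(data.frame(r = rs, se = round(se_r(rs, n), 3)))
```

The standard error shrinks as r moves away from 0, which is why a symmetric interval of the form r ± 1.96 × se is unsatisfactory for correlations close to -1 or 1.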

In Fisher's method, r is first transformed into a new index z:
$$z = \frac{1}{2}\log\left(\frac{1 + r}{1 - r}\right) \qquad [5]$$
where log is the natural logarithm. It can be shown that the standard deviation of z is:
$$s_z = \frac{1}{\sqrt{n - 3}} \qquad [6]$$
The 95% confidence interval of z is therefore $z \pm 1.96\, s_z$. Once the 95% confidence interval of z has been obtained, it can be transformed back into the 95% confidence interval of $\rho$.
Example 1 (continued): Weight and waist circumference. Above we estimated r = 0.92; by [5], the index z is:
$$z = \frac{1}{2}\log\left(\frac{1 + 0.92}{1 - 0.92}\right) = 1.59$$
and with sample size n = 15, the standard deviation of z from [6] is:
$$s_z = \frac{1}{\sqrt{15 - 3}} = 0.289$$
The 95% confidence interval of z runs from 1.024 to 2.156 (1.59 - 1.96 × 0.289 = 1.024 and 1.59 + 1.96 × 0.289 = 2.156). This interval can be transformed back to the scale of r with the formula (derived from [5]):
$$r = \frac{e^{2z} - 1}{e^{2z} + 1} \qquad [7]$$
When z = 1.024, r = 0.77. When z = 2.156, r = 0.97. In other words, the coefficient of correlation between weight and waist circumference is 0.92, with a 95% confidence interval from 0.77 to 0.97.
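The three steps of Fisher's method (transform r to z, build the interval on the z scale, transform back) can be wrapped in a small function. The sketch below is a hypothetical helper, not code from the original article; the name fisher_ci and its arguments are assumptions made for illustration.

```r
# Hypothetical helper: confidence interval for a correlation via
# Fisher's z-transformation, following formulas [5] to [7].
fisher_ci <- function(r, n, level = 0.95) {
  z     <- 0.5 * log((1 + r) / (1 - r))      # [5]; identical to atanh(r)
  se_z  <- 1 / sqrt(n - 3)                   # [6]
  crit  <- qnorm(1 - (1 - level) / 2)        # about 1.96 for a 95% interval
  z_lim <- z + c(-1, 1) * crit * se_z        # interval on the z scale
  r_lim <- (exp(2 * z_lim) - 1) / (exp(2 * z_lim) + 1)  # [7]; identical to tanh(z)
  c(lower = r_lim[1], upper = r_lim[2])
}

# Example 1: r = 0.92 estimated from n = 15 subjects
print(round(fisher_ci(0.92, 15), 2))
```

R's built-in cor.test relies on the same transformation for its confidence interval, but starts from the unrounded r, so its limits can differ slightly from a calculation based on r = 0.92.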

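4. Comparing two coefficients of correlation

Suppose we have two coefficients of correlation $r_1$ and $r_2$, which are estimates of the population coefficients $\rho_1$ and $\rho_2$. They are estimated from two independent samples of $n_1$ and $n_2$ subjects. To test the hypothesis $\rho_1 = \rho_2$ against the alternative $\rho_1 \neq \rho_2$, we first transform each r into z:
$$z_1 = \frac{1}{2}\log\left(\frac{1 + r_1}{1 - r_1}\right), \qquad z_2 = \frac{1}{2}\log\left(\frac{1 + r_2}{1 - r_2}\right)$$
Let $d = z_1 - z_2$. It can be shown that the variance of d is:
$$s_d^2 = \frac{1}{n_1 - 3} + \frac{1}{n_2 - 3} \qquad [8]$$
or, equivalently, that the standard deviation of d is:
$$s_d = \sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}} \qquad [9]$$
The hypothesis $\rho_1 = \rho_2$ can then be tested with the statistic t:
$$t = \frac{d}{s_d} = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}} \qquad [10]$$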

It can be shown that, if the hypothesis $\rho_1 = \rho_2$ is true, t follows a normal distribution with mean 0 and variance 1. This means that if t is below -2 or above +2 (more precisely, beyond ±1.96 at the 5% level), we can say that the two coefficients of correlation differ significantly.
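Example 2: Comparing two coefficients of correlation. Suppose we find in the American medical literature a coefficient of correlation between weight and waist circumference of 0.7, estimated in 1000 subjects, whereas in our 15 subjects the coefficient is 0.92. The question is whether the difference between the two coefficients is statistically significant or merely due to chance.

For $r_1 = 0.7$ we have
$$z_1 = \frac{1}{2}\log\left(\frac{1.7}{0.3}\right) = 0.867$$
with variance
$$s_1^2 = \frac{1}{n_1 - 3} = \frac{1}{1000 - 3} \approx 0.001$$
For $r_2 = 0.92$ we have $z_2 = 1.59$, with variance
$$s_2^2 = \frac{1}{n_2 - 3} = \frac{1}{15 - 3} = 0.0833$$
The difference is therefore d = 0.867 - 1.59 = -0.723, and the variance of d is $s_d^2$ = 0.001 + 0.0833 = 0.0843. The statistic t is:
$$t = \frac{-0.723}{\sqrt{0.0843}} = -2.49$$
Because t is below -2.0, we have evidence that the coefficient of correlation is higher in the Vietnamese sample than in the American one, and that the difference is statistically significant.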
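The comparison in Example 2 follows a fixed recipe (formulas [5] and [8] to [10]), so it can be written as a short function. The sketch below is a hypothetical helper, not code from the original article; the name compare_cor and the two-sided p-value it reports are additions made for illustration.

```r
# Hypothetical helper: compare two independent correlations via Fisher's z.
compare_cor <- function(r1, n1, r2, n2) {
  z1   <- atanh(r1)                          # same as 0.5 * log((1 + r1) / (1 - r1))
  z2   <- atanh(r2)
  d    <- z1 - z2                            # difference on the z scale
  se_d <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # [9]
  t    <- d / se_d                           # [10]
  p    <- 2 * pnorm(-abs(t))                 # two-sided p-value under N(0, 1)
  c(d = d, se_d = se_d, t = t, p = p)
}

# Example 2: r = 0.7 in 1000 subjects versus r = 0.92 in 15 subjects
res <- compare_cor(r1 = 0.7, n1 = 1000, r2 = 0.92, n2 = 15)
print(round(res, 3))
```

Because the function works with unrounded intermediate values, its t can differ in the last digit from a hand calculation that rounds z and the variances along the way.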

5. Summary
The sections above have covered the theory and the methods for computing and testing a coefficient of correlation (and for comparing two coefficients). It should be added that the estimates presented are strictly valid only if x and y follow a normal distribution. If the normality assumption is not met, the results of such comparisons may (with emphasis on "may") no longer be valid. In practice, however, with a large sample size even departures from normality do not greatly affect the results.

Note that "correlation" here means linear correlation, that is, correlation along a straight line. The results of the analysis are therefore not meaningful when the relationship is not linear. This matters: before computing a coefficient of correlation we should plot the data to see whether the relationship between the two variables is linear. If it is not, the data may need to be transformed before analysis. No analysis is more effective than a plot; relying on numbers alone can easily mislead.
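As a small illustration of transforming data before analysis, the sketch below uses hypothetical values in which y grows exponentially with x. The linear correlation on the original scale understates the strength of the relationship, whereas after a log transformation of y the points fall on a straight line. The data and the choice of a log transformation are assumptions made for this example only.

```r
# Hypothetical data: y grows exponentially with x, so the relation is curved
x <- 1:10
y <- exp(0.5 * x)

r_raw <- cor(x, y)        # linear correlation on the original scale
r_log <- cor(x, log(y))   # after a log transformation the relation is linear

print(c(raw = r_raw, log = r_log))

# plot(y ~ x, pch = 16)        # curved pattern on the original scale
# plot(log(y) ~ x, pch = 16)   # straight line after transformation
```

Which transformation is appropriate depends on the shape seen in the scatter plot; the logarithm is only one common choice.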
The following figure shows three relationships that all have the same coefficient of correlation, r = 0.62. Only panel (a) is meaningful, because there the relationship between x and y is linear; the relationships in (b) and (c) are very different.

[Figure: scatter plots of y against (a) x1, (b) x2 and (c) x3.]

One extremely important point to bear in mind when interpreting a correlation analysis is that a coefficient of correlation does not necessarily reflect a cause-and-effect relationship. Without a biological rationale, a coefficient of correlation cannot and should not be interpreted in terms of cause and effect. For example, the correlation between weight and waist circumference analysed above is very high (0.92), but this does not mean that weight causes a large waist, or that a large waist causes a person to be heavy. The relationship we observe is simply a correlation. How a correlation is used and interpreted in a clinical context depends on the experience and knowledge of each discipline.

Terms used in this article

Vietnamese             English
Hệ số tương quan       Coefficient of correlation
Hiệp biến              Covariance
Khoảng tin cậy 95%     95% confidence interval
Phương sai             Variance
Độ lệch chuẩn          Standard deviation
Biến phụ thuộc         Dependent variable
Biến độc lập           Independent variable

Technical note: The following R code was used for the analyses presented above.


# Simulation for Figure 1d
# first generate two series of 100 observations
# from a standard normal distribution, using rnorm
zn1 <- rnorm(100)
zn2 <- rnorm(100)
# target coefficient of correlation = 0.8
r <- 0.80
# simulate x with mean 5 and standard deviation 1.5
x <- 5 + 1.5*zn1
# simulate y with mean 10, standard deviation 2 and correlation r with x
y <- 10 + r*2.0*zn1 + 2.0*sqrt(1-r^2)*zn2
# draw the scatter plot
plot(y ~ x, pch=16)
# Simulation for Figure 1e, this time with coefficient of correlation = -0.8
zn1 <- rnorm(100)
zn2 <- rnorm(100)
r <- -0.80
x <- 5 + 1.5*zn1
y <- 10 + r*2.0*zn1 + 2.0*sqrt(1-r^2)*zn2
plot(y ~ x, pch=16)
# Simulation for Figure 1f, with coefficient of correlation = 0.001
zn1 <- rnorm(100)
zn2 <- rnorm(100)
r <- 0.001
x <- 5 + 1.5*zn1
y <- 10 + r*2.0*zn1 + 2.0*sqrt(1-r^2)*zn2
plot(y ~ x, pch=16)
# enter the weight and waist circumference data
weight <- c(51,66,47,54,64,75,54,52,53,52,48,46,63,40,90)
waist <- c(71,89,64,74,87,93,66,74,75,72,70,66,81,57,94)
# draw Figure 2
plot(weight ~ waist, pch=16)
# compute the means, using mean
mean(weight)
mean(waist)
# compute the standard deviations, using sd
sd(weight)
sd(waist)
# compute the covariance and r
cov(weight, waist)
cov(weight, waist) / (sd(weight)*sd(waist))
# or use R's cor function directly
cor(weight, waist)
# test r and obtain its 95% confidence interval
cor.test(weight, waist)
# draw the figures in the summary section
y <- c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5)
x1 <- c(21.26, 20.57, 20.15, 22.28, 21.36, 22.30, 21.35, 18.81, 23.63,
        18.73, 20.01)
x2 <- c(22.40, 21.84, 21.35, 22.15, 22.27, 20.43, 20.25, 17.66, 21.90,
        21.15, 19.05)
x3 <- c(20.70, 20.43, 25.31, 20.56, 20.80, 21.19, 20.21, 19.90, 20.95,
        20.31, 20.09)

plot(y ~ x1, pch=16)
plot(y ~ x2, pch=16)
plot(y ~ x3, pch=16)
