lstk12 Phantichtuongquan PDF
1. Summary of theory
To describe the correlation between two variables, we need to estimate the
coefficient of correlation. To understand how the correlation coefficient works,
we first need the concept of covariance. For a single variable X or Y there are
three descriptive statistics: the sample size, the mean, and the variance, which
I discussed in the first article of the Lâm sàng thống kê series. To describe the
relationship between two variables X and Y, however, we also need the covariance.
Covariance can be understood through trigonometry as follows. For a right
triangle with hypotenuse c and other two sides a and b, the Pythagorean theorem
states that the square of the hypotenuse equals the sum of the squares of the
other two sides:
$$c^2 = a^2 + b^2$$
For a general triangle, the relationship between c and the two sides a and b is
more complicated, and is quantified by the cosine of the angle C:
$$c^2 = a^2 + b^2 - 2ab\cos C$$
(similar to the binomial expansion $(x + y)^2 = x^2 + y^2 + 2xy$).
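That is the concept. To go into the mathematical details, we need some notation
for the quantities above. Let $x_i$ and $y_i$ be the observed values of X and Y
for individual i. Suppose we have n subjects, so that i = 1, 2, 3, ..., n. Let
$\bar{x}$ and $\bar{y}$ be the means of the observed x and y, and let $s_x^2$
and $s_y^2$ be the variances of the two variables, defined as follows:
$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$$
When x and y are unrelated, the variance of their sum is the sum of the two
variances:
$$s_{x+y}^2 = s_x^2 + s_y^2 \qquad [1]$$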
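Formula [1] holds exactly only when the covariance of x and y is zero; in general the variance of a sum carries an extra term, $2\,\mathrm{cov}(x, y)$, which is the analogue of the cross term in the binomial expansion. The following R sketch checks this numerically. It assumes the `y` and `x1` vectors from the R listing at the end of this document; any two numeric vectors of equal length would do.

```r
# Data taken from the R listing at the end of this document
y  <- c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5)
x1 <- c(21.26, 20.57, 20.15, 22.28, 21.36, 22.30,
        21.35, 18.81, 23.63, 18.73, 20.01)

var_sum   <- var(x1 + y)                        # variance of the sum
var_parts <- var(x1) + var(y)                   # formula [1], no covariance term
var_full  <- var(x1) + var(y) + 2 * cov(x1, y)  # with the covariance term

print(c(var_sum = var_sum, var_parts = var_parts, var_full = var_full))
print(isTRUE(all.equal(var_sum, var_full)))     # the full identity always holds
```

The gap between `var_parts` and `var_sum` is exactly twice the covariance, which is why covariance is needed to describe two related variables.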
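For each individual i, the product of the two deviations is:
$$\mathrm{cov}(x_i, y_i) = (x_i - \bar{x})(y_i - \bar{y})$$
But here we have n subjects, so we need to add all of these products together
and divide by n - 1 (the number of subjects less one):
$$\mathrm{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) \qquad [2]$$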
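As a minimal sketch of formula [2], the covariance can be computed step by step in R and compared with the built-in `cov`, which also divides by n - 1. The data are the `y` and `x1` vectors from the R listing at the end of this document; the names `cross_products` and `cov_manual` are illustrative only.

```r
# Data taken from the R listing at the end of this document
y  <- c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5)
x1 <- c(21.26, 20.57, 20.15, 22.28, 21.36, 22.30,
        21.35, 18.81, 23.63, 18.73, 20.01)

n <- length(y)

# One cross-product of deviations per individual
cross_products <- (x1 - mean(x1)) * (y - mean(y))

# Formula [2]: sum the cross-products and divide by n - 1
cov_manual <- sum(cross_products) / (n - 1)

print(cov_manual)
print(cov(x1, y))   # built-in function, same divisor n - 1
```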
Formula [2] is the definition of covariance. From the two formulas above, we can
draw a few preliminary observations:
$$r = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}} = \frac{\mathrm{Cov}(x, y)}{s_x s_y} \qquad [3]$$
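$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\,s_x s_y} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}} \qquad [4]$$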
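Formulas [3] and [4] are two routes to the same number. The R sketch below evaluates both and compares them with the built-in `cor`. It again uses the `y` and `x1` vectors from the R listing at the end of this document; the variable names are illustrative.

```r
# Data taken from the R listing at the end of this document
y  <- c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5)
x1 <- c(21.26, 20.57, 20.15, 22.28, 21.36, 22.30,
        21.35, 18.81, 23.63, 18.73, 20.01)
n <- length(y)

# Formula [3]: covariance divided by the product of standard deviations
r_formula3 <- cov(x1, y) / (sd(x1) * sd(y))

# Formula [4]: computational form based on raw sums
numerator   <- sum(x1 * y) - sum(x1) * sum(y) / n
denominator <- sqrt(sum((x1 - mean(x1))^2) * sum((y - mean(y))^2))
r_formula4  <- numerator / denominator

print(c(formula3 = r_formula3, formula4 = r_formula4, builtin = cor(x1, y)))
```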
[Figure: six scatter plots of y against x, panels (a) to (f).]
Interpretation
Correlation too low, negligible
Low correlation
Moderate correlation
High correlation
Very high correlation
2. A concrete example
Example 1: Weight and waist circumference. The following data are taken from a
large study (more than 3000 people) in Vietnam on the relationship between
anthropometric indices and diabetes. The weight and waist circumference of 15
subjects were measured, with the following results:
Weight (kg)
51.0
66.0
47.0
54.0
64.0
75.0
54.0
52.0
53.0
52.0
48.0
46.0
63.0
40.0
90.0
Note that weight is measured in kg and waist circumference in cm. Figure 2 below
shows the relationship between the two variables:
[Figure 2: scatter plot of weight (kg) and waist (cm).]
Applying formula [1] above, we can describe these two variables with the
following statistics:
Mean weight:
$$\bar{x} = \frac{51 + 66 + 47 + \dots + 90}{15} = 57.0 \text{ kg}$$
Variance of weight:
$$s_x^2 = \frac{(51 - 57)^2 + (66 - 57)^2 + (47 - 57)^2 + \dots + (90 - 57)^2}{15 - 1} = 163.6$$
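The weight calculations can be reproduced from the 15 values in the table. This is a small R sketch that follows the definitions literally and then compares with the built-in `var`; the names `mean_weight`, `var_weight` and `sd_weight` are illustrative.

```r
# Weight (kg) of the 15 subjects, as listed in the table above
weight <- c(51, 66, 47, 54, 64, 75, 54, 52, 53, 52, 48, 46, 63, 40, 90)

n           <- length(weight)
mean_weight <- sum(weight) / n                          # sample mean
var_weight  <- sum((weight - mean_weight)^2) / (n - 1)  # sample variance
sd_weight   <- sqrt(var_weight)                         # standard deviation

print(round(c(mean = mean_weight, variance = var_weight, sd = sd_weight), 1))
print(isTRUE(all.equal(var_weight, var(weight))))       # agrees with var()
```

Rounded to one decimal place, these agree with the mean of 57.0 kg, the variance of 163.6 and the standard deviation of 12.8 kg quoted in the text.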
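Mean waist circumference:
$$\bar{y} = \frac{71 + 89 + 64 + \dots + 94}{15} = 75.5 \text{ cm}$$
Variance of waist circumference:
$$s_y^2 = \frac{(71 - 75.5)^2 + (89 - 75.5)^2 + (64 - 75.5)^2 + \dots + (94 - 75.5)^2}{15 - 1} = 122.6$$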
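And the covariance:
$$\mathrm{Cov}(x, y) = 71.2$$
In summary:
n = 15
Mean weight: $\bar{x}$ = 57.0 kg, standard deviation: $s_x$ = 12.8 kg
Therefore, the correlation coefficient between weight and waist circumference
(by formula [3]) is:
$$r = \frac{\mathrm{Cov}(x, y)}{s_x s_y} = \frac{71.2}{12.8 \times 11.1} = 0.92$$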
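Following Fisher's method, we first need to transform r into a new index z,
using the formula below (log denotes the natural logarithm):
$$z = \frac{1}{2}\log\frac{1 + r}{1 - r} \qquad [5]$$
The standard deviation of z is:
$$s_z = \frac{1}{\sqrt{n - 3}} \qquad [6]$$
For the example above, with r = 0.92:
$$z = \frac{1}{2}\log\frac{1 + 0.92}{1 - 0.92} = 1.59$$
A value of z is converted back to r by:
$$r = \frac{e^{2z} - 1}{e^{2z} + 1} \qquad [7]$$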
When z = 1.067, r = 0.79. When z = 2.113, r = 0.97. In other words, the
correlation coefficient between weight and waist circumference is 0.92, with a
95% confidence interval ranging from 0.79 to 0.97.
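The transformation [5] and its inverse [7] are easy to check numerically. In this R sketch, `fisher_z` and `z_to_r` are hypothetical helper names introduced for illustration; they coincide with R's built-in `atanh` and `tanh`.

```r
# Formula [5]: Fisher's z transformation (log is the natural logarithm in R)
# fisher_z is an illustrative helper name, not part of base R
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))

# Formula [7]: back-transformation from z to r (illustrative helper name)
z_to_r <- function(z) (exp(2 * z) - 1) / (exp(2 * z) + 1)

z <- fisher_z(0.92)
print(round(z, 2))                          # z for r = 0.92

limits_r <- z_to_r(c(1.067, 2.113))         # interval limits quoted in the text
print(round(limits_r, 2))

# The same transformations exist in base R as atanh() and tanh()
print(isTRUE(all.equal(fisher_z(0.92), atanh(0.92))))
print(isTRUE(all.equal(z_to_r(1.067), tanh(1.067))))
```

Because the interval is built on the z scale and then mapped back through [7], it is not symmetric around r = 0.92.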
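$$z_1 = \frac{1}{2}\log\frac{1 + r_1}{1 - r_1}, \qquad z_2 = \frac{1}{2}\log\frac{1 + r_2}{1 - r_2}$$
The difference $d = z_1 - z_2$ has variance:
$$s_d^2 = \frac{1}{n_1 - 3} + \frac{1}{n_2 - 3} \qquad [8]$$
and standard deviation:
$$s_d = \sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}} \qquad [9]$$
The test statistic is:
$$t = \frac{d}{s_d} = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}} \qquad [10]$$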
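With $r_1 = 0.7$, we have $z_1 = \frac{1}{2}\log\frac{1.7}{0.3} = 0.867$, with
variance
$$s_1^2 = \frac{1}{n_1 - 3} = \frac{1}{1000 - 3} = 0.001.$$
$$s_2^2 = \frac{1}{n_2 - 3} = \frac{1}{15 - 3} = 0.0833.$$
$$t = \frac{-0.723}{\sqrt{0.0843}} = -2.49$$
Because the t statistic is below -2.0, we have evidence to state that the
correlation coefficient in Vietnamese people is higher than in Americans, and
that the difference is statistically significant.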
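The comparison can be retraced in a few lines of R. This sketch assumes, as in the text, r1 = 0.7 with n1 = 1000 for the first sample and r2 = 0.92 with n2 = 15 for the second; `fisher_z` is a hypothetical helper name implementing formula [5].

```r
# Formula [5]; fisher_z is an illustrative helper name, not part of base R
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))

r1 <- 0.70; n1 <- 1000   # first sample, values quoted in the text
r2 <- 0.92; n2 <- 15     # second sample, the weight and waist example

d      <- fisher_z(r1) - fisher_z(r2)             # difference on the z scale
s_d    <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))       # formula [9]
t_stat <- d / s_d                                 # formula [10]

print(round(c(d = d, s_d = s_d, t = t_stat), 3))
```

Working with unrounded z values changes d slightly in the third decimal place, but t still rounds to -2.49.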
5. Summary
The sections above covered the theory and the methods for computing and testing
a correlation coefficient (and for comparing two correlation coefficients). It
should be added that the estimates presented are meaningful only if the two
variables x and y follow the normal distribution. If the normality assumption is
not met, the results of the comparison may (emphasis: may) no longer be valid.
In practice, however, with a large sample size, even departures from the normal
distribution do not greatly affect the results.
[Figure: scatter plots of y against x1, x2 and x3, panels (a) to (c).]
Vietnamese              English
Hệ số tương quan        Coefficient of correlation
Hiệp biến               Covariance
Khoảng tin cậy 95%      95% confidence interval
Phương sai              Variance
Độ lệch chuẩn           Standard deviation
Biến phụ thuộc          Dependent variable
Biến độc lập            Independent variable
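# weight (kg) of the 15 subjects, as tabulated in section 2
weight <- c(51, 66, 47, 54, 64, 75, 54, 52, 53, 52, 48, 46, 63, 40, 90)
# waist (cm) must be entered the same way: waist <- c(71, 89, 64, ..., 94)
# compute the means, using the function mean
mean(weight)
mean(waist)
# compute the standard deviations, using the function sd
sd(weight)
sd(waist)
# compute the covariance and r
cov(weight, waist)
cov(weight, waist) / (sd(weight)*sd(waist))
# or use the function cor in R
cor(weight, waist)
# test r and obtain the 95% confidence interval
cor.test(weight, waist)
# draw the plots in the summary section
y <- c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5)
x1 <- c(21.26, 20.57, 20.15, 22.28, 21.36, 22.30, 21.35, 18.81, 23.63,
        18.73, 20.01)
x2 <- c(22.40, 21.84, 21.35, 22.15, 22.27, 20.43, 20.25, 17.66, 21.90,
        21.15, 19.05)
x3 <- c(20.70, 20.43, 25.31, 20.56, 20.80, 21.19, 20.21, 19.90, 20.95,
        20.31, 20.09)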