Age     BMI     Cholesterol
(age)   (bmi)   (chol)
46      25.4    3.5
20      20.6    1.9
52      26.2    4.0
30      22.6    2.6
57      25.4    4.5
25      23.1    3.0
28      22.7    2.9
36      24.9    3.8
22      19.8    2.1
43      25.3    3.8
57      23.2    4.1
33      21.8    3.0
22      20.9    2.5
63      26.7    4.6
40      26.4    3.2
48      21.2    4.2
28      21.2    2.3
49      22.8    4.0
A quick look at the data suggests that the older a person is, the higher the cholesterol level. Let us enter these data into R and draw a scatter plot as follows:
> age <- c(46,20,52,30,57,25,28,36,22,43,57,33,22,63,40,48,28,49)
[Figure: scatter plot of chol (vertical axis) against age (horizontal axis).]
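Only the command for `age` survives above. A minimal sketch of the full data entry and the scatter plot follows, using the values from the table; the object name `dat` and the plotting options are assumptions, not taken from the original text.

```r
# Data for the 18 individuals, copied from the table above
age  <- c(46,20,52,30,57,25,28,36,22,43,57,33,22,63,40,48,28,49)
bmi  <- c(25.4,20.6,26.2,22.6,25.4,23.1,22.7,24.9,19.8,25.3,
          23.2,21.8,20.9,26.7,26.4,21.2,21.2,22.8)
chol <- c(3.5,1.9,4.0,2.6,4.5,3.0,2.9,3.8,2.1,3.8,
          4.1,3.0,2.5,4.6,3.2,4.2,2.3,4.0)

# 'dat' is a hypothetical name for the combined data frame
dat <- data.frame(age, bmi, chol)

# Scatter plot of cholesterol against age (solid points)
plot(chol ~ age, data = dat, pch = 16)
```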
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
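As an illustration, the correlation coefficient can be computed term by term from the formula and compared with R's built-in `cor()`. This is a sketch only; the names `dx`, `dy`, `r_manual` and `r_builtin` are chosen here for illustration.

```r
age  <- c(46,20,52,30,57,25,28,36,22,43,57,33,22,63,40,48,28,49)
chol <- c(3.5,1.9,4.0,2.6,4.5,3.0,2.9,3.8,2.1,3.8,
          4.1,3.0,2.5,4.6,3.2,4.2,2.3,4.0)

# Deviations from the means
dx <- age  - mean(age)
dy <- chol - mean(chol)

# Numerator: sum of cross-products; denominator: square root of the
# product of the two sums of squares
r_manual  <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_builtin <- cor(age, chol)

print(c(manual = r_manual, builtin = r_builtin))
```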
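$$ y_i = \alpha + \beta x_i + \varepsilon_i \qquad [1] $$

The least squares estimates are the values for which

$$ \sum_{i=1}^{n} \left[ y_i - (\alpha + \beta x_i) \right]^2 $$

is smallest. They are:

$$ \hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad [2] $$

$$ \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} \qquad [3] $$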
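Here, $\bar{x}$ and $\bar{y}$ are the means of the variables x and y. Note that I write $\hat{\alpha}$ and $\hat{\beta}$ (with a hat on top) as a reminder that these are two estimates of $\alpha$ and $\beta$, not $\alpha$ and $\beta$ themselves (we do not know $\alpha$ and $\beta$ exactly; we can only estimate them).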
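The two closed-form estimates can be checked numerically against `lm()`. The sketch below assumes the age and cholesterol data from the table at the start of the chapter; `alpha_hat`, `beta_hat` and `fit` are illustrative names.

```r
age  <- c(46,20,52,30,57,25,28,36,22,43,57,33,22,63,40,48,28,49)
chol <- c(3.5,1.9,4.0,2.6,4.5,3.0,2.9,3.8,2.1,3.8,
          4.1,3.0,2.5,4.6,3.2,4.2,2.3,4.0)

# Slope: cross-products divided by the sum of squares of x (equation [2])
beta_hat  <- sum((age - mean(age)) * (chol - mean(chol))) /
             sum((age - mean(age))^2)

# Intercept: mean of y minus slope times mean of x (equation [3])
alpha_hat <- mean(chol) - beta_hat * mean(age)

# The same estimates obtained from lm()
fit <- lm(chol ~ age)
print(rbind(by_hand = c(alpha_hat, beta_hat), lm = coef(fit)))
```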
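Once we have the estimates $\hat{\alpha}$ and $\hat{\beta}$, we can estimate the mean cholesterol for each age as follows:

$$ \hat{y}_i = \hat{\alpha} + \hat{\beta} x_i $$

$$ s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2} \qquad [4] $$

$s^2$ is the estimate of $\sigma^2$.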
In linear regression analysis we usually want to know whether the coefficient $\beta$ is 0 or not. If $\beta$ is 0, there is no relationship between x and y; if $\beta$ differs from 0, we have evidence that x and y are related. To test the hypothesis $\beta = 0$ we use the following t test:

$$ t = \frac{\hat{\beta}}{SE(\hat{\beta})} \qquad [5] $$

$SE(\hat{\beta})$ is the standard error of the estimate $\hat{\beta}$. In the equation above, t follows a t distribution with n-2 degrees of freedom (if $\beta$ is truly 0).
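Equations [4] and [5] can be traced step by step in R. The sketch below makes one assumption that the text does not spell out: for simple regression the standard error of the slope is taken as $s / \sqrt{\sum (x_i - \bar{x})^2}$, the standard textbook expression. Variable names are illustrative.

```r
age  <- c(46,20,52,30,57,25,28,36,22,43,57,33,22,63,40,48,28,49)
chol <- c(3.5,1.9,4.0,2.6,4.5,3.0,2.9,3.8,2.1,3.8,
          4.1,3.0,2.5,4.6,3.2,4.2,2.3,4.0)
n   <- length(age)
fit <- lm(chol ~ age)

# Residual variance, equation [4]
y_hat <- fitted(fit)
s2    <- sum((chol - y_hat)^2) / (n - 2)

# Standard error of the slope (assumed formula, see lead-in)
se_beta <- sqrt(s2 / sum((age - mean(age))^2))

# t statistic, equation [5], and its two-sided p-value on n - 2 df
t_stat  <- coef(fit)[["age"]] / se_beta
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(c(s = sqrt(s2), se_beta = se_beta, t = t_stat, p = p_value))
```

These quantities should agree with the `Residual standard error`, `Std. Error`, `t value` and `Pr(>|t|)` entries that `summary()` prints for the same model.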
10.2.2 Simple linear regression analysis with R
The function lm (short for linear model) in R computes the values of $\hat{\alpha}$ and $\hat{\beta}$, as well as $s^2$, quickly and conveniently. We continue with the example in R as follows:
Call:
lm(formula = chol ~ age)

Coefficients:
(Intercept)          age
    1.08922      0.05779

Residuals:
     Min       1Q   Median       3Q      Max
-0.40729 -0.24133 -0.04522  0.17939  0.63040

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.089218   0.221466   4.918 0.000154 ***
age         0.057788   0.005399  10.704 1.06e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The second command, summary(reg), asks R to list the computed information stored in reg. The output has three parts:
(a) Part 1 describes the residuals of the regression model:
Residuals:
     Min       1Q   Median       3Q      Max
-0.40729 -0.24133 -0.04522  0.17939  0.63040
We know that the mean of the residuals must be 0; here the median is -0.045, which is not far from 0. The 25% (1Q) and 75% (3Q) quantiles are also fairly symmetric around the median, suggesting that the residuals of this equation are reasonably balanced.
(b) Part two presents the estimates $\hat{\alpha}$ and $\hat{\beta}$ together with their standard errors and the values of the t test. The t value for $\hat{\beta}$ is 10.704 with p = 1.06e-08, indicating that $\beta$ is not 0. In other words, we have evidence that there is a relationship between cholesterol and age, and that this relationship is statistically significant.
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.089218   0.221466   4.918 0.000154 ***
age         0.057788   0.005399  10.704 1.06e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(c) Part three gives information on the variance of the residuals (residual mean square). Here the residual standard error is s = 0.3027, so $s^2 = 0.3027^2 \approx 0.0916$. The output also contains an F test, which simply tests whether $\beta$ is really 0, i.e. it has the same meaning as the t test in the part above. In general, in simple linear regression (with one factor) we need not pay attention to the F test.
Residual standard error: 0.3027 on 16 degrees of freedom
Multiple R-Squared: 0.8775,     Adjusted R-squared: 0.8698
F-statistic: 114.6 on 1 and 16 DF,  p-value: 1.058e-08
In addition, part 3 gives an important piece of information: the value of $R^2$, the coefficient of determination. It is estimated by the formula:

$$ R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad [6] $$

That is, the sum of squares of the fitted values about the mean divided by the sum of squares of the observed values about the mean. $R^2$ in this example is 0.8775, meaning that the linear equation (with age as the single factor) explains about 88% of the differences in cholesterol between individuals. $R^2$ of course ranges from 0 to 100% (or 1). The higher $R^2$, the closer the relationship between the two variables age and cholesterol.
Another coefficient worth mentioning here is the adjusted coefficient of determination (called Adjusted R-squared in the R output above). It tells us the degree of improvement in the residual variance due to the presence of age in the linear model. In general, this coefficient differs little from the coefficient of determination, and we need not pay it too much attention.
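Equation [6] can be reproduced by hand and compared with what `summary()` reports. The adjusted value below uses the usual expression $1 - (1 - R^2)(n-1)/(n-k-1)$, which the text does not state and is included here as an assumption; `k` is the number of predictors.

```r
age  <- c(46,20,52,30,57,25,28,36,22,43,57,33,22,63,40,48,28,49)
chol <- c(3.5,1.9,4.0,2.6,4.5,3.0,2.9,3.8,2.1,3.8,
          4.1,3.0,2.5,4.6,3.2,4.2,2.3,4.0)
fit   <- lm(chol ~ age)
y_hat <- fitted(fit)

# Equation [6]: explained sum of squares over total sum of squares
r2 <- sum((y_hat - mean(chol))^2) / sum((chol - mean(chol))^2)

# Adjusted R-squared (assumed standard formula), one predictor
n <- length(chol)
k <- 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(c(r2 = r2, adj_r2 = adj_r2))
```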
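10.2.3 Assumptions of linear regression analysis
All of the analyses above rest on several important assumptions, as follows:
Residuals of reg (only the entries for observations 6, 12 and 18 survive):
 6   0.466072660
12   0.003765579
18   0.079151419
# ask R to set aside 4 plotting windows
# draw the plots stored in reg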
[Figure: four diagnostic panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage.]
Figure 10.2. Residual analysis to check the assumptions of linear regression analysis.
(a) The left panel in row 1 plots the residuals $e_i$ against the predicted cholesterol values $\hat{y}_i$. It shows that the residuals are scattered around the line y = 0, so assumption (c), that $\varepsilon_i$ has mean 0, is acceptable.
(b) The right panel in row 1 plots the residuals against their expected values under the normal distribution. The residuals lie very close to the values on the normal line, so assumption (b), that $\varepsilon_i$ follows a normal distribution, can also be met.
(c) The left panel in row 2 plots the square root of the standardized residuals against $\hat{y}_i$. It shows no difference in the standardized residuals across the values of $\hat{y}_i$, so assumption (d), that $\varepsilon_i$ has a constant variance $\sigma^2$ for all $x_i$, can also be met.
Overall, from the residual analysis we can conclude that the linear regression model describes the relationship between age and cholesterol fairly completely and reasonably.
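The residual checks described above can be sketched in R as follows. The exact commands behind the figure are not preserved in the text; `par(mfrow = c(2, 2))` and `plot(reg)` are the standard way to obtain these four panels and are used here as an assumption. The Shapiro-Wilk test is an extra check that the text does not use.

```r
age  <- c(46,20,52,30,57,25,28,36,22,43,57,33,22,63,40,48,28,49)
chol <- c(3.5,1.9,4.0,2.6,4.5,3.0,2.9,3.8,2.1,3.8,
          4.1,3.0,2.5,4.6,3.2,4.2,2.3,4.0)
reg <- lm(chol ~ age)

# Four diagnostic panels on one page
op <- par(mfrow = c(2, 2))
plot(reg)
par(op)                      # restore the previous layout

# Residuals and their mean (should be numerically zero)
res      <- resid(reg)
mean_res <- mean(res)

# Optional formal normality check (not part of the original text)
sw <- shapiro.test(res)
print(sw$p.value)
```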
10.2.4 Prediction model
Once the model for predicting cholesterol has been checked and its validity established, we can draw the line representing the relationship between age and cholesterol with the abline command as follows (recall that the object of the analysis is reg):
[Figure: chol (vertical axis) against age (horizontal axis) with the fitted regression line.]
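But each value $\hat{y}_i$ is computed from the estimates $\hat{\alpha}$ and $\hat{\beta}$, and these estimates all have standard errors, so the predicted value $\hat{y}_i$ also has error. In other words, $\hat{y}_i$ is only a mean; in practice the value may be higher or lower depending on the sample. Its 95% confidence interval can be estimated in R with the following commands: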
> reg <- lm(chol ~ age)
> new <- data.frame(age = seq(15, 70, 5))
[Figure: chol (vertical axis) against age (horizontal axis).]
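The remaining commands for this figure are not preserved. One way to obtain and draw 95% intervals with base R is sketched below, assuming `predict()` with its `interval` argument; the names `pred.ci` and `pred.pi`, the colours and the line types are illustrative choices, not the author's.

```r
age  <- c(46,20,52,30,57,25,28,36,22,43,57,33,22,63,40,48,28,49)
chol <- c(3.5,1.9,4.0,2.6,4.5,3.0,2.9,3.8,2.1,3.8,
          4.1,3.0,2.5,4.6,3.2,4.2,2.3,4.0)
reg <- lm(chol ~ age)
new <- data.frame(age = seq(15, 70, 5))

# 95% interval for the mean response, and for a new individual
pred.ci <- predict(reg, new, interval = "confidence", level = 0.95)
pred.pi <- predict(reg, new, interval = "prediction", level = 0.95)

# Data, fitted line, and the two bands
plot(chol ~ age, pch = 16, xlim = c(15, 70), ylim = range(pred.pi, chol))
abline(reg)
matlines(new$age, pred.ci[, c("lwr", "upr")], lty = 2, col = "blue")
matlines(new$age, pred.pi[, c("lwr", "upr")], lty = 3, col = "red")
```

The confidence band describes uncertainty in the mean cholesterol at a given age; the wider prediction band also includes the scatter of individuals around that mean.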
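linear regression model). In practice we can extend this model to several variables rather than being limited to one as above, for example:

$$ y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i \qquad [7] $$

Note that in the equation above we have several variables x ($x_1$, $x_2$, up to $x_k$), and each variable has a parameter $\beta_j$ (j = 1, 2, ..., k) that must be estimated. For this reason the model is also called the multiple linear regression model.
The method for estimating $\beta_j$ also relies mainly on least squares. Let $\hat{y}_i = \hat{\alpha} + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \dots + \hat{\beta}_k x_{ki}$ be the estimate of $y_i$; the least squares method finds the values $\hat{\alpha}, \hat{\beta}_1, \hat{\beta}_2, \dots, \hat{\beta}_k$ such that $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is smallest. For the multiple linear regression model, the most compact way to write and describe the model is matrix notation. Model [7] can be expressed in matrix notation as follows: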
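$$ Y = X\beta + \varepsilon $$

where

$$ X = \begin{pmatrix} 1 & x_{11} & x_{21} & \dots & x_{k1} \\ 1 & x_{12} & x_{22} & \dots & x_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1n} & x_{2n} & \dots & x_{kn} \end{pmatrix}, \quad \beta = \begin{pmatrix} \alpha \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix} $$

The least squares method solves for the vector $\beta$ with the following equation:

$$ \hat{\beta} = (X^T X)^{-1} X^T Y $$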
T = Y Y
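The matrix solution $\hat{\beta} = (X^T X)^{-1} X^T Y$ can be checked numerically. The sketch below builds the design matrix for the age and bmi data with a leading column of 1s for the intercept and compares the result with `lm()`; `X`, `Y` and `beta_hat` are illustrative names.

```r
age  <- c(46,20,52,30,57,25,28,36,22,43,57,33,22,63,40,48,28,49)
bmi  <- c(25.4,20.6,26.2,22.6,25.4,23.1,22.7,24.9,19.8,25.3,
          23.2,21.8,20.9,26.7,26.4,21.2,21.2,22.8)
chol <- c(3.5,1.9,4.0,2.6,4.5,3.0,2.9,3.8,2.1,3.8,
          4.1,3.0,2.5,4.6,3.2,4.2,2.3,4.0)

# Design matrix: intercept column, then the two predictors
X <- cbind(1, age, bmi)
Y <- chol

# Direct translation of the matrix formula
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y

# Same model fitted with lm() for comparison
fit <- lm(chol ~ age + bmi)
print(cbind(matrix_formula = as.vector(beta_hat), lm = coef(fit)))
```

In practice `lm()` uses a QR decomposition rather than an explicit inverse, which is numerically safer; the explicit formula is shown only because it mirrors the equation.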
[Figure: scatter-plot matrix of age, bmi and chol.]
As with age and cholesterol, the relationship between bmi and cholesterol also roughly follows a straight line. The figure above also shows that age and bmi are related. Indeed, a simple linear regression of cholesterol on bmi shows that this relationship is statistically significant:
> summary(lm(chol ~ bmi))
Call:
lm(formula = chol ~ bmi)

Residuals:
    Min      1Q  Median      3Q     Max
-0.9403 -0.3565 -0.1376  0.3040  1.4330

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.83187    1.60841  -1.761  0.09739 .
bmi          0.26410    0.06861   3.849  0.00142 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.623 on 16 degrees of freedom
Multiple R-Squared: 0.4808,     Adjusted R-squared: 0.4483
F-statistic: 14.82 on 1 and 16 DF,  p-value: 0.001418
BMI explains about 48% of the variation in cholesterol between individuals. But because BMI is also related to age, we want to know which factor matters more when the two are analysed together. To learn the effect of both age ($x_1$) and bmi (call it $x_2$) on cholesterol (y) we use a multiple linear regression model, namely:

$$ y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i $$

or, in the matrix notation presented above, $Y = X\beta + \varepsilon$. Here Y is an 18 x 1 vector, X is an 18 x 3 matrix (a column of 1s followed by the age and bmi columns), $\beta$ is a 3 x 1 vector ($\alpha$, $\beta_1$, $\beta_2$), and $\varepsilon$ is a vector of 18 x 1 elements. To estimate the two regression coefficients $\beta_1$ and $\beta_2$ we again use the lm() function in R as follows:
> mreg <- lm(chol ~ age + bmi)
> summary(mreg)
Call:
lm(formula = chol ~ age + bmi)

Residuals:
    Min      1Q  Median      3Q     Max
-0.3762 -0.2259 -0.0534  0.1698  0.5679

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.455458   0.918230   0.496    0.627
age         0.054052   0.007591   7.120 3.50e-06 ***
bmi         0.033364   0.046866   0.712    0.487
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3074 on 15 degrees of freedom
Multiple R-Squared: 0.8815,     Adjusted R-squared: 0.8657
F-statistic: 55.77 on 2 and 15 DF,  p-value: 1.132e-07
The equation says that when age increases by 1 year, cholesterol increases by 0.054 mg/L (an estimate not much different from the 0.0578 in the age-only equation), and for each 1 kg/m2 increase in BMI cholesterol increases by 0.0334 mg/L. Together the two factors explain about 88.2% (R2 = 0.8815) of the variation in cholesterol between individuals.
Note that the equation with age alone (in the previous analysis) explained about 87.7% of the variation in cholesterol between individuals. When we add BMI, this coefficient rises to 88.2%, i.e. by only 0.5%. The question is whether this 0.5% increase is statistically significant. The answer can be read from the test for bmi, with p = 0.487. Thus bmi gives us no additional information for predicting cholesterol beyond what we already have from age. In other words, once age is taken into account, the effect of bmi is no longer statistically significant. This is understandable, because Figure 10.5 shows that age and bmi are fairly strongly related. Since the two variables are correlated, we do not need both in the equation. (However, this example only serves to illustrate how to carry out a multiple linear regression analysis in R; it is not intended to model the data along biological lines.)
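The question of whether adding bmi improves on the age-only model can also be put as a comparison of two nested models. The sketch below uses `anova()` for this, which is an addition not used in the text; for a single added variable its F test gives the same p-value as the t test for that variable.

```r
age  <- c(46,20,52,30,57,25,28,36,22,43,57,33,22,63,40,48,28,49)
bmi  <- c(25.4,20.6,26.2,22.6,25.4,23.1,22.7,24.9,19.8,25.3,
          23.2,21.8,20.9,26.7,26.4,21.2,21.2,22.8)
chol <- c(3.5,1.9,4.0,2.6,4.5,3.0,2.9,3.8,2.1,3.8,
          4.1,3.0,2.5,4.6,3.2,4.2,2.3,4.0)

reg  <- lm(chol ~ age)         # age only
mreg <- lm(chol ~ age + bmi)   # age and bmi

# Partial F test: does bmi reduce the residual sum of squares?
cmp <- anova(reg, mreg)
p_F <- cmp[2, "Pr(>F)"]

# t test for bmi in the larger model, for comparison
p_t <- summary(mreg)$coefficients["bmi", "Pr(>|t|)"]

# Correlation between the two predictors
r_age_bmi <- cor(age, bmi)

print(c(p_F = p_F, p_t = p_t, r_age_bmi = r_age_bmi))
```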
[Figure: four diagnostic panels for the multiple regression model: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage.]
Although BMI is not statistically significant in this case, Figure 10.6 shows that the assumptions of the linear regression model can be met.
Id    Hardwood concentration (x)    Tensile strength (y)
1      1.0                           6.3
2      1.5                          11.1
3      2.0                          20.0
4      3.0                          24.0
5      4.0                          26.1
6      4.5                          30.0
7      5.0                          33.8
8      5.5                          34.0
9      6.0                          38.1
10     6.5                          39.9
11     7.0                          42.0
12     8.0                          46.1
13     9.0                          53.1
14    10.0                          52.0
15    11.0                          52.5
16    12.0                          48.0
17    13.0                          42.8
18    14.0                          27.8
19    15.0                          21.9
 Median      3Q     Max
  2.938   7.675  15.840

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  21.3213     5.4302   3.926  0.00109 **
conc          1.7710     0.6478   2.734  0.01414 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.82 on 17 degrees of freedom
Multiple R-Squared: 0.3054,     Adjusted R-squared: 0.2645
F-statistic: 7.474 on 1 and 17 DF,  p-value: 0.01414
The results above show that this simple linear regression model (strength = 21.32 + 1.77*conc) explains about 31% of the variance of strength. The estimated variance of this model is s2 = (11.82)^2 = 139.7.
Now let us look at the plot and the fitted line of the model above:
> plot(strength ~ conc,
       xlab="Concentration of hardwood",
       ylab="Tensile strength",
       main="Relationship between hardwood concentration \n and tensile strength",
       pch=16)
> abline(simple.model)
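[Figure: "Relationship between hardwood concentration and tensile strength". Tensile strength (vertical axis) against Concentration of hardwood (horizontal axis), with the fitted straight line.]

$$ y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 $$

We now use R to estimate the three parameters above.
> quadratic <- lm(strength ~ poly(conc, 2))
> summary(quadratic)

Call:

     3Q     Max
 4.1350  6.5506

Pr(>|t|)
2.73e-16 ***
1.76e-06 ***
1.89e-08 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1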
Call:
lm(formula = strength ~ poly(conc, 3))

Residuals:
     Min       1Q   Median       3Q      Max
-4.62503 -1.61085  0.04125  1.58922  5.02159

Coefficients:
# Draw the three fits: straight line, quadratic and cubic
> plot(strength ~ conc, pch=16,
       main="Hardwood concentration and tensile strength",
       sub="Linear, quadratic, and cubic fits")
> abline(linear, col="black")
> lines(xnew, y2, col="blue", lwd=3)
> lines(xnew, y3, col="red", lwd=4)
[Figure: "Hardwood concentration and tensile strength". strength (vertical axis) against conc (horizontal axis), with linear, quadratic, and cubic fits.]
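The plotting commands above use the objects `linear`, `xnew`, `y2` and `y3`, whose definitions are not preserved in the text. One plausible way to create them is sketched below; the grid of 200 points and the use of `predict()` are assumptions rather than the author's code.

```r
# Hardwood data from the table above
conc <- c(1.0,1.5,2.0,3.0,4.0,4.5,5.0,5.5,6.0,6.5,7.0,8.0,
          9.0,10.0,11.0,12.0,13.0,14.0,15.0)
strength <- c(6.3,11.1,20.0,24.0,26.1,30.0,33.8,34.0,38.1,39.9,
              42.0,46.1,53.1,52.0,52.5,48.0,42.8,27.8,21.9)

# Three polynomial models of increasing degree
linear    <- lm(strength ~ conc)
quadratic <- lm(strength ~ poly(conc, 2))
cubic     <- lm(strength ~ poly(conc, 3))

# Fine grid of concentrations, and the fitted curves on that grid
xnew <- seq(min(conc), max(conc), length.out = 200)
y2   <- predict(quadratic, data.frame(conc = xnew))
y3   <- predict(cubic,     data.frame(conc = xnew))

plot(strength ~ conc, pch = 16)
abline(linear, col = "black")
lines(xnew, y2, col = "blue", lwd = 3)
lines(xnew, y3, col = "red",  lwd = 4)
```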
$$ RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

$$ AIC = \log\left(\frac{RSS}{n}\right) + \frac{2k}{n} $$

The model with the lowest AIC is regarded as the optimal model. In the following example we use the step function to find an optimal model based on AIC.
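The AIC formula above can be applied directly to compare models. The sketch below evaluates it for the linear, quadratic and cubic fits to the hardwood data; `aic_rss` is a hypothetical helper, and taking `k` as the number of estimated coefficients is an assumption. Note that R's `step()` and `extractAIC()` report the quantity $n \log(RSS/n) + 2k$ for linear models, which is n times this value, so the numbers differ in scale but rank models identically.

```r
conc <- c(1.0,1.5,2.0,3.0,4.0,4.5,5.0,5.5,6.0,6.5,7.0,8.0,
          9.0,10.0,11.0,12.0,13.0,14.0,15.0)
strength <- c(6.3,11.1,20.0,24.0,26.1,30.0,33.8,34.0,38.1,39.9,
              42.0,46.1,53.1,52.0,52.5,48.0,42.8,27.8,21.9)

# AIC as defined in the text: log(RSS / n) + 2k / n
aic_rss <- function(fit) {
  rss <- sum(resid(fit)^2)     # residual sum of squares
  n   <- length(resid(fit))    # number of observations
  k   <- length(coef(fit))     # assumption: k = number of coefficients
  log(rss / n) + 2 * k / n
}

fits <- list(
  linear    = lm(strength ~ conc),
  quadratic = lm(strength ~ poly(conc, 2)),
  cubic     = lm(strength ~ poly(conc, 3))
)
aic_values <- sapply(fits, aic_rss)
print(aic_values)
```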
Example 4. A study of the effect of factors such as temperature, time and chemical composition on CO2 yield. The data from this study are summarised in Table 2. The main aim of the study is to find a linear regression model to predict CO2 yield, as well as to assess the effect of these factors.
Table 2. CO2 yield and some factors that may affect CO2
Id    y      X1    X2   X3     X4     X5       X6       X7
1     36.98   5.1  400  51.37   4.24  1484.83  2227.25  2.06
2     13.74  26.4  400  72.33  30.87   289.94   434.90  1.33
3     10.08  23.8  400  71.44  33.01   320.79   481.19  0.97
4      8.53  46.4  400  79.15  44.61   164.76   247.14  0.62
5     36.42   7.0  450  80.47  33.84  1097.26  1645.89  0.22
6     26.59  12.6  450  89.90  41.26   605.06   907.59  0.76
7     19.07  18.9  450  91.48  41.88   405.37   608.05  1.71
8      5.96  30.2  450  98.60  70.79   253.70   380.55  3.93
9     15.52  53.8  450  98.05  66.82   142.27   213.40  1.97
10    56.61   5.6  400  55.69   8.92  1362.24  2043.36  5.08
11    26.72  15.1  400  66.29  17.98   507.65   761.48  0.60
12    20.80  20.3  400  58.94  17.79   377.60   566.40  0.90
13     6.99  48.4  400  74.74  33.94   158.05   237.08  0.63
14    45.93   5.8  425  63.71  11.95   130.66  1961.49  2.04
15    43.09  11.2  425  67.14  14.73   682.59  1023.89  1.57
16    15.79  27.9  425  77.65  34.49   274.20   411.30  2.38
17    21.60   5.1  450  67.22  14.48  1496.51  2244.77  0.32
18    35.19  11.7  450  81.48  29.69   652.43   978.64  0.44
19    26.14  16.7  450  83.88  26.33   458.42   687.62  8.82
20     8.60  24.8  450  89.38  37.98   312.25   468.38  0.02
21    11.63  24.9  450  79.77  25.66   307.08   460.62  1.72
22     9.59  39.5  450  87.93  22.36   193.61   290.42  1.88
23     4.42  29.0  450  79.50  31.52   155.96   233.95  1.43
24    38.89   5.5  460  72.73  17.86  1392.08  2088.12  1.35
25    11.19  11.5  450  77.88  25.20   663.09   994.63  1.61
26    75.62   5.2  470  75.50   8.66  1464.11  2196.17  4.78
27    36.03  10.6  470  83.15  22.39   720.07  1080.11  5.88
Note: y = CO2 yield; X1 = time (minutes); X2 = temperature (C); X3 = percentage solubilised; X4 = amount of oil (g/100g); X5 = amount of coal; X6 = total amount solubilised; X7 = hydrogen consumed.
Before analysing the data, we need to enter them into R with the usual commands. The data will be stored in the object REGdata.
> y <- c(36.98,13.74,10.08, 8.53,36.42,26.59,19.07, 5.96,15.52,56.61,
26.72,20.80, 6.99,45.93,43.09,15.79,21.60,35.19,26.14, 8.60,
11.63, 9.59, 4.42,38.89,11.19,75.62,36.03)
> x1 <- c(5.1,26.4,23.8,46.4, 7.0,12.6,18.9,30.2,53.8,5.6,15.1,20.3,48.4,
5.8,11.2,27.9,5.1,11.7,16.7,24.8,24.9,39.5,29.0, 5.5, 11.5,
5.2,10.6)
> x2 <- c(400,400, 400, 400, 450, 450, 450, 450, 450, 400, 400, 400,
400, 425, 425, 425, 450, 450, 450, 450, 450, 450, 450, 460,
450, 470, 470)
> x3 <- c(51.37,72.33,71.44,79.15,80.47,89.90,91.48,98.60,98.05,55.69,
66.29,58.94,74.74,63.71,67.14,77.65,67.22,81.48,83.88,89.38,
79.77,87.93,79.50,72.73,77.88,75.50,83.15)
> x4 <- c(4.24,30.87,33.01,44.61,33.84,41.26,41.88,70.79,66.82,
8.92,17.98,17.79,33.94,11.95,14.73,34.49,14.48,29.69,26.33,
37.98,25.66,22.36,31.52,17.86,25.20, 8.66,22.39)
> x5 <- c(1484.83, 289.94, 320.79, 164.76, 1097.26, 605.06, 405.37,
253.70, 142.27,1362.24, 507.65, 377.60, 158.05, 130.66,
682.59, 274.20, 1496.51, 652.43, 458.42, 312.25, 307.08,
193.61, 155.96,1392.08, 663.09,1464.11, 720.07)
> x6 <- c(2227.25, 434.90, 481.19, 247.14,1645.89, 907.59, 608.05,
380.55, 213.40,2043.36, 761.48, 566.40, 237.08,1961.49,1023.89,
411.30,2244.77, 978.64, 687.62, 468.38, 460.62, 290.42,
233.95,2088.12, 994.63,2196.17,1080.11)
> x7 <- c(2.06,1.33,0.97,0.62,0.22,0.76,1.71,3.93,1.97,5.08,0.60,0.90,
0.63,2.04,1.57,2.38,0.32,0.44,8.82,0.02,1.72,1.88,1.43,
1.35,1.61,4.78,5.88)
> REGdata <- data.frame(y, x1,x2,x3,x4,x5,x6,x7)
Now we begin the analysis. The first model includes all 7 independent variables, as follows:
> reg <- lm(y ~ x1+x2+x3+x4+x5+x6+x7, data=REGdata)
> summary(reg)
Call:
lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7, data = REGdata)
Residuals:
    Min      1Q  Median      3Q     Max
-20.035  -4.681  -1.144   4.072  21.214

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 53.937016  57.428952   0.939   0.3594
x1          -0.127653   0.281498  -0.453   0.6553
x2          -0.229179   0.232643  -0.985   0.3370
x3           0.824853   0.765271   1.078   0.2946
x4          -0.438222   0.358551  -1.222   0.2366
x5          -0.001937   0.009654  -0.201   0.8431
x6           0.019886   0.008088   2.459   0.0237 *
x7           1.993486   1.089701   1.829   0.0831 .
---
Signif. codes:  0 '***'
The results above show that the 7 variables together explain about 73% of the variance of y. But of these 7 variables, only x6 is statistically significant (p = 0.024). Let us reduce the model to a simple linear regression with x6 alone.
> summary(lm(y ~ x6, data=REGdata))
Call:
lm(formula = y ~ x6, data = REGdata)
Residuals:
    Min      1Q  Median      3Q     Max
-28.081  -5.829  -0.839   5.522  26.882

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.144181   3.483064   1.764     0.09 .
x6          0.019395   0.002932   6.616 6.24e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.7 on 25 degrees of freedom
Multiple R-Squared: 0.6365,     Adjusted R-squared: 0.6219
F-statistic: 43.77 on 1 and 25 DF,  p-value: 6.238e-07
[Figure: scatter-plot matrix of the variables in REGdata (panels labelled x1 to x7).]
The results above show that y is related to variables such as x1, x5 and x6. In addition, x5 and x6 are very closely related (almost a straight line), with a correlation coefficient of 0.88. Also, x5 and x1, or x6 and x5, are related to each other, but through an inverse function. This means that x5 and x6 provide much the same information for predicting y, so we do not need both in the model.
To find an optimal model when there are many such correlations, we use step as follows. Note the way the argument lm(y ~ .) is supplied: the dot means that R is asked to consider all variables in the object REGdata.
> reg <- lm(y ~ ., data=REGdata)
> step(reg, direction="both")
Start:  AIC= 134.07
y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7

        Df Sum of Sq     RSS     AIC
- x5     1      4.54 2145.37  132.13
- x1     1     23.17 2164.00  132.36
- x2     1    109.34 2250.18  133.42
- x3     1    130.90 2271.74  133.68
<none>               2140.83  134.07
- x4     1    168.31 2309.14  134.12
- x7     1    377.09 2517.92  136.45
- x6     1    681.09 2821.92  139.53

(term labels for the next table are missing in the source)
        Df Sum of Sq     RSS     AIC
         1      22.7  2168.1   130.4
         1     113.8  2259.1   131.5
         1     133.5  2278.9   131.8
<none>                2145.4   132.1
         1     170.8  2316.2   132.2
         1       4.5  2140.8   134.1
         1     375.7  2521.1   134.5
         1    1058.5  3203.8   141.0

        Df Sum of Sq     RSS     AIC
- x2     1      96.8  2264.9   129.6
- x3     1     122.0  2290.0   129.9
<none>                2168.1   130.4
- x4     1     187.4  2355.5   130.7
+ x1     1      22.7  2145.4   132.1
+ x5     1       4.1  2164.0   132.4
- x7     1     385.0  2553.1   132.8
- x6     1    1526.2  3694.3   142.8

        Df Sum of Sq     RSS     AIC
- x3     1      25.4  2290.3   127.9
- x4     1      90.9  2355.8   128.7
<none>                2264.9   129.6
+ x2     1      96.8  2168.1   130.4
+ x5     1       8.3  2256.5   131.5
+ x1     1       5.7  2259.1   131.5
- x7     1     384.9  2649.7   131.8
- x6     1    2015.6  4280.5   144.8

(term labels for the next table are missing in the source)
        Df Sum of Sq     RSS     AIC
         1      73.5  2363.8   126.7
<none>                2290.3   127.9
         1      25.4  2264.9   129.6
         1      11.3  2279.0   129.8
         1       6.3  2284.0   129.8
         1       0.3  2290.0   129.9
         1     486.6  2776.9   131.1
         1    1993.8  4284.1   142.8

        Df Sum of Sq     RSS     AIC
<none>                2363.8   126.7
+ x4     1      73.5  2290.3   127.9
+ x1     1      33.4  2330.4   128.4
+ x3     1       8.1  2355.8   128.7
+ x5     1       7.7  2356.1   128.7
+ x2     1       7.3  2356.6   128.7
- x7     1     497.3  2861.2   129.9
- x6     1    4477.0  6840.8   153.4

Call:
lm(formula = y ~ x6 + x7, data = REGdata)

Coefficients:
(Intercept)           x6           x7
    2.52646      0.01852      2.18575
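The stepwise search can be run quietly and its result kept for further use. The sketch below re-enters the REGdata values given earlier and stores the selected model; `full`, `best` and `selected` are hypothetical names, and `trace = 0` (which suppresses the step-by-step tables) is an option not used in the text.

```r
y  <- c(36.98,13.74,10.08,8.53,36.42,26.59,19.07,5.96,15.52,56.61,
        26.72,20.80,6.99,45.93,43.09,15.79,21.60,35.19,26.14,8.60,
        11.63,9.59,4.42,38.89,11.19,75.62,36.03)
x1 <- c(5.1,26.4,23.8,46.4,7.0,12.6,18.9,30.2,53.8,5.6,15.1,20.3,48.4,
        5.8,11.2,27.9,5.1,11.7,16.7,24.8,24.9,39.5,29.0,5.5,11.5,5.2,10.6)
x2 <- c(400,400,400,400,450,450,450,450,450,400,400,400,400,425,425,
        425,450,450,450,450,450,450,450,460,450,470,470)
x3 <- c(51.37,72.33,71.44,79.15,80.47,89.90,91.48,98.60,98.05,55.69,
        66.29,58.94,74.74,63.71,67.14,77.65,67.22,81.48,83.88,89.38,
        79.77,87.93,79.50,72.73,77.88,75.50,83.15)
x4 <- c(4.24,30.87,33.01,44.61,33.84,41.26,41.88,70.79,66.82,8.92,
        17.98,17.79,33.94,11.95,14.73,34.49,14.48,29.69,26.33,37.98,
        25.66,22.36,31.52,17.86,25.20,8.66,22.39)
x5 <- c(1484.83,289.94,320.79,164.76,1097.26,605.06,405.37,253.70,
        142.27,1362.24,507.65,377.60,158.05,130.66,682.59,274.20,
        1496.51,652.43,458.42,312.25,307.08,193.61,155.96,1392.08,
        663.09,1464.11,720.07)
x6 <- c(2227.25,434.90,481.19,247.14,1645.89,907.59,608.05,380.55,
        213.40,2043.36,761.48,566.40,237.08,1961.49,1023.89,411.30,
        2244.77,978.64,687.62,468.38,460.62,290.42,233.95,2088.12,
        994.63,2196.17,1080.11)
x7 <- c(2.06,1.33,0.97,0.62,0.22,0.76,1.71,3.93,1.97,5.08,0.60,0.90,
        0.63,2.04,1.57,2.38,0.32,0.44,8.82,0.02,1.72,1.88,1.43,1.35,
        1.61,4.78,5.88)
REGdata <- data.frame(y, x1, x2, x3, x4, x5, x6, x7)

# Full model, then a stepwise search in both directions without printing
full <- lm(y ~ ., data = REGdata)
best <- step(full, direction = "both", trace = 0)

# Names of the predictors retained in the selected model
selected <- attr(terms(best), "term.labels")
print(selected)
print(extractAIC(best))   # equivalent df and AIC of the selected model
```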
 Median      3Q     Max
 0.2513  4.9339 21.9682

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.526460   3.610055   0.700   0.4908
x6          0.018522   0.002747   6.742 5.66e-07 ***
x7          2.185753   0.972696   2.247   0.0341 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.924 on 24 degrees of freedom
Multiple R-Squared: 0.6996,     Adjusted R-squared: 0.6746
F-statistic: 27.95 on 2 and 24 DF,  p-value: 5.391e-07
The detailed analysis (results above) shows that these two variables explain about 70% of the variance of y.
We are now ready to analyse the data with BMA. The function bicreg is written specifically for linear regression analysis. It is applied as follows:
> bma <- bicreg(xvars, co2, strict=FALSE, OR=20)
Call:
bicreg(x = xvars, y = co2, strict = FALSE, OR = 20)

  16  models were selected
 Best  5  models (cumulative posterior probability =  0.6599 ):

           EV        SD       model 1    model 2    model 3    model 4    model 5
Intercept   5.75672  14.6244    2.5264     6.1441     8.6120     7.5936     7.3537
x1         -0.01807   0.1008     .          .          .        -0.1393      .
x2         -0.00075   0.0282     .          .          .          .          .
x3          0.00011   0.0791     .          .          .          .        -0.0572
x4         -0.03059   0.1020     .          .        -0.1419      .          .
x5         -0.00023   0.0030     .          .          .          .          .
x6          0.01815   0.0040    0.0185     0.0193     0.0164     0.0162     0.0179
x7          1.60766   1.2821    2.1857      .         2.1628     2.1233     2.2382

nVar                            2          1          3          3          3
r2                              0.700      0.636      0.709      0.704      0.701
BIC                           -25.8832   -24.0238   -23.4412   -22.9721   -22.6801
post prob                       0.311      0.123      0.092      0.072      0.063
[Figure: "Models selected by BMA". Variables x1 to x7 (vertical axis) against Model # (horizontal axis).]
Raftery, Adrian E. (1995). Bayesian model selection in social research (with Discussion).
Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.:
Blackwells.
Some papers related to BMA can be downloaded from the following web page:
www.stat.colostate.edu/~jah/papers.