
Chapter 6

Chapter 4
APPLYING MS-EXCEL IN CORRELATION AND REGRESSION ANALYSIS

Correlation analysis

Regression analysis: simple and multiple

DEPARTMENT OF MATHEMATICS

Lecturer: Nguyễn Đình Huy


A- CORRELATION ANALYSIS


6.1 Statistical concepts
Two random variables Y and X may be linearly related (a and b), show a linear tendency (c), or be unrelated (d and e).

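Pearson correlation coefficient:

$$\rho_{X,Y} = \frac{\operatorname{COV}(X,Y)}{\sigma_X\,\sigma_Y};\qquad \sigma_X^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i-\mu_X)^2;\qquad \sigma_Y^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i-\mu_Y)^2$$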

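Correlation analysis examines the direction and strength of a relationship, whereas regression analysis determines the quantitative relationship between two random variables Y and X. From a sample, the correlation coefficient can be estimated by:

$$R = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}} = \frac{\sum_{i=1}^{n}(X_i-\bar X)(Y_i-\bar Y)}{\sqrt{\sum_{i=1}^{n}(X_i-\bar X)^2\,\sum_{i=1}^{n}(Y_i-\bar Y)^2}}$$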
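The sample formula above and the population form of the Pearson coefficient give the same value, because the 1/N factors in COV(X, Y), σ_X and σ_Y cancel. A minimal pure-Python sketch (the names `pearson_r` and `pearson_rho` are illustrative, not from any library, and the data are made up) computes both so the equivalence can be checked:

```python
import math


def pearson_r(x, y):
    """Sample estimate R = S_XY / sqrt(S_XX * S_YY)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def pearson_rho(x, y):
    """Population form COV(X, Y) / (sigma_X * sigma_Y), every term divided by N."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    cov = sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y)) / n
    sigma_x = math.sqrt(sum((xi - mu_x) ** 2 for xi in x) / n)
    sigma_y = math.sqrt(sum((yi - mu_y) ** 2 for yi in y) / n)
    return cov / (sigma_x * sigma_y)


# Made-up illustrative data (not from this chapter)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(pearson_r(x, y), pearson_rho(x, y))  # the two values agree up to rounding
```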

The correlation coefficient is used to assess the degree of association:

|R| value     Level
< 0.70        Poor
0.70-0.80     Fair
0.80-0.90     Good
> 0.90        Excellent

6.2 Applying MS-Excel


Example 16: Two accelerated-aging tests of a drug formulation were run in parallel under two conditions: 90% humidity and 60 °C. The percentage of the active ingredient degraded over time (minutes) was as follows:

Time (min)   Humidity test (%)   Temperature test (%)
5            3.5                 2.7
10           5.1                 3.2
15           5.8                 4.7
20           6.7                 6.1
25           7.1                 6.2

How are humidity, temperature, and time related?

6.2.1 Entering data into the worksheet

6.2.3 Applying Correlation

a- Click Tools, then Data Analysis.
b- Select Correlation in the Data Analysis dialog box and click OK.
c- In the Correlation dialog box, set the following in turn:
Input Range,
Grouped By (columns or rows),
Labels in First Row,
Output Range.

Correlation dialog box

Results
The correlation coefficients are: R(humidity/time) = 0.97, R(temperature/time) = 0.97 and R(humidity/temperature) = 0.95.
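The Excel results for Example 16 can be cross-checked outside Excel. The sketch below is plain Python written for this note (not an Excel feature); it recomputes the three coefficients with the sample formula from 6.1 and labels each one with the |R| table. Placing a value that falls exactly on a boundary in the higher level is an assumption, since the table does not say.

```python
import math


def pearson_r(x, y):
    """R = S_XY / sqrt(S_XX * S_YY), as in section 6.1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def level(r):
    """Qualitative level from the |R| table (boundary handling assumed)."""
    a = abs(r)
    if a < 0.70:
        return "Poor"
    if a < 0.80:
        return "Fair"
    if a < 0.90:
        return "Good"
    return "Excellent"


# Example 16: % of active ingredient degraded vs time (min)
time_min = [5, 10, 15, 20, 25]
humidity = [3.5, 5.1, 5.8, 6.7, 7.1]
temperature = [2.7, 3.2, 4.7, 6.1, 6.2]

pairs = {
    "humidity/time": (humidity, time_min),
    "temperature/time": (temperature, time_min),
    "humidity/temperature": (humidity, temperature),
}
results = {name: pearson_r(a, b) for name, (a, b) in pairs.items()}
for name, r in results.items():
    print(f"R({name}) = {r:.2f} ({level(r)})")
```

It prints 0.97, 0.97 and 0.95, matching the Excel output, and all three fall in the Excellent band.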


B- REGRESSION ANALYSIS

6.4 Statistical concepts
Linear regression analysis is widely used in science. For example, a regression line (line of best fit) is often used to predict the shelf life or expiry date of a drug.

[Figure: theoretical and estimated regression lines]

The regression equation can be estimated by the method of least squares (least-squares estimation).


C- SIMPLE LINEAR REGRESSION

6.5 General equation

$$\mu_{Y|X} = B_0 + BX$$

Y: dependent (response) variable
X: independent (predictor) variable
B0 and B: regression coefficients, given by

$$B_0 = \bar Y - B\bar X;\qquad B = \frac{\sum X_iY_i - \left(\sum X_i\right)\left(\sum Y_i\right)/N}{\sum (X_i-\bar X)^2}$$
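As a worked check of these two formulas, the sketch below (plain Python; `simple_ols` is a hypothetical helper name) applies them to the time data X1 and yields Y of Example 17 later in this chapter. The results match the Intercept and X1 coefficients in Excel's output for μ_{Y|X1}.

```python
def simple_ols(x, y):
    """Least-squares B0 and B for mu_{Y|X} = B0 + B*X, using the formulas in 6.5."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum(Xi*Yi) - (sum Xi)(sum Yi)/N
    num = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    # Denominator: sum (Xi - X_bar)^2
    den = sum((xi - x_bar) ** 2 for xi in x)
    b = num / den
    b0 = y_bar - b * x_bar
    return b0, b


# Example 17: reaction time X1 (min) and yield Y (%)
x1 = [15, 30, 60, 15, 30, 60, 15, 30, 60]
y = [1.87, 2.02, 3.28, 3.05, 4.07, 5.54, 5.03, 6.45, 7.26]
b0, b = simple_ols(x1, y)
print(f"B0 = {b0:.4f}, B = {b:.5f}")  # B0 = 2.7267, B = 0.04454
```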

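ANOVA table

Source of variation   Degrees of freedom   Sum of squares    Mean square          Statistic
Regression            1                    SSR               MSR = SSR/1          F = MSR/MSE
Error                 N - 2                SSE               MSE = SSE/(N - 2)
Total                 N - 1                SST = SSR + SSE

where Y'_i = B0 + BX_i is the fitted value and

$$SSR = \sum (Y'_i - \bar Y)^2;\qquad SSE = \sum (Y_i - Y'_i)^2;\qquad SST = \sum (Y_i - \bar Y)^2 = SSR + SSE$$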

Statistics
R squared (R Square):

$$R^2 = \frac{SSR}{SST}$$

(100R² is the percentage of the variation in Y explained by X.)

Standard error (Standard Error):

$$S = \sqrt{\frac{1}{N-2}\sum (Y_i - Y'_i)^2}$$

(The less scattered the data, the closer S is to zero.)
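The ANOVA quantities, R² and S can be computed directly from their definitions. A minimal sketch in plain Python (the helper name `anova_simple` is hypothetical), applied to the temperature data X2 of Example 17; the values agree with the Excel ANOVA table for μ_{Y|X2} later in this chapter:

```python
import math


def anova_simple(x, y):
    """ANOVA quantities, R^2 and S for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b * x_bar
    fitted = [b0 + b * xi for xi in x]                      # Y'_i
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # regression
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    msr, mse = ssr / 1, sse / (n - 2)
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "F": msr / mse, "R2": ssr / sst, "S": math.sqrt(mse)}


# Example 17: temperature X2 (deg C) and yield Y (%)
x2 = [105, 105, 105, 120, 120, 120, 135, 135, 135]
y = [1.87, 2.02, 3.28, 3.05, 4.07, 5.54, 5.03, 6.45, 7.26]
table = anova_simple(x2, y)
for key, value in table.items():
    print(f"{key} = {value:.4f}")
```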

Statistical tests

For a regression equation $\mu_{Y|X} = B_0 + BX$, the statistical significance of each coefficient B_i (B0 or B) is assessed with a t-test (Student's distribution), while the adequacy of the equation $\mu_{Y|X} = f(X)$ is assessed with an F-test (Fisher's distribution).
t-test
- Hypotheses:

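$H_0: \beta_i = 0$ ⇒ the regression coefficient is not significant
$H_1: \beta_i \neq 0$ ⇒ the regression coefficient is significant

- Statistic:

$$t = \frac{B_i - \beta_i}{\sqrt{S_{B_i}^2}};\qquad S_{B_i}^2 = \frac{S^2}{\sum (X_i - \bar X)^2}$$

Student's distribution with ν = N - 2 degrees of freedom.

- Decision:
If |t| < t_α(N - 2) ⇒ accept H0.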
F-test
- Hypotheses:
$H_0: \beta_i = 0$ ⇒ the regression equation is not appropriate
$H_1: \beta_i \neq 0$ ⇒ the regression equation is appropriate

- Statistic:

$$F = \frac{MSR}{MSE}$$

Fisher's distribution with ν1 = 1, ν2 = N - 2.

- Conclusion:
If F < F_α(1, N - 2) ⇒ accept H0.
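Both tests can be sketched in a few lines. The example below uses the time data X1 of Example 17 and the critical values t0.05 = 2.365 and F0.05(1, 7) = 5.590 quoted in this chapter, hard-coded rather than looked up (an assumption made to keep the sketch dependency-free). `slope_tests` is a hypothetical helper name, and only the slope B is tested, since the S_B² formula above is the slope's.

```python
import math


def slope_tests(x, y):
    """t for H0: beta = 0 (slope) and F = MSR/MSE for a simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b * x_bar
    sse = sum((yi - (b0 + b * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)              # S^2 = MSE
    s_b = math.sqrt(s2 / sxx)       # S_B^2 = S^2 / sum (Xi - X_bar)^2
    t = (b - 0.0) / s_b             # beta = 0 under H0
    ssr = b * b * sxx               # equals sum (Y'_i - Y_bar)^2
    f = (ssr / 1) / s2
    return t, f


T_CRIT = 2.365   # t_0.05 with nu = N - 2 = 7 (value used in this chapter)
F_CRIT = 5.590   # F_0.05(1, 7) (value used in this chapter)

x1 = [15, 30, 60, 15, 30, 60, 15, 30, 60]
y = [1.87, 2.02, 3.28, 3.05, 4.07, 5.54, 5.03, 6.45, 7.26]
t, f = slope_tests(x1, y)
print(f"t = {t:.3f}:", "accept H0" if abs(t) < T_CRIT else "reject H0")
print(f"F = {f:.3f}:", "accept H0" if f < F_CRIT else "reject H0")
```

In simple regression F equals t² for the slope, and F_α(1, ν) equals the square of the two-sided t_α(ν) (2.365² ≈ 5.59), so the two tests reach the same decision.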


D- MULTIPLE LINEAR REGRESSION

In multiple linear regression, the dependent variable Y is related to k independent variables X_i (i = 1, 2, ..., k) instead of only one as in simple linear regression.
General equation:

$$\mu_{Y|X_1,\ldots,X_k} = B_0 + B_1X_1 + B_2X_2 + \cdots + B_kX_k$$

The multiple regression equation can also be written in matrix form.
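The matrix form is not legible in this copy of the slides. In the standard notation (assumed here), $\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{e}$, where $\mathbf{X}$ is the N × (k + 1) design matrix whose first column is all ones, and the least-squares coefficients solve the normal equations $(\mathbf{X}^\top\mathbf{X})\mathbf{B} = \mathbf{X}^\top\mathbf{Y}$. The sketch below builds and solves those equations in plain Python by Gaussian elimination (helper names are hypothetical), using Example 17 from later in this chapter; it reproduces Excel's coefficients -12.7, 0.04454 and 0.12856.

```python
def solve(a, rhs):
    """Solve a small dense system a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [r] for row, r in zip(a, rhs)]   # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_ols(columns, y):
    """B = (X^T X)^-1 X^T Y, where X has a leading column of ones for B0."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


x1 = [15, 30, 60, 15, 30, 60, 15, 30, 60]              # time (min)
x2 = [105, 105, 105, 120, 120, 120, 135, 135, 135]     # temperature (deg C)
y = [1.87, 2.02, 3.28, 3.05, 4.07, 5.54, 5.03, 6.45, 7.26]
b0, b1, b2 = multiple_ols([x1, x2], y)
print(f"B0 = {b0:.4f}, B1 = {b1:.5f}, B2 = {b2:.5f}")
```

Solving the normal equations directly is adequate for a small, well-conditioned example like this one; for larger or nearly collinear data a QR-based least-squares solver is the usual choice.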

ANOVA table

Source of variation   Degrees of freedom   Sum of squares    Mean square              Statistic
Regression            k                    SSR               MSR = SSR/k              F = MSR/MSE
Error                 N - k - 1            SSE               MSE = SSE/(N - k - 1)
Total                 N - 1                SST = SSR + SSE

Statistics:
R squared:

$$R^2 = \frac{SSR}{SST} = \frac{kF}{(N-k-1) + kF}$$

(R² ≥ 0.81 is fairly good.)

Adjusted R squared (Adjusted R Square):

$$R_a^2 = \frac{(N-1)R^2 - k}{N-k-1} = R^2 - \frac{k(1-R^2)}{N-k-1}$$

(R_a² becomes negative or undefined when R² or N is small.)

Standard error:

$$S = \sqrt{\frac{SSE}{N-k-1}}$$

(S ≤ 0.30 is fairly good.)
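The identities for R², adjusted R² and S can be checked numerically against the Excel output for Example 17 with both X1 and X2 (N = 9, k = 2, F = 131.3921, SSE = 0.652088095, shown later in this chapter). The helper names below are hypothetical; this is a plain-Python sketch:

```python
import math


def r2_from_f(f, n, k):
    """R^2 = kF / ((N - k - 1) + kF)."""
    return k * f / ((n - k - 1) + k * f)


def adjusted_r2(r2, n, k):
    """R_a^2 = ((N - 1) R^2 - k) / (N - k - 1)."""
    return ((n - 1) * r2 - k) / (n - k - 1)


def std_error(sse, n, k):
    """S = sqrt(SSE / (N - k - 1))."""
    return math.sqrt(sse / (n - k - 1))


N, K = 9, 2
r2 = r2_from_f(131.3921, N, K)
print(f"R2 = {r2:.4f}")                                        # Excel: 0.977677254
print(f"Adjusted R2 = {adjusted_r2(0.977677254, N, K):.4f}")   # Excel: 0.970236338
print(f"S = {std_error(0.652088095, N, K):.4f}")               # Excel: 0.329668544

# With small N the adjusted value can turn negative (made-up numbers)
print(adjusted_r2(0.20, n=5, k=3))
```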

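Statistical tests

These are analogous to simple regression, but note the following:
- In the t-test:
$H_0: \beta_i = 0$ ⇒ the regression coefficient B_i is not significant
$H_1: \beta_i \neq 0$ ⇒ the regression coefficient B_i is significant
Degrees of freedom of t: ν = N - k - 1.

$$t = \frac{B_i - \beta_i}{\sqrt{S_{B_i}^2}};\qquad S_{B_i}^2 = \frac{S^2}{\sum (X - \bar X)^2}$$

- In the F-test:
$H_0$: all $\beta_i = 0$ ⇒ the regression equation is not appropriate
$H_1$: at least one $\beta_i \neq 0$ ⇒ the regression equation is appropriate for at least one B_i.
Degrees of freedom of F: ν1 = k; ν2 = N - k - 1.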
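Applied to the Excel output for Example 17 (N = 9, k = 2), the decisions can be sketched as below. The coefficients, standard errors and sums of squares are copied from that output; the critical values t0.05 = 2.447 (two-sided, ν = N - k - 1 = 6) and F0.05(2, 6) = 5.14 are standard table values, hard-coded as an assumption to keep the sketch self-contained. Helper and variable names are illustrative.

```python
N, K = 9, 2
T_CRIT = 2.447   # two-sided t at alpha = 0.05, nu = N - k - 1 = 6
F_CRIT = 5.14    # F at alpha = 0.05 with nu1 = k = 2, nu2 = N - k - 1 = 6

# (coefficient B_i, standard error S_Bi) from the Excel output for Example 17
coefficients = {
    "Intercept": (-12.7, 1.101638961),
    "X1": (0.044539683, 0.005873842),
    "X2": (0.128555556, 0.008972441),
}
SSR, SSE = 28.55973413, 0.652088095


def t_test(b, se, t_crit=T_CRIT):
    """Return (t, reject_H0) for H0: beta_i = 0."""
    t = b / se
    return t, abs(t) > t_crit


def f_test(ssr, sse, n=N, k=K, f_crit=F_CRIT):
    """Return (F, reject_H0) with F = (SSR / k) / (SSE / (N - k - 1))."""
    f = (ssr / k) / (sse / (n - k - 1))
    return f, f > f_crit


for name, (b, se) in coefficients.items():
    t, reject = t_test(b, se)
    print(f"{name}: t = {t:.3f} ->", "reject H0" if reject else "accept H0")
f, reject = f_test(SSR, SSE)
print(f"F = {f:.3f} ->", "reject H0" if reject else "accept H0")
```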
Applying MS-Excel
Example 17: A synthesis reaction was carried out at three temperatures (105, 120 and 135 °C) combined with three reaction times (15, 30 and 60 minutes). The reaction yields (%) are given in the following table:

Time (min) X1   Temperature (°C) X2   Yield (%) Y
15              105                   1.87
30              105                   2.02
60              105                   3.28
15              120                   3.05
30              120                   4.07
60              120                   5.54
15              135                   5.03
30              135                   6.45
60              135                   7.26

Is temperature and/or time linearly related to the yield of the synthesis reaction? If so, what yield would be expected at 115 °C for 50 minutes?
Entering data into the worksheet
The data must be entered in columns.


Using Regression
Click Tools, then Data Analysis.
Select Regression in the Data Analysis dialog box and click OK.
In the Regression dialog box, set the following in turn:
Range of variable Y (Input Y Range).
Range of variable X (Input X Range).
Data labels (Labels).
Confidence level (Confidence Level).
Output location (Output Range).
Optionally, other outputs such as the regression line (Line Fit Plots) and the residual plot (Residual Plots).

Regression dialog box

Regression equation $\mu_{Y|X_1} = f(X_1)$

$$\mu_{Y|X_1} = 2.73 + 0.04X_1$$
(R² = 0.21; S = 1.81)

Regression Statistics
Multiple R           0.462512069
R Square             0.213917414
Adjusted R Square    0.101619901
Standard Error       1.811191587
Observations         9

ANOVA
             df   SS            MS            F          Significance F
Regression   1    6.24891746    6.24891746    1.904917   0.209994918
Residual     7    22.96290476   3.280414966
Total        8    29.21182222

             Coefficients   Standard Error   t Stat        P-value    Lower 95%
Intercept    2.726666667    1.280705853      2.129034282   0.070771   -0.301721453
X1           0.044539683    0.032270754      1.38018722    0.209995   -0.031768525
t0 = 2.129 < t0.05 = 2.365 (or P-value = 0.071 > 0.05)
⇒ Accept H0.
t1 = 1.380 < t0.05 = 2.365 (or P-value = 0.210 > 0.05)
⇒ Accept H0.
F = 1.905 < F0.05 = 5.590 (or Significance F = 0.210 > 0.05)
⇒ Accept H0.

Hence neither coefficient, 2.73 (B0) nor 0.04 (B1), of the regression equation μ_{Y|X1} = 2.73 + 0.04X1 is statistically significant. In other words, this regression equation is not appropriate.
Conclusion: time is not linearly related to the yield of the synthesis reaction.
Regression equation $\mu_{Y|X_2} = f(X_2)$

$$\mu_{Y|X_2} = -11.14 + 0.13X_2$$
(R² = 0.76; S = 0.99)

Regression Statistics
Multiple R           0.873933544
R Square             0.76375984
Adjusted R Square    0.730011246
Standard Error       0.99290379
Observations         9

ANOVA
             df   SS            MS         F          Significance F
Regression   1    22.31081667   22.31082   22.63086   0.002066188
Residual     7    6.901005556   0.985858
Total        8    29.21182222

             Coefficients    Standard Error   t Stat     P-value    Lower 95%
Intercept    -11.14111111    3.25965608       -3.41788   0.011168   -18.84897293
X2           0.128555556     0.027023418      4.757191   0.002066   0.064655325

|t0| = 3.418 > t0.05 = 2.365 (or P-value = 0.011 < 0.05)
⇒ Reject H0.
t2 = 4.757 > t0.05 = 2.365 (or P-value = 0.00207 < 0.05)
⇒ Reject H0.
F = 22.631 > F0.05 = 5.590 (or Significance F = 0.00207 < 0.05)
⇒ Reject H0.
Hence both coefficients, -11.14 (B0) and 0.13 (B2), of the regression equation μ_{Y|X2} = -11.14 + 0.13X2 are statistically significant. In other words, this regression equation is appropriate.
Conclusion: temperature is linearly related to the yield of the synthesis reaction.
Regression equation $\mu_{Y|X_1,X_2} = f(X_1, X_2)$

$$\mu_{Y|X_1,X_2} = -12.70 + 0.04X_1 + 0.13X_2$$

(R² = 0.98; S = 0.33)

Regression Statistics
Multiple R           0.988775634
R Square             0.977677254
Adjusted R Square    0.970236338
Standard Error       0.329668544
Observations         9

ANOVA
             df   SS            MS         F          Significance F
Regression   2    28.55973413   14.27987   131.3921   1.11235E-05
Residual     6    0.652088095   0.108681
Total        8    29.21182222

             Coefficients   Standard Error   t Stat     P-value    Lower 95%
Intercept    -12.7          1.101638961      -11.5283   2.56E-05   -15.39561342
X1           0.044539683    0.005873842      7.582718   0.000274   0.03016691
X2           0.128555556    0.008972441      14.32782   7.23E-06   0.106600783

|t0| = 11.528 > t0.05 = 2.447 (ν = N - k - 1 = 6) (or P-value = 2.56×10⁻⁵ < 0.05)
⇒ Reject H0.
t1 = 7.583 > t0.05 = 2.447 (or P-value = 0.00027 < 0.05)
⇒ Reject H0.
t2 = 14.328 > t0.05 = 2.447 (or P-value = 7.23×10⁻⁶ < 0.05)
⇒ Reject H0.

F = 131.392 > F0.05 = 5.140 (or Significance F = 1.112×10⁻⁵ < 0.05)
⇒ Reject H0.
Hence all three coefficients, -12.70 (B0), 0.04 (B1) and 0.13 (B2), of the regression equation μ_{Y|X1,X2} = -12.70 + 0.04X1 + 0.13X2 are statistically significant. In other words, this regression equation is appropriate.
Conclusion: the yield of the synthesis reaction is linearly related to both factors, time and temperature.
The linearity of the equation μ_{Y|X1,X2} = -12.70 + 0.04X1 + 0.13X2 can be illustrated with scatterplots.

To predict the reaction yield with the regression equation μ_{Y|X1,X2} = -12.70 + 0.04X1 + 0.13X2, simply select a cell, for example B21, then enter the formula; the result is as follows:

Formula in B21: = B17 + B18 * 50 + B19 * 115

     A            B              C              D
17   Intercept    -12.7          1.101638961    -11.52827782
18   X1           0.044539683    0.005873842    7.582717626
19   X2           0.128555556    0.008972441    14.32782351
20
21   Prediction   4.310873016

Note: B17 is the cell holding B0, B18 the cell holding B1 and B19 the cell holding B2; 50 is the value of X1 (time) and 115 is the value of X2 (temperature).
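The worksheet formula in B21 is just the fitted equation evaluated at X1 = 50 and X2 = 115. The same arithmetic in plain Python, using the unrounded coefficients from the Excel output (the function name is illustrative):

```python
# Coefficients from the Excel output for Example 17 (cells B17, B18, B19)
B0, B1, B2 = -12.7, 0.044539683, 0.128555556


def predict_yield(time_min, temp_c):
    """Predicted yield (%) = B0 + B1*X1 + B2*X2."""
    return B0 + B1 * time_min + B2 * temp_c


print(f"{predict_yield(50, 115):.4f} %")  # 4.3109 %
```

Using the rounded equation -12.70 + 0.04X1 + 0.13X2 instead gives 4.25, so the unrounded coefficients are preferable for prediction.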
APPENDIX:
Table of critical values for the outlier-rejection test:

$$G_1 = \frac{Y_2 - Y_1}{Y_N - Y_1}\ (N = 3\text{ to }7);\qquad G_2 = \frac{Y_3 - Y_1}{Y_{N-1} - Y_1}\ (N = 8\text{ to }13);\qquad G_3 = \frac{Y_3 - Y_1}{Y_{N-2} - Y_1}\ (N = 14\text{ to }24)$$

Number of cases N   Critical value G_P (P = 0.01)
3                   0.976
4                   0.846
5                   0.729
6                   0.644
7                   0.586
8                   0.780
9                   0.725
10                  0.678
11                  0.638
12                  0.605
13                  0.578
14                  0.602
15                  0.579
16                  0.559
17                  0.542
18                  0.527
19                  0.514
20                  0.502
21                  0.491
22                  0.481
23                  0.472
24                  0.464
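A sketch of how the table might be applied in plain Python. It assumes, as the formulas do, that the data are sorted in ascending order and that the suspect value is the smallest one, Y1; it follows the G formulas exactly as printed above and treats G above the tabulated critical value as grounds to reject Y1 at P = 0.01. The function name and the example data are hypothetical.

```python
# Critical values G_P (P = 0.01) from the appendix table, indexed by N
G_CRIT = dict(zip(range(3, 25), [
    0.976, 0.846, 0.729, 0.644, 0.586,                       # N = 3..7   (G1)
    0.780, 0.725, 0.678, 0.638, 0.605, 0.578,                # N = 8..13  (G2)
    0.602, 0.579, 0.559, 0.542, 0.527, 0.514, 0.502, 0.491,  # N = 14..21 (G3)
    0.481, 0.472, 0.464,                                     # N = 22..24 (G3)
]))


def lowest_value_is_outlier(values):
    """Return (G, reject) for the smallest value Y1, using G1/G2/G3 as printed."""
    y = sorted(values)
    n = len(y)
    if n not in G_CRIT:
        raise ValueError("table covers N = 3 to 24 only")
    if n <= 7:
        g = (y[1] - y[0]) / (y[n - 1] - y[0])   # G1 = (Y2 - Y1) / (YN - Y1)
    elif n <= 13:
        g = (y[2] - y[0]) / (y[n - 2] - y[0])   # G2 = (Y3 - Y1) / (Y(N-1) - Y1)
    else:
        g = (y[2] - y[0]) / (y[n - 3] - y[0])   # G3 = (Y3 - Y1) / (Y(N-2) - Y1)
    return g, g > G_CRIT[n]


print(lowest_value_is_outlier([1.0, 5.0, 5.1, 5.2, 5.3]))
print(lowest_value_is_outlier([5.0, 5.1, 5.2, 5.3, 5.4]))
```

To test the largest value instead, the same function can be applied to the negated data, which reverses the ordering.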
