
VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY
UNIVERSITY OF TECHNOLOGY
FACULTY OF COMPUTER SCIENCE AND ENGINEERING

DISCRETE MATHEMATICS 1
Major assignment

Applied statistics and analysis in R

Instructor: ThS. Vng B Thnh
Students:
Trn Vn Hn - 51201057
Nguyn Vn Hnh - 51201006
L Duy Hot - 51201254
Mai Phc Hu - 51201043

HO CHI MINH CITY, MAY 2013

Ho Chi Minh City University of Technology
Faculty of Computer Science and Engineering

Contents
1 Introduction to the problem
2 Theoretical background
2.1 Descriptive statistics
2.2 The R tool
3 Data analysis results
3.1 The data set
3.2 Analysis results
4 Conclusion

Major assignment for Discrete Mathematics 1 - Academic year 2012-2013


This report presents a statistical analysis of the technical specifications of a number of car models.

1 Introduction to the problem

Data analysis always plays a very important role in scientific research. Almost every medical research project requires data analysis, from the simplest descriptive analysis to complex multivariate analysis.

R is a completely free data analysis package. Despite being free, R is highly capable: the methods and models offered by other statistical packages can also be carried out in R, and R has a particular strength in its excellent graphical analysis.

The data to be analysed are the technical specifications of 38 car models from 1978-1979, based on statistics collected from consumers.

2 Theoretical background

2.1 Descriptive statistics

Central tendency and extremes of an attribute:
- mean: the arithmetic mean of the observed values of the attribute.
- min: the smallest observed value of the attribute.
- max: the largest observed value of the attribute.
- median: the value lying exactly in the middle of the attribute's values once they are sorted in non-decreasing order. When the sample size is even, the median is the mean of the two middle values.

Dispersion of an attribute:
- Variance: a measure of the spread of the data, indicating how far the values of the variable typically lie from the expected value.
- Standard deviation: a statistic measuring the spread of the data, computed as the square root of the variance.
- Standard error: a measure of spread that takes the sample size into account, computed by dividing the standard deviation by the square root of the sample size.
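As a minimal sketch of the definitions above, the statistics can be computed by hand in R and checked against the built-in functions. The vector x holds the first six MPG values of the data set analysed later in this report; all variable names are illustrative assumptions. Note that R's var() computes the sample variance, with denominator n - 1.

```r
# First six MPG values from the data set (illustrative sample)
x <- c(16.9, 15.5, 19.2, 18.5, 30.0, 27.5)
n <- length(x)

# Mean: sum of the values divided by the sample size
x.mean <- sum(x) / n

# Median: n is even here, so average the two middle values of the sorted vector
x.sorted <- sort(x)
x.median <- (x.sorted[n / 2] + x.sorted[n / 2 + 1]) / 2

# Sample variance (denominator n - 1, the convention used by var())
x.var <- sum((x - x.mean)^2) / (n - 1)

# Standard deviation and standard error
x.sd <- sqrt(x.var)
x.se <- x.sd / sqrt(n)

# Compare the hand-computed values with R's built-in functions
print(c(mean = x.mean, median = x.median, var = x.var, sd = x.sd, se = x.se))
print(all.equal(c(x.mean, x.median, x.var, x.sd),
                c(mean(x), median(x), var(x), sd(x))))
```

Base R has no built-in function for the standard error, which is why it is derived from sd() and length() here.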

2.2 The R tool

R is a programming language and software environment for statistical computing and graphics. R is open-source software and part of the GNU project. The R language has become a de facto standard among statisticians and is used to develop software for statistics and data analysis.

Capabilities of R:
- Statistical models and techniques: linear and non-linear models, classical tests, time-series analysis, classification, clustering...
- A genuine programming language: new functions can be defined.
- Extensible through packages, with support for object-oriented programming.
- A strong graphics system capable of producing high-quality plots.

Applications of R:
- Statistics and data analysis.
- Drawing charts to present data.
- Use as a calculation tool.

Summary of the material covered in class.

R functions for computing descriptive statistics:

- setwd("c:/"): sets the working directory to the root of drive C.
- bai8<-read.csv("1.csv",header=TRUE): reads the data from the file 1.csv into R.
- attach(bai8): attaches the data frame so that its columns can be referred to directly by name.
- names(bai8): prints the column names of the data.
- bai8: prints the entire data set.
- mean(x): computes the mean of variable x.
- median(x): computes the median of variable x.
- var(x): computes the (sample) variance of variable x.
- sd(x): computes the standard deviation of variable x.
- min(x): finds the smallest value of variable x.
- max(x): finds the largest value of variable x.
- range(x): returns the minimum and maximum of x (the smallest interval containing all values of x).
- length(x): sample size of variable x.
- summary(x): summary statistics of variable x.
- hist(x): draws a histogram of variable x.
- shapiro.test(x): Shapiro-Wilk test of normality for variable x.
- t.test(x): one-sample t-test.
- t.test(x1,x2): two-sample t-test.
- var.test(x1,x2): F-test comparing the variances of two variables.
- tapply(x,list(y),summary): summary statistics of x grouped by y.
- table(x): frequency of each value of variable x.
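A few of the functions listed above can be tried on a small inline sample without any file. This is only a sketch: the data frame cars.sample is a hypothetical name, and it contains the MPG and cylinder values of the first eight cars of the data set presented in section 3.1.

```r
# Small inline sample: MPG and number of cylinders of the first eight cars
cars.sample <- data.frame(
  MPG       = c(16.9, 15.5, 19.2, 18.5, 30.0, 27.5, 27.2, 30.9),
  Cylinders = c(8, 8, 8, 8, 4, 4, 4, 4)
)

# table(): frequency of each distinct value
cyl.counts <- table(cars.sample$Cylinders)

# range(): minimum and maximum as a vector of length two
mpg.range <- range(cars.sample$MPG)

# tapply(): apply a function (here mean) to MPG within each cylinder group
mpg.by.cyl <- tapply(cars.sample$MPG, cars.sample$Cylinders, mean)

print(cyl.counts)
print(mpg.range)
print(mpg.by.cyl)
```

Columns are addressed with the $ operator here instead of attach(), which keeps the example independent of whatever else is on R's search path.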


3 Data analysis results

3.1 The data set

Country  Car                        MPG   Weight  Drive-Ratio  Horsepower  Displacement  Cylinders
U.S.     Buick Estate Wagon         16.9  4.36    2.73         155         350           8
U.S.     Ford Country Squire Wagon  15.5  4.054   2.26         142         351           8
U.S.     Chevy Malibu Wagon         19.2  3.605   2.56         125         267           8
U.S.     Chrysler LeBaron Wagon     18.5  3.94    2.45         150         360           8
U.S.     Chevette                   30    2.155   3.7          68          98            4
Japan    Toyota Corona              27.5  2.56    3.05         95          134           4
Japan    Datsun 510                 27.2  2.3     3.54         97          119           4
U.S.     Dodge Omni                 30.9  2.23    3.37         75          105           4
Germany  Audi 5000                  20.3  2.83    3.9          103         131           5
Sweden   Volvo 240 GT               17    3.14    3.5          125         163           6
Sweden   Saab 99 GLE                21.6  2.795   3.77         115         121           4
France   Peugeot 694 SL             16.2  3.41    3.58         133         163           6
U.S.     Buick Century Special      20.6  3.38    2.73         105         231           6
U.S.     Mercury Zephyr             20.8  3.07    3.08         85          200           6
U.S.     Dodge Aspen                18.6  3.62    2.71         110         225           6
U.S.     AMC Concord D/L            18.1  3.41    2.73         120         258           6
U.S.     Chevy Caprice Classic      17    3.84    2.41         130         305           8
U.S.     Ford LTD                   17.6  3.725   2.26         129         302           8
U.S.     Mercury Grand Marquis      16.5  3.955   2.26         138         351           8
U.S.     Dodge St Regis             18.2  3.83    2.45         135         318           8
U.S.     Ford Mustang 4             26.5  2.585   3.08         88          140           4
U.S.     Ford Mustang Ghia          21.9  2.91    3.08         109         171           6
Japan    Mazda GLC                  34.1  1.975   3.73         65          86            4
Japan    Dodge Colt                 35.1  1.915   2.97         80          98            4
U.S.     AMC Spirit                 27.4  2.67    3.08         80          121           4
Germany  VW Scirocco                31.5  1.99    3.78         71          89            4
Japan    Honda Accord LX            29.5  2.135   3.05         68          98            4
U.S.     Buick Skylark              28.4  2.67    2.53         90          151           4
U.S.     Chevy Citation             28.8  2.595   2.69         115         173           6
U.S.     Olds Omega                 26.8  2.7     2.84         115         173           6
U.S.     Pontiac Phoenix            33.5  2.556   2.69         90          151           4
U.S.     Plymouth Horizon           34.2  2.2     3.37         70          105           4
Japan    Datsun 210                 31.8  2.02    3.7          65          85            4
Italy    Fiat Strada                37.3  2.13    3.1          69          91            4
Germany  VW Dasher                  30.5  2.19    3.7          78          97            4
Japan    Datsun 810                 22    2.815   3.7          97          146           6
Germany  BMW 320i                   21.5  2.6     3.64         110         121           4
Germany  VW Rabbit                  31.9  1.925   3.78         71          89            4

About the data set: it has 8 columns and 38 rows.

- Column 1 - Country: the country where the model is manufactured.
- Column 2 - Car: the name of the model.
- Column 3 - MPG: the number of miles travelled per gallon of fuel.
- Column 4 - Weight: the weight of the car.
- Column 5 - Drive-Ratio: the drive ratio of the car.
- Column 6 - Horsepower: the engine power of the car.
- Column 7 - Displacement: the engine displacement of the car.
- Column 8 - Cylinders: the number of cylinders in the car's engine.
Reading the data into R:
- Convert the given data to .csv format and copy the .csv file to the root directory of drive C.
- Set R's working directory to the root directory of drive C.
- Read the .csv file into R, remove the rows containing undefined (missing) values, and attach the data for analysis:

> setwd("c:/")
> bai8 <- read.csv("1.csv", header=TRUE)
> x <- complete.cases(bai8)
> bai8 = bai8[x, ]
> attach(bai8)
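The loading steps above depend on a file at a fixed location. As a self-contained sketch, the same steps can be tried on CSV content held in a string via the text argument of read.csv(). The string csv.text is a hypothetical stand-in for 1.csv: it holds three rows and four columns of the data set, and the Weight of the second row has deliberately been left empty to show what complete.cases() removes.

```r
# Hypothetical CSV content standing in for the file 1.csv; the Weight
# field of the second data row is deliberately left empty
csv.text <- "Country,Car,MPG,Weight
U.S.,Buick Estate Wagon,16.9,4.36
U.S.,Ford Country Squire Wagon,15.5,
U.S.,Chevy Malibu Wagon,19.2,3.605"

# read.csv() accepts the data as a string through its 'text' argument;
# the empty numeric field is read as NA
cars <- read.csv(text = csv.text, header = TRUE)

# complete.cases() flags the rows that contain no missing value
keep <- complete.cases(cars)
cars.clean <- cars[keep, ]

# Refer to columns explicitly with $ rather than relying on attach()
print(nrow(cars))            # rows read
print(nrow(cars.clean))      # rows without missing values
print(mean(cars.clean$MPG))  # mean MPG of the remaining rows
```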

3.2 Analysis results

Overview of the data:


> summary(bai8)
    Country                       Car          MPG            Weight
 France : 1   AMC Concord D/L      : 1   Min.   :15.50   Min.   :1.915
 Germany: 5   AMC Spirit           : 1   1st Qu.:18.52   1st Qu.:2.208
 Italy  : 1   Audi 5000            : 1   Median :24.25   Median :2.685
 Japan  : 7   Buick Century Special: 1   Mean   :24.76   Mean   :2.863
 Sweden : 2   Buick Estate Wagon   : 1   3rd Qu.:30.38   3rd Qu.:3.410
 U.S.   :22   Buick Skylark        : 1   Max.   :37.30   Max.   :4.360
              (Other)              :32
  Drive_Ratio      Horsepower     Displacement     Cylinders
 Min.   :2.260   Min.   : 65.0   Min.   : 85.0   Min.   :4.000
 1st Qu.:2.695   1st Qu.: 78.5   1st Qu.:105.0   1st Qu.:4.000
 Median :3.080   Median :100.0   Median :148.5   Median :4.500
 Mean   :3.093   Mean   :101.7   Mean   :177.3   Mean   :5.395
 3rd Qu.:3.625   3rd Qu.:123.8   3rd Qu.:229.5   3rd Qu.:6.000
 Max.   :3.900   Max.   :155.0   Max.   :360.0   Max.   :8.000
Column 1 - Country:
> table(Country)
Country
 France Germany   Italy   Japan  Sweden    U.S.
      1       5       1       7       2      22


=> Most of the cars in the data set were manufactured in the U.S.


Column 2 - Car: the model names are all distinct, so no statistics are computed for this column.

Column 3 - MPG:
The helper function phantich ("analyse") collects the descriptive statistics of a variable:
> phantich <-function(x)
+ {
+ trung.binh<-mean(x)               # mean
+ trung.vi<-median(x)               # median
+ phuong.sai<-var(x)                # variance
+ do.lech<-sd(x)                    # standard deviation
+ min<-min(x)
+ max<-max(x)
+ sai.so<-do.lech/sqrt(length(x))   # standard error
+ c(trung.binh=trung.binh,min=min,max=max,trung.vi=trung.vi,
+ phuong.sai=phuong.sai,do.lech=do.lech,sai.so=sai.so)
+ }
> phantich(MPG)
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
 24.760526  15.500000  37.300000  24.250000  42.867319   6.547314   1.062115
> hist(MPG)
> shapiro.test(MPG)

Shapiro-Wilk normality test

data: MPG
W = 0.9211, p-value = 0.01063
> tapply(MPG,list(Cylinders),phantich)
$4
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
30.0210526 21.5000000 37.3000000 30.5000000 17.4928655  4.1824473  0.9595192

$5
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
      20.3       20.3       20.3       20.3         NA         NA         NA

$6
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
 21.080000  16.200000  28.800000  20.700000  16.626222   4.077526   1.289427

$8
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
17.4250000 15.5000000 19.2000000 17.3000000  1.4221429  1.1925363  0.4216253

=> Since p = 0.01063 < 0.05, we conclude that the variable MPG does not follow a normal distribution.
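The decision rule used for the normality test can be written out explicitly. The following is a sketch rather than the authors' script: the MPG vector is copied from the data set in section 3.1, and the names alpha, result and reject.normality are illustrative assumptions. The run recorded in this report gave p = 0.01063 for these values.

```r
# MPG values of the 38 cars, in the order of the data set in section 3.1
MPG <- c(16.9, 15.5, 19.2, 18.5, 30.0, 27.5, 27.2, 30.9, 20.3, 17.0,
         21.6, 16.2, 20.6, 20.8, 18.6, 18.1, 17.0, 17.6, 16.5, 18.2,
         26.5, 21.9, 34.1, 35.1, 27.4, 31.5, 29.5, 28.4, 28.8, 26.8,
         33.5, 34.2, 31.8, 37.3, 30.5, 22.0, 21.5, 31.9)

# Significance level used throughout the report
alpha <- 0.05

# Shapiro-Wilk test: the null hypothesis is that the data are normal
result <- shapiro.test(MPG)

# Reject normality when the p-value falls below the significance level
reject.normality <- result$p.value < alpha

print(result$p.value)
print(reject.normality)
```

The same three lines apply unchanged to the other numeric columns; only the input vector differs.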
Column 4 - Weight:
> phantich(Weight)
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
 2.8628947  1.9150000  4.3600000  2.6850000  0.4996658  0.7068704  0.1146695
> hist(Weight)
> shapiro.test(Weight)

Shapiro-Wilk normality test

data: Weight
W = 0.9309, p-value = 0.02143
> tapply(Weight,list(Cylinders),phantich)
$4
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
2.29478947 1.91500000 2.79500000 2.20000000 0.08309973 0.28827024 0.06613373

$5
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
      2.83       2.83       2.83       2.83         NA         NA         NA

$6
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
 3.1050000  2.5950000  3.6200000  3.1050000  0.1195111  0.3457038  0.1093211

$8
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
3.91362500 3.60500000 4.36000000 3.89000000 0.05208655 0.22822479 0.08068965

=> Since p = 0.02143 < 0.05, we conclude that the variable Weight does not follow a normal distribution.
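The NA entries reported for the 5-cylinder group arise because only one car has five cylinders, and the sample variance (denominator n - 1) is undefined for a single observation. The sketch below uses a hypothetical variant of phantich, named phantich.n here, that also reports the group size so that such groups are easy to spot. The six Weight and Cylinders values are taken from the data set and include its only 5-cylinder model.

```r
# Hypothetical variant of phantich() that also reports the sample size n
phantich.n <- function(x) {
  n <- length(x)
  do.lech <- sd(x)  # NA when n == 1, because the variance divides by n - 1
  c(n = n, trung.binh = mean(x), trung.vi = median(x),
    do.lech = do.lech, sai.so = do.lech / sqrt(n))
}

# Weight and Cylinders of six cars from the data set, including the only
# 5-cylinder model (Audi 5000)
Weight    <- c(4.360, 4.054, 2.155, 2.560, 2.830, 3.140)
Cylinders <- c(8, 8, 4, 4, 5, 6)

# One summary vector per cylinder group
by.cyl <- tapply(Weight, Cylinders, phantich.n)

print(by.cyl[["5"]])  # a single observation: spread statistics are NA
print(by.cyl[["8"]])  # two observations: spread statistics are defined
```

Reporting n next to the spread statistics makes it clear that an NA reflects a group that is too small, not a computation error.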
Column 5 - Drive-Ratio:
> phantich(Drive_Ratio)
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
 3.0934211  2.2600000  3.9000000  3.0800000  0.2679691  0.5176573  0.0839751
> hist(Drive_Ratio)
> shapiro.test(Drive_Ratio)

Shapiro-Wilk normality test

data: Drive_Ratio
W = 0.9241, p-value = 0.01312
> tapply(Drive_Ratio,list(Cylinders),phantich)
$4
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
 3.3489474  2.5300000  3.7800000  3.3700000  0.1566544  0.3957959  0.0908018

$5
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
       3.9        3.9        3.9        3.9         NA         NA         NA

$6
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
 3.0640000  2.6900000  3.7000000  2.9600000  0.1555378  0.3943828  0.1247148

$8
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
2.42250000 2.26000000 2.73000000 2.43000000 0.02776429 0.16662619 0.05891125

=> Since p = 0.01312 < 0.05, we conclude that the variable Drive-Ratio does not follow a normal distribution.
Column 6 - Horsepower:
> phantich(Horsepower)
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
101.736842  65.000000 155.000000 100.000000 699.334282  26.444929   4.289934
> hist(Horsepower)
> shapiro.test(Horsepower)

Shapiro-Wilk normality test

data: Horsepower
W = 0.946, p-value = 0.066
> tapply(Horsepower,list(Cylinders),phantich)
$4
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
 81.315789  65.000000 115.000000  78.000000 223.339181  14.944537   3.428512

$5
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
       103        103        103        103         NA         NA         NA

$6
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
111.400000  85.000000 133.000000 112.500000 189.377778  13.761460   4.351756

$8
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
138.000000 125.000000 155.000000 136.500000 110.285714  10.501701   3.712912

=> Since p = 0.066 > 0.05, the hypothesis of normality cannot be rejected, so the variable Horsepower can be regarded as following a normal distribution.
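Because normality is not rejected for Horsepower, the standard error defined in section 2.1 can be used for a t-based confidence interval of the mean. This is an illustrative sketch and not part of the original analysis. The Horsepower vector is copied from the data set, the 95% level is an assumption, and the manual interval is checked against t.test(), which is listed among the R functions in section 2.2.

```r
# Horsepower values of the 38 cars, in the order of the data set
Horsepower <- c(155, 142, 125, 150, 68, 95, 97, 75, 103, 125,
                115, 133, 105, 85, 110, 120, 130, 129, 138, 135,
                88, 109, 65, 80, 80, 71, 68, 90, 115, 115,
                90, 70, 65, 69, 78, 97, 110, 71)

n <- length(Horsepower)
hp.mean <- mean(Horsepower)
hp.se <- sd(Horsepower) / sqrt(n)  # standard error, as in phantich()

# 95% interval: mean +/- t quantile (n - 1 degrees of freedom) * standard error
t.crit <- qt(0.975, df = n - 1)
ci.manual <- c(hp.mean - t.crit * hp.se, hp.mean + t.crit * hp.se)

# The same interval as reported by the one-sample t-test
ci.ttest <- as.numeric(t.test(Horsepower)$conf.int)

print(ci.manual)
print(all.equal(ci.manual, ci.ttest))
```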
Column 7 - Displacement:
> phantich(Displacement)
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
 177.28947   85.00000  360.00000  148.50000 7899.07610   88.87675   14.41771
> hist(Displacement)
> shapiro.test(Displacement)

Shapiro-Wilk normality test

data: Displacement
W = 0.8485, p-value = 0.0001201
> tapply(Displacement,list(Cylinders),phantich)
$4
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
110.473684  85.000000 151.000000 105.000000 465.152047  21.567384   4.947897

$5
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
       131        131        131        131         NA         NA         NA

$6
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
 190.30000  146.00000  258.00000  173.00000 1329.12222   36.45713   11.52876

$8
trung.binh        min        max   trung.vi phuong.sai    do.lech     sai.so
 325.50000  267.00000  360.00000  334.00000 1077.42857   32.82421   11.60511

=> Since p = 0.0001201 < 0.05, we conclude that the variable Displacement does not follow a normal distribution.
Column 8 - Cylinders:
> table(Cylinders)
Cylinders
 4  5  6  8
19  1 10  8

=> Four-cylinder engines are the most common in the data set (19 of the 38 models).
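The most frequent value and its share of the sample can be read off the frequency table programmatically. In this sketch the Cylinders vector is copied from the data set in section 3.1, and the names cyl.counts, most.common and share are illustrative assumptions.

```r
# Number of cylinders of the 38 cars, in the order of the data set
Cylinders <- c(8, 8, 8, 8, 4, 4, 4, 4, 5, 6, 4, 6, 6, 6, 6, 6, 8, 8, 8, 8,
               4, 6, 4, 4, 4, 4, 4, 4, 6, 6, 4, 4, 4, 4, 4, 6, 4, 4)

# Frequency of each distinct value
cyl.counts <- table(Cylinders)

# Most frequent value (the mode) and its share of the sample
most.common <- names(cyl.counts)[which.max(cyl.counts)]
share <- max(cyl.counts) / length(Cylinders)

print(cyl.counts)
print(most.common)
print(share)
```

For this data the share is exactly one half, so four-cylinder cars are the largest group without forming a strict majority.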

4 Conclusion

In this report we have presented the results of analysing the specifications of a number of car models from 1978-1979.
