
Lesson 1: Getting started with R - Input/output - Processing input data

I. Getting started with R:
- Install R.
- Start R; the startup screen (the R console) appears.
- Set the working directory, for example the folder Works on drive D
> setwd("D:/Works")
or use the menu File -> Change dir
- Show the current working directory
> getwd()
- Get help with help() or ?
> help(setwd) or > ?getwd
- List all files in the working directory
> list.files() or > dir()
- Save the current workspace
> save.image("ten_file.rda")
- Save a single variable, for example x
> save(x, file="ten_file.rda")
- Restore the variable x
> load("ten_file.rda"), where ten_file.rda is the file containing the variable x saved above.
- Remove one variable from memory
> rm(x)
- Remove all variables
> rm(list=ls())
- List all current variables
> ls()
- Show information about one variable
> str(x)
- Show information about all variables in the workspace
> ls.str()
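
The save/remove/restore cycle above can be tried safely with the short sketch below. It assumes nothing beyond base R; the temporary directory is used only so that no real files are touched, and ten_file.rda is the placeholder file name from the text.

```r
# Work in a temporary directory so that no real files are touched
old_dir <- setwd(tempdir())

x <- c(1, 2, 3, 4, 5)
save(x, file = "ten_file.rda")   # write x to disk
rm(x)                            # remove x from memory
load("ten_file.rda")             # restore x under its original name
str(x)                           # inspect the restored variable

setwd(old_dir)                   # return to the previous working directory
```

Note that load() restores objects under the names they had when saved, so nothing is assigned on the left-hand side.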

II. Data input:

- Enter a vector, for example x = (1,2,3,4,5) and y = (7,9,10)
> x <- c(1,2,3,4,5)
> y <- c(7,9,10)
- Enter data directly in R with edit(data.frame())
> frame <- edit(data.frame())
This opens a data-entry window in which you type in the variables (columns) of frame.
- Read from a .txt file with read.table
> data <- read.table("D:/path/solieu.txt", header=TRUE, sep="")
This reads the file solieu.txt and assigns it to the variable data. If the file is in the working directory, the path can be omitted.
- Read from an Excel file (the file must first be saved in .csv format)
> data <- read.csv("solieu.csv", header=FALSE)
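
As a rough illustration of the two file-reading commands, the sketch below first writes small files and then reads them back. The file contents and the column names id and value are made up for this example; they are not the course's solieu files.

```r
# Hypothetical sample data written to a temporary .txt file
path_txt <- tempfile(fileext = ".txt")
writeLines(c("id value", "1 2.5", "2 3.7", "3 4.1"), path_txt)

# header = TRUE: the first line holds column names; sep = "" means any whitespace
data <- read.table(path_txt, header = TRUE, sep = "")
str(data)

# A .csv file without a header line
path_csv <- tempfile(fileext = ".csv")
writeLines(c("1,2.5", "2,3.7", "3,4.1"), path_csv)

# header = FALSE: R names the columns V1, V2, ...
data_csv <- read.csv(path_csv, header = FALSE)
names(data_csv)
```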
Array/vector data:
- Use c( ): > x <- c(1,2,3,4)
- Use the : operator or seq( )
> y <- 1:10
> z <- seq(1,10)
Forms of seq: seq(from =, to =) or seq(from =, to =, by =) or seq(from =, to =, length =) or seq(length = )
> t <- seq(2,10,2)
> u <- seq(20)
> v <- seq(0.25,0.5,length=100)
- Create a character vector
> char <- c("a","b","c")
Some related commands:
> length(x) : length of x
> x[i] : the i-th element of x
> x[-i]: all elements of x except the i-th
> x[1:5]: extract x1 through x5
> x[c(1,3,5)]: extract elements 1, 3 and 5
> x[x>3]: extract all elements greater than 3
> x[x < -2 | x > 2]: extract the elements with |x|>2 (the space in "x < -2" is required: x<-2 would assign 2 to x)
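
The indexing commands above can be checked on a small vector. The values below are chosen only for illustration.

```r
x <- c(-5, -1, 0, 2, 3, 7)

length(x)           # 6
x[2]                # -1
x[-2]               # -5 0 2 3 7
x[c(1, 3, 5)]       # -5 0 3
x[x > 3]            # 7

# Elements with |x| > 2; note the space between < and -2
big <- x[x < -2 | x > 2]
big                 # -5 3 7
x[abs(x) > 2]       # same result, written with abs()

# seq() variants from the text
seq(2, 10, 2)                        # 2 4 6 8 10
length(seq(0.25, 0.5, length = 100)) # 100
```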

III. Data processing:
0/ Attach a data frame so that its columns can be used by name: attach(dataframe)
1/ Split data: subset(bien_goc, dieu_kien), where bien_goc is the original data and dieu_kien is the condition
2/ Combine two data frames into one: merge(frame_1, frame_2, by=)
3/ Transform data: to convert a numeric variable into a categorical variable, use logical operations or replace( ).
Use factor( ) to convert a numeric variable into a factor.
4/ Group data with the function cut2 (from the Hmisc package).
> library(Hmisc)
> cut2(bien_goc, g= so_nhom)
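
The processing steps above might look as follows on toy data. The data frames df1 and df2 and their columns are hypothetical. To keep the sketch free of extra packages, base cut() with quantile breaks stands in for Hmisc's cut2(bien_goc, g = 3); it is a substitute, not the same function.

```r
# Hypothetical toy data
df1 <- data.frame(id = 1:6, age = c(23, 35, 47, 52, 61, 29))
df2 <- data.frame(id = 1:6, group = c(0, 1, 0, 1, 1, 0))

# 1/ keep only the rows that satisfy a condition
older <- subset(df1, age > 40)

# 2/ join the two data frames on the shared column id
both <- merge(df1, df2, by = "id")

# 3/ turn the numeric 0/1 code into a factor with readable labels
both$group <- factor(both$group, labels = c("control", "treated"))

# 4/ three groups of roughly equal size, using base cut() as a stand-in for cut2()
breaks <- quantile(both$age, probs = seq(0, 1, length.out = 4))
both$age_group <- cut(both$age, breaks = breaks, include.lowest = TRUE)
table(both$age_group)
```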

Some vector functions, for a vector x:

max(x), min(x): largest and smallest value of x
sum(x): sum of the values in x
mean(x): mean of x
median(x): median of x
range(x): returns c(min(x), max(x))
var(x): variance of x
sort(x): sort x, in increasing order by default
order(x): returns the positions of the elements of x in increasing order
quantile(x): compute the quantiles of x
cumsum(x): cumulative sum
cumprod(x): cumulative product
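
A short run of these functions on a four-element vector (values chosen only for illustration) shows in particular how order() differs from sort().

```r
x <- c(4, 1, 3, 2)

range(x)                    # 1 4  (minimum first, then maximum)
sort(x)                     # 1 2 3 4
sort(x, decreasing = TRUE)  # 4 3 2 1
order(x)                    # 2 4 3 1  (positions, not values)
x[order(x)]                 # 1 2 3 4  (same as sort(x))
cumsum(x)                   # 4 5 8 10
cumprod(x)                  # 4 4 12 24
median(x)                   # 2.5
quantile(x)                 # minimum, quartiles and maximum
```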
Arithmetic operators in R: +, -, *, /
Logical operators:
! : NOT
&: AND (element-wise)
|: OR (element-wise)
<, >, <=, >=, ==, != (not equal)
&&, ||: AND, OR for single logical values
xor(x,y); isTRUE(x)
Common mathematical functions in R:
log(x): natural logarithm (base e)
log10(x): base-10 logarithm; log(x,n): logarithm to base n
exp(x): e^x
sqrt(x): square root of x
factorial(x): x!
choose(n,k): number of combinations of n items taken k at a time
floor(x): largest integer <= x (floor of x)
ceiling(x): smallest integer >= x (ceiling of x)
trunc(x): truncate x toward 0, giving the nearest integer between x and 0
round(x, digits=n): round x to n decimal places
signif(x, digits=n): round x to n significant digits
sin(x), cos(x), tan(x)
abs(x): |x|
x %/% y: integer quotient of the division x/y
x %% y: remainder of the division x/y
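
The rounding functions and the integer-division operators are easy to confuse, especially for negative numbers. The sketch below, with arbitrary example values, shows the differences.

```r
x <- -2.7
floor(x)      # -3  (largest integer <= x)
ceiling(x)    # -2  (smallest integer >= x)
trunc(x)      # -2  (drops the fractional part, toward 0)

round(3.14159, digits = 2)    # 3.14 (decimal places)
signif(3.14159, digits = 2)   # 3.1  (significant digits)

17 %/% 5      # 3   integer quotient
17 %% 5       # 2   remainder
-7 %/% 2      # -4  the quotient is rounded down
-7 %% 2       # 1   the remainder takes the sign of the divisor

choose(5, 2)  # 10
factorial(4)  # 24
log(100, 10)  # logarithm of 100 to base 10

# Element-wise & versus single-value &&
c(TRUE, FALSE) & c(TRUE, TRUE)   # TRUE FALSE
TRUE && FALSE                    # FALSE
xor(TRUE, FALSE)                 # TRUE
```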

Common statistical functions:

(See the Vietnamese-language documentation on R for details.)

Common probabilities are computed with the functions listed in the table of statistical functions.
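
For orientation, R's distribution functions follow a naming pattern: the prefix d gives the density or probability, p the cumulative probability, q the quantile, and r random values, followed by the distribution name (binom, pois, unif, exp, norm). The sketch below illustrates that pattern with arbitrary parameter values; it is not a reproduction of the table from the original handout.

```r
# Binomial with 10 trials and success probability 0.5
p_eq3 <- dbinom(3, size = 10, prob = 0.5)   # P(X = 3)
p_le3 <- pbinom(3, size = 10, prob = 0.5)   # P(X <= 3)

# Standard normal
pnorm(1.96)      # P(Z <= 1.96), about 0.975
qnorm(0.975)     # the matching quantile, about 1.96

# Poisson, uniform on (0, 1), exponential
ppois(2, lambda = 3)   # P(X <= 2)
punif(0.3)             # 0.3
pexp(1, rate = 2)      # 1 - exp(-2)

# Random values; set.seed() makes the draw reproducible
set.seed(1)
z <- rnorm(5)
z
```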

IV. Plotting some common distributions (binomial, Poisson, uniform, exponential, normal):

Use the function plot( ).
Example: plot the functions y = e^x and y = ln(x)
> x <- seq(0,10,0.1)
> y <- exp(x)
> plot(y~x, type="l")
> y <- log(x)
> plot(y~x, type="l")
Binomial distribution:
> x <- 0:50
> y <- dbinom(x,50,0.25)
> plot(x,y,type="S") or > plot(x,y,type="h")

Normal distribution:

> sample <- rnorm(50)
> hist(sample, prob=TRUE)
> mu <- mean(sample)
> sigma <- sd(sample)
> x <- seq(-4,4,length=500)
> y <- dnorm(x,mu,sigma)
> lines(x,y)
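
The section title also mentions the Poisson, uniform and exponential distributions, for which no example is given. One possible way to plot them, following the same pattern as the binomial example, is sketched below; the parameter values (lambda = 4, rate = 0.5) are arbitrary choices for illustration.

```r
# Poisson probabilities P(X = k) for k = 0..15, drawn as vertical lines
k <- 0:15
pk <- dpois(k, lambda = 4)
plot(k, pk, type = "h", xlab = "k", ylab = "P(X = k)")

# Exponential density with rate 0.5
tt <- seq(0, 10, length = 200)
dens_exp <- dexp(tt, rate = 0.5)
plot(tt, dens_exp, type = "l", xlab = "t", ylab = "density")

# Uniform density on (0, 1), drawn as a step function
u <- seq(-0.5, 1.5, length = 200)
plot(u, dunif(u), type = "s", xlab = "u", ylab = "density")
```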
Exercises:
1.1 Create the vectors x = [1,2,5,7,-3,0,5,1,5,6] and y = [2,2,0,-5,7,8,11,9,3,2]
a. Compute x+y, x*y, x-y.
b. Create z = [the even elements of x] and t = [the odd elements of y].
c. Extract the elements of x and y that are greater than 0.
d. Compute the mean, standard deviation and standard error of x and y.
e. Find the largest and smallest elements of x and y.
f. Sort x in increasing order and y in decreasing order.
g. Save x and y.
1.2 Read the data from the file data01.xls with read.csv( ) (convert the file from .xls to .csv first) and assign it to the data frame data1. Then:
a. Compute the mean, variance and median of the variables FPSA and TPSA.
b. Draw a line plot and a boxplot for FPSA and TPSA.
c. Split the values of the variable FPSA into those with K=0 and those with K=1.
d. Read the data from the file data02.csv into the data frame data2, then merge the two data frames by the variable K.
e. Create a new variable tPSA as follows: if age <= 30, tPSA=0; if 30 < age <= 50, tPSA=1; if age > 50, tPSA=2. Create a frequency table for tPSA.
1.3 The table below gives the scores of 10 students on a test with 3 questions

Student  Question 1  Question 2  Question 3
1        3           5           1
2        3           3           3
3        3           5           1
4        4           5           1
5        3           2           1
6        4           2           3
7        3           5           1
8        4           5           1
9        3           4           1
10       4           2           1

a. Enter the data and assign them to the corresponding variables in the following ways: with c( ); with scan( ); with read.table( ) (create a .txt file first); with edit(data.frame( )).
b. Create separate tables of results for question 1 and question 2.
c. Draw bar charts for the 3 questions.
d. Draw a stacked bar chart for questions 2 and 3.

1.4
a. Generate 100 random values from a binomial distribution with success probability 0.4 in each trial. Draw the frequency histogram.
b. Generate 100 random values from a Poisson distribution and draw the frequency histogram.
c. Generate 100 random values from a normal distribution with mean 50 and standard deviation 4. Plot the distribution function and the density function.
d. Generate 100 random values from an exponential distribution with lambda=1/2500. Plot the distribution function and the density function.
1.5 The files diesel_engine.dat and diesel_time.xls contain data on the operation of diesel engines. Do the following:
a. Read the data from the two files into two data frames, giving each data frame the same name as its file.
b. List the names of the variables in the two data frames just read.
c. Determine how many variables in diesel_engine have missing data. Replace the missing values of the variable speed with 1500 and those of the variable load with 20.
d. Compute the mean, variance, standard deviation, maximum and minimum of the variable alcohol in the data frame diesel_engine.
e. Merge the two data frames diesel_engine and diesel_time into one data frame named diesel.
f. Extract the values of the variable run (the engine sequence number) for which the delay time (variable delay) is below 1.000.
g. Count how many engines have timing equal to 30.
h. Draw boxplots for the variables speed, timing and delay.
i. Draw scatter plots for the pairs of variables (timing, speed) and (temp, press).
j. Convert the variable load into a factor.
k. Divide the range of the variable delay into 4 equal intervals and count the number of values in each interval. Create a frequency table and draw a bar chart.
l. Divide the range of the variable delay into the 4 intervals (0.283, 0.7], (0.7, 0.95], (0.95, 1.2], (1.2, 1.56]. Create a frequency table and draw a bar chart.
1.6 Given the following data:
year snow.cover
1970 6.5
1971 12.0
1972 14.9
1973 10.0
1974 10.7
1975 7.9
1976 21.9
1977 12.5
1978 14.5
1979 9.2
a. Enter the data above into R.
b. Plot snow.cover against year.
c. Draw a histogram of snow.cover.
d. Repeat parts b. and c. after taking the logarithm of snow.cover.
1.7 Given the following data:

Temperature (F)  Erosion incidents  Blowby incidents  Total incidents
53               3                  2                 5
57               1                  0                 1
63               1                  0                 1
70               1                  0                 1
70               1                  0                 1
75               0                  2                 1

Enter the data above into a data frame and plot the total number of incidents against temperature.
1.8 Inflation rates in 4 countries over the period 1960-1980 are collected in the 2 tables below (unit: %). The column Nam is the year; Anh, Nhat and Duc are the United Kingdom, Japan and Germany.

Nam    US     Anh
1960   1.5    1
1961   1.1    3.4
1962   1.1    4.5
1963   1.2    2.5
1964   1.4    3.9
1965   1.6    4.6
1966   2.8    3.7
1967   2.8    2.4
1968   4.2    4.8
1969   5      5.2
1970   5.9    6.5
1971   4.3    9.5
1972   3.6    6.8
1973   6.2    8.4
1974   10.9   16
1975   9.2    24.2
1976   5.8    16.5
1977   6.4    15.9
1978   7.6    8.3
1979   11.4   13.4
1980   13.6   18

Nam    Nhat   Duc
1960   3.6    1.5
1961   5.4    2.3
1962   6.7    4.5
1963   7.7    3
1964   3.9    2.3
1965   6.5    3.4
1966   6      3.5
1967   4      1.5
1968   5.5    18
1969   5.1    2.6
1970   7.6    3.7
1971   6.3    5.3
1972   4.9    5.4
1973   12     7
1974   24.6   7
1975   11.7   5.9
1976   9.3    4.5
1977   8.1    3.7
1978   3.8    2.7
1979   3.6    4.1
1980   8      5.5

a. Enter the data above into 2 data frames lamphat1 and lamphat2 in R, in 3 different ways.
b. Merge the 2 data frames into a single data frame lamphat by Nam.
c. Count the number of years in which the inflation rate of the US, the UK, Japan and Germany exceeded 5%.
d. Draw scatter plots of the inflation rate of each country over time. What general observations can be made about inflation in the 4 countries?
e. Compute the mean, median, maximum, minimum, standard deviation and standard error for each country.

f. To determine which country's inflation varies the most, which statistic should be used? What is the conclusion?
g. Create a new data frame lamphat1 with the same variables as the data frame lamphat but without the data for 1980.
h. The coefficients of the linear regression equation are determined by the least-squares formulas [formulas not preserved in this copy]. Determine these coefficients for the regression model of inflation on time for the US, using the data frame lamphat1. Plot this regression line.
i. Use the regression equation from part h) to estimate the US inflation rate in 1980. Compare it with the actual figure.
