Introduction
Lê Minh Tâm 1
Contents
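1. R language: Introduction
2. Installation & Some simple applications
3. Reading data
4. Using χ2, T-test, anova, PCA in sensory data analysis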
1. R language: Introduction
2. Installation & Some simple applications
- http://cran.r-project.org/bin/windows/base/R-2.6.1-win32.exe
- Setup file
- Icon on desktop
- Window Screen
- Prompt: >
R syntax
Application - R as a calculator
- Arithmetic calculations
> -27*12/21
[1] -15.42857
> sqrt(10)
[1] 3.162278
> log(10)
[1] 2.302585
> log10(2+3*pi)
[1] 1.057848
> exp(2.7689)
[1] 15.94109
> (25 - 5)^3
[1] 8000
> cos(pi)
[1] -1
Permutation: 3!
> prod(3:1)
[1] 6
# 10 × 9 × 8 × 7 × 6 × 5 × 4
> prod(10:4)
[1] 604800
> prod(10:4)/prod(40:36)
[1] 0.007659481
> choose(5, 2)
[1] 10
> 1/choose(5, 2)
[1] 0.1
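As a small sketch building on the permutation and combination examples, base R's factorial() gives the same result as the prod() idiom, and choose() can be checked against the factorial formula. The names n, k, perm3, comb_manual and comb_builtin are illustrative only.

```r
# Permutations and combinations with base R helpers.
n <- 5
k <- 2

# 3! computed with factorial(); prod(3:1) gives the same value.
perm3 <- factorial(3)

# choose(n, k) equals n! / (k! * (n - k)!).
comb_manual  <- factorial(n) / (factorial(k) * factorial(n - k))
comb_builtin <- choose(n, k)

print(perm3)         # 6
print(comb_manual)   # 10
print(comb_builtin)  # 10
```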
Application - R as a number generator
> seq(12)
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> rep(c(1:4), 3)
[1] 1 2 3 4 1 2 3 4 1 2 3 4
> gl(2,4,8)
[1] 1 1 1 1 2 2 2 2
Levels: 1 2
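The generators above take a few more arguments that are often useful. The sketch below is illustrative rather than part of the original slides: it shows seq() with an explicit step, rep() with each=, and how the three arguments of gl() map to levels, repeats, and total length.

```r
# seq() with an explicit step: the endpoint 9 is not reached.
s <- seq(4, 9, by = 2)

# rep() with each=: every element is repeated before moving to the next.
r <- rep(1:4, each = 3)

# gl(n, k, length): n levels, each repeated k times, up to 'length' values.
g <- gl(2, 4, 8)

print(s)  # 4 6 8
print(r)  # 1 1 1 2 2 2 3 3 3 4 4 4
print(g)  # factor 1 1 1 1 2 2 2 2 with levels 1 2
```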
Normal probability
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
$$\text{pnorm}(a, \text{mean}, \text{sd}) = \int_{-\infty}^{a} f(x)\,dx = P(X \le a \mid \text{mean}, \text{sd})$$
Probability of height less than or equal to 150 cm, given that the distribution
has mean=156 and sd=4.6
> pnorm(150, 156, 4.6)
[1] 0.0960575
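The interval probability P(a ≤ X ≤ b) follows from two pnorm() calls, since it is the difference of two cumulative probabilities. The sketch below reuses mean = 156 and sd = 4.6 from the example; the upper bound b = 160 is an illustrative value that does not appear in the slides.

```r
mean_height <- 156
sd_height   <- 4.6

# P(X <= 150), as in the example.
p_below_150 <- pnorm(150, mean_height, sd_height)

# P(X > 150) via the upper tail.
p_above_150 <- pnorm(150, mean_height, sd_height, lower.tail = FALSE)

# P(a <= X <= b) = P(X <= b) - P(X <= a); b = 160 is an assumed value.
a <- 150
b <- 160
p_between <- pnorm(b, mean_height, sd_height) - pnorm(a, mean_height, sd_height)

# The same probability by numerically integrating the density f(x).
p_integral <- integrate(dnorm, lower = a, upper = b,
                        mean = mean_height, sd = sd_height)$value

print(p_below_150)
print(p_between)
print(p_integral)
```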
Application - R as a simulator
x <- rbinom(1000, 20, 0.20)
hist(x)
[Figure: histogram of x. Horizontal axis: x (0 to 10); vertical axis: Frequency (0 to 200).]
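Because rbinom() draws random numbers, each run gives a slightly different histogram. A minimal sketch, assuming an arbitrary seed of 1 for repeatability, compares the simulated mean and variance with the theoretical values n·p = 4 and n·p·(1 − p) = 3.2 for 20 trials with success probability 0.20.

```r
set.seed(1)  # arbitrary seed, only so the run can be repeated

# 1000 draws from a binomial distribution with 20 trials and p = 0.20.
x <- rbinom(1000, size = 20, prob = 0.20)

# Theoretical moments of Binomial(20, 0.20).
theoretical_mean <- 20 * 0.20
theoretical_var  <- 20 * 0.20 * (1 - 0.20)

print(c(simulated = mean(x), theoretical = theoretical_mean))
print(c(simulated = var(x),  theoretical = theoretical_var))

# Frequency table behind the histogram.
print(table(x))
```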
hist(height)
[Figure: histogram of height. Vertical axis: Frequency.]
R as a sampler
- We have 40 people (1, 2, 3, …, 40). If we randomly select 5 people from the group, who would be selected?
sample(1:40, 5)
[1] 32 26 6 18 9
sample(1:40, 5)
[1] 5 22 35 19 4
sample(1:40, 5)
[1] 24 26 12 6 22
sample(1:40, 5)
[1] 22 38 11 6 18
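Each call to sample() above returns a different selection. If a draw needs to be repeatable, for example to document who was selected, one option is to fix the random seed first. The seed value 123 below is arbitrary and not from the slides.

```r
# Fixing the seed makes the "random" selection repeatable.
set.seed(123)
first_draw <- sample(1:40, 5)

set.seed(123)
second_draw <- sample(1:40, 5)

print(first_draw)
print(identical(first_draw, second_draw))  # TRUE: same seed, same draw

# By default sample() draws without replacement, so nobody is picked twice.
print(anyDuplicated(first_draw) == 0)      # TRUE
```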
R syntax
- Case-sensitive: UPPERCASE and lowercase names are distinct
a <- 5
A <- 7
B <- a+A
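A quick check of the case-sensitivity rule: a and A are two separate objects, so their sum uses both values.

```r
a <- 5
A <- 7
B <- a + A

print(B)                # 12
print(identical(a, A))  # FALSE: 'a' and 'A' are different objects
```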
3. Reading data
a <- c(1,2,3,4,5,6,7,8,9)
A <- matrix(a,nrow=3)
A
a <- c(1,2,3,4,5,6,7,8,9)
A <- matrix(a,nrow=3, byrow=TRUE)
A
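The two calls differ only in fill order: by default matrix() fills column by column, while byrow=TRUE fills row by row. The sketch below prints both and checks that, for this square case, one is the transpose of the other. The names by_col and by_row are illustrative.

```r
a <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Default: values fill the first column, then the second, and so on.
by_col <- matrix(a, nrow = 3)
print(by_col)
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9

# byrow = TRUE: values fill the first row, then the second, and so on.
by_row <- matrix(a, nrow = 3, byrow = TRUE)
print(by_row)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9

print(identical(by_row, t(by_col)))  # TRUE
```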
sample(0:999,10,replace=FALSE)
[1] 667 926 888 511 475 889 404 184 713 770
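library(crossdes)  # assumption: williams() is the Williams-design function from the crossdes package
williams(4)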
[,1] [,2] [,3] [,4]
[1,] 1 2 4 3
[2,] 2 3 1 4
[3,] 3 4 2 1
[4,] 4 1 3 2
4. Using χ2, T-test, anova, PCA in sensory data analysis
χ2
Step 1: reading data ifg.xls
One-sample T-test
Two-samples T-test
- Independent-samples T-test
- Paired T-test
Wilcoxon ???
No.  R1  R2    No.  R1  R2
 1    7   8    11    7   9
 2    8   9    12    7   5
 3    6   5    13    8   9
 4    8   9    14    9  10
 5    7   8    15    7   7
 6    7   9    16    7   9
 7    7   7    17    8   7
 8    6   7    18    7   9
 9    8   7    19    6   6
10    6   8    20    8   8
r1 <- c(7,8,6,8,7,7,7,6,8,6,8,9,5,9,8,9,7,7,7,8)
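# t.test() has no `tb` argument; to test against 5, use mu = 5.
# The output below is for the default null value, mu = 0.
t.test(r1)
One Sample t-test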
data: r1
t = 30.1721, df = 19, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
6.840134 7.859866
sample estimates:
mean of x
7.35
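The printed output tests the null hypothesis that the true mean is 0. A minimal sketch of the test against 5, assuming that was the intended null value, uses the mu argument of t.test() and checks the statistic against the textbook formula t = (mean − mu) / (sd / √n). The name t_manual is illustrative.

```r
r1 <- c(7, 8, 6, 8, 7, 7, 7, 6, 8, 6, 8, 9, 5, 9, 8, 9, 7, 7, 7, 8)

# One-sample t-test of H0: true mean = 5 (assumed null value).
res <- t.test(r1, mu = 5)
print(res)

# The same statistic computed by hand.
t_manual <- (mean(r1) - 5) / (sd(r1) / sqrt(length(r1)))
print(t_manual)
```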
r1 <- c(7,8,6,8,7,7,7,6,8,6,8,9,5,9,8,9,7,7,7,8)
r2 <- c(8,9,5,9,8,9,7,7,7,8,9,5,9,10,7,9,7,9,6,8)
tam <- data.frame(r1,r2)
t.test(r1,r2, paired=TRUE)
Paired t-test
data: r1 and r2
t = -1.2289, df = 19, p-value = 0.2341
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.2163983 0.3163983
sample estimates:
mean of the differences
-0.45
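A paired t-test is equivalent to a one-sample t-test on the within-pair differences. The sketch below checks this with the same r1 and r2 vectors; the names paired_res and diff_res are illustrative.

```r
r1 <- c(7, 8, 6, 8, 7, 7, 7, 6, 8, 6, 8, 9, 5, 9, 8, 9, 7, 7, 7, 8)
r2 <- c(8, 9, 5, 9, 8, 9, 7, 7, 7, 8, 9, 5, 9, 10, 7, 9, 7, 9, 6, 8)

# Paired test on the two columns.
paired_res <- t.test(r1, r2, paired = TRUE)

# One-sample test on the differences r1 - r2.
diff_res <- t.test(r1 - r2)

print(unname(paired_res$statistic))  # t statistic of the paired test
print(unname(diff_res$statistic))    # same value from the differences
print(mean(r1 - r2))                 # mean of the differences
```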
r1 <- c(7,8,6,8,7,7,7,6,8,6,8,9,5,9,8,9,7,7,7,8)
r2 <- c(8,9,5,9,8,9,7,7,7,8,9,5,9,10,7,9,7,9,6,8)
tam <- data.frame(r1,r2)
t.test(r1,r2)
data: r1 and r2
t = -1.1348, df = 35.845, p-value = 0.2640
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.2543229 0.3543229
sample estimates:
mean of x mean of y
7.35 7.80
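The non-integer degrees of freedom (35.845) show that t.test() applied the Welch correction for unequal variances, which is its default for two independent samples. As a sketch, setting var.equal = TRUE gives the classical pooled-variance test with n1 + n2 − 2 = 38 degrees of freedom; which version suits the data is a judgment about the variances, not something the slides state.

```r
r1 <- c(7, 8, 6, 8, 7, 7, 7, 6, 8, 6, 8, 9, 5, 9, 8, 9, 7, 7, 7, 8)
r2 <- c(8, 9, 5, 9, 8, 9, 7, 7, 7, 8, 9, 5, 9, 10, 7, 9, 7, 9, 6, 8)

# Default: Welch test, variances not assumed equal.
welch_res <- t.test(r1, r2)

# Classical Student test with a pooled variance estimate.
pooled_res <- t.test(r1, r2, var.equal = TRUE)

print(unname(welch_res$parameter))   # Welch degrees of freedom
print(unname(pooled_res$parameter))  # pooled degrees of freedom
```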
Anova
Model S(A)
G1 G2 G3 G4 G5
9 7 8 4 7
8 9 5 3 8
6 6 6 6 7
8 6 7 5 6
10 6 3 4 9
x <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5)
y <- c(9,8,6,8,10,7,9,6,6,6,8,5,6,7,3,4,3,6,5,4,7,8,7,6,9)
x <- as.factor(x)
dat <- data.frame(x, y)
result <- aov(y ~ x, data = dat)
summary(result)
plot(TukeyHSD(result))
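To see what aov() computes for this one-way design, the F statistic can be rebuilt from the between-group and within-group sums of squares. This is a sketch for checking understanding, with illustrative variable names, not a replacement for summary().

```r
x <- factor(rep(1:5, each = 5))
y <- c(9, 8, 6, 8, 10, 7, 9, 6, 6, 6, 8, 5, 6, 7, 3,
       4, 3, 6, 5, 4, 7, 8, 7, 6, 9)

# F statistic reported by aov().
fit   <- aov(y ~ x)
f_aov <- summary(fit)[[1]][["F value"]][1]

# Group means, group sizes and the grand mean.
group_means <- tapply(y, x, mean)
group_sizes <- tapply(y, x, length)
grand_mean  <- mean(y)

# Between-group and within-group sums of squares.
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((y - group_means[x])^2)

# Mean squares and their ratio.
df_between <- nlevels(x) - 1
df_within  <- length(y) - nlevels(x)
f_manual   <- (ss_between / df_between) / (ss_within / df_within)

print(group_means)  # 8.2 6.8 5.8 4.4 7.4
print(c(aov = f_aov, manual = f_manual))
```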
Anova
Model S(A*B)
Gender   Formula
         R1  R2  R3
Male      8   7   5
          6   6   6
          9   7   4
Female    8   6   7
         10   8   3
          8   7   4
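A minimal sketch of how this table could be analysed as a two-factor design, assuming each score comes from a different subject and that R1 to R3 are the three formula levels. The variable names score, gender and formula_id are hypothetical; the slides do not show the code for this model.

```r
# Scores entered row by row from the table (R1, R2, R3 for each subject row).
score <- c(8, 7, 5,
           6, 6, 6,
           9, 7, 4,
           8, 6, 7,
           10, 8, 3,
           8, 7, 4)

# First three rows are Male, last three are Female.
gender <- factor(rep(c("Male", "Female"), each = 9))

# Columns cycle through R1, R2, R3 within each row.
formula_id <- factor(rep(c("R1", "R2", "R3"), times = 6))

# Two-way ANOVA with main effects and the gender-by-formula interaction.
fit <- aov(score ~ gender * formula_id)
print(summary(fit))

# Cell means for each gender and formula combination.
print(tapply(score, list(gender, formula_id), mean))
```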
PCA
[Figure: Variables factor map (PCA). Labels as extracted: Hong Dau, Ngot, Chua.M Chua.V, Sanh, Vang, Dongnhat, Beo, Kem, Bamdinh, Chat, Bo, Nau.kem, Ancom. Vertical axis: Dimension 2 (28.89%).]
[Figure: PCA map with labels as extracted: Izzi, Vinamilk, Daisy, Milky Dutch, Nuti. Horizontal axis: Dimension 1 (32.4%); vertical axis: Dimension 2 (28.89%).]
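The slides do not include the code or data behind these PCA maps. As a rough sketch of the same kind of analysis, the example below runs base R's prcomp() on a small table of random numbers standing in for a products-by-attributes sensory table. The data, the product names P1 to P6, and the attribute names are all hypothetical, and prcomp() is a swapped-in technique that may not be the function used for the original plots.

```r
set.seed(42)  # arbitrary seed for repeatability

# Hypothetical sensory table: 6 products x 4 attributes (random values,
# NOT the data behind the figures).
scores <- matrix(round(runif(24, min = 1, max = 9), 1), nrow = 6,
                 dimnames = list(paste0("P", 1:6),
                                 paste0("attr", 1:4)))

# PCA on centred and standardised attributes.
pca <- prcomp(scores, center = TRUE, scale. = TRUE)

# Share of variance carried by each dimension (as in the axis labels).
explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(100 * explained, 2))

# Product coordinates and attribute loadings on the first two dimensions.
print(pca$x[, 1:2])
print(pca$rotation[, 1:2])
```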
Summary
Standby Slides
Chapter 1: Some basic statistical concepts
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$$
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Example:
Day    1   2   3   4   5   6   7   8   9  10
Time  39  29  43  52  39  44  40  31  44  35
$$\bar{X} = \frac{396}{10} = 39.6$$
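The same calculation in R, using the ten Time values from the example. The vector name times is illustrative.

```r
times <- c(39, 29, 43, 52, 39, 44, 40, 31, 44, 35)

# Sum of the observations divided by their count.
total <- sum(times)
n     <- length(times)

print(total)        # 396
print(total / n)    # 39.6
print(mean(times))  # 39.6, the built-in equivalent
```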
The median is the value such that 50% of the values are smaller and 50% of the values are larger.
$$\text{median} = \frac{n+1}{2}\text{-th ranked value}$$
where n = sample size
Time 29 31 35 39 39 40 43 44 44 52
Ranks 1 2 3 4 5 6 7 8 9 10
Median = 39.5
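With n = 10 the rank (n + 1)/2 = 5.5 falls between the 5th and 6th ordered values, so the median is their average. A short sketch with illustrative variable names:

```r
times <- c(39, 29, 43, 52, 39, 44, 40, 31, 44, 35)

ranked <- sort(times)
n      <- length(ranked)
pos    <- (n + 1) / 2  # 5.5: halfway between ranks 5 and 6

# Average of the two ranked values on either side of position 5.5.
median_manual <- mean(ranked[c(floor(pos), ceiling(pos))])

print(ranked)          # 29 31 35 39 39 40 43 44 44 52
print(median_manual)   # 39.5
print(median(times))   # 39.5, the built-in equivalent
```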
Q1 Q3
Time 29 31 35 39 39 40 43 44 44 52
Range = 52 – 29 = 23
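The range, and the quartiles Q1 and Q3, can be obtained in R as sketched below. Note that quantile() uses its default interpolation rule (type 7); other textbook definitions of the quartiles can give slightly different values for small samples.

```r
times <- c(39, 29, 43, 52, 39, 44, 40, 31, 44, 35)

# range() returns the minimum and maximum; their difference is the range.
print(range(times))        # 29 52
print(diff(range(times)))  # 23

# First and third quartiles with R's default definition (type 7).
q <- quantile(times, probs = c(0.25, 0.75))
print(q)                   # 25%: 36, 75%: 43.75
```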
Time (X)     δ = X_i − X̄     δ² = (X_i − X̄)²
39           -0.6             0.36
29           -10.6            112.36
43           3.4              11.56
52           12.4             153.76
39           -0.6             0.36
44           4.4              19.36
40           0.4              0.16
31           -8.6             73.96
44           4.4              19.36
35           -4.6             21.16
Mean = 39.6  Sum of differences = 0  Sum of squared differences = 412.4
Sample variance: $s^2 = \frac{412.4}{9} = 45.82$
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$
For almost all data sets that have a single mode, most of the values lie within 3 standard deviations of the mean:
$$M \pm 3 \cdot SD$$
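The deviation table and the variance formula can be reproduced step by step in R, as in this sketch with illustrative variable names. The last line checks the 3-standard-deviation statement for this particular sample.

```r
times <- c(39, 29, 43, 52, 39, 44, 40, 31, 44, 35)
n     <- length(times)

# Deviations from the mean and their squares.
dev    <- times - mean(times)
sum_sq <- sum(dev^2)

# Sample variance and standard deviation from the formula.
s2 <- sum_sq / (n - 1)
s  <- sqrt(s2)

print(sum_sq)      # 412.4
print(s2)          # about 45.82
print(var(times))  # built-in equivalent of s2
print(sd(times))   # built-in equivalent of s

# Do all values lie within 3 standard deviations of the mean?
print(all(abs(dev) <= 3 * s))  # TRUE for this sample
```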
Bimodal Multimodal
Lower-level data
setwd("c:/works/r")
chol <- read.table("chol.txt", header=TRUE)
attach(chol)