Program & Bibliography

- 3(1,2): ~5 theory (301, B2) + 10 practice (Comp. Chem. Lab, by group)
- Website: www2.hcmut.edu.vn/~dzung (available from Sep 15)

INFORMATICS IN FOOD TECHNOLOGY (TIN HỌC TRONG CNTP)
Nguyễn Hoàng Dũng, PhD, Ho Chi Minh City University of Technology (Trường Đại học Bách khoa Tp. HCM)

- R: www.r-project.org

NHDzung – Lesson 1, slide 2

Problem
Foreign and Vietnamese cheeses: quality and preference?
How to conduct research:
1. Sampling
2. Measurement
3. Collect data *
4. Analyze and present your results *
Sensory practices


1-1. Samples and Populations
A population consists of the set of all measurements in which the investigator is interested. A sample is a subset of the measurements selected from the population. A census is a complete enumeration of every item in a population.

Simple Random Sample
Sampling from the population is often done randomly, such that every possible sample of equal size (n) has an equal chance of being selected. A sample selected in this way is called a simple random sample, or just a random sample.
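A minimal Python sketch of drawing a simple random sample of size n from a population of N units. It relies on the standard library's `random.sample`, which selects without replacement so that every subset of size n is equally likely. The population of 100 numbered consumers and the seed are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units without replacement; every subset of size n
    has the same probability of being selected."""
    rng = random.Random(seed)  # dedicated generator so a draw can be reproduced
    return rng.sample(population, n)

# Hypothetical population: 100 numbered consumers (N = 100)
population = list(range(1, 101))
sample = simple_random_sample(population, n=10, seed=42)
print(sample)
```

Fixing the seed is only for reproducibility in teaching; in a real study the draw would normally be left unseeded or the seed documented.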


Samples and Populations

[Figure: a sample of size n drawn from a population of size N]

Measurements
• The assignment of numbers to the values of a variable (S. S. Stevens, Science 1946;103:677-80)
• Rules specify the procedures for assigning numbers to values

The criteria of “science”
Science
• Logic, experimental evidence
• Results are repeatable
• Falsifiability*
• Published in peer-reviewed journals
• Evolves / learns from mistakes

Pseudoscience
• Belief, loyalty
• Results are not repeatable
• Not falsifiable
• Not in peer-reviewed journals
• Constant, unchanged belief

*Capable of being tested (verified or falsified) by experiment or observation

Criteria of measurements
• Validity: measures what it purports to measure
• Accuracy: the degree of “truthfulness” of the attribute being measured
• Reliability (consistency and repeatability)
• Sensitivity to important variation

Accuracy vs reliability (precision)
[Figure: precision vs accuracy]
Measurement error decreases the accuracy of measurement.
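To make the accuracy/precision distinction concrete, a short Python sketch: accuracy is summarized here by the bias (mean of repeated readings minus the true value) and precision by the spread (sample standard deviation) of those readings. The reference value of 50.0 and both series of readings are hypothetical.

```python
from statistics import mean, stdev

TRUE_VALUE = 50.0  # hypothetical reference value of the attribute

def bias(readings, true_value=TRUE_VALUE):
    """Systematic error: how far the average reading lies from the true value."""
    return mean(readings) - true_value

def spread(readings):
    """Random error: sample standard deviation of repeated readings."""
    return stdev(readings)

# Hypothetical repeated measurements of the same sample
precise_but_biased = [55.1, 55.0, 54.9, 55.2, 54.8]
accurate_but_imprecise = [46.0, 54.0, 49.0, 52.0, 49.0]

for name, r in [("precise but biased", precise_but_biased),
                ("accurate but imprecise", accurate_but_imprecise)]:
    print(f"{name}: bias = {bias(r):.2f}, SD = {spread(r):.2f}")
```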

Some important concepts: Data, Variables, Scales
Qualitative (categorical, frequency or nominal). Examples are:
• Color
• Gender
• Nationality

Quantitative (measurable or countable). Examples are:
• Temperature
• Humidity
• Gross compounds
• Preference points scored on a 100-point scale

GENERAL INFORMATION
1.1 Description of the respondent
1.1.1 Sex of the respondent: 1. Male 2. Female
Marital status: 1. Single 2. Married
1.1.2 Age of the respondent: under 25 / 25-30 / 31-54 / >55
1.1.3 Please indicate your current occupation: pupil, student / doctor, teacher / worker, hired laborer, salesperson / retired
1.1.4 Please indicate your household income level: 1. Low (≥ 2 million VND and < 5 million) 2. Medium (≥ 5 million and < 8 million) 3. High (≥ 8 million)

Some important concepts: Data, Variables, Scales
• 8 cheeses (EdamF, EdamH, GoudaH, m1, m2, m3, m4, m5)
• 11 assessors (experts)
• 3 replicates
• 15 descriptive terms: sour, bitterness, umami, salty, greasiness, butter_odor, milk_odor, acrid, rancid, lactic, cheese_flavor, acetic, full flavor, yellow, hard
• Unstructured scale from 0 to 100 mm
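The design above fixes the number of evaluations. A minimal Python sketch enumerates every product × assessor × replicate combination; the long-format layout and column names are assumptions (not necessarily how the course data file is organized), and each row would hold one 0-100 score per descriptor.

```python
from itertools import product

products = ["EdamF", "EdamH", "GoudaH", "m1", "m2", "m3", "m4", "m5"]
judges = [f"S{i}" for i in range(1, 12)]   # 11 assessors, labelled S1-S11
sessions = [1, 2, 3]                       # 3 replicates
descriptors = ["sour", "bitterness", "umami", "salty", "greasiness",
               "butter_odor", "milk_odor", "acrid", "rancid", "lactic",
               "cheese_flavor", "acetic", "full flavor", "yellow", "hard"]

# One row per (product, judge, session); scores start empty (None) and
# would be filled with values read from the 0-100 mm unstructured scale.
rows = [
    {"product": p, "judge": j, "session": s, **{d: None for d in descriptors}}
    for p, j, s in product(products, judges, sessions)
]

print(len(rows), "rows x", len(descriptors), "descriptors")  # 264 rows x 15 descriptors
```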

Variables
• Discrete variables
• Continuous variables
• Independent variables
• Dependent variables

Measurement scales
• Nominal scales (labels)
• Ordinal scales (e.g., ranks in the army)
• Interval scales (Celsius, Fahrenheit)
• Ratio scales (true zero point, ratios)

Types of measurement
• Qualitative: Nominal, Ordinal
• Quantitative: Interval, Ratio

Qualitative measurements
Nominal level
• Classification
• A set of objects can be classified into exhaustive, mutually exclusive categories, each identified by a unique symbol
• Ex: religion, sex, location, etc.
Ordinal level
• Classification + Ordering
• The numbers assigned express rank and nothing more
• Ex: socio-economic status, education, levels of satisfaction, etc.

Quantitative measurements
Interval level
• Classification + Ordering + Standard distance
• A set of objects can be described by units that indicate how far one case is from another
• Ex: temperature


Ratio level
• Classification + Ordering + Standard distance + Natural zero
• Quantitative variable with a natural zero
• Ex: income, age, weight, bone mineral density
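A quick Python illustration of why ratios are meaningful only on a ratio scale: 20 °C is not “twice as hot” as 10 °C, because 0 °C is an arbitrary zero. On the kelvin scale, which has a natural zero, the same two temperatures differ by a ratio of about 1.035, while their difference (an interval) is unchanged. The two temperatures are illustrative values.

```python
def celsius_to_kelvin(t_c):
    """Convert a Celsius temperature (interval scale) to kelvin (ratio scale)."""
    return t_c + 273.15

t1, t2 = 10.0, 20.0  # illustrative temperatures in °C

# Differences are meaningful on an interval scale and survive the conversion
diff_c = t2 - t1
diff_k = celsius_to_kelvin(t2) - celsius_to_kelvin(t1)

# Ratios are only meaningful on a ratio scale (true zero)
naive_ratio = t2 / t1                                       # 2.0, but misleading
true_ratio = celsius_to_kelvin(t2) / celsius_to_kelvin(t1)  # about 1.035

print(diff_c, round(diff_k, 10), naive_ratio, round(true_ratio, 3))
```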

1.2.2. Which type of hard cheese do you usually use? Cheddar / Gouda / Edam / Emmental / Other (please specify) ……………………..
1.2.4. Please rate your overall liking of semi-hard cheese products: 1 2 3 4 5 6 7 8 9
1.2.5. How often do you consume semi-hard cheese? > 3 times/week / 1-2 times/week / 1-3 times/month
1.2.6. How much semi-hard cheese do you consume per week? < 100 g / 100-300 g / > 300 g

1.2.7. In your opinion, which foods is hard cheese eaten with? Bread / Sandwich / Salad / Biscuits / Wine / Other (please specify) ………………………………
1.2.8. When buying hard cheese, how much attention do you pay to the following factors? (1 = not at all concerned, 2 = not concerned, 3 = no opinion, 4 = concerned, 5 = very concerned)
Price 1 2 3 4 5
Sensory properties of the product 1 2 3 4 5
Familiarity 1 2 3 4 5
Convenience of use 1 2 3 4 5
Health benefits 1 2 3 4 5
Product weight 1 2 3 4 5


judge         S1   S2   S3   S4   S5   S6   S7   S8   S9  S10  S11
session        1    1    1    1    1    1    1    1    1    1    1
product       m1   m1   m1   m1   m1   m1   m1   m1   m1   m1   m1
sour          50  100   32   30   60   30   50   32   78   55   62
bitterness    18   65   11   10   23   35   32   23   27   30   21
umami          0   40   35   25   30   25   45   40   45   34   43
salty         40  100    4    1   29   50   64   40   21   18   32

Summary Measures
• Population Parameters
• Sample Statistics

Measures of Central Tendency
• Median
• Mode
• Mean

Measures of Variability
• Range
• Variance
• Standard Deviation

Other summary measures:
• Skewness
• Kurtosis
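As a sketch of the summary measures just listed, Python's standard-library `statistics` module can compute each of them for the sour scores of product m1, session 1 (values copied from the table above). Using Python here is our choice for illustration.

```python
from statistics import mean, median, multimode, variance, stdev

# Sour scores given by the 11 judges (S1-S11) to product m1, session 1
sour_m1 = [50, 100, 32, 30, 60, 30, 50, 32, 78, 55, 62]

summary = {
    "mean": mean(sour_m1),
    "median": median(sour_m1),
    "modes": sorted(multimode(sour_m1)),  # several values can tie for the mode
    "range": max(sour_m1) - min(sour_m1),
    "variance": variance(sour_m1),        # sample variance, denominator n - 1
    "sd": stdev(sour_m1),
}
for name, value in summary.items():
    print(name, value)
```

Here three values (30, 32 and 50) each occur twice, so this small sample has no single mode.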

1-3. Measures of Central Tendency or Location
• Median: middle value when sorted in order of magnitude (50th percentile)
• Mode: most frequently occurring value
• Mean: average

Arithmetic Mean or Average
The mean of a set of observations is their average: the sum of the observed values divided by the number of observations.
Population mean:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Sample mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Arithmetic Mean or Average
• Affected by outliers

Median
• Robust measure of central tendency: not affected by outliers

[Figure: dot plots of two data sets. First: mean = 5, median = 5. Second: mean = 6, median = 5.]
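The effect shown in the figure can be reproduced with a small hypothetical data set (the values are ours, chosen so the statistics match the figure): pushing the largest value far out shifts the mean from 5 to 6, while the median stays at 5.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]             # hypothetical, symmetric
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 18]    # largest value replaced by an extreme one

print(mean(data), median(data))                  # mean = 5, median = 5
print(mean(with_outlier), median(with_outlier))  # mean = 6, median = 5
```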

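Mode
[Figure: two frequency plots; one with Mode = 9, one without a mode]

Measures of Central Tendency or Location
• Mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n}$$
or, when k distinct values $x_i$ occur with frequencies $n_i$ (sample size $n = n_1 + n_2 + \dots + n_k$):
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \frac{n_1 x_1 + n_2 x_2 + \dots + n_k x_k}{n}$$
• Median:
$$\operatorname{med}(x) = \begin{cases} x_{(p+1)} & \text{if } n = 2p + 1 \\ \dfrac{x_{(p)} + x_{(p+1)}}{2} & \text{if } n = 2p \end{cases}$$
where $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ are the observations sorted in increasing order.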
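A direct Python transcription of the two formulas above, as a sketch: `median_by_rank` applies the odd/even rule using 1-based order statistics x_(k) (stored at 0-based index k - 1), and `weighted_mean` computes the mean from distinct values x_i and their frequencies n_i. Both function names and the 9-point-scale frequencies in the usage lines are hypothetical.

```python
def median_by_rank(values):
    """med(x) = x_(p+1) if n = 2p + 1, (x_(p) + x_(p+1)) / 2 if n = 2p."""
    x = sorted(values)
    n = len(x)
    p = n // 2
    if n % 2 == 1:                # n = 2p + 1: x_(p+1) sits at index p
        return x[p]
    return (x[p - 1] + x[p]) / 2  # n = 2p: average of x_(p) and x_(p+1)

def weighted_mean(levels, counts):
    """Mean from frequencies: (n1*x1 + ... + nk*xk) / n, with n = n1 + ... + nk."""
    n = sum(counts)
    return sum(c * x for x, c in zip(levels, counts)) / n

print(median_by_rank([6, 7, 8, 4, 5, 6]))                    # even n: (6 + 6) / 2
print(median_by_rank([11, 12, 13, 16, 16, 17, 18, 21, 22]))  # odd n: 5th value
# Hypothetical: number of consumers giving each score on a 9-point scale
print(weighted_mean([5, 6, 7, 8], [2, 5, 8, 5]))
```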

Mean or Median?
• Outliers present: median
• Many tied values (discrete variable): mean

Quartiles
The values at the boundaries of the 25th, 50th and 75th percentiles of a frequency distribution divided into four parts, each containing a quarter of the population.
[Figure: distribution divided into four parts of 25% each by Q1, Q2 and Q3]
Position of the ith quartile:
$$\text{Position}(Q_i) = \frac{i(n+1)}{4}$$
Data sorted in increasing order: 11 12 13 16 16 17 18 21 22
$$\text{Position}(Q_1) = \frac{1 \times (9+1)}{4} = 2.5$$
Position 2.5 lies halfway between the 2nd and 3rd values, so:
$$Q_1 = \frac{12 + 13}{2} = 12.5$$
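A sketch of the position rule Position(Q_i) = i(n + 1)/4 in Python, interpolating linearly when the position falls between two ranks. For comparison, the standard library's `statistics.quantiles(..., method="exclusive")` follows the same (n + 1) convention; other software (for example R's default `quantile` type) may return slightly different quartiles.

```python
import math
from statistics import quantiles

def quartile(values, i):
    """Q_i at position i(n + 1)/4, interpolating between neighbouring ranks."""
    x = sorted(values)
    n = len(x)
    pos = i * (n + 1) / 4          # 1-based, possibly fractional position
    pos = min(max(pos, 1), n)      # clamp to the observed range
    k = math.floor(pos)            # rank just below the position
    frac = pos - k
    if k == n:
        return x[-1]
    return x[k - 1] + frac * (x[k] - x[k - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q = [quartile(data, i) for i in (1, 2, 3)]
print(q)                                          # [12.5, 16.0, 19.5]
print(quantiles(data, n=4, method="exclusive"))   # same convention, same values
```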

1-4. Measures of Variability or Dispersion
• Range: difference between the maximum and minimum values
• Variance: mean* squared deviation from the mean
• Standard deviation: square root of the variance
*Definitions of population variance and sample variance differ slightly.

Dispersion
• Range:
$$\operatorname{Range}(x) = x_{(n)} - x_{(1)}$$
[Figure: two data sets on a 7-12 axis, both with Range = 12 - 7 = 5]
• Interquartile range:
$$\operatorname{IQR}(x) = q_{0.75} - q_{0.25}$$
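Reusing the sorted data from the quartile example, a minimal Python sketch of the two dispersion measures above. The interquartile range uses the (n + 1)-position quartiles via `statistics.quantiles` with `method="exclusive"`; that choice is an assumption, and other quartile conventions give slightly different IQRs.

```python
from statistics import quantiles

def data_range(values):
    """Range(x) = x_(n) - x_(1): largest minus smallest observation."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range q0.75 - q0.25, quartiles at positions i(n + 1)/4."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(data_range(data), iqr(data))  # 11 7.0
```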

Mean (average)
Given a series of values $x_i$ ($i = 1, \dots, n$): $x_1, x_2, \dots, x_n$, the mean is:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Study 1: the color scores of 6 consumers are 6, 7, 8, 4, 5 and 6. The mean is:
$$\bar{x} = \frac{6 + 7 + 8 + 4 + 5 + 6}{6} = \frac{36}{6} = 6$$
Study 2: the color scores of 4 consumers are 10, 2, 3 and 9. The mean is:
$$\bar{x} = \frac{10 + 2 + 3 + 9}{4} = \frac{24}{4} = 6$$

Variation
The mean does not adequately describe the data; we also need to know the variation in the data. An obvious measure is the sum of the differences from the mean. For study 1 (scores 6, 7, 8, 4, 5 and 6):
$$(6-6) + (7-6) + (8-6) + (4-6) + (5-6) + (6-6) = 0 + 1 + 2 - 2 - 1 + 0 = 0$$
NOT SATISFACTORY! The positive and negative differences cancel: differences from the mean always sum to zero.

Sum of squares
We need to make the differences positive by squaring them. The result is called the “sum of squares” (SS).
For study 1 (6, 7, 8, 4, 5, 6):
$$SS = (6-6)^2 + (7-6)^2 + (8-6)^2 + (4-6)^2 + (5-6)^2 + (6-6)^2 = 10$$
For study 2 (10, 2, 3, 9):
$$SS = (10-6)^2 + (2-6)^2 + (3-6)^2 + (9-6)^2 = 50$$
This is better! But it does not take the sample size n into account.
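A short Python check of the two steps above for both studies: the plain deviations sum to zero, which is why they are squared, and the resulting sums of squares are 10 and 50.

```python
from statistics import mean

study1 = [6, 7, 8, 4, 5, 6]
study2 = [10, 2, 3, 9]

def sum_of_deviations(values):
    m = mean(values)
    return sum(x - m for x in values)         # always 0: positives cancel negatives

def sum_of_squares(values):
    m = mean(values)
    return sum((x - m) ** 2 for x in values)  # SS: squaring removes the signs

for scores in (study1, study2):
    print(sum_of_deviations(scores), sum_of_squares(scores))
```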

Variance
We have to divide SS by the sample size n. But each squared deviation uses the sample mean, which is itself computed from the same data, so we lose 1 degree of freedom. Therefore the correct denominator is n - 1. The result is called the variance (denoted $s^2$):
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$$
Or, in summation notation:
$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
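Dividing the sum of squares by n - 1 rather than n, as a Python sketch. The result agrees with the standard library's `statistics.variance`, which also uses the n - 1 denominator; `statistics.pvariance` divides by n instead.

```python
from statistics import mean, variance, pvariance

def sample_variance(values):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = mean(values)
    ss = sum((x - x_bar) ** 2 for x in values)
    return ss / (n - 1)  # one degree of freedom is used up by x_bar

study1 = [6, 7, 8, 4, 5, 6]
print(sample_variance(study1), variance(study1))  # both equal 2
print(pvariance(study1))                           # SS / n = 10 / 6
```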

1-5. Variance and Standard Deviation
Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}, \qquad \sigma = \sqrt{\sigma^2}$$
Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}, \qquad s = \sqrt{s^2}$$

Variance - example
For study 1 (6, 7, 8, 4, 5, 6), the variance is:
$$s^2 = \frac{(6-6)^2 + (7-6)^2 + (8-6)^2 + (4-6)^2 + (5-6)^2 + (6-6)^2}{6 - 1} = \frac{10}{5} = 2$$
For study 2 (10, 2, 3, 9), the variance is:
$$s^2 = \frac{(10-6)^2 + (2-6)^2 + (3-6)^2 + (9-6)^2}{4 - 1} = \frac{50}{3} = 16.7$$
The scores in study 2 were much more variable than those in study 1.

Standard deviation
The problem with the variance is that it is expressed in squared units, whereas the mean is in the actual unit. We need a way to convert the variance back to the actual unit of measurement, so we take its square root. This is called the “standard deviation” (denoted s).
For study 1, s = √2 = 1.41
For study 2, s = √16.7 = 4.1
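Taking the square root brings the spread back to score units. The Python sketch below computes s for both studies and also checks the computational (shortcut) form of the sample variance, s² = [Σx² - (Σx)²/n]/(n - 1), against the definitional form. The function names are ours.

```python
import math

def variance_definitional(values):
    """s^2 from squared deviations around the mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def variance_shortcut(values):
    """s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1); no deviations needed."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

for scores in ([6, 7, 8, 4, 5, 6], [10, 2, 3, 9]):
    s2 = variance_definitional(scores)
    assert math.isclose(s2, variance_shortcut(scores))  # both forms agree
    print(f"s^2 = {s2:.2f}, s = {math.sqrt(s2):.2f}")
```

The shortcut form is convenient by hand, but in floating point it can lose precision when the values are large relative to their spread, so software usually prefers the definitional form.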

Standard Deviation
[Figure: three data sets plotted on a scale from 11 to 21, all with mean = 15.5. Data A: s = 3.338. Data B: s = 0.9258. Data C: s = 4.57.]

1-6. Form indicators: Skewness & Kurtosis
Skewness
• Measure of asymmetry of a frequency distribution
  • Skewed to left
  • Symmetric or unskewed
  • Skewed to right
Kurtosis
• Measure of flatness or peakedness of a frequency distribution
  • Platykurtic (relatively flat)
  • Mesokurtic (normal)
  • Leptokurtic (relatively peaked)

Skewness
Skewed to left: mean < median < mode
[Figure: histogram of a left-skewed distribution (x axis: x; y axis: frequency)]

Kurtosis
Platykurtic: flat distribution
[Figure: histogram of a platykurtic distribution (x axis: X; y axis: frequency)]

Kurtosis
Mesokurtic: not too flat and not too peaked
[Figure: histogram of a mesokurtic distribution (x axis: X; y axis: frequency)]
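The slides do not give formulas for the form indicators. As an assumption, the Python sketch below uses the common moment-based definitions γ1 = m3 / m2^(3/2) (skewness) and γ2 = m4 / m2² - 3 (excess kurtosis, 0 for a normal distribution), where m_k is the k-th central moment with denominator n. Statistical packages often apply small-sample corrections, so their values can differ slightly. The data sets are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** k for x in values) / n

def skewness(values):
    """gamma_1 = m3 / m2**1.5: < 0 skewed to left, 0 symmetric, > 0 skewed to right."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """gamma_2 = m4 / m2**2 - 3: < 0 platykurtic, about 0 mesokurtic, > 0 leptokurtic."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
left_skewed = [1, 8, 9, 9, 10]        # long tail toward small values
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # uniform-like, hence platykurtic

print(skewness(symmetric), skewness(left_skewed), excess_kurtosis(flat))
```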

Diagram

Quantitative variable


Quantitative variable
To see the distribution in more detail, split a class: the 21 observations between 1.65 m and 1.70 m break down into 8 in [1.65; 1.675] and 13 in [1.675; 1.70].
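To show how splitting a class reveals more detail, a Python sketch that counts observations in consecutive classes. The heights are hypothetical (the slide's raw data are not given), and the classes are taken as half-open [a, b) so that a value on a shared boundary such as 1.675 m is counted once; the class limits follow the slide.

```python
def class_counts(values, edges):
    """Count values in consecutive half-open classes [edges[j], edges[j+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for j in range(len(edges) - 1):
            if edges[j] <= v < edges[j + 1]:
                counts[j] += 1
                break
    return counts

# Hypothetical heights in metres
heights = [1.652, 1.660, 1.671, 1.676, 1.680, 1.688, 1.693, 1.699]

print(class_counts(heights, [1.65, 1.70]))         # one wide class: [8]
print(class_counts(heights, [1.65, 1.675, 1.70]))  # split in two: [3, 5]
```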

Quantitative variable: boxplot
[Figure: boxplot, with values beyond the whiskers marked x]
• Upper whisker: largest value below q0.75 + 1.5(q0.75 - q0.25)
• Box: q0.75, median, q0.25
• Lower whisker: smallest value above q0.25 - 1.5(q0.75 - q0.25)

Box-and-whisker plot
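A Python sketch of the whisker rule above: the fences lie 1.5 × IQR beyond the quartiles, each whisker stops at the most extreme observation still inside its fence, and anything beyond is flagged as an outlier (drawn as x). The data are the quartile example with a hypothetical extreme value of 40 appended. Quartiles come from `statistics.quantiles(method="exclusive")`; R's `boxplot` uses Tukey's hinges instead, so the drawn box may differ slightly.

```python
from statistics import quantiles

def box_whiskers(values):
    """Return (lower whisker, q0.25, median, q0.75, upper whisker, outliers)."""
    q1, med, q3 = quantiles(values, n=4, method="exclusive")
    step = 1.5 * (q3 - q1)
    low_fence, high_fence = q1 - step, q3 + step
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return min(inside), q1, med, q3, max(inside), outliers

data = [11, 12, 13, 16, 16, 17, 18, 21, 22, 40]  # 40 is a hypothetical extreme value
print(box_whiskers(data))
```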

Form indicators
[Figure: three boxplots comparing the positions of Q1, Q2 and Q3: asymmetry (γ1 < 0), symmetry, asymmetry (γ1 > 0)]

Principles of a good “figure”
§ Present complex results clearly, accurately and efficiently
§ Convey many ideas in the most efficient way
§ Do not lie!

A BAD figure
[Figure: “Fig. Digestion interactions of coral”: bar chart of wins and losses by coral taxon]

A GOOD figure
Figure 3. Digestion interactions for coral taxa sampled at Pioneer Bay, Orpheus Island
[Figure: bar chart of the frequency of wins and losses for each taxon (Acroporidae, Porites (M), Mussidae, Alcyonacea, Porites (B), Algae, Faviidae, Sponges); x axis: Taxon; y axis: Frequency]
