# Program & Bibliography

3(1,2): ~5 theory (301, B2) +10 practice (Comp. Chem. Lab by group)

INFORMATICS IN FOOD TECHNOLOGY (TIN HỌC TRONG CNTP)
Nguyễn Hoàng Dũng, PhD, Trường Đại học Bách khoa Tp. HCM (Ho Chi Minh City University of Technology)

- R: www.r-project.org

Problem
Foreign and Vietnamese Cheeses: Quality and Preference?
How to conduct a research study:
1. Sampling
2. Measurement
3. Collect data
4. Analyze and present your results
Sensory practices

1-1. Samples and Populations
A population consists of the set of all measurements in which the investigator is interested. A sample is a subset of the measurements selected from the population. A census is a complete enumeration of every item in a population.

Simple Random Sample
Sampling from the population is often done randomly, such that every possible sample of equal size (n) has an equal chance of being selected. A sample selected in this way is called a simple random sample, or just a random sample.
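A minimal sketch of drawing a simple random sample, assuming the population can be listed as ID numbers. The population of 50 consumers, the sample size, and the helper name `simple_random_sample` are hypothetical and only serve the illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: ID numbers of N = 50 cheese consumers.
population = list(range(1, 51))
sample = simple_random_sample(population, n=8, seed=1)
print(sample)
```

Fixing the seed is only a convenience for repeating the same draw; in a real survey the seed would normally be left unset.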

Samples and Populations

Population (N)
Sample (n)

Measurements
• The assigning of numbers to the values of a variable (SS Stevens, Science 1946;103:677-80)
• Rules specify procedures to assign numbers to values

The criteria of “science”
Science
• Logic, experimental evidence
• Results are repeatable
• Falsifiability*
• Peer-reviewed journals
• Evolution / learn from mistakes

Pseudoscience
• Belief, loyalty
• Results are not repeatable
• Not falsifiable
• Not in peer-reviewed journals
• Constant, unchanged belief

*Capable of being tested (verified or falsified) by experiment or observation
Criteria of measurements
• Validity: the measurement measures what it purports to measure
• Accuracy: the degree of “truthfulness” of the attribute being measured
• Reliability (precision): consistency and repeatability
• Sensitivity to important variation

Accuracy vs reliability (precision)
Measurement error decreases the accuracy of measurement.
Some important concepts: Data - Variables - Scales
Qualitative - Categorical (Frequency) or Nominal. Examples are:
• Color
• Gender
• Nationality

Quantitative - Measurable or Countable. Examples are:
• Temperatures
• Humidity
• Gross compounds
• Preference points scored on a 100-point scale

Example questionnaire:
GENERAL INFORMATION
1.1 Description of the respondent
1.1.1 Gender of the respondent? 1. Male 2. Female
Marital status: 1. Single 2. Married
1.1.2 Age of the respondent?
• Under 25
• 25 – 30
• 31 – 54
• > 55
1.1.3 What is your current occupation?
• Pupil, student
• Doctor/teacher
• Worker / hired laborer / salesperson
• Retired
1.1.4 Which of the following levels describes your household income?
1. Low (≥ 2 million VND and < 5 million)
2. Medium (≥ 5 million and < 8 million)
3. High (≥ 8 million)
Some important concepts: Data - Variables - Scales
Variable
• Discrete variables
• Continuous variables
• Independent variables
• Dependent variables

Measurement scales
• Nominal scales (label)
• Ordinal scales (ranks in the army)
• Interval scales (Celsius, Fahrenheit)
• Ratio scales (true zero point, ratio)

Sensory data set
• 8 cheeses (EdamF, EdamH, GoudaH, m1, m2, m3, m4, m5)
• 11 panelists (experts)
• 3 replicates
• 15 descriptive terms: sour, bitterness, umami, salty, greasiness, butter_odor, milk_odor, acrid, rancid, lactic, cheese_flavor, acetic, full flavor, yellow, hard
• Unstructured line scale from 0 to 100 mm

Types of measurement
• Qualitative: Nominal, Ordinal
• Quantitative: Interval, Ratio

Qualitative measurements
Nominal level
• Classification
• A set of objects can be classified into exhaustive, mutually exclusive categories, each with a unique symbol
• Ex: religion, sex, location, etc.

Ordinal level
• Classification + Ordering
• A set of numbers can be assigned rank values and nothing more
• Ex: socio-economic status, education, levels of satisfaction, etc.

Quantitative measurements
Interval level
• Classification + Ordering + Standard distance
• A set of objects can be described by units that indicate how far one case is from another case
• Ex: temperature

Ratio level
• Classification + Ordering + Standard distance + Natural zero
• Quantitative variable with natural zero
• Ex: income, age, weight, bone mineral density

1.2.2. Which type of hard cheese do you usually consume?
• Cheddar
• Gouda
• Edam
• Emental
• Other (please specify) ……………………..
1.2.4. Please rate your overall liking of semi-hard cheese products: 1 2 3 4 5 6 7 8 9
1.2.5. How often do you consume semi-hard cheese products?
• > 3 times/week
• 1 – 2 times/week
• 1-3 times/month
1.2.6. How much semi-hard cheese do you consume per week?
• < 100 g
• 100 – 300 g
• > 300 g
1.2.7. In your opinion, which products is hard cheese eaten with?
• Bread
• Sandwich
• Salad
• Biscuits
• Wine
• Other (please specify) ………………………………
1.2.8. When buying hard cheese products, how much do you care about the following factors (1 = do not care at all, 2 = do not care, 3 = no opinion, 4 = care, 5 = care very much)?
• Price: 1 2 3 4 5
• Sensory properties of the product: 1 2 3 4 5
• Familiarity: 1 2 3 4 5
• Convenience of use: 1 2 3 4 5
• Health benefits: 1 2 3 4 5
• Product weight: 1 2 3 4 5

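Example data (product m1, session 1):

| judge | session | product | sour | bitterness | umami | salty |
|-------|---------|---------|------|------------|-------|-------|
| S1    | 1       | m1      | 50   | 18         | 0     | 40    |
| S2    | 1       | m1      | 100  | 65         | 40    | 100   |
| S3    | 1       | m1      | 32   | 11         | 35    | 4     |
| S4    | 1       | m1      | 30   | 10         | 25    | 1     |
| S5    | 1       | m1      | 60   | 23         | 30    | 29    |
| S6    | 1       | m1      | 30   | 35         | 25    | 50    |
| S7    | 1       | m1      | 50   | 32         | 45    | 64    |
| S8    | 1       | m1      | 32   | 23         | 40    | 40    |
| S9    | 1       | m1      | 78   | 27         | 45    | 21    |
| S10   | 1       | m1      | 55   | 30         | 34    | 18    |
| S11   | 1       | m1      | 62   | 21         | 43    | 32    |

Summary Measures
Population Parameters / Sample Statistics

Measures of Central Tendency
• Median
• Mode
• Mean

Measures of Variability
• Range
• Variance
• Standard Deviation

Other summary measures:
– Skewness
– Kurtosis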
1-3. Measures of Central Tendency or Location
• Median
  – Middle value when sorted in order of magnitude
  – 50th percentile
• Mode
  – Most frequently occurring value
• Mean
  – Average

Arithmetic Mean or Average
The mean of a set of observations is their average: the sum of the observed values divided by the number of observations.

Population Mean
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Sample Mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
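As an illustration of the definition, the sketch below applies "sum divided by count" to the sour scores given by the 11 judges for product m1 in session 1 (the values listed in this section). The helper name `arithmetic_mean` is an illustrative choice; the result is cross-checked against Python's `statistics.mean`.

```python
from statistics import mean

# Sour scores of the 11 judges for product m1, session 1.
sour = [50, 100, 32, 30, 60, 30, 50, 32, 78, 55, 62]

def arithmetic_mean(values):
    """Sum of the observed values divided by the number of observations."""
    return sum(values) / len(values)

x_bar = arithmetic_mean(sour)
print(round(x_bar, 2))        # 579 / 11, about 52.64
print(round(mean(sour), 2))   # same value from the standard library
```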

Arithmetic Mean or Average
Affected by outliers.

Median
Robust measure of central tendency; not affected by outliers.

[Figure: dot plots on a 0 to 10 axis and on a 0 to 14 axis. Moving one point to an extreme value changes the mean from 5 to 6, while the median stays at 5.]
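The effect can be reproduced with a small sketch. The two data sets are hypothetical, chosen only so that the means come out as 5 and 6 with a median of 5 in both cases, as on the slide.

```python
from statistics import mean, median

base = [1, 2, 3, 4, 5, 6, 7, 8, 9]           # hypothetical scores
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 18]  # largest value replaced by an outlier

for label, data in (("base", base), ("with outlier", with_outlier)):
    # The mean follows the outlier; the median does not move.
    print(label, "mean =", mean(data), "median =", median(data))
```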

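Mode
[Figure: two dot plots. The data set on the 0 to 14 axis has Mode = 9; the data set on the 0 to 6 axis has no mode.]

Measures of Central Tendency or Location
• Mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n}$$
For data grouped into k distinct values x_i with frequencies n_i, where n is the sample size:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \frac{n_1 x_1 + n_2 x_2 + \dots + n_k x_k}{n}$$

• Median:
$$\operatorname{med}(x) = \begin{cases} x_{(p+1)} & \text{if } n = 2p + 1 \\ \dfrac{x_{(p)} + x_{(p+1)}}{2} & \text{if } n = 2p \end{cases}$$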

Mean or Median?
• Outliers: use the median
• Many ties (« ex aequo », discrete variable): use the mean
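The two location formulas can be sketched in code: the median taken from the sorted values (odd or even n), and the mean of grouped data computed from a frequency table. The function names and the frequency table `liking` are hypothetical; the sour scores are those of product m1, session 1.

```python
def median_by_rank(values):
    """Median from order statistics: x_(p+1) if n = 2p+1, else (x_(p) + x_(p+1)) / 2."""
    x = sorted(values)
    n = len(x)
    p = n // 2
    if n % 2 == 1:                 # n = 2p + 1
        return x[p]                # x_(p+1) in 1-based notation
    return (x[p - 1] + x[p]) / 2   # n = 2p

def grouped_mean(freq):
    """Mean of grouped data: (1/n) * sum(n_i * x_i), with n = sum(n_i)."""
    n = sum(freq.values())
    return sum(count * value for value, count in freq.items()) / n

sour = [50, 100, 32, 30, 60, 30, 50, 32, 78, 55, 62]  # n = 11 (odd)
print(median_by_rank(sour))            # 50
print(median_by_rank([10, 2, 3, 9]))   # 6.0 (n = 4, even)

# Hypothetical frequency table {score: count} for a 9-point liking scale.
liking = {5: 2, 6: 4, 7: 3, 8: 1}
print(grouped_mean(liking))            # 6.3
```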

Quartiles
The values at the boundaries of the 25th, 50th, and 75th percentiles of a frequency distribution divided into four parts, each containing a quarter of the population.

[Figure: distribution divided into four parts of 25% each by Q1, Q2, and Q3.]

Position of the ith quartile:
$$\text{Position of } Q_i = \frac{i(n+1)}{4}$$

Example. Data sorted in increasing order: 11 12 13 16 16 17 18 21 22 (n = 9)
$$\text{Position of } Q_1 = \frac{1(9+1)}{4} = 2.5$$
$$Q_1 = \frac{12 + 13}{2} = 12.5$$
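A short sketch of the position rule i(n+1)/4 applied to the nine values above. Linear interpolation between neighbouring ranks is assumed for fractional positions (which gives the midpoint at position 2.5), and the function name `quartile` is illustrative. As far as this convention goes, it matches `statistics.quantiles` with `method="exclusive"`, used here as a cross-check.

```python
import math
from statistics import quantiles

def quartile(values, i):
    """i-th quartile (i = 1, 2, 3) at position i(n+1)/4, interpolating linearly."""
    x = sorted(values)
    n = len(x)
    pos = i * (n + 1) / 4            # 1-based position in the sorted data
    lower = math.floor(pos)
    frac = pos - lower
    lower = min(max(lower, 1), n)    # clamp to valid ranks for very small n
    upper = min(lower + 1, n)
    return x[lower - 1] + frac * (x[upper - 1] - x[lower - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print([quartile(data, i) for i in (1, 2, 3)])     # [12.5, 16.0, 19.5]
print(quantiles(data, n=4, method="exclusive"))   # same convention
```

Other software may use a different quartile convention and return slightly different values for small samples.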
1-4. Measures of Variability or Dispersion
Range
• Difference between maximum and minimum values
Variance
• Mean* squared deviation from the mean
Standard Deviation
• Square root of the variance

Dispersion
• Range:
$$\operatorname{Range}(x) = x_{(n)} - x_{(1)}$$
[Figure: two dot plots on a 7 to 12 axis, each with Range = 12 - 7 = 5.]

• Interquartile range:
$$q_{0.75} - q_{0.25}$$
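Both dispersion measures can be sketched in a few lines. The data are the nine sorted values from the quartile example; computing the quartiles with `statistics.quantiles(..., method="exclusive")`, which follows the i(n+1)/4 position rule, is an assumption about the convention intended here.

```python
from statistics import quantiles

def value_range(values):
    """Range(x) = x_(n) - x_(1): largest minus smallest observation."""
    return max(values) - min(values)

def interquartile_range(values):
    """q_0.75 - q_0.25, with quartiles at positions i(n+1)/4 (assumed convention)."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(value_range(data))           # 22 - 11 = 11
print(interquartile_range(data))   # 19.5 - 12.5 = 7.0
```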

*Definitions of population variance and sample variance differ slightly.

Mean (average)
Given a series of values x_i (i = 1, …, n): x_1, x_2, …, x_n, the mean is:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Study 1: the color scores of 6 consumers are: 6, 7, 8, 4, 5, and 6. The mean is:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{6 + 7 + 8 + 4 + 5 + 6}{6} = \frac{36}{6} = 6$$

Study 2: the color scores of 4 consumers are: 10, 2, 3, and 9. The mean is:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{10 + 2 + 3 + 9}{4} = \frac{24}{4} = 6$$

Variation
The mean does not adequately describe the data. We need to know the variation in the data. An obvious measure is the sum of the differences from the mean.
For study 1, the scores 6, 7, 8, 4, 5, and 6, we have:
$$(6-6) + (7-6) + (8-6) + (4-6) + (5-6) + (6-6) = 0 + 1 + 2 - 2 - 1 + 0 = 0$$
NOT SATISFACTORY!

Sum of squares
We need to make the differences positive by squaring them. This is called the “Sum of squares” (SS).
For study 1: 6, 7, 8, 4, 5, 6, we have:
$$SS = (6-6)^2 + (7-6)^2 + (8-6)^2 + (4-6)^2 + (5-6)^2 + (6-6)^2 = 10$$
For study 2: 10, 2, 3, 9, we have:
$$SS = (10-6)^2 + (2-6)^2 + (3-6)^2 + (9-6)^2 = 50$$
This is better! But it does not take into account the sample size n.
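The two calculations can be checked with a short sketch: the plain deviations from the mean always cancel out, while the squared deviations do not. The function names are illustrative.

```python
def deviations(values):
    """Differences between each value and the mean."""
    m = sum(values) / len(values)
    return [x - m for x in values]

def sum_of_squares(values):
    """SS: sum of squared deviations from the mean."""
    return sum(d ** 2 for d in deviations(values))

study1 = [6, 7, 8, 4, 5, 6]
study2 = [10, 2, 3, 9]

print(sum(deviations(study1)))   # 0.0: positive and negative deviations cancel
print(sum_of_squares(study1))    # 10.0
print(sum_of_squares(study2))    # 50.0
```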
Variance
To account for sample size, we divide the SS by it. Because every squared deviation is computed from the sample mean, one degree of freedom is lost, so the correct denominator is n − 1 rather than n. The result is called the variance (denoted by s²):
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$$
Or, in sum notation:
$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
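A minimal sketch of the sample variance with the n − 1 denominator, applied to the two color-score studies. The helper name `sample_variance` is illustrative; `statistics.variance` from the standard library uses the same denominator and serves as a cross-check.

```python
from statistics import variance

def sample_variance(values):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

study1 = [6, 7, 8, 4, 5, 6]
study2 = [10, 2, 3, 9]

print(sample_variance(study1))             # 10 / 5 = 2.0
print(round(sample_variance(study2), 1))   # 50 / 3, about 16.7
print(variance(study1))                    # standard-library cross-check
```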
1-5. Variance and Standard Deviation
Population Variance
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$$
$$\sigma = \sqrt{\sigma^2}$$

Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$
$$s = \sqrt{s^2}$$

Variance - example
For study 1: 6, 7, 8, 4, 5, and 6, the variance is:
$$s^2 = \frac{(6-6)^2 + (7-6)^2 + (8-6)^2 + (4-6)^2 + (5-6)^2 + (6-6)^2}{6 - 1} = \frac{10}{5} = 2$$

For study 2: 10, 2, 3, 9, the variance is:
$$s^2 = \frac{(10-6)^2 + (2-6)^2 + (3-6)^2 + (9-6)^2}{4 - 1} = \frac{50}{3} = 16.7$$

The scores in study 2 were much more variable than those in study 1.

Standard deviation
The problem with variance is that it is expressed in squared units, whereas the mean is in the actual unit. We need a way to convert variance back to the actual unit of measurement. We take the square root of the variance; this is called the “standard deviation” (denoted by s).
For study 1, s = sqrt(2) = 1.41
For study 2, s = sqrt(16.7) = 4.1
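The sketch below computes s for both studies with the standard library and also with the computational shortcut s² = (Σx² − (Σx)²/n)/(n − 1). The helper name `stdev_shortcut` is illustrative. The population version (`pstdev`, which divides by N) is shown for comparison, since the two definitions differ slightly.

```python
import math
from statistics import pstdev, stdev

def stdev_shortcut(values):
    """s via the shortcut: sqrt((sum(x^2) - (sum(x))^2 / n) / (n - 1))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

study1 = [6, 7, 8, 4, 5, 6]
study2 = [10, 2, 3, 9]

print(round(stdev(study1), 2), round(stdev_shortcut(study1), 2))   # 1.41 1.41
print(round(stdev(study2), 1), round(stdev_shortcut(study2), 1))   # 4.1 4.1
print(round(pstdev(study1), 2))   # 1.29: population formula divides by N
```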
Standard Deviation
[Figure: three dot plots on an 11 to 21 axis, all with the same mean but different spread.]
• Data A: Mean = 15.5, s = 3.338
• Data B: Mean = 15.5, s = 0.9258
• Data C: Mean = 15.5, s = 4.57

1-6 Form indicators: Skewness & Kurtosis
Skewness
• Measure of asymmetry of a frequency distribution
  – Skewed to left
  – Symmetric or unskewed
  – Skewed to right
Kurtosis
• Measure of flatness or peakedness of a frequency distribution
  – Platykurtic (relatively flat)
  – Mesokurtic (normal)
  – Leptokurtic (relatively peaked)

Skewness
Skewed to left: Mean < median < mode
[Figure: histogram of Frequency against x for a distribution skewed to the left.]

Kurtosis
Platykurtic: flat distribution
[Figure: histogram of Frequency against X.]

Kurtosis
Mesokurtic: not too flat and not too peaked
[Figure: histogram of Frequency against X.]
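The slides do not give formulas for these two indicators. One common convention, assumed here, uses central moments m_k = (1/n) Σ (x_i − x̄)^k: skewness γ1 = m3 / m2^(3/2) and excess kurtosis m4 / m2² − 3 (zero for a normal distribution). The data sets below are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - x_bar)^k."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** k for x in values) / n

def skewness(values):
    """Moment coefficient of skewness: m3 / m2^(3/2) (assumed convention)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """m4 / m2^2 - 3: negative = platykurtic, positive = leptokurtic."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]                 # hypothetical, symmetric
left_skewed = [1, 7, 8, 9, 9, 10, 10, 10]   # hypothetical, long tail to the left

print(skewness(symmetric))          # 0.0
print(skewness(left_skewed) < 0)    # True: mean 8 < median 9 < mode 10
print(excess_kurtosis(symmetric))   # negative: flatter than a normal curve
```

Statistical packages often apply small-sample corrections to these coefficients, so their output can differ slightly from this sketch.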

Diagram

Quantitative variable
If we want to see more detail: the 21 observations between 1.65 m and 1.70 m split into 8 in [1.65; 1.675] and 13 in [1.675; 1.70].

Quantitative variable: boxplot (box-and-whisker plot)
[Figure: boxplot, read from top to bottom.]
• x: points beyond the whiskers
• Upper whisker: largest value below q0.75 + 1.5(q0.75 − q0.25)
• q0.75
• Median
• q0.25
• Lower whisker: smallest value above q0.25 − 1.5(q0.75 − q0.25)
• x: points beyond the whiskers
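The whisker rule can be sketched as follows, using the salty scores of product m1, session 1. The quartile convention (`statistics.quantiles` with `method="exclusive"`) and the function name `boxplot_stats` are assumptions; plotting software may compute the quartiles slightly differently.

```python
from statistics import quantiles

def boxplot_stats(values):
    """Quartiles, whisker ends, and outlying points under the 1.5 x IQR rule."""
    x = sorted(values)
    q1, med, q3 = quantiles(x, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in x if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": med, "q3": q3,
        "lower_whisker": inside[0],    # smallest value above the lower fence
        "upper_whisker": inside[-1],   # largest value below the upper fence
        "outliers": [v for v in x if v < low_fence or v > high_fence],
    }

# Salty scores of the 11 judges for product m1, session 1.
salty = [40, 100, 4, 1, 29, 50, 64, 40, 21, 18, 32]
stats = boxplot_stats(salty)
print(stats)
```

With these scores the upper fence is 50 + 1.5 × 32 = 98, so the score of 100 is drawn as a separate point and the upper whisker stops at 64.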
Form indicators
• γ1 < 0: asymmetry (skewed to the left)
• γ1 = 0: symmetry
• γ1 > 0: asymmetry (skewed to the right)
[Figure: three boxplots showing the relative positions of Q1, Q2, and Q3 in each case.]

Principles of a good figure
• Present complex results clearly, accurately, and efficiently
• Present many ideas in the most efficient way
• Do not lie!

Fig. Digestion interactions of coral
[Figure: bar chart of the Frequency of Wins and Losses by taxon, with rotated and crowded axis labels.]

A GOOD figure
Figure 3. Digestion interactions for coral taxa sampled at Pioneer Bay, Orpheus Island
[Figure: bar chart of the frequency (Freq.) of Wins and Losses by Taxon: Acroporidae, Porites (M), Mussidae, Alcyonaceans, Porites (B), Algae, Faviidae, Sponges.]

