
Analysis of Variance

(ANOVA)
Test of means when there are more than two groups
Lectured by: Sokly SIEV, Dr. Eng.

Reference: Environmental Statistics class by Assoc. Prof. Chihiro Yoshimura @Tokyo Tech
[Figure: samples drawn from populations with means $\mu_1$, $\mu_2$, $\mu_3$]

                         When H0 is true    When H0 is false
If H0 is rejected        Type I error       No error
If H0 is not rejected    No error           Type II error


Probability of committing at least one type I error

$1 - 0.95^{3} \approx 0.14$
$1 - 0.99^{15} \approx 0.14$
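
The two values above can be reproduced with a short R sketch. The helper name familywise_error is hypothetical, and the formula assumes the individual tests are independent, which is only an approximation for pairwise t tests on the same groups. Reading the exponents as 3 and 15 pairwise comparisons (3 and 6 groups) is also an assumption.

```r
# Probability of at least one type I error across k independent tests,
# each run at significance level alpha (hypothetical helper).
familywise_error <- function(alpha, k) 1 - (1 - alpha)^k

k3 <- choose(3, 2)                       # 3 pairwise comparisons among 3 groups
fwe_3 <- familywise_error(0.05, k3)      # alpha = 0.05 per test

k6 <- choose(6, 2)                       # 15 pairwise comparisons among 6 groups
fwe_15 <- familywise_error(0.01, k6)     # alpha = 0.01 per test

round(c(fwe_3, fwe_15), 2)
```

Even with a small per-test α, the chance of at least one false rejection grows with the number of comparisons, which is the motivation for a single F test.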
ANOVA = F test

Related tests:
t test, Paired t test
One-way ANOVA, Factorial ANOVA, Repeated-Measures ANOVA, Multivariate ANOVA (MANOVA)
Kruskal-Wallis Test, Friedman's Test, Quade Test
Blood pressure was decreased in four weeks.
[Figure: highest pressure by week for Blended Barley Tea and Sesame Barley Tea; average of 74 persons]


H0: $\mu_1 = \mu_2 = \cdots = \mu_k$, or equivalently,
there is no difference in means among the groups.

ANOVA: Analysis of Variance -> F statistic

ANOVA tests whether the means of several groups are all equal,
and therefore generalizes the t test to more than two groups.

Assumptions:
Independent, additive, homoscedastic, normally distributed
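Between groups: $(\bar{Y}_j - \bar{Y})^2$
Within groups: $(Y - \bar{Y}_j)^2$ for each observation Y in group j

F = Group MS / Error MS

[Figure: two sets of group distributions with means $\bar{Y}_j$. Group means close together relative to the within-group spread give a smaller F; group means far apart give a larger F.]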
H0: Blood pressures under different treatments are all equal after a four-week treatment.

Data 1. Blood pressure after four-week treatment

Drink A   Drink B   Drink C   Drink D
142.6     100.8     108.7     127.8
142.1      97.0     107.7     124.2
140.2     105.0     114.0     123.1
136.5      98.6     106.3     125.7
NS        101.7     109.8     130.3

H0: $\mu_A = \mu_B = \mu_C = \mu_D$
HA: The mean pressures under the four treatments are not all equal.
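
Before testing H0, the homoscedasticity and normality assumptions can be checked on Data 1. The sketch below is one possible way to do this in R, not necessarily the procedure used in the lecture. The choice of shapiro.test and bartlett.test is an assumption, and the NS entry for Drink A is treated as a missing value and left out.

```r
# Data 1 in long format; the NS entry for Drink A is omitted (assumed missing).
data1 <- data.frame(
  drink = factor(rep(c("A", "B", "C", "D"), times = c(4, 5, 5, 5))),
  pressure = c(142.6, 142.1, 140.2, 136.5,
               100.8, 97.0, 105.0, 98.6, 101.7,
               108.7, 107.7, 114.0, 106.3, 109.8,
               127.8, 124.2, 123.1, 125.7, 130.3)
)

fit <- aov(pressure ~ drink, data = data1)

# Normality of the residuals (Shapiro-Wilk test)
normality <- shapiro.test(residuals(fit))

# Equal variances across the four drinks (Bartlett test)
equal_var <- bartlett.test(pressure ~ drink, data = data1)

normality
equal_var
```

Large p-values in both tests would give no evidence against the assumptions; with only four or five observations per group, such checks have little power.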
[Figure: blood pressure (y axis) by treatment (x axis): Drink A, Drink B, Drink C, Drink D]
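ANOVA Table

Source of variation   Sum of squares (SS)                        df      Mean square (MS)
Group                 $\sum_i n_i (\bar{Y}_i - \bar{Y})^2$       m - 1   group SS / group df
Error                 $\sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2$     n - m   error SS / error df
Total                 $\sum_i \sum_j (Y_{ij} - \bar{Y})^2$       n - 1

Here m is the number of groups, n is the total number of observations, $n_i$ is the number of observations in group i, $\bar{Y}_i$ is the mean of group i, and $\bar{Y}$ is the grand mean.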

Source of variation   SS       df   MS       F       p
Group                 4224.7    3   1408.2   165.0   < 0.001
Error                  128.0   15      8.5
Total                 4352.7   18

Critical value: $F_{0.05(1),3,15} = 3.29$
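
As a check on the table, the sums of squares can be computed directly from Data 1 with the formulas in the ANOVA table. This is a sketch in base R; the variable names are illustrative, and the NS entry for Drink A is treated as missing.

```r
# Data 1 by group; the NS entry for Drink A is omitted (assumed missing).
groups <- list(
  A = c(142.6, 142.1, 140.2, 136.5),
  B = c(100.8, 97.0, 105.0, 98.6, 101.7),
  C = c(108.7, 107.7, 114.0, 106.3, 109.8),
  D = c(127.8, 124.2, 123.1, 125.7, 130.3)
)

y <- unlist(groups)
n <- length(y)            # total number of observations
m <- length(groups)       # number of groups
grand_mean <- mean(y)

# Group SS: n_i * (group mean - grand mean)^2, summed over groups
group_ss <- sum(sapply(groups, function(g) length(g) * (mean(g) - grand_mean)^2))
# Error SS: (observation - its group mean)^2, summed over all observations
error_ss <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))
total_ss <- sum((y - grand_mean)^2)

group_ms <- group_ss / (m - 1)
error_ms <- error_ss / (n - m)
f_stat <- group_ms / error_ms

p_value <- pf(f_stat, m - 1, n - m, lower.tail = FALSE)
f_crit <- qf(0.95, m - 1, n - m)   # F_{0.05(1),3,15}

round(c(group_ss = group_ss, error_ss = error_ss, total_ss = total_ss,
        F = f_stat, F_crit = f_crit), 2)
```

Because F is far larger than the critical value, H0 is rejected at α = 0.05.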


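H0: $\beta = 0$

$F = \frac{\text{Group MS}}{\text{Error MS}} = \frac{\sum (\hat{Y}_i - \bar{Y})^2 / 1}{\sum (Y_i - \hat{Y}_i)^2 / (n - 2)}$

Critical value: $F_{\alpha(1),1,n-2}$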
Ex. 7-1

Apply ANOVA to the example data (Data 1 & 2)

1) Apply one-way ANOVA to Data 1.
2) Apply two-way ANOVA to Data 2, considering the interaction.
In both applications, clearly state the hypotheses and explain the results.

R: ANOVA / Code categorical data with letters, not numbers, so that R treats each variable as a factor rather than a numeric value.

# One-way ANOVA; var.equal = TRUE gives the classical F test
oneway.test(pressure ~ drink, data = data1, var.equal = TRUE)
result1 <- aov(pressure ~ drink, data = data1)
summary(result1)

# Two-way ANOVA; drink * sex expands to drink + sex + drink:sex (interaction)
result2 <- aov(pressure ~ drink * sex, data = data2)
summary(result2)
Data preparation for ANOVA

Data 1 (single-factor or one-way ANOVA; each column is a level or treatment)

Drink A   Drink B   Drink C   Drink D
142.6     100.8     108.7     127.8
142.1      97.0     107.7     124.2
140.2     105.0     114.0     123.1
136.5      98.6     106.3     125.7
NS        101.7     109.8     130.3
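
For aov(), the wide table has to be rearranged so that each row holds one observation and its treatment level. The sketch below shows one way to do this with base R's stack(). The column names pressure and drink match the exercise code, and treating NS as NA is an assumption.

```r
# Data 1 as laid out on the slide (wide format); NS entered as NA (assumption).
wide <- data.frame(
  A = c(142.6, 142.1, 140.2, 136.5, NA),
  B = c(100.8, 97.0, 105.0, 98.6, 101.7),
  C = c(108.7, 107.7, 114.0, 106.3, 109.8),
  D = c(127.8, 124.2, 123.1, 125.7, 130.3)
)

# Long format: one row per observation, with the drink stored as a factor.
data1 <- stack(wide)
names(data1) <- c("pressure", "drink")
data1 <- na.omit(data1)     # drop the missing Drink A value

str(data1)
```

Using letters for the drink labels makes the column a factor, which is what the one-way ANOVA call expects.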

Data 2 (two-factor or two-way ANOVA)

Male                                      Female
Drink A   Drink B   Drink C   Drink D     Drink A   Drink B   Drink C   Drink D
142.6     100.8     108.7     127.8       147.6     105.8     113.7     132.8
142.1      97.0     107.7     124.2       147.1     102.0     112.7     129.2
140.2     105.0     114.0     123.1       145.2     110.0     119.0     128.1
136.5      98.6     106.3     125.7       141.5     103.6     111.3     130.7
NS        101.7     109.8     130.3       NS        106.7     114.8     135.3
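
Data 2 can be arranged the same way, with one column per factor. The following is a sketch under the assumption that NS marks a missing value; the column names drink, sex, and pressure follow the exercise code.

```r
# Data 2 in long format; NS entries omitted (assumed missing).
male <- c(142.6, 142.1, 140.2, 136.5,
          100.8, 97.0, 105.0, 98.6, 101.7,
          108.7, 107.7, 114.0, 106.3, 109.8,
          127.8, 124.2, 123.1, 125.7, 130.3)
female <- c(147.6, 147.1, 145.2, 141.5,
            105.8, 102.0, 110.0, 103.6, 106.7,
            113.7, 112.7, 119.0, 111.3, 114.8,
            132.8, 129.2, 128.1, 130.7, 135.3)
drink_labels <- rep(c("A", "B", "C", "D"), times = c(4, 5, 5, 5))

data2 <- data.frame(
  sex = factor(rep(c("male", "female"), each = length(male))),
  drink = factor(rep(drink_labels, times = 2)),
  pressure = c(male, female)
)

# Two-way ANOVA with interaction
result2 <- aov(pressure ~ drink * sex, data = data2)
anova_table <- summary(result2)[[1]]
anova_table
```

In this example every female value is exactly 5 higher than the corresponding male value, so the drink-by-sex interaction term should come out essentially zero.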


Factorial ANOVA (with two or more factors)
Repeated measures ANOVA
Table for two-way factorial ANOVA

Source of variation                  SS    df   F          p
Factor A                             Va    -    Va / Vr    **
Factor B                             Vb    -    Vb / Vr    *
Factor A x Factor B (interaction)    Vab   -    Vab / Vr   -
Error                                Vr    -
Post-hoc test | Tukey HSD test, Fisher LSD test, etc.

• A single-step multiple comparison procedure
• Statistical test generally used with an ANOVA
Tukey (HSD) test

$q_s = \frac{\bar{X}_1 - \bar{X}_2}{SE}$ --- Studentized range distribution

[Figure: LS means of blood pressure for groups 1 to 4, labelled a, b, c, d. Current effect: F(3, 15) = 164.99, p < 0.00001. Vertical bars denote 0.95 confidence intervals.]
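
In R, a Tukey HSD comparison can be run on a fitted aov object with TukeyHSD(). The sketch below applies it to Data 1, assuming the NS entry for Drink A is a missing value.

```r
# Data 1 in long format; the NS entry for Drink A is omitted (assumed missing).
data1 <- data.frame(
  drink = factor(rep(c("A", "B", "C", "D"), times = c(4, 5, 5, 5))),
  pressure = c(142.6, 142.1, 140.2, 136.5,
               100.8, 97.0, 105.0, 98.6, 101.7,
               108.7, 107.7, 114.0, 106.3, 109.8,
               127.8, 124.2, 123.1, 125.7, 130.3)
)

fit <- aov(pressure ~ drink, data = data1)

# All pairwise differences with 95% family-wise confidence intervals
tukey <- TukeyHSD(fit, conf.level = 0.95)
tukey
```

Each row of the result is one pair of drinks, with the difference in means, its confidence interval, and an adjusted p-value; pairs whose interval excludes zero differ significantly.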
Methods for Hypothesis Test

Type of test          Parametric        Nonparametric
2 samples             t-test            Mann-Whitney U-test
Paired sample         Paired t-test     Wilcoxon signed-rank test
> 2 samples           1-way ANOVA       Kruskal-Wallis
Distribution          Chi-square        Kolmogorov-Smirnov
Correlation           Pearson's r       Spearman's r
Crossed comparisons   Factorial ANOVA   Friedman's; Quade
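
As an illustration of the "> 2 samples" row, the nonparametric counterpart of one-way ANOVA can be run with kruskal.test(). This sketch reuses Data 1, with the NS entry treated as missing; it is not part of the original exercise.

```r
# Data 1 in long format; the NS entry for Drink A is omitted (assumed missing).
data1 <- data.frame(
  drink = factor(rep(c("A", "B", "C", "D"), times = c(4, 5, 5, 5))),
  pressure = c(142.6, 142.1, 140.2, 136.5,
               100.8, 97.0, 105.0, 98.6, 101.7,
               108.7, 107.7, 114.0, 106.3, 109.8,
               127.8, 124.2, 123.1, 125.7, 130.3)
)

# Parametric: one-way ANOVA on the raw values
parametric <- summary(aov(pressure ~ drink, data = data1))

# Nonparametric: Kruskal-Wallis rank-sum test
nonparametric <- kruskal.test(pressure ~ drink, data = data1)

parametric
nonparametric
```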


ANCOVA: Analysis of Covariance
ANCOVA = Linear Regression + ANOVA

ANOVA: lm(post_weight ~ group)


ANCOVA: lm(post_weight ~ pre_weight + group)
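
The difference between the two models can be seen on simulated data. Everything in the sketch below is hypothetical: the group labels, sample size, and effect sizes are invented for illustration, and only the variable names come from the formulas above.

```r
set.seed(1)

# Hypothetical data: 20 subjects per group, weight before and after treatment.
n_per_group <- 20
group <- factor(rep(c("control", "treated"), each = n_per_group))
pre_weight <- rnorm(2 * n_per_group, mean = 70, sd = 8)
post_weight <- pre_weight - 2 * (group == "treated") +
  rnorm(2 * n_per_group, mean = 0, sd = 2)

# ANOVA: group effect only
fit_anova <- lm(post_weight ~ group)

# ANCOVA: group effect after adjusting for the covariate pre_weight
fit_ancova <- lm(post_weight ~ pre_weight + group)

anova(fit_anova)
anova(fit_ancova)
```

Adding the covariate removes the variation explained by pre_weight from the error term, so the group effect is tested against a smaller residual variance.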
Ex. 7-2
Check the probability of a type I error.

Check the probability of committing a type I error in one-way ANOVA, following the procedure below. Note α = 0.05.
1) Specify Fα(1),2,27. [R: qf(1-α, 2, 27)]
2) Assume a population following a normal distribution.
3) From the population, choose 10 samples randomly (Data-A).
4) Repeat the random sampling to generate Data-B and Data-C (n = 10 each).
5) Perform one-way ANOVA for the three sets of samples. H0: $\mu_A = \mu_B = \mu_C$
6) Repeat Steps 3 to 5 many times and determine the proportion of cases in which F > F0.05(1),2,27.
7) Explain and discuss the result, especially the probability.
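
The procedure above could be sketched in R as follows. The population parameters (mean 0, sd 1), the number of repetitions, and the seed are arbitrary assumptions; any normal population should behave the same way.

```r
set.seed(42)

alpha <- 0.05
f_crit <- qf(1 - alpha, 2, 27)          # Step 1: critical value F_{alpha(1),2,27}

n_rep <- 2000                           # number of repetitions (assumption)
group <- factor(rep(c("A", "B", "C"), each = 10))

# Steps 2 to 5, repeated: draw Data-A, -B, -C from the same normal population
# and keep the F statistic of the one-way ANOVA.
f_values <- replicate(n_rep, {
  y <- rnorm(30, mean = 0, sd = 1)
  summary(aov(y ~ group))[[1]][["F value"]][1]
})

# Step 6: proportion of repetitions in which H0 is (wrongly) rejected
type1_rate <- mean(f_values > f_crit)
type1_rate
```

Since all three samples come from the same population, H0 is true in every repetition, so each rejection is a type I error and the proportion should be close to α.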
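[Figure: probability density function (PDF) of F with the critical value $F_{\alpha(1)}$; the area to the right of the critical value is the type I error probability. F1 and F2 mark F values computed from your data.]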


Blood pressure was decreased in four weeks.
[Figure: highest pressure by week for Blended Barley Tea and Sesame Barley Tea; average of 74 persons]

Population or sample?
Sample condition?
ANOVA?
Number of samples?
Significant?


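Consumer Affairs Agency

Food for Specified Health Uses
Evaluation of the efficacy of Sesame Barley Tea

Subjects: 72 people with high-normal blood pressure or mild hypertension (38 men, 34 women).

Methods: A double-blind, parallel-group comparison trial was conducted using either a tea beverage containing no sesame protein hydrolysate (placebo beverage) or a tea beverage containing 500 mg of sesame protein hydrolysate (test beverage). After a 2-week pre-observation period, subjects drank one bottle per day of the placebo or test beverage for 12 weeks, and the blood-pressure-lowering effect was examined.

Results: Compared with the placebo group, the test beverage group showed a significant decrease in diastolic blood pressure after 6 and 10 weeks of intake (p<0.01 and p<0.05, respectively), and significant decreases in both systolic and diastolic blood pressure after 12 weeks of intake (both p<0.01) (1).

(1) 森口盛雄, 健康栄養食品研究, 7(1), 49-64 (2004)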


Appendix

Replication
… the repetition of the set of all the treatment combinations to be compared in an experiment. Each of the repetitions is called a replicate.

Treatment 1: 0.1 L/day
Treatment 2: 0.5 L/day
Treatment 3: 1.0 L/day
[Figure: each treatment in triplicate]
Region   Pond   Bacterial concentration
Hot      A      Data A1, A2, A3
Warm     B      Data B1, B2, B3
Cold     C      Data C1, C2, C3

ANOVA?
Pseudoreplication

Organizing your data in such a way as to pretend that you have made more independent observations than is actually the case. With one pond per region, the three measurements from a pond (e.g. Data A1, A2, A3) are not independent replicates of the region.
Region   Ponds        Bacterial concentration
Hot      A1, A2, A3   Data A1, A2, A3
Warm     B1, B2, B3   Data B1, B2, B3
Cold     C1, C2, C3   Data C1, C2, C3

ANOVA

Variation: within pond, within region, and between regions
