One and Two-Way ANOVA
ANOVA – Analysis of
Variances
Testing of complex hypothesis as a whole, e.g.:
more than two samples (multiple test problem),
several multiple factors (multiway ANOVA)
elimination of covariates (ANCOVA)
fixed and/or random effects (variance decomposition
methods, mixed effects models)
https://tpetzoldt.github.io/RStatistics/slides-anova.html#(3) 1/1
3/5/2021 One and two-way ANOVA with R (4)
A practical example
Scientific question
Find a suitable medium for growth experiments with
green algae:
Cheap, easy to handle
Suitable for students courses and classroom
experiments
Idea
Use a commercial fertilizer with the main nutrients N
and P
photosynthesis
Application
7 Different treatments
Fertilizer solution in closed bottles
Fertilizer solution in open bottles (CO₂ from air)
Fertilizer + Sugar (organic C source)
Fertilizer + additional HCO₃⁻ (add CaCO₃ to sparkling mineral water)
A standard algae growth medium (“Basal medium”)
for comparison
Deionized (“distilled”) water and tap water for
comparison
Experimental design
Results
treat       rep  growth
Fertilizer  1     0.020
Fertilizer  2    -0.217
Fertilizer  3    -0.273
F. open     1     0.940
F. open     2     0.780
F. open     3     0.555
F.+sugar    1     0.188
F.+sugar    2    -0.100
F.+sugar    3     0.020
F.+CaCO3    1     0.245
F.+CaCO3    2     0.236
F.+CaCO3    3     0.456
Bas.med.    1     0.699
Bas.med.    2     0.727
Bas.med.    3     0.656
A.dest      1    -0.010
A.dest      2     0.000
A.dest      3    -0.010
Advantages
looks “stupid” but is better for data analysis
dependent variable growth and explanatory variable treat clearly visible
easily extensible to > 1 explanatory variable
The data in R
dat <- data.frame(
treat = factor(c("Fertilizer", "Fertilizer", "Fertilizer",
"F. open", "F. open", "F. open",
"F.+sugar", "F.+sugar", "F.+sugar",
"F.+CaCO3", "F.+CaCO3", "F.+CaCO3",
"Bas.med.", "Bas.med.", "Bas.med.",
"A.dest", "A.dest", "A.dest",
"Tap water", "Tap water"),
levels=c("Fertilizer", "F. open", "F.+sugar",
"F.+CaCO3", "Bas.med.", "A.dest", "Tap water")),
rep = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2),
growth = c(0.02, -0.217, -0.273, 0.94, 0.78, 0.555, 0.188, -0.1, 0.02,
0.245, 0.236, 0.456, 0.699, 0.727, 0.656, -0.01, 0, -0.01, 0.03, -0.07)
)
Visualization
boxplot(growth ~ treat, data=dat)
abline(h=0, lty="dashed", col="grey")
Hypotheses
H0: growth is the same in all treatments
HA: growth differs between media
$\alpha_{\text{total}} \le \sum_{i=1}^{N} \alpha_i = N \cdot \alpha$
Solutions
One approach can be to down-correct the alpha
errors so that $\alpha_{\text{total}} = 0.05$
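To see what this bound means for the algae experiment, here is a minimal sketch. It assumes that all seven treatments would be compared pairwise with separate t-tests. The variable names are illustrative only, and the second estimate additionally assumes independent tests, which pairwise comparisons on shared data are not.

```r
alpha   <- 0.05
n_treat <- 7                      # treatments in the algae experiment
n_tests <- choose(n_treat, 2)     # number of pairwise comparisons: 21

# upper bound of the total alpha error (a probability cannot exceed 1)
alpha_bound <- min(1, n_tests * alpha)

# total alpha error under the (strong) assumption of independent tests
alpha_indep <- 1 - (1 - alpha)^n_tests

# down-corrected alpha per single test so that the total stays at 0.05
alpha_single <- alpha / n_tests

c(tests = n_tests, bound = alpha_bound,
  independent = alpha_indep, per_test = alpha_single)
```

With 21 tests the bound is no longer informative, which is one motivation for testing the hypothesis as a whole with an ANOVA.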
$s_y^2 = s_{\text{effect}}^2 + s_{\varepsilon}^2$
Example
We have two brands of Clementines from a shop “E”, that
we encode as “EB” and “EP”. We want to know whether
the premium brand (“P”) and the basic brand (“B”) have a
different weight.
Instead of a t-test we encode “EB” with 1 and “EP” with 2.
clem_edeka <- data.frame(
brand = c("EP", "EB", "EB", "EB", "EB", "EB", "EB", "EB", "EB", "EB", "EB",
"EB", "EB", "EB", "EP", "EP", "EP", "EP", "EP", "EP", "EP", "EB", "EP"),
weight = c(88, 96, 100, 96, 90, 100, 92, 92, 102, 99, 86, 89, 99, 89, 75, 80,
81, 96, 82, 98, 80, 107, 88)
)
Total variance
## [1] 68.98814
## [1] 43.25
## [1] 0.3730807
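The code that produced these three numbers is not preserved here. The following sketch is one way to obtain them and rests on an assumption: the first value is the total variance of weight, the second the variance of the residuals of a linear model with brand as explanatory variable, and the third the explained fraction of variance. The names var_total, var_resid, explained and m_clem are illustrative.

```r
clem_edeka <- data.frame(
  brand = c("EP", "EB", "EB", "EB", "EB", "EB", "EB", "EB", "EB", "EB", "EB",
            "EB", "EB", "EB", "EP", "EP", "EP", "EP", "EP", "EP", "EP", "EB", "EP"),
  weight = c(88, 96, 100, 96, 90, 100, 92, 92, 102, 99, 86, 89, 99, 89, 75, 80,
             81, 96, 82, 98, 80, 107, 88)
)

# total variance of all weights
var_total <- var(clem_edeka$weight)

# residual variance after removing the brand effect
m_clem    <- lm(weight ~ brand, data = clem_edeka)
var_resid <- var(residuals(m_clem))

# fraction of the total variance explained by brand
explained <- 1 - var_resid / var_total

c(total = var_total, residual = var_resid, explained = explained)
```

The explained fraction is identical to the R-squared reported by summary(m_clem).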
Exercise:
Perform a t-Test for the two Clementine brands
Compare the p-value of the t-test with the p-value of
an ANOVA
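One possible way to approach the exercise is sketched below. It assumes the classical t-test with equal variances (var.equal = TRUE), because R's default is the Welch test, which does not match the ANOVA exactly. Object names are illustrative.

```r
clem_edeka <- data.frame(
  brand = c("EP", "EB", "EB", "EB", "EB", "EB", "EB", "EB", "EB", "EB", "EB",
            "EB", "EB", "EB", "EP", "EP", "EP", "EP", "EP", "EP", "EP", "EB", "EP"),
  weight = c(88, 96, 100, 96, 90, 100, 92, 92, 102, 99, 86, 89, 99, 89, 75, 80,
             81, 96, 82, 98, 80, 107, 88)
)

# classical two-sample t-test (equal variances assumed)
tt <- t.test(weight ~ brand, data = clem_edeka, var.equal = TRUE)

# one-way ANOVA with the same two groups
av <- anova(lm(weight ~ brand, data = clem_edeka))

p_ttest <- tt$p.value
p_anova <- av[["Pr(>F)"]][1]
c(t_test = p_ttest, anova = p_anova)
```

For two groups the F statistic is the square of the t statistic, so both p-values coincide.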
ANOVA in R
Back to the algae growth data. Let’s call the linear model
m:
m <- lm(growth ~ treat, data = dat)
anova(m)
Posthoc tests
The test above showed only that the factor “treatment”
had a significant effect, but not which levels of
the factor differ. To find out, we apply a so-called
posthoc test.
Several posthoc tests exist; here we use the Tukey HSD
test, which is the most common.
The TukeyHSD function has a numerical and a graphical
output.
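As a sketch of the numerical output, the block below rebuilds the algae data in compact form, runs the Tukey HSD test and keeps only the pairs with an adjusted p-value below 0.05. The names lev, algae, tk_algae and tk_sig are illustrative, and the 0.05 threshold is an assumption.

```r
lev <- c("Fertilizer", "F. open", "F.+sugar", "F.+CaCO3",
         "Bas.med.", "A.dest", "Tap water")
algae <- data.frame(
  treat  = factor(rep(lev, times = c(3, 3, 3, 3, 3, 3, 2)), levels = lev),
  growth = c(0.02, -0.217, -0.273, 0.94, 0.78, 0.555, 0.188, -0.1, 0.02,
             0.245, 0.236, 0.456, 0.699, 0.727, 0.656, -0.01, 0, -0.01,
             0.03, -0.07)
)

# Tukey HSD on the one-way model
tk_algae <- TukeyHSD(aov(growth ~ treat, data = algae))

# numerical output: one row per pair with difference, confidence limits, adjusted p
tk_mat <- tk_algae$treat

# keep only pairs that differ at the 5 % level
tk_sig <- tk_mat[tk_mat[, "p adj"] < 0.05, , drop = FALSE]
tk_sig
```

Each row name has the form "level2-level1", so the sign of diff shows which of the two treatments grew faster.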
Graphical output
tk <- TukeyHSD(aov(m))     # Tukey HSD test for the fitted model m
par(las = 1)               # las = 1 makes the y-axis annotation horizontal
par(mar = c(4, 10, 3, 1))  # more space at the left for axis annotation
plot(tk)
plot(m, which=1)
plot(m, which=2)
##
## Fligner-Killeen test of homogeneity of variances
##
## data: growth by treat
## Fligner-Killeen:med chi-squared = 4.2095, df = 6, p-value = 0.6483
##
## One-way analysis of means (not assuming equal variances)
##
## data: growth and treat
## F = 115.09, num df = 6.0000, denom df = 4.6224, p-value = 6.57e-05
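The calls behind the two printed results are not preserved here. Their headers match the output format of fligner.test() and oneway.test(), so a plausible sketch, assuming both were applied to the algae data, is the following. The data frame is rebuilt under the illustrative name algae.

```r
lev <- c("Fertilizer", "F. open", "F.+sugar", "F.+CaCO3",
         "Bas.med.", "A.dest", "Tap water")
algae <- data.frame(
  treat  = factor(rep(lev, times = c(3, 3, 3, 3, 3, 3, 2)), levels = lev),
  growth = c(0.02, -0.217, -0.273, 0.94, 0.78, 0.555, 0.188, -0.1, 0.02,
             0.245, 0.236, 0.456, 0.699, 0.727, 0.656, -0.01, 0, -0.01,
             0.03, -0.07)
)

# test of variance homogeneity between the treatments
fk <- fligner.test(growth ~ treat, data = algae)
fk

# one-way ANOVA that does not assume equal variances (Welch correction)
ow <- oneway.test(growth ~ treat, data = algae)
ow
```

The Welch-type test is a fallback for the case that variance homogeneity is doubtful; it works on the same model formula as the ordinary ANOVA.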
Two-way ANOVA
Example from a statistics textbook (Crawley 2002)
Effects of diet and coat color on growth of hamsters
in grams per unit time (constructed data set)
Tidy data
hams <- data.frame(No = 1:12,
growth = c(6.6, 7.2, 6.9, 8.3, 7.9, 9.2,
8.3, 8.7, 8.1, 8.5, 9.1, 9.0),
diet = rep(c("A", "B", "C"), each=2),
coat = rep(c("light", "dark"), each=6)
)
No  growth  diet  coat
 1     6.6  A     light
 2     7.2  A     light
 3     6.9  B     light
 4     8.3  B     light
 5     7.9  C     light
 6     9.2  C     light
 7     8.3  A     dark
 8     8.7  A     dark
 9     8.1  B     dark
10     8.5  B     dark
11     9.1  C     dark
12     9.0  C     dark
ANOVA
m <- lm(growth~coat*diet, data=hams)
anova(m)
Interaction plot
with(hams, interaction.plot(diet, coat, growth, col=c("brown", "orange"), lty=1, lwd=2))
Diagnostics
Assumptions
1. independence of measurements (within
samples)
2. Variance homogeneity of residuals
3. Normal distribution of residuals
Note: test of assumptions only possible after fitting the
model.
⇒ Fit the ANOVA model first, then check if it was correct!
Diagnostic tools
Box plot
Plot of residuals vs. mean values
Q-Q-plot of residuals
Fligner-Killeen test (alternative: some people
recommend the Levene-Test)
par(mfrow=c(1, 2))
par(cex=1.2, las=1)
qqnorm(residuals(m))
qqline(residuals(m))
plot(residuals(m)~fitted(m))
abline(h=0)
##
## Fligner-Killeen test of homogeneity of variances
##
## data: growth by interaction(coat, diet)
## Fligner-Killeen:med chi-squared = 10.788, df = 5, p-value = 0.05575
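The "data:" line of this output suggests the call below. This is a sketch under that assumption, with the hamster data rebuilt so that the block runs on its own.

```r
hams <- data.frame(
  No     = 1:12,
  growth = c(6.6, 7.2, 6.9, 8.3, 7.9, 9.2,
             8.3, 8.7, 8.1, 8.5, 9.1, 9.0),
  diet   = rep(c("A", "B", "C"), each = 2, times = 2),
  coat   = rep(c("light", "dark"), each = 6)
)

# variance homogeneity across all six coat x diet combinations
fk_hams <- fligner.test(growth ~ interaction(coat, diet), data = hams)
fk_hams
```

With only two animals per combination the test has little power, so the graphical diagnostics remain important.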
Algorithm
1. Select the smallest p out of all n p-values.
2. If p ⋅ n < α ⇒ significant, else STOP.
3. Set n − 1 → n, remove the smallest p from the list and go to
step 1.
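The three steps can be sketched in R as follows. The function holm_sequential and the example p-values are hypothetical and serve only as illustration; the result is compared with R's built-in p.adjust(method = "holm").

```r
# Sequential Holm procedure following the three steps above
holm_sequential <- function(p, alpha = 0.05) {
  significant <- logical(length(p))   # all FALSE at the start
  n <- length(p)                      # number of p-values still in the list
  for (i in order(p)) {               # step 1: smallest remaining p first
    if (p[i] * n < alpha) {           # step 2: compare p * n with alpha
      significant[i] <- TRUE
      n <- n - 1                      # step 3: one test less, continue
    } else {
      break                           # STOP at the first non-significant p
    }
  }
  significant
}

p_values <- c(0.001, 0.04, 0.012, 0.3)   # hypothetical p-values

holm_sequential(p_values)
# built-in equivalent: adjust the p-values and compare with alpha
p.adjust(p_values, method = "holm") < 0.05
```

Both calls flag the first and third p-value. Note that 0.04 would be significant as a single test but is not after the correction.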
Example
Growth rate per day (d⁻¹) of blue-green algae cultures
plot(TukeyHSD(aov(m)))
Conclusions
Statistical methods
In case of Holm-corrected t-tests, only a single
p-value (MCYST vs. Subst A) remains significant. This
indicates that in this case, Holm’s method is more
conservative than TukeyHSD (only one significant
effect compared to two).
An ANOVA with posthoc test is in general preferred,
but the sequential Holm-Bonferroni method can be
helpful in special cases.
Moreover, it demonstrates clearly that massive
multiple testing needs to be avoided.
⇒ ANOVA is to be preferred, when possible.
Interpretation
Regarding our original hypothesis, we can see that
MCYST and SubstA did not inhibit growth of
Pseudanabaena. In fact SubstA stimulated growth.
This was contrary to our expectations – the
biological reason was then found 10 years later.
ANCOVA
##
## Two Sample t-test
##
## data: weight by sex
## t = 0.97747, df = 22, p-value = 0.339
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -126.3753 351.7086
## sample estimates:
## mean in group M mean in group F
## 3024.000 2911.333
summary(m)
##
## Call:
## lm(formula = weight ~ week * sex, data = dobson)
##
## Residuals:
## Min 1Q Median 3Q Max
## -246.69 -138.11 -39.13 176.57 274.28
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1268.67 1114.64 -1.138 0.268492
## week 111.98 29.05 3.855 0.000986 ***
## sexF -872.99 1611.33 -0.542 0.593952
## week:sexF 18.42 41.76 0.441 0.663893
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 180.6 on 20 degrees of freedom
## Multiple R-squared: 0.6435, Adjusted R-squared: 0.59
## F-statistic: 12.03 on 3 and 20 DF, p-value: 0.000101
p <- coef(m)
abline(a=p[1], b=p[2], col="red")
abline(a=p[1]+p[3], b=p[2]+p[4], col="blue")
## the result is the same as if we fitted separate linear models
fem <- lm(weight ~ week, data=dobson, subset = sex=="F")
mal <- lm(weight ~ week, data=dobson, subset = sex=="M")
abline(fem, col="black", lty="dashed")
abline(mal, col="black", lty="dashed")
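The two abline() calls with p[1] to p[4] rely on how R codes the interaction model: "M" is the reference level, and the sexF terms are offsets relative to it. The arithmetic can be sketched with the rounded coefficients copied from the summary output above. The dobson data themselves are not reproduced here, and the names male and female are illustrative.

```r
# rounded coefficients copied from the printed summary(m)
p <- c("(Intercept)" = -1268.67, "week" = 111.98,
       "sexF" = -872.99, "week:sexF" = 18.42)

# reference level M: intercept and slope are used directly (red line)
male <- c(intercept = p[["(Intercept)"]],
          slope     = p[["week"]])

# level F: add the offsets for intercept and slope (blue line)
female <- c(intercept = p[["(Intercept)"]] + p[["sexF"]],
            slope     = p[["week"]] + p[["week:sexF"]])

rbind(male, female)
```

The female line thus has an intercept of -2141.66 and a slope of 130.40, which is what a separate regression for sex == "F" would return up to rounding.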
2. Unbalanced case:
unequal number of samples for each factor
combination
ANOVA results depend on the order of factors
in the model formula.
Classical method: Type II or Type III ANOVA
Modern approach: model selection and
likelihood ratio tests
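The order dependence can be made visible with a small sketch. It assumes, purely for illustration, that hamster No 1 is missing, which makes the design unbalanced. The name hams_unbal is hypothetical, and drop1() from base R is used here as a stand-in for a Type II table of a model without interaction.

```r
hams <- data.frame(
  No     = 1:12,
  growth = c(6.6, 7.2, 6.9, 8.3, 7.9, 9.2,
             8.3, 8.7, 8.1, 8.5, 9.1, 9.0),
  diet   = rep(c("A", "B", "C"), each = 2, times = 2),
  coat   = rep(c("light", "dark"), each = 6)
)
hams_unbal <- hams[-1, ]   # hypothetical loss of one animal

# sequential (Type I) sums of squares depend on the order of the factors
ss_coat_first <- anova(lm(growth ~ coat + diet, data = hams_unbal))["coat", "Sum Sq"]
ss_coat_last  <- anova(lm(growth ~ diet + coat, data = hams_unbal))["coat", "Sum Sq"]
c(coat_first = ss_coat_first, coat_last = ss_coat_last)

# marginal tests: each term is tested after all others, independent of order
d1 <- drop1(lm(growth ~ coat + diet, data = hams_unbal), test = "F")
d2 <- drop1(lm(growth ~ diet + coat, data = hams_unbal), test = "F")
d1
```

In the balanced original data the two sequential sums of squares for coat agree, so the problem only appears once observations are missing.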
library("car")
m <- lm(growth ~ coat * diet, data = hams)
Anova(m, type="II")
Alternative approach:
Comparison of different model candidates instead of p-
value based testing.
Model with all potential effects → full model,
Omit single factors → reduced models (several!),
No influence factors (only mean value) → null model.
Which model is the best → minimal adequate
model?
$AIC = -2 \ln(L) + 2k$
$BIC = -2 \ln(L) + k \cdot \ln(n)$
## df AIC
## m1 7 27.53237
## m2 5 26.83151
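The model definitions for m1 and m2 are not preserved here. The df values of 7 and 5 are consistent with the hamster model with and without interaction, so the following sketch assumes exactly that. It also evaluates the AIC and BIC definitions by hand, where L is the maximized likelihood, k the number of estimated parameters (including the residual variance) and n the number of observations. The helper functions are illustrative.

```r
hams <- data.frame(
  No     = 1:12,
  growth = c(6.6, 7.2, 6.9, 8.3, 7.9, 9.2,
             8.3, 8.7, 8.1, 8.5, 9.1, 9.0),
  diet   = rep(c("A", "B", "C"), each = 2, times = 2),
  coat   = rep(c("light", "dark"), each = 6)
)

m1 <- lm(growth ~ coat * diet, data = hams)   # full model (assumed)
m2 <- lm(growth ~ coat + diet, data = hams)   # no interaction (assumed)

# AIC and BIC computed directly from the log-likelihood
aic_by_hand <- function(model) {
  ll <- logLik(model)
  -2 * as.numeric(ll) + 2 * attr(ll, "df")
}
bic_by_hand <- function(model) {
  ll <- logLik(model)
  -2 * as.numeric(ll) + attr(ll, "df") * log(nobs(model))
}

AIC(m1, m2)
c(m1 = aic_by_hand(m1), m2 = aic_by_hand(m2))
c(m1 = bic_by_hand(m1), m2 = bic_by_hand(m2))
```

The model with the smaller value is preferred; here this is the one without interaction.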
## Start: AIC=-8.52
## growth ~ diet * coat
##
## Df Sum of Sq RSS AIC
## - diet:coat 2 0.68667 2.8567 -9.2230
## <none> 2.1700 -8.5222
##
## Step: AIC=-9.22
## growth ~ diet + coat
##
## Df Sum of Sq RSS AIC
## <none> 2.8567 -9.2230
## - diet 2 2.6600 5.5167 -5.3256
## - coat 1 2.6133 5.4700 -3.4275
##
## Call:
## lm(formula = growth ~ diet + coat, data = hams)
##
## Coefficients:
## (Intercept) dietB dietC coatlight
## 8.1667 0.2500 1.1000 -0.9333
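This trace has the format printed by step(). A sketch of a call that should lead to it, assuming backward selection from the full hamster model, is shown below. The names m_full and m_min are illustrative, and trace = 0 only suppresses the printed steps.

```r
hams <- data.frame(
  No     = 1:12,
  growth = c(6.6, 7.2, 6.9, 8.3, 7.9, 9.2,
             8.3, 8.7, 8.1, 8.5, 9.1, 9.0),
  diet   = rep(c("A", "B", "C"), each = 2, times = 2),
  coat   = rep(c("light", "dark"), each = 6)
)

m_full <- lm(growth ~ diet * coat, data = hams)

# stepwise backward selection based on AIC
m_min <- step(m_full, trace = 0)

formula(m_min)
coef(m_min)
```

The AIC values in the trace differ from those returned by AIC() by an additive constant, so only differences between models are comparable.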
Summary
Linear models form the basis of many statistical
methods
Linear regression
ANOVA, ANCOVA, GLM, GAM, GLMM, . . .
ANOVA/ANCOVA instead of multiple testing
Copyright
This resource was created by tpetzoldt
(github.com/tpetzoldt). It is provided as is without
warranty.