
Appendices 151

Appendix 5 ANOVA explained


ANOVA can be applied to many different types of data. In this book, we
focus on its application to sensory data.

Purpose
ANOVA is used to examine the different sources of variation within a data set, e.g. variation from different types of products or different assessors. ANOVA calculates an F-ratio to determine whether these sources of variation are significantly greater than that due to background noise (error).
Consider Figure A5.1, which shows two data sets, A and B. Each data set contains three samples, and the three sample means for the sensory attribute are the same in A as in B. The variation within each sample is due to background noise and is larger in A than in B. It is easy to conclude that there is a significant difference between samples in B, because the distance between mean values is obviously larger than the variation within the samples. In A, however, the variation within each sample is sufficiently large that, without ANOVA, it is impossible to determine whether a significant difference exists between the samples.
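To make the comparison of between-sample and within-sample variation concrete, here is a minimal sketch (not from the book) that runs a one-way ANOVA in Python on simulated ratings mimicking Figure A5.1; the means, noise levels and group sizes are invented for illustration.

```python
# Sketch (not from the book): one-way ANOVA on simulated ratings that
# mimic Figure A5.1: three samples with the same means, but different
# amounts of background noise in data sets A and B.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_means = [4.0, 5.0, 6.0]  # hypothetical attribute means for three samples

def simulate(noise_sd):
    """Return ten ratings per sample with the given within-sample noise."""
    return [rng.normal(m, noise_sd, size=10) for m in true_means]

data_a = simulate(noise_sd=2.0)  # large within-sample variation (data set A)
data_b = simulate(noise_sd=0.3)  # small within-sample variation (data set B)

f_a, p_a = stats.f_oneway(*data_a)
f_b, p_b = stats.f_oneway(*data_b)
print(f"Data set A: F = {f_a:.2f}, p = {p_a:.4f}")
print(f"Data set B: F = {f_b:.2f}, p = {p_b:.4f}")
```

The sample means are identical in the two data sets; only the smaller background noise in B lets its F-ratio stand clearly above 1.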

Sources of variation
In sensory testing, the main sources of variation relate to samples and
assessors, and potential interaction between the two. It is important
to consider all sources of potential variation during the experimental
design, so that they can be accounted for in the data analysis.

Figure A5.1 Illustration of sample mean and variation in two different data sets
(A and B).
Samples: Measuring variation due to samples, and identifying significant differences between them, is often the main objective of ANOVA. In order to measure the variation due to samples, the experimental design must include multiple responses for each sample.
Assessors: Variation due to assessors can arise from different use of the rating scale. For example, some individuals have a natural tendency to use the upper or lower end of the scale. In order to measure the variation due to assessors, the experimental design must allow all samples to be evaluated by all assessors. This is known as a repeated measures design.
Interaction: Variation due to the interaction between samples and assessors can arise when different assessors place the samples in different orders of perceived intensity (crossover interaction), or when the relative magnitude of differences between samples is inconsistent across assessors (magnitude interaction). In order to measure the variation due to interaction, the experimental design must include replicated responses from each assessor for each sample.
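A crossover interaction can be made concrete with a small, hypothetical table of ratings (this example is not from the book). In the two-way decomposition that ANOVA uses, the interaction effect for each cell is the rating minus the assessor mean, minus the sample mean, plus the grand mean; in this deliberately extreme case, all of the variation is interaction.

```python
import numpy as np

# Hypothetical crossover: rows = assessors, columns = samples.
ratings = np.array([[2.0, 6.0],   # assessor 1 rates sample 2 higher
                    [6.0, 2.0]])  # assessor 2 rates sample 1 higher

sample_means = ratings.mean(axis=0)    # identical for both samples: 4.0
assessor_means = ratings.mean(axis=1)  # identical for both assessors: 4.0
grand_mean = ratings.mean()

# Interaction effect per cell: rating - assessor mean - sample mean + grand mean
interaction = (ratings
               - assessor_means[:, None]
               - sample_means[None, :]
               + grand_mean)
print(interaction)
```

The sample means alone would suggest no difference at all, yet the assessors disagree completely on the order of the two samples; only the interaction term captures this.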

ANOVA designs
ANOVA designs relate to the experimental design for the study. The
complexity of the ANOVA is determined by the structure of the data set.
The three most common designs used for sensory data are as follows:
1 One-factor ANOVA: This calculates variation due to one factor only; all other variation is considered as background noise (error). Commonly, the factor is the samples, where a different group of assessors provides results for each sample. The ANOVA compares the variation between sample means to the variation within the samples. Alternatively, the factor could be assessors, where one sample is assessed on multiple occasions by each assessor. Here the ANOVA compares the variation between assessor means to the variation within the assessors.
2 Two-factor ANOVA (repeated measures): This calculates variation due to two factors; any other variation is considered as background noise (error). Typically, the factors are samples and assessors, where each assessor has evaluated each sample. The ANOVA compares variation between sample means to variation within samples that cannot be accounted for by variation across assessors. (This is the key difference between one- and two-factor ANOVA: in one-factor ANOVA, any variation due to assessors cannot be quantified and, therefore, remains in the background noise (error).) The ANOVA also compares variation between assessors to variation within assessors that cannot be accounted for by variation across samples.
3 Two-factor ANOVA with interaction: This calculates variation due to two factors and any interaction between these two factors; any other variation is considered as background noise (error). Typically, the factors are samples and assessors, where each assessor has provided replicated responses for each sample. The ANOVA compares variation between sample means to variation within samples that cannot be accounted for by variation across assessors and interaction between samples and assessors. This is the key difference between two-factor ANOVA with and without interaction: in two-factor ANOVA (without interaction), any variation due to interaction cannot be quantified and, therefore, remains in the background noise (error). The two-factor ANOVA with interaction also compares variation between assessors to variation within assessors that cannot be accounted for by variation across samples and interaction between samples and assessors.
Note: If you have chosen an ANOVA that does not calculate a specific
source of variation, do not assume that this variation does not exist. Its
effect will be contained in the background noise (error).
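As a hedged illustration of the third design (none of this code is from the book), a two-factor ANOVA with interaction can be run in Python with statsmodels; the sample names, assessor names, effect sizes and replication level below are all invented.

```python
# Sketch (not from the book): two-factor ANOVA with interaction via
# statsmodels, on hypothetical replicated sensory ratings.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
rows = []
for s_shift, sample in zip([0.0, 1.0, 2.0], ["S1", "S2", "S3"]):
    for a_shift, assessor in zip([0.0, -0.5, 0.5, 0.0], ["A1", "A2", "A3", "A4"]):
        for rep in range(2):  # replication is what lets us estimate interaction
            rows.append({"sample": sample, "assessor": assessor,
                         "rating": 5 + s_shift + a_shift + rng.normal(0, 0.3)})
data = pd.DataFrame(rows)

model = smf.ols("rating ~ C(sample) + C(assessor) + C(sample):C(assessor)",
                data=data).fit()
print(anova_lm(model))  # rows for sample, assessor, interaction and error
```

Dropping the `C(sample):C(assessor)` term from the formula gives the two-factor ANOVA without interaction; dropping `C(assessor)` as well gives the one-factor case.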

Calculating ANOVA
Whilst it is possible to calculate ANOVA by hand, it is nowadays typically performed using computer software. It is important, however, to understand the origins of each term in the software output and to interpret them correctly. Table A5.1 shows the key elements of an ANOVA output for a two-factor ANOVA with interaction.

Interpretation
ANOVA output is interpreted through reference to the p-values associated with samples, assessors and interaction. If the p-value is less than the specified significance level (typically 0.05), it can be concluded that the factor has a significant effect. For example, in Table A5.1, there is a significant effect of samples, assessors and the interaction between the two for the attribute hardness. ANOVA does not identify where significant differences within a factor exist. For example, there may be a significant sample effect, but ANOVA does not identify which samples are significantly different to one another. This requires further analysis using a multiple comparison test (MCT).
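The F-ratios and p-values in Table A5.1 can be reproduced from the SS and df columns alone; this short sketch (using scipy, and not part of the book) is a useful sanity check on any software output.

```python
# Sanity check (not from the book): reproducing the F-ratios and p-values
# in Table A5.1 from the SS and df columns alone.
from scipy import stats

ms_error = 67.655 / 126  # error mean square = error SS / error df
for name, ss, df in [("Sample", 1034.18, 6),
                     ("Assessor", 18.99, 8),
                     ("Sample*assessor", 41.62, 48)]:
    ms = ss / df                      # mean square for the factor
    f_ratio = ms / ms_error           # factor variation vs background noise
    p = stats.f.sf(f_ratio, df, 126)  # upper tail of the F distribution
    print(f"{name:16s} F = {f_ratio:7.2f}  p = {p:.4f}")
```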

Table A5.1 ANOVA output for hardness

                 SS       df   MS      F-ratio  p
Total            1094.79  62   17.658  32.89    0.0001
Sample           1034.18  6    172.36  321.01   0.0001
Assessor         18.99    8    2.37    4.42     0.0001
Sample*assessor  41.62    48   0.87    1.62     0.018
Error            67.655   126  0.537

SS: Sum of squares. The measure of variation within the data set.
df: Degrees of freedom. This relates to the number of levels within each element of the data set.
MS: Mean square. This is a measure of variation that takes into account the number of levels within each element of the data set. It is calculated by dividing the sum of squares by the df.
F-ratio: This is calculated, for each factor, by dividing the mean square for that factor by the mean square for the error term (background noise). This is how ANOVA compares the variation due to factors to the variation from the background noise (error). The F-ratio is used to determine if there is a significant effect of each factor. If calculating ANOVA by hand, this F-ratio would be compared to a critical value from statistical tables at a specified significance level. Some software output also includes the critical F-value.
p: The p-value is a calculated probability. It is the probability of making a type I error, i.e. concluding that a significant effect exists when it does not. This relates to the significance level of the test decided at the planning stage (typically 5% for sensory tests); therefore, a p-value below 0.05 would indicate a significant effect.
Note 1: Output from a one-factor ANOVA would only include total, sample (or assessor) and error terms. Two-factor ANOVA (without interaction) would include total, sample, assessor and error terms.
Note 2: Remember that an underlying assumption of ANOVA is that the data are normally distributed. The data should be checked for normality before carrying out ANOVA. If they are not normally distributed, appropriate alternative analyses should be applied.

Multiple comparison tests
MCTs identify the levels (samples or assessors) within a factor between which significant differences exist. They should be applied only when the overall effect in the ANOVA is significant. Note that carrying out multiple t-tests is not appropriate, as it does not allow for any adjustment of the α risk and may lead to the conclusion that differences exist when this is not the case.
Most MCTs work by comparing the difference between the mean values
of all possible pairs of samples or assessors to a calculated value or range
of values. If the difference between two sample means is greater than the
calculated value, then a significant difference is concluded to exist.
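As a sketch of how such a test is run in practice (the ratings are simulated and the CF codes are borrowed from Table A5.2 purely as labels; none of this is the book's own analysis), statsmodels provides Tukey's HSD:

```python
# Sketch (not from the book): Tukey's HSD on hypothetical ratings for
# three samples, via statsmodels.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(2)
means = {"CF53": 3.5, "CF78": 3.7, "CF15": 5.8}  # invented true means
ratings = np.concatenate([rng.normal(m, 0.5, size=12) for m in means.values()])
labels = np.repeat(list(means), 12)

result = pairwise_tukeyhsd(ratings, labels, alpha=0.05)
print(result)  # 'reject = True' marks significantly different pairs
```

The printed table lists each pair of samples, the difference between their means, and whether that difference exceeds the honestly significant difference at the chosen α.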

Choice of multiple comparison test

There are several MCTs to choose from, each calculated in a different way (see O'Mahony 1986). The choice of test should be made prior to the analysis and is directed by the specific objective of the investigation.
Table A5.2 Results from Tukey's HSD test on sweetness for six confectionery samples

Sample  Mean
CF53    3.5  A
CF78    3.7  A
CF81    4.2  A B
CF22    4.9  B
CF15    5.8  C
CF48    6.7  D

The most common tests employed are the LSD test, the Newman-Keuls test and Tukey's honestly significant difference (HSD) test.
The choice of test generally depends on the risk you are willing to take of missing differences that actually exist (or of concluding that a difference exists when it does not), i.e. the conservativeness of the test. Some MCTs adjust the significance level so that it is kept at 0.05 (or 0.01) for comparisons made between individual pairs of samples. These tests, e.g. the Newman-Keuls, Duncan and LSD tests, are more likely to find differences between pairs of samples, i.e. they are less conservative. The LSD test is the least conservative and should be used only when there is a small number of comparisons to make, e.g. between three or four samples/assessors.
Other tests, such as the Tukey HSD, Scheffé and Bonferroni tests, work by keeping the overall level of significance for the whole set of comparisons at 0.05 (or 0.01). They are more conservative and so may miss real differences between pairs of samples.
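The conservativeness trade-off can be seen with a little arithmetic; this sketch (not from the book) uses only Python's standard library, and its independence assumption is a simplification.

```python
# Back-of-envelope arithmetic (not from the book) on why unadjusted
# pairwise tests inflate the alpha risk, and what Bonferroni does about it.
from math import comb

n_samples = 6                 # as in Table A5.2
n_pairs = comb(n_samples, 2)  # 15 pairwise comparisons
alpha = 0.05

# Familywise error if every pair is tested at 0.05; assumes independent
# tests, which is a simplification, but shows the scale of the problem.
familywise = 1 - (1 - alpha) ** n_pairs
print(f"{n_pairs} pairs, unadjusted familywise error is roughly {familywise:.2f}")

# Bonferroni keeps the whole family near 0.05 by tightening each test:
per_test = alpha / n_pairs
print(f"Bonferroni per-comparison level: {per_test:.4f}")
```

Tukey's HSD controls the same familywise rate while typically being less conservative than a plain Bonferroni split, which is one reason it is so widely used for all-pairs comparisons.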
A typical output from an MCT on a set of six samples is shown in Table A5.2. The table lists the mean score associated with each sample and a letter code. Samples with the same letter code are not significantly different; samples with different letter codes are significantly different. In this example, samples CF53, CF78 and CF81 are not significantly different from one another. Sometimes a sample is associated with more than one letter code: CF81 is also not significantly different to CF22, but samples CF53 and CF78 are. Samples CF15 and CF48 are significantly different to each other and to all the other samples.
Other MCTs exist for specific situations. They include Dunnett's test, used when individual samples are compared to one sample only, e.g. a control, and Dunn's test, which is used when only selected pairs of samples are identified for comparison prior to the ANOVA.
See O'Mahony (1986) and Lea et al. (1997) for more detailed information on ANOVA and MCTs.
