
Quantitative Methods 2

ECON 20003

WEEK 5
COMPARING SEVERAL POPULATION CENTRAL
LOCATIONS WITH ONE-WAY ANALYSIS OF
VARIANCE (ANOVA) BASED ON INDEPENDENT
SAMPLES AND RANDOMISED BLOCKS
Reference:
SSK: §15.1, 15.3-15.4, 20.3

Dr László Kónya
January 2020
ANALYSIS OF VARIANCE (ANOVA)

• In week 3 you learnt how to compare two population central locations
with parametric and nonparametric tests. Analysis of variance, ANOVA in
brief, is an extension of these tests to several (k ≥ 2) populations.

ANOVA: a class of statistical procedures used to divide the total
variation present in a set of data into several components,
and to measure the contribution of each of the possible
sources of variation to the total variation, in order to find out
whether several populations have the same central location.

When the population means exist, the aim is to test the
composite null hypothesis,
H0 : μ1 = μ2 = … = μk vs. HA : not all μi's are equal.

This is a relatively 'weak' null hypothesis: even if it is
rejected, we still do not know which particular means differ.

Given these hypotheses, analysis of variance may appear a strange name
but, as you will see, it is indeed based on comparisons of variances.
L. Kónya, 2020 UoM, ECON 20003, Week 5 2
• One might think of testing H0 with a series of t-tests that compare all
possible pairs of means, but this is not a good idea for two reasons:

i. Since there are K = k(k − 1) / 2 possible pairs, the number of t-tests
to be carried out increases rapidly with the number of populations
and K can be prohibitively large.
ii. If these t-tests are calculated independently of each other using
a single sample and the same significance level, say α, then the
probability of avoiding a Type I error is (1 − α) for each test, but it
is only (1 − α)^K for the whole set of K tests.

For example, if there are k = 10 (sub-) populations, one has to perform
K = (10 × 9) / 2 = 45 separate t-tests.
If each of these tests is performed at the 5% significance level, then the
probability of avoiding a Type I error (i.e. not rejecting a true H0) is 0.95
for each test, but it is only 0.95^45 ≈ 0.10 for the whole set of K = 45 tests.
In other words, the probability of a Type I error in any given test is 0.05,
while the probability of incorrectly rejecting H0 in at least one of the 45 tests
is about 0.90. This inflated error rate is clearly unacceptable in practice.
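The arithmetic above is easy to check. A minimal sketch (in Python; the deck itself uses R):

```python
# Probability of at least one Type I error across K independent 5% tests
k = 10                           # number of (sub-)populations
K = k * (k - 1) // 2             # number of pairwise t-tests
p_none = 0.95 ** K               # probability of no Type I error in any test
p_at_least_one = 1 - p_none      # probability of at least one false rejection

print(K, round(p_none, 3), round(p_at_least_one, 3))
```

With α = 0.05 and K = 45 this returns roughly 0.10 for avoiding all Type I errors and roughly 0.90 for making at least one.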



• In order to perform ANOVA, one has to draw an independent random
sample from each population and compare the sample means to the
corresponding sample items, to each other and also to the overall or
grand mean.

For k = 3, this can be illustrated as follows:

[Figure: samples drawn from the three populations, showing the grand mean
(red dot), the three sample means (yellow dots) and the sample items (blue dots)]

Apparently, the three sample means (yellow dots) are far closer to the
corresponding sample items (blue dots) than to each other or to the grand
mean (red dot), and there is far smaller variability within each sample than
between the three sample means.

Hence, H0 is unlikely to be true.



Consider now the following scenario:

In this case the three sample means (yellow dots) do not seem to be closer
to the corresponding sample items (blue dots) than to each other or to the
grand mean (red dot), and variability within the samples seems to be no
smaller than the variability between the sample means.

Hence, H0 is probably true.

• ANOVA has its own “vocabulary”.

Treatment: a possible source of variation in the set of data
which is under the experimenter's control.
Experimental unit: an entity that receives the treatment.
Factor: a set of treatments of a single kind; it defines the
(sub-) populations.
For example, a factor might be a set of fertilizers, a treatment can be a new
fertilizer, and the experimental units might be plots of ground.

Measurements are taken on the experimental units in order to obtain
observations for the variable of interest, called the response variable.

• There are several different ANOVAs depending on their experimental
designs. The differences are due to (i) the number of factors, (ii) the
source of measurements and (iii) the selection of sampled populations.

i. Number of factors: single-factor design (one-way ANOVA) versus
multifactor design (two-way, three-way etc. ANOVA).

ii. Source of measurements, i.e. whether the observations represent
different sets of experimental units or a similar / the same set of units:
completely randomised design versus randomised block design.
With a completely randomised design the experimental
units are assigned to the treatments completely at random,
while with a randomised block design the experimental units
are divided into blocks on the basis of some blocking
variable and then within each block they are randomly
assigned to the treatments.
A block is a group of experimental units that are identical, or at least
similar, with respect to all known sources of variability.

A special case of the randomised block design is the
repeated measures design, where each experimental unit is
assigned to all treatments in a random order and a block is
the collection of the measurements on a given unit.

Suppose, for example, that we intend to design an experiment to determine
whether there is any difference between three grades of petrol.
In the completely randomised design we could assign ten randomly
selected test cars to each grade and measure their fuel consumption.
In the randomised block design each block could be a given make and
model of cars, and three randomly selected cars from each block could run
on the three grades of petrol (one on each), or each randomly selected car
would run on each of the three grades of gasoline in a random order
(repeated measures design).

Note: The completely randomised and the randomised block designs are the
generalizations of the experiments based on two independent samples
and on matched pairs, respectively, to more than two populations.
iii. Depending on the way the sampled sub-populations are selected
in the experiment, an ANOVA model can be a
Fixed-effects model: all possible populations of interest are
included in the analysis;
Random-effects model: the sampled populations are chosen
randomly from all possible populations
of interest.

In the case of fixed effects the inferences are limited to the specific
populations that appear in the experiment, while in the case of
random effects the inferences can be generalised.
As regards the actual steps involved in the analysis, there is no
difference between fixed-effects and random-effects models,
but the results must be interpreted differently.

For example, if we compare the costs of living in the eight Australian state
capitals using analysis of variance, then the model is a fixed-effects ANOVA
model and the results are valid only for these cities.
However, if we randomly select 20 big cities from all across the world in
order to study the costs of living in major cities in general, then the model is
a random-effects ANOVA model and the results can be generalized.
ONE-WAY ANOVA: INDEPENDENT SAMPLES

• One-way ANOVA is the simplest version of ANOVA as it considers a
single factor, i.e. only one kind of treatment.
Its parametric version based on the completely randomised design,
i.e. on independent samples, is an extension of the two-independent-
sample Z or t test for the difference between two population means.

• One-way independent ANOVA is based on the following assumptions:

i. The data set constitutes k independent random samples drawn
from k (sub-) populations.
ii. Each (sub-) population is normally distributed,
Xj ~ N(μj ; σ)
iii. … and has the same variance, σ².

Under these assumptions the common variance σ² can be estimated
with the weighted average of the k sample variances, i.e. by the pooled
estimator, sp², which is an unbiased estimator of σ².
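As a numerical sketch (Python rather than R, with made-up samples), the pooled estimator is just the degrees-of-freedom-weighted average of the k sample variances, which equals SSE / (n − k):

```python
import numpy as np

# Hypothetical samples from k = 3 (sub-)populations
samples = [np.array([5.1, 4.8, 5.6, 5.0]),
           np.array([4.2, 4.9, 4.6]),
           np.array([5.5, 5.2, 5.8, 5.4, 5.1])]

k = len(samples)
n = sum(len(s) for s in samples)

# sp^2: weighted average of the sample variances, weights (nj - 1)
sp2 = sum((len(s) - 1) * s.var(ddof=1) for s in samples) / (n - k)

# Equivalently, sp^2 = SSE / (n - k), the within-sample sum of squares
sse = sum(((s - s.mean()) ** 2).sum() for s in samples)
```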



If, in addition, H0 : μ1 = μ2 = … = μk is also true, then all sampled (sub-)
populations have the same distribution and the sample data comprise k
independent random samples drawn from the same population, i.e. from
N(μ ; σ).
If the sample sizes are equal, then the k sample means can be
considered as a simple random sample drawn from the same
sampling distribution of the sample mean, and the sample
variance of this sampling distribution provides an alternative
unbiased estimator, s02, of the population variance.

It can be proved that these two estimators of σ², i.e. sp² and s0², are
independent of each other and their ratio follows Fisher's F distribution,
granted that H0 is correct. Namely,

Fobs = s0² / sp² ~ F(k − 1, n − k)

If H0 is true, the two estimates are expected to be similar, so Fobs ≈ 1.
Otherwise, s0² is biased upward and Fobs > 1.
Reject H0 if Fobs is greater than the Fα,k-1,n-k critical value and
hence the p-value is smaller than α.
L. Kónya, 2020 UoM, ECON 20003, Week 5 10
The calculations are based on the following equality (see the formulas
for the grand mean on the next slide):

SS = SST + SSE

Total Sum of Squares, SS: overall variation
Sum of Squares for Treatments, SST: variation between samples
Sum of Squares for Error, SSE: variation within samples

The sample presents the strongest
possible support for H0 when all
sample means are equal to the
grand mean and thus
SST = MST = Fobs = 0.

Mean Squares: MST = SST / (k − 1), MSE = SSE / (n − k)

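The decomposition can be verified numerically. A minimal Python sketch with hypothetical data (scipy's f_oneway implements the same F-test as R's aov):

```python
import numpy as np
from scipy import stats

# Hypothetical independent samples from k = 3 populations
samples = [np.array([21.0, 23.5, 22.1, 24.0]),
           np.array([25.2, 26.1, 24.8, 25.5]),
           np.array([22.9, 23.8, 24.4, 23.1])]

k = len(samples)
n = sum(len(s) for s in samples)
grand = np.concatenate(samples).mean()

ss  = ((np.concatenate(samples) - grand) ** 2).sum()          # total variation
sst = sum(len(s) * (s.mean() - grand) ** 2 for s in samples)  # between samples
sse = sum(((s - s.mean()) ** 2).sum() for s in samples)       # within samples

mst, mse = sst / (k - 1), sse / (n - k)
f_obs = mst / mse
p_val = stats.f.sf(f_obs, k - 1, n - k)

# Cross-check against scipy's built-in one-way ANOVA
f_scipy, p_scipy = stats.f_oneway(*samples)
```

The manual F statistic and p-value agree with the library routine, and SS = SST + SSE holds exactly.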


Ex 1:
Suppose we want to compare the cholesterol contents (milligrams per package)
of k = 4 competing diet foods (A, B, C, D) on the basis of four independent
random samples of size nj = 3 each.
Each brand of diet food is a specific treatment and the packages are the
experimental units.

a) Are the differences between the sample means significant or can
they be attributed to chance? Use α = 0.05.

H0 : μ1 = μ2 = μ3 = μ4 versus HA : at least one μj (j = 1, 2, 3, 4) differs.

The analysis is based on a completely randomised design and, assuming that
there are no other diet foods, the proper setup is a fixed-effects ANOVA model.

Grand mean: the mean of all n observations pooled together; if all sample
sizes are the same, say m, it is the simple average of the k sample means.
F,df1,df2 = F0.05,3,8 = 4.07 Fobs = 2.25 < 4.07 = Fcrit, so H0 is maintained.
Consequently, we conclude at the 5% level that the differences among the
cholesterol contents of the four competing diet foods are insignificant.
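The critical value and the p-value of Fobs = 2.25 can be checked against any F table; for instance (Python sketch, using the degrees of freedom of this example):

```python
from scipy import stats

f_crit = stats.f.ppf(0.95, dfn=3, dfd=8)   # 5% critical value of F(3, 8)
p_val  = stats.f.sf(2.25, 3, 8)            # p-value of Fobs = 2.25
```

Since p_val exceeds 0.05, H0 is maintained, consistent with the table-based decision.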

This test can be reproduced in R with the
summary(aov(Cholesterol ~ as.factor(Food))) command, which returns

One of the critical assumptions behind the ANOVA F-test is that the sampled
(sub-) populations have the same variance (i.e. they are homoskedastic).
When we are uncertain about the validity of this assumption, it is better to
perform the Welch F-test, a generalization of the Welch t-test (see slides
13 and 17 of week 3) that does not require equal variances.

To run this test in R, execute the oneway.test(Cholesterol ~ Food) command,
which returns

Again, H0 is maintained.
Note: The ANOVA F-test is very reliant on the assumption of equal variances.
In addition, both the ANOVA F-test and the Welch F-test might lead to
incorrect conclusions if the populations are strongly non-normal. When in
doubt, it is better to use some nonparametric alternative to these tests.

• The Kruskal-Wallis test is a nonparametric counterpart of one-way
independent ANOVA and a generalization of the Wilcoxon rank-sum test
(also known as the Mann-Whitney U-test; week 3, slides #19-21).
It can be used to compare the central locations (medians) of
several populations when the data are ranked or quantitative but
not normal. The hypotheses are

H0 : the k population central locations are all the same
HA : at least two population central locations differ

This procedure assumes that
i. the data consist of independent random samples drawn from
ii. populations that differ at most with respect to their central locations
(i.e. medians), and
iii. the variable of interest is continuous and the measurement scale is
at least ordinal.
Like the Wilcoxon rank-sum test, the Kruskal-Wallis test is based on the
ranks in the pooled set of the k independent samples.
Combine the k independent samples of sizes n1, n2, …, nk,
and rank the observations from the smallest (1) to the largest
(n = Σ nj), averaging the ranks of tied observations.
Let Tj denote the sum of ranks assigned to the observations in
the j th sample.

The test statistic is

H = [12 / (n(n + 1))] Σ (Tj² / nj) − 3(n + 1)

If H0 is true, T1²/n1, T2²/n2, …, Tk²/nk are fairly similar and their sum is
relatively small. Hence, a 'large' H value indicates that H0 is probably
incorrect.
The sampling distribution of H is non-standard, but if H0 is true and each
sample size is sufficiently large (say, at least 5), H is approximately
chi-square distributed with k − 1 degrees of freedom.
Reject H0 if the observed test statistic exceeds
the small-sample or the chi-square critical value,
whichever is appropriate.
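The ranking procedure and the H statistic can be sketched in a few lines of Python (hypothetical, tie-free data; scipy's kruskal gives the same statistic when there are no ties):

```python
import numpy as np
from scipy import stats

# Hypothetical samples from k = 3 populations (no tied values)
samples = [np.array([1.2, 3.4, 5.1, 6.0, 8.2]),
           np.array([2.2, 4.0, 6.3, 7.1, 9.5]),
           np.array([0.5, 2.9, 7.7, 8.8, 10.1])]

pooled = np.concatenate(samples)
n = len(pooled)
ranks = stats.rankdata(pooled)        # would average the ranks of ties

# Split the ranks back into the k groups and sum them
sizes = [len(s) for s in samples]
groups = np.split(ranks, np.cumsum(sizes)[:-1])
T = [g.sum() for g in groups]         # rank sum per sample

H = 12 / (n * (n + 1)) * sum(t ** 2 / m for t, m in zip(T, sizes)) - 3 * (n + 1)

# Cross-check against scipy's Kruskal-Wallis test
H_scipy, p = stats.kruskal(*samples)
chi2_crit = stats.chi2.ppf(0.95, df=len(samples) - 1)
```

H is then compared with the chi-square critical value (here with k − 1 = 2 degrees of freedom).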
Note: For k = 2, we can apply either the Wilcoxon rank-sum test or the
Kruskal-Wallis test. However, while the Wilcoxon rank-sum test can be
used for one-sided and two-sided alternative hypotheses alike, the
Kruskal-Wallis test can only determine whether a significant difference
exists between the sample medians.

Ex 1 (cont.)
b) In part (a) we tacitly assumed that the necessary requirements are satisfied.
Since the sample sizes are equal but small, we cannot statistically verify
normality and the equality of variances. Given this uncertainty, let's perform
the Kruskal-Wallis test, first manually and then with R.

H0 : 1 = 2 = 3 = 4
HA : at least one i is different



The chi-square critical value is χ²α,k-1 = χ²0.05,3 = 7.81, larger than Hobs = 5.17.

However, each nj = 3 is smaller than 5, so the chi-square approximation might
be misleading. The more accurate 'small-sample' 5% critical value for k = 4
and nj = 3 (j = 1, 2, 3, 4) is 6.8974 (see the relevant table on LMS). It is smaller
than the chi-square critical value, but still larger than the observed test statistic
value (5.17).

We maintain H0 at the 5% level and conclude that the differences
among the cholesterol contents of the four competing diet foods are
insignificant.
It is reassuring that we drew the same conclusion from all three tests,
i.e. the ANOVA F-test, the Welch F-test and the Kruskal-Wallis test.

In R the Kruskal-Wallis test can be performed by executing the
kruskal.test(Cholesterol, Food) command. It produces the following printout:

H0 is maintained.
Note: The p-value reported by R is based on the asymptotically valid chi-
square distribution, even if the sample sizes are too small to justify the
chi-square approximation. It is our task to check whether the samples are
large enough.
In this case, for example, the p-value is somewhat inaccurate, but it still
leads to the same conclusion as the 'small-sample' KW critical value
(see the previous slide).



ONE-WAY ANOVA: RANDOMISED BLOCKS

• Parametric one-way ANOVA based on a randomised block design is the
multi-population equivalent of the matched-pairs Z or t test for the
difference between two population means.
The randomised block design can make it easier to detect differences
among the treatments by reducing the variation within them.

Suppose, for example, that a statistician wants to determine whether
incentive pay plans offered to employees are effective. To do so, he selects
three groups of five workers who assemble the same equipment, and offers
a different incentive plan to each group.
The treatments are the incentive plans, the response variable is the
production output, and the experimental units are the workers.

Since productivity likely depends on various characteristics of the workers,
such as age, gender or experience, the experiment can be made more efficient
by forming groups of workers with no or only small differences with respect
to these characteristics, i.e. by eliminating individual differences.
• One-way ANOVA using randomised block design (with k treatments and
b blocks) is based on four assumptions:
i. Each observed xij constitutes an independent random sample of
size 1 drawn from one of the k × b (sub-) populations considered.
ii. Each (sub-) population is normally distributed,
iii. … and has the same variance, σ².
iv. The block and treatment effects are additive.
There is supposed to be no interaction between blocks and
treatments, i.e. the effect of any given block-treatment
combination is exactly the same as the sum of their
individual effects.

• When the experimental design is the completely randomised design
(i.e. independent samples), the total sum of squares is decomposed into
two sources of variation, treatment and error (see slide #11).

For the randomised block design, it is divided into three components
('Treatment', 'Block', 'Error'):

SS = SST + SSB + SSE
Denoting the mean for treatment j and for block i as x̄T,j and x̄B,i,
respectively, these sums of squares are as follows.

Total Sum of Squares: SS = Σi Σj (xij − x̄)²

Sum of Squares for Treatment: SST = b Σj (x̄T,j − x̄)²

Sum of Squares for Blocks: SSB = k Σi (x̄B,i − x̄)²

Sum of Squares for Error: SSE = SS − SST − SSB


• Randomised block design allows us to perform two different tests, one
for the treatment means and another one for the block means.
In both cases, the hypotheses, test statistics and decision rules are
similar to the ones in independent-samples one-way ANOVA.

1) Testing treatment means

Under H0, F = MST / MSE ~ F(k − 1, (k − 1)(b − 1)),
where MST = SST / (k − 1) and MSE = SSE / ((k − 1)(b − 1)).

2) Testing block means

Under H0, F = MSB / MSE ~ F(b − 1, (k − 1)(b − 1)),
where MSB = SSB / (b − 1).

In both tests, reject H0 if Fobs > Fα,df1,df2.
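Both tests can be sketched numerically in a few lines (Python, hypothetical data; rows are blocks, columns are treatments):

```python
import numpy as np
from scipy import stats

# Hypothetical randomised-block data: b = 4 blocks x k = 3 treatments
x = np.array([[10.2, 11.5, 12.1],
              [ 9.8, 11.0, 11.7],
              [10.9, 12.2, 12.6],
              [ 9.5, 10.4, 11.2]])
b, k = x.shape
grand = x.mean()

sst = b * ((x.mean(axis=0) - grand) ** 2).sum()   # treatments (column means)
ssb = k * ((x.mean(axis=1) - grand) ** 2).sum()   # blocks (row means)
ss  = ((x - grand) ** 2).sum()                    # total
sse = ss - sst - ssb                              # error, by subtraction

mst = sst / (k - 1)
msb = ssb / (b - 1)
mse = sse / ((k - 1) * (b - 1))

F_treat = mst / mse
F_block = msb / mse
p_treat = stats.f.sf(F_treat, k - 1, (k - 1) * (b - 1))
p_block = stats.f.sf(F_block, b - 1, (k - 1) * (b - 1))
```

The error sum of squares obtained by subtraction equals the one computed directly from the residuals of the additive model, which is a useful sanity check.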


Ex 2: (Selvanathan, p. 645, ex. 15.42)
As an experiment to understand measurement error, a statistics professor asks
four students to measure the heights of the professor (PR), a male student
(MS), and a female student (FS). The differences (in centimetres) between the
correct heights and the heights measured by the students (Error) are listed
below.
Each student measured the
height of the same three people,
hence this experiment is based
on a repeated-measures design,
i.e. on a special case of the
randomised block design.
a) Can we infer that there are differences in the errors between the subjects
being measured? Use α = 0.05.
b) Can we infer that there are differences in the errors between the students
who obtained the measurements? Use α = 0.05.

Let’s consider the subjects being measured as the treatments (k = 3) and the
students who measure as the blocks (b = 4). To answer these questions we
need to test both the treatment means (a) and the block means (b).
The total sum of squares can be obtained from the overall sample variance,
SS = (n − 1)s², where n = kb.

The error sum of squares could be computed using the definitional formula,
but it is easier to obtain it from SSE = SS − SST − SSB.


i. Treatment means

F,df1,df2 = F0.05,2,6 = 5.14 > 5.13, so we cannot reject H0 at the 5% level.


Hence, the treatment means are only insignificantly different, i.e. the
differences of measurement errors between the subjects are
insignificant.

ii. Block means

F,df1,df2 = F0.05,3,6 = 4.76 < 23.18, so we can reject H0 at the 5% level.


Hence, the block means are significantly different, i.e. the students
differ in terms of measurement errors they make.
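Both critical values used above can be reproduced from the F distribution (a quick Python check):

```python
from scipy import stats

f_crit_treat = stats.f.ppf(0.95, dfn=2, dfd=6)   # 5% critical value of F(2, 6)
f_crit_block = stats.f.ppf(0.95, dfn=3, dfd=6)   # 5% critical value of F(3, 6)
```

These return approximately 5.14 and 4.76, matching the tabulated values on this slide.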
To run this test in R, execute the summary(aov(Error ~ as.factor(Student) +
as.factor(Subject))) command, which returns

At the 5% significance level H0 is maintained for the treatment (i.e. Subject)
means, but it is rejected for the block (i.e. Student) means. Notice, however,
that both null hypotheses could be rejected at the 5.2% level.

• The Friedman test is a nonparametric alternative to one-way ANOVA on
randomised blocks and a generalization of the Wilcoxon signed ranks
test for matched pairs (week 2, slides 24-25 and week 3, slide 4).
It can be used to compare the central locations of two or more
(sub-) populations when the data are ranked or quantitative but
not normal.
The hypotheses are the same as in the Kruskal-Wallis test, i.e.

H0 : the k population central locations are all the same
HA : at least two population central locations differ


However, the Friedman test is based on the ranks within each
block.
If there are no ties, the test statistic is

Fr = [12 / (bk(k + 1))] Σ Tj² − 3b(k + 1)

where b and k are the number of blocks and treatments, respectively,
and Tj is the sum of ranks for treatment j.
If there are ties, Fr has to be corrected for the number of ties; the
corrected test statistic is Frc = Fr / C, where C is a correction factor
that depends on ti, the number of tied scores in the i th block.
Note: The correction factor always satisfies 0 < C ≤ 1. Therefore, Frc ≥ Fr and it
can happen that Fr is below but Frc is above the critical value.
The test based on Frc has potentially more power against H0.
If H0 is true, T1, T2, …, Tk are fairly similar and the sum of their squares is
relatively small. Hence, a 'large' Fr (Frc) value indicates that H0 is
probably incorrect.
The sampling distribution of Fr (Frc) is non-standard, but if H0 is true and
k and/or b is sufficiently large (k > 6 and/or b > 24), Fr (Frc) is
approximately chi-square distributed with k − 1 degrees of freedom.

Reject H0 if the observed test statistic exceeds
the small-sample or the chi-square critical value,
whichever is appropriate.
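The within-block ranking and the Fr statistic can be sketched as follows (Python, hypothetical data with no ties in any block, so C = 1 and Frc = Fr; scipy's friedmanchisquare runs the same test):

```python
import numpy as np
from scipy import stats

# Hypothetical blocked data: b = 5 blocks (rows) x k = 3 treatments (columns)
x = np.array([[7.1, 8.3, 6.0],
              [5.4, 6.8, 5.9],
              [8.0, 9.1, 7.2],
              [6.3, 7.7, 6.9],
              [7.8, 8.9, 7.0]])
b, k = x.shape

ranks = np.apply_along_axis(stats.rankdata, 1, x)  # rank within each block
T = ranks.sum(axis=0)                              # rank sum per treatment

Fr = 12 / (b * k * (k + 1)) * (T ** 2).sum() - 3 * b * (k + 1)

# scipy expects one array per treatment, i.e. the columns of x
Fr_scipy, p = stats.friedmanchisquare(*x.T)
```

With no ties the manual statistic matches the library value exactly.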

Ex 3: (Selvanathan et al., p. 912, ex. 20.45)
The following data are from a blocked experiment. Conduct the Friedman test
to determine whether at least two population central locations differ.
Use α = 0.05.

k = 3, b = 5



The ranks have to be assigned by moving across blocks (rows),
and the rank sums are calculated for the treatments (columns).

Fr,obs = 4.8 is the uncorrected Friedman test statistic.

This time, however, the corrected Friedman test statistic Frc is exactly the same
because there is not a single tie, hence each ti = 0 and the correction factor
is C = 1 (see the formulas on slide 28).



From the Friedman critical value table on LMS, the 5% small-sample critical
value is 6.4 and since Fr,obs = 4.8 < 6.4 = Fcrit, we fail to reject H0. Hence, it is
not possible to conclude at the 5% level that at least two population central
locations differ.

To run this test in R, execute the friedman.test(Y ~ Treatment | Block)
command, which returns

H0 is maintained.

Note: The p-values reported by R for nonparametric tests are based on the
asymptotically valid distributions, even if the required conditions for
reasonably accurate approximations are not satisfied. It is our task to
check whether these approximations are acceptable.



WHAT SHOULD YOU KNOW?

• The rationale behind analysis of variance (ANOVA).
• The difference between the completely randomised design and the
randomised block design.
• The difference between fixed-effects and random-effects models.
• How to perform parametric ANOVA based on the completely randomised
design and on the randomised block design, manually and with R.
• How to perform nonparametric ANOVA based on the completely randomised
design and on the randomised block design, manually and with R.
