BFC 34303 Chapter 8 Analysis of Variance
Analysis of Variance
Analysis of variance (ANOVA) is a hypothesis-testing technique used to
test the equality of two or more population means by examining the
variances of the samples taken.
It is a parametric test. Parametric tests are those that make assumptions
about the parameters of the population distribution from which the sample
is drawn.
It is used to determine whether:
• the differences between the samples are simply due to random error, or
• there are systematic treatment effects that cause the mean in one
group to differ from the mean in another.
This is achieved by calculating the 𝐹-ratio.
ANOVA is based on comparing the variation between the data samples
(“between” variation) to the variation within each particular sample
(“within” variation).
If the “between” variation is much larger than the “within” variation, this
suggests that the population means are not all equal.
If the “between” and “within” variations are of approximately the same
size, there is no significant difference between the sample means.
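The “between versus within” comparison can be sketched numerically. The data below are made up for illustration and are not from the notes:

```python
# Illustrative sketch: "between" and "within" variation for three samples.
from statistics import mean

groups = [
    [52, 55, 58, 61],   # sample 1 (hypothetical data)
    [70, 73, 68, 75],   # sample 2
    [54, 57, 53, 60],   # sample 3
]

group_means = [mean(g) for g in groups]
grand_mean = mean(x for g in groups for x in g)

# "Between" variation: how far each group mean lies from the grand mean.
between = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))

# "Within" variation: spread of observations around their own group mean.
within = sum((x - m) ** 2
             for g, m in zip(groups, group_means) for x in g)

print(between, within)  # a large between/within ratio hints at unequal means
```

Here the “between” variation dwarfs the “within” variation because sample 2 sits well above the other two, which is exactly the pattern the 𝐹-ratio is designed to detect.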
𝐹 Distribution
The probability distribution used in ANOVA is the 𝐹 distribution. It was
named to honour Sir Ronald Fisher, one of the founders of modern-day
statistics.
The 𝐹 distribution is used as the distribution of the test statistic for several
situations, such as:
1. To test whether two samples are from populations having equal
variances.
2. To compare several population means simultaneously (as in the case
of ANOVA).
In both of these situations, the populations must be normal and the data
must be at least interval-scale.
Characteristics of the 𝑭 Distribution
• There is a family of 𝐹 distributions, one for each pair of degrees of
freedom (numerator and denominator).
• The 𝐹 distribution is continuous and cannot be negative (𝐹 ≥ 0).
• It is positively skewed, with a long tail to the right.
• It is asymptotic: as 𝐹 increases, the curve approaches, but never
touches, the horizontal axis.
The 𝑭 Curve
(figure from the original slides: the 𝐹 curve)
Comparing Two Population Variances using the 𝐹 Distribution
The 𝐹 distribution is used to test the hypothesis that the variance of one
normal population (𝜎₁²) equals the variance of another normal population
(𝜎₂²).
The null and alternative hypotheses will be:
𝐻₀: 𝜎₁² = 𝜎₂²
𝐻₁: 𝜎₁² ≠ 𝜎₂²
The test statistic is the ratio of the sample variances:
𝐹 = 𝑠₁²/𝑠₂²
where
𝑠₁² = variance of the first sample (usually having the higher value)
𝑠₂² = variance of the second sample
If the null hypothesis is true, the test statistic follows the 𝐹 distribution with
𝑛₁ − 1 and 𝑛₂ − 1 degrees of freedom.
The critical value of 𝐹 is determined from the 𝐹 distribution table, given
the significance level 𝛼 and the degrees of freedom of the numerator and
denominator.
Example 8.1
A GrabCar driver is considering two routes to the airport that he should
use to transport his passengers. Driving times for both routes were
collected. Using the 0.10 significance level, is there a difference in the
variation in the driving times using the two routes?

Driving time (minutes)
Route 1: 52, 67, 56, 45, 70, 54, 64
Route 2: 59, 60, 61, 51, 56, 63, 57, 65
𝐻₀: 𝜎₁² = 𝜎₂²
𝐻₁: 𝜎₁² ≠ 𝜎₂²

Route 1
𝑠₁² = Σ(𝑋 − 𝑋̄)²/(𝑛₁ − 1) = 485.4/(7 − 1) = 80.9
(𝑠₁² should have the higher value and take the role of the numerator; if its
value were smaller, it would be the denominator)

Route 2
𝑠₂² = Σ(𝑋 − 𝑋̄)²/(𝑛₂ − 1) = 134/(8 − 1) = 19.1

With 𝛼/2 = 0.05 in each tail, the upper critical 𝐹-value with 6 and 7
degrees of freedom is 3.87. (The lower critical value is not required, since
𝐹 ≥ 1.00 when the larger variance is placed in the numerator.) Reject 𝐻₀
if 𝐹 > 3.87.

𝐹 = 𝑠₁²/𝑠₂² = 80.9/19.1 = 4.24

Since the calculated 𝐹 (4.24) is greater than the critical 𝐹 (3.87) and falls
in the rejection region, we reject 𝐻₀. Therefore, we accept 𝐻₁, which
states that there is a difference in the variation of the driving times for
both routes.
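The calculation can be checked with a short Python sketch (the data and the 3.87 table value are from the example; the variable names are mine):

```python
# Sketch of the two-variance F test from Example 8.1.
from statistics import variance  # sample variance, divisor n - 1

route1 = [52, 67, 56, 45, 70, 54, 64]
route2 = [59, 60, 61, 51, 56, 63, 57, 65]

s1, s2 = variance(route1), variance(route2)  # about 80.9 and 19.1
# The larger variance goes in the numerator so that F >= 1.
F = max(s1, s2) / min(s1, s2)

F_CRITICAL = 3.87  # from the F table: alpha/2 = 0.05, df = (6, 7)
print(round(F, 2), F > F_CRITICAL)
```

Working with the unrounded variances gives 𝐹 ≈ 4.23; the notes obtain 4.24 by rounding 𝑠₁² and 𝑠₂² to one decimal place first. Either way 𝐹 exceeds 3.87, so the conclusion is the same.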
One-Way ANOVA Test
Another use of the 𝐹 distribution is the ANOVA technique in which we
compare three or more population means to determine whether they
could be equal, or in other words, to test the equality of means for more
than two populations.
We call this the one-way ANOVA test; it is always an upper-tailed
(one-tailed) test, so the rejection region lies in the right tail of the 𝐹
distribution.
To use ANOVA, we assume the following:
1. The dependent variable is measured at interval-scale or ratio-scale.
2. The populations are normally distributed.
3. The populations have equal standard deviations.
4. The samples are selected independently.
ANOVA Table
We construct an ANOVA table to summarise the calculations of the 𝐹
statistic. The format of the ANOVA table is as follows:
ANOVA Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | 𝐹
Treatments | 𝑆𝑆𝑇 | 𝑘 − 1 | 𝑀𝑆𝑇 = 𝑆𝑆𝑇/(𝑘 − 1) | 𝑀𝑆𝑇/𝑀𝑆𝐸
Error | 𝑆𝑆𝐸 | 𝑛 − 𝑘 | 𝑀𝑆𝐸 = 𝑆𝑆𝐸/(𝑛 − 𝑘) |
Total | 𝑆𝑆 | 𝑛 − 1 | |
where 𝑘 = number of treatments and 𝑛 = total number of observations.
Sum of squares total: 𝑆𝑆 = Σ𝑋² − (Σ𝑋)²/𝑛
Sum of squares treatment: 𝑆𝑆𝑇 = Σ(𝑇ᵢ²/𝑛ᵢ) − (Σ𝑋)²/𝑛
Sum of squares error: 𝑆𝑆𝐸 = 𝑆𝑆 − 𝑆𝑆𝑇
where 𝑇ᵢ = column total for each treatment
𝑛ᵢ = sample size for each treatment
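The sum-of-squares formulas can be written as a small Python function (a minimal sketch; the function name and the toy data in the comment are mine):

```python
# Sketch of the sum-of-squares formulas for a list of treatment columns.
def sums_of_squares(columns):
    all_x = [x for col in columns for x in col]
    n = len(all_x)
    correction = sum(all_x) ** 2 / n             # (ΣX)²/n, the correction term
    ss = sum(x * x for x in all_x) - correction  # SS  = ΣX² − (ΣX)²/n
    sst = sum(sum(col) ** 2 / len(col)           # SST = Σ(Tᵢ²/nᵢ) − (ΣX)²/n
              for col in columns) - correction
    sse = ss - sst                               # SSE = SS − SST
    return ss, sst, sse

# Toy check with two columns: SS = 5, SST = 4, SSE = 1.
print(sums_of_squares([[1, 2], [3, 4]]))
```

The same correction term (Σ𝑋)²/𝑛 appears in both 𝑆𝑆 and 𝑆𝑆𝑇, so it is computed once and reused.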
Example 8.2
Students taking a course in Statistics were asked to rate the performance
of their professor as Excellent, Good, Fair or Poor. The rating was
matched with the student’s course grade. Use the 0.01 significance level
to determine if there is a difference in the mean score of the students in
each of the four rating categories.
Course Grades
Excellent: 94, 90, 85, 80
Good: 75, 68, 77, 83, 88
Fair: 70, 73, 76, 78, 80, 68, 65
Poor: 68, 70, 72, 65, 74, 65
𝐻₀: The mean scores are all equal (𝜇₁ = 𝜇₂ = 𝜇₃ = 𝜇₄)
𝐻₁: The mean scores are not all equal (at least two means differ from
each other)
Course Grades (𝑋 and 𝑋² for each rating)
Excellent: 𝑋 = 94, 90, 85, 80; 𝑋² = 8836, 8100, 7225, 6400
Good: 𝑋 = 75, 68, 77, 83, 88; 𝑋² = 5625, 4624, 5929, 6889, 7744
Fair: 𝑋 = 70, 73, 76, 78, 80, 68, 65; 𝑋² = 4900, 5329, 5776, 6084, 6400, 4624, 4225
Poor: 𝑋 = 68, 70, 72, 65, 74, 65; 𝑋² = 4624, 4900, 5184, 4225, 5476, 4225

Column totals | Excellent | Good | Fair | Poor | Total
𝑇ᵢ | 349 | 391 | 510 | 414 | 1664
𝑛ᵢ | 4 | 5 | 7 | 6 | 22
𝑇ᵢ²/𝑛ᵢ | 30450.25 | 30576.2 | 37157.14 | 28566 | 126749.59
Σ𝑋² | 30561 | 30811 | 37338 | 28634 | 127344
𝐹 = 𝑀𝑆𝑇/𝑀𝑆𝐸 = [𝑆𝑆𝑇/(𝑘 − 1)] / [𝑆𝑆𝐸/(𝑛 − 𝑘)]

𝑆𝑆 = Σ𝑋² − (Σ𝑋)²/𝑛 = 127,344 − 1,664²/22 = 1,485.09
𝑆𝑆𝑇 = Σ(𝑇ᵢ²/𝑛ᵢ) − (Σ𝑋)²/𝑛 = 126,749.59 − 1,664²/22 = 890.68
𝑆𝑆𝐸 = 𝑆𝑆 − 𝑆𝑆𝑇 = 1,485.09 − 890.68 = 594.41

𝐹 = [890.68/(4 − 1)] / [594.41/(22 − 4)] = 8.99
ANOVA Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | 𝐹
Treatments | 890.68 | 3 | 296.89 | 8.99
Error | 594.41 | 18 | 33.02 |
Total | 1,485.09 | 21 | |
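The whole calculation can be reproduced with a short Python sketch (the data are from Example 8.2; the variable names are mine):

```python
# Sketch of the one-way ANOVA calculation for Example 8.2.
grades = {
    "Excellent": [94, 90, 85, 80],
    "Good":      [75, 68, 77, 83, 88],
    "Fair":      [70, 73, 76, 78, 80, 68, 65],
    "Poor":      [68, 70, 72, 65, 74, 65],
}

columns = list(grades.values())
all_x = [x for col in columns for x in col]
n, k = len(all_x), len(columns)              # n = 22 observations, k = 4 treatments

correction = sum(all_x) ** 2 / n             # (ΣX)²/n
ss = sum(x * x for x in all_x) - correction  # total sum of squares
sst = sum(sum(c) ** 2 / len(c)               # treatment sum of squares
          for c in columns) - correction
sse = ss - sst                               # error sum of squares

f = (sst / (k - 1)) / (sse / (n - k))
print(round(ss, 2), round(sst, 2), round(f, 2))  # 1485.09 890.68 8.99
```

This matches the ANOVA table above: 𝑆𝑆𝑇 = 890.68 with 3 degrees of freedom, 𝑆𝑆𝐸 = 594.41 with 18, and 𝐹 = 8.99.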
Since the calculated 𝐹 (8.99) is greater than the critical 𝐹 (5.09, from the
𝐹 table at the 0.01 significance level with 3 and 18 degrees of freedom),
we reject 𝐻₀ and conclude that the mean scores of the students are not
all equal across the four rating categories.