
BFC 34303

CIVIL ENGINEERING STATISTICS


Chapter 8
Analysis of Variance
Faculty of Civil and Environmental Engineering
Universiti Tun Hussein Onn Malaysia

Analysis of Variance
Analysis of variance (ANOVA) is a hypothesis testing technique used to
test the equality of two or more population means by examining the
variances of samples that are taken.
It is a parametric test. Parametric tests are those that make assumptions
about the parameters of the population distribution from which the sample
is drawn.
It is used to determine whether:
• the differences between the samples are simply due to random error.
• there are systematic treatment effects that cause the mean in one
group to differ from the mean in another.
This is achieved by calculating the 𝐹-ratio.

ANOVA is based on comparing the variance (or variation) between the
data samples to variation within each particular sample.
If the “between” variation is much larger than the “within” variation, the
population means are unlikely to be all equal.
If the “between” and “within” variations are approximately the same size,
then there will be no significant difference between sample means.

Variance between samples > Variance within samples → Difference in means is significant
Variance between samples ≈ Variance within samples → Difference in means is not significant

𝐹 Distribution
The probability distribution used in ANOVA is the 𝐹 distribution. It was
named to honour Sir Ronald Fisher, one of the founders of modern-day
statistics.
The 𝐹 distribution is used as the distribution of the test statistic for several
situations, such as:
1. To test whether two samples are from populations having equal
variances.
2. To compare several population means simultaneously (as in the case
of ANOVA).
In both of these situations, the populations must be normal and the data
must be at least interval-scale.

Characteristics of the 𝑭 Distribution

1. There is a family of 𝐹 distributions. Each member of the family is
determined by the degrees of freedom in the numerator and the
degrees of freedom in the denominator.
2. The 𝐹 distribution is continuous, meaning that it can assume an infinite
number of values between 0 and +∞.
3. The 𝐹 distribution cannot be negative, hence the smallest value 𝐹 can
have is 0.
4. It is positively skewed or skewed to the right (the shape of the
distribution has a long right-tail).
5. It is asymptotic, meaning as the values of 𝑋 increase, the 𝐹 curve
approaches the 𝑋-axis but never touches it.
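
The characteristics listed above can be checked empirically. The following is a minimal sketch, assuming two normal populations with equal variance (mean 0, standard deviation 1) and sample sizes of 7 and 8. These values, the seed and the function name `simulate_f_ratios` are illustrative choices and not part of the course material. It simulates many ratios of sample variances and checks that they are never negative and that the mean lies above the median, as expected for a right-skewed distribution.

```python
import random
import statistics


def simulate_f_ratios(n1, n2, trials=5000, seed=1):
    """Simulate ratios of sample variances from two N(0, 1) populations."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ratios = []
    for _ in range(trials):
        sample1 = [rng.gauss(0, 1) for _ in range(n1)]
        sample2 = [rng.gauss(0, 1) for _ in range(n2)]
        # Each ratio is one draw from an F distribution with (n1 - 1, n2 - 1) df.
        ratios.append(statistics.variance(sample1) / statistics.variance(sample2))
    return ratios


ratios = simulate_f_ratios(n1=7, n2=8)
smallest = min(ratios)
mean_ratio = statistics.mean(ratios)
median_ratio = statistics.median(ratios)

print(f"smallest ratio: {smallest:.4f}")  # a variance ratio cannot be negative
print(f"mean:   {mean_ratio:.3f}")
print(f"median: {median_ratio:.3f}")      # mean above median suggests right skew
```

Changing `n1` and `n2` changes the degrees of freedom and therefore which member of the 𝐹 family is being simulated.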

The 𝑭 Curve

The 𝐹 curve is generally asymmetrical and asymptotic, with a skew to the
right. However, as the degrees of freedom for the numerator and for the
denominator get larger, the curve approximates the normal.

[Figure: 𝐹 curves for df = (29, 28), df = (19, 6) and df = (6, 6); vertical axis: relative frequency]

Comparing Two Population Variances using the 𝐹 Distribution
The 𝐹 distribution is used to test the hypothesis that the variance of one
normal population ($\sigma_1^2$) equals the variance of another normal
population ($\sigma_2^2$).
The null and alternative hypotheses will be:
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$

To conduct the test, we select a random sample of $n_1$ observations from
the first population, and a sample of $n_2$ observations from the second
population.

The 𝐹 test statistic (or 𝐹-ratio) is defined as follows:

$F = \frac{s_1^2}{s_2^2}$

where
$s_1^2$ = variance of the first sample (usually having the higher value)
$s_2^2$ = variance of the second sample

If the null hypothesis is true, the test statistic follows the 𝐹 distribution with
$n_1 - 1$ and $n_2 - 1$ degrees of freedom.
The critical value of 𝐹 is determined using the 𝐹 distribution table, given
the significance level 𝛼 and the degrees of freedom of the numerator and
denominator.
Example 8.1
A GrabCar driver is considering two routes to the airport that he
should use to transport his passengers. Driving times for both routes
were collected. Using the 0.10 significance level, is there a difference
in the variation in the driving times using the two routes?

Driving time (minutes)
Route 1   Route 2
52        59
67        60
56        61
45        51
70        56
54        63
64        57
          65

Calculate the mean and variance for both routes.

Route 1
$\bar{X}_1 = \frac{\sum X}{n} = \frac{408}{7} = 58.29$

Route 2
$\bar{X}_2 = \frac{\sum X}{n} = \frac{472}{8} = 59$

Driving time (minutes)
Route 1                        Route 2
$X$       $(X - \bar{X})^2$    $X$       $(X - \bar{X})^2$
52        39.5                 59        0
67        75.9                 60        1
56        5.2                  61        4
45        176.5                51        64
70        137.2                56        9
54        18.4                 63        16
64        32.7                 57        4
                               65        36
Σ = 408   Σ = 485.4            Σ = 472   Σ = 134
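Route 1
$s_1^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{485.4}{7 - 1} = 80.9$

Route 2
$s_2^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{134}{8 - 1} = 19.1$

(The larger sample variance is labelled $s_1^2$ and takes the role of the
numerator. The smaller sample variance goes in the denominator.)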

The null and alternative hypotheses:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$

The significance level 𝛼 is 0.10



Degrees of freedom in the numerator ($v_1$) is 7 − 1 = 6 and degrees of
freedom in the denominator ($v_2$) is 8 − 1 = 7. Since this is a two-tailed
test, 𝛼/2 is 0.05.

Therefore, from the critical 𝐹 table, the critical 𝐹-value is 3.87.

The rejection region:

[Figure: 𝐹 curve starting at 0 with an area of 𝛼/2 = 0.05 in each tail. Reject $H_0$ beyond the upper critical 𝐹-value of 3.87. The lower critical 𝐹-value is not required since 𝐹 ≥ 1.00.]

Decision rule: If calculated 𝐹 > critical 𝐹-value → Reject $H_0$



$F = \frac{s_1^2}{s_2^2} = \frac{80.9}{19.1} = 4.24$

Since the calculated 𝐹 (4.24) is greater than the critical 𝐹 (3.87) and falls
in the rejection region, we reject $H_0$. Therefore, we accept $H_1$, which
states that there is a difference in the variation of the driving times for
both routes.
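
The calculation in Example 8.1 can be reproduced with a short script. This is a minimal sketch using only Python's standard `statistics` module. The variable names are illustrative, and the critical value 3.87 is copied from the 𝐹 table rather than computed. Because the script does not round the variances before dividing, it gives an 𝐹-ratio of about 4.23 rather than the 4.24 obtained from the rounded values 80.9 and 19.1. The decision is the same.

```python
import statistics

route_1 = [52, 67, 56, 45, 70, 54, 64]
route_2 = [59, 60, 61, 51, 56, 63, 57, 65]

# statistics.variance returns the sample variance (divisor n - 1).
var_1 = statistics.variance(route_1)
var_2 = statistics.variance(route_2)

# Place the larger variance in the numerator so that F >= 1.
if var_1 >= var_2:
    f_ratio = var_1 / var_2
    df_num, df_den = len(route_1) - 1, len(route_2) - 1
else:
    f_ratio = var_2 / var_1
    df_num, df_den = len(route_2) - 1, len(route_1) - 1

F_CRITICAL = 3.87  # taken from the F table: alpha/2 = 0.05, df = (6, 7)
reject_h0 = f_ratio > F_CRITICAL

print(f"s1^2 = {var_1:.1f}, s2^2 = {var_2:.1f}")
print(f"F = {f_ratio:.2f} with df = ({df_num}, {df_den})")
print("Reject H0" if reject_h0 else "Do not reject H0")
```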

Try solving this question using Excel (F-Test Two-Sample for Variances)

One-Way ANOVA Test
Another use of the 𝐹 distribution is the ANOVA technique in which we
compare three or more population means to determine whether they
could be equal, or in other words, to test the equality of means for more
than two populations.
We call this the one-way ANOVA test. The 𝐹 test in ANOVA is one-tailed
(right-tailed).
To use ANOVA, we assume the following:
1. The dependent variable is measured at interval-scale or ratio-scale.
2. The populations are normally distributed.
3. The populations have equal standard deviations.
4. The samples are selected independently.

ANOVA Table
We construct an ANOVA table to summarise the calculations of the 𝐹
statistic. The format of the ANOVA table is as follows:
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square             𝐹
Treatments            𝑆𝑆𝑇              𝑘 − 1                𝑀𝑆𝑇 = 𝑆𝑆𝑇 / (𝑘 − 1)     𝑀𝑆𝑇 / 𝑀𝑆𝐸
Error                 𝑆𝑆𝐸              𝑛 − 𝑘                𝑀𝑆𝐸 = 𝑆𝑆𝐸 / (𝑛 − 𝑘)
Total                 𝑆𝑆               𝑛 − 1

Notes: 𝑘 = number of treatments, 𝑛 = total number of observations
𝑘 − 1 = degrees of freedom in the numerator, 𝑛 − 𝑘 = degrees of freedom in the denominator
“Treatments” are also called “Groups”, and “Error” can also be called “Residual”
“Treatments” may be called “Between Groups”, and “Error” may be called “Within Groups”
Sum of squares total (SS) = $\sum X^2 - \frac{(\sum X)^2}{n}$

Sum of squares treatment (SST) = $\sum \frac{T_c^2}{n_c} - \frac{(\sum X)^2}{n}$
where $T_c$ = column total for each treatment
$n_c$ = sample size for each treatment

Sum of squares error (SSE) = SS – SST

Mean squares treatment (MST) = SST / (𝑘 − 1)

Mean squares error (MSE) = SSE / (𝑛 − 𝑘)

𝐹 statistic or 𝐹-ratio = MST / MSE
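
The formulas above can be sketched as a small function. This is a minimal illustration, not a reference implementation. The function name `one_way_anova`, the dictionary keys and the three small groups of numbers are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA quantities with the shortcut formulas."""
    k = len(groups)                              # number of treatments
    n = sum(len(g) for g in groups)              # total number of observations
    grand_total = sum(sum(g) for g in groups)    # sum of all X
    correction = grand_total ** 2 / n            # (sum X)^2 / n

    ss_total = sum(x ** 2 for g in groups for x in g) - correction
    ss_treatment = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_error = ss_total - ss_treatment

    ms_treatment = ss_treatment / (k - 1)
    ms_error = ss_error / (n - k)
    return {
        "SS": ss_total,
        "SST": ss_treatment,
        "SSE": ss_error,
        "MST": ms_treatment,
        "MSE": ms_error,
        "F": ms_treatment / ms_error,
    }


# Hypothetical data: three treatments with three observations each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in result.items():
    print(f"{name} = {value:.2f}")
```

For this made-up data SS = 32, SST = 26 and SSE = 6, so MST = 13, MSE = 1 and the 𝐹-ratio is 13.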



Example 8.2
Students taking a course in Statistics were asked to rate the performance
of their professor as Excellent, Good, Fair or Poor. The rating was
matched with the student’s course grade. Use the 0.01 significance level
to determine if there is a difference in the mean score of the students in
each of the four rating categories.
Course Grades
Excellent   Good   Fair   Poor
94          75     70     68
90          68     73     70
85          77     76     72
80          83     78     65
            88     80     74
                   68     65
                   65
$H_0$: The mean scores are all equal ($\mu_1 = \mu_2 = \mu_3 = \mu_4$)

$H_1$: The mean scores are not all equal (at least two means are different
from each other)

The significance level 𝛼 is 0.01

Degrees of freedom in the numerator ($v_1$) is 𝑘 − 1 = 4 − 1 = 3 and
degrees of freedom in the denominator ($v_2$) is 𝑛 − 𝑘 = 22 − 4 = 18.
The 𝐹 distribution for this test is one-tailed, so use 𝛼 = 0.01.

Therefore, from the critical 𝐹 table, the critical 𝐹-value is 5.09.

Decision rule: If calculated 𝐹 > critical 𝐹-value → Reject $H_0$

Course Grades
              Excellent       Good            Fair            Poor            Total
              $X$  $X^2$      $X$  $X^2$      $X$  $X^2$      $X$  $X^2$
              94   8836       75   5625       70   4900       68   4624
              90   8100       68   4624       73   5329       70   4900
              85   7225       77   5929       76   5776       72   5184
              80   6400       83   6889       78   6084       65   4225
                              88   7744       80   6400       74   5476
                                              68   4624       65   4225
                                              65   4225
$T_c$         349             391             510             414             1664
$n_c$         4               5               7               6               22
$T_c^2/n_c$   30450.25        30576.2         37157.14        28566           126749.59
$\sum X^2$    30561           30811           37338           28634           127344


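$F = \frac{MST}{MSE} = \frac{SST/(k - 1)}{SSE/(n - k)}$

$SS = \sum X^2 - \frac{(\sum X)^2}{n} = 127,344 - \frac{1,664^2}{22} = 1,485.09$

$SST = \sum \frac{T_c^2}{n_c} - \frac{(\sum X)^2}{n} = 126,749.59 - \frac{1,664^2}{22} = 890.68$

$SSE = SS - SST = 1,485.09 - 890.68 = 594.41$

$F = \frac{890.68/(4 - 1)}{594.41/(22 - 4)} = 8.99$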

ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   𝐹
Treatments            890.68           3                    296.89        8.99
Error                 594.41           18                   33.02
Total                 1,485.09         21

Since the calculated 𝐹 (8.99) is greater than the critical 𝐹 (5.09), we
reject $H_0$. Therefore, we accept $H_1$, which states that the population
means are not all equal.
The mean scores are not the same in each rating category. It is likely that
the grades students earned are related to the opinion they have of the
performance of the professor.
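
The ANOVA table for Example 8.2 can be cross-checked with a short script. This is a minimal sketch using only the standard library. It computes the sums of squares from deviations (group means around the grand mean, and observations around their own group mean) rather than from the shortcut formulas, so it also illustrates the “between” and “within” variation directly. The variable names are illustrative, and the critical value 5.09 is copied from the 𝐹 table rather than computed.

```python
import statistics

grades = {
    "Excellent": [94, 90, 85, 80],
    "Good": [75, 68, 77, 83, 88],
    "Fair": [70, 73, 76, 78, 80, 68, 65],
    "Poor": [68, 70, 72, 65, 74, 65],
}

all_scores = [x for scores in grades.values() for x in scores]
grand_mean = statistics.mean(all_scores)
k, n = len(grades), len(all_scores)

# Between-groups variation: group means around the grand mean.
sst = sum(
    len(scores) * (statistics.mean(scores) - grand_mean) ** 2
    for scores in grades.values()
)
# Within-groups variation: observations around their own group mean.
sse = sum(
    (x - statistics.mean(scores)) ** 2
    for scores in grades.values()
    for x in scores
)

mst = sst / (k - 1)
mse = sse / (n - k)
f_ratio = mst / mse

F_CRITICAL = 5.09  # taken from the F table: alpha = 0.01, df = (3, 18)
reject_h0 = f_ratio > F_CRITICAL

print(f"SST = {sst:.2f}, SSE = {sse:.2f}")
print(f"MST = {mst:.2f}, MSE = {mse:.2f}, F = {f_ratio:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```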


Try solving this question using Excel (ANOVA: Single Factor)
