
CHAPTER 4

SMQ27103: ANALYSIS OF VARIANCE (ANOVA)

Outline Chapter 4

WEEK 11
4.1 Introduction
4.2 Basic Concepts in Analysis of Variance (ANOVA)
4.3 Assumptions of ANOVA
4.4 Completely Randomized Design (One-way ANOVA)
    Introduction
    Variability in Response Variable
    Randomizing Experimental Units to Factor Levels
4.5 ANOVA F-test Statistic
    The Concept of ANOVA F-test
    Characteristics of F-Distribution
    Steps in Performing One-way ANOVA

WEEK 12
4.6 Kruskal-Wallis Test: Nonparametric Alternative to One-way ANOVA

WEEK 13
4.7 Randomized Complete Block Design (Two-way ANOVA with Replication)
    Introduction to Factorial Design
    Variations in Two Factors Factorial Design
    ANOVA Table for Two Factors Factorial Design
    Steps in Performing ANOVA for Two Factors Factorial Design
    Interaction Plot
    Possible Outcomes of a 2 x 2 Factorial Design

WEEK 11

4.1 Introduction
4.2 Basic Concepts of Analysis of Variance (ANOVA)
Learning outcome:
- Able to recognize the terminology of ANOVA
4.3 Assumptions of ANOVA
Learning outcome:
- Able to state the assumptions of ANOVA
4.4 Completely Randomized Design (One-Way ANOVA)
Learning outcome:
- Able to apply CRD using Excel and draw a conclusion based on the ANOVA output
4.5 ANOVA F-test Statistic

4.1 INTRODUCTION

ANOVA is a statistical inference method used to test the equality of three or more population means.

Used for:
• experimental research
• field/observational research
• quasi-experimental study

SEM 2 2021,2022
4.2 BASIC CONCEPT IN ANOVA

• Response/Dependent Variable: the result or outcome from an experimental or observational study.
• Factor/Independent Variable: affects the outcome of the response variable.
• Treatment/Groups: the levels of the factor, set up at random by the researcher.
• Experimental Unit: the groups of individuals or objects being observed.
EXAMPLE 4.1

You are interested in conducting a study to determine whether the mean statistics test scores differ for different sleeping times (4, 6 and 8 hours).

State the response variable, factor, treatment and experimental units.
SOLUTION 4.1

Response Variable: Statistics test scores
Factor: Sleeping time
Treatment: Levels of the factor (number of hours of sleep)
Experimental Units: Randomly selected students
4.3 ASSUMPTIONS OF ANOVA

4 ASSUMPTIONS OF ANOVA

• The k samples must be simple random samples, one drawn from each of the k populations.
• The population variances must be equal: $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2 = \sigma^2$.
• The k samples must be independent (not related).
• The samples must be normally or approximately normally distributed.
4.4 COMPLETELY RANDOMIZED DESIGN (CRD)
(ONE-WAY ANOVA)

CRD/one-way ANOVA is often used in scientific experiments whose purpose is comparing:
• treatments
• processes
• materials or products

USED WHEN
Only one independent variable (factor) influences the response variable.
EXAMPLE

Comparing the effect of three different types of vehicles (Sedan, Multi-Purpose Vehicle (MPV), Small) on fuel consumption.

SOLUTION

Factor: Type of vehicle (3 types)
Response Variable: Fuel consumption


Illustration of $\mu_1 = \mu_2 = \mu_3$ where $\sigma_1^2 = \sigma_2^2 = \sigma_3^2$

Illustration of three different group means with $\sigma_1^2 = \sigma_2^2 = \sigma_3^2$


EXAMPLE 4.2
1) An experiment is conducted to study the effects of three different sugar solutions (glucose, sucrose and fructose) on bacterial growth.

2) A researcher wants to determine the effect of smoking on weight. He randomly selects 10 individuals from three independent groups (non-smokers, light smokers, heavy smokers).

State the response variable and factor.


SOLUTION 4.2

1) Response Variable: Bacterial growth
   Factor: Sugar solution

2) Response Variable: Weight
   Factor: Type of smoker
Variability in Response Variable

Total variation is made up of two components:
• Variation between groups: the variation of the group means around the grand mean. It captures the effect of the treatment on the response variable.
• Variation within groups: the variation of the observations in each group around the group mean. It represents the random error not captured by the experimental treatments.
Figure 4.4 Illustration of within and between groups variation

A large between-groups variation leads to a higher chance of rejecting the null hypothesis; the sample data then give evidence that the groups' population means are significantly different.

Conversely, a large within-groups variation reduces the chance of rejecting the null hypothesis that the group means are all equal.
Randomizing Experimental Units to Factor Levels

Identify the experimental units (here, 15 individuals) and the number of levels to be used. In this example, the levels are three different diets: Diet A, B and C.

Assign the treatments (Diet A, B and C) to each experimental unit (individual) at random.

Table 4.4.1 Assigning treatments to experimental units
Treatment   Individual   Randomization
L1          1            L2
L1          2            L3
L1          3            L3
L1          4            L1
L1          5            L2
L2          6            L1
L2          7            L3
L2          8            L1
L2          9            L2
L2          10           L1
L3          11           L3
L3          12           L2
L3          13           L3
L3          14           L2
L3          15           L1
Note: L1 - Diet A, L2 - Diet B, L3 - Diet C
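
The random assignment described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used to produce Table 4.4.1: the function name `assign_treatments` and the seed value are hypothetical, and a different seed gives a different (equally valid) allocation.

```python
import random


def assign_treatments(units, treatments, seed=None):
    """Randomly split the units into equal-sized treatment groups."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among the treatments")
    rng = random.Random(seed)          # seeded only so the run is repeatable
    shuffled = list(units)
    rng.shuffle(shuffled)              # random order of the experimental units
    size = len(shuffled) // len(treatments)
    # Consecutive slices of the shuffled list become the treatment groups.
    return {
        treatment: sorted(shuffled[i * size:(i + 1) * size])
        for i, treatment in enumerate(treatments)
    }


individuals = list(range(1, 16))       # 15 experimental units
diets = ["Diet A", "Diet B", "Diet C"] # 3 factor levels
assignment = assign_treatments(individuals, diets, seed=2022)
for diet, members in assignment.items():
    print(diet, members)
```

Shuffling the whole list and then slicing it keeps the design balanced (five individuals per diet), which matches the layout of the table.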
Table 4.4.2 Assigning treatments to experimental units
Factor
Diet A   Diet B   Diet C
4        1        2
6        5        3
8        9        7
10       12       11
15       14       13

Entries are the individual numbers assigned to each diet in Table 4.4.1.

Investigation of the effect of the factor (diet) on the response variable.
Data are recorded for the characteristic of interest (weight loss).

ANOVA F-Test Statistic

The F-statistic is used to perform the hypothesis test for k means:

$F = \dfrac{\text{explained variation}}{\text{unexplained variation}} = \dfrac{\text{between groups variation}}{\text{within groups variation}}$

A between-groups variation that is higher than the within-groups variation leads to a large value of the F-statistic.

A larger F-statistic increases the chance of rejecting the null hypothesis that the groups' population means are all equal.
The Total Variation

The total variation in the response variable is given by the difference of each observation from the grand mean. It splits into two parts:
• The difference between the individual observation $y_{ij}$ and the group mean $\bar{y}_k$, for k = 1, 2, 3, ...
• The difference between the group mean $\bar{y}_k$, for k = 1, 2, 3, ..., and the grand mean $\bar{Y}$.
Total Variation = Between groups variation + Within groups variation
SST: sum of squares of total variation in the response variable
SSTR: sum of squares due to variation between the groups
SSE: sum of squares due to the sampling error, calculated from the within-groups variation
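
In symbols, the three sums of squares can be written as below. This is the usual textbook form rather than something taken from the slides, and the index notation is an assumption: $y_{ij}$ is observation j in group i, $\bar{y}_i$ is the mean of group i, $\bar{Y}$ is the grand mean, $n_i$ is the size of group i, and k is the number of groups.

```latex
\begin{aligned}
SST  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{Y}\right)^2 \\
SSTR &= \sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{Y}\right)^2 \\
SSE  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2 \\
SST  &= SSTR + SSE
\end{aligned}
```

The last line restates the decomposition of total variation into its between-groups and within-groups parts.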
F-test statistic is defined as?

The ratio of the 'average variability' between (among) groups to the 'average variability' within groups.
What kind of hypothesis is the ANOVA test used to test?

$H_0$: $\mu_1 = \mu_2 = \dots = \mu_k$
(No difference between the treatment means / group means)

$H_1$: At least one mean is different from the others
(Difference between the treatment means / group means)

Characteristics of F-distribution

• Not a symmetric distribution.
• Takes no negative values.
• The shape of the F-distribution is determined by two degrees of freedom.

The F-distribution is a right-tailed distribution; the rejection region is the region on the right.
Steps in Performing One-way ANOVA

Comparing the drying time (in minutes) for k different paint brands.

Response Variable: Drying time
Interest: Effect of k different brands of paint on the drying time.

STEP 1

State the null and alternative hypotheses.

μ: population mean drying time

$H_0$: All k population mean drying times are equal ($\mu_1 = \mu_2 = \dots = \mu_k$)
$H_1$: At least one population mean drying time is different from the others
STEP 2

Obtain the critical value, $F_{cv} = F_{\alpha,\,k-1,\,N-k}$, from the statistical table for the F-distribution.

STEP 3

Compute the test statistic, F.

$F_{test} = \dfrac{MSTR}{MSE}$
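
The slides do not spell out the two mean squares at this point. In the usual textbook form, stated here as an assumption about the intended definitions, each is a sum of squares divided by its degrees of freedom, with N the total number of observations and k the number of groups:

```latex
MSTR = \frac{SSTR}{k-1}, \qquad
MSE  = \frac{SSE}{N-k}, \qquad
F_{test} = \frac{MSTR}{MSE} = \frac{SSTR/(k-1)}{SSE/(N-k)}
```

The divisors k − 1 and N − k are the same two degrees of freedom used to look up the critical value of the F-distribution.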
STEP 4

Make the decision to reject or fail to reject the null hypothesis.

Decision                             Using F_test
Reject the null hypothesis           $F_{test} \ge F_{cv}$
Fail to reject the null hypothesis   $F_{test} < F_{cv}$
STEP 5

State the conclusion in the context of the problem and the claim.

• Reject the null hypothesis: sufficient evidence
• Fail to reject the null hypothesis: insufficient evidence
EXAMPLE 4.3
A car manufacturer would like to compare the drying time (in minutes) for three different paint brands. Eighteen paints are randomly selected and the drying time for each of the paints is recorded. Assume the drying time data set is normally distributed with equal population variances.

Brand   Drying Time (in minutes)
A       75   82   68   77   90   65
B       56   53   45   61   58   48
C       45   47   50   44   60   51

I. Identify the response variable and factor in this problem.
II. Is the factor a categorical variable? Why?
III. Define the population parameter of interest in this study.
IV. What is the purpose of this study?
V. Suggest a suitable statistical test for the problem in IV. Support your reason.
VI. Conduct the suggested test in V at α = 0.05.
SOLUTION 4.3

STEP 1

State the null and alternative hypotheses.

$H_0$: $\mu_A = \mu_B = \mu_C$ (There is no difference in the mean drying time between the three paints)
$H_1$: At least one population mean drying time is different from the others (There is a difference in the mean drying time between the three paints)
STEP 2

Obtain the critical value, $F_{cv}$.

$F_{cv} = F_{\alpha,\,k-1,\,N-k}$

At α = 0.05, k = 3 and N = 18, the statistical table for the F-distribution gives $F_{0.05,\,2,\,15} = 3.682$.
STEP 3

Compute the test statistic, F.

k = 3 independent groups (Brand A, B, C)
n = 6 (sample size for each group)

$SSTR = \dfrac{(75+82+68+77+90+65)^2}{6} + \dfrac{(56+53+45+61+58+48)^2}{6} + \dfrac{(45+47+50+44+60+51)^2}{6} - \dfrac{(75+82+\dots+51)^2}{18}$

SSTR = (34808.17 + 17173.50 + 14701.50) − 64201.39 = 2481.78

Brand   Drying Time (in minutes)
A       75   82   68   77   90   65
B       56   53   45   61   58   48
C       45   47   50   44   60   51

SST = (75² + 82² + ... + 51²) − 64201.39
= 67457 − 64201.39
= 3255.61

SSE = SST − SSTR
= 3255.61 − 2481.78
= 773.83

MSTR = SSTR / (k − 1) = 2481.78 / 2 = 1240.89
MSE = SSE / (N − k) = 773.83 / 15 = 51.59
F_test = MSTR / MSE = 1240.89 / 51.59 = 24.05

STEP 4

Make the decision to reject or fail to reject the null hypothesis.

Decision                             Using F_test
Reject the null hypothesis           $F_{test} \ge F_{cv}$
Fail to reject the null hypothesis   $F_{test} < F_{cv}$

Since F_test = 24.05 > F_cv = 3.682, we reject $H_0$.


STEP 5

State the conclusion in the context of the problem and the claim.

• Reject the null hypothesis: sufficient evidence
• Fail to reject the null hypothesis: insufficient evidence

There is sufficient evidence to support the claim that the mean drying times of the three brands of paint are significantly different at α = 0.05.
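
As a cross-check on the hand calculation, the short Python sketch below recomputes the sums of squares and the F statistic from the Example 4.3 data. It is an illustration only (the chapter itself works in Excel); the function name `one_way_anova` is hypothetical, and the critical value is simply the table value from Step 2 typed in as a constant. Small differences in the last digit against hand-rounded figures are rounding effects.

```python
def one_way_anova(groups):
    """Return sums of squares, mean squares and F for a one-way layout."""
    all_values = [y for g in groups.values() for y in g]
    n_total = len(all_values)                  # N, total observations
    k = len(groups)                            # number of groups
    grand_mean = sum(all_values) / n_total

    # Total variation: every observation against the grand mean.
    sst = sum((y - grand_mean) ** 2 for y in all_values)
    # Between-groups variation: group means against the grand mean.
    sstr = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    sse = sst - sstr                           # within-groups variation

    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return {"SST": sst, "SSTR": sstr, "SSE": sse,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse}


drying_time = {
    "A": [75, 82, 68, 77, 90, 65],
    "B": [56, 53, 45, 61, 58, 48],
    "C": [45, 47, 50, 44, 60, 51],
}
F_CV = 3.682  # F(0.05; 2, 15), read from the F table in Step 2

result = one_way_anova(drying_time)
for name, value in result.items():
    print(f"{name} = {value:.2f}")
print("Reject H0" if result["F"] >= F_CV else "Fail to reject H0")
```

Computing SSE as SST − SSTR mirrors the order of the hand calculation; summing the squared deviations within each group directly would give the same value.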
WEEK 12

4.6 Kruskal-Wallis Test: Nonparametric Alternative to One-way ANOVA

Learning outcome:
- Able to apply the nonparametric Kruskal-Wallis test.

4.6 KRUSKAL-WALLIS TEST: NONPARAMETRIC ALTERNATIVE TO ONE-WAY ANOVA

The Kruskal-Wallis test is a nonparametric method based on ranks.

USED WHEN
Three or more independent samples (of equal or different sample sizes) are to be compared.

USED TO
Determine whether three or more samples originate from the same distribution.

APPLIED UNDER CIRCUMSTANCES
The set of data does not meet the requirements for a parametric test:
• Not normally distributed
• Different variances of the groups
• Ordinal scale measurements
2 ASSUMPTIONS BEFORE KRUSKAL-WALLIS TEST

• There must be at least three samples to be compared, and the sample size for each sample must be at least five. Rank the data from the smallest to the largest and calculate the test statistic, H.
• The test must be right-tailed, and the test statistic should be large enough to reject the null hypothesis and conclude that the samples did not come from the same distribution.
Steps in Performing Kruskal-Wallis Test

STEP 1

State the null and alternative hypotheses.

$H_0$: $\text{Median}_1 = \text{Median}_2 = \dots = \text{Median}_k$
$H_1$: At least one group median is different from the others
STEP 2

Obtain the critical value, $\chi^2_{\alpha,\,k-1}$.

Using the chi-square distribution table, obtain the critical value at level of significance α and k − 1 degrees of freedom.

Since this is a right-tailed test, the rejection region is on the right.
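STEP 3

Calculate the test statistic, H.

• Combine the groups or samples and rank all observations from the smallest to the largest.
  * For tied observations, assign each the average of their ranks.
• Compute the sum of ranks for each group 1, 2, ..., k.
• Substitute into the formula for H:

$H = \dfrac{12}{N(N+1)} \left( \dfrac{R_1^2}{n_1} + \dfrac{R_2^2}{n_2} + \dots + \dfrac{R_k^2}{n_k} \right) - 3(N+1)$

where $R_i$ is the sum of ranks and $n_i$ the sample size of group i, and N is the total number of observations.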
STEP 4

Make the decision to reject or fail to reject the null hypothesis.

If $H > \chi^2_{\alpha,\,k-1}$, we reject $H_0$; otherwise we fail to reject $H_0$.
STEP 5

State the conclusion in the context of the problem and the claim.
EXAMPLE 4.7
Three machines (A, B and C) are used in the packaging of 1000 g milk powder. Each machine is set so that each packet will contain an average of 1000 g of milk powder. Samples of five packets from each machine are randomly selected and the amount of milk powder in each packet is measured. The data set is given below and is assumed not to be normally distributed. Using α = 0.05, test the hypothesis that there is a difference in the amount of milk powder packed by the three machines.

Machine   Amount of Milk Powder (g)
A         987   987.5   987     989.5   995.3
B         990   990.5   990.8   989     992
C         989   990     995     995.7   993
SOLUTION 4.7

STEP 1
State the null and alternative hypotheses.

$H_0$: There is no difference in the median amount of milk powder packed by the machines
$H_1$: There is a difference in the median amount of milk powder packed by the machines

OR

$H_0$: $\text{Median}_1 = \text{Median}_2 = \dots = \text{Median}_k$
$H_1$: At least one group median is different from the others
STEP 2

Obtain the critical value, $\chi^2_{\alpha,\,k-1}$.

For α = 0.05 and k − 1 degrees of freedom, where k is the number of samples (three machines), the critical value is $\chi^2_{0.05,\,2} = 5.9915$.
STEP 3

Rank all the observations from the lowest to the largest value:

Machine A   Rank    Machine B   Rank    Machine C   Rank
987         1.5     990         7.5     989         4.5
987.5       3       990.5       9       990         7.5
987         1.5     990.8       10      995         13
989.5       6       989         4.5     995.7       15
995.3       14      992         11      993         12
Total Sum   26      Total Sum   42      Total Sum   52

Substitute into the formula (NOTE: N = 15):

$H = \dfrac{12}{N(N+1)} \left( \dfrac{R_A^2}{n_A} + \dfrac{R_B^2}{n_B} + \dfrac{R_C^2}{n_C} \right) - 3(N+1)$

$H = \dfrac{12}{15(15+1)} \left( \dfrac{26^2}{5} + \dfrac{42^2}{5} + \dfrac{52^2}{5} \right) - 3(15+1)$

$H = 3.44$
STEP 4

Make the decision to reject or fail to reject the null hypothesis.

If $H > \chi^2_{\alpha,\,k-1}$, we reject $H_0$; otherwise we fail to reject $H_0$.

Since H = 3.44 < 5.9915, we fail to reject $H_0$.


STEP 5

State the conclusion in the context of the problem and the claim.

There is insufficient evidence to support the claim that there is a difference in the median amount of milk powder packed by the three machines at the 5% significance level.
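
The ranking and the H statistic for Example 4.7 can be reproduced with the Python sketch below. It is an illustration under stated assumptions: the helper names `average_ranks` and `kruskal_wallis_h` are hypothetical, the chi-square critical value is the table value from Step 2 typed in as a constant, and H is computed without any correction for ties, as in the hand calculation. Library routines that apply a tie correction would report a slightly larger H for these data.

```python
def average_ranks(values):
    """Rank values from smallest to largest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1   # mean of the 1-based positions
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (rank sum per group, H), with no correction for ties."""
    labels = list(groups)
    pooled = [y for name in labels for y in groups[name]]
    ranks = average_ranks(pooled)        # rank the combined sample
    n_total = len(pooled)
    rank_sums, offset = {}, 0
    for name in labels:
        size = len(groups[name])
        rank_sums[name] = sum(ranks[offset:offset + size])
        offset += size
    h = 12 / (n_total * (n_total + 1)) * sum(
        rank_sums[name] ** 2 / len(groups[name]) for name in labels
    ) - 3 * (n_total + 1)
    return rank_sums, h


milk = {
    "A": [987, 987.5, 987, 989.5, 995.3],
    "B": [990, 990.5, 990.8, 989, 992],
    "C": [989, 990, 995, 995.7, 993],
}
CHI2_CV = 5.9915  # chi-square(0.05; 2), read from the table in Step 2

rank_sums, h = kruskal_wallis_h(milk)
print(rank_sums, round(h, 2))
print("Reject H0" if h > CHI2_CV else "Fail to reject H0")
```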
