
ANALYSIS OF VARIANCE

ENS185 2nd Semester



In the previous lessons, we talked about comparisons between two populations.
What if there are more than two populations?
ONE-WAY ANOVA
Estimation of a Population Proportion 4

COMPARING THREE OR MORE MEANS


Analysis of Variance (ANOVA) - an inferential method used to test the equality
of three or more population means.

A manufacturer of paper used for making grocery bags is interested in
improving the product’s tensile strength. Product engineering believes
that tensile strength is a function of the hardwood concentration in
the pulp and that the range of hardwood concentrations of practical
interest is between 5 and 20%. A team of engineers responsible for the
study decides to investigate four levels of hardwood concentration: 5%,
10%, 15%, and 20%. They decide to make up six test specimens at each
concentration level by using a pilot plant. All 24 specimens are tested
on a laboratory tensile tester in random order.
Introduction to Data Analysis 5

CHARACTERISTICS OF AN EXPERIMENT
An experiment is a controlled study conducted to determine the effect varying one or more
explanatory variables or factors has on a response variable. Any combination of the values of the factors
is called a treatment.

• A factor is a characteristic that differentiates each group or population. A factor can have two or more levels.
• The treatment is a combination of factors and/or levels of factors.
• Treatment combinations are applied to the experimental units.
• The response is the measured outcome taken from the experimental units.

A control group serves as a baseline treatment against which other treatments can be compared.

Replication - Replication is the repetition of an experiment on more than one individual.

Blinding - Blinding is a technique in which the subject doesn’t know whether he or she is receiving a
treatment or a placebo, to avoid bias.
Double-blinding - Neither the researcher nor the subject knows who receives the treatment and who receives the placebo.

QUESTION?

1 • What is the factor in the study?

2 • What are the levels?

3 • What are the experimental units?

4 • What are the responses?

5 • How many replicates were tested?



This is an example of a completely randomized single-factor
experiment with four levels. Randomization means that the specimens
are randomly assigned to the levels and that the runs are conducted
in random order.
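As a rough illustration of what "random order" can look like in practice, the sketch below builds a run sheet for the 24 hardwood specimens. The function name `randomized_run_order` and the seed value are assumptions made for this example; the slides do not prescribe a particular procedure.

```python
import random

# Levels and replicate count taken from the hardwood example.
LEVELS = [5, 10, 15, 20]   # hardwood concentration, in percent
REPLICATES = 6


def randomized_run_order(levels, replicates, seed=None):
    """Return a list of (run_number, level) pairs in random order.

    Hypothetical helper: one possible way to randomize a
    completely randomized single-factor experiment.
    """
    rng = random.Random(seed)
    # One entry per specimen: each level appears `replicates` times.
    runs = [level for level in levels for _ in range(replicates)]
    rng.shuffle(runs)
    return list(enumerate(runs, start=1))


if __name__ == "__main__":
    # The seed is arbitrary; fixing it only makes the sheet reproducible.
    for run, level in randomized_run_order(LEVELS, REPLICATES, seed=185):
        print(f"run {run:2d}: {level}% hardwood")
```

Shuffling the full list of 24 entries randomizes both which specimen receives which level and the order in which the runs are tested.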

ONE-WAY ANOVA
Requirements:
1. There must be k simple random samples, one from each of k
populations or a randomized experiment with k treatments.
2. The k samples must be independent of each other; that is, the
subjects in one group cannot be related in any way to subjects
in a second group.
3. The populations must be normally distributed.
4. The populations must have the same variance; that is, each
treatment group has population variance $\sigma^2$.

ONE-WAY ANOVA
Source of Variation   Sum of Squares     df     Mean Square           F
Treatment             SSTr               k-1    MSTr = SSTr/(k-1)     MSTr/MSE
Error                 SSE                n-k    MSE = SSE/(n-k)
Total                 SST = SSTr + SSE   n-1

ONE-WAY ANOVA
Definition of Terms
• SST (Total Sum of Squares) - total variability in the data. This
is defined as the sum of the squared differences between each
observation and the grand mean.
$$SST = \sum_{i=1}^{n} (y_i - \bar{\bar{y}})^2$$
where $y_i$ is an individual observation and $\bar{\bar{y}}$ is the grand mean.

The total variability is accounted for by the variation between the
treatments (treatment sum of squares) and the variation within the
treatments (error sum of squares).
$$SST = SSTr + SSE$$

ONE-WAY ANOVA
Definition of Terms
• SSTr (treatment sum of squares) - sum of squares of
differences between treatment means and the grand mean
$$SSTr = \sum_{j=1}^{k} n_j (\bar{y}_j - \bar{\bar{y}})^2$$
where $\bar{y}_j$ is the mean of group $j$ and $n_j$ is the number of observations in group $j$.

ONE-WAY ANOVA
Definition of Terms
• SSE (error sum of squares) - sum of squares of differences of
observations within a treatment from the treatment mean
$$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2$$
where
$\bar{y}_j$ = mean of group $j$
$y_{ij}$ = individual value $i$ within group $j$

ONE-WAY ANOVA
Definition of Terms
• MSTr (Mean square for treatment) - average variability
between groups
$$MSTr = \frac{SSTr}{k-1}$$
• MSE (Mean square for error) - average variability within
groups
$$MSE = \frac{SSE}{n-k}$$
where $k$ is the number of treatments and $n$ is the total number of observations.

ONE-WAY ANOVA
Hypothesis Testing
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_A$: At least one of the means is different

Test Statistic:
$$F = \frac{MSTr}{MSE}$$
where $df_{num} = k-1$ and $df_{den} = n-k$

ONE-WAY ANOVA
Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
0.05 6 60 10 8
0.1 6 94 15.66667 7.866667
0.15 6 102 17 3.2
0.2 6 127 21.16667 6.966667

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 382.7917 3 127.5972 19.60521 3.59E-06 3.098391212
Within Groups 130.1667 20 6.508333

Total 512.9583 23
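The ANOVA rows in this output can be rebuilt from the SUMMARY block alone, which is a useful check on the definitions of SSTr, SSE, MSTr, MSE, and F. The sketch below is one way to do that in plain Python; the function name `one_way_anova_from_summary` is an assumption for illustration, and the critical value is read from the output rather than computed.

```python
# Summary statistics copied from the "Anova: Single Factor" output:
# (group label, count, sum, sample variance)
SUMMARY = [
    ("0.05", 6, 60.0, 8.0),
    ("0.1", 6, 94.0, 7.866667),
    ("0.15", 6, 102.0, 3.2),
    ("0.2", 6, 127.0, 6.966667),
]


def one_way_anova_from_summary(groups):
    """Rebuild the one-way ANOVA table from per-group count, sum, variance."""
    k = len(groups)
    n = sum(count for _, count, _, _ in groups)
    grand_mean = sum(total for _, _, total, _ in groups) / n

    # Between-group variation: n_j * (group mean - grand mean)^2
    sstr = sum(count * (total / count - grand_mean) ** 2
               for _, count, total, _ in groups)
    # Within-group variation: (n_j - 1) * s_j^2 recovers each group's
    # sum of squared deviations from its own mean.
    sse = sum((count - 1) * var for _, count, _, var in groups)

    df_tr, df_err = k - 1, n - k
    mstr, mse = sstr / df_tr, sse / df_err
    return {
        "SSTr": sstr, "SSE": sse, "SST": sstr + sse,
        "df_tr": df_tr, "df_err": df_err,
        "MSTr": mstr, "MSE": mse, "F": mstr / mse,
    }


if __name__ == "__main__":
    table = one_way_anova_from_summary(SUMMARY)
    for name, value in table.items():
        print(f"{name:7s} {value:.4f}")
    # F crit (3.098391212) is taken from the output, not computed here.
    print("reject H0" if table["F"] > 3.098391212 else "fail to reject H0")
```

Because the computed F is far above F crit (and the reported P-value is far below 0.05), the output supports rejecting the hypothesis that all four hardwood concentrations give the same mean tensile strength.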

RANDOMIZED COMPLETE BLOCK DESIGN

Randomized complete block designs (RCBD) differ from completely
randomized designs in that the experimental units are grouped into
blocks according to a known or suspected source of variation, which is
isolated by the blocks.

It is also known as the two-way ANOVA without interaction. A key
assumption in the analysis is that the effect of each level of the
treatment factor is the same for each level of the blocking factor.

In RCBD, there is one observation for each combination of levels of
the treatment and block factors.

RANDOMIZED COMPLETE BLOCK DESIGN

Suppose you are manufacturing concrete cylinders for, say, bridge
supports. There are three ways of drying green concrete (say A, B, and
C), and you want to find the one that gives you the best compressive
strength. The concrete is mixed in batches that are large enough to
produce exactly three cylinders, and your production engineer believes
that there is substantial variation in the quality of the concrete from
batch to batch. Five batches are completed, each of which produces three
cylinders. Your measurements are the compressive strength of each
cylinder in a destructive test.

QUESTION?

1 • What is the factor in the study?

2 • What are the levels?

3 • What are the experimental units?

4 • What are the responses?

5 • What are the blocks in this study?



Source of Variation   Sum of Squares   df           Mean Square                F
Treatment             SSTr             k-1          MSTr = SSTr/(k-1)          MSTr/MSE
Blocks                SSB              b-1          MSB = SSB/(b-1)            MSB/MSE
Error                 SSE              (k-1)(b-1)   MSE = SSE/((k-1)(b-1))
Total                 SST              n-1

RCBD
Definition of Terms
• SST (Total Sum of Squares) - total variability in the data. This
is defined as the sum of the squared differences between each
observation and the grand mean.
$$SST = \sum_{i=1}^{n} (y_i - \bar{\bar{y}})^2$$

The total variability is accounted for by the variation between the
treatments (treatment sum of squares), the variation between blocks
(block sum of squares), and the unexplained variation (error
sum of squares).
$$SST = SSTr + SSE + SSB$$

RCBD
Definition of Terms
• SSTr (treatment sum of squares) - sum of squares of
differences between treatment means and the grand mean
$$SSTr = \sum_{j=1}^{k} n_j (\bar{y}_j - \bar{\bar{y}})^2$$
where $\bar{y}_j$ is the mean of treatment group $j$ and $n_j$ is the number of observations in it (here $n_j = b$, one per block).

RCBD
Definition of Terms
• SSB (block sum of squares) - sum of squares of differences
between block means and the grand mean
$$SSB = \sum_{l=1}^{b} n_l (\bar{y}_l - \bar{\bar{y}})^2$$
where
$\bar{y}_l$ = mean of block $l$
$n_l$ = number of observations in block $l$ (here $n_l = k$, one per treatment)

RCBD
Definition of Terms
• SSE (error sum of squares) - usually referred to as the
unexplained error
$$SSE = SST - SSB - SSTr$$

RCBD
Definition of Terms
• MSTr (Mean square for treatment) - average variability
between groups
$$MSTr = \frac{SSTr}{k-1}$$
• MSE (Mean square for error) - average unexplained variability
after the treatment and block effects are removed
$$MSE = \frac{SSE}{(k-1)(b-1)}$$

RCBD
Definition of Terms
• MSB (Mean square for blocks) - average variability between
blocks
$$MSB = \frac{SSB}{b-1}$$

RCBD ANOVA
Hypothesis Testing for Treatments
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_A$: At least one of the means is different

Test Statistic:
$$F = \frac{MSTr}{MSE}$$

where $df_{num} = k-1$ and $df_{den} = (k-1)(b-1)$

RCBD ANOVA
Hypothesis Testing for Blocks
$H_0: \mu_{b_1} = \mu_{b_2} = \cdots = \mu_{b_b}$ (no difference among block means)
$H_A$: At least one of the block means is different (blocking is effective)

Test Statistic:
$$F = \frac{MSB}{MSE}$$

where $df_{num} = b-1$ and $df_{den} = (k-1)(b-1)$
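The RCBD quantities can be collected into one small routine. The sketch below assumes the data are laid out with one row per block and one column per treatment; the function name `rcbd_anova` and the compressive-strength numbers are made up for illustration and are not data from the concrete example.

```python
def rcbd_anova(table):
    """ANOVA for a randomized complete block design.

    `table[l][j]` is the single observation for block l and treatment j.
    """
    b = len(table)        # number of blocks (rows)
    k = len(table[0])     # number of treatments (columns)
    n = b * k
    grand_mean = sum(sum(row) for row in table) / n

    treatment_means = [sum(row[j] for row in table) / b for j in range(k)]
    block_means = [sum(row) / k for row in table]

    sst = sum((y - grand_mean) ** 2 for row in table for y in row)
    sstr = b * sum((m - grand_mean) ** 2 for m in treatment_means)
    ssb = k * sum((m - grand_mean) ** 2 for m in block_means)
    sse = sst - sstr - ssb          # unexplained error, by subtraction

    df_tr, df_b, df_err = k - 1, b - 1, (k - 1) * (b - 1)
    mstr, msb, mse = sstr / df_tr, ssb / df_b, sse / df_err
    return {
        "SST": sst, "SSTr": sstr, "SSB": ssb, "SSE": sse,
        "df_tr": df_tr, "df_b": df_b, "df_err": df_err,
        "F_treatment": mstr / mse, "F_block": msb / mse,
    }


if __name__ == "__main__":
    # Hypothetical strengths: 5 batches (rows) x 3 drying methods A, B, C.
    strength = [
        [52, 47, 51],
        [60, 55, 52],
        [56, 48, 52],
        [57, 50, 55],
        [54, 49, 53],
    ]
    for name, value in rcbd_anova(strength).items():
        print(f"{name:12s} {value:.3f}")
```

Each F value would then be compared with the critical value for its own numerator degrees of freedom and the shared error degrees of freedom, (k-1)(b-1).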



WHEN IS BLOCKING NECESSARY?


Suppose that an experiment is conducted as an RCBD and
blocking was not really necessary. If it had been run as a completely
randomized single-factor experiment, the error degrees of freedom
would have been higher. Blocking costs us error degrees of freedom,
which increases the critical value.

However, the loss of degrees of freedom is usually small, so if there is a
reasonable chance that block effects may be important, the
experimenter should use the RCBD.
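To make the cost concrete, the sketch below compares the error degrees of freedom for the same set of runs analysed both ways, using the sizes from the concrete example (three drying methods, five batches). The two helper names are assumptions for illustration.

```python
def error_df_crd(k, b):
    """Error df if the k*b runs are analysed as a completely randomized design."""
    n = k * b
    return n - k


def error_df_rcbd(k, b):
    """Error df when the same runs are analysed with b blocks."""
    return (k - 1) * (b - 1)


if __name__ == "__main__":
    k, b = 3, 5   # three drying methods, five batches
    crd, rcbd = error_df_crd(k, b), error_df_rcbd(k, b)
    print(f"CRD error df:  {crd}")
    print(f"RCBD error df: {rcbd}")
    # The difference is always b - 1, the df assigned to blocks.
    print(f"df spent on blocks: {crd - rcbd}")
```

With these sizes the error degrees of freedom drop from 12 to 8. In a standard F table at the 5% level, that moves the critical value for the treatment test from roughly 3.89 to roughly 4.46, which is the "cost" of blocking when it turns out not to be needed.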

TWO-WAY ANOVA
Two-way ANOVA is an analysis used when two factors can explain
variability in the response variable. We deal with the two factors by
fixing them at different levels. Remember that we can deal with factors
through control, by fixing them at one level or at different levels, and
through randomization, so that the effect of uncontrolled variables on the
response variable is minimized.

In a completely randomized design and a randomized complete block design,
only one factor is manipulated and fixed at certain levels. In a two-way
ANOVA, two factors are fixed at certain levels.

TWO-WAY ANOVA
The two-way Analysis of Variance is used when one suspects that there
may be some kind of interaction between the levels of two
experimental factors.

For example, one kind of chemotherapy may be better for lung cancer, while
another is better for stomach cancer.

TWO-WAY ANOVA
Aircraft primer paints are applied to aluminum surfaces by two
methods: dipping and spraying. The purpose of using the primer is to
improve paint adhesion, and some parts can be primed using either
application method. The process engineering group responsible for
this operation is interested in learning whether three different primers
differ in their adhesion properties. A factorial experiment was
performed to investigate the effect of paint primer type and
application method on paint adhesion. For each combination
of primer type and application method, three specimens were painted,
then a finish paint was applied, and the adhesion force was measured.

QUESTION?

1 • What is/are the factor/s in the study?

2 • What are the levels?

3 • What are the experimental units?

4 • What are the responses?

5 • How many replicates were done for each treatment?

Source of Variation   Sum of Squares   df           Mean Square                  F
Factor A              SSA              a-1          MSA = SSA/(a-1)              F_A = MSA/MSE
Factor B              SSB              b-1          MSB = SSB/(b-1)              F_B = MSB/MSE
Interaction AB        SSAB             (a-1)(b-1)   MSAB = SSAB/((a-1)(b-1))     F_AB = MSAB/MSE
Error                 SSE              ab(r-1)      MSE = SSE/(ab(r-1))
Total                 SST              abr-1

where a and b are the numbers of levels of Factors A and B, and r is the number of replicates per treatment combination.

TWO-WAY ANOVA
Definition of Terms
• SST (Total Sum of Squares) - total variability in the data. This
is defined as the sum of the squared differences between each
observation and the grand mean.
$$SST = \sum_{i=1}^{n} (y_i - \bar{\bar{y}})^2$$

The total variability is accounted for by the variation between the
Factor A levels (treatment A sum of squares), the variation between
the Factor B levels (treatment B sum of squares), the variation due to
interaction, and the variation within the treatments (error sum of squares).
$$SST = SSA + SSB + SSAB + SSE$$

TWO-WAY ANOVA
Definition of Terms
• SSA (treatment A sum of squares) - sum of squares of
differences between treatment A means and the grand mean
$$SSA = \sum_{i=1}^{a} n_{A_i} (\bar{y}_{A_i} - \bar{\bar{y}})^2$$
where $\bar{y}_{A_i}$ is the group mean for level $i$ of A and $n_{A_i}$ is the number of observations at that level.

TWO-WAY ANOVA
Definition of Terms
• SSB (treatment B sum of squares) - sum of squares of
differences between treatment B means and the grand mean
$$SSB = \sum_{j=1}^{b} n_{B_j} (\bar{y}_{B_j} - \bar{\bar{y}})^2$$
where $\bar{y}_{B_j}$ is the group mean for level $j$ of B and $n_{B_j}$ is the number of observations at that level.

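TWO-WAY ANOVA
Definition of Terms
• SSAB (interaction sum of squares) - variation among the
treatment-condition (cell) means that is not explained by the
separate effects of Factor A and Factor B. For a balanced design
with r replicates per treatment condition:
$$SSAB = r \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{y}_{ij} - \bar{y}_{A_i} - \bar{y}_{B_j} + \bar{\bar{y}})^2$$
where $\bar{y}_{ij}$ is the mean of the specimens in the treatment condition with level $i$ of A and level $j$ of B.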

TWO-WAY ANOVA
Definition of Terms
• SSE (error sum of squares) - usually referred to as the
unexplained error
$$SSE = SST - SSAB - SSA - SSB$$

TWO-WAY ANOVA
Hypothesis Testing for Factor A
$H_0: \mu_{A_1} = \mu_{A_2} = \cdots = \mu_{A_a}$
(The means of all levels of Factor A are equal)
$H_A$: At least one of the level A means is different

Test Statistic:
$$F = \frac{MSA}{MSE}$$

where $df_{num} = a-1$ and $df_{den} = ab(r-1)$

TWO-WAY ANOVA
Hypothesis Testing for Factor B
$H_0: \mu_{B_1} = \mu_{B_2} = \cdots = \mu_{B_b}$
(The means of all levels of Factor B are equal)
$H_A$: At least one of the level B means is different

Test Statistic:
$$F = \frac{MSB}{MSE}$$

where $df_{num} = b-1$ and $df_{den} = ab(r-1)$

TWO-WAY ANOVA
Hypothesis Testing for Interaction AB
$H_0$: The interaction effects are all zero.
$H_A$: At least one of the interactions is non-zero.

Test Statistic:
$$F = \frac{MSAB}{MSE}$$

where $df_{num} = (a-1)(b-1)$ and $df_{den} = ab(r-1)$
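The three F tests share the same MSE, so they can be computed together. The sketch below assumes a balanced design (the same number of replicates r in every treatment combination) and computes the interaction sum of squares from the cell means, which is the usual formula for balanced data; the function name `two_way_anova` and the adhesion numbers are made up for illustration and are not the data from the primer experiment.

```python
def two_way_anova(cells):
    """Balanced two-way ANOVA with replication.

    `cells[i][j]` is the list of r observations for level i of
    Factor A and level j of Factor B.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    grand = mean(y for row in cells for cell in row for y in cell)
    a_means = [mean(y for cell in row for y in cell) for row in cells]
    b_means = [mean(y for row in cells for y in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    sst = sum((y - grand) ** 2 for row in cells for cell in row for y in cell)
    ssa = b * r * sum((m - grand) ** 2 for m in a_means)
    ssb = a * r * sum((m - grand) ** 2 for m in b_means)
    ssab = r * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    sse = sst - ssa - ssb - ssab

    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (r - 1)}
    mse = sse / df["E"]
    return {
        "SS": {"A": ssa, "B": ssb, "AB": ssab, "E": sse, "T": sst},
        "df": df,
        "F": {
            "A": (ssa / df["A"]) / mse,
            "B": (ssb / df["B"]) / mse,
            "AB": (ssab / df["AB"]) / mse,
        },
    }


if __name__ == "__main__":
    # Hypothetical adhesion forces: 3 primers (rows) x 2 methods
    # (dipping, spraying), 3 specimens per combination.
    adhesion = [
        [[4.1, 4.4, 4.2], [5.2, 5.0, 5.5]],
        [[5.5, 5.1, 5.3], [6.0, 6.2, 5.8]],
        [[3.9, 3.6, 4.0], [5.1, 4.8, 5.3]],
    ]
    result = two_way_anova(adhesion)
    for source, f_value in result["F"].items():
        print(f"F for {source}: {f_value:.3f} (df = {result['df'][source]}, {result['df']['E']})")
```

A common practice is to look at the interaction test first: if the interaction is significant, the main effects of A and B are harder to interpret on their own.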



Fill in the blanks.


Source of Variation   Sum of Squares   Df   Mean Squares   F   Fcrit
Factor A 15 4
Factor B 43.5
Interaction AB 5.4 8
Error 24.5
Total 59

QUESTION?

1 • How many levels does factor A have?

2 • How many levels does factor B have?

3 • How many replicates were done?
