Probability and Statistics for Business and Data

PART 5 - ANOVA
Analysis of Variance

● In the previous section we used Z- and t-distributions to answer the question:
“What is the probability that two samples come from the same population?”
[Figure: Z-distribution and t-distribution curves]
Analysis of Variance

● In this section we introduce a new distribution: the F-distribution
● Used to answer the question:
“What is the probability that two samples come from populations that have the same variance?”
● Can also answer the question:
“What is the probability that three or more samples come from the same population?”
ANOVA
Analysis of Variance
ANOVA

● In the previous section we tested two samples to see if they likely came from the same parent population.
● What if we had three (or more) samples?
● Could we do the same thing?
[Figure: three sample distributions labeled A, B and C]
ANOVA

● Our null hypothesis would look like:
$H_0: \mu_A = \mu_B = \mu_C$
ANOVA

● We could test each pair:
$H_0: \mu_A = \mu_B$, $\alpha = 0.05$
$H_0: \mu_A = \mu_C$, $\alpha = 0.05$
$H_0: \mu_B = \mu_C$, $\alpha = 0.05$
ANOVA

● The problem is that our overall confidence drops. Treating the three tests as independent, each at $\alpha = 0.05$:
$0.95 \times 0.95 \times 0.95 = 0.857$
● That leaves only an 85.7% confidence level across the three tests combined.
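The drop in confidence can be reproduced with a few lines of Python. This is only a sketch: it assumes the pairwise tests are independent, and the variable names are illustrative.

```python
# Overall confidence when running several tests, each at the same alpha.
# Assumption: the tests are independent (a simplification for illustration).
alpha = 0.05
n_tests = 3  # A vs B, A vs C, B vs C

overall_confidence = (1 - alpha) ** n_tests
familywise_error = 1 - overall_confidence

print(f"overall confidence: {overall_confidence:.3f}")
print(f"chance of at least one false rejection: {familywise_error:.3f}")
```

Raising `n_tests` shows how quickly the overall confidence erodes as more groups are compared pairwise.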
ANOVA

● This is where ANOVA comes in!
● We compute an F value and compare it to a critical value determined by our degrees of freedom (the number of groups, and the number of items in each group)
ANOVA
Let’s work with some data:

GroupA  GroupB  GroupC
  37      62      50
  60      27      63
  52      69      58
  43      64      54
  40      43      49
  52      54      52
  55      44      53
  39      31      43
  39      49      65
  23      57      43
ANOVA
First calculate the sample means.
Next calculate the overall mean.

         GroupA  GroupB  GroupC
A,B,C      44      50      53
TOT        49
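A minimal sketch of the two mean calculations, using only the Python standard library. The dictionary layout and variable names are illustrative choices, not part of the original material.

```python
from statistics import mean

# Data from the three groups above
groups = {
    "A": [37, 60, 52, 43, 40, 52, 55, 39, 39, 23],
    "B": [62, 27, 69, 64, 43, 54, 44, 31, 49, 57],
    "C": [50, 63, 58, 54, 49, 52, 53, 43, 65, 43],
}

# One mean per group
group_means = {name: mean(values) for name, values in groups.items()}

# The overall mean pools every observation from every group
overall_mean = mean([x for values in groups.values() for x in values])

print(group_means)
print(overall_mean)
```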
ANOVA

ANOVA considers two types of variance:
● Between Groups: how far group means stray from the total mean
● Within Groups: how far individual values stray from their respective group mean
ANOVA

The F value we’re trying to calculate is simply the ratio between these two variances!
$F = \frac{\text{Variance Between Groups}}{\text{Variance Within Groups}}$
ANOVA

Recall the equation for variance:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{SS}{df}$
Here $\sum (x - \bar{x})^2$ is the “sum of squares” $SS$
and $n - 1$ is the “degrees of freedom” $df$
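As a quick check of the SS/df form, the sketch below computes the sample variance of GroupA by hand and compares it with `statistics.variance` from the standard library. The variable names are illustrative.

```python
from statistics import variance

group_a = [37, 60, 52, 43, 40, 52, 55, 39, 39, 23]

mean_a = sum(group_a) / len(group_a)
ss = sum((x - mean_a) ** 2 for x in group_a)  # sum of squares
df = len(group_a) - 1                         # degrees of freedom
s_squared = ss / df                           # sample variance = SS / df

print(ss, df, s_squared)
# statistics.variance computes the same (n - 1) sample variance
print(variance(group_a))
```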
ANOVA

So the formula for the F value becomes:
$F = \frac{\text{Variance Between Groups}}{\text{Variance Within Groups}} = \frac{SSG / df_{groups}}{SSE / df_{error}}$
SSG = Sum of Squares Groups, $df_{groups}$ = degrees of freedom (groups)
SSE = Sum of Squares Error, $df_{error}$ = degrees of freedom (error)
ANOVA
Sum of Squares Groups
$(\mu_A - \mu_{TOT})^2 = (44 - 49)^2 = 25$
$(\mu_B - \mu_{TOT})^2 = (50 - 49)^2 = 1$
$(\mu_C - \mu_{TOT})^2 = (53 - 49)^2 = 16$
Sum: $25 + 1 + 16 = 42$
Multiply by the number of items in each group:
$SSG = 42 \times 10 = 420$
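The same Sum of Squares Groups calculation, sketched in plain Python. The helper `mean` and the data layout are illustrative. Weighting each squared distance by the group size gives the same result as multiplying the sum by 10 when all groups are equal in size.

```python
groups = {
    "A": [37, 60, 52, 43, 40, 52, 55, 39, 39, 23],
    "B": [62, 27, 69, 64, 43, 54, 44, 31, 49, 57],
    "C": [50, 63, 58, 54, 49, 52, 53, 43, 65, 43],
}

def mean(values):
    return sum(values) / len(values)

overall_mean = mean([x for values in groups.values() for x in values])

# Squared distance of each group mean from the overall mean,
# weighted by the number of items in that group
ssg = sum(len(v) * (mean(v) - overall_mean) ** 2 for v in groups.values())
df_groups = len(groups) - 1

print(ssg, df_groups)
```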
ANOVA
Degrees of Freedom Groups
$df_{groups} = n_{groups} - 1 = 3 - 1 = 2$
ANOVA
Sum of Squares Error
Example: $(37 - 44)^2 = (-7)^2 = 49$

$(x_A - \mu_A)^2$  $(x_B - \mu_B)^2$  $(x_C - \mu_C)^2$
      49               144                 9
     256               529               100
      64               361                25
       1               196                 1
      16                49                16
      64                16                 1
     121                36                 0
      25               361               100
      25                 1               144
     441                49               100
Sum 1062              1742               496

TOTAL: $SSE = 1062 + 1742 + 496 = 3300$
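A short sketch of the Sum of Squares Error calculation in plain Python. Names are illustrative. The error degrees of freedom are written here as the total of (group size − 1), which matches the slide formula when all groups have the same size.

```python
groups = {
    "A": [37, 60, 52, 43, 40, 52, 55, 39, 39, 23],
    "B": [62, 27, 69, 64, 43, 54, 44, 31, 49, 57],
    "C": [50, 63, 58, 54, 49, 52, 53, 43, 65, 43],
}

def squared_deviations(values):
    """Squared distance of each value from its own group mean."""
    m = sum(values) / len(values)
    return [(x - m) ** 2 for x in values]

per_group = {name: sum(squared_deviations(v)) for name, v in groups.items()}
sse = sum(per_group.values())
df_error = sum(len(v) - 1 for v in groups.values())

print(per_group)
print(sse, df_error)
```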
ANOVA
Degrees of Freedom Error
$df_{error} = (n_{rows} - 1) \times n_{groups} = (10 - 1) \times 3 = 27$
ANOVA
Plug these into our formula:
$F = \frac{SSG / df_{groups}}{SSE / df_{error}} = \frac{420 / 2}{3300 / 27} = \frac{210}{122.22} = \mathbf{1.718}$
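The hand calculation can be cross-checked with `scipy.stats.f_oneway`, which performs a one-way ANOVA on the raw samples. This is offered as a sanity check rather than as the method used in this section.

```python
from scipy import stats

group_a = [37, 60, 52, 43, 40, 52, 55, 39, 39, 23]
group_b = [62, 27, 69, 64, 43, 54, 44, 31, 49, 57]
group_c = [50, 63, 58, 54, 49, 52, 53, 43, 65, 43]

# f_oneway returns the F statistic and its p-value
result = stats.f_oneway(group_a, group_b, group_c)

print(f"F = {result.statistic:.3f}")
print(f"p = {result.pvalue:.3f}")
```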
ANOVA with Excel Data Analysis
F Distribution
F-Distribution

[Figure: F-distribution curve; the shaded area to the right of Fcritical equals $\alpha$]
F-Distribution

Look up our critical value from an F-table:
● use a table set for 95% confidence
● find the numerator df (2)
● find the denominator df (27)
● critical value = 3.35
F-Scores in MS Excel

● In Microsoft Excel, the following function returns an F-score:

α      df1   df2   Formula            Output Value
0.05   2     27    =FINV(A2,B2,C2)    3.3541308285292
F-Scores in Python

>>> from scipy import stats
>>> # 95th percentile of F with 2 (numerator) and 27 (denominator) degrees of freedom
>>> stats.f.ppf(1 - 0.05, dfn=2, dfd=27)
3.3541308285291986
ANOVA

Recall our null hypothesis:
$H_0: \mu_A = \mu_B = \mu_C$
Since F is less than Fcritical ($1.718 < 3.354$), we fail to reject the null hypothesis!
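The decision rule can also be sketched in Python. Besides the critical value, the snippet computes a p-value with `stats.f.sf` (the area to the right of F). The p-value is an addition for illustration, equivalent to the comparison against Fcritical.

```python
from scipy import stats

f_value = 1.718              # F computed from the three groups above
df_groups, df_error = 2, 27
alpha = 0.05

f_critical = stats.f.ppf(1 - alpha, dfn=df_groups, dfd=df_error)
p_value = stats.f.sf(f_value, dfn=df_groups, dfd=df_error)  # right-tail area

reject = f_value > f_critical
print(f"F critical = {f_critical:.3f}, p = {p_value:.3f}, reject H0: {reject}")
```

Comparing `f_value` with `f_critical` and comparing `p_value` with `alpha` always lead to the same decision.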
ANOVA Exercise #1

● In an effort to receive faster payment of invoices, a company introduces two discount plans
● One set of customers is given a 2% discount if they pay their invoice early
● Another set is offered a 1% discount
● A third set is not offered any incentive
ANOVA Exercise #1

● The results are as follows:

2% disc  1% disc  no disc
  11       21       14
  16       15       11
   9       23       18
  14       10       16
  10       16       21

● Using ANOVA, can we say that the offers result in faster payments?
ANOVA Exercise #1

1. Calculate the means

         2% disc  1% disc  no disc
           11       21       14
           16       15       11
            9       23       18
           14       10       16
           10       16       21
2,1,0      12       17       16
TOT        15
ANOVA Exercise #1
2. Find Sum of Squares Groups
$(\mu_2 - \mu_{TOT})^2 = (12 - 15)^2 = 9$
$(\mu_1 - \mu_{TOT})^2 = (17 - 15)^2 = 4$
$(\mu_0 - \mu_{TOT})^2 = (16 - 15)^2 = 1$
Sum: $9 + 4 + 1 = 14$
Multiply by the number of items in each group:
$SSG = 14 \times 5 = 70$
ANOVA Exercise #1
3. Degrees of Freedom Groups
$df_{groups} = n_{groups} - 1 = 3 - 1 = 2$
ANOVA Exercise #1
4. Sum of Squares Error

$(x_2 - \mu_2)^2$  $(x_1 - \mu_1)^2$  $(x_0 - \mu_0)^2$
       1                16                 4
      16                 4                25
       9                36                 4
       4                49                 0
       4                 1                25
Sum   34               106                58

TOTAL: $SSE = 34 + 106 + 58 = 198$
ANOVA Exercise #1
5. Degrees of Freedom Error
$df_{error} = (n_{rows} - 1) \times n_{groups} = (5 - 1) \times 3 = 12$
ANOVA Exercise #1
6. Calculate F value:
$F = \frac{SSG / df_{groups}}{SSE / df_{error}} = \frac{70 / 2}{198 / 12} = \frac{35}{16.5} = \mathbf{2.121}$
7. Look up Fcritical: $\mathbf{3.885}$
ANOVA Exercise #1
Since F falls to the left of Fcritical ($2.121 < 3.885$), we fail to reject the null hypothesis!
Fcalculated = 2.121, Fcritical = 3.885
ANOVA Exercise #1

We don’t have enough evidence to support the idea that our offers changed the average number of days that customers took to pay their invoices!
Fcalculated = 2.121, Fcritical = 3.885
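The seven steps of the exercise can be gathered into one small function. This is a sketch in plain Python. The function name `one_way_anova` and its return layout are hypothetical, chosen for illustration.

```python
def one_way_anova(groups):
    """Return (F, df_groups, df_error) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Between groups: group means against the overall mean
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: each value against its own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_groups = len(groups) - 1
    df_error = len(all_values) - len(groups)
    f_value = (ssg / df_groups) / (sse / df_error)
    return f_value, df_groups, df_error

two_pct = [11, 16, 9, 14, 10]
one_pct = [21, 15, 23, 10, 16]
no_disc = [14, 11, 18, 16, 21]

f_value, df_groups, df_error = one_way_anova([two_pct, one_pct, no_disc])
print(f"F = {f_value:.3f} with ({df_groups}, {df_error}) degrees of freedom")
```

Because `df_error` is computed as total observations minus the number of groups, the function also works when the groups differ in size.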
Two-Way ANOVA
One-Way vs Two-Way ANOVA

● In the previous examples we used one-way ANOVA to test one independent variable.
● For the invoice problem, the independent variable was the incentive offered.
● The dependent variable was the time it took to receive payment.
One-Way vs Two-Way ANOVA

● Two-Way ANOVA lets us test two independent variables at the same time
● For the invoice example, we might also consider the amount due
● We would have 3 invoices for $50, 3 for $100, etc. and offer different incentives at each dollar amount.
One-Way vs Two-Way ANOVA
● The resulting data might look like this:

        2% disc  1% disc  no disc
$50       16       23       21
$100      14       21       16
$150      11       16       18
$200      10       15       14
$250       9       10       11

● Here, each row or dollar amount is called a block.
● Essentially, we want to isolate and remove any variance contributed by the blocks, to better understand the variance in the groups.
One-Way vs Two-Way ANOVA
● So how do we do that?
Two-Way ANOVA
● The goal of ANOVA is to separate different aspects of the total variance.
● In the previous examples we had only
Sum of Squares Groups (SSG) » between groups
and Sum of Squares Error (SSE) » within groups

        Group 1  Group 2  TOT
           8       11
          10       12
          12       13
1,2       10       12     11


Two-Way ANOVA
● These two sums of squares, SSG and SSE, add up to our total:
Sum of Squares Total (SST) =
Sum of Squares Groups (SSG) » between groups
+ Sum of Squares Error (SSE) » within groups


Two-Way ANOVA
● Now we’ll look at variance between rows, or blocks

          Group 1  Group 2  TOT
Block A      8       11
Block B     10       12
Block C     12       13
1,2         10       12     11


Two-Way ANOVA
Group 1 Group 2 A,B,C
● First calculate the Block A 8 11 9.5
block means Block B 10 12 11
Block C 12 13 12.5
1,2 10 12 11
● Then calculate the
Sum of Squares Blocks (SSB) » between blocks

Sum of Squares Groups (SSG) » between groups

and Sum of Squares Error (SSE) » within groups


Two-Way ANOVA
$F = \frac{\text{Var. Between Groups}}{\text{Var. Within Groups}} = \frac{SSG / df_{groups}}{SSE / df_{error}}$
● ANOVA still considers the relationship between the SSG and the SSE
● By calculating the SSB, we remove some of the variance in SSE
Sum of Squares Blocks (SSB) » between blocks
Sum of Squares Groups (SSG) » between groups
and Sum of Squares Error (SSE) » within groups
Two-Way ANOVA
Sum of Squares Groups (SSG)
$(\mu_1 - \mu_{TOT})^2 = (10 - 11)^2 = 1$
$(\mu_2 - \mu_{TOT})^2 = (12 - 11)^2 = 1$
Sum: $1 + 1 = 2$
Multiply by the number of items in each group:
$SSG = 2 \times 3 = 6$
Two-Way ANOVA
Sum of Squares Blocks (SSB)
$(\mu_A - \mu_{TOT})^2 = (9.5 - 11)^2 = 2.25$
$(\mu_B - \mu_{TOT})^2 = (11 - 11)^2 = 0$
$(\mu_C - \mu_{TOT})^2 = (12.5 - 11)^2 = 2.25$
Sum: $2.25 + 0 + 2.25 = 4.5$
Multiply by the number of items in each block:
$SSB = 4.5 \times 2 = 9$
Two-Way ANOVA
Sum of Squares Total (SST)
$(8 - 11)^2 + (11 - 11)^2 + (10 - 11)^2 + (12 - 11)^2 + (12 - 11)^2 + (13 - 11)^2 = 16$
No need to multiply since every item is represented.
$SST = 16$
Two-Way ANOVA
Sum of Squares Error (SSE)
$SSE = SST - SSG - SSB = 16 - 6 - 9 = 1$
No need to multiply since we’re working with totals already.
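The decomposition SST = SSG + SSB + SSE for the small example can be sketched in plain Python. The table layout (rows as blocks, columns as groups) and the variable names are illustrative assumptions.

```python
# Rows are blocks A, B, C; columns are Group 1 and Group 2
table = [
    [8, 11],
    [10, 12],
    [12, 13],
]
n_blocks, n_groups = len(table), len(table[0])

grand_mean = sum(sum(row) for row in table) / (n_blocks * n_groups)
group_means = [sum(row[j] for row in table) / n_blocks for j in range(n_groups)]
block_means = [sum(row) / n_groups for row in table]

ssg = n_blocks * sum((m - grand_mean) ** 2 for m in group_means)  # between groups
ssb = n_groups * sum((m - grand_mean) ** 2 for m in block_means)  # between blocks
sst = sum((x - grand_mean) ** 2 for row in table for x in row)    # total
sse = sst - ssg - ssb                                             # what is left over

print(ssg, ssb, sst, sse)
```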
Two-Way ANOVA
So how do we calculate F?
Degrees of Freedom Groups is unchanged:
$df_{groups} = n_{groups} - 1 = 2 - 1 = 1$
Two-Way ANOVA
So how do we calculate F?
Degrees of Freedom Error has changed:
$df_{error} = (n_{blocks} - 1)(n_{groups} - 1) = (3 - 1)(2 - 1) = 2$
Two-Way ANOVA
So how do we calculate F?
$F = \frac{SSG / df_{groups}}{SSE / df_{error}} = \frac{6 / 1}{1 / 2} = \mathbf{12}$
Two-Way ANOVA
$F_{groups} = \mathbf{12}$ feels like a high value.
However, in a two-way ANOVA, $F_{critical}$ is found for groups and blocks separately!
For groups, with 1 df in the numerator and 2 df in the denominator, $F_{critical} = \mathbf{18.5}$
Since $12 < 18.5$, we fail to reject the null hypothesis for groups.
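A sketch of looking up separate critical values for groups and blocks with `scipy.stats.f.ppf`. The block F value and `df_blocks = n_blocks - 1` are not worked out in the text above. They are included here as an assumption, following the same pattern as the groups calculation.

```python
from scipy import stats

ssg, ssb, sse = 6, 9, 1
df_groups, df_error = 1, 2
df_blocks = 3 - 1  # assumption: number of blocks minus one

mse = sse / df_error
f_groups = (ssg / df_groups) / mse
f_blocks = (ssb / df_blocks) / mse  # illustrative, not computed in the slides

# Each F value gets its own critical value, based on its own numerator df
crit_groups = stats.f.ppf(0.95, dfn=df_groups, dfd=df_error)
crit_blocks = stats.f.ppf(0.95, dfn=df_blocks, dfd=df_error)

print(f"groups: F = {f_groups:.2f}, F critical = {crit_groups:.2f}")
print(f"blocks: F = {f_blocks:.2f}, F critical = {crit_blocks:.2f}")
```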
ANOVA Exercise #2
● Let’s go back to the invoice problem, and add a new independent variable
● Here each block represents an invoice amount
● The dependent variable is still days elapsed until payment

        2% disc  1% disc  no disc
$50       16       23       21
$100      14       21       16
$150      11       16       18
$200      10       15       14
$250       9       10       11
ANOVA Exercise #2
2% 1% no block
1. Calculate the group means, disc disc disc
$50 16 23 21 20
the block means, $100 14 21 16 17

and the total mean $150 11 16 18 15


$200 10 15 14 13
$250 9 10 11 10
col 12 17 16 15
ANOVA Exercise #2
2% 1% no block
2. Sum of Squares Groups disc disc disc
$50 16 23 21 20
)2
(𝜇2 −𝜇 𝑇𝑂𝑇 = (12 − 15)2 = 9 $100 14 21 16 17

(𝜇1 −𝜇 𝑇𝑂𝑇 )2 = (17 − 15)2 = 4 $150 11 16 18 15


$200 10 15 14 13
(𝜇0 −𝜇 𝑇𝑂𝑇 )2 = (16 − 15)2 = 1 $250 9 10 11 10

14 col 12 17 16 15
Multiply by the number of 𝑆𝑆𝐺 = 70
items in each group:
14 × 5 = 70
ANOVA Exercise #2
2% 1% no block
disc disc disc
3. Degrees of Freedom Groups $50 16 23 21 20
$100 14 21 16 17
𝑑𝑓𝑔𝑟𝑜𝑢𝑝𝑠 = 𝑛𝑔𝑟𝑜𝑢𝑝𝑠 − 1
$150 11 16 18 15
=3−1 $200 10 15 14 13
$250 9 10 11 10
=2
col 12 17 16 15

𝑆𝑆𝐺 = 70 𝑑𝑓𝑔𝑟𝑜𝑢𝑝𝑠 = 2
ANOVA Exercise #2
4. Sum of Squares Blocks
$(\mu_{50} - \mu_{TOT})^2 = (20 - 15)^2 = 25$
$(\mu_{100} - \mu_{TOT})^2 = (17 - 15)^2 = 4$
$(\mu_{150} - \mu_{TOT})^2 = (15 - 15)^2 = 0$
$(\mu_{200} - \mu_{TOT})^2 = (13 - 15)^2 = 4$
$(\mu_{250} - \mu_{TOT})^2 = (10 - 15)^2 = 25$
Sum: $25 + 4 + 0 + 4 + 25 = 58$
Multiply by the number of items in each block:
$SSB = 58 \times 3 = 174$
ANOVA Exercise #2
5. Sum of Squares Total

        $(x_2 - \mu_{TOT})^2$  $(x_1 - \mu_{TOT})^2$  $(x_0 - \mu_{TOT})^2$
$50            1                     64                     36
$100           1                     36                      1
$150          16                      1                      9
$200          25                      0                      1
$250          36                     25                     16
Sum           79                    126                     63

TOTAL: $SST = 79 + 126 + 63 = 268$
ANOVA Exercise #2
6. Sum of Squares Error
$SSE = SST - SSG - SSB = 268 - 70 - 174 = 24$
ANOVA Exercise #2
2% 1% no block
7. Degrees of Freedom Error disc disc disc
$50 16 23 21 20
𝑑𝑓𝑒𝑟𝑟𝑜𝑟 = (𝑛𝑏𝑙𝑜𝑐𝑘𝑠 − 1)(𝑛𝑔𝑟𝑜𝑢𝑝𝑠 − 1) $100 14 21 16 17

= (5 − 1)(3 − 1) $150 11 16 18 15
$200 10 15 14 13
=8 $250 9 10 11 10
col 12 17 16 15

𝑆𝑆𝐺 = 70 𝑑𝑓𝑔𝑟𝑜𝑢𝑝𝑠 = 2
𝑆𝑆𝐵 = 174 𝑑𝑓𝑒𝑟𝑟𝑜𝑟 = 8
𝑆𝑆𝑇 = 268
𝑆𝑆𝐸 = 24
ANOVA Exercise #2
2% 1% no block
8. Calculate F disc disc disc
$50 16 23 21 20

𝑆𝑆𝐺 70
$100 14 21 16 17
$150 11 16 18 15
𝑑𝑓𝑔𝑟𝑜𝑢𝑝𝑠 2 35
𝐹= = = = 𝟏𝟏. 𝟔𝟕 $200 10 15 14 13
𝑆𝑆𝐸 24 3 $250 9 10 11 10
𝑑𝑓𝑒𝑟𝑟𝑜𝑟 8 col 12 17 16 15

𝑆𝑆𝐺 = 70 𝑑𝑓𝑔𝑟𝑜𝑢𝑝𝑠 = 2
𝑆𝑆𝐵 = 174 𝑑𝑓𝑒𝑟𝑟𝑜𝑟 = 8
𝑆𝑆𝑇 = 268 F = 11.67
𝑆𝑆𝐸 = 24
ANOVA Exercise #2
2% 1% no block
9. Find Fcritical disc disc disc
$50 16 23 21 20
𝛼 = 0.05 $100 14 21 16 17

𝑑𝑓𝑛𝑢𝑚𝑒𝑟𝑎𝑡𝑜𝑟 = 2 $150 11 16 18 15
$200 10 15 14 13
𝑑𝑓𝑑𝑒𝑛𝑜𝑚𝑖𝑛𝑎𝑡𝑜𝑟 = 8 $250 9 10 11 10
𝐹𝑐𝑟𝑖𝑡𝑖𝑐𝑎𝑙 = 𝟒. 𝟒𝟔 col 12 17 16 15

𝑆𝑆𝐺 = 70 𝑑𝑓𝑔𝑟𝑜𝑢𝑝𝑠 = 2
𝑆𝑆𝐵 = 174 𝑑𝑓𝑒𝑟𝑟𝑜𝑟 = 8
𝑆𝑆𝑇 = 268 F = 11.67
𝑆𝑆𝐸 = 24 Fcritical = 4.46
ANOVA Exercise #2
Since F falls to the right of Fcritical ($4.46 < 11.67$), we reject the null hypothesis!
Fcritical = 4.46, Fcalculated = 11.67
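The nine steps of Exercise #2 can be sketched as a single function. The name `two_way_anova` and its signature are hypothetical. It handles only the case shown here: one value per block/group combination, testing the groups.

```python
from scipy import stats

def two_way_anova(table, alpha=0.05):
    """Two-way ANOVA without replication.

    table: list of rows (blocks), each holding one value per group.
    Returns (F for groups, critical F for groups).
    """
    n_blocks, n_groups = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_blocks * n_groups)
    group_means = [sum(row[j] for row in table) / n_blocks for j in range(n_groups)]
    block_means = [sum(row) / n_groups for row in table]

    ssg = n_blocks * sum((m - grand) ** 2 for m in group_means)
    ssb = n_groups * sum((m - grand) ** 2 for m in block_means)
    sst = sum((x - grand) ** 2 for row in table for x in row)
    sse = sst - ssg - ssb  # variance left after removing groups and blocks

    df_groups = n_groups - 1
    df_error = (n_blocks - 1) * (n_groups - 1)
    f_groups = (ssg / df_groups) / (sse / df_error)
    f_critical = stats.f.ppf(1 - alpha, dfn=df_groups, dfd=df_error)
    return f_groups, f_critical

invoices = [
    [16, 23, 21],  # $50
    [14, 21, 16],  # $100
    [11, 16, 18],  # $150
    [10, 15, 14],  # $200
    [9, 10, 11],   # $250
]

f_groups, f_critical = two_way_anova(invoices)
print(f"F = {f_groups:.2f}, F critical = {f_critical:.2f}, reject H0: {f_groups > f_critical}")
```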
2-way ANOVA in Excel
Two-Way ANOVA
with Replication
Without vs With Replication
without replication

         GroupA  GroupB  GroupC
Block1     16      23      21
Block2     14      21      16
Block3     11      16      18
Block4     10      15      14
Block5      9      10      11
Block6      8       8      10

with replication

         GroupA  GroupB  GroupC
Block1     16      23      21
           14      21      16
           11      16      18
Block2     10      15      14
            9      10      11
            8       8      10

● Samples have multiple values
● Samples have a mean value
Two-Way ANOVA with Replication

● Introduces the concept of sample means and sample variance
● Introduces the concept of interactions
Two-Way ANOVA with Replication

● As with our previous 2-way ANOVA, we consider two independent variables organized into groups and blocks
● We sample every block/group combination
● With replication, block/group samples have multiple measurements
Two-Way ANOVA with Replication

● Consider an experiment that measures the height of plants
● We apply three types of fertilizer (A, B & C): these are our Groups
● Plants are kept at two temperatures (warm & cold): these are our Blocks
● We assign 3 plants to each sample
Two-Way ANOVA
● First calculate the mean for each 3-item sample
● Calculate column means
● Calculate block means
● Calculate the overall mean

Fertilizer:        A    B    C    Block Means
Warm              13   21   18
                  14   19   15        16
                  12   17   15
Cold              16   14   15
                  18   11   13        14
                  17   14    8
Sample Means
  Warm            13   19   16
  Cold            17   13   12
Column Means      15   16   14        15
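A sketch of the four kinds of means in plain Python. The nested-dictionary layout and the helper `mean` are illustrative choices.

```python
# Plant heights: block (temperature) -> column (fertilizer) -> 3 measurements
data = {
    "Warm": {"A": [13, 14, 12], "B": [21, 19, 17], "C": [18, 15, 15]},
    "Cold": {"A": [16, 18, 17], "B": [14, 11, 14], "C": [15, 13, 8]},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Mean of each 3-item sample
sample_means = {b: {c: mean(v) for c, v in cols.items()} for b, cols in data.items()}
# Mean of all 9 values in a block
block_means = {b: mean(x for v in cols.values() for x in v) for b, cols in data.items()}
# Mean of all 6 values in a column
column_means = {c: mean(x for cols in data.values() for x in cols[c]) for c in "ABC"}
# Mean of all 18 values
overall_mean = mean(x for cols in data.values() for v in cols.values() for x in v)

print(sample_means)
print(block_means, column_means, overall_mean)
```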
Two-Way ANOVA
● As before, calculate the Sum of Squares Blocks
$(16 - 15)^2 + (14 - 15)^2 = 2$
$\times\ 9$ items per block $= \mathbf{18}$
SSB = 18
Two-Way ANOVA
● As before, calculate the Sum of Squares Columns
$(15 - 15)^2 + (16 - 15)^2 + (14 - 15)^2 = 2$
$\times\ 6$ items per column $= \mathbf{12}$
SSC = 12
Two-Way ANOVA
● As before, calculate the Degrees of Freedom Columns
$df_{columns} = 3 - 1 = 2$
Two-Way ANOVA
● We have a new statistic: SS Interactions
● For each sample mean, subtract the matching block and column means, add back the overall mean, square the result
Two-Way ANOVA
$(13 - 16 - 15 + 15)^2 +$
$(19 - 16 - 16 + 15)^2 +$
$(16 - 16 - 14 + 15)^2 +$
$(17 - 14 - 15 + 15)^2 +$
$(13 - 14 - 16 + 15)^2 +$
$(12 - 14 - 14 + 15)^2 = 28$
$\times\ 3$ items per sample $= \mathbf{84}$
SSI = 84
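The three sums of squares can be sketched from the means alone. The means below are copied from the table above, and the dictionary keys and variable names are illustrative.

```python
overall = 15
block_means = {"Warm": 16, "Cold": 14}
column_means = {"A": 15, "B": 16, "C": 14}
sample_means = {
    ("Warm", "A"): 13, ("Warm", "B"): 19, ("Warm", "C"): 16,
    ("Cold", "A"): 17, ("Cold", "B"): 13, ("Cold", "C"): 12,
}
items_per_sample = 3
items_per_block = items_per_sample * len(column_means)   # 9
items_per_column = items_per_sample * len(block_means)   # 6

ssb = items_per_block * sum((m - overall) ** 2 for m in block_means.values())
ssc = items_per_column * sum((m - overall) ** 2 for m in column_means.values())

# Interaction: what is left of each sample mean after removing
# its block effect and its column effect
ssi = items_per_sample * sum(
    (m - block_means[b] - column_means[c] + overall) ** 2
    for (b, c), m in sample_means.items()
)

print(ssb, ssc, ssi)
```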
Two-Way ANOVA
● Calculate the Sum of Squares Total (squared distance of each value from the overall mean of 15)

Fertilizer:    A    B    C
Warm           4   36    9
               1   16    0
               9    4    0
Cold           1    1    0
               9   16    4
               4    1   49

TOTAL: SST = 164
Two-Way ANOVA
● Calculate the Sum of Squares Error by subtracting the other values from the SST:
$164 - 18 - 12 - 84 = \mathbf{50}$
SSE = 50
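A sketch that computes SST from the raw data and obtains SSE by subtraction. As a cross-check, which is an addition and not part of the slides, it also sums the squared distance of every value from its own sample mean, which should give the same SSE.

```python
data = {
    ("Warm", "A"): [13, 14, 12], ("Warm", "B"): [21, 19, 17], ("Warm", "C"): [18, 15, 15],
    ("Cold", "A"): [16, 18, 17], ("Cold", "B"): [14, 11, 14], ("Cold", "C"): [15, 13, 8],
}
ssb, ssc, ssi = 18, 12, 84  # from the previous steps

all_values = [x for sample in data.values() for x in sample]
overall = sum(all_values) / len(all_values)

sst = sum((x - overall) ** 2 for x in all_values)
sse = sst - ssb - ssc - ssi

# Cross-check: variation of each value around its own sample mean
sse_direct = sum(
    (x - sum(sample) / len(sample)) ** 2
    for sample in data.values()
    for x in sample
)

print(sst, sse, sse_direct)
```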
Two-Way ANOVA
● Degrees of Freedom Error
$df_{error} = blocks \times columns \times (items - 1) = 2 \times 3 \times (3 - 1) = \mathbf{12}$
Two-Way ANOVA
● Calculate F
$F = \frac{SSC / df_{columns}}{SSE / df_{error}} = \frac{12 / 2}{50 / 12} = \mathbf{1.44}$
Two-Way ANOVA
$F = \mathbf{1.44}$
● Look up Fcritical
$F(0.05, 2, 12) = \mathbf{3.885}$
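A sketch of the final F calculation for columns, with the critical value from `scipy.stats.f.ppf`. The interaction F value is not computed in the slides. It is included as an assumption, using the standard interaction degrees of freedom (blocks − 1)(columns − 1).

```python
from scipy import stats

ssc, ssi, sse = 12, 84, 50
n_blocks, n_columns, n_items = 2, 3, 3

df_columns = n_columns - 1
df_interaction = (n_blocks - 1) * (n_columns - 1)  # assumption, not on the slides
df_error = n_blocks * n_columns * (n_items - 1)

mse = sse / df_error
f_columns = (ssc / df_columns) / mse
f_interaction = (ssi / df_interaction) / mse  # illustrative extension

crit_columns = stats.f.ppf(0.95, dfn=df_columns, dfd=df_error)
crit_interaction = stats.f.ppf(0.95, dfn=df_interaction, dfd=df_error)

print(f"columns:     F = {f_columns:.2f}, F critical = {crit_columns:.3f}")
print(f"interaction: F = {f_interaction:.2f}, F critical = {crit_interaction:.3f}")
```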
Two-Way ANOVA
● A look at Interaction:

Sample Means    A    B    C
  Warm         13   19   16
  Cold         17   13   12
2-way with Replication in Excel
Next Up: REGRESSION
