
ANOVA: Analysis of Variance

Prabhat Mittal
profmittal@yahoo.co.in
Learning Objectives

 To compare more than two population means


using one way analysis of variance
 To use the F distribution to test hypotheses
about two population variances
 To learn about randomized block design
 To learn the technique of two-way analysis of
variance and the concept of interaction

Chapter Overview

Analysis of Variance (ANOVA)

- One-Way ANOVA: F-test; Multiple Comparisons (Tukey-Kramer test)
- Randomized Block Design
- Two-Way ANOVA: Interaction Effects
General ANOVA Setting

- Investigator controls one or more independent variables
  - Called factors (or treatment variables)
  - Each factor contains two or more levels (or groups or categories/classifications)
- Observe effects on the dependent variable
  - Response to levels of the independent variable
- Experimental design: the plan used to collect the data
Completely Randomized Design
- Experimental units (subjects) are assigned randomly to treatments
  - Subjects are assumed homogeneous
- Only one factor or independent variable
  - With two or more treatment levels
- Analyzed by one-way analysis of variance (ANOVA)

One-Way Analysis of Variance

- Evaluate the difference among the means of three or more groups
  - Examples: accident rates for 1st, 2nd, and 3rd shifts; expected mileage for five brands of tires
- Assumptions
  - Populations are normally distributed
  - Populations have equal variances
  - Samples are randomly and independently drawn

Hypotheses of One-Way ANOVA

- H0: μ1 = μ2 = μ3 = ... = μc
  - All population means are equal
  - i.e., no treatment effect (no variation in means among groups)
- H1: Not all of the population means are the same
  - At least one population mean is different
  - i.e., there is a treatment effect
  - Does not mean that all population means are different (some pairs may be the same)

One-Factor ANOVA

H0: μ1 = μ2 = μ3 = ... = μc
H1: Not all μj are the same

All means are the same:
the null hypothesis is true (no treatment effect)

μ1 = μ2 = μ3

One-Factor ANOVA
(continued)
H0: μ1 = μ2 = μ3 = ... = μc
H1: Not all μj are the same

At least one mean is different:
the null hypothesis is NOT true (a treatment effect is present)

μ1 = μ2 ≠ μ3   or   μ1 ≠ μ2 ≠ μ3
Partitioning the Variation

- Total variation can be split into two parts:

SST = SSA + SSW

SST = Total Sum of Squares (total variation)
SSA = Sum of Squares Among Groups (among-group variation)
SSW = Sum of Squares Within Groups (within-group variation)

Partitioning the Variation
(continued)

SST = SSA + SSW

- Total variation (SST) = the aggregate dispersion of the individual data values across the various factor levels
- Among-group variation (SSA) = dispersion between the factor sample means
- Within-group variation (SSW) = dispersion that exists among the data values within a particular factor level

Partition of Total Variation

Total Variation (SST), d.f. = n – 1
  = Variation Due to Factor (SSA), d.f. = c – 1
  + Variation Due to Random Sampling (SSW), d.f. = n – c

SSA is commonly referred to as:
- Sum of Squares Between
- Sum of Squares Among
- Sum of Squares Explained
- Among-Groups Variation

SSW is commonly referred to as:
- Sum of Squares Within
- Sum of Squares Error
- Sum of Squares Unexplained
- Within-Group Variation

Total Sum of Squares

SST = SSA + SSW

SST = Σ_{j=1}^{c} Σ_{i=1}^{n_j} (Xij – X̄)²

Where:
SST = total sum of squares
c = number of groups (levels or treatments)
nj = number of observations in group j
Xij = ith observation from group j
X̄ = grand mean (mean of all data values)

Total Variation (continued)

SST  ( X11  X)2  ( X12  X)2  ...  ( Xcnc  X)2


Response, X

Group 1 Group 2 Group 3

Among-Group Variation

SST = SSA + SSW

SSA = Σ_{j=1}^{c} nj (X̄j – X̄)²

Where:
SSA = sum of squares among groups
c = number of groups
nj = sample size from group j
X̄j = sample mean from group j
X̄ = grand mean (mean of all data values)

Among-Group Variation
(continued)
SSA = Σ_{j=1}^{c} nj (X̄j – X̄)²

MSA = SSA / (c – 1)

Variation due to differences among groups:
Mean Square Among = SSA / degrees of freedom

Among-Group Variation (continued)

SSA  n1( x1  x)2  n2 ( x 2  x)2  ...  nc ( xc  x)2

Response, X

X3
X2 X
X1

Group 1 Group 2 Group 3

Within-Group Variation

SST = SSA + SSW

SSW = Σ_{j=1}^{c} Σ_{i=1}^{n_j} (Xij – X̄j)²

Where:
SSW = sum of squares within groups
c = number of groups
nj = sample size from group j
X̄j = sample mean from group j
Xij = ith observation in group j
Within-Group Variation
(continued)

SSW = Σ_{j=1}^{c} Σ_{i=1}^{n_j} (Xij – X̄j)²

MSW = SSW / (n – c)

Summing the variation within each group and then adding over all groups:
Mean Square Within = SSW / degrees of freedom

Within-Group Variation (continued)

SSW  ( x11  X1 )  ( X12  X 2 )  ...  ( Xcnc  Xc )


2 2 2

Response, X

X3
X2
X1

Group 1 Group 2 Group 3

Obtaining the Mean Squares

MSA = SSA / (c – 1)

MSW = SSW / (n – c)

MST = SST / (n – 1)
One-Way ANOVA Table

Source of Variation | SS              | df    | MS (Variance)       | F ratio
Among Groups        | SSA             | c – 1 | MSA = SSA / (c – 1) | F = MSA / MSW
Within Groups       | SSW             | n – c | MSW = SSW / (n – c) |
Total               | SST = SSA + SSW | n – 1 |                     |

c = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom

One-Way ANOVA
F Test Statistic
H0: μ1 = μ2 = ... = μc
H1: At least two population means are different

- Test statistic: F = MSA / MSW
  - MSA is the mean square among groups
  - MSW is the mean square within groups
- Degrees of freedom
  - df1 = c – 1 (c = number of groups)
  - df2 = n – c (n = sum of sample sizes from all populations)

Interpreting One-Way ANOVA
F Statistic

- The F statistic is the ratio of the among-groups estimate of variance to the within-groups estimate of variance
  - The ratio must always be positive
  - df1 = c – 1 will typically be small
  - df2 = n – c will typically be large

Decision rule (α = .05):
- Reject H0 if F > FU; otherwise do not reject H0

[Figure: F distribution with the rejection region to the right of the critical value FU]
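As an aside not on the original slides, the critical value FU can be read from an F table or computed in software. A minimal Python sketch, assuming SciPy is available (the group counts below are just those of the golf-club example that follows):

```python
from scipy import stats

alpha = 0.05
c, n = 3, 15                 # 3 groups, 15 observations in total
df1, df2 = c - 1, n - c      # numerator and denominator degrees of freedom

# Upper-tail critical value F_U such that P(F > F_U) = alpha
f_upper = stats.f.ppf(1 - alpha, df1, df2)
print(f"F_U({df1}, {df2}) at alpha = {alpha}: {f_upper:.2f}")   # about 3.89
```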
One-Way ANOVA
F Test Example

You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the 0.05 significance level, is there a difference in mean distance?

Club 1: 254  263  241  237  251
Club 2: 234  218  235  227  216
Club 3: 200  222  197  206  204

One-Way ANOVA Example:
Scatter Diagram
Club 1: 254  263  241  237  251    X̄1 = 249.2
Club 2: 234  218  235  227  216    X̄2 = 226.0
Club 3: 200  222  197  206  204    X̄3 = 205.8

Grand mean: X̄ = 227.0

[Scatter diagram: distance (190 to 270) plotted against club (1, 2, 3), marking the group means X̄1, X̄2, X̄3 and the grand mean X̄]
One-Way ANOVA Example
Computations

Club 1: 254  263  241  237  251    X̄1 = 249.2, n1 = 5
Club 2: 234  218  235  227  216    X̄2 = 226.0, n2 = 5
Club 3: 200  222  197  206  204    X̄3 = 205.8, n3 = 5

X̄ = 227.0, n = 15, c = 3

SSA = 5(249.2 – 227)² + 5(226 – 227)² + 5(205.8 – 227)² = 4716.4
SSW = (254 – 249.2)² + (263 – 249.2)² + ... + (204 – 205.8)² = 1119.6

MSA = 4716.4 / (3 – 1) = 2358.2
MSW = 1119.6 / (15 – 3) = 93.3

F = MSA / MSW = 2358.2 / 93.3 = 25.275

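The hand computations above can be reproduced in a few lines of code. A sketch of my own (not part of the slides), using NumPy:

```python
import numpy as np

groups = {
    "Club 1": np.array([254, 263, 241, 237, 251]),
    "Club 2": np.array([234, 218, 235, 227, 216]),
    "Club 3": np.array([200, 222, 197, 206, 204]),
}

values = np.concatenate(list(groups.values()))
grand_mean = values.mean()                 # X-bar = 227.0
n, c = values.size, len(groups)            # n = 15, c = 3

# Among-group and within-group sums of squares
ssa = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())   # 4716.4
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups.values())             # 1119.6

msa = ssa / (c - 1)        # 2358.2
msw = ssw / (n - c)        # 93.3
f_stat = msa / msw         # about 25.275
print(round(ssa, 1), round(ssw, 1), round(f_stat, 3))
```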
One-Way ANOVA Example
Solution
H0: μ1 = μ2 = μ3
H1: μj not all equal
α = 0.05
df1 = 2, df2 = 12

Test statistic: F = MSA / MSW = 2358.2 / 93.3 = 25.275
Critical value: FU = 3.89

Decision: Reject H0 at α = 0.05, since F = 25.275 > FU = 3.89

Conclusion: There is evidence that at least one μj differs from the rest.
One-Way ANOVA
Excel Output
EXCEL: Tools | Data Analysis | ANOVA: Single Factor

SUMMARY
Groups | Count | Sum  | Average | Variance
Club 1 | 5     | 1246 | 249.2   | 108.2
Club 2 | 5     | 1130 | 226     | 77.5
Club 3 | 5     | 1029 | 205.8   | 94.2

ANOVA
Source of Variation | SS     | df | MS     | F      | P-value  | F crit
Between Groups      | 4716.4 | 2  | 2358.2 | 25.275 | 4.99E-05 | 3.89
Within Groups       | 1119.6 | 12 | 93.3   |        |          |
Total               | 5836.0 | 14 |        |        |          |
Numerical Problems
The following data show the number of claims processed
per day for a group of four insurance company employees
observed for a number of days. Test the hypothesis that
the employees’ mean claims per day are all the same. Use
the 0.05 level of significance.
Employee 1 15 17 14 12
Employee 2 12 10 13 17
Employee 3 11 14 13 15 12
Employee 4 13 12 12 14 10 9

Solution: H0: μ1 = μ2 = μ3 = μ4 and H1: μj not all equal


One-Way ANOVA Excel Output
EXCEL: Tools | Data Analysis | ANOVA: Single Factor

SUMMARY
Groups     | Count | Sum | Average | Variance
Employee 1 | 4     | 58  | 14.5    | 4.333333
Employee 2 | 4     | 52  | 13      | 8.666667
Employee 3 | 5     | 65  | 13      | 2.5
Employee 4 | 6     | 70  | 11.67   | 3.466667

Source of Variation | SS    | df | MS   | F    | P-value | F crit
Between Groups      | 19.45 | 3  | 6.48 | 1.46 | 0.26    | 3.28
Within Groups       | 66.33 | 15 | 4.42 |      |         |
Total               | 85.78 | 18 |      |      |         |

Do not reject H0. The employees' productivities are not significantly different.
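The same F test can be checked outside Excel. A minimal sketch of my own (assuming SciPy is installed) using scipy.stats.f_oneway:

```python
from scipy import stats

emp1 = [15, 17, 14, 12]
emp2 = [12, 10, 13, 17]
emp3 = [11, 14, 13, 15, 12]
emp4 = [13, 12, 12, 14, 10, 9]

# One-way ANOVA: H0 is that the four employees have the same mean claims per day
f_stat, p_value = stats.f_oneway(emp1, emp2, emp3, emp4)
print(f"F = {f_stat:.2f}, p = {p_value:.2f}")   # F ≈ 1.46, p ≈ 0.26 > 0.05, so do not reject H0
```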
Session-II

ANOVA: Analysis of Variance

The Tukey-Kramer Procedure

- Tells which population means are significantly different
  - e.g., μ1 = μ2 ≠ μ3
  - Done after rejection of equal means in ANOVA
- Allows pairwise comparisons
  - Compare absolute mean differences with the critical range

[Figure: population distributions on the x axis with μ1 = μ2 ≠ μ3]

Tukey-Kramer Critical Range

MSW 1 1
Critical Range  QU   
2 n n 
 j j' 

where:
QU = Value from Studentized Range Distribution
with c and n - c degrees of freedom for
the desired level of  (see appendix E.9 table)
MSW = Mean Square Within
nj and nj’ = Sample sizes from groups j and j’

The Tukey-Kramer Procedure:
Example
Data (from the golf club example):
Club 1: 254  263  241  237  251
Club 2: 234  218  235  227  216
Club 3: 200  222  197  206  204

1. Compute absolute mean differences:

|X̄1 – X̄2| = |249.2 – 226.0| = 23.2
|X̄1 – X̄3| = |249.2 – 205.8| = 43.4
|X̄2 – X̄3| = |226.0 – 205.8| = 20.2

2. Find the QU value from the table in appendix E.10 with c = 3 and (n – c) = (15 – 3) = 12 degrees of freedom for the desired level of α (α = 0.05 used here):

QU = 3.77

The Tukey-Kramer Procedure:
Example
(continued)
3. Compute the critical range:

Critical Range = QU √[ (MSW / 2) (1/nj + 1/nj') ] = 3.77 √[ (93.3 / 2) (1/5 + 1/5) ] = 16.285

4. Compare the absolute mean differences to the critical range:

|X̄1 – X̄2| = 23.2
|X̄1 – X̄3| = 43.4
|X̄2 – X̄3| = 20.2

5. All of the absolute mean differences are greater than the critical range. Therefore there is a significant difference between each pair of means at the 5% level of significance. Thus, with 95% confidence we can conclude that the mean distance for club 1 is greater than for clubs 2 and 3, and club 2 is greater than club 3.
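As a software cross-check (my own sketch, not from the slides), the same pairwise comparisons can be run with the Tukey HSD procedure in statsmodels, assuming that library is installed:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

distance = np.array([254, 263, 241, 237, 251,    # Club 1
                     234, 218, 235, 227, 216,    # Club 2
                     200, 222, 197, 206, 204])   # Club 3
club = np.repeat(["Club 1", "Club 2", "Club 3"], 5)

# Tukey HSD pairwise comparisons at a 5% family-wise significance level
result = pairwise_tukeyhsd(endog=distance, groups=club, alpha=0.05)
print(result)   # every pair is flagged reject = True, matching the conclusion above
```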
The Randomized Block Design

- Like one-way ANOVA, we test for equal population means (for different factor levels, for example)...
- ...but we want to control for possible variation from a second factor (with two or more levels)
- Levels of the secondary factor are called blocks
Partitioning the Variation

- Total variation can now be split into three parts:

SST = SSA + SSBL + SSE

SST = total variation
SSA = among-group variation
SSBL = among-block variation
SSE = random variation

Sum of Squares for Blocking

SST = SSA + SSBL + SSE

SSBL = c Σ_{i=1}^{r} (X̄i. – X̄)²

Where:
c = number of groups
r = number of blocks
X̄i. = mean of all values in block i
X̄ = grand mean (mean of all data values)

Partitioning the Variation

- Total variation can now be split into three parts:

SST = SSA + SSBL + SSE

SST and SSA are computed as they were in one-way ANOVA, so

SSE = SST – (SSA + SSBL)

Mean Squares

MSBL = mean square blocking = SSBL / (r – 1)

MSA = mean square among groups = SSA / (c – 1)

MSE = mean square error = SSE / [(r – 1)(c – 1)]

Randomized Block ANOVA Table
Source of Variation | SS   | df             | MS   | F ratio
Among Treatments    | SSA  | c – 1          | MSA  | MSA / MSE
Among Blocks        | SSBL | r – 1          | MSBL | MSBL / MSE
Error               | SSE  | (r – 1)(c – 1) | MSE  |
Total               | SST  | rc – 1         |      |

c = number of populations (treatments)
r = number of blocks
rc = sum of the sample sizes from all populations
df = degrees of freedom
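To show how such a table might be produced in software, here is a hedged Python sketch using statsmodels; the data frame (columns y, treatment, block) is a made-up illustration, not data from the slides:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical randomized block data: r = 4 blocks, c = 3 treatments, one observation per cell
df = pd.DataFrame({
    "y":         [31, 29, 25,  34, 30, 27,  33, 32, 26,  36, 31, 28],
    "treatment": ["A", "B", "C"] * 4,
    "block":     ["b1"] * 3 + ["b2"] * 3 + ["b3"] * 3 + ["b4"] * 3,
})

# Randomized block model: treatment and block main effects, no interaction term
model = ols("y ~ C(treatment) + C(block)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # rows correspond to SSA, SSBL, and SSE with df, F, and p-values
```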
Blocking Test

H0: μ1. = μ2. = μ3. = ...
H1: Not all block means are equal

F = MSBL / MSE

- Blocking test: df1 = r – 1, df2 = (r – 1)(c – 1)
- Reject H0 if F > FU

Main Factor Test

H0: μ.1 = μ.2 = μ.3 = ... = μ.c
H1: Not all population means are equal

F = MSA / MSE

- Main factor test: df1 = c – 1, df2 = (r – 1)(c – 1)
- Reject H0 if F > FU

The Tukey Procedure

- To test which population means are significantly different
  - e.g., μ1 = μ2 ≠ μ3
  - Done after rejection of equal means in the randomized block ANOVA design
- Allows pairwise comparisons
  - Compare absolute mean differences with the critical range

[Figure: population distributions on the x axis with μ1 = μ2 ≠ μ3]

The Tukey Procedure
(continued)

Critical Range = QU √(MSE / r)

Compare: is |X̄.j – X̄.j'| > Critical Range?

|X̄.1 – X̄.2|, |X̄.1 – X̄.3|, |X̄.2 – X̄.3|, etc.

If the absolute mean difference is greater than the critical range, then there is a significant difference between that pair of means at the chosen level of significance.

Factorial Design:
Two-Way ANOVA
- Examines the effect of:
  - Two factors of interest on the dependent variable
    - e.g., percent carbonation and line speed on a soft drink bottling process
  - Interaction between the different levels of these two factors
    - e.g., does the effect of one particular carbonation level depend on which level the line speed is set to?
Two-Way ANOVA (continued)

- Assumptions
  - Populations are normally distributed
  - Populations have equal variances
  - Independent random samples are drawn

Two-Way ANOVA
Sources of Variation

Two factors of interest: A and B

r = number of levels of factor A
c = number of levels of factor B
n' = number of replications for each cell
n = total number of observations in all cells (n = rcn')
Xijk = value of the kth observation of level i of factor A and level j of factor B
Two-Way ANOVA
Sources of Variation (continued)

SST = SSA + SSB + SSAB + SSE

Component                                               Degrees of freedom
SST  = total variation                                  n – 1
SSA  = factor A variation                               r – 1
SSB  = factor B variation                               c – 1
SSAB = variation due to interaction between A and B     (r – 1)(c – 1)
SSE  = random variation (error)                         rc(n' – 1)

Two Factor ANOVA Equations

Total variation:
SST = Σ_{i=1}^{r} Σ_{j=1}^{c} Σ_{k=1}^{n'} (Xijk – X̄)²

Factor A variation:
SSA = cn' Σ_{i=1}^{r} (X̄i.. – X̄)²

Factor B variation:
SSB = rn' Σ_{j=1}^{c} (X̄.j. – X̄)²

Two Factor ANOVA Equations
(continued)

Interaction variation:
SSAB = n' Σ_{i=1}^{r} Σ_{j=1}^{c} (X̄ij. – X̄i.. – X̄.j. + X̄)²

Sum of squares error:
SSE = Σ_{i=1}^{r} Σ_{j=1}^{c} Σ_{k=1}^{n'} (Xijk – X̄ij.)²

Two Factor ANOVA Equations
(continued)
where:

X̄ = Σ_{i=1}^{r} Σ_{j=1}^{c} Σ_{k=1}^{n'} Xijk / (rcn') = grand mean

X̄i.. = Σ_{j=1}^{c} Σ_{k=1}^{n'} Xijk / (cn') = mean of the ith level of factor A (i = 1, 2, ..., r)

X̄.j. = Σ_{i=1}^{r} Σ_{k=1}^{n'} Xijk / (rn') = mean of the jth level of factor B (j = 1, 2, ..., c)

X̄ij. = Σ_{k=1}^{n'} Xijk / n' = mean of cell ij

r = number of levels of factor A
c = number of levels of factor B
n' = number of replications in each cell

Mean Square Calculations

MSA = mean square factor A = SSA / (r – 1)

MSB = mean square factor B = SSB / (c – 1)

MSAB = mean square interaction = SSAB / [(r – 1)(c – 1)]

MSE = mean square error = SSE / [rc(n' – 1)]

Two-Way ANOVA:
The F Test Statistic
F test for the factor A effect
H0: μ1.. = μ2.. = μ3.. = ...
H1: Not all μi.. are equal
F = MSA / MSE; reject H0 if F > FU

F test for the factor B effect
H0: μ.1. = μ.2. = μ.3. = ...
H1: Not all μ.j. are equal
F = MSB / MSE; reject H0 if F > FU

F test for the interaction effect
H0: the interaction of A and B is equal to zero
H1: the interaction of A and B is not zero
F = MSAB / MSE; reject H0 if F > FU

Two-Way ANOVA
Summary Table

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares                   | F Statistic
Factor A            | SSA            | r – 1              | MSA = SSA / (r – 1)            | MSA / MSE
Factor B            | SSB            | c – 1              | MSB = SSB / (c – 1)            | MSB / MSE
AB (Interaction)    | SSAB           | (r – 1)(c – 1)     | MSAB = SSAB / [(r – 1)(c – 1)] | MSAB / MSE
Error               | SSE            | rc(n' – 1)         | MSE = SSE / [rc(n' – 1)]       |
Total               | SST            | n – 1              |                                |

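A software equivalent of this table can be produced with statsmodels; the sketch below is illustrative only, with a made-up bottling data set (the carbonation and speed columns are hypothetical, not data from the slides):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical data: r = 2 carbonation levels, c = 2 line speeds, n' = 3 replications per cell
df = pd.DataFrame({
    "fill":        [10.1, 10.3, 10.2,  10.8, 10.9, 11.0,
                    10.4, 10.2, 10.5,  11.4, 11.6, 11.5],
    "carbonation": ["low"] * 6 + ["high"] * 6,
    "speed":       (["slow"] * 3 + ["fast"] * 3) * 2,
})

# Two-way ANOVA with interaction: fill ~ A + B + A:B
model = ols("fill ~ C(carbonation) * C(speed)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # rows: factor A, factor B, interaction, residual (error)
```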
Features of Two-Way ANOVA
F Test
- Degrees of freedom always add up
  - n – 1 = rc(n' – 1) + (r – 1) + (c – 1) + (r – 1)(c – 1)
  - Total = error + factor A + factor B + interaction
- The denominator of the F test (MSE) is always the same, but the numerator differs
- The sums of squares always add up
  - SST = SSE + SSA + SSB + SSAB
  - Total = error + factor A + factor B + interaction
Examples:
Interaction vs. No Interaction
- No interaction: the lines for Factor B levels 1, 2, and 3 are roughly parallel across the Factor A levels
- Interaction is present: the lines for Factor B levels 1, 2, and 3 are not parallel across the Factor A levels

[Figure: two panels of mean response vs. Factor A levels, one with parallel Factor B lines (no interaction) and one with non-parallel lines (interaction present)]

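Such interaction plots can be drawn with a statsmodels helper function; the sketch below uses invented factor levels and responses purely for illustration (not data from the slides):

```python
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.graphics.factorplots import interaction_plot

# Hypothetical cell data: response at three Factor A levels for two Factor B levels
df = pd.DataFrame({
    "factor_a": [1, 2, 3] * 2,
    "factor_b": ["B1"] * 3 + ["B2"] * 3,
    "response": [10, 12, 14,   11, 15, 22],   # non-parallel lines suggest an interaction
})

fig, ax = plt.subplots()
interaction_plot(x=df["factor_a"], trace=df["factor_b"], response=df["response"], ax=ax)
ax.set_xlabel("Factor A levels")
ax.set_ylabel("Mean response")
plt.show()
```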
Multiple Comparisons:
The Tukey Procedure
- Unless there is a significant interaction, you can determine the levels that are significantly different using the Tukey procedure
- Consider all absolute mean differences and compare them to the calculated critical range
- Example: absolute differences for factor A, assuming three levels:

|X̄1.. – X̄2..|, |X̄1.. – X̄3..|, |X̄2.. – X̄3..|
Multiple Comparisons:
The Tukey Procedure
- Critical range for factor A:

Critical Range = QU √(MSE / (cn'))
(where QU is from Table E.10 with r and rc(n' – 1) d.f.)

- Critical range for factor B:

Critical Range = QU √(MSE / (rn'))
(where QU is from Table E.10 with c and rc(n' – 1) d.f.)

Summary
- Described one-way analysis of variance
  - The logic of ANOVA
  - ANOVA assumptions
  - F test for difference in c means
  - The Tukey-Kramer procedure for multiple comparisons
- Considered the randomized block design
  - Treatment and block effects
  - Multiple comparisons: Tukey procedure
- Described two-way analysis of variance
  - Examined effects of multiple factors
  - Examined interaction between factors
