
Analysis of Variance

Introduction
Analysis of Variance
• The Analysis of Variance is abbreviated as ANOVA
• Used for hypothesis testing in
  • Simple Regression
  • Multiple Regression
  • Comparison of Means
Sources
• There is variation whenever the data values are not all identical
• This variation can come from different sources, such as the model or the factor
• There is always some left-over variation that can't be explained by any of the other sources. This source is called the error
Variation
• Variation is the sum of the squared deviations of the values from the mean of those values
• As long as the values are not all identical, there will be variation
• Abbreviated as SS, for Sum of Squares
Degrees of Freedom
• The degrees of freedom are the number of values that are free to vary once certain parameters have been established
• Usually, this is one less than the sample size, but in general, it's the number of values minus the number of parameters being estimated
• Abbreviated as df
Variance
• The sample variance is the average squared deviation from the mean, except that the total is divided by the degrees of freedom rather than by the sample size
• Found by dividing the variation by the degrees of freedom
• Variance = Variation / df
• Abbreviated as MS, for Mean Square
• MS = SS / df
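A minimal Python sketch of the three definitions above, using a small made-up sample (the data and the function names are illustrative assumptions, not from the slides). It computes SS, df, and MS for one sample and checks that MS agrees with the standard library's sample variance, which also divides by n - 1.

```python
import math
import statistics

def sum_of_squares(values):
    """Variation (SS): sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def degrees_of_freedom(values):
    """df for a single sample: n values minus 1 estimated parameter (the mean)."""
    return len(values) - 1

def mean_square(values):
    """Variance (MS) = SS / df."""
    return sum_of_squares(values) / degrees_of_freedom(values)

data = [4.0, 7.0, 6.0, 3.0, 5.0]  # hypothetical sample, for illustration only
ss = sum_of_squares(data)
df = degrees_of_freedom(data)
ms = mean_square(data)
print(f"SS = {ss}, df = {df}, MS = {ms}")

# MS matches the sample variance (n - 1 denominator) from the standard library
print(math.isclose(ms, statistics.variance(data)))
```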
F
• F is the F test statistic
• There will be an F test statistic for each source except for the error and total
• F is the ratio of two sample variances
• The MS column contains variances
• The F test statistic for each source is the MS for that row divided by the MS of the error row
• F requires a pair of degrees of freedom, one for the numerator and one for the denominator
• The numerator df is the df for the source
• The denominator df is the df for the error row
• The ANOVA F test is always right-tailed: only large values of F count as evidence against the null hypothesis
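The F rules above can be sketched in a few lines of Python. This is an illustrative helper, not from the slides: the name `f_statistic` and the SS and df values are hypothetical. It forms each MS as SS / df, divides the source MS by the error MS, and returns the numerator and denominator df alongside F.

```python
def f_statistic(ss_source, df_source, ss_error, df_error):
    """Return the F statistic and its (numerator df, denominator df) pair.

    Each MS is SS / df; F is MS(source) / MS(error).
    """
    ms_source = ss_source / df_source
    ms_error = ss_error / df_error
    return ms_source / ms_error, (df_source, df_error)

# Hypothetical SS and df values, chosen only to illustrate the arithmetic
f, (df_num, df_den) = f_statistic(ss_source=30.0, df_source=2, ss_error=60.0, df_error=12)
print(f"F = {f:.2f} with {df_num} and {df_den} degrees of freedom")
```

Because the test is right-tailed, the p-value is the area under the F distribution to the right of the observed F. If SciPy is available, `scipy.stats.f.sf(f, df_num, df_den)` returns that upper-tail area.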
The ANOVA Table
• The ANOVA table is composed of rows; each row represents one source of variation
• For each source of variation:
  • The variation is in the SS column
  • The degrees of freedom are in the df column
  • The variance is in the MS column
  • The MS value is found by dividing the SS by the df
ANOVA Table
• The complete ANOVA table can be generated by most statistical packages and spreadsheets
• We'll concentrate on understanding how the table works rather than on the formulas for the variations
The ANOVA Table
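Source       SS            df    MS           F
             (variation)         (variance)
Explained*
Error
Total

* The explained variation has different names depending on the particular type of ANOVA problem (for example, Regression or Model in regression problems, and Factor, Treatment, or Between Groups when comparing means).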
Example 1

Source      SS      df   MS      F
Explained   18.9    3
Error       72.0    16
Total

The Sum of Squares and Degrees of Freedom are given. Complete the table.
Example 1 – Find Totals

Source      SS      df   MS      F
Explained   18.9    3
Error       72.0    16
Total       90.9    19

Add the SS and df columns to get the totals.


Example 1 – Find MS

Source      SS      df   MS                  F
Explained   18.9    3    18.9 ÷ 3 = 6.30
Error       72.0    16   72.0 ÷ 16 = 4.50
Total       90.9    19   90.9 ÷ 19 = 4.78

Divide SS by df to get MS.


Example 1 – Find F

Source      SS      df   MS      F
Explained   18.9    3    6.30    1.40
Error       72.0    16   4.50
Total       90.9    19   4.78

F = MS(Explained) / MS(Error) = 6.30 / 4.50 = 1.40
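The three steps of Example 1 (add for the totals, divide for MS, divide again for F) can be checked with a short Python sketch. The function name `complete_table` is a hypothetical helper; the inputs are the SS and df given in the example, and the printed table should reproduce the completed one above.

```python
def complete_table(ss_explained, df_explained, ss_error, df_error):
    """Fill in an ANOVA table from the Explained and Error SS and df."""
    ss_total = ss_explained + ss_error      # totals: add the SS column
    df_total = df_explained + df_error      # totals: add the df column
    ms_explained = ss_explained / df_explained
    ms_error = ss_error / df_error
    ms_total = ss_total / df_total          # MS = SS / df for every row
    f = ms_explained / ms_error             # F = MS(Explained) / MS(Error)
    return {
        "Explained": (ss_explained, df_explained, ms_explained, f),
        "Error": (ss_error, df_error, ms_error, None),
        "Total": (ss_total, df_total, ms_total, None),
    }

table = complete_table(18.9, 3, 72.0, 16)
print(f"{'Source':<10}{'SS':>8}{'df':>5}{'MS':>8}{'F':>7}")
for source, (ss, df, ms, f) in table.items():
    f_text = f"{f:.2f}" if f is not None else ""
    print(f"{source:<10}{ss:>8.1f}{df:>5}{ms:>8.2f}{f_text:>7}")
```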


Notes about the ANOVA
• The MS(Total) isn't actually part of the ANOVA table, but it is the sample variance of the response variable, so it's useful to find
• The total df is one less than the sample size
• To finish the hypothesis test, you would need either a critical F value or the p-value
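To see these notes on a concrete case, here is a minimal one-way ANOVA (comparison of means) sketch in Python on made-up data. It assumes the Explained row is the between-groups variation, which is the usual meaning of that row when comparing means; the function name and the data are illustrative only. It checks that the SS and df add up, that the total df is n - 1, and that MS(Total) equals the sample variance of all the responses.

```python
import statistics

def one_way_anova(groups):
    """Return (SS, df) for the Explained (between-groups), Error (within-groups),
    and Total rows of a one-way ANOVA."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between groups: each group mean's squared deviation, weighted by group size
    ss_explained = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups: deviations of each value from its own group mean
    ss_error = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    # Total: deviations of each value from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    return {
        "Explained": (ss_explained, k - 1),
        "Error": (ss_error, n - k),
        "Total": (ss_total, n - 1),
    }

groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]  # hypothetical data, three groups
rows = one_way_anova(groups)
for source, (ss, df) in rows.items():
    print(f"{source:<10} SS = {ss:6.2f}  df = {df}  MS = {ss / df:6.2f}")

ss_t, df_t = rows["Total"]
all_values = [x for g in groups for x in g]
print("sample size:", df_t + 1)
print("MS(Total):", ss_t / df_t, " sample variance:", statistics.variance(all_values))
```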
Example 2

Source      SS      df   MS      F
Explained   106.6        21.32   2.60
Error               26
Total

Complete the table


Example 2 – Step 1

Source      SS      df   MS      F
Explained   106.6   5    21.32   2.60
Error               26   8.20
Total

SS / df = MS, so 106.6 / df = 21.32. Solving for df gives df = 5.


F = MS(Explained) / MS(Error), so 2.60 = 21.32 / MS(Error). Solving gives MS(Error) = 21.32 / 2.60 = 8.20.
Example 2 – Step 2

Source      SS      df   MS      F
Explained   106.6   5    21.32   2.60
Error       213.2   26   8.20
Total               31

SS / df = MS, so SS / 26 = 8.20. Solving for SS gives SS = 213.2.


The total df is the sum of the other df, so 5 + 26 = 31.
Example 2 – Step 3

Source      SS      df   MS      F
Explained   106.6   5    21.32   2.60
Error       213.2   26   8.20
Total       319.8   31

Find the total SS by adding the SS column: 106.6 + 213.2 = 319.8


Example 2 – Step 4

Source      SS      df   MS      F
Explained   106.6   5    21.32   2.60
Error       213.2   26   8.20
Total       319.8   31   10.32

Find the MS(Total) by dividing SS by df: 319.8 / 31 ≈ 10.32


Example 2 – Notes
• Since there are 31 total df, the sample size was 32
• Since the sample variance is 10.32 and the standard deviation is the square root of the variance, the sample standard deviation is √10.32 ≈ 3.21
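Working backwards through Example 2 is just the relationships MS = SS / df and F = MS(Explained) / MS(Error) rearranged. The Python sketch below repeats Steps 1 through 4 and the notes; the variable names are illustrative. It rounds df to a whole number because SS / MS can come out a hair off an integer in floating point.

```python
import math

# Entries given in the Example 2 table
ss_explained, ms_explained, f = 106.6, 21.32, 2.60
df_error = 26

# Step 1: df = SS / MS (rounded, since df is a whole number); MS(Error) = MS / F
df_explained = round(ss_explained / ms_explained)
ms_error = ms_explained / f

# Step 2: SS(Error) = MS(Error) * df(Error); total df is the sum of the df
ss_error = ms_error * df_error
df_total = df_explained + df_error

# Steps 3 and 4: total SS is the sum of the SS; MS(Total) = SS / df
ss_total = ss_explained + ss_error
ms_total = ss_total / df_total

# Notes: sample size and sample standard deviation
n = df_total + 1
sd = math.sqrt(ms_total)

print(f"df(Explained) = {df_explained}, MS(Error) = {ms_error:.2f}, SS(Error) = {ss_error:.1f}")
print(f"Total: SS = {ss_total:.1f}, df = {df_total}, MS = {ms_total:.2f}")
print(f"n = {n}, s = {sd:.2f}")
```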
Example 3

Source      SS      df   MS      F
Explained   56.7
Error               14   13.50
Total

The sample size is n = 20. Work this one out on your own!
Example 3 – Solution

Source      SS      df   MS      F
Explained   56.7    5    11.34   0.84
Error       189.0   14   13.50
Total       245.7   19   12.93

How did you do?
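If you want to check your work on Example 3, this Python sketch fills in the table from the sample size and the given entries, using the same rules as the earlier examples (the variable names are illustrative). Note that F comes out below 1 here, meaning the explained MS is smaller than the error MS.

```python
# Entries given in the Example 3 table, plus the sample size
n = 20
ss_explained = 56.7
df_error, ms_error = 14, 13.50

df_total = n - 1                             # total df is one less than n
df_explained = df_total - df_error           # the df must add up to the total
ms_explained = ss_explained / df_explained   # MS = SS / df
ss_error = ms_error * df_error               # SS = MS * df
ss_total = ss_explained + ss_error           # the SS must add up to the total
ms_total = ss_total / df_total               # sample variance of the response
f = ms_explained / ms_error                  # F = MS(Explained) / MS(Error)

for name, value in [("df(Explained)", df_explained), ("MS(Explained)", ms_explained),
                    ("SS(Error)", ss_error), ("SS(Total)", ss_total),
                    ("MS(Total)", ms_total), ("F", f)]:
    print(f"{name}: {round(value, 2)}")
```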
