Part_Eight
Analysis of Variance
• Analysis of variance helps compare two or
more populations of quantitative data.
• Specifically, we are interested in the
relationships among the population means
(are they equal or not).
• The procedure works by analyzing the sample
variance.
Single-Factor (One-Way) Analysis of Variance: Independent Samples
• The analysis of variance is a procedure that
tests to determine whether differences exist
among two or more population means.
[Figure: Exploratory analysis. Plot of SALES by ADTYPE, N = 20 in each of the three groups; SALES axis from 300 to 800.]
Descriptives: SALES

ADTYPE  N   Mean      Std. Deviation  Std. Error  95% CI Lower  95% CI Upper  Minimum  Maximum
1.00    20  577.5500  103.8027        23.2110     528.9688      626.1312      353.00   793.00
2.00    20  653.0000   85.0771        19.0238     613.1827      692.8173      492.00   804.00
3.00    20  608.6500   93.1141        20.8210     565.0713      652.2287      443.00   776.00
Total   60  613.0667   97.8147        12.6278     587.7984      638.3349      353.00   804.00

(95% CI = 95% confidence interval for the mean.)
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least two means differ
• The test stems from the following rationale:
– If the null hypothesis is true, we would expect
all the sample means to be close to one another
(and, as a result, to the overall mean).
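SST (the total sum of squares) measures the total variation in the data. It is the numerator of the sample variance:
$$s^2 = \frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x})^2}{n-1} = \frac{SST}{n-1}$$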
• The variability among the sample means is
measured as the sum of squared distances
between each group mean and the overall mean.
This sum is called the
Sum of Squares Between Groups (SSB):
$$SSB = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{x})^2$$
[Figure: plot of sales by ADTYPE (1.00, 2.00, 3.00), N = 20 per group.]
If all the means are equal then this number will be small
If differences exist among the means then this number will be large
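As a rough illustration of these two cases, the sketch below computes SSB for two small made-up data sets. The numbers and the helper name `ssb` are hypothetical and are not taken from the sales study; the sketch assumes the usual definition in which each squared distance is weighted by its group size.

```python
from statistics import mean

def ssb(groups):
    """Sum of squares between groups: sum of n_j * (group mean - overall mean)^2."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    return sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Hypothetical data: three groups whose means are all equal to 5
equal_means = [[4, 5, 6], [3, 5, 7], [5, 5, 5]]
# Hypothetical data: group means of 5, 15 and 25
different_means = [[4, 5, 6], [13, 15, 17], [25, 25, 25]]

ssb_equal = ssb(equal_means)
ssb_different = ssb(different_means)
print(ssb_equal, ssb_different)
```

With equal group means every term is zero, so SSB is zero; once the means are pulled apart, SSB grows with the squared distances.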
• The variation that remains in the data when the group
means are allowed to differ is called the
Sum of Squares Within Groups (SSW):
$$SSW = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2$$
$$MSB = \frac{SSB}{k-1}, \qquad MSW = \frac{SSW}{n-k}$$
Source SS df MS F p-val
If the SSB is ‘large’ then the model with differing group means
is a significant improvement over the constant mean model, as
SSW must be ‘small’.
The p-value in this case is $\Pr(F_{k-1,\,n-k} > F)$, where $F = MSB/MSW$ is the observed statistic.
We can use SPSS to calculate this for us.
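The calculation can also be sketched by hand from the Descriptives table, since SSB needs only the group sizes and means and SSW needs only the group sizes and standard deviations. This is a minimal sketch, not the SPSS procedure. The closed-form tail probability used for the p-value is a shortcut that holds only when the numerator has 2 degrees of freedom, which happens to be the case with three groups.

```python
# Summary statistics copied from the Descriptives table: (n, mean, std. deviation)
groups = {
    1: (20, 577.55, 103.8027),
    2: (20, 653.00, 85.0771),
    3: (20, 608.65, 93.1141),
}

k = len(groups)
n = sum(n_j for n_j, _, _ in groups.values())
grand_mean = sum(n_j * m_j for n_j, m_j, _ in groups.values()) / n

# Between-groups and within-groups sums of squares
ssb = sum(n_j * (m_j - grand_mean) ** 2 for n_j, m_j, _ in groups.values())
ssw = sum((n_j - 1) * s_j ** 2 for n_j, _, s_j in groups.values())

msb = ssb / (k - 1)
msw = ssw / (n - k)
f_stat = msb / msw

# Upper tail of an F distribution with 2 numerator degrees of freedom.
# This closed form is valid ONLY because k - 1 == 2 here.
assert k - 1 == 2
p_value = (1 + 2 * f_stat / (n - k)) ** (-(n - k) / 2)

# Cross-check: SSB + SSW should match SST = (n - 1) * s^2 from the Total row
sst_from_total_row = (n - 1) * 97.8147 ** 2

print(f"SSB={ssb:.1f} SSW={ssw:.1f} F={f_stat:.3f} p={p_value:.4f}")
```

For these figures F comes out at about 3.23 with a p-value of roughly 0.047, so the null hypothesis of equal means is rejected at the 5% level, though not by much. For other numbers of groups a general F-distribution routine (such as the one in SPSS) is needed.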
Assumptions for ANOVA
The statistic will have an F-distribution only if the data in each
group are normally distributed AND the variation in each
group is roughly the same.
When using this test, if the data are highly non-normal OR have
vastly different variation in each group, then the test is not valid.
$H_0: \mu_1 = \mu_2 = \dots = \mu_k$
$H_1$: At least two means differ
[Figure: histograms of sales for each advertising strategy (surviving panel titles: Quality, Price); bins from 450 to 850 plus "More", frequencies from 0 to 10.]
All the distributions seem to be close to normal, or at least possibly normal.
A test for equal variances
The hypotheses to be tested here are
$H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$
$H_A$: At least one group variance is different
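The slides do not say which test is used for these hypotheses. One common choice (an assumption here, not something the source states) is Levene's test, which runs a one-way ANOVA on the absolute deviation of each observation from its own group mean. The sketch below computes only the test statistic; the data and the function name `levene_statistic` are hypothetical.

```python
from statistics import mean

def levene_statistic(groups):
    """Levene's W: the one-way ANOVA F statistic computed on |x_ij - group mean|."""
    # Replace each observation by its absolute deviation from its own group mean
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(deviations)
    n = sum(len(d) for d in deviations)
    grand = mean([z for d in deviations for z in d])
    between = sum(len(d) * (mean(d) - grand) ** 2 for d in deviations)
    within = sum((z - mean(d)) ** 2 for d in deviations for z in d)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical data: same spread in every group, only the level shifts
same_spread = [[1, 2, 3, 4, 5], [11, 12, 13, 14, 15], [21, 22, 23, 24, 25]]
# Hypothetical data: the third group is far more variable than the others
unequal_spread = [[1, 2, 3, 4, 5], [11, 12, 13, 14, 15], [3, 13, 23, 33, 43]]

w_same = levene_statistic(same_spread)
w_unequal = levene_statistic(unequal_spread)
print(w_same, w_unequal)
```

A statistic near zero is consistent with equal variances; a large value, judged against an F distribution with k - 1 and n - k degrees of freedom, is evidence against $H_0$.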
But we want to know which is the ‘best’ or ‘worst’ advertising strategy.
[Figure: plot of sales by ADTYPE, N = 20 per group.]
BUT: We need to account for the number of CIs we calculate. Why?
A 95% CI contains the true value only 95% of the time; it is wrong 5%
of the time. So for every 20 intervals we calculate, we would expect to
make the WRONG decision 1 time on average.
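The arithmetic behind this warning can be sketched as follows. The expected number of misses needs no extra assumptions; the probability of at least one miss is computed under the added assumption that the intervals are independent, which the slides do not claim. Both function names are illustrative.

```python
def expected_misses(num_intervals, coverage=0.95):
    """Expected number of intervals that fail to contain the true value."""
    return num_intervals * (1 - coverage)

def prob_at_least_one_miss(num_intervals, coverage=0.95):
    """Chance that one or more intervals miss, ASSUMING the intervals are independent."""
    return 1 - coverage ** num_intervals

misses_20 = expected_misses(20)
any_miss_20 = prob_at_least_one_miss(20)
print(round(misses_20, 3), round(any_miss_20, 3))
```

With 20 intervals the expected number of misses is 1, and under independence the chance of at least one miss is about 64%.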
Comparison of multiple means
Question : Before looking at the data, the manager decides
to test whether emphasizing convenience is different to
emphasizing quality in terms of sales.
1. A planned comparison
2. An unplanned comparison
3. Data snoop
Planned comparisons
$$\bar{Y}_{48} - \bar{Y}_{36} \pm 2\sqrt{s^2\left(\frac{1}{n_{48}} + \frac{1}{n_{36}}\right)}$$
This interval has a 95% chance of containing the true mean
difference $\mu_{48} - \mu_{36}$.
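To make the formula concrete, the sketch below applies it to ADTYPE groups 1 and 2 from the Descriptives table rather than to the groups labelled 48 and 36 on the slide, whose data are not shown. It assumes that $s^2$ is the pooled within-group variance (MSW) and keeps the slide's multiplier of 2; the function name `planned_interval` is illustrative.

```python
from math import sqrt

# (n, mean, std. deviation) for the three ADTYPE groups in the Descriptives table
groups = [(20, 577.55, 103.8027), (20, 653.00, 85.0771), (20, 608.65, 93.1141)]

# Pooled variance s^2 = SSW / (n - k), using every group, not just the two compared
n_total = sum(n for n, _, _ in groups)
k = len(groups)
pooled_var = sum((n - 1) * s ** 2 for n, _, s in groups) / (n_total - k)

def planned_interval(group_a, group_b, multiplier=2.0):
    """Approximate 95% interval for mean(a) - mean(b): difference +/- 2 standard errors."""
    n_a, mean_a, _ = group_a
    n_b, mean_b, _ = group_b
    diff = mean_a - mean_b
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return diff - multiplier * std_err, diff + multiplier * std_err, std_err

lower, upper, std_err = planned_interval(groups[0], groups[1])
print(f"SE={std_err:.4f}, interval=({lower:.2f}, {upper:.2f})")
```

The standard error agrees with the 29.8236 reported in the Std. Error column of the SPSS multiple-comparisons output, and the interval runs from about -135.1 to about -15.8. The SPSS interval for the same pair is wider, presumably because it is adjusted for multiple comparisons.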
Unplanned comparisons (data mining)
The researcher for the 50 class study picks the highest and lowest
group means and finds a 95% confidence interval for the difference
between these two means and reports only this interval.
We must account for the fact that there are 50 × 49 = 2450 possible
ordered comparisons (1225 distinct pairs) and he has picked the maximum
and minimum group means. Note that again we would expect 5% (that’s
122.5 of the 2450) of 95% intervals not to contain 0, even if ALL the 50
class means were exactly equal to each other.
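The counts quoted here can be reproduced with a few lines. This is only a counting sketch: it enumerates the ordered and unordered pairs among 50 groups and applies the 5% miss rate to the ordered count used on the slide.

```python
from itertools import combinations, permutations

num_groups = 50
labels = range(num_groups)

# (I, J) and (J, I) counted separately, as in SPSS comparison tables
ordered_pairs = sum(1 for _ in permutations(labels, 2))
# Each pair of groups counted once
distinct_pairs = sum(1 for _ in combinations(labels, 2))

expected_false_alarms = 0.05 * ordered_pairs
print(ordered_pairs, distinct_pairs, round(expected_false_alarms, 1))
```

Counting each pair once halves both figures (1225 pairs, about 61 expected false alarms), but the point is unchanged: picking the most extreme pair after seeing the data is not a single 5% test.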
(I) ADTYPE  (J) ADTYPE  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower Bound  95% CI Upper Bound
1.00        2.00        -75.4500*              29.8236     .037  -147.2181            -3.6819
1.00        3.00        -31.1000               29.8236     .553  -102.8681            40.6681
2.00        1.00         75.4500*              29.8236     .037     3.6819           147.2181
2.00        3.00         44.3500               29.8236     .305   -27.4181           116.1181
3.00        1.00         31.1000               29.8236     .553   -40.6681           102.8681
3.00        2.00        -44.3500               29.8236     .305  -116.1181            27.4181
*. The mean difference is significant at the .05 level.
What if we have more than one factor OR not independent groups?
[Figure: diagrams of a response under Treatments 1 to 3, and of a design crossing Factor A (Levels 1 to 3) with Factor B (Levels 1 and 2).]
Block all the observations with some commonality across treatments
[Figure: blocked layout showing Treatments 1 to 4.]
$H_0: \mu_1 = \mu_2 = \dots = \mu_7$
$H_1$: At least two means differ
ANOVA

Source of Variation  SS        df    MS        F         P-value   F crit
Blocks (Rows)        209834.6  199   1054.445  2.627722  1.04E-23  1.187531
Groups (Columns)     28673.73  6     4778.955  11.90936  5.14E-13  2.106162
Error                479125.1  1194  401.2773
Total                717633.5  1399
Degrees of freedom: $b-1$ for blocks and $k-1$ for groups.
$$F_{Groups} = \frac{MSG}{MSE}$$
Conclusion: At the 5% significance level there is sufficient evidence
to reject the null hypothesis and infer that mean “radio time”
differs for at least one day of the week.
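The entries of the blocked ANOVA table can be checked with a short calculation: each mean square is a sum of squares divided by its degrees of freedom, and each F ratio divides a mean square by the error mean square. The sketch below uses only the SS and df columns from the table; the dictionary names are illustrative.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table
ss = {"blocks": 209834.6, "groups": 28673.73, "error": 479125.1}
df = {"blocks": 199, "groups": 6, "error": 1194}

# Mean square = sum of squares / degrees of freedom
ms = {source: ss[source] / df[source] for source in ss}

f_groups = ms["groups"] / ms["error"]  # F_Groups = MSG / MSE
f_blocks = ms["blocks"] / ms["error"]

# The pieces should add up to the Total row (717633.5 on 1399 df), up to rounding
ss_total = sum(ss.values())
df_total = sum(df.values())
print(round(f_groups, 3), round(f_blocks, 3), round(ss_total, 1), df_total)
```

The group F ratio of about 11.9 is far above the tabulated critical value of 2.106, which is what drives the conclusion above.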
Two Factor Analysis of Variance
– Emphasis on convenience
– Emphasis on quality
– Emphasis on price
[Figure: plot of SALES1 by ADTYPE, split by MEDIUM (Newspaper, TV); N = 10 per cell.]
The interaction measures whether the effect of advertising strategy (ADTYPE)
is the same for each advertising medium.
[Figure: Estimated Marginal Means of SALES1, one line per MEDIUM (Newspaper, TV).]
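One way to see what an interaction means is to compare the strategy effects within each medium. The sketch below does this for a made-up table of cell means: the strategy and medium names follow the slides, but every number is hypothetical. It describes the pattern in the cell means only and is not the F test for interaction.

```python
# Hypothetical cell means of sales: rows are ad strategies, columns are media
cell_means = {
    "convenience": {"newspaper": 500.0, "tv": 550.0},
    "quality":     {"newspaper": 600.0, "tv": 650.0},
    "price":       {"newspaper": 550.0, "tv": 700.0},
}

def strategy_effects(medium, baseline="convenience"):
    """Effect of each strategy within one medium, measured against the baseline strategy."""
    base = cell_means[baseline][medium]
    return {s: row[medium] - base for s, row in cell_means.items()}

effects_newspaper = strategy_effects("newspaper")
effects_tv = strategy_effects("tv")

# Difference of differences: zero means the strategy effect is the same in both media
interaction = {s: effects_tv[s] - effects_newspaper[s] for s in cell_means}
print(interaction)
```

In this made-up table the quality effect is the same in both media, while the price effect is 100 units larger on TV. On a plot of cell means, a nonzero difference of this kind shows up as lines that are not parallel.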
(I) ADTYPE  (J) ADTYPE  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower Bound  95% CI Upper Bound
1.00        2.00        -99.3500*              30.4636     .005  -172.7671           -25.9329
1.00        3.00        -46.5000               30.4636     .287  -119.9171            26.9171
2.00        1.00         99.3500*              30.4636     .005    25.9329           172.7671
2.00        3.00         52.8500               30.4636     .202   -20.5671           126.2671
3.00        1.00         46.5000               30.4636     .287   -26.9171           119.9171
3.00        2.00        -52.8500               30.4636     .202  -126.2671            20.5671
Based on observed means.
*. The mean difference is significant at the .05 level.
The observations in each group appear to be fairly symmetric and close to normal.
[Figure: plots of SALES1 by MEDIUM (Newspaper, TV; N = 30 each) and by ADTYPE (N = 20 each).]