
Statistics and Data Analysis

Part Eight
Analysis of Variance
• Analysis of variance helps compare two or
more populations of quantitative data.
• Specifically, we are interested in the
relationships among the population means
(are they equal or not).
• The procedure works by analyzing the sample
variance.
Single-Factor (One-Way) Analysis of Variance: Independent Samples
• Analysis of variance is a procedure that tests whether differences exist among two or more population means.
• To do this, the technique analyzes the variance of the data.
• Example
– An apple juice manufacturer is planning to develop
a new product -- a liquid concentrate.
– The marketing manager has to decide how to
market the new product.
– Three strategies are considered

• Emphasize convenience of using the product.


• Emphasize the quality of the product.
• Emphasize the product’s low price.
• Example - continued
– An experiment was conducted as follows:
• In three cities an advertising campaign was launched.
• In each city only one of the three characteristics
(convenience, quality, and price) was emphasized.
• The weekly sales were recorded for twenty weeks
following the beginning of the campaigns.
• We assume the samples are independent of each other.
• Example - continued
– Data (weekly sales recorded for 20 weeks in each city):

Convenience   Quality   Price
529           804       672
658           630       531
793           774       443
514           717       596
663           679       602
719           604       502
711           620       659
606           697       689
461           706       675
529           615       512
498           492       691
663           719       733
604           787       698
495           699       776
485           572       561
557           523       572
353           584       469
557           634       581
542           580       679
614           624       532

• Solution
– The data are quantitative.
– Our problem objective is to compare weekly sales in the three cities.
– We hypothesize on the relationships among the three mean weekly sales.
Exploratory analysis:

[Side-by-side box plots of SALES by ADTYPE (groups 1.00, 2.00, 3.00; N = 20 each), vertical axis from 300 to 900]

Descriptives: SALES

Group   N    Mean       Std. Deviation   Std. Error   95% CI Lower   95% CI Upper   Minimum   Maximum
1.00    20   577.5500   103.8027         23.2110      528.9688       626.1312       353.00    793.00
2.00    20   653.0000    85.0771         19.0238      613.1827       692.8173       492.00    804.00
3.00    20   608.6500    93.1141         20.8210      565.0713       652.2287       443.00    776.00
Total   60   613.0667    97.8147         12.6278      587.7984       638.3349       353.00    804.00
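As a sanity check, the Descriptives output can be reproduced in plain Python. This is a sketch: the data are transcribed from the slides, and the t critical value 2.093 for 19 degrees of freedom is hard-coded from a t table rather than computed.

```python
# Reproducing the SPSS "Descriptives" table (mean, sd, standard error,
# 95% CI) for each group, using only the standard library.
from math import sqrt

convenience = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
               498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
quality     = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
               492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
price       = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
               691, 733, 698, 776, 561, 572, 469, 581, 679, 532]

def describe(x, t_crit=2.093):      # t_{0.025, 19} from a t table
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)   # sample variance
    se = sqrt(var / n)                                 # standard error of the mean
    return mean, sqrt(var), se, mean - t_crit * se, mean + t_crit * se

for name, x in [("Convenience", convenience), ("Quality", quality), ("Price", price)]:
    m, sd, se, ci_lo, ci_hi = describe(x)
    print(f"{name:12s} mean={m:8.2f} sd={sd:7.2f} se={se:6.2f} CI=({ci_lo:.1f}, {ci_hi:.1f})")
```

The printed rows match the SPSS table above (e.g. Convenience: mean 577.55, sd 103.80, CI roughly 528.97 to 626.13).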
H0: μ1 = μ2 = μ3
H1: At least two means differ
• The test stems from the following rationale:
– If the null hypothesis is true, we would expect all the sample means to be close to one another (and, as a result, to the overall mean).
– If the alternative hypothesis is true, at least two of the sample means would differ from one another.
H0: μ1 = μ2 = μ3
H1: At least two means differ

Test statistic F = 3.23
p-value = 0.047 < 0.05

[Box plots of SALES by ADTYPE, N = 20 per group]

There is sufficient evidence to reject H0 in favor of H1, and argue that at least one of the mean sales is different from the others.

SPSS runs the ANOVA test for us:

ANOVA: SALES
Source           Sum of Squares   df   Mean Square     F      Sig. (p-value)
Between Groups        57512.233    2     28756.117   3.233    .047
Within Groups        506983.5     57      8894.447
Total                564495.7     59
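The numbers in this table can be rebuilt from the group summary statistics alone; a minimal sketch, computing the between-groups sum of squares from the group means and the within-groups sum of squares from the group standard deviations:

```python
# Rebuilding the one-way ANOVA table from the summary statistics
# (means and standard deviations as reported in the Descriptives output).
n = 20                                  # observations per group
means = [577.55, 653.00, 608.65]        # Convenience, Quality, Price
sds = [103.8027, 85.0771, 93.1141]
k = len(means)
N = k * n                               # 60 observations in total

grand = sum(means) / k                  # groups are equal-sized, so a plain average
SSB = sum(n * (m - grand) ** 2 for m in means)    # between-groups sum of squares
SSW = sum((n - 1) * s ** 2 for s in sds)          # within-groups sum of squares
MSB = SSB / (k - 1)
MSW = SSW / (N - k)
F = MSB / MSW
print(f"SSB = {SSB:.1f}, SSW = {SSW:.1f}, SST = {SSB + SSW:.1f}, F = {F:.3f}")
```

The output reproduces the SPSS values: SSB ≈ 57512.2, SSW ≈ 506983.5, and F ≈ 3.233.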
• The test statistic – where does it come from?

• The test stems from the following rationale:
– If the null hypothesis is true, we would expect all the sample means to be close to one another (and, as a result, to the grand mean).
– If all the means are equal, we would estimate the common variance by

$s^2 = \frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x})^2}{n-1} = \frac{SST}{n-1}$

The numerator, SST, measures the total variation in the data.
• The variability among the sample means is measured as the sum of squared distances between each group mean and the overall mean. This sum is called the Sum of Squares Between Groups (SSB):

$SSB = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{x})^2$

In our example the groups are represented by the different advertising strategies.

If all the means are equal, then this number will be small.
If differences exist among the means, then this number will be large.
• The variation in the data when the group means are allowed to differ is called the Sum of Squares Within Groups (SSW):

$SSW = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)^2$
It is possible to show that:

SST = SSB + SSW

ANOVA compares SSB to SSW. If SSB is comparatively large, then differences exist between the group means.

SSB is the sum of squares between each group mean and the constant (overall) mean.
SST is the total variation, between the individual points and the constant mean.
SSW is the variation remaining when differing group means are allowed.
If SSW is about the same as SST, then the means are close to equal, so SSB is small.

[Illustration: dot plots of four groups with their group means. The test statistic compares SSB and SSW: if SSB is comparatively large, then differences exist in the group means. SST = SSB + SSW.]
Actually SSB and SSW are NOT on the same scale; they are scaled first, then compared:

$MSB = \frac{SSB}{k-1} \qquad MSW = \frac{SSW}{n-k}$

The test statistic for ANOVA compares MSB to MSW:

$F = \frac{MSB}{MSW}$

If MSB is large, then the F statistic is large and differences exist in the group means.

This statistic has an F distribution with (k − 1) and (n − k) degrees of freedom. Note that SST = SSB + SSW and (n − 1) = (k − 1) + (n − k).
We can summarise this information in an ANOVA table:

Source           SS    df      MS    F         p-val
Between groups   SSB   k − 1   MSB   MSB/MSW
Within groups    SSW   n − k   MSW
Total            SST   n − 1

If SSB is 'large', then the model with differing group means is a significant improvement over the constant-mean model, as SSW must then be 'small'. The p-value in this case is $\Pr(F_{k-1,\,n-k} > F)$; we can use SPSS to calculate this for us.
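The p-value can also be checked without SPSS. This sketch leans on a convenient special case, worth flagging as an assumption: it applies only because k = 3 gives a numerator df of 2, for which the F survival function has the closed form Pr(F₂,ₘ > f) = (1 + 2f/m)^(−m/2).

```python
# p-value check for the apple-juice ANOVA: with k = 3 groups the numerator
# df is 2, and Pr(F > f) for an F(2, m) distribution is (1 + 2f/m)^(-m/2).
def f_tail_df2(f, m):
    """Upper-tail probability of an F(2, m) distribution."""
    return (1 + 2 * f / m) ** (-m / 2)

p = f_tail_df2(3.233, 57)   # F = 3.233 with (2, 57) degrees of freedom
print(round(p, 3))
```

This reproduces the .047 reported by SPSS. For other numerator df there is no such elementary formula, which is why a table or software is normally used.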
Assumptions for ANOVA

The statistic will have an F distribution only if the data in each group are normally distributed AND the variation in each group is roughly the same. If the data are highly non-normal OR the groups have vastly different variation, then the test is not valid.

Usually it is enough if the histograms are roughly mound-shaped: roughly symmetric, with the highest density in the middle and the lowest density in the tails.
And finally the hypothesis test:

H0: μ1 = μ2 = … = μk
H1: At least two means differ

Test statistic: $F = \frac{MSB}{MSW}$ (equivalently written MST/MSE, where MST is the mean square for treatments and MSE the mean square for error)

Rejection region: F > F_{k−1, n−k}

Specifically, in our advertisement problem:

H0: μ1 = μ2 = μ3
H1: At least two means differ

$F = \frac{MST}{MSE} = \frac{28{,}756.12}{8{,}894.45} = 3.23$

As p = 0.047 < 0.05, there is sufficient evidence to reject H0 in favor of H1, and argue that at least one of the mean sales is different from the others.
Checking the required conditions

• The F test requires that the populations are normally distributed with equal variances.
• From SPSS we compare the sample variances: 10774, 7238, 8670. The variances seem roughly equal (we can test for this too, if we like).
• To check normality, observe the histogram of each sample.
[Histograms of weekly sales for each sample (Convenience, Quality, Price), with bins from 450 to 850 and above]

All the distributions seem to be close to normal, or at least possibly normal.
A test for equal variances

The hypotheses to be tested here are

$H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$
$H_A:$ at least one group variance is different

Levene devised a test for this, assuming normally distributed data in each group; SPSS does the test for us.

Test of Homogeneity of Variances: SALES

Levene Statistic   df1   df2   Sig.
.344                 2    57   .710

The large p-value (0.71) means that the sample variances are sufficiently 'close' to each other. We cannot conclude that they are different.
So we have found differences among the groups, but where do they lie?

[Box plots of SALES by ADTYPE, N = 20 per group]

It is hard to see where the differences lie here, but we want to know which is the 'best' or 'worst' advertising strategy.

Solution: we can calculate all three 95% CIs for the differences in mean sales.

BUT: we need to account for the number of CIs we calculate. Why? A 95% CI contains the true value only 95% of the time; it is wrong 5% of the time. So for every 20 intervals we calculate, we would expect to make the WRONG decision once, on average.
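A quick calculation makes this point concrete. It is a sketch: the intervals are treated as independent, which is only an approximation when they share the same data.

```python
# Why multiple CIs need adjusting: if each interval separately has a 5%
# error rate, the chance that AT LEAST ONE of them is wrong grows quickly.
alpha = 0.05
for n_intervals in (1, 3, 20):
    p_any_miss = 1 - (1 - alpha) ** n_intervals
    print(f"{n_intervals:2d} intervals: P(at least one wrong) = {p_any_miss:.3f}, "
          f"expected number wrong = {n_intervals * alpha:.2f}")
```

With 20 independent 95% intervals, the chance that at least one misses is about 64%, and the expected number of wrong intervals is exactly 20 × 0.05 = 1.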
Comparison of multiple means

Question: before looking at the data, the manager decides to test whether emphasizing convenience differs from emphasizing quality in terms of sales.

The point estimate of the difference in mean sales $\mu_2 - \mu_1$ is $\bar{x}_2 - \bar{x}_1$.

As this is just a difference between two means, we can put a 95% confidence interval around it as usual:

$\bar{x}_2 - \bar{x}_1 \pm 2\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

This confidence interval relies on two things:
1. The two groups of sales are independent of each other.
2. The comparison was decided on before we looked at the data. WHY?

Consider the case where we have 50 groups and wish to compare among the means. For example, we might compare 50 school classes in terms of HSC marks. We can do three things:

1. A planned comparison
2. An unplanned comparison
3. Data snooping
Planned comparisons

A planned comparison is when we wish to compare two means because they answer a research question posed before the data were inspected.

For example, in the 50-group study, Fred Smith may wish to compare class 36 (his class) to his friend Bob Jones's class (class 48).

Here we do the usual confidence interval. For α = 0.05 we have

$\bar{Y}_{48} - \bar{Y}_{36} \pm 2\sqrt{s^2\left(\frac{1}{n_{48}} + \frac{1}{n_{36}}\right)}$

This interval has a 95% chance of containing the true mean difference $\mu_{48} - \mu_{36}$.
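A worked sketch of this kind of interval for the apple-juice example (quality minus convenience), using the slides' rough multiplier of 2. Note an assumption of this sketch: the within-groups mean square 8894.447 is used as the pooled variance s², which the slide does not state explicitly.

```python
# Planned-comparison 95% CI for quality (group 2) minus convenience
# (group 1): difference of means +/- 2 * sqrt(s^2 * (1/n1 + 1/n2)),
# with the within-groups mean square standing in for s^2.
from math import sqrt

diff = 653.00 - 577.55          # x̄2 − x̄1 = 75.45
s2, n1, n2 = 8894.447, 20, 20   # pooled variance (assumed = MSW), group sizes
margin = 2 * sqrt(s2 * (1 / n1 + 1 / n2))
print(f"{diff:.2f} ± {margin:.2f} -> ({diff - margin:.2f}, {diff + margin:.2f})")
```

The margin is about 59.65, giving an interval of roughly (15.8, 135.1): it excludes 0, consistent with the significant quality-vs-convenience contrast reported later.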
Unplanned comparisons (data mining)

The researcher for the 50-class study finds 95% confidence intervals for every possible difference between two means (that's 2450 intervals) and reports only the ones found to be significantly different from each other. The claim is made that significant differences have been found between the classes, and the intervals are used to illustrate where the differences are.

Any problem with this?

At the 5% level, we expect the researcher to find 0.05 × 2450 = 122.5 intervals NOT containing 0, even if all the means really are equal! We must take account of this in some way.
Data snooping

The researcher for the 50-class study picks the highest and lowest group means, finds a 95% confidence interval for the difference between these two means, and reports only this interval.

Any problem with this?

We must account for the fact that there are 2450 possible comparisons, and the researcher has picked the maximum and minimum group means. Note that, again, we would expect 5% (that's 122.5) of the 95% intervals not to contain 0, even if ALL 50 class means were exactly equal to each other.

Again, we must take account of this.


Comparing multiple means

Unless we do a planned comparison, we must change the level of each confidence interval we calculate for differences between two means, to allow for data mining or data snooping.

We want to calculate multiple confidence intervals so that there is an overall 5% chance of finding a difference (when that difference is really 0) for each sample of DATA we analyse, NOT for each CI we calculate.

There are many methods to do this. All involve widening the confidence intervals in some way. The best and most accurate methods for pairwise comparisons were devised by John Tukey.
Tukey’s simultaneous mean confidence
intervals – give an overall 95% chance of
containing all 3 true mean difference values
Multiple Comparisons

Dependent Variable: SALES


Tukey HSD

Mean
Difference 95% Confidence Interval
(I) ADTYPE (J) ADTYPE (I-J) Std. Error Sig. Lower Bound Upper Bound
1.00 2.00 -75.4500* 29.8236 .037 -147.2181 -3.6819
3.00 -31.1000 29.8236 .553 -102.8681 40.6681
2.00 1.00 75.4500* 29.8236 .037 3.6819 147.2181
3.00 44.3500 29.8236 .305 -27.4181 116.1181
3.00 1.00 31.1000 29.8236 .553 -40.6681 102.8681
2.00 -44.3500 29.8236 .305 -116.1181 27.4181
*. The mean difference is significant at the .05 level.

We are 95% confident that strategy 2 (quality) is better than strategy 1 (convenience) in terms of mean sales of apple juice.
One-way ANOVA
One response variable, measured under several treatments (Treatment 1, Treatment 2, Treatment 3).

Two-way ANOVA
The response is measured at each combination of the levels of two factors (Factor A: Level 1, Level 2, Level 3; Factor B: Level 1, Level 2). What if we have more than one factor, OR the groups are not independent?

Randomised blocks
Block all the observations with some commonality across treatments (Treatment 1 to Treatment 4, each observed within Block 1, Block 2, Block 3). This is similar to matched pairs for 2 samples.


• Example
– A radio station manager wants to know whether the amount of time his listeners spend listening to the radio is about the same for every day of the week.
– 200 teenagers were asked to record how long they spend listening to the radio on each day of the week.
• Solution
– The problem objective is to compare seven populations (one for each day of the week).
– The data are quantitative.
– Each day of the week can be considered a group.
– Each person's 7 data points are related (they form a block), because they belong to the same person.
– This procedure eliminates the variability in "radio times" among teenagers, and helps detect differences in the mean times teenagers listen to the radio among the days of the week only.

H0: μ1 = μ2 = … = μ7
H1: At least two means differ

ANOVA
Source of Variation    SS          df     MS          F          P-value    F crit
Rows (blocks)          209834.6     199   1054.445     2.627722  1.04E-23   1.187531
Columns (groups)        28673.73      6   4778.955    11.90936   5.14E-13   2.106162
Error                  479125.1    1194    401.2773
Total                  717633.5    1399

(Blocks df = b − 1; groups df = k − 1.)

$F_{Groups} = \frac{MSG}{MSE}$

Conclusion: at the 5% significance level there is sufficient evidence to reject the null hypothesis, and infer that the mean "radio time"
is different in at least one of the week days.
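The ANOVA table above can be checked for internal consistency; a small sketch, with the SS and df values transcribed from the slide:

```python
# Internal-consistency check of the randomised-block ANOVA table for the
# radio-listening example: MS = SS/df and F = MS/MSE for each source.
blocks_ss, blocks_df = 209834.6, 199    # rows = people (blocks), b − 1
groups_ss, groups_df = 28673.73, 6      # columns = weekdays (groups), k − 1
error_ss,  error_df  = 479125.1, 1194   # (b − 1)(k − 1)

mse = error_ss / error_df                      # mean square error
f_groups = (groups_ss / groups_df) / mse       # F for the weekday effect
f_blocks = (blocks_ss / blocks_df) / mse       # F for the person-to-person effect
print(f"MSE = {mse:.4f}, F_groups = {f_groups:.2f}, F_blocks = {f_blocks:.2f}")
print("total df:", blocks_df + groups_df + error_df)   # should equal n − 1 = 1399
```

The recomputed values (MSE ≈ 401.28, F for groups ≈ 11.91, F for blocks ≈ 2.63, total df 1399) match the table.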
Two-Factor Analysis of Variance

Suppose that two factors are to be examined:

• The effect of the marketing approach on sales:
– Emphasis on convenience
– Emphasis on quality
– Emphasis on price

• The effect of the selected advertising medium on sales:
– Advertise on TV
– Advertise in newspapers

• We can design the experiment as follows:

City 1: Convenience & TV
City 2: Quality & TV
City 3: Price & TV
City 4: Convenience & newspaper
City 5: Quality & newspaper
City 6: Price & newspaper

• This is a one-way ANOVA experimental design with six treatments.

H0: μ1 = μ2 = … = μ6
H1: At least two means differ

The p-value = .045. We conclude that there is evidence, at the 5% level, that differences exist in the mean weekly sales.
• Are these differences caused by differences
in the marketing approach?
• Are these differences caused by differences
in the medium used for advertising?
• Are there combinations of these two factors
that interact to affect the weekly sales?

• A new experimental design is needed to answer these questions.
Factor A: Marketing strategy

                          Convenience     Quality         Price
Factor B:    TV           City 1 sales    City 3 sales    City 5 sales
Advertising
media        Newspapers   City 2 sales    City 4 sales    City 6 sales

Are there differences in the mean sales caused by different marketing strategies?
Test whether the mean sales of "Convenience", "Quality", and "Price" differ significantly from one another.
Are there differences in the mean sales caused by different advertising media?
Test whether the mean sales for "TV" and "Newspapers" differ significantly from one another.
Are there differences in the mean sales caused by an interaction between marketing strategy and advertising medium?
Test whether the mean sales of certain cells differ from the level expected.
Always plot the data first.

[Box plots of SALES1 by ADTYPE, split by MEDIUM (Newspaper vs TV), N = 10 per cell]

Graphically, quality again seems the best strategy, and newspaper ads tend to get slightly more sales than TV ads.
[Interaction plot: estimated marginal means of SALES1 by ADTYPE, one line per MEDIUM (Newspaper, TV)]

The interaction measures whether the effect of each advertising strategy is the same for each advertising medium (papers and TV). If the differences between the lines were changing, or the lines crossed over, this would be some evidence of interaction. The (close to) parallel lines indicate that no interaction is occurring between medium and advertising strategy.
The two-way ANOVA in SPSS.

Data (weekly sales):

             Convenience   Quality   Price
TV               491         712      677
TV               627         575      614
TV               558         590      706
TV               447         632      484
TV               479         683      478
TV               624         760      650
TV               546         690      583
TV               444         548      536
TV               582         579      579
TV               672         644      795
Newspaper        464         689      803
Newspaper        559         650      584
Newspaper        759         704      525
Newspaper        557         652      498
Newspaper        528         576      812
Newspaper        670         836      565
Newspaper        534         628      708
Newspaper        657         798      546
Newspaper        557         497      616
Newspaper        474         841      587

Tests of Between-Subjects Effects
Dependent Variable: SALES1
Source             Type III Sum of Squares   df   Mean Square       F       Sig.
Corrected Model           113620.283          5     22724.057      2.449    .045
Intercept               22643098.0            1   22643098.02   2439.908    .000
MEDIUM                     13172.017          1     13172.017      1.419    .239
ADTYPE                     98838.633          2     49419.317      5.325    .008
MEDIUM * ADTYPE             1609.633          2       804.817       .087    .917
Error                     501136.700         54      9280.309
Total                   23257855.0           60
Corrected Total           614756.983         59
R Squared = .185 (Adjusted R Squared = .109)

Clearly, at the 5% level, the advertising strategy is affecting mean sales (p = 0.008), but the advertising medium is NOT
significantly affecting mean sales. There is also no significant interaction effect on mean sales.
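As with the earlier tables, the F ratios in this output can be verified by dividing each effect's mean square by the error mean square; a sketch with the SS and df values transcribed from the slide:

```python
# Checking the F ratios in the SPSS two-way ANOVA output:
# F(effect) = (effect SS / effect df) / (error SS / error df).
effects = {                      # effect name: (Type III sum of squares, df)
    "MEDIUM":          (13172.017, 1),
    "ADTYPE":          (98838.633, 2),
    "MEDIUM * ADTYPE": (1609.633, 2),
}
mse = 501136.700 / 54            # error mean square, about 9280.309
for name, (ss, df) in effects.items():
    F = (ss / df) / mse
    print(f"{name:16s} F = {F:.3f}")
```

The recomputed ratios (about 1.419, 5.325, and 0.087) match the Sig. columns' F values in the SPSS table.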
Where do the differences lie?
Multiple Comparisons

Dependent Variable: SALES1


Tukey HSD

Mean
Difference 95% Confidence Interval
(I) ADTYPE (J) ADTYPE (I-J) Std. Error Sig. Lower Bound Upper Bound
1.00 2.00 -99.3500* 30.4636 .005 -172.7671 -25.9329
3.00 -46.5000 30.4636 .287 -119.9171 26.9171
2.00 1.00 99.3500* 30.4636 .005 25.9329 172.7671
3.00 52.8500 30.4636 .202 -20.5671 126.2671
3.00 1.00 46.5000 30.4636 .287 -26.9171 119.9171
2.00 -52.8500 30.4636 .202 -126.2671 20.5671
Based on observed means.
*. The mean difference is significant at the .05 level.

Again, we are 95% confident that strategy 2 (quality) is better than strategy 1 (convenience) in terms of mean sales of apple juice. There appears to be no significant difference in mean sales between emphasizing quality and price, nor between price and convenience.
Are the assumptions satisfied?

Levene's Test of Equality of Error Variances
Dependent Variable: SALES1
F      df1   df2   Sig.
.731     5    54   .603
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
Design: Intercept + MEDIUM + ADTYPE + MEDIUM * ADTYPE

The sample variances are NOT significantly different from each other, as we have a high p-value (0.603 > 0.05) here.

[Box plots of SALES1 by MEDIUM (Newspaper, TV; N = 30 each) and by ADTYPE (N = 20 each)]

The observations in each group appear to be fairly symmetric and close to normal.