
ANOVA: Analysis of Variance

Prof. Rohit Joshi, Prof. Achinta Kr. Sarmah


Design of experiments
• An experimental design is a plan and a structure to test hypotheses in which the researcher
either controls or manipulates one or more variables.

• It contains independent and dependent variables.

• In an experimental design, an independent variable may be either a treatment variable or a classification variable.
– A treatment variable (also called a factor) is a variable the experimenter controls or modifies in the experiment.
– A classification variable is some characteristic of the experimental subject that was present prior to the experiment and is not a result of the experimenter’s manipulations or control.

Example 1
Wal-Mart executives might sanction an in-house study to compare daily sales volumes for a given
size store in four different demographic settings: (1) inner-city stores (large city), (2) suburban
stores (large city), (3) stores in a medium-sized city, and (4) stores in a small town. Managers
might also decide to compare sales on the five different weekdays (Monday through Friday).
• In this study, the independent variables are store demographics and day of the week.
• Store demographics is a classification variable with four levels (the four types of stores); day of the week has five levels (Monday through Friday).
• The dependent variable is the daily sales volume.

Example 2
Southwest Airlines is able to keep fares low, in part because of relatively low maintenance costs
on its airplanes. One of the main reasons for the low maintenance costs is that Southwest flies
only one type of aircraft, the Boeing 737. However, Southwest flies three different versions of the
737. Suppose Southwest decides to conduct a study to determine whether there is a significant
difference in the average annual maintenance costs for the three types of 737s used.

a. State an independent variable for such a study.

b. What are some of the levels or classifications that might be studied under this variable?

c. Give a dependent variable for this study.

Example 3
Is there a difference in the family demographics of people who stay at motels?
Suppose a study is conducted in which three categories of motels are used: economy motels,
modestly priced chain motels, and exclusive motels. One of the dependent variables studied
might be the number of children in the family of the person staying in the motel. Name three
other dependent variables that might be used in this study.

Defining ANOVA
ANOVA (ANalysis Of VAriance) is a statistical method for determining the existence of differences among several population means.
• ANOVA is designed to detect differences among means from populations subject to different treatments.
• ANOVA is a joint test.
– The equality of several population means is tested simultaneously or jointly.
• ANOVA tests for the equality of several population means by looking at two estimators of the population variance (hence, analysis of variance).
The Hypothesis Test of Analysis of Variance
• In an analysis of variance:
– We have r independent random samples, each one corresponding to a population subject to a different treatment.
– We have:
  – $n = n_1 + n_2 + n_3 + \cdots + n_r$ total observations.
  – r sample means: $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots, \bar{x}_r$. These r sample means can be used to calculate an estimator of the population variance. If the population means are equal, we expect the variance among the sample means to be small.
  – r sample variances: $s_1^2, s_2^2, s_3^2, \ldots, s_r^2$. These sample variances can be used to find a pooled estimator of the population variance.

The Hypothesis Test of Analysis of Variance (continued): Assumptions
• We assume independent random sampling from each of the r populations.
• We assume that the r populations under study:
– are normally distributed,
– with means $\mu_i$ that may or may not be equal,
– but with equal variances, $\sigma_i^2 = \sigma^2$.
[Figure: normal curves for Population 1, Population 2, ..., Population r, centered at $\mu_1, \mu_2, \ldots, \mu_r$, all with the same spread]

The Hypothesis Test of Analysis of Variance (continued)
The hypothesis test of analysis of variance:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \cdots = \mu_r$
$H_1$: Not all $\mu_i$ $(i = 1, \ldots, r)$ are equal
The test statistic of analysis of variance:
$$F_{(r-1,\,n-r)} = \frac{\text{Estimate of variance based on means from } r \text{ samples}}{\text{Estimate of variance based on all sample observations}}$$
That is, the test statistic in an analysis of variance is based on the ratio of two estimators of a population variance, and is therefore based on the F distribution, with (r - 1) degrees of freedom in the numerator and (n - r) degrees of freedom in the denominator.
When the Null Hypothesis Is True

When the null hypothesis is true:
We would expect the sample means to be nearly equal, as in this illustration. And we would expect the variation among the sample means (between sample) to be small, relative to the variation found around the individual sample means (within sample).
[Figure: sample distributions whose sample means $\bar{x}_i$ are nearly equal]
If the null hypothesis is true, the numerator in the test statistic is expected to be small, relative to the denominator:
$$F_{(r-1,\,n-r)} = \frac{\text{Estimate of variance based on means from } r \text{ samples}}{\text{Estimate of variance based on all sample observations}}$$
When the Null Hypothesis Is False
When the null hypothesis is false:
– $\mu_1$ is equal to $\mu_2$ but not to $\mu_3$,
– $\mu_1$ is equal to $\mu_3$ but not to $\mu_2$,
– $\mu_2$ is equal to $\mu_3$ but not to $\mu_1$, or
– $\mu_1$, $\mu_2$, and $\mu_3$ are all unequal.
In any of these situations, we would not expect the sample means to all be nearly equal. We would expect the variation among the sample means (between sample) to be large, relative to the variation around the individual sample means (within sample).
If the null hypothesis is false, the numerator in the test statistic is expected to be large, relative to the denominator:
$$F_{(r-1,\,n-r)} = \frac{\text{Estimate of variance based on means from } r \text{ samples}}{\text{Estimate of variance based on all sample observations}}$$
Example
Randomly chosen groups of customers were served different types of coffee and asked to rate the coffee on a scale of 0 to 100: 21 were served pure Brazilian coffee, 20 were served pure Colombian coffee, and 22 were served pure African-grown coffee. The resulting test statistic was F = 2.02.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: Not all three means are equal
$n_1 = 21$, $n_2 = 20$, $n_3 = 22$, $n = 21 + 20 + 22 = 63$, $r = 3$
The critical point for $\alpha = 0.05$ is:
$$F_{\alpha\,(r-1,\,n-r)} = F_{0.05\,(3-1,\,63-3)} = F_{0.05\,(2,\,60)} = 3.15$$
Since the test statistic $F = 2.02 < F_{(2,\,60)} = 3.15$, $H_0$ cannot be rejected, and we cannot conclude that any of the population means differs significantly from the others.
[Figure: F distribution with 2 and 60 degrees of freedom, marking the test statistic 2.02 and the critical point F(2, 60) = 3.15 with right-tail area $\alpha = 0.05$]
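A minimal Python sketch of the bookkeeping in the coffee example follows. The function name `anova_decision` is illustrative, and the critical value 3.15 is taken from the F table value quoted above rather than computed, since computing it needs an F-distribution routine outside the standard library.

```python
def anova_decision(sample_sizes, f_stat, f_critical):
    """Return (df_numerator, df_denominator, reject_h0) for a one-way ANOVA.

    f_critical must be looked up (e.g. in an F table) for the chosen alpha
    and the returned degrees of freedom; it is not computed here.
    """
    r = len(sample_sizes)          # number of treatments (samples)
    n = sum(sample_sizes)          # total number of observations
    df_num, df_den = r - 1, n - r  # the statistic follows F(r-1, n-r) under H0
    return df_num, df_den, f_stat > f_critical


# Coffee example: Brazilian, Colombian, and African-grown samples.
df_num, df_den, reject = anova_decision([21, 20, 22], f_stat=2.02, f_critical=3.15)
print(df_num, df_den, reject)  # 2 60 False
```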
The Theory and the Computations of ANOVA: The Grand Mean
The grand mean, $\bar{\bar{x}}$, is the mean of all $n = n_1 + n_2 + n_3 + \cdots + n_r$ observations in all r samples.
The mean of sample i (i = 1, 2, 3, ..., r):
$$\bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{ij}}{n_i}$$
The grand mean, the mean of all data points:
$$\bar{\bar{x}} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} x_{ij}}{n} = \frac{\sum_{i=1}^{r} n_i \bar{x}_i}{n}$$
where $x_{ij}$ is the particular data point in position j within the sample from population i. The subscript i denotes the population, or treatment, and runs from 1 to r. The subscript j denotes the data point within the sample from population i; thus, j runs from 1 to $n_i$.
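The two forms of the grand mean can be checked numerically. This is a sketch in Python; the three samples are hypothetical values chosen only for illustration, with deliberately unequal sample sizes so that the weighting by $n_i$ matters.

```python
def sample_means(samples):
    """x_bar_i = (sum over j of x_ij) / n_i for each sample i."""
    return [sum(s) / len(s) for s in samples]


def grand_mean(samples):
    """Mean of all n = n_1 + ... + n_r observations pooled together."""
    n = sum(len(s) for s in samples)
    return sum(sum(s) for s in samples) / n


# Hypothetical samples of unequal size (n_1 = 3, n_2 = 2, n_3 = 4).
samples = [[4, 6, 8], [10, 12], [1, 2, 3, 6]]
means = sample_means(samples)   # [6.0, 11.0, 3.0]
gm = grand_mean(samples)

# Second form of the grand mean: sum of n_i * x_bar_i, divided by n.
n = sum(len(s) for s in samples)
weighted = sum(len(s) * m for s, m in zip(samples, means)) / n
print(means, gm, weighted)
# Note: the unweighted average (6 + 11 + 3) / 3 = 6.67 is NOT the grand mean here.
```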
The Theory and Computations of ANOVA: Error Deviation and Treatment Deviation
We define an error deviation as the difference between a data point and its sample mean. Errors are denoted by e, and we have:
$$e_{ij} = x_{ij} - \bar{x}_i$$
We define a treatment deviation as the deviation of a sample mean from the grand mean. Treatment deviations, $t_i$, are given by:
$$t_i = \bar{x}_i - \bar{\bar{x}}$$
The ANOVA principle says:
When the population means are not equal, the “average” error (within sample) is relatively small compared with the “average” treatment (between sample) deviation.
The Theory and Computations of ANOVA: The Total Deviation
The total deviation ($Tot_{ij}$) is the difference between a data point ($x_{ij}$) and the grand mean ($\bar{\bar{x}}$):
$$Tot_{ij} = x_{ij} - \bar{\bar{x}}$$
For any data point $x_{ij}$:
$$Tot_{ij} = t_i + e_{ij}$$
That is:
Total Deviation = Treatment Deviation + Error Deviation
Consider data point $x_{24} = 13$. The mean of sample 2 is $\bar{x}_2 = 11.5$, and the grand mean is $\bar{\bar{x}} = 6.909$, so:
$$e_{24} = x_{24} - \bar{x}_2 = 13 - 11.5 = 1.5$$
$$t_2 = \bar{x}_2 - \bar{\bar{x}} = 11.5 - 6.909 = 4.591$$
$$Tot_{24} = t_2 + e_{24} = 4.591 + 1.5 = 6.091$$
or
$$Tot_{24} = x_{24} - \bar{\bar{x}} = 13 - 6.909 = 6.091$$
[Figure: number line from 0 to 10 marking $\bar{\bar{x}} = 6.909$, $\bar{x}_2 = 11.5$, and $x_{24} = 13$, showing the treatment deviation $t_2 = 4.591$, the error deviation $e_{24} = 1.5$, and the total deviation $Tot_{24} = 6.091$]
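The arithmetic of the $x_{24}$ example can be reproduced with a small helper. The function name `deviations` is hypothetical; it simply applies the three definitions above to one data point.

```python
def deviations(x_ij, sample_mean, grand_mean):
    """Return (total, treatment, error) deviations for one data point."""
    total = x_ij - grand_mean              # Tot_ij = x_ij - grand mean
    treatment = sample_mean - grand_mean   # t_i = x_bar_i - grand mean
    error = x_ij - sample_mean             # e_ij = x_ij - x_bar_i
    return total, treatment, error


tot, t, e = deviations(13, 11.5, 6.909)
print(round(tot, 3), round(t, 3), round(e, 3))  # 6.091 4.591 1.5
print(abs(tot - (t + e)) < 1e-12)               # True: Tot = t + e
```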
The Theory and Computations of ANOVA: Squared Deviations
Total Deviation = Treatment Deviation + Error Deviation
The total deviation is the sum of the treatment deviation and the error deviation:
$$t_i + e_{ij} = (\bar{x}_i - \bar{\bar{x}}) + (x_{ij} - \bar{x}_i) = (x_{ij} - \bar{\bar{x}}) = Tot_{ij}$$
Notice that the sample mean term ($\bar{x}_i$) cancels out in the above addition, which simplifies the equation.
Squared deviations:
$$t_i^2 + e_{ij}^2 = (\bar{x}_i - \bar{\bar{x}})^2 + (x_{ij} - \bar{x}_i)^2$$
$$Tot_{ij}^2 = (x_{ij} - \bar{\bar{x}})^2$$
The Theory and Computations of ANOVA: The Sum of Squares Principle
Sums of squared deviations:
$$\sum_{i=1}^{r}\sum_{j=1}^{n_i} Tot_{ij}^2 = \sum_{i=1}^{r} n_i t_i^2 + \sum_{i=1}^{r}\sum_{j=1}^{n_i} e_{ij}^2$$
$$\sum_{i=1}^{r}\sum_{j=1}^{n_i} (x_{ij} - \bar{\bar{x}})^2 = \sum_{i=1}^{r} n_i (\bar{x}_i - \bar{\bar{x}})^2 + \sum_{i=1}^{r}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$
The Sum of Squares Principle
The total sum of squares (SST) is the sum of two terms: the sum of squares for treatment (SSTR) and the sum of squares for error (SSE).
SST = SSTR + SSE
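As a sketch (hypothetical data, plain Python), the three sums of squares can be computed directly from their definitions and the identity SST = SSTR + SSE checked up to floating-point rounding.

```python
def sums_of_squares(samples):
    """Return (SST, SSTR, SSE) for a one-way layout given as a list of samples."""
    n = sum(len(s) for s in samples)
    gm = sum(sum(s) for s in samples) / n                 # grand mean
    means = [sum(s) / len(s) for s in samples]            # sample means x_bar_i
    sst = sum((x - gm) ** 2 for s in samples for x in s)  # total: (x_ij - grand mean)^2
    sstr = sum(len(s) * (m - gm) ** 2                     # treatment: n_i (x_bar_i - grand mean)^2
               for s, m in zip(samples, means))
    sse = sum((x - m) ** 2                                # error: (x_ij - x_bar_i)^2
              for s, m in zip(samples, means) for x in s)
    return sst, sstr, sse


samples = [[4, 6, 8], [10, 12], [1, 2, 3, 6]]  # hypothetical data
sst, sstr, sse = sums_of_squares(samples)
print(sst, sstr + sse)  # equal up to floating-point rounding
```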
The Theory and Computations of ANOVA: Picturing The Sum of Squares Principle
[Figure: SST depicted as a bar divided into SSTR and SSE]
SST measures the total variation in the data set, the variation of all individual data points from the grand mean.
SSTR measures the explained variation, the variation of individual sample means from the grand mean. It is that part of the variation that is possibly expected, or explained, because the data points are drawn from different populations. It’s the variation between groups of data points.
SSE measures unexplained variation, the variation within each group that cannot be explained by possible differences between the groups.
The Theory and Computations of ANOVA: Degrees of Freedom
The number of degrees of freedom associated with SST is (n - 1): n total observations in all r groups, less one degree of freedom lost with the calculation of the grand mean.
The number of degrees of freedom associated with SSTR is (r - 1): r sample means, less one degree of freedom lost with the calculation of the grand mean.
The number of degrees of freedom associated with SSE is (n - r): n total observations in all groups, less one degree of freedom lost with the calculation of the sample mean from each of r groups.
The degrees of freedom are additive in the same way as are the sums of squares:
df(total) = df(treatment) + df(error)
(n - 1) = (r - 1) + (n - r)
The Theory and Computations of ANOVA: The Mean Squares
Recall that the calculation of the sample variance involves the division of the sum of squared deviations from the sample mean by the number of degrees of freedom. This principle is applied as well to find the mean squared deviations within the analysis of variance.
Mean square treatment (MSTR): $MSTR = \dfrac{SSTR}{r - 1}$
Mean square error (MSE): $MSE = \dfrac{SSE}{n - r}$
Mean square total (MST): $MST = \dfrac{SST}{n - 1}$
(Note that the additive properties of sums of squares do not extend to the mean squares: $MST \neq MSTR + MSE$.)
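Using the Club Med sums of squares reported later in these notes (SSTR = 14208, SSE = 98356, r = 5, n = 200), a short sketch shows the mean squares and confirms that they are not additive. The function name `mean_squares` is illustrative.

```python
def mean_squares(sstr, sse, r, n):
    """Return (MSTR, MSE, MST): each sum of squares divided by its degrees of freedom."""
    mstr = sstr / (r - 1)           # df(treatment) = r - 1
    mse = sse / (n - r)             # df(error) = n - r
    mst = (sstr + sse) / (n - 1)    # SST = SSTR + SSE, df(total) = n - 1
    return mstr, mse, mst


mstr, mse, mst = mean_squares(sstr=14208, sse=98356, r=5, n=200)
print(round(mstr, 2), round(mse, 2), round(mst, 2))  # 3552.0 504.39 565.65
print(round(mstr + mse, 2))                          # 4056.39, which differs from MST
```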
The Theory and Computations of ANOVA: The Expected Mean Squares
$$E(MSE) = \sigma^2$$
and
$$E(MSTR) = \sigma^2 + \frac{\sum_{i=1}^{r} n_i(\mu_i - \mu)^2}{r - 1} \quad \begin{cases} = \sigma^2 & \text{when the null hypothesis is true} \\ > \sigma^2 & \text{when the null hypothesis is false} \end{cases}$$
where $\mu_i$ is the mean of population i and $\mu$ is the combined mean of all r populations.
That is, the expected mean square error (MSE) is simply the common population variance (remember the assumption of equal population variances), but the expected mean square treatment (MSTR) is the common population variance plus a term related to the variation of the individual population means around the grand population mean.
If the null hypothesis is true so that the population means are all equal, the second term in the E(MSTR) formulation is zero, and E(MSTR) is equal to the common population variance.
Expected Mean Squares and the ANOVA Principle
When the null hypothesis of ANOVA is true and all r population means are equal, MSTR and MSE are two independent, unbiased estimators of the common population variance $\sigma^2$.
On the other hand, when the null hypothesis is false, MSTR will tend to be larger than MSE.
So the ratio MSTR/MSE can be used as an indicator of the equality or inequality of the r population means.
This ratio (MSTR/MSE) will tend to be near 1 if the null hypothesis is true, and greater than 1 if the null hypothesis is false. The ANOVA test, finally, is a test of whether (MSTR/MSE) is equal to, or greater than, 1.
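A small simulation can make the expected mean squares concrete. The sketch below assumes normal populations with a common $\sigma = 1$ and 20 observations per sample; the population means, sample size, replication count, and seed are arbitrary illustrative choices. With equal means both averages should sit near $\sigma^2 = 1$; with means (0, 0, 1), the formula gives $E(MSTR) = 1 + 20[(1/3)^2 + (1/3)^2 + (2/3)^2]/2 \approx 7.67$, while MSE should stay near 1.

```python
import random


def mean_squares(samples):
    """Return (MSTR, MSE) for a list of samples."""
    r = len(samples)
    n = sum(len(s) for s in samples)
    gm = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    sstr = sum(len(s) * (m - gm) ** 2 for s, m in zip(samples, means))
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return sstr / (r - 1), sse / (n - r)


def average_mean_squares(mus, n_i=20, sigma=1.0, reps=2000, seed=1):
    """Average MSTR and MSE over repeated normal samples with population means `mus`."""
    rng = random.Random(seed)
    tot_mstr = tot_mse = 0.0
    for _ in range(reps):
        samples = [[rng.gauss(mu, sigma) for _ in range(n_i)] for mu in mus]
        mstr, mse = mean_squares(samples)
        tot_mstr += mstr
        tot_mse += mse
    return tot_mstr / reps, tot_mse / reps


print(average_mean_squares([0, 0, 0]))  # H0 true: both averages near sigma^2 = 1
print(average_mean_squares([0, 0, 1]))  # H0 false: MSTR near 7.67, MSE near 1
```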
The Theory and Computations of ANOVA: The F Statistic
Under the assumptions of ANOVA, the ratio (MSTR/MSE) possesses an F distribution with (r - 1) degrees of freedom for the numerator and (n - r) degrees of freedom for the denominator when the null hypothesis is true.
The test statistic in analysis of variance:
$$F_{(r-1,\,n-r)} = \frac{MSTR}{MSE}$$
Example: Club Med
Club Med has conducted a test to determine whether its Caribbean resorts are equally well liked by vacationing club members. The analysis was based on a survey questionnaire (general satisfaction, on a scale from 0 to 100) filled out by a random sample of 40 respondents from each of 5 resorts.

Resort            Mean Response ($\bar{x}_i$)
Guadeloupe        89
Martinique        75
Eleuthra          73
Paradise Island   91
St. Lucia         85

Source of    Sum of          Degrees of     Mean
Variation    Squares         Freedom        Square          F Ratio
Treatment    SSTR = 14208    (r-1) = 4      MSTR = 3552     7.04
Error        SSE = 98356     (n-r) = 195    MSE = 504.39
Total        SST = 112564    (n-1) = 199    MST = 565.65

[Figure: F distribution with 4 and 200 degrees of freedom (approximating the exact 4 and 195), marking the computed test statistic 7.04 and the critical point F(4, 200) = 3.41 for $\alpha = 0.01$]
The resultant F ratio is larger than the critical point for $\alpha = 0.01$, so the null hypothesis may be rejected.
Further Analysis
[Figure: The ANOVA Diagram. Data → ANOVA. If H0 is not rejected: Stop. If H0 is rejected: Further Analysis, consisting of Confidence Intervals for Population Means and the Tukey Pairwise Comparisons Test]
• The sample means are unbiased estimators of the population means.
• The mean square error (MSE) is an unbiased estimator of the common population variance.
Confidence Intervals for Population Means
A $(1 - \alpha)100\%$ confidence interval for $\mu_i$, the mean of population i:
$$\bar{x}_i \pm t_{\alpha/2}\sqrt{\frac{MSE}{n_i}}$$
where $t_{\alpha/2}$ is the value of the t distribution with (n - r) degrees of freedom that cuts off a right-tailed area of $\alpha/2$.
For the Club Med example ($n_i = 40$, $n = (5)(40) = 200$, SST = 112564, SSE = 98356, MSE = 504.39), a 95% interval uses $t_{0.025} \approx 1.96$ (195 degrees of freedom):
$$\bar{x}_i \pm t_{\alpha/2}\sqrt{\frac{MSE}{n_i}} = \bar{x}_i \pm 1.96\sqrt{\frac{504.39}{40}} = \bar{x}_i \pm 6.96$$
Guadeloupe (89): 89 ± 6.96 = [82.04, 95.96]
Martinique (75): 75 ± 6.96 = [68.04, 81.96]
Eleuthra (73): 73 ± 6.96 = [66.04, 79.96]
Paradise Island (91): 91 ± 6.96 = [84.04, 97.96]
St. Lucia (85): 85 ± 6.96 = [78.04, 91.96]
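The Club Med intervals can be reproduced as follows. This sketch takes $t \approx 1.96$ from the slide rather than computing the exact t quantile with 195 degrees of freedom; `mean_confidence_intervals` is a hypothetical helper name.

```python
import math


def mean_confidence_intervals(means, mse, n_i, t_crit):
    """Return ({name: (lower, upper)}, half_width) using x_bar_i ± t * sqrt(MSE / n_i)."""
    half_width = t_crit * math.sqrt(mse / n_i)
    intervals = {name: (m - half_width, m + half_width) for name, m in means.items()}
    return intervals, half_width


club_med = {"Guadeloupe": 89, "Martinique": 75, "Eleuthra": 73,
            "Paradise Island": 91, "St. Lucia": 85}
intervals, hw = mean_confidence_intervals(club_med, mse=504.39, n_i=40, t_crit=1.96)
print(round(hw, 2))  # 6.96
for name, (lo, hi) in intervals.items():
    print(f"{name}: [{lo:.2f}, {hi:.2f}]")
```

Because every sample has the same size and the intervals share the pooled MSE, all five intervals have the same half-width.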
The Tukey Pairwise-Comparisons Test
The Tukey Pairwise Comparison test, or Honestly Significant Differences (HSD) test, allows us to compare every pair of population means with a single level of significance.
It is based on the studentized range distribution, q, with r and (n - r) degrees of freedom.
The critical point in a Tukey Pairwise Comparisons test is the Tukey Criterion:
$$T = q_\alpha\sqrt{\frac{MSE}{n_i}}$$
where $n_i$ is the smallest of the r sample sizes.
The test statistic is the absolute value of the difference between the appropriate sample means, and the null hypothesis is rejected if the test statistic is greater than the critical point of the Tukey Criterion.
Note that there are
$$\binom{r}{2} = \frac{r!}{2!\,(r-2)!}$$
pairs of population means to compare. For example, if r = 3:
$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$
$H_0: \mu_1 = \mu_3$, $H_1: \mu_1 \neq \mu_3$
$H_0: \mu_2 = \mu_3$, $H_1: \mu_2 \neq \mu_3$
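The count of pairwise hypotheses, and the pairs themselves, can be enumerated with the standard library (`math.comb` requires Python 3.8 or later). The helper name `pairwise_hypotheses` is illustrative.

```python
from itertools import combinations
from math import comb


def pairwise_hypotheses(r):
    """List the (i, k) index pairs for H0: mu_i = mu_k, with i < k."""
    return list(combinations(range(1, r + 1), 2))


print(comb(3, 2), pairwise_hypotheses(3))  # 3 [(1, 2), (1, 3), (2, 3)]
print(comb(5, 2))                          # 10 (the Club Med case, r = 5)
```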
The Tukey Pairwise Comparison Test: The Club Med Example
The test statistic for each pairwise test is the absolute difference between the appropriate sample means.
i   Resort          Mean
1   Guadeloupe      89
2   Martinique      75
3   Eleuthra        73
4   Paradise Is.    91
5   St. Lucia       85
The critical point $T_{0.05}$ for r = 5 and (n - r) = 195 degrees of freedom is:
$$T = q_\alpha\sqrt{\frac{MSE}{n_i}} = 3.86\sqrt{\frac{504.4}{40}} = 13.7$$
I.     $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$: |89 - 75| = 14 > 13.7*
II.    $H_0: \mu_1 = \mu_3$, $H_1: \mu_1 \neq \mu_3$: |89 - 73| = 16 > 13.7*
III.   $H_0: \mu_1 = \mu_4$, $H_1: \mu_1 \neq \mu_4$: |89 - 91| = 2 < 13.7
IV.    $H_0: \mu_1 = \mu_5$, $H_1: \mu_1 \neq \mu_5$: |89 - 85| = 4 < 13.7
V.     $H_0: \mu_2 = \mu_3$, $H_1: \mu_2 \neq \mu_3$: |75 - 73| = 2 < 13.7
VI.    $H_0: \mu_2 = \mu_4$, $H_1: \mu_2 \neq \mu_4$: |75 - 91| = 16 > 13.7*
VII.   $H_0: \mu_2 = \mu_5$, $H_1: \mu_2 \neq \mu_5$: |75 - 85| = 10 < 13.7
VIII.  $H_0: \mu_3 = \mu_4$, $H_1: \mu_3 \neq \mu_4$: |73 - 91| = 18 > 13.7*
IX.    $H_0: \mu_3 = \mu_5$, $H_1: \mu_3 \neq \mu_5$: |73 - 85| = 12 < 13.7
X.     $H_0: \mu_4 = \mu_5$, $H_1: \mu_4 \neq \mu_5$: |91 - 85| = 6 < 13.7
Reject the null hypothesis if the absolute value of the difference between the sample means is greater than the critical value of T. (The hypotheses marked with * are rejected.)
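The ten Club Med comparisons can be automated in a few lines. The sketch below uses the studentized-range value q = 3.86 quoted above (it does not look q up) and MSE = 504.4; `tukey_pairwise` is a hypothetical helper, not a library function.

```python
import math
from itertools import combinations


def tukey_pairwise(means, mse, n_min, q_crit):
    """Compare every pair of sample means against T = q * sqrt(MSE / n_min).

    Returns (T, [(i, k, |mean_i - mean_k|, rejected), ...]).
    """
    t = q_crit * math.sqrt(mse / n_min)
    results = []
    for (i, m_i), (k, m_k) in combinations(means.items(), 2):
        diff = abs(m_i - m_k)
        results.append((i, k, diff, diff > t))
    return t, results


means = {1: 89, 2: 75, 3: 73, 4: 91, 5: 85}  # Club Med resort means
t, results = tukey_pairwise(means, mse=504.4, n_min=40, q_crit=3.86)
print(round(t, 1))  # 13.7
rejected = [(i, k) for i, k, _, rej in results if rej]
print(rejected)     # [(1, 2), (1, 3), (2, 4), (3, 4)]
```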
Picturing the Results of a Tukey Pairwise Comparisons Test: The Club Med Example
We rejected the null hypotheses comparing the means of populations 1 and 2, 1 and 3, 2 and 4, and 3 and 4. On the other hand, we did not reject the null hypotheses of the equality of the means of populations 1 and 4, 1 and 5, 2 and 3, 2 and 5, 3 and 5, and 4 and 5.
[Figure: population means ordered from smallest to largest as $\mu_3$, $\mu_2$, $\mu_5$, $\mu_1$, $\mu_4$, with bars over groups whose means may be equal]
The bars indicate the three groupings of populations with possibly equal means: 2 and 3; 2, 3, and 5; and 1, 4, and 5.
Experimental Design
• A completely-randomized design is one in which the elements are assigned to treatments completely at random. That is, any element chosen for the study has an equal chance of being assigned to any treatment.
• In a blocking design, elements are assigned to treatments after first being collected into homogeneous groups.
– In a completely randomized block design, all members of each block (homogeneous group) are randomly assigned to the treatment levels.
– In a repeated measures design, each member of each block is assigned to all treatment levels.
Two-Way Analysis of Variance
• In a two-way ANOVA, the effects of two factors or treatments can be investigated simultaneously. Two-way ANOVA also permits the investigation of the effects of either factor alone and of the two factors together.
– The effect on the population mean that can be attributed to the levels of either factor alone is called a main effect.
– An interaction effect between two factors occurs if the total effect at some pair of levels of the two factors or treatments differs significantly from the simple addition of the two main effects. Factors that do not interact are called additive.
• Three questions answerable by two-way ANOVA:
– Are there any factor A main effects?
– Are there any factor B main effects?
– Are there any interaction effects between factors A and B?
• For example, we might investigate the effects on vacationers’ ratings of resorts by looking at five different resorts (factor A) and four different resort attributes (factor B). In addition to the five main factor A treatment levels and the four main factor B treatment levels, there are (5 × 4 = 20) interaction treatment levels.
The Two-Way ANOVA Model

• $x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
– where $\mu$ is the overall mean;
– $\alpha_i$ is the effect of level i (i = 1, ..., a) of factor A;
– $\beta_j$ is the effect of level j (j = 1, ..., b) of factor B;
– $(\alpha\beta)_{ij}$ is the interaction effect of levels i and j;
– $\varepsilon_{ijk}$ is the error associated with the kth data point from level i of factor A and level j of factor B.
– $\varepsilon_{ijk}$ is assumed to be distributed normally with mean zero and variance $\sigma^2$ for all i, j, and k.
Two-Way ANOVA Data Layout: Club Med Example
                      Factor A: Resort
Factor B: Attribute   Guadeloupe   Martinique   Eleuthra   Paradise Island   St. Lucia
Friendship            n11          n21          n31        n41               n51
Sports                n12          n22          n32        n42               n52
Culture               n13          n23          n33        n43               n53
Excitement            n14          n24          n34        n44               n54

Graphical Display of Effects
[Figure: two panels plotting Rating against Resort (Eleuthra, St. Lucia, Paradise Island, Martinique, Guadeloupe), with one line per attribute (Friendship, Sports, Culture, Excitement). Annotation: Eleuthra/sports interaction: combined effect greater than additive main effects.]
Hypothesis Tests in a Two-Way ANOVA
• Factor A main effects test:
  $H_0: \alpha_i = 0$ for all i = 1, 2, ..., a
  $H_1$: Not all $\alpha_i$ are 0
• Factor B main effects test:
  $H_0: \beta_j = 0$ for all j = 1, 2, ..., b
  $H_1$: Not all $\beta_j$ are 0
• Test for (AB) interactions:
  $H_0: (\alpha\beta)_{ij} = 0$ for all i = 1, 2, ..., a and j = 1, 2, ..., b
  $H_1$: Not all $(\alpha\beta)_{ij}$ are 0
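Sums of Squares
• In a two-way ANOVA: $x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
• SST = SSTR + SSE
• SST = SSA + SSB + SS(AB) + SSE
SST = SSTR + SSE:
$$\sum_i\sum_j\sum_k (x_{ijk} - \bar{\bar{x}})^2 = \sum_i\sum_j\sum_k (\bar{x}_{ij} - \bar{\bar{x}})^2 + \sum_i\sum_j\sum_k (x_{ijk} - \bar{x}_{ij})^2$$
SSTR = SSA + SSB + SS(AB):
$$SSTR = \sum_i\sum_j\sum_k (\bar{x}_i - \bar{\bar{x}})^2 + \sum_i\sum_j\sum_k (\bar{x}_j - \bar{\bar{x}})^2 + \sum_i\sum_j\sum_k (\bar{x}_{ij} - \bar{x}_i - \bar{x}_j + \bar{\bar{x}})^2$$
where $\bar{x}_i$ is the mean of level i of factor A, $\bar{x}_j$ is the mean of level j of factor B, $\bar{x}_{ij}$ is the mean of cell (i, j), and $\bar{\bar{x}}$ is the grand mean.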
The Two-Way ANOVA Table
Source of     Sum of     Degrees of
Variation     Squares    Freedom        Mean Square                       F Ratio
Factor A      SSA        a-1            MSA = SSA/(a-1)                   F = MSA/MSE
Factor B      SSB        b-1            MSB = SSB/(b-1)                   F = MSB/MSE
Interaction   SS(AB)     (a-1)(b-1)     MS(AB) = SS(AB)/[(a-1)(b-1)]      F = MS(AB)/MSE
Error         SSE        ab(n-1)        MSE = SSE/[ab(n-1)]
Total         SST        abn-1
(Here n is the number of observations in each cell.)
A main effect test: F(a-1, ab(n-1))
B main effect test: F(b-1, ab(n-1))
(AB) interaction effect test: F((a-1)(b-1), ab(n-1))
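For a balanced design (the same number n of observations in every cell), the sums of squares and F ratios in the table can be computed directly. The sketch below is plain Python; the 2 × 2 data set with n = 2 is hypothetical and was chosen so the results are easy to check by hand. It stops at the F ratios; each would still be compared with the F critical point for its degrees of freedom.

```python
def two_way_anova(data):
    """Balanced two-way ANOVA.

    data[i][j] is the list of n observations at level i of factor A
    and level j of factor B (every cell must have the same n).
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(obs) / n for obs in row] for row in data]              # x_bar_ij
    gm = sum(sum(row) for row in cell) / (a * b)                         # grand mean
    mean_a = [sum(row) / b for row in cell]                              # x_bar_i
    mean_b = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]   # x_bar_j

    ssa = b * n * sum((m - gm) ** 2 for m in mean_a)
    ssb = a * n * sum((m - gm) ** 2 for m in mean_b)
    ssab = n * sum((cell[i][j] - mean_a[i] - mean_b[j] + gm) ** 2
                   for i in range(a) for j in range(b))
    sse = sum((x - cell[i][j]) ** 2
              for i in range(a) for j in range(b) for x in data[i][j])
    sst = sum((x - gm) ** 2 for row in data for obs in row for x in obs)

    df_error = a * b * (n - 1)
    mse = sse / df_error
    effects = {"A": (ssa, a - 1), "B": (ssb, b - 1), "AB": (ssab, (a - 1) * (b - 1))}
    f = {name: (ss / df) / mse for name, (ss, df) in effects.items()}
    return {"SSA": ssa, "SSB": ssb, "SS(AB)": ssab, "SSE": sse, "SST": sst,
            "df_error": df_error, "MSE": mse, "F": f}


data = [[[3, 5], [6, 8]],      # factor A level 1: B level 1, B level 2
        [[7, 9], [14, 16]]]    # factor A level 2: B level 1, B level 2
res = two_way_anova(data)
print(res["SSA"], res["SSB"], res["SS(AB)"], res["SSE"], res["SST"])  # 72.0 50.0 8.0 8.0 138.0
print(res["F"])  # {'A': 36.0, 'B': 25.0, 'AB': 4.0}
```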


Hypothesis Tests

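[Figure: left panel, F distribution with 2 and 81 degrees of freedom, marking the location test statistic 8.94, the artist test statistic 10.93, and the critical point $F_{0.01} = 4.88$ ($\alpha = 0.01$); right panel, F distribution with 4 and 81 degrees of freedom, marking the interaction test statistic 1.97 and the critical point $F_{0.05} = 2.48$ ($\alpha = 0.05$)]
Both main-effect test statistics (8.94 and 10.93) exceed $F_{0.01} = 4.88$, while the interaction test statistic (1.97) falls below $F_{0.05} = 2.48$.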
Overall Significance Level and Tukey Method for Two-Way ANOVA
Kimball’s Inequality gives an upper limit on the true probability of at least one Type I error in the three tests of a two-way analysis:
$$\alpha \le 1 - (1 - \alpha_1)(1 - \alpha_2)(1 - \alpha_3)$$
Tukey Criterion for factor A:
$$T = q_\alpha\sqrt{\frac{MSE}{bn}}$$
where the degrees of freedom of the q distribution are now a and ab(n - 1). Note that MSE is divided by bn.
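Both formulas on this slide are one-liners. In the sketch below, the significance levels and the studentized-range value `q_crit` are placeholders: q must be looked up for a and ab(n - 1) degrees of freedom, which the Python standard library does not provide. The function names are illustrative.

```python
import math


def kimball_bound(*alphas):
    """Upper limit on P(at least one Type I error) across several tests:
    1 - (1 - alpha_1)(1 - alpha_2)...(1 - alpha_k)."""
    prod = 1.0
    for alpha in alphas:
        prod *= 1.0 - alpha
    return 1.0 - prod


def tukey_criterion_factor_a(q_crit, mse, b, n):
    """Tukey criterion for comparing factor A level means: T = q * sqrt(MSE / (b n))."""
    return q_crit * math.sqrt(mse / (b * n))


# Three tests (A, B, AB), each at alpha = 0.05.
print(round(kimball_bound(0.05, 0.05, 0.05), 4))  # 0.1426

# Placeholder inputs: q_crit = 3.0 is NOT a table value; look up q_alpha(a, ab(n-1)).
print(tukey_criterion_factor_a(q_crit=3.0, mse=2.0, b=2, n=2))
```

The bound shows why running three tests at 0.05 each gives an overall Type I error rate that may be as high as about 0.14.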
Context
• Four different treatments applied to groups of cattle

• Five brands of petrol

• Four training methods

• Comparison of three business schools

• Absorption in five different concrete aggregates

Thank you for your time and participation

Acknowledgements
