
STAT3010: Lecture 10

TWO-WAY ANALYSIS OF VARIANCE


Recall: In the last lecture we looked at two-way ANOVA with equal sample sizes in each cell. What if the cell sample sizes are unequal?
Two-way ANOVA with unequal cell numbers ("unbalanced designs")

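Unbalanced designs are much more common than equal cell numbers because:
o in observational studies you often can't control the number of observations falling into particular cells;
o even in randomized experiments you often have missing data, e.g., an experimental unit dies.
With unequal cell numbers, the estimates of treatment effects are no longer simple functions of row means, column means, etc.
The sums of squares (SS) also no longer decompose neatly. The problem is a lack of orthogonality, i.e., the factor effects (predictor variables) are no longer independent of each other, so the SS attributed to one factor depends on which other factors are already in the model.
"Solution": with SAS PROC GLM, use the Type III SS. These test hypotheses about unweighted averages of the cell means, so they do not depend on the order in which the factors are listed in the MODEL statement.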

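One standard way to state what the different SS types measure uses reduction notation: $R(\beta \mid \mu, \alpha)$ denotes the increase in the model SS when the $\beta$ effects are added to a model that already contains $\mu$ and the $\alpha$ effects. For a model with row effects $\alpha$, column effects $\beta$ and interaction $\alpha\beta$, the SS that PROC GLM reports can be summarized as follows (a textbook summary, not taken from this lecture; the Type III reductions assume sum-to-zero restrictions on the effects):

```latex
\begin{aligned}
\text{Type I (sequential):}\quad & R(\alpha \mid \mu),\quad R(\beta \mid \mu, \alpha),\quad R(\alpha\beta \mid \mu, \alpha, \beta)\\
\text{Type II:}\quad & R(\alpha \mid \mu, \beta),\quad R(\beta \mid \mu, \alpha),\quad R(\alpha\beta \mid \mu, \alpha, \beta)\\
\text{Type III:}\quad & R(\alpha \mid \mu, \beta, \alpha\beta),\quad R(\beta \mid \mu, \alpha, \alpha\beta),\quad R(\alpha\beta \mid \mu, \alpha, \beta)
\end{aligned}
```

Only the Type I SS are guaranteed to add up to the Model SS, and they change if the order of the factors in the MODEL statement changes. With equal cell sizes all three types coincide, and in every type the interaction SS is the same because the interaction is always adjusted for both main effects.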
Example of a two-way unbalanced fixed-effects ANOVA:

The following data on the IQ of adopted children are taken from Ramsey and Schafer, "Statistical Sleuth". IQ scores were recorded for adopted children classified by the socioeconomic status of their adoptive parents (high or low) and of their biological parents (high or low). A total of 38 IQ scores were collected:

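                            Adoptive Parents
                     High                         Low
Biological  High     131  93 120  87 127           94 130 108  81
Parents              105 111  71  68  88          106 112  69  79
            Low      118 106  85  98 117          100  74  69  87  99
                      94 111  87  99 101           98  66  73  65 109

Cell sizes: 10 (Biological High, Adoptive High), 8 (Biological High, Adoptive Low), 10 (Biological Low, Adoptive High), 10 (Biological Low, Adoptive Low).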

Conduct a test to determine whether IQ score is affected by the socioeconomic status of the child's adoptive parents and of the child's biological parents.
SAS CODE:
data unbal;
/* one line per child: IQ, adoptive SES (H/L), biological SES (H/L) */
input IQ adoptive $ biological $ ;
cards;
131 H H
105 H H
93 H H
111 H H
120 H H
71 H H
87 H H
68 H H
127 H H
88 H H
94 L H
106 L H
130 L H
112 L H
108 L H
69 L H
81 L H
79 L H
118 H L
94 H L
106 H L
111 H L
85 H L
87 H L
98 H L
99 H L
117 H L
101 H L
100 L L
98 L L
74 L L
66 L L
69 L L
73 L L
87 L L
65 L L
99 L L
109 L L
;
proc glm ;
class adoptive biological ;
model IQ = adoptive biological adoptive*biological ;
lsmeans adoptive*biological/stderr;
run ;

SAS OUTPUT:
The SAS System
The GLM Procedure

Class Level Information
Class         Levels   Values
adoptive           2   H L
biological         2   H L

Number of Observations Read   38
Number of Observations Used   38

Dependent Variable: IQ
                             Sum of
Source             DF       Squares   Mean Square   F Value   Pr > F
Model               3    1933.03553     644.34518      2.00   0.1330
Error              34   10973.17500     322.74044
Corrected Total    37   12906.21053

R-Square   Coeff Var   Root MSE    IQ Mean
0.149776    18.77528   17.96498   95.68421

Source                DF     Type I SS   Mean Square   F Value   Pr > F
adoptive               1   1126.716082   1126.716082      3.49   0.0703
biological             1    285.694444    285.694444      0.89   0.3534
adoptive*biological    1    520.625000    520.625000      1.61   0.2127

Source                DF   Type III SS   Mean Square   F Value   Pr > F
adoptive               1   972.0132353   972.0132353      3.01   0.0917
biological             1   331.8014706   331.8014706      1.03   0.3178
adoptive*biological    1   520.6250000   520.6250000      1.61   0.2127

Least Squares Means
                                     Standard
adoptive   biological   IQ LSMEAN       Error   Pr > |t|
H          H           100.100000    5.681025     <.0001
H          L           101.600000    5.681025     <.0001
L          H            97.375000    6.351579     <.0001
L          L            84.000000    5.681025     <.0001

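Because every cell of this 2 x 2 design has its own LSMEAN, much of the output above can be reproduced by hand. The sketch below (standard-library Python; variable and function names are illustrative, not from the lecture) uses the cell data from the data step. It recomputes each LSMEAN and its standard error as Root MSE / sqrt(n), obtains each Type III SS as the SS of a single-df contrast with coefficients of +1 or -1 on the four cell means, $\hat{L}^2 / \sum_k c_k^2 / n_k$, and, for comparison, computes the Type I SS for adoptive, which compares the pooled adoptive-High and adoptive-Low means while ignoring biological status.

```python
import math

# Cell data from the data step above; key = (adoptive, biological).
cells = {
    ("H", "H"): [131, 105, 93, 111, 120, 71, 87, 68, 127, 88],
    ("H", "L"): [118, 94, 106, 111, 85, 87, 98, 99, 117, 101],
    ("L", "H"): [94, 106, 130, 112, 108, 69, 81, 79],
    ("L", "L"): [100, 98, 74, 66, 69, 73, 87, 65, 99, 109],
}
MSE = 322.74044  # Error mean square from the ANOVA table above

n = {k: len(v) for k, v in cells.items()}
mean = {k: sum(v) / len(v) for k, v in cells.items()}  # = IQ LSMEAN of each cell
se = {k: math.sqrt(MSE / n[k]) for k in cells}         # = Root MSE / sqrt(n)

def contrast_ss(coef):
    """SS for the single-df contrast sum_k coef[k] * mean[k]."""
    estimate = sum(coef[k] * mean[k] for k in cells)
    return estimate ** 2 / sum(coef[k] ** 2 / n[k] for k in cells)

HH, HL, LH, LL = ("H", "H"), ("H", "L"), ("L", "H"), ("L", "L")
ss3_adoptive = contrast_ss({HH: 1, HL: 1, LH: -1, LL: -1})     # adoptive H vs L
ss3_biological = contrast_ss({HH: 1, HL: -1, LH: 1, LL: -1})   # biological H vs L
ss3_interaction = contrast_ss({HH: 1, HL: -1, LH: -1, LL: 1})  # difference of differences

# Type I SS for adoptive (listed first in MODEL): biological is ignored,
# so the two adoptive groups are compared through their pooled means.
def pooled(level):
    ys = [y for (a, _), v in cells.items() if a == level for y in v]
    return len(ys), sum(ys) / len(ys)

(n_h, m_h), (n_l, m_l) = pooled("H"), pooled("L")
ss1_adoptive = n_h * n_l / (n_h + n_l) * (m_h - m_l) ** 2

for k in cells:
    print(k, f"LSMEAN = {mean[k]:.3f}", f"SE = {se[k]:.6f}")
print(f"Type III SS: {ss3_adoptive:.4f}  {ss3_biological:.4f}  {ss3_interaction:.4f}")
print(f"Type I SS for adoptive: {ss1_adoptive:.4f}")
```

The values agree with the output above (Type III: 972.01, 331.80, 520.63; Type I for adoptive: 1126.72). The two SS for adoptive differ because the pooled means weight the cells by their unequal sizes, while the Type III contrast weights the four cell means equally.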
Step 1:
Interaction Effect hypothesis:

Test Statistic:
Decision:
Conclusion:

Step 2:
Main Effects hypotheses:

Test Statistics:

Decisions:
Conclusions:
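Each test statistic in these steps is an F ratio, F = (effect mean square) / MSE, compared with an F distribution with (effect df, error df) = (1, 34) degrees of freedom. SAS prints Pr > F directly; the sketch below shows where those p-values come from by recomputing them from the Type III mean squares. It assumes only the Python standard library: the F upper-tail probability is written as a regularized incomplete beta function, $P(F > f) = I_{\nu_2/(\nu_2+\nu_1 f)}(\nu_2/2,\, \nu_1/2)$, evaluated with the continued-fraction method described in Numerical Recipes. The function names are hypothetical; in practice a statistics library or an F table would be used.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # converged after the odd term
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_pvalue(f, df1, df2):
    """Upper-tail probability P(F > f) for an F(df1, df2) random variable."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

MSE, DF_ERROR = 322.74044, 34   # from the IQ ANOVA table
type3_ms = {                    # Type III mean squares, each on 1 df
    "adoptive": 972.0132353,
    "biological": 331.8014706,
    "adoptive*biological": 520.6250000,
}
pvalues = {}
for source, ms in type3_ms.items():
    f_stat = ms / MSE
    pvalues[source] = f_pvalue(f_stat, 1, DF_ERROR)
    print(f"{source:20s} F = {f_stat:5.2f}   Pr > F = {pvalues[source]:.4f}")
```

H0 is rejected at level α when Pr > F is below α; the computed p-values should match the Type III column of the output (0.0917, 0.3178, 0.2127) up to rounding.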


Example:
Consider the following data taken from Afifi and Azen (1972, p. 166). The authors present an experiment designed to evaluate the effect of 4 drugs crossed with 3 experimentally induced diseases. Each drug-disease combination was applied to 6 randomly selected dogs, which gives a balanced design; however, some of the measurements are missing (the SAS output below shows 72 observations read but only 58 used), so the data actually analyzed are unbalanced. The measurement (y) to be analyzed is the increase in systolic blood pressure (mmHg) due to the treatment.
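Increase in systolic blood pressure (mmHg); missing measurements omitted

Drug   Disease 1             Disease 2             Disease 3
1      42 44 36 13 19 22     33 26 33 21           31 -3 25 25 24
2      28 23 34 42 13        34 33 31 36           3 26 28 32 4 16
3      1 29 19               11 9 7 1 -6           21 1 9 3
4      24 9 22 -2 15         27 12 12 -5 16 15     22 7 25 5 12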

Can one conclude on the basis of these data that there is an interaction between drug and disease? Can one conclude that the different diseases have different effects? Can one conclude that the drugs differ in effectiveness? If either drug or disease affects systolic blood pressure, determine specifically which drugs/diseases differ, using the LSMEANs and the Tukey procedure.


SAS CODE:
options ps=62 ls=80;
title 'Unbalanced Two-Way Analysis of Variance';
data dog;
input drug disease @;
do i=1 to 6;
input y @;
output;
end;
datalines;
1 1 42 44 36 13 19 22
1 2 33 . 26 . 33 21
1 3 31 -3 . 25 25 24
2 1 28 . 23 34 42 13
2 2 . 34 33 31 . 36
2 3 3 26 28 32 4 16
3 1 . . 1 29 . 19
3 2 . 11 9 7 1 -6
3 3 21 1 . 9 3 .
4 1 24 . 9 22 -2 15
4 2 27 12 12 -5 16 15
4 3 22 7 25 5 12 .
;
run;
PROC GLM;
class drug disease;
model y=drug disease drug*disease / ss1 ss2 ss3 ss4;
run;
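The DATA step above reads one line per drug-disease cell, uses the trailing @ inside the DO loop to hold the line while it reads six responses, and writes one observation per dog with OUTPUT. A rough Python equivalent (a hypothetical illustration, not part of the SAS program) makes that bookkeeping explicit: it expands the 12 lines into 72 records, and the records whose y is missing (".") are the ones PROC GLM drops from the analysis.

```python
# Hypothetical Python re-creation of the DATA step (not part of the SAS program).
# Each line: drug, disease, then six responses; "." marks a missing value.
DATALINES = """\
1 1 42 44 36 13 19 22
1 2 33 . 26 . 33 21
1 3 31 -3 . 25 25 24
2 1 28 . 23 34 42 13
2 2 . 34 33 31 . 36
2 3 3 26 28 32 4 16
3 1 . . 1 29 . 19
3 2 . 11 9 7 1 -6
3 3 21 1 . 9 3 .
4 1 24 . 9 22 -2 15
4 2 27 12 12 -5 16 15
4 3 22 7 25 5 12 .
"""

n_read = 0   # observations written by OUTPUT (SAS: "Number of Observations Read")
rows = []    # (drug, disease, y) with y present (SAS: "Number of Observations Used")
for line in DATALINES.splitlines():
    drug, disease, *responses = line.split()
    for token in responses:   # mirrors: do i=1 to 6; input y @; output; end;
        n_read += 1
        if token != ".":      # PROC GLM excludes observations with missing y
            rows.append((int(drug), int(disease), float(token)))

n_used = len(rows)
y_mean = sum(y for _, _, y in rows) / n_used
print(n_read, n_used, round(y_mean, 5))  # prints: 72 58 18.87931
```

These three numbers match "Number of Observations Read", "Number of Observations Used" and "y Mean" in the SAS output.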
SAS OUTPUT:
Unbalanced Two-Way Analysis of Variance
The GLM Procedure

Class Level Information
Class      Levels   Values
drug            4   1 2 3 4
disease         3   1 2 3

Number of Observations Read   72
Number of Observations Used   58

Dependent Variable: y
                             Sum of
Source             DF       Squares   Mean Square   F Value   Pr > F
Model              11   4259.338506    387.212591      3.51   0.0013
Error              46   5080.816667    110.452536
Corrected Total    57   9340.155172

R-Square   Coeff Var   Root MSE     y Mean
0.456024    55.66750   10.50964   18.87931

Source          DF     Type I SS   Mean Square   F Value   Pr > F
drug             3   3133.238506   1044.412835      9.46   <.0001
disease          2    418.833741    209.416870      1.90   0.1617
drug*disease     6    707.266259    117.877710      1.07   0.3958

Source          DF    Type II SS   Mean Square   F Value   Pr > F
drug             3   3063.432863   1021.144288      9.25   <.0001
disease          2    418.833741    209.416870      1.90   0.1617
drug*disease     6    707.266259    117.877710      1.07   0.3958

Source          DF   Type III SS   Mean Square   F Value   Pr > F
drug             3   2997.471860    999.157287      9.05   <.0001
disease          2    415.873046    207.936523      1.88   0.1637
drug*disease     6    707.266259    117.877710      1.07   0.3958

Source          DF    Type IV SS   Mean Square   F Value   Pr > F
drug             3   2997.471860    999.157287      9.05   <.0001
disease          2    415.873046    207.936523      1.88   0.1637
drug*disease     6    707.266259    117.877710      1.07   0.3958
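The four tables above illustrate the lack of orthogonality. The sketch below (numbers copied from the output; variable names are illustrative) checks two things: the Type I SS add up exactly to the Model SS (4259.338506) while the Type III SS do not, and each F value is the mean square (SS/DF) divided by MSE = 110.452536. Note also that only the drug SS changes between Type I and Type II, since disease is entered after drug in the MODEL statement, and that the interaction SS is identical in all four types.

```python
model_ss, mse = 4259.338506, 110.452536   # Model SS and Error MS from the output
df = {"drug": 3, "disease": 2, "drug*disease": 6}
type1 = {"drug": 3133.238506, "disease": 418.833741, "drug*disease": 707.266259}
type3 = {"drug": 2997.471860, "disease": 415.873046, "drug*disease": 707.266259}

total1 = sum(type1.values())   # sequential SS partition the Model SS
total3 = sum(type3.values())   # adjusted SS generally do not
print(f"Model SS           = {model_ss:.6f}")
print(f"sum of Type I SS   = {total1:.6f}")
print(f"sum of Type III SS = {total3:.6f}")

f3 = {}
for src in type3:
    ms = type3[src] / df[src]          # mean square = SS / DF
    f3[src] = ms / mse                 # F value = MS / MSE
    print(f"{src:13s} MS = {ms:11.6f}   F = {f3[src]:.2f}")
```

The Type III total is about 4120.61, noticeably less than the Model SS, while the F values reproduce the Type III column (9.05, 1.88, 1.07).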

First question (Can one conclude on the basis of these data that there is an interaction between drug and disease?):
Interaction Effect hypothesis:

Test Statistic:
Decision:
Conclusion:


Second question (Can one conclude that the different diseases have different effects?):
Disease Main Effects hypothesis:

Test Statistic:
Decision:
Conclusion:

Third question (Can one conclude that the drugs differ in effectiveness?):
Drug Main Effects hypothesis:

Test Statistic:
Decision:
Conclusion:

Fourth question (If either drug or disease affects systolic blood pressure, determine specifically which drugs/diseases differ, using the LSMEANs and the Tukey procedure.):


Note: We have determined that drug type has a significant effect on the systolic blood pressure (SBP) of the dogs, so we want to see specifically which drugs differ in their effect on SBP. Is it drug 1, 2, 3 or 4? To check this, we add the following statements to the end of our previous SAS code:
lsmeans drug / pdiff=all adjust=tukey;
run;
We obtain the following SAS OUTPUT:
Unbalanced Two-Way Analysis of Variance
The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Tukey-Kramer

                         LSMEAN
drug       y LSMEAN      Number
1        25.9944444           1
2        26.5555556           2
3         9.7444444           3
4        13.5444444           4

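The LSMEAN for a drug is the unweighted average of its three drug-by-disease cell means, whereas the raw mean pools all dogs given that drug and therefore weights each cell by its (unequal) size. The sketch below (cells transcribed from the datalines with missing values removed; names are illustrative) reproduces the y LSMEAN column above and sets it beside the raw means.

```python
# Drug x disease cells from the datalines, missing values removed; key = (drug, disease).
cells = {
    (1, 1): [42, 44, 36, 13, 19, 22], (1, 2): [33, 26, 33, 21], (1, 3): [31, -3, 25, 25, 24],
    (2, 1): [28, 23, 34, 42, 13], (2, 2): [34, 33, 31, 36], (2, 3): [3, 26, 28, 32, 4, 16],
    (3, 1): [1, 29, 19], (3, 2): [11, 9, 7, 1, -6], (3, 3): [21, 1, 9, 3],
    (4, 1): [24, 9, 22, -2, 15], (4, 2): [27, 12, 12, -5, 16, 15], (4, 3): [22, 7, 25, 5, 12],
}
cell_mean = {k: sum(v) / len(v) for k, v in cells.items()}

lsmeans, raw_means = {}, {}
for drug in (1, 2, 3, 4):
    # LSMEAN: unweighted average of the three drug-by-disease cell means
    lsmeans[drug] = sum(cell_mean[(drug, dis)] for dis in (1, 2, 3)) / 3
    # Raw mean: pools every dog on this drug, so larger cells count more
    ys = [y for (d, _), v in cells.items() if d == drug for y in v]
    raw_means[drug] = sum(ys) / len(ys)
    print(f"drug {drug}: LSMEAN = {lsmeans[drug]:.7f}   raw mean = {raw_means[drug]:.4f}")
```

For these data the two summaries rank drugs 1 and 2 in opposite orders (LSMEANs 25.99 vs 26.56, raw means 26.07 vs 25.53), one reason comparisons in an unbalanced design are made on the LSMEANs rather than the raw means.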
Least Squares Means for effect drug
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: y

i/j         1         2         3         4
1                0.9989    0.0016    0.0107
2      0.9989              0.0011    0.0071
3      0.0016    0.0011              0.7870
4      0.0107    0.0071    0.7870

Explanation:


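Multiple Comparisons in Two-way ANOVA with equal cell numbers ("balanced design", done by hand)
The Tukey Method:
Let I be the number of levels of the row factor, J the number of levels of the column factor, and K the sample size for each treatment (cell).
Calculate each overall column mean $\bar{x}_{\cdot 1}, \bar{x}_{\cdot 2}, \ldots, \bar{x}_{\cdot J}$, as well as each overall row mean $\bar{x}_{1\cdot}, \bar{x}_{2\cdot}, \ldots, \bar{x}_{I\cdot}$, and the grand mean $\bar{x}_{\cdot\cdot}$.
The quantity $\hat{\alpha}_i = \bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}$ is called the row effect.
The quantity $\hat{\beta}_j = \bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot}$ is called the column effect.
For every pair of row levels i and i' for which
$$|\hat{\alpha}_i - \hat{\alpha}_{i'}| > q_{\alpha;\, I,\, IJ(K-1)} \sqrt{\frac{MSE}{JK}},$$
the null hypothesis $H_0: \alpha_i - \alpha_{i'} = 0$ is rejected at level $\alpha$.
For every pair of column levels j and j' for which
$$|\hat{\beta}_j - \hat{\beta}_{j'}| > q_{\alpha;\, J,\, IJ(K-1)} \sqrt{\frac{MSE}{IK}},$$
the null hypothesis $H_0: \beta_j - \beta_{j'} = 0$ is rejected at level $\alpha$.
*Use Table B.6 for the critical values q in the formulas above.*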
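The Tukey rule for a balanced design can be written as a small function. This is a sketch under the assumptions of this section: the caller supplies the row (or column) means, MSE, the number of observations behind each mean (JK for rows, IK for columns), and the critical value q read from Table B.6, since the function does not compute studentized-range quantiles. The function name and arguments are hypothetical. Because the grand mean cancels, $|\hat{\alpha}_i - \hat{\alpha}_{i'}| = |\bar{x}_{i\cdot} - \bar{x}_{i'\cdot}|$.

```python
import math
from itertools import combinations

def tukey_pairs(means, mse, n_per_mean, q_crit):
    """Pairwise Tukey comparisons for one factor in a balanced two-way ANOVA.

    means:      row means xbar_i. (or column means xbar_.j), one per level
    mse:        error mean square from the two-way ANOVA table
    n_per_mean: observations behind each mean, J*K for rows or I*K for columns
    q_crit:     q_{alpha; levels, IJ(K-1)} looked up in Table B.6
    Returns (level_a, level_b, |effect_a - effect_b|, reject_H0) for each pair.
    """
    grand = sum(means) / len(means)              # xbar.. (valid for balanced data)
    effects = [m - grand for m in means]         # estimated row (or column) effects
    hsd = q_crit * math.sqrt(mse / n_per_mean)   # Tukey's critical difference
    results = []
    for a, b in combinations(range(len(means)), 2):
        diff = abs(effects[a] - effects[b])
        results.append((a + 1, b + 1, diff, diff > hsd))
    return results
```

For the row factor, pass the I row means with n_per_mean = J*K and q for I means; for the column factor, pass the J column means with n_per_mean = I*K and q for J means.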
