# Introduction to Design and Analysis of Experiments with the SAS System (Stat 7010 Lecture Notes

)

Asheber Abebe Discrete and Statistical Sciences Auburn University

ii

Contents

1 Completely Randomized Design 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . 1.2 The Fixed Eﬀects Model . . . . . . . . . . . . . . . 1.2.1 Decomposition of the Total Sum of Squares 1.2.2 Statistical Analysis . . . . . . . . . . . . . . 1.2.3 Comparison of Individual Treatment Means 1.3 The Random Eﬀects Model . . . . . . . . . . . . . 1.4 More About the One-Way Model . . . . . . . . . . 1.4.1 Model Adequacy Checking . . . . . . . . . 1.4.2 Some Remedial Measures . . . . . . . . . . 1 1 2 2 3 5 10 18 18 23 25 25 25 26 27 29 30 30 36 38 39 41 41 42 48 51 51 55 55 57 57 59 60 70 75 79

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

2 Randomized Blocks 2.1 The Randomized Complete Block Design . . . . . . 2.1.1 Introduction . . . . . . . . . . . . . . . . . . 2.1.2 Decomposition of the Total Sum of Squares . 2.1.3 Statistical Analysis . . . . . . . . . . . . . . . 2.1.4 Relative Eﬃciency of the RCBD . . . . . . . 2.1.5 Comparison of Treatment Means . . . . . . . 2.1.6 Model Adequacy Checking . . . . . . . . . . 2.1.7 Missing Values . . . . . . . . . . . . . . . . . 2.2 The Latin Square Design . . . . . . . . . . . . . . . 2.2.1 Statistical Analysis . . . . . . . . . . . . . . . 2.2.2 Missing Values . . . . . . . . . . . . . . . . . 2.2.3 Relative Eﬃciency . . . . . . . . . . . . . . . 2.2.4 Replicated Latin Square . . . . . . . . . . . . 2.3 The Graeco-Latin Square Design . . . . . . . . . . . 2.4 Incomplete Block Designs . . . . . . . . . . . . . . . 2.4.1 Balanced Incomplete Block Designs (BIBD’s) 2.4.2 Youden Squares . . . . . . . . . . . . . . . . . 2.4.3 Other Incomplete Designs . . . . . . . . . . . 3 Factorial Designs 3.1 Introduction . . . . . . . . . . . . . 3.2 The Two-Factor Factorial Design . 3.2.1 The Fixed Eﬀects Model . . 3.2.2 Random and Mixed Models 3.3 Blocking in Factorial Designs . . . 3.4 The General Factorial Design . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

iii

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Nesting in the Design Structure . .2. . . .1. . . . . .
. . . . . . . . . 6. . . . . . . . . . . . .2. . . . .1. . . . . . . . .1 The Mixed RCBD . . . . . . . . Factors . . . . . . . . . . . . . . . . . . . . . .3 One-way RM Design : General . . . . . . . .3 The General 2k Design . . . .
. . . . . . .
. . . . . .
. . .
. . . . . . . . . . . .
. . . .
. . . . .
. . . . . . . . . 4. .
. . . . . . .3 Tests of Hypotheses . . . . . . . . . . . 7. .2. . .4 The Unreplicated 2k Design 4. . . . . . . .
. . . 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Trend Analyses in One. .
8 Nested Designs 181 8. .1. . . . .
. . . . . .1 Estimation : The Method of Least Squares . . . . . . . . .2 The 23 Design . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Single Factor Designs with One Covariate . . . One Covariate . . . . . . . . . . . . . . . . . . . . . 4. . . .2. . .1 The 22 Design . . . . . . . . . . . . . . . . . . . 6. 7. . . . . . . . . . . . . . . . . . . . . . . .
. . . . . .1. . . . .2. . . . . . . . . . . . . . . . . . .1 Regression Components of the Between Treatment SS (SSB ) . . . .
.and Two-way RM Designs . . . . . . . . . . . . . . One Covariate . . . . . . .2 The 2k Factorial Design . . . . . 7. . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .iv 4 2k and 3k Factorial Designs 4. . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 The Huynh-Feldt Sphericity (S) Structure 5. .1. . . . . . . . . . . . . . . . . . . .2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 The 3k Design . . . . . . .1 Introduction . . . . . . .
. . .
. . . . . . . . . . . . .
. . . . . . . .5. . . . . . . 6. . . . . . .
. . .2 Nesting in the Treatment Structure . . . . .
. . . . . . . . . . . . . . . . . . .5 The Johnson-Neyman Technique: Heterogeneous Slopes 7. . . . . .1 Two Groups. . . . . . .
. . . . . . . .4 Two-Way Repeated Measurement Designs with Repeated Measures on 7 Introduction to the Analysis of Covariance 7. . . . . . . . . . .2 RM Designs . . . . . . . . . . . . .
. . .
6 More on Repeated Measurement Designs 6. . . . . . . . . . . . . . . . . 7. . . . . .3 Crossover Designs . . .3 ANCOVA in Randomized Complete Block Designs . . . . . . . . . .
. .2 The One-Way RM Design : (S) Structure 5. . . 7. . . . .1 Introduction . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . . .
. . . . . . 5. 5. . . . . . . . .
. . . . . . . . . . . . . . . . . . . . .5. . . . . . . . . . . .2. . .
. . . . . . . . . . . . .1. . . . . . . . . . . . . . .2 Multiple Groups. . . . . . . . . . . . . . . .
. . . . . 7. .
CONTENTS 83 83 83 83 86 89 90 99 101 101 101 103 104 104 107 109 129 129 129 134 140 146 151 157 157 157 158 158 162 166 170 174 174 180
. . . . . . . . . . 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4. . . . . . . . . . . . . . . .1 Simple Linear Regression . . . . . . . . . . 6. . . . . . . . . . .2 One-Way Repeated Measurement Designs . . .
. . . . . . . . . . . . 181 8. . . . . .
. . . . . . . . 4. . 185
. . . . . . . . . .2 Partitioning the Total SS . . . . . . .
5 Repeated Measurement Designs 5. . . . .
. . . .4 ANCOVA in Two-Factor Designs . . . . . . . . . . . . . . . . .3 Two-Way Repeated Measurement Designs . . . . . . . . . . 5. . . . . . . . . 7. . . . . . . . . . 6. . . 4. . . .2 The Split-Plot Design . . . . . . . . . . . . . . . . . . Both . . . . . . .

. The goal here is to test hypotheses about the treatment means and estimate the model parameters (µ. • τi is a parameter unique to the ith treatment (ith treatment eﬀect). · · · .1 Introduction
Suppose we have an experiment which compares k treatments or k levels of a single factor. · · · . Suppose we have n experimental units to be included in the experiment. τi . k.
i = 1. . assign the second treatment to n2 units randomly selected from the remaining n − n1 units. yknk
The model in Equation (1. We shall describe the observations using the linear statistical model yij = µ + τi + where • yij is the jth observation on treatment i. • µ is a parameter common to all treatments (overall mean). The model is called the one-way classiﬁcation analysis of variance (one-way ANOVA). . The typical data layout for a one-way ANOVA is shown below: Treatment 2 ··· y21 y21 . ni .Chapter 1
Completely Randomized Design
1. 1
.
In this model the random errors are assumed to be normally and independently distributed with mean zero and variance σ 2 . . y1n1
k yk1 yk1 . j = 1. and so on until the kth treatment is assigned to the ﬁnal nk units. and σ 2 ). We can assign the ﬁrst treatment to n1 units randomly selected from among the n. Such an experimental design is called a completely randomized design (CRD). Conclusions reached here only apply to the treatments considered and cannot be extended to other treatments that were not in the study. Fixed Eﬀects Model : The k treatments could have been speciﬁcally chosen by the experimenter.
(1.1)
is a random error component. . which is assumed constant for all treatments.1) describes two diﬀerent situations : 1. and •
ij ij . y2n2
1 y11 y11 . .

we are not interested in the particular ones in the model. Random : A scientist is interested in the way a fungicide works. He selects. H0 HA An equivalent set of hypotheses is H0 HA : τ1 = τ2 = · · · = τk = 0 : τi = 0 for at least one i : µ1 = µ2 = · · · = µk : µi = µj for at least one i. Further.2
CHAPTER 1. The total sum of squares. Fixed : A scientist develops three new fungicides. Random : Select.. may be decomposed as
. = ¯
1 n
k
ni
yij
i=1 j=1
The total sum of squares (corrected) given by
k ni
SST =
i=1 j=1
(yij − y. The τi are random variables. i = 1. three fungicides from a group of similar fungicides to study the action. · · · . SST . Random Eﬀects Model : The k treatments could be a random sample from a larger population of treatments. k. ¯
measures the total variability in the data. The treatment eﬀects. We test hypotheses about the variability of τi . so that
k
τi = 0 . COMPLETELY RANDOMIZED DESIGN 2. We are interested in testing the equality of the k treatment means. = ¯ 1 ni
ni
yij . Fixed : Conduct an experiment to obtain information about four speciﬁc soil types.
j=1
y. at random.. four soil types to represent all soil types. His interest is in these fungicides only. are expressed as deviations from the overall mean. Conclusions here extend to all the treatments in the population.1
Decomposition of the Total Sum of Squares
k i=1
In the following let n =
ni . Random : Choose ﬁve machines to represent machines as a class.2. at random. thus. Here are a few examples taken from Peterson : Design and Analysis of Experiments: 1. let yi. µi = E(yij ) = µ + τi .
i=1
Denote by µi the mean of the ith treatment. j
1. Fixed : Measure the rate of production of ﬁve particular machines. τi . )2 .2
The Fixed Eﬀects Model
In this section we consider the ANOVA for the ﬁxed eﬀects model.
1. 3. 2.

1.2. THE FIXED EFFECTS MODEL

3

k

ni

k

k

ni

(yij − y.. )2 = ¯

i=1 j=1 i=1

ni (¯i. − y.. )2 + y ¯

i=1 j=1

(yij − yi. )2 ¯

**The proof is left as an exercise. We will write SST = SSB + SSW ,
**

i ni (¯i. −¯.. )2 is called the between treatments sum of squares and SSW = i=1 j=1 (yij − y y where SSB = 2 yi. ) is called the within treatments sum of squares. ¯ One can easily show that the estimate of the common variance σ 2 is SSW /(n − k). Mean squares are obtained by dividing the sum of squares by their respective degrees of freedoms as

k i=1

k

n

M SB = SSB /(k − 1),

M SW = SSW /(n − k) .

1.2.2

Testing

Statistical Analysis

Since we assumed that the random errors are independent, normal random variables, it follows by Cochran’s Theorem that if the null hypothesis is true, then F0 = if F0 > Fk−1,n−k (α) . The following ANOVA table summarizes the test procedure: Source Between Within (Error) Total Estimation Once again consider the one-way classiﬁcation model given by Equation (1.1). We now wish to estimate the model parameters (µ, τi , σ 2 ). The most popular method of estimation is the method of least squares (LS) which determines the estimators of µ and τi by minimizing the sum of squares of the errors

k ni 2 ij i=1 j=1 k ni

M SB M SW

follows an F distribution with k − 1 and n − k degrees of freedom. Thus an α level test of H0 rejects H0

df k−1 n−k n−1

SS SSB SSW SST

MS M SB M SW

F0 F0 = M SB /M SW

L=

=

i=1 j=1

(yij − µ − τi )2 .

Minimization of L via partial diﬀerentiation provides the estimates µ = y.. and τi = yi. − y.. , for ˆ ¯ ˆ ¯ ¯ i = 1, · · · , k. By rewriting the observations as yij = y.. + (¯i. − y.. ) + (yij − yi. ) ¯ y ¯ ¯ one can easily observe that it is quite reasonable to estimate the random error terms by eij = yij − yi. . ¯ These are the model residuals.

4

CHAPTER 1. COMPLETELY RANDOMIZED DESIGN Alternatively, the estimator of yij based on the model (1.1) is yij = µ + τi , ˆ ˆ ˆ

which simpliﬁes to yij = yi. . Thus, the residuals are yij − yij = yij − yi. . ˆ ¯ ˆ ¯ An estimator of the ith treatment mean, µi , would be µi = µ + τi = yi. . ˆ ˆ ˆ ¯ Using M SW as an estimator of σ 2 , we may provide a 100(1 − α)% conﬁdence interval for the treatment mean, µi , yi. ± tn−k (α/2) M SW /ni . ¯ A 100(1 − α)% conﬁdence interval for the diﬀerence of any two treatment means, µi − µj , would be yi. − yj. ± tn−k (α/2) ¯ ¯ M SW (1/ni + 1/nj )

We now consider an example from Montgomery : Design and Analysis of Experiments. Example The tensile strength of a synthetic ﬁber used to make cloth for men’s shirts is of interest to a manufacturer. It is suspected that the strength is aﬀected by the percentage of cotton in the ﬁber. Five levels of cotton percentage are considered: 15%, 20%, 25%, 30% and 35%. For each percentage of cotton in the ﬁber, strength measurements (time to break when subject to a stress) are made on ﬁve pieces of ﬁber. 15 7 7 15 11 9 The corresponding ANOVA table is Source Between Within (Error) Total df 4 20 24 SS 475.76 161.20 636.96 MS 118.94 8.06 F0 F0 = 14.76 20 12 17 12 18 18 25 14 18 18 19 19 30 19 25 22 19 23 35 7 10 11 15 11

Performing the test at α = .01 one can easily conclude that the percentage of cotton has a signiﬁcant eﬀect on ﬁber strength since F0 = 14.76 is greater than the tabulated F4,20 (.01) = 4.43. The estimate of the overall mean is µ = y.. = 15.04. Point estimates of the treatment eﬀects are ˆ ¯ τ1 = y1. − y.. = 9.80 − 15.04 = −5.24 ˆ ¯ ¯ τ2 = y2. − y.. = 15.40 − 15.04 = 0.36 ˆ ¯ ¯ τ3 = y3. − y.. = 17.60 − 15.04 = −2.56 ˆ ¯ ¯ τ4 = y4. − y.. = 21.60 − 15.04 = 6.56 ˆ ¯ ¯ τ5 = y5. − y.. = 10.80 − 15.04 = −4.24 ˆ ¯ ¯ A 95% percent CI on the mean treatment 4 is 21.60 ± (2.086) 8.06/5 , which gives the interval 18.95 ≤ µ4 ≤ 24.25.

1.2. THE FIXED EFFECTS MODEL

5

1.2.3

Comparison of Individual Treatment Means

**Suppose we are interested in a certain linear combination of the treatment means, say,
**

k

L=

i=1

li µi ,

**where li , i = 1, · · · , k, are known real numbers not all zero. The natural estimate of L is
**

k k

ˆ L=

i=1

li µi = ˆ

i=1

li yi. . ¯

**Under the one-way classiﬁcation model (1.1), we have : ˆ 1. L follows a N (L, σ 2 2. √
**

ˆ L−L P 2 M SW ( k li /ni ) i=1 k 2 i=1 li /ni ),

**follows a tn−k distribution,
**

k 2 i=1 li /ni ),

ˆ 3. L ± tn−k (α/2) M SW ( 4. An α-level test of

H0 HA is ˆ L M SW (

: L=0 : L=0

k 2 i=1 li /ni )

> tn−k (α/2) .

**A linear combination of all the treatment means
**

k

φ=

i=1

ci µi

is known as a contrast of µ1 , · · · , µk if

k i=1 ci

**= 0. Its sample estimate is ˆ φ=
**

k

ci yi. . ¯

i=1

**Examples of contrasts are µ1 − µ2 and µ1 − µ. Consider r contrasts of µ1 , · · · , µk , called planned comparisons, such as,
**

k k

φi =

s=1

cis µs with

s=1

cis = 0 for i = 1, · · · , r ,

and the experiment consists of H0 : φ 1 = 0 HA : φ1 = 0 ··· ··· H0 : φ r = 0 HA : φr = 0

we wish to test H0 : µi = µj .75 .n−k (α).75 would imply that the corresponding pair of population means are signiﬁcantly diﬀerent.6 Example The most common example is the set of all
k 2
CHAPTER 1.
Example Consider the fabric strength example we considered above. then reject H0 : µi = µj . The ANOVA F -test rejected H0 : µ1 = · · · = µ5 . ¯ ¯ M SW (1/ni + 1/nj )
and comparing it to tn−k (α/2). Stage 2 : Test H0 HA for all
k 2
then declare H0 : µ1 = · · · = µk true and stop. − yj. COMPLETELY RANDOMIZED DESIGN
pairwise tests H0 HA : µi = µj : µi = µj
for 1 ≤ i < j ≤ k of all µ1 . − yj. • if F0 > Fk−1. | y ¯ M SW (1/ni + 1/nj )
• if |tij | < tn−k (α/2). The LSD at α = . | > y ¯ LSD. The Least Signiﬁcant Diﬀerence (LSD) Method Suppose that following an ANOVA F test where the null hypothesis is rejected. • if F0 < Fk−1. This could be done using the t statistic t0 = yi. − yj.n−k (α). 5
Thus any pair of treatment averages that diﬀer by more than 3. then go to Stage 2. where LSD = tn−k (α/2) M SW (1/ni + 1/nj ) .
then accept H0 : µi = µj .06) = 3. • if |tij | > tn−k (α/2).086 2(8. · · · .025) M SW (1/5 + 1/5) = 2.
: µi = µj : µi = µj
pairs with |tij | = |¯i. The following gives a summary of the steps. The 5 = 10 pairwise diﬀerences among the treatment means 2 are
. for all i = j. Stage 1 : Test H0 : µ1 = · · · = µk with F0 = M SB /M SW . An equivalent test declares µi and µj to be signiﬁcantly diﬀerent if |¯i. The experiment consists of all k pairwise tests. An experimentwise 2 error occurs if at least one of the null hypotheses is declared signiﬁcant when H0 : µ1 = · · · = µk is known to be true.05 is LSD = t20 (. µk .

− y3.1. ¯ 21. Scheﬀ´ (1953) has e proposed a method for comparing all possible contrasts between treatment means.6 = 9.2 −6.4 − 10.n−k (α)
M SW
s=1
(c2 /ni ) . ¯ 10.6 = 15. ¯ 15. This is due to the fact that the ANOVA F -test considers all possible comparisons.6 = 9.6 − 10.4 y3. is
Example As an example.6 y4. ¯ ¯ y2. ¯ ¯ = = = = = = = = = = 9.4 − 21. − y4.i =
(k − 1)Fk−1. − y5.6 = 15.8∗
7
Using underlining the result may be summarized as y1.8 = −5. Scheﬀ´’s Method for Comparing all Contrasts e Often we are interested in comparing diﬀerent combinations of the treatment means.8 − 21.8∗ 10. ¯ ¯ y1. · · · . ¯
s=1
Sα.6
As k gets large the experimentwise error becomes large. − y4. where ˆ φi = and
k k
cis ys. Sometimes we also ﬁnd that the LSD fails to ﬁnd any signiﬁcant pairwise diﬀerences while the F -test declares signiﬁcance.8∗ −11.i . ¯ ¯ y1.8 y5. ¯ ¯ y1. ¯ ¯ y4. − y5.6 − 10.8 y2. ¯ ¯ y2.8 − 17. − y5. The Scheﬀ´ method e controls the experimentwise error rate at level α. not just pairwise comparisons.6∗ −7.4 − 17. THE FIXED EFFECTS MODEL y1. − y3. ¯ 17. consider the fabric strength data and suppose that we are interested in the contrasts φ1 = µ1 + µ3 − µ4 − µ5 and φ2 = µ1 − µ4 .6 = 17.8 = 15.6∗ −4. Consider the r contrasts
k k
φi =
s=1
cis µs with
s=1
cis = 0 for i = 1.8 = 17.8 − 15.2.
and the experiment consists of H0 : φ 1 = 0 HA : φ1 = 0 ··· ··· H0 : φ r = 0 HA : φr = 0
The Scheﬀ´ method declares φi to be signiﬁcant if e ˆ |φi | > Sα. − y2.
.8∗ −1.6 − 21.2∗ 4. ¯ ¯ y2.0 −2. r . ¯ 9.8 − 10.4 = 9. ¯ ¯ y3.0∗ 6. − y5. ¯ ¯ y3. − y4.8 = 21.

From the studentized range distribution table.06(1 + 1 + 1 + 1)/5 = 10. a pair of means.23 2 5 5 Using this value.05 = 4.20 (.43) 8.01.01.1 =
(k − 1)Fk−1.00 ¯ ¯ ¯ ¯ and ˆ φ2 = y1.n−k (. we ﬁnd that q4.
. − y4.8 The sample estimates of these contrasts are
CHAPTER 1. The Tukey-Kramer Method The Tukey-Kramer procedure declares two means. | y ¯ exceeds 8. + y3.23.80 .01. we ﬁnd that the following pairs of means do not signiﬁcantly diﬀer: µ1 and µ5 µ5 and µ2 µ2 and µ3 µ3 and µ4 Notice that this result diﬀers from the one reported by the LSD method.1 . ¯ ¯
We compute the Scheﬀ´ 1% critical values as e
k
S. would be declared signiﬁcantly diﬀerent if |¯i. − y5. we conclude that the contrast φ1 = µ1 + µ3 − µ4 − µ5 is not signiﬁcantly diﬀerent ˆ from zero.n−k (α) M SW 1 1 + 2 ni nj . since |φ2 | > S.06 1 1 + = 5. Thus. that is. = −11. the mean strengths of treatments 1 and 4 diﬀer signiﬁcantly.n−k (α) is the α percentile value of the studentized range distribution with k groups and n − k degrees of freedom. we conclude that φ2 = µ1 − µ2 is signiﬁcantly diﬀerent from zero. µi and µj . µi and µj .58 ˆ Since |φ1 | < S. − yj.01)
M SW
s=1
(c2 /n2 ) 2s
= 4(4.
where qk. COMPLETELY RANDOMIZED DESIGN
ˆ φ1 = y1.69 and
k
S.01. = 5. − y4.06(1 + 1)/5 = 7. to be signiﬁcantly diﬀerent if the absolute value of their sample diﬀerences exceeds Tα = qk.05) = 4.01)
M SW
s=1
(c2 /n1 ) 1s
= 4(4. However.37 .n−k (.2 =
(k − 1)Fk−1.43) 8. Example Reconsider the fabric strength example. T.2 .

k. We wish to test H0 : µ1 = · · · = µ5 at . Example Consider the tensile strength example considered above. − y1. Ak be k arbitrary events with P (Ai ) ≥ 1 − α/k. · · · . · · · . − yj. · · · . Our purpose here is to ﬁnd a set of (1 − α)100% simultaneous conﬁdence intervals for the k − 1 pairwise diﬀerences comparing treatment to control. We may use this inequality to make simultaneous inference about linear combinations of treatment means in a one-way ﬁxed eﬀects ANOVA set up. y ¯ for i = 2. A (1 − α)100% simultaneous conﬁdence interval for L1 . Lr is α ˆ Li ± tn−k 2r for i = 1. | exceeds y ¯ 3. THE FIXED EFFECTS MODEL The Bonferroni Procedure
9
We start with the Bonferroni Inequality.05 level of signiﬁcance.n−k (α) M SW (1/ni + 1/n1 ) . HA : µi = µj
k
M SW
j=1
2 lij /nj . · · · . A Bonferroni α-level test of H0 : µ1 = µ2 = · · · = µk is performed by testing H0 : µi = µj with tij |¯i. A2 . | y ¯ α > tn−k M SW (1/ni + 1/nj ) 2 k 2 . · · · . Lr be r linear combinations of µ1 . − yj. k.
. So the test rejects H0 : µi = µj in favor of HA : µi = µj if |¯i.1. r.
Exercise : Use underlining to summarize the results of the Bonferroni testing procedure. for i = 2. Let A1 . r. Then P (A1 ∩ A2 ∩ · · · ∩ Ak ) ≥ 1 − α. ¯ for i = 1.153 .n−k (α) is read from a table. · · · . | > dk−1. L2 . µi − µ1 .153 M SW (2/5) = 5.05/(2 ∗ 10)) = t20 (. The value dk−1. µk are k − 1 treatment means. · · · . vs. There is no need to perform an overall F -test. Dunnett’s Method for Comparing Treatments to a Control Assume µ1 is a control mean and µ2 . This is done using t20 (. Dunnett’s method rejects the null hypothesis H0 : µi = µ1 at level α if |¯i. µk where Li = j=1 lij µj and Li = j=1 lij yj. · · · . The proof of this result is left as an exercise. k k ˆ Let L1 .66 .0025) = 3. · · · .2.
for 1 ≤ i < j ≤ k.

The usual partition of the total sum of squares still holds:
SST = SSB + SSW . Such a model is known as a random eﬀects model. COMPLETELY RANDOMIZED DESIGN
Consider the tensile strength example above and let treatment 5 be the control. j = 1.3
The Random Eﬀects Model
The treatments in an experiment may be a random sample from a larger population of treatments. ni . − y5. k.05) = 2.20 (. Since we are interested in the bigger population of treatments.65.20 (.
ij .10 Example
CHAPTER 1. the hypothesis of interest is
2 H0 : σ τ = 0
versus
2 HA : στ > 0 . = 6. The mathematical representation of the model is the same as the ﬁxed eﬀects model: yij = µ + τi + ij .8 indicate any signiﬁcant diﬀerence.
2 2 If the hypothesis H0 : στ = 0 is rejected in favor of HA : στ > 0. The
ij
are random errors which follow the normal distribution with mean 0 and common variance σ 2 .
If the τi are independent of
the variance of an observation will be
2 Var(yij ) = σ 2 + στ . · · · . The Dunnett critical value is d4. − y5.n−k (α). The estimators of the variance components are
M SB M SW
σ 2 = M SW ˆ
. if any) the variability among the treatments in the population.76 . Thus the critical diﬀerence is d4.e. | > 4. M SW (2/5) = 4. · · · . Our purpose is to estimate (and test.76
1.05) So the test rejects H0 : µi = µ5 if |¯i. = 10. i. σ 2 and στ are known as variance components. are a random sample from a population that is normally distributed with 2 2 mean 0 and variance στ . except for the assumptions underlying the model. 2. τi . i = 1. Assumptions 1.8 and y4. Thus we ¯ ¯ ¯ ¯ conclude µ3 = µ5 and µ4 = µ5 . then we claim that there is a signiﬁcant diﬀerence among all the treatments.
2 The two variances. τi ∼ N (0. The treatment eﬀects. Testing is performed using the same F statistic that we used for the ﬁxed eﬀects model:
F0 = An α-level test rejects H0 if F0 > Fk−1. y ¯ Only the diﬀerences y3. − y5. στ ).

3. Example A textile company weaves a fabric on a large number of looms.90 ˆ and στ = ˆ2 29.96 . THE RANDOM EFFECTS MODEL and στ = ˆ2 where n0 = 1 k−1
11
M SB − M SW . The variance components are estimated by σ 2 = 1.19 22.1.75 111.73 − 1.n−k (1 − α/2)
. To investigate this.68 Observations 1 2 3 4 98 97 99 96 91 90 93 92 96 95 97 95 95 96 99 98
Since F0 > F3.
The following example is taken from from Montgomery : Design and Analysis of Experiments. we conclude that the looms in the plant diﬀer signiﬁcantly.
We are usually interested in the proportion of the variance of an observation. They would like the looms to be homogeneous so that they obtain a fabric of uniform strength.73 1. The process engineer suspects that. in addition to the usual variation in strength within samples of fabric from the same loom. that is the result of the diﬀerences among the treatments: 2 στ .90 F0 15. 2 + σ2 σ τ
2 2 A 100(1 − α)% conﬁdence interval for στ /(σ 2 + στ ) is
L U . The data are given in the following table: Looms 1 2 3 4 The corresponding ANOVA table is Source Between (Looms) Within (Error) Total df 3 12 15 SS 89. 4
. there may also be signiﬁcant variations in strength between looms. n0
k
ni −
i=1
k 2 i=1 ni k i=1 ni
. Var(yij ).05).
M SB 1 −1 M SW Fk−1. he selects four looms at random and makes four strength determinations on the fabric manufactured on each loom.12 (.90 = 6.
. 1+L 1+U where L= and U= 1 n0 1 n0
.94 MS 29.n−k (α/2) M SB 1 −1 M SW Fk−1.

a (1 − α). QUIT.12 (. CLASS OFFICER. . .86 = 79%) is attributable to the diﬀerence among looms. QUIT.90 1 4. 0 1 -1 -1. 0 1 -1 -1.192. CARDS. CLASS GROUP. From properties of the F distribution we have that Fa. PROC GLM.22 = 0. The engineer must now try to isolate the causes for the diﬀerence in loom performance (faulty set-up.192 − 1 = 20.625
which gives the 95% conﬁdence interval (0. MODEL TS=GROUP.
PROC GLM DATA=A1. RANDOM OFFICER.625 = 0.12 (.96/8.47 and F3. RUN.95) . MEANS GROUP/ CLDIFF BON CONTRAST ’PHI1’ GROUP 1 ESTIMATE ’PHI1’ GROUP 1 CONTRAST ’PHI2’ GROUP 1 ESTIMATE ’PHI2’ GROUP 1 RUN. From the F table we see that F3. . We conclude that the variability among looms accounts for between 39 and 95 percent of the variance in the observed strength of fabric produced.124/21.73 1. INPUT TS GROUP@@.025) = 1/5.86. RUN.
OPTIONS LS=80 PS=66 NODATE. COMPLETELY RANDOMIZED DESIGN
Thus.73 1.975) = 1/F12. 7 1 7 1 15 1 11 1 9 1 12 2 17 2 12 2 18 2 18 2 14 3 18 3 18 3 19 3 19 3 19 4 25 4 22 4 19 4 23 4 7 5 10 5 11 5 15 5 11 5 .b (α) = 1/Fb.12
CHAPTER 1. Most of this variability ˆ ˆ2 (about 6. Using SAS The following SAS code may be used to analyze the tensile strength example considered in the ﬁxed eﬀects CRD case. 2 2 Lets now ﬁnd a 95% conﬁdence interval for στ /(σ 2 + στ ).625/1. 20.39 .
TUKEY SCHEFFE LSD DUNNETT(’5’). MODEL RATING=OFFICER. 0 0 -1 0.
A random eﬀects model may be analyzed using the RANDOM statement to specify the random factor:
.90 1 0. poorly trained operators. ).3 (.025) = 4. the variance of any observation on strength is estimated by σ 2 + στ = 8. 0 0 -1 0. /* print the data */ PROC PRINT DATA=MONT.124 = 0. DATA MONT. Thus L= and U= 1 4 29.47 − 1 = 0.124 1 4 29.

1.2000000
25 3
Source Model Error
DF 4 20
Mean Square 118.9400000 8.0600000
F Value 14.0001
. THE RANDOM EFFECTS MODEL SAS Output
The SAS System Obs 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 TS 7 7 15 11 9 12 17 12 18 18 14 18 18 19 19 19 25 22 19 23 7 10 11 15 11 GROUP 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5 2 1
13
The SAS System The GLM Procedure Class Level Information Class GROUP Levels 5 Values 1 2 3 4 5
Number of observations The SAS System The GLM Procedure Dependent Variable: TS Sum of Squares 475.76
Pr > F <.7600000 161.3.

600 -10.745 9.545 11.745 7.745 -1.08596 3.455 7.855 1.7600000 The SAS System The GLM Procedure
Mean Square 118.055 4.545 3.9600000
CHAPTER 1.06 2.945 14.745 *** *** *** *** *** *** *** *** *** *** *** *** ***
.345 9.000
95% Confidence Limits 0.800 7.87642
Root MSE 2.545 -2.055 -7.200 6.7455
Comparisons significant at the 0.945 -5.14
Corrected Total 24 636.800 -4.746923
Coeff Var 18.200 -2.
GROUP Comparison 4 4 4 4 3 3 3 3 2 2 2 2 5 5 5 5 3 2 5 1 4 2 5 1 4 3 5 1 4 3 2 1
Difference Between Means 4.
Alpha Error Degrees of Freedom Error Mean Square Critical Value of t Least Significant Difference
0.200 10.000 6.7600000
Mean Square 118.545 -8.800 -6.055 -0.0001 4
t Tests (LSD) for TS NOTE: This test controls the Type I comparisonwise error rate.839014
TS Mean 15. COMPLETELY RANDOMIZED DESIGN
R-Square 0.76
Pr > F <.855 -14.545 15.455 1.255 5.76
Pr > F <.200 4. not the experimentwise error rate.945 10.04000
Source GROUP
DF 4
Type I SS 475.945 0.000 2.05 20 8.545 8.600 1.800 -6.545 -0.255 2.05 level are indicated by ***.345 -2.055 -3.055 8.800 -4.055 -9.545 -10.800 11.0001
Source GROUP
DF 4
Type III SS 475.9400000
F Value 14.600 5.345 -7.855 4.9400000
F Value 14.

427 -0. but it generally
6
.1.173 1.973 -4.200 -2.373 -17.573 16.173 1.173 -10.23186 5.427 -1.427 -2.773 6.800 11.200 10.373
*** *** ***
*** *** ***
*** *** ***
*** *** ***
The SAS System The GLM Procedure Bonferroni (Dunn) t Tests for TS NOTE: This test controls the Type I experimentwise error rate.973 -5.745 -8.800 -6. THE RANDOM EFFECTS MODEL
1 1 1 1 4 3 2 5 -11.173 17.600 -10.055 -1.227 -16.600 1.3.800 -4.173 13.05 20 8.000 2.600 -1.545 -9.000 6.
GROUP Comparison 4 4 4 4 3 3 3 3 2 2 2 2 5 5 5 5 1 1 1 1 3 2 5 1 4 2 5 1 4 3 5 1 4 3 2 1 4 3 2 5
Difference Between Means 4.173 -9.973 -6.373
Comparisons significant at the 0.000 -15.855 2.05 level are indicated by ***.227 4.800 7.827 3.800 -6.200 4.800 -7.600 5.800 -5.373 -3.173 -13.427 6.745 *** *** ***
15
The SAS System The GLM Procedure Tukey’s Studentized Range (HSD) Test for TS NOTE: This test controls the Type I experimentwise error rate.055 -4.600 -1.545 -11.373 7.200 6.000
Simultaneous 95% Confidence Limits -1.373 11.973 10.573 -7.427 -11.173 -0.573 -0.345 -4.800 -7.427 0.
5
Alpha Error Degrees of Freedom Error Mean Square Critical Value of Studentized Range Minimum Significant Difference
0.373 9.800 -5.373 -6.373 0.427 2.573 12.06 4.173 9.000 -11.827 5.427 -9.773 0.173 -12.800 -4.

15340 5. but it generally has a higher Type II error rate than Tukey’s for all pairwise comparisons.000 -11.462 10.16
CHAPTER 1.600 -1.200 10.6621
Comparisons significant at the 0.862 16.138 -9.05 20 8.600 5.462 -13.462 -0.062 4.662 11.138 2.262 -6.662
*** *** ***
*** *** ***
*** ***
*** ***
The SAS System The GLM Procedure Scheffe’s Test for TS NOTE: This test controls the Type I experimentwise error rate.138 0.
GROUP Comparison 4 4 4 4 3 3 3 3 2 2 2 2 5 5 5 5 1 1 1 1 3 2 5 1 4 2 5 1 4 3 5 1 4 3 2 1 4 3 2 5
Difference Between Means 4.662 0.462 -12.062 -0.0796
Comparisons significant at the 0.800 -5.138 -11.200 4.
.462 -10.600 1.
7
Alpha Error Degrees of Freedom Error Mean Square Critical Value of F Minimum Significant Difference
0.
Alpha Error Degrees of Freedom Error Mean Square Critical Value of t Minimum Significant Difference
0.262 11. COMPLETELY RANDOMIZED DESIGN
has a higher Type II error rate than Tukey’s for all pairwise comparisons.138 -2.538 3.462 17.000 6.800 -4.05 level are indicated by ***.138 1.062 -16.462 13.200 6.538 5.06 2.062 6.800 -4.800 7.000
Simultaneous 95% Confidence Limits -1.200 -2.800 -7.662 7.600 -10.138 -1.862 12.662 -6.800 -6.800 -6.86608 6.862 -1.662 -17.462 1.662 -3.138 6.06 3.800 11.662 9.000 2.262 -4.862 -7.05 level are indicated by ***.262 -5.462 -11.462 1.05 20 8.

GROUP Comparison 4 3 2 1 5 5 5 5
Difference Between Means 10.800 7.280 12.200 10.800 -5.880 -13.720 -1.800 -4.560 9.65112 4.720 1.680 -4.05 level are indicated by ***.3.680 -5.600 -1.800 -6.800 6.880 10.880 13.800 4.06 2.040 2.720 -12.280 16.120 4.480 -0. THE RANDOM EFFECTS MODEL
17
GROUP Comparison 4 4 4 4 3 3 3 3 2 2 2 2 5 5 5 5 1 1 1 1 3 2 5 1 4 2 5 1 4 3 5 1 4 3 2 1 4 3 2 5
Difference Between Means 4.600 1.680 11.000
Simultaneous 95% Confidence Limits 6.480 -16.7602
Comparisons significant at the 0.200 6.480 5.040 -0.560 11.1.800 -7.880 -11.280 -8.720 0.080 0.080 12.720 5.480 7.05 20 8.200 4.880 17.800 11.760 *** ***
.720 -0.080 -5.280 -1.600 -10.080
*** *** ***
*** *** ***
*** ***
*** ***
The SAS System The GLM Procedure Dunnett’s t Tests for TS NOTE: This test controls the Type I experimentwise error for comparisons of all treatments against a control.080 -17.080 -3.000
Simultaneous 95% Confidence Limits -2.880 0.880 -12.880 2.680 -7.160 -5.000 -11.800 -4.720 1.200 -2.080 10.080 8.
8
Alpha Error Degrees of Freedom Error Mean Square Critical Value of Dunnett’s t Minimum Significant Difference
0.720 -10.800 -6.880 -10.000 6.120 3.360 3.880 -0.600 5.760 15.600 -1.000 2.

In the random eﬀects model.79555005
t Value -1.2500000 348.57
Pr > |t| 0. we additionally assume that 2 τi ∼i.18
CHAPTER 1.0630 <. ˆ ¯ The Normality Assumption The simplest check for normality involves plotting the empirical quantiles of the residuals against the expected quantiles if the residuals were to follow a normal distribution. N (0.
.19 Pr > F 0.1000000 Mean Square 31. Diagnostics depend on the residuals.1000000 F Value 3. Anderson-Darling.i.d. TITLE1 ’STRENGTH VS.88 43.0000000 -11. COMPLETELY RANDOMIZED DESIGN
The SAS System The GLM Procedure 9
Dependent Variable: TS Contrast PHI1 PHI2 DF 1 1 Contrast SS 31.4
1.0630 <. CARDS. Other formal tests for normality (Kolmogorov-Smirnov. k.
Consider the one-way CRD model i = 1. INPUT TS GROUP@@. Example The following SAS code and partial output checks the normality assumption for the tensile strength example considered earlier. SYMBOL1 V=CIRCLE I=NONE. PERCENTAGE’. j = 1.97 -6.
where it is assumed that ij ∼i. PLOT TS*GROUP/FRAME. Shapiro-Wilk.1
More About the One-Way Model
Model Adequacy Checking
yij = µ + τi +
ij .4.0001
1. The results from the QQ-plot as well as the formal tests (α = . SAS Code
OPTIONS LS=80 PS=66 NODATE. eij = yij − yij = yij − yi. QUIT. N (0.i. 7 1 7 1 15 1 11 1 9 1 12 2 17 2 12 2 18 2 18 2 14 3 18 3 18 3 19 3 19 3 19 4 25 4 22 4 19 4 23 4 7 5 10 5 11 5 15 5 11 5 . · · · . · · · . Cramer-von Mises) may also be performed to assess the normality of the residuals.2500000 348. This is known as the normal QQ-plot.8000000
Standard Error 2.05) indicate that the residuals are fairly normal.
PROC GPLOT DATA=MONT.53929124 1. ni . RUN. στ ) independently of ij .d. σ 2 ).0001
Parameter PHI1 PHI2
Estimate -5. DATA MONT.

QQPLOT RES/NORMAL (L=1 MU=EST SIGMA=EST). PROC GPLOT DATA=DIAG.
PROC SORT DATA=DIAG.51833065
Basic Statistical Measures Location Mean Median Mode 0.1.11239681 161.59165 6. SYMBOL1 V=CIRCLE I=SM50.00000
NOTE: The mode displayed is the smallest of 7 modes with a count of 2. RUN. MORE ABOUT THE ONE-WAY MODEL
19
PROC GLM. QUIT. RUN.59165327 0.71667 9. TITLE1 ’QQ-PLOT OF RESIDUALS’. BY PRED.
.40000 -3. CLASS GROUP.71666667 -0.00000 0. Sum Weights Sum Observations Variance Kurtosis Corrected SS Std Error Mean 25 0 6. QUIT. PROC UNIVARIATE DATA=DIAG NORMAL. PLOT RES*PRED/FRAME. MODEL TS=GROUP. QUIT. OUTPUT OUT=DIAG R=RES P=PRED. QUIT.2 .4.40000 Variability Std Deviation Variance Range Interquartile Range 2. TITLE1 ’RESIDUAL PLOT’.2 0.8683604 161.
Partial Output
The UNIVARIATE Procedure Variable: RES Moments N Mean Std Deviation Skewness Uncorrected SS Coeff Variation 25 0 2. RUN. VAR RES.00000 4. RUN.

COMPLETELY RANDOMIZED DESIGN
Tests for Location: Mu0=0 Test Student’s t Sign Signed Rank -Statistict M S 0 2.0000 0.9896
Tests for Normality Test Shapiro-Wilk Kolmogorov-Smirnov Cramer-von Mises Anderson-Darling --Statistic--W D W-Sq A-Sq 0.943868 0.2026 0.080455 0.20
CHAPTER 1.1818 0.518572 -----p Value-----Pr Pr Pr Pr < > > > W D W-Sq A-Sq 0.4244 0.5 -----p Value-----Pr > |t| Pr >= |M| Pr >= |S| 1.5 0.1775
.0885 0.162123 0.

The graphical tool we shall utilize in this class is the plot of residuals versus predicted values.4.1.
One procedure for testing the above hypothesis is Bartlett’s test.3026 c where
k
q = (n − k) log10 M SW −
i=1
2 (ni − 1) log10 Si
. MORE ABOUT THE ONE-WAY MODEL
21
Constant Variance Assumption Once again there are graphical and formal tests for checking the constant variance assumption. The hypothesis of interest is
2 2 2 H0 : σ1 = σ2 = · · · = σk
versus
2 2 HA : σi = σj for at least one pair i = j . The test statistic is q B0 = 2.

Partial SAS Output
The GLM Procedure Levene’s Test for Homogeneity of TS Variance ANOVA of Squared Deviations from Group Means Sum of Squares 91.6224 1015.22
CHAPTER 1. RUN.9331 Pr > ChiSq 0.45
Pr > F 0. where mi is the median of the observations in group i. dij . QUIT. The plot of residuals versus predicted values (see above) indicates no serious departure from the constant variance assumption. it should not be used if the normality assumption is not satisﬁed. MODEL TS=GROUP. and then running the usual ANOVA F -test using the transformed observations. CLASS GROUP. COMPLETELY RANDOMIZED DESIGN 1 c=1+ 3(k − 1) We reject H0 if B0 > χ2 (α) k−1
k i=1
1 1 − ni − 1 n−k
where χ2 (α) is read from the chi-square table. Example Once again we consider the tensile strength example. Levene’s test proceeds by computing dij = |yij − mi | . MEANS GROUP/HOVTEST=BARTLETT HOVTEST=LEVENE. A test which is more robust to deviations from normality is Levene’s test.4 Mean Square 22. Partial SAS Code
PROC GLM. The tests provide no evidence that indicates the failure of the constant variance assumption. instead of the original observations.7720
Source GROUP Error
DF 4 20
F Value 0. yij .7704
Bartlett’s Test for Homogeneity of TS Variance Source GROUP DF 4 Chi-Square 0.9056 50. So.9198
. The following modiﬁcation to the proc GLM code given above generates both Bartlett’s and Levene’s tests. k−1 Bartlett’s test is too sensitive deviations from normality.

0 9. The SAS procedure NPAR1WAY may be used to obtain the Kruskal-Wallis test.0 23.0 2.5 20.5 25.0 25 11.0 12.4. 27. MORE ABOUT THE ONE-WAY MODEL
23
1. PROC NPAR1WAY WILCOXON. VAR TS.25. CARDS. we may wish to use nonparametric alternatives to the F -test.0 7.5 85.5 16.5 24.5 1 n−1
k ni 2 Rij − i=1 j=1
n(n + 1)2 4
We ﬁnd that S 2 = 53.0 Ri.5 20.5 66. INPUT TS GROUP@@.01) = 13. Thus we 4 reject the null hypothesis and conclude that the treatments diﬀer.0 16.0 20.0 113. 7 1 7 1 15 1 11 1 9 1 12 2 17 2 12 2 18 2 18 2 14 3 18 3 18 3 19 3 19 3 19 4 25 4 22 4 19 4 23 4 7 5 10 5 11 5 15 5 11 5 .5 7.5 20 9. we ﬁrst rank all the observations. of the observations are given in the following table: 15 2.03 and KW0 = 19. yij .0 14. DATA MONT. QUIT.0 35 2.4.
OPTIONS LS=80 PS=66 NODATE. CLASS GROUP.5 16. From the chi-square table we get χ2 (. − ni 4
where Ri.5 16.0 30 20.0 5. in increasing order.0 12.0 33. k−1 Example For the tensile strength data the ranks. Say the ranks are Rij . To perform the Kruskal-Wallis test.28. and S2 = The test rejects H0 : µ1 = · · · = µk if KW0 > χ2 (α) . The Kruskal-Wallis test is one such procedure based on the rank transformation. is the sum of the ranks of group i.2
Some Remedial Measures
The Kruskal-Wallis Test When the assumption of normality is suspect. RUN.
. Rij .5 7.0 4. The Kruskal-Wallis test statistic is KW0 = 1 S2
k i=1 2 n(n + 1)2 Ri.1.

1/ y .00 65.634434 5.
A simple method of choosing the appropriate transformation is to plot log Si versus log yi.0 14. It uses the maximum likelihood method to simultaneously estimate the transformation parameter as well as the overall mean and the treatment eﬀects.634434 13.
.00 65. The ¯ following table may be used as a guide: Slope 0 1/2 1 3/2 2 Transformation No Transformation Square root Log Reciprocal square root Reciprocal
A slightly more involved technique of choosing a variance stabilizing transformation is the Box-Cox transformation. arcsin( y).60 5 5 33.0 14. The common transformations are √ √ √ y. .00 4 5 113.634434 6.50 2 5 66.70 Average scores were used for ties. or regress ¯ log Si versus log yi.0 14. 1/y. We then choose the transformation depending on the slope of the relationship.00 65.0 14.0008
Variance Stabilizing Transformations There are several variance stabilizing transformations one might consider in the case of heterogeneity of variance (heteroscedasticity). COMPLETELY RANDOMIZED DESIGN
The NPAR1WAY Procedure Wilcoxon Scores (Rank Sums) for Variable TS Classified by Variable GROUP Sum of Expected Std Dev Mean GROUP N Scores Under H0 Under H0 Score --------------------------------------------------------------------1 5 27.0637 4 0.634434 22.20 3 5 85.50 65.50 65.0 14.634434 17.
Kruskal-Wallis Test Chi-Square DF Pr > Chi-Square 19. log(y).24
CHAPTER 1.

b
(2.1. the treatment. • τi is the ith treatment eﬀect. The simplest design that would accomplish this is known as a randomized complete block design (RCBD). .1
The Randomized Complete Block Design
Introduction
In a completely randomized design (CRD). . . Latin Squares. the experimenter wants to reduce the errors which account for diﬀerences among observations within each treatment. The following layout shows a RCBD with k treatments and b blocks. Each block is divided into k subblocks of equal size. · · · . • βj is the eﬀect of the jth block. . Block b Treatment 1 y11 y12 y13 .Chapter 2
Randomized Blocks. There is one observation per treatment in each block and the treatments are run in a random order within each block. This creates diﬀerences among the blocks and makes observations within a block similar. ykb
The statistical model for RCBD is yij = µ + τi + βj + where • µ is the overall mean. and Related Designs
2. Within each block the k treatments are assigned at random to the subblocks. y1b Treatment 2 y21 y22 y23 . . This is done by identifying supplemental variables that are used to group experimental subjects that are homogeneous with respect to that variable.1)
is the random error term associated with the ijth observation.
j = 1. The design is ”complete” in the sense that each block contains all the k treatments.1
2. The random error component arises because of all the variables which aﬀect the dependent variable except the one controlled variable. . y2b ··· ··· ··· ··· ··· Treatment k yk1 yk2 yk3 . Block 1 Block 2 Block 3 . Naturally. . treatments are assigned to the experimental units in a completely random manner. 25
. · · · . k. and •
ij ij .
i = 1. . One of the ways in which this could be achieved is through blocking.

)2 + k y ¯
j=1 b
(¯.. b yij = 1 k
k
yi.i..
i=1 k b
j = 1. Symbolically.j + y. = ¯
i=1
i=1 j=1
1 b
b
y. )2 = b ¯
i=1 j=1 i=1 k
(¯i.26
CHAPTER 2. Deﬁne 1 yi.j = ¯ y. )2 y ¯
+
i=1 j=1
(yij − yi.
We are mainly interested in testing the hypotheses H0 : µ1 = µ2 = · · · = µk HA : µi = µj for at least one pair i = j Here the ith treatment mean is deﬁned as µi = 1 b
b
(µ + τi + βj ) = µ + τi
j=1
Thus the above hypotheses may be written equivalently as H0 : τ1 = τ2 = · · · = τk = 0 HA : τi = 0 for at least one i
2. − y. = ¯ b y. − y... = ¯ One may show that
k b k b
yij . k
1 k 1 n
yij . SST = SSTreatments + SSBlocks + SSE The degrees of freedom are partitioned accordingly as (n − 1) = (k − 1) + (b − 1) + (k − 1)(b − 1)
. and the sum of squares due to error. · · · ..
j=1 k
i = 1. σ 2 ). · · · .d N (0. )2 ¯ ¯ ¯
Thus the total sum of squares is partitioned into the sum of squares due to the treatments. the sum of squares due to the blocking.2
Decomposition of the Total Sum of Squares
b
Let n = kb be the total number of observations.
βj = 0.j − y.j ¯
j=1
(yij − y. and
∼i.1. RANDOMIZED BLOCKS We make the following assumptions concerning the RCBD model: • • •
ij k i=1 τi b j=1
= 0.

and βj are obtained via minimization of the sum of squares of the errors
k b 2 ij i=1 j=1 k b
L= The solution is
=
i=1 j=1
(yij − µ − τi − βj )2 .. The ANOVA table for RCBD is
Source Treatments Blocks Error Total df k−1 b−1 (k − 1)(b − 1) n−1 SS SSTreatments SSBlocks SSE SST MS M STreatments M SBlocks M SE F -statistic
M STreatments M SE M SBlocks FB = M SE
M STreatments M SE SSE . · · · . a large value of FB would indicate that the blocking variable is probably having the intended eﬀect of reducing noise.(k−1)(b−1) (α) .j − y.
µ = y. THE RANDOMIZED COMPLETE BLOCK DESIGN
27
2..1). ¯ ¯ ¯ Example An experiment was designed to study the performance of four diﬀerent detergents for cleaning clothes. The following ”cleanliness” readings (higher=cleaner) were obtained using a special device for three diﬀerent types of common stains..j − y. − y. τi . − y. b
From the model in (2. The estimators of µ. ˆ ¯ τi = yi. + y. k j = 1. + y. However. (k − 1)(b − 1)
SSTreatments k−1
and M SE =
F0 =
Since there is no randomization of treatments across blocks the use of FB = M SBlocks /M SE as a test for block eﬀects is questionable.j − y. ¯ ¯ ¯ ¯ ¯ = yi. Estimation Estimation of the model parameters is performed using the least squares procedure as in the case of the completely randomized design. ¯ ¯
i = 1.. ˆ ¯ ¯ ˆ βj = y..2. we can see that the estimated values of yij are ˆ yij = µ + τi + βj ˆ ˆ ˆ = y. + yi...1.3
Testing
Statistical Analysis
The test for equality of treatment means is done using the test statistic F0 = where M STreatments = An α level test rejects H0 if F0 > Fk−1.1. · · · . Is there a signiﬁcant diﬀerence among the detergents?
.

08333 F Value 11.1666667 Type III SS 110.
The corresponding output is
The GLM Procedure Dependent Variable: Y Source Model Error Corrected Total R-Square 0. RUN. CLASS STAIN SOAP. MODEL Y = SOAP STAIN.0063. INPUT STAIN SOAP Y @@. CARDS. PROC GLM.0018 Pr > F 0.78 with a p-value of .28 Stain 1 45 47 48 42 182 Stain 2 43 46 50 37 176
CHAPTER 2. The following SAS code gives the ANOVA table:
OPTIONS LS=80 PS=66 NODATE. An incorrect analysis of the data using a one-way ANOVA set up (ignoring the blocking factor) is
.1388889 F Value 15.0063 0.762883 DF 3 2 DF 3 2
Root MSE 1.5833333 Mean Square 36.5833333
The SAS Type I analysis gives the correct F = 11.01. 1 1 45 1 2 47 1 3 48 1 4 42 2 1 43 2 2 46 2 3 50 2 4 37 3 1 51 3 2 52 3 3 55 3 4 49 .8333333 264.53 F Value 11.9722222 67.0018
Type I SS 110.9166667 135.9722222 67.6 19/6
which has a p-value < . RANDOMIZED BLOCKS Stain 3 51 52 55 49 207 Total 139 145 153 128 565
Detergent Detergent Detergent Detergent Total
1 2 3 4
Using the formulæ for SS given above one may compute: SST = 265 SSTreatments = 111 SSBlocks = 135 SSE = 265 − 111 − 135 = 19 Thus F0 = 111/3 = 11.68 Pr > F 0.771691
Y Mean 47.1666667
Mean Square 36.2166667 3.0022
Coeff Var 3. QUIT.78 21.9166667 135.9166667 Mean Square 49.78 21.0833333 18.0063 0. Thus we claim that there is a signiﬁcant diﬀerence among the four detergents.928908 Source SOAP STAIN Source SOAP STAIN DF 5 6 11 Sum of Squares 246. DATA WASH.53 Pr > F 0.

2048
Notice that H0 is not rejected indicating no signiﬁcant diﬀerence among the detergents.139. 2 2 Using the ANOVA table from RCBD.1.2500000 F Value 1. R is the increase in the number of replications required if a CRD to achieve the same precision as a RCBD. we may estimate σr and σb as
σb = M SE ˆ2 σr = ˆ2 Example Consider the detergent example considered in the previous section. QUIT.1.
29
The corresponding output is
The GLM Procedure Dependent Variable: Y Source Model Error Corrected Total DF 3 8 11 Sum of Squares 110. CLASS SOAP.86) = = 4. MODEL Y = SOAP.5 as many replications to obtain the same precision as obtained by blocking on stain types. respectively.2.
2. THE RANDOMIZED COMPLETE BLOCK DESIGN
PROC GLM. RUN.139) = = 14.86 kb − 1 12 − 1 (b − 1)M SBlocks + b(k − 1)M SE kb − 1
The relative eﬃciency of RCBD to CRD is estimated to be (dfb + 1)(dfr + 3) σr ˆ2 ˆ R= · 2 (dfb + 3)(dfr + 1) σb ˆ (6 + 1)(8 + 3)(14.4
Relative Eﬃciency of the RCBD
The example in the previous section shows that RCBD and CRD may lead to diﬀerent conclusions. From the ANOVA table for the RCBD we see that M SE = 3.139) This means that a CRD will need about 4. dfr = kb − k = 8 Thus σb = M SE = 3.0000000 264.5 (6 + 3)(8 + 1)(3.9722222 19.139 ˆ2 σr = ˆ2 (b − 1)M SBlocks + b(k − 1)M SE (2)(67.9166667 154.92 Pr > F 0. A natural question to ask is ”How much more eﬃcient is the RCBD compared to a CRD?” One way to deﬁne this relative eﬃciency is 2 (dfb + 1)(dfr + 3) σr R= · 2 (dfb + 3)(dfr + 1) σb
2 2 where σr and σb are the error variances of the CRD and RCBD. and dfr and dfb are the corresponding error degrees of freedom. dfb = (k − 1)(b − 1) = 6.
.58) + (3)(3)(3.9166667 Mean Square 36.

Example Once again consider the detergent example of the previous section. Here is the procedure:
ij
.6
Model Adequacy Checking
Additivity The initial assumption we made when considering the model yij = µ + τi + βj +
ij
is that the model is additive. to be signiﬁcantly diﬀerent if the absolute value of their sample diﬀerences exceeds M SE 2 Tα = qk. Thus in all the equations we replace ni by b.(k−1)(b−1) (α) is the α percentile value of the studentized range distribution with k groups and (k − 1)(b − 1) degrees of freedom. according to our model. This makes the test on the treatment means less sensitive. ¯ y2.139/3 = (4. Suppose we wish to make pairwise comparisons of the treatment means via the Tukey-Kramer procedure. ¯
2. Thus we lose b − 1 degrees of freedom unnecessarily.67.05) 3. Notice that we are using (k − 1)(b − 1) degrees of freedom in the RCBD as opposed to kb − k in the case of a CRD. we are interested in multiple comparisons to ﬁnd out which treatment means diﬀer. µi and µj . ¯ y4. the way the treatment aﬀects the outcome may be diﬀerent from block to block. = 48. i.(k−1)(b−1) (α) .1. = 42. diﬀerences among the means will remain undetected.e.33 48. The Tukey-Kramer procedure declares two treatment means. the expected increase of the response in block 1 and treatment 1 is 6. ¯ 42. Any nonlinear pattern indicates nonadditivity. If the ﬁrst treatment increases the expected response by 2 and the ﬁrst block increases it by 4. RANDOMIZED BLOCKS
Another natural question is ”What is the cost of blocking if the blocking variable is not really important. = 51.e.6 (. We may use any of the multiple comparison procedures discussed in Chapter 1. This setup rules out the possibility of interactions between blocks and treatments. A quick graphical check for nonadditivity is to plot the residuals. We use the regression approach of testing by ﬁtting the full and reduced models. In reality. The sample treatment means are y1.00 y2.90)(1. 2 b where qk. ¯ y3.1.33. ˆ A formal test is Tukey’s one degree of freedom test for nonadditivity. = 46.00.05 = q4.0127 Thus using underlining y4.67 y1. ˆ yij . ¯ We also have T. We start out by ﬁtting the model yij = µ + τi + βj + γτi βj + Then testing the hypothesis H0 : γ = 0 is equivalent to testing the presence of nonadditivity. if blocking was not necessary?” The answer to this question lies in the diﬀering degrees of freedom we use for the error variable. The only diﬀerence here is that we use the number of blocks b in place of the common sample size. eij = yij − yij .5
Comparison of Treatment Means
As in the case of CRD.30
CHAPTER 2.
2.33 51. versus the ﬁtted values.023) = 5. ¯ 46. i. then. ¯ y3.33.

corresponding to observation ij in resulting from ﬁtting the model above. THE RANDOMIZED COMPLETE BLOCK DESIGN • Fit the model yij = µ + τi + βj +
ij
31
ˆ • Let eij and yij be the residual and the ﬁtted value.40. The data is given below. respectively. model Im = Temp Pres. Do Pres = 25. ˆ • The sum of squares due to nonadditivity is
k b 2 rij i=1 j=1 ij
SSN = γ 2 ˆ • The test statistic for nonadditivity is F0 = Example
SSN /1 (SSE − SSN )/[(k − 1)(b − 1) − 1]
The impurity in a chemical product is believed to be aﬀected by pressure. output. class Temp Pres.
25 5 3 1
Pressure 30 35 40 4 6 3 1 4 2 1 3 1
45 5 3 2
. 100 5 4 6 3 5 125 3 1 4 2 3 150 1 1 3 1 2 . • Regress eij on rij .2. run. Input Temp @.35. Data Chemical. We will use temperature as a blocking variable. proc glm.45. title "Tukey’s 1 DF Nonadditivity Test". Temp 100 125 150 The following SAS code is used. Input Im @. quit. i.e. end. • Let zij = yij and ﬁt ˆ2 zij = µ + τi + βj +
ij
ˆ • Let rij = zij − zij be the residuals from this model.1. proc print. cards.30.
Options ls=80 ps=66 nodate. ﬁt the model eij = α + γrij + Let γ be the estimated slope.

09852217 Type III SS 1.00455079 0. quit.
CHAPTER 2. /* Form a new variable called Psquare.25864083 1.62932041 0.03185550 1.09852217 F Value 42.93333333 Im Mean 2. run.
Tukey’s 1 DF Nonadditivity Test The GLM Procedure Dependent Variable: Im Source Model Error Corrected Total R-Square 0.42 Pr > F 0.36 with 1 and 7 degrees of freedom.32 1.68 0.5660 Pr > F 0.0042 0.33333333 11.933333 Mean Square 11. The following SAS code gives the normality diagnostics. run.32
output out=out1 predicted=Pred.09624963 0.0001 0.1690 0. run. It has a p-value of 0.521191 DF 2 4 1 DF 2 4 1
Type I SS 23.90147783 36.
. set out1. model Im = Temp Pres Psquare.36 Pr > F 0.5660. RANDOMIZED BLOCKS
The following is the corresponding output. Normality The diagnostic tools for the normality of the error terms are the same as those use in the case of the CRD. Thus we have no evidence to declare nonadditivity.948516 Source Temp Pres Psquare Source Temp Pres Psquare Coeff Var 17. */ Data Tukey.0005
Root MSE 0. quit. Psquare = Pred*Pred.36 F Value 2.60000000 0.01 0.27163969 F Value 18. proc glm.95 10.27406241 0. Example Consider the detergent example above.90000000 0.09852217 Mean Square 0. class Temp Pres.5660 Mean Square 5.66666667 2.09852217
Thus F0 = 0. quit.76786 DF 7 7 14 Sum of Squares 35. Formal tests like the Kolmogorov-Smirnov test may also be used to assess the normality of the errors.4634 0. The graphic tools are the QQ-plot and the histogram of the residuals.

MODEL Y = SOAP STAIN. RUN. RUN. QQPLOT RES / NORMAL (L=1 MU=0 SIGMA=EST). PLOT RES*STAIN.1. PLOT RES*PRED. PLOT RES*SOAP. QUIT. 1 1 45 1 2 47 1 3 48 1 4 42 2 1 43 2 2 46 2 3 50 2 4 37 3 1 51 3 2 52 3 3 55 3 4 49 .
PROC GPLOT. INPUT STAIN SOAP Y @@. CARDS.
33
PROC GLM. QUIT. OUTPUT OUT=DIAG R=RES P=PRED. CLASS STAIN SOAP. HIST RES / NORMAL (L=1 MU=0 SIGMA=EST).2. QUIT. RUN. MEANS SOAP/ TUKEY LINES. DATA WASH. THE RANDOMIZED COMPLETE BLOCK DESIGN
OPTIONS LS=80 PS=66 NODATE.
The associated output is (ﬁgures given ﬁrst):
. PROC UNIVARIATE NOPRINT.

01685612 0.333 46.000 48.0076 Means with the same letter are not significantly different.138889 Critical Value of Studentized Range 4. but it generally has a higher Type II error rate than REGWQ.250
Th QQ-plot and the formal tests do not indicate the presence of nonnormality of the errors.
.333 42.05 Error Degrees of Freedom 6 Error Mean Square 3.667
N 3 3 3 3
SOAP 3 2 1 4
B B B
RCBD Diagnostics The UNIVARIATE Procedure Fitted Distribution for RES Parameters for Normal Distribution Parameter Mean Std Dev Symbol Mu Sigma Estimate 0 1.13386116 -----p Value----Pr > W-Sq Pr > A-Sq >0.34
CHAPTER 2. Alpha 0.250 >0. RANDOMIZED BLOCKS
Tukey’s Studentized Range (HSD) Test for Y NOTE: This test controls the Type I experimentwise error rate.252775
Goodness-of-Fit Tests for Normal Distribution Test Cramer-von Mises Anderson-Darling ---Statistic---W-Sq A-Sq 0. Tukey Groupi ng A A A A A
Mean 51.89559 Minimum Significant Difference 5.

2. The spread of the residuals seems to diﬀer from detergent to detergent.1. The plots we need to examine in this case are residuals versus blocks. The plots below (produced by the SAS code above) suggest that there may be nonconstant variance. One may use formal tests. such as Levene’s test or perform graphical checks to see if the assumption of constant variance is satisﬁed. and residuals versus predicted values. We may need to transform the values and rerun the analsis.
. THE RANDOMIZED COMPLETE BLOCK DESIGN Constant Variance
35
The tests for constant variance are the same as those used in the case of the CRD. residuals versus treatments.

Note that the totals (without 37) are T4. If only one value is missing. (k − 1)(b − 1)
.17 6 kTi. however.e. We proceed to estimate the rest in a similar fashion.7
Missing Values
In a randomized complete block design.2 = 37 is missing. diﬀerence between consecutive estimates is small. Example Consider the detergent comparison example. be a loss of one degree of freedom from both the total and error sums of squares. When more than one value is missing. we usually eliminate the block or treatment in question. Suppose y4. for instance.. T. and • T.. i. say yij . they may be estimated via an iterative process. + bT. We then estimate the one missing value using the procedure above. RANDOMIZED BLOCKS
2. A missing observation would mean a loss of the completeness of the design.2 = 4(91) + 3(139) − 528 = 42. We then substitute yij and carry out the ANOVA as usual. We repeat this process until convergence. Another way would be to estimate the missing value. Since the substituted value adds no practical information to the design.1. each treatment appears once in every block.j is the total for block j. = 528. The analysis is then performed as if the block or treatment is nonexistent.2 = 139.j − T. If several observations are missing from a single block or a single treatment group. then we substitute a value yij = where • Ti. The estimate is y4. We then estimate the second one using the one estimated value and the remaining guessed values.. it should not be used in computations of means. = 91.36
CHAPTER 2. We ﬁrst guess the values of all except one. is the grand total. • T. One way to proceed would be to use a multiple regression analysis. when performing multiple comparisons. T. is the total for treatment i. There will.

We then need to modify the F value by hand using the correct degrees of freedom.0001 Pr > F 0.92 F Value 26.92 Pr > F 0.
PROC GLM.012490 DF 3 2 DF 3 2
Type I SS 71.41. The following SAS code will perform the RCBD ANOVA. MODEL Y = SOAP STAIN.
.1564917 Root MSE 0.
/* Replace the missing value with the estimated value.7398167
Mean Square 23.6703750 5.9305583 107.
OPTIONS LS=80 PS=66 NODATE. RUN. SET WASH.9143528 F Value 39. IF Y = .0002
Coeff Var 2.0008 0.4861167 185. CLASS STAIN SOAP.1.0008 0. */ DATA NEW. the correct F value is F0 = 71. CARDS.9340750 0.956218 Y Mean 47. 3 1 51 3 2 52 3 3 55 3 4 49 . RUN. QUIT. QUIT.7398167 Type III SS 71.9768528 53.970370 Source SOAP STAIN Source SOAP STAIN DF 5 6 11 Sum of Squares 179.
The SAS System The GLM Procedure Dependent Variable: Y Source Model Error Corrected Total R-Square 0.9768528 53. THE RANDOMIZED COMPLETE BLOCK DESIGN
37
Now we just plug in this value and perform the analysis.8699083
So. INPUT STAIN SOAP Y @@.0001 Mean Square 35.05) = 5.84 5.51417 F Value 26.17.22 58.49/5
which exceeds the tabulated value of F3.5 (.93/3 = 21.30 Pr > F 0.9305583 107.
The following is the associated SAS output. DATA WASH. 1 1 45 1 2 47 1 3 48 1 4 42 2 1 43 2 2 46 2 3 50 2 4 . THEN Y = 42.2.22 58.8699083 Mean Square 23.

we need p2 observations. OUTPUT OUT=LAT44 ROWS NVALS=(1 2 3 4) COLS NVALS=(1 2 3 4) TMTS NVALS=(1 2 3 4).
OPTIONS LS=80 PS=66 NODATE. FACTORS ROWS=4 ORDERED COLS=4 ORDERED /NOPRINT. Say we have 4 treatments. p rows and p columns.38
CHAPTER 2. row and column hereafter. CLASS ROWS COLS. sometimes we have two or more factors that can be controlled. TITLE ’A 4 BY 4 LATIN SQUARE DESIGN’. PROC PLAN SEED=12345. in particular the Latin square design. VAR TMTS. and only once. TABLE ROWS. TREATMENTS TMTS=4 CYCLIC. and D and two factors to control. ------------------------------------------------------A 4 BY 4 LATIN SQUARE DESIGN ______________________________________ | | COLS | | |___________________| | | 1 | 2 | 3 | 4 | | |____|____|____|____| | |TMTS|TMTS|TMTS|TMTS| | |____|____|____|____| | |Sum |Sum |Sum |Sum | |__________________|____|____|____|____| |ROWS | | | | | |__________________| | | | | |1 | 1| 2| 3| 4| |__________________|____|____|____|____| |2 | 2| 3| 4| 1| |__________________|____|____|____|____| |3 | 3| 4| 1| 2| |__________________|____|____|____|____| |4 | 4| 1| 2| 3| |__________________|____|____|____|____|
. One design that handles such a case is the Latin square design. These observations are then placed in a p × p grid made up of. RUN. Consider a situation where we have two blocking variables. QUIT. RANDOMIZED BLOCKS
2. in such a way that each treatment occurs once. and treatments. RUN. COLS*TMTS. C.2
The Latin Square Design
The RCBD setup allows us to use only one factor as a blocking variable. B. PROC TABULATE. The following SAS code gives the above basic 4 × 4 design. QUIT. A. To build a Latin square design for p treatments. A basic 4 × 4 Latin square design is Column 2 3 B C C D D A A B
Row 1 2 3 4
1 A B C D
4 D A B C
The SAS procedure Proc PLAN may be used in association with Proc TABULATE to generate designs. However. in each row and column.

... − y. An α level test rejects null hypothesis if F0 > Fp−1. − y. N (0. THE LATIN SQUARE DESIGN The statistical model for a Latin square design is i = 1. )2 ¯ ¯ ¯ y
Thus the ANOVA table for the Latin square design is
Source Treatments Rows Columns Error Total df p−1 p−1 p−1 (p − 2)(p − 1) p2 − 1 SS SST rt SSRow SSCol SSE SST MS M ST rt M SRow M SCol M SE F -statistic F0 =
M ST rt M SE
The test statistic for testing for no diﬀerences in the treatment means is F0 . − y. )2 y ¯ (yijk − yi.d. · · · . and treatments.
2.... σ 2 ). p j = 1.i.. and •
ijk
∼i. rows. − y. )+ y ¯ τj + ˆ (¯. p k = 1. − y.... The only diﬀerence is that b is replaced by p and the error degrees of freedom becomes (p − 2)(p − 1) instead of (k − 1)(b − 1). Multiple comparisons are performed in a similar manner as in the case of RCBD... · · · . SST . treatments.k − y. and error.. • αi is the ith block 1 (row) eﬀect....k + 2¯..j. An intuitive way of identifying the components is (since all the cross products are zero)
yijk = = y.. )2 y ¯ (¯.2.j..
There is no interaction between rows.
.2...1
Statistical Analysis
The total sum of squares. )2 ¯ (¯i..(p−2)(p−1) (α). )+ y ¯ αi + ˆ (¯. p
39
yijk = µ + αi + τj + βk +
ijk
where • µ is the grand mean..k − y. • βk is the kth block 2 (column) eﬀect. • τj is the jth treatment eﬀect. + ¯ µ+ ˆ (¯i. partitions into sums of squares due to columns. ) ¯ ¯ ¯ y eijk
We have SST = SSRow + SST rt + SSCol + SSE where SST = SSRow = p SST rt = p SSCol = p SSE = (yijk − y.. − y.j.. )+ y ¯ ˆ βk + (yijk − yi. − y. columns. − y. )2 y ¯ (¯. · · · .2. the model is completely additive..j..k + 2¯..

Period 1 2 3 4 1 A=38 B=32 C=35 D=33 Cow 2 3 B=39 C=45 C=37 D=38 D=36 A=37 A=30 B=35 4 D=41 A=30 B=32 C=33
The following gives the SAS analysis of the data.1875000 54.0625000 18.44 F Value 16. RANDOMIZED BLOCKS
Consider an experiment to investigate the eﬀect of four diﬀerent diets on milk production of cows. INPUT CARDS.0625000 18.6875000 147. Lactation period and cows are used as blocking variables.68750 F Value 16. RUN.38 22.525780 DF 3 3 3 DF 3 3 3
Type I SS 40.69 60.0001 0. CLASS COW DIET PERIOD.8125000 F Value 33. COW PERIOD DIET MILK @@.0026 <. MODEL MILK = DIET PERIOD COW.1875000 54. A 4 × 4 Latin square design is implemented.4375000 Root MSE 0.980298 Source DIET PERIOD COW Source DIET PERIOD COW DF 9 6 15 Sum of Squares 242.
OPTIONS LS=80 PS=66 NODATE. RUN.5625000 49. ------------------------------------------------Dependent Variable: MILK Source Model Error Corrected Total R-Square 0. DATA NEW.38 22.8750000 247.6875000
Mean Square 13. MEANS DIET/ LINES TUKEY. QUIT.0012 Mean Square 26.0026 <.6875000 Type III SS 40.40 Example
CHAPTER 2.44 Pr > F 0.6875000 147.901388 MILK Mean 35.5625000 49. QUIT.0001 0.2291667 Mean Square 13.5625000 4.9513889 0. During each lactation period the cows receive a diﬀerent diet.17 Pr > F 0.0012 Pr > F 0.05
.69 60.2291667
Tukey’s Studentized Range (HSD) Test for MILK Alpha 0. Assume that there is a washout period between diets so that previous diet does not aﬀect future results. There are four cows in the study.0002
Coeff Var 2. 38 39 45 41 1 2 3 4 2 2 2 2 2 3 4 1 32 37 38 30 1 2 3 4 3 3 3 3 3 4 1 2 35 36 37 32 1 2 3 4 4 4 4 4 4 1 2 3 33 30 35 33
PROC GLM. 1 1 1 2 1 2 3 1 3 4 1 4 .

0000 34.85 pM SE 4(. considering the milk production example. we get M SPeriod + (p − 1)M SE 49. All other pairs are declared to be signiﬁcantly diﬀerent.8125 4. + T. (p − 1)(p − 2)
where Ti.2.89559 2..8125) ˆ R(Latin.2064
41
Means with the same letter are not significantly different. and grand totals of the available observations. CRD) = (p + 1)M SE For instance. we employ an iterative procedure similar to the one in RCBD. The estimated relative eﬃciency of a Latin square design with respect to a RCBD with the rows omitted and the columns as blocks is M SRow + (p − 1)M SE ˆ R(Latin...2
Missing Values
Missing values are estimated in a similar manner as in RCBD’s. If only yijk is missing. .0026) on milk production. and T. Tukey Grouping A A A B B B Mean 37. . RCBDcows ) = = = 15.2.06 + 3(. respectively.j. are the row i.
2.7500 N 4 4 4 4 DIET 3 4 2 1
Thus there diet has a signiﬁcant eﬀect (p-value=0.8125) Thus a RCBD design with just cows as blocks would cost about 16 times as much as the present Latin square design to achieve the same sensitivity. T. RCBDrow ) = pM SE Furthermore. RCBDcol ) = pM SE Similarly. The Tukey-Kramer multiple comparison procedure indicates that diets C and D do not diﬀer signiﬁcantly. THE LATIN SQUARE DESIGN
Error Degrees of Freedom Error Mean Square Critical Value of Studentized Range Minimum Significant Difference 6 0. it is estimated by yijk = p(Ti. column k. the estimated relative eﬃciency of a Latin square design with respect to a CRD M SRow + M SCol + (p − 1)M SE ˆ R(Latin.5000 33.2.k . + T. treatment j..
2.j... the estimated relative eﬃciency of a Latin square design with respect to a RCBD with the columns omitted and the rows as blocks is M SCol + (p − 1)M SE ˆ R(Latin... T. If more than one value is missing.5000 37. we see that if we just use cows as blocks.k ) − 2T.3
Relative Eﬃciency
The relative eﬃciency of the Latin square design with respect to other designs is considered next. The same result holds for diets A and B.
.2.

same columns : 1 A B C 1 C B A 1 B A C 2 B C A 2 B A C 2 A C B 3 C A B 3 A C B 3 C B A replication 1
1 2 3
1 2 3
2
1 2 3 diﬀerent rows. RANDOMIZED BLOCKS
2.42
CHAPTER 2. same columns :
3
1 2 3
1 A B C 1 C B A 1 B A C
2 B C A 2 B A C 2 A C B
3 C A B 3 A C B 3 C B A
replication 1
4 5 6
2
7 8 9 diﬀerent rows.4
Replicated Latin Square
Replication of a Latin square is done by forming several Latin squares of the same dimension. This may be done using • same row and column blocks • new rows and same columns • same rows and new columns • new rows and new columns Examples The following 3×3 Latin square designs are intended to illustrate the techniques of replicating Latin squares. diﬀerent columns :
3
.2. same rows.

p yijkl = µ + αi + τj + βk + ψl + ijkl k = 1. thus increasing the complexity of the model. − y... )2 y ¯ (¯. · · · .. · · · .. The analysis of variance depends on the type of replication.2. p l = 1.. The associated ANOVA table is
Source Treatments Rows Columns Replicate Error Total df p−1 p−1 p−1 r−1 (p − 1)[r(p + 1) − 3] rp2 − 1 SS SST rt SSRow SSCol SSRep SSE SST MS M ST rt M SRow M SCol M SRep M SE F -statistic F0 =
M ST rt M SE
where SST =
p
(yijkl − y.j. )2 ¯ (¯.
.. it adds a parameter (or parameters) in our model. p j = 1.. Here ψl represents the eﬀect of the lth replicate. The statistical model is i = 1... THE LATIN SQUARE DESIGN 1 A B C 4 C B A 7 B A C 2 B C A 5 B A C 8 A C B 3 C A B 6 A C B 9 C B A replication 1
43
1 2 3
4 5 6
2
7 8 9
3
Replication increases the error degrees of freedom without increasing the number of treatments. However. · · · . · · · .. Replicated Square This uses the same rows and columns and diﬀerent randomization of the treatments within each square... − y. )2 y ¯
k=1 r
SSCol = np SSRep = p2
l=1
(¯.....2. )2 y ¯
j=1 p
SST rt = np SSRow = np
i=1 p
(¯i... )2 y ¯
and SSE is found by subtraction.. − y.k...l − y. r where r is the number of replications.

TITLE ’REPLICATED LATIN SQUARES SHARING BOTH ROWS AND COLUMNS’.28111111 Mean Square
.
SINGLE LATIN SQUARES
1
------------------------------------------.SQUARE=1 -------------------------------------------The GLM Procedure Dependent Variable: YIELD Source Model Error Corrected Total R-Square 0. The variable measured was the yield of carbon monoxide in a trap.4 2 1 2 2 31.0123 0.70333333 0. CARDS. BY SQUARE.32222 F Value 6.5 1 3 3 3 30. QUIT. 1 1 1 2 26.0076 Pr > F Mean Square 10.0 1 1 2 3 28.43 Pr > F 0.7 2 2 2 1 24. RANDOMIZED BLOCKS
Three gasoline additives (TREATMENTS. TITLE ’SINGLE LATIN SQUARES’.96777778 11. INPUT SQUARE COL ROW TREAT YIELD. The experiment was repeated twice.3 1 3 2 2 28.7 1 1 3 1 25. Here is the SAS analysis.384419 YIELD Mean 26.3 1 2 1 3 25.3 2 2 3 3 29.
DATA ADDITIVE.0137
Coeff Var 1.51555556 Root MSE 0.995419 Source COL ROW TREAT Source DF 2 2 2 DF DF 6 2 8 Sum of Squares 64. PROC GLM.6 1 2 3 2 28. RUN.7 2 1 3 1 24.93555556 23.4 1 3 1 1 21.9 2 2 1 2 28.1325 0.56222222 Type III SS
Mean Square 0.14777778 F Value 72. QUIT.86111111 19. PROC GLM.8 2 3 2 3 30.47 F Value Pr > F 0. MODEL YIELD= COL ROW TREAT.44 Example
CHAPTER 2.0 1 2 2 1 23. RUN.1 2 1 1 3 32. A B & C) were tested for gas eﬃciency by three drivers (ROWS) using three diﬀerent tractors (COLUMNS). CLASS COL ROW TREAT.29555556 64.2 .72222222 38.26 130.3 2 3 1 1 25.55 80. CLASS SQUARE COL ROW TREAT. MODEL YIELD= SQUARE COL ROW TREAT.5 2 3 3 2 29.460434
Type I SS 1.22000000 0.

94 45.44666667 57.SQUARE=2 -------------------------------------------The GLM Procedure Dependent Variable: YIELD Source Model Error Corrected Total R-Square 0.0215 Pr > F 0.78777778
Mean Square 22.96777778 11.74 1.48 Pr > F 0.65333333
REPLICATED LATIN SQUARES SHARING BOTH ROWS AND COLUMNS The GLM Procedure Dependent Variable: YIELD Source Model Error Corrected Total R-Square 0.94 45.56 1.3012222 F Value 8.65333333 Mean Square 3.72222222 38.22333333 28.53333 F Value 5.20111111 94.00055556 8.48666667 2.30666667 Type III SS 7.86111111 19.74333333 1.39388889 Mean Square 22.981606 Source COL ROW TREAT Source COL ROW TREAT DF 2 2 2 DF 2 2 2 DF 6 2 8 Sum of Squares 67.2.0122222 155.0123 0.20666667 0.2244 0.00055556 8.30666667
Mean Square 3.24000000 1.94 1.56 20.60055556 47.0114 0.39388889
.94 1.55 80.0076
------------------------------------------.0114 0.0542
Coeff Var 2.56 20.793725 YIELD Mean 28.60 F Value 9.63000000 F Value 17.19 Pr > F 0.42778 F Value 9. THE LATIN SQUARE DESIGN
45
COL ROW TREAT
2 2 2
1.0018
Coeff Var 5.74333333 1.01444444 7.2244 0.2.56 1.74 1.60 Pr > F 0.00722222 3.00722222 3.79 Pr > F 0.2563 0.851549 Source SQUARE COL ROW TREAT Source SQUARE COL ROW TREAT DF 1 2 2 2 DF 1 2 2 2 DF 7 10 17 Sum of Squares 132.28111111
6.20111111 94.3399 0.781748
Type I SS 7.3399 0.01444444 7.0215 5 Mean Square 11.56222222
0.2563 0.8576984 2.60055556 47.22333333 28.26 130.93555556 23.1441 0.530809
Type I SS 22.50000000 Root MSE 0.47
0.78777778 Type III SS 22.44666667 57.26000000 68.48 F Value 5.516978 YIELD Mean 27.00055556 4.0038889 23.0003 Mean Square 18.00055556 4.1441 0.1325 0.0003 Pr > F 0.0161111 Root MSE 1.48666667 2.

4 2 1 5 2 31.0 1 1 2 3 28. p k = 1.3 2 3 4 1 25. · · · . r
yijkl = µ + αi(l) + τj + βk + ψl +
ijkl
where αi(l) represents the eﬀect of row i nested within replicate (square) l.
DATA ADDITIVE.9 2 2 4 2 28. TITLE ’REPLICATED LATIN SQUARES SHARING ONLY COLUMNS’. PROC GLM.5 2 3 6 2 29. RANDOMIZED BLOCKS
In this instance the rows of diﬀerent squares are independent but the columns are shared.6 1 2 3 2 28. REPLICATED LATIN SQUARES SHARING ONLY COLUMNS The GLM Procedure Dependent Variable: YIELD Source Model DF 9 Sum of Squares 150. MODEL YIELD= SQUARE COL ROW(SQUARE) TREAT. the treatment eﬀect may be diﬀerent for each square. p j = 1. Thus.3 1 3 2 2 28. QUIT. · · · . The model is i = 1.7746296 F Value 33. CLASS SQUARE COL ROW TREAT. CARDS.5 1 3 3 3 30. i.46 Replicated Rows
CHAPTER 2. The statistical model in shows this by nesting row eﬀects within squares.3 2 2 6 3 29. The associated ANOVA table is
Source Treatments Rows Columns Replicate Error Total df p−1 r(p − 1) p−1 r−1 (p − 1)[rp − 2] rp2 − 1 SS SST rt SSRow SSCol SSRep SSE SST MS M ST rt M SRow M SCol M SRep M SE F -statistic F0 =
M ST rt M SE
Example We will reanalyze the previous example. 1 1 1 2 26.8 2 3 5 3 30.7 1 1 3 1 25. · · · . INPUT SQUARE COL ROW TREAT YIELD.2 .0 1 2 2 1 23.7 2 2 5 1 24.0001 7
. p l = 1.3 1 2 1 3 25.18 Pr > F <. · · · . this time assuming we have six diﬀerent drivers.e diﬀerent rows but same columns.4 1 3 1 1 21.7 2 1 6 1 24.9716667 Mean Square 16.1 2 1 4 3 32. RUN.

16888889 94.0002 0. · · · . r
yijkl = µ + αi(l) + τj + βk(l) + ψl +
ijkl
The associated ANOVA table is
Source Treatments Rows Columns Replicate Error Total df p−1 r(p − 1) r(p − 1) r−1 (p − 1)[r(p − 1) − 1] rp2 − 1 SS SST rt SSRow SSCol SSRep SSE SST MS M ST rt M SRow M SCol M SRep M SE F -statistic F0 =
M ST rt M SE
Example Lets reanalyze the previous example. Both row and column eﬀects are nested within the squares. CARDS.01444444 26.7 2 5 5 1 24.
DATA ADDITIVE.54222222 47. p k = 1.00055556 4.3 1 2 1 3 25.00055556 4. 1 1 1 2 26.00055556 8.78777778
Mean Square 22.93 12.3
.2.7 2 4 6 1 24.9 2 5 4 2 28.94 93.01444444 26.00722222 6.592351 DF 1 2 4 2 DF 1 2 4 2
YIELD Mean 27.0127 0. · · · .39388889 Mean Square 22.0014 <.973910 Source SQUARE COL ROW(SQUARE) TREAT Source SQUARE COL ROW(SQUARE) TREAT
8 17
4.4 1 3 1 1 21.0127 0.711024
0. p j = 1.94 93.75 Pr > F 0.16888889 94.42778 F Value 43.1 2 4 4 3 32. · · · .0002 0. · · · .0001
Type I SS 22.52 7.7 1 1 3 1 25.3 1 3 2 2 28.0444444 155.0 1 1 2 3 28.39388889
Replicated Rows and Columns The diﬀerent Latin squares are now independent.5055556
Coeff Var 2.4 2 4 5 2 31.0001 Pr > F 0.00055556 8.0 1 2 2 1 23. INPUT SQUARE COL ROW TREAT YIELD.93 12.00722222 6.2.75 F Value 43. p l = 1.5 1 3 3 3 30. THE LATIN SQUARE DESIGN
47
Error Corrected Total R-Square 0.0161111 Root MSE 0.0014 <. this time assuming six diﬀerent drivers and six diﬀerent tractors.78777778 Type III SS 22.54222222 47.6 1 2 3 2 28.52 7. The statistical model is i = 1.

78777778 Type III SS 22. CLASS SQUARE COL ROW TREAT.06 5.35555556 6. TITLE ’REPLICATED LATIN SQUARES (INDEPENDENT)’.54222222 47.0004 0. Greek letters. and treatments. with the exception of p = 6. and only once.89 107.16888889 94. MODEL YIELD= SQUARE COL(SQUARE) ROW(SQUARE) TREAT. and with each Greek letter.00055556 9.0004 0. Graeco-Latin squares exist for each p ≥ 3.0029 <.78777778
Mean Square 22.85 Pr > F 0.4394444 F Value 31.2
CHAPTER 2.8 30. RANDOMIZED BLOCKS
PROC GLM.0002 16
Coeff Var 2.52 Pr > F 0. REPLICATED LATIN SQUARES (INDEPENDENT) The GLM Procedure Dependent Variable: YIELD Source Model Error Corrected Total R-Square 0.00055556 2. RUN. columns.982991 Source SQUARE COL(SQUARE) ROW(SQUARE) TREAT Source SQUARE COL(SQUARE) ROW(SQUARE) TREAT DF 1 4 4 2 DF 1 4 4 2 DF 11 6 17 Sum of Squares 152. each treatment appears once.8526768 0.3794444 2.00055556 2.06 5.36 14. 5 6 6 6 6 4 5 6 3 1 3 2 29.0350 0.5 29.42222222 26.662906 YIELD Mean 27.0001 Mean Square 13. each row. QUIT.0161111 Root MSE 0.0350 0.42222222 26.3
The Graeco-Latin Square Design
Graeco-Latin squares are used when we have three blocking variables.36 14.39388889
2. Thus we investigate four factors: rows.0001 Pr > F 0. in each column. An example of a Graeco-Latin square for p = 4 is Cγ Dδ Aα Bβ The statistical model is Aβ Bα Cδ Dγ Dα Cβ Bγ Aδ Bδ Aγ Dβ Cα
.48
2 2 2 2 . Greek letters are used to represent the blocking in the third direction.85 F Value 50.89 107.00055556 9.42778 F Value 50.16888889 94.35555556 6.416915
Type I SS 22. In a Graeco-Latin square.54222222 47.0029 <.39388889 Mean Square 22.6366667 155.3 25.

)2 y ¯
k=1 r
SSCol = p SSG = p
l=1
(¯...3... The associated ANOVA table is
Source Treatments Greek letters Rows Columns Error Total df p−1 p−1 p−1 p−1 (p − 1)(p − 3) p2 − 1 SS SST rt SSG SSRow SSCol SSE SST MS M ST rt M SG M SRow M SCol M SE F -statistic F0 =
M ST rt M SE
where SST =
p
(yijkl − y. ﬁve diﬀerent stores assigned to the column classiﬁcation. − y. These included: (1) day of the week. and (3) eﬀect of shelf height. Day Mon Tue Wed Thur Fri 1 E α (238) D δ (149) B (222) C β (187) A γ (65) 2 C δ (228) B β (220) E γ (295) A (66) D α (118) Store 3 B γ (158) A α (92) D β (104) E δ (242) C (279) 4 D (188) C γ (169) A δ (54) B α (122) E β (278) 5 A β (74) E (282) C α (213) D γ (90) B γ (176)
The following is the SAS analysis.. )2 y ¯
and SSE is found by subtraction. There were a number of sources of variation. He decided to conduct a trial using a Graeco-Latin square design with ﬁve weekdays corresponding to the row classiﬁcation. p
49
yijkl = µ + αi + τj + βk + ψl +
ijkl
where ψl is the eﬀect of the lth Greek letter..... )2 y ¯
j=1 p
SST rt = p SSRow = p
i=1 p
(¯i. (2) diﬀerences among stores...j.. The following example is taken from Petersen : Design and Analysis of Experiments (1985). )2 y ¯ (¯.l − y.. − y. p k = 1. B. E.. p l = 1. · · · . · · · . · · · .. D.. The following table contains the results of his trial. C.
.. and ﬁve shelf heights corresponding to the Greek letter classiﬁcation. THE GRAECO-LATIN SQUARE DESIGN i = 1... · · · .. − y. )2 ¯ (¯.k. p j = 1. Example A food processor wanted to determine the eﬀect of package design on the sale of one of his products.. He had ﬁve designs to be tested : A.2.

7600 Root MSE 30.39 Mean Square 8249.0001 0.1366
.42 31.8400 7397. 1 1 5 1 238 1 2 3 4 228 1 3 2 3 158 1 4 4 5 188 1 5 1 2 74 2 1 4 4 149 2 2 2 2 220 2 3 1 1 92 2 4 3 3 169 2 5 5 5 282 3 1 2 5 222 3 2 5 3 295 3 3 4 2 104 3 4 1 4 54 3 5 3 1 213 4 1 3 2 187 4 2 1 5 66 4 3 5 4 242 4 4 2 1 122 4 5 4 3 90 5 1 1 3 65 5 2 4 1 118 5 3 3 5 279 5 4 5 2 278 5 5 2 4 176 .3600 F Value 1. DATA GL.1600 8852.0001 0.5600 1544.2510 0.39 F Value 1.40954 Y Mean 172.
CHAPTER 2.2510 0. RANDOMIZED BLOCKS
The SAS System The GLM Procedure Dependent Variable: Y Source Model Error Corrected Total R-Square 0.42 31.21 2.946929 Source ROW COL TRT GREEK Source ROW COL TRT GREEK DF 16 8 24 Sum of Squares 131997.21 2. CLASS ROW COL TRT GREEK.6400 386. RUN. MODEL Y = ROW COL TRT GREEK.9600 115462.64304 DF 4 4 4 4 DF 4 4 4 4
Type I SS 6138.7400 F Value 8.1600 8852.0019
Coeff Var 17.5400 2213.50
OPTIONS LS=80 PS=66 NODATE.1600 Type III SS 6138.7919 <.1366 Pr > F 0.0400
Pr > F 0.0400 Mean Square 1534.66 0.92
2
Pr > F 0.5400 2213. QUIT.8650 924.9200 139395. CARDS. PROC GLM.7919 <. INPUT ROW COL TRT GREEK Y.2400 28865.5600 1544.9600 115462.2400 28865.1600
Mean Square 1534.6400 386.66 0.

INCOMPLETE BLOCK DESIGNS There are highly signiﬁcant diﬀerences in mean sales among the ﬁve package designs. 1 A C D 2 B D E 3 B C E 4 C D E block 5 6 A A B B C . The statistical model is i = 1.
51
2. C.4. and E where 3 treatments appear per block. where a < k.D 7 A D E 8 B C D k a
9 A B E
10 A C E
Usually. B. Thus the total sample size is ab = kr.4
Incomplete Block Designs
Some experiments may consist of a large number of treatments and it may not be feasible to run all the treatments in all the blocks.
. A. Examples Consider three treatments. Designs where only some of the treatments appear in every block are known as incomplete block designs. · · · . • Let λ be the number of times each pair of treatments appear together in the same block. · · · . BIBD’s may be obtained using fewer than Statistical Analysis We begin by introducing some notation. Suppose there are k treatments and b blocks. The following two examples illustrate this procedure.
2. Thus using three blocks 1 A B block 2 B C 3 A C
3 2
=3
Now consider ﬁve treatments. and C where two treatments are run in every block. however. We use 10 blocks. Each block can hold a treatments. We can also show that λ(k − 1) = r(a − 1) and thus λ = r(a − 1)/(k − 1). k yij = µ + τi + βj + ij j = 1. A. There are ways of choosing 2 out of three.
blocks. D. each block is selected in a balanced manner so that any pair of treatments occur together the same number of times as any other pair.4.2. One way to construct a BIBD is by using k blocks and assigning diﬀerent combination of treatments a to every block. b with the same assumptions as the RCBD model. B.1
Balanced Incomplete Block Designs (BIBD’s)
In a BIBD setup.
• Let r be the number of blocks in which each treatment appears.

kr−k−b+1 (α) √ .j
− y. − 1 a 1 if trt i in block j . )2 ¯
• SSBlocks = k
• Let Ti. For instance.. τi . with MEER=α. we declare treatment i and j to be signiﬁcantly diﬀerent. The diﬀerence here is that the sum of squares due to treatments needs to be adjusted for incompleteness.52
CHAPTER 2. λk Thus individual as well as simultaneous inference may be made concerning the treatment means.. is ˆ aM SE . λk
k
ˆ βj =
rQj .. respectively. M SE λk 2 The following example is taken from em Montgomery: Design and Analysis of Experiments.
i=1
The standard error of the adjusted treatment i mean. Thus. and T. The corresponding ANOVA table is
Source Treatments Blocks Error Total df k−1 b−1 kr − k − b + 1 kr − 1 SS SSTrt(adj) SSBlocks SSE SST MS M STrt(adj) M SBlocks M SE F -statistic F0 =
MS
Trt(adj)
M SE
Estimates of the model are µ = y. Now k a i=1 Q2 i SSTrt(adj) = λk • SSE = SST − SSTrt(adj) − SSBlocks . while making all pairwise comparisons. RANDOMIZED BLOCKS
We partition the total sum of squares in the usual manner. . )2 ¯ − y.j − r Multiple Comparisons
φij Ti. we have SST = SSTrt(adj) + SSBlocks + SSE where • SST =
k i=1 b j=1 (yij b y j=1 (¯.j
j=1
The quantity Qi is the ith treatment total minus the average of the block totals containing treatment i. if |ˆi − τj | exceeds τ ˆ Bonferroni: and 2a qk. ˆ ¯ where τi = ˆ
aQi . and φij = Let Qi = Ti.j be the ith treatment and the jth block totals. and error. into sum of square due to treatments. Tukey: tkr−k−b+1 α 2
k 2
M SE
2a λk
. blocks. 0 otherwise
b
φij T. λb
1 Qj = T.

2. However. Since variations in the batches of raw material may aﬀect the performance of the catalysts. PROC GLM. CONTRAST ’1 VS 2’ CATALYST 1 -1 0 0. DATA CHEM. LSMEANS CATALYST / TDIFF PDIFF ADJUST=BON STDERR. λ = 2. LSMEANS CATALYST / TDIFF ADJUST=TUKEY. INCOMPLETE BLOCK DESIGNS Example
53
A chemical engineer thinks that the time of reaction for a chemical process is a function of the type of catalyst employed.94 Pr > F 0.4.95833333 F Value 19.0024 52
. Four catalysts are being investigated. The following table summarizes the results. CLASS CATALYST BATCH. MODEL TIME = CATALYST BATCH. the engineer decides to use batches of raw materials as blocks.
OPTIONS LS=80 PS=66 NODATE. CARDS. each batch is only large enough to permit three catalysts to be run.
The associated SAS output is
The SAS System The GLM Procedure Dependent Variable: TIME Source Model DF 6 Sum of Squares 77. INPUT CATALYST BATCH TIME. The following SAS code is used to analyze the above data. 1 1 73 1 2 74 1 4 71 2 2 75 2 3 67 2 4 72 3 1 73 3 2 75 3 3 68 4 1 75 4 3 72 4 4 75 . ESTIMATE ’1 VS 2’ CATALYST 1 -1 0 0. The experimental procedure consists of selecting a batch of raw material. a = 3. Treatment (Catalyst) 1 2 3 4 Block (Batch) 1 2 3 4 73 74 71 75 67 72 73 75 68 75 72 75
Thus k = 4. and observing the reaction time. QUIT. This is known as a symmetric design since k = b.75000000 Mean Square 12. b = 4. loading the pilot plant. applying each catalyst in a separate run of the pilot plant. RUN. r = 3.

0130 -4.6250000 72.0010 53
Type I SS 11.0001 <.50000 F Value 5. RANDOMIZED BLOCKS
Error Corrected Total R-Square 0.53709 4 -5.58333333 22.19183 0.98 33.0000000 LSMEAN Number 1 2 3 4 3 -0.83378
.00000000
0.0284 The SAS System The GLM Procedure Least Squares Means Adjustment for Multiple Comparisons: Tukey-Kramer CATALYST 1 2 3 4 TIME LSMEAN 71.0000 -0.25000000 81.112036 DF 3 3 DF 3 3
Root MSE 0.191833 0.0000000 75.02777778
The GLM Procedure Least Squares Means Adjustment for Multiple Comparisons: Bonferroni CATALYST 1 2 3 4 TIME LSMEAN 71.4868051 0.358057 1 2 -0.89 Pr > F 0.88888889 22.53709 1.895144 1.67 33.959877 Source CATALYST BATCH Source CATALYST BATCH
5 11
3.89 F Value 11.806226
TIME Mean 72.75000000 66.0107 0.3750000 71.0464 54 4 -5.08333333 The SAS System
Mean Square 3.6250000 72.0464
Least Squares Means for Effect CATALYST t for H0: LSMean(i)=LSMean(j) / Pr > |t| Dependent Variable: TIME i/j 1 2 0.3750000 71.0000 5.0000000 75.08333333 Type III SS 22.0001 <.0000 4.9825 3 -0.358057 1.0001 LSMEAN Number 1 2 3 4
Least Squares Means for Effect CATALYST t for H0: LSMean(i)=LSMean(j) / Pr > |t| Dependent Variable: TIME i/j 1 2 3 4 0.0000000 Standard Error 0.0000 0.0209 -4.0284 -4.537086 1.89514 0.8085 -0.0000 4.0000 0.54
CHAPTER 2.0010 Pr > F 0.66666667 66.833775 0.4868051 0.0001 <.83378 0.296689 0.19183 0.0415 0.02777778 Mean Square 7.4868051 0.35806 0.29669 0.35806 1.89514 1.4868051 Pr > |t| <.65000000
Coeff Var 1.0209 1 2 -0.

INCOMPLETE BLOCK DESIGNS
0.000 y4.7349 4. 4 columns.895144 0.29669 0.9462 4.0281 55 0.0130 0.833775 0. and 5 rows.3
Other Incomplete Designs
There are other incomplete designs that will not be discussed here.8085 5.25000000
t Value -0.69821200 F Value 0.
2.000
2.4. Both the Bonferroni and the Tukey-Kramer procedures give us y1.0107.0281
55
3 4
Parameter 1 VS 2
Estimate -0. and rectangular lattices.9825 0.625 y3. cubic.36
Pr > |t| 0. ¯ 71.537086 0.375 y2. ¯ 75. Thus we declare the catalysts to be signiﬁcantly diﬀerent.296689 0.13 Pr > F 0.0175 The SAS System The GLM Procedure Dependent Variable: TIME Contrast 1 VS 2 DF 1 Contrast SS 0.9462 0. ¯ 71. These include the partially balanced incomplete block design and lattice designs such as square.2.08333333 Standard Error 0.
.67 with a p-value of 0.191833 0.7349
We may see from the output that F0 = 11.2
Youden Squares
These are incomplete Latin squares.0175 -4. ¯ 72.08333333 Mean Square 0.4. Row 1 2 3 4 5 1 A B C D E Column 2 3 B C C D D E E A A B 4 D E A B C
A Youden square may be considered as a symmetric BIBD with rows corresponding to blocks and each treatment occurring exactly once in each position of the block.4. The following example shows a Youden square with 5 treatments. in which the number of columns is not equal to the number of rows.

RANDOMIZED BLOCKS
.56
CHAPTER 2.

1 Introduction
This chapter focuses on the study of the eﬀects of two or more factors using factorial designs. respectively. increasing factor A from level 1 to level 2 causes an increase of 10 units in the response. and at the second level 10 − 25 = −15 . For instance. A factorial design is a design in which every combination of the factors is studied in every trial (replication). known as proﬁle plots. Each replicate in a two-factor factorial design will contain all the a × b treatment combinations. The eﬀect of A depends on the level of B chosen. the main eﬀect of B is 40 + 30 30 + 20 − = 10 2 2 In this case there is no interaction since the eﬀect of factor A is the same at all levels of B: 40 − 30 = 10 and 30 − 20 = 10 B1 30 40 B2 20 30
Sometimes the eﬀect of the ﬁrst factor may depend on the level of the second factor under consideration. display the two situations. consider a two factor experiment in which the two factors A and B have two levels each. This is know as the main eﬀect of factor A. say. with a and b levels. 57 B1 20 40 B2 25 10
. Then the experiment is run once. For example.Chapter 3
Factorial Designs
3. In a similar fashion. The following plots. The eﬀect of factor A is the change in response due to a chang in the level of A. The following is the resulting output. we may have two factors A and B. A1 A2 The eﬀect of factor A at the ﬁrst level of B is 40 − 20 = 20 . A1 A2 The average eﬀect of factor A is then 40 + 30 30 + 20 − = 10 2 2 Thus. The following table shows two interacting factors.

188. and all 36 tests are run in random order. 58. 80. Material 1 2 3 15 130. 110. 40. FACTORIAL DESIGNS
B2 Response
B1
A1
A2
Factor A
B1
Response
B2
A1
A2
Factor A
The following example taken from Montgomery : Design and Analysis of Experiments considers two factors. 122. 139 80 20.58
CHAPTER 3. and the experiment repeated 4 times. 82. Four batteries are tested at each combination of plate material and temperature. 74. 159. 70. 75 136. 120. 58 25. 160 Temperature 65 34. 115 174. 60
. 155. 168. 45 96. 70. 150. 82. The results are shown below. Example The maximum output voltage of a particular battery is thought to be inﬂuenced by the material used in the plates and the temperature in the location at which the battery is installed. each with 3 levels. 126 138. 180 150. 104. 106.

ya2n ··· b y1b1 . (τ β)ij is the eﬀect of the interaction between the ith level of factor A and the jth level of factor B.2. a j = 1. · · · . τi is the eﬀect of the ith level of factor A. y2bn yab1 . · · · . Factor A 1 2 . βj is the eﬀect of the jth level of factor B. a The statistical model is i = 1.
(3. · · · . The following table displays the data arrangement of such an experiment. .
Average Output Voltage
100
150
Type 3
0
50
Type 1 Type 2
40
50
60
70
80
90
Temperature
3. Suppose the experiment is run n times at each combination of the levels of A and B. · · · . · · · . y21n ya11 . n 1 y111 . · · · . · · · . j). · · · . . ya1n Factor B 2 y121 .2
The Two-Factor Factorial Design
We shall now study the statistical properties of the two-factor design. y11n y211 . and ijk is the random error associated with the kth replicate in cell (i. · · · . b k = 1. · · · .1)
where µ is the overall mean. THE TWO-FACTOR FACTORIAL DESIGN
59
As the following proﬁle plot shows the interaction between temperature and material type may be signiﬁcant. · · · . y22n ya21 . y12n y221 . We will perform formal statistical tests to determine whether the interaction is signiﬁcant in the next section. y1bn y2b1 .3. The proﬁle plot is constructed using the average response for each cell. · · · . Let the factors be A and B each with a and b levels.
. yabn
yijk = µ + τi + βj + (τ β)ij +
ijk .

1)..
and
ijk
∼iid N (0.j.. The model residuals are obtained as eijk = yijk − yijk = yijk − yij.j. · · · . + y. ˆ ¯ τi = yi. ˆ ˆ ˆ ¯ Thus. . FACTORIAL DESIGNS
3. ¯ ¯ ¯ ¯
Using the model in (3.
Estimation The estimators of the model parameters are obtained via the least squares procedure.. − y. − y.2.. a j = 1.1
The Fixed Eﬀects Model
a b a b
The ﬁxed eﬀects model is given by (3. .60
CHAPTER 3. − y. we are interested in the hypotheses A main eﬀect: H0 : τ1 = · · · = τa = 0 HA : at least one τi = 0 B main eﬀect: H0 : β1 = · · · = βb = 0 HA : at least one βj = 0 AB interaction eﬀect: H0 : (τ β)11 = · · · = (τ β)ab = 0 HA : at least one (τ β)ij = 0 The following is the two-factor ﬁxed eﬀects ANOVA table:
Source A B AB Error Total df a−1 b−1 (a − 1)(b − 1) ab(n − 1) abn − 1 SS SSA SSB SSAB SSE SST MS M SA M SB M SAB M SE F -statistic
M SA FA = M SE FB = M SB M SE FAB = M SAB M SE
. β ¯ ¯
i = 1. · · · . b
(τ β)ij = yij. a j = 1. − yi. ˆ ¯ Inference In the two-factor ﬁxed eﬀects model. ˆ ¯ ¯ ˆj = y. .. every observation in cell (i. · · · .1) along with the assumptions τi =
i=1 j=1
βj =
i=1
(τ β)ij =
j=1
(τ β)ij = 0 . · · · ... σ 2 ). They are µ = y.. j) is estimated by the cell mean. we can easily see that the observation yijk is estimated by ˆ yijk = µ + τi + βj + (τ β)ij = yij... b i = 1.

PROC MEANS NOPRINT. THE TWO-FACTOR FACTORIAL DESIGN where
a
61
SSA = bn
i=1 b
(¯i.j.. INPUT MAT TEMP LIFE. Example Consider the battery life experiment given above and assume that the material type and temperatures under consideration are the only ones we are interested in.. QUIT. respectively. )2 y ¯ ¯ ¯
n
SSE =
i=1 j=1 k=1 a b n
(yijk − yij.. PROC GPLOT. 3 3 60 ..3.2. or AB eﬀect to be signiﬁcant if FA > Fa−1.ab(n−1) (α).0001
. )2 y ¯ (¯. SYMBOL I=JOIN.. VAR LIFE.
OPTIONS PS=66 LS=80 NODATE.. RUN. PROC GLM. 1 1 130 1 1 155 ..ab(n−1) (α).02778 F Value 11. MODEL LIFE=MAT TEMP MAT*TEMP. DATA BATTERY. − y.22222 Mean Square 7427. RUN. BY MAT TEMP. CLASS MAT TEMP. )2 y ¯
j=1 a b
SSB = an
SSAB = n
i=1 j=1 a b
(¯ij. )2 ¯
i=1 j=1 k=1
SST =
We then declare the A. The following SAS code may be used to produce the two-factor factorial ANOVA table along with an interaction plot (given above). RUN..00 Pr > F <. + y. DATALINES.
The corresponding SAS output is
Dependent Variable: LIFE Source Model DF 8 Sum of Squares 59416. PLOT MN*TEMP=MAT. or FAB > F(a−1)(b−1). QUIT. QUIT.. )2 ¯ (yijk − y... − y. FB > Fb−1. − y. OUTPUT OUT=OUTMEAN MEAN=MN.ab(n−1) (α).. − yi. B.j.

44444
Therefore.97 3. = 30 ¯
.. = 40 ¯ 0.36111 2403.0186 Pr > F 0. ’s). = 20 ¯ 40.91 28. 20.86111 19559. = 50 ¯ 35 2 30.72222 39118. we declare that both main eﬀects as well as the interaction eﬀect are signiﬁcant.36111 2403.62372 DF 2 2 4 DF 2 2 4
Root MSE 25. if the interaction between A and B is signiﬁcant. we may compare the means of factor A pooled over all levels of factor B (¯i.. ¯ 1 10. On the other hand.0001 0.91 28. ’s) and the means of factor B pooled over all y levels of factor A (¯.44444 Mean Square 5341. 60 y21.j.0020 <.62
CHAPTER 3. 30 y11.72222 9613. ¯ 30 30 y.72222 39118.86111 19559. y The following example shows the need for comparing the means of one factor within each level of the other.0001 0.j. 10. 40.77778
Mean Square 5341.0186
Type I SS 10683.56 Pr > F 0.765210 Source MAT TEMP MAT*TEMP Source MAT TEMP MAT*TEMP
27 35
18230. In the case where the interaction between factors A and B is not signiﬁcant. FACTORIAL DESIGNS
Error Corrected Total R-Square 0.56 F Value 7.. 20 y22.21296
Coeff Var 24. 50 y12.
Multiple Comparisons
The manner in which we perform multiple comparisons is dependent on whether or not the interaction eﬀect is signiﬁcant. 50. we need to y compare the means of one factor within each level of the other factor (¯ij.0020 <. ’s)..
B A 1 2 y.98486
LIFE Mean 105.97222
675. = 10 ¯ 25 yi.97 3.72222 9613.5278 F Value 7.75000 77646.77778 Type III SS 10683.

we declare βj to be signiﬁcantly diﬀerent from βj if |¯.j . − yi j. ¯ ¯ Notice that within each level of B. the eﬀect of A is nonzero. − yi . The standard errors are SE(¯ij. ) = y ¯ and 2M SE na Thus. nb 2M SE nb
Similarly.ab(n−1) (α) √ 2 2M SE .. however. n
The following SAS code provides the Tukey-Kramer intervals for the battery life example considered above. = −30 . THE TWO-FACTOR FACTORIAL DESIGN
63
B1
Response
B2
A1
A2
Factor A
The interaction plot shows that factor A may have an eﬀect on the response. = 30. the a factor A means may be compared via the Tukey-Kramer procedure as : declare τi to be signiﬁcantly diﬀerent from τi if SE(¯.2. − y12. − y1...j. ¯ ¯
Generally. j ). − y. na
When interaction is present we use the ab cell means. − y11. − yi j .j..ab(n−1) (α) √ 2 2M SE . when interaction is not present. − y. n
where (i. | > y ¯ qb. ) = y ¯ |¯i. | > y ¯ qa.. for example. The Tukey-Kramer method declares τi to be signiﬁcantly diﬀerent from τi in level j of factor B if |¯ij.. = 0. − yi .ab(n−1) (α) √ 2 2M SE . y21. ) = y ¯ 2M SE . | > y ¯ qab. j) = (i .
. ¯ ¯ y22. y2.j .3. we use the following standard errors in our comparisons of means: SE(¯i.

0014 1 2 -2.95132 0.578956 0.750000 49.083333 LSMEAN Number 1 2 3
CHAPTER 3.0001 1 2 3.0628 3.2718 3 -3.0010 3 7. CLASS MAT TEMP.500000 144. FACTORIAL DESIGNS
Least Squares Means for Effect MAT t for H0: LSMean(i)=LSMean(j) / Pr > |t| Dependent Variable: LIFE i/j 1 2 3 2. ---------------------------------------------------------------Adjustment for Multiple Comparisons: Tukey MAT 1 2 3 LIFE LSMEAN 83.951318 0.750000 85.333333 125.64
PROC GLM.2718
Adjustment for Multiple Comparisons: Tukey TEMP 1 2 3 LIFE LSMEAN 144.0044 -4.604127 <.09272 0.092717 0.250000 57. LSMEANS MAT | TEMP /TDIFF ADJUST=TUKEY.0044 -7.0010
Adjustment for Multiple Comparisons: Tukey MAT 1 1 1 2 2 2 3 3 3 TEMP 1 2 3 1 2 3 1 2 3 LIFE LSMEAN 134.750000 119.0014 -1. MODEL LIFE=MAT TEMP MAT*TEMP.0628 1.57896 0.60413 <.500000 155.166667 LSMEAN Number 1 2 3
Least Squares Means for Effect TEMP t for H0: LSMean(i)=LSMean(j) / Pr > |t| Dependent Variable: LIFE i/j 1 2 3 -3.500000 LSMEAN Number 1 2 3 4 5 6 7 8 9
Least Squares Means for Effect MAT*TEMP
. RUN.750000 57.51141 0.166667 108.583333 64.372362 0.51141 0.37236 0.833333 107. QUIT.0001 4.000000 145.

5819 3.9953 -3.2179 0.5819 7 0.9165 -5.823323 0.523887 0.63969 0.143117 0.0172 1.0000 4.38793 0.20429 0.86404 0.95928 0.816368 0.9995 -2.0172 1.36082 0.9165
65
Least Squares Means for Effect MAT*TEMP t for H0: LSMean(i)=LSMean(j) / Pr > |t| Dependent Variable: LIFE i/j 1 2 3 4 5 6 7 5.9997 -1.9999 -4.78261 0.8282 3 4.0604 -3.360815 0.9999 1 2 4.0022 0.63949 0.81657 0.9616 -0.23836 0.680408 0.8823 -1.415038 0.5819 -3.0001 -0.53749 0.34721 0.4354 1.0000 0.27908 0.54425 0.68041 0.0019 0.41504 0.59867 0.0460 -3.0000 5.0014 -4.183834 0.959283 0.0460 -0.0014 1.0003 3.0000 -3.0018 3 4.435396 1.0000 9 2.0015 1.0005 1.204294 0.0015 0.0006 8 -0.23836 0.0743
Adjustment for Multiple Comparisons: Tukey Least Squares Means for Effect MAT*TEMP t for H0: LSMean(i)=LSMean(j) / Pr > |t| Dependent Variable: LIFE i/j 8 9 1 0.0065 0.0022 0.347209 0.3.01361 1.0000 5.95928 0.40153 0.0475 -0.0018 -4.537493 0.0000 5.387926 0.82332 0.81657 0.0604
Notice that SAS has labelled the 9 cells consecutively as
.0067 1.6420
Least Squares Means for Effect MAT*TEMP t for H0: LSMean(i)=LSMean(j) / Pr > |t| Dependent Variable: LIFE i/j 8 9 6 5.2179 0.86404 0.72133 0.279077 0.0004 3.80296 0.319795 0.0003 -5.707721 0.63969 0.5819 -5.50343 0.14291 0.82332 0.09524 1.544245 0.9953 -4.401533 0.0019 4 -1.0006 6 4.802964 0.18383 0.59867 0.2017 2 4.6420 -1.8347 4 -0.9991 -1.095243 1.0067 -0.142915 0.503427 0.959283 0.52389 0.2017 -1.0000 4.9997 -3.14312 0.0005 -0.70772 0.0065 -4.013606 1.0172 5 1.0172 7 -0.31979 0.8347 3.823323 0.721327 0.0004 -1.9616 -5.9995 -4. THE TWO-FACTOR FACTORIAL DESIGN
t for H0: LSMean(i)=LSMean(j) / Pr > |t| Dependent Variable: LIFE i/j 1 2 3 4 5 6 7 -4.782605 0.8823 -5.0743 8 9 3.0001 3.81637 0.2.0475 1.639488 0.42179 1.8282 -1.9991 5 0.42179 1.

75
y33.75 y32. PLOT RES*MAT. SYMBOL V=CIRCLE. ¯ 57.75 ¯ y32. HIST RES / NORMAL (L=1 MU=0 SIGMA=EST). ¯ 155. ¯ 119.75
y21.66 1 4 7 We use underlining to summarize the results. CLASS MAT TEMP. ¯ 134.75 y21. ¯ 155. PROC GPLOT. ¯ ¯ y31.50 85.75 144.25 y22.75 2 5 8 3 6 9
CHAPTER 3. Temperature within Material: Material = 1 Material = 2 Material = 3 Material within Temperature: Temperature = 1 Temperature = 2 Temperature = 3 y12. ¯ 134. RUN. QQPLOT RES / NORMAL (L=1 MU=0 SIGMA=EST). 85.5 144.
PROC GLM. ¯ 145.25 y23.00 y12. 145. The graphical check for normality is the QQ-plot of the residuals. PLOT RES*PRED. ¯ y33. QUIT.
. and • factor B.5
Model Diagnostics Diagnostics are run the usual way via residual analysis. ¯ 57.75
y23. MODEL LIFE=MAT TEMP MAT*TEMP. ¯ 49. PROC UNIVARIATE NOPRINT. FACTORIAL DESIGNS
y11. ¯ Graphical checks for equality of variances as well as unusual observations are plots of residuals versus • yij. ¯ • factor A. OUTPUT OUT=DIAG R=RES P=PRED. QUIT.50 y13. ¯ y13.50 y22. ¯ 119. ¯ 49. ¯ 57. For the battery life example the following SAS code may be used to produce the required plots. PLOT RES*TEMP. RUN. . Recall that the residuals for the two-factor factorial model are given by eijk = yijk − yij. ¯ y31.00
y11. RUN. QUIT.50 57.

34769847 -----p Value----Pr > W-Sq Pr > A-Sq >0. The residual plots show a mild deviation from constant variance.
Goodness-of-Fit Tests for Normal Distribution Test Cramer-von Mises Anderson-Darling ---Statistic---W-Sq A-Sq 0.05586092 0. We may need to transform the data.250 >0. THE TWO-FACTOR FACTORIAL DESIGN
67
Formal tests for normality indicate no deviation from normality.250
.3. The QQ-plot shows no signs of nonnormality.2.

MEANS CELL/ HOVTEST=LEVENE. The following trick of relabelling the cells and running a one-factor ANOVA model may be used to perform Levene’s test. The partial SAS code is given along with the output. CLASS CELL. -------------------------------------------------Levene’s Test for Homogeneity of LIFE Variance ANOVA of Squared Deviations from Group Means
.. INPUT MAT TEMP LIFE. PROC GLM..68
CHAPTER 3. 3 3 82 3 3 60 .
OPTIONS LS=80 PS=66 NODATE. MODEL LIFE=CELL. 1 1 130 1 1 155 . QUIT. DATALINES. CELL = 3*(MAT . RUN. FACTORIAL DESIGNS
There is no command in SAS to perform Levene’s test for equality of variances.1) + TEMP. DATA BATTERY.

2107
.48
Pr > F 0.2.3. THE TWO-FACTOR FACTORIAL DESIGN
Sum of Squares 5407436 12332885 Mean Square 675929 456774
69
Source CELL Error
DF 8 27
F Value 1.

70

CHAPTER 3. FACTORIAL DESIGNS

3.2.2

Random and Mixed Models

We shall now consider the case where the levels of factor A or the levels of factor B are randomly chosen from a population of levels. The Random Eﬀects Model In a random eﬀects model the a levels of factor A and the b levels of factor B are random samples from populations of levels. The statistical model is the same as the one given in (3.1) where τi , βj , (τ β)ij , and ijk 2 2 2 are randomly sampled from N (0, στ ), N (0, σβ ),N (0, στ β ), and N (0, σ 2 ) distributions, respectively. Thus, the variance of any observation is 2 2 2 V ar(yijk ) = στ + σβ + στ β + σ 2 The hypotheses of interest are A main eﬀect:

2 H0 : σ τ = 0 2 HA : σ τ = 0

B main eﬀect:

2 H0 : σβ = 0 2 HA : σβ = 0

**AB interaction eﬀect:
**

2 H0 : στ β = 0 2 HA : στ β = 0

**The ANOVA table needs some modiﬁcations. This is seen examining the expected mean squares.
**

2 2 E(M SA ) = σ 2 + nστ β + bnστ 2 2 E(M SB ) = σ 2 + nστ β + anσβ 2 E(M SAB ) = σ 2 + nστ β

**E(M SE ) = σ 2 Therefore, the two-factor random eﬀects ANOVA table is:
**

Source A B AB Error Total df a−1 b−1 (a − 1)(b − 1) ab(n − 1) abn − 1 SS SSA SSB SSAB SSE SST MS M SA M SB M SAB M SE F -statistic

M SA FA = M SAB M SB FB = M SAB FAB = M SAB M SE

From the expected mean squares, we get the estimates of the variance components as σ 2 = M SE ˆ M SAB − M SE στ β = ˆ2 n M SA − M SAB στ = ˆ2 bn M SB − M SAB σβ = ˆ2 an

3.2. THE TWO-FACTOR FACTORIAL DESIGN Example

71

Consider the battery life example once again. This time assume that the material types and temperatures are randomly selected out of several possibilities. We may then use the RANDOM statement in PROC GLM of SAS to analyze the data as a random eﬀects model. Here are the SAS code and associated output.

OPTIONS LS=80 PS=66 NODATE; DATA BATTERY; INPUT MAT TEMP LIFE; DATALINES; 1 1 130 1 1 155 ....... 3 3 60 ; PROC GLM; CLASS MAT TEMP; MODEL LIFE=MAT TEMP MAT*TEMP; RANDOM MAT TEMP MAT*TEMP / TEST; RUN; QUIT; --------------------------------------------------------The GLM Procedure Source MAT TEMP MAT*TEMP Type III Expected Mean Square Var(Error) + 4 Var(MAT*TEMP) + 12 Var(MAT) Var(Error) + 4 Var(MAT*TEMP) + 12 Var(TEMP) Var(Error) + 4 Var(MAT*TEMP) The GLM Procedure Tests of Hypotheses for Random Model Analysis of Variance Dependent Variable: LIFE Source MAT TEMP Error: MS(MAT*TEMP) Source MAT*TEMP Error: MS(Error) DF 2 2 4 DF 4 27 Type III SS 10684 39119 9613.777778 Type III SS 9613.777778 18231 Mean Square 5341.861111 19559 2403.444444 Mean Square 2403.444444 675.212963 F Value 3.56 Pr > F 0.0186 F Value 2.22 8.14 Pr > F 0.2243 0.0389

Notice that variability among material types is the only factor that is not signiﬁcant. The estimates of the components of variance are (values in parentheses are percent contributions of the components) σ 2 = M SE = 675.21 (24.27%) ˆ 2403.44 − 675.21 M SAB − M SE = = 432.06 (15.53%) στ β = ˆ2 n 4 M SA − M SAB 5341.86 − 2403.44 στ = ˆ2 = = 244.87 (8.80%) bn 12 M SB − M SAB 19559 − 2403.44 σβ = ˆ2 = = 1429.58 (51.39%) an 12

72 Mixed Models

CHAPTER 3. FACTORIAL DESIGNS

Let us now consider the case where one factor is ﬁxed and the other is random. Without loss of generality, assume that factor A is ﬁxed and factor B is random. When a factor is random, its interaction with any other factor is also random. The statistical model, once again, has the same form given in (3.1). This time we assume that • τi are ﬁxed eﬀects such that τi = 0,

ijk

2 2 • βj ∼iid N (0, σβ ), (τ β)ij ∼iid N (0, a−1 στ β ), and a

∼iid N (0, σ 2 ).

**The hypotheses of interest are H0 : τ1 = · · · = τa = 0, The expected mean squares are given by
**

2 E(M SA ) = σ 2 + nστ β + 2 E(M SB ) = σ 2 + anσβ 2 E(M SAB ) = σ 2 + nστ β 2 H0 : σβ = 0, 2 H0 : στ β = 0

bn τi2 a−1

E(M SE ) = σ 2 The ﬁxed factor eﬀects are estimated the usual way as µ = y... , ˆ ¯ and the variance components are estimated as M SAB − M SE , n The ANOVA table for the mixed model is σ 2 = M SE , ˆ στ β = ˆ2

Source A (ﬁxed) B (random) AB Error Total df a−1 b−1 (a − 1)(b − 1) ab(n − 1) abn − 1 SS SSA SSB SSAB SSE SST

τi = yi.. − y... ˆ ¯ ¯

and

σβ = ˆ2

M SB − M SE an

MS M SA M SB M SAB M SE

F -statistic

M SA FA = M SAB M SB FB = M SE FAB = M SAB M SE

Example Consider the battery life example and assume that temperature is a random factor while material type is a ﬁxed factor. We use PROC MIXED in SAS to generate the output. PROC GLM does not provide the correct analysis!

OPTIONS LS=80 PS=66 NODATE; DATA BATTERY; INPUT MAT TEMP LIFE; DATALINES; 1 1 130 1 1 155 ....... 3 3 60 ; PROC MIXED COVTEST; CLASS MAT TEMP;

----------------------------------------------------------------------The Mixed Procedure Model Information Data Set Dependent Variable Covariance Structure Estimation Method Residual Variance Method Fixed Effects SE Method Degrees of Freedom Method WORK.00000000 3 4 12 1 36 36 0 36
Convergence criteria met. THE TWO-FACTOR FACTORIAL DESIGN
MODEL LIFE=MAT.41258855 327.2
.7 331. Covariance Parameter Estimates Cov Parm TEMP MAT*TEMP Residual Estimate 1429.35 183.67 Pr Z 0.1911 0.0001
Fit Statistics -2 Res Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) The Mixed Procedure Type 3 Tests of Fixed Effects Effect MAT Num DF 2 Den DF 4 F Value 2.06 675.77 Z Value 0.21 Standard Error 1636.01 3. RANDOM TEMP MAT*TEMP. RUN.3. QUIT.2243 327.9 333.BATTERY LIFE Variance Components REML Profile Model-Based Containment
73
Class Level Information Class MAT TEMP Levels 3 3 Values 1 2 3 1 2 3 Dimensions Covariance Parameters Columns in X Columns in Z Subjects Max Obs Per Subject Observations Used Observations Not Used Total Observations Iteration History Iteration 0 1 Evaluations 1 1 -2 Res Log Like 352.87 1.09 427.66 432.91147422 Criterion 0.22 Pr > F 0.1560 0.9 334.2.

When there is imbalance. Proc Mixed uses the restricted maximum likelihood (REML) technique to estimate the variance components. FACTORIAL DESIGNS
From the output we observe that the ﬁxed eﬀect (MAT) is not signiﬁcant.
.1911 and 0. Neither of the random eﬀects are signiﬁcant (p-values of 0.1560). however. In a balanced design the REML method gives identical estimates as those obtained using the expected mean squares.74
CHAPTER 3. show that the estimates of the variance components obtained here are identical to those using the expected mean squares. the results are not the same. Exercise: For this example.

0) T1 P1 (11.7) T1 P3 (18. T2 P1 . y ¯
Example An agronomist wanted to study the eﬀect of diﬀerent rates of phosphorous fertilizer on two types of broad bean plants.2) T1 P1 (13.5) T2 P1 (12. he decided to do a factorial experiment with two factors: 1. )2 . b.6) T2 P2 (10.1) T1 P2 (17. T2 P2 .6) T1 P3 (17.3) T2 P3 (16.5) T2 P3 (15. bushy • T2 = tall.3) T2 P1 (11. erect 2. δk is the eﬀect of the kth block. T2 P3 He conducted the experiment using a randomized block design with four blocks of six plots each.3. so. The ANOVA table for a random eﬀects model is Source A B AB Blocks Error Total where SSBlocks = ab
k=1
df a−1 b−1 (a − 1)(b − 1) n−1 (ab − 1)(n − 1) abn − 1
SS SSA SSB SSAB SSBlocks SSE SST
n
MS M SA M SB M SAB M SBlocks M SE
F -statistic M SA FA = M SAB M SB FB = M SAB FAB = M SAB M SE
(¯..9) T2 P2 (12.3
Blocking in Factorial Designs
Blocking may be implemented in factorial designs using the same principles.6) T1 P3 (18.. a. Plant type (T ) at two levels • T1 = short. · · · . In a randomized block design.7) T2 P1 (12. The model assumes that treatment-block interactions are negligible. The statistical model for a two-factor blocked factorial design with 1 replication per block is yijk = µ + τi + βj + (τ β)ij + δk +
ijk
where i = 1. n.2) T1 P2 (17. j = 1.8) T2 P3 (17. every block contains all possible treatment combinations.k − y. T1 P3 . · · · .5) T1 P1 (14. and k = 1.2) T1 P2 (17.. · · · . BLOCKING IN FACTORIAL DESIGNS
75
3.6) T2 P3 (16.3.1) IV T1 P3 (18. T1 P2 .1) BLOCK II III T2 P1 (11.6) T2 P2 (9.5)
. Phosphorous rate (P ) at three levels • P1 = none • P2 = 25kg/ha • P3 = 50kg/ha Using the full factorial set of combinations he had six treatments: T1 P1 .6) T1 P2 (18. He thought that the plant types might respond diﬀerently to fertilization.1) T1 P1 (14. The ﬁeld layout and the yield in kg/ha are shown below: I T2 P2 (8.

2.6.5 11.0. 18.1.6. 16. FACTORIAL DESIGNS The data layout is (observations in a cell are in increasing block order I.6 Phosphorous P2 17. II. 10.2.1. 18.5. 14. IV) Type T1 T2 P1 11.2. 13. 17.76
CHAPTER 3.7. 11. 12.1 8. 18.3.5. III.6.6.6. 14. 9.
.3. 17.7. 12.8 P3 18.9 15.1. 12.5
The following SAS code and output give the analysis. 16. 17. 17.

1 2 1 4 12. DATALINES.6 1 2 3 17.1 1 3 1 18.6 1 1 3 14. 1 1 1 11.6 2 2 1 8.7 2 3 3 16.2 1 3 4 18.5 .9 2 1 1 11.0 2 1 2 11.6 1 3 3 18.3 1 1 4 14.6 1 2 4 18. PROC GLM.2 1 3 2 17. DATA FERT.8 2 3 1 15. MODEL YIELD=BLOCK TYPE|PH. RUN.6 2 3 4 17.5 1 1 2 13.1 1 2 2 17.3. CLASS TYPE PH BLOCK.
77
. BLOCKING IN FACTORIAL DESIGNS OPTIONS LS=80 PS=66 NODATE.3.1 2 2 4 12.5 2 2 3 9.7 2 3 2 16.3 2 2 2 10.5 1 2 1 17. QUIT.2 2 1 3 12. INPUT TYPE PH BLOCK YIELD.

40041667 49.0001 <. while tall. The rates of phosphorous fertilizer and the type of plant both aﬀect yield signiﬁcantly.72
CHAPTER 3.32125000 77.32125000 77.68 133.195813 DF 3 1 2 2 DF 3 1 2 2
Type I SS 13.6762500 243. FACTORIAL DESIGNS
Pr > F <.
.0001
The interaction between plant type and phosphorous level is signiﬁcant.33 38.78
------------------------------------------------The GLM Procedure Dependent Variable: YIELD Source Model Error Corrected Total R-Square 0.68 133.93625000 22.63750 F Value 7.964350 Source BLOCK TYPE PH TYPE*PH Source BLOCK TYPE PH TYPE*PH DF 8 15 23 Sum of Squares 234. The main eﬀects are also signiﬁcant. Short.3762500 Root MSE 0.3375000 0.13 Mean Square 29.7000000 8.0001 <.40041667 99.05291667
Pr > F 0.0001 <.81 86.0001
Coeff Var 5.33 38.760537 YIELD Mean 14.5784167 F Value 50.10583333 Type III SS 13. Diﬀerent plant types respond diﬀerently to diﬀerent levels of the fertilizer. This means that all comparisons of means of one factor would have to be done within the levels of the other factor.0024 <. Blocking seems to be working here judging from the corresponding F value.40041667 99.40041667 49.93625000 22.87250000 44. bushy plants seem to show their greatest yield increase with the ﬁrst increment of added phosphorous.0001 Pr > F 0.05291667 Mean Square 4.44041667 77.0024 <.13 F Value 7. The eﬃciency needs to be investigated further.0001 <.10583333
Mean Square 4.81 86. erect plants seem to show no yield increase with 25kg/ha of phosphorous.87250000 44.44041667 77.

k = 1. To illustrate this. Considering a ﬁxed eﬀects model. etc.
ijkl
3. The statistical model is yijkl = µ + τi + βj + γk + δl + (τ β)ij + where • τi . a is the eﬀect of the ith level of factor A. The statistical model is yi1 i2 ···it l = µ + τ1i1 + τ2i2 + · · · + τtit + (τ1 τ2 )i1 i2 + · · · (τt−1 τt )it−1 it + · · · + (τ1 τ2 · · · τt )i1 i2 ···it +
i1 i2 ···it l
where i1 = 1. b is the eﬀect of the jth level of factor B. b. the ANOVA table is
ijkl
. · · · . · · · . In general.4
The General Factorial Design
Consider an experiment in which we have t factors F1 . l = 1. j = 1. c. ft levels. We need two or more replications to be able to test for all possible interactions. consider a two-factor factorial experiment with 3 levels of factor A and 2 levels of factor B. · · · . ab. · · · . · · · . f1 . n. The only diﬀerence here is that every factor combination is considered to be a treatment. • γk and δl . B. a. respectively. and l = 1. respectively. consider two factors : factor A with a levels and factor B with b levels. b. · · · .3. A special case is the 3 factor factorial design with factors A. and C with levels a. k. · · · . The statistical model is yijkl = µ + τi + βj + γk + (τ β)ij + (τ γ)ik + (βγ)jk + (τ βγ)ijk + where i = 1. · · · . i2 = 1. and c. · · · . j = 1. Ft with f1 . f2 . i = 1. · · · . • βj . We will use Latin letters to represent the 3 × 2 = 6 treatment combinations. · · · . A A1 A1 A2 A2 A3 A3 B B1 B2 B1 B2 B1 B2 Treatment A B C D E F Column 3 4 C D D E E F F A A B B C
We then form the 6 × 6 basic Latin square cyclically as Row 1 2 3 3 3 3 1 A B C D E F 2 B C D E F A 5 E F A B C D 6 F A B C D E
We then randomize the rows and the columns. are the eﬀects of the kth row and the lth column. respectively. THE GENERAL FACTORIAL DESIGN A Factorial Experiment with Two Blocking Factors
79
This is dealt with by implementing a Latin square design in a similar manner as one factor experiments.4.

two levels of B and two levels of C are considered to set up a 3 × 2 × 2 factorial experiment. CLASS CARB PRES SPEED. This experiment is run twice and the deviations from the target volume are recorded. 11
Carbonation (A) 10 12 14
We will use SAS to analyze the data. 0 -1. INPUT CARB PRES SPEED VOL. 1 0. RUN. CARDS. 3 6.80
Source A B C AB AC BC ABC Error Total df a−1 b−1 c−1 (a − 1)(b − 1) (a − 1)(c − 1) (b − 1)(c − 1) (a − 1)(b − 1)(c − 1) abc(n − 1) abcn − 1 SS SSA SSB SSC SSAB SSAC SSBC SSABC SSE SST MS
CHAPTER 3. Pressure (B) 25 psi 30 psi Line Speed (C) Line Speed (C) 200 250 200 250 -3.
. 1 2. 5 5. PROC GLM. The data are given below. 0 1. operating pressure (B). The SAS code and output are as follows:
OPTIONS LS=80 PS=66 NODATE. 1 2. 6 7. 9 10. 4 7. MODEL VOL = CARB|PRES|SPEED. 1 1 1 -3 1 1 1 -1 1 1 2 -1 1 1 2 0 1 2 1 -1 1 2 1 0 1 2 2 1 1 2 2 1 2 1 1 0 2 1 1 1 2 1 2 2 2 1 2 1 2 2 1 2 2 2 1 3 2 2 2 6 2 2 2 5 3 1 1 5 3 1 1 4 3 1 2 7 3 1 2 6 3 2 1 7 3 2 1 9 3 2 2 10 3 2 2 11 . -1 -1. FACTORIAL DESIGNS
F -statistic
M SA FA = M SE FB = M SB M SE FC = M SC M SE FAB = M SAB M SE FAC = M SAC M SE FBC = M SBC M SE S FABC = MMABC SE
M SA M SB M SC M SAB M SAC M SBC M SABC M SE
The following example is taken from Montgomery : Design and Analysis of Experiments Example A soft drink bottler is studying the eﬀect of percent carbonation (A). Three levels of A. and line speed (C) on the volume of beverage packaged in each bottle. DATA BOTTLE.

PROC GLM of SAS is modiﬁed as follows:
PROC GLM.3.0833333 Mean Square 29.71 31.47 0.5000000 336.0001
Thus.2916667 1.76
Pr > F <.0001 0.50000000 2.6715 0.05. ------------------------------------------------------------Source Model Error Corrected Total Source CARB PRES CARB*PRES SPEED CARB*SPEED PRES*SPEED CARB*PRES*SPEED DF 11 12 23 DF 2 1 2 1 2 1 2 Sum of Squares 328. QUIT.8295455 0. As an example we will run the Tukey-Kramer procedure on factor A.2486 0. THE GENERAL FACTORIAL DESIGN
QUIT.50000000 7.0001 0.0416667 0.0001 2 <. --------------------------------------------------------------Adjustment for Multiple Comparisons: Tukey CARB 1 2 3 VOL LSMEAN -0.0558 0.7500000 45. are signiﬁcantly diﬀerent at MEER = 0.41 1.5833333 1.3750000 2.0416667 0.0001 <.6250000 Type I SS 252. percent carbonation.7083333 Mean Square 126.0001 <.3750000 45. CLASS CARB PRES SPEED.0001 3 <.41 64.6250000 22.37500000 LSMEAN Number 1 2 3
Least Squares Means for effect CARB Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: VOL i/j 1 2 3 1 <. all the main eﬀects appear to be signiﬁcant. However.2500000 22.4869
As we can see.06 3.1250000 8.5416667 F Value 42. none of the interactions is signiﬁcant.12 0.0416667 0.11 Pr > F <.0001
81
F Value 178. RUN.0001 <. all pairwise comparisons comparisons of the levels of factor A. LSMEANS CARB/PDIFF ADJUST=TUKEY.4. MODEL VOL = CARB|PRES|SPEED.0001 <.0416667 1. One may perform multiple comparisons at the highest level of the factors.
.3750000 5.

82
CHAPTER 3. FACTORIAL DESIGNS
.

We shall consider special cases where k = 2. 2) is known as a 3k factorial design.
4.1 Introduction
Often we consider general factorial designs with k factors each with 2 levels. i. Factor A B − − + − − + + + Replicate 1 2 3 28 25 27 36 32 32 18 19 23 31 30 29 83
Treatment (1) a b ab
Total 80 100 60 90
. to identify important factors. This involves a 2 × 2 × · · · × 2 = 2k factorial experiment. denoted by + and −.1
The 22 Design
Let A and B denote the two factors of interest with two levels (”low (−)” and ”high (+)”) each. The absence of letter indicates the factor is at a low level. The experiment is replicated three times. 3 before looking at the general 2k factorial design. A similar situation where each of the k factors has three levels (0.2. This is known as a 2k factorial design.Chapter 4
2k and 3k Factorial Designs
4. 1.2
The 2k Factorial Design
This is particularly useful in the early stages of an experiment as a factor screening mechanism. Every interaction term has only 1 degree of freedom. The symbol (1) is used to represent the treatment where every factor is at a low level.e. As an example consider an experiment where the time a chemical reaction takes is investigated. There are 22 = 4 treatments designated as: Treatment (1) a b ab A − + − + B − − + +
The presence of a letter indicates that the factor is at a high level. The two factors of interest are reactant concentration (A at 15% (−) and 25% (+)) and catalyst (B with absence (−) and presence (+)).
4.

.00 MS M SA = 208.5. We can now deﬁne the main eﬀects of a factor.92 F -statistic
M SA FA = M SE = 53.13 M SE FAB = M SAB = 2.15 FB = M SB = 19.00 y ¯ y ¯ 2 2 Now the contrast coeﬃcients are (−. −.33 SSB = 75. 2K AND 3K FACTORIAL DESIGNS
Let y (A+ ) denote the mean of the response where factor A is at high level. The AB interaction eﬀect is the average diﬀerence between the eﬀect of A at the high level of B and the eﬀect of A at the low level of B 1 1 {¯(A+ B+ ) − y (A− B+ )} − {¯(A+ B− ) − y (A− B− )} = 1.5. .33 M SE = 3.00 M SAB = 8. For the example above these yield SSA = 208.5). −. For example.34 The ANOVA table is
Source A B AB Error Total df 1 1 1 4(n − 1) = 8 4n − 1 = 11 SS SSA = 208.5. Let n be the number of replicates in the study. SSE = 31.5. .13 M SE
. SSB = 75.5). −. SSAB = 8. this may be given as a contrast with coeﬃcients (−.5. SST = 323.84
CHAPTER 4.00. .67 y ¯ y ¯ 2 2 Using the treatment means. The sum of squares for each factor and the interaction may be obtained in a very simple manner.33 SSE = 31.33 y ¯ y ¯ 2 2 Using the treatment means.33.5. the main eﬀect of B is given by y (B+ ) − y (B− ) ¯ ¯ which is equivalent to 1 1 {¯(A+ B+ ) + y (A− B+ )} − {¯(A+ B− ) + y (A− B− )} = −5. Similarly. .. y (A− B+ ) is the mean of the response in the case where factor A is at low ¯ level and factor B is at high level.5.33. SSA = n × (Main eﬀect of A)2 SSB = n × (Main eﬀect of B)2 SSAB = n × (Interaction eﬀect of AB)2 The total sum of squares is deﬁned as usual
2 2 n
SST =
(yijk − y.34 SST = 323.33 M SB = 75.5.5. )2 ¯
i=1 j=1 k=1
and the error sum of squares is obtained by subtraction as SSE = SST − SSA − SSB − SSAB .00 SSAB = 8. −.5). . the contrast coeﬃcients become (. The main eﬀect of A is y (A+ ) − y (A− ) ¯ ¯ This is equivalent to 1 1 {¯(A+ B+ ) + y (A+ B− )} − {¯(A− B+ ) + y (A− B− )} = 8.00. A similar notation is use for ¯ all the other means.

.15 2.5 = −. RUN. ----------------------------------------------Source A B A*B DF 1 1 1 Type I SS 208.19 19. MODEL Y = A|B.0000000 8. -1 -1 28 -1 -1 25 -1 -1 1 -1 36 1 -1 32 1 -1 -1 1 18 -1 1 19 -1 1 1 1 31 1 1 30 1 1 . THE 2K FACTORIAL DESIGN Using SAS
OPTIONS LS=80 PS=66 NODATE. INPUT A B Y @@. PROC GLM. X1X2 = 2*X1*X2. CARDS. QUIT.5
27 32 23 29
. ¯ ˆ β1 = Main eﬀect of A ˆ β2 = Main eﬀect of B ˆ β12 = AB Interaction eﬀect Using SAS
OPTIONS LS=80 PS=66 NODATE.5 = .0001 0. x2 ) as A− A+ B− B+ and ﬁtting the regression model y = β0 + β1 x1 + β2 x2 + β12 (2x1 x2 ) + . X2 = B/2.4. X1 = A/2. DATA BOTTLE. INPUT A B Y @@.2. CARDS.13 Pr > F <.. CLASS A B. PROC GLM. DATA BOTTLE.5 = . MODEL Y = X1 X2 X1X2. This is done by forming two variables (x1 .0024 0. The estimated model coeﬃcient are now ˆ β0 = y. ----------------------------------------
x1 x1 x2 x2
= −.0000000 8.3333333 75.3333333 Mean Square 208. QUIT. RUN.3333333 75.1828
85
27 32 23 29
One may use a multiple regression model to estimate the eﬀects in a 22 design. -1 -1 28 -1 -1 25 -1 -1 1 -1 36 1 -1 32 1 -1 -1 1 18 -1 1 19 -1 1 1 1 31 1 1 30 1 1 .3333333 F Value 53.

82 Pr > F 0.13 Pr > |t| <.0000000 Root MSE 1.38 1.0001 0.6666667 31.979057 Y Mean 27.3333333 323.0000000 8.57130455 1.3333333 t Value 48.1828 Mean Square 97.3333333 75.3333333
Mean Square 208.29 -4.2222222 3.0001 <.2
The 23 Design
In this subsection we consider 3-factor factorial experiments each with two levels. Treatment (1) a b ab c ac bc abc I 1/4 1/4 1/4 1/4 1/4 1/4 1/4 1/4 A -1/4 1/4 -1/4 1/4 -1/4 1/4 -1/4 1/4 B -1/4 -1/4 1/4 1/4 -1/4 -1/4 1/4 1/4 Factorial Eﬀects AB C AC 1/4 -1/4 1/4 -1/4 -1/4 -1/4 -1/4 -1/4 1/4 1/4 -1/4 -1/4 1/4 1/4 -1/4 -1/4 1/4 1/4 -1/4 1/4 -1/4 1/4 1/4 1/4 BC 1/4 1/4 -1/4 -1/4 -1/4 -1/4 1/4 1/4 ABC -1/4 1/4 1/4 -1/4 1/4 -1/4 -1/4 1/4
For example. With only the ﬁrst two levels of factor A.00000000 1.50000000 8.3333333 75.46
Standard Error 0.86
CHAPTER 4.1828 Pr > F <.15 2.50000 F Value 53.19 19.9166667 F Value 24.14 7. the data is
.902993 Source X1 X2 X1X2 Parameter Intercept X1 X2 X1X2 DF 3 8 11 Sum of Squares 291.0024 0.33333333 -5.14260910 1. 2K AND 3K FACTORIAL DESIGNS
Dependent Variable: Y Source Model Error Corrected Total R-Square 0.66666667
4.0001 0.14260910
27.196571 DF 1 1 1 Estimate
Type I SS 208. This setup uses a total of 23 = 8 experiments that are represented as Treatment (1) a b c A − + − − B − − + c C − − − + Treatment c ac bc abc A − + − + B − − + + C + + + +
The following table gives the contrast coeﬃcients for calculating the eﬀects.2.14260910 1.0000000 8. the main eﬀect of A is 1 [−¯(A− B− C− ) + y (A+ B− C− ) − y (A− B+ C− ) + y (A+ B+ C− ) y ¯ ¯ ¯ 4 −¯(A− B− C+ ) + y (A+ B− C+ ) − y (A− B+ C+ ) + y (A+ B+ C+ )] y ¯ ¯ ¯ The sum of squares are SSeﬀect = 2n(eﬀect)2
Consider the bottling experiment in Chapter 3.0002
Coeff Var 7.0024 0.

25000000 2.00000000
Mean Square 36.0003
Coeff Var 79.2415 Mean Square 10. 0 b: -1.2415 0.4.25000000 0.00000000
To employ multiple regression.0943 0.0022 0.60 19.62500000 F Value 16. PROC GLM. THE 2K FACTORIAL DESIGN Pressure (B) 25 psi 30 psi Line Speed (C) Line Speed (C) 200 250 200 250 (1): -3.5447 0. 1 a: 0. CLASS A B C.60 1. INPUT A B C VOL.25000000 0.5 = . CARDS.5 = . 1 ac: 2.42857143 0. QUIT.05694 DF 1 1 1 1 1 1 1
Type I SS 36.00000000 20.000000 F Value 57.5 = −.00000000 Root MSE 0.0001 0.25000000 2.00000000 1.5
. 5
87
Carbonation (A) 10 12 The SAS analysis is given below:
OPTIONS LS=80 PS=66 NODATE. 3 abc: 6.
--------------------------------------------------Source Model Error Corrected Total R-Square 0. -1 -1 -1 -3 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 0 -1 1 -1 -1 -1 1 -1 0 -1 1 1 1 -1 1 1 1 1 -1 -1 0 1 -1 -1 1 1 -1 1 2 1 -1 1 1 1 1 -1 2 1 1 -1 3 1 1 1 6 1 1 1 5 .40 3.5 = −.25000000 12. 0 bc: 1.60 0.25000000 1.60 32. MODEL VOL = A|B|C.790569 VOL Mean 1.00000000 1.935897 Source A B A*B C A*C B*C A*B*C DF 7 8 15 Sum of Squares 73. 1 ab: 2.40 1.00000000 20.2.5 = . RUN.69 Pr > F 0.25000000 12.25000000 1. DATA BOTTLE. -1 c: -1.0005 0.60 Pr > F <.00000000 5.00000000 78. one may consider the following coding: A− A+ B− B+ C− C+ x1 x1 x2 x2 x3 x3 = −.

59 5. X1 = A/2.0022 0.39528 0.75000 0. DATA BOTTLE.00000 5.39528 0.50000 Standard Error 0. MODEL VOL = X1 X2 X3 X1X2 X1X3 X2X3 X1X2X3. QUIT.26 1. X1X2 = 2*X1*X2.75000 0.39528 0. the following SAS code is an implementation the regression approach.42857 0.50000 0. For ˆ example. 2K AND 3K FACTORIAL DESIGNS
y = β0 + β1 x1 + β2 x2 + β3 x3 + β12 (2x1 x2 ) + β13 (2x1 x3 ) +β23 (2x2 x3 ) + β123 (4x1 x2 x3 ) + Once again. β12 is the AB interaction eﬀect.5447 0.
OPTIONS LS=80 PS=66 NODATE.05694 Mean Square 10.39528 t Value 5. X1X2X3 = 4*X1*X2*X3. -1 -1 -1 -3 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 0 -1 1 -1 -1 -1 1 -1 0 -1 1 1 1 -1 1 1 1 1 -1 -1 0 1 -1 -1 1 1 -1 1 2 1 -1 1 1 1 1 -1 2 1 1 -1 3 1 1 1 6 1 1 1 5 .2415 0.0001 0.0003
Root MSE Dependent Mean Coeff Var
R-Square Adj R-Sq
0.25000 1.00000 2.26 Pr > |t| 0. RUN. CARDS.88 The regression model is
CHAPTER 4. PROC REG.0943 0.43 1.79057 1.90 0. Considering the bottling experiment.39528 0. the estimate of the regression coeﬃcients correspond to the eﬀects of the factors.25000 0.63 1.62500 F Value 16.19764 0.69 Pr > F 0.00000 0.00000 79. X1X3 = 2*X1*X3.00000 78.00000 3. X3 = C/2.39528 0. X2X3 = 2*X2*X3.0010 <.8798
Parameter Estimates Variable Intercept X1 X2 X3 X1X2 X1X3 X2X3 X1X2X3 DF 1 1 1 1 1 1 1 1 Parameter Estimate 1.0005 0.9359 0. --------------------------------------------------Dependent Variable: VOL Analysis of Variance Source Model Error Corrected Total DF 7 8 15 Sum of Squares 73.2415
. INPUT A B C VOL. X2 = B/2.06 7.39528 0.69 4.

k three-factor interaction eﬀects. k two-factor interaction eﬀects. Fk−1 Fk . As always contrasts are of interest since they represent factor eﬀects.2. Fk k two-factor interactions 2 F 1 F2 F 1 F3 . SSA = 2 × 2 × 32 = 36 . The partial ANOVA table for the 2k design is
Source k main eﬀects F1 F1 . the coeﬃcients are ±21−2 = ±2−1 = ±1/2. (1).4. in the bottling experiment. 1 2k (n − 1) 2k n − 1 SS SSF1 SSF1 . . There are k = k main eﬀects. SSFk−1 Fk . Fk each with 2 levels. . k = 1 k-factor interaction k F 1 F2 · · · Fk Error Total df 1 1 . . . f1 . . k = 1 1 2 3 k k-factor interaction. We will now create k variables x1 . .e. Similarly. . fk . . i. · · · . . We will use the regression approach to estimate the eﬀects. . xk each taking values ±1/2 depending on whether the corresponding factor is at high or low level. · · · . . SSFk SSF1 F2 SSF1 F3 . .
.3
The General 2k Design
Consider k factors F1 . in the 22 design (k = 2). . Suppose the experiment is replicated n times. in the 23 design. . 1 1 1 . . . f1 . . the coeﬃcients become ±1/4 as expected. THE 2K FACTORIAL DESIGN One may obtain sums of squares using SS = 2n(eﬀect)2 . . For example. . Each main eﬀect as well as interaction eﬀect has one degree of freedom. For instance.2. The sums of squares are now 2 SSeﬀect = n × 2k−2 × (eﬀect) .k (2xk−1 xk ) + . Thus the sum of the degrees of freedom due to the factors (main and interaction) is
k i=1
k i
= 2k − 1
and the total degrees of freedom are 2k n − 1. . We then ﬁt the regression equation y = β0 + β1 x1 + · · · + βk xk + β12 (2x1 x2 ) + · · · + βk−1. . . . 1 . We will use contrast coeﬃcients ±21−k to linearly combine the 2k cell means. Thus we get the error degrees of freedom to be (2k n − 1) − (2k − 1) = 2k (n − 1) . . . . SSF1 F2 ···Fk SSE SST
We denote the 2k treatments using the standard notation.
89
4. · · · .

ˆ Now the estimates.
4. 100◦ C • E = resin/extender ratio : 2/1.e. The process takes place in a heated reaction vat. The plastic is produced by mixing resin with an extender in a solvent. The following example. i. An approach is to ignore some high-order interactions.4
The Unreplicated 2k Design
Since the number of factor level combinations in a 2k design may be large. The materials are allowed to react for a period of time. Another approach is to use a QQ-plot to identify the signiﬁcant factors. 2atm • D = drying temperature : 200◦ C. Example A research chemist wanted to study the eﬀect of a number of factors on the yield of a new high-impact plastic.
The multiplier is obtained as
2(number of factors in interaction−1) . it is often impossible to ﬁnd enough subjects to replicate the design. which in turn means that hypotheses cannot be tested. 1/1 The following table gives the results:
A -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 B -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1 C -1 -1 -1 -1 1 1 1 1 -1 -1 -1 -1 1 1 1 1 -1 -1 -1 -1 1 1 1 1 D -1 -1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 E -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 Yield 246 303 276 336 258 344 265 313 249 310 318 363 212 249 283 219 379 326 344 349 389 359 283 363
. He knew that he could run one trial per day in each vat. The chemist worked in a laboratory that contained 32 experimental vats and a ﬁlter and dryer with each vat. assume that interactions with three or higher factors do not exist. ﬁltered on a screen. and dried.2. 150◦ C • B = reaction time : 4hr. and the plastic settles to the bottom of the vat.90
CHAPTER 4. taken from Petersen : Design and Analysis of Experiments. are the eﬀects of the corresponding factor. β’s. 2hr • C = ﬁlter pressure : 1atm. He decided to study ﬁve factors using a single replication of a 25 design. illustrates these approaches. 2K AND 3K FACTORIAL DESIGNS +β12···k (2k−1 x1 x2 · · · xk ) + . In such a situation it is impossible to estimate the error. He selected the following factors: • A = reaction temperature : 300◦ C.

QUIT. BCDE = B*C*D*E. INPUT A B C D E YIELD. INFILE "C:\ASH\S7010\SAS\CHEM. AD=A*D. QUIT. RUN. PROC REG.86694 . THE 2K FACTORIAL DESIGN
-1 1 -1 1 -1 1 -1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 313 336 370 336 322 352 367 374
91
We will ﬁt the following regression model to estimate the eﬀects: y = β0 + β1 x1 + · · · + β12345 (24 x1 x2 x3 x4 x5 ) + We save the above data as it is in a ﬁle called chem.dat. AC=A*C.DAT". 61
. RUN. RUN. ABD=A*B*D. Pr > F .2. CE=C*E. ABCDE=A*B*C*D*E. BC=B*C. ADE=A*D*E. and we call the ﬁle using the SAS command inﬁle. ABCE=A*B*C*E. ACDE=A*C*D*E. BDE=B*D*E. BCD=B*C*D. ABDE=A*B*D*E. DATA CHEM.4. AB=A*B. ACD=A*C*D. DE=D*E. ABC=A*B*C. ACE=A*C*E. AE=A*E. SET CHEM. QUIT. The following code gives the analysis:
OPTIONS LS=80 PS=66 NODATE. MODEL YIELD = A B C D E AB AC AD AE BC BD BE CD CE DE ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE ABCDE. CD=C*D.
----------------------------------------------------The SAS System The REG Procedure Model: MODEL1 Dependent Variable: YIELD Analysis of Variance Source Model Error DF 31 0 Sum of Squares 73807 0 Mean Square 2380. BE=B*E. DATA CHEM2. ABCD = A*B*C*D. BCE=B*C*E. without the line which contains the names of the variables. F Value . CDE=C*D*E. ABE=A*B*E. BD=B*D.

62500 -1. . . . .81250 Standard Error .5 = 15.31250 2. .00000 31. . .25000 -10. .93750 0. . . . . . . .25000 8.18750 6. . . . . . . . . .31250 -6. the ABE eﬀect is 2 ∗ 7. R-Square Adj R-Sq
CHAPTER 4.81250 -2. . . .31250 -5. .
.92
Corrected Total 31 73807 . . . . Pr > |t| .
Parameter Estimates Variable Intercept A B C D E AB AC AD AE BC BD BE CD CE DE ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE ABCDE DF 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Parameter Estimate 315. . . . 315. . .75000 3. . . . . . .12500 3. For instance. . . . .00000 -7.25000 -1. .00000 0.62500 -6. . .75000 11. . . . .81250 .81250 3. . . .18750 6. . . .0000 . . .56250 11.43750 -7. . t Value . . . . .12500 2.
The estimates of the eﬀects are now the parameter estimate multiplied by 2.62500 -9. .93750 6.25000 9. . 2K AND 3K FACTORIAL DESIGNS
Root MSE Dependent Mean Coeff Var
1. .62500 -5. . . .31250 7.50000 -6. .93750 -4. . . . . . . . . . . .81250 11. . . . . . . . . .18750 3.

50 2.2790 0.58 1.00000 28.12500 1404.38 F Value 3.12500 684.66 1. QUIT. CLASS A B C D E AB AC AD AE BC BD BE CD CE DE.5484 Mean Square 3727.50000 2850.14 0.
.00000 4095.2084 0.72 3. The following partial SAS code follows the one given above.50000 Type III SS 4005.0767 0.50000 1275.12500 800.00000 32385.12500 128.33 Pr > F 0.58 1.12500 684.96 0.00000 1682.03 0.3015 0.00000 32385.00000 4095.00000 32385.00000 32385.00000 28.96 0.12500 420.3252 0.0737 0.12500 684.03 1.12500 1922.0001 0.5484 Pr > F 0.76 0.00000 3081.55 1.0001 0.50000 1275. RUN.12500 420.00000 1682. MODEL YIELD = A B C D E AB AC AD AE BC BD BE CD CE DE.12500 1404.12500 420.50000 Mean Square 4005.11 0.37500 17893.12500 1152. RUN.50000 2850.00000 28.12500 128.26 1.50000 1275.12500 1152.12500 1404.72 3.1300 0.3015 0.38 Pr > F 0.00000 4095.61 2.8760 0.
PROC GLM. The following SAS code that provides the QQ-plot continues the above:
PROC REG OUTEST=EFFECTS.00000 1682.66 1.2084 0.0737 0.12500 1922.7395 0.12500 800. RUN.12500 1404.11 0.00000 28.4101 <. QUIT.26 1.50000 2850.7395 0.50 2.50000 2850. ---------------------------------------------------------Dependent Variable: YIELD Source Model Error Corrected Total Source A B C D E AB AC AD AE BC BD BE CD CE DE Source A B C D E AB AC AD AE BC BD BE CD CE DE DF 15 16 31 DF 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 DF 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Sum of Squares 55913. MODEL YIELD = A B C D E AB AC AD AE BC BD BE CD CE DE ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE ABCDE.4454 0.12500 128.50000 73806.00000 1682. SET EFFECTS.00000 3081.4454 0.1300 0.50000 1275.34375 F Value 3.2378 0.76 0.1164 0.00000 4095.03 1. DATA EFFECTS.50000 Mean Square 4005.4101 <.2.61 2.0767 0.12500 1922.00000 3081.55 1.2790 0.8760 0. Let us now plot the QQ-plot of the eﬀects in the full model in an eﬀort to identify the important factors. THE 2K FACTORIAL DESIGN
93
We will now analyze the data ignoring 3 and higher factor interactions.12500 1922.00000 3081.50000 F Value 3.4.12500 1152.2378 0.14 0.12500 420.72 28.12500 1152.72 28. DROP YIELD INTERCEPT _RMSE_.03 0.0111
The only signiﬁcant factor appears to be resin/extender ratio.12500 128.1164 0.12500 684.12500 800.12500 800.87500 Type I SS 4005.3252 0.55833 1118.

QUIT. RANKS RANKEFF. QUIT. BY EFFECT. RUN.
CHAPTER 4. PROC RANK DATA=EFFECTS NORMAL=BLOM. PROC GPLOT.94
QUIT. DROP COL1. VAR EFFECT. RUN. PROC TRANSPOSE DATA=EFFECTS OUT=EFFECTS. RUN. DATA EFFECTS. QUIT. QUIT. SET EFFECTS. SYMBOL V=CIRCLE. GOPTION COLORS=(NONE). RUN. QUIT. 2K AND 3K FACTORIAL DESIGNS
. EFFECT = COL1*2. PLOT RANKEFF*EFFECT=_NAME_. RUN. PROC SORT DATA=EFFECTS.

THE 2K FACTORIAL DESIGN
95
.4.2.

ABC=A*B*C. We start by drawing the QQ plot of the eﬀects to identify the potentially signiﬁcant factors. RUN. SET FILTER. RUN. QUIT. DATA EFFECTS. PROC REG OUTEST=EFFECTS. ABCD = A*B*C*D. AC=A*C. ACD=A*C*D. DROP Y INTERCEPT _RMSE_.
Example A chemical product is produced in a pressure vessel. QUIT. RUN. QUIT. BC=B*C. -1 -1 -1 -1 -1 -1 -1 1 -1 -1 1 -1 -1 -1 1 1 -1 1 -1 -1 -1 1 -1 1 -1 1 1 -1 -1 1 1 1 1 -1 -1 -1 1 -1 -1 1 1 -1 1 -1 1 -1 1 1 1 1 -1 -1 1 1 -1 1 1 1 1 -1 1 1 1 1 . DATA EFFECTS. BCD=B*C*D. RUN.
OPTIONS LS=80 PS=66 DATA FILTER.
45 43 68 75 48 45 80 70 71 100 60 86 65 104 65 96
DATA FILTER2. AB=A*B. pressure (B). NODATE. SET EFFECTS. A factorial experiment was conducted in the pilot plant to study the factors thought to inﬂuence the ﬁltration rate of this product. RUN. The four factors are temperature (A). BD=B*D. QUIT. reactant concentration (C). The data are given below: A0 B0 D0 D1 C0 45 43 C1 68 75 C0 48 45 B1 C1 80 70 B0 C0 71 100 C1 60 86 A1 B1 C0 C1 65 65 104 96
This is an unreplicated 24 design. INPUT A B C D Y. MODEL Y = A B C D AB AC AD BC BD CD ABC ABD ACD BCD ABCD. and stirring rate (D). AD=A*D. SET EFFECTS. CD=C*D. ABD=A*B*D.
. QUIT. CARDS. 2K AND 3K FACTORIAL DESIGNS Once again the only variable that appears to be signiﬁcant is resin/extender ratio. PROC TRANSPOSE DATA=EFFECTS OUT=EFFECTS.96
CHAPTER 4. The following example is taken from Montgomery : Design and Analysis of Experiments.

QUIT. PROC GPLOT. QUIT.
97
. QUIT. RUN. PROC RANK DATA=EFFECTS NORMAL=BLOM. SYMBOL V=CIRCLE. PLOT RANKEFF*EFFECT=_NAME_. GOPTION COLORS=(NONE). QUIT. RUN.4. RANKS RANKEFF. THE 2K FACTORIAL DESIGN
EFFECT = COL1*2.2. RUN. PROC SORT DATA=EFFECTS. VAR EFFECT. DROP COL1. BY EFFECT. RUN.

98
CHAPTER 4. 2K AND 3K FACTORIAL DESIGNS
.

ignoring factor B and any interactions involving factor B we run the analysis.937500 Type I SS 1870.562500 F Value 83. We now have 3k treatments which we denote by k digit combinations of 0. is the combination of the low level of factor A and the intermediate level of factor B.562500 1105.062500 855. D.562500 Mean Square 1870.13 49.0001 0.437500 179. 20.38 58. ab.0031 <. AD as signiﬁcant.47 Pr > F <. 11.35 Pr > F <. RUN.0001 0. The ANOVA table for a 3k design with n replications is
.0001
4.5120 Mean Square 793. The following partial SAS code performs the analysis. C. 10. and 2 (high). a.0003 0.562500 1105. This means the resulting analysis is a 23 design with factors A.062500 1314.062500 10. for a 32 design the treatments are 00. 1. and 2 instead of the standard notation (1). b. .23 0. for instance.3.0001 0. For example. 1 (intermediate).6475 0.562500 390.562500 5. C.. QUIT.500000 5730. MODEL Y = A|C|D.437500 F Value 35.3
The 3k Design
We now consider the analysis of a k-factor factorial design where each factor has 3 levels: 0 (low). AC. .562500 5.062500 855. D and 2 replications.
PROC GLM DATA=FILTER.4. 02.562500 390. 01. .062500 1314. Computations of eﬀects and sums of squares are direct extensions of the 2k case. 21.57 38. 12.062500 22. 22.062500 10. CLASS A C D. -----------------------------------------------------------Dependent Variable: Y Source Model Error Corrected Total Source A C A*C D A*D C*D A*C*D DF 7 8 15 DF 1 1 1 1 1 1 1 Sum of Squares 5551.27 0. THE 3K DESIGN
99
The QQ-plot identiﬁes A. The treatment 01.37 17. Thus.

. . Fk−2 Fk−1 Fk . SSF1 F2 ···Fk SSE SST
. . . . . . . . . k = 1 k-factor interaction k F 1 F2 · · · Fk Error Total
CHAPTER 4. . SSFk SSF1 F2 SSF1 F3 . . . . . . .100
Source k main eﬀects F1 F1 . Fk−1 Fk k three-factor interactions 3 F 1 F2 F3 F 1 F2 F4 . . . . . . 2 4 4 . SSFk−1 Fk SSF1 F2 F3 SSF1 F2 F4 . 2k 3k (n − 1) 3k n − 1 SS SSF1 SSF1 . Fk k two-factor interactions 2 F 1 F2 F 1 F3 . 8 . . 4 8 8 . 2K AND 3K FACTORIAL DESIGNS
df 2 2 . SSFk−2 Fk−1 Fk . .

ρ=
2 V ar(yij ) = σ 2 + σβ
2 Cov(yij . say A. Under these conditions. Such observations are rarely independent as they are measured on the same subject. yi j ) = σβ . 2 σβ 2 σ 2 + σβ
for i = i . one obtains and Thus.
5. σ 2 ) random variables. βb is a random sample from a N (0.1
The Mixed RCBD
Consider a single factor experiment with a levels of the factor. These designs are extensions of the randomized complete block design where blocks are random.1. and
• all the b + ab random variables (blocks and errors) are independent. All variances of observations are equal within a block and all covariances within a block are equal.Chapter 5
Repeated Measurement Designs
5. The statistical model for the mixed randomized complete block design is yij = µ + τi + βj + where •
a i=1 τi ij . σβ ) distribution. a j = 1. Each of the a levels of factor A is observed with each subject. b
= 0.
i = 1. This situation is sometimes referred to as the one-way repeated measurement design. The variance-covariance matrix of observations on each subject yj = (y1j . Let yij be the observation on level i of A for the jth subject. yaj ) is 101
. · · · . · · · .1 Introduction
Sometimes we take observations repeatedly on the same experimental subjects under several treatments.
is the common correlation of measurements made on the same subject for any pair of distinct treatment A levels. i and i .
2 • β1 . · · · . Suppose we have a blocking variable B that is random. We assume that a random sample of b blocks (subjects) is available from a large population of blocks (subjects). · · · .
•
ij
are ab iid N (0.

structure called the Huynh-Feldt sphericity (S) structure. . It is recommended that all mixed RCBD’s be analyzed as one-way RM designs since we can test for the (S) condition in a similar manner as Levene’s test. σa1
σ12 σ22 . where level 1 of a is control. | y ¯
2M SAB b 2 5. 2 ρσy
··· ··· ···
2 ρσy 2 ρσy .(a−1)(b−1) (α) M SAB 2. . 2 σy
α-level Tests Recalling the situation in the mixed two factor analysis. 2 ρσy 2 ρσy 2 σy . we have 1. An α-level test of H0 : τ1 = · · · = τa = 0 is M SA > Fa−1. An α-level simultaneous Tukey-Kramer test of H0 : τi = τi for 1 ≤ i < i ≤ a is |¯i.102 Σ=
CHAPTER 5. σa2
··· ··· ···
σ1a σ2a . . . − yi . 2 σy 2 ρσy .
.(a−1)(b−1) (α) √ 2
4. An α-level test of H0 : τi = τi is |¯i. An α-level test of H0 : σβ = 0 is
> da−1. . . σaa
Under the assumptions of the mixed RCBD Σ satisﬁes compound symmetry. An α-level simultaneous Dunnett test of H0 : τi = τ1 for 2 ≤ i ≤ a. Compound symmetry (CS) is the case where all variances are equal and all covariances are equal: Σ=
2 2 where σy = σ 2 + σβ . RCBD’s that satisfy the (S) condition are known as one-way repeated measurement (RM) designs. . . . is |¯i. | y ¯
2M SAB b
> t(a−1)(b−1) (α/2)
3. REPEATED MEASUREMENT DESIGNS
σ11 σ21 . . − yi . .(a−1)(b−1) (α) M SAB It can be shown that these tests remain valid for a more general variance-covariance. Σ. .(a−1)(b−1) (α)
M SB > Fb−1. − yi . | y ¯
2M SAB b
>
qa.

ta . In the above example. Assume each of n = 6 subjects is to be observed with each of the three drugs. ONE-WAY REPEATED MEASUREMENT DESIGNS
103
5. Usually.2
One-Way Repeated Measurement Designs
The major diﬀerence between the mixed RCBD and the one-way RM design is in the conceptualization of a ’block’. D2 . y31 for subject 1 cannot be obtained simultaneously. (D3 . This is the order eﬀect. etc. This raises another issue to be considered and controlled in RM designs. it is important to control the possible order eﬀects.5. The a levels of A may be a diﬀerent drugs. then how can one distinguish observed diﬀerences between D1 . If all the six subjects are treated with D1 followed by D2 followed by D3 . and D3 from the fact that the drugs were given in the order (D1 . which is produced and secreted by the liver. second by two subjects. In certain RM designs the order eﬀect is impossible to eliminate.
. D1 . the subjects are assumed to be randomly selected from a population of subjects. The following table is one possible randomization:
subject 1 2 3 4 5 6
D1 D2 D1 D3 D3 D2
order D2 D3 D2 D1 D1 D3
D3 D1 D3 D2 D2 D1
Note that each drug is observed ﬁrst by two subjects. That is. For example. and third by two subjects. this may be done as follows. D2 ) and randomly assign two subjects to each order. The following is an example for a = 4. or a diﬀerent dosages of the same drug. D3 . In many cases the ’block’ is a human or an animal and all the a levels of A are observed on the same subject. In all these cases the drug (dosage) mean responses are to be compared for these a levels. or measurements of the same drug at the same dosage level over a period of a times. D2 . is given to all subjects at the same dosage level. D2 . y21 . D2 . The following example illustrates a typical RM design. D1 ). Consider three orders: (D1 . let t1 < t2 < · · · < ta be a times and let τi be the eﬀect of the drug D at time ti . After the eﬀect of D1 is worn-oﬀ. and D3 are to be compared with respect to suppression of enzyme X. D3 . say t1 . The major diﬀerence between mixed RCBD’s and RM designs is that in RM designs the levels of factor A cannot be observed simultaneously. It is assumed that D1 is administered and then enzyme X measured (y11 ). a test of H0 : τ 1 = · · · = τ a = 0 is needed. Note that y11 . (D2 . D2 is administered and enzyme X measured. Assume the drug. D. D3 )? Are these diﬀerences due to true diﬀerences between the drugs or the order in which they are observed? In RM designs.2. D3 ). Subject 1 takes the drugs in the order D1 . · · · . the subject factor (B) will be considered random in RM designs.
Example Suppose three drugs D1 . D2 . Hence.

· · · . γa . i. a
ˆ and φd =
a i=1
di yi. The more general (S) structure for Σ is now introduced. The (S) structure implies the following (prove!): 1. observations taken on diﬀerent subjects are independent.104 i 1 ti t1 t2 t3 t4 time start of study Day 1
CHAPTER 5. This correlation structure violates the (CS) structure assumed under the mixed RCBD. for j = j . It is given by
c2 . is free of ¯
δ ˆ ˆ Cov(φc . φc = γ1 . say. observations observed closer together in time are more highly correlated than observations further apart in time. measurements are made on each subject a = 4 times in the same order. With this type of design. The variance of any sample contrast φ =
a ¯ i=1 ci yi. · · · .
2δ . is free of γ1 .
i=1
5.e. . REPEATED MEASUREMENT DESIGNS description enzyme X baseline measured prior to administration of drug D enzyme X measured 1 day after administration of drug D enzyme X measured 3 days after administration of drug D enzyme X measured 7 days after administration of drug D
2
3
Day 3
4
Day 7
Hence. i
i=1 a ¯ i=1 ci yi.1
The Huynh-Feldt Sphericity (S) Structure
The Huynh-Feldt sphericity structure is given by σii = 2γi + δ γi + γi if i = i if i = i
Of course. Cov(yij . .2
The One-Way RM Design : (S) Structure
The statistical model for the one way RM design under the (S) structure is yij = µ + τi + βj + where •
a i=1 τi ij
= 0. γa .
• the b subjects are a random sample from a population of subjects following a normal distribution with
. All pairwise comparisons of treatments have the same variance V ar(¯i.2.2. ) = y ¯ ˆ 2.
5. φd ) = b
ci di . It is given by
a
δ ˆ V ar(φ) = b ˆ 3. yi j ) = 0. The covariance between any two contrasts. b
ci = 0. − yi .

a set of (1 − α)100% Tukey-Kramer conﬁdence intervals for all (¯i.
3. · · · . L. The following example is taken from Mike Stoline’s class notes.(a−1)(b−1) (α) y ¯ i. These drugs are compounds that are chemically quite similar and each is hypothesized to be eﬀective in the stabilization of the heart function. assuming the eﬀect of previously-administered drugs have worn oﬀ. 1582-1589.2. The order eﬀect was partially removed by selecting one of the four drug orders below for each dog:
.(a−1)(b−1) (α) . b i. a test of H0 : τi = τ1 at MEER=α is |¯i. and D4 .
These results depend on the (S) condition. Example In a pre-clinical trial pilot study b = 6 dogs are randomly selected and each dog is given a standard dosage of each of 4 drugs D1 .5. satisﬁes the (S) condition. | y ¯
2M SAB b
> t(a−1)(b−1) (α/2) . D3 . − yi . These measures are diﬀerences between rates measured immediately prior the injection of the drug and rates measured one hour after injection in all cases. 2. a set of (1 − α)100% Dunnet conﬁdence intervals for all a − 1 comparisons τi − τ1 (treatments versus control) is 2M SAB (¯i. an α-level test of H0 : τi = τi is |¯i. a test of H0 : τi = τi at MEER=α is |¯i.(a−1)(b−1) (α) √ . yj = (y1j . ”Conditions under which the mean square ratios in the repeated measurements designs have exact F distributions”. Actually. an α-level test of H0 : τ1 = · · · = τa = 0 is M SA /M SAB > Fa−1. for all the tests to hold a necessary and suﬃcient condition is that Σ satisﬁes the (S) condition. For the proof of this.(a−1)(b−1) (α) y ¯ . D2 . − yi . Under the one-way RM model where Σ satisﬁes the (S) condition we have 1. Journal of the American Statistical Association. The variance-covariance matrix of the observations on the same subject. − y1 | y ¯
2M SAB b
> da−1. please see Huynh.e. yaj ). ) ± qa. H. in the one-way RM design. b
>
qa. S. 2
and 4. | y ¯
2M SAB b
a 2
pairwise comparisons τi − τi is
M SAB .(a−1)(b−1) (α) . E(yij ) = µ + τi
105
2. − y1 ) ± da−1.e. and Feldt. Four measures of the stabilization of the heart function are obtained for each dog for each drug type. ONE-WAY REPEATED MEASUREMENT DESIGNS 1. 65. − yi . (1970).

3 3. D1 .3 5 2 5.1 5.8 4.6 1 2 4. we may use the following SAS code to perform the ANOVA F -test as well as follow-up Tukey-Kramer analysis.6 5. D3 D1 . --------------------------------------------------------Source Model Error Corrected Total Source DRUG DOG DF 8 15 23 DF 3 5 Sum of Squares 28. D1 .5
Assuming that the (S) condition is satisﬁed.47 Pr > F <.52958333 Type III SS 19.38333333 5.0001 0.
1 2 3 4 5 6
4 4 4 4 4 4
4.2 2. D2 D2 .6 1 3 5.106 drug order 1 2 3 4
CHAPTER 5. REPEATED MEASUREMENT DESIGNS order of administration D3 .1 4 1 2.0 3. QUIT. CARDS. D4 .43486111 1.55000000 LSMEAN Number 1 2 3 4
Least Squares Means for effect DRUG Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: Y
. MODEL Y=DRUG DOG. RUN.46 11.2 5.0 5. D4 .2 2 1 3.8 4.89708333 Mean Square 6.4 4 2 3.1 5.8 4.5 6 3 5.2 6. D2 . D2 .3 3 1 4.3 7.9 Drug D2 4.9 5.
OPTIONS LS=80 PS=66 NODATE.71 Pr > F <. D4
The data are (a large entry indicates high stabilization of heart-rate) Dog 1 2 3 4 5 6 D1 2.8 3.0 3.86666667 4.8 3 3 7. INPUT DOG DRUG Y @@.52520833 0. CLASS DRUG DOG.2 5.2 5.9 4.3 6 1 3.32791667 30.0 5. D3 .2 5 3 6.1 6.2 D4 4.3 5. 1 1 2. D1 D4 .30458333 8.0001
Adjustment for Multiple Comparisons: Tukey DRUG 1 2 3 4 Y LSMEAN 3.5 Level D3 5. LSMEANS DRUG/PDIFF ADJUST=TUKEY.8 4.01666667 5.15519444 F Value 22.1 5 1 3.6 3.2 3 2 5.2 .77941667 F Value 41.20166667 2.4 3.1 2 3 6. DATA RM1.9 4 3 5.9 6 2 5.5
PROC GLM.9 2 2 5.0001 Mean Square 3. D3 .

we get the following: • Drug 3 is superior to all other drugs. ¯ ¯ • σj is the mean of row j entries of Σ.
5. ¯ • σ· is the mean of the diagonal entries of Σ.9 5. The question now is ”How can it be determined whether Σ satisﬁes (S).46. p-value < . CARDS. MODEL D1-D4 = /NOUNI. in this class.3 3.0002 4 0.2 5.8 7.3 5.2.3 5. we will always use the ˆ G-G e-adjusted test regardless of the result of Mauchly’s test.0001).0001 <.0095 0.8 2.0.0006 2 <. e e M SAB (a − 1)[
The G-G e-adjusted test reduces to the usual F -test if e = 1. DATA RM2. there is a signiﬁcant diﬀerence among the four drugs (F = 41.5 5.0001 0.6 4. 2.5.1 6.0002
107
Thus. • Drugs 2 and 4 are the same.1 4.2134 0.2 3.0.0 4.e. Hence.6 5.0095 0.3
One-way RM Design : General
a2 (¯· − σ·· )2 σ ¯ 2 σij − 2a σj + a2 σ·· ] ¯2 ¯2
An estimate of the departure of Σ from the (S) structure is e= where • σ·· is the mean of all a2 entries of Σ. • Drug 1 is inferior to the other drugs. Example We will reconsider the dogs’ heart-rate example once more. The (S) condition is satisﬁed if and only if e = 1.0006 0.5 .0001 0.2. i.2 6. INPUT D1 D2 D3 D4. The following SAS code produces Mauchly’s test as well as the G-G e-adjusted F -test. The value of e satisﬁes 1/(a − 1) ≤ e ≤ 1.2 4.8 3.1 5.2 4. From the Tukey-Kramer procedure. ONE-WAY REPEATED MEASUREMENT DESIGNS
i/j 1 2 3 4 1 <.2134 3 <. Whenever the (S) condition is not met we use the Greenhouse-Geisser (G-G) e-adjusted test of H0 : τ 1 = · · · = τ a = 0 given by M SA > F(a−1)ˆ.9 5. e = 1?” The answer to this question is the Huynh-Feldt modiﬁcation of the Mauchly (1940) test for sphericity.
.0 3.0001 0. PROC GLM.(a−1)(b−1)ˆ(α) .
OPTIONS LS=80 PS=66 NODATE.4 3.9 5.

3 3.0001
Adj Pr > F G .6735).2522124 9.43486111 0.0196 D24 6 -0.8 2.7576.89 0.1720748
Pr > ChiSq 0. PROC MEANS N MEAN STDERR T PRT. D12 = D2-D1.9 5. The G-G e-adjusted test for H0 : τ1 = τ1 = τ3 = τ4 = 0 is rejected with p-value ˆ < .4 3.7576 1.8 7.1 5.1 6.4833333 0.2472066 -1.2392 0.2 5. D24 = D4-D2. VAR D12 D13 D14 D23 D24 D34. D13 = D3-D1. -------------------------------------------The MEANS Procedure Variable N Mean Std Error t Value Pr > |t| -------------------------------------------------------------------D12 6 1.5 .0001 D13 6 2. The G-G estimate of e is e = 0.7586721 3.0001 <.1666667 0. CARDS.6 4. QUIT. RUN.426476
CHAPTER 5.0 4.0 3. This is done using PROC MEANS in SAS to get pairwise t-test statistics. INPUT D1 D2 D3 D4.1177
.85 0.9 5.6333333 0.4225
Thus the test for H0 : e = 1 is not rejected using Mauchly’s criterion (p-value = 0. 2.2 4.46 Pr > F <.8 3.1173788 13.
OPTIONS LS=80 PS=66 NODATE.6 5.92 <.5 5.3 5.2 4.108
REPEATED DRUG/PRINTE NOM. QUIT. ---------------------------------------------------Sphericity Tests Variables Transformed Variates Orthogonal Components DF 5 5 Mauchly’s Criterion 0.1627138 0.2513298 3.2 6. DATA RM2.53 0.0026 D23 6 0. D23 = D3-D2.32791667 Mean Square 6.30458333 2. REPEATED MEASUREMENT DESIGNS
Chi-Square 6.G H .2108185 5.8500000 0.F <. Follow-up t-tests may be performed without assuming equality of variances. D14 = D4-D1.1 4.38 0.4666667 0.0001
Greenhouse-Geisser Epsilon Huynh-Feldt Epsilon
0.0001. D34 = D4-D3.9 5.3 5.6735
The GLM Procedure Repeated Measures Analysis of Variance Univariate Tests of Hypotheses for Within Subject Effects Source DRUG Error(DRUG) Source DRUG Error(DRUG) DF 3 15 Type III SS 19.2 3.0002 D14 6 1.15519444 F Value 41. RUN.

. ya1na y. . y2bn2 . . . y1bn1 y2b1 .3166667 0. ... 1 y111 . . Group Means y1. . Sometimes it is important to consider an unbalanced case. . . .. Suppose that the third clinic services a population that is four times as large as the other two. . A random sample of ni subjects are selected and assigned to the ith level of A. . . . .b.. Within treatment comparisons involve the same subjects and are analyzed similarly to the one-way RM design. . ¯ y. y11n1 y211 .0035 --------------------------------------------------------------------
109
Once again the only drugs that are not signiﬁcantly diﬀerent are D2 and D4 . . The recommended sample sizes are : n1 = n2 = n and n3 = 4n. . . The classic two-way RM model is i = 1. . . . . ¯ . . Sn1 S1 . . Suppose that two of the three clinics serve cities that are mediumsized while the third clinic is serving a large metropolitan population. Sn2 .3
Two-Way Repeated Measurement Designs
In this section we will consider the two-way RM model with repeated measures on one factor. For instance. . The layout for the classic two-way RM design looks like the following: Treatments (B) Groups (A) 1 subjects S1 . . . . TWO-WAY REPEATED MEASUREMENT DESIGNS
D34 6 -1. .5. yabna y. . b k = 1.. . . This is known as the classic two-way RM design. . . Between group comparisons involve distinct subjects and hence are similar to such comparisons in the single factor CRD model. The b levels of B are observed for each subject in each A group. ni where • τi is the ith group eﬀect (
i τi
= 0)
j
• βj is the jth treatment eﬀect (
βj = 0)
.19 0. ya11 . ¯ ··· ··· b y1b1 . We will deﬁne the model for the general unbalanced cased but we will conﬁne our attention to the balanced case as it is the most common design. j = 1.
5. yab1 . .3. . y21n2 . a yijk = µ + τi + βj + (τ β)ij + πk(i) + (βπ)jk(i) + ijk . ¯
a
··· ···
ya. . .2535306 -5. ¯
Sna Treatment Means
This is a very widely used design.1. . ¯
2
···
y2. Let A be the between-subject factor with a ﬁxed levels and B be the within-subject (repeated) factor with b levels. . suppose the groups are 3 clinics serving diﬀerent subjects in diﬀerent cities. S1 .

(βπ)j 3.
k (i ) )
= 0 if k = k or i = i . (βπ)j •
ijk
k(i) )
=
2 −σβπ b
for j = j . If the interaction eﬀect is not signiﬁcant.
Note that the variables (βπ)jk(i) have a (CS) structure similar to the (CS) structure of (τ β)ij in the mixed two factor model: yijk = µ + τi + βj + (τ β)ij + ijk where τi is ﬁxed and βj is random.
a i=1
ni πk(i) random variables
• (βπ)jk(i) is the random joint interaction eﬀect of subject k and treatment j nested within group i. and df2 = (N − a)(b − 1). )2 ¯ ¯ ¯ . and
are mutually independent. df2
The ANOVA table for the classic two-way RM design is
Source of variation Between Subjects A Subjects within groups Within Subjects B AB B× Subjects within groups df N −1 a−1 df1 N (b − 1) b−1 (a − 1)(b − 1) df2 M SB M SAB M S2
2 σ 2 + σβπ + N
MS M SA M S1
E(MS)
2 σ 2 + bσπ + 2 + bσ 2 σ π b
F
Pa
i=1 2 ni τi a−1
M SA /M S1
Pa
i=1
σ2
+ σ2 +
2 σβπ 2 σβπ
Pa b−1 b P
i=1
2 βi
M SB /M S2 M SAB /M S2
+
ni (τ β)2 ij (a−1)(b−1)
j=1
Comparisons of means may be in order if the main eﬀects are signiﬁcant. b−1 σβπ ). Cov((βπ)jk(i) . (βπ)jk(i) . In this model we have V ar((τ β)ij ) = Let N =
a i=1
a−1 2 στ β a
and Cov((τ β)ij . (βπ)jk(i) follows a N (0. A test of H0 : τi = τi is |¯i.
ijk
• The πk(i) . REPEATED MEASUREMENT DESIGNS • (τ β)ij is the interaction of the ith group and the jth treatment (
i (τ β)ij
=
j (τ β)ij
= 0)
• πk(i) is the random eﬀect of subject k nested within group i. The N = 2 are assumed to follow a N (0. + yi.110
CHAPTER 5. (τ β)i j ) =
2 −στ β . then we have the following α-level tests: 1. (βπ)’s are assumed to satisfy
2 1.. − yi .k − yi. let
a ni
M S1 =
i=1 k=1
b(¯i... a
ni . σπ ) distribution. b
2. σ 2 ) distribution. df1 = N − a.
is a random error term which is assumed to follow a N (0. )2 y ¯ df1
and
a b
ni
M S2 =
i=1 j=1 k=1
(yijk − yi. | y ¯
M S1 b 1 ni
+
1 ni
> tdf1 (α/2)
. Further.k − yij.. Cov((βπ)jk(i) .

− y. ) = y ¯ ni the α-level test of H0 : µij = µij (comparison of treatments j and j within Group i) |¯ij. ) = y ¯ 1 1 + ni ni
2 σ 2 + σπ +
b−1 2 1 1 σβπ =: + b ni ni
M
and M cannot be unbiasedly estimated by either M S1 or M S2 . Let µij be the mean of cell (i.j ¯ ¯ yij. − yij . − yi j.e. − yij . − yi . . .. | y ¯
2M S2 a2 a 1 i=1 ni
> tdf2 (α/2)
One may make simultaneous comparisons by making the appropriate adjustments in the above tests. However.j. comparisons of the means of one factor needs to be performed within the levels of the other factor. | y ¯ M S3
1 ni
+
1 ni
> tdf3 (α/2) . slightly more problematic since V ar(¯ij. A test of H0 : βj = βj is
111
|¯. − yij ¯ ¯
.j. − yij . N = na and the ANOVA table is
Source of variation Between Subjects A (Groups) Subjects within groups Within Subjects B AB B× Subjects within groups df N −1 a−1 a(n − 1) N (b − 1) b−1 (a − 1)(b − 1) a(n − 1)(b − 1) MS M SA M S1 F FA = M SA /M S1
M SB M SAB M S2
FB = M SB /M S2 FAB = M SAB /M S2
Comparisons of means in the balanced case are summarized in the following table: Parameter τi − τi βj − βj µij − µij µij − µi j Estimator yi. | y ¯
2M S2 ni
> tdf2 (α/2) .
For the balanced case. i.j .
Standard Error of Estimator
2M S1 bn 2M S2 an 2M S2 n 2M S3 n
df df1 df2 df2 df3
yij. ¯ ¯ y. n1 = n2 = · · · = na = n. j). ¯ Since 2M S2 V ar(¯ij. TWO-WAY REPEATED MEASUREMENT DESIGNS 2. Using the Satterthwaite approximation formula. − y. ¯ ¯
. however. we get the degrees of freedom associated with M S3 as [(df1 )(M S1 ) + (df2 )(M S2 )]2 df3 = (df1 )(M S1 )2 + (df2 )(M S2 )2 Thus an α-level test of H0 : µij = µi j is |¯ij. − yi j. M S3 = (df1 )(M S1 ) + (df2 )(M S2 ) df1 + df2
is an unbiased estimate of M .3. The estimator of µij is yij. If the AB interaction eﬀect is signiﬁcant.5.
The comparison of groups i and i is..

The following table contains results from one such study. DATA RM.3. OUTPUT. INPUT Y @@.4.112 In the balanced case M S3 and df3 are given by M S3 = and df3 =
CHAPTER 5. DRUG Person within drug 1 2 3 4 5 6 7 8 AX23 T1 72 78 71 72 66 74 62 69 T2 86 83 82 83 79 83 73 75 T3 81 88 81 83 77 84 78 76 T4 77 81 75 69 66 77 70 70 T1 85 82 71 83 86 85 79 83 BWW9 T2 86 86 78 88 85 82 83 84 T3 83 80 70 79 76 83 80 78 T4 80 84 75 81 76 80 81 81 T1 69 66 84 80 72 65 75 71 CONTROL T2 73 62 90 81 72 62 69 70 T3 72 67 88 77 69 65 69 65 T4 74 73 87 72 70 61 68 65
The following SAS code performs the analyses:
OPTIONS LS=80 PS=66 NODATE. QUIT. QUIT. OUTPUT OUT=OUTMEAN MEAN=YM. DO A=1. RUN.3. RUN.
69 66 84 80 72 65 75 71
73 62 90 81 72 62 69 70
72 67 88 77 69 65 69 65
74 73 87 72 70 61 68 65
. CARDS.2. At the start of the study. END. VAR Y. BY A B S. END. REPEATED MEASUREMENT DESIGNS
M S1 + (b − 1)M S2 b
a(n − 1)[M S1 + (b − 1)M S2 ]2 2 2 M S1 + (b − 1)M S2
The following example is taken from Milliken and Johnson : The Analysis of Messy Data (Vol 1) Example An experiment involving d drugs was conducted to study each drug eﬀect on the heart rate of humans.2. PROC MEANS MEAN NOPRINT. BY A B. INPUT S @. n female human subjects were randomly assigned to each drug. DO B = 1. PROC SORT DATA=RM. 1 72 86 81 77 85 86 83 80 2 78 83 88 81 82 86 80 84 3 71 82 81 75 71 78 70 75 4 72 83 83 69 83 88 79 81 5 66 79 77 66 86 85 76 76 6 74 83 84 77 85 82 83 80 7 62 73 78 70 79 83 80 81 8 69 75 76 70 83 84 78 81 . After the drug was administered. the heart rate was measured every ﬁve minutes for a total of t times.

0001
-------------------------------------------------------------------------------Least Squares Means A 1 1 1 1 2 2 2 2 3 3 3 3 B 1 2 3 4 1 2 3 4 1 2 3 4 Y LSMEAN 70.7500000 72. TITLE1 ’HEART RATE DATA’.0001 <. QUIT.16 Pr > F <. . F Value .5000000 81. CLASS A B S.489583 Type III SS 1315.166667 458. TEST H=A E=S(A). PROC GPLOT DATA=OUTMEAN.468750 Mean Square 657. .614583 531. SYMBOL2 V=TRIANGLE L=1 I=JOIN CV=BLACK.1666667 Mean Square 94.083333 2320.2048611 88. TITLE3 ’DRUG BY TIME’. SYMBOL3 V=CIRCLE L=1 I=JOIN CV=ORANGE. .527778 7.
113
Tests of Hypotheses Using the Type III MS for S(A) as an Error Term Source A DF 2 Type III SS 1315. . PLOT YM*B=A. -------------------------------------------------------------------------------Dependent Variable: Y Source Model Error Corrected Total Source A S(A) B A*B B*S(A) DF 95 0 95 DF 2 21 3 6 63 Sum of Squares 4907.2500000
.541667 110. Pr > F . SYMBOL1 V=DIAMOND L=1 I=JOIN CV=BLUE.5000000 71. .156250 282. RUN. TWO-WAY REPEATED MEASUREMENT DESIGNS
GOPTIONS DISPLAY.5.0000000 78.3750000 71.0000000 73.657785 . Mean Square 51.483631 94. QUIT. PROC GLM DATA=RM.5277778 F Value 12.204861 88.7500000 72.95 12.1250000 81.6145833 531.6250000 79.7500000 84.3.95 Pr > F 0.277282 F Value . .000000 4907. .0090
Tests of Hypotheses Using the Type III MS for B*S(A) as an Error Term Source B A*B DF 3 6 Type III SS 282. RUN.5000000 80. . TEST H=B A*B E=B*S(A). MODEL Y = A S(A) B A*B B*S(A). LSMEANS A*B.541667 F Value 5. Pr > F .083333 Mean Square 657.489583 0.

1 -----------DRUG 2 : BWW9 T3 T4 T1 T2 78.3 T3 71.114
CHAPTER 5. Compare times within drugs The common standard error is se(¯ij.70 .0 ---------------------------------DRUG 3 : CONTROL T4 71.8 84.4 T1 72. − yij . ) = y ¯ The least signiﬁcant diﬀerence (LSD) is LSD = t63 (. REPEATED MEASUREMENT DESIGNS
The interaction plot as well as the F -test show a signiﬁcant interaction.0 ------------
. − yij .35) = 2.70 indicates a signiﬁcant diﬀerence between µij and µij . n 2M S2 = n (2)(7.8 T2 T3 80.5 T2 72. y ¯ This gives us the following set of underlining patterns: DRUG 1 : AX23 T1 T4 70.5 81.8 81. | that exceeds 2.28) = 1.025) 2M S2 = 2.00(1.5 73.35 8
Thus any |¯ij.6 79.

1 ---------------BWW9 79.87 . − yi j.28)) = 2. n
Thus any |¯ij.0 -----------BWW9 81.6 81. | that exceeds 5.042(2. ) = y ¯ with df3 = a(n − 1)[M S1 + (b − 1)M S2 ]2 3(7)[110.3.8 -------------TIME 2 CONTROL 72. − yij . Example This experiment involved studying the eﬀect of a dose of a drug on the growth of rats.87 indicates a signiﬁcant diﬀerence between µij and µi j . TWO-WAY REPEATED MEASUREMENT DESIGNS ----------------------------
115
Compare drugs within times The common standard error is se(¯ij.5 + (3)(7.7 ≈ 30 2 + (b − 1)M S 2 M S1 (110.4 AX23 BWW9 80.5 + (3)(7.5 84. The weights were obtained each week for 4 weeks. where 5 rats were randomly assigned to each of the 3 doses of the drug.876 (2)(8)
The least signiﬁcant diﬀerence (LSD) is LSD = t30 (.
.28)]2 = = 29.876) = 5. The data set consists of the growth of 15 rats.0 ------------
TIME 4 CONTROL AX23 71.5 BWW9 AX23 78.5.8
The following example taken from Milliken and Johnson illustrates how SAS can be used to make comparisons of means in the absence of a signiﬁcant interaction.75
TIME 3 CONTROL 71.5 72. y ¯ This gives us the following set of underlining patterns: TIME 1 AX23 CONTROL 70.025) 2M S3 = 2.3 73.28)2 2 2M S3 = n 2(M S1 + (b − 1)M S2 ) = nb 2(110.5)2 + (3)(7.

RUN.
. DATA RM1.2. OUTPUT. PROC GPLOT DATA=OUTMEAN.4. RUN.3. OUTPUT OUT=OUTMEAN MEAN=YM. PROC SORT DATA=RM1. CARDS. BY A B S.4. DO B = 1. BY A B. QUIT. PLOT YM*B=A. DO S=1. END. END. SYMBOL2 V=TRIANGLE L=1 I=JOIN CV=BLACK. SYMBOL3 V=CIRCLE L=1 I=JOIN CV=ORANGE. INPUT Y @@. VAR Y. GOPTIONS DISPLAY. REPEATED MEASUREMENT DESIGNS Week 2 3 60 63 75 81 81 87 69 77 58 62 71 73 102 90 69 63 66 77 64 70 75 81 109 95 72 66 70 84 69 73
Dose 1
Rat 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
1 54 69 77 64 51 62 68 94 81 64 59 56 71 59 65
4 74 90 94 83 71 81 91 112 104 78 75 81 80 76 77
2
3
The SAS code and output are given below:
OPTIONS LS=80 PS=66 NODATE.3. END.2.5.2. SYMBOL1 V=DIAMOND L=1 I=JOIN CV=BLUE. QUIT. DO A=1. PROC MEANS MEAN NOPRINT.3. 54 60 63 74 69 75 81 90 77 81 87 94 64 69 77 83 51 58 62 71 62 71 75 81 68 73 81 91 94 102 109 112 81 90 95 104 64 69 72 78 59 63 66 75 56 66 70 81 71 77 84 80 59 64 69 76 65 70 73 77 .116
CHAPTER 5.

RUN.727778 5. Pr > F .86 1. TITLE1 ’RAT BODY WEIGHT DATA’.5.18333 0.3854
The GLM Procedure Least Squares Means Standard Errors and Probabilities Calculated Using the Type III MS for S(A) as an Error Term A 1 2 3 Y LSMEAN 72. LSMEANS B/PDIFF E=B*S(A).366667 177. QUIT.3. QUIT.936111 F Value .727778 5.0001 0. CLASS A B S. LSMEANS A/PDIFF E=S(A).433333 5405.0500000 LSMEAN Number 1 2 3
Least Squares Means for effect A Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: Y i/j 1 2 3 1 0.09 Pr > F <.500000 2678.00000 10440. Mean Square 176. TWO-WAY REPEATED MEASUREMENT DESIGNS
TITLE3 ’DOSE BY TIME’.1345
Tests of Hypotheses Using the Type III MS for B*S(A) as an Error Term Source B A*B DF 3 6 Type III SS 2678.7764 2 0.38 Pr > F 0. .0664
. . . MODEL Y = A S(A) B A*B B*S(A). RUN.394444 F Value 180.700000 Mean Square 1073.18333 Type III SS 2146. Pr > F .95226 . .1095 0. . TEST H=A E=S(A). TEST H=B A*B E=B*S(A).458333 892. PROC GLM DATA=RM1.183333 32. .433333 Mean Square 1073.1095 0.216667 450. F Value .394444 4.216667 F Value 2. ---------------------------------------------------------Dependent Variable: Y Source Model Error Corrected Total Source A S(A) B A*B B*S(A) DF 59 0 59 DF 2 12 3 6 36 Sum of Squares 10440. .7764 0.0000000 83.0664 3 0. .6000000 70.366667 Mean Square 892.
117
Tests of Hypotheses Using the Type III MS for S(A) as an Error Term Source A DF 2 Type III SS 2146.183333 32.

Here we consider the analysis of a two-way RM design where the (S) condition may not hold.0 83.0001 <.0001
Since there is no signiﬁcant interaction.0001 <.5 W3 77.0001 2 <.118
CHAPTER 5.3 W2 72.0001 <. the means of A and B can be compared at the highest level.2666667 72.6 W4 84. Using the output from LSMEANS one obtains the following underlining patterns: DOSES : D3 D1 D2 70.6 -------------------W1 66. we have considered the analysis of the classic two-way RM design under the assumption that the (S) condition is satisﬁed for each level of A.5333333 77.4666667 LSMEAN Number 1 2 3 4
Least Squares Means for effect B Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: Y i/j 1 2 3 4 1 <.6000000 84.
.0001 3 <.0001 <.0001 <.1 72.5
WEEKS :
Two-Way RM Design : General Case
So far.0001 <.0001 <.0001 4 <.0001 <. REPEATED MEASUREMENT DESIGNS
Least Squares Means Standard Errors and Probabilities Calculated Using the Type III MS for B*S(A) as an Error Term B 1 2 3 4 Y LSMEAN 66.

. for level j of B. we test H0 : µi1 = · · · = µib using the data Subject 1 .
OPTIONS LS=80 PS=66 NODATE. PROC GLM DATA=RM3.5. . . we test H0 : µ1j = · · · = µaj and make multiple comparisons of means using the data A i yij1 . That is. . -------------------------------------------------------------SELECTED OUTPUT -------------------------------------------------------------RAT BODY WEIGHT DATA : 2
1 yi11 . . MODEL Y1-Y4 = A. Run a complete one-way ANOVA of the the levels of A within each level of B. yi1ni
··· ··· ··· ···
B j yij1 . . . k . yijk . yajna
3. CLASS A. TWO-WAY REPEATED MEASUREMENT DESIGNS The analysis strategy will be as follows: 1.3. ni Example We will revisit the body weight of rats data considered above. REPEATED B 4/ PRINTE. . . QUIT. INPUT A Y1 Y2 Y3 Y4. . Run a complete one-way RM analysis of the levels of B within each level of A using the G-G eadjusted one-way RM F -tests followed by multiple comparisons of means. for level i of A. DATA RM3. RUN. TITLE1 ’RAT BODY WEIGHT DATA : 2’. yijni
1 y1j1 . Test the B main eﬀect and the AB interaction eﬀect using the G-G e-adjusted F -test.
119
2. CARDS. y1jn1
··· ··· ···
··· ··· ···
a yaj1 . yibk . . yijni
··· ··· ··· ···
a yib1 . . That is. . . . . The following SAS code is used to get the tests for B and AB eﬀects. . . . yi1k . 1 54 60 63 74 1 69 75 81 90 1 77 81 87 94 1 64 69 77 83 1 51 58 62 71 2 62 71 75 81 2 68 73 81 91 2 94 102 109 112 2 81 90 95 104 2 64 69 72 78 3 59 63 66 75 3 56 66 70 81 3 71 77 84 80 3 59 64 69 76 3 65 70 73 77 . . yibni
117
. . .

3345438 Chi-Square 33. We now run one-way RM analyses of B within each level of A.727778 5.0459293 0.05.3846
Greenhouse-Geisser Epsilon Huynh-Feldt Epsilon
0. Mauchly’s test for the (S) condition is signiﬁcant indicating that the analyses run earlier may not be the appropriate ones.3811 <.38 Pr > F 0.740697 Pr > ChiSq <.5244612 Pr > ChiSq 0.G H .27398 Chi-Square 4.6058 0. BY A.0001 0.700000 Mean Square 892.433333 5405. QUIT.120
CHAPTER 5.366667 177. RUN.86 1. RUN.183333 32.216667 450.0001 0. BY A.1345 118
RAT BODY WEIGHT DATA : 2 The GLM Procedure Repeated Measures Analysis of Variance Univariate Tests of Hypotheses for Within Subject Effects Source B B*A Error(B) Source B B*A Error(B) DF 3 6 36 Type III SS 2678.0001 0. MODEL Y1-Y4=/NOUNI.0001 0.F <.A=1 -------------------------------------Sphericity Tests Variables Transformed Variates Orthogonal Components DF 5 5 Mauchly’s Criterion 0.8269
Using the G-G e-adjusted tests one observes that the AB interaction eﬀect is not signiﬁcant while the B main eﬀect is signiﬁcant both at α = .031436 11.458333 F Value 2. QUIT. REPEATED MEASUREMENT DESIGNS
Sphericity Tests Variables Transformed Variates Orthogonal Components DF 5 5 Mauchly’s Criterion 0. REPEATED B 4/PRINTE.6197
.4227 0.
PROC SORT DATA=RM3.3854
Adj Pr > F G .500000 Mean Square 1073.1626147 0.9445682 3.936111 F Value 180. PROC GLM DATA=RM3.0385
The GLM Procedure Repeated Measures Analysis of Variance Tests of Hypotheses for Between Subjects Effects Source A Error DF 2 12 Type III SS 2146. ----------------------------------------------------------------SELECTED OUTPUT ----------------------------------------------------------------RAT BODY WEIGHT DATA : 2 119
------------------------------------.394444 4.09
Pr > F <.

0001
Greenhouse-Geisser Epsilon Huynh-Feldt Epsilon
0.000000 52.6370 1.19 Pr > F <.15447 8.3.0021 0.600000 14.0001
Adj Pr > F G .052686 Chi-Square 14.A=3 -------------------------------------Sphericity Tests Variables Transformed Variates Orthogonal Components DF 5 5 Mauchly’s Criterion 0.A=2 -------------------------------------Sphericity Tests Variables Transformed Variates Orthogonal Components DF 5 5 Mauchly’s Criterion 0.F 0.G H .0001 <.4800
.1713
------------------------------------.G H .6286 1.000000 Mean Square 338.0055188 0.0706227 0.0001
Adj Pr > F G .3000000 Mean Square 224.G H .3166667 9. TWO-WAY REPEATED MEASUREMENT DESIGNS
121
The GLM Procedure Repeated Measures Analysis of Variance Univariate Tests of Hypotheses for Within Subject Effects Source B Error(B) Source B Error(B) DF 3 12 Type III SS 1023.200000 F Value 284.0001
Greenhouse-Geisser Epsilon Huynh-Feldt Epsilon
0.0147 0.000000 4.0004
Greenhouse-Geisser Epsilon
0.F <.2149884 2.1555
The GLM Procedure Repeated Measures Analysis of Variance Univariate Tests of Hypotheses for Within Subject Effects Source B Error(B) Source B Error(B) DF 3 12 Type III SS 672.200000 1.333333 F Value 78.0001
Adj Pr > F G .400000 Mean Square 341.2055
------------------------------------.2051 0.0126045 Pr > ChiSq 0.00 Pr > F <.F <.33 Pr > F <.2750000 F Value 24.0001 <.3812237 Chi-Square 7.6252267 Pr > ChiSq 0.9500000 111.7575
The GLM Procedure Repeated Measures Analysis of Variance Univariate Tests of Hypotheses for Within Subject Effects Source B Error(B) Source B Error(B) DF 3 12 Type III SS 1014.5.

RUN.2. MODEL Y=A.6772
Sphericity is satisﬁed in all the three cases. RUN. one uses Welch t-tests.4. QUIT.800000 Mean Square 214. The following SAS code re-creates the data as A.B=1 -------------------------------------The GLM Procedure Dependent Variable: Y Source Model Error DF 2 12 Sum of Squares 428. DROP Y1-Y4.
DATA RM4. RUN.900000 F Value 1. Y = Z(B). DO B=1.122
Huynh-Feldt Epsilon
CHAPTER 5. PROC SORT DATA=RM4. CLASS B S. BY A. QUIT. REPEATED MEASUREMENT DESIGNS
0. Thus. OUTPUT. PROC GLM DATA=RM4. BY A. ------------------------------------------------------------SELECTED OUTPUT ------------------------------------------------------------RAT BODY WEIGHT DATA : 2 140
------------------------------------. S = _N_. MODEL Y=B S. LSMEANS B/ PDIFF. SET RM3. QUIT. B. END.93 Pr > F 0. CLASS A. RUN. PROC SORT DATA=RM4.066667 110.1876
.133333 1330. as shown in the last example of Section 5. we may compare the means of B using the M SE as a denominator. QUIT.2. LSMEANS A/PDIFF. BY B. RUN. PROC GLM DATA=RM4. The repeated factor B is also signiﬁcant in all the cases. BY B. • comparisons of the A means for each level of B. ARRAY Z Y1-Y4. QUIT. In the situation where the (S) condition is not satisﬁed in one or more of the groups. to compare the means of B in the particular group which does not satisfy the (S) condition.3. Y columns and runs: • the one-way ANOVA for the factor A within each level of B. S. and • comparisons of the B means within each level of A.

0757 3 0.1333333 Mean Square 214.8831 0.1309 0.41 Pr > F 0.533333 1341.0666667 F Value 1.266667 111.1018 3 0.0000000 68.41 Pr > F 0.8000000 62.0884 0. TWO-WAY REPEATED MEASUREMENT DESIGNS
123
Corrected Total Source A
14 DF 2
1758.0000000 73.9300 2 0.0884 0.8831 2 0.6000000 81.1319
The GLM Procedure Least Squares Means A 1 2 3 Y LSMEAN 68.0000000 LSMEAN Number 1 2 3
Least Squares Means for effect A Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: Y i/j 1 2 3 1 0.2666667 F Value 2.733333 Type III SS 538.5.93 Pr > F 0.1309 0.3.B=3 -------------------------------------The GLM Procedure Dependent Variable: Y Source DF Sum of Squares Mean Square F Value Pr > F
.933333 Type III SS 428.200000 1879.0757
------------------------------------.0000000 LSMEAN Number 1 2 3
Least Squares Means for effect A Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: Y i/j 1 2 3 1 0.766667 F Value 2.1319 Mean Square 269.1018
------------------------------------.1876
The GLM Procedure Least Squares Means A 1 2 3 Y LSMEAN 63.5333333 Mean Square 269.9300 0.B=2 -------------------------------------The GLM Procedure Dependent Variable: Y Source Model Error Corrected Total Source A DF 2 12 14 DF 2 Sum of Squares 538.

124

Model Error Corrected Total Source A 2 12 14 DF 2 587.200000 1636.400000 2223.600000 Type III SS 587.2000000

**CHAPTER 5. REPEATED MEASUREMENT DESIGNS
**

293.600000 136.366667 2.15 0.1589

Mean Square 293.6000000

F Value 2.15

Pr > F 0.1589

The GLM Procedure Least Squares Means A 1 2 3 Y LSMEAN 74.0000000 86.4000000 72.4000000 LSMEAN Number 1 2 3

Least Squares Means for effect A Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: Y i/j 1 2 3 1 0.1190 0.8321 2 0.1190 0.0824 3 0.8321 0.0824

------------------------------------- B=4 -------------------------------------The GLM Procedure Dependent Variable: Y Source Model Error Corrected Total DF 2 12 14 Sum of Squares 624.933333 1274.800000 1899.733333 The GLM Procedure Least Squares Means A 1 2 3 Y LSMEAN 82.4000000 93.2000000 77.8000000 LSMEAN Number 1 2 3 Mean Square 312.466667 106.233333 F Value 2.94 Pr > F 0.0913

Least Squares Means for effect A Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: Y i/j 1 2 3 1 0.1235 0.4939 2 0.1235 0.0359 3 0.4939 0.0359

RAT BODY WEIGHT DATA : 2

152

------------------------------------- A=1 -------------------------------------The GLM Procedure Dependent Variable: Y Sum of

**5.3. TWO-WAY REPEATED MEASUREMENT DESIGNS
**

Source Model Error Corrected Total Source B S DF 7 12 19 DF 3 4 Squares 2733.600000 14.400000 2748.000000 Type III SS 1023.600000 1710.000000 The GLM Procedure Least Squares Means B 1 2 3 4 Y LSMEAN 63.0000000 68.6000000 74.0000000 82.4000000 LSMEAN Number 1 2 3 4 Mean Square 341.200000 427.500000 F Value 284.33 356.25 Pr > F <.0001 <.0001 Mean Square 390.514286 1.200000 F Value 325.43 Pr > F <.0001

125

Least Squares Means for effect B Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: Y i/j 1 2 3 4 1 <.0001 <.0001 <.0001 2 <.0001 <.0001 <.0001 3 <.0001 <.0001 <.0001 4 <.0001 <.0001 <.0001

------------------------------------- A=2 -------------------------------------The GLM Procedure Dependent Variable: Y Source Model Error Corrected Total Source B S DF 7 12 19 DF 3 4 Sum of Squares 4326.800000 52.000000 4378.800000 Type III SS 1014.000000 3312.800000 The GLM Procedure Least Squares Means B 1 2 3 4 Y LSMEAN 73.8000000 81.0000000 86.4000000 93.2000000 LSMEAN Number 1 2 3 4 Mean Square 338.000000 828.200000 F Value 78.00 191.12 Pr > F <.0001 <.0001 Mean Square 618.114286 4.333333 F Value 142.64 Pr > F <.0001

Least Squares Means for effect B Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: Y i/j 1 2 3 4 1 0.0001 <.0001 <.0001 2 0.0001 0.0015 <.0001 3 <.0001 0.0015 0.0002 4 <.0001 <.0001 0.0002

126

CHAPTER 5. REPEATED MEASUREMENT DESIGNS

------------------------------------- A=3 -------------------------------------The GLM Procedure Dependent Variable: Y Source Model Error Corrected Total Source B S DF 7 12 19 DF 3 4 Sum of Squares 1055.650000 111.300000 1166.950000 Type III SS 672.9500000 382.7000000 Mean Square 224.3166667 95.6750000 F Value 24.19 10.32 Pr > F <.0001 0.0007 Mean Square 150.807143 9.275000 F Value 16.26 Pr > F <.0001

The GLM Procedure Least Squares Means B 1 2 3 4 Y LSMEAN 62.0000000 68.0000000 72.4000000 77.8000000 LSMEAN Number 1 2 3 4

Least Squares Means for effect B Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: Y i/j 1 2 3 4 1 0.0089 0.0002 <.0001 2 0.0089 0.0414 0.0003 3 0.0002 0.0414 0.0159 4 <.0001 0.0003 0.0159

Using underlining to summarize the results Group(A) --------1 2 3 B levels -------------B1 B2 B3 B4 B1 B2 B3 B4 B1 B2 B3 B4

Treatment(B) -----------1 2 3 4

A levels -----------------A3 A1 A2 -----------------A3 A1 A2 -----------------A3 A1 A2 -----------------A3 A1 A2 -------------------

TWO-WAY REPEATED MEASUREMENT DESIGNS
127
.5.3.

128
CHAPTER 5. REPEATED MEASUREMENT DESIGNS
.

129
. and the levels of the treatment variable. Later sections consider special cases of the two-way RM design. This may be used to ﬁnd the optimal dosage level.
i = 1. in a one-way CRD model.and two-way RM designs as our ﬁrst section.e y = f (x) + ε For example. x.Chapter 6
More on Repeated Measurement Designs
In this chapter we will further investigate one. one may consider the simple linear regression model y = β0 + β1 x + ε and test H0 : β1 = 0 to determine if the response is linearly related to the levels of the treatment. We can get more insight into the nature of the relationship of the response. i.and two-way repeated measurement designs.1
6. y. Thus.
which is tested using F0 = M SB /M SW . 5 . if we consider a regression type relationship between x and y. For instance.1. The treatment consists of ﬁve diﬀerent levels of cotton percentages and the response is the strength of the fabric produced. One is usually interested in developing an equation for a curve that describes the dose-response relationship. we shall consider trend analysis in one.1
Trend Analyses in One. the treatments may be several dosages of the same drug. consider the fabric strength experiment considered in Chapter 1. · · · . This is the usual CRD framework that is represented by the model yij = µ + τi + where our interest lies in testing H0 HA : : τ1 = τ2 = · · · = τ5 = 0 τi = 0 for at least one i
ij . Since RM designs usually involve a time factor. j = 1. As an illustration. · · · . To this end we want to consider applications of regression procedures within the ANOVA framework. Each percentage of cotton is randomly assigned to ﬁve randomly selected experimental units.and Two-way RM Designs
Regression Components of the Between Treatment SS (SSB )
Often the treatments in an experiment consist of levels of a quantitative variable. 5. one may be interested in the pattern of the response variable over time.
6.

This. Without loss of generality. Thus. The F ratios in the above ANOVA table are used to test the following hypotheses: 1. F0 is a test statistic for testing H0 : τ1 = · · · = τk = 0 . assume that we have a balanced CRD model where r represents the number of replications per treatment level. 2. SSB = SSR + SSL where SSR is the sum of squares due to the linear regression and SSL is the sum of squares due to lack of ﬁt (i.e failure of the linear regression to describe the relationship between x and y). 3. Assume also that we have k treatment levels. the hypothesis that the simple linear regression model describes the data against the alternative that a simple linear model is not suﬃcient. the hypothesis that all the treatment means are the same against the alternative that at least two are diﬀerent. however. FL is the test statistic for testing H0 : E(y) = β0 + β1 x .130 Partitioning SSB
CHAPTER 6. To use this approach we may deﬁne contrasts using the deviations of the treatment levels from the treatment mean as our contrast coeﬃcients. The ANOVA table for the CRD is Source Between Linear Regression Lack of Fit Within (Error) Total df k−1 1 k−2 n−k n−1 SS SSB SSR SSL SSW SST MS M SB M SR M SL M SW F0 F0 = M SB /M SW FR = M SR /M SW FL = M SL /M SW
One may obtain SSR by ﬁtting an ordinary linear regression model of y on x. seems to be the hard way as the F values may have to be computed by hand. FR is a test statistic for testing H0 : β1 = 0 . Then
k
φR =
i=1
(xi − x)µi ¯
is a contrast (
k i=1 (xi
− x) = 0) whose estimator is ¯ ˆ φR =
k
(xi − x)¯i ¯y
i=1
ˆ From φR we get SSR = and
ˆ rφ2 R
k i=1 (xi
− x)2 ¯
SSL = SSB − SSR . the hypothesis that there is no linear relationship between the response and the levels of the treatment against the alternative that there is a signiﬁcant linear relationship. MORE ON REPEATED MEASUREMENT DESIGNS
The variability in the response that is explained by the treatment may now be partitioned into that due to the linear regression and that due to the remainder that cannot be explained by the regression model.
. An easier way is to ﬁnd a set of coeﬃcients to deﬁne a contrast among the treatment means.

1195229 -0. quit.3162278 0. k −1.4472136 0. 10. The orthogonal polynomial coeﬃcients for all four terms of the polynomial y = β0 + β1 x + β11 x2 + β111 x3 + β1111 x4 are computed as proc iml. . .4472136 0. .316228 0. 30. 2. run.632456 0. i = 1. and 50. Proc IML in SAS may be used to generate orthogonal polynomial coeﬃcients. quit. This means that SSB can be completely decomposed using a polynomial of degree k − 1 with no lack of ﬁt term. This is very widely used in practice even though it sometimes entail premature stopping. etc. Then
k
Li =
j=1
cij yj ¯
is the contrast of means associated with the ith order term of the polynomial.4472136 0. The usual procedure is to ﬁt successive terms of the polynomial starting with the linear term until lack of ﬁt becomes non-signiﬁcant. .86E-16 -0.3162278 0. TREND ANALYSES IN ONE. xp=orpol(x. k be the ith order polynomial coeﬃcient for the jth treatment level. the design is balanced.478091 0.478091 0. quadratic. xp=orpol(x. say. 20.1.
.534522 -0.6324555 1.6324555 0.316228 3. . run. . quartic. x={10 20 30 40 50}.6. For example proc iml. These coeﬃcients enable us to partition SSB into orthogonal linear. components each with one degree of freedom. and. print xp. the treatment levels are equally spaced. 40.3). cubic.267261 0.AND TWO-WAY RM DESIGNS Orthogonal Polynomials The ﬁtting of curves within the ANOVA framework can be greatly simpliﬁed if 1.
131
In such a case we may replace (xi − x) by a simple set of multipliers known as orthogonal polynomial ¯ coeﬃcients.267261 -0.5345225 -0. For instance.4). consider a case where the treatment has k = 4 levels.7171372 -0. The output is 0. Assume that cij . .5345225 -0. These coeﬃcients are extensively tabulated but we will use Proc IML in SAS to generate them. j = 1. . and
2 Si =
rL2 i k 2 j=1 cij
is the 1 df sum of squares associated with the ith term. print xp. x={0 25 75 100}.632456 -0.4472136 -0. Note that in SAS the treatment levels need not be equally spaced.1195229
The ﬁrst column represents the intercept term.588E-18 0.4472136 0.

linear -0.6324555 quadratic 0. proc print. Quit. Run.1195229 -0. Input Trend @.5 0.8. Symbol4 C=Black V=Circle L=1 I=Spline.632456 0. Input Coef @.5345225 cubic -0. Axis1 Label=(A=90 "Coefficient") Order=(-1 To 1 By .316228 0.1).3162278 0.6324555 -0. Title2 "5 Equallyspaced Levels". run.7.6324555
0.969E-17 0. Output.632456 -0.478091 0. MORE ON REPEATED MEASUREMENT DESIGNS
0.5 0.267261 0. Symbol1 C=Black V=Triangle L=1 I=Spline.5
-0.5. Proc GPlot Data=Five.7171372 -0.3162278
The following code and associated plot show how the contrast coeﬃcients are related to the polynomial terms.5 0.501E-17 -0. Symbol2 C=Black V=Square L=1 I=Spline.
.5 0.632456 0. Symbol3 C=Black V=Diamond L=1 I=Spline.5
-0.316228 0.632456 -0. Do X=4.316228 1. Data Five.5 -0.267261 -0.6.316228 0.6324555 6.478091 0. Length Trend $9.
Title1 "Orthogonal Polynomials".3162278 0.3162278 quartic 0. End. quit.132 gives
CHAPTER 6. Plot Coef*X=Trend / VAxis=Axis1.5 -0. Datalines.1195229 .5345225 -0.534522 -0.

3162278.1.267261 0. contrast ’linear’ temp -0. quit. model pot=temp temp*temp.3162278 0.1195229.478091 0. proc glm.478091 0. contrast ’quartic’ temp 0.6324555.534522 -0.632456 -0. The results are given below: 10◦ 62 55 57 Temperature 30◦ 50◦ 70◦ 26 16 10 36 15 11 31 23 18 90◦ 13 11 9
The contrast coeﬃcients in the following code were generated using Proc IML in SAS.267261 -0. run. data potency.501E-17 -0. selected at random.632456 0. model pot = temp. Fifteen samples of the antibiotic were obtained and three samples. contrast ’quadratic’ temp 0.6324555 6.
. contrast ’cubic’ temp -0. After 30 days of storage the samples were tested for potency. class temp. cards. 10 62 10 55 10 30 26 30 36 30 50 16 50 15 50 70 10 70 11 70 90 13 90 11 90 .
57 31 23 18 9
proc glm. run.316228 1.316228 0.969E-17 0. TREND ANALYSES IN ONE.5345225.1195229 -0. output out=quadmod p=p. were stored at each of ﬁve temperatures.5345225 -0. input temp pot @@.AND TWO-WAY RM DESIGNS
133
EXAMPLE: An experiment was conducted to determine the eﬀect of storage temperature on potency of an antibiotic.6.7171372 -0.

Let φ1 . MORE ON REPEATED MEASUREMENT DESIGNS
proc gplot data=quadmod.0001 <. In the following we will consider two examples: one. The scale used in the study is:
.000000 4680.300220 0. which may cause or explain the insomnia. quit.9597
Over the 30 day storage period used in this study temperature had a highly signiﬁcant eﬀect on the potency of the antibiotic (P < . . and response surface methodology in general.859329 36. will be investigated in a greater detail in a later chapter.0001 <.000000
F Value 70. axis1 label=(A=90 "Y").199609 720.199609 720.134 quit.
CHAPTER 6.300220 0. A sample b = 8 people is selected from a population of insomniacs. Over the temperature range 10◦ to 90◦ . The following is the ANOVA table that is produced by SAS:
Source temp linear quadratic cubic quartic Error Corrected Total
DF 4 1 1 1 1 10 14
Type I SS 4520.j or φ = i=1 cj µij be a polynomial contrast of the main B means or the simple B means speciﬁc to Group i. denoted by B hereafter. Example : One-way RM Design In a small pilot clinical trial dose-response drug study. a pharmaceutical company research team is concerned with the quickness that a new drug can sedate a person so that they can go to sleep.20 45.80 − 1. One may use the POLYNOMIAL transformation option of SAS to obtain trend analysis. So we will ﬁt a quadratic model using SAS’ Proc GLM.100000 3763.042866 160. respectively.859329 36. who have no medically-diagnosed physical or mental disease or symptoms. .042866 16.00
Pr > F <.
6. Personal Communication.400000 3763.27 0.1629 0. φb−1 be the b − 1 orthogonal polynomial contrasts. involves b time measurements b b at times t1 < t2 < . where φi is degree i.0001 0.05 2. A Likert scale is used to determine ease in going to sleep.60x + . The linear and quadratic terms are the only signiﬁcant trend components. . plot p*temp/ vaxis=axis1. . run. Let φ = i=1 cj µ. symbol1 c=black V=square L=1 I=spline.01x2 ˆ where y is the expected potency and x is the 30-day storage temperature. ˆ The issue of trend analysis involving more than one independent variable.400000
Mean Square 1130.2
RM Designs
Consider those cases where the within subject factor. < tb . .and two-way RM designs. .0001).1. The following is due to Michael Stoline. φ2 .63 235. potency may be described by the equation y = 71.

in the above Likert scale.7926290870 0. TITLE ’ONE-WAY RM TREND’. using the above instrument. QUIT.4543694674 B2 -.3442576419 0.5296271413 -.3263766829 -. two. REPEATED TIME 4 (1 2 4 8) POLYNOMIAL / PRINTM SUMMARY. TREND ANALYSES IN ONE.5128776445 0.0567961834
. four. and each consistently scores a value of 4 in their daily evaluation of time required to go to sleep.
DATA RM1.7951465679 B3 0. Sleep diﬃculty. A standard dosage of the new drug is administered daily to each of the eight subjects under the care of a physician. MODEL B1-B4 = / NOUNI.0466252404 -.1.AND TWO-WAY RM DESIGNS Scale 1 2 3 4 Time to go to sleep (in minutes) < 10 10 − 20 20 − 40 > 40
135
Each of the subjects in the insomniac population has chronic sleep diﬃculties. 4 2 2 1 2 2 1 1 2 2 2 2 4 3 3 2 3 2 1 1 3 1 2 1 1 1 2 1 2 2 1 2 . The goal of the study is to assess the relationship of the ease of going to sleep as a function of the length of time under medication.6.7679593549 -.1059254283 0. CARDS. and eight weeks. INPUT B1 B2 B3 B4. RUN. In particular we would like to test the signiﬁcance of the linear quadratic and cubic orthogonal trend components of sleep diﬃculty as a function of time. is measured for each patient after one. The data are given below: Subject 1 2 3 4 5 6 7 8 1 week 4 2 2 4 3 3 1 2 Sleep Diﬃculty Scale 2 weeks 4 weeks 8 weeks 2 2 1 2 1 1 2 2 2 3 3 2 2 1 1 1 2 1 1 2 1 2 1 2
The following SAS code is used to perform orthogonal trend analysis. PROC GLM DATA=RM1.3975732840 B4 0.
The related output is
------------------------------------------------------------------------------TIME_N represents the nth degree polynomial contrast for TIME M Matrix Describing Transformed Variables B1 TIME_1 TIME_2 TIME_3 -.

19791667 0.
.15625000 Mean Square 2.G H .07354488
Mean Square 0.66 Pr > F 0.36123272 -------------------------------------------------------------------------------
The linear trend is the only signiﬁcant trend (P = .38839286 F Value 5.82477209 0.6766 0.9549
The GLM Procedure Repeated Measures Analysis of Variance Analysis of Variance of Contrast Variables TIME_N represents the nth degree polynomial contrast for TIME Contrast Variable: TIME_1
Source Mean Error
DF 1 7
Type III SS 4.136
CHAPTER 6.81653226 0.55407609
Mean Square 4.81653226 2.1391
Contrast Variable: TIME_3
Source
DF
Type III SS
Mean Square
F Value
Pr > F
Mean 1 0.78
Pr > F 0.26 0.82477209 2.1764 Error 7 2.0061
Greenhouse-Geisser Epsilon Huynh-Feldt Epsilon
0.0153 0. MORE ON REPEATED MEASUREMENT DESIGNS
The GLM Procedure Repeated Measures Analysis of Variance Univariate Tests of Hypotheses for Within Subject Effects Source TIME Error(TIME) DF 3 21 Type III SS 6.50772516
F Value 9.52862903 0.29622070
F Value 2.0168
Contrast Variable: TIME_2
Source Mean Error
DF 1 7
Type III SS 0.75
Pr > F 0.95244565 0.59375000 8.0053
Source TIME Error(TIME)
Adj Pr > F G .95244565 3.F 0.0168).

The signiﬁcance of the AB interaction implies that the three levels of A must be treated separately.1.
. At the start of the study. Control may have a constant trend The following is the SAS analysis of the trend components:
OPTIONS LS=80 PS=66 NODATE.AND TWO-WAY RM DESIGNS Example : Two-way RM Design
137
Consider again the experiment involving d drugs was conducted to study each drug eﬀect on the heart rate of humans. AX23 may have a quadratic trend 2. The following table contains results from one such study. DRUG Person within drug 1 2 3 4 5 6 7 8 AX23 T1 72 78 71 72 66 74 62 69 T2 86 83 82 83 79 83 73 75 T3 81 88 81 83 77 84 78 76 T4 77 81 75 69 66 77 70 70 T1 85 82 71 83 86 85 79 83 BWW9 T2 86 86 78 88 85 82 83 84 T3 83 80 70 79 76 83 80 78 T4 80 84 75 81 76 80 81 81 T1 69 66 84 80 72 65 75 71 CONTROL T2 73 62 90 81 72 62 69 70 T3 72 67 88 77 69 65 69 65 T4 74 73 87 72 70 61 68 65
The proﬁle plot (given below) indicates that the trend patterns may be diﬀerent for all the three levels of A.6. DATA RM1. the heart rate was measured every ﬁve minutes for a total of t times. After the drug was administered. TREND ANALYSES IN ONE. BWW9 may have a cubic trend 3.
An inspection of the above proﬁle plot shows that: 1. n female human subjects were randomly assigned to each drug.

Selected output
------------------------------------.05625000 6.05625000 44.89375000 Mean Square 28.0312500 40.86 Pr > F <.04375000 Mean Square 0.
CHAPTER 6.8169643 F Value 109. TITLE1 ’HEART RATE DATA PROC SORT DATA=RM1. REPEATED B 4 POLYNOMIAL/ PRINTM SUMMARY.50625000 4. CARDS.0748
------------------------------------. MODEL Y1-Y4 = /NOUNI.0001 DF 1 7 Type III SS 28.A=2 -------------------------------------Contrast Variable: B_1 Source DF Type III SS Mean Square F Value Pr > F
.A=1 -------------------------------------B_N represents the nth degree polynomial contrast for B Contrast Variable: B_1 Source Mean Error Contrast Variable: B_2 Source Mean Error Contrast Variable: B_3 Source Mean Error DF 1 7 Type III SS 0. RUN.7187500 Mean Square 639.
PROC GLM DATA=RM1.41339286 F Value 4.138
INPUT A Y1 Y2 Y3 Y4 @@.7564 DF 1 7 Type III SS 639.86339286 F Value 0. MORE ON REPEATED MEASUREMENT DESIGNS
86 86 78 88 85 82 83 84
83 80 70 79 76 83 80 78
80 84 75 81 76 80 81 81
3 3 3 3 3 3 3 3
69 66 84 80 72 65 75 71
73 62 90 81 72 62 69 70
72 67 88 77 69 65 69 65
74 73 87 72 70 61 68 65
: TREND’.10 Pr > F 0. BY A. BY A. RUN.37 Pr > F 0.50625000 34.0312500 5. QUIT. 1 72 86 81 77 2 85 1 78 83 88 81 2 82 1 71 82 81 75 2 71 1 72 83 83 69 2 83 1 66 79 77 66 2 86 1 74 83 84 77 2 85 1 62 73 78 70 2 79 1 69 75 76 70 2 83 . CLASS A. QUIT.

0112
------------------------------------.21875000 Mean Square 0. AX23 .14375000 Mean Square 0.A=3 -------------------------------------Contrast Variable: B_1 Source Mean Error Contrast Variable: B_2 Source Mean Error Contrast Variable: B_3 Source Mean Error DF 1 7 Type III SS 0.80625000 6.0570
DF 1 7
Type III SS 2.9511 DF 1 7 Type III SS 11.quadratic 2.62 Pr > F 0.21875000
Mean Square 2.53125000 8.84910714
F Value 11.99910714
5.7332 DF 1 7 Type III SS 0.75625000 9.00 Pr > F 0.50625000 28.No trend
.1855
DF 1 7
Type III SS 79.17410714
F Value 2.6133929 F Value 0.65
Pr > F 0.6.18
0.53125000 1.AND TWO-WAY RM DESIGNS
139
Mean Error Contrast Variable: B_2 Source Mean Error Contrast Variable: B_3 Source Mean Error
1 7
51.03125000 7.03125000 54.4566
The results match our prior expectations: 1.2937500 Mean Square 11.5562500 18.94375000
Mean Square 79. Control .16
Pr > F 0. TREND ANALYSES IN ONE.99375000
51.13 Pr > F 0.74553571 F Value 0.cubic 3.1.50625000 4.80625000 47.75625000 69. BWW9 .5562500 130.02053571 F Value 0.

Tillage machinery requires large plots while fertilizers can be applied to small plots. · · · . · · · . S3 ) as a whole-plot factor and four rates of nitrogen fertilizer applied by hand (N0 . The levels of one factor are assigned at random to large experimental units within blocks of such units. The following example is taken from Milliken and Johnson. a j = 1.
where µ + τi + ij is the whole plot part of the model and βj + (τ β)ij + eijk is the subplot part. N2 . · · · . Whole S2 N0 N3 N2 N1 Plots S1 N3 N2 N0 N1
S3 N3 N2 N1 N0
S1 N2 N3 N0 N1
S3 N0 N1 N3 N2
S2 N1 N0 N3 N2
The statistical model associated with the split-plot design is i = 1. b k = 1. and r is the number of times a whole plot factor is repeated. This results in a design known as the split-plot design which is a generalization of the RCBD (or a subset of the classic two-way RM design.140
CHAPTER 6. An agronomist may be interested in the eﬀect of tillage treatments and fertilizers on yield. The large units are then divided into smaller units. There are two sizes of experimental units: whole plots are the larger units and subplots or split-plots are the smaller units. We may have unequal number of repetitions of the whole plot factor if the number of whole plots is not a multiple of a. N1 . b is the number of levels of the subplot factor. as we shall see later).
. The three methods of land preparation are applied to the whole plots in random order. One such experiment considers three methods of seedbed preparation (S1 . MORE ON REPEATED MEASUREMENT DESIGNS
6. N3 ) as the subplot factor. S2 .2
The Split-Plot Design
In some multi-factor designs that involve blocking. r
yijk = µ + τi +
ij
+ βj + (τ β)ij + eijk . We shall only consider the simplest split-plot design that involves two factors and an incomplete block. The analysis is divided into two parts: whole plot and subplot. and the levels of the second factor are assigned at random to the small units within the larger units. we may not be able to completely randomize the order of runs within each block. Notice that our design restricts the number of whole plots to be a multiple of a. Here a is the number of levels of the whole plot factor. Then each plot is divided into four subplots and the four diﬀerent fertilizers are applied in random order. The ANOVA table for this split-plot design is Source of Variation Whole Plot Analysis A Error(Whole Plot) Subplot Analysis B AB Error(Subplot) df a−1 a(r − 1) b−1 (a − 1)(b − 1) a(b − 1)(r − 1)
Notice that the ANOVA table looks like a two-way RM design ANOVA where the whole plot analysis here corresponds to the between subject analysis of the RM design and the subplot analysis corresponds to the within subject analysis of the RM design.

27853 6.2750 15. RUN.6654 11.0702 10.1413 10 4 3 3.2. /* SPLIT PLOT MJ 24. The four levels of fertilizer were randomly assigned to the four pots in each tray so that each fertilizer occurred once in each tray.5034 10.29741 5.8397 .5173 3.5034 40 8 12 10. /* MEANS OK FOR BALANCED DATA */
. /* TESTS USING TEST STATEMENT */ MEANS MOIST / LSD E = TRAY(MOIST). 20.56933 8.9157 15. The wheat seeds were planted in each pot and after 30 days the dry matter of each pot was measured. 40 8 10 12.4997 4.
OPTIONS LINESIZE=80 PAGESIZE=37. PROC GLM.1180 13.5144 12. MODEL YIELD = MOIST TRAY(MOIST) FERT MOIST*FERT / SS1.6.5144 40 8 11 12.dat” in the following format: moist fert tray yield 10 2 1 3. or 40 ml of water per pot per day to the tray.2 */ DATA EXPT.0842 10..2626 8. The levels of fertilizer were 2.97584 10 4 1 4.7348 12. 4 pots could be put into each tray. There where 48 diﬀerent peat pots and 12 plastic trays. or 8 mg per pot.3170 4.97584 5.0265 10.9081 13.7697 8.DAT’ FIRSTOBS=2.7486 3.5572 4.0444 1. THE SPLIT-PLOT DESIGN Example
141
The following data are taken from an experiment where the amount of dry matter was measured on wheat plants grown in diﬀerent levels of moisture and diﬀerent amounts of fertilizer. INFILE ’C:\ASH\S7010\SAS\SPLIT2.3776 5.7133 14. 6. /* SPLIT PLOT USING GLM */ CLASS TRAY FERT MOIST.8376 9. RANDOM TRAY(MOIST) / TEST.4367 8 5.1413 6.9575 15. /* TESTS USING RANDOM STATEMENT */ TEST H = MOIST E = TRAY(MOIST).95113 6.3170 10 4 2 4.0482 Level of Fertilizer 4 6 4.3934 7.3458 10 2 2 4.0490 5. QUIT.4373 8.0444 10 2 3 1.9419 10.6332 12. The moisture treatment consisted of adding 10.5168 13. INPUT MOIST FERT TRAY YIELD. Level of Moisture 10 Tray 1 2 3 4 5 6 7 8 9 10 11 12 2 3.2811
20
30
40
The data was placed in a ﬁle called ”split2.3654 6. The levels of moisture were randomly assigned to the trays.8393 6.3458 4.9334 10.. where the water was absorbed by the peat pots.8794 7.4730 7.2811 The following SAS code uses two ways (PROC GLM and PROC MIXED) to perform the split-plot analysis.8397 4. 4.91310 6.1372 9. 30.5129 10.

0180009 4.4064398 99.0003
Plot of LSMEAN*moist.0540027 38. RUN. RANDOM TRAY(MOIST).7521392 36.
Symbol is value of fert. CLASS TRAY FERT MOIST.142
CHAPTER 6.5 | -----------------------------------------------------------10 20 30 40 The Mixed Procedure Type 3 Tests of Fixed Effects Effect moist fert fert*moist Num DF 3 3 9 Den DF 8 24 24 F Value 26.0 | 6 | 4 | | 2 2.
/* SAVE LS MEANS DATASET */
/* INTERACTION PLOTS FROM GLM */
PROC MIXED DATA=EXPT.65 5.0019 <.0 | 6 | 4 | | 4 7.0562942 Mean Square 89.62 27. QUIT.2515182 297.0001 0.1895496 27. MODEL YIELD = MOIST | FERT. PROC PLOT DATA=LSM.6027051 Type I SS 269. RUN. LSMEANS MOIST*FERT / PDIFF STDERR OUT=LSM.5 | 6 | 8 | | 6 10.51
Pr > F <.0002 <.0 | | 8 | | 12.2284771 F Value 119. MORE ON REPEATED MEASUREMENT DESIGNS
MEANS FERT / LSD.7298499 3.4587550 0.0513405 649. QUIT.5513647 18.5 | | 2 | 8 2 4 | 2 5.62 Pr > F 0. QUIT.5 | | | | 8 15.
/* SPLIT PLOT USING MIXED */
---------------------------------------------------------------Sum of Source DF Squares Mean Square F Value Model Error Corrected Total Source moist tray(moist) fert fert*moist 23 24 47 DF 3 8 3 9 631.0003
.34 131.65 5.0001 0. RUN. PLOT LSMEAN*FERT=MOIST.
LSMEAN | 17.0001
Pr > F <.30 4.0001 0.53 131. PLOT LSMEAN*MOIST=FERT.

2. and each variety of wheat was randomly assigned to one subplot within each whole plot. The ANOVA table for this split-plot design is Source of Variation Replication A Error(Whole Plot) B AB Error(Subplot) df r−1 a−1 (a − 1)(r − 1) b−1 (a − 1)(b − 1) a(b − 1)(r − 1)
One ﬁnal note here is that there is no appropriate error term to test for signiﬁcance of replication eﬀect. Then each plot is divided into four subplots and the four diﬀerent fertilizers are applied in random order. Each whole plot was divided into two subplots. This is done (replicated) for two blocks of land. Consider once again the land preparation and fertilizers example. βj is the eﬀect of the jth block. Now a block of land is divided into three whole-plots and the three methods of land preparation are applied to the plots in random order. Example Two varieties of wheat (B) are grown in two diﬀerent fertility regimes (A).
. The following table shows the experimental plan: Block S3 N3 N2 N1 N0 I S1 N2 N3 N0 N1 S2 N0 N3 N2 N1 S1 N3 N2 N0 N1 II S3 N0 N1 N3 N2 S2 N1 N0 N3 N2
The following rearrangement of the above table shows that a split-plot design is analyzed as a classic two-way RM design with blocks treated as ”subjects”: S1 I N2 N3 N0 N1 The model for such designs is yijk = µ + τi + βj + eij } + πk + (τ π)ik +
ijk }
S2 II N3 N2 N0 N1 I N0 N3 N2 N1 II N1 N0 N3 N2 I N3 N2 N1 N0
S3 II N0 N1 N3 N2
whole plot part of the model subplot part of the model
where τi is the eﬀect of the ith level of the whole plot factor.6. and πk is the eﬀect of the kth level of the subplot factor. The ﬁeld was divided into two blocks with four whole plots. THE SPLIT-PLOT DESIGN
143
We may add grouping in the split-plot design to reduce the whole plot variability. Each of the four fertilizer levels was randomly assigned to one whole plot within a block.

4 37. YIELD = N1.8 A4 44.5500000 Mean Square 131.07 0. MORE ON REPEATED MEASUREMENT DESIGNS Block 1 Variety Fertility B1 B2 A1 35.1 */
OUTPUT.1025000 13.3091667 2.6 42.1025000 13. VAR=2.5 47.5 40 44.8 4 39.6 A3 43. OUTPUT. RUN.2500000 0.1025000 40. VAR=2.3599 0.5 40.6
OPTIONS NOCENTER PS=64 LS=76.0306
Tests of Hypotheses Using the Type III MS for BLOCK*FERT as an Error Term Source BLOCK FERT DF 1 3 Type III SS 131. DATA SPLIT. BLOCK=1. VAR=1. BLOCK=1. DROP N1--N4.9 A2 36. BLOCK=2. LSMEANS FERT | VAR.
PROC GLM.2 42.6 42.5 47.4 37. RUN. QUIT.5166667 F Value 62. YIELD = N2.7 38.25 Pr > F 0.7 41. RANDOM BLOCK BLOCK*FERT. RANDOM BLOCK BLOCK*FERT. MODEL YIELD = FERT VAR VAR*FERT. INPUT FERT N1 N2 N3 N4. PROC MIXED.4 A4 39.3966667 2. -------------------------------------------------------------Dependent Variable: YIELD Source Model Error Corrected Total Source BLOCK FERT BLOCK*FERT VAR VAR*FERT DF 11 4 15 DF 1 3 3 1 3 Sum of Squares 182. CLASS BLOCK VAR FERT.4476 0. CARDS.9 41.4 43.85 Pr > F 0.21 6. OUTPUT.5472727 2.0530 0.6 3 34.36 1. TEST H = BLOCK FERT E = BLOCK*FERT. OUTPUT.0014 0.1025000 40. SET SPLIT.144
CHAPTER 6.9275000 2.3 2 36.10 1.8 36. 1 35.77 5.8 36. CLASS BLOCK VAR FERT.6 . QUIT. Block 2 Variety Fertility B1 B2 A1 41.6 40. BLOCK=2.2 A3 34. RUN. QUIT.
/* SPLIT PLOT MJ 24.3966667 F Value 56.0914
.7 41.2500000 1. LSMEANS VAR VAR*FERT / PDIFF STDERR.80 Pr > F 0.0200000 8. YIELD = N4.3 A2 42. DATA B.0 SAS was used to analyze the data.1900000 Mean Square 131. LSMEANS FERT / PDIFF STDERR E = BLOCK*FERT.4300000 190.7 38.8612 Mean Square 16.4500000 Type III SS 131.6 40.0048 0.1075000 F Value 7. YIELD = N3. MODEL YIELD = BLOCK FERT BLOCK*FERT VAR VAR*FERT. VAR=1.1900000 6.

A slight modiﬁcation of the SAS code above will provide the necessary tests.1075000 F Value 5.3091667 2.3599 0.25 Pr > F 0.8612
BLOCK 1 FERT 3 BLOCK*FERT=Error(Whole plot) 3 VAR 1 VAR*FERT 3 Error(Subplot) 4
There is no signiﬁcant diﬀerence among fertilizers.0914 0.07 0.1025000 40.1025000 13.9275000 2.
.4300000 Mean Square 131.6.80 1.2500000 1. This is left as an exercise.2500000 0.5166667 2.5500000 8.3966667 2. them multiple comparisons would have to be carried out to determine the signiﬁcant diﬀerences.1900000 6. THE SPLIT-PLOT DESIGN
145
We may reorganize the ANOVA table in the output as
Source DF SS 131.2. If this test were signiﬁcant.

) = τ1 − τ2 2 E(Y221 ) = µ + β2 + γ1 E(Y212 ) = µ + τ2 + β1 + γ2
¯ ¯ E(Y. Thus τ1 and τ2 are the carry-over eﬀects observed in the second observation whenever b1 and b2 are observed ﬁrst. k = 1. − Y. treatment j and period k. 2. ) = 2 ¯ ¯ B : E(Y. j = 1. b1 ) Mean Period: C c1 c2 Y111 Y122 Y221 Y212 ¯ ¯ Y. respectively. ) = (β1 − β2 ) − ¯ ¯ E(Y111 − Y221 ) = β1 − β2 ¯ ¯ E(Y. The following table gives the observations: Sequence: A a1 = (b1 . b2 by subjects in sequence level 1 (a1 ) and in the order b2 . Here the repeated factor B has two levels b1 and b2 which are observed in the order b1 . period C (c1 and c2 ). We will only consider the two-period crossover design as an illustration. then τ1 − τ2 ¯ ¯ =0 A : E(Y1. Let βj be the jth treatment eﬀect. − Y.e..2.146
CHAPTER 6.. Hence.1 Y. and treatment B (b1 and b2 ) in the experiment. AB and C are confounded.. b.2 Mean ¯ Y1. 2. − Y2...3
Crossover Designs
A crossover design is a two-way RM design in which the order (A) of administration of the repeated levels of a single factor (B) is accounted for as a between subject factor in the design. b1 by subjects in sequence level 2 (a2 ) as shown below: Sequence: A a1 a2 Period: C c1 c2 b1 b2 b2 b1
There are three factors: sequence A (a1 and a2 ). and γk be the kth period eﬀect. ) = (β1 − β2 )
. − Y2.1 − Y. ¯ Y2. Further. Let Yijk be the random variable which represents the observation corresponding to sequence i. i.. A Latin square type design approach is used in the construction of the order eﬀects so that each level j of B occurs equally often as the ith observation for i = 1.. If there is no sequence eﬀect (H0 : τ1 = τ2 ). E(Y111 ) = µ + β1 + γ1 E(Y122 ) = µ + τ1 + β2 + γ2 One can show that ¯ ¯ E(Y1.. let τ1 be the eﬀect on the second observation if b1 is observed ﬁrst and τ2 be the eﬀect on the second observation if b2 is observed ﬁrst. MORE ON REPEATED MEASUREMENT DESIGNS
6.
The model incorporates carry-over eﬀects in the second observations but not in the observations made ﬁrst.2.1. · · · .1.2 ) = E
τ1 − τ2 2 = (τ1 − τ2 ) − τ1 + τ2 2
Y212 − Y221 Y111 − Y122 − 2 2
The last expression shows that the AB and C eﬀects are identical.. a crossover design is a basic one-way RM design on the levels of B that has been analyzed as a two way RM design to account for the order or sequence eﬀect caused by the fact that the B levels are given one at a time in diﬀerent orders.. b2 ) a2 = (b2 .

− y. . − Y. and n1 are randomly assigned to sequence 1 while the remaining n2 are assigned to sequence 2.1 − µ. n1 1 . b1 )
One can immediately observe that this is a classic two-way RM layout discussed in Chapter 5. . .2 is (¯.2. An α level test rejects H0 if FB > F1.n1 +n2 −2 (α) using SAS Type III SS. ) ± tn1 +n2 −2 (α/2) y ¯ where y. ¯ y ¯ Step 2B: Test for B main eﬀects using µ11 − µ22 The standard error of the estimator of µ11 − µ22 is se(¯11. The data layout is Sequence : A a1 = (b1 . .1. Now assume that there are n1 + n2 subjects available. . Step 2A: Test for B main eﬀects using µ. − y22. the B main eﬀect is a valid test if there is no sequence eﬀect. 2. b2 ) Subject 1 . .1.1. . E(Y. In this case the test H0 : µ111 = µ221 of the simple B eﬀects is a valid test of H0 : β1 = β2 ¯ ¯ since E(Y111 − Y221 ) = β1 − β2 even when H0 : τ1 = τ2 is not true. The ANOVA table is
Source of variation A : Sequence Subjects within A B AB B× Subjects within A df 1 n1 + n2 − 2 1 1 n1 + n2 − 2 MS M SA M S1 M SB M SAB M S2 F FA = M SA /M S1 FB = M SB /M S2 FAB = M SAB /M S2
The recommended analysis strategy is Step 1: Test for sequence eﬀects H0 : µ1. Otherwise. . then go to Step 2B. the unweighted mean. y11n1 y211 . n2 Treatment : B b1 b2 y111 y121 . ) = y ¯ with the approximate degrees of freedom dfw = (n1 + n2 − 2)(M S1 + M S2 )2 (M S1 )2 + (M S2 )2 M S 1 + M S2 2 1 1 + n1 n2 M S2 2 1 1 + n1 n2
. using FA .6. then go to Step 2A. y21n2 y12n1 y221 . • If FA is signiﬁcant.2. = (¯11. . . . ) is a biased estimate of β1 −β2 . = µ2. CROSSOVER DESIGNS
147
¯ ¯ Thus. A 100(1 − α)% conﬁdence interval for µ.2 1. . • If FA is not signiﬁcant. )/2. .1 − µ. y22n2
a2 = (b2 . + y21.3.

6 0 2 4 -. Y = A.3 -1.8 3.9 3.2
28 -2.3 0. − y22. pp 467-480.9 . RUN.1 1 5 . MODEL Y = SEQ PERSON(SEQ) TRT TRT*SEQ. Y = B. CLASS TRT SEQ PERSON. − y22.6 1.9 0.6 0.6
Subject 24 25 -0. DATA CROSS. OUTPUT.8 .0
Total -9.0 -1.Biometrics . MORE ON REPEATED MEASUREMENT DESIGNS 1.5333
Period 1 2 Total
Treatment B A
21 1.4
26 -2.3000 .9 2 7 -.0 1. ) y ¯ y ¯
Example The following data are taken from Grizzle .2 12 0.4 1 6 1.6 0. OUTPUT.9 -2.
Period 1 2 Total Treatment A B 11 0.3 -1.1965.0 Mean .7 Total 1.0 -1. An α-level test of H0 : µ11 = µ22 rejects H0 if |¯11. PROC GLM. RANDOM PERSON(SEQ) / TEST.9 2 8 .3 -2.7 15 0.2 2. QUIT.8 -0.3 2 2 1 -2.0 0.7 -0.3
23 0. DATA CROSS2. 1 1 .7 -1.9 -0. SET CROSS.2
22 -2.4 2 6 1.1 -0.2 2 1 .3 2 3 .4 -0.3 1.7 -2. − y22.3 0. | y ¯ > tdfw (α/2) se(¯11.9 -2.3 -.8 2 5 -1 -.5 -6. TRT = ’B’.8 0.2 1 1 2 0 -.0 -0. ) ± tdfw (α/2)se(¯11.7 Subject 13 14 -0. ) y ¯ 2.2
27 -1.2 1. ------------------------------------------------------
. The responses are diﬀerences between pre-treatment and post-treatment hemoglobin levels. QUIT.7 16 1. RUN.7 1 3 -. LSMEANS SEQ / E=PERSON(SEQ).9 2. DROP A B.148
CHAPTER 6.1 -1.3 .5 1.2375 0.4 0. CARDS.2 1.9 1.2 5.2 1 4 .4
Mean -1.4375
The SAS code and partial output are given below
OPTIONS LINESIZE=80 PAGESIZE=66.5 1.9 1. TEST H=SEQ E=PERSON(SEQ). TRT = ’A’. LSMEANS TRT TRT*SEQ. − y22.6 1. INPUT SEQ PERSON A B. A 100(1 − α)% conﬁdence interval for µ11 − µ22 is (¯11.

α = .006667 1.3. 6 8 2
12(1 + 1.0538
Least Squares Means TRT A B TRT A A B B Y LSMEAN 0.572
.35208333 SEQ 1 2 1 2 Y LSMEAN 0. = −1.0538).573333 4.6.23750000
Using SAS Type III analysis we get the following summary for α = .2.56297619 6.00055556 3.05 : Sequence eﬀects are not signiﬁcant (p-value = .96583333 14. CROSSOVER DESIGNS
149
Dependent Variable: Y Source Model Error Corrected Total Source SEQ PERSON(SEQ) TRT TRT*SEQ DF 15 12 27 DF 1 12 1 1 Sum of Squares 27.80 2.6446 0. there are no treatment ¯ eﬀects at α = . − y22.40000000 Pr > F 0.000556 Error: MS(PERSON(SEQ)) * This test assumes one or more other fixed effects are zero.69 .05 and α = .1165 0.50 Pr > F 0.53333333 -1. Therefore.3521.3688 ¯ and y. The diﬀerence is not signiﬁcant (p-value = .05 level of signiﬁcance. Analyze treatment eﬀects using y.2375) = 2. ) = y ¯ and dfw = Therefore.245)2
.2435
Tests of Hypotheses for Mixed Model Analysis of Variance Dependent Variable: Y Source DF Type III SS Mean Square F Value SEQ 1 4. = −.245)2 = 23.43750000 0.2375.573333 4.24297619 Mean Square 4.41666667 -0.572 . = .36875000 -0.10 (for illustrative purposes): α = . (1)2 + (1. * Least Squares Means SEQ 1 2 Y LSMEAN 0.56297619 6.24534722 F Value 1.0538).0449 Mean Square 1.30000000 0.94416667 42.86438889 1. We have ¯ ¯ se(¯11.7 ≈ 24 .67 0. Analyze treatment eﬀects at time period 1 using y11.24297619 F Value 3.1.57 Error 12 12.245 + = 0.57333333 1.0794 0.3000 − (−1.91000000 Type III SS 4.57333333 12.000 + 1.01 Pr > F 0.3000 and y22. = .86 5. the t-statistic is 1 1 1.1165).10 : Sequence eﬀects are signiﬁcant (p-value = .00666667 3.

0128.150
CHAPTER 6. MORE ON REPEATED MEASUREMENT DESIGNS which has a two-sided p-value of 0.10 level of signiﬁcance. the diﬀerence is signiﬁcant at α = .
. Thus.10. Therefore. there are treatment eﬀects at α = .

2 • (γS)kl ∼ N (0.
• ejkl ∼ N (0. ybcn
The statistical model for such designs is yjkl = µ + βj + γk + (βγ)jk + Sl + (βS)jl + (γS)kl + (βγS)jkl + ejkl for j = 1. . . . d2 σ2 ) = Bj by subject l interaction. σ 2 ). . . d3 σ3 ) = Ck by subject l interaction. n C1 y111 . Random Eﬀects :
2 • Sl ∼ N (0. One should notice that the ejkl are confounded with (βγS)jkl and hence cannot be estimated. .4
Two-Way Repeated Measurement Designs with Repeated Measures on Both Factors
Consider a design where factors B and C are within subject factors with levels b and c. Source Between Subjects Subjects (S) Within Subjects B B×S C C ×S BC BC × S df n−1 b−1 (b − 1)(n − 1) c−1 (c − 1)(n − 1) (b − 1)(c − 1) (b − 1)(c − 1)(n − 1) MS M SS M Sb M S1 M Sc M S2 M Sbc M S3 E(M S)
2 σ 2 + bcσ1 nc 2 2 βj σ 2 + cσ2 + b−1 2 σ 2 + cσ2 nb 2 2 2 σ + bσ3 + c−1 γk 2 2 σ + bσ3 n 2 σ 2 + σ4 + (b−1)(c−1) 2 2 σ + σ4
(βγ)jk
The following ANOVA table gives the F tests to test for main eﬀects:
. Assume that a random sample of n subjects are observed on each of the bc crossed levels of B and C. c. . where Fixed Eﬀects : • βj and γk ate the Bj and Ck main eﬀects and • (βγ)jk is the Bj Ck interaction eﬀect. .6. . The following table contains the expected MS for the two-way RM design with repeated measures on both factors. yb1n Bb ··· Cc ybc1 . n. . y11n B1 ··· Cc y1c1 . . . Treatments (B) Subjects 1 . . . σ1 ) = subject l eﬀect 2 • (βS)jl ∼ N (0. d4 σ4 ) = Bj by Ck by subject l interaction. . and l = 1. . . . TWO-WAY REPEATED MEASUREMENT DESIGNS WITH REPEATED MEASURES ON BOTH FACTORS151
6. respectively. We assume that the random eﬀects interactions satisfy some (CS) conditions while all other random variables are assumed to be independent. The following table gives the data layout of such designs.4. . . . . y1cn ··· C1 yb11 . b. These are used in the construction of F tests. k = 1. 2 • (βγS)jkl ∼ N (0.

− yj ¯ ¯
k.. and mother. − yj ¯ ¯
. and Fbc need to be checked.
Standard Error
2M S1 cn 2M S2 bn 2 (df1 M S1 +df3 M S3 ) n (df1 +df3 2 (df2 M S2 +df3 M S3 ) n (df2 +df3
df df1 df2 df4 df5
y. father. The data were obtained for seven families. The following is part of an example taken from Milliken and Johnson. then G-G e-adjusted F tests followed by paired t tests for mean comparisons need to be performed. − yjk . If the (S) condition is not satisﬁed. C c.. − y. I. The attitiudes of families were measured every six months for three time periods.
µjk − µjk
yjk. ¯ ¯ yjk. ¯ ¯
The degrees of freedoms for the simple main eﬀects are approximated using Satterthwaite approximation formulae: (df1 M S1 + df3 M S3 )2 df4 = df1 (M S1 )2 + df3 (M S3 )2 df5 = (df2 M S2 + df3 M S3 )2 df2 (M S2 )2 + df3 (M S3 )2
Three sphericity conditions corresponding to Fb . Vol. REPEATED B b. each family consisting of a son. MORE ON REPEATED MEASUREMENT DESIGNS df n−1 b−1 (b − 1)(n − 1) = df1 c−1 (c − 1)(n − 1) = df2 (b − 1)(c − 1) (b − 1)(c − 1)(n − 1) = df3 MS M SS M Sb M S1 M Sc M Sbc M S3 Fb = M Sb /M S1 Fc = M Sc /M S2 Fbc = M Sbc /M S3 F
Mean comparisons are made using the following standard errors: Parameter Main Eﬀects βj − βj γk − γk Simple Main Eﬀects µjk − µj
k
Estimate yj.k. The data are given as follows: Son T2 11 13 13 18 14 6 17 Person Father T1 T2 T3 18 19 22 18 19 22 19 18 22 23 23 26 15 15 19 15 16 19 17 17 21 Mother T2 T3 16 19 16 19 16 20 22 26 17 20 19 21 20 23
Family 1 2 3 4 5 6 7
T1 12 13 12 18 15 6 16
T3 14 17 16 21 16 10 18
T1 16 16 17 23 17 18 18
. The generic SAS Proc GLM code for analyzing two-way repeated measurement designs with repeated measures on both subjects is PROC GLM..152 Source Between Subjects Subjects (S) Within Subjects B B×S C C ×S BC BC × S
CHAPTER 6. Fc . MODEL Y11 Y12 . Ybc = / NOUNI. The Analysis of Messy Data..k .

QUIT. END. OUTPUT. DO I=1 TO 9.3) = 0) THEN DO. PROC GLM DATA = RM2.3). SET RM. LSMEANS C/PDIFF E=C*S. S = _N_. MODEL Y = B|C|S. TWO-WAY REPEATED MEASUREMENT DESIGNS WITH REPEATED MEASURES ON BOTH FACTORS153 The SAS analysis of the data is given as follows:
DATA RM.4. CLASS S B C. LSMEANS B/ PDIFF E=B*S. END. IF (MOD(I. QUIT. REPEATED B 3. Y = Z[I]. INPUT Y1-Y9. QUIT.
Some selected output from a run of the above program is:
Sphericity Tests(B)
. LSMEANS B*C/PDIFF E=B*C*S. END. RUN. B = ROUND(I/3). MODEL Y1--Y9 = / NOUNI. ELSE DO. OUTPUT. ARRAY Z Y1--Y9. C = MOD(I. DROP Y1-Y9. B = FLOOR(I/3) + 1. RUN. RUN. CARDS.
DATA RM2.
19 19 18 23 15 16 17
22 22 22 26 19 19 21
16 16 17 23 17 18 18
16 16 16 22 17 19 20
19 19 20 26 20 21 23
PROC GLM. C = ROUND(I/B).6. TEST H=C E=C*S. TEST H=B*C E=B*C*S. C 3/ PRINTE NOM. 12 11 14 18 13 13 17 18 12 13 16 19 18 18 21 23 15 14 16 15 6 6 10 15 16 17 18 17 . TEST H=B E=B*S.

F 0.0001
Adj Pr > F G .7451 0.8660974
Variables Transformed Variates Orthogonal Components
DF 2 2
Chi-Square 1.159756 10.0007
Source B Error(B)
DF 2 12
Type III SS 350.947328 0.2089947
F Value 14.0508148 0.G H .4285714 0.2804233
F Value 258.8571429 3.1555 0.28
Pr > F <. MORE ON REPEATED MEASUREMENT DESIGNS
Mauchly’s Criterion 0.77777778
Mean Square 0.3809524 146.G H .33333333 0.7187896
Pr > ChiSq 0.F <.2705497 2.0936229
Pr > ChiSq 0.0012 0.40740741
F Value 0.6961315 0.0001
Greenhouse-Geisser Epsilon Huynh-Feldt Epsilon
0.154
CHAPTER 6.5262
Adj Pr > F G .3188
The GLM Procedure Repeated Measures Analysis of Variance Univariate Tests of Hypotheses for Within Subject Effects Adj Pr > F G .5079365
Mean Square 175.8735 0.33333333 9.8819 1.F 0.3511
Sphericity Tests(BC) Mauchly’s Criterion 0.0948413
Variables Transformed Variates Orthogonal Components
DF 9 9
Chi-Square 13.9348
Source B*C Error(B*C)
DF 4 24
Type III SS 1.6981
Sphericity Tests(C) Mauchly’s Criterion 0.403681
Pr > ChiSq 0.5237
.G H .3650794
Mean Square 72.2212
Source C Error(C)
DF 2 12
Type III SS 144.82
Pr > F 0.4780 0.1904762 12.4043 0.35
Pr > F 0.0001 <.6578854
Variables Transformed Variates Orthogonal Components
DF 2 2
Chi-Square 0.8110836 0.0007
Greenhouse-Geisser Epsilon Huynh-Feldt Epsilon
0.

5714286
LSMEAN Number 1 2 3
Least Squares Means for effect C Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: Y i/j 1 2 3 1 2 0.0000000
LSMEAN Number 1 2 3
Least Squares Means for effect B Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: Y i/j 1 2 3 1 2 0.4285714 19.1904762 19.1428571 13.4.2857143 16.6.1428571 16.3992 <.0001 <.0952381 19.5714286 17.8571429 18.0000000 21.8627 3 0.0005 0.1428571 21.0001
B 1 1 1 2 2 2 3 3 3
C 1 2 3 1 2 3 1 2 3
Y LSMEAN 13.8571429 18.1428571
LSMEAN Number 1 2 3 4 5 6 7 8 9
.8627
C 1 2 3
Y LSMEAN 16.0007 0.0001 <.0000000 17.0001 3 <.0005 0.3992 0.0007 0. TWO-WAY REPEATED MEASUREMENT DESIGNS WITH REPEATED MEASURES ON BOTH FACTORS155
B 1 2 3
Y LSMEAN 14.

0001) 7.0001 <.6791 <.0001 8 <.0001 0.0001 0. The B(Person) main eﬀect is signiﬁcant (P = .0001 1.0001 <. Comparison of B(Person) means B1 14.0001 <.2212 <.0001 <.4106 <.0001 <.0001 <.4106 <.0 19.5262).0001 <. Comparison of C(Time) means C1 C2 16.0001 0. 3.0001 <.0001 <.0001 <. and BC.2 --------------
6.0001 0.0001 3 <.0001 0.6791 <.0001 0.6791 0.0000 <.0001 <.6791 <. The (S) condition is satisﬁed for all three tests B. The BC interaction is not signiﬁcant (P = 0.0001 <.0001 <.0001 <.0001 0.0001 <.0000 0.0001 1.0001 <.0001 <.0001 <.6791 <.0001
The following is a summary of the results: 1.0001 6 <.0001 <.0001 <.0001 <.0001 <.0001 <.0007) 5.0001 <.0000 0.4 -------------C3 19.2212 7 <.0001 <.6791 <.0001 <.0001 9 <.0001 <.5262). C.0001 <.0001 <.0001 <.6
.0001 <.156
CHAPTER 6.0001 <.4106 <.0000 1.0001 5 <.4106 0.0001 0.0001 4 <. 2.0001 <. MORE ON REPEATED MEASUREMENT DESIGNS
Least Squares Means for effect B*C Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: Y i/j 1 2 3 4 5 6 7 8 9 1 2 1.0001 <. The C(Time) main eﬀect is signiﬁcant (P < .0001 <.1 B3 B2 19. 4.0001 <.0001 <.0001 <.3 16. The BC(Person by Time) interaction is not signiﬁcant (P = 0.0001 <.

· · · . f (x) = β0 + β1 x . suppose we have n experimental units giving rise to observations (x1 . β1 ) =
i=1
ε2 = i
i=1
(yi − β0 − β1 xi )2 . ˆ Using diﬀerential calculus. yi − yi .
Thus the observation yi is estimated by the ﬁtted value ˆ ˆ yi = β0 + β1 xi . y = f (x) + ε.
7. y2 ).Chapter 7
Introduction to the Analysis of Covariance
Analysis of covariance (ANCOVA) methods combine regression and ANOVA techniques to investigate the relationship of a response variable with a set of ’treatments’ as well as other additional ’background’ variables. In a data setting. we get the LS estimators as 157
. 2. yn ). 2. n . n . the method of least squares gives the least possible sum of squared residuals. y1 ). Our goal is to estimate and make inferences about the unknown regression coeﬃcients. that is. ˆ In other words.1
Simple Linear Regression
Let y be a measured response variable that is believed to depend on a predictor x up to a random error. β0 and β1 . In this chapter we will focus our attention to situations where y depends on x in a linear fashion. i = 1.1
Estimation : The Method of Least Squares
n n
ˆ ˆ The least squares (LS) estimators β0 and β1 of β0 and β1 . respectively. yi = β0 + β1 xi + εi . where β0 is the intercept and β1 is the slope.1. We will assume that the random errors. · · · . . · · · . i = 1. are the values of β0 and β1 that minimize L(β0 . are independent random variables with mean 0 and constant variance σ 2 . (x2 . In a data setting. Then our model becomes yi = f (xi ) + εi . that is.
7. εi . (xn .
where β0 and β1 are unknown.

we are interested in testing H0 : β1 = 0 HA : β1 = 0 The rejection of H0 indicates that there is a signiﬁcant linear relationship between x and y. and β0 = y − β1 x . the total sum of squares (yi − y )2 partitions into smaller variabilities: the ¯ variability in the response explained by the regression model and the unexplained variability.3
Tests of Hypotheses
One very important question is ”Does x truly inﬂuence y?”.158
CHAPTER 7. however. SST = SSR + SSE .1.
7. Here SSR represents the sum of squares due to regression while SST and SSE are the familiar SS’s due to total and error.e. ¯ ¯ Sxx where Sxy =
i=1 n n
(xi − x)(yi − y ) and Sxx = ¯ ¯
i=1
(xi − x)2 . Example Let X be the length (cm) of a laboratory mouse and let Y be its weight (gm).. ¯
7. INTRODUCTION TO THE ANALYSIS OF COVARIANCE
Sxy ˆ ˆ β1 = . imply that the model is ”good”. Using the partition of the total variability. X 16 15 20 13 15 Y 32 26 40 27 30
. Since this is under an assumption of a linear relationship. It does not. another way to pose the question will be ”Is there a signiﬁcant linear relationship between x and y?” In terms of statistical hypotheses. This is done as
n n n
(yi − y )2 = ¯
i=1 i=1
(ˆi − y )2 + y ¯
i=1
(yi − y )2 ˆ
i.2
Partitioning the Total SS
Similar to ANOVA models. Consider the data for X and Y given below. As before the degrees of freedom for SST partitions into regression df and error df as n − 1 = 1 + (n − 2) .1. one may set up an ANOVA table as follows: Source Regression Residual Total df 1 n−2 n−1 SS SSR SSE SST MS M SR = SSR /1 M SE = SSE /(n − 2) F F = M SR /M SE
We reject the null if the F -test is signiﬁcant at the desired level of signiﬁcance. respectively.

TITLE1 ’SCATTER PLOT OF Y VS X OVERLAYED WITH REGRESSION FIT’. 16 15 20 13 15 17 16 21 22 23 24 18 .1.
Selected output from SAS is given below:
. OUTPUT OUT=REGOUT R=R P=P. TITLE1 ’RESIDUAL PLOT’. MODEL Y = X.
PROC GLM.
32 26 40 27 30 38 34 43 64 45 46 39
SYMBOL V=CIRCLE I=NONE. QUIT.7. INPUT X Y. PROC GPLOT DATA=SLR.
DATA SLR. QUIT. PLOT R*P. SYMBOL V=CIRCLE I=R. PLOT Y*X. QUIT. SIMPLE LINEAR REGRESSION 17 16 21 22 23 24 18 38 34 43 64 45 46 39
159
The following SAS code is used to perform the simple linear regression. CARDS. RUN. PROC GPLOT. RUN. TITLE1 ’SCATTER PLOT OF Y VS X’. RUN. QUIT. PROC GPLOT DATA=REGOUT. PLOT Y*X. TITLE1 ’REGRESSION OF Y ON X’. RUN.

666667
Sum of Mean Square 813.0009
R-Square 0.171733
Y Mean 38.7638231
Mean Square 813.7638231
F Value 21.428909953 2.96138
Root MSE 6.405213270
Error 9.36
Pr > F 0.0009
.681164
Coeff Var 15.62 0.763823 380.70503506 0.090284
F Value 21.0009
Parameter Intercept X
Estimate -5.902844 1194.7638231
Mean Square 813.36
Pr > F 0.0009
Source X
DF 1
Type III SS 813.66667
Source X
DF 1
Type I SS 813.7638231
F Value 21. INTRODUCTION TO THE ANALYSIS OF COVARIANCE
Source Model Error Corrected Total
DF 1 10 11
Squares 813.56 0.160
Dependent Variable: Y
CHAPTER 7.36
Pr > F 0.763823 38.5882 4.52036911
Standard t Value Pr > |t| -0.

0009).1.7. SIMPLE LINEAR REGRESSION
161
The ﬁtted regression line is y = −5. one observes that the relationship between length and weight is an increasing relationship.43 + 2.41x The length of a mouse is signiﬁcantly linearly related to the weight of a mouse (P = 0. From the scatter plot and the ﬁtted line.
.

. A covariate is a variable that is thought to have an eﬀect on the response. i. Xij . ) is the adjusted estimator ¯ ˆ ¯ ˆ is the least squares estimator of the common slope parameter. − X.. Y2n2 X2n2 ··· k Y Yk1 Yk2 Yknk X Xk1 Xk2 Xknk
. The P -value corresponding to A ∗ X is the P -value for the test of interest.2
Single Factor Designs with One Covariate
Analysis of Covariance (ANCOVA) unites analysis of variance (ANOVA) and regression.e. in the ANCOVA model ˆ ¯ ˆ ¯ ¯ ¯ ¯ yi. . Let A denote the factor of interest. where β An essential assumption in the ANCOVA is that there is a common slope parameter. CLASS A. Test whether the covariate X is important: (a) Assuming heterogeneous slopes. The ANCOVA model which includes the covariate in a linear model is ¯ Yij = µ + τi + β(Xij − X. is an unbiased estimator of µi + β(Xi. suppose that for each experimental a covariate. Thus. INTRODUCTION TO THE ANALYSIS OF COVARIANCE
7.. H0 : β = 0. i. The following is the analysis strategy to follow: 1. then these lines would be parallel to each other. µi. . − β(Xi. The data layout for a single factor ANCOVA model is 1 Y Y11 Y12 Y1n1 X X11 X12 X1n1 Treatment 2 ··· Y X ··· Y21 X21 · · · X22 · · · Y22 .adj = yi.
. we test H0 : β1 = β2 = · · · = βk = 0 . ). The homogeneity of the group slope parameters is an assumption that needs to be tested. of µi in the ANCOVA model. In the one-way ANOVA model. β. . H0 : β1 = β2 = · · · = βk . . . .
In the one-way ANOVA model. CLASS A. − X.162
CHAPTER 7. . Yij . (b) Assuming homogeneous slopes. is measured along with the response variable. In SAS PROC GLM. However.. This assumption is known as the parallelism assumption. then we need a follow up test of the importance of the covariate variable as a predictor. MODEL Y = A A*X /NOINT. . ) +
ij
ij
where µi = µ + τi is the mean of treatment i. β. we test H0 : β = 0 . . In SAS PROC GLM. µi is unbiasedly estimated by µi = yi.e. MODEL Y = A X.
.. If this hypothesis is true. ) + This model may be written as ¯ Yij = µi + β(Xij − X. This says that if we were to ﬁt regression lines for each one of the k groups independently. .

SINGLE FACTOR DESIGNS WITH ONE COVARIATE
163
If both are non-signiﬁcant. ) +
ij
and tests H0 : τ1 = τ2 = · · · = τk = 0. MODEL Y = A X A*X. in the preceding SAS code. It is given along with a partial output. The variables in the study are • Drug . 2.a pre-treatment score of leprosy bacilli • PostTreatment . The following is the SAS code used to analyze the data. DATA DRUGTEST. and six sites on each patient are measured for leprosy bacilli.7. CLASS A. then the covariate X is not needed. the we use the ANCOVA model.2. Example Ten patients are selected for each treatment (Drug).two antibiotics (A and D) and a control (F) • PreTreatment . Using SAS PROC GLM. The following example is adapted from Snedecor and Cochran (1967) (See also SAS Online Documentation).a post-treatment score of leprosy bacilli The covariate (a pretreatment score) is included in the model for increased precision in determining the eﬀect of drug treatments on the posttreatment count of bacilli. If either one of the tests is signiﬁcant. (a) If the test is signiﬁcant. Follow-up pairwise comparisons (H0 : τi = τj ) may be performed by including LSMEANS A/PDIFF. This will be addressed in a later section. A 11 6 A 8 0 A 6 4 A 10 13 D 6 0 D 6 2 POST @@. CLASS A. then we perform the ANCOVA analysis.. Test whether there is a common slope: We test H0 : β1 = β2 = · · · = βk using PROC GLM. Go to Step 2. input DRUG $ PRE CARDS. The P -value of interest corresponds to the A ∗ X term. A A D 5 6 7 2 1 3 A 14 A 11 D 8 8 8 1 A 19 11 A 3 0 D 18 18
. (b) If the test is not signiﬁcant. then we follow a Johnson-Neyman analysis. MODEL Y = A X. ﬁts the classic ANCOVA model ¯ Yij = µ + τi + β(Xij − X.

MODEL POST = DRUG DRUG*PRE / NOINT. RUN.01 Pr > F 0.8000000 577. LSMEANS DRUG/ STDERR PDIFF TDIFF.564915 F Value 8.900000 597. CLASS DRUG.8223226 16.
--------------------------------------------------------------------------Dependent Variable: POST Source DRUG PRE*DRUG Error Total DF 3 3 24 30 Sum of Squares 2165.8974030 Mean Square 146.542048 397.59 Pr > F 0.5606
Dependent Variable: POST Source DRUG PRE DF 2 1 Sum of Squares 293.180683 16. MODEL POST = DRUG PRE DRUG*PRE.58 12. MODEL POST = DRUG PRE.15 36.6000000 577. INTRODUCTION TO THE ANALYSIS OF COVARIANCE D 8 9 F 11 18 F 12 16 D F F 5 9 7 1 5 1 D 15 9 F 21 23 F 12 20
PROC GLM.966667 199.557952 1288.6000000 577. QUIT.86 34.0001 <. PROC GLM.164 D 8 4 F 16 13 F 16 12 . RUN.000000 Mean Square 721. PROC GLM.557952 3161. MEANS DRUG.700000 Mean Square 146.0001
Dependent Variable: POST Source DRUG PRE PRE*DRUG Error Total DF 2 1 2 24 29 Sum of Squares 293. QUIT.8974030 F Value 9. CLASS DRUG. QUIT.6446451 397.0013 <.0010 <.8974030 9.0001
.02 Pr > F <. CLASS DRUG.0001 0. RUN. D 19 14 F 13 10 F 12 5
CHAPTER 7.89 0.8974030 19.564915 F Value 43.8000000 577.

P < 0. P < 0.0001).1611017
Pr > |t| <.0001). the covariate is important and needs to be included in the model.3000000 10. H0 : τ1 = τ2 = τ3 = 0. is signiﬁcant (F = 9. (b) The hypothesis H0 : β = 0 is rejected (F = 36.3159234 4. SINGLE FACTOR DESIGNS WITH ONE COVARIATE Error Total 26 29 417.05. None of the pairwise diﬀerences is signiﬁcant at α = .95671019
DRUG A D F
POST LSMEAN 6.7.202597 1288. 2.0793 -1.14998057
-------------PRE------------Mean Std Dev 9.2.5606).64399254 6.1000000 12.2884943 1.02.01.0001 <.0001 <.0793 1 2 -0.0835
1.700000 16. P = 0.8239348 10. Do we have a common slope? We fail to reject the hypothesis H0 : β1 = β2 = β3 (F = 0.046254
165
Level of DRUG A D F
N 10 10 10
-------------POST-----------Mean Std Dev 5. P = 0.2724690 1.76211904 5.9000000 LSMEAN Number 1 2 3 4. The test for treatment eﬀects.3000000 Standard Error 1.0010).060704 0.80011 0.82646 0. Here is a summary of the results: 1.
.800112 0.3000000 6. the assumption of a common slope is a valid assumption.24933858 3.0000000 12.15449249 7.15.0001
Least Squares Means for Effect DRUG t for H0: LSMean(i)=LSMean(j) / Pr > |t| Dependent Variable: POST i/j 1 2 3 0. Thus.0835
An observation is that the results of MEANS and LSMEANS are quite diﬀerent. Is the covariate important? (a) The hypothesis H0 : β1 = β2 = β3 = 0 is rejected (F = 12.826465 0.9521 1. Thus.0607 0.59.9521 3 -1.7149635 6. LSMEANS gives the adjusted means while MEANS gives the raw means.

· · · . MODEL Y = A B A*X /NOINT. The P -value corresponding to A ∗ X is the P -value for the test of interest. j = 1. we test H0 : β1 = β2 = · · · = βk = 0 . In SAS PROC GLM.. The statistical model for the RCBD ANCOVA is ¯ Yij = µ + τi + γj + β(Xij − X. Using SAS PROC GLM. The P -value of interest corresponds to the A ∗ X term. CLASS A. MODEL Y = A B X. 2. Test whether there is a common slope: We test H0 : β1 = β2 = · · · = βk using PROC GLM. ) +
ij
where i = 1. n.166
CHAPTER 7. CLASS A. INTRODUCTION TO THE ANALYSIS OF COVARIANCE
7. MODEL Y = A B X A*X. If either one of the tests is signiﬁcant. k. Let A denote the factor of interest and B be the blocking factor. MODEL Y = A B X. CLASS A. (b) Assuming homogeneous slopes. then we perform the ANCOVA analysis. In SAS PROC GLM. Go to Step 2. the we use the ANCOVA model. we test H0 : β = 0 . The following is the analysis strategy to follow: 1. (b) If the test is not signiﬁcant. The analysis of such a design proceeds in the same manner as the single factor design considered above. and γj is the eﬀect of the jth block of n blocks. If both are non-signiﬁcant.
. · · · .3
ANCOVA in Randomized Complete Block Designs
We will consider the case where a single covariate is observed along with the response in a RCBD. then the covariate X is not needed. (a) If the test is signiﬁcant. then we follow a Johnson-Neyman type analysis. Test whether the covariate X is important: (a) Assuming heterogeneous slopes. CLASS A.

ANCOVA IN RANDOMIZED COMPLETE BLOCK DESIGNS ﬁts the RCBD ANCOVA model ¯ Yij = µ + τi + γj + β(Xij − X. ) +
ij
167
and tests H0 : τ1 = τ2 = · · · = τk = 0. $ B Y X. in the preceding SAS code. A 1 A 2 A 3 A 4 B 1 B 2 B 3 B 4 C 1 C 2 C 3 C 4 . The variables of interest are • X = yield of a plot in a previous year • Y = yield on the same plot for the experimental year The data are as follows: Block 1 X Y 2 X Y X Y X Y Varieties A B C 54 51 57 64 65 72 62 68 51 54 53 62 64 69 47 60 50 66 60 70 46 57 41 61
3
4 The SAS analysis of the data is as follows: DATA CROP. Follow-up pairwise comparisons (H0 : τi = τj ) may be performed by including LSMEANS A/PDIFF.3. INPUT A CARDS. The following example is taken from Wishart (1949). Example Yields for 3 varieties of a certain crop in a randomized complete block design with 4 blocks are considered.7.. 64 68 54 62 65 69 60 66 72 70 57 61 54 62 51 53 51 64 47 50 57 60 46 41
.

00000 16392. CLASS A B.6046512 8. LSMEANS A/ STDERR PDIFF TDIFF. RUN. QUIT.6046512 17. MODEL Y = A B A*X/ NOINT.57 12.0000000 Mean Square 12.6046512 4.0057 0.6046512 23.6790698 F Value 2.07228 7.0000000 24. QUIT.26 Pr > F 0.00000 84.95 5.
----------------------------------------------------------------------Dependent Variable: Y Sum of Source DF Squares Mean Square F Value Pr > F A 3 49176. MODEL Y = A B X A*X.168
CHAPTER 7.8060976 1. CLASS A B.0042 0.57 0.0000000 24.0000000 Mean Square 12.1229
Dependent Variable: Y Source A B X Error Total DF 2 3 1 5 11 Sum of Squares 24.0375 0.78315 1.57 Pr > F 0.9277178 F Value 6.00000 8503.32 <.22 43. PROC GLM.0057 X*A 3 42. INTRODUCTION TO THE ANALYSIS OF COVARIANCE
PROC GLM.30 0.1712 0.0000000 84.0000000 24.00000
Dependent Variable: Y Source A B X X*A Error Total DF 2 3 1 2 3 11 Sum of Squares 24. PROC GLM.7831535 324.0684 Error 3 5. MODEL Y = A B X. RUN.56 17.0000000 252.6121953 5.21685 14.76 4.0001 B 3 252.0000000 84.0000000 24.00000 43. RUN.0704
.0000000 252. QUIT.3953488 324. CLASS A B.0856 0.92772 Total 12 49476.

0001
Least Squares Means for Effect A t for H0: LSMean(i)=LSMean(j) / Pr > |t| Dependent Variable: Y i/j 1 2 3 2. with P -values close to 0.545014 0. Hence.5332
The covariate X does not appear to be very important (P = 0.0684 for heterogeneous slopes and P = 0. However.0351 -0.54501 0.9302326 B C 65.66898 0.0351 1 2 -2.0516 3 -2.868582 0. we will keep it in the analysis for the sake of illustration.0000000 66. The test for treatment eﬀect is not signiﬁcant.0704 for a single slope).0697674 --------------------------
.3.86858 0.1778789 LSMEAN Number 1 2 3
169
A A B C
Y LSMEAN 60. The results of the pairwise testing are summarized below using underlining. we fail to detect any diﬀerence in yield among the 3 diﬀerent crop varieties after adjusting for yield diﬀerences of the previous year.0001 <.0697674
Pr > |t| <.668975 0.05. this is the conservative route to follow.0001 <. The test for the equality of slopes is not rejected (P = 0. ANCOVA IN RANDOMIZED COMPLETE BLOCK DESIGNS Least Squares Means Standard Error 1.9302326 65.0516 2.1778789 1. Group A 60. Besides.1229).0000000 66. Thus. the assumption of a common slope is justiﬁed.0815579 1.5332
0.7.

.j = y. . a yijk = µ + τi + γj + (τ γ)ij + β(xijk − x. . . The analysis of two-way ANCOVA is performed as follows: 1. = βab = β is of interest..j . . and µij are estimated by the adjusted means ˆx µi... − x. Kutner..4
ANCOVA in Two-Factor Designs
We will start by recalling the balanced two-factor ﬁxed eﬀects analysis of variance model i = 1. . . ) + ijk . .. a yijk = µ + τi + γj + (τ γ)ij + ijk . .. then our model may be written as i = 1. . . µ. ) ˆ ¯ ¯ ˆx µ. .. j = 1. . . . n where
a b a b
τi =
i=1 j=1
γj =
i=1
(τ γ)ij =
j=1
(τ γ)ij = 0 . . .. . use a Johnson-Neyman analysis. . Applied Linear Statistical Models.
s k (a.1
= 1. . . . If the slopes are not heterogeneous.. . . As in the one-way model.j. .j. .. If the slopes are heterogeneous.b) ab
which is obtained by ”rolling-out” the cells into one big vector as (i. . INTRODUCTION TO THE ANALYSIS OF COVARIANCE
7. . . 2. .
and ijk ∼iid N (0. b k = 1. . σ 2 ). once again. the means µi. Thus. n under the same assumptions as above. . the hypothesis H0 : β11 = . − β(¯i. . We shall consider the case where a covariate xijk is observed along with the response for each experimental unit. If that assumption is not satisﬁed. .2) 2 ··· ···
The correspondence between s and (i.170
CHAPTER 7. − β(¯ij. . . j) is according to the formula s = b(i − 1) + j . = yi. and Wasserman. . ) ˆ ¯ ¯ The major assumption. . is the homogeneity of the slopes. b k = 1. Test for the homogeneity of the slopes. . Nachtsheim. . b k = 1.. ) + ¯
sk . ) ˆ ¯ ¯ ˆx µij = yij.. We may use the one-way ANCOVA methods to perform the test by rewriting the model as ysk = µs + βs (xsk − x. ¯ j = 1.
.1) 1 (1. n under the same assumptions. n (a. . − x. . . ¯ j = 1. The corresponding two-factor ﬁxed eﬀects ANCOVA model is i = 1. − x. ) + ijk . − β(¯. .. . . ab = 1. a yijk = µ + τi + γj + (τ γ)ij + βij (xijk − x.j) s (1. b-1) ab . . The following example is taken from Neter. test main and simple eﬀects using the adjusted means. .

7. QUIT. ANCOVA IN TWO-FACTOR DESIGNS Example
171
A horticulturist conducted an experiment to study the eﬀects of ﬂower variety (factor A) and moisture level (factor B) on yield of salable ﬂowers (Y ). C = 2*(A-1)+B. the horticulturist wished to use plot size (X) as the concomitant variable. PROC GLM. TITLE1 ’HOMOGENEITY OF SLOPES’. 1 1 98 15 1 2 1 1 60 4 1 2 1 1 77 7 1 2 1 1 80 9 1 2 1 1 95 14 1 2 1 1 64 5 1 2 2 1 55 4 2 2 2 1 60 5 2 2 2 1 75 8 2 2 2 1 65 7 2 2 2 1 87 13 2 2 2 1 78 11 2 2 . CARDS.4. TITLE1 ’TWO-WAY ANCOVA’. INPUT A B Y X @@. Because the plots were not of the same size. CLASS C. MODEL Y = C X C*X. The data are presented below: Factor B B1 (low) B2 (high) Yi1k Xi1k Yi2k Xi2k 98 15 71 10 60 4 80 12 77 7 86 14 80 9 82 13 95 14 46 2 64 5 55 3 55 60 75 65 87 78 4 5 8 7 13 11 76 68 43 47 62 70 11 10 2 3 7 9
Factor A A1 (variety LP)
A2 (variety WB)
The following SAS code is used to analyze the above example. A partial output is given following the code: DATA FLOWERS. RUN.
71 80 86 82 46 55 76 68 43 47 62 70
10 12 14 13 2 3 11 10 2 3 7 9
PROC GLM.
. Six replications were made for each treatment.

LSMEANS A B A*B / STDERR PDIFF TDIFF.6704
TWO-WAY ANCOVA Dependent Variable: Y Sum of Squares 96.518817 119.747808 5086. MODEL Y = A B A*B X.0212 <.5423387 67.288483
F Value 15.8192204 65. INTRODUCTION TO THE ANALYSIS OF COVARIANCE CLASS A B.309097 10.6807796 66.7304444 0.0001 <.5423387 70.7304444 Standard Error 0.0001 <.18 <.309097 3.53
Pr > F 0.0009 <.0001 <.7246356 H0:LSMEAN=0 Pr > |t| <.7246356 0.042244 3994.733375 108.0242739 1.21
Pr > F 0. QUIT.0001 H0:LSMEAN=0 Pr > |t| <.0365780
Pr > |t| <.0961022
Standard Error 1.55 635.0423387 67.133262 3703.0001 0. ----------------------------------------------------------------------------HOMOGENEITY OF SLOPES Dependent Variable: Y Sum of Squares 87.0001 0.0283916 1.796738
F Value 4.577792 6.0001 <.
RUN.1267 <.0001
Least Squares Means Standard Error 0.0009
A 1 2
Y LSMEAN 72.399785 3703.3192204
H0:LSMean1=LSMean2 t Value Pr > |t| 7.0001
A 1 1 2 2
B 1 2 1 2
Y LSMEAN 76.36 51.0283916 1.0001 <.0001
LSMEAN Number 1 2 3 4
Least Squares Means for Effect A*B
.849473 16.172
CHAPTER 7.87 0.000000
Source A B A*B X Error Total
DF 1 1 1 1 19 23
Mean Square 96.601826 323.50 2.000000
Source C X X*C Error Total
DF 3 1 3 16 23
Mean Square 29.92 0.601826 323.481183 5086.849473 16.518817 6.042244 3994.9576613
B 1 2
Y LSMEAN 73.29 544.0001 H0:LSMean1=LSMean2 t Value Pr > |t| 3.

78137 <.0009). (c) The B main eﬀect is signiﬁcant (P < 0.0009
173
2.7.0001 3 3.9371 0. Thus the inclusion of the covariate is justiﬁed.0362 -1. (d) The hypothesis H0 : β = 0 is rejected (P < 0.4.
.1267).216274 <.781373 <.0009 -2.937098 0. The parallelism hypothesis H0 : β11 = β12 = β21 = β22 = β is not rejected (P = 0.937098 0. We may compare A and B means at the main eﬀect level.1127
-3.0001). 2.254261 0.0001 1. (b) The A main eﬀect is signiﬁcant (P = 0. Comparisons of means results: (a) The AB interaction eﬀect fails to be signiﬁcant (P = 0.21627 <.0009
Thus 1.0001 1 2 6.6704).9371 0.663 0.25426 0.1127 3.662999 0.0362 4 7.0009 -7.0001). ANCOVA IN TWO-FACTOR DESIGNS t for H0: LSMean(i)=LSMean(j) / Pr > |t| Dependent Variable: Y i/j 1 2 3 4 -6.0001 -3. The ANCOVA model with a common slope is valid.

The following plot gives a case where the heterogeneity of the slopes is clear. X. The dependent variable is the aggressiveness score on a behavioral checklist. One Covariate
We shall now consider ANCOVA designs where the assumption of equal slopes is not satisﬁed. The diﬀerence between the two lines is clearly not signiﬁcant for values of X near 6. Scores on a sociability scale are employed as a covariate. INTRODUCTION TO THE ANALYSIS OF COVARIANCE
7. Heterogeneous slopes present a problem in ANCOVA in that it is impossible to claim signiﬁcance or non-signiﬁcance of the treatment eﬀect throughout the range of the covariate under consideration.
Thus we wish to test if there is a signiﬁcant treatment eﬀect for a chosen value of the covariate X. The following example is taken from Huitema (1980) : The Analysis of Covariance and Alternatives.5.174
CHAPTER 7. This happens when the treatment eﬀect is dependent upon the value of the covariate. The Johnson-Neyman procedure generalizes this process by identifying the values of the covariate for which there is a signiﬁcant diﬀerence between the levels of the treatment.5.5
7. there is a signiﬁcant interaction between the levels of the treatment and the covariate variable.1
The Johnson-Neyman Technique: Heterogeneous Slopes
Two Groups.
.
Example Suppose that the data in the following table are based on an experiment in which the treatments consist of two methods of therapy. there may be a signiﬁcant diﬀerence between the lines for values of X around 2. however. In other words.

cards.5 12 12 11 11 12.5 12.5 11 12.5 2.5 14 14.5 8 9 10 1 Y 5 6 6 7 8 9 9 9 10. input grp x y. THE JOHNSON-NEYMAN TECHNIQUE: HETEROGENEOUS SLOPES Therapy 1 Y 10 10 11 10 11 11 10 11 11.5 4.7.5 5 6 6 7 7 7.5 12 Therapy X 1 1.5 3.5 16
175
X 1 2 2 3 4 5 5 6 6 7 8 8 9 10 11
Using SAS. The following SAS code may be used to ﬁt the heterogeneous slope ANCOVA and test for signiﬁcance of the diﬀerence between the two methods of therapy on a case by case basis.5. data jn. A partial output is given following the code. we get the following two regression lines:
The slopes are heterogeneous.5 4. 1 1 10 1 2 10 1 2 11 1 3 10 1 4 11 1 5 11 1 5 10 1 6 11
.

X=6.
-------------------------------------------------------------------------Sum of Source DF Squares Mean Square F Value Pr > F grp 1 64.0001 Error 26 10.79 <.5963693 141.5 3.0001 x*grp 1 59.5 4.90 <. INTRODUCTION TO THE ANALYSIS OF COVARIANCE
goptions reset=symbol. 6 7 8 8 9 10 11 1 1. run.5963693 59.7 PDIFF TDIFF.7551731 280.
PROC GLM DATA=JN.5 2. X=8 PDIFF TDIFF.5 5 6 6 7 7 7. X=5 PDIFF. symbol2 c=black v=square l=1 i=r.0001 x 1 117.1707155 152. quit.5 12 12 11 11 12.37 <.5 14 14.176 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 .
proc gplot.1707155 64.5 8 9 10 11.5 12 5 6 6 7 8 9 9 9 10. QUIT.4199994
.5 PDIFF TDIFF. symbol1 c=black v=dot l=1 i=r.7551731 117.5 4.
GRP*X.5 11 12. CLASS GRP. MODEL Y = GRP X LSMEANS GRP/ AT LSMEANS GRP/ AT LSMEANS GRP/ AT LSMEANS GRP/ AT RUN. plot y*x=grp.9199841 0. X=6.5 16
CHAPTER 7.5 12.

7.5. THE JOHNSON-NEYMAN TECHNIQUE: HETEROGENEOUS SLOPES Total 29 176.9666667 Least Squares Means at x=5 H0:LSMean1= LSMean2 grp y LSMEAN Pr > |t| 1 10.8997955 <.0001 2 9.3402754 Least Squares Means at x=6.5 H0:LSMean1=LSMean2 y LSMEAN t Value Pr > |t| 11.2126789 0.07 0.9461 11.1957508 Least Squares Means at x=6.7 H0:LSMean1=LSMean2 y LSMEAN t Value Pr > |t| 11.2543967 -0.74 0.4636 11.4431475 Least Squares Means at x=8 H0:LSMean1=LSMean2 y LSMEAN t Value Pr > |t| 11.5255624 -4.89 <.0001 13.0512261

177

grp 1 2

grp 1 2

grp 1 2

Thus, there is a signiﬁcant diﬀerence between the two methods of therapy for an individual with a sociability scale of 5 (P < 0.0001), while we fail to ﬁnd a signiﬁcant treatment eﬀect for an individual with a sociability scale of 6.5 (P = 0.9461). ˆ ˆ ˆ ˆ Let the regression lines ﬁtted individually for groups 1 and 2 be Y1 = α1 + β1 X1 and Y2 = α2 + β2 X2 , ˆ ˆ ¯ ¯ respectively. Let X1 and X2 be the sample means of the covariate associated with the two groups while

n1

SXX1 =

i=1

¯ (X1i − X1 )2 , and SXX2 =

n2 i=1

¯ (X2i − X2 )2

where n1 and n2 are the respective sample sizes. We can identify the lower and upper limits, XL and XU , of the region of non-signiﬁcance on X using the following formulæ: √ −B − B 2 − AC XL = √A −B + B 2 − AC XU = A where ˆ ˆ + (β1 − β2 )2 SXX1 SXX2 ¯ ¯ X1 X2 ˆ B=d + + (ˆ 1 − α2 )(β1 − β2 ) α ˆ ˆ SXX1 SXX2 ¯2 ¯2 1 X1 X2 1 + + + + (α1 − α2 )2 ˆ ˆ C = −d n1 n2 SXX1 SXX2 A = −d + 1 1

178 Here

CHAPTER 7. INTRODUCTION TO THE ANALYSIS OF COVARIANCE

d = (tn1 +n2 −4 (α/2))2 s2 p where s2 = p (n1 − 2)s2 + (n2 − 2)s2 1 2 n1 + n2 − 4

is the pooled variance. The values of s2 and s2 represent the error mean of squares when the two individual 1 2 regression lines are ﬁt. Example Consider the preceding example. The following SAS code can be used to compute the quantities in the formula. PROC SORT; BY GRP; RUN; QUIT; PROC MEANS MEAN CSS N; VAR X; BY GRP; RUN; QUIT; PROC GLM DATA=JN; MODEL Y = X; BY GRP; RUN; QUIT; /* Get t value squared from the T-table*/ DATA T; TVAL = TINV(.975, 26); TVAL = TVAL*TVAL; RUN; QUIT; PROC PRINT DATA=T; RUN; QUIT; --------------------------------------------------------------------------------------------------------------------------- grp=1 --------------------------Analysis Variable : x Mean 5.8000000 Corrected SS 130.4000000 N 15

--------------------------------------------- grp=2 --------------------------Analysis Variable : x

7.5. THE JOHNSON-NEYMAN TECHNIQUE: HETEROGENEOUS SLOPES

179

Mean 5.5333333

Corrected SS 99.2333333

N 15

--------------------------------------------- grp=1 --------------------------Sum of Source DF Squares Mean Square F Value Pr > F x 1 5.67361963 5.67361963 19.62 0.0007 Error 13 3.75971370 0.28920875 Total 14 9.43333333 Standard Error 0.30641368 0.04709414

Parameter Intercept x

Estimate 9.856850716 0.208588957

t Value 32.17 4.43

Pr > |t| <.0001 0.0007

--------------------------------------------- grp=2 --------------------------------------------Sum of Source DF Squares Mean Square F Value Pr > F x 1 151.8397296 151.8397296 275.68 <.0001 Error 13 7.1602704 0.5507900 Total 14 159.0000000 Standard Error 0.45460081 0.07450137

Parameter Intercept x Obs 1 TVAL 4.22520

Estimate 3.155357743 1.236983540

t Value 6.94 16.60

Pr > |t| <.0001 <.0001

We may now compute XL and XU . A summary the quantities needed is

Group 1 n1 = 15 ¯ X1 = 5.8 SXX1 = 130.4 α1 = 9.857 ˆ ˆ β1 = 0.209 s2 = 0.289 1

t26 (.025) = 4.225

Group 2 n2 = 15 ¯ X2 = 5.533 SXX2 = 99.23 α2 = 3.155 ˆ ˆ β2 = 1.237 s2 = 0.551 2

However. s) = where Ars = −d Brs Crs Here d = t(Pa ni −2a) i=1 α a(a − 1) ˆ ˆ + (βr − βs )2 SXXr SXXs ¯ ¯ Xr Xs ˆ =d + + (αr − αs )(βr − βs ) ˆ ˆ ˆ SXXr SXXs ¯2 ¯2 1 1 Xr Xs = −d + + + + (ˆ r − αs )2 .
. 7. XL (r. using α = 0.0253 √ −B + B 2 − AC 6. s) = XU (r.857 − 3.4 99.7745) ∗ + (9.857 − 3.551) = 0. we fail to detect a diﬀerence between the two methods of therapy.42 26 d = 4. this may be done at once using the BONFERRONI adjustment of LSMEANS in SAS. while method 2 seems to work better for subjects above the sociability score of 7. for 1 ≤ r < s ≤ a are: XL (r. The lower and upper limits for comparing groups r and s.03.712 − 0. and C are computed as p s2 = p (13 ∗ 0.03 and 7.5.82 5. d.237) = −6.8 5.07 A 1.209 − 1.42 = 1.4 99.2
Multiple Groups.03 .23 1 5.712 + 0.23 A = −(1. s). One compares each pair of groups using a Johnson-Neyman procedure at level α/ a . one uses the formulægiven above.07).712 130.209 − 1. INTRODUCTION TO THE ANALYSIS OF COVARIANCE
The values of s2 .0253 XL = −B − Thus.533 XU = = = 7.07.533 = = 6. Method 1 appears to be superior for subjects with sociability score below 6. If the 2 speciﬁc value of X is known. One Covariate
Two procedures extending the Johnson-Neyman strategy to incorporate several groups were proposed by Potthoﬀ(1964).237)2 = 1.225 ∗ 0. s) and XU (r. to determine regions of non-signiﬁcance with simultaneous conﬁdence α. We shall consider the simpler of the two which uses the above technique along with a Bonferroni correction. the region of non-signiﬁcance is (6.155)2 = 43.289) + (13 ∗ 0.4 99.155) ∗ (0.05.533 B = 1.675 15 15 130.7745
1 1 + + (0. There are a = a(a − 1)/2 possible pairwise 2 comparisons.23 5.5332 1 + + + C = −(1.0253 130. For subjects with sociability score between 6.7745 ∗ + + (9. Let a be the number of groups under consideration. α ˆ nr ns SXXr SXXs +
2
−Brs − −Brs +
2 Brs − Ars Crs Ars 2 Brs − Ars Crs Ars
1
1
s2 p
where
s2 = p
a 2 i=1 (ni − 2)si a i=1 ni − 2a
.180
CHAPTER 7. A.7745) ∗ These give B 2 − AC 6. B. The idea is to make all pairwise comparisons using the Bonferroni technique.07 √
7.03 A 1.

As an illustration. smaller sampling units (subsamples) are selected from each experimental unit on which the response is measured. consider a situation where a the treatments are several diets that are taken by humans in a completely randomized manner.
8. we take several blood samples from each individual and measure the hormone in each sample. Suppose we are interested in comparing the a levels of factor A. Another instance of nesting involves two or more factors in which one or more of the factors are nested within the other structurally. or hierarchical designs.1
Nesting in the Design Structure
The type of nesting where there are two or more sizes of experimental units is known as nesting in the design structure. are used in experiments where it is diﬃcult or impossible to measure response on the experimental unit. This subsampling process introduces a new source of variability due to the subsamples within the experimental units in our model in addition to the variation among the experimental units. Responses are measured on n subsamples for each subject. Nested designs. Assume there are r subjects available for each level of A. The experimental error now consists of the variability among the blood samples per individual and the variability among the individuals themselves. The response is the level of a certain hormone in the blood of a subject’s body. These involve smaller sampling units obtained via subsampling. Instead. The data layout looks like the following:
181
. Since it is diﬃcult to measure the level of the hormone in all of the blood of an individual.Chapter 8
Nested Designs
We have already seen some examples of nesting in the analysis of repeated measurement models as well as split-plot designs.

k = 1. τi is the eﬀect of level i of the treatment. . . . y1rn
a ya11 ya12 . . σδ ) . . . ya2n . r. . . . . . n
1 y111 y112 . . . . we ﬁnd the following ANOVA table Source Treatment Experimental Error Sampling Error Total df a−1 a(r − 1) ar(n − 1) prn − 1 MS M SA M SE F FA = M SA /M SE FE = M SE /M SS
The following example is taken from Peterson: Design and Analysis of Experiments. y2rn ··· ··· ··· ··· ··· ··· ···
CHAPTER 8. . and δk(ij) is the random variation of the kth sampling unit within the jth unit on the ith treatment. .
j(i)
∼ N (0. . Here µ is the overall mean. . . y22n .e. . n. . i. y2r1 y2r2 . . .
One can show that the expected mean squares are those given in the following table: Source Treatment Experimental Error Sampling Error MS M SA M SE M SS E(MS) 2 σδ + nσ 2 + (rn 2 σδ + nσ 2 2 σδ τi2 )/(a − 1)
Using the expected MS. . . a. . NESTED DESIGNS
Unit 1
Sample 1 2 . 1 2 . . y1r1 y1r2 . . . y11n y121 y122 . Example A chemist wanted to measure the ability of three chemicals to retard the spread of ﬁre when used to treat plywood panels.182 Treatment 2 ··· y211 · · · y212 · · · . The data is given as follows:
. . We assume that the random errors follow normal distributions with mean 0 and constant variances. . ya1n ya21 ya22 . σ 2 )
and
2 δk(ij) ∼ N (0. j(i) is the random variation of the jth experimental unit on the ith treatment. . . . . He then cut two small pieces from each panel and measured the time required for each to be completely consumed in a standard ﬂame. r
The statistical model for a one-way CRD with subsampling is yijk = µ + τi +
j(i)
+ δk(ij)
where i = 1. . . . . y21n y221 y222 . . j = 1. n . y12n . . . yar1 yar2 . . . . yarn
2
. He obtained 12 panels and sprayed 4 of the panels with each of the three chemicals. n 1 2 . .

4 1.5 5.9 5.6 9.6 5.6 5.6 5.4 3.4 3. DATA NEW.2
PROC GLM.3 5. TEST H=TRT E=PANEL(TRT). NESTING IN THE DESIGN STRUCTURE Chemical A B C 10.7 3.7 4.9 5.3 5. -----------------------------------------------------------------------Dependent Variable: RESP
.8 2.3 4.1 9.8 4.0 4. INPUT TRT CARDS.2
183
Panel 1 2 3 4
Sample 1 2 1 2 1 2 1 2
The SAS analysis of the data is given as follows.5 5. 10.0 7.7 6.1 9. MODEL RESP=TRT PANEL(TRT).3 4.7 6.4 8. QUIT.0 4.5 8.7 3. It is followed by the output.8. CLASS TRT PANEL SUB.7 4.4 3.4 3.4 8.6 5.1.4 4. 1 1 1 2 1 1 3 1 1 1 1 2 2 1 2 3 1 2 1 2 1 2 2 1 3 2 1 1 2 2 2 2 2 3 2 2 1 3 1 2 3 1 3 3 1 1 3 2 2 3 2 3 3 2 1 4 1 2 4 1 3 4 1 1 4 2 2 4 2 3 4 2 .6 5.5 8.6 9. PANEL SUB RESP.8 2.4 4.1 10. RUN.6 5.0 7.8 4.1 10.4 1.

1633333 8.81541667 F Value 9.0120 0. Thus. The corresponding output is TRT 1 2 3 RESP LSMEAN 8.
.53250000
Mean Square 46.51
Pr > F <.63083333 Mean Square 46.53750000 3.0022 0. There is also a signiﬁcant diﬀerence among the panels treated alike (P = 0.08750000 LSMEAN Number 1 2 3
i/j 1 2 3
1 0.0733333
CHAPTER 8.0057
Thus. there were highly signiﬁcant diﬀerences among the three chemicals in their ability to retard the spread of ﬁre (P = 0. LSMEANS TRT / PDIFF E=PANEL(TRT). treatment A appears to be the most eﬀective ﬁre retardant.2988
3 0.0019).9100000 146.0057).87500000 5.79
Pr > F <.05 6.7425000
F Value 16.4693939 0.0022 0.2988
Thus there is a signiﬁcant diﬀerence between treatment A and the remaining two.83694444
F Value 63.0019
Tests of Hypotheses Using the Type III MS for PANEL(TRT) as an Error Term Source TRT DF 2 Type III SS 93.81541667 4.63083333 43.68 Pr > F 0.0001
Source TRT PANEL(TRT)
DF 2 9
Type III SS 93. Mean comparisons can be made by using SAS’ LSMEANS with the appropriate error term.0001 0.184 Sum of Squares 137.0120
2 0. NESTED DESIGNS
Source Model Error Corrected Total
DF 11 12 23
Mean Square 12.

The statistical model for this case is i = 1.. )2 . − y. τi is the eﬀect of the ith level of A. . This is left as an exercise. .. .. The data has the following format: A 1 2 3 1 X 2 X 3 X B 4 X 5 X X X 6 7
Assuming that factor A has a levels and the nested factor B has a total of b levels with mi levels appearing with level i if A. a yijk = µ + τi + βj(i) + ijk . . b = mi = 7.. . there is only one error term. ¯ mi ni (¯i. . A and B. mi k = 1. . 2. In this case B needs to have more levels than A.. . in contrast to nesting in the design structure. βj(i) is the eﬀect of the jth level of B contained within level i of A. The total sum of squares decomposes into component sum of squares in a natural way. m1 = 2. Then we have SST = SSA + SSB(A) + SSE where (given with the associated df)
a mi ni
SST =
i=1 j=1 k=1 a
(yijk − y.8.2
Nesting in the Treatment Structure
Consider two factors.. ¯
df =
i=1
mi (ni − 1) = N − b
The following ANOVA table may be used to test if there are any treatment diﬀerences. )2 . 2. . )2 . − yi. In the above representation a = 3. )2 .
. ni where yijk is the observed response from the kth replication of level j of B within level i of A. σ 2 ) are random errors. µ is the overall mean. 2. and ijk ∼ N (0. NESTING IN THE TREATMENT STRUCTURE
185
8. Further assume that there are ni replications for each level of B nested within level i of A. . the sampling unit and the experimental unit coincide. j = 1. In the current case. A more general case that will not be considered here is the case of unequal replications for each B nested within A. m2 = 3. . The levels of factor B are nested within the levels of factor A if each level of B occurs with only one level of A. Source A B(A) Error Total df a−1 b−a N −b N −1 MS M SA M SB(A) M SE F FA = M SA /M SB(A) FB(A) = M SB(A) /M SE
The following example is taken from Milliken and Johnson :The Analysis of Messy Data. . ¯
i=1 j=1 a mi nj(i)
df =
i=1
(mi − 1) = b − a
a
SSE =
i=1 j=1 k=1
(yijk − yij. The correct F is derived using the expected mean squares. Let N = a i=1 mi ni be the total number of observations.2. Vol I. y ¯
i=1 a mi
df = N − 1
SSA = SSB(A) =
df = a − 1
a
ni (yij. m3 = 2. Note that.

A 1 151 A 2 118 A 3 A 1 135 A 2 132 A 3 A 1 137 A 2 135 A 3 B 4 140 B 5 151 B 4 152 B 5 132 B 4 133 B 5 139 C 6 96 C 7 84 C 6 108 C 7 87 C 6 94 C 7 82 D 8 79 D 9 67 D 10 D 8 74 D 9 78 D 10 D 8 73 D 9 63 D 10 .186 Example Four chemical companies produce insecticides in the following manner: • Company A produces three insecticides. class A B. • Company D produces four insecticides.
131 137 121
90 81 96
D 11 83 D 11 89 D 11 94
. input A $ B Y @@. cards. b = 11. The treatment structure is a two-way with the data given as follows: Company A 1 151 135 137 2 118 132 135 3 131 137 121 4 Insecticide 5 6 7
CHAPTER 8. proc glm. test H=A E=B(A). model y = A B(A). and • no company produces an insecticide exactly like that of another. • Companies B and C produce two insecticides each. NESTED DESIGNS
8
9
10
11
B
140 152 133
151 132 139 96 108 94 84 87 82 79 74 73 67 78 63 90 81 96 83 89 94
C
D
Thus in this example a = 4. m4 = 4. m1 = 3. n1 = n2 = n3 = n4 = 3 The following SAS code (given with edited output) may be used to analyze the data: data insect. m2 = m3 = 2.

quit.0081
187
Thus there are signiﬁcant diﬀerences among the companies (P = 0.29545 1500.87879 7604.00000 25573.8.74 0. -------------------------------------------------------------------------Sum of Source DF Squares Mean Square F Value Pr > F A B(A) Error Total 3 7 22 32 22813.0081). NESTING IN THE TREATMENT STRUCTURE run.2.36905 57.58333 1260.47 3.0001) and there are signiﬁcant diﬀerences among the insecticides produced by the same company (P = 0.
.0001 0.43182 214.27273 35.