ANCOVA
By Brian W. Sloboda
Preview of ANOVA and
Some Preliminaries
Analysis of variance (ANOVA) is a method for
testing the hypothesis that three or more
population means are equal.
For example:
H0: µ1 = µ2 = µ3 = . . . = µk
H1: At least one mean is different
ANOVA Methods
Require the F-Distribution
1. The F-distribution is not symmetric; it is skewed to
the right.
2. The values of F can be 0 or positive; they cannot
be negative.
3. There is a different F-distribution for each pair of
degrees of freedom for the numerator and
denominator.
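The first two properties can be checked by simulation. The sketch below is only an illustration, not part of the original slides: it builds F ratios from sums of squared standard normal draws, and the degrees of freedom (2 and 27), the number of draws, and the seed are arbitrary choices.

```python
import random
import statistics

def simulate_f(df_num, df_den, n_draws, seed=0):
    """Draw F ratios as (chi-square / df_num) / (chi-square / df_den)."""
    rng = random.Random(seed)

    def chi_square(df):
        # A chi-square draw is a sum of df squared standard normal draws.
        return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

    return [
        (chi_square(df_num) / df_num) / (chi_square(df_den) / df_den)
        for _ in range(n_draws)
    ]

# Arbitrary illustration: 2 numerator and 27 denominator degrees of freedom.
draws = simulate_f(df_num=2, df_den=27, n_draws=5000)
smallest = min(draws)                  # never negative
mean_f = statistics.mean(draws)        # pulled upward by the long right tail
median_f = statistics.median(draws)
print(smallest, mean_f, median_f)
```

A mean that sits above the median is the usual signature of right skew. Changing df_num or df_den gives a different distribution, which is the third property.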
F-Distribution
Definition
One-way analysis of variance (ANOVA) is a method of
testing the equality of three or more population means
by analyzing sample variances. One-way analysis of
variance is used with data categorized with one
treatment (or factor), which is a characteristic that
allows us to distinguish the different populations from
one another.
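As a rough sketch of what "analyzing sample variances" means, the function below splits the total variation into a between-groups and a within-groups part and forms the F ratio. The function name and the three small samples are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_stat)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        mean_g = sum(g) / len(g)
        # Variation of the group mean around the grand mean.
        ss_between += len(g) * (mean_g - grand_mean) ** 2
        # Variation of the observations around their own group mean.
        ss_within += sum((v - mean_g) ** 2 for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

# Hypothetical data: three samples of three observations each.
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(samples)
print(ss_b, ss_w, df_b, df_w, f_stat)
```

A large F means the sample means vary more than the within-sample spread would suggest, which is evidence against equal population means.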
Relationship Between F Test Statistic / P-Value
One-Way ANOVA Assumptions
1. The populations have approximately normal
distributions.
2. The populations have the same variance σ²
(or the same standard deviation σ).
3. The samples are simple random samples.
4. The samples are independent of each other.
5. The different samples are from populations
that are categorized in only one way.
Procedure for testing
H0: µ1 = µ2 = µ3 = . . . = µk
ANOVA
Source of Variation   SS         df   MS         F          P-value    F critical
Between Groups        162.8667    2   81.43333   4.094413   0.027986   3.354131
Within Groups         537        27   19.88889
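The MS, F, P-value and F critical columns follow from the SS and df columns. The sketch below recomputes them from the table's values. It relies on a closed form for the upper tail of the F distribution that holds only when the numerator has 2 degrees of freedom, as here; the function names are made up for this illustration, and the 0.05 significance level is an assumption.

```python
def f_upper_tail_df1_2(f_value, df_den):
    """P(F > f_value) for an F distribution with 2 numerator df."""
    return (1.0 + 2.0 * f_value / df_den) ** (-df_den / 2.0)

def f_critical_df1_2(alpha, df_den):
    """Value exceeded with probability alpha when the numerator df is 2."""
    return (df_den / 2.0) * (alpha ** (-2.0 / df_den) - 1.0)

# SS and df copied from the ANOVA table.
ss_between, df_between = 162.8667, 2
ss_within, df_within = 537.0, 27

ms_between = ss_between / df_between      # mean square = SS / df
ms_within = ss_within / df_within
f_stat = ms_between / ms_within           # F = MS(between) / MS(within)
p_value = f_upper_tail_df1_2(f_stat, df_within)
f_crit = f_critical_df1_2(0.05, df_within)  # assumes a 0.05 level
print(ms_between, ms_within, f_stat, p_value, f_crit)
```

The F statistic exceeds the critical value and the P-value is below 0.05, so both routes reject H0 at that level.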
\[ t = \frac{\bar{x}_1 - \bar{x}_4}{\sqrt{MS(\text{error}) \left( \frac{1}{n_1} + \frac{1}{n_4} \right)}} \]
Bonferroni Multiple
Comparison Test
Step 2 (cont.) Change the subscripts and
use another pair of samples until all of the
different possible pairs of samples have
been tested.
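A minimal sketch of the pairwise step, assuming the usual Bonferroni form: the t statistic for a pair is the difference in sample means divided by the square root of MS(error) times (1/n_i + 1/n_j), and each pair's P-value is multiplied by the number of possible pairs. The function names and all the numbers below are hypothetical.

```python
import math

def bonferroni_t(mean_i, mean_j, ms_error, n_i, n_j):
    """t statistic for one pair of samples, using the pooled MS(error)."""
    return (mean_i - mean_j) / math.sqrt(ms_error * (1.0 / n_i + 1.0 / n_j))

def bonferroni_p(unadjusted_p, n_groups):
    """Multiply by the number of possible pairs, capping the result at 1."""
    n_pairs = n_groups * (n_groups - 1) // 2
    return min(1.0, unadjusted_p * n_pairs)

# Hypothetical summary values for samples 1 and 4 of a four-sample study.
t_14 = bonferroni_t(mean_i=12.0, mean_j=9.0, ms_error=4.5, n_i=5, n_j=5)
adjusted = bonferroni_p(unadjusted_p=0.01, n_groups=4)
print(t_14, adjusted)
```

With four samples there are six possible pairs, so the same two functions would be called six times, once per pair of subscripts.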
[Figure: two interaction plots of Mean Response against factor B (Low B, High B), each with one line for Low A and one line for High A.]
F tests for the Two-Way ANOVA with Replication
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Sample                14.22222    1   14.22222   0.583144   0.459834   4.747225
Columns               348.1111    2   174.0556   7.136674   0.009078   3.885294
Interaction           14.77778    2   7.388889   0.302961   0.744114   3.885294
Within                292.6667   12   24.38889
Column Factor:
\[ F = \frac{MS(\text{size})}{MS(\text{error})} = \frac{77.389}{21.611} = 3.58 \]
Note: The Two-Way ANOVA without Replication tests only the Rows and Columns
effects; it has no interaction effect.
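Each F in the Two-Way ANOVA with Replication table is the mean square of that source divided by the Within mean square. The short sketch below recomputes the three ratios from the SS and df values in the table; the dictionary layout is just one convenient way to hold them.

```python
# (SS, df) for each source, copied from the two-way ANOVA table.
table = {
    "Sample": (14.22222, 1),
    "Columns": (348.1111, 2),
    "Interaction": (14.77778, 2),
    "Within": (292.6667, 12),
}

ss_within, df_within = table["Within"]
ms_within = ss_within / df_within

f_ratios = {}
for source, (ss, df) in table.items():
    if source == "Within":
        continue  # the error row has no F ratio of its own
    f_ratios[source] = (ss / df) / ms_within

for source, f_value in f_ratios.items():
    print(source, round(f_value, 4))
```

Only the Columns ratio is larger than its F crit value in the table, which matches the small P-value on that row.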
Advantages of Two-Way ANOVA
• When interested in studying the effects of two
factors, two-way designs offer great advantages
over several single-factor studies.
• Researchers want to determine the influence of dietary
minerals on blood pressure.
• Rats receive diets prepared with varying amounts of
calcium and varying amounts of magnesium, but with
all other ingredients of the diets the same. There are
three levels of calcium (low, medium and high) and three
levels of magnesium.
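To make the design concrete, the sketch below lists every diet in the two-factor layout. The source names the calcium levels but not the magnesium levels, so the magnesium labels here are an assumption.

```python
from itertools import product

calcium_levels = ["low", "medium", "high"]
# Assumption: the source does not name the three magnesium levels.
magnesium_levels = ["low", "medium", "high"]

# Every calcium level is paired with every magnesium level.
diets = list(product(calcium_levels, magnesium_levels))
for calcium, magnesium in diets:
    print(f"calcium={calcium}, magnesium={magnesium}")
```

One experiment with these nine diets covers both factors and their interaction, where two single-factor studies would vary only one mineral at a time.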
Recap
In this section we learned about Two Way
ANOVA with Replication
Example: Calf Weight Gain
• An animal scientist wishes to examine the impact of a pair of new dietary
supplements on calf weight gain (response).
• Three treatments are defined: standard diet, standard diet + supplement Q,
and standard diet + supplement R.
• All new calves from a large herd are available for use as study units. She
selects 30 calves for study and assigns them at random to the three diets
(completely randomized design).
• Initial weights are recorded, then calves are placed on the diets. At the
end of four weeks the final weight is taken and weight gain is computed.
• Simple analysis of variance and associated multiple comparisons
procedures indicate no significant differences in weight gain between the
two supplementary diets, but big differences between the supplemental
diets and the standard diet.
• Is this the end of the story?
ANOVA Results
[Figure: dot plots of the observations for the three groups: Standard Diet, + Supplement Q, + Supplement R.]
Initial Weights
[Figure: dot plots of Initial Weight for the three groups: Standard Diet, + Supplement Q, + Supplement R.]
Plotting the initial weights by group shows that the groups did not start with
equal initial weights.
Weight Gain to Initial Weight
[Figure: growth curve for the Standard Diet, Weight (kg) against age, marking the initial weight (wi), final weight (wF) and weight gain (w gain) of two animals, 1 and 2.]
If animals come into the study at different ages, they have different initial weights and
are at different points on the growth curve. Expected weight gains will be different
depending on age at entry into study.
Regression of Initial Weight to Weight Gain
If we disregard the age of the animal but instead focus on the initial weight,
we see that there is a linear relationship between initial weight and the
weight gain expected.
[Figure: Weight Gain (g/day) (Y) against Initial Weight (x), with the initial weights (wi) and weight gains (w gain) of animals 1 and 2 marked.]
Covariates
Initial weight in the previous example is a covariate.
A covariate is a disturbing variable (also known as a confounder); that is, it is known
to have an effect on the response. Usually the covariate can be measured, but
often we cannot control its effect through blocking.
In the EXAMPLE, had the animal scientist known that the calves were very
variable in initial weight (or age), she could have:
• Created blocks of 3 or 6 equal weight animals, and randomized
treatments to calves within these blocks.
• This would have entailed some cost in terms of time spent sorting the
calves and then keeping track of block membership over the life of the
study.
• It was much easier to simply record the calf initial weight and then use
analysis of covariance for the final analysis.
• In many cases, due to the continuous nature of the covariate, blocking is
just not feasible.
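The blocking alternative described above can be sketched as follows: sort the calves by initial weight, cut the sorted list into blocks of three, and shuffle the three diets within each block. The function name, the calf labels and the weights are hypothetical, and the sketch assumes the number of calves is a multiple of the number of treatments.

```python
import random

def randomize_within_blocks(initial_weights, treatments, seed=0):
    """Sort units by weight, cut into blocks, shuffle treatments per block."""
    rng = random.Random(seed)
    block_size = len(treatments)
    # Calf labels ordered from lightest to heaviest.
    ranked = sorted(initial_weights, key=initial_weights.get)
    assignment = {}
    for start in range(0, len(ranked), block_size):
        block = ranked[start:start + block_size]
        shuffled = list(treatments)
        rng.shuffle(shuffled)
        for calf, treatment in zip(block, shuffled):
            assignment[calf] = (start // block_size, treatment)
    return assignment

# Hypothetical initial weights (kg) for six calves.
weights = {"c1": 41.0, "c2": 55.5, "c3": 47.2, "c4": 39.8, "c5": 52.1, "c6": 44.9}
diets = ["standard", "supplement Q", "supplement R"]
plan = randomize_within_blocks(weights, diets)
for calf in sorted(plan):
    print(calf, plan[calf])
```

Every block then holds calves of similar weight and sees each diet once, at the cost of the sorting and bookkeeping that the slide mentions.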
Expectations under Ho
Under Ho: no treatment effects.
If all animals had come in with the same initial weight, all three treatments
would produce the same weight gain.
[Figure: Expected Weight Gain (g/day) (Y) against Initial Weight (x), with the Average Weight Animal marked.]
Different Initial Weights
Under Ho: no treatment effects.
If the average initial weights in the treatment groups differ, the observed
weight gains will be different, even if treatments have no effect.
[Figure: Expected Weight Gain (g/day) (Y) against Initial Weight (x). The three groups (plotted as c, q and r) have different initial weights, with mean weight gains WGs, WGQ and WGR marked.]
Observed Responses under HA
Suppose now that different supplements actually do increase weight gain.
This translates to animals in different treatment groups following different, but
parallel regression lines with initial weight.
[Figure: Weight Gain (g/day) (Y) against Initial Weight (x) under HA (significant treatment effects). Observations for the Standard Diet (c), + Supplement Q (q) and + Supplement R (r) follow separate regression lines, with mean weight gains WGs, WGQ and WGR marked.]
What difference in weight gain is due to Initial weight and what is due to Treatment?
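One common way to write the parallel-lines idea is the model below. The notation is an assumption, not taken from the slides: y_ij is the weight gain of animal j on treatment i, x_ij is its initial weight, tau_i is the effect of treatment i, and beta is the slope shared by all treatments.

```latex
\[
y_{ij} = \mu + \tau_i + \beta \, (x_{ij} - \bar{x}_{..}) + \varepsilon_{ij}
\]
\[
H_0 : \tau_1 = \tau_2 = \dots = \tau_t = 0
\qquad \text{versus} \qquad
H_A : \text{at least one } \tau_i \neq 0
\]
```

The term with beta carries the part of the weight gain explained by initial weight, and the tau_i terms carry the part due to treatment, which is the split the question asks about.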
Observed Group Means
Simple one-way classification ANOVA (without accounting for initial weight)
gives us the wrong answer!
[Figure: Weight Gain (g/day) (Y) against Initial Weight (x) for the Standard Diet (c), + Supplement Q (q) and + Supplement R (r), with the unadjusted treatment means yc, yq and yr marked.]
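Under a common-slope model, the usual correction moves each unadjusted treatment mean along the regression line to the overall mean of the covariate. The symbols below are assumed notation: a hat marks an estimate, a bar marks a mean, and a dot marks the subscript that was averaged over.

```latex
\[
\bar{y}_{i,\mathrm{adj}} = \bar{y}_{i.} - \hat{\beta} \, (\bar{x}_{i.} - \bar{x}_{..})
\]
```

A group that started heavier than average has its mean shifted by the amount the regression line attributes to that head start, so the adjusted means compare treatments at the same initial weight.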
A Priori Assumptions
The covariate is related to the response, and can account for variation in the
response.
Check with a scatterplot of Y vs. X.
Example 1
Four different formulations of an industrial glue are being tested. The tensile
strength (response) of the glue is known to be related to the thickness as
applied. Five observations on strength (Y) in pounds, and thickness (X) in
0.01 inches are made for each formulation.
Here:
• There are t=4 treatments (formulations of glue).
• Covariate X is thickness of applied glue.
• Each treatment is replicated n=5 times at different values of X.
Formulation   Strength   Thickness
1             46.5       13
1             45.9       14
1             49.8       12
1             46.1       12
1             44.3       14
2             48.7       12
2             49.0       10
2             50.1       11
2             48.5       12
2             45.2       14
3             46.3       15
3             47.1       14
3             48.9       11
3             48.2       11
3             50.3       10
4             44.7       16
4             43.0       15
4             51.0       10
4             48.1       12
4             46.8       11
Formulation Profiles
[Figure: Strength (Y) against Thickness (X) for each formulation.]
SAS Studio Program
The basic model is a combination of regression and one-way classification.
/* Read the glue data: one row per observation. */
data glue;
  input Formulation Strength Thickness;
  datalines;
1 46.5 13
1 45.9 14
1 49.8 12
1 46.1 12
1 44.3 14
2 48.7 12
2 49.0 10
2 50.1 11
2 48.5 12
2 45.2 14
3 46.3 15
3 47.1 14
3 48.9 11
3 48.2 11
3 50.3 10
4 44.7 16
4 43.0 15
4 51.0 10
4 48.1 12
4 46.8 11
;
run;
/* ANCOVA: Thickness is the covariate, Formulation is the treatment factor. */
proc glm data=glue;
  class formulation;
  model strength = thickness formulation / solution;
  /* Adjusted (least squares) means with standard errors and pairwise p-values. */
  lsmeans formulation / stderr pdiff;
run;
Output: Use Type III SS to test significance of each variable
[Slide callouts on the SAS output: "MSE"; "Divide by MSE to get mean squares."]
Parameter       Estimate         Standard Error   t Value   Pr > |t|
Intercept       58.93698630 B    2.21321008       26.63     <.0001
Thickness       -0.95445205      0.16705494       -5.71     <.0001
Formulation 1   -0.00910959 B    0.80810401       -0.01     0.9912
Formulation 2    0.62554795 B    0.82451389        0.76     0.4598
Formulation 3    0.86732877 B    0.81361075        1.07     0.3033
Formulation 4    0.00000000 B    .                 .        .
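The estimates in this output can be reproduced by hand. The sketch below assumes the common-slope model that the SAS model statement fits: the Thickness estimate is the pooled within-formulation slope, and each Formulation estimate is that group's difference from Formulation 4 after removing the part explained by thickness. Variable names are made up for the illustration.

```python
# (thickness, strength) pairs for each formulation, from the example data.
glue = {
    1: [(13, 46.5), (14, 45.9), (12, 49.8), (12, 46.1), (14, 44.3)],
    2: [(12, 48.7), (10, 49.0), (11, 50.1), (12, 48.5), (14, 45.2)],
    3: [(15, 46.3), (14, 47.1), (11, 48.9), (11, 48.2), (10, 50.3)],
    4: [(16, 44.7), (15, 43.0), (10, 51.0), (12, 48.1), (11, 46.8)],
}

def group_means(pairs):
    """Return (mean thickness, mean strength) for one formulation."""
    n = len(pairs)
    return sum(x for x, _ in pairs) / n, sum(y for _, y in pairs) / n

means = {k: group_means(v) for k, v in glue.items()}

# Pooled within-formulation slope of strength on thickness.
sxy = sum(
    (x - means[k][0]) * (y - means[k][1]) for k, v in glue.items() for x, y in v
)
sxx = sum((x - means[k][0]) ** 2 for k, v in glue.items() for x, _ in v)
slope = sxy / sxx

# Formulation 4 is the reference level, as in the SAS output.
x_ref, y_ref = means[4]
intercept = y_ref - slope * x_ref
effects = {
    k: (means[k][1] - y_ref) - slope * (means[k][0] - x_ref) for k in glue
}
print(round(slope, 8), round(intercept, 8))
print({k: round(e, 8) for k, e in effects.items()})
```

The sketch covers the Estimate column only; the standard errors, t values and p-values need the residual mean square and are left to PROC GLM.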
Least Squares Means
(Adjusted Formulation means computed at the
average value of Thickness [=12.45])
The GLM Procedure
Least Squares Means
Entry (i, j) is the p-value for comparing the adjusted means of formulations i and j.
i/j   1        2        3        4
1              0.4574   0.3011   0.9912
2     0.4574            0.7695   0.4598
3     0.3011   0.7695            0.3033
4     0.9912   0.4598   0.3033
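The adjusted means themselves can be sketched in a few lines, again assuming a common slope across formulations: each formulation's mean strength is moved along the pooled regression line to the average thickness. This illustrates the idea behind the lsmeans statement and is not its implementation; the names are hypothetical.

```python
# (thickness, strength) pairs for each formulation, from the example data.
glue = {
    1: [(13, 46.5), (14, 45.9), (12, 49.8), (12, 46.1), (14, 44.3)],
    2: [(12, 48.7), (10, 49.0), (11, 50.1), (12, 48.5), (14, 45.2)],
    3: [(15, 46.3), (14, 47.1), (11, 48.9), (11, 48.2), (10, 50.3)],
    4: [(16, 44.7), (15, 43.0), (10, 51.0), (12, 48.1), (11, 46.8)],
}

all_x = [x for pairs in glue.values() for x, _ in pairs]
x_bar = sum(all_x) / len(all_x)          # average thickness over all 20 rows

x_mean = {k: sum(x for x, _ in v) / len(v) for k, v in glue.items()}
y_mean = {k: sum(y for _, y in v) / len(v) for k, v in glue.items()}

# Pooled within-formulation slope of strength on thickness.
sxy = sum(
    (x - x_mean[k]) * (y - y_mean[k]) for k, v in glue.items() for x, y in v
)
sxx = sum((x - x_mean[k]) ** 2 for k, v in glue.items() for x, _ in v)
slope = sxy / sxx

# Slide each raw mean along the common slope to the average thickness.
adjusted = {k: y_mean[k] - slope * (x_mean[k] - x_bar) for k in glue}
for k in sorted(adjusted):
    print(k, round(y_mean[k], 2), round(adjusted[k], 4))
```

After adjustment the four means lie within about one pound of each other, which is in line with the large pairwise p-values in the table.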
End of Presentation