Analysis of Variance Models for Biostatistics

Ernesto Ponsot Balaguer

PhD in Statistics, MSc in Applied Statistics, Systems Engineering
http://webdelprofesor.ula.ve/economia/ernesto
E-mail: eponsot@yachaytech.edu.ec
University of Experimental Technologies Research Yachay (Yachay Tech)

School of Mathematical Sciences and Information Technology
School of Biological Sciences and Engineering
Imbabura, Ecuador - April 2018

Content
1 Introduction
2 ANOVA model estimation

ANOVA Models
In many experimental situations, a researcher applies several
treatments or treatment combinations to randomly selected
experimental units and then wishes to compare the treatment
means for some response y.
In analysis of variance (ANOVA), we use linear models to
facilitate a comparison of these means.
The model is often expressed with more parameters than can
be estimated, which results in an X matrix that is not of full
rank.
One-Way Model
Suppose that a researcher has developed two biological
fertilizers to increase the production of a certain variety of
corn. To formulate the model, we might start with the notion
that, without fertilizer, a plant yields an average of µ cobs.
Then if fertilizer 1 is added, the number of cobs is expected to
increase by τ1, and if fertilizer 2 is added, the number of cobs
is expected to increase by τ2.
Thinking about the experiment, it is clear that we should
preferably use a single piece of land, on which we run (perhaps)
three trials: one without fertilizer, one with fertilizer 1 and
one with fertilizer 2.
Of course, we need a clear separation between the three
parcels.
We need a single piece of land so that differences in its
characteristics do not affect our experiment. The separation
ensures that fertilizer 2 does not mix with fertilizer 1.
Obviously, we must also ensure that the environmental
conditions are the same.
With the minimum possible resources, we only need to run two
trials, one with fertilizer 1 and one with fertilizer 2.
The model will be:
\[
\begin{cases}
y_1 = \mu + \tau_1 + \varepsilon_1 \\
y_2 = \mu + \tau_2 + \varepsilon_2
\end{cases}
\]
In our model, yi is the observed yield (number of cobs) and εi
is an (unobservable) error, for i = 1, 2. We would like to
estimate the parameters and test hypotheses such as
H0 : τ1 = τ2, for example.
As you have probably already noticed, that minimal experiment
does not look very reliable, mainly because we get only one
observation for each condition. We need more observations.
Suppose that we use six parcels, adding fertilizer 1 to three
of them and fertilizer 2 to the other three. The new model is
\[
\begin{cases}
y_{11} = \mu + \tau_1 + \varepsilon_{11} \\
y_{12} = \mu + \tau_1 + \varepsilon_{12} \\
y_{13} = \mu + \tau_1 + \varepsilon_{13} \\
y_{21} = \mu + \tau_2 + \varepsilon_{21} \\
y_{22} = \mu + \tau_2 + \varepsilon_{22} \\
y_{23} = \mu + \tau_2 + \varepsilon_{23}
\end{cases}
\]
That is,
\[
y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2; \; j = 1, 2, 3
\]
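In matrix form:
\[
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
\]
with
\[
\mathbf{y} = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}, \quad
\boldsymbol{\beta} = \begin{pmatrix} \mu \\ \tau_1 \\ \tau_2 \end{pmatrix}, \quad
\boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{23} \end{pmatrix}
\]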
Do you notice any problem?
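One way to explore this question numerically is to build the X matrix of the six-parcel example in R and inspect its columns. This is only a sketch. The column names mu, tau1 and tau2 are labels chosen here for readability.

```r
# Design matrix of the one-way example (k = 2 fertilizers, n = 3 parcels each)
X <- cbind(mu   = rep(1, 6),
           tau1 = rep(c(1, 0), each = 3),
           tau2 = rep(c(0, 1), each = 3))
X

# The first column is the sum of the other two, so the columns are
# linearly dependent
dependent <- all(X[, "mu"] == X[, "tau1"] + X[, "tau2"])
dependent

# Rank of X (from its QR decomposition) compared with its number of columns
rank_X <- qr(X)$rank
c(rank = rank_X, columns = ncol(X))
```

The rank is smaller than the number of columns, which is the problem the question points at.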
Two-Way Model
Suppose now that our researcher suspects that the yield is
also related to the variety of corn and wants to add this
information to the experiment, for example by including
three different varieties of corn: V1, V2 and V3.
As before, µ is the mean, τ1 is the effect of fertilizer 1 and τ2
is the effect of fertilizer 2. We need more parameters. Let γ1
be the effect of variety V1, γ2 the effect of V2 and γ3
the effect of V3.
We might also need more trials. What do you think? What
would you propose?
Table 0: Number of samples when varieties are added to our experiment

                   Variety
Fertilizer    V1    V2    V3    Total
1              2     2     2        6
2              2     2     2        6
Total          4     4     4       12
The new purely additive model is
\[
y_{ijk} = \mu + \tau_i + \gamma_j + \varepsilon_{ijk}, \quad i = 1, 2; \; j = 1, 2, 3; \; k = 1, 2
\]
Can you build the matrix form of the model?
Do you notice the difference between the one-way and
two-way models?
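As a possible starting point for the matrix form, the sketch below builds an overparameterized X for the balanced layout of Table 0, with one indicator column per parameter (µ, τ1, τ2, γ1, γ2, γ3). The object names design and X2 and the column labels are illustrative choices, not part of the original slides.

```r
# Balanced two-way layout of Table 0:
# 2 fertilizers x 3 varieties x 2 replicates = 12 observations.
# expand.grid varies its first argument fastest, so rows are ordered
# y111, y112, y121, y122, ..., y232.
design <- expand.grid(replicate = 1:2, variety = 1:3, fertilizer = 1:2)

# One indicator column per parameter of the additive model
X2 <- cbind(mu     = 1,
            tau1   = as.numeric(design$fertilizer == 1),
            tau2   = as.numeric(design$fertilizer == 2),
            gamma1 = as.numeric(design$variety == 1),
            gamma2 = as.numeric(design$variety == 2),
            gamma3 = as.numeric(design$variety == 3))
X2

# Dimensions and rank of the two-way design matrix
dim(X2)
qr(X2)$rank
```

Here the column for µ equals both the sum of the τ columns and the sum of the γ columns, so the rank falls short of the number of columns by two, compared with one in the one-way case.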
ANOVA Models Estimation
As we know, in the ANOVA model the matrix X is not of full rank,
and therefore (X'X)^{-1} does not exist. This happens because the
model is overparameterized. We have three strategies to work
around this limitation:
1. Reparametrize: Redefine the model using a smaller number
of new parameters that are unique.
2. Restrict: Use the overparameterized model but place
constraints on the parameters so that they become unique.
3. Find linear estimable functions: In the overparameterized
model, work with linear combinations of the parameters that
are unique and can be unambiguously estimated.
We will describe strategies 1 and 2 and leave 3 for later.
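To see what overparameterization means in practice, the following sketch uses the one-way example with k = 2 and n = 3. It shows two different parameter vectors that produce exactly the same expected responses, and checks that X'X cannot be inverted. The numeric values in beta_a are hypothetical and chosen only for illustration.

```r
# One-way design matrix (columns: mu, tau1, tau2)
X <- cbind(1, rep(c(1, 0), each = 3), rep(c(0, 1), each = 3))

# Two different parameter vectors (hypothetical values)
beta_a <- c(10, 2, 5)
beta_b <- beta_a + 4 * c(1, -1, -1)   # shift mu up, both tau's down

# Both give the same X %*% beta, so the data cannot tell them apart
fit_a <- drop(X %*% beta_a)
fit_b <- drop(X %*% beta_b)
all.equal(fit_a, fit_b)

# Trying to invert X'X raises an error because the matrix is singular;
# tryCatch turns that error into NULL
inv <- tryCatch(solve(crossprod(X)), error = function(e) NULL)
is.null(inv)
```

Each of the three strategies listed above is a different way of removing this ambiguity.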

1. Reparametrization
Let's look at the general version of the one-way model:
\[
y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, \dots, k; \; j = 1, 2, \dots, n
\]
Here k is the number of levels of the single factor (that is,
the number of treatments), and n is the (equal) number of
observations for each treatment. This model, with an equal
number of observations for all factor levels, is called the
balanced case.
This design results in an X matrix with k + 1 columns, since we
postulate k + 1 parameters µ, τ1, τ2, ..., τk, but
r(X) = k < k + 1.

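Suppose now that, on second thought, it is not so important
to distinguish between µ and τi (for all i); it is only
important to distinguish the effects of each treatment. Then
we can write:
\[
y_{ij} = \mu_i + \varepsilon_{ij}, \quad i = 1, 2, \dots, k; \; j = 1, 2, \dots, n
\]
where µi = µ + τi for all i. Then, for our example (k = 2, n = 3)
we have:
\[
\mathbf{y} = \mathbf{W}\boldsymbol{\mu} + \boldsymbol{\varepsilon}
\;\Rightarrow\;
\begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{23} \end{pmatrix}
\]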

Now our model has a full rank W matrix since r(W ) = 2, and
we can use all previous theory to estimate the µ parameters
and to test hypotheses.
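A minimal sketch of this estimation in R follows. It assumes that the "previous theory" is ordinary least squares, and the six yields in y are hypothetical numbers made up for illustration, not data from the slides.

```r
# Hypothetical yields (number of cobs) for the six parcels:
# the first three received fertilizer 1, the last three fertilizer 2
y <- c(12, 14, 13, 17, 16, 18)

# Full-rank matrix W of the reparametrized (cell means) model
W <- cbind(mu1 = rep(c(1, 0), each = 3),
           mu2 = rep(c(0, 1), each = 3))
qr(W)$rank

# Least squares estimate: solve (W'W) mu = W'y
mu_hat <- solve(crossprod(W), crossprod(W, y))
mu_hat

# The estimates coincide with the sample mean of each treatment group
group_means <- tapply(y, rep(1:2, each = 3), mean)
group_means
```

Because W'W is now invertible, the normal equations have a single solution.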

2. Restriction
Suppose now that distinguishing between the original parameters
µ and τi (for all i) is important, but we can reasonably impose
conditions on them. Such constraints are called side
conditions.
Let τ1 + τ2 + · · · + τk = 0 be the condition imposed. Now the
model is:
\[
y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad \text{subject to } \sum_{i} \tau_i = 0, \quad i = 1, 2, \dots, k; \; j = 1, 2, \dots, n
\]
Note that we have not added new parameters; we have only
put a condition on the existing ones.
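Then, for our example (k = 2, n = 3) we have:
\[
\mathbf{y} = \mathbf{Z}\boldsymbol{\tau} + \boldsymbol{\varepsilon}
\;\Rightarrow\;
\begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & -1 \\ 1 & -1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} \mu \\ \tau_1 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{23} \end{pmatrix}
\]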
This works because τ1 + τ2 = 0 implies τ2 = −τ1, so we do not
need to estimate τ2 directly.
Again, our new model has a full-rank Z matrix since r(Z) = 2,
and we can use all the previous theory to estimate the τ
parameters and to test hypotheses.
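The restricted model can be sketched in R in the same way. The yields in y are again hypothetical. The last lines assume that R's built-in contr.sum coding corresponds to the side condition τ1 + τ2 = 0, which is how sum-to-zero contrasts are defined.

```r
# Hypothetical yields (number of cobs) for the six parcels
y <- c(12, 14, 13, 17, 16, 18)

# Full-rank matrix Z under the side condition tau1 + tau2 = 0
Z <- cbind(mu = 1, tau1 = rep(c(1, -1), each = 3))

# Least squares estimates of mu and tau1
est <- setNames(drop(solve(crossprod(Z), crossprod(Z, y))), c("mu", "tau1"))
est

# tau2 is recovered from the side condition
tau2_hat <- -est[["tau1"]]
tau2_hat

# The same estimates from lm() with sum-to-zero contrasts
fert <- factor(rep(1:2, each = 3))
fit <- lm(y ~ fert, contrasts = list(fert = "contr.sum"))
coef(fit)
```

With these numbers the estimate of µ is the overall mean and the estimate of τ1 is the deviation of the first group mean from it.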
Note that, in general, if we want to estimate k + 1 parameters
but r(X) = r < k + 1, we will need k + 1 − r linearly
independent constraints or side conditions.
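The count k + 1 − r can be checked numerically for the balanced one-way model. In the sketch below, one_way_X is a hypothetical helper written for this illustration; it builds the overparameterized X for k treatments with n observations each.

```r
# Hypothetical helper: overparameterized one-way design matrix
# with a column for mu and one indicator column per treatment
one_way_X <- function(k, n) {
  group <- factor(rep(seq_len(k), each = n))
  cbind(mu = 1, model.matrix(~ 0 + group))
}

# Number of side conditions needed, (k + 1) - r(X), for k = 2, ..., 5
needed <- sapply(2:5, function(k) {
  X <- one_way_X(k, n = 3)
  ncol(X) - qr(X)$rank
})
needed
```

For the one-way model the difference is the same for every k tried here, which agrees with the single side condition τ1 + τ2 + · · · + τk = 0 used earlier.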
Exercise 1
Three methods of packaging frozen foods were compared by
Daniel (1974, p. 196). The response variable was ascorbic acid
(mg/100g). The data are in the next table:
Table 1: Three methods of packaging frozen foods

Method        A        B        C
          14.29    20.06    20.04
          19.10    20.64    26.23
          19.09    18.00    22.74
          16.25    19.56    24.04
          15.09    19.47    23.37
          16.61    19.07    25.02
          19.63    18.38    23.27
R code 1
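# Exercise 11
library(ggplot2)

# Read the data; MP must be a factor for contrasts() to work
# (since R 4.0, read.table() no longer converts strings to factors)
data <- read.table("ex11.txt", header = TRUE, sep = ";",
                   stringsAsFactors = TRUE)
data
# Scatter plot of y against MP
ggplot(data, aes(MP, y)) + geom_point()
# Boxplot of y for each level of MP
ggplot(data, aes(MP, y)) + geom_boxplot()
# Fit the one-way model; x = TRUE stores the design matrix in the fit
mod <- lm(y ~ MP, data = data, x = TRUE)
mod
mod$x               # design matrix actually used by lm()
summary(mod)        # coefficient estimates and t tests
anova(mod)          # ANOVA table
contrasts(data$MP)  # coding used for the levels of MP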
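The file ex11.txt is not reproduced in these slides, so the sketch below types in the values of Table 1 directly. It assumes that MP holds the packaging method (A, B, C) and y the ascorbic acid response, matching the column names used in the code above. The data frame name packaging is an arbitrary choice.

```r
# Values transcribed from Table 1 (ascorbic acid, mg/100g)
y <- c(14.29, 19.10, 19.09, 16.25, 15.09, 16.61, 19.63,   # method A
       20.06, 20.64, 18.00, 19.56, 19.47, 19.07, 18.38,   # method B
       20.04, 26.23, 22.74, 24.04, 23.37, 25.02, 23.27)   # method C
packaging <- data.frame(MP = factor(rep(c("A", "B", "C"), each = 7)),
                        y  = y)

# Fit the one-way model and keep the design matrix
mod <- lm(y ~ MP, data = packaging, x = TRUE)

# lm() avoids the rank problem by coding MP with contrasts:
# the stored design matrix has full column rank
ncol(mod$x)
qr(mod$x)$rank
contrasts(packaging$MP)

# Estimates, group means and the ANOVA table
coef(mod)
tapply(packaging$y, packaging$MP, mean)
anova(mod)
```

With R's default treatment contrasts, the intercept is the mean of method A and the other coefficients are differences from it, which is one particular reparametrization of the overparameterized model.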