
Analysis of Count and Proportion Data

Violeta I. Bartolome
Senior Associate Scientist
PBGB-CRIL
v.bartolome@cgiar.org

Assumptions in the ANOVA

• Additive effects
• Independence of errors
• Homogeneity of variances
• Normal distribution

[Figure: variance plotted against the mean]

Count Data

• Response variable is an integer
• Variance usually increases linearly with the mean
• Errors are not normally distributed

[Figure: variance plotted against the mean]

Proportion data

• Count of the number of failures of an event as well as the number of successes
• Variance will be an inverted U-shaped function of the mean

[Figure: variance plotted against the mean]

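
The two variance shapes described above can be tabulated with a short sketch. It assumes Poisson counts (variance equal to the mean) and binomial counts with a denominator of n = 10; the function names and the value of n are illustrative and do not come from the slides.

```python
# Variance as a function of the mean for the two data types.
# Poisson counts: variance equals the mean (a straight line).
# Binomial counts with denominator n: variance = mean * (1 - mean / n),
# which is zero at both ends and largest in the middle (inverted U).

def poisson_variance(mean):
    return mean

def binomial_variance(mean, n):
    return mean * (1 - mean / n)

n = 10  # assumed binomial denominator, chosen only for illustration
means = list(range(0, n + 1))
poisson = [poisson_variance(m) for m in means]
binomial = [binomial_variance(m, n) for m in means]

for m, vp, vb in zip(means, poisson, binomial):
    print(f"mean={m:2d}  Poisson var={vp:4.1f}  binomial var={vb:4.1f}")
```

The binomial column rises to its maximum at mean = n/2 and falls back to zero, while the Poisson column keeps increasing.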
Count Data

Analysis of count data

For treatment levels, define the
control as the first level when
sorted in ascending order. GLM
uses the first level as reference.
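
A minimal sketch of the idea, written in Python rather than in the software used on the slides. The treatment labels are hypothetical, and Python's code-point sort is assumed to match the ascending sort applied to factor levels for simple labels like these.

```python
# Hypothetical treatment labels; "Untreated" plays the role of the control.
treatments = ["Urea", "Untreated", "Compost", "Urea", "Compost", "Untreated"]

# Levels in ascending order; the first one would be taken as the reference.
levels = sorted(set(treatments))
print(levels[0])  # the control is NOT first here

# Relabel the control so that it sorts ahead of every other level.
relabeled = ["0_Control" if t == "Untreated" else t for t in treatments]
new_levels = sorted(set(relabeled))
print(new_levels[0])  # now the control is the reference level
```

Renaming is only one option; in R the reference level can also be set explicitly on the factor instead of relying on the sort order.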

Analysis of count data

Note: glm uses the first level as reference.

401.45/15 = 26.8
Residual deviance is much greater than df. Indication of overdispersion.

Overdispersion

• Residual deviance is inflated
• There is extra, unexplained variation in the response
• May result if the underlying distribution is not Poisson
• Compensate for the overdispersion by refitting using quasi-Poisson rather than Poisson errors.
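
The overdispersion check can be sketched as below, using the residual deviance and df quoted on the slide. The standard error is a hypothetical value. The deviance ratio is used here as a rough stand-in for the dispersion; R's quasi families estimate it from Pearson residuals, so the fitted value may differ somewhat.

```python
import math

residual_deviance = 401.45  # value quoted on the slide
residual_df = 15            # value quoted on the slide

# A ratio near 1 is expected for Poisson data; much larger values
# indicate overdispersion.
dispersion = residual_deviance / residual_df
print(round(dispersion, 1))  # 26.8

# Under a quasi-Poisson fit the coefficient estimates stay the same,
# but each standard error is inflated by sqrt(dispersion).
poisson_se = 0.05  # hypothetical standard error from the Poisson fit
quasi_se = poisson_se * math.sqrt(dispersion)
print(round(quasi_se, 3))
```

With a dispersion this large the standard errors grow by a factor of about five, so effects that looked significant under Poisson errors may no longer be.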
Correct for overdispersion
ANOVA table

401.47/15=26.8

Residual Plot

• After fitting a model to data, we should investigate how well the model describes the data.
• With normal errors, the raw and standardized residuals are identical.
• Standardized residuals are needed to correct for non-normal errors (as in count and proportion data).

Standardized residuals

• For count data

$$\frac{y - \text{fitted value}}{\sqrt{\text{fitted value}}}$$

• For proportion data

$$\frac{y - \text{fitted value}}{\sqrt{\text{fitted value} \times \left(1 - \dfrac{\text{fitted value}}{\text{binomial denominator}}\right)}}$$
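
A small sketch of the two standardized-residual formulas, assuming the usual forms: the raw residual divided by the square root of the fitted value for counts, and by the square root of fitted value × (1 − fitted value / binomial denominator) for proportions. The function names and the observations are hypothetical.

```python
import math

def standardized_residual_count(y, fitted):
    # Count data: variance is taken to equal the fitted value.
    return (y - fitted) / math.sqrt(fitted)

def standardized_residual_proportion(y, fitted, n):
    # Proportion data: y and fitted are success counts out of n.
    return (y - fitted) / math.sqrt(fitted * (1 - fitted / n))

# Hypothetical observations, for illustration only.
r_count = standardized_residual_count(12, 9.0)
r_prop = standardized_residual_proportion(7, 5.0, 10)
print(r_count)           # (12 - 9) / sqrt(9) = 1.0
print(round(r_prop, 4))  # 2 / sqrt(2.5)
```

Plotting these values against the fitted values gives the residual plot; a pattern or funnel shape would suggest the model does not describe the data well.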
Residual plot
Compute standardized residuals

Predicted Means

Note: differences are based on transformed values.

If the interval includes zero, then the difference is not significant.
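
The interval rule can be sketched as follows. The intervals are hypothetical, and the transformed scale is assumed to be the log link (the default for Poisson errors in glm), so that exponentiating a difference gives a ratio of means.

```python
import math

def interval_includes_zero(lower, upper):
    return lower <= 0 <= upper

# Hypothetical confidence intervals for differences on the log scale.
intervals = {"B - A": (-0.21, 0.35), "C - A": (0.40, 1.10)}

results = {}
for name, (lower, upper) in intervals.items():
    significant = not interval_includes_zero(lower, upper)
    # Back-transformed limits: an interval containing 0 on the log scale
    # corresponds to a ratio interval containing 1.
    ratio = (math.exp(lower), math.exp(upper))
    results[name] = (significant, ratio)
    print(f"{name}: significant={significant}, "
          f"ratio interval=({ratio[0]:.2f}, {ratio[1]:.2f})")
```

On the original scale the comparison is therefore read as a ratio of means, not a difference.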
Proportion Data

Traditional Analysis

o Convert to percentage data and use as response variable
o Not good
o Errors are not normally distributed
o Variances are heterogeneous
o Response is bounded by 0 and 100
o Size of the sample, n, is lost

General Approach

• Use generalized linear model (glm)
• Family=binomial
• Uses two vectors, one for success counts and the other for failure counts
• Number of failures + number of successes = binomial denominator, n
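
A short illustration of why losing n matters, using two hypothetical samples with the same percentage. The standard error uses the ordinary binomial formula sqrt(p(1 − p)/n); the function names are illustrative.

```python
import math

def percentage(successes, n):
    return 100 * successes / n

def proportion_se(successes, n):
    # Binomial standard error of the observed proportion.
    p = successes / n
    return math.sqrt(p * (1 - p) / n)

small = (2, 4)      # hypothetical: 2 successes out of 4
large = (50, 100)   # hypothetical: 50 successes out of 100

print(percentage(*small), percentage(*large))        # identical percentages
print(proportion_se(*small), proportion_se(*large))  # very different precision
```

Both samples become "50%" in the traditional analysis, yet the larger one is five times more precise; keeping the success and failure counts preserves that information.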
Analysis of proportion

Create response matrix

• First column is the number of successes (or failures)
• Second column is n - first column
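
The response matrix can be sketched in Python as a list of (successes, failures) rows. The counts are hypothetical; in R the equivalent would typically be built with cbind(successes, n - successes).

```python
# Hypothetical success counts and binomial denominators per plot.
successes = [8, 5, 9]
n = [10, 10, 12]

# First column: successes. Second column: n minus the first column.
response = [(s, total - s) for s, total in zip(successes, n)]

for row in response:
    print(row)
```

Each row sums back to its binomial denominator, so the sample size stays attached to every observation.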

Analysis of proportion

123.96/45 = 2.8
An indication of overdispersion

Correct for overdispersion

ANOVA table

Plot standardized residuals

Predicted Means
Mean Comparison

Thank you!