
LECTURE 4

ANALYSIS OF VARIANCE (ANOVA)

The analysis of variance is one of the most powerful statistical tools for tests of significance.
The test of significance based on the t-distribution is an adequate procedure only for testing
the significance of the difference between two sample means. When we have three or more
samples to consider at a time, an alternative procedure is needed for testing the hypothesis
that all the samples are drawn from the same population, i.e., that they have the same mean.
For example, suppose five fertilizers are applied to four plots each of wheat and the yield of
wheat on each plot is recorded. We may be interested in finding out whether the effects of
these fertilizers on the yield are significantly different or, in other words, whether the
samples have come from the same normal population. The answer to this problem is provided by
the technique of analysis of variance.
The total variation in a set of observable quantities may, under certain circumstances, be
partitioned into a number of components associated with the nature of classification of
the data. The systematic procedure for achieving this is called the analysis of variance.
With the help of the technique of analysis of variance, it will be possible for us to
perform the test for equality of several population means and to provide estimates for
components of variation. In other words, by this technique the total variation in the
sample data is expressed as the sum of non-negative components, each of which is a measure
of the variation due to some specific independent source, factor, or cause. The ANOVA
consists of estimating the amount of variation due to each of the independent factors
(causes) separately and then comparing these estimates due to assignable factors (causes)
with the estimate of variation due to chance factors (causes), the latter being known as
experimental error or simply error.

This technique of analysis of variance is based on the concept of the linear model, which is
fundamental and is explained below:
Linear Model
Let y1, y2, …, yn be n observable quantities. In all cases, we shall assume the
observed values to be composed of two parts:

yi = μi + ei,
where μi is the true value and ei the error. The true value μi is that part which is due to
assignable causes, and the portion that remains is the error, which is due to various
chance causes. The true value μi is again assumed to be a linear function of k unknown
quantities β1, β2, …, βk, called effects:

μi = ai1β1 + ai2β2 + … + aikβk,

where the aij's are known constants, each usually taken to be 0 or 1.


This setup is called the linear model.
Types of analysis of variance models
A linear model in which all the effects  j ' s are unknown constants called model-
I or a fixed- effects model or linear hypothesis model. It is often the case that one of the
 j ' s is a constant with aij = 1 for that j and all i. Such  j is called a general mean or an

additive constant.
A linear model, in which all, the  j ' s , are random variables except possibly the
additive constant, is called Model-II or a random effects model or variance component
model.
Finally, a model in which at least one  j is a random one and at least one  j is a
constant (other than additive constant) is called a mixed model.

Types of data
One way classified data
Suppose a factor has some number of classes, say k, and the whole set of n observations is
divided into these classes; data of this form are called one-way classified data. The form
of the linear model for one-way classified data is:
yij = μ + αi + eij
where
yij is the value of jth individual belonging to the ith class
μ is the general mean effect
αi is the effect of ith class

eij is the error with the jth observation of the ith class

Two way classified data with one or m number of observations per cell
We can plan an experiment in such a way as to study the effects of two factors in the
same experiment. For each factor there will be a number of classes or levels. If each
cell, formed by one level of each of the two factors, contains only one observation, the
data are known as two-way classified data with one observation per cell, and the form of
the linear model for such data is:
yij = μ + αi + βj + eij
where
yij is the value of the individual belonging to the i th class of first factor and j th class
of the second factor
μ is the general mean effect
αi is the effect of ith class of the first factor
βj is the effect of the jth class of the second factor
eij is the error with the observation of the individual belonging to the i th class of
first factor and jth class of the second factor
If the number of observations per cell is m (>1), then such data are known as two-way
classified data with m observations per cell, and the form of the linear model for such
data is:
yijk = μ + αi + βj + γij + eijk
where
yijk is the value of the kth individual belonging to the ith class of first factor and jth
class of the second factor
μ is the general mean effect
αi is the effect of ith class of the first factor
βj is the effect of the jth class of the second factor
γij is the interaction effect of the ith class of first factor and jth class of second
factors
eijk is the error with the observation of the kth individual belonging to the ith class of
first factor and jth class of the second factor

Note: The above types of two-way classified data sets are known as orthogonal data, and
the analysis procedure and its calculation are systematic and simple. Two-way classified
data with unequal numbers of observations per cell are known as non-orthogonal data, and
their analysis procedure is tedious. The analysis procedure for non-orthogonal data is
therefore beyond the scope of this lecture series.
Assumptions made in the analysis of variance models
There are certain basic assumptions underlying the analysis of variance models. They are:
1. The effects of the different factors (treatment and environment) are additive
2. The errors are independent
3. The errors are distributed normally with mean zero and common variance σ²

In the following section we will discuss the analysis of variance procedure for (a)
one-way classified and (b) two-way classified forms of data with one and m observations
per cell with the presumption that the basic assumptions made in the analysis of variance
are fulfilled.
One-way classified Data
Table 1 Form of the Data

                            Treatments
                  1      2      …    i      …    t
                  y11    y21         yi1         yt1
                  y12    y22         yi2         yt2
                  …      …           …           …
                  y1j    y2j         yij         ytj
                  …      …           …           …
                  y1r1   y2r2        yiri        ytrt
Total             y1.    y2.         yi.         yt.     y.. = Grand Total
Mean              ȳ1     ȳ2          ȳi          ȳt
No. of obs.       r1     r2          ri          rt      n = Σi ri

Let there be t treatments and the ith treatment be replicated ri times. Let yi. be the total of
the ith treatment and y.. be the grand total (GT) where

yi. = Σj yij  and  y.. = Σi Σj yij

Model: yij = μ + ti + eij

where
yij is the value of the variate in the jth replicate of the ith treatment (i = 1, 2,…, t; j = 1, 2,…, ri)
μ is the general mean effect
ti is the effect due to the ith treatment
eij is random error which is assumed to be independently and normally distributed with
mean zero and variance σe²


This means that any yij is made up of overall mean + treatment effect + an error
term. Since the effects are added in this model, it is known as a linear additive model. It
is commonly known as analysis of variance model. Depending on the design, the analysis
of variance model will take different forms.
Hypothesis: H0: μ1 = μ2 = … = μt, i.e., all the treatment means are equal

Computational steps:
Step 1: Compute the correction factor
CF = (Grand Total)²/n = y..²/n
Step 2 : Compute the total sum of squares [SS(Y)]
Total SS = Σi Σj y²ij − CF

Step 3 : Compute the treatment (or group) sum of squares [SS(T)]
Treatment SS = Σi (y²i. / ri) − CF

Step 4 : Compute the error sum of squares (within treatment SS) [SS(E)]
Error SS = Total SS – Treatment SS
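The four steps above can be collected into a short function. This is an illustrative sketch, not part of the original text; it assumes NumPy is available and that each treatment's observations are supplied as a separate array, so unequal replication is handled naturally:

```python
import numpy as np

def one_way_ss(groups):
    """Return (Total SS, Treatment SS, Error SS) for one-way
    classified data; `groups` is a list of 1-D arrays, one array
    per treatment, so unequal replication ri is allowed."""
    y = np.concatenate(groups)
    n = y.size
    cf = y.sum() ** 2 / n                                        # Step 1: CF
    total_ss = (y ** 2).sum() - cf                               # Step 2
    treat_ss = sum(g.sum() ** 2 / g.size for g in groups) - cf   # Step 3
    error_ss = total_ss - treat_ss                               # Step 4
    return total_ss, treat_ss, error_ss
```

Dividing Treatment SS by (t − 1) and Error SS by (n − t) gives the mean squares of Table 2 below, and their ratio is the F statistic.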
Table 2 General form of the analysis of variance table for one-way classification

Sources                     d.f.   SS      MS                     F
Between treatments          t−1    SS(T)   SS(T)/(t−1) = st²      st² / se²
Within treatments (Error)   n−t    SS(E)   SS(E)/(n−t) = se²
Total                       n−1    SS(Y)
Example : Wheat yield obtained with four varieties

        A     B     C     D
        10    14    17    12
        15    18    16    15
        8     21    14    17
        12    15    15    15
        15          17    16
                    15    15
                    18
Total   60    68    112   90    G. total = 330

Computational steps:
Step 1 : Compute the correction factor
CF = (Grand total)²/n = (330)²/22 = 4950
Step 2 : Compute the total sum of squares
Total SS = Σi Σj y²ij − CF
= (10)² + (15)² + … + (15)² − 4950
= 162
Step 3 : Compute the variety sum of squares
Variety SS = Σi (y²i. / ri) − CF
= (60)²/5 + (68)²/4 + (112)²/7 + (90)²/6 − 4950
= 68
Step 4 : Compute the error sum of squares (within variety SS)

Error SS = Total SS – Variety SS


= 162 – 68 = 94

ANOVA

Sources                    d.f.   SS    MS      F
Between varieties          3      68    22.67   4.34
Within varieties (Error)   18     94    5.22
Total                      21     162
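As a cross-check, the figures in this table can be reproduced numerically by following the same four steps. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

# Wheat yields for the four varieties (unequal replication)
a = np.array([10, 15, 8, 12, 15])
b = np.array([14, 18, 21, 15])
c = np.array([17, 16, 14, 15, 17, 15, 18])
d = np.array([12, 15, 17, 15, 16, 15])

y = np.concatenate([a, b, c, d])
n, t = y.size, 4
cf = y.sum() ** 2 / n                      # (330)^2 / 22 = 4950
total_ss = (y ** 2).sum() - cf             # 162
variety_ss = sum(g.sum() ** 2 / g.size for g in (a, b, c, d)) - cf  # 68
error_ss = total_ss - variety_ss           # 94
f_ratio = (variety_ss / (t - 1)) / (error_ss / (n - t))
print(round(f_ratio, 2))                   # 4.34, as in the table
```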
Two-way classification with one observation per cell
Table 3 Form of the Data

                    Treatments
Block    1     2     …   i     …   t     Total   Mean
1        y11   y21       yi1       yt1   R1      ȳ.1
2        y12   y22       yi2       yt2   R2      ȳ.2
…        …     …         …         …     …       …
j        y1j   y2j       yij       ytj   Rj      ȳ.j
…        …     …         …         …     …       …
r        y1r   y2r       yir       ytr   Rr      ȳ.r
Total    T1    T2        Ti        Tt    GT
Mean     ȳ1.   ȳ2.       ȳi.       ȳt.   ȳ..
Let there be t treatments, the ith treatment being replicated r times.

Let Ti = Σj yij  and  Rj = Σi yij

Total number of observations = rt


Hypothesis: H01: μ1 = μ2 = … = μt (equality of treatment means)
H02: μ′1 = μ′2 = … = μ′r (equality of block means)

Model : yij = μ + ti + bj + eij

where
yij is the value of the variate for the ith treatment in the jth block (i = 1, 2,…, t; j = 1, 2,…, r)
μ is the general mean effect
ti is the effect due to the ith treatment
bj is the effect due to the jth block
eij is random error which is assumed to be independently and normally distributed with
mean zero and variance σe²


Computational steps:
Step 1 : Compute the correction factor
CF = (Grand total)² / rt
where Grand total = Σi Ti = Σj Rj = Σi Σj yij

Step 2 : Compute the total sum of squares [SS(Y)]
Total SS = Σi Σj y²ij − CF

Step 3 : Compute sum of squares due to treatment [SS(T)]
Treatment SS = Σi (T²i / r) − CF
Step 4 : Compute sum of squares due to block [SS(R)]
Block SS = Σj (R²j / t) − CF

Step 5 : Compute the error sum of squares [SS(E)]


Error SS = Total SS – Treatment SS – Block SS
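The five steps above can be sketched as a short function over a t × r table of observations. This is an illustrative sketch (not part of the original text) assuming NumPy is available, with rows indexing treatments and columns indexing blocks:

```python
import numpy as np

def two_way_ss(y):
    """Return (Total SS, Treatment SS, Block SS, Error SS) for
    two-way classified data with one observation per cell.
    `y` is a (t, r) array: row i holds treatment i across r blocks."""
    t, r = y.shape
    cf = y.sum() ** 2 / (r * t)                       # Step 1
    total_ss = (y ** 2).sum() - cf                    # Step 2
    treat_ss = (y.sum(axis=1) ** 2).sum() / r - cf    # Step 3: sum Ti^2/r - CF
    block_ss = (y.sum(axis=0) ** 2).sum() / t - cf    # Step 4: sum Rj^2/t - CF
    error_ss = total_ss - treat_ss - block_ss         # Step 5
    return total_ss, treat_ss, block_ss, error_ss
```

Because every treatment appears once in every block, the treatment and block sums of squares are orthogonal and the error falls out by subtraction, exactly as in Step 5.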
Table 4 General form of the analysis of variance table for two-way classification
with one observation per cell

Sources                d.f.         SS      MS                           F
Replication (Blocks)   r−1          SS(R)   SS(R)/(r−1) = sr²            sr² / se²
Treatments             t−1          SS(T)   SS(T)/(t−1) = st²            st² / se²
Error                  (r−1)(t−1)   SS(E)   SS(E)/[(r−1)(t−1)] = se²
Total                  rt−1         SS(Y)

Example : Yield per plot for different levels of FYM and replications

Level of FYM   Block I   Block II   Block III   Block IV   Total
1              6.90      4.60       4.40        4.81       20.71
2              6.48      5.57       4.28        4.45       20.78
3              6.52      7.60       5.30        5.30       24.72
4              6.90      6.65       6.75        7.75       28.05
5              6.00      6.18       5.50        5.50       23.18
6              7.90      7.57       6.80        6.62       28.89
Total          40.70     38.17      33.03       34.43      146.33 (G. total)

29
Computational steps:
Step 1 : Compute the correction factor
CF = (Grand total)²/n = (146.33)²/24 = 892.19
Step 2 : Compute the total sum of squares
Total SS = Σi Σj y²ij − CF
= (6.90)² + (6.48)² + … + (6.62)² − 892.19
= 28.59
Step 3 : Compute sum of squares due to block
Block SS = Σj (R²j / t) − CF
= [(40.70)² + (38.17)² + … + (34.43)²]/6 − 892.19
= 6.12
Step 4 : Compute sum of squares due to FYM
FYM SS = Σi (T²i / r) − CF
= [(20.71)² + (20.78)² + … + (28.89)²]/4 − 892.19
= 15.44
Step 5 : Compute the error sum of squares

Error SS = Total SS – Block SS – FYM SS


= 28.59 – 6.12 – 15.44 = 7.03

ANOVA
Sources            d.f.   SS      MS      F
Blocks             3      6.12    2.040   4.35*
Treatments (FYM)   5      15.44   3.088   6.58**
Error              15     7.03    0.469
Total              23     28.59
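The F-values above can be confirmed directly from the data. A sketch assuming NumPy; small differences in the last decimal arise because the text rounds the sums of squares before dividing, while the code carries full precision:

```python
import numpy as np

# Yield per plot: rows = FYM levels (treatments), columns = blocks
y = np.array([[6.90, 4.60, 4.40, 4.81],
              [6.48, 5.57, 4.28, 4.45],
              [6.52, 7.60, 5.30, 5.30],
              [6.90, 6.65, 6.75, 7.75],
              [6.00, 6.18, 5.50, 5.50],
              [7.90, 7.57, 6.80, 6.62]])
t, r = y.shape
cf = y.sum() ** 2 / (r * t)
total_ss = (y ** 2).sum() - cf
fym_ss = (y.sum(axis=1) ** 2).sum() / r - cf
block_ss = (y.sum(axis=0) ** 2).sum() / t - cf
error_ms = (total_ss - fym_ss - block_ss) / ((r - 1) * (t - 1))
f_blocks = (block_ss / (r - 1)) / error_ms    # about 4.36 (4.35 in the table)
f_fym = (fym_ss / (t - 1)) / error_ms         # about 6.60 (6.58 in the table)
```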

Two-way classification with m observations per cell
Table 5 Form of the Data

                                         Factor A
Factor B   1             2             …   i             …   p             Total   Mean
1          y111,…,y11m   y211,…,y21m       yi11,…,yi1m       yp11,…,yp1m   B1      ȳ.1.
2          y121,…,y12m   y221,…,y22m       yi21,…,yi2m       yp21,…,yp2m   B2      ȳ.2.
…          …             …                 …                 …             …       …
j          y1j1,…,y1jm   y2j1,…,y2jm       yij1,…,yijm       ypj1,…,ypjm   Bj      ȳ.j.
…          …             …                 …                 …             …       …
q          y1q1,…,y1qm   y2q1,…,y2qm       yiq1,…,yiqm       ypq1,…,ypqm   Bq      ȳ.q.
Total      A1            A2                Ai                Ap            GT
Mean       ȳ1..          ȳ2..              ȳi..              ȳp..          ȳ...
Let there be p levels of factor A and q levels of factor B, and let each cell formed by a
level of A and a level of B contain m observations.

Let yij. = Σk yijk,  Ai = Σj Σk yijk = Σj yij.,  Bj = Σi Σk yijk = Σi yij.  and  GT = Σi Σj Σk yijk

Total number of observations = pqm


Hypothesis: H01: μ1 = μ2 = … = μp (equality of factor A means)
H02: μ′1 = μ′2 = … = μ′q (equality of factor B means)
H03: There is no interaction effect

Model : yijk = μ + ai + bj + cij + eijk

where
yijk is the value of the kth individual in the (i, j)th cell (i = 1, 2,…, p; j = 1, 2,…, q; k = 1, 2,…, m)
μ is the general mean effect
ai is the effect due to the ith level of factor A
bj is the effect due to the jth level of factor B
cij is the interaction effect of the ith level of factor A and the jth level of factor B
eijk is random error with the kth individual in the (i, j)th cell

Computational steps:
Step 1 : Compute the correction factor
CF = (GT)² / pqm

Step 2 : Compute the total sum of squares [SS(Y)]
Total SS = Σi Σj Σk y²ijk − CF

Step 3 : Compute sum of squares due to Factor A [SS(A)]
ASS = Σi (A²i / qm) − CF

Step 4 : Compute sum of squares due to Factor B [SS(B)]
BSS = Σj (B²j / pm) − CF

Step 5 : Compute sum of squares due to A×B interaction [SS(A×B)]
A×B SS = [ (1/m) Σi Σj y²ij. − CF ] − ASS − BSS

Step 6 : Compute the error sum of squares [SS(E)]
Error SS = Total SS − ASS − BSS − A×B SS
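The six steps can be sketched as a function over a p × q × m array. This is an illustrative sketch (not part of the original text) assuming NumPy is available:

```python
import numpy as np

def factorial_ss(y):
    """Return (Total, A, B, AxB, Error) sums of squares for a
    p x q classification with m observations per cell.
    `y` has shape (p, q, m): levels of A, levels of B, replicates."""
    p, q, m = y.shape
    cf = y.sum() ** 2 / (p * q * m)                         # Step 1
    total_ss = (y ** 2).sum() - cf                          # Step 2
    a_ss = (y.sum(axis=(1, 2)) ** 2).sum() / (q * m) - cf   # Step 3
    b_ss = (y.sum(axis=(0, 2)) ** 2).sum() / (p * m) - cf   # Step 4
    cell = y.sum(axis=2)                                    # cell totals y_ij.
    ab_ss = (cell ** 2).sum() / m - cf - a_ss - b_ss        # Step 5
    error_ss = total_ss - a_ss - b_ss - ab_ss               # Step 6
    return total_ss, a_ss, b_ss, ab_ss, error_ss
```

The interaction sum of squares in Step 5 is the between-cell variation left over after the two main effects have been removed, which is why ASS and BSS are subtracted.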
Table 6 General form of the analysis of variance table for two-way classification
with m observation per cell

Sources             d.f.         SS        MS                             F
Factor A            p−1          SS(A)     SS(A)/(p−1) = sA²              F1 = sA² / se²
Factor B            q−1          SS(B)     SS(B)/(q−1) = sB²              F2 = sB² / se²
Interaction (A×B)   (p−1)(q−1)   SS(A×B)   SS(A×B)/[(p−1)(q−1)] = sAB²    F3 = sAB² / se²
Error               pq(m−1)      SS(E)     SS(E)/[pq(m−1)] = se²
Total               pqm−1        SS(Y)

Example : An experiment was conducted to determine the effects of five different
varieties of cowpeas (V1, V2, …, V5) and three different spacings, viz. 4", 8" and 12" (S1,
S2 and S3) apart in a row, with rows 3' apart, and also to see whether the varieties behave
differently at different spacings. The data below give the yield of each of 4 plots taken
for each variety-spacing combination:

Spacing
Variety S1 S2 S3
V1 56 45 43 46 60 50 45 48 66 57 50 50
V2 61 58 55 56 60 59 54 54 59 55 51 52
V3 63 53 49 48 65 56 50 50 66 58 52 55
V4 65 61 60 63 60 58 56 60 53 53 48 55
V5 60 61 50 53 62 68 67 60 73 77 77 65
Carry out an analysis of variance for the above data.

Solution:
Let yijk denote the yield of the kth plot for the ith variety at the jth spacing (i=1, 2, 3,
4, 5; j=1, 2, 3; k=1, 2, 3, 4).
The sub-totals for the five varieties, the three spacings and the fifteen variety-
spacing combinations, and the grand total are shown below:

Spacing
Variety S1 S2 S3 Total
V1 190 203 223 616
V2 230 227 217 674
V3 213 221 231 665
V4 249 234 209 692
V5 224 257 292 773
Total 1106 1142 1172 3420

Computational steps:
Step 1 : Compute the correction factor
CF = (GT)²/pqm = (3420)²/60 = 194940
Step 2 : Compute the total sum of squares [SS(Y)]
Total SS = Σi Σj Σk y²ijk − CF
= 198184 − 194940 = 3244


Step 3 : Compute sum of squares due to Variety [SS(V)]
VSS = Σi (A²i / qm) − CF
= [(616)² + (674)² + … + (773)²]/12 − 194940
= 196029.1667 − 194940 = 1089.1667

33
Step 4 : Compute sum of squares due to Spacing [SS(S)]
SSS = Σj (B²j / pm) − CF
= [(1106)² + (1142)² + (1172)²]/20 − 194940
= 195049.2 − 194940 = 109.2
Step 5 : Compute sum of squares due to V×S interaction [SS(V×S)]
V×S SS = [ (1/m) Σi Σj y²ij. − CF ] − VSS − SSS
= [(190)² + (203)² + … + (292)²]/4 − 194940 − 1089.1667 − 109.2
= (197013.5 − 194940) − 1089.1667 − 109.2 = 875.1333


Step 6 : Compute the error sum of squares [SS(E)]
Error SS = Total SS − VSS − SSS − V×S SS
= 3244 − 1089.1667 − 109.2 − 875.1333 = 1170.5000

ANOVA
Sources of variation   d.f.   SS          MS        F
Due to varieties       4      1089.1667   272.292   10.468**
Due to spacings        2      109.2000    54.600    2.099 NS
Due to interaction     8      875.1333    109.392   4.206**
Error                  45     1170.5000   26.011
Total                  59     3244.0000
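These F-values can be reproduced numerically from the raw plot yields. A sketch assuming NumPy; carrying full precision throughout gives the same F-ratios as the table:

```python
import numpy as np

# yijk: 5 varieties x 3 spacings x 4 plots
y = np.array([
    [[56, 45, 43, 46], [60, 50, 45, 48], [66, 57, 50, 50]],   # V1
    [[61, 58, 55, 56], [60, 59, 54, 54], [59, 55, 51, 52]],   # V2
    [[63, 53, 49, 48], [65, 56, 50, 50], [66, 58, 52, 55]],   # V3
    [[65, 61, 60, 63], [60, 58, 56, 60], [53, 53, 48, 55]],   # V4
    [[60, 61, 50, 53], [62, 68, 67, 60], [73, 77, 77, 65]],   # V5
], dtype=float)
p, q, m = y.shape
cf = y.sum() ** 2 / (p * q * m)
total_ss = (y ** 2).sum() - cf
v_ss = (y.sum(axis=(1, 2)) ** 2).sum() / (q * m) - cf
s_ss = (y.sum(axis=(0, 2)) ** 2).sum() / (p * m) - cf
vs_ss = (y.sum(axis=2) ** 2).sum() / m - cf - v_ss - s_ss
error_ms = (total_ss - v_ss - s_ss - vs_ss) / (p * q * (m - 1))
f_v = v_ss / (p - 1) / error_ms                 # varieties
f_s = s_ss / (q - 1) / error_ms                 # spacings
f_vs = vs_ss / ((p - 1) * (q - 1)) / error_ms   # interaction
print(round(f_v, 2), round(f_s, 2), round(f_vs, 2))   # 10.47 2.1 4.21
```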

Interpretation of ANOVA results


We have seen that the null hypothesis of equal treatment means, that is, μ1 = μ2 =
… = μt, can be tested using the analysis of variance technique. The test statistic in this
case is F, which is the ratio of the Factor mean square to the error mean square. The
calculated F value is then compared with the table value of F. The table value is read
against the specified level of significance and treatment and error degrees of freedom.
If the calculated value of F is greater than the table value, then F is significant;
otherwise F is not significant. If F is significant at the 5% level of significance, this is
indicated by placing one asterisk (*) on the computed F-value. If it is significant at the
1% level of significance, two asterisks (**) are placed on the computed F-value.

A non-significant F may result either from a small treatment or factor difference, from a
very large experimental error, or from both. It does not always mean that all the treatments
have the same effect. When the experimental error is large, it is an indication of the
failure of the experiment to detect treatment differences. In order to find out the
reliability of the experiment, the coefficient of variation (CV) is used. It is computed as
CV = (√Error MS / Overall mean) × 100

If the CV is 20% or less, it is an indication of good precision of the experiment.
When the CV is more than 20%, the experiment may be repeated and efforts should be
made to reduce the experimental error.
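For the cowpea example above, the CV works out as follows. A small sketch; the error mean square and grand mean are taken from the worked analysis (error SS 1170.5 on 45 d.f., grand total 3420 over 60 plots):

```python
import math

error_ms = 1170.5 / 45        # error mean square from the cowpea ANOVA
grand_mean = 3420 / 60        # grand total / number of observations = 57
cv = math.sqrt(error_ms) / grand_mean * 100
print(round(cv, 1))           # 8.9 -> well below 20%, so precision is acceptable
```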
In case of a significant F, the null hypothesis is rejected. The problem is then to
know which of the treatment means are significantly different. Many test procedures are
available for this purpose, which are beyond the scope of this topic.
