ANOVA
ANALYSIS OF VARIANCE (ANOVA)
Application
• An agronomist may want to know whether the yield per acre
will be the same if four different varieties of wheat are sown
in separate but identical plots.
• A dairy farm may want to test whether there is a significant
difference between the quantity and quality of milk obtained
from different classes of cattle.
• A business manager may want to find out whether there is any
difference in the average sales of four salesmen, and so on.
• In all these situations, ANOVA can be applied.
What is ANOVA?
• It is a statistical technique specially designed to
test whether the means of several samples
differ significantly or not.
• It was developed by R. A. Fisher in the 1920s.
• It involves classifying and cross-classifying
statistical results and testing whether the means
of several samples are equal.
• The analysis of variance technique is used to
test whether the means of several samples
differ significantly. It tests whether the given
samples are drawn from populations with the
same mean, i.e., whether all samples belong to
the same population.
• The t-test is an adequate procedure for testing
the null hypothesis when we have the means of only
two samples to consider.
• ANOVA is used for testing the null hypothesis when
we have the means of three or more samples.
• E.g.: A fertilizer is applied to 4 plots. ANOVA
can be employed to test whether the effect of
the fertilizer in the 4 plots differs significantly
or not.
• Analysis of variance techniques are
used in agricultural research, natural
science and social science.
Definition
• Analysis of variance may be defined as a
technique which analyses the variance of two or
more comparable series (samples) for
determining the significance of differences in
their arithmetic means, and for determining
whether the different samples under study are
drawn from the same population or not, with the
help of the statistical technique called the F-test.
Characteristics
1. It makes a statistical analysis of the variances of two
or more samples.
2. It tests whether the difference in the means of
different samples is due to chance or due to some
significant cause.
3. It uses the statistical test called the F-test by
finding the appropriate variance ratio.
Assumptions in Analysis of variance
1. The populations from which the samples have been drawn are
normally distributed.
2. The populations from which the samples are drawn have
the same variance.
3. The observations in the samples are randomly selected
from the population.
4. The observations are uncorrelated random variables.
5. Any observation is the sum of the effects of the factors
influencing it.
6. The random errors are normally distributed with mean
0 and a common variance σ².
Techniques of ANOVA
• For the sake of clarity, the technique of
ANOVA has been discussed separately for
a) One-way classification
b) Two way classification
One way classification of data
In one-way classification, observations (data) are classified
into groups on the basis of only one criterion.
The null hypothesis is:
H0: µ1 = µ2 = µ3 = ... = µk
E.g.: Suppose we want to study the yield of a crop. This
study can be made with the effect of one variable (fertilizer)
on different paddy fields. Here, we apply different kinds
of fertilizers to different paddy fields and try to find out
the difference in the effect of these different kinds of
fertilizers on yield. So we get ‘k’ samples, each of them
drawn from a different population (paddy field). Here, the
only variable is the effect of fertilizers on yield.
• In one-way classification we have k samples.
Each sample contains observations collected
from a different population where a different
fertilizer is applied.
• Any observation belonging to the collected data
has 3 components, namely the general effect,
the effect of the fertilizer and the effect of unknown
factors.
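The three components can be made concrete with a small Python sketch. It assumes the four-treatment yield figures from the first worked problem in these notes, and it takes the general effect to be the grand mean and the treatment effect to be the sample mean minus the grand mean. The variable names are illustrative only.

```python
# Decompose each observation into:
#   general effect + treatment effect + residual (effect of unknown factors).
# Data: the four-treatment yield problem worked later in these notes.
samples = {
    "A": [42, 50, 62, 34, 52],
    "B": [48, 66, 68, 78, 70],
    "C": [68, 52, 76, 64, 70],
    "D": [80, 94, 78, 82, 66],
}

all_values = [x for xs in samples.values() for x in xs]
grand_mean = sum(all_values) / len(all_values)  # general effect

decomposition = []
for name, xs in samples.items():
    treatment_effect = sum(xs) / len(xs) - grand_mean
    for x in xs:
        residual = x - grand_mean - treatment_effect
        decomposition.append((name, x, grand_mean, treatment_effect, residual))

# Show the decomposition for the first sample.
for row in decomposition[:5]:
    print(row)
```

Each printed row satisfies observation = general effect + treatment effect + residual.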
Two way classification of data
In two-way classification, observations are classified into
groups on the basis of two criteria.
For example: Suppose we want to study the yield of a
crop. We can study the effect of 2 variables, fertilizer and
seed. Here, we apply different kinds of fertilizers and
different kinds of seeds to different paddy fields and try
to find out the difference in the effect of these different
fertilizers and different seeds on the yield. We get ‘c’
samples in ‘c’ columns and ‘r’ samples in ‘r’ rows.
So an observation has 4 components, namely the general
effect, the effect of the fertilizer, the effect of the seed and
the effect of unknown factors.
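A rough Python sketch of the four components, assuming the 5 x 4 table from the two-way worked problem later in these notes (rows are fertilizers, columns are varieties). The row and column effects are taken as deviations of the row and column means from the grand mean; this additive layout is an assumption of the sketch.

```python
# Rows are fertilizers F1..F5, columns are varieties A..D
# (data from the two-way worked problem later in these notes).
table = [
    [8, 12, 18, 13],
    [10, 11, 12, 9],
    [12, 9, 16, 12],
    [8, 14, 6, 16],
    [7, 4, 8, 15],
]
r, c = len(table), len(table[0])

grand_mean = sum(map(sum, table)) / (r * c)  # general effect
row_effect = [sum(row) / c - grand_mean for row in table]
col_effect = [sum(row[j] for row in table) / r - grand_mean for j in range(c)]

# Whatever is left after removing the three effects is the
# effect of unknown factors (the residual).
residual = [
    [table[i][j] - grand_mean - row_effect[i] - col_effect[j] for j in range(c)]
    for i in range(r)
]
residual_ss = sum(e * e for row in residual for e in row)

print(grand_mean, col_effect, row_effect, residual_ss)
```

The sum of the squared residuals should agree with the residual sum of squares (SSE) obtained for the same data by the short-cut method.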
Types of variances
Types of variances in one-way classification
a) Variance between samples
b) Variance within the samples
c) Variance about the samples (total variance for all
observations together)
Types of variances in two-way classification
a) Variance between samples due to the column variable
b) Variance between samples due to the row variable
c) Variance within the samples
d) Variance about the samples (total variance for all
observations together)
• Variance between samples is the net result
of the variation of the different sample means from
the mean of the means (grand mean). The grand
mean is the mean of all the sample means; when the
samples are of equal size, it is also the mean of all
the observations.
• Variance within samples is the net result of the
variations of the different items of the
samples from their respective sample means.
This is also called residual variance.
• Variance about the samples (total variance):
This refers to the variation of all the items of all
the samples from the grand mean.
• Variance between samples due to the column
variable and due to the row variable: If we work out
the variation of the different samples reading the
observations column-wise, we get the variance
between the samples due to the column variable.
Similarly, we can find the variance between
samples due to the row variable.
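The three kinds of variation in one-way classification can be computed directly from their definitions. The sketch below is only an illustration, assuming the yield data of the first worked problem in these notes; it works with sums of squares (the numerators of the variances) and takes the grand mean over all observations, which equals the mean of the sample means here because the samples are of equal size.

```python
samples = [
    [42, 50, 62, 34, 52],  # treatment A
    [48, 66, 68, 78, 70],  # treatment B
    [68, 52, 76, 64, 70],  # treatment C
    [80, 94, 78, 82, 66],  # treatment D
]

def mean(xs):
    return sum(xs) / len(xs)

grand_mean = mean([x for xs in samples for x in xs])

# Between samples: deviation of each sample mean from the grand mean,
# counted once for every item in that sample.
ss_between = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in samples)

# Within samples: deviation of each item from its own sample mean.
ss_within = sum((x - mean(xs)) ** 2 for xs in samples for x in xs)

# Total: deviation of each item from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for xs in samples for x in xs)

print(ss_between, ss_within, ss_total)
```

The total variation splits exactly into the between-samples and within-samples parts, which is why the within-samples figure can be found by subtraction.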
Short cut method
Procedure
1. Assume that the means of the samples are
equal, i.e., the effects of all factors are equal.
H0: µ1 = µ2 = µ3 = ... = µk
2. Find the correction factor, CF = T²/N, where T is the sum of all
the observations and N is the total number of observations.
3. Find the total sum of squares: SST = sum of squares of all
observations - CF.
4. Find the sum of squares between the samples:
SSC = (ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3 + ... - CF
(n1 stands for the number of observations in the first column,
n2 for the number in the second column, and so on.)
5. Find the sum of squares within the samples: SSE = SST - SSC.
6. Calculate the mean square between samples:
MSC = SSC/(k - 1)
(k stands for the number of columns, i.e., samples.)
7. Calculate the mean square within samples: MSE = SSE/(N - k)
(N stands for the total number of observations.)
8. Calculate the F ratio = MSC/MSE (take the larger value as the numerator).
9. Obtain the table value of F for (k - 1, N - k) degrees of freedom (if
MSE is larger than MSC, the degrees of freedom are (N - k, k - 1)).
If the calculated value of F is less than the table value, accept the
null hypothesis that the sample means are equal. That is, we
conclude that the influences of the factors are equal, or not
significantly different. If the calculated value is greater than the
table value, reject the null hypothesis.
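The steps above can be sketched as a small Python function. This is a minimal illustration, not a reference implementation: the function name, the dictionary keys and the toy data are hypothetical, and F is always computed as MSC/MSE, so the swap described in step 8 is left to the reader.

```python
def one_way_anova(samples):
    """Short-cut one-way ANOVA; `samples` is a list of lists of numbers."""
    values = [x for xs in samples for x in xs]
    n_total, k = len(values), len(samples)

    cf = sum(values) ** 2 / n_total                           # step 2: T^2 / N
    sst = sum(x * x for x in values) - cf                     # step 3
    ssc = sum(sum(xs) ** 2 / len(xs) for xs in samples) - cf  # step 4
    sse = sst - ssc                                           # step 5
    msc = ssc / (k - 1)                                       # step 6
    mse = sse / (n_total - k)                                 # step 7

    return {
        "CF": cf, "SST": sst, "SSC": ssc, "SSE": sse,
        "MSC": msc, "MSE": mse,
        "F": msc / mse,                                       # step 8 (no swap)
        "df": (k - 1, n_total - k),                           # step 9
    }

# Hypothetical toy data, chosen only to keep the arithmetic easy to follow.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

The calculated F is then compared with the table value for the degrees of freedom returned under "df".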
ANOVA TABLE
The ANOVA table presents the various results
obtained while carrying out the analysis of
variance. A specimen of an ANOVA table in one-way
classification is given below:
Source of variation   Sum of squares   Degrees of freedom   Mean square (MS)      Variance ratio (F)
Between samples       SSC              k - 1                MSC = SSC/(k - 1)     F = MSC/MSE
Within samples        SSE              N - k                MSE = SSE/(N - k)
Total                 SST              N - 1
Problems
• Below are given the yields (in kg) per acre
for 4 varieties of treatment. Carry out an
analysis of variance and state your
conclusion.
Treatment
A B C D
42 48 68 80
50 66 52 94
62 68 76 78
34 78 64 82
52 70 70 66
Answer
H0: The means of the samples are equal, i.e., the
effects of all treatments are equal.
H0: µ1 = µ2 = µ3 = µ4
H1: The means of the samples are not all equal, i.e.,
the effects of the treatments are not all equal.
          A       B       C       D       A²      B²      C²      D²
          42      48      68      80      1764    2304    4624    6400
          50      66      52      94      2500    4356    2704    8836
          62      68      76      78      3844    4624    5776    6084
          34      78      64      82      1156    6084    4096    6724
          52      70      70      66      2704    4900    4900    4356
ΣX        240     330     330     400     11968   22268   22100   32400
(ΣX)²     57600   108900  108900  160000
n         5       5       5       5
(ΣX)²/n   11520   21780   21780   32000
Sum of values, T = 240 + 330 + 330 + 400 = 1300; N = 20
Sum of squares of observations = 11968 + 22268 + 22100 + 32400 = 88736
CF = T²/N = 1300²/20 = 84500
SST = sum of squares of observations - CF = 88736 - 84500 = 4236
Sum of (ΣX)²/n = 11520 + 21780 + 21780 + 32000 = 87080
SSC = sum of (ΣX)²/n - CF = 87080 - 84500 = 2580
SSE = SST - SSC = 4236 - 2580 = 1656
MSC = SSC/(k - 1) = 2580/(4 - 1) = 2580/3 = 860
MSE = SSE/(N - k) = 1656/(20 - 4) = 1656/16 = 103.5
F = MSC/MSE = 860/103.5 = 8.3
ANOVA TABLE
Source of variation   Sum of squares   Degrees of freedom   Mean square (MS)                      Variance ratio (F)
Between samples       SSC = 2580       k - 1 = 3            MSC = SSC/(k - 1) = 2580/3 = 860      F = MSC/MSE = 860/103.5 = 8.3
Within samples        SSE = 1656       N - k = 16           MSE = SSE/(N - k) = 1656/16 = 103.5
Total                 SST = 4236       N - 1 = 19
Degrees of freedom = (k - 1, N - k) = (4 - 1, 20 - 4) = (3, 16)
The table value of F at the 5% level of significance for (3, 16) degrees of
freedom is 3.24. The calculated value of F is greater than the table value.
Therefore, the null hypothesis is rejected. The sample means are not all
equal; the treatments do not have the same effect.
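The last two steps, filling in the ANOVA table and comparing the calculated F with the table value, can be sketched as follows. The helper name `anova_table` is hypothetical; the sums of squares and the table value 3.24 are the figures worked out above.

```python
def anova_table(ssc, sse, k, n):
    """Return one-way ANOVA table rows built from the two sums of squares."""
    msc = ssc / (k - 1)
    mse = sse / (n - k)
    return [
        # (source, sum of squares, degrees of freedom, mean square, F)
        ("Between samples", ssc, k - 1, msc, msc / mse),
        ("Within samples", sse, n - k, mse, None),
        ("Total", ssc + sse, n - 1, None, None),
    ]

rows = anova_table(ssc=2580, sse=1656, k=4, n=20)
f_calculated = rows[0][4]
f_table = 3.24  # 5% table value for (3, 16) degrees of freedom, from the text

for source, ss, df, ms, f in rows:
    ms_text = "" if ms is None else f"{ms:.2f}"
    f_text = "" if f is None else f"{f:.2f}"
    print(f"{source:<16}{ss:>8}{df:>6}{ms_text:>10}{f_text:>8}")

# Decision rule: reject H0 when the calculated F exceeds the table value.
decision = "reject H0" if f_calculated > f_table else "accept H0"
print(decision)
```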
Problems
• To assess the significance of possible variation in
performance in a certain test between the schools of a
city, a common test was given to a number of students
taken at random from each of the four schools concerned.
The results are given below. Make an analysis of variance
of the data.
A B C D
8 12 18 13
10 11 12 9
12 9 16 12
8 14 6 16
7 4 8 15
Answer
H0: The means of the samples are equal, i.e., the
performance of the schools is equal.
H0: µ1 = µ2 = µ3 = µ4
H1: The means of the samples are not all equal,
i.e., the performance of the schools is not
equal.
          A      B      C      D      A²     B²     C²     D²
          8      12     18     13     64     144    324    169
          10     11     12     9      100    121    144    81
          12     9      16     12     144    81     256    144
          8      14     6      16     64     196    36     256
          7      4      8      15     49     16     64     225
ΣX        45     50     60     65     421    558    824    875
(ΣX)²     2025   2500   3600   4225
n         5      5      5      5
(ΣX)²/n   405    500    720    845
Sum of values, T = 45 + 50 + 60 + 65 = 220; N = 20
Sum of squares of observations = 421 + 558 + 824 + 875 = 2678
CF = T²/N = 220²/20 = 2420
SST = 2678 - CF = 2678 - 2420 = 258
Sum of (ΣX)²/n = 405 + 500 + 720 + 845 = 2470
SSC = 2470 - 2420 = 50
SSE = SST - SSC = 258 - 50 = 208
MSC = SSC/(k - 1) = 50/3 = 16.67
MSE = SSE/(N - k) = 208/16 = 13
F = MSC/MSE = 16.67/13 = 1.28
Degrees of freedom = (3, 16)
Table value = 3.24
Conclusion: H0 is accepted
ANOVA TABLE
Source of variation   Sum of squares   Degrees of freedom   Mean square (MS)              Variance ratio (F)
Between samples       SSC = 50         k - 1 = 3            MSC = SSC/(k - 1) = 16.67     F = MSC/MSE = 16.67/13 = 1.28
Within samples        SSE = 208        N - k = 16           MSE = SSE/(N - k) = 13
Total                 SST = 258        N - 1 = 19
Degrees of freedom = (k - 1, N - k) = (4 - 1, 20 - 4) = (3, 16)
The table value of F at the 5% level of significance for (3, 16) degrees of
freedom is 3.24. The calculated value of F is less than the table value.
Therefore, the null hypothesis is accepted. The sample means do not differ
significantly; the performance of the schools is equal.
Qn
• Three varieties of crops, A, B and C, are tested in a
randomised block design with four replications.
The yields (in kg) are given below. Test whether
there is a difference between the varieties.
Variety   Replications              Total
          1st   2nd   3rd   4th
A         6     4     8     6       24
B         7     6     6     9       28
C         8     5     10    9       32
Total     21    15    24    24      84
Answer
H0: The means of the samples are equal, i.e., the yields of the varieties are equal.
H0: µ1 = µ2 = µ3
H1: The means of the samples are not all equal, i.e., the yields of the varieties are not
equal.
CF = T²/N = 84²/12 = 588
Sum of squares of observations = 624
SST = 624 - 588 = 36
SSC (between varieties) = (24² + 28² + 32²)/4 - 588 = 596 - 588 = 8
SSE = SST - SSC = 36 - 8 = 28
MSC = SSC/(k - 1) = 8/2 = 4
MSE = SSE/(N - k) = 28/9 = 3.1
F = MSC/MSE = 4/3.1 = 1.29
Degrees of freedom = (2, 9)
Table value = 4.26
Conclusion: The null hypothesis is accepted. There is no significant difference between
the varieties in terms of their yield.
Homework
• Exercises 10,11 and 13
Two way analysis of variance
• In the one-way classification model we consider
only a single factor. There are, however, many
situations in which the response variable of
interest may be affected by more than one
factor, and we want to know the effect of each
of them. In two-way classification,
the data are classified according to two
different criteria or factors.
• SST = SSC + SSR + SSE
• The sum of squares between columns plus the
sum of squares between rows plus the residual
sum of squares equals the total sum of
squares.
Short cut method
Procedure
1. a) Assume the means of all columns are equal.
b) Assume the means of all rows are equal.
2. Find the correction factor, CF = T²/N, where T is the
sum of all the observations and N is the total number of
observations.
3. Find the total sum of squares: SST = sum of squares
of all observations - CF.
4. Find the sum of squares between the columns:
SSC = (ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3 + ... - CF, where ΣX1, ΣX2, ... are the column
totals.
5. Find the sum of squares between the rows:
SSR = (ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3 + ... - CF, where ΣX1, ΣX2, ... are the row totals.
6. Find the residual sum of squares:
SSE = SST - SSC - SSR
7. Calculate the mean square between columns, MSC = SSC/(c - 1), and the
mean square between rows, MSR = SSR/(r - 1).
8. Calculate the residual mean square: MSE = SSE/[(c - 1)(r - 1)].
9. Calculate the F ratios:
FC = MSC/MSE and FR = MSR/MSE (note that the bigger value should be the
numerator).
10. Obtain the table value of FC for (c - 1, (c - 1)(r - 1)) degrees of freedom (where c is the
number of columns and r is the number of rows).
If the calculated value of FC is less than the table value of F, we accept the null hypothesis that
the column means are equal.
Obtain the table value of FR for (r - 1, (c - 1)(r - 1)) degrees of freedom.
If the calculated value of FR is less than the table value of F, we accept the null hypothesis that
the row means are equal.
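The two-way procedure can be sketched in Python along the same lines. This is only an illustration for a table with one observation per cell: the function name, the dictionary keys and the 3 x 2 toy table are hypothetical, and both F ratios are computed with MSE as the denominator, so the "bigger value as numerator" rule in step 9 is not applied.

```python
def two_way_anova(table):
    """Short-cut two-way ANOVA; `table` is a list of rows, one value per cell."""
    r, c = len(table), len(table[0])
    n_total = r * c

    cf = sum(map(sum, table)) ** 2 / n_total                 # step 2: T^2 / N
    sst = sum(x * x for row in table for x in row) - cf      # step 3

    col_totals = [sum(row[j] for row in table) for j in range(c)]
    row_totals = [sum(row) for row in table]
    ssc = sum(t * t for t in col_totals) / r - cf            # step 4: r values per column
    ssr = sum(t * t for t in row_totals) / c - cf            # step 5: c values per row
    sse = sst - ssc - ssr                                    # step 6

    msc = ssc / (c - 1)                                      # step 7
    msr = ssr / (r - 1)
    mse = sse / ((c - 1) * (r - 1))                          # step 8

    return {
        "SST": sst, "SSC": ssc, "SSR": ssr, "SSE": sse,
        "MSC": msc, "MSR": msr, "MSE": mse,
        "FC": msc / mse, "FR": msr / mse,                    # step 9 (no swap)
    }

# Hypothetical toy table with 3 rows and 2 columns.
result = two_way_anova([[4, 6], [6, 8], [5, 13]])
print(result)
```

In the toy table the three sums of squares add up to the total, as the identity SST = SSC + SSR + SSE requires.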
ANOVA TABLE
Source of variation   Sum of squares   Degrees of freedom   Mean square                    F ratio
Between columns       SSC              c - 1                MSC = SSC/(c - 1)              FC = MSC/MSE
Between rows          SSR              r - 1                MSR = SSR/(r - 1)              FR = MSR/MSE
Residual              SSE              (c - 1)(r - 1)       MSE = SSE/[(c - 1)(r - 1)]
Total                 SST              N - 1
1 Make a null hypothesis
2 Find out the correction factor
3 Find SST
4 Find out SSC
5 Find out SSR
6 Find SSE
7 Find out MSC
8 Find out MSR
9 Find out MSE
10 Find FC
11 Find FR
12 Prepare an ANOVA table
13 Obtain the table values of F for the degrees of freedom
14 Conclusion
Problems
• Apply the technique of ANOVA (two-way) to the following data relating
to the yield of 4 varieties (A, B, C and D) in respect of 5 different fertilizers
(F1, F2, F3, F4 and F5).
A B C D
F1 8 12 18 13
F2 10 11 12 9
F3 12 9 16 12
F4 8 14 6 16
F5 7 4 8 15
Answer
Null Hypothesis:
1 a)Means of all columns (varieties) are equal
b)Means of all rows (Fertilizers) are equal
          A      B      C      D      ΣX     (ΣX)²   n     (ΣX)²/n
F1        8      12     18     13     51     2601    4     650.25
F2        10     11     12     9      42     1764    4     441
F3        12     9      16     12     49     2401    4     600.25
F4        8      14     6      16     44     1936    4     484
F5        7      4      8      15     34     1156    4     289
ΣX        45     50     60     65     220            20    2464.5
(ΣX)²     2025   2500   3600   4225
n         5      5      5      5      20
(ΣX)²/n   405    500    720    845    2470
Squares of the observations:
          A²     B²     C²     D²
F1        64     144    324    169
F2        100    121    144    81
F3        144    81     256    144
F4        64     196    36     256
F5        49     16     64     225
Total     421    558    824    875
Sum of squares of all observations = 421 + 558 + 824 + 875 = 2678
2. Find the correction factor, CF = T²/N
T = 220
N = 20
CF = T²/N = (220 x 220)/20 = 2420
3. Find the total sum of squares: SST = sum of squares of all
observations - CF
= 2678 - 2420 = 258
4. SSC = 2470 - CF = 2470 - 2420 = 50
5. SSR = 2464.5 - CF = 2464.5 - 2420 = 44.5
6. SSE = SST - SSC - SSR = 258 - 50 - 44.5 = 163.5
7. MSC = SSC/(c - 1) = 50/3 = 16.67
8. MSR = SSR/(r - 1) = 44.5/4 = 11.125
9. MSE = SSE/[(c - 1)(r - 1)] = 163.5/(3 x 4) = 163.5/12 = 13.625
10. FC = MSC/MSE = 16.67/13.625 = 1.22
11. FR = MSE/MSR = 13.625/11.125 = 1.22 (MSE is larger than MSR, so it is taken as the numerator)
12. Prepare an ANOVA table
Source of variation   Sum of squares   Degrees of freedom              Mean square                            F ratio
Between columns       SSC = 50         c - 1 = 4 - 1 = 3               MSC = SSC/(c - 1) = 16.67              FC = MSC/MSE = 1.22
Between rows          SSR = 44.5       r - 1 = 5 - 1 = 4               MSR = SSR/(r - 1) = 11.125             FR = MSE/MSR = 1.22
Residual              SSE = 163.5      (c - 1)(r - 1) = 3 x 4 = 12     MSE = SSE/[(c - 1)(r - 1)] = 13.625
Total                 SST = 258        N - 1 = 19
13. The table value of FC for (3, 12) degrees of freedom = 3.49.
The calculated value of FC is less than the table value of F, so
we accept the null hypothesis that the column means
(varieties) are equal.
The table value of FR for (12, 4) degrees of freedom = 5.91
(the degrees of freedom are reversed because MSE is the numerator).
The calculated value of FR is less than the table value of F, so
we accept the null hypothesis that the row means
(fertilizers) are equal.
• Qn: Three varieties of crops, A, B and C, are tested in a
randomised block design with four replications. The
yields are given below. Test whether there is a
difference between the varieties. Test also whether the yields
of the different replications differ significantly.
Variety Replications Total
1st 2nd 3rd 4th
A 6 4 8 6 24
B 7 6 6 9 28
C 8 5 10 9 32
Total 21 15 24 24 84
Coding method
• The coding method refers to the addition,
subtraction, multiplication or division of the data by a
constant or common factor. In the computation of the
analysis of variance, the final quantity tested is a
ratio, hence dimensionless. Thus, the original
values can be coded to simplify the calculations
without the need for any subsequent adjustment of
the results. The coding method can be applied to both
one-way and two-way analysis.
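A quick check of this property in Python, assuming the yield data of the first one-way worked problem in these notes. The coding chosen here (subtract 60, then divide by 2) and the helper name `f_ratio` are arbitrary choices made for the illustration.

```python
def f_ratio(samples):
    """F = MSC/MSE for a one-way layout, using the short-cut method."""
    values = [x for xs in samples for x in xs]
    n_total, k = len(values), len(samples)
    cf = sum(values) ** 2 / n_total
    sst = sum(x * x for x in values) - cf
    ssc = sum(sum(xs) ** 2 / len(xs) for xs in samples) - cf
    sse = sst - ssc
    return (ssc / (k - 1)) / (sse / (n_total - k))

original = [
    [42, 50, 62, 34, 52],
    [48, 66, 68, 78, 70],
    [68, 52, 76, 64, 70],
    [80, 94, 78, 82, 66],
]

# Code the data: subtract a constant, then divide by a common factor.
coded = [[(x - 60) / 2 for x in xs] for xs in original]

print(f_ratio(original), f_ratio(coded))
```

The sums of squares of the coded data are smaller, but both are scaled by the same factor, so the ratio F comes out the same for the original and the coded values.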
• Qn: The following data relate to the yield of four
varieties of wheat each sown on 5 plots. Find
whether there is a significant difference between
the mean yield of these varieties
Plot A B C D
1 99 103 109 104
2 101 102 103 100
3 103 100 107 103
4 99 105 97 107
5 98 95 99 106
Ex 23
A certain company had four salesmen A, B, C, D each of
whom was sent for a month to three types of areas
countryside K, outskirts of a city O and shopping centre of
the City S. The sales in hundreds of Rupees per month are
shown below.
City/Salesman A B C D
K 30 70 30 30
O 80 50 40 70
S 100 60 80 80
Carry out an analysis of variance and interpret the results.
Problems
• To assess the significance of possible variation in performance
in a certain test between the schools of a city, a common test
was given to a number of students taken at random from each
of the four schools concerned. The results are given below:
Make an analysis of variance of data.
A B C D
8 12 18 13
10 11 12 9
12 9 16 12
8 14 6 16
7 4 8 15
Solution (direct method)
H0: µ1 = µ2 = µ3 = µ4
          A      B      C      D
          8      12     18     13
          10     11     12     9
          12     9      16     12
          8      14     6      16
          7      4      8      15
Mean      9      10     12     13
Grand mean = (9 + 10 + 12 + 13)/4 = 11
SSC (between samples):
A: 5 x (9 - 11)² = 20
B: 5 x (10 - 11)² = 5
C: 5 x (12 - 11)² = 5
D: 5 x (13 - 11)² = 20
SSC = 20 + 5 + 5 + 20 = 50
SSE (within samples):
A: (8 - 9)² + (10 - 9)² + (12 - 9)² + (8 - 9)² + (7 - 9)² = 16
B: (12 - 10)² + (11 - 10)² + (9 - 10)² + (14 - 10)² + (4 - 10)² = 58
C: (18 - 12)² + (12 - 12)² + (16 - 12)² + (6 - 12)² + (8 - 12)² = 104
D: (13 - 13)² + (9 - 13)² + (12 - 13)² + (16 - 13)² + (15 - 13)² = 30
SSE = 16 + 58 + 104 + 30 = 208
SST = SSC + SSE = 50 + 208 = 258
MSC = SSC/(k - 1) = 50/3 = 16.67
MSE = SSE/(N - k) = 208/16 = 13
F = MSC/MSE = 16.67/13 = 1.28
Table value of F for (3, 16) degrees of freedom at the 5% level of significance = 3.24
ANOVA TABLE
Source of variation   Sum of squares   Degrees of freedom   Mean square (MS)              Variance ratio (F)
Between samples       SSC = 50         k - 1 = 3            MSC = SSC/(k - 1) = 16.67     F = MSC/MSE = 16.67/13 = 1.28
Within samples        SSE = 208        N - k = 16           MSE = SSE/(N - k) = 13
Total                 SST = 258        N - 1 = 19