
ANOVA (Analysis of Variance)

Dr D Nabirasool
Assistant Professor
Indian Institute of Plantation Management Bangalore (IIPMB)
(An Autonomous Organization of the Ministry of Commerce & Industry-GOI)
Contents

• Introduction
• Types of ANOVA
• Principle of ANOVA
• Techniques involved in ANOVA
• One way ANOVA
• Two way ANOVA
• Application
• Reference
ANALYSIS OF VARIANCE (ANOVA)

• Analysis of variance (abbreviated as ANOVA) is an extremely
useful technique for research in economics, biology, education,
psychology, sociology, business/industry and several other
disciplines.
• This technique is used when multiple sample cases are
involved.
• The ANOVA technique enables us to examine the
significance of the differences among more than two
sample means at the same time.
• Using this technique, one can draw inferences about
whether the samples have been drawn from populations
having the same mean.
WHAT IS ANOVA?
• ANOVA is a procedure for testing the difference among
different groups of data for homogeneity.
• Professor R.A. Fisher was the first to use the term
‘Variance’.
• Variance is an important statistical measure and is
described as the mean of the squares of deviations
taken from the mean of the given series of data. It is a
frequently used measure of variation. The square of the
standard deviation is called variance,
• i.e., $\text{Variance} = (\text{standard deviation})^2$.
• There may be variation between samples and also
within sample items.
• An ANOVA test is a way to find out whether survey or
experiment results are significant.
• In other words, ANOVA helps us to decide whether to
reject the null hypothesis in favour of the alternative
hypothesis.
• Basically, we’re testing groups to see if there’s a
difference between them.
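The definition of variance given above can be checked numerically. The following is a minimal Python sketch; the four data values are hypothetical and chosen only for illustration, they do not come from these slides.

```python
import statistics

# Hypothetical data values, chosen only for illustration.
data = [6, 7, 3, 8]

mean = sum(data) / len(data)

# Variance as defined above: the mean of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Population standard deviation from the standard library.
sd = statistics.pstdev(data)

print(variance)  # 3.5
print(sd ** 2)   # equals the variance, up to floating-point rounding
```

Note that this sketch divides by the number of items, as in the definition above. The mean squares used in ANOVA divide a sum of squares by its degrees of freedom instead.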
Examples:
• A group of psychiatric patients are trying three
different therapies: counseling, medication and
biofeedback. We want to see if one therapy is
better than the others.
• A manufacturer has two different processes to
make light bulbs. They want to know if one
process is better than the other.
• Students from different colleges take the same
exam. You want to see if one college outperforms
the others.
Types of ANOVA

ANOVA is of two types:

One way ANOVA: only one factor is investigated.
• one independent variable (with two or more levels)
• e.g. an analysis of variance with one IV (brand of cereal)
Two way ANOVA: two factors are investigated at the same
time.
• two independent variables (each can have multiple levels)
• e.g. an analysis of variance with two IVs (brand of cereal,
calories)
I. Two way ANOVA without replication
II. Two way ANOVA with replication
Two way ANOVA without replication
• We are testing one set of individuals before
and after they take a medication to see if it
works or not.
Two way ANOVA with replication
• Two groups, and the members of those groups
are doing more than one thing.
• For example, two groups of patients from
different hospitals trying two different
therapies.
What are levels?
Levels are the different groups (categories) within a factor that are compared.
• Brand of cereal: Lucky Charms, Raisin Bran, Cornflakes (a total of three levels)
• Calories: sweetened, unsweetened (a total of two levels)
PRINCIPLE OF ANOVA
• We have to make two estimates of the population
variance, viz., one based on the between-samples variance
and the other based on the within-samples variance. The
two estimates of population variance are then
compared with the F-test, wherein we work out:

$$F = \frac{\text{Estimate of population variance based on between-samples variance}}{\text{Estimate of population variance based on within-samples variance}}$$

• This value of F is compared to the F-limit (table value)
for the given degrees of freedom. If the F value we
work out equals or exceeds the F-limit value,
we may say that there are significant
differences between the sample means.
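The comparison described above can be sketched in a few lines of Python. The function names `f_ratio` and `is_significant` are hypothetical, introduced only for this illustration. The inputs to `f_ratio` are made-up variance estimates; the F values and table values passed to `is_significant` are the ones quoted in the conclusions of this presentation.

```python
def f_ratio(ms_between: float, ms_within: float) -> float:
    """F = between-samples variance estimate / within-samples variance estimate."""
    return ms_between / ms_within


def is_significant(f_calculated: float, f_table: float) -> bool:
    """True when the calculated F equals or exceeds the F-limit (table value)."""
    return f_calculated >= f_table


# Made-up variance estimates, only to show the division.
print(f_ratio(3.0, 2.0))          # 1.5

# Calculated F and table values quoted in the conclusions of this deck.
print(is_significant(1.5, 4.26))  # False: difference could be due to chance
print(is_significant(6.0, 4.76))  # True: difference is significant
```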
Steps in ANOVA Technique
I. Obtain the mean of each sample
II. Work out the mean of the sample means
III. Calculate sum of squares for variance between the
samples (or SS between).
IV. Obtain variance or mean square (MS) between
samples
V. Calculate sum of squares for variance within samples
(or SS within).
VI. Obtain the variance or mean square (MS) within
samples
VII. Find sum of squares of deviations for total variance
VIII. Finally, find F-ratio
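The eight steps above can be followed in a short Python sketch. The data below are an assumption: three samples of four items each, chosen for illustration so that the degrees of freedom come out as v1 = 2 and v2 = 9 and the F-ratio comes out as 1.5. They are not taken from a table in these slides.

```python
# Illustrative data (an assumption): three samples of four items each.
samples = {
    "A": [6, 7, 3, 8],
    "B": [5, 5, 3, 7],
    "C": [5, 4, 3, 4],
}

k = len(samples)                           # number of samples
n = sum(len(v) for v in samples.values())  # total number of items

# I. Mean of each sample
means = {name: sum(v) / len(v) for name, v in samples.items()}

# II. Mean of the sample means (equals the grand mean because sizes are equal)
grand_mean = sum(means.values()) / k

# III. SS between: sample size times squared deviation of each sample mean
ss_between = sum(len(v) * (means[name] - grand_mean) ** 2
                 for name, v in samples.items())

# IV. MS between = SS between / (k - 1)
ms_between = ss_between / (k - 1)

# V. SS within: squared deviations of items from their own sample mean
ss_within = sum((x - means[name]) ** 2
                for name, v in samples.items() for x in v)

# VI. MS within = SS within / (n - k)
ms_within = ss_within / (n - k)

# VII. SS total: squared deviations of all items from the grand mean
ss_total = sum((x - grand_mean) ** 2 for v in samples.values() for x in v)

# VIII. F-ratio
f_value = ms_between / ms_within

print(ss_between, ss_within, ss_total)  # 8.0 24.0 32.0
print(round(f_value, 2))                # 1.5
```

As a check on the arithmetic, SS total should equal SS between plus SS within.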
One way ANOVA
The null hypothesis for the test is that the group means
are equal. Therefore, a significant result means
that at least one group mean differs from the others.
Situation 1: We might be studying the effects of tea
on weight loss and form three groups: green tea,
black tea, and no tea.
Situation 2: We might be studying leg strength of
people according to weight. We could split
participants into weight categories (obese,
overweight and normal) and measure their leg
strength on a weight machine.
Conclusion
• The above table shows that the calculated value
of F is 1.5, which is less than the table value of
4.26 at the 5% level with d.f. being v1 = 2 and v2 = 9,
and hence could have arisen due to chance.
• This analysis supports the null hypothesis of no
difference in sample means.
• The difference in wheat output due to varieties
is insignificant and is just a matter of chance.
• It means that the varieties do not differ much
from each other.
Two way ANOVA

• Is used when the data are classified on the basis
of two factors.
For example:
• The agricultural output may be classified on the
basis of different varieties of seeds and also on
the basis of different varieties of fertilizers
used.
• A business firm may have its sales data
classified on the basis of different salesmen and
also on the basis of sales in different regions.
• Two way ANOVA is used when we have
one measurement variable (i.e. a quantitative variable)
and two nominal variables.
• In other words, our experiment has a quantitative
outcome and two categorical explanatory
variables.
For example:
• We might want to find out if there is an interaction
between income and gender for anxiety level at job
interviews.
• The anxiety level is the outcome, i.e. the variable that is
measured.
• Gender and income are the two categorical variables.
The factors can be split into levels.
In the above example,

• Income level could be split into three levels:
low, middle and high income.
• Gender could be split into three levels: male,
female, and transgender.
• Treatment groups are all possible
combinations of the factor levels. In this example
there would be 3 x 3 = 9 treatment groups.
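The treatment groups in this example can be enumerated with a short Python sketch. The variable names are hypothetical; the levels are the ones listed above.

```python
from itertools import product

# Levels of the two factors, as listed in the example above.
income_levels = ["low", "middle", "high"]
gender_levels = ["male", "female", "transgender"]

# Every combination of one income level with one gender level
# is one treatment group.
treatment_groups = list(product(income_levels, gender_levels))

print(len(treatment_groups))  # 9
for income, gender in treatment_groups:
    print(income, gender)
```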
There are two null hypotheses if we place one observation in each cell.
For this example, those hypotheses would be:

H01: All the income groups have equal mean anxiety level.
H02: All the gender groups have equal mean anxiety level.

For multiple observations in cells, we would also be
testing a third hypothesis:
H03: The factors are independent, i.e. the interaction
effect does not exist.
The various steps involved are as follows:
(i) Take the total of the values of individual
items (or their coded values, as the case may
be) in all the samples and call it T:

T = sum of all the individual values

(ii) Work out the correction factor as under:

$$\text{correction factor} = \frac{T^2}{n}$$

where n is the total number of items in all the samples.
• MS residual, or the residual variance, provides
the basis for the F-ratios concerning variation
between columns treatment and between
rows treatment.
• MS residual is always due to the fluctuations
of sampling and hence serves as the basis for
the significance test.
• Both F-ratios are compared with their
corresponding table values for the given degrees
of freedom at a specified level of significance.
• If the calculated F-ratio
concerning variation between columns is
equal to or greater than its table value, then
the difference among column means is considered
significant.
• Similarly, the F-ratio concerning variation
between rows can be interpreted.
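A two way ANOVA without replication can be sketched in Python using the total T and the correction factor from the steps above. This is a sketch under stated assumptions: the sums of squares follow the usual correction-factor method, which these slides do not spell out beyond step (ii), and the 4 x 3 data table is illustrative, chosen so that the F-ratios come out as 4 (columns) and 6 (rows).

```python
# Illustrative data (an assumption): 4 rows x 3 columns,
# one observation per cell.
table = [
    [6, 5, 5],
    [7, 5, 4],
    [3, 3, 3],
    [8, 7, 4],
]
r = len(table)     # number of rows
c = len(table[0])  # number of columns
n = r * c          # total number of items

# (i) Total of all individual values
T = sum(sum(row) for row in table)

# (ii) Correction factor = T^2 / n
correction_factor = T ** 2 / n

# Total SS: sum of squared items minus the correction factor
ss_total = sum(x ** 2 for row in table for x in row) - correction_factor

# SS between columns: squared column totals over items per column
col_totals = [sum(row[j] for row in table) for j in range(c)]
ss_columns = sum(t ** 2 / r for t in col_totals) - correction_factor

# SS between rows: squared row totals over items per row
row_totals = [sum(row) for row in table]
ss_rows = sum(t ** 2 / c for t in row_totals) - correction_factor

# Residual SS is what remains of the total
ss_residual = ss_total - ss_columns - ss_rows

# Mean squares: each SS divided by its degrees of freedom
ms_columns = ss_columns / (c - 1)
ms_rows = ss_rows / (r - 1)
ms_residual = ss_residual / ((c - 1) * (r - 1))

# Both F-ratios use MS residual as the denominator
f_columns = ms_columns / ms_residual
f_rows = ms_rows / ms_residual

print(round(f_columns, 2), round(f_rows, 2))  # 4.0 6.0
```

With these data the degrees of freedom are (2, 6) for columns and (3, 6) for rows.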
Conclusion

• The above table shows that the calculated value of F
between columns is 4, which is less than the table value of
5.14 at the 5% level with d.f. being v1 = 2 and v2 = 6.

a) This difference is not significant, so the analysis supports
the null hypothesis of no difference in column means.

• The F value between rows
is 6, which is higher than the table value of 4.76 at the 5% level
with d.f. being v1 = 3 and v2 = 6.

b) This difference is significant, so the analysis rejects the
null hypothesis of no difference in row means.
• It means that differences concerning varieties
of seeds are insignificant at the 5% level, as the
calculated F-ratio of 4 is less than the table
value of 5.14, but the variety differences
concerning fertilizers are significant, as the
calculated F-ratio of 6 is more than its table
value of 4.76.
• The fertilizers act differently on output, whereas the different
varieties of seeds do not differ significantly.
