
Class Seminar Paper: Biostatistics

Topic: ANOVA (Analysis of Variance)

Submitted by:
MD. JIYAUL MUSTAFA
(M.Sc. Biotech. Ist Sem.)

DEPARTMENT OF BIOTECHNOLOGY
Contents
Introduction
Types of ANOVA
Principle of ANOVA
Techniques involved in ANOVA
One way ANOVA
Two way ANOVA
Application
Reference
ANALYSIS OF VARIANCE (ANOVA)
• Analysis of variance (abbreviated as ANOVA) is an extremely useful technique for research in economics, biology, education, psychology, sociology, business/industry and several other disciplines.

• This technique is used when multiple sample cases are involved.

• The ANOVA technique enables us to test the significance of the differences among more than two sample means at the same time.

• Using this technique, one can draw inferences about whether the samples have been drawn from populations having the same mean.
WHAT IS ANOVA?

• ANOVA is a procedure for testing the differences among different groups of data for homogeneity.

• Professor R.A. Fisher was the first man to use the term ‘Variance’.

• Variance is an important statistical measure, described as the mean of the squares of the deviations taken from the mean of the given series of data. It is a frequently used measure of variation. The square of the standard deviation is called the variance, i.e. $\text{Variance} = (\text{standard deviation})^2$.

• There may be variation between samples and also within sample items.
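The definition of variance given above can be checked with a few lines of Python. This is only a minimal sketch: the four data values are illustrative, the function names are made up for this example, and the population form (dividing by the number of items) is assumed because the definition speaks of the mean of the squared deviations.

```python
import math


def variance(values):
    """Mean of the squared deviations from the mean (population form)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


data = [6, 7, 3, 8]  # illustrative values only
var = variance(data)
sd = standard_deviation(data)

print(var)      # variance of the series
print(sd ** 2)  # square of the standard deviation, equal up to rounding
```

Squaring the standard deviation gives back the variance, which is the relation stated above.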
• An ANOVA test is a way to find out whether survey or experiment results are significant.

• In other words, ANOVA helps us to decide whether to reject the null hypothesis in favour of the alternative hypothesis.

• Basically, we are testing groups to see if there is a difference between them.
Examples:
• A group of psychiatric patients are trying three different therapies: counseling, medication and biofeedback. We want to see if one therapy is better than the others.

• A manufacturer has two different processes to make light bulbs. They want to know if one process is better than the other.

• Students from different colleges take the same exam. You want to see if one college outperforms the others.
Types of ANOVA

ANOVA is of two types:

• One way ANOVA: only one factor is investigated, i.e. there is one independent variable (IV) with two or more levels. For example, the analysis could have one IV (brand of cereal).

• Two way ANOVA: two factors are investigated at the same time, i.e. there are two independent variables, each of which can have multiple levels. For example, the analysis could have two IVs (brand of cereal, calories).

I. Two way ANOVA without replication

For example, we are testing one set of individuals before and after they take a medication to see if it works or not.

II. Two way ANOVA with replication

There are two groups, and the members of those groups are doing more than one thing. For example, two groups of patients from different hospitals trying two different therapies.
What are levels?

Levels are the different groups being compared within one independent variable.

• Brand of cereal: Lucky Charms, Raisin Bran, Cornflakes (a total of three levels)

• Calories: sweetened, unsweetened (a total of two levels)
PRINCIPLE OF ANOVA

• We have to make two estimates of the population variance: one based on the between-samples variance and the other based on the within-samples variance. The two estimates are then compared with the F-test, wherein we work out

$$F = \frac{\text{Estimate of population variance based on between-samples variance}}{\text{Estimate of population variance based on within-samples variance}}$$

• This value of F is compared to the F-limit for the given degrees of freedom. If the F value we work out equals or exceeds the F-limit value, we may say that there are significant differences between the sample means.
ANOVA Technique

I. Obtain the mean of each sample.
II. Work out the mean of the sample means.
III. Calculate the sum of squares for variance between the samples (or SS between).
IV. Obtain the variance or mean square (MS) between samples.
V. Calculate the sum of squares for variance within samples (or SS within).
VI. Obtain the variance or mean square (MS) within samples.
VII. Find the sum of squares of deviations for total variance.
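The steps above can be sketched as a small Python function. This is an illustration under stated assumptions, not a reference implementation: the function name `one_way_anova`, the dictionary keys and the nine data values are all made up for the example. The grand mean is taken over all items, which equals the mean of the sample means when every sample has the same size.

```python
def one_way_anova(samples):
    """Follow steps I-VII for a list of samples (each a list of numbers)."""
    k = len(samples)                  # number of samples
    n = sum(len(s) for s in samples)  # total number of items

    # I. Mean of each sample.
    means = [sum(s) / len(s) for s in samples]

    # II. Grand mean (equals the mean of the sample means for equal sample sizes).
    grand_mean = sum(sum(s) for s in samples) / n

    # III-IV. SS between samples and the corresponding mean square.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ms_between = ss_between / (k - 1)

    # V-VI. SS within samples and the corresponding mean square.
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    ms_within = ss_within / (n - k)

    # VII. Sum of squares of deviations for total variance.
    ss_total = sum((x - grand_mean) ** 2 for s in samples for x in s)

    return {
        "ss_between": ss_between,
        "ms_between": ms_between,
        "ss_within": ss_within,
        "ms_within": ms_within,
        "ss_total": ss_total,
        "f_ratio": ms_between / ms_within,
    }


# Illustrative data: three samples of three items each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(result)
```

For this made-up data, SS between and SS within are both 6, SS total is 12, and the F-ratio is 3.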
One way ANOVA
• The null hypothesis for the test is that the means are equal. Therefore, a significant result means that the means are not all equal.

• Situation 1: We might be studying the effects of tea on weight loss and form three groups: green tea, black tea, and no tea.

• Situation 2: We might be studying leg strength of people according to weight. We could split participants into weight categories (obese, overweight and normal) and measure their leg strength on a weight machine.
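Table: Setup for one way ANOVA

Source of variation             Sum of squares (SS)   Degrees of freedom (d.f.)   Mean square (MS)        F-ratio
Between samples or categories   SS between            (k - 1)                     SS between / (k - 1)    MS between / MS within
Within samples or categories    SS within             (n - k)                     SS within / (n - k)
Total                           SS total              (n - 1)

where

SS between = $n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + \dots + n_k(\bar{X}_k - \bar{\bar{X}})^2$

SS within = $\sum (X_{1i} - \bar{X}_1)^2 + \dots + \sum (X_{ki} - \bar{X}_k)^2$

SS total = $\sum (X_{ij} - \bar{\bar{X}})^2$, with i = 1, 2, ... and j = 1, 2, ...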
Example 1
Set up an analysis of variance table for the following per acre production data for three varieties of wheat, each grown on 4 plots, and state if the variety differences are significant.

Per acre production data

Plot of land    Variety of wheat
                A    B    C
1               6    5    5
2               7    5    4
3               3    3    3
4               8    7    4
Step 1: Obtain the mean of each sample, i.e. obtain

$\bar{X}_1 = \frac{6 + 7 + 3 + 8}{4} = \frac{24}{4} = 6$

$\bar{X}_2 = \frac{5 + 5 + 3 + 7}{4} = \frac{20}{4} = 5$

$\bar{X}_3 = \frac{5 + 4 + 3 + 4}{4} = \frac{16}{4} = 4$
Step 2: Find the mean of the sample means

$\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3}{k} = \frac{6 + 5 + 4}{3} = 5$
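Step 3: Now we work out SS between and SS within samples

SS between = $n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + n_3(\bar{X}_3 - \bar{\bar{X}})^2$

= $4(6-5)^2 + 4(5-5)^2 + 4(4-5)^2$

= 4 + 0 + 4 = 8

SS within = $\sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2 + \sum (X_{3i} - \bar{X}_3)^2$

= $(6-6)^2 + (7-6)^2 + (3-6)^2 + (8-6)^2$
+ $(5-5)^2 + (5-5)^2 + (3-5)^2 + (7-5)^2$
+ $(5-4)^2 + (4-4)^2 + (3-4)^2 + (4-4)^2$

= (0 + 1 + 9 + 4) + (0 + 0 + 4 + 4) + (1 + 0 + 1 + 0)

= 14 + 8 + 2 = 24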
Now calculate the total variance

SS for total variance = $\sum (X_{ij} - \bar{\bar{X}})^2$

= $(6-5)^2 + (7-5)^2 + (3-5)^2 + (8-5)^2$
+ $(5-5)^2 + (5-5)^2 + (3-5)^2 + (7-5)^2$
+ $(5-5)^2 + (4-5)^2 + (3-5)^2 + (4-5)^2$

= 1 + 4 + 4 + 9 + 0 + 0 + 4 + 4 + 0 + 1 + 4 + 1

= 32

Alternatively, SS for total variance can also be worked out thus:
SS for total = SS between + SS within = 8 + 24 = 32
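As a cross-check of the arithmetic in Example 1, the following Python sketch recomputes the three sums of squares directly from the wheat data. The variable names are chosen for this illustration only.

```python
# Per acre production for wheat varieties A, B and C on four plots (Example 1).
varieties = {
    "A": [6, 7, 3, 8],
    "B": [5, 5, 3, 7],
    "C": [5, 4, 3, 4],
}

all_values = [x for plots in varieties.values() for x in plots]
grand_mean = sum(all_values) / len(all_values)

ss_between = 0.0
ss_within = 0.0
for plots in varieties.values():
    mean = sum(plots) / len(plots)
    # Contribution of this variety to the between-samples sum of squares.
    ss_between += len(plots) * (mean - grand_mean) ** 2
    # Squared deviations of each plot from its own variety mean.
    ss_within += sum((x - mean) ** 2 for x in plots)

ss_total = sum((x - grand_mean) ** 2 for x in all_values)

print(ss_between, ss_within, ss_total)  # 8.0 24.0 32.0
```

The output confirms that SS total equals SS between plus SS within for this data.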
Table: One way ANOVA

Source of variation   SS    d.f.            MS             F-ratio           5% F-limit (from the F table)
Between samples       8     (3 - 1) = 2     8/2 = 4        4/2.67 = 1.5      F(2, 9) = 4.26
Within samples        24    (12 - 3) = 9    24/9 = 2.67
Total                 32    (12 - 1) = 11
Conclusion
• The above table shows that the calculated value of F is 1.5, which is less than the table value of 4.26 at the 5% level with d.f. v1 = 2 and v2 = 9, and hence could have arisen due to chance.

• This analysis supports the null hypothesis of no difference in sample means.

• The difference in wheat output due to varieties is insignificant and is just a matter of chance.

• It means that the varieties do not differ much from each other.
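The comparison behind this conclusion can be written as a short Python sketch. The helper name `f_test` is hypothetical, and the F-limit of 4.26 is simply the table value quoted in the text, not something the code computes.

```python
def f_test(ss_between, df_between, ss_within, df_within, f_limit):
    """Return the F-ratio and whether it equals or exceeds the F-limit."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_ratio = ms_between / ms_within
    return f_ratio, f_ratio >= f_limit


# Values from Example 1; 4.26 is the 5% table value for d.f. (2, 9).
f_ratio, significant = f_test(8, 2, 24, 9, f_limit=4.26)
print(round(f_ratio, 2), significant)  # 1.5 False
```

Because the ratio falls below the table value, the sketch reports the difference as not significant, in line with the conclusion above.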
Two way ANOVA
Two way ANOVA is used when the data are classified on the basis of two factors.

For example:

• The agricultural output may be classified on the basis of different varieties of seeds and also on the basis of different varieties of fertilizers used.

• A business firm may have its sales data classified on the basis of different salesmen and also on the basis of sales in different regions.
Two way ANOVA is used when we have one measurement variable (i.e. a quantitative variable) and two nominal variables. In other words, our experiment has a quantitative outcome and two categorical explanatory variables.

For example:

• We might want to find out if there is an interaction between income and gender for anxiety level at job interviews.

• The anxiety level is the outcome, i.e. the variable that can be measured.

• Gender and income are the two categorical variables. These factors can be split into levels.

In the above example:

• Income level could be split into three levels: low, middle and high income.

• Gender could be split into three levels: male, female, and transgender.

• Treatment groups are all possible combinations of the factors. In this example there would be 3 x 3 = 9 treatment groups.
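The count of treatment groups can be illustrated by listing every combination of levels. This is a minimal sketch using Python's standard `itertools.product`; the list names are chosen for the example.

```python
from itertools import product

income_levels = ["low", "middle", "high"]
gender_levels = ["male", "female", "transgender"]

# Every (income, gender) pair is one treatment group.
treatment_groups = list(product(income_levels, gender_levels))

for group in treatment_groups:
    print(group)

print(len(treatment_groups))  # 9
```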
Two null hypotheses are tested when there is one observation in each cell.
For this example, those hypotheses would be:

H01: All the income groups have equal mean anxiety.

H02: All the gender groups have equal mean anxiety.

For multiple observations in cells, we would also be testing a third hypothesis:

H03: The factors are independent, i.e. the interaction effect does not exist.
The various steps involved are as follows:

(i) Take the total of the values of individual items (or their coded values, as the case may be) in all the samples and call it T.

T = sum of all the individual values

(ii) Work out the correction factor as under:

correction factor = $T^2 / n$

(iii) Obtain the sum of squares of deviations for variance between columns (or SS between columns):

$\sum_j \frac{T_j^2}{n_j} - \frac{T^2}{n}$

where $T_j$ is the total of column j and $n_j$ is the number of items in that column.

(iv) Obtain the sum of squares of deviations for variance between rows (or SS between rows):

$\sum_i \frac{T_i^2}{n_i} - \frac{T^2}{n}$

where $T_i$ is the total of row i and $n_i$ is the number of items in that row.

(v) Obtain the sum of squares of deviations for total variance:

$\sum X_{ij}^2 - \frac{T^2}{n}$

(vi) Obtain the sum of squares of deviations for residual or error variance:

SS for residual or error variance = Total SS - (SS between columns + SS between rows)

(vii) Find the degrees of freedom (d.f.):

d.f. for total variance = (c.r - 1)
d.f. for variance between columns = (c - 1)
d.f. for variance between rows = (r - 1)
d.f. for residual variance = (c - 1)(r - 1)

where c = number of columns
r = number of rows
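Steps (i) to (vii) can be sketched in Python as follows. This is only an illustration: the function name `two_way_anova`, the result keys and the small 2 x 2 table are assumptions made for the example, and the sketch assumes exactly one observation per cell.

```python
def two_way_anova(table):
    """table[i][j] is the single observation in row i and column j."""
    r = len(table)       # number of rows
    c = len(table[0])    # number of columns
    n = r * c            # total number of items

    # (i) Total of all items and (ii) correction factor.
    total = sum(sum(row) for row in table)
    correction = total ** 2 / n

    # (iii) SS between columns: each column holds r items.
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    ss_columns = sum(t ** 2 / r for t in col_totals) - correction

    # (iv) SS between rows: each row holds c items.
    row_totals = [sum(row) for row in table]
    ss_rows = sum(t ** 2 / c for t in row_totals) - correction

    # (v) SS for total variance.
    ss_total = sum(x ** 2 for row in table for x in row) - correction

    # (vi) SS for residual or error variance.
    ss_residual = ss_total - (ss_columns + ss_rows)

    # (vii) Degrees of freedom.
    df = {
        "columns": c - 1,
        "rows": r - 1,
        "residual": (c - 1) * (r - 1),
        "total": c * r - 1,
    }
    return {
        "ss_columns": ss_columns,
        "ss_rows": ss_rows,
        "ss_total": ss_total,
        "ss_residual": ss_residual,
        "df": df,
    }


# Illustrative 2 x 2 table, not taken from any study.
result = two_way_anova([[1, 2], [3, 6]])
print(result)
```

For this made-up table the sketch gives SS between columns = 4, SS between rows = 9, total SS = 14 and residual SS = 1.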
Table: Two way ANOVA

Source of variation          Sum of squares (SS)                                  Degrees of freedom (d.f.)   Mean square (MS)                  F-ratio
Between columns treatment    $\sum_j T_j^2/n_j - T^2/n$                           (c - 1)                     SS between columns / (c - 1)      MS between columns / MS residual
Between rows treatment       $\sum_i T_i^2/n_i - T^2/n$                           (r - 1)                     SS between rows / (r - 1)         MS between rows / MS residual
Residual or error            Total SS - (SS between columns + SS between rows)    (c - 1)(r - 1)              SS residual / (c - 1)(r - 1)
Total                        $\sum X_{ij}^2 - T^2/n$                              (c.r - 1)

• MS residual, or the residual variance, provides the basis for the F-ratios concerning variation between columns treatment and between rows treatment.

• MS residual is always due to the fluctuations of sampling and hence serves as the basis for the significance test.

• Both F-ratios are compared with their corresponding table values for the given degrees of freedom at a specified level of significance.

• If the calculated F-ratio concerning variation between columns is equal to or greater than its table value, then the difference among the column means is considered significant.

• Similarly, the F-ratio concerning variation between rows can be interpreted.
Example 2
Set up an analysis of variance table for the following two-way design results:

Per acre production data

Varieties of fertilizers    Varieties of seeds
                            A    B    C
W                           6    5    5
X                           7    5    4
Y                           3    3    3
Z                           8    7    4
Step 1: Obtain the total of the individual values

T = sum of all individual values
= 6+7+3+8+5+5+3+7+5+4+3+4
= 60

where T = sum of all individual values and n = number of items.
So we have T = 60 and n = 12.

Step 2: Calculate the correction factor and the total SS

Correction factor = square of the total value / number of items

So correction factor = $T^2 / n$
= 60*60/12 = 300

Total SS = sum of the squares of all the item values - correction factor
= {(36+25+25+49+25+16+9+9+9+64+49+16) - (60*60/12)}
= 332 - 300
= 32

Individual value    Squaring    Result of squaring
6                   6*6         36
5                   5*5         25
5                   5*5         25
7                   7*7         49
5                   5*5         25
4                   4*4         16
3                   3*3         9
3                   3*3         9
3                   3*3         9
8                   8*8         64
7                   7*7         49
4                   4*4         16

So we have total SS = 32.
Step 3: Calculate SS between columns treatment
SS between columns = sum of (square of each column total / number of items in the column) - correction factor

= (24*24/4 + 20*20/4 + 16*16/4) - (60*60/12)
= 144 + 100 + 64 - 300
= 8

Column total    Square           Square / items in column
24              24*24 = 576      576/4 = 144
20              20*20 = 400      400/4 = 100
16              16*16 = 256      256/4 = 64
Step 4: Calculate SS between rows treatment
SS between rows = sum of (square of each row total / number of items in the row) - correction factor

= (16*16/3 + 16*16/3 + 9*9/3 + 19*19/3) - (60*60/12)
= 85.33 + 85.33 + 27 + 120.33 - 300
= 318 - 300 (the unrounded terms add up to 954/3 = 318)
= 18

Row total    Square           Square / items in row
16           16*16 = 256      256/3 = 85.33
16           16*16 = 256      256/3 = 85.33
9            9*9 = 81         81/3 = 27
19           19*19 = 361      361/3 = 120.33
Step 5: SS residual or error
SS residual = Total SS - (SS between columns + SS between rows)
= 32 - (8 + 18)
= 6
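Step 6: Find the degrees of freedom
d.f. for variance between columns = (c - 1) = 3 - 1 = 2
d.f. for variance between rows = (r - 1) = 4 - 1 = 3
d.f. for residual variance = (c - 1)(r - 1) = 2 x 3 = 6
d.f. for total variance = (c.r - 1) = 12 - 1 = 11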
Finally, we have the following quantities:

SS between columns                    8
SS between rows                       18
SS residual                           6
d.f. for variation between columns    2
d.f. for variation between rows       3
d.f. for residual variance            6
d.f. for total variance               11
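These values can be cross-checked a second way. The Python sketch below does not use the correction-factor shortcut; instead it assumes the equivalent deviation form, in which SS between columns is the number of rows times the sum of squared deviations of the column means from the grand mean (and likewise for rows). The variable names are chosen for this illustration.

```python
# Example 2 data: rows are fertilizers W, X, Y, Z; columns are seeds A, B, C.
data = [
    [6, 5, 5],
    [7, 5, 4],
    [3, 3, 3],
    [8, 7, 4],
]
r, c = len(data), len(data[0])
grand_mean = sum(sum(row) for row in data) / (r * c)

col_means = [sum(row[j] for row in data) / r for j in range(c)]
row_means = [sum(row) / c for row in data]

# Deviation form of the sums of squares.
ss_columns = r * sum((m - grand_mean) ** 2 for m in col_means)
ss_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
ss_residual = ss_total - (ss_columns + ss_rows)

print(round(ss_columns, 2), round(ss_rows, 2), round(ss_total, 2), round(ss_residual, 2))
# 8.0 18.0 32.0 6.0
```

Both routes give the same four sums of squares, which is a useful check on hand arithmetic.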
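Table: Two way ANOVA table

Source of variation                  SS    d.f.            MS          F-ratio    5% F-limit (table value)
Between columns (seeds)              8     (3 - 1) = 2     8/2 = 4     4/1 = 4    F(2, 6) = 5.14
Between rows (fertilizers)           18    (4 - 1) = 3     18/3 = 6    6/1 = 6    F(3, 6) = 4.76
Residual or error                    6     (3 - 1)(4 - 1) = 6    6/6 = 1
Total                                32    (12 - 1) = 11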
Conclusion
• For columns (varieties of seeds), the calculated value of F is 4, which is less than the table value of 5.14 at the 5% level with d.f. v1 = 2 and v2 = 6.

a) This difference is not significant.

• For rows (varieties of fertilizers), the calculated value of F is 6, which is higher than the table value of 4.76 at the 5% level with d.f. v1 = 3 and v2 = 6.

b) This analysis does not support the null hypothesis of no difference in sample means; the difference is significant.

• It means that the differences concerning varieties of seeds are insignificant at the 5% level, as the calculated F-ratio of 4 is less than the table value of 5.14, but the differences concerning varieties of fertilizers are significant, as the calculated F-ratio of 6 is more than its table value of 4.76.

• The fertilizers act differently, whereas the varieties of seeds do not differ significantly.
Two way ANOVA with replication
Example 3:
Set up an ANOVA table for the following information relating to three drugs tested to judge their effectiveness in reducing blood pressure for three different groups of people.

                   Drugs
Group of people    X     Y     Z
A                  14    10    11
                   15    9     11
B                  12    7     10
                   11    8     11
C                  10    11    8
                   11    11    7
Questions:

I. Do the drugs act differently?

II. Are the different groups of people affected differently?

III. Is the interaction term significant?
Computation for two way ANOVA with repeated values
Step (i) T = 187, n = 18

Step (ii) correction factor = 187*187/18
= 1942.72

Step (iii) SS between columns (i.e. between drugs)
= (73*73/6 + 56*56/6 + 58*58/6) - (187*187/18)
= 888.16 + 522.66 + 560.67 - 1942.72
= 28.77

SS between rows (i.e. between people)
= (70*70/6 + 59*59/6 + 58*58/6) - (187*187/18)
= 816.67 + 580.16 + 560.67 - 1942.72
= 14.78

Step (iv) total SS = $\{14^2 + 15^2 + 12^2 + 11^2 + 10^2 + 11^2 + 10^2 + 9^2 + 7^2 + 8^2 + 11^2 + 11^2 + 11^2 + 11^2 + 10^2 + 11^2 + 8^2 + 7^2\}$ - (187*187/18)

= 2019 - 1942.72
= 76.28
Step (v) SS within samples
= $(14-14.5)^2 + (15-14.5)^2 + (10-9.5)^2 + (9-9.5)^2$
+ $(11-11)^2 + (11-11)^2 + (12-11.5)^2 + (11-11.5)^2$
+ $(7-7.5)^2 + (8-7.5)^2 + (10-10.5)^2 + (11-10.5)^2$
+ $(10-10.5)^2 + (11-10.5)^2 + (11-11)^2 + (11-11)^2$
+ $(8-7.5)^2 + (7-7.5)^2$
= 3.50

Step (vi) SS for interaction
= 76.28 - [28.77 + 14.78 + 3.50]
= 29.23
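The computation for Example 3 can be reproduced with the Python sketch below. The data layout (`cells[group][drug]`) and all variable names are assumptions made for this illustration. Because the code keeps full precision while the text rounds intermediate terms, the results can differ from the hand-worked figures in the second decimal place.

```python
# cells[group][drug] holds the two readings for that combination (Example 3).
cells = {
    "A": {"X": [14, 15], "Y": [10, 9], "Z": [11, 11]},
    "B": {"X": [12, 11], "Y": [7, 8], "Z": [10, 11]},
    "C": {"X": [10, 11], "Y": [11, 11], "Z": [8, 7]},
}
groups = ["A", "B", "C"]
drugs = ["X", "Y", "Z"]

values = [x for g in groups for d in drugs for x in cells[g][d]]
n = len(values)                     # number of items
total = sum(values)                 # T in the text
correction = total ** 2 / n         # correction factor

items_per_drug = n // len(drugs)    # items in each column
items_per_group = n // len(groups)  # items in each row

drug_totals = [sum(x for g in groups for x in cells[g][d]) for d in drugs]
group_totals = [sum(x for d in drugs for x in cells[g][d]) for g in groups]

ss_columns = sum(t ** 2 for t in drug_totals) / items_per_drug - correction
ss_rows = sum(t ** 2 for t in group_totals) / items_per_group - correction
ss_total = sum(x ** 2 for x in values) - correction

# SS within samples: squared deviations of each reading from its cell mean.
ss_within = 0.0
for g in groups:
    for d in drugs:
        readings = cells[g][d]
        cell_mean = sum(readings) / len(readings)
        ss_within += sum((x - cell_mean) ** 2 for x in readings)

ss_interaction = ss_total - (ss_columns + ss_rows + ss_within)

for name, value in [("columns", ss_columns), ("rows", ss_rows), ("total", ss_total),
                    ("within", ss_within), ("interaction", ss_interaction)]:
    print(name, round(value, 2))
```

Run at full precision, the sketch gives about 28.78, 14.78, 76.28, 3.5 and 29.22; the small gaps from 28.77 and 29.23 in the text come from rounding the intermediate terms.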
Table: The ANOVA table

Source of variation                     SS       d.f.             MS                  F-ratio                  5% F-limit
Between columns (i.e. between drugs)    28.77    (3 - 1) = 2      28.77/2 = 14.385    14.385/0.389 = 36.98     F(2, 9) = 4.26
Between rows (i.e. between people)      14.78    (3 - 1) = 2      14.78/2 = 7.390     7.390/0.389 = 19.0       F(2, 9) = 4.26
Interaction                             29.23    4                29.23/4 = 7.308     7.308/0.389 = 18.79      F(4, 9) = 3.63
Within samples (error)                  3.50     (18 - 9) = 9     3.50/9 = 0.389
Total                                   76.28    (18 - 1) = 17
Conclusion
The above table shows that all three F-ratios are significant at the 5% level, which means that

- the drugs act differently,

- the different groups of people are affected differently,

- the interaction term is significant.


Software used for ANOVA calculation
• SPSS: the ANOVA tests in SPSS are suited to simple one way ANOVA calculations; anything more complicated gets difficult.

• Statistica: also used for ANOVA calculation.

• Excel: Excel allows ANOVA calculation through the Data Analysis add-in, but the instructions are not good.

• ezANOVA: free to download and used for ANOVA calculation.
Reference
• Khan and Khanum, “Analysis of variance”, Fundamentals of Biostatistics.
• Kothari, C.R., “Analysis of variance”, Research Methodology.
• Research paper: “Interaction effect in ANOVA”, Stevens, 1990; Stevens, 1999.
