
ANALYSIS OF VARIANCE

The analysis of variance, frequently referred to by the contraction ANOVA, is a statistical technique
specially designed to test whether the means of more than two quantitative populations are equal.

Problems:

The three samples below have been obtained from normal populations with equal variance. Test
the hypothesis that the population means are equal:

Sample 1   Sample 2   Sample 3
    8          7         12
   10          5          9
    7         10         13
   14          9         12
   11          9         14

The table value of F at 5% level of significance for $\nu_1 = 2$ and $\nu_2 = 12$ is 3.88.


Solutions:

          X1     X2     X3
           8      7     12
          10      5      9
           7     10     13
          14      9     12
          11      9     14
Total     50     40     60
Mean      10      8     12

Grand mean:
\[ \bar{\bar{X}} = \frac{10 + 8 + 12}{3} = 10 \]

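VARIANCE BETWEEN SAMPLES

Each observation is replaced by its sample mean, and the squared deviation of that mean from the
grand mean $\bar{\bar{X}} = 10$ is recorded:

        $(\bar{X}_1 - \bar{\bar{X}})^2$   $(\bar{X}_2 - \bar{\bar{X}})^2$   $(\bar{X}_3 - \bar{\bar{X}})^2$
              0               4               4
              0               4               4
              0               4               4
              0               4               4
              0               4               4
Total         0              20              20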

Sum of squares between samples = 0+20+20=40


VARIANCE WITHIN SAMPLES

         X1    $(X_1 - \bar{X}_1)^2$     X2    $(X_2 - \bar{X}_2)^2$     X3    $(X_3 - \bar{X}_3)^2$
          8          4          7          1         12          0
         10          0          5          9          9          9
          7          9         10          4         13          1
         14         16          9          1         12          0
         11          1          9          1         14          4
Total               30                    16                    14

Sum of squares within samples = 30+16+14=60


ANOVA TABLE
Source of variation    Sum of squares    $\nu$    Mean square
Between                      40            2         20
Within                       60           12          5
Total                       100           14

\[ F = \frac{20}{5} = 4 \]
The calculated value of F (4) is greater than the table value (3.88). The hypothesis is rejected. Hence
there is a significant difference in the population means.
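
The arithmetic of the worked example above can be cross-checked with a short script. The following is a minimal sketch in Python, assuming the three samples are the columns of the data table; the function name `one_way_anova` is illustrative and not part of the original text. It computes the between-sample and within-sample sums of squares and the F ratio, which is then compared by hand with the table value.

```python
def one_way_anova(samples):
    """Return (ss_between, ss_within, df_between, df_within, f_ratio)."""
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]
    # Between samples: squared deviation of each sample mean from the
    # grand mean, counted once per observation in that sample.
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, means))
    # Within samples: squared deviation of each observation from its own mean.
    ss_within = sum((x - m) ** 2
                    for s, m in zip(samples, means) for x in s)
    df_between = len(samples) - 1
    df_within = n_total - len(samples)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# Columns of the data table in the problem above.
samples = [
    [8, 10, 7, 14, 11],
    [7, 5, 10, 9, 9],
    [12, 9, 13, 12, 14],
]
ssb, ssw, df_b, df_w, f = one_way_anova(samples)
table_f = 3.88  # table value quoted in the problem for (2, 12) degrees of freedom
print(ssb, ssw, df_b, df_w, f, f > table_f)
```

For these data the script reproduces the sums of squares 40 and 60 and the ratio F = 4 obtained above.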
ANALYSIS OF VARIANCE IN TWO-WAY CLASSIFICATION MODEL

In a one-factor analysis of variance explained above the treatments constitute different levels
of a single factor which is controlled in the experiment. There are, however, many situations in which
the response variable of interest may be affected by more than one factor. For example, sales of
Maxfactor Cosmetics, in addition to being affected by the point-of-sale display, might also be
affected by the price charged, the size and/or location of the store or the number of
competitive products sold by the store. Similarly, petrol mileage may be affected by the
type of car driven, the way it is driven, road conditions and other factors in addition to the brand
of petrol used.

When it is believed that two independent factors might have an effect on the response variable
of interest, it is possible to design the test so that an analysis of variance can be used to test for
the effects of the two factors simultaneously. Such a test is called a two-factor analysis of
variance. With two-factor analysis of variance, we can test two sets of hypotheses with the same
data at the same time.

In a two-way classification the data are classified according to two different criteria or factors.
The procedure for analysis of variance is somewhat different from the one followed while dealing
with problems of one-way classification. In a two-way classification the analysis of variance
table takes the following form.

Source of variation    Sum of squares    Degrees of freedom    Mean sum of squares         Ratio of F
Between columns             SSC              (c-1)             MSC = SSC/(c-1)             MSC/MSE
Between rows                SSR              (r-1)             MSR = SSR/(r-1)             MSR/MSE
Residual or error           SSE              (c-1)(r-1)        MSE = SSE/[(r-1)(c-1)]
Total                       SST              n-1

SSC = Sum of squares between columns
SSR = Sum of squares between rows
SSE = Sum of squares due to error
SST = Total sum of squares
The sum of squares for the source 'Residual' is obtained by subtracting from the total sum of
squares the sum of squares between columns and rows, i.e.,
SSE = SST - [SSC + SSR]
The total number of degrees of freedom = n - 1 or cr - 1,
where c refers to the number of columns, and
r refers to the number of rows.
Number of degrees of freedom between columns = (c-1)
Number of degrees of freedom between rows = (r-1)
Number of degrees of freedom for residual = (c-1)(r-1)
The total sum of squares, the sum of squares between columns and the sum of squares between
rows are obtained in the same way as before.
Residual or error sum of squares = Total sum of squares - Sum of squares between columns - Sum
of squares between rows.

The F values are calculated as follows:
\[ F(\nu_1, \nu_2) = \frac{MSC}{MSE}, \quad \text{where } \nu_1 = (c-1) \text{ and } \nu_2 = (c-1)(r-1) \]
\[ F(\nu_1, \nu_2) = \frac{MSR}{MSE}, \quad \text{where } \nu_1 = (r-1) \text{ and } \nu_2 = (c-1)(r-1) \]
It should be carefully noted that $\nu_1$ may not be the same in both cases: in one case $\nu_1 = (c-1)$ and
in the other case $\nu_1 = (r-1)$.
The calculated values of F are compared with the table values. If the calculated value of F is greater
than the table value at the pre-assigned level of significance, the null hypothesis is rejected; otherwise
it is accepted.
It would be clear from the above that in problems involving two-way classification, the residual is the
measuring rod for testing significance. It represents the magnitude of variation due to forces
called chance. The following examples illustrate the procedure.

Problems:

A tea company appoints four salesmen A, B, C and D and observes their sales in three seasons:
summer, winter and monsoon. The figures (in lakhs) are given in the following table:

Seasons                   Salesmen                Seasons'
                  A      B      C      D          total
Summer           36     36     21     35           128
Winter           28     29     31     32           120
Monsoon          26     28     29     29           112
Salesmen's
totals           90     93     81     96           360

(i) Do the salesmen significantly differ in performance?
(ii) Is there significant difference between the seasons?

Solutions:
The above data are classified according to two criteria: (i) salesmen, and (ii) seasons. In order to
simplify calculations we code the data by subtracting 30 from each figure. The data in the coded
form are given below:

Seasons                   Salesmen                Seasons'
                  A      B      C      D          total
Summer           +6     +6     -9     +5            +8
Winter           -2     -1     +1     +2             0
Monsoon          -4     -2     -1     -1            -8
Salesmen's
totals            0     +3     -9     +6       Grand total T = 0

\[ \text{Correction factor} = \frac{T^2}{N} = \frac{(0)^2}{12} = 0 \]
(number of items or N is 12)

Sum of squares between salesmen
This will be obtained by squaring up the salesmen's totals, dividing each total by the number of
items included in it, adding these figures and then subtracting the correction factor from them.
Thus, sum of squares between salesmen:
\[ = \frac{(0)^2}{3} + \frac{(3)^2}{3} + \frac{(-9)^2}{3} + \frac{(6)^2}{3} - \frac{T^2}{N} = 0 + 3 + 27 + 12 - 0 = 42 \]
\[ \nu = (4 - 1) = 3 \]

Sum of squares between seasons
This will be obtained by dividing the squares of the season totals by the number of items that
make up each total, adding all such figures and subtracting therefrom the correction factor. Thus,
sum of squares between seasons:
\[ = \frac{(8)^2}{4} + \frac{(0)^2}{4} + \frac{(-8)^2}{4} - \frac{T^2}{N} = 16 + 0 + 16 - 0 = 32 \]
\[ \nu = (3 - 1) = 2 \]


Total sum of squares

This will be obtained by adding the squares of all items in the table and subtracting the
correction factor therefrom, thus:
Total sum of squares
\[ = [(6)^2 + (-2)^2 + (-4)^2 + (6)^2 + (-1)^2 + (-2)^2 + (-9)^2 + (1)^2 + (-1)^2 + (5)^2 + (2)^2 + (-1)^2] - \frac{T^2}{N} \]
\[ = 210 - 0 = 210 \]
\[ \nu = (12 - 1) = 11 \]

The above information will be presented in the following table of Analysis of Variance:
Source of variation            Sum of squares    Degrees of freedom    Mean sum of squares
Between columns (salesmen)           42                  3                   14
Between rows (seasons)               32                  2                   16
Residual                            136                  6                   22.67
Total                               210                 11

Let us take the hypothesis that there is no difference between the sales of the salesmen or of the
seasons. In other words, the three independent estimates of variance are estimates of the
variance of a common population.
Now first compare the salesmen variance estimate with the residual variance estimate; thus
\[ F = \frac{MSC}{MSE} = \frac{14}{22.67} = 0.618 \]
The table value of F for $\nu_1 = 3$ and $\nu_2 = 6$ at 5% level of significance is 4.76.
The calculated value is less than the table value and we conclude that the sales of different
salesmen do not differ significantly.
Now let us compare the season variance estimate with the residual variance estimate; thus
\[ F = \frac{MSR}{MSE} = \frac{16}{22.67} = 0.706 \]
The critical value of F for $\nu_1 = 2$ and $\nu_2 = 6$ at 5% level of significance is 5.14.
The calculated value is less than this and hence there is no significant difference in the seasons as
far as the sales are concerned.
Thus, the test shows that the salesmen and the seasons are alike so far as the sales are concerned.
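
The two-way calculation above can be sketched in Python as follows. This is an illustrative sketch rather than the author's own procedure: the function name `two_way_anova` and the dictionary keys are assumptions, and the F ratios follow the MSC/MSE and MSR/MSE definitions given earlier. The input is the coded table (each figure minus 30), with seasons as rows and salesmen as columns.

```python
def two_way_anova(table):
    """Two-way ANOVA for a table with one observation per cell.

    `table` is a list of rows; every row has one value per column.
    """
    r, c = len(table), len(table[0])
    n = r * c
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / n                      # correction factor T^2 / N
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    row_totals = [sum(row) for row in table]
    ssc = sum(t ** 2 for t in col_totals) / r - cf  # each column total has r items
    ssr = sum(t ** 2 for t in row_totals) / c - cf  # each row total has c items
    sst = sum(x ** 2 for row in table for x in row) - cf
    sse = sst - (ssc + ssr)                         # residual by subtraction
    msc = ssc / (c - 1)
    msr = ssr / (r - 1)
    mse = sse / ((c - 1) * (r - 1))
    return {
        "SSC": ssc, "SSR": ssr, "SSE": sse, "SST": sst,
        "MSC": msc, "MSR": msr, "MSE": mse,
        "F_columns": msc / mse, "F_rows": msr / mse,
    }


# Coded sales: rows are summer, winter, monsoon; columns are salesmen A to D.
coded_sales = [
    [6, 6, -9, 5],
    [-2, -1, 1, 2],
    [-4, -2, -1, -1],
]
result = two_way_anova(coded_sales)
for key, value in result.items():
    print(key, round(value, 3))
```

The script reproduces SSC = 42, SSR = 32, SSE = 136 and SST = 210 from the analysis of variance table above.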

Problems:
The following data represent the number of units of production per day turned out by 5 different
workers using 4 different types of machines:

Workers            Machine type
              A      B      C      D
   1         44     38     47     36
   2         46     40     52     43
   3         34     36     44     32
   4         43     38     46     33
   5         38     42     49     39

A. Test whether the mean productivity is the same for the different machine types.
B. Test whether the 5 men differ with respect to mean productivity.

Solutions:
Let us take the hypothesis that (a) the mean productivity is the same for the four different machines,
and (b) the 5 men do not differ with respect to mean productivity. To simplify calculations let us
subtract 40 from each value. The coded data are given below:

Workers            Machine type             Total
              A      B      C      D
   1         +4     -2     +7     -4          +5
   2         +6      0    +12     +3         +21
   3         -6     -4     +4     -8         -14
   4         +3     -2     +6     -7           0
   5         -2     +2     +9     -1          +8
Total        +5     -6    +38    -17       T = 20

\[ \text{Correction factor} = \frac{T^2}{N} = \frac{(20)^2}{20} = \frac{400}{20} = 20 \]

Sum of squares between machines
\[ = \frac{(5)^2}{5} + \frac{(-6)^2}{5} + \frac{(38)^2}{5} + \frac{(-17)^2}{5} - \text{Correction factor} \]
\[ = (5 + 7.2 + 288.8 + 57.8) - 20 = 358.8 - 20 = 338.8 \]
\[ \nu = (c - 1) = (4 - 1) = 3 \]

Sum of squares between workers
\[ = \frac{(5)^2}{4} + \frac{(21)^2}{4} + \frac{(-14)^2}{4} + \frac{(0)^2}{4} + \frac{(8)^2}{4} - \frac{T^2}{N} \]
\[ = \frac{25}{4} + \frac{441}{4} + \frac{196}{4} + \frac{0}{4} + \frac{64}{4} - 20 \]
\[ = (6.25 + 110.25 + 49 + 0 + 16) - 20 = 181.5 - 20 = 161.5 \]
\[ \nu = (r - 1) = (5 - 1) = 4 \]

Total sum of squares
Total sum of squares
\[ = [(4)^2 + (6)^2 + (-6)^2 + (3)^2 + (-2)^2 + (-2)^2 + (0)^2 + (-4)^2 + (-2)^2 + (2)^2 \]
\[ \quad + (7)^2 + (12)^2 + (4)^2 + (6)^2 + (9)^2 + (-4)^2 + (3)^2 + (-8)^2 + (-7)^2 + (-1)^2] - \frac{T^2}{N} \]
\[ = [16 + 36 + 36 + 9 + 4 + 4 + 0 + 16 + 4 + 4 + 49 + 144 + 16 + 36 + 81 + 16 + 9 + 64 + 49 + 1] - 20 \]
\[ = 594 - 20 = 574 \]

Residual or Remainder = Total sum of squares - (Sum of squares between machines + Sum of
squares between workers)
= 574 - 338.8 - 161.5 = 73.7
Degrees of freedom for remainder = 19 - 3 - 4 = 12,
or (c-1)(r-1) = 3 × 4 = 12

Source of variation        S.S.      d.f.     M.S.       Variance ratio or F
Between machine types     338.8       3      112.933     112.933/6.142 = 18.387
Between workers           161.5       4       40.375      40.375/6.142 = 6.574
Remainder or residual      73.7      12        6.142
Total                     574        19
(a) For $\nu_1 = 3$, $\nu_2 = 12$, $F_{0.05} = 3.49$.
Since the calculated value (18.4) is greater than the table value, we conclude that the mean
productivity is not the same for the four different types of machines.
(b) For $\nu_1 = 4$, $\nu_2 = 12$, $F_{0.05} = 3.26$.
The calculated value (6.57) is greater than the table value, hence the workers differ with
respect to mean productivity.
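
Coding the data by subtracting a constant changes the totals and the correction factor but leaves the sums of squares unchanged, which is why the coded figures can stand in for the raw ones. The sketch below checks this for the machine data. The helper name `sums_of_squares` is illustrative and not part of the original text.

```python
def sums_of_squares(table):
    """Return (SSC, SSR, SST, SSE) for a table with one observation per cell."""
    r, c = len(table), len(table[0])
    cf = sum(sum(row) for row in table) ** 2 / (r * c)   # correction factor
    ssc = sum(sum(row[j] for row in table) ** 2 for j in range(c)) / r - cf
    ssr = sum(sum(row) ** 2 for row in table) / c - cf
    sst = sum(x ** 2 for row in table for x in row) - cf
    return ssc, ssr, sst, sst - ssc - ssr


# Rows are workers 1 to 5; columns are machine types A to D.
raw = [
    [44, 38, 47, 36],
    [46, 40, 52, 43],
    [34, 36, 44, 32],
    [43, 38, 46, 33],
    [38, 42, 49, 39],
]
coded = [[x - 40 for x in row] for row in raw]   # same coding as in the solution

raw_ss = sums_of_squares(raw)
coded_ss = sums_of_squares(coded)
# Compare with a small tolerance because of floating-point rounding.
same = all(abs(a - b) < 1e-9 for a, b in zip(raw_ss, coded_ss))
print([round(v, 1) for v in coded_ss], same)
```

Both versions of the table give the sums of squares 338.8, 161.5, 574 and 73.7 used in the solution.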



Application of the t-distribution

The following are some examples to illustrate the way in which the Student's t-distribution
is generally used to test the significance of the various results obtained from small samples.

1. To test the Significance of the Mean of a Random Sample. In determining whether the
mean of a sample drawn from a normal population deviates significantly from a stated value (the
hypothetical value of the population's mean), when the variance of the population is unknown, we
calculate the statistic:

\[ t = \frac{(\bar{X} - \mu)\sqrt{n}}{S} \]

where $\bar{X}$ = the mean of the sample
$\mu$ = the actual or hypothetical mean of the population
n = the sample size
S = the standard deviation of the sample, calculated as
\[ S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]

Problems

The manufacturer of a certain make of electric bulbs claims that his bulbs have a mean life of 25
months with a standard deviation of 5 months. A random sample of 6 such bulbs gave the
following values.

Life in months: 24, 26, 30, 20, 20, 18

Can you regard the producer's claim to be valid at 1% level of significance? (Given that the table
values of the appropriate test statistic at the said level are 4.032, 3.707 and 3.499 for 5, 6 and 7
degrees of freedom respectively)

Solutions: Let us take the hypothesis that there is no significant difference between the mean life of
bulbs in the sample and that of the population. Applying the t-test:

\[ t = \frac{|\bar{X} - \mu|\sqrt{n}}{S}, \qquad S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]

CALCULATION OF $\bar{X}$ and S

    X      $x = (X - \bar{X})$      $x^2$
   24            +1              1
   26            +3              9
   30            +7             49
   20            -3              9
   20            -3              9
   18            -5             25
$\sum X = 138$                 $\sum x^2 = 102$

\[ \bar{X} = \frac{\sum X}{n} = \frac{138}{6} = 23, \qquad S = \sqrt{\frac{\sum x^2}{n - 1}} = \sqrt{\frac{102}{5}} = \sqrt{20.4} = 4.517 \]
\[ t = \frac{|23 - 25|\sqrt{6}}{4.517} = \frac{2 \times 2.449}{4.517} = 1.084 \]

$\nu = n - 1 = 6 - 1 = 5$. For $\nu = 5$, $t_{0.01} = 4.032$.
The calculated value of t is less than the table value. The hypothesis is accepted. Hence, the
producer's claim is valid at 1% level of significance.
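
The same test can be sketched in a few lines of Python using only the standard library. The function name `one_sample_t` is illustrative and not part of the original text, and the table value 4.032 is taken from the problem statement rather than computed. Because the script does not round intermediate values, its t agrees with the hand calculation only up to rounding in the third decimal place.

```python
import math


def one_sample_t(values, mu):
    """Return (mean, S, t, degrees of freedom) for a one-sample t-test."""
    n = len(values)
    mean = sum(values) / n
    # Standard deviation with divisor n - 1, as in the formula for S above.
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    t = abs(mean - mu) * math.sqrt(n) / s
    return mean, s, t, n - 1


life = [24, 26, 30, 20, 20, 18]       # life in months of the 6 sampled bulbs
mean, s, t, df = one_sample_t(life, mu=25)
table_t = 4.032                       # 1% table value for 5 degrees of freedom
print(round(mean, 3), round(s, 3), round(t, 3), df)
print("reject hypothesis" if t > table_t else "accept hypothesis")
```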


Problems

A random sample of size 16 has 53 as mean. The sum of the squares of the deviations taken from
mean is 135. Can this sample be regarded as taken from the population having 56 as mean?
Obtain 95% and 99% confidence limits of the mean of the population. (For $\nu = 15$, $t_{0.05} = 2.13$;
for $\nu = 15$, $t_{0.01} = 2.95$)

Solutions: Let us take the hypothesis that there is no significant difference between the sample
mean and the hypothetical population mean. Applying the t-test:
\[ t = \frac{|\bar{X} - \mu|\sqrt{n}}{S} \]
\[ \bar{X} = 53, \quad \mu = 56, \quad n = 16, \quad \sum (X - \bar{X})^2 = 135 \]
\[ S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{135}{15}} = 3 \]
\[ t = \frac{|53 - 56|\sqrt{16}}{3} = \frac{3 \times 4}{3} = 4 \]

$\nu = 16 - 1 = 15$. For $\nu = 15$, $t_{0.05} = 2.13$.
The calculated value of t is more than the table value. The hypothesis is rejected. Hence, the
sample has not come from a population having 56 as mean.

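95% confidence limits of the population mean

\[ \bar{X} \pm \frac{S}{\sqrt{n}}\, t_{0.05} = 53 \pm \frac{3}{\sqrt{16}} \times 2.13 = 53 \pm 1.6 = 51.4 \text{ to } 54.6 \]

99% confidence limits of the population mean

\[ \bar{X} \pm \frac{S}{\sqrt{n}}\, t_{0.01} = 53 \pm \frac{3}{\sqrt{16}} \times 2.95 = 53 \pm \frac{3}{4} \times 2.95 = 53 \pm 2.212 = 50.788 \text{ to } 55.212 \]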
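
The confidence limits above follow a single pattern, the sample mean plus or minus the table value of t times $S/\sqrt{n}$, which the following minimal Python sketch makes explicit. The function name `confidence_limits` is illustrative, and the table values of t are the ones quoted in the problem, not computed.

```python
import math


def confidence_limits(mean, s, n, table_t):
    """Return (lower, upper) limits: mean +/- table_t * s / sqrt(n)."""
    margin = table_t * s / math.sqrt(n)
    return mean - margin, mean + margin


n = 16
s = math.sqrt(135 / (n - 1))   # S from the sum of squared deviations, 135

low95, high95 = confidence_limits(53, s, n, table_t=2.13)   # t at 5%, 15 d.f.
low99, high99 = confidence_limits(53, s, n, table_t=2.95)   # t at 1%, 15 d.f.
print(round(low95, 3), round(high95, 3))
print(round(low99, 3), round(high99, 3))
```

The wider 99% interval reflects the larger table value of t at the 1% level.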



2. Testing Difference Between Means of Two Samples (Independent Samples). Given two
independent random samples of size $n_1$ and $n_2$ with means $\bar{X}_1$ and $\bar{X}_2$ and standard deviations
$S_1$ and $S_2$, we may be interested in testing the hypothesis that the samples come from the same
normal population. To carry out the test, we calculate the statistic as follows:

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{S} \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \]

where $\bar{X}_1$ = mean of the first sample
$\bar{X}_2$ = mean of the second sample
$n_1$ = number of observations in the first sample
$n_2$ = number of observations in the second sample
S = combined standard deviation

The value of S is calculated by the following formula:
\[ S = \sqrt{\frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}} \]
If the bias correction due to small samples is ignored, the pooled estimate of the standard deviation
can be obtained by:
\[ S = \sqrt{\frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2}} \]
However, it is advisable to take account of bias.

Problems:

Two types of drugs were used on 5 and 7 patients for reducing their weight.
Drug A was imported and drug B indigenous. The decrease in the weight after using the drugs
for six months was as follows:
Drug A : 10 12 13 11 14
Drug B : 8 9 12 14 15 10 9
Is there a significant difference in the efficacy of the two drugs? If not, which drug should you
buy? (For $\nu = 10$, $t_{0.05} = 2.228$)

Solution: Let us take the hypothesis that there is no significant difference in the efficacy of the
two drugs. Applying the t-test:
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{S} \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \]

   $X_1$    $(X_1 - \bar{X}_1)$    $(X_1 - \bar{X}_1)^2$      $X_2$    $(X_2 - \bar{X}_2)$    $(X_2 - \bar{X}_2)^2$
   10        -2            4           8        -3            9
   12         0            0           9        -2            4
   13        +1            1          12        +1            1
   11        -1            1          14        +3            9
   14        +2            4          15        +4           16
                                      10        -1            1
                                       9        -2            4
$\sum X_1 = 60$      $\sum (X_1 - \bar{X}_1)^2 = 10$      $\sum X_2 = 77$      $\sum (X_2 - \bar{X}_2)^2 = 44$

\[ \bar{X}_1 = \frac{\sum X_1}{n_1} = \frac{60}{5} = 12; \qquad \bar{X}_2 = \frac{\sum X_2}{n_2} = \frac{77}{7} = 11 \]
\[ S = \sqrt{\frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{10 + 44}{5 + 7 - 2}} = \sqrt{\frac{54}{10}} = 2.324 \]
\[ t = \frac{12 - 11}{2.324} \sqrt{\frac{5 \times 7}{5 + 7}} = \frac{1.708}{2.324} = 0.735 \]

$\nu = n_1 + n_2 - 2 = 5 + 7 - 2 = 10$. For $\nu = 10$, $t_{0.05} = 2.228$.
Since the calculated value of t is less than the table value, the hypothesis is accepted. Hence, there
is no significant difference in the efficacy of the two drugs. Since drug B is indigenous and there is no
difference in the efficacy of the imported and indigenous drugs, we should buy the indigenous drug, i.e.,
B.
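
A minimal Python sketch of the independent-samples test is given below, assuming the pooled standard deviation with divisor $n_1 + n_2 - 2$ (the version that takes account of bias). The function name `pooled_t` is illustrative and not part of the original text, and the table value of t is supplied from the problem rather than computed.

```python
import math


def pooled_t(sample1, sample2):
    """Return (mean1, mean2, S, t, degrees of freedom) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum((x - m1) ** 2 for x in sample1)   # squared deviations, sample 1
    ss2 = sum((x - m2) ** 2 for x in sample2)   # squared deviations, sample 2
    s = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))  # combined standard deviation
    t = (m1 - m2) / s * math.sqrt(n1 * n2 / (n1 + n2))
    return m1, m2, s, t, n1 + n2 - 2


drug_a = [10, 12, 13, 11, 14]
drug_b = [8, 9, 12, 14, 15, 10, 9]
m1, m2, s, t, df = pooled_t(drug_a, drug_b)
table_t = 2.228   # 5% table value for 10 degrees of freedom
print(m1, m2, round(s, 3), round(t, 3), df)
print("significant" if abs(t) > table_t else "not significant")
```

Taking the absolute value of t before the comparison makes the check independent of which sample is listed first.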






Problems:
For a random sample of 10 persons, fed on diet A, the increases in weight in pounds in a certain
period were:
10, 6, 16, 17, 13, 12, 8, 14, 15, 9
For another random sample of 12 persons, fed on diet B, the increases in the same period were:
7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17
Test whether the diets A and B differ significantly as regards their effect on increase in weight.
Given the following:
Degrees of freedom         19      20      21      22      23
Value of t at 5% level    2.09    2.09    2.08    2.07    2.07


Solutions: Let us take the null hypothesis that A and B do not differ significantly with regard
to their effect on increase in weight. Applying the t-test:

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{S} \sqrt{\frac{n_1 n_2}{n_1 + n_2}}, \qquad S = \sqrt{\frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}} \]

Calculating the required values:

Persons fed on diet A                                Persons fed on diet B
Increase      Deviation        Squared               Increase      Deviation        Squared
in weight     from mean 12     deviation             in weight     from mean 15     deviation
$X_1$         $(X_1 - \bar{X}_1)$   $(X_1 - \bar{X}_1)^2$          $X_2$         $(X_2 - \bar{X}_2)$   $(X_2 - \bar{X}_2)^2$
   10            -2               4                      7            -8              64
    6            -6              36                     13            -2               4
   16            +4              16                     22            +7              49
   17            +5              25                     15             0               0
   13            +1               1                     12            -3               9
   12             0               0                     14            -1               1
    8            -4              16                     18            +3               9
   14            +2               4                      8            -7              49
   15            +3               9                     21            +6              36
    9            -3               9                     23            +8              64
                                                        10            -5              25
                                                        17            +2               4
$\sum X_1 = 120$   $\sum (X_1 - \bar{X}_1) = 0$   $\sum (X_1 - \bar{X}_1)^2 = 120$   $\sum X_2 = 180$   $\sum (X_2 - \bar{X}_2) = 0$   $\sum (X_2 - \bar{X}_2)^2 = 314$

Mean increase in weight of 10 persons fed on diet A:
\[ \bar{X}_1 = \frac{\sum X_1}{n_1} = \frac{120}{10} = 12 \text{ pounds} \]
Mean increase in weight of 12 persons fed on diet B:
\[ \bar{X}_2 = \frac{\sum X_2}{n_2} = \frac{180}{12} = 15 \text{ pounds} \]
\[ S = \sqrt{\frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{120 + 314}{10 + 12 - 2}} = \sqrt{\frac{434}{20}} = 4.66 \]
$\bar{X}_1 = 12$, $\bar{X}_2 = 15$, $n_1 = 10$, $n_2 = 12$, S = 4.66. Substituting the values in the above formula:
\[ t = \frac{|12 - 15|}{4.66} \sqrt{\frac{10 \times 12}{10 + 12}} = \frac{3}{4.66} \times 2.34 = 1.51 \]
$\nu = n_1 + n_2 - 2 = 10 + 12 - 2 = 20$.

For $\nu = 20$, the table value of t at 5 percent level is 2.09. The calculated value is less than the table
value and hence the experiment provides no evidence against the hypothesis. We, therefore,
conclude that diets A and B do not differ significantly as regards their effect on increase in
weight.


3. Testing Difference between Means of Two Samples (Dependent Samples or Matched
Paired Observations)
\[ t = \frac{\bar{d} - 0}{S / \sqrt{n}} \quad \text{or} \quad t = \frac{\bar{d}\sqrt{n}}{S} \]
where $\bar{d}$ = the mean of the differences
S = the standard deviation of the differences
The value of S is calculated as follows:
\[ S = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} \quad \text{or} \quad S = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}} \]

It should be noted that t is based on n-1 degrees of freedom.


Problems

To verify whether a course in accounting improved performance, a similar test was given to 12
participants both before and after the course. The original marks recorded in alphabetical order
of the participants were 44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53 and 72. After the course, the
marks were, in the same order, 53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60 and 78. Was the course
useful?

Solutions: Let us take the hypothesis that there is no difference in the marks obtained before and
after the course, i.e. the course has not been useful.
Applying the t-test (difference formula):
\[ t = \frac{\bar{d}\sqrt{n}}{S} \]

Participants    Before        After         (2nd - 1st Test)
                (1st Test)    (2nd Test)          d            $d^2$
     A             44            53              +9             81
     B             40            38              -2              4
     C             61            69              +8             64
     D             52            57              +5             25
     E             32            46             +14            196
     F             44            39              -5             25
     G             70            73              +3              9
     H             41            48              +7             49
     I             67            73              +6             36
     J             72            74              +2              4
     K             53            60              +7             49
     L             72            78              +6             36
                                           $\sum d = 60$    $\sum d^2 = 578$

\[ \bar{d} = \frac{\sum d}{n} = \frac{60}{12} = 5 \]
\[ S = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}} = \sqrt{\frac{578 - 12(5)^2}{12 - 1}} = \sqrt{\frac{278}{11}} = 5.03 \]
\[ t = \frac{5\sqrt{12}}{5.03} = \frac{5 \times 3.464}{5.03} = 3.443 \]

$\nu = n - 1 = 12 - 1 = 11$.
For $\nu = 11$, $t_{0.05} = 2.201$.
The calculated value of t is greater than the table value. The hypothesis is rejected. Hence the
course has been useful.
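
The difference formula can be sketched in Python as below. The function name `paired_t` is illustrative and not part of the original text. The marks used are those that reproduce the differences d tabulated in the solution, which is an assumption wherever the printed listings disagree. The table value 2.201 is taken from the text, and the computed t matches the hand calculation up to rounding.

```python
import math


def paired_t(before, after):
    """Return (mean difference, S, t, degrees of freedom) for paired observations."""
    d = [a - b for b, a in zip(before, after)]   # second test minus first test
    n = len(d)
    d_bar = sum(d) / n
    # Shortcut form of S: (sum of d^2 - n * d_bar^2) / (n - 1)
    s = math.sqrt((sum(x ** 2 for x in d) - n * d_bar ** 2) / (n - 1))
    t = d_bar * math.sqrt(n) / s
    return d_bar, s, t, n - 1


before = [44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53, 72]
after = [53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60, 78]
d_bar, s, t, df = paired_t(before, after)
table_t = 2.201   # 5% table value for 11 degrees of freedom
print(d_bar, round(s, 3), round(t, 3), df)
print("reject hypothesis" if t > table_t else "accept hypothesis")
```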

Problems:
A drug is given to 10 patients and the increments in their blood pressure were recorded to be 3,
6, -2, 4, -3, 4, 6, 0, 0, 2. Is it reasonable to believe that the drug has no effect on change of
blood pressure? (5% value of t for 9 d.f.=2.26)
Solution. Let us take the hypothesis that the drug has no effect on change of blood pressure.
Applying the difference test:
\[ t = \frac{\bar{d}\sqrt{n}}{S} \]

     d        $d^2$
     3          9
     6         36
    -2          4
     4         16
    -3          9
     4         16
     6         36
     0          0
     0          0
     2          4
$\sum d = 20$   $\sum d^2 = 130$

\[ \bar{d} = \frac{\sum d}{n} = \frac{20}{10} = 2 \]
\[ S = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}} = \sqrt{\frac{130 - 10(2)^2}{10 - 1}} = \sqrt{\frac{90}{9}} = 3.162 \]
\[ t = \frac{2\sqrt{10}}{3.162} = \frac{2 \times 3.162}{3.162} = 2 \]

$\nu = n - 1 = 10 - 1 = 9$. For $\nu = 9$, $t_{0.05} = 2.26$.

The calculated value of t is less than the table value. The hypothesis is accepted. Hence it is
reasonable to believe that the drug has no effect on change of blood pressure.
