
Chi-Square, F-Tests & Analysis of Variance (ANOVA)
Chi-Square Test
The Chi-square test is used to compare experimentally obtained results with those expected theoretically.
It is written as $\chi^2$ and defined as
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
Where
$O_i$ = Observed frequency of the $i^{th}$ event
$E_i$ = Expected frequency of the $i^{th}$ event
The value of $\chi^2$ is never negative.
Since $\chi^2$ is a statistic and not a parameter, it does not involve any assumption about the population.
The significance test on $\chi^2$ is always a one-tailed test based on the right-hand tail of the $\chi^2$ distribution.
The shape of the $\chi^2$ distribution depends on the degrees of freedom, and it is not a symmetrical distribution.
Test for goodness of fit
$\chi^2$ is a measure of the probability of association between attributes.
It gives us an idea about the divergence between the observed and the expected frequencies.
Thus, the test is also described as the "test of goodness of fit".
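The goodness-of-fit calculation can be sketched in Python as follows. This is only an illustration: the function name `chi_square_statistic` and the die-rolling frequencies are hypothetical and not part of the original material, and the use of n - 1 = 5 degrees of freedom with the tabulated 5% value 11.07 is an assumption for this toy case.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all events."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: a die rolled 60 times, so each face is expected 10 times.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

chi2 = chi_square_statistic(observed, expected)

# Assumed critical value: chi-square table, 5 degrees of freedom, alpha = 0.05.
CHI2_CRITICAL = 11.07
fits = chi2 < CHI2_CRITICAL  # right-tail test: a small chi2 means a good fit

print(round(chi2, 2), fits)
```

For these made-up frequencies the statistic is about 1.0, well below the assumed critical value, so the observed counts would be judged consistent with a fair die.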
F-Distribution Test
The F-statistic is the ratio of two independent chi-square variates, each divided by its degrees of freedom. It is given by
$$F = \frac{\chi_1^2 / v_1}{\chi_2^2 / v_2}$$
The F-statistic is the ratio of independent estimates of population variances and is expressed as
$$F = \frac{\sigma_1^2}{\sigma_2^2}$$
with $v_1 = n_1 - 1$ degrees of freedom for the numerator and $v_2 = n_2 - 1$ degrees of freedom for the denominator.
$\sigma_1^2$ and $\sigma_2^2$ are the unbiased estimates based on the sample variances $s_1^2$ and $s_2^2$, and are given by
$$\sigma_1^2 = \frac{n_1 s_1^2}{n_1 - 1}, \qquad \sigma_2^2 = \frac{n_2 s_2^2}{n_2 - 1}$$
Generally $\sigma_1^2 > \sigma_2^2$, but if $\sigma_1^2 < \sigma_2^2$ then the two variances should be interchanged, so that the value of F is always greater than 1.
Assumptions in F-test
Normality: The values in each group should be normally distributed.
Independence of errors: The error of each value should be independent of the errors of the other values.
Homogeneity: The variances within each group should be equal for all groups, i.e.
$$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_n^2$$
If the samples are large enough, we don't need the assumption of normality.
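Test of hypothesis about the variances of two populations.
Steps involved are as follows:
1. Setting up of hypothesis
$$H_0: \sigma_1^2 = \sigma_2^2, \qquad H_1: \sigma_1^2 \neq \sigma_2^2$$
2. Calculation of test statistic
$$F = \frac{\sigma_1^2}{\sigma_2^2} \ \text{if} \ \sigma_1^2 > \sigma_2^2, \qquad F = \frac{\sigma_2^2}{\sigma_1^2} \ \text{if} \ \sigma_1^2 < \sigma_2^2$$
3. Determination of $F_\alpha$ for the given $\alpha$
4. Decision: Accept $H_0$ if the computed $F \le F_\alpha$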
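Summary of procedures is given below:

| Type of test | $H_0$ | $H_1$ | Rejection region | Decision rule |
|---|---|---|---|---|
| Two tailed | $\sigma_1^2 = \sigma_2^2$ | $\sigma_1^2 \neq \sigma_2^2$ | Both sides tails | Accept $H_1$ if $F_{cal} > F_{\alpha/2}$. Otherwise accept $H_0$ |
| One tailed | $\sigma_1^2 \le \sigma_2^2$ | $\sigma_1^2 > \sigma_2^2$ | Right tail | Accept $H_1$ if $F_{cal} > F_\alpha$ |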
Example
The standard deviations calculated from random samples of size 9 and 13 are 2.1 and 1.8 respectively. Test whether the samples can be regarded as drawn from normal populations with the same standard deviation.
Solution
Given
$$n_1 = 9, \quad s_1 = 2.1, \quad n_2 = 13, \quad s_2 = 1.8$$
$$\sigma_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{9(2.1)^2}{9 - 1} = 4.96$$
$$\sigma_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{13(1.8)^2}{13 - 1} = 3.51$$
Hypothesis setting
$$H_0: \sigma_1^2 = \sigma_2^2, \qquad H_1: \sigma_1^2 \neq \sigma_2^2$$
$$F = \frac{\sigma_1^2}{\sigma_2^2} = \frac{4.96}{3.51} = 1.41$$
The value of F-critical is computed by $F_\alpha(v_1, v_2)$ where $\alpha = 0.05$, $v_1 = n_1 - 1$, $v_2 = n_2 - 1$.
Read $v_1$ on the top and $v_2$ on the vertical (left most) under $\alpha = 0.05$. For $v_1 = 9 - 1 = 8$, $v_2 = 13 - 1 = 12$,
$$F_{0.05}(8, 12) = 2.85$$
Then, since $F_{cal} = 1.41 < F_{0.05}(8, 12) = 2.85$, accept $H_0$.
Hence, the samples may be regarded as drawn from normal populations with the same standard deviation.
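The arithmetic of this example can be checked with a short Python sketch. The helper names `unbiased_variance` and `variance_ratio_f` are hypothetical; the critical value 2.85 is simply the tabulated $F_{0.05}(8, 12)$ quoted in the example, not something the code derives.

```python
def unbiased_variance(n, s):
    """Unbiased variance estimate n * s^2 / (n - 1) from sample size n and sample s.d. s."""
    return n * s ** 2 / (n - 1)


def variance_ratio_f(n1, s1, n2, s2):
    """Return (F, v1, v2) with the larger variance estimate in the numerator."""
    var1 = unbiased_variance(n1, s1)
    var2 = unbiased_variance(n2, s2)
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    # Interchange the variances so that F is not less than 1.
    return var2 / var1, n2 - 1, n1 - 1


f_cal, v1, v2 = variance_ratio_f(9, 2.1, 13, 1.8)

F_CRITICAL = 2.85  # F_0.05(8, 12), read from an F table as in the example
accept_h0 = f_cal < F_CRITICAL

print(round(f_cal, 2), v1, v2, accept_h0)  # 1.41 8 12 True
```

Returning the degrees of freedom together with F keeps them in the right order when the two variances have to be interchanged.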
Analysis of variance
This is the method of splitting the total variation of data into
constituent parts. The total variation is split into:-
(a) Variation within subgroups of samples
(b) Variation between subgroups of the samples.
After splitting, these variations are tested for their significance
by F-test.
ANOVA Table
The technique of analysis of variance is referred to as ANOVA.
The ANOVA table shows the source of variation, the sum of squares, the degrees of freedom, the mean squares (variance) and the formula for the F-ratio.
Assumptions of Analysis of Variance:
Each of the samples is a random sample.
Each sample is independent of the other samples.
Each of the populations has the same variance ($\sigma_1^2 = \sigma_2^2 = \dots = \sigma_n^2$) and identical means ($\mu_1 = \mu_2 = \dots = \mu_n$).
Classification of Analysis of Variance
One way Analysis of Variance
Two way Analysis of Variance
One way Analysis of Variance
In this case we consider the influence of one factor
We determine if there are differences within that factor.
Computation of the test statistic:
The analysis is carried out on the basis of the ratio between the variances.
$$F = \frac{\text{Variance between the samples}}{\text{Variance within the samples}} = \frac{\sigma_{btn}^2}{\sigma_{within}^2}$$
The degrees of freedom associated with the numerator ($\sigma_{btn}^2$) and the denominator ($\sigma_{within}^2$) are (K - 1) and K(n - 1) respectively, because there are K samples and each sample has (n - 1) degrees of freedom.
Variance Between Samples:
In order to evaluate the variance between samples, we need to calculate the sum of squares between samples (SSB),
$$SSB = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$$
where $n_i$ is the size of the $i^{th}$ sample,
$\bar{x}$ is the grand mean or population mean (average of all items in all samples),
$\bar{x}_i$ is the mean of sample i.
$$\sigma_{btn}^2 = \frac{SSB}{K - 1}$$
This is also known as Mean Square between samples (MSB).
Variance Within Samples:
Computation of the variance within samples involves evaluation of the sum of squares within samples (SSW).
$$SSW = \sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2 + \dots + \sum (X_{ki} - \bar{X}_k)^2, \quad i = 1, 2, 3, \dots$$
$$\sigma_{within}^2 = \frac{SSW}{N - K} = \frac{SSW}{K(n - 1)}, \quad N = Kn$$
Where N = Total number of items in all the samples
K = Number of samples
$\sigma_{within}^2$ is also known as Mean Square within Samples (MSW).
Then,
$$F = \frac{\sigma_{btn}^2}{\sigma_{within}^2} = \frac{SSB / df}{SSW / df} = \frac{SSB / (K - 1)}{SSW / (N - K)} = \frac{MSB}{MSW}$$
ANOVA Table

| Source of Variation | Sum of Squares | Degree of freedom | Mean Square | F-ratio |
|---|---|---|---|---|
| Between Samples | SSB | (K - 1) | MSB = SSB / (K - 1) | MSB / MSW |
| Within Samples | SSW | (N - K) | MSW = SSW / (N - K) | |
| Total | SST | (N - 1) | | |
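The definitions of SSB, SSW and the F-ratio can be sketched directly in Python. This is a minimal illustration that works from deviations about the means; the function name `one_way_anova` and the three small samples are hypothetical and not taken from the original material.

```python
def one_way_anova(samples):
    """Return (SSB, SSW, F) computed from deviations about the means."""
    k = len(samples)                          # K: number of samples
    n_total = sum(len(s) for s in samples)    # N: total number of items
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]

    # SSB: sample size times the squared deviation of each sample mean
    # from the grand mean.
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # SSW: squared deviation of every item from its own sample mean.
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

    msb = ssb / (k - 1)         # mean square between samples
    msw = ssw / (n_total - k)   # mean square within samples
    return ssb, ssw, msb / msw


# Hypothetical data: three samples of three items each.
ssb, ssw, f_ratio = one_way_anova([[2, 4, 6], [5, 7, 9], [8, 10, 12]])
print(ssb, ssw, f_ratio)  # 54.0 24.0 6.75
```

The computed F-ratio would then be compared with the tabulated F value for (K - 1, N - K) degrees of freedom.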
Short-Cut Method
This method involves the following steps/procedures:
(i) Find the total sum (TS)
$$TS = n_1 \bar{X}_1 + n_2 \bar{X}_2 + \dots + n_k \bar{X}_k \quad \text{for K samples}$$
(ii) Calculate the correction factor (CF)
$$CF = \frac{(TS)^2}{N}, \quad N = \text{Total number of observations}$$
(iii) Calculate the total sum of squares (SST)
$$SST = \sum X_1^2 + \sum X_2^2 + \dots + \sum X_k^2 - \frac{(TS)^2}{N}$$
where $\sum X_i^2$ is the summation of squares for all X's in sample i.
(iv) Find the sum of squares between samples (SSB)
$$SSB = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \dots + \frac{(\sum X_k)^2}{n_k} - CF$$
where $(\sum X_1)^2$ is the square of the total of all values in sample 1.
(v) Find the sum of squares within samples (SSW)
SSW = SST - SSB
Then, the rest of the procedure is similar to the previous method, i.e.
$$\sigma_{btn}^2 = MSB = \frac{SSB}{K - 1}, \qquad \sigma_{within}^2 = MSW = \frac{SSW}{N - K}$$
Note:
The analysis of variance also involves a test of hypothesis (test of significance) of the variation.
The hypothesis setting is
$$H_0: \mu_1 = \mu_2 = \dots = \mu_p$$
i.e. the means of the populations from which the p samples are taken are equal to one another.
$H_1$: At least two of the means of the populations are unequal, or all the $\mu_i$ are not equal.
Example
The supervisor wanted to test whether the materials given by instructors in different sections of the statistics class are the same or not. 4 sections of the same course were selected and a common test was administered to 5 students selected randomly from each section.
The score of each student from each section was recorded.
Test for any difference in learning reflected in the average scores.
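| Students | Section 1 Score (X1) | Section 2 Score (X2) | Section 3 Score (X3) | Section 4 Score (X4) |
|---|---|---|---|---|
| 1 | 8 | 12 | 10 | 12 |
| 2 | 10 | 12 | 13 | 15 |
| 3 | 12 | 10 | 11 | 13 |
| 4 | 10 | 8 | 12 | 10 |
| 5 | 5 | 13 | 14 | 10 |
| Total | $\sum X_1 = 45$ | $\sum X_2 = 55$ | $\sum X_3 = 60$ | $\sum X_4 = 60$ |
| Mean | $\bar{X}_1 = 9$ | $\bar{X}_2 = 11$ | $\bar{X}_3 = 12$ | $\bar{X}_4 = 12$ |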
Solution
Hypothesis: $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_1$: All means are not equal
Computation of test statistic:
$$\text{Total Sum (TS)} = \sum X = \sum X_1 + \sum X_2 + \sum X_3 + \sum X_4 = 45 + 55 + 60 + 60 = 220$$
$$\text{Correction factor (CF)} = \frac{(TS)^2}{N} = \frac{220^2}{20} = 2420$$
Total sum of squares
$$SST = \sum X^2 - CF = 2522 - 2420 = 102$$
Sum of squares between samples (SSB) is
$$SSB = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \frac{(\sum X_4)^2}{n_4} - CF = \frac{45^2}{5} + \frac{55^2}{5} + \frac{60^2}{5} + \frac{60^2}{5} - 2420 = 30$$
Sum of squares within samples (SSW) is
$$SSW = SST - SSB = 102 - 30 = 72$$
Then,
$$F = \frac{\sigma_{btn}^2}{\sigma_{within}^2} = \frac{SSB / (K - 1)}{SSW / (N - K)} = \frac{30 / (4 - 1)}{72 / (20 - 4)} = 2.22$$
The critical value of F at $\alpha = 0.05$, $v_1 = 3$ and $v_2 = 16$ is $F_{0.05}(3, 16) = 3.24$.
Since $F_{cal} = 2.22 < F_{0.05}(3, 16) = 3.24$, we accept $H_0$.
The ANOVA Table is shown below.

| Source of variation | Sum of squares | Degree of freedom | Mean Square | F-ratio |
|---|---|---|---|---|
| Between samples | SSB = 30 | K - 1 = 3 | MSB = 30/3 = 10 | F = MSB/MSW = 2.22 |
| Within samples | SSW = 72 | N - K = 16 | MSW = 72/16 = 4.5 | |
| Total | SST = 102 | | | |
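The short-cut method applied to the four sections can be reproduced with the Python sketch below. The function name `one_way_anova_shortcut` and the dictionary keys are hypothetical; the scores are the ones given in the example.

```python
def one_way_anova_shortcut(samples):
    """One-way ANOVA by the short-cut method (TS, CF, SST, SSB, SSW, F)."""
    n_total = sum(len(s) for s in samples)   # N: total number of observations
    k = len(samples)                         # K: number of samples
    ts = sum(sum(s) for s in samples)        # total sum
    cf = ts ** 2 / n_total                   # correction factor
    sst = sum(x ** 2 for s in samples for x in s) - cf
    ssb = sum(sum(s) ** 2 / len(s) for s in samples) - cf
    ssw = sst - ssb
    f_ratio = (ssb / (k - 1)) / (ssw / (n_total - k))
    return {"TS": ts, "CF": cf, "SST": sst, "SSB": ssb, "SSW": ssw, "F": f_ratio}


# Scores of the 5 students in each of the 4 sections (from the example).
sections = [
    [8, 10, 12, 10, 5],     # Section 1
    [12, 12, 10, 8, 13],    # Section 2
    [10, 13, 11, 12, 14],   # Section 3
    [12, 15, 13, 10, 10],   # Section 4
]

result = one_way_anova_shortcut(sections)
print(result["SST"], result["SSB"], result["SSW"], round(result["F"], 2))
# 102.0 30.0 72.0 2.22
```

The printed values agree with the ANOVA table of the example; the comparison with $F_{0.05}(3, 16) = 3.24$ still relies on a table lookup.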
Two way Analysis of Variance
Two-way ANOVA technique is used when data are classified
on the basis of two factors:
For example, the agricultural output may be classified on the
basis of different varieties of seeds and also on different
varieties of fertilizers used.
In a factory, the various units of products produced during a
certain period may be classified on the basis of different
varieties of machines used and also on the basis of different
grades of labour.
There are two cases for 2-way ANOVA.
(i) When there are no repeated values
(ii) When there are repeated values
Case 1: No repeated values
The following are the steps involved:
(i) Compute the total sum (TS)
(ii) Compute the correction factor (CF)
$$CF = \frac{(TS)^2}{N}$$
(iii) Compute the total sum of squares (SST)
$$SST = \sum X_{ij}^2 - CF$$
(iv) Find the sum of squares between columns (SSB columns)
$$SSB_{columns} = \frac{(\sum X_{i1})^2}{n_1} + \frac{(\sum X_{i2})^2}{n_2} + \dots + \frac{(\sum X_{ik})^2}{n_k} - CF$$
(v) Find the sum of squares between rows (SSB rows)
$$SSB_{rows} = \frac{(\sum X_{1i})^2}{n_1} + \frac{(\sum X_{2i})^2}{n_2} + \dots + \frac{(\sum X_{mi})^2}{n_m} - CF$$
(vi) Find the sum of squares for residuals (error variance)
SS for residuals = SST - (SSB columns + SSB rows)
(vii) Degrees of freedom (d.f.) are as follows:
d.f. for total variance = c.r - 1
d.f. for SSB columns = c - 1
d.f. for SSB rows = r - 1
d.f. for error variance = (c - 1)(r - 1)
Where c and r are the number of columns and rows respectively.
The ANOVA table is set as follows:
2-way ANOVA Table

| Source of Variation | Sum of Squares | Degree of freedom | Mean Squares | F-ratio |
|---|---|---|---|---|
| Between columns | SSB-columns | c - 1 | MSBc = (SSBc)/(c - 1) | MSBc/MSresidual |
| Between rows | SSB-rows | r - 1 | MSBr = (SSBr)/(r - 1) | MSBr/MSresidual |
| Residual or error | SS-residual | (c - 1)(r - 1) | MSresidual = (SSres)/((c - 1)(r - 1)) | |
| Total | SST | c.r - 1 | | |
Thus, MS residual (the residual variance) provides the basis for the F-ratios concerning the variation between column treatments and between row treatments.
MS residual is always due to the fluctuations of sampling and hence serves as the basis for the significance test.
Example
Set up an ANOVA table for the following two-way design results.

| Varieties of fertilizers | Varieties of seeds: A | B | C |
|---|---|---|---|
| W | 6 | 5 | 5 |
| X | 7 | 5 | 4 |
| Y | 3 | 3 | 3 |
| Z | 8 | 7 | 4 |
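Solution
$$\sum A = 24, \ \sum B = 20, \ \sum C = 16, \ \sum A^2 = 158, \ \sum B^2 = 108, \ \sum C^2 = 66, \ N = 12$$
$$TS = \sum A + \sum B + \sum C = 60$$
$$\text{Correction factor (CF)} = \frac{(TS)^2}{N} = \frac{60^2}{12} = 300$$
$$SST = \sum A^2 + \sum B^2 + \sum C^2 - CF = 158 + 108 + 66 - 300 = 32$$
$$SSB_{columns} = \frac{(\sum A)^2}{n_A} + \frac{(\sum B)^2}{n_B} + \frac{(\sum C)^2}{n_C} - CF = \frac{24^2}{4} + \frac{20^2}{4} + \frac{16^2}{4} - 300 = 8$$
$$SSB_{rows} = \frac{(\sum W)^2}{n_W} + \frac{(\sum X)^2}{n_X} + \frac{(\sum Y)^2}{n_Y} + \frac{(\sum Z)^2}{n_Z} - CF = \frac{16^2}{3} + \frac{16^2}{3} + \frac{9^2}{3} + \frac{19^2}{3} - 300 = 18$$
SS for residuals or error = SST - (SSB columns + SSB rows)
= 32 - [8 + 18] = 6.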
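ANOVA Table

| Source of Variation | SS | d.f. | MS | F-ratio | 5% F values |
|---|---|---|---|---|---|
| Btn columns | 8 | 3 - 1 = 2 | 8/2 = 4 | 4/1 = 4 | F(2,6) = 5.14 |
| Btn rows | 18 | 4 - 1 = 3 | 18/3 = 6 | 6/1 = 6 | F(3,6) = 4.76 |
| Residuals | 6 | (3 - 1)(4 - 1) = 6 | 6/6 = 1 | | |
| Total | 32 | 4 x 3 - 1 = 11 | | | |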
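The steps for the case with no repeated values can be sketched in Python as follows. The function name `two_way_anova` and the dictionary keys are hypothetical; the data are the seed and fertilizer figures from the example, with one observation per cell.

```python
def two_way_anova(table):
    """Two-way ANOVA without repeated values; table is a list of rows."""
    r = len(table)        # number of rows
    c = len(table[0])     # number of columns
    n_total = r * c       # N: one observation per cell
    ts = sum(sum(row) for row in table)
    cf = ts ** 2 / n_total
    sst = sum(x ** 2 for row in table for x in row) - cf

    # Each column total is squared and divided by the items per column (r).
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    ss_cols = sum(t ** 2 / r for t in col_totals) - cf
    # Each row total is squared and divided by the items per row (c).
    ss_rows = sum(sum(row) ** 2 / c for row in table) - cf

    ss_res = sst - (ss_cols + ss_rows)
    ms_res = ss_res / ((c - 1) * (r - 1))
    return {
        "SST": sst,
        "SSB_columns": ss_cols,
        "SSB_rows": ss_rows,
        "SS_residual": ss_res,
        "F_columns": (ss_cols / (c - 1)) / ms_res,
        "F_rows": (ss_rows / (r - 1)) / ms_res,
    }


# Rows W, X, Y, Z; columns A, B, C (from the example).
table = [
    [6, 5, 5],
    [7, 5, 4],
    [3, 3, 3],
    [8, 7, 4],
]

result = two_way_anova(table)
print({name: round(value, 2) for name, value in result.items()})
```

Up to floating-point rounding this should reproduce the table above: SST 32, SSB columns 8, SSB rows 18, residual 6, and F-ratios 4 and 6.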
Case 2: When there are repeated values.
Consider the following:

|   | X | Y | Z |
|---|---|---|---|
| A | 4 | 1 | 1 |
|   | 5 | 2 | 2 |
| B | 2 | 3 | 1 |
|   | 1 | 2 | 1 |
| C | 1 | 2 | 4 |
|   | 1 | 1 | 1 |

The procedures are almost the same, but with slight changes.
(i) Find the total sum (TS)
(ii) Compute the correction factor (CF)
(iii) Compute the total sum of squares (SST)
(iv) Find SSB columns
(v) Find SSB rows
(vi) Find the sum of squares within samples (SSW)
$$SSW = \sum (X_{ij} - \bar{X}_i)^2$$
where $\bar{X}_i$ is the mean of the sample (cell) that contains the observation $X_{ij}$.
(vii) Find the left-over sum of squares and left-over degrees of freedom, which are known as interaction variation.
Interaction SS = SST - (SSB columns + SSB rows + SSW)
Interaction is the measure of interrelationship among the two different classifications.
The ANOVA table is as follows:

| Source of Variation | SS | d.f. | MS | F-ratio |
|---|---|---|---|---|
| Between columns | SSB columns | c - 1 | MSB columns | MSB columns / MSW |
| Between rows | SSB rows | r - 1 | MSB rows | MSB rows / MSW |
| Interaction | SS interaction | Total d.f. - [(c - 1) + (r - 1) + (N - K)] | MS interaction = SS interaction / d.f. | MS interaction / MSW |
| Within samples | SSW | N - K | MSW | |
| Total | SST | N - 1 | | |
Example
Set up the ANOVA table for the following information relating to three drugs tested to judge their effectiveness in reducing blood pressure for different groups of people.

| Group | Drug X | Drug Y | Drug Z |
|---|---|---|---|
| A | 14 | 10 | 11 |
|   | 15 | 9 | 11 |
| B | 12 | 7 | 10 |
|   | 11 | 8 | 11 |
| C | 10 | 11 | 8 |
|   | 11 | 11 | 7 |
Solution
$$TS = 187, \quad N = 18, \quad CF = \frac{(TS)^2}{N} = \frac{(187)^2}{18} = 1942.72$$
$$SST = \sum X_{ij}^2 - CF = 2019 - 1942.72 = 76.28$$
$$SSB_{columns} = \frac{(\sum X)^2}{n_X} + \frac{(\sum Y)^2}{n_Y} + \frac{(\sum Z)^2}{n_Z} - CF = \frac{73^2}{6} + \frac{56^2}{6} + \frac{58^2}{6} - 1942.72 = 28.78$$
$$SSB_{rows} = \frac{(\sum A)^2}{n_A} + \frac{(\sum B)^2}{n_B} + \frac{(\sum C)^2}{n_C} - CF = \frac{70^2}{6} + \frac{59^2}{6} + \frac{58^2}{6} - 1942.72 = 14.78$$
$$SS_{\text{within samples}} = \sum (X_{ij} - M)^2 = 3.50$$
where M is the mean of the sample (cell) to which $X_{ij}$ belongs.
$$SS_{interaction} = SST - [SSB_{columns} + SSB_{rows} + SS_{\text{within samples}}] = 76.28 - [28.78 + 14.78 + 3.50] = 29.22$$
ANOVA Table

| Source of Variation | SS | d.f. | MS | F-ratio | 5% Limit |
|---|---|---|---|---|---|
| Btn Columns | 28.78 | 2 | 28.78/2 = 14.390 | 14.390/0.389 = 37.0 | F(2,9) = 4.26 |
| Btn Rows | 14.78 | 2 | 14.78/2 = 7.390 | 7.390/0.389 = 19.0 | F(2,9) = 4.26 |
| Interaction | 29.22 | 4 | 29.22/4 = 7.305 | 7.305/0.389 = 18.8 | F(4,9) = 3.63 |
| Within samples (errors) | 3.50 | 9 | 3.50/9 = 0.389 | | |
| Total | 76.28 | 17 | | | |
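The calculation with repeated values can be checked with the Python sketch below. The function name `two_way_anova_replicated`, the nested-list layout and the dictionary keys are hypothetical; the data are the blood-pressure figures from the example, with two observations per cell. The results should agree with the worked figures to within rounding in the second decimal place.

```python
def two_way_anova_replicated(cells):
    """cells[i][j] holds the repeated values for row i and column j."""
    r, c = len(cells), len(cells[0])
    values = [x for row in cells for cell in row for x in cell]
    n_total = len(values)      # N: total number of observations
    k_cells = r * c            # K: number of samples (cells)
    cf = sum(values) ** 2 / n_total
    sst = sum(x ** 2 for x in values) - cf

    ss_cols = -cf
    for j in range(c):
        column = [x for row in cells for x in row[j]]
        ss_cols += sum(column) ** 2 / len(column)

    ss_rows = -cf
    for row in cells:
        row_values = [x for cell in row for x in cell]
        ss_rows += sum(row_values) ** 2 / len(row_values)

    # Within samples: squared deviations from each cell's own mean.
    ss_within = 0.0
    for row in cells:
        for cell in row:
            mean = sum(cell) / len(cell)
            ss_within += sum((x - mean) ** 2 for x in cell)

    # Interaction takes the left-over sum of squares and degrees of freedom.
    ss_inter = sst - (ss_cols + ss_rows + ss_within)
    df_within = n_total - k_cells
    df_inter = (n_total - 1) - ((c - 1) + (r - 1) + df_within)
    msw = ss_within / df_within
    return {
        "SST": sst,
        "SSB_columns": ss_cols,
        "SSB_rows": ss_rows,
        "SSW": ss_within,
        "SS_interaction": ss_inter,
        "df_interaction": df_inter,
        "F_columns": (ss_cols / (c - 1)) / msw,
        "F_rows": (ss_rows / (r - 1)) / msw,
        "F_interaction": (ss_inter / df_inter) / msw,
    }


# Rows are groups A, B, C; columns are drugs X, Y, Z (from the example).
cells = [
    [[14, 15], [10, 9], [11, 11]],
    [[12, 11], [7, 8], [10, 11]],
    [[10, 11], [11, 11], [8, 7]],
]

result = two_way_anova_replicated(cells)
print({name: round(value, 2) for name, value in result.items()})
```

All three F-ratios (about 37.0, 19.0 and 18.8) exceed their 5% limits in the table, which is what the comparison column is for.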
