
Paper Code and Title: H16RM Research Methodology and Statistics for Home Science
Module Code and name: H16RM37 Chi-Square Test
Name of the Content Writer: Dr. S. Gandhimathi

CHI-SQUARE TEST
1. Introduction
The chi-square (χ²) test is a non-parametric test. It is used to test goodness of fit,
that is, to check whether the distribution of observed data agrees with an assumed
theoretical distribution. It is therefore a measure of the divergence between actual and
expected frequencies. It is widely used in statistics, especially in sampling studies,
where we do not expect actual and expected frequencies to coincide exactly and need to
know how far the difference can be ignored as a fluctuation of sampling. If there is no
difference between the actual and expected frequencies, χ² is zero. Thus the chi-square
test measures the discrepancy between theory and observation.

2. Objectives

In this module, we are going to discuss the following:

1. The characteristics of the χ² test
2. Assumptions of the χ² test
3. Uses of the chi-square test
4. How to apply the chi-square test

3. Characteristics of the χ² Test:

1. It is specifically used to test the association between variables when the data
are qualitative.
2. The chi-square test is based on events or frequencies, whereas tests based on
theoretical distributions use the mean and standard deviation.
3. It is applied to test hypotheses and draw inferences.
4. The test can be applied to the entire set of observed and expected
frequencies.
5. A new χ² distribution is formed for every increase in the number of degrees of
freedom.

4. Assumptions:

1. The observations must be independent

2. The events must be mutually exclusive.
3. The number of observations must be large.
4. The data must be in original units for comparison purposes.
5. No expected frequency should be less than 5. If one is, it should be pooled
with the frequency of an adjacent class.
6. The sample must be drawn at random.

5. Degree of freedom:

When we compare the computed value of χ² with the table value, we need the degrees
of freedom. The degrees of freedom are the number of classes to which values can be
assigned at will without violating the restrictions imposed. For example, suppose we
choose any four numbers whose total is 50. We are free to select any three numbers, say
10, 15 and 20, but the fourth number must then be 5: [50 − (10 + 15 + 20)]. Our freedom
of choice is thus reduced by one by the condition that the total be 50. The number of
restrictions is therefore one and the degrees of freedom are three. As the restrictions
increase, the freedom is reduced.

Thus ν = n − k

ν (nu) : Degrees of freedom

n : Number of frequency classes

k : Number of independent constraints

For a contingency table with r rows and c columns, the degrees of freedom are
(c − 1)(r − 1). For a 2 × 2 table,

ν = (c − 1)(r − 1)
  = (2 − 1)(2 − 1)
  = 1.
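
The two rules for degrees of freedom can be sketched in Python as below. This is only an illustration: the function names are assumptions introduced here, not part of the module. It also replays the four-numbers example, where fixing the total at 50 leaves three free choices.

```python
def df_from_constraints(n_classes: int, n_constraints: int) -> int:
    """Degrees of freedom = number of classes minus independent constraints."""
    if n_constraints > n_classes:
        raise ValueError("constraints cannot exceed the number of classes")
    return n_classes - n_constraints


def df_contingency(rows: int, cols: int) -> int:
    """Degrees of freedom for an r x c contingency table: (r - 1)(c - 1)."""
    return (rows - 1) * (cols - 1)


# Four numbers with a fixed total of 50: three can be chosen freely,
# and the fourth is then determined by the restriction.
free_choices = [10, 15, 20]
fourth = 50 - sum(free_choices)

print(fourth)
print(df_from_constraints(4, 1))
print(df_contingency(2, 2))
```
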
6. Uses:
1. χ² test of goodness of fit: Through this test we can find out the deviations
between the observed values and the expected values. Here we are concerned not
with the parameters but with the form of the distribution. Karl Pearson
has developed a method to test the difference between the theoretical value
(hypothesis) and the observed value. The test is done by comparing the
computed value with the table value of χ² for the desired degrees of freedom. The
Greek letter χ² is used to describe the magnitude of the difference between fact
and theory.
χ² may be defined as

$$\chi^2 = \sum \left[ \frac{(O - E)^2}{E} \right]$$

O = Observed frequencies
E = Expected frequencies.

Steps:

1. Establish a hypothesis along with the significance level.
2. Compute the deviations between the observed and expected values, (O − E).
3. Square the deviations, (O − E)².
4. Divide each (O − E)² by its expected frequency E.
5. Add all the values obtained in step 4.
6. Find the value of χ² from the χ² table at the chosen level of significance, usually the 5% level.

If the calculated value of χ² is greater than the table value of χ² at the chosen
level of significance, we reject the hypothesis. If the computed value of χ² is zero, the
observed and expected values coincide completely. If the computed value of χ² is less
than the table value at the chosen level of significance, it is said to be
non-significant. This implies that the discrepancy between the observed and expected
frequencies may be due to fluctuations of simple sampling.

Example: Four coins were tossed 160 times and the following results were obtained:

No. of heads             0    1    2    3    4
Observed frequencies    17   52   54   31    6

Under the assumption that the coins are balanced, find the expected frequencies of
getting 0, 1, 2, 3 or 4 heads and test the goodness of fit.

Solution:

The hypothesis is that the coins are unbiased.

X    Expected frequency E = 160 × 4Cx × (0.5)⁴

0    160 × 4C0 × (0.5)⁴ = 10
1    160 × 4C1 × (0.5)⁴ = 40
2    160 × 4C2 × (0.5)⁴ = 60
3    160 × 4C3 × (0.5)⁴ = 40
4    160 × 4C4 × (0.5)⁴ = 10

Applying χ²,

No. of heads     O     E    O − E   (O − E)²   (O − E)²/E
0               17    10      7        49        4.900
1               52    40     12       144        3.600
2               54    60     −6        36        0.600
3               31    40     −9        81        2.025
4                6    10     −4        16        1.600
                            Σ [(O − E)²/E] = 12.725

df = 5 − 1 = 4; χ² at the 5% level = 9.488

The calculated value of χ² is 12.725, which is greater than the table value of 9.488.
Therefore the hypothesis that the coins are unbiased is rejected; the fit is not good.
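
As a check on the arithmetic, the steps of the goodness-of-fit test can be sketched in Python for the coin example. This is a minimal sketch using only the standard library; the variable names are illustrative, and the table value 9.488 is taken from the text rather than computed.

```python
from math import comb

observed = [17, 52, 54, 31, 6]   # observed frequencies for 0..4 heads
n_coins = 4
n_tosses = 160
p_head = 0.5                     # hypothesis: the coins are unbiased

# Expected frequency for x heads: N * C(4, x) * p^x * (1 - p)^(4 - x)
expected = [
    n_tosses * comb(n_coins, x) * p_head**x * (1 - p_head) ** (n_coins - x)
    for x in range(n_coins + 1)
]

# Steps 2 to 5: deviation, square, divide by E, and sum
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1           # one constraint: the total is fixed at 160
table_value = 9.488              # 5% table value for 4 d.f., as quoted in the text

print(expected)
print(round(chi_square, 3), df)
print("reject" if chi_square > table_value else "do not reject")
```
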

2. χ² as a test of independence: The χ² test can be used to find out whether two or
more attributes are associated or not. For example, for coaching class and
successful candidates, or marriage and failure, we can find out whether the
attributes are related or independent. We take the hypothesis that the attributes
are independent. If the calculated value of χ² is less than the table value at the
chosen level of significance, the hypothesis is accepted; otherwise it is rejected.

Example: Out of a sample of 120 persons in a village, 80 were administered a new drug
for preventing influenza, and 26 of them were attacked by influenza. Of those who were
not administered the new drug, 6 persons were not affected by influenza. (a) Prepare
2 × 2 tables showing the actual and expected frequencies; (b) use the chi-square test
to find out whether the new drug is effective or not.

(At 5% level for one degree of freedom, the value of Chi square is 3.84).

Solution: 2 × 2 Table

                     Drug given (A)   Drug not given (α)   Total
Attacked (B)               26                 34             60
Not attacked (β)           54                  6             60
Total                      80                 40            120 (N)

Let us take the hypothesis that influenza and the new drug are independent. The
expected frequencies are:

(80 × 60)/120 = 40    (60 × 40)/120 = 20     60
(60 × 80)/120 = 40    (60 × 40)/120 = 20     60
80                    40                    120

 O     E    O − E   (O − E)²   (O − E)²/E
26    40    −14       196         4.9
54    40     14       196         4.9
34    20     14       196         9.8
 6    20    −14       196         9.8
                 Σ [(O − E)²/E] = 29.4

d.f. = (2 − 1)(2 − 1) = 1; χ² at the 5% level for 1 d.f. = 3.84

The calculated value of χ² is 29.4, which is much higher than the table value.
Therefore the hypothesis is rejected. Hence we conclude that the drug is effective in
controlling influenza.

7. YATES CORRECTION

If any cell frequency in a 2 × 2 table is less than 5, then for the application of
the χ² test it would have to be pooled with the preceding or succeeding frequency so
that the total is greater than 5. This results in the loss of 1 d.f., and a 2 × 2 table
has only 1 d.f. In such a situation, i.e., when any cell frequency in a 2 × 2 table is
less than 5, we instead apply the correction for continuity, popularly known as Yates'
correction. This consists of adding 0.5 to the cell frequency which is less than 5,
adjusting the remaining frequencies accordingly, since the row and column totals are
fixed, and then applying the χ² test without pooling.

Illustration: Solving the illustration.

Solution: Yates' correction is applied. In Yates' correction, 0.5 is added to the
observed cell frequency which is less than 5. In this problem, one observed frequency, 2,
is less than 5. We therefore add 0.5 to 2 and make it 2.5. The rest of the frequencies
are adjusted keeping the sub-totals unchanged. Thus after Yates' correction, the observed
frequencies would be:

2.5     9.5    12
5.5     6.5    12
8.0    16.0    24

Now we can calculate the value of χ² with the above observed frequencies.

Computation of χ²

         O      E     O − E   (O − E)²   (O − E)²/E
        2.5    4.0    −1.5      2.25       0.56250
        9.5    8.0     1.5      2.25       0.28125
        5.5    4.0     1.5      2.25       0.56250
        6.5    8.0    −1.5      2.25       0.28125
Total   24     24      0                   1.68750

Since this value is less than the table value of χ² (3.84 at the 5% level for 1 d.f.),
it is not significant. Hence the data do not show that the vaccine is effective in
controlling the disease.
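
The corrected value can be checked with a short Python sketch. It computes χ² from the adjusted frequencies given above and, for comparison, from the commonly stated form of Yates' correction, Σ (|O − E| − 0.5)²/E, which is not written out in this module. The uncorrected counts (2, 10, 6, 6) are an assumption inferred from the adjusted table and its fixed marginal totals, and the function names are illustrative.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_yates(observed, expected):
    """Continuity-corrected form: sum of (|O - E| - 0.5)^2 / E."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


expected = [4.0, 8.0, 4.0, 8.0]

# Frequencies after adding 0.5 to the small cell and adjusting the rest
adjusted = [2.5, 9.5, 5.5, 6.5]

# Assumption: uncorrected cell counts, inferred from the adjusted table
# and the fixed marginal totals (12, 12, 8, 16)
uncorrected = [2, 10, 6, 6]

by_adjustment = chi_square(adjusted, expected)
by_formula = chi_square_yates(uncorrected, expected)

table_value = 3.84   # 5% table value for 1 d.f.
print(by_adjustment, by_formula, by_adjustment < table_value)
```

For a 2 × 2 table every cell deviates from its expected value by the same absolute amount, so adjusting the cells by 0.5 and shrinking each |O − E| by 0.5 lead to the same total.
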

3. χ² as a test of homogeneity: This type of application of the χ² test can be
regarded as an extension of the χ² test of independence. Such tests indicate
whether two or more independent samples are drawn from the same population or
from different populations. Instead of one sample, as in the independence
problem, we now have two or more samples. A random sample is drawn from each
population and the number in each sample falling into each category is
determined. The sample data are displayed in a contingency table. The
analytical procedure is the same as that discussed for the test of independence.

The main difference is that in the test of independence we are concerned with
whether two attributes are independent or not, while in the test of
homogeneity we are concerned with whether the different samples come from the
same population. Another difference is that the test of independence involves a
single sample, whereas the test of homogeneity involves two or more samples, one
from each population.

Additive Property: One of the merits of the χ² test as an instrument of research is
that it is possible to combine independently derived values of χ² relating to
samples of similar data by the simple process of addition. This enables a better
test than could be made using the data of any one sample by itself. The sum of the
χ² values thus combined will itself have a χ² distribution with degrees of freedom
equal to the sum of the degrees of freedom of the separate χ² values. However,
while adding χ² values two points must be remembered:

(i) The combined result in a single inclusive test is appropriate only when the
samples are independent; and
(ii) When χ² values are to be added, Yates' correction should not be applied,
because the addition theorem holds only for uncorrected constituent items.

Example: 1000 families were selected at random in a city to test the belief that high
income families usually send their children to public schools and the low income
families often send their children to government schools. The following results were
obtained:

Income          School               Total
           Public   Government
Low         370        430            800
High        130         70            200
Total       500        500           1000

Test whether income and type of schooling are independent.

Solution: Let us take the hypothesis that the income and type of schooling are
independent.
$$\text{Expectation of } (AB) = \frac{(A) \times (B)}{N} = \frac{500 \times 800}{1000} = 400$$

The table of expected frequencies:

400    400     800
100    100     200
500    500    1000

Applying the χ² test

          O       E     (O − E)²   (O − E)²/E
         370     400      900         2.25
         130     100      900         9.00
         430     400      900         2.25
          70     100      900         9.00
Total   1000    1000                 22.50

$$\chi^2 = \sum \left[ \frac{(O - E)^2}{E} \right] = 22.5$$

The calculated value of χ² is more than the table value (for ν = 1, χ² at the 5%
level = 3.84). The hypothesis is rejected. Hence income and type of schooling are not
independent.
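
The procedure for a test of independence can be sketched in Python for the schooling data. This is a minimal sketch for a general r × c table of observed frequencies; the function names are illustrative assumptions, and the table value 3.84 is taken from the text rather than computed.

```python
def expected_table(observed):
    """Expected cell frequencies under independence: row total x column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_table(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Rows: low income, high income; columns: public, government
schooling = [[370, 430], [130, 70]]

stat, df = chi_square_independence(schooling)
table_value = 3.84   # 5% table value for 1 d.f., as given in the text

print(expected_table(schooling))
print(stat, df, stat > table_value)
```

The same function applies to a test of homogeneity, since the analytical procedure is identical; only the sampling scheme differs.
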

Example: From the data given below about the treatment of 250 patients
suffering from a disease, state whether the new treatment is superior to the
conventional treatment.

Type of treatment        No. of patients            Total
                   Favourable   Not favourable
New                   140            30              170
Conventional           50            30               80
Total                 190            60              250

(Given for degrees of freedom = 1, Chi-square 5% = 3.84)

Solution: Let us take the hypothesis that there is no significant difference between the
new and conventional treatment.

Applying the χ² test, the expected frequencies are:

(190 × 170)/250 = 129.2    40.8    170
60.8                       19.2     80
190                        60      250

 O       E      O − E   (O − E)²   (O − E)²/E
140    129.2    10.8     116.64       0.903
 50     60.8   −10.8     116.64       1.918
 30     40.8   −10.8     116.64       2.859
 30     19.2    10.8     116.64       6.075
                     Σ [(O − E)²/E] = 11.755

ν = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1

For ν = 1, χ² at the 5% level = 3.84.

The calculated value of χ² is greater than the table value. The hypothesis is
rejected. Hence there is a significant difference between the new and the conventional
treatment.
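
For a 2 × 2 table, the same statistic can be cross-checked with the well-known shortcut χ² = N(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)], where a, b, c, d are the four cell frequencies. The shortcut is not used in this module; the sketch below, with an illustrative function name, applies it to the treatment data only as a check, and its result agrees with the cell-by-cell total up to rounding.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut for a 2 x 2 table [[a, b], [c, d]]:
    N (ad - bc)^2 / ((a + b)(c + d)(a + c)(b + d))."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# New treatment: 140 favourable, 30 not favourable
# Conventional treatment: 50 favourable, 30 not favourable
stat = chi_square_2x2(140, 30, 50, 30)
table_value = 3.84   # 5% table value for 1 d.f., as given in the text

print(round(stat, 3), stat > table_value)
```
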

8. CONCLUSION

To summarise, the chi-square test is a non-parametric test, which means that it is
not based on any distributional assumptions about the population. It is used to test
the independence of attributes, the goodness of fit and the homogeneity of samples. The
chi-square test is based on frequencies and not on the actual values. When the data are
categorical and qualitative in nature, we cannot apply a parametric test, because
parametric tests are based on distributional assumptions. In such cases we have to use
a non-parametric test. This does not mean that the chi-square test can be applied only
to qualitative data. Even if the data are in actual values, we can convert them into
categories and use the chi-square test. Hence the chi-square test can be used in place
of a parametric test, but a parametric test cannot be used in place of the
non-parametric chi-square test.
