Module Code and name: H16RM37 Chi-Square Test
Name of the Content Writer: Dr. S. Gandhimathi
CHI-SQUARE TEST
1. Introduction
The chi-square (χ²) test is a non-parametric test. It is used to test the goodness of fit, that is, to verify whether the distribution of observed data agrees with an assumed theoretical distribution. It is therefore a measure of the divergence between actual (observed) and expected frequencies. It is widely used in statistics, especially in sampling studies, where we expect some discrepancy between actual and expected frequencies and must judge how far that difference can be ignored as the result of sampling fluctuations. If there is no difference between the actual and expected frequencies, χ² is zero. Thus, the chi-square test measures the discrepancy between theory and observation.
2. Objectives
3. Characteristics of the χ² Test:
1. It is specifically used to test the association between variables when the data are qualitative.
2. The chi-square test is based on events or frequencies, whereas tests based on a theoretical distribution rely on the mean and standard deviation.
3. It is applied to test hypotheses and so to draw inferences.
4. The test compares the entire set of observed frequencies with the corresponding expected frequencies.
5. A different χ² distribution is formed for each number of degrees of freedom.
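Because the χ² distribution changes with the degrees of freedom, so does the table value used for the decision. The sketch below assumes only the Python standard library (no SciPy): it evaluates the upper-tail probability of χ² for integer degrees of freedom from the standard closed-form expressions and finds the 5% table value by bisection. The names `chi2_sf` and `chi2_critical` are illustrative, not taken from any library; the results should agree with printed tables, for example 3.84 for 1 d.f. and 9.488 for 4 d.f., the values used in this module.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 case, erfc(sqrt(x/2)), and add the remaining terms.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi2_critical(alpha: float, df: int) -> float:
    """Table value: the x for which P(X > x) = alpha, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail still too large, so x must be bigger
        else:
            hi = mid
    return (lo + hi) / 2.0


# The 5% table value grows with the degrees of freedom.
for df in (1, 2, 3, 4, 5):
    print(df, round(chi2_critical(0.05, df), 3))
```

Bisection is used only because it is short and needs no derivatives; any root finder would do.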
4. Assumptions:
Paper Code and Title: H16RM Research Methodology and Statistics for Home
Science
5. Degrees of freedom:
To compare the computed value of χ² with the table value, we need the degrees of freedom. The degrees of freedom are the number of classes to which values can be assigned at will without violating the restrictions. For example, suppose we must choose four numbers whose total is 50. We are free to choose any three numbers, say 10, 15 and 20, but the fourth number is then fixed at 5 [50 − (10 + 15 + 20)]. Thus our freedom of choice is reduced by one by the condition that the total must be 50. The number of restrictions is therefore one, and the degrees of freedom are three. As the number of restrictions increases, the freedom is reduced.
Thus $V = n - k$, where $n$ is the number of classes and $k$ is the number of independent restrictions.
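The four-number example can be sketched in a few lines of Python. The helper `degrees_of_freedom` is a hypothetical name used only for illustration; it simply applies $V = n - k$.

```python
def degrees_of_freedom(n_classes: int, n_restrictions: int) -> int:
    """V = n - k: classes that can still be filled freely once k restrictions apply."""
    return n_classes - n_restrictions


# Four numbers must total 50: one restriction.
total = 50
free_choices = [10, 15, 20]          # chosen at will
fourth = total - sum(free_choices)   # forced by the restriction
numbers = free_choices + [fourth]

print(numbers, sum(numbers))                 # [10, 15, 20, 5] 50
print(degrees_of_freedom(len(numbers), 1))   # 3
```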
A method has been developed to test the difference between the theoretical value (the hypothesis) and the observed value. The test compares the computed value of χ² with the table value of χ² for the appropriate degrees of freedom. The Greek letter χ² is used to describe the magnitude of the difference between fact and theory.
The χ² statistic may be defined as

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where $O$ = observed frequencies and $E$ = expected frequencies.
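The formula translates directly into code. This is a minimal Python sketch; `chi_square` is an illustrative name, and it assumes both lists are in the same class order and that every expected frequency is positive.

```python
from typing import Sequence


def chi_square(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson's statistic: the sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# When observed and expected frequencies coincide, chi-square is zero.
print(chi_square([10, 20, 30], [10, 20, 30]))  # 0.0
```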
Steps:
If the calculated value of χ² is greater than the table value of χ² at a given level of significance, we reject the hypothesis. If the computed value of χ² is zero, the observed and expected values coincide completely. If the computed value of χ² is less than the table value at the given level of significance, it is said to be non-significant. This implies that the discrepancy between the observed and expected frequencies may be due to fluctuations of simple sampling.
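The decision rule can be written as a small function. The name `decide` and its messages are illustrative; the table value must still be read from a χ² table (or computed) for the chosen level of significance and degrees of freedom. A computed value exactly equal to the table value is treated here as not significant, which is an assumption, since the rule above only covers the "greater" and "less" cases. The two calls use values that appear in this module's examples.

```python
def decide(chi2_calc: float, table_value: float) -> str:
    """Compare the computed chi-square with the table value at the chosen level."""
    if chi2_calc > table_value:
        return "significant: reject the hypothesis"
    # Smaller (or equal, by assumption) values are treated as not significant.
    return "not significant: discrepancy may be due to sampling fluctuations"


print(decide(12.725, 9.488))  # significant: reject the hypothesis
print(decide(1.6875, 3.84))   # not significant: discrepancy may be due to sampling fluctuations
```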
Example: Four coins were tossed 160 times and the following results were obtained:

No. of heads           0    1    2    3    4
Observed frequency    17   52   54   31    6

Under the assumption that the coins are balanced, find the expected frequencies of getting 0, 1, 2, 3 or 4 heads and test the goodness of fit.
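Solution: Under the hypothesis that the coins are balanced, the expected frequency of x heads is $E_x = 160 \binom{4}{x} \left(\tfrac{1}{2}\right)^4$, which gives 10, 40, 60, 40 and 10 for x = 0, 1, 2, 3 and 4.
df = 5 − 1 = 4; $\chi^2_{0.05} = 9.488$
The calculated value of χ² is 12.725, which is greater than the table value 9.488. Therefore the discrepancy is significant and the hypothesis is rejected: the distribution expected for balanced coins is not a good fit to these data.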
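The missing arithmetic of this solution can be reproduced in Python, assuming four coins (as implied by the 0 to 4 heads in the table) and the binomial probabilities $\binom{4}{x}(1/2)^4$ for balanced coins. `math.comb` requires Python 3.8 or later.

```python
from math import comb

n_coins, n_tosses = 4, 160
observed = [17, 52, 54, 31, 6]  # frequencies of 0, 1, 2, 3, 4 heads

# For balanced coins, P(x heads) = C(4, x) * (1/2)^4.
expected = [n_tosses * comb(n_coins, x) / 2 ** n_coins for x in range(n_coins + 1)]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # one restriction: the totals must agree

print(expected)            # [10.0, 40.0, 60.0, 40.0, 10.0]
print(round(chi2, 3), df)  # 12.725 4
```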
2. χ² as a test of independence: The χ² test can be used to find out whether two or more attributes are associated. For example, we can find out whether attending a coaching class and success as a candidate, or marriage and failure, are related or independent. We take the hypothesis that the attributes are independent. If the calculated value of χ² is less than the table value at a given level of significance, the hypothesis is accepted; otherwise it is rejected.
Example: Out of a sample of 120 persons in a village, 80 were administered a new drug for preventing influenza, and of them 26 persons were attacked by influenza. Of those who were not administered the new drug, 6 persons were not affected by influenza. (a) Prepare 2 × 2 tables showing the actual and expected frequencies; (b) use the chi-square test to find out whether the new drug is effective.
(At the 5% level for one degree of freedom, the table value of chi-square is 3.84.)
Solution: 2 × 2 table of observed frequencies

                   Drug given (A)   Drug not given    Total
Attacked (B)             26               34             60
Not attacked             54                6             60
Total                    80               40        120 (N)
Let us take the hypothesis that influenza and the new drug are independent. The expected frequencies are:

                   Drug given            Drug not given        Total
Attacked           80 × 60 / 120 = 40    60 × 40 / 120 = 20       60
Not attacked       60 × 80 / 120 = 40    60 × 40 / 120 = 20       60
Total              80                    40                      120
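$$\chi^2 = \frac{(26-40)^2}{40} + \frac{(34-20)^2}{20} + \frac{(54-40)^2}{40} + \frac{(6-20)^2}{20} = 4.9 + 9.8 + 4.9 + 9.8 = 29.4$$
The calculated value of χ² is 29.4, which is much higher than the table value of 3.84. Therefore the hypothesis is rejected, and we conclude that the new drug is effective in controlling influenza.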
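The same steps (expected frequency = row total × column total / N, then the χ² sum and $(r-1)(c-1)$ degrees of freedom) can be sketched as a general function for any r × c table. `independence_test` is a hypothetical helper name; with the frequencies in the table above it returns χ² = 29.4 with 1 d.f.

```python
from typing import List, Tuple


def independence_test(table: List[List[float]]) -> Tuple[List[List[float]], float, int]:
    """Expected frequencies, chi-square and df for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under independence, E = (row total x column total) / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, chi2, df


# Rows: attacked / not attacked; columns: drug given / not given.
drug_table = [[26, 34], [54, 6]]
expected, chi2, df = independence_test(drug_table)
print(expected)            # [[40.0, 20.0], [40.0, 20.0]]
print(round(chi2, 2), df)  # 29.4 1
```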
7. YATES CORRECTION
If any cell frequency in a 2 × 2 table is less than 5, then, for the application of the χ² test, it has to be pooled with the preceding or succeeding frequency so that the total is greater than 5. This results in the loss of 1 d.f. In such a situation, i.e., when any cell frequency in a 2 × 2 table is less than 5, we instead apply the correction popularly known as Yates' correction for continuity. This consists of adding 0.5 to the cell frequency that is less than 5, adjusting the remaining frequencies accordingly (since the row and column totals are fixed), and then applying the χ² test without pooling.
Solution: Yates' correction is applied. Under Yates' correction, 0.5 is added to the observed cell frequency that is less than 5. In this problem, one observed frequency (2) is less than 5, so we add 0.5 to it and make it 2.5. The other frequencies are adjusted so that the row and column totals remain unchanged. Thus, after Yates' correction, the observed frequencies are:

 2.5     9.5    12
 5.5     6.5    12
 8.0    16.0    24
Now we can calculate the value of χ² with the above observed frequencies.
Computation of χ²
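O      E     O − E    (O − E)²   (O − E)²/E
2.5    4     -1.5       2.25       0.56250
9.5    8      1.5       2.25       0.28125
5.5    4      1.5       2.25       0.56250
6.5    8     -1.5       2.25       0.28125
24     24     0                    1.68750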
Since this value is less than the table value of χ² for 1 d.f. (3.84 at the 5% level), it is not significant. The data therefore do not show that the vaccine is effective in controlling the disease.
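The corrected calculation can be checked in Python. The original observed frequencies 2, 10, 6 and 6 are an assumption inferred by undoing the ±0.5 adjustment in the corrected table (they reproduce its margins 12, 12, 8, 16). The sketch compares the module's method (shift the small cell by 0.5 and adjust the others) with the common textbook form of Yates' correction, which subtracts 0.5 from every |O − E| before squaring; for this table both give 1.6875, while the uncorrected value would be 3.0.

```python
def expected_freqs(table):
    """E = row total x column total / N for each cell."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]


def chi_square(obs, exp):
    """Sum of (O - E)^2 / E over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(obs, exp)
        for o, e in zip(obs_row, exp_row)
    )


# Original frequencies inferred by undoing the +/-0.5 adjustment (assumption).
observed = [[2, 10], [6, 6]]
expected = expected_freqs(observed)  # [[4.0, 8.0], [4.0, 8.0]]

# Module's method: small cell 2 -> 2.5, other cells shifted so the margins stay fixed.
adjusted = [[2.5, 9.5], [5.5, 6.5]]
chi2_adjusted = chi_square(adjusted, expected)

# Common textbook form: reduce every |O - E| by 0.5 before squaring.
chi2_yates = sum(
    (abs(o - e) - 0.5) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(chi2_adjusted, chi2_yates)       # 1.6875 1.6875
print(chi_square(observed, expected))  # 3.0 (without correction)
```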
The main difference is that in a test of independence we are concerned with whether two attributes are independent, whereas in a test of homogeneity we are concerned with whether different samples come from the same population. Another difference is that a test of independence involves a single sample, whereas a test of homogeneity involves two or more samples, one from each population.
(i) Combining the results into a single inclusive test is appropriate when the samples are independent; and
(ii) when χ² values are to be added, Yates' correction should not be applied, because the addition theorem holds only for uncorrected constituent items.
Example: A random sample of 1000 families in a city was selected to test the belief that high-income families usually send their children to public schools, while low-income families often send their children to government schools. The following results were obtained:
                  School
Income       Public   Government   Total
Low            370        430        800
High           130         70        200
Total          500        500       1000
Solution: Let us take the hypothesis that the income and type of schooling are
independent.
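Expected frequency of a cell: $E(AB) = \frac{(A) \times (B)}{N}$. For low income and public school,

$$E = \frac{500 \times 800}{1000} = 400$$

and the other cells are obtained in the same way.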
Applying the χ² test:

O       E      O − E    (O − E)²   (O − E)²/E
370     400     -30       900         2.25
130     100      30       900         9.00
430     400      30       900         2.25
70      100     -30       900         9.00
1000    1000      0                  22.50
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 22.5$$

The calculated value of χ² is more than the table value ($V = 1$, $\chi^2_{0.05} = 3.84$). The hypothesis is rejected. Hence income and type of schooling are not independent.
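The income example can be reproduced with exact arithmetic. This sketch assumes only the standard library and uses `fractions.Fraction` so that the expected frequencies $E(AB) = (A)(B)/N$ and the final χ² carry no rounding error; the variable names are illustrative.

```python
from fractions import Fraction

# Rows: low / high income; columns: public / government school.
observed = [[370, 430], [130, 70]]
row_totals = [sum(r) for r in observed]        # [800, 200]
col_totals = [sum(c) for c in zip(*observed)]  # [500, 500]
n = sum(row_totals)                            # 1000

# E(AB) = (A) x (B) / N, kept as exact fractions.
expected = [[Fraction(r * c, n) for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print([[int(e) for e in row] for row in expected])  # [[400, 400], [100, 100]]
print(float(chi2), df)                              # 22.5 1
```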
Example: From the data given below about the treatment of 250 patients
suffering from a disease, state whether the new treatment is superior to the
conventional treatment.
                             No. of patients
Type of treatment   Favourable   Not favourable   Total
New                     140            30           170
Conventional             50            30            80
Total                   190            60           250
Solution: Let us take the hypothesis that there is no significant difference between the new and conventional treatments.
Applying the χ² test, the expected frequencies are:

                 Favourable                Not favourable   Total
New              190 × 170 / 250 = 129.2        40.8          170
Conventional          60.8                      19.2           80
Total                 190                        60           250
O      E       O − E    (O − E)²   (O − E)²/E
140    129.2    10.8     116.64      0.903
50      60.8   -10.8     116.64      1.918
30      40.8   -10.8     116.64      2.859
30      19.2    10.8     116.64      6.075
Total                               11.755
$V = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$
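The calculated value of χ² is greater than the table value for 1 d.f. (3.84 at the 5% level). The hypothesis is rejected. Hence there is a significant difference between the new and conventional treatments. Since the proportion of favourable outcomes is higher under the new treatment (140/170 ≈ 82%) than under the conventional treatment (50/80 = 62.5%), the new treatment appears superior.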
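As a cross-check, a 2 × 2 table also admits the well-known shortcut $\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$, which avoids computing expected frequencies. This is a swapped-in algebraic identity, not the method used in this module; `chi2_2x2_shortcut` is a hypothetical name. For the treatment table it gives about 11.755, matching the hand calculation up to rounding, and the favourable proportions show the direction of the difference.

```python
def chi2_2x2_shortcut(a: int, b: int, c: int, d: int) -> float:
    """Chi-square for the 2x2 table [[a, b], [c, d]] without expected frequencies:
    N (ad - bc)^2 / ((a + b)(c + d)(a + c)(b + d))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: new / conventional treatment; columns: favourable / not favourable.
chi2 = chi2_2x2_shortcut(140, 30, 50, 30)

# Chi-square only shows that outcome depends on treatment; proportions give the direction.
new_rate = 140 / 170
conventional_rate = 50 / 80

print(round(chi2, 3))                                   # 11.755
print(round(new_rate, 3), round(conventional_rate, 3))  # 0.824 0.625
```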
7. CONCLUSION
Let us summarise. The chi-square test is a non-parametric test: it does not depend on assumptions about the distribution of the population from which the data are drawn. The chi-square test is used to test the independence of attributes, the goodness of fit and the homogeneity of samples. It is based on frequencies rather than on the actual values of the observations. When the data are categorical and qualitative in nature, we cannot apply a parametric test, because parametric tests are based on distributional assumptions. In such cases we must use a non-parametric test. This does not mean, however, that the chi-square test can be applied only to qualitative data. Even if the data are actual measured values, we can group them into categories and then apply the chi-square test. Hence the chi-square test can also be used in place of a parametric test, but a parametric test cannot be used in place of the non-parametric chi-square test.