
Chi-Square (χ²) Test
Meaning of Chi-Square (χ²) Distribution
The χ² distribution is a continuous probability distribution. Its probability density function is given by:
$$f(\chi^2) = c\,(\chi^2)^{\frac{v}{2}-1}\,e^{-\chi^2/2}, \qquad \chi^2 \ge 0$$
Where, e = 2.71828
v = Number of degrees of freedom
c = A constant depending only on v, namely $c = \dfrac{1}{2^{v/2}\,\Gamma(v/2)}$
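As a numerical sanity check of the density above, here is a minimal Python sketch. It assumes the standard normalising constant c = 1/(2^(v/2) Γ(v/2)) and uses a simple midpoint-rule integral (`integrate` is a hypothetical helper, not a library function). The checks only confirm that, for one chosen v, the density integrates to about 1 with mean about v and variance about 2v.

```python
import math


def chi_square_pdf(x: float, v: int) -> float:
    """Density f(chi^2) = c * (chi^2)**(v/2 - 1) * e**(-chi^2 / 2)."""
    if x < 0:
        return 0.0  # chi-square never takes negative values
    if x == 0:
        # At the lower limit the density is infinite for v = 1, 1/2 for v = 2, 0 for v > 2
        return math.inf if v == 1 else (0.5 if v == 2 else 0.0)
    c = 1.0 / (2 ** (v / 2) * math.gamma(v / 2))  # constant depending only on v
    return c * x ** (v / 2 - 1) * math.exp(-x / 2)


def integrate(f, a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


v = 5  # arbitrary choice of degrees of freedom for the check
total = integrate(lambda x: chi_square_pdf(x, v), 0, 200)
mean = integrate(lambda x: x * chi_square_pdf(x, v), 0, 200)
variance = integrate(lambda x: (x - v) ** 2 * chi_square_pdf(x, v), 0, 200)
print(round(total, 4), round(mean, 4), round(variance, 4))  # approximately 1, v and 2v
```

The upper limit 200 is an assumption that the tail beyond it is negligible for small v.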


Properties of Chi-Square Distribution
Nature – It is a continuous probability distribution.

Value – It has the value zero at its lower limit and extends to infinity in the positive direction. The value of χ² can never be negative, since the differences between the observed and expected frequencies are always squared.

Only one parameter – It has only one parameter, i.e. v (the degrees of freedom).
Shape – Its shape depends upon the number of degrees of freedom (v):
(a) If v is small, the distribution is skewed to the right.
(b) If v is large, the distribution becomes more and more symmetrical and starts assuming the shape of a normal distribution.
Mean – Its mean = degrees of freedom (v).

Variance – Its variance = twice the degrees of freedom = 2v.

Approximation to normality – As v gets larger, χ² approaches the normal distribution with mean v and standard deviation √(2v). A closer approximation is that √(2χ²) is approximately normally distributed with mean √(2v − 1) and standard deviation 1.

Additive Property – The sum of independent χ² variates is also a χ² variate, whose degrees of freedom equal the sum of the individual degrees of freedom.
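The additive property can be illustrated by simulation. This Python sketch relies on the standard fact that a χ² variate with v degrees of freedom has the same distribution as a gamma variate with shape v/2 and scale 2, which the standard library's `random.gammavariate(alpha, beta)` can generate. It checks only the mean and variance of the sum (v1 + v2 and 2(v1 + v2)), so it illustrates rather than proves the property; the seed, sample size and degrees of freedom are arbitrary choices.

```python
import random


def chi_square_variate(v: float, rng: random.Random) -> float:
    # chi-square with v degrees of freedom = gamma with shape v/2 and scale 2
    return rng.gammavariate(v / 2, 2)


rng = random.Random(42)  # fixed seed so the run is repeatable
v1, v2, n = 3, 5, 200_000
sums = [chi_square_variate(v1, rng) + chi_square_variate(v2, rng) for _ in range(n)]

mean = sum(sums) / n
variance = sum((s - mean) ** 2 for s in sums) / (n - 1)
# For a chi-square variate with v1 + v2 = 8 degrees of freedom: mean 8, variance 16
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```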
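Constants of χ² Distribution
The constants of the χ² distribution with v degrees of freedom are:
(i) Mean = v; Mode = v − 2 (for v ≥ 2); Variance = 2v

(ii) Central moments: μ₁ = 0; μ₂ = 2v; μ₃ = 8v and μ₄ = 48v + 12v²

(iii)
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{64v^2}{8v^3} = \frac{8}{v}, \qquad \text{hence } \gamma_1 = \sqrt{\beta_1} = \sqrt{\frac{8}{v}}$$
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{48v + 12v^2}{4v^2} = 3 + \frac{12}{v}, \qquad \text{hence } \gamma_2 = \beta_2 - 3 = \frac{12}{v}$$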
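The constants listed above can be checked exactly. This Python sketch assumes the standard result that the raw moments of a χ² variate are E[X^k] = v(v + 2)(v + 4)...(v + 2k − 2), converts them to central moments with exact fractions, and reproduces μ₂ = 2v, μ₃ = 8v, μ₄ = 48v + 12v², β₁ = 8/v and β₂ = 3 + 12/v. The function names are illustrative.

```python
import math
from fractions import Fraction


def raw_moment(v: int, k: int) -> Fraction:
    """E[X**k] for a chi-square variate X with v degrees of freedom: v(v+2)...(v+2k-2)."""
    result = Fraction(1)
    for j in range(k):
        result *= v + 2 * j
    return result


def chi_square_constants(v: int) -> dict:
    m1, m2, m3, m4 = (raw_moment(v, k) for k in range(1, 5))
    mu2 = m2 - m1 ** 2                                        # variance
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                      # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4   # fourth central moment
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    return {
        "mean": m1,
        "mode": max(v - 2, 0),  # v - 2 for v >= 2; for v = 1 the density is largest near 0
        "mu2": mu2,
        "mu3": mu3,
        "mu4": mu4,
        "beta1": beta1,
        "beta2": beta2,
        "gamma1": math.sqrt(beta1),
        "gamma2": beta2 - 3,
    }


for key, value in chi_square_constants(4).items():
    print(key, value)
```

Using `Fraction` keeps every moment exact, so the comparison with the formulas involves no rounding except for γ₁, which needs a square root.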
Meaning of Chi-Square Test
The chi-square test is a technique to test whether a given discrepancy (i.e. value of χ²) between observed (actual) frequencies and expected (theoretical) frequencies is significant. The formula for computing chi-square is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where, O = Observed frequency,
E = Expected frequency.
The computed value of χ² is compared with the table value of χ² for the given degrees of freedom at a specified level of significance.
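The formula above translates directly into code. In this minimal Python sketch, `chi_square_statistic` is an illustrative name; it assumes the observed and expected frequencies are listed in the same class order and that every expected frequency is positive.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)**2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Frequencies from Case Study 15 (blood types A, AB, B)
print(chi_square_statistic([90, 135, 75], [75, 150, 75]))  # 4.5
```

The computed value is then compared with the table value; the function deliberately leaves that lookup to the reader.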
Conditions for the Application of χ² Test
Random Samples – The samples must be drawn at random from the target population.
Independent Observations – The observations in each sample must be independent.
At least 50 observations – Each sample must contain at least 50 observations.
Data in original units – The data must be expressed in original units (e.g. counts rather than percentages).
At least 5 frequencies – The expected (theoretical) frequency in each cell must be at least 5.
Grouping in case of small theoretical frequencies
In case of small theoretical frequencies, one or more classes with theoretical frequencies less than 5 may be combined into a single class. The number of degrees of freedom is then determined from the number of classes after regrouping. For example:

Observed frequencies (O):   114   126   200   80   60   13   3   4
Expected frequencies (E):   104   136   190   90   74    5   1   0

After combining the last three classes (O = 13 + 3 + 4 = 20, E = 5 + 1 + 0 = 6), the number of classes is reduced to 6 as follows:

Observed frequencies (O):   114   126   200   80   60   20
Expected frequencies (E):   104   136   190   90   74    6
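The regrouping rule can be sketched in Python as below. The sketch makes simplifying assumptions: it pools classes only from the tail (as in this example, where the small expected frequencies are the last ones) and uses a threshold of 5 on the expected frequency. `pool_tail` is a hypothetical helper name.

```python
def pool_tail(observed, expected, min_expected=5):
    """Merge classes from the right-hand end until the pooled expected frequency
    reaches min_expected; return the regrouped observed and expected lists."""
    obs, exp = list(observed), list(expected)
    pooled_o = pooled_e = 0
    while obs:
        pooled_o += obs.pop()
        pooled_e += exp.pop()
        if pooled_e >= min_expected:
            break
    return obs + [pooled_o], exp + [pooled_e]


observed = [114, 126, 200, 80, 60, 13, 3, 4]
expected = [104, 136, 190, 90, 74, 5, 1, 0]
obs6, exp6 = pool_tail(observed, expected)
print(obs6)       # [114, 126, 200, 80, 60, 20]
print(exp6)       # [104, 136, 190, 90, 74, 6]
print(len(obs6))  # 6 classes after regrouping
```

If, as in a goodness-of-fit test with no estimated parameters, the degrees of freedom are (number of classes) − 1, the regrouped table would give v = 6 − 1 = 5; the text itself does not state which distribution is being fitted here.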
Is the χ² Test a Distribution-Free Test?

The χ² test is a distribution-free test since:

No rigid assumptions are necessary regarding the type of population, as they are for tests based on the normal distribution, except those stated regarding the nature of the sample observations and the nature of events.
There is no need to know the values of population parameters like μ or σ, because the test depends on the degrees of freedom only.
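Note: The χ² tests of goodness of fit and of independence of attributes are one-tailed (right-tailed) significance tests. The test for a specified variance is right-tailed when H1: σ² > σ₀², but two-tailed when H1: σ² ≠ σ₀².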
Important Uses of χ² Test

The important uses of the χ² test are:

1. As a test for a specified variance

2. As a test for independence of attributes

3. As a test of goodness of fit.


χ² as a Test for a Specified Variance
It is used to test whether a particular random sample has been drawn from a normal distribution with mean μ and a specified variance σ₀².

To test a hypothesis about the variance of a normally distributed population, the null hypothesis is H0: σ² = σ₀², where σ₀² is some specified value of the population variance.
Practical steps involved in Testing for a Specified Variance

Step-1: Set up the hypotheses as follows:

H0: σ² = σ₀², H1: σ² ≠ σ₀² (in case the test is whether the population variance is equal to a given value)

H0: σ² ≤ σ₀², H1: σ² > σ₀² (in case the test is whether the population variance does not exceed a given value)

Step-2: Compute the value of χ² as follows:
$$\chi^2 = \frac{(n - 1)\,s^2}{\sigma_0^2}$$
Where, s² = Sample variance
σ₀² = Specified population variance (under H0)
n = Sample size

Step-3: Calculate degrees of freedom (v) = n − 1

Step-4: Find the table value of χ² for the given degrees of freedom at the given level of significance.

Step-5: Compare the computed value of χ² with the table value of χ² and interpret.
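Steps 2 to 5 can be sketched in Python as follows. The function names are illustrative; the table value of χ² (Step 4) is supplied by the caller from a χ² table rather than computed, and the sample variance uses the n − 1 divisor, which is what `statistics.variance` in the Python standard library returns.

```python
import statistics


def variance_chi_square(sample, sigma0_sq):
    """Steps 2 and 3: return chi^2 = (n - 1) * s^2 / sigma0^2 and v = n - 1."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s_sq = statistics.variance(sample)  # sample variance with divisor n - 1
    return (n - 1) * s_sq / sigma0_sq, n - 1


def decide(chi_sq, upper, lower=0.0):
    """Step 5: accept H0 when lower <= chi^2 <= upper (table values from Step 4)."""
    return "accept H0" if lower <= chi_sq <= upper else "reject H0"


# Case Study 13 data, H0: sigma^2 = 4; 5% upper table value for v = 9 is 16.919
weights = [30, 34, 31, 36, 32, 38, 35, 33, 35, 36]
chi_sq, v = variance_chi_square(weights, 4)
print(round(chi_sq, 3), v, decide(chi_sq, upper=16.919))  # 14.0 9 accept H0
```

For a two-tailed test, pass both table values, e.g. `decide(chi_sq, upper=19.023, lower=2.700)` for v = 9 at the 5% level.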
Case Study - 13.
Weights in tons of 10 shipments are given below:
30, 34, 31, 36, 32, 38, 35, 33, 35, 36
Can we say that the variance of the distribution of weights of all shipments, from which the above sample of 10 shipments was drawn, is equal to 4 square tons?
Discussion of Case Study - 13.
Hint: It is a comparison of variance problem. We have to find the sample variance using the formula
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
and then use the χ² formula
$$\chi^2 = \frac{(n - 1)\,s^2}{\sigma^2}$$
H0: σ² = 4
H1: σ² ≠ 4
Computation of s² (Sample variance)
Weights (in tons) x    x − x̄    (x − x̄)²
30                     −4        16
34                      0         0
31                     −3         9
36                      2         4
32                     −2         4
38                      4        16
35                      1         1
33                     −1         1
35                      1         1
36                      2         4
Σx = 340                         Σ(x − x̄)² = 56

$$\text{Sample mean } \bar{x} = \frac{\sum x}{n} = \frac{340}{10} = 34$$
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{56}{9}$$
$$\chi^2 = \frac{(n - 1)\,s^2}{\sigma^2} = \frac{9 \times \frac{56}{9}}{4} = \frac{56}{4} = 14$$
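Degrees of freedom = v = n − 1 = 10 − 1 = 9
χ² calculated = 14
χ² tabular = χ² (0.05, v = 9) = 16.919
χ² calculated = 14 < χ² tabular = 16.919
Since the computed value of χ² is less than the tabular value of χ², H0 is accepted. (Because H1 is two-sided, a two-tailed test at the 5% level accepts H0 for 2.700 ≤ χ² ≤ 19.023; the computed value 14 lies in this range as well.)
∴ The data are consistent with the variance of the distribution being 4 square tons.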
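The hand computation above can be reproduced exactly with Python's `fractions.Fraction`, which keeps s² = 56/9 without rounding. This is a sketch for checking the arithmetic; the two-tailed bounds 2.700 and 19.023 are the standard χ² table values for v = 9 at the 0.975 and 0.025 points.

```python
from fractions import Fraction

weights = [30, 34, 31, 36, 32, 38, 35, 33, 35, 36]
sigma0_sq = 4                                    # hypothesised variance (square tons)
n = len(weights)

mean = Fraction(sum(weights), n)                 # 340 / 10 = 34
ss = sum((x - mean) ** 2 for x in weights)       # sum of squared deviations = 56
s_sq = ss / (n - 1)                              # 56 / 9
chi_sq = (n - 1) * s_sq / sigma0_sq              # 9 * (56/9) / 4 = 14

one_tailed_ok = chi_sq <= Fraction("16.919")                         # upper 5% point, v = 9
two_tailed_ok = Fraction("2.700") <= chi_sq <= Fraction("19.023")    # 2.5% in each tail
print(mean, ss, s_sq, chi_sq, one_tailed_ok, two_tailed_ok)  # 34 56 56/9 14 True True
```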
Case Study - 14.
Two researchers adopted different sampling techniques while investigating the same group of students to find the number of students falling in different intelligence levels. The results are as follows:
Number of students in each level
Researcher   Below Average   Average   Above Average   Genius   Total
X                 137           164          152          147       600
Y                  32            57           56           35       180
Total             169           221          208          182       780

Required:
Would you say that the sampling techniques adopted by the two researchers are significantly different?
H0: There is no significant difference between the sampling techniques adopted by the two researchers.
H1: There is a significant difference.
Degrees of freedom = (r − 1)(c − 1) = (2 − 1)(4 − 1) = 3
where r = number of rows, c = number of columns.
Given: Observed frequency (O)
Observed Contingency Table
Researcher   Below Average   Average   Above Average   Genius   Total
X                 137           164          152          147       600 = R1
Y                  32            57           56           35       180 = R2
Total             169           221          208          182       780 = N
                  C1            C2           C3           C4
Calculation of expected frequency (E)
$$E_{ij} = \frac{R_i \times C_j}{N}$$
where $R_i$ = the $i$th row total
$C_j$ = the $j$th column total
$N$ = Sample size = $\sum R_i = \sum C_j$

$$E_{11} = \frac{R_1 C_1}{N} = \frac{(600)(169)}{780} = 130$$
$$E_{12} = \frac{R_1 C_2}{N} = \frac{(600)(221)}{780} = 170$$
$$E_{13} = \frac{R_1 C_3}{N} = \frac{(600)(208)}{780} = 160$$
$$E_{14} = \frac{R_1 C_4}{N} = \frac{(600)(182)}{780} = 140$$
$$E_{21} = \frac{R_2 C_1}{N} = \frac{(180)(169)}{780} = 39$$
$$E_{22} = \frac{R_2 C_2}{N} = \frac{(180)(221)}{780} = 51$$
$$E_{23} = \frac{R_2 C_3}{N} = \frac{(180)(208)}{780} = 48$$
$$E_{24} = \frac{R_2 C_4}{N} = \frac{(180)(182)}{780} = 42$$
Expected Contingency Table
Researcher   Below Average   Average   Above Average   Genius   Total
X                 130           170          160          140       600
Y                  39            51           48           42       180
Total             169           221          208          182       N = 780
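The rule E_ij = R_i × C_j / N can be applied to every cell at once. In this Python sketch, `expected_frequencies` is an illustrative name; it assumes the contingency table is given as a list of rows and uses exact fractions so that values such as 130 and 39 come out without rounding.

```python
from fractions import Fraction


def expected_frequencies(table):
    """Return E_ij = R_i * C_j / N for each cell of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[Fraction(r * c, n) for c in col_totals] for r in row_totals]


observed = [
    [137, 164, 152, 147],  # Researcher X
    [32, 57, 56, 35],      # Researcher Y
]
for row in expected_frequencies(observed):
    print([int(e) if e.denominator == 1 else float(e) for e in row])
# [130, 170, 160, 140]
# [39, 51, 48, 42]
```

The row and column totals of the expected table match those of the observed table, which is a quick check on the arithmetic.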
Calculation of χ²
O      E      O − E    (O − E)²    (O − E)²/E
137    130      7         49          0.377
164    170     −6         36          0.212
152    160     −8         64          0.400
147    140      7         49          0.350
32      39     −7         49          1.256
57      51      6         36          0.706
56      48      8         64          1.333
35      42     −7         49          1.167

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 5.801$$

χ² calculated = 5.801 ≅ 5.80
χ² tabular = χ² (0.05, v = 3) = 7.815 ≅ 7.82
χ² calculated = 5.80 < χ² tabular = 7.82
Since the computed value of χ² is less than the tabular value of χ², H0 is accepted.
There is no significant difference between the sampling techniques used by the two researchers.
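Putting the pieces together, the whole test for Case Study 14 can be sketched as one Python function. `independence_test` is a hypothetical name, and the critical value 7.815 (5% level, v = 3) is taken from the χ² table as in the text rather than computed.

```python
def independence_test(table):
    """Return (chi^2, degrees of freedom) for an r x c contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency E_ij
            chi_sq += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)   # (r - 1)(c - 1)
    return chi_sq, df


observed = [[137, 164, 152, 147], [32, 57, 56, 35]]
chi_sq, df = independence_test(observed)
critical = 7.815  # table value of chi^2 at 5% for v = 3
print(round(chi_sq, 3), df, "accept H0" if chi_sq < critical else "reject H0")
# 5.801 3 accept H0
```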
Case Study-15: Genetic theory states that children having one parent of blood type A and the other of blood type B will always be of one of three types, A, AB or B, and that the proportions of the three types will on average be 1 : 2 : 1. A report states that out of 300 children having one type A parent and one type B parent, 30 percent were found to be type A, 45 percent type AB and the remainder type B. Does this support the theoretical hypothesis of the genetic theory?
Hint: Observed frequencies are given in %. Expected (theoretical) frequencies are given as ratios. It is a test of goodness of fit.
Discussion of Case Study - 15.
H0: Observed frequencies = Expected Frequencies
H1: Observed frequencies ≠ Expected Frequencies
Level of significance α = 5%, v = 3 − 1 = 2
Calculation of Observed Frequencies
$$O_A = 30\% \text{ of } 300 = \frac{30}{100} \times 300 = 90$$
$$O_{AB} = 45\% \text{ of } 300 = \frac{45}{100} \times 300 = 135$$
$$O_B = (100 - 30 - 45)\% \text{ of } 300 = \frac{25}{100} \times 300 = 75$$
Calculation of Expected Frequencies
$$E_A = \frac{1}{1 + 2 + 1} \times 300 = \frac{1}{4} \times 300 = 75$$
$$E_{AB} = \frac{2}{4} \times 300 = 150$$
$$E_B = \frac{1}{4} \times 300 = 75$$
Calculation of χ²

O      E      O − E    (O − E)²    (O − E)²/E
90     75      15        225        225/75 = 3
135    150    −15        225        225/150 = 1.5
75     75       0          0        0/75 = 0

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 4.5$$
χ² calculated = 4.5
Degrees of freedom = v = (number of classes) − 1 = 3 − 1 = 2

χ² tabular = χ² (0.05, v = 2) = 5.991

χ² calculated = 4.5 < χ² tabular = 5.991
Since the computed value of χ² is less than the tabular value of χ², H0 is accepted.
∴ It is concluded that the 1 : 2 : 1 distribution provides a good fit to the data, i.e. the data support the theoretical hypothesis of the genetic theory that, on average, types A, AB and B stand in the proportion 1 : 2 : 1.
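The goodness-of-fit computation for Case Study 15 can be sketched in Python as below. The function name and argument layout (percentages for the observed data, a ratio for the theory) are illustrative assumptions; the critical value 5.991 (5% level, v = 2) is the table value quoted above.

```python
def goodness_of_fit(total, observed_pct, ratio):
    """Convert % to counts, build expected counts from a ratio, return (O, E, chi^2, v)."""
    observed = [total * p / 100 for p in observed_pct]
    expected = [total * r / sum(ratio) for r in ratio]
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return observed, expected, chi_sq, len(observed) - 1


pct = [30, 45, 100 - 30 - 45]  # types A, AB, B (B is the remainder)
obs, exp, chi_sq, v = goodness_of_fit(300, pct, [1, 2, 1])
print(obs, exp, chi_sq, v)  # [90.0, 135.0, 75.0] [75.0, 150.0, 75.0] 4.5 2
print("accept H0" if chi_sq < 5.991 else "reject H0")  # accept H0
```

Converting the percentages to counts before computing χ² matters: applying the formula to the percentages themselves would give a different (and incorrect) value.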
