
Chi-square distribution: A distribution with the degrees of freedom as its only parameter. It is skewed to the right for small degrees of freedom, but when the degrees of freedom are large it looks like a normal curve.

Chi-square test: A statistical technique used to test significance in the analysis of frequency distributions.

Contingency table: A table having rows and columns wherein each row corresponds to a level of one variable and each column to a level of another variable. The frequencies with which each variable combination has occurred are contained in the body of the table.

Degrees of freedom: The number of elements that can be chosen freely.

Expected frequencies: The frequencies for the different categories of a multinomial experiment, or for the different cells of a contingency table, which are expected to occur on the assumption that the given hypothesis is true.

Goodness-of-fit test: A statistical test involving the chi-square statistic for determining whether some observed pattern of frequencies conforms to an expected pattern.

Observed frequencies: The frequencies actually obtained from the performance of an experiment.

Test of association: A statistical test involving the chi-square statistic, which verifies whether the proportions of elements belonging to different groups in two (or more) populations are similar or not.

Test of independence: A statistical test involving the chi-square statistic, which verifies whether two attributes of a population are related or not.

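List of formulas

1. Probability function of χ²:

$$f(\chi^2) = c\,(\chi^2)^{(\nu/2)-1}\,e^{-\chi^2/2}, \qquad e = 2.71828$$

where ν is the number of degrees of freedom and c is a constant that depends on ν.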

2. Value of the test statistic χ²:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \quad \text{or} \quad \sum \frac{(O_i - E_i)^2}{E_i}$$

where
O_i = observed frequency in the ith category
E_i = expected frequency in the ith category
k = number of categories
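The statistic can be sketched in a few lines of Python. The function name `chi_square_statistic` and the example frequencies are illustrative assumptions, not part of these notes.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example with k = 3 categories (made-up counts).
observed = [18, 22, 20]
expected = [20, 20, 20]
print(round(chi_square_statistic(observed, expected), 3))  # prints 0.4
```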

Expected frequency for a cell in an independence or association test:

$$E = \frac{RT \times CT}{N}$$

where
E = expected frequency in a given cell
RT = the total of the row in which that cell lies
CT = the total of the column in which that cell lies
N = total number of observations

3. Number of degrees of freedom:

$$df = (r - 1)(c - 1)$$

where r is the number of rows and c is the number of columns.
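As a rough illustration of the expected-frequency and degrees-of-freedom formulas, the sketch below works on a contingency table stored as a list of rows. The helper names and the 2 x 3 table of counts are hypothetical.

```python
def expected_frequencies(table):
    """Expected count for each cell: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def degrees_of_freedom(table):
    """df = (r - 1)(c - 1) for an r x c contingency table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 2 x 3 table of observed counts (illustration only).
table = [[10, 20, 30],
         [20, 20, 20]]
print(expected_frequencies(table))  # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(degrees_of_freedom(table))    # 2
```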

4. (a) Another formula for χ², where a, b, c, and d are the frequencies of the four cells in a 2 × 2 contingency table and N = a + b + c + d:

$$\chi^2 = \frac{N\,(ad - bc)^2}{(a + c)(b + d)(c + d)(a + b)}$$
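5. (b) Another formula for χ² for a 2 × 3 contingency table with first-row frequencies a, b, c and second-row frequencies d, e, f:

$$\chi^2 = \frac{N}{a + b + c}\left(\frac{a^2}{a + d} + \frac{b^2}{b + e} + \frac{c^2}{c + f}\right) + \frac{N}{d + e + f}\left(\frac{d^2}{a + d} + \frac{e^2}{b + e} + \frac{f^2}{c + f}\right) - N$$

Both of these formulas use the observed frequencies alone, thus avoiding the computation of expected frequencies.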
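One way to convince yourself that the shortcut formulas agree with the general statistic is to compute both on the same table. The sketch below assumes the cells are laid out row by row, as [[a, b], [c, d]] and [[a, b, c], [d, e, f]]; the function names and the counts are made up for illustration.

```python
def chi_square_from_table(table):
    """General formula: sum of (O - E)^2 / E with E = RT * CT / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            total += (o - e) ** 2 / e
    return total


def chi_square_2x2(a, b, c, d):
    """Shortcut for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (c + d) * (a + b))


def chi_square_2x3(a, b, c, d, e, f):
    """Shortcut for the 2 x 3 table [[a, b, c], [d, e, f]]."""
    n = a + b + c + d + e + f
    first = a**2 / (a + d) + b**2 / (b + e) + c**2 / (c + f)
    second = d**2 / (a + d) + e**2 / (b + e) + f**2 / (c + f)
    return n / (a + b + c) * first + n / (d + e + f) * second - n


# Hypothetical counts: each pair of values should agree up to rounding error.
print(chi_square_2x2(12, 8, 5, 15), chi_square_from_table([[12, 8], [5, 15]]))
print(chi_square_2x3(10, 20, 30, 20, 20, 20),
      chi_square_from_table([[10, 20, 30], [20, 20, 20]]))
```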

6. Yates' correction:
When results derived for continuous data are applied to discrete data, Yates' correction is applied by rewriting the chi-square statistic as

$$\chi^2_{\text{(corrected)}} = \sum_{i=1}^{k} \frac{\left(|O_i - E_i| - 0.5\right)^2}{E_i}$$

where |O_i − E_i| means that the difference between the observed frequency and the expected frequency in the ith category is taken ignoring the '+' and '−' signs.
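A minimal sketch of the corrected statistic next to the uncorrected one follows. The function names and the four observed/expected pairs are hypothetical; the expected values are what the row-total times column-total formula gives for the 2 × 2 table [[12, 8], [5, 15]].

```python
def chi_square(observed, expected):
    """Uncorrected statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def yates_chi_square(observed, expected):
    """Yates-corrected statistic: sum of (|O - E| - 0.5)^2 / E."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical 2 x 2 table flattened cell by cell.
observed = [12, 8, 5, 15]
expected = [8.5, 11.5, 8.5, 11.5]
print(round(chi_square(observed, expected), 3))
print(round(yates_chi_square(observed, expected), 3))
```

Here every |O − E| is larger than 0.5, so the correction shrinks each term and the corrected value is the smaller of the two.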
Numerical Problem: Goodness of fit

1. A survey of 320 families with 5 children each revealed the following distribution. Is this result consistent with the hypothesis that male and female births are equally probable (that is, that the villagers are not biased toward male children)?

No. of boys:            5    4    3    2    1    0
Frequency of families: 14   56  110   88   40   12
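One possible way to set this problem up, sketched under the assumption that under the null hypothesis the number of boys in a family of 5 follows a binomial distribution with p = 1/2. The variable names are illustrative.

```python
from math import comb

boys = [5, 4, 3, 2, 1, 0]
observed = [14, 56, 110, 88, 40, 12]
n_families = sum(observed)   # 320 families
n_children = 5
p_boy = 0.5                  # assumption under H0: boys and girls equally likely

# Expected number of families with k boys: N * C(5, k) * p^k * (1 - p)^(5 - k)
expected = [
    n_families * comb(n_children, k) * p_boy**k * (1 - p_boy) ** (n_children - k)
    for k in boys
]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1       # 6 categories - 1

print(expected)              # [10.0, 50.0, 100.0, 100.0, 50.0, 10.0]
print(round(chi_sq, 2), df)  # 7.16 5
```

The resulting value can then be compared with the tabulated chi-square value for 5 degrees of freedom at the chosen level of significance.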

2. The demand for a particular spare part in a factory was found to vary from day to day. In a sample study, the following information was obtained. Test the hypothesis that the number of parts demanded does not depend on the day of the week.

Day          Number of parts demanded
Monday       1124
Tuesday      1125
Wednesday    1110
Thursday     1120
Friday       1126
Saturday     1115

The table value of chi-square for ν = n − 1 = 5 d.f. at the 5% level of significance is 11.07.
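A short sketch of the calculation, assuming the null hypothesis of equal expected demand on each of the six days. The variable names are illustrative; the critical value 11.07 is the one quoted in the problem.

```python
demand = {
    "Monday": 1124,
    "Tuesday": 1125,
    "Wednesday": 1110,
    "Thursday": 1120,
    "Friday": 1126,
    "Saturday": 1115,
}

total = sum(demand.values())      # 6720 parts in all
expected = total / len(demand)    # 1120 per day if demand is uniform

chi_sq = sum((o - expected) ** 2 / expected for o in demand.values())
critical = 11.07                  # 5 d.f., 5% level, as given in the problem

print(round(chi_sq, 3))           # 0.18
print("reject H0" if chi_sq > critical else "do not reject H0")
```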

Numerical Problem: Test of Independence

3. A brand manager is concerned that his brand's share may be unevenly distributed throughout the country. In a survey in which the country was divided into four geographical regions, a random sample of 100 consumers in each region was surveyed, with the following results:

                               Region
                               NE     NW     SE     SW     Total
Purchase the brand             40     55     45     50     190
Do not purchase the brand      60     45     55     50     210
Total                         100    100    100    100     400

Develop a table of observed and expected frequencies for this problem.

(i) State the null hypothesis.
(ii) Calculate the sample chi-square value.
(iii) At α = 0.05, test whether the brand share is the same across the four regions.

Solution:

(i) To calculate the expected frequency for each cell from the corresponding observed frequencies, we apply the formula below; the results are shown in the following table:

$$E = \frac{RT \times CT}{N}$$

where
E = expected frequency in a given cell
RT = the total of the row in which that cell lies
CT = the total of the column in which that cell lies
N = total number of observations

Cell           O      E       O - E    (O − E)²    (O − E)²/E
Row 1   NE     40     47.5    -7.5     56.25       1.18
        NW     55     47.5     7.5     56.25       1.18
        SE     45     47.5    -2.5      6.25       0.13
        SW     50     47.5     2.5      6.25       0.13
Row 2   NE     60     52.5     7.5     56.25       1.07
        NW     45     52.5    -7.5     56.25       1.07
        SE     55     52.5     2.5      6.25       0.12
        SW     50     52.5    -2.5      6.25       0.12
                                        χ² =       5.00

(ii) H0: The brand share is evenly distributed across the four regions.
H1: The brand share is not evenly distributed across the four regions.

(iii) Degrees of freedom:

$$df = (r - 1)(c - 1) = (2 - 1)(4 - 1) = 3$$

where r is the number of rows and c is the number of columns.

The critical value of χ² at the α = 0.05 level for 3 degrees of freedom, from the table, is 7.815 (see the table below).
Since the calculated value of χ² is less than the critical value of 7.815, the null hypothesis is accepted. In other words, the brand share is evenly distributed across all four regions of the country.
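The decision step can be sketched as a lookup against tabulated critical values. The small dictionary below holds the standard 5% critical values for 1 to 5 degrees of freedom; the function and variable names are illustrative assumptions.

```python
# Standard chi-square critical values at the 5% level of significance.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_from_table(table):
    """Sum of (O - E)^2 / E with E = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (o - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, o in enumerate(row)
    )


def decide(chi_sq, df):
    """Compare the statistic with the 5% critical value for df."""
    return "reject H0" if chi_sq > CRITICAL_5_PERCENT[df] else "do not reject H0"


# Observed counts for the four regions (NE, NW, SE, SW).
observed = [[40, 55, 45, 50],
            [60, 45, 55, 50]]
chi_sq = chi_square_from_table(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_sq, 2), df, decide(chi_sq, df))
```

Without intermediate rounding the statistic is about 5.01; the 5.00 in the table comes from rounding each term to two decimals before adding.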

4. To gain political advantage during the election season, a political party has introduced a subsidy on a certain commodity. To test whether the subsidized rate of the commodity is effective or not, a sample of 200 persons was selected, of whom 100 were availing the subsidized rate and the others were not. The results (for and against the subsidy) are given as follows:

            Subsidized rate    Non-subsidized rate    Total
For               65                   55              120
Against           35                   45               80
Total            100                  100              200

Solution:

(i) H0: The subsidized rate is not effective in exerting influence.
H1: The subsidized rate is effective in exerting influence.

O      E      O - E    (O − E)²    (O − E)²/E
65     60      5       25          0.417
55     60     -5       25          0.417
35     40     -5       25          0.625
45     40      5       25          0.625
                        χ² =       2.084

(ii) Degrees of freedom:

$$df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$$

where r is the number of rows and c is the number of columns.

The critical value of χ² at the α = 0.05 level for 1 degree of freedom, from the table, is 3.841 (see the table below).
Since the calculated value of χ² is less than the critical value of 3.841, the null hypothesis is accepted. In other words, the subsidized rate of the commodity is not effective in exerting influence.

5. A certain drug is claimed to be highly effective in curing colds. In an experiment on 500 persons with a cold, half of them were given the drug and half of them were given sugar syrup. The patients' reactions to the treatment are recorded in the following table. On the basis of these data, can it be concluded that there is a significant difference in the effect of the drug and the sugar syrup?

               Cured    Harmed    Not cured    Total
Medicine       150      30        70           250
Sugar syrup    130      40        80           250

(i) To calculate the expected frequency for each cell from the corresponding observed frequencies, we apply the formula below; the results are shown in the following table:

$$E = \frac{RT \times CT}{N}$$

where
E = expected frequency in a given cell
RT = the total of the row in which that cell lies
CT = the total of the column in which that cell lies
N = total number of observations

Cell           O      E      O - E    (O − E)²    (O − E)²/E
Row 1   C1     150    140     10      100         0.71
        C2      30     35     -5       25         0.71
        C3      70     75     -5       25         0.33
Row 2   C1     130    140    -10      100         0.71
        C2      40     35      5       25         0.71
        C3      80     75      5       25         0.33
                                       χ² =       3.50

(ii) H0: Patients are not cured due to the medicine.
H1: Patients are cured due to the medicine.

(iii) Degrees of freedom:

$$df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$$

where r is the number of rows and c is the number of columns.

The critical value of χ² at the α = 0.05 level for 2 degrees of freedom, from the table, is 5.99 (see the table below).
Since the calculated value of χ² is less than the critical value of 5.99, the null hypothesis is accepted. In other words, patients are not cured due to the medicine (cure is independent of treatment with the medicine).

Numerical Problem: Test of Homogeneity

6. A random sample of 400 persons was selected from three age groups, and each person was asked to specify which of three types of TV program he or she preferred. The results are shown in the following table:

                 A      B      C      Total
Under 30         120    30     50     200
30-44             10    75     15     100
45 and above      10    30     60     100
Total            140    135    125    400

Test the hypothesis that the populations are homogeneous with respect to the types of TV program they prefer.
(i) To calculate the expected frequency for each cell from the corresponding observed frequencies, we apply the formula below; the results are shown in the following table:

$$E = \frac{RT \times CT}{N}$$

where
E = expected frequency in a given cell
RT = the total of the row in which that cell lies
CT = the total of the column in which that cell lies
N = total number of observations

Cell           O      E        O - E     (O − E)²    (O − E)²/E
Row 1   C1     120    70.00     50.00    2500.00     35.71
        C2      30    67.50    -37.50    1406.25     20.83
        C3      50    62.50    -12.50     156.25      2.50
Row 2   C1      10    35.00    -25.00     625.00     17.86
        C2      75    33.75     41.25    1701.56     50.42
        C3      15    31.25    -16.25     264.06      8.45
Row 3   C1      10    35.00    -25.00     625.00     17.86
        C2      30    33.75     -3.75      14.06      0.42
        C3      60    31.25     28.75     826.56     26.45
                                           χ² =     180.50

(ii) H0: People in all age groups have the same liking for the TV programs.
H1: People in all age groups do not have the same liking for the TV programs.

(iii) Degrees of freedom:

$$df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$$

where r is the number of rows and c is the number of columns.

The critical value of χ² at the α = 0.05 level for 4 degrees of freedom, from the table, is 9.488 (see the table below).
Since the calculated value of χ² (180.50) is far greater than the critical value of 9.488, the null hypothesis is rejected. In other words, the three age groups are not homogeneous with respect to the types of TV program they prefer.
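For readers who want to check the arithmetic of Problem 6, the following sketch recomputes the expected frequencies and the statistic directly from the observed TV-program table given in the problem statement. The variable names are illustrative.

```python
observed = [
    [120, 30, 50],   # under 30
    [10, 75, 15],    # 30-44
    [10, 30, 60],    # 45 and above
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # E = RT * CT / N
        contribution = (o - e) ** 2 / e
        chi_sq += contribution
        print(f"row {i + 1}, col {j + 1}: O={o:>3}  E={e:6.2f}  "
              f"(O-E)^2/E={contribution:6.2f}")

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {chi_sq:.2f}, df = {df}")
```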

Miscellaneous problems

7. An automobile manufacturing firm is bringing out a new model. In order to map out its advertising campaign, it wants to determine whether the model's appeal depends on age group or not. The firm took a random sample from persons attending a preview of the new model and obtained the results summarized below:

                      Under 20    20-40    40-50    Over 50    Total
Liked the model       146         78       48       28         300
Disliked the model     54         52       32       62         200
Total                 200         130      80       90         500

8. A sample analysis of the examination results of 200 MBA students was made. It was found that 46 students had failed, 68 had secured a third division, 62 had secured a second division, and the rest were placed in the first division. Are these figures commensurate with the general examination result, which is in the ratio 2:3:3:2 for the respective categories?
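When the hypothesis is stated as a ratio, the expected frequencies are obtained by splitting the sample size in that ratio. The sketch below sets this up for the examination data; the variable names are illustrative.

```python
n_students = 200
# Failed, third division, second division, first division (the rest).
observed = [46, 68, 62, n_students - 46 - 68 - 62]
ratio = [2, 3, 3, 2]

# Split the 200 students in the ratio 2:3:3:2.
expected = [n_students * r / sum(ratio) for r in ratio]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print(observed)   # [46, 68, 62, 24]
print(expected)   # [40.0, 60.0, 60.0, 40.0]
print(round(chi_sq, 2), df)
```

Under these assumptions the statistic comes to about 8.43 with 3 degrees of freedom, which can be compared with the tabulated value at the chosen level of significance (7.815 at the 5% level).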
9.

Chi-Square Table
