
1. Hypothesis Testing: Chi Square Tests


Test of Goodness of Fit:
The chi-square test for discrete data compares the theoretical (expected) frequencies of the
categories of a population distribution with the frequencies actually observed in a sample. This
test is also known as the test of goodness of fit.

Suppose $f_o$ is the observed frequency and $f_e$ is the expected frequency. Then the statistic

$$\chi^2 = \sum_{i=1}^{n} \frac{(f_{oi} - f_{ei})^2}{f_{ei}}$$

follows a chi-square distribution with (n-1) degrees of freedom, where n is the number of
categories and i = 1 to n.

We reject the null hypothesis if the calculated $\chi^2 > \chi^2(\alpha, n-1)$, where $\alpha$ denotes the
level of significance and n-1 is the degrees of freedom.
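
The statistic above can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square_statistic` and the coin-toss counts are hypothetical and do not come from the text.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (fo - fe)^2 / fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical illustration (not from the text): 100 tosses of a coin
# that is assumed to be fair, so 50 heads and 50 tails are expected.
observed = [55, 45]
expected = [50, 50]

statistic = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # n - 1 categories

print(f"chi-square = {statistic}, df = {degrees_of_freedom}")
```

The resulting statistic is then compared with the tabulated chi-square value for the chosen level of significance and n-1 degrees of freedom.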

χ² Test of Independence:
For a data set classified according to two attributes, by performing a chi-square test we can test
whether the two classification factors are independent.

Example #4:
Suppose a company of 100 employees has 72 employees with less domain experience and the
remaining 28 with more domain experience. Is the ratio of less experienced to more experienced
employees consistent with the industry standard of 60% to 40%?

Method 1:
Here the null hypothesis is: "The observations are consistent with the industry standard." So we
have:

Category fo fe
Less experienced 72 60
More experienced 28 40

Calculated value:

$$\chi^2 = \frac{(72 - 60)^2}{60} + \frac{(28 - 40)^2}{40} = 2.4 + 3.6 = 6$$

Tabulated value of chi-square with 1 degree of freedom at the 5% level of significance = 3.841.
Since the calculated value > tabulated value, we reject the null hypothesis.
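
The hand calculation for this example can be checked with a short Python sketch. The variable names are illustrative; the critical value 3.841 is copied from the text, not computed.

```python
# Observed and expected counts from Example #4.
observed = {"Less experienced": 72, "More experienced": 28}
expected = {"Less experienced": 60, "More experienced": 40}

# Sum of (fo - fe)^2 / fe over the two categories.
chi_square = sum(
    (observed[c] - expected[c]) ** 2 / expected[c] for c in observed
)

critical_value = 3.841  # tabulated chi-square, 1 df, as given in the text
reject_null = chi_square > critical_value

print(f"chi-square = {chi_square:.3f}, reject H0: {reject_null}")
```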

Method 2:
Here the null hypothesis is: "The observations are consistent with the industry standard."

Category fo fe
Less experienced 72 60
More experienced 28 40

COMPUTER HANDS-ON (Microsoft Excel)

1. Use the Function CHITEST.


2. Put Column Fo in Actual Range, Column Fe in Expected Range.

You will obtain p-value = 0.014306.

Since p < 0.05, we reject the null hypothesis.
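
For readers without Excel, the same p-value can be approximated in Python. The sketch below is not the CHITEST implementation. It relies on a closed form that holds only for 1 degree of freedom, where the upper-tail probability of the chi-square distribution equals erfc(sqrt(x/2)). The helper name `chi_square_p_value_1df` is hypothetical.

```python
import math


def chi_square_p_value_1df(statistic):
    """Upper-tail p-value of a chi-square statistic, valid for 1 df only."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


fo = [72, 28]  # observed frequencies from Example #4
fe = [60, 40]  # expected frequencies from Example #4

statistic = sum((o - e) ** 2 / e for o, e in zip(fo, fe))
p_value = chi_square_p_value_1df(statistic)

print(f"chi-square = {statistic:.3f}, p-value = {p_value:.6f}")
```

The printed p-value should agree with the CHITEST result above to the precision shown there.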

Example #5
A random sample of 500 students was classified according to economic condition and merit,
and we have the following observations:

                        Economic Condition

Merit              Rich   Mid-Class   Poor   Row Total
Meritorious          42         137     61         240
Non-meritorious      58         113     89         260
Column Total        100         250    150         500

Our hypotheses are:

H0: Economic condition and Merit are two independent attributes.
H1: Economic condition and Merit are related attributes.

Our next step is to evaluate the expected frequencies under independence, i.e. what the
frequencies would have been if the attributes were independent.
We calculate the expected frequency fe for cell (i, j) as:
fe(i, j) = (ith row total) * (jth column total) / (grand total)
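
As a sketch, the expected frequencies can be derived from the observed table in Python. The variable names are illustrative; only the observed counts come from the example.

```python
# Observed counts from Example #5; columns are Rich, Mid-Class, Poor.
observed = [
    [42, 137, 61],  # Meritorious
    [58, 113, 89],  # Non-meritorious
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# fe(i, j) = (ith row total) * (jth column total) / (grand total)
expected = [
    [r * c / grand_total for c in col_totals] for r in row_totals
]

for row in expected:
    print(row)
```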

Merit              Rich   Mid-Class   Poor   Row Total
Meritorious          48         120     72         240
Non-meritorious      52         130     78         260
Column Total        100         250    150         500

Degrees of freedom:
(2-1) for number of rows * (3-1) for number of columns = 2
Calculated value of chi-square, summing (fo - fe)^2 / fe over the six cells = 9.31 (approximately).
Tabulated value of chi-square with 2 df at the 5% level of significance (95% confidence) = 5.99.
Since the calculated chi-square (9.31) > tabulated value (5.99), we reject the null hypothesis and
conclude that Merit and Economic condition are not independent; they are associated.
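
The comparison can be reproduced with a short Python sketch that sums (fo - fe)^2 / fe over the six cells. The variable names are illustrative, and the critical value 5.99 is copied from the text rather than computed.

```python
# Observed and expected counts from Example #5 (Rich, Mid-Class, Poor).
observed = [[42, 137, 61], [58, 113, 89]]
expected = [[48, 120, 72], [52, 130, 78]]

# Sum the contribution of every cell in the table.
chi_square = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

# (rows - 1) * (columns - 1)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

critical_value = 5.99  # tabulated chi-square, 2 df, as given in the text
reject_null = chi_square > critical_value

print(f"chi-square = {chi_square:.4f}, df = {degrees_of_freedom}, "
      f"reject H0: {reject_null}")
```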

COMPUTER HANDS-ON

Method 2:
EXCEL WORKOUT:

Use the CHITEST function.


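Select the actual range (observed frequencies) as:

42 137 61
58 113 89

Select the expected range (expected frequencies) as:

48 120 72
52 130 78

The p-value obtained is CHITEST(G10:I11,G13:I14) = 0.009535, which is less than 0.05. Here
G10:I11 is the actual range and G13:I14 is the expected range.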


Tata Consultancy Services -2-
Hence we reject the Null Hypothesis.
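
As a cross-check outside Excel, the p-value can be approximated in Python. The sketch below is not the CHITEST implementation. It uses a closed form that holds only for 2 degrees of freedom, where the upper-tail probability of the chi-square distribution is exp(-x/2). The helper name `chi_square_p_value_2df` is hypothetical.

```python
import math


def chi_square_p_value_2df(statistic):
    """Upper-tail p-value of a chi-square statistic, valid for 2 df only."""
    # For 2 df, P(X > x) = exp(-x / 2).
    return math.exp(-statistic / 2.0)


# Observed counts from Example #5 (Rich, Mid-Class, Poor).
observed = [[42, 137, 61], [58, 113, 89]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts under independence, then the chi-square statistic.
statistic = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / grand_total
        statistic += (fo - fe) ** 2 / fe

p_value = chi_square_p_value_2df(statistic)
print(f"chi-square = {statistic:.4f}, p-value = {p_value:.6f}")
```

The printed p-value should agree with the CHITEST result above to the precision shown there.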

