# STATISTICS

THE CHI SQUARE TEST

BY:
LJV & IKR

OBJECTIVES

• Test a distribution for goodness of fit using chi-square.
• Test two variables for independence using chi-square.
• Test proportions for homogeneity using chi-square.

INTRODUCTION
• The chi-square distribution can be used for tests concerning frequency distributions, such as: “If a sample of buyers is given a choice of automobile colors, will each color be selected with the same frequency?”

INTRODUCTION ( CONT’D )
• The chi-square distribution can also be used to test the independence of two variables. For example: “Are senators’ opinions on gun control independent of party affiliations?”

INTRODUCTION ( CONT’D )
• The chi-square distribution can be used to test the homogeneity of proportions. For example: “Is the proportion of high school seniors who attend college immediately after graduating the same for the northern, southern, eastern, and western parts of Iloilo City?”

ASSUMPTIONS FOR GOODNESS-OF-FIT, INDEPENDENCE AND HOMOGENEITY TESTS
• The data are obtained from a random sample.
• The expected frequency for each category must be 5 or more.
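The second assumption is easy to check mechanically before running a test. The sketch below is illustrative only: the function name `meets_expected_count_rule` and both sets of counts are made up for the example and do not come from the slides.

```python
def meets_expected_count_rule(expected, minimum=5):
    """Return True when every expected frequency is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Hypothetical case 1: 100 observations spread evenly over 5 categories.
even_split = [100 / 5] * 5
print(meets_expected_count_rule(even_split))   # True

# Hypothetical case 2: 12 observations spread evenly over 4 categories.
small_sample = [12 / 4] * 4
print(meets_expected_count_rule(small_sample))  # False
```

When the check fails, the usual remedies are to collect more data or to combine sparse categories before applying the test.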

TEST FOR GOODNESS-OF-FIT

• The chi-square statistic can be used to see whether a frequency distribution fits a specific pattern. This is referred to as the chi-square goodness-of-fit test.

OBSERVED FREQUENCIES VS. EXPECTED FREQUENCIES
• Suppose a market analyst wished to see whether consumers have any preference among five flavors of a new fruit soda. A sample of 100 people provided these data:

Cherry       32
Strawberry   28
Orange       16
Lime         14
Grape        10

• Since the frequencies for each flavor were obtained from a sample, these actual frequencies are called the observed frequencies. The frequencies obtained by calculation (as if there were no preference) are called the expected frequencies.

OBSERVED FREQUENCIES VS. EXPECTED FREQUENCIES ( CONT’D )

Frequency   Cherry   Strawberry   Orange   Lime   Grape
Observed    32       28           16       14     10
Expected    20       20           20       20     20

GOODNESS-OF-FIT TEST
• The formula for the chi-square goodness-of-fit test:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:
O = the observed or obtained frequency
E = the expected or theoretical frequency
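As a minimal sketch of the formula, the snippet below applies it to the fruit soda data from the earlier slide. The helper name `chi_square_statistic` is an assumption made for illustration. The expected counts are obtained by splitting the sample of 100 evenly across the five flavors, which is what "no preference" implies.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


flavors = ["Cherry", "Strawberry", "Orange", "Lime", "Grape"]
observed = [32, 28, 16, 14, 10]

# Under "no preference", each flavor is expected equally often.
n = sum(observed)
expected = [n / len(observed)] * len(observed)

# Per-flavor contributions, then the total test value.
for flavor, o, e in zip(flavors, observed, expected):
    print(f"{flavor:<11}{(o - e) ** 2 / e:.1f}")

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 3))  # 18.0
```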

GOODNESS-OF-FIT TEST ( CONT’D )
• And the degrees of freedom (df) for the goodness-of-fit test:

$$df = k - 1$$

Where:
k = the number of categories

AN EXAMPLE

• Is there enough evidence to reject the claim that there is no preference in the selection of fruit soda flavors, using the data shown previously? Let α = 0.05.

SOLUTION
• STEP 1 State the hypotheses and identify the claim.
  H0: Consumers show no preference for flavors (claim).
  H1: Consumers show a preference.

• STEP 2 Find the critical value. The degrees of freedom are 5 - 1 = 4, and α = 0.05. Hence, the critical value from the table is 9.488.

• STEP 3 Compute the test value.

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(32-20)^2}{20} + \frac{(28-20)^2}{20} + \frac{(16-20)^2}{20} + \frac{(14-20)^2}{20} + \frac{(10-20)^2}{20} = 18.0$$

SOLUTION ( CONT’D )
• STEP 4 Make the decision. The decision is to reject the null hypothesis, since 18.0 > 9.488.

SOLUTION ( CONT’D )
• STEP 5 Summarize the results. There is enough evidence to reject the claim that consumers show no preference for the flavors.
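The critical value in Step 2 comes from a printed table, but it can also be checked numerically. The sketch below is one possible approach, not the method used in the slides. It relies on the closed-form right-tail probability of the chi-square distribution, which holds only when the degrees of freedom are even, and finds the critical value by bisection. The function names are illustrative assumptions.

```python
import math


def chi_square_sf_even_df(x, df):
    """Right-tail probability P(X > x) for chi-square with EVEN df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def critical_value_even_df(alpha, df, upper=200.0, tol=1e-10):
    """Bisection search for x such that P(X > x) = alpha."""
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi_square_sf_even_df(mid, df) > alpha:
            lo = mid  # tail still too heavy, move right
        else:
            hi = mid
    return (lo + hi) / 2


critical = critical_value_even_df(alpha=0.05, df=4)
print(round(critical, 3))  # 9.488

test_value = 18.0  # from Step 3
print("reject H0" if test_value > critical else "do not reject H0")  # reject H0
```

For odd degrees of freedom this shortcut does not apply, and a table or a statistics library is the practical choice.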

A GOOD FIT
When the observed values and expected values are close together, the chi-square test value will be small. Then the decision will be not to reject the null hypothesis; hence, there is a “good fit.”

NOT A GOOD FIT
When the observed values and the expected values are far apart, the chi-square test value will be large. Then the null hypothesis will be rejected; hence, there is “not a good fit.”

CHI-SQUARE GOODNESS-OF-FIT PROCEDURE SUMMARY
• Step 1 State the hypotheses and identify the claim.
• Step 2 Find the critical value. The test is always right-tailed.
• Step 3 Compute the test value. Find the sum of the values $\frac{(O - E)^2}{E}$.
• Step 4 Make the decision.
• Step 5 Summarize the results.

INDEPENDENCE TEST

• The chi-square independence test can be used to test the independence of two variables.
  H0: There is no relationship between the two variables.
  H1: There is a relationship between the two variables.
  If the null hypothesis is rejected, there is some relationship between the variables.

AN EXAMPLE

Suppose a new postoperative procedure is administered to a number of patients in a large hospital. One can ask the question, “Do the doctors feel differently about this procedure from the nurses, or do they feel basically the same way?” Note that the question is not whether they prefer the procedure but whether there is a difference of opinion between the two groups.

AN EXAMPLE ( CONT’D )

To answer this question, a researcher selects a sample of nurses and doctors and tabulates the data in table form, as shown.

Group           Nurses   Doctors
Prefer new      100      50
Prefer old      80       120
No preference   20       30

AN EXAMPLE ( CONT’D )

If the null hypothesis is not rejected, the test means that both professions feel basically the same way about the procedure and the differences are due to chance. If the null hypothesis is rejected, the test means that one group feels differently about the procedure from the other. Remember that rejection does not mean that one group favors the procedure and the other does not.

CHI-SQUARE INDEPENDENCE TEST

In order to test the null hypothesis, one must compute the expected frequencies, assuming the null hypothesis is true. When data are arranged in table form for the independence test, the table is called a contingency table.

CONTINGENCY TABLE

        Column 1   Column 2   Column 3
Row 1   C1,1       C1,2       C1,3
Row 2   C2,1       C2,2       C2,3

CHI-SQUARE INDEPENDENCE TEST FORMULA
• The formula for the test value for the independence test is the same as the one for the goodness-of-fit test.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:
O = the observed or obtained frequency
E = the expected or theoretical frequency

and the degrees of freedom:

$$df = (C - 1)(R - 1)$$

Where:
C = the number of columns
R = the number of rows

CALCULATION OF THE EXPECTED FREQUENCIES
Using the contingency table, one can compute the expected frequencies for each block (or cell) as shown next.

1. Find the sum of each row and each column, and find the grand total, as shown.

Group     Prefer new procedure   Prefer old procedure   No preference   Total
Nurses    100                    80                     20              200
Doctors   50                     120                    30              200
Total     150                    200                    50              400 (grand total)

CALCULATION OF THE EXPECTED FREQUENCIES ( CONT’D )
2. For each cell, multiply the corresponding row sum by the column sum and divide by the grand total, to get the expected value:

$$\text{Expected value} = \frac{\text{row sum} \times \text{column sum}}{\text{grand total}}$$
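The two steps above can be sketched in a few lines of Python. The function name `expected_frequencies` is an illustrative assumption. The observed counts are the nurses and doctors data from the example, with one row per group and one column per response.

```python
def expected_frequencies(table):
    """Expected count per cell: row sum * column sum / grand total."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)
    return [[r * c / grand_total for c in col_sums] for r in row_sums]


# Columns: prefer new, prefer old, no preference.
observed = [
    [100, 80, 20],   # Nurses
    [50, 120, 30],   # Doctors
]

expected = expected_frequencies(observed)
print(expected)  # [[75.0, 100.0, 25.0], [75.0, 100.0, 25.0]]
```

Both rows have identical expected counts here only because the two groups happen to have the same total of 200.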

CALCULATION OF THE EXPECTED FREQUENCIES ( CONT’D )
The rationale for the computation of the expected frequencies for a contingency table uses proportions. For C1,1, a total of 150 out of 400 people prefer the new procedure. Since there are 200 nurses, one would expect, if the null hypothesis were true, (150/400)(200), or 75, of the nurses to be in favor of the new procedure. Similarly, for C1,2, the expected value, denoted by E1,2, is

$$E_{1,2} = \frac{200 \times 200}{400} = 100$$

CALCULATION OF THE EXPECTED FREQUENCIES ( CONT’D )
For each cell, the expected values can be computed and placed in the table (expected values in parentheses):

Group     Prefer new procedure   Prefer old procedure   No preference   Total
Nurses    100 (75)               80 (100)               20 (25)         200
Doctors   50 (75)                120 (100)              30 (25)         200
Total     150                    200                    50              400

CALCULATION OF THE EXPECTED FREQUENCIES ( CONT’D )
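Now, compute χ²:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(100-75)^2}{75} + \frac{(80-100)^2}{100} + \frac{(20-25)^2}{25} + \frac{(50-75)^2}{75} + \frac{(120-100)^2}{100} + \frac{(30-25)^2}{25} = 26.67$$

The final steps are to make the decision and summarize the results. This test is always a right-tailed test, and the degrees of freedom are (R - 1)(C - 1) = (2 - 1)(3 - 1) = 2.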

CALCULATION OF THE EXPECTED FREQUENCIES ( CONT’D )
If α = 0.05, the critical value from the table is 5.991. Hence, the decision is to reject the null hypothesis, since 26.67 > 5.991.
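The whole independence calculation for the nurses and doctors table can be reproduced with the sketch below. The function name `chi_square_independence` is an illustrative assumption. The critical value is obtained from a shortcut that is valid only for 2 degrees of freedom, where the right-tail probability is exp(-x/2); for other cases the table value should be used.

```python
import math


def chi_square_independence(table):
    """Return (test value, degrees of freedom) for a contingency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / grand_total  # expected count
            statistic += (o - e) ** 2 / e

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


observed = [
    [100, 80, 20],   # Nurses
    [50, 120, 30],   # Doctors
]
statistic, df = chi_square_independence(observed)

# Shortcut for df = 2 ONLY: P(X > x) = exp(-x / 2), so x = -2 ln(alpha).
alpha = 0.05
critical = -2 * math.log(alpha)

print(round(statistic, 2), df, round(critical, 3))  # 26.67 2 5.991
print("reject H0" if statistic > critical else "do not reject H0")  # reject H0
```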

HOMOGENEITY OF PROPORTIONS TEST
• The homogeneity of proportions test is used when samples are selected from several different populations and the researcher wants to determine whether the proportions of elements that have a common characteristic are the same for each population.
• The sample sizes are specified in advance, making either the row totals or column totals in the contingency table known before the samples are selected.
H0: p1 = p2 = p3 = … = pn.
H1: At least one proportion is different from the others.
When the null hypothesis is rejected, it can be assumed that the proportions are not all equal.

PROCEDURE

The procedures for the chi-square independence and homogeneity tests are identical and summarized below.

Step 1 State the hypotheses and identify the claim.
Step 2 Find the critical value in the right tail.
Step 3 Compute the test value. To compute the test value, first find the expected values. For each cell of the contingency table, use the formula

$$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$$

to get the expected value. To find the test value, use the formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Step 4 Make the decision.
Step 5 Summarize the results.

AN EXAMPLE
A researcher selected a sample of 50 seniors from each of three area high schools and asked each senior, “Do you drive to school in a car owned by either you or your parents?” The data are shown in the table. At α = 0.05, test the claim that the proportion of students who drive their own or their parents’ cars is the same at all three schools.

        School 1   School 2   School 3   Total
Yes     18         22         16         56
No      32         28         34         94
Total   50         50         50         150

SOLUTION

STEP 1. State the hypotheses.
H0: p1 = p2 = p3
H1: At least one proportion is different from the others.

STEP 2. Find the critical value. The formula for the degrees of freedom is the same as before: (2 - 1)(3 - 1) = 2. The critical value is 5.991.

STEP 3. Compute the χ² test value. First, compute the expected values. The complete table is shown.

SOLUTION ( CONT’D )
        School 1     School 2     School 3     Total
Yes     18 (18.67)   22 (18.67)   16 (18.67)   56
No      32 (31.33)   28 (31.33)   34 (31.33)   94
Total   50           50           50           150

The chi-square test value is

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 1.596$$

SOLUTION ( CONT’D )
STEP 4. Make the decision. The decision is not to reject the null hypothesis, since 1.596 < 5.991.

STEP 5. Summarize the results. There is not enough evidence to reject the null hypothesis that the proportions of high school students who drive their own or their parents’ cars to school are equal for each school.
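The test value for this example can be reproduced with the sketch below. It frames the expected counts through the pooled "yes" proportion (56 out of 150), which gives the same numbers as the row sum × column sum / grand total formula. The variable names are illustrative assumptions, and 5.991 is the table value quoted in Step 2.

```python
yes = [18, 22, 16]   # seniors who drive, Schools 1 to 3
no = [32, 28, 34]    # seniors who do not drive

sizes = [y + n for y, n in zip(yes, no)]   # 50 per school
pooled_yes = sum(yes) / sum(sizes)         # common proportion under H0

statistic = 0.0
for y, n, size in zip(yes, no, sizes):
    e_yes = size * pooled_yes          # expected "yes" count
    e_no = size * (1 - pooled_yes)     # expected "no" count
    statistic += (y - e_yes) ** 2 / e_yes + (n - e_no) ** 2 / e_no

critical = 5.991  # table value for df = 2, alpha = 0.05

print(round(statistic, 3))  # 1.596
print("reject H0" if statistic > critical else "do not reject H0")  # do not reject H0
```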

SUMMARY
There are three main uses of the chi-square test:
• 1. The test of independence is used to determine whether two variables are related or are independent.
• 2. The goodness-of-fit test is used to determine whether the frequencies of a distribution are the same as the hypothesized frequencies.
• 3. The homogeneity of proportions test is used to determine whether several proportions are all equal when samples are selected from different populations.

CONCLUSION

• The chi-square test is useful in a variety of hypothesis tests that can be applied to many different everyday situations.

