THE CHI SQUARE TEST

BY:

LJV & IKR

OBJECTIVES

Test a distribution for goodness of fit using chi-square. Test two variables for independence using chi-square. Test proportions for homogeneity using chi-square.

INTRODUCTION

The chi-square distribution can be used for tests concerning frequency distributions, such as: “If a sample of buyers is given a choice of automobile colors, will each color be selected with the same frequency?”

INTRODUCTION ( CONT’D )

The chi-square distribution can also be used to test the independence of two variables. For example, “Are senators’ opinions on gun control independent of party affiliations?”

INTRODUCTION ( CONT’D )

The chi-square distribution can be used to test the homogeneity of proportions. For example: “Is the proportion of high school seniors who attend college immediately after graduating the same for the northern, southern, eastern, and western parts of Iloilo City?”

**ASSUMPTIONS FOR GOODNESS-OF-FIT, INDEPENDENCE, AND HOMOGENEITY TESTS**

The data are obtained from a random sample. The expected frequency for each category must be 5 or more.
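The second assumption can be checked mechanically before running any of the tests. A minimal sketch, assuming the expected frequencies are already available as a list of numbers; the function name `expected_counts_ok` and the sample values are illustrative only and do not come from the slides:

```python
def expected_counts_ok(expected, minimum=5):
    """Return True when every expected frequency is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Illustrative (made-up) expected frequencies.
print(expected_counts_ok([12.5, 9.0, 8.5]))   # every value is 5 or more
print(expected_counts_ok([12.5, 4.0, 8.5]))   # one value is below 5
```

When the check fails, a common remedy is to combine sparse categories before testing, although the slides do not prescribe one.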

TEST FOR GOODNESS-OF-FIT

The chi-square statistic can be used to see whether a frequency distribution fits a specific pattern. This is referred to as the chi-squared goodness-of-fit test.

**OBSERVED FREQUENCIES VS. EXPECTED FREQUENCIES**

Suppose a market analyst wished to see whether consumers have any preference among five flavors of a new fruit soda. A sample of 100 people provided these data: Cherry 32, Strawberry 28, Orange 16, Lime 14, Grape 10.

Since the frequencies for each flavor were obtained from a sample, these actual frequencies are called the observed frequencies. The frequencies obtained by calculation (as if there were no preference) are called the expected frequencies.

OBSERVED FREQUENCIES VS. EXPECTED FREQUENCIES ( CONT’D )

| Flavor | Observed | Expected |
|---|---|---|
| Cherry | 32 | 20 |
| Strawberry | 28 | 20 |
| Orange | 16 | 20 |
| Lime | 14 | 20 |
| Grape | 10 | 20 |
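The expected column follows from dividing the sample size evenly among the flavors, which is what "no preference" implies. A short sketch of that arithmetic, using the observed counts from the slide (the variable names are illustrative):

```python
# Observed counts from the fruit soda sample.
observed = {"Cherry": 32, "Strawberry": 28, "Orange": 16, "Lime": 14, "Grape": 10}

n = sum(observed.values())   # total sample size
k = len(observed)            # number of flavors

# Under "no preference", each flavor is expected n / k times.
expected = {flavor: n / k for flavor in observed}

for flavor, count in observed.items():
    print(flavor, count, expected[flavor])
```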

**GOODNESS-OF-FIT TEST**

The formula for the chi-square goodness-of-fit test:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:

O = the observed or obtained frequency

E = the expected or theoretical frequency

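**GOODNESS-OF-FIT TEST ( CONT’D )**

And the degrees of freedom (df):

$$df = k - 1$$

Where: k = the number of categories

The formula df = (C – 1)(R – 1), where C is the number of columns and R is the number of rows, applies to contingency tables, as in the independence and homogeneity tests.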

AN EXAMPLE

Is there enough evidence to reject the claim that there is no preference in the selection of fruit soda flavors, using the data shown previously? Let α = 0.05.

SOLUTION

STEP 1 State the hypotheses and identify the claim.

H0: Consumers show no preference for flavors (claim).

H1: Consumers show a preference.

STEP 2 Find the critical value. The degrees of freedom are 5 – 1 = 4, and α = 0.05. Hence, the critical value from the table is 9.488.

STEP 3 Compute χ².

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 18.0$$
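Written out term by term with the observed and expected frequencies from the flavor table, the test value is:

$$\chi^2 = \frac{(32-20)^2}{20} + \frac{(28-20)^2}{20} + \frac{(16-20)^2}{20} + \frac{(14-20)^2}{20} + \frac{(10-20)^2}{20} = 7.2 + 3.2 + 0.8 + 1.8 + 5.0 = 18.0$$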

SOLUTION ( CONT’D )

STEP 4 Make the decision. The decision is to reject the null hypothesis, since 18.0 > 9.488.

SOLUTION ( CONT’D )

STEP 5 Summarize the results. There is enough evidence to reject the claim that consumers show no preference for the flavors.

A GOOD FIT

When the observed values and expected values are close together, the chi-square test value will be small. Then the decision will be not to reject the null hypothesis; hence, there is a “good fit.”

**NOT A GOOD FIT**

When the observed values and the expected values are far apart, the chi-square test value will be large. Then the null hypothesis will be rejected; hence, there is “not a good fit.”

**CHI-SQUARE GOODNESS-OF-FIT PROCEDURE SUMMARY**

Step 1 State the hypotheses and identify the claim.

Step 2 Find the critical value. The test is always right-tailed.

Step 3 Compute the test value: find the sum of the values $\frac{(O - E)^2}{E}$.

Step 4 Make the decision.

Step 5 Summarize the results.
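Steps 2 to 4 of the summary can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the function name `goodness_of_fit` is hypothetical, and the critical value is passed in by the caller (here 9.488, the table value the slides give for 4 degrees of freedom and α = 0.05) rather than looked up.

```python
def goodness_of_fit(observed, expected, critical_value):
    """Return (test_value, df, reject_h0) for a chi-square goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Step 3: sum of (O - E)^2 / E over all categories.
    test_value = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Degrees of freedom: number of categories minus one.
    df = len(observed) - 1
    # Step 4: the test is right-tailed, so reject H0 when the value exceeds the critical value.
    reject_h0 = test_value > critical_value
    return test_value, df, reject_h0


# Fruit soda data from the example; the critical value is taken from the slides.
observed = [32, 28, 16, 14, 10]
expected = [20, 20, 20, 20, 20]
test_value, df, reject_h0 = goodness_of_fit(observed, expected, critical_value=9.488)
print(round(test_value, 3), df, reject_h0)
```

Stating the hypotheses (Step 1) and wording the conclusion (Step 5) remain the analyst's job; the code only covers the arithmetic and the comparison.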

INDEPENDENCE TEST

The chi-square independence test can be used to test the independence of two variables.

H0: There is no relationship between two variables.

H1: There is a relationship between two variables.

If the null hypothesis is rejected, there is some relationship between the variables.

AN EXAMPLE

Suppose a new postoperative procedure is administered to a number of patients in a large hospital. One can ask the question, “Do the doctors feel differently about this procedure from the nurses, or do they feel basically the same way?” Note that the question is not whether they prefer the procedure but whether there is a difference of opinion between the two groups.

AN EXAMPLE ( CONT’D )

To answer this question, a researcher selects a sample of nurses and doctors and tabulates the data in table form, as shown.

| Group | Prefer new | Prefer old | No preference |
|---|---|---|---|
| Nurses | 100 | 80 | 20 |
| Doctors | 50 | 120 | 30 |

AN EXAMPLE ( CONT’D )

If the null hypothesis is not rejected, the test means that both professions feel basically the same way about the procedure and the differences are due to chance. If the null hypothesis is rejected, the test means that one group feels differently about the procedure from the other. Remember that rejection does not mean that one group favors the procedure and the other does not.

CHI-SQUARE INDEPENDENCE TEST

In order to test the null hypothesis, one must compute the expected frequencies, assuming the null hypothesis is true. When data are arranged in table form for the independence test, the table is called a contingency table.

CONTINGENCY TABLE

| | Column 1 | Column 2 | Column 3 |
|---|---|---|---|
| Row 1 | $C_{1,1}$ | $C_{1,2}$ | $C_{1,3}$ |
| Row 2 | $C_{2,1}$ | $C_{2,2}$ | $C_{2,3}$ |

**CHI-SQUARE INDEPENDENCE TEST FORMULA**

The formula for the test value for the independence test is the same as the one for the goodness-of-fit test.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:

O = the observed or obtained frequency

E = the expected or theoretical frequency

and the degrees of freedom:

$$df = (C - 1)(R - 1)$$

Where:

C = the number of columns

R = the number of rows

**CALCULATION OF THE EXPECTED FREQUENCIES**

Using the contingency table, one can compute the expected frequencies for each block (or cell) as shown next.

1. Find the sum of each row and each column, and find the grand total, as shown.

| Group | Prefer new procedure | Prefer old procedure | No preference | Total |
|---|---|---|---|---|
| Nurses | 100 | 80 | 20 | 200 |
| Doctors | 50 | 120 | 30 | 200 |
| Total | 150 | 200 | 50 | 400 (grand total) |

**CALCULATION OF THE EXPECTED FREQUENCIES ( CONT’D )**

2. For each cell, multiply the corresponding row sum by the column sum and divide by the grand total, to get the expected value:

$$\text{Expected value} = \frac{\text{Row sum} \times \text{Column sum}}{\text{Grand total}}$$

**CALCULATION OF THE EXPECTED FREQUENCIES ( CONT’D )**

The rationale for the computation of the expected frequencies for a contingency table uses proportions. For $C_{1,1}$, a total of 150 out of 400 people prefer the new procedure. Since there are 200 nurses, one would expect, if the null hypothesis were true, (150/400)(200), or 75, of the nurses to be in favor of the new procedure. Similarly, for $C_{1,2}$, the expected value, denoted by $E_{1,2}$, is

$$E_{1,2} = \frac{200 \times 200}{400} = 100$$
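The row-sum times column-sum rule can be applied to every cell at once. A minimal sketch, assuming the contingency table is stored as a list of rows without the totals; the function name `expected_frequencies` is illustrative:

```python
def expected_frequencies(table):
    """Return the expected count for each cell: row sum * column sum / grand total."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)
    return [[r * c / grand_total for c in col_sums] for r in row_sums]


# Rows: nurses, doctors. Columns: prefer new, prefer old, no preference.
observed = [[100, 80, 20],
            [50, 120, 30]]
expected = expected_frequencies(observed)
print(expected)  # [[75.0, 100.0, 25.0], [75.0, 100.0, 25.0]]
```

Both rows come out identical here because the two groups have the same row total of 200.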

**CALCULATION OF THE EXPECTED FREQUENCIES ( CONT’D )**

For each cell, the expected values can be computed and placed in the table (expected values in parentheses):

| Group | Prefer new procedure | Prefer old procedure | No preference | Total |
|---|---|---|---|---|
| Nurses | 100 (75) | 80 (100) | 20 (25) | 200 |
| Doctors | 50 (75) | 120 (100) | 30 (25) | 200 |
| Total | 150 | 200 | 50 | 400 |

**CALCULATION OF THE EXPECTED FREQUENCIES ( CONT’D )**

Now, compute χ²:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 26.67$$

The final steps are to make the decision and summarize the results. This test is always a right-tailed test, and the degrees of freedom are (R – 1)(C – 1) = (2 – 1)(3 – 1) = 2.
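Expanding the sum over the six cells of the nurses and doctors table, with each observed count paired with its expected value, gives the 26.67 used in the decision step:

$$\chi^2 = \frac{(100-75)^2}{75} + \frac{(50-75)^2}{75} + \frac{(80-100)^2}{100} + \frac{(120-100)^2}{100} + \frac{(20-25)^2}{25} + \frac{(30-25)^2}{25} = 8.333 + 8.333 + 4 + 4 + 1 + 1 \approx 26.67$$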

**CALCULATION OF THE EXPECTED FREQUENCIES ( CONT’D )**

If α = 0.05, the critical value from the table is 5.991. Hence, the decision is to reject the null hypothesis, since 26.67 > 5.991.

**HOMOGENEITY OF PROPORTIONS TEST**

The homogeneity of proportions test is used when samples are selected from several different populations and the researcher wants to determine whether the proportions of elements that have a common characteristic are the same for each population. The sample sizes are specified in advance, making either the row totals or column totals in the contingency table known before the samples are selected.

H0: p1 = p2 = p3 = … = pn.

H1: At least one proportion is different from the others.

When the null hypothesis is rejected, it can be assumed that the proportions are not all equal.

PROCEDURE

The procedures for the chi-square independence and homogeneity tests are identical and summarized below.

Step 1 State the hypotheses and identify the claim.

Step 2 Find the critical value in the right tail.

Step 3 Compute the test value. To compute the test value, first find the expected values. For each cell of the contingency table, use the formula

$$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$$

to get the expected value. To find the test value, use the formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Step 4 Make the decision.

Step 5 Summarize the results.
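Because the two tests share one procedure, a single routine can serve both. The sketch below is one possible arrangement, not the authors' method: the function name `chi_square_table_test` is hypothetical, and the critical value is supplied by the caller from a chi-square table (5.991 for 2 degrees of freedom at α = 0.05, as quoted in the slides). It is demonstrated on the nurses and doctors data from the independence example.

```python
def chi_square_table_test(table, critical_value):
    """Return (test_value, df, reject_h0) for an R x C contingency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)

    test_value = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Step 3a: expected value for this cell.
            expected = row_sums[i] * col_sums[j] / grand_total
            # Step 3b: add this cell's contribution (O - E)^2 / E.
            test_value += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    # Step 4: right-tailed decision rule.
    reject_h0 = test_value > critical_value
    return test_value, df, reject_h0


# Rows: nurses, doctors. Columns: prefer new, prefer old, no preference.
table = [[100, 80, 20],
         [50, 120, 30]]
test_value, df, reject_h0 = chi_square_table_test(table, critical_value=5.991)
print(round(test_value, 2), df, reject_h0)
```

Whether the result is read as "the variables are related" or "the proportions differ" depends on how the samples were drawn, not on the arithmetic.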

AN EXAMPLE

A researcher selected a sample of 50 seniors from each of three area high schools and asked each senior, “Do you drive to school in a car owned by either you or your parents?” The data are shown in the table. At α = 0.05, test the claim that the proportion of students who drive their own or their parents’ cars is the same at all three schools.

| | School 1 | School 2 | School 3 | Total |
|---|---|---|---|---|
| Yes | 18 | 22 | 16 | 56 |
| No | 32 | 28 | 34 | 94 |
| Total | 50 | 50 | 50 | 150 |

SOLUTION

STEP 1. State the hypotheses.

H0: p1 = p2 = p3

H1: At least one proportion is different from the others.

STEP 2. Find the critical value. The formula for the degrees of freedom is the same as before: (2 – 1)(3 – 1) = 2. The critical value is 5.991.

STEP 3. Compute the χ² test value. First, compute the expected values; the completed table is shown.

SOLUTION ( CONT’D )

| | School 1 | School 2 | School 3 | Total |
|---|---|---|---|---|
| Yes | 18 (18.67) | 22 (18.67) | 16 (18.67) | 56 |
| No | 32 (31.33) | 28 (31.33) | 34 (31.33) | 94 |
| Total | 50 | 50 | 50 | 150 |

The chi-square test value is

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 1.596$$
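For reference, the six terms behind this value are shown below. The expected values are $56 \times 50 / 150 \approx 18.67$ for the "Yes" row and $94 \times 50 / 150 \approx 31.33$ for the "No" row; each term is rounded to three decimal places from the unrounded expected values.

$$\chi^2 = \frac{(18-18.67)^2}{18.67} + \frac{(22-18.67)^2}{18.67} + \frac{(16-18.67)^2}{18.67} + \frac{(32-31.33)^2}{31.33} + \frac{(28-31.33)^2}{31.33} + \frac{(34-31.33)^2}{31.33} \approx 0.024 + 0.595 + 0.381 + 0.014 + 0.355 + 0.227 = 1.596$$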

SOLUTION ( CONT’D )

STEP 4. Make the decision. The decision is not to reject the null hypothesis, since 1.596 < 5.991.

STEP 5. Summarize the results. There is not enough evidence to reject the null hypothesis that the proportions of high school students who drive their own or their parents’ cars to school are equal for each school.

SUMMARY

There are three main uses of the chi-square test:

1. The test of independence is used to determine whether two variables are related or are independent.
2. It can be used as a goodness-of-fit test, in order to determine whether the frequencies of a distribution are the same as the hypothesized frequencies.
3. The homogeneity of proportions test is used to determine if several proportions are all equal when samples are selected from different populations.

CONCLUSION

The chi-square test is useful in a variety of hypothesis tests that can be applied to many different everyday situations.

References:

http://www.computing.dcu.ie/~wuhai/chap11.pdf

Melecio D. (1996). Elementary statistics for basic education. Phoenix Publishing House Inc. Quezon City.

Punsalan and Uriarte. (1994). Statistics a simplified approach. Rex Printing Company, Inc. Quezon City.

THE END

THANK YOU
