χ² CHI-SQUARE TEST
Chi-Square Test
• There are three main uses of the chi-square
distribution:
1. The test of independence is used to determine
whether two variables are related or are
independent.
2. The goodness-of-fit test is used to determine
whether the frequencies of a distribution are the
same as the hypothesized frequencies.
3. The homogeneity of proportions test is used to
determine if several proportions are all equal when
samples are selected from different populations.
[Figure: Hypothesis tests for qualitative data, branching by number of populations (1 pop., 2 pop., more than 2 pop.); branch labels include Proportion and Independence.]
1. The chi-square distribution is not symmetric.
2. The shape of the chi-square distribution depends on the
degrees of freedom, just like Student’s t-distribution.
3. As the number of degrees of freedom increases, the chi-
square distribution becomes more nearly symmetric.
4. The values of χ² are nonnegative, i.e., the values of χ²
are greater than or equal to 0.
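Property 3 can be made concrete with the skewness of the chi-square distribution, which equals sqrt(8/df). That formula is a standard result and is not stated in the slides; the function name `chi_square_skewness` is made up for illustration. A minimal sketch:

```python
import math

def chi_square_skewness(df: int) -> float:
    """Skewness of a chi-square distribution with df degrees of freedom."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return math.sqrt(8 / df)

# The skewness is always positive (right-skewed) and shrinks toward 0,
# the value for a symmetric distribution, as df grows.
for df in (1, 5, 30, 100):
    print(df, round(chi_square_skewness(df), 3))
```

The printed values decrease as df increases, which is the "more nearly symmetric" behavior described in the list.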
Goodness of Fit
Chi-Square Goodness-of-Fit Test
To calculate the test statistic for the chi-square goodness-of-fit test, the
observed frequencies O_i and the expected frequencies E_i are used, for
categories i = 1, 2, …, k.
Would you reject the null hypothesis that the die is fair
at the 5% level of significance?
Chi-Square Goodness-of-Fit Test
Example: Fair Die?
If a fair die is rolled 60 times, you “expect” to get each face of the die on 1/6 of the
60 rolls, or 10 times each.
Outcome   Observed Frequency, O   Expected Frequency, E
1         12                      10
2         9                       10
3         10                      10
4         6                       10
5         11                      10
6         12                      10
Total     60                      60
The die is unfair if the observed frequencies are far from the expected
frequencies.
Chi-Square Goodness-of-Fit Test
The Test Statistic
$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
Chi-Square Goodness-of-Fit Test
Example: The Test Statistic for the Die
Outcome   Observed Frequency, O   Expected Frequency, E   O - E   (O - E)² / E
1         12                      10                       2      0.4
2         9                       10                      -1      0.1
3         10                      10                       0      0.0
4         6                       10                      -4      1.6
5         11                      10                       1      0.1
6         12                      10                       2      0.4
Total     60                      60                       0      2.6

χ² = 2.6
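The column sums in the die table can be reproduced in a few lines. This is a minimal sketch using only the counts from the table; the variable names are illustrative, and the degrees of freedom are taken as k - 1 (number of outcomes minus one), the usual rule for a goodness-of-fit test with fully specified probabilities.

```python
observed = [12, 9, 10, 6, 11, 12]   # observed counts for faces 1..6
n = sum(observed)                   # 60 rolls in total
expected = [n / 6] * 6              # a fair die gives 10 per face

# Sum of (O - E)^2 / E over the six outcomes.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1              # assumption: k - 1 degrees of freedom

print(round(chi_square, 4), df)
```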
Chi-Square Goodness-of-Fit Test
Critical Values
Since the test statistic χ² = 2.6 is smaller than the critical value χ²* = 11.07,
we fail to reject the null hypothesis that the die is fair at the α = 0.05 level.
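The same decision can be reached with a p-value instead of a table lookup. The sketch below is not from the slides: it computes the upper-tail probability of a chi-square variable for integer degrees of freedom with the standard recurrence for the regularized upper incomplete gamma function, Q(a + 1, z) = Q(a, z) + z^a e^(-z) / Γ(a + 1), starting from Q(1/2, z) = erfc(√z) or Q(1, z) = e^(-z). The function name `chi_square_sf` is hypothetical.

```python
import math

def chi_square_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z)
    else:
        a, q = 1.0, math.exp(-z)              # Q(1, z)
    # Step a up by 1 until it reaches df / 2.
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

p_value = chi_square_sf(2.6, 5)   # die example: statistic 2.6, 5 degrees of freedom
reject = p_value < 0.05
print(round(p_value, 3), reject)
```

A p-value well above 0.05 matches the conclusion above: the null hypothesis is not rejected.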
Chi-Square Goodness-of-Fit Test
Milk chocolate M&Ms are 13% red, 14% yellow, 16% green, 20% orange, 13%
brown, and 24% blue. A random sample of 200 peanut butter M&Ms yielded this
distribution:
Use the proportions of each color of milk chocolate M&Ms and the
sample size (n = 200) to compute the expected number of each color.
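The expected counts follow directly from the stated percentages and n = 200. A minimal sketch (the dictionary and variable names are illustrative; the observed counts are not reproduced here because the sample table is not shown in the text):

```python
# Milk chocolate color percentages as given in the example.
color_percent = {
    "red": 13, "yellow": 14, "green": 16,
    "orange": 20, "brown": 13, "blue": 24,
}
n = 200  # sample size

# Expected count for each color = n * proportion.
expected = {color: n * pct / 100 for color, pct in color_percent.items()}

for color, count in expected.items():
    print(color, count)
```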
Then,
$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, where $E_{ij}$ is the expected count.
3. Degrees of Freedom: (r – 1)(c – 1), where r is the number of rows and c is the number of columns.
Example:
The following contingency table shows a random sample of 321 fatally
injured passenger vehicle drivers by age and gender. The expected
frequencies are displayed in parentheses. At α = 0.05, can you conclude that
the drivers’ ages are related to gender in such accidents?
                                   Age
Gender   16–20     21–30     31–40     41–50     51–60     61 and older   Total
Male     32        51        52        43        28        10             216
         (30.28)   (49.12)   (57.20)   (43.07)   (25.57)   (10.77)
Female   13        22        33        21        10        6              105
         (14.72)   (23.88)   (27.80)   (20.93)   (12.43)   (5.23)
Total    45        73        85        64        38        16             321
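The expected frequencies in parentheses come from (row total × column total) / grand total. A minimal sketch that recomputes them from the observed counts (the variable names are illustrative):

```python
observed = {
    "Male":   [32, 51, 52, 43, 28, 10],
    "Female": [13, 22, 33, 21, 10, 6],
}

col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(col_totals)   # 321 drivers

# Expected count for each cell: row total * column total / grand total.
expected = {
    gender: [sum(row) * c / n for c in col_totals]
    for gender, row in observed.items()
}

for gender, row in expected.items():
    print(gender, [round(e, 2) for e in row])

# Smallest expected count, useful for checking the "at least 5" condition.
smallest = min(min(row) for row in expected.values())
```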
Example continued:
Because each expected frequency is at least 5 and the drivers were randomly
selected, the chi-square independence test can be used to test whether the
variables are independent.
d.f. = (r – 1)(c – 1)
= (2 – 1)(6 – 1)
= 5
With d.f. = 5 and α = 0.05, the critical value is χ²₀ = 11.071.
Example continued:
O     E        O – E    (O – E)²   (O – E)² / E
32    30.28     1.72    2.9584     0.0977
51    49.12     1.88    3.5344     0.072
52    57.20    -5.2     27.04      0.4727
43    43.07    -0.07    0.0049     0.0001
28    25.57     2.43    5.9049     0.2309
10    10.77    -0.77    0.5929     0.0551
13    14.72    -1.72    2.9584     0.201
22    23.88    -1.88    3.5344     0.148
33    27.80     5.2     27.04      0.9727
21    20.93     0.07    0.0049     0.0002
10    12.43    -2.43    5.9049     0.4751
6     5.23      0.77    0.5929     0.1134

$\chi^2 = \sum \frac{(O - E)^2}{E} \approx 2.84$

[Figure: chi-square curve with the rejection region of area 0.05 to the right of χ²₀ = 11.071.]

Fail to reject H0.
There is not enough evidence at the 5% level to conclude that age is dependent on gender
in such accidents.
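The whole calculation for this example can be checked from the observed counts alone. The sketch below is illustrative (the function name `chi_square_independence` is made up); it uses unrounded expected frequencies, so the result can differ slightly in the second decimal from the 2.84 obtained with rounded values. The critical value 11.071 is the one quoted in the example.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count for cell (i, j)
            statistic += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

drivers = [
    [32, 51, 52, 43, 28, 10],   # Male
    [13, 22, 33, 21, 10, 6],    # Female
]
statistic, df = chi_square_independence(drivers)
critical_value = 11.071          # from the example, d.f. = 5, alpha = 0.05
reject = statistic > critical_value
print(round(statistic, 2), df, reject)
```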
You’re a marketing research analyst. You ask a random
sample of 286 consumers if they purchase Diet Pepsi or Diet
Coke. At the 0.05 level of significance, is there evidence of
a relationship?
                  Diet Pepsi
Diet Coke    No      Yes     Total
No           84      32      116
Yes          48      122     170
Total        132     154     286
H0: No Relationship
Ha: Relationship
α = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s): 3.841
Test Statistic:
Decision:
Conclusion:
[Figure: chi-square curve starting at 0 with the Reject H0 region of area α = .05 to the right of 3.841.]
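E_ij ≥ 5 in all cells, where each expected count is (row total)(column total)/n; for example, E_21 = 170·132/286 ≈ 78.5 and E_22 = 170·154/286 ≈ 91.5.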
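$\chi^2 = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
$= \frac{(O_{11} - E_{11})^2}{E_{11}} + \frac{(O_{12} - E_{12})^2}{E_{12}} + \cdots + \frac{(O_{22} - E_{22})^2}{E_{22}}$
$= \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \cdots + \frac{(122 - 91.5)^2}{91.5}$
$= 54.29$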
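As a cross-check, a 2 × 2 table also admits the shortcut n(ad − bc)² / (r₁ r₂ c₁ c₂), which is algebraically the same statistic but is not the method used in the slides. The sketch below applies it to the Diet Coke / Diet Pepsi counts; because it avoids rounding the expected counts to one decimal, it gives roughly 54.15 rather than 54.29. The difference does not affect the decision.

```python
# Observed counts: rows are Diet Coke (No, Yes), columns are Diet Pepsi (No, Yes).
a, b = 84, 32
c, d = 48, 122
n = a + b + c + d                     # 286 consumers

# Shortcut for a 2 x 2 table: n * (ad - bc)^2 / (product of the four margins).
chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

critical_value = 3.841                # from the example, df = 1, alpha = .05
reject = chi_square > critical_value
print(round(chi_square, 2), reject)
```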
H0: No Relationship
α = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s): 3.841
Test Statistic: χ² = 54.29
Decision: Reject H0 at α = .05
Conclusion: There is evidence of a relationship
Tabulated statistics: Coke, Pepsi
Using frequencies in frequency
        No       Yes      All
No      84       32       116
        53.5     62.5     116.0
Yes     48       122      170
        78.5     91.5     170.0
All     132      154      286
        132.0    154.0    286.0