
CHAPTER 6

CHI-SQUARE (χ²) TEST
Chi-Square Test
• There are three main uses of the chi-square
distribution:
1. The test of independence is used to determine
whether two variables are related or are
independent.
2. It can be used as a goodness-of-fit test, in order to
determine whether the frequencies of a distribution
are the same as the hypothesized frequencies.
3. The homogeneity of proportions test is used to
determine if several proportions are all equal when
samples are selected from different populations.
Hypothesis Tests for Qualitative Data

[Decision tree: for qualitative data, tests of proportions use a Z test when one or two populations are compared and a χ² test when more than two populations are compared (homogeneity of proportions); tests of independence use a χ² test.]


Characteristics of the Chi-Square
Distribution

1. It is not symmetric.
2. The shape of the chi-square distribution depends on the
degrees of freedom, just like Student’s t-distribution.
3. As the number of degrees of freedom increases, the chi-
square distribution becomes more nearly symmetric.
4. The values of χ² are nonnegative,
i.e., the values of χ² are greater than or equal to 0.
Goodness of Fit
Chi-Square Goodness-of-Fit Test

To calculate the test statistic for the chi-square goodness-of-fit test, the
observed frequencies and the expected frequencies are used.

• The observed frequency O of a category is the frequency for the category observed in the sample data.
• The expected frequency E of a category is the calculated frequency for
the category.

• The expected frequency for the ith category is
Ei = n · pi
where n is the number of trials (the sample size) and pi is the assumed probability of the ith category.
Test Statistic for Goodness-of-Fit Tests
Let
Oi = the observed count for category i,
Ei = the expected count for category i,
k = the number of categories, and
n = the number of independent trials of the experiment. Then,

χ² = Σ (Oi – Ei)² / Ei,   i = 1, 2, …, k

with k – 1 degrees of freedom

Critical Value for Goodness-of-Fit Tests
All goodness-of-fit tests are right-tailed tests with
degrees of freedom (df) = k – 1; the critical value χ²* cuts off an area of α in the right tail.
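As a quick check of tabulated critical values, the right-tail critical value can be computed with SciPy's chi2.ppf (a minimal sketch; the significance level and number of categories below are only illustrative):

    from scipy.stats import chi2

    alpha = 0.05   # significance level (illustrative)
    k = 6          # number of categories, e.g., the six faces of a die
    df = k - 1     # degrees of freedom for a goodness-of-fit test

    # Right-tailed test: the critical value cuts off an area alpha in the upper tail
    critical_value = chi2.ppf(1 - alpha, df)
    print(critical_value)   # about 11.07 for df = 5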
Assumptions

• The assumptions for the chi-square independence and homogeneity tests:
1. The data are obtained from a random sample.
2. No more than 20% of the expected counts are less than 5.
The Chi-Square Goodness-of-Fit Test

Example 1: Fair Die?

Suppose you roll a die 60 times and get 12 ones, 9 twos, 10 threes, 6 fours, 11 fives, and 12 sixes.

Would you reject the null hypothesis that the die is fair
at the 5% level of significance?
Chi-Square Goodness-of-Fit Test
Example: Fair Die?

If a fair die is rolled 60 times, you “expect” to get each face of the die on 1/6 of the
60 rolls, or 10 times each.
Outcome Observed Frequency, O Expected Frequency, E
1 12 10
2 9 10
3 10 10
4 6 10
5 11 10
6 12 10
Total 60 60

The die is unfair if the observed frequencies are far from the expected
frequencies.
Chi-Square Goodness-of-Fit Test
The Test Statistic: χ² = Σ (O – E)² / E
Chi-Square Goodness-of-Fit Test
Example: The Test Statistic for the Die

Compute χ² for the fair die.

Outcome   Observed Frequency, O   Expected Frequency, E   O – E   (O – E)²/E
1 12 10 2 0.4
2 9 10 -1 0.1
3 10 10 0 0.0
4 6 10 -4 1.6
5 11 10 1 0.1
6 12 10 2 0.4
Total 60 60 0 2.6

χ² = 2.6
Chi-Square Goodness-of-Fit Test

Example: Fair Die?

Critical Values

at the 5% level of significance

Determine the degrees of freedom (the number of categories minus 1):
df = 6 - 1 = 5.
χ²* = 11.07

Since the test statistic χ² = 2.6 is smaller than the critical value χ²* = 11.07,
we fail to reject the null hypothesis that the die is fair at the α = 0.05 level.
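This example can be verified with SciPy's chisquare function (a minimal sketch; the observed counts are the ones given above):

    from scipy.stats import chisquare

    observed = [12, 9, 10, 6, 11, 12]   # counts of each face in 60 rolls
    expected = [10] * 6                 # a fair die: 60 * (1/6) = 10 per face

    stat, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(stat)      # 2.6, matching the hand calculation
    print(p_value)   # well above 0.05, so we fail to reject H0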
Chi-Square Goodness-of-Fit Test

The Chi-Square Goodness of Fit Test


State the hypotheses

H0: The proportions in the population are equal to the proportions in your model.
H1: At least one proportion in the population is not equal to the corresponding proportion in your model.
Chi-Square Goodness-of-Fit Test
Example 2: Milk Chocolate Versus Peanut Butter

Milk chocolate M&Ms are 13% red, 14% yellow, 16% green, 20% orange, 13%
brown, and 24% blue. A random sample of 200 peanut butter M&Ms yielded this
distribution:

Color Observed Number of M&Ms


Red 25
Yellow 37
Green 45
Orange 34
Brown 19
Blue 40
Total 200

Do we have convincing evidence that the distribution of peanut butter M&Ms is different from the distribution of milk chocolate M&Ms?
Chi-Square Goodness-of-Fit Test
Example: Milk Chocolate Versus Peanut Butter

Use the proportions of each color of milk chocolate M&Ms and the
sample size (n = 200) to compute the expected number of each color.

Color   Observed Number of M&Ms   Expected Number of M&Ms
Red 25 0.13(200) = 26
Yellow 37 0.14(200) = 28
Green 45 0.16(200) = 32
Orange 34 0.20(200) = 40
Brown 19 0.13(200) = 26
Blue 40 0.24(200) = 48
Total 200 200
The Chi-Square Goodness-of-Fit Test
Example: Milk Chocolate Versus Peanut Butter

State the hypotheses:

H0: The proportions in the population of peanut butter M&Ms are equal to the proportions in the model (milk chocolate M&Ms).

H1: At least one proportion in the population of peanut butter M&Ms is not equal to the corresponding proportion in the model (milk chocolate M&Ms).
The Chi-Square Goodness-of-Fit Test
Example: Milk Chocolate Versus Peanut Butter

Compute the test statistic, approximate the P-value, and draw a sketch.

The test statistic is χ² = Σ (O – E)² / E

Color   Observed Number of M&Ms   Expected Number of M&Ms   (O – E)²/E
Red 25 0.13(200) = 26 0.038
Yellow 37 0.14(200) = 28 2.893
Green 45 0.16(200) = 32 5.281
Orange 34 0.20(200) = 40 0.900
Brown 19 0.13(200) = 26 1.885
Blue 40 0.24(200) = 48 1.333
Total 200 200 χ² = 12.331
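The same goodness-of-fit calculation in SciPy (a sketch; the color proportions and counts are those given above):

    from scipy.stats import chisquare

    observed = [25, 37, 45, 34, 19, 40]                   # peanut butter M&M sample
    proportions = [0.13, 0.14, 0.16, 0.20, 0.13, 0.24]    # milk chocolate model
    expected = [p * 200 for p in proportions]             # n = 200

    stat, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(stat)      # about 12.33, matching the table above
    print(p_value)   # below 0.05

Since 12.331 exceeds the critical value of 11.07 for 5 degrees of freedom, H0 is rejected: there is convincing evidence that the peanut butter distribution differs from the milk chocolate model.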
Chi-Square Test of Independence

• Shows whether a relationship exists between two qualitative variables
• Uses a two-way contingency table
• The null hypothesis (H0) states that no association exists between the two cross-tabulated variables in the population (the variables are statistically independent).
• The alternative hypothesis (H1) proposes that the two variables are related in the population (the variables are dependent).
Assuming the two variables are independent, you can use the
contingency table to find the expected frequency for each cell.

The Expected Frequency for Contingency Table Cells

The expected frequency for cell Er,c in a contingency table is

Er,c = (Sum of row r) × (Sum of column c) / Sample size
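A small sketch of this formula with NumPy, using the outer product of the row and column totals (the counts below are made-up placeholder values):

    import numpy as np

    # Illustrative observed counts for a 2 x 3 contingency table (made-up numbers)
    observed = np.array([[20, 30, 50],
                         [30, 20, 50]])

    row_totals = observed.sum(axis=1)    # sum of each row r
    col_totals = observed.sum(axis=0)    # sum of each column c
    n = observed.sum()                   # sample size

    # E[r, c] = (row r total) * (column c total) / sample size
    expected = np.outer(row_totals, col_totals) / n
    print(expected)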
Test Statistic for the Test of Independence
Let:
Oi = the observed number of counts in the ith cell,
Ei = the expected number of counts in the ith cell.

Then,

χ² = Σ (Oi – Ei)² / Ei

with degrees of freedom = (r – 1)(c – 1)

where: r is the number of rows
c is the number of columns
Critical Region for the Test of Independence
1. Hypotheses
• H0: Variables are independent
• Ha: Variables are related (dependent)
2. Test Statistic

χ² = Σ (Oij – Eij)² / Eij

where Oij is the observed count and Eij is the expected count in row i, column j.

3. Degrees of Freedom: (r – 1)(c – 1), where r is the number of rows and c is the number of columns
Example:
The following contingency table shows a random sample of 321 fatally
injured passenger vehicle drivers by age and gender. The expected
frequencies are displayed in parentheses. At  = 0.05, can you conclude that
the drivers’ ages are related to gender in such accidents?

                                       Age
Gender    16–20     21–30     31–40     41–50     51–60     61 and older   Total
Male      32        51        52        43        28        10             216
          (30.28)   (49.12)   (57.20)   (43.07)   (25.57)   (10.77)
Female    13        22        33        21        10        6              105
          (14.72)   (23.88)   (27.80)   (20.93)   (12.43)   (5.23)
Total     45        73        85        64        38        16             321
Example continued:
Because each expected frequency is at least 5 and the drivers were randomly
selected, the chi-square independence test can be used to test whether the
variables are independent.

H0: The drivers' ages are independent of gender.
Ha: The drivers' ages are dependent on gender.

d.f. = (r – 1)(c – 1)
= (2 – 1)(6 – 1)
=5
With d.f. = 5 and α = 0.05, the critical value is χ²₀ = 11.071.
Continued.
Example continued:

Rejection region: α = 0.05, reject H0 if χ² > χ²₀ = 11.071.

O     E       O – E    (O – E)²   (O – E)²/E
32    30.28    1.72     2.9584     0.0977
51    49.12    1.88     3.5344     0.0720
52    57.20   -5.20    27.0400     0.4727
43    43.07   -0.07     0.0049     0.0001
28    25.57    2.43     5.9049     0.2309
10    10.77   -0.77     0.5929     0.0551
13    14.72   -1.72     2.9584     0.2010
22    23.88   -1.88     3.5344     0.1480
33    27.80    5.20    27.0400     0.9727
21    20.93    0.07     0.0049     0.0002
10    12.43   -2.43     5.9049     0.4751
6      5.23    0.77     0.5929     0.1134

χ² = Σ (O – E)²/E ≈ 2.84

Fail to reject H0. There is not enough evidence at the 5% level to conclude that age is dependent on gender in such accidents.
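A sketch of the same test with SciPy's chi2_contingency, which returns the statistic, P-value, degrees of freedom, and expected counts (the observed counts are those from the table above):

    from scipy.stats import chi2_contingency

    # Rows: gender (male, female); columns: age group (16-20 through 61 and older)
    observed = [[32, 51, 52, 43, 28, 10],
                [13, 22, 33, 21, 10, 6]]

    stat, p_value, df, expected = chi2_contingency(observed)
    print(stat)      # roughly 2.84, matching the hand calculation above
    print(df)        # (2 - 1)(6 - 1) = 5
    print(p_value)   # well above 0.05, so we fail to reject H0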
You’re a marketing research analyst. You ask a random
sample of 286 consumers if they purchase Diet Pepsi or Diet
Coke. At the 0.05 level of significance, is there evidence of
a relationship?

Diet Pepsi
Diet Coke No Yes Total
No 84 32 116
Yes 48 122 170
Total 132 154 286
• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 - 1)(2 - 1) = 1
• Critical value: χ² = 3.841 (reject H0 if the test statistic falls in the right-tail rejection region, α = .05)
• Test statistic, decision, and conclusion: computed below.

Check the assumption: Eij ≥ 5 in all cells.

Expected counts, Eij = (row total)(column total)/286:
E(No, No) = 116·132/286 ≈ 53.5      E(No, Yes) = 116·154/286 ≈ 62.5
E(Yes, No) = 170·132/286 ≈ 78.5     E(Yes, Yes) = 170·154/286 ≈ 91.5

                    Diet Pepsi
                 No             Yes
Diet Coke    Obs.   Exp.    Obs.   Exp.    Total
No            84    53.5     32    62.5     116
Yes           48    78.5    122    91.5     170
Total        132    132     154    154      286
χ² = Σ (Oi – Ei)² / Ei
   = (O11 – E11)²/E11 + (O12 – E12)²/E12 + … + (O22 – E22)²/E22
   = (84 – 53.5)²/53.5 + (32 – 62.5)²/62.5 + … + (122 – 91.5)²/91.5
   = 54.29
H0: No Relationship
Ha: Relationship
α = .05
df = (2 - 1)(2 - 1) = 1
Critical value: χ² = 3.841 (right-tailed rejection region, α = .05)

Test statistic: χ² = 54.29
Decision: Reject H0 at α = .05.
Conclusion: There is evidence of a relationship.
Tabulated statistics: Coke, Pepsi
Using frequencies in frequency

Rows: Coke Columns: Pepsi

No Yes All
No 84 32 116
53.5 62.5 116.0
Yes 48 122 170
78.5 91.5 170.0
All 132 154 286
132.0 154.0 286.0

Cell Contents: Count


Expected count

Pearson Chi-Square = 54.150, DF = 1, P-Value = 0.000


Likelihood Ratio Chi-Square = 55.783, DF = 1, P-Value = 0.000
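A sketch reproducing these numbers with SciPy's chi2_contingency; correction=False turns off the Yates continuity correction so the plain Pearson statistic is reported, matching the output above:

    from scipy.stats import chi2_contingency

    # Diet Coke (rows: No, Yes) vs. Diet Pepsi (columns: No, Yes)
    observed = [[84, 32],
                [48, 122]]

    stat, p_value, df, expected = chi2_contingency(observed, correction=False)
    print(stat)      # about 54.15 (Pearson chi-square)
    print(df)        # 1
    print(p_value)   # essentially 0, so H0 of no relationship is rejected
    print(expected)  # approximately [[53.5, 62.5], [78.5, 91.5]]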
