Notes12 1student
In chapters 9, 10 and 11, we have been doing inference for categorical (count) data, but only for variables with 2
categories (success, failure). In this section we will be considering univariate categorical variables with 2 or
more categories, such as “ethnic group” or “college major.”
At one time, the intended distribution of colors in a bag of regular M&M’s was:
10% blue, 10% brown, 10% green, 10% orange, 20% red, 20% yellow, and 20% purple
Suppose that we opened a random sample of bags and observed the counts below. Does the data indicate that
the stated distribution has changed?
Because sample size is an important consideration, we consider the expected counts in each category instead of
the proportions. For example, if 10% of the M&M’s are supposed to be blue and we have a total of 250
M&M’s, then we would expect 25 to be blue.
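As a minimal sketch of the expected-count arithmetic above, each expected count is the sample size times the stated proportion. The names `stated` and `expected_counts` are illustrative and do not come from the notes.

```python
# Stated color distribution from the notes (proportions sum to 1).
stated = {
    "blue": 0.10, "brown": 0.10, "green": 0.10, "orange": 0.10,
    "red": 0.20, "yellow": 0.20, "purple": 0.20,
}

def expected_counts(proportions, n):
    """Expected count in each category = sample size * stated proportion."""
    return {color: n * p for color, p in proportions.items()}

# 250 M&M's in total, as in the example above.
expected = expected_counts(stated, 250)
for color, count in expected.items():
    print(f"{color:>6}: {count:.0f}")
```

With 250 M&M's, the 10% colors get an expected count of 25 and the 20% colors get 50.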
At first glance, it appears that the distribution of M&M colors has changed since the observed counts are
different than the expected counts. However, it is possible that the distribution hasn’t changed and that the
differences we see are due to sampling variability.
How likely is it to get differences like these due to sampling variability? We first need a way to measure how
different the observed counts are from the expected counts.
$$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
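The statistic can be sketched in a few lines of Python. The observed table is not reproduced in these notes, so the `observed` counts below are hypothetical values invented for illustration. Only the blue count of 28 appears to come from the worked example. The function name `chi_square_statistic` is also an assumption.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Expected counts for 250 M&M's under the stated distribution,
# in the order blue, brown, green, orange, red, yellow, purple.
expected = [25, 25, 25, 25, 50, 50, 50]

# HYPOTHETICAL observed counts (they sum to 250); not the data from the notes.
observed = [28, 22, 27, 23, 48, 54, 48]

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))  # 1.52
```

Larger values of the statistic mean the observed counts are further from the expected counts. A value of 0 would mean they match exactly.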
1. At first glance, it appears that the distribution of M&M colors has changed since the observed counts are
different than the expected counts. However, it is possible that the distribution hasn’t changed and that the
differences we see are due to sampling variability. To decide, I will use a χ² Goodness of Fit test (α = 0.05).
3. Conditions:
a. Random sample of M&M’s? Must assume the bags represent a random sample of M&M’s.
b. Large sample size? Check: all expected cell counts are ≥ 5. Refer to the table of expected counts.
Note: If this condition is not met, you should combine categories until all are ≥ 5. For example
you could combine the categories blue, brown, green, and orange into one category called
“other.”
c. Sample < 10% of population? Yes, assuming > 2500 M&M’s.
4. $\chi^2 = \frac{(28 - 25)^2}{25} + \cdots =$
df =
P-value =
5. Since P-value > α, we fail to reject the null hypothesis and cannot conclude that the distribution of colors
has changed.
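The conditions check, calculation, and decision steps above can be sketched end to end in standard-library Python. This is an illustration under stated assumptions, not the notes' own method. The observed counts are hypothetical (the original table is not reproduced here), the function names are invented, and the P-value helper uses a closed form that is valid only when the degrees of freedom are even, which holds here because 7 categories give df = 6.

```python
import math

def chi_square_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom >= x), for even df only.

    Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles only positive even df")
    half = x / 2
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)

def goodness_of_fit(observed, expected, alpha=0.05):
    """Return (chi2, df, p_value, reject_null) for a goodness-of-fit test."""
    # Large-sample condition: every expected count must be at least 5.
    if min(expected) < 5:
        raise ValueError("combine categories until all expected counts are >= 5")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1
    p_value = chi_square_sf_even_df(chi2, df)
    return chi2, df, p_value, p_value < alpha

# Expected counts for 250 M&M's under the stated distribution.
expected = [25, 25, 25, 25, 50, 50, 50]
# HYPOTHETICAL observed counts, for illustration only.
observed = [28, 22, 27, 23, 48, 54, 48]

chi2, df, p_value, reject = goodness_of_fit(observed, expected)
print(f"chi2 = {chi2:.2f}, df = {df}, P-value = {p_value:.3f}, reject H0: {reject}")
```

For these made-up counts the P-value is well above 0.05, so the sketch reaches the same kind of conclusion as step 5: fail to reject the null hypothesis. For odd degrees of freedom, a statistics library or a chi-square table would be needed instead of this closed form.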
Does your zodiac sign determine how successful you will be in later life? Fortune Magazine collected data on
the zodiac signs of 256 heads of the 400 largest companies. Test this hypothesis using Fortune's results: