
3/5/2020

Chi-Square Testing
The Analysis of Categorical Data and Goodness of Fit Tests

Suppose we wanted to determine whether the proportions of the different colors in a large bag of M&M candies match the proportions that the company claims are in its candies.

The company claims the following distribution:

p_blue = .24; p_brown = .13; p_yellow = .14; p_red = .13; p_orange = .20; p_green = .16

We could record the color of each candy in the bag. There are six colors, so k = 6.

M&M Candies Continued . . .

We could count how many candies of each color are in the bag.

            Red     Blue    Green   Yellow  Orange  Brown
Observed    23      28      21      19      22      25
Expected    17.94   33.12   22.08   19.32   27.6    17.94

χ² (Chi-square) test

• Used to test the counts of categorical data
• Three types
  – Goodness of fit (univariate)
  – Homogeneity (univariate with two or more samples)
  – Independence (bivariate)
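The second row of numbers in the M&M table can be reproduced by multiplying the total number of candies by each claimed proportion. The following is a minimal sketch of that arithmetic; the variable names (`claimed`, `observed`, `expected`) are illustrative and not part of the original slides.

```python
# Claimed color proportions from the company (taken from the slide).
claimed = {
    "Red": 0.13, "Blue": 0.24, "Green": 0.16,
    "Yellow": 0.14, "Orange": 0.20, "Brown": 0.13,
}
# Observed counts from the bag (taken from the slide).
observed = {
    "Red": 23, "Blue": 28, "Green": 21,
    "Yellow": 19, "Orange": 22, "Brown": 25,
}

n = sum(observed.values())  # total number of candies counted

# Expected count for each color = n * claimed proportion.
expected = {color: n * p for color, p in claimed.items()}

for color in observed:
    print(f"{color:7s} observed={observed[color]:3d} expected={expected[color]:.2f}")
```

Note that the expected counts do not need to be whole numbers.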

Facts About χ² distributions

• Different df have different curves
• χ² curves are skewed right
• As df increases, the χ² curve shifts toward the right and becomes more like a normal curve
• χ² values are always non-negative

Goodness-of-Fit Test Procedure

Null Hypothesis:
H0: p1 = hypothesized proportion for Category 1
    ...
    pk = hypothesized proportion for Category k
Ha: at least one proportion is different

Test Statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$
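As an illustration of the test statistic, here is a short sketch that applies it to the M&M counts from the earlier slide. The helper name `chi_square_statistic` is hypothetical, and the slides themselves do not report a χ² value for the M&M data, so none is quoted here.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# M&M counts in the order Red, Blue, Green, Yellow, Orange, Brown.
observed = [23, 28, 21, 19, 22, 25]
claimed = [0.13, 0.24, 0.16, 0.14, 0.20, 0.13]

n = sum(observed)
expected = [n * p for p in claimed]  # expected count = n * hypothesized proportion

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1  # goodness of fit uses k - 1 degrees of freedom

print(f"chi-square = {chi2:.2f}, df = {df}")
```

Each term is non-negative, which is why χ² values can never be below zero.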


Goodness-of-Fit Test Procedure Continued . . .

Assumptions:
1) Observed cell counts are based on a random sample
2) The sample size is large enough as long as every expected cell count is at least 5
3) Sample is less than 10% of the population

A company says its premium mixture of nuts contains 10% Brazil nuts, 20% cashews, 20% almonds, 10% hazelnuts and 40% peanuts. You buy a large can and separate the nuts. Upon weighing them, you find there are 112 g of Brazil nuts, 183 g of cashews, 207 g of almonds, 71 g of hazelnuts, and 446 g of peanuts. You wonder whether your mix is significantly different from what the company advertises.

Why is the chi-square goodness-of-fit test NOT appropriate here?
What might you do instead of weighing the nuts in order to use chi-square?

Are my “crazy” dice considered “fair”?

Offspring of certain fruit flies may have yellow or ebony bodies and normal wings or short wings. Genetic theory predicts that these traits will appear in the ratio 9:3:3:1 (yellow & normal, yellow & short, ebony & normal, ebony & short). A researcher checks 100 such flies and finds the distribution of traits to be 59, 20, 11, and 10, respectively.

Do the data provide convincing evidence that the proportions of fruit flies with the different traits differ from those predicted by the genetic theory?

Assumptions:
• Have a random sample of fruit flies
• All expected counts are greater than 5.
  Expected counts: Y & N = 56.25, Y & S = 18.75, E & N = 18.75, E & S = 6.25
• There are at least 1000 of each kind of fly (so the sample of 100 is less than 10% of the population)

p1 = the proportion of fruit flies with yellow bodies and normal wings
p2 = the proportion of fruit flies with yellow bodies and short wings
p3 = the proportion of fruit flies with ebony bodies and normal wings
p4 = the proportion of fruit flies with ebony bodies and short wings

H0: p1 = 9/16, p2 = 3/16, p3 = 3/16, p4 = 1/16
Ha: At least one of the proportions of fruit flies is different.

$$\chi^2 = \frac{(59 - 56.25)^2}{56.25} + \frac{(20 - 18.75)^2}{18.75} + \cdots + \frac{(10 - 6.25)^2}{6.25} = 5.671$$

df = 4 − 1 = 3
P-value = χ²cdf(5.671, 10^99, 3) = .129
α = .05

Since P-value > α, I fail to reject H0. There is not sufficient evidence to suggest that the proportions of fruit flies with the different traits differ from those predicted by the genetic theory.
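The fruit fly calculation can be checked without a calculator. The sketch below uses only the Python standard library; `chi2_sf` is a hypothetical helper (not from the slides) that plays the role of the calculator's χ²cdf(x, 10^99, df) call for whole-number df by using a standard recurrence for the upper-tail probability.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1.

    Starts from the closed form for df = 1 or df = 2 and raises df in steps
    of 2 with Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1).
    """
    y = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))  # closed form for df = 1
    else:
        a, q = 1.0, math.exp(-y)             # closed form for df = 2
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

observed = [59, 20, 11, 10]   # Y&N, Y&S, E&N, E&S
ratio = [9, 3, 3, 1]          # ratio predicted by the genetic theory

n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi2, df)
alpha = 0.05

print(f"expected = {expected}")
print(f"chi-square = {chi2:.3f}, df = {df}, P-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The decision on the last line follows the same rule as the slide: reject H0 only when the P-value is below α.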


A study was conducted to determine if collegiate soccer players had an increased risk of concussions over other athletes or students. The two-way frequency table below displays the number of previous concussions for students in independently selected random samples of 91 soccer players, 96 non-soccer athletes, and 53 non-athletes.

                      Number of Concussions
                      0      1      2      3 or more   Total
Soccer Players        45     25     11     10          91
Non-Soccer Players    68     15     8      5           96
Non-Athletes          45     5      3      0           53
Total                 158    45     22     15          240

χ² Test for Homogeneity

Null Hypothesis:
H0: the true category proportions are the same for all the populations or treatments

Alternative Hypothesis:
Ha: at least one proportion is different

Test Statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

χ² Test for Homogeneity Continued . . .

Assumptions:
1) Data are from independently chosen random samples or from subjects who were assigned at random to treatment groups.
2) The sample size is large: all expected cell counts are at least 5. (If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.)
3) Sample is less than 10% of the population

Expected Counts: (assuming H0 is true)

$$\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{grand total}}$$

Degrees of Freedom:

$$df = (\text{rows} - 1)(\text{columns} - 1)$$
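To make the expected-count and degrees-of-freedom formulas concrete, here is a small sketch applied to the soccer concussion table. The function name `expected_counts` and the list-of-rows layout are assumptions made for illustration.

```python
def expected_counts(table):
    """Expected count per cell = (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: soccer players, non-soccer athletes, non-athletes.
# Columns: 0, 1, 2, 3 or more concussions.
observed = [
    [45, 25, 11, 10],
    [68, 15, 8, 5],
    [45, 5, 3, 0],
]

expected = expected_counts(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Flag cells that violate the "expected count at least 5" condition.
small = [(i, j, round(e, 1))
         for i, row in enumerate(expected)
         for j, e in enumerate(row) if e < 5]

for row in expected:
    print([round(e, 1) for e in row])
print(f"df = {df}, cells with expected count below 5: {small}")
```

Any cell listed in `small` signals that rows or columns may need to be combined before the test is carried out.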

Soccer Players Continued . . .

Observed counts, with expected counts in parentheses:

                      Number of Concussions
                      0            1            2           3 or more   Total
Soccer Players        45 (59.9)    25 (17.1)    11 (8.3)    10 (5.7)    91
Non-Soccer Players    68 (63.2)    15 (18.0)    8 (8.8)     5 (6.0)     96
Non-Athletes          45 (34.9)    5 (10.0)     3 (4.9)     0 (3.3)     53
Total                 158          45           22          15          240

H0: Proportions in each response category (number of concussions) are the same for all three groups
Ha: At least one of the category proportions is different

df = (2)(3) = 6

Two expected counts in the Non-Athletes row (4.9 and 3.3) are less than 5, so the "2" and "3 or more" columns are combined:

                      Number of Concussions
                      0            1            2 or more   Total
Soccer Players        45 (59.9)    25 (17.1)    21 (14.0)   91
Non-Soccer Players    68 (63.2)    15 (18.0)    13 (14.8)   96
Non-Athletes          45 (34.9)    5 (10.0)     3 (8.2)     53
Total                 158          45           37          240

df = (2)(2) = 4

Test Statistic:

$$\chi^2 = \frac{(45 - 59.9)^2}{59.9} + \cdots + \frac{(3 - 8.2)^2}{8.2} = 20.6$$

P-value = .00038    α = .05
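The column-combining step and the resulting test can be sketched as follows. This is an illustrative standard-library version, not the method used to produce the slides; `chi2_sf` and `chi_square_test` are hypothetical helper names.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1 (recurrence on df)."""
    y = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))  # closed form for df = 1
    else:
        a, q = 1.0, math.exp(-y)             # closed form for df = 2
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi_square_test(table):
    """Return (chi-square statistic, df, P-value) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, chi2_sf(chi2, df)

observed = [
    [45, 25, 11, 10],   # soccer players
    [68, 15, 8, 5],     # non-soccer athletes
    [45, 5, 3, 0],      # non-athletes
]

# Merge the "2" and "3 or more" columns into a single "2 or more" column.
combined = [row[:2] + [row[2] + row[3]] for row in observed]

chi2, df, p_value = chi_square_test(combined)
print(f"chi-square = {chi2:.1f}, df = {df}, P-value = {p_value:.5f}")
```

Because the code keeps the expected counts unrounded, its result may differ in later decimal places from a hand calculation that uses the rounded values in the table.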


Soccer Players Continued . . .

                      Number of Concussions
                      0            1            2 or more   Total
Soccer Players        45 (59.9)    25 (17.1)    21 (14.0)   91
Non-Soccer Players    68 (63.2)    15 (18.0)    13 (14.8)   96
Non-Athletes          45 (34.9)    5 (10.0)     3 (8.2)     53
Total                 158          45           37          240

Since the P-value < α, we reject H0. There is strong evidence to suggest that the category proportions for the number of concussions are not the same for the 3 groups.

χ² Test for Independence

Null Hypothesis:
H0: The two variables are independent

Alternative Hypothesis:
Ha: The two variables are not independent

Test Statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

χ² Test for Independence Continued . . .

Assumptions:
1) The observed counts are based on data from a random sample.
2) The sample size is large: all expected cell counts are at least 5. (If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.)
3) Sample is less than 10% of the population

Expected Counts: (assuming H0 is true)

$$\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{grand total}}$$

Degrees of Freedom:

$$df = (\text{rows} - 1)(\text{columns} - 1)$$

The paper “Contemporary College Students and Body Piercing” (Journal of Adolescent Health, 2004) described a survey of 490 undergraduate students at a state university in the southwestern region of the United States. Each student in the sample was classified according to class standing (freshman, sophomore, junior, senior) and body art category (body piercing only, tattoos only, both tattoos and body piercing, no body art). Is there evidence that there is an association between class standing and response to the body art question? Use α = .01.

             Body Piercing   Tattoos      Both Body Piercing   No Body
             Only            Only         and Tattoos          Art
Freshman     61              7            14                   86
Sophomore    43              11           10                   64
Junior       20              9            7                    43
Senior       21              17           23                   54

Body Art Continued . . .

Observed counts, with expected counts in parentheses:

             Body Piercing   Tattoos      Both Body Piercing   No Body
             Only            Only         and Tattoos          Art
Freshman     61 (49.7)       7 (15.1)     14 (18.5)            86 (84.7)
Sophomore    43 (37.9)       11 (11.5)    10 (14.1)            64 (64.5)
Junior       20 (23.4)       9 (7.1)      7 (8.7)              43 (39.8)
Senior       21 (34.0)       17 (10.3)    23 (12.7)            54 (58.0)

H0: class standing and body art category are independent
Ha: class standing and body art category are not independent

df = 9


Body Art Continued . . .

             Body Piercing   Tattoos      Both Body Piercing   No Body
             Only            Only         and Tattoos          Art
Freshman     61 (49.7)       7 (15.1)     14 (18.5)            86 (84.7)
Sophomore    43 (37.9)       11 (11.5)    10 (14.1)            64 (64.5)
Junior       20 (23.4)       9 (7.1)      7 (8.7)              43 (39.8)
Senior       21 (34.0)       17 (10.3)    23 (12.7)            54 (58.0)

Assumptions:
1) Have a random sample of undergraduate students.
2) All expected counts are at least 5.
3) There are at least 4900 undergraduates at the college

Test Statistic:

$$\chi^2 = \frac{(61 - 49.7)^2}{49.7} + \cdots + \frac{(54 - 58.0)^2}{58.0} = 29.48$$

P-value = .00054    α = .01
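The body art test can be reproduced with a short script. This is a sketch under stated assumptions: it uses only the standard library, `chi2_sf` is a hypothetical helper for the upper-tail probability, and the expected counts are kept unrounded, so the statistic comes out near 29.5 instead of matching the slide's 29.48 to the last digit.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1 (recurrence on df)."""
    y = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))  # closed form for df = 1
    else:
        a, q = 1.0, math.exp(-y)             # closed form for df = 2
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Rows: freshman, sophomore, junior, senior.
# Columns: piercing only, tattoos only, both, no body art.
observed = [
    [61, 7, 14, 86],
    [43, 11, 10, 64],
    [20, 9, 7, 43],
    [21, 17, 23, 54],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]

# Condition check: every expected count should be at least 5.
all_large_enough = all(e >= 5 for row in expected for e in row)

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi2_sf(chi2, df)
alpha = 0.01

print(f"n = {n}, all expected counts >= 5: {all_large_enough}")
print(f"chi-square = {chi2:.2f}, df = {df}, P-value = {p_value:.5f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```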

Body Art Continued . . .

             Body Piercing   Tattoos      Both Body Piercing   No Body
             Only            Only         and Tattoos          Art
Freshman     61 (49.7)       7 (15.1)     14 (18.5)            86 (84.7)
Sophomore    43 (37.9)       11 (11.5)    10 (14.1)            64 (64.5)
Junior       20 (23.4)       9 (7.1)      7 (8.7)              43 (39.8)
Senior       21 (34.0)       17 (10.3)    23 (12.7)            54 (58.0)

Since the P-value < α, we reject H0. There is sufficient evidence to suggest that class standing and the body art category are not independent.
