Statistics Unlocking the Power of Data

1st Edition Lock Solutions Manual


CHAPTER 7 347

Section 7.1 Solutions

7.1 The expected count in each category is n · pi = 500(0.25) = 125. See the table.

Category 1 2 3 4
Expected count 125 125 125 125

7.2 Since the categories are equally likely and there are three of them, the null proportion for each is pi = 1/3.
The expected count in each category is n · pi = 1200(1/3) = 400. See the table.

Category A B C
Expected count 400 400 400

7.3 The expected count in category A is n · pA = 200(0.50) = 100. The expected count in category B is
n · pB = 200(0.25) = 50. The expected count in category C is n · pC = 200(0.25) = 50. See the table.

Category A B C
Expected count 100 50 50

7.4 The expected count in category 1 is n · p1 = 400(0.7) = 280. The expected count in category 2 is
n · p2 = 400(0.1) = 40. The expected count in category 3 is n · p3 = 400(0.1) = 40. The expected count in
category 4 is n · p4 = 400(0.1) = 40. See the table.

Category 1 2 3 4
Expected count 280 40 40 40
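
The expected counts in Exercises 7.1 to 7.4 all follow the same rule, n · pi. A minimal Python sketch of that rule is given here; the function name expected_counts and the sum-to-one check are illustrative assumptions, not part of the text.

```python
def expected_counts(n, proportions):
    """Return the expected count n * p_i for each null proportion p_i."""
    # Guard against null proportions that do not form a distribution.
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("null proportions must sum to 1")
    return [n * p for p in proportions]


# Exercise 7.3: n = 200 with null proportions 0.50, 0.25, 0.25
ex_7_3 = expected_counts(200, [0.50, 0.25, 0.25])
# Exercise 7.4: n = 400 with null proportions 0.7, 0.1, 0.1, 0.1
ex_7_4 = expected_counts(400, [0.7, 0.1, 0.1, 0.1])
print(ex_7_3)
print(ex_7_4)
```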

7.5 We calculate the chi-square statistic using the observed and expected counts.
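\[
\begin{aligned}
\chi^2 &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(35-40)^2}{40} + \frac{(32-40)^2}{40} + \frac{(53-40)^2}{40} \\
&= 0.625 + 1.6 + 4.225 \\
&= 6.45
\end{aligned}
\]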

There are three categories (A, B, and C) for this categorical variable, so we use a chi-square distribution
with degrees of freedom equal to 2. The p-value is the area in the upper tail, which we see is 0.0398.
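
The calculation in 7.5 can be checked with a short Python sketch. The helper name chi_square_statistic is an assumption for illustration. The p-value line relies on the fact that, for 2 degrees of freedom only, the upper-tail area of a chi-square distribution beyond x is exp(−x/2); other degrees of freedom need a different formula or statistical software.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [35, 32, 53]   # counts for categories A, B, C in Exercise 7.5
expected = [40, 40, 40]   # expected counts under the null hypothesis

stat = chi_square_statistic(observed, expected)

# Upper-tail area; this closed form is valid for df = 2 only.
p_value = math.exp(-stat / 2)

print(round(stat, 2), round(p_value, 4))
```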

7.6 We calculate the chi-square statistic using the observed and expected counts.
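\[
\begin{aligned}
\chi^2 &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(61-50)^2}{50} + \frac{(35-50)^2}{50} + \frac{(54-50)^2}{50} \\
&= 2.42 + 4.5 + 0.32 \\
&= 7.24
\end{aligned}
\]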

There are three categories (A, B, and C) for this categorical variable, so we use a chi-square distribution
with degrees of freedom equal to 2. The p-value is the area in the upper tail, which we see is 0.0268.

7.7 We calculate the chi-square statistic using the observed and expected counts.
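\[
\begin{aligned}
\chi^2 &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(132-160)^2}{160} + \frac{(181-160)^2}{160} + \frac{(45-40)^2}{40} + \frac{(42-40)^2}{40} \\
&= 4.90 + 2.76 + 0.63 + 0.10 \\
&= 8.38
\end{aligned}
\]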

There are four categories (A, B, C, and D) for this categorical variable, so we use a chi-square distribution
with degrees of freedom equal to 3. The p-value is the area in the upper tail, which we see is 0.039.

7.8 We calculate the chi-square statistic using the observed and expected counts.
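\[
\begin{aligned}
\chi^2 &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(38-30)^2}{30} + \frac{(55-60)^2}{60} + \frac{(79-90)^2}{90} + \frac{(128-120)^2}{120} \\
&= 2.13 + 0.42 + 1.34 + 0.53 \\
&= 4.43
\end{aligned}
\]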

There are four categories (A, B, C, and D) for this categorical variable, so we use a chi-square distribution
with degrees of freedom equal to 3. The p-value is the area in the upper tail beyond 4.43, or about 0.219.

7.9 (a) The sample size is n = 160 and the hypothesized proportion is pb = 0.25, so the expected count is
n · pb = 160(0.25) = 40.
(b) For the “B” cell we have

\[
\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(36-40)^2}{40} = 0.4
\]

(c) The table has k = 4 cells, so the chi-square distribution has 4 − 1 = 3 degrees of freedom.

7.10 (a) The sample size is n = 500 and the hypothesized proportion is pb = 0.25, so the expected count
is n · pb = 500(0.25) = 125.
(b) For the “B” cell we have

\[
\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(148-125)^2}{125} = 4.232
\]

(c) The table has k = 4 cells, so the chi-square distribution has 4 − 1 = 3 degrees of freedom.

7.11 (a) Add the counts in the table to find the sample size is n = 210+732+396+125+213+324 = 2000.
The hypothesized proportion is pb = 0.35, so the expected count is n · pb = 2000(0.35) = 700.
(b) For the “B” cell we have

\[
\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(732-700)^2}{700} = 1.46
\]

(c) The table has k = 6 cells, so the chi-square distribution has 6 − 1 = 5 degrees of freedom.

7.12 (a) Add the counts in the table to find the sample size is n = 132 + 468 = 600. The hypothesized
proportion is pb = 0.8, so the expected count is n · pb = 600(0.8) = 480.
(b) For the “B” cell we have

\[
\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(468-480)^2}{480} = 0.30
\]

(c) The table has just k = 2 cells, so the chi-square distribution has 2 − 1 = 1 degree of freedom.

7.13 (a) Let pg , po , pp , pr , and py be the proportion of people who choose each of the respective flavors.
If all flavors are equally popular (1/5 each) the hypotheses are

H0 : pg = po = pp = pr = py = 0.2
Ha : Some pi ≠ 0.2

(b) If they were equally popular we would have 66(1/5) = 13.2 people in each category.
(c) Since we have 5 categories we have 4 degrees of freedom.
(d) We calculate the test statistic
\[
\begin{aligned}
\chi^2 &= \frac{(18-13.2)^2}{13.2} + \frac{(9-13.2)^2}{13.2} + \frac{(15-13.2)^2}{13.2} + \frac{(13-13.2)^2}{13.2} + \frac{(11-13.2)^2}{13.2} \\
&= 1.75 + 1.34 + 0.25 + 0.00 + 0.37 \\
&= 3.70
\end{aligned}
\]

(e) The test statistic 3.70 compared to a chi-square distribution with 4 degrees of freedom yields a p-value
of 0.449. We fail to reject the null hypothesis: the data do not provide significant evidence
that some Skittles flavors are more popular than others.

7.14 If we let pr, pp, and ps represent the proportions of rock, paper, and scissors choices, the hypotheses are

H0 : pr = pp = ps = 1/3
Ha : Some pi ≠ 1/3

The expected count is 119(1/3) = 39.7 for each cell. The chi-square statistic is

\[
\begin{aligned}
\chi^2 &= \frac{(66-39.7)^2}{39.7} + \frac{(39-39.7)^2}{39.7} + \frac{(14-39.7)^2}{39.7} \\
&= 17.4 + 0.01 + 16.6 \\
&= 34.01
\end{aligned}
\]

The test statistic, χ2 = 34.01, lies very far in the tail of a chi-square distribution with 2 degrees of freedom,
so the p-value is very close to zero. This gives strong evidence that the choices made on the first turn of a
rock-paper-scissors game are not all equally likely. Comparing the expected counts to the observed counts, it
appears that “rock” is used more often and “scissors” is less frequent than expected. Unless your opponent
has also looked at this study, it might be smart to start with paper.

7.15 (a) Since there are five groups and we are assuming the groups are equally likely, the proportion in
each group should be 1/5 or 0.2. The hypotheses for the test are:

H0 : p1 = p2 = p3 = p4 = p5 = 0.2
Ha : Some pi is not 0.2

where p1 represents the proportion of social networking site users in the 18-22 age group, p2 represents
the proportion in the 23-35 age group, and so on. We have n = 975 and pi = 0.2 for each group, so
the expected count if all age groups are equally likely is 975(0.2) = 195. The expected counts are all
195, as shown in the table.

Age              18-22   23-35   36-49   50-65   65+
Expected count   195     195     195     195     195

We calculate the chi-square statistic using the observed and expected counts.
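\[
\begin{aligned}
\chi^2 &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(156-195)^2}{195} + \frac{(312-195)^2}{195} + \frac{(253-195)^2}{195} + \frac{(195-195)^2}{195} + \frac{(59-195)^2}{195} \\
&= 7.8 + 70.2 + 17.3 + 0 + 94.9 \\
&= 190.2
\end{aligned}
\]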

There are five age groups, so we use a chi-square distribution with degrees of freedom equal to 4. The
p-value is the area in the upper tail, which we see is essentially zero. There is strong evidence that
users of social networking sites are not equally distributed among these age groups.
(b) The largest contributor to the sum for the chi-square test statistic is 94.9 from the 65+ age group,
where the observed count is significantly below the expected count. Many fewer senior citizens use a
social networking site than we would expect if users were evenly distributed by age.

7.16 The null hypothesis is that the proportion of sales for each of the cars is 1/3, while the alternative
hypothesis is that at least one of the proportions is not 1/3. Total sales of the three cars are

n = 22,274 + 21,385 + 20,808 = 64,467

so the expected count in each group is 64,467(1/3) = 21,489. If sales were exactly the same we would expect
each model to sell 21,489 cars. We calculate the chi-square test statistic
\[
\begin{aligned}
\chi^2 &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(22{,}274-21{,}489)^2}{21{,}489} + \frac{(21{,}385-21{,}489)^2}{21{,}489} + \frac{(20{,}808-21{,}489)^2}{21{,}489} \\
&= 28.68 + 0.50 + 21.58 \\
&= 50.76
\end{aligned}
\]

Compared to a chi-square distribution with 2 degrees of freedom we get a very small p-value ≈ 0. We conclude
that the sales are different amongst these three models. We can go further to look at the contributions to
the chi-square test statistic to conclude that Escapes are selling better, and Fusions are selling worse.

7.17 Let p1, p2, p3, and p4 be the proportions of hockey players born in the 1st, 2nd, 3rd, and 4th quarters of
the year, respectively. We are testing

H0 : p1 = 0.237, p2 = 0.259, p3 = 0.259 and p4 = 0.245
Ha : Some pi is not specified as in H0

The total sample size is n = 147 + 110 + 52 + 50 = 359. The expected counts are 359(0.237) = 85 for Qtr 1,
359(0.259) = 93 for Qtr 2, 359(0.259) = 93 for Qtr 3, and 359(0.245) = 88 for Qtr 4. The chi-square statistic
is
\[
\chi^2 = \frac{(147-85)^2}{85} + \frac{(110-93)^2}{93} + \frac{(52-93)^2}{93} + \frac{(50-88)^2}{88} = 82.6.
\]
We use the chi-square distribution with 4 − 1 = 3 degrees of freedom, which gives a very small p-value that
is essentially zero. This is strong evidence that the distribution of birthdates for OHL hockey players differs
significantly from the national proportions.
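
When the null proportions are not all equal, as in 7.17, the same statistic applies with a different expected count in each cell. The sketch below is an illustration under that reading, with variable names chosen here rather than taken from the text. It keeps the expected counts unrounded, so its result can differ in the last digit from a hand calculation that uses the rounded values 85, 93, 93, and 88.

```python
observed = [147, 110, 52, 50]              # births by quarter, OHL players
null_props = [0.237, 0.259, 0.259, 0.245]  # national proportions from H0

n = sum(observed)
expected = [n * p for p in null_props]     # unrounded expected counts

# Contribution of each quarter to the chi-square statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
stat = sum(contributions)
df = len(observed) - 1

for quarter, (e, c) in enumerate(zip(expected, contributions), start=1):
    print(f"Qtr {quarter}: expected {e:.1f}, contribution {c:.2f}")
print(f"chi-square = {stat:.1f} with df = {df}")
```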

7.18 The sample size is n = 196 + 162 + 137 + 122 = 617. We multiply this sample size by the national
proportion in each quarter to get the expected counts of 153.0, 154.9, 156.7, and 152.4, respectively. We see
that the actual number of athletes’ birthdates is higher than expected for the first six months (two quarters)
of the year and lower than expected for the second six months. For the hypotheses for the test, we can use
the proportions given or give a more general null and alternative hypothesis, such as:

H0 : The proportions of athletes born in each quarter match the proportions nationally
Ha : Some proportion for athletes is different from the national proportion

We calculate the chi-square statistic using the observed and expected counts.
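\[
\begin{aligned}
\chi^2 &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(196-153.0)^2}{153.0} + \frac{(162-154.9)^2}{154.9} + \frac{(137-156.7)^2}{156.7} + \frac{(122-152.4)^2}{152.4} \\
&= 12.08 + 0.33 + 2.48 + 6.06 \\
&= 20.95
\end{aligned}
\]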

There are four categories, so we use a chi-square distribution with degrees of freedom equal to 3. The p-value
is the area in the upper tail beyond 20.95, which we see is 0.0001. There is strong evidence that birthdates
of athletes do not match the national distribution of birthdates. Being born early in the year appears to
significantly increase the likelihood of growing up to play in the Australian Football League, while being
born late in the year appears to decrease the likelihood. (This same effect has been found in European soccer
and Canadian hockey.)

7.19 (a) A χ2 goodness-of-fit test was most likely done.
(b) Since the results are given as statistically significant, the χ2-statistic is likely to be large.
(c) Since the results are given as statistically significant, the p-value is likely to be small.
(d) The categorical variable appears to record the number of deaths due to medication errors in different
months at hospitals.
(e) The cell giving the number of deaths in July appears to contribute the most to the χ2-statistic.
(f) In July, the observed count is probably much higher than the expected count.

7.20 (a) Since the results are given as statistically significant, the χ2 -statistic is likely to be large.
(b) Since the results are given as statistically significant, the p-value is likely to be small.
(c) In the week before the festival, the expected count is higher than the observed count. This tells us
that some elderly people may be able to delay death.
(d) For the week before the festival, the contribution to the χ2 statistic is
\[
\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(33-50.82)^2}{50.82} = 6.249
\]
(e) In the week after the festival, the observed count is higher than the expected count. This tells us that,
although some elderly people are able to delay death, they don’t delay it for very long.
(f) For the week after the festival, the contribution to the χ2 statistic is
\[
\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(70-52)^2}{52} = 6.231
\]
(g) The control group allows us to attribute the difference specifically to the meaningful event (the Harvest
Moon Festival) since the effect was only seen in the group who found this event meaningful.
7.21 (a) There are 6 actors and we are testing for a difference in popularity. The null hypothesis is that
each of the proportions is 1/6 while the alternative hypothesis is that at least one of the proportions
is not 1/6. The sample size is 98 + 5 + 23 + 9 + 25 + 51 = 211, so the expected count for each actor is
Expected count for each actor = n · pi = 211(1/6) = 35.2
The chi-square test statistic calculated using the observed data and these expected counts is
\[
\begin{aligned}
\chi^2 &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(98-35.2)^2}{35.2} + \frac{(5-35.2)^2}{35.2} + \frac{(23-35.2)^2}{35.2} + \frac{(9-35.2)^2}{35.2} + \frac{(25-35.2)^2}{35.2} + \frac{(51-35.2)^2}{35.2} \\
&= 112.3 + 25.9 + 4.2 + 19.5 + 3.0 + 7.1 \\
&= 172.0
\end{aligned}
\]
This chi-square statistic gives a very small p-value of essentially zero when compared to a chi-square
distribution with 5 degrees of freedom. There is strong evidence of a difference in the popularity of the
James Bond actors.
(b) If we eliminate one actor, the null hypothesis is that each of the proportions is 1/5 while the alternative
hypothesis is that at least one of the proportions is not 1/5. The sample size without the 5 people who
selected George Lazenby is 98 + 23 + 9 + 25 + 51 = 206, so the expected count for each actor is
Expected count for each actor = n · pi = 206(1/5) = 41.2
The chi-square statistic calculated using the observed data (without Lazenby) and these expected
counts is
\[
\begin{aligned}
\chi^2 &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(98-41.2)^2}{41.2} + \frac{(23-41.2)^2}{41.2} + \frac{(9-41.2)^2}{41.2} + \frac{(25-41.2)^2}{41.2} + \frac{(51-41.2)^2}{41.2} \\
&= 78.3 + 8.0 + 25.2 + 6.4 + 2.3 \\
&= 120.2
\end{aligned}
\]
This is still a very large χ2-statistic and we again have a p-value of essentially zero when it is compared
to a chi-square distribution with 4 degrees of freedom. Even with the Lazenby data omitted, we still find substantial
differences in the proportions of fans who choose the different James Bond actors.
(c) No, we should not generalize the results from this online survey to a population of all movie watchers.
This poll was a volunteer poll completed by people visiting a James Bond fan site. This is definitely
not a random sample of the movie watching population, and could easily be biased. The best inference
we could hope for is to generalize to people who visit a James Bond fan website and who participate
in online polls.

7.22 To compare proportions of championships for the four teams, the null hypothesis is H0 : pl = po =
ps = pt = 0.25 and the alternative hypothesis is that at least one of the proportions is not 0.25. The sample
size is 8 + 9 + 3 + 4 = 24, so the expected count in each cell is n · pi = 24(1/4) = 6. We just barely meet the
requirements for a chi-square test (expected count ≥ 5) in this case. The chi-square statistic is
\[
\chi^2 = \frac{(8-6)^2}{6} + \frac{(9-6)^2}{6} + \frac{(3-6)^2}{6} + \frac{(4-6)^2}{6} = 0.67 + 1.5 + 1.5 + 0.67 = 4.33
\]
Compared to a chi-square distribution with 3 degrees of freedom, we get a p-value in the upper tail beyond
4.33 of 0.228. We do not have much evidence that some teams are significantly more likely to win the AL
West than others. We should be aware, however, that, although we barely met the minimum requirements, a
χ2 -test may not be the best choice since the counts are so low. We might be better off doing a randomization
test in this situation. However, the p-value is so high that the conclusion is likely to be the same.

7.23 (a) We see in the bottom row of the output that n = 436. (We could also add the observed counts.)
(b) We see in the output that the observed value for RR is 130 and the expected count is 109.
(c) The contribution to the χ2 statistic is the highest for those with the XX variant, which contributes
7.716. For this category, the observed count (80) is less than expected (109).
(d) We see in the bottom row of the output that df=2. (We could also compute df using 3 categories minus
1.)
(e) We see in the bottom row of the output that the p-value is 0.002. This is a small p-value and provides
evidence that at least one of the proportions of the three variants differs from the hypothesized values of 0.25, 0.5,
and 0.25, respectively.

7.24 (a) The null hypothesis is H0 : pR = pX = 0.5 and the alternative hypothesis is that at least one
of the proportions is not 0.5. The expected count in each category is n · pi = 436(0.5) = 218. The
chi-square statistic is

\[
\chi^2 = \frac{(244-218)^2}{218} + \frac{(192-218)^2}{218} = 3.10 + 3.10 = 6.20
\]
Using a χ2 distribution with df = 1, we obtain a p-value of 0.0128. This gives evidence at a 5% level
that these two genetic variations are not equally likely.
(b) The null hypothesis is H0 : p = 0.5 and the alternative hypothesis is Ha : p ≠ 0.5, where p represents
the proportion classified R. (Note that the test would give the same results if we used the proportion
classified X.) The sample statistic is p̂ = 244/436 = 0.5596. The test statistic is
\[
z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} = \frac{0.5596 - 0.5}{\sqrt{\dfrac{0.5(0.5)}{436}}} = 2.490
\]

Using a standard normal distribution, we see that the area above 2.490 is 0.0064. This is a two-tail
test, so the p-value is 2(0.0064) = 0.0128. This gives evidence at a 5% level that these two genetic
variations are not equally likely.
(c) The p-values are equal and the conclusions are identical (and the χ2 -statistic is the square of the
z-statistic.)
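
The agreement noted in part (c) can be checked numerically. The sketch below is a minimal illustration whose variable names are assumptions made here. It uses two standard identities: the upper-tail area of a chi-square distribution with 1 degree of freedom beyond x is erfc(√(x/2)), and the two-tail area of a standard normal beyond |z| is erfc(|z|/√2).

```python
import math

n = 436          # total sample size
count_r = 244    # number classified R
p0 = 0.5         # null proportion

# Chi-square goodness-of-fit statistic with two cells (df = 1).
expected = n * p0
chi_sq = ((count_r - expected) ** 2 / expected
          + ((n - count_r) - expected) ** 2 / expected)
p_chi = math.erfc(math.sqrt(chi_sq / 2))   # upper tail, df = 1 only

# Two-tailed z-test for a single proportion.
p_hat = count_r / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_z = math.erfc(abs(z) / math.sqrt(2))     # both tails of the normal

print(f"chi-square = {chi_sq:.3f}, z^2 = {z ** 2:.3f}")
print(f"p (chi-square) = {p_chi:.4f}, p (z-test) = {p_z:.4f}")
```

Because the two statistics are algebraically related by χ2 = z², the two p-values agree up to floating-point error.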

7.25 According to Benford’s Law the hypotheses are

H0 : p1 = 0.301, p2 = 0.176, p3 = 0.125, p4 = 0.097, p5 = 0.079,
     p6 = 0.067, p7 = 0.058, p8 = 0.051, p9 = 0.046
Ha : At least one of the proportions is different from Benford’s law
Here is a table of observed counts for the addresses and expected counts using the Benford proportions and
a sample size of 1188.
Digit 1 2 3 4 5 6 7 8 9
Observed 345 197 170 126 101 72 69 51 57
Expected 357.6 209.2 148.4 115.1 94.1 79.5 68.9 60.8 54.4
The value of the chi-square statistic is
\[
\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(345-357.6)^2}{357.6} + \cdots + \frac{(57-54.4)^2}{54.4} = 8.24
\]

We find the p-value = 0.41 using the upper tail beyond 8.24 of a chi-square distribution with 9 − 1 = 8
degrees of freedom. This is not a small p-value so we do not have convincing evidence that the first digits of
street addresses in a phone book do not follow Benford’s law. Note that this doesn’t prove Benford’s law in
this situation; we only have a lack of evidence against it.
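
The expected counts in the table for 7.25 can be reproduced with a short Python sketch. One assumption goes beyond the text: the rounded proportions in H0 are taken to come from the usual statement of Benford's law, P(first digit = d) = log10(1 + 1/d), and the unrounded values are used below. Variable names are illustrative.

```python
import math

# Observed first digits 1..9 of the street addresses in Exercise 7.25.
observed = [345, 197, 170, 126, 101, 72, 69, 51, 57]
n = sum(observed)

# Benford proportions for digits 1..9 (assumed form: log10(1 + 1/d)).
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
expected = [n * p for p in benford]

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

for digit, (o, e) in enumerate(zip(observed, expected), start=1):
    print(f"digit {digit}: observed {o}, expected {e:.1f}")
print(f"chi-square = {stat:.2f} with df = {df}")
```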

7.26 According to Benford’s Law the hypotheses are

H0 : p1 = 0.301, p2 = 0.176, p3 = 0.125, p4 = 0.097, p5 = 0.079,
     p6 = 0.067, p7 = 0.058, p8 = 0.051, p9 = 0.046
Ha : At least one of the proportions is different from Benford’s law
Here is a table of observed counts for the invoices and expected counts using the Benford proportions and a
sample size of 7273.
Digit 1 2 3 4 5 6 7 8 9
Observed 2225 1214 881 639 655 532 433 362 332
Expected 2189.4 1280.7 908.7 704.8 575.9 486.9 421.8 372.0 332.8
The value of the chi-square statistic is
\[
\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(2225-2189.4)^2}{2189.4} + \cdots + \frac{(332-332.8)^2}{332.8} = 26.66
\]

We find the p-value = 0.0008 using the upper tail beyond 26.66 of a chi-square distribution with 9 − 1 = 8
degrees of freedom. This is a very small p-value so we have strong evidence that the first digits of these
invoices do not follow Benford’s law. The biggest contributions to the chi-square statistic come from an
unusually large number of entries starting with “5” and too few with “4”. Auditors might want to look more
carefully at invoices for amounts beginning with the digit “5”.

7.27 For a pair of fair dice, the proportion for each of the possible sums are shown in the table below, along
with the observed and expected counts from the sample of 180 rolls.

Dice sum 2 3 4 5 6 7 8 9 10 11 12
Null proportion 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
Observed count 5 11 16 13 26 34 19 20 16 13 7
(Expected count) (5) (10) (15) (20) (25) (30) (25) (20) (15) (10) (5)

For the null proportions, there are 6 · 6 = 36 possible results for the two dice and only one (1,1) gives a sum
of two while there are six ways to get a sum of seven, (1,6), (2,5), (3,4), (4,3), (5,2), (6,1).

The null hypothesis is that the proportion for each sum is as shown in the table above and the alternative is
that one or more of these proportions is inaccurate. We find the expected counts in the table by multiplying
each of these proportions by the sample size of n = 180. The chi-square statistic is computed as

\[
\chi^2 = \frac{(5-5)^2}{5} + \frac{(11-10)^2}{10} + \frac{(16-15)^2}{15} + \cdots + \frac{(13-10)^2}{10} + \frac{(7-5)^2}{5} = 6.40
\]
Comparing χ2 = 6.40 to the upper-tail of a chi-square distribution with 10 degrees of freedom yields a
p-value of 0.781. We have very little to no evidence that this author can roll some numbers more often than
should naturally appear by random chance.
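
The null proportions for the dice sums in 7.27 can be obtained by listing all 36 outcomes rather than counting them by hand. The sketch below does this with exact fractions; the variable names are illustrative assumptions, and the observed counts are copied from the table in the solution.

```python
from collections import Counter
from fractions import Fraction

# Count how many of the 36 equally likely (die1, die2) outcomes give each sum.
ways = Counter(a + b for a in range(1, 7) for b in range(1, 7))
null_props = {s: Fraction(ways[s], 36) for s in range(2, 13)}

observed = {2: 5, 3: 11, 4: 16, 5: 13, 6: 26, 7: 34,
            8: 19, 9: 20, 10: 16, 11: 13, 12: 7}
n = sum(observed.values())

# Expected count for each sum is n times its null proportion.
expected = {s: n * p for s, p in null_props.items()}

stat = float(sum((observed[s] - expected[s]) ** 2 / expected[s]
                 for s in expected))
df = len(expected) - 1

print({s: int(e) for s, e in expected.items()})
print(f"chi-square = {stat:.2f} with df = {df}")
```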

7.28 (a) To test H0 : p0 = p1 = p2 = · · · = p9 = 0.10 vs Ha : Some pi ≠ 0.10, the expected count in each
cell is n · pi = 150(0.1) = 15. Here is a table of observed counts for the digits in RND4.

Category  Observed  Test Proportion  Expected  Contribution to Chi-Sq
0         12        0.1              15        0.60000
1         14        0.1              15        0.06667
2         16        0.1              15        0.06667
3         13        0.1              15        0.26667
4         22        0.1              15        3.26667
5         10        0.1              15        1.66667
6         27        0.1              15        9.60000
7         14        0.1              15        0.06667
8         10        0.1              15        1.66667
9         12        0.1              15        0.60000

N    DF  Chi-Sq   P-Value
150  9   17.8667  0.037

The contribution to the chi-square statistic for each cell is the value of (observed − expected)^2 / expected. The
sum of these values gives the chi-square statistic

χ2 = 0.6000 + 0.0667 + 0.0667 + · · · + 0.6000 = 17.87

We compare this to the upper tail of a chi-square distribution with 10 − 1 = 9 degrees of freedom to
get a p-value of 0.037. This is a small p-value, providing evidence that the last digits are not randomly
distributed.

(b) To test the first digits with H0 : p1 = p2 = · · · = p9 = 1/9 vs Ha : Some pi ≠ 1/9, the expected
count in each cell is n · pi = 150 · 1/9 = 16.67. Here is a table of observed counts for the digits in RND1.

Category  Observed  Test Proportion  Expected  Contribution to Chi-Sq
1         44        0.111111         16.6667   44.8267
2         23        0.111111         16.6667   2.4067
3         16        0.111111         16.6667   0.0267
4         14        0.111111         16.6667   0.4267
5         13        0.111111         16.6667   0.8067
6         10        0.111111         16.6667   2.6667
7         14        0.111111         16.6667   0.4267
8         11        0.111111         16.6667   1.9267
9         5         0.111111         16.6667   8.1667

N    DF  Chi-Sq  P-Value
150  8   61.68   0.000

The value of the chi-square statistic is χ2 = 44.83 + 2.41 + · · · + 8.17 = 61.68. We compare this to the
upper tail of a chi-square distribution with 9 − 1 = 8 degrees of freedom to get a p-value ≈ 0. This
provides very strong evidence that the first digits of these numbers are not chosen at random. Looking
at the observed and expected counts we see that there are far more numbers starting with “1” than
we would expect by random chance.
7.29 The hypotheses are
H0 : p0 = p1 = p2 = · · · = p9 = 0.10
Ha : Some pi ≠ 0.10

Here are the observed and expected counts for the digits in SSN 8.
Digit 0 1 2 3 4 5 6 7 8 9
Observed 13 14 16 13 14 15 17 15 15 18
Expected 15 15 15 15 15 15 15 15 15 15
Using technology we obtain the following output for this test
X-squared = 1.6, df = 9, p-value = 0.9963
This is a very large p-value so we have no substantial evidence that the eighth digits of social security
numbers are not random.

Here are the observed and expected counts for the digits in SSN 9, the last digit.
Digit 0 1 2 3 4 5 6 7 8 9
Observed 16 12 17 15 12 10 15 27 15 11
Expected 15 15 15 15 15 15 15 15 15 15
Using technology we obtain the following output for this test
X-squared = 13.8667, df = 9, p-value = 0.1271
This is not a small p-value so we lack sufficient evidence to conclude the last digits of social security numbers
are not random.
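
The technology output quoted in 7.29 can be approximated without a statistics package. The sketch below is one possible way to do it, not the software the solution used. The helper chi2_upper_tail is a hypothetical name; it builds the upper-tail area for integer degrees of freedom from the recurrence Q(df) = Q(df − 2) + (x/2)^(df/2 − 1) e^(−x/2) / Γ(df/2), starting from Q(1) = erfc(√(x/2)) or Q(2) = e^(−x/2).

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail area beyond x for a chi-square with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1   # df = 1 base case
    else:
        tail, k = math.exp(-half), 2              # df = 2 base case
    while k < df:                                 # step up two df at a time
        k += 2
        tail += half ** (k / 2 - 1) * math.exp(-half) / math.gamma(k / 2)
    return tail


def equal_proportions_test(observed):
    """Goodness-of-fit test of equal proportions across all categories."""
    n = sum(observed)
    expected = n / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1
    return stat, df, chi2_upper_tail(stat, df)


ssn8 = [13, 14, 16, 13, 14, 15, 17, 15, 15, 18]   # eighth digits 0..9
ssn9 = [16, 12, 17, 15, 12, 10, 15, 27, 15, 11]   # ninth (last) digits 0..9

stat8, df8, p8 = equal_proportions_test(ssn8)
stat9, df9, p9 = equal_proportions_test(ssn9)
print(f"SSN8: X-squared = {stat8:.4f}, df = {df8}, p-value = {p8:.4f}")
print(f"SSN9: X-squared = {stat9:.4f}, df = {df9}, p-value = {p9:.4f}")
```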

Section 7.2 Solutions

7.30 For the (Group 3, Yes) cell we have


\[
\text{Expected count} = \frac{\text{Group 3 row total} \cdot \text{Yes column total}}{n} = \frac{100 \cdot 260}{400} = 65
\]
The contribution to the χ2-statistic from the (Group 3, Yes) cell is
\[
\frac{(72-65)^2}{65} = 0.754.
\]
7.31 For the (B,E) cell we have
\[
\text{Expected count} = \frac{\text{B row total} \cdot \text{E column total}}{n} = \frac{330 \cdot 160}{600} = 88
\]
The contribution to the χ2-statistic from the (B,E) cell is
\[
\frac{(89-88)^2}{88} = 0.011.
\]
7.32 We need to find the row total for Control, 40 + 50 + 5 + 15 + 10 = 120, and the column total for
Disagree, 15 + 5 = 20. Adding the counts in all the cells shows that the overall sample size is n = 240. For
the (Control, Disagree) cell we have
\[
\text{Expected count} = \frac{120 \cdot 20}{240} = 10
\]
The contribution to the χ2-statistic from the (Control, Disagree) cell is
\[
\frac{(15-10)^2}{10} = 2.5.
\]
7.33 We need to find a row total for Group 2, 1180 + 320 = 1500, and the column total for No, 280 + 320 = 600.
Since the row total for Group 1 is 720 + 280 = 1000, the overall sample size is n = 1000 + 1500 = 2500. For
the (Group 2, No) cell we have
\[
\text{Expected count} = \frac{1500 \cdot 600}{2500} = 360
\]
The contribution to the χ2-statistic from the (Group 2, No) cell is
\[
\frac{(320-360)^2}{360} = 4.44.
\]
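
The rule used in 7.30 to 7.33, expected count = (row total · column total)/n, can be written as a small Python sketch. The function name expected_count is an assumption, and so is the column label "Yes": the solution names only the "No" column, so the other label is a guess for illustration. The cell counts are the ones quoted in 7.33.

```python
# Two-way table from Exercise 7.33; the "Yes" label is assumed.
table = {
    "Group 1": {"Yes": 720, "No": 280},
    "Group 2": {"Yes": 1180, "No": 320},
}


def expected_count(table, row, col):
    """Expected count for one cell: row total * column total / n."""
    n = sum(sum(cells.values()) for cells in table.values())
    row_total = sum(table[row].values())
    col_total = sum(cells[col] for cells in table.values())
    return row_total * col_total / n


expected_group2_no = expected_count(table, "Group 2", "No")
observed_group2_no = table["Group 2"]["No"]
contribution = (observed_group2_no - expected_group2_no) ** 2 / expected_group2_no

print(expected_group2_no, round(contribution, 2))
```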
7.34 This is a 3 × 2 table so we have (3 − 1) · (2 − 1) = 2 degrees of freedom. Also, if we eliminate the last
row and last column (ignoring the totals) there are 2 cells remaining.
7.35 This is a 3 × 4 table so we have (3 − 1) · (4 − 1) = 6 degrees of freedom. Also, if we eliminate the last
row and last column (ignoring the totals) there are six cells remaining.
7.36 This is a 2 × 5 table so we have (2 − 1) · (5 − 1) = 4 degrees of freedom. Also, if we eliminate the last
row and last column there are four cells remaining.
7.37 This is a 2 × 2 table so we have (2 − 1) · (2 − 1) = 1 degree of freedom. If the row and column totals
are known, we only need the value for any one cell to be able to fill in the rest of the table.
7.38 The hypotheses for testing an association between these two categorical variables are

H0 : Award preference is not related to Gender


Ha : Award preference is related to Gender

The table below shows the observed and expected counts for each cell. For example, the expected count for
the (Female, Academy) cell is 31 · 169/362 = 14.5.

          Academy     Nobel       Olympic      Total
Female    20 (14.5)   76 (69.6)   73 (85.0)    169
Male      11 (16.5)   73 (79.4)   109 (97.0)   193
Total     31          149         182          362
The value of the chi-square statistic is
\[
\begin{aligned}
\chi^2 &= \frac{(20-14.5)^2}{14.5} + \frac{(76-69.6)^2}{69.6} + \frac{(73-85.0)^2}{85.0} \\
&\quad + \frac{(11-16.5)^2}{16.5} + \frac{(73-79.4)^2}{79.4} + \frac{(109-97.0)^2}{97.0} \\
&= 2.08 + 0.59 + 1.69 + 1.83 + 0.52 + 1.48 = 8.20
\end{aligned}
\]
Since this is a 2 × 3 table we use a chi-square distribution with (3 − 1)(2 − 1) = 2 degrees of freedom. The
area beyond χ2 = 8.20 is 0.017. This is a fairly small p-value, less than 5%, so we have fairly strong evidence
that the award preferences tend to differ between male and female students.
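
The whole test in 7.38 can be sketched in a few lines of Python. This is an illustration with names chosen here, not the method the solution prescribes. It keeps expected counts unrounded, so the statistic can differ slightly in the second decimal from the hand calculation above, which rounds each expected count to one decimal place. The p-value line uses the closed form exp(−x/2), which holds only because this table has 2 degrees of freedom.

```python
import math

# Observed counts: rows Female, Male; columns Academy, Nobel, Olympic.
observed = [
    [20, 76, 73],
    [11, 73, 109],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail area; this closed form is valid only when df == 2.
p_value = math.exp(-stat / 2)

print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.3f}")
```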
7.39 The null hypothesis is that the attitude about one true love is not related to one’s educational level and
the alternative hypothesis is that the two variables are related in some way. We compute expected counts
for all 9 cells. For example, for the (Agree, Some) cell, we have
\[
\text{Expected} = \frac{735 \cdot 668}{2625} = 187.0
\]
Computing all the expected counts in the same way, we find the expected counts shown in the table. Note
that all cell counts are large enough to use a χ2 -test.
HS Some College
Agree 263.2 187.0 284.8
Disagree 648.9 461.1 702.0
Don’t know 27.9 19.8 30.2

We compute (observed − expected)2 /expected for each cell. The results are shown in the next table.
HS Some College
Agree 37.71 0.65 27.69
Disagree 13.02 0.05 10.78
Don’t know 2.24 1.94 0.11

Adding up all of these contributions, we obtain the χ2 -statistic 93.7. This is a very large test statistic, and
the p-value from a chi-square distribution with df = 4 is essentially zero. There is very strong evidence of
an association between education level and how one feels about whether we all have one true love.
We see that the largest contribution to the χ2 -statistic is in those who agree, with more high school educated
people than expected agreeing and fewer college educated people than expected agreeing. It appears that
the greater the amount of education, the less likely a person is to agree that we each have exactly one true
love.
7.40 (a) The two-way table of penguin survival vs type of tag is shown:

            Metal   Electronic   Total
Survived    33      68           101
Died        134     121          255
Total       167     189          356

(b) The hypotheses are

H0 : Type of tag is not related to survival
Ha : Type of tag is related to survival
(c) The table below shows the expected counts, obtained for each cell by multiplying the row total by the
column total and dividing by n = 356.

Metal Electronic
Survived 47.4 53.6
Died 119.6 135.4

(d) We calculate the chi-square test statistic

\[
\begin{aligned}
\chi^2 &= \frac{(33-47.4)^2}{47.4} + \frac{(68-53.6)^2}{53.6} + \frac{(134-119.6)^2}{119.6} + \frac{(121-135.4)^2}{135.4} \\
&= 4.37 + 3.87 + 1.73 + 1.53 \\
&= 11.5
\end{aligned}
\]

(e) We compare our test statistic of 11.5 from part (d) to a chi-square distribution with 1 degree of freedom to get a
p-value of 0.0007. There is strong evidence that the type of tag and survival rate of the penguins are
related.

7.41 (a) Based on the given counts, here is the two-way table, with totals included.

               Relapse   No relapse   Total
Desipramine    10        14           24
Lithium        18        6            24
Placebo        20        4            24
Total          48        24           72

(b) The expected count for the (Desipramine, Relapse) cell is (48 · 24)/72 = 16. All expected counts are
shown in the table. Since the sample size is the same for each group, we see that the expected counts
are the same from row to row, matching the null hypothesis that the treatment drug doesn’t matter.
Since all the expected counts are greater than 5, a chi-square test is appropriate.

Relapse No relapse
Desipramine 16 8
Lithium 16 8
Placebo 16 8

(c) The null hypothesis is that the drug does not affect the likelihood of a relapse, and the alternative
hypothesis is that the drug does matter in the chances of recovery. We use the observed and expected
counts to find the χ2 statistic is

\[
\chi^2 = \frac{(10-16)^2}{16} + \frac{(14-8)^2}{8} + \frac{(18-16)^2}{16} + \frac{(6-8)^2}{8} + \frac{(20-16)^2}{16} + \frac{(4-8)^2}{8} = 10.5
\]
Using the χ2 distribution with df = 2, we find a p-value of 0.005. There is strong evidence that the
drug used is related to the likelihood of a relapse.

(d) Desipramine appears to be significantly more effective than lithium or a placebo. Yes, we can conclude
that the drug affects the likelihood of successful recovery, since the results come from a randomized
experiment.

7.42 We test H0 : Home size is not related to state vs Ha : Home size is related to state.

Since there are 30 homes from each state, the expected counts for all four “Big” cells are (30 · 23)/120 = 5.75
and the expected counts for all the “Not big” cells are (30 · 97)/120 = 24.25. Since all the expected counts
are greater than 5, we proceed with the chi-square test. The chi-square statistic is

\[
\chi^2 = \frac{(7-5.75)^2}{5.75} + \frac{(6-5.75)^2}{5.75} + \cdots + \frac{(27-24.25)^2}{24.25} = 2.31
\]
Using a χ2 distribution with df = 3, we find a p-value of 0.510. We do not find convincing evidence that
location of the house for sale (at the level of the state it’s in) is related to the house being large.

7.43 (a) The hypotheses are


H0 : Skittles choice does not depend on method of choosing (color vs flavor)
Ha : Skittles choice depends on method of choosing
(b) The expected counts under H0 are

          Green (Lime)   Orange   Purple (Grape)   Red (Strawberry)   Yellow (Lemon)
Color     13.0           10.5     14.3             19.8               8.4
Flavor    18.0           14.5     19.7             27.2               11.6

(c) All of the expected counts are larger than 5, so we can use a chi-square test.
(d) We have (5 − 1)(2 − 1) = 4 degrees of freedom.
(e) The chi-square statistic is

\[
\chi^2 = \frac{(18-13.0)^2}{13.0} + \frac{(9-10.5)^2}{10.5} + \cdots + \frac{(9-11.6)^2}{11.6} = 9.07
\]

(f) Comparing our test statistic to a chi-square distribution with 4 degrees of freedom we get a p-value of
0.059. This is right on the border, so we see weak evidence that choosing flavor vs color might affect
the choices, but not enough to reject the null hypothesis if we are using a 5% level.
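
To see how close the result in 7.43 is to the 5% cutoff, the following sketch evaluates the upper-tail area for the statistic from part (e). It assumes the closed form exp(−x/2)(1 + x/2), which holds only for 4 degrees of freedom; the variable names are illustrative.

```python
import math

chi_sq = 9.07   # test statistic from part (e)
alpha = 0.05    # significance level used in part (f)

# Closed form that holds only when df = 4:
# upper-tail area = exp(-x/2) * (1 + x/2)
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

reject_null = p_value < alpha
print(round(p_value, 3), reject_null)
```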

7.44 (a) Architects had the highest proportion of left-handed people (26/148 = 0.176); orthopedic
surgeons had the highest proportion of right-handed people (121/132 = 0.917).
(b) The null and alternative hypotheses are H0 : Handedness and career are not associated vs. Ha : Hand-
edness and career are associated. The observed and expected counts are shown in the following table.

Right Left Ambidextrous Total


Psychiatrist 101 (99.6) 10 (12.5) 7 (5.9) 118
Architect 115 (124.9) 26 (15.6) 7 (7.5) 148
Orthopedic Surgeon 121 (111.4) 5 (13.9) 6 (6.7) 132
Lawyer 83 (88.6) 16 (11.1) 6 (5.3) 105
Dentist 116 (111.4) 10 (13.9) 6 (6.7) 132
Total 536 67 32 635

The value of the chi-square statistic is

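\[
\begin{aligned}
\chi^2 &= \frac{(101-99.6)^2}{99.6} + \frac{(10-12.5)^2}{12.5} + \frac{(7-5.9)^2}{5.9} + \frac{(115-124.9)^2}{124.9} + \frac{(26-15.6)^2}{15.6} \\
&\quad + \frac{(7-7.5)^2}{7.5} + \frac{(121-111.4)^2}{111.4} + \frac{(5-13.9)^2}{13.9} + \frac{(6-6.7)^2}{6.7} + \frac{(83-88.6)^2}{88.6} \\
&\quad + \frac{(16-11.1)^2}{11.1} + \frac{(6-5.3)^2}{5.3} + \frac{(116-111.4)^2}{111.4} + \frac{(10-13.9)^2}{13.9} + \frac{(6-6.7)^2}{6.7} \\
&= 19.0.
\end{aligned}
\]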

The degrees of freedom are (5 − 1)(3 − 1) = 8, and the area above 19.0 in the chi-square distribution
gives a p-value of 0.015.
(c) At the 5% significance level we can conclude that career choice is associated with handedness, while at
the 1% level we cannot conclude that there is an association with handedness for these five professions.
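
The whole calculation in 7.44(b) can be sketched from the observed counts alone. The code below derives the row and column totals, the expected counts, the statistic, and the degrees of freedom. The helper name is hypothetical, and its formula is the finite-sum form of the upper tail that applies only when the degrees of freedom are even (here df = 8).

```python
import math

# Observed counts (Right, Left, Ambidextrous) for each profession.
observed = {
    "Psychiatrist": [101, 10, 7],
    "Architect": [115, 26, 7],
    "Orthopedic Surgeon": [121, 5, 6],
    "Lawyer": [83, 16, 6],
    "Dentist": [116, 10, 6],
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
n = sum(row_totals)

# Expected count for each cell = row total * column total / n
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(rows, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(rows) - 1) * (len(col_totals) - 1)

def upper_tail_even_df(x, df):
    """Upper-tail chi-square area; this finite sum is valid only for even df."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

p_value = upper_tail_even_df(chi_sq, df)
print(round(chi_sq, 1), df, round(p_value, 3))
```

Because the code keeps unrounded expected counts, the statistic can differ slightly from a hand calculation that uses the one-decimal values in the table.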

7.45 The hypotheses are

H0 : Age distribution is the same in 2008 and 2010


Ha : Age distribution is different between 2008 and 2010

We compute expected counts for all 8 cells. For example, for the 18-22 year olds in 2008, we have

\[
\text{Expected count for ages 18-22 in 2008} = \frac{290 \cdot 495}{1442} = 99.5.
\]
Computing all the expected counts in the same way, we find the expected counts shown in the table.

↓Age/Year→ 2008 2010


18-22 99.5 190.5
23-35 171.6 328.4
36-49 121.5 232.5
50+ 102.3 195.7

Notice that all expected counts are over 5 so we can use a chi-square distribution. We compute the contri-
bution to the χ2 -statistic, (observed − expected)2 /expected, for each cell. The results are shown in the next
table.

↓Age/Year→ 2008 2010


18-22 14.9 7.8
23-35 3.7 2.0
36-49 1.5 0.8
50+ 24.7 12.9

Adding up all of these contributions, we obtain the χ2 -statistic 68.3. This test statistic is very large. Using
a chi-square distribution with df = (4 − 1) · (2 − 1) = 3, we see that the p-value is essentially zero. We
have very strong evidence that the age distribution changed during these two years. The biggest
contribution to the chi-square statistic comes from the 50 and older group in 2008, where the number
of social network site users was below the count expected under H0.
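
The last steps of 7.45 (adding the contributions and finding the dominant cell) can be sketched as follows. The dictionary simply transcribes the table of contributions above; the variable names are illustrative.

```python
# Contribution of each (age group, year) cell, copied from the table above.
contributions = {
    ("18-22", 2008): 14.9, ("18-22", 2010): 7.8,
    ("23-35", 2008): 3.7,  ("23-35", 2010): 2.0,
    ("36-49", 2008): 1.5,  ("36-49", 2010): 0.8,
    ("50+", 2008): 24.7,   ("50+", 2010): 12.9,
}

# The chi-square statistic is the sum of the cell contributions.
chi_sq = sum(contributions.values())

# The cell with the largest contribution departs most from the null hypothesis.
largest_cell = max(contributions, key=contributions.get)

# Example expected count from the solution: 18-22 year olds in 2008.
expected_18_22_2008 = 290 * 495 / 1442

print(round(chi_sq, 1), largest_cell, round(expected_18_22_2008, 1))
```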

7.46 (a) Under a null hypothesis that age and frequency of status updates are unrelated, the expected
count for the “Every day” cell under 18-22 years old is

\[
\text{Expected count} = \frac{\text{Row Total} \cdot \text{Column Total}}{n} = \frac{136 \cdot 156}{947} = 22.4
\]

(b) For χ2 = 210.9 with (5 − 1)(4 − 1) = 12 degrees of freedom, we see that the p-value is essentially zero.
We have very strong evidence that age of users is related to the frequency of status updates.

7.47 The null hypothesis is that frequency of status updates on Facebook is not different for males and
females. The alternative hypothesis is that frequency of status updates is related to gender. We compute
expected counts for all 10 cells. For example, for the males who update their status every day, we have
\[
\text{Expected} = \frac{130 \cdot 386}{877} = 57.2.
\]
Computing all the expected counts in the same way, we find the expected counts shown in the following
table.

↓Status/Gender→ Male Female


Every day 57.2 72.8
3-5 days/week 46.2 58.8
1-2 days/week 65.6 83.4
Every few weeks 68.7 87.3
Less often 148.3 188.7

Notice that all expected counts are over 5 so we can use a chi-square distribution. We compute the contri-
bution to the χ2 -statistic, (observed − expected)2 /expected, for each cell and show the results in the next
table.

↓Status/Gender→ Male Female


Every day 4.05 3.18
3-5 days/week 0.0 0.0
1-2 days/week 0.33 0.23
Every few weeks 1.01 0.80
Less often 0.05 0.04

Adding up all of these contributions, we obtain the χ2 -statistic 9.65. Using a chi-square distribution with
df = (5 − 1) · (2 − 1) = 4, we see that the p-value is 0.047. This is very close to a 5% cutoff. At a 5%
level, there is some evidence of a possible relationship between gender and frequency of status updates on
Facebook. Females appear more likely to update their status at least once a day.

7.48 The null hypothesis is that frequency of liking content on Facebook is not different for males and
females. The alternative hypothesis is that frequency of liking is related to gender. We compute expected
counts for all 10 cells. For example, for the males who like content on Facebook every day, we have
\[
\text{Expected} = \frac{219 \cdot 386}{877} = 96.4.
\]
Computing all the expected counts in the same way, we find the expected counts shown in the following
table.

↓Liking/Gender→ Male Female


Every day 96.4 122.6
3-5 days/week 40.9 52.1
1-2 days/week 57.7 73.3
Every few weeks 37.9 48.1
Less often 153.2 194.8

Notice that all expected counts are over 5 so we can use a chi-square distribution. We compute the
contribution to the χ2 -statistic, (observed − expected)2 /expected, for each cell and show the results in the next
table.

↓Liking/Gender→ Male Female


Every day 3.90 3.07
3-5 days/week 0.09 0.07
1-2 days/week 0.34 0.26
Every few weeks 0.45 0.36
Less often 1.07 0.84

Adding up all of these contributions, we obtain the χ2 statistic 10.45. Using a chi-square distribution with
df = (5 − 1) · (2 − 1) = 4, we see that the p-value is 0.034. At a 5% level, there is evidence of an association
between gender and frequency of liking content on Facebook. Females appear more likely to like content at
least once a day. The results are significant at the 5% level but not at the 1% level.

7.49 (a) The expected count in the (Endurance, XX) cell is 34.75, and the contribution of this cell to the
chi-square statistic is 3.645. We find the expected count using

\[
\frac{\text{Endurance row total} \cdot \text{XX column total}}{\text{Sample size}} = \frac{194 \cdot 132}{737} = 34.75
\]

The contribution to the chi-square statistic is

\[
\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(46 - 34.75)^2}{34.75} = 3.642
\]

which is the same (up to round-off) as the computer output.


(b) We see in the bottom row of the computer output that “DF = 4”. Since the two-way table has 3 rows
and 3 columns, we have df = (3 − 1) · (3 − 1) = 4, as given.
(c) We see in the bottom row of the output that the chi-square test statistic is 24.805 and the p-value is
0.000. There is strong evidence that the distribution of genotypes for this gene is different between
sprinters, endurance athletes, and non-athletes.
(d) The (Sprint, XX) cell contributes the most, 9.043, to the χ2 -statistic. The observed count (6) is
substantially less than the expected count (19.16). Sprinters are not likely to have this genotype.
(e) The genotype RR is most over-represented in sprinters (53 compared to an expected count of 35.28).
The genotype XX is most over-represented in endurance athletes (46 compared to an expected count
of 34.75).
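
The round-off mentioned in 7.49(a) can be illustrated with a short sketch. It computes the contribution of the (Endurance, XX) cell twice: once with the unrounded expected count, and once with the expected count rounded to two decimals as in the hand calculation. The variable names are illustrative.

```python
row_total = 194    # endurance athletes
col_total = 132    # XX genotype
n = 737            # sample size
observed = 46      # observed count in the (Endurance, XX) cell

# Unrounded expected count and the resulting contribution.
expected = row_total * col_total / n
contribution = (observed - expected) ** 2 / expected

# Same calculation starting from the expected count rounded to two decimals.
expected_rounded = round(expected, 2)
contribution_from_rounded = (observed - expected_rounded) ** 2 / expected_rounded

print(round(contribution, 3))               # 3.645
print(round(contribution_from_rounded, 3))  # 3.642
```

Carrying full precision gives the 3.645 quoted from the computer output, while rounding the expected count first gives the 3.642 found by hand.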

7.50 (a) We see in the Total column that 194 endurance athletes were included in the study.

(b) The expected count for sprinters with the R allele is 61.70 and the contribution to the chi-square
statistic is 3.792. We find the expected count using

\[
\frac{\text{Sprinter row total} \cdot \text{R column total}}{\text{Sample size}} = \frac{107 \cdot 425}{737} = 61.70
\]

The contribution to the chi-square statistic is

\[
\frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(77 - 61.70)^2}{61.70} = 3.794
\]

which is the same (up to round-off) as the computer output.


(c) We see in the bottom row of the computer output that “DF = 2”. Since the two-way table has 3 rows
and 2 columns, we have df = (3 − 1) · (2 − 1) = 2, as given.
(d) We see in the bottom row of the output that the chi-square test statistic is 10.785 and the p-value
is 0.005. There is strong evidence that the distribution of alleles for this gene is different between
sprinters, endurance athletes, and non-athletes.
(e) The (Sprint, X) cell contributes the most, 5.166, to the χ2 -statistic. The observed count (30) is
substantially less than the expected count (45.30). Sprinters are less likely to have this X allele.
(f) The allele R is most over-represented in sprinters (77 observed compared to 61.7 expected). For
endurance athletes the most over-represented allele is X (90 observed compared to 82.1 expected).

7.51 The p-value is 0.592, which is a large p-value. The sample provides no evidence at all that genotype
distribution is different between males and females. Gender does not appear to be associated with whether
or not one has the “sprinting gene”.

7.52 (a) We would write “relapse” or “no relapse” on each card. There would be 48 relapse cards and
24 no relapse cards, for 72 cards in all. We would shuffle the cards together and then deal them into
three equal piles of 24, signifying the three different groups, desipramine, lithium, and placebo.
(b) Because the p-value is 0.005, about 5 out of 1000 randomization statistics will be greater than or equal
to the observed statistic.
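
The card-shuffling procedure in 7.52 can be simulated with a minimal sketch. It assumes 48 relapse cards and 24 no relapse cards (the totals implied by the observed counts in 7.41(c)), three piles of 24, and the observed statistic 10.5. The function names, the number of repetitions, and the seed are all illustrative choices.

```python
import random

def chi_square_for_piles(relapse_counts, pile_size, expected_relapse):
    """Chi-square statistic for a 3x2 table given the relapse count in each pile."""
    expected_no = pile_size - expected_relapse
    return sum(
        (r - expected_relapse) ** 2 / expected_relapse
        + ((pile_size - r) - expected_no) ** 2 / expected_no
        for r in relapse_counts
    )

def randomization_p_value(n_relapse=48, n_no_relapse=24, n_groups=3,
                          observed_stat=10.5, reps=10_000, seed=0):
    """Proportion of shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    cards = [1] * n_relapse + [0] * n_no_relapse   # 1 = relapse, 0 = no relapse
    pile_size = len(cards) // n_groups
    expected_relapse = n_relapse / n_groups
    hits = 0
    for _ in range(reps):
        rng.shuffle(cards)
        # Deal the shuffled deck into equal piles and count relapses in each.
        counts = [sum(cards[i * pile_size:(i + 1) * pile_size]) for i in range(n_groups)]
        if chi_square_for_piles(counts, pile_size, expected_relapse) >= observed_stat - 1e-9:
            hits += 1
    return hits / reps

print(randomization_p_value())
```

The proportion printed should be small, in rough agreement with part (b), though a randomization p-value need not match the chi-square approximation exactly.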

7.53 The null hypothesis is that the two variables are not related and the alternative hypothesis is that the two
variables are related. The output from one statistics package (Minitab) is given. Output from other
packages may look different but will give the same (or similar) chi-square statistic and p-value. We see that
the p-value is 0.004 so there is a significant association between these two variables. The largest contribution
to the chi-square statistic is from the males who do not take vitamins, and we see that males are less likely
to take vitamins than expected.

Rows: Gender   Columns: VitaminUse

            No  Occasional  Regular

Female      87          77      109
         96.20       71.07   105.73
        0.8798      0.4954   0.1009

Male        24           5       13
         14.80       10.93    16.27
        5.7189      3.2199   0.6560

Cell Contents:  Count
                Expected count
                Contribution to Chi-square

Pearson Chi-Square = 11.071, DF = 2, P-Value = 0.004
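
For readers without Minitab, the output for 7.53 can be approximately reproduced from the six observed counts. The sketch below computes each cell's expected count and contribution, then the statistic and p-value. The names are illustrative, and the p-value line uses the closed form exp(−x/2), which is valid only for df = 2.

```python
import math

columns = ["No", "Occasional", "Regular"]
observed = {"Female": [87, 77, 109], "Male": [24, 5, 13]}

row_totals = {g: sum(counts) for g, counts in observed.items()}
col_totals = [sum(c) for c in zip(*observed.values())]
n = sum(row_totals.values())

# Contribution of each cell: (observed - expected)^2 / expected
contributions = {}
for gender, counts in observed.items():
    for col, obs, col_total in zip(columns, counts, col_totals):
        exp = row_totals[gender] * col_total / n
        contributions[(gender, col)] = (obs - exp) ** 2 / exp

chi_sq = sum(contributions.values())
p_value = math.exp(-chi_sq / 2)          # closed form for df = 2 only
largest_cell = max(contributions, key=contributions.get)

print(round(chi_sq, 3), round(p_value, 3), largest_cell)
```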

7.54 The null hypothesis is that the two variables are not related and the alternative hypothesis is that the two
variables are related. The output from one statistics package (Minitab) is given. Output from other
packages may look different but will give the same (or similar) chi-square statistic and p-value. We see that
the p-value is 0.028 so there is a significant association between these two variables at the 5% level although
not at the 1% level. The largest contribution to the chi-square statistic is from the males who do not smoke,
and we see that males are less likely to be nonsmokers than expected (which means they are more likely to
smoke).

Rows: Gender   Columns: PriorSmoke

             1       2       3

Female     144      93      36
        136.07   99.67   37.27
        0.4626  0.4459  0.0431

Male        13      22       7
         20.93   15.33    5.73
        3.0066  2.8986  0.2798

Cell Contents:  Count
                Expected count
                Contribution to Chi-square

Pearson Chi-Square = 7.137, DF = 2, P-Value = 0.028
