
Chapter 13 ANALYSIS OF CATEGORICAL DATA

13.1 The null hypothesis is $H_0: p_1 = p_2 = \cdots = p_6 = \frac{1}{6}$ and the alternative hypothesis $H_1$ is that at least two proportions are different. Under $H_0$, the expected frequency of each cell is $320\left(\frac{1}{6}\right)$. The $\chi^2$ statistic for goodness-of-fit is calculated as follows:

Face Number    Observed Frequency (O)    Expected Frequency (E)    (O - E)^2 / E
1              39                        320/6                     3.852
2              63                        320/6                     1.752
3              56                        320/6                     0.133
4              67                        320/6                     3.502
5              57                        320/6                     0.252
6              38                        320/6                     4.408
Total          320                       320                       13.90 = $\chi^2$

We take $\alpha = 0.05$. For d.f. = 5, we find that $\chi^2_{0.05} = 11.07$, so that the rejection region is $R: \chi^2 \ge 11.07$. Since the observed value of $\chi^2 = 13.90$ lies in $R$, we reject $H_0$ at $\alpha = 0.05$. As such, the model of a fair die is contradicted.

13.2 Let $p_1$, $p_2$, $p_3$, and $p_4$ denote the population proportions of donors having the blood types O, A, B, and AB, respectively. The null hypothesis is $H_0: p_1 = p_2 = p_3 = p_4 = \frac{1}{4}$ and the alternative hypothesis $H_1$ is that at least two proportions are different. Under $H_0$, the expected frequency of each cell is $100\left(\frac{1}{4}\right) = 25$. The $\chi^2$ statistic for goodness-of-fit is calculated as follows:

Blood Type    Observed Frequency (O)    Expected Frequency (E)    (O - E)^2 / E
O             40                        25                        9.00
A             44                        25                        14.44
B             10                        25                        9.00
AB            6                         25                        14.44
Total         100                       100                       46.88 = $\chi^2$

We take $\alpha = 0.05$. For d.f. = 3, we find that $\chi^2_{0.05} = 7.81$, so that the rejection region is $R: \chi^2 \ge 7.81$. Since the observed value of $\chi^2 = 46.88$ lies in $R$, we reject $H_0$ at $\alpha = 0.05$. As such, the model of equal proportions of blood types is contradicted.

13.3 The null hypothesis is $H_0: p_1 = 0.4,\ p_2 = 0.4,\ p_3 = 0.1,\ p_4 = 0.1$ and the alternative hypothesis $H_1$ is that at least one of these proportions is not correct. Under $H_0$, multiplying these probabilities by $n = 100$, the expected frequencies are found to be 40, 40, 10, and 10. The $\chi^2$ statistic for goodness-of-fit is calculated as follows:

Blood Type    Observed Frequency (O)    Expected Frequency (E)    (O - E)^2 / E
O             40                        40                        0.00
A             44                        40                        0.40
B             10                        10                        0.00
AB            6                         10                        1.60
Total         100                       100                       2.00 = $\chi^2$

We take $\alpha = 0.05$. For d.f. = 3, we find that $\chi^2_{0.05} = 7.81$, so that the rejection region is $R: \chi^2 \ge 7.81$. Since the observed value of $\chi^2 = 2.00$ does not lie in $R$, we do not reject $H_0$ at $\alpha = 0.05$.

13.4 The three colors are equally popular when the population proportion of sales is $\frac{1}{3}$ for each. So, the null hypothesis is $H_0: p_1 = p_2 = p_3 = \frac{1}{3}$ and the alternative hypothesis $H_1$ is that at least two of these proportions are not equal. Under $H_0$, the expected frequency of each cell is $150\left(\frac{1}{3}\right) = 50$. The $\chi^2$ statistic for goodness-of-fit is calculated as follows:

Color              Observed Frequency (O)    Expected Frequency (E)    (O - E)^2 / E
White              63                        50                        3.38
Stainless Steel    56                        50                        0.72
Black              31                        50                        7.22
Total              150                       150                       11.32 = $\chi^2$

We take $\alpha = 0.05$. For d.f. = 2, we find that $\chi^2_{0.05} = 5.99$, so that the rejection region is $R: \chi^2 \ge 5.99$. Since the observed value of $\chi^2 = 11.32$ lies in $R$, we reject $H_0$ at $\alpha = 0.05$. The main contribution to $\chi^2$ is the sales of black, which are below the amount expected under the model of equal popularity.

13.5 Let $p_1$, $p_2$, $p_3$, and $p_4$ denote the population proportions of walnuts, hazelnuts, almonds, and pistachios, respectively. The null hypothesis is $H_0: p_1 = 0.45,\ p_2 = 0.20,\ p_3 = 0.20,\ p_4 = 0.15$ and the alternative hypothesis $H_1$ is that at least one of these proportions is not correct. Under $H_0$, multiplying these probabilities by $n = 240$, the expected frequencies are found to be 108, 48, 48, and 36. The $\chi^2$ statistic for goodness-of-fit is calculated as follows:

Nut Type      Observed Frequency (O)    Expected Frequency (E)    (O - E)^2 / E
Walnuts       95                        108                       1.565
Hazelnuts     70                        48                        10.083
Almonds       33                        48                        4.688
Pistachios    42                        36                        1.000
Total         240                       240                       17.336 = $\chi^2$

We take $\alpha = 0.025$. For d.f. = 3, we find that $\chi^2_{0.025} = 9.35$, so that the rejection region is $R: \chi^2 \ge 9.35$. Since the observed value of $\chi^2 = 17.336$ lies in $R$, we reject $H_0$ at $\alpha = 0.025$. We conclude that there is strong evidence of mislabeling.

13.6 Let $p_1$, $p_2$, and $p_3$ denote the population proportions of the categories white, pink, and red, respectively. The null hypothesis is $H_0: p_1 = 0.25,\ p_2 = 0.50,\ p_3 = 0.25$ and the alternative hypothesis $H_1$ is that at least one of these proportions is not correct. Under $H_0$, multiplying these probabilities by $n = 564$, the expected frequencies are found to be 141, 282, and 141. The $\chi^2$ statistic for goodness-of-fit is calculated as follows:

Category    Observed Frequency (O)    Expected Frequency (E)    (O - E)^2 / E
White       141                       141                       0.000
Pink        291                       282                       0.287
Red         132                       141                       0.574
Total       564                       564                       0.861 = $\chi^2$

We take $\alpha = 0.10$. For d.f. = 2, we find that $\chi^2_{0.10} = 4.61$, so that the rejection region is $R: \chi^2 \ge 4.61$. Since the observed value of $\chi^2 = 0.861$ does not lie in $R$, we do not reject $H_0$ at $\alpha = 0.10$. As such, the probabilities suggested by Mendel's theory are not contradicted.

13.7 For the particular geographical region, let $p_1, p_2, \ldots, p_6$ denote the population proportions of accidental deaths due to motor vehicle, poison, falls, choking, drowning, and other reasons, respectively. The null hypothesis is $H_0: p_1 = 0.405,\ p_2 = 0.185,\ p_3 = 0.157,\ p_4 = 0.041,\ p_5 = 0.032,\ p_6 = 0.180$ and the alternative hypothesis $H_1$ is that at least one of these proportions is not correct. Under $H_0$, multiplying these probabilities by $n = 908$, the expected frequencies are found to be 367.74, 167.98, 142.56, 37.23, 29.06, and 163.44. The $\chi^2$ statistic for goodness-of-fit is calculated as follows:
Reason           Observed Frequency (O)    Expected Frequency (E)    (O - E)^2 / E
Motor Vehicle    356                       367.74                    0.375
Poison           207                       167.98                    9.064
Fall             125                       142.56                    2.163
Choking          33                        37.23                     0.481
Drowning         26                        29.06                     0.322
Other            161                       163.44                    0.036
Total            908                       908                       12.441 = $\chi^2$

We take $\alpha = 0.05$. For d.f. = 5, we find that $\chi^2_{0.05} = 11.07$, so that the rejection region is $R: \chi^2 \ge 11.07$. Since the observed value of $\chi^2 = 12.441$ lies in $R$, we reject $H_0$ at $\alpha = 0.05$. As such, we conclude that the pattern is significantly different; the differences in categories 2 and 3 are conspicuous from the large values of $(O - E)^2 / E$.

13.8 The null hypothesis that all 7 days of the week are equally likely is formalized by $H_0: p_1 = p_2 = \cdots = p_7 = \frac{1}{7}$ and the alternative hypothesis $H_1$ is that at least two of these proportions are different. Since the observed frequencies are given in units of 10,000, we proceed with the calculation of the expected frequencies and the $\chi^2$ statistic in the same units. In particular, under $H_0$ the expected frequency of each cell is $426.56\left(\frac{1}{7}\right) = 60.937$. The $\chi^2$ statistic for goodness-of-fit is calculated as follows:
Day      Observed Frequency (O)    Expected Frequency (E)    (O - E)^2 / E
Mon      62.81                     60.937                    0.058
Tues     69.66                     60.937                    1.249
Wed      70.11                     60.937                    1.381
Thurs    69.91                     60.937                    1.321
Fri      68.39                     60.937                    0.912
Sat      45.47                     60.937                    3.926
Sun      40.21                     60.937                    7.050
Total    426.56                    426.56                    15.897 = $\chi^2$

We take $\alpha = 0.01$. For d.f. = 6, we find that $\chi^2_{0.01} = 16.81$, so that the rejection region is $R: \chi^2 \ge 16.81$. Since the observed value of $\chi^2 = 15.897$ does not lie in $R$, we do not reject $H_0$ at $\alpha = 0.01$.

13.9 (a) From the binomial table for $n = 3$ and $p = 0.4$, we find that
$P[X = 0] = 0.216$
$P[X = 1] = 0.648 - 0.216 = 0.432$
$P[X = 2] = 0.936 - 0.648 = 0.288$
$P[X = 3] = 1.000 - 0.936 = 0.064$

(b) Let $p_0$, $p_1$, $p_2$, and $p_3$ denote the probabilities of the four categories: 0, 1, 2, and 3 males in litter, respectively. The null hypothesis is $H_0: p_0 = 0.216,\ p_1 = 0.432,\ p_2 = 0.288,\ p_3 = 0.064$ and the alternative hypothesis $H_1$ is that at least one of these proportions is not correct. Under $H_0$, multiplying these probabilities by $n = 80$, the expected frequencies are found to be 17.28, 34.56, 23.04, and 5.12. The $\chi^2$ statistic for goodness-of-fit is calculated as follows:

Number of Males    Observed Frequency (O)    Expected Frequency (E)    (O - E)^2 / E
0                  19                        17.28                     0.171
1                  32                        34.56                     0.190
2                  22                        23.04                     0.047
3                  7                         5.12                      0.690
Total              80                        80                        1.098 = $\chi^2$

We take $\alpha = 0.05$. For d.f. = 2, we find that $\chi^2_{0.05} = 5.99$, so that the rejection region is $R: \chi^2 \ge 5.99$. Since the observed value of $\chi^2 = 1.098$ does not lie in $R$, we do not reject $H_0$ at $\alpha = 0.05$. As such, we conclude that the binomial model is not contradicted.

13.10 Observe that
$\dfrac{(n_i - np_{i0})^2}{np_{i0}} = \dfrac{n_i^2 - 2n_i np_{i0} + (np_{i0})^2}{np_{i0}} = \dfrac{n_i^2}{np_{i0}} - 2n_i + np_{i0}$
Summing each term on the right side over the cells, and using $\sum n_i = n$ and $\sum p_{i0} = 1$, then yields
$\chi^2 = \sum \dfrac{n_i^2}{np_{i0}} - 2\sum n_i + n\sum p_{i0} = \sum \dfrac{n_i^2}{np_{i0}} - n$.

13.11 (a) We summarize the information in the table below:

           Open all mail    Don't open all mail    Total
Males      414              386                    800
Females    532              368                    900
Total      946              754                    1700

(b) Let $p_1$ and $p_2$ be the probabilities that a person opens all of his/her mail for males and females, respectively. The null hypothesis of homogeneity is $H_0: p_1 = p_2$.

(c) We take $\alpha = 0.05$. For d.f. = 1, we find that $\chi^2_{0.05} = 3.84$, so that the rejection region is $R: \chi^2 \ge 3.84$. The $\chi^2$ statistic for homogeneity is calculated as follows:

Expected Values:
           Open all mail    Don't open all mail    Total
Males      445.176          354.824                800
Females    500.823          399.176                900
Total      946              754                    1700

$(O - E)^2 / E$:
           Open all mail    Don't open all mail
Males      2.183            2.739
Females    1.941            2.435

(d) Then, $\chi^2 = 2.183 + 2.739 + 1.941 + 2.435 = 9.298$, which lies in $R$. So, we reject $H_0$ (of equal probabilities of opening all mail) at $\alpha = 0.05$. Furthermore, the p-value is about 0.002. The largest contribution to $\chi^2$ comes from the male "don't open" cell, where the observed count is higher than expected. The female "don't open" count is lower than expected.
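The expected counts and cell contributions in 13.11 can be checked with a few lines of Python. This is only a minimal sketch using the observed counts from the table in part (a); the variable names are illustrative and not part of the original solution.

```python
# Observed counts from 13.11(a): rows = males, females;
# columns = open all mail, don't open all mail.
observed = [[414, 386],
            [532, 368]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell = (row total)(column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Cell contributions (O - E)^2 / E; their sum is the chi-square statistic.
contributions = [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
                 for obs_row, exp_row in zip(observed, expected)]
chi_sq = sum(sum(row) for row in contributions)

print([[round(e, 3) for e in row] for row in expected])
print([[round(c, 3) for c in row] for row in contributions])
print(round(chi_sq, 3))
```

The same pattern applies to any two-way table of counts; only the `observed` list changes.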

13.12 (a) Denote by $p_{i1}$, $p_{i2}$, and $p_{i3}$ the population proportions of the categories No job, Work 10 hours or less, and Work more than 10 hours, respectively, where $i = 1$ stands for underclassmen and $i = 2$ stands for upperclassmen. The null hypothesis of homogeneity is $H_0: p_{1j} = p_{2j}$ $(j = 1, 2, 3)$.

(b) We take $\alpha = 0.05$. For d.f. = 2, we find that $\chi^2_{0.05} = 5.99$, so that the rejection region is $R: \chi^2 \ge 5.99$. The $\chi^2$ statistic for homogeneity is calculated as follows:

Expected Values E: (For each cell, multiply row and column totals, and divide by 400.)
                 No Job    Work <= 10 Hours    Work > 10 Hours    Total
Underclassmen    115.2     32.4                32.4               180
Upperclassmen    140.8     39.6                39.6               220
Total            256       72                  72                 400

$(O - E)^2 / E$:
                 No Job    Work <= 10 Hours    Work > 10 Hours
Underclassmen    2.45      0.598               4.746
Upperclassmen    2.00      0.489               3.883

(c) Then, $\chi^2 = 2.45 + 2.00 + 0.598 + 0.489 + 4.746 + 3.883 = 14.166$, which lies in $R$. So, we reject $H_0$ at $\alpha = 0.05$. Furthermore, the p-value is less than 0.001. The largest contribution to $\chi^2$ comes from the Work more than 10 hours category, where the observed number of underclassmen is lower than expected and the observed number of upperclassmen is higher than expected under homogeneous populations.

13.13 Denote by $p_{A1}$, $p_{A2}$, $p_{A3}$, and $p_{A4}$ the probabilities of response in the categories none, slight, moderate, and severe, respectively, under the use of Brand A pills. Similarly, use $p_{B1}$, $p_{B2}$, $p_{B3}$, and $p_{B4}$ for Brand B. We are to test the null hypothesis of homogeneity $H_0: p_{Aj} = p_{Bj}$ $(j = 1, 2, 3, 4)$. The $\chi^2$ statistic for homogeneity is calculated as follows:

We take $\alpha = 0.05$. For d.f. = 3, we find that $\chi^2_{0.05} = 7.81$, so that the rejection region is $R: \chi^2 \ge 7.81$. Since the observed $\chi^2 = 5.580$ (from the Minitab output) does not lie in $R$, we do not reject $H_0$ at $\alpha = 0.05$. As such, we fail to conclude that the two pills are significantly different in quality.

13.14 (a) Denote by $p_1$ and $p_2$ the probabilities of experiencing no low-back pain for those sleeping on a firm mattress and medium firm mattress, respectively. We are to test the null hypothesis of homogeneity $H_0: p_1 = p_2$ versus $H_1: p_1 \ne p_2$. The $\chi^2$ statistic for homogeneity is calculated using Minitab as follows:

Expected counts are printed below observed counts

             No       Yes    Total
    1        36       122      158
          46.68    111.32
    2        55        95      150
          44.32    105.68
Total        91       217      308

Chi-Sq = 7.123, DF = 1, P-Value = 0.0076

We take $\alpha = 0.01$. For d.f. = 1, we find that $\chi^2_{0.01} = 6.63$, so that the rejection region is $R: \chi^2 \ge 6.63$. Since the observed $\chi^2 = 7.123$ (from the Minitab output) does lie in $R$, we reject $H_0$ at $\alpha = 0.01$. As such, we conclude that the two groups of mattresses are significantly different in regard to lower back pain.

(b) The test statistic is
$Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}}\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
For $\alpha = 0.01$, the rejection region is $R: |Z| \ge 2.575$. We have the following:
$\hat{p}_1 = \frac{36}{158} = 0.2278$, $\hat{p}_2 = \frac{55}{150} = 0.367$
Pooled estimate $\hat{p} = \dfrac{36 + 55}{158 + 150} = 0.295$.
The value of the observed z is $\dfrac{0.2278 - 0.367}{\sqrt{(0.295)(0.705)}\,\sqrt{\frac{1}{158} + \frac{1}{150}}} = -2.678$, which lies in $R$. So, we reject $H_0$ at $\alpha = 0.01$. Furthermore, the associated p-value is $P[|Z| \ge 2.678] = 2P[Z \le -2.678] = 0.007$. This fairly small p-value still suggests the two are different at, say, $\alpha = 0.05$.

13.15 (a) Denote by $p_1$, $p_2$, $p_3$, and $p_4$ the probabilities that a paper will be taken from sites 1, 2, 3, and 4, respectively. We are to test the null hypothesis of homogeneity $H_0: p_1 = p_2 = p_3 = p_4$.

(b) First, we calculate the number of papers taken from each site. For site 1, the number taken is $50 - 17 = 33$; the others are computed similarly. The $\chi^2$ statistic for homogeneity is calculated using Minitab; the output is shown at the top of the next page.

We take $\alpha = 0.05$. For d.f. = 3, we find that $\chi^2_{0.05} = 7.81$, so that the rejection region is $R: \chi^2 \ge 7.81$. Since the observed $\chi^2 = 9.780$ (from the Minitab output) lies in $R$, we reject $H_0$ at $\alpha = 0.05$. As such, we conclude that the proportions of papers taken differ between the sites.

(c) A 95% confidence interval for a population proportion $p$ is given by $\hat{p} \pm 1.96\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$. We calculate such an interval for each of the four proportions, as follows:

For $p_1$: $\hat{p}_1 = \frac{33}{50} = 0.66$, $0.66 \pm 1.96\sqrt{\frac{(0.66)(0.34)}{50}} = 0.66 \pm 0.13$ or (0.53, 0.79)

For $p_2$: $\hat{p}_2 = \frac{35}{47} = 0.745$, $0.745 \pm 1.96\sqrt{\frac{(0.745)(0.255)}{47}} = 0.745 \pm 0.125$ or (0.62, 0.87)

For $p_3$: $\hat{p}_3 = \frac{41}{48} = 0.854$, $0.854 \pm 1.96\sqrt{\frac{(0.854)(0.146)}{48}} = 0.854 \pm 0.100$ or (0.754, 0.954)

For $p_4$: $\hat{p}_4 = \frac{29}{50} = 0.58$, $0.58 \pm 1.96\sqrt{\frac{(0.58)(0.42)}{50}} = 0.58 \pm 0.137$ or (0.44, 0.72)
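The four intervals above all come from the same formula, so they can be reproduced with one small helper. The following is a sketch only: the function name `proportion_ci` and the dictionary layout are assumptions made for illustration, and the counts are the papers taken and delivered as used in part (c).

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# (papers taken, papers delivered) for sites 1 to 4, as used in 13.15(c).
sites = {1: (33, 50), 2: (35, 47), 3: (41, 48), 4: (29, 50)}
intervals = {site: proportion_ci(x, n) for site, (x, n) in sites.items()}

for site, (lower, upper) in intervals.items():
    print(f"site {site}: ({lower:.3f}, {upper:.3f})")
```

Because the hand calculation rounds the estimates before forming the margin, the printed endpoints may differ from the values above in the third decimal place.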

13.16
          Number of Papers
Site      Taken    Remaining    Delivered
1         33       17           50
3         41       7            48

(a) In order to test $H_0: p_1 = p_3$ versus $H_1: p_1 \ne p_3$, we apply the $\chi^2$ test to the above $2 \times 2$ table. The Minitab calculations are as follows:

We take $\alpha = 0.05$. For d.f. = 1, we find that $\chi^2_{0.05} = 3.84$, so that the rejection region is $R: \chi^2 \ge 3.84$. Since the observed $\chi^2 = 4.993$ (from the Minitab output) lies in $R$, we reject $H_0$ at $\alpha = 0.05$.

(b) The test statistic is
$Z = \dfrac{\hat{p}_1 - \hat{p}_3}{\sqrt{\hat{p}\hat{q}}\,\sqrt{\frac{1}{n_1} + \frac{1}{n_3}}}$
For $\alpha = 0.05$, the rejection region is $R: |Z| \ge 1.96$. We have the following:
$\hat{p}_1 = 0.660$, $\hat{p}_3 = 0.854$
Pooled estimate $\hat{p} = \dfrac{33 + 41}{50 + 48} = 0.755$.
The value of the observed z is $\dfrac{0.660 - 0.854}{\sqrt{(0.755)(0.245)}\,\sqrt{\frac{1}{50} + \frac{1}{48}}} = -2.23$, which lies in $R$. So, we reject $H_0$ at $\alpha = 0.05$. Furthermore, the associated p-value is $P[|Z| \ge 2.23] = 2P[Z \le -2.23] = 2(0.0129) = 0.0258$. This fairly small p-value suggests there is strong evidence in favor of unequal proportions.
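Parts (a) and (b) are two views of the same test: for a $2 \times 2$ table, the square of the pooled z statistic equals the $\chi^2$ statistic. A minimal sketch of the calculation follows, assuming the counts from the table above; the helper names `two_proportion_z` and `normal_cdf` are hypothetical, and the normal probability is computed with `math.erfc` rather than read from a table.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Site 1: 33 of 50 taken; site 3: 41 of 48 taken.
z = two_proportion_z(33, 50, 41, 48)
p_value = 2 * normal_cdf(-abs(z))  # two-sided p-value

print(round(z, 2))       # z statistic
print(round(z ** 2, 3))  # compare with the chi-square value in part (a)
print(round(p_value, 4))
```

The p-value computed this way uses the unrounded z, so it can differ slightly from the table-based value quoted in part (b).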

13.17 (a) We are to test $H_0: p_3 = p_4$ versus $H_1: p_3 > p_4$. Since $H_1$ is one-sided, the $\chi^2$ test is not appropriate. As such, we use the test statistic
$Z = \dfrac{\hat{p}_3 - \hat{p}_4}{\sqrt{\hat{p}\hat{q}}\,\sqrt{\frac{1}{n_3} + \frac{1}{n_4}}}$
Since $H_1$ is right-sided, the rejection region is of the form $R: Z \ge 1.645$. We have the following:
$\hat{p}_3 = \frac{41}{48} = 0.854$, $\hat{p}_4 = \frac{29}{50} = 0.580$
Pooled estimate $\hat{p} = \dfrac{41 + 29}{48 + 50} = 0.714$.
The value of the observed z is $\dfrac{0.854 - 0.580}{\sqrt{(0.714)(0.286)}\,\sqrt{\frac{1}{48} + \frac{1}{50}}} = 3.00$. The associated p-value is $P[Z \ge 3.00] = 0.0013$. This small p-value suggests there is strong evidence in support of $H_1$.

(b) Observe that $\hat{p}_3 - \hat{p}_4 = 0.854 - 0.580 = 0.274$.
Estimated S.E. $= \sqrt{\dfrac{\hat{p}_3\hat{q}_3}{n_3} + \dfrac{\hat{p}_4\hat{q}_4}{n_4}} = \sqrt{\dfrac{(0.854)(0.146)}{48} + \dfrac{(0.580)(0.420)}{50}} = 0.0864$
So, a 95% confidence interval for $p_3 - p_4$ is given by
$(\hat{p}_3 - \hat{p}_4) \pm z_{0.05/2}\sqrt{\dfrac{\hat{p}_3\hat{q}_3}{n_3} + \dfrac{\hat{p}_4\hat{q}_4}{n_4}} = 0.274 \pm 1.96(0.0864) = 0.274 \pm 0.169$
or (0.11, 0.44).

13.18 (a) Denote by $p_{11}$, $p_{12}$, and $p_{13}$ the probabilities of appreciable loss, little change, and appreciable increase, respectively, for the control group. Similarly, denote by $p_{21}, p_{22}, p_{23}$ and $p_{31}, p_{32}, p_{33}$ those probabilities for the physical therapy and physical activity groups. The null hypothesis of homogeneity is $H_0: p_{1i} = p_{2i} = p_{3i}$ $(i = 1, 2, 3)$. The $\chi^2$ statistic for homogeneity is calculated using Minitab; the output is shown at the top of the next page.

The observed $\chi^2$ value, namely 28.162, exceeds the tabulated value $\chi^2_{0.01}$ (for d.f. = 4). Thus, the p-value is < 0.01. So, the data indicate significant differences of bone mineral loss in these three groups. An examination of the observed and expected frequencies shows that the loss is higher in the control group than in the activity group.

13.19 (a) The calculations are identical.

(b) The Minitab output is as follows:

We take $\alpha = 0.05$. For d.f. = 1, we find that $\chi^2_{0.05} = 3.84$, so that the rejection region is $R: \chi^2 \ge 3.84$. Since the observed $\chi^2 = 9.298$ (from the Minitab output) lies in $R$, we reject $H_0$ at $\alpha = 0.05$. Furthermore, the associated p-value is 0.002. As such, the proportions of males and females who open all of their mail are significantly different.

(c) The Minitab output is as follows:

We take $\alpha = 0.01$. For d.f. = 4, we find that $\chi^2_{0.01} = 13.28$, so that the rejection region is $R: \chi^2 \ge 13.28$. Since the observed $\chi^2 = 28.162$ (from the Minitab output) lies in $R$, we reject $H_0$ at $\alpha = 0.01$. Comparing observed and expected frequencies, we see that the bone loss is higher in the control group than in the activity group.

13.20 We test the null hypothesis that the proportion of frequent snorers, 4 or 5 on the scale, is the same for females and males. The Minitab output is shown at the top of the next page.

We take $\alpha = 0.05$. For d.f. = 1, we find that $\chi^2_{0.05} = 3.84$, so that the rejection region is $R: \chi^2 \ge 3.84$. Since the observed $\chi^2 = 8.084$ (from the Minitab output) lies in $R$, we reject $H_0$ at $\alpha = 0.05$. So, we strongly reject the null hypothesis that males and females have the same proportions of frequent snorers. Examining the observed and expected frequencies, we see that there are fewer frequent snorers among females.

13.21 We test the null hypothesis that the pattern of appeals decision and the type of representation are independent. The $\chi^2$ statistic is calculated using Minitab as follows:

We take $\alpha = 0.05$. For d.f. = 2, we find that $\chi^2_{0.05} = 5.99$, so that the rejection region is $R: \chi^2 \ge 5.99$. Since the observed $\chi^2 = 15.734$ (from the Minitab output) lies in $R$, we reject $H_0$ at $\alpha = 0.05$. We conclude that the patterns of appeals decision are significantly different between the two types of representation.

13.22 (a) We test the null hypothesis of independence between purchasing and the decision to pick up a basket upon entering the store. The $\chi^2$ statistic is calculated using Minitab as follows:

Chi-Square Test

Expected counts are printed below observed counts

          Purchase    No Purch    Total
    1           60          20       80
             40.40       39.60
    2           41          79      120
             60.60       59.40
Total          101          99      200

Chi-Sq = 9.509 + 9.701 + 6.339 + 6.467 = 32.017
DF = 1, P-Value = 0.000

We take $\alpha = 0.05$. For d.f. = 1, we find that $\chi^2_{0.05} = 3.84$, so that the rejection region is $R: \chi^2 \ge 3.84$. Since the observed $\chi^2 = 32.017$ (from the Minitab output) lies in $R$, we reject $H_0$ at $\alpha = 0.05$. We conclude that there is a strong association between purchasing and choosing to pick up a basket upon entering the store.

(b) It's hard to say for certain. On the one hand, it is clear that if a person enters the store with an intention to purchase a single item, then there is likely no need for a basket, while if there are several items to be bought, a basket would be needed. On the other hand, if a person enters the store not knowing he will make a purchase, then his decision to pick up a basket depends on how likely he feels he will buy something.
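The Minitab figures in 13.22(a) can be cross-checked without statistical software. The sketch below is an illustration under stated assumptions: `chi_square_independence` is a hypothetical helper name, the counts are those printed in the output above, and the p-value line relies on the identity $P[\chi^2 \ge x] = \operatorname{erfc}(\sqrt{x/2})$, which holds only for d.f. = 1.

```python
import math

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_sq += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_sq, df

# Rows as printed in the Minitab output; columns = Purchase, No Purch.
chi_sq, df = chi_square_independence([[60, 20], [41, 79]])

# Valid for d.f. = 1 only: a chi-square variable is then a squared
# standard normal, so the upper tail is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

print(round(chi_sq, 3), df)
print(p_value < 0.001)
```

For tables with more than one degree of freedom the function still returns the statistic, but the p-value would have to come from a chi-square table or a statistics library.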

13.23 We test the null hypothesis of independence between union membership and attitude toward spending on social welfare. The $\chi^2$ statistic is calculated using Minitab as follows:

We take $\alpha = 0.01$. For d.f. = 2, we find that $\chi^2_{0.01} = 9.21$, so that the rejection region is $R: \chi^2 \ge 9.21$. Since the observed $\chi^2 = 27.847$ (from the Minitab output) lies in $R$, we reject $H_0$ at $\alpha = 0.01$. We conclude that attitudes and union membership are dependent. There are significant differences between the attitudes of the union and non-union groups.

13.24 We test the null hypothesis of independence between gender and attitude toward violence on television. The $\chi^2$ statistic is calculated using Minitab as follows:

We take $\alpha = 0.01$. For d.f. = 2, we find that $\chi^2_{0.01} = 9.21$, so that the rejection region is $R: \chi^2 \ge 9.21$. Since the observed $\chi^2 = 27.167$ (from the Minitab output) lies in $R$, we reject $H_0$ at $\alpha = 0.01$. We conclude that attitudes of males and females are significantly different. The large contributions come from the No categories, which have more than expected males and fewer than expected females.

13.25 The null hypothesis is that group and stopping response are independent. The cell probabilities are the product of the marginal probabilities. The $\chi^2$ statistic is calculated using Minitab as follows:

We take $\alpha = 0.05$. For d.f. = 2, we find that $\chi^2_{0.05} = 5.99$, so that the rejection region is $R: \chi^2 \ge 5.99$. Since the observed $\chi^2 = 4.1334$ (from the Minitab output) does not lie in $R$, we do not reject $H_0$ at $\alpha = 0.05$. We conclude that the groups are not significantly different in their response.

13.26 (a) We obtain the frequencies for the non-influential or "no" responses by subtraction. The frequency of no to golf and yes to dining $= 53 - 25 = 28$, the frequency of yes to golf and no to dining $= 80 - 25 = 55$, and the frequency of the no-no response $= 120 - 25 - 28 - 55 = 12$. This information is tabulated below:

                 Dining
Golf       Yes     No      Total
Yes        25      55      80
No         28      12      40
Total      53      67      120

(b) The null hypothesis of independence states that the response to the dining facilities is independent of the response to the golf facilities. Under this hypothesis, the probability of a yes-yes response is equal to the product of the probability of a yes to dining response and the probability of a yes to golf response.

(c) Using Minitab, the $2 \times 2$ contingency table is repeated here, along with the calculations for the $\chi^2$ statistic.

We take $\alpha = 0.05$. For d.f. = 1, we find that $\chi^2_{0.05} = 3.84$, so that the rejection region is $R: \chi^2 \ge 3.84$. Since the observed $\chi^2 = 16.238$ (from the Minitab output) lies in $R$, we reject $H_0$ at $\alpha = 0.05$. Furthermore, the very small p-value strengthens the evidence against independence.

(d) The largest contribution to the $\chi^2$ statistic comes from the cell for golfing facilities not influential (no) and dining influential (yes) in the decision to join, where the observed frequency is higher than the expected frequency.

13.27 (a) They are identical except for the third decimal in the Chi-Sq entry.

(b) The Minitab output is as follows:
Chi-Square Test: C1, C2, C3

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

             C1        C2       C3    Total
    1       361       228       17      606
         400.97    186.35    18.69
          3.984     9.311    0.152

    2       433       141       20      594
         393.03    182.66    18.32
          4.065     9.500    0.155

Total       794       369       37     1200

Chi-Sq = 27.167, DF = 2, P-Value = 0.000

Since the p-value is so small, the null hypothesis would be rejected for any reasonable value of $\alpha$. This provides strong evidence against independence.

13.28 The Minitab output is as follows:

The very small p-value provides strong evidence against independence for any reasonable value of $\alpha$.

13.29 Denote by $p_0, p_1, \ldots, p_9$ the probabilities of the integers 0, 1, 2, ..., 9, respectively. The null hypothesis that all ten integers are equally likely is formalized by $H_0: p_0 = p_1 = \cdots = p_9 = \frac{1}{10}$ and the alternative hypothesis $H_1$ is that at least two of these proportions are different. Under $H_0$, the expected frequency of each cell is $500\left(\frac{1}{10}\right) = 50$. The $\chi^2$ statistic for goodness-of-fit is calculated as follows:
Integer    Observed Frequency (O)    Expected Frequency (E)    (O - E)^2 / E
0          41                        50                        1.62
1          58                        50                        1.28
2          51                        50                        0.02
3          61                        50                        2.42
4          39                        50                        2.42
5          56                        50                        0.72
6          45                        50                        0.50
7          35                        50                        4.50
8          62                        50                        2.88
9          52                        50                        0.08
Total      500                       500                       16.44 = $\chi^2$

We take $\alpha = 0.05$. For d.f. = 9, we find that $\chi^2_{0.05} = 16.92$, so that the rejection region is $R: \chi^2 \ge 16.92$. Since the observed value of $\chi^2 = 16.44$ does not lie in $R$, we do not reject $H_0$ at $\alpha = 0.05$. As such, the data do not demonstrate any bias.

13.30 Denote by $p_1, p_2, \ldots, p_{12}$ the probabilities of birth in the months of January, February, ..., December, respectively. The null hypothesis of uniform distribution of births is formalized by $H_0: p_1 = p_2 = \cdots = p_{12} = \frac{1}{12}$ and the alternative hypothesis $H_1$ is that at least two of these proportions are different. Under $H_0$, the expected frequency of each cell is $41{,}208\left(\frac{1}{12}\right) = 3434$. Because $E = 3434$ is the same for all categories (i.e., months), the $\chi^2$ statistic is conveniently calculated by using the following alternative formula:
$\chi^2 = \sum_{\text{cells}} \dfrac{O^2}{E} - n = \dfrac{1}{3434}\sum O^2 - 41{,}208 = 41{,}280.45 - 41{,}208 = 72.45$

We take $\alpha = 0.01$. For d.f. = 11, we find that $\chi^2_{0.01} = 24.72$, so that the rejection region is $R: \chi^2 \ge 24.72$. Since the observed value of $\chi^2 = 72.45$ lies in $R$, we reject $H_0$ at $\alpha = 0.01$. As such, we conclude that the model of uniform distribution of births is contradicted by the data. (Note: Small differences can be significant when the sample size is large.)

13.31 Denote by $p_1$, $p_2$, $p_3$, and $p_4$ the probabilities of birth in the four consecutive quarters. The null hypothesis is $H_0: p_1 = 0.4,\ p_2 = 0.2,\ p_3 = 0.2,\ p_4 = 0.2$ and the alternative hypothesis $H_1$ is that at least one of these proportions is not correct. Under $H_0$, multiplying these probabilities by $n = 151$, the expected frequencies are found to be 60.4, 30.2, 30.2, 30.2. The $\chi^2$ statistic for goodness-of-fit is calculated as follows:

Quarter      Observed Frequency (O)    Expected Frequency (E)    (O - E)^2 / E
Jan - Mar    55                        60.4                      0.483
Apr - Jun    29                        30.2                      0.048
Jul - Sep    26                        30.2                      0.584
Oct - Dec    41                        30.2                      3.862
Total        151                       151                       4.977 = $\chi^2$

We take $\alpha = 0.10$. For d.f. = 3, we find that $\chi^2_{0.10} = 6.25$, so that the rejection region is $R: \chi^2 \ge 6.25$. Since the observed value of $\chi^2 = 4.977$ does not lie in $R$, we do not reject $H_0$ at $\alpha = 0.10$. As such, the stated conjecture is not contradicted.
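The goodness-of-fit arithmetic used throughout this chapter follows one pattern: scale the hypothesized probabilities by $n$, then sum $(O - E)^2 / E$. The sketch below applies it to the quarterly birth counts of 13.31 and also evaluates the shortcut form $\sum O^2/E - n$ derived in 13.10. The function name `goodness_of_fit` is an assumption made for illustration.

```python
def goodness_of_fit(observed, probabilities):
    """Return expected counts, cell contributions, and the chi-square statistic."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return expected, contributions, sum(contributions)

# Quarterly counts and hypothesized probabilities from 13.31.
observed = [55, 29, 26, 41]
probabilities = [0.4, 0.2, 0.2, 0.2]

expected, contributions, chi_sq = goodness_of_fit(observed, probabilities)

# Shortcut form from 13.10: chi-square = sum(O^2 / E) - n.
shortcut = sum(o ** 2 / e for o, e in zip(observed, expected)) - sum(observed)

print([round(c, 3) for c in contributions])
print(round(chi_sq, 3), round(shortcut, 3))
```

Both forms give the same value up to floating-point rounding; the shortcut is mainly convenient by hand when every cell has the same expected count, as in 13.30.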

13.32 (a) Let $p_{1i}$, $p_{2i}$, $p_{3i}$ denote the population proportions from the categories 1 major, 2 majors, and 3 majors, respectively, where $i = 1$ stands for AY 01-02 and $i = 2$ stands for AY 07-08. We test the null hypothesis of homogeneity: $H_0: p_{j1} = p_{j2}$ $(j = 1, 2, 3)$.

Take $\alpha = 0.05$. For d.f. = 2, $\chi^2_{0.05} = 5.99$ and so the rejection region is $R: \chi^2 \ge 5.99$.

Observed Frequencies O:
            1 major    2 majors    3 majors    TOTAL
AY 01-02    2327       837         57          3221
AY 07-08    2356       1059        88          3503
TOTAL       4683       1896        145         6724

Expected Values E:
            1 major    2 majors    3 majors
AY 01-02    2243.3     908.24      69.46
AY 07-08    2439.7     987.76      75.54

$(O - E)^2 / E$:
            1 major    2 majors    3 majors
AY 01-02    3.123      5.588       2.232
AY 07-08    2.872      5.138       2.052

Thus, the test statistic is $\chi^2 = 21.0109$. Since this lies in $R$, we reject $H_0$ at $\alpha = 0.05$.

(b) We form the following condensed table of observed frequencies:

Observed Frequencies O:
            1 major    >1 major    TOTAL
AY 01-02    2327       894         3221
AY 07-08    2356       1147        3503
TOTAL       4683       2041        6724

Let
$p_1$ = population proportion of seniors with multiple majors in AY 01-02
$p_2$ = population proportion of seniors with multiple majors in AY 07-08
Note that $\hat{p}_1 = 0.278$, $\hat{p}_2 = 0.327$. Also,
Est. SE $= \sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}} = \sqrt{\dfrac{(0.278)(0.722)}{3221} + \dfrac{(0.327)(0.673)}{3503}} = 0.0112$.
So, the 95% confidence interval for $p_1 - p_2$ is
$(\hat{p}_1 - \hat{p}_2) \pm z_{0.05/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}} = -0.05 \pm 1.96(0.0112)$
or (-0.072, -0.028).

13.33 Denote by $p_1$ and $p_2$ the population proportions of persons having hepatitis in the two groups, vaccinated and not vaccinated, respectively. We are to test $H_0: p_1 = p_2$ versus $H_1: p_1 \ne p_2$. The $\chi^2$ statistic for homogeneity is calculated using Minitab as follows:

We take $\alpha = 0.05$. For d.f. = 1, we find that $\chi^2_{0.05} = 3.84$, so that the rejection region is $R: \chi^2 \ge 3.84$. Since the observed value of $\chi^2 = 48.242$ lies in $R$, we reject $H_0$ at $\alpha = 0.05$. This is very strong evidence in support of $H_1$.

13.34 (a) Denote by $p_1$ and $p_2$ the two population proportions. We are to test $H_0: p_1 = p_2$ versus $H_1: p_1 \ne p_2$. We use the test statistic
$Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}}\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
Since $H_1$ is two-sided and $\alpha = 0.05$, the rejection region is $R: |Z| \ge 1.96$. We have the following:
$\hat{p}_1 = \frac{11}{549} = 0.0200$, $\hat{p}_2 = \frac{70}{534} = 0.1311$
Pooled estimate $\hat{p} = \dfrac{11 + 70}{1083} = 0.0748$.
The value of the observed z is $\dfrac{0.0200 - 0.1311}{\sqrt{(0.0748)(0.9252)}\,\sqrt{\frac{1}{549} + \frac{1}{534}}} = -6.948$, which lies in $R$. So, we reject $H_0$ at $\alpha = 0.05$. Also, note that the value of the $\chi^2$ statistic, namely 48.241, is essentially equal to $z^2 = (-6.948)^2 = 48.27$ (where the difference is due to rounding).

(b) For $H_0: p_1 = p_2$ versus $H_1: p_1 < p_2$, we must use the Z-test with a one-sided rejection region $R: Z \le -1.645$ corresponding to $\alpha = 0.05$.

13.35 (a) The two response categories are "free of pain" and "not free of pain." The frequencies of the latter category are obtained by subtracting those of the first from the corresponding number of patients assigned. The $4 \times 2$ contingency table is presented here, along with the calculations (from Minitab) for the $\chi^2$ statistic.

We take $\alpha = 0.05$. For d.f. = 3, we find that $\chi^2_{0.05} = 7.81$, so that the rejection region is $R: \chi^2 \ge 7.81$. Since the observed value of $\chi^2 = 12.053$ lies in $R$, we reject $H_0$ at $\alpha = 0.05$. We conclude that there are significant differences in the effectiveness of the four drugs.

(b) A 90% confidence interval for a population proportion $p$ is given by $\hat{p} \pm 1.645\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$. We calculate such an interval for each of the four proportions, as follows:

For Drug 1: $\hat{p}_1 = \frac{23}{53} = 0.434$, $0.434 \pm 1.645\sqrt{\frac{(0.434)(0.566)}{53}} = 0.434 \pm 0.112$ or (0.32, 0.55)

For Drug 2: $\hat{p}_2 = \frac{30}{47} = 0.638$, $0.638 \pm 1.645\sqrt{\frac{(0.638)(0.362)}{47}} = 0.638 \pm 0.115$ or (0.52, 0.75)

For Drug 3: $\hat{p}_3 = \frac{19}{51} = 0.373$, $0.373 \pm 1.645\sqrt{\frac{(0.373)(0.627)}{51}} = 0.373 \pm 0.111$ or (0.26, 0.48)

For Drug 4: $\hat{p}_4 = \frac{29}{44} = 0.659$, $0.659 \pm 1.645\sqrt{\frac{(0.659)(0.341)}{44}} = 0.659 \pm 0.118$ or (0.54, 0.78)

13.36
           Free    Not Free    Total
Drug 1     23      30          53
Drug 3     19      32          51

(a) In order to test $H_0: p_1 = p_3$ versus $H_1: p_1 \ne p_3$, we apply the $\chi^2$ test to the above $2 \times 2$ table. The Minitab calculations are as follows:

We take $\alpha = 0.05$. For d.f. = 1, we find that $\chi^2_{0.05} = 3.84$, so that the rejection region is $R: \chi^2 \ge 3.84$. Since the observed value of $\chi^2 = 0.407$ does not lie in $R$, we do not reject $H_0$ at $\alpha = 0.05$.

(b) The test statistic is
$Z = \dfrac{\hat{p}_1 - \hat{p}_3}{\sqrt{\hat{p}\hat{q}}\,\sqrt{\frac{1}{n_1} + \frac{1}{n_3}}}$
For $\alpha = 0.05$, the rejection region is $R: |Z| \ge 1.96$. We have the following:
$\hat{p}_1 = 0.434$, $\hat{p}_3 = 0.373$
Pooled estimate $\hat{p} = \dfrac{23 + 19}{53 + 51} = 0.404$.
The value of the observed z is $\dfrac{0.434 - 0.373}{\sqrt{(0.404)(0.596)}\,\sqrt{\frac{1}{53} + \frac{1}{51}}} = 0.634$, which does not lie in $R$. So, we do not reject $H_0$ at $\alpha = 0.05$. This is consistent with part (a).

13.37 (a) We are to test $H_0: p_4 = p_3$ versus $H_1: p_4 > p_3$. Since $H_1$ is one-sided, the $\chi^2$ test is not appropriate. We have the following:
$\hat{p}_3 = \frac{19}{51} = 0.373$, $\hat{p}_4 = \frac{29}{44} = 0.659$
Pooled estimate $\hat{p} = \dfrac{29 + 19}{44 + 51} = 0.505$.
The value of the observed z is $\dfrac{0.659 - 0.373}{\sqrt{(0.505)(0.495)}\,\sqrt{\frac{1}{44} + \frac{1}{51}}} = 2.78$. The associated p-value is $P[Z \ge 2.78] = 0.0027$. This small p-value suggests there is strong evidence in support of $H_1$.

(b) Observe that $\hat{p}_4 - \hat{p}_3 = 0.659 - 0.373 = 0.286$.
Estimated S.E. $= \sqrt{\dfrac{\hat{p}_3\hat{q}_3}{n_3} + \dfrac{\hat{p}_4\hat{q}_4}{n_4}} = \sqrt{\dfrac{(0.373)(0.627)}{51} + \dfrac{(0.659)(0.341)}{44}} = 0.0985$
So, a 95% confidence interval for $p_4 - p_3$ is given by
$(\hat{p}_4 - \hat{p}_3) \pm z_{0.05/2}\sqrt{\dfrac{\hat{p}_3\hat{q}_3}{n_3} + \dfrac{\hat{p}_4\hat{q}_4}{n_4}} = 0.286 \pm 1.96(0.0985) = 0.286 \pm 0.193$
or (0.09, 0.48).

13.38 Denote by $p_{11}$, $p_{12}$, and $p_{13}$ the probabilities of response in the categories worsened, no change, and improved, respectively, for the experimental group. Similarly, denote by $p_{21}, p_{22}, p_{23}$ those probabilities for the control group. The null hypothesis of homogeneity is $H_0: p_{1i} = p_{2i}$ $(i = 1, 2, 3)$. The $\chi^2$ statistic for homogeneity is calculated using Minitab; the output is shown at the top of the next page.

We take $\alpha = 0.01$. For d.f. = 2, we find that $\chi^2_{0.01} = 9.21$, so that the rejection region is $R: \chi^2 \ge 9.21$. Since the observed value of $\chi^2 = 20.431$ lies in $R$, we reject $H_0$ at $\alpha = 0.01$. We conclude that the changes in coronary blockage were significantly different for the two groups.

13.39 We test the null hypothesis that the duration of marriage is independent of the period of acquaintance before marriage. The $\chi^2$ statistic is calculated (using Minitab) as follows:

p-value = 0.9265

For $\alpha = 0.05$, the critical value is $\chi^2_{0.05} = 5.99$. The observed $\chi^2$ is not significant. As such, the null hypothesis of independence between period of acquaintanceship and duration of marriage is not contradicted.

13.40 We test the null hypothesis that the classifications according to gender and attitude are independent. The $\chi^2$ statistic is calculated (using Minitab) as follows:

For $\alpha = 0.01$, the tabulated value is $\chi^2_{0.01} = 9.21$. The observed $\chi^2$ is significant. The large contributions from the corner cells indicate different attitude patterns between males and females. More males than expected and fewer females than expected are in favor.

13.41 We test the null hypothesis of independence. The $\chi^2$ statistic is calculated (using Minitab) as follows:

We take $\alpha = 0.05$. For d.f. = 1, we find that $\chi^2_{0.05} = 3.84$, so that the rejection region is $R: \chi^2 \ge 3.84$. Since the observed value of $\chi^2 = 2.764$ does not lie in $R$, we do not reject $H_0$ at $\alpha = 0.05$.

13.42 We test the null hypothesis that risk-aversion is independent of age. The $\chi^2$ statistic is calculated (using Minitab) as follows:

p-value = 0.5779

We take $\alpha = 0.05$. For d.f. = 4, we find that $\chi^2_{0.05} = 9.49$, so that the rejection region is $R: \chi^2 \ge 9.49$. Since the observed value of $\chi^2 = 2.881$ does not lie in $R$, we do not reject $H_0$ at $\alpha = 0.05$. The data fail to demonstrate an association between risk-aversion and age.

13.43 (a) In Table 14, the expected frequency of the first cell is $75\left(\frac{100}{300}\right) = 25$, which is the same as the observed frequency. In the same manner, it is seen that the expected and observed frequencies are identical in every cell. Consequently, $\chi^2 = 0$. This is also the case for Table 15.

(b) The $\chi^2$ statistic is calculated (using Minitab) for the pooled data in Table 16 as follows:

p-value < 0.001.

For d.f. = 1, we find that $\chi^2_{0.05} = 3.84$, so the null hypothesis is rejected at $\alpha = 0.05$.

(c) Note that for secretarial positions the proportion of candidates receiving an offer is 25/75 = 1/3 for males and 75/225 = 1/3 for females. The proportions for sales positions are 150/200 = 3/4 for males and 75/100 = 3/4 for females. Although the rates are equal within each table, we see that in the pooled table, the rates are 175/275 = 0.64 for males and 150/325 = 0.46 for females. This discrepancy arises because in Table 14, 75/300 (or 25%) of the applicants are male, whereas in Table 15 the figure is 67%. Thus, pooling the tables results in a rather uneven mix. Since the overall rates of offer are very different for the two kinds of positions, the tables should not be combined.
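The pooling effect described in part (c) is easy to see numerically. The sketch below uses the offer and applicant counts quoted in part (c); the dictionary names are assumptions made for illustration, and the pooled counts are simply the sums over the two kinds of positions.

```python
# (offers, applicants) by gender, using the counts quoted in 13.43(c).
secretarial = {"male": (25, 75), "female": (75, 225)}
sales = {"male": (150, 200), "female": (75, 100)}

def offer_rate(offers, applicants):
    return offers / applicants

# Pool the two position types by adding offers and applicants.
pooled = {
    gender: (secretarial[gender][0] + sales[gender][0],
             secretarial[gender][1] + sales[gender][1])
    for gender in ("male", "female")
}

for gender in ("male", "female"):
    print(gender,
          round(offer_rate(*secretarial[gender]), 3),
          round(offer_rate(*sales[gender]), 3),
          round(offer_rate(*pooled[gender]), 3))
```

Within each position type the male and female rates agree, yet the pooled rates differ because most male applicants are in the high-offer sales group while most female applicants are in the low-offer secretarial group.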
