
Chapter 12: Analysis of Categorical Data


LEARNING OBJECTIVES

This chapter presents several nonparametric statistics that can be used to analyze data,
enabling you to:

1. Understand the chi-square goodness-of-fit test and how to use it.

2. Analyze data using the chi-square test of independence.

CHAPTER TEACHING STRATEGY

Chapter 12 presents the two most prevalent chi-square tests: the chi-square
goodness-of-fit test and the chi-square test of independence. These two techniques are
important because they give the statistician a tool that is particularly useful for analyzing
nominal data (even though the variable categories can sometimes be ordinal or
higher-level). It should be emphasized that there are many instances in business
research where the data gathered are merely categorical identification. For
example, in segmenting the marketplace (consumers or industrial users), information is
gathered regarding gender, income level, geographical location, political affiliation,
religious preference, ethnicity, occupation, size of company, type of industry, etc. On
these variables, the measurement is often a tally of the frequency of occurrence of
individuals, items, or companies in each category. The subject of the research is given no
"score" or "measurement" other than a 0/1 for being a member or not of a given category.
These two chi-square tests are well suited to analyzing such data.
The chi-square goodness-of-fit test examines the categories of one variable to
determine if the distribution of observed occurrences matches some expected or
theoretical distribution of occurrences. It can be used to determine if some standard or
previously known distribution of proportions is the same as some observed distribution of
proportions. It can also be used to validate the theoretical distribution of occurrences of
phenomena such as random arrivals, which are often assumed to be Poisson distributed.
Note that the degrees of freedom, which are k - 1 for a given set of expected
values or for the uniform distribution, change to k - 2 for an expected Poisson distribution
and to k - 3 for an expected normal distribution. To conduct a chi-square goodness-of-fit
test against an expected Poisson distribution, the value of lambda must be estimated
from the observed data. This causes the loss of an additional degree of freedom. With
the normal distribution, both the mean and standard deviation of the expected distribution
are estimated from the observed values, causing the loss of two additional degrees of
freedom from the k - 1 value.
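
The statistic and the degrees-of-freedom rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `chi_square_gof` and the `estimated_params` argument are assumptions made for this example, not part of the text. The data are the observed and expected frequencies from Problem 12.1.

```python
def chi_square_gof(observed, expected, estimated_params=0):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test.

    estimated_params is the number of parameters estimated from the data:
    0 for given or uniform expected values, 1 for Poisson (lambda),
    2 for normal (mean and standard deviation).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum of (fo - fe)^2 / fe over all categories
    statistic = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    # df = k - 1, less one more for each estimated parameter
    df = len(observed) - 1 - estimated_params
    return statistic, df


# Observed and expected frequencies from Problem 12.1
observed = [53, 37, 32, 28, 18, 15]
expected = [68, 42, 33, 22, 10, 8]
stat, df = chi_square_gof(observed, expected)
print(f"chi-square = {stat:.3f}, df = {df}")
```

The statistic is then compared with the tabled critical value for the chosen alpha and df. Small differences from the hand calculation can appear in the third decimal place because the hand solution sums rounded terms.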
The chi-square test of independence compares the observed frequencies
across the categories of two variables to expected values to determine if the
two variables are independent or not. Of course, if the variables are not independent,
they are dependent or related. This allows business researchers to reach some
conclusions about such questions as whether smoking is independent of gender or whether the type of
housing preferred is independent of geographic region. The chi-square test of independence
is often used as a tool for preliminary analysis of data gathered in exploratory research
where the researcher has little idea of which variables seem to be related to which,
and the data are nominal. This test is particularly useful with demographic data.
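
As a hedged illustration of the test of independence, the sketch below computes each expected frequency as (row total)(column total)/n and then sums the chi-square terms. The function names are hypothetical, chosen for this example only. The table is the industry by transportation mode data from Problem 12.15.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: (row total)(column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, df, expected table) for an r x c table."""
    expected = expected_frequencies(table)
    statistic = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return statistic, df, expected


# Problem 12.15: rows are industries, columns are Air, Train, Truck
table = [[32, 12, 41],
         [5, 6, 24]]
stat, df, expected = chi_square_independence(table)
print(f"chi-square = {stat:.2f}, df = {df}")
```

The decision still requires the tabled critical value for the chosen alpha and df, which this sketch does not compute.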
A word of warning is appropriate here. When an expected frequency is small, the
observed chi-square value can be inordinately large, thus yielding an increased possibility
of committing a Type I error. The research on this problem has yielded varying results,
with some authors indicating that expected values as low as two or three are acceptable
and other researchers demanding that expected values be ten or more. In this text, we
have settled on the fairly widely accepted criterion of five or more.
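
One way to apply the "five or more" criterion is to fold a tail category into its neighbor whenever its expected frequency is too small, as the solutions to Problems 12.3, 12.4, and 12.7 do by hand. The sketch below is a minimal version of that idea under a stated assumption: it only collapses the first and last categories, and the function name `collapse_small_tails` is hypothetical.

```python
def collapse_small_tails(observed, expected, minimum=5.0):
    """Fold end categories into their neighbors while a tail expected count is below minimum."""
    obs, exp_f = list(observed), list(expected)
    # Fold the last category into the one before it while it is too small
    while len(exp_f) > 2 and exp_f[-1] < minimum:
        last_e, last_o = exp_f.pop(), obs.pop()
        exp_f[-1] += last_e
        obs[-1] += last_o
    # Fold the first category into the one after it while it is too small
    while len(exp_f) > 2 and exp_f[0] < minimum:
        first_e, first_o = exp_f.pop(0), obs.pop(0)
        exp_f[0] += first_e
        obs[0] += first_o
    return obs, exp_f


# Problem 12.3: categories 0, 1, 2, and 3 or more
obs, exp_f = collapse_small_tails([28, 17, 11, 5], [24.803, 22.312, 10.047, 3.831])
print(obs, [round(e, 3) for e in exp_f])
```

Remember that k in the degrees-of-freedom formula is the number of categories after collapsing.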

CHAPTER OUTLINE

12.1  Chi-Square Goodness-of-Fit Test
      Testing a Population Proportion Using the Chi-Square Goodness-of-Fit
      Test as an Alternative Technique to the z Test

12.2  Contingency Analysis: Chi-Square Test of Independence

KEY TERMS

Categorical Data
Chi-Square Distribution
Chi-Square Goodness-of-Fit Test
Chi-Square Test of Independence
Contingency Analysis
Contingency Table


SOLUTIONS TO CHAPTER 12

12.1

    fo      fe      (fo - fe)^2 / fe
    53      68          3.309
    37      42          0.595
    32      33          0.030
    28      22          1.636
    18      10          6.400
    15       8          6.125
                       18.095

H0: The observed distribution is the same as the expected distribution.
Ha: The observed distribution is not the same as the expected distribution.

Observed $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 18.095$

$\alpha = .05$, df = k - 1 = 6 - 1 = 5
$\chi^2_{.05,5} = 11.07$

Since the observed $\chi^2 = 18.095 > \chi^2_{.05,5} = 11.07$, the decision is to reject the null
hypothesis.
The observed frequencies are not distributed the same as the expected
frequencies.
12.2

    fo      fe      (fo - fe)^2 / fe
    19      18          0.056
    17      18          0.056
    14      18          0.889
    18      18          0.000
    19      18          0.056
    21      18          0.500
    18      18          0.000
    18      18          0.000
   144     144          1.557

H0: The observed frequencies are uniformly distributed.
Ha: The observed frequencies are not uniformly distributed.

$\bar{x} = \frac{\sum f_o}{k} = \frac{144}{8} = 18$

In this uniform distribution, each fe = 18.

$\alpha = .01$, df = k - 1 = 8 - 1 = 7
$\chi^2_{.01,7} = 18.4753$

Observed $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 1.557$

Since the observed $\chi^2 = 1.557 < \chi^2_{.01,7} = 18.4753$, the decision is to fail to reject
the null hypothesis.
There is no reason to conclude that the frequencies are not uniformly
distributed.

12.3

    Number     fo     (Number)(fo)
      0        28          0
      1        17         17
      2        11         22
      3         5         15
    Total      61         54

H0: The frequency distribution is Poisson.
Ha: The frequency distribution is not Poisson.

$\lambda = \frac{54}{61} \approx 0.9$

    Number     Expected probability     Expected frequency
      0              .4066                   24.803
      1              .3659                   22.312
      2              .1647                   10.047
     ≥3              .0628                    3.831

Since fe for ≥3 is less than 5, collapse categories 2 and ≥3:

    Number     fo       fe       (fo - fe)^2 / fe
      0        28     24.803         0.412
      1        17     22.312         1.265
     ≥2        16     13.878         0.324
    Total      61     60.993         2.001

$\alpha = .05$, df = k - 2 = 3 - 2 = 1
$\chi^2_{.05,1} = 3.84146$

Calculated $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.001$

Since the observed $\chi^2 = 2.001 < \chi^2_{.05,1} = 3.84146$, the decision is to fail to reject
the null hypothesis.
There is insufficient evidence to reject the claim that the distribution is Poisson.
The data are consistent with a Poisson distribution.
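
The expected Poisson frequencies in Problem 12.3 come from a Poisson table with lambda rounded to 0.9. As a rough cross-check, the sketch below computes them directly from the Poisson formula. The function name is an assumption for this example, the last category is treated as open-ended ("this value or more"), and, as in the hand solution, the open-ended category is counted at its lower value when estimating lambda.

```python
import math


def poisson_expected(observed, lam):
    """Expected frequencies for values 0, 1, 2, ..., with the last category open-ended."""
    n = sum(observed)
    # P(X = x) for every category except the last
    probs = [math.exp(-lam) * lam ** x / math.factorial(x)
             for x in range(len(observed) - 1)]
    # The last category takes the remaining upper-tail probability
    probs.append(1.0 - sum(probs))
    return [n * p for p in probs]


observed = [28, 17, 11, 5]  # Problem 12.3: values 0, 1, 2, and 3 or more
lam_hat = sum(x * f for x, f in enumerate(observed)) / sum(observed)  # 54 / 61
expected = poisson_expected(observed, round(lam_hat, 1))  # rounded as in the solution
print([round(e, 3) for e in expected])
```

The results agree with the tabled values to about two decimal places; any remaining difference comes from rounding in the printed table.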

12.4

    Category     fo     Midpoint m       fm         fm^2
    10-20         6        15            90        1,350
    20-30        14        25           350        8,750
    30-40        29        35         1,015       35,525
    40-50        38        45         1,710       76,950
    50-60        25        55         1,375       75,625
    60-70        10        65           650       42,250
    70-80         7        75           525       39,375
    Total       129                   5,715      279,825

$\bar{x} = \frac{\sum fm}{\sum f} = \frac{5{,}715}{129} = 44.3$

$s = \sqrt{\frac{\sum fm^2 - \frac{(\sum fm)^2}{n}}{n - 1}} = \sqrt{\frac{279{,}825 - \frac{(5{,}715)^2}{129}}{128}} = 14.43$

H0: The observed frequencies are normally distributed.
Ha: The observed frequencies are not normally distributed.

For each category, $z = \frac{x - 44.3}{14.43}$ at the lower and upper class limits. The
expected probability is the difference between the two table areas (each measured
from the mean), or their sum for the category that contains the mean.

    Category     z (lower)     z (upper)     Area (lower)     Area (upper)     Expected prob.
    10-20         -2.38         -1.68           .4913            .4535            .0378
    20-30         -1.68         -0.99           .4535            .3389            .1146
    30-40         -0.99         -0.30           .3389            .1179            .2210
    40-50         -0.30          0.40           .1179            .1554            .2733 (sum)
    50-60          0.40          1.09           .1554            .3621            .2067
    60-70          1.09          1.78           .3621            .4625            .1004
    70-80          1.78          2.47           .4625            .4932            .0307

For < 10:
Probability between 10 and the mean, 44.3 = (.0378 + .1146 + .2210 + .1179) = .4913.
Probability < 10 = .5000 - .4913 = .0087

For > 80:
Probability between 80 and the mean, 44.3 = (.0307 + .1004 + .2067 + .1554) = .4932.
Probability > 80 = .5000 - .4932 = .0068

    Category     Prob      Expected frequency
    < 10         .0087     .0087(129) =  1.12
    10-20        .0378     .0378(129) =  4.88
    20-30        .1146                  14.78
    30-40        .2210                  28.51
    40-50        .2733                  35.26
    50-60        .2067                  26.66
    60-70        .1004                  12.95
    70-80        .0307                   3.96
    > 80         .0068                   0.88

Due to the small sizes of expected frequencies, category < 10 is folded into 10-20
and > 80 into 70-80.

    Category     fo       fe       (fo - fe)^2 / fe
    10-20         6      6.00          .000
    20-30        14     14.78          .041
    30-40        29     28.51          .008
    40-50        38     35.26          .213
    50-60        25     26.66          .103
    60-70        10     12.95          .672
    70-80         7      4.84          .964
                                      2.001

Calculated $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.001$

$\alpha = .05$, df = k - 3 = 7 - 3 = 4
$\chi^2_{.05,4} = 9.48773$

Since the observed $\chi^2 = 2.001 < \chi^2_{.05,4} = 9.48773$, the decision is to fail to reject
the null hypothesis. There is not enough evidence to declare that the observed
frequencies are not normally distributed.
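
The expected normal frequencies can also be computed without a z table. The sketch below is an illustration only: it uses the error function from Python's standard library for the normal cumulative probability, and the function names are assumptions for this example. The class boundaries, n, mean, and standard deviation are those of Problem 12.4. Because z is not rounded to two decimals here, the results differ slightly from the table-based values.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability P(X <= x) for a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def normal_expected(boundaries, n, mean, sd):
    """Expected frequencies for: below the first boundary, each interval, above the last."""
    cdf = [normal_cdf(b, mean, sd) for b in boundaries]
    probs = [cdf[0]]                                     # lower tail
    probs += [hi - lo for lo, hi in zip(cdf, cdf[1:])]   # each class interval
    probs.append(1.0 - cdf[-1])                          # upper tail
    return [n * p for p in probs]


boundaries = [10, 20, 30, 40, 50, 60, 70, 80]
expected = normal_expected(boundaries, n=129, mean=44.3, sd=14.43)
for label, fe in zip(["<10", "10-20", "20-30", "30-40", "40-50",
                      "50-60", "60-70", "70-80", ">80"], expected):
    print(f"{label:>6}  {fe:6.2f}")
```

The two tail categories have expected frequencies well below five, which is why the solution folds them into the adjacent classes.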

12.5

    Definition                 fo     Exp. Prop.     fe                     (fo - fe)^2 / fe
    Happiness                  42       .39          227(.39) = 88.53           24.46
    Sales/Profit               95       .12          227(.12) = 27.24          168.55
    Helping Others             27       .18                     40.86            4.70
    Achievement/Challenge      63       .31                     70.37            0.77
    Total                     227                                              198.48

H0: The observed frequencies are distributed the same as the expected
frequencies.
Ha: The observed frequencies are not distributed the same as the expected
frequencies.

Observed $\chi^2 = 198.48$

$\alpha = .05$, df = k - 1 = 4 - 1 = 3
$\chi^2_{.05,3} = 7.81473$

Since the observed $\chi^2 = 198.48 > \chi^2_{.05,3} = 7.81473$, the decision is to reject the
null hypothesis.
The observed frequencies for men are not distributed the same as the
expected frequencies, which are based on the responses of women.

12.6

    Age       fo     Prop. from survey     fe                     (fo - fe)^2 / fe
    10-14     22          .09              (.09)(212) = 19.08          0.45
    15-19     50          .23              (.23)(212) = 48.76          0.03
    20-24     43          .22                           46.64          0.28
    25-29     29          .14                           29.68          0.02
    30-34     19          .10                           21.20          0.23
    ≥ 35      49          .22                           46.64          0.12
    Total    212                                                       1.13

H0: The distribution of observed frequencies is the same as the distribution of
expected frequencies.
Ha: The distribution of observed frequencies is not the same as the distribution of
expected frequencies.

$\alpha = .01$, df = k - 1 = 6 - 1 = 5
$\chi^2_{.01,5} = 15.0863$

The observed $\chi^2 = 1.13$

Since the observed $\chi^2 = 1.13 < \chi^2_{.01,5} = 15.0863$, the decision is to fail to reject
the null hypothesis.
There is not enough evidence to declare that the distribution of observed
frequencies is different from the distribution of expected frequencies.


12.7

    Age        fo     Midpoint m       fm         fm^2
    10-20      16        15           240        3,600
    20-30      44        25         1,100       27,500
    30-40      61        35         2,135       74,725
    40-50      56        45         2,520      113,400
    50-60      35        55         1,925      105,875
    60-70      19        65         1,235       80,275
    Total     231                   9,155      405,375

$\bar{x} = \frac{\sum fm}{n} = \frac{9{,}155}{231} = 39.63$

$s = \sqrt{\frac{\sum fm^2 - \frac{(\sum fm)^2}{n}}{n - 1}} = \sqrt{\frac{405{,}375 - \frac{(9{,}155)^2}{231}}{230}} = 13.6$

H0: The observed frequencies are normally distributed.
Ha: The observed frequencies are not normally distributed.

For each category, $z = \frac{x - 39.63}{13.6}$ at the lower and upper class limits. The
expected probability is the difference between the two table areas (each measured
from the mean), or their sum for the category that contains the mean.

    Category     z (lower)     z (upper)     Area (lower)     Area (upper)     Expected prob.
    10-20         -2.18         -1.44           .4854            .4251            .0603
    20-30         -1.44         -0.71           .4251            .2611            .1640
    30-40         -0.71          0.03           .2611            .0120            .2731 (sum)
    40-50          0.03          0.76           .0120            .2764            .2644
    50-60          0.76          1.50           .2764            .4332            .1568
    60-70          1.50          2.23           .4332            .4871            .0539

For < 10:
Probability between 10 and the mean = .0603 + .1640 + .2611 = .4854
Probability < 10 = .5000 - .4854 = .0146

For > 70:
Probability between 70 and the mean = .0120 + .2644 + .1568 + .0539 = .4871
Probability > 70 = .5000 - .4871 = .0129

    Age       Probability     fe
    < 10        .0146         (.0146)(231) =  3.37
    10-20       .0603         (.0603)(231) = 13.93
    20-30       .1640                        37.88
    30-40       .2731                        63.09
    40-50       .2644                        61.08
    50-60       .1568                        36.22
    60-70       .0539                        12.45
    > 70        .0129                         2.98

The expected frequencies for categories < 10 and > 70 are less than 5.
Collapse < 10 into 10-20 and > 70 into 60-70.

    Age        fo       fe       (fo - fe)^2 / fe
    10-20      16     17.30          0.10
    20-30      44     37.88          0.99
    30-40      61     63.09          0.07
    40-50      56     61.08          0.42
    50-60      35     36.22          0.04
    60-70      19     15.43          0.83
                                     2.45

$\alpha = .05$, df = k - 3 = 6 - 3 = 3
$\chi^2_{.05,3} = 7.81473$

Observed $\chi^2 = 2.45$

Since the observed $\chi^2 = 2.45 < \chi^2_{.05,3} = 7.81473$, the decision is to fail to reject the null
hypothesis.
There is no reason to reject the claim that the observed frequencies are normally
distributed.
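
The grouped-data mean and standard deviation used to set up the normal goodness-of-fit test can be checked with a short sketch. The function name `grouped_mean_sd` is an assumption for this example. The class midpoints and frequencies are those of Problem 12.7.

```python
import math


def grouped_mean_sd(midpoints, frequencies):
    """Mean and sample standard deviation estimated from grouped data."""
    n = sum(frequencies)
    sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))
    sum_fm2 = sum(f * m * m for f, m in zip(frequencies, midpoints))
    mean = sum_fm / n
    # Computational formula: sqrt((sum fm^2 - (sum fm)^2 / n) / (n - 1))
    sd = math.sqrt((sum_fm2 - sum_fm ** 2 / n) / (n - 1))
    return mean, sd


midpoints = [15, 25, 35, 45, 55, 65]
frequencies = [16, 44, 61, 56, 35, 19]
mean, sd = grouped_mean_sd(midpoints, frequencies)
print(f"mean = {mean:.2f}, s = {sd:.2f}")
```

Because both values are estimated from the sample, the test loses two additional degrees of freedom (df = k - 3).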

12.8

    Number        f      (f)(number)
    0            18           0
    1            28          28
    2            47          94
    3            21          63
    4            16          64
    5            11          55
    6 or more     9          54
    Total       150         358

$\lambda = \frac{\sum (f)(\text{number})}{\sum f} = \frac{358}{150} \approx 2.4$

H0: The observed frequencies are Poisson distributed.
Ha: The observed frequencies are not Poisson distributed.

    Number        Probability     fe
    0               .0907         (.0907)(150) = 13.61
    1               .2177         (.2177)(150) = 32.66
    2               .2613                        39.20
    3               .2090                        31.35
    4               .1254                        18.81
    5               .0602                         9.03
    6 or more       .0358                         5.36

    fo       fe       (fo - fe)^2 / fe
    18     13.61          1.42
    28     32.66          0.66
    47     39.20          1.55
    21     31.35          3.42
    16     18.81          0.42
    11      9.03          0.43
     9      5.36          2.47
                         10.37

The observed $\chi^2 = 10.37$

$\alpha = .01$, df = k - 2 = 7 - 2 = 5
$\chi^2_{.01,5} = 15.0863$

Since the observed $\chi^2 = 10.37 < \chi^2_{.01,5} = 15.0863$, the decision is to fail to reject
the null hypothesis.
There is not enough evidence to reject the claim that the observed
frequencies are Poisson distributed.

12.9

H0: $p = .28$
Ha: $p \ne .28$

n = 270, x = 62

                        fo      fe                     (fo - fe)^2 / fe
    Spend More           62     270(.28) =  75.6           2.44656
    Don't Spend More    208     270(.72) = 194.4           0.95144
    Total               270                270.0           3.39800

The observed value of $\chi^2$ is 3.398

$\alpha = .05$ and $\alpha/2 = .025$

df = k - 1 = 2 - 1 = 1

$\chi^2_{.025,1} = 5.02389$

Since the observed $\chi^2 = 3.398 < \chi^2_{.025,1} = 5.02389$, the decision is to fail to
reject the null hypothesis.
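
The chapter outline presents this chi-square test of a population proportion as an alternative to the z test. The sketch below, using the Problem 12.9 data, illustrates why the two are interchangeable: with two categories, the chi-square statistic equals the square of the z statistic. The function names are hypothetical, chosen for this example.

```python
import math


def proportion_chi_square(x, n, p0):
    """Chi-square goodness-of-fit statistic for a two-category (yes/no) variable."""
    observed = [x, n - x]
    expected = [n * p0, n * (1 - p0)]
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def proportion_z(x, n, p0):
    """z statistic for testing H0: p = p0 with sample proportion x / n."""
    return (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)


chi2 = proportion_chi_square(62, 270, 0.28)
z = proportion_z(62, 270, 0.28)
print(f"chi-square = {chi2:.3f}, z = {z:.3f}, z squared = {z * z:.3f}")
```

Algebraically, both statistics reduce to $(x - np_0)^2 / (np_0(1 - p_0))$, so the agreement is exact apart from floating-point rounding.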

12.10

H0: $p = .30$
Ha: $p \ne .30$

n = 180, x = 42

                     fo      fe                   (fo - fe)^2 / fe
    Provide           42     180(.30) =  54           2.6667
    Don't Provide    138     180(.70) = 126           1.1429
    Total            180                180           3.8095

The observed value of $\chi^2$ is 3.8095

$\alpha = .05$ and $\alpha/2 = .025$

df = k - 1 = 2 - 1 = 1

$\chi^2_{.025,1} = 5.02389$

Since the observed $\chi^2 = 3.8095 < \chi^2_{.025,1} = 5.02389$, the decision is to fail to
reject the null hypothesis.


12.11

                      Variable Two
    Variable One      203      326      529
                       68      110      178
                      271      436      707

H0: Variable One is independent of Variable Two.
Ha: Variable One is not independent of Variable Two.

$e_{11} = \frac{(529)(271)}{707} = 202.77$        $e_{12} = \frac{(529)(436)}{707} = 326.23$

$e_{21} = \frac{(271)(178)}{707} = 68.23$         $e_{22} = \frac{(436)(178)}{707} = 109.77$

Observed frequencies with expected frequencies in parentheses:

                      Variable Two
    Variable One      203 (202.77)     326 (326.23)     529
                       68  (68.23)     110 (109.77)     178
                      271              436              707

$$\chi^2 = \frac{(203 - 202.77)^2}{202.77} + \frac{(326 - 326.23)^2}{326.23} + \frac{(68 - 68.23)^2}{68.23} + \frac{(110 - 109.77)^2}{109.77}$$

= .00 + .00 + .00 + .00 = 0.00

$\alpha = .05$, df = (c-1)(r-1) = (2-1)(2-1) = 1
$\chi^2_{.05,1} = 3.84146$

Since the observed $\chi^2 = 0.00 < \chi^2_{.05,1} = 3.84146$, the decision is to fail to reject
the null hypothesis.
There is no evidence that Variable One is not independent of Variable Two.


12.12

                      Variable Two
    Variable One       24      13      47      58      142
                       93      59     187     244      583
                      117      72     234     302      725

H0: Variable One is independent of Variable Two.
Ha: Variable One is not independent of Variable Two.

$e_{11} = \frac{(142)(117)}{725} = 22.92$        $e_{12} = \frac{(142)(72)}{725} = 14.10$

$e_{13} = \frac{(142)(234)}{725} = 45.83$        $e_{14} = \frac{(142)(302)}{725} = 59.15$

$e_{21} = \frac{(583)(117)}{725} = 94.08$        $e_{22} = \frac{(583)(72)}{725} = 57.90$

$e_{23} = \frac{(583)(234)}{725} = 188.17$       $e_{24} = \frac{(583)(302)}{725} = 242.85$

Observed frequencies with expected frequencies in parentheses:

                      Variable Two
    Variable One      24 (22.92)     13 (14.10)      47  (45.83)      58  (59.15)     142
                      93 (94.08)     59 (57.90)     187 (188.17)     244 (242.85)     583
                     117             72             234              302              725

$$\chi^2 = \frac{(24 - 22.92)^2}{22.92} + \frac{(13 - 14.10)^2}{14.10} + \frac{(47 - 45.83)^2}{45.83} + \frac{(58 - 59.15)^2}{59.15}
+ \frac{(93 - 94.08)^2}{94.08} + \frac{(59 - 57.90)^2}{57.90} + \frac{(187 - 188.17)^2}{188.17} + \frac{(244 - 242.85)^2}{242.85}$$

= .05 + .09 + .03 + .02 + .01 + .02 + .01 + .01 = 0.24

$\alpha = .01$, df = (c-1)(r-1) = (4-1)(2-1) = 3
$\chi^2_{.01,3} = 11.3449$

Since the observed $\chi^2 = 0.24 < \chi^2_{.01,3} = 11.3449$, the decision is to fail to
reject the null hypothesis.
There is no evidence that Variable One is not independent of Variable Two.

12.13

                              Social Class
    Number of Children     Lower     Middle     Upper     Total
    0                         7        18          6        31
    1                         9        38         23        70
    2 or 3                   34        97         58       189
    > 3                      47        31         30       108
    Total                    97       184        117       398

H0: Social Class is independent of Number of Children.
Ha: Social Class is not independent of Number of Children.

$e_{11} = \frac{(31)(97)}{398} = 7.56$          $e_{31} = \frac{(189)(97)}{398} = 46.06$

$e_{12} = \frac{(31)(184)}{398} = 14.33$        $e_{32} = \frac{(189)(184)}{398} = 87.38$

$e_{13} = \frac{(31)(117)}{398} = 9.11$         $e_{33} = \frac{(189)(117)}{398} = 55.56$

$e_{21} = \frac{(70)(97)}{398} = 17.06$         $e_{41} = \frac{(108)(97)}{398} = 26.32$

$e_{22} = \frac{(70)(184)}{398} = 32.36$        $e_{42} = \frac{(108)(184)}{398} = 49.93$

$e_{23} = \frac{(70)(117)}{398} = 20.58$        $e_{43} = \frac{(108)(117)}{398} = 31.75$

Observed frequencies with expected frequencies in parentheses:

                              Social Class
    Number of Children     Lower          Middle         Upper          Total
    0                       7  (7.56)     18 (14.33)      6  (9.11)       31
    1                       9 (17.06)     38 (32.36)     23 (20.58)       70
    2 or 3                 34 (46.06)     97 (87.38)     58 (55.56)      189
    > 3                    47 (26.32)     31 (49.93)     30 (31.75)      108
    Total                  97            184            117              398

$$\chi^2 = \frac{(7 - 7.56)^2}{7.56} + \frac{(18 - 14.33)^2}{14.33} + \frac{(6 - 9.11)^2}{9.11} + \frac{(9 - 17.06)^2}{17.06}
+ \frac{(38 - 32.36)^2}{32.36} + \frac{(23 - 20.58)^2}{20.58} + \frac{(34 - 46.06)^2}{46.06} + \frac{(97 - 87.38)^2}{87.38}
+ \frac{(58 - 55.56)^2}{55.56} + \frac{(47 - 26.32)^2}{26.32} + \frac{(31 - 49.93)^2}{49.93} + \frac{(30 - 31.75)^2}{31.75}$$

= .04 + .94 + 1.06 + 3.81 + .98 + .28 + 3.16 + 1.06 + .11 + 16.25 +
7.18 + .10 = 34.97

$\alpha = .05$, df = (c-1)(r-1) = (3-1)(4-1) = 6
$\chi^2_{.05,6} = 12.5916$

Since the observed $\chi^2 = 34.97 > \chi^2_{.05,6} = 12.5916$, the decision is to reject the
null hypothesis.
Number of children is not independent of social class.


12.14

                    Type of Music Preferred
    Region     Rock     R&B     Country     Classical     Total
    NE          140      32         5           18          195
    S           134      41        52            8          235
    W           154      27         8           13          202
    Total       428     100        65           39          632

H0: Type of music preferred is independent of region.
Ha: Type of music preferred is not independent of region.

$e_{11} = \frac{(195)(428)}{632} = 132.06$       $e_{23} = \frac{(235)(65)}{632} = 24.17$

$e_{12} = \frac{(195)(100)}{632} = 30.85$        $e_{24} = \frac{(235)(39)}{632} = 14.50$

$e_{13} = \frac{(195)(65)}{632} = 20.06$         $e_{31} = \frac{(202)(428)}{632} = 136.80$

$e_{14} = \frac{(195)(39)}{632} = 12.03$         $e_{32} = \frac{(202)(100)}{632} = 31.96$

$e_{21} = \frac{(235)(428)}{632} = 159.15$       $e_{33} = \frac{(202)(65)}{632} = 20.78$

$e_{22} = \frac{(235)(100)}{632} = 37.18$        $e_{34} = \frac{(202)(39)}{632} = 12.47$

Observed frequencies with expected frequencies in parentheses:

                    Type of Music Preferred
    Region     Rock             R&B            Country        Classical      Total
    NE         140 (132.06)     32 (30.85)      5 (20.06)     18 (12.03)      195
    S          134 (159.15)     41 (37.18)     52 (24.17)      8 (14.50)      235
    W          154 (136.80)     27 (31.96)      8 (20.78)     13 (12.47)      202
    Total      428             100             65             39              632

$$\chi^2 = \frac{(140 - 132.06)^2}{132.06} + \frac{(32 - 30.85)^2}{30.85} + \frac{(5 - 20.06)^2}{20.06} + \frac{(18 - 12.03)^2}{12.03}
+ \frac{(134 - 159.15)^2}{159.15} + \frac{(41 - 37.18)^2}{37.18} + \frac{(52 - 24.17)^2}{24.17} + \frac{(8 - 14.50)^2}{14.50}
+ \frac{(154 - 136.80)^2}{136.80} + \frac{(27 - 31.96)^2}{31.96} + \frac{(8 - 20.78)^2}{20.78} + \frac{(13 - 12.47)^2}{12.47}$$

= .48 + .04 + 11.31 + 2.96 + 3.97 + .39 + 32.04 + 2.91 + 2.16 + .77 +
7.86 + .02 = 64.91

$\alpha = .01$, df = (c-1)(r-1) = (4-1)(3-1) = 6
$\chi^2_{.01,6} = 16.8119$

Since the observed $\chi^2 = 64.91 > \chi^2_{.01,6} = 16.8119$, the decision is to
reject the null hypothesis.
Type of music preferred is not independent of region of the country.

12.15

                     Transportation Mode
    Industry         Air     Train     Truck     Total
    Publishing        32       12        41        85
    Comp. Hard.        5        6        24        35
    Total             37       18        65       120

H0: Transportation Mode is independent of Industry.
Ha: Transportation Mode is not independent of Industry.

$e_{11} = \frac{(85)(37)}{120} = 26.21$        $e_{21} = \frac{(35)(37)}{120} = 10.79$

$e_{12} = \frac{(85)(18)}{120} = 12.75$        $e_{22} = \frac{(35)(18)}{120} = 5.25$

$e_{13} = \frac{(85)(65)}{120} = 46.04$        $e_{23} = \frac{(35)(65)}{120} = 18.96$

Observed frequencies with expected frequencies in parentheses:

                     Transportation Mode
    Industry         Air            Train          Truck          Total
    Publishing       32 (26.21)     12 (12.75)     41 (46.04)       85
    Comp. Hard.       5 (10.79)      6  (5.25)     24 (18.96)       35
    Total            37             18             65              120

$$\chi^2 = \frac{(32 - 26.21)^2}{26.21} + \frac{(12 - 12.75)^2}{12.75} + \frac{(41 - 46.04)^2}{46.04}
+ \frac{(5 - 10.79)^2}{10.79} + \frac{(6 - 5.25)^2}{5.25} + \frac{(24 - 18.96)^2}{18.96}$$

= 1.28 + .04 + .55 + 3.11 + .11 + 1.34 = 6.43

$\alpha = .05$, df = (c-1)(r-1) = (3-1)(2-1) = 2
$\chi^2_{.05,2} = 5.99147$

Since the observed $\chi^2 = 6.43 > \chi^2_{.05,2} = 5.99147$, the decision is to
reject the null hypothesis.
Transportation mode is not independent of industry.

12.16

                          Number of Bedrooms
    Number of Stories     ≤ 2       3       ≥ 4     Total
    1                     116     101        57      274
    2                      90     325       160      575
    Total                 206     426       217      849

H0: Number of Stories is independent of number of bedrooms.
Ha: Number of Stories is not independent of number of bedrooms.

$e_{11} = \frac{(274)(206)}{849} = 66.48$         $e_{21} = \frac{(575)(206)}{849} = 139.52$

$e_{12} = \frac{(274)(426)}{849} = 137.48$        $e_{22} = \frac{(575)(426)}{849} = 288.52$

$e_{13} = \frac{(274)(217)}{849} = 70.03$         $e_{23} = \frac{(575)(217)}{849} = 146.97$

$$\chi^2 = \frac{(116 - 66.48)^2}{66.48} + \frac{(101 - 137.48)^2}{137.48} + \frac{(57 - 70.03)^2}{70.03}
+ \frac{(90 - 139.52)^2}{139.52} + \frac{(325 - 288.52)^2}{288.52} + \frac{(160 - 146.97)^2}{146.97}$$

$\chi^2$ = 36.89 + 9.68 + 2.42 + 17.58 + 4.61 + 1.16 = 72.34

$\alpha = .10$, df = (c-1)(r-1) = (3-1)(2-1) = 2
$\chi^2_{.10,2} = 4.60517$

Since the observed $\chi^2 = 72.34 > \chi^2_{.10,2} = 4.60517$, the decision is to
reject the null hypothesis.
Number of stories is not independent of number of bedrooms.

12.17

                      Mexican Citizens
    Type of Store     Yes     No     Total
    Dept.              24     17       41
    Disc.              20     15       35
    Hard.              11     19       30
    Shoe               32     28       60
    Total              87     79      166

H0: Citizenship is independent of store type.
Ha: Citizenship is not independent of store type.

$e_{11} = \frac{(41)(87)}{166} = 21.49$        $e_{31} = \frac{(30)(87)}{166} = 15.72$

$e_{12} = \frac{(41)(79)}{166} = 19.51$        $e_{32} = \frac{(30)(79)}{166} = 14.28$

$e_{21} = \frac{(35)(87)}{166} = 18.34$        $e_{41} = \frac{(60)(87)}{166} = 31.45$

$e_{22} = \frac{(35)(79)}{166} = 16.66$        $e_{42} = \frac{(60)(79)}{166} = 28.55$

Observed frequencies with expected frequencies in parentheses:

                      Mexican Citizens
    Type of Store     Yes            No             Total
    Dept.             24 (21.49)     17 (19.51)       41
    Disc.             20 (18.34)     15 (16.66)       35
    Hard.             11 (15.72)     19 (14.28)       30
    Shoe              32 (31.45)     28 (28.55)       60
    Total             87             79              166

$$\chi^2 = \frac{(24 - 21.49)^2}{21.49} + \frac{(17 - 19.51)^2}{19.51} + \frac{(20 - 18.34)^2}{18.34} + \frac{(15 - 16.66)^2}{16.66}
+ \frac{(11 - 15.72)^2}{15.72} + \frac{(19 - 14.28)^2}{14.28} + \frac{(32 - 31.45)^2}{31.45} + \frac{(28 - 28.55)^2}{28.55}$$

= .29 + .32 + .15 + .17 + 1.42 + 1.56 + .01 + .01 = 3.93

$\alpha = .05$, df = (c-1)(r-1) = (2-1)(4-1) = 3
$\chi^2_{.05,3} = 7.81473$

Since the observed $\chi^2 = 3.93 < \chi^2_{.05,3} = 7.81473$, the decision is to fail to
reject the null hypothesis.
There is not enough evidence to conclude that citizenship depends on type of store.


12.18

$\alpha = .01$, k = 7, df = k - 1 = 6

H0: The observed distribution is the same as the expected distribution.
Ha: The observed distribution is not the same as the expected distribution.

Use: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

Critical $\chi^2_{.01,6} = 16.8119$

    fo      fe      (fo - fe)^2     (fo - fe)^2 / fe
    214     206          64              0.311
    235     232           9              0.039
    279     268         121              0.451
    281     284           9              0.032
    264     268          16              0.060
    254     232         484              2.086
    211     206          25              0.121
                                         3.100

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 3.100$

Since the observed value of $\chi^2 = 3.100 < \chi^2_{.01,6} = 16.8119$, the decision is to fail to
reject the null hypothesis. There is not enough evidence to conclude that the observed
distribution differs from the expected distribution.

12.19

                    Variable 2
    Variable 1      12     23     21      56
                     8     17     20      45
                     7     11     18      36
                    27     51     59     137

$e_{11} = 11.04$     $e_{12} = 20.85$     $e_{13} = 24.12$

$e_{21} = 8.87$      $e_{22} = 16.75$     $e_{23} = 19.38$

$e_{31} = 7.09$      $e_{32} = 13.40$     $e_{33} = 15.50$

$$\chi^2 = \frac{(12 - 11.04)^2}{11.04} + \frac{(23 - 20.85)^2}{20.85} + \frac{(21 - 24.12)^2}{24.12}
+ \frac{(8 - 8.87)^2}{8.87} + \frac{(17 - 16.75)^2}{16.75} + \frac{(20 - 19.38)^2}{19.38}
+ \frac{(7 - 7.09)^2}{7.09} + \frac{(11 - 13.40)^2}{13.40} + \frac{(18 - 15.50)^2}{15.50}$$

= .084 + .222 + .403 + .085 + .004 + .020 + .001 + .430 + .402 = 1.652

$\alpha = .05$, df = (c-1)(r-1) = (2)(2) = 4
$\chi^2_{.05,4} = 9.48773$

Since the observed value of $\chi^2 = 1.652 < \chi^2_{.05,4} = 9.48773$, the decision is to fail
to reject the null hypothesis.

12.20

                      Location
    Customer          NE       W       S     Total
    Industrial       230     115      68      413
    Retail           185     143      89      417
    Total            415     258     157      830

$e_{11} = \frac{(413)(415)}{830} = 206.5$         $e_{21} = \frac{(417)(415)}{830} = 208.5$

$e_{12} = \frac{(413)(258)}{830} = 128.38$        $e_{22} = \frac{(417)(258)}{830} = 129.62$

$e_{13} = \frac{(413)(157)}{830} = 78.12$         $e_{23} = \frac{(417)(157)}{830} = 78.88$

Observed frequencies with expected frequencies in parentheses:

                      Location
    Customer          NE              W                S              Total
    Industrial        230 (206.5)     115 (128.38)     68 (78.12)      413
    Retail            185 (208.5)     143 (129.62)     89 (78.88)      417
    Total             415             258             157              830

$$\chi^2 = \frac{(230 - 206.5)^2}{206.5} + \frac{(115 - 128.38)^2}{128.38} + \frac{(68 - 78.12)^2}{78.12}
+ \frac{(185 - 208.5)^2}{208.5} + \frac{(143 - 129.62)^2}{129.62} + \frac{(89 - 78.88)^2}{78.88}$$

= 2.67 + 1.39 + 1.31 + 2.65 + 1.38 + 1.30 = 10.70

$\alpha = .10$ and df = (c - 1)(r - 1) = (3 - 1)(2 - 1) = 2
$\chi^2_{.10,2} = 4.60517$

Since the observed $\chi^2 = 10.70 > \chi^2_{.10,2} = 4.60517$, the decision is to reject the
null hypothesis.
Type of customer is not independent of geographic region.

12.21

    Cookie Type          fo
    Chocolate Chip       189
    Peanut Butter        168
    Cheese Cracker       155
    Lemon Flavored       161
    Chocolate Mint       216
    Vanilla Filled       165
    Total              1,054

H0: Cookie sales are uniformly distributed across kinds of cookie.
Ha: Cookie sales are not uniformly distributed across kinds of cookie.

If cookie sales are uniformly distributed, then

$f_e = \frac{\sum f_o}{\text{no. of kinds}} = \frac{1{,}054}{6} = 175.67$

    fo       fe        (fo - fe)^2 / fe
    189     175.67         1.01
    168     175.67         0.33
    155     175.67         2.43
    161     175.67         1.23
    216     175.67         9.26
    165     175.67         0.65
                          14.91

The observed $\chi^2 = 14.91$

$\alpha = .05$, df = k - 1 = 6 - 1 = 5
$\chi^2_{.05,5} = 11.0705$

Since the observed $\chi^2 = 14.91 > \chi^2_{.05,5} = 11.0705$, the decision is to reject the
null hypothesis.
Cookie sales are not uniformly distributed by kind of cookie.

12.22

                    Gender
    Bought Car      M          F          Total
    Y                207         65         272
    N                811        984       1,795
    Total          1,018      1,049       2,067

H0: Purchasing a car or not is independent of gender.
Ha: Purchasing a car or not is not independent of gender.

$e_{11} = \frac{(272)(1{,}018)}{2{,}067} = 133.96$         $e_{12} = \frac{(272)(1{,}049)}{2{,}067} = 138.04$

$e_{21} = \frac{(1{,}795)(1{,}018)}{2{,}067} = 884.04$     $e_{22} = \frac{(1{,}795)(1{,}049)}{2{,}067} = 910.96$

Observed frequencies with expected frequencies in parentheses:

                    Gender
    Bought Car      M                F                Total
    Y               207 (133.96)      65 (138.04)       272
    N               811 (884.04)     984 (910.96)     1,795
    Total         1,018            1,049              2,067

$$\chi^2 = \frac{(207 - 133.96)^2}{133.96} + \frac{(65 - 138.04)^2}{138.04} + \frac{(811 - 884.04)^2}{884.04} + \frac{(984 - 910.96)^2}{910.96}$$

= 39.82 + 38.65 + 6.03 + 5.86 = 90.36

$\alpha = .05$, df = (c-1)(r-1) = (2-1)(2-1) = 1
$\chi^2_{.05,1} = 3.841$

Since the observed $\chi^2 = 90.36 > \chi^2_{.05,1} = 3.841$, the decision is to reject the
null hypothesis.
Purchasing a car is not independent of gender.

12.23

    Arrivals     fo     (fo)(Arrivals)
    0            26          0
    1            40         40
    2            57        114
    3            32         96
    4            17         68
    5            12         60
    6             8         48
    Total       192        426

$\lambda = \frac{\sum (f_o)(\text{arrivals})}{\sum f_o} = \frac{426}{192} \approx 2.2$

H0: The observed frequencies are Poisson distributed.
Ha: The observed frequencies are not Poisson distributed.

    Arrivals     Probability     fe
    0              .1108         (.1108)(192) = 21.27
    1              .2438         (.2438)(192) = 46.81
    2              .2681                        51.48
    3              .1966                        37.75
    4              .1082                        20.77
    5              .0476                         9.14
    6              .0249                         4.78

    fo       fe       (fo - fe)^2 / fe
    26     21.27          1.05
    40     46.81          0.99
    57     51.48          0.59
    32     37.75          0.88
    17     20.77          0.68
    12      9.14          0.89
     8      4.78          2.17
                          7.25

Observed $\chi^2 = 7.25$

$\alpha = .05$, df = k - 2 = 7 - 2 = 5
$\chi^2_{.05,5} = 11.0705$

Since the observed $\chi^2 = 7.25 < \chi^2_{.05,5} = 11.0705$, the decision is to fail to reject
the null hypothesis. There is not enough evidence to reject the claim that the
observed frequency of arrivals is Poisson distributed.


12.24

H0: The distribution of observed frequencies is the same as the
distribution of expected frequencies.
Ha: The distribution of observed frequencies is not the same as the distribution of
expected frequencies.

    Soft Drink        fo      Proportion     fe                         (fo - fe)^2 / fe
    Classic Coke      361       .206         (.206)(1726) = 355.56           0.08
    Pepsi             272       .145         (.145)(1726) = 250.27           1.89
    Diet Coke         192       .085                        146.71          13.98
    Mt. Dew           121       .063                        108.74           1.38
    Dr. Pepper         94       .059                        101.83           0.60
    Sprite            102       .062                        107.01           0.23
    Others            584       .380                        655.86           7.87
    Total           1,726                                                   26.03

Calculated $\chi^2 = 26.03$

$\alpha = .05$, df = k - 1 = 7 - 1 = 6
$\chi^2_{.05,6} = 12.5916$

Since the observed $\chi^2 = 26.03 > \chi^2_{.05,6} = 12.5916$, the decision is to reject the
null hypothesis.
The observed frequencies are not distributed the same as the expected frequencies
from the national poll.

12.25

                 Position
    Years        Manager     Programmer     Operator     Systems Analyst     Total
    0-3              6           37            11              13              67
    4-8             28           16            23              24              91
    > 8             47           10            12              19              88
    Total           81           63            46              56             246

$e_{11} = \frac{(67)(81)}{246} = 22.06$        $e_{23} = \frac{(91)(46)}{246} = 17.02$

$e_{12} = \frac{(67)(63)}{246} = 17.16$        $e_{24} = \frac{(91)(56)}{246} = 20.72$

$e_{13} = \frac{(67)(46)}{246} = 12.53$        $e_{31} = \frac{(88)(81)}{246} = 28.98$

$e_{14} = \frac{(67)(56)}{246} = 15.25$        $e_{32} = \frac{(88)(63)}{246} = 22.54$

$e_{21} = \frac{(91)(81)}{246} = 29.96$        $e_{33} = \frac{(88)(46)}{246} = 16.46$

$e_{22} = \frac{(91)(63)}{246} = 23.30$        $e_{34} = \frac{(88)(56)}{246} = 20.03$

Observed frequencies with expected frequencies in parentheses:

                 Position
    Years        Manager        Programmer     Operator       Systems Analyst     Total
    0-3           6 (22.06)     37 (17.16)     11 (12.53)       13 (15.25)          67
    4-8          28 (29.96)     16 (23.30)     23 (17.02)       24 (20.72)          91
    > 8          47 (28.98)     10 (22.54)     12 (16.46)       19 (20.03)          88
    Total        81             63             46               56                 246

$$\chi^2 = \frac{(6 - 22.06)^2}{22.06} + \frac{(37 - 17.16)^2}{17.16} + \frac{(11 - 12.53)^2}{12.53} + \frac{(13 - 15.25)^2}{15.25}
+ \frac{(28 - 29.96)^2}{29.96} + \frac{(16 - 23.30)^2}{23.30} + \frac{(23 - 17.02)^2}{17.02} + \frac{(24 - 20.72)^2}{20.72}
+ \frac{(47 - 28.98)^2}{28.98} + \frac{(10 - 22.54)^2}{22.54} + \frac{(12 - 16.46)^2}{16.46} + \frac{(19 - 20.03)^2}{20.03}$$

= 11.69 + 22.94 + .19 + .33 + .13 + 2.29 + 2.10 + .52 + 11.20 + 6.98 +
1.21 + .05 = 59.63

$\alpha = .01$, df = (c-1)(r-1) = (4-1)(3-1) = 6
$\chi^2_{.01,6} = 16.8119$

Since the observed $\chi^2 = 59.63 > \chi^2_{.01,6} = 16.8119$, the decision is to reject the
null hypothesis. Position is not independent of number of years of experience.

12.26

H0: $p = .43$
Ha: $p \ne .43$

n = 315, x = 120

                                   fo      fe                        (fo - fe)^2 / fe
    More Work, More Business      120     (.43)(315) = 135.45            1.76
    Others                        195     (.57)(315) = 179.55            1.33
    Total                         315                  315.00            3.09

The calculated value of $\chi^2$ is 3.09

$\alpha = .05$ and $\alpha/2 = .025$

df = k - 1 = 2 - 1 = 1

$\chi^2_{.025,1} = 5.02389$

Since $\chi^2 = 3.09 < \chi^2_{.025,1} = 5.02389$, the decision is to fail to reject the null
hypothesis.

12.27

                              Type of College or University
    Number of Children     Community College     Large University     Small College     Total
    0                             25                   178                  31            234
    1                             49                   141                  12            202
    2                             31                    54                   8             93
    ≥ 3                           22                    14                   6             42
    Total                        127                   387                  57            571

H0: Number of Children is independent of Type of College or University.
Ha: Number of Children is not independent of Type of College or University.

$e_{11} = \frac{(234)(127)}{571} = 52.05$         $e_{31} = \frac{(93)(127)}{571} = 20.68$

$e_{12} = \frac{(234)(387)}{571} = 158.60$        $e_{32} = \frac{(93)(387)}{571} = 63.03$

$e_{13} = \frac{(234)(57)}{571} = 23.36$          $e_{33} = \frac{(93)(57)}{571} = 9.28$

$e_{21} = \frac{(202)(127)}{571} = 44.93$         $e_{41} = \frac{(42)(127)}{571} = 9.34$

$e_{22} = \frac{(202)(387)}{571} = 136.91$        $e_{42} = \frac{(42)(387)}{571} = 28.47$

$e_{23} = \frac{(202)(57)}{571} = 20.16$          $e_{43} = \frac{(42)(57)}{571} = 4.19$

Observed frequencies with expected frequencies in parentheses:

                              Type of College or University
    Number of Children     Community College     Large University     Small College     Total
    0                         25 (52.05)           178 (158.60)         31 (23.36)        234
    1                         49 (44.93)           141 (136.91)         12 (20.16)        202
    2                         31 (20.68)            54  (63.03)          8  (9.28)         93
    ≥ 3                       22  (9.34)            14  (28.47)          6  (4.19)         42
    Total                    127                   387                  57                571

$$\chi^2 = \frac{(25 - 52.05)^2}{52.05} + \frac{(178 - 158.60)^2}{158.60} + \frac{(31 - 23.36)^2}{23.36} + \frac{(49 - 44.93)^2}{44.93}
+ \frac{(141 - 136.91)^2}{136.91} + \frac{(12 - 20.16)^2}{20.16} + \frac{(31 - 20.68)^2}{20.68} + \frac{(54 - 63.03)^2}{63.03}
+ \frac{(8 - 9.28)^2}{9.28} + \frac{(22 - 9.34)^2}{9.34} + \frac{(14 - 28.47)^2}{28.47} + \frac{(6 - 4.19)^2}{4.19}$$

= 14.06 + 2.37 + 2.50 + 0.37 + 0.12 + 3.30 + 5.15 + 1.29 + 0.18 +
17.16 + 7.35 + 0.78 = 54.63

$\alpha = .05$, df = (c - 1)(r - 1) = (3 - 1)(4 - 1) = 6
$\chi^2_{.05,6} = 12.5916$

Since the observed $\chi^2 = 54.63 > \chi^2_{.05,6} = 12.5916$, the decision is to reject the
null hypothesis.
Number of children is not independent of type of College or University.

12.28 The observed chi-square is 30.18 with a p-value of .0000043. The chi-square
goodness-of-fit test indicates that there is a significant difference between the
observed frequencies and the expected frequencies. The distribution of responses
to the question is not the same for adults between 21 and 30 years of age as it
is for others. Marketing and sales people might reorient their efforts aimed at
21- to 30-year-olds away from home improvement and pay more attention to leisure
travel/vacation, clothing, and home entertainment.

12.29 The observed chi-square value for this test of independence is 5.366. The
associated p-value of .252 indicates failure to reject the null hypothesis. There is
not enough evidence here to say that color choice is dependent upon gender.
Automobile marketing people do not have to worry about which colors especially
appeal to men or to women because these data give no evidence that car color
depends on gender. In addition, design and production people can determine car
color quotas based on other variables.
