
13.4 Test of Independence: Contingency Tables


Motivating Example:

Objective:
We want to determine whether beer preference is independent of the gender of the beer drinker. We want to test

$H_0$: Beer preference is independent of gender
vs.
$H_a$: Beer preference is not independent of gender

at level of significance $\alpha = 0.05$.
We have the following data:

                          Beer Preference
Gender        Light              Regular            Dark               Total       Proportion
Male          $f_{11} = 20$      $f_{12} = 40$      $f_{13} = 20$      80          $p_{r1} = 8/15$
Female        $f_{21} = 30$      $f_{22} = 30$      $f_{23} = 10$      70          $p_{r2} = 7/15$
Total         50                 70                 30                 $n = 150$
Proportion    $p_{c1} = 5/15$    $p_{c2} = 7/15$    $p_{c3} = 3/15$

The above table is called a contingency table.


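If $H_0$ is true, then the expected numbers under $H_0$ are

the expected number (Male, Light) $= e_{11} = 80 \cdot \frac{5}{15} = 150 \cdot \frac{8}{15} \cdot \frac{5}{15} = n\, p_{r1}\, p_{c1}$,

the expected number (Male, Regular) $= e_{12} = 80 \cdot \frac{7}{15} = 150 \cdot \frac{8}{15} \cdot \frac{7}{15} = n\, p_{r1}\, p_{c2}$,

the expected number (Male, Dark) $= e_{13} = 80 \cdot \frac{3}{15} = 150 \cdot \frac{8}{15} \cdot \frac{3}{15} = n\, p_{r1}\, p_{c3}$,

the expected number (Female, Light) $= e_{21} = 70 \cdot \frac{5}{15} = 150 \cdot \frac{7}{15} \cdot \frac{5}{15} = n\, p_{r2}\, p_{c1}$,

the expected number (Female, Regular) $= e_{22} = 70 \cdot \frac{7}{15} = 150 \cdot \frac{7}{15} \cdot \frac{7}{15} = n\, p_{r2}\, p_{c2}$,

the expected number (Female, Dark) $= e_{23} = 70 \cdot \frac{3}{15} = 150 \cdot \frac{7}{15} \cdot \frac{3}{15} = n\, p_{r2}\, p_{c3}$.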

The expected numbers under H 0 can be summarized by

                          Beer Preference
Gender        Light                                   Regular                                 Dark                                 Proportion
Male          $e_{11} = n\,p_{r1}\,p_{c1} = 26.67$    $e_{12} = n\,p_{r1}\,p_{c2} = 37.33$    $e_{13} = n\,p_{r1}\,p_{c3} = 16$    $p_{r1} = 8/15$
Female        $e_{21} = n\,p_{r2}\,p_{c1} = 23.33$    $e_{22} = n\,p_{r2}\,p_{c2} = 32.67$    $e_{23} = n\,p_{r2}\,p_{c3} = 14$    $p_{r2} = 7/15$
Proportion    $p_{c1} = 5/15$                         $p_{c2} = 7/15$                         $p_{c3} = 3/15$
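
As a quick check of the expected numbers, here is a minimal Python sketch that recomputes them from the observed table as $n\, p_{ri}\, p_{cj}$. The variable names (`observed`, `row_props`, `col_props`, `expected`) are illustrative choices, not part of the notes.

```python
# Observed counts from the beer-preference table
# (rows: Male, Female; columns: Light, Regular, Dark).
observed = [
    [20, 40, 20],
    [30, 30, 10],
]

n = sum(sum(row) for row in observed)                  # sample size, 150
row_props = [sum(row) / n for row in observed]         # p_r1, p_r2
col_props = [sum(col) / n for col in zip(*observed)]   # p_c1, p_c2, p_c3

# Expected count in cell (i, j) under independence: n * p_ri * p_cj.
expected = [[n * pr * pc for pc in col_props] for pr in row_props]

for label, row in zip(["Male", "Female"], expected):
    print(label, [round(e, 2) for e in row])
```

Rounded to two decimals, the printed values should match the entries in the table of expected numbers.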
Intuitively, if $H_0$ is true, the observed numbers $f_{ij}$ should be close to the expected numbers $e_{ij}$ (computed under $H_0$), $i = 1, 2$; $j = 1, 2, 3$. Small differences are therefore consistent with $H_0$, while large differences are evidence against it. The following statistic can be used to measure the difference between the observed numbers and the expected numbers:

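\[
\begin{aligned}
\chi^2 &= \sum_{i=1}^{2} \sum_{j=1}^{3} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} \\
&= \frac{(f_{11} - e_{11})^2}{e_{11}} + \frac{(f_{12} - e_{12})^2}{e_{12}} + \frac{(f_{13} - e_{13})^2}{e_{13}}
 + \frac{(f_{21} - e_{21})^2}{e_{21}} + \frac{(f_{22} - e_{22})^2}{e_{22}} + \frac{(f_{23} - e_{23})^2}{e_{23}} \\
&= \frac{(20 - 26.67)^2}{26.67} + \frac{(40 - 37.33)^2}{37.33} + \frac{(20 - 16)^2}{16}
 + \frac{(30 - 23.33)^2}{23.33} + \frac{(30 - 32.67)^2}{32.67} + \frac{(10 - 14)^2}{14} \\
&= 6.13
\end{aligned}
\]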
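
The sum above can be reproduced with a short Python sketch. It assumes the expected counts are computed as (row total × column total) / n without intermediate rounding; the names `observed`, `expected`, and `chi_square` are illustrative.

```python
# Observed counts (rows: Male, Female; columns: Light, Regular, Dark).
observed = [[20, 40, 20], [30, 30, 10]]

n = sum(map(sum, observed))
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Unrounded expected counts under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Sum of (f_ij - e_ij)^2 / e_ij over all six cells.
chi_square = sum(
    (f - e) ** 2 / e
    for f_row, e_row in zip(observed, expected)
    for f, e in zip(f_row, e_row)
)
print(round(chi_square, 4))
```

With unrounded expected counts the sum comes out near 6.12. The 6.13 in the hand calculation arises from using expected counts rounded to two decimals; the difference does not affect the conclusion.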

General Case:
Suppose there are two variables, a column variable (with m categories) and a row variable (with p categories). We want to test the hypothesis

$H_0$: Row variable is independent of column variable
vs.
$H_a$: Row variable is not independent of column variable.


Suppose the sample size is n. The contingency table is

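                         Column Variable (m columns)
                    1           ...   j           ...   m           proportions
Row          1      $f_{11}$    ...   $f_{1j}$    ...   $f_{1m}$    $p_{r1}$
Variable     ...
(p rows)     i      $f_{i1}$    ...   $f_{ij}$    ...   $f_{im}$    $p_{ri}$
             ...
             p      $f_{p1}$    ...   $f_{pj}$    ...   $f_{pm}$    $p_{rp}$
proportions         $p_{c1}$    ...   $p_{cj}$    ...   $p_{cm}$

where $p_{ri} = \frac{1}{n} \sum_{k=1}^{m} f_{ik}$ is the row $i$ proportion and $p_{cj} = \frac{1}{n} \sum_{k=1}^{p} f_{kj}$ is the column $j$ proportion.

If $H_0$ is true, then the expected numbers under $H_0$ are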

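                         Column Variable (m columns)
                    1                                 ...   j                                 ...   m                                 proportions
Row          1      $e_{11} = n\,p_{r1}\,p_{c1}$      ...   $e_{1j} = n\,p_{r1}\,p_{cj}$      ...   $e_{1m} = n\,p_{r1}\,p_{cm}$      $p_{r1}$
Variable     ...
(p rows)     i      $e_{i1} = n\,p_{ri}\,p_{c1}$      ...   $e_{ij} = n\,p_{ri}\,p_{cj}$      ...   $e_{im} = n\,p_{ri}\,p_{cm}$      $p_{ri}$
             ...
             p      $e_{p1} = n\,p_{rp}\,p_{c1}$      ...   $e_{pj} = n\,p_{rp}\,p_{cj}$      ...   $e_{pm} = n\,p_{rp}\,p_{cm}$      $p_{rp}$
proportions         $p_{c1}$                          ...   $p_{cj}$                          ...   $p_{cm}$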

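Note:

\[
\begin{aligned}
e_{ij} &= n\, p_{ri}\, p_{cj}
 = (\text{sample size}) \cdot (\text{row } i \text{ proportion}) \cdot (\text{column } j \text{ proportion}) \\
&= n \cdot \frac{\sum_{k=1}^{m} f_{ik}}{n} \cdot \frac{\sum_{k=1}^{p} f_{kj}}{n}
 = (\text{sample size}) \cdot \frac{\text{row } i \text{ total}}{\text{sample size}} \cdot \frac{\text{column } j \text{ total}}{\text{sample size}} \\
&= \frac{\left( \sum_{k=1}^{m} f_{ik} \right) \left( \sum_{k=1}^{p} f_{kj} \right)}{n}
 = \frac{(\text{row } i \text{ total}) \cdot (\text{column } j \text{ total})}{\text{sample size}},
\end{aligned}
\]

where

\[
\text{row } i \text{ total} = \sum_{k=1}^{m} f_{ik}, \qquad
\text{column } j \text{ total} = \sum_{k=1}^{p} f_{kj}, \qquad
i = 1, \ldots, p; \; j = 1, \ldots, m,
\]

and

\[
\text{sample size} = \sum_{i=1}^{p} \sum_{j=1}^{m} f_{ij} = n .
\]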
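
The note says the two forms of $e_{ij}$ agree. A minimal Python sketch of both forms for an arbitrary p × m table follows; the function names `expected_from_totals` and `expected_from_proportions` and the sample table are hypothetical, chosen only for illustration.

```python
import math


def expected_from_totals(table):
    """e_ij = (row i total) * (column j total) / (sample size)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def expected_from_proportions(table):
    """e_ij = n * p_ri * p_cj, using row and column proportions."""
    n = sum(sum(row) for row in table)
    row_props = [sum(row) / n for row in table]
    col_props = [sum(col) / n for col in zip(*table)]
    return [[n * pr * pc for pc in col_props] for pr in row_props]


# An arbitrary 3 x 4 table of counts, made up for this check.
table = [[12, 7, 9, 22], [5, 14, 11, 10], [8, 8, 20, 4]]
a = expected_from_totals(table)
b = expected_from_proportions(table)

# The two forms agree cell by cell, up to floating-point error.
same = all(math.isclose(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print(same)
```

The totals form needs only one division per cell, which is why it is usually the more convenient one for hand calculation.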

Thus, the chi-square statistic used to reflect the difference between the
observed number and the expected number is
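\[
\begin{aligned}
\chi^2 &= \sum_{i=1}^{p} \sum_{j=1}^{m} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} \\
&= \frac{(f_{11} - e_{11})^2}{e_{11}} + \frac{(f_{12} - e_{12})^2}{e_{12}} + \cdots + \frac{(f_{1m} - e_{1m})^2}{e_{1m}} \\
&\quad + \frac{(f_{21} - e_{21})^2}{e_{21}} + \frac{(f_{22} - e_{22})^2}{e_{22}} + \cdots + \frac{(f_{2m} - e_{2m})^2}{e_{2m}} \\
&\quad + \cdots \\
&\quad + \frac{(f_{p1} - e_{p1})^2}{e_{p1}} + \frac{(f_{p2} - e_{p2})^2}{e_{p2}} + \cdots + \frac{(f_{pm} - e_{pm})^2}{e_{pm}} .
\end{aligned}
\]

Next question: how large must $\chi^2$ be to reject $H_0$?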

Chi-Square Test:
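Let

\[
\chi^2 = \sum_{i=1}^{p} \sum_{j=1}^{m} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} .
\]

When $e_{ij} \ge 5$ for every i and j, the chi-square test with level of significance $\alpha$ for

$H_0$: Row variable is independent of column variable
vs.
$H_a$: Row variable is not independent of column variable

is to

reject $H_0$ if $\chi^2 > \chi^2_{(p-1)(m-1),\,\alpha}$,

not reject $H_0$ if $\chi^2 \le \chi^2_{(p-1)(m-1),\,\alpha}$,

where $\chi^2_{(p-1)(m-1),\,\alpha}$ can be obtained by

\[
P\left( \chi^2_{(p-1)(m-1)} \ge \chi^2_{(p-1)(m-1),\,\alpha} \right) = \alpha .
\]

In addition,

\[
p\text{-value} = P\left( \chi^2_{(p-1)(m-1)} \ge \chi^2 \right) .
\]

Note: when $H_0$ is true, the random variable with sample value $\chi^2$ approximately follows the chi-square distribution with $(p-1)(m-1)$ degrees of freedom, $\chi^2_{(p-1)(m-1)}$.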
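
The critical value and the p-value are normally read from a chi-square table or obtained from statistical software. As a hedged illustration, the sketch below computes both using only the Python standard library. It relies on the recurrence $Q(a+1, h) = Q(a, h) + h^a e^{-h} / \Gamma(a+1)$ for the upper tail, starting from the closed forms for 1 and 2 degrees of freedom, and finds the critical value by bisection. The function names `chi2_sf` and `chi2_critical` are hypothetical, not from the notes.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom >= x), for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, total = 1.0, math.exp(-half)              # closed form for df = 2
    else:
        a, total = 0.5, math.erfc(math.sqrt(half))   # closed form for df = 1
    # Raise the degrees of freedom by 2 per step until reaching df.
    while 2 * a < df:
        total += math.exp(a * math.log(half) - half - math.lgamma(a + 1))
        a += 1.0
    return total


def chi2_critical(alpha, df):
    """Value c with P(chi-square_df >= c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # expand until the tail is below alpha
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


p, m, alpha = 2, 3, 0.05
df = (p - 1) * (m - 1)
print(round(chi2_critical(alpha, df), 2))
```

For a 2 × 3 table this should agree with the tabulated value 5.99 for 2 degrees of freedom at $\alpha = 0.05$.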

Example (continued):
Since $p = 2$, $m = 3$ and $\chi^2 = 6.13 > 5.99 = \chi^2_{2,\,0.05} = \chi^2_{(p-1)(m-1),\,\alpha}$, we reject $H_0$.

Also,

\[
p\text{-value} = P\left( \chi^2_{(p-1)(m-1)} \ge \chi^2 \right) = P\left( \chi^2_2 \ge 6.13 \right) = 0.047 < 0.05 ,
\]

so we also reject $H_0$ based on the p-value. Therefore, we conclude that beer preference is not independent of the gender of the beer drinker.
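
For 2 degrees of freedom the chi-square tail has a simple closed form, $P(\chi^2_2 \ge x) = e^{-x/2}$, so the two numbers used in this example can be checked directly. The sketch below is specific to 2 degrees of freedom and does not apply to other table sizes; the variable names are illustrative.

```python
import math

chi_square = 6.13   # statistic from the beer-preference example
alpha = 0.05

# Closed forms valid only for 2 degrees of freedom.
p_value = math.exp(-chi_square / 2)      # P(chi-square_2 >= 6.13)
critical_value = -2 * math.log(alpha)    # solves exp(-c / 2) = alpha

print(round(p_value, 3), round(critical_value, 2))
```

Both the comparison with the critical value and the comparison of the p-value with $\alpha$ lead to the same decision, as they always do.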
Example:
The following data are the number of people who are in favor of, are not in favor of,
and have no comment on, some proposal:
            Favor     Not Favor     No Comment
Male        252       145           203
Female      148       105           147

Please test whether females and males differ in their opinions about the proposal at $\alpha = 0.05$.
[solution:]
The column totals are $252 + 148 = 400$, $145 + 105 = 250$ and $203 + 147 = 350$, while the row totals are $252 + 145 + 203 = 600$ and $148 + 105 + 147 = 400$. In addition, the total number is 1000.
The table for the expected numbers $e_{ij}$ is

                Favor                                   Not Favor                               No Comment                              Row Total
Male            $\frac{600 \cdot 400}{1000} = 240$      $\frac{600 \cdot 250}{1000} = 150$      $\frac{600 \cdot 350}{1000} = 210$      600
Female          $\frac{400 \cdot 400}{1000} = 160$      $\frac{400 \cdot 250}{1000} = 100$      $\frac{400 \cdot 350}{1000} = 140$      400
Column Total    400                                     250                                     350                                     1000

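Thus,

\[
\begin{aligned}
\chi^2 &= \sum_{i=1}^{p} \sum_{j=1}^{m} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}
 = \sum_{i=1}^{2} \sum_{j=1}^{3} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} \\
&= \frac{(252 - 240)^2}{240} + \frac{(145 - 150)^2}{150} + \frac{(203 - 210)^2}{210}
 + \frac{(148 - 160)^2}{160} + \frac{(105 - 100)^2}{100} + \frac{(147 - 140)^2}{140} \\
&= 2.5
\end{aligned}
\]

Since $\chi^2 = 2.5 < 5.99 = \chi^2_{2,\,0.05} = \chi^2_{(2-1)(3-1),\,0.05} = \chi^2_{(p-1)(m-1),\,\alpha}$, we do not reject $H_0$.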
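
The whole solution can be retraced in a few lines of Python. This is a sketch under stated assumptions: the critical value 5.99 is taken from the text rather than computed, and the names `observed`, `expected`, `chi_square`, and `reject_h0` are illustrative.

```python
# Observed counts (rows: Male, Female; columns: Favor, Not Favor, No Comment).
observed = [[252, 145, 203], [148, 105, 147]]

n = sum(map(sum, observed))
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
expected = [[r * c / n for c in col_totals] for r in row_totals]

# The chi-square test requires every expected count to be at least 5.
assert all(e >= 5 for row in expected for e in row)

chi_square = sum(
    (f - e) ** 2 / e
    for f_row, e_row in zip(observed, expected)
    for f, e in zip(f_row, e_row)
)

critical_value = 5.99   # chi-square critical value for 2 df at alpha = 0.05
reject_h0 = chi_square > critical_value
print(expected, round(chi_square, 2), reject_h0)
```

The statistic falls well below the critical value, so the data give no evidence at the 0.05 level that opinions differ by gender.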

Online Exercise:
Exercise 13.4.1
Exercise 13.4.2
