Nonparametric Statistical Inference, Fourth Edition

Published by: Mario Balderas on Oct 31, 2012
Copyright:Attribution Non-commercial


In a k × 2 contingency table, the B family is simply a dichotomy with,
say, success and failure as the two possible outcomes. Then it is a
simple algebraic exercise to show that the test statistic for
independence can be written in an equivalent form as

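$$
Q = \sum_{i=1}^{k} \sum_{j=1}^{2}
    \frac{(X_{ij} - X_{i\cdot}X_{\cdot j}/N)^2}{X_{i\cdot}X_{\cdot j}/N}
  = \sum_{i=1}^{k} \frac{(Y_i - n_i\hat{p})^2}{n_i\hat{p}(1-\hat{p})}
\tag{3.1}
$$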

where

$$
Y_i = X_{i1}, \qquad n_i - Y_i = X_{i2}, \qquad
\hat{p} = \sum_{i=1}^{k} Y_i / N
$$

If $B_1$ and $B_2$ are regarded as success and failure, and
$A_1, A_2, \ldots, A_k$ are termed sample 1, sample 2, ..., and sample k,
we see that the chi-square test statistic in (3.1) is the sum of the
squares of k standardized binomial variables with parameter p estimated
by its consistent estimator $\hat{p}$. Thus the test based on (3.1) is
frequently called the test for the equality of k proportions, previously
covered in Section 10.8 and illustrated here by Example 3.1.
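
The "simple algebraic exercise" behind (3.1) can be sketched as follows.
This is a reader's reconstruction rather than the book's own derivation;
it uses only the definitions $Y_i = X_{i1}$, $n_i = X_{i\cdot}$, and
$\hat{p} = X_{\cdot 1}/N$ given with (3.1), and works on the two cells of
a single row i.

```latex
\begin{aligned}
\text{cell } (i,1):\quad
  & \frac{(X_{i1} - X_{i\cdot}X_{\cdot 1}/N)^2}{X_{i\cdot}X_{\cdot 1}/N}
    = \frac{(Y_i - n_i\hat{p})^2}{n_i\hat{p}} \\[4pt]
\text{cell } (i,2):\quad
  & \frac{(X_{i2} - X_{i\cdot}X_{\cdot 2}/N)^2}{X_{i\cdot}X_{\cdot 2}/N}
    = \frac{\bigl(n_i - Y_i - n_i(1-\hat{p})\bigr)^2}{n_i(1-\hat{p})}
    = \frac{(Y_i - n_i\hat{p})^2}{n_i(1-\hat{p})} \\[4pt]
\text{row sum}:\quad
  & (Y_i - n_i\hat{p})^2
    \left[\frac{1}{n_i\hat{p}} + \frac{1}{n_i(1-\hat{p})}\right]
    = \frac{(Y_i - n_i\hat{p})^2}{n_i\hat{p}(1-\hat{p})}
\end{aligned}
```

Summing the last line over i = 1, ..., k gives the right-hand form of (3.1).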

Table 2.2 Expected frequencies (in parentheses, beside the observed counts)

                             Nicotine
Alcohol         0             1–15         16 or more    Total
0               105 (82.7)    7 (17.7)     11 (22.6)     123
0.01–0.10       58 (51.1)     5 (10.9)     13 (14.0)     76
0.11–0.99       84 (109.6)    37 (23.4)    42 (30.0)     163
1.00 or more    57 (60.5)     16 (12.9)    17 (16.5)     90
Total           304           65           83            452

ANALYSIS OF COUNT DATA

Example 3.1 A marketing research firm has conducted a survey of
businesses of different sizes. Questionnaires were sent to 200 randomly
selected businesses of each of three sizes. The data on responses are
summarized below. Is there a significant difference in the proportion
of nonresponses by small, medium, and large businesses?

Business size   Small   Medium   Large
Response        125     81       40

Solution The frequencies of nonresponses are 75, 119, and 160. The
best estimate of the common probability of nonresponse is
(75 + 119 + 160)/600 = 0.59. The expected number of nonresponses is
then 118 for each size business. The value of Q from (3.1) is 74.70
with 2 degrees of freedom. From Table B we find P < 0.001, and we
conclude that the proportions of nonresponse are not the same for the
three sizes of businesses.
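
The arithmetic in this solution can be checked with a short Python
sketch. The function name `k_proportions_q` is ours, not the book's. The
tail probability uses the fact that a chi-square variable with 2 degrees
of freedom has survival function exp(-q/2), which stands in for the
Table B lookup.

```python
from math import exp


def k_proportions_q(successes, sizes):
    """Return (Q, p_hat) for the equality-of-k-proportions statistic (3.1)."""
    p_hat = sum(successes) / sum(sizes)  # pooled estimate of p
    q = sum(
        (y - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat))
        for y, n in zip(successes, sizes)
    )
    return q, p_hat


# Data of Example 3.1: 200 questionnaires per business size.
sizes = [200, 200, 200]
responses = [125, 81, 40]
# "Success" here means a nonresponse, as in the solution.
nonresponses = [n - r for n, r in zip(sizes, responses)]

q, p_hat = k_proportions_q(nonresponses, sizes)
expected = [n * p_hat for n in sizes]

# With k - 1 = 2 degrees of freedom the chi-square tail is exp(-q / 2).
p_value = exp(-q / 2)

print(nonresponses)        # nonresponse counts per size
print(f"p_hat = {p_hat:.2f}, Q = {q:.2f}, P = {p_value:.2e}")
```

The shortcut exp(-q/2) is specific to 2 degrees of freedom; for other
values of k a chi-square table or a statistics library would be needed.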

If k = 2, the expression in (3.1) can be written as

$$
Q = \frac{(Y_1/n_1 - Y_2/n_2)^2}{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}
\tag{3.2}
$$

Now the chi-square test statistic in (3.2) is the square of the
difference between two sample proportions divided by the estimated
variance of their difference. In other words, Q is the square of the
classical standard normal theory test statistic used for the hypothesis
that two population proportions are equal.

Substituting the original $X_{ij}$ notation in (3.2), a little algebraic
manipulation gives another equivalent form for Q as

$$
Q = \frac{N(X_{11}X_{22} - X_{12}X_{21})^2}
         {X_{\cdot 1}X_{\cdot 2}X_{1\cdot}X_{2\cdot}}
\tag{3.3}
$$

CHAPTER 14

This expression is related to the sample Kendall tau coefficient of
Chapter 11. Suppose that the two families A and B are factors or
qualities, both dichotomized into categories which can be called
presence and absence of the factor or possessing and not possessing the
quality. Suppose further that we have a single sample of size N, and
that we make two observations on each element in the sample, one for
each of the two factors. We record the observations using the code 1 for
presence and 2 for absence. The observations then consist of N sets of
pairs, for which the Kendall tau coefficient T of Chapter 11 can be
determined as a measure of association between the factors. The
numerator of T is the number of sets of pairs of observations, say
$(a_i, b_i)$ and $(a_j, b_j)$, whose differences $a_i - a_j$ and
$b_i - b_j$ have the same sign but are not zero, minus the number whose
differences have opposite signs. The differences here are both positive
or both negative only for a set (1,1) and (2,2), and are of opposite
signs for a set (1,2) and (2,1). If $X_{ij}$ denotes the number of
observations where factor A was recorded as i and factor B was recorded
as j for i, j = 1, 2, the number of differences with the same sign is
the product $X_{11}X_{22}$, the number of pairs which agreed in the
sense that both factors were present or both were absent. The number of
differences with opposite signs is $X_{12}X_{21}$, the number of pairs
which disagreed. Since there are so many ties, it seems most appropriate
to use the definition of T modified for ties, given in (11.2.37) and
called tau b. Then the denominator of T is the square root of the
product of the numbers of pairs with no ties for each factor, that is,
the square root of $X_{1\cdot}X_{2\cdot}X_{\cdot 1}X_{\cdot 2}$.
Therefore the tau coefficient is

$$
T = \frac{X_{11}X_{22} - X_{12}X_{21}}
         {(X_{\cdot 1}X_{\cdot 2}X_{1\cdot}X_{2\cdot})^{1/2}}
  = \left(\frac{Q}{N}\right)^{1/2}
\tag{3.4}
$$

and Q/N estimates $\tau^2$, the parameter of association between
factors A and B. For this type of data, the Kendall measure of
association is sometimes called the phi coefficient, as defined in (2.6).
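
The equivalence of the forms (3.1), (3.2), and (3.3), and the relation
(3.4) between tau b and Q, can be checked numerically. The sketch below
uses a made-up 2 × 2 table (the counts are hypothetical, not data from
the text) and obtains tau b by brute-force counting over all pairs of
observations. In this table $X_{11}X_{22} > X_{12}X_{21}$, so T is
positive; if the cross-product difference were negative, T would be the
negative square root of Q/N.

```python
from itertools import combinations
from math import isclose, sqrt

# Hypothetical counts: rows are factor A = 1, 2; columns are factor B = 1, 2.
x = [[30, 20], [15, 35]]

row = [sum(r) for r in x]        # X_1., X_2.
col = [sum(c) for c in zip(*x)]  # X_.1, X_.2
n = sum(row)

# (3.1), left-hand form: sum over cells of (observed - expected)^2 / expected.
q_cells = sum(
    (x[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
    for i in range(2)
    for j in range(2)
)

# (3.2): squared difference of proportions over its estimated variance.
p_hat = col[0] / n
q_props = (x[0][0] / row[0] - x[1][0] / row[1]) ** 2 / (
    p_hat * (1 - p_hat) * (1 / row[0] + 1 / row[1])
)

# (3.3): cross-product form.
cross = x[0][0] * x[1][1] - x[0][1] * x[1][0]
q_cross = n * cross ** 2 / (col[0] * col[1] * row[0] * row[1])

# Tau b by direct counting: expand the table into N coded pairs (a, b).
obs = [(i + 1, j + 1) for i in range(2) for j in range(2) for _ in range(x[i][j])]
conc = disc = untied_a = untied_b = 0
for (a1, b1), (a2, b2) in combinations(obs, 2):
    da, db = a1 - a2, b1 - b2
    untied_a += da != 0  # pair not tied on factor A
    untied_b += db != 0  # pair not tied on factor B
    if da * db > 0:
        conc += 1
    elif da * db < 0:
        disc += 1
tau_b = (conc - disc) / sqrt(untied_a * untied_b)

print(q_cells, q_props, q_cross)
print(tau_b, sqrt(q_cross / n))
```

The brute-force loop is quadratic in N and is only meant to confirm the
counting argument; the closed form (3.4) is what one would use in practice.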

Example 3.2 The researchers in the study reported in Example 2.1
really might have been more interested in a one-sided alternative of
positive dependence between the variables alcohol and nicotine.
Since the data are measurements of level of consumption, we could
regard them as 452 pairs of measurements with many ties. For
example, the 37 mothers in cell (3,2) of Table 2.1 represent the
pair of measurements (AIII, BII), where AIII indicates alcohol
consumption in the 0.11–0.99 range and BII represents nicotine
consumption at level 1–15. For these kinds of data we can then
calculate Kendall’s tau for the 452 pairs. The number of concordant
pairs C and the number of discordant pairs Q are calculated as
shown in Table 2.3. Because the ties are quite extensive, we need to
incorporate the correction for ties in the calculation of T from
(11.2.38). Then we use the normal approximation to the distribution
of T in (11.2.30) to calculate the right-tailed P value for this
one-sided alternative.

Table 2.3 Calculations for C and Q

C                                    Q
105(5+13+37+42+16+17) = 13,650       7(58+84+57) = 1,393
7(13+42+17) = 504                    11(58+84+57+5+37+16) = 2,827
58(37+42+16+17) = 6,496              5(84+57) = 705
5(42+17) = 295                       13(84+57+37+16) = 2,522
84(16+17) = 2,772                    37(57) = 2,109
37(17) = 629                         42(57+16) = 3,066
Total: 24,346                        Total: 12,622


$$
T = \frac{24{,}346 - 12{,}622}
{\sqrt{\left[\binom{452}{2} - \binom{304}{2} - \binom{65}{2} - \binom{83}{2}\right]
       \left[\binom{452}{2} - \binom{123}{2} - \binom{76}{2} - \binom{163}{2} - \binom{90}{2}\right]}}
  = 0.1915
$$

$$
Z = \frac{3(0.1915)\sqrt{452(451)}}{\sqrt{2(904 + 5)}} = 6.08
$$

We find P=0.000 from Table A of the Appendix.
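
The calculations of Example 3.2 can be reproduced with the following
Python sketch, which takes the observed counts from Table 2.2. The
variable names are ours. The formula for Z is read off the numerical
expression above, with N = 452, and is assumed to be the normal
approximation the text cites as (11.2.30).

```python
from math import comb, sqrt

# Observed counts: rows are alcohol levels, columns are nicotine levels.
table = [
    [105, 7, 11],
    [58, 5, 13],
    [84, 37, 42],
    [57, 16, 17],
]
rows, cols = len(table), len(table[0])

# Pair each cell with every cell in a lower row: cells to the right give
# concordant pairs (C), cells to the left give discordant pairs (Q).
conc = disc = 0
for i in range(rows):
    for j in range(cols):
        for r in range(i + 1, rows):
            for c in range(cols):
                if c > j:
                    conc += table[i][j] * table[r][c]
                elif c < j:
                    disc += table[i][j] * table[r][c]

n = sum(map(sum, table))
row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]

# Numbers of pairs not tied on each variable (the tie correction).
pairs = comb(n, 2)
untied_nicotine = pairs - sum(comb(t, 2) for t in col_totals)
untied_alcohol = pairs - sum(comb(t, 2) for t in row_totals)

tau = (conc - disc) / sqrt(untied_nicotine * untied_alcohol)
z = 3 * tau * sqrt(n * (n - 1)) / sqrt(2 * (2 * n + 5))

print(conc, disc)
print(f"T = {tau:.4f}, Z = {z:.2f}")
```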

There is also a relationship between the value of the chi-square
statistic in a 2 × 2 contingency table and Kendall's partial tau
coefficient. If we compare the expression for $T_{XY.Z}$ in (12.6.1)
with the expression for Q in (3.3), we see that

$$
T_{XY.Z} = \sqrt{Q/N} \qquad \text{for } N = \binom{m}{2}
$$
A test for the significance of $T_{XY.Z}$ cannot be carried out by using
Q, however. The contingency table entries in Table 6.1 of Chapter 12 are
not independent even if X and Y are independent for fixed Z, since all
categories involve pairings with the Z sample.
