
# Nonparametric Statistical Inference, Fourth Edition


07/01/2013


As an extension of the sampling situation of Section 12.4, suppose that we have $n$ objects to be ranked and a fixed number of observers to rank them but each observer ranks only some subset of the $n$ objects. This situation could arise for reasons of economy or practicality. In the case of human observers particularly, the ability to rank objects effectively and reliably may be a function of the number of comparative judgments to be made. For example, after 10 different brands of bourbon have been tasted, the discriminatory powers of the observers may legitimately be questioned.
We assume that the experimental design is such that the rankings are incomplete in the same symmetrical way as in the balanced incomplete-blocks design which is used effectively in agricultural field experiments. In terms of our situation, this means that:

1. Each observer will rank the same number $m$ of objects for some $m < n$.
2. Every object will be ranked exactly the same total number $k$ of times.
3. Each pair of objects will be presented together to some observer a total of exactly $\lambda$ times, $\lambda \ge 1$, a constant for all pairs.

These specifications then ensure that all comparisons are made with the same frequency.


In order to visualize the design, imagine a two-way layout of $p$ rows and $n$ columns, where the entry $\delta_{ij}$ in the $(i,j)$ cell equals 1 if object $j$ is presented to observer $i$ and 0 otherwise. The design specifications then can be written symbolically as

1. $\sum_{j=1}^{n} \delta_{ij} = m$ for $i = 1, 2, \ldots, p$
2. $\sum_{i=1}^{p} \delta_{ij} = k$ for $j = 1, 2, \ldots, n$
3. $\sum_{i=1}^{p} \delta_{ij}\delta_{ir} = \lambda$ for all $r \ne j = 1, 2, \ldots, n$

Summing on the other subscript in specifications 1 and 2, we obtain

$$\sum_{i=1}^{p}\sum_{j=1}^{n} \delta_{ij} = mp = kn$$

which implies that the number of observers is fixed by the design to be $p = kn/m$. Now using specification 3, we have

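$$\sum_{i=1}^{p}\left(\sum_{j=1}^{n}\delta_{ij}\right)^{2} = \sum_{i=1}^{p}\left[\sum_{j=1}^{n}\delta_{ij}^{2} + \mathop{\sum_{j=1}^{n}\sum_{r=1}^{n}}_{j \ne r}\delta_{ij}\delta_{ir}\right] = mp + \lambda n(n-1)$$

and from specification 1, this same sum equals $pm^{2}$. This requires the relation

$$\lambda = \frac{pm(m-1)}{n(n-1)} = \frac{k(m-1)}{n-1}$$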

Since $p$ and $\lambda$ must both be positive integers, $m$ must be a factor of $kn$ and $n-1$ must be a factor of $k(m-1)$. Designs of this type are called Youden squares or incomplete Latin squares. Such plans have been tabulated (for example, in Cochran and Cox, 1957, pp. 520–544). An example of this design for $n = 7$, $\lambda = 1$, $m = k = 3$, where the objects are designated by A, B, C, D, E, F, and G is:

| Observer | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Objects presented for ranking | A | B | C | D | E | F | G |
| | B | C | D | E | F | G | A |
| | D | E | F | G | A | B | C |
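The three design specifications can be checked mechanically for any proposed plan. The following is a minimal sketch in Python, assuming single-letter object labels and a hypothetical helper named `design_parameters` (neither is part of the text); it reads the blocks off the example design above and confirms the two derived relations $mp = kn$ and $\lambda(n-1) = k(m-1)$.

```python
from itertools import combinations

# Blocks of the n = 7, m = k = 3, lambda = 1 example design:
# observer i is shown the objects in BLOCKS[i - 1].
# Single-letter labels are an assumption made for this sketch.
BLOCKS = ["ABD", "BCE", "CDF", "DEG", "EFA", "FGB", "GAC"]


def design_parameters(blocks):
    """Return (n, p, m, k, lam) if the blocks satisfy specifications 1 to 3.

    Raises ValueError when any of the three specifications fails.
    """
    objects = sorted(set("".join(blocks)))
    # Specification 1: every observer ranks the same number m of objects.
    sizes = {len(set(block)) for block in blocks}
    # Specification 2: every object is ranked the same number k of times.
    reps = {sum(obj in block for block in blocks) for obj in objects}
    # Specification 3: every pair is presented together the same number of times.
    pairs = {sum(a in block and b in block for block in blocks)
             for a, b in combinations(objects, 2)}
    if len(sizes) != 1 or len(reps) != 1 or len(pairs) != 1:
        raise ValueError("design is not balanced")
    return len(objects), len(blocks), sizes.pop(), reps.pop(), pairs.pop()


n, p, m, k, lam = design_parameters(BLOCKS)
# Relations derived in the text.
assert m * p == k * n
assert lam * (n - 1) == k * (m - 1)
print(n, p, m, k, lam)  # 7 7 3 3 1
```

Representing the design by its blocks rather than by the full $p \times n$ matrix of $\delta_{ij}$ keeps the sketch short; the three set comprehensions are the row sums, column sums, and pairwise column products of that matrix.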


We are interested in determining a single measure of the overall concordance or agreement between the $kn/m$ observers in their relative comparisons of the objects. For simplification, suppose there is some natural ordering of all $n$ objects and the objects labeled accordingly. In other words, object number $r$ would receive rank $r$ by all observers if each observer was presented with all $n$ objects and the observers agreed perfectly in their evaluation of the objects. For perfect agreement in a balanced incomplete ranking then, where each observer assigns ranks $1, 2, \ldots, m$ to the subset presented to him, object 1 will receive rank 1 whenever it is presented; object 2 will receive rank 2 whenever it is presented along with object 1, and rank 1 otherwise; object 3 will receive rank 3 when presented along with both objects 1 and 2, rank 2 when with either objects 1 or 2 but not both, and rank 1 otherwise, etc. In general, then, the rank of object $j$ when presented to observer $i$ is one more than the number of objects presented to that observer from the subset of objects $\{1, 2, \ldots, j-1\}$, for all $2 \le j \le n$. Symbolically, using the $\delta$ notation of before, the rank of object $j$ when presented to observer $i$ is 1 for $j = 1$ and

$$1 + \sum_{r=1}^{j-1}\delta_{ir} \qquad \text{for all } 2 \le j \le n$$

The sum of the ranks assigned to object $j$ by all $p$ observers in the case of perfect agreement then is

$$\sum_{i=1}^{p}\left(1 + \sum_{r=1}^{j-1}\delta_{ir}\right)\delta_{ij} = \sum_{i=1}^{p}\delta_{ij} + \sum_{r=1}^{j-1}\sum_{i=1}^{p}\delta_{ir}\delta_{ij} = k + \lambda(j-1) \qquad \text{for } j = 1, 2, \ldots, n$$

as a result of the design specifications 2 and 3.
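The rank-sum result $k + \lambda(j-1)$ can be illustrated numerically. The sketch below assumes the $n = 7$, $m = k = 3$, $\lambda = 1$ example design and takes the alphabetical order A, ..., G as the natural ordering (an assumption for illustration only); the function name `perfect_agreement_sums` is hypothetical.

```python
# Blocks of the n = 7, m = k = 3, lambda = 1 example design.
BLOCKS = ["ABD", "BCE", "CDF", "DEG", "EFA", "FGB", "GAC"]
# Assumed natural ordering: A is object 1, B is object 2, ..., G is object n.
NATURAL_ORDER = "ABCDEFG"


def perfect_agreement_sums(blocks, order):
    """Column sums of ranks when every observer follows the natural ordering."""
    position = {obj: j for j, obj in enumerate(order, start=1)}
    sums = {obj: 0 for obj in order}
    for block in blocks:
        # Within a block, the rank of an object is one more than the number
        # of presented objects that precede it in the natural ordering.
        for rank, obj in enumerate(sorted(block, key=position.get), start=1):
            sums[obj] += rank
    return [sums[obj] for obj in order]


k, lam = 3, 1
sums = perfect_agreement_sums(BLOCKS, NATURAL_ORDER)
# Agrees with k + lambda * (j - 1) for j = 1, ..., n.
assert sums == [k + lam * (j - 1) for j in range(1, len(NATURAL_ORDER) + 1)]
print(sums)  # [3, 4, 5, 6, 7, 8, 9]
```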
Since each object is ranked a fixed number, $k$, of times, the observed data for an experiment of this type can easily be presented in a two-way layout of $k$ rows and $n$ columns, where the $j$th column contains the collection of ranks assigned to object $j$ by those observers to whom object $j$ was presented. The rows no longer have any significance, but the column sums can be used to measure concordance. The sum of all ranks in the table is $[m(m+1)/2][kn/m] = kn(m+1)/2$, and thus the average column sum is $k(m+1)/2$. In the case of perfect concordance, the column sums are some permutation of the numbers

$$k,\; k+\lambda,\; k+2\lambda,\; \ldots,\; k+(n-1)\lambda$$


and the sum of squares of deviations of column sums around their mean is

$$\sum_{j=0}^{n-1}\left[(k + j\lambda) - \frac{k(m+1)}{2}\right]^{2} = \frac{\lambda^{2} n(n^{2}-1)}{12}$$

Let $R_{j}$ denote the actual sum of ranks in the $j$th column. A relative measure of concordance between observers may be defined here as

$$W = \frac{12\sum_{j=1}^{n}\left[R_{j} - k(m+1)/2\right]^{2}}{\lambda^{2} n(n^{2}-1)} \tag{5.1}$$

If $m = n$ and $\lambda = k$ so that each observer ranks all $n$ objects, (5.1) is equivalent to (4.4), as it should be.
This coefficient of concordance also varies between 0 and 1 with larger values reflecting greater agreement between observers. If there is no agreement, the column sums would all tend to be equal to the average column sum and $W$ would be zero.
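Definition (5.1) translates directly into a short function. The following is a sketch only, with a hypothetical function name `concordance_w` and exact rational arithmetic chosen so that the boundary values 0 and 1 are reproduced without rounding; the column sums used are illustrative.

```python
from fractions import Fraction


def concordance_w(column_sums, m, k, lam):
    """Coefficient of concordance W of (5.1) from the n column sums R_j."""
    n = len(column_sums)
    mean = Fraction(k * (m + 1), 2)  # average column sum k(m + 1)/2
    ss = sum((Fraction(r) - mean) ** 2 for r in column_sums)
    return 12 * ss / (lam ** 2 * n * (n ** 2 - 1))


# Perfect concordance for n = 7, m = k = 3, lambda = 1:
# the column sums are a permutation of k, k + lambda, ..., k + (n - 1) lambda.
print(concordance_w([3, 4, 5, 6, 7, 8, 9], m=3, k=3, lam=1))  # 1

# All column sums equal to the average k(m + 1)/2 = 6.
print(concordance_w([6] * 7, m=3, k=3, lam=1))  # 0

# Complete rankings (m = n = 3, lambda = k = 2) with two observers in full agreement.
print(concordance_w([2, 4, 6], m=3, k=2, lam=2))  # 1
```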

TESTS OF SIGNIFICANCE BASED ON W

For testing the null hypothesis that the ranks are allotted randomly by each observer to the subset of objects presented to him so that there is no concordance, the appropriate rejection region is large values of $W$. This test is frequently called the Durbin (1951) test.
The exact sampling distribution of $W$ could be determined only by an extensive enumeration process. Exact tables for 15 different designs are given in van der Laan and Prakken (1972). For $k$ large an approximation to the null distribution may be employed for tests of significance. We shall first determine the exact null mean and variance of $W$ using an approach analogous to the steps leading to (2.7). Let $R_{ij}$, $i = 1, 2, \ldots, k$, denote the collection of ranks allotted to object number $j$ by the $k$ observers to whom it was presented. From (11.3.2), (11.3.3), and (11.3.10), in the null case then for all $i$, $j$, and $q \ne j$

$$E(R_{ij}) = \frac{m+1}{2} \qquad \operatorname{var}(R_{ij}) = \frac{m^{2}-1}{12} \qquad \operatorname{cov}(R_{ij}, R_{iq}) = -\frac{m+1}{12}$$

and $R_{ij}$ and $R_{hj}$ are independent for all $j$ where $i \ne h$. Denoting $(m+1)/2$ by $\mu$, the numerator of $W$ in (5.1) may be written as


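$$\begin{aligned}
12\sum_{j=1}^{n}\left[\sum_{i=1}^{k}R_{ij} - k\mu\right]^{2} &= 12\sum_{j=1}^{n}\left[\sum_{i=1}^{k}(R_{ij}-\mu)\right]^{2} \\
&= 12\sum_{j=1}^{n}\sum_{i=1}^{k}(R_{ij}-\mu)^{2} + 24\sum_{j=1}^{n}\mathop{\sum\sum}_{1 \le i < h \le k}(R_{ij}-\mu)(R_{hj}-\mu) \\
&= pm(m^{2}-1) + 24U = \lambda^{2}n(n^{2}-1)W
\end{aligned} \tag{5.2}$$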
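Since $\operatorname{cov}(R_{ij}, R_{hj}) = 0$ for all $i < h$, $E(U) = 0$. Squaring the sum represented by $U$, we have

$$\begin{aligned}
U^{2} = {} & \sum_{j=1}^{n}\mathop{\sum\sum}_{1 \le i < h \le k}(R_{ij}-\mu)^{2}(R_{hj}-\mu)^{2} \\
& + 2\mathop{\sum\sum}_{1 \le j < q \le n}\;\mathop{\sum\sum}_{1 \le i < h \le k}\;\mathop{\sum\sum}_{1 \le r < s \le k}(R_{ij}-\mu)(R_{hj}-\mu)(R_{rq}-\mu)(R_{sq}-\mu)
\end{aligned}$$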

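and

$$E(U^{2}) = \sum_{j=1}^{n}\mathop{\sum\sum}_{1 \le i < h \le k}\operatorname{var}(R_{ij})\operatorname{var}(R_{hj}) + 2\mathop{\sum\sum}_{1 \le j < q \le n}\binom{\lambda}{2}\operatorname{cov}(R_{ij},R_{iq})\operatorname{cov}(R_{hj},R_{hq})$$

since objects $j$ and $q$ are presented together to both observers $i$ and $h$ a total of $\binom{\lambda}{2}$ times in the experiment. Substituting the respective variances and covariances, we obtain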

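$$\operatorname{var}(U) = E(U^{2}) = n\binom{k}{2}\frac{(m^{2}-1)^{2}}{144} + 2\binom{n}{2}\binom{\lambda}{2}\frac{(m+1)^{2}}{144} = \frac{nk(m+1)^{2}(m-1)\left[(m-1)(k-1) + (\lambda-1)\right]}{288}$$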

From (5.2), the moments of $W$ are

$$E(W) = \frac{m+1}{\lambda(n+1)} \qquad \operatorname{var}(W) = \frac{2(m+1)^{2}\left[(m-1)(k-1) + (\lambda-1)\right]}{nk\lambda^{2}(m-1)(n+1)^{2}}$$
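For a small design the enumeration process mentioned earlier is feasible by brute force, which gives an independent check on these moments. The sketch below assumes the $n = 7$, $m = k = 3$, $\lambda = 1$ example design, for which the formulas give $E(W) = 1/2$ and $\operatorname{var}(W) = 1/21$; it enumerates all $(3!)^{7} = 279{,}936$ equally likely rank assignments under the null hypothesis. The function name `exact_null_moments` is hypothetical, and the approach is not practical for larger designs.

```python
from fractions import Fraction
from itertools import permutations, product

# Example design: n = 7 objects, m = 3 per observer, k = 3, lambda = 1.
BLOCKS = ["ABD", "BCE", "CDF", "DEG", "EFA", "FGB", "GAC"]
OBJECTS = "ABCDEFG"
n, m, k, lam = 7, 3, 3, 1


def exact_null_moments():
    """Exact null mean and variance of W by enumerating every rank assignment."""
    index = {obj: j for j, obj in enumerate(OBJECTS)}
    cols = [[index[obj] for obj in block] for block in BLOCKS]
    centre = k * (m + 1)  # twice the average column sum, kept as an integer
    total = total_sq = count = 0
    # Each observer independently uses one of the m! orderings of ranks 1..m.
    for assignment in product(permutations(range(1, m + 1)), repeat=len(BLOCKS)):
        sums = [0] * n
        for block_cols, ranks in zip(cols, assignment):
            for j, r in zip(block_cols, ranks):
                sums[j] += r
        # s4 is four times the sum of squared deviations of the column sums.
        s4 = sum((2 * r - centre) ** 2 for r in sums)
        total += s4
        total_sq += s4 * s4
        count += 1
    # From (5.1): W = 12 (s4 / 4) / (lambda^2 n (n^2 - 1)).
    scale = Fraction(3, lam ** 2 * n * (n ** 2 - 1))
    mean_w = scale * Fraction(total, count)
    var_w = scale ** 2 * Fraction(total_sq, count) - mean_w ** 2
    return mean_w, var_w


mean_w, var_w = exact_null_moments()
print(mean_w, var_w)  # 1/2 1/21
```

Accumulating integer sums and converting to fractions only at the end keeps the loop fast while leaving the final moments exact.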


As in the case of complete rankings, a linear function of $W$ has moments approximately equal to the corresponding moments of the chi-square distribution with $n-1$ degrees of freedom if $k$ is large. This function is

$$Q = \frac{\lambda(n^{2}-1)W}{m+1}$$

and its exact mean and variance are

$$E(Q) = n-1 \qquad \operatorname{var}(Q) = 2(n-1)\left[1 - \frac{m(n-1)}{nk(m-1)}\right] \approx 2(n-1)\left(1 - \frac{1}{k}\right)$$
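To see how close the variance of $Q$ is to the chi-square value $2(n-1)$ for a given design, the two expressions above can be evaluated side by side. This is a small sketch with a hypothetical function name `q_moments`, shown for the $n = 7$, $m = k = 3$ design used as an example in this section.

```python
from fractions import Fraction


def q_moments(n, m, k):
    """Exact mean and variance of Q, plus the large-k approximation to the variance."""
    mean_q = Fraction(n - 1)
    var_q = 2 * (n - 1) * (1 - Fraction(m * (n - 1), n * k * (m - 1)))
    approx = 2 * (n - 1) * (1 - Fraction(1, k))
    return mean_q, var_q, approx


mean_q, var_q, approx = q_moments(n=7, m=3, k=3)
print(mean_q, var_q, approx)  # 6 48/7 8
# The chi-square distribution with n - 1 = 6 degrees of freedom has variance 12,
# so with k as small as 3 the approximation is still rough.
print(float(var_q) / (2 * 6))
```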

The rejection region for large $k$ and significance level $\alpha$ then is

$$Q \in R \quad \text{for } Q \ge \chi^{2}_{n-1,\alpha}$$

TIED OBSERVATIONS

Unlike the case of complete rankings, no simple correction factor can
be introduced to account for the reduction in total sum of squares of
deviations of column totals around their mean when the midrank
method is used to handle ties. If there are only a few ties, the null
distribution of W should not be seriously altered, and thus the statistic
can be computed as usual with midranks assigned. Alternatively, any
of the other methods of handling ties discussed in Section 5.6 (except
omission of tied observations) may be adopted.

APPLICATIONS

This analysis-of-variance test based on ranks for balanced incomplete rankings is usually called the Durbin test. The test statistic here, where $\lambda$ is the number of times each pair of treatments is ranked and $m$ is the number of treatments in each block, is most easily computed as

$$Q = \frac{12\sum_{j=1}^{n}R_{j}^{2}}{\lambda n(m+1)} - \frac{3k^{2}(m+1)}{\lambda} \tag{5.3}$$

which is asymptotically chi-square distributed with $n-1$ degrees of freedom. The null hypothesis of equal treatment effects is rejected for $Q$ large.
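The computational form (5.3) agrees with $Q = \lambda(n^{2}-1)W/(m+1)$ whenever the column sums add to $kn(m+1)/2$, as they must when each observer assigns the ranks $1, \ldots, m$. The following sketch checks this on a set of column sums that is purely hypothetical (chosen only to have the right total for $n = 7$, $m = k = 3$, $\lambda = 1$); the function names are also assumptions.

```python
from fractions import Fraction


def durbin_q(column_sums, m, k, lam):
    """Durbin statistic from the computational form (5.3)."""
    n = len(column_sums)
    sum_sq = sum(Fraction(r) ** 2 for r in column_sums)
    return 12 * sum_sq / (lam * n * (m + 1)) - Fraction(3 * k ** 2 * (m + 1), lam)


def durbin_q_via_w(column_sums, m, k, lam):
    """Same statistic obtained as lambda (n^2 - 1) W / (m + 1), with W from (5.1)."""
    n = len(column_sums)
    mean = Fraction(k * (m + 1), 2)
    ss = sum((Fraction(r) - mean) ** 2 for r in column_sums)
    w = 12 * ss / (lam ** 2 * n * (n ** 2 - 1))
    return lam * (n ** 2 - 1) * w / (m + 1)


# Hypothetical column sums; they add to kn(m + 1)/2 = 42 as required.
sums = [5, 6, 4, 8, 7, 6, 6]
assert sum(sums) == 3 * 7 * (3 + 1) // 2
q = durbin_q(sums, m=3, k=3, lam=1)
assert q == durbin_q_via_w(sums, m=3, k=3, lam=1)
print(q)  # 30/7
```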

Kendall's coefficient of concordance, a descriptive measure for $k$ incomplete sets of $n$ rankings, where $m$ is the number of objects presented for ranking and $\lambda$ is the number of times each pair of objects is ranked together, is given in (5.1), which is equivalent to

$$W = \frac{12\sum_{j=1}^{n}R_{j}^{2} - 3k^{2}n(m+1)^{2}}{\lambda^{2}n(n^{2}-1)} \tag{5.4}$$

and $Q = \lambda(n^{2}-1)W/(m+1)$ is the chi-square test statistic with $n-1$ degrees of freedom for the null hypothesis of no agreement between rankings.

If the null hypothesis of equal treatment effects is rejected, we can use a multiple comparisons procedure to determine which pairs of treatments have significantly different effects. Treatments $i$ and $j$ are declared to be significantly different if

$$|R_{i} - R_{j}| \ge z^{*}\sqrt{\frac{km(m^{2}-1)}{6(n-1)}} \tag{5.5}$$

where $z^{*}$ is the negative of the $[\alpha/n(n-1)]$th quantile of the standard normal distribution.

Example 5.1 A taste-test experiment to compare seven different kinds of wine is to be designed such that no taster will be asked to rank more than three different kinds, so we have $n = 7$ and $m = 3$. If each pair of wines is to be compared only once so that $\lambda = 1$, the required number of tasters is $p = \lambda n(n-1)/m(m-1) = 7$. A balanced design was used and the rankings given are shown below. Calculate Kendall's coefficient of concordance as a measure of agreement between rankings and test the null hypothesis of no agreement.

| Taster \ Wine | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 2 | | 3 | | | |
| 2 | | 1 | 3 | | 2 | | |
| 3 | | | 3 | 2 | | 1 | |
| 4 | | | | 2 | 3 | | 1 |
| 5 | 1 | | | | 3 | 2 | |
| 6 | | 2 | | | | 1 | 3 |
| 7 | 1 | | 3 | | | | 2 |
| Total | 3 | 5 | 9 | 7 | 8 | 4 | 6 |

Solution Each wine is ranked three times so that $k = 3$. We calculate $\sum R_{j}^{2} = 280$ and substitute into (5.4) to get $W = 1$, which describes perfect agreement. The test statistic from (5.3) is $Q = 12$ with 6 degrees of freedom. The P value from Table B of the Appendix is $0.05 < P < 0.10$ for the test of no agreement between rankings. At the time of this writing, neither STATXACT nor SAS has an option for the Durbin test.
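The arithmetic of Example 5.1 can be reproduced from the printed column totals alone. The sketch below evaluates (5.4) and (5.3) exactly and then computes the upper-tail chi-square probability from the closed form available when the degrees of freedom are even, as they are here; the helper name `chi2_sf_even_df` is hypothetical and the shortcut does not apply to odd degrees of freedom.

```python
from fractions import Fraction
from math import exp, factorial

# Column totals R_j from the table of Example 5.1.
TOTALS = {"A": 3, "B": 5, "C": 9, "D": 7, "E": 8, "F": 4, "G": 6}
n, m, k, lam = 7, 3, 3, 1

sum_sq = sum(r * r for r in TOTALS.values())
# Coefficient of concordance from (5.4).
w = Fraction(12 * sum_sq - 3 * k ** 2 * n * (m + 1) ** 2,
             lam ** 2 * n * (n ** 2 - 1))
# Durbin statistic from (5.3).
q = Fraction(12 * sum_sq, lam * n * (m + 1)) - Fraction(3 * k ** 2 * (m + 1), lam)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


p_value = chi2_sf_even_df(float(q), n - 1)
print(sum_sq, w, q, round(p_value, 3))  # 280 1 12 0.062
```

The computed probability falls inside the interval $0.05 < P < 0.10$ read from the table in the solution.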
