A Non-Parametric Test for Randomness in a Sequence of Multinomial Trials

Author(s): B. M. Bennett
Source: Biometrics, Vol. 20, No. 1 (Mar., 1964), pp. 182-190
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/2527626
Accessed: 11/06/2014 04:42


This content downloaded from 62.122.76.57 on Wed, 11 Jun 2014 04:42:51 AM


All use subject to JSTOR Terms and Conditions
A NON-PARAMETRIC TEST FOR RANDOMNESS IN A
SEQUENCE OF MULTINOMIAL TRIALS

B. M. BENNETT

University of Washington, Seattle, Washington, U.S.A.

SUMMARY
A rank-order test is proposed for experimental situations where it is of concern
to determine whether a sequence of multinomial probabilities may be varying
significantly in $n$ independent trials, each with a fixed number of possible outcomes.
An approximate $\chi^2$-test for the hypothesis $H_0$ of constancy of probabilities is
proposed, and the generating function is obtained for the distribution of the rank
sums. An example using sibship data on congenital malformations (data of Milham
[1962]) illustrates the use of the test for birth-order effect when several abnormalities
are present.

1. INTRODUCTION AND STATEMENT OF PROBLEM

Consider the situation of $n$ independent trials in each of which one and only one of the $s$ possible outcomes or events $E_1, \ldots, E_s$ can occur. Suppose that the probability of the realization of event $E_j$ on the $i$-th trial is $p_{ij}$ ($i = 1, \ldots, n$; $j = 1, \ldots, s$). It is required to test the hypothesis

$$H_0: \; p_{ij} = p_j \quad \text{(unspecified)},$$

i.e., the probability that $E_j$ occurs is constant from trial to trial, given that there are fixed totals of $n_1$ occurrences of $E_1, \ldots, n_s$ occurrences of $E_s$, respectively, in the $n \; (= \sum_j n_j)$ trials.
In this general formulation the problem is of interest, e.g., in learning situations (Cane [1962]), where it may be desired to test whether the proportion or degree of graded achievements is changing during a series of independent trials. In genetic problems, too, it may be of concern whether certain assumed ratios of observable characteristics (e.g., a 9:3:3:1 ratio) are in fact changing over a sequence of records. The usual chi-square test does not generally detect such lack of randomness, since it depends only on the totals $n_1, \ldots, n_s$ and not on the order in which the outcomes occur.
The formulation of the problem and the test proposed represent a
generalization of that given by Haldane & Smith [1948]. These authors
were concerned with a rank-order test for detection of a possible
differential incidence of a particular abnormality or disease amongst
recorded sibs (without missing ones or twins) in families in which there was at least one affected member. As a test of a possible birth-order effect, Haldane & Smith proposed as a criterion the sum ($= X$) of the ranks, or rather birth orders, of the affected sibs in records of propositi, and worked out the distribution of $X$ through its generating function, assuming that all possible orders of affected sibs were equally probable.
The test proposed in this paper extends this to the case where several abnormalities may be of concern. Some of the properties of the Haldane-Smith test have been discussed by the author [1956]. Cox [1958] considered the regression aspects of this problem, and demonstrated the optimum properties of the Haldane-Smith test in testing against a linear trend, in particular with respect to a 'logit' transformation of the (binomial) probabilities.

2. DISTRIBUTION OF RANK SUMS


The multinomial situation envisaged in the previous section may be characterized by the following pairs $[x_{ij}, y_{ij}]$ of variates, where

$$[x_{ij}, y_{ij}] = \begin{cases} [i, 1] & \text{if } E_j \text{ occurs on the } i\text{-th trial}, \\ [0, 0] & \text{otherwise}, \end{cases}$$

so that the sums $X_1 = \sum_i x_{i1}, \ldots, X_s = \sum_i x_{is}$ of the ranks or serial numbers at which the events $E_1, \ldots, E_s$ occur are measures of the randomness, or lack of it, amongst the independent multinomial trials. The sums $X_1, \ldots, X_s$ are further qualified by the restrictions $Y_1 = \sum_i y_{i1} = n_1, \ldots, Y_s = \sum_i y_{is} = n_s$. Also $\sum_j X_j = \tfrac{1}{2}n(n+1)$ and $\sum_j Y_j = n$. Clearly, then, it suffices to consider only the first $(s-1)$ of the sums $X_j$, since

$$X_s = \tfrac{1}{2}n(n+1) - \sum_{j=1}^{s-1} X_j,$$

and likewise only the first $(s-1)$ of the corresponding $Y_j$.
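To make these definitions concrete, here is a minimal Python sketch (our illustration, not the author's) that computes the rank sums $X_j$ and the counts $Y_j$ from a sequence of outcomes coded $0, \ldots, s-1$. The coding and the function name `rank_sums` are hypothetical. The sketch also confirms $\sum_j X_j = \tfrac{1}{2}n(n+1)$.

```python
from typing import List, Sequence, Tuple


def rank_sums(outcomes: Sequence[int], s: int) -> Tuple[List[int], List[int]]:
    """Return (X, Y) for outcomes coded 0..s-1 (a hypothetical coding).

    X[j] is the sum of the trial numbers i (1-based) at which outcome j
    occurred, i.e. sum_i x_ij; Y[j] is the number of such trials, sum_i y_ij.
    """
    X = [0] * s
    Y = [0] * s
    for i, j in enumerate(outcomes, start=1):
        X[j] += i  # x_ij = i on the trial where outcome j occurs
        Y[j] += 1  # y_ij = 1 on that trial
    return X, Y


seq = [1, 0, 2, 1, 0]  # n = 5 trials, s = 3 outcomes
X, Y = rank_sums(seq, s=3)
n = len(seq)
print(X, Y)  # [7, 5, 3] [2, 2, 1]
assert sum(X) == n * (n + 1) // 2 and sum(Y) == n
```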


Now denote by $A_j$ the numerical value taken by the random variable $X_j$, and let $T(n_1, n_2, \ldots, n_{s-1}; n; A_1, A_2, \ldots, A_{s-1})$ represent the total number of partitions in which each $A_j$ ($1 \le j \le s-1$) is partitioned into $n_j$ distinct integers, each part being $n$ or less, such that $0 \le \sum_{j=1}^{s-1} n_j \le n$. If these partitions are further distinguished according to which of the $X$'s contains the integer $n$, then $T$ satisfies the difference equation

$$\begin{aligned} T(n_1, \ldots, n_{s-1}; n; A_1, \ldots, A_{s-1}) ={}& T(n_1 - 1, n_2, \ldots, n_{s-1}; n-1; A_1 - n, A_2, \ldots, A_{s-1}) + \cdots \\ &+ T(n_1, \ldots, n_{s-2}, n_{s-1} - 1; n-1; A_1, \ldots, A_{s-2}, A_{s-1} - n) \\ &+ T(n_1, \ldots, n_{s-1}; n-1; A_1, \ldots, A_{s-1}). \qquad (1) \end{aligned}$$

Any term in this equation is taken to be zero if, for any $i$ ($1 \le i \le s-1$), $n_i > 0$ and the argument $A_i$ or $A_i - n$ is negative or zero.
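If $P(n_1, \ldots, n_{s-1}; n; A_1, \ldots, A_{s-1})$ represents the corresponding probability that $X_1 = A_1, \ldots, X_{s-1} = A_{s-1}$, then dividing (1) by the multinomial coefficient $n!/(n_1! \cdots n_{s-1}!\, n_s!)$ we obtain the recursion formula

$$\begin{aligned} P(n_1, \ldots, n_{s-1}; n; A_1, \ldots, A_{s-1}) ={}& \frac{n_1}{n}\, P(n_1 - 1, \ldots, n_{s-1}; n-1; A_1 - n, \ldots, A_{s-1}) + \cdots \\ &+ \frac{n_{s-1}}{n}\, P(n_1, \ldots, n_{s-1} - 1; n-1; A_1, \ldots, A_{s-1} - n) \\ &+ \left(1 - \frac{n_1 + \cdots + n_{s-1}}{n}\right) P(n_1, \ldots, n_{s-1}; n-1; A_1, \ldots, A_{s-1}). \qquad (2) \end{aligned}$$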
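Using the recursion relation (2), it may be verified that the first and second moments and covariances of the rank sums $X_j$ ($j = 1, \ldots, s$) are

$$\begin{aligned} \mu_j &= E(X_j) = \sum_{A} A_j\, P(n_1, \ldots, n_{s-1}; n; A_1, \ldots, A_{s-1}) = \tfrac{1}{2}(n+1)\,n_j, \\ V(X_j) &= \sigma_{jj} = E(X_j - \mu_j)^2 = \tfrac{1}{12}(n+1)\,n_j(n - n_j), \\ C(X_i, X_j) &= \sigma_{ij} = E(X_i - \mu_i)(X_j - \mu_j) = -\tfrac{1}{12}(n+1)\,n_i n_j \quad (i \ne j). \end{aligned} \qquad (3)$$

Also all mixed moments $E(X_i - \mu_i)^r (X_j - \mu_j)^q$ vanish if $(r + q)$ is odd.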
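The moments in (3) can be confirmed for a small case by brute force. The sketch below is our own check, not part of the paper: it enumerates every distinct ordering of a multiset of outcome labels (all equally likely under $H_0$), computes exact means, variances and the covariance of $X_1$ and $X_2$, and also checks that the third central moment of $X_1$ vanishes, as the statement about odd-order moments requires.

```python
from fractions import Fraction
from itertools import permutations


def exact_moments(counts):
    """Exact H0 moments of the rank sums by enumerating all distinct orders.

    counts = (n_1, ..., n_s). Returns (means, variances, C(X_1, X_2),
    third central moment of X_1). Feasible only for small n.
    """
    labels = [j for j, nj in enumerate(counts) for _ in range(nj)]
    orders = set(permutations(labels))  # each distinct order equally likely
    N, s = len(orders), len(counts)
    sums = []
    for order in orders:
        X = [0] * s
        for i, j in enumerate(order, start=1):
            X[j] += i
        sums.append(X)
    mean = [Fraction(sum(X[j] for X in sums), N) for j in range(s)]
    var = [sum((X[j] - mean[j]) ** 2 for X in sums) / N for j in range(s)]
    cov12 = sum((X[0] - mean[0]) * (X[1] - mean[1]) for X in sums) / N
    third1 = sum((X[0] - mean[0]) ** 3 for X in sums) / N
    return mean, var, cov12, third1


counts = (2, 1, 3)  # n_1, n_2, n_3, so n = 6
n = sum(counts)
mean, var, cov12, third1 = exact_moments(counts)
for j, nj in enumerate(counts):
    assert mean[j] == Fraction((n + 1) * nj, 2)
    assert var[j] == Fraction((n + 1) * nj * (n - nj), 12)
assert cov12 == Fraction(-(n + 1) * counts[0] * counts[1], 12)
assert third1 == 0
```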
In connection with a related test based on the rank-order statistics $X_1, \ldots, X_s$ for the hypothesis of equality of means in the one-way analysis of variance with unequal numbers of continuous observations in each of $s$ classifications, Kruskal & Wallis [1952] have tabulated for $s = 3$ the exact distribution of the $X$'s close to the significance levels $\epsilon = 0.10, 0.05, 0.01$ for various combinations of $n_1$ and $n_2$ up to $n = 15$. The Kruskal-Wallis test is a generalization of the two-sample Mann-Whitney-Wilcoxon test (Mann & Whitney [1947]).
3. APPROXIMATE $\chi^2$-TEST FOR $H_0$
Provided that $\lim_{n \to \infty} (n_j/n) = c_j \ne 0$ for $j = 1, \ldots, s-1$, it is known from a theorem of Wald and Wolfowitz [1944] that the sequence

$$u_j = \sqrt{12}\,\left[X_j - \tfrac{1}{2}(n+1)n_j\right] / n^{3/2}$$
has asymptotically a singular multivariate normal distribution (of rank $s - 1$) with variance-covariance matrix $\|C(u_i, u_j)\| = \|\delta_{ij} c_i - c_i c_j\|$ for $i, j = 1, \ldots, s$, where $\delta_{ij}$ is the Kronecker delta.
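If $\|\sigma^{ij}\| = \|\sigma_{ij}\|^{-1}$ represents the inverse matrix of the variances and covariances of $X_1, \ldots, X_{s-1}$ given in (3), so that

$$\sigma^{ii} = \frac{12}{n(n+1)}\left(\frac{1}{n_i} + \frac{1}{n_s}\right), \qquad \sigma^{ij} = \frac{12}{n(n+1)}\,\frac{1}{n_s} \quad (i \ne j), \qquad (4)$$

comparison of

$$\chi^2 = \sum_{i=1}^{s-1}\sum_{j=1}^{s-1} \sigma^{ij}\left[X_i - \tfrac{1}{2}(n+1)n_i\right]\left[X_j - \tfrac{1}{2}(n+1)n_j\right] \qquad (5)$$

with $\chi^2_\epsilon$ is proposed as an approximate test for $H_0$, where $\chi^2_\epsilon$ represents the $100\epsilon$ per cent significance point of the $\chi^2$ distribution with $(s-1)$ degrees of freedom.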
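As a concrete check of (4), the sketch below (the helper names are ours) builds the covariance matrix of $X_1, \ldots, X_{s-1}$ from (3), multiplies it by the closed-form inverse (4) in exact arithmetic, and then evaluates the quadratic form (5). The rank sums $(6, 3, 12)$ used in the example come from the hypothetical order $E_1, E_3, E_2, E_3, E_1, E_3$ with $n_1 = 2$, $n_2 = 1$, $n_3 = 3$.

```python
from fractions import Fraction


def cov_matrix(counts):
    """Covariances (3) of X_1, ..., X_{s-1}; counts = (n_1, ..., n_s)."""
    n, s = sum(counts), len(counts)
    return [[Fraction((n + 1) * counts[i] * ((n if i == j else 0) - counts[j]), 12)
             for j in range(s - 1)] for i in range(s - 1)]


def inv_matrix(counts):
    """Closed-form inverse (4): 12/(n(n+1)) * (delta_ij / n_i + 1 / n_s)."""
    n, s = sum(counts), len(counts)
    k = Fraction(12, n * (n + 1))
    return [[k * (Fraction(1, counts[-1]) + (Fraction(1, counts[i]) if i == j else 0))
             for j in range(s - 1)] for i in range(s - 1)]


def chi2_quadratic_form(X, counts):
    """Statistic (5) from the rank sums X = (X_1, ..., X_s); X_s is not used."""
    n, s = sum(counts), len(counts)
    d = [X[i] - Fraction((n + 1) * counts[i], 2) for i in range(s - 1)]
    S = inv_matrix(counts)
    return sum(S[i][j] * d[i] * d[j] for i in range(s - 1) for j in range(s - 1))


counts = (2, 1, 3)  # n_1, n_2, n_3, so n = 6
C, S = cov_matrix(counts), inv_matrix(counts)
m = len(C)
product = [[sum(C[i][k] * S[k][j] for k in range(m)) for j in range(m)] for i in range(m)]
assert product == [[1 if i == j else 0 for j in range(m)] for i in range(m)]
print(chi2_quadratic_form((6, 3, 12), counts))  # 3/7
```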
Using (4), (5) reduces to

$$\chi^2 = \frac{12}{n(n+1)} \sum_{j=1}^{s} \frac{1}{n_j}\left[X_j - \tfrac{1}{2}(n+1)n_j\right]^2 = \frac{12}{n(n+1)} \sum_{j=1}^{s} \frac{X_j^2}{n_j} - 3(n+1), \qquad (6)$$
which happens to coincide with the $H$-test originally proposed by Kruskal and Wallis [1952] as a non-parametric analogue of the $F$ or variance-ratio test. For this test it is known from results on $k$-statistics and polykays that

$$E(\chi^2) = s - 1,$$

$$V(\chi^2) = -\frac{6}{5}\sum_{j=1}^{s}\frac{1}{n_j} + \frac{1}{5n(n+1)}\left[10n^2(s-1) - 2n(2s-3)(s+2) + 6s(3s-2)\right] \qquad (7)$$

(cf. David & Barton [1962], p. 199), though this latter result does not appear to agree with that obtained by Kruskal ([1952], p. 535). David and Barton suggest that for $n$ large and $s$ fixed the $\chi^2$ distribution is an adequate approximation. These authors noted, however, that if $n$ is as small as 15 the second moment of the test statistic differs considerably from that of $\chi^2$. Kruskal and Wallis [1952] also tabulated the exact distribution for $s = 3$, $n_j \le 5$. Wallace [1959] has considered a Beta-type approximation for $\chi^2$.
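The form (6) is the convenient one for computation. As a sketch (the function names are ours), the code below evaluates (6) and confirms the first result in (7), $E(\chi^2) = s - 1$, exactly, by enumerating every equally likely order under $H_0$ for a few small designs. It makes no attempt to check the variance formula in (7).

```python
from fractions import Fraction
from itertools import permutations


def chi2_rank(X, counts):
    """Statistic (6): 12/(n(n+1)) * sum_j X_j^2 / n_j - 3(n+1)."""
    n = sum(counts)
    return Fraction(12, n * (n + 1)) * sum(Fraction(x * x, m) for x, m in zip(X, counts)) - 3 * (n + 1)


def exact_mean_chi2(counts):
    """E(chi^2) under H0, enumerating every distinct order of the outcomes."""
    labels = [j for j, m in enumerate(counts) for _ in range(m)]
    orders = set(permutations(labels))
    total = Fraction(0)
    for order in orders:
        X = [0] * len(counts)
        for i, j in enumerate(order, start=1):
            X[j] += i
        total += chi2_rank(X, counts)
    return total / len(orders)


print(chi2_rank((6, 3, 12), (2, 1, 3)))  # 3/7
print(exact_mean_chi2((2, 1, 3)))  # 2, i.e. s - 1
```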

Kruskal [1952] has also demonstrated that, if $\lim (n_i/n) = c_i = 0$ for $i = 1, \ldots, k$, then $\chi^2$ as defined in (6) has $(s - k - 1)$ degrees of freedom.
4. EXAMPLE

We consider one of the sibships from Milham's data [1962] on the outcomes of successive pregnancies in patients who had two or more infants with malformations of the central nervous system, in particular anencephalus or spina bifida. The presence or absence of each of the specified conditions (including 'Normal' pregnancy) is denoted by the $y$ values, 1 or 0.

DATA FOR PATIENT NO. 3

Pregnancy No.   Anencephalus   Premature or Abortion   Normal
      1               0                  1                0
      2               0                  0                1
      3               0                  0                1
      4               1                  0                0
      5               0                  0                1
      6               0                  0                1
      7               0                  0                1
      8               0                  0                1
      9               1                  0                0
     10               1                  0                0

RANK SUM        $X_1 = 23$     $X_2 = 1$               $X_3 = 31$

$$\chi^2 = \frac{12}{10(10+1)}\left[\tfrac{1}{3}(23)^2 + (1)^2 + \tfrac{1}{6}(31)^2\right] - 3(10+1) = 3.82.$$

There is then little evidence ($P \approx 0.15$, with 2 degrees of freedom) of a birth-order effect in these categories. In fact, none of the ten patients in Milham's data shows evidence of a birth-order effect in the several categories of congenital malformation by this $\chi^2$-test, either individually or by the sum of the $\chi^2$ values for the combined data.
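The arithmetic of this example can be reproduced in a few lines. In this sketch the single-letter coding of outcomes is ours. Since $s - 1 = 2$, the upper-tail probability of $\chi^2$ is exactly $e^{-x/2}$, which gives the $P \approx 0.15$ quoted above.

```python
import math

# Outcomes of pregnancies 1..10 for patient no. 3, read from the table:
# A = anencephalus, P = premature or abortion, N = normal (coding is ours).
outcomes = "PNNANNNNAA"
categories = "APN"
n = len(outcomes)
X = {c: sum(i for i, o in enumerate(outcomes, start=1) if o == c) for c in categories}
Y = {c: outcomes.count(c) for c in categories}
chi2 = 12 / (n * (n + 1)) * sum(X[c] ** 2 / Y[c] for c in categories) - 3 * (n + 1)
# With s - 1 = 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2)
print(X, round(chi2, 2), round(p_value, 2))  # {'A': 23, 'P': 1, 'N': 31} 3.82 0.15
```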

5. DERIVATION OF GENERATING FUNCTION OF RANK SUMS UNDER $H_0$ AND ALTERNATIVES

This section is first devoted to the derivation of the generating function of the rank sums $X_1, \ldots, X_s$ under the hypothesis $H_0$,

This content downloaded from 62.122.76.57 on Wed, 11 Jun 2014 04:42:51 AM


All use subject to JSTOR Terms and Conditions
RANDOMNESS IN MULTINOMIAL TRIALS 187

i.e. assuming that all orders of the occurrences of the multinomial events described in section 1 are equally probable.
The generating function $G^0(X_1, \ldots, X_s; t_1, \ldots, t_s)$ of the sums may in fact be derived directly from the recursion relations (2), but the following alternative presentation introduces new features of interest in the problem and its relation to the earlier work of Haldane & Smith and Kruskal & Wallis.
The joint generating function of the $X$'s and the $Y$'s, defined in section 2, is

$$G^0(X, Y; t, u) = \prod_{i=1}^{n}\left(p_1 t_1^{i} u_1 + \cdots + p_s t_s^{i} u_s\right), \qquad (8)$$

so that the generating function $G^0(X \mid Y; t)$ of the required (conditional) distribution of the sums $X_1, \ldots, X_s$, given that $Y_1 = n_1, \ldots, Y_s = n_s$, is the coefficient of $u_1^{n_1} \cdots u_s^{n_s}$ in the expansion of (8), divided by the multinomial probability

$$P(Y_1 = n_1, \ldots, Y_s = n_s) = \frac{n!}{n_1! \cdots n_s!}\, p_1^{n_1} \cdots p_s^{n_s}.$$

In view of the dependence of the $X$'s, it then suffices to consider the terms in the expansion of the function

$$F(u_1, \ldots, u_{s-1}) = \prod_{i=1}^{n}\left(1 + t_1^{i} u_1 + \cdots + t_{s-1}^{i} u_{s-1}\right) = \sum_{n_1} \cdots \sum_{n_{s-1}} c(n_1, \ldots, n_{s-1})\, u_1^{n_1} \cdots u_{s-1}^{n_{s-1}}, \qquad (9)$$

since the coefficient $c(n_1, \ldots, n_{s-1})$ in the expansion (9), divided by the multinomial coefficient $n!/[n_1! \cdots n_{s-1}!\,(n - n_1 - \cdots - n_{s-1})!]$, also coincides with the generating function $G^0(X_1, \ldots, X_{s-1}; t_1, \ldots, t_{s-1})$. Now since
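$$\left(1 + t_1^{n+1} u_1 + \cdots + t_{s-1}^{n+1} u_{s-1}\right) F(u_1, \ldots, u_{s-1}) = \left(1 + t_1 u_1 + \cdots + t_{s-1} u_{s-1}\right) F(t_1 u_1, \ldots, t_{s-1} u_{s-1}),$$

identifying the coefficients $c(n_1, \ldots, n_{s-1})$ on both sides of this equation gives the recursion relation

$$\begin{aligned} \left(1 - t_1^{n_1} \cdots t_{s-1}^{n_{s-1}}\right) c(n_1, \ldots, n_{s-1}) ={}& \left(t_1^{n_1} \cdots t_{s-1}^{n_{s-1}} - t_1^{n+1}\right) c(n_1 - 1, n_2, \ldots, n_{s-1}) + \cdots \\ &+ \left(t_1^{n_1} \cdots t_{s-1}^{n_{s-1}} - t_{s-1}^{n+1}\right) c(n_1, \ldots, n_{s-2}, n_{s-1} - 1). \qquad (10) \end{aligned}$$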
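As a check on (9) and (10) for $s = 3$, the sketch below (all names are ours) expands $F$ directly, storing each coefficient $c(n_1, n_2)$ as a polynomial in $t_1, t_2$ keyed by exponent pairs. It confirms that each $c(n_1, n_2)$ has total weight equal to the multinomial coefficient, and that (10) holds coefficient by coefficient.

```python
from collections import defaultdict
from math import factorial


def expand_F(n):
    """Coefficients c(n1, n2) of F(u1, u2) = prod_{i=1}^n (1 + t1^i u1 + t2^i u2).

    Returns {(n1, n2): {(a1, a2): integer coefficient of t1^a1 t2^a2}} (s = 3).
    """
    F = {(0, 0): {(0, 0): 1}}
    for i in range(1, n + 1):
        G = defaultdict(lambda: defaultdict(int))
        for (n1, n2), poly in F.items():
            for (a1, a2), c in poly.items():
                G[(n1, n2)][(a1, a2)] += c          # term 1: trial i gives E_3
                G[(n1 + 1, n2)][(a1 + i, a2)] += c  # term t1^i u1: trial i gives E_1
                G[(n1, n2 + 1)][(a1, a2 + i)] += c  # term t2^i u2: trial i gives E_2
        F = {k: dict(v) for k, v in G.items()}
    return F


def shift(poly, d1, d2, scale=1):
    """Multiply a polynomial by scale * t1^d1 * t2^d2."""
    return {(a1 + d1, a2 + d2): scale * c for (a1, a2), c in poly.items()}


def add(*polys):
    """Sum of polynomials, dropping zero coefficients."""
    out = defaultdict(int)
    for p in polys:
        for k, c in p.items():
            out[k] += c
    return {k: c for k, c in out.items() if c != 0}


def recursion_10_holds(n):
    F = expand_F(n)
    for (n1, n2), c in F.items():
        lhs = add(c, shift(c, n1, n2, -1))  # (1 - t1^n1 t2^n2) c(n1, n2)
        rhs_terms = []
        for prev, (e1, e2) in (((n1 - 1, n2), (n + 1, 0)), ((n1, n2 - 1), (0, n + 1))):
            if prev in F:  # terms with a negative index are zero
                rhs_terms += [shift(F[prev], n1, n2), shift(F[prev], e1, e2, -1)]
        if lhs != add(*rhs_terms):
            return False
    return True


F5 = expand_F(5)
assert sum(F5[(2, 1)].values()) == factorial(5) // (factorial(2) * factorial(1) * factorial(2))
assert all(recursion_10_holds(n) for n in range(1, 7))
```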
The case $s = 3$ will serve to bring out the essential points regarding
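the solution of the generating function from (10). For this case, if we set $n_2 = 0$ and $n_1 = 0$ in turn in (10), we obtain the coefficients

$$c(n_1, 0) = \prod_{k=1}^{n_1} \frac{t_1^{k} - t_1^{n+1}}{1 - t_1^{k}}, \qquad c(0, n_2) = \prod_{k=1}^{n_2} \frac{t_2^{k} - t_2^{n+1}}{1 - t_2^{k}}, \qquad (11)$$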

which are Euler's generating functions for enumerating the partitions into $n_1$ (or $n_2$) distinct integers, each not exceeding $n$. Equation (11) was rediscovered by Haldane & Smith in connection with their test ($= X_1$). Under the null hypothesis the Haldane & Smith test is equivalent to that of Mann & Whitney [1947] for the equality of two sample means from continuous distributions. If we define

$$\lambda(n_1, n_2) = \frac{t_1^{n_1} t_2^{n_2} - t_1^{n+1}}{1 - t_1^{n_1} t_2^{n_2}}, \qquad \mu(n_1, n_2) = \frac{t_1^{n_1} t_2^{n_2} - t_2^{n+1}}{1 - t_1^{n_1} t_2^{n_2}}, \qquad (12)$$

equation (10) becomes

$$c(n_1, n_2) = \lambda(n_1, n_2)\, c(n_1 - 1, n_2) + \mu(n_1, n_2)\, c(n_1, n_2 - 1). \qquad (13)$$

Using this recursion relation, we then obtain formally

$$c(n_1, n_2) = \sum \prod \lambda \cdots \mu \cdots, \qquad (14)$$

where the meaning is that each product contains $n_1$ $\lambda$'s and $n_2$ $\mu$'s in a specified order, such that each $\lambda(j, k)$ is followed by $\lambda(j-1, k)$ or $\mu(j-1, k)$ and each $\mu(j, k)$ is followed by $\lambda(j, k-1)$ or $\mu(j, k-1)$, a total of $n!/[n_1!\, n_2!\, (n - n_1 - n_2)!]$ terms altogether.

Equation (14) is thus formally a generalization of Euler's theorem, and enumerates the bipartitions of the first $n$ integers into two sums with $n_1$ and $n_2$ distinct integers in each.
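The closed form (11) can be checked against direct enumeration of $n_1$-subsets of $\{1, \ldots, n\}$, which is what $c(n_1, 0)$ counts. To avoid polynomial division, the sketch below (names are ours) multiplies both sides of (11) by $\prod_{k=1}^{n_1}(1 - t^k)$ and compares coefficients, with polynomials in $t$ stored as exponent-to-coefficient dictionaries.

```python
from collections import Counter
from itertools import combinations


def poly_mul(p, q):
    """Product of two polynomials in t stored as {exponent: coefficient}."""
    out = Counter()
    for a, x in p.items():
        for b, y in q.items():
            out[a + b] += x * y
    return {k: v for k, v in out.items() if v != 0}


def subset_sum_poly(n, n1):
    """c(n1, 0) by direct enumeration: the coefficient of t^A is the number
    of ways to choose n1 distinct integers from 1..n with sum A."""
    return dict(Counter(sum(c) for c in combinations(range(1, n + 1), n1)))


def eq11_holds(n, n1):
    """Check c(n1, 0) * prod (1 - t^k) == prod (t^k - t^(n+1)), k = 1..n1."""
    lhs, rhs = subset_sum_poly(n, n1), {0: 1}
    for k in range(1, n1 + 1):
        lhs = poly_mul(lhs, {0: 1, k: -1})
        rhs = poly_mul(rhs, {k: 1, n + 1: -1})
    return lhs == rhs


assert all(eq11_holds(n, n1) for n in range(1, 9) for n1 in range(n + 1))
print(subset_sum_poly(4, 2))  # {3: 1, 4: 1, 5: 2, 6: 1, 7: 1}
```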
An important consideration in the use of the proposed test based on the rank sums $X_1, \ldots, X_s$ will be its power in distinguishing 'trend' alternatives, in particular amongst the multinomial sequence $\{p_{ij}\}$ for $i = 1, \ldots, n$. The test is likely to be less powerful in detecting possible periodicity amongst the probabilities of the events $E_1, \ldots, E_s$.
It is the purpose of this concluding section to present an outline of results on a general class of alternatives to $H_0$ which permit trend, in the sense used by Cox. The resulting generating function allows the exact power function of the test based on the $X$'s to be computed for each $n$, i.e. we may compute the exact probability of obtaining a significant result using the generalized rank-order test. This represents a generalization of the results obtained for the Haldane-Smith test (Bennett [1956]).

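We consider 'logit' alternatives to $H_0$, in which the multinomial probabilities are given by

$$H: \; p_{ij} = \frac{e^{\alpha_j + \beta_j x_i}}{\sum_{k=1}^{s} e^{\alpha_k + \beta_k x_i}} \qquad (15)$$

for any particular sequence $\{x_i\}$, with $\alpha_s = \beta_s = 0$ for identifiability. (Thus $H_0$ is equivalent to the hypothesis $\beta_j = 0$, $j = 1, \ldots, s$.) Using an argument similar to that of Cox [1958] in the binomial case, the likelihood function of the $y$'s when $x_i = i$ is

$$\frac{\exp\left[\sum_{j=1}^{s-1} \alpha_j Y_j + \sum_{j=1}^{s-1} \beta_j X_j\right]}{\prod_{i=1}^{n} \sum_{k=1}^{s} e^{\alpha_k + \beta_k i}}, \qquad (16)$$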

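and so, using the notation of section 2, the joint distribution of the $X$'s and $Y$'s is

$$P(X_1 = A_1, \ldots, X_{s-1} = A_{s-1};\; Y_1 = n_1, \ldots, Y_{s-1} = n_{s-1}) = \frac{T(n_1, \ldots, n_{s-1}; n; A_1, \ldots, A_{s-1})\, \exp\left[\sum_{j=1}^{s-1} (\alpha_j n_j + \beta_j A_j)\right]}{\prod_{i=1}^{n} \sum_{k=1}^{s} e^{\alpha_k + \beta_k i}}. \qquad (17)$$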

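The required conditional distribution of the rank sums $X$ is then

$$\begin{aligned} P(X_1 = A_1, \ldots, X_{s-1} = A_{s-1} \mid Y_1 = n_1, \ldots, Y_{s-1} = n_{s-1}) &= \frac{T(n_1, \ldots, n_{s-1}; n; A_1, \ldots, A_{s-1}) \exp\left[\sum_j \beta_j A_j\right]}{\sum_{A} T(n_1, \ldots, n_{s-1}; n; A_1, \ldots, A_{s-1}) \exp\left[\sum_j \beta_j A_j\right]} \\ &= \frac{T(n_1, \ldots, n_{s-1}; n; A_1, \ldots, A_{s-1}) \exp\left[\sum_j \beta_j A_j\right]}{\dfrac{n!}{n_1! \cdots n_s!}\, M^0_{X|Y}(\beta_1, \ldots, \beta_{s-1})}. \end{aligned} \qquad (18)$$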

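The moment generating function of the $X$'s is

$$M_{X|Y}(\theta_1, \ldots, \theta_{s-1}) = E\left[\exp \sum_j \theta_j X_j\right] = \frac{M^0_{X|Y}(\theta_1 + \beta_1, \ldots, \theta_{s-1} + \beta_{s-1})}{M^0_{X|Y}(\beta_1, \ldots, \beta_{s-1})} \qquad (19)$$

in terms of the moment generating function under $H_0$,

$$M^0_{X|Y}(\theta_1, \ldots, \theta_{s-1}) = G^0_{X|Y}(e^{\theta_1}, \ldots, e^{\theta_{s-1}}).$$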
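Equation (18) says that the conditional distribution under a logit alternative is the null count $T$ reweighted by $\exp(\sum_j \beta_j A_j)$, so an exact power calculation needs only the null enumeration. The sketch below is our own illustration for small $n$, not the author's procedure. The critical value $-2\ln 0.05 \approx 5.99$ (the upper 5 per cent point of $\chi^2$ with 2 degrees of freedom, i.e. the approximate test of section 3 for $s = 3$) and the values of $\beta$ are hypothetical choices.

```python
import math
from collections import Counter
from itertools import permutations


def rank_sum_counts(counts):
    """T(n_1, ..., n_{s-1}; n; A_1, ..., A_{s-1}) for every attainable A."""
    labels = [j for j, m in enumerate(counts) for _ in range(m)]
    T = Counter()
    for order in set(permutations(labels)):
        A = [0] * len(counts)
        for i, j in enumerate(order, start=1):
            A[j] += i
        T[tuple(A[:-1])] += 1  # X_s is determined by the others
    return T


def exact_power(counts, beta, crit):
    """P(chi^2 >= crit) under (15) with x_i = i, using the weights of (18)."""
    n = sum(counts)
    half = n * (n + 1) // 2
    weights = {A: t * math.exp(sum(b * a for b, a in zip(beta, A)))
               for A, t in rank_sum_counts(counts).items()}
    total = sum(weights.values())  # denominator of (18), up to a constant
    reject = 0.0
    for A, w in weights.items():
        X = list(A) + [half - sum(A)]
        chi2 = 12 / (n * (n + 1)) * sum(x * x / m for x, m in zip(X, counts)) - 3 * (n + 1)
        if chi2 >= crit - 1e-9:  # small tolerance for floating-point ties
            reject += w
    return reject / total


counts = (3, 3, 3)
crit = -2 * math.log(0.05)  # about 5.99
size = exact_power(counts, (0.0, 0.0), crit)   # beta = 0 is H0: exact size
power = exact_power(counts, (1.0, 0.0), crit)  # strong trend in E_1 relative to E_3
print(round(size, 4), round(power, 4))
```

With $\beta = 0$ the same function returns the exact size of the approximate $\chi^2$ test, which makes it easy to compare nominal and actual significance levels for small designs.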

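The corresponding cumulant generating function of the $X$'s is

$$K_{X|Y}(\theta_1, \ldots, \theta_{s-1}) = K^0_{X|Y}(\theta_1 + \beta_1, \ldots, \theta_{s-1} + \beta_{s-1}) - K^0_{X|Y}(\beta_1, \ldots, \beta_{s-1}), \qquad (20)$$

from which the cumulants may be computed.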
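As a numerical illustration of (20), the sketch below (our own, with hypothetical parameter values) computes $K^0$ from the enumerated null counts, obtains the first cumulant of $X_1$ under an alternative by a central difference of (20) at $\theta = 0$, and compares it with the mean computed directly from the reweighted distribution (18). At $\beta = 0$ the same derivative recovers $\mu_1 = \tfrac{1}{2}(n+1)n_1$ from (3).

```python
import math
from collections import Counter
from itertools import permutations


def rank_sum_counts(counts):
    """T(...; A) for every attainable (A_1, ..., A_{s-1}), by enumeration."""
    labels = [j for j, m in enumerate(counts) for _ in range(m)]
    T = Counter()
    for order in set(permutations(labels)):
        A = [0] * len(counts)
        for i, j in enumerate(order, start=1):
            A[j] += i
        T[tuple(A[:-1])] += 1
    return T


def K0(T, theta):
    """Null cumulant generating function log M0(theta) of (X_1, ..., X_{s-1})."""
    N = sum(T.values())
    return math.log(sum(t * math.exp(sum(a * b for a, b in zip(A, theta)))
                        for A, t in T.items()) / N)


def K_alt(T, theta, beta):
    """Equation (20): K(theta) = K0(theta + beta) - K0(beta)."""
    return K0(T, [a + b for a, b in zip(theta, beta)]) - K0(T, beta)


T = rank_sum_counts((2, 2, 2))  # n = 6
beta, h = (0.2, -0.1), 1e-5
# First cumulant (mean) of X_1 under the alternative: central difference at theta = 0.
k1 = (K_alt(T, (h, 0.0), beta) - K_alt(T, (-h, 0.0), beta)) / (2 * h)
# The same mean computed directly from the reweighted counts of (18).
w = {A: t * math.exp(sum(a * b for a, b in zip(A, beta))) for A, t in T.items()}
direct = sum(A[0] * x for A, x in w.items()) / sum(w.values())
print(round(k1, 6), round(direct, 6))
```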
ACKNOWLEDGEMENTS
In addition to some discussions with Professor E. M. Wright of
Aberdeen University on the generating function in section 3, it is also a
pleasure to mention the helpful comments of Dr. D. J. Finney of the
Statistics Department of Aberdeen during the author's tenure of a
Fulbright Research Scholarship there. I am also indebted to Mr. Donald
DuBeau for checking some of the derivations and to the referee for
improvements in the presentation of this paper.
REFERENCES
Bennett, B. M. [1956]. "On a rank-order test for the equality of probability of an event", Skand. Akt. 39, 11-18.
Cane, V. R. [1962]. "Learning and inference", J. R. Statist. Soc. A 125, 183-209.
Cox, D. R. [1958]. "The regression analysis of binary sequences", J. R. Statist. Soc. B 20, 215-231.
David, F. N. and Barton, D. E. [1962]. Combinatorial Chance, Griffin and Co.
Haldane, J. B. S. and Smith, C. A. B. [1948]. "A simple exact test for birth-order effect", Ann. Eug. 14, 117-124.
Hardy, G. H. and Wright, E. M. [1960]. Introduction to Theory of Numbers, Oxford University Press, 4th edition.
Kruskal, W. H. and Wallis, W. A. [1952, 1953]. "Use of ranks in one-criterion analysis of variance", J. Amer. Statist. Ass. 47, 583-621; 48, 907-911.
Kruskal, W. H. [1952]. "A nonparametric test for the several sample problem", Ann. Math. Statist. 23, 525-540.
Mann, H. B. and Whitney, D. R. [1947]. "On a test whether one of two random variables is stochastically larger than the other", Ann. Math. Statist. 18, 50-60.
Milham, S. [1962]. "Increased incidence of anencephalus and spina bifida in siblings of affected cases", Science 138, 593-594.
Wallace, D. L. [1959]. "Simplified Beta approximations to the Kruskal-Wallis test", J. Amer. Statist. Ass. 54, 225-230.
