
This article was downloaded by: [Moskow State Univ Bibliote]

On: 15 January 2014, At: 11:16


Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41
Mortimer Street, London W1T 3JH, UK

Journal of the American Statistical Association


Publication details, including instructions for authors and subscription information:
http://amstat.tandfonline.com/loi/uasa20

Nonparametric Stepwise Multiple Comparison Procedures


Gregory Campbell (a) & John H. Skillings (b)
(a) Laboratory of Statistical and Mathematical Methodology, Division of Computer Research and Technology, National Institutes of Health, Bethesda, MD, 20205, USA
(b) Department of Mathematics and Statistics, Miami University, Oxford, OH, 45056, USA
Published online: 12 Mar 2012.

To cite this article: Gregory Campbell & John H. Skillings (1985) Nonparametric Stepwise Multiple Comparison Procedures, Journal of
the American Statistical Association, 80:392, 998-1003

To link to this article: http://dx.doi.org/10.1080/01621459.1985.10478216

Nonparametric Stepwise Multiple
Comparison Procedures
GREGORY CAMPBELL and JOHN H. SKILLINGS*

* Gregory Campbell is Senior Staff Fellow, Laboratory of Statistical and Mathematical Methodology, Division of Computer Research and Technology, National Institutes of Health, Bethesda, MD 20205. John H. Skillings is Professor, Department of Mathematics and Statistics, Miami University, Oxford, OH 45056.

In the Public Domain
Journal of the American Statistical Association
December 1985, Vol. 80, No. 392, Theory and Methods

Nonparametric multiple comparisons are discussed, with particular emphasis given to stepwise procedures. Nonparametric analogs of the stepwise all-subset procedure of Einot and Gabriel are presented, along with an ad hoc nonparametric analog of the Newman-Keuls procedure. These new procedures are compared among themselves and with nonstepwise procedures based on Type I error levels and comparisonwise power. It is shown that these stepwise nonparametric procedures control Type I error levels, and that they have superior pairwise power compared to the commonly used nonstepwise procedures.

KEY WORDS: Experimentwise error rate; Simultaneous test procedure; Stepwise all-subset procedures; Newman-Keuls; Monte Carlo.

1. INTRODUCTION

Multiple comparison procedures (MCP's) are useful techniques for detecting differences among treatments in $k$-sample ($k \ge 2$) problems. Numerous such procedures have been developed, and they have been the subject of considerable research. One major reason for the presence of so many different MCP's concerns the philosophical divergence of opinion on how the error levels should be controlled; this clearly affects the comparative power of the procedures. Attempts to improve the power for normally distributed data have included the Newman-Keuls procedure (NK) and other stepwise approaches. However, NK does not control the probability of finding at least one false difference (Type I error) at level $\alpha$ for all possible hypotheses that have groups of equal treatments (Hartley 1955). Stepwise normal theory procedures that do control the actual Type I error probability at level $\alpha$ have been advanced by Ryan (1960), Einot and Gabriel (1975), and Welsch (1977); these procedures have good power properties (Einot and Gabriel 1975; Ramsey 1978).

Stepwise procedures can also play an important role in nonparametric MCP's for the following reasons.

1. The commonly used nonstepwise nonparametric techniques can have poor power characteristics. For example, suppose that the joint ranks are {1, 2, 3, 4}, {5, 6, 7, 8}, and {9, 10, 11, 12} for treatments A, B, and C, respectively. For these data the joint rank range procedure of Nemenyi (1963) (hereinafter RG) distinguishes A from C for any $\alpha \ge .012$, but fails to distinguish B from either A or C. This example illustrates an unfortunate feature of RG, namely, a strong dependence among the ranks that can cause a loss of power in some cases. The use of a statistic based on pairwise ranks circumvents the problem of this example but behaves poorly in other cases (see Skillings 1983).

2. For highly skewed populations, RG can fail to control the Type I error at level $\alpha$ for the situation of $(k - 1)$ equal populations and one different, even if the one population is a location shift from the others (Oude Voshaar 1980).

Thus there is a need for better nonparametric procedures, and it is reasonable to explore the stepwise approach in an attempt to improve the power while maintaining reasonable error levels.

The application of stepwise procedures to nonparametric statistics has been limited. Ryan (1960) proposed a stepwise approach using pairwise Wilcoxon statistics. Steel (1961) suggested a Newman-Keuls stepwise approach for the maximum of the pairwise Wilcoxon statistics. Miller (1981) mentioned briefly the possibility of a Newman-Keuls approach for the techniques in his nonparametric chapter. Campbell (1980) advocated the nonparametric use of the stepwise subset procedure of Einot and Gabriel (1975). However, the properties of these procedures have not been investigated in detail.

The purpose of this article is to explore various stepwise nonparametric MCP's. Section 2 considers stepwise subset procedures using several nonparametric test statistics. The third section introduces a nonparametric ad hoc procedure and examines its properties. Section 4 uses a Monte Carlo study to compare these procedures with nonstepwise ones in terms of power and Type I error levels. Relative efficiency is discussed in Section 5. The final section contains a discussion and summary.

In this article attention is restricted to the one-way layout with model

$$Y_{ij} = \theta_i + \varepsilon_{ij}, \qquad j = 1, \ldots, n_i, \quad i = 1, \ldots, k,$$

where the $\theta_i$'s are unknown and the $\varepsilon_{ij}$'s are continuous, independent, identically distributed random variables. Note that this model implies the assumption of equal scales for the $k$ populations. For ease of presentation, equal sample sizes ($n_i = n$, $i = 1, \ldots, k$) are assumed until the final section. Throughout this article it is assumed that the objective of multiple comparisons is to make inferences concerning equalities among all pairs of the $\theta_i$'s. It is also assumed that simultaneous estimation using confidence regions is not of interest.

2. ALL-SUBSET PROCEDURES

The class of procedures of this section is the class of nonparametric analogs of the stepwise subset procedures described in Einot and Gabriel (1975). These all-subset (AS) procedures declare two treatments different only if every subset containing
both treatments is rejected. Let $T_p$ denote a nonparametric test statistic used to compare a subset of $p$ treatments, for $p = 2, \ldots, k$. The steps of the procedure are as follows:

Step 1. The statistic $T_k$ tests the equality of all $k$ treatments. If the test is nonsignificant at level $\alpha_k$, then the procedure terminates and no differences are reported.

Step 2. The statistic $T_{k-1}$ is applied to all $k$ subsets of $k - 1$ treatments at level $\alpha_{k-1}$. If, for a subset of size $k - 1$, $T_{k-1}$ is nonsignificant at level $\alpha_{k-1}$, then that subset and all of its proper subsets are declared not different (retained). If $T_{k-1}$ is significant at level $\alpha_{k-1}$ for some subset, then one continues.

Step $(k - p + 1)$. The statistic $T_p$ is used to test at level $\alpha_p$ all subsets of size $p$ that have not been previously retained ($p = k - 2, \ldots, 2$). One continues to the next step only if there is at least one significant test statistic at the current step.

The procedure terminates by early stopping or with reported pairwise differences after step $k - 1$.

Any AS procedure is coherent over the class of all subsets; that is, if the hypothesis of equality of the $\theta_i$'s is rejected on some subset, it is rejected for all subsets containing that subset. This subset coherence is a direct consequence of the stepwise nature of the procedure (Gabriel 1969).

A subset procedure is consonant if, whenever the hypothesis of equality of the $\theta_i$'s in a set is rejected, there is at least one subset of it for which equality of the $\theta_i$'s in the subset is also rejected (Gabriel 1969). For AS procedures, consonance will depend on the nonparametric test statistic and the $\alpha_p$'s.

There are many choices for the test statistic $T_p$. Note that if $T_p$ is the studentized range and $\alpha_p = \alpha$, the classical NK results. For simplicity only the following three nonparametric statistics are considered here.

1. Joint Rank Range. For $p$ treatments, relabeled from 1 to $p$, the observations are jointly ranked and the $i$th treatment sum $R_i$ is calculated for $i = 1, \ldots, p$. The test statistic, due to Nemenyi (1963) and discussed in Miller (1981) with tables in Hollander and Wolfe (1973), is $\max_{1 \le i < j \le p} |R_i - R_j|$.

2. Separate Rank. Let $R_{ij}$ denote the sum of the ranks for the observations from treatment $i$ in a separate ranking of the observations in treatments $i$ and $j$. The test statistic for a group of $p$ treatments is $\max_{i \ne j} R_{ij}$. SR, the nonstepwise MCP based on this statistic, is due to Steel (1960, 1961) and Dwass (1960) and is discussed in Miller (1981).

3. Kruskal-Wallis (KW). Consider the well-known Kruskal-Wallis statistic based on the joint ranking of the observations for $p$ treatments. There is not a nonstepwise MCP based on this statistic.

For $p = 2$, all three reduce to the two-sided Wilcoxon statistic.

Consider the Type I error levels for AS procedures. The Type I error probability, the probability of falsely declaring at least one pair of equal $\theta_i$'s different, depends on the true configuration of the $\theta_i$'s. The configuration that all of the $\theta_i$'s are equal, called the complete null hypothesis, has experimentwise Type I error level $\le \alpha_k$. Now consider a general configuration of equal $\theta_i$'s in each of $q$ groups ($1 \le q \le k - 1$). Let the $i$th group in the partition be denoted by $S_i$, where $S_i$ contains $v_i$ equal $\theta_i$'s. In order to decide falsely that two treatments differ using a subset procedure, one must reject for the subset $S_i$ that contains both. Thus

$$P(\text{Type I error}) \le 1 - P(S_1 \text{ retained}, \ldots, S_q \text{ retained}) = 1 - \prod_{i=1}^{q} P(S_i \text{ retained}),$$

the latter following by using the independence of the rank statistics for the different $S_i$. Thus, if $\alpha_p$ is defined as 0 for subsets of size 1, the Type I error level is $\le 1 - \prod_{i=1}^{q} (1 - \alpha_{v_i})$; the actual level can be quite a bit smaller than this, as one must make no incorrect decision prior to the decision at the true partition. For all possible configurations,

$$P(\text{Type I error}) \le \max\Big\{1 - \prod_{i=1}^{q} (1 - \alpha_{v_i})\Big\}, \qquad (1)$$

where the maximum is over all possible partitions of the set of $k$ treatments into subsets of equal $\theta_i$'s.

For normal theory AS procedures, the maximal Type I error level exceeds $\alpha$ if all $\alpha_p = \alpha$, and the same situation applies in the nonparametric case. To remedy this situation, Ryan (1960) suggested that one should use $\alpha_p = p\alpha/k$ for $p = 2, \ldots, k$, and Welsch (1977) and Ramsey (1978) have refined this to $\alpha_p = 1 - (1 - \alpha)^{p/k}$ for $p = 2, \ldots, k - 2$ and $\alpha_{k-1} = \alpha_k = \alpha$. With either adjustment the maximal Type I error level is $\le \alpha$.

In the nonparametric case the discreteness of the nonparametric statistics complicates this adjustment. The remedy adopted here is to select the largest critical values so that $\alpha_p \le 1 - (1 - \alpha)^{p/k}$ for $p = 2, \ldots, k - 2$ and $\alpha_{k-1}, \alpha_k \le \alpha$; then the maximal Type I error probability in (1) is $\le \alpha$. One difficulty that can be encountered when $n$ is small is that there may not be any critical values satisfying $\alpha_p \le 1 - (1 - \alpha)^{p/k}$ for the desired level $\alpha$. Of course, for larger samples, the discreteness becomes less crucial. RGS, SRS, and KWS refer to the nonparametric all-subset procedures with these adjusted $\alpha_p$'s.

A few comments about these procedures are in order. Both RGS and KWS require reranking of the data for each subset. In fact, it is this reranking that sidesteps the deficiency of RG noted by Oude Voshaar (1980) and ensures Type I error control for all partitions of the treatments. While neither RGS nor KWS is consonant, SRS is, provided the $\alpha_p$-critical values are nondecreasing in $p$.

3. AN AD HOC PROCEDURE

The AS procedure can be computationally cumbersome in that many separate tests can be required. In this section a simpler ad hoc procedure is proposed, with steps similar to NK.

Step 0. Order the treatments from smallest to largest according to the rank sums in the joint ranking. Use sample medians to resolve any ties. Without loss of generality relabel the treatments so that the rank sums are ordered: $R_1 \le R_2 \le \cdots \le R_k$.

Step 1. Conclude that treatments 1 and $k$ differ if $R_k - R_1$ exceeds $r_{\alpha_k,k}$, the upper $\alpha_k$ cutoff of the rank range for $k$ equal
treatments. If treatments 1 and $k$ are not declared different, then stop and report no differences.

Step 2. Declare treatments 1 and $k - 1$ different if, in the joint ranking of treatments 1 to $k - 1$, the absolute value of the difference in rank sums between treatments 1 and $k - 1$ exceeds $r_{\alpha_{k-1},k-1}$. Declare treatments 2 and $k$ different if, in the joint ranking of treatments 2 to $k$, the absolute value of the difference in rank sums between treatments 2 and $k$ exceeds $r_{\alpha_{k-1},k-1}$. If neither pair differs, then stop.

Step $(k - p + 1)$ ($p = k - 2, \ldots, 2$). Continue to test each subset $S$ of size $p$ of the form $\{i, i + 1, \ldots, i + p - 1\}$ that is contained in subsets $\{j, j + 1, \ldots, j + p\}$ that were declared significant at the preceding step, where the order is as assigned in Step 0. For each subset $S$ the observations are reranked and the absolute value of the difference of the rank sums of treatments $i + p - 1$ and $i$ is compared with $r_{\alpha_p,p}$. The procedure terminates by early stopping or with reported differences after step $k - 1$.

This procedure can be generalized in several ways. A different test statistic, such as the separate rank, could be employed as suggested in Ryan (1960) and Steel (1961). A different rule for ordering treatments in Step 0 could be used. In simulations the ordering rule based on rank sums adopted here was found preferable to other rules based on medians or on tie-breaking by means.

Table 1. Monte Carlo Estimates of Type I Error and Power for MC Procedures for Uniform Shifts (5,000 replications)

                                              Nonstepwise   Stepwise AS          NK
                                              procedures    procedures           procedures
k   n   Shifts (c_1, ..., c_k)  Error or pair SR     RG     SRS    RGS    KWS    NAH    NAHU
4   6   (0, 0, 0, 0)                          .043   .038   .043   .036   .038   .044   .045
                                              .008   .008   .009   .010   .011   .012   .013
                                              .008   .004   .019   ?      ?      .019   .028
                                              .083   .045   .131   .160   .170   .175   .175
                                              .497   .599   .517   .622   .597   .641   .630
        (0, ?, ?, 2)                          .036   .010   ?      .087   .087   .085   .107
                                              .171   .153   .224   .291   .303   .310   .311
                                              .503   .584   .543   .613   .615   .649   .637
                                              .023   .004   .040   .043   .046   .044   .045
                                              .009   .002   .019   .017   .019   .021   .020
                                              .499   .442   .616   .660   ?      .681   .699
        (0, 0, 2, 2)                          .021   .001   .049   .054   .050   .050   .077
                                              .012   .001   .025   .025   .024   .025   .039
                                              .494   .428   .614   .669   .715   .681   .698
4   10  (0, 0, 0, 0)                          .041   .047   .051   .050   .047   .044   .047
                                              .008   .013   .011   .011   .011   .011   .011
        (0, 1, 1, 2)                          .007   .004   .022   .020   .023   .020   .047
                                              .234   .144   .329   .320   .347   .346   .405
                                              .880   .915   .890   .923   .916   .928   .927
        (0, ?, ?, 2)                          .089   .029   .150   .160   .158   .160   .231
                                              .437   .388   .558   .593   .570   .610   .613
                                              .876   .910   .902   .922   .922   .930   .930
                                              .019   .006   .051   .043   .042   .046   .048
                                              .007   .002   .020   .016   .015   .021   .023
                                              .872   .830   .937   .934   .941   .941   .962
        (0, 0, 2, 2)                          .017   .001   .051   .040   .042   .051   .104
                                              .008   .001   .029   .020   .023   .028   .054
                                              .881   .833   .941   .948   .946   .944   .955
4   15  (0, 0, 0, 0)                          .042   .051   .042   .051   .040   .051   .049
                                              .012   .009   .013   .011   .013   .014   .013
        (0, ?, ?, 1)                          .009   .008   .020   .015   .022   .020   .020
                                              .069   .070   .111   .121   .131   .128   .149
                                              .396   .453   .455   .465   .462   .466   .486
                                              .023   .019   .044   .044   .045   .047   .045
                                              .010   .006   .020   ?      .018   .018   .022
                                              .399   .406   .479   .496   .509   .509   .534
                                              .019   .008   ?      .044   ?      .043   .099
                                              .009   .004   .015   .019   .028   .023   .050
                                              .401   .404   .514   .500   .536   .511   .539

For the test statistic based on the joint ranks, two procedures are considered: the nonparametric ad hoc procedure (NAH) with the adjusted $\alpha_p$'s of Section 2 and the unadjusted nonparametric
ad hoc procedure (NAHU), which uses the largest $\alpha_p$-critical values with $\alpha_p \le \alpha$.

NAH can be compared with RGS of Section 2. Both procedures have the same critical values but can lead to different results. For example, treatments 1 and $k$ differ with NAH if the test statistic is significant at Step 1, whereas with RGS, statistics for all other $2^{k-2} - 1$ subsets containing 1 and $k$ must also be significant. As a consequence NAH can potentially find more differences than RGS in some cases. On the other hand, it is possible in some cases to find a difference between two treatments with RGS that may not be found with NAH. This can occur when the order of the reranked sums is not the order imposed by NAH in Step 0.

The major advantages of NAH and NAHU are computational. For $k > 3$ an AS procedure can require as many as $2^k - k - 1$ tests (each requiring reranking), whereas these ad hoc analogs require at most $\binom{k}{2}$ tests. For $k = 3$ this computational savings is minimal, but for $k \ge 6$ the reduction in computation is considerable.

Table 1 (continued).

                                              Nonstepwise   Stepwise AS          NK
                                              procedures    procedures           procedures
k   n   Shifts (c_1, ..., c_k)  Error or pair SR     RG     SRS    RGS    KWS    NAH    NAHU
6   6   (0, 0, 0, 0, 0, 0)      EERI          .050   .039   .050   .020   .011   .041   .039
                                (1, 2)*       .006   .002   .006   .002   .001   .005   .002
        (0, ?, ?, ?, ?, 2)      (1, 2)        .008   .001   .016   .013   .018   .016   .032
                                (1, 3)        .028   .008   .036   .034   .039   .040   .070
                                (1, 4)        .074   .061   .087   .092   .099   .121   .160
                                (1, 5)        .166   .196   .178   .219   .192   .279   .299
                                (1, 6)        .330   .442   .341   .413   .337   .498   .509
        (0, 0, 0, 0, 0, 2)      EERI          .030   .009   .040   .021   .015   .042   .037
                                (1, 2)*       .004   .001   .004   .002   .002   .005   .006
                                (1, 6)        .326   .283   .414   .420   .472   .488   .562
        (0, 0, 1, 1, 2, 2)      EERI          .015   .001   .024   .023   .026   .024   .065
                                (1, 2)*       .006   .000   .009   .010   .009   .008   .025
                                (1, 3)        .042   .020   .066   .066   .067   .082   .129
                                (1, 6)        .336   .379   .367   .408   .404   .510   .555
        (0, 0, 0, 2, 2, 2)      EERI          .028   .001   .043   .038   .039   .041   .083
                                (1, 2)*       .004   .000   .008   .010   .008   .009   .020
                                (1, 6)        .332   .276   .418   .418   .488   .480   .562
6   10  (0, 0, 0, 0, 0, 0)      EERI          .033   .043   .033   .034   .012   .037   ?
                                (1, 2)*       .001   .002   .002   .003   .001   .003   .007
                                              .014   .004   .031   .033   .038   .031   .074
                                              .062   .033   .106   .113   .118   .124   .179
                                              .194   .176   .270   .305   .317   .341   .388
                                              .458   .504   .521   .597   .585   .642   .657
                                              .760   .830   .780   .853   .813   .885   .873
        (0, 0, 0, 0, 0, 2)      EERI          .025   .008   .032   .036   .032   ?      .047
                                (1, 2)*       .004   .001   .005   .006   .005   .007   .006
                                              .749   .689   .852   ?      .898   ?      .936
                                              .008   .000   .029   .027   .038   .034   .102
                                              .003   .000   .012   .010   .016   .013   .040
                                              .113   .066   .179   .208   .219   .204   .294
                                              .755   .786   .817   .868   .852   .877   .903
                                              .017   .000   .033   .039   .046   .032   .090
                                              .004   .000   .008   .007   .010   .008   .024
                                              .770   .688   .849   .878   .901   .891   .936
* Equal shifts.
NOTE: EERI = Type I experimentwise error rate.

Although NAHU does not control Type I errors at level $\alpha$, one might suspect that NAH does. For NAH it is possible to prove asymptotic Type I error control for all hypotheses:

Theorem 3.1. For the stepwise procedure of this section, let $\alpha_{n,p}$ denote the significance level for the joint rank range test on subsets of $p$ treatments with $n$ observations per treatment ($p = 2, \ldots, k$). Let the $k$ treatments be grouped into $q$ subsets $S_1, \ldots, S_q$ with homogeneous treatments within subsets and location shifts between subsets. If $\lim_{n\to\infty} \alpha_{n,p} = \alpha_p$, then the limiting Type I error as $n \to \infty$ is $1 - \prod_{i=1}^{q} (1 - \alpha_{v_i})$, where $v_i$ is the size of $S_i$.

Proof. Let $N = nk$ and let $\bar{R}_{i\cdot}$ denote the average of the ranks for treatment $i$. It is well known (see Puri 1964) that as $n \to \infty$ the vector $N^{-1}(\bar{R}_{1\cdot}, \ldots, \bar{R}_{k\cdot})'$ converges in probability to some constant vector, say $\gamma = (\gamma_1, \ldots, \gamma_k)'$. With shift alternatives of the form $F_i(x) = F(x - \theta_i)$, it follows from Puri (1964) that $\gamma_i < \gamma_j$ for $\theta_i < \theta_j$. Therefore the probability
of correctly rejecting any subset that contains unequal shift parameters tends to 1 as $n \to \infty$. Thus

$$\begin{aligned}
\lim_{n\to\infty} P(\text{no Type I error}) &= \lim_{n\to\infty} P(S_i \text{ retained for } i = 1, \ldots, q) \\
&= \lim_{n\to\infty} \prod_{i=1}^{q} P(S_i \text{ retained}) = \prod_{i=1}^{q} \lim_{n\to\infty} (1 - \alpha_{n,v_i}) \\
&= \prod_{i=1}^{q} (1 - \alpha_{v_i}).
\end{aligned}$$

A similar result can be obtained for unequal sample sizes if $\lim_{n\to\infty} n_i/N = \lambda_i$, where $0 < \lambda_i < 1$ for all $i$. A major consequence of Theorem 3.1 is that if $\lim_{n\to\infty} \alpha_{n,p} = 1 - (1 - \alpha)^{p/k}$ for $p = 2, \ldots, k - 2$ and $\lim_{n\to\infty} \alpha_{n,p} = \alpha$ for $p = k - 1$ and $k$, then the limiting Type I error level for NAH is $\alpha$, whereas for NAHU it can be as large as $1 - (1 - \alpha)^{k/2}$ (for $k$ even), the rate for NK.

4. COMPARISON OF SMALL-SAMPLE POWERS

The small-sample behavior of the stepwise nonparametric procedures of the previous two sections has been examined by a Monte Carlo study. In particular, SRS, RGS, KWS, NAH, and NAHU were contrasted with each other and with the nonstepwise procedures SR and RG. Here, $k = 4$ treatments with sample sizes $n = 6$, 10, and 15 and $k = 6$ with $n = 6$ and 10 were investigated. For each $k$ and $n$, several treatment shift configurations have been considered for each of four population distributions (uniform, normal, exponential, and double exponential). Independent observations have been generated for each shift and distribution, so that 5,000 replications of each experiment were performed. The critical values of the test statistics were obtained from exact tables and, whenever necessary, from large-sample approximations to achieve an overall experimentwise error rate of no more than $\alpha = .05$. The poor behavior of the large-sample approximation of SR necessitated a simulated .05 critical value for $k = 4$, $n = 6$ and 10. The large-sample approximation of the stepwise procedures required nonstandard $\alpha_p$'s, namely, $\alpha_p = 1 - (.95)^{p/k}$, which were obtained from the tables of the normal range in Beyer (1968, pp. 352-359).

In the simulation study the following quantities were estimated: the Type I error level, the power of two-sided comparisons of unequal treatment pairs, and the Type III experimentwise error rate (declaring $\theta_i > \theta_j$ when $\theta_i < \theta_j$). The standard deviation of an estimate of the power $\pi$ is $[\pi(1 - \pi)/5{,}000]^{1/2} \le .007$. In terms of comparisons of the procedures, the results were similar for all four distributions; consequently, only results for the uniform are reported here. These estimated Type I error levels and powers are given for a representative set of treatment shifts $c$ of the $U(0, 1)$ uniform in Table 1, where $c = (c_1, \ldots, c_k)'$ are the shifts expressed as multiples of the standard deviation of the uniform ($\sigma = .289$). The estimated experimentwise Type III error levels are quite small (less than .004 in all cases) for the alternatives studied and are not given in the table. It is interesting to note that although Shaffer (1980) cautioned that the probability of a Type I or Type III error could be $> \alpha$, this did not seem to be the case for the alternatives studied here.

Several patterns of Table 1 are noteworthy:

1. Type I Errors. Within the sampling error, all procedures have estimated experimentwise Type I error levels $\le .05$ for the complete null hypothesis $(0, 0, \ldots, 0)$. For other shift configurations, all of the procedures except NAHU have estimated Type I error levels $\le .05$. The inability of NAHU to control the Type I error rate at the nominal level $\alpha$ is illustrated for shifts (0, 0, 2, 2) and (0, 0, 1, 1, 2, 2).

2. Power Comparisons Within Groups. The nonstepwise procedures exhibit the same patterns as reported in Skillings (1983); in particular, RG is superior to SR for detecting extreme treatments (largest versus smallest) when there are intermediate treatments [e.g., 0 vs. 2 in (0, 1, 1, 2)], whereas SR is unaffected by other treatments and hence excels in detecting differences with no intermediate treatments. Among the stepwise procedures of Section 2, KWS and RGS appear to be slightly more powerful than SRS, especially for $k = 6$. RGS does well for evenly spaced shifts and KWS for slippages such as (0, 0, 2, 2) and (0, 0, 0, 2, 2, 2).

3. Power Comparisons Among Groups. Generally the stepwise procedures are vastly superior to SR and RG in terms of comparisonwise power. The superiority is dramatic for extreme differences [e.g., $n = 6$ in (0, 0, 0, 2), (0, 0, 2, 2), and (0, 0, 0, 2, 2, 2)] and in other cases [$n = 10$, 0 vs. 1 in (0, 1, 1, 2)]. The inferiority of RG in detecting internal pairs is vastly improved in RGS. For $k = 4$ NAH is comparable to RGS, and for $k = 6$ it is even sometimes superior [e.g., $n = 6$, (0, 0, 1, 1, 2, 2) and (0, ?, ?, ?, ?, 2)]. A surprising observation is that for $k = 6$, although SRS is better than SR, RGS is not necessarily superior in pairwise power to RG; the single instance in the table is 0 versus 2 for $n = 6$ and (0, ?, ?, ?, ?, 2).

It is interesting that the pairwise powers of NAHU are only slightly larger than those of NAH. This suggests for $k \le 6$ that there may not be a serious loss of power with the more stringent Type I error levels of NAH.

Table 2. Monte Carlo Estimates of Power for MC Procedures at $\alpha = .05$ for Normal Populations With Unit Variances and Means (0, 0, 3/2, 3/2) Based on 5,000 Replications

                 Overall power   Pairwise power
MCP              (1, 2, 3, 4)    (1, 2)  (3, 4)  (1, 3)  (1, 4)  (2, 3)  (2, 4)
QS* (n = 9)      (.917)          .026    .023    .743    .717    .723    .716
SRS (n = 10)     .923            .025    .021    .736    .745    .739    .748
RGS (n = 10)     .932            .023    .023    .748    .753    .747    .754
NAH (n = 10)     .933            .022    .022    .752    .758    .752    .756
FS* (n = 9)      (.956)          .027    .024    .772    .742    .723    .737
KWS (n = 10)     .961            .022    .025    .757    .759    .762    .759
* Powers from Einot and Gabriel (1975) based on 1,000 replications except overall power (in parentheses), which is theoretical.

5. RELATIVE EFFICIENCY

Let $Q$ denote the studentized range statistic and $F$ the classical analysis of variance (ANOVA) statistic for the normally distributed model. Let QS and FS denote the corresponding all-
subset procedures as introduced in Einot and Gabriel (1975). [Both procedures are computer implemented (SAS Institute, Inc. 1982).] The following Pitman asymptotic relative efficiency (ARE) results for local alternatives are well known:

1. ARE(SR, RG) = 1 (see Sherman 1965 and Koziol and Reid 1977).
2. ARE(SR, Q) = ARE(W, t), where W is the two-sample rank sum Wilcoxon statistic and t is the classical two-sample t test (Sherman 1965).

The main ARE result of this section is that ARE(SRS, QS) = ARE(W, t). The proof is straightforward. Sherman (1965) demonstrated that the asymptotic covariance structures of SR and Q are proportional. What is crucial here is that since there is no reranking, the asymptotic covariance structure of the extended AS procedures has the same proportionality from step to step. Thus, ARE(SRS, QS) = ARE(SR, Q). Unfortunately, the efficiencies ARE(RGS, QS), ARE(KWS, FS), and ARE(NAH, QS) are not obtainable by this argument because reranking at each step modifies the covariance structure from step to step.

Small-sample efficiency is briefly examined. For QS and FS, Einot and Gabriel (1975) presented a table of powers for four normal populations with means 0, 0, 3/2, 3/2, common variance 1, and sample size 9. In Table 2 these powers are compared with the simulated powers of SRS, RGS, KWS, and NAH for $n = 10$, using normal deviates from the transformation of Box and Muller (1958). Since the simulated powers of SRS, RGS, and NAH with $n = 10$ dominate those for QS with $n = 9$, it follows that the small-sample efficiency of SRS, RGS, and NAH to QS for this alternative is at least the ratio 9/10. A similar small-sample efficiency is seen for KWS versus FS. For SRS versus QS this supports the result for normality that ARE(SRS, QS) = $3/\pi = .955$.

6. DISCUSSION AND CONCLUSIONS

The issue of unequal sample sizes is now considered. For $k > 3$, exact tables of the nonparametric statistics are generally unavailable for unequal $n_i$. The chi-squared large-sample approximation for KWS easily accommodates unequal samples. For SRS it is simplest to use Bonferroni's inequality on the pairwise Wilcoxon statistics for each step (Ryan 1960). For RGS, the pairwise differences of rank means (replacing sums) can be adjusted using Bonferroni (see Dunn 1964).

The advantages and drawbacks of the various procedures are as follows:

1. The nonstepwise procedures (SR, RG) have the poorest power but are simplest to use, except for limited tables. SR has an inadequate large-sample approximation for moderate samples. RG has somewhat better power for extreme treatments.

2. Procedures SRS, RGS, and KWS are generally much more powerful than RG and SR and provide stringent error control. Whereas both Miller (1981) and Gabriel (1969) emphasize the rank dependence of pairwise decisions upon other treatments as a drawback of RG, the AS approach sidesteps this criticism to some extent with subset coherence. SRS is plagued by inadequate tables and poor large-sample approximations for moderate $n$; it never appears in the simulations to be superior to the better of RGS and KWS. RGS is fairly powerful in the simulations, doing well for equally spaced treatment shifts. KWS is quite powerful, especially for slippages. The main disadvantage of these procedures is their computational burden for large $k$ or $n$, but the chi-squared approximation makes KWS easily implemented as a computer routine, even for unequal $n_i$. An additional advantage of KWS is realized for analyses that use the Kruskal-Wallis statistic as an initial screen prior to an MCP, in that there is no need to switch to a different statistic.

3. NAH is quite easy to perform, even for large $k$, although subset coherence is sacrificed. Although all Type I errors may be controlled only asymptotically, the simulations suggest reasonable error control. This procedure also has simulated power characteristics that compare very favorably with the AS procedures.

[Received March 1984. Revised January 1985.]

REFERENCES

Beyer, W. H. (1968), CRC Handbook of Tables for Probability and Statistics (2nd ed.), Cleveland: Chemical Rubber Company.
Box, G. E. P., and Muller, M. E. (1958), "A Note on the Generation of Random Normal Deviates," Annals of Mathematical Statistics, 29, 610-611.
Campbell, G. (1980), "Nonparametric Multiple Comparisons," Proceedings of the Section on Statistical Education, American Statistical Association, 24-27.
Dunn, O. J. (1964), "Multiple Comparisons Using Rank Sums," Technometrics, 6, 241-252.
Dwass, M. (1960), "Some k-Sample Rank-Order Tests," in Contributions to Probability and Statistics, eds. I. Olkin, S. G. Ghurye, W. Hoeffding, W. G. Madow, and H. B. Mann, Stanford, CA: Stanford University Press, pp. 198-202.
Einot, I., and Gabriel, K. R. (1975), "A Study of the Powers of Several Methods of Multiple Comparisons," Journal of the American Statistical Association, 70, 574-583.
Gabriel, K. R. (1969), "Simultaneous Test Procedures-Some Theory of Multiple Comparisons," Annals of Mathematical Statistics, 40, 224-250.
Hartley, H. O. (1955), "Some Recent Developments in Analysis of Variance," Communications in Pure and Applied Mathematics, 8, 47-72.
Hollander, M., and Wolfe, D. A. (1973), Nonparametric Statistical Methods, New York: John Wiley.
Koziol, J. A., and Reid, N. (1977), "On the Asymptotic Equivalence of Two Ranking Methods for k-Sample Linear Rank Statistics," The Annals of Statistics, 5, 1099-1106.
Miller, R. G., Jr. (1981), Simultaneous Statistical Inference (2nd ed.), New York: McGraw-Hill.
Nemenyi, P. (1963), "Distribution-Free Multiple Comparisons," unpublished Ph.D. thesis, Princeton University.
Oude Voshaar, J. H. (1980), "(k - 1)-Mean Significance Levels of Nonparametric Multiple Comparisons Procedures," The Annals of Statistics, 8, 75-86.
Puri, M. L. (1964), "Asymptotic Efficiency of a Class of c-Sample Tests," Annals of Mathematical Statistics, 35, 102-121.
Ramsey, P. H. (1978), "Power Differences Between Pairwise Multiple Comparisons," Journal of the American Statistical Association, 73, 479-485.
Ryan, T. A. (1960), "Significance Tests for Multiple Comparison of Proportions, Variances, and Other Statistics," Psychological Bulletin, 57, 318-328.
SAS Institute, Inc. (1982), SAS User's Guide: Statistics, Cary, NC: Author.
Shaffer, J. P. (1980), "Control of Directional Errors With Stagewise Multiple Test Procedures," The Annals of Statistics, 8, 1342-1347.
Sherman, E. (1965), "A Note on Multiple Comparisons Using Rank Sums," Technometrics, 7, 255-256.
Skillings, J. H. (1983), "Nonparametric Approaches to Testing and Multiple Comparisons in a One-Way ANOVA," Communications in Statistics, Part B-Simulation and Computation, 12, 373-387.
Steel, R. G. D. (1960), "A Rank Sum Test for Comparing All Pairs of Treatments," Technometrics, 2, 197-207.
Steel, R. G. D. (1961), "Some Rank Sum Multiple Comparisons Tests," Biometrics, 17, 539-552.
Welsch, R. E. (1977), "Stepwise Multiple Comparison Procedures," Journal of the American Statistical Association, 72, 566-575.
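The bound (1) of Section 2 depends only on the group sizes $v_i$. It can therefore be checked numerically by enumerating the integer partitions of $k$. The Python sketch below evaluates the right-hand side of (1) for three choices of levels: unadjusted $\alpha_p = \alpha$, Ryan's $\alpha_p = p\alpha/k$, and the Welsch-Ramsey levels $\alpha_p = 1 - (1 - \alpha)^{p/k}$ with $\alpha_{k-1} = \alpha_k = \alpha$. All function names are hypothetical and not taken from the article. The sketch also assumes each $\alpha_p$ is attained exactly, so it ignores the discreteness issue raised in Section 2.

```python
from math import prod


def partitions(k, largest=None):
    """Yield the integer partitions of k as non-increasing tuples of group sizes."""
    if largest is None:
        largest = k
    if k == 0:
        yield ()
        return
    for v in range(min(k, largest), 0, -1):
        for rest in partitions(k - v, v):
            yield (v,) + rest


def bound_1(k, levels):
    """Right-hand side of (1): max over partitions of 1 - prod(1 - alpha_{v_i}).

    `levels` maps each subset size p = 2, ..., k to alpha_p; groups of
    size 1 get alpha_1 = 0, as in Section 2.
    """
    def level(v):
        return 0.0 if v == 1 else levels[v]
    return max(1 - prod(1 - level(v) for v in sizes) for sizes in partitions(k))


def unadjusted(k, alpha):
    """alpha_p = alpha for every p (the NK-style choice)."""
    return {p: alpha for p in range(2, k + 1)}


def ryan(k, alpha):
    """Ryan (1960): alpha_p = p * alpha / k."""
    return {p: p * alpha / k for p in range(2, k + 1)}


def welsch(k, alpha):
    """Welsch (1977), Ramsey (1978): 1 - (1 - alpha)^(p/k); alpha_{k-1} = alpha_k = alpha."""
    return {p: alpha if p >= k - 1 else 1 - (1 - alpha) ** (p / k)
            for p in range(2, k + 1)}


if __name__ == "__main__":
    for k in (4, 6):
        for name, rule in (("unadjusted", unadjusted), ("Ryan", ryan), ("Welsch", welsch)):
            print(k, name, round(bound_1(k, rule(k, 0.05)), 4))
```

For $k = 6$ and $\alpha = .05$ the unadjusted maximum is $1 - .95^3 \approx .143$, reached when the treatments form three pairs of equal shifts. This is the rate $1 - (1 - \alpha)^{k/2}$ quoted for NK and NAHU after Theorem 3.1. Both the Ryan and the Welsch-Ramsey levels keep the bound at or below .05.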
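Here is a minimal sketch of the all-subset procedure of Section 2, using the Kruskal-Wallis statistic as $T_p$ in the spirit of KWS. It rests on three assumptions of our own:

- The $\alpha_p$ are the Welsch-Ramsey levels of Section 2.
- Each test uses the chi-squared large-sample approximation rather than exact tables, so the discreteness adjustment is not made.
- Ties receive midranks, with no tie correction.

The chi-squared upper tail is computed from the standard closed forms for integer degrees of freedom, so only the Python standard library is needed. All function names are hypothetical.

```python
import math
from itertools import combinations


def rank_sums(groups):
    """Rank sums of each group in the joint ranking; tied values receive midranks."""
    pooled = sorted((x, g) for g, xs in enumerate(groups) for x in xs)
    sums = [0.0] * len(groups)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1          # average of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            sums[pooled[t][1]] += midrank
        i = j + 1
    return sums


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, nu = math.exp(-x / 2), 2                  # closed form for df = 2
    else:
        q, nu = math.erfc(math.sqrt(x / 2)), 1       # closed form for df = 1
    while nu < df:                                   # Q(nu + 2) = Q(nu) + term
        q += math.exp((nu / 2) * math.log(x / 2) - x / 2 - math.lgamma(nu / 2 + 1))
        nu += 2
    return q


def kruskal_wallis(groups):
    """Kruskal-Wallis H (no tie correction) and its chi-squared p-value."""
    sizes = [len(g) for g in groups]
    total = sum(sizes)
    sums = rank_sums(groups)
    h = 12 / (total * (total + 1)) * sum(r * r / m for r, m in zip(sums, sizes)) - 3 * (total + 1)
    return h, chi2_sf(h, len(groups) - 1)


def welsch_levels(k, alpha):
    """alpha_p = 1 - (1 - alpha)^(p/k) for p <= k - 2, and alpha for p = k - 1, k."""
    return {p: alpha if p >= k - 1 else 1 - (1 - alpha) ** (p / k)
            for p in range(2, k + 1)}


def all_subset_kw(samples, alpha=0.05):
    """Return the set of treatment pairs (i, j), i < j, declared different."""
    k = len(samples)
    levels = welsch_levels(k, alpha)
    retained = []            # subsets declared not different (their subsets too)
    rejected = set()
    for p in range(k, 1, -1):                        # step k - p + 1 tests size-p subsets
        significant = False
        for subset in combinations(range(k), p):
            s = frozenset(subset)
            if any(s <= r for r in retained):
                continue                             # already retained via a larger subset
            _, pvalue = kruskal_wallis([samples[i] for i in subset])
            if pvalue < levels[p]:
                rejected.add(s)
                significant = True
            else:
                retained.append(s)
        if not significant:
            return set()                             # early stopping: no pairwise differences
    # A pair is tested only if no superset was retained, so a rejected pair
    # means every subset containing both treatments was rejected.
    return {tuple(sorted(s)) for s in rejected if len(s) == 2}
```

Each test reranks only the treatments in the current subset, so the sketch keeps the reranking property that the article credits for Type I error control in RGS and KWS. The price is up to $2^k - k - 1$ separate tests, as counted in Section 3.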
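The ad hoc procedure NAH of Section 3 can be sketched in the same way. The article takes the cutoffs $r_{\alpha_p,p}$ from exact tables of the rank range. This sketch instead estimates them by Monte Carlo from random assignments of the ranks $1, \ldots, pn$ to $p$ equal groups; that is our substitution, not the authors' method. It also assumes:

- equal sample sizes;
- no ties in the data;
- a reading of the Step $(k - p + 1)$ rule under which a contiguous subset is tested whenever it lies inside at least one subset found significant at the preceding step.

All names are hypothetical.

```python
import random
from statistics import median


def rank_sums(groups):
    """Rank sums in the joint ranking of the given groups (assumes no ties)."""
    pooled = sorted((x, g) for g, xs in enumerate(groups) for x in xs)
    sums = [0] * len(groups)
    for rank, (_, g) in enumerate(pooled, start=1):
        sums[g] += rank
    return sums


def range_cutoff(p, n, level, reps=20000, seed=1):
    """Monte Carlo estimate of the smallest c with P(max R_i - min R_i > c) <= level
    when p equal groups of size n share the ranks 1..pn at random."""
    rng = random.Random(seed)
    labels = [g for g in range(p) for _ in range(n)]
    counts = {}
    for _ in range(reps):
        rng.shuffle(labels)
        sums = [0] * p
        for rank, g in enumerate(labels, start=1):
            sums[g] += rank
        stat = max(sums) - min(sums)
        counts[stat] = counts.get(stat, 0) + 1
    exceed = 0                                   # simulated count of ranges above `value`
    for value in sorted(counts, reverse=True):
        if (exceed + counts[value]) / reps > level:
            return value
        exceed += counts[value]
    return min(counts) - 1


def nah(samples, alpha=0.05, reps=20000):
    """Pairs (i, j), i < j, of original treatment indices declared different by NAH."""
    k, n = len(samples), len(samples[0])
    levels = {p: alpha if p >= k - 1 else 1 - (1 - alpha) ** (p / k)
              for p in range(2, k + 1)}
    cutoffs = {}
    # Step 0: order treatments by joint rank sums, breaking ties by sample medians.
    joint = rank_sums(samples)
    order = sorted(range(k), key=lambda g: (joint[g], median(samples[g])))
    differ, previous = set(), None
    for p in range(k, 1, -1):
        # Contiguous windows of size p lying inside a significant window of size p + 1.
        starts = [0] if previous is None else sorted({s for j in previous for s in (j, j + 1)})
        current = []
        for s in starts:
            members = order[s:s + p]
            sums = rank_sums([samples[g] for g in members])   # rerank this subset only
            if p not in cutoffs:
                cutoffs[p] = range_cutoff(p, n, levels[p], reps)
            if abs(sums[-1] - sums[0]) > cutoffs[p]:
                current.append(s)
                differ.add(tuple(sorted((members[0], members[-1]))))
        if not current:
            break                                    # early stopping
        previous = current
    return differ
```

Take the paper's own example from Section 1, with joint ranks {1, 2, 3, 4}, {5, 6, 7, 8} and {9, 10, 11, 12}. This sketch declares all three pairs different, because Step 2 reranks each adjacent pair before comparing it with the two-treatment cutoff. The nonstepwise RG separates only A from C.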
