162
Healy
not realising the relative unimportance of the
assumption in practice. Is there any reason
why they should not be the usual methods of
data analysis?
One accusation that can certainly not be
raised against ranking tests is one of ineffi-
cion on a reading which differs by what may
seem to be an implausible amount from the
bulk of the data. Such readings must be
important, no matter what their origin; either
they are genuine, in which case they deserve
further investigation, or they are spurious (in
ciency, ofwasting the information in the data.
The relative efficiency oftwo alternative tests
ofsignificance can be measured by the ratio of
my experience, 95% or so of the time) and
have no business forming part ofthe analysis.
The non-parametric approach encourages the
the sample sizes needed to achieve the same
power foragiven significanceleveland agiven
departure from thenullhypothesis- ifone test
throw-it-at-the-computer-and-stand-back atti-
tude to statistical analysis which is no part of
good scientificpractice.
needs twice as many observations as another,
itsrelativeefficiencyis50%. When theNormal
assumption istrue, the usual tand F tests can
be shown to be the most efficient that can be
devised so that under these circumstances
usinganon-parametric testmustbe equivalent
to throwing away some of the observations.
But the most important case against the
widespread use ofnon-parametric methods is
that they do not lend themselves at all readily
to estimation. This isalmost true by definition
- ifno parameters are brought into considera-
tion, it is difficult to see how to describe and
assess the magnitude of a treatment effect,
However, the cost is remarkably small, no
more than around 5%. With non-Normal data
always so much more important than mere
statistical significance. It cannot be said often
non-parametric methods, apart from giving a
correct significance level, may be more effi-
cient than t tests. Many people feel that the
small price isworth paying forpeace ofmind.
enough that simply establishing that an effect
exists (at some conventional level of signifi-
cance) ishardly ever a satisfactory goal for an
illuminating statistical analysis.
Itisironicthatthehighefficiencyofranking
methods was not always realised by their
originators,who offeredthetestsasquick-and-
dirtyalternatives aimed atpeople who did not
possess the mechanical calculators ofthe day.
In actual fact they are a fiddly nuisance to
apply by hand and
they have only become
really popular since being incorporated into
It is actually possible to obtain confidence
intervals in a non-parametric framework. On
average just 50% of sample observations are
expected to fallbelow the population median,
so the number doing so in a particular sample
willfollowabinomial distributionwithp=0 5.
This fact can be used to obtain a confidence
interval for the population median (agreeable
computer packages.
Even so, there issomething a bit odd about
rankingtests. One may ask,iftheideaofdraw-
ing one's sample from a distribution is given
philosophical discussions can be had as to
whether or not the median is a population
up, how is the significance probability arrived
parameter). An extension of the argument
leads to confidence intervals for other quan-
tiles, such as the 2'/2% point which is com-
at?With no parent distribution, how can there
monly used as a lower normal limit or
be a sampling distribution? The answer is an
ingenious one. Take the Wilcoxon test as an
reference value. It is worth noting that the
price paid for abandoning the assumption of
example. In table 3 there were 10 differences
Normality (orlogNormality) when itisin fact
which were ranked, and on the nullhypothesis
justified is a heavy one in this context - to
each difference was equallylikelyto have been
achieve the same level ofprecision in estimat-
positive or negative. We could set out all the
ing the
2/2%
point without assuming
1024 possiblepatternsof+ and - signs,apply
Normality requires about between 2 and 3
them to the ranks and work out the Wilcoxon
times as many observations.
statisticfor each pattern. On the null hypothe-
sisallthepatternsare equallylikelyso thateach
Far more commonly, what is required is a
confidence interval for the difference between
valueofthestatistichasaprobabilityof1/1024,
and thisprovidesthe distributionwe require;if
the value actuallyobserved liesin one or other
ofthe tails, we say that itis significant. But it
two population medians, treated v control or
after v before. Here again, methods are avail-
able for obtaining such intervals without
assuming Normality of the populations (see
should be noted thatthe 10 differences remain
chapter 8 of Statistics with Confidence, M J
fixed throughout this argument - there is no
referencetowhatmighthavehappenedtoother
Gardner and D G Altman, eds. BMA, 1989).
patients
with other values ofthe difference. In
particular, the method does not permit us to
But these methods are by no means distribu-
tion free. They rest upon the assumption that
the two distributions are identicalin shapeand
assess the probability
of the treatment being
differonlyinlocation. This ispreciselythesort
effective on a new
patient - the significance
of assumption that users of non-parametric
probability is quite distinct from this. Exactly
the same istrue, mutatis mutandis,forthe other
methods are trying to avoid. It is almost cer-
tainly false when the distributions concerned
ranking tests. As a solutionto the fundamental
share a common start, as is the case for many
problem ofstatistics, that ofarguing
from the
particular to the general,
non-parametric
methods must be subjectto
question.
biochemical and endocrinological determina-
tions. Happily, the fact that the methods
concernedare notwidelyavailableincomputer
Ranking methods do not cope with the
packages has prevented them from being
problem of outliers, or only by sweeping it
extensivelyused.
under the rug.
Distribution free methods of
On balance,thewidespreaduse
ofnon-para-
theirvery nature do not allow us to cast suspi-
metric methods isin my view
pernicious.They