You are on page 1of 12

This article was downloaded by: [Carnegie Mellon University]

On: 17 January 2015, At: 11:22


Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,
37-41 Mortimer Street, London W1T 3JH, UK

Journal of the American Statistical Association


Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/uasa20

Alternatives to the Median Absolute Deviation


a a
Peter J. Rousseeuw & Christophe Croux
a
Department of Mathematics and Computer Science , Universitaire Instelling Antwerpen
(UIA) , Universiteitsplein 1, B-2610 , Wilrijk , Belgium
Published online: 27 Feb 2012.

To cite this article: Peter J. Rousseeuw & Christophe Croux (1993) Alternatives to the Median Absolute Deviation, Journal of
the American Statistical Association, 88:424, 1273-1283

To link to this article: http://dx.doi.org/10.1080/01621459.1993.10476408

PLEASE SCROLL DOWN FOR ARTICLE

Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained
in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no
representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of
the Content. Any opinions and views expressed in this publication are the opinions and views of the authors,
and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied
upon and should be independently verified with primary sources of information. Taylor and Francis shall not be
liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities
whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of
the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic
reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any
form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://
www.tandfonline.com/page/terms-and-conditions
Alternatives to the Median Absolute Deviation
Peter J. ROUSSEEUWand Christophe CROUX*

~~~ ~ ~~ ~

In robust estimation one frequently needs an initial or auxiliary estimate of scale. For this one usually takes the median abso-
lute deviation MAD, = 1.4826 med, { I xi - med,x, I } , because it has a simple explicit formula, needs little computation time, and
is very robust as witnessed by its bounded influence function and its 50% breakdown point. But there is still room for improve-
ment in two areas: the fact that MAD, is aimed at symmetric distributions and its low (37%) Gaussian efficiency. In this article
we set out to construct explicit and 50% breakdown scale estimators that are more efficient. We consider the estimator S,
= 1.1926 medi {med, I xi - xjl } and the estimator Qn given by the .25 quantile of the distances { 1 x, - x, I ; i < j } . Note that S, and
Q, do not need any location estimate. Both S. and Qn can be computed using O(n log n) time and O ( n ) storage. The Gaussian
efficiency of S. is 58%, whereas Q. attains 82%. We study S, and Qnby means of their influence functions, their bias curves (for
implosion as well as explosion), and their finite-sample performance. Their behavior is also compared at non-Gaussian models,
including the negative exponential model where S, has a lower gross-error sensitivity than the MAD.
KEY WORDS: Bias curve; Breakdown point; Influence function; Robustness; Scale estimation.

1. INTRODUCTION The MAD has the best possible breakdown point (50%,twice
as much as the interquartile range), and its influence function
Although many robust estimators of location exist, the
is bounded, with the sharpest possible bound among all scale
Downloaded by [Carnegie Mellon University] at 11:22 17 January 2015

sample median is still the most widely known. If {XI, . . . ,


estimators. The MAD was first promoted by Hampel ( 1974),
x,} is a batch of numbers, we will denote its sample median
who attributed it to Gauss. The constant b in ( 1.2) is needed
by to make the estimator consistent for the parameter of interest.
med,x, In the case of the usual parameter c at Gaussian distributions,
we need to set b = 1.4826. (In the same situation, the average
which is simply the middle order statistic when n is odd.
deviation in ( 1.1) needs to be multiplied by 1.2533to become
When n is even, we shall use the average of the order statistics
consistent.)
+
with ranks ( n / 2 )and ( n / 2 ) 1. The median has a breakdown
The sample median and the MAD are simple and easy to
point of 50% (which is the highest possible), because the
compute, but nevertheless very useful. Their extreme stur-
estimate remains bounded when fewer than 50%of the data
diness makes them ideal for screening the data for outliers
points are replaced by arbitrary numbers. Its influence func-
in a quick way, by computing
tion is also bounded, with the sharpest bound for any location
estimator (see Hampel, Ronchetti, Rousseeuw, and Stahel
1986). These are but a few results attesting to the median’s (1.3)
robustness.
Robust estimation of scale appears to have gained some- for each xi and flagging those xi as spurious for which this
what less acceptance among general users of statistical meth- statistic exceeds a certain cutoff (say, 2.5 or 3.0). Also, the
ods. The only robust scale estimator to be found in most median and the MAD are often used as initial values for the
statistical packages is the interquartile range, which has a computation of more efficient robust estimators. It was con-
breakdown point of 25% (which is rather good, although it firmed by simulation (Andrews et al. 1972) that it is very
can be improved on). Some people erroneously consider the important to have robust starting values for the computation
average deviation of M-estimators, and that it won’t do to start from the average
and the standard deviation. Also note that location M-
ave, I x, - ave,x, I (1.1) estimators need an ancillary estimate of scale to make them
equivariant. It has turned out that the MADS high break-
(where ave stands for “average”) to be a robust estimator, down property makes it a better ancillary scale estimator
although its breakdown point is 0 and its influence function than the interquartile range, also in regression problems. This
is unbounded. If in ( 1.1) one of the averages is replaced by led Huber ( 198 1, p. 107) to conclude that “the MAD has
a median, we obtain the “median deviation about the av- emerged as the single most useful ancillary estimate of scale.”
erage” and the “average deviation about the median,” both In spite of all these advantages, the MAD also has some
of which suffer from a breakdown point of 0 as well. drawbacks. First, its efficiency at Gaussian distributions is
A very robust scale estimator is the median absolute de- very low; whereas the location median’s asymptotic efficiency
viation about the median, given by is still 64%,the MAD is only 37% efficient. Second, the MAD
MAD, = b med, I x, - med,x,I. takes a symmetric view on dispersion, because one first es-
(1 .2)
timates a central value (the median) and then attaches equal
This estimator is also known under the shorter name of me- importance to positive and negative deviations from it. Ac-
dian absolute deviation (MAD) or even median deviation. tually, the MAD corresponds to finding the symmetric in-

* Peter J. Rousseeuw is Professor and Christophe Croux is Assistant, De-


partment of Mathematics and Computer Science, Universitaire Instelling 0 1993 American Statistical Association
Antwerpen (UIA), Universiteitsplein I , B-2610 Wilrijk, Belgium. The authors Journal of the American Stalistical Association
thank the associate editor and referees for helpful comments. December 1993, Vol. 88, No. 424, Theory and Methods
1273
1274 Journal of the American Statistical Association, December 1993

terval (around the median) that contains 50% of the data (or Table 1. Average Estimated Value of MAD,, Sn, On,
50% of the probability), which does not seem to be a natural and SO, at Gaussian Data
approach at asymmetric distributions. The interquartile Average estimated value
range does not have this problem, as the quartiles need not
be equally far away from the center. In fact, Huber (1 98 1 , n MAD, Sn On Son
-
p. 1 14) presented the MAD as the symmetrized version of 10 ,911 .992 1.392 .973
the interquartile range. This implicit reliance on symmetry 20 .959 ,999 1.193 ,990
is at odds with the general theory of M-estimators, in which 40 .978 .999 1.093 ,993
60 .987 1.001 1.064 .996
the symmetry assumption is not needed. Of course, there is 80 ,991 1.002 1.048 .997
nothing to stop us from using the MAD at highly skewed 100 .992 .997 1.038 .997
distributions, but it may be rather inefficient and artificial 200 ,996 1.ooo 1.019 ,999
03 1.ooo 1.000 1.ooo 1.ooo
to do so.
NOTE: Based on 10,oOO Sam@eS foC each n.
2. THE ESTIMATOR S,
The purpose of this article is to search for alternatives to
the MAD that can be used as initial or ancillary scale esti- (rousse@wins.uia.ac.be),and it has been incorporated in
mates in the same way but that are more efficient and not Statistical Calculator (T. Dusoir, fbgj23@ujvax.ulster.ac.uk)
slanted towards symmetric distributions. Two such estima- and in Statlib (statlib@stat.cmu.edu).
Downloaded by [Carnegie Mellon University] at 11:22 17 January 2015

tors will be proposed and investigated; the first is To check whether the correction factor c = 1.1926 (ob-
S,,=cmedi{medjIxi -xjl}. (2.1) tained through an asymptotic argument) succeeds in making
S,, approximately unbiased for finite samples, we performed
This should be read as follows. For each i we compute the a modest simulation study. The estimators in Table 1 are
median of { I xi - x, I ; j = 1 , . . . ,n } . This yields n numbers, the MAD,, S,,, the estimator Qn(which will be described in
the median of which gives our final estimate S,,. (The factor Section 3), and the sample standard deviation SD,,. Each
c is again for consistency, and its default value is 1.1926, as table entry is the average scale estimate on 10,000 batches
we will see later.) In our implementation of (2.1) the outer of Gaussian observations. We see that S,, behaves better than
median is a low median, which is the order statistic of rank MAD,,in this experiment. Moreover, Croux and Rousseeuw
[ (n+ 1)/2], and the inner median is a high median, which (1 992b) have derived finite-sample correction factors that
+
is the order statistic of rank h = [n/2] 1. The estimator render MAD,,, S,,, and Q,, almost exactly unbiased.
S,, can be seen as an analog of Gini’s average difference (Gini In the following theorem we prove that the finite sample
19 12), which one would obtain when replacing the medians breakdown point of S,, is the highest possible. We use the
by averages in (2.1). Note that (2.1) is very similar to formula replacement version of the breakdown point: For any sample
(1.2) for the MAD, the only difference being that the med, X = { x, , . . . ,x,,} , the breakdown point of S,, is defined by
operation was moved outside the absolute value. The idea
to apply medians repeatedly was introduced by Tukey (1977) &n*(sn,
X ) = min{ E n + ( s n , X I , & i ( S n X
, ) } , (2.2)
for estimation in two-way tables and by Siege1 (1982) for where
estimating regression coefficients. Rousseeuw and Bassett
( 1990) used recursive medians for estimating location in large en+(&, X ) = min
data sets.
Note that (2.1) is an explicit formula; hence S,, is always
uniquely defined. We also see immediately that S,, does be-
have like a scale estimator, in the sense that transforming
+
the observations xi to axi b will multiply S,, by 1 a I. We
and
:{
c,(S,,, X ) = min - ;inf S,,(X’)= 0
X‘

and X ’ is obtained by replacing any m observations of X by


I
will refer to this property as afine equivariance.
arbitrary values. The quantities E : and E ; are called the ex-
Like the MAD, the new estimator S,, is a simple combi-
plosion breakdown point and the implosion breakdown
nation of medians and absolute values. Instead of the ab-
point.
solute values, we could also use squares and then take a
square root at the end, which would yield exactly the same Theorem 1. At any sample X = { x , , . . . ,x,, } in which
estimate. (If we would replace the medians by averages in no two points coincide, we have
that formula, we would recover the standard deviation.)
On the other hand, S,, is unlike the MAD in that it does
cn+(S,,, X ) = [(n + 1)/2]/n and c,(S,,, X ) = [n/2]/n.
not need any location estimate of the data. Instead of mea- The breakdown point of the scale estimator S,,thus is given
suring how far away the observationsare from a central value, by
S,, looks at a typical distance between observations, which
en*(&, X ) = [n/21/n,
is still valid at asymmetric distributions.
A straightforward algorithm for computing (2.1) would which is the highest possible value for any affine equivariant
need O( n 2 )computation time. However, Croux and Rous- scale estimator. (The proof is given in the Appendix.)
seeuw (1992b) have constructed an O(nlog n)-time algo- We now turn to the asymptotic version of our estimator.
rithm for S,,. Its source code can be obtained from the authors Let X and Y be independent stochastic variables with dis-
Rousseeuw and Croux: Alternatives to the MAD 1275

tribution function G. If we denote g G ( x )= medy 1 x - Y 1, (A more general “chain rule” for influence functions was
then given by Rousseeuw and Croux 1992.) Let us consider the
special case where F = a. We will use the notation q
S ( G ) = c med g G ( X ) (2.3) = W 1 ( 3 / 4 ) .From the proof of Theorem 2, it follows that
X
41 = -4, q 2 = q, and g k ( q ) = - g k ( - q ) . Because we have
is the desired functional. If Gnis the empirical distribution
function, then S( Gn)= Sn. @ ( x+ g d x ) ) - + ( x - g d x ) ) = .5
Let us consider a location-scale model F8,*(x ) = F( ( x - O ) / for all x , it follows that
a), where F is called the model distribution. We want our
estimator to be Fisher-consistent, which means that S( F8,J 4(q - c-l) - 4(q + c - l )
gk(q) =
= d for all 8 and all u > 0. It is easily verified that S is Fisher- +
4 ( q c-l) 4 ( q - c-1) .+
consistent if we take
This yields the expression
c = I/(med g ~ ( J 3 ) , (2.4)
X

where X is distributed according to the model distribution


F . The following theorem gives the value of c in case F = a, xI - c-I) + sgn( 1-4 xI - c-I)
where a(x ) is the standard Gaussian distribution function. + sgn( 1q -4(4(q + + 4(q -
c-l)
-

c-l))
Downloaded by [Carnegie Mellon University] at 11:22 17 January 2015

Theorem 2. For F = a, the constant c satisfies the equa-


tion (2.7)
The influence function of S is plotted in Figure 1. We see
a(a-1(3/4) + c-I) - a(a-1(3/4) - c-I) = 1/2. that it is a step function that takes on four values (which are
Equivalently, c-I is the square root of the median of the symmetric about 0), unlike ZF(x; MAD, a), which takes
noncentral chi-squared distribution with 1 degree of freedom on only two values. Let us also consider the gross-error sen-
and noncentrality parameter 9-’(3/4). Numerical calcu- sitivity
lation yields c = 1.1926. (If we want to work with another
model distribution, we must take a different c , as in the y*( S , a) = sup(I F ( x ; S ,
X
@)I = 1.625, (2.8)
examples in Sec. 4.)
The influence function of the functional S at the distri- which is rather small, indicating that S i s quite robust against
bution F is defind by butliers. In contrast, y*(SD, a) = co,whereas ?*(MAD, a)
= 1.167 is the smallest value that we can obtain for any scale
+
S((1 - & ) F &A,) - S ( F ) estimator at the Gaussian distribution.
ZF(x:
. , S ,. F ) = lim
, , (2.5)
, . ,
40 & The influence function of the classical standard deviation
is given by ZF(x; SD, a) = ( x 2 - 1)/2. Note that I F ( x ; S ,
where Ax has all its mass in x (Hampel 1974). Theorem 3
a) looks more like that U-shaped curve than does I F ( x ;
gives the expression of IF( x ; S , F ) .
MAD, @), indicating that S will be more efficient than the
Theorem 3. Assume that the following conditions hold: MAD. Indeed, the asymptotic variance of Snis given by
1. There exists an xo such that g F ( x )is increasing on
V ( S , a) = ZF(x; S , a)’ d @ ( x )= .8573. (2.9)
[ x O ,a[and decreasing on ]-m, x o ] .
2. There exist q1 < xoand q2 > xo for which gF(q1) = c-’
= gF(q 2 ) .Assume that F has a density f in neighborhoods
of ql and q 2 , with f(q1) > 0 and f(q 2 ) > 0. Assume further
that gF has a strictly positive derivative in a neighbor-
hood of 42 and a strictly negative derivative in a neighborhood
of 41.
3. f exists in neighborhoods of the points q1 5 c-’ and
q 2 f c-l.
Then the influence function of S is given by

[
w x ; s, F ) = c F(q2) - F(q1) - Z(q1 < x < q 2 )

- c-l)
- f(q2)sgn(Iq2 -XI
2(f(q2 + C - I ) -A42 - c-l))

+ f(4r)sgn(lq, - X I - C - I )
2(f(q1 + c-l) -f(a - c - l ) ) I ’ -4 -2 0 2

Figure 1. Influence Functions of the MAD, the Estimator S, and the


4

(2.6) Estimator Q When the Model Distribution is Gaussian.


1276 Journal of the American Statistical Association, December 1993

Table 2. Standardized Variance of MAD,, S,, Q,, where g+ is defined implicitly by


and SO, at Gaussian Data
1
Standardized variance F(x + g ' ( x ) ) - F ( x - g+(x))= . (2.12)
2(1 -c)
n MAD, S" 0" SO" The implosion bias curve is given by
10 1.361 1.125 ,910 ,584
20 1.368 ,984 .773 ,532
40 1.338 ,890 .701 .513
60 1.381 .893 ,679 .515
80 1.342 ,878 .652 ,511 where g- is defined implicitly by
100 1.377 .869 ,650 ,507
200 1.361 ,873 .636 .499 1 - 2E
m 1.361 .857 .608 .500 F(x + g - ( x ) ) - F(x - g - ( x ) ) = -. (2.14)
1-&
NOTE: Based on 10,000 samples for each n.
Figure 2 displays both bias curves at F = @. We see that the
explosion bias curve of S is nearly as good as that of the
(The asymptotic normality of S,, was proved in Hossjer, MAD, and for E close to the breakdown point it is even
Croux, and Rousseeuw 1993.) This yields an efficiency of better. For the implosion bias curve the MAD performs
58.23%,which is a marked improvement relative to the MAD slightly better than S overall. Note that [ d B + (c, @)/a&]I==o
whose efficiency at Gaussian distributions is 36.74%. We pay = y*( S , @)and[ d B - ( c , @)/&I IC=o = -y*(S, a). Moreover,
Downloaded by [Carnegie Mellon University] at 11:22 17 January 2015

for this by a slight increase in the gross-error sensitivity and at the breakdown point ( c + 4) we find B + (E, @) --* co and
in the required computation time. B - ( c , @) + 0.
We carried out a simulation to verify this efficiency gain
at finite samples. For each n in Table 2, we computed the
variance var,(S,,) of the scale estimator S,,over m = 10,000
samples. Table 2 lists the standardized variances
n var, ( sn)
/ (ave,( Sn) ) 2 , (2.10)
where ave,(S,,) is the average estimated value as given in
Table 1. (It was argued by Bickel and Lehmann (1976) that
the denominator of (2.10) is needed to obtain a natural mea-
sure of accuracy for scale estimators.) The results show that
the asymptotic variance provides a good approximation for
(not too small) finite samples, and that S,, is more efficient
than MAD, even for small n .
Whereas the influence function describes how the esti-
mator reacts to a single outlier, the bias curve tells us how
much the estimator can change (in the worst case) when a
fraction e of the data is contaminated. Bias curves were briefly
mentioned by Hampel et al. (1986, p. 177), but their full
potential was not realized until the work of Martin and Za- 0.0 0.1 0.2 0.3 0.4 0.5
mar (1989, 1991), Martin, Yohai and Zamar (1989), and epsi l on
He and Simpson (1993). (4
In the case of scale estimators, we distinguish between an
increase of the estimate (explosion) and a decrease (implo-
sion). Fore > 0, we define 3== { G; G = (1 - E ) F +c H } ,
where H ranges over all distributions. Then the explosion
bias curve of S is defined as
B + ( E F)
, = SUP S(G)
CE3.

(plotted as a function of E ) , and the implosion bias curve of


S is given by
B - ( E ,F) = inf S ( G ) .
GE3.

Theorem 4. For a distribution F that is symmetric 0


0.0 0.1 0.2 0.3 0.4 0.5
around the origin, is unimodal, and has a strictly positive epsi l on
density, we have (b)
3 - 2E Figure 2. Bias Curves of the MAD, S, and Q as a Function of the Fraction
of Contamination:(a) for Explosion; (b) for Implosion.
Rousseeuw and Croux: Alternatives to the MAD 1277

Note that the influence function and the bias curve are We first show that the finite sample breakdown point of
asymptotic concepts. For finite samples, we have confirmed Q,, attains the optimal value.
the above results by computing the correspondingsensitivity Theorem 5 . At any sample X = { x l , . . . ,x,,} in which no
curves and empirical bias curves. two points coincide, we have &,+(en, X) = [(n + 1)/2]/n
3. THE ESTIMATOR Q,
and e;( Q,,,X ) = [ n / 2 ] / n ;thus the breakdown point of Q,,
is
A drawback of MAD,, and S,, is that their influence func-
cn*(Qn, X ) = [n/21/n.
tions have discontinuities. In the location framework, the
sample median has the same drawback, unlike the Hodges- We now give the asymptotic version of our estimator. Let
Lehmann (1963) estimator X and Y be independent random variables with distribution
G . Then we define
(3.1)
Q ( G ) = d H ; ' ( t ) with HG = f ( I X - Y l ) . (3.4)
which possesses a smooth influence function (see, for ex-
Note that Q ( G n )is not exactly the same as Q,, (where we
ample, Hampel et al. 1986, p. 1 12). Therefore, the Hodges-
take an order statistic among (2") elements instead of n 2 ) ,
Lehmann estimator might be viewed as a "smooth version"
but asymptotically this makes no difference. Representation
Downloaded by [Carnegie Mellon University] at 11:22 17 January 2015

of the median.
(3.4)belongs to the class of generalized L-statistics, as intro-
An analogous scale estimator, mentioned by Shamos duced by Serfling ( 1 984). Because X - Y is symmetric, we
(1976, p. 260) and Bickel and Lehmann (1979, p. 38), is could also write
obtained by replacing the pairwise averages by pairwise dis-
tances, yielding
Q ( G ) = d K E ' ( i ) with KG = L ( X - Y ) (3.5)
med{ ! x i - x i \ ; i < J } . (3.2)
instead of (3.4). Finally, we obtain
This resembles definition (2.1) of S,,, where separate medians
were taken over i and j , whereas (3.2) uses an overall median
over (1) pairs. We propose to multiply (3.2) by 1.0483 to
achieve consistency for the parameter u of Gaussian distri-
1 s
Q ( G ) = inf s > 0;
I
G ( t + d - ' s ) d G ( t ) 2 5 / 8 . (3.6)

In the parametric model F8,,,(x) = F ( ( x - O ) / u ) , the


butions. Like the Hodges-Jxhmann estimator, this scale es-
timator has only a 29%breakdown point, but a rather high functional Q is Fisher-consistent for u if we choose the con-
Gaussian efficiency (about 86%). stant d according to
But we want an estimator with a 50% breakdown point
like the MAD. We found that, somewhat surprisingly, this (3.7)
goal can be attained by replacing the median in (3.2) by a
different order statistic. Therefore, we propose the estimator For symmetric F this reduces to d = 1/((F*2)-'(5/8)),
'
where F* denotes the convolution F* F. In the case F = a,
Qn = d ( I xi - xjl; i < J } ( k ) , (3.3) we obtain KF = L ( X - Y ) = L ( X + Y) = L ( h ' ) ;
hence
where d is a constant factor and k = ( 5 ) = (1)/4, where h
= [n/2] + 1 is roughly half the number of observations. d = 1 /( \ / 2 + - ' ( 5 / 8 ) ) = 2.2219.
That is, we take the kth order statistic of the ($) interpoint
distances. This bears some resemblance to S,,, because the In Table 1 we see that with this constant the estimator Q,,
double median of the interpoint distances is at least as large has a considerable small-sample bias, but we can use the
as their .25 quantile. correction factors derived by Croux and Rousseeuw (1992b).
The estimator Q,, shares the attractive properties of S,,: a By means of some standard calculations (or using expres-
simple and explicit formula, a definition that is equally suit- sion (2.12) in Serfling 1984), one obtains the following for-
able for asymmetric distributions, and a 50% breakdown mula for the influence function.
point. In addition, we will see that its influence function is Theorem 6. If KFhas a positive derivative at 1/ d , then
smooth, and that its efficiency at Gaussian distributions is
very high (about 82%). At first sight, these advantages are
offset by a larger computational complexity, because the na-
ive algorithm (which begins by computing and storing all ZF(x; Q , F ) = d . (3.8)
(2") pairwise distances) needs O ( n 2 )space and O ( n 2 )time.
But Croux and Rousseeuw (1992b) have constructed an al-
gorithm for computing Q,,with O(n ) space and O(n log n) If F has a density f,then (3.8) becomes
time. Due to the availability of fast algorithms for S,, and
1
Q,,, a financial company is now using these estimators on :
4
+
- F(x + d-I) F(x- d-I)
a daily basis in analyses of the behavior of stocks, with n ZF(x; Q , F ) = d . (3.9)
8000. J f ( v + d - * ) f ( v )dv
1278 Journal of the American Statistical Association, December 1993

In Figure 1 we plotted the function ZF(x; Q, a). It turns Table 3. Simulation Results for Cauchy Data,
out that the gross-error sensitivity y*( Q, a) = 2.069 is larger Based on 70,000 Samples for Each n
than that of the MAD and S. Note that this influence func- Average value Standardized variance
tion is unbalanced: The influence of an inlier at the center
of the distribution is smaller (in absolute value) than the n MAD. S" Qn MAD, S" Qn

influence of an outlier at infinity. 10 1.097 1.126 1.629 3.150 3.194 3.363


With the aid of numerical integration, Equations (2.9) 20 1.045 1.044 1.284 2.819 2.554 2.606
40 1.025 1.021 1.136 2.569 2.280 2.299
and (3.9) yield V (Q, a) = .6077. By applying Theorem 3.1 60 1.014 1.010 1.085 2.571 2.202 2.172
in Serfling ( 1984), we obtain a rigorous proof of the asymp- 80 1.009 1.007 1.062 2.586 2.240 2.215
totics of Q,, under the same conditions as in our Theorem 100 1.011 1.010 1.055 2.497 2.164 2.119
200 1.005 1.004 1.027 2.454 2.130 2.058
6. The resulting Gaussian efficiency of 82.27%is surprisingly aJ 1.000 1.000 1.000 2.467 2.106 2.044
high.
In Table 2 we see that Q,, is considerably more efficient
than either MAD, or S,,. For instance, Qa has about the
same precision as MADso. But Q,,looses some of its efficiency 4. ROBUSTNESS AT NON-GAUSSIAN MODELS
at small sample sizes.
Quite often one uses a parametric model Fa,u(x) = F ( (x
Theorem 7. If F is symmetric about 0 and possesses a - e)/a) in which the model distribution F is itself non-
Downloaded by [Carnegie Mellon University] at 11:22 17 January 2015

density, then Gaussian. In this section we investigate the behavior of S,,


and Q,, for a heavy-tailed model distribution and for an
B + ( EF
, ) = d(F*2)-' ( - 8E -k
8(1 - ")2
4E2), (3.10) asymmetric model distribution.
Let us first consider the Cauchy distribution
and the implosion bias is given by the solution of the equation
arctan(x)
+
C , 241 - E)F(B-(E,F)) + c 2
( I - E ) * P * ~ ( ~ - ' B - (F))
F(x) =
n-
+ -21.
= 5 / 8 . (3.11)
At this heavy-tailed distribution, the sample standard devia-
tion is not a useful estimator of u because the second moment
Figure 2 displays the bias curves of Q (at F = a) along
of Fe,u does not exist. But we can still use MAD,,, S,,,and
with those of the MAD and S. Note that the MAD is slightly
Q,,, whose consistency factors become b = 1, c = .707 1, and
more robust than Q regarding their explosion bias curves,
d = 1.2071 in this model. Figure 3 shows the influence func-
whereas Q is more robust than MAD for implosion bias. It
tions of these three estimators, which were computed in
is also interesting to note (see the derivation in the Appendix)
roughly the same way as in the Gaussian model. The shape
that the supremum bias of Q does not correspond to con-
of these functions is quite similar to Figure 1, except for
tamination by a point mass moving to infinity, but rather
ZF(x; Q, F), which is more balanced than in the Gaussian
to a contaminating distribution of which both location and
case. The gross-error sensitivity y*( Q, F) = 2.22 14 is attained
scale tend to infinity. This explains why the slope of the
for x going to infinity, but ZF(x; Q , F ) approaches this limit
explosion bias curve at E = 0 is not equal to the supremum
very slowly.
of the influence function in this case.
For the asymptotic variances we obtain V(MAD, F)
= 2.4674, V ( S, F ) = 2.1060, and V (Q, F ) = 2.0438. These
should be compared to the theoretical lower bound (i.e., the
inverse of the Fisher information) which equals 2 in this
case. Therefore, the absolute asymptotic efficienciesbecome
e(MAD) = 81%, e ( S ) = 95%, and e ( Q ) = 98%. At this
heavy-tailed distribution, all three robust estimators are more
efficient than at the Gaussian distribution.
Table 3 summarizes a simulation (analogous to Tables 1
and 2) confirming that MAD,, and S,, are approximately un-
biased for finite samples. For not too small n, the variability
of the estimators is described reasonably well by their
asymptotic variance. Note that S,, performs somewhat better
than Q,, at small samples.
As an example of an asymmetric model distribution, we
take the negative exponential
F(x) = (1 - exp(-x))l(x 2 0),
I
-5 3 5
~~~

-3 -1 1
which generates the location-scale family
Figure 3. Influence Functions of the MAD, S, and Q at the Cauchy Dis-
tribution. Fe,Ax) = (1 - exp(-(x - e)/u))Z(x 2 e),
Rousseeuw and Croux: Alternatives to the MAD 1279

which we will call the shifted exponential model. Note that Table 4. Simulation Results for Exponentially Distributed Data,
Based on 70,000Samples for Each n
the location parameter 8 is not at the center of the distri- ~~~ ~ ~

bution, but rather at its left boundary. To estimate u by Average value Standardized variance
MAD,, S,, or Q,, we need to set b = 2.0781,c = 1.6982,
and d = 3.4760.Figure 4 displays the influence functions of n MA0, S” 0” MAD, S” Q”
~~ ~

these estimators at F. Note that the influence functions were 10 0.925 1.002 1.536 2.152 2.121 1.677
also computed at negative x although F has no mass there, 20 0.959 0.981 1.246 2.182 2.065 1.521
40 0.983 0.991 1.120 2.167 2.006 1.481
because outliers can occur to the left of the unknown 19. 60 0.986 0.993 1.078 2.172 1.989 1.420
The influence function of the MAD looks very different 80 0.988 0.993 1.056 2.153 1.924 1.395
from those in Figures 1 and 3, because this time F is not 100 0.993 0.996 1.046 2.122 1.911 1.373
200 0.996 0.998 1.024 2.173 1.919 1.381
symmetric. Note that ZF(x; MAD, F) = 0 for negative x a, 1 .ooo 1 .ooo 1 .ooo 2.135 1.822 1.343
(due to the compensating effect of the outlier at x on the
location median) and that it has three jumps for positive x
+
(at ln(2) - b-I, at ln(2), and at ln(2) b-I). The influence
functions of S and Q are roughly U-shaped, as in Figures 1 V ( 5,F) = I . But the extreme efficiency of 8, is offset by its
and 3.Surprisingly, the gross-error sensitivity of S is smaller extreme sensitivity to any outlier xi < 8. One possibility for
than that of the MAD. Indeed, constructing a more robust estimator of 8 would be to use
Downloaded by [Carnegie Mellon University] at 11:22 17 January 2015

y*( S , F ) = 1.8447 < y*(MAD, F) = 1.8587.


8, = medixi - 8,ln 2,
This would seem to disprove a theorem stating that the MAD
has the lowest possible gross-error sensitivity (see Hampel et where G, is a robust scale estimator such as MAD,, S,,
al. 1986,p. 142),but there is no contradiction because that or Q n -
theorem was for symmetric F.
For the asymptotic variances we have obtained V(MAD, 5. DISCUSSION
F) = 2.1352,V ( S ,F) = 1.8217,and V ( Q , F) = 1.3433.
As in the Gaussian and the Cauchy models, we again see Let us compare S, and Q, to another explicit scale esti-
that Q is more efficient than S,which in turn is more efficient mator with a 50% breakdown point, which is based on the
than the MAD. (This also holds for finite samples, as con- length of the shortest half sample:
firmed by Table 4.)Note that the Fisher information for this
model does not exist, so the absolute efficiency of the esti- LMS, = c’ min I x ( ; + ~ - - ~X)( ~ ) [ , (5.1)
i
mators is not defined. But we can still compute relative ef-
ficiencies between them, such as ARE(MAD, Q) = 1.3433/ where again h = [n/2]+ 1 and x ( ~I ) x ( ~I ) --- IX(n) are
2.1351 = 63%. the ordered data. (Note that the minimum length is always
The maximum likelihood estimator (MLE) of u in this unique, even if the half sample itself is not.) The default
model is 6, = aveixi - 8, where 8, = mini xj is the MLE of value of c’ is .7413, which achieves consistency at Gaussian
8 (see Johnson and Kotz 1970,p. 21 1). It turns out that 8, distributions. The estimator (5.1) first occurred as the scale
converges at the unusually fast rate of , - I , whereas $, con- part of the least median of squares (LMS) regression estimator
verges at the standard rate of n-1’2with asymptotic variance (Rousseeuw 1984) in the special case of one-dimensional
data. As we can see from (5.1), the estimator LMS, is sim-
ple, has the optimal breakdown point, and needs only
I-l
function is the same as that of the MAD (Rousseeuw and
Leroy 1988),and its efficiency equals that of the MAD as
r - - - - - - /- - - _- well (Grubel 1988). Therefore, LMS, is less efficient than
IF( x;S,F) either S, or Q,. Note that if we use the p-subset algorithm
of Rousseeuw and Leroy (1987),we obtain the estimator

c’ min med I x, - xjl , (5.2)


i j

which is asymptotically equivalent to LMS, and also has a


50% breakdown point.
The estimators MAD,, LMS,, and S, are all based on half
N
samples. MAD, looks for the shortest half that is symmetric
I - about the location median, whereas LMS, looks for the
shortest half without such a constraint (hence LMS, can be
I-l
I used at asymmetric distributions as well). Similarly, S, can
1280 Journal of the American Statistical Association, December 1993

Theorem 8. At every data set { xI,. . . ,x,,} ,it holds that 6. EXTENSIONS AND OUTLOOK

1 1 2 Our estimators S,, and Q,, are special cases ( u = 2) of the


- MAD,, I - S < - MAD, following formulas:
b C "-b
Sp) = cumedi,(medi2(.. . medi,SD(xi,,xi2, . . . , xi,) . . .))
1 1 1
-LMS,,S-S n <-LMS,,.
- Cr
2 c' C (6.1)
and
If we define the collection of interpoint distances by D
= { 1 xi - xjl; 1 I i, j I n}, then S,, is the remedian with Q$")= du{SD(xi,,. . . ,xi"); i, < i2 < * * - < iu}(;), (6.2)
base n (see Rousseeuw and Bassett 1990) of the set D ,as- where SD(xi,, . , . , xi,) denotes the standard deviation of
suming that the elements of D are read row by row. This the u observations {xi,,. . . , xi,} and h = [n/2] 1. These+
implies that S,, corresponds to an element between the .25 estimators all have 50% breakdown points. We expect the
and .75 quantiles of D. On the other hand, Q,, is asymptot- efficiency to increase with the order u. But the estimators
ically equivalent to the .25 quantile ofD. Roughly speaking, (6.1) and (6.2) demand a high computational effort. Also,
and ignoring the constants involved, it follows that S,, is larger for increasing u their gross-error sensitivity may become ar-
than Q,,. (Note that S,, = Q,, for n I 4.) bitrarily large. Therefore, using (6.1) or (6.2) with a large u
We have seen that S,, and Q,, have a better efficiency than
Downloaded by [Carnegie Mellon University] at 11:22 17 January 2015

is not recommended.
MAD,,, both asymptotically and for finite samples, while If we also allow scale estimators that measure dispersion
their gross-error sensitivities and bias curves are almost as around a location estimate, possible alternatives would be
good. Another advantage is that they do not presuppose a
symmetric model distribution, but can be considered as
nonparametric measures of spread. (The estimator LMS,
shares this advantage, but is less efficient.) The only price
we pay is a small increase of the computation time, which and
is O( n log n ) for S,, and Q,,as compared to O ( n )for MAD,,
but today's computers can easily afford this.
One could also compare S,, and Q,, with the class of M-
estimators. The latter can be more efficient, and their com- which also have a 50% breakdown point. At symmetric dis-
putational complexity is similar (or even lower, if we restrict tributions, (6.3) and (6.4) are asymptotically equivalent to
their algorithm to a fixed number of iteration steps). But M- S,, and Q,,;however, simulations have indicated that at finite
estimators are defined implicitly, whereas S,, and Q,, have samples these estimators do not perform better than S,, and
an explicit formula, which guarantees uniqueness of the es- Q,,, so we see no reason to switch to either (6.3) or (6.4). A
timate and is more intuitive. Moreover, there is no need to possible application of (6.3) would be to divide it by S,, to
select a function x or to choose tuning constants. obtain a test for symmetry (analogously, (6.4) can be divided
Another class is that of one-step M-estimators, which have by Q,,). This is similar to a proposal of Boos ( 1982) using
an explicit formula and can also attain a high efficiency and Gini's average deviation.
a high breakdown point. But like fully iterated M-estimators, Recently, Rousseeuw and Croux (1 992) proposed several
they are not location-free because they need an ancillary other explicit scale estimators with high breakdown point.
location estimator. And, also like fully iterated M-estimators, A promising estimator is
their computation requires an initial high-breakdown scale
estimator such as MAD,, S,,, or Qnin the first place. Finally,
it should be noted that although k-step M-estimators inherit
the breakdown point of the initial estimator, it turns out that
their actual contamination bias does increase (Rousseeuw It was proved that T,, has a 50% breakdown point, a contin-
and Croux 1993). uous influence function, an efficiency of 52%, and a gross-
Choosing between S,, and Qncomes down to a tradeoff error sensitivity of 1.4688.
between advantages that are hard to compare. Although Qn In cluster analysis one often needs a measure of dissimi-
is more efficient, in most applications we would prefer S,, larity between two groups A and B, which is based on the
because it is very robust, as witnessed by its low gross-error interobject dissimilarities. (For a discussion of such measures
sensitivity. Another advantage of S,, is that its simplicity see Kaufman and Rousseeuw 1990, sec. 5.5.) Our estimators
makes it easier to compute. S,, and Qncan be extended to this situation, yielding
We expect the application potential of the new robust scale
&(A, B ) = media{medjd(i,j)} (6.6)
estimators to be mainly in the following three uses: (1) as a
data analytic tool by itselc (2) as an ancillary scale estimate and
to make M-estimators affine equivariant and as a starting
~ Q ( AB, ) = { d ( i , J ) ; i E A , j E B}(k), (6.7)
value for the iterative computation of M-estimators; and (3)
as an objective function for regression analysis, as will be where k is approximately (#A)( #B)/4. (Both measures are
described in the next section. equivariant for monotone transformations on the d( i, j ) ;
Rousseeuw and Croux: Alternatives to the MAD 1281

hence they can still be applied when the dissimilarities are = c lomed,himed,Ix, - x,l. We first show that E ; I [n/2]/n.
on an ordinal scale.) Note that dQis symmetric in the sense Construct a contaminated sample X'by replacing the observations
that dQ(A,B ) = dQ(B,A ) , whereas ds is not, but this can xz, . . . , x[,,,~~+~ by xi. Then we have himed,Ix: - x; I = 0
be repaired by working with dk(A, B ) = min{ d s ( A , B ) , for [ n/2] + 1 observations x i , hence S,,(X') = 0. Moreover, ti
> [n/2]/n. Indeed, take any sample X'where fewer than [n/2]
dS(B,A)}.
observations are replaced. Because X was in general position,
Both dsand dQcan be seen as robustifying existing meth-
himed,I x: - x; I 2 min,,,I x, - x,1/2 = 6 > 0 for all i; hence
ods. Indeed, if the medians in (6.6) were replaced by averages,
S,,(X') > cd.
or if the order statistic in (6.7) was replaced by an average, Further, E ; I [( n + 1)/2]/n. Construct a sample X'by replac-
then we would recover the well-known average linkage cri- ing X I by -q,,)+ L , xz by x(") + 2 L , . . . , and XI(,,+I)/ZI by x(,,)
terion. On the other hand, replacing the medians by minima + +
[(n 1)/2]L, with L > 0. Then himed,)~:- x(l] 2 L for all i .
(or equivalently, setting k in (6.7) equal to 1) produces the Letting L tend to infinity will then inflate the estimator S,, be-
single linkage criterion. Moreover, replacing the medians by +
yond all bounds. On the other hand, t: > [(n 1)/2]/n. Take any
maxima (or setting k = ( # A ) ( # B )in (6.7)) yields complete +
sample X ' where fewer than [ ( n 1)/2] observations are replaced.
linkage. We can thus think of d's and dQ as intermediate If X: belongs to the original sample, then himed,I xi - XI I
between single linkage and complete linkage, which are often 5 I x(,,)- x ( ~I. )Because this holds for at least half the points, we
considered to be too extreme. For instance, the latter criteria find t h a t S , ( X ' ) I c l ~ ( , ) - x ( ~ , <
l CQ.
do not have an asymptotic limit (because usually the smallest Remark. This proof also implies that if we replace the outer
dissimilarity tends to 0 as n + 00, and the largest dissimilarity median by an order statistic of rank k 5 h = [ n/2] +
1 , then the
Downloaded by [Carnegie Mellon University] at 11:22 17 January 2015

tends to a),whereas (6.6) and (6.7) do. estimator will keep the same breakdown points.
Another application is to regression. The approach put The fact that eE(S,,, X ) 5 [n/2]/n for any affine equivariant
forward in (Rousseeuw 1984) was to construct affine equi- scale estimator was proved by Croux and Rousseeuw (1 992a).
variant high-breakdown regression estimates by minimizing Proof of Theorem 3. We use the notations
a high-breakdown scale estimate of the residuals rl , . . . , r,,.
gl = ((gF))\]-m,x~])-'and g2 = ((gF)lh,co[)-'
For instance, minimizing the scale estimator (5.1) yields the
LMS regression estimator. In the same vein, S-estimators Condition I ensures that these functions are well defined. With
(Rousseeuw and Yohai 1984) were obtained by minimization G F ( u )= P ( g F ( Y )i u ) , we have S ( F ) = cGF'(1/2) = 1, using
of an M-estimator of scale based on a smooth function p . +
(2.3).Let us denote F. = ( 1 - e ) F &Ax.Differentiating the expres-
The high-breakdown scale estimators introduced in this ar- sion GFc(CIS(F,)) = 1 / 2 yields
ticle could be used in the same way. It seems that Q,, is - acFc(c-')l
preferable to S,, in this regard, because of its smooth influence a& ILO
function. Therefore, we may consider the regression esti- . (A.1)
G;( c-I )
mator
Using Conditions 1 and 2 , we obtain for u in a neighborhood of
8, = argmin8Q,,(rl, . . . , rn). (6.8) c-' that
We will refer to 8, as the least quartile difference (LQD) G F ( u )= P ( g i ( u ) I Y s g z ( u ) ) = F ( g z ( u ) )- F(gi(u)-).
estimator. The LQD belongs to the larger class of generalized
Condition 2 yields g,(c-') = q1 and g 2 ( c - ' ) = qz;hence
S-estimators (Croux, Rousseeuw, and Hossjer 1993). The
asymptotic efficiency of the LQD turns out to be 67.170,
which is much higher than that of S-estimators.
A final extension is to scale estimation in a linear model. Condition 3 allows us to assume that x is different from the points
Existing estimators of the error dispersion are based on the ql ? c-' and q2 f c-' (in which the influence function will not be
residuals from some regression fit. But it is possible to gen- defined). For E sufficiently small, we may write
eralize the location-free estimators s,,and Q,, to scale esti- GF,(U) = Fr(gZ.r(u)) - Fc(gl,c(u)) ('4.3)
mators that are regressionfree, in the sense that they do not for a11 u in a small neighborhood of c-I . We choose a neighborhood
depend on any previous estimate of the regression parame- such that gFc stays decreasing in a neighborhood of qi and increasing
ters. In the simple linear model, we may consider triangles near qz. We may also assume that gl,c(c-')and gZ,Jc-l) are lying
formed by data points and compute their vertical height in that neighborhood. Differentiation of (A.3) yields
h( i,j, k ) . This leads to several scale estimators (Rousseeuw
and Hubert 1993), obtained from quantiles or repeated me-
dians ofthe h( i , j , k ) .When attention is restricted to adjacent
triangles, this also yields a test for linearity.

APPENDIX: PROOFS
Evaluating in C - I gives
Due to lack of space, some proofs and computations have been
omitted and referred to a technical report (Rousseeuw and Croux
1991).
Proof of Theorem 1. Let X be a sample in general position
and denote E: = E:(S,,, X ) and E ; = E;(S,,, X ) , where S,,
1282 Journal of the American Statistical Association, December 1993

By definition of gFc,we have When x < gG,(X), Equation (A.9) entails g C o ( x )= g - ( x ) . There-
fore, &,,( x) is symmetric and increasing on the positive numbers.
(1 - c){F(c-l + g2.Jc-I)) - F(gz.,(c-’) - c-’)} This implies that
+ E { A ~ ( c -+’ & . , ( c - ’ ) )- AX(g2,.(c-’)- c-’)} = 1/2.
B - ( E ,F ) = S ( G o )= cgGo(medlY I )
Differentiating yields

where Y is distributed according to GO. Because F-I{(3 - 4c)/


- { F ( c - l + g2Ac-I)) - F(gz..(c-’) - c - l ) } [4(1 - E ) ] } is smaller than F-’{(2 - 3c)/[2(1 - ~ ) ] } / 2 ,
+ { Z ( x I c-I + g2.Jc-I)) Z(x I g2Jc-I) - c-I)} = 0.
- Equations (2.13) and (2.14) provide an implicit determination
ofB-(E, F).
Using the relation F( q2 + c-I ) F( q2 - = 1/2, we obtain
- c-I)

Proof of Theorem 7. Let the model distribution F be symmetric


+
about 0 and have a density. At G = ( 1 - E ) F E H ,the functional
Q( G) is the smallest positive solution of
Analogously, we also find
G(Y + d - ’ Q ( G ) ) W Y2) 5/8, (A. 1 I )

where d is the constant defined by (3.7). IfX is distributed according


Downloaded by [Carnegie Mellon University] at 11:22 17 January 2015

Combining (A.I), (A.2), (A.4), (A.5), and (A.6) gives the desired to F and YI , Y2 according to G, then we can rewrite (A. I 1) as
equation (2.6).
(I - E ) 2 ~ * 2 ( d - l ~ ( ~ ) )
Proof of Theorem 4. Let E be any value in 10, [. We obtain 4 + E ( I - E ) { I +P(IX-Y1I s ~ - ’ Q ( G ) ) } + E ~ P ( ( Y I - Y ~ )
B + ( c ,F) as the limit of S ( G , ) , where G, = (1 - c ) F &Ax,and +
x, goes to infinity. Let Y be distributed according to G,. Then I d - l Q ( G ) ) z 5/8. (A.12)
g c , ( x ) is the smallest positive value for which Note that each term in (A. 12) is increasing in Q( C). To maximize
G , ( x + g c , ( x ) ) - G,(x - g c , ( x ) ) + P(Y = x - g c , ( x ) ) 2 .5. Q ( G ) , we have to minimize P ( I X - YII I d - l Q ( G ) ) and
P(( Yl - Y2)5 d-IQ( G)). These terms approach their lower bounds
Substituting G, yields (0 and 4) when both location and scale of H tend to infinity. (For
( I - c ) { F ( x+ g c . ( x ) ) - F ( x - g c ” ( x ) ) } instance, consider a sequence of Gaussian distributions with location
n and standard deviation n .) Therefore, (A. 12)yields formula (3.10).
+ & { Axn(x+ gG,(X)) - Axn(x- gC,(x))1 For the implosion bias curve, we have to maximize P( I X - Yl I
+ EI(X - g c , ( x ) = x,) 2 .5. 5 d - I Q ( G ) ) and P ( ( Y l - Y,) I d - ’ Q ( G ) ) .When we choose H
= &, we attain the maximal values 2F(Q(G)) - 1 and 1. Com-
Because E < .5, for each M > 0 we can choose no such that for all
bining this with (A.12), we obtain Equation (3.1 I ) , which deter-
n > no
mines B - ( E ,F).
SUP ( x + g c , ( x ) ) < xn (‘4.7)
IxlcM
[Received September 1991. Revised June 1992.1
and
inf gC,(x) 2 SUP gG,(x), (A.8) REFERENCES
IxlrM Ixl<M

making use of the properties of F. Using (A.7) yields g + ( x ) Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H.,
and Tukey, J. W. (1972). Robust Estimates of Location: Survey and Ad-
= g G , ( x )for all I X I < M and for all n > 4. If we choose M such vances, Princeton, NJ: Princeton University Press.
that Bickel, P. J., and Lehmann, E. L. (1976), “Descriptive Statistics for Non-
parametric Models 111: Dispersion,” The Annals of Statistics, 4, 1139-
1158.
(1979). “Descriptive Statistics for Nonparametric Models IV:
Spread,” in Contributions to Statistics, Hiijek Memorial Volume, ed. J.
then, using (A.8) and the symmetry and monotonicity of &.(x)
JureEkovB, Prague: Academia, pp. 33-40.
on 1-M, M [ (from Eq. (2.12)), we obtain Boos, D. (1982), “A Test for Asymmetry Associated With the Hodges-
Lehmann Estimator,” Journal of the American Statistical Association,
S ( G , ) = cgGJmedlYI), 77,647-65 1.
Croux, C., and Rousseeuw, P. J. (1992a), “A Class of High-Breakdown
which does not depend on n (provided that n > a).Therefore,
Scale Estimators Based on Subranges,” Communications in Statistics,
Equations (2.1 1) and (2.12) together determine It+(&,F). Theory andMethods, 21, 1935-1951.
For the implosion bias curve, we have B - ( E ,F ) = S(Go),with (1992b), “Time-Efficient Algorithms for Two Highly Robust Esti-
Go = ( 1 - E ) F+ E&. Now g c o ( x )is the smallest positive solution mators of Scale,” in Computational Statistics, Volume 1, eds. Y . Dodge
of and J. Whittaker, Heidelberg: Physika-Verlag, 41 1-428.
Croux, C., Rousseeuw, P. J., and Hossjer, 0. (1993), “Generalized S-
(1 - & ) { F ( X+ g c 0 ( x ) )- F ( x - g c o ( x ) ) } Estimators,” technical report 93-03, Universitaire Instelling Antwerpen,
Belgium.
+ & { & ( x + gCo(x)) - & ( x - gGo(x))} Gini, C. (1912), “Variabiliti 6 mutabilita, contributo allo studio delle dis-
+ EI(x - gGo(x)= 0 ) 2 .5. (7.9) tribuzioni e delle relazioni statistiche,” Studi Economico-Giuridici della
R. Universitd di Cagliari, 3, 3- 159.
It follows that x 5 g G o ( x and
) that Griibel, R. (1988), “The Length of the Shorth,” The Annals of Statistics,
16, 619-628.
x = gGo(x) iffx 2 F-I
(-/2. Hampel, F. R. (1974), “The Influence Curve and its Role in Robust Esti-
mation,” Journal of the American Statistical Association, 69, 383-393.
Rousseeuw and Croux: Alternatives to the MAD 1283

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (1986), Rousseeuw, P. J., and Croux, C. (199 I), “Alternatives to the Median Absolute
Robust Statistics: The Approach Based on Influence Functions, New York: Deviation,” Technical Report 9 1-43, Universitaire Instelling Antwerpen,
John Wiley. Belgium.
He, X., and Simpson, D. G. (1993), “Lower Bounds for Contamination (1992), “Explicit Scale Estimators with High Breakdown Point,” in
Bias: Globally Minimax versus Locally Linear Estimation,” The Annals L,-Statistical Analysis and Related Methods, ed. Y. Dodge, Amsterdam:
OfStatistics. 21, 314-337. North-Holland, pp. 77-92.
Hodges, J. L., Jr., and Lehmann, E. L. (l963), “Estimates of Location Based (1993), “The Bias of k-Step M-estimators,” submitted to Statistics
o n Rank Tests,” Annals of Mathematical Statistics, 34, 598-61 1. and Probability Letters.
Hossjer, O., Croux, C., and Rousseeuw, P. J. (1993), “Asymptotics of a Rousseeuw, P. J., and Hubert, M. (1993), “Regression-Free and Robust
Very Robust Scale Estimator,” technical report 93- 10, Universitaire In- Estimation of Scale,” technical report 93-24, Universitaire Instelling An-
stelling Antwerpen, Belgium. twerpen, Belgium.
Huber, P. J. (1981), Robust Statistics. New York: John Wiley. Rousseeuw, P. J., and Leroy, A. M. (1987), Robust Regression and Outlier
Johnson, N. L., and Kotz, S. (1970), Continuous Univariate Distributions- Detection, New York: John Wiley.
I , New York: John Wiley. (1988), “A Robust Scale Estimator Based on the Shortest Half,”
Kaufman, L., and Rousseeuw, P. J. (1990), Finding Groups in Data: An Statistica Neerlandica, 42, 103- 1 16.
Introduction to Cluster Analysis, New York: John Wiley. Rousseeuw, P. J., and Yohai, V. J. (1984), “Robust Regression by Means
Martin, R. D., Yohai, V. J., and Zamar, R. H. (1989), “Min-Max Bias of SEstimators.” in Robust and Nonlinear Time Series Analysis, (Lecture
Robust Regression,” The Annals of Statisfics, 17, 1608-1630. Notes in Statistics 26), eds. J. Franke, W. Hardle, and R. D. Martin, New
Martin, R. D., and Zamar, R. H. (1989), “Asymptotically Min-Max Bias York: Springer-Verlag. pp. 256-272.
Robust M-estimates of Scale for Positive Random Variables,” Journal of Serfling, R. J. (1984). “Generalized L-, M-, and R-Statistics,” The Annals
the American Statistical Association, 84, 494-50 I . of Statistics, 12, 76-86.
(1991), “Bias Robust Estimation of Scale when Location is Un- Shamos, M. I. (1976), “Geometry and Statistics: Problems at the Interface,”
known,” submitted to The Annals of Statistics. in New Directions and Recent Results in Algorithms and Complexity, ed.
Downloaded by [Carnegie Mellon University] at 11:22 17 January 2015

Rousseeuw, P. J. (1984), “Least Median of Squares Regression,” Journal J. F. Traub, New York: Academic Press, pp. 251-280.
of the American Statistical Association. 79, 87 1-880. Siegel, A. F. (l982), “Robust Regression Using Repeated Medians,” Bio-
Rousseeuw, P. J., and Bassett, G. W., Jr. (1990), “The Remedian: A Robust metrika, 69, 242-244.
Averaging Method for Large Data Sets,” Journal of the American Statistical Tukey, J. W. (1977), Exploratory Data Analysis, Reading, MA: Addison-
Association. 85. 97- 104. Weslev.

You might also like