Rousseeuw and Christophe Croux Source: Journal of the American Statistical Association, Vol. 88, No. 424 (Dec., 1993), pp. 12731283 Published by: American Statistical Association Stable URL: http://www.jstor.org/stable/2291267 . Accessed: 07/06/2011 12:41
Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, noncommercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at . http://www.jstor.org/action/showPublisher?publisherCode=astata. . Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is a notforprofit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.
American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal of the American Statistical Association.
http://www.jstor.org
point(50%,twice breakdown The MAD has thebestpossible function range), and itsinfluence as muchas theinterquartile of Althoughmany robustestimators location exist,the boundamongall scale possible the is bounded,with sharpest sample medianis stillthe mostwidelyknown.If {xj, . . .. by promoted Hampel(1974), The estimators. MAD was first we xn} is a batchofnumbers, willdenoteitssamplemedian who attributed to Gauss. The constant in (1.2) is needed b it by of for consistent theparameter interest. to maketheestimator a In thecase oftheusualparameter at Gaussiandistributions, medixi the we needto setb = 1.4826.(In thesame situation, average when n is odd. whichis simplythe middle orderstatistic in by deviation (1.1) needsto be multiplied 1.2533to become of statistics Whenn is even,we shalluse theaverage theorder consistent.) (n/2) and (n/2) + 1.The medianhasa breakdown with ranks The samplemedianand theMAD are simpleand easyto possible), because the point of 50% (which is the highest sturveryuseful.Theirextreme compute,but nevertheless than50% ofthedata boundedwhenfewer estimate remains the dinessmakesthemideal forscreening data foroutliers funcnumbers. influence Its by pointsare replaced arbitrary in a quick way,by computing with sharpest the boundfor location any tionis also bounded, lxi medxjI Rousseeuw,and Stahel estimator (see Hampel, Ronchetti, (1.3) attesting the median's to 1986). These are but a fewresults MADn robustness. thosexi as spuriousforwhichthis foreach xi and flagging of Robustestimation scale appearsto have gainedsome(say, 2.5 or 3.0). Also, the statistic exceedsa certaincutoff methusersofstatistical whatlessacceptanceamonggeneral the used as initialvaluesfor medianand theMAD are often to ods. The only robustscale estimator be foundin most It robust estimators. was conof computation moreefficient range,which has a statistical packages is the interquartile by firmed simulation(Andrewset al. 1972) thatit is very it good, although breakdown pointof 25% (whichis rather valuesfor computation the starting to important have robust consider the can be improved on). Some peopleerroneously from average the and it ofMestimators, that won'tdo to start deviation average and the standarddeviation. Also note that location Mestimate scale to makethem of need an ancillary estimators (x1.1) avei I xi  avexjI It equivariant. has turnedout thatthe MAD's highbreak(whereave standsfor"average") to be a robustestimator, down property makes it a betterancillaryscale estimator function thantheinterquartile its pointis 0 and itsinfluence although breakdown This also problems. range, in regression is unbounded.If in (1.1) one of the averagesis replacedby led Huber (1981, p. 107) to conclude that"the MAD has a median,we obtain the "median deviationabout the av emerged thesingle of estimate scale." as mostuseful ancillary erage"and the"averagedeviationabout themedian,"both In spiteof all theseadvantages, MAD also has some the from breakdown a ofwhichsuffer pointof 0 as well. is drawbacks.First,its efficiency Gaussian distributions at A veryrobustscale estimator the medianabsolutede very is efficiency the low;whereas locationmedian'sasymptotic by given viation aboutthemedian, Second,theMAD is still 64%,theMAD is only37% efficient. esview on dispersion, because one first takesa symmetric (1.2) MADn = b medi x  medjxjI. equal value (themedian)and thenattaches timates central a from Acit. deviations to name ofme importance positiveand negative This estimator also knownundertheshorter is inthe the to dian absolutedeviation(MAD) or even mediandeviation. tually, MAD corresponds finding symmetric
* PeterJ.Rousseeuwis Professor Christophe and DeCrouxis Assistant, of Instelling and ComputerScience,Universitaire partment Mathematics The Belgium. authors 1,B2610Wilrijk, (UIA), Universiteitsplein Antwerpen for comments. thanktheassociateeditorand referees helpful
1. INTRODUCTION
? 1993 AmericanStatistical Association Association of Journal the AmericanStatistical and Methods December 1993,Vol. 88, No. 424, Theory
1273
1274
Journal of the American Statistical Association, December Table 1. AverageEstimatedValue ofMAD,, Sn, Qn, and SDn at Gaussian Data Averageestimatedvalue MADn
.911 .959 .978 .987 .991 .992 .996 1.000
1993
terval (aroundthemedian)thatcontains50% ofthedata (or 50% oftheprobability), whichdoes notseemto be a natural approach at asymmetric distributions. The interquartile rangedoes not have thisproblem, thequartiles as need not n be equally faraway fromthe center.In fact,Huber (1981, p. 114) presented MAD as the symmetrized the version of 10 theinterquartile range.This implicit relianceon symmetry 20 is at odds withthegeneral in of theory Mestimators, which 40 60 the symmetry is is assumption not needed.Of course,there 80 nothing stop us fromusingthe MAD at highly to skewed 100 but distributions, it may be rather inefficient artificial 200 and oo to do so.
Sn
.992 .999 .999 1.001 1.002 .997 1.000 1.000
Qn
1.392 1.193 1.093 1.064 1.048 1.038 1.019 1.000
SDn
.973 .990 .993 .996 .997 .997 .999 1.000
2. THEESTIMATOR Sn
NOTE:
The purposeof thisarticleis to searchforalternatives to the MAD thatcan be used as initialor ancillary scale esti (rousse@wins.uia.ac.be), and it has been incorporated in matesin the same way but thatare moreefficient not Statistical and Calculator(T. Dusoir,fbgj23@ujvax.ulster.ac.uk) slantedtowardssymmetric distributions. Two such estima and in Statlib(statlib@stat.cmu.edu). torswillbe proposedand investigated; first the is To check whether correction the factor = 1.1926 (obc an tainedthrough asymptotic succeedsin making argument) (2.1) Sn = c medi {medjl x,  x,I }. unbiasedforfinite Sn approximately we samples, performed This shouldbe read as follows. For each i we computethe a modestsimulationstudy.The estimators Table 1 are in median Ixi  xjI; j = 1,. .., n}. Thisyields numbers, the of{ n in MAD,, Sn, theestimator (whichwillbe described Q, themedianofwhichgivesour final estimate (The factor Section 3), and the sample standarddeviation S,n. SD,. Each c is again forconsistency, itsdefault and value is 1.1926,as table entry the averagescale estimate 10,000batches is on we will see later.)In our implementation (2.1) the outer ofGaussian observations. see that behavesbetter of We than S, medianis a low median,whichis theorderstatistic rank of Crouxand Rousseeuw Moreover, MAD, in thisexperiment. [(n + 1)/2], and theinnermedianis a highmedian,which (1992b) have derivedfinitesample correction factors that is the orderstatistic rankh = [n/2] + 1. The estimator render of unbiased. MAD,, S", and Q, almostexactly Sn can be seenas an analogofGini's average difference (Gini In the following theorem provethatthe finite we sample 1912), whichone wouldobtainwhenreplacing medians breakdown the point of Sn is the highest possible.We use the in byaverages (2.1). Notethat(2.1) is very similar formula replacement to version thebreakdown of point:For anysample (1.2) forthe MAD, the onlydifference beingthatthe medj X = { xl, . .. , x, }, thebreakdown pointofS, is defined by operationwas moved outsidethe absolutevalue. The idea to applymedians was repeatedly introduced Tukey( 1977) en(Sn, X) = min{4 (S", X), en(Sn, X)}, by (2.2) forestimation twoway in tables and by Siegel (1982) for where estimating regression coefficients. Rousseeuw and Bassett (1990) usedrecursive mediansfor estimating in location large en(Sn, X) = min,{; sup Sn(X') = oo xf data sets. n Note that(2.1) is an explicit formula; henceSn is always We uniquelydefined. also see immediately S, does bethat and e(Sn, X) = min{; infSn(X') = 0} n xf have like a scale estimator, the sense thattransforming in the observations to axi + b will multiply by Ia 1.We Sn xi and X' is obtainedby replacing of any m observations X by willrefer thisproperty affine to as equivariance. values.The quantities and e are called theexarbitrary 4" Like the MAD, the new estimator is a simplecombiSn plosion breakdownpoint and the implosion breakdown nation of mediansand absolute values. Insteadof the absolute values, we could also use squares and then take a point. square rootat theend, whichwould yieldexactly same the Theorem1. At any sampleX = {xl, . .. , xn} in which estimate. we would replacethe mediansby averagesin no two pointscoincide,we have (If thatformula, would recover standard we the deviation.) X) = [(n + 1)12]/n and e(Sn, X) = [n/2]/n. On the otherhand,Sn is unlikethe MAD in thatit does 4c+(Sn, not need any locationestimate thedata. Insteadof mea The breakdown of pointofthescale estimator thusis given Sn howfar are suring awaytheobservations from central a by value, Sn looks at a typical distancebetweenobservations, which en(Sn, X) = [n/2]/n, is stillvalid at asymmetric distributions. A straightforward algorithm computing for possiblevalue foranyaffine (2.1) would whichis thehighest equivariant need0(n2) computation However, time. Croux Rous scale estimator. and (The proofis givenin theAppendix.) We now turnto theasymptotic seeuw (1992b) have constructed 0( n log n)time algoversion our estimator. of an rithm Sn. Itssourcecode can be obtained for stochastic from authors Let X and Y be independent variableswithdisthe
1275
functions was (A more general"chain rule" forinfluence givenby Rousseeuwand Croux 1992.) Let us considerthe special case where F = 4). We will use the notation q (2.3) S(G) = c med gG(X) that = 4)1 (3/4). From the proofof Theorem2, it follows x we Because have = q, q2 = q, and g'(q) = g'(q). distribution q If is the desiredfunctional. G, is the empirical 4?(x + gj)(x))  4?(x gj(x)) = .5 thenS(Gn) = S, function, modelF,,( x) = F((x  0)/ forall x, it follows Let us consider locationscale a that We a), whereF is called the model distribution. wantour whichmeansthatS(Fo,,) estimator be Fisherconsistent, to (q  cl)  4)(q + c1) g' (q g(D ) (q + cl) + 4(q  cl)l = ar all 0 and all a > 0. It is easilyverified S is Fisherthat for if consistent we take This yieldstheexpression c= 1/(medgF(X)), (2.4) Ig 1 x xl  q q) )=clg(q)sgn(I S, ~ IF(x;(X;S4)) = accordingto the model distribution whereX is distributed givesthevalue ofc in case F = 4), theorem F. The following +sgn(lq xl  C') + sgn(I q xl c')' function. Gaussian distribution where4)(x) is thestandard 4(4fi(q + cl) + 4)(q  cl)) the c Theorem 2. For F = 4),theconstant satisfies equa(2.7) tion of The influence function S is plottedin Figure 1. We 'see 4)(4)(3/4) + c1)  4)(4)'(3/4)  c') = 1/2. values(whichare that thatitis a stepfunction takeson four Equivalently, is the square root of the median of the symmetric c1 about 0), unlikeIF(x; MAD, 4)), whichtakes of the senwith distribution 1 degree freedom on onlytwo values.Let us also consider grosserror chisquared noncentral and noncentrality parameter4)'(3/4). Numericalcalcu sitivity lationyieldsc = 1.1926. (If we wantto workwithanother (2.8) y*(S, 4)) = sup IIF(x;S, 4))I = 1.625, we c, model distribution, must take a different as in the x examplesin Sec. 4.) against that small,indicating S is quiterobust S function the functional at the distri whichis rather of The influence In oy*(SD,4)) = oo, whereas y*(MAD,4)) outliers. contrast, butionF is defind by = 1.167 is thesmallest valuethatwe can obtainfor anyscale estimator theGaussian distribution. at + eAx)S(F) ) IF(x; S, F) = lim S((,)F deviation of The influence function theclassicalstandard 6 40 is givenby IF(x; SD, 4)) = (x2  1)/2. Note thatIF(x; 5, whereA?x all its mass in x (Hampel 1974). Theorem3 has 4)) looks more like that Ushaped curve than does IF(x; of givestheexpression IF(x; S, F). than MAD, 4)), indicating thatS willbe moreefficient the varianceof Sn is givenby hold: MAD. Indeed,theasymptotic conditions 3. Assumethatthefollowing Theorem
=
then
medyI x
Y,
on 1. There existsan xo such that gF(x) is increasing on [xo, oo[ and decreasing ]oo, xo]. AssumethatF has a density in neighborhoods f of q, and q2, withf(q) > 0 andf(q2)> 0. Assumefurther that gF has a strictly positive derivativein a neighborin derivative a neighborhood hood ofq2and a strictly negative of q,. of 3. f existsin neighborhoods the pointsq, ? c' and
= gF(q2).
q2 ? C .
V(S5,4)
IF(x;S,
4))2
c
IF(x;Q,T)
o~~~~ I
< q2)
x IF(x;S,T) ~~~~~~~~~~~~~~~
I(q, < x
f(q2)sgn(Iq2_xl  c')
2(f(q2 + C') +f(ql)sgn(I f(q2 q xl c1)I

 C l))
L
4
x
IF(x;MAD,Il

+2(f(q, + c1)f(q,
Xl{gF(q2)
c'))
J
(2.6)
2
2 .......
) gFt(ql
S, of Figure Influence 1. Functions theMAD,theEstimator and the Distribution is Gaussian. Estimator When Model Q the
1276
1993
of Table2. Standardized Variance MADn, Qn, Sn, andSDnat Gaussian Data variance Standardized n
10 20 40 60 80 100 200 oo
F(x
g+(x))
2(1)
(2.12)
MADn
1.361 1.368 1.338 1.381 1.342 1.377 1.361 1.361
Sn
1.125 .984 .890 .893 .878 .869 .873 .857
Qn
.910 .773 .701 .679 .652 .650 .636 .608
The implosionbias curveis givenby B(e, F) = cg} F(4(1 whereg is defined implicitly by F(x + g(x))

c))),
(2.13)
F(x
g(x)) = I
(2.14)
bothbias curvesat F = 4). We see thatthe Figure2 displays explosionbias curve of S is nearlyas good as that of the of (The asymptotic normality S, was proved in Hossjer, MAD, and for e close to the breakdownpoint it is even of Croux, and Rousseeuw 1993.) This yieldsan efficiency better.For the implosionbias curve the MAD performs is to relative theMAD slightly 58.23%,which a marked improvement better thanS overall.Note that[aB+(e, 4b)/ae] e=O at is whoseefficiency Gaussiandistributions 36.74%.We pay = y*(S5 and [aB(e, 4?)/ae] ,e=o y*(S5 4)). Moreover, 4)) and atthebreakdown increasein thegrosserror forthisby a slight sensitivity B+(e, 4b) oo and (e point 2) wefind in therequired computation time. 0. Be, 4)) O) We carriedout a simulation verify efficiency to this gain at finite samples.For each n in Table 2, we computedthe Sn variancevarmSn) ofthescale estimator overm = 10,000 ( variances samples.Table 2 liststhe standardized
(n
n varm(Sn)/(avem(Sn))2,
(2.10)
value as givenin whereavem (Sn) is the averageestimated Table 1. (It was arguedby Bickeland Lehmann(1976) that thedenominator (2. 10) is neededto obtaina natural meaof showthat The sureofaccuracyforscale estimators.) results a for theasymptotic varianceprovides good approximation (not too small) finite samples,and thatSn is moreefficient thanMAD, even forsmall n. Whereasthe influence function describeshow the estimatorreactsto a singleoutlier, bias curvetellsus how the much the estimator change(in the worstcase) whena can e Bias werebriefly fractionofthedatais contaminated. curves mentioned Hampel et al. (1986, p. 177), but theirfull by was not realizeduntiltheworkof Martinand Zapotential mar (1989, 1991), Martin,Yohai and Zamar (1989), and He and Simpson(1993). In thecase of scale estimators, distinguish we between an increaseof the estimate (explosion)and a decrease(implosion). For e > 0, we define = { G; G = (1  e)F + eH}, Then the explosion whereH rangesover all distributions. of as bias curve S is defined
Se
MAD
0.0 0.1 0.2 0.3 0.4 0.5
epsilon
(a)
0
Q
0
of bias curve of as (plotted a function e), and the implosion S is givenby B(e, F)= inf S(G).
Ge Cl
F Theorem 4. For a distribution that is symmetric positive around the origin, unimodal,and has a strictly is density, have we
0.0
0.1
0.2 (b)
0.3
0.4
0.5
epsilon
of Fraction 2. of MAD, andQ as a Function the S, Figure BiasCurves the (a) (b) Implosion. ofContamination:for Explosion; for
B+(e, F)
cg?(F1(4f72
)))'
(2.11)
1277
We first show thatthe finite sample breakdown Note thatthe influence function and the bias curveare pointof asymptotic concepts.For finite we samples, haveconfirmed Q, attainstheoptimalvalue. theabove results computing corresponding the by sensitivity Theorem AtanysampleX= {x,...,x,,} 5. in which no curvesand empirical bias curves.
is A drawback MAD, and S,,is thattheir of influence funcen (Qn, X) = [n/2]/n. In tions have discontinuities. the location framework, the We nowgivetheasymptotic unliketheHodgessample medianhas thesame drawback, version our estimator. of Let Lehmann(1963) estimator X and Y be independent randomvariables withdistribution G. Then we define +xj <j medX (3.1) 2j with HG=.L(IXYI). Q(G) = dHC() (3.4) whichpossessesa smoothinfluence function (see, forexample,Hampel et al. 1986,p. 112). Therefore, Hodgesthe Lehmannestimator might viewedas a "smoothversion" be ofthemedian. An analogous scale estimator, mentionedby Shamos (1976, p. 260) and Bickel and Lehmann (1979, p. 38), is obtainedbyreplacing pairwise the averages pairwise by distances,yielding med{Ixi  xjl; i <}. (3.2) Note thatQ(Gn) is not exactlythe same as Qn(wherewe take an orderstatistic among (n ) elementsinsteadof n2), butasymptotically makesno difference. this Representation (3.4) belongsto theclass ofgeneralized Lstatistics, introas duced by Serfling (1984). Because X  Y is symmetric, we could also write
3. THEESTIMATOR Qn
Q(G) = dKc'( )
with KG = L(X  Y)
(3.5)
Thisresembles definition ofSn,where (2.1) separate medians weretakenoveri andj, whereas (3.2) usesan overallmedian over (2) pairs.We proposeto multiply (3.2) by 1.0483 to achieveconsistency theparameter of Gaussian distrifor a butions.Like theHodgesLehmann estimator, scale esthis timator onlya 29% breakdown has point,but a rather high Gaussian efficiency (about 86%). But we wantan estimator witha 50% breakdown point like the MAD. We foundthat,somewhat surprisingly, this goal can be attainedby replacing median in (3.2) by a the different order statistic. Therefore, proposetheestimator we
(3.6)
In the parametric model F0,0(x)= F((x  0)/a), the functional is Fisherconsistent a ifwe choosetheconfor Q stantd according to
d = 1/lHF' (I)
I K
(3.7)
For symmetric this reducesto d = I/((F*2)I(5/8)), F where F*2 denotestheconvolution F. In thecase F = ?, F* Qn=d{lxi Xjl;i<j}(k), (3.3) we obtain KF = ?(X  Y) = ?(X + Y) = I(VXX); hence whered is a constantfactor and k = (h) (2)/4, whereh = [n/2] + 1 is roughly halfthe numberof observations. d = l/( 4 '(5/8)) = 2.2219. That is, we takethe kthorderstatistic the (') interpoint of Q,, distances. This bears some resemblance S, because the In Table 1 we see thatwiththisconstantthe estimator to has a considerablesmallsample bias, but we can use the double medianoftheinterpoint is distances at leastas large correction factors derived Crouxand Rousseeuw 992b). by (1 as their.25 quantile. calculations usingexpresBy meansofsome standard (or The estimator sharestheattractive of Q,, properties Sn:a for1984), one obtainsthefollowing a that simpleand explicit formula, definition is equallysuit sion (2.12) in Serfling mula fortheinfluence function. and able forasymmetric distributions, a 50% breakdown is point.In addition,we will see thatits influence function 6. If KF has a positive Theorem derivative 1/d, then at at is smooth,and thatits efficiency Gaussian distributions veryhigh(about 82%). At first theseadvantagesare sight, 4 offset a larger by becausethenacomputational complexity, (dd =d (1) . (3.8) IF(x; Q,F) ive algorithm all (whichbeginsby computing and storing (') pairwise distances)needs 0(n2) space and 0(n2) time. KF d But Croux and Rousseeuw(1992b) have constructed alan for gorithm computing with0(n) space and 0(n log n) Q,, IfF has a density then(3.8) becomes f,
time.Due to theavailability fast of algorithms Sn, for and a company nowusing is these estimators on Qn, financial a dailybasisin analyses thebehavior stocks, of of with n 8000.
4 FX IF(X; Q,F) =d
dl)+
F(X dl)
rffV+dl)ffv)
(3.9
1278
1993
In Figure 1 we plottedthe function Table 3. Simulation Results forCauchy Data, IF(x; Q, f). It turns Based on 10,000 Samples forEach n outthatthegrosserror sensitivity Q, f) = 2.069 is larger oy*( thanthatoftheMAD and S. Note thatthisinfluence funcAverage value Standardizedvariance tion is unbalanced:The influence an inlierat the center of MADn Sn Qn MADn of the distribution smaller(in absolute value) than the n Sn is Qn influence an outlier infinity. of at 10 1.097 1.126 1.629 3.150 3.194 3.363 With the aid of numericalintegration, 20 1.045 1.044 1.284 Equations (2.9) 2.819 2.554 2.606 40 1.025 1.021 1.136 2.569 2.280 2.299 and (3.9) yieldV(Q, f) = .6077. By applying Theorem3.1 60 1.014 1.010 1.085 2.571 2.202 2.172 in Serfling (1984), we obtaina rigorous proofoftheasymp 80 1.009 1.007 1.062 2.586 2.240 2.215 toticsof Q, underthe same conditions in our Theorem 100 1.011 as 1.010 1.055 2.497 2.164 2.119 1.005 1.004 1.027 2.454 2.130 2.058 6. The resulting Gaussianefficiency 82.27% is surprisingly 200 of 00 1.000 1.000 1.000 2.467 2.106 2.044 high. In Table 2 we see thatQ, is considerably more efficient than eitherMAD, or S,. For instance,Q40 has about the sameprecision MAD80.ButQ, loosessomeofitsefficiency as 4. ROBUSTNESS NONGAUSSIAN AT MODELS at smallsamplesizes. Quite often one uses a parametric model F0,,( = F((x x) Theorem7. If F is symmetric about 0 and possessesa  0)/a) in which the model distribution is itselfnonF then density, Gaussian. In thissectionwe investigate behaviorof Sn the and Q, fora heavytailed model distribution and foran model distribution. B+(e, F) = d(F* 2) 58(1  )2 (3.10) asymmetric Let us first considertheCauchydistribution and theimplosion is given thesolution theequation bias by of arctan(x) 1 + F(x) = (1  e)2F*2(d1B(e, F)) + 2e(1  e)F(B(e, F)) + C2 7r 2
= 5/8. (3.11) Figure2 displaysthe bias curvesof Q (at F = f) along with thoseoftheMAD and S. Note thattheMAD is slightly more robustthan Q regarding theirexplosionbias curves, whereasQ is morerobust thanMAD forimplosionbias. It is also interesting note(see thederivation theAppendix) to in thatthe supremum bias of Q does not correspond conto tamination a point mass movingto infinity, rather by but to a contaminating distribution whichbothlocationand of scale tend to infinity. This explainswhythe slope of the explosionbias curveat e = 0 is not equal to thesupremum oftheinfluence in function thiscase.
C\J
IF(x;Q,F)
At thisheavytailed distribution, samplestandard the deviationis nota useful estimator a becausethesecondmoment of of F0,,does not exist.But we can stilluse MAD,, S", and factors Qn,whoseconsistency become b = 1,c = .7071,and d = 1.2071in thismodel.Figure3 showstheinfluence functions of these threeestimators, which were computedin the roughly same wayas in theGaussian model.The shape of these functions quite similarto Figure 1, except for is IF(x; Q,F), whichis morebalanced thanin theGaussian case. The grosserror sensitivity Q,F) = 2.2214is attained oy*( for goingto infinity, IF(x; Q,F) approaches x but thislimit veryslowly. For the asymptotic variances we obtain V(MAD, F) = 2.4674, V(S, F) = 2.1060, and V(Q, F) = 2.0438. These shouldbe comparedto thetheoretical lowerbound (i.e., the inverseof the Fisherinformation) whichequals 2 in this case. Therefore, absoluteasymptotic the efficiencies become heavytailed distribution, three all robust estimators more are efficient at theGaussian distribution. than Table 3 summarizes simulation a (analogousto Tables 1 and 2) confirming MAD, and Sn approximately that are unbiased forfinite samples.For nottoo smalln, thevariability of the estimatorsis described reasonablywell by their asymptotic variance. NotethatSn performs somewhat better thanQ, at small samples. As an exampleof an asymmetric model distribution, we takethenegative exponential F(x)
=
e(MAD)
= 81%, e(S)
= 95%, and
IF(x; F) S,
IF(x;MAD,F)1_
r~) _ 5
(1 1exp(x)
)I(x ? 0),
3
1
(1 exp((x
)/of))I(x
? 0),
1279
Data, Distributed ResultsforExponentially Table 4. Simulation model.Note that exponential whichwe willcall the shifted Based on 10,000 Samples forEach n 0 the locationparameter is not at the centerof the distribution,but ratherat its leftboundary.To estimatea by Standardizedvariance Average value MADn,Sn, or Qn, we need to set b = 2.0781, c = 1.6982, Qn Sn n MADn Qn MADn Sn of functions the and d = 3.4760. Figure4 displays influence were 1.677 2.121 functions at 2.152 1.536 1.002 theseestimators F. Note thattheinfluence 0.925 10 1.521 2.065 2.182 1.246 0.981 0.959 20 F x also computedat negative although has no mass there, 1.481 2.006 1.120 2.167 0.991 0.983 40 of can occurto theleft theunknown0. because outliers 1.420 1.989 2.172 1.078 0.993 0.986 60 of function the MAD looks verydifferent 80 The influence 1.395 1.924 2.153 1.056 0.993 0.988 1.373 1.911 2.122 1.046 0.996 0.993 100 fromthose in Figures 1 and 3, because thistime F is not 1.381 1.919 2.173 1.024 0.998 0.996 200 x Note thatIF(x; MAD, F) = 0 fornegative symmetric. 1.343 1.822 2.135 1.000 1.000 1.000 so of effect the outlierat x on the (due to the compensating x jumps forpositive locationmedian) and thatit has three (at ln(2)  b', at ln(2), and at ln(2) + b '). The influence Ushaped,as in Figures1 V( a, F) = 1. But theextreme of functions S and Q are roughly by of efficiency O,is offset its of sensitivity S is smaller extreme the and 3. Surprisingly, grosserror for to sensitivity any outlier < 0. One possibility xi thanthatof theMAD. Indeed, of a constructing morerobustestimator 0 would be to use
that stating theMAD a Thiswouldseemto disprove theorem such as MADn, Sn, is (see sensitivity Hampel et where a,, a robustscale estimator has thelowestpossiblegrosserror because that or Qn. al. 1986,p. 142), but thereis no contradiction F. was forsymmetric theorem 5. DISCUSSION we variances have obtainedV(MAD, For theasymptotic F) = 2.1352, V(S, F) = 1.8217,and V(Q, F) = 1.3433. Let us compareSn and Qnto anotherexplicitscale estiAs in the Gaussian and the Cauchy models,we again see point,whichis based on the matorwitha 50% breakdown is than thatQ is moreefficient S, whichin turn moreefficient half of length the shortest sample: samples,as conthanthe MAD. (This also holds forfinite for firmed Table 4.) Note thattheFisherinformation this by (5.1) LMSn = c' min I X(i+h1)  X()I, of model does not exist,so the absoluteefficiency the estii efBut we can stillcomputerelative matorsis not defined. x(n)are them,suchas ARE(MAD, Q) = 1.3433/ whereagain h = [n/2] + 1 and x(l) < x(2) < ... ficiencies between is always length data. (Note thatthe minimum the ordered 2.1351 = 63%. is (MLE) of ain this unique, even if the half sample itself not.) The default The maximumlikelihoodestimator at = model is n = avei xi  n where0,n minjxj is theMLE of value ofc' is .7413, whichachievesconsistency Gaussian as occurred thescale The (5.1) first O(see Johnson and Kotz 1970,p. 211). It turnsout that0,n distributions. estimator estimator of con part theleastmedianofsquares(LMS) regression fastrateof n', whereas8n at converges theunusually variance (Rousseeuw 1984) in the special case of onedimensional rateofn'1/2withasymptotic at verges thestandard (5.1), theestimator data. As we can see from LMSn is simple, has the optimal breakdownpoint, and needs only Its timeand 0(n) storage. influence 0(n log n) computation r~) MAD (Rousseeuw and is function the same as thatof the equals thatof the MAD as Leroy 1988), and its efficiency than well (Grubel 1988). Therefore, LMSn is less efficient IF(x;S,F) either or Qn. Note thatifwe use thepsubsetalgorithm Sn of Rousseeuwand Leroy(1987), we obtaintheestimator oU IF(x;MAD,F)
On= medixi
a,ln 2,
_
(5.2)
Distribution.
equivalentto LMSn and also has a whichis asymptotically 50% breakdown point. The estimators MADn, LMSn,and Sn are all based on half half the samples.MADn looks for shortest thatis symmetric about the location median, whereasLMSn looks forthe such a constraint half (hence LMSn can be shortest without as Sn distributions well). Similarly, can used at asymmetric 4 0 3 2 1 1 of the be said to reflect "typical"length halfsamplesin the showsthatMAD", LMS", theorem S, of 4. Functions theMAD, and Q at theExponential data set.The following Figure Influence
1280
1993
<
MADn LMSn.
cvmedj1(medi2(..
. medi,SD(xi,,
xi2
*, Xi)
(6.1)
. .
and If we definethe collectionof interpoint distancesby D iV}(h), (6.2) = {I xi xj I; 1 < i, j < n}, thenSnis the remedianwith Qnv)= d {SD(xi,, * , xi,); i1 < i2 < . . .< base n (see Rousseeuw and Bassett1990) of the set D, aswhereSD(xi,, . .., xiv)denotesthe standarddeviationof sumingthatthe elementsof D are read row by row. This thev observations {xi,, . . ., xiv} and h = [n/2] + 1. These impliesthatSn corresponds an elementbetweenthe .25 to estimators have 50% breakdown all points.We expectthe and .75 quantilesofD. On theotherhand,Qnis asymptotto efficiency increasewiththe orderv. But the estimators icallyequivalent the.25 quantileofD. Roughly to speaking, (6.1) and (6.2) demand a highcomputational effort. Also, and ignoring constants the it that involved, follows Snis larger forincreasing their v grosserror sensitivity becomearmay thanQn.(Note thatSn = Qnforn < 4.) bitrarily large.Therefore, using(6.1) or (6.2) witha largev We have seenthatSn and Qnhave a better than efficiency is not recommended. and samples,while MADn, both asymptotically forfinite If we also allow scale estimators thatmeasuredispersion theirgrosserror sensitivities bias curvesare almostas and arounda locationestimate, possiblealternatives would be good. Another advantageis thattheydo not presupposea model distribution, can be consideredas symmetric but j  medxk) medi(med( Xi (6.3) nonparametric measuresof spread. (The estimator LMSn sharesthisadvantage,but is less efficient.) only price The we pay is a small increaseof the computation time,which and is 0(n log n) for and Qnas comparedto 0(n) forMADn, Sn (6.4) but today'scomputers easilyafford can 2  medkxk; i this. One could also compareSn and Qnwiththe class of Mestimators. latter The can be moreefficient, their and com whichalso have a 50% breakdown point.At symmetric disputational is complexity similar evenlower, we restrict tributions, if (or (6.3) and (6.4) are asymptotically equivalentto their to algorithm a fixed of number iteration steps).But M Snand Qn;however, simulations haveindicated thatat finite estimators definedimplicitly, are whereasSn and Qnhave samplestheseestimators not perform do thanSn and better an explicit whichguarantees formula, uniquenessofthees Qn,so we see no reasonto switch either to (6.3) or (6.4). A timateand is moreintuitive. thereis no need to possibleapplicationof (6.3) would be to divideit by Sn to Moreover, selecta function or to choose tuning X constants. obtaina testfor symmetry (analogously, (6.4) can be divided Another classis thatofonestep Mestimators, whichhave by Qn). This is similarto a proposalof Boos (1982) using an explicit formula and can also attaina highefficiency and Gini's averagedeviation. a highbreakdown point.Butlikefully iterated Mestimators, Recently, Rousseeuwand Croux(1992) proposedseveral because theyneed an ancillary otherexplicitscale estimators theyare not locationfree withhighbreakdown point. locationestimator. And,also likefully iterated Mestimators, A promising estimator is their an computation requires initialhighbreakdown scale l h estimator suchas MADn, Sn,or Qnin thefirst place. Finally, Tn= 1.3800 z2 {medIxi Xjl}(k). (6.5) it shouldbe notedthatalthough kstep Mestimators inherit *i1 hk= I thebreakdown pointoftheinitial it estimator, turns that out point,a contintheiractual contamination bias does increase(Rousseeuw It was provedthatTnhas a 50% breakdown uous influence an of function, efficiency 52%, and a grossand Croux 1993). of sensitivity 1.4688. ChoosingbetweenSn and Qncomes down to a tradeoff error In cluster analysisone oftenneeds a measureof dissimibetween thatare hardto compare.Although advantages Qn betweentwo groupsA and B, whichis based on the is more efficient, most applicationswe would prefer in Sn larity dissimilarities. a discussion suchmeasures interobject (For of because it is veryrobust, witnessed itslow grosserror as by see Kaufmanand Rousseeuw1990,sec. 5.5.) Our estimators Anotheradvantageof Sn is that its simplicity sensitivity. to makesit easierto compute. Snand Qncan be extended thissituation, yielding We expect application the of potential thenewrobust scale j)} ds(A, B) = medieAf{medjeBd(i, (6.6) estimators be mainlyin the following to threeuses: (1) as a data analytic tool by itself; as an ancillary (2) scale estimate and to make Mestimators affine equivariantand as a starting dQ(A, B) = { d(i, j); i&EA, j&EB}(k), (6.7) value for iterative the computation Mestimators; (3) of and as an objectivefunction regression for analysis,as will be wherek is approximately )(#B)/4. (Both measures (#A are described thenextsection. in equivariantformonotonetransformations the d( i, j); on
1281
are = c lomedihimed,Ix,  x,. We firstshow that en ? [n/2]/n. hence theycan stillbe applied whenthe dissimilarities the sampleX' byreplacing observations a in on an ordinalscale.) Note thatdQis symmetric thesense Construct contaminated 0 X[n/2]+l by xl. Then we have himed,Ixi  x,= thatdQ(A, B) = dQ(B, A), whereasds is not,but thiscan x2, x', for[n/2] + 1 observations hence Sn(X') = 0. Moreover,ewith d'8(A, B) = min{ ds(A, B), be repairedby working than [n/2] 2 [n/2]/n. Indeed, take any sample X' wherefewer ds(B, A)}. are replaced. Because X was in general position, observations methexisting Both d'8and dQcan be seenas robustifying himed,Ixi  x I 2 mini<,Ix,  x I/2 = 6 > 0 for all i; hence by ods. Indeed,ifthemediansin (6.6) werereplaced averages, in or ifthe orderstatistic (6.7) was replacedby an average, a e+ Further, < [(n + 1)/2]/n. Construct sampleX' by replacaveragelinkagecri ing xl by X(n) + L, x2 by X(n) + 2L, the thenwe would recover wellknown and X[(n+1)/2]by X(n) the hand,replacing mediansbyminima + [(n + 1)/2]L, withL > 0. Then himedgI  xI 2 L forall i. On terion. theother x' k will setting in (6.7) equal to 1) producesthe LettingL tend to infinity then inflatethe estimator be(or equivalently, Sn hand,e+ > [(n + 1)/2]/n.Take any the replacing mediansby yondall bounds.On theother Moreover, criterion. linkage single are 2] than[(n + 1)/ observations replaced. k maxima(or setting = (#A)(#B) in (6.7)) yieldscomplete sampleX' wherefewer If x' belongs to the original sample, then himed, x'  x,I I linkage.We can thus thinkof d' and dQ as intermediate X we < x(n)  x(l) I. Because thisholds forat least halfthe points, whichare often linkage, and complete linkage between single thatSn(X') < c Ix(n)x(l)I < 00. find the criteria For to considered be too extreme. instance, latter Remark. This proofalso impliesthatif we replacethe outer the limit (becauseusually smallest do nothavean asymptotic of medianby an orderstatistic rankk < h = [n/2] + 1, thenthe tends dissimilarity dissimilarity to 0 as n oo, and thelargest points. will estimator keep thesame breakdown tendsto oo), whereas(6.6) and (6.7) do. equivariant The factthate*(Sn, X) ? [n/2]/n forany affine The approach put Anotherapplicationis to regression. was scale estimator provedby Croux and Rousseeuw(1992a). affine equiin forward (Rousseeuw 1984) was to construct 3. by estimates minimizing Proofof Theorem We use the notations regression highbreakdown variant r,, of scale a highbreakdown estimate theresiduals . . ., r,. and g2 =((gF) [x0.[) g= ((gF))l]_,Ox0])' the (5.1) yields the minimizing scale estimator For instance, are In estimator. the same vein, Sestimators Condition 1 ensuresthat these functions well defined.With LMS regression GF(U) = P(gF(Y) ? u), we have S(F) = cG'(1/2) = 1, using by (Rousseeuwand Yohai 1984)wereobtained minimization the p. (2.3). Let us denote F, = (1  e)F + eIx. Differentiating expresof of an Mestimator scale based on a smoothfunction sion GF,(c'S(F,)) = 1/2 yields in introduced thisarscale estimators The highbreakdown ticle could be used in the same way. It seems that Q, is aGF,(c') influence becauseofitssmooth to preferable Snin thisregard, _ )_C _= aS(Fc) = (Al ) IF(x; S,F) = estiwe function. Therefore, may considerthe regression C.'(c mator
Sn(X ) > Cb.
(6.8)
of Using Conditions1 and 2, we obtain foru in a neighborhood c' that GF(U) = P(gl (u) Y? g2(u)) = F(g2(u))F(g,
f
(LQD) to We will refer 0,,as the least quartiledifference classofgeneralized to The estimator. LQD belongs thelarger Sestimators (Croux, Rousseeuw,and H6ssjer 1993). The of efficiency the LQD turnsout to be 67.1%, asymptotic thanthatof Sestimators. whichis much higher in A finalextension to scale estimation a linearmodel. is are dispersion based on the estimators the error of Existing fit. residualsfrom some regression But it is possibleto genestimators and Q, to scale estiSn eralizethe locationfree in the do thatare regressionfree, sensethatthey not mators of paramedependon any previousestimate the regression ters.In the simplelinearmodel,we may considertriangles formedby data points and compute theirverticalheight (Rousseeuw h( i, j, k). This leads to severalscale estimators mequantilesor repeated and Hubert1993), obtainedfrom to is diansoftheh( i,j, k). Whenattention restricted adjacent this triangles, also yieldsa testforlinearity.
(u)).
f f(q2 ) '(q2)
( q l) f'(ql)
(A .2)
from Condition3 allowsus to assumethatx is different thepoints will function notbe c? and q2 ? c' (in whichtheinfluence + q, small,we maywrite defined).For e sufficiently
GF,(u)=
Fj(g2,c(u))
Fc(gi,,(u))
(A.3)
of all for u in a smallneighborhood c'. We choosea neighborhood in of decreasing a neighborhood q, and increasing suchthatgF. stays nearq2. We mayalso assume thatgj,,(c6') and g2,.(c') are lying of Differentiation (A.3) yields in thatneighborhood.
aFjg2(U)) ae Ic=o
ae
ae
APPENDIX: PROOFS
have and someproofs computations been Due to lackofspace, and (Rousseeuw Croux report to and omitted referred a technical 1991).
anddenote En
4+(S5, X)
andEn
en(Sn, X),
where Sn~
dc 0+F(q,)I(x
< q).
(A.4)
of Journal the AmericanStatistical Association,December 1993 Whenx < gG0(x), Equation (A.9) entailsgG0(x)= g(x). Thereon and fore, gG0(x)is symmetric increasing thepositive numbers. This impliesthat B(e, F) = S(Go)
gGo
=
c
x(g2,c
)
')c')} 1/2.
+ e{z\A(c'
+ g2,j(c'))
Differentiating yields
{f(g2,(c') + c') f(g2,(c') {F(c'
+ {I(x ? c'
cgG0(medYI) I
 C')}

ag2,(c') c
? 92,c(C')
== C~O(F F1
4(1 1e

))) ))
3e)/[2(1 
(A. I10)
e)]}/2,
+ g2,j(c'))
+ g2,c(c')) 
F(g2c

)}
= 0.
I(x
Equations (2.13) and (2.14) provide an implicitdetermination of B(e, F). Proof Theorem Let themodeldistribution symmetric of 7. Fbe about 0 and have a density. G = (1  e)F + eH, thefunctional At Q(G) is thesmallest positivesolutionof
e)]}
ag2,(C') sgn(Iq2xl c') ac C=O 2(f(q2 + C ') f(q2 Analogously, also find we
_g_c(c_')
where is theconstant d defined (3.7). IfX is distributed by according Combining(A.1), (A.2), (A.4), (A.5), and (A.6) givesthe desired to F and Y1, Y2according G, thenwe can rewrite I 1) as to (A. equation(2.6). Proofof Theorem4. Let e be any value in ]O, 2 [. We obtain B+(e, F) as the limitof S(Gn), whereG, = (1  e)F + cAx,and Let accordingto G,. Then x, goes to infinity. Y be distributed is thesmallest positivevalue forwhich gG,(x) Gn(x + gGn(x)) (1 e){F(x +
+ e
ac
=o
(A.1 1)
(1
C)2F*2(dlQ(G))
+e(1 e){1
+P(IX
Y'I
d'Q(G))}
?
+e2P((Y1Y2)
2
d'Q(G))
5/8. (A.12)
G,(x
gGn(x)) + P(Y
gG,(x)) ? .5.
Note thateach termin (A. 12) is increasing Q(G). To maximize in Q(G), we have to minimize P(I X  Y,I < d'Q(G)) and (O and 2) whenbothlocationand scale of H tendto infinity. (For instance, consider sequenceofGaussiandistributions location a with n and standard deviation Therefore, 12)yields n.) (A. formula (3.10). For theimplosion bias curve, have to maximizeP( IX we I < d'Q(G)) and P((YI  Y2) < d'Q(G)). When we choose H = AO,we attainthe maximalvalues 2F(Q(G))  1 and 1. Combiningthiswith(A. 12), we obtain Equation (3.11), whichdeterminesB(e, F). 1991. June [Received September Revised 1992.]
P((Yi

Substituting yields Gn
gGn(x))
{ L\xn(X
+
F(x
gGn (X)
gGn(x))}

,,, (X
gGn(X))
}
2
.5.
n>
nO
SUp (X + gG,(X)) Ixi <M < Xn
(A.7)
and
IxI M
gG(x),
(A.8)
REFERENCES
Andrews, F., Bickel,P. J.,Hampel, F. R., Huber,P. J.,Rogers, H., D. W.
of making use of the properties F. Using (A.7) yields g+(x) = gG,(x) for all I xl < M and forall n > no. If we choose M such that
medlIYlI
=
F(
e ) <M,
vances,Princeton, Princeton NJ: University Press. Bickel,P. J.,and Lehmann,E. L. (1976), "Descriptive Statistics Nonfor parametric Models III: Dispersion,"The Annals ofStatistics, 11394, 1158. (1979), "DescriptiveStatisticsfor Nonparametric Models IV: Jureckova, Prague:Academia,pp. 3340. Boos, D. (1982), "A Test forAsymmetry AssociatedWith the Hodges77, 647651.
and then,using(A.8) and the symmetry monotonicity gG"(x) of on ]M, M[ (from Eq. (2.12)), we obtain
S(G,) = cgG"(med I Y I),
whichdoes not depend on n (providedthatn > no). Therefore, Croux, C., and Rousseeuw,P. J. (1992a), "A Class of HighBreakdown Scale Estimators Based on Subranges,"Communications Statistics, in Equations(2.11) and (2.12) together determine B+(e, F). and Methods, Theory 21, 19351951. For the implosionbias curve,we have B(e, F) = S(GO), with (1992b), "TimeEfficient Algorithms Two HighlyRobustEstifor matorsof Scale," in Computational Statistics, Volume1, eds. Y. Dodge Now gG0(x) is the smallest positive solution Go = (1  )F + eLA0. and J.Whittaker, Heidelberg: PhysikaVerlag, 411428. of (1

e){F(x + gG0(x))
+ e
F(x
gG.(x))}

{ AO (X
+ gGo(x))
AO (X 
gGo(x))
+ RI(x
gG0(x) = 0) ?
.5. (7.9)
Croux, C., Rousseeuw, P. J., and H6ssjer, 0. (1993), "Generalized SEstimators," technical report 9303,Universitaire Instelling Antwerpen, Belgium. contributo studiodelle disallo Gini, C. (1912), "Variabilitae mutabilita, tribuzioni dellerelazioni e StudiEconomicoGiuridicidella statistiche,"
iffx?F1l(
21
3' )/2.
Grubel,R. (1988), "The Lengthof the Shorth,"The AnnalsofStatistics, 16, 619628. Curveand its Role in RobustEstiHampel, F. R. (1974), "The Influence
R. UniversitdCagliari, 3159. di 3,
Rousseeuw and Croux: Alternatives to the MAD P. W. E. Hampel,F. R., Ronchetti, M., Rousseeuw, J.,and Stahel, A. (1986), New York: The Functions, Based on Influence Robust Statistics: Approach JohnWiley. He, X., and Simpson,D. G. (1993), "Lower Bounds forContamination The Annals Bias: GloballyMinimaxversusLocallyLinearEstimation," 21, ofStatistics, 314337. E. of and Hodges,J.L., Jr., Lehmann, L. (1963), "Estimates LocationBased Statistics, 598611. 34, on Rank Tests,"AnnalsofMathematical of H6ssjer,O., Croux, C., and Rousseeuw,P. J. (1993), "Asymptotics a In9310, Universitaire technical report VeryRobustScale Estimator," Antwerpen, Belgium. stelling New York: JohnWiley. Huber,P. J.(1981), RobustStatistics, Univariate DistributionsN. Johnson, L., and Kotz, S. (1970), Continuous I, New York: JohnWiley. Kaufman,L., and Rousseeuw,P. J. (1990), FindingGroupsin Data: An New York: JohnWiley. to Analysis, Introduction Cluster Martin,R. D., Yohai, V. J.,and Zamar, R. H. (1989), "MinMax Bias 17, The RobustRegression," AnnalsofStatistics, 16081630. MinMax Bias R. Martin, D., and Zamar, R. H. (1989), "Asymptotically of Random Variables,"Journal of RobustMestimates Scale forPositive Association, 494501. 84, theAmerican Statistical of (1991), "Bias Robust Estimation Scale when Location is Unto known,"submitted The AnnalsofStatistics. Journal Rousseeuw,P. J. (1984), "Least Median of Squares Regression," Association, 871880. 79, Statistical oftheAmerican P. G. (1990), "The Remedian:A Robust Rousseeuw, J.,and Bassett, W., Jr. oftheAmerican Statistical Methodfor LargeData Sets,"Journal Averaging 85, Association, 97104.
1283 Rousseeuw, J., P. and Croux, (1991), "Alternatives theMedianAbsolute C. to Deviation,"TechnicalReport9143,Universitaire Instelling Antwerpen, Belgium. (1992), "ExplicitScale Estimators withHighBreakdown Point,"in Analysisand RelatedMethods, Y. Dodge, Amsterdam: ed. LIStatistical NorthHolland, 7792. pp. (1993), "The Bias of kStepMestimators," submitted Statistics to and Probability Letters. Rousseeuw,P. J.,and Hubert,M. (1993), "RegressionFree Robust and AnEstimation Scale," technical of report 9324,Universitaire Instelling twerpen, Belgium. A. Rousseeuw,P. J.,and Leroy, M. (1987), RobustRegression and Outlier Detection, New York: JohnWiley. (1988), "A Robust Scale Estimator Based on the Shortest Half," Statistica Neerlandica, 103116. 42, Rousseeuw,P. J.,and Yohai, V. J. (1984), "Robust Regression Means by in ofSEstimators," Robustand Nonlinear TimeSeriesAnalysis, (Lecture Notesin Statistics eds. J.Franke, Hardle,and R. D. Martin, 26), W. New York: SpringerVerlag, 256272. pp. R. The Serfling, J. (1984), "GeneralizedL, M, and RStatistics," Annals ofStatistics, 7686. 12, and Problems theInterface," at Shamos,M. I. (1976), "Geometry Statistics: in NewDirections RecentResultsinAlgorithms Complexity, and and ed. J. F. Traub,New York: AcademicPress,pp. 251280. Siegel,A. F. (1982), "Robust Regression Using Repeated Medians,"Biometrika, 242244. 69, Data Analysis, Tukey,J. W. (1977), Exploratory Reading,MA: AddisonWesley.