You are on page 1of 7

Focus Article

Robust statistics for outlier


detection
Peter J. Rousseeuw and Mia Hubert

When analyzing data, outlying observations cause problems because they may
strongly influence the result. Robust statistics aims at detecting the outliers by
searching for the model fitted by the majority of the data. We present an overview
of several robust methods and outlier detection tools. We discuss robust proce-
dures for univariate, low-dimensional, and high-dimensional data such as esti-
mation of location and scatter, linear regression, principal component analysis,
and classification. C 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 73–79
DOI: 10.1002/widm.2

INTRODUCTION ESTIMATING UNIVARIATE


LOCATION AND SCALE
I n real data sets, it often happens that some obser-
vations are different from the majority. Such ob-
servations are called outliers. Outlying observations
As an example, suppose we have five measurements
of a length:
may be errors, or they could have been recorded un-
der exceptional circumstances, or belong to another 6.27, 6.34, 6.25, 6.31, 6.28 (1)
population. Consequently, they do not fit the model
and we want to estimate its true value.n For this,
well. It is very important to be able to detect these
one usually computes the mean x̄ = 1n i=1 xi , which
outliers.
in this case equals x̄ = (6.27 + 6.34 + 6.25 + 6.31 +
In practice, one often tries to detect outliers
6.28)/5 = 6.29. Let us now suppose that the fourth
using diagnostics starting from a classical fitting
measurement has been recorded wrongly and the data
method. However, classical methods can be affected
become
by outliers so strongly that the resulting fitted model
does not allow to detect the deviating observations. 6.27, 6.34, 6.25, 63.1, 6.28. (2)
This is called the masking effect. In addition, some
good data points might even appear to be outliers, In this case, we obtain x̄ = 17.65, which is far from
which is known as swamping. To avoid these effects, the unknown true value. On the contrary, we could
the goal of robust statistics is to find a fit that is close also compute the median of these data. To this end, we
to the fit we would have found without the outliers. sort the observations in (2) from smallest to largest:
We can then identify the outliers by their large devia- 6.25 ≤ 6.27 ≤ 6.28 ≤ 6.34 ≤ 63.10.
tion from that robust fit.
First, we describe some robust procedures for The median is then the middle value, yielding 6.28,
estimating univariate location and scale. Next, we which is still reasonable. We say that the median is
discuss multivariate location and scatter, as well as more robust against an outlier.
linear regression. We also give a summary of avail- More generally, the location-scale model states
able robust methods for principal component analy- that the n univariate observations xi are independent
sis (PCA), classification, and clustering. For a more and identically distributed (i.i.d.) with distribution
extensive review, see Ref 1. Some full-length books function F [(x − μ)/σ ] where F is known. Typically, F
on this topic are Refs 2, 3. is the standard Gaussian distribution function . We
then want to find estimates for the center μ and the
scale parameter σ .
The classical estimate of location is the mean.

Correspondence to: peter@rousseeuw.net As we saw above, the mean is very sensitive to even
Renaissance Technologies, New York, and Katholieke Universiteit 1 aberrant value out of the $n$ observations. We say
Leuven, Belgium that the breakdown value4,5 of the sample mean is
DOI: 10.1002/widm.2 1/n, so it is 0% for large n. In general, the breakdown

Volume 1, January/February 2011 


c 2011 John Wiley & Sons, Inc. 73
Focus Article wires.wiley.com/widm

value is the smallest proportion of observations in the for a real function ψ. The denominator σ̂ is an
data set that need to be replaced to carry the estimate initial robust scale estimate such as the MAD. A
arbitrarily far away. See Ref 6 for precise definitions solution to (4) can be found by the Newton–Raphson
and extensions. The robustness of an estimator is also algorithm, starting from the initial location estimate
measured by its influence function,7 which measures θ̂ (0) = mediani (xi ). Popular choices for ψ are the
the effect of one outlier. The influence function of the Huber function ψ(x) = x min(1, c/|x|), and Tukey’s
mean is unbounded, which again illustrates that the biweight function ψ(x) = x(1 − (x/c)2 )2 I(|x| ≤ c).
mean is not robust. These M-estimators contain a tuning parameter c,
For a general definition of the median, we de- which needs to be chosen in advance.
note the ith ordered observation as x(i) . Then, the People often use rules to detect outliers. The
median is x[(n+1)/2] if n is odd, and [x(n/2) + x(n/2+1) ]/2 classical rule is based on the z-scores of the observa-
if n is even. Its breakdown value is about 50%, mean- tions given by
ing that the median can resist up to 50% of outliers,
and its influence function is bounded. Both properties zi = (xi − x̄)/s (5)
illustrate the median’s robustness.
The situation for the scale parameter σ is simi- where s is the standard deviation. More precisely, the
lar. The
 classical estimator is the standard deviation rule flags xi as outlying if |zi | exceeds 2.5, say. But
n
s= i=1 (xi − x̄) /(n − 1). Because a single outlier
2 in the above-mentioned example (2) with the outlier,
can already make s arbitrarily large, its breakdown the z-scores are
value is 0%. For instance, for the clean data (1) above,
we have s = 0.035, whereas for the data (2) with the −0.45, −0.45, −0.45, 1.79, −0.45
outlier, we obtain s = 25.41! A robust measure of
scale is the median of all absolute deviations from the so none of them attains 2.5. The largest value is only
median (MAD), given by the median of all absolute 1.79, which is quite similar to the largest z-score for
deviations from the median: the clean data (1), which equals 1.41. The z-score of
  the outlier is small because it subtracts the nonrobust
 
MAD = 1.483 median xi − median(x j ). (3) mean (which was drawn toward the outlier) and be-
i=1,...,n j=1,...,n
cause it divides by the nonrobust standard deviation
The constant 1.483 is a correction factor that makes (which the outlier has made much larger than in the
the MAD unbiased at the normal distribution. The clean data). Plugging robust estimators of location
MAD of (2) is the same as that of (1), namely 0.044. and scale into (5), such as the median and the MAD,
We can also use the Qn estimator,8 defined as yields the robust scores
Qn = 2.2219 {|xi − x j |; i < j}(k)

h n xi − median(x j ) /MAD (6)
with k = 2 ≈ 2 /4 and h =  2n  + 1. Here, . . . j=1,...,n
rounds down to the nearest integer. This scale estima-
tor is thus the first quartile of all pairwise differences which are more useful; in the contaminated example
between two data points. The breakdown value of (2), the robust scores are
both the MAD and the Qn estimator is 50%.
Also popular is the interquartile range (IQR) −0.22, 1.35, −0.67, 1277.5, 0.0
defined as the difference between the third and first
quartiles, that is, IQR = x3n/4 − xn/4 (where . . . in which the outlier greatly exceeds the 2.5 cutoff.
rounds up to the nearest integer). Its breakdown value Also Tukey’s boxplot is often used to pinpoint
is only 25% but it has a simple interpretation. possible outliers. In this plot, a box is drawn from
The robustness of the median and the MAD the first quartile Q1 = xn/4 to the third quartile
comes at a cost: At the normal model they are less effi- Q3 = x3n/4 of the data. Points outside the inter-
cient than the mean. To find a better balance between val [Q1 − 1.5 IQR, Q3 + 1.5 IQR], called the fence,
robustness and efficiency, many other robust pro- are traditionally marked as outliers. Note that the
cedures have been proposed such as M-estimators.9 boxplot assumes symmetry because we add the same
They are defined implicitly as the solution of the amount to Q3 as what we subtract from Q1 . At asym-
equation metric distributions, the usual boxplot typically flags
 many regular data points as outlying. The skewness-
n
xi − θ̂
ψ =0 (4) adjusted boxplot corrects for this by using a robust
σ̂ measure of skewness in determining the fence.10
i=1

74 
c 2011 John Wiley & Sons, Inc. Volume 1, January/February 2011
WIREs Data Mining and Knowledge Discovery Robust statistics for outlier detection

MULTIVARIATE LOCATION AND these subsets, h-subsets are constructed by so-called


COVARIANCE ESTIMATION C-steps (see Ref 14 for details).
Many other robust estimators of location and
From now on, we assume that the data are p- scatter have been presented in the literature. The
dimensional and are stored in an n × p data matrix first such estimator was proposed by Stahel15 and
X = (x1 , . . . , xn )T with xi = (xi1 , . . . , xi p )T the ith ob- Donoho16 (see also Ref 17). They defined the so-
servation. Classical measures of location n and scatter called Stahel–Donoho outlyingness of a data point
are given by the empirical mean x̄ =1n i=1 xi and the xi as
n
empirical covariance matrix S x = i=1 (xi − x̄)(xi −  T  
x̄)T /(n − 1). As in the univariate case, both classi-  x d − median j=1,...,n xT d 
i j
cal estimators have a breakdown value of 0%, that outl(xi ) = max   (8)
d MAD j=1,...,n xTj d
is, a small fraction of outliers can completely ruin
them. where the maximum is over all directions (i.e., all
Robust estimates of location and scatter can p-dimensional unit length vectors d), and xTj d is the
be obtained by the minimum covariance determinant projection of x j on the direction d. Next, they gave
(MCD) method of Rousseeuw.11,12 The MCD looks each observation a weight wi based on outl(xi ), and
for those h observations in the data set (where the computed the resulting weighted mean and covari-
number h is given by the user) whose classical covari- ance matrix.
ance matrix has the lowest possible determinant. The Multivariate M-estimators18 have a relatively
MCD estimate of location μ̂0 is then the average of low breakdown value due to possible implosion
these h points, whereas the MCD estimate of scatter of the estimated scatter matrix. More recently, ro-
ˆ 0 is their covariance matrix, multiplied by a consis- bust estimators of multivariate location and scat-
tency factor. We can then give each xi some weight ter include S-estimators,2,19 MM-estimators,20 and
wi , for instance, by putting wi = 1 if the initial robust the orthogonalized Gnanadesikan-Kettenring (OGK)
distance estimator.21

−1
RDi (xi , μ̂0 ,  0 ) = (xi − μ̂0 )T ˆ 0 (xi − μ̂0 )
ˆ

≤ χ p,0.975
2
(7) LINEAR REGRESSION
The multiple linear regression model assumes that in
n wi = 0 otherwise. The weighted mean μ̂w =
and addition to the p independent x-variables also a re-
n
(i=1 wi xi )/( i=1 wi ) and covariance
n matrix ˆ w = sponse variable y is measured, which can be explained
n
( i=1 wi (xi − μ̂w )(xi − μ̂w ) )/( i=1 wi − 1)
T
then by a linear combination of the x variables. More pre-
have a better finite-sample efficiency. The final robust cisely, the model says that for all observations (xi , yi )
distances RD(xi ) are obtained by inserting μ̂w and it holds that
ˆ w into (7).
The MCD estimator, as well as its weighted ver- yi = β0 + β1 xi1 + . . . + β p xi p + i
sion, has a bounded-influence function and break- = β0 + β T xi + i i = 1, . . . , n (9)
down value (n − h + 1)/n, hence the number h de-
termines the robustness of the estimator. The MCD where the errors i are assumed to be independent and
has its highest possible breakdown value when h = identically distributed with zero mean and constant
(n + p + 1)/2. When a large proportion of con- variance σ 2 . Applying a regression estimator to the
tamination is expected, h should thus be chosen data yields p + 1 regression coefficients, combined as
close to 0.5n. Otherwise an intermediate value for θ̂ = (β̂0 , . . . , βˆp )T . The residual ri of case i is defined
h, such as 0.75n, is recommended to obtain a as the difference between the observed response yi
higher finite-sample efficiency. We refer to Ref 13 and its estimated value ŷi .
for an overview of the MCD estimator and its The classical least squares (LS) method to esti-
properties. mate θ minimizes the sum of the squared residuals. It
The computation of the MCD estimator is non- is popular because it allows to compute the regression
trivial and naively requires an exhaustive investiga- estimates explicitly, and it is optimal if the errors are
tion of all h-subsets out of n. Rousseeuw and Van normally distributed. However, LS is extremely sen-
Driessen14 constructed a much faster algorithm called sitive to regression outliers, that is, observations that
FAST-MCD. It starts by randomly drawing many do not obey the linear pattern formed by the majority
p + 1 observations from the data set. On the basis of of the data.

Volume 1, January/February 2011 


c 2011 John Wiley & Sons, Inc. 75
Focus Article wires.wiley.com/widm

The least trimmed squares estimator (LTS) pro- a small number of components. These components are
posed by Rousseeuw11 is given by linear combinations of the original variables and often
allow for an interpretation and a better understanding

h
of the different sources of variation. PCA is often
minimize (r 2 )(i) (10)
the first step of the data analysis, followed by other
i=1
multivariate techniques.
where (r 2 )(1) ≤ (r 2 )(2) ≤ . . . ≤ (r 2 )(n) are the ordered In the classical approach, the first principal com-
squared residuals. (They are first squared and then ponent corresponds to the direction in which the pro-
ordered.) The value h plays the same role as in the jected observations have the largest variance. The sec-
MCD estimator. For h ≈ n/2, we find a breakdown ond component is then orthogonal to the first and
value of 50%, whereas for larger h, we obtain roughly again maximizes the variance of the data points pro-
(n − h)/n. A fast algorithm for the LTS estimator jected on it. Continuing in this way produces all
(FAST-LTS) has been developed.22 the principal components, which correspond to the
eigenvectors of the empirical covariance matrix. Un-
h of 2the errors σ can be estimated by
The scale
σ̂LTS
2
= ch,n
2
i=1 (r )(i) / h, where ri are the residuals
fortunately, both the classical variance (which is be-
from the LTS fit and ch,n is a constant that makes σ̂ ing maximized) and the classical covariance matrix
unbiased at Gaussian error distributions. We can then (which is being decomposed) are very sensitive to
identify regression outliers by their standardized LTS anomalous observations. Consequently, the first com-
residuals ri /σ̂LTS . We can also use the standardized ponents from classical PCA are often attracted toward
LTS residuals to assign a weight to every observation. outlying points and may not capture the variation of
The weighted LS estimator with these LTS weights the regular observations.
inherits the nice robustness properties of LTS, but A first group of robust PCA methods is obtained
is more efficient and yields all the usual inferential by replacing the classical covariance matrix by a ro-
output, such as t-statistics, F-statistics, an R2 statistic, bust covariance estimator such as the weighted MCD
and the corresponding p-values. estimator or MM-estimators.29,30 Unfortunately, the
The outlier map23 plots the standardized LTS use of these covariance estimators is limited to small-
residuals versus robust distances (7) based on (for to-moderate dimensions because they are not defined
instance) the MCD estimator, which is applied to when p is larger than n.
the x-variables only. This allows to distinguish ver- A second approach to robust PCA uses Projec-
tical outliers (with small robust distances and outly- tion Pursuit techniques. These methods maximize a
ing residuals), good leverage points (with outlying ro- robust measure of spread to obtain consecutive di-
bust distances but small residuals), and bad leverage rections on which the data points are projected, see
points (with outlying robust distances and outlying Refs 31–34.
residuals). The ROBPCA35 approach is a hybrid, which
The earliest theory of robust regression was combines ideas of projection pursuit and robust co-
based on M-estimators,24 R-estimators,25 and L- variance estimation. The projection pursuit part is
estimators.26 The breakdown value of all these used for the initial dimension reduction. Some ideas
methods is 0% because of their vulnerability to based on the MCD estimator are then applied to this
bad leverage points. Generalized M-estimators (GM- lower-dimensional data space.
estimators)7 were the first to attain a positive break- For outlier detection, a PCA outlier map35 can
down value, which unfortunately still went down to be constructed, similar to the regression outlier map.
zero for increasing p. It plots for every observation its (Euclidean) orthog-
The low finite-sample efficiency of LTS can be onal distance to the PCA-subspace, against its score
improved by replacing its objective function by a more distance, which measures the robust distance of its
efficient scale estimator applied to the residuals ri . projection to the center of all the projected observa-
This approach has led to the introduction of regres- tions. Doing so, four types of observations are visu-
sion S-estimators27 and MM-estimators.28 alized. Regular observations have a small orthogo-
nal and a small score distance. Observations with a
large score distance but a small orthogonal distance
are called good leverage points. Orthogonal outliers
FURTHER DIRECTIONS
have a large orthogonal distance but a small score
Principal Component Analysis distance. Bad leverage points have a large orthogo-
PCA is a very popular dimension-reduction method. It nal distance and a large score distance. They lie far
tries to explain the covariance structure of the data by outside the space spanned by the robust principal

76 
c 2011 John Wiley & Sons, Inc. Volume 1, January/February 2011
WIREs Data Mining and Knowledge Discovery Robust statistics for outlier detection

components, and after projection far from the regular When linear models are not appropriate, sup-
data points. Typically, they have a large influence on port vector machines (SVM) are powerful tools to
classical PCA, as the eigenvectors will be tilted toward handle nonlinear structures.43 An SVM with an un-
them. bounded kernel (such as a linear kernel) is not robust
Other proposals for robust PCA include spheri- and suffers the same problems as traditional linear
cal PCA,36 which first projects the data onto a sphere classifiers. But when a bounded kernel is used, the re-
with a robust center, and then applies PCA to these sulting nonlinear SVM classification handles outliers
projected data. For a review of robust versions of quite well.
principal component regression and partial Least
Square, see Ref 1.
Clustering
Classification Cluster analysis is an important technique when han-
The goal of classification, also known as discriminant dling large data sets. It searches for homogeneous
analysis or supervised learning, is to obtain rules that groups in the data, which afterward may be an-
describe the separation between known groups of p- alyzed separately. Nonhierarchical cluster methods
dimensional observations. This allows to classify new search for the best clustering in k groups for a user-
observations into one of the groups. We denote the specified k.
number of groups by l and assume that we can de- For spherical clusters, the most popular method
scribe each population π j by its density f j . We write is k-means, which minimizes the sum of the squared
p j for the membership probability, that is, the prob- Euclidean distances of the observations to the mean
ability for any observation to come from π j . of their group.44 This method is not robust as it uses
For low-dimensional data, a popular classifica- group averages. To overcome this problem, one of
tion rule results from maximizing the Bayes posterior the first robust proposals was the Partitioning around
probability. At gaussian distributions, this leads to Medoids method.45 It searches for k observations
the maximization of the quadratic discriminant scores (medoids) such that the sum of the unsquared dis-
d jQ(x) given by tances of the observations to the medoid of their
group is minimized.
1 1 Later on, the more robust trimmed k-means
d jQ(x) = − ln| j | − (x − μ j )T  −1
j (x − μ j )
2 2 method has been proposed,46 inspired by the trim-
+ ln( p j ). (11) ming idea of the MCD and the LTS. It searches for
the h-subset (with h as in the definition of MCD and
When all the covariance matrices are assumed to be LTS) such that the sum of the squared distances of
equal, these scores can be simplified to the observations to the mean of their group is min-
1 T −1 imized. Consequently, not all observations need to
dLj (x) = μTj  −1 x − μ  μ j + ln( p j ) (12)
2 j be classified, as n − h cases can be left unassigned.
where  is the common covariance matrix. This leads To perform the trimmed k-means clustering, an itera-
to linear classification boundaries. Robust classifica- tive algorithm has been developed, using C-steps that
tion rules can be obtained by replacing the classical are similar to those in the FAST-MCD and FAST-
covariance matrices by robust alternatives such as the LTS algorithms.47 For nonspherical clusters, con-
MCD estimator and S-estimators, as in Refs 37–40. strained maximum likelihood approaches48,49 were
When the data are high dimensional, this ap- developed.
proach cannot be applied anymore because the robust
covariance estimators become uncomputable. This
problem can be solved by first applying PCA to the Beyond Outlier Detection
entire set of observations. Alternatively, one can also There are other aspects to robust statistics apart
apply a PCA method to each group separately. This from outlier detection. For instance, robust estima-
is the idea behind the soft independent modeling of tion can be used in automated settings such as com-
class analogy (SIMCA) method.41 A robustification puter vision.50,51 Another aspect is statistical infer-
of SIMCA is obtained by first applying robust PCA to ence such as the construction of robust hypothesis
each group, and then constructing a classification rule tests, p-values, confidence intervals, and model selec-
for new observations based on their orthogonal dis- tion (e.g., variable selection in regression). This as-
tance to each subspace and their score distance within pect is studied in Refs 3 and 7, which also cite earlier
each subspace.42 work.

Volume 1, January/February 2011 


c 2011 John Wiley & Sons, Inc. 77
Focus Article wires.wiley.com/widm

SOFTWARE AVAILABILITY and SAS/IML (version 7 or higher). The free soft-


ware R provides many robust procedures in the pack-
Matlab functions for many of the procedures men- ages robustbase (e.g., huberM, Qn, adjbox, covMcd,
tioned in this article are part of the LIBRA covOGK, ltsReg, and lmrob) and rrcov (many robust
toolbox,52,53 which can be downloaded from wis. covariance estimators, linear and quadratic discrimi-
kuleuven.be/stat/robust. nant analysis, and several robust PCA methods). Ro-
The MCD and LTS estimators are also built bust clustering can be performed with the cluster
into S-PLUS as well as SAS (version 11 or higher) and tclust packages.

REFERENCES
1. Hubert M, Rousseeuw PJ, Van Aelst S, High break- 15. Stahel WA. Robuste Schätzungen: Infinitesimale Op-
down robust multivariate methods. Statist Sci 2008, timalität und Schätzungen von Kovarianzmatrizen,
23:92–119. Ph.D. thesis, ETH Zürich, 1981.
2. Rousseeuw PJ, Leroy AM. Robust regression and 16. Donoho DL, Gasko M. Breakdown properties of
outlier detection. New York: Wiley Interscience, location estimates based on halfspace depth and
1987. projected outlyingness. Ann Statist 1992, 20:1803–
3. Maronna RA, Martin DR, Yohai VJ. Robust Statistics: 1827.
Theory and Methods. New York: Wiley, 2006. 17. Maronna RA, Yohai VJ. The behavior of the Stahel-
4. Hampel FR. A general qualitative definition of robust- Donoho robust multivariate estimator. J Amn Statist
ness. Ann Math Stat 1971, 42:1887–1896. Assoc 1995, 90:330–341.
5. Donoho DL, Huber PJ. The notion of breakdown 18. Maronna RA. Robust M-estimators of multivariate lo-
point. In: Bickel P, Doksum K, and Hodges JL, cation and scatter. Ann Statist 1976, 4:51–67.
eds., A Festschrift for Erich Lehmann Belmont, MA: 19. Davies L. Asymptotic behavior of S-estimators of mul-
Wadsworth; 1983, 157–184. tivariate location parameters and dispersion matrices.
Ann Statist 1987, 15:1269–1292.
6. Hubert M, Debruyne M. Breakdown Value. Wiley In-
terdiscipl Rev: Comput Stat 2009, 1:296–302. 20. Tatsuoka KS, Tyler DE. On the uniqueness of S-
functionals and M-functionals under nonelliptical dis-
7. Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA.
tributions. Ann Statist 2000, 28:1219–1243.
Robust Statistics: The Approach Based on Influence
Functions. New York: Wiley; 1986. 21. Maronna RA, Zamar RH. Robust estimates of location
8. Rousseeuw PJ, Croux C. Alternatives to the median ab- and dispersion for high-dimensional data sets. Techno-
solute deviation. J Amn Statist Assoc 1993, 88:1273– metrics 2002, 44:307–317.
1283. 22. Rousseeuw PJ, Van Driessen K. Computing LTS re-
9. Huber PJ. Robust estimation of a location parameter. gression for large data sets. Data Min Knowledge Dis-
Ann Math Stat 1964, 35:73–101. covery 2006, 12:29–45.
10. Hubert M, Vandervieren E. An adjusted boxplot for 23. Rousseeuw PJ, van Zomeren BC. Unmasking multi-
skewed distributions. Comput Stat Data Anal 2008, variate outliers and leverage points. J Amn Statist As-
52:5186–5201. soc 1990, 85:633–651.
11. Rousseeuw PJ. Least median of squares regression. J 24. Huber PJ. Robust Statistics. New York: Wiley; 1981.
Amn Statist Assoc 1984, 79:871–880. 25. Jurecková J. Nonparametric estimate of regression co-
12. Rousseeuw PJ. Multivariate estimation with high efficients. Ann Math Stat 1971, 42:1328–1338.
breakdown point. In: Grossmann W, Pflug G., Vincze 26. Koenker R, Portnoy S. L-estimation for linear models.
I, and wertz W, eds. Mathematical Statistics and Ap- J Amn Statist Assoc 1987, 82:851–857.
plications. Vol.B. Dordrecht, The Netherlands: Reidel 27. Rousseeuw PJ, Yohai VJ. Robust regression by means
Publishing Company; 1985, 283–297. of S-estimators, In: Robust and nonlinear time series
13. Hubert M, Debruyne M. Minimum covariance deter- analysis, Franke J, Härdle W, and Martin RD eds. Lec-
minant. Wiley Interdiscipl Rev: Comput Stat 2010, ture Notes in Statistics No. 26, Springer-Verlag; 1984,
2:36–43. 256–272.
14. Rousseeuw PJ, Van Driessen K. A fast algorithm for 28. Yohai VJ. High breakdown point and high efficiency
the minimum covariance determinant estimator. Tech- robust estimates for regression. Ann Statist 1987,
nometrics 1999, 41:212–223. 15:642–656.

78 
c 2011 John Wiley & Sons, Inc. Volume 1, January/February 2011
WIREs Data Mining and Knowledge Discovery Robust statistics for outlier detection

29. Croux C, Haesbroeck G. Principal components anal- 40. Hubert M, Van Driessen K. Fast and robust discrimi-
ysis based on robust estimators of the covariance or nant analysis. Comput Stat Data Anal 2004, 45:301–
correlation matrix: influence functions and efficiencies. 320.
Biometrika 2000, 87:603–618. 41. Wold S. Pattern recognition by means of disjoint
30. Salibian-Barrera M, Van Aelst S, Willems G. PCA principal components models. Pattern Recognit 1976,
based on multivariate MM-estimators with fast and 8:127–139.
robust bootstrap. J Amn Statist Assoc 2006, 101:1198– 42. Vanden Branden K, Hubert M. Robust classification
1211. in high dimensions based on the SIMCA method.
31. Li G, Chen Z. Projection-pursuit approach to robust Chemom Intell Lab Syst 2005, 79:10–21.
dispersion matrices and principal components: Primary 43. Steinwart I, Christmann A. Support Vector Machines.
theory and Monte Carlo. J Amn Statist Assoc 1985, New York: Springer; 2008.
80:759–766. 44. MacQueen JB. Some methods for classification and
32. Croux C, Ruiz-Gazen A. High breakdown estimators analysis of multivariate observations. Proceedings of
for principal components: the projection-pursuit ap- 5th Berkeley Symposium on mathematical statistics
proach revisited. J Multivariate Anal 2005, 95:206– and probability, vol. 1, University of California Press,
226. 1967, pp. 281–297.
33. Hubert M, Rousseeuw PJ, Verboven S. A fast robust 45. Kaufman L, Rousseeuw PJ. Finding Groups in Data:
method for principal components with applications to An Introduction to Cluster Analysis. New York: Wiley;
chemometrics. Chemom Intell Lab Syst 2002, 60:101– 1990.
111. 46. Cuesta-Albertos JA, Gordaliza A, Matrán C. Trimmed
34. Croux C, Filzmoser P, Oliveira MR. Algorithms for k-means: An attempt to robustify quantizers. Ann
projection-pursuit robust principal component analy- Statist 1997, 25:553–576.
sis. Chemom Intell Lab Syst 2007, 87:218–225. 47. Garcı́a-Escudero LA, Gordaliza A, Matrán C. Trim-
35. Hubert M, Rousseeuw PJ, Vanden Branden K. ming tools in exploratory data analysis. J Comput
ROBPCA: A new approach to robust principal Graph Stat 2003, 12:434–449.
components analysis. Technometrics 2005, 47:64– 48. Gallegos MT, Ritter G. A robust method for cluster
79. analysis. Ann Statist 2005, 33:347–380.
36. Locantore N, Marron JS, Simpson DG, Tripoli N, 49. Garcı́a-Escudero LA, Gordaliza A, Matrán C, Mayo
Zhang JT, Cohen KL. Robust principal compo- Iscar A. A general trimming approach to robust cluster
nent analysis for functional data. Test 1999, 8:1– analysis. Ann Statist 2008, 36:1324–1345.
73. 50. Meer P, Mintz D, Rosenfeld A, Kim DY. Robust re-
37. Hawkins DM, McLachlan GJ. High-breakdown lin- gression methods in computer vision: A review. Int J
ear discriminant analysis. J Amn Statist Assoc 1997, Comput Vision 1991, 6:59–70.
92:136–143. 51. Stewart CV. MINPRAN: A new robust estimator for
38. He X, Fung WK. High breakdown estimation for computer vision. IEEE Trans Pattern Anal Machine
multiple populations with applications to discrim- Intell 1995, 17:925–938.
inant analysis. J Multivariate Anal 2000, 72:151– 52. Verboven S, Hubert M. LIBRA: a Matlab library for ro-
162. bust analysis. Chemom Intell Lab Syst 2005, 75:127–
39. Croux C, Dehon C. Robust linear discriminant anal- 136.
ysis using S-estimators. Can J Statist 2001, 29:473– 53. Verboven S, Hubert M. Matlab library LIBRA. Wiley
492. Interdiscipl Rev: Comput Stat 2010, in press.

Volume 1, January/February 2011 


c 2011 John Wiley & Sons, Inc. 79

You might also like