To cite this article: Meltem Ekiz & O.Ufuk Ekiz (2016): Outlier detection with Mahalanobis
square distance: incorporating small sample correction factor, Journal of Applied Statistics,
DOI: 10.1080/02664763.2016.1255313
JOURNAL OF APPLIED STATISTICS, 2016
http://dx.doi.org/10.1080/02664763.2016.1255313
1. Introduction
There is a growing interest in the literature on detecting outliers in multivariate data. This problem was first addressed using classical statistical methods [1,17,23]. However, these classical methods fail in detection, since the sample mean and covariance are highly sensitive to outliers, and the existence of multiple outliers in the data causes masking and swamping problems. These issues were subsequently addressed using Mahalanobis square distances (MSDs) based on robust estimators of the location and scale parameters.
In the literature, there is a vast amount of work on the details and the performance of the proposed robust estimators [4,5,9,10,14,22,25,27,28,33]. In particular, Becker and Gather [2], Cerioli [7], Hadi [13], Hardin and Rocke [16], Herwindiati et al. [18], Rocke [26], and Rousseeuw and van Zomeren [29] have most commonly used MSDs based on minimum covariance determinant (MCD) and S-estimators with bi-weight and t-biweight functions. Moreover, they investigate the distributions of MSDs based on these proposed robust estimators.
CONTACT Meltem Ekiz ozmeltem@gazi.edu.tr Department of Statistics, University of Gazi, Faculty of Science, Teknikokullar, 06500 Ankara, Turkey
The formal definition of the MSD is given by

D_i(X_i, μ̂, Σ̂) = (X_i − μ̂)′ Σ̂⁻¹ (X_i − μ̂),

where ′ denotes the transpose, i = 1, 2, . . . , n, and n is the sample size. In the formula above, X_i is drawn from a multivariate normal distribution, X_i ∼ N_p(μ, Σ), and μ̂ and Σ̂ are estimates of the mean μ and covariance Σ. The MSD computed using robust estimators of the location and scale parameters is known to be asymptotically χ²_p distributed [16], where p is the number of variables. However, the approximation remains weak when the sample size is small [7,16,30].
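As a concrete illustration of the definition above, the following sketch computes squared Mahalanobis distances with the classical (non-robust) mean and covariance and flags points beyond the asymptotic χ²_p cut-off. The sample size, dimension, and 0.975 quantile are illustrative choices, not values from the paper:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, p = 50, 3  # illustrative sample size and dimension
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)

# Classical (non-robust) estimates of location and scatter.
mu_hat = X.mean(axis=0)
sigma_hat = np.cov(X, rowvar=False)

# Squared Mahalanobis distance D_i = (X_i - mu_hat)' Sigma_hat^{-1} (X_i - mu_hat).
diff = X - mu_hat
D = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma_hat), diff)

# Asymptotically D ~ chi^2_p; flag points beyond the 0.975 quantile.
outliers = np.where(D > chi2.ppf(0.975, df=p))[0]
```

With robust estimates substituted for `mu_hat` and `sigma_hat`, the same distances become the robust MSDs discussed in the text.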
To overcome this problem, for the multivariate normal distribution, a consistency factor (k) is used to ensure the consistency of the robust estimators of the scale parameters [7,24]. However, when the sample size is small, using k does not necessarily guarantee the unbiasedness of the robust estimators [7,24]. If the random variable in Equation (3) is Wishart distributed, then E(k Σ̂_MCD) = Σ [21]. On the other hand, in the case of a small sample size, E(k Σ̂_MCD) ≠ Σ, and there exists a c that satisfies E(c k Σ̂_MCD) = Σ. Hence, in [7,11,24] the authors suggest using a small sample correction factor (c), determined by simulations for various sample sizes and numbers of variables.
The aim of this study is to show the effect of the small sample correction factor (c) on the distribution of MSDs based on MCD and S-estimators when the sample size is small. To achieve this, we compute c values from simulated data with various n and p, and compare the number of outliers detected by MSDs (with incorporated c) based on various robust estimators. In the analysis, we show the sensitivity of the results to p/n, the ratio of the number of variables to the sample size.
In Section 2, we introduce MCD and S-estimators with bi-weight and t-biweight functions and present the obtained c values. Then, the distributions of the MSDs based on these estimators (with incorporated c values) are examined in Section 3. In Section 4, we summarize seven different methods to detect outliers from MSDs based on various robust estimators and cut-off values. We also present simulation results that compare their performance in estimating the number of outliers. Finally, we present an example application of the proposed methods to real data.
For the definition of the breakdown point (BP) of an estimator, we follow the definition given in [22]. The BP of an estimate θ̂ at G, denoted by ε∗(θ̂, G), is the largest ε∗ ∈ (0, 1) such that for λ < ε∗, θ̂∞((1 − λ)G + λH), as a function of H, remains bounded.
μ̂_MCD = (1/h) Σ_{j=1}^{h} x_j,

Σ̂_MCD = k (1/h) Σ_{j=1}^{h} (x_j − μ̂_MCD)(x_j − μ̂_MCD)′,
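As a sketch of the raw MCD idea behind these estimators, one can pick the h-subset whose sample covariance has the smallest determinant. The consistency factor k is omitted, the exhaustive subset search is only feasible for the tiny sizes chosen here, and the planted outliers are our own illustrative assumption:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, p, h = 12, 2, 8  # tiny sizes so the exhaustive search stays cheap

X = rng.standard_normal((n, p))
X[:2] += 10.0       # two planted outliers, far from the bulk

# Raw MCD: over all h-subsets, keep the one whose sample covariance has the
# smallest determinant (the consistency factor k is omitted in this sketch).
best_det, best_idx = np.inf, None
for idx in combinations(range(n), h):
    det = np.linalg.det(np.cov(X[list(idx)], rowvar=False))
    if det < best_det:
        best_det, best_idx = det, idx

mu_mcd = X[list(best_idx)].mean(axis=0)              # robust location
sigma_mcd = np.cov(X[list(best_idx)], rowvar=False)  # robust scatter (up to k)
```

Practical MCD implementations replace the exhaustive search with the fast resampling algorithms cited in the text.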
(1) κ(0) = 1,
(2) κ : ℝ⁺ → [0, 1] is non-increasing, continuous at 0 and continuous on the left,
(3) κ(u) > 0 for 0 ≤ u < e, and κ(u) = 0 for u > e, for some e > 0.

The S-estimators (μ̂, Σ̂) are the pair (α, A) that minimizes det(A) subject to

(1/n) Σ_{i=1}^{n} κ((X_i − α)′ A⁻¹ (X_i − α)) ≥ 1 − ε.  (1)
The BP of the S-estimators is ε∗ = min(ε, 1 − ε). For a given ε, one may ensure the Fisher consistency of the estimates by choosing a function κ that has the properties given above and a continuous third derivative. If we replace the κ function with a non-decreasing function ρ : ℝ → [0, ∞), we obtain results similar to those presented in [19].
The Tukey bi-weight function is

κ_b(u) = (1 − (u/e)²)² for 0 ≤ u ≤ e, and κ_b(u) = 0 for u > e.
The t-biweight function κ_t contains two parameters (e1, e2) that satisfy the given values of the BP and the asymptotic rejection probability (for large values of p, Rocke [26] showed that e2 = √p / e1). The constants e1 and e2 = √p / e1 are determined from Equation (2), and they satisfy E(κ_t((X − μ)′ Σ⁻¹ (X − μ))) = 1 − ε. Here, e1 is computed by replacing κ_b with κ_t and using a proper partition of the integral boundaries; e2 is then found from e2 = √p / e1. Henceforth, we write S-estimators when we refer to S-estimators with bi-weight and t-biweight functions.
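A minimal implementation of the bi-weight function κ_b defined above; here e is the tuning constant whose values appear in Table 2, and the vectorization over u is our own convenience:

```python
import numpy as np

def kappa_b(u, e):
    """Tukey bi-weight: (1 - (u/e)^2)^2 on [0, e], zero beyond e."""
    u = np.asarray(u, dtype=float)
    return np.where(u <= e, (1.0 - (u / e) ** 2) ** 2, 0.0)
```

The function satisfies the properties listed above: κ_b(0) = 1, it is non-increasing on [0, e], and it vanishes beyond e.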
In our previous study, we concluded that it is necessary to use a small sample correction factor for S-estimators [11]. To achieve this, after computing the robust estimators explained above, we multiply the estimated covariance by the small sample correction factor: c_MCD (mentioned in [7]) for MCD, and c_b and c_t for S-estimators with bi-weight and t-biweight functions, respectively. Since the robust estimators are scale invariant, without loss of generality we assume that the true value of the covariance is Σ = I, where I is the identity matrix. To choose the proper c values, we run r simulations for each estimator, compute the determinant of Σ̂^(j) in each run, and take the 1/p-th power of the average across the total number of runs [7,11]. This is given by
c = [ (1/r) Σ_{j=1}^{r} |Σ̂^(j)| ]^{−1/p}.

c takes values in the interval [0, 1], and when the sample size is small it satisfies |c Σ̂| ≈ 1. As the sample size increases, c approaches 1. Table 1 illustrates the c values for the robust estimators investigated in this study. Table 2 presents the e values (for bi-weight) and the e1 values (for t-biweight; e2 can be computed directly from e1) obtained for various p.
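The computation of c can be sketched as follows. For simplicity, the plain sample covariance stands in for the robust estimators used in the paper, so the resulting value illustrates the procedure rather than reproducing Table 1:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, r = 20, 5, 2000  # small sample size, r simulation runs

# Simulate r scatter estimates under Sigma = I and average their determinants;
# c = (mean determinant)^(-1/p) rescales the estimator so that |c * Sigma_hat| ~ 1.
dets = np.empty(r)
for j in range(r):
    X = rng.standard_normal((n, p))
    dets[j] = np.linalg.det(np.cov(X, rowvar=False))

c = np.mean(dets) ** (-1.0 / p)
```

Since |c Σ̂| = c^p |Σ̂|, this choice of c makes the expected determinant of the corrected estimate approximately 1, matching the criterion stated above.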
JOURNAL OF APPLIED STATISTICS 5
[16]. Here, m_MCD is the unknown degrees of freedom and k is the consistency factor. Additionally, since μ̂_MCD → μ as n → ∞, the distribution of MSDs for extreme observations based on the MCD estimators becomes

D(X_i, μ̂_MCD, k Σ̂_MCD) ∼ [k p m_MCD / (m_MCD − p + 1)] F_{p, m_MCD − p + 1},

where the unknown degrees of freedom are estimated by

m̂_MCD = 2 / Ĉ².  (4)
second group of the random sample. The second sample is independent of the first and is generated as follows [15,16]. Given 1 − ε < δ < 1,

(1) Let (n1, n2, n3) be drawn from the multinomial(n; 1 − ε, δ − (1 − ε), 1 − δ) distribution.
(2) X1, X2, . . . , X_{n1} is a sample from a truncated normal distribution N_p(μ, Σ), where the truncation is on the condition (X − μ)′ Σ⁻¹ (X − μ) < χ²_{p,1−ε}.
(3) X_{n1+1}, X_{n1+2}, . . . , X_{n1+n2} is a sample from a truncated normal distribution N_p(μ, Σ), where the truncation is based on the conditions (X − μ)′ Σ⁻¹ (X − μ) > χ²_{p,1−ε} and (X − μ)′ Σ⁻¹ (X − μ) < χ²_{p,δ}.
(4) X_{n1+n2+1}, X_{n1+n2+2}, . . . , X_{n1+n2+n3=n} is a sample from a truncated normal distribution N_p(μ, Σ), where the truncation is based on the condition (X − μ)′ Σ⁻¹ (X − μ) > χ²_{p,δ}.

Here, E3 contains the extreme observations of the second sample. Therefore, under the conditions above, the second sample X1, X2, . . . , Xn is an i.i.d. sample from the N_p(μ, Σ) distribution.
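Steps (1)–(4) above can be sketched with simple rejection sampling; μ = 0 and Σ = I are assumed without loss of generality, matching the scale-invariance argument in Section 2:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
n, p = 50, 10
eps, delta = 0.50, 0.90

# Step 1: split n over the three regions E1, E2, E3.
n1, n2, n3 = rng.multinomial(n, [1 - eps, delta - (1 - eps), 1 - delta])

# Steps 2-4: rejection sampling from N_p(0, I), truncating on the squared
# distance d = X'X relative to the chi-square quantiles.
lo, hi = chi2.ppf(1 - eps, p), chi2.ppf(delta, p)

def truncated_sample(size, accept):
    rows = []
    while len(rows) < size:
        x = rng.standard_normal(p)
        if accept(x @ x):
            rows.append(x)
    return np.array(rows).reshape(size, p)

X1 = truncated_sample(n1, lambda d: d < lo)       # region E1
X2 = truncated_sample(n2, lambda d: lo < d < hi)  # region E2
X3 = truncated_sample(n3, lambda d: d > hi)       # region E3 (extremes)
sample = np.vstack([X1, X2, X3])                  # i.i.d. N_p(0, I) overall
```

The multinomial split together with the truncated draws reproduces an unconditionally i.i.d. normal sample, as stated in the text.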
Remark 3.1: In this study, the data are also generated under the conditions presented above, and in what follows we show the impact of incorporating c on the distributions of the MSDs based on the MCD and S-estimators.
With the small sample correction factor (c), as μ̂_MCD → μ, the distribution of MSDs (based on MCD) for extreme observations becomes

D(X_i, μ̂_MCD, c_MCD k Σ̂_MCD) ∼ [c_MCD k p m_MCD / (m_MCD − p + 1)] F_{p, m_MCD − p + 1}.

Similarly, when S-estimators with bi-weight and t-biweight functions are used, as μ̂_b → μ, the distribution of MSDs for extreme observations becomes

D(X_i, μ̂_b, c_b Σ̂_b) ∼ [c_b p m_b / (m_b − p + 1)] F_{p, m_b − p + 1},

and as μ̂_t → μ,

D(X_i, μ̂_t, c_t Σ̂_t) ∼ [c_t p m_t / (m_t − p + 1)] F_{p, m_t − p + 1},

respectively. These equations are similar to those given in [16], but they differ by the multiplicative c values.
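The practical difference between the asymptotic χ²_p cut-off and a corrected small-sample F cut-off of this scaled form can be made concrete; the values of m, c and k below are illustrative assumptions, not taken from the paper:

```python
from scipy.stats import chi2, f

p, m = 10, 25     # m: assumed estimate of the Wishart degrees of freedom
c, k = 0.9, 1.2   # illustrative correction and consistency factors
alpha = 0.025

# Asymptotic cut-off for robust MSDs: the chi-square quantile.
cut_chi2 = chi2.ppf(1 - alpha, df=p)

# Small-sample cut-off: c * k * p * m / (m - p + 1) * F_{p, m-p+1, 1-alpha}.
cut_f = c * k * p * m / (m - p + 1) * f.ppf(1 - alpha, p, m - p + 1)
```

When m is small relative to p, the scaled F quantile is considerably larger than the chi-square quantile, which is why the χ²_p cut-off flags too many points in small samples.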
When c is incorporated, the probability that an observation (used to calculate μ̂ and Σ̂) falls into the region E2 ∪ E3, and especially E3, approaches zero as the sample size increases. This is along the lines of [16]. Hence, in our analysis below, we use MCD and S-estimators computed from the second sample, which are by definition independent of both the extreme values (of the second sample) and the first sample; that is, μ̂ and c Σ̂ are independent of X_{n2+1}, . . . , X_n and Y_1, . . . , Y_n.

Figure 1. The Monte-Carlo estimates of the ordered MSDs based on the robust estimators (MCD and S-estimators with bi-weight and t-biweight functions) are plotted against the Monte-Carlo estimates of the expectations of the order statistics of χ²_p (horizontal axis). The simulations are performed with 5000 repetitions for n = 50, p = 10, ε = 0.50 and δ = 0.90. Green plus signs: the ordered MSDs for the first group of samples Y1, . . . , Yn, based on μ̂ and c Σ̂ obtained from X1, . . . , Xn. Red diamonds: the ordered MSDs for the second group of samples X1, . . . , Xn, based on μ̂ and c Σ̂ obtained from X1, . . . , Xn. Blue circles: the expected values of the order statistics of F. Solid black line: the expected values of the order statistics of χ²_p plotted against themselves.
In order to illustrate the impact of using the small sample correction factor (c), we present the simulation results in Figure 1. In the simulations, we compute m̂ from Equation (3). As described in [16], since the diagonal elements of Σ̂ are identically distributed and uncorrelated, we simulate r independent copies of the Σ̂ matrix from the n data points of each independent sample, and m is estimated from the coefficient of variation of the rp diagonal elements. In Table 3, we present the m̂ values obtained from r = 5000 simulations for various n and p values, where ε = 0.50 and δ = 0.90.
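The estimation of m̂ via the coefficient of variation described above can be sketched as follows, with the plain sample covariance standing in for the robust estimator; under Σ = I its diagonal elements behave like χ²_{n−1}/(n−1), so m̂ should land near n − 1:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, r = 50, 10, 500

# Collect the diagonal elements of r simulated scatter matrices (the plain
# sample covariance stands in for the robust estimator in this sketch).
diags = np.empty((r, p))
for j in range(r):
    X = rng.standard_normal((n, p))
    diags[j] = np.diag(np.cov(X, rowvar=False))

# m_hat = 2 / C_hat^2, with C_hat the coefficient of variation of the
# r*p diagonal elements.
d = diags.ravel()
c_hat = d.std(ddof=1) / d.mean()
m_hat = 2.0 / c_hat**2
```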
For the observations in region E1, the distribution of MSDs based on μ̂ and c Σ̂ fits the χ²_p distribution with p degrees of freedom (Figure 1, (a,1), (b,1), (c,1)). However, in the absence of c, this is not necessarily the case when the sample size is small (Figure 1, (a,2), (b,2), (c,2)). The effect is even more prominent for S-estimators. Moreover, the distribution based on the extreme observations (region E3) deviates more significantly from χ²_p when the sample size is small.

We also generate the distributions of MSDs for the observations Y1, . . . , Yn based on μ̂ and c Σ̂ and observe that they fit the F distribution (see Figure 1, (a,1), (b,1), (c,1)). The distribution of MSDs of the extreme observations (in E3) fits the tail of the same F. Nevertheless, the degrees of freedom of the F found in our simulations with c are different from those of the F distributions observed in [16] without c (see Figure 1, (a,2), (b,2), (c,2)).
an example application to real data and compare our findings with previous results from the literature.
(1) MSDs based on maximum likelihood estimators, using a cut-off value of χ²_{p,1−α},
(2) MSDs based on MCD estimators, using a cut-off value of F_{p, m̂_MCD − p + 1, 1−α},
(3) MSDs based on MCD estimators, using a cut-off value of χ²_{p,1−α},
(4) MSDs based on S-estimators with the bi-weight function, using a cut-off value of F_{p, m̂_b − p + 1, 1−α},
(5) MSDs based on S-estimators with the bi-weight function, using a cut-off value of χ²_{p,1−α},
(6) MSDs based on S-estimators with the t-biweight function, using a cut-off value of F_{p, m̂_t − p + 1, 1−α},
(7) MSDs based on S-estimators with the t-biweight function, using a cut-off value of χ²_{p,1−α}.
These are compared in terms of their performance in detecting the number of outlying
points when the sample size is small.
As the number of variables increases, the presence of outliers in the data causes swamping and masking problems. These problems are investigated by generating data with a small sample size and multiple outliers. To achieve this, we construct a simulation study to generate data from multivariate normal distributions; the steps of the simulation are listed below:
(1) ν1 is a randomly selected integer from the uniform distribution U(1, ν), where ν denotes the number of outliers, and ν2 is determined so that ν = ν1 + ν2 is satisfied,
(2) ν1 outliers are then generated from N_p(μ1, I), and the remaining ν2 = ν − ν1 outliers are generated from N_p(μ2, I). Here the elements of μ1 and μ2 are randomly generated from U(−10, 10) such that 25 ≤ μ1′μ1 and 25 ≤ μ2′μ2,
(3) Finally, (n − ν) samples are generated from the standard multivariate normal distribution.
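The three data-generation steps above can be sketched as follows; drawing ν1 so that both clusters are non-empty is our reading of step (1):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, nu = 50, 5, 8  # nu: total number of outliers

# Step 1: split nu into two clusters (both non-empty).
nu1 = int(rng.integers(1, nu))
nu2 = nu - nu1

def shifted_mean():
    # Elements of mu from U(-10, 10), redrawn until 25 <= mu'mu.
    while True:
        mu = rng.uniform(-10.0, 10.0, size=p)
        if mu @ mu >= 25.0:
            return mu

# Step 2: the two outlier clusters; Step 3: the clean N_p(0, I) bulk.
outliers = np.vstack([rng.standard_normal((nu1, p)) + shifted_mean(),
                      rng.standard_normal((nu2, p)) + shifted_mean()])
clean = rng.standard_normal((n - nu, p))
data = np.vstack([clean, outliers])
```

The constraint μ′μ ≥ 25 keeps each outlier cluster at least 5 standard deviations from the bulk, so the contamination is genuinely outlying.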
These data are then used to test the performance of the seven methods presented earlier in detecting outliers. The subsets of methods used to detect the ν outliers are listed below (Figure 2):
(1) we applied methods 2 and 3, and the results are given in plots (a,1), (a,2) and (a,3),
respectively, for p/n > 0.20, p/n = 0.20, and p/n < 0.20,
(2) we applied methods 1, 4 and 5, and the results are given in plots (b,1), (b,2) and (b,3),
respectively, for p/n > 0.20, p/n = 0.20, and p/n < 0.20,
(3) we applied methods 1, 6 and 7, and the results are given in plots (c,1), (c,2) and (c,3),
respectively, for p/n > 0.20, p/n = 0.20, and p/n < 0.20.
Figure 2. For 5000 repetitions, the horizontal axis denotes the number of outliers ν, and the vertical axis shows the means of the detected outlier ratios (ν̂/n) determined using procedures 1–7. In all of the subplots, the black solid line is the line l with slope 1/n. The dash-dotted line in subplots (b1)–(c3) represents the results gathered from Method 1. In subplots (a1)–(a3), blue and red represent the results for Methods 2 and 3, respectively. In subplots (b1)–(b3), brown and green are used for Methods 4 and 5, respectively. Finally, in subplots (c1)–(c3), purple and cyan are used for Methods 6 and 7, respectively.
The horizontal axis in these plots denotes the number of outliers ν, and the vertical axis denotes the Monte-Carlo means of the detected outlier rates (ν̂/n) obtained by repeating the simulations r times for each method. The solid line (l) in the plots is the line with slope 1/n, so that each point on it equals ν/n at the true ν values. The performance of a given method is deemed 'successful' if the estimated ν̂/n fits l.
Table 4. Monte-Carlo estimates of the number of observations used in the calculations of the S-estimators (h).

          p       2        4        6        8       10       20
n = 50   h_b  34.3350  36.3600  37.0700  37.5300  37.6350  40.1300
         h_t  26.6800  26.9300  27.6200  28.2400  29.6000  35.2100
n = 100  h_b  70.7700  77.2200  81.9300  84.3600  85.1700  86.9500
         h_t  53.0300  52.5700  52.5300  53.8700  54.3500  59.8100
Please note that ν̂/n is a Monte-Carlo mean of the ratio of observations that the methods detect as outliers; whether an observation is correctly classified as an outlier is not taken into account.
The results gathered from the analysis are similar for various p values. Hence, to avoid repetition, we plot only p = 20 in Figure 2. We focus on three cases: p/n > 0.20, p/n = 0.20, and p/n < 0.20. This is because, in all of the repeated simulations with various p/n, we observe the most dramatic change in behavior for values below and above p/n = 0.20. For instance, one may conclude from plots (a,1) for p/n > 0.20 and (a,3) for p/n < 0.20 in Figure 2 that the results differ significantly, and that methods 2 and 3 come closer to line l in the latter case.

Furthermore, plots (b,i) and (c,i), i = 1, 2, 3, show that the BP of the maximum likelihood estimators (method 1) is very low. One may observe from the plots that as the number of outliers increases, the estimates move further away from line l.
For a general conclusion, if we consider p/n < 0.20 and compare all seven methods (see (a,3), (b,3), and (c,3) in Figure 2), it is clear that method 2 performs best in correctly determining the number of outliers. We also conclude that the behaviors of the pairs of methods (2, 3), (4, 5) and (6, 7) become similar to each other as the sample size increases. This convergence is rapid for methods 2 and 3 and slower for methods 6 and 7. These conclusions parallel the well-known fact that the distribution of extreme values of MSDs fits the chi-square distribution with p degrees of freedom as n increases [30].

When p/n > 0.20, it is evident from plots (a,1), (b,1) and (c,1) in Figure 2 that the results gathered from method 4 are very satisfactory, and we conclude that it performs best for p/n > 0.20, detecting 10% of the outliers in the data with great accuracy.
h_b and h_t denote the Monte-Carlo estimates of the number of observations used to calculate the S-estimators, where the subscripts b and t refer to the bi-weight and t-biweight functions, respectively. These estimates are given in Table 4. One may observe from the table that the h_b values are larger than the h_t values for every (n, p) pair. This shows that S-estimators with the bi-weight function are more efficient than S-estimators with the t-biweight function when e2 = √p / e1.
In what follows, we illustrate an example application for the proposed framework.
Figure 3. Dashed and solid lines are the cut-off values of the chi-square and F distributions, respectively. Blue circles are the MSDs based on bi-weight S-estimators improved by c_b, green diamonds are the MSDs based on S-estimators, and red crosses are the MSDs based on maximum likelihood estimators.
as Projection and Kosinsky. The Projection and Kosinsky methods detected (2, 3, 6, 10, 11,
12, 15, 18) and (2, 6, 9, 10, 11, 15, 17) as outliers, respectively (Figure 3).
In the second study, performed by Filzmoser [12], observations (1, 6, 9, 10, 11, 15, 18) were determined to be the outlying points. In fact, when the MSDs based on the classical maximum likelihood estimators are used, no outliers are detected in the data (see Figure 3 or [28]).
In this study, we apply two methods (with c) to the same data (Method 4 and Method 5; refer to Section 4.1) that use MSDs based on S-estimators with the bi-weight function. Methods 4 and 5 have the cut-off values F_{p, m̂_b − p + 1, 1−α} and χ²_{p,1−α}, respectively. As illustrated in Figure 3, Methods 4 and 5 detect (6, 9, 10, 11) and (1, 2, 6, 9, 10, 11, 13, 15, 18) as outliers, respectively. From Figure 3, we conclude that the methods (with c) based on χ²_{p,1−α} detect too many outliers, and F should be preferred. This validates the simulation studies in the preceding section, in which we show that when the small sample correction factor is incorporated, the MSDs of extreme observations fit an F distribution. However, this F differs significantly from the F obtained without c.
5. Conclusions
MSDs based on robust estimators perform well in outlier detection for large sample size.
However, when the sample size is small, robust estimators are biased. To address this prob-
lem, we introduce small sample correction factor for the distributions of MSDs based
on robust estimators. Two widely studied estimators are chosen as prototypes, MCD and
S-estimators, for testing the outlier detection performance with incorporated c, when the
sample size is small. We show details of formally incorporating c to the existing model.
First, we use simulated data to investigate the impact of incorporating c. The simulation results show that, when c is incorporated into the model, the distribution of MSDs for non-extreme observations is more likely to fit the chi-square distribution with p degrees of freedom, and the MSDs of the extreme observations fit an F distribution. Without c, however, the distributions deviate significantly from the chi-square and F distributions observed in the case with incorporated c.
Second, we introduce seven different methods to analyze their outlier detection performance. The methods use MSDs based on MCD and S-estimators with various cut-off values. The simulations reveal that the performance of the methods is highly affected by the ratio of the number of variables to the sample size. In our simulation set-up, the most distinct change in performance occurs below and above the 0.2 threshold. We conclude that below the threshold, MSDs based on MCD with an F cut-off value give the best performance, whereas above the threshold, MSDs based on S-estimators with the bi-weight function compared against the F distribution give more satisfactory results when the sample size is small.
Finally, these results are validated by applying some of these comparison methods to real data. We investigate the well-known Coleman data, which have 5 variables and 20 observations in total. We observe that the method with the χ²_{p,1−α} cut-off value detects too many outliers, and the F cut-off value gives more reasonable results.

This study focuses in particular on the effect of c on the MSDs based on MCD and S-estimators. However, there exist studies in the literature that use MSDs based on different estimators (e.g. [32]), and the comparison of these methods with the MSDs considered in this study is a subject for future work.
Disclosure statement
No potential conflict of interest was reported by the authors.
References
[1] V. Barnett and T. Lewis, Outliers in Statistical Data, John Wiley & Sons, Chichester, 1994.
[2] C. Becker and U. Gather, The largest nonidentifiable outlier: A comparison of multivariate
simultaneous outlier identification rules, Comput. Statist. Data Anal. 36 (2001), pp. 119–127.
[3] P. de Boer and V. Feltkamp, Robust multivariate outlier detection, Statistics Netherlands Project
number 80820 BPA number 324-00-RMS/INTERN, 2000.
[4] R.W. Butler, P.L. Davies, and M. Jhun, Asymptotics for the minimum covariance determinant
estimator, Ann. Statist. 21 (1993), pp. 1385–1400.
[5] N.A. Campbell, Robust procedures in multivariate analysis I: Robust covariance estimation, Appl.
Stat. 29 (1980), pp. 231–237.
[6] N.A. Campbell, H.P. Lopuhaä, and P.J. Rousseeuw, On the calculation of a robust S-estimator of a covariance matrix, Stat. Med. 17 (1998), pp. 2685–2695.
[7] A. Cerioli, Multivariate outlier detection with high-breakdown estimators, J. Amer. Statist. Assoc.
105 (2010), pp. 147–156.
[8] C. Croux and G. Haesbroeck, Influence function and efficiency of the minimum covariance
determinant scatter matrix estimator, J. Multivariate Anal. 71 (1999), pp. 161–190.
[9] P.L. Davies, Asymptotic behaviour of S-estimates of multivariate location parameters and dispersion matrices, Ann. Statist. 15 (1987), pp. 1269–1292.
[10] L. Davies, The asymptotics of S-estimators in the linear regression model, Ann. Statist. 18 (1990),
pp. 1651–1675.
[11] O.U. Ekiz and M. Ekiz, A small-sample correction factor for S-estimators, J. Stat. Comput. Simul.
85 (2013), pp. 794–801.
[12] P. Filzmoser, Identification of multivariate outliers: A performance study, Aust. J. Stat. 34 (2005),
pp. 127–138.
[13] A.S. Hadi, Identifying multiple outliers in multivariate data, J. R. Stat. Soc. B 54 (1992),
pp. 761–771.
[14] F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, and W.A. Stahel, Robust Statistics, The Approach
Based on Influence Functions, John Wiley & Sons, New York, 1986.
[15] J.S. Hardin, Multivariate outlier detection and robust clustering with minimum covariance
determinant estimation and S-estimation, Ph.D. thesis, University of California, 2000.
[16] J. Hardin and D.M. Rocke, The distribution of robust distances, J. Comput. Graph. Statist. 14
(2005), pp. 928–946.
[17] D.M. Hawkins, Identification of Outliers, Vol. 11, Chapman and Hall, London, 1980.
[18] D.E. Herwindiati, M.A. Djauhari, and M. Mashuri, Robust multivariate outlier labeling, Comm.
Statist. Simulation Comput. 36 (2007), pp. 1287–1294.
[19] H.P. Lopuhaä, On the relation between S-estimators and M-estimators of multivariate location and covariance, Ann. Statist. 17 (1989), pp. 1662–1683.
[20] H.P. Lopuhaä and P.J. Rousseeuw, Breakdown points of affine equivariant estimators of multivariate location and covariance matrices, Ann. Statist. 19 (1991), pp. 229–248.
[21] K.V. Mardia, J.T. Kent, and J.M. Bibby, Multivariate Analysis, Academic Press, New York, 1979.
[22] R.A. Maronna, R.D. Martin, and V.J. Yohai, Robust Statistics: Theory and Methods, John Wiley
& Sons, New York, 2006.
[23] K.I. Penny and I.T. Jolliffe, A comparison of multivariate outlier detection methods for clinical
laboratory safety data, The Statistician 50 (2001), pp. 295–307.
[24] G. Pison, S. Van Aelst, and G. Willems, Small sample corrections for LTS and MCD, Metrika 55
(2002), pp. 111–123.
[25] M. Riani, A.C. Atkinson, and A. Cerioli, Finding an unknown number of multivariate outliers,
J. R. Stat. Soc. B 71 (2009), pp. 447–466.
[26] D.M. Rocke, Robustness properties of S-estimators of multivariate location and shape in high
dimension, Ann. Statist. 24 (1996), pp. 1327–1345.
[27] P.J. Rousseeuw, Multivariate estimation with high breakdown point, in Mathematical Statistics
and Applications, W. Grossman, G. Pflug, I. Vinceze, and W. Wertz, eds., Reidel Publishing
Company, Dordrecht, 1985, pp. 283–297.
[28] P.J. Rousseeuw and A.M. Leroy, Robust Regression and Outlier Detection, Wiley, New York,
1987.
[29] P.J. Rousseeuw and B.C. van Zomeren, Unmasking multivariate outliers and leverage points
(with discussion), J. Amer. Statist. Assoc. 85 (1990), pp. 633–639.
[30] P.J. Rousseeuw and B.C. van Zomeren, Robust distances: Simulations and cutoff values, in Direc-
tions in Robust Statistics and Diagnostics, W. Stahel and S. Weisberg, eds., Springer, New York,
1991, pp. 195–203.
[31] R.J. Serfling, Approximation Theorems of Mathematical Statistics, Wiley, New York, 1980.
[32] R. Todeschini, D. Ballabio, V. Consonni, F. Sahigara, and P. Filzmoser, Locally centred Maha-
lanobis distance: A new distance measure with salient features towards outlier detection, Anal.
Chim. Acta 787 (2013), pp. 1–9.
[33] R.R. Wilcox, Introduction to Robust Estimation and Hypothesis Testing, 2nd ed., Elsevier
Academic Press, London, 2005.