The Inefficacy of Chauvenet’s Criterion for Elimination of Data Points

Braden J. Limb
Mechanical and Aerospace Engineering Department,
Utah State University,
Logan, UT 84322

Dalon G. Work
Mechanical and Aerospace Engineering Department,
Utah State University,
Logan, UT 84322

Joshua Hodson
Mechanical and Aerospace Engineering Department,
Utah State University,
Logan, UT 84322

Barton L. Smith
Mechanical and Aerospace Engineering Department,
Utah State University,
Logan, UT 84322
e-mail: Barton.Smith@usu.edu

Chauvenet’s criterion is commonly used for rejection of outliers from sample datasets in engineering and physical science research. Measurement and uncertainty textbooks provide conflicting information on how the criterion should be applied and generally do not refer to the original work. This study was undertaken to evaluate the efficacy of Chauvenet’s criterion for improving the estimate of the standard deviation of a sample, evaluate the various interpretations on how it is to be applied, and evaluate the impact of removing detected outliers. Monte Carlo simulations using normally distributed random numbers were performed with sample sizes of 5–100,000. The results show that discarding outliers based on Chauvenet’s criterion is more likely to have a negative effect on estimates of mean and standard deviation than to have a positive effect. At best, the probability of improving the estimates is around 50%, which only occurs for large sample sizes. [DOI: 10.1115/1.4035761]

Keywords: data rejection, measurements, outliers, statistics

Contributed by the Fluids Engineering Division of ASME for publication in the JOURNAL OF FLUIDS ENGINEERING. Manuscript received July 2, 2016; final manuscript received December 26, 2016; published online March 16, 2017. Assoc. Editor: Mark F. Tachie.

Journal of Fluids Engineering, Copyright © 2017 by ASME, MAY 2017, Vol. 139 / 054501-1

1 Introduction

In engineering and physical science research, experimental data are commonly acquired with the aim of estimating the parent population. It is a common occurrence, especially for small data sets, that some observations appear to be the result of a defect in the measuring system or the result of a very low-probability event. Some have suggested that these observations be removed from the dataset to improve the estimate of the standard deviation.

Of the outlier detection methods, Chauvenet’s criterion is perhaps most widely used throughout engineering and physical science [1–8]. This method was originally suggested by William Chauvenet in 1863 in an effort to simplify Benjamin Peirce’s method for outlier detection. However, the efficacy of this simplification and each test’s ability to detect outliers have yet to be demonstrated in the literature. The present study seeks to evaluate the effectiveness of Chauvenet’s criterion in outlier detection, the impact of removing such points, and whether Chauvenet’s criterion is used in accordance with Chauvenet’s original intentions.

A survey of some popular texts on measurement uncertainty [1–6] as well as an older source [7] shows that Chauvenet’s criterion is often discussed as a possible means of identifying outliers, but the language used in these texts is often ambiguous about whether outliers should be removed. Some of these sources seem to encourage the automatic use of the criterion [1,5,7], state that it “may” be used [4], or state that samples identified by the criterion as outliers can “…be considered for rejection” [2,6]. Some suggest the criterion as a means to detect a measurement that is somehow illegitimate, encouraging additional scrutiny of such measurements [2,3], while others [6,7] appear dubious about its use for various reasons, including its arbitrary threshold [6]. Most of the texts suggest that Chauvenet’s criterion applies to multiple outliers using the phrases (all the emphases are ours): “all measurements… can be considered for rejection” [2], “…which identifies outliers…” [3], “…reject both values” [6], “…discard both data points” [1], and “…the dubious points are eliminated” [4]. None of the texts states that Chauvenet’s criterion only applies to one outlier, as was Chauvenet’s original intention. Several of these texts note that use of the criterion is controversial [5–7]. One text suggests that each instance of use of the criterion be specifically logged [1]. The various recommendations and the ambiguity in some may be especially problematic in that they encourage practices leading to confirmation bias, such as eliminating an outlier when it improves the result in some way. We also note that none of the sources listed above provides any reference to Chauvenet’s paper or to any later interpretation of the paper as a rationale for their interpretation of the criterion.

While some of these authors may not have advocated use of the criterion, the vague language used in these texts has led to widespread use of Chauvenet’s criterion in the scientific literature. A small sample of recent papers finds examples in fluid dynamics [9,10], bioengineering [11], electrical engineering [12], chemical engineering [13], ocean engineering [14], fracture mechanics [15,16], environmental sciences [17], and pharmaceutical sciences [18]. In each of these papers, data points were eliminated from samples based only on Chauvenet’s criterion. This is in spite of the fact that the metrology community roundly rejects all eliminations of outliers based only on their probability.

The present study seeks to determine the efficacy of Chauvenet’s criterion by using Monte Carlo methods. Large samples will be drawn from a Gaussian distribution. The specific methods will be described, followed by results showing how often this criterion results in an improvement of the estimate of the parent population.

2 Methods

For this study, a normalized random number generator was used to generate datasets of various sizes from a population with known mean and standard deviation. Chauvenet’s criterion was used for outlier detection from these datasets. We identify instances when the sample mean and standard deviation more closely represent those of the population after the detected outliers were removed. Sections 2.1 and 2.2 outline the methods used for this study.

2.1 Chauvenet’s Criterion. Chauvenet’s criterion specifies a maximum expected error for a sample set, beyond which a data point should be considered for rejection. This maximum expected error is set by Chauvenet to be the location beyond which the probable number of data points within a set is less than 1/2. Table 1 gives Chauvenet’s criterion values, $s_C$, for several values of N. The Z-score of any suspected outliers is then computed with

$$Z_o = \frac{|x_o - \bar{x}|}{s_x} \tag{1}$$
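
The threshold and Z-score of Section 2.1 can be illustrated with a minimal Python sketch. It assumes that the "probable number of data points less than 1/2" rule corresponds, for a normal parent population, to a two-sided tail probability of 1/(2N), so that s_C is the standard normal quantile at 1 - 1/(4N); under that assumption the computed values agree with the s_C column of Table 1 to within about 0.01. It also assumes that the Z_max column of Table 1 is the largest Z-score a single point can attain in a sample of size N, namely (N - 1)/sqrt(N), which the text does not state explicitly. The function names are illustrative and are not taken from the study.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def chauvenet_threshold(n):
    """Threshold s_C for a sample of size n.

    Assumption: two-sided tail probability of 1/(2n) under a standard
    normal distribution, i.e. the quantile at 1 - 1/(4n).
    """
    return NormalDist().inv_cdf(1.0 - 1.0 / (4.0 * n))


def max_z_score(n):
    """Largest Z-score one point can reach in a sample of size n.

    Assumption: this is what the Z_max column of Table 1 reports.
    The sample standard deviation uses the n - 1 denominator.
    """
    return (n - 1) / sqrt(n)


def z_scores(sample):
    """Z-score of every point, following Eq. (1)."""
    xbar = mean(sample)
    s_x = stdev(sample)  # sample standard deviation (n - 1 denominator)
    return [abs(x - xbar) / s_x for x in sample]


def flag_outliers(sample):
    """True for each point whose Z-score exceeds s_C (single pass)."""
    s_c = chauvenet_threshold(len(sample))
    return [z > s_c for z in z_scores(sample)]


if __name__ == "__main__":
    for n in (2, 4, 6, 8, 10):
        print(n, round(chauvenet_threshold(n), 2), round(max_z_score(n), 3))
    data = [9.9, 10.1, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 12.0]
    print(flag_outliers(data))
```

Under these assumptions, max_z_score(4) = 1.5 falls below chauvenet_threshold(4) (about 1.53), while max_z_score(5) (about 1.79) exceeds chauvenet_threshold(5) (about 1.64). This is consistent with the criterion being unable to flag any point when N is four or smaller.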

In Eq. (1), $x_o$ is the position of the sample, and $\bar{x}$ and $s_x$ are the sample mean and standard deviation, respectively. The Z-score and Chauvenet’s criterion are compared, and if $Z_o > s_C$, then the datum $x_o$ is removed from the dataset. Once all the outliers are removed, the mean and standard deviation are recomputed. In accordance with popular texts on measurement uncertainty [1–5], Chauvenet’s criterion is not repeated and all the outliers are removed.

Table 1 Values of Chauvenet’s criterion

N     s_C     Z_max
2     1.15    0.707
4     1.54    1.500
6     1.73    2.041
8     1.87    2.475
10    1.96    2.846

For the present numerical experiments, we generate N random numbers from a normal distribution and then compute the mean and standard deviation of that sample. The outliers are found by computing $Z_i$ for each number in the set and comparing it to Chauvenet’s criterion for a sample of size N. If $Z_i > s_C$, then $x_i$ is discarded. Next, we compute a new mean and standard deviation with the revised sample. Finally, we check to see if the new mean and standard deviation are better approximations of the population mean and standard deviation than the original estimates of the mean and standard deviation. This process is repeated M times for each sample size N, and the results of each case are recorded.

The smallest sample size used in this study is five. While Chauvenet’s criterion may seem most important for small datasets where a single outlier can badly skew the estimate of the parent population, it can be shown that the criterion cannot identify an outlier for sample sets of size four or smaller.

2.2 Random Number Generator. The random number generator used is the 2007 SIMD Mersenne Twister-19937 algorithm [19–21], which was seeded using the computer time when the program was run. The normal distribution used is computed with Wichura’s algorithm [22], which is accurate to 16 decimal places.

3 Results

In total, 26 different values of N were tested, ranging from 5 to 100,000. M was set to $1 \times 10^6$, for a total of $26 \times 10^6$ runs. Several different statistics were collected for each N. These include the number of times removing outliers improved the estimates of both the mean and the standard deviation.

Figure 1 shows the percentage of cases where removing outliers improved the estimates of the mean and standard deviation. It is clear from this figure that when N is small, removing outliers as defined by the criterion will make the estimates of the mean and standard deviation worse in most cases. As N gets large, the probability that removing outliers will make the estimates better approaches 50%. While not shown here, rejection of samples based on Peirce’s criterion showed very similar results to those in Fig. 1 and demonstrated no real advantage. Furthermore, use of different probability values for the Chauvenet’s criterion threshold showed no improvement.

Fig. 1 The percentage of cases in which removing outliers improved the mean and standard deviation for Chauvenet’s criterion

4 Conclusion

Chauvenet’s criterion is commonly used for outlier detection and rejection in the physical science and engineering fields. The results show that discarding outliers based on Chauvenet’s criterion is more likely to have a negative effect on the estimates of mean and standard deviation than to have a positive effect. At best, the probability of improving the estimates is around 50%, which only occurs for large sample sizes. It was also found that all the measurement and uncertainty textbooks reviewed deviated from Chauvenet’s original intentions in the presentation of his criterion. Based on this information, it is not recommended that Chauvenet’s criterion be used for outlier rejection without other evidence to support rejection.

References

[1] Bevington, P. R., and Robinson, D. K., 2003, Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill, New York.
[2] Coleman, H. W., and Steele, W. G., 2009, Experimentation, Validation, and Uncertainty Analysis for Engineers, Wiley, Hoboken, NJ.
[3] Figliola, R. S., and Beasley, D. E., 2010, Theory and Design for Mechanical Measurements, 5th ed., Wiley, Hoboken, NJ.
[4] Holman, J. P., and Gajda, W. J., 1989, Experimental Methods for Engineers, McGraw-Hill Education, New York.
[5] Hughes, I., and Hase, T., 2010, Measurements and Their Uncertainties: A Practical Guide to Modern Error Analysis, OUP Oxford, New York.
[6] Taylor, J. R., 1997, An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements, University Science Books, South Orange, NJ.
[7] Young, H. D., 1996, Statistical Treatment of Experimental Data: An Introduction to Statistical Methods, Waveland Press, Long Grove, IL.
[8] Ross, S. M., 2003, “Peirce’s Criterion for the Elimination of Suspect Experimental Data,” J. Eng. Technol., 20(2), pp. 38–41.
[9] Kurup, A. L., Ölçmen, S. M., and Ahmed, A., 2015, “Experimental Study of Co-Annular Jet Subjected to Transverse Disturbances,” Exp. Therm. Fluid Sci., 66, pp. 53–62.
[10] Griffin, J., Schultz, T., Holman, R., Ukeiley, L. S., and Cattafesta, L. N., III, 2010, “Application of Multivariate Outlier Detection to Fluid Velocity Measurements,” Exp. Fluids, 49(1), pp. 305–317.
[11] Leahy, J., and Hukins, D., 2001, “Viscoelastic Properties of the Nucleus Pulposus of the Intervertebral Disk in Compression,” J. Mater. Sci. Mater. Med., 12(8), pp. 689–692.
[12] Serkan, K., 2007, “Error Data Analyzing of the Integration of the Sensors With Different Measuring Boundaries in Dynamic Environments,” IU-J. Electr. Electron. Eng., 7(2), pp. 439–442.
[13] Phongikaroon, S., Bezzant, R. W., and Simpson, M. F., 2013, “Measurements and Analysis of Oxygen Bubble Distributions in LiCl–KCl Molten Salt,” Chem. Eng. Res. Des., 91(3), pp. 418–425.
[14] Ünal, U. O., 2015, “Correlation of Frictional Drag and Roughness Length Scale for Transitionally and Fully Rough Turbulent Boundary Layers,” Ocean Eng., 107, pp. 283–298.
[15] Lopez-Crespo, P., Moreno, B., Lopez-Moreno, A., and Zapatero, J., 2015, “Study of Crack Orientation and Fatigue Life Prediction in Biaxial Fatigue With Critical Plane Models,” Eng. Fract. Mech., 136, pp. 115–130.
[16] Goktan, R., and Gunes, N., 2005, “A Comparative Study of Schmidt Hammer Testing Procedures With Reference to Rock Cutting Machine Performance Prediction,” Int. J. Rock Mech. Min. Sci., 42(3), pp. 466–472.
[17] Sleeswijk, A. W., van Oers, L. F., Guinee, J. B., Struijs, J., and Huijbregts, M. A., 2008, “Normalisation in Product Life Cycle Assessment: An LCA of the Global and European Economic Systems in the Year 2000,” Sci. Total Environ., 390(1), pp. 227–240.
[18] Obreshkova, D. P., Ivanov, K. V., Tsvetkova, D. D., and Pankova, S. A., 2012, “Quality Control of Aminoacids in Organic Foods and Food Supplements,” Int. J. Pharm. Pharm. Sci., 4(2), pp. 404–409.
[19] Saito, M., 2007, “An Application of Finite Field: Design and Implementation of 128-Bit Instruction-Based Fast Pseudorandom Number Generator,” Master’s thesis, Hiroshima University, Hiroshima, Japan.
[20] Saito, M., and Matsumoto, M., 2008, “SIMD-Oriented Fast Mersenne Twister: A 128-Bit Pseudorandom Number Generator,” Monte Carlo and Quasi-Monte Carlo Methods 2006, Springer, Berlin, pp. 607–622.
[21] Hiroshima University, Department of Mathematics, 2015, “SIMD-Oriented Fast Mersenne Twister (SFMT),” Hiroshima University, Higashihiroshima, Japan, accessed Jan. 5, 2016, http://www.math.sci.hiroshima-u.ac.jp/m-mat/MT/SFMT/
[22] Wichura, M. J., 1988, “Algorithm AS 241: The Percentage Points of the Normal Distribution,” Appl. Stat., 37(3), pp. 477–484.
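
The numerical experiment described in Section 2 (draw N normal variates, discard every point with a Z-score above the threshold in a single pass, then check whether both estimates moved closer to the population values) can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study: it uses Python's built-in Mersenne Twister and `random.gauss` in place of the SIMD-oriented generator and Wichura's algorithm, a fixed seed instead of the computer time, and the two-sided 1/(2N) tail probability as the threshold rule. It counts only trials in which at least one point was removed, which is one possible reading of how the percentage in Fig. 1 is formed. The names `run_trials` and `chauvenet_threshold` are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def chauvenet_threshold(n):
    """Threshold s_C, assuming a two-sided tail probability of 1/(2n)."""
    return NormalDist().inv_cdf(1.0 - 1.0 / (4.0 * n))


def run_trials(n, trials, mu=0.0, sigma=1.0, seed=0):
    """Return (trials_with_removal, trials_where_both_estimates_improved).

    Assumption: 'improved' means the new mean AND the new standard
    deviation are both strictly closer to the population values.
    """
    rng = random.Random(seed)  # Python's MT19937, not the SIMD variant
    s_c = chauvenet_threshold(n)
    removed_trials = 0
    improved_both = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar, s_x = mean(sample), stdev(sample)
        # Single pass: the criterion is applied once and not repeated.
        kept = [x for x in sample if abs(x - xbar) / s_x <= s_c]
        if len(kept) == n or len(kept) < 2:
            continue  # nothing removed, or too few points left
        removed_trials += 1
        new_mean, new_s = mean(kept), stdev(kept)
        mean_better = abs(new_mean - mu) < abs(xbar - mu)
        std_better = abs(new_s - sigma) < abs(s_x - sigma)
        if mean_better and std_better:
            improved_both += 1
    return removed_trials, improved_both


if __name__ == "__main__":
    for n in (5, 10, 100):
        removed, improved = run_trials(n, trials=20000, seed=1)
        share = 100.0 * improved / removed if removed else float("nan")
        print(f"N={n}: {removed} trials with removal, {share:.1f}% improved")
```

The trial count here is far smaller than the one million repetitions per sample size reported in Section 3, so the printed percentages are only rough. Calling `run_trials(4, ...)` reports zero removals under these assumptions, because no point in a sample of four can reach the threshold.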
