
CHANCE

ISSN: 0933-2480 (Print) 1867-2280 (Online) Journal homepage: https://www.tandfonline.com/loi/ucha20

Reconsidering the Human Face as Boxplot

David C. Hoaglin

To cite this article: David C. Hoaglin (2019) Reconsidering the Human Face as Boxplot, CHANCE,
32:3, 62-63, DOI: 10.1080/09332480.2019.1662703

To link to this article: https://doi.org/10.1080/09332480.2019.1662703

Published online: 06 Sep 2019.


Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=ucha20

LETTER TO THE EDITOR


I welcome Sarjinder Singh’s creativity (“Box Plot Versus Human Face,” CHANCE 32(2):28–29), but the boxplot does not align with a human face as closely as he suggests, and it does not classify observations as outliers.
In the standard boxplot (in his notation), the inner fences are:
$\mathrm{LIF} = Q_1 - 1.5(Q_3 - Q_1)$
$\mathrm{UIF} = Q_3 + 1.5(Q_3 - Q_1)$
and the outer fences are:
$\mathrm{LOF} = Q_1 - 3(Q_3 - Q_1)$
$\mathrm{UOF} = Q_3 + 3(Q_3 - Q_1)$
(For $Q_1$ and $Q_3$ the boxplot uses a particular definition, known as the “fourths.”)
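
As a rough illustration of the fence formulas, the sketch below computes the fourths and the four fences for a small batch. It assumes the depth rule from Tukey's Exploratory Data Analysis (depth of a fourth = (floor of the median's depth + 1) / 2, averaging two neighbors when the depth ends in .5). The function names and the example batch are hypothetical, not taken from the letter.

```python
import math


def fourths(values):
    """Return (lower fourth, upper fourth) of a batch.

    Assumes Tukey's depth rule: the fourths sit at depth
    (floor(depth of median) + 1) / 2, counted in from each end.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("batch must contain at least one value")
    median_depth = (n + 1) / 2
    depth = (math.floor(median_depth) + 1) / 2
    lo, hi = math.floor(depth), math.ceil(depth)  # 1-based depths
    lower = (xs[lo - 1] + xs[hi - 1]) / 2  # counted from the low end
    upper = (xs[n - lo] + xs[n - hi]) / 2  # counted from the high end
    return lower, upper


def fences(values):
    """Return the inner and outer fences, keyed by the letter's notation."""
    q1, q3 = fourths(values)
    spread = q3 - q1
    return {
        "LOF": q1 - 3 * spread,
        "LIF": q1 - 1.5 * spread,
        "UIF": q3 + 1.5 * spread,
        "UOF": q3 + 3 * spread,
    }


if __name__ == "__main__":
    batch = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up data for illustration
    print(fourths(batch))
    print(fences(batch))
```

Other quartile definitions (for example, those built into statistical packages) can give slightly different fences for the same batch, which is why the definition matters in small samples.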
Importantly, all the data values are real, but some
may merit investigation. As in Professor Singh’s
construction, the left and right whiskers end at the
most-extreme data values that are inside the LIF and
UIF, respectively. Data values outside the inner fences
are plotted individually, and those beyond the outer
fences receive larger plotting symbols.
Exploratory data analysis (EDA) does not automatically
classify data values as “outliers.” It leaves
that judgment to the analyst, after investigation. Data
values between an inner fence and the corresponding
outer fence are “outside,” and those beyond the outer
fences are “far out.” Thus, Singh’s “mild outliers”
are merely “outside” and his “extreme outliers” are
“far out.”
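
The labeling described above can be sketched in a few lines. This is a minimal illustration, assuming the fourths are already known and passed in as `q1` and `q3`; the function names are hypothetical, and treating a value that lies exactly on a fence as not beyond it is an assumption of this sketch. Note that the labels are "outside" and "far out," not "outlier": the code flags values for investigation and decides nothing.

```python
def label_values(values, q1, q3):
    """Label each value as 'inside', 'outside', or 'far out'.

    q1 and q3 are the fourths, supplied by the caller.
    A value exactly on a fence is treated as not beyond it (assumption).
    """
    spread = q3 - q1
    lif, uif = q1 - 1.5 * spread, q3 + 1.5 * spread  # inner fences
    lof, uof = q1 - 3 * spread, q3 + 3 * spread      # outer fences
    labels = []
    for x in values:
        if x < lof or x > uof:
            labels.append("far out")
        elif x < lif or x > uif:
            labels.append("outside")
        else:
            labels.append("inside")
    return labels


def whisker_ends(values, q1, q3):
    """Return the most extreme data values that are inside the inner fences."""
    spread = q3 - q1
    lif, uif = q1 - 1.5 * spread, q3 + 1.5 * spread
    inside = [x for x in values if lif <= x <= uif]
    return min(inside), max(inside)


if __name__ == "__main__":
    batch = [2, 4, 4, 5, 6, 7, 8, 9, 16, 30]  # made-up data for illustration
    for x, label in zip(batch, label_values(batch, q1=4, q3=8)):
        print(x, label)
    print(whisker_ends(batch, q1=4, q3=8))
```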
It is helpful to have an idea of how often samples of well-behaved (i.e., normal) data contain observations that are outside or far out. In this null situation, the percentage of random samples that contain one or more observations beyond the inner fences is 33% for n = 5, 20% for n = 10, 23% for n = 20, 36% for n = 50, and 53% for n = 100; and it approaches 100% as n becomes large. The irregularities in this sequence of percentages arise mainly from the definition of the fourths. For the outer fences, the corresponding percentages are 13.4%, 2.9%, 1.2%, 0.5%, and 0.2%. Thus, in well-behaved data, samples containing far out observations are fairly rare, but samples containing outside observations are fairly common.
Another characteristic of the boxplot is the percentage of observations that are outside (including far out), supplemented by the percentage of observations that are far out, again in well-behaved data. The percentage of observations beyond the inner fences is 8.6% for n = 5, 2.8% for n = 10, 1.7% for n = 20, 1.15% for n = 50, and 0.95% for n = 100, and 0.70% in a normal population. For the outer fences, the corresponding percentages are 3.3%, 0.36%, 0.074%, 0.011%, 0.002%, and 0.00023%. Thus, even normal distributions produce occasional far out observations.
In published articles, boxplots usually summarize part of a completed analysis. When they identify data values as outside, however, the analysis is generally at an early stage. Conceptually, an outlier comes from a different population or mechanism than the bulk of the data. In considering whether an outside data value may actually be an outlier, one can use a stem-and-leaf display or a dot plot to look more closely at the whole batch of data. An isolated outside or, especially, far out data value may have helpful background information.
A batch may consist of distinct clusters; a boxplot may hint at such structure, but it is not designed to reveal it. If the outside data values appear at only one end of the boxplot (often the upper end), a transformation may make the batch more nearly symmetric, suggesting that the presence of outside data values is a consequence of the initial scale of the data.

Further Reading

Emerson, J.D., and Strenio, J. 1983. Boxplots and batch comparison. In Understanding Robust and Exploratory Data Analysis. Hoaglin, D.C., Mosteller, F., and Tukey, J.W., eds. New York: John Wiley & Sons, 58–96.
Frigge, M., Hoaglin, D.C., and Iglewicz, B. 1989. Some implementations of the boxplot. The American Statistician 43:50–4.
Hoaglin, D.C., Iglewicz, B., and Tukey, J.W. 1986. Performance of some resistant rules for outlier labeling. Journal of the American Statistical Association 81:991–9.
Tukey, J.W. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley.

About the Author

David C. Hoaglin is an independent consultant based in Sudbury, Massachusetts.
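
The null-situation percentages quoted in the letter can be approximated with a short simulation. The sketch below is not the method behind the published figures. It is a Monte Carlo approximation under stated assumptions: standard normal samples, fourths computed by Tukey's depth rule, and hypothetical function names, trial counts, and seed. The population figure uses the fact that the normal quartiles lie at about ±0.6745 standard deviations, so a fence with multiplier k lies at (1 + 2k) times that distance from the center.

```python
import math
import random
from statistics import NormalDist


def fourths(sorted_xs):
    """Lower and upper fourths of an already-sorted batch (depth rule)."""
    n = len(sorted_xs)
    depth = (math.floor((n + 1) / 2) + 1) / 2
    lo, hi = math.floor(depth), math.ceil(depth)  # 1-based depths
    lower = (sorted_xs[lo - 1] + sorted_xs[hi - 1]) / 2
    upper = (sorted_xs[n - lo] + sorted_xs[n - hi]) / 2
    return lower, upper


def null_rates(n, k, trials=20000, seed=0):
    """Estimate two rates for normal samples of size n and fence multiplier k.

    Returns (share of samples with at least one value beyond the fences,
             share of all observations beyond the fences).
    """
    rng = random.Random(seed)
    samples_hit = 0
    values_hit = 0
    for _ in range(trials):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        q1, q3 = fourths(xs)
        low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
        beyond = sum(1 for x in xs if x < low or x > high)
        samples_hit += 1 if beyond > 0 else 0
        values_hit += beyond
    return samples_hit / trials, values_hit / (trials * n)


def population_rate(k):
    """Share of a normal population beyond fences with multiplier k."""
    z = NormalDist()
    q = z.inv_cdf(0.75)      # upper quartile of the standard normal
    fence = q * (1 + 2 * k)  # q + k * (2q)
    return 2 * z.cdf(-fence)


if __name__ == "__main__":
    for n in (5, 10, 20):
        per_sample, per_value = null_rates(n, k=1.5)
        print(f"n={n}: samples {per_sample:.1%}, observations {per_value:.2%}")
    print(f"population, inner fences: {population_rate(1.5):.2%}")
    print(f"population, outer fences: {population_rate(3.0):.5%}")
```

Simulated rates vary with the seed and the number of trials, so they should be read as approximations to the published values rather than replacements for them.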
