To cite this article: Jerry L. Hintze & Ray D. Nelson (1998) Violin Plots: A Box Plot-Density Trace Synergism, The American
Statistician, 52:2, 181-184
Statistical Computing and Graphics
Violin Plots: A Box Plot-Density Trace Synergism
Jerry L. HINTZE and Ray D. NELSON
density shape into a single plot provides a useful tool for data analysis and exploration.

KEY WORDS: Density estimation; Exploratory data analysis; Graphical techniques.

1. INTRODUCTION

Many different statistics and graphs summarize the characteristics of single batches of data. Descriptive statistics give information about location, scale, symmetry, and tail thickness. Other statistics and graphs investigate extreme observations or study the distribution of data values. Diagrams such as stem-leaf plots, dot plots, box plots, histograms, density traces, and probability plots give information about the distribution of values assumed by all observations.

The violin plot, introduced in this article, synergistically combines the box plot and the density trace (or smoothed histogram) into a single display that reveals structure found within the data. The introduction of this new graphical tool begins with a quick overview of the combination of the box plot and density trace into the violin plot. Then, three illustrations and examples show the advantages and challenges of violin plots in data summarization and exploration.

2. COMPONENT PARTS OF VIOLIN PLOTS

The violin plot, as depicted in Figure 1 and implemented in NCSS (1997) statistical software, combines the box plot and density trace into one diagram. The name violin plot originated because one of the first analyses that used the envisioned procedure resulted in a graphic with the appearance of a violin. Violin plots add information to the simple structure of the box plot that Tukey (1977) initially conceived. Although these original graphs are easily drawn with pencil and paper, computers ease subsequent modifications, refinements, and computation of box plots as discussed by McGill, Tukey, and Larsen (1978); Velleman and [...] in the United States. The labels in the diagram identify the principal lines and points which form the main structure of the traditional box plot diagram. As shown, the violin plot includes a box plot with two slight modifications. First, a circle replaces the median line, which facilitates quick comparisons when viewing multiple groups. Second, outside points, which are traditionally classified as mild and severe outliers, are not identified by individual symbols.

The density trace supplements traditional summary statistics by graphically showing the distributional characteristics of batches of data. One simple density estimator, the histogram, displays the distribution of data values along the real number line. Weaknesses of the histogram caused Tapia and Thompson (1979), Parzen (1979), Silverman (1986), Izenman (1991), and Scott (1992) to propose and summarize numerous alternative density estimators. One of these alternatives is the density trace described in Chambers, Cleveland, Kleiner, and Tukey (1983). Defining the location density d(x|h) at a point x as the fraction of the data values per unit of measurement that fall in an interval centered at x gives

$$ d(x \mid h) = \frac{\sum_{i=1}^{n} \delta_i}{nh} \tag{1} $$
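The location density of equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the NCSS implementation: it assumes that each term in the sum is an indicator equal to 1 when the i-th data value lies in the closed interval [x - h/2, x + h/2] and 0 otherwise, which matches the verbal definition (the fraction of data values per unit of measurement in an interval of width h centered at x). The function names `density_trace` and `trace_on_grid`, the evenly spaced grid, and the sample values are hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple


def density_trace(data: Sequence[float], x: float, h: float) -> float:
    """Location density d(x|h) at point x for interval width h."""
    if h <= 0:
        raise ValueError("interval width h must be positive")
    n = len(data)
    if n == 0:
        raise ValueError("data must be non-empty")
    lower, upper = x - h / 2.0, x + h / 2.0
    # Indicator sum: count values inside the interval centered at x.
    # Treating the interval as closed at both ends is an assumption.
    count = sum(1 for value in data if lower <= value <= upper)
    # Dividing by n gives a fraction; dividing by h makes it "per unit".
    return count / (n * h)


def trace_on_grid(
    data: Sequence[float], h: float, points: int = 5
) -> List[Tuple[float, float]]:
    """Evaluate d(x|h) on an evenly spaced grid spanning the data range."""
    if points < 2:
        raise ValueError("need at least two grid points")
    lo, hi = min(data), max(data)
    step = (hi - lo) / (points - 1)
    grid = [lo + k * step for k in range(points)]
    return [(x, density_trace(data, x, h)) for x in grid]


# Hypothetical batch of data, used only to exercise the functions.
sample = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 9.0]
# Interval width expressed as a percentage of the data range (here 15%).
h = 0.15 * (max(sample) - min(sample))
trace = trace_on_grid(sample, h)
```

Mirroring the resulting curve about a vertical axis and overlaying the box plot summary would give the outline of a violin plot. A finer grid than the five points used here would be needed for a smooth trace.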
© 1998 American Statistical Association The American Statistician, May 1998 Vol. 52, No.2 181
[Figure: panel (a) with groups Bimodal, Uniform, and Normal; vertical axis ticks at -30, -15, 0, 15, 30.]

Figure 3. Additional Information in Violin Plots. Two examples from the density estimation literature: (a) annual snowfall for Buffalo, NY, 1910-1972; (b) Old Faithful eruption length.

3. SPECIFICATION OF INTERVAL WIDTH

As with other density estimators, achieving an acceptable density trace requires experience and judgment in determining the appropriate amount of smoothing. As with the selection of the bin width in the histogram, the interval width h, which is usually specified as a percentage of the data range, must be selected. Experience suggests that values near 15% of the data range often give good results. The choice of h, however, must be tempered by the size of the sample. The density trace is subject to the same sample size restrictions and challenges that apply to any density estimator. For small data sets, too small a value for h gives a wiggly density trace that suggests features that are simply artifacts of the individual data points. The oversmoothed density estimate that results from too large h values gives the illusion of knowing the shape of the distribution, while in reality the data set is too small for any conclusions. As a rule of thumb based on practice, the density trace tends