
Boxplots (aka, Box and Whisker Plots)

A boxplot, sometimes called a box and whisker plot, is a type of graph used to
display patterns of quantitative data.

Boxplot Basics

A boxplot splits the data set into quartiles. The body of the boxplot consists of a
"box" (hence, the name), which goes from the first quartile (Q1) to the third
quartile (Q3).

Within the box, a vertical line is drawn at Q2, the median of the data set. Two
horizontal lines, called whiskers, extend from the front and back of the box. The
front whisker goes from Q1 to the smallest non-outlier in the data set, and the
back whisker goes from Q3 to the largest non-outlier.

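[Figure: horizontal boxplot on an axis running from -600 to 1600, with the
smallest non-outlier, Q1, Q2, Q3, and the largest non-outlier labeled. Two
outliers are plotted to the left of the first whisker and three to the right of
the second whisker.]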

If the data set includes one or more outliers, they are plotted separately as points
on the chart. In the boxplot above, two outliers precede the first whisker, and
three outliers follow the second whisker.
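The pieces of a boxplot described above can be computed directly from raw data. The sketch below is one possible way to do it, not the lesson's own method: the lesson does not say how outliers are identified, so the code assumes the common 1.5 x IQR convention, and both the function name boxplot_stats and the sample values are made up for illustration.

```python
import statistics


def boxplot_stats(values, k=1.5):
    """Return the quartiles, whisker ends, and outliers of a data set.

    Assumption: a point is an outlier when it lies more than k * IQR
    beyond the box (the lesson itself does not state an outlier rule).
    """
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": inside[0],    # smallest non-outlier
        "whisker_high": inside[-1],  # largest non-outlier
        "outliers": outliers,        # plotted separately as points
    }


# Hypothetical sample, invented for this example.
sample = [-700, 250, 300, 350, 400, 450, 500, 600, 650, 1700]
stats = boxplot_stats(sample)
print(stats)
```

Different software packages compute quartiles in slightly different ways, so the exact Q1 and Q3 values may differ a little from tool to tool.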

How to Interpret a Boxplot

Here is how to read a boxplot. The median is indicated by the vertical line that
runs down the center of the box. In the boxplot above, the median is about 400.

Additionally, boxplots display two common measures of the variability or spread in
a data set.

Range. If you are interested in the spread of all the data, it is represented
on a boxplot by the horizontal distance between the smallest value and the
largest value, including any outliers. In the boxplot above, data values
range from about -700 (the smallest outlier) to 1700 (the largest outlier),
so the range is about 2400. If you ignore outliers, the range is illustrated
by the distance between the opposite ends of the whiskers, which is about
1000 in the boxplot above.

Interquartile range (IQR). The middle half of a data set falls within the
interquartile range. In a boxplot, the interquartile range is represented by
the width of the box (Q3 minus Q1). In the chart above, the interquartile
range is about 600 minus 300, or 300.
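The arithmetic behind both measures can be written out in a few lines. This is only a sketch using the approximate values read off the boxplot in this lesson; the variable names are chosen for illustration.

```python
# Approximate values read off the boxplot in the lesson.
smallest_value = -700  # smallest outlier
largest_value = 1700   # largest outlier
q1 = 300               # left edge of the box
q3 = 600               # right edge of the box

full_range = largest_value - smallest_value  # spread of all data, outliers included
iqr = q3 - q1                                # width of the box

print(f"range = {full_range}")  # range = 2400
print(f"IQR = {iqr}")           # IQR = 300
```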

And finally, boxplots often provide information about the shape of a data set. The
examples below show some common patterns.

[Figure: three boxplots, each on an axis from 2 to 16, labeled from left to
right: Skewed right, Symmetric, Skewed left.]

Each of the above boxplots illustrates a different skewness pattern. If most of the
observations are concentrated on the low end of the scale, the distribution is
skewed right; if most are concentrated on the high end, it is skewed left. If a
distribution is symmetric, the observations will be evenly split at the median, as
shown above in the middle figure.
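One rough way to turn this reading into a rule is to compare how far the plot stretches on each side of the median. The sketch below is a heuristic under stated assumptions, not a formal test of skewness: the function name describe_skew, the 10 percent tolerance, and the sample readings are all hypothetical.

```python
def describe_skew(low, median, high, tolerance=0.1):
    """Guess the shape of a distribution from three boxplot readings.

    Heuristic assumption: if the stretch above the median is noticeably
    longer than the stretch below it, the long tail is on the right
    (skewed right), and vice versa.
    """
    lower = median - low   # distance from the low whisker end to the median
    upper = high - median  # distance from the median to the high whisker end
    span = high - low
    if span == 0:
        return "symmetric"
    diff = (upper - lower) / span
    if diff > tolerance:
        return "skewed right"
    if diff < -tolerance:
        return "skewed left"
    return "symmetric"


# Hypothetical readings on a 2-to-16 axis like the one in the examples.
print(describe_skew(2, 5, 16))   # skewed right
print(describe_skew(2, 9, 16))   # symmetric
print(describe_skew(2, 13, 16))  # skewed left
```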

Test Your Understanding of This Lesson

Problem 1

Consider the boxplot below.

[Figure: boxplot on an axis from 2 to 18.]

Which of the following statements are true?

I. The distribution is skewed right.
II. The interquartile range is about 8.
III. The median is about 10.
(A) I only
(B) II only
(C) III only
(D) I and III
(E) II and III

Solution

The correct answer is (B). Most of the observations are on the high end of the
scale, so the distribution is skewed left, not right. The interquartile range is
indicated by the length of the box, which is 18 minus 10, or 8. And the median is
indicated by the vertical line running through the middle of the box, which is
roughly centered over 15. So the median is about 15, not 10.
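The same reasoning can be checked with a short script. This is a sketch that uses the approximate readings quoted in the solution; the skewness check is a simple heuristic assumed for illustration, not a formal test.

```python
# Approximate readings taken from the solution text.
q1, median, q3 = 10, 15, 18

iqr = q3 - q1

# Heuristic assumption: a median nearer Q1 than Q3 suggests the data bunch
# up on the low end (skewed right). Here the median is nearer Q3 instead.
skewed_right = (median - q1) < (q3 - median)

statements = {
    "I": skewed_right,    # "The distribution is skewed right."
    "II": iqr == 8,       # "The interquartile range is about 8."
    "III": median == 10,  # "The median is about 10."
}
true_statements = [name for name, ok in statements.items() if ok]
print(true_statements)  # ['II']
```

Only statement II holds, which matches answer (B).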
