
Boxplots

• Another way to graph the distribution of a
numerical variable is through a boxplot (aka
box-and-whisker plot).
• A boxplot is a visual representation of the five-
number summary of the distribution of a
numerical variable. This consists of:
– The minimum value of the distribution
– The first quartile
– The median
– The third quartile
– The maximum value of the distribution
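As a rough check on the five numbers listed above, here is a minimal Python sketch. It assumes the quartiles are found as the medians of the lower and upper halves of the sorted data, and the function name five_number_summary is made up for illustration. The data are the AL runs-scored values given later in these notes.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]               # values below the median position
    upper = data[len(data) - half:]   # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

al_runs = [782, 845, 811, 805, 821, 691, 765, 829, 789,
           646, 671, 774, 901, 714]
summary = five_number_summary(al_runs)
print(summary)  # (646, 714, 785.5, 821, 901)
```

Other software may use a slightly different quartile rule, so Q1 and Q3 can differ a little from one tool to another.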
Steps to Make a Boxplot
1) Draw a central box (rectangle) from the first
quartile to the third quartile
2) Draw a vertical line inside the box to mark the median
3) Draw horizontal lines (called whiskers) that
extend from the box out to the smallest and
largest observations that are not outliers
4) If there are any outliers, mark them
separately
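The four steps above can be sketched in Python as a calculation of what gets drawn. The outlier rule is not restated on this slide, so the sketch assumes the common 1.5 × IQR convention. The names boxplot_parts and k are illustrative only. The data are the NL runs-scored values given later in these notes.

```python
from statistics import median

def boxplot_parts(values, k=1.5):
    """Return the pieces of a boxplot, assuming the 1.5 x IQR outlier rule."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])                 # step 1: left edge of the box
    q3 = median(data[len(data) - half:])     # step 1: right edge of the box
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "box": (q1, q3),
        "median": median(data),              # step 2: line inside the box
        "whiskers": (inside[0], inside[-1]), # step 3: ends of the whiskers
        "outliers": outliers,                # step 4: points marked separately
    }

nl_runs = [720, 753, 855, 704, 747, 770, 712, 700, 750,
           799, 799, 735, 637, 640, 779, 641]
parts = boxplot_parts(nl_runs)
print(parts)
```

For these data the outlier list comes back empty, so the whiskers run all the way to the minimum and maximum.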
• Let’s go back to our Chris Johnson example.
• Let’s reexamine his rushing attempts, along with
other key data.
Let’s now go back to our Tom Brady example.
Here were his passer ratings, along with other
key data we calculated.
…and the boxplot
Comparing Distributions
• When asked to compare two distributions, you
must address four points:
– The shape
– The outliers
– The center
– The spread
• Think of the acronym SOCS to help you remember
what to address.
• The shape of a distribution may be difficult to
determine from a boxplot.
• Try comparing the distance from the median
to the minimum and maximum values to
determine if a distribution is skewed or
roughly symmetric.
• You will not be able to tell if a distribution is
unimodal from looking at a boxplot.
• Here are boxplots for the number of runs
scored in the AL and in the NL during 2008.
(Note: the plots are on the same scale for
comparison purposes.)

• Let’s compare using our four points.


Shape
The AL distribution is skewed slightly left (the
left half of the distribution appears more spread
out).
The NL distribution is approximately symmetric.
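The shape comparison above can be checked numerically by measuring the distance from the median to each end of the distribution. This is only a sketch: the helper name tail_lengths is made up, and deciding how close the two distances must be to count as "roughly symmetric" is left to the reader's judgment. The data are the AL and NL runs-scored values given later in these notes.

```python
from statistics import median

def tail_lengths(values):
    """Return (median - min, max - median) for a list of numbers."""
    med = median(values)
    return (med - min(values), max(values) - med)

al_runs = [782, 845, 811, 805, 821, 691, 765, 829, 789,
           646, 671, 774, 901, 714]
nl_runs = [720, 753, 855, 704, 747, 770, 712, 700, 750,
           799, 799, 735, 637, 640, 779, 641]

al_left, al_right = tail_lengths(al_runs)
nl_left, nl_right = tail_lengths(nl_runs)
print("AL:", al_left, al_right)  # AL: 139.5 115.5
print("NL:", nl_left, nl_right)  # NL: 104.0 114.0
```

The AL's left side is noticeably longer than its right side, which fits "skewed slightly left." The two NL distances are close, which fits "approximately symmetric."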
Outliers
Neither distribution contains an outlier.
Center
Typically, teams score more runs in the AL
because the median of the AL distribution (785.5 runs) is
higher than the median of the NL distribution (741 runs).
Spread
• The AL distribution is slightly more spread out
because it has both a larger range and larger IQR.

• This indicates there is more variability among AL
teams and more consistency among NL teams.
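The center and spread comparison can be backed up with a short Python sketch. It assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data, and the name center_and_spread is illustrative only. The data are the AL and NL runs-scored values given later in these notes.

```python
from statistics import median

def center_and_spread(values):
    """Return the median, range, and IQR of a list of numbers."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    return {"median": median(data),
            "range": data[-1] - data[0],
            "iqr": q3 - q1}

al_runs = [782, 845, 811, 805, 821, 691, 765, 829, 789,
           646, 671, 774, 901, 714]
nl_runs = [720, 753, 855, 704, 747, 770, 712, 700, 750,
           799, 799, 735, 637, 640, 779, 641]

al = center_and_spread(al_runs)
nl = center_and_spread(nl_runs)
print("AL:", al)  # AL: {'median': 785.5, 'range': 255, 'iqr': 107}
print("NL:", nl)  # NL: {'median': 741.0, 'range': 218, 'iqr': 72.5}
```

Under this quartile rule the AL has the higher median and both the larger range and the larger IQR, matching the statements above.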
Using the TI-84 to Make Graphs and Calculate Summary Statistics
• As fun as it is to calculate everything by hand,
the TI-84 calculator can do many of our
calculations for us.
• The calculator can create boxplots and
histograms and calculate summary statistics.
Boxplot
• Let’s use our 2008 run data.
• Here are the numbers:
AL runs scored:
782 845 811 805 821 691 765 829 789
646 671 774 901 714
NL runs scored:
720 753 855 704 747 770 712 700 750
799 799 735 637 640 779 641

Write these numbers down or open to pg 120!


• The first thing we have to do is store the data
in lists.
• Press STAT and choose the first option, EDIT.
• Enter the 14 AL data values in L1 and the 16
NL values in L2.
Now we are going to set up the boxplot. Exit back to the
home screen (2nd, then MODE for QUIT).
Then press STAT PLOT (2nd, then Y=).
Choose Plot1. Then turn Plot1 on.
Scroll to Type and choose the boxplot icon (with outliers). It
is the first option in the second row.
Enter L1 for Xlist.
Enter 1 for Freq. Choose a mark for outliers.
Now we will display the graph. Press ZOOM.
Then select option 9: ZoomStat. Press ENTER.
Press TRACE and scroll around to see different
statistics for the distribution.
• To see the boxplot for the NL distribution at
the same time:
• Go back into STAT PLOT and turn on Plot2.
Repeat the steps, but enter L2 for Xlist. To do
this, scroll down to Xlist. Then press 2nd, then 2
(L2 is printed above the 2 key).
• Press ZOOM and select 9: ZoomStat again so
that both boxplots fit in the window.
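For anyone without a calculator handy, here is a rough Python sketch of the same idea: two boxplots drawn as lines of text on one shared scale. It is not what the TI-84 does internally. The names five_number_summary and text_boxplot are made up, the quartiles are assumed to be the medians of the lower and upper halves, and the whiskers are drawn out to the minimum and maximum because neither league has an outlier.

```python
from statistics import median

def five_number_summary(values):
    data = sorted(values)
    half = len(data) // 2
    return (data[0], median(data[:half]), median(data),
            median(data[len(data) - half:]), data[-1])

def text_boxplot(values, lo, hi, width=60):
    """Draw one boxplot as a line of text scaled to the range lo..hi."""
    mn, q1, med, q3, mx = five_number_summary(values)

    def col(x):
        # Map a data value to a character column on the shared axis.
        return round((x - lo) / (hi - lo) * (width - 1))

    row = [" "] * width
    for i in range(col(mn), col(mx) + 1):
        row[i] = "-"                  # whiskers (min to max, no outliers)
    for i in range(col(q1), col(q3) + 1):
        row[i] = "#"                  # box from Q1 to Q3
    row[col(med)] = "|"               # median line
    return "".join(row)

al_runs = [782, 845, 811, 805, 821, 691, 765, 829, 789,
           646, 671, 774, 901, 714]
nl_runs = [720, 753, 855, 704, 747, 770, 712, 700, 750,
           799, 799, 735, 637, 640, 779, 641]

# Use one scale for both leagues so the plots can be compared directly.
lo = min(al_runs + nl_runs)
hi = max(al_runs + nl_runs)
al_row = text_boxplot(al_runs, lo, hi)
nl_row = text_boxplot(nl_runs, lo, hi)
print("AL", al_row)
print("NL", nl_row)
```

Because both rows share the same scale, the AL median mark lands to the right of the NL median mark, just as on the calculator screen.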
