Chapter 2

Presenting Data in Tables and Charts
Learning Objectives
In this chapter you learn:
• To develop tables and charts for categorical data
• To develop tables and charts for numerical data
• The principles of properly presenting graphs
Organizing and Presenting Data Graphically
• Data in raw form are usually not easy to use for decision making
– Some type of organization is needed
• Table
• Graph
• Techniques reviewed here:
– Bar charts and pie charts
– Pareto diagram
– Ordered array
– Stem-and-leaf display
– Frequency distributions, histograms and polygons
– Cumulative distributions and ogives
– Contingency tables
– Scatter diagrams
Tables and Charts for Categorical Data
Categorical Data
• Tabulating Data: Summary Table
• Graphing Data: Bar Charts, Pie Charts, Pareto Diagram
The Summary Table
Summarize data by category

Example: Current Investment Portfolio

Investment Type   Amount (in thousands $)   Percentage (%)
Stocks            46.5                      42.27
Bonds             32.0                      29.09
CD                15.5                      14.09
Savings           16.0                      14.55
Total             110.0                     100.0

(Variables are categorical)
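The Percentage column is each amount divided by the total, times 100. As a minimal sketch, assuming the table is held in a plain Python dictionary (the names `portfolio` and `percentages` are illustrative, not from the original), the arithmetic looks like this:

```python
# Current Investment Portfolio, amounts in thousands of dollars (from the summary table)
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(portfolio.values())  # 110.0

# Percentage for each category = amount / total * 100, rounded to two decimals
percentages = {kind: round(amount / total * 100, 2) for kind, amount in portfolio.items()}

print(f"{'Type':<8} {'Amount':>7} {'%':>7}")
for kind, amount in portfolio.items():
    print(f"{kind:<8} {amount:>7.1f} {percentages[kind]:>7.2f}")
print(f"{'Total':<8} {total:>7.1f} {100.0:>7.2f}")
```

Rounding to two decimals reproduces the table's 42.27, 29.09, 14.09, and 14.55.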
Bar and Pie Charts
• Bar charts and pie charts are often used for qualitative data (categories or nominal scale)
• Height of bar or size of pie slice shows the frequency or percentage for each category
Bar Chart Example
Current Investment Portfolio (same data as the summary table)
[Figure: horizontal bar chart "Investor's Portfolio" with bars for Stocks, Bonds, CD, and Savings; horizontal axis: Amount in $1000's, 0 to 50]
Pie Chart Example
Current Investment Portfolio (same data as the summary table)
[Figure: pie chart of the portfolio; Stocks 42%, Bonds 29%, Savings 15%, CD 14%]
Percentages are rounded to the nearest percent.
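Each pie slice gets a share of the 360° circle equal to its share of the total. A short sketch under the same assumption as before (amounts in a hypothetical `portfolio` dictionary) computes the slice angles and the whole-percent labels shown on the chart:

```python
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(portfolio.values())

# Each slice's angle is its share of the total times the 360 degrees of the circle
angles = {kind: amount / total * 360 for kind, amount in portfolio.items()}

# Slice labels rounded to the nearest percent, as on the pie chart
labels = {kind: round(amount / total * 100) for kind, amount in portfolio.items()}

for kind in portfolio:
    print(f"{kind:<8} {angles[kind]:6.1f} degrees  {labels[kind]}%")
```

Here the rounded labels happen to add up to 100, but rounded percentages do not always sum to exactly 100.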
Pareto Diagram
• Used to portray categorical data (nominal scale)
• A bar chart, where categories are shown in descending order of frequency
• A cumulative polygon is often shown in the same graph
• Used to separate the “vital few” from the “trivial many”
Pareto Diagram Example
Current Investment Portfolio
[Figure: Pareto diagram. Bar graph (left axis, 0% to 45%): % invested in each category, in the order Stocks, Bonds, Savings, CD. Line graph (right axis, 0% to 100%): cumulative % invested.]
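The two series in a Pareto diagram come from sorting the categories by size and keeping a running total of their percentages. A minimal sketch, assuming the portfolio amounts are in a dictionary (names are illustrative):

```python
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(portfolio.values())

# Pareto order: categories sorted by amount, largest first
ordered = sorted(portfolio.items(), key=lambda item: item[1], reverse=True)

rows = []            # (category, % invested, cumulative % invested)
cumulative = 0.0
for kind, amount in ordered:
    pct = amount / total * 100
    cumulative += pct
    rows.append((kind, round(pct, 2), round(cumulative, 2)))

for kind, pct, cum in rows:
    print(f"{kind:<8} {pct:6.2f}%  {cum:6.2f}%")
```

The bars use the second column and the line uses the third, which climbs to 100%.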
Tables and Charts for Numerical Data
Numerical Data
• Ordered Array
• Stem-and-Leaf Display
• Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive
The Ordered Array
A sequence of data in rank order:
• Shows range (min to max)
• Provides some signals about variability within the range
• May help identify outliers (unusual observations)
• If the data set is large, the ordered array is less useful
The Ordered Array
(continued)
• Data in raw form (as collected):
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
• Data in ordered array from smallest to largest:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
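Building the ordered array is a single sort, and the range falls out of its first and last entries. A minimal Python sketch (variable names are illustrative):

```python
# Data in raw form (as collected)
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# Ordered array: the same values sorted from smallest to largest
ordered = sorted(raw)

minimum, maximum = ordered[0], ordered[-1]
print(ordered)
print(f"min = {minimum}, max = {maximum}, range = {maximum - minimum}")
```

`sorted` returns a new list, so the raw data keep their original collection order.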
Stem-and-Leaf Diagram
• A simple way to see distribution details in a data set
METHOD: Separate the sorted data series into leading digits (the stem) and the trailing digits (the leaves)
• Statistics: The branch of mathematics that
deals with collecting, organizing, and
analyzing or interpreting data.
• Data: Numerical facts or numerical
information.
• Stem-and-Leaf Plots: A convenient
method to display every piece of data by
showing the digits of each number.
• In a stem-and-leaf plot, the greatest common place value of the data is used to form the stems. The digits in the next greatest place-value position are then used to form the leaves.
INTERPRETATION
1. Center (typical observations)
2. Spread (variation, minimum and maximum
observations)
3. Shape (symmetry, number of peaks, skewness)
4. Other (unusual observations in the data set)
Leaf: The last digit on the right of the
number.
Stem: The digit or digits that remain when
the leaf is dropped.
Look at the number 284.
The leaf is the last digit: 4.
The stem is the digits that remain when the leaf is dropped: 28.
The stem with the leaf forms the number 284:
Stem | Leaf
28 | 4 = 284
Here are the scores from two periods of a math class. Students took the same test.
Period 1: 76 79 85 58 97 94 82 81 75 63 60 92 75 98 83 58 72 57 70 81
Period 2: 57 60 88 85 79 70 65 98 97 59 58 65 62 77 77 75 73 69 82 81
Period 1: 76 79 85 58 97 94 82 81 75 63 60 92 75 98 83 58 72 57 70 81
Notice that the data (numerical facts) are numbers between 57 and 98. Create the stem by listing the numbers 5 through 9.
Match up the data to the stem-and-leaf. The last digit of 76 (the leaf 6) is recorded next to the stem 7. Then the last digit of 79 is recorded next to the stem 7, and the last digit of 85 next to the stem 8. This pattern continues until all data have been recorded in the stem-and-leaf:
Stem | Leaf
5 | 8 8 7
6 | 3 0
7 | 6 9 5 5 2 0
8 | 5 2 1 3 1
9 | 7 4 2 8
Rearrange the leaves in numerical order from least to greatest:
Stem | Leaf
5 | 7 8 8
6 | 0 3
7 | 0 2 5 5 6 9
8 | 1 1 2 3 5
9 | 2 4 7 8
Key: 7 | 9 means 79
A key should be included when making a stem-and-leaf plot.
Period 2: 57 60 88 85 79 70 65 98 97 59 58 65 62 77 77 75 73 69 82 81
Notice that the data (numerical facts) are numbers between 57 and 98. Create the stem by listing the numbers 5 through 9.
Match up the data to the stem-and-leaf. The last digit of 57 (the leaf 7) is recorded next to the stem 5. Then the last digit of 60 is recorded next to the stem 6, and the last digit of 88 next to the stem 8. This pattern continues until all data have been recorded in the stem-and-leaf:
Stem | Leaf
5 | 7 9 8
6 | 0 5 5 2 9
7 | 9 0 7 7 5 3
8 | 8 5 2 1
9 | 8 7
Rearrange the leaves in numerical order from least to greatest:
Stem | Leaf
5 | 7 8 9
6 | 0 2 5 5 9
7 | 0 3 5 7 7 9
8 | 1 2 5 8
9 | 7 8
Key: 7 | 9 means 79
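The matching-up procedure above can be sketched in Python. This assumes non-negative integer data and one-digit leaves, so `divmod(n, 10)` gives the stem (all digits but the last) and the leaf (the last digit); the function names `stem_and_leaf` and `print_display` are hypothetical helpers, not from the original:

```python
from collections import defaultdict


def stem_and_leaf(data, sort_leaves=True):
    """Split each non-negative integer into stem (all but the last digit) and leaf (last digit).

    Expects a non-empty list. Returns a dict mapping every stem from the smallest to the
    largest to its list of leaves, either in data order or rearranged from least to greatest.
    """
    leaves_by_stem = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)  # e.g. 76 -> stem 7, leaf 6; 284 -> stem 28, leaf 4
        leaves_by_stem[stem].append(leaf)

    display = {}
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = leaves_by_stem.get(stem, [])  # a stem with no data still gets a row
        display[stem] = sorted(leaves) if sort_leaves else list(leaves)
    return display


def print_display(display, title):
    print(title)
    print("Stem | Leaf")
    for stem, leaves in display.items():
        print(f"{stem:>4} | {' '.join(str(leaf) for leaf in leaves)}")
    print("Key: 7 | 9 means 79")


period1 = [76, 79, 85, 58, 97, 94, 82, 81, 75, 63, 60, 92, 75, 98, 83, 58, 72, 57, 70, 81]
period2 = [57, 60, 88, 85, 79, 70, 65, 98, 97, 59, 58, 65, 62, 77, 77, 75, 73, 69, 82, 81]

print_display(stem_and_leaf(period1, sort_leaves=False), "Period 1 (data order)")
print_display(stem_and_leaf(period1), "Period 1 (leaves sorted)")
print_display(stem_and_leaf(period2), "Period 2 (leaves sorted)")
```

With `sort_leaves=False` the leaves appear in the order the scores were recorded; the default rearranges them from least to greatest.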
It is easy to interpret or analyze information from a stem-and-leaf plot.
Stem-and-Leaf: Age of United States Presidents at their First Inauguration (through the 40th Presidency)
Stem | Leaf (leaves rearranged in numerical order from least to greatest)
4 | 2 3 6 7 8 9 9
5 | 0 0 1 1 1 1 2 2 4 4 4 4 5 5 5 6 6 6 7 7 7 7 8
6 | 0 1 1 1 2 4 4 5 8 9
Key: 5 | 7 means 57
1. How many presidents were 51 years old at their inauguration?
2. What was the age of the youngest president at inauguration?
3. What was the age of the oldest president at inauguration?
4. How many presidents were 40-49 years old at their inauguration?
Answers: 1. 4   2. 42   3. 69   4. 7
Example
Data in ordered array:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
• Here, use the 10’s digit for the stem unit:
Stem | Leaf
• 21 is shown as 2 | 1
• 38 is shown as 3 | 8
• 41 is shown as 4 | 1
Example
(continued)
Data in ordered array:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
• Completed stem-and-leaf diagram:
Stem | Leaves
2 | 1 4 4 6 7 7
3 | 0 2 8
4 | 1
Tabulating Numerical Data: Frequency Distributions
What is a Frequency Distribution?
• A frequency distribution is a list or a table containing class groupings (ranges within which the data fall) and the corresponding frequencies with which data fall within each grouping or category
Why Use a Frequency Distribution?
• It is a way to summarize numerical data
• It condenses the raw data into a more useful form
• It allows for a quick visual interpretation of the data
Class Intervals and Class Boundaries
• Each class grouping has the same width
• Determine the width of each interval by
$$\text{Width of interval} = \frac{\text{range}}{\text{number of desired class groupings}}$$
• Usually at least 5 but no more than 15 groupings
• Class boundaries never overlap
• Round up the interval width to get desirable endpoints
Frequency Distribution Example
Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Frequency Distribution Example
(continued)
• Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5 and 15)
• Compute class interval (width): 46/5 = 9.2, rounded up to 10
• Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
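The steps above translate directly into a few lines of Python. This is a sketch: `math.ceil` does the rounding up, and starting the first boundary at the multiple of the width just below the minimum is an assumption about what "desirable endpoints" means (it reproduces 10 here):

```python
import math

# Daily high temperatures for the 20 sampled winter days
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

ordered = sorted(temps)
data_range = ordered[-1] - ordered[0]        # 58 - 12 = 46
num_classes = 5                              # usually between 5 and 15
width = math.ceil(data_range / num_classes)  # 46 / 5 = 9.2, rounded up to 10

# Assumption: start at a multiple of the width at or below the minimum, then step by the
# width until the last boundary lies above the maximum.
start = (ordered[0] // width) * width        # 10
boundaries = list(range(start, ordered[-1] + width + 1, width))

print("range:", data_range, "width:", width, "boundaries:", boundaries)
```

With other data, starting below the minimum can occasionally require one class more than `num_classes`.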
Frequency Distribution Example (continued)
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                 Frequency   Relative Frequency   Percentage
10 but less than 20   3           .15                  15
20 but less than 30   6           .30                  30
30 but less than 40   5           .25                  25
40 but less than 50   4           .20                  20
50 but less than 60   2           .10                  10
Total                 20          1.00                 100
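Counting how many observations fall in each "lower but less than upper" class produces the table above. A minimal sketch, assuming the boundaries 10 through 60 chosen earlier:

```python
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
boundaries = [10, 20, 30, 40, 50, 60]
n = len(temps)

rows = []  # (class label, frequency, relative frequency, percentage)
for lower, upper in zip(boundaries, boundaries[1:]):
    # "lower but less than upper": the lower boundary is included, the upper is not
    freq = sum(1 for t in temps if lower <= t < upper)
    rows.append((f"{lower} but less than {upper}", freq, freq / n, 100 * freq / n))

for label, freq, rel, pct in rows:
    print(f"{label:<20} {freq:>3} {rel:>5.2f} {pct:>5.0f}")
print(f"{'Total':<20} {n:>3} {1:>5.2f} {100:>5.0f}")
```

Because each class includes its lower boundary and excludes its upper one, no observation is counted twice.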
Tabulating Numerical Data: Cumulative Frequency
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                 Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20   3           15           3                      15
20 but less than 30   6           30           9                      45
30 but less than 40   5           25           14                     70
40 but less than 50   4           20           18                     90
50 but less than 60   2           10           20                     100
Total                 20          100
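The cumulative columns are running totals of the frequency column. A short sketch using `itertools.accumulate` (the frequencies are copied from the table; variable names are illustrative):

```python
from itertools import accumulate

labels = ["10 but less than 20", "20 but less than 30", "30 but less than 40",
          "40 but less than 50", "50 but less than 60"]
freqs = [3, 6, 5, 4, 2]      # frequencies from the frequency distribution
n = sum(freqs)               # 20

cum_freqs = list(accumulate(freqs))           # running totals of the frequencies
cum_pcts = [100 * c / n for c in cum_freqs]   # running totals as % of all observations

for label, freq, cf, cp in zip(labels, freqs, cum_freqs, cum_pcts):
    print(f"{label:<20} {freq:>3} {100 * freq / n:>5.0f} {cf:>5} {cp:>6.0f}")
```

The last cumulative frequency always equals the total count, so the last cumulative percentage is 100.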
Graphing Numerical Data: The Histogram
• A graph of the data in a frequency distribution is called a histogram
• The class boundaries (or class midpoints) are shown on the horizontal axis
• The vertical axis is either frequency, relative frequency, or percentage
• Bars of the appropriate heights are used
Histogram Example

Class                 Class Midpoint   Frequency
10 but less than 20   15               3
20 but less than 30   25               6
30 but less than 40   35               5
40 but less than 50   45               4
50 but less than 60   55               2

[Figure: "Histogram: Daily High Temperature"; bars with no gaps between them; horizontal axis: Class Midpoints, 5 to 65; vertical axis: Frequency, 0 to 7]
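Each class midpoint is the average of its two boundaries. As a stand-in for a plotting library, this sketch prints a sideways text histogram with one row per class and a bar length equal to the frequency (the `#` bars are purely illustrative):

```python
boundaries = [10, 20, 30, 40, 50, 60]
freqs = [3, 6, 5, 4, 2]

# Class midpoint = (lower boundary + upper boundary) / 2
midpoints = [(lower + upper) / 2 for lower, upper in zip(boundaries, boundaries[1:])]

# One row per class; the bar length equals the frequency
for mid, freq in zip(midpoints, freqs):
    print(f"{mid:>4.0f} | {'#' * freq} {freq}")
```

Rotated a quarter turn, the rows match the bars in the histogram above.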
Graphing Numerical Data: The Frequency Polygon

Class                 Class Midpoint   Frequency
10 but less than 20   15               3
20 but less than 30   25               6
30 but less than 40   35               5
40 but less than 50   45               4
50 but less than 60   55               2

[Figure: "Frequency Polygon: Daily High Temperature"; horizontal axis: Class Midpoints, 5 to 65; vertical axis: Frequency, 0 to 7]

(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class)
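A frequency polygon joins the points (class midpoint, frequency). This sketch assumes the common convention of adding an empty class at each end so the line starts and ends on the horizontal axis, which fits the 5 to 65 axis shown; it also builds the percentage version mentioned above:

```python
midpoints = [15, 25, 35, 45, 55]
freqs = [3, 6, 5, 4, 2]
width = 10

# Assumed convention: an empty class at each end so the polygon touches the horizontal axis
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + freqs + [0]
polygon = list(zip(xs, ys))

# Percentage polygon: same x values, frequencies expressed as % of all observations
n = sum(freqs)
percentage_polygon = [(x, 100 * y / n) for x, y in polygon]

print(polygon)
print(percentage_polygon)
```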
Graphing Cumulative Frequencies: The Ogive (Cumulative % Polygon)

Class                 Upper Class Boundary   Cumulative Percentage
Less than 10          10                     0
10 but less than 20   20                     15
20 but less than 30   30                     45
30 but less than 40   40                     70
40 but less than 50   50                     90
50 but less than 60   60                     100

[Figure: "Ogive: Daily High Temperature"; horizontal axis: Class Boundaries (Not Midpoints), 10 to 60; vertical axis: Cumulative Percentage, 0 to 100]
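Each ogive point pairs a class boundary with the percentage of observations that lie below it. A minimal sketch using the frequencies of the temperature example (names are illustrative):

```python
from itertools import accumulate

boundaries = [10, 20, 30, 40, 50, 60]
freqs = [3, 6, 5, 4, 2]
n = sum(freqs)

# 0% of the observations lie below the first boundary; each later boundary gets the
# running total of the frequencies, expressed as a percentage of all observations.
ogive = [(boundaries[0], 0.0)] + [
    (upper, 100 * c / n) for upper, c in zip(boundaries[1:], accumulate(freqs))
]
print(ogive)
```

Plotting at boundaries rather than midpoints is what makes each point read as "percentage of days below this temperature".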
Tabulating and Graphing Multivariate Categorical Data
• Contingency Table for Investment Choices ($1000’s)

Investment Category   Investor A   Investor B   Investor C   Total
Stocks                46.5         55           27.5         129
Bonds                 32.0         44           19.0         95
CD                    15.5         20           13.5         49
Savings               16.0         28           7.0          51
Total                 110.0        147          67.0         324

(Individual values could also be expressed as percentages of the overall total, percentages of the row totals, or percentages of the column totals)
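The totals and the "percentage of column total" view can be computed directly from the cells. A sketch, assuming the table is stored as a dictionary of rows with one entry per investor (the layout and names are illustrative):

```python
investors = ["Investor A", "Investor B", "Investor C"]
# Contingency table for investment choices, in $1000's
table = {
    "Stocks":  [46.5, 55, 27.5],
    "Bonds":   [32.0, 44, 19.0],
    "CD":      [15.5, 20, 13.5],
    "Savings": [16.0, 28, 7.0],
}

row_totals = {category: sum(values) for category, values in table.items()}
col_totals = [sum(values[j] for values in table.values()) for j in range(len(investors))]
grand_total = sum(row_totals.values())

# Each cell as a percentage of its column total (that investor's whole portfolio)
col_pcts = {
    category: [100 * v / col_totals[j] for j, v in enumerate(values)]
    for category, values in table.items()
}

print("Column totals:", dict(zip(investors, col_totals)), "Grand total:", grand_total)
for category, pcts in col_pcts.items():
    print(f"{category:<8}", "  ".join(f"{p:6.2f}%" for p in pcts))
```

Dividing by `row_totals[category]` or by `grand_total` instead gives the other two percentage views.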
Tabulating and Graphing Multivariate Categorical Data
(continued)
• Side-by-side bar charts
[Figure: side-by-side horizontal bar chart "Comparing Investors"; categories Savings, CD, Bonds, Stocks; one bar each for Investor A, Investor B, and Investor C; horizontal axis 0 to 60]
Side-by-Side Chart Example
• Sales by quarter for three sales territories:

        1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East    20.4      27.4      59        20.4
West    30.6      38.6      34.6      31.6
North   45.9      46.9      45        43.9

[Figure: side-by-side vertical bar chart; quarters (1st Qtr to 4th Qtr) on the horizontal axis; one bar each for East, West, and North; vertical axis 0 to 60]
Scatter Diagrams
• Scatter diagrams are used to examine possible relationships between two numerical variables
• The scatter diagram:
– one variable is measured on the vertical axis and the other variable is measured on the horizontal axis
Scatter Diagram Example

Volume per Day   Cost per Day
23               131
24               120
26               140
29               151
33               160
38               167
41               185
42               170
50               188
55               195
60               200

[Figure: scatter diagram "Cost per Day vs. Production Volume"; horizontal axis: Volume per Day, 0 to 70; vertical axis: Cost per Day, 0 to 250]
Time Series Plot
• A time series plot is used to study patterns in the values of a variable over time
• The time series plot:
– one variable is measured on the vertical axis and the time period is measured on the horizontal axis
Time Series Plot Example

Year   Number of Franchises
1996   43
1997   54
1998   60
1999   73
2000   82
2001   95
2002   107
2003   99
2004   95

[Figure: time series plot "Number of Franchises, 1996-2004"; horizontal axis: Year, 1994 to 2006; vertical axis: Number of Franchises, 0 to 120]
Misusing Graphs and Ethical Issues
Guidelines for good graphs:
• Do not distort the data
• Use a scale for each axis on a two-dimensional graph
• The vertical axis scale should begin at zero
• Properly label all axes
• The graph should contain a title
• Use the simplest graph for a given set of data
Box plots
• A box plot (or boxplot) is a convenient way of graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram. Outliers may be plotted as individual points. Box plots are non-parametric: they display variation in samples of a statistical population without making any assumptions about the underlying statistical distribution. The spacings between the different parts of the box indicate the degree of dispersion (spread) and skewness in the data, and show outliers. In addition to the points themselves, they allow one to visually estimate various L-estimators, notably the interquartile range, midhinge, range, mid-range, and trimean. Boxplots can be drawn either horizontally or vertically.
Step 1 – Order Numbers
1. Order the set of numbers from least to greatest.
Step 2 – Find the Median
2. Find the median. The median is the middle number. If the data set has two middle numbers, find the mean of the two numbers. What is the median?
Step 3 – Upper & Lower Quartiles
3. Find the lower and upper medians, or quartiles. These are the middle numbers on each side of the median. What are the lower and upper quartiles?
Step 4 – Draw a Number Line
Now you are ready to construct the actual box-and-whisker graph. First draw an ordinary number line that extends far enough in both directions to include all the numbers in your data.
Step 5 – Draw the Parts
• Locate the main median, 12, using a vertical line just above your number line.
• Locate the lower median, 8.5, and the upper median, 14, with similar vertical lines.
• Next, draw a box using the lower and upper median lines as endpoints.
• Finally, the whiskers extend out to the data's smallest number, 5, and largest number, 20.
Step 6 – Label the Parts of a Box-and-Whisker Plot
Name the parts of a box-and-whisker plot: lower extreme, lower quartile, median, upper quartile, upper extreme.
Interquartile Range
The interquartile range is the difference between the upper quartile and the lower quartile. In the example above: 14 – 8.5 = 5.5
• Find the outliers, if any, for the following data set:
• 10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6, 14.7, 14.7, 14.7, 14.9, 15.1, 15.9, 16.4
• To find out if there are any outliers, I first have to find the IQR. There are fifteen data points, so the median is at position (15 + 1) ÷ 2 = 8, giving Q2 = 14.6. There are seven data points on either side of the median, so Q1 is the middle of the lower seven (the fourth value in the list) and Q3 is the middle of the upper seven (the twelfth value): Q1 = 14.4 and Q3 = 14.9. Then IQR = 14.9 – 14.4 = 0.5.
• Outliers are any points below Q1 – 1.5×IQR = 14.4 – 0.75 = 13.65 or above Q3 + 1.5×IQR = 14.9 + 0.75 = 15.65.
• Therefore the outliers are 10.2, 15.9, and 16.4.
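The same calculation can be sketched in Python. The quartiles here use the method from the steps above (the median of the lower half and of the upper half, leaving the overall median out of both halves when n is odd); statistical packages often interpolate instead and can give slightly different quartiles. The helper names `median` and `quartiles` are illustrative:

```python
def median(sorted_values):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Q1, Q2, Q3 as medians of the lower half, whole list, and upper half (needs >= 2 values)."""
    xs = sorted(data)
    n = len(xs)
    lower_half = xs[: n // 2]          # values before the median position
    upper_half = xs[(n + 1) // 2 :]    # values after the median position
    return median(lower_half), median(xs), median(upper_half)


data = [10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6,
        14.7, 14.7, 14.7, 14.9, 15.1, 15.9, 16.4]

q1, q2, q3 = quartiles(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr    # about 13.65
upper_fence = q3 + 1.5 * iqr    # about 15.65
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr:.2f}")
print(f"fences: {lower_fence:.2f} and {upper_fence:.2f}")
print("outliers:", outliers)
```

Floating-point subtraction makes the IQR come out as approximately 0.5 rather than exactly 0.5, which does not affect which points fall outside the fences here.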
Diagrammatic presentation

Chapter Summary
• Data in raw form are usually not easy to use for decision making; some type of organization is needed:
– Table
– Graph
• Techniques reviewed in this chapter:
– Bar charts, pie charts, and Pareto diagrams
– Ordered array and stem-and-leaf display
– Frequency distributions, histograms and polygons
– Cumulative distributions and ogives
