The Summary Table
Summarize data by category
Bar Chart Example
Current Investment Portfolio
Table columns: Investment Type, Amount (in thousands $), Percentage (%)
[Figure: bar chart by investment type; horizontal axis: Amount in $1000's (0 to 50)]
Pie Chart Example
Current Investment Portfolio
Table columns: Investment Type, Amount (in thousands $), Percentage (%)
[Figure: pie chart by investment type; labeled slice: Bonds 29%]
Percentages are rounded to the nearest percent.
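The percentage column of a summary table can be computed directly from the category amounts. The sketch below assumes the portfolio amounts are the row totals of the contingency table given later in this document (Stocks 129, Bonds 95, CD 49, Savings 51, in thousands of dollars). That pairing is an assumption, although it does reproduce the 29% shown for Bonds. The function name is illustrative only.

```python
def summary_table(amounts):
    """Return {category: (amount, percentage rounded to the nearest percent)}."""
    total = sum(amounts.values())
    return {cat: (amt, round(100 * amt / total)) for cat, amt in amounts.items()}


# Amounts in thousands of dollars (assumed: contingency-table row totals).
portfolio = {"Stocks": 129, "Bonds": 95, "CD": 49, "Savings": 51}

for category, (amount, pct) in summary_table(portfolio).items():
    print(f"{category:<8}{amount:>6}{pct:>5}%")
```

Rounding each share independently does not guarantee that the percentages sum to exactly 100 in general; for these amounts they happen to.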
Pareto Diagram
• Used to portray categorical data (nominal
scale)
• A bar chart, where categories are shown
in descending order of frequency
• A cumulative polygon is often shown in the
same graph
• Used to separate the “vital few” from the
“trivial many”
Pareto Diagram Example
Current Investment Portfolio
[Figure: Pareto diagram. Bar graph (left axis, 0% to 45%): % invested in each category. Line graph (right axis, 0% to 100%): cumulative % invested. Categories in order: Stocks, Bonds, Savings, CD]
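The numbers behind a Pareto diagram are the category shares sorted in descending order, plus their running total. A minimal sketch follows, again assuming the portfolio amounts equal the row totals of the contingency table given later in this document; `pareto_table` is a hypothetical helper name.

```python
def pareto_table(amounts):
    """Sort categories by descending share and accumulate the percentages."""
    total = sum(amounts.values())
    rows, cumulative = [], 0.0
    ordered = sorted(amounts.items(), key=lambda item: item[1], reverse=True)
    for category, amount in ordered:
        pct = 100 * amount / total
        cumulative += pct
        rows.append((category, round(pct, 1), round(cumulative, 1)))
    return rows


# Amounts in thousands of dollars (assumed: contingency-table row totals).
portfolio = {"Stocks": 129, "Bonds": 95, "CD": 49, "Savings": 51}

for category, pct, cum in pareto_table(portfolio):
    print(f"{category:<8}{pct:>6.1f}%{cum:>7.1f}%")
```

The bar heights come from the second column and the cumulative line from the third, so the "vital few" are the leading categories that account for most of the cumulative percentage.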
Tables and Charts for
Numerical Data
Numerical Data
– Ordered Array
– Stem-and-Leaf Display
– Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive
The Ordered Array
• Data in raw form (as collected):
69
7
• 38 is shown as 3 | 8
• 41 is shown as 4 | 1
Example
(continued)
Data in ordered array:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
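A stem-and-leaf display can be built from this ordered array by splitting each value into its tens digit (the stem) and ones digit (the leaf), so that 38 has stem 3 and leaf 8. The following is a small sketch that assumes non-negative two-digit integers; the function name is illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group values by tens digit (stem); the ones digits are the leaves."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return dict(stems)


data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```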
Class Intervals
and Class Boundaries
• Each class grouping has the same width
• Determine the width of each interval by
  Width of interval = range / (number of desired class groupings)
Frequency Distribution Example
(continued)
• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5
and 15)
• Compute class interval (width): 10 (46/5 then round
up)
Class                  Frequency   Relative Frequency   Percentage
10 but less than 20        3              .15               15
20 but less than 30        6              .30               30
30 but less than 40        5              .25               25
40 but less than 50        4              .20               20
50 but less than 60        2              .10               10
Total                     20             1.00              100
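The steps of this example (find the range, pick the number of classes, round the width up, count values per class) can be sketched as follows. The data are the 20 values listed in the ordered array for this example. The starting boundary of 10 and rounding the width up to a multiple of 10 are assumptions chosen to match the classes in the table, and the function names are illustrative.

```python
import math

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]


def class_width(values, n_classes, round_to=10):
    """Range / number of classes, rounded up to a multiple of round_to."""
    value_range = max(values) - min(values)
    return math.ceil(value_range / n_classes / round_to) * round_to


def frequency_distribution(values, start, width, n_classes):
    """Count the values in each class 'lower but less than upper'."""
    rows = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        freq = sum(lower <= v < upper for v in values)
        rows.append((lower, upper, freq, freq / len(values)))
    return rows


width = class_width(data, 5)  # 46 / 5 = 9.2, rounded up to 10
for lower, upper, freq, rel in frequency_distribution(data, 10, width, 5):
    print(f"{lower} but less than {upper}: {freq:>2}  {rel:.2f}  {rel:.0%}")
```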
Tabulating Numerical Data:
Cumulative Frequency
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
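Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20        3           15                3                      15
20 but less than 30        6           30                9                      45
30 but less than 40        5           25               14                      70
40 but less than 50        4           20               18                      90
50 but less than 60        2           10               20                     100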
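Cumulative frequencies are running totals of the class frequencies, and cumulative percentages divide those totals by the number of observations. A brief sketch using the class frequencies from the frequency distribution example (3, 6, 5, 4, 2) is given here; the function name is illustrative.

```python
from itertools import accumulate

classes = ["10 but less than 20", "20 but less than 30", "30 but less than 40",
           "40 but less than 50", "50 but less than 60"]
frequencies = [3, 6, 5, 4, 2]


def cumulative_distribution(freqs):
    """Return (cumulative frequency, cumulative percentage) for each class."""
    total = sum(freqs)
    return [(running, 100 * running / total) for running in accumulate(freqs)]


for label, (cum_freq, cum_pct) in zip(classes, cumulative_distribution(frequencies)):
    print(f"{label}: {cum_freq:>2}  {cum_pct:.0f}%")
```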
Class                  Class Midpoint   Frequency
10 but less than 20         15              3
20 but less than 30         25              6
30 but less than 40         35              5
40 but less than 50         45              4
50 but less than 60         55              2

[Figure: Histogram: Daily High Temperature. Horizontal axis: Class Midpoints (5 to 65); vertical axis: Frequency (0 to 7). No gaps between bars.]
Graphing Numerical Data:
The Frequency Polygon
Class                  Class Midpoint   Frequency
10 but less than 20         15              3
20 but less than 30         25              6
30 but less than 40         35              5
40 but less than 50         45              4
50 but less than 60         55              2

[Figure: Frequency Polygon: Daily High Temperature. Horizontal axis: Class Midpoints (5 to 65); vertical axis: Frequency (0 to 7).]
(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class)
Graphing Cumulative
Frequencies:
The Ogive (Cumulative % Polygon)
Class                  Lower class boundary   Cumulative Percentage
Less than 10                    0                       0
10 but less than 20            10                      15
20 but less than 30            20                      45
30 but less than 40            30                      70
40 but less than 50            40                      90
50 but less than 60            50                     100

[Figure: Ogive: Daily High Temperature. Horizontal axis: Class Boundaries (Not Midpoints), 10 to 60; vertical axis: Cumulative Percentage (0 to 100).]
Tabulating and Graphing
Multivariate Categorical Data
• Contingency Table for Investment Choices ($1000’s)

Investment Category   Investor A   Investor B   Investor C   Total
Stocks                    46.5         55          27.5       129
Bonds                     32.0         44          19.0        95
CD                        15.5         20          13.5        49
Savings                   16.0         28           7.0        51
Total                    110.0        147          67.0       324

[Figure: horizontal bar chart by investment category (Stocks, Bonds, CD, Savings); horizontal axis 0 to 60]
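The Total row and Total column of a contingency table are just sums across each dimension. The sketch below recomputes them from the individual cells of the investment table; the nested-dictionary layout and the function names are illustrative choices, not part of the original material.

```python
# Amounts in thousands of dollars, one inner dict per investment category.
table = {
    "Stocks":  {"Investor A": 46.5, "Investor B": 55, "Investor C": 27.5},
    "Bonds":   {"Investor A": 32.0, "Investor B": 44, "Investor C": 19.0},
    "CD":      {"Investor A": 15.5, "Investor B": 20, "Investor C": 13.5},
    "Savings": {"Investor A": 16.0, "Investor B": 28, "Investor C": 7.0},
}


def row_totals(cells):
    """Total invested in each category, summed across investors."""
    return {category: sum(row.values()) for category, row in cells.items()}


def column_totals(cells):
    """Total invested by each investor, summed across categories."""
    totals = {}
    for row in cells.values():
        for investor, amount in row.items():
            totals[investor] = totals.get(investor, 0) + amount
    return totals


print(row_totals(table))
print(column_totals(table))
print(sum(row_totals(table).values()))  # grand total
```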
Side-by-Side Chart Example
• Sales by quarter for three sales territories:

         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East       20.4      27.4      59        20.4
West       30.6      38.6      34.6      31.6
North      45.9      46.9      45        43.9

[Figure: side-by-side bar chart of sales by quarter (1st Qtr to 4th Qtr) for East, West, and North; vertical axis 0 to 60]
Scatter Diagrams
Volume per day   Cost per day
      23             131
      24             120
      26             140
      29             151
      33             160
      38             167
      41             185
      42             170
      50             188
      55             195
      60             200

[Figure: scatter diagram, Cost per Day vs. Production Volume. Horizontal axis: Volume per Day (0 to 70); vertical axis: Cost per Day (0 to 250).]
Time Series Plot
Year   Number of Franchises
1996        43
1997        54
1998        60
1999        73
2000        82
2001        95
2002       107
2003        99
2004        95

[Figure: time series plot, Number of Franchises, 1996-2004. Horizontal axis: Year (1994 to 2006); vertical axis: Number of Franchises (0 to 120).]
Misusing Graphs and Ethical
Issues
Guidelines for good graphs:
• Do not distort the data
• Use a scale for each axis on a two-dimensional graph
• The vertical axis scale should begin at zero
• Properly label all axes
• The graph should contain a title
• Use the simplest graph for a given set of data
Box plots
• A box plot (or boxplot) is a convenient way of graphically depicting
groups of numerical data through their quartiles. Box plots may also
have lines extending vertically from the boxes (whiskers) indicating
variability outside the upper and lower quartiles, hence the
terms box-and-whisker plot and box-and-whisker
diagram. Outliers may be plotted as individual points. Box plots
are non-parametric: they display variation in samples of a statistical
population without making any assumptions about the
underlying statistical distribution. The spacings between the different
parts of the box indicate the degree of dispersion (spread)
and skewness in the data, and show outliers. In addition to the points
themselves, they allow one to visually estimate various L-estimators,
notably the interquartile range, midhinge, range, mid-range,
and trimean. Box plots can be drawn either horizontally or vertically.
Step 1 – Order Numbers
4 5
14 – 8.5 = 5.5
• Find the outliers, if any, for the following data set:
• 10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6, 14.7,
14.7, 14.7, 14.9, 15.1, 15.9, 16.4
• To find out if there are any outliers, I first have to find the IQR. There
are fifteen data points, so the median will be at position (15 + 1) ÷ 2
= 8. Then Q2 = 14.6. There are seven data points on either side of
the median, so Q1 is the fourth value in the list and Q3 is the
twelfth: Q1 = 14.4 and Q3 = 14.9. Then IQR = 14.9 – 14.4 = 0.5.
• Outliers will be any points below Q1 – 1.5×IQR = 14.4 – 0.75 =
13.65 or above Q3 + 1.5×IQR = 14.9 + 0.75 = 15.65.
• The values 10.2, 15.9, and 16.4 fall outside these limits, so they are the outliers.
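The outlier check worked through above can be sketched in code. The quartile rule here follows the text: with an odd number of points the median is left out of both halves, and Q1 and Q3 are the medians of those halves. Other quartile conventions exist and can give slightly different fences, so treat this as one possible convention; the function names are illustrative.

```python
def median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Q1, Q2, Q3 with the median excluded from both halves when n is odd."""
    xs = sorted(values)
    n = len(xs)
    lower_half = xs[: n // 2]
    upper_half = xs[(n + 1) // 2:]
    return median(lower_half), median(xs), median(upper_half)


def find_outliers(values):
    """Return the points outside Q1 - 1.5*IQR and Q3 + 1.5*IQR, plus the fences."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high], (low, high)


data = [10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6, 14.7,
        14.7, 14.7, 14.9, 15.1, 15.9, 16.4]

flagged, fences = find_outliers(data)
print(quartiles(data))
print(flagged)
```

For this data set the quartiles come out as 14.4, 14.6, and 14.9, and the flagged points are 10.2, 15.9, and 16.4.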
Diagrammatic presentation
Chapter Summary
• Data in raw form are usually not easy to use for
decision making. Some type of organization is
needed: a table or a graph
• Techniques reviewed in this chapter:
– Bar charts, pie charts, and Pareto diagrams
– Ordered array and stem-and-leaf display
– Frequency distributions, histograms and
polygons
– Cumulative distributions and ogives