
Applied Statistics in Business & Economics
David P. Doane and Lori E. Seward

Vũ Võ
vu.vo@ueh.edu.vn

3-1
Chapter 3
Describing Data Visually
Chapter Contents
3.1 Stem-and-Leaf Displays and Dot Plots
3.2 Frequency Distributions and Histograms
3.3 Effective Excel Charts
3.4 Line Charts
3.5 Column and Bar Charts
3.6 Pie Charts
3.7 Scatter Plots
3.8 Tables
3.9 Deceptive Graphs
3-2
Chapter 3
Chapter Learning Objectives
LO3-1: Make a stem-and-leaf or dot plot.
LO3-2: Create a frequency distribution for a data set.
LO3-3: Make a histogram with appropriate bins.
LO3-4: Identify skewness, modal classes, and outliers in
a histogram.
LO3-5: Make an effective line chart.

3-3
Chapter 3
Chapter Learning Objectives (continued)
LO3-6: Make an effective column chart or bar chart.
LO3-7: Make an effective pie chart.
LO3-8: Make and interpret a scatter plot.
LO3-9: Make simple tables and pivot tables.
LO3-10: Recognize deceptive graphing techniques.

3-4
Chapter 3
3.1 Stem-and-Leaf Displays and Dot Plots
LO3-1: Make a stem-and-leaf or dot plot.

Methods of organizing, exploring, and summarizing data include:


• Visual (charts and graphs) provides insight into
characteristics of a data set without using mathematics.
• Numerical (statistics or tables) provides insight into
characteristics of a data set using mathematics.

3-5
Chapter 3
LO3-1: Make a stem-and-leaf or dot plot (continued).
Begin with univariate data (a set of n observations on
one variable) and consider the following (Table 3.1):

Characteristic   Interpretation
Measurement      What are the units of measurement (e.g., dollars)? Are the data integer or continuous? Any missing observations? Any concerns with accuracy or sampling methods?
Center           Where are the data values concentrated? What seem to be typical or middle data values?
Variability      How much dispersion is there in the data? How spread out are the data values? Are there unusual values?
Shape            Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?

3-6
Chapter 3
LO3-1: Make a stem-and-leaf or dot plot (continued, 2).
Preliminary Assessment
• Look at the data and visualize how they were collected and
measured.
• Sorting (Example: Price/Earnings Ratios)
• Sort the data as a first step and then summarize in a
graphical display. Here are the sorted P/E ratios (values
from Table 3.2).

3-7
Chapter 3
LO3-1: Make a stem-and-leaf or dot plot (continued, 3).

The type of graph you use to display your data depends on
the type of data you have. Some charts are better suited
for quantitative data, while others are better for displaying
categorical data.

Stem-and-Leaf Plot
One simple way to visualize small data sets is a stem-
and-leaf plot. The stem-and-leaf plot is a tool of
exploratory data analysis (EDA) that seeks to reveal
essential data features in an intuitive way. A stem-and-
leaf plot is basically a frequency tally, except that we use
digits instead of tally marks. For two-digit or three-digit
integer data, the stem is the tens digit of the data, and
the leaf is the ones digit.
3-8
Chapter 3
LO3-1: Make a stem-and-leaf or dot plot (continued, 4).
Stem-and-Leaf Plot (continued, 2)

For the 44 P/E ratios, the stem-and-leaf plot is given below.

3-9
Chapter 3
LO3-1: Make a stem-and-leaf or dot plot (continued, 5).
Stem-and-Leaf Plot (continued, 3)
• For example, the data values in the fourth stem are 31, 37,
37, 38.
• We always use equally spaced stems (even if some stems
are empty).
• The stem-and-leaf can reveal central tendency (24 of the
44 P/E ratios were in the 10–19 stem) as well as
dispersion (the range is from 7 to 59).
• In this illustration, the leaf digits have been sorted,
although this is not necessary.
• The stem-and-leaf has the advantage that we can retrieve
the raw data by concatenating a stem digit with each of its
leaf digits. For example, the last stem has data values 50
and 59.
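
The tally described above can be sketched in Python. This is a minimal sketch, assuming non-negative integer data; the function name stem_and_leaf and the sample values are hypothetical and are not the Table 3.2 data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integer data.

    The stem is the value with its ones digit removed (the tens digit for
    two-digit data) and the leaf is the ones digit.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # Use equally spaced stems, even if some stems are empty.
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem} | {digits}")
    return rows


# Hypothetical sample, for illustration only.
sample = [7, 12, 15, 15, 18, 21, 24, 31, 37, 37, 38, 50, 59]
for row in stem_and_leaf(sample):
    print(row)
```

Concatenating a stem with each of its leaves recovers the raw values, and the empty stem for the 40s is still printed so that the stems stay equally spaced.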

3-10
Chapter 3
LO3-1: Make a stem-and-leaf or dot plot (continued, 6).
Dot Plots
• A dot plot is the simplest graphical display of n individual values of
numerical data.
• Easy to understand.
• It reveals dispersion, central tendency, and the shape of the
distribution.
Steps in Making a Dot Plot
1. Make a scale that covers the data range.
2. Mark the axes and label them.
3. Plot each data value as a dot above the scale at its
approximate location.

Note: If more than one data value lies at about the same axis location,
the dots are stacked vertically.
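
The steps above can be sketched as a text-only dot plot. This is an illustrative sketch, assuming integer data and a scale with one character per unit; the function name dot_plot and the sample values are hypothetical.

```python
from collections import Counter


def dot_plot(values):
    """Return the lines of a text dot plot for integer data."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    scale = range(lo, hi + 1)          # step 1: a scale covering the data range
    lines = []
    # Step 3: one row per stacking level, tallest first, so that repeated
    # values appear as vertically stacked dots.
    for level in range(max(counts.values()), 0, -1):
        row = "".join("o" if counts[x] >= level else " " for x in scale)
        lines.append(row.rstrip())
    # Step 2: mark the axis and label its end points.
    lines.append("-" * len(scale))
    gap = len(scale) - len(str(lo)) - len(str(hi))
    lines.append(f"{lo}{' ' * max(gap, 1)}{hi}")
    return lines


sample = [7, 9, 9, 10, 10, 10, 12, 15]   # hypothetical values
print("\n".join(dot_plot(sample)))
```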

3-11
Chapter 3
LO3-1: Make a stem-and-leaf or dot plot (continued, 7).
Below is the dot plot for the P/E Ratios.

• The range is from 7 to 59.


• All but a few data values lie between 10 and 25.
• A typical “middle” data value would be around 17 or 18.
• The data are not symmetric due to a few large P/E ratios.

3-12
Chapter 3
LO3-1: Make a stem-and-leaf or dot plot (continued, 8).
Comparing Groups
• A stacked dot plot can be used to compare two or more
groups using a common X-axis scale.

3-13
Chapter 3
3.2 Frequency Distributions and
Histograms
LO3-2: Create a frequency distribution for a data set.
Bins and Bin Limits
• A frequency distribution is a table formed by classifying n data
values into k classes (bins).
• Bin limits define the values to be included in each bin. Widths
must all be the same except when we have open-ended bins.
• For guidance, find the approximate width of each bin by dividing
the data range by the number of bins: (xmax – xmin)/k.
• Frequencies are the numbers of observations within each bin.
• Frequencies can also be expressed as relative frequencies (frequency
divided by the total) or percentages (relative frequency times 100).
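
A frequency distribution can be tallied as in the sketch below. It assumes that each bin includes its lower limit and excludes its upper limit, which is a convention chosen for this sketch. The function name, the sample values, and the bin limits are hypothetical.

```python
def frequency_distribution(values, limits):
    """Tally values into bins [lower, upper) defined by consecutive limits.

    Returns rows of (lower, upper, frequency, relative frequency, percent).
    """
    n = len(values)
    rows = []
    for lower, upper in zip(limits, limits[1:]):
        freq = sum(1 for v in values if lower <= v < upper)
        rel = freq / n                      # relative frequency
        rows.append((lower, upper, freq, rel, 100 * rel))
    return rows


sample = [7, 12, 15, 15, 18, 21, 24, 31, 37, 59]   # hypothetical values
limits = [0, 10, 20, 30, 40, 50, 60]               # equal-width bins
for lower, upper, freq, rel, pct in frequency_distribution(sample, limits):
    print(f"{lower:>2} to <{upper:<3}{freq:>3}{rel:>7.2f}{pct:>7.1f}%")
```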

3-14
Chapter 3
LO3-2: Create a frequency distribution for a data set
(continued).
Constructing a Frequency Distribution
Herbert Sturges proposed the following rule for choosing the number
of bins k from the sample size n:
k = 1 + 3.3 log10(n)

3-15
Chapter 3
LO3-2: Create a frequency distribution for a data set
(continued, 2).
For the P/E ratio data, the smallest value was 7 and the largest
was 59, so if we want to use k = 6 bins, we calculate the
approximate bin width as (59 − 7)/6 = 8.67.
To obtain “nice” limits, we could round the bin width up to 10 and
choose bin limits of 0, 10, 20, 30, 40, 50, 60.
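
The arithmetic can be checked with a short sketch. It assumes the common form of Sturges' rule, k = 1 + 3.3 log10(n), and takes n, the data range, and the "nice" width of 10 from the example above; the variable names are hypothetical.

```python
import math

n = 44                  # number of P/E ratios
x_min, x_max = 7, 59    # smallest and largest P/E ratio

# Sturges' rule in its common form (an assumption of this sketch).
k_sturges = 1 + 3.3 * math.log10(n)
k = round(k_sturges)

width = (x_max - x_min) / k    # approximate bin width
nice_width = 10                # rounded up to a "nice" value by judgment
limits = list(range(0, 60 + 1, nice_width))

print(f"Sturges suggests k = {k_sturges:.2f}, so use {k} bins")
print(f"approximate width = {width:.2f}")
print("bin limits:", limits)
```

For n = 44 the rule suggests about six bins, which agrees with the k = 6 used in the example.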

3-16
Chapter 3
LO3-3: Make a histogram with appropriate bins.

Histograms
• A histogram is a graphical representation of a
frequency distribution.
• A histogram is a bar chart.
• The Y-axis shows the frequency within each bin.
• The X-axis ticks show the end points of each bin.

3-17
Chapter 3
LO3-3: Make a histogram with appropriate bins
(continued).
Consider three histograms for the P/E ratio data with different bin
widths. What do they tell you?

3-18
Chapter 3
LO3-3: Make a histogram with appropriate bins
(continued, 2).
• Choosing the number of bins and bin limits in creating
histograms requires judgment.
• One can use software programs to create histograms
with different bins. These include software such as:
• Excel
• MegaStat
• Minitab

3-19
Chapter 3
LO3-4: Identify skewness, modal classes, and outliers
in a histogram.

Modal Class

• A histogram bar that is higher than those on either side.
• Unimodal – a single modal class.
• Bimodal – two modal classes.
• Multimodal – more than two modal classes.
• Modal classes may be artifacts of the way bin
limits are chosen.
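
The definition of a modal class can be applied to a list of bin frequencies, as in this sketch. The frequencies are hypothetical, and treating a missing neighbor at either end as a bar of height 0 is an assumption of the sketch.

```python
def modal_classes(frequencies):
    """Return indices of bins whose bar is higher than the bars on either side.

    A missing neighbor at either end is treated as a bar of height 0.
    """
    padded = [0] + list(frequencies) + [0]
    return [i - 1 for i in range(1, len(padded) - 1)
            if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]]


narrow_bins = [3, 12, 7, 2, 6, 1]   # hypothetical frequencies, six bins
# The same data with adjacent bins merged in pairs (three wider bins).
wide_bins = [3 + 12, 7 + 2, 6 + 1]

print(modal_classes(narrow_bins))   # two modal classes (bimodal)
print(modal_classes(wide_bins))     # one modal class (unimodal)
```

The second modal class disappears when the bins are widened, which illustrates how modal classes can be artifacts of the bin limits.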

3-20
Chapter 3
LO3-4: Identify skewness, modal classes, and outliers
in a histogram (continued).

Shape
• A histogram may suggest the shape of the population.
• It is influenced by the number of bins and bin limits.
• Skewness – indicated by the direction of the longer
tail of the histogram.
• Left-skewed – (negatively skewed) a longer left
tail.
• Right-skewed – (positively skewed) a longer right
tail.
• Symmetric – both tail areas are the same.

3-21
Chapter 3
LO3-4: Identify skewness, modal classes, and outliers
in a histogram (continued, 2).

3-22
Chapter 3
LO3-4: Identify skewness, modal classes, and outliers
in a histogram (continued, 3).

• An outlier is an extreme value that is far enough
from the majority of the data that it probably arose
from a different cause or is due to measurement
error.
• We will define outliers more precisely in the next
chapter.
• For now, think of outliers as unusual points located
in the histogram tails.

3-23
Chapter 3
LO3-4: Identify skewness, modal classes, and outliers in
a histogram (continued, 4).

Frequency Polygons and Ogives

• A frequency polygon is a line graph that connects the midpoints
of the histogram intervals, plus extra intervals at the beginning
and end so that the line will touch the X-axis.
• It serves the same purpose as a histogram but is attractive
when you need to compare two data sets (since more than one
frequency polygon can be plotted on the same scale).
• An ogive (pronounced “oh-jive”) is a line graph of the
cumulative frequencies.
• It is useful for finding percentiles or in comparing the shape of
the sample with a known benchmark such as the normal
distribution (that you will be seeing in the next chapter).
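
The plotting coordinates for both graphs can be computed as in the sketch below. The bin frequencies are hypothetical. Plotting the ogive at the upper bin limits and using cumulative relative frequencies (rather than counts) are assumptions of this sketch.

```python
from itertools import accumulate

limits = [0, 10, 20, 30, 40, 50, 60]   # bin limits
freqs = [1, 4, 2, 2, 0, 1]             # hypothetical bin frequencies

# Frequency polygon: bin midpoints, plus one extra empty interval on each
# side so that the line touches the X-axis.
width = limits[1] - limits[0]
midpoints = [(lo + hi) / 2 for lo, hi in zip(limits, limits[1:])]
polygon_x = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
polygon_y = [0] + freqs + [0]

# Ogive: cumulative relative frequency at each upper bin limit,
# starting from 0 at the first lower limit.
n = sum(freqs)
ogive_x = limits
ogive_y = [0.0] + [c / n for c in accumulate(freqs)]

for x, y in zip(ogive_x, ogive_y):
    print(f"{x:>3}{y:>6.2f}")
```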

3-24
Chapter 3
LO3-4: Identify skewness, modal classes, and outliers
in a histogram (continued, 5).

Frequency Polygons and Ogives

3-25
Chapter 3
3.3 Effective Excel Charts
This section describes how to use Excel to create charts.
Excel offers a vast array of charts. Refer to Figure 3.8
and to the text as well.

3-26
Chapter 3
3.4 Line Charts
LO3-5: Make an effective line chart.

Simple Line Charts


• Used to display a time series or spot trends, or to compare
time periods.
• Can display several variables at once.

3-27
Chapter 3
LO3-5: Make an effective line chart (continued).
Simple Line Charts
• Two-scale line chart – used to compare variables that
differ in magnitude or are measured in different units.

3-28
Chapter 3
LO3-5: Make an effective line chart (continued, 2).

Log Scales
• Arithmetic scale – distances on the Y-axis are proportional to
the magnitude of the variable being displayed.
• Logarithmic scale – (ratio scale) equal distances represent
equal ratios.
• Use a log scale for the vertical axis when data vary over a
wide range, say, by more than an order of magnitude.
• This will reveal more detail for smaller data values.

3-29
Chapter 3
LO3-5: Make an effective line chart (continued, 3).
Log Scales
• A log scale is useful for time series data that might be expected
to grow at a compound annual percentage rate (e.g., GDP, the
national debt, or your future income). It reveals whether the
quantity is growing at
• an increasing percent (concave upward),
• a constant percent (straight line), or
• a declining percent (concave downward).
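
A short numerical check shows why constant percentage growth plots as a straight line on a log scale. The starting value, growth rate, and number of years are hypothetical.

```python
import math

start, rate, years = 100.0, 0.05, 5   # hypothetical values
series = [start * (1 + rate) ** t for t in range(years + 1)]

# On an arithmetic scale the yearly steps keep widening.
steps = [b - a for a, b in zip(series, series[1:])]

# On a log scale every step has the same size, log10(1 + rate),
# so the series plots as a straight line.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(series, series[1:])]

print([round(s, 2) for s in steps])
print([round(s, 4) for s in log_steps])
```

Equal distances on the log scale correspond to equal ratios, here a ratio of 1.05 per year.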

3-30
Chapter 3
3.5 Column and Bar Charts
LO3-6: Make an effective column chart or bar chart.
• A column chart is a vertical display of the data.
• A bar chart is a horizontal display of the data.

Figure 3.14 shows simple column and bar charts comparing market
shares among tire manufacturers.

3-31
Chapter 3
LO3-6: Make an effective column chart or bar chart
(continued).
Pareto Charts
• A special type of bar chart used in quality management to
display the frequency of defects or errors of different types.
• Categories are displayed in descending order of frequency.
• Focuses attention on the significant few (i.e., the few categories
that account for most defects or errors).
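
The ordering behind a Pareto chart can be sketched as follows. The defect categories and counts are hypothetical, and the cumulative percentage column is an addition of this sketch that makes the "significant few" easy to see.

```python
from itertools import accumulate

# Hypothetical defect counts by type, for illustration only.
defects = {"Scratch": 12, "Dent": 45, "Misalignment": 8, "Paint": 30, "Other": 5}

# Sort the categories in descending order of frequency.
ordered = sorted(defects.items(), key=lambda item: item[1], reverse=True)
total = sum(defects.values())
cumulative = list(accumulate(count for _, count in ordered))

for (name, count), running in zip(ordered, cumulative):
    print(f"{name:<13}{count:>4}{100 * running / total:>7.1f}%")
```

In this made-up example the first two categories account for 75 percent of all defects.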

3-32
Chapter 3
LO3-6: Make an effective column chart or bar chart
(continued, 2).
Stacked Column Chart
• Bar height is the sum of several subtotals. Areas may be
compared by color to show patterns in the subgroups and
total.

Source: www.aamc.org

3-33
Chapter 3
3.6 Pie Charts

LO3-7: Make an effective pie chart.

Pie Chart
• A pie chart can only convey a general idea of the data.
• Pie charts should be used to portray data which sum
to a total (e.g., percent market shares).
• A pie chart should only have a few (i.e., 2 to 5) slices.
• Each slice can be labeled with data values or
percents.

3-34
Chapter 3
LO3-7: Make an effective pie chart (continued).
Pie Chart
• A simple 2-D pie chart is best, as shown in Figure 3.17.

3-35
Chapter 3
LO3-7: Make an effective pie chart (continued, 2).
Pie Chart
• The 3-D pie chart adds visual interest, but the sizes of the
pie slices are harder to assess.

3-36
Chapter 3
LO3-7: Make an effective pie chart (continued, 3).
Bar Chart
• A simple bar chart can be used to display the same data, and
would be preferred by many statisticians.

3-37
Chapter 3
3.7 Scatter Plots
LO3-8: Make and interpret a scatter plot.

• Scatter plots can convey patterns in data pairs that
would not be apparent from a table.
• A scatter plot is a starting point for bivariate data
analysis in which we investigate the association and
relationship between two quantitative variables.
• View the next slide for an example.

3-38
Chapter 3
LO3-8: Make and interpret a scatter plot (continued).

The figure shows a scatter plot with life expectancy on the X-axis
and birth rates on the Y-axis. In this illustration, there seems to
be an association between X and Y. That is, nations with higher
birth rates tend to have lower life expectancy (and vice versa).
No cause-and-effect relationship is implied because, in this
example, both variables could be influenced by a third variable
that is not mentioned (e.g., GDP per capita).

3-39
Chapter 3
LO3-8: Make and interpret a scatter plot (continued, 2).
• Figure 3.21 shows some scatter plot patterns similar to those that
you might observe when you have a sample of (X, Y) data pairs.
• A scatter plot can convey patterns in data pairs that would not be
apparent from a table.

3-40
Chapter 3
LO3-8: Make and interpret a scatter plot (continued, 3).
Other examples of scatter plots.

3-41
Chapter 3
LO3-8: Make and interpret a scatter plot (continued, 4).

Other examples of scatter plots (continued).

3-42
Chapter 3
LO3-8: Make and interpret a scatter plot (continued, 5).

Other examples of scatter plots (continued).

3-43
Chapter 3
LO3-8: Make and interpret a scatter plot (continued, 6).

Other examples of scatter plots (continued).

3-44
Chapter 3
3.8 Tables
LO3-9: Make simple tables and pivot tables.

• Tables are the simplest form of data display.
• Arranging numbers in rows and columns enhances their meaning
so that it can be understood at a glance.
• The data can be viewed by focusing on the time pattern
(down the columns) or by comparing the variables (across
the rows).

3-45
Chapter 3
LO3-9: Make simple tables and pivot tables (continued).

Example: School Expenditures

Refer to the text for the discussion of pivot tables.

3-46
Chapter 3
LO3-9: Make simple tables and pivot tables (continued, 2).

Here are some tips for creating effective tables:


1. Keep the table simple, consistent with its purpose. Put summary
tables in the main body of the written report and detailed tables
in an appendix.
2. Display the data to be compared in columns rather than rows.
3. For presentation purposes, round off to three or four significant
digits.
4. Physical table layout should guide the eye toward the
comparison you wish to emphasize.
5. Row and column headings should be simple yet descriptive.
6. Within a column, use a consistent number of decimal digits.
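
Tip 3 can be applied with a small helper that rounds to a given number of significant digits. This is a sketch; the function name round_sig and the example values are hypothetical.

```python
import math


def round_sig(x, digits=3):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    # Position of the leading digit: 4 for 12345.678, -2 for 0.0123456.
    exponent = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - exponent)


for value in (12345.678, 0.0123456, 987.654):
    print(value, "->", round_sig(value, 3))
```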

3-47
Chapter 3
3.9 Deceptive Graphs
LO3-10: Recognize deceptive graphing techniques.

Error 1: Nonzero Origin


• A nonzero origin will exaggerate the trend.
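
The size of the exaggeration can be quantified by comparing plotted bar heights. The values and the function name apparent_ratio are hypothetical.

```python
def apparent_ratio(first, last, origin=0.0):
    """Ratio of plotted bar heights when the Y-axis starts at `origin`."""
    return (last - origin) / (first - origin)


# Hypothetical values: a 10 percent increase from 50 to 55.
true_ratio = apparent_ratio(50, 55)             # Y-axis starts at zero
distorted = apparent_ratio(50, 55, origin=45)   # Y-axis starts at 45

print(f"zero origin: second bar is {true_ratio:.2f} times as tall")
print(f"origin at 45: second bar is {distorted:.2f} times as tall")
```

With the axis starting at 45, a 10 percent increase is drawn as a bar twice as tall.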

3-48
Chapter 3
LO3-10: Recognize deceptive graphing techniques
(continued, 2).
Error 2: Elastic Graph Proportions
• Keep the aspect ratio (width/height) below 2.00 so as not to
exaggerate the graph. By default, Excel uses an aspect ratio of
1.68.

3-49
Chapter 3
LO3-10: Recognize deceptive graphing techniques
(continued, 3).
Error 3: Dramatic Titles and Distracting Pictures
• A dramatic title often is designed more to grab the reader's
attention than to convey the chart's content (Criminals on a
Spree, Deficit Swamps Economy).
• Sometimes the title attempts to draw your conclusion for you
(Inflation Wipes Out Savings, Imports Dwarf Exports).
• A title should be short but adequate for the purpose.
• To add visual pizzazz, artists may superimpose the chart on a
photograph (e.g., a gasoline price chart atop a photo of an oil-
drilling platform) or add colorful cartoon figures, banners, or
drawings.

3-50
Chapter 3
LO3-10: Recognize deceptive graphing techniques
(continued, 4).
Error 3: Dramatic Titles and Distracting Pictures
(continued)
• This is mostly harmless but can distract the reader or
impart an emotional slant.
• Advertisements sometimes feature mature, attractive,
conservatively attired actors portraying scientists, doctors,
or business leaders examining scientific-looking charts.
• Because the public respects science’s reputation, such
displays impart credibility to self-serving commercial
claims.
• The medical school applications graph (see next slide)
illustrates these deceptive elements.

3-51
Chapter 3
LO3-10: Recognize deceptive graphing techniques
(continued, 5).
Error 3: Dramatic Titles and Distracting Pictures
(continued, 2)

3-52
Chapter 3
LO3-10: Recognize deceptive graphing techniques
(continued, 6).
Error 4: 3-D and Novelty Graphs
• Novelty charts such as the pyramid chart should be
avoided because they distort the bar volume and make it
hard to measure bar height.

Copyright ©2019 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the
prior written consent of McGraw-Hill Education.
3-53
Chapter 3
LO3-10: Recognize deceptive graphing techniques
(continued, 7).

Error 5: Rotated Graphs
• Rotated graphs can make trends appear to dwindle into the
distance or loom toward you.

3-54
Chapter 3
LO3-10: Recognize deceptive graphing techniques
(continued, 8).
Error 8: Complex Graphs
• Avoid if possible. This example (surgery volume) combines
several errors (silly subtitle, distracting pictures, no data
labels, no definitions, vague source, too much information).

3-55
Chapter 3
LO3-10: Recognize deceptive graphing techniques
(continued, 9).
Error 11: Area Trick
• As figure height increases, so does width, distorting the
graph.
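
The distortion follows from simple geometry, as this sketch shows. The base dimensions and the value ratio are hypothetical.

```python
base_height, base_width = 1.0, 1.0
value_ratio = 2.0   # hypothetical: the second value is twice the first

# Scale the picture in both dimensions, as in the area trick.
height = base_height * value_ratio
width = base_width * value_ratio
apparent_ratio = (height * width) / (base_height * base_width)

print(f"value ratio {value_ratio:.0f}:1 is drawn as area ratio {apparent_ratio:.0f}:1")
```

Because the eye compares areas, a value that doubles appears four times as large when both height and width are scaled.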

3-56
Chapter 3
LO3-10: Recognize deceptive graphing techniques
(continued, 10).

Other deceptive graphing techniques:


Error 6: Unclear Definitions or Scales
Error 7: Vague Sources
Error 9: Gratuitous Effects
Error 10: Estimated Data

3-57
