Applied Statistics in Business & Economics, 5th edition

David P. Doane and Lori E. Seward

Prepared by Lloyd R. Jaisingh

McGraw-Hill/Irwin

Chapter 3

Chapter Contents

3.1 Stem-and-Leaf Displays and Dot Plots

3.2 Frequency Distributions and Histograms

3.3 Effective Excel Charts

3.4 Line Charts

3.5 Column and Bar Charts

3.6 Pie Charts

3.7 Scatter Plots

3.8 Tables

3.9 Deceptive Graphs

Chapter Learning Objectives

LO3-1: Make a stem-and-leaf or dot plot.

LO3-2: Create a frequency distribution for a data set.

LO3-3: Make a histogram with appropriate bins.

LO3-4: Identify skewness, modal classes, and outliers in a histogram.

LO3-5: Make an effective line chart.

LO3-6: Make an effective column chart or bar chart.

LO3-7: Make an effective pie chart.

LO3-8: Make and interpret a scatter plot.

LO3-9: Make simple tables and pivot tables.

LO3-10: Recognize deceptive graphing techniques.

Dot Plots

Methods of organizing, exploring, and summarizing data include:

- Visual displays (charts and graphs), which provide insight into characteristics of a data set without using mathematics.

- Numerical summaries (statistics or tables), which provide insight into characteristics of a data set using mathematics.

Begin with univariate data (a set of n observations on one variable) and consider the following:

Preliminary Assessment

Look at the data and visualize how they were collected and measured.

Sorting (Example: Price/Earnings Ratios)

Sort the data as a first step and then summarize in a graphical display. Here are the sorted P/E ratios (values from Table 3.2).

LO3-1

The appropriate chart depends on the type of data you have. Some charts are better suited for quantitative data, while others are better for displaying categorical data.

Stem-and-Leaf Plot

The stem-and-leaf plot is a tool of exploratory data analysis (EDA) that seeks to reveal essential data features in an intuitive way. A stem-and-leaf plot is basically a frequency tally, except that we use digits instead of tally marks. For two-digit or three-digit integer data, the leaf is the ones digit and the stem is made up of the digits to its left (the tens digit for two-digit data).

For example, the data values in the fourth stem are 31, 37, 37, 38. We always use equally spaced stems (even if some stems are empty). The stem-and-leaf can reveal central tendency (24 of the 44 P/E ratios were in the 10-19 stem) as well as dispersion (the range is from 7 to 59). In this illustration, the leaf digits have been sorted, although this is not necessary. The stem-and-leaf has the advantage that we can retrieve the raw data by concatenating a stem digit with each of its leaf digits. For example, the last stem has data values 50 and 59.
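The tally described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample values are hypothetical (they are not the P/E ratios from Table 3.2), and the sketch assumes non-negative integer data.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: sorted leaf digits} for non-negative integers.

    The leaf is the ones digit; the stem is everything to its left.
    Every stem between the smallest and largest is kept, even if empty.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lo, hi = min(leaves), max(leaves)
    return {stem: leaves.get(stem, []) for stem in range(lo, hi + 1)}

def render(plot):
    """Format the display as text, one stem per line."""
    return "\n".join(
        f"{stem} | {' '.join(str(d) for d in digits)}"
        for stem, digits in plot.items()
    )

# Hypothetical sample, NOT the P/E ratios from Table 3.2.
sample = [7, 12, 15, 15, 18, 21, 24, 38, 31, 50, 59]
plot = stem_and_leaf(sample)
print(render(plot))

# The raw data can be recovered by joining each stem with its leaves.
recovered = [stem * 10 + leaf for stem, digits in plot.items() for leaf in digits]
```

Note that the empty stem (4) still gets its own row, which keeps the stems equally spaced.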

Dot Plots

A dot plot is a simple graphical display of individual values of numerical data.

- Easy to understand.

- It reveals dispersion, central tendency, and the shape of the distribution.

Steps in making a dot plot:

1. Make a scale that covers the data range.

2. Mark the axes and label them.

3. Plot each data value as a dot above the scale at its approximate location.

Note: If more than one data value lies at about the same axis location, the dots are stacked vertically.
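The steps for a dot plot can be sketched as a small text-based routine. This is a rough illustration, not a substitute for a charting tool: the function name is hypothetical, the data are made up, and "approximate location" is taken here to mean rounding to the nearest integer.

```python
from collections import Counter

def dot_plot(values):
    """Return a text dot plot with one column per integer on the scale."""
    positions = [round(v) for v in values]        # approximate axis location
    lo, hi = min(positions), max(positions)       # scale covers the data range
    counts = Counter(positions)
    rows = []
    for level in range(max(counts.values()), 0, -1):   # top row first
        row = "".join("o" if counts[x] >= level else " "
                      for x in range(lo, hi + 1))
        rows.append(row.rstrip())
    width = hi - lo + 1
    rows.append("-" * width)                      # the axis
    rows.append(str(lo) + str(hi).rjust(width - len(str(lo))))  # end labels
    return "\n".join(rows)

# Hypothetical data for illustration only.
sample = [10, 12, 12, 13, 13, 13, 15, 20]
print(dot_plot(sample))
```

Values that share a location (here 12 and 13) are stacked vertically, so the tallest stack marks the most common value.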

All but a few data values lie between 10 and 25.

A typical middle data value would be around 17 or 18.

The data are not symmetric due to a few large P/E ratios.

Comparing Groups

A stacked dot plot compares two or more groups using a common X-axis scale.

Histograms

LO3-2

A frequency distribution is a table formed by classifying n data values into k classes (bins).

Bin limits define the values to be included in each bin. Widths must all be the same except when we have open-ended bins.

Frequencies are the number of observations within each bin.

Frequencies may also be expressed as relative frequencies (frequency divided by the total) or percentages (relative frequency times 100).
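A minimal sketch of how frequencies, relative frequencies, and percentages might be tabulated for equal-width bins follows. The function name, the sample data, and the convention that each bin includes its lower limit but excludes its upper limit are assumptions made for illustration.

```python
def frequency_distribution(values, lower, width, k):
    """Tabulate values into k equal-width bins starting at `lower`.

    Assumed convention: lower + i*width <= x < lower + (i+1)*width.
    """
    counts = [0] * k
    for x in values:
        i = int((x - lower) // width)
        if not 0 <= i < k:
            raise ValueError(f"{x} falls outside the bins")
        counts[i] += 1
    n = len(values)
    table = []
    for i, f in enumerate(counts):
        table.append({
            "from": lower + i * width,
            "to": lower + (i + 1) * width,
            "frequency": f,
            "relative": f / n,            # frequency divided by the total
            "percent": 100 * f / n,       # relative frequency times 100
        })
    return table

# Hypothetical data, not the textbook's P/E ratios.
data = [7, 12, 15, 15, 18, 21, 24, 31, 38, 59]
table = frequency_distribution(data, lower=0, width=10, k=6)
for row in table:
    print(f"{row['from']:>3} to under {row['to']:<3} "
          f"{row['frequency']:>2}  {row['relative']:.2f}  {row['percent']:.0f}%")
```

The relative frequencies always sum to 1 (and the percentages to 100), which is a quick check on the tally.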

- Herbert Sturges proposed the following rule for the number of bins k, given a sample of size n:

k = 1 + 3.3 log10(n)
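As a rough sketch, Sturges' rule (commonly written k = 1 + 3.3 log10(n)) can be evaluated for the 44 P/E ratios mentioned earlier, whose range runs from 7 to 59. The function names are hypothetical, and rounding k up to a whole number of bins is an assumption; in practice the bin count and limits are usually adjusted to "nice" values.

```python
import math

def sturges_k(n):
    """Raw value of Sturges' rule for a sample of size n."""
    return 1 + 3.3 * math.log10(n)

def sturges_bins(n):
    """Sturges' suggestion rounded up to a whole number of bins (an assumption)."""
    return math.ceil(sturges_k(n))

def raw_bin_width(smallest, largest, k):
    """Equal bin width needed to span the data range with k bins."""
    return (largest - smallest) / k

n = 44                       # number of P/E ratios cited in the slides
k = sturges_bins(n)
print(f"Sturges' rule: {sturges_k(n):.2f}, so use about {k} bins")
print(f"Raw bin width for the range 7 to 59: {raw_bin_width(7, 59, k):.2f}")
```

The raw width would normally be rounded to a convenient value before setting bin limits.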

A histogram is a graphical representation of a frequency distribution.

Y-axis shows frequency within each bin.

A histogram is a bar chart.

X-axis ticks show end points of each bin.

LO3-3

Consider 3 histograms for the P/E ratio data with different bin widths. What do they tell you?

Choosing the number of bins and the bin limits requires judgment.

Software can be used to create histograms with different bins. Examples include:

Excel

MegaStat

Minitab

Modal Class

Unimodal: a single modal class.

Bimodal: two modal classes.

Multimodal: more than two modal classes.

Modal classes may be artifacts of the way bin limits are chosen.
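To see how the count of modal classes can depend on the bins, here is a small sketch. It assumes a modal class is a bin whose frequency is strictly higher than the bins on either side (a working definition adopted for this example), and the frequency lists are hypothetical.

```python
def modal_classes(frequencies):
    """Indices of bins strictly higher than both neighbours.

    A missing neighbour at either end is treated as frequency 0.
    """
    padded = [0] + list(frequencies) + [0]
    return [i - 1 for i in range(1, len(padded) - 1)
            if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]]

def describe(frequencies):
    """Label a histogram by its number of modal classes."""
    peaks = len(modal_classes(frequencies))
    if peaks == 1:
        return "unimodal"
    if peaks == 2:
        return "bimodal"
    return "multimodal" if peaks > 2 else "no clear mode"

# Hypothetical frequencies for the same 12 values binned two ways.
narrow_bins = [1, 5, 1, 4, 1]   # width 2
wide_bins = [7, 5]              # width 5
print(describe(narrow_bins))    # two peaks with narrow bins
print(describe(wide_bins))      # one peak with wide bins
```

The same data look bimodal with one choice of bins and unimodal with another, which is why apparent modes deserve a second look.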

LO3-4

Shape

The shape of a histogram is influenced by the number of bins and the bin limits.

Skewness is indicated by the direction of the longer tail of the histogram.

Left-skewed (negatively skewed): a longer left tail.

Right-skewed (positively skewed): a longer right tail.

Symmetric: both tail areas are the same.

Frequency Polygons and Ogives

A frequency polygon is a line graph that connects the midpoints of the histogram intervals, plus extra intervals at the beginning and end so that the line will touch the X-axis.

It serves the same purpose as a histogram, but is attractive when you need to compare two data sets (since more than one frequency polygon can be plotted on the same scale).

An ogive (pronounced oh-jive) is a line graph of the cumulative frequencies.

It is useful for finding percentiles or in comparing the shape of the sample with a known benchmark such as the normal distribution (which you will see in the next chapter).
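The points that a frequency polygon and an ogive would plot can be computed directly from a frequency distribution. The sketch below is illustrative: the function names and the bin limits and frequencies are hypothetical, equal bin widths are assumed, and the ogive is drawn at the upper bin limits using cumulative relative frequencies.

```python
from itertools import accumulate

def frequency_polygon(limits, freqs):
    """(midpoint, frequency) points, with an empty interval added at each end."""
    width = limits[1] - limits[0]                      # equal widths assumed
    mids = [(a + b) / 2 for a, b in zip(limits, limits[1:])]
    return ([(mids[0] - width, 0)]
            + list(zip(mids, freqs))
            + [(mids[-1] + width, 0)])

def ogive(limits, freqs):
    """(upper bin limit, cumulative relative frequency) points, starting at 0."""
    n = sum(freqs)
    cumulative = [c / n for c in accumulate(freqs)]
    return [(limits[0], 0.0)] + list(zip(limits[1:], cumulative))

# Hypothetical bins: 0 to 60 in steps of 10, with made-up frequencies.
limits = [0, 10, 20, 30, 40, 50, 60]
freqs = [1, 4, 2, 2, 0, 1]
print(frequency_polygon(limits, freqs))
print(ogive(limits, freqs))
```

Reading the ogive at a cumulative value of 0.5 gives an approximate median; in this made-up example that falls at the bin limit 20.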

3.3 Effective Excel Charts

Excel offers a vast array of charts. Refer to Figure 3.10. Please refer to the text as well.

3.4 Line Charts

LO3-5

A line chart is used to display a time series, to spot trends, or to compare time periods.

A line chart can display several variables at once.

A two-scale line chart can compare variables that differ in magnitude or are measured in different units.

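Log Scales

On an arithmetic scale, distances on the Y-axis are proportional to the magnitude of the variable being displayed.

On a logarithmic scale, equal distances represent equal ratios.

Use a log scale for the vertical axis when data vary over a wide range, say, by more than an order of magnitude.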

A log scale is useful for time series data that might be expected to grow at a compound annual percentage rate (e.g., GDP, the national debt, or your future income). It reveals whether the quantity is growing at an

- increasing percent (concave upward),

- constant percent (straight line), or

- declining percent (concave downward).
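Why a constant percent growth rate plots as a straight line on a log scale can be checked numerically: the logarithm of the series then rises by the same amount each period. The sketch below is an illustration with hypothetical series and function names; the tolerance is an arbitrary choice to absorb floating-point noise.

```python
import math

def log_steps(series):
    """Differences between successive log10 values of a positive series."""
    logs = [math.log10(y) for y in series]
    return [b - a for a, b in zip(logs, logs[1:])]

def growth_pattern(series, tol=1e-9):
    """Classify growth by how the log10 steps change over time."""
    steps = log_steps(series)
    changes = [b - a for a, b in zip(steps, steps[1:])]
    if all(abs(c) <= tol for c in changes):
        return "constant percent (straight line on a log scale)"
    if all(c > tol for c in changes):
        return "increasing percent (concave upward)"
    if all(c < -tol for c in changes):
        return "declining percent (concave downward)"
    return "mixed"

# Hypothetical series: 5% compound growth, and growth by a fixed 10 units a year.
compound = [100 * 1.05 ** t for t in range(10)]
fixed_amount = [100 + 10 * t for t in range(10)]
print(growth_pattern(compound))
print(growth_pattern(fixed_amount))
```

Growth by a fixed amount is a straight line on an arithmetic scale but bends downward on a log scale, because each added 10 units is a smaller percentage of a larger base.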

3.5 Column and Bar Charts

LO3-6

A column chart is a vertical display of the data; a bar chart is a horizontal display of the data.

Pareto Charts

A Pareto chart displays the frequency of defects or errors of different types.

Categories are displayed in descending order of frequency.

Focus on the significant few (i.e., the few categories that account for most defects or errors).
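The ordering behind a Pareto chart can be sketched as follows. The defect categories and counts are hypothetical, the function names are made up for this example, and the 80 percent cutoff used to pick out the "significant few" is an assumption rather than a rule stated in the slides.

```python
def pareto_table(defect_counts):
    """Rows of (category, count, cumulative percent), largest count first."""
    ordered = sorted(defect_counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(defect_counts.values())
    rows, running = [], 0
    for category, count in ordered:
        running += count
        rows.append((category, count, 100 * running / total))
    return rows

def significant_few(defect_counts, threshold=80):
    """Smallest leading set of categories reaching `threshold` percent (assumed cutoff)."""
    few = []
    for category, _, cum_pct in pareto_table(defect_counts):
        few.append(category)
        if cum_pct >= threshold:
            break
    return few

# Hypothetical defect tallies, for illustration only.
defects = {"scratch": 12, "dent": 50, "misalignment": 30, "stain": 5, "other": 3}
for category, count, cum_pct in pareto_table(defects):
    print(f"{category:<13} {count:>3}  {cum_pct:5.1f}%")
print(significant_few(defects))
```

In this made-up tally, two of the five categories account for 80 percent of all defects, so those are the ones to address first.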

In a stacked column chart, the bar height is the sum of several subtotals.

Areas may be compared by color to show patterns in the subgroups and total.

LO3-7

Pie Chart

Pie charts should be used to portray data that sum to a total (e.g., percent market shares).

A pie chart should only have a few (i.e., 2 to 5) slices.

Each slice can be labeled with data values or percents.

A simple 2-D pie chart is best as shown in Figure 3.19.

The 3-D pie chart adds visual interest, but the sizes of the pie slices are harder to assess.

A simple bar chart can be used to display the same data, and would be preferred by many statisticians.

3.7 Scatter Plots

LO3-8

Scatter plots can convey patterns in data pairs that would not be apparent from a table.

A scatter plot is a starting point for bivariate data analysis in which we investigate the association and relationship between two variables.

Figure 3.23 shows some scatter plot patterns similar to those that you might observe when you have a sample of (X, Y) data pairs.

3.8 Tables

LO3-9

By arranging numbers in rows and columns, their meaning can be enhanced so it can be understood at a glance.

The data can be viewed by focusing on the time pattern (down the columns) or by comparing the variables (across the rows).

Here are some tips for creating effective tables:

1. Keep the table simple, consistent with its purpose. Put summary tables in the main body of the written report and detailed tables in an appendix.

2. Display the data to be compared in columns rather than rows.

3. For presentation purposes, round off to three or four significant digits.

4. Physical table layout should guide the eye toward the comparison you wish to emphasize.

5. Row and column headings should be simple yet descriptive.

6. Within a column, use a consistent number of decimal digits.
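Tips 3 and 6 can be illustrated with a short sketch. The helper names are hypothetical and the numbers are made up; rounding to significant digits is done here by locating the leading digit with a base-10 logarithm, which is one common approach rather than the only one.

```python
from math import floor, log10

def round_sig(x, digits=3):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    exponent = floor(log10(abs(x)))        # position of the leading digit
    return round(x, digits - 1 - exponent)

def format_column(values, decimals=1):
    """Format every value in a column with the same number of decimal digits."""
    return [f"{v:.{decimals}f}" for v in values]

# Hypothetical figures for illustration.
print(round_sig(1234567, 3))
print(round_sig(0.0123456, 3))
print(format_column([3.14159, 12.0, 7.5], decimals=2))
```

Rounding before presentation keeps attention on the comparison being made, and a fixed number of decimals lines the digits up so that magnitudes can be compared down a column.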

3.9 Deceptive Graphs

LO3-10

[Figure: two versions of a graph, labeled "Deceptive" and "Correct"]

An extreme aspect ratio (width/height) can exaggerate the graph. By default, Excel uses an aspect ratio of 1.68.

Avoid 3-D and novelty charts because they distort the bar volume and make it hard to measure bar height.

towards you.

A single graph may contain several errors (silly subtitle, distracting pictures, no data labels, no definitions, vague source, too much information).

Error 3:

Error 6: Unclear Definitions or Scales

Error 7: Vague Sources

Error 9: Gratuitous Effects

Error 10: Estimated Data


