
4-1

Chapter Four

McGraw-Hill/Irwin © 2005 The McGraw-Hill Companies, Inc., All Rights Reserved.


4-2
Chapter Four
Describing Data: Displaying and Exploring Data

GOALS
When you have completed this chapter, you will be able to:

ONE
Develop and interpret a dot plot.
TWO
Develop and interpret a stem-and-leaf display.
THREE
Compute and interpret quartiles, deciles, and percentiles.
FOUR
Construct and interpret box plots.

FIVE
Compute and understand the coefficient of variation and the
coefficient of skewness.

SIX
Draw and interpret a scatter diagram.

SEVEN
Set up and interpret a contingency table.

Goals
4-4

Dot Plot

Dot plots:
 Report the details of each observation
 Are useful for comparing two or more data sets

Dot Plot
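
A dot plot can be sketched in plain text by stacking one mark above a value for every observation that takes that value. The Python sketch below is only an illustration: the function name `dot_plot` and the sample values are hypothetical (they are not data from this chapter), and it assumes non-negative whole-number observations.

```python
from collections import Counter

def dot_plot(values):
    """Return a text dot plot: one '*' per observation, stacked above its value.

    Assumes non-negative integers (an assumption made for this sketch).
    """
    counts = Counter(values)
    lo, hi = min(values), max(values)
    width = len(str(hi)) + 1  # column wide enough for the largest axis label
    rows = []
    # Build the rows from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(
            ("*" if counts[v] >= level else " ").rjust(width)
            for v in range(lo, hi + 1)
        )
        rows.append(row.rstrip())
    # Axis line with every value from the smallest to the largest.
    rows.append("".join(str(v).rjust(width) for v in range(lo, hi + 1)))
    return "\n".join(rows)

# Hypothetical observations, used only to show the layout.
sample = [3, 4, 4, 5, 5, 5, 7]
print(dot_plot(sample))
```

Because every observation keeps its own mark, two such plots drawn on the same axis can be compared directly for spread.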
4-5

This example gives the percentages of men and women participating in the workforce in a recent year for the fifty states of the United States.
Compare the dispersions of labor force participation by gender.

Example 1

[Two dot plots: percentage of women participating in the labor force for the 50 states; percentage of men participating in the labor force for the 50 states.]

Example 1 (continued)
4-8

Stem-and-leaf Displays

Stem-and-leaf display: A statistical technique for displaying a set of data. Each numerical value is divided into two parts: the leading digits become the stem and the trailing digits the leaf.

Note: an advantage of the stem-and-leaf display over a frequency distribution is that we do not lose the identity of each observation.
Stem-and-leaf Displays
4-9

Stock prices on twelve consecutive days for a major publicly traded company:

86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85.

[Chart: stock price (axis 50 to 100) by day (1 to 12).]

Example 2
4-10

Stem and leaf display of stock prices

stem leaf
6 9
7 89
8 234568
9 126

Example 2 (continued)
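
The split into stems and leaves can be sketched in a few lines of Python. This is a minimal illustration for two-digit values such as the twelve stock prices; the function name `stem_and_leaf` is an assumption made for the example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leading digit) to its sorted leaves (trailing digit)."""
    display = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 86 -> stem 8, leaf 6
        display[stem].append(leaf)
    return dict(display)

prices = [86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85]
for stem, leaves in sorted(stem_and_leaf(prices).items()):
    print(stem, "".join(str(leaf) for leaf in leaves))
```

Each original price can be read back from the display (stem 8 with leaf 6 is 86), which is the advantage over a frequency distribution noted earlier.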
4-11

Quartiles
Divide a set of observations into four equal parts.

Locate the median (50th percentile), the first quartile (25th percentile), and the third quartile (75th percentile).

Quartiles (continued)
4-15

Quartiles
$$L_p = (n+1)\frac{P}{100}$$

where P is the desired percentile.

Quartiles (continued)
4-16

Using the twelve stock prices, we can find the median, 25th, and 75th percentiles as follows:

Quartile 3: $L_{75} = (12+1)\frac{75}{100} = 9.75$th observation

Median: $L_{50} = (12+1)\frac{50}{100} = 6.50$th observation

Quartile 1: $L_{25} = (12+1)\frac{25}{100} = 3.25$th observation

Example 2 (continued)
4-17

Observation   Price   Quarter
12            96      Q4
11            92      Q4
10            91      Q4
9             88      Q3
8             86      Q3
7             85      Q3
6             84      Q2
5             83      Q2
4             82      Q2
3             79      Q1
2             78      Q1
1             69      Q1

75th percentile: price at the 9.75th observation = 88 + .75(91 - 88) = 90.25
50th percentile (median): price at the 6.50th observation = 84 + .5(85 - 84) = 84.50
25th percentile: price at the 3.25th observation = 79 + .25(82 - 79) = 79.75

Example 2 (continued)
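
The location formula and the interpolation step can be sketched in Python as follows. The function name `percentile` and the clamping of locations that fall outside the data are assumptions made for this sketch, not part of the slides.

```python
def percentile(values, p):
    """Value at location L_p = (n + 1) * p / 100, interpolating between neighbors."""
    data = sorted(values)
    n = len(data)
    location = (n + 1) * p / 100
    lower = int(location)          # whole-number part of the location
    fraction = location - lower    # fractional part used for interpolation
    # Assumption: locations outside 1..n fall back to the smallest or largest value.
    if lower < 1:
        return data[0]
    if lower >= n:
        return data[-1]
    below, above = data[lower - 1], data[lower]
    return below + fraction * (above - below)

prices = [86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85]
for p in (25, 50, 75):
    print(p, percentile(prices, p))
```

For the twelve stock prices this should give the same three values as the worked example: 79.75, 84.50, and 90.25.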
4-18

The interquartile range is the distance between the third quartile Q3 and the first quartile Q1.

This distance will include the middle 50 percent of the observations.

Interquartile range = Q3 - Q1

Interquartile Range
4-19

For a set of observations the third quartile is 24 and the first quartile is 10. What is the interquartile range?

The interquartile range is 24 - 10 = 14. Fifty percent of the observations will occur between 10 and 24.

Example 3
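
The same idea can be checked on the twelve stock prices from Example 2. The Python sketch below computes Q1 and Q3 with the (n + 1)P/100 location rule, takes their difference, and counts how many observations lie between them. The helper name `value_at` is hypothetical, and it assumes the location falls strictly inside the data.

```python
def value_at(data, location):
    """Interpolated value at a 1-based, possibly fractional, location in sorted data.

    Assumes 1 <= location < len(data); no bounds checking in this sketch.
    """
    lower = int(location)
    fraction = location - lower
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])

prices = sorted([86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85])
n = len(prices)
q1 = value_at(prices, (n + 1) * 25 / 100)
q3 = value_at(prices, (n + 1) * 75 / 100)
iqr = q3 - q1
inside = [p for p in prices if q1 <= p <= q3]  # observations between Q1 and Q3
print(iqr, len(inside) / n)
```

Six of the twelve prices fall between the two quartiles, which is the "middle 50 percent" described above.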
4-20

A box plot is a graphical display, based on quartiles, that helps to picture a set of data.

Five pieces of data are needed to construct a box plot: the Minimum Value, the First Quartile, the Median, the Third Quartile, and the Maximum Value.
Box Plots
4-21

Based on a sample of 20
deliveries,
Buddy’s Pizza determined the
following information. The
minimum delivery time was 13
minutes and the maximum 30
minutes. The first quartile was
15 minutes, the median 18
minutes, and the third quartile
22 minutes. Develop a box plot
for the delivery times.

Example 4
4-22

[Box plot of delivery times marking Min, Q1, Median, Q3, and Max on an axis from 12 to 32 minutes.]

Example 4 continued
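
As a rough illustration, the five values from the delivery-time example can be drawn as a one-line text box plot. The function name `text_box_plot` and the choice of characters are assumptions made for this Python sketch; it expects whole-number inputs, with one character per minute on the axis.

```python
def text_box_plot(minimum, q1, median, q3, maximum, axis_start, axis_end):
    """Draw a one-line box plot; each character is one unit on the axis."""
    line = [" "] * (axis_end - axis_start + 1)
    # Whiskers run from the minimum to the maximum.
    for x in range(minimum, maximum + 1):
        line[x - axis_start] = "-"
    # The box spans the first to the third quartile.
    for x in range(q1, q3 + 1):
        line[x - axis_start] = "="
    line[minimum - axis_start] = "|"
    line[maximum - axis_start] = "|"
    line[q1 - axis_start] = "["
    line[q3 - axis_start] = "]"
    line[median - axis_start] = "M"
    return "".join(line)

# Delivery times: min 13, Q1 15, median 18, Q3 22, max 30, axis 12 to 32.
plot = text_box_plot(13, 15, 18, 22, 30, 12, 32)
print(plot)
```

The right-hand whisker is much longer than the left-hand one, so the longer delivery times are more spread out than the shorter ones.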
4-24

Relative dispersion

The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:

$$CV = \frac{s}{\bar{X}}(100\%)$$

where $\bar{X}$ is the mean.
Coefficient of Variation
4-25

Skewness is the measurement of the lack of symmetry of the distribution.

The coefficient of skewness can range from -3.00 up to 3.00 when using the following formula:

$$sk = \frac{3(\bar{X} - \text{Median})}{s}$$

A value of 0 indicates a symmetric distribution.

Some software packages use a different formula which results in a wider range for the coefficient.
4-26

Using the twelve stock prices, we find the mean to be 84.42, standard deviation, 7.18, median, 84.5.

Coefficient of variation

$$CV = \frac{s}{\bar{X}}(100\%) = 8.5\%$$

Coefficient of skewness

$$sk = \frac{3(\bar{X} - \text{Median})}{s} = -.035$$

Example 2 revisited
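
These two coefficients can be recomputed with Python's standard `statistics` module. The sketch assumes that the reported 7.18 is the sample standard deviation (n - 1 divisor), which is what `statistics.stdev` computes.

```python
import statistics

prices = [86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85]

mean = statistics.mean(prices)
s = statistics.stdev(prices)        # sample standard deviation (n - 1 divisor)
median = statistics.median(prices)

cv = s / mean * 100                 # coefficient of variation, in percent
sk = 3 * (mean - median) / s        # coefficient of skewness

print(round(mean, 2), round(s, 2), median)
print(round(cv, 1), round(sk, 3))
```

The skewness is very close to 0 because the mean and the median are nearly equal, so the prices are close to symmetric.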
4-27

Scatter diagram: A technique used to show the relationship between variables.

Variables must be at least interval scaled.

Relationship can be positive (direct) or negative (inverse).

Example
The twelve days of stock prices and the overall market index on each day are given as follows:

Scatter diagram
4-28

Index (000s)   Price
8.0            96
7.5            92
7.5            91
7.3            88
7.2            86
7.2            85
7.1            84
7.1            83
7.0            82
6.2            79
6.2            78
5.1            69

[Scatter plot: Relationship between Market Index and Stock Price; x-axis Index (5 to 10), y-axis Price (50 to 100).]
Example 2 revisited
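
Whether the relationship is positive (direct) or negative (inverse) can also be checked informally in code. The Python sketch below compares every pair of days and counts whether index and price moved in the same or in opposite directions. This pairwise check is an illustration added here, not a measure defined in the chapter, and the function name `direction` is hypothetical.

```python
index = [8.0, 7.5, 7.5, 7.3, 7.2, 7.2, 7.1, 7.1, 7.0, 6.2, 6.2, 5.1]
price = [96, 92, 91, 88, 86, 85, 84, 83, 82, 79, 78, 69]

def direction(xs, ys):
    """Classify a relationship by comparing every pair of observations."""
    same = opposite = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if product > 0:
                same += 1        # both variables moved the same way
            elif product < 0:
                opposite += 1    # the variables moved in opposite ways
            # Pairs tied on either variable (product == 0) are ignored.
    if same > opposite:
        return "positive (direct)"
    if opposite > same:
        return "negative (inverse)"
    return "no clear direction"

print(direction(index, price))
```

For these twelve days, a higher index always goes with a higher price, which matches the upward pattern in the scatter diagram.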
4-29

A contingency table is
used to classify
observations according to
two identifiable
characteristics.
Contingency tables are used
when one or both variables are
nominally scaled.
A contingency table is a
cross tabulation that
simultaneously
summarizes two variables
of interest.
Contingency table
4-30

Weight Loss
45 adults, all 60 pounds
overweight, are randomly
assigned to three weight
loss programs. Twenty
weeks into the program, a
researcher gathers data on
weight loss and divides the
loss into three categories:
less than 20 pounds, 20 up
to 40 pounds, 40 or more
pounds. Here are the
results.
Example 5
4-31

Weight Loss Plan   Less than 20 pounds   20 up to 40 pounds   40 pounds or more
Plan 1             4                     8                    3
Plan 2             2                     12                   1
Plan 3             12                    2                    1

Compare the weight loss under the three plans.

Example 5 continued
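
One way to compare the plans is to convert each row of the contingency table to percentages of that plan's total. The Python sketch below does this for the counts above; the function name `row_percentages` and the shortened category labels are assumptions made for the example.

```python
categories = ["Less than 20", "20 up to 40", "40 or more"]
counts = {
    "Plan 1": [4, 8, 3],
    "Plan 2": [2, 12, 1],
    "Plan 3": [12, 2, 1],
}

def row_percentages(table):
    """Convert each plan's counts to percentages of that plan's total."""
    result = {}
    for plan, row in table.items():
        total = sum(row)
        result[plan] = [round(100 * count / total, 1) for count in row]
    return result

for plan, percentages in row_percentages(counts).items():
    print(plan, dict(zip(categories, percentages)))
```

Each plan has 15 adults, so the percentages can be compared directly: most adults in Plan 3 lost less than 20 pounds, while most in Plan 2 lost 20 up to 40 pounds.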
