
Slides by John Loucks
St. Edward’s University

© 2009 Thomson South-Western. All Rights Reserved Slide 1


Chapter 2, Part B
Descriptive Statistics:
Tabular and Graphical Presentations

■ Exploratory Data Analysis: Stem-and-Leaf Display
■ Crosstabulations and Scatter Diagrams


Exploratory Data Analysis

■ The techniques of exploratory data analysis consist of
simple arithmetic and easy-to-draw pictures that can
be used to summarize data quickly.
■ One such technique is the stem-and-leaf display.


Stem-and-Leaf Display

■ A stem-and-leaf display shows both the rank order
and shape of the distribution of the data.
■ It is similar to a histogram on its side, but it has the
advantage of showing the actual data values.
■ The first digits of each data item are arranged to the
left of a vertical line.
■ To the right of the vertical line we record the last
digit for each item in rank order.
■ Each line in the display is referred to as a stem.
■ Each digit on a stem is a leaf.


Example: Hudson Auto Repair

The manager of Hudson Auto would like to have a
better understanding of the cost of parts used in the
engine tune-ups performed in the shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed on the next
slide.


Example: Hudson Auto Repair

■ Sample of Parts Cost ($) for 50 Tune-ups

 91  78  93  57  75  52  99  80  97  62
 71  69  72  89  66  75  79  75  72  76
104  74  62  68  97 105  77  65  80 109
 85  97  88  68  83  68  71  69  67  74
 62  82  98 101  79 105  79  69  62  73


Stem-and-Leaf Display

 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9

Each row is a stem; each digit to the right of the vertical line is a leaf.
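
As a rough sketch of how the display above can be built: each cost is split into a stem (every digit but the last) and a leaf (the last digit), and the leaves on each stem are sorted. Python is an assumption here, since the slides name no software, and `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

# Parts costs ($) for the 50 tune-ups, copied from the data slide.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted leaves (last digit).

    Hypothetical helper; assumes non-negative whole numbers, leaf unit 1.
    """
    rows = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    return {stem: sorted(rows[stem]) for stem in sorted(rows)}


display = stem_and_leaf(costs)
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Because the leaves keep the last digit of every value, the original 50 costs can be read back from the printed rows, which a histogram does not allow.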



Stretched Stem-and-Leaf Display

■ If we believe the original stem-and-leaf display has
condensed the data too much, we can stretch the
display by using two stems for each leading digit(s).
■ Whenever a stem value is stated twice, the first value
corresponds to leaf values of 0 to 4, and the second
value corresponds to leaf values of 5 to 9.


Stretched Stem-and-Leaf Display

 5 | 2
 5 | 7
 6 | 2 2 2 2
 6 | 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4
 7 | 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3
 8 | 5 8 9
 9 | 1 3
 9 | 7 7 7 8 9
10 | 1 4
10 | 5 5 9
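
The stretched display can be sketched the same way, with each stem split into a lower half (leaves 0 to 4) and an upper half (leaves 5 to 9). Python and the helper name `stretched_stem_and_leaf` are assumptions for illustration only.

```python
from collections import defaultdict

# Parts costs ($) for the 50 tune-ups, copied from the data slide.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def stretched_stem_and_leaf(values):
    """Return (stem, half, leaves) rows in display order.

    half is 0 for leaves 0-4 and 1 for leaves 5-9. Halves with no data
    are simply omitted in this sketch. Assumes non-negative whole numbers.
    """
    rows = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        rows[(stem, leaf // 5)].append(leaf)
    return [(stem, half, sorted(rows[(stem, half)]))
            for stem, half in sorted(rows)]


stretched = stretched_stem_and_leaf(costs)
for stem, _half, leaves in stretched:
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```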



Stem-and-Leaf Display

■ Leaf Units
• A single digit is used to define each leaf.
• In the preceding example, the leaf unit was 1.
• Leaf units may be 100, 10, 1, 0.1, and so on.
• Where the leaf unit is not shown, it is assumed
to equal 1.


Example: Leaf Unit = 0.1

If we have data with values such as

8.6 11.7 9.4 9.1 10.2 11.0 8.8

a stem-and-leaf display of these data will be

Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7


Example: Leaf Unit = 10

If we have data with values such as

1806 1717 1974 1791 1682 1910 1838

a stem-and-leaf display of these data will be

Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7

The 82 in 1682 is rounded down to 80 and is represented as an 8.
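
One way to handle an arbitrary leaf unit is to divide each value by the unit, drop any remainder (the "rounded down" step), and then split off the last digit as the leaf. The sketch below is an assumption about how this might be coded, not something the slides prescribe. It uses `Decimal` so that values such as 8.6 divide by 0.1 exactly, and `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict
from decimal import Decimal


def stem_and_leaf(values, leaf_unit=1):
    """Map stems to sorted leaves for a given leaf unit.

    Hypothetical helper; assumes non-negative values. Digits smaller
    than the leaf unit are truncated (1682 with unit 10 becomes 168).
    """
    unit = Decimal(str(leaf_unit))
    rows = defaultdict(list)
    for value in values:
        scaled = int(Decimal(str(value)) / unit)  # int() truncates
        stem, leaf = divmod(scaled, 10)
        rows[stem].append(leaf)
    return {stem: sorted(rows[stem]) for stem in sorted(rows)}


tenths = stem_and_leaf([8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8], leaf_unit=0.1)
tens = stem_and_leaf([1806, 1717, 1974, 1791, 1682, 1910, 1838], leaf_unit=10)

for label, display in (("Leaf Unit = 0.1", tenths), ("Leaf Unit = 10", tens)):
    print(label)
    for stem, leaves in display.items():
        print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Plain floating-point division would be fragile here: `8.6 / 0.1` evaluates to slightly less than 86 in binary floating point, so truncating it would give the wrong leaf.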



Crosstabulations and Scatter Diagrams

■ Thus far we have focused on presentations that are
used to summarize the data for one variable at a time.
■ Often a manager is interested in presentations that
will help in understanding the relationship between
two variables.
■ Crosstabulation and a scatter diagram are two
methods for summarizing the data for two variables
simultaneously.


Crosstabulation

■ A crosstabulation is a tabular summary of data for
two variables.
■ Crosstabulation can be used when:
• one variable is qualitative and the other is
quantitative,
• both variables are qualitative, or
• both variables are quantitative.
■ The left and top margin labels define the classes for
the two variables.


Crosstabulation

■ Example: Finger Lakes Homes

The number of Finger Lakes homes sold for each
style and price for the past two years is shown below.
Price range is the quantitative variable; home style is
the qualitative variable.

Price                     Home Style
Range        Colonial    Log    Split    A-Frame    Total
≤ $99,000       18         6      19       12         55
> $99,000       12        14      16        3         45
Total           30        20      35       15        100


Crosstabulation

■ Insights Gained from Preceding Crosstabulation
• The largest group of homes in the sample (19) is
split-level in style and priced at less than or
equal to $99,000.
• Only three homes in the sample are A-Frame in
style and priced at more than $99,000.
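
These two insights are simply the largest and smallest cells of the crosstabulation. A minimal sketch of reading them off programmatically follows; Python and the variable names are assumptions, and the counts are copied from the Finger Lakes table.

```python
# Cell counts copied from the Finger Lakes Homes crosstabulation.
styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "<= $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Flatten the table into {(price range, style): count}.
cells = {
    (price, style): n
    for price, row in counts.items()
    for style, n in zip(styles, row)
}

largest = max(cells, key=cells.get)
smallest = min(cells, key=cells.get)

print("Largest cell: ", largest, cells[largest])
print("Smallest cell:", smallest, cells[smallest])
```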



Crosstabulation

Price                     Home Style
Range        Colonial    Log    Split    A-Frame    Total
≤ $99,000       18         6      19       12         55
> $99,000       12        14      16        3         45
Total           30        20      35       15        100

The Total column (55, 45) is the frequency distribution
for the price variable.
The Total row (30, 20, 35, 15) is the frequency distribution
for the home style variable.
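
The two marginal frequency distributions are just the row sums and column sums of the cell counts. A short sketch, assuming Python and illustrative variable names:

```python
# Cell counts copied from the Finger Lakes Homes crosstabulation.
styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "<= $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Row sums: frequency distribution for the price variable.
row_totals = {price: sum(row) for price, row in counts.items()}

# Column sums: frequency distribution for the home style variable.
col_totals = dict(zip(styles, (sum(col) for col in zip(*counts.values()))))

grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```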



Crosstabulation: Row or Column Percentages

■ Converting the entries in the table into row
percentages or column percentages can provide
additional insight about the relationship between
the two variables.


Crosstabulation: Row Percentages

Price                     Home Style
Range        Colonial    Log    Split    A-Frame    Total
≤ $99,000     32.73     10.91   34.55     21.82      100
> $99,000     26.67     31.11   35.56      6.67      100

Note: row totals are actually 100.01 due to rounding.

(Colonial and > $99K)/(All > $99K) x 100 = (12/45) x 100 = 26.67
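
The row percentages can be reproduced by dividing each cell by its row total. The sketch below assumes Python and a hypothetical helper `row_percentages`; it also confirms the rounding note, since the rounded entries in each row add to 100.01.

```python
# Cell counts copied from the Finger Lakes Homes crosstabulation.
styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "<= $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}


def row_percentages(table):
    """Express each cell as a percentage of its row total (2 decimals)."""
    result = {}
    for label, row in table.items():
        total = sum(row)
        result[label] = [round(100 * n / total, 2) for n in row]
    return result


row_pct = row_percentages(counts)
for label, row in row_pct.items():
    print(label, dict(zip(styles, row)), "sum:", round(sum(row), 2))
```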



Crosstabulation: Column Percentages

Price                     Home Style
Range        Colonial    Log    Split    A-Frame
≤ $99,000     60.00     30.00   54.29     80.00
> $99,000     40.00     70.00   45.71     20.00
Total          100       100     100       100

(Colonial and > $99K)/(All Colonial) x 100 = (12/30) x 100 = 40.00
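
Column percentages follow the same pattern, dividing each cell by its column total instead. Again a sketch under the assumption of Python, with `column_percentages` as a hypothetical helper name:

```python
# Cell counts copied from the Finger Lakes Homes crosstabulation.
styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "<= $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}


def column_percentages(table, labels):
    """Map each column label to its cells as percentages of the column total.

    Values are listed in the same order as the rows of the table.
    """
    columns = zip(*table.values())
    return {
        label: [round(100 * n / sum(col), 2) for n in col]
        for label, col in zip(labels, columns)
    }


col_pct = column_percentages(counts, styles)
for style, column in col_pct.items():
    print(style, dict(zip(counts, column)))
```

Read this way, 70% of log homes sold for more than $99,000, while only 20% of A-Frame homes did.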



Crosstabulation: Simpson’s Paradox

■ Data in two or more crosstabulations are often
aggregated to produce a summary crosstabulation.
■ We must be careful in drawing conclusions about the
relationship between the two variables in the
aggregated crosstabulation.
■ Simpson’s Paradox: In some cases the conclusions
based upon an aggregated crosstabulation can be
completely reversed if we look at the unaggregated
data.
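
To make the reversal concrete, the sketch below uses HYPOTHETICAL counts invented purely for illustration (they do not come from these slides). Group A has the higher success rate in each of two separate crosstabulations, yet group B has the higher rate once the two are aggregated. Python and all names are assumptions.

```python
# HYPOTHETICAL counts, invented for illustration only (not from the slides).
# Each entry is (successes, trials) for one group within one subtable.
data = {
    "A": {"subtable 1": (29, 32), "subtable 2": (100, 118)},
    "B": {"subtable 1": (90, 100), "subtable 2": (20, 25)},
}


def rate(successes, trials):
    """Success rate within a single subtable."""
    return successes / trials


def aggregated_rate(group):
    """Success rate after pooling the group's subtables together."""
    successes = sum(s for s, _ in data[group].values())
    trials = sum(t for _, t in data[group].values())
    return successes / trials


for subtable in data["A"]:
    a = rate(*data["A"][subtable])
    b = rate(*data["B"][subtable])
    print(f"{subtable}: A = {a:.3f}, B = {b:.3f}")

print(f"aggregated: A = {aggregated_rate('A'):.3f}, "
      f"B = {aggregated_rate('B'):.3f}")
```

The reversal arises because the groups are spread very differently across the subtables: most of A's trials fall in the subtable with the lower rates, and most of B's fall in the one with the higher rates.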



Scatter Diagram and Trendline

■ A scatter diagram is a graphical presentation of the
relationship between two quantitative variables.
■ One variable is shown on the horizontal axis and the
other variable is shown on the vertical axis.
■ The general pattern of the plotted points suggests the
overall relationship between the variables.
■ A trendline is an approximation of the relationship.


Scatter Diagram and Trendline

■ A Positive Relationship [figure not reproduced]
■ A Negative Relationship [figure not reproduced]
■ No Apparent Relationship [figure not reproduced]


Example: Panthers Football Team

■ Scatter Diagram and Trendline

The Panthers football team is interested in
investigating the relationship, if any, between
interceptions made and points scored.

x = Number of      y = Number of
Interceptions      Points Scored
      1                 14
      3                 24
      2                 18
      1                 17
      3                 30


Scatter Diagram and Trendline

[Figure: scatter diagram with trendline. Horizontal axis: x = Number of Interceptions (0 to 4). Vertical axis: y = Number of Points Scored (0 to 35).]


Example: Panthers Football Team

■ Insights Gained from the Preceding Scatter Diagram
• The scatter diagram and trendline indicate a
positive relationship between the number of
interceptions and the number of points scored.
• Higher points scored are associated with a higher
number of interceptions.
• The relationship is not perfect; not all plotted
points in the scatter diagram lie on a straight line.
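
The slides do not say how the trendline was drawn. One common choice, assumed here, is a least-squares line. The sketch below fits one to the five Panthers games; Python and the helper name `least_squares_line` are assumptions for illustration.

```python
# Panthers data copied from the example slide.
interceptions = [1, 3, 2, 1, 3]
points = [14, 24, 18, 17, 30]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data.

    Assumption: the slides' trendline may have been drawn another way.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(interceptions, points)
print(f"y = {intercept:.2f} + {slope:.2f}x")
```

Under this assumption the slope is about 5.75 points per interception, and its positive sign matches the positive relationship noted above. With only five games, the fit is a rough summary and nothing more.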



Tabular and Graphical Procedures

Data

■ Qualitative Data
  Tabular Methods
  • Frequency Distribution
  • Relative Freq. Distribution
  • Percent Freq. Distribution
  • Crosstabulation
  Graphical Methods
  • Bar Graph
  • Pie Chart

■ Quantitative Data
  Tabular Methods
  • Frequency Dist.
  • Rel. Freq. Dist.
  • % Freq. Dist.
  • Cum. Freq. Dist.
  • Cum. Rel. Freq. Distribution
  • Cum. % Freq. Distribution
  • Crosstabulation
  Graphical Methods
  • Dot Plot
  • Histogram
  • Ogive
  • Stem-and-Leaf Display
  • Scatter Diagram


End of Chapter 2, Part B
