You are on page 1of 59

Chapter Two

Graphical and Tabular


Descriptive Techniques

2.1
Definitions…
A variable is some characteristic of a population or sample.
E.g. student grades.

The values of the variable are the range of possible values


for a variable.
E.g. student marks (0..100)

Data are the observed values of a variable.


E.g. student marks: {67, 74, 71, 83, 93, 55, 48}

Copyright © 2009 Cengage Learning 2.2


Interval Data…
• Real numbers, i.e. heights, weights, prices, etc.
• Also referred to as quantitative or numerical.

Arithmetic operations can be performed on Interval Data,

Copyright © 2009 Cengage Learning 2.3


Nominal Data…
• The values of nominal data are categories.
E.g. responses to questions about marital status, coded
as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4

Arithmetic operations don’t make any sense (e.g. does


Widowed ÷ 2 = Married?!)

Nominal data are also called qualitative or categorical.

Copyright © 2009 Cengage Learning 2.4


Ordinal Data…
Ordinal Data appear to be categorical in nature, but their
values have an order; a ranking to them:

E.g. College course rating system:


poor = 1, fair = 2, good = 3, very good = 4, excellent = 5

Still not meaningful to do arithmetic on this data


We can say:
excellent > poor or fair < very good
Order is maintained no matter what numeric values are
assigned to each category.
Copyright © 2009 Cengage Learning 2.5
Hierarchy of Data…
Interval
Values are real numbers.
All calculations are valid.
Data may be treated as ordinal or nominal.

Ordinal
Values must represent the ranked order of the data.
Data may be treated as nominal but not as interval.

Nominal
Values are the arbitrary numbers that represent categories.
Data may not be treated as ordinal or interval.

Copyright © 2009 Cengage Learning 2.6


Part A:
SUMMARIZING UNIVARIATE DATA

Copyright © 2009 Cengage Learning 2.7


Graphical & Tabular Techniques for Nominal Data…
The only allowable calculation on nominal data is to count
the frequency of each value of the variable.

We can summarize the data in a table that presents the


categories and their counts called a frequency distribution.

A relative frequency distribution lists the categories and the


proportion with which each occurs.

Copyright © 2009 Cengage Learning 2.8


Light Beer Brand Frequency Relative Frequency

Budweiser Light 90 31.6%

Busch Light 19 6.7

Coors Light 62 21.8

Michelob Light 13 4.6

Miller Lite 59 20.7

Natural Light 25 8.8

Other brands 17 6.0

Copyright © 2009 Cengage Learning 2.9


Nominal Data (Frequency)
Light Beer Brand Frequency Relative Frequency

100 Budweiser Light 90 31.6%


90
90 Busch Light 19 6.7

Coors Light 62 21.8


80
Michelob Light 13 4.6
70 62
59 Miller Lite 59 20.7
60
Natural Light 25 8.8
50
Other brands 17 6.0
40 285

30 25
19 17
20 13
10
0
1 2 3 4 5 6 7

Bar Charts show frequencies…


Copyright © 2009 Cengage Learning 2.10
Nominal Data (Relative Frequency)
7
6%
6
9%
1
31%

5
21%

2
7%
4
4%
3
22%
Pie Charts show relative frequencies…
Copyright © 2009 Cengage Learning 2.11
Nominal Data
Light Beer Brand Frequency Relative Frequency

Budweiser Light 90 31.6%

Busch Light 19 6.7


It all the same information,
Coors Light

Michelob Light
62

13
21.8

4.6
(based on the same data).
Miller Lite 59 20.7
Just different presentation.
Natural Light 25 8.8

Other brands 17 6.0


100
90
90
80
7
6% 70 62
6 59
60
9%
1
50
31%
40
30 25
5 19 17
20 13
21%
10
0
2
1 2 3 4 5 6 7
7%
4
4%
3
22%

Copyright © 2009 Cengage Learning 2.12


Ordinal Data
Same as the nominal data but
• the bars in the bar chart are arranged in ascending or
descending order.
• The wedges in the pie chart are arranged clockwise in
ascending or descending order.

Copyright © 2009 Cengage Learning 2.13


Graphical Techniques for Interval Data
There are several graphical methods that are used when the
data are interval.

The most important of these graphical methods is the


histogram.

Copyright © 2009 Cengage Learning 2.14


Example
For nominal data we created a frequency distribution of the
categories.
For interval data, we create a frequency distribution by
counting the number of observations that fall into a series of
intervals, called classes.

Copyright © 2009 Cengage Learning 2.15


Example
These data are stored in file Xm02-04
These classes are defined as follows:
Classes
Amounts that are less than or equal to 15
Amounts that are more than 15 but less than or equal to 30
Amounts that are more than 30 but less than or equal to 45
Amounts that are more than 45 but less than or equal to 60
Amounts that are more than 60 but less than or equal to 75
Amounts that are more than 75 but less than or equal to 90
Amounts that are more than 90 but less than or equal to 105
Amounts that are more than 105 but less than or equal to 120

Copyright © 2009 Cengage Learning 2.16


Example
Histogram

80
70
60
Frequency

50
40
30
20
10
0
15 30 45 60 75 90 105 120
Bills

Copyright © 2009 Cengage Learning 2.17


Interpret…

(18+28+14=60)÷200 = 30%
about half (71+37=108)
i.e. nearly a third of the phone bills
of the bills are “small”,
are $90 or more.
i.e. less than $30

There are only a few telephone


bills in the middle range.

Copyright © 2009 Cengage Learning 2.18


Building a Histogram…
a) Determine the number of classes

With 200 observations,


we should have
between 7 & 10
classes…

Alternative, we could use Sturges’ formula:


Number of class intervals = 1 + 3.3 log (n)

Copyright © 2009 Cengage Learning 2.19


Building a Histogram…
a) Determine the number of classes to use. [8]
b) Determine how large to make each class…

Look at the range of the data, that is,


Range = Largest Observation – Smallest Observation
Range = $119.63 – $0 = $119.63
Then each class width becomes:
Range ÷ (# classes) = 119.63 ÷ 8 ≈ 15

Copyright © 2009 Cengage Learning 2.20


Building a Histogram…

Copyright © 2009 Cengage Learning 2.21


Building a Histogram…

In Excel, use Data, Data Analysis, Histogram


Inputs: Input Range, Bin Range

Copyright © 2009 Cengage Learning 2.22


Shapes of Histograms…
Symmetry
When we draw a vertical line down the center of the
histogram, the two sides are identical in shape and size:
Frequency

Frequency

Frequency
Variable Variable Variable

Copyright © 2009 Cengage Learning 2.23


Shapes of Histograms…
Skewness
A long tail extending to either the right or the left:
Frequency

Frequency
Variable Variable

Positively Skewed Negatively Skewed

Copyright © 2009 Cengage Learning 2.24


Shapes of Histograms…
Modality
A unimodal histogram: a single peak,
A bimodal histogram: two peaks:

Bimodal
Unimodal
Frequency

Frequency
Variable Variable

A modal class
Copyright © 2009 Cengage Learning 2.25
Shapes of Histograms…
A special type of symmetric unimodal histogram is one that
is bell shaped:

Frequency

Variable

Bell Shaped
Copyright © 2009 Cengage Learning 2.26
Stem & Leaf Display…
• Retains information about individual observations that
would normally be lost in the creation of a histogram.

• Split each observation into two parts, a stem and a leaf:

• e.g. Observation value: 42.19


• There are several ways to split it up…
Stem Leaf
42 19
• We could split it at the decimal point:
4 2
• Or split it at the “tens” position (while rounding to the
nearest integer in the “ones” position)
Copyright © 2009 Cengage Learning 2.27
Stem & Leaf Display…
• Use the “stems” for the classes
• Each leaf becomes part of the histogram

Stem Leaf
0 0000000000111112222223333345555556666666778888999999
1 000001111233333334455555667889999
2 0000111112344666778999
3 001335589
4 124445589
5 33566
6 3458
7 022224556789
8 334457889999 Thus, we still have access to our
9 00112222233344555999 original data point’s value!
10 001344446699
11 124557889

Copyright © 2009 Cengage Learning 2.28


Histogram and Stem & Leaf…

Compare the overall shapes of the figures…


Copyright © 2009 Cengage Learning 2.29
Ogive…
• (pronounced “Oh-jive”)
• a graph of a cumulative frequency distribution.

Copyright © 2009 Cengage Learning 2.30


Relative Frequencies…

Copyright © 2009 Cengage Learning 2.31


Cumulative Relative Frequencies…
Cumulative relative frequencies =
the current class’ relative frequency
+ the previous class’ cumulative relative frequency.

first class…
next class: .355+.185=.540

:
:

last class: .930+.070=1.00


Copyright © 2009 Cengage Learning 2.32
Ogive…
Graph the cumulative relative frequencies…

Copyright © 2009 Cengage Learning 2.33


Ogive…

The ogive can be used to


answer questions like:

What value is at the 50th


percentile?

“around $35”
(Refer also to Fig. 2.13 in your textbook)
Copyright © 2009 Cengage Learning 2.34
Describing Time Series Data
Observations measured at successive points in time are
called time-series data.

Line chart plots the value of the variable on the vertical axis
against time periods on the horizontal axis.

Copyright © 2009 Cengage Learning 2.35


the monthly average retail price of gasoline

3.5

2.5

1.5

0.5

0
1 25 49 73 97 121 145 169 193 217 241 265 289 313 337

Copyright © 2009 Cengage Learning 2.36


Price of Gasoline in 1982-84 Constant Dollars

1.800
1.600
1.400
1.200
1.000
0.800
0.600
0.400
0.200
0.000
1 25 49 73 97 121 145 169 193 217 241 265 289 313 337

Copyright © 2009 Cengage Learning 2.37


Copyright © 2009 Cengage Learning 2.38
15
Frequency

10

0
30 36 42 48 54 60 66

Ages

Copyright © 2009 Cengage Learning 2.39


Part B
SUMMARIZING BIVARIATE DATA

❖ Cross – tabulation y
❖ Scatter Diagram

x
Crosstabulations and Scatter
Diagrams
◼ So far we have focused on methods that are used
to summarize the data for one variable at a time.
◼ Often a manager is interested in tabular and
graphical methods that will help understand the
relationship between two variables.
◼ Cross-tabulation and a scatter diagram are two
methods for summarizing the data for two (or more)
variables simultaneously.

Copyright © 2009 Cengage Learning


Crosstabulation

◼ A crosstabulation is a tabular summary of data for


two variables.

◼ The left and top margin labels define the classes for
the two variables.

Copyright © 2009 Cengage Learning


Crosstabulation

Example: Finger Lakes Homes


The number of Finger Lakes homes sold for each style
and price for the past two years is shown below.

Price Home Style


Range Colonial Log Split A-Frame Total
< $99,000 18 6 19 12 55
> $99,000 12 14 16 3 45
Total 30 20 35 15 100

price range: ordinal:interval data transformed to ordinal


homestyle: ordinal (??)

Copyright © 2009 Cengage Learning


Crosstabulation
• Insights Gained from Preceding Crosstabulation

• The greatest number of homes in the sample (19)


are a split-level style and priced at less than or
equal to $99,000.
• Only three homes in the sample are an A-Frame
style and priced at more than $99,000.
effects between prices and numbers of houses sold
eg. A frame ppl prefer lower-cost houses

Copyright © 2009 Cengage Learning


Crosstabulation

Frequency distribution
for the price variable

Price Home Style


Range Colonial Log Split A-Frame Total
< $99,000 18 6 19 12 55
> $99,000 12 14 16 3 45
Total 30 20 35 15 100

Frequency distribution
for the home style variable

Copyright © 2009 Cengage Learning


Cross-tabulation: Row or Column Percentages

• Converting the entries in the table into row


percentages or column percentages can provide
additional insight about the relationship between
the two variables.

Copyright © 2009 Cengage Learning


Crosstabulation: Row Percentages

Price Home Style


Range Colonial Log Split A-Frame Total
< $99,000 32.73 10.91 34.55 21.82 100
> $99,000 26.67 31.11 35.56 6.67 100
Note: row totals are actually 100.01 due to rounding.

(Colonial and > $99K)/(All >$99K) x 100 = (12/45) x 100

Copyright © 2009 Cengage Learning


Crosstabulation: Column Percentages

Price Home Style


Range Colonial Log Split A-Frame
< $99,000 60.00 30.00 54.29 80.00
> $99,000 40.00 70.00 45.71 20.00

Total 100 100 100 100

(Colonial and > $99K)/(All Colonial) x 100 = (12/30) x 100

Copyright © 2009 Cengage Learning


Scatter Diagram and Trendline

◼ A scatter diagram is a graphical presentation of the


relationship between two quantitative variables.
◼ One variable is shown on the horizontal axis and the
other variable is shown on the vertical axis.
◼ The general pattern of the plotted points suggests the
overall relationship between the variables.
◼ A trendline is an approximation of the relationship.

Copyright © 2009 Cengage Learning


Scatter Diagram
• A Positive Relationship
y

Copyright © 2009 Cengage Learning


Scatter Diagram
• A Negative Relationship
y

Copyright © 2009 Cengage Learning


Scatter Diagram
• No Apparent Relationship
y

Copyright © 2009 Cengage Learning


Example: Panthers Football Team
• Scatter Diagram
The Panthers football team is interested
in investigating the relationship, if any,
between interceptions made and points scored.

x = Number of y = Number of
Interceptions Points Scored
1 14
3 24
2 18
1 17
3 30

Copyright © 2009 Cengage Learning


Scatter Diagram
y
35
Number of Points Scored

30
25
20
15
10
5
0 x
0 1 2 3 4
Number of Interceptions

Copyright © 2009 Cengage Learning


Example: Panthers Football Team

n Insights Gained from the Preceding Scatter Diagram


• The scatter diagram indicates a positive relationship
between the number of interceptions and the
number of points scored.
• Higher points scored are associated with a higher
number of interceptions.
• The relationship is not perfect; all plotted points in
the scatter diagram are not on a straight line.

Copyright © 2009 Cengage Learning


Summary I…
Factors That Identify When to Use Frequency and Relative Frequency Tables, Bar and Pie
Charts
1. Objective: Describe a single set of data.
2. Data type: Nominal

Factors That Identify When to Use a Histogram, Ogive, or Stem-and-Leaf Display


1. Objective: Describe a single set of data.
2. Data type: Interval

Factors that Identify When to Use a Cross-classification Table


1. Objective: Describe the relationship between two variables.
2. Data type: Nominal

Factors that Identify When to Use a Scatter Diagram


1. Objective: Describe the relationship between two variables.
2. Data type: Interval

Copyright © 2009 Cengage Learning 2.56


Summary II…

Interval Nominal
Data Data
Histogram Frequency and
Single Set of Relative Frequency
Data Tables, Bar and Pie
Charts
Relationship Scatter Diagram Bar Charts
Between
Two Variables

Copyright © 2009 Cengage Learning 2.57


Copyright © 2009 Cengage Learning 2.58
Copyright © 2009 Cengage Learning 2.59

You might also like