Chapters 2 and 3

Describing Data Using Tables and Graphs: Outline


1. Types of Variables
2. Tabular Techniques for One Qualitative Variable: Frequency and Relative Frequency Table
3. Graphical Presentation for One Qualitative Variable: Pie Chart and Bar Chart
4. Tabular Techniques for Two Qualitative Variables: Cross-Classification Table
5. Graphical Presentation for Two Qualitative Variables: Clustered Bar Chart
6. Graphical Presentation for Quantitative Data: Histogram, Ogive, Dot Plot, Box Plot (next chapter), Index Plot (line chart)
7. Graphical Presentation for Two Quantitative Variables: Scatter Plot

MACT 2222: Statistics for Business 2.1


Types of Variables

Types of variables:
• Quantitative (numeric): Interval
• Qualitative (numeric or non-numeric): Nominal, Ordinal

We need to know the type of variables in order to
choose the appropriate way to analyze the data.
1. Quantitative (Interval) Data:
a) Values are real numbers.
b) They have measurement units (e.g., dollars, pounds, miles).
c) All calculations are valid.
Examples: number of children in a family, number of shares outstanding,
exchange rates, stock prices.

2. Qualitative Ordinal Data:
a) Values can be ordered or ranked.
b) Values can be numeric or non-numeric.
c) Calculations based on an ordering process are valid.
Examples: financial ratings, letter grades, position in a race.

3. Qualitative Nominal Data:
a) Values are names or labels of objects (there is no
natural ordering of the values at all).
b) Values can be numeric or non-numeric.
c) Only calculations based on the frequencies of occurrence
are valid.
Examples: gender, marital status.
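
As a minimal sketch of what "valid calculations" can mean for each type of variable, the Python snippet below uses three small samples. All values, and the ranking assumed for the letter grades, are made up for illustration and are not course data.

```python
from collections import Counter
from statistics import mean

# Hypothetical samples, invented for illustration only.
stock_prices = [12.5, 14.0, 13.25, 15.75]            # quantitative (interval)
letter_grades = ["B", "A", "C", "B", "D", "B", "A"]   # qualitative ordinal
marital_status = ["single", "married", "married", "divorced", "married"]  # nominal

# Interval data: all calculations are valid, e.g. the average.
average_price = mean(stock_prices)

# Ordinal data: calculations based on ordering are valid, e.g. the middle value.
grade_order = ["D", "C", "B", "A"]  # assumed ranking, worst to best
ranked = sorted(letter_grades, key=grade_order.index)
middle_grade = ranked[len(ranked) // 2]

# Nominal data: only frequency counts are valid, e.g. the most common label.
status_counts = Counter(marital_status)
most_common_status = status_counts.most_common(1)[0][0]

print(average_price, middle_grade, most_common_status)
```

Averaging the letter grades or the marital-status labels would not be meaningful, which is why only the ordering (for grades) and the counts (for labels) are used.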

Describing and Summarizing Data

Data can be summarized using descriptive statistics, which take three forms:
1. Numerical measures: such as measures of location and measures
of variability.
2. Tables: data can also be described in tables and/or graphs.
3. Graphs: which are more attractive than numbers and tables.

Describing and Summarizing Data for One Variable

Tabular and Graphical Techniques for One Variable

Qualitative variable
• Tabular techniques: 1. Frequency Table; 2. Relative Frequency Table
• Graphical techniques: 1. Bar Chart; 2. Pie Chart

Quantitative variable
• Tabular techniques: 1. Frequency Table; 2. Relative Frequency Table; 3. Cumulative Frequency/Relative Frequency Table
• Graphical techniques: 1. Histogram; 2. Dot Plot; 3. Ogive; 4. Box Plot; 5. Line Chart for one variable changing over time

Tabular and Graphical Techniques for Qualitative Variables

Tabular Techniques for Qualitative Data

• An easy way to initially summarize the data is with
a frequency distribution (table), which simply lists each
main category in the data set, along with the
corresponding number of occurrences (counts) within
each category.
• A relative frequency distribution lists the categories
and the proportion with which each occurs.
• Classes are mutually exclusive (disjoint) and
collectively exhaustive: every observation is counted once and
only once.

Graphical Techniques

A well-designed graph of the data will help you:
• Discover things that are not likely revealed in
numerical summaries or tables (e.g., unusual
or possibly wrong observations)
• See patterns and relationships that may be
hidden in the data
• Understand the distribution of your data
(center, shape, and spread)
• Explain your data to others (graphs are easy to
understand)

Example 1: Frequency and Relative Frequency Distributions (Tables)

Take a bag of plain M&M candies (the ones in the
brown package). If you open the bag and simply dump
the M&Ms into a bowl, you see lots of colors, but no
underlying patterns. If, however, you divide the M&Ms
into a separate category for each color (Brown, Red, Blue,
Yellow, Green, and Orange), you can count the number of
M&Ms of each color. A frequency distribution for a bag of
M&Ms might look like:

Example 1

COLOR     FREQUENCY (NUMBER)   RELATIVE FREQUENCY
Brown     14                   0.24 ( = 14/58 )
Red       12                   0.21 ( = 12/58 )
Blue      11                   0.19 ( = 11/58 )
Yellow     9                   0.16 ( = 9/58 )
Green      8                   0.14 ( = 8/58 )
Orange     4                   0.07 ( = 4/58 )

These are nominal data. Nominal data can be described by:
• Frequency and relative frequency distribution tables.
• Graphically, using a bar chart and/or a pie chart.
• The bar or pie chart shows which classes have the highest and
lowest frequencies.
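
The relative frequencies in the M&M table can be reproduced with a few lines of Python. This is only a sketch: the counts are copied from the table, and the variable names are chosen here for illustration.

```python
# Frequencies copied from the M&M table in Example 1.
frequencies = {"Brown": 14, "Red": 12, "Blue": 11, "Yellow": 9, "Green": 8, "Orange": 4}

total = sum(frequencies.values())  # 58 candies in the bag

# Relative frequency = class frequency / total number of observations.
relative = {color: count / total for color, count in frequencies.items()}

for color, count in frequencies.items():
    print(f"{color:<8}{count:>4}{relative[color]:>8.2f}")

# Classes with the highest and lowest frequencies
# (what a bar chart or pie chart shows at a glance).
most_common = max(frequencies, key=frequencies.get)
least_common = min(frequencies, key=frequencies.get)
print(most_common, least_common)
```

The unrounded relative frequencies sum to 1. The rounded values in the table sum to 1.01, which is only a rounding effect.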

Graphical Presentation for One Qualitative Variable

Same information (based on the same data), just a different presentation.

[Figures: pie chart of M&M candies; bar chart of M&M candies]

Example 2

Hospital Unit     Number of Patients
Cardiac Care      1,052
Emergency         2,245
Intensive Care      340
Maternity           552
Surgery           4,630

To comment on the graph:
1. The class with the highest frequency is Surgery (the mode).
2. The class with the lowest frequency is Intensive Care.

Describing and Summarizing Data for Two Variables

Graphical techniques for two variables:
• 2 qualitative variables: Clustered Bar Chart
• 2 quantitative variables: Scatter Plot

Tabular Techniques for Two Qualitative Variables

• A cross-classification table (or cross-tabulation
table or contingency table) is used to describe
the relationship between two qualitative
variables or two categorized quantitative
variables.
• A cross-classification table lists the frequency of
each combination of the values of the two
variables.

Example 3
In a major North American city there are four
competing newspapers: the Post, Globe and Mail,
Sun, and Star.
To help design advertising campaigns, the
advertising managers of the newspapers need to
know which segments of the newspaper market
are reading their papers.
A survey was conducted to analyze the
relationship between newspapers read and
occupation.

Example 3
A sample of newspaper readers was asked to
report which newspaper they read:
1 = G&M (Globe and Mail),
2 = Post,
3 = Star, or
4 = Sun,
and to indicate whether they were:
1 = Blue-collar worker,
2 = White-collar worker, or
3 = Professional.

Example 3
By counting the number of times each of the 12
combinations occurs, we produced the following
cross-classification table (contingency table).

                      Occupation
Newspaper   Blue Collar   White Collar   Professional   Total
G&M              27            29             33           89
Post             18            43             51          112
Star             38            21             22           81
Sun              37            15             20           72
Total           120           108            126          354
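
The counting step that produces such a table can be sketched in Python. The codes follow the scheme given in the example (1 = G&M, 2 = Post, 3 = Star, 4 = Sun; 1 = Blue collar, 2 = White collar, 3 = Professional), but the short list of responses below is hypothetical and is not the survey data.

```python
from collections import Counter

newspapers = {1: "G&M", 2: "Post", 3: "Star", 4: "Sun"}
occupations = {1: "Blue collar", 2: "White collar", 3: "Professional"}

# Hypothetical coded responses (newspaper code, occupation code); not the survey data.
responses = [(1, 3), (2, 2), (3, 1), (4, 1), (2, 3), (2, 3), (3, 1), (1, 2)]

# Count how often each (newspaper, occupation) combination occurs.
cell_counts = Counter(responses)

# Print the cross-classification table row by row, with row totals.
for n_code, n_name in newspapers.items():
    row = [cell_counts[(n_code, o_code)] for o_code in occupations]
    print(f"{n_name:<6}", row, "total:", sum(row))

# Column totals, one per occupation.
column_totals = [sum(cell_counts[(n, o)] for n in newspapers) for o in occupations]
print("Total ", column_totals, "total:", sum(column_totals))
```

A `Counter` returns 0 for combinations that never occur, so empty cells need no special handling.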

Example 3
If occupation and newspaper are related, then there will be
differences in the newspapers read among the occupations.
An easy way to see this is to convert the frequencies in each column to
column relative frequencies: compute the column totals and divide each
frequency by its column total. Then compare the relative frequencies
across each row. If they differ noticeably in at least one row, this
suggests that there is a relation between the variables. If they are
close within every row, there is no apparent relation.

                      Occupation
Newspaper   Blue Collar      White Collar     Professional
G&M         27/120 = .23     29/108 = .27     33/126 = .26
Post        18/120 = .15     43/108 = .40     51/126 = .40
Star        38/120 = .32     21/108 = .19     22/126 = .17
Sun         37/120 = .31     15/108 = .14     20/126 = .16
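
The column calculation above can be sketched in Python as follows. The frequencies are the numerators used in the calculations above; the dictionary layout and names are choices made for this illustration.

```python
# Frequencies as used in the column calculations above
# (order of values: Blue collar, White collar, Professional).
table = {
    "G&M":  [27, 29, 33],
    "Post": [18, 43, 51],
    "Star": [38, 21, 22],
    "Sun":  [37, 15, 20],
}
occupations = ["Blue collar", "White collar", "Professional"]

# Column totals: one total per occupation.
column_totals = [sum(row[j] for row in table.values()) for j in range(3)]

# Divide each frequency by its column total.
column_relative = {
    paper: [row[j] / column_totals[j] for j in range(3)]
    for paper, row in table.items()
}

for paper, shares in column_relative.items():
    print(f"{paper:<5}", "  ".join(f"{s:.3f}" for s in shares))

# Each column sums to 1 (100%), as the bars of one cluster should
# when a clustered bar chart is drawn with percentages.
column_sums = [sum(column_relative[p][j] for p in table) for j in range(3)]
```

Comparing the three numbers within each printed row is the comparison the text describes: similar numbers across a row point to no relation, clearly different numbers point to a relation.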

Example 3
Interpretation: The relative frequencies in
columns 2 and 3 (white collar and professional) are similar, but there are large
differences between columns 1 (blue collar) and 2 and
between columns 1 and 3.

Example 3
This tells us that blue collar workers tend to read
different newspapers from both white collar
workers and professionals, and that white collar
workers and professionals are quite similar in their
newspaper choice.

Example 3

Use the data from the cross-classification table to create the clustered bar chart.
We can use frequencies or relative frequencies. If all clusters share the
same pattern, then there is no relation (regardless of the height of the bars);
otherwise there is a relation between the two variables. The heights of the bars
in one cluster should add up to 100% if using percentages. Be careful how the
bars are arranged!

Chart annotation: Professionals tend to read the Post more than twice as
often as the Star or Sun (51 readers versus 22 and 20).

Tabular and Graphical Techniques for Quantitative Variables

Graphical Presentation for One Quantitative Variable

• There are several graphical methods used when
the data are interval (i.e., numeric, non-categorical).
• Common graphical methods to describe interval
data are Box Plots, Dot Plots, Histograms, and
Ogives.
• Histograms are used to summarize interval
data, as well as to help explain probabilities.

Histogram
To draw a histogram, we need a frequency or a
relative frequency table first.
So, we partition the interval data into a
number of categories or classes, and then count
how many observations fall in each of the classes. As a rule of
thumb, the number of classes, m, should be
between 5 and 15. If m < 5, we
would be summarizing too much; if m > 15, we
would be giving too much detail.
Our focus here is on equal-size classes whenever
possible. The tables will be given in this course, so there is no
need to construct them.

Example 4
As part of a larger study, a long-distance company
wanted to acquire information about the monthly
bills of new subscribers in the first month after
signing with the company.
The company’s marketing manager conducted a
survey of 200 new residential subscribers wherein
the first month’s bills were recorded.
The general manager planned to present his/her
findings to senior executives. What information
can be extracted from these data?

Example 4

Here the range of the data is about 120, and if we
decide on having 8 classes, we have the following
table:

Class Limits   Frequency (Absolute)   Frequency (Relative)
(0, 15]         71                    35.5%
(15, 30]        37                    18.5%
(30, 45]        13                     6.5%
(45, 60]         9                     4.5%
(60, 75]        10                     5%
(75, 90]        18                     9%
(90, 105]       28                    14%
(105, 120]      14                     7%
Total          200                   100%

In each class (a, b], a is the lower limit and b is the upper limit.

1. What is the class width here?
2. Where do we count the value 30?

Class width = upper limit - lower limit
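
The two questions about the table can be checked with a short Python sketch. It assumes equal-width classes that are open on the left and closed on the right, as the (lower, upper] notation in the table indicates; the function name `class_of` is hypothetical.

```python
import math

lower_limit, upper_limit, num_classes = 0, 120, 8

# Class width = upper limit - lower limit of a class; with equal classes this
# is the overall span divided by the number of classes.
class_width = (upper_limit - lower_limit) / num_classes

def class_of(bill):
    """Return the (lower, upper] class containing a bill amount.

    Classes are open on the left and closed on the right, so a value that
    sits exactly on a boundary is counted in the class it closes.
    """
    if not lower_limit < bill <= upper_limit:
        raise ValueError("bill is outside the range covered by the table")
    index = math.ceil((bill - lower_limit) / class_width)  # 1-based class number
    return (lower_limit + (index - 1) * class_width, lower_limit + index * class_width)

print(class_width)      # 15.0
print(class_of(30))     # (15.0, 30.0)
print(class_of(30.01))  # (30.0, 45.0)
```

So the class width is 15, and a bill of exactly 30 is counted in the class (15, 30], not in (30, 45].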

From Table to Histogram

Once we have a frequency distribution table, we
can construct the corresponding histogram of
the data (and vice versa).

Example 4

• About half of the bills (71 + 37 = 108 of 200) are “small”, i.e., $30 or less.
• There are only a few telephone bills in the middle range.
• (18 + 28 + 14 = 60) ÷ 200 = 30%, i.e., nearly a third of the phone bills
are more than $75.

Example 5
The distribution of the hourly wages of 50
employees is given as follows:

Class (Wages)   Frequency (# of Employees)
15 – 20          1
20 – 25          3
25 – 30          8
30 – 35         15
35 – 40         14
40 – 45          7
45 – 50          2
Total           50

Example 5
[Figure: histogram of the wage distribution; horizontal axis: wages (15 to 50); vertical axis: frequency (0 to 15)]

Comment on this histogram:
1. The class with the highest frequency is 30-35. This is our
modal class: more wages fall between 30 and 35 than in any
other class.
2. Symmetry?
3. Spread? Is it steep or flat?
4. Unimodal?
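
As a rough sketch, the modal class of the wage distribution can be found in Python, together with a crude text version of the histogram. The classes and frequencies are copied from the wage table in Example 5; everything else is illustrative.

```python
# Classes and frequencies copied from the wage table in Example 5.
classes = [(15, 20), (20, 25), (25, 30), (30, 35), (35, 40), (40, 45), (45, 50)]
frequencies = [1, 3, 8, 15, 14, 7, 2]

# The modal class is the class with the largest frequency.
modal_index = max(range(len(classes)), key=lambda i: frequencies[i])
modal_class = classes[modal_index]

# A crude text histogram: one '#' per employee.
for (low, high), freq in zip(classes, frequencies):
    print(f"{low}-{high}: {'#' * freq} ({freq})")

# Share of all employees that fall in the modal class.
share_in_modal_class = frequencies[modal_index] / sum(frequencies)
print(modal_class, share_in_modal_class)
```

The modal class holds 15 of the 50 employees (30%), so it is the most common class without containing a majority of the wages.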

Shapes of Histogram
Symmetry
A histogram is said to be symmetric if, when we
draw a vertical line down the center of the
histogram, the two sides are identical in shape and
size:
[Figure: three examples of symmetric histograms; horizontal axis: Variable; vertical axis: Frequency]

Shapes of Histogram

Skewness
A skewed histogram is one with a long tail
extending to either the right or the left:
[Figure: two histograms, one Positively Skewed and one Negatively Skewed; horizontal axis: Variable; vertical axis: Frequency]

Shapes of Histogram

Modality
A unimodal histogram is one with a single peak,
while a bimodal histogram is one with two peaks.

[Figure: a unimodal histogram and a bimodal histogram; horizontal axis: Variable; vertical axis: Frequency]

A modal class is the class with
the largest number of
observations.

Histogram Comparison

• Compare the following histograms based on data
about the grades of a group of students in two
courses.

[Figure: histograms of the marks in the two courses]

The two courses have very different histograms:
• unimodal vs. bimodal
• spread of the marks (narrower vs. wider)


Business Statistics marks: The histogram is unimodal and
approximately symmetric. There are no marks below 50, and the
majority of the marks are between 60 and 90. The modal class is 70 to
80, and the center of the distribution is approximately 75.

Mathematical Statistics marks: The histogram is bimodal. The larger
modal class is in the 70s, the smaller is in the 50s, and there are few marks
in the 60s. This histogram suggests that there are two groups of
students; one may conclude that those who performed poorly in the
course are weaker in mathematics than those who performed well.

Comparing the two histograms, one can conclude that the two
courses have different distributions of marks.

Shapes of Histogram
Bell Shape
A special type of symmetric unimodal histogram is
one that is bell shaped.

[Figure: a bell-shaped histogram; horizontal axis: Variable; vertical axis: Frequency]

Many statistical techniques require that
the population be bell shaped.
Drawing the histogram helps verify the shape of
the population in question.

Ogive

An ogive (pronounced “Oh-jive”) is a graph of a
cumulative frequency distribution. We create an
ogive in three steps:
1. From the frequency distribution created earlier,
calculate relative frequencies.
2. Calculate cumulative relative frequencies by
adding the current class’ relative frequency to
the previous class’ cumulative relative
frequency.
3. Graph the cumulative relative frequencies.
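
The three steps can be sketched in Python using the telephone-bill frequency table from Example 4. The plotting itself is left out; the sketch only produces the points an ogive would connect, and the starting point at the lower limit of the first class is an assumption about how the graph is drawn.

```python
from itertools import accumulate

# Class upper limits and frequencies from the telephone-bill table in Example 4.
upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]
total = sum(frequencies)  # 200 subscribers

# Step 1: relative frequencies.
relative = [f / total for f in frequencies]

# Step 2: cumulative relative frequencies. Running totals of the counts are
# divided by the total, which gives the same result as adding the relative
# frequencies one class at a time but avoids accumulating rounding error.
cumulative = [c / total for c in accumulate(frequencies)]

# Step 3: the points to graph, starting at the lower limit of the first class
# with a cumulative relative frequency of 0 (assumed starting point).
ogive_points = [(0, 0.0)] + list(zip(upper_limits, cumulative))
for limit, cum in ogive_points:
    print(f"{limit:>4}  {cum:.3f}")
```

The second point printed after the start is (30, 0.540), and the last is (120, 1.000), matching the additions .355 + .185 = .540 and .930 + .070 = 1.00.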

Example 6

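Cumulative relative frequencies (telephone-bill data from Example 4):

first class…
next class: .355 + .185 = .540
⋮
last class: .930 + .070 = 1.00

Finding the median from the table: 50% of the data lie below the median.
From the cumulative relative frequency column, the upper limit of the second class has 0.540 of the
data below it, while the upper limit of the first class has only 0.355. Therefore the median is in the second class, 15-30.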

Example 6

What telephone bill value is at the 50th percentile?

[Figure: ogive of the telephone bills, accurately drawn]

At 0.50 on the vertical axis, we find that the long-distance bill is almost $27, which means that 50%
of the subscribers pay less than this amount and 50% pay more.
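
Reading a value off the ogive amounts to linear interpolation between two of its points. The sketch below assumes the ogive is drawn as straight segments between class limits (that is, bills are treated as evenly spread within each class); the function name `value_at` is hypothetical.

```python
# Points on the ogive: (class limit, cumulative relative frequency), taken from
# the telephone-bill table; the first point is the lower limit of the first class.
ogive_points = [(0, 0.0), (15, 0.355), (30, 0.540), (45, 0.605), (60, 0.650),
                (75, 0.700), (90, 0.790), (105, 0.930), (120, 1.0)]

def value_at(cumulative_share):
    """Read a bill amount off the ogive by linear interpolation."""
    for (x0, y0), (x1, y1) in zip(ogive_points, ogive_points[1:]):
        if y0 < cumulative_share <= y1:
            return x0 + (cumulative_share - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("share must be greater than 0 and at most 1")

median_bill = value_at(0.50)
print(round(median_bill, 2))  # 26.76
```

The interpolated value, 15 + (0.50 - 0.355)/(0.540 - 0.355) × 15 ≈ 26.76, agrees with the "almost $27" read from the graph.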

Dot Plot
A dot plot is a number line containing all numbers in the sample,
showing a dot or a mark over the position corresponding
to each number.
If more than one dot falls in the same position, then they
are stacked up.

[Figure: dot plot of the marks of boys and girls]

This graph shows that the most repeated mark
is 80. However, the most repeated values for girls
are 70 and 80.
Boys tend to achieve higher marks than girls.
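
A text version of a dot plot can be sketched in Python as follows. The marks are hypothetical (they are not the data behind the graph above), and for brevity the number line shows only the values that occur in the sample.

```python
from collections import Counter

# Hypothetical marks, invented for illustration; not the data behind the graph.
marks = [60, 70, 70, 80, 80, 80, 90, 90, 100]

counts = Counter(marks)
tallest_stack = max(counts.values())
positions = sorted(counts)

# Build the plot from the top row down: a dot is drawn at a position only
# if the stack of repeated values there is at least that tall.
rows = []
for level in range(tallest_stack, 0, -1):
    rows.append("".join(" *  " if counts[p] >= level else "    " for p in positions))
rows.append("".join(f"{p:<4}" for p in positions))  # the number line

print("\n".join(rows))

# The tallest stack marks the most repeated value.
most_repeated = counts.most_common(1)[0][0]
```

With these made-up marks, the tallest stack sits over 80, which is how the most repeated value is read from a dot plot.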

Graphical Presentation for Time Series Data

• Observations measured at the same point in
time are called cross-sectional data.
• Observations measured at successive points in
time are called time-series data.
• Time-series data are graphed on a line chart (index
plot), which plots the value of the variable on
the vertical axis against the time periods on the
horizontal axis.

Example 7

We recorded the monthly average retail price of gasoline since
1978. A line chart is used to describe these data. We can observe
an increasing trend in the average retail price over the years.

[Figure: a line chart of the monthly average retail price of gasoline since 1978]

Graphical Presentation for Two Quantitative Variables

• We don’t need a table to plot the scatter
diagram. The data will come in pairs.
• To explore the relationship between two
interval variables, we employ a Scatter Plot,
which plots two variables against one another.
• The independent variable is labeled X and is
usually placed on the horizontal axis, while the
other, dependent variable, Y, is mapped to the
vertical axis.

Example 8
A real estate agent wanted to know to what extent
the selling price of a home is related to its size. To
acquire this information he took a sample of 12
homes that had recently sold, recording the price
in thousands of dollars and the size in hundreds of
square feet. These data are listed in the
accompanying table. Use a graphical technique to
describe the relationship between size and price.
Size     23   18   26   20   22   14   33   28   23   20   27   18
Price   315  229  355  261  234  216  308  306  289  204  265  195
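
Drawing the scatter plot itself requires a plotting package. As a rough numeric companion, the Python sketch below pairs the observations and compares the average price of the six smallest homes with that of the six largest. This split is only an illustrative check of direction chosen for this sketch, not a technique from the course.

```python
from statistics import mean

# Data copied from the table above: size in hundreds of square feet,
# price in thousands of dollars.
size = [23, 18, 26, 20, 22, 14, 33, 28, 23, 20, 27, 18]
price = [315, 229, 355, 261, 234, 216, 308, 306, 289, 204, 265, 195]

# The data come in pairs: each home is one (x, y) point of the scatter plot,
# with size as X (horizontal axis) and price as Y (vertical axis).
homes = sorted(zip(size, price))

# Rough check of direction: average price of the six smallest homes
# versus the six largest homes.
half = len(homes) // 2
mean_price_small = mean(p for _, p in homes[:half])
mean_price_large = mean(p for _, p in homes[half:])

print(round(mean_price_small, 1), round(mean_price_large, 1))  # 223.2 306.3
```

The larger homes sold for more on average, which is consistent with an upward-sloping pattern in the scatter plot.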

Example 8
[Figure: scatter plot of price against size]

It appears that there is a positive relationship: the
greater the house size, the greater the selling price tends to be.

Patterns of Scatter Diagrams


Linearity and direction are the two concepts we are interested in.

[Figure: three scatter diagrams: Positive Linear Relationship; Negative Linear Relationship; Weak or Non-Linear Relationship]

Summary

Single Set of Data
• Quantitative data: Histogram, Box Plot, Dot Plot, Time Series Plot
• Qualitative data: Frequency Tables, Bar Chart, Pie Charts

Relationship Between Two Variables
• Quantitative data: Scatter Plot
• Qualitative data: Cross-Classification Table, Bar Charts

Examples from Previous Exams


1. A company bakes quiches and sells its products in Cairo. Its
records over the past 60 days are shown below.

[Table: the company's sales records over the past 60 days]

a) Draw a histogram for the above table and comment
on its shape.
b) For how many days did the company sell at least 400
quiches?

2. Based on the following graph, answer the
following:

[Figure: graph for question 2]

a) What are the variables of interest?
b) What are the types of these variables?
c) Is there a relation between the two variables?

Learning Outcomes

The student is able to:
1. Explain the difference between the different
types of data.
2. Select the appropriate tabular and/or
graphical techniques.
3. Construct the selected tables and/or graphs
using a computer package.
4. Interpret the obtained tables and/or graphs.
