
Chapter 2

Summarizing Data (6 hours)


Learning Objectives
In this chapter you will learn:
 1. Frequency distribution classes
 2. Descriptive statistics: tables and charts/graphs
Data Presentation
• Categorical data
   - Summary table
   - Bar graph
   - Pie chart
   - Pareto diagram
• Numerical data
   - Dot plot
   - Stem-and-leaf display
   - Frequency distribution
   - Histogram
Summary Table
1. Lists categories & number of elements in category
2. Obtained by tallying responses in category
3. May show frequencies (counts), % or both

Each row is a category; each count is obtained from a tally.

Major          Count
Accounting       130
Economics         20
Management        50
Total            200
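As a minimal sketch of how such a summary table could be tallied, the snippet below counts categorical responses and reports both frequencies and percentages. The list `responses` is a hypothetical reconstruction of the raw answers from the counts shown above, and the helper name `summary_table` is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical raw responses, rebuilt from the counts in the summary table.
responses = ["Accounting"] * 130 + ["Economics"] * 20 + ["Management"] * 50

def summary_table(values):
    """Return {category: (count, percent)} for a list of categorical values."""
    counts = Counter(values)            # tally each category
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

table = summary_table(responses)
for major, (count, pct) in table.items():
    print(f"{major:<12}{count:>5}{pct:>7.1f}%")
```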
Bar Chart
(for an Investor’s Portfolio)

[Figure: horizontal bar chart "Investor's Portfolio"; categories: Savings, CD, Bonds, Stocks; horizontal axis: Amount in K$ (0 to 50)]
Bar Chart
• The bar chart visualizes a categorical variable as a series of bars.
• The length of each bar represents either the frequency or percentage of values for each category.
• Each bar is separated by a space called a gap.
Pie Chart Example
Current Investment Portfolio (Variables are Qualitative)

Investment Type   Amount (in thousands $)   Percentage
Stocks                  46.5                  42.27
Bonds                   32.0                  29.09
CD                      15.5                  14.09
Savings                 16.0                  14.55
Total                  110                   100

[Figure: pie chart with slices Stocks 42%, Bonds 29%, Savings 15%, CD 14%]
Percentages are rounded to the nearest percent.
Pie Chart
• The pie chart is a circle broken up into slices that represent categories.
• The size of each slice of the pie varies according to the percentage in each category.
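The percentages behind a pie chart can be computed directly from the category amounts. The sketch below uses the investment portfolio amounts from the pie chart example; the function name `slice_percentages` and the conversion of each share to a slice angle (share of 360 degrees) are illustrative assumptions, not part of the original slides.

```python
# Amounts in thousands of dollars, taken from the portfolio example.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

def slice_percentages(values):
    """Return each category's share of the total, as a percentage (2 decimals)."""
    total = sum(values.values())
    return {name: round(100 * v / total, 2) for name, v in values.items()}

percentages = slice_percentages(amounts)

# Each slice's angle is its share of the full circle (360 degrees).
total_amount = sum(amounts.values())
angles = {name: 360 * v / total_amount for name, v in amounts.items()}

for name in amounts:
    print(f"{name:<8}{percentages[name]:>7.2f}%{angles[name]:>8.1f} degrees")
```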
Pareto Diagram
[Figure: Pareto diagram of the investment portfolio; categories in descending order: Stocks, Bonds, Savings, CD. Left axis (for the bar chart) shows % invested in each category, 0% to 45%. Right axis (for the line graph) shows cumulative % invested, 0% to 100%.]
VILFREDO PARETO
(1848–1923)

The Pareto Principle
Pareto showed that approximately 80% of the total
wealth in a society lies with only 20% of the
families.
This famous law about the “vital few and the
trivial many” is widely known as the Pareto
principle in economics.
Pareto Diagram
 Used to portray categorical data
 A bar chart, where categories are shown in
descending order of frequency
 A cumulative polygon is shown in the same graph
 Used to separate the “vital few” from the “trivial
many”.
 Pareto charts are also powerful tools for prioritizing
improvement efforts, such as when data are collected
that identify defective or nonconforming items.
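The two ingredients of a Pareto diagram, bars in descending order and a cumulative line, can be sketched as follows. The data are the investment amounts from the earlier portfolio example; the function name `pareto_table` is a hypothetical helper introduced for illustration.

```python
# Amounts in thousands of dollars, from the portfolio example.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

def pareto_table(values):
    """Return rows (category, percent, cumulative percent), largest first."""
    total = sum(values.values())
    rows, running = [], 0.0
    # Sort categories by amount in descending order (the bars of the diagram).
    for name, v in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        running += v                      # running total for the cumulative line
        rows.append((name, 100 * v / total, 100 * running / total))
    return rows

rows = pareto_table(amounts)
for name, pct, cum in rows:
    print(f"{name:<8}{pct:>7.2f}%{cum:>8.2f}%")
```

Reading down the cumulative column shows how few categories account for most of the total, which is how the "vital few" are identified.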
Pareto Diagram
[Figure: Pareto diagram with the “vital few” categories highlighted]
Pareto Diagram
• Using the bank’s own processing systems as a primary data source, the causes of incomplete transactions were collected and stored in ATM Transactions. Use this dataset to construct a Pareto diagram.
[Figure: example Pareto diagram]
Summary
Bar graph: The categories (classes) of the
qualitative variable are represented by bars, where
the height of each bar is either the class frequency,
class relative frequency, or class percentage.
Pie chart: The categories (classes) of the qualitative
variable are represented by slices of a pie (circle).
The size of each slice is proportional to the class
relative frequency.
Pareto diagram: A bar graph with the categories
(classes) of the qualitative variable (i.e., the bars)
arranged by height in descending order from left to
right.
Bivariate Categorical Data
Side By Side Bar Charts
• The side by side bar chart represents the data from a contingency table.

                 No Errors    Errors     Total
Small Amount       50.75%     30.77%    47.50%
Medium Amount      29.85%     61.54%    35.00%
Large Amount       19.40%      7.69%    17.50%
Total              100.0%     100.0%    100.0%

Invoices with errors are much more likely to be of medium size (61.54% vs 30.77% and 7.69%).
Bivariate Categorical Data
• Contingency Tables: Investment in Thousands of Dollars

Investment Category   Investor A   Investor B   Investor C   Total
Stocks                   46.5         55           27.5       129
Bonds                    32           44           19          95
CD                       15.5         20           13.5        49
Savings                  16           28            7          51
Total                   110          147           67         324
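The row and column totals of a contingency table, and the column percentages that a side-by-side chart often displays, can be computed as in the sketch below. The nested dictionary layout and the short investor keys "A", "B", "C" are assumptions made for this illustration; the numbers come from the investment table.

```python
# Investment in thousands of dollars: category -> investor -> amount.
investments = {
    "Stocks":  {"A": 46.5, "B": 55.0, "C": 27.5},
    "Bonds":   {"A": 32.0, "B": 44.0, "C": 19.0},
    "CD":      {"A": 15.5, "B": 20.0, "C": 13.5},
    "Savings": {"A": 16.0, "B": 28.0, "C": 7.0},
}
investors = ["A", "B", "C"]

# Row totals: total invested in each category across investors.
row_totals = {cat: sum(by_inv.values()) for cat, by_inv in investments.items()}

# Column totals: total invested by each investor across categories.
col_totals = {inv: sum(investments[cat][inv] for cat in investments)
              for inv in investors}

# Column percentages: each investor's portfolio split by category.
col_percent = {inv: {cat: 100 * investments[cat][inv] / col_totals[inv]
                     for cat in investments}
               for inv in investors}

grand_total = sum(row_totals.values())
print(row_totals, col_totals, grand_total)
```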
Bivariate Categorical Data
• Side by Side Charts
[Figure: side-by-side horizontal bar chart "Comparing Investors"; categories: Savings, CD, Bonds, Stocks; one bar each for Investor A, Investor B, Investor C; horizontal axis 0 to 60]
Side by Side Bar Charts
• Use Mutual Funds to construct a side-by-side bar chart that visualizes the data for the levels of risk for growth and value funds.
[Figure: side-by-side bar chart]
Dot Plot
1. Horizontal axis is a scale for the quantitative
variable, e.g., percent.
2. The numerical value of each measurement is
located on the horizontal scale by a dot.
Stem-and-Leaf
• Data in Raw Form (as Collected):
  24, 26, 24, 21, 27, 27, 30, 41, 32, 38
• Data in Ordered Array from Smallest to Largest:
  21, 24, 24, 26, 27, 27, 30, 32, 38, 41
• Stem-and-Leaf Display:
  2 | 144677
  3 | 028
  4 | 1
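The construction of the display above can be sketched in a few lines: sort the data, use the tens digit as the stem and the units digit as the leaf. This sketch assumes non-negative two-digit integers; the function name `stem_and_leaf` is hypothetical, and stems with no observations are simply omitted.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to a string of sorted leaves (units digits)."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting gives ordered stems and leaves
        stems[v // 10].append(v % 10)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in stems.items()}

data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```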
Frequency Distributions
 A frequency distribution is a list or a table
containing the values of a variable (or a set of
ranges within which the data fall) and the
corresponding frequencies with which each value
occurs (or frequencies with which data fall within
each range).
 A frequency distribution is a way to summarize data
 The distribution condenses the raw data into a
more useful form and allows for a quick visual
interpretation of the data.
Example
Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class              Frequency   Relative Frequency   Percentage
10 but under 20        3             .15                15
20 but under 30        6             .30                30
30 but under 40        5             .25                25
40 but under 50        4             .20                20
50 but under 60        2             .10                10
Total                 20            1                  100
Frequency Distribution:
Discrete Data
• The following data record the number of children in the families of the 47 workers in a company:
1 1 3 2 0 2 0 1 2 2 1 3
5 2 4 0 0 2 4 1 1 2 2 0
3 0 0 2 1 3 6 0 2 1 0 3
2 2 2 1 0 0 1 1 3 1 4
Frequency distribution table
Number of children in family    Number of workers
0
1
2
3
4
5
6
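One way to fill in a table like this is to let the computer do the tallying. The sketch below counts how many workers have each number of children, using the 47 values listed above; the variable names are illustrative, and values that never occur are given a frequency of zero so that every possible value appears in the table.

```python
from collections import Counter

# Number of children in the families of the 47 workers.
children = [
    1, 1, 3, 2, 0, 2, 0, 1, 2, 2, 1, 3,
    5, 2, 4, 0, 0, 2, 4, 1, 1, 2, 2, 0,
    3, 0, 0, 2, 1, 3, 6, 0, 2, 1, 0, 3,
    2, 2, 2, 1, 0, 0, 1, 1, 3, 1, 4,
]

counts = Counter(children)
# List every possible value from the minimum to the maximum, in order.
frequency = {k: counts.get(k, 0)
             for k in range(min(children), max(children) + 1)}

for value, n in frequency.items():
    print(f"{value:>2} children: {n:>2} workers")
```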
Frequency Distribution:
Discrete Data
• Discrete data: possible values are countable
Example: An advertiser asks 200 customers how many days per week they read the daily newspaper.

Number of days read   Frequency
0                         44
1                         24
2                         18
3                         16
4                         20
5                         22
6                         26
7                         30
Total                    200
Relative Frequency
Relative Frequency: What proportion is in each category?

Number of days read   Frequency   Relative Frequency
0                         44            .22
1                         24            .12
2                         18            .09
3                         16            .08
4                         20            .10
5                         22            .11
6                         26            .13
7                         30            .15
Total                    200           1.00

For example, 44/200 = .22: 22% of the people in the sample report that they read the newspaper 0 days per week.
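The relative frequency column is each frequency divided by the total number of observations. A minimal sketch, using the newspaper frequencies from the table (the function name `relative_frequencies` is an assumption for illustration):

```python
# Frequencies from the newspaper example: days read -> number of customers.
frequency = {0: 44, 1: 24, 2: 18, 3: 16, 4: 20, 5: 22, 6: 26, 7: 30}

def relative_frequencies(freq):
    """Divide each frequency by the total number of observations."""
    total = sum(freq.values())
    return {value: count / total for value, count in freq.items()}

rel = relative_frequencies(frequency)
for days, share in rel.items():
    print(f"{days} days: {share:.2f}")
```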
NOTE
For developing frequency and relative frequency
distributions for discrete data:
(1) List all possible values of the variable. If the
variable is quantitative, order the possible values
from low to high.
(2) Count the number of occurrences at each value
of the variable and place this value in a column
labeled “frequency”.
(3) Determine the relative frequency of each value by
dividing its frequency by the total number of observations.
Frequency Distribution:
Continuous Data
• Continuous data: may take on any value in some interval
Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
(Temperature is a continuous variable because it could be measured to any degree of precision desired)
Distribution classes
• Class limits: are the lower and upper values of the classes as physically described in the distribution.
[Figure: example classes for discrete data and for continuous data, each with its lower limit and upper limit marked]
Distribution classes
 Class widths (class lengths):
- continuous data: are the numerical differences
between lower and upper class limits.
- discrete data: are the numerical differences
between the lower limit of one class and the lower
limit of the immediately following class
 Class mid-points: are situated in the centre of the
classes.
Distribution classes
• Open-ended class:
- A class without a lower or upper limit.
- Usually used for the first class, which has no defined lower limit, and/or the last class, which has no defined upper limit.
Example classes: < 10, 10-15, 15-20, >= 20
Grouping Data by Classes
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5 and 20)
• Compute class width: 10 (46/5 = 9.2, then round up)
• Determine class boundaries: 10, 20, 30, 40, 50, 60
• Compute class midpoints: 15, 25, 35, 45, 55
• Count observations & assign to classes
Frequency Distribution Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Frequency Distribution

Class              Frequency   Relative Frequency
10 but under 20        3             .15
20 but under 30        6             .30
30 but under 40        5             .25
40 but under 50        4             .20
50 but under 60        2             .10
Total                 20            1.00
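The grouping procedure that produces this table can be sketched as follows. The starting boundary of 10 and the width of 10 are judgment calls (the minimum width 46/5 = 9.2 rounded up to a convenient value), so they are set by hand here rather than computed; the function name `group_into_classes` is hypothetical.

```python
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

def group_into_classes(values, num_classes, start, width):
    """Count values in the classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * num_classes
    for v in values:
        index = (v - start) // width
        if not 0 <= index < num_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    return counts

num_classes = 5
data_range = max(data) - min(data)        # range of the data
min_width = data_range / num_classes      # minimum class width
width, start = 10, 10                     # chosen by hand: convenient round values

counts = group_into_classes(data, num_classes, start, width)
midpoints = [start + i * width + width / 2 for i in range(num_classes)]
relative = [c / len(data) for c in counts]

for i, (c, r) in enumerate(zip(counts, relative)):
    lower = start + i * width
    print(f"{lower} but under {lower + width}: {c:>2}  {r:.2f}")
```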
Questions for Grouping Data into Classes
• 1. How wide should each interval be? (How many classes should be used?)
• 2. How should the endpoints of the intervals be determined?
   - Often answered by trial and error, subject to user judgment
   - The goal is to create a distribution that is neither too "jagged" nor too "blocky"
   - Goal is to appropriately show the pattern of variation in the data
How Many Class Intervals?
• Many (narrow class intervals)
   - may yield a very jagged distribution with gaps from empty classes
   - can give a poor indication of how frequency varies across classes
   [Figure: histogram of Temperature with narrow classes; vertical axis: Frequency (0 to 3.5); horizontal axis: 4, 8, 12, ..., 60, More]
• Few (wide class intervals)
   - may compress variation too much and yield a blocky distribution
   - can obscure important patterns of variation
   [Figure: histogram of Temperature with wide classes; vertical axis: Frequency (0 to 12); horizontal axis: 0, 30, 60, More]
(X axis labels are upper class endpoints)

Guidelines for grouping values into classes
• Use between 5 and 20 classes.
Or Sturges’s Rule: Classes = 1 + 3.322[log10(n)]
where n: number of data values.
The classes should meet four criteria:
-First, they must be mutually exclusive.
-Second, they must be all-inclusive.
-Third, if at all possible, they should be of equal width.
-Fourth, avoid empty classes if possible.
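Sturges’s Rule is straightforward to evaluate. The sketch below rounds the result up to the next whole number, which is an assumption (the rule itself only gives a real-valued suggestion, and some texts round to the nearest integer instead). The result is a starting point to be weighed against the other guidelines, not a fixed answer.

```python
import math

def sturges_classes(n):
    """Suggested number of classes by Sturges's Rule, rounded up (assumption)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(1 + 3.322 * math.log10(n))

for n in (20, 100, 200):
    print(n, sturges_classes(n))
```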
General Guidelines

Number of Data Points   Number of Classes
under 50                     5 - 7
50 - 100                     6 - 10
100 - 250                    7 - 12
over 250                    10 - 20

• Class widths can typically be reduced as the number of observations increases
• Distributions with numerous observations are more likely to be smooth and have gaps filled since data are plentiful
Class Width
• The class width is the distance between the lowest possible value and the highest possible value for a frequency class
• The minimum class width is

$$W = \frac{\text{Largest Value} - \text{Smallest Value}}{\text{Number of Classes}}$$
The Histogram
Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

[Figure: histogram; vertical axis: Frequency (0 to 7); horizontal axis: class midpoints 5, 15, 25, 35, 45, 55, More, with bars meeting at the class boundaries; bar heights 0, 3, 6, 5, 4, 2, 0; no gaps between bars]
The Histogram
• A graph of the data in a frequency distribution is called a histogram.
• The class boundaries (or class midpoints) are shown on the horizontal axis.
• The vertical axis is either frequency, relative frequency, or percentage.
• Bars of the appropriate heights are used to represent
the number of observations within each class.
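To make the idea concrete without any plotting library, the sketch below draws a rough text histogram in which each bar is a row of `#` characters whose length equals the class frequency. It uses the 20 data values from the running example with classes of width 10 starting at 10; the function names and the sideways (horizontal) orientation are choices made for this illustration only.

```python
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

def class_counts(values, start, width, num_classes):
    """Count values falling in each class [lower, lower + width)."""
    counts = [0] * num_classes
    for v in values:
        counts[(v - start) // width] += 1
    return counts

def text_histogram(counts, start, width):
    """Return one line per class; bar length equals the class frequency."""
    lines = []
    for i, count in enumerate(counts):
        lower = start + i * width
        lines.append(f"{lower} but under {lower + width} | {'#' * count} ({count})")
    return lines

counts = class_counts(data, start=10, width=10, num_classes=5)
lines = text_histogram(counts, start=10, width=10)
print("\n".join(lines))
```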
Example
• To improve the quality of a product, a sample of 10 products is randomly collected each day for ten days. The outside diameter is gauged and recorded in the dataset diameter. Construct a histogram to show the distribution of the diameter.
The Histogram
Three general types of information:
• A visual indication of where the approximate center of data is.
• The degree of spread (or variation) in the data.
• The shape of the distribution.
Data pattern
• Patterns in data are commonly described in terms of: center, spread, shape and unusual features.
• Some common distributions have special descriptive labels such as symmetric, bell-shaped, skewed, etc.
Center
• The center of a distribution is located at the median of the distribution.
• This is the point in a graphic display where about half of the observations are on either side.
Spread
• The spread of a distribution refers to the variability of the data.
• If the observations cover a wide range, the spread is larger. If the observations are clustered around a single value, the spread is smaller.
Shape
The shape of a distribution is described by the following characteristics:
• Symmetry
• Number of peaks
• Skewness
• Uniform: a uniform distribution has no clear peaks.
Unusual features
• The two most common unusual features are gaps and outliers.
• Gaps: areas of a distribution where there are no observations.
• Outliers: extreme values that differ greatly from the other observations.
Histogram-Example
 A manufacturer of industrial wheels suspects that
profitable orders are being lost because of the long time
the firm takes to develop price quotes for potential
customers. To investigate this possibility, 50 requests for
price quotes were randomly selected from the set of all
quotes made last year, and the processing time was
determined for each quote. Each quote was classified
according to whether the order was “lost” or not (i.e.,
whether or not the customer placed an order after
receiving a price quote).
 Use data set QUOTES to create a frequency histogram for
these data. Then shade the area under the histogram that
corresponds to lost orders. Interpret the result.
Histogram- Example
 In the Journal of Experimental Social Psychology (Vol. 45, 2009) study on
whether money can buy love (p. 63), the researchers randomly assigned
participants to the role of either gift-giver or gift-receiver. (Gift-givers, recall,
were asked about a birthday gift they recently gave, while gift-recipients were
asked about a birthday gift they recently received.) Two quantitative variables
were measured for each of the 237 participants: gift price (measured in dollars)
and overall level of appreciation for the gift (measured as the sum of the two 7-
point appreciation scales, with higher values indicating a higher level of
appreciation).
 One of the objectives of the research was to investigate whether givers and
receivers differ on the price of the gift reported and on the level of appreciation
reported.
 Use BUYLOV to construct side-by-side histograms for the quantitative
variables, one histogram for gift-givers and one for gift-recipients.
Organizing Numerical Data
Numerical data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21
• Ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
• Stem and leaf display:
  2 | 144677
  3 | 028
  4 | 1
• Frequency distributions and cumulative distributions
   - Tables
   - Histograms
   - Polygons
   - Ogive
Cumulative and relative cumulative
frequency distribution
• Cumulative frequency distribution: a summary of a set of data that displays the number of observations with values less than or equal to the upper limit of each of its classes.
• Relative cumulative frequency distribution: a summary of a set of data that displays the proportion of observations with values less than or equal to the upper limit of each of its classes.
Relative frequency histograms
and ogives
• A relative frequency histogram is formed in the same manner as a frequency histogram, but it uses relative frequencies rather than frequencies.
• The cumulative relative frequency is presented using a graph called an ogive.
Tabulating Numerical Data:
Cumulative Frequency
Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Lower Limit   Cumulative Frequency   Cumulative % Frequency
10                  0                       0
20                  3                      15
30                  9                      45
40                 14                      70
50                 18                      90
60                 20                     100
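The cumulative columns are running totals of the class frequencies. The sketch below starts from the class counts of the running example and accumulates them, so that each boundary is paired with the number (and percentage) of observations below it; the variable names are illustrative assumptions.

```python
from itertools import accumulate

boundaries = [10, 20, 30, 40, 50, 60]   # class boundaries
class_counts = [3, 6, 5, 4, 2]          # frequencies of the five classes

# Nothing lies below the first boundary, so the running total starts at 0.
cumulative = [0] + list(accumulate(class_counts))
total = cumulative[-1]
cumulative_pct = [100 * c / total for c in cumulative]

for b, c, p in zip(boundaries, cumulative, cumulative_pct):
    print(f"{b:>3}{c:>5}{p:>7.0f}%")
```

Plotting `cumulative_pct` against `boundaries` and joining the points gives the ogive.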
The Ogive (Cumulative %)
Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

[Figure: ogive; vertical axis: cumulative % (0 to 100); horizontal axis: class boundaries (not midpoints) 10, 20, 30, 40, 50, 60]

The Frequency Polygon
Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

[Figure: frequency polygon; vertical axis: Frequency (0 to 7); horizontal axis: class midpoints 5, 15, 25, 35, 45, 55, More]
The Polygon
 A percentage polygon is formed by having the
midpoint of each class represent the data in that
class and then connecting the sequence of
midpoints at their respective class percentages.
 The cumulative percentage polygon, or ogive,
displays the variable of interest along the X axis,
and the cumulative percentages along the Y axis.
 Useful when there are two or more groups to
compare.
[Figure: percentage polygons, useful when comparing two or more groups]
Line Charts and Scatter Diagrams
• Line charts show values of one variable vs. time
   - Time is traditionally shown on the horizontal axis
• Scatter diagrams show points for bivariate data
   - One variable is measured on the vertical axis and the other variable is measured on the horizontal axis
Line Chart Example

Year   Inflation Rate
1985       3.56
1986       1.86
1987       3.65
1988       4.14
1989       4.82
1990       5.40
1991       4.21
1992       3.01
1993       2.99
1994       2.56
1995       2.83
1996       2.95
1997       2.29
1998       1.56
1999       2.21
2000       3.36
2001       2.85
2002       1.58

[Figure: line chart "U.S. Inflation Rate"; vertical axis: Inflation Rate (%) (0 to 6); horizontal axis: Year (1984 to 2002)]
Scatter Diagram Example

Volume per day   Cost per day
23                  125
26                  140
29                  146
33                  160
38                  167
42                  170
50                  188
55                  195
60                  200

[Figure: scatter diagram "Production Volume vs. Cost per Day"; vertical axis: Cost per Day (0 to 250); horizontal axis: Volume per Day (0 to 70)]
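Types of Relationships
• Linear relationships
  [Figure: two scatter diagrams of Y against X showing linear relationships]
• Curvilinear relationships
  [Figure: two scatter diagrams of Y against X showing curvilinear relationships]
• No relationship
  [Figure: two scatter diagrams of Y against X showing no relationship]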
Summary
 END OF CHAPTER 2
Seven Basic Tools of Quality Control
1. Process Flowcharts: map out the process to better visualize and understand opportunities for improvement.
2. Brainstorming
3. Fishbone (cause-and-effect) Diagram: arranges causes (Cause 1 to Cause 4) and their sub-causes around the problem.
4. Histogram: show patterns of variation.
5. Trend Charts: identify trend (y plotted against time).
6. Scatter Plots: examine relationships (y plotted against x).
7. Statistical Process Control Charts: examine the performance of a process over time.