
Quantitative Methods in

Management
Day-2
Recap..
• Introduction
• Definition
• Terms and terminologies
• Types of statistics
• Types of data
• Levels of measurements
• Application of statistics in business
• Sources of data
Organizing and visualizing
variables
Chapter 2
Pg. 33-98
• Qualitative
• Quantitative
• Geographical
• Chronological
• Time series: a set of observations collected at discrete, usually
equally spaced time intervals (e.g., the daily closing price of a
certain stock recorded over the last six weeks)
• Cross-sectional: observations from different individuals or
groups at a single point in time (e.g., an inventory of all ice
creams in stock at a particular store)
PRESENTATION OF DATA
TABULAR
DIAGRAMS
GRAPHS
• TABULATION

SPECIMEN OF A TABLE

Stub           Caption             Total
Stub entries   Body of the table
Total                              Grand Total

Foot Note
Sources
DESCRIPTIVE STATISTICS:
ORGANIZING AND VISUALIZING
VARIABLES
CHAPTER 2
Descriptive Statistics:
Tabular and Graphical Presentations
• Summarizing Categorical Data
• Summarizing Quantitative Data

Categorical data use labels or names to identify categories of like items.

Quantitative data are numerical values that indicate how much or how many.
Categorical Data Are Organized By
Utilizing Tables DCOVA
Categorical Data: Tallying Data
• One categorical variable: Summary Table
• Two categorical variables: Contingency Table
Organizing Categorical Data: Summary Table
DCOVA
 A summary table tallies the frequencies or percentages of items in a set of
categories so that you can see differences between categories.

Main Reason Young Adults Shop Online

Reason For Shopping Online? Percent


Better Prices 37%
Avoiding holiday crowds or hassles 29%
Convenience 18%
Better selection 13%
Ships directly 3%

Source: Data extracted and adapted from “Main Reason Young Adults Shop Online?”
USA Today, December 5, 2012, p. 1A.
Frequency Distribution

A frequency distribution is a tabular summary of data showing the
frequency (or number) of items in each of several non-overlapping classes.

The objective is to provide insights about the data that cannot be
quickly obtained by looking only at the original data.
Relative Frequency Distribution

The relative frequency of a class is the fraction or proportion of the
total number of data items belonging to the class.

A relative frequency distribution is a tabular summary of a set of data
showing the relative frequency for each class.
Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100.

A percent frequency distribution is a tabular summary of a set of data
showing the percent frequency for each class.
Frequency Distribution…
Example – 4 soft drinks – 15 households
Coke Pepsi 7 Up Coke Mirinda
Coke 7 Up 7 Up Coke Coke
Mirinda 7 Up Coke Mirinda Coke

Drink Frequency
Coke 7
Pepsi 1
Mirinda 3
7 Up 4
Total 15
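Frequency Distribution…
Soft Drink   Frequency   Relative frequency   Percent frequency
Coke         7           0.47                 47
Pepsi        1           0.07                 7
Mirinda      3           0.20                 20
7 Up         4           0.27                 27
Total        15          1.00                 100

Note: 7/15 = 0.4667 rounds to 0.47, so the rounded entries add to 1.01 and 101; the exact totals are 1.00 and 100.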
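
The tallying behind the soft-drink table can be sketched in a few lines of Python. This is only an illustration: the function name frequency_table is made up for this example, and the 15 responses are copied from the slide.

```python
from collections import Counter

# The 15 household responses from the soft-drink example.
responses = [
    "Coke", "Pepsi", "7 Up", "Coke", "Mirinda",
    "Coke", "7 Up", "7 Up", "Coke", "Coke",
    "Mirinda", "7 Up", "Coke", "Mirinda", "Coke",
]

def frequency_table(items):
    """Return {category: (frequency, relative frequency, percent frequency)}."""
    counts = Counter(items)
    total = sum(counts.values())
    return {
        category: (count, count / total, 100 * count / total)
        for category, count in counts.items()
    }

table = frequency_table(responses)
for drink, (freq, rel, pct) in table.items():
    # Round only for display; the unrounded values sum to exactly 1 and 100.
    print(f"{drink:<8} {freq:>2}  {rel:.2f}  {pct:.0f}")
```

Rounding is applied only when printing, which is why the displayed column can add to slightly more than 1.00 even though the underlying proportions do not.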
Frequency Distribution
 Example: Marada Inn

Guests staying at Marada Inn were asked to rate the quality of their
accommodations as being excellent, above average, average, below
average, or poor. The ratings provided by a sample of 20 guests are:

Below Average   Average         Above Average
Above Average   Above Average   Above Average
Above Average   Below Average   Below Average
Average         Poor            Poor
Above Average   Excellent       Above Average
Average         Above Average   Average
Above Average   Average
Frequency Distribution

 Example: Marada Inn

Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
Relative Frequency and
Percent Frequency Distributions
Example: Marada Inn

Rating          Relative Frequency   Percent Frequency
Poor            .10                  10
Below Average   .15                  15
Average         .25                  25
Above Average   .45                  45
Excellent       .05                  5
Total           1.00                 100

For example, the relative frequency for Excellent is 1/20 = .05, and the
percent frequency for Poor is .10(100) = 10.
A Contingency Table Helps Organize Two or
More Categorical Variables
DCOVA
• Used to study patterns that may exist between the responses of two or
more categorical variables

• Cross tabulates or tallies jointly the responses of the categorical
variables

• For two variables the tallies for one variable are located in the rows
and the tallies for the second variable are located in the columns
Contingency Table - Example
DCOVA
• A random sample of 400 invoices is drawn.
• Each invoice is categorized as a small, medium, or large amount.
• Each invoice is also examined to identify if there are any errors.
• These data are then organized in the contingency table below.

Contingency Table Showing Frequency of Invoices Categorized
By Size and The Presence Of Errors

                No Errors   Errors   Total
Small Amount    170         20       190
Medium Amount   100         40       140
Large Amount    65          5        70
Total           335         65       400
Contingency Table Based On Percentage
Of Overall Total DCOVA

                No Errors   Errors   Total
Small Amount    170         20       190
Medium Amount   100         40       140
Large Amount    65          5        70
Total           335         65       400

Each count is divided by the overall total:
42.50% = 170 / 400, 25.00% = 100 / 400, 16.25% = 65 / 400

                No Errors   Errors   Total
Small Amount    42.50%      5.00%    47.50%
Medium Amount   25.00%      10.00%   35.00%
Large Amount    16.25%      1.25%    17.50%
Total           83.75%      16.25%   100.0%

83.75% of sampled invoices have no errors and 47.50% of sampled
invoices are for small amounts.
Contingency Table Based On
Percentage of Row Totals
DCOVA

                No Errors   Errors   Total
Small Amount    170         20       190
Medium Amount   100         40       140
Large Amount    65          5        70
Total           335         65       400

Each count is divided by its row total:
89.47% = 170 / 190, 71.43% = 100 / 140, 92.86% = 65 / 70

                No Errors   Errors   Total
Small Amount    89.47%      10.53%   100.0%
Medium Amount   71.43%      28.57%   100.0%
Large Amount    92.86%      7.14%    100.0%
Total           83.75%      16.25%   100.0%

Medium invoices have a larger chance (28.57%) of having errors than
small (10.53%) or large (7.14%) invoices.
Contingency Table Based On
Percentage Of Column Totals
DCOVA

                No Errors   Errors   Total
Small Amount    170         20       190
Medium Amount   100         40       140
Large Amount    65          5        70
Total           335         65       400

Each count is divided by its column total:
50.75% = 170 / 335, 30.77% = 20 / 65

                No Errors   Errors   Total
Small Amount    50.75%      30.77%   47.50%
Medium Amount   29.85%      61.54%   35.00%
Large Amount    19.40%      7.69%    17.50%
Total           100.0%      100.0%   100.0%

An invoice with errors has a 61.54% chance of being of medium size.
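
The three percentage views of the invoice table differ only in the denominator. A minimal Python sketch, assuming the counts are stored in a nested dictionary (the variable names are illustrative, not from the slides):

```python
# Invoice counts from the contingency table (rows: size, columns: error status).
counts = {
    "Small":  {"No Errors": 170, "Errors": 20},
    "Medium": {"No Errors": 100, "Errors": 40},
    "Large":  {"No Errors": 65,  "Errors": 5},
}

def pct(part, whole):
    """Percentage of whole, rounded to two decimals as on the slides."""
    return round(100 * part / whole, 2)

grand_total = sum(sum(row.values()) for row in counts.values())
row_totals = {size: sum(row.values()) for size, row in counts.items()}
col_totals = {
    col: sum(row[col] for row in counts.values())
    for col in ("No Errors", "Errors")
}

# Same counts, three different denominators.
overall_pct = {s: {c: pct(n, grand_total) for c, n in row.items()}
               for s, row in counts.items()}
row_pct = {s: {c: pct(n, row_totals[s]) for c, n in row.items()}
           for s, row in counts.items()}
col_pct = {s: {c: pct(n, col_totals[c]) for c, n in row.items()}
           for s, row in counts.items()}

print(overall_pct["Small"]["No Errors"])  # share of all 400 invoices
print(row_pct["Medium"]["Errors"])        # share of medium invoices with errors
print(col_pct["Medium"]["Errors"])        # share of error invoices that are medium
```

Which denominator to use depends on the question: row percentages compare error rates across invoice sizes, while column percentages describe the size mix within each error status.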
Quantitative Data

Quantitative data indicate how many or how much:
• discrete, if measuring how many
• continuous, if measuring how much

Quantitative data are always numeric.

Ordinary arithmetic operations are meaningful for quantitative data.
Ungrouped Versus Grouped Data
• Ungrouped data
• have not been summarized in any way
• are also called raw data
• Grouped data
• have been organized into a frequency
distribution
Tables Used For Organizing
Numerical Data
DCOVA

Numerical Data
• Ordered Array
• Frequency Distributions
• Cumulative Distributions
DATA
• RAW DATA (OR) INDIVIDUAL SERIES
  • ARRANGED (ARRAY)
  • UNARRANGED (RANDOM)
• GROUPED DATA
  • DISCRETE SERIES
  • CONTINUOUS SERIES
    • INCLUSIVE
    • EXCLUSIVE
    • OPEN END
  • CUMULATIVE FREQUENCY
    • LESS THAN
    • MORE THAN
• BIVARIATE DATA
Organizing Numerical Data:
Ordered Array
DCOVA
 An ordered array is a sequence of data, in rank order, from the smallest
value to the largest value.
 Shows range (minimum value to maximum value)
 May help identify outliers (unusual observations)

Age of Surveyed College Students

Day Students
16 17 17 18 18 18
19 19 20 20 21 22
22 25 27 32 38 42

Night Students
18 18 19 19 20 21
23 28 32 33 41 45
Organizing Numerical Data:
Frequency Distribution
DCOVA
• The frequency distribution is a summary table in which the data are arranged into
numerically ordered classes.

• You must give attention to selecting the appropriate number of class groupings for the table,
determining a suitable width of a class grouping, and establishing the boundaries of each class
grouping to avoid overlapping.

• The number of classes depends on the number of values in the data. With a larger number of
values, typically there are more classes. In general, a frequency distribution should have at
least 5 but no more than 15 classes.

• To determine the width of a class interval, you divide the range (highest value - lowest
value) of the data by the number of class groupings desired.
Organizing Numerical Data:
Frequency Distribution Example
DCOVA

Example: A manufacturer of insulation randomly selects 20 winter days
and records the daily high temperature:

24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Organizing Numerical Data:
Frequency Distribution Example
DCOVA
• Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5 and 15)
• Compute class interval (width): 10 (46/5 = 9.2, then round up)
• Determine class boundaries (limits):
  • Class 1: 10 but less than 20
  • Class 2: 20 but less than 30
  • Class 3: 30 but less than 40
  • Class 4: 40 but less than 50
  • Class 5: 50 but less than 60
• Compute class midpoints: 15, 25, 35, 45, 55
• Count observations & assign to classes
Organizing Numerical Data: Frequency
Distribution Example
DCOVA
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                 Midpoint   Frequency
10 but less than 20   15         3
20 but less than 30   25         6
30 but less than 40   35         5
40 but less than 50   45         4
50 but less than 60   55         2
Total                            20
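
The counting step can be sketched in Python as follows. The function name frequency_distribution and its parameters are assumptions made for this illustration; the classes follow the "lower but less than upper" convention used in the table.

```python
# Daily high temperatures for the 20 sampled winter days.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

def frequency_distribution(data, start, width, n_classes):
    """Count values in classes [lower, lower + width), i.e. 'lower but less than upper'."""
    rows = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        midpoint = (lower + upper) / 2
        freq = sum(lower <= x < upper for x in data)
        rows.append((lower, upper, midpoint, freq))
    return rows

rows = frequency_distribution(temps, start=10, width=10, n_classes=5)
for lower, upper, midpoint, freq in rows:
    print(f"{lower} but less than {upper}   midpoint {midpoint:.0f}   frequency {freq}")
```

Using a half-open interval (lower included, upper excluded) is what guarantees that every value falls into exactly one class.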
Organizing Numerical Data: Relative & Percent
Frequency Distribution Example
DCOVA
Class                 Frequency   Relative Frequency   Percentage
10 but less than 20   3           .15                  15%
20 but less than 30   6           .30                  30%
30 but less than 40   5           .25                  25%
40 but less than 50   4           .20                  20%
50 but less than 60   2           .10                  10%
Total                 20          1.00                 100%

Relative Frequency = Frequency / Total, e.g. 0.10 = 2 / 20


Organizing Numerical Data: Cumulative
Frequency Distribution Example
DCOVA
Class                 Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20   3           15%          3                      15%
20 but less than 30   6           30%          9                      45%
30 but less than 40   5           25%          14                     70%
40 but less than 50   4           20%          18                     90%
50 but less than 60   2           10%          20                     100%
Total                 20          100%

Cumulative Percentage = Cumulative Frequency / Total * 100 e.g. 45% = 100*9/20
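
A cumulative column is just a running total of the frequency column. A short Python sketch using the class frequencies from the temperature example (variable names are illustrative):

```python
from itertools import accumulate

# Class frequencies for 10-<20, 20-<30, 30-<40, 40-<50, 50-<60.
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)

# Running totals give the cumulative frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative percentage = cumulative frequency / total * 100.
cumulative_pct = [100 * c / total for c in cumulative]

for freq, cum, cum_pct in zip(frequencies, cumulative, cumulative_pct):
    print(f"{freq:>2}  {cum:>2}  {cum_pct:.0f}%")
```

The last cumulative frequency always equals the total number of observations, and the last cumulative percentage always equals 100.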


• LESS THAN CUMULATIVE FREQUENCY SERIES

HOURS          NO. OF WORKERS
LESS THAN 10   5
LESS THAN 30   15
LESS THAN 60   30
LESS THAN 90   50
• MORE THAN CUMULATIVE FREQUENCY SERIES

PROFITS (RS. IN LAKHS)   NO. OF COMPANIES
MORE THAN 100            150
MORE THAN 150            90
MORE THAN 200            40
MORE THAN 250            5
• INCLUSIVE CLASS INTERVAL

CLASS INTERVAL FREQUENCY

10 – 19 17
20 – 29 15
30 – 39 12
40 – 49 10
• EXCLUSIVE CLASS INTERVAL

REVENUE (RS.)   NO. OF PRODUCTS
100 – 200       15
200 – 300       20
300 – 400       10
400 – 500       5
TOTAL           50
• OPEN END CLASS INTERVAL

SALARY (RS.) NO. OF CLERKS


LESS THAN 1500 10
1500 – 1700 25
1700 – 1900 45
1900 – 2100 11
MORE THAN 2100 9
TOTAL 100
Why Use a Frequency Distribution?
DCOVA
• It condenses the raw data into a more useful
form
• It allows for a quick visual interpretation of the
data
• It enables the determination of the major
characteristics of the data set including where
the data are concentrated / clustered
Frequency Distributions:
Some Tips
DCOVA
• Different class boundaries may provide different pictures for the
same data (especially for smaller data sets)

• Shifts in data concentration may show up when different class
boundaries are chosen

• As the size of the data set increases, the impact of alterations in the
selection of class boundaries is greatly reduced

• When comparing two or more groups with different sample sizes,
you must use either a relative frequency or a percentage
distribution
Visualizing Categorical Data Through
Graphical Displays
DCOVA
Categorical Data: Visualizing Data
• Summary Table For One Variable: Bar Chart, Pie Chart, Pareto Chart
• Contingency Table For Two Variables: Side By Side Bar Chart
Visualizing Categorical Data:
The Bar Chart
DCOVA
 The bar chart visualizes a categorical variable as a series of bars. The
length of each bar represents either the frequency or percentage of values
for each category. Each bar is separated by a space called a gap.

Reason For Shopping Online?          Percent
Better Prices                        37%
Avoiding holiday crowds or hassles   29%
Convenience                          18%
Better selection                     13%
Ships directly                       3%
Visualizing Categorical Data:
The Pie Chart
DCOVA
 The pie chart is a circle broken up into slices that represent categories.
The size of each slice of the pie varies according to the percentage in
each category.

Reason For Shopping Online?          Percent
Better Prices                        37%
Avoiding holiday crowds or hassles   29%
Convenience                          18%
Better selection                     13%
Ships directly                       3%
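
Because a full circle is 360 degrees, each slice's angle is its percentage of 360. The slides do not show this calculation; the sketch below is an illustration using the shopping-online percentages, with hypothetical variable names.

```python
# Percent of respondents giving each reason (from the summary table).
percents = {
    "Better prices": 37,
    "Avoiding holiday crowds or hassles": 29,
    "Convenience": 18,
    "Better selection": 13,
    "Ships directly": 3,
}

# Slice angle in degrees = percentage / 100 * 360.
angles = {reason: 360 * pct / 100 for reason, pct in percents.items()}

for reason, angle in angles.items():
    print(f"{reason:<36} {angle:6.1f} degrees")
```

The angles add up to 360 degrees only when the percentages add up to 100, which is a quick sanity check before drawing a pie chart.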
Visualizing Categorical Data:
The Pareto Chart
DCOVA
• Used to portray categorical data
• A vertical bar chart, where categories are shown in
descending order of frequency
• A cumulative polygon is shown in the same graph
• Used to separate the “vital few” from the “trivial
many”
Visualizing Categorical Data:
The Pareto Chart (cont'd) DCOVA
Ordered Summary Table For Causes
Of Incomplete ATM Transactions

Cause                      Frequency   Percent   Cumulative Percent
Warped card jammed         365         50.41%    50.41%
Card unreadable            234         32.32%    82.73%
ATM malfunctions           32          4.42%     87.15%
ATM out of cash            28          3.87%     91.02%
Invalid amount requested   23          3.18%     94.20%
Wrong keystroke            23          3.18%     97.38%
Lack of funds in account   19          2.62%     100.00%
Total                      724         100.00%
Source: Data extracted from A. Bhalla, “Don’t Misuse the Pareto Principle,” Six Sigma Forum
Magazine, May 2009, pp. 15–18.
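
The ordered summary table behind a Pareto chart can be rebuilt from the raw frequencies: sort in descending order, then accumulate. A minimal Python sketch (variable names are illustrative):

```python
# Causes of incomplete ATM transactions and their frequencies.
causes = {
    "Warped card jammed": 365,
    "Card unreadable": 234,
    "ATM malfunctions": 32,
    "ATM out of cash": 28,
    "Invalid amount requested": 23,
    "Wrong keystroke": 23,
    "Lack of funds in account": 19,
}

total = sum(causes.values())

# Pareto ordering: largest frequency first (ties keep their original order).
ordered = sorted(causes.items(), key=lambda item: item[1], reverse=True)

rows = []
running = 0
for cause, freq in ordered:
    running += freq
    rows.append((cause, freq,
                 round(100 * freq / total, 2),      # percent
                 round(100 * running / total, 2)))  # cumulative percent

for cause, freq, pct, cum_pct in rows:
    print(f"{cause:<26} {freq:>4} {pct:>6.2f}% {cum_pct:>7.2f}%")
```

The cumulative column is computed from the running frequency rather than by adding rounded percentages, which avoids accumulating rounding error. Here the first two causes already account for 82.73% of incomplete transactions, which is what the "vital few" label refers to.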
Visualizing Categorical Data:
The Pareto Chart (cont'd) DCOVA
[Figure: Pareto chart of the causes of incomplete ATM transactions, with the "Vital Few" categories marked]
Visualizing Categorical Data:
Side By Side Bar Charts DCOVA
 The side by side bar chart represents the data from a contingency table.

                No Errors   Errors   Total
Small Amount    50.75%      30.77%   47.50%
Medium Amount   29.85%      61.54%   35.00%
Large Amount    19.40%      7.69%    17.50%
Total           100.0%      100.0%   100.0%

[Figure: side-by-side bar chart "Invoice Size Split Out By Errors & No Errors"; bars for Large, Medium, and Small invoices within the Errors and No Errors groups; horizontal axis from 0.0% to 70.0%]

Invoices with errors are much more likely to be of
medium size (61.54% vs 30.77% and 7.69%)
Visualizing Numerical Data By
Using Graphical Displays
DCOVA
Numerical Data
• Ordered Array: Stem-and-Leaf Display
• Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive
Stem-and-Leaf Display
DCOVA

• A simple way to see how the data are distributed and where
concentrations of data exist

METHOD: Separate the sorted data series into leading digits (the stems)
and the trailing digits (the leaves)
Organizing Numerical Data:
Stem and Leaf Display
DCOVA
 A stem-and-leaf display organizes data into groups (called
stems) so that the values within each group (the leaves) branch
out to the right on each row.

Age of Surveyed College Students

Day Students
16 17 17 18 18 18
19 19 20 20 21 22
22 25 27 32 38 42

Night Students
18 18 19 19 20 21
23 28 32 33 41 45

Stem-and-leaf displays of the ages:

Day Students        Night Students
Stem  Leaf          Stem  Leaf
1     67788899      1     8899
2     0012257       2     0138
3     28            3     23
4     2             4     15
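
Splitting each age into a stem (tens digit) and a leaf (units digit) can be sketched in Python. This is a simplified illustration that assumes non-negative whole numbers with a stem width of 10; the function name stem_and_leaf is made up for the example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to a string of its sorted leaves (units digits)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(stems.items())}

day = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22,
       22, 25, 27, 32, 38, 42]
night = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]

for label, ages in (("Day Students", day), ("Night Students", night)):
    print(label)
    for stem, leaves in stem_and_leaf(ages).items():
        print(f"  {stem} | {leaves}")
```

Sorting before splitting keeps the leaves in ascending order within each stem, so the display doubles as an ordered array.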
Visualizing Numerical Data:
The Histogram
DCOVA
• A vertical bar chart of the data in a frequency distribution is
called a histogram.

• In a histogram there are no gaps between adjacent bars.

• The class boundaries (or class midpoints) are shown on the
horizontal axis.

• The vertical axis is either frequency, relative frequency, or
percentage.

• The height of each bar represents the frequency, relative
frequency, or percentage.
Visualizing Numerical Data:
The Histogram
DCOVA
Class                 Frequency   Relative Frequency   Percentage
10 but less than 20   3           .15                  15
20 but less than 30   6           .30                  30
30 but less than 40   5           .25                  25
40 but less than 50   4           .20                  20
50 but less than 60   2           .10                  10
Total                 20          1.00                 100

[Figure: histogram "Daily High Temperature"; horizontal axis shows class midpoints from 5 to 55, vertical axis shows Frequency from 0 to 8]

(In a percentage histogram the vertical axis would be defined to show
the percentage of observations per class)
Visualizing Numerical Data:
The Polygon
DCOVA

• A percentage polygon is formed by having the midpoint of each class
represent the data in that class and then connecting the sequence of
midpoints at their respective class percentages.

• The cumulative percentage polygon, or ogive, displays the variable of
interest along the X axis, and the cumulative percentages along the Y axis.

• Useful when there are two or more groups to compare.


Visualizing Numerical Data:
The Percentage Polygon DCOVA
Useful When Comparing Two or More Groups
Visualizing Two Numerical Variables By Using
Graphical Displays
DCOVA

Two Numerical Variables
• Scatter Plot
• Time-Series Plot
Visualizing Two Numerical
Variables: The Scatter Plot
DCOVA
• Scatter plots are used for numerical data consisting of paired
observations taken from two numerical variables

• One variable is measured on the vertical axis and the other
variable is measured on the horizontal axis

• Scatter plots are used to examine possible relationships
between two numerical variables
Scatter Plot Example
DCOVA

Volume per day   Cost per day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200

[Figure: scatter plot "Cost per Day vs. Production Volume"; horizontal axis: Volume per Day (20 to 70), vertical axis: Cost per Day (0 to 250)]
Visualizing Two Numerical
Variables: The Time Series Plot
DCOVA
• A Time-Series Plot is used to study patterns
in the values of a numeric variable over time

• The Time-Series Plot:
  • Numeric variable is measured on the vertical
  axis and the time period is measured on the
  horizontal axis
Time Series Plot Example
DCOVA
Year   Number of Franchises
1996   43
1997   54
1998   60
1999   73
2000   82
2001   95
2002   107
2003   99
2004   95

[Figure: time-series plot "Number of Franchises, 1996 - 2004"; horizontal axis: Year (1994 to 2006), vertical axis: Number of Franchises (0 to 150)]
Organizing Many Categorical Variables: The
Multidimensional Contingency Table
DCOVA
• A multidimensional contingency table is constructed by tallying
the responses of three or more categorical variables.

• In Excel, creating a PivotTable yields an interactive display of
this type.

• While Minitab will not create an interactive table, it has many
specialized statistical & graphical procedures (not covered in this
book) to analyze & visualize multidimensional data.
Using Excel Pivot Tables To Organize & Visualize
Many Variables
DCOVA
A pivot table:
• Summarizes variables as a multidimensional summary table
• Allows interactive changing of the level of summarization and
formatting of the variables
• Allows you to interactively “slice” your data to summarize
subsets of data that meet specified criteria
• Can be used to discover possible patterns and relationships in
multidimensional data that simpler tables and charts would
fail to make apparent.
A Multidimensional Contingency Table Tallies
Responses Of Three or More Categorical Variables
DCOVA

[Figure: Two Dimensional Table Showing The Mean 10 Year Return %
Broken Out By Type Of Fund & Risk Level]

[Figure: Three Dimensional Table Showing The Mean 10 Year Return %
Broken Out By Type Of Fund, Market Cap, & Risk Level]
Data Discovery Methods Can Yield
Initial Insights Into Data
DCOVA
• Data discovery methods enable the performance
of preliminary analyses by manipulating interactive
summarizations
• They are used to:
  • Take a closer look at historical or status data
  • Review data for unusual values
  • Uncover new patterns in data
• Drill-down is perhaps the simplest form of data
discovery
Drill-Down Reveals The Data Underlying A
Higher-Level Summary
DCOVA
Results of drilling down to
the details about small
market cap value funds with
low risk.
Some Data Discovery Methods Are
Primarily Visual DCOVA
• A treemap is such a method

• A treemap visualizes the comparison of two or more variables using
the size and color of rectangles to represent values

• When used with one or more categorical variables it forms a
multilevel hierarchy or tree that can uncover patterns among
numerical variables.
An Example Of A Treemap
DCOVA
A treemap of the numerical variables assets (size) and 10-year
return percentage (color) for growth and value funds that have
small market capitalizations and low risk
The Challenges in Organizing and
Visualizing Variables DCOVA
• When organizing and visualizing data, you need to be
mindful of:
  • The limits of others' ability to perceive and comprehend
  • Presentation issues that can undercut the usefulness of
  methods from this chapter.
• It is easy to create summaries that
  • Obscure the data or
  • Create false impressions
An Example Of Obscuring Data, Information Overload
DCOVA
False Impressions Can Be Created In
Many Ways DCOVA

• Selective summarization
  • Presenting only part of the data collected

• Improperly constructed charts
  • Potential pie chart issues
  • Improperly scaled axes
  • A Y axis that does not begin at the origin or is a broken
  axis missing intermediate values

• Chartjunk
An Example of Selective Summarization, These Two
Summarizations Tell Totally Different Stories

DCOVA

Company   Change from Prior Year
A         +7.2%
B         +24.4%
C         +24.9%
D         +24.8%
E         +12.5%
F         +35.1%
G         +29.7%

Company   Year 1    Year 2    Year 3
A         -22.6%    -33.2%    +7.2%
B         -4.5%     -41.9%    +24.4%
C         -18.5%    -31.5%    +24.9%
D         -29.4%    -48.1%    +24.8%
E         -1.9%     -25.3%    +12.5%
F         -1.6%     -37.8%    +35.1%
G         +7.4%     -13.6%    +29.7%
How Obvious Is It That Both Pie Charts Summarize
The Same Data?
DCOVA

Why is it hard to tell? What would you do to improve?


Graphical Errors:
No Relative Basis DCOVA

Bad Presentation Good Presentation


A’s received by A’s received by
Freq. students. % students.
30%
300

200 20%

100 10%

0 0%
FR SO JR SR FR SO JR SR

FR = Freshmen, SO = Sophomore, JR = Junior, SR = Senior


Graphical Errors:
Compressing the Vertical Axis
DCOVA

[Figure: two charts of Quarterly Sales for Q1 to Q4. Bad Presentation: vertical axis runs from $0 to $200. Good Presentation: vertical axis runs from $0 to $50.]
Graphical Errors: No Zero Point on the
Vertical Axis
DCOVA

[Figure: two charts of Monthly Sales for January to June. Bad Presentation: vertical axis runs from $36 to $45 with no zero point. Good Presentations: vertical axis includes 0 below the 36 to 45 range.]

Graphing the first six months of sales


Graphical Errors: Chart Junk,
Can You Identify The Junk?
DCOVA

[Figure: two presentations of the Minimum Wage. Bad Presentation: chart labeled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80. Good Presentation: chart with years 1960 to 1990 on the horizontal axis and dollars (0 to 4) on the vertical axis.]
In Excel It Is Easy To Inadvertently
Create Distortions
• Excel often will create a graph where the vertical axis
does not start at 0

• Excel offers the opportunity to turn simple charts into 3-D charts
and in the process can create distorted images

• Unusual charts offered as choices by Excel will most often create
distorted images
Best Practices for Constructing
Visualizations DCOVA

• Use the simplest possible visualization
• Include a title
• Label all axes
• Include a scale for each axis if the chart contains axes
• Begin the scale for a vertical axis at zero
• Use a constant scale
• Avoid 3D effects
• Avoid chartjunk
Crosstabulations and Scatter
Diagrams
 Thus far we have focused on methods that are used
to summarize the data for one variable at a time.
 Often a manager is interested in tabular and
graphical methods that will help understand the
relationship between two variables.
 Crosstabulation and a scatter diagram are two
methods for summarizing the data for two variables
simultaneously.
Crosstabulation

• A crosstabulation is a tabular summary of data for two variables.
• Crosstabulation can be used when:
  • one variable is qualitative and the other is quantitative,
  • both variables are qualitative, or
  • both variables are quantitative.
• The left and top margin labels define the classes for
the two variables.
Crosstabulation

Example: Finger Lakes Homes

The number of Finger Lakes homes sold for each style and price for the
past two years is shown below. Price range is the quantitative variable
and home style is the categorical variable.

Price                    Home Style
Range        Colonial   Log   Split   A-Frame   Total
< $200,000   18         6     19      12        55
≥ $200,000   12         14    16      3         45
Total        30         20    35      15        100
Crosstabulation
 Example: Finger Lakes Homes
Insights Gained from Preceding Crosstabulation
• The greatest number of homes (19) in the sample
are a split-level style and priced at less than
$200,000.
• Only three homes in the sample are an A-Frame
style and priced at $200,000 or more.
Crosstabulation
Example: Finger Lakes Homes

Price                    Home Style
Range        Colonial   Log   Split   A-Frame   Total
< $200,000   18         6     19      12        55
≥ $200,000   12         14    16      3         45
Total        30         20    35      15        100

The Total column (right margin) is the frequency distribution for the
price range variable. The Total row (bottom margin) is the frequency
distribution for the home style variable.
Crosstabulation: Row or Column Percentages
Converting the entries in the table into row percentages or
column percentages can provide additional insight about the
relationship between the two variables.
Crosstabulation: Row Percentages

 Example: Finger Lakes Homes

Price                    Home Style
Range        Colonial   Log     Split   A-Frame   Total
< $200,000   32.73      10.91   34.55   21.82     100
≥ $200,000   26.67      31.11   35.56   6.67      100

Note: row totals are actually 100.01 due to rounding.

(Colonial and ≥ $200K)/(All ≥ $200K) x 100 = (12/45) x 100 = 26.67


Crosstabulation: Column Percentages

 Example: Finger Lakes Homes

Price                    Home Style
Range        Colonial   Log     Split   A-Frame
< $200,000   60.00      30.00   54.29   80.00
≥ $200,000   40.00      70.00   45.71   20.00
Total        100        100     100     100

(Colonial and ≥ $200K)/(All Colonial) x 100 = (12/30) x 100 = 40.00


Cross Tabulation
• Understanding the relationship between 2 variables
• Example – Quality rating and meal price at 12 restaurants

#    Rating      Price
1    Good        18
2    Very Good   22
3    Good        28
4    Excellent   38
5    Good        33
6    Very Good   28
7    Excellent   19
8    Very Good   11
9    Good        23
10   Very Good   13
11   Excellent   18
12   Excellent   33
Cross Tabulation…
• One variable is qualitative (Rating) and the other quantitative (Price);
row percentages are included

                        Price
Rating      10 - 19   20 - 29   30 - 39   Total
Good        1         2         1         4
            25%       50%       25%       100%
Very Good   2         2         0         4
            50%       50%       0%        100%
Excellent   2         0         2         4
            50%       0%        50%       100%
Total       5         4         3         12
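
The crosstab above can be rebuilt from the 12 raw (rating, price) pairs. The sketch below is one way to do it in plain Python; the helper price_class and the label format are assumptions chosen to match the table's class intervals.

```python
from collections import Counter

# (quality rating, meal price) for the 12 restaurants.
meals = [
    ("Good", 18), ("Very Good", 22), ("Good", 28), ("Excellent", 38),
    ("Good", 33), ("Very Good", 28), ("Excellent", 19), ("Very Good", 11),
    ("Good", 23), ("Very Good", 13), ("Excellent", 18), ("Excellent", 33),
]

def price_class(price):
    """Return the inclusive class label, e.g. 23 -> '20 - 29'."""
    lower = (price // 10) * 10
    return f"{lower} - {lower + 9}"

# Tally each (rating, price class) cell jointly.
cells = Counter((rating, price_class(price)) for rating, price in meals)

ratings = ["Good", "Very Good", "Excellent"]
classes = ["10 - 19", "20 - 29", "30 - 39"]

for rating in ratings:
    row = [cells[(rating, c)] for c in classes]
    row_total = sum(row)
    row_pcts = [f"{100 * n / row_total:.0f}%" for n in row]
    print(rating, row, row_total, row_pcts)
```

A Counter returns 0 for a cell that never occurs, so empty cells such as (Very Good, 30 - 39) need no special handling.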
Cross Tabulation …
Problem - In a study of job satisfaction for 4 occupations (higher
scores indicate higher satisfaction), provide a cross tab of
occupation & satisfaction score

Lawyer 44         Comp Analyst 54   Lawyer 53
Doctor 80         Lawyer 42         Physiatrist 48
Lawyer 62         Physiatrist 59    Doctor 62
Physiatrist 55    Doctor 79         Lawyer 86
Lawyer 64         Physiatrist 76    Comp Analyst 79
Comp Analyst 73   Doctor 50         Physiatrist 60
Physiatrist 86    Comp Analyst 86   Doctor 52
Lawyer 71         Comp Analyst 50   Lawyer 79
Doctor 78         Physiatrist 76    Comp Analyst 69


• TWO WAY FREQUENCY SERIES/BIVARIATE SERIES

CLASS INTERVAL   0 – 5   5 – 10   10 – 15   15 – 20
0 – 10           1       -        2         -
10 – 20          4       3        -         -
20 – 30          -       -        1         -
30 – 40          2       -        1         -
Scatter Diagram and Trendline

• A scatter diagram is a graphical presentation of the relationship
between two quantitative variables.
• One variable is shown on the horizontal axis and
the other variable is shown on the vertical axis.
• The general pattern of the plotted points suggests
the overall relationship between the variables.
• A trendline provides an approximation of the
relationship.
ScatterADiagram
Positive Relationship
y

x
ScatterADiagram
Negative Relationship
y

x
Scatter Diagram
No Apparent Relationship
y

x
Scatter Diagram
Example: Panthers Football Team
The Panthers football team is interested in
investigating the relationship, if any, between
interceptions made and points scored.
x = Number of y = Number of
Interceptions Points Scored
1 14
3 24
2 18
1 17
3 30
Scatter Diagram
[Figure: scatter diagram for the Panthers; horizontal axis (x): Number of Interceptions (0 to 4), vertical axis (y): Number of Points Scored (0 to 35)]
Example: Panthers Football Team

Insights Gained from the Preceding Scatter Diagram
• The scatter diagram indicates a positive relationship
between the number of interceptions and the
number of points scored.
• Higher points scored are associated with a higher
number of interceptions.
• The relationship is not perfect; not all plotted points in
the scatter diagram lie on a straight line.
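
The slides do not say how a trendline is fitted. One common choice, assumed here purely for illustration, is the least-squares line; the sketch below applies it to the five Panthers observations. The function name least_squares_line is hypothetical.

```python
# Panthers data: x = interceptions made, y = points scored.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 30]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(x, y)
print(f"points = {intercept:.2f} + {slope:.2f} * interceptions")
```

A positive slope is consistent with the positive relationship noted above; with only five observations the fitted line is a rough approximation, not a reliable prediction rule.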
Scatter Diagram and Trendline

[Figure: "Scatter Diagram for the Panthers" with trendline; horizontal axis: Number of Interceptions (0 to 4), vertical axis: Number of Points Scored (0 to 35)]
Tabular and Graphical Methods

Categorical Data
• Tabular Methods: Frequency Distribution, Rel. Freq. Dist., Percent Freq.
Distribution, Crosstabulation
• Graphical Methods: Bar Chart, Pie Chart

Quantitative Data
• Tabular Methods: Frequency Distribution, Rel. Freq. Dist., % Freq. Dist.,
Cum. Freq. Dist., Cum. Rel. Freq. Distribution, Cum. % Freq. Distribution,
Crosstabulation
• Graphical Methods: Dot Plot, Histogram, Ogive, Stem-and-Leaf Display,
Scatter Diagram
Methods of Summarizing Data
Tabular Graphical
Presentation Presentation
Frequency Distribution Dot Plot, Line Chart
Relative Frequency Distribution Histogram
Percent Frequency Distribution Bar diagram, Pie
Cumulative Frequency Distribution Ogive, Freq polygon
Cum Relative Frequency Distribution Freq Curve
Cum Percent Frequency Distribution Stem & Leaf Display
Cross tabulation Scatter Diagram
• The response to a question has three alternatives: A, B and C. A
sample of 120 responses provides 60 A, 24 B and 36 C. Show the
frequency and relative frequency distributions.
• A partial relative frequency distribution is given.
  Class   Relative frequency
  A       0.22
  B       0.18
  C       0.40
  D
  • What is the relative frequency of class D?
  • The total sample size is 200. What is the frequency of class D?
  • Show the frequency distribution.
  • Show the percent frequency distribution.
• A questionnaire provides 58 yes, 42 no and 20 no-opinion answers.
  • In the construction of a pie chart, how many degrees would be in the
  section of the pie showing the Yes answers?
  • How many degrees would be in the section of the pie showing the No
  answers?
Frequency Distribution

 Example: Hudson Auto Repair


Sample of Parts Cost($) for 50 Tune-ups
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Relative Frequency and
Percent Frequency Distributions
Example: Hudson Auto Repair

Parts Cost ($)   Relative Frequency   Percent Frequency
50-59            .04                  4
60-69            .26                  26
70-79            .32                  32
80-89            .14                  14
90-99            .14                  14
100-109          .10                  10
Total            1.00                 100

For the 50-59 class, the relative frequency is 2/50 = .04 and the percent
frequency is .04(100) = 4. Percent frequency is the relative frequency
multiplied by 100.
Relative Frequency and
Percent Frequency Distributions
 Example: Hudson Auto Repair
Insights Gained from the % Frequency Distribution:
• Only 4% of the parts costs are in the $50-59 class.
• 30% of the parts costs are under $70.
• The greatest percentage (32% or almost one-third)
of the parts costs are in the $70-79 class.
• 10% of the parts costs are $100 or more.
Example of Ungrouped Data
Ages of a Sample of Managers from Urban Child Care Centers in the
United States

42 26 32 34 57
30 58 37 50 30
53 40 30 47 49
50 40 32 31 40
52 28 23 35 25
30 36 32 26 50
55 30 58 64 52
49 33 43 46 32
61 31 30 40 60
74 37 29 43 54
Frequency Distribution of Child
Care Managers' Ages

Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
Data Range

42 26 32 34 57
30 58 37 50 30
53 40 30 47 49
50 40 32 31 40
52 28 23 35 25
30 36 32 26 50
55 30 58 64 52
49 33 43 46 32
61 31 30 40 60
74 37 29 43 54

Smallest = 23, Largest = 74
Range = Largest - Smallest = 74 - 23 = 51
Number of Classes and Class Width
• The number of classes should be between 5 and 15.
• Fewer than 5 classes cause excessive summarization.
• More than 15 classes leave too much detail.
• Class Width
• Divide the range by the number of classes for an
approximate class width
• Round up to a convenient number

Approximate Class Width = 51 / 6 = 8.5
Class Width = 10
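
The class-width arithmetic can be sketched in Python. What counts as a "convenient number" is a judgment call; rounding up to the next multiple of 10 is an assumption made here because it reproduces the slide's result, and both function names are hypothetical.

```python
import math

# Ages of the 50 child care managers.
ages = [42, 26, 32, 34, 57, 30, 58, 37, 50, 30,
        53, 40, 30, 47, 49, 50, 40, 32, 31, 40,
        52, 28, 23, 35, 25, 30, 36, 32, 26, 50,
        55, 30, 58, 64, 52, 49, 33, 43, 46, 32,
        61, 31, 30, 40, 60, 74, 37, 29, 43, 54]

def approximate_class_width(data, n_classes):
    """Range (largest - smallest) divided by the desired number of classes."""
    return (max(data) - min(data)) / n_classes

def round_up_to(value, step):
    """Round value up to the next multiple of step (assumed 'convenient' unit)."""
    return math.ceil(value / step) * step

width = approximate_class_width(ages, n_classes=6)  # (74 - 23) / 6
class_width = round_up_to(width, step=10)

print(width, class_width)
```

Rounding up rather than to the nearest value matters: a width rounded down could leave the largest observation outside the last class.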
Relative Frequency
Relative
Class Interval Frequency Frequency
20-under 30 6 .12
30-under 40 18 .36
40-under 50 11 .22
50-under 60 11 .22
60-under 70 3 .06
70-under 80 1 .02
Total 50 1.00
Cumulative Frequency
Cumulative
Class Interval Frequency Frequency
20-under 30 6 6
30-under 40 18 24
40-under 50 11 35
50-under 60 11 46
60-under 70 3 49
70-under 80 1 50
Total 50
Class Midpoints, Relative Frequencies, and
Cumulative Frequencies

Relative Cumulative
Class Interval Frequency Midpoint Frequency Frequency
20-under 30 6 25 .12 6
30-under 40 18 35 .36 24
40-under 50 11 45 .22 35
50-under 60 11 55 .22 46
60-under 70 3 65 .06 49
70-under 80 1 75 .02 50
Total 50 1.00
Cumulative Relative Frequencies
Cumulative
Relative Cumulative Relative
Class Interval Frequency Frequency Frequency Frequency
20-under 30 6 .12 6 .12
30-under 40 18 .36 24 .48
40-under 50 11 .22 35 .70
50-under 60 11 .22 46 .92
60-under 70 3 .06 49 .98
70-under 80 1 .02 50 1.00
Total 50 1.00
Cumulative Distributions

• The last entry in a cumulative frequency distribution always equals
the total number of observations.
• The last entry in a cumulative relative frequency
distribution always equals 1.00.
• The last entry in a cumulative percent frequency
distribution always equals 100.
Ogive

• An ogive is a graph of a cumulative distribution.
• The data values are shown on the horizontal axis.
• Shown on the vertical axis are the:
  • cumulative frequencies, or
  • cumulative relative frequencies, or
  • cumulative percent frequencies
• The frequency (one of the above) of each class is
plotted as a point.
• The plotted points are connected by straight lines.
Ogive

Hudson Auto Repair
• Because the class limits for the parts-cost data are
50-59, 60-69, and so on, there appear to be one-unit
gaps from 59 to 60, 69 to 70, and so on.
• These gaps are eliminated by plotting points
halfway between the class limits.
• Thus, 59.5 is used for the 50-59 class, 69.5 is used
for the 60-69 class, and so on.
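
The points plotted on the ogive can be computed directly from the 50 parts costs. A minimal Python sketch, assuming inclusive classes 50-59 through 100-109 and using the halfway boundaries (59.5, 69.5, ...) described above; variable names are illustrative.

```python
from itertools import accumulate

# Parts cost ($) for the 50 tune-ups at Hudson Auto Repair.
costs = [91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
         71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
         104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
         85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
         62, 82, 98, 101, 79, 105, 79, 69, 62, 73]

lower_limits = [50, 60, 70, 80, 90, 100]

# Inclusive classes: 50-59, 60-69, ..., 100-109.
frequencies = [sum(lo <= c <= lo + 9 for c in costs) for lo in lower_limits]
cumulative = list(accumulate(frequencies))

# Plot each cumulative percent at the boundary halfway to the next class.
ogive_points = [(lo + 9.5, 100 * cum / len(costs))
                for lo, cum in zip(lower_limits, cumulative)]

for boundary, cum_pct in ogive_points:
    print(boundary, cum_pct)
```

Each point reads as "this percentage of parts costs fall at or below this boundary", so the curve can only stay level or rise from left to right.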
Ogive with Cumulative Percent Frequencies

Example: Hudson Auto Repair

[Figure: ogive "Tune-up Parts Cost"; horizontal axis: Parts Cost ($) from 50 to 110, vertical axis: Cumulative Percent Frequency from 20 to 100; the point (89.5, 76) is marked]
Histogram – more insight
[Figure: histogram "Tune-up Parts Cost"; horizontal axis: Parts Cost ($) in classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109; vertical axis: Frequency from 2 to 18]
Histograms Showing Skewness

Symmetric
• Left tail is the mirror image of the right tail
• Examples: heights and weights of people
[Figure: symmetric histogram; vertical axis: Relative Frequency from 0 to .35]
Histograms Showing Skewness

Moderately Skewed Left
• A longer tail to the left
• Example: exam scores
[Figure: moderately left-skewed histogram; vertical axis: Relative Frequency from 0 to .35]
Histograms Showing Skewness

Moderately Skewed Right
• A longer tail to the right
• Example: housing values
[Figure: moderately right-skewed histogram; vertical axis: Relative Frequency from 0 to .35]
Histograms Showing Skewness

Highly Skewed Right
• A very long tail to the right
• Example: executive salaries
[Figure: highly right-skewed histogram; vertical axis: Relative Frequency from 0 to .35]
Dot Plot

• One of the simplest graphical summaries of data is a dot plot.
• A horizontal axis shows the range of data values.
• Then each data value is represented by a dot placed
above the axis.
Dot Plot

Example: Hudson Auto Repair

[Figure: dot plot "Tune-up Parts Cost"; horizontal axis: Cost ($) from 50 to 110]
Frequency Distribution…
Example – BMW manufactures racing cars and has gathered the following
information on the number of models of engines in different size
categories used in the racing market it serves.

Engine Size (cu inches)   # of models
101 – 150                 1
151 – 200                 7
201 – 250                 7
251 – 300                 8
301 – 350                 17
351 – 400                 16
401 – 450                 15
451 – 500                 7
Frequency Distribution…
- Construct a cumulative relative frequency distribution.

- 70% of engine models are larger than what size?

- What is the approximate middle value in the original data set?

- If BMW designs a fuel injection system that can be used on engines
up to 400 cu inches, about what % of engine models will not be able
to use the system?
• END OF CHAPTER 2

Descriptive statistics: Tabular and Graphical Displays

(Page :33-98)
