Management
Term II
4 credits
MGT 408
DAY -2
Business Statistics
A First course
David M.Levine
Kathryn A.Szabat
David F.Stephan
P.K.Viswanathan
PEARSON PUBLICATIONS 7e
Subject Outline
• Introduction ch-1
• Data collection, classification and presentation ch-2
• Measures of central tendencies and dispersion ch-3
• Correlation and Regression analysis ch-12
• Probability concepts ch-04
• Probability distributions – Binomial and Poisson ch-05
• Probability distribution – Normal ch-06
• Sampling techniques ch-07
• Estimation and Inference statistics ch-08
• Testing of Hypothesis – Non Parametric (Chi square) ch-9, 11
• Bayesian Analysis and decision theory ch-15
Recap -1
• Introduction
• Definition
• Importance and limitations
• Applications
• Scale of measurement
• Type of variables
– Qualitative, quantitative
– Time series and cross sectional
• Population, sample, parameter and
statistic
Types of statistics
Descriptive
inferential
Types of Statistics
• Statistics
• The branch of mathematics that transforms data
into useful information for decision makers.
• Collect data
– e.g., Survey
• Present data
– e.g., Tables and graphs
• Characterize data
– e.g., Sample mean = $\bar{X} = \dfrac{\sum X_i}{n}$
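As a small illustration of characterizing data, the sample mean formula can be sketched in Python. The weights below are hypothetical values chosen only for this example; they do not come from the course material.

```python
from statistics import mean

# Hypothetical sample of weights in pounds (illustrative values only)
weights = [118, 125, 121, 130, 116]

# Sample mean: sum of the observations divided by the sample size n
sample_mean = sum(weights) / len(weights)

# The standard library's statistics.mean gives the same value
library_mean = mean(weights)

print(sample_mean)
```

Computing the mean by hand and with `statistics.mean` side by side is only a sanity check; either form is enough in practice.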
Inferential Statistics
• Estimation
– e.g., Estimate the
population mean
weight using the
sample mean weight
• Hypothesis testing
– e.g., Test the claim that
the population mean
weight is 120 pounds
[Figure: Inference process. Select a random sample from the population, calculate the sample statistic x, and use it to estimate the population parameter.]
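The idea of using a statistic to estimate a parameter can be sketched with a small simulation. Everything here is an assumption made for illustration: the population is artificial (10,000 weights drawn around 120 pounds), and the sample size of 100 and the seed are arbitrary.

```python
import random

random.seed(408)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 weights (pounds) centred near 120
population = [random.gauss(120, 15) for _ in range(10_000)]
population_mean = sum(population) / len(population)   # parameter

# Select a simple random sample without replacement
sample = random.sample(population, 100)
sample_mean = sum(sample) / len(sample)               # statistic

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In a real study the population mean is usually unknown; it is computed here only so the two values can be compared.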
Sources of data collection
Collecting Data Correctly Is A Critical
Task
DCOVA
Need to avoid data flawed by
biases, ambiguities, or other
types of errors.
Secondary Sources: The person performing data analysis is not the data
collector
Analyzing census data
Examining data from print journals or data published on the internet.
Government data: economics and demographics
Media reports – TV, newspapers, Internet
Companies that specialize in gathering data
Sources of data fall into five
categories
• Data distributed by an organization or an
individual
• The outcomes of a designed experiment
• The responses from a survey
• The results of conducting an
observational study
• Data collected by ongoing business
activities
Examples Of Data Distributed By
Organizations or Individuals
• Financial data on a company provided
by investment services.
In experimental studies the variable of interest is first identified. Then one or more other variables are identified and controlled so that data can be obtained about how they influence the variable of interest.
The largest experimental study ever conducted is believed to be the 1954 Public Health Service experiment for the Salk polio vaccine. Nearly two million U.S. children (grades 1-3) were selected.
Examples of Survey Data
• A survey asking people which laundry
detergent has the best stain-removing
abilities
In observational (nonexperimental) studies no attempt is made to control or influence the variables of interest. A survey is a good example.
Studies of smokers and nonsmokers are observational studies because researchers do not determine or control who will smoke and who will not smoke.
Examples of Data Collected From
Ongoing Business Activities
• A bank studies years of financial
transactions to help them identify
patterns of fraud.
Time Requirement
• Searching for information can be time consuming.
• Information may no longer be useful by the time it
is available.
Cost of Acquisition
• Organizations often charge for information even
when it is not their primary business activity.
Data Errors
• Using any data that happen to be available or were
acquired with little care can lead to misleading
information.
PRACTICE
Examples of Types of Variables
Population Statistic Parameter Sample
• A student makes an 82 on the first test in a
statistics course. From this, she assumes that her
average at the end of the semester (after other
tests) will be about 82. This is an example of
(a)____________________.
Descriptive statistics
Inferential statistics
Nonparametric statistics
Wishful thinking
• A statistics instructor collects information about the
background of his students. About 30% have taken
economics and about 40% have taken accounting.
There are 23 male students and 27 female students in
this class. This is an example
of (a)____________________.
Descriptive statistics
Inferential statistics
Nominal data
Nonparametric statistics
• Abel Alonzo, Director of Human Resources, is
exploring the causes of employee absenteeism
at Batesville Bottling during the last operating
year (January 1, 1999 through December 31,
1999). The average number of absences per
employee, computed from the personnel data of
all employees, is a (a)____________________.
Parameter
Population
Sample
Statistic
(Marked answer: Parameter)
• Pinky Bauer, Chief Financial Officer of Harrison
Haulers, Inc., suspects irregularities in the payroll
system, and orders an inspection of "each and
every payroll voucher issued since January 1,
1991". Five percent of the payroll vouchers
contained material errors. This is an example
of (a)____________________.
Nonparametric statistics
Nominal data
Inferential statistics
Descriptive statistics
(Marked answer: Descriptive statistics)
Data Mining
• Search for patterns in large data sets
– Business data: marketing, finance, production ...
• Collected for some purpose, often useful for others
• From government or private companies
– Makes use of
• Statistics – all the basic activities, and
– Prediction, classification, clustering
• Computer science – efficient algorithms (instructions) for
– Collecting, maintaining, organizing, analyzing data
• Optimization – calculations to achieve a goal
– Maximize or minimize (e.g. sales or costs)
Census Bureau County Data
• Over 1,000 counties with demographic, social,
economic, and housing data available for
mining
Clusters of Households
• Identified through data mining (A Classification of Residential Neighborhoods)
[Figure: segments and summary groups, e.g., Top One Percent]
• Qualitative
• Quantitative
• Geographical
• Chronological
– Time series: a set of observations collected at usually discrete and equally spaced time intervals, e.g., the daily closing price of a certain stock recorded over the last six weeks
– Cross sectional: observations from different individuals or groups at a single point in time, e.g., the inventory of all ice creams in stock at a particular store
PRESENTATION OF DATA
TABULAR
DIAGRAMS
GRAPHS
• TABULATION
SPECIMEN OF A TABLE
[Figure: specimen table layout showing Total, Grand Total, Foot Note, and Sources]
DESCRIPTIVE STATISTICS:
ORGANIZING AND VISUALIZING
VARIABLES
CHAPTER 2
Descriptive Statistics:
Tabular and Graphical
Presentations
• Summarizing Categorical Data
Summarizing Quantitative Data
Categorical data use labels or names to identify categories of like items.
Quantitative data are numerical values that indicate how much or how many.
Categorical Data Are Organized By Utilizing Tables
Categorical Data: Tallying Data
• One categorical variable: Summary Table
• Two categorical variables: Contingency Table
Organizing Categorical Data: Summary Table
A summary table tallies the frequencies or percentages of items in a set of
categories so that you can see differences between categories.
Source: Data extracted and adapted from “Main Reason Young Adults Shop Online?”
USA Today, December 5, 2012, p. 1A.
Frequency Distribution
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes.
The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
Relative Frequency Distribution
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100.
A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
Frequency Distribution…
Example – 4 soft drinks – 15
households
Coke Pepsi 7 Up Coke Mirinda
Coke 7 Up 7 Up Coke Coke
Mirinda 7 Up Coke Mirinda Coke
Drink Frequency
Coke 7
Pepsi 1
Mirinda 3
7 Up 4
Total 15
Frequency Distribution…
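Soft Drink   Frequency   Relative frequency   Percent frequency
Coke         7           0.47                 47
Pepsi        1           0.07                 7
Mirinda      3           0.20                 20
7 Up         4           0.27                 27
Total        15          1.00                 100
(Relative and percent frequencies are rounded, so the displayed values do not sum exactly to the totals.)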
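The soft drink tally can be reproduced with a short Python sketch. It uses the 15 responses listed in the example and rounds each relative frequency to two decimals independently, so 7/15 is shown as 0.47; the variable names are illustrative only.

```python
from collections import Counter

# The 15 household responses from the example
responses = [
    "Coke", "Pepsi", "7 Up", "Coke", "Mirinda",
    "Coke", "7 Up", "7 Up", "Coke", "Coke",
    "Mirinda", "7 Up", "Coke", "Mirinda", "Coke",
]

frequency = Counter(responses)   # frequency of each drink
n = len(responses)               # total number of households

# drink -> (frequency, relative frequency, percent frequency)
table = {}
for drink, count in frequency.items():
    relative = count / n
    table[drink] = (count, round(relative, 2), round(relative * 100))

for drink, (count, rel, pct) in table.items():
    print(f"{drink:<8}{count:>3}{rel:>6.2f}{pct:>5}")
```

Because each class is rounded on its own, the rounded relative frequencies need not add up to exactly 1.00.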
Frequency Distribution
Example: Marada Inn
Guests staying at Marada Inn were asked to rate the quality of their
accommodations as being excellent, above average, average, below
average, or poor. The ratings provided by a sample of 20 guests are:
Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
Relative Frequency and
Percent Frequency Distributions
Example: Marada Inn
Rating          Relative Frequency   Percent Frequency
Poor            .10                  10
Below Average   .15                  15
Average         .25                  25
Above Average   .45                  45
Excellent       .05                  5
Total           1.00                 100
Relative frequency for Excellent: 1/20 = .05
Percent frequency for Poor: .10(100) = 10
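The Marada Inn relative and percent frequencies follow directly from the frequency table. A minimal sketch, using the frequencies given in the example (the dictionary layout is just one convenient choice):

```python
# Frequencies from the Marada Inn example (sample of 20 guests)
ratings = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

total = sum(ratings.values())

# Relative frequency = class frequency / total number of items
relative = {rating: f / total for rating, f in ratings.items()}

# Percent frequency = relative frequency x 100
percent = {rating: 100 * f / total for rating, f in ratings.items()}

for rating in ratings:
    print(f"{rating:<15}{relative[rating]:>6.2f}{percent[rating]:>6.0f}")
```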
A Contingency Table Helps Organize Two or More
Categorical Variables
• Used to study patterns that may exist between the
responses of two or more categorical variables
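A contingency table can be built by tallying pairs of responses. The sketch below uses entirely hypothetical survey records (gender and preferred payment method) invented for illustration; only the tallying technique is the point.

```python
from collections import Counter

# Hypothetical survey records: (gender, preferred payment method)
records = [
    ("Female", "Card"), ("Male", "Cash"), ("Female", "Card"),
    ("Male", "Card"), ("Female", "Cash"), ("Male", "Cash"),
    ("Female", "Card"), ("Male", "Card"),
]

cells = Counter(records)                  # joint frequency of each pair
rows = sorted({r for r, _ in records})    # categories of the first variable
cols = sorted({c for _, c in records})    # categories of the second variable

# Header, one line per row category, then the column totals
print(f"{'':<8}" + "".join(f"{c:>6}" for c in cols) + f"{'Total':>7}")
for r in rows:
    counts = [cells[(r, c)] for c in cols]
    print(f"{r:<8}" + "".join(f"{k:>6}" for k in counts) + f"{sum(counts):>7}")
col_totals = [sum(cells[(r, c)] for r in rows) for c in cols]
print(f"{'Total':<8}" + "".join(f"{k:>6}" for k in col_totals) + f"{len(records):>7}")
```

Each cell counts the records that share one category of each variable, and the margins give the one-variable summary tables.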
Quantitative
Quantitative data
data indicate
indicate how
how many
many or
or how
how much:
much:
discrete,
discrete, ifif measuring
measuring how
how many
many
continuous,
continuous, ifif measuring
measuring how
how much
much
Quantitative
Quantitative data
data are
are always
always numeric.
numeric.
Ordinary
Ordinary arithmetic
arithmetic operations
operations are
are meaningful
meaningful for
for
quantitative
quantitative data.
data.
Ungrouped Versus Grouped Data
• Ungrouped data
• have not been summarized in any way
• are also called raw data
• Grouped data
• have been organized into a frequency
distribution
Tables Used For Organizing
Numerical Data
Numerical Data
You must give attention to selecting the appropriate number of class groupings for the
table, determining a suitable width of a class grouping, and establishing the boundaries
of each class grouping to avoid overlapping.
The number of classes depends on the number of values in the data. With a larger
number of values, typically there are more classes. In general, a frequency distribution
should have at least 5 but no more than 15 classes.
To determine the width of a class interval, you divide the range (Highest value–
Lowest value) of the data by the number of class groupings desired.
Organizing Numerical Data:
Frequency Distribution Example
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53,
27
Organizing Numerical Data:
Frequency Distribution Example
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute class interval (width): 10 (46/5 then round up)
Determine class boundaries (limits):
Class 1: 10 but less than 20
Class 2: 20 but less than 30
Class 3: 30 but less than 40
Class 4: 40 but less than 50
Class 5: 50 but less than 60
Compute class midpoints: 15, 25, 35, 45, 55
Count observations & assign to classes
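The steps above can be sketched in Python using the 20 values from the example. Two choices here are assumptions rather than rules from the slides: the width is rounded up to the next multiple of 10, and the first lower boundary is aligned to a multiple of the width.

```python
import math

data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
        32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

data_range = max(data) - min(data)        # 58 - 12 = 46
num_classes = 5                           # chosen number of classes
raw_width = data_range / num_classes      # 46 / 5 = 9.2

# Assumption: round the width up to a convenient multiple of 10
width = math.ceil(raw_width / 10) * 10
# Assumption: start the first class at a multiple of the width
start = (min(data) // width) * width

# Each class is "lower but less than upper"
classes = [(start + i * width, start + (i + 1) * width)
           for i in range(num_classes)]
midpoints = [(lo + hi) / 2 for lo, hi in classes]
frequencies = [sum(lo <= x < hi for x in data) for lo, hi in classes]

for (lo, hi), mid, f in zip(classes, midpoints, frequencies):
    print(f"{lo} but less than {hi}: midpoint {mid:.0f}, frequency {f}")
```

The half-open comparison `lo <= x < hi` is what keeps the classes non-overlapping: a value of exactly 30 falls in the 30 but less than 40 class only.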
Organizing Numerical Data: Frequency Distribution
Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53,
58
HOURS   NO. OF WORKERS
LESS THAN 10 5
LESS THAN 30 15
LESS THAN 60 30
LESS THAN 90 50
• MORE THAN CUMULATIVE FREQUENCY SERIES
10 – 19 17
20 – 29 15
30 – 39 12
40 – 49 10
• EXCLUSIVE CLASS INTERVAL
REVENUE (RS.)   NO. OF PRODUCTS
100 – 200 15
200 – 300 20
300 – 400 10
400 – 500 5
TOTAL 50
• OPEN END CLASS INTERVAL
Ages of a Sample of Managers from Urban Child Care Centers in the United States
30 58 37 50 30
53 40 30 47 49
50 40 32 31 40
52 28 23 35 25
30 36 32 26 50
55 30 58 64 52
49 33 43 46 32
61 31 30 40 60
74 37 29 43 54
Frequency Distribution of Child Care
Manager’s Ages
Range = Largest - Smallest = 74 - 23 = 51
Number of Classes and Class Width
• The number of classes should be between 5 and 15.
• Fewer than 5 classes cause excessive
summarization.
• More than 15 classes leave too much detail.
• Class Width
• Divide the range by the number of classes for an
approximate class width
• Round up to a convenient number
Approximate Class Width = 51 / 6 = 8.5
Class Width = 10
Relative Frequency
Class Interval   Frequency   Relative Frequency
20-under 30      6           .12
30-under 40      18          .36
40-under 50      11          .22
50-under 60      11          .22
60-under 70      3           .06
70-under 80      1           .02
Total            50          1.00
Cumulative Frequency
Class Interval   Frequency   Cumulative Frequency
20-under 30      6           6
30-under 40      18          24
40-under 50      11          35
50-under 60      11          46
60-under 70      3           49
70-under 80      1           50
Total            50
Class Midpoints, Relative Frequencies, and
Cumulative Frequencies
Class Interval   Frequency   Midpoint   Relative Frequency   Cumulative Frequency
20-under 30      6           25         .12                  6
30-under 40      18          35         .36                  24
40-under 50      11          45         .22                  35
50-under 60      11          55         .22                  46
60-under 70      3           65         .06                  49
70-under 80      1           75         .02                  50
Total            50                     1.00
Cumulative Relative Frequencies
Class Interval   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
20-under 30      6           .12                  6                      .12
30-under 40      18          .36                  24                     .48
40-under 50      11          .22                  35                     .70
50-under 60      11          .22                  46                     .92
60-under 70      3           .06                  49                     .98
70-under 80      1           .02                  50                     1.00
Total            50          1.00
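The relative, cumulative, and cumulative relative columns can all be derived from the class frequencies alone. A minimal sketch using the child care manager frequencies from the table (the list names are illustrative):

```python
from itertools import accumulate

intervals = ["20-under 30", "30-under 40", "40-under 50",
             "50-under 60", "60-under 70", "70-under 80"]
frequencies = [6, 18, 11, 11, 3, 1]

total = sum(frequencies)                               # 50 managers

relative = [f / total for f in frequencies]            # class share of the total
cumulative = list(accumulate(frequencies))             # running total of frequencies
cumulative_relative = [c / total for c in cumulative]  # running share of the total

for row in zip(intervals, frequencies, relative, cumulative, cumulative_relative):
    print("{:<12}{:>4}{:>6.2f}{:>4}{:>6.2f}".format(*row))
```

Dividing the integer running totals by 50, rather than summing the already rounded relative frequencies, avoids accumulating rounding error in the last column.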
Cumulative Distributions