MEL761: Statistics for Decision Making
About the course
Introduction
Need
Descriptive and Inferential Statistics
Examples
Various Problem Areas
Web site for the course:
[Link]
Dr S G Deshmukh
Mechanical Department Indian Institute of Technology
Objectives of this course
Appreciate the role of statistics in various decision making situations
Summarize data with frequency distributions and graphic presentation
Interpret descriptive statistics for central tendency, dispersion and location
Define and interpret probability
Utilize discrete and continuous probability distributions to determine probabilities in various managerial applications
Apply the central limit theorem to determine probabilities of sample means, and compute and interpret point and interval estimates
Conduct hypothesis tests for means
Utilize linear regression to estimate and predict variables
Understand basic concepts of design of experiments
Understand importance of non-parametric tests
Course coverage
Introduction to statistics: definitions and terminology; data classification; data collection techniques; various scales for measurement and their relevance
Descriptive statistics: frequency distributions; measures of central tendency; variation
Probability: basic concepts; multiplication and addition rules; Bayes rule
Discrete probability distributions: basic concepts; Binomial, Poisson and other discrete distributions
Continuous probability distributions: Exponential and other distributions
Normal probability distributions: introductory concepts; the standard normal distribution; central limit theorem; applications of normal distributions; approximations to discrete probability distributions
Correlation and Regression analysis: overview of correlation; linear regression
Type I and Type II errors
Confidence intervals: confidence intervals for the mean (large samples and small samples) and for population proportions
Analysis of Variance and Design of Experiments
Non-parametric tests
Case studies and applications to managerial decision making
Evaluation scheme
Component                        Weight
Surprise Quizzes (n numbers)     5%
Minors (2)                       30%
Major                            35%
Lab work / assignments           15%
Mini-Project                     10%
Statistics application review    5%
Learning Objectives
Define statistics
Become aware of a wide range of applications of statistics in business for decision making
Differentiate between descriptive and inferential statistics
Formulate and test various sets of hypotheses
Understand implications of design of experiments
Statistics...
Plays an important role in many facets of human endeavour
Occurs remarkably frequently in our everyday lives
Is often incorrectly thought of as just a collection of data, graphs and diagrams
Statistics in Business
Accounting: auditing and cost estimation
Economics: regional, national, and international economic performance
Finance: investments and portfolio management
Management: human resources, compensation, and quality management
Management Information Systems (ERP): performance of systems which gather, summarize, and disseminate information to various managerial levels
Marketing: market analysis and consumer research
International Business: market and demographic analysis
What is Statistics?
Science of gathering, analyzing, interpreting, and presenting data
Branch of mathematics
One page in Courses of study?
Facts and figures
Measurement taken on a sample
Type of distribution being used to analyze data
Statistics is the scientific method that enables us to make decisions as responsibly as possible.
Statistics
The science of data to answer research questions
Formulate a research question(s) (hypothesis)
Collect data
Analyze and summarize data
Draw conclusions to answer research questions
Statistical Inference
In the presence of variation
Answers Questions from Everyday Life
Business: Will a new marketing strategy be profitable?
Industry: Will a product's life exceed the warranty period?
Medicine: Will this year's flu vaccine reduce the chance of flu?
Education: Will technology improve learning?
Government: Will a change in interest rates affect inflation?
Decision making process..
1. Collecting pertinent information that is as reliable as possible.
2. Selecting the parts of the available information that are most helpful to make rational decisions.
3. Making the actual decisions as sensibly as possible on the basis of the available evidence.
4. Perceiving the risks entailed in the particular decision made, and evaluating the corresponding risks of alternative actions.
Example
Polio Vaccine: Results of the Experiment
Vaccine Group        57
Non-vaccine Group    142
Can Statistics Be Trusted?
There are three kinds of lies: Lies, damned lies, and statistics.
--Mark Twain
It is easy to lie with statistics. But it is easier to lie without them.
--Frederick Mosteller
Figures won't lie, but liars will figure.
--Charles Grosvenor
Population Versus Sample
Population: the whole
a collection of persons, objects, or items under study
the entire group of individuals in a statistical study that we want information about
Census: gathering data from the entire population
Sample: a portion of the whole
a subset of the population
the part of the population from which we actually collect information, used to draw conclusions about the whole (statistical inference)
Statistics can be split into two broad categories
1. Descriptive statistics
2. Statistical inference
Descriptive vs. Inferential Statistics
Descriptive Statistics: using data gathered on a group to describe or reach conclusions about that same group only
Inferential Statistics: using sample data to reach conclusions about the population from which the sample was taken
Descriptive statistics..
Encompasses the following:
Graphical or pictorial display
Condensation of large masses of data into a form such as tables
Preparation of summary measures to give a concise description of complex information (e.g. an average figure)
Exhibition of patterns that may be found in sets of information
Inferential Statistics..
Especially relates to:
Determining whether characteristics of a situation are unusual or if they have happened by chance
Estimating values of numerical quantities and determining the reliability of those estimates
Using past occurrences to attempt to predict the future
Statistics: Science of variability..?
Virtually everything varies
Variation occurs among individuals
Variation occurs within any one individual as time passes
Parameter vs. Statistic
Parameter: descriptive measure of the population
Usually represented by Greek letters
Statistic: descriptive measure of a sample
Usually represented by Roman letters
Symbols for Population Parameters
μ denotes population mean
σ² denotes population variance
σ denotes population standard deviation
Symbols for Sample Statistics
x̄ denotes sample mean
S² denotes sample variance
S denotes sample standard deviation
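The distinction between population parameters and sample statistics can be sketched with Python's standard `statistics` module. The numbers below are hypothetical, invented only for illustration. The sketch assumes the usual convention that population variance divides by N while sample variance divides by n - 1, which the slides do not state explicitly.

```python
import statistics

# Hypothetical data, invented for illustration only.
population = [2, 4, 4, 4, 5, 5, 7, 9]   # every member of the group under study
sample = [2, 4, 5, 9]                    # a subset drawn from that population

# Population parameters (Greek letters): variance divides by N.
mu = statistics.mean(population)
sigma_sq = statistics.pvariance(population)
sigma = statistics.pstdev(population)

# Sample statistics (Roman letters): variance divides by n - 1.
x_bar = statistics.mean(sample)
s_sq = statistics.variance(sample)
s = statistics.stdev(sample)

print(f"mu = {mu}, sigma^2 = {sigma_sq}, sigma = {sigma}")
print(f"x_bar = {x_bar}, S^2 = {s_sq:.3f}, S = {s:.3f}")
```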
Process of Inferential Statistics
Population (parameter μ) -> select a random sample -> Sample (statistic x̄) -> calculate x̄ to estimate μ
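The process above can be simulated in a few lines. This is only a sketch: the population is simulated (10,000 made-up ages), the seed and the sample size of 50 are arbitrary choices, and in a real study μ would not be known.

```python
import random
import statistics

random.seed(761)  # arbitrary fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated ages, invented for illustration.
population = [random.randint(20, 79) for _ in range(10_000)]
mu = statistics.mean(population)        # parameter (normally unknown)

sample = random.sample(population, 50)  # select a random sample
x_bar = statistics.mean(sample)         # statistic calculated from the sample

print(f"population mean (mu) = {mu:.2f}")
print(f"sample mean (x_bar)  = {x_bar:.2f}")
print(f"estimation error     = {x_bar - mu:+.2f}")
```

Rerunning with a different seed gives a different x̄ each time, while μ for a given population stays fixed; that sample-to-sample variation is what inferential methods quantify.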
Levels of Data Measurement
Nominal: lowest level of measurement
Ordinal
Interval
Ratio: highest level of measurement
Nominal Level Data
Numbers are used to classify or categorize
Example: Employment Classification
1 for Educator
2 for Construction Worker
3 for Manufacturing Worker
Example: Ethnicity
1 for African-American
2 for Anglo-American
3 for Hispanic-American
Ordinal Level Data
Numbers are used to indicate rank or order
Relative magnitude of numbers is meaningful
Differences between numbers are not comparable
Example: Ranking productivity of employees
Example: Taste test ranking of three brands of soft drink
Example: Position within an organization
1 for President
2 for Vice President
3 for Plant Manager
4 for Department Supervisor
5 for Employee
Example of Ordinal Measurement
[Figure: race finish line with competitors numbered 1, 6, 2, 4, 3, 5]
Ordinal Data
Faculty should receive preferential treatment for parking space in new Bharati Telecom building.
Strongly Agree
Agree
Neutral
Disagree
Strongly Disagree
Interval Level Data
Distances between consecutive integers are equal
Relative magnitude of numbers is meaningful
Differences between numbers are comparable
Location of origin, zero, is arbitrary
Vertical intercept of unit of measure transform function is not zero
Example: Fahrenheit Temperature
Example: Calendar Time
Example: Monetary Utility
Ratio Level Data
Highest level of measurement
Relative magnitude of numbers is meaningful
Differences between numbers are comparable
Location of origin, zero, is absolute (natural)
Vertical intercept of unit of measure transform function is zero
Examples: Height, Weight, and Volume
Example: Monetary Variables, such as Profit and Loss, Revenues, and Expenses
Example: Financial ratios, such as P/E Ratio, Inventory Turnover
Usage Potential of Various Levels of Data
[Figure: data levels in decreasing order of usage potential: Ratio, Interval, Ordinal, Nominal]
Data Level, Operations, and Statistical Methods
Data Level   Meaningful Operations                                                       Statistical Methods
Nominal      Classifying and Counting                                                    Nonparametric
Ordinal      All of the above plus Ranking                                               Nonparametric
Interval     All of the above plus Addition, Subtraction, Multiplication, and Division   Parametric
Ratio        All of the above                                                            Parametric
Visual presentation of data
Data preparation rules
Data presented must be
factual
relevant
Before presentation always check:
the source of the data
that the data has been accurately transcribed
the figures are relevant to the problem
Methods of visual presentation of data
Table
         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East     20.4      27.4      90        20.4
West     30.6      38.6      34.6      31.6
North    45.9      46.9      45        43.9
Methods of visual presentation of data
Graphs
[Figure: column chart of East, West, and North by quarter (1st Qtr to 4th Qtr); vertical axis from 0 to 90]
Methods of visual presentation of data
Pie chart
[Figure: pie chart with one slice per quarter (1st Qtr to 4th Qtr)]
Methods of visual presentation of data
Multiple bar chart
[Figure: horizontal multiple bar chart of North, West, and East by quarter (1st Qtr to 4th Qtr); horizontal axis from 0 to 100]
Methods of visual presentation of data
Simple pictogram
[Figure: pictogram of East, West, and North by quarter (1st Qtr to 4th Qtr); vertical axis from 0 to 100]
Frequency distributions
Frequency tables
Observation Table
Class Interval   Frequency   Cumulative Frequency
< 20             13          13
< 40             18          31
< 60             25          56
< 80             15          71
< 100            9           80
Frequency diagrams
[Figure: bar chart of Frequency (0 to 30) and line chart of Cumulative Frequency (0 to 90) by class interval (< 20 to < 100)]
Ungrouped Versus Grouped Data
Ungrouped data
have not been summarized in any way
are also called raw data
Grouped data
have been organized into a frequency distribution
Example of Ungrouped Data
Ages of a Sample of Managers from XYZ
42  30  53  50  52  30  55  49  61  74
26  58  40  40  28  36  30  33  31  37
32  37  30  32  23  32  58  43  30  29
34  50  47  31  35  26  64  46  40  43
57  30  49  40  25  50  52  32  60  54
Frequency Distribution of Ages
Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
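As a check on the table, the ungrouped ages can be tallied into classes programmatically. The sketch below is one possible way to do it; the function name `frequency_distribution` is an illustrative choice, not something defined in the course material.

```python
# Ages of the 50 managers, as listed in the ungrouped data.
ages = [
    42, 30, 53, 50, 52, 30, 55, 49, 61, 74,
    26, 58, 40, 40, 28, 36, 30, 33, 31, 37,
    32, 37, 30, 32, 23, 32, 58, 43, 30, 29,
    34, 50, 47, 31, 35, 26, 64, 46, 40, 43,
    57, 30, 49, 40, 25, 50, 52, 32, 60, 54,
]

def frequency_distribution(data, start, width, n_classes):
    """Count values in classes [start, start + width), [start + width, ...)."""
    counts = [0] * n_classes
    for value in data:
        index = (value - start) // width
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} falls outside the class intervals")
        counts[index] += 1
    return counts

frequencies = frequency_distribution(ages, start=20, width=10, n_classes=6)
for i, freq in enumerate(frequencies):
    lower = 20 + 10 * i
    print(f"{lower}-under {lower + 10}: {freq}")
```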
Data Range
42  30  53  50  52  30  55  49  61  74
26  58  40  40  28  36  30  33  31  37
32  37  30  32  23  32  58  43  30  29
34  50  47  31  35  26  64  46  40  43
57  30  49  40  25  50  52  32  60  54
Smallest = 23, Largest = 74
Range = Largest - Smallest = 74 - 23 = 51
Number of Classes and Class Width
The number of classes should be between 5 and 15.
Fewer than 5 classes cause excessive summarization.
More than 15 classes leave too much detail.
Class Width
Divide the range by the number of classes for an approximate class width
Round up to a convenient number
Approximate Class Width = 51 / 6 = 8.5
Class Width = 10
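The range and class-width arithmetic can be reproduced from the raw ages. "Round up to a convenient number" is a judgment call; the sketch below assumes, purely for illustration, that a convenient number means the next multiple of 5.

```python
import math

# Ages of the 50 managers, as listed in the ungrouped data.
ages = [
    42, 30, 53, 50, 52, 30, 55, 49, 61, 74,
    26, 58, 40, 40, 28, 36, 30, 33, 31, 37,
    32, 37, 30, 32, 23, 32, 58, 43, 30, 29,
    34, 50, 47, 31, 35, 26, 64, 46, 40, 43,
    57, 30, 49, 40, 25, 50, 52, 32, 60, 54,
]

data_range = max(ages) - min(ages)        # largest - smallest
n_classes = 6                             # chosen between 5 and 15
approx_width = data_range / n_classes     # approximate class width

# Assumption: "convenient" is taken to mean the next multiple of 5.
class_width = math.ceil(approx_width / 5) * 5

print(f"Range = {max(ages)} - {min(ages)} = {data_range}")
print(f"Approximate class width = {data_range} / {n_classes} = {approx_width}")
print(f"Class width = {class_width}")
```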
Class Midpoint
Class Midpoint = (beginning class endpoint + ending class endpoint) / 2 = (30 + 40) / 2 = 35
Class Midpoint = class beginning point + (1/2)(class width) = 30 + (1/2)(10) = 35
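Both midpoint formulas give the same value, which a short sketch can confirm for every class. The function names are illustrative only.

```python
def midpoint_from_endpoints(lower, upper):
    """Midpoint as the average of the two class endpoints."""
    return (lower + upper) / 2

def midpoint_from_width(lower, width):
    """Midpoint as the class beginning point plus half the class width."""
    return lower + width / 2

# Class intervals 20-under 30 through 70-under 80, width 10.
lower_bounds = range(20, 80, 10)
midpoints = [midpoint_from_endpoints(lo, lo + 10) for lo in lower_bounds]

print(midpoint_from_endpoints(30, 40))  # 35.0
print(midpoint_from_width(30, 10))      # 35.0
print(midpoints)
```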
Relative Frequency
Class Interval   Frequency   Relative Frequency
20-under 30      6           .12   (= 6 / 50)
30-under 40      18          .36   (= 18 / 50)
40-under 50      11          .22
50-under 60      11          .22
60-under 70      3           .06
70-under 80      1           .02
Total            50          1.00
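Each relative frequency is the class frequency divided by the total number of observations. A minimal sketch using the frequencies from the table:

```python
labels = ["20-under 30", "30-under 40", "40-under 50",
          "50-under 60", "60-under 70", "70-under 80"]
frequencies = [6, 18, 11, 11, 3, 1]

total = sum(frequencies)
relative = [freq / total for freq in frequencies]

for label, freq, rel in zip(labels, frequencies, relative):
    print(f"{label:<12} {freq:>3}  {rel:.2f}")
print(f"{'Total':<12} {total:>3}  {sum(relative):.2f}")
```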
Cumulative Frequency
Class Interval   Frequency   Cumulative Frequency
20-under 30      6           6
30-under 40      18          24   (= 18 + 6)
40-under 50      11          35   (= 11 + 24)
50-under 60      11          46
60-under 70      3           49
70-under 80      1           50
Total            50
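A cumulative frequency is a running total of the class frequencies. One way to sketch it is with `itertools.accumulate` from the standard library:

```python
from itertools import accumulate

frequencies = [6, 18, 11, 11, 3, 1]

# Running total: each entry adds the current class to all earlier classes.
cumulative = list(accumulate(frequencies))

print(cumulative)
```

The last cumulative value always equals the total number of observations, which is a quick sanity check on a hand-built table.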
Class Midpoints, Relative Frequencies, and Cumulative Frequencies
Class Interval   Frequency   Midpoint   Relative Frequency   Cumulative Frequency
20-under 30      6           25         .12                  6
30-under 40      18          35         .36                  24
40-under 50      11          45         .22                  35
50-under 60      11          55         .22                  46
60-under 70      3           65         .06                  49
70-under 80      1           75         .02                  50
Total            50                     1.00
Cumulative Relative Frequencies
Class Interval   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
20-under 30      6           .12                  6                      .12
30-under 40      18          .36                  24                     .48
40-under 50      11          .22                  35                     .70
50-under 60      11          .22                  46                     .92
60-under 70      3           .06                  49                     .98
70-under 80      1           .02                  50                     1.00
Total            50          1.00
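The full summary table can be assembled from the class frequencies alone. In this sketch the cumulative relative frequency is computed as cumulative count divided by total, rather than by adding rounded relative frequencies, which avoids accumulating rounding error; that is a design choice of the sketch, not something the slides prescribe.

```python
from itertools import accumulate

lower_bounds = [20, 30, 40, 50, 60, 70]
width = 10
frequencies = [6, 18, 11, 11, 3, 1]

total = sum(frequencies)
cumulative = list(accumulate(frequencies))

rows = []
for lower, freq, cum in zip(lower_bounds, frequencies, cumulative):
    rows.append({
        "interval": f"{lower}-under {lower + width}",
        "frequency": freq,
        "midpoint": lower + width / 2,
        "relative": freq / total,
        "cumulative": cum,
        "cumulative_relative": cum / total,
    })

for row in rows:
    print(f"{row['interval']:<12} {row['frequency']:>3} {row['midpoint']:>5.0f} "
          f"{row['relative']:>5.2f} {row['cumulative']:>3} "
          f"{row['cumulative_relative']:>5.2f}")
```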
Common Statistical Graphs
Histogram -- vertical bar chart of frequencies
Frequency Polygon -- line graph of frequencies
Ogive -- line graph of cumulative frequencies
Pie Chart -- proportional representation for categories of a whole
Stem and Leaf Plot
Pareto Chart
Scatter Plot
Histogram
Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
[Figure: histogram of the frequency distribution; horizontal axis Years (10 to 80), vertical axis Frequency (0 to 20)]
Histogram Construction
Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
[Figure: histogram under construction; horizontal axis Years (10 to 80), vertical axis Frequency (10, 20)]
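Constructing a histogram amounts to drawing one bar per class with length proportional to its frequency. The sketch below prints a rough text version (bars run horizontally rather than vertically) so that it needs no plotting library; the function name `text_histogram` and the `#` symbol are arbitrary illustrative choices.

```python
classes = [
    ("20-under 30", 6),
    ("30-under 40", 18),
    ("40-under 50", 11),
    ("50-under 60", 11),
    ("60-under 70", 3),
    ("70-under 80", 1),
]

def text_histogram(rows, symbol="#"):
    """Return one line per class; bar length equals the class frequency."""
    lines = []
    for label, freq in rows:
        lines.append(f"{label:<12}| {symbol * freq} {freq}")
    return lines

bars = text_histogram(classes)
print("\n".join(bars))
```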
Frequency Polygon
Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
[Figure: frequency polygon of the frequency distribution; horizontal axis Years (10 to 80), vertical axis Frequency (0 to 20)]
Ogive
Class Interval   Cumulative Frequency
20-under 30      6
30-under 40      24
40-under 50      35
50-under 60      46
60-under 70      49
70-under 80      50
[Figure: ogive of cumulative frequencies; horizontal axis Years (0 to 80), vertical axis Frequency (0 to 60)]
Relative Frequency Ogive
Class Interval   Cumulative Relative Frequency
20-under 30      .12
30-under 40      .48
40-under 50      .70
50-under 60      .92
60-under 70      .98
70-under 80      1.00
[Figure: relative frequency ogive; horizontal axis Years (0 to 80), vertical axis Cumulative Relative Frequency (0.00 to 1.00)]
Complaints by Passengers
COMPLAINT           NUMBER    PROPORTION   DEGREES
Stations, etc.      28,000    .40          144.0
Train Performance   14,700    .21          75.6
Equipment           10,500    .15          54.0
Personnel           9,800     .14          50.4
Schedules, etc.     7,000     .10          36.0
Total               70,000    1.00         360.0
Complaints by Passengers
[Figure: pie chart of complaints: Stations, Etc. 40%; Train Performance 21%; Equipment 15%; Personnel 14%; Schedules, Etc. 10%]
Second Quarter Truck Production
Company   2d Quarter Truck Production
A         357,411
B         354,936
C         160,997
D         34,099
E         12,747
Totals    920,190
Second Quarter Truck Production
[Figure: pie chart of second quarter truck production with slices of 39%, 39%, 17%, 4%, and 1%]
Pie Chart Calculations for Company A
Company   2d Quarter Truck Production   Proportion   Degrees
A         357,411                       .388         140
B         354,936                       .386         139
C         160,997                       .175         63
D         34,099                        .037         13
E         12,747                        .014         5
Totals    920,190                       1.000        360
Proportion for Company A = 357,411 / 920,190 = .388
Degrees for Company A = .388 × 360 = 140
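The same two steps (proportion of the whole, then proportion times 360 degrees) apply to every company. This sketch follows the slide's approach of rounding the proportion to three decimals before converting to degrees; rounding to whole degrees afterwards is assumed from the values shown in the table.

```python
production = {
    "A": 357_411,
    "B": 354_936,
    "C": 160_997,
    "D": 34_099,
    "E": 12_747,
}

total = sum(production.values())

proportions = {}
degrees = {}
for company, units in production.items():
    proportions[company] = round(units / total, 3)    # share of the whole
    degrees[company] = round(proportions[company] * 360)  # slice angle

for company in production:
    print(f"{company}: {production[company]:>7,}  "
          f"{proportions[company]:.3f}  {degrees[company]:>3}")
print(f"Totals: {total:,}  {sum(degrees.values())} degrees")
```

Because each slice is rounded separately, the angles are not guaranteed to sum to exactly 360 for other data sets; here they happen to.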
Pareto Chart
[Figure: Pareto chart of defect causes (Poor Wiring, Short in Coil, Defective Plug, Other); left axis Frequency (0 to 100), right axis cumulative percentage (0% to 100%)]
Scatter Plot
Registered Vehicles (1000's)   Gasoline Sales (1000's of Gallons)
5                              60
15                             120
9                              90
15                             140
7                              60
[Figure: scatter plot; horizontal axis Registered Vehicles (up to 20), vertical axis Gasoline Sales (up to 200)]