Shovan Chowdhury
EPGCOSCM 13
Course Objective
• Familiarity with different types of data and their visualization
• Understanding the presence of intrinsic uncertainty in a business
situation
• Use of appropriate statistical techniques for modelling data and
capturing uncertainty
• Applying statistical software for data analysis
• Interpreting outputs from a managerial perspective (may require
knowledge of other disciplines)
“Statistical Techniques/Methods”
• Do some calculations
• Interpret statistical results
DATA AND SUMMARIZATION
Primary Uses of Statistics
Problem
A chocolate manufacturing company sells quality chocolate products at its plant
and retail stores. Two years ago, the company developed a Web site and began
selling its products over the Internet. Web site sales have exceeded the company’s
expectations, and management is now considering strategies to increase sales even
further. To learn more about the Web site customers, a sample of 50 chocolate
transactions was selected from the previous month’s sales.
The data show, for each of the 50 transactions:
• the day of the week the transaction was made
• the type of browser the customer used
• the time spent on the Web site
• the number of Web site pages viewed
• the amount spent by the customer
The Cab Case (Textbook): Demand-Supply Gap
Basic Vocabulary of Statistics
POPULATION
A population consists of all the items or individuals about which
you want to draw a conclusion.
SAMPLE
A sample is the portion of a population selected for analysis.
PARAMETER
A parameter is a numerical measure that describes a
characteristic of a population.
STATISTIC
A statistic is a numerical measure that describes a characteristic
of a sample.
Types of data
• Qualitative (Categorical)
  - Nominal (sex, nationality, eye color)
  - Ordinal (customer satisfaction, efficiency of workers, bond rating)
• Quantitative
  - Discrete (no. of customers, no. of claims)
  - Continuous (salary, price)
Cross-Sectional Data
Data Visualization
Some quick questions
- Return on investment
- Project completion time
- Mutual fund ratings
- Political affiliation
- Demand for a product
- No. of customers waiting in a queue
- Diameter of bolts
- Number of defectives produced in a shift
- Gender
- No. of misprints per page of a book
- Marital status
- Efficiency of employees
Excel Bar and Pie Chart of Pizza Preference Data
Histogram
Summary for WaitTime
Anderson-Darling Normality Test: A-Squared 0.24, P-Value 0.759
Mean 5.4600
StDev 2.4755
Variance 6.1279
Skewness 0.250415
Kurtosis -0.404960
N 100
Minimum 0.4000
1st Quartile 3.8000
Median 5.2500
3rd Quartile 7.2000
Maximum 11.6000
95% Confidence Interval for Mean: 4.9688 to 5.9512
95% Confidence Interval for Median: 4.5742 to 5.8773
95% Confidence Interval for StDev: 2.1735 to 2.8757
[Figure: histogram of WaitTime (axis 0 to 12) and 95% confidence intervals for the mean and median]
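As a consistency check on the summary above, two of the reported numbers can be reproduced from the others: the variance is the square of the standard deviation, and the 95% confidence interval for the mean is mean ± t × s/√n. This is a minimal sketch assuming the usual t-based interval with n - 1 = 99 degrees of freedom; the critical value 1.9842 is hardcoded from a t table (an assumption made so the code needs only the Python standard library).

```python
import math

# Values copied from the WaitTime summary above
mean = 5.4600
sd = 2.4755
n = 100

# Variance is the square of the standard deviation
# (matches the reported 6.1279 up to rounding of the displayed StDev)
variance = sd ** 2

# Assumed critical value t(0.025, df = 99), taken from a t table
t_crit = 1.9842

# 95% t-based confidence interval for the mean: mean +/- t * s / sqrt(n)
margin = t_crit * sd / math.sqrt(n)
lower, upper = mean - margin, mean + margin

print(f"variance = {variance:.4f}")
print(f"95% CI for mean = ({lower:.4f}, {upper:.4f})")
```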
[Figure: Rating Distribution of Sample: bar chart of the percentage of respondents giving each rating (1 to 5), bar chart of the percentage by gender (M, F), and a chart of counts]
Data and randomness
Three questions that good business managers ask themselves when
they look at “the numbers”:
Key Performance Measures
Measures of Center
There are three main measures of center:
• Mean (the most widely used measure)
• Median (preferred when outliers are present, since extreme values barely affect it)
• Mode (the measure used for categorical data)
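The difference between the three measures shows up clearly when the data contain an outlier. This sketch uses Python's standard `statistics` module on a small hypothetical dataset (the order counts and gender labels are invented for illustration).

```python
import statistics

# Hypothetical daily order counts with one unusually large value (an outlier)
orders = [3, 4, 4, 5, 6, 50]

mean = statistics.mean(orders)      # pulled upward by the outlier
median = statistics.median(orders)  # average of the two middle values, barely affected
mode = statistics.mode(orders)      # most frequent value

# The mode also works for categorical (non-numeric) data
genders = ["M", "F", "F", "M", "F"]
most_common_gender = statistics.mode(genders)

print(mean, median, mode, most_common_gender)
```

Here the single value 50 drags the mean to 12, while the median stays at 4.5, close to the bulk of the data.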
[Figure: distribution shapes: Symmetrical, Negatively (Left) Skewed, Positively (Right) Skewed]
Dispersion
Dispersion describes how similar a set of observations are to each other or,
equivalently, the degree of deviation (spread) of a set of data from their central
value.
• In general, the more spread out a distribution is, the larger the
measure of dispersion will be
Measures of Dispersion
There are four main measures of dispersion:
• Variance
• Standard Deviation
• Mean Absolute Deviation
• Quartile Deviation or Semi-Interquartile Range (half the interquartile range, IQR/2)
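The four measures can be computed side by side with Python's `statistics` module. This is a sketch on a hypothetical sample: it uses the sample variance and standard deviation (dividing by n - 1; `statistics.pvariance` and `statistics.pstdev` divide by n instead), and `statistics.quantiles` with its default `exclusive` method, so other software may report slightly different quartiles.

```python
import statistics

# Hypothetical sample of eight observations
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.fmean(data)

# Variance and standard deviation (sample versions divide by n - 1)
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

# Mean absolute deviation: average distance of the observations from the mean
mad = sum(abs(x - mean) for x in data) / len(data)

# Quartile deviation: half of the interquartile range Q3 - Q1
q1, q2, q3 = statistics.quantiles(data, n=4)
quartile_deviation = (q3 - q1) / 2

print(sample_var, sample_sd, mad, quartile_deviation)
```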
Mean Absolute Deviation
Variance and Standard Deviation
• The standard deviation is defined as the square root
of the variance. It is measured in the same units as the
variable, whereas the variance is in squared units.
Standard Deviation
[Figure: Sales plotted for observations 1 to 199, vertical axis 0 to 30]
Interpretation
• The larger the SD/variance is, the more the observations deviate, on
average, away from the mean
• The smaller the SD/variance is, the less the observations deviate, on
average, from the mean
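A quick way to see this interpretation: two hypothetical samples with the same mean but very different spread. The data are invented for illustration, and `statistics.stdev` gives the sample standard deviation.

```python
import statistics

# Two hypothetical samples with the same mean but different spread
a = [48, 49, 50, 51, 52]
b = [30, 40, 50, 60, 70]

sd_a = statistics.stdev(a)  # sqrt(2.5), about 1.58
sd_b = statistics.stdev(b)  # sqrt(250), about 15.81

print("A:", statistics.mean(a), round(sd_a, 3))
print("B:", statistics.mean(b), round(sd_b, 3))
```

Both samples are centred at 50, but the observations in B lie, on average, about ten times farther from the mean.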
Coefficient of Variation (CV)
$CV = \frac{s}{\bar{x}} \times 100\%$
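Applying the formula to the WaitTime summary reported earlier (mean 5.4600, StDev 2.4755) gives a concrete number. The helper name `coefficient_of_variation` is hypothetical, and the sketch assumes positive-valued data, since the CV is not meaningful when the mean is zero or negative.

```python
def coefficient_of_variation(s, xbar):
    """CV = s / xbar * 100, in percent (intended for positive-valued data)."""
    return s / xbar * 100

# Standard deviation and mean taken from the WaitTime summary
cv_wait = coefficient_of_variation(2.4755, 5.4600)
print(f"CV of WaitTime = {cv_wait:.2f}%")
```

Because the CV is unit free, it can compare variability across variables measured on different scales.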
Percentiles, Quartiles and IQR
Percentiles, quartiles, and the IQR
Box Plot
Detection of Outliers (Box Plot)
[Figure: box plot of IBM data on an axis from 83 to 91]
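The slide does not state the outlier rule explicitly; this sketch assumes the common box-plot convention (Tukey's fences) that a point is flagged when it lies more than 1.5 × IQR below Q1 or above Q3. The function name `iqr_outliers` and the data are hypothetical, and quartiles come from `statistics.quantiles` with its default `exclusive` method.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is the usual convention."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    flagged = [x for x in data if x < lower_fence or x > upper_fence]
    return flagged, (lower_fence, upper_fence)

# Hypothetical data with one suspiciously large value
values = [2, 4, 4, 4, 5, 5, 7, 9, 30]
outliers, fences = iqr_outliers(values)
print(outliers, fences)
```

With Q1 = 4 and Q3 = 8, the fences are -2 and 14, so only 30 is flagged; on a box plot it would appear as a separate point beyond the whisker.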
A large number of fast-food restaurants have drive-through
windows, offering drivers and their passengers the
advantage of quick service. To measure how good the
service is, an organization called QSR planned a study
in which the time taken to serve a sample of
drive-through customers at each of five restaurants was
recorded. Compare the five sets of data using box plots
and interpret the results.
Box Plots…
Standardising Data
• Purpose: To compare each data point with the natural
range and variation of the dataset.
• Method: For each data value, subtract the sample
mean and divide by the sample standard deviation.
The resulting numbers, called z-values or z-scores,
• measure how many standard deviations above or
below the mean a data point is
• are “unit free”
• have mean zero and standard deviation 1
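The method above translates directly into code. This is a sketch on a hypothetical sample using Python's `statistics` module; because the sample standard deviation is used for scaling, the resulting z-scores have sample standard deviation exactly 1 (up to floating-point rounding).

```python
import statistics

# Hypothetical sample
data = [2, 4, 4, 4, 5, 5, 7, 9]

xbar = statistics.fmean(data)
s = statistics.stdev(data)  # sample standard deviation

# z = (x - mean) / sd: how many SDs each value lies above (+) or below (-) the mean
z_scores = [(x - xbar) / s for x in data]

print([round(z, 2) for z in z_scores])
print(round(statistics.fmean(z_scores), 6), round(statistics.stdev(z_scores), 6))
```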
Capturing variation
⚫ Chebyshev’s Theorem
Applies to any distribution, regardless of shape
⚫ Empirical Rule
Applies only to roughly mound-shaped and symmetric
distributions
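Chebyshev’s Theorem
⚫ At least $1 - \frac{1}{k^2}$ of the elements of any distribution lie within $k$ standard deviations of the mean
At least $1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} = 75\%$ lie within 2 standard deviations of the mean
At least $1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \approx 89\%$ lie within 3 standard deviations of the mean
At least $1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} \approx 94\%$ lie within 4 standard deviations of the mean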
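Because Chebyshev's bound holds for any distribution, it can be checked on a deliberately skewed dataset. In this sketch the data are hypothetical, the helper names are invented for illustration, and the dataset is treated as the whole distribution (so the population standard deviation, `statistics.pstdev`, is used).

```python
import statistics

def chebyshev_bound(k):
    """Minimum fraction of any distribution within k SDs of the mean (informative for k > 1)."""
    return 1 - 1 / k**2

def fraction_within(data, k):
    """Observed fraction of the data within k (population) SDs of the mean."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return sum(abs(x - mu) <= k * sigma for x in data) / len(data)

# Hypothetical, deliberately right-skewed data
data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20, 35]

for k in (2, 3, 4):
    print(k, round(chebyshev_bound(k), 4), round(fraction_within(data, k), 4))
```

The observed fractions are at or above the bound for every k; the bound is conservative because it must cover every possible shape.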
Empirical Rule
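⚫ For roughly mound-shaped and symmetric distributions, approximately:
• 68% of the observations lie within μ ± 1σ
• 95% lie within μ ± 2σ
• 99.7% lie within μ ± 3σ
[Figure: mound-shaped curve with the x axis marked at μ - 3σ, μ - 2σ, μ - 1σ, μ, μ + 1σ, μ + 2σ, μ + 3σ]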
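The empirical-rule percentages come from the normal distribution, the idealized mound-shaped, symmetric case. This sketch computes them with `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# P(mu - k*sigma < X < mu + k*sigma) for a normal distribution, k = 1, 2, 3
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} SD of the mean: {p:.4f}")
# within 1 SD of the mean: 0.6827
# within 2 SD of the mean: 0.9545
# within 3 SD of the mean: 0.9973
```

Compared with Chebyshev's guarantees (at least 75% within 2 SDs, at least 89% within 3), the normal case is much tighter, which is why the empirical rule applies only to mound-shaped, symmetric data.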
A survey is conducted on 20 respondents to gather information on customer satisfaction with a product.
Customer satisfaction is recorded on a 3-point scale, namely highly satisfied (HS), satisfied (S) and not satisfied
(NS), together with gender: male (M) or female (F). The data are recorded as shown below:
Respondent           1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Gender               M   F   F   F   F   F   F   M   M   M   M   M   F   F   F   F   M   F   M   M
Satisfaction level   S   S  NS   S  NS  NS  NS  NS  HS  HS   S  NS  HS   S   S  HS  NS  NS   S   S
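A natural first summary of two categorical variables is a cross-tabulation (contingency table) of counts. This sketch builds one from the 20 responses in the table using `collections.Counter`; the row and column layout is one reasonable choice, not necessarily the one intended on the slide.

```python
from collections import Counter

# Survey responses copied from the table (respondents 1 to 20)
gender = "M F F F F F F M M M M M F F F F M F M M".split()
satisfaction = "S S NS S NS NS NS NS HS HS S NS HS S S HS NS NS S S".split()

# Count each (gender, satisfaction) combination
counts = Counter(zip(gender, satisfaction))

levels = ["HS", "S", "NS"]
print("      " + "  ".join(f"{lvl:>3}" for lvl in levels) + "  Total")
for g in ["M", "F"]:
    row = [counts[(g, lvl)] for lvl in levels]
    print(f"{g:<5} " + "  ".join(f"{c:>3}" for c in row) + f"  {sum(row):>5}")
```

Of the 9 male respondents, 2 are highly satisfied, 4 satisfied and 3 not satisfied; of the 11 female respondents, the counts are 2, 4 and 5.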
Scatter Plots and Correlation
Scatter Plot Examples
[Figure: scatter plots of y against x showing linear relationships and curvilinear relationships]
[Figure: scatter plots of y against x showing strong relationships and weak relationships]
[Figure: scatter plot of y against x showing no relationship]
Correlation Coefficient
Calculating sample Correlation Coefficient
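$r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x s_y}$
$\operatorname{cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
$s_x = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad s_y = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2}$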
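The three formulas translate line by line into code. This sketch implements them with the 1/n factors exactly as written; the function name `correlation` and the paired data are hypothetical.

```python
import math

def correlation(x, y):
    """Sample correlation r = cov(x, y) / (s_x * s_y), using 1/n in every term."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    s_x = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / n)
    s_y = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / n)
    return cov / (s_x * s_y)

# Hypothetical paired data, e.g. advertising spend (x) and sales (y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 4))  # 0.7746
```

The 1/n factors cancel between numerator and denominator, so using n - 1 throughout (as many packages do for the covariance and standard deviations) gives the same r.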
Features of correlation coefficient
• Unit free
• Ranges between -1.00 and 1.00
• -1 ≤ r < 0 implies that as X ↑ (↓), Y tends to ↓ (↑)
• 0 < r ≤ 1 implies that as X ↑ (↓), Y tends to ↑ (↓)
• The closer to -1.00, the stronger the negative linear relationship
• The closer to 1.00, the stronger the positive linear relationship
• The closer to 0.00, the weaker the linear relationship
• r = 0 implies that X and Y are not linearly associated
Examples of Approximate r Values
[Figure: scatter plots of y against x for r = -1.00, r = -0.60, r = 0.00, r = 0.20 and r = 1.00]