QUANTITATIVE TITLE

Quantitative Techniques
Shovan Chowdhury
Indian Institute of Management, Kozhikode

Data Driven Business Performance
EPGCOSCM 13
Course Objective
• Familiarity with different types of data and their visualization
• Understanding presence of intrinsic uncertainty in a business
situation
• Use of appropriate statistical techniques for modelling data and
capturing uncertainty
• Applying statistical software for data analysis
• Interpreting outputs from a managerial aspect (may require
knowledge of other disciplines)
“Statistical Techniques/Methods”
Formulate the problem → Get some data → Visualize the data →
Do some statistical calculations → Interpret results
DATA AND SUMMARIZATION
Primary Uses of Statistics
• Descriptive statistics: the collection, organization,
presentation and summary of data.
• Inferential statistics: generalizing from a sample to a
population, estimating unknown parameters, drawing
conclusions, making decisions.
Problem
A chocolate manufacturing company sells quality chocolate products at its plant
and retail stores. Two years ago, the company developed a Web site and began
selling its products over the Internet. Web site sales have exceeded the company’s
expectations, and management is now considering strategies to increase sales even
further. To learn more about the Web site customers, a sample of 50
transactions was selected from the previous month’s sales.
The data show, for each of the 50 transactions:
• the day of the week the transaction was made,
• the type of browser the customer used,
• the time spent on the Web site,
• the number of Web site pages viewed,
• the amount spent by the customer.
The Cab Case (Textbook): Demand-Supply Gap
Basic Vocabulary of Statistics
POPULATION
A population consists of all the items or individuals about which
you want to draw a conclusion.
SAMPLE
A sample is the portion of a population selected for analysis.
PARAMETER
A parameter is a numerical measure that describes a
characteristic of a population.
STATISTIC
A statistic is a numerical measure that describes a characteristic
of a sample.
Quantitative
• Discrete (no. of customers, no. of claims)
• Continuous (salary, price)
Qualitative (Categorical)
• Ordinal (customer satisfaction, efficiency of workers, bond rating)
• Nominal (sex, nationality, eye color)
Cross-Sectional and Time Series Data
• Cross-sectional data: data collected at the same or approximately the
same point in time
• Time series data: data collected over different time periods
Data Visualization
Some quick questions
- Return on investment
- Project completion time
- Mutual fund ratings
- Political affiliation
- Demand for a product
- No. of customers waiting in a queue
- Diameter of bolts
- Number of defectives produced in a shift
- Gender
- No. of misprints per page of a book
- Marital status
- Efficiency of employee
Excel Bar and Pie Chart of Pizza Preference Data
Histogram
Summary for WaitTime
[Figure: histogram of WaitTime, horizontal axis from 0 to 12, with 95% confidence intervals for the mean and median drawn on an axis from 4.50 to 6.00]

Anderson-Darling Normality Test
  A-Squared       0.24
  P-Value         0.759

  Mean            5.4600
  StDev           2.4755
  Variance        6.1279
  Skewness        0.250415
  Kurtosis       -0.404960
  N               100

  Minimum         0.4000
  1st Quartile    3.8000
  Median          5.2500
  3rd Quartile    7.2000
  Maximum        11.6000

95% Confidence Interval for Mean      4.9688   5.9512
95% Confidence Interval for Median    4.5742   5.8773
95% Confidence Interval for StDev     2.1735   2.8757
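Rating Distribution of Sample
[Figure: bar chart of the share of respondents (axis 0% to 35%) giving each rating from 1 to 5]
Gender Profile of Sample
[Figure: bar chart of the share of respondents (axis 0% to 80%) by gender, M and F]
Distribution of Waiting Time (mins)
[Figure: histogram of waiting time in minutes, frequency axis from 0 to 25]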
Data and randomness
Three questions that good business managers ask themselves when
they look at “the numbers”:
• What is a typical or central value?
• How much variability is present in the data set?
• Are there unusual shocks/events/cases (shape of the curve)?
Key Performance Measures
Measures of Center
There are three main measures of center:
• Mean (most useful measure)
• Median (generally used under the presence of outliers)
• Mode (used for categorical data)
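A minimal sketch of the three measures using Python's standard `statistics` module. The waiting times and browser names are hypothetical illustration data, not taken from the course data set; the last waiting time is a deliberate outlier to show why the median is preferred in that case.

```python
import statistics

# Hypothetical waiting times in minutes; the last value is an outlier.
wait_times = [4, 5, 5, 6, 7, 8, 40]

mean_value = statistics.mean(wait_times)      # pulled upward by the outlier
median_value = statistics.median(wait_times)  # middle value, unaffected by the outlier

# The mode suits categorical data (hypothetical browser names).
browsers = ["Chrome", "Firefox", "Chrome", "Edge", "Chrome"]
mode_value = statistics.mode(browsers)

print(mean_value, median_value, mode_value)
```

Here the mean is well above most of the observations, while the median stays close to the typical value.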
[Figure: three distribution shapes: symmetrical, negatively (left) skewed, and positively (right) skewed]
Dispersion
Dispersion describes how similar the observations in a data set are to each
other, that is, the degree of deviation (spread) of the data from their central
value.
• In general, the more spread out a distribution is, the larger the
measure of dispersion will be
Measures of Dispersion
There are four main measures of dispersion:
• Variance
• Standard Deviation
• Mean Absolute Deviation
• Quartile Deviation or Semi-Interquartile Range (half of the IQR)
Mean Absolute Deviation
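A minimal sketch, assuming the usual textbook definition: the mean absolute deviation is the average of the absolute distances of the observations from their mean, (1/n) Σ |x_i − x̄|. The function name and the data are hypothetical.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

data = [2, 4, 6, 8, 10]  # hypothetical observations
mad = mean_absolute_deviation(data)
print(mad)
```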
Variance and Standard Deviation
• The standard deviation is defined as the square root
of the variance. The standard deviation is measured in
the same units as the variable.
Population Standard Deviation    Sample Standard Deviation
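A minimal sketch of the two versions, assuming the usual textbook convention: the population variance divides the sum of squared deviations by N, and the sample variance divides it by n − 1. The data are hypothetical; the functions are from Python's standard `statistics` module.

```python
import statistics

data = [2, 4, 6, 8, 10]  # hypothetical observations, e.g. in minutes

pop_var = statistics.pvariance(data)    # sum of squared deviations / N
sample_var = statistics.variance(data)  # sum of squared deviations / (n - 1)

pop_sd = statistics.pstdev(data)        # square root of pop_var
sample_sd = statistics.stdev(data)      # square root of sample_var

print(pop_var, sample_var, pop_sd, sample_sd)
```

The sample version is always slightly larger than the population version computed on the same numbers, and the gap shrinks as n grows.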
Standard Deviation
[Figure: chart of Sales for observations 1 to 199, vertical axis from 0 to 30]
Interpretation
• The larger the SD/variance is, the more the observations deviate, on
average, away from the mean
• The smaller the SD/variance is, the less the observations deviate, on
average, from the mean
Coefficient of Variation (CV)
• Relative measure (unit free) used for the purpose
of comparison of variability.
• Relative measure = (absolute measure / average) × 100
$$CV = \frac{s}{\bar{x}} \times 100$$
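A minimal sketch of the CV as a tool for comparing variability across different scales. The two price series are hypothetical, and the sample standard deviation (n − 1 divisor) is an assumption.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical prices of two stocks on very different scales.
stock_a = [84, 86, 88, 90]
stock_b = [19, 20, 21, 22]

cv_a = coefficient_of_variation(stock_a)
cv_b = coefficient_of_variation(stock_b)
print(round(cv_a, 2), round(cv_b, 2))
```

Stock A has the larger standard deviation in absolute terms, but stock B is more variable relative to its own mean, which is what the CV captures.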
Percentiles, Quartiles and IQR
• Percentiles divide the ordered data into 100 equal
groups (99 percentiles).
• For example, you score in the 83rd percentile on a
standardized test. That means that 83% of the
test-takers scored below you.
• Deciles divide the ordered data into 10 equal groups
(9 deciles).
• Quartiles divide the ordered data into 4 equal groups
(3 quartiles).
Percentiles, quartiles, and the IQR
The 10th percentile (denoted by P10) is the value
such that 10% of the values are less than it and 90%
are bigger.
The median is the 50th percentile.
The 1st quartile (denoted by Q1) is the value such
that 25% of the values are less than it and 75% are
bigger.
The 3rd quartile (denoted by Q3) is the value such
that 75% of the values are less than it and 25% are
bigger.
Interquartile range (IQR) = Q3 − Q1
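A minimal sketch of these quantities using `statistics.quantiles` from Python's standard library. The data are hypothetical, and the choice of the "inclusive" interpolation method is an assumption: statistical packages use several different conventions, so cut points may differ slightly from one tool to another.

```python
import statistics

# Hypothetical data set of nine observations.
data = [3, 5, 7, 8, 12, 13, 14, 18, 21]

# n=4 returns the three quartiles; "inclusive" interpolates linearly
# between the sorted values.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# n=100 returns the 99 percentiles; index 9 holds the 10th percentile.
p10 = statistics.quantiles(data, n=100, method="inclusive")[9]

print(q1, median, q3, iqr, p10)
```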
Box Plot
• Describes the overall distribution of a set of
numbers but is simpler than a histogram.
• Useful when comparing several samples, because
too many histograms on one graph would be
both crowded and confusing.
• Also produces a useful display with small data
sets.
• Useful for detecting outliers / extreme values.
Box Plot
S = smallest, L = largest, M = median,
Q1 = lower quartile, Q3 = upper quartile
Detection of Outliers (Box Plot)
• Calculate the limits Q1 − 1.5 × IQR and Q3 + 1.5 × IQR
• Any data point lying outside these limits is treated as an outlier
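A minimal sketch of the 1.5 × IQR rule. The function name and the service times are hypothetical, and the quartiles use the "inclusive" interpolation method of `statistics.quantiles`, which is an assumption; other quartile conventions can move the limits slightly.

```python
import statistics

def iqr_outliers(values):
    """Return the lower limit, upper limit, and the points outside them."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower or x > upper]
    return lower, upper, outliers

# Hypothetical service times in minutes with one extreme value.
service_times = [3, 5, 7, 8, 12, 13, 14, 18, 45]
lower, upper, outliers = iqr_outliers(service_times)
print(lower, upper, outliers)
```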
[Figure: box plot for IBM, axis from 83 to 91]
[Figure: box plot for EDS, axis from 18.5 to 22.5]
A large number of fast-food restaurants with drive-through
windows offer drivers and their passengers the
advantage of quick service. To measure how good the
service is, an organization called QSR planned a study
in which the amount of time taken by a sample of
drive-through customers at each of five restaurants was
recorded. Compare the five sets of data using box plots
and interpret the results.
Box Plots…
Wendy’s service time is shortest and least variable.
Hardee’s has the greatest variability, while
Jack-in-the-Box has the longest service times.
Standardising Data
• Purpose: To compare each data point to the natural
range and variation of the dataset.
• Method: For each data value, subtract the sample
mean and divide by the sample standard deviation.
The resulting numbers are called z-values or z-scores. They
• measure how many standard deviations above or
below the mean a data point is,
• are “unit free”,
• have mean zero and SD 1.
Standardising Data
How: To compare each data point to the
natural range and variation of the dataset, compute
$$z = \frac{x - \bar{x}}{s}$$
A z-score can be either positive or negative.
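A minimal sketch of standardising a data set, assuming the sample standard deviation (n − 1 divisor) is used for s. The function name and the data are hypothetical.

```python
import statistics

def z_scores(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

data = [2, 4, 6, 8, 10]  # hypothetical observations
z = z_scores(data)
print([round(value, 2) for value in z])
```

Values below the mean get negative z-scores, values above it get positive ones, and the standardised values themselves have mean 0 and standard deviation 1.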
Capturing variation
⚫ Chebyshev’s Theorem
Applies to any distribution, regardless of shape
⚫ Empirical Rule
Applies only to roughly mound-shaped and symmetric
distributions
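Chebyshev’s Theorem
⚫ At least $1 - \frac{1}{k^2}$ of the elements of any
distribution lie within k standard deviations of the
mean
k = 2: at least $1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} = 75\%$ lie within 2 standard deviations of the mean
k = 3: at least $1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \approx 89\%$ lie within 3 standard deviations of the mean
k = 4: at least $1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} \approx 94\%$ lie within 4 standard deviations of the mean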
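A minimal sketch that evaluates the bound for k = 2, 3, 4 and then checks it on a strongly right-skewed data set. The function name and the data are hypothetical; the bound is only informative for k > 1, and the check uses the population standard deviation of the data set itself.

```python
import statistics

def chebyshev_bound(k):
    """Minimum share of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

for k in (2, 3, 4):
    print(k, f"{chebyshev_bound(k):.0%}")

# Hypothetical right-skewed data: the bound should still hold.
data = [1, 1, 1, 2, 2, 3, 4, 5, 8, 30]
mean = statistics.mean(data)
sd = statistics.pstdev(data)  # population SD of the data set itself
share_within_2sd = sum(abs(x - mean) <= 2 * sd for x in data) / len(data)
print(share_within_2sd, ">=", chebyshev_bound(2))
```

The observed share is usually well above the bound; Chebyshev gives a guaranteed minimum, not an estimate.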
Empirical Rule
⚫ For roughly mound-shaped and symmetric
distributions, approximately:
68% lie within 1 standard deviation of the mean
95% lie within 2 standard deviations of the mean
Almost all (about 99.7%) lie within 3 standard deviations of the mean
Empirical Rule
[Figure: bell curve centred at μ: 68.26% of values lie within μ ± 1σ, 95.44% within μ ± 2σ, and 99.72% within μ ± 3σ]
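A minimal sketch showing where the empirical-rule percentages come from, assuming an exactly normal distribution. For a normal variable, the probability of falling within k standard deviations of the mean is erf(k / √2). The function name is hypothetical.

```python
import math

def normal_within(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{normal_within(k):.4f}")
```

The computed values can differ in the last digit from percentages built from rounded table entries, so small discrepancies in the second decimal place are expected.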
A survey of 20 respondents is conducted to gather information on customer satisfaction with a product.
Satisfaction is recorded on a 3-point scale: highly satisfied (HS), satisfied (S), not satisfied (NS).
Gender is recorded as male (M) or female (F). The data are shown below:

Respondent    1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
Gender        M  F  F  F  F  F  F  M  M  M  M  M  F  F  F  F  M  F  M  M
Satisfaction  S  S  NS S  NS NS NS NS HS HS S  NS HS S  S  HS NS NS S  S
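One way to summarise these 20 records is a cross-tabulation of gender against satisfaction level. The sketch below is an illustration only; the slide does not say which summary is intended, and the variable names are hypothetical. The data values are copied from the table above.

```python
from collections import Counter

gender = "M F F F F F F M M M M M F F F F M F M M".split()
satisfaction = "S S NS S NS NS NS NS HS HS S NS HS S S HS NS NS S S".split()

# Count each (gender, satisfaction) pair across the 20 respondents.
counts = Counter(zip(gender, satisfaction))

levels = ["HS", "S", "NS"]
print("Gender  " + "  ".join(f"{lvl:>3}" for lvl in levels) + "  Total")
for g in ("M", "F"):
    row = [counts[(g, lvl)] for lvl in levels]
    print(f"{g:<8}" + "  ".join(f"{c:>3}" for c in row) + f"  {sum(row):>5}")
```

Because both variables are categorical, counts (and the mode) are the appropriate summaries here, not means or standard deviations.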
Scatter Plots and Correlation
• A scatter plot (or scatter diagram) is used to show
the relationship between two variables
• Correlation analysis is used to measure the strength of
the linear association between two variables
  • Only concerned with the strength of the relationship
  • No causal effect is implied
Scatter Plot Examples
[Figure: scatter plots of y against x illustrating linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship]
Correlation Coefficient
• The correlation coefficient (r) is used to measure
the strength of the linear relationship in the sample
observations
Calculating sample Correlation Coefficient
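$$r_{xy} = \frac{\mathrm{cov}(x, y)}{s_x s_y}$$
$$\mathrm{cov}(x, y) = \frac{1}{n} \sum (x_i - \bar{x})(y_i - \bar{y})$$
$$s_x = \sqrt{\frac{1}{n} \sum (x_i - \bar{x})^2}, \qquad s_y = \sqrt{\frac{1}{n} \sum (y_i - \bar{y})^2}$$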
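A minimal sketch of the correlation coefficient computed directly from its definition, with the covariance and both standard deviations divided by n. The function name and the two data series are hypothetical. Using n − 1 in all three places would give the same r, because the divisor cancels.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient: covariance divided by the product of the SDs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (s_x * s_y)

# Hypothetical data: minutes spent on a Web site and amount spent.
time_spent = [5, 10, 15, 20, 25]
amount_spent = [12, 25, 33, 48, 52]

r = correlation(time_spent, amount_spent)
print(round(r, 3))
```

A value of r close to 1 indicates a strong positive linear association; it does not by itself show that more time on the site causes higher spending.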
Features of correlation coefficient
• Unit free
• Range between -1.00 and 1.00
• -1≤r<0 implies that as X ↑ (↓), Y ↓ (↑ )
• 0< r≤1 implies that as X ↑ (↓), Y ↑ (↓)
• The closer to -1.00, the stronger the negative linear relationship
• The closer to 1.00, the stronger the positive linear relationship
• The closer to 0.00, the weaker the linear relationship
• r=0 implies that X and Y are not linearly associated
Examples of Approximate r Values
[Figure: five scatter plots of y against x with r = -1.00, r = -0.60, r = 0.00, r = 0.20, and r = 1.00]
