
Quantitative Techniques

Shovan Chowdhury

Indian Institute of Management, Kozhikode


Data Driven Business Performance

EPGCOSCM 13
Course Objective
• Familiarity with different types of data and their visualization
• Understanding presence of intrinsic uncertainty in a business
situation
• Use of appropriate statistical techniques for modelling data and
capturing uncertainty
• Applying statistical software for data analysis
• Interpreting outputs from a managerial perspective (may require
knowledge of other disciplines)

“Statistical Techniques/Methods”

Formulate problem → Get some data → Visualize the data →
Do some statistical calculations → Interpret results

DATA AND SUMMARIZATION
Primary Uses of Statistics

• Descriptive statistics – the collection, organization,
presentation and summary of data.

• Inferential statistics – generalizing from a sample to a
population, estimating unknown parameters, drawing
conclusions, making decisions.

Problem
A chocolate manufacturing company sells quality chocolate products at its plant
and retail stores. Two years ago, the company developed a Web site and began
selling its products over the Internet. Web site sales have exceeded the company’s
expectations, and management is now considering strategies to increase sales even
further. To learn more about the Web site customers, a sample of 50 chocolate
transactions was selected from the previous month’s sales.

Data showing
the day of the week each transaction was made,
the type of browser the customer used,
the time spent on the Web site,
the number of Web site pages viewed,
the amount spent by each of the 50 customers.

The Cab Case (Text Book) : Demand Supply Gap

Basic Vocabulary of Statistics

POPULATION
A population consists of all the items or individuals about which
you want to draw a conclusion.

SAMPLE
A sample is the portion of a population selected for analysis.

PARAMETER
A parameter is a numerical measure that describes a
characteristic of a population.

STATISTIC
A statistic is a numerical measure that describes a characteristic
of a sample.
• Quantitative
  - Discrete (no. of customers, no. of claims)
  - Continuous (salary, price)
• Qualitative (Categorical)
  - Ordinal (customer satisfaction, efficiency of workers, bond rating)
  - Nominal (sex, nationality, eye color)

Cross-Sectional and Time Series Data

• Cross-sectional data: data collected at the same or approximately the
same point in time
• Time series data: data collected over different time periods

Data Visualization

Some quick questions
- Return on investment
- Project completion time
- Mutual fund ratings
- Political affiliation
- Demand for a product
- No. of customers waiting in a queue
- Diameter of bolts
- Number of defectives produced in a shift
- Gender
- No. of misprints per page of a book
- Marital status
- Efficiency of employee
Excel Bar and Pie Chart of Pizza Preference Data

Histogram

Summary for WaitTime

[Figure: histogram of WaitTime (axis from 0 to 12) with 95% confidence intervals for the mean and median]

Anderson-Darling Normality Test
  A-Squared       0.24
  P-Value         0.759

  Mean            5.4600
  StDev           2.4755
  Variance        6.1279
  Skewness        0.250415
  Kurtosis       -0.404960
  N               100

  Minimum         0.4000
  1st Quartile    3.8000
  Median          5.2500
  3rd Quartile    7.2000
  Maximum        11.6000

  95% Confidence Interval for Mean      4.9688   5.9512
  95% Confidence Interval for Median    4.5742   5.8773
  95% Confidence Interval for StDev     2.1735   2.8757

[Figure: Rating Distribution of Sample (ratings 1 to 5, percentage of sample)]

[Figure: Gender Profile of Sample (M and F, percentage of sample)]

[Figure: Distribution of Waiting Time (mins)]
Data and randomness
Three questions that good business managers ask themselves when
they look at “the numbers”:

• What is a typical or central value?

• How much variability is present in the data set?

• Are there unusual shocks/events/cases (shape of the curve)?

Key Performance Measures

Measures of Center
There are three main measures of center:
• Mean (most commonly used measure)
• Median (generally preferred in the presence of outliers)
• Mode (used for categorical data)
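
The three measures can be compared on a small example. The sketch below uses Python's standard `statistics` module; the waiting times and browser names are made-up illustrative values, not course data.

```python
from statistics import mean, median, multimode

# Hypothetical waiting times in minutes; the last value is a deliberate outlier.
wait_times = [4, 5, 5, 6, 7, 30]

center_mean = mean(wait_times)      # pulled upward by the outlier
center_median = median(wait_times)  # barely affected by the outlier

# The mode also works for categorical data; multimode returns every
# most-frequent value, so ties are not hidden.
browsers = ["Chrome", "Firefox", "Chrome", "Other", "Chrome"]
center_mode = multimode(browsers)

print(center_mean, center_median, center_mode)
```

Here the single outlier moves the mean well above most of the observations while the median stays near the bulk of the data, which is why the median is usually reported when outliers are present.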

[Figure: three distribution shapes: Negatively (Left) Skewed, Symmetrical, Positively (Right) Skewed]

Dispersion
Describes how similar a set of observations are to each other,
or the degree of deviation (spread) of a set of data from its central
value

• In general, the more spread out a distribution is, the larger the
measure of dispersion will be

Measures of Dispersion
There are four main measures of dispersion:

• Variance
• Standard Deviation
• Mean absolute Deviation
• Quartile Deviation or Semi-Interquartile Range, (Q3 − Q1)/2, i.e. half the interquartile range (IQR)
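
As a rough illustration, the four measures can be computed with Python's standard `statistics` module. The data are hypothetical, the variance and standard deviation below use the population (divide by n) versions, and the quartiles follow Python's default "exclusive" method, so other software may report slightly different quartile values.

```python
from statistics import mean, pvariance, pstdev, quantiles

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

avg = mean(data)
variance = pvariance(data)   # population variance: average squared deviation
std_dev = pstdev(data)       # square root of the variance

# Mean absolute deviation: average distance from the mean.
mad = mean([abs(x - avg) for x in data])

# Quartile deviation: half the distance between Q1 and Q3.
q1, _, q3 = quantiles(data, n=4)
quartile_deviation = (q3 - q1) / 2

print(variance, std_dev, mad, quartile_deviation)
```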

Mean Absolute Deviation

Variance and Standard Deviation
• The standard deviation is defined as the square root
of the variance. The units of measurement for the
standard deviation are the same as the units of the
variable.

Population Standard Deviation        Sample Standard Deviation
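
For reference, the usual textbook definitions are sketched below. The notation is an assumption (N population values with mean μ, n sample values with mean x̄), and the sample version shown divides by n − 1, which is a common convention; some texts divide by n instead.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
\qquad\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```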

Standard Deviation

[Figure: chart of Sales (vertical axis from 0 to 30) for observations 1 to 199]

Interpretation

• The larger the SD/variance is, the more the observations deviate, on
average, away from the mean
• The smaller the SD/variance is, the less the observations deviate, on
average, from the mean

Coefficient of Variation (CV)

• Relative measure (unit free) used for the purpose
of comparison of variability.

• Relative measure = (absolute measure / average) × 100

\[ CV = \frac{s}{\bar{x}} \times 100 \]
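
A minimal sketch of the CV calculation, using the sample standard deviation from Python's `statistics` module. The price figures are hypothetical; the second list is the same data expressed in different units, to illustrate that the CV is unit free.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

# Hypothetical prices in rupees, and the same prices in thousands of rupees.
prices_rupees = [100, 110, 90, 105, 95]
prices_thousands = [p / 1000 for p in prices_rupees]

cv_rupees = coefficient_of_variation(prices_rupees)
cv_thousands = coefficient_of_variation(prices_thousands)

print(round(cv_rupees, 2), round(cv_thousands, 2))
```

Both calls give the same percentage, because rescaling the data changes the standard deviation and the mean by the same factor.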

Percentiles, Quartiles and IQR

• Percentiles are the 99 values that divide the ordered data into 100
equal groups.
• For example, if you score in the 83rd percentile on a
standardized test, 83% of the test-takers scored below you.
• Deciles are the 9 values that divide the ordered data into 10
equal groups.
• Quartiles are the 3 values that divide the ordered data into 4
equal groups.

Percentiles, quartiles, and the IQR

The 10th percentile (denoted by P10) is the value
such that 10% of the values are less than it and 90%
are greater.

The median is the 50th percentile.

The 1st quartile (denoted by Q1) is the value such
that 25% of the values are less than it and 75% are
greater.

Interquartile range (IQR) = Q3 − Q1
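
These definitions can be tried out with `statistics.quantiles` from the Python standard library. The scores are hypothetical, and the cut points follow Python's default "exclusive" interpolation rule; Excel, Minitab and other tools may use slightly different rules and return slightly different values.

```python
from statistics import median, quantiles

scores = [12, 15, 17, 20, 22, 25, 28, 30, 35, 40, 45]  # hypothetical data

q1, q2, q3 = quantiles(scores, n=4)      # 3 cut points -> 4 groups
deciles = quantiles(scores, n=10)        # 9 cut points -> 10 groups
percentiles = quantiles(scores, n=100)   # 99 cut points -> 100 groups

p10 = percentiles[9]    # list index 9 holds the 10th percentile
p50 = percentiles[49]   # the 50th percentile is the median
iqr = q3 - q1

print(q1, q2, q3, iqr, p10)
```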

Box Plot

Describes the overall distribution of a set of
numbers but is simpler than a histogram.
Useful when comparing several samples because
too many histograms on one graph would be
both crowded and confusing.
Also produces useful display with small data
sets.

Useful to detect outliers / extreme values

Box Plot

S = smallest, L = largest, M = median,
Q1 = lower quartile, Q3 = upper quartile

Detection of Outliers (Box Plot)

• Calculate Q1 − 1.5 × IQR and Q3 + 1.5 × IQR
• Any data value lying outside the interval between these two limits is flagged as an outlier
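
The rule above can be sketched in a few lines of Python. The function name `box_plot_outliers` and the service times are illustrative assumptions, and the quartiles use the default method of `statistics.quantiles`, which may differ slightly from other software.

```python
from statistics import quantiles

def box_plot_outliers(values):
    """Return the lower limit, upper limit and flagged values (1.5 x IQR rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return lower, upper, flagged

# Hypothetical service times in seconds; 95 is unusually long.
service_times = [12, 15, 17, 20, 22, 25, 28, 30, 35, 40, 95]
lower, upper, outliers = box_plot_outliers(service_times)

print(lower, upper, outliers)
```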
[Figure: box plots for IBM (axis from 83 to 91) and EDS (axis from 18.5 to 22.5)]


A large number of fast-food restaurants with drive-through
windows offer drivers and their passengers the
advantages of quick service. To measure how good the
service is, an organization called QSR planned a study
wherein the amount of time taken by a sample of drive-through
customers at each of five restaurants was
recorded. Compare the five sets of data using a box plot
and interpret the results.

Box Plots…

Wendy’s service time is shortest and least variable.

Hardee’s has the greatest variability, while Jack-in-the-Box
has the longest service times.

Standardising Data
• Purpose: To compare each data point to the natural
range and variation of the dataset.
• Method: For each data value, subtract the sample
mean and divide by the sample standard deviation.
The resulting numbers are called z-values or z-scores. They
• measure how many standard deviations above or
below the mean a data point is
• are “unit free”
• have mean zero and SD 1

Standardising Data

How: subtract the sample mean from each data point and
divide by the sample standard deviation.

\[ z = \frac{x - \bar{x}}{s} \]

A z-score can be either positive or negative.
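
A short sketch of standardisation in Python, using the sample mean and sample standard deviation as in the formula. The helper name `z_scores` and the waiting times are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardise each value: subtract the sample mean, divide by the sample SD."""
    x_bar = mean(values)
    s = stdev(values)
    return [(x - x_bar) / s for x in values]

wait_times = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, in minutes
z = z_scores(wait_times)

# Values below the mean get negative z-scores, values above get positive ones.
print([round(v, 2) for v in z])
print(round(mean(z), 10), round(stdev(z), 10))
```

The second line of output is a check of the properties listed earlier: the standardised values have mean zero and standard deviation one, up to floating-point rounding.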

Capturing variation

⚫ Chebyshev’s Theorem
Applies to any distribution, regardless of shape

⚫ Empirical Rule
Applies only to roughly mound-shaped and symmetric
distributions

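Chebyshev’s Theorem
⚫ At least \( 1 - \frac{1}{k^2} \) of the elements of any
distribution lie within k standard deviations of the
mean

k = 2: at least \( 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} = 75\% \) lie within 2 standard deviations of the mean
k = 3: at least \( 1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \approx 89\% \) lie within 3 standard deviations of the mean
k = 4: at least \( 1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} \approx 94\% \) lie within 4 standard deviations of the mean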
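
The bound is easy to compute and to check against data. In this sketch the function names and the skewed data set are illustrative assumptions; the check uses the population standard deviation (divide by n).

```python
from statistics import mean, pstdev

def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def fraction_within(values, k):
    """Observed share of values within k population SDs of the mean."""
    mu, sigma = mean(values), pstdev(values)
    return sum(abs(v - mu) <= k * sigma for v in values) / len(values)

skewed = [1, 1, 1, 2, 2, 3, 4, 10, 25]  # hypothetical, strongly right-skewed

for k in (2, 3, 4):
    print(k, f"{chebyshev_lower_bound(k):.0%}", f"{fraction_within(skewed, k):.0%}")
```

Even for this skewed data the observed share never falls below the Chebyshev bound; it is typically well above it, since the bound is a worst-case guarantee.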

Empirical Rule
⚫ For roughly mound-shaped and symmetric
distributions, approximately:

68% lie within 1 standard deviation of the mean

95% lie within 2 standard deviations of the mean

Almost all (about 99.7%) lie within 3 standard deviations of the mean
Empirical Rule
[Figure: bell curve centered at μ; 68.26% of values lie within μ ± 1σ, 95.44% within μ ± 2σ, and 99.72% within μ ± 3σ]
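
The percentages in the empirical rule come from the normal distribution, and can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch assuming an exactly normal distribution; the computed shares agree with the quoted figures up to rounding in the second decimal place.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{share_within(k):.2%}")
```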

A survey is conducted on 20 respondents to gather information on customer satisfaction for a product.
Customer satisfaction is recorded on a 3-point scale, viz. highly satisfied (HS), satisfied (S), not satisfied
(NS), along with gender: male (M) and female (F). The data are recorded as shown below:

Respondent          1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
Gender              M   F   F   F   F   F   F   M   M   M   M   M   F   F   F   F   M   F   M   M
Satisfaction level  S   S   NS  S   NS  NS  NS  NS  HS  HS  S   NS  HS  S   S   HS  NS  NS  S   S
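
One way to summarise these two categorical variables is a frequency count for each, plus a gender-by-satisfaction cross-tabulation. The sketch below uses `collections.Counter` and the 20 responses exactly as recorded above; the variable names are illustrative.

```python
from collections import Counter

gender = "M F F F F F F M M M M M F F F F M F M M".split()
satisfaction = "S S NS S NS NS NS NS HS HS S NS HS S S HS NS NS S S".split()

gender_counts = Counter(gender)
satisfaction_counts = Counter(satisfaction)

# Cross-tabulation: count each (gender, satisfaction) pair.
cross_tab = Counter(zip(gender, satisfaction))

levels = ("HS", "S", "NS")
print("Gender", *levels)
for g in ("M", "F"):
    print(g, *[cross_tab[(g, level)] for level in levels])
```

From such counts, a bar or pie chart of each variable, or a clustered bar chart of satisfaction by gender, follows directly.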

Scatter Plots and Correlation

• A scatter plot (or scatter diagram) is used to show
the relationship between two variables
• Correlation analysis is used to measure strength of
the linear association between two variables
• Only concerned with strength of the relationship
• No causal effect is implied

Scatter Plot Examples

[Figure: example scatter plots of y against x showing linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship]
Correlation Coefficient

• The correlation coefficient (r) is used to measure
the strength of the linear relationship in the sample
observations

Calculating sample Correlation Coefficient

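\[ r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x \, s_y} \]

\[ \operatorname{cov}(x, y) = \frac{1}{n} \sum (x_i - \bar{x})(y_i - \bar{y}) \]

\[ s_x = \sqrt{\frac{1}{n} \sum (x_i - \bar{x})^2}, \qquad s_y = \sqrt{\frac{1}{n} \sum (y_i - \bar{y})^2} \]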

Features of correlation coefficient
• Unit free
• Ranges between -1.00 and 1.00
• -1≤r<0 implies that as X ↑ (↓), Y ↓ (↑ )
• 0< r≤1 implies that as X ↑ (↓), Y ↑ (↓)
• The closer to -1.00, the stronger the negative linear relationship
• The closer to 1.00, the stronger the positive linear relationship
• The closer to 0.00, the weaker the linear relationship
• r=0 implies that X and Y are not linearly associated
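
The sample correlation coefficient and a few of these features can be checked numerically. The sketch below follows the divide-by-n covariance and standard deviation formulas; the function name `correlation` and the data (pages viewed and amount spent) are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation coefficient r = cov(x, y) / (s_x * s_y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov / (s_x * s_y)

pages_viewed = [1, 2, 3, 4, 5]        # hypothetical
amount_spent = [12, 15, 21, 22, 30]   # hypothetical

r = correlation(pages_viewed, amount_spent)

# Unit free: rescaling a variable (e.g. rupees to paise) leaves r unchanged.
r_rescaled = correlation(pages_viewed, [a * 100 for a in amount_spent])

# Perfect negative linear relationship gives r = -1.
r_negative = correlation([1, 2, 3], [6, 4, 2])

print(round(r, 3), round(r_rescaled, 3), round(r_negative, 3))
```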

Examples of Approximate r Values

[Figure: example scatter plots of y against x with r = -1.00, r = -0.60, r = 0.00, r = 0.20, and r = 1.00]
