
HBR Orientation Module

Quantitative Methods
What do we fear the most?
Ophidiophobia
Glossophobia…and,
Arithmophobia
Fear of Statistics
This course is dedicated to:
Sir Ronald Fisher (1890-1962)
What’s the story?
Data to Decision

We immediately need to backfill the 2 FTEs, who left in May
We nearly kept up with incoming volume in the following two months
But, since Aug, service level has been impacted adversely due to continued understaffing in the team
[Chart annotations: Started lagging; Service level was maintained; Consistent fall witnessed since Aug]
A statistical dilemma?

Customer data of a large North American retailer expected to follow a bell curve distribution,
but…
Another statistical dilemma?

Advertising spend on TV ads and monthly sales in an FMCG firm

Goodness of Fit statistics

Regression Statistics
Multiple R            0.90
R Square              0.81
Adjusted R Square     0.81
Standard Error        2.32
Observations          400

ANOVA (n = 400)
              df     SS         MS             F
Regression    1      8993       8993.397603    1675.60562
Residual      398    2136       5.367252003
Total         399    11129.6

Coefficients
                        Coefficients   Standard Error   t Stat   P-value
Intercept               7.23           0.23             31.37    0.00
TV Ad exp (Rs. lakh)    0.06           0.00             40.93    0.00
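As a quick arithmetic check, the goodness-of-fit figures above can be reproduced from the ANOVA rows alone. The sketch below is illustrative Python rather than part of the original Excel output; the variable names are assumptions, and the only inputs are numbers copied from the table.

```python
import math

# Values copied from the ANOVA table above.
ss_regression = 8993.397603   # equals MS for regression because df = 1
ms_residual = 5.367252003
df_regression = 1
df_residual = 398

# Rebuild the sums of squares and the sample size.
ss_residual = ms_residual * df_residual
ss_total = ss_regression + ss_residual
n = df_regression + df_residual + 1

# Goodness-of-fit measures derived from the ANOVA rows.
r_square = ss_regression / ss_total                      # explained / total variation
multiple_r = math.sqrt(r_square)
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
std_error = math.sqrt(ms_residual)                       # typical residual size
f_stat = (ss_regression / df_regression) / ms_residual   # MS regression / MS residual

print(f"n = {n}")
print(f"Multiple R = {multiple_r:.2f}, R Square = {r_square:.2f}")
print(f"Adjusted R Square = {adj_r_square:.2f}, Standard Error = {std_error:.2f}")
print(f"F = {f_stat:.2f}")
```

With a single x variable, F is also the square of the slope's t Stat (40.93 squared is roughly 1675), which is a handy consistency check on any such output.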
Business Caselet – An Investment Dilemma
Consider the Mutual Funds data set.
• Column A – Category – type of companies in which the mutual fund is investing, categorized as large,
mid, and small capitalization
• Column B – Objective – investment philosophy of the mutual fund, categorized as value (longer
duration, low risk, stable returns) or growth (shorter investment duration, higher risk, potentially
higher returns) oriented
• Column C – Fees – whether the MF is charging entry or exit fee (% of funds invested/withdrawn) or
not
• Column D – Asset Size – the value of the investment corpus (amount invested) of the mutual fund
• Column E – Expense Ratio – fund management expenses / asset size
• Column F-H – Returns – annualized 1/3/5 year return generated by the fund (in %)
• Column I – Risk – high, average, or low risk rating of the fund
Assume you are an investment advisor, working with a wealth management firm, managing the
investment portfolio of high net-worth individuals (HNWI). The typical investment portfolio of an HNWI
is Rs. 7.5 Cr (USD 1 mn.)
Think of the various ways in which you can statistically analyze the MF dataset to provide
investment recommendations to your HNWI clients.
Statistical Analysis
• DCOVA lifecycle
• Statistical analysis classification
Understanding Data
• Reading a data set
• Exploratory data visualization
‘Describing’ Data
• Measures of descriptive statistics
• Positional measures – Quartiles
• 5-number summary
• Outliers
Data Distribution
• Normal distribution
• Standard normal distribution
• Bell curve
Confidence Interval
• Sampling distribution
• CI estimation for the population mean
Association and Causation
• Correlation analysis
• Simple linear regression
• Multiple linear regression
Pyramid of Statistics

APPLIED

DESCRIPTIVE ANALYTICAL INFERENTIAL INDUCTIVE


Measures of Descriptive Statistics

Measures of Central Tendency
• Mean
• Median
• Mode
Measures of Dispersion/Variability
• Standard deviation
• Variance
• Coefficient of Variation
• Interquartile Range (IQR)
Measures of Asymmetry/Shape
• Kurtosis
• Skewness
5-Number Summary

5-Number Summary & Box Plot

The “Five” Numbers


1. Q0 – always the minimum value
2. Q1 – the 25th percentile
3. Q2 – always the median value
4. Q3 – the 75th percentile
5. Q4 – always the maximum value
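The five numbers listed above can be computed with Python's standard library. This is a minimal sketch on made-up data, not the course data set; the function names are hypothetical, the "inclusive" quartile method is one of several conventions (it uses the same interpolation as Excel's QUARTILE.INC), and the 1.5 × IQR outlier rule is a common box-plot convention assumed here rather than stated on the slide.

```python
import statistics

def five_number_summary(values):
    """Return (Q0, Q1, Q2, Q3, Q4): min, 25th pct, median, 75th pct, max."""
    data = sorted(values)
    # method="inclusive" interpolates between data points, treating the
    # smallest and largest values as the 0th and 100th percentiles.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, q2, q3, data[-1])

def iqr_outliers(values):
    """Flag values more than 1.5 * IQR below Q1 or above Q3 (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical data for illustration only.
sample = [2, 4, 4, 5, 7, 8, 9, 11, 30]
print(five_number_summary(sample))  # (2, 4.0, 7.0, 9.0, 30)
print(iqr_outliers(sample))         # [30]
```

A box plot draws the box from Q1 to Q3 with a line at Q2, so the same five numbers drive the chart.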
Pyramid of statistics

APPLIED

DESCRIPTIVE INFERENTIAL ANALYTICAL INDUCTIVE


Inferential Statistics

Leap of Faith

Population: Parameter
Sample: Statistic
Sampling Distribution

Sampling distribution of the mean is the distribution of all sample means if all
possible samples of the same size are selected from a given population.

Standard error of the mean is the standard deviation of the sampling distribution of the mean, i.e. how far sample means typically fall from the population mean. It is inversely proportional to the square root of the sample size, so quadrupling the sample size halves the standard error.

Calculated as: σ / √n (population standard deviation / square root of sample size, n)
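The formula can be checked against a small simulation. The sketch below is illustrative only: the population (normal, mean 50, σ = 10), the sample size, and the function names are all assumptions chosen for the example.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def simulated_standard_error(mu, sigma, n, n_samples=2000, seed=0):
    """Draw many samples of size n and return the std deviation of their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.pstdev(means)

# Hypothetical population: mean 50, standard deviation 10.
print(standard_error(10, 25))    # 2.0
print(standard_error(10, 100))   # 1.0 (4x the sample size, half the error)
print(simulated_standard_error(50, 10, 25))  # should land close to 2.0
```

The simulated value will not match the formula exactly, since only a finite number of samples is drawn instead of all possible samples.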


Pyramid of statistics

APPLIED

DESCRIPTIVE INFERENTIAL ANALYTICAL INDUCTIVE


Correlation and Regression
Correlation Matrix (r, correlation coefficient)

                 Assets   Expense Ratio   Return 2006   3-Year Return   5-Year Return
Assets           1.00
Expense Ratio    -0.29    1.00
Return 2006      0.08     -0.13           1.00
3-Year Return    0.07     -0.11           0.70          1.00
5-Year Return    0.06     -0.06           0.59          0.84            1.00
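Each cell of the matrix above is a Pearson correlation coefficient between two columns. A minimal sketch of that calculation follows; the function name and the two short data lists are hypothetical and are not taken from the Mutual Funds data set.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical columns for illustration only.
expense_ratio = [0.5, 0.8, 1.0, 1.2, 1.5]
annual_return = [12.0, 11.0, 10.5, 9.0, 8.5]
print(round(pearson_r(expense_ratio, annual_return), 2))
```

The result always lies between -1 and 1; the sign gives the direction of the linear association and the magnitude its strength, which is how the matrix entries are read.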
Regression Types

Linear
• Simple
• Multiple
Generalized Linear Models, e.g.
• Logistic
• Log linear
• Anova/Ancova
Non-linear models, e.g.
• Quadratic
• Polynomial
Interpreting Regression Output: Simple Linear Regression (1/2)

Regression equation
y = ax + b, where,
y is the dependent variable, Return 2006
a is the slope of the line or regression coefficient (coefficient of x)
x is the independent variable, Assets
b is the y-intercept or constant, the point where the regression line crosses the y axis

When the Y-intercept is positive, the line crosses the Y-axis above the origin. When the Y-intercept is negative, the line crosses the Y-axis below the origin. When the Y-intercept is zero, the line passes through the origin.

SUMMARY OUTPUT

Regression Statistics (goodness of fit measures)
Multiple R            0.08
R Square              0.01
Adjusted R Square     0.01
Standard Error        6.27
Observations          868

ANOVA
              df     SS             MS             F              Significance F
Regression    1      230.6058383    230.6058383    5.858377732    0.015707884
Residual      866    34088.72986    39.3634294
Total         867    34319.3357

              Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
Intercept     12.3575309     0.222570792      55.52179955    1.4747E-287    11.92068963    12.79437217    11.92068963    12.79437217
Assets        9.0742E-05     3.74904E-05      2.420408588    0.015707884    1.71594E-05    0.000164325    1.71594E-05    0.000164325

In the Coefficients table, the Intercept row is b, the intercept coefficient (constant), and the Assets row is a, the regression/slope coefficient.

• Multiple R: for regression models with only one independent variable, this is the same as the (absolute) correlation coefficient r between x and y.
• R Square: the square of Multiple R, the Coefficient of Determination. It is the explained variation (by the model) / total variation. For example, 98% means that 98% of the variation of the y-values around their mean is explained by the x-values. It does not mean that 98% of the points fall on the regression line. R² is the percentage of explained variance, i.e. the percentage of the variance of y that is accounted for by the regression line. For a visualization, draw, for each data point, a vertical line to the regression line; also draw a horizontal line for the mean of y. For each vertical line, take the section between the horizontal line and the regression line. The sum of squares of these sections is the explained variation.
• Adjusted R Square: adjusts for the number of terms in a model. You’ll want to use this instead of R Square if you have more than one x variable.

Interpreting Regression Output: Simple Linear Regression (2/2)

Regression equation and SUMMARY OUTPUT: as on the previous slide (y = ax + b, with y = Return 2006 and x = Assets).

• Standard Error (S): An estimate of the typical distance the data points fall away from the regression line. Expressed in units of the dependent variable. A value closer to 0 indicates a fit that is more useful for prediction.
• Standard Error of a coefficient: Each row of the Coefficients table has its own standard error. If the coefficient is large compared to its standard error (t Stat = coefficient / standard error), then the coefficient is probably different from 0.
• Observations: Number of observations in the sample.
• P-value: tests the null hypothesis that the coefficient is equal to zero (no effect). A low p-value (< 0.05) indicates that you can reject the null hypothesis. Here the p-value for Assets is 0.0157, so the slope is statistically significant even though R Square is only 0.01.
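The quantities in the summary output can be reproduced by hand for a simple linear regression. The following is a rough sketch using ordinary least squares on made-up data; the function name, the returned keys, and the data are assumptions for illustration, and the p-value is left out because it needs a t-distribution that the Python standard library does not provide.

```python
import math

def simple_linear_regression(x, y):
    """Fit y = a*x + b by ordinary least squares and return key statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    a = sxy / sxx                 # slope (regression coefficient)
    b = mean_y - a * mean_x       # intercept (constant)

    # Total variation and unexplained (residual) variation.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    r_square = 1 - ss_residual / ss_total

    std_error = math.sqrt(ss_residual / (n - 2))   # Standard Error (S)
    se_slope = std_error / math.sqrt(sxx)          # standard error of the slope
    t_stat = a / se_slope                          # t Stat for the slope

    return {"slope": a, "intercept": b, "r_square": r_square,
            "std_error": std_error, "se_slope": se_slope, "t_stat": t_stat}

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
for name, value in simple_linear_regression(x, y).items():
    print(f"{name}: {value:.4f}")
```

Applied to the output above, the same t Stat relationship holds: 9.0742E-05 / 3.74904E-05 is about 2.42 for Assets.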
Pyramid of statistics

APPLIED

DESCRIPTIVE INFERENTIAL ANALYTICAL INDUCTIVE


Time Series Analysis - Basics

• A quantitative forecasting technique, time series involves past data indexed by equally spaced increments of time (minutes, hours, days, weeks, etc.), which is used to forecast/predict future data
• Time series analysis accounts for the fact that data points taken over time may have an internal structure (such as autocorrelation, trend or seasonal variation) that should be accounted for

Forecasting techniques
• Qualitative: Delphi, Surveys, Polling, Executive Opinions
• Quantitative: Time Series, Causal
Situation 1 - Moving Average Approach

• A retail company has monthly sales data for 2018, from Jan-Dec. The Sales Head wants to apply a simple,
but statistical technique, taking 3-months rolling data to forecast sales for Jan, 2019 and beyond.
Steps
• Refer to sales data in Excel
• Calculate the simple moving average for 3-month duration
• For a simple moving average, the formula is the sum of the data points over a given period divided by the
number of periods
• Insert a line graph for actual and forecast figures to evaluate the effect of smoothing variations between
time-steps
• Alternatively, use the Data Analysis ToolPak in Excel (preferred)
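The steps above can be sketched outside Excel as well. This is an illustrative Python version with invented monthly sales figures (the real 2018 data is not shown in the slides); the function names are hypothetical.

```python
def moving_average_forecasts(series, window=3):
    """Forecast each period as the mean of the previous `window` actuals.

    Returns a list aligned with `series`; the first `window` entries are None
    because there is not enough history to forecast them.
    """
    forecasts = [None] * window
    for t in range(window, len(series)):
        forecasts.append(sum(series[t - window:t]) / window)
    return forecasts

def next_period_forecast(series, window=3):
    """Forecast the period after the last observation (e.g. Jan 2019)."""
    return sum(series[-window:]) / window

# Hypothetical Jan-Dec 2018 sales, for illustration only.
sales_2018 = [120, 130, 125, 140, 150, 145, 160, 170, 165, 180, 190, 185]

print(moving_average_forecasts(sales_2018)[3])  # April forecast: 125.0
print(next_period_forecast(sales_2018))         # Jan 2019 forecast: 185.0
```

Plotting `sales_2018` against the forecast list gives the actual vs. forecast line graph described in the steps; the forecast line is smoother because each point averages three months.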
Situation 2 – Weighted Moving Average Approach

• A retail company has monthly sales data for 2018, from Jan-Dec. The Sales Head wants to apply a simple,
but statistical technique, giving higher weightage to recent months in a 3-month rolling format to forecast
sales for Jan, 2019 and beyond.
Steps
• Refer to sales data in Excel
• Apply weights in 3:2:1 ratio and calculate the moving average for 3-month duration
• Insert a line graph for actual and forecast figures to evaluate the effect of smoothing variations between
time-steps
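A weighted moving average with the 3:2:1 ratio can be sketched as follows. The sales figures are invented for illustration and the function name is hypothetical; the sketch assumes the weight of 3 goes to the most recent month, in line with giving higher weightage to recent months.

```python
def weighted_moving_average(series, weights=(3, 2, 1)):
    """Forecast the next period from the last len(weights) observations.

    weights[0] applies to the most recent observation, weights[1] to the
    one before it, and so on (assumed ordering).
    """
    if len(series) < len(weights):
        raise ValueError("not enough observations for the given weights")
    recent_first = series[::-1][:len(weights)]
    weighted_sum = sum(w * value for w, value in zip(weights, recent_first))
    return weighted_sum / sum(weights)

# Hypothetical Jan-Dec 2018 sales, for illustration only.
sales_2018 = [120, 130, 125, 140, 150, 145, 160, 170, 165, 180, 190, 185]

# Jan 2019 forecast: (3*185 + 2*190 + 1*180) / 6
print(round(weighted_moving_average(sales_2018), 2))      # 185.83
# April 2018 forecast from Jan-Mar: (3*125 + 2*130 + 1*120) / 6
print(round(weighted_moving_average(sales_2018[:3]), 2))  # 125.83
```

Dividing by the sum of the weights keeps the forecast on the same scale as the sales data, whatever ratio is chosen.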
Situation 3 – Exponential Moving Average Approach

• A retail company has monthly sales data for 2018, from Jan-Dec. The Sales Head wants to apply a simple, but statistical technique, accounting for the dynamic fluctuations in sales data, to forecast sales for Jan, 2019 and beyond.
Steps
• Refer to sales data in Excel
• Follow ‘exponential smoothing’ steps in the Data Analysis ToolPak
o The damping factor is 1 minus the smoothing constant (α) of exponential smoothing; Excel's default damping factor is 0.3.
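Simple exponential smoothing can be sketched in a few lines. This illustrative version assumes the convention that the damping factor equals 1 minus the smoothing constant α and that the first forecast is seeded with the first actual value; the function name and data are hypothetical.

```python
def exponential_smoothing(series, damping=0.3):
    """Return one-step-ahead forecasts for periods 2 .. len(series) + 1.

    Each forecast is alpha * previous actual + damping * previous forecast,
    where alpha = 1 - damping. The first forecast is seeded with the first
    actual value (an assumed starting rule).
    """
    alpha = 1 - damping
    forecasts = [series[0]]
    for actual in series[1:]:
        forecasts.append(alpha * actual + damping * forecasts[-1])
    return forecasts

# Hypothetical sales for three months, for illustration only.
sales = [100, 110, 120]
forecasts = exponential_smoothing(sales, damping=0.3)
print([round(f, 1) for f in forecasts])  # [100, 107.0, 116.1]
```

The last element is the forecast for the period after the data ends. A smaller damping factor makes the forecast react faster to recent changes, while a larger one gives a smoother line.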
