Quant

HBR Orientation Module
Quantitative Methods
What do we fear the most?
Ophidiophobia
Glossophobia…and,
Arithmophobia
Fear of Statistics
This course is dedicated to:
Sir Ronald Fisher (1890-1962)
What’s the story?
Data to Decision
We immediately need to backfill the 2 FTEs who left in May
We nearly kept up with incoming volume in the following two months
But, since Aug, service level has been impacted adversely due to continued understaffing in the team
Chart annotations: Started lagging; Service level was maintained; Consistent fall witnessed since Aug
A statistical dilemma?
Customer data of a large North American retailer expected to follow a bell curve distribution,
but…
Another statistical dilemma?
Advertising spend on TV ads and monthly sales in an FMCG firm
Goodness of Fit statistics

Regression Statistics
Multiple R             0.90
R Square               0.81
Adjusted R Square      0.81
Standard Error         2.32
Observations           400

ANOVA (n = 400)
              df     SS         MS             F
Regression    1      8993       8993.397603    1675.60562
Residual      398    2136       5.367252003
Total         399    11129.6

Coefficients
                        Coefficients   Standard Error   t Stat   P-value
Intercept               7.23           0.23             31.37    0.00
TV Ad exp (Rs. lakh)    0.06           0.00             40.93    0.00
Business Caselet – An Investment Dilemma
Consider the Mutual Funds data set.
• Column A – Category – type of companies in which the mutual fund is investing, categorized as large,
mid, and small capitalization
• Column B – Objective – investment philosophy of the mutual fund, categorized as value (longer
duration, low risk, stable returns) or growth (shorter investment duration, higher risk, potentially
higher returns) oriented
• Column C – Fees – whether the MF is charging entry or exit fee (% of funds invested/withdrawn) or
not
• Column D – Asset Size – the value of the investment corpus (amount invested) of the mutual fund
• Column E – Expense Ratio – fund management expenses / asset size
• Columns F-H – Returns – annualized 1/3/5 year return generated by the fund (in %)
• Column I – Risk – high, average, or low risk rating of the fund
Assume you are an investment advisor, working with a wealth management firm, managing the
investment portfolio of high net-worth individuals (HNWI). The typical investment portfolio of an HNWI
is Rs. 7.5 Cr (USD 1 mn.)
Think of the various ways in which you can statistically analyze the MF dataset to provide
investment recommendations to your HNWI clients.
Pyramid of Statistics
Association and Causation
• Correlation analysis
• Simple linear regression
• Multiple linear regression
Confidence Interval
• Sampling distribution
• CI estimation for the population mean
Data Distribution
• Normal distribution
• Standard normal distribution
• Bell curve
‘Describing’ Data
• Measures of descriptive statistics
• Positional measures – Quartiles
• 5-number summary
• Outliers
Understanding Data
• Reading a data set
• Exploratory data visualization
Statistical Analysis
• DCOVA lifecycle
• Statistical analysis classification
APPLIED
DESCRIPTIVE ANALYTICAL INFERENTIAL INDUCTIVE

Measures of Descriptive Statistics
Measures of Central Tendency
• Mean
• Median
• Mode
Measures of Dispersion/Variability
• Standard deviation
• Variance
• Coefficient of Variation
• Interquartile Range (IQR)
Measures of Asymmetry/Shape
• Kurtosis
• Skewness
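
The central tendency and dispersion measures listed above can be sketched in Python with the standard `statistics` module. The `returns` list is hypothetical illustration data, not values from the course data set, and the sample (n - 1) versions of variance and standard deviation are assumed.

```python
import statistics

# Hypothetical annual returns (%) for five funds; illustration only.
returns = [8, 10, 10, 12, 15]

# Measures of central tendency
mean = statistics.mean(returns)
median = statistics.median(returns)
mode = statistics.mode(returns)

# Measures of dispersion (sample versions, dividing by n - 1)
variance = statistics.variance(returns)
std_dev = statistics.stdev(returns)
coeff_of_variation = std_dev / mean  # spread relative to the mean

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance}, std dev={std_dev:.3f}, CV={coeff_of_variation:.3f}")
```

The coefficient of variation is unit-free, which is why it is handy for comparing the spread of funds whose average returns differ.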
5-Number Summary
5-Number Summary & Box Plot
The “Five” Numbers

1. Q0 – always the minimum value
2. Q1 – the 25th percentile
3. Q2 – always the median value
4. Q3 – the 75th percentile
5. Q4 – always the maximum value
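
A minimal Python sketch of the five numbers follows. The data values are hypothetical, the quartile convention (`method="inclusive"`, comparable to Excel's QUARTILE.INC) is an assumption, and the 1.5 × IQR outlier fences are a common box-plot convention rather than something stated on the slide.

```python
import statistics

def five_number_summary(values):
    """Return (Q0, Q1, Q2, Q3, Q4) for a list of numbers."""
    data = sorted(values)
    # "inclusive" treats the minimum and maximum as the 0th and 100th
    # percentiles; other quartile conventions can give slightly different Q1/Q3.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

# Hypothetical data with one unusually large value.
values = [2, 4, 4, 5, 7, 9, 10, 12, 30]
q0, q1, q2, q3, q4 = five_number_summary(values)

# Assumed convention: points beyond 1.5 * IQR from the box are flagged as outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print("Five-number summary:", (q0, q1, q2, q3, q4))
print("IQR:", iqr, "Outliers:", outliers)
```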
Inferential Statistics
Leap of Faith
POPULATION: Parameter
SAMPLE: Statistic
Sampling Distribution
Sampling distribution of the mean is the distribution of all sample means if all
possible samples of the same size are selected from a given population.
Standard error of the mean is the standard deviation of all possible
sample means around the population mean. As the sample size increases, the
standard error shrinks in inverse proportion to the square root of the sample
size (quadrupling n halves the standard error).
Calculated as: σ / √n (population standard deviation / square root of sample size, n)
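
The definition can be checked on a tiny example by listing every possible sample and comparing the spread of the sample means with σ / √n. This is a sketch under stated assumptions: the four-value population is hypothetical, and samples are drawn with replacement (without replacement, a finite population correction would be needed).

```python
import itertools
import math
import statistics

# Hypothetical population of four values; illustration only.
population = [2, 4, 6, 8]
n = 2  # sample size

# Standard error from the formula: sigma / sqrt(n)
sigma = statistics.pstdev(population)
se_formula = sigma / math.sqrt(n)

# Sampling distribution of the mean: the mean of every possible
# sample of size n, drawn with replacement (4 ** 2 = 16 samples).
sample_means = [statistics.mean(sample)
                for sample in itertools.product(population, repeat=n)]
se_enumerated = statistics.pstdev(sample_means)

print(f"sigma / sqrt(n)          = {se_formula:.4f}")
print(f"std dev of sample means  = {se_enumerated:.4f}")
```

Both figures agree, and the mean of the sample means equals the population mean.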

Correlation and Regression
Correlation Matrix (r, correlation coefficient)
                 Assets   Expense Ratio   Return 2006   3-Year Return   5-Year Return
Assets            1.00
Expense Ratio    -0.29     1.00
Return 2006       0.08    -0.13            1.00
3-Year Return     0.07    -0.11            0.70          1.00
5-Year Return     0.06    -0.06            0.59          0.84            1.00
Regression Types
• Linear: Simple, Multiple
• Generalized Linear Models, e.g. Logistic, Log linear, Anova/Ancova
• Non-linear models, e.g. Quadratic, Polynomial
Interpreting Regression Output: Simple Linear Regression (1/2)

Regression equation
y = ax + b, where,
• y is the dependent variable, Return 2006
• a is the slope of the line, or regression coefficient (coefficient of x)
• x is the independent variable, Assets
• b is the y-intercept or constant, the point where the regression line crosses the y-axis

When the Y-intercept is positive, the line crosses the Y-axis above the origin. When the Y-intercept is negative, the line crosses the Y-axis below the origin. When the Y-intercept is zero, the line passes through the origin.

SUMMARY OUTPUT

Regression Statistics (Goodness of Fit measures)
Multiple R             0.08
R Square               0.01
Adjusted R Square      0.01
Standard Error         6.27
Observations           868

ANOVA
              df     SS             MS             F              Significance F
Regression    1      230.6058383    230.6058383    5.858377732    0.015707884
Residual      866    34088.72986    39.3634294
Total         867    34319.3357

             Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept    12.3575309     0.222570792      55.52179955   1.4747E-287   11.92068963   12.79437217   11.92068963   12.79437217
Assets       9.0742E-05     3.74904E-05      2.420408588   0.015707884   1.71594E-05   0.000164325   1.71594E-05   0.000164325

Intercept row: b, intercept coefficient (constant)
Assets row: a, regression/slope coefficient

• Multiple R: for regression models with only one independent variable, this equals the absolute value of the correlation coefficient r.
• R Square: the square of Multiple R, the Coefficient of Determination. It is the share of the variation in y that the model explains, i.e. explained variation (by the model) / total variation. For example, 98% means that 98% of the variation of y-values around their mean is explained by the x-values. It does not mean that 98% of the points fall on the regression line. In the output above, 230.61 / 34319.34 is about 0.007, shown as 0.01, so Assets explains under 1% of the variation in Return 2006. For a visualization, draw, for each data point, a vertical line to the regression line; also draw a horizontal line for the mean of y. For each vertical line, take the section between the horizontal line and the regression line. The sum of the squares of these sections is the explained variation (SS Regression).
• Adjusted R Square: adjusts R Square for the number of terms in the model. Use it instead of R Square if you have more than one x variable.

Interpreting Regression Output: Simple Linear Regression (2/2)

Regression equation
y = ax + b, where,
• y is the dependent variable, Return 2006
• a is the slope of the line, or regression coefficient (coefficient of x)
• x is the independent variable, Assets
• b is the y-intercept or constant, the point where the regression line crosses the y-axis

SUMMARY OUTPUT

Regression Statistics
Multiple R             0.08
R Square               0.01
Adjusted R Square      0.01
Standard Error         6.27
Observations           868

ANOVA
              df     SS             MS             F              Significance F
Regression    1      230.6058383    230.6058383    5.858377732    0.015707884
Residual      866    34088.72986    39.3634294
Total         867    34319.3357

             Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept    12.3575309     0.222570792      55.52179955   1.4747E-287   11.92068963   12.79437217   11.92068963   12.79437217
Assets       9.0742E-05     3.74904E-05      2.420408588   0.015707884   1.71594E-05   0.000164325   1.71594E-05   0.000164325

Assets row: a, regression coefficient

• Standard Error (S): an estimate of the typical distance the data points fall away from the regression line, expressed in units of the dependent variable. A value closer to 0 indicates a fit that is more useful for prediction. Separately, each coefficient has its own standard error in the coefficients table: if a coefficient is large compared with that standard error (a large t Stat), the coefficient is probably different from 0.
• Observations: number of observations in the sample.
• P-value: tests the null hypothesis that the coefficient is equal to zero (no effect). A low p-value (< 0.05) indicates that you can reject the null hypothesis.
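
The summary statistics in this output can be recomputed from the ANOVA and coefficient rows, which helps show how the pieces relate. The sketch below copies its inputs from the output above. The variable names are illustrative, and it is a plain arithmetic check, not a reproduction of how Excel computes the table.

```python
import math

# Inputs copied from the SUMMARY OUTPUT (Return 2006 regressed on Assets).
ss_regression = 230.6058383
ss_residual = 34088.72986
ss_total = 34319.3357
n_obs = 868
k = 1                    # number of independent variables
slope = 9.0742e-05       # Assets coefficient
slope_se = 3.74904e-05   # standard error of the Assets coefficient

# Goodness of fit: explained variation / total variation
r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / (n_obs - k - 1)

# Standard Error (S): square root of the residual mean square
ms_residual = ss_residual / (n_obs - k - 1)
standard_error = math.sqrt(ms_residual)

# F tests the model as a whole; t tests a single coefficient
f_stat = (ss_regression / k) / ms_residual
t_stat = slope / slope_se

print(f"R Square = {r_square:.4f}, Multiple R = {multiple_r:.2f}")
print(f"Adjusted R Square = {adj_r_square:.4f}, S = {standard_error:.2f}")
print(f"F = {f_stat:.3f}, t = {t_stat:.3f}, t squared = {t_stat ** 2:.3f}")
```

With a single x variable, F equals the square of the slope's t Stat, which is why Significance F and the P-value for Assets are identical (0.0157). The coefficient is statistically different from zero, yet R Square shows it explains under 1% of the variation in returns.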
Time Series Analysis - Basics
• A quantitative forecasting technique, time series analysis involves past data indexed by
equally spaced increments of time (minutes, hours, days, weeks, etc.), which
is used to forecast/predict future data
• Time series analysis accounts for the fact that data points taken over time may have
an internal structure (such as autocorrelation, trend or seasonal variation)
that should be accounted for
• Qualitative: Delphi, Surveys, Polling, Executive Opinions
• Quantitative: Time Series, Causal
Situation 1 - Moving Average Approach
• A retail company has monthly sales data for 2018, from Jan-Dec. The Sales Head wants to apply a simple,
but statistical technique, taking 3-months rolling data to forecast sales for Jan, 2019 and beyond.
Steps
• Refer to sales data in Excel
• Calculate the simple moving average for 3-month duration
• For a simple moving average, the formula is the sum of the data points over a given period divided by the
number of periods
• Insert a line graph for actual and forecast figures to evaluate the effect of smoothing variations between
time-steps
• Alternatively, use data analysis toolpak (preferred)
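
Outside Excel, the same 3-month simple moving average can be sketched in a few lines of Python. The monthly `sales` figures are hypothetical illustration data, and the function name is an assumption made for this example.

```python
def simple_moving_average(values, window=3):
    """Average each run of `window` consecutive values.

    Entry i is the mean of values[i : i + window] and serves as the
    forecast for the period immediately after that run.
    """
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Hypothetical monthly sales for Jan-Dec 2018; illustration only.
sales = [120, 130, 125, 140, 150, 145, 160, 170, 165, 180, 190, 185]

forecasts = simple_moving_average(sales, window=3)

# The last entry averages Oct, Nov and Dec, so it is the Jan 2019 forecast.
jan_2019_forecast = forecasts[-1]
print("Forecast for Jan 2019:", jan_2019_forecast)
```

Twelve months with a 3-month window give ten averages: the first is the forecast for Apr 2018 and the last is the forecast for Jan 2019.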
Situation 2 – Weighted Moving Average Approach
• A retail company has monthly sales data for 2018, from Jan-Dec. The Sales Head wants to apply a simple,
but statistical technique, giving higher weightage to recent months in a 3-month rolling format to forecast
sales for Jan, 2019 and beyond.
Steps
• Apply weights in 3:2:1 ratio and calculate the moving average for 3-month duration
• Insert a line graph for actual and forecast figures to evaluate the effect of smoothing variations between
time-steps
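
A minimal Python sketch of the 3:2:1 weighting follows, assuming the weight 3 goes to the most recent month, 2 to the month before, and 1 to the oldest of the three. The `sales` figures and the function name are hypothetical.

```python
def weighted_moving_average(values, weights=(1, 2, 3)):
    """Weighted average over each run of len(weights) consecutive values.

    `weights` is ordered oldest to newest, so (1, 2, 3) gives the most
    recent month a weight of 3. Entry i is the forecast for the period
    immediately after values[i : i + len(weights)].
    """
    window = len(weights)
    total_weight = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, values[i:i + window])) / total_weight
        for i in range(len(values) - window + 1)
    ]

# Hypothetical monthly sales for Jan-Dec 2018; illustration only.
sales = [120, 130, 125, 140, 150, 145, 160, 170, 165, 180, 190, 185]

forecasts = weighted_moving_average(sales)

# Jan 2019 forecast = (1*Oct + 2*Nov + 3*Dec) / 6
jan_2019_forecast = forecasts[-1]
print(f"Forecast for Jan 2019: {jan_2019_forecast:.2f}")
```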
Situation 3 – Exponential Moving Average Approach
• A retail company has monthly sales data for 2018, from Jan-Dec. The
Sales Head wants to apply a simple, but statistical technique,
accounting for the dynamic fluctuations in sales data, to forecast sales for
Jan, 2019 and beyond.
Steps
• Follow ‘exponential smoothing’ steps in data analysis toolpak
o The damping factor is 1 minus the smoothing constant α (the default damping factor is
0.3, i.e. α = 0.7).
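
The recursion behind exponential smoothing can be sketched as follows, assuming the usual relationship damping factor = 1 - α and assuming the first forecast is seeded with the first actual value. The `sales` figures and the function name are hypothetical, and this is not a reproduction of the ToolPak's exact output layout.

```python
def exponential_smoothing(values, damping=0.3):
    """Return one-step-ahead forecasts.

    next forecast = alpha * latest actual + damping * previous forecast,
    where alpha = 1 - damping. Entry i is the forecast for the period
    after values[i].
    """
    alpha = 1 - damping
    forecast = values[0]  # assumed seed: the first actual value
    forecasts = []
    for actual in values:
        forecast = alpha * actual + damping * forecast
        forecasts.append(forecast)
    return forecasts

# Hypothetical monthly sales for Jan-Dec 2018; illustration only.
sales = [120, 130, 125, 140, 150, 145, 160, 170, 165, 180, 190, 185]

forecasts = exponential_smoothing(sales, damping=0.3)
jan_2019_forecast = forecasts[-1]  # forecast for the month after Dec 2018
print(f"Forecast for Jan 2019: {jan_2019_forecast:.2f}")
```

A lower damping factor (higher α) makes the forecast follow recent months more closely; a higher damping factor smooths more heavily.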
