Quantitative Methods
What do we fear the most?
Ophidiophobia
Glossophobia…and,
Arithmophobia
Fear of Statistics
This course is dedicated to:
Sir Ronald Fisher (1890-1962)
What’s the story?
Data to Decision
We immediately need to backfill the 2 FTEs, who left in May
Started lagging
Customer data of a large North American retailer expected to follow a bell curve distribution,
but…
A statistical dilemma?
Another statistical dilemma?
ANOVA (n = 400)
              df     SS        MS             F
Regression    1      8993      8993.397603    1675.60562
Residual      398    2136      5.367252003
Total         399    11129.6

                        Coefficients   Standard Error   t Stat   P-value
Intercept               7.23           0.23             31.37    0.00
TV Ad exp (Rs. lakh)    0.06           0.00             40.93    0.00
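The figures in the ANOVA output are linked: F is the regression mean square divided by the residual mean square, and R-squared is the regression sum of squares divided by the total sum of squares. A minimal Python sketch that checks both, using only values printed in the table (the variable names are illustrative assumptions):

```python
# Values copied from the ANOVA output above (n = 400).
ms_regression = 8993.397603   # MS for Regression; equals SS because df = 1
ms_residual = 5.367252003     # MS for Residual (df = 398)
ss_total = 11129.6            # Total sum of squares

# F statistic: explained variation per df over unexplained variation per df.
f_stat = ms_regression / ms_residual

# R-squared: share of total variation explained by the regression.
# With df = 1, the regression SS is the same number as the regression MS.
r_squared = ms_regression / ss_total

print(f"F = {f_stat:.2f}")
print(f"R^2 = {r_squared:.3f}")
```

The computed F should agree with the F column of the table, which is a quick way to confirm the output was read correctly.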
Business Caselet – An Investment Dilemma
Consider the Mutual Funds data set.
• Column A – Category – type of companies in which the mutual fund is investing, categorized as large,
mid, and small capitalization
• Column B – Objective – investment philosophy of the mutual fund, categorized as value (longer
duration, low risk, stable returns) or growth (shorter investment duration, higher risk, potentially
higher returns) oriented
• Column C – Fees – whether or not the MF charges an entry or exit fee (% of funds
invested/withdrawn)
• Column D – Asset Size – the value of the investment corpus (amount invested) of the mutual fund
• Column E – Expense Ratio – fund management expenses / asset size
• Columns F-H – Returns – annualized 1/3/5 year return generated by the fund (in %)
• Column I – Risk – high, average, or low risk rating of the fund
Assume you are an investment advisor, working with a wealth management firm, managing the
investment portfolio of high net-worth individuals (HNWI). The typical investment portfolio of an HNWI
is Rs. 7.5 Cr (USD 1 mn).
Think of the various ways in which you can statistically analyze the MF dataset to provide
investment recommendations to your HNWI clients.
Pyramid of Statistics
Statistical Analysis
• DCOVA lifecycle
• Statistical analysis classification
Understanding Data
• Reading a data set
• Exploratory data visualization
‘Describing’ Data
• Measures of descriptive statistics
• Positional measures – Quartiles
• 5-number summary
• Outliers
Data Distribution
• Normal distribution
• Standard normal distribution
• Bell curve
Confidence Interval
• Sampling distribution
• CI estimation for the population mean
Association and Causation
• Correlation analysis
• Simple linear regression
• Multiple linear regression
APPLIED
Measures of Central Tendency
Measures of Dispersion/Variability
Measures of Asymmetry/Shape
Leap of Faith
POPULATION
Parameter
Sample
Statistic
Sampling Distribution
Sampling distribution of the mean is the distribution of all sample means if all
possible samples of the same size are selected from a given population.
Standard error of the mean is the standard deviation of all possible sample
means around the population mean. It equals the population standard deviation
divided by the square root of the sample size, so it decreases as the sample
size increases.
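The relationship between sample size and the standard error of the mean can be sketched in a few lines of Python. The function name and the value of sigma below are illustrative assumptions, not figures from the course data:

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: population std dev divided by sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation, chosen only for illustration.
sigma = 12.0

# Each fourfold increase in sample size halves the standard error.
for n in (25, 100, 400):
    print(n, standard_error(sigma, n))
```

Because the sample size enters through its square root, halving the standard error requires four times as many observations.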
Correlation coefficients (r)
                 Assets   Expense Ratio   Return 2006   3-Year Return   5-Year Return
Assets            1.00
Expense Ratio    -0.29     1.00
Return 2006       0.08    -0.13            1.00
3-Year Return     0.07    -0.11            0.70          1.00
5-Year Return     0.06    -0.06            0.59          0.84            1.00
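Each entry in the matrix above is a Pearson correlation coefficient between two columns. A minimal sketch of that calculation in plain Python follows; the function name and the sample values are hypothetical and are not taken from the Mutual Funds data set:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only.
expense_ratio = [0.8, 1.0, 1.2, 1.5, 1.9]
five_year_return = [12.0, 11.5, 11.8, 10.9, 10.2]

print(round(pearson_r(expense_ratio, five_year_return), 2))
```

A value near +1 or -1 indicates a strong linear association, a value near 0 a weak one; on its own, neither establishes causation.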
Regression Types
Logistic
Polynomial
Log linear
Multiple
ANOVA/ANCOVA
Interpreting Regression Output: Simple Linear Regression (1/2)
• Time series analysis recognizes that data points taken over time may have
an internal structure (such as autocorrelation, trend or seasonal variation)
that should be accounted for
[Diagram labels: Time Series, Causal, Delphi, Surveys, Polling, Executive Opinions]
Situation 1 - Moving Average Approach
• A retail company has monthly sales data for 2018, from Jan-Dec. The Sales Head wants to apply a simple
but statistical technique, using 3-month rolling data, to forecast sales for Jan 2019 and beyond.
Steps
• Refer to sales data in Excel
• Calculate the simple moving average for 3-month duration
• For a simple moving average, the formula is the sum of the data points over a given period divided by the
number of periods
• Insert a line graph for actual and forecast figures to evaluate the effect of smoothing variations between
time-steps
• Alternatively, use the Data Analysis ToolPak (preferred)
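The same 3-month simple moving average can be sketched outside Excel. The Python below is a minimal illustration under stated assumptions: the function name is made up for this example, and the monthly sales figures are hypothetical placeholders rather than the course data.

```python
def simple_moving_average_forecasts(sales, window=3):
    """Forecast each period as the mean of the `window` periods before it."""
    if window <= 0 or len(sales) < window:
        raise ValueError("need at least `window` observations")
    forecasts = []
    for end in range(window, len(sales) + 1):
        # Sum of the data points over the window divided by the number of periods.
        forecasts.append(sum(sales[end - window:end]) / window)
    return forecasts

# Hypothetical monthly sales for Jan-Dec 2018 (illustration only).
monthly_sales = [120, 132, 128, 140, 151, 147, 160, 158, 165, 172, 180, 176]

forecasts = simple_moving_average_forecasts(monthly_sales)
# forecasts[0] is the forecast for Apr 2018 (mean of Jan-Mar);
# forecasts[-1] is the forecast for Jan 2019 (mean of Oct-Dec).
print(forecasts[-1])
```

Plotting the forecasts against the actual figures shows how much of the month-to-month variation the averaging removes.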
Situation 2 –
Weighted Moving Average Approach
• A retail company has monthly sales data for 2018, from Jan-Dec. The Sales Head wants to apply a simple
but statistical technique, giving higher weightage to recent months in a 3-month rolling format, to forecast
sales for Jan 2019 and beyond.
Steps
• Refer to sales data in Excel
• Apply weights in a 3:2:1 ratio (most recent month weighted highest) and calculate the moving average for a 3-month duration
• Insert a line graph for actual and forecast figures to evaluate the effect of smoothing variations between
time-steps
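A weighted moving average with 3:2:1 weights can be sketched as follows. This is a minimal illustration that assumes the weight of 3 goes to the most recent month; the function name and the sales figures are hypothetical.

```python
def weighted_moving_average_forecasts(sales, weights=(1, 2, 3)):
    """Weighted moving average; `weights` are ordered oldest to newest."""
    window = len(weights)
    total_weight = sum(weights)
    if window == 0 or total_weight == 0 or len(sales) < window:
        raise ValueError("need non-trivial weights and enough observations")
    forecasts = []
    for end in range(window, len(sales) + 1):
        recent = sales[end - window:end]
        # Weighted sum of the window divided by the sum of the weights.
        weighted_sum = sum(w * v for w, v in zip(weights, recent))
        forecasts.append(weighted_sum / total_weight)
    return forecasts

# Hypothetical monthly sales for Jan-Dec 2018 (illustration only).
monthly_sales = [120, 132, 128, 140, 151, 147, 160, 158, 165, 172, 180, 176]

forecasts = weighted_moving_average_forecasts(monthly_sales)
# forecasts[-1] is the Jan 2019 forecast: (1*Oct + 2*Nov + 3*Dec) / 6.
print(forecasts[-1])
```

Compared with the simple moving average, the heavier weight on the latest month makes the forecast react faster to recent changes.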
Situation 3 –
Exponential Moving Average Approach
• A retail company has monthly sales data for 2018, from Jan-Dec. The
Sales Head wants to apply a simple but statistical technique that
accounts for the dynamic fluctuations in sales data, to forecast sales for
Jan 2019 and beyond.
Steps
• Refer to sales data in Excel
• Follow the ‘exponential smoothing’ steps in the Data Analysis ToolPak
o The damping factor is 1 minus the smoothing constant (alpha) of exponential
smoothing (default is 0.3).
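A minimal Python sketch of exponential smoothing with a damping factor follows. It assumes the damping factor equals 1 minus the smoothing constant alpha, and that the first forecast is seeded with the first actual value; the seeding rule, the function name, and the sales figures are assumptions made for illustration.

```python
def exponential_smoothing_forecasts(sales, damping=0.3):
    """Next forecast = alpha * latest actual + damping * latest forecast."""
    if not sales:
        raise ValueError("need at least one observation")
    if not 0 <= damping <= 1:
        raise ValueError("damping factor must be between 0 and 1")
    alpha = 1 - damping
    # Assumption: the forecast for the second period is the first actual value.
    forecasts = [sales[0]]
    for actual in sales[1:]:
        forecasts.append(alpha * actual + damping * forecasts[-1])
    return forecasts

# Hypothetical monthly sales for Jan-Dec 2018 (illustration only).
monthly_sales = [120, 132, 128, 140, 151, 147, 160, 158, 165, 172, 180, 176]

forecasts = exponential_smoothing_forecasts(monthly_sales, damping=0.3)
# forecasts[i] is the forecast for the month after monthly_sales[i],
# so forecasts[-1] is the forecast for Jan 2019.
print(round(forecasts[-1], 2))
```

A smaller damping factor puts more weight on the latest actual value, so the forecast tracks fluctuations more closely; a larger one produces a smoother, slower-moving line.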