What is Financial Econometrics and how is it helpful?
Financial Econometrics
Applies mathematical and statistical techniques to analyzing economic and
financial data
Can help test theories or validate intuition using empirical or simulated data
and a rich toolbox of techniques and models to derive connections
Investment managers typically apply these quantitative tools to better select
securities or manage portfolios
Econometrics can help answer questions such as:
Does one portfolio manager have a consistent edge over another?
Do high beta stocks outperform low beta?
Do acquisitions favor the target company or the acquirer?
How well could you replicate the S&P 500 with fewer securities?
The goal of most empirical research is to discover insights that improve
investment performance
While Econometrics may seem rigid, once you master the fundamentals it
allows you to be creative and robust in your research
Econometrics is built on statistics. Let's review some
essentials
Most asset research and investment models require a measure of
risk and return for the asset or strategy under review
The most commonly used measure of return is either the Arithmetic mean or
Geometric mean
Risk or volatility is typically measured by the Standard Deviation
We will explore both with the help of data from Robert Shiller of Yale University and Aswath
Damodaran of NYU
A combined data series captures broad-based US equity returns (S&P 500 composite)
back to 1871
[Figure: US Equity Returns - annual returns, 1871-2019, ranging from roughly -40% to +50%]
What would $100 invested in 1871 have grown to?
A whopping $38.3 million
Albert Einstein is reputed to have said that "compound interest is the most
powerful force in the universe"
The general tendency of stock markets to go up is a visceral reminder of what
uninterrupted investing can achieve
As a cautionary tale, withdrawing $5 at the end of each year for consumption would
have reduced the nest egg to $9.4 million!
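The mechanics of compounding with an annual withdrawal can be sketched in a few lines. The flat 9% return below is a hypothetical stand-in for illustration, not the actual S&P 500 series:

```python
def grow(start, returns, withdrawal=0.0):
    """Compound a starting balance through a sequence of annual returns,
    withdrawing a fixed amount at the end of each year."""
    balance = start
    for r in returns:
        balance = balance * (1 + r) - withdrawal
    return balance

# Hypothetical flat 9% return over 149 years: a modest $5 annual withdrawal
# ends up costing a large multiple of the dollars actually withdrawn
untouched = grow(100, [0.09] * 149)
with_spending = grow(100, [0.09] * 149, withdrawal=5)
```

Because each withdrawn dollar also forfeits all of its future compounding, the gap between the two terminal balances dwarfs the sum of the withdrawals themselves.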
[Figure: Growth of $100 invested in 1871, 1871-2019; annotation: powerful bull market post 2008-09 financial crisis]
Measuring risk and return
Like any traditional mean, the Arithmetic mean is computed as the sum of all
observations divided by the number of observations:

Arithmetic mean = (Σ Returnᵢ) / n

The annual Arithmetic mean return since 1871 is 10.58%
Since 1928, which is a common test period in research, the return is 11.57%
But returns on their own give an incomplete picture
Investors are assumed to be risk averse - that is, for the same expected return a rational
investor will choose the least risky investment
Obviously they care about return volatility
Standard Deviation (σ)
Standard deviation is the most common measure of risk and is often referenced by its
Greek letter σ (sigma) in formal research reports
Annual equity volatility since 1871 has been 18.05% and since 1928, 17.99%
How does all of that help investors think about the range or distribution of returns?
For instance, what would a record year look like, or more ominously, how bad could it
get?
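As a sketch of how these statistics are computed (the return series below is made up for illustration, not the Shiller/Damodaran data):

```python
import numpy as np

# Hypothetical annual returns for illustration only
returns = np.array([0.12, -0.05, 0.21, 0.07, -0.18, 0.30])

arithmetic_mean = returns.mean()
geometric_mean = (1 + returns).prod() ** (1 / len(returns)) - 1
volatility = returns.std(ddof=1)  # sample standard deviation (n - 1 divisor)
```

Note that the geometric mean always sits at or below the arithmetic mean, with the gap widening as volatility rises.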
What about the return distribution?
Many of the great breakthroughs in statistical thinking came from the demands of
a very practical clientele, gamblers
As they looked for an edge in games of chance, which started with rudimentary
efforts like coin flips, they started to ask questions like - If a fair coin is flipped 100
times what’s the probability of getting 20 heads or 40 or 60 ….?
That led to a string of insights on how independent data in repeated experiments
behaves
Insights from statistical luminaries like Laplace, De Moivre (who was a consultant to
gamblers and insurance brokers), Bernoulli, Adrain, Gauss and others incrementally
built on each other's work and came to a stunning conclusion…
… in studying fair games and experiments of chance repeated testing or sampling
resulted in a distribution that was bell shaped
As it appeared so often in their experiments, they called it the Normal Distribution
Its most striking characteristic is its symmetry, with the largest number of observations
clustering near the center (around the mean) and then trailing off as you move further
towards the tails

[Figure: Normal Distribution]
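The gamblers' coin-flip question above has an exact answer from the binomial distribution; a minimal stdlib sketch:

```python
from math import comb

def prob_heads(k, n=100, p=0.5):
    """Exact probability of exactly k heads in n flips of a coin
    with probability p of landing heads."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Even the single most likely outcome (50 heads) is under 8% probable,
# while 20 heads in 100 fair flips is well under one in a billion
p50 = prob_heads(50)
p20 = prob_heads(20)
```

Plotting `prob_heads(k)` for k = 0..100 traces out exactly the bell shape those early statisticians kept rediscovering.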
Some key insights that led to the Normal Distribution
Before we jump into how to use the Normal Distribution, it's worth highlighting two
key building blocks that make it robust
Law of Large Numbers: This is fairly intuitive. The larger the size of a selected
random sample, the closer its average is to the true population average compared to the
result from a smaller sample
o Another more interesting and robust approach to the above is to create a series of fairly
large samples with at least 30 observations in each sample. Next take the average of
each sample and finally take the average of all those sample averages. This should give a
decent estimate of the true population average, which improves with sample size
o For example assume you create 300 samples that each have a sample size of 100
observations. Compute the average of each sample. You now have 300 estimates for the
true average. Finally take the average of those 300 estimates to get a robust estimate for
the true population average
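That sampling scheme can be simulated in a few lines. The up-5%/down-5% asset used here anticipates the mini case study below; the seed is arbitrary:

```python
import random

random.seed(7)
outcomes = [0.05, -0.05]  # asset rises or falls 5% with equal probability

# 300 samples, each the average of 100 randomly selected daily observations
sample_means = [
    sum(random.choice(outcomes) for _ in range(100)) / 100
    for _ in range(300)
]

# The average of the sample averages estimates the true population mean (0%)
grand_mean = sum(sample_means) / len(sample_means)
```

Individual sample means scatter around zero, but their overall average lands very close to the true population mean, as the Law of Large Numbers predicts.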
Mini Case Study - Central Limit Theorem (continued)
Graph of Empirical Results for Distribution of Sample Means

Population:    Obs 1 = 5.00%, Obs 2 = -5.00%, Mean = 0%, Stdev = 5.00%

Sample for Mean, n = 100    Empirical    Theoretical
Average                      0.0040%      0.0000%
Standard Error of Mean       0.5030%      0.5000%

Simulation Results
To recap, we have 300 data points
Each is the average of 100 observations
All 300 samples are created through random selection
Mean of the resulting distribution of sample means is 0.0040%. We would theoretically expect 0.0000%
Standard Error of the Mean is 0.5030%. Theoretically we expect 0.5000%
Both are extremely close to theoretical expectations
Will show later that a Difference Between Two Means test confirms this
Also, once we introduce a methodology to test if a distribution is normally distributed (the KS test), we will further reconfirm our results
However, to avoid suspense - the KS test confirms that the Sample Means are normally distributed even though the underlying security returns are not

Graph of Samples
Point     Num Obs
-1.4%      0.3333%
-1.2%      0.6667%
-1.1%      0.3333%
-1.0%      1.0000%
-0.9%      1.3333%
-0.8%      3.0000%
-0.7%      2.3333%
-0.6%      5.6667%
-0.5%      4.3333%
-0.4%      4.3333%
-0.3%      8.0000%
-0.2%      7.0000%
-0.1%      9.3333%
0.0%       7.6667%
0.1%       7.6667%
0.2%       5.0000%
0.3%       7.6667%
0.4%       4.6667%
0.5%       4.3333%
0.6%       5.0000%
0.7%       2.6667%
0.8%       3.0000%
0.9%       1.0000%
1.0%       1.6667%
1.1%       1.3333%
1.2%       0.3333%
1.5%       0.3333%
Total      100.0000%

[Figure: Distribution of Sample Means, n = 100; histogram constructed using the 300 sample means]
Back to the Normal Distribution
As we noted earlier most investment models in Modern Portfolio Theory assume
stock returns are normally distributed
Efficient Market theory suggests stock prices follow a random walk as only new
and therefore still unknown / random information drive prices
Since stock prices are bound at 0 with theoretically unlimited upside, this skews the
distribution of stock prices to the right. Therefore stock prices tend to follow a
Lognormal Distribution
Consistent with this, stock returns are expected to follow a Normal
Distribution. The assumption generally holds in practice although daily returns tend
to have more extreme outcomes (fatter tails) than is normally assumed
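This relationship is easy to demonstrate by simulation. The sketch below assumes normally distributed daily log returns (the 0% mean and 2% daily volatility are illustrative assumptions):

```python
import random
from math import exp
from statistics import mean, median

random.seed(1)

# 5,000 one-year price paths, each the sum of 252 normal daily log returns
prices = []
for _ in range(5000):
    log_return = sum(random.gauss(0.0, 0.02) for _ in range(252))
    prices.append(100 * exp(log_return))  # terminal price from a $100 start

# Prices stay strictly positive and the distribution is skewed right,
# so the mean terminal price exceeds the median (lognormal shape)
```

Normal log returns exponentiate into lognormal prices: bounded below by zero, stretched out to the right.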
Normal distribution and the Z score in action
Identifying a top performing manager
From 149 years of S&P 500 data, we found a mean of 10.58%, Standard Deviation of 18.05%
Assume there are 5 equity managers who are urging you to invest with them, and all claim to
have superior performance
You could simply rank them and take the top one. However, 5 eager managers is a small sample.
The true superior performers may have all the business they need, so they may not have called yet
Assume you define superior performance as returns in the top 5% based on historical S&P 500
results
Of the 5 managers, the top performer had a 15.28% return over the last year. Is this superior
performance?
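A sketch of the Z-score arithmetic using the long-run numbers above; the stdlib `statistics.NormalDist` stands in for a normal table:

```python
from statistics import NormalDist

# Long-run S&P 500 estimates from the 149-year sample
sp500 = NormalDist(mu=0.1058, sigma=0.1805)
manager_return = 0.1528

z = (manager_return - sp500.mean) / sp500.stdev  # ~0.26 sigmas above the mean
percentile = sp500.cdf(manager_return)           # roughly the 60th percentile

# Return required to land in the top 5% of historical outcomes
top_5_cutoff = sp500.inv_cdf(0.95)               # roughly 40%
```

By this yardstick a 15.28% year sits barely above the long-run average, nowhere near the top-5% threshold.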
As an aside S&P 500 rolling 10 year returns
It's rare to have a decade of losses, but the late 1930s and 2008-09 were tough
Per the graph below, 18% over a decade certainly looks like superior performance
The late 1930s and 2008-09 included the worst S&P 500 annual downturns, but
otherwise 10-year runs have been positive
[Figure: S&P 500 rolling 10-year returns, 1880-2016, ranging from about -3% to 18%]
Normal distribution: Standard Deviation ranges worth
remembering
The first thing worth noting is that 50% of the normal distribution lies below, and 50%
above, a Z score of 0
Since the mean has a Z score of 0 this is not a surprise, but it affirms that the
distribution is symmetrical
68.26% of the distribution lies within 1 standard deviation (σ) of the mean (µ) , µ ± 1σ
95.00% lies within 1.96 standard deviations of the mean, µ ± 1.96σ
95.45% lies within 2 standard deviations of the mean, µ ± 2σ
And most of it, or 99.73%, lies within 3 standard deviations of the mean, µ ± 3σ
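These coverage figures can be confirmed directly from the standard normal CDF; a stdlib sketch:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def coverage(k):
    """Probability mass within +/- k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# coverage(1.0)  -> about 0.6827
# coverage(1.96) -> about 0.9500
# coverage(2.0)  -> about 0.9545
# coverage(3.0)  -> about 0.9973
```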
[Figure: Annual return distribution under the normal assumption, with probability and cumulative probability panels illustrating the standard deviation ranges]
Let's review the methodology through a simple example
Assume the daily returns for a share of common stock are:
Monday -14%
Tuesday 8%
Wednesday -2%
Thursday 10%
Friday 4%
Standard Deviation = √( Σ (Xᵢ - X̄)² / (n - 1) ) = 9.654%
Next we can compute the Z scores and related cumulative probability for each return:
Return    Z Score    Cumulative Normal Probability
-14%      -1.5745    0.0577
-2%       -0.3315    0.3701
4%         0.2900    0.6141
8%         0.7044    0.7594
10%        0.9115    0.8190
Compare the Actual and Normal Cumulative Probabilities and compute the absolute difference between
them:

Return    Actual Cum Prob    Normal Cum Prob    Absolute Difference
-14%      0.20               0.0577             0.1423
-2%       0.40               0.3701             0.0299
4%        0.60               0.6141             0.0141
8%        0.80               0.7594             0.0406
10%       1.00               0.8190             0.1810

Finally, use the maximum absolute difference of 0.1810 as our test value against the critical value
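The whole calculation above fits in a few lines of stdlib Python; the empirical cumulative probability is taken as rank/n, as in the table:

```python
from statistics import NormalDist, mean, stdev

returns = [-0.14, 0.08, -0.02, 0.10, 0.04]       # the daily returns above
dist = NormalDist(mean(returns), stdev(returns))  # stdev is about 9.654%

ordered = sorted(returns)
n = len(ordered)

# KS statistic: the largest gap between the empirical cumulative
# probability (rank / n) and the theoretical normal cumulative probability
d_stat = max(abs((i + 1) / n - dist.cdf(x)) for i, x in enumerate(ordered))
# d_stat is about 0.1810, reproducing the table's maximum absolute difference
```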
Final step and conclusions from KS test
The final step is to compare the maximum absolute
difference of 0.1810 to a critical table value
The test designers created a KS table to help interpret
this absolute value, which they called the D value
In advance of any formal statistical test any researcher
would nominate how rigorous they wanted the
conclusion to be
This is called the degree of confidence. Most
researchers test at the 95% confidence interval or
range, implying they want to be 95% confident in
their conclusions
The 95% confidence interval (CI) is also referenced as
the 5% significance level (from 100% - CI). We will
test at this level as well
From the table to the right, the value
associated with a sample size of 5 at the 5%
significance level is 0.565
The generic approach in most statistical tests is to
compare the key ratio or number from the test to
the critical value from the table
If the Test Value > Critical Value we reject the Null
Hypothesis. If the Test Value <= Critical Value then
we fail to reject the Null Hypothesis
In our case 0.1810 < 0.565, so we fail to reject the
Null Hypothesis. Therefore we are 95% confident
that our stock data is normally distributed
Test 147 years of S&P 500 returns – Return Distributions
To recap, this is what our respective distributions look like: the distribution of Actual
returns on the left and the Theoretical returns on the right
We will use the KS methodology to determine if these are statistically equivalent at the 95%
confidence interval
[Figure: Annual S&P Returns: actual distribution (left) and the theoretical distribution if precisely Normally Distributed (right), with cumulative probability panels below]
Test 147 years of S&P 500 returns – Volatility Ranges
As an interesting prelude before running the KS test, the table below shows values for a 1.00
and a 1.96 standard deviation range around the mean for both theoretical and actual returns
From the standard normal distribution we theoretically know 68.26% of the data lies within a
-/+ 1 standard deviation of the mean. That includes all data between a 15.87% and 84.13%
cumulative probability
And 95% of the data theoretically lies within -/+ 1.96 standard deviations of the mean.
Therefore, it includes all data between 2.50% and 97.50% cumulative probability
For actual data, order the returns from lowest to highest and set up the actual cumulative
probability distribution. Interpolate as needed to get values at the 15.87%, 84.13%, 2.50%
and 97.50% cumulative levels. The resulting returns are captured below
For a -1 standard deviation the theoretical return would be -7.5% while the actual return is
-8.5%. For +1 standard deviation the theoretical return is 28.6% with an actual of 28.7%
For a -1.96 standard deviation the theoretical return is -24.8% while the actual return is
-29.0%. For +1.96 standard deviation the theoretical return is 46.0% with actual of 44.6%
Broadly, these specific ranges seem fairly close, but there are differences. And the key
question in statistics is always: are the differences statistically significant?
That's where the KS Test provides a comprehensive answer. It inspects every difference
between the cumulative theoretical and actual distributions, zeros in on the largest, and then
helps conclude whether they are statistically equivalent or different
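The theoretical boundary returns quoted above follow directly from the 10.58% mean and 18.05% standard deviation; a quick check:

```python
from statistics import NormalDist

dist = NormalDist(mu=0.1058, sigma=0.1805)  # long-run S&P 500 estimates

# +/- 1 standard deviation boundaries (15.87% and 84.13% cumulative levels)
lo_1sd = dist.inv_cdf(0.1587)   # about -7.5%
hi_1sd = dist.inv_cdf(0.8413)   # about 28.6%

# +/- 1.96 standard deviation boundaries (2.50% and 97.50% cumulative levels)
lo_196 = dist.inv_cdf(0.025)    # about -24.8%
hi_196 = dist.inv_cdf(0.975)    # about 46.0%
```

The actual-data counterparts come from interpolating the empirical cumulative distribution at the same four probability levels, as described above.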
[Figure: Cumulative probability of annual returns, Actual vs Theoretical]
Test 147 years of S&P 500 returns – KS Test
The maximum absolute difference from the KS test is 0.0451
We are conducting our test at the 95% confidence interval / 5% significance level
Given 147 observations, the Critical Value from the KS table is 1.36 / √147, or 0.1122
Since the Test Value <= Critical Value, or 0.0451 < 0.1122, we fail to reject the Null
Hypothesis
Therefore we are 95% confident that our data is normally distributed
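The large-sample KS critical value at the 5% level is 1.36 / √n; a sketch of the decision rule:

```python
from math import sqrt

def ks_critical(n, alpha=0.05):
    """Asymptotic KS critical D value; coefficients from the standard table."""
    return {0.10: 1.22, 0.05: 1.36, 0.01: 1.63}[alpha] / sqrt(n)

test_value = 0.0451            # max absolute difference from the KS test
critical = ks_critical(147)    # about 0.1122 for 147 observations

# Test Value <= Critical Value, so we fail to reject normality
normal_not_rejected = test_value <= critical
```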
One more KS test: Mini Case Study - Central Limit Theorem
Graph of Empirical Results for Distribution of Sample Means

Population:    Obs 1 = 5.00%, Obs 2 = -5.00%, Mean = 0%, Stdev = 5.00%

Sample for Mean, n = 100    Empirical    Theoretical
Average                      0.0040%      0.0000%
Standard Error of Mean       0.5030%      0.5000%

Recap of the Central Limit Theorem
In the earlier simulation we assumed an asset that either goes up 5% or drops 5% each day with equal probability
To demonstrate the Central Limit Theorem we randomly selected 100 daily returns and computed the sample average
We did this 300 times and, based on the Central Limit Theorem, expected the distribution of sample means to be normally distributed with a Mean of 0% and a Standard Error of the Mean of 0.5%
And ended up with results close to theoretical
Broadly, the graph of sample means looks bell shaped
But let's run a KS test to see if the data is normally distributed

[Figure: Distribution of Sample Means, n = 100; histogram constructed using the 300 sample means (frequency table as in the earlier Mini Case Study)]
Results of KS test: Mini Case Study - Central Limit Theorem

KS Test for Normality:    Mean = 0.0040%, Std Dev = 0.5030%

Obs       Frequency   Actual Cum Prob   Theo Cum Prob   Abs Difference
-1.40%     1           0.3333%           0.2623%         0.0007
-1.20%     2           1.0000%           0.8336%         0.0017
-1.10%     1           1.3333%           1.4080%         0.0007
-1.00%     3           2.3333%           2.2955%         0.0004
-0.90%     4           3.6667%           3.6137%         0.0005
-0.80%     9           6.6667%           5.4959%         0.0117
-0.70%     7           9.0000%           8.0796%         0.0092
-0.60%    17          14.6667%          11.4893%         0.0318
-0.50%    13          19.0000%          15.8152%         0.0318
-0.40%    13          23.3333%          21.0913%         0.0224
-0.30%    24          31.3333%          27.2778%         0.0406
-0.20%    21          38.3333%          34.2516%         0.0408
-0.10%    28          47.6667%          41.8091%         0.0586
0.00%     23          55.3333%          49.6827%         0.0565
0.10%     23          63.0000%          57.5687%         0.0543
0.20%     15          68.0000%          65.1621%         0.0284
0.30%     23          75.6667%          72.1910%         0.0348
0.40%     14          80.3333%          78.4462%         0.0189
0.50%     13          84.6667%          83.7977%         0.0087
0.60%     15          89.6667%          88.1992%         0.0147
0.70%      8          92.3333%          91.6795%         0.0065
0.80%      9          95.3333%          94.3250%         0.0101
0.90%      3          96.3333%          96.2583%         0.0008
1.00%      5          98.0000%          97.6166%         0.0038
1.10%      4          99.3333%          98.5339%         0.0080
1.20%      1          99.6667%          99.1296%         0.0054
1.50%      1         100.0000%          99.8532%         0.0015

KS Results
When we compared the Theoretical and Actual distributions we found a maximum
difference of 0.0586
From the KS tables, the Critical D value is 1.36 / √300 = 0.0785
Since the Test Value <= Critical Value, or 0.0586 < 0.0785, we fail to reject the
Null Hypothesis
Therefore we are 95% confident that our data is normally distributed
Yes, it's what we were expecting. You can have a population whose data is not
normally distributed, but as long as the population has a finite range, the sample
means will be normally distributed
It's a powerful demonstration of why the Normal Distribution is one of the most
prominent patterns in a wide range of everyday observations