
Introduction to Statistics & Econometrics

What is Financial Econometrics and how is it helpful?
Financial Econometrics
 Applies mathematical and statistical techniques to analyzing economic and
financial data
 Can help test theories or validate intuition using empirical or simulated data
and a rich toolbox of techniques and models to derive connections
 Investment managers typically apply these quantitative tools to better select
securities or manage portfolios
 Econometrics can help answer questions such as:
 Does one portfolio manager have a consistent edge over another?
 Do high beta stocks outperform low beta?
 Do acquisitions favor the target company or the acquirer?
 How well could you replicate the S&P 500 with fewer securities?
 The goal of most empirical research is to discover insights that improve
investment performance
 While econometrics may seem rigid, once you master the fundamentals it
allows you to be creative and robust in your research

Econometrics is built on statistics. Let's review some
essentials
Most asset research and investment models require a measure of
risk and return for the asset or strategy under review
 The most commonly used measure of return is either the Arithmetic mean or
Geometric mean
 Risk or volatility is typically measured by the Standard Deviation
 Will explore both with the help of data from Robert Shiller of Yale University and Aswath
Damodaran of NYU
 A combined data series captures broad based US equity returns, S&P 500 composite,
back to 1871
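The two return measures mentioned above can be sketched in a few lines of Python. The return series below is a small hypothetical stand-in, not the actual Shiller/Damodaran data:

```python
# Hypothetical annual returns (illustrative only, not the actual 1871+ series)
returns = [0.10, -0.05, 0.20, 0.03]

# Arithmetic mean: simple average of the periodic returns
arithmetic_mean = sum(returns) / len(returns)

# Geometric mean: compound the returns, then annualize
growth = 1.0
for r in returns:
    growth *= 1 + r
geometric_mean = growth ** (1 / len(returns)) - 1

print(f"Arithmetic: {arithmetic_mean:.2%}")  # 7.00%
print(f"Geometric:  {geometric_mean:.2%}")
```

For any series with volatility the geometric mean is below the arithmetic mean, which is why the two measures are quoted separately.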
[Chart: US Equity Returns, annual, 1871–2019]
What would $100 invested in 1871 have grown to?
A whopping $38.3 million
 Albert Einstein is reputed to have said that “Compounded Interest is the most
powerful force in the world”
 General tendency of stock markets to go up is a visceral reminder of what
uninterrupted investing can achieve
 As a cautionary tale, withdrawing $5 at the end of each year for consumption would
have reduced the nest egg to $9.4 million!
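The arithmetic behind both figures is simple compounding. A minimal sketch; the constant growth rate and horizon below are placeholders, not the actual year-by-year historical series:

```python
def grow(principal, annual_return, years, withdrawal=0.0):
    """Compound a constant annual return, optionally withdrawing at year end."""
    value = principal
    for _ in range(years):
        value *= 1 + annual_return  # one year of compounding
        value -= withdrawal         # year-end consumption, if any
    return value

# Placeholder rate/horizon; the actual series compounds varying annual returns
print(grow(100, 0.09, 149))
print(grow(100, 0.09, 149, withdrawal=5))
```

Even a small fixed withdrawal compounds against you: the money taken out in 1875 never earns the next 140+ years of returns.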

[Chart: Growth of $100 in US Equity, 1871–2019. $100 grows to $38.3 million, with a powerful bull market after the 2008-09 financial crisis]
Measuring risk and return
 Like any traditional mean, the Arithmetic mean is computed as the sum of all
observations divided by the number of observations:

Arithmetic mean = ( Σ Returnᵢ ) / n

 The annual Arithmetic mean return since 1871 is 10.58%
 Since 1928, which is a typical test period for lots of research, the return is 11.57%
 But returns on their own give an incomplete picture
 Investors are assumed to be risk averse - that is, for the same expected return a rational
investor will choose the least risky investment
 Obviously they care about return volatility

Standard Deviation (σ)

 Standard deviation is the most common measure of risk and is often referenced by its
Greek letter (sigma) in formal research reports
 Annual equity volatility since 1871 has been 18.05% and since 1928, 17.99%
 How does all of that help investors think about the range or distribution of returns?
 For instance, what would a record year look like, or more ominously, how bad could it
get?
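Both measures are one-liners with Python's standard library. The series below is hypothetical, standing in for the long-run equity data:

```python
import statistics

# Hypothetical annual returns (not the actual Shiller/Damodaran series)
returns = [0.12, -0.08, 0.25, 0.04, -0.15, 0.30]

mean_return = statistics.mean(returns)
volatility = statistics.stdev(returns)  # sample standard deviation (n - 1 divisor)

print(f"Mean:       {mean_return:.2%}")
print(f"Volatility: {volatility:.2%}")
```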

What about the return distribution?
 Many of the great breakthroughs in statistical thinking came from the demands of
a very practical clientele, gamblers
 As they looked for an edge in games of chance, which started with rudimentary
efforts like coin flips, they started to ask questions like - If a fair coin is flipped 100
times what’s the probability of getting 20 heads or 40 or 60 ….?
 That led to a string of insights on how independent data in repeated experiments
behaves
 Insights from statistical luminaries like Laplace, De Moivre (a consultant to
gamblers and insurance brokers), Bernoulli, Adrian, Gauss and others incrementally
built on each other's work and came to a stunning conclusion…
 … in studying fair games and experiments of chance repeated testing or sampling
resulted in a distribution that was bell shaped
 As it appeared so often in their experiments, they called it the Normal Distribution
 Its most striking characteristic is its symmetry, with the largest number of observations
clustering near the center (around the mean) and then trailing off as you move further
towards the tails

[Chart: Normal Distribution]

Some key insights that led to the Normal Distribution
 Before we jump into how to use the Normal Distribution, it's worth highlighting two
key building blocks that make it robust
 Law of Large Numbers: This is fairly intuitive. The larger a random sample is, the
closer its average is to the true population average, compared with the result from a
smaller sample
o Another more interesting and robust approach to the above is to create a series of fairly
large samples with at least 30 observations in each sample. Next take the average of
each sample and finally take the average of all those sample averages. This should give a
decent estimate of the true population average, which improves with sample size
o For example assume you create 300 samples that each have a sample size of 100
observations. Compute the average of each sample. You now have 300 estimates for the
true average. Finally take the average of those 300 estimates to get a robust estimate for
the true population average

 Central Limit Theorem: The average of averages approach highlighted above
generated deep insight. Statisticians found that they could start with data that was not
normally distributed, but as long as it had a mean and a finite standard deviation (in
other words, a definite range) - then when they graphed the sample means they
were Normally Distributed
 The other remarkable discovery was that the standard deviation of the sample means
was related to the standard deviation of the initial data and the sample size. Let's see the
central limit theorem in action on the next slide
Mini Case Study - Central Limit Theorem
And an introduction to the Standard Error of the Mean

Assume the following


 There is a hypothetical security that can either go up 5% or drop 5% each day with equal
probability. Assume you have 10,000 observations and they are evenly split as noted
 Therefore the return on this security is expected to be 0% with a Standard Deviation of 5.00%
 Clearly this security is not normally distributed, as 50% of the observations are +5.00% and 50%
are -5.00%
 To prove the Central Limit Theorem, select 100 random observations from the large data series.
Do this 300 times, resulting in 300 samples
 Compute the mean of each sample
What would you expect based on theory?
 In accordance with the Law of Large Numbers, the distribution of sample means is expected to have
a mean of 0%, equal to the expected security return
 However the more interesting result is what the Central Limit Theorem additionally expects
 It expects that the Standard Deviation of these 300 Sample Means, formally called the
Standard Error of the Mean, should be equal to:

Security Standard Deviation / √Sample Size = 5% / √100 = 0.5%


 We simulate these results by randomly selecting a sample of 100 observations, 300 times. Results
are on the next slide
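The simulation described above can be reproduced with a short script. The random seed and sampling with replacement are implementation choices of this sketch:

```python
import random
import statistics

random.seed(42)  # reproducibility; any seed gives similar results

# Two-point population: +5% or -5% with equal probability (mean 0%, sigma 5%)
population = [0.05] * 5000 + [-0.05] * 5000

# 300 samples of 100 observations each; record every sample mean
sample_means = [statistics.mean(random.choices(population, k=100))
                for _ in range(300)]

mean_of_means = statistics.mean(sample_means)    # theory: 0%
standard_error = statistics.stdev(sample_means)  # theory: 5% / sqrt(100) = 0.5%

print(mean_of_means, standard_error)
```

The printed standard error lands close to the theoretical 0.5% even though the underlying population is as far from bell-shaped as possible.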

Mini Case Study - Central Limit Theorem (continued)
Graph of Empirical Results for Distribution of Sample Means

Population: Obs type 1 = +5.00%, Obs type 2 = -5.00%; Mean = 0%, Stdev = 5.00%

Sample for Mean, n = 100      Empirical    Theoretical
Average                        0.0040%      0.0000%
Standard Error of Mean         0.5030%      0.5000%

Simulation Results
 To recap, we have 300 data points
 Each is the average of 100 observations
 All 300 samples are created through random selection
 The mean of the resulting distribution of sample means is 0.0040%. We would theoretically
expect 0.0000%
 The Standard Error of the Mean is 0.5030%. We would theoretically expect 0.5000%
 Both are extremely close to theoretical expectations
 Will show later that a Difference Between Two Means test confirms this
 Also, once we introduce a methodology to test whether a distribution is normally
distributed (the KS test), we will further reconfirm our results
 However, to avoid suspense - the KS test confirms that the Sample Means are normally
distributed even though the underlying security returns are not

[Chart: Distribution of Sample Means, n = 100, constructed using 300 sample means; a bell-shaped frequency histogram spanning roughly -1.4% to +1.5%]
Back to the Normal Distribution
 As we noted earlier most investment models in Modern Portfolio Theory assume
stock returns are normally distributed
 Efficient Market theory suggests stock prices follow a random walk as only new
and therefore still unknown / random information drive prices
 Since stock prices are bounded at 0 with theoretically unlimited upside, the
distribution of stock prices is skewed to the right. Therefore stock prices tend to follow a
Lognormal Distribution
 Consistent with this, stock returns are expected to follow a Normal
Distribution. The assumption generally holds in practice although daily returns tend
to have more extreme outcomes (fatter tails) than is normally assumed

Elegance of the Normal Distribution


 Given its symmetry, the probability of any observation can easily be computed
relative to the mean and standard deviation
 A theoretical function generates every point on the distribution. And integral
calculus allows us to compute the area under the curve to arrive at the cumulative
probability for any observation
 Fortunately we don’t have to start from scratch. The probabilities are published as
Standard Normal Distribution tables in statistical textbooks and can also be easily
derived in computational packages like Excel
 The key to deriving the cumulative probability of any observation is to compute its Z
Score

Normal distribution and the Z score in action
Identifying a top performing manager
 From 149 years of S&P 500 data, we found a mean of 10.58% and a Standard Deviation of 18.05%
 Assume there are 5 equity managers who are urging you to invest with them. And all claim to
have superior performance
 You could simply rank them and take the top one. However 5 eager managers is a small sample.
The true superior performers may have all the business they need, so they may not have called yet
 Assume you define superior performance as returns in the top 5% based on historical S&P 500
results
 Of the 5 managers, the top performer had a 15.28% return over the last year. Is this superior
performance?

Evaluating the manager


 First compute the Z score = (Observation – Mean) / Standard Deviation
 Punching in the numbers gives a Z score = (15.28% - 10.58%) / 18.05% = 0.2604
 The Z score is the number of Standard Deviations the observation is from the mean. A Z score of
0 indicates an observation equal to the mean. A positive Z score indicates a value above the
mean, while negative is below
 To use the Standard Normal tables (Z Scores) on the next few slides, round the Z score to 2
decimal places
 The tables are set up as a matrix of cumulative probabilities. The coordinate down the left
hand side is the Z score to the first decimal place, while the numbers across the top row
capture the second decimal place
 As highlighted in the table, a 0.26 Z score has a cumulative probability of 0.6026. This
suggests the manager would have outperformed about 60% of peers. Fine, but not superior
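The same lookup can be done without tables, using the error function from Python's math module and the figures from this slide:

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sigma = 0.1058, 0.1805  # long-run S&P 500 mean and standard deviation
observation = 0.1528          # top manager's one-year return

z = (observation - mean) / sigma
print(round(z, 2))              # 0.26
print(round(normal_cdf(z), 4))  # ~0.6027, i.e. outperforms about 60% of peers
```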
Standard Normal Tables (Negative Z Scores)
Values of the Standard Normal Distribution Function
Page 1: Negative Z Scores (a -3.89 Z score has close to 0% cumulative probability; a Z score of 0 has a 50% probability)
Z -0.09 -0.08 -0.07 -0.06 -0.05 -0.04 -0.03 -0.02 -0.01 0.00 Z
-3.8 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 -3.8
-3.7 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 -3.7
-3.6 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0002 0.0002 -3.6
-3.5 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 -3.5
-3.4 0.0002 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 -3.4
-3.3 0.0003 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0005 0.0005 0.0005 -3.3
-3.2 0.0005 0.0005 0.0005 0.0006 0.0006 0.0006 0.0006 0.0006 0.0007 0.0007 -3.2
-3.1 0.0007 0.0007 0.0008 0.0008 0.0008 0.0008 0.0009 0.0009 0.0009 0.0010 -3.1
-3.0 0.0010 0.0010 0.0011 0.0011 0.0011 0.0012 0.0012 0.0013 0.0013 0.0013 -3.0
-2.9 0.0014 0.0014 0.0015 0.0015 0.0016 0.0016 0.0017 0.0018 0.0018 0.0019 -2.9
-2.8 0.0019 0.0020 0.0021 0.0021 0.0022 0.0023 0.0023 0.0024 0.0025 0.0026 -2.8
-2.7 0.0026 0.0027 0.0028 0.0029 0.0030 0.0031 0.0032 0.0033 0.0034 0.0035 -2.7
-2.6 0.0036 0.0037 0.0038 0.0039 0.0040 0.0041 0.0043 0.0044 0.0045 0.0047 -2.6
-2.5 0.0048 0.0049 0.0051 0.0052 0.0054 0.0055 0.0057 0.0059 0.0060 0.0062 -2.5
-2.4 0.0064 0.0066 0.0068 0.0069 0.0071 0.0073 0.0075 0.0078 0.0080 0.0082 -2.4
-2.3 0.0084 0.0087 0.0089 0.0091 0.0094 0.0096 0.0099 0.0102 0.0104 0.0107 -2.3
-2.2 0.0110 0.0113 0.0116 0.0119 0.0122 0.0125 0.0129 0.0132 0.0136 0.0139 -2.2
-2.1 0.0143 0.0146 0.0150 0.0154 0.0158 0.0162 0.0166 0.0170 0.0174 0.0179 -2.1
-2.0 0.0183 0.0188 0.0192 0.0197 0.0202 0.0207 0.0212 0.0217 0.0222 0.0228 -2.0
-1.9 0.0233 0.0239 0.0244 0.0250 0.0256 0.0262 0.0268 0.0274 0.0281 0.0287 -1.9
-1.8 0.0294 0.0301 0.0307 0.0314 0.0322 0.0329 0.0336 0.0344 0.0351 0.0359 -1.8
-1.7 0.0367 0.0375 0.0384 0.0392 0.0401 0.0409 0.0418 0.0427 0.0436 0.0446 -1.7
-1.6 0.0455 0.0465 0.0475 0.0485 0.0495 0.0505 0.0516 0.0526 0.0537 0.0548 -1.6
-1.5 0.0559 0.0571 0.0582 0.0594 0.0606 0.0618 0.0630 0.0643 0.0655 0.0668 -1.5
-1.4 0.0681 0.0694 0.0708 0.0721 0.0735 0.0749 0.0764 0.0778 0.0793 0.0808 -1.4
-1.3 0.0823 0.0838 0.0853 0.0869 0.0885 0.0901 0.0918 0.0934 0.0951 0.0968 -1.3
-1.2 0.0985 0.1003 0.1020 0.1038 0.1056 0.1075 0.1093 0.1112 0.1131 0.1151 -1.2
-1.1 0.1170 0.1190 0.1210 0.1230 0.1251 0.1271 0.1292 0.1314 0.1335 0.1357 -1.1
-1.0 0.1379 0.1401 0.1423 0.1446 0.1469 0.1492 0.1515 0.1539 0.1562 0.1587 -1.0
-0.9 0.1611 0.1635 0.1660 0.1685 0.1711 0.1736 0.1762 0.1788 0.1814 0.1841 -0.9
-0.8 0.1867 0.1894 0.1922 0.1949 0.1977 0.2005 0.2033 0.2061 0.2090 0.2119 -0.8
-0.7 0.2148 0.2177 0.2206 0.2236 0.2266 0.2296 0.2327 0.2358 0.2389 0.2420 -0.7
-0.6 0.2451 0.2483 0.2514 0.2546 0.2578 0.2611 0.2643 0.2676 0.2709 0.2743 -0.6
-0.5 0.2776 0.2810 0.2843 0.2877 0.2912 0.2946 0.2981 0.3015 0.3050 0.3085 -0.5
-0.4 0.3121 0.3156 0.3192 0.3228 0.3264 0.3300 0.3336 0.3372 0.3409 0.3446 -0.4
-0.3 0.3483 0.3520 0.3557 0.3594 0.3632 0.3669 0.3707 0.3745 0.3783 0.3821 -0.3
-0.2 0.3859 0.3897 0.3936 0.3974 0.4013 0.4052 0.4090 0.4129 0.4168 0.4207 -0.2
-0.1 0.4247 0.4286 0.4325 0.4364 0.4404 0.4443 0.4483 0.4522 0.4562 0.4602 -0.1
0.0 0.4641 0.4681 0.4721 0.4761 0.4801 0.4840 0.4880 0.4920 0.4960 0.5000 0.0
Standard Normal Tables (Positive Z Scores)
Values of the Standard Normal Distribution Function
Page 2: Positive Z Scores (a 0.26 Z score has a 60.26% cumulative probability)
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 Z
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359 0.0
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753 0.1
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 0.2
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 0.3
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 0.4
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 0.5
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 0.6
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 0.7
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 0.8
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 0.9
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 1.0
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 1.1
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 1.2
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 1.3
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 1.4
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 1.5
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 1.6
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 1.7
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 1.8
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767 1.9
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817 2.0
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857 2.1
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890 2.2
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916 2.3
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936 2.4
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952 2.5
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964 2.6
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974 2.7
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981 2.8
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986 2.9
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990 3.0
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993 3.1
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995 3.2
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997 3.3
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998 3.4
3.5 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 3.5
3.6 0.9998 0.9998 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 3.6
3.7 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 3.7
3.8 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 3.8
Normal distribution and the Z score in action
Continuing our inquiry, what return would have been superior?
 Again we define superior performance as returns in the top 5%. Therefore inspecting
the Standard Normal table, what Z score would generate a 95% cumulative
probability?
 As highlighted below, a 95% cumulative probability corresponds to approximately a 1.645
Z score
Values of the Standard Normal Distribution Function
Page 2: Positive Z Scores
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 Z
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359 0.0
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753 0.1
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 0.2
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 0.3
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 0.4
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 0.5
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 0.6
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 0.7
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 0.8
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 0.9
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 1.0
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 1.1
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 1.2
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 1.3
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 1.4
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 1.5
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 1.6

What performance level is superior?


 Solve for return where: Z score = (Observation – Mean) / Standard Deviation
or 1.645 = (Superior - 10.58%) / 18.05%, implying Superior = 40.27%
 In practice most clients aren't looking for single-year stars. They want sustainability,
so let's see what's superior over a longer horizon
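Inverting the Z-score formula gives the threshold directly:

```python
mean, sigma = 0.1058, 0.1805  # figures from the slide
z_95 = 1.645                  # ~95th percentile of the standard normal

# Rearrange z = (return - mean) / sigma to solve for the return
superior = mean + z_95 * sigma
print(f"{superior:.2%}")  # 40.27%
```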
Normal distribution and the Z score in action
Continuing our inquiry, how about superior over 10 years?

What performance level is superior over 10 years?


 Using a rolling 10 year return over the 147 years results in a Mean of 10.59% and a
Standard Deviation of 4.69%
 You will notice that the mean is comparable to our one-year result, but the Standard
Deviation has dropped from 18.05% for single-year returns to 4.69% for 10-year
average performance
 This is expected. Remember the Standard Error of the Mean introduced earlier?
 We would expect a theoretical result of 18.05% / √10 = 5.70%
 This is the one-year Standard Deviation divided by the square root of the sample size,
which is 10 years of annual returns
 That the theoretical result of 5.70% is higher than the actual result of 4.69% implies
that annual returns are not completely random. We tend to have a run of
positive years punctuated by the occasional correction
 In general, outside of a recession, equity markets tend to have positive performance
 Now back to our manager! Superior performance over 10 years is:
1.645 = (Superior - 10.59%) / 4.69%, implying Superior = 18.28%
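A sketch of the rolling-return computation. The series below is simulated from an assumed mean and sigma rather than the actual historical data, so the years are independent by construction:

```python
import math
import random
import statistics

random.seed(0)

# Simulated independent annual returns (assumed mean/sigma, not actual data)
annual = [random.gauss(0.1058, 0.1805) for _ in range(149)]

# Rolling 10-year average returns
rolling = [statistics.mean(annual[i:i + 10]) for i in range(len(annual) - 9)]

sigma_1y = statistics.stdev(annual)
sigma_10y = statistics.stdev(rolling)

# For independent years, theory predicts roughly sigma_1y / sqrt(10)
print(sigma_1y, sigma_10y, sigma_1y / math.sqrt(10))
```

Because the simulated years are independent, the rolling sigma comes out near the theoretical value; the actual series' lower 4.69% reflects the serial dependence discussed above.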

As an aside S&P 500 rolling 10 year returns
It’s rare to have a decade of losses, but the late 30’s and 2008-09 were tough

 Per graph below, 18% over a decade certainly looks like superior performance
 Late 30’s and 2008-09 included the worst S&P 500 annual downturns, but
otherwise 10 year runs have been positive

[Chart: S&P 500 10 Year Rolling Average return, 1880–2019, ranging from about -3% to 21%]
Normal distribution: Standard Deviation ranges worth
remembering
 First thing worth noting is that 50% of the normal distribution lies below and 50%
above a Z score of 0
 Since the mean has a Z score of 0 this is not a surprise. But affirms the distribution is
symmetrical
 68.26% of the distribution lies within 1 standard deviation (σ) of the mean (µ) , µ ± 1σ
 95.00% % lies within 1.96 standard deviations of the mean, µ ± 1.96σ
 95.45% lies within 2 standard deviations of the mean, µ ± 2σ
 And most of it, or 99.73%, lies within 3 standard deviations of the mean, µ ± 3σ
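These ranges follow directly from the standard normal CDF and can be checked in a few lines:

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Probability mass within +/- k standard deviations of the mean
for k in (1, 1.96, 2, 3):
    mass = normal_cdf(k) - normal_cdf(-k)
    print(f"{k}: {mass:.4%}")
```

The printed values match the 68.26% / 95% / 95.45% / 99.73% figures quoted above.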

Couple of items worth highlighting


 The 95% range is very prominent in formal statistical studies. We will explore this in more
detail, but if you are trying to publish a claim in a research journal, there is a high bar
for proof
 The starting assumption by your peers is that there is nothing novel going on in your
findings. So you have to prove otherwise
 In general your results have to fall outside the 95% statistical range before you can be
confident something worthwhile is going on
 In parallel fashion practitioners tend to regard a bubble as a 2 sigma (standard
deviation) valuation move from a reasonable mean level
 Jeremy Grantham, a cofounder of GMO, a Boston-based quantitative manager, has been
a prolific bubble researcher; some of his thoughts are included on the next slide
Some of the biggest bubbles in modern times
Thoughts from Jeremy Grantham
 Two sigma events are statistically expected once
every 44 years. In practice they occur once
every 31 years. They have a fat tail!
 US housing crash in 2007-08 was closer to a 3.5
sigma event: a 1 in 5,000 year outcome
 US equity market crash of 2008 wasn’t a 2
sigma event but the Dot Com bust in 2000 was
 However 2008 was unique in other ways. Global
asset classes were almost universally overpriced
Thoughts from Ed Chancellor
In the same update Ed offers his views on the key
characteristics that drive market manias
 Feeling this time is different and there has been a
regime shift
 Moral hazard when markets feel governments
will bail them out
 Easy Money: Cheap credit allows reckless
leverage
 Over confidence in growth stories
 No valuation anchor: Analysts ignore
fundamentals
 Conspicuous consumption: As new wealth
splurges on trophy assets
 Ponzi Schemes: Decline in lending standards and
a belief it will continue with few ramifications
 Irrational Exuberance: Flaky IPOs are an early
indicator. A persistent view that prices will only go up

Source: GMO Q1 2014 Newsletter, Looking for Bubbles, Part One: A Statistical Approach
Test 149 years of S&P 500 returns: are they normally distributed?
Will use the Kolmogorov-Smirnov (KS) goodness of fit test to help confirm normality
 The KS test is a well-regarded general test. Its essential approach is to compare the cumulative probability
of the actual data to the assumed theoretical distribution
 Actual returns and cumulative probabilities on the left look a little choppier than the aesthetically smooth
normal distribution on the right, but are they statistically close? The KS test will help decide

[Charts: histogram and cumulative probability of Annual S&P Returns (actual) alongside the same charts if returns were precisely Normally Distributed]
Let’s review the methodology through a simple example
 Assume the daily returns for a share of common stock are:
Monday -14%
Tuesday 8%
Wednesday -2%
Thursday 10%
Friday 4%

 Our research objective is to see if these returns are normally distributed


 In formal published research we would lay out our research goal as follows - where Ho is our
Null Hypothesis and Ha is the Alternative:
Ho: Actual Cumulative Probability = Normal Cumulative Probability
Ha: Actual Cumulative Probability ≠ Normal Cumulative Probability

 Let’s see if the Actual and Theoretical probabilities are different


 First compute the Actual cumulative probabilities by sorting the data from lowest to highest
 Assign the actual probability to each unique return and compute actual cumulative probability
as follows:
Number of Actual Cumulative
Return Observations Probability Actual Probability
-14% 1 0.20 0.20
-2% 1 0.20 0.40
4% 1 0.20 0.60
8% 1 0.20 0.80
10% 1 0.20 1.00
Continuing the KS Test
 Now let’s calculate the cumulative probability assuming the data is normally distributed. All we
need is the mean and standard deviation:
 14  8  2  10  4
Average Return = 5
 1.20 %
1.20%
n

(X i  X )2
Standard Deviation = i 1

n 1
= 9.654%

 Next we can compute the Z scores and related cumulative probability for each return:
Cumulative
Return Z Score Normal Probability
-14% -1.5745 0.0577
-2% -0.3315 0.3701
4% 0.2900 0.6141
8% 0.7044 0.7594
10% 0.9115 0.8190

 Compare the Actual and Normal Cumulative Probabilities and compute the absolute difference between
them:
Cumulative Cumulative Absolute
Return Actual Probability Normal Probability Difference
-14% 0.20 0.0577 0.1423
-2% 0.40 0.3701 0.0299
4% 0.60 0.6141 0.0141
8% 0.80 0.7594 0.0406
10% 1.00 0.8190 0.1810
 Finally, use the maximum absolute difference of 0.1810 as our test value and compare it to a critical value
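The whole worked example, using the slide's simplified cumulative-probability approach, fits in a short script:

```python
import math

def normal_cdf(x, mu, sigma):
    """Normal cumulative probability at x for the given mean and sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

returns = [-0.14, -0.02, 0.04, 0.08, 0.10]  # sorted daily returns
n = len(returns)

mu = sum(returns) / n                                             # 1.20%
sigma = math.sqrt(sum((r - mu) ** 2 for r in returns) / (n - 1))  # 9.654%

# Actual cumulative probability (i+1)/n vs. the normal CDF at each return
differences = [abs((i + 1) / n - normal_cdf(r, mu, sigma))
               for i, r in enumerate(returns)]

print(round(max(differences), 4))  # 0.181
```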
Final step and conclusions from KS test
 The final step is to compare the maximum absolute
difference of 0.1810 to a critical table value
 The test designers created a KS table to help interpret
this absolute value, which they called the D value
 In advance of any formal statistical test any researcher
would nominate how rigorous they wanted the
conclusion to be
 This is called the degree of confidence. Most
researchers test at the 95% confidence interval or
range, implying they want to be 95% confident in
their conclusions
 A 95% confidence interval (CI) is also referenced as a
5% significance level (100% - CI). We will test
at this level as well
 From the table to the right, the value associated
with a sample size of 5 at the 5% significance
level is 0.565
 The generic approach in most statistical tests is to
compare the key ratio or number from the test to
the critical value from the table
 If the Test Value > Critical Value we reject the Null
Hypothesis. If the Test Value <= Critical Value then
we fail to reject the Null Hypothesis
 In our case 0.1810 < 0.565, so we cannot reject the
Null Hypothesis. Therefore we are 95% confident
that our stock data is normally distributed
Test 147 years of S&P 500 returns – Return Distributions
 To recap, this is what our respective distributions look like: the distribution of Actual
returns on the left and the Theoretical returns on the right
 Will use the KS methodology to determine if these are statistically equivalent at the 95%
confidence interval
[Charts: histogram and cumulative probability of Annual S&P Returns (actual) alongside the same charts if returns were precisely Normally Distributed]
Test 147 years of S&P 500 returns – Volatility Ranges
 As an interesting prelude before running the KS test, the table below shows values for a 1.00
and a 1.96 standard deviation range around the mean for both theoretical and actual returns
 From the standard normal distribution we theoretically know 68.26% of the data lies within a
-/+ 1 standard deviation of the mean. That includes all data between a 15.87% and 84.13%
cumulative probability
 And 95% of the data theoretically lies within -/+ 1.96 standard deviations of the mean.
Therefore, it includes all data between 2.50% and 97.50% cumulative probability
 For actual data, order the returns from lowest to highest and set up the actual cumulative
probability distribution. Interpolate as needed to get values at the 15.87%, 84.13%, 2.50%
and 97.50% cumulative levels. The resulting returns are captured below
 For a -1 standard deviation the theoretical return would be -7.5% while the actual return is
-8.5%. For +1 standard deviation the theoretical return is 28.6% with an actual of 28.7%
 For a -1.96 standard deviation the theoretical return is -24.8% while the actual return is
-29.0%. For +1.96 standard deviation the theoretical return is 46.0% with actual of 44.6%
 Broadly these specific ranges seem fairly close but there are differences. And the key
question in statistics is always, are the differences statistically different
 And that’s where the KS Test provides a comprehensive answer. Its inspects every difference
between cumulative theoretical and actual and zeros in on the largest and then helps
conclude if they are statistically equivalent or different

Standard     Theoretical Ranges               Actual Ranges
Deviation    Lower     Average    Upper       Lower     Average    Upper
1.00         -7.5%     10.58%     28.6%       -8.5%     10.58%     28.7%
1.96         -24.8%    10.58%     46.0%       -29.0%    10.58%     44.6%

24
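The interpolation procedure described above can be sketched in a few lines. This is illustrative only: the mean (10.58%) and standard deviation (~18.06%, implied by the slide's 1-sigma range) are taken from the slides, but the `returns` array below is a random placeholder, not the actual Shiller/Damodaran series.

```python
import numpy as np
from scipy import stats

# Slide values: mean 10.58%; std ~18.06% is implied by the 1-sigma range
mean, std = 0.1058, 0.1806

# Placeholder data standing in for the 147 annual S&P 500 returns
rng = np.random.default_rng(0)
returns = rng.normal(mean, std, 147)

for z in (1.00, 1.96):
    # Theoretical range straight from the normal distribution
    theo_lo, theo_hi = mean - z * std, mean + z * std
    # Actual range: read the empirical distribution at the matching
    # cumulative levels (15.87%/84.13% for z=1, 2.50%/97.50% for z=1.96),
    # interpolating between order statistics as the slide describes
    p = stats.norm.cdf(z)
    act_lo, act_hi = np.quantile(returns, [1 - p, p])
    print(f"z = {z}: theoretical [{theo_lo:+.1%}, {theo_hi:+.1%}], "
          f"actual [{act_lo:+.1%}, {act_hi:+.1%}]")
```

With the real return series loaded in place of the placeholder, the printed ranges would reproduce the table above.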
Test 147 years of S&P 500 returns – Cumulative Probabilities
 The essence of the KS test is to compare cumulative probabilities
 The theoretical and actual distributions below seem to be reasonably in line
 However, let's see what the KS test concludes on the next slide

[Chart: Cumulative Probability — Actual vs. Theoretical CDFs plotted against sorted returns]
25
Test 147 years of S&P 500 returns – KS Test
 The maximum absolute difference from the KS test is 0.0451
 We are conducting our test at the 95% confidence level / 5% significance level
 Given 147 observations, the Critical Value from the KS table is 1.36 / √147, or 0.1122
 Since Test Value <= Critical Value, i.e. 0.0451 < 0.1122, we fail to reject the Null
Hypothesis
 Therefore, at the 5% significance level, we cannot reject the hypothesis that our data is
normally distributed
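A minimal sketch of this test using SciPy's one-sample KS routine. The `returns` series is again a random stand-in for the 147 annual returns, and the normal parameters are estimated from the sample, as the slides do.

```python
import numpy as np
from scipy import stats

# Stand-in for the 147 annual S&P 500 returns used on the slides
rng = np.random.default_rng(1)
returns = rng.normal(0.1058, 0.1806, 147)

# One-sample KS test against a normal with the sample's own mean and std.
# (Estimating the parameters from the data makes the 1.36/sqrt(n) critical
# value conservative; the Lilliefors correction is the stricter variant.)
d_stat, p_value = stats.kstest(
    returns, "norm", args=(returns.mean(), returns.std(ddof=1)))

critical = 1.36 / np.sqrt(len(returns))  # ~0.1122 for n = 147
verdict = "fail to reject" if d_stat <= critical else "reject"
print(f"D = {d_stat:.4f}, critical = {critical:.4f} -> {verdict} normality")
```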

[Chart: KS Test - Absolute Differences between actual and theoretical cumulative probabilities, plotted against sorted returns; the peak is 0.0451]

26
One more KS test: Mini Case Study - Central Limit Theorem
Recap the Central Limit Theorem
 In an earlier simulation we assumed an asset that either goes up 5% or drops 5% each day
with equal probability
 To demonstrate the Central Limit Theorem we randomly selected 100 daily returns and computed
the sample average
 We did this 300 times and, based on the Central Limit Theorem, expected the distribution of
sample means to be normally distributed with a Mean of 0% and a Standard Error of the Mean
of 0.5%
 And ended up with results close to theoretical
 Broadly, the graph of sample means looks bell shaped
 But let's run a KS test to see if the data is normally distributed

Population
Obs 1:   5.00%
Obs 2:  -5.00%
Mean:    0%
Stdev:   5.00%

Sample for Mean, n = 100     Empirical   Theoretical
Average                      0.0040%     0.0000%
Standard Error of Mean       0.5030%     0.5000%

Graph of Samples (Point / % of Obs):
-1.4%: 0.3333%   -1.2%: 0.6667%   -1.1%: 0.3333%   -1.0%: 1.0000%
-0.9%: 1.3333%   -0.8%: 3.0000%   -0.7%: 2.3333%   -0.6%: 5.6667%
-0.5%: 4.3333%   -0.4%: 4.3333%   -0.3%: 8.0000%   -0.2%: 7.0000%
-0.1%: 9.3333%    0.0%: 7.6667%    0.1%: 7.6667%    0.2%: 5.0000%
 0.3%: 7.6667%    0.4%: 4.6667%    0.5%: 4.3333%    0.6%: 5.0000%
 0.7%: 2.6667%    0.8%: 3.0000%    0.9%: 1.0000%    1.0%: 1.6667%
 1.1%: 1.3333%    1.2%: 0.3333%    1.5%: 0.3333%    Total: 100.0000%

[Chart: Distribution of Sample Means, n = 100 — histogram of 300 sample means, frequency vs. sample mean]

27
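The experiment above can be reproduced in a few lines. This is a sketch: the random seed is arbitrary, so the simulated figures will land close to, but not exactly on, the slide's 0.0040% and 0.5030%.

```python
import numpy as np

# Population: +5% or -5% each day with equal probability (mean 0%, stdev 5%)
rng = np.random.default_rng(42)

# Draw 100 daily returns, take the sample mean; repeat 300 times
sample_means = np.array([
    rng.choice([0.05, -0.05], size=100).mean()
    for _ in range(300)
])

# CLT prediction: mean 0%, standard error = 5% / sqrt(100) = 0.5%
print(f"mean of sample means: {sample_means.mean():.4%}")
print(f"std of sample means:  {sample_means.std(ddof=1):.4%}  (theory: 0.5000%)")
```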
Results of KS test: Mini Case Study - Central Limit Theorem
KS Results
 When we compared the Theoretical and Actual distributions we found a maximum
difference of 0.0586
 From the KS tables the Critical D value is 1.36 / √300 = 0.0785
 Since Test Value <= Critical Value, i.e. 0.0586 < 0.0785, we fail to reject the
Null Hypothesis
 Therefore, at the 5% significance level, we cannot reject the hypothesis that the sample
means are normally distributed
 Yes, it's what we were expecting. You can have a population whose data is not
normally distributed, but as long as the population has a finite range (and hence finite
variance) the sample means will be approximately normally distributed
 It's a powerful demonstration of why the Normal Distribution is one of the most
prominent patterns in a wide range of everyday observations

KS Test for Normality
Mean:    0.0040%
Std Dev: 0.5030%

Obs       Frequency   Actual Cum Prob   Theo Cum Prob   Abs Difference
-1.40%        1          0.3333%           0.2623%          0.0007
-1.20%        2          1.0000%           0.8336%          0.0017
-1.10%        1          1.3333%           1.4080%          0.0007
-1.00%        3          2.3333%           2.2955%          0.0004
-0.90%        4          3.6667%           3.6137%          0.0005
-0.80%        9          6.6667%           5.4959%          0.0117
-0.70%        7          9.0000%           8.0796%          0.0092
-0.60%       17         14.6667%          11.4893%          0.0318
-0.50%       13         19.0000%          15.8152%          0.0318
-0.40%       13         23.3333%          21.0913%          0.0224
-0.30%       24         31.3333%          27.2778%          0.0406
-0.20%       21         38.3333%          34.2516%          0.0408
-0.10%       28         47.6667%          41.8091%          0.0586
 0.00%       23         55.3333%          49.6827%          0.0565
 0.10%       23         63.0000%          57.5687%          0.0543
 0.20%       15         68.0000%          65.1621%          0.0284
 0.30%       23         75.6667%          72.1910%          0.0348
 0.40%       14         80.3333%          78.4462%          0.0189
 0.50%       13         84.6667%          83.7977%          0.0087
 0.60%       15         89.6667%          88.1992%          0.0147
 0.70%        8         92.3333%          91.6795%          0.0065
 0.80%        9         95.3333%          94.3250%          0.0101
 0.90%        3         96.3333%          96.2583%          0.0008
 1.00%        5         98.0000%          97.6166%          0.0038
 1.10%        4         99.3333%          98.5339%          0.0080
 1.20%        1         99.6667%          99.1296%          0.0054
 1.50%        1        100.0000%          99.8532%          0.0015

Total Obs                 300
Actual Absolute Max       0.0586
Critical Max, Obs = 300   0.0785

28
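The table's mechanics can be sketched as follows: sort the sample means, form the empirical cumulative probabilities, evaluate the normal CDF with the sample's own mean and standard deviation, and take the largest absolute gap. The simulated series below is a stand-in (different seed, different draws), so its D statistic will differ somewhat from the slide's 0.0586.

```python
import numpy as np
from scipy import stats

# Regenerate a stand-in set of 300 sample means from the +/-5% population
rng = np.random.default_rng(7)
sample_means = np.array([rng.choice([0.05, -0.05], size=100).mean()
                         for _ in range(300)])

x = np.sort(sample_means)
ecdf = np.arange(1, len(x) + 1) / len(x)             # actual cum prob
theo = stats.norm.cdf(x, x.mean(), x.std(ddof=1))    # theoretical cum prob

d_max = np.abs(ecdf - theo).max()                    # KS test statistic
critical = 1.36 / np.sqrt(len(x))                    # 0.0785 for n = 300
verdict = "fail to reject" if d_max <= critical else "reject"
print(f"max |difference| = {d_max:.4f}, critical = {critical:.4f} -> {verdict}")
```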
