It has been said that monthly (daily? yearly?) returns are Normally distributed.
That is, the frequencies with which certain returns occur lie on a "Normal" curve.
A wise man once said: "What faith one should place in the conclusions drawn
from such assumptions is a religious argument."
We suppose these percentages all lie between -12% and +14%. Here's what we'll
do:
Now we plot the points (-12,0), (-10,2), (-8,7), etc., (+14,600) and we get Fig. 1
(where the points are indicated by wee red squares and we've also drawn a smooth
curve joining them).
Fig. 1
Normal vs Log-Normal
The Average (or Mean) of the 600 percentages is 1%, in this example, and half
of the 600 (that's 300) are less than the Mean.
This is the Cumulative distribution for our 600 percentages (and, for this
example, we've actually plotted a Normal Cumulative distribution - so we can see
what it looks like).
Now let's consider the number of these 600 percentages which lie in the intervals:
(-12% to -10%) (-10% to -8%) (-8% to -6%) ... (+12% to +14%)
For example, if 460 are less than 4% and 355 are less than 2%, then 460 - 355 =
105 lie between 2% and 4%.
We then plot a point at 3% (half-way between 2% and 4%) and give it the value
105. That's one point in our graph (shown in blue).
Fig. 2
This is the familiar (infamous?) Bell Curve and gives the Density distribution for
our 600 percentages.
Note: If we count the number in (-12 to -10) and those in (-10,-8) etc. and add all
these numbers we get ... 600!
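Here's a minimal Python sketch of that differencing step. Only the counts at -12, -10, -8, 2, 4 and 14 come from the text above; the other cumulative counts, and the names `thresholds`, `cumulative` and `density_counts`, are made up for illustration.

```python
# Number of returns below each threshold (thresholds in percent).
# Only the counts at -12, -10, -8, 2, 4 and 14 come from the text;
# the others are invented so the example has a full set of points.
thresholds = list(range(-12, 16, 2))  # -12, -10, ..., 14
cumulative = [0, 2, 7, 20, 50, 110, 200, 355, 460, 530, 572, 590, 597, 600]


def density_counts(thresholds, cumulative):
    """Return (midpoint, count) pairs: how many values fall in each interval."""
    points = []
    for i in range(len(thresholds) - 1):
        midpoint = (thresholds[i] + thresholds[i + 1]) / 2
        count = cumulative[i + 1] - cumulative[i]
        points.append((midpoint, count))
    return points


points = density_counts(thresholds, cumulative)
total = sum(count for _, count in points)

for midpoint, count in points:
    print(f"{midpoint:+5.1f}%  {count}")
print("total:", total)
```

The interval from 2% to 4% gives the point (3, 105) described above, and the interval counts add up to the last cumulative value.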
Okay, it's time to talk about the actual monthly percentage gains for the S&P 500
... namely, the 600 monthly returns from January, 1950 through December, 1999.
They range from roughly -14% to +14%, so we count the number of returns less
than -14%, then the number less than -12%, then less than -10%, etc. etc. and we
plot these points and we get - voila! - Fig. 3 which shows the distributions:
Cumulative (on the left) and Density (on the right).
In each case there's a thin red curve which is an actual, real, live Normal Curve
(as opposed to the distribution of S&P 500 returns).
Stare at Fig. 3 for a while and one starts to believe that these monthly returns are,
indeed, normally distributed.
>Are you kidding ? The Density guy is way off ! Isn't there some better fit ? Some
theoretical distribution which gives a better match? One that ...?
For the S&P500, the mean monthly return is 0.78% and the SD is 4.04% so the
one-standard-deviation range is from 0.78 - 4.04 = -3.3% to 0.78 + 4.04 = 4.8% and,
as it turns out, 73% of the monthly returns for the S&P lie in that interval
(a Normal distribution would put about 68% there).
Actually, you can sorta see that from Fig. 3b where the returns are more closely
scrunched near their Mean, compared to the red, Normal distribution.
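For comparison, here's a small Python sketch of what a Normal distribution with the same Mean and SD would predict for that one-standard-deviation interval. It uses `statistics.NormalDist` from the standard library; the variable names are just for illustration.

```python
from statistics import NormalDist

mean, sd = 0.78, 4.04  # percent per month, the S&P figures quoted above
low, high = mean - sd, mean + sd

# Fraction of a Normal distribution lying within one SD of its Mean.
normal = NormalDist(mu=mean, sigma=sd)
expected_fraction = normal.cdf(high) - normal.cdf(low)

print(f"interval: {low:.2f}% to {high:.2f}%")
print(f"Normal prediction: {expected_fraction:.1%}")
```

The Normal prediction doesn't depend on the particular Mean and SD: it's always about 68%, which is the number to hold up against the observed 73%.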
Further, the range of monthly returns is from -100% to + infinity (meaning
unlimited upside gains) whereas the Normal density function expects a range from
- infinity to + infinity.
>Is that bad ? Not bad. It just means that assuming a Normal distribution for the
S&P is just an approximation, and approximations are just ... uh ... approximations.
>Wait ! What info do I need to plot a Normal graph ? Do I need hundreds of points
or ...? Actually, you just need the Mean and Standard Deviation of the distribution
and, armed with these 2 numbers, the Normal distribution is defined.
We do this:
1. Let r1, r2, ... rN denote N monthly returns where r = 0.0123 corresponds to
a monthly return of 1.23%
2. Assume that -1 < rn < infinity; that is, no month will reduce our
investment to zero ... or less!
3. Let g1, g2, ... gN denote the monthly Gain Factors where gn = 1 + rn
4. Then 0 < gn = 1 + rn < infinity
5. Then -infinity < log(gn) < infinity so the logs have the appropriate range for
a Normal distribution!
6. Consider the distribution of the N logarithms, xn = log(gn)
7. Assume that the xn are Normally Distributed.
8. Then we say that the gn have a Log-normal distribution.
If the set of logarithms, log(gn), are Normally Distributed, then we say that the set
gn has a Log-normal distribution.
In other words:
If a set {g} has a Log-normal distribution, then it means that g = e^x where the set
{x} has a Normal distribution.
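The steps above can be sketched in a few lines of Python. The six returns are made up (they are not S&P data), and using the sample standard deviation (`statistics.stdev`) rather than the population one is an assumption, since the text doesn't say which is meant.

```python
import math
from statistics import mean, stdev

# Made-up monthly returns as decimals (0.0123 means 1.23%).
returns = [0.0123, -0.0210, 0.0345, 0.0050, -0.0400, 0.0280]

# Gain Factors g = 1 + r; they must be positive for the log to exist.
gain_factors = [1 + r for r in returns]
if not all(g > 0 for g in gain_factors):
    raise ValueError("a return of -100% or worse has no logarithm")

# The logs x = log(g) are the numbers we assume to be Normally distributed.
logs = [math.log(g) for g in gain_factors]

mu = mean(logs)      # Mean of the logs
sigma = stdev(logs)  # sample Standard Deviation of the logs (an assumption)

print(f"mean of logs = {mu:.5f}, SD of logs = {sigma:.5f}")
```

With real data you'd replace `returns` with the actual monthly figures; the two numbers `mu` and `sigma` are then all that's needed to define the Normal curve for the logs.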
>You forgot your subscripts. Let me leave them out - for sanitary reasons.
So instead of saying g1, g2, ... gN I'll just say {g} and instead of saying gn=1+rn I'll
just say g=1+r and instead of ...
1. I looked at all 600 monthly returns, {r}, expressed as a decimal, like 0.0123, instead of a percentage, like 1.23%
2. then the monthly Gain Factors {g} = {1+r}
3. then their logarithms {log(g)} (the natural log, to the base e, tho' that isn't important)
4. then calculated the Mean and Standard Deviation of these 600 logs
5. then I plotted the Cumulative and Density distributions for these 600 logs as described above
6. and I compared them to a real, live Normal distribution with the same Mean and Standard Deviation
7. cuz if these Gain Factors had a Log-normal distribution, their logs should have a Normal distribution
and I got:
>That's pretty lousy, and besides, plotting against a logarithm seems sort of ...
Just an example (plotting the percentage of values, unlike Fig. 4 which plots the
number) to show how the Log-normal Density distribution is skewed to the right
with a fatter tail - that's kurtosis - whereas the Normal distribution is nicely
distributed about the mean which, in this example, happens to be 3.0.
Fig. 5a and Fig. 5b
>Kurtosis ? My doctor said ... One neat thing about considering the logs ...
>Wait! You said the Mean for Fig. 5 was 3.0 yet the Log-normal curve lies mostly
below the mean - I mean to the left of the mean - but only half of the points, that's
50% of 'em, should lie below the mean, right ? Wrong. For the Log-normal
distribution, more than 50% lie below the mean because of the skewness ...
>Is that possible? The 4 numbers 4, 4, 4 and 100 have a mean of 28, yet more
than half of these numbers are below the mean. In fact, 3/4 of them are below
their mean.
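That little example is easy to check in Python (the variable names are just for illustration):

```python
from statistics import mean

numbers = [4, 4, 4, 100]
m = mean(numbers)

# Fraction of the numbers that lie strictly below their own mean.
fraction_below = sum(1 for n in numbers if n < m) / len(numbers)

print("mean:", m)
print("fraction below the mean:", fraction_below)
```

One big value drags the mean up past most of the set, which is the same thing the long right tail does for a Log-normal distribution.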
Anyway, one neat thing about considering the logs of the monthly ...
48% of the 600 monthly gains are less than their mean ... which is 0.78%.
Anyway, one neat thing about considering the logs of the monthly gain factors
(instead of the gain factors themselves) is that $1.00 invested in the S&P500
grows to g1 after one month, then g1g2 after 2 months, etc. and g1g2...gN after N
months and, since g=e^x where x = log(g), we get the neat result:
g1 g2 ... gN = e^x1 e^x2 ... e^xN = e^(x1 + x2 + ... + xN)
or, more succinctly
N-month Gain Factor = e^(Σx)
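Here's a quick numerical check of that result, as a Python sketch with made-up Gain Factors: multiplying the factors and exponentiating the sum of their logs give the same N-month Gain Factor.

```python
import math

# Made-up monthly Gain Factors (1.0123 means a 1.23% gain).
gain_factors = [1.0123, 0.9790, 1.0345, 1.0050, 0.9600, 1.0280]

# Direct route: multiply the monthly factors together.
product = math.prod(gain_factors)

# Log route: add up the logs x = log(g), then exponentiate the sum.
logs = [math.log(g) for g in gain_factors]
via_logs = math.exp(sum(logs))

print(f"product of factors = {product:.10f}")
print(f"e^(sum of logs)    = {via_logs:.10f}")
```

The point of the log route is that a product of Gain Factors becomes a sum of logs, and sums of Normally distributed numbers are much easier to reason about.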
>Hold on! We don't like assuming that the monthly returns, r, are Normally
distributed so we turn to a Log-normal distribution where we assume that the set
{x} is Normally distributed and then find that x = r, very nearly, so is the analysis
so precise that ... ?
Good question ! I suspect that, as much as anything, it's the neat mathematics
that follows from a Log-normal distribution that prompts people to assume Log-
normal. Let me explain:
Good point. Actually, 73% of the 600 log(g) values lie within one Standard
Deviation of their Mean, so ...
>But that's the same as the plain Jane Normal, right ? Actually, for the gains
themselves, to one decimal place, it's 72.7% and for their logs it's 73.2% so ...
>So, assuming that the gains are Normal is better than assuming they're Log-
normal, right? Well ... if the 68% is your only goal in life and the S&P your only
investment and those 50 years from 1950 to 1999 your only time period, but there
are other things to consider.
>Like what ? Okay, but first we should identify certain properties of both Normal
and Log-normal distributions.
We'll consider a Normally distributed set {x} with Mean = μ and Variance = σ^2 (so
Standard Deviation = σ).
We'll also use EXP(x) to mean e^x and SQRT(x) to denote the square root of x and,
of course, we'll use {x} to represent a collection (maybe 600?) of numbers and
LOG to mean the natural logarithm and ...
If a and b are constants, then the set {a + bx} is N(a+bμ, b^2 σ^2) ... it has
Mean = a+bμ, Variance = b^2 σ^2
Notice that the Mean of the set {y} is greater than zero ! In fact, this set describes
the Gain Factors for a set of returns. (For a return of 0.123 the Gain Factor is
1.123, namely the value of $1.00 after applying the gain.)
Okay. Suppose we have a Log-normal set {y} and we know its Mean = M and
Variance = S^2 (where S is the Standard Deviation). We need to find values of μ
and σ so that, using 3, above, we can identify the associated Normal set {x}. That
means that:
EXP(μ + σ^2/2) = M
EXP(2μ + σ^2) [EXP(σ^2) - 1] = S^2
Dividing the second equation by the square of the first gives EXP(σ^2) - 1 = S^2/M^2, so:
μ = LOG(M) - σ^2/2
where σ^2 = LOG(1 + S^2/M^2)
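As a sanity check, here's a Python sketch of that conversion, with a round trip back to the Mean and Variance. The function names `lognormal_params` and `lognormal_moments` are made up for illustration, and the sample values M = 1.1, S = 0.15 are just an example.

```python
import math


def lognormal_params(M, S):
    """Given the Mean M and SD S of a Log-normal set {y}, return (mu, sigma)
    for the associated Normal set {x} = {LOG(y)}."""
    sigma2 = math.log(1 + (S / M) ** 2)
    mu = math.log(M) - sigma2 / 2
    return mu, math.sqrt(sigma2)


def lognormal_moments(mu, sigma):
    """Go the other way: Mean and Variance of {EXP(x)} when {x} is N(mu, sigma^2)."""
    mean = math.exp(mu + sigma ** 2 / 2)
    variance = math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)
    return mean, variance


mu, sigma = lognormal_params(1.1, 0.15)
M_back, var_back = lognormal_moments(mu, sigma)

print(f"mu = {mu:.6f}, sigma = {sigma:.6f}")
print(f"round trip: Mean = {M_back:.6f}, SD = {math.sqrt(var_back):.6f}")
```

The round trip should hand back the Mean and SD we started with, which is a handy way to catch a slipped sign or a missing square.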
We want the fraction of returns less than x (where, for 12.3%, we put x = 0.123).
If the returns have Mean R and Standard Deviation S, then the Gain Factors have
Mean 1+R and the same Standard Deviation S, so in MS Excel we can use:
=LOGNORMDIST(1+x,LN(1+R)-LN(1+(S/(1+R))^2)/2,SQRT(LN(1+(S/(1+R))^2)))
For example, if R = 0.1 (or 10%) and S = 0.15 (or 15%), then for x = R+2S = 0.4 and
x = R-2S = -0.2 we get
LOGNORMDIST(1+0.4,LN(1+0.1)-LN(1+(0.15/(1+0.1))^2)/2,SQRT(LN(1+(0.15/(1+0.1))^2))) = 0.967
and
LOGNORMDIST(1-0.2,LN(1+0.1)-LN(1+(0.15/(1+0.1))^2)/2,SQRT(LN(1+(0.15/(1+0.1))^2))) = 0.011
so 96.7% - 1.1% = 95.6% of returns (should!) lie within two standard deviations of
the mean.
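If you don't have Excel handy, the same calculation can be sketched in Python. This relies on the fact that LOGNORMDIST(y, mean, sd) is the Normal cumulative distribution evaluated at LN(y); the function name `fraction_below` is made up for illustration.

```python
import math
from statistics import NormalDist


def fraction_below(x, R, S):
    """Fraction of returns less than x when the Gain Factors 1+r are
    Log-normal with Mean 1+R and Standard Deviation S."""
    M = 1 + R
    sigma = math.sqrt(math.log(1 + (S / M) ** 2))
    mu = math.log(M) - sigma ** 2 / 2
    # Same quantity as Excel's LOGNORMDIST(1+x, mu, sigma).
    return NormalDist(mu, sigma).cdf(math.log(1 + x))


R, S = 0.10, 0.15
upper = fraction_below(R + 2 * S, R, S)  # x = 0.4
lower = fraction_below(R - 2 * S, R, S)  # x = -0.2
within = upper - lower

print(f"below R+2S: {upper:.3f}")
print(f"below R-2S: {lower:.3f}")
print(f"within two SDs: {within:.3f}")
```

The three printed values should agree with the 0.967, 0.011 and 95.6% quoted above, to the precision shown there.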
We'll consider a set of N numbers, {g}, with a known Mean and Standard
Deviation
... like our 600 S&P monthly Gain Factors ... the Factors are always greater
than 0!
We'll see what the Cumulative and Density distributions would look like if the
set {g} were distributed Normally.
Then we'll see what the distribution functions would look like if the set {g}
were distributed Log-normally.
Then we'll compare the two graphs with the actual distribution of S&P gain
factors.
Then we'll stare at the three distributions ... for hours.
First off, we note that, for the 600 S&P monthly gains, the Mean is 0.0078 and the
Standard Deviation is 0.0404 so, for the Gain Factors, we have Mean = 1.0078 and
SD = 0.0404 and now
Finally, we have a chart with the distribution of monthly returns, for the S&P 500,
from 1926 to 2001 and Normal and Log-normal distributions with the same Mean
and Standard Deviation, and, in case you're wondering whether the fit is better
with a single stock rather than an index (like the S&P), I've included GE stock
(using the data for the past 40 years):
We should also note the effect of increasing
the Standard Deviation on your eventual
portfolio after 1, 2, 3, ... 15 years.
Q = log(P)
and P = e^Q
If we were to consider a monthly change in the stock price, say P2 - P1, in 1950,
then (staring at the blue S&P graph) we see that the same change would be
insignificant if it happened in the nineties.
However, that insignificance would not be the case for a change in the logarithm.
log(PN) - log(P0) = log(PN/P0)
over the N-month period.
>Example?
Me? Suggest Normal or Log-normal? Are you kidding ! Look at the comparison. Is
it good ?
Ta DUM
Normal, Log-normal and other assorted Distributions continued from Part II
We start with a jillion numbers: g1, g2, g3, ... gN where N is very large. For
convenience, we'll refer to this set as simply {g}.
>Wait ! What are we talking about here? Haven't we done this before?
Yes, but I want to talk about different distributions of, say, monthly stock gains and
>Okay, please proceed. Gee, thanks. Anyway, we count how many of these
numbers are less than x. It'll depend upon x, so we'll call it F(x).
>We'll call it ? What's it ? We're counting the number of g's less than, say, "2".
That number ... that's F(2). The number less than 1 we'll call F(1). The number
less than ...
>So N = 10,000.
The graph of F(x) could look like anything, except that it necessarily starts at the
value "0" on the far left and increases to the number of members of the set {g}.
Notice that we can calculate the number of g's between x=0 and x=2:
There are 6616 less than 2 and 4011 less than 0 so there are 6616-4011=2605
between x=0 and x=2.
Usually, we divide F(x) by the number of members of the set {g} - in our case
10,000 - so F(x) gives the fraction which are less than some given x.
The graph of this "new" F(x) would then go from 0 at the far left to 1 on the far
right and, for our example, the fraction less than 2 is 0.6616 and less than 0 is
0.4011 so 0.2605 or 26.05% of the g's lie between 0 and 2.
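Here's a minimal Python sketch of this counting function. The 10,000 numbers are randomly generated stand-ins (drawn from a Normal distribution with Mean 1 and SD 3 purely as an assumption), so the fractions it prints won't match the 0.6616 and 0.4011 quoted above.

```python
import random

random.seed(0)  # fixed seed so the made-up data is repeatable

# Stand-in data: 10,000 made-up numbers playing the role of the set {g}.
g = [random.gauss(1, 3) for _ in range(10_000)]


def F(x, values):
    """Fraction of the values that are less than x."""
    return sum(1 for v in values if v < x) / len(values)


# Fraction between 0 and 2 is the difference of two cumulative fractions.
between = F(2, g) - F(0, g)

print(f"F(0) = {F(0, g):.4f}")
print(f"F(2) = {F(2, g):.4f}")
print(f"fraction between 0 and 2 = {between:.4f}")
```

Whatever the data, F(x) starts at 0 on the far left, never decreases, and reaches 1 on the far right.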
>Are these numbers called g or are they called x ? It's confusing, I mean ...
Uh ... I'm calling the original numbers g, like g1, g2, etc. However, when I want to
talk about a particular g-value, I use the symbol x.
For example, I refer to the number of g's less than some particular value x. Clear ?
>No. Anyway, you've got some guy called F(x). Does he have a name ?
Note that the fraction lying in an interval of length Δx, at the place x, is f(x) Δx.
That's important to remember. I'll say it again:
The fraction lying in an interval of length Δx, at the place x, is f(x) Δx and
Σf(x) Δx = 1
>I take it that Σ means we add them all up and "1" means we've included 100% of
the g's.
Right. For small Δx, we write this sum as an integral and get our first magic
equation:
(1) ∫ f(x) dx = 1
>An integral ? Don't worry about it. It's a wee bit of calculus.
Okay, now we want to know the average value of the g's. We do this like so:
1. Suppose we're determining the average grade on a test and we know that a
fraction 0.1 of the students (that's 10%) got a grade of 45, 0.3 got a grade of
65 and 0.6 got a grade of 85.
2. The average grade is 0.1(45) + 0.3(65) + 0.6(85) = 75
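The grade example is a weighted average: each value is multiplied by the fraction of the set that has that value. A minimal Python sketch (the name `weighted_average` is made up for illustration):

```python
def weighted_average(fractions, values):
    """Average of values where fractions[i] of the set has the value values[i]."""
    return sum(f * x for f, x in zip(fractions, values))


fractions = [0.1, 0.3, 0.6]  # these should add up to 1 (100% of the students)
grades = [45, 65, 85]

average = weighted_average(fractions, grades)
print("sum of fractions:", sum(fractions))
print("average grade:", average)
```

The same recipe, fraction times value added over all values, is what gets applied to the g's.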
>Don't tell me! To get the average of the g's we'd determine the fraction having
the value x1, say n1, the fraction having the value x2, say n2, etc., then ... uh ...
we'd calculate n1(x1) + n2(x2) + ...
Very good. You've been eating your smart pills. Okay, for the case we're
considering, where the fraction lying near x is f(x) Δx, the Mean of the g's is:
(2) m = ∫ x f(x) dx
Now we want to measure how far the g's are from their Mean, m. We calculate the
average of the squared deviations:
(3) S^2 = ∫ (x - m)^2 f(x) dx
>And S is called ... what ? S, the Root Mean Square (or RMS) deviation is called
the Standard Deviation.
Here's a picture:
The funny guys (like 2π) are there so that equation (1) is satisfied.
Also, the graphs we've used above are from a Normal Distribution with m = 1 and
S = 3.
The next is the Log-normal Distribution. In this case, we assume that the
logarithms of the g's have a Normal Distribution.
Because we're considering log(g1), log(g2), etc., the numbers g1, g2, etc. had
better be positive !
>Why ? Because log(g) isn't defined (as a real number) unless g > 0.
Since the Gain Factors are always positive - assuming you don't lose everything in
one month (!) - then we can consider the distribution of their logarithms.
>I assume that the logarithm is the natural log. Yes, to the base e = 2.71828,
roughly. Anyway, here's the picture.
Note that f(z) has a part (z-m) and since z=log(x) and m=log(G) we get
(z-m)=log(x)-log(G)=log(x/G).
Put z-m=log(x/G)
Since we must satisfy equation (1), above, then changing from z to x
requires changing Δz to (1/x)Δx
(that is, dz = dx/x) so we must change f Δz to f Δx/x
We get, finally f(x), our density distribution for x (as opposed to, f(z), the
distribution for z):
(5) Log-normal Distribution, with x > 0 since x is now a Gain Factor ... and S
is now the Standard Deviation of log(x) !
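Here's a Python sketch of such a density, assuming equation (5) has the standard Log-normal form f(x) = EXP(-[log(x/G)]^2 / (2 S^2)) / (x S SQRT(2π)), which is what the substitutions described above lead to. The values G = 1.005 and S = 0.04 are made up, and the function names are just for illustration. It also checks numerically that the total area is 1, as equation (1) demands.

```python
import math


def lognormal_density(x, G, S):
    """Assumed standard Log-normal density: G is the geometric Mean of the
    Gain Factors and S is the Standard Deviation of log(x)."""
    z = math.log(x / G)
    return math.exp(-z * z / (2 * S * S)) / (x * S * math.sqrt(2 * math.pi))


def total_area(G, S, steps=20_000):
    """Midpoint-rule estimate of the area under the density for x > 0.
    Steps evenly in z = log(x/G) over +/- 8 SDs, using dx = x dz."""
    lo, hi = -8 * S, 8 * S
    dz = (hi - lo) / steps
    area = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * dz
        x = G * math.exp(z)
        area += lognormal_density(x, G, S) * x * dz
    return area


G, S = 1.005, 0.04  # made-up geometric Mean and SD of the logs
area = total_area(G, S)
print(f"area under the density = {area:.8f}")
```

Notice the factor x in `dx = x dz` inside the integration loop: it's the same 1/x bookkeeping, run in reverse, that turns f Δz into f Δx/x.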
>We must change f Δz to f Δx/x ? What's that about ?
It's because dz = d(log(x)) = (1/x) dx ... but don't worry about it.
>You promised just a wee bit of calculus ! Yes ... uh, it's because of the logarithm,
you see.
If x changes by a tiny amount from x to x+Δx, then its logarithm will change from
log(x) to log(x+Δx) = log(x[1+Δx/x]) = log(x) + log(1+Δx/x) and, for tiny values of
Δx/x, log(1+Δx/x) ≈ Δx/x so the logarithm will change by a tiny amount Δx/x and
we can see that x in the denominator so that ...
>Number 3 is our Log-normal distribution, right ? Right. Notice that, for the Log-
normal distribution, the geometric Mean, G, plays the central role (unlike the
Normal distribution where it's the arithmetic Mean).
But if we choose the Mean m and Standard Deviation S to match S&P 500
returns, neither distribution is a very good match.
>So?
where the horizontal axis corresponds to the Gain Factors, not the gains themselves,
so 0.8 means a gain of 0.8 - 1 = -0.2 or a 20% loss and 1.1 means a 10% gain and ...
>Yeah, yeah. I got it.
>But I thought that individual stocks are supposed to be lognormal ... not the S&P
500.
Yes. Some say that individual stocks are more closely approximated by the
lognormal distribution than a collection of stocks. For example, here are some
samples, for comparison:
>The density distribution is rather erratic ?
However, I'd like to consider a distribution which approximates (for example) the
S&P distribution:
the Mystery Distribution ... continued from Part III
This requires that A = k/2, so that our function now looks like:
Further, we'll borrow from the Log-normal scheme and assume that the logarithms
of the Gain Factors have this distribution.
(3) f(x) = (k/2x) e^(-k |log(x/G)|) where x > 0 since x is now a Gain Factor
>Don't tell me ! m is now the average of the logs, so m=log(G) where ...
Yes, as in Part III, G is the geometric Mean of the numbers {g} whose distribution
we're discussing.
>And you're doing that funny dz = (1/x) dx because of the logarithm so that f gets
divided by x and ... Of course.
>So if the numbers we're discussing are g1, g2, etc. then ...
... then G = (g1 g2 ... gN)^(1/N)
>Where x is the Gain Factor, namely 1+Gain Fraction so it's always positive, and
G is ...the Geometric Mean of the Gain Factors which are always greater than
zero.
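Here's a small Python sketch of the geometric Mean, computed three ways on made-up Gain Factors: the N-th root of the product, the exponential of the average log (which is why m = log(G)), and the standard library's `statistics.geometric_mean`.

```python
import math
from statistics import geometric_mean

# Made-up Gain Factors (all positive, as they must be).
gain_factors = [1.0123, 0.9790, 1.0345, 1.0050, 0.9600, 1.0280]
N = len(gain_factors)

# N-th root of the product g1 g2 ... gN.
G_product = math.prod(gain_factors) ** (1 / N)

# Exponential of the average of the logs: m = log(G).
m = sum(math.log(g) for g in gain_factors) / N
G_logs = math.exp(m)

# Standard-library version, for comparison.
G_stdlib = geometric_mean(gain_factors)

print(f"{G_product:.10f}  {G_logs:.10f}  {G_stdlib:.10f}")
```

For a long series the log route is the safer one numerically, since a product of hundreds of factors can drift a long way from 1 before the root is taken.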
>So what's the magic formula for f(x)? It's the slope of F(x), right ?
f(x) = (k/2x) (x/G)^k when 0 < x < G
     = (k/2x) (x/G)^(-k) when x > G
or
f(x) = (k/2G) (x/G)^p
where p = k-1 when 0 < x < G and p = -k-1 when x > G
and x is a Gain Factor and G is the Geometric Mean of the Gain Factors
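Here's a Python sketch of this density in both of the forms given above, plus a cumulative function. The cumulative formula is my own integration of the density (it isn't taken from the text), and the values k = 25 and G = 1.005 are made up for illustration.

```python
def mystery_density(x, k, G):
    """First form: (k/2x)(x/G)^k below G, (k/2x)(x/G)^(-k) above G."""
    if x <= 0:
        raise ValueError("x is a Gain Factor, so it must be positive")
    if x < G:
        return (k / (2 * x)) * (x / G) ** k
    return (k / (2 * x)) * (x / G) ** (-k)


def mystery_density_alt(x, k, G):
    """Second form: (k/2G)(x/G)^p with p = k-1 below G and p = -k-1 above G."""
    p = k - 1 if x < G else -k - 1
    return (k / (2 * G)) * (x / G) ** p


def mystery_cumulative(x, k, G):
    """Fraction of Gain Factors less than x, found by integrating the density
    (a derivation added here, not a formula from the text)."""
    if x < G:
        return 0.5 * (x / G) ** k
    return 1 - 0.5 * (x / G) ** (-k)


k, G = 25.0, 1.005  # made-up parameter values
for x in (0.9, 1.0, 1.1, 1.2):
    print(x, mystery_density(x, k, G), mystery_cumulative(x, k, G))
```

Half of the distribution lies below G and half above it, and the cumulative function climbs to 1 for large x, so the total area is 1 as required.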
Here's another:
Undoubtedly, you'll have noticed that, when x is far from its mean (which is
denoted by m), this distribution function has an exponential decrease similar to the
first Mystery Distribution (in equation (1), above). In fact, for large values of |x-m|,
our new distribution looks like: A e^(-k|x-m|), which means ... which means we've still
got those bigger tails.
>So, what do you intend to do with this stuff ? Hmmm ... give me a minute to think
about it.