
Normal vs Log-normal

It has been said that monthly (daily? yearly?) returns are Normally distributed.

That is, the frequencies with which the various returns occur lie on a "Normal" curve.

We want to understand what this means and to determine whether such an assumption is valid or simply a mechanism for applying some esoteric mathematical ritual.

A wise man once said: "What faith one should place in the conclusions drawn
from such assumptions is a religious argument."

First we consider 600 numbers (which we'll express as a percentage because we'll eventually be considering monthly percentage gains in the S&P 500 index).

We suppose these percentages all lie between -12% and +14%. Here's what we'll
do:

- We count the number of percentages less than -12%: 0
- We count the number of percentages less than -10%: 2
- We count the number of percentages less than -8%: 7
- We count the number of percentages less than -6%: 25
- etc., etc.
- We count the number of percentages less than +14%: 600

Now we plot the points (-12,0), (-10,2), (-8,7), etc., (+14,600), each indicated by a wee red square, and we get Fig. 1 (where we've also drawn a smooth curve joining these points).

Fig. 1
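The counting step can be sketched in a few lines of Python. This is only an illustration: `cumulative_counts` is a hypothetical helper, and the five returns are made-up numbers, not S&P data.

```python
from bisect import bisect_left


def cumulative_counts(returns_pct, edges):
    """Return (edge, count) pairs: how many returns lie strictly below each edge."""
    ordered = sorted(returns_pct)
    # In a sorted list, bisect_left(ordered, edge) is the number of items < edge.
    return [(edge, bisect_left(ordered, edge)) for edge in edges]


sample = [-11.0, -9.0, 0.0, 1.0, 13.0]  # hypothetical monthly returns, in percent
edges = list(range(-12, 15, 2))         # -12, -10, ..., +14
points = cumulative_counts(sample, edges)
print(points[:3], points[-1])  # [(-12, 0), (-10, 1), (-8, 2)] (14, 5)
```

Plotting the `points` pairs gives a cumulative curve like the one in Fig. 1.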
The Average (or Mean) of the 600 percentages is 1%, in this example, and half
of the 600 (that's 300) are less than the Mean.

This is the Cumulative distribution for our 600 percentages (and, for this
example, we've actually plotted a Normal Cumulative distribution - so we can see
what it looks like).

Now let's consider the number of these 600 percentages which lie in the intervals:
(-12% to -10%) (-10% to -8%) (-8% to -6%) ... (+12% to +14%)

We can actually get these numbers from Fig. 1.

For example, if 460 are less than 4% and 355 are less than 2%, then 460 - 355 =
105 lie between 2% and 4%.

We then plot a point at 3% (half-way between 2% and 4%) and give it the value
105. That's one point in our graph (shown in blue).

The points, complete with smooth curve, are shown in Fig. 2.

Fig. 2

This is the familiar (infamous?) Bell Curve and gives the Density distribution for
our 600 percentages.

Note: If we count the number in (-12% to -10%), the number in (-10% to -8%), etc., and add all these numbers, we get ... 600!
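Getting the Density points from the Cumulative points is just subtraction, as in the 460 - 355 = 105 example. A minimal sketch (the helper name `interval_counts` is hypothetical) that also shows why the interval counts add back up to the total:

```python
def interval_counts(cumulative):
    """Turn (edge, count_below) pairs into (midpoint, count_in_interval) pairs."""
    pairs = []
    for (lo, below_lo), (hi, below_hi) in zip(cumulative, cumulative[1:]):
        # Everything below hi but not below lo lies in the interval [lo, hi).
        pairs.append(((lo + hi) / 2, below_hi - below_lo))
    return pairs


# The two cumulative points quoted above: 355 below 2% and 460 below 4%.
print(interval_counts([(2, 355), (4, 460)]))  # [(3.0, 105)]

# The interval counts telescope, so they add up to last count minus first count.
hypothetical = [(-12, 0), (0, 300), (14, 600)]
print(sum(n for _, n in interval_counts(hypothetical)))  # 600
```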

Okay, it's time to talk about the actual monthly percentage gains for the S&P 500
... namely, the 600 monthly returns from January, 1950 through December, 1999.
They range from roughly -14% to +14%, so we count the number of returns less
than -14%, then the number less than -12%, then less than -10%, etc. etc. and we
plot these points and we get - voila! - Fig. 3 which shows the distributions:
Cumulative (on the left) and Density (on the right).

Fig. 3a and Fig. 3b

In each case there's a thin red curve which is an actual, real, live Normal Curve
(as opposed to the distribution of S&P 500 returns).

Stare at Fig. 3 for a while and one starts to believe that these monthly returns are,
indeed, normally distributed.

>Are you kidding ? The Density guy is way off ! Isn't there some better fit ? Some
theoretical distribution which gives a better match? One that ...?

There's also a Log-normal distribution which is quite popular among analysts.


Before we get to that, however, let's look more carefully at the Normal distribution.

If we consider the range of returns from Mean - Standard Deviation to Mean + Standard Deviation, the real, live Normal distribution will have 68% lying in that interval. (See SD stuff.)

For the S&P500, the mean monthly return is 0.78% and the SD is 4.04% so the
range is from 0.78 - 4.04 = -3.3% to 0.78 + 4.04 = 4.8% and, as it turns out, 73%
of the monthly returns for the S&P lie in that interval.
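Here is a sketch of the one-standard-deviation check. It assumes the population standard deviation (the text doesn't say which estimator was used), the helper names are hypothetical, and only toy data appear here; applied to the 600 S&P returns it is the kind of calculation behind the 73% quoted.

```python
import math
import statistics


def fraction_within(values, k=1.0):
    """Fraction of values lying within k standard deviations of their mean."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)  # population SD: an assumption, the text doesn't say
    return sum(1 for v in values if abs(v - m) <= k * s) / len(values)


def normal_fraction_within(k=1.0):
    """The same fraction for a real, live Normal distribution: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


print(round(normal_fraction_within(1), 4))  # 0.6827
print(fraction_within([1, 2, 3, 4, 5]))     # 0.6 (toy data)
```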

Actually, you can sorta see that from Fig. 3b where the returns are more closely
scrunched near their Mean, compared to the red, Normal distribution.

Further, the range of monthly returns is from -100% to + infinity (meaning
unlimited upside gains) whereas the Normal density function expects a range from
- infinity to + infinity.

>Is that bad?
Not bad. It just means that assuming a Normal distribution for the S&P is just an approximation, and approximations are just ... uh ... approximations.

Let's move on to the Log-normal distribution.

>Wait! What info do I need to plot a Normal graph? Do I need hundreds of points or ...?
Actually, you just need the Mean and Standard Deviation of the distribution and, armed with these 2 numbers, the Normal distribution is defined.

Let's move on to the Log-normal distribution. To understand it we do a bit of math ...

We do this:

1. Let \(r_1, r_2, \ldots, r_N\) denote N monthly returns, where r = 0.0123 corresponds to a monthly return of 1.23%.
2. Assume that \(-1 < r_n < \infty\), i.e., that no month will reduce our investment to zero ... or less!
3. Let \(g_1, g_2, \ldots, g_N\) denote the monthly Gain Factors, where \(g_n = 1 + r_n\).
4. Then \(0 < g_n = 1 + r_n < \infty\).
5. Then \(-\infty < \log(g_n) < \infty\), so the logs have the appropriate range for a Normal distribution!
6. Consider the distribution of the N logarithms, \(x_n = \log(g_n)\).
7. Assume that the \(x_n\) are Normally distributed.
8. Then we say that the \(g_n\) have a Log-normal distribution.
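Steps 1 to 6 amount to one transformation. A minimal sketch, with a hypothetical helper name and made-up returns, that refuses returns of -100% or worse (step 2):

```python
import math


def log_gain_factors(returns):
    """Steps 1-6: decimal returns r (0.0123 for 1.23%) -> x = log(1 + r)."""
    for r in returns:
        if r <= -1:
            # Step 2: a return of -100% or worse makes g = 1 + r <= 0, so log(g) is undefined.
            raise ValueError(f"return {r} wipes out the investment")
    # math.log1p(r) computes log(1 + r) accurately even when r is tiny.
    return [math.log1p(r) for r in returns]


xs = log_gain_factors([0.0123, -0.05, 0.10])
print(xs)
```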

If the set of logarithms, \(\log(g_n)\), is Normally distributed, then we say that the set \(g_n\) has a Log-normal distribution.

In other words:

If a set {g} has a Log-normal distribution, then it means that \(g = e^x\) where the set {x} has a Normal distribution.

>You forgot your subscripts.
Let me leave them out, for sanitary reasons.

So instead of saying \(g_1, g_2, \ldots, g_N\) I'll just say {g} and instead of saying \(g_n = 1 + r_n\) I'll just say g = 1+r and instead of ...

>Okay, I get the idea. And what about the S&P500 ?

Okay. Here's what I did:

- I looked at all 600 monthly returns, {r}, expressed as a decimal, like 0.0123, instead of a percentage, like 1.23%
- then the monthly Gain Factors {g} = {1+r}
- then their logarithms {log(g)} (the natural log, to the base e, tho' that isn't important)
- then calculated the Mean and Standard Deviation of these 600 logs
- then plotted the Cumulative and Density distributions for these 600 logs, as described above
- and compared them to a real, live Normal distribution with the same Mean and Standard Deviation, cuz if these Gain Factors had a Log-normal distribution, their logs should have a Normal distribution
- and I got:

Fig. 4a and Fig. 4b

>That's pretty lousy, and besides, plotting against a logarithm seems sort of ...

Okay, here's a picture of a Log-normal and a Normal distribution, having nothing to do with the S&P500.

Just an example (plotting the percentage of values, unlike Fig. 4 which plots the
number) to show how the Log-normal Density distribution is skewed to the right
with a fatter tail - that's kurtosis - whereas the Normal distribution is nicely
distributed about the mean which, in this example, happens to be 3.0.

Fig. 5a and Fig. 5b

>Kurtosis? My doctor said ...
One neat thing about considering the logs ...

>Wait! You said the Mean for Fig. 5 was 3.0 yet the Log-normal curve lies mostly below the mean (I mean to the left of the mean), but only half of the points, that's 50% of 'em, should lie below the mean, right?
Wrong. For the Log-normal distribution, more than 50% lie below the mean because of the skewness ...

>Is that possible?
The 4 numbers 4, 4, 4 and 100 have a mean of 28, yet more than half of these numbers are below the mean. In fact, 3/4 of them are below their mean.
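Both claims can be checked in a few lines. The Log-normal part is a sketch built on the Mean formula EXP(μ+σ²/2) that appears among the properties listed further on: since y is below its Mean exactly when log(y) is below μ+σ²/2, the fraction below the Mean is the standard Normal cumulative value at σ/2, which always exceeds 50%. The function names are hypothetical.

```python
import math
import statistics


def fraction_below_mean(values):
    """Fraction of values strictly below their arithmetic mean."""
    m = statistics.mean(values)
    return sum(1 for v in values if v < m) / len(values)


def lognormal_fraction_below_mean(sigma):
    """Fraction of a Log-normal set below its Mean, when log(y) is N(mu, sigma^2).

    y < EXP(mu + sigma^2/2) exactly when log(y) < mu + sigma^2/2, i.e. when a
    standard Normal variable is below sigma/2, so the answer does not depend on mu.
    """
    return 0.5 * (1 + math.erf((sigma / 2) / math.sqrt(2)))


print(fraction_below_mean([4, 4, 4, 100]))           # 0.75
print(round(lognormal_fraction_below_mean(0.5), 4))  # 0.5987
```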

Anyway, one neat thing about considering the logs of the monthly ...

>Wait ! What about the S&P500 monthly gains ?

48% of the 600 monthly gains are less than their mean ... which is 0.78%.

Anyway, one neat thing about considering the logs of the monthly gain factors (instead of the gain factors themselves) is that $1.00 invested in the S&P500 grows to \(g_1\) after one month, then \(g_1 g_2\) after 2 months, etc., and \(g_1 g_2 \cdots g_N\) after N months and, since \(g = e^x\) where \(x = \log(g)\), we get the neat result:

\[ \text{N-month Gain Factor} = e^{x_1 + x_2 + \cdots + x_N} \]

or, more succinctly,

\[ \text{N-month Gain Factor} = e^{\sum x} \]
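A quick numerical check that multiplying the Gain Factors and exponentiating the sum of their logs give the same N-month Gain Factor. The four monthly Gain Factors are hypothetical.

```python
import math

gains = [1.02, 0.97, 1.05, 1.01]   # hypothetical monthly Gain Factors
xs = [math.log(g) for g in gains]  # x = log(g)

product = math.prod(gains)    # g1 g2 ... gN  (math.prod needs Python 3.8+)
via_logs = math.exp(sum(xs))  # e^(x1 + x2 + ... + xN)
print(product, via_logs)      # equal, up to rounding error
```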

Note: Here's something interesting.

- If each x is the log of a yearly Gain Factor, the equivalent Gain Factor for a single year is then \(e^{(1/N)\sum x} = e^{\text{Mean}[x]}\), where Mean[x] is the average, or Mean, of the x's.
- The Annualized Gain is then \(e^{\text{Mean}[x]} - 1\).
- If the x's are Normally distributed, then 50% of them are above Mean[x] and 50% below.
- Hence 50% of the yearly Gains, \(e^x - 1\), are above and 50% below \(e^{\text{Mean}[x]} - 1\).
- So the Median yearly Gain is the Annualized Gain, \(e^{\text{Mean}[x]} - 1\).

>You forgot your subscripts.

Pay attention. The set of numbers {x} = \(x_1, x_2, x_3, \ldots\)

>I was going to ask about them. What are they?

... are very nearly the actual monthly returns: \(r_1, r_2\), etc.

That's because g = 1+r is close to 1 (unless the monthly gain is huge) and, since \(g = e^x\), it means that x is close to zero, and that means that \(e^x\) is very nearly equal to 1 + x ... see the graph of \(y = e^x\) and y = 1+x ... so that means ...

>Don't tell me! g = 1+r and g = 1+x, nearly. So x = r, nearly.
You got it.
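To see how close x = log(1+r) stays to r for monthly-sized returns, here is a small sketch; the gap x - r is roughly -r²/2, tiny for a 1% month and noticeable only for the ±14% extremes. The helper name is hypothetical.

```python
import math


def log_vs_return(r):
    """Return x = log(1 + r) and the gap x - r (roughly -r**2 / 2 for small r)."""
    x = math.log1p(r)
    return x, x - r


for r in (0.01, 0.04, -0.04, 0.14, -0.14):  # decimal monthly returns
    x, gap = log_vs_return(r)
    print(f"r = {r:+.2f}   x = {x:+.5f}   x - r = {gap:+.5f}")
```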

>Hold on! We don't like assuming that the monthly returns, r, are Normally
distributed so we turn to a Log-normal distribution where we assume that the set
{x} is Normally distributed and then find that x = r, very nearly, so is the analysis
so precise that ... ?

Good question ! I suspect that, as much as anything, it's the neat mathematics
that follows from a Log-normal distribution that prompts people to assume Log-
normal. Let me explain:

>Hold on! If Log-normal assumes that log(g) is Normally distributed then I assume that the number lying within one standard deviation of their mean is closer to 68%, like the real, live Normal ...

Good point. Actually, 73% of the 600 log(g) values lie within one Standard Deviation of their Mean, so ...

>But that's the same as the plain Jane Normal, right?
Actually, for the gains themselves, to one decimal place, it's 72.7% and for their logs it's 73.2% so ...

>So, assuming that the gains are Normal is better than assuming they're Log-normal, right?
Well ... if the 68% is your only goal in life and the S&P your only investment and those 50 years from 1950 to 1999 your only time period, but there are other things to consider.

>Like what?
Okay, but first we should identify certain properties of both Normal and Log-normal distributions.

We'll consider a Normally distributed set {x} with Mean = μ and Variance = \(\sigma^2\) (so Standard Deviation = σ).

We denote such a Normal distribution by \(N(\mu, \sigma^2)\).

We'll also use EXP(x) to mean \(e^x\) and SQRT(x) to denote the square root of x and, of course, we'll use {x} to represent a collection (maybe 600?) of numbers and LOG to mean the natural logarithm and ...

If a and b are constants, then the set {a + bx} is \(N(a + b\mu, b^2\sigma^2)\) ... it has Mean = a + bμ, Variance = \(b^2\sigma^2\).

1. {y} is a Log-normal distribution if y = EXP(x) where {x} is Normal: \(N(\mu, \sigma^2)\).
2. The Mean of {y} = {EXP(x)} is \(\exp(\mu + \sigma^2/2)\) and its Variance is \(\exp(2\mu + \sigma^2)\,[\exp(\sigma^2) - 1]\).
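Property 2 can be checked against a simulation. A sketch with illustrative parameters μ = 0, σ = 0.5 (not S&P values); `random.lognormvariate(mu, sigma)` draws EXP(x) with x ~ N(μ, σ²), and the fixed seed only makes the run repeatable.

```python
import math
import random


def lognormal_mean_var(mu, sigma):
    """Property 2: Mean and Variance of y = EXP(x) when x is N(mu, sigma^2)."""
    mean = math.exp(mu + sigma**2 / 2)
    var = math.exp(2 * mu + sigma**2) * (math.exp(sigma**2) - 1)
    return mean, var


mu, sigma = 0.0, 0.5  # illustrative parameters
mean, var = lognormal_mean_var(mu, sigma)

# Rough Monte Carlo check; the fixed seed makes the run repeatable.
random.seed(1)
ys = [random.lognormvariate(mu, sigma) for _ in range(100_000)]
mc_mean = sum(ys) / len(ys)
mc_var = sum((y - mc_mean) ** 2 for y in ys) / len(ys)
print(f"formula: {mean:.4f} {var:.4f}   simulated: {mc_mean:.4f} {mc_var:.4f}")
```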

Notice that the Mean of the set {y} is greater than zero ! In fact, this set describes
the Gain Factors for a set of returns. (For a return of 0.123 the Gain Factor is
1.123, namely the value of $1.00 after applying the gain.)

Okay. Suppose we have a Log-normal set {y} and we know its Mean = M and Variance = \(S^2\) (where S is the Standard Deviation). We need to find values of μ and σ so that, using 2, above, we can identify the associated Normal set {x}. That means that:

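\[ \exp(\mu + \sigma^2/2) = M \]
\[ \exp(2\mu + \sigma^2)\,[\exp(\sigma^2) - 1] = S^2 \]

Solving, we get the Magic Formula:

\[ \mu = \log(M) - \sigma^2/2, \qquad \text{where } \sigma^2 = \log(1 + S^2/M^2) \]

Note: \(\mu = \log(M) - \tfrac{1}{2}\log(1 + S^2/M^2) = \tfrac{1}{2}\log\left[ M^2/(1 + S^2/M^2) \right]\)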
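The Magic Formula as a small function, with a round trip through property 2 to confirm that it recovers M and S. The name `normal_params_from_lognormal` and the values M = 1.10, S = 0.15 are illustrative.

```python
import math


def normal_params_from_lognormal(M, S):
    """Magic Formula: from the Mean M and SD S of a Log-normal set {y},
    return (mu, sigma) for the associated Normal set {x} = {LOG(y)}."""
    sigma2 = math.log(1 + (S / M) ** 2)
    mu = math.log(M) - sigma2 / 2
    return mu, math.sqrt(sigma2)


M, S = 1.10, 0.15  # illustrative: Mean Gain Factor 1.10, SD 0.15
mu, sigma = normal_params_from_lognormal(M, S)

# Round trip through property 2 recovers M and S.
mean_back = math.exp(mu + sigma**2 / 2)
sd_back = math.sqrt(math.exp(2 * mu + sigma**2) * (math.exp(sigma**2) - 1))
print(round(mu, 4), round(sigma, 4), mean_back, sd_back)
```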


For example, we consider a collection of returns with Mean Return = R (so the
Mean Gain Factor is M = 1 + R, which, as required, is greater than "0") and
Standard Deviation = S and we assume a Log-normal distribution.

We want the fraction of returns less than x (where, for 12.3%, we put x = 0.123).
In MS Excel, we can use:

=LOGNORMDIST(1+x,LN(1+R)-LN(1+(S/(1+R))^2)/2,SQRT(LN(1+(S/(1+R))^2)))

For example, if R = 0.1 (or 10%) and S = 0.15 (or 15%) and x = R+2S=0.4 and x =
R-2S=-0.2, then

LOGNORMDIST(1+0.4,LN(1+0.1)-LN(1+(0.15/(1+0.1))^2)/2,SQRT(LN(1+(0.15/(1+0.1))^2))) = 0.967
and
LOGNORMDIST(1-0.2,LN(1+0.1)-LN(1+(0.15/(1+0.1))^2)/2,SQRT(LN(1+(0.15/(1+0.1))^2))) = 0.011

so 96.7% - 1.1% = 95.6% of returns (should!) lie within two standard deviations of
the mean.
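The same two-standard-deviation calculation outside Excel: a sketch that builds the Log-normal cumulative value from the Normal one via `math.erf`, in the role of the LOGNORMDIST formula above. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard Normal cumulative distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def fraction_of_returns_below(x, R, S):
    """Fraction of returns below x, assuming Log-normal Gain Factors with
    Mean M = 1 + R and SD S (all as decimals: 0.123 for 12.3%)."""
    M = 1 + R
    sigma2 = math.log(1 + (S / M) ** 2)  # Magic Formula
    mu = math.log(M) - sigma2 / 2
    return normal_cdf((math.log(1 + x) - mu) / math.sqrt(sigma2))


R, S = 0.10, 0.15
hi = fraction_of_returns_below(R + 2 * S, R, S)
lo = fraction_of_returns_below(R - 2 * S, R, S)
print(round(hi, 3), round(lo, 3), round(hi - lo, 3))  # 0.967 0.011 0.956
```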

Here's what we'll do:

- We'll consider a set of N numbers, {g}, with a known Mean and Standard Deviation ... like our 600 S&P monthly Gain Factors ... the Factors are always greater than 0!
- We'll see what the Cumulative and Density distributions would look like if the set {g} were distributed Normally.
- Then we'll see what the distribution functions would look like if the set {g} were distributed Log-normally.
- Then we'll compare the two graphs with the actual distribution of S&P gain factors.
- Then we'll stare at the three distributions ... for hours.

First off, we note that, for the 600 S&P monthly gains, the Mean is 0.0078 (that's 0.78%) and the Standard Deviation is 0.0404 so, for the Gain Factors, we have Mean = 1.0078 and SD = 0.0404, and now

- we generate a Normal distribution with Mean = 1.0078, SD = 0.0404, and
- a Log-normal distribution, by looking carefully at the Magic Formula, above, and identifying the associated Normal distribution, which has
  SD = \(\sqrt{\log(1 + 0.0404^2/1.0078^2)} \approx 0.0401\) and
  Mean = \(\log(1.0078) - (0.0401)^2/2 \approx 0.0070\)

and we get, voila:
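A sketch of that comparison for the S&P numbers, using the 0.78% mean monthly return and 4.04% SD quoted earlier (so M = 1.0078, S = 0.0404). It evaluates the Normal and the Log-normal cumulative distributions at a few Gain Factors; the function names are hypothetical. At the Mean, the Normal gives exactly 50% while the Log-normal gives a bit more.

```python
import math


def normal_cdf(z):
    """Standard Normal cumulative distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


M, S = 1.0078, 0.0404  # monthly Gain Factor: Mean 1 + 0.78%, SD 4.04%

# Associated Normal set {LOG(g)}, from the Magic Formula
sigma = math.sqrt(math.log(1 + (S / M) ** 2))
mu = math.log(M) - sigma**2 / 2


def normal_fraction_below(g):
    """Fraction of Gain Factors below g if {g} were Normal with Mean M and SD S."""
    return normal_cdf((g - M) / S)


def lognormal_fraction_below(g):
    """Fraction below g if {g} were Log-normal with the same Mean and SD."""
    return normal_cdf((math.log(g) - mu) / sigma)


print(f"sigma = {sigma:.4f}, mu = {mu:.4f}")
for g in (0.90, 0.95, 1.00, M, 1.05, 1.10):
    print(f"{g:.4f}  Normal {normal_fraction_below(g):.4f}  "
          f"Log-normal {lognormal_fraction_below(g):.4f}")
```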

Finally, we have a chart with the distribution of monthly returns, for the S&P 500,
from 1926 to 2001 and Normal and Log-normal distributions with the same Mean
and Standard Deviation, and, in case you're wondering whether the fit is better
with a single stock rather than an index (like the S&P), I've included GE stock
(using the data for the past 40 years):

Okay, which is the better fit ? Normal or Log-normal ?

I leave you with this:

We should also note the effect of increasing
the Standard Deviation on your eventual
portfolio after 1, 2, 3, ... 15 years.

If we look at the range of possible portfolios (within 2 standard deviations of the Mean Return), assuming (for example) a Log-normal distribution and SD = 15% and SD = 25% ... we get
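One way to sketch such a range, under assumptions the text doesn't spell out: annual Gain Factors are independent and Log-normal, the Mean annual return is a hypothetical 10%, and "within 2 standard deviations" is measured on the logarithm of the portfolio. Under those assumptions the log of the N-year Gain Factor is Normal with N times one year's log-mean and N times its log-variance. The function name is hypothetical.

```python
import math


def portfolio_range(R, S, years, start=1.0):
    """Range of a portfolio after `years` years, 2 standard deviations either
    side of the centre, measured on the logarithm. Assumes (hypothetically)
    independent Log-normal annual Gain Factors with Mean 1 + R and SD S."""
    M = 1 + R
    sigma2 = math.log(1 + (S / M) ** 2)     # Magic Formula
    mu = math.log(M) - sigma2 / 2
    centre = years * mu                     # log of the N-year Gain Factor is
    spread = 2 * math.sqrt(years * sigma2)  # N(years*mu, years*sigma2)
    return start * math.exp(centre - spread), start * math.exp(centre + spread)


for sd in (0.15, 0.25):  # the two SDs in the text
    for years in (1, 5, 10, 15):
        low, high = portfolio_range(0.10, sd, years)  # 10% Mean: hypothetical
        print(f"SD {sd:.0%}  {years:2d} yr:  ${low:,.2f} to ${high:,.2f}")
```

The wider SD spreads the range much faster than the centre grows, which is the effect the chart is meant to show.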

Normal vs Log-normal: a continuation of Part I

We continue with our study of the S&P 500 to determine whether the monthly returns are Normal or Log-normal or ... whatever.

However, now we'll look not at the fractional changes in price (as we did in Part I) but at the price itself.

Recall that we used r and g=1+r to denote fractional changes in price and gain factor (respectively).

That is, if a stock (our S&P500) goes from price P1 (this month) to P2 (next month), then

r = P2/P1 - 1 and g = P2/P1

Fig. 1

In Fig. 1 there are two graphs: the S&P500 and its (natural, to-the-base-e) logarithm. If, at some time t, the logarithm has a value Q and the S&P has the value P, then

Q = log(P)
and \(P = e^Q\)

If we were to consider a monthly change in the stock price, say P2 - P1, in 1950,
then (staring at the blue S&P graph) we see that the same change would be
insignificant if it happened in the nineties.

However, that insignificance would not be the case for a change in the logarithm.

Indeed: log(P2) - log(P1) = log(P2/P1) = log(g)

Fig. 2 shows the set {g} of monthly gain factors for the S&P500; the distribution doesn't change much?

We'll consider N months and let P1, P2, P3, ... PN denote the N end-of-month stock prices, and use P0 as the starting price (9:30 am ET on Jan 1, 1950, for our S&P example, where N=600). We can also use Q1, Q2, etc. to denote their logarithms.
Fig. 2
>Can you show the distribution of log(g) ?
Here it is

If we now calculate the total change in log(P) we get:

\[ \log(P_N) - \log(P_0) = \log(P_N/P_0) \]

and we may recognize \(P_N/P_0\) as the N-month gain factor, \((P_N/P_0)^{1/N}\) as the equivalent per-month gain factor and \((P_N/P_0)^{12/N}\) as the equivalent per-year (or annualized) gain factor ... over the N-month period.

>Example?

- \(P_{600}/P_0 = 1469.25/16.66 = 88.19\) (the S&P grew by a factor 88.19)
- \(88.19^{1/600} = 1.0075\) (the equivalent per-month gain factor was 1.0075)
- \(1.0075^{12} = 1.094\) (the equivalent annualized gain factor was 1.094)
- so the annualized gain was 9.4%
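The example's arithmetic in Python, using the two S&P levels quoted above; it also confirms that the total change in log(P) is the log of the N-month gain factor.

```python
import math

P0, P600 = 16.66, 1469.25  # S&P 500 levels quoted in the text
N = 600                    # months, January 1950 through December 1999

total = P600 / P0           # N-month gain factor
monthly = total ** (1 / N)  # equivalent per-month gain factor
annual = total ** (12 / N)  # equivalent annualized gain factor

# The total change in log(P) is the log of the N-month gain factor.
log_change = math.log(P600) - math.log(P0)

print(f"{total:.2f} {monthly:.4f} {annual:.3f} {annual - 1:.1%}")  # 88.19 1.0075 1.094 9.4%
```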

And, just to be obstreperous (so you don't fall asleep), I'll remind you of the chart with the distribution of the monthly S&P500 returns (since 1926) so you can compare them with Normal and Log-normal distributions with the same Mean and Standard Deviation ... like so

>Why are you telling me all this? Are you going to suggest a Normal or a Log-normal distribution? Are you ...?

Me? Suggest Normal or Log-normal? Are you kidding ! Look at the comparison. Is
it good ?

Actually, I'd like to introduce you to ...

Ta DUM

Normal, Log-normal and other assorted Distributions continued from Part II

We start with a jillion numbers: g1, g2, g3, ... gN where N is very large. For
convenience, we'll refer to this set as simply {g}.

>Wait ! What are we talking about here? Haven't we done this before?
Yes, but I want to talk about different distributions of, say, monthly stock gains and

>Okay, please proceed.
Gee, thanks. Anyway, we count how many of these numbers are less than x. It'll depend upon x, so we'll call it F(x).

>We'll call it? What's it?
We're counting the number of g's less than, say, "2". That number ... that's F(2). The number less than 1 we'll call F(1). The number less than ...

>Okay, I get it, but a picture is worth a thousand ...
Okay, here's a picture. Every time we get a count, we get a point on the chart.

For example, the number of g's less than 2 is 6616. The number less than 0 is 4011.

>It looks like you're working with 10,000 numbers.
You got it, and ...

>And it looks like they all lie between -10 and +10.
Yes, in this particular example, but ...

>So N = 10,000.

This is just an example. The numbers can be anything.

The graph of F(x) could look like anything, except that it necessarily starts at the
value "0" on the far left and increases to the number of members of the set {g}.

Notice that we can calculate the number of g's between x=0 and x=2:
There are 6616 less than 2 and 4011 less than 0 so there are 6616-4011=2605
between x=0 and x=2.

Usually, we divide F(x) by the number of members of the set {g} - in our case
10,000 - so F(x) gives the fraction which are less than some given x.

The graph of this "new" F(x) would then go from 0 at the far left to 1 on the far
right and, for our example, the fraction less than 2 is 0.6616 and less than 0 is
0.4011 so 0.2605 or 26.05% of the g's lie between 0 and 2.

>Are these numbers called g or are they called x ? It's confusing, I mean ...

Uh ... I'm calling the original numbers g, like g1, g2, etc. However, when I want to talk about a particular g-value, I use the symbol x.

For example, I refer to the number of g's less than some particular value x. Clear ?

>No. Anyway, you've got some guy called F(x). Does he have a name ?

The Cumulative Distribution Function.

In general, if we wanted to know the fraction of g's between x and x+Δx, it'd be F(x+Δx) - F(x) and, for small Δx, we can write
F(x+Δx) - F(x) ≈ F'(x)Δx
where F'(x) is the slope of F(x) at the place x. We'll call this slope f(x), so
F'(x) = f(x)

and the fraction of g's between x and x+Δx is then:
F(x+Δx) - F(x) = f(x) Δx

and if we sum all these fractions we'd get all the g's, so:
Σf(x) Δx = 1

Note that the fraction lying in an interval of length Δx, at the place x, is f(x) Δx.
That's important to remember. I'll say it again:

The fraction lying in an interval of length Δx, at the place x, is f(x) Δx and
Σf(x) Δx = 1

>I take it that Σ means we add them all up and "1" means we've included 100% of
the g's.

Right. For small Δx, we write this sum as an integral and get our first magic
equation:

\[ \int_{-\infty}^{\infty} f(x)\,dx = 1 \qquad (1) \]
>An integral?
Don't worry about it. It's a wee bit of calculus.

Okay, now we want to know the average value of the g's. We do this like so:

1. Suppose we're determining the average grade on a test and we know that a
fraction 0.1 of the students (that's 10%) got a grade of 45, 0.3 got a grade of
65 and 0.6 got a grade of 85.
2. The average grade is 0.1(45) + 0.3(65) + 0.6(85) = 75

>Don't tell me! To get the average of the g's we'd determine the fraction having
the value x1, say n1, the fraction having the value x2, say n2, etc., then ... uh ...
we'd calculate n1(x1) + n2(x2) + ...

Very good. You've been eating your smart pills. Okay, for the case we're
considering, where

the fraction having the value x1 is f(x1)Δx,
the fraction having the value x2 is f(x2)Δx,
etc.,

we'd calculate: x1f(x1)Δx + x2f(x2)Δx + ... = Σ x f(x) Δx

which gives us our second magic equation:

\[ m = \int_{-\infty}^{\infty} x\, f(x)\,dx \qquad (2) \]

where m is the average, or Mean.

Now we want to measure how far the g's are from their Mean, m. We calculate the average of the squared deviations:

\[ \frac{1}{N}\left\{ (g_1 - m)^2 + (g_2 - m)^2 + \cdots \right\} \]

but, as above, we count the fraction having the value x1, namely f(x1)Δx, and the fraction having the value x2, namely f(x2)Δx, etc. and use:

\[ (x_1 - m)^2 f(x_1)\,\Delta x + (x_2 - m)^2 f(x_2)\,\Delta x + \cdots \]

which brings us to our third magic equation:

\[ S^2 = \int_{-\infty}^{\infty} (x - m)^2 f(x)\,dx \qquad (3) \]

where \(S^2\) is the mean squared deviation from the Mean.
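Equations (1), (2) and (3) can be checked numerically by turning the integrals back into sums of f(x)Δx. A sketch, assuming the Normal density of equation (4) (it appears a little later) with m = 1 and S = 3, the values the text says were used for these graphs; the grid from -30 to +30 and Δx = 0.001 are arbitrary choices.

```python
import math


def normal_density(x, m, S):
    """Normal density with Mean m and Standard Deviation S (equation (4))."""
    return math.exp(-((x - m) ** 2) / (2 * S**2)) / (S * math.sqrt(2 * math.pi))


m_true, S_true = 1.0, 3.0
dx = 0.001
xs = [-30 + i * dx for i in range(round(60 / dx))]  # -30 to +30: about ten SDs each side
fx = [normal_density(x, m_true, S_true) for x in xs]

total = sum(f * dx for f in fx)                          # equation (1): should be 1
m = sum(x * f * dx for x, f in zip(xs, fx))              # equation (2): the Mean
S2 = sum((x - m) ** 2 * f * dx for x, f in zip(xs, fx))  # equation (3): S squared
print(round(total, 4), round(m, 4), round(S2, 4))        # 1.0 1.0 9.0
```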

>And S is called ... what?
S, the Root Mean Square (or RMS) deviation, is called the Standard Deviation.

>So, what does f(x) look like ?

Since it's the slope of the Cumulative Distribution (which grows from 0 to 1), we expect f(x) to begin (at the far left) with the value 0, then increase (as the slope of F(x) increases), then decrease again (as the slope of F(x) decreases to 0).

>So, what does f(x) look like ?

Here's a picture:

>And f(x), I presume, is the "Density".

Did I forget to mention that ?


There are a couple of popular distributions when considering the monthly (daily,
weekly, yearly?) returns of stocks. The first is the infamous Bell Curve:

\[ f(x) = \frac{1}{S\sqrt{2\pi}}\, e^{-(x - m)^2/(2S^2)} \qquad (4)\ \text{Normal Distribution} \]

The funny guys (like 2π) are there so that equation (1) is satisfied.

Also, the graphs we've used above are from a Normal Distribution with m = 1 and
S = 3.

The next is the Log-normal Distribution. In this case, we assume that the
logarithms of the g's have a Normal Distribution.

Because we're considering log(g1), log(g2), etc., the numbers g1, g2, etc. had better be positive!

>Why?
Because log(g) isn't defined (as a real number) unless g > 0.

Hence, when considering a Log-normal distribution of returns, we consider g1, g2, etc. to be the Gain Factors.
>Remind me.
If the monthly return is 2.3%, the Gain Factor for that month is 1.023, meaning that $1.00 will grow to $1.023 during that month.

Since the Gain Factors are always positive - assuming you don't lose everything in
one month (!) - then we can consider the distribution of their logarithms.

>I assume that the logarithm is the natural log.
Yes, to the base e = 2.71828, roughly. Anyway, here's the picture.

If we plot the distribution of logarithms, log(g1), log(g2), ... log(gN), we'd get a Normal curve as shown, where m and S are now the Mean and Standard Deviation of the logarithms!

>That sounds tough, I mean ...

Actually, the Mean is easy. For example, suppose {g} were a set of annual Gain Factors. The average logarithm, m, is

\[ m = \frac{1}{N}\{\log(g_1) + \log(g_2) + \cdots + \log(g_N)\} = \frac{1}{N}\log(g_1 g_2 \cdots g_N) = \log\{(g_1 g_2 \cdots g_N)^{1/N}\} = \log(G) \]

where G is the annualized Gain Factor (the geometric Mean of the g's).
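A numerical check that the average logarithm is the log of the geometric Mean, using four hypothetical annual Gain Factors:

```python
import math

gains = [1.10, 0.95, 1.20, 1.05]  # hypothetical annual Gain Factors

m = sum(math.log(g) for g in gains) / len(gains)  # average logarithm
G = math.prod(gains) ** (1 / len(gains))          # geometric Mean (annualized Gain Factor)
print(m, math.log(G), G, math.exp(m))             # m = log(G), so G = e^m
```

Python's `statistics.geometric_mean` (3.8+) would give the same G.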

Note that f(z) has a part (z-m) and since z=log(x) and m=log(G) we get (z-m) = log(x) - log(G) = log(x/G).

That allows us to write the Log-normal density distribution like so:

- Put z-m = log(x/G)
- Since we must satisfy equation (1), above, changing from z to x requires changing Δz to (1/x)Δx (that is, dz = dx/x), so we must change f Δz to f Δx/x
- We get, finally, f(x), our density distribution for x (as opposed to f(z), the distribution for z):

\[ f(x) = \frac{1}{x S\sqrt{2\pi}}\, e^{-[\log(x/G)]^2/(2S^2)} \qquad (5)\ \text{Log-normal Distribution} \]

with x > 0 since x is now a Gain Factor ... and S is now the Standard Deviation of log(x)!
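A check that the extra 1/x keeps equation (1) satisfied: summing f(x)Δx for the Log-normal density of equation (5) should give 1. G = 1.0075 and S = 0.04 are illustrative, roughly S&P-sized monthly values; the grid from 0.5 to 1.5 is an arbitrary choice wide enough to hold the tails.

```python
import math


def lognormal_density(x, G, S):
    """Equation (5): density of a Gain Factor x > 0 whose log is Normal
    with Mean log(G) and Standard Deviation S."""
    z_minus_m = math.log(x / G)  # z - m = log(x) - log(G)
    return math.exp(-z_minus_m**2 / (2 * S**2)) / (x * S * math.sqrt(2 * math.pi))


G, S = 1.0075, 0.04  # illustrative monthly values
dx = 1e-4
xs = [0.5 + i * dx for i in range(round(1.0 / dx))]  # Gain Factors 0.5 to 1.5
area = sum(lognormal_density(x, G, S) * dx for x in xs)
print(round(area, 6))  # 1.0
```

Dropping the `x *` in the denominator would make the area drift away from 1, which is the point of the dz = dx/x step.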

>We must change f Δz to f Δx/x? What's that about?
It's because dz = d(log(x)) = (1/x) dx ... but don't worry about it.

>You promised just a wee bit of calculus!
Yes ... uh, it's because of the logarithm, you see.

If x changes by a tiny amount from x to x+Δx, then its logarithm will change from log(x) to log(x+Δx) = log(x[1+Δx/x]) = log(x) + log(1+Δx/x) and, for tiny values of Δx/x, log(1+Δx/x) ≈ Δx/x, so the logarithm will change by a tiny amount Δx/x, and we can see the x in the denominator so that ...

>Please ... please, continue.

Okay. Notice that, if

1. F(x) is a Normal cumulative distribution with Mean = 0 and Standard Deviation = 1, then
2. F((x-A)/B) is a Normal cumulative distribution with Mean = A and Standard Deviation = B, and
3. F([log(x)-A]/B) describes a distribution where it's log(x) which has a Normal distribution, and it's log(x) which has the Mean = A and Standard Deviation = B.

>Number 3 is our Log-normal distribution, right?
Right. Notice that, for the Log-normal distribution, the geometric Mean, G, plays the central role (unlike the Normal distribution, where it's the arithmetic Mean).

But if we choose the Mean m and Standard Deviation S to match S&P 500
returns, neither distribution is a very good match.

>So?

So, let's try another tack ... to get this:

where the horizontal axis corresponds to the Gain Factors, not the gains themselves, so 0.8 means a gain of 0.8 - 1 = -0.2, or a 20% loss, and 1.1 means a 10% gain and ...

>Yeah, yeah. I got it.

>But I thought that individual stocks are supposed to be lognormal ... not the S&P
500.

Yes. Some say that individual stocks are more closely approximated by the lognormal distribution than a collection of stocks. For example, here are some samples, for comparison:

>The density distribution is rather erratic ?

Okay, here are the cumulative distributions:

However, I'd like to consider a distribution which approximates (for example) the
S&P distribution:

The Mystery Distribution ... continued from Part III

Now we'll consider the following Density Distribution function:

\[ f(x) = A\, e^{-k|x - m|} \qquad (1) \]

where m is the mean, |x-m| is the absolute value of x-m, and A and k are as-yet-unknown constants.

This has more "tail" than a function which decays like the Normal distribution.

A sample of such a function looks like this

Remember (from Part III) a magic requirement for the function f(x), namely

\[ \int_{-\infty}^{\infty} f(x)\,dx = 1 \]

This requires that A = k/2, so that our function now looks like:

\[ f(x) = \frac{k}{2}\, e^{-k|x - m|} \qquad (2) \]
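Where A = k/2 comes from, assuming k > 0 so that the tails die away: split the integral at m, where the two halves are mirror images, and put u = |x - m|.

\[ \int_{-\infty}^{\infty} A\, e^{-k|x - m|}\,dx = 2A \int_{0}^{\infty} e^{-ku}\,du = \frac{2A}{k} = 1 \quad\Longrightarrow\quad A = \frac{k}{2} \]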

Further, we'll borrow from the Log-normal scheme and assume that the logarithms
of the Gain Factors have this distribution.

That gives us:

\[ f(x) = \frac{k}{2x}\, e^{-k|\log(x/G)|} \qquad (3) \]
where x > 0 since x is now a Gain Factor

>Don't tell me ! m is now the average of the logs, so m=log(G) where ...

Yes, as in Part III, G is the geometric Mean of the numbers {g} whose distribution we're discussing.

>And you're doing that funny dz = (1/x) dx because of the logarithm so that f gets divided by x and ...
Of course.

>So if the numbers we're discussing are g1, g2, etc. then ...
... then \(G = (g_1 g_2 \cdots g_N)^{1/N}\)

Here's a picture of our distribution with k = 25 ... chosen to be the best match to the S&P 500.

Notice that our Mystery Distribution has a tail greater than the actual S&P distribution, meaning that there is a greater opportunity for returns far from the average.

>What does the Cumulative distribution look like, compared to ...?
Here they are, where we're now back to the monthly gains (rather than the Gain Factors).

To do this we just subtract "1" from the Gain Factors, which slides the graph to the left by an amount "1".

>This Mystery Distribution is your invention?

No, not really. It's called the double exponential (or Laplace) distribution.

>I assume there's a magic formula for the mystery F(x)?
Yes, and it's pretty simple:

\[ F(x) = \begin{cases} \tfrac{1}{2}\,(x/G)^{k} & \text{when } 0 < x < G \\ 1 - \tfrac{1}{2}\,(x/G)^{-k} & \text{when } x > G \end{cases} \]

>Where x is the Gain Factor, namely 1 + Gain Fraction, so it's always positive, and G is ...
... the Geometric Mean of the Gain Factors, which are always greater than zero.

>So what's the magic formula for f(x)? It's the slope of F(x), right ?

Yes. Although we've given it above, the simpler version is:

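\[ f(x) = \begin{cases} \dfrac{k}{2x}\,(x/G)^{k} & \text{when } 0 < x < G \\[4pt] \dfrac{k}{2x}\,(x/G)^{-k} & \text{when } x > G \end{cases} \]
or
\[ f(x) = \frac{k}{2G}\,(x/G)^{p}, \qquad p = \begin{cases} k - 1 & \text{when } 0 < x < G \\ -k - 1 & \text{when } x > G \end{cases} \]
where x is a Gain Factor and G is the Geometric Mean of the Gain Factors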
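A sketch that codes up both formulas and checks that f(x) really is the slope of F(x), via a central difference. k = 25 is the value from the text; G = 1.0075 is borrowed from the per-month S&P figure in Part II; the function names are hypothetical.

```python
def mystery_cdf(x, G, k):
    """Cumulative distribution for a Gain Factor x > 0 whose log has the
    double exponential density (k/2) e^(-k |z - log G|)."""
    if x <= 0:
        return 0.0
    if x < G:
        return 0.5 * (x / G) ** k
    return 1 - 0.5 * (x / G) ** (-k)


def mystery_density(x, G, k):
    """f(x) = (k / 2G) (x/G)^p with p = k - 1 below G and p = -k - 1 above G."""
    p = k - 1 if x < G else -k - 1
    return (k / (2 * G)) * (x / G) ** p


G, k = 1.0075, 25  # k = 25 as in the text; G = the per-month figure from Part II
h = 1e-6
for x in (0.95, 1.00, 1.05):
    slope = (mystery_cdf(x + h, G, k) - mystery_cdf(x - h, G, k)) / (2 * h)
    print(x, slope, mystery_density(x, G, k))  # the slope of F matches f
```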

>Is that your Mystery Distribution?

Here's another:

\[ f(x) = A\, e^{-k(x - m)^2/(b + |x - m|)} \qquad (4) \]

where we can choose the numbers A, k and b so as to satisfy the magic distribution requirement (noted above) and so as to match, for example, the historical S&P 500 returns.

The figure on the right has such a choice and it seems a pretty good match, eh?

Undoubtedly, you'll have noticed that, when x is far from its mean (which is denoted by m), this distribution function has an exponential decrease similar to the first Mystery Distribution (in equation (1), above). In fact, for large values of |x-m|, the exponent \(-k(x-m)^2/(b+|x-m|)\) is close to \(-k(|x-m| - b)\), so our new distribution looks like \(A\,e^{kb}\,e^{-k|x-m|}\), which means ... which means we've still got those bigger tails.
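Choosing A so the magic requirement holds can be done numerically once k and b are picked. A sketch with hypothetical values m = 0, k = 25, b = 0.02 (not fitted to S&P data) and an arbitrary integration window of ±2 around m. As b shrinks toward 0 the shape becomes e^(-k|x-m|), so A should approach k/2, matching the first Mystery Distribution.

```python
import math


def shape(x, m, k, b):
    """Unnormalised density exp(-k (x - m)^2 / (b + |x - m|))."""
    d = abs(x - m)
    return math.exp(-k * d * d / (b + d))


def normalising_constant(m, k, b, half_width=2.0, dx=1e-4):
    """A such that the total area under A * shape is 1, via a Riemann sum over
    [m - half_width, m + half_width] (assumed wide enough to hold the tails)."""
    n = round(2 * half_width / dx)
    area = sum(shape(m - half_width + i * dx, m, k, b) * dx for i in range(n))
    return 1 / area


m, k, b = 0.0, 25.0, 0.02  # hypothetical values, not fitted to S&P data
A = normalising_constant(m, k, b)
print(A)                                 # smaller than k/2 = 12.5
print(normalising_constant(m, k, 1e-9))  # b -> 0 recovers A = k/2
```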

>So, what do you intend to do with this stuff?
Hmmm ... give me a minute to think about it.
