
On the Fractal Nature of Stock Markets

Jamal Munshi, Sonoma State University, 1992

All rights reserved

Stock market data have thwarted decades of effort by mathematicians and statisticians to discover their hidden pattern. Simple time series analyses including AR, MA, ARMA, and ARIMA were eventually replaced with more sophisticated instruments of torture such as spectral analysis. But the data refused to confess.

The failure to discover structure in price movements convinced many researchers that the movements were random. The so-called random walk hypothesis (RWH) of Osborne and others was developed into the efficient market hypothesis (EMH) by Eugene Fama. The weak form of the EMH says that movements in stock returns are random events, independent of historical values. The rationale is that if patterns did exist, arbitrageurs would take advantage of them and thereby quickly eliminate them.

Both the RWH and the EMH came under immediate attack from market analysts, and this attack continues to this day, partly because the statistics used in tests of the EMH are controversial. The null hypothesis states that the market is efficient. The test then consists of presenting convincing evidence that it is not. The tests usually fail. Many argue that the failure of these tests represents a Type II error, that is, a failure to detect a real effect because of the low power of the statistical test employed.

Besides, the methods of analysis assume a normal and linear world that is difficult to defend. All residuals are assumed to be independent and normally distributed, all relationships are assumed to be linear, and all effects are assumed to be linearly additive with no interactions. At each point in time the data are assumed to be drawn from identically distributed, independent populations of numbers whose other members are unobservable. Econometric models such as ARIMA assume that all dependencies in time are linear.

It is therefore logical to conjecture that the failure of statistics to reject the EMH is due not to the strength of the theory but to the weakness of the statistics. Many hold that a different and more powerful mathematical device, one that allowed for non-linearities, might be more successful in discovering the hidden structure of stock prices.

In the early seventies, it appeared that Catastrophe Theory was just such a
device. It had a seductive ability to model long bull market periods followed by
catastrophic crashes. But it proved to be a mathematical artifact whose
properties could not be generalized. It yielded no secret structure or patterns in
stock prices. The results of other non-EMH models such as the Rational Bubble
theory and the Fads theory are equally unimpressive.

Many economists feel that the mathematics of time series implied by Chaos Theory is a promising alternative. If time series data are allowed to be non-linearly dependent, rather than independent as the EMH requires, or linearly dependent as the AR models require, then much of what appears to be erratic random behavior or "white noise" may be part of the deterministic response of the system. Certain non-linear dynamical systems of equations can generate time series data that appear remarkably similar to observed stock market data.
By using new mathematical techniques, hidden structures can be discovered in what appears to be a random time series. One technique, attributed to Lorenz, uses a plot of the data in phase space to detect patterns called strange attractors. Another method, proposed by Takens, uses an algorithm to determine the `correlation dimension' of the data. A low correlation dimension indicates a deterministic system; a high correlation dimension is indicative of randomness.
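
As a rough illustration of the correlation dimension idea, and not the specific procedures used in the studies cited below, the following Python sketch embeds a scalar series in a delay space and measures how the fraction of close pairs of points scales with the distance threshold; the embedding dimension, delay, and choice of radii are arbitrary settings for the example.

    import numpy as np

    def correlation_dimension(x, m=3, tau=1):
        """Estimate a correlation dimension by delay embedding and the
        Grassberger-Procaccia correlation sum (illustrative settings)."""
        n = len(x) - (m - 1) * tau
        # Delay vectors: each row is one point in the m-dimensional phase space.
        emb = np.column_stack([x[i * tau : i * tau + n] for i in range(m)])
        # Pairwise distances between embedded points (upper triangle only).
        d = np.sqrt(((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1))
        d = d[np.triu_indices(n, k=1)]
        radii = np.logspace(np.log10(d[d > 0].min()), np.log10(d.max()), 12)[3:-3]
        # Correlation sum C(r): fraction of pairs closer than r.
        c = np.array([(d < r).mean() for r in radii])
        keep = c > 0
        # The correlation dimension is the slope of log C(r) against log r.
        slope, _ = np.polyfit(np.log(radii[keep]), np.log(c[keep]), 1)
        return slope

    # White noise gives a dimension near the embedding dimension m;
    # a low-dimensional deterministic series gives a much smaller value.
    rng = np.random.default_rng(0)
    print(correlation_dimension(rng.standard_normal(500)))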

The correlation dimension technique has yielded mixed results with stock data. Halbert, Brock, and others, working with daily returns of IBM, concluded that the correlation dimension was sufficiently high to regard the time series as white noise. However, Scheinkman et al. claim that weekly data of IBM returns have a significant deterministic component. Such structures may not be inconsistent with the EMH if the discovery of the structure, though providing insight to economic theorists, does not provide arbitrage opportunities.

A third technique for discovering structure in time series data has been described by Mandelbrot, Hurst, Feder, and most recently by Peters. Called `rescaled range analysis', or R/S, it is a test for randomness of a series not unlike the runs test. The test rests on the relationship that in a truly random series, a serial selection of sub-samples without replacement should produce a random sampling distribution with a standard deviation given by

sigmaXbar = [ sigma/n^0.5 ] * [ (N-n)/(N-1) ]^0.5

Here sigmaXbar is the standard deviation of the distribution of sample means obtained by drawing samples without replacement of size n from a population of size N, and sigma is the standard deviation of the population, i.e., when n=1.
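
The relationship is easy to check by simulation; the sketch below, with arbitrary population size, sample size, and number of trials, compares the empirical standard deviation of sample means drawn without replacement against the value given by the formula.

    import numpy as np

    rng = np.random.default_rng(1)
    pop = rng.standard_normal(100)       # population of size N = 100
    N, n = len(pop), 10
    sigma = pop.std()                    # population standard deviation (the n = 1 case)

    # Draw many samples of size n without replacement and look at the spread of the means.
    means = [rng.choice(pop, size=n, replace=False).mean() for _ in range(20000)]
    print(np.std(means))                                     # empirical sigmaXbar
    print(sigma / n ** 0.5 * ((N - n) / (N - 1)) ** 0.5)     # value given by the formula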

However, when the time series has runs, it can be shown that the exponent of n in the term `n^0.5' will differ from 0.5. The paper by Peters describes the following relationship:

R/S = N^H (Peters equation 4)

where R is the range of subsample sums, S is the standard deviation of the large sample, and N is the size of the sub-samples. The `H' term is called the Hurst constant and is equal to 0.5 if no runs exist and the data are sequenced randomly. If there is a tendency for positive runs, that is, increases are more likely to be followed by increases and decreases are more likely to be followed by decreases, then H will be greater than 0.5 but less than 1.0. Values of H between 0 and 0.5 are indicative of negative runs, that is, increases are more likely to be followed by decreases and vice versa. Hurst and Mandelbrot have found that many natural phenomena previously thought to be random have H-values around 0.7. These values are indicative of serious departures from randomness.
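
A minimal sketch of a rescaled range computation is given below. It follows the classical Hurst/Mandelbrot recipe, in which R is the range of the cumulative deviations of a subsample from its mean and S is the subsample standard deviation; Peters' exact procedure, such as taking S over the large sample, may differ in detail.

    import numpy as np

    def rescaled_range(x, n):
        """Average R/S over consecutive, non-overlapping subsamples of size n."""
        rs = []
        for start in range(0, len(x) - n + 1, n):
            block = x[start:start + n]
            dev = np.cumsum(block - block.mean())   # cumulative deviations from the block mean
            r = dev.max() - dev.min()               # R: range of the cumulative deviations
            s = block.std()                         # S: standard deviation of the block
            if s > 0:
                rs.append(r / s)
        return float(np.mean(rs))

    # For a purely random series R/S should grow roughly as n^0.5 (H = 0.5).
    rng = np.random.default_rng(0)
    x = rng.standard_normal(463)
    for n in (116, 52, 25, 13, 6):
        print(n, rescaled_range(x, n))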

Once `H' is determined for a time series, the autocorrelation in the time series is computed as follows:

CN = 2^(2H-1) - 1

CN is the correlation coefficient and its magnitude is indicative of the degree to which the elements of the time series are dependent on historical values. The interpretation of this coefficient used by Peters to challenge the EMH is that it represents the percentage of the variation in the time series that can be explained by historical data. The weak form of the EMH would require that this correlation be zero; i.e., that the observations are independent of each other. Therefore, any evidence of such a correlation can be interpreted to mean that the weak form does not hold.
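
A two-line check of the formula (the H values below are merely illustrative):

    # CN = 2^(2H - 1) - 1: zero for a random sequence (H = 0.5), positive for persistence.
    for h in (0.5, 0.6, 0.7):
        print(h, round(2 ** (2 * h - 1) - 1, 3))   # 0.0, ~0.149, ~0.32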

Peters studied 463 monthly returns of the S&P500 index, of 30-year government T-bonds, and of the excess of stock returns over bond returns. He found, using R/S analysis, that these time series were not random but that they contained runs or persistence, as evidenced by values of CN ranging from 16.8% to 24.5%. The correlation estimates indicate that a significant portion of the returns is determined by past returns. This finding appears to present a serious challenge to the efficient market hypothesis.

Peters obtained sequential subsamples for eleven different values of N and computed R/S for each N. To estimate H he converted his equation 4 to linear form by taking logarithms to yield

log(R/S) = H * log(N)

and then used OLS linear regression between log(R/S) and log(N). The slope of the regression is taken to be an unbiased estimate of H. The results are summarized in Table 1.

Table 1. Summary of Results Using Logarithmic Transformations

Returns    Regression Constant    H        Serial Correlation CN
Stocks     -0.103                 0.611    0.168
Bonds      -0.151                 0.641    0.215
Premium    -0.185                 0.658    0.245
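
The log-log regression itself is straightforward to reproduce. The sketch below assumes base-10 logarithms, which appears consistent with the magnitude of the reported constants, and uses the stock column of Table 2 further below; the estimates should land near the Table 1 values, up to rounding of the published R/S figures.

    import numpy as np

    def estimate_h_loglog(n_values, rs_values):
        """OLS fit of log(R/S) = a + H * log(N); returns (a, H)."""
        h, a = np.polyfit(np.log10(n_values), np.log10(rs_values), 1)
        return a, h

    # Stock data (N, R/S) as tabulated in Table 2 below.
    n  = [463, 230, 150, 116, 75, 52, 36, 25, 18, 13, 6]
    rs = [31.877, 22.081, 16.795, 12.247, 12.182, 10.121,
          7.689, 6.296, 4.454, 3.580, 2.168]
    print(estimate_h_loglog(n, rs))   # compare with the Table 1 stock row (-0.103, 0.611)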

A Re-examination of the Analysis

However, the logarithmic transformation used by Peters and the interpretation of the linear regression parameters raise some questions that require a re-examination of his results. First, consider the logarithmic conversion.

The OLS regression procedure minimizes the error sum of squares between the predicted log(R/S) and the observed log(R/S). However, it does not necessarily follow that the value of H at which the error sum of squares of log(R/S) is at a minimum is coincident with the value of H at which the error sum of squares of R/S is at a minimum. This is because of the nature of exponential functions, which ensures that R/S changes more rapidly at the high end than at the low end for the same change in log(R/S).

For instance, an error of 0.1 when ln(R/S) = 4 implies an error in R/S of about 6, but the same logarithmic error at ln(R/S) = 8 carries an error in R/S of about 313. To an OLS regression routine working on logarithms, these errors are equivalent. This means that it would accept an error of about 300 in R/S on the high end to gain an error reduction of about 6 on the low end.
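
The arithmetic behind these figures is a one-line check:

    import math

    # An additive error in ln(R/S) is a multiplicative error in R/S, so the same
    # 0.1 log error is far larger in level terms at the high end of the curve.
    for ln_rs in (4, 8):
        print(ln_rs, math.exp(ln_rs + 0.1) - math.exp(ln_rs))   # about 5.7 and 313.5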

Secondly, the equation to be fitted, R/S = N^H, may also be written as R/S = 1 * N^H, and taking logarithms would yield log(R/S) = log(1) + H * log(N), or, specifically, since log(1) = 0, we can write log(R/S) = 0 + H * log(N).

This means that to fit the model as stated, the intercept term must be tested against zero. If the intercept term is significantly different from zero, then the model must be rejected. In all three regression equations above, the intercept is negative and significantly different from zero. Therefore, we would expect that the computed slope is an over-stated estimate of H.
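
One way to see the effect of the intercept is to refit the log model with the intercept constrained to zero and compare the slope with the unconstrained fit. This is only an illustration of the point about the intercept, not the direct non-linear fit proposed in the next section; the data are the stock column of Table 2 below, with base-10 logarithms as before.

    import numpy as np

    n  = np.array([463, 230, 150, 116, 75, 52, 36, 25, 18, 13, 6])
    rs = np.array([31.877, 22.081, 16.795, 12.247, 12.182, 10.121,
                   7.689, 6.296, 4.454, 3.580, 2.168])
    x, y = np.log10(n), np.log10(rs)

    h_free, a_free = np.polyfit(x, y, 1)   # unconstrained: log(R/S) = a + H * log(N)
    h_zero = (x @ y) / (x @ x)             # intercept forced to zero: log(R/S) = H * log(N)
    print(a_free, h_free, h_zero)

On this series the constrained slope comes out noticeably lower than the unconstrained one, consistent with the argument that the free-intercept slope overstates H.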

An Alternative Interpretation

Both problems with the logarithmic transformation mentioned above may be avoided by applying a non-linear least squares fit directly to the model. The results of such a procedure are so different from those obtained by logarithmic transformations that the interpretation and conclusion must be re-evaluated.

Table 2 shows the data used by Peters to infer his regression parameters. Figure 1 shows the error sum of squares plotted against values of H. An unbiased estimator of H is the value at which the error sum of squares is at a minimum. These values of H, shown in Table 3, are significantly different from those shown in Table 1 and are closer to 0.5 than previously thought. In particular, the correlations are much lower; that is, a much lower proportion of the variance in security returns is determined by runs or persistence. Rather than 16% to 25%, past returns explain only 5% to 13% of the variance in returns.

Table 2. The Data Used in the Regression Models
(R/S values for each subsample size N)

N        Stocks     Bonds      Premium
463      31.877     45.050     27.977
230      22.081     21.587     18.806
150      16.795     15.720     15.161
116      12.247     12.805     11.275
75       12.182     10.248     11.626
52       10.121      9.290      8.790
36        7.689      7.711      7.014
25        6.296      5.449      4.958
18        4.454      4.193      4.444
13        3.580      4.471      3.549
6         2.168      2.110      2.209

Table 3. Summary of Results Using Non-Linear Regression

Returns    Regression Constant    H       Serial Correlation CN
Stocks     0                      0.56    0.0867
Bonds      0                      0.59    0.1329
Premium    0                      0.54    0.0570
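
A minimal sketch of such a direct fit, assuming a simple grid search over H for the error sum of squares of R/S = N^H (the actual minimization routine used to produce Table 3 is not described here), applied to the stock column of Table 2:

    import numpy as np

    # Direct (non-linear) least squares: choose H to minimize the sum of (R/S - N^H)^2.
    # A grid search is used for transparency; any one-dimensional minimizer would do.
    n  = np.array([463, 230, 150, 116, 75, 52, 36, 25, 18, 13, 6])
    rs = np.array([31.877, 22.081, 16.795, 12.247, 12.182, 10.121,
                   7.689, 6.296, 4.454, 3.580, 2.168])   # stock column of Table 2

    h_grid = np.arange(0.40, 0.70, 0.0001)
    sse = [((rs - n ** h) ** 2).sum() for h in h_grid]
    h_hat = h_grid[int(np.argmin(sse))]
    cn = 2 ** (2 * h_hat - 1) - 1                        # implied serial correlation CN
    print(h_hat, cn)                                     # should land near the stock row of Table 3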

A comparison of the logarithmic fit to the direct non-linear fit is shown in Figures 2, 3, and 4. In each case, the non-linear fit follows the data more closely while the logarithmic fit shows wide dispersions at the high end, as expected.


This analysis shows that the amount of variance in returns explained by the
fractal model is very low. It has not been established that the correlation is
significantly different from zero. Even if it were, the low magnitude of the
correlation precludes any conclusions of practical significance either in terms of
arbitrage profits or financial theory.

Therefore, the derived model parameters may not be subjected to interpretations with regard to the behavior of the market, and the results may not be considered to be inconsistent with the efficient market hypothesis.