
Novel Time Series Analysis and Prediction of Stock Trading Using Fractal Theory and Time Delayed Neural Network

Fuminori Yakuwa
Kushiro Branch, Hokkaido Electric Power Co., Inc.
8-1, Saiwai-cho, Kushiro 050, Japan
Phone: +81-154-23-1114; Fax: +81-154-23-2220

Yasuhiko Dote and Mika Yoneyama
Department of Computer Science & Systems Engineering, Muroran Institute of Technology
27-1, Mizumoto-cho, Muroran 050, Japan
Phone: +81-143-46-5432; Fax: +81-143-46-5499

Shinji Uzurabashi
Panasonic Mobile & System Engineering Co., Ltd.

Abstract - The stock markets are well known for wide variations in prices over short and long terms. These fluctuations are due to a large number of deals produced by agents that act independently from each other. However, even in the middle of the apparently chaotic world, there are opportunities for making good predictions [1].

In this paper the Nikkei stock prices over 1500 days, from July 1996 to Oct. 2002, are analyzed and predicted using a Hurst exponent (H), a fractal dimension (D), and an autocorrelation coefficient (C). They are H = 0.6699, D = 2 - H = 1.3301 and C = 0.26558 over three days. This obtained knowledge is embedded into the structure of our developed time delayed neural network [2]. It is confirmed that the obtained prediction accuracy is much higher than that by a back-propagation-type forward neural network for the short term.

Although this predictor works for the short term, it is embedded into our developed fuzzy neural network [3] to construct multi-blended local nonlinear models. It is applied to general long-term prediction, for which more accurate prediction is expected than by the method proposed in [1].

1 Introduction

The Nikkei Average Stock prices over 1500 days are in the middle of the apparently chaotic world. In this paper, on the basis of Zadeh's proposal, i.e., "From Manipulation of Measurements to Manipulation of Perceptions: Computations with Words" [25], that is, a data mining technology, knowledge easily comprehensible by humans is extracted by obtaining the features of the time series using a Hurst exponent, a fractal analysis method, and an autocorrelation analysis method. In order to extract the knowledge, decision making rules comprehensible by humans using the features are derived with rough set theory [26]. Finally the knowledge is embedded into the structure of the Time Delayed Neural Network (TDNN). The accurate prediction is obtained.

This paper is organized as follows. In Section 2 time series analysis using fractal analysis is described. Section 3 illustrates the structure of neural networks for time series. Section 4 describes short-term prediction using the TDNN. Some conclusions are drawn in Section 5.

2 Time Series Analysis Using Fractal Analysis

2.1 Fractal

Fractal analysis provides a unique insight into a wide range of natural phenomena. Fractal objects are those which exhibit 'self-similarity'. This means that the general shape of the object is repeated at arbitrarily smaller and smaller scales. Coastlines have this property: a particular coastline viewed on a world map has the same character as a small piece of it seen on a local map. New details appear at each smaller scale, so that the coastline always appears rough. Although true fractals repeat the detail to a vanishingly small scale, examples in nature are self-similar only up to some non-zero limit. The fractal dimension measures how much complexity is being repeated at each scale. A shape with a higher fractal dimension is more complicated or 'rough' than one with a lower dimension, and fills more space. These dimensions are fractional: a shape with fractal dimension D = 1.2, for example, fills



more space than a one-dimensional curve, but less space than a two-dimensional area. The fractal dimension thus tells much about the geometry of an object. Very realistic computer images of mountains, clouds and plants can be produced by simple recursions with the appropriate fractal dimension. Time series of many natural phenomena are fractal. Small sections taken from these series, once scaled by the appropriate factor, cannot be distinguished from the whole signal. Being able to recognize a time series as fractal means being able to link information at different time scales. We call such sets 'self-affine' instead of self-similar because they scale by different amounts in each axis direction.

There are many methods available for estimating the fractal dimension of data sets. These lead to different numerical results, yet little comparison of accuracy has been made among them in the literature. We combine the two methods which are known as the most popular for assigning fractal dimensions to time series: the box-counting method and rescaled range analysis.

2.2 Box-counting

The box-counting algorithm is intuitive and easy to apply. It can be applied to sets in any dimension, and has been used on images of everything from river systems to clusters of galaxies. A fractal curve is a curve of infinite detail, by virtue of its self-similarity. The length of the curve is indefinite, increasing as the resolution of the measuring instrument increases. The fractal dimension determines the increase in detail, and therefore length, at each resolution change. For a fractal, the length L as a function of the resolution of the measurement device δ is

$$ L(\delta) \propto \delta^{1-D} \qquad (1) $$

where D is an exponent known as the fractal dimension. (For ordinary curves L(δ) approaches a constant value as δ decreases.) Box-counting algorithms measure L(δ) for varying δ by counting the number of non-overlapping boxes of size δ required to cover the curve. These measurements are fitted to Eq. (1) to obtain an estimate of the fractal dimension, known as the box dimension. A fractal dimension can be assigned to a set of time series data by plotting it as a function of time and calculating the box dimension. Eq. (1) will hold over a finite range of box sizes; the smallest boxes will be of width τ, where τ is the resolution in time, and height σ, where σ is the resolution of the magnitude of the time series.
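To make the procedure concrete, here is a minimal Python sketch of a column-wise box count for a time-series graph. It is an illustration of the method, not the authors' code, and the set of box scales is an arbitrary choice:

```python
import numpy as np

def box_dimension(x, scales=(4, 8, 16, 32, 64, 128)):
    """Box-counting dimension of a time-series graph.

    The graph is scaled into the unit square; for each box size
    delta = 1/s, the number N(delta) of boxes needed to cover the
    curve's vertical extent within every time column is counted,
    and the slope of log N(delta) against log(1/delta) estimates D.
    This is the count form of Eq. (1): N(delta) ~ delta^(-D).
    """
    y = (x - x.min()) / np.ptp(x)          # magnitudes into [0, 1]
    log_inv_delta, log_n = [], []
    for s in scales:
        n = 0
        for col in np.array_split(y, s):   # s equal-width time columns
            n += int(col.max() * s) - int(col.min() * s) + 1
        log_inv_delta.append(np.log(s))
        log_n.append(np.log(n))
    D, _ = np.polyfit(log_inv_delta, log_n, 1)
    return D

rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(4096))   # Brownian path
print(box_dimension(walk))                    # close to D = 1.5
```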
2.3 R/S method

The rescaled range analysis, also called the R/S or Hurst method, was invented by Hurst for the evaluation of time dependent hydrological data [8][9]. His original work is related to water reservoirs and the design of an ideal storage on the river Nile. After the detailed discussion of this work by Mandelbrot [10][11], the method has attracted much attention in many fields of science. For the mathematical aspects of the method we refer to the papers of Mandelbrot [10], Feder [12], and Daley [13]. Since its earliest days the method has been used for a number of applications, whenever the question was the quantification of long-range statistical interdependence within a time series. As examples we can cite the analysis of the asymmetry of solar activity [14][15], relaxation of stress [16], problems in particle physics [17], and mechanical sliding in solids [18]. The Hurst analysis is also used as a tool to determine the self-similarity parameter of fractal signals [20-23], or to detect unwanted correlations in pseudo-random number generators [23]. The Hurst exponent was calculated for corrosion noise data in the work of Moon and Skerry [24], where the corrosion resistance properties of organic paint films were analyzed and a direct relationship between the Hurst exponent and the corrosion resistance of different coatings was established. Greisiger and Schauer [7] discussed the applicability of different methods to electrochemical potential and current noise analysis. They concluded that the Hurst exponent allows the extraction of mechanistic information about corrosion processes, and hence is suitable for characterizing coatings.

We give a brief introduction to the R/S method, in the lines of Feder's [12] work. Let the time coordinate, t, be discretized in terms of the time resolution, Δt, as

$$ i = t/\Delta t \qquad (2) $$

The discrete time record of a given process is denoted by x_i, i = 0, 1, ..., N, if the total duration of the observation is T = NΔt. According to the basic idea of the R/S method the time record is evaluated for a certain time interval, called the time lag, the length of which is τ = jΔt and which begins at t_0 = l_0 Δt. Obviously, j < N and l_0 < N hold. The average of x_i over the time lag is calculated as

$$ \bar{x}_{j,l_0} = \frac{1}{j} \sum_{i=l_0+1}^{l_0+j} x_i \qquad (3) $$

Next the accumulated deviation from the mean, y_{k,l_0}, is evaluated as

$$ y_{k,l_0} = \sum_{i=l_0+1}^{l_0+k} \left( x_i - \bar{x}_{j,l_0} \right) \qquad (4) $$

where k takes the values 1 ≤ k ≤ j.
In order to visualize the meaning of Eq. (4), let us refer to the hydrological context in which the method was devised by Hurst. Here x_i is the annual water input into a reservoir in the ith year of a series of N years, and y_{k,l_0} is the net gain or loss of stored water in the year l_0 + k, i.e. some time within the time lag in question. That is, the annual increment is the object of analysis. The ideal reservoir never empties and never overflows, so the required storage capacity is equal to the difference between the maximum and the minimum value of y_{k,l_0} over j. This difference is called the range, R_{j,l_0}:

$$ R_{j,l_0} = \max_{1 \le k \le j} y_{k,l_0} - \min_{1 \le k \le j} y_{k,l_0} \qquad (5) $$

The variance of x_i for the same period, τ, is given as

$$ S_{j,l_0}^2 = \frac{1}{j} \sum_{i=l_0+1}^{l_0+j} \left( x_i - \bar{x}_{j,l_0} \right)^2 \qquad (6) $$

and the quotient R_{j,l_0}/S_{j,l_0} is called the rescaled range. The above expressions refer to a given position of the time lag on the time axis. However, the time lag can be shifted, and the procedure given by Eqs. (3)-(6) can be repeated for each position. Thus a series of rescaled ranges is obtained, the average of which can be evaluated. As a non-unique but rational choice, the lag is shifted in steps of j, so that a series of non-overlapping but contacting intervals is constructed. In other words, a series of R_{j,l_0}/S_{j,l_0} is evaluated with j fixed and l_0 varied as l_0 = (m-1)j, where m = 1, 2, ..., [N/j], with the square bracket denoting the integer part. Then the rescaled range for the time lag τ is calculated as the average

$$ R/S = \frac{1}{[N/j]} \sum_{m=1}^{[N/j]} R_{j,(m-1)j} \, / \, S_{j,(m-1)j} \qquad (7) $$

Hurst observed that there is a great number of natural phenomena for which the ratio R/S obeys the rule

$$ R/S \propto \tau^{H} \qquad (8) $$


where H is called the Hurst exponent. The Hurst exponent was seen to be between 0 and 1. The value H = 1/2 has a special significance, because it reflects that the observations are statistically independent of each other. This is the random noise case. For example, the increment series, i.e. the series of displacements, in Brownian motion is a sequence of uncorrelated random events with Gaussian distribution and zero mean. The Hurst exponent for such a time record is 1/2.

For 1/2 < H < 1 the time series is called persistent, i.e. an increasing trend in the past implies, on the average, a continued increasing trend in the future, and a decreasing trend in the past implies a decrease in the future. If 0 < H < 1/2 prevails, the time series observed is anti-persistent, i.e. an increasing trend in the past implies a decreasing trend in the future and vice versa. Persistency is found also in cases where the time series exhibit clear trends with relatively little noise [11][14][22].
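The following Python sketch implements Eqs. (3)-(8) directly, under the non-overlapping-lag convention described above. It is an illustration, not the authors' code; applied to a record of uncorrelated increments it returns a value near H = 1/2:

```python
import numpy as np

def rescaled_range(x, j):
    """Average R/S over the non-overlapping lags of length j (Eqs. (3)-(7))."""
    rs = []
    for m in range(len(x) // j):          # successive windows, l0 = 0, j, 2j, ...
        w = x[m * j:(m + 1) * j]
        y = np.cumsum(w - w.mean())       # accumulated deviation, Eq. (4)
        R = y.max() - y.min()             # range, Eq. (5)
        S = w.std()                       # square root of the variance, Eq. (6)
        if S > 0:
            rs.append(R / S)
    return np.mean(rs)                    # Eq. (7)

def hurst_exponent(x):
    """Slope of log(R/S) against log(tau), Eq. (8).

    The extreme lags carry poor statistics (see the discussion of the
    double logarithmic plot below), so the smallest and largest scales
    are left out of the fit.
    """
    lags = np.unique(np.logspace(0.9, np.log10(len(x) // 4), 16).astype(int))
    lags = lags[1:-1]                     # drop the worst-behaved end points
    log_rs = [np.log(rescaled_range(x, j)) for j in lags]
    H, _ = np.polyfit(np.log(lags), log_rs, 1)
    return H

rng = np.random.default_rng(1)
increments = rng.standard_normal(1500)    # e.g. 1500 daily log returns
print(hurst_exponent(increments))         # near 0.5 (finite-sample bias pushes it a little higher)
```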
2.4 Interpretation of fractal dimension

We have already mentioned that the fractal dimension of an object is a measure of complexity and degree of space filling. When the object is a series in time, the dimension also tells us something about the relation between increments. It is a useful and meaningful insight into series of natural processes.

2.5 Fractional Brownian motion

A particle undergoing Brownian motion moves by jumping step-lengths which are given by independent Gaussian random variables. For one-dimensional motion the position of the particle in time, X(t), is given by the addition of all past increments. The function X(t) is a self-affine fractal, whose graph has dimension 1.5.

Fractional Brownian motion (fBm) generalizes X(t) by allowing the increments to be correlated. Ordinary Brownian motion can be defined by

$$ X(t) - X(t_0) = \xi \, |t - t_0|^{1/2} \qquad (9) $$

where H = 1/2, ξ is a normalized independent Gaussian process and X(t_0) is the initial position [4][5]. Replacing the exponent H = 1/2 in Eq. (9) with any other number in the range 0 < H < 1 defines an fBm function X_H(t). The exponent H here corresponds to the statistic H that R/S analysis calculates.

The correlation function of future increments with past increments for the motion X_H(t) can be shown to be [5]

$$ C(t) = 2^{2H-1} - 1 \qquad (10) $$

Clearly, C(t) = 0 for H = 1/2; increments in ordinary Brownian motion are independent. For H > 1/2, C(t) is positive for all t. This means that after a positive increment, future increments are more likely to be positive. This is known as persistence. When H < 1/2, increments are negatively correlated, which means an increase in the past makes a decrease more likely in the future. This is called anti-persistence.

For self-affine functions such as X_H(t), the fractal dimension D is related to H by [4]

$$ D = 2 - H \qquad (11) $$
We can then identify persistence or anti-persistence in data sets whose graphs are fractal. Persistent time series
show long-term memory effects. An increasing trend in
the past is likely to continue in the future because future
increments are positively correlated to past ones.
Similarly, a negative trend will persist. This means that
extreme values in the series tend to be more extreme than
for uncorrelated series. In the context of climatic data,
droughts or extended rain periods are more likely for
persistent data.
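As a numerical check on the values quoted in the abstract, inserting the measured H = 0.6699 into Eqs. (11) and (10) gives

$$ D = 2 - H = 1.3301, \qquad C = 2^{2H-1} - 1 = 2^{0.3398} - 1 \approx 0.2656, $$

in agreement with the fractal dimension D = 1.3301 and the three-day autocorrelation coefficient C = 0.26558 reported there.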

In order to determine the Hurst exponent, log(R/S) is plotted against log τ and the slope renders H. However, not all the points of this plot have the same statistical weight: when τ is very small, a large number of R/S data can be calculated, but their scatter is large; when τ is very large, only a few R/S data are at hand, so the statistics are poor. For this reason the first and the last few points of the double logarithmic plot are usually discarded.

To begin with, we verified whether the changes of a stock price time series follow the random walk hypothesis, using the rescaled range analysis. We analyzed the stock price time series of the Nikkei Stock Average. The analysis period covered the data for 1500 days, from July 1996 to October 2002. In the analysis, the logarithmic rate of return was applied to the original data as the analysis object.

Figure 1. Rescaled range analysis of the Nikkei Stock Average time series (log-log plot with fitted line of slope 0.6699)

Figure 1 shows the result of analyzing the Nikkei Stock Average prices for 1500 days. From the gradient of the fitted straight line, the Hurst exponent over the analysis period is obtained as H = 0.6699.

Figure 2. Relationship of scaling interval (N) versus Hurst exponent (H)

As shown in Figure 2, H = 0.88, corresponding to N = 3, is the maximum. Therefore, it is found that data over 3 days show the strongest correlation under the fractal analysis of the Nikkei Stock Average prices. This knowledge is discovered using a feature-rule map with rough set theory [26], shown in Table 1. First a Hurst exponent is obtained; then the fractal dimension and the autocorrelation coefficient are calculated from the Hurst exponent.

Table 1. Knowledge extraction with feature-rule map with rough set theory
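The text does not spell out how the H-versus-N curve of Figure 2 was produced. One plausible reading, sketched below under that assumption, is to attach to each scaling interval N the local slope of the log(R/S) curve between neighbouring lags, reusing rescaled_range from the earlier sketch:

```python
import numpy as np

def local_hurst(x, scales):
    """Local slope of log(R/S) between neighbouring lags, read as H(N)."""
    scales = np.asarray(list(scales))
    rs = np.array([rescaled_range(x, int(j)) for j in scales])
    H = np.diff(np.log(rs)) / np.diff(np.log(scales))
    return scales[:-1], H

# e.g. scaling intervals N = 2 .. 50 on the 1500-day log-return series:
# N_vals, H_vals = local_hurst(log_returns, range(2, 51))
```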

Table 2. Pawlak's lower and upper approximation

  Class                  Objects   Lower approx.   Upper approx.   Approximation accuracy
  Brownian motion                                                  1
  N = 3                  3                         1               1
  Similarity (fractal)   1                                         1

3 Structure of Neural Network for Time Series

In order to embed the discovered knowledge into the structure of neural networks, our developed time delayed neural network is found to be suitable [2].

3.1 Time Delayed Neural Network (TDNN)

In order to handle dynamical systems, time delay elements representing the obtained knowledge are put into the inputs of neural networks [2]. The structure of the FIR filter is shown in Figure 3. It is a finite impulse response (FIR) digital filter which is connected to each input of a back-propagation-type forward neural network (BPNN). A time delay element is also put between the inputs of the filter.

Figure 3. FIR filter (Z^{-1}: delay element)

Here f is a sigmoid function and the weights are corrected by the back propagation (BP) algorithm.
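A minimal sketch of the input stage of Figure 3 follows. It is illustrative only: the paper does not publish its filter coefficients, so the taps below are placeholders. Each network input taps a delay line, and a 3-order FIR filter combines the delayed samples before they reach the BP network:

```python
import numpy as np

def delay_line(series, depth):
    """Rows of delayed samples [x(t), x(t-1), ..., x(t-depth+1)]."""
    return np.column_stack([series[depth - 1 - d: len(series) - d]
                            for d in range(depth)])

def fir_stage(windows, taps):
    """Apply the FIR filter along each input's delay line (Figure 3)."""
    return np.array([np.convolve(w, taps, mode="valid") for w in windows])

taps = np.array([0.5, 0.3, 0.2])               # placeholder 3-order FIR taps
x = np.sin(np.linspace(0.0, 20.0, 200))        # any input signal
features = fir_stage(delay_line(x, 5), taps)   # inputs seen by the BP network
```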
4 Short-term Prediction Using BPNN

We obtained the features (knowledge: N = 3) of the time series by the fractal analysis. Two kinds of 3-layer BP neural networks, which have a delay element between each input node, are considered. In the first one, no filter is connected at any input node; the second one has a 3-order FIR filter at each input node.

4.1 Simulation by the 3-layer BPNN without filters

No filter is connected at each input node. The structure of the neural network is shown in Table 3.

Table 3. BP network structure

                 3 input nodes   5 input nodes
  Hidden nodes   3               3
  Output node    1               1
  Epochs         500             300

4.1.1 BPNN simulation with 3 input nodes

The structure of the neural network is illustrated in Figure 4. The simulation result with 3 input nodes is shown in Figure 5. We predicted for seven days from the 1501st day. The error and the number of epochs are given in Table 4.

Table 4. BPNN without filters with 3 input nodes

  Error    59.4803
  Epochs   201

4.1.2 BPNN simulation with 5 input nodes

In the same way, the structure of the neural network is illustrated in Figure 6. The simulation result with 5 input nodes is shown in Figure 7. The error and the number of epochs are listed in Table 5.

Table 5. BPNN without filters with 5 input nodes

  Error    304.9743
  Epochs   447
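For reference, here is a compact end-to-end sketch of the Section 4 experiments, with several stated assumptions: the actual 1500-day Nikkei series is not distributed with the paper, so a synthetic stand-in is used; scikit-learn's MLPRegressor plays the role of the 3-layer BP network; and the error is taken as the mean absolute error over the seven predicted days, since the paper does not define its error measure. Passing the raw delayed windows gives the 'without filters' variant; routing them through a FIR stage as in the Section 3 sketch gives the filtered one.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def run(series, n_inputs, features=lambda X: X, epochs=500):
    """Train a 3-layer BP network on delayed inputs, test on the last 7 days."""
    X = np.array([series[i:i + n_inputs]
                  for i in range(len(series) - n_inputs)])
    y = series[n_inputs:]
    X = features(X)                                      # optional FIR stage
    net = MLPRegressor(hidden_layer_sizes=(3,),          # 3 hidden nodes
                       activation="logistic",            # sigmoid units
                       max_iter=epochs, random_state=0)
    net.fit(X[:-7], y[:-7])
    return np.abs(net.predict(X[-7:]) - y[-7:]).mean()   # 7-day prediction error

rng = np.random.default_rng(0)
nikkei = 15000.0 + np.cumsum(100.0 * rng.standard_normal(1507))  # stand-in data

print(run(nikkei, 3))                                    # BPNN without filters
fir = lambda X: np.array([np.convolve(w, [0.5, 0.3, 0.2], mode="valid")
                          for w in X])
print(run(nikkei, 5, features=fir))                      # BPNN with FIR filters
```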

Figure 4. Structure of the BPNN without filters with 3 input nodes (delay elements Z^{-1} feeding a three-layer back propagation neural network)

Figure 5. BPNN simulation without filters with 3 input nodes

Figure 6. Structure of the BPNN without filters with 5 input nodes

Figure 7. BPNN simulation without filters with 5 input nodes

Figure 8. Structure of the BPNN with filters with 3 input nodes (FIR filters in front of the three-layer back propagation neural network)

Figure 9. BPNN simulation with filters with 3 input nodes

Figure 10. Structure of the BPNN with filters with 5 input nodes

Figure 11. BPNN simulation with filters with 5 input nodes

4.2 Simulation by the 3-layer BPNN with filters

Table 7 shows the structure of the 3-layer BPNN with filters.

Table 7. 3-layer BPNN with filters: network structure

                       3 input nodes   5 input nodes
  3-order FIR filter   connected       connected
  Hidden nodes         3               3
  Output node          1               1
  Epochs               500             300

4.2.1 Simulation with 3 input nodes

The structure of the neural network is illustrated in Figure 8. The simulation result is shown in Figure 9. Table 8 lists the prediction error and the number of epochs.

Table 8. Simulation with filters, 3 input nodes

  Error    47.3381
  Epochs   37
4.2.2 Simulation with 5 input nodes

Similarly, the structure of the neural network is illustrated in Figure 10 and the simulation result is shown in Figure 11. The error and the number of epochs are listed in Table 9.

Table 9. Simulation with filters, 5 input nodes

  Error    88.2962
  Epochs   154

Table 10 shows that with both 3 and 5 input nodes the prediction accuracy is fairly high.

Table 10. Comparison of both simulations with filters

           3 input nodes   5 input nodes
  Error    47.3381         88.2962
  Epochs   37              154

Table 11. Comparison of both networks

                               Error      Epochs
  3 input nodes                59.4803    201
  5 input nodes                304.9743   447
  3 input nodes with filters   47.3381    37
  5 input nodes with filters   88.2962    154

5 Conclusion

A data mining technique is applied to time series analysis and prediction. From a large amount of data, understandable knowledge is extracted using a Hurst exponent, a fractal analysis method and an autocorrelation analysis method. Then it is embedded into the suitable network, the BPNN with FIR filters. The accurate prediction is obtained in the Nikkei Average Stock price time series by the BPNN with filters.

References

[1] O. Castillo and P. Melin, "Hybrid Intelligent Systems for Time Series Prediction Using Neural Networks, Fuzzy Logic, and Fractal Theory", IEEE Transactions on Neural Networks, Vol. 13, No. 6, pp. 1395-1407, Nov. 2002.

[2] M. S. Shafique and Y. Dote, "An Empirical Study on Fault Diagnosis for Nonlinear Time Series using Linear Regression Method and FIR Network", Trans. IEE of Japan, Vol. 120-C, No. 10, pp. 1435-1440, Oct. 2000.

[3] F. Yakuwa, S. Satoh, M. S. Shaikh, and Y. Dote, "Fault Diagnosis for Dynamical Systems Using Soft Computing", Proceedings of the 2002 World Congress on Computational Intelligence, Honolulu, Hawaii, U.S.A., May 12-17, 2002.

[4] J. Feder, Fractals, Plenum Press, New York, 1988, p. 288.

[5] N. Wiener, "Differential space", J. Math. Phys. Mass. Inst. Technol. 2, 1923, pp. 131-174.

[6] T. Vicsek, Fractal Growth Phenomena, World Scientific, Singapore, 1992, p. 488.

[7] H. Greisiger and T. Schauer, Prog. Org. Coat. 39, 2000, p. 31.

[8] H. E. Hurst, Nature 180, 1957, p. 494.

[9] H. E. Hurst, R. P. Black and Y. M. Simaika, Long-Term Storage: An Experimental Study, Constable, London, 1965.

[10] B. Mandelbrot and J. R. Wallis, Water Resour. Res. 5, 1969, p. 228.

[11] B. Mandelbrot and J. R. Wallis, Water Resour. Res. 5, 1969, p. 967.

[12] J. Feder, Fractals, Plenum, New York, 1988.

[13] D. J. Daley, Ann. Probab. 27, 1999, p. 2035.

[14] R. W. Komm, Solar Phys. 156, 1995, p. 17.

[15] R. Oliver and J. L. Ballester, Solar Phys. 169, 1996, p. 216.

[16] A. Gadomski, Mod. Phys. Lett. B 11, 1997, p. 645.

[17] I. A. Lebedev and B. G. Shaikhatdenov, J. Phys. G: Nucl. Part. Phys. 23, 1997, p. 637.

[18] M. A. F. Gomes, F. A. O. Souza and V. P. Brito, J. Phys. D 31, 1998, p. 3223.

[19] C. L. Jones, G. T. Lonergan and D. E. Mainwaring, J. Phys. A 29, 1996, p. 2509.

[20] C. Heneghan and G. McDarby, Phys. Rev. E 62, 2000, p. 6103.

[21] C. W. Lung, J. Jiang, E. K. Tian and C. H. Zhang, Phys. Rev. E 60, 1999, p. 5121.

[22] B. A. Carreras, B. Ph. van Milligen, M. A. Pedrosa, R. Balbin, C. Hidalgo, D. E. Newman, E. Sanchez, M. Frances, I. Garcia-Cortes, J. Bleuel, M. Endler, C. Riccardi, S. Davies, G. F. Matthews, E. Martines, V. Antoni, A. Latten and T. Klinger, Phys. Plasmas 5, 1998, p. 3632.

[23] B. M. Gammel, Phys. Rev. E 58, 1998, p. 2586.

[24] M. Moon and B. Skerry, J. Coat. Technol. 67, 1995, p. 35.

[25] L. A. Zadeh, plenary talk, "From Computing with Numbers to Computing with Words: From Manipulation of Measurements to Manipulation of Perceptions", Proceedings of IWSCI-99, Muroran, Japan, June 16-18, 1999.

[26] A. Kusiak, "Rough Set Theory: A Data Mining Tool for Semiconductor Manufacturing", IEEE Trans. on Electronics Packaging Manufacturing, Vol. 24, No. 1, pp. 44-50, January 2001.
