
The Prediction of the Financial Time Series Based on Correlation Dimension
Chen Feng¹, Guangrong Ji¹, Wencang Zhao¹,², and Rui Nian¹

¹ College of Information Science and Engineering, Ocean University of China,
Qingdao, 266003, China
fccjg@sdu.edu.cn, grji@mail.ouc.edu.cn, nianrui_80@163.com
² College of Automation and Electronic Engineering, Qingdao University of Science
& Technology, Qingdao, 266042, China
wencangzhao@mail.edu.cn

Abstract. In this paper we first analyze the chaotic characteristics of three financial time series (the Hang Seng Index (HSI), the Shanghai Stock Index and the US gold price) based on phase space reconstruction. When we adopt feedforward neural networks to predict these time series, however, we find that the method lacks a criterion for selecting the training set, so we present a new method that uses the correlation dimension (CD) as the criterion. Experiments show that the method is effective.

1 Introduction
The prediction of financial time series has long interested researchers because of its importance for macroeconomic regulation and microeconomic management. To predict financial time series better, researchers have made great efforts to find the laws that govern them. In the past, financial time series were regarded as random walks and models were built on that assumption, but experiments showed the resulting predictions to be poor [1].
In recent years researchers have found that some financial time series are in fact chaotic rather than random. Reference [2] showed that hourly data of four spot exchange rates (British Pound, Deutschmark, Japanese Yen and Swiss Franc) are chaotic; reference [3] pointed out that the American national debt time series has a chaotic attractor; and reference [4] showed that some metal prices in the London market follow a mean process that is dynamically chaotic.
Many methods, such as the maximum Lyapunov exponent method [5] and the weighted one-rank local-region method [6], are used to predict chaotic time series. In the maximum Lyapunov exponent method, a tiny error in computing the maximum Lyapunov exponent leads to a large prediction error. The weighted one-rank local-region method uses a linear model to approximate the local chaotic dynamics, but a linear model is inherently limited in representing a nonlinear system. Consequently, these methods do not predict economic time series well enough.
Meanwhile, owing to their strong nonlinear mapping ability, many kinds of neural networks, such as BPNN [7], GRNN [8] and RNN [9], have been used to predict financial time series. In this paper we adopt the feedforward neural networks used in reference [10] as the training networks for predicting financial time series. With this kind of network, introduced in Section 3, many classical chaotic systems such as the Lorenz system and the Hénon map can be predicted very well.

L. Wang, K. Chen, and Y.S. Ong (Eds.): ICNC 2005, LNCS 3610, pp. 1256-1265, 2005.
© Springer-Verlag Berlin Heidelberg 2005

However, while studying this method we found that the choice of the training set is vague and lacks a criterion. Therefore, in Section 4 we put forward a new method for choosing the training set. Since the financial time series are chaotic, we use the correlation dimension, a kind of fractal dimension that can characterize chaotic behavior, as the criterion for choosing the training set. Experiments show that the method is effective.
To predict a time series with feedforward neural networks, the phase space must first be reconstructed. Section 2 therefore introduces the delay coordinate method used for the reconstruction and computes the maximum Lyapunov exponents of the financial time series to show that all three are chaotic. Section 3 presents the architecture of the neural networks. Section 4 briefly defines the correlation dimension and explains how the training set is chosen according to it. Section 5 uses the three sets of economic data to demonstrate the effectiveness of the new method, and Section 6 concludes the paper.

2 Phase Space Reconstruction


2.1 Theory Introduction
To recover the dynamic characteristics of the original financial systems, the phase space must first be reconstructed. Takens' theorem, which reveals the dynamic mechanism of some nonlinear systems, is the theoretical basis of phase space reconstruction.
Takens' theorem: Let $M$ be a $d$-dimensional manifold, let $\varphi: M \to M$ be a smooth diffeomorphism, and let $y: M \to \mathbb{R}$ have continuous second-order derivatives. Define $\Phi_{(\varphi, y)}: M \to \mathbb{R}^{2d+1}$ by

$$\Phi_{(\varphi, y)}(x) = \big(y(x),\, y(\varphi(x)),\, y(\varphi^{2}(x)),\, \ldots,\, y(\varphi^{2d}(x))\big) \qquad (1)$$

Then, for generic $\varphi$ and $y$, $\Phi_{(\varphi, y)}$ is an embedding of $M$ in $\mathbb{R}^{2d+1}$. The theorem indicates that a suitable embedding dimension can be found to recover the trajectory of the original system [11].
In this paper the delay coordinate method is used to reconstruct the phase space. An embedding dimension $m$ and a delay time $\tau$ are chosen to create $N_m$ points, each of which is an $m$-dimensional vector $Y_i$:

$$Y_i = \big(x_i,\, x_{i+\tau},\, \ldots,\, x_{i+(m-1)\tau}\big), \qquad i = 1, 2, \ldots, N_m \qquad (2)$$

where $N_m = N - (m-1)\tau$, so that the last vector is $Y_{N_m} = (x_{N_m}, x_{N_m+\tau}, \ldots, x_N)$. The embedding dimension $m$ and the delay time $\tau$ are important parameters because they determine the quality of the reconstructed phase space.
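The delay-coordinate construction of eq. (2) maps directly onto array slicing. The following is a minimal NumPy sketch, assuming 0-based indexing (so row 0 of the result is $Y_1$); the function name `delay_embed` is our own and not taken from the paper.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Return an (N_m, m) array whose rows are the delay vectors Y_i of eq. (2)."""
    x = np.asarray(x, dtype=float)
    n_m = len(x) - (m - 1) * tau            # N_m = N - (m - 1) * tau
    if n_m <= 0:
        raise ValueError("series too short for this m and tau")
    # column k holds x_{i + k*tau} for i = 0, ..., N_m - 1
    return np.column_stack([x[k * tau : k * tau + n_m] for k in range(m)])

Y = delay_embed(np.arange(10), m=3, tau=2)
print(Y.shape)        # (6, 3)
print(Y[0], Y[-1])    # [0. 2. 4.] [5. 7. 9.]
```

The last row ends at $x_N$, which is exactly the condition $N_m = N - (m-1)\tau$.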
In this paper we use the false nearest-neighbor method [12] to determine the embedding dimension $m$. The idea is that, when the dimension is increased from $m$ to $m+1$, we check whether any of the near neighbors of the point $Y_i$ are false neighbors; if there are none, the geometric structure of the attractor has been fully unfolded. In dimension $m$, let $Y_{i'}$ be the nearest neighbor of $Y_i$, at distance $\|Y_{i'} - Y_i\|^{(m)}$. When the dimension is increased to $m+1$, their distance becomes $\|Y_{i'} - Y_i\|^{(m+1)}$. If

$$\frac{\|Y_{i'} - Y_i\|^{(m+1)} - \|Y_{i'} - Y_i\|^{(m)}}{\|Y_{i'} - Y_i\|^{(m)}} > R_T, \qquad 10 \le R_T \le 50 \qquad (3)$$

then $Y_{i'}$ is a false neighbor of $Y_i$, where $R_T$ is a threshold. We start at dimension 2 and increase the dimension by one at a time. The optimum dimension $m$ is reached when either the proportion of false nearest neighbors falls below 5% or the number of false nearest neighbors no longer decreases as the dimension increases.
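A brute-force sketch of the false-nearest-neighbor test of eq. (3) and the stopping rule above, under several assumptions: the function names are hypothetical, $R_T = 15$ is just one value from the stated range $10 \le R_T \le 50$, and the "no longer decreases" rule is read as keeping the previous dimension. The full distance matrix needs $O(N_m^2)$ memory, so for series as long as the ones used here a spatial index (for example `scipy.spatial.cKDTree`) would be the practical choice.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Rows are the delay vectors Y_i of eq. (2)."""
    x = np.asarray(x, dtype=float)
    n_m = len(x) - (m - 1) * tau
    return np.column_stack([x[k * tau : k * tau + n_m] for k in range(m)])

def false_neighbor_fraction(x, m, tau, r_t=15.0):
    """Fraction of points whose nearest neighbour in dimension m is false by eq. (3)."""
    Y_next = delay_embed(x, m + 1, tau)      # points in dimension m + 1
    Y = Y_next[:, :m]                        # the same points in dimension m
    d = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    nn = np.argmin(d, axis=1)                # index of Y_i' for every Y_i
    r_m = d[np.arange(len(Y)), nn]           # ||Y_i' - Y_i||^(m)
    r_m1 = np.linalg.norm(Y_next - Y_next[nn], axis=1)  # ||Y_i' - Y_i||^(m+1)
    ok = r_m > 0                             # skip exact duplicates
    ratio = (r_m1[ok] - r_m[ok]) / r_m[ok]   # left-hand side of eq. (3)
    return float(np.mean(ratio > r_t))

def choose_embedding_dimension(x, tau, r_t=15.0, m_max=20):
    prev = np.inf
    for m in range(2, m_max + 1):            # start at dimension 2
        frac = false_neighbor_fraction(x, m, tau, r_t)
        if frac < 0.05:
            return m
        if frac >= prev:                     # assumption: no improvement, keep previous m
            return m - 1
        prev = frac
    return m_max

# Usage on the x-coordinate of the Henon map (a = 1.4, b = 0.3).
x = np.empty(1100)
xn, yn = 0.1, 0.1
for i in range(len(x)):
    xn, yn = 1.0 - 1.4 * xn ** 2 + yn, 0.3 * xn
    x[i] = xn
x = x[100:]                                  # drop the transient
print(false_neighbor_fraction(x, 2, 1), choose_embedding_dimension(x, tau=1))
```

For the Hénon map two delay coordinates already determine the state, so the sketch should find essentially no false neighbors at $m = 2$ and stop there.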
2.2 Financial Time Series Phase Space Reconstruction
In this paper we use the opening prices of the Hang Seng Index (HSI) (4067 points from 31 December 1986 to 16 June 2003), the Shanghai Stock Index (2729 points from 19 December 1990 to 29 January 2001) and the US gold price (7277 points from 2 January 1975 to 8 August 2003) as the experimental data. The three time series are shown in Fig. 1.

Fig. 1. (a) Opening prices of the Hang Seng Index; (b) opening prices of the Shanghai Stock Index; (c) opening prices of US gold.

Fig. 1 shows that some local segments of each curve resemble the whole. To quantify the complexity of the three economic series, we compute their box dimensions [13]. The box dimension is a kind of fractal dimension commonly used to measure continuous curves; the values are listed in Table 1.
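The paper does not spell out its box-counting procedure, so the following is only a simple sketch under our own assumptions: the time axis and the values are both rescaled to $[0, 1]$, each column of width $\varepsilon$ is covered by the boxes spanning the curve's range within that column, and $D$ is the slope of $\log N(\varepsilon)$ against $\log(1/\varepsilon)$ over a hypothetical set of scales `ks`.

```python
import numpy as np

def box_dimension(series, ks=(8, 16, 32, 64, 128, 256, 512)):
    """Box-counting dimension of the graph of a 1-D series (simple sketch)."""
    y = np.asarray(series, dtype=float)
    t = np.linspace(0.0, 1.0, len(y))            # time rescaled to [0, 1]
    y = (y - y.min()) / (y.max() - y.min())      # values rescaled to [0, 1]
    counts = []
    for k in ks:                                 # box side eps = 1/k
        eps = 1.0 / k
        col = np.minimum((t / eps).astype(int), k - 1)
        n_boxes = 0
        for c in range(k):
            seg = y[col == c]
            if seg.size:
                # boxes of height eps spanning the curve's range inside this column
                n_boxes += int(seg.max() // eps) - int(seg.min() // eps) + 1
        counts.append(n_boxes)
    # N(eps) ~ eps^(-D), so D is the slope of log N against log(1/eps) = log k
    return float(np.polyfit(np.log(ks), np.log(counts), 1)[0])

line = np.linspace(0.0, 5.0, 2 ** 15)
walk = np.cumsum(np.random.default_rng(0).standard_normal(2 ** 15))
line_dim, walk_dim = box_dimension(line), box_dimension(walk)
print(line_dim, walk_dim)
```

A straight line should give a value close to 1 and a simulated random walk a value near 1.5; finite scale ranges bias such estimates somewhat, so the numbers are approximate.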
According to the theory in reference [14], if the capital market followed a random walk, the box dimension would be 1.5. A time series whose box dimension lies between 1 and 1.5 is called a long-range positively correlated fractal time series, meaning that past increments are positively correlated with future increments. A time series whose box dimension lies between 1.5 and 2 is called a long-range negatively correlated fractal time series, meaning that past increments are negatively correlated with future increments. Table 1 shows that all three box dimensions lie between 1 and 1.5, so the financial time series do not follow a pure random walk, and there is long-range positive correlation in them.
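The 1.5 threshold can be made explicit with a standard relation that is not stated in the paper: for a self-affine trace such as fractional Brownian motion, the box dimension $D$ and the Hurst exponent $H$ satisfy $D = 2 - H$, where an uncorrelated random walk has $H = 0.5$ and $H > 0.5$ means persistent (positively correlated) increments. Treating the three series as such traces is an assumption; under it, the values in Table 1 give

$$H = 2 - D:\qquad H_{\mathrm{HSI}} = 2 - 1.16016 = 0.83984,\quad H_{\mathrm{Shanghai}} = 2 - 1.16631 = 0.83369,\quad H_{\mathrm{gold}} = 2 - 1.18816 = 0.81184$$

All three exceed 0.5, which is consistent with the long-range positive correlation described above.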

Table 1. The box dimensions of the economic time series

Series                 Box dimension
HSI                    1.16016
Shanghai Stock Index   1.16631
US gold price          1.18816

We reconstruct the phase space by calculating the embedding dimension $m$ and the delay time $\tau$ with the prediction error minimizing method [15]. For visualization, we select three coordinates from each $m$-dimensional reconstructed phase space and plot them in Fig. 2.

Fig. 2. Three coordinates of the reconstructed phase space of each financial time series: (a) Hang Seng Index opening prices, coordinates 1, 9 and 17; (b) Shanghai Stock Index opening prices, coordinates 1, 10 and 19; (c) US gold opening prices, coordinates 1, 10 and 19.

To show that these financial time series are chaotic, we compute the maximum Lyapunov exponent $\lambda_{\max}$ with the small data sets method [16]. The Lyapunov exponent is a quantitative measure of sensitive dependence on initial conditions: it characterizes the average rate of divergence of two neighboring trajectories. It is not necessary to calculate the whole Lyapunov spectrum, because a bounded time series with a positive maximum Lyapunov exponent indicates chaos. Moreover, the maximum Lyapunov exponent estimates the level of chaos in the underlying dynamical system. Table 2 shows that the maximum Lyapunov exponents are all positive, so the financial time series are chaotic.
Because chaotic systems are sensitive to initial values, chaotic time series have limited predictability. Since the maximum Lyapunov exponent characterizes the average exponential divergence rate of neighboring orbits, its reciprocal $1/\lambda_{\max}$ determines the maximum predictable time. All results are listed in Table 2.
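The small data sets method is the approach of Rosenstein et al. (reference [5]): find each point's nearest neighbor while excluding temporally close points, follow both orbits forward, and fit the slope of the mean log-separation. The sketch below follows that idea but is not the authors' implementation; `min_sep`, `k_max` and the fitting range are assumptions that would need tuning on real data. It is checked on the logistic map $x \mapsto 4x(1-x)$, whose exponent is $\ln 2 \approx 0.693$ per step.

```python
import numpy as np

def delay_embed(x, m, tau):
    x = np.asarray(x, dtype=float)
    n_m = len(x) - (m - 1) * tau
    return np.column_stack([x[k * tau : k * tau + n_m] for k in range(m)])

def largest_lyapunov(x, m, tau, min_sep, k_max):
    """Rosenstein-style estimate of lambda_max, in units of 1/sample."""
    Y = delay_embed(x, m, tau)
    n = len(Y) - k_max                      # points whose orbit can be followed k_max steps
    A = Y[:n]
    d = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)
    idx = np.arange(n)
    # Neighbours closer than min_sep in time (including the point itself) are excluded.
    d[np.abs(idx[:, None] - idx[None, :]) <= min_sep] = np.inf
    nn = np.argmin(d, axis=1)
    # Average log-separation of every pair after k steps, k = 0..k_max.
    curve = [np.mean(np.log(np.linalg.norm(Y[idx + k] - Y[nn + k], axis=1)))
             for k in range(k_max + 1)]
    # Slope of the (assumed linear) divergence curve estimates lambda_max.
    return float(np.polyfit(np.arange(k_max + 1), curve, 1)[0])

x = np.empty(1500)
x[0] = 0.3
for t in range(1, len(x)):
    x[t] = 4.0 * x[t - 1] * (1.0 - x[t - 1])
lam = largest_lyapunov(x, m=2, tau=1, min_sep=10, k_max=5)
print(lam, 1.0 / lam)   # estimate and the corresponding predictable time
```

The reciprocal of the returned value is a maximum predictable time measured in samples, in the sense used for Table 2.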

Table 2. Chaotic analysis of the financial time series

Series                 Embedding     Delay      Maximum Lyapunov     Maximum predictable
                       dimension m   time τ     exponent λmax        time 1/λmax
HSI                    17            6          0.069                14
Shanghai Stock Index   19            4          0.029                30
US gold price          19            7          0.046                20

3 Feedforward Neural Networks


The architecture of the feedforward neural networks used in this paper is $m:2m:m:1$, where $m$ is the embedding dimension: $m$ input units, two hidden layers with $2m$ and $m$ units, and one output unit. The topology is shown in Fig. 3.

Fig. 3. Architecture of the feedforward neural networks

When an $m$-dimensional input vector from the training set is presented to the network, each hidden unit $j$ in the first hidden layer receives the net input

$$h_j = \sum_i w_{ji} x_i \qquad (4)$$

and produces the output

$$V_j = \tanh(h_j) = \tanh\Big(\sum_i w_{ji} x_i\Big) \qquad (5)$$

where $w_{ji}$ is the connection weight between the $i$th input unit and the $j$th hidden unit in the first layer. Applying the same procedure to the units of the following layers, the final output is

$$z' = \sum_l w_{sl} \tanh\Big(\sum_j w_{lj} \tanh\Big(\sum_i w_{ji} x_i\Big)\Big) \qquad (6)$$


where the hyperbolic tangent activation function is used for all hidden units and a linear function for the output unit.
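Equations (4)-(6) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the Gaussian weight initialisation and the function names are hypothetical, and, like eqs. (4)-(6), the sketch has no bias terms.

```python
import numpy as np

def init_weights(m, seed=0):
    """Hypothetical random initialisation for the m:2m:m:1 network (no biases)."""
    rng = np.random.default_rng(seed)
    sizes = [m, 2 * m, m, 1]
    # W[k] has shape (units in layer k+1, units in layer k), so W[0][j, i] = w_ji
    return [rng.normal(0.0, 1.0 / np.sqrt(a), size=(b, a))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(W, x):
    """Network output z' of eq. (6) for one m-dimensional input vector x."""
    V = np.tanh(W[0] @ x)        # eqs. (4)-(5): first hidden layer, 2m units
    U = np.tanh(W[1] @ V)        # second hidden layer, m units
    return float((W[2] @ U)[0])  # linear output unit

W = init_weights(m=17)           # e.g. the 17:34:17:1 network used for HSI
print([w.shape for w in W])      # [(34, 17), (17, 34), (1, 17)]
z = forward(W, np.zeros(17))     # tanh(0) = 0 and no biases, so z' = 0.0
```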
The weights are determined by presenting the training set to the network and comparing the network output with the real values of the time series. The weights are adjusted according to

$$w_{qt}^{\text{new}} = w_{qt}^{\text{old}} + \Delta w_{qt} \qquad (7)$$

where $\Delta w_{qt} = -\eta\, \partial E(w_{qt}) / \partial w_{qt} + \alpha\, \Delta w_{qt}^{\text{old}}$, $E(w_{qt})$ is the mean square error function, $0 < \eta \le 1$ is the learning rate, $0 < \alpha < 1$ is the inertial (momentum) term, and $\Delta w_{qt}^{\text{old}}$ is the previous weight change. By taking the delay coordinates of the time series $x(t)$, $(x(t), x(t-\tau), \ldots, x(t-(m-1)\tau))$, as the input pattern and $x(t+T)$ as the known target, the network can be trained to predict the future state of the system a time $T$ ahead, which corresponds to a certain number of time steps [10].
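A compact sketch of training with update rule (7), under assumptions: full-batch gradient descent, no bias terms (matching eqs. (4)-(6)), a hypothetical Gaussian initialisation, and illustrative values for $\eta$ and $\alpha$. The parameter `horizon` plays the role of $T$, and the pattern layout follows the input vector $(x(t), x(t-\tau), \ldots, x(t-(m-1)\tau))$ given above.

```python
import numpy as np

def make_patterns(x, m, tau, horizon):
    """Inputs (x(t), x(t - tau), ..., x(t - (m-1)tau)) and targets x(t + horizon)."""
    x = np.asarray(x, dtype=float)
    t = np.arange((m - 1) * tau, len(x) - horizon)
    X = np.stack([x[t - k * tau] for k in range(m)], axis=1)
    return X, x[t + horizon]

def train(X, y, eta=0.01, alpha=0.9, epochs=300, seed=0):
    """Full-batch backpropagation with momentum for the m:2m:m:1 tanh network."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    sizes = [m, 2 * m, m, 1]
    W = [rng.normal(0.0, 1.0 / np.sqrt(a), size=(b, a))
         for a, b in zip(sizes[:-1], sizes[1:])]
    dW = [np.zeros_like(w) for w in W]        # previous weight changes (momentum memory)
    losses = []
    for _ in range(epochs):
        V = np.tanh(X @ W[0].T)               # first hidden layer, shape (n, 2m)
        U = np.tanh(V @ W[1].T)               # second hidden layer, shape (n, m)
        err = (U @ W[2].T).ravel() - y        # z' minus target
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate dE/dW for E = mean squared error.
        g_out = (2.0 / len(y)) * err[:, None]
        g_u = (g_out @ W[2]) * (1.0 - U ** 2)
        g_v = (g_u @ W[1]) * (1.0 - V ** 2)
        grads = [g_v.T @ X, g_u.T @ V, g_out.T @ U]
        # Eq. (7) with momentum: dW = -eta * dE/dW + alpha * dW_old.
        for k in range(3):
            dW[k] = -eta * grads[k] + alpha * dW[k]
            W[k] += dW[k]
    return W, losses

Xs, ys = make_patterns(np.arange(10.0), m=3, tau=2, horizon=1)
x = np.empty(600)
x[0] = 0.3
for t in range(1, len(x)):
    x[t] = 4.0 * x[t - 1] * (1.0 - x[t - 1])  # toy chaotic series in [0, 1]
X, y = make_patterns(x, m=3, tau=1, horizon=1)
W, losses = train(X, y)
print(losses[0], losses[-1])
```

Real price series would normally be rescaled (for example to $[0, 1]$) before training, because the tanh hidden units saturate for large inputs; this preprocessing is our assumption rather than a step stated in the paper.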

4 How to Choose the Training Set


4.1 Method of Choosing Training Set
In the feedforward neural network prediction above, both the input data and the target must be known; in practice, however, the target cannot be known in advance, so that prediction is only a simulation of the system. Moreover, reference [10] does not mention how to choose the training set, so the choice involves some uncertainty. To predict a time series genuinely, we should choose a training set whose characteristics are similar to those of the prediction set and use the weights obtained by training on it to forecast the prediction set. How to choose the training set thus becomes an important problem, which we solve in this section.
Section 2 showed that the three financial time series are chaotic, so we put forward a new method for choosing the training set that uses the correlation dimension as the criterion. The correlation dimension is a kind of fractal dimension that can characterize chaotic behavior. The notion of dimension often refers to the degree of complexity of a system, expressed by the minimum number of variables needed to replicate it.
The training set is chosen in the following steps:
1) Reconstruct the phase space.
2) Calculate the correlation dimension of the prediction set.
3) Choose the 50 sets that are closest to the prediction set.
4) Compute the correlation dimension of each of these sets.
5) Choose as the training set the set whose correlation dimension is nearest to that of the prediction set.

This training set is then used to train the network and obtain the weights, which are used to forecast the prediction set.
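The five steps can be sketched as a small selection routine. The paper does not define which 50 sets are "closest" to the prediction set; this sketch assumes they are the equal-length windows ending just before it, shifted back `stride` samples at a time, and it takes the correlation-dimension estimator as a hypothetical parameter `cd_fn` so that any G-P implementation can be plugged in. Since the paper computes the CD on reconstructed phase-space points, a real `cd_fn` would embed each window first.

```python
import numpy as np

def choose_training_window(series, pred_start, length, cd_fn, n_candidates=50, stride=1):
    """Return (start, cd) of the candidate window whose CD is nearest the prediction set's."""
    series = np.asarray(series, dtype=float)
    target = cd_fn(series[pred_start:pred_start + length])     # step 2
    # Step 3 (assumption): windows of equal length ending just before the prediction set.
    starts = [pred_start - length - k * stride for k in range(n_candidates)]
    starts = [s for s in starts if s >= 0]
    cds = np.array([cd_fn(series[s:s + length]) for s in starts])  # step 4
    best = int(np.argmin(np.abs(cds - target)))                 # step 5
    return starts[best], float(cds[best])

# Check of the selection logic only: the window mean stands in for the CD here.
result = choose_training_window(np.arange(300.0), pred_start=200, length=20,
                                cd_fn=lambda w: float(np.mean(w)))
print(result)   # (180, 189.5)
```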


4.2 Correlation Dimension


The G-P algorithm presented by Grassberger and Procaccia [17] is adopted to calculate the correlation dimension. For a set of $N$ points $\{Y_i\}$ in the reconstructed phase space, define

$$C_N(r) = \frac{2}{N(N-1)} \sum_{1 \le i < j \le N} \theta\big(r - \|Y_i - Y_j\|\big) \qquad (8)$$

where $\theta$ is the Heaviside function,

$$\theta(x) = \begin{cases} 0, & x \le 0 \\ 1, & x > 0 \end{cases}$$

and $N$ is the cardinality of the data set. $C_N(r)$ measures the fraction of all pairs $(Y_i, Y_j)$ whose distance is less than $r$, and different choices of $r$ give different values of $C_N(r)$. To estimate the correlation dimension from the data, one plots $\log C_N(r)$ against $\log r$; the slope of the linear (scaling) region of this plot is the estimate of the correlation dimension.
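A direct sketch of eq. (8) and the slope estimate, assuming the user supplies radii inside the scaling region (choosing that region is a judgment call the paper does not detail). The explicit pair list costs $O(N^2)$ memory, which is acceptable for a few thousand points such as the 100-point prediction sets.

```python
import numpy as np

def correlation_sum(Y, radii):
    """C_N(r) of eq. (8) for each r in radii; Y is an (N, m) array of points."""
    Y = np.asarray(Y, dtype=float)
    n = len(Y)
    i, j = np.triu_indices(n, k=1)                 # all pairs with i < j
    dist = np.linalg.norm(Y[i] - Y[j], axis=1)
    # theta(r - d) = 1 exactly when d < r
    counts = np.array([np.count_nonzero(dist < r) for r in radii])
    return 2.0 * counts / (n * (n - 1))

def correlation_dimension(Y, radii):
    """Slope of log C_N(r) against log r over the given (assumed scaling) radii."""
    c = correlation_sum(Y, radii)
    return float(np.polyfit(np.log(radii), np.log(c), 1)[0])

rng = np.random.default_rng(1)
radii = np.geomspace(0.02, 0.1, 8)
square = rng.random((1000, 2))                             # fills a 2-D region
line = np.column_stack([rng.random(1000), np.zeros(1000)])  # lies on a 1-D segment
print(correlation_dimension(square, radii), correlation_dimension(line, radii))
```

For points spread over a square the estimate should be close to 2, and for points on a line close to 1; edge effects pull both slightly below the ideal value.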

5 Experiments
The embedding dimensions in Table 2 determine the network architectures: 17:34:17:1 for HSI, 19:38:19:1 for the Shanghai Stock Index, and 19:38:19:1 for the US gold price.

Fig. 4. Fitting curves of the prediction set CD and the training set CD: (a) Hang Seng Index opening prices; (b) Shanghai Stock Index opening prices; (c) US gold opening prices.

In each of the three reconstructed phase spaces, we choose 100 consecutive points as the prediction set and determine the training set with the method described in Section 4. The correlation dimensions of the prediction sets and training sets are listed in Table 3, and Fig. 4 shows the fitting curves of the prediction set CD and the training set CD. Table 3 and Fig. 4 show that, for every financial time series, the correlation dimension of the training set is close to that of the prediction set and their fitting curves are parallel. We therefore use these three training sets to train the networks.


Table 3. The correlation dimension comparison between the prediction set data and the training data

Series                 Prediction set CD   Training set CD   CD difference
HSI                    1.9113              2.1078            0.0965
Shanghai Stock Index   2.3979              2.3276            -0.0703
US gold price          2.7269              2.7936            0.0667

We train the network on each training set in turn to obtain its weights, then feed the corresponding prediction set into the trained network to compute the predicted data. The predicted results and the real values for the three series are shown in Fig. 5.

Fig. 5. The prediction of the financial time series: (a) Hang Seng Index opening prices from 2 April 2002 to 22 April 2002; (b) Shanghai Stock Index opening prices from 21 October 1996 to 11 November 1996; (c) US gold opening prices from 8 September to 5 October 1995.

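We also calculate the mean absolute percentage error (MAPE), listed in Table 4, to evaluate the prediction:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{x'_t - x_t}{x_t} \right| \times 100\% \qquad (9)$$

where $x_t$ is the real value, $x'_t$ is the predicted value, and $n$ is the number of predicted points.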
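Eq. (9) in NumPy, as a minimal sketch (the function name `mape` is our own):

```python
import numpy as np

def mape(real, predicted):
    """Mean absolute percentage error of eq. (9), in percent."""
    real = np.asarray(real, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((predicted - real) / real)) * 100.0)

# Errors of 10% and 5% average to roughly 7.5%.
example = mape([100.0, 200.0], [110.0, 190.0])
print(example)
```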


Table 4. The MAPE between the real data and the predicted data

Series                 MAPE
HSI                    1.9%
Shanghai Stock Index   3.9%
US gold price          0.46%

All three MAPE values are below 5%, so we regard the prediction accuracy as satisfactory.

6 Conclusions
The experimental results show that the trend of the predicted data agrees with that of the real data on the whole, apart from a few exceptional points, and that the MAPE between the real and predicted data is small in every case. This shows that financial time series, being chaotic time series, can be predicted by feedforward neural networks.
The experimental results also show that the method of choosing the training set with the correlation dimension as the criterion is effective. When chaotic financial time series are predicted with this method, the uncertainty in choosing the training set is reduced.

Acknowledgements
This research was fully supported by the National 863 Natural Science Foundation of P. R. China (2001AA635010).

References
1. Kim, S.H., Hyun, J.N.: Predictability of Interest Rates Using Data Mining Tools: A Comparative Analysis of Korea and the US. Expert Systems with Applications, Vol. 13 (1997) 85-95
2. Cecen, A.A., Erkal, C.: Distinguishing between stochastic and deterministic behavior in high frequency foreign exchange rate returns: Can non-linear dynamics help forecasting? International Journal of Forecasting, Vol. 12 (1996) 465-473
3. Harrison, R.G., Yu, D., Oxley, L., Lu, W., George, D.: Non-linear noise reduction and detecting chaos: some evidence from the S&P Composite Price Index. Mathematics and Computers in Simulation, Vol. 48 (1999) 407-502
4. Catherine, K., Walter, C.L., Michel, T.: Noisy chaotic dynamics in commodity markets. Empirical Economics, Vol. 29 (2004) 489-502
5. Rosenstein, M.T., Collins, J.J., De Luca, C.J.: A practical method for calculating largest Lyapunov exponents from small data sets. Physica D, Vol. 65 (1993) 117-134
6. Lu, J.H., Zhang, S.C.: Application of adding-weight one-rank local-region method in electric power system short-term load forecast. Control Theory and Applications, Vol. 19 (2002) 767-770
7. Oh, K.J., Han, I.: Using change-point detection to support artificial neural networks for interest rates forecasting. Expert Systems with Applications, Vol. 19 (2000) 105-115
8. Leung, M.T., Chen, A., Daouk, H.: Forecasting exchange rates using general regression neural networks. Computers and Operations Research, Vol. 27 (2000) 1093-1110
9. Kermanshahi, B.: Recurrent neural network for forecasting next 10 years loads of nine Japanese utilities. Neurocomputing, Vol. 23 (1998) 125-133
10. Kantz, H., Schreiber, T.: Nonlinear Time Series Analysis. Qinghua University Press, Beijing (2000)
11. de Oliveira, K.A., Vannucci, Á., da Silva, E.C.: Using artificial neural networks to forecast chaotic time series. Physica A, Vol. 284 (2000) 393-404
12. Kennel, M.B., Abarbanel, H.D.I.: Determining embedding dimension for phase space reconstruction using a geometric construction. Physical Review A, Vol. 151 (1990) 225223
13. Buczkowski, S., Hildgen, P., Cartilier, L.: Measurements of fractal dimension by box-counting: a critical analysis of data scatter. Physica A, Vol. 252 (1998) 23-34
14. Peitgen, H.-O., Saupe, D. (Eds.): The Science of Fractal Images. Springer-Verlag, New York (1988) 71-94
15. Wang, H.Y., Zhu, M.: A prediction comparison between univariate and multivariate chaotic time series. Journal of Southeast University (English Edition), Vol. 19 (2003) 414-417
16. Zhang, J., Lam, K.C., Yan, W.J., Gao, H., Li, Y.: Time series prediction using Lyapunov exponents in embedding phase space. Computers and Electrical Engineering, Vol. 30 (2004) 1-15
17. Grassberger, P., Procaccia, I.: Measuring the strangeness of strange attractors. Physica D, Vol. 9 (1983) 189-208