Conference Paper · December 2017
DOI: 10.1109/PDCAT.2017.00024


A Fusion Financial Prediction Strategy Based on RNN and Representative Pattern
Discovery

Lu Zhang∗†, Xiaopeng Fan∗, Chengzhong Xu∗


∗ Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
† Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, China
Email: {zhanglu, xp.fan, cz.xu}@siat.ac.cn

Abstract—To predict the future with high accuracy is a holy grail in financial markets. However, the volatility of chaotic financial markets continually challenges new technologies from computer science to economic science. Recently, the Recurrent Neural Network (RNN) has taken on a new role in financial market prediction. However, results from RNN are restricted by the sample size of training datasets, and prediction accuracy can hardly be guaranteed in the long term. On the other hand, Representative Pattern Discovery (RPD) is effective in long-term prediction but ineffective in short-term prediction.

In this paper, we define a representative pattern for time series, and propose a fusion financial prediction strategy based on RNN and RPD. We take advantage of both RNN and RPD, in that the proposed strategy is stateful to keep the short-term trend, and it rectifies the prediction with a time-dependent incremental factor over the long term. Compared with RNN and pattern discovery respectively, our experimental results demonstrate that the proposed strategy performs much better than either alone. It can increase prediction accuracy by up to 6% over RNN, but at the cost of a higher Mean Squared Error.

Keywords—Financial prediction; Fusion method; Representative pattern discovery; RNNs

I. INTRODUCTION

To predict the future with high accuracy is a holy grail in financial markets. Since the birth of financial markets, financial prediction has been a permanent goal for economists and investors. Investors may gain enormous profits with a small amount of capital if they can accurately predict market fluctuations. However, the volatility of chaotic financial markets continually challenges new technologies from computer science to economic science. As we all know, financial markets are complex chaotic systems influenced by many factors, including political events, macro-economic conditions, and traders' expectations. Therefore, predicting financial market movements is a challenging problem.

According to academic investigations, fluctuations of prices in markets are not completely random; they behave in a highly nonlinear dynamic manner. The random-walk assumption of prices may merely describe randomness with a noisy nonlinear process. To make the forecasting of prices more reliable, time series analysis has attracted extensive attention.

Traditional methods in time series analysis rest on a solid theoretical foundation, but their nonlinear fitting ability is still not enough to meet the requirements of financial prediction. The Recurrent Neural Network (RNN) is a deep learning algorithm for time series analysis with a strong nonlinear fitting ability for extracting deep features from datasets. However, in daily financial markets, there are only a few thousand records for a stock or a future, which greatly undermines the ability of RNN due to the small data size. Therefore, the prediction accuracy of RNN can hardly be guaranteed beyond several days.

Fortunately, there is another way: prediction based on pattern discovery. Almost all human actions have hidden patterns, and 93% of human actions are predictable [1]. As one type of economic activity, financial markets also contain hidden patterns, and some investors have discovered useful hidden patterns and benefited from them. The discovery of hidden patterns from time series is so important that many significant methods [2]–[4] have been proposed. In these algorithms, time-series subsequence clustering acts as one of the most important subroutines. However, applying clustering approaches to discover frequent patterns on time series subsequences has been criticized [5]: when a sliding window with a fixed size is applied to discretize a long time series into subsequences, trivial matches always exist. Extracting subsequences randomly with a length-variable window can reduce trivial matches significantly.

In this paper, we define a representative pattern as a subsequence that keeps its similarity to other subsequences within a range of distance. We propose a representative pattern discovery algorithm based on the density peak clustering algorithm [6] with the Dynamic Time Warping (DTW) distance. On the basis of the discovered patterns, we try to predict S&P 500 index movements. Experiment results show that this method can grasp the long-term fluctuations of the index but, due to market volatility, is ineffective in short-term prediction. Based on this observation, we propose a novel financial market prediction strategy built on the fusion of RNN and pattern matching. The proposed strategy applies pattern matching to rectify the results of RNN.

Corresponding author: Dr. Xiaopeng Fan, xp.fan@siat.ac.cn
Experiment results show that our fusion method performs much better on real financial market datasets.

II. RELATED WORK

Lonardi et al. [7] apply the Approximation Distance Map (ADM) and propose a high-quality and highly effective repeated pattern discovery method. In our problem, the dataset is small and highly noisy, and it is hard to discover highly similar repeated patterns from it.

Das et al. [3] discretize time series by basic clustering methods, and then discover rules from the discretized sequences. In this paper, we discover representative patterns by density peak clustering. Petitjean et al. [8] provide a global averaging method for dynamic time warping with k-means clustering.

Tak-chung et al. [4] discover patterns from stock time series by using self-organizing maps. Chonghui et al. [9] convert stock time series data into feature vectors by using the ICA algorithm, and then apply the k-means algorithm to the extracted feature vectors. Xiaozhe et al. [10] propose a method for clustering time series based on the structural characteristics of stock time series.

Persio et al. [11] try to predict the trend of the S&P 500 index with a Long Short-Term Memory (LSTM) neural network, a widely used variant of RNNs. But their results are not good enough, in that each prediction is almost the same as the previous day's value, and the accuracy rate was only 52.2%. We consider that there may be an inappropriate network structure in their work, and in our work we show better results by improving the LSTM neural network.

Maknickiene et al. [12] apply the EVOLINO learning algorithm to select the RNNs' hyper-parameters to achieve better stock prediction performance. In our work, we try a new way to improve the prediction performance of RNNs. To the best of our knowledge, our approach of combining RNNs with pattern matching has not been reported before.

A. Background

Among most data mining algorithms on time series, the quality of the output depends almost exclusively on the distance metric [13], and the Dynamic Time Warping (DTW) distance is considered the best. In a time series with high noise such as financial data, subsequences may vary greatly even though they share the same pattern. A local lag between peaks will appear because of changes in the data driven by traders' expectations, the Fed's policy, macro-economic conditions, etc. For such out-of-sync problems, the DTW distance can capture invariants more accurately.

But in a daily financial history dataset, it is difficult to find repeated patterns with high similarity, even if we apply the DTW distance. How can we find hidden patterns inside such a dataset? Inspired by the density peak clustering algorithm, we try to find the subsequences with a higher ability to represent the features of the dataset. In the density peak clustering algorithm, cluster centers are recognized as local density maxima that are far away from any point of higher density [6]. In our work, the density and the distance are both measured by the DTW distance.

RNN is another time series analysis algorithm. In a traditional neural network, all inputs and outputs are independent of each other, whereas an RNN makes full use of the sequential information. RNNs are called "recurrent" because they carry out the same task on each element of a sequence, with the output depending on previous computations; an RNN reserves part of its memory to record what has been calculated so far. Different from traditional neural networks, the states of the hidden layers in an RNN are calculated by the following equation:

a(t) = W h(t−1) + U x(t) + b    (1)

where b is the bias vector, and U and W are the weight matrices for the input-to-hidden and hidden-to-hidden connections, respectively. h(t−1) is the output of the hidden layers at the previous step.

Suppose we take tanh as the activation function; the outputs of the hidden layers are then calculated by the following equation:

h(t) = tanh(a(t))    (2)

The loss function used in an RNN can be the mean squared cost, the cross-entropy cost, etc. If we can find a differentiable function to compute the DTW distance, we can even use the DTW distance as the cost of the RNN.

III. STRATEGY

In this section, we propose a fusion strategy based on RNN and pattern matching. The idea is motivated by the following experimental observation: RNN can accurately capture short-term trends, while pattern matching performs better on long-term predictions. Our strategy consists of four parts.

• Discover representative patterns. Inspired by the density peak clustering algorithm, we select the subsequences that have higher similarity to other subsequences but very low similarity among each other. The discovered subsequences are called Representative Patterns (RPs), because they may not be repeated patterns but they are more representative than other subsequences.

• Prediction by pattern matching. For a prediction event, we calculate the DTW distance between the front end of the RPs and the historical data. If we find matched RPs, we locate the best join position by the values of the points. The segment of the RP behind the join position is the prediction of the future.

• Prediction by RNN. We apply an RNN structure [14] which can summarize a sequence and produce a fixed-size representation.
In this problem, the fixed-size representation is the result we try to predict.

• Fusion of RNN with pattern matching. After we obtain the predictions from RNN and from pattern matching, the fusion step adjusts the results of RNN by the predictions of pattern matching.

A. Representative Pattern Discovery

Finding repeated patterns with high similarity is difficult in daily financial history datasets due to their small size and high noise, so we take a heuristic way to obtain hidden patterns from the datasets. Applying clustering approaches to discover frequent patterns has been criticized as meaningless when a long time series is discretized into subsequences with a fixed window size. Therefore, we apply a window with variable length (w ± r, where r < w/2), and randomly extract (2 × length of history sequence) / w subsequences. In this way, we can reduce trivial matches significantly. The flow is shown as Algorithm 1.

Algorithm 1 Extract subsequences
Inputs: HS: the history sequence; w: basic length of subsequences; r: dynamic scope (r < w/2).
Outputs: SS: the set of subsequences.
1: function GENERATEDATA(HS, w, r)
2:     size = len(HS) * 2 / w
3:     for all i ∈ 1:size do
4:         window_size = w + r * random(-1, 1)
5:         begin_pos = random(0, len(HS) - window_size)
6:         SS[i] = HS[begin_pos : begin_pos + window_size]
7:     end for
8:     return SS
9: end function

Most clustering algorithms are based on the assumption that cluster averages approximate the ideal clustering centers. Conversely, this means that we can use clustering centers as approximations of cluster averages, and we consider clustering centers more likely to represent the data than other points.

Based on this rationale, we propose a representative pattern discovery algorithm based on density peaks; the flow is shown as Algorithm 2. At line 2, we calculate the all-pair DTW distances between subsequences. In lines 4-6, ρ (i.e., the density of each subsequence) is calculated. δ (i.e., the minimum distance to subsequences of higher density) is calculated in lines 7-13; for the special case of the highest-density subsequence, δ is calculated in line 14. After ρ and δ are calculated, we choose the top-k subsequences with the highest values of ρ ∗ δ at line 15.

Algorithm 2 Representative pattern discovery
Inputs: SS: the set of subsequences; k: the number of representative patterns.
Outputs: RPs: the set of representative patterns.
1: function PATTERNDISCOVERY(SS, k)
2:     D ← the all-pair DTW distance matrix of SS
3:     n = len(SS)
4:     for all i ∈ 1:n do
5:         ρ[i] = Count(D(i, otherSubsequences) < dc)
6:     end for
7:     for all i ∈ 1:n do
8:         δlist[i] ← the list of subsequences with higher density than SS[i]
9:     end for
10:    (δlist, index) = sort(δlist, 'descend')
11:    for all j ∈ 2:n do
12:        δ[index[j]] = MinDistance(index[j], δlist[index[j]])
13:    end for
14:    δ[index[1]] = MaxDistance(index[1], otherSubsequences)
15:    RPs ← topk(SS, ρ ∗ δ)
16:    return RPs
17: end function

B. Prediction by pattern matching

Once the representative patterns have been obtained, the computation flow for a concrete prediction event is shown as Algorithm 3. At lines 2-5, we calculate the DTW distance between the front segment of each RP and the input sequence, and at line 6 we choose the index of the RP with the minimum distance. If that distance is less than the given cut-off distance, we have a successful match. Then, at line 10, we determine the join position based on the last few values of the input sequence, and the subsequence behind the join position is the prediction result.

C. Prediction by RNN

There are three major design patterns for recurrent neural networks. In our work, we choose the time-unfolded RNN with a single output at the end of the sequence, illustrated in Fig. 1. This structure can summarize a sequence and produce a fixed-size representation.

Before we use the RNN model to predict the future, we need to set the structure and the hyper-parameters. Then we train the model on historical data. Once trained, the RNN model produces a fixed-size representation from a valid input sequence, which is exactly the prediction we need.

D. Fusion of RNN with pattern matching

According to the prediction results obtained from RNN and pattern matching above, we rectify the results of RNN by the predictions from pattern matching.
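Every step of the strategy above leans on the DTW distance between subsequences. The paper does not spell the distance out, so the sketch below uses the classic dynamic-programming formulation over 1-D sequences with an absolute-difference point cost; the exact variant the authors use is an assumption.

```python
# Classic dynamic-programming DTW between two 1-D sequences.
# Standard textbook formulation; the paper does not specify its exact variant.
def dtw_distance(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = cost of the best warping path aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor cells
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

Unlike a point-wise Euclidean comparison, DTW absorbs the local lags between peaks discussed in Section II: `dtw_distance([0, 1, 0, 0], [0, 0, 1, 0])` is 0 even though the peaks are shifted by one step.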
Algorithm 3 Prediction by pattern matching
Inputs: RPs: the set of representative patterns; ihs: the input history sequence; n: the time span needing prediction; dc: the cut-off distance.
Outputs: MPR: the predictions by pattern matching.
1: function PREDICTIONBYPM(RPs, ihs, n, dc)
2:     for all i ∈ 1:len(RPs) do
3:         frp ← the front end of RPs[i]
4:         d[i] = CalculateDtwDistance(frp, ihs)
5:     end for
6:     ι = argmin(d)
7:     MPR ← null
8:     if d[ι] < dc then
9:         mp = RPs[ι]
10:        determine the join position based on the last few values of ihs
11:        MPR ← the segment of mp behind the join position
12:    end if
13:    return MPR
14: end function

Figure 1. The RNN structure with a single output at the end of the sequence.

The rectification follows the equation:

F(t) = α ∗ ((M(t) − M(t−1)) / M(t−1) + 1) ∗ F(t−1) + (1 − α) ∗ R(t)    (3)

where t = 1, . . . , T, F is the prediction of the fusion method, M is the prediction of pattern matching, R is the prediction of RNN, and α is a value between 0 and 1.

The details of the fusion method are shown in Algorithm 4. At line 2, we assign the first-day prediction of FS from RS, because RNN is accurate enough within that time range. At lines 3-5, we calculate α by the following equation:

α = log(t+T)(t + 1)    (4)

and then calculate the prediction at moment t by Equation (3).

Algorithm 4 Fusion strategy
Inputs: RS: the prediction of RNN; MS: the prediction of pattern matching; n: the time span needing prediction.
Outputs: FS: the predictions of the fusion method.
1: function FUSIONMETHOD(RS, MS, n)
2:     FS[1] ← RS[1]
3:     for all t ∈ 2:n do
4:         α = log(t+n)(t + 1)
5:         FS[t] = α ∗ ((MS[t] − MS[t−1]) / MS[t−1] + 1) ∗ FS[t−1] + (1 − α) ∗ RS[t]
6:     end for
7:     return FS
8: end function

IV. EXPERIMENTS

A. Dataset

We choose the S&P 500 as a typical example of a financial market time-series dataset. The Standard & Poor's 500 Index (S&P 500) is an index of 500 stocks serving as a leading indicator of U.S. equities and a reflection of the performance of the large-cap universe; the index is made up of companies selected by economists. In our work, the history data covers 2004 to 2016 and contains 4399 records. The data between 2004 and 2014 is extracted as the training dataset, while the rest is used as the testing dataset.

B. Evaluation Metrics

We take Mean Squared Error (MSE) and Mean Trend Accuracy (MTA) to evaluate the effectiveness of our proposed solution. MSE evaluates the average of the squares of the errors, i.e., the differences between the estimator and what is estimated. MTA evaluates the trend prediction ability of the models; it is defined as the ratio of correct predictions to total predictions. In our work, we judge a prediction by comparing the prediction point with the start point. For instance, in the prediction for t5, if the predicted value and the true value are both higher than the value at t0, we have a correct prediction. In the domain of financial prediction, the trend is more important than the actual values; thus, in our work, MTA is the major performance metric.

C. Performance Analysis

The hyper-parameters of the RNN are shown in Table I, and each output represents the prediction of one day. The hyper-parameters of pattern discovery are shown in Table II.

Table I
PRIMARY HYPER-PARAMETERS OF THE RNN

hidden layers | step nums | input size | output size | cells
1             | 20        | 1          | 5           | LSTM
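Looking back at Algorithm 4 (Section III-D), the fusion update of Eqs. (3)-(4) can be sketched as follows. This is a minimal sketch: `fuse`, `rs`, and `ms` are our own names, and mapping the paper's 1-indexed pseudocode onto 0-indexed Python lists is our assumption.

```python
import math

# Fusion of RNN predictions (RS) with pattern-matching predictions (MS),
# following Algorithm 4: FS[1] = RS[1], then for t = 2..n
#   alpha = log_{t+n}(t+1)
#   FS[t] = alpha * (MS[t]/MS[t-1]) * FS[t-1] + (1 - alpha) * RS[t]
# (the ratio MS[t]/MS[t-1] equals (MS[t]-MS[t-1])/MS[t-1] + 1 in Eq. (3))
def fuse(rs, ms):
    n = len(rs)
    fs = [rs[0]]  # the first step trusts the RNN, accurate in the short term
    for t in range(2, n + 1):  # t is the 1-indexed time step of the paper
        alpha = math.log(t + 1, t + n)  # Eq. (4): argument t+1, base t+n
        growth = ms[t - 1] / ms[t - 2]  # pattern-matching trend factor
        fs.append(alpha * growth * fs[-1] + (1 - alpha) * rs[t - 1])
    return fs
```

Note that α = ln(t+1)/ln(t+n) grows with t, so the pattern-matching trend gains weight at longer horizons, matching the observation that the RNN is reliable only for the first couple of days.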

Figure 2. (a) Pattern matching example. (b) Prediction example.

Figure 3. (a) The MTAs of methods. (b) The MSEs of methods.
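Figure 3 reports the MTAs of the methods; the metric itself, as we read the description in Section IV-B, can be sketched as follows. `mta` is a hypothetical helper, and counting a prediction equal to the start value as "not higher" (strict comparison) is our assumption; the paper does not settle ties.

```python
# Mean Trend Accuracy: a prediction at a given horizon counts as correct when
# the predicted value and the true value lie on the same side of the start
# value (our reading of Section IV-B; tie handling is an assumption).
def mta(start, predicted, actual):
    correct = sum(
        1 for p, a in zip(predicted, actual)
        if (p > start) == (a > start)
    )
    return correct / len(predicted)
```

For example, with a start value of 1.0, a forecast [1.2, 0.8] against truth [1.1, 1.3] gets the first trend right and the second wrong, giving an MTA of 0.5.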

Table II
HYPER-PARAMETERS OF PATTERN DISCOVERY

subseq length | center nums | predict length | cutdown percent
25            | 16          | 5              | 4%

The experimental results are based on the above hyper-parameters. A prediction example is shown in Fig. 2; the y-axis in Fig. 2 is the S&P index value after z-score normalization. As Fig. 2(a) shows, after we find a representative pattern, our adjustment strategy brings the pattern closer to the true values. Fig. 2(b) shows that, among the three prediction results, the fusion strategy reconciles the predictions of RNN and pattern matching, and its result is closer to the true values.

To reduce the randomness of the experiments, we run several experiments and calculate the means and variances. The experimental results are shown in Fig. 3. As the results show, RNN has better performance on trend prediction within two days, while pattern matching has better performance on the other days. The fusion strategy takes the advantages of both methods and performs well over the whole horizon. It can increase the accuracy by up to 6% over RNN. On the metric of MSE, RNN has the minimum MSE, while the fusion strategy has the largest MSE because of some abnormal values produced during the fusion.

V. CONCLUSION

In this paper, we propose a representative pattern discovery algorithm based on the density peak clustering algorithm, with the DTW distance as the similarity metric. After representative patterns are discovered, we apply them to predict the future. On this basis, we propose a fusion strategy which rectifies the RNN's predictions by pattern matching. Experimental results show that the fusion strategy performs better on trend prediction than RNN and pattern matching alone, while having a higher MSE on the values. In the future, we aim to improve our fusion algorithm so that it performs better on both trends and MSE values.

ACKNOWLEDGMENT

This research is partially supported by the Major Project of Chinese National Programs for Fundamental Research and Development (2015CB352400), the National Natural Science Foundation of China (61672513, 61501443, U1401258), and the Shenzhen Strategic Emerging Industry Development Funds (JSGG20160229200957727, JSGG20150512145714248).
REFERENCES

[1] A.-L. Barabási, Bursts: The Hidden Patterns Behind Everything We Do, from Your E-mail to Bloody Crusades. Penguin, 2010.

[2] H. Wang, "Clustering by pattern similarity in large data sets," in ACM SIGMOD International Conference on Management of Data, 2002, pp. 394–405.

[3] G. Das, K.-I. Lin, H. Mannila, G. Renganathan, and P. Smyth, "Rule discovery from time series," in KDD, vol. 98, no. 1, 1998, pp. 16–22.

[4] T. C. Fu, F. L. Chung, V. Ng, and R. Luk, "Pattern discovery from stock time series using self-organizing maps," in Workshop Notes of KDD Workshop on Temporal Data Mining, 2001, pp. 1–8.

[5] E. Keogh, J. Lin, and W. Truppel, "Clustering of time series subsequences is meaningless: Implications for previous and future research," in IEEE International Conference on Data Mining, 2003, pp. 115–122.

[6] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[7] J. Lonardi and P. Patel, "Finding motifs in time series," in Proc. of the 2nd Workshop on Temporal Data Mining, 2002, pp. 53–68.

[8] F. Petitjean, A. Ketterlin, and P. Gançarski, "A global averaging method for dynamic time warping, with applications to clustering," Pattern Recognition, vol. 44, no. 3, pp. 678–693, 2011.

[9] C. Guo, H. Jia, and N. Zhang, "Time series clustering based on ICA for stock data analysis," in International Conference on Wireless Communications, Networking and Mobile Computing, 2008, pp. 1–4.

[10] X. Wang, K. Smith, and R. Hyndman, "Characteristic-based clustering for time series data," Data Mining & Knowledge Discovery, vol. 13, no. 3, pp. 335–364, 2006.

[11] L. Di Persio and O. Honchar, "Artificial neural networks architectures for stock price prediction: comparisons and applications," International Journal of Circuits, Systems and Signal Processing, vol. 10, pp. 403–413, 2016.

[12] N. Maknickienė and A. Maknickas, "Financial market prediction system with Evolino neural network and Delphi method," Journal of Business Economics and Management, vol. 14, no. 2, pp. 403–413, 2013.

[13] J. Shieh and E. Keogh, "iSAX: indexing and mining terabyte sized time series," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2008, pp. 623–631.

[14] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.
