
Information Sciences 622 (2023) 133–147


Residual long short-term memory network with multi-source and multi-frequency information fusion: An application to China's stock market

Songsong Li a, Zhihong Tian a,*, Yao Li b

a School of Management, Harbin Institute of Technology, 92 Xidazhi Street, Nangang District, Harbin City, Heilongjiang Province, China
b School of Mathematics, Harbin Institute of Technology, 92 Xidazhi Street, Nangang District, Harbin City, Heilongjiang Province, China

Article history: Received 29 December 2021; Received in revised form 24 November 2022; Accepted 27 November 2022; Available online 1 December 2022.

Keywords: Long short-term memory network; Residual connection; Multi-source information; Multi-frequency information.

Abstract: The most widely used model in stock price forecasting is the long short-term memory network (LSTM). However, LSTM has its limitations, as it does not recognize and extract features well and has a representational bottleneck. Furthermore, the factors affecting stock prices are multi-source and multi-frequency information, making neural network models difficult to handle. In this paper, we introduce a feature fusion residual LSTM (FFRL) model to answer these two questions – how to compensate for the three limitations of LSTM and how to fuse the multi-source and multi-frequency information. FFRL consists of three modules to improve the three limitations of LSTM, namely the feature selection module, feature extraction module, and residual module. To learn features from multi-source and multi-frequency information, FFRL applies the feature selection module to emphasize important features and the feature extraction module to extract deeper features. We demonstrate significant performance improvements of FFRL over comparison models, ablation networks, and visualization methods on a variety of Chinese stocks.

© 2022 Elsevier Inc. All rights reserved.

1. Introduction

Stock price series are unique financial time series in that they are noisier and have relatively weaker cyclical patterns, making stock price forecasting more complex than other time series forecasting tasks (e.g., electricity demand or traffic flow). Meanwhile, stock price forecasting can help investors understand the overall future condition of the stock market and warn of possible upcoming stock market shocks, reducing investors' losses and maintaining the stability of financial markets. Performance improvements in stock price forecasting are therefore highly significant.
On a global scale, many factors affect the movements of stock prices, and financial markets in most countries have moved
into ‘weak-form market efficiency’ [1,2]. According to Fama’s efficient market hypothesis, when a country’s financial market
enters ‘weak-form market efficiency’, features related to stock prices will fail to forecast stock prices, and fundamental anal-
ysis will be needed [3]. However, most studies choose input features only related to stock prices, such as volume, price [4–6]
and technical features [7,8]; when predicting the closing price, only a few studies choose features related to fundamentals
[9,10]. For a better prediction, we combine macro features with price-related features. However, most of the macro features
are not consistent with the frequency of price-related features. For example, GDP is a quarterly indicator, while the indicators

* Corresponding author.
E-mail address: 18637025058@163.com (Z. Tian).

https://doi.org/10.1016/j.ins.2022.11.136
0020-0255/© 2022 Elsevier Inc. All rights reserved.

we selected related to stock prices are all 30 min indicators. In addition, the heterogeneity between different types of fea-
tures poses a challenge – how to use multi-source and multi-frequency information to fully improve the model’s perfor-
mance [11,12]. Therefore, in dealing with multi-frequency and multi-source features, improving the model’s predictive
performance has become a key issue in this paper.
The most suitable model for predicting time series among existing models is LSTM. The unique structure of LSTM [13]
makes it a significant player in time series forecasting. Numerous studies have confirmed that LSTM outperforms most single
econometric [14,15], machine learning, and neural network models in stock price forecasting [16–21]. However, the gating mechanism of LSTM gives it a representational bottleneck [22–24], where important information may be permanently discarded. In addition, LSTM does not have enough capacity for feature recognition [25–27] or feature extraction [28–30]. Therefore, it is imperative to improve LSTM to optimize its predictive performance. The most common and
effective method of improvement in the current research is constructing a hybrid model, which outperforms single predic-
tion models [31–35].
Building a hybrid model and compensating for the poor feature recognition, feature extraction, and representational bot-
tleneck of LSTM has become another key issue. Convolutional neural networks (CNNs) [36] are a popular feature extraction tool in fields such as image recognition [37] and autonomous driving [38]. Compared to principal component analysis (PCA), a CNN can extract deeper features without losing information. Refs. [7,39,40] use CNNs as feature extractors in time series prediction
and improve LSTM. In addition, some studies have added a time-based attention mechanism behind LSTM [39,41], improving
the model’s recognition of essential moments [12]. However, few solutions have been proposed to solve the problem of LSTM
discarding essential information, i.e., representational bottleneck, and the poor feature recognition of LSTM.
We present a high-performance model based on feature fusion residual LSTM (FFRL) and improve the three limitations of
LSTM by effectively filtering and extracting features from different sources and frequencies. FFRL achieves high predictive
performance with the interaction of components, specifically incorporating (1) a feature selection module, which is used
to filter features that have significant impacts on the closing price of the stock, (2) a feature integration module, which is
used to process features from different sources and frequencies and to extract deeper features, and (3) a residual module, which is used to improve the nonlinear fitting ability of FFRL and reduce information losses. Meanwhile, the residual module assists
the model in identifying historical moments of significance. To verify the necessity of each component of FFRL, we conduct
three ablation analyses: (1) removing the feature selection module, (2) substituting the feature integration module, and (3)
substituting the residual module. In addition, we visualize the feature selection and feature integration module using the
attention weight and t-SNE method.

2. Related theory

The long short-term memory network (LSTM) is a suitable neural network for time series prediction. LSTM partially compensates for the vanishing gradient problem and the limited long-term dependence of RNNs. Compared to an RNN, LSTM incorporates three
gates and a cell state. The three gates allow LSTM to selectively retain or discard historical information, while the cell state
solves the vanishing gradient problem and allows LSTM to retain historical information for a relatively long period. The
structure of the LSTM unit is shown in Fig. 1:
The LSTM unit works in four main steps, which are as follows:
(1) The forget gate calculates the forgetting factor.
The calculation of a forgetting factor is shown in Eq. (1):

Fig. 1. The LSTM unit.


$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$   (1)

where $\sigma(\cdot)$ is the sigmoid activation function, whose value range is $[0, 1]$, $(W_f, U_f, b_f)$ is the set of weights of the forget gate, and $f_t$ is the forgetting factor.
(2) The input gate calculates an input factor and creates a new cell state.
The calculation of the input factor is shown in Eq. (2):
$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$   (2)

where $i_t$ is the input factor, whose value lies in $[0, 1]$. The input factor determines the extent to which the information of the new cell state is retained, and $(W_i, U_i, b_i)$ is the set of weights of the input gate. After calculating the input factor, the LSTM unit creates a new cell state, as shown in Eq. (3):

$C'_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$   (3)

where $C'_t$ is the new cell state and $\tanh(\cdot)$ is the tanh activation function, whose value range is $[-1, 1]$. $(W_c, U_c, b_c)$ is the set of weights shared in the calculation of the new cell state.
(3) Updating the cell state
The process of updating the cell state is shown in Eq. (4):

$C_t = f_t \odot C_{t-1} + i_t \odot C'_t$   (4)
(4) The output gate calculates an output factor.
The calculation of the output factor is shown in Eq. (5):

$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$   (5)

where $o_t$ is the output factor and $(W_o, U_o, b_o)$ is the set of weights of the output gate. The output factor determines the output at the current moment, which is also the input of the next moment. The output is based on the cell state's information, so the output factor is multiplied by the cell state's information, as shown in Eq. (6):

$h_t = o_t \odot \tanh(C_t)$   (6)
By analyzing the structure of the LSTM unit, we note that the three gating factors of LSTM all lie in $[0, 1]$. When the forgetting factor is not 0, or the input factor and output factor are not 1, the gating mechanism causes a loss of previous and current information, resulting in a representational bottleneck. Furthermore, LSTM focuses on improving the memory of time series rather than on feature selection or extraction mechanisms. To improve on these three limitations of LSTM, this paper constructs an LSTM-based hybrid model (see Fig. 2).
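For concreteness, the four steps above can be collected into a single forward pass. The following NumPy sketch (our own illustration; variable and parameter names are not from the paper) implements Eqs. (1)–(6) for one time step:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM unit step following Eqs. (1)-(6).

    p is a dict holding the weight matrices W_*, U_* and bias vectors b_*
    of the forget (f), input (i), cell (c), and output (o) gates.
    """
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])    # Eq. (1): forgetting factor
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])    # Eq. (2): input factor
    C_new = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # Eq. (3): candidate cell state
    C_t = f_t * C_prev + i_t * C_new                                # Eq. (4): update the cell state
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])    # Eq. (5): output factor
    h_t = o_t * np.tanh(C_t)                                        # Eq. (6): hidden state / output
    return h_t, C_t
```

Because $f_t$, $i_t$, and $o_t$ scale the cell and hidden states multiplicatively and lie in $[0, 1]$, any value strictly below 1 attenuates information irreversibly, which is exactly the representational bottleneck discussed above.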

3. Model architecture

We designed FFRL to use specialized components to compensate for the three limitations of LSTM and to make full use of each input type (i.e., volume and price, technical, and macro features) for high forecasting performance. The major components of FFRL are as
follows:
The feature selection module improves the feature recognition of LSTM through a dimension-based attention mecha-
nism, assigning higher weights to important features and lower weights to less important features to assist the model in rec-
ognizing important features.
The feature integration module improves the feature extraction of LSTM, using different convolutional layers to extract
features from different sources and frequencies and fusing the data extracted from them.
The residual module improves the representational bottleneck of LSTM and deepens the model while preserving convergence. A residual module consists of a two-layer LSTM, a time-based attention mechanism, and a residual connection: the LSTM is used to forecast the closing price of the stock, the time-based attention is used to improve recognition of the important moments influencing the closing price, and the residual connection deals with the information loss caused by the LSTM gating mechanism while deepening the model [42,43].

3.1. Feature selection module

We divide features into three types – volume and price, technical, and macro features. The feature selection module filters
different types of features to recognize which ones have a greater or lesser impact on forecasts. In this section, we demon-
strate the feature selection module using volume and price features. Let $x_{i,t-k:t}$ denote the $i$-th volume and price feature, with $X = [x^T_{1,t-k:t}, x^T_{2,t-k:t}, \dots, x^T_{m,t-k:t}]$ being the matrix of all volume and price features. Feature selection weights are generated from a fully connected layer activated by Softmax:

$v_x = \mathrm{Softmax}(X)$   (7)


Fig. 2. The model architecture of FFRL.

where $v_x$ is the vector of feature selection weights for the volume and price features, and $v_{x_i}$, the $i$-th element of $v_x$, determines the selection weight of the $i$-th feature. Features are then weighted by their selection weights:

$\xi_{x_i,t-k:t} = v_{x_i} x_{i,t-k:t}$   (8)

$\xi_x = [\xi_{x_1,t-k:t}, \xi_{x_2,t-k:t}, \dots, \xi_{x_m,t-k:t}]$   (9)

where $\xi_{x_i,t-k:t}$ is the selected feature vector for variable $i$ and $\xi_x$ is the selected matrix of volume and price features. Meanwhile, two other feature selectors, for the technical feature matrix $Z = [z^T_{1,t-k:t}, z^T_{2,t-k:t}, \dots, z^T_{n,t-k:t}]$ and the macro feature matrix $C = [c^T_{1,t-k:t}, c^T_{2,t-k:t}, \dots, c^T_{q,t-k:t}]$, select features separately, obtaining the filtered feature matrices $\xi_z$ and $\xi_c$.
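As a rough illustration of the dimension-based attention above, the following Keras sketch (our own assumption about the implementation; the paper does not publish code) weights each feature of one feature group by a Softmax-normalized selection weight, as in Eqs. (7)–(9):

```python
import tensorflow as tf

def feature_selection(x):
    """Dimension-based attention over one feature group (Sec. 3.1).

    x: tensor of shape (batch, k, m) -- k historical moments, m features
    of one source (volume/price, technical, or macro).
    """
    # Eq. (7): selection weights from a fully connected layer with a
    # Softmax over the feature dimension, so the weights sum to 1.
    v = tf.keras.layers.Dense(x.shape[-1], activation="softmax")(x)
    # Eqs. (8)-(9): each feature is scaled by its selection weight.
    return x * v
```

One such selector is applied to each of the three feature matrices $X$, $Z$, and $C$, yielding $\xi_x$, $\xi_z$, and $\xi_c$.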

3.2. Feature integration module

The feature integration module consists of two components – feature extraction and feature fusion. This section first ana-
lyzes feature extraction. We choose 1D convolutional layers to extract deeper features from selected features, and each type
of selected feature has its own 1D convolutional layer. The process of feature extraction is as follows:
$\xi'_x = \mathrm{Relu}(W_\xi \odot \xi_x)$   (10)

where $W_\xi$ is the convolutional kernel, $\odot$ is the element-wise Hadamard product, Relu is the activation function of the convolutional layer, and $\xi'_x$ is the matrix of extracted volume and price features. Similarly, two other feature extractors process the selected technical and macro features separately, obtaining the extracted feature matrices $\xi'_z$ and $\xi'_c$. We then analyze feature fusion, which fuses the three different types of features and feeds the fused features to the next module. The process of feature fusion is as follows:

$H = \mathrm{Concat}(\xi'_x, \xi'_z, \xi'_c)$   (11)



$\bar{H} = W_H H + b$   (12)

where $H$ is the feature matrix fusing the extracted volume and price, technical, and macro features. $\bar{H}$ is generated by a linear transformation of $H$, with $\bar{H} = [\bar{h}^T_{1:p,t-k}, \bar{h}^T_{1:p,t-k+1}, \dots, \bar{h}^T_{1:p,t}]$, where $p$ is the number of extracted features. Another expression of $\bar{H}$ is $\bar{H} = [\bar{h}^T_{1,t-k:t}, \bar{h}^T_{2,t-k:t}, \dots, \bar{h}^T_{p,t-k:t}]$. Because the next section analyzes how FFRL selects important moments, we choose the first expression. $\bar{H}$, the extracted and fused feature matrix, is fed to the next module of FFRL.
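A minimal sketch of the feature integration module follows, assuming a Keras implementation with one Conv1D extractor per feature source; the filter counts (32, 32, 15) and kernel size 1 are taken from Table 2, while the fused width p is our placeholder:

```python
import tensorflow as tf

def feature_integration(xi_x, xi_z, xi_c, p=15):
    """Feature extraction and fusion (Sec. 3.2).

    xi_x, xi_z, xi_c: selected volume-and-price, technical, and macro
    feature tensors, each of shape (batch, k, m_source).
    """
    # Eq. (10): one 1D convolutional extractor per feature source.
    e_x = tf.keras.layers.Conv1D(32, kernel_size=1, activation="relu")(xi_x)
    e_z = tf.keras.layers.Conv1D(32, kernel_size=1, activation="relu")(xi_z)
    e_c = tf.keras.layers.Conv1D(15, kernel_size=1, activation="relu")(xi_c)
    # Eq. (11): concatenate the extracted features along the feature axis.
    h = tf.keras.layers.Concatenate(axis=-1)([e_x, e_z, e_c])
    # Eq. (12): linear transformation to p fused features per moment.
    return tf.keras.layers.Dense(p)(h)
```

Giving each source its own convolutional layer is what lets features of different frequencies be extracted with separately learned kernels before fusion.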

3.3. Residual module

In this paper, the number of residual modules is a hyperparameter. The process of the attention mechanism recognizing
important moments is as follows:

$G = \mathrm{LSTM}(\bar{H})$   (13)

where $G$ is the matrix locally enhanced by the LSTM, with $G = [g^T_{1:p,t-k}, g^T_{1:p,t-k+1}, \dots, g^T_{1:p,t}]$.

$u = \mathrm{Softmax}(G)$   (14)

where $u$ is a vector of moment selection weights; its $l$-th element, $u_{t-l}$, corresponds to the $l$-th historical moment, and the more important the $l$-th historical moment, the larger $u_{t-l}$. The locally enhanced moments are weighted by their moment selection weights:

$w_{1:p,t-l} = u_{t-l}\, g_{1:p,t-l}$   (15)

$W = [w_{1:p,t-k}, w_{1:p,t-k+1}, \dots, w_{1:p,t}]$   (16)

where $w_{1:p,t-l}$ is the $l$-th historical moment after selection and $W$ is the selected moment matrix. We also employ a residual connection over the two-layer LSTM and the time-based attention mechanism:

$X = \bar{H} + W$   (17)

where $X$ is the final output of the residual module.
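The residual module can be sketched as follows, again assuming a Keras implementation (Lstm_units = 64 follows Table 2; the remaining shape choices are ours, made so that the skip connection in Eq. (17) adds tensors of matching width):

```python
import tensorflow as tf

def residual_module(h_bar, units=64):
    """Residual module (Sec. 3.3): two-layer LSTM, time-based attention,
    and a residual connection."""
    # Eq. (13): local enhancement by a two-layer LSTM; the second layer
    # projects back to the input width so that Eq. (17) is well defined.
    g = tf.keras.layers.LSTM(units, return_sequences=True)(h_bar)
    g = tf.keras.layers.LSTM(h_bar.shape[-1], return_sequences=True)(g)
    # Eq. (14): one score per historical moment, normalized over time.
    u = tf.keras.layers.Dense(1)(g)
    u = tf.keras.layers.Softmax(axis=1)(u)
    # Eqs. (15)-(16): weight each moment by its selection weight.
    w = g * u
    # Eq. (17): residual connection over LSTM + attention.
    return h_bar + w
```

Because the skip path carries $\bar{H}$ forward unchanged, information discarded by the LSTM gates can still reach the output module, which is how the module addresses the representational bottleneck.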

3.4. Output module

The output module consists of two dense layers, and the forecasting result is generated by a linear transformation of the output from the residual module:

$y_{t+1} = W_q X + b_q$   (18)

where $y_{t+1}$ is the forecast of the closing price at the next moment, and $W_q$ and $b_q$ are coefficient matrices.
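Putting the pieces together, FFRL can be assembled end to end roughly as below. This is a sketch under our own assumptions: k = 30 matches Time_step in Table 2, the feature counts follow Sec. 4.2 (treating each listed indicator as one column), and the hidden width of the first dense layer is a placeholder:

```python
import tensorflow as tf

def build_ffrl(k=30, m_x=7, m_z=9, m_c=30, n_res=1):
    """Assemble FFRL from the module sketches above.

    n_res is the number of residual modules, a hyperparameter (Sec. 3.3).
    """
    in_x = tf.keras.Input((k, m_x))   # volume and price features
    in_z = tf.keras.Input((k, m_z))   # technical features
    in_c = tf.keras.Input((k, m_c))   # macro features
    h = feature_integration(feature_selection(in_x),
                            feature_selection(in_z),
                            feature_selection(in_c))
    for _ in range(n_res):
        h = residual_module(h)
    # Output module (Eq. 18): two dense layers on the flattened output.
    h = tf.keras.layers.Flatten()(h)
    h = tf.keras.layers.Dense(32, activation="relu")(h)
    y = tf.keras.layers.Dense(1)(h)   # next-moment closing price
    return tf.keras.Model([in_x, in_z, in_c], y)

model = build_ffrl()
# Training setup from Table 2: Adam, learning rate 0.001, MSE loss.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
```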

4. Experiment

We compare FFRL with LSTM, CNN-LSTM, and CNN-LSTM-ATTEN to verify the high predictive performance of FFRL. All experiments are conducted on the same dataset and operating environment: a Windows 10 Professional 64-bit operating system with a six-core Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz, an NVIDIA GeForce GTX1650 GPU, and PyCharm as the IDE.

Table 1
Information on stock price indices.

Code Name Frequency Duration


000300.SH CSI300 30 min 2018.02.23–2021.02.22
000905.SH CSI500 30 min 2018.02.23–2021.02.22
000001.SH SSEC 30 min 2018.02.26–2021.02.22
399106.SZ SZSE 30 min 2018.02.26–2021.02.22


4.1. Data

Two constituent indices, CSI300 (000300.SH) and CSI500 (000905.SH), and two composite indices, SSEC (000001.SH) and SZSE (399106.SZ), are selected to test the predictive performance of the model. Detailed information on the stock price indices is presented in Table 1.

4.2. Features

We select three types of features, namely, volume and price, technical, and macro features. The volume and price features
include Open, High, Low, Volume, Amt, Change, and Percentage of Change (Change%). Technical features include BIAS, BOLL,
DMI, EXPMA, HV, KDJ, MA, MACD, and RSI. Macro features include six types, namely, Exchange Rates (XR), General Indicators
of Stock Market (GISM), Summary of Domestic Futures Trading (SDFT), Price Indices (PI), Money supply (MS), and GDP, with
30 features in total. Specific information on macro features is documented in Table A7, Appendix A.

4.3. Normalization

To eliminate the effects of different magnitudes, we preprocess the data in a normalized way – Min-Max normalization.
The process of Min-Max normalization is shown in Eq. (19):

$\bar{X}_i = \dfrac{X_i - X_{\min}}{X_{\max} - X_{\min}}$   (19)

where $\bar{X}_i$ is the normalized value of the $i$-th data point, $X_i$ is the unnormalized value of the $i$-th data point, $X_{\min}$ is the minimum of the series, and $X_{\max}$ is the maximum of the series.
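In code, Eq. (19) is a one-liner; the sketch below adds the caveat (our assumption, not spelled out in the paper) that the minimum and maximum should come from the training portion only, to avoid look-ahead on the test set:

```python
import numpy as np

def min_max_normalize(series, x_min=None, x_max=None):
    """Min-Max normalization, Eq. (19).

    Pass x_min/x_max computed on the training set when normalizing
    validation or test data, so no future information leaks in.
    """
    x_min = series.min() if x_min is None else x_min
    x_max = series.max() if x_max is None else x_max
    return (series - x_min) / (x_max - x_min)
```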

4.4. Model evaluation

We select four evaluation indicators, for each of which smaller values indicate better forecasting performance, to evaluate FFRL and the comparison models: MSE, MAE, RMSE, and MAPE. For ease of reading, we calculate MSE using the normalized data [44,45] and calculate the remaining indicators using unnormalized data [46,47].
(1) Mean Squared Error (MSE)

$\mathrm{MSE} = \dfrac{1}{n}\sum_{k=1}^{n}(\mathrm{predict}_k - \mathrm{actual}_k)^2$   (20)

where $\mathrm{predict}_k$ is the $k$-th predicted value and $\mathrm{actual}_k$ is the corresponding actual value. The smaller the value of MSE, the better the predictive performance.
(2) Mean Absolute Error (MAE)

$\mathrm{MAE} = \dfrac{1}{n}\sum_{k=1}^{n}\left|\mathrm{predict}_k - \mathrm{actual}_k\right|$   (21)

(3) Root Mean Squared Error (RMSE)


$\mathrm{RMSE} = \sqrt{\dfrac{1}{n}\sum_{k=1}^{n}(\mathrm{predict}_k - \mathrm{actual}_k)^2}$   (22)

(4) Mean Absolute Percentage Error (MAPE)

$\mathrm{MAPE} = \dfrac{100\%}{n}\sum_{k=1}^{n}\left|\dfrac{\mathrm{predict}_k - \mathrm{actual}_k}{\mathrm{actual}_k}\right|$   (23)
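The four indicators are straightforward to compute; a sketch follows. Note the paper's convention: MSE is computed on normalized prices, the other three on unnormalized prices.

```python
import numpy as np

def evaluate(predict, actual):
    """MSE, MAE, RMSE, and MAPE of Eqs. (20)-(23)."""
    err = predict - actual
    mse = np.mean(err ** 2)                        # Eq. (20)
    mae = np.mean(np.abs(err))                     # Eq. (21)
    rmse = np.sqrt(np.mean(err ** 2))              # Eq. (22)
    mape = 100.0 * np.mean(np.abs(err / actual))   # Eq. (23), in percent
    return mse, mae, rmse, mape
```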

4.5. Training procedure

We partition each stock price index into a training set and a testing set at a ratio of 7:3, and the last 25% of the training set serves as the validation set for the experiment. The optimal parameters of FFRL are shown in Table 2.
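Because the data are a time series, the split must preserve chronological order; a sketch of the 7:3 split with a 25% validation tail (the helper name is ours) is:

```python
def chronological_split(data, train_ratio=0.7, val_ratio=0.25):
    """Split a series into train/validation/test without shuffling.

    The first 70% of observations form the training set; the last 25%
    of that training set is held out for validation (Sec. 4.5).
    """
    n_train = int(len(data) * train_ratio)
    n_val = int(n_train * val_ratio)
    return (data[: n_train - n_val],          # training set
            data[n_train - n_val : n_train],  # validation set
            data[n_train:])                   # testing set
```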

4.6. Results and discussion

Fig. 3 summarizes the predictions of FFRL and the comparison models for the closing price of each stock price index. From
the predicted results of CSI300, we note that the predicted result of FFRL is consistently closer to the true value, while the
forecasts of the comparison models do not keep pace with the true value when the closing price increases rapidly.

Table 2
Parameters of FFRL.

Parameters Value

Conv1D_1 filter 32
Conv1D_2 filter 32
Conv1D_3 filter 15
Kernel_size 1
Convolutional activation function Relu
Dropout behind the convolutional layer 0.3
Lstm_units 64
LSTM activation function Tanh
Dropout behind the LSTM layer 0.2
Batch_size 32
Epoch 200
Time_step 30
Learning_rate 0.001
Optimizer Adam
Loss function MSE

Fig. 3. The results of comparison experiments.

This phenomenon is also found in the comparison models' forecasts of the other stock indices. The series of macro features are relatively flat compared to the series of closing prices because of their very different frequencies, and each comparison model lacks a feature selection and integration module. The macro features therefore distort the comparison models' forecasts, making their forecasting results smoother.
Table 3 summarizes the evaluation indicators of the forecasting results generated by the FFRL and comparison models.
For each stock index, the best results of the four models are shown in bold. For forecasts of CSI300, the forecast of FFRL is
the best of all models, and the forecast of CNN-LSTM-ATTEN is second. Compared to the suboptimal model, the FFRL model
reduces the MSE by 97.61 %, MAE by 84.88 %, RMSE by 84.52 %, and MAPE by 83.16 %. For the remaining three indices, we
also find that the results of FFRL are significantly improved compared to the comparison models. Combined with Table 4,
we find that although FFRL consumes more time and memory resources, its prediction performance greatly increases.

4.7. Ablation analysis

To verify the necessity of each FFRL component, we perform an ablation analysis – substituting or removing the components of FFRL listed below and quantifying the forecasting performance against the original model using the evaluation indicators. Details of the ablation analysis are as follows:
No feature fusion: This experiment verifies the effectiveness of the feature extraction module in processing features from different sources by substituting the three source-specific convolutional layers with a single convolutional layer.

Table 3
MSE, MAE, RMSE, and MAPE for the differences between the real closing prices and estimates using the testing dataset and the FFRL, LSTM, CNN-LSTM, and CNN-LSTM-ATTEN comparison models.

Name Model MSE MAE RMSE MAPE(%)

CSI300 FFRL 4.26E−04 40.851 59.544 0.897
CSI300 LSTM 2.98E−02 334.985 497.926 6.524
CSI300 CNN-LSTM 5.70E−02 544.839 688.733 10.831
CSI300 CNN-LSTM-ATTEN 1.78E−02 270.232 384.663 5.326
CSI500 FFRL 4.10E−04 42.558 58.567 0.686
CSI500 LSTM 2.73E−03 113.370 151.003 1.784
CSI500 CNN-LSTM 1.18E−02 279.211 313.568 4.546
CSI500 CNN-LSTM-ATTEN 1.66E−02 339.871 372.426 5.538
SSEC FFRL 3.99E−04 18.745 24.994 0.566
SSEC LSTM 8.19E−03 93.645 113.274 2.828
SSEC CNN-LSTM 3.53E−03 59.262 74.331 1.798
SSEC CNN-LSTM-ATTEN 5.08E−03 59.825 89.191 1.766
SZSE FFRL 6.01E−04 25.196 30.811 1.153
SZSE LSTM 8.51E−02 331.231 366.557 14.807
SZSE CNN-LSTM 7.44E−02 314.836 342.683 14.133
SZSE CNN-LSTM-ATTEN 6.94E−02 303.325 330.916 13.612

Table 4
Trainable parameters and execution time of FFRL and comparison models.

Model Trainable Parameters Execution Time

FFRL 372,880 1 h 20 min
LSTM 229,505 16 min
CNN-LSTM 277,505 17 min
CNN-LSTM-ATTEN 516,003 18 min

No residual connection: This experiment removes the residual connection in the residual module to verify its impor-
tance in compensating for LSTM’s bottleneck and deepening the model with convergence.
No variable selection: We ablate by removing the variable selection module and retaining other modules to explore
whether the feature selection module can contribute to the predictive performance of FFRL.
Ablation experiments are conducted for each stock price index using the hyperparameters in Table 2. Fig. 4 shows the predictions of the ablated networks and FFRL. We note that the network with no residual connection produces the worst prediction, an almost straight line. This suggests that this ablated network does not learn the features of the stock price indices.

Fig. 4. Results of ablation analysis.


This phenomenon results from the vanishing gradient caused by the excessive depth of the model. Conversely, it indicates that adding residual connections to the deep network accelerates the network's convergence and improves its predictive performance.
Table 5 summarizes the evaluation indicators of the ablated networks. For each stock index, the best results derived from the four models are bolded. Combined with Table 6, we note that FFRL outperforms all the ablated networks at similar time and memory cost, except for the forecast of the CSI500. For the CSI500, FFRL outperforms the networks with no residual connection and no feature fusion; however, its MSE and RMSE are marginally larger than those of the network with no variable selection. From the formulas of MSE and RMSE, we note that these two indicators are more susceptible to outliers, and the CSI500 is more volatile than the remaining three stock indices, so outliers are more likely to occur. For the CSI500, MSE and RMSE are therefore less reliable than MAE and MAPE. In summary, FFRL outperforms each ablated network, suggesting that each component of FFRL is necessary.
To evaluate the contribution of each component to FFRL, we calculate the changes in the evaluation indicators of the ablated networks relative to FFRL; Fig. 5 summarizes the results. We note that the residual connection of the residual module has the most profound impact on FFRL: it is the core tool that makes FFRL a deep model, and without it, FFRL as a deep model is useless for predicting closing prices. The feature extraction module has a greater impact on the predictions of FFRL than the feature selection module.

4.8. Visualization

To interpret the reasonableness of FFRL and its predicted results, we visualize the outputs of the feature selection module and the feature integration module as follows: (a) we use feature selection weights to represent the importance of features and visualize their distributions to determine whether the feature selection module can identify important features; (b) we use t-SNE [48] to show the separability of raw and convolved features and visualize the t-SNE results to verify the validity of the feature integration module (a sketch of this t-SNE step is given below).
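For reference, the t-SNE projection behind the plots of Sec. 4.8.2 can be produced with scikit-learn; this is our own sketch, not the authors' plotting code:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels, title):
    """Project feature vectors to 2-D and color them by feature type.

    features: (n_samples, n_dims) array of raw or convolved features.
    labels: integer feature-type id per sample (volume/price, technical,
    macro), used only for coloring the scatter plot.
    """
    emb = TSNE(n_components=2, random_state=0).fit_transform(features)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=5)
    plt.title(title)
    plt.show()
```

Well-separated clusters in the projection of the convolved features, versus mixed clouds for the raw features, are what indicate that the feature integration module is doing useful work.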

4.8.1. Feature selection weights


We quantify the importance of features using feature selection weights calculated by the feature selection module illus-
trated in Sec. 3.1. The higher the selection weight is, the more important the feature is. Fig. 6 describes the selection weights
of the four indices. For CSI300, we note that the distribution of selection weights in each category is significantly different.
Taking the macro features category as an example, the maximum weight representing GISM is 78 times the minimum weight
representing PI. The largest selection weight accounts for the majority of the sum of features’ weights in six types of macro
features. Furthermore, the distributions of selection weights among categories are significantly different: relatively uniform for the technical features, sub-uniform for the volume and price features, and extreme for the macro features. The same differences in distribution are found for the CSI500, SSEC, and SZSE. These differences mean that the feature selection module of FFRL operates effectively, capturing the features that are more essential to the prediction with higher selection weights and the less essential ones with lower selection weights.

4.8.2. Feature separability


In this section, we use the t-SNE method to visualize the feature separability and compare the t-SNE results of raw and
convolved features to prove the efficiency of the feature integration module of FFRL. The t-SNE method can cluster similar

Table 5
MSE, MAE, RMSE, and MAPE for the differences between the real closing prices and estimates using the testing dataset and the FFRL, no feature fusion, no residual connection, and no variable selection ablated networks.

Name Model MSE MAE RMSE MAPE

CSI300 FFRL 4.26E−04 40.851 59.544 0.897
CSI300 No feature fusion 5.24E−04 53.282 66.031 1.113
CSI300 No residual connection 1.74E−01 1079.453 1204.099 22.121
CSI300 No variable selection 1.52E−03 89.661 112.618 1.816
CSI500 FFRL 4.10E−04 42.558 58.567 0.686
CSI500 No feature fusion 2.56E−03 124.663 146.498 2.018
CSI500 No residual connection 1.27E−01 383.500 445.724 11.341
CSI500 No variable selection 3.59E−04 44.242 54.844 0.729
SSEC FFRL 3.99E−04 18.745 24.993 0.566
SSEC No feature fusion 8.69E−03 86.706 116.684 2.5664
SSEC No residual connection 2.65E−01 607.518 646.460 27.427
SSEC No variable selection 7.36E−04 28.245 33.961 0.882
SZSE FFRL 6.01E−04 25.196 30.811 1.153
SZSE No feature fusion 2.04E−03 51.421 56.723 2.347
SZSE No residual connection 1.81E−01 1140.791 1232.081 17.974
SZSE No variable selection 2.55E−03 55.447 63.396 2.522


Table 6
Trainable parameters and execution time of FFRL and ablated models.

Model Trainable Parameters Execution Time

FFRL 372,880 1 h 20 min
No feature fusion 375,552 1 h 20 min
No residual connection 371,600 1 h 18 min
No variable selection 338,780 1 h 11 min

Fig. 5. Changes in evaluation indicators across ablation analyses.

data to display feature separability: the more clustered the t-SNE results are, the more separable the features and the more efficient the feature integration module. Fig. 7 shows the t-SNE results of the four stock indices. From the first column of Fig. 7, we can see that the raw features are all mixed up, without obvious borders. This indicates that unprocessed features can hardly be distinguished and recognized, so the prediction model needs a more complex feature extraction mechanism. The second column of Fig. 7 displays the t-SNE results of the features after the feature integration module; we note that the convolved features are clearly clustered together. This suggests that the feature extraction module of FFRL is efficient: it extracts deep, easily identifiable features from raw features that are hard to distinguish, which facilitates prediction in the subsequent modules.

5. Conclusion

We introduce FFRL, a novel model based on feature fusion and residual LSTM. FFRL is built to compensate for the three limitations of LSTM and to fully utilize multi-source and multi-frequency information: the feature selection module handles the poor feature recognition of LSTM, the feature integration module improves its poor feature extraction, and the residual module deals with its representational bottleneck. From the comparison experiments, ablation analysis, and visualization, the following conclusions can be drawn:
(1) Compared to the comparison models, FFRL captures the details of closing price movements. From the graphs of the forecasting results, we note that the prediction curves of the comparison models are smoother than those of FFRL, so the predictions of FFRL better reflect the true movement of the stock price indices.

Fig. 6. Feature selection weights of CSI300, CSI500, SSEC, and SZSE. (i), (ii) and (iii) show the selection weights of technical, volume and price, and six types
of macro features, respectively. For (iii), the dark colored bins represent the most essential feature in each category, and the light colored bins represent the
sum of features’ selection weights except for the essential feature.

(2) FFRL makes better use of multi-source and multi-frequency information than the comparison models. The predictions of FFRL are closer to the true values, whereas the prediction curves of the comparison models are lower and smoother than the true curves. (3) FFRL has better applicability than the comparison models. FFRL is the optimal model for predicting all four indices, which indicates that the data have less impact on its prediction performance and that it is suitable for most scenarios.
(4) Each component of FFRL is necessary, and the residual connection of the residual module is the core tool for building a deep model. From the results of the ablation analysis and visualization, we note that (a) the predictions of FFRL are better than those of the ablated networks, and FFRL without the residual connection has no predictive ability, and (b) the feature selection


Fig. 7. The t-SNE results of CSI300, CSI500, SSEC, and SZSE. (i) and (ii) represent the t-SNE results of raw features and convolved features respectively.
Different colored dots represent different types of features.


module can recognize features effectively, and the feature integration module can extract features that are easily separated by t-SNE and recognized by the subsequent module.
Due to space limitations, this paper does not link the forecasts to the market in real time. Subsequent research could develop trading strategies and evaluate strategy returns, allowing for more practical applications of FFRL.

CRediT authorship contribution statement

Songsong Li: Conceptualization, Methodology, Visualization. Zhihong Tian: Data curation, Software, Writing – original
draft. Yao Li: Supervision, Writing – review & editing.

Data availability

Data will be made available on request.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have
appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (NSFC) (Grant No. 71773024) and the Heilongjiang Postdoctoral Scientific Research Developmental Fund (LBH-Q18064).

Appendix A

Table A7
The summary information of macro features.

Categories No. Indicators Units Frequency

Exchange Rates (XR)
1 Middle Exchange Rate: USD/CNY — Daily
2 Middle Exchange Rate: EUR/CNY — Daily
3 Middle Exchange Rate: HKD/CNY — Daily
4 Middle Exchange Rate: 100JPY/CNY — Daily
General Indicators of Stock Market (GISM)
5 SSE: Total Market Value of Stocks CNY 100mn Daily
6 SSE: Market Value of Shares Outstanding CNY 100mn Daily
7 SSE: Average P/E Ratio Times Daily
8 SSE: Total Stock Volume 10,000 shares Daily
9 SSE: Total Stock Turnover Rate % Daily
10 SSE: Total No. of Shares Sold 10,000 units Daily
11 SSE: Total Stock Turnover CNY 100mn Daily
Summary of Domestic Futures Trading (SDFT)
12 Futures Settlement Price (Active Contract): 5-year Treasury Bond Futures yuan Daily
13 Futures Settlement Price (Active Contract): 10Y T-Bond Futures yuan Daily
14 Futures Settlement Price (Active Contract): CSI50 Index Futures Point Daily
15 Futures Settlement Price (Active Contract): Gold yuan/g Daily
Price Indices (PI)
16 CPI: YoY % Monthly
17 CPI: Food: YoY % Monthly
18 CPI: Non-food: YoY % Monthly
19 CPI: Consumer Goods: YoY % Monthly
20 CPI: Services: YoY % Monthly
Money Supply (MS)
21 M0 CNY 100mn Monthly
22 M0: YoY % Monthly
23 M1 CNY 100mn Monthly
24 M1: YoY % Monthly
25 M2 CNY 100mn Monthly
26 M2: YoY % Monthly
GDP
27 GDP: Current Prices CNY 100mn Quarterly
28 GDP: Current Prices: Primary Industry CNY 100mn Quarterly
29 GDP: Current Prices: Secondary Industry CNY 100mn Quarterly
30 GDP: Current Prices: Tertiary Industry CNY 100mn Quarterly


References

[1] R. Kumar, R. Ayushman, Agarwal, Weak form market efficiency of Indian stock market: evidence from Indian metal & mining sector, Pac. Bus. Rev. Int.
14 (2021) 18–125.
[2] M.L. Erdas, Validity of weak-form market efficiency in Central and Eastern European Countries (CEECs): Evidence from linear and nonlinear unit root
tests, Rev. of Econ. Perspect. 19 (2019) 399–428, https://doi.org/10.2478/revecp-2019-0020.
[3] X. Li, L. Yang, F. Xue, H. Zhou, Time series prediction of stock price using deep belief networks with intrinsic plasticity, in: 29th Chinese Control and Decision Conference (CCDC), 2017, pp. 1237–1242, https://doi.org/10.1109/CCDC.2017.7978707.
[4] X. Pang, Y. Zhou, P. Wang, W. Lin, V. Chang, An innovative neural network approach for stock market prediction, J. Supercomput. 76 (2020) 2098–2118,
https://doi.org/10.1007/s11227-017-2228-y.
[5] L. Chen, Z. Qiao, M. Wang, C. Wang, R. Du, H.E. Stanley, Which artificial intelligence algorithm better predicts the Chinese stock market?, IEEE Access 6 (2018) 48625–48633, https://doi.org/10.1109/ACCESS.2018.2859809.
[6] C. Xie, D. Rajan, Q. Chai, An interpretable neural fuzzy hammerstein-wiener network for stock price prediction, Inf. Sci. 577 (2021) 324–335, https://
doi.org/10.1016/j.ins.2021.06.076.
[7] W. Chen, M. Jiang, W.-G. Zhang, Z. Chen, A novel graph convolutional feature based convolutional neural network for stock trend prediction, Inf. Sci.
556 (2021) 67–94, https://doi.org/10.1016/j.ins.2020.12.068.
[8] M. Agrawal, P. Kumar Shukla, R. Nair, A. Nayyar, M. Masud, Stock prediction based on technical indicators using deep learning model, Comput., Mater.
Continua 70 (2022) 287–304. https://doi.org/10.32604/cmc.2022.014637.
[9] O.B. Sezer, M.U. Gudelek, A.M. Ozbayoglu, Financial time series forecasting with deep learning: A systematic literature review: 2005–2019, Appl. Soft
Comput. 90 (2020) 1–32, https://doi.org/10.1016/j.asoc.2020.106181.
[10] A.M. Ozbayoglu, M.U. Gudelek, O.B. Sezer, Deep learning for financial applications: A survey, Appl. Soft Comput. 93 (2020) 1–52, https://doi.org/
10.1016/j.asoc.2020.106384.
[11] Y. Xun, L. Wang, H. Yang, J. Cai, Mining relevant partial periodic pattern of multi-source time series data, Inf. Sci. 615 (2022) 638–656, https://doi.org/
10.1016/j.ins.2022.10.049.
[12] Q. Zhang, L. Yang, F. Zhou, Attention enhanced long short-term memory network with multi-source heterogeneous information fusion: An application
to BGI Genomics, Inf. Sci. 553 (2021) 305–330, https://doi.org/10.1016/j.ins.2020.10.023.
[13] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (1997) 1735–1780, https://doi.org/10.1162/neco.1997.9.8.1735.
[14] A.H. Bukhari, M.A.Z. Raja, M. Sulaiman, S. Islam, M. Shoaib, P. Kumam, Fractional Neuro-Sequential ARFIMA-LSTM for financial market forecasting, IEEE
Access 8 (2020) 1–14, https://doi.org/10.1109/ACCESS.2020.2985763.
[15] H.Y. Kim, C.H. Won, Forecasting the volatility of stock price index: A hybrid model integrating LSTM with multiple GARCH-type models, Expert Syst.
Appl. 103 (2018) 25–37, https://doi.org/10.1016/j.eswa.2018.03.002.
[16] S. Ahmed, R.K. Chakrabortty, D.L. Essam, W. Ding, Poly-linear Regression with augmented long short term memory neural network: predicting time
series data, Inf. Sci. 606 (2022) 573–600, https://doi.org/10.1016/j.ins.2022.05.078.
[17] F. Kamalov, Forecasting significant stock price changes using neural networks, Neural Compu. Appl. 32 (2020) 17655–17667, https://doi.org/10.1007/
s00521-020-04942-3.
[18] T. Fischer, C. Krauss, Deep learning with long short-term memory networks for financial market predictions, Eur. J. Oper. Res. 270 (2) (2018) 654–669,
https://doi.org/10.1016/j.ejor.2017.11.054.
[19] M. Nabipour, P. Nayyeri, H. Jabani, A. Mosavi, E. Salwana, Deep learning for stock market prediction, Entropy 22 (8) (2020) 1–23, https://doi.org/
10.3390/e22080840.
[20] J. Li, H. Bu, J. Wu, Sentiment-aware stock market prediction: A deep learning method, International Conference on Service Systems and Service
Management 2017 (2017) 1–6, https://doi.org/10.1109/ICSSSM.2017.7996306.
[21] W. Wang, W. Li, N. Zhang, K. Liu, Portfolio formation with preselection using deep learning from long-term financial data, Expert Syst. Appl. 143 (2020)
1–17, https://doi.org/10.1016/j.eswa.2019.113042.
[22] F. Chollet, Deep Learning with Python, Shelter Island, NY, Manning, USA, 2018.
[23] W. Liao, Y. Ma, Y. Yin, G. Ye, D. Zuo, Improving abstractive summarization based on dynamic residual network with reinforce dependency,
Neurocomputing 448 (2021) 228–237, https://doi.org/10.1016/j.neucom.2021.02.028.
[24] Y. Xie, R. Liang, Z. Liang, L. Zhao, Attention-based dense LSTM for speech emotion recognition, IEICE Trans. Inf. Syst. E102.D (7) (2019) 1426–1429.
[25] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W. Wong, W. Woo, Convolutional LSTM network: a machine learning approach for precipitation nowcasting, Adv.
Neural Inf. Process Syst. 28 (2015) 802–810, https://doi.org/10.1007/978-3-319-21233-3_6.
[26] P. Huang, W. Chao, L. Fu, Q. Peng, Y. Tang, A deep learning approach for multi-attribute data: a study of train delay prediction in railway systems, Inf.
Sci. 516 (2020) 234–253, https://doi.org/10.1016/j.ins.2019.12.053.
[27] H. Rezaei, H. Faaljou, G. Mansourfar, Stock price prediction using deep learning and frequency decomposition, Expert Syst. Appl. 169 (2021), https://
doi.org/10.1016/j.eswa.2020.114332 114332.
[28] Z. Ouyang, X. Yang, Y. Lai, Systemic financial risk early warning of financial market in China using Attention-LSTM model, North Am. J. Econ. Financ. 56
(2021), https://doi.org/10.1016/j.najef.2021.101383 101383.
[29] J. Qin, Y. Zhang, S. Fan, X. Hu, Y. Huang, Z. Lu, Y. Liu, Multi-task short-term reactive and active load forecasting method based on attention-LSTM model,
Int. J. Electr. Power Energy Syst. 135 (2022), https://doi.org/10.1016/j.ijepes.2021.107517 107517.
[30] Y. Su, X. Kong, G. Liu, J. Su, Advertising popularity feature collaborative recommendation algorithm based on attention-LSTM model, Security Commun.
Networks 2021 (2021) 1–11.
[31] R. Wang, X. Pei, J. Zhu, Z. Zhang, X. Huang, J. Zhai, F. Zhang, Multivariable time series forecasting using model fusion, Inf. Sci. 585 (2022) 262–274.
[32] P.H. Vuong, T.T. Dat, T.K. Mai, P.H. Uyen, Stock-Price Forecasting Based on XGBoost and LSTM, Comput. Syst. Sci. Eng. 40 (2022) 237–246. https://doi.
org/10.32604/csse.2022.017685.
[33] L. Du, R. Gao, P.N. Suganthan, D.Z. Wang, Bayesian optimization based dynamic ensemble for time series forecasting, Inf. Sci. 591 (2022) 155–175,
https://doi.org/10.1016/j.ins.2022.01.010.
[34] X. Yu, D. Zhang, T. Zhu, X. Jiang, Novel hybrid multi-head self-attention and multifractal algorithm for non-stationary time series prediction, Inf. Sci.
613 (2022) 541–555, https://doi.org/10.1016/j.ins.2022.08.126.
[35] M. Sun, Q. Li, P. Lin, Short-term stock price forecasting based on an SVD-LSTM model, Intell. Autom. Soft Comput. 28 (2021) 369–378. https://doi.org/
10.32604/iasc.2021.014962.
[36] Y. LeCun, Y. Bengio, Convolutional networks for images, speech, and time series, The Handb. of Brain Theory and Neural Netw., 3361 (10) (1995) 1-14.
[37] T.J. Jun, Y. Eom, D. Kim, C. Kim, J. Park, H.M. Nguyen, Y. Kim, D. Kim, TRk-CNN: Transferable Ranking-CNN for image classification of glaucoma,
glaucoma suspect, and normal eyes, Expert Syst. Appl. 182 (2021) 1–49, https://doi.org/10.1016/j.eswa.2021.115211.
[38] Y. Ko, Y. Lee, S. Azam, F. Munir, M. Jeon, W. Pedrycz, Key points estimation and point instance segmentation approach for lane detection, IEEE Trans.
Intell. Transp. Syst. 23 (2022) 8949–8958, https://doi.org/10.1109/TITS.2021.3088488.
[39] D. Cabrera, F. Sancho, M. Cerrada, R.-V. Sánchez, C. Li, Knowledge extraction from deep convolutional neural networks applied to cyclo-stationary time-
series classification, Inf. Sci. 524 (2020) 1–14, https://doi.org/10.1016/j.ins.2020.03.039.
[40] A.F. Kamara, E. Chen, Z. Pan, An ensemble of a boosted hybrid of deep learning models and technical analysis for forecasting stock prices, Inf. Sci. 594
(2022) 1–19, https://doi.org/10.1016/j.ins.2022.02.015.
[41] R. Chen, X. Yan, S. Wang, G. Xiao, DA-Net: Dual-attention network for multivariate time series classification, Inf. Sci. 610 (2022) 472–487, https://doi.
org/10.1016/j.ins.2022.07.178.


[42] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778, https://doi.org/10.1109/CVPR.2016.90.
[43] S. Gao, M. Cheng, K. Zhao, X. Zhang, M. Yang, P. Torr, Res2Net: A new Multi-Scale backbone architecture, IEEE Trans. Pattern Anal 43 (2021) 652–662,
https://doi.org/10.1109/TPAMI.2019.2938758.
[44] D.J. Verly Lopes, G.D.S. Bobadilha, A. Peres Vieira Bedette, Analysis of lumber prices time series using long Short-Term memory artificial neural
networks, For. 12 (2021) 1–12, https://doi.org/10.3390/f12040428.
[45] A.H. Alenezy, M.T. Ismail, S.A. Wadi, M. Tahir, N.N. Hamadneh, J.J. Jaber, W.A. Khan, B.K. Papadopoulos, Forecasting stock market volatility using hybrid
of adaptive network of fuzzy inference system and wavelet functions, J. Math. 2021 (2021) 1–10.
[46] T.B. Shahi, A. Shrestha, A. Neupane, W. Guo, Stock price forecasting with deep learning: A comparative study, Math. 8 (2020) 1–15, https://doi.org/
10.3390/math8091441.
[47] T. Zhan, F. Xiao, A fast evidential approach for stock forecasting, Int. J. Intell. Syst. 36 (2021) 7544–7562, https://doi.org/10.1002/int.22598.
[48] L. van der Maaten, G. Hinton, Visualizing data using t-SNE, J. Mach. Learn. Res. 9 (2008) 2579–2605.

