Learning Dynamic Dependencies With Graph Evolution Recurrent Unit for Stock Predictions
Abstract—Investment decisions and risk management require understanding the time-varying dependencies between stocks. Graph-based learning systems have emerged as a promising approach for predicting stock prices by leveraging interfirm relationships. However, existing methods rely on a static stock graph predefined from finance domain knowledge and large-scale data engineering, which overlooks the dynamic dependencies between stocks. In this article, we present a novel framework called graph evolution recurrent unit (GERU), which uses a dynamic graph neural network to automatically learn the evolving dependencies from historical stock features, leading to better predictions. Our approach consists of three parts. First, we develop an adaptive dynamic graph learning (ADGL) module to learn latent dynamic dependencies from stock time series. Second, we propose a clustered ADGL (clu-ADGL) to handle large-scale time series by reducing time and memory complexity. Third, we combine the ADGL/clu-ADGL with a graph-gated recurrent unit to model the temporal evolutions of stock networks. Extensive experiments on real-world datasets show that our proposed methods outperform existing methods in predicting stock movements, capturing meaningful dynamic dependencies and temporal evolution patterns from the financial market, and achieving outstanding profitability in portfolio construction.

Index Terms—Gated recurrent unit, graph representation learning, learning dynamic dependencies, stock prediction system.

Manuscript received 22 February 2023; accepted 30 May 2023. Date of publication 10 July 2023; date of current version 17 October 2023. This work was supported in part by the Ministry of Science and Technology of China under Grant 2020AAA0108401, and in part by the Natural Science Foundation of China under Grant 72225011 and Grant 72293575. This article was recommended by Associate Editor W.-K. V. Chan. (Corresponding author: Xiaolong Zheng.) The authors are with the State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China, and also with the School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, China (e-mail: tianhu2018@[Link]; zhangxingwei2019@[Link]; [Link]@[Link]; [Link]@[Link]). Digital Object Identifier 10.1109/TSMC.2023.3284840

I. INTRODUCTION

Stock predictions aim to predict the future trend and price of a listed company and play a crucial role in the financial investment domain. In the past few years, machine learning algorithms for financial market predictions have developed rapidly [1]. Owing to their strong ability to capture complex patterns and learn nonlinear relationships from data, deep learning methods have achieved better performance than traditional machine learning systems on stock movement predictions [2], [3]. Unfortunately, from the perspective of information usage, most of these solutions follow a single-stock prediction schema, i.e., they typically treat stocks as independent of each other and ignore the relations between them.

In reality, stock markets are increasingly viewed as complex interconnected systems, reflecting the recognition of the vast interactivity between listed companies and investors [4]. Several common characteristics concern the complexity of the interactions between stocks. First, stock interactions are challenging to recognize precisely due to the existence of hidden interdependencies [6] (e.g., insider trading, price manipulation, and hidden shareholders beneath the surface of financial regulations). Second, the interfirm dependencies evolve dynamically and are shaped by multiple elements, such as business strategy adjustments [7] or changes in the macroeconomic environment [8]. Third, empirical financial research indicates that the financial network is sparse, and only a few stocks have large degrees and influence power [4]. Knowing the properties of complex financial systems helps develop effective network-based prediction models or micro-trading tactics.

Graph (network), as an essential data structure, describes the relationships between different entities, and stock predictions can be viewed naturally from a graph perspective. Graph neural networks (GNNs) have shown excellent performance in processing various real-world graphs due to their permutation invariance, local connectivity, and compositionality [9], [10]. Recent research has proposed predicting stock price movement by using interfirm relationships based on human knowledge and intuition [11], [12], [13], [14] and has obtained promising performance. These predictive systems follow a similar roadmap: one predefines a financial network (graph) from available interfirm relationships, such as business partnerships, to convert stock prediction into a node classification task in which the prediction for one stock can learn from other stocks in the graph. Subsequently, GNNs are trained on the predefined financial network to predict future stock movements.

Although the existing network-based methods have shown better performance than other deep learning models, we argue that such solutions have three main limitations. First, constructing a predefined financial network relies on solid finance domain knowledge and large-scale data engineering, which is time-consuming, laborious, and costly. Second, the predefined network is incapable of addressing the complicated stock dependencies because human intuition and knowledge with
various prior assumptions may neglect the hidden interactions critical for effective decision-making. Third, these graph-based methods only construct stable interfirm relationships and learn node representations over a static stock network, which lacks the dynamic characteristic of the financial market. As such, while the nature of the market varies over time [5], wrong interdependencies between stocks can introduce much noise into the learning models, thus potentially restricting a static model's generalization. Hence, modeling the dynamic temporal dependencies between stocks is indispensable for understanding the market evolution and predicting future stock status.

To overcome these limitations, we propose a novel neural network framework, the graph evolution recurrent unit (GERU), which directly learns the dynamic temporal dependencies between stocks from financial time series for better stock predictions. Specifically, we feed the current features of each stock and the historical stock embeddings to an adaptive dynamic graph learning (ADGL) module to explicitly capture the latent dependencies and learn relation-wise sequential representations. ADGL also exploits a sparsification mechanism to filter unimportant links and obtain fine interpretability. Furthermore, we propose a clustered ADGL (clu-ADGL) module, which improves ADGL to handle large-scale time series by only considering the dependencies between a stock and the other stocks affiliated with the same cluster. Next, we exploit a gated graph recurrent unit (GGRU) [25], [42] to string the ADGL/clu-ADGL together and learn node representations concerning topology and node attribute evolutions in dynamic graphs. Finally, we feed the latest stock representations to a fully connected (FC) layer to predict the stock movements.

The main contributions of this work can be summarized as follows.
1) We investigated directly learning the latent dynamic dependencies between stocks from financial time series and developed an ADGL module to generate sparse but meaningful time-varying stock graphs.
2) We proposed a clu-ADGL, which presents comparable performance to the ADGL and has high efficiency in terms of memory and computation in large-scale time-series scenarios.
3) We provided a generic dynamic graph learning and predictive system, GERU, by incorporating the ADGL/clu-ADGL into gated graph recurrent networks, which can model the interactions of time series at each time step and capture both the topology and node attribute changes in dynamic graphs.
4) We justified that GERU has better predictive performance than the baseline methods and can provide interpretability by discovering meaningful underlying dependencies between stocks along with the evolution of the financial market.

The remainder of this article is organized as follows. In Section II, a brief review of related studies is presented. Section III formulates the stock prediction problem. In Section IV, our proposed GERU is described in detail. Section V presents the experimental results on five real-world stock market datasets, and Section VI concludes this article and outlines future work.

II. LITERATURE REVIEW

This article is conceptually and methodologically related to previous studies on graph-based financial predictions, learning graph models, and dynamic GNNs. In this section, we review related work in these areas and highlight our contributions.

A. Graph-Based Financial Prediction

Traditional solutions aim to recognize valuable signals from constructed financial networks to enhance the performance of predictive systems [15], [16], [17]. For example, Creamer et al. [15] found that the average eigenvector centrality of corporate news networks at different time points has an impact on the return and volatility of the leading blue-chip index for the Eurozone. By mining daily role-based trading networks, Sun et al. [16] fed investors' trading behavior characteristics to a neural network to predict the stock price. Cao et al. [17] found that the topological characteristics of stock price volatility pattern networks are significant features for machine learning predictive models. Such methods can utilize the dynamic factors of stock market behavior from a systematic perspective, but they cannot consider the dependencies between stocks, and traditional machine learning methods are incapable of handling graph-structured data.

Recently, using GNN methods to predict the stock market has received increasing attention. Based on shareholding relationships, Chen et al. [11] created a financial network consisting of the listed companies and their top-10 stockholders and trained a standard graph convolutional network (GCN) model on the graph-structured data to make predictions. Considering the diverse and complicated corporation relationships, some researchers proposed using multiple types of relations to build a financial network. Feng et al. [12] regarded sector-industry and knowledge-based company relations as the edges of a static stock graph. They used a long short-term memory (LSTM) network and a GCN to learn node embeddings from temporal features. HATS [13] introduced a hierarchical graph attention system to selectively aggregate information over different types of corporate relations. Viewing different relations as different financial networks and exploiting multigraph convolutional networks to encode multiple types of stock graphs, Ye et al.'s [14] method can learn more complex relations for stock predictions. HAD-GNN [19] built time-varying stock graphs leveraging stock price correlations and proposed a hybrid-attention dynamic GNN method to predict the stock market.

The superiority of graph-based prediction methods indicates that the dependencies between time series are a valuable signal for more accurate forecasting. However, these solutions rely on predefined and static networks and cannot model the dynamic patterns of relationship evolution. Prebuilding the network strongly relies on domain knowledge and data engineering and may neglect the hidden interactions critical for effective decision-making. Therefore, we argue that directly learning dynamic dependencies from the time series of stocks can be a practical approach to address these challenges.
B. Learning Graph Models

When handling time series with unknown interactions, the primary step is identifying the underlying graph topology. Learning graphs from data is an essential theme in graph machine learning and graph signal processing. Traditional approaches to learning graphs mainly comprise graphical Lasso methods [20] and Granger-causality-based methods [21]. Cardoso and Palomar [20] imposed Laplacian structural constraints on the Gaussian Markov random field model, which can find conditional correlations between stocks. Being independent of the downstream tasks, these methods cannot learn network structures that are meaningful for a specific task, and they are not easy to incorporate into GNNs to take advantage of the adaptability of deep neural networks (DNNs).

Recently, DNN-based methods have been proposed to learn the dependencies of time series [22], [23], [24]. These methods follow a similar principle: learning a global node embedding matrix by incorporating a graph learning module into the task-specific neural networks. Such a matrix can measure nodes' similarities via pairwise inner products. SDGNN [25] extended this idea by concatenating the time series features with the initial node embeddings to infer time-varying node relations. HATS [13] and FinGAT [26] used a graph attention network (GAT) [18] to derive latent and dynamic dependencies of time series by transforming the learned attention matrices. However, these methods cannot model the stock graph's evolutionary characteristics. In addition, the above graph learning methods have quadratic space complexity in the number of time series because the learned relation matrix needs to be explicitly stored in graphics processing unit (GPU) memory. These methods cannot scale to a large-scale financial market with thousands of securities.
III. PROBLEM FORMULATION

Considering the stochasticity and volatility of the financial market, predicting the price movement is more attainable than forecasting the exact price. Thus, we target predicting the stock price movement in this article, a classification task in machine learning. Meanwhile, we assume that the set of investigated stocks is fixed and belongs to the same financial market, implying that the stocks usually have strong interactions [5]. Hence, we can model the stocks with discrete-time dynamic graphs.

Given a stock set $S$ that contains $n$ stocks, the features of any stock $s \in S = \{1, 2, \ldots, n\}$ at time $t$ are represented as $\mathbf{x}^s_t = [x^s_{1,t}, x^s_{2,t}, \ldots, x^s_{d,t}] \in \mathbb{R}^d$, where $d$ is the number of features. These features could include those related to the stock price (e.g., open and close prices), trading activities (e.g., trading volume), as well as other technical indicators (e.g., moving average). For all stocks in $S$, their features at time $t$ can be denoted as $\mathbf{X}_t = [\mathbf{x}^1_t, \mathbf{x}^2_t, \ldots, \mathbf{x}^n_t]^T \in \mathbb{R}^{n \times d}$. We define the next $\tau$-step price movement as

$$y_s(t+\tau) = \begin{cases} 0, & p^s_{t+\tau} \le p^s_t \\ 1, & p^s_{t+\tau} > p^s_t \end{cases} \qquad (1)$$

where $p^s_t$ denotes the close price of stock $s$ at time $t$. Our target is to find a function $f_\theta$ to predict the next $\tau$-step price movement based on the past $T$ steps (lag size) of historical data

$$\mathbf{y}(t+\tau) = f_\theta\left(\mathbf{X}_t, \mathbf{X}_{t-1}, \ldots, \mathbf{X}_{t-T+1}\right) \qquad (2)$$

where $\mathbf{y}(t+\tau) = [\text{1hot}(y_1(t+\tau)), \ldots, \text{1hot}(y_n(t+\tau))]^T \in \mathbb{R}^{n \times 2}$ and $\text{1hot}(\cdot)$ encodes the binary value to a one-hot vector. In this article, we investigate the stock dependencies on a long-term time scale. Each time step represents one month.
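To make (1) and (2) concrete, the following sketch (ours, not the authors' released code; array names are hypothetical) builds the movement labels and one-hot targets from a matrix of monthly close prices:

```python
import numpy as np

def movement_labels(close: np.ndarray, tau: int) -> np.ndarray:
    """Labels per (1): y_s(t + tau) = 1 if p^s_{t+tau} > p^s_t, else 0.

    close: (T_total, n) monthly close prices of n stocks.
    Returns: (T_total - tau, n) array of {0, 1} labels aligned with time t.
    """
    return (close[tau:] > close[:-tau]).astype(np.int64)

def one_hot(y: np.ndarray) -> np.ndarray:
    """1hot(.) in (2): map binary labels to one-hot rows of shape (..., 2)."""
    return np.eye(2, dtype=np.float32)[y]
```

For example, `one_hot(movement_labels(close, tau=3))` yields, for each time step, the $n \times 2$ target matrix used in (2).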
IV. PROPOSED METHOD

In this section, we first present the overall architecture of the GERU system, which is shown in Fig. 1. The proposed GERU contains two main components: the ADGL module and the GGRU. The vanilla ADGL explicitly learns stock dependencies from the raw stock data at each time point by introducing a multihead sparse self-attention network. To reduce the space and time complexity of the vanilla ADGL, we devise a clu-ADGL. In the GGRU, the linear transformation layers are replaced with GNNs to encode the stock representations. Finally, we obtain a generic method called GERU by combining the ADGL and the GGRU to model the dynamic evolutions of the learned stock dependencies and stock time series.

Fig. 1. Schematic illustration of the proposed predictive system. For simplicity, we only present the GERU at time t and t + 1. The input of the GERU at time t is the feature of all stocks. The ADGL and clu-ADGL explicitly learn a stock graph A_t according to the current input data and the hidden state from the last time step. The GGRU encodes the representations E_t of the nodes concerning topology and node attribute changes in dynamic graphs. The detailed architectures of the ADGL/clu-ADGL and the GGRU can be found in Figs. 2 and 5, respectively.
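At the level described here, the GGRU is a gated recurrent unit whose linear transformation layers are replaced with GNNs over the learned graph, in the spirit of [25] and [42]. The following is a minimal sketch of that idea; the gating and layer choices are our assumptions for illustration, not the paper's exact cell:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class GraphGRUCell(nn.Module):
    """Sketch of a GRU cell with each linear map replaced by a graph
    convolution (cf. the GGRU description above and [25], [42])."""

    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        # One graph convolution per gate, applied to [x_t ; h_{t-1}].
        self.conv_z = GCNConv(in_dim + hidden_dim, hidden_dim)
        self.conv_r = GCNConv(in_dim + hidden_dim, hidden_dim)
        self.conv_h = GCNConv(in_dim + hidden_dim, hidden_dim)

    def forward(self, x, h, edge_index, edge_weight=None):
        # x: (n, in_dim) stock features; h: (n, hidden_dim) previous states.
        xh = torch.cat([x, h], dim=-1)
        z = torch.sigmoid(self.conv_z(xh, edge_index, edge_weight))  # update gate
        r = torch.sigmoid(self.conv_r(xh, edge_index, edge_weight))  # reset gate
        xh_r = torch.cat([x, r * h], dim=-1)
        h_tilde = torch.tanh(self.conv_h(xh_r, edge_index, edge_weight))  # candidate
        return (1.0 - z) * h + z * h_tilde  # rows play the role of E_t in Fig. 1
```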
A. Adaptive Dynamic Graph Learning

To capture the underlying dependencies between stock time series, the ADGL directly learns dynamic similarities of the stock feature representations derived from the current stock feature embeddings and the historical stock embeddings (see Fig. 2). Because our method can adapt to different market situations and constantly capture the dependencies of stocks, we call it ADGL.

Fig. 2. The ADGL is a modified multihead self-attention network consisting of H independent self-attention layers running in parallel and a sparsification function transforming the learned dense relation matrices into sparse matrices. The clu-ADGL exploits LSH-based K-means clustering and replaces the scaled dot product and sparsification with a top-k scaled dot product (see Fig. 3). The right part represents the FFN.

1) Multihead Self-Attention: Self-attention is an attention mechanism that directs the model's focus and makes it learn the correlation between any two elements in the same input sequence [27]. In broad terms, self-attention is in charge of managing and quantifying the interdependence within the input elements. Based on this idea, we regard the stocks in a financial market as a node sequence $S$. Each node (stock) $s \in S$ has its own raw features $\mathbf{x}^s_t$ at time $t$. Before being input into the ADGL, $\mathbf{x}^s_t$ is transformed into a latent space whose dimension equals the model dimension $d_m$ by a linear transformation $\tilde{\mathbf{x}}^s_t = \mathbf{W}^T_X \mathbf{x}^s_t$, where $\mathbf{W}_X \in \mathbb{R}^{d \times d_m}$.

The self-attention function can be described as mapping the node embeddings into a set of query vectors and key vectors. Using linear transformations, self-attention encodes the concatenation of $\mathbf{H}_{t-1}$ and $\tilde{\mathbf{X}}_t$ as a set of keys $\mathbf{K}_t \in \mathbb{R}^{n \times d_m}$ and queries $\mathbf{Q}_t \in \mathbb{R}^{n \times d_m}$

$$\mathbf{K}_t = \left[\mathbf{H}_{t-1}; \tilde{\mathbf{X}}_t\right]\mathbf{W}_K, \qquad \mathbf{Q}_t = \left[\mathbf{H}_{t-1}; \tilde{\mathbf{X}}_t\right]\mathbf{W}_Q \qquad (3)$$

where $\tilde{\mathbf{X}}_t = \mathbf{X}_t\mathbf{W}_X$, $\mathbf{H}_{t-1} = [\mathbf{h}^1_{t-1}, \mathbf{h}^2_{t-1}, \ldots, \mathbf{h}^n_{t-1}]^T \in \mathbb{R}^{n \times d_m}$ is the matrix of historical hidden states at time step $t-1$, $d_m$ is the dimension of $\mathbf{h}^s_t$, $[\cdot;\cdot]$ denotes the concatenation operator, and $\mathbf{W}_K, \mathbf{W}_Q \in \mathbb{R}^{2d_m \times d_m}$ are learnable parameters. Intuitively, the stock graphs at consecutive time steps should not vary too much, i.e., the current stock graph evolves from its predecessor. This implies that the ADGL should know the graph information from the last time step while learning the stock graph at the current time step. The hidden state $\mathbf{H}_{t-1}$ represents the dynamic node representations containing the locally topological information and node features of the stock graphs before time $t$, and concatenating $\mathbf{H}_{t-1}$ with $\tilde{\mathbf{X}}_t$ as the input allows the ADGL to generate more consistent stock graphs from prior knowledge.

The dynamic relation matrix $\mathbf{R}_t$ between all stocks at time $t$ is computed by a compatibility function (i.e., the scaled dot-product attention [27] in (4) and Fig. 2) of the query with the corresponding key

$$\mathbf{R}_t = \frac{\mathbf{Q}_t\mathbf{K}^T_t}{\sqrt{d_m}}. \qquad (4)$$

As suggested by [27], we also found it beneficial to apply the self-attention layer $H$ times with independently learned parameters $\mathbf{W}_K$ and $\mathbf{W}_Q$ (see Fig. 2). For the $h$th head, the dynamic relation matrix $\mathbf{R}^h_t$ can be formulated as

$$\mathbf{R}^h_t = \frac{\mathbf{Q}^h_t \left(\mathbf{K}^h_t\right)^T}{\sqrt{d_m}} \qquad (5)$$

where

$$\mathbf{K}^h_t = \left[\mathbf{H}_{t-1}; \tilde{\mathbf{X}}_t\right]\mathbf{W}^h_K, \qquad \mathbf{Q}^h_t = \left[\mathbf{H}_{t-1}; \tilde{\mathbf{X}}_t\right]\mathbf{W}^h_Q. \qquad (6)$$

A multihead attention mechanism means that the ADGL can jointly attend to information from different representation subspaces [27] at each time step. Intuitively, a multihead attention mechanism can learn richer stock dependencies and thus does not neglect those latent interactions critical for effective decision-making in the financial markets.
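Equations (3)-(6) can be rendered compactly in PyTorch. The following is an illustrative sketch under the paper's formulas, with module and variable names that are ours rather than the released implementation:

```python
import torch
import torch.nn as nn

class ADGLAttention(nn.Module):
    """Sketch of (3)-(6): per-head relation matrices R^h_t from scaled
    dot-product attention over the concatenation [H_{t-1}; X~_t]."""

    def __init__(self, d_in: int, d_m: int, n_heads: int):
        super().__init__()
        self.d_m = d_m
        self.proj_x = nn.Linear(d_in, d_m, bias=False)               # W_X in (3)
        self.W_K = nn.Parameter(torch.empty(n_heads, 2 * d_m, d_m))  # W_K^h in (6)
        self.W_Q = nn.Parameter(torch.empty(n_heads, 2 * d_m, d_m))  # W_Q^h in (6)
        nn.init.xavier_uniform_(self.W_K)
        nn.init.xavier_uniform_(self.W_Q)

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        # x_t: (n, d) raw features X_t; h_prev: (n, d_m) hidden states H_{t-1}.
        z = torch.cat([h_prev, self.proj_x(x_t)], dim=-1)   # [H_{t-1}; X~_t], (n, 2*d_m)
        K = torch.einsum("nd,hdm->hnm", z, self.W_K)        # per-head keys, (H, n, d_m)
        Q = torch.einsum("nd,hdm->hnm", z, self.W_Q)        # per-head queries, (H, n, d_m)
        return Q @ K.transpose(-1, -2) / self.d_m ** 0.5    # stacked R^h_t: (H, n, n)
```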
2) Sparsification: The dynamic relation matrix $\mathbf{R}_t$ can be normalized by the softmax function to generate a weighted adjacency matrix, which defines a dense digraph. However, the dense weights may lead to more diffused attention, i.e., predicting the focal stock could require aggregating information from many neighbor stocks, even if some are not strongly correlated with the focal stock. A sparse financial network with fewer edges can also significantly reduce the computation cost of the GNN component compared to a dense network. Therefore, we propose a sparsification strategy that keeps, for the focal stock, the $k$ dependencies that are most conducive to making accurate predictions.

As shown in Fig. 3, the sparsification strategy's basic idea is straightforward. We explicitly select the $k$ largest elements of each row in $\mathbf{R}^h_t$ and record their positions in the index matrix $\mathbf{IDX}^h_t$

$$\text{idx} = \operatorname{argtopk}\left(\mathbf{R}^h_t[i, :]\right), \qquad \mathbf{IDX}^h_t[i, \text{idx}] = 1, \qquad \mathbf{IDX}^h_t[i, -\text{idx}] = 0 \qquad (7)$$

where $k$ is a hyperparameter, $\operatorname{argtopk}(\cdot)$ returns the indices of the top-$k$ largest values of a vector, and $-\text{idx}$ refers to the indices not in idx. Then, we mask the unchosen elements in $\mathbf{R}^h_t$ with a sufficiently small negative value $N$ (e.g., $-10\mathrm{e}16$). The formal definition can be illustrated as follows:

$$\bar{\mathbf{R}}^h_t = \mathbf{R}^h_t \otimes \mathbf{IDX}^h_t + N\left(\mathbf{1}\mathbf{1}^T - \mathbf{IDX}^h_t\right) \qquad (8)$$

where the operator $\otimes$ is the Hadamard product and $\mathbf{1}$ is an all-one vector. Since the unchosen stock relations are set to a sufficiently small negative value, their weights vanish after the softmax normalization.
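In code, (7) and (8) amount to a row-wise top-k mask followed by the softmax; a sketch (function name ours):

```python
import torch

def sparsify_topk(R: torch.Tensor, k: int, neg: float = -1e16) -> torch.Tensor:
    """Sketch of (7)-(8): keep the k largest entries of each row of the
    per-head relation matrices and mask the rest before softmax.

    R: (H, n, n) dense relation matrices R^h_t; returns normalized weights."""
    idx = R.topk(k, dim=-1).indices                    # argtopk of every row
    mask = torch.zeros_like(R).scatter_(-1, idx, 1.0)  # index matrix IDX^h_t
    masked = R * mask + neg * (1.0 - mask)             # (8): Hadamard product + mask
    return torch.softmax(masked, dim=-1)               # masked links get ~0 weight
```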
TABLE I
Detailed Information of Five Stock Datasets
We removed the stocks that were traded in fewer than 90% of all trading months during the corresponding data collection periods and retained 78, 221, 98, 811, and 1886 stocks on the five datasets. We split each dataset into three parts: 1) 66.7% (48 months) for training (Train.); 2) 8.3% (6 months) for validation (Valid.); and 3) 25% (18 months) for testing (Test.). To comprehensively evaluate our proposed method's performance, we set three prediction tasks with τ = 1, 3, and 6, respectively. The next τ-step price movement (label) of each stock was given by (1). All raw datasets are transformed via z-score normalization according to the training datasets. Our smallest dataset, CSI 100, has 5616 data points, and the largest dataset, Rus 3 K, has 135 792 data points. The dataset scale can display the discriminative powers of the evaluated methods.
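The preprocessing just described can be sketched as follows (shapes, month counts, and function names are our assumptions for illustration):

```python
import numpy as np

def chrono_split_and_normalize(X: np.ndarray, n_train: int = 48, n_valid: int = 6):
    """Chronological split plus z-score normalization fitted on training months only.

    X: (T_total, n, d) raw monthly features for n stocks with d features each.
    """
    mu = X[:n_train].mean(axis=0, keepdims=True)
    sigma = X[:n_train].std(axis=0, keepdims=True) + 1e-8  # guard against zero std
    Xn = (X - mu) / sigma              # normalize all months with training statistics
    return Xn[:n_train], Xn[n_train:n_train + n_valid], Xn[n_train + n_valid:]
```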
B. Experimental Settings

We compared our proposed GERU with representative and state-of-the-art baselines. These baselines can be divided into independent prediction, predefined graph-based, and learning graph-based methods. Considering that the baselines MTGNN [22] and AGCRN [23] both used the GCN as their graph encoder, we also use the GCN as the graph representation learning module in the other graph-based methods except HATS [13] and FinGAT [26], which use the GAT.

1) Independent Prediction Methods: Independent prediction methods include LSTM [34], ALSTM [35], and Adv-LSTM [36], which only utilize each stock's own information to make predictions.

2) Predefined Graph-Based Methods: We predefined three types of interfirm relationships based on financial knowledge: 1) a completely connected stock graph (CSG); 2) a distance-based stock graph (DSG), built by computing the pairwise Euclidean distances between stock prices and selecting the threshold such that every node has at least one edge; and 3) a sector stock graph (SSG) [26] according to the global industry classification standard (GICS). A two-layer GCN model with the ReLU activation function was used on the CSG and DSG; we call these two baselines CSG-GCN and DSG-GCN. For the SSG, we chose two state-of-the-art methods, FinGAT [26] (also called SSG-GAT) and HATS [13], which implicitly infer dynamic graphs. Although FinGAT and HATS are based on predefined static stock graphs, they both exploit the GAT, which can derive time-varying graph attention matrices that can be seen as dynamic and hidden relations of stocks. Therefore, they are also two learning dynamic graph-based models. HAD-GNN [19] is a dynamic GNN method for time-varying stock graphs constructed based on historical stock price co-movement.

3) Learning Graph-Based Methods: MTGNN explicitly learns static unidirected relations among multivariate time series through a graph learning module and utilizes graph convolution and temporal convolution modules to learn useful node representations for predictions [22]. AGCRN uses a similar approach to implicitly infer the static interdependencies between multiple variables and adopts a recurrent graph convolutional network to capture the temporal correlations in multivariate time series [23]. Furthermore, SDGNN [25] extends MTGNN and AGCRN into a learning dynamic graph-based model by concatenating the stock features with the initial stock embeddings to infer time-varying stock relations.

4) Hyperparameters: Following the setting of [37], we deployed the Adam optimizer with beta coefficients 0.9 and 0.999 and set the hidden dimension $d_m$ to 64 for all models. The remaining hyperparameters, including the learning rate, the L2 regularization weight, the lag size T, the number of heads H, and the sparsification indicator k, were selected via grid search within the sets [0.0005, 0.001, 0.005, 0.01], [0, 5e-4, 5e-3], [1, 3, 6, 9, 12], [1, 3, 5, 7, 9], and [1, 3, 5, 10], respectively. For clu-GERU, we search c within [3, 5, 10, 15]. For independent prediction methods, the batch size is set to 128. For graph-based methods, the batch size is set to 18 for S&P 100, CSI 100, and CSI 300, 3 for Rus 1 K, and 1 for Rus 3 K to avoid exceeding GPU memory. For clu-GERU in particular, the batch size is set to 18 for Rus 1 K and 9 for Rus 3 K to speed up training. We use PyTorch Geometric [38] to implement the mini-batch processing of GNNs. All methods are implemented with PyTorch and run on NVIDIA Tesla V100 GPUs. We exploit an early stopping strategy with a patience of 15. The code is available at [Link]CAS/GERU/.
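A rough sketch of the grid search described above (the training harness `train_and_eval` is a hypothetical placeholder, not the authors' code):

```python
import itertools

# Search spaces mirroring the sets listed in the hyperparameter paragraph.
grid = {
    "lr": [0.0005, 0.001, 0.005, 0.01],
    "l2": [0.0, 5e-4, 5e-3],
    "T":  [1, 3, 6, 9, 12],
    "H":  [1, 3, 5, 7, 9],
    "k":  [1, 3, 5, 10],
}

def grid_search(train_and_eval):
    """Exhaustive grid search; train_and_eval(cfg) returns validation accuracy."""
    best_cfg, best_acc = None, -1.0
    for values in itertools.product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        acc = train_and_eval(cfg)   # train with early stopping (patience 15)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc
```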
tp + fp
distance-based stock graph (DSG) is by computing the pair-
wise Euclidean distances between stock prices and selecting where tp, tn, fp, and fn are the number of true positive,
the threshold to make every node has at least one edge; true negative, false positive, and false negative cases, respec-
and 3) a sector stock graph (SSG) [26] according to the tively. Although other measures are also useful to evaluate
global industry classification standard (GICS). A two-layer the performance, we care more about the precision of the
GCN model with ReLU activation function was used on CSG predictions. Intuitively, if our precision is higher, we have a
and DSG. We call these two baselines CSG-GCN and DSG- higher probability of picking out some stocks whose prices
GCN. For the SSG, we chose two state-of-the-art methods will go up. All the experiments in the following section were
FinGAT [26] (also called SSG-GAT) and HATS [13] which repeated ten times.
implicitly infer dynamic graphs. Although the FinGAT and
HATS are based on the predefined static stock graphs, they C. Predictive Performance
both exploit the GAT that can derive time-varying graph We conducted experiments to compare 1-step, 3-step, and 6-
attention matrixes, which can be seen as dynamic and hid- step price movement prediction performance on five real-world
den relations of stocks. Therefore, they are also two learning datasets. The evaluation results are summarized in Tables II–
dynamic graph-based models. HAD-GNN [19] is a dynamic IV. Comparing the independent prediction and learning graph-
GNN method for time-varying stock graphs constructed based based methods, almost all learning graph-based methods out-
on historical stock price co-movement. perform the independent prediction methods throughout five
3) Learning Graph-Based Methods: MTGNN explicitly datasets, which confirms that learning graph-based methods
learns the static uni-directed relations among multivariate time is a promising technology for stock movement predictions.
series through a graph learning module and utilizes the graph From Tables II– IV, we can find that our proposed GERU
convolution and temporal convolution modules to learn useful and clu-GERU consistently outperform all baselines through-
node representation for predictions [22]. AGCRN uses a sim- out five real-world datasets and two different financial markets.
ilar approach to infer the static interdependencies between For instance, compared with the best baselines on 1-step
multiple variables implicitly and adopts a recurrent graph predictions, the GERU’s accuracy improvements are 5.0%,
convolutional network to capture the temporal correlations 2.2%, 4.8%, 4.6%, and 4.9% on the five datasets, respectively.
TABLE II
1-Step Predictive Performance Comparison Among Various Methods

TABLE III
3-Step Predictive Performance Comparison Among Various Methods

TABLE IV
6-Step Predictive Performance Comparison Among Various Methods

GERU also improved the precisions by 6.5%, 9.9%, 11.9%, 5.9%, and 6.0% on the five datasets. Similarly, clu-GERU increased the accuracies by 4.6%, 4.3%, 2.8%, 5.9%, and 6.9% and the precisions by 3.2%, 9.4%, 4.5%, 8.3%, and 7.9% on the five datasets, respectively. Such significant performance improvements indicate the superiority of GERU and clu-GERU for financial market predictions. In addition, we observed that clu-GERU tends to outperform GERU on the large datasets, including Rus 1 K and Rus 3 K. A possible reason is that GERU with the vanilla ADGL needs more GPU memory, so we can only use a small batch size to train the model.

Furthermore, the learning graph-based methods achieved better results than the predefined graph-based methods in terms of accuracy and precision. This can be attributed to the flexibility of learning latent stock relations. The dynamic graph-based models, including HAD-GNN, FinGAT, HATS, and SDGNN, slightly outperform the static models, which is consistent with the stock market characteristics. However, our proposed GERU and clu-GERU exhibit significant improvements over the dynamic graph-based methods. We observe relative accuracy improvements of approximately 5% on the CSI 100, Rus 1 K, and Rus 3 K datasets. These promising results, when compared to HAD-GNN, demonstrate the effectiveness of adaptively learning dynamic interactions between stocks using the ADGL/clu-ADGL module. Furthermore, the superiority of GERU over the other three dynamic methods validates the benefits of modeling topology and node attribute evolutions, as opposed to independently learning the dynamic graphs.

D. Model Analyses

This section conducts more detailed model analyses, including an ablation study, robustness analysis, scalability analysis, and dynamic temporal stock graph analysis, to take a closer look at the proposed GERU/clu-GERU.
TABLE V
Prediction Accuracy Comparison Between GERU and Different Combinational Methods on 1-Step Stock Movement Predictions

Fig. 6. Sensitivities of the GERU and the clu-GERU with respect to the sparsification indicator k on the validation and testing sets of three datasets. (a) S&P 100. (b) CSI 100. (c) CSI 300.
TABLE VI
Average Density of the Learned Stock Graphs Concerning H on Three Datasets When k Is Fixed as 5

Fig. 10. Learned stock graphs (k = 3, c = 5) in the CSI 100 dataset from Oct. 2020 to Dec. 2020 by the (a) GERU (with ADGL) and the (b) clu-GERU (with clu-ADGL), respectively. The node color denotes the stock sector affiliation.

Fig. 9. (a) Time cost and (b) GPU memory comparisons of the dynamic models with different numbers of time series.
The corresponding average densities are reported in Table VI. We can find that the average density remains relatively small as H increases, which demonstrates that summing the multihead adjacency matrices in (10) maintains modest sparsity. The reason is that the learned multihead adjacency matrices have many overlapping edges.

3) Scalability Analysis: Our analysis has shown that the clu-ADGL reduces the ADGL's quadratic complexity to linear complexity. To further investigate the efficiency of the clu-ADGL in practice, we conducted a scalability analysis with synthetic time series data whose shape is identical to that of $\mathbf{x}^s_t$. Given the same computing environment (11 GB of GPU memory), we ran HAD-GNN, SDGNN, GERU, and clu-GERU with the number of time series increasing from 0.5 K to 64 K. We divided the overall time cost and GPU memory by the batch size, the number of time series, and T to calculate the time cost and GPU memory per time series. As shown in Fig. 9, for the first three methods, the computational time and memory per series scale linearly with the number of time series, and they run out of memory when the number of time series is larger than 8 K. In contrast, with the number of time series increasing from 0.5 K to 64 K, clu-GERU shows approximately constant computation time and memory per series, which indicates that clu-GERU has better scalability than the other dynamic graph-based methods in practice.

Fig. 11. Influence power of each stock evolving along the time axis. The horizontal axis represents time, and the vertical axis denotes the individual stock index. (a) CSI 100. (b) CSI 300. (c) S&P 100.

4) Learned Stock Graphs: We first visualized the learned dynamic graphs in Fig. 10 to show the evolution of the stock market. Macroscopically, these stock graphs have similar structures and dominant stock nodes but moderate local link changes for GERU and clu-GERU at successive time steps. Moreover, the clu-GERU's graphs exhibit clearer communities because the clu-ADGL explicitly discovers the clusters with LSH-based K-means. To display the overall evolution of the learned dynamic graphs, we calculated the outgoing-edge weight sum of each stock by summing along $\mathbf{A}_t$'s columns. A larger weight sum means that the stock can provide more information for its neighbors' price movement predictions. The weights along the time axis are shown in Fig. 11, where each element represents the influence of one stock on its neighbor stocks at a specific time point.

From Fig. 11, we can observe an interesting finding: the evolution patterns of influence power in the Chinese and American stock markets are very different. In the CSI 100 and CSI 300 datasets, many influential stocks successively survive for several months and eventually decline and perish. By summing the weights of the stocks affiliated with the same sector according to the GICS and normalizing them, we obtain the sector influence evolving along the time axis on CSI 100 in Fig. 12. We can observe a significant sector rotation effect in which the influential sectors transfer from some sectors to others over time.
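The influence statistic behind Fig. 11 and the sector aggregation behind Fig. 12 reduce to simple reductions over the learned adjacency matrices. A sketch (ours; the column-sum orientation follows the text above):

```python
import torch

def influence_power(A_t: torch.Tensor) -> torch.Tensor:
    """Outgoing-edge weight sum of each stock, obtained by summing along
    A_t's columns. A_t: (n, n) learned stock graph at time t."""
    return A_t.sum(dim=0)

def sector_influence(A_t: torch.Tensor, sector_ids: torch.Tensor) -> torch.Tensor:
    """Sum stock influence within each GICS sector and normalize (cf. Fig. 12).

    sector_ids: (n,) long tensor of sector indices per stock."""
    w = influence_power(A_t)
    n_sectors = int(sector_ids.max().item()) + 1
    s = torch.zeros(n_sectors).index_add_(0, sector_ids, w)  # per-sector totals
    return s / s.sum()                                       # normalize to shares
```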
[11] Y. Chen, Z. Wei, and X. Huang, "Incorporating corporation relationship via graph convolutional neural networks for stock price prediction," in Proc. 27th ACM Int. Conf. Inf. Knowl. Manag., Turin, Italy, Oct. 2018, pp. 1655–1658.
[12] F. Feng, X. He, X. Wang, C. Luo, Y. Liu, and T.-S. Chua, "Temporal relational ranking for stock prediction," ACM Trans. Inf. Syst., vol. 37, no. 2, pp. 1–30, 2019, doi: 10.1145/3309547.
[13] R. Kim, C. H. So, M. Jeong, S. Lee, J. Kim, and J. Kang, "HATS: A hierarchical graph attention network for stock movement prediction," 2019, arXiv:1908.07999.
[14] J. Ye, J. Zhao, K. Ye, and C. Xu, "Multi-graph convolutional network for relationship-driven stock movement prediction," in Proc. 25th Int. Conf. Pattern Recognit., Milan, Italy, Jan. 2021, pp. 6702–6709, doi: 10.1109/ICPR48806.2021.9412695.
[15] G. G. Creamer, Y. Ren, and J. V. Nickerson, "Impact of dynamic corporate news networks on asset return and volatility," in Proc. Int. Conf. Soc. Comput., 2013, pp. 809–814, doi: 10.1109/SocialCom.2013.121.
[16] X.-Q. Sun, H.-W. Shen, and X.-Q. Cheng, "Trading network predicts stock price," Sci. Rep., vol. 4, no. 1, pp. 1–6, 2014, doi: 10.1038/srep03711.
[17] H. Cao, T. Lin, Y. Li, and H. Zhang, "Stock price pattern prediction based on complex network and machine learning," Complexity, vol. 2019, May 2019, Art. no. 4132485, doi: 10.1155/2019/4132485.
[18] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph attention networks," in Proc. Int. Conf. Learn. Represent., 2018, pp. 1–12.
[19] H. Tian, X. Zheng, K. Zhao, M. W. Liu, and D. D. Zeng, "Inductive representation learning on dynamic stock co-movement graphs for stock predictions," INFORMS J. Comput., vol. 34, no. 4, pp. 1940–1957, 2022, doi: 10.1287/ijoc.2022.1172.
[20] J. V. d. M. Cardoso and D. P. Palomar, "Learning undirected graphs in financial markets," in Proc. 54th Asilomar Conf. Signals Syst. Comput., 2020, pp. 741–745, doi: 10.1109/IEEECONF51394.2020.9443573.
[21] D. Marinazzo, M. Pellicoro, and S. Stramaglia, "Kernel-Granger causality and the analysis of dynamical networks," Phys. Rev. E, vol. 77, no. 5, 2008, Art. no. 056215, doi: 10.1103/PhysRevE.77.056215.
[22] Z. Wu, S. Pan, G. Long, J. Jiang, X. Chang, and C. Zhang, "Connecting the dots: Multivariate time series forecasting with graph neural networks," in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., Aug. 2020, pp. 753–763.
[23] L. Bai, L. Yao, C. Li, X. Wang, and C. Wang, "Adaptive graph convolutional recurrent network for traffic forecasting," in Proc. Adv. Neural Inf. Process. Syst., vol. 33, Dec. 2020, pp. 17804–17815.
[24] A. Deng and B. Hooi, "Graph neural network-based anomaly detection in multivariate time series," in Proc. AAAI Conf. Artif. Intell., vol. 35, Feb. 2021, pp. 4027–4035.
[25] Y. He, Q. Li, F. Wu, and J. Gao, "Static-dynamic graph neural network for stock recommendation," in Proc. 34th Int. Conf. Sci. Stat. Database Manag., Copenhagen, Denmark, Jul. 2022, pp. 1–4.
[26] Y.-L. Hsu, Y.-C. Tsai, and C.-T. Li, "FinGAT: Financial graph attention networks for recommending top-K profitable stocks," IEEE Trans. Knowl. Data Eng., vol. 35, no. 1, pp. 469–481, Jan. 2023, doi: 10.1109/TKDE.2021.3079496.
[27] A. Vaswani et al., "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst., vol. 30, Dec. 2017, pp. 6000–6010.
[28] M. Geva, R. Schuster, J. Berant, and O. Levy, "Transformer feed-forward layers are key-value memories," 2020, arXiv:2012.14913.
[29] J. L. Ba, J. R. Kiros, and G. E. Hinton, "Layer normalization," 2016, arXiv:1607.06450.
[30] A. Roy, M. Saffar, A. Vaswani, and D. Grangier, "Efficient content-based sparse attention with routing transformers," Trans. Assoc. Comput. Linguist., vol. 9, pp. 53–68, 2021, doi: 10.1162/tacl_a_00353.
[31] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, "An efficient k-means clustering algorithm: Analysis and implementation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 881–892, Jul. 2002.
[32] A. Shrivastava and P. Li, "Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS)," in Proc. 27th Int. Conf. Adv. Neural Inf. Process. Syst., Montreal, QC, Canada, Dec. 2014, pp. 2321–2329.
[33] A. Hasanzadeh, E. Hajiramezanali, K. Narayanan, N. Duffield, M. Zhou, and X. Qian, "Variational graph recurrent neural networks," in Proc. 32nd Int. Conf. Adv. Neural Inf. Process. Syst., Vancouver, BC, Canada, Dec. 2019, pp. 10701–10711.
[34] D. M. Q. Nelson, A. C. M. Pereira, and R. A. de Oliveira, "Stock market's price movement prediction with LSTM neural networks," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), May 2017, pp. 1419–1426, doi: 10.1109/IJCNN.2017.7966019.
[35] Y. Qin, D. Song, H. Cheng, W. Cheng, G. Jiang, and G. W. Cottrell, "A dual-stage attention-based recurrent neural network for time series prediction," in Proc. 26th Int. Joint Conf. Artif. Intell., Aug. 2017, pp. 2627–2633, doi: 10.24963/ijcai.2017/366.
[36] F. Feng, H. Chen, X. He, J. Ding, M. Sun, and T.-S. Chua, "Enhancing stock movement prediction with adversarial training," in Proc. 28th Int. Joint Conf. Artif. Intell., Aug. 2019, pp. 5843–5849, doi: 10.24963/ijcai.2019/810.
[37] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. Int. Conf. Learn. Represent., 2017, pp. 1–10.
[38] M. Fey and J. E. Lenssen, "Fast graph representation learning with PyTorch Geometric," 2019, arXiv:1903.02428.
[39] S. Kundu and N. Sarkar, "Return and volatility interdependences in up and down markets across developed and emerging countries," Res. Int. Bus. Finance, vol. 36, pp. 297–311, Jan. 2016, doi: 10.1016/[Link].2015.09.023.
[40] W. Wang, "Research on consumption upgrading and retail innovation development based on mobile Internet technology," J. Phys. Conf. Ser., vol. 1176, Mar. 2019, Art. no. 042070, doi: 10.1088/1742-6596/1176/4/042070.
[41] Q. Li, J. Tan, J. Wang, and H. C. Chen, "A multimodal event-driven LSTM model for stock prediction using online news," IEEE Trans. Knowl. Data Eng., vol. 33, no. 10, pp. 3323–3337, Oct. 2021, doi: 10.1109/TKDE.2020.2968894.
[42] Y. Li, R. Yu, C. Shahabi, and Y. Liu, "Diffusion convolutional recurrent neural network: Data-driven traffic forecasting," in Proc. Int. Conf. Learn. Represent., 2018, pp. 1–10.

Hu Tian received the B.S. degree in mathematics from the School of Mathematics and Physics, China University of Geosciences, Wuhan, China, in 2018. He is currently pursuing the Ph.D. degree in social computing with the Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include machine learning and intelligent systems, particularly machine learning techniques and applications such as graph representation learning, FinTech, and social computing.

Xingwei Zhang received the B.S. degree in electronic engineering from the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China, in 2014, and the Ph.D. degree in computer application technology from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2023. He is currently an Assistant Research Professor with the Institute of Automation, Chinese Academy of Sciences. His research interests include artificial intelligence, Internet of Things, information security, and cross-modal retrieval.
Xiaolong Zheng (Member, IEEE) received the B.S. degree from China Jiliang University, Hangzhou, China, in 2003, the M.S. degree from Beijing Jiaotong University, Beijing, China, in 2006, and the Ph.D. degree in social computing from the Institute of Automation, Chinese Academy of Sciences, Beijing, in 2009. He is currently a Research Professor with the Institute of Automation, Chinese Academy of Sciences. His research interests include artificial intelligence, complex networks, security informatics, social computing, and recommender systems.

Daniel Dajun Zeng (Fellow, IEEE) received the B.S. degree in economics and operations research from the University of Science and Technology of China, Hefei, China, in 1990, and the M.S. and Ph.D. degrees in industrial administration from Carnegie Mellon University, Pittsburgh, PA, USA, in 1994 and 1998, respectively. He is currently a Research Professor with the Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include intelligence and security informatics, infectious disease informatics, social computing, recommendation systems, software agents, and applied operations research and game theory.