Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys
Article info

Article history:
Received 13 July 2020
Received in revised form 11 October 2020
Accepted 1 January 2021
Available online 9 January 2021

Keywords:
Criminal Network Analysis
Social Network Analysis
Graph Neural Network
Spatial–Temporal Graph Neural Network
Attention network

Abstract

Social Network Analysis (SNA) has been a popular field of research since the early 1990s. Law enforcement agencies have been utilizing it as a tool for intelligence gathering and criminal investigation for decades. However, the graph nature of social networks largely restricts intelligence analysis to tasks such as role prediction (node classification), social relation inference (link prediction), and criminal group discovery (community detection). In the past few years, many studies have focused on Graph Neural Networks (GNNs), which utilize deep learning methods to solve graph-related problems. However, GNNs have rarely been applied to time-evolving social network problems, especially in the criminology field; existing studies have commonly overlooked the temporal-evolution characteristics of social networks. In this paper, we propose a graph neural network framework, namely the Spatial–Temporal Graph Social Network (STGSN), which models social networks from both spatial and temporal perspectives. Using a novel approach, we leverage the temporal attention mechanism to capture social networks' temporal features. We design a method that analyzes the temporal attention distribution to improve the interpretability of our method. In the end, we conduct extensive experiments on six public datasets to demonstrate our method's effectiveness.

© 2021 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.knosys.2021.106746
S. Min, Z. Gao, J. Peng et al. Knowledge-Based Systems 214 (2021) 106746
1.2. Deep learning and graph neural network

Deep learning has boosted research in many fields in the last decades, such as image classification, video processing, speech recognition, natural language processing, etc. In particular, researchers have successfully utilized Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to model spatial features and temporal dynamics, respectively. However, data like images, audio, and videos come in the form of pixels or chronologically ordered frames, while graph data is generated from the non-Euclidean domain and has no specific order. Intuitively, a graph may have a variable number of nodes with an indeterminate number of edges. Therefore, operations like convolutions are straightforward on images but complicated on graphs.

Recently, inspired by methods such as CNNs and RNNs, many studies have focused on extending deep learning methodologies to graph-related problems, under the name of Graph Neural Networks (GNNs). The comprehensive survey in [7] categorizes GNNs into four groups: Recurrent Graph Neural Networks (RecGNNs), Convolutional Graph Neural Networks (ConvGNNs), Graph Auto-Encoders (GAEs), and Spatial–Temporal Graph Neural Networks (STGNNs). Early studies mainly fell into the RecGNNs group, which learns a node's representation by propagating neighbor information iteratively. Likewise, motivated by the success of CNNs, research efforts have been primarily put into ConvGNNs, which cast the notion of convolution into neural networks for graph data. On the other hand, GAEs encode the graph features into low-dimensional representations from which they decode to recover the graph information. Meanwhile, STGNNs aim to model both the spatial and the temporal dynamics of the graph.

1.3. Deep learning for social network analysis

Many studies have been conducted using deep learning frameworks to tackle SNA-related problems in recent years. The central problem in analyzing social networks is how to encode the network data into low-dimensional representations (vectors) that preserve the network structure and information effectively, so that downstream machine learning models can analyze social networks further. It has been demonstrated that deep learning methodology has potent capabilities for this crucial problem, and many successful methods have brought SNA to the next level, such as GCN [8], CTDNE [9], GraphSAGE [10], and GAT [11]. In study [12], the authors conduct a comprehensive review of the current studies utilizing deep learning models for social networks. It categorizes real-world social networks as homogeneous, heterogeneous, attributed, and dynamic. Our study mainly looks into dynamic social networks, which evolve over time with frequent addition/deletion of nodes and links. Although some existing studies have looked into learning dynamic networks, we utilize the concept of STGNNs together with the attention mechanism in a novel way to model time-evolving social networks for better performance.

STGNNs consider both spatial and temporal dynamics when modeling the graph, while other GNNs mainly focus on modeling the spatial structure of networks. Many methods combine graph convolutions, which capture the spatial dependency, with RNNs or CNNs that model the temporal dependence. Recent studies have put a lot of effort into STGNNs, for example for traffic forecasting [13–17], driver maneuver anticipation [18], and action recognition [19]. Since the emerging STGNN methods have mostly looked into computer vision problems, there has been very little research utilizing the concept of STGNNs for social networks.

From the spatial perspective, unlike the fixed-size images and grids that CNNs typically deal with, graphs have a more complex topological structure with no spatial locality like grids. Most recently, after some pioneering work on RecGNNs, ConvGNNs have become the most popular approach thanks to their efficiency and convenience. ConvGNNs fall into two categories: spectral-based and spatial-based. Spectral-based methods take the path of graph signal processing for graph convolution; typical methods are the Spectral Convolutional Neural Network (Spectral CNN) [20], Chebyshev Spectral CNN (ChebNet) [21], Graph Convolutional Network (GCN) [8], etc. On the other hand, the spatial-based approach is similar to the convolutional operation of a conventional CNN on images. It mainly focuses on the spatial relations in networks; examples include GraphSage [10], GAT [11], GaAN [17], GeniePath [22], DeepMGGE [23], etc.

In this study, the main aspect that we look into is the temporal dependency in social networks. Existing methods usually take RNN-based approaches to capture temporal dynamics. However, recurrent neural architectures follow a sequential path from past units to current ones, which leads to time-consuming iterative training and gradient explosion/vanishing issues. Attention, one of the most influential concepts in recent deep learning research, was initially invented to solve these sequential problems with much better performance. The study in [24] first introduced an attention mechanism to memorize long sentences in neural machine translation. Attention then rapidly grew and was widely adopted in various fields. Previous studies include image caption generation [25], global/local attention [26], multi-head attention [27], the Simple Neural Attentive Learner (SNAIL) [28], the Self-Attention Generative Adversarial Network (SAGAN) [29], and the Graph Attention Model (GAM) [30].

In the criminology field, social networks can have a complex topological structure and also evolve. We cannot make any assumptions about which periods are more relevant for predictions. We have observed group interactions (behaviors) with various kinds of temporal patterns. As shown in Fig. 1, we demonstrate three examples in a meetup network: some may have a linear development trend, as shown in Cases 1 and 2, while others may exhibit a significant seasonality pattern, such as that shown in Case 3. With the temporal aspects of the network ignored, all three cases would produce similar prediction results even though they are entirely different. In our work, we focus our study on such graphs, as they have both spatial and temporal characteristics. Inspired by the recent work on STGNNs, we aim to devise a spatial–temporal deep learning framework for time-evolving social networks. For spatial convolutions, we adopt the graph embedding approach over spectral methods for two main reasons. Firstly, we need to incorporate node features, as nodes in social networks have various attributes. Secondly, we deal with social networks that evolve, which means that the number of nodes and edges changes dynamically over time. One of our primary goals is to capture the temporal evolution and make it beneficial for subsequent tasks. Regarding the temporal dependency, we introduce the attention mechanism in a novel way to model the temporal dynamics and design an approach to make the model more interpretable. Overall, our method captures both the spatial and the temporal aspects of time-evolving social networks.

In summary, our main contributions are as follows:

1. We propose a novel Spatial–Temporal Graph Neural Network framework specifically for social network modeling, called STGSN. It models the spatial and temporal features of time-evolving social networks, which can be particularly useful for criminal network analysis.
2. To the best of our knowledge, our method is the first attempt to leverage attention mechanisms on graph embeddings to make the framework capable of modeling the temporal characteristics of social networks.
Fig. 1. Temporal Characteristics in Criminal Networks (Meetup). In the following examples, we denote the target user, gang members, and ordinary users in yellow, red, and white, respectively. (1) Case 1: The meetups between the central user and gang members decrease over time and vanish in the end, which shows less group involvement and less potential danger. (2) Case 2: The meetups between the central user and gang members increase monthly, which is a strong signal that the user in question is joining the group as a new member. (3) Case 3: The target user contacts other gang members in a seasonal fashion, which indicates a high probability that he/she holds a specific role or position in the group.
3. We discuss the strong expressive ability of the proposed method in depth by creating five temporal attention distribution categories and analyzing how they affect the downstream predictions.
4. We conduct extensive experiments on six public datasets with a series of prediction tasks. The results show that the model outperforms state-of-the-art baselines significantly.

2. Related work

2.1. Graph neural network for social network analysis

DeepInf [31] is an end-to-end unified framework designed by Qiu et al., which utilizes graph convolution and attention mechanisms to incorporate user-specific features and network structures for predicting social influence. Zhang et al. propose a framework, SEAL [32], for link prediction. For each target link, SEAL extracts a local enclosing subgraph and uses a GNN to learn general graph structure features. Wang et al. propose an embedding model, namely Multiple Conditional Network Embedding (MCNE) [33]. Combined with a GNN based on the message-passing/receiving mechanism, the model introduces a binary mask, followed by an attention network, to model the correlations among multiple preferences. Liu et al. [34] point out that a single vector representation is not enough for network embedding — for instance, on an online shopping website a customer may have bought items of disparate genres. Existing embedding techniques tend to fuse different aspects of a node into only a single vector representation, which can be problematic. That study proposes a polysemous embedding approach for modeling multiple aspects of nodes. Ioannidis et al. [35] also present a Graph Recurrent Neural Network (GRNN) where nodes engage in multi-relational scenarios. In study [36], motivated by the concept of GCNs, the authors devise a method, called RCNN, to identify the critical nodes (super-spreaders) in complex networks based on their message-spreading ability.

2.2. Spatial–temporal graph neural network

Many studies have extensively applied the concept of STGNNs in the field of traffic forecasting. Li et al. [13] devise a deep learning framework called the Diffusion Convolutional Recurrent Neural Network (DCRNN) for traffic forecasting that combines the spatial and temporal dependencies in traffic flow. The proposed model utilizes bidirectional random walks to capture the spatial correlation and an encoder–decoder architecture to seize the temporal dependence. Guo et al. [14] propose a novel attention-based spatial–temporal graph convolutional network (ASTGCN) for traffic forecasting problems. In that study, the attention mechanism aims to select the information that is most critical to the current task from both spatial and temporal perspectives. In study [15], Yu et al. propose a deep learning framework for traffic forecasting, called Spatial–Temporal Graph Convolutional Networks (STGCN), which integrates graph convolution layers and gated temporal convolution layers through spatial–temporal convolutional blocks. GaAN, proposed in study [17], unlike the existing multi-head attention mechanism, uses a convolutional sub-network to control each attention head's importance. The researchers build a Graph Gated Recurrent Unit (GGRU) using GaAN as the building block to address the traffic speed forecasting problem. In study [9], researchers propose two algorithms, Continuous-Time Dynamic Network Embeddings (CTDNE) and TemporalWalk, which leverage a random-walk strategy to incorporate temporal information into network embedding methods.

Motivated by the success in traffic forecasting, STGNNs have developed rapidly in other fields as well. In study [18], for driver maneuver anticipation, Jain et al. propose an approach named structural-RNN (S-RNN), which combines the power of spatial–temporal graphs and Recurrent Neural Networks (RNNs) by transforming arbitrary spatial–temporal graphs into a mixture of RNNs, and effectively captures the interactions in the underlying
spatial–temporal graphs. Wu et al. [16] devise a novel CNN-based graph neural network architecture, namely Graph WaveNet. Motivated by WaveNet, the model adopts stacked dilated causal convolutions to capture temporal dependencies, which can handle very long sequences. Yan et al. [19] propose a novel Spatial–Temporal Graph Convolutional Networks (ST-GCN) model for skeleton-based action recognition. The model applies spatial and temporal graph convolutions on the skeleton sequences.

Table 1
Summary of Notations.

Symbol          Definition
G               A social network graph
V               Set of nodes
E               Set of edges
X               Feature vector
W, B            Learning parameters
N(i)            The neighbors of node i
h_i^k           The embedding of the ith node within the kth layer
h_i^{t'}        The embedding of the ith node at the historical time step t'
h_i^{total}     The embedding of the ith node (all the time steps as a whole)
α               Attention weights
α^{<total,t'>}  The attention weight (time step t' to total)
e               The coefficient between two different embeddings
e^{<total,t'>}  The coefficient of the time step t' to the total embedding
C_i^t           Temporal attention context
ĥ_i^t           The final embedding of node i up to time t
AGG()           Aggregation function
ATT()           Attention function

Fig. 2. Node Embedding Example. Target node: v1. Layer-0 neighbors: v1, v2, v3, v6, v7. Layer-1 neighbors: v2, v3, v4, v5.

3. Methodology

We propose a deep learning framework for modeling both the spatial and the temporal patterns of time-evolving social networks. The structure consists of three steps. Firstly, we utilize an embedding method to capture each node's spatial features for each time slice. Secondly, we propose an attention-based mechanism to aggregate the temporal memory of the graph so that the model can intentionally pay weighted attention to different historical steps; additionally, we focus on improving the interpretability of the model by looking into the temporal attention distribution. Thirdly, we assemble the above-mentioned neural networks for the downstream social network prediction tasks.

3.1. Framework overview

We first present the process of embedding a node's spatial features. Then, we illustrate Algorithm 1 for building the temporal attention network and discuss its expressive ability. Finally, we demonstrate how to put the pieces together and make the final prediction (see Table 1).

3.2. Graph spatial convolution

We construct a social network graph G = (V, E), where V = {v1, v2, v3, ..., vi} denotes the set of nodes (users) and E = {e1, e2, e3, ..., ei} defines the set of edges (relations) between them. Let X = {x1, x2, ..., xi} be the feature vectors, where xi represents the ith user's features, such as age, origin, hobby, etc.

In our case, the nodes and edges in the social network keep changing as the network evolves. Therefore, instead of modeling the spatial dependency via a Graph Convolutional Network (GCN) [8], we adopt the graph embedding approach, similar to GraphSage [10], to make our framework inductive. We aggregate the neighborhood features into the target node following Eq. (3). h_i^k represents the embedding of the ith node in the kth layer. N(i) is the set that contains the neighbors of node i within K hops. h_j^{k−1} denotes the feature embedding of neighbor j, while h_i^{k−1} denotes the embedding of node i in the (k−1)th layer. W and B are the learning parameters that reflect the importance of the neighbors of node i and of node i itself in the previous layer during aggregation. Initially, when k = 0, h_i^0 equals x_i, the feature vector of node i, as shown in Eq. (1). Then, we aggregate all neighbors' information layer by layer until we reach the target node — node i. Finally, h_i^K is the final embedding of node i, denoted as z_i^K in Eq. (2).

h_i^0 = x_i                                                                    (1)

h_i^K = z_i^K                                                                  (2)

h_i^k = ReLU( W_k · ( Σ_{j∈N(i)} h_j^{k−1} / |N(i)| ) + B_k · h_i^{k−1} ),   ∀k ∈ {1, 2, ..., K}   (3)

To give an illustration, as shown in Fig. 2, we have K = 2, which indicates that we aggregate the information from neighbors within two hops. However, the number of layers can be set dynamically for different applications; in our case, we only consider neighboring nodes within two hops. Taking node v1 as an example, we embed the layer-0 neighbors v1, v2, v3, v6, and v7 into the layer-1 neighbors, which are v2, v3, v4, and v5. Then, we further aggregate the layer-1 neighbors into the target node v1 at layer 2. If we calculate node i's embedding across all the time steps as a whole, we denote it as h_i^{total}. In contrast, if the embedding is only for a specific time step t', we denote it as h_i^{t'}.

3.3. Temporal attention neural network

As shown in Fig. 3, after the aggregation of the spatial features, we build an attention network on top of them to capture the temporal features as the network evolves.

Firstly, we calculate the global embedding of each node, denoted as h_i^{total}. Secondly, we carry out the aggregation at each time step independently. Let h_i^{t'} be the embedding of node i at time t', up to the present time t; h_i^0, h_i^1, ..., h_i^{t−1}, h_i^t represent the embeddings of node i at time steps 0, 1, ..., t−1, t, respectively. Thirdly, we leverage the attention mechanism and calculate the coefficient between the embedding from each time step and the total embedding following Eq. (4), where t is the target time step in question and t' denotes a preceding time step. e_i^{<total,t'>} defines the magnitude of the importance of time step
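To make the spatial step concrete, here is a minimal NumPy sketch (not the authors' code) of the layer-wise aggregation in Eqs. (1)–(3), reading Eq. (3) as a GraphSAGE-style mean aggregator with learned parameters W_k and B_k; the toy graph and all parameter values below are invented for illustration:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def spatial_embed(features, neighbors, W, B, K=2):
    """Layer-wise neighborhood aggregation, Eqs. (1)-(3).

    features  : (n, d) matrix X; row i is x_i, so h_i^0 = x_i (Eq. (1))
    neighbors : dict mapping node i -> list of neighbor ids N(i)
    W, B      : K weight matrices each, the learning parameters
    Returns h^K, i.e. the final embeddings z^K (Eq. (2)).
    """
    h = features.astype(float)                         # h^0 = X
    for k in range(K):
        h_next = np.empty_like(h)
        for i in range(h.shape[0]):
            mean_nbr = h[neighbors[i]].mean(axis=0)    # sum_{j in N(i)} h_j^{k-1} / |N(i)|
            # Eq. (3): combine the neighborhood mean with the node's own embedding
            h_next[i] = relu(W[k] @ mean_nbr + B[k] @ h[i])
        h = h_next
    return h                                           # h^K = z^K

# Toy run on a 7-node graph loosely shaped like Fig. 2 (edges are illustrative)
rng = np.random.default_rng(0)
d = 4
X = rng.normal(size=(7, d))
nbrs = {0: [1, 2, 5, 6], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2], 5: [0], 6: [0]}
W = [0.1 * rng.normal(size=(d, d)) for _ in range(2)]
B = [0.1 * rng.normal(size=(d, d)) for _ in range(2)]
z = spatial_embed(X, nbrs, W, B, K=2)
print(z.shape)  # (7, 4)
```

Running this per time slice yields the per-step embeddings h_i^{t'}; running it over the whole history yields h_i^{total}, which the temporal attention layer then consumes.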
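The temporal attention step of Section 3.3 can be sketched similarly. Since Eq. (4) itself falls outside this excerpt, the scoring function below (a softmax-normalized dot product between each step embedding h_i^{t'} and the total embedding h_i^{total}) is an assumption; the names e, α, and C_i^t follow Table 1:

```python
import numpy as np

def temporal_attention(h_steps, h_total):
    """Attend over the per-time-step embeddings of one node.

    h_steps : (T, d) array of h_i^{t'} for t' = 0 .. T-1
    h_total : (d,)   embedding h_i^{total} over all time steps
    Returns (alpha, context): the attention weights alpha^{<total,t'>}
    and the temporal attention context C_i^t.
    """
    e = h_steps @ h_total                  # e_i^{<total,t'>}: one coefficient per step (assumed dot-product score)
    e = e - e.max()                        # shift for numerical stability
    alpha = np.exp(e) / np.exp(e).sum()    # softmax over the time steps
    context = alpha @ h_steps              # C_i^t: weighted sum of step embeddings
    return alpha, context

rng = np.random.default_rng(1)
T, d = 9, 4                                # e.g. nine monthly slices, as in Ego-Comm
h_steps = rng.normal(size=(T, d))
h_total = h_steps.mean(axis=0)             # stand-in for the global embedding
alpha, context = temporal_attention(h_steps, h_total)
print(alpha.sum())                         # the weights sum to 1
```

The vector alpha is exactly what the paper's interpretability analysis visualizes: one weight per historical time step.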
Table 2
Dataset Statistics.
Ego-Comm InVS13 InVS15 IARadoslawEmail FB-Forum DNC-Email
Nodes 774 100 232 167 899 2029
Edges 13,287 394,247 1,283,194 82,927 33,720 39,264
Average Degrees 34.33 8299.92 11,718.66 993.14 75.02 38.70
Durations 18 months 2 weeks 2 weeks 10 months 25 weeks 18 months
Node ID ✓ ✓ ✓ ✓ ✓ ✓
Node Attributed ✗ ✓ ✓ ✗ ✗ ✗
Edge Attributed ✓ ✓ ✓ ✓ ✓ ✓
Directed ✗ ✗ ✗ ✓ ✗ ✓
Temporal ✓ ✓ ✓ ✓ ✓ ✓
Table 3
MAE and MSE of Algorithm (Regression on Ego-Comm).
Baseline {1 − 9} → 9 {10 − 18} → 18
MAE MSE MAE MSE
GCN 2.316 ± 0.037 8.391 ± 0.392 1.983 ± 0.015 6.222 ± 0.144
CTDNE 2.237 ± 0.117 7.294 ± 0.092 1.921 ± 0.017 6.291 ± 0.068
TemporalWalk 2.195 ± 0.051 7.019 ± 0.027 1.881 ± 0.022 5.760 ± 0.113
GraphSAGE (Mean) 2.234 ± 0.025 7.003 ± 0.241 1.926 ± 0.032 6.581 ± 0.078
GraphSAGE (MaxPool) 2.191 ± 0.100 6.809 ± 0.277 1.737 ± 0.019 5.415 ± 0.047
GraphSAGE (MeanPool) 2.185 ± 0.051 6.735 ± 0.291 1.950 ± 0.022 6.676 ± 0.106
GAT (Single-head) 2.222 ± 0.021 7.599 ± 0.322 1.990 ± 0.034 6.721 ± 0.288
GAT (Multi-head) 2.195 ± 0.048 7.371 ± 0.348 1.789 ± 0.062 5.631 ± 0.271
STGSN (No Attention) 2.559 ± 0.172 9.618 ± 0.393 2.036 ± 0.073 7.451 ± 0.311
STGSN (Single-head) 2.169 ± 0.011 6.913 ± 0.271 1.730 ± 0.028 5.302 ± 0.093
STGSN (Multi-head) 2.068 ± 0.021 6.265 ± 0.228 1.770 ± 0.025 5.581 ± 0.089
Table 4
F1(Micro) Score of Algorithm (Classification on InVS13).
Baseline {1 − 6} → 7 {1 − 7} → 8 {1 − 8} → 9 {1 − 9} → 10
F1 (Micro) F1 (Micro) F1 (Micro) F1 (Micro)
GCN 0.416 ± 0.021 0.425 ± 0.039 0.420 ± 0.037 0.438 ± 0.026
CTDNE 0.520 ± 0.012 0.494 ± 0.011 0.521 ± 0.017 0.529 ± 0.018
TemporalWalk 0.555 ± 0.011 0.411 ± 0.027 0.478 ± 0.012 0.560 ± 0.013
GraphSAGE (Mean) 0.551 ± 0.026 0.513 ± 0.022 0.568 ± 0.025 0.634 ± 0.028
GraphSAGE (MaxPool) 0.527 ± 0.029 0.511 ± 0.006 0.582 ± 0.019 0.641 ± 0.011
GraphSAGE (MeanPool) 0.503 ± 0.025 0.516 ± 0.031 0.574 ± 0.029 0.621 ± 0.013
GAT (Single-head) 0.434 ± 0.021 0.409 ± 0.017 0.474 ± 0.017 0.514 ± 0.016
GAT (Multi-head) 0.436 ± 0.051 0.410 ± 0.027 0.463 ± 0.034 0.597 ± 0.033
STGSN (No Attention) 0.4696 ± 0.031 0.502 ± 0.032 0.538 ± 0.024 0.609 ± 0.033
STGSN (Single-head) 0.589 ± 0.021 0.525 ± 0.025 0.590 ± 0.005 0.652 ± 0.021
STGSN (Multi-head) 0.511 ± 0.031 0.502 ± 0.036 0.542 ± 0.023 0.634 ± 0.029
Table 5
F1(Micro) Score of Algorithm (Classification on InVS15).
Baseline {1 − 6} → 7 {1 − 7} → 8 {1 − 8} → 9 {1 − 9} → 10
F1 (Micro) F1 (Micro) F1 (Micro) F1 (Micro)
GCN 0.382 ± 0.014 0.364 ± 0.016 0.367 ± 0.011 0.406 ± 0.019
CTDNE 0.576 ± 0.009 0.505 ± 0.012 0.473 ± 0.010 0.531 ± 0.017
TemporalWalk 0.579 ± 0.017 0.509 ± 0.021 0.494 ± 0.029 0.497 ± 0.011
GraphSAGE (Mean) 0.849 ± 0.002 0.559 ± 0.010 0.574 ± 0.003 0.636 ± 0.005
GraphSAGE (MaxPool) 0.838 ± 0.004 0.553 ± 0.012 0.566 ± 0.001 0.611 ± 0.013
GraphSAGE (MeanPool) 0.842 ± 0.006 0.546 ± 0.008 0.576 ± 0.006 0.613 ± 0.010
GAT (Single-head) 0.513 ± 0.016 0.462 ± 0.018 0.438 ± 0.013 0.457 ± 0.017
GAT (Multi-head) 0.518 ± 0.011 0.477 ± 0.014 0.456 ± 0.008 0.469 ± 0.012
STGSN (No Attention) 0.849 ± 0.012 0.527 ± 0.029 0.578 ± 0.013 0.573 ± 0.019
STGSN (Single-head) 0.852 ± 0.005 0.573 ± 0.010 0.578 ± 0.005 0.631 ± 0.023
STGSN (Multi-head) 0.852 ± 0.017 0.556 ± 0.020 0.575 ± 0.035 0.604 ± 0.026
4.2. Experiment setup

For meetup networks, Ego-Comm [37]: the nodes are users and their friends, while the edges indicate their face-to-face communications. There is only one node feature, the ID (we utilize one-hot encoding to encode the user's ID). We divide the dataset into 18 slices, in which one slice contains the activities for one month. We use the face-to-face communication data from months 1−9 to predict the friendship ratings given in the 9th month, and months 10−18 to predict the ratings given in the 18th month. The friendship ratings scale from 0 to 10, and we set this case up as a regression task. We denote the two experiments as {1−9} → 9 and {10−18} → 18. To train the prediction task, we split the dataset into 80% for training (with 20% used for validation) and 20% for testing. InVS13 and InVS15 [38]: we segment the co-presence data into ten slices, and each slice represents the meetups for a working day. In the network, nodes are the users in the workplace, and co-presence forms the edges. The node features consist of the user's ID (one-hot encoding) and the department. The prediction goal is to use the node embeddings to predict the extent of co-occurrence of given pairs on a given day. We categorize the number of co-presences at the 0.35 and 0.7 percentiles into three classes: Rare (0−0.35), Normal (0.35−0.7), and Frequent (0.7−1), to make the case a classification task. We denote an experiment as {1−7} → 7, indicating that the 1st to the 7th days' data is used for the 7th-day prediction. We conduct multiple rounds of experiments as follows: {1−7} → 7, {1−8} → 8, {1−9} → 9, {1−10} → 10. In this case, we use 80% of the data for training (with 20% used for validation) and 20% for testing.

For communication networks, IARadoslawEmail [39], we divide it into ten slices (one slice per month) and then organize the link prediction tasks as follows: {1−5} → 6, {1−6} → 7, {1−7} → 8, and {1−8} → 9. We segment DNC-Email [41] daily and select the data from 2016-05-02 to 2016-05-09. For simplicity, we denote 2016-05-02, 2016-05-03, . . . , 2016-05-08 as 1, 2, 3, . . . , 7 and design the following experiments: {1−7} → 8, {1−8} → 9, {1−9} → 10, and {1−10} → 11. At last, FB-Forum [40] has 25 weeks' data ranging from week 19 to week 34. We slice the data weekly and ignore week 19, which has only 295 records, exceptionally few compared to the other weeks. We set up the prediction tasks from week 20 to 27 as {1−4} → 5, {1−5} → 6, {1−6} → 7, and {1−7} → 8. Similarly, we use 80% of the data for training (with 20% used for validation) and 20% for testing.
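As an unofficial illustration of the setup above — the percentile bucketing used for InVS13/InVS15 and the {1−k} → k task enumeration — consider the following sketch; the helper names are hypothetical, and only the 0.35/0.7 percentile thresholds come from the text:

```python
import numpy as np

def percentile_labels(counts):
    """Bucket co-presence counts at the 0.35 / 0.7 percentiles into
    Rare / Normal / Frequent, mirroring the InVS13/InVS15 setup."""
    lo, hi = np.quantile(counts, [0.35, 0.7])
    return np.where(counts <= lo, "Rare",
                    np.where(counts <= hi, "Normal", "Frequent"))

def make_tasks(n_slices, first_target):
    """Enumerate {1-k} -> k style tasks, e.g. {1-7}->7 ... {1-10}->10."""
    return [(list(range(1, k + 1)), k) for k in range(first_target, n_slices + 1)]

counts = np.array([0, 1, 1, 2, 3, 5, 8, 13, 21, 40])
print(percentile_labels(counts))
print(make_tasks(10, 7))   # [([1..7], 7), ([1..8], 8), ([1..9], 9), ([1..10], 10)]
```

Each generated task pairs the input slice indices with the target slice, after which the standard 80/20 train/test split described above is applied within the target slice.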
Table 6
ROC-AUC Score of Algorithm (Classification on IARadoslawEmail).
Baseline {1 − 5} → 6 {1 − 6} → 7 {1 − 7} → 8 {1 − 8} → 9
ROC-AUC ROC-AUC ROC-AUC ROC-AUC
GCN 0.787 ± 0.012 0.819 ± 0.006 0.820 ± 0.004 0.778 ± 0.017
CTDNE 0.748 ± 0.019 0.751 ± 0.024 0.747 ± 0.009 0.761 ± 0.014
TemporalWalk 0.817 ± 0.015 0.796 ± 0.017 0.822 ± 0.011 0.814 ± 0.023
GraphSAGE (Mean) 0.852 ± 0.009 0.845 ± 0.011 0.853 ± 0.006 0.808 ± 0.015
GraphSAGE (MaxPool) 0.839 ± 0.014 0.845 ± 0.009 0.832 ± 0.010 0.820 ± 0.008
GraphSAGE (MeanPool) 0.830 ± 0.016 0.855 ± 0.014 0.845 ± 0.006 0.798 ± 0.023
GAT (Single-head) 0.821 ± 0.012 0.812 ± 0.008 0.795 ± 0.016 0.795 ± 0.021
GAT (Multi-head) 0.824 ± 0.009 0.825 ± 0.002 0.818 ± 0.007 0.832 ± 0.015
STGSN (No Attention) 0.850 ± 0.017 0.855 ± 0.018 0.837 ± 0.009 0.815 ± 0.013
STGSN (Single-head) 0.863 ± 0.004 0.857 ± 0.013 0.861 ± 0.012 0.842 ± 0.011
STGSN (Multi-head) 0.866 ± 0.013 0.860 ± 0.025 0.863 ± 0.023 0.825 ± 0.017
Table 7
ROC-AUC Score of Algorithm (Classification on FB-Forum).
Baseline {1 − 4} → 5 {1 − 5} → 6 {1 − 6} → 7 {1 − 7} → 8
ROC-AUC ROC-AUC ROC-AUC ROC-AUC
GCN 0.627 ± 0.023 0.546 ± 0.013 0.610 ± 0.028 0.556 ± 0.029
CTDNE 0.816 ± 0.016 0.831 ± 0.010 0.795 ± 0.014 0.828 ± 0.015
TemporalWalk 0.831 ± 0.017 0.863 ± 0.022 0.771 ± 0.018 0.855 ± 0.017
GraphSAGE (Mean) 0.777 ± 0.015 0.810 ± 0.011 0.812 ± 0.012 0.777 ± 0.021
GraphSAGE (MaxPool) 0.815 ± 0.008 0.847 ± 0.016 0.838 ± 0.013 0.901 ± 0.014
GraphSAGE (MeanPool) 0.784 ± 0.023 0.844 ± 0.014 0.783 ± 0.030 0.757 ± 0.023
GAT (Single-head) 0.703 ± 0.021 0.707 ± 0.035 0.644 ± 0.023 0.640 ± 0.024
GAT (Multi-head) 0.684 ± 0.016 0.663 ± 0.028 0.686 ± 0.026 0.760 ± 0.019
STGSN (No Attention) 0.845 ± 0.027 0.880 ± 0.008 0.841 ± 0.013 0.913 ± 0.011
STGSN (Single-head) 0.853 ± 0.017 0.880 ± 0.011 0.859 ± 0.009 0.916 ± 0.012
STGSN (Multi-head) 0.854 ± 0.031 0.930 ± 0.026 0.869 ± 0.025 0.841 ± 0.016
Table 8
ROC-AUC Score of Algorithm (Classification on DNC-Email).
Baseline {1 − 7} → 8 {1 − 8} → 9 {1 − 9} → 10 {1 − 10} → 11
ROC-AUC ROC-AUC ROC-AUC ROC-AUC
GCN 0.677 ± 0.027 0.654 ± 0.019 0.6484 ± 0.021 0.623 ± 0.016
CTDNE 0.894 ± 0.013 0.891 ± 0.022 0.904 ± 0.017 0.911 ± 0.011
TemporalWalk 0.834 ± 0.015 0.817 ± 0.019 0.848 ± 0.007 0.868 ± 0.014
GraphSAGE (Mean) 0.848 ± 0.011 0.856 ± 0.017 0.831 ± 0.008 0.797 ± 0.025
GraphSAGE (MaxPool) 0.884 ± 0.013 0.914 ± 0.009 0.907 ± 0.014 0.927 ± 0.022
GraphSAGE (MeanPool) 0.860 ± 0.015 0.868 ± 0.016 0.869 ± 0.011 0.858 ± 0.012
GAT (Single-head) 0.813 ± 0.024 0.868 ± 0.018 0.798 ± 0.012 0.915 ± 0.012
GAT (Multi-head) 0.910 ± 0.017 0.885 ± 0.016 0.832 ± 0.023 0.830 ± 0.011
STGSN (No Attention) 0.940 ± 0.024 0.922 ± 0.027 0.914 ± 0.025 0.909 ± 0.016
STGSN (Single-head) 0.971 ± 0.010 0.945 ± 0.011 0.959 ± 0.014 0.959 ± 0.013
STGSN (Multi-head) 0.968 ± 0.018 0.952 ± 0.018 0.950 ± 0.009 0.919 ± 0.026
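Contribution 3 groups temporal attention distributions into pattern categories, and the discussion in this section walks through increasing-trend, decreasing-trend, seasonal, and key-player(s) examples. A purely illustrative heuristic (not the paper's method; every threshold here is invented) for flagging such patterns from a node's attention weights might look like:

```python
import numpy as np

def attention_pattern(alpha, season=None):
    """Heuristically label a temporal attention distribution.

    alpha  : 1-D attention weights over time steps (summing to ~1)
    season : optional period (e.g. 7 for daily data) to test seasonality
    All thresholds below are invented for illustration.
    """
    slope = np.polyfit(np.arange(len(alpha)), alpha, 1)[0]   # linear trend
    uniform = 1.0 / len(alpha)
    if season and len(alpha) % season == 0:
        per_phase = alpha.reshape(-1, season).mean(axis=0)
        if per_phase.max() > 2 * uniform:    # one phase dominates each cycle
            return "seasonal"
    n_dominant = int(np.sum(alpha > 2 * uniform))
    if 1 <= n_dominant <= 3:                 # a few steps carry most of the weight
        return "key-player(s)"
    if slope > 0.002:
        return "increasing trend"
    if slope < -0.002:
        return "decreasing trend"
    return "flat"

# Example resembling Fig. 4(a): weights rising from 0.095 to 0.147 over 9 months
rising = np.linspace(0.095, 0.147, 9)
print(attention_pattern(rising / rising.sum()))  # increasing trend
```

Such a labeling step is one possible backend for the visualization tool mentioned below, which reports both the attention chart and a pattern class to the analyst.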
method has powerful capabilities for modeling social networks with temporal characteristics.

(b) It is worth pointing out that STGSN, in particular, has superior performance on the communication networks: IARadoslawEmail, FB-Forum, and DNC-Email. This indicates that temporal patterns are crucial for communication networks (email and comment exchanges). For example, to answer a question like "will two people email each other in the next few days?", the answer may heavily depend on whether they have had email exchanges recently. In this case, the recent time steps are more important than the historical ones. It shows that our method can robustly capture the influential temporal patterns by slicing the network interactions into time steps and paying weighted attention to them.

(c) Attention plays a critical role in our framework. It is noticeable that STGSN without the attention mechanism, namely STGSN (No Attention), does not perform as well as the STGSN models with attention. It even produces the worst performance in the Ego-Comm case. Compared to methods like GraphSAGE that look at the network structure as a whole, learning the node representation in a time-slice fashion without attention is not much of an improvement. Moreover, if we apply the time segmentation incautiously, the node-aggregation approach may even yield unsatisfactory performance.

2. STGSN with single-head and with multi-head attention shows little difference w.r.t. performance. In our experiments, we have tried both single-head and multi-head setups. The outcomes indicate that our method generally achieves the same performance level with multi-head attention as with single-head attention, which implies that multiple attention heads do not, as one might expect, have a better capability of capturing more hidden information about the temporal characteristics. In studies [42] and [43], the authors observe similar phenomena: multiple heads are not necessarily better than a single head, and pruning the vast majority of heads can still obtain a decent performance level. How multiple heads affect performance in social network analysis is a promising future research direction worth further exploring; we consider this topic beyond our research scope and decide to focus on our goal — modeling time-evolving social networks. As shown in Tables 3, 4, 5, 6, 7, and 8, we list the scores produced by STGSN with both single-head and multi-head attention. In practice, we choose the single-head setting, which achieves

3. The temporal importance distribution varies from case to case in time-evolving social networks. In our experiments, we closely look into how attention distributions vary in different cases. It is noticeable that the temporal attentions are distributed differently in various scenarios. For example, Fig. 4(a) shows a friendship prediction in the 9th month based on the 1st to 9th months' face-to-face communications. The attention weights increase from 0.095 (Month 1) to 0.147 (Month 9) chronologically, which shows an increasing-trend pattern: the more recent the month, the more decisive the influence it contributes. In contrast, Fig. 4(b) is an example from FB-Forum indicating a decreasing-trend pattern. Furthermore, Figs. 4(c) and 4(d) demonstrate a seasonal pattern. In Fig. 4(c), every second day shows a more substantial influence. Meanwhile, Fig. 4(d) shows a co-presence prediction for a Friday that is influenced mostly by the previous Fridays. Two Fridays have attention weights of 0.192 and 0.126, respectively, which are significantly more influential than all other weekdays. At last, Figs. 4(e) and 4(f) show the key-player(s) pattern. As shown in Fig. 4(e), this example in FB-Forum has week 2 with the most prominent contribution; in this case, there is only one key player: week 2. In contrast, the InVS13 example in Fig. 4(f) has multiple key time steps: two Tuesdays and a Friday as key players.

Unlike some models used in a low-risk environment, where a mistake will not have serious consequences, in our case it is not enough for the police officers to know just the prediction (the what). The model must demonstrate how it comes to the prediction (the why). Therefore, when showing the prediction result, we also designed a visualization tool to show the temporal attention distribution (chart and pattern class), which has proven to be very helpful in practice. By doing this, the officers can use the prediction (what + why) as assistance and combine it with their domain knowledge to make the final call.

5. Conclusion and future work

In this paper, we propose a spatial–temporal graph neural network, STGSN, specifically for social networks. The proposed framework firstly utilizes the graph embedding approach to aggregate the neighborhood information for the target nodes. Secondly, we introduce a novel attention mechanism to build a temporal context, enabling the algorithm to pay weighted attention to different time steps. Thirdly, we devise a method to improve the interpretation ability of the model. Finally, the algorithm combines the spatial and temporal features for the downstream prediction tasks. We conducted extensive experi-
the same level of performance as the multi-head setup but ments on six public datasets. The results show that the proposed
with much less computational overhead. framework outperforms state-of-the-art methods by producing
significantly better performance in the downstream prediction tasks.

Some possible future research directions are: devising a better graph representation method that fully considers the neighborhood's node IDs, node attributes, and edge attributes; and developing an optimized time-slicing method for better temporal segmentation.

CRediT authorship contribution statement

Shengjie Min: Conceptualization, Methodology, Writing - original draft, Software. Zhan Gao: Conceptualization, Writing - original draft, Software. Jing Peng: Writing - original draft, Software. Liang Wang: Software, Visualization, Investigation. Ke Qin: Writing - review & editing. Bo Fang: Data curation, Software, Visualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We thank the Sichuan Provincial Public Security Department experts, who provided insight and expertise that greatly assisted the research. The Sichuan Provincial Public Security Department supported this work under the Intelligence Command Operational Platform Program.

References

[1] Stanley Wasserman, Katherine Faust, Social Network Analysis, Cambridge University Press, 1994.
[2] Emilio Ferrara, Pasquale De Meo, Salvatore Catanese, Giacomo Fiumara, Detecting criminal organizations in mobile phone networks, Expert Syst. Appl. 41 (13) (2014) 5733–5750.
[3] Rafał Drezewski, Jan Sepielak, Wojciech Filipkowski, The application of social network analysis algorithms in a system supporting money laundering detection, Inform. Sci. 295 (2015) 18–32.
[4] Giulia Berlusconi, Francesco Calderoni, Nicola Parolini, Marco Verani, Carlo Piccardi, Link prediction in criminal networks: A tool for criminal intelligence analysis, PLoS One 11 (4) (2016) 1–21.
[5] Andrea Fronzetti Colladon, Elisa Remondi, Using social network analysis to prevent money laundering, Expert Syst. Appl. 67 (2017) 49–58.
[6] S. Min, G. Luo, Z. Gao, J. Peng, K. Qin, Resonance - An intelligence analysis framework for social connection inference via mining co-occurrence patterns over multiplex trajectories, IEEE Access 8 (2020) 24535–24548.
[7] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, Philip S. Yu, A comprehensive survey on graph neural networks, IEEE Trans. Neural Netw. Learn. Syst. (2020) 1–21.
[8] Thomas N. Kipf, Max Welling, Semi-supervised classification with graph convolutional networks, in: 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings, 2017, pp. 1–14.
[9] Giang Hoang Nguyen, John Boaz Lee, Ryan A. Rossi, Nesreen K. Ahmed, Eunyee Koh, Sungchul Kim, Continuous-time dynamic network embeddings, in: Companion Proceedings of the The Web Conference 2018, 2018, pp. 969–976.
[10] Nesreen K. Ahmed, Ryan A. Rossi, Rong Zhou, John Boaz Lee, Xiangnan Kong, Theodore L. Willke, Hoda Eldardiry, Inductive representation learning in large attributed graphs, in: NIPS, 2017, pp. 1–11.
[11] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio, Graph attention networks, in: 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings, 2018, pp. 1–12.
[12] Qiaoyu Tan, Ninghao Liu, Xia Hu, Deep representation learning for social network analysis, Front. Big Data 2 (2019) 1–10.
[13] Yaguang Li, Rose Yu, Cyrus Shahabi, Yan Liu, Diffusion convolutional recurrent neural network: Data-driven traffic forecasting, in: International Conference on Learning Representations, 2018, pp. 1–16.
[14] Shengnan Guo, Youfang Lin, Ning Feng, Chao Song, Huaiyu Wan, Attention based spatial-temporal graph convolutional networks for traffic flow forecasting, Proc. AAAI Conf. Artif. Intell. 33 (2019) 922–929.
[15] Bing Yu, Haoteng Yin, Zhanxing Zhu, Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting, in: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence Organization, California, 2018, pp. 3634–3640.
[16] Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, Chengqi Zhang, Graph WaveNet for deep spatial-temporal graph modeling, in: IJCAI International Joint Conference on Artificial Intelligence, 2019, pp. 1907–1913.
[17] Jiani Zhang, Xingjian Shi, Junyuan Xie, Hao Ma, Irwin King, Dit Yan Yeung, GaAN: Gated attention networks for learning on large and spatiotemporal graphs, in: 34th Conference on Uncertainty in Artificial Intelligence, UAI 2018, 2018, pp. 339–349.
[18] Ashesh Jain, Amir R. Zamir, Silvio Savarese, Ashutosh Saxena, Structural-RNN: Deep learning on spatio-temporal graphs, in: CVPR, 2016, pp. 5308–5317.
[19] Yong Li, Zihang He, Xiang Ye, Zuguo He, Kangrong Han, Spatial temporal graph convolutional networks for skeleton-based dynamic hand gesture recognition, EURASIP J. Image Video Process. 2019 (1) (2019).
[20] Joan Bruna, Wojciech Zaremba, Arthur Szlam, Yann LeCun, Spectral networks and deep locally connected networks on graphs, in: 2nd International Conference on Learning Representations, ICLR 2014 - Conference Track Proceedings, 2014, pp. 1–14.
[21] Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst, Convolutional neural networks on graphs with fast localized spectral filtering, in: Advances in Neural Information Processing Systems 29, 2016.
[22] Ziqi Liu, Chaochao Chen, Longfei Li, Jun Zhou, Xiaolong Li, Le Song, Yuan Qi, GeniePath: Graph neural networks with adaptive receptive paths, Proc. AAAI Conf. Artif. Intell. 33 (2019) 4424–4431.
[23] Shun Fu, Guoyin Wang, Shuyin Xia, Li Liu, Deep multi-granularity graph embedding for user identity linkage across social networks, Knowl.-Based Syst. 193 (2020) 105301.
[24] Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, Neural machine translation by jointly learning to align and translate, in: 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 2015, pp. 1–15.
[25] Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio, Show, attend and tell: Neural image caption generation with visual attention, in: 32nd International Conference on Machine Learning, ICML 2015, 2015, pp. 2048–2057.
[26] Minh-Thang Luong, Hieu Pham, Christopher D. Manning, Effective approaches to attention-based neural machine translation, in: EMNLP 2015: Conference on Empirical Methods in Natural Language Processing, 2015, pp. 1412–1421.
[27] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, Attention is all you need, Adv. Neural Inf. Process. Syst. 30 (2017) 5999–6009.
[28] Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, Pieter Abbeel, A simple neural attentive meta-learner, in: 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings, 2018, pp. 1–17.
[29] Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena, Self-attention generative adversarial networks, in: 36th International Conference on Machine Learning, ICML 2019, 2019, pp. 12744–12753.
[30] John Boaz Lee, Ryan Rossi, Xiangnan Kong, Graph classification using structural attention, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, New York, NY, USA, 2018, pp. 1666–1674.
[31] Jiezhong Qiu, Jian Tang, Hao Ma, Yuxiao Dong, Kuansan Wang, Jie Tang, DeepInf: Social influence prediction with deep learning, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2018, pp. 2110–2119.
[32] Muhan Zhang, Yixin Chen, Link prediction based on graph neural networks, Adv. Neural Inf. Process. Syst. 31 (2018) 5165–5175.
[33] Hao Wang, Tong Xu, Qi Liu, Defu Lian, Enhong Chen, Dongfang Du, Han Wu, Wen Su, MCNE: An end-to-end framework for learning multiple conditional network representations of social network, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019, pp. 1064–1072.
[34] Ninghao Liu, Qiaoyu Tan, Yuening Li, Hongxia Yang, Jingren Zhou, Xia Hu, Is a single vector enough? Exploring node polysemy for network embedding, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019, pp. 932–940.
[35] Vassilis N. Ioannidis, Antonio G. Marques, Georgios B. Giannakis, A recurrent graph neural network for multi-relational data, in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2019, pp. 8157–8161.
[36] En Yu Yu, Yue Ping Wang, Yan Fu, Duan Bing Chen, Mei Xie, Identifying critical nodes in complex networks via graph convolutional networks, Knowl.-Based Syst. 198 (2020).
[37] Jari Saramäki, E.A. Leicht, Eduardo López, Sam G.B. Roberts, Felix Reed-Tsochas, Robin I.M. Dunbar, Persistence of social signatures in human communication, Proc. Natl. Acad. Sci. USA 111 (3) (2014) 942–947.
[38] Mathieu Génois, Alain Barrat, Can co-location be used as a proxy for face-to-face contacts? EPJ Data Sci. 7 (1) (2018) 11.
[39] Radosław Michalski, Sebastian Palus, Przemysław Kazienko, Matching organizational structure and social network extracted from email communication, in: Lecture Notes in Business Information Processing, vol. 87, Springer Berlin Heidelberg, 2011, pp. 197–206.
[40] T. Opsahl, Triadic closure in two-mode networks: Redefining the global and local clustering coefficients, Social Networks (2011).
[41] DNC emails network dataset – KONECT, 2017.
[42] Paul Michel, Omer Levy, Graham Neubig, Are sixteen heads really better than one? Adv. Neural Inf. Process. Syst. 32 (2019) 1–11.
[43] Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov, Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned, in: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 2019, pp. 5797–5808.