
1638 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 18, 2023

A High Accuracy and Adaptive Anomaly Detection
Model With Dual-Domain Graph Convolutional
Network for Insider Threat Detection

Ximing Li, Xiaoyong Li, Jia Jia, Graduate Student Member, IEEE, Linghui Li, Member, IEEE,
Jie Yuan, Member, IEEE, Yali Gao, Member, IEEE, and Shui Yu, Fellow, IEEE

Abstract— Insider threat is destructive and concealable, making addressing it a challenging task in cybersecurity. Most existing methods transform user behavior into sequential information and analyze user behavior while neglecting structural information among users, resulting in high false positives. To solve this problem, in this paper, we propose the Dual-Domain Graph Convolutional Network (referred to as DD-GCN), a graph-based modularized method for high-accuracy and adaptive insider threat detection. The central idea is to convert user features and structural information into heterogeneous graphs in light of various relationships and take user behavior and relationships into account together. To this end, a weighted feature similarity mechanism is applied to balance the feature similarity of users and the original linkages among them so as to generate the fused structure. Next, specific graph embeddings are extracted from the original topology structure and the fused structure simultaneously, which convert behavior information into high-level representations. Furthermore, an attention mechanism is applied to learn the adaptive importance weights of the user's features in the corresponding embedding. The combination and difference constraints are proposed to enhance the learned embeddings' commonality and their ability to capture different information. Extensive experiments on two real-world datasets clearly show that our proposed DD-GCN extracts the most correlated information from structural topology and feature information, and achieves improved accuracy with a clear margin.

Index Terms— Insider threat detection, anomaly detection, graph convolutional network.

I. INTRODUCTION

INSIDER threat, as an essential research topic in cybersecurity, has become one of the main factors in numerous cybersecurity incidents and has caused significant losses to organizations in recent years. Compared with outsider threats, insider threats are usually launched by insiders of the organization who already have the authorization to access the information system and are familiar with enterprise internal defense processes [1]. Insider attackers can remove their footprints after putting enterprise privacy or property at risk, which is difficult for traditional anomaly detection schemes to discover. Since insider attackers are familiar with the core secrets of the company, the consequences of insider threats are often more severe than those caused by external attackers [2], [3], [4]. This situation has attracted more and more attention from academia. Thus, deploying a cybersecurity approach to information systems to detect insider threats comprehensively is crucial.

A. Motivation

So far, researchers have done numerous works on insider threat detection. Existing insider threat detection works are divided into two categories. 1) The first type of typical method attempts to extract various insider threat patterns and perform anomaly detection based on prior knowledge. These techniques focus on establishing users' behavior baselines to classify normal users against insider threats via game theory [5], machine learning [6], etc., analyzing users' behavior based on system logs (e.g., file access, logon and logoff operations, removable device usage), and adding the consideration of role attributes to form a real-time monitoring system in the company. 2) Another group of typical methods concentrates on transforming various user behavior into sequence data (i.e., sequential relationships among log entries), which holds temporal information about the user. The rapid evolution of deep learning has recently brought a new aspect to sequence processing techniques for insider threat detection. Deep learning, as a subfield of representation learning, has a powerful feature-representing capability, which can learn profound information and perform abstract and accurate representation. Sequence processing techniques of deep learning, including the convolutional neural network (CNN) and the recurrent neural network (RNN), are widely applied to learn knowledge from history behavior and analyze users' future behavior. Essentially, these behavior-to-entry methods simulate normal user behavior and mark a deviation as an anomaly.

Manuscript received 15 March 2022; revised 5 October 2022 and 5 January 2023; accepted 3 February 2023. Date of publication 14 February 2023; date of current version 2 March 2023. This work was supported in part by the National Natural Science Foundation of China under Grant 62202066, Grant 62102040, and Grant 62002028. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Ragnar Thobaben. (Corresponding author: Xiaoyong Li.)
Ximing Li, Xiaoyong Li, Jia Jia, Linghui Li, Jie Yuan, and Yali Gao are with the Key Laboratory of Trustworthy Distributed Computing and Service, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China (e-mail: ximingli@bupt.edu.cn; lixiaoyong@bupt.edu.cn; jiajia@bupt.edu.cn; lilinghui@bupt.edu.cn; yuanjie@bupt.edu.cn; gaoyali@bupt.edu.cn).
Shui Yu is with the School of Computer Science, University of Technology Sydney, Ultimo, NSW 2007, Australia (e-mail: shui.yu@uts.edu.au).
Digital Object Identifier 10.1109/TIFS.2023.3245413
1556-6021 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

LI et al.: HIGH ACCURACY AND ADAPTIVE ANOMALY DETECTION MODEL WITH DD-GCN 1639

Although prior works on insider threat detection have made exciting progress by extracting threat patterns and fully using user behavior information, they still have numerous drawbacks. Methods proposed in prior works can effectively block hackers and prevent intruders from attacking to some extent, but they are often costly and consume considerable system resources. Similarly, much complicated and costly work is required when facing high-dimensional, complex, heterogeneous, and sparse data. Moreover, existing methods identify anomalies by comparing them with users' daily normal behavior, based on the overly idealized assumption that users' daily behavior is relatively regular and steady over time (logical relationships over days or weeks). Relationships can provide valuable and essential information, similar to social networks in our daily life, yet existing approaches ignore such relationships in the insider threat detection task. At the same time, current methods depend on a large amount of labeled data to ensure the model's performance. However, collecting enough labeled user information and malicious activities from an organization is challenging in real-world scenarios due to privacy concerns. Thus, maximizing the usage of circumscribed data is critical. In summary, we need to address the following three problems: 1) How to capture different behavior accurately under high-dimensional, complex, heterogeneous, and sparse conditions. 2) How to integrate the relationship of structural information (i.e., user interaction) with user behavior features in an insider threat detection model. 3) How to use the limited labeled instances efficiently for insider threat tasks.

B. Our Designs and Contributions

To alleviate the aforementioned drawbacks, we propose a novel graph-based model named Dual-Domain Graph Convolutional Network (DD-GCN) for high-accuracy and adaptive insider threat detection. DD-GCN mainly consists of three specialized modules to extract relationships, accurately capture different behavior, and detect anomalies efficiently under limited labeled data. First, a weighted feature similarity mechanism is applied to construct users' fused structure from the original linkage network among users and integrate multiple relationships, such as email communication, device, or file transferring operations. Next, users' behaviors are propagated over the fused structure and the original topology graph to extract embeddings in two domains with two specific convolutional modules. Vectorized user behavior and operations based on the embeddings are extracted from the original structural topology and the fused structural information simultaneously. Finally, an attention mechanism is utilized to learn the importance weights of the user's features within the two embeddings in order to propagate them adaptively. In this way, insider threat labels are able to supervise the learning process to adjust the adaptive importance weights of users' features in the two graphs and extract the most correlated information. Moreover, we design combination and difference constraints to ensure the commonality and disparity effects of the learned embeddings. Meanwhile, the complete framework of the insider threat detection system based on the Dual-Domain GCN algorithm is provided. In summary, we highlight our contributions as follows:
• By leveraging feature similarity and original linkage, a weighted feature similarity mechanism for insider threat detection is applied to balance user behavior and original linkage information, and to translate users' log entries into heterogeneous graphs. The weighted feature similarity mechanism calculates the feature similarity among users and extracts latent structural information, which holds users' specific behavior information and maximizes the usage of labeled samples. As a result, various structural information can be fully quantified and discovered.
• We propose DD-GCN, a graph-based model with two domains and two convolutional operations for high-accuracy and adaptive insider threat detection, which takes into account structural relationships and feature information. The fused structural information serves as a bonus component for the final result. Furthermore, an attention mechanism is applied to learn the different importance of related nodes to capture behavior differences adaptively. The model aggregates different scale information and achieves improved insider threat detection accuracy.
• Two constraints, i.e., the combination constraint and the difference constraint, are designed for insider threat detection to ensure the consistency and disparity of the learned embeddings, enhancing their commonality and their ability to capture different scale information.
• Our evaluation of the proposed model on public insider threat datasets illustrates that our proposed model achieves improved detection accuracy of 98.65% and outperforms state-of-the-art techniques with a clear margin.

The rest of the article is organized as follows. In Section II, we will review related works on insider threat detection and the application of the Graph Convolutional Network (GCN). In Section III, we elaborate the Dual-Domain GCN model and the insider threat detection framework based on it. The performance evaluation results are reported in Section IV, and we conclude and prospect our work in Section V.

II. RELATED WORK

Insider threat is an in-depth investigation problem that causes severe damage to enterprises. Scholars work on plenty of studies for insider threat detection and prevention. In this section, we will review related works on insider threat detection and the application of the Graph Convolutional Network.

A. Insider Threat Detection

Insider threats have received significant attention for a long time as one of the most challenging cyberattacks to counter. One of the most typical methods of existing paradigms [8], [9], [10], [11], [12] extracts and transforms user behavior features, including email communication, network activity, file access, and personal computer usage, into machine learning models to identify malicious sessions and events. Features for enterprise situations also contain logon/logoff information, printer usage, and device connection. The data used for insider threat detection are often heterogeneous and sparse, making feature extraction-based approaches time-consuming and complicated, and hindering practical deployment.

Another group of typical approaches converts log information into sequential data and then analyzes user behavior. Du et al. [13] design an LSTM-based anomaly detection method named DeepLog, which uses system log keywords and different types of anomalies in log entries. Tuor et al. [14] propose a stacked LSTM structure to capture user actions and utilize user activity log-likelihoods as the anomalous scores to identify insider threat sessions. Instead of relying only on the activity type, Yuan et al. [15] utilize file uploading and web browsing activities for insider threat detection and propose a hierarchical neural temporal point processes model, which generates an anomalous score based on the differences between test and normal activities in terms of activity types and occurrence time. Shen et al. [16] leverage RNN to predict a user's upcoming action to identify an anomalous user. If no significant disparities are detected between the prediction findings and the user's activities, the user is identified as a regular user. Otherwise, malicious activities are identified. Hu et al. [17] propose a CNN-based user authentication approach to detect insider threats by evaluating mouse bio-behavioral traits. The proposed approach utilizes a picture to show how users move their controllers. If ID theft is committed, the user's controller actions will differ from those of the authorized user. Although sequence processing models improve insider threat detection significantly, they always neglect the interaction information among users, which can provide indispensable information for insider threat identification.

Recently, graph data structures have been widely deployed, which are able to represent the interaction among users effectively. Oka et al. [18] convert user-system interaction into a bipartite graph and use an unsupervised learning framework to evaluate whether a potential insider threat is triggered after an incident shows a clear correlation between precipitation and significant anomalies. Jiang et al. [19] adopt a GCN model for insider threat detection. It is reasonable to utilize a graph structure to represent the interdependencies among users since individuals in an organization frequently contact one another via email. The proposed model also uses profound user profiles as the feature vectors of nodes. Liu et al. [20] propose a heterogeneous graph embedding model, named Log2vec, to encode activity linkages. Log2vec first creates a heterogeneous graph from audit data by representing varied activities as nodes and profound relationships between activities as edges. Then, Log2vec distinguishes malicious and benign activity into different groups and identifies insider threats by applying a clustering algorithm to node embeddings. The graph data-based methods above are built on normal user behavior patterns and compare or predict them with new behavior to identify anomalies. Different from the ideas of [19] and [20], our proposed DD-GCN transforms user behavior features and relationships into graphs and uses specific domain embeddings to extract latent information. Beyond the relevant user behavior transformation patterns, DD-GCN exploits two domains to explore users' latent information and applies an attention mechanism to learn the corresponding importance weights of user behavior under different domains adaptively.

In addition to the typical detection models mentioned above, tracking systems are applied to monitor and analyze activities of the system for insider threat detection [21], [22], [23]. Our proposed DD-GCN differs from these tracking systems in two ways. First, most tracking systems aim at attack forensics, not cyber threat detection. Second, these tracking systems usually apply causal graphs to track the flow of operations and processes and interactions, such as IPC syscalls operating on pipes. Our proposed DD-GCN analyzes the log information that records users' behavior in an information system and captures multiple relationships among log information that reflect user features, such as removable device connection history or website browsing contents.

B. Graph Convolutional Network

With the development of representation learning and graph embedding techniques, Graph Convolutional Networks have attracted much attention and are widely studied [24], [25], [26]. Bruna et al. [27] first apply the graph Laplacian to graph convolution in the Fourier domain. Defferrard et al. [28] approximate the graph convolution with a Chebyshev polynomial expansion of the graph Laplacian, improving efficiency. Kipf et al. [7] simplify the convolution operation and recommend that only single-hop neighbor node features are aggregated. GraphSage [29] aggregates node features from the local neighbors with mean, max, or LSTM pooling. The Graph Attention Network [30] learns attention weights over neighboring node features to indicate which nodes should be attended to. Pei et al. [31] design Geom-GCN, which applies structural similarity to capture long-range dependencies in disassortative graphs. Xu et al. [32] propose the Graph Wavelet Neural Network (GWNN), which modifies the GCN model by replacing the eigenvectors with wavelet bases to improve efficiency. Monti et al. [33] propose MoNet, which provides a unified generalization of graph convolutional structures in the spatial domain. Most current GCN variants mainly concentrate on propagating node features over the topology to learn graph embeddings for detection or classification tasks. They aim to design the aggregation function of neighbor nodes for message passing patterns.

Conversely, some studies question and analyze the propagating mechanism of GCN. Li et al. [34] show that GCN actually carries out Laplacian smoothing on node attributes so that the nodes embedded in the entire network gradually converge. Nt et al. [35] and Wu et al. [36] show that when information is propagated through the network topology, the topology plays the role of a low-frequency filter on node attributes. Using only low-frequency information under different conditions is limited. Based on this, Deyu et al. [37] propose the Frequency Adaptation Graph Convolutional Network (FAGCN) based on a self-gating mechanism. The core idea lies in implementing adaptive convolution of frequency information graphs. They also analyze the role of low-frequency and high-frequency information in learning node feature representations. Also, Wu et al. [38] provide an elaborate review of GCN. For now, it is still undetermined whether GCN can adaptively extract relevant information from node features and topology structure for classification or detection tasks.
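As a concrete reference for the propagation rule the works above build on, the single-hop aggregation of [7] can be sketched as follows. This is a minimal NumPy sketch; the function name, toy graph, and weights are ours, not from the paper:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer of Kipf & Welling [7]:
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU activation

# toy 3-node path graph, 2-dim features, identity weight matrix
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
H = np.array([[1., 0.], [0., 1.], [1., 1.]])
W = np.eye(2)
H1 = gcn_layer(A, H, W)   # each row mixes only the node's single-hop neighborhood
```

Note that each output row depends only on the node itself and its direct neighbors, which is exactly the single-hop aggregation property discussed above.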


TABLE I
NOTATION

Therefore, our proposed DD-GCN takes a different perspective, that is, extracting specific graph embeddings from the structural topology and feature information domains simultaneously.

III. PROPOSED MODEL: DUAL-DOMAIN GRAPH CONVOLUTIONAL NETWORK

In this section, we will elaborate the framework of the Dual-Domain Graph Convolutional Network for the insider threat detection task and the details of each module. For convenience, the notations of this paper are summarized in Table I.

A. Dual-Domain Graph Convolutional Network

Problem Settings: We focus on semi-supervised insider threat detection in an attributed graph G = (A, X), where A ∈ R^(n×n) is the symmetric adjacency matrix with n nodes. A_ij = 1 represents an edge between node i and node j; otherwise A_ij = 0. X is the feature matrix. The output of DD-GCN is an n × m matrix, where m is the number of classification categories, and each node belongs to one of the m classes.

Consistent with recent research on the integrating mechanism of GCN, although both node features and topology have a high degree of direct correlation with node labels, GCN cannot adaptively separate information from them. Therefore, our idea is to take a different perspective: to propagate node features not only in the topology domain but also in the feature domain.

The overview framework of our Dual-Domain GCN is illustrated in Fig. 1. The key idea behind Dual-Domain GCN is that node features can be propagated in both the topology and feature domains. A topology mapping based on the features of nodes is proposed. We construct the adjacency and feature matrices of the feature domain based on the original structural information and the fused feature information by a weighted feature similarity function. Furthermore, we design two convolution modules for the two specified domains. With two convolution operations, the node feature X is able to be propagated over both the topology domain and the feature domain to learn the corresponding embeddings Z_T and Z_F. Since the classification result could be associated with the topology domain, the feature domain, or both, an attention mechanism is applied to associate the embedding information with learned adaptive importance weights to extract the most relevant information. Thus, the most related information is applied to generate the final classification result Z.

Fig. 1. The architecture of Dual-Domain GCN. The input matrices are the adjacency matrix A and the feature matrix X. The adjacency matrix A represents the linkage among nodes, and the node feature matrix X is used to construct the feature domain. DD-GCN consists of two convolution modules, for the topology domain and the feature domain, and the attention mechanism. The adjacency matrix and feature matrix for the feature domain are generated by Equation 1.

1) Feature Domain Convolution Module: Traditionally, the adjacency matrix of a GCN is filled with 0 or 1, where A_ij = 1 indicates that node i and node j are linked, and A_ij = 0 otherwise. It is not trivial to define an edge only by the linkage information, and leveraging a simple topology structure alone is not sufficient to describe the fusion process either. Therefore, a comprehensive weighted feature similarity method that takes advantage of the edge linkage between nodes and the relevance of their features is proposed to capture the underlying structure information in the feature domain. The quality of the relationship between nodes is measured by:

A_F(i, j) = ω × S_(i,j) + (1 − ω) × A_ij,  (1)

where ω is a hyper-parameter in (0, 1) that balances the connection and the feature similarity between nodes, S_(i,j) is a similarity function that calculates the feature similarity of two nodes, and A_ij represents the original linkage of the nodes. In order to calculate the adjacency matrix in the feature domain, A_F(i, j) ∈ R^(n×n), a similarity calculation mechanism is provided to obtain structural information in the feature domain. In fact, several methods can calculate the similarity between two vectors; we list two popular ones here: cosine similarity and the heat kernel. Cosine similarity leverages the cosine of the angle between two vectors to calculate the similarity:

S_(i,j) = cos(x_i, x_j) = (x_i · x_j) / (|x_i||x_j|).  (2)

Heat kernel similarity is calculated as follows:

S_(i,j) = Heat(x_i, x_j) = e^(−∥x_i − x_j∥² / t),  (3)

here t is the time parameter in the heat conduction equation, and t = 2. A_F(i, j) aims to explore potential connections among users with similar features. We want to apply a similarity function that can effectively explore the plausible augmentation relations of users with similar behavior while disregarding the magnitude of their feature vectors. Since the cosine similarity function performs better with high-dimensional data and puts aside the magnitude of feature vectors, which is a match for the high-dimensional nature of insider threat information, we choose cosine similarity in our work to generate A_F(i, j). If A_F(i, j) > 0.5, the edge between node i and node j in the feature domain is established. The generation of A_F in the feature domain is summarized in Algorithm 1.

Algorithm 1 Calculate Adjacency Matrix A_F for the Feature Domain
Input: Adjacency matrix: A;
  Input feature: X;
  Feature vector: x_i, ∀i ∈ V;
  Balance parameter: ω;
Output: Adjacency matrix for the feature domain: A_F
1: Initialize A_F with zeros;
2: Initialize ω;
3: for node i, j ∈ V do
4:   S_(i,j) ← cos(x_i, x_j) = (x_i · x_j)/(|x_i||x_j|)
5:   A_F(i, j) ← ω × S_(i,j) + (1 − ω) × A_ij
6:   if A_F(i, j) > 0.5 then
7:     A_F(i, j) ← 1
8:   end if
9: end for
Return: A_F

After the fused adjacency matrix in the feature domain A_F is generated, the input graph in the feature domain is G_f = (A_F, X). Subsequently, the l-th layer feature embedding Z_F^l can be represented as:

Z_F^l = σ(D̃_F^(−1/2) Ã_F D̃_F^(−1/2) Z_F^(l−1) W_F^l),  (4)

where Z_F^0 = X, which is similar to Kipf's graph convolution kernel [7]. σ is the activation function, and we choose ReLU as our activation function because of its low computational complexity, non-exponential operation, and ease of learning optimization. D̃_F is the diagonal degree matrix of Ã_F, where Ã_F = A_F + I_F and I_F is the identity matrix. W_F^l is the weight matrix of the l-th GCN layer in the feature domain. The last embedding in the feature domain is denoted as Z_F. In this way, we can obtain the node embedding in the feature domain that captures the specific fused information.

2) Topology Domain Convolution Module: For the topology domain, the input of the convolution module is the original adjacency matrix and feature matrix, i.e., the graph G_t = (A_T, X), where A_T = A. The l-th layer topology embedding Z_T^l is calculated in the same way as in the feature domain:

Z_T^l = σ(D̃_T^(−1/2) Ã_T D̃_T^(−1/2) Z_T^(l−1) W_T^l),  (5)

here, σ is the activation function, and ReLU is chosen as our activation function due to its low computational complexity and non-exponential operation. D̃_T is the diagonal degree matrix of Ã_T. Specifically, we have Ã_T = A_T + I_T, where I_T is the identity matrix. W_T^l is the weight matrix of the l-th GCN layer in the topology domain. The last embedding in the topology domain is denoted as Z_T.

3) Attention Mechanism: Since we have two specific embeddings Z_T and Z_F, different node features contribute differently to the detection result. Thus, an attention mechanism is applied to learn the corresponding adaptive importance weights. We first concatenate the two embeddings and get Z_cat ∈ R^(n′×d×d′), where n′ is the number of nodes in the two embeddings, i.e., n′ = 2n, d is the number of features applied in this section, and d′ is the dimension of the feature information. For node i in the embedding Z_cat, z_i ∈ R^(d×d′) is node i's embedding vector in Z_cat. The embedding of node i is propagated through a nonlinear transformation, and a shared attention vector q ∈ R^(h×1) is applied to calculate the attention values ξ^i as follows:

ξ^i = q^T σ(W(z_i)^T + b),  (6)

where σ is the activation function, and we choose tanh as the activation function for attention value calculation, W is the weight matrix, and b is the bias vector. After that, the attention value normalized with the softmax function gives the final importance weight:

α^(ij) = softmax(ξ^(ij)) = exp(ξ^(ij)) / Σ_(j=1)^d exp(ξ^(ij)).  (7)

The values of α^i reflect the importance of node i within all node features in these two domains for the corresponding embedding. Each embedding Z_i is calculated as follows:

Z_i = Σ_j α^(ij) · z^(ij).  (8)

The whole process of final embedding generation is summarized in Algorithm 2.

Algorithm 2 Node Embedding Generation of Dual-Domain GCN
Input: Graph: G = (V, E);
  Input feature: X;
  Weight matrix: W^k, ∀k ∈ {1, . . . , K};
  Non-linearity: σ;
  Balance parameter: ω;
  Feature vector: x_i, ∀i ∈ V;
  Shared attention vector: q;
  Bias: b.
Output: Final embedding Z
1: Build adjacency (adj) matrix: A ← (V, E)
2: Feature domain fused structure: A_F(i, j)
3: Feature domain graph generation: G_F ← (A_F, X)
4: Degree matrix of the feature domain: D̃_F
5: Z_F^0 ← X
6: Embedding: Z_F^l ← σ(D̃_F^(−1/2) Ã_F D̃_F^(−1/2) Z_F^(l−1) W_F^l)
7: Adj matrix for the topology domain: A_T ← A
8: Topology domain graph generation: G_T ← (A_T, X)
9: Degree matrix of the topology domain: D̃_T
10: Z_T^0 ← X
11: Embedding: Z_T^l ← σ(D̃_T^(−1/2) Ã_T D̃_T^(−1/2) Z_T^(l−1) W_T^l)
12: Concatenate the two embeddings: Z_cat ← Z_T ∥ Z_F
13: for node i ∈ V do
14:   Node i's embedding in Z_cat: z_i
15:   Attention value: ξ^i ← q^T · tanh(W(z_i)^T + b)
16:   Adaptive importance weight: α^(ij) ← softmax(ξ^(ij))
17:   Z_i ← Σ_j α^(ij) · z^(ij)
18: end for
Return: Z

4) Loss Function: In this section, we design two constraints to enhance the effectiveness of our proposed model. Since the information in these two domains has a combination effect, i.e., the node label is related to both domains, the combination constraint is designed to improve this effect through a combination embedding calculated by two graph convolutional operations via shared parameters. The difference constraint is intended to improve the disparity effect between the two domain embeddings and the corresponding combination embedding. The generation of these two constraints is shown in Fig. 2. The loss function consists of the two constraints and an optimization function.

a) Combination constraint: Generally speaking, the node classification results may be related to the information in both the feature and topology domains. The combination constraint is designed to enforce the commonality effect between these two domain embeddings and implies that the node label is related to both domains. Thus, the combination embedding is learned with the shared parameters W_C^l of the two domain convolutional operations as in Eqs. 4 and 5. We utilize L_2-normalization to normalize the different domain embeddings, which are denoted as Z_Tnor and Z_Fnor. Then these two normalized matrices are applied to calculate the similarities denoted as S_T and S_F by:

S_T = Z_Tnor · Z_Tnor^T,  (9)
S_F = Z_Fnor · Z_Fnor^T.  (10)

The combination constraint is calculated by:

L_comb = ∥S_T − S_F∥_2,  (11)

where ∥·∥_2 is the L_2 norm.

b) Difference constraint: Consider the feature or topology embedding and the combination embeddings, which we denote as Z_combT and Z_combF for the embeddings Z_T and Z_F, respectively. Z_combT and Z_combF are learned from the same graphs as Z_T and Z_F with the shared parameters W_C^l. Thus, a difference constraint is proposed to enhance their ability to learn different information. Inspired by mutual information, the Hilbert-Schmidt Independence Criterion (HSIC) [39], an effective measure of the independence of two vectors, is applied to enforce the difference constraint in this work. Unlike mutual information, HSIC does not need to estimate the probability density of two variables; it transforms this process into a sampling form directly. Technically, the topology embedding Z_T and the combination embedding Z_combT are learned from the same graph, and the HSIC is calculated as:

HSIC(Z_T, Z_combT) = (1/(n − 1)²) tr(K_T J K_combT J),  (12)

where J = I − (1/n)·1, with 1 the all-ones matrix and I the n-order identity matrix, and tr is the trace of a matrix. K_T and K_combT are Gram matrices calculated as:

K(x_1, x_2) = exp(−∥x_1 − x_2∥_2² / σ)  (σ > 0),  (13)

where K_(T,ij) = K_T(z_T^i, z_T^j) and K_(combT,ij) = K_combT(z_combT^i, z_combT^j). z_T^i is the i-th node's embedding in the topology domain, and z_combT^i stands for the same meaning in the combination embedding. The difference constraint in the feature domain is calculated in the same way as in the topology domain:

HSIC(Z_F, Z_combF) = (1/(n − 1)²) tr(K_F J K_combF J).  (14)

Thus, the difference constraint, denoted as L_diff, is:

L_diff = HSIC(Z_T, Z_combT) + HSIC(Z_F, Z_combF).  (15)

c) Optimization function: The output embedding Z is used for multi-class classification with a linear transformation and a softmax function. The class prediction Ŷ of the nodes can be calculated as follows:

Ŷ = softmax(W Z + b),  (16)

here softmax(x_i) = exp(x_i)/Σ_i exp(x_i). In this work, we choose the cross-entropy loss for node classification over the whole training process. Suppose the training set is L; for each l ∈ L, the real label of node l is compared against the corresponding prediction in Ŷ to compute the cross-entropy loss.

Authorized licensed use limited to: MIT Art Design and Technology University. Downloaded on February 01,2024 at 10:42:35 UTC from IEEE Xplore. Restrictions apply.
1644 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 18, 2023

the training set is Y_l and the predicted label is Ŷ_l. Thus, the cross-entropy L_t is:

    L_t = − Σ_{l∈L} Σ_{i=1}^{C} Y_{li} ln Ŷ_{li},   (17)

where C is the number of predicted classes of a node. Combining the combination constraint, the difference constraint, and the cross-entropy loss, the overall loss function is obtained by:

    L = L_t + γ L_comb + β L_diff,   (18)

where γ and β are the parameters of the combination constraint and the difference constraint. The proposed DD-GCN is optimized via backpropagation with the help of labeled data, and the learned embeddings of the nodes are used for classification. In the next part, we will briefly illustrate the framework of the DD-GCN based insider threat detection model.

Fig. 2. Two constraints of the loss function. The two constraints in the objective function are the combination constraint L_comb and the difference constraint L_diff. The combination constraint implies that the topology and feature embeddings learn common information, and the difference constraint improves the disparity effect of these two embeddings.

B. Framework of Insider Threat Detection via Dual-Domain GCN

As mentioned earlier, traditional insider threat detection extracts the user's behavior information alone and ignores user role-based relations or communication relationships, such as email communication, social media information, etc., which could be essential information for insider threat detection. In order to collect user features and connection information in the task of insider threat detection, we transform user behavior into graphs and apply DD-GCN to the insider threat detection model. The framework of insider threat detection via DD-GCN is shown in Fig. 3. This framework consists of four modules: the Input data module, the Data Pre-processing module, the DD-GCN based detection module, and the Output module.

Firstly, the system launches the sequence of user actions and then transmits it to the Data Pre-processing module to construct the graph. Graph nodes represent users or other objects, and the edges between nodes represent the structural information of these users. In order to obtain node features and graph structure information, the system obtains node information by extracting the behavior characteristics of users. However, only establishing user connections and some features could not achieve promising performance. We apply DD-GCN to the insider threat detection application.

All the data collected from the Input module is transferred to the Data Pre-processing module to be converted to a graph-based format for the DD-GCN detection module. The data processing module will generate the graph's adjacency matrix and feature matrix. In this work, we extract user behavior features from emails, websites, files, inputs, devices, etc., and use natural language processing (NLP) to collect features based on user behavior content, providing a solid foundation for describing node feature information. A brief list of the features is shown below:

• Work hour logon, off-work hour logon, weekend logon;
• File features: total files, photo files, txt files, document files, exe files, zip files, and PC for files;
• Email features: email sent outside the organization, email sent within the organization, average email size, email receivers;
• Web browser features: the number of pages browsed, cloud service-related web pages, job hunting-related web pages, Wikileaks-related web pages, hacking-related web pages.

For the structural information of these nodes, we do not use direct communication relationships alone. The weighted feature similarity mechanism of DD-GCN is applied to calculate the similarity of user features and extract latent user connection information.

The Output module classifies insider threat detection tasks into four categories: normal scenario, data exfiltration, intellectual property theft, and IT sabotage. To train the learnable parameters of DD-GCN, we perform batch gradient descent and use the cross-entropy loss for evaluation during the training phase.

In this work, we use insider threat detection as a domain application example to demonstrate in detail how to build a DD-GCN based insider threat detection model. Creating anomaly detection models in many other areas (such as network traffic anomaly detection and fraud detection) is similar to this example, with only a few changes in feature extraction.

In the next section, we will demonstrate the effectiveness of the proposed DD-GCN model in detecting insider threat behavior on public datasets.

IV. PERFORMANCE EVALUATION

We will evaluate our proposed DD-GCN insider threat detection model on public datasets in this section. We begin with brief information about the datasets, followed by a pre-processing method to generate the input matrix for DD-GCN. Next, we conduct ablation studies to examine the effectiveness of DD-GCN, including a components analysis and a parameter sensitivity study. Finally, the insider threat classification performance of DD-GCN will be compared with other state-of-the-art methods.
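The objective described in Eqs. (12)–(18) can be sketched in NumPy as below. This is an illustrative sketch, not the paper's implementation: the kernel width σ and the toy shapes are assumptions, and the combination constraint L_comb (whose closed form is not reproduced in this excerpt) is taken as a given scalar.

```python
import numpy as np

def rbf_gram(Z, sigma=1.0):
    # Gram matrix K_ij = exp(-||z_i - z_j||_2^2 / sigma), Eq. (13).
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / sigma)

def hsic(Za, Zb, sigma=1.0):
    # HSIC(Za, Zb) = tr(Ka J Kb J) / (n - 1)^2, Eqs. (12) and (14),
    # with centering matrix J = I - (1/n) 1 1^T.
    n = Za.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return np.trace(rbf_gram(Za, sigma) @ J @ rbf_gram(Zb, sigma) @ J) / (n - 1) ** 2

def cross_entropy(Y, Y_hat):
    # L_t = -sum_l sum_i Y_li ln(Y_hat_li), Eq. (17), with one-hot Y.
    return float(-np.sum(Y * np.log(Y_hat + 1e-12)))

def total_loss(L_t, L_comb, L_diff, gamma=0.001, beta=5e-8):
    # L = L_t + gamma * L_comb + beta * L_diff, Eq. (18).
    return L_t + gamma * L_comb + beta * L_diff
```

Per Eq. (15), L_diff itself would be `hsic(Z_T, Z_combT) + hsic(Z_F, Z_combF)`.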
Authorized licensed use limited to: MIT Art Design and Technology University. Downloaded on February 01,2024 at 10:42:35 UTC from IEEE Xplore. Restrictions apply.
LI et al.: HIGH ACCURACY AND ADAPTIVE ANOMALY DETECTION MODEL WITH DD-GCN 1645

Fig. 3. The framework of insider threat detection via Dual-Domain GCN. This framework has four modules: the input data module, the data pre-processing module, the DD-GCN detection module, and the output module. The input data module collects user behavior information; the data pre-processing module extracts specific features of each user and forms the adjacency matrix A_i and feature matrix X_i; the DD-GCN detection module applies DD-GCN to generate node embeddings for the insider threat classification task. The architecture of DD-GCN is introduced in Section III-A. The original adjacency matrix A_i is transformed to A_F for the feature domain, and the adjacency matrix for the topology domain is the same as the original adjacency matrix; the output module states the classification result.
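The pre-processing step above (original adjacency A_i plus feature matrix X_i, with A_i transformed to A_F for the feature domain) can be sketched as follows. The exact similarity function and fusion rule of the weighted feature similarity mechanism are not reproduced in this excerpt, so the cosine-similarity top-k graph and the linear blend with the balance parameter ω below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def cosine_similarity_matrix(X):
    # Pairwise cosine similarity of user feature vectors.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    return Xn @ Xn.T

def fused_feature_adjacency(A, X, omega=0.6, k=2):
    # Hypothetical sketch: keep each node's top-k most similar peers as
    # feature edges S, then balance them against the original linkage A
    # with weight omega: A_F = omega * S + (1 - omega) * A.
    S = cosine_similarity_matrix(X)
    np.fill_diagonal(S, 0.0)
    keep = np.zeros_like(S)
    for i in range(S.shape[0]):
        top = np.argsort(S[i])[-k:]        # indices of k most similar users
        keep[i, top] = 1.0
    keep = np.maximum(keep, keep.T)        # symmetrize the feature graph
    return omega * keep + (1.0 - omega) * A
```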

TABLE II
DESCRIPTION OF CERT DATASET

A. Experiment Setup

1) Dataset: This work is performed on the CERT insider threat dataset [40], a publicly available dataset for research, testing, and development to help mitigate insider threats; it provides comprehensive behavioral observations of users to model their behavior. There are various versions of the CERT dataset, and the specific versions used in this work are CERT r4.2 and r6.2. CERT r4.2 simulates a company with 1,000 employees, 70 of whom are malicious insiders in three threat scenarios. There are 32,770,227 activities in this dataset, and 7,323 malicious activities are manually injected by experts in the insider threat detection task, representing the three discovered insider threat scenarios. CERT r4.2 involves several categories of user behavior data, such as file access (creation, modification, deletion, file name, file type, etc.), email sending and receiving, device usage (mobile storage devices, printers, etc.), HTTP access, and system login, and also includes information about the user's job title and working department. CERT r6.2 simulates a company with 4,000 employees, 5 of whom are malicious insiders in three threat scenarios. There are 135,117,169 activities in this dataset, and 470 malicious activities are manually injected by experts.

The CERT dataset covers five different types of activities: logon/logoff, email, device, file, and HTTP. The detailed description of these activities is shown in Table II. To build a model for users' feature extraction, we extract and review the dataset to identify missing information such as user-to-user relationships, user-to-PC relationships, regular working hours, and website and file categories. Following Section III-B, we analyze the detailed information of the log files, such as timestamp, user ID, PC ID, operation details, etc., to generate the feature matrix. The dataset we use mainly contains numerous activity categories, which reflect the user's operating habits and can be applied to distinguish normal and malicious users based on these activity tracks. We choose representative feature information (including work-hour/off-work-hour/weekend logons, web browsing contents, email communications, mobile device transfer operations, file transfer operations, file categories, PC for files, etc.) to generate the user behavior for the feature matrix. To further evaluate our proposed model on the insider threat detection task, we chose 500 users with three different label rates for the training set, 400 users as the test set, and 100 users as the validation set. The label rate means the percentage of labeled data per category in the training data used for training the semi-supervised insider threat detection model. Different labeling rates constitute different ratios of labeled data samples, thus effectively forming a semi-supervised learning scenario.

2) Parameter Setting: For our DD-GCN, we train the different domains with a 2-layer GCN (hidden layers: nhid1 = 128, nhid2 = 256). The Adam optimizer with a learning rate between {0.0001, 0.0005} is applied, and the dropout is 0.5. In addition, the weight decay is in {5e-3, 5e-8}, and the balance parameter of the feature domain ω is selected in {0.1, . . . , 0.9}. The coefficients of the combination and difference constraints are selected in {0.01, 0.001, 0.0001} and {1e-10, 5e-9, 1e-9, 5e-8}, respectively. We use Accuracy (ACC) and macro F1-score (F1) to evaluate the performance of our proposed model and the baselines. All experiments are conducted with PyTorch 1.1.0 [41] and Python 3.6.8.

B. Ablation Studies

In this section, to examine the effectiveness of our proposed model, we conduct a series of ablation experiments to compare

Fig. 4. Analysis of Attention Mechanism. The accuracy of DD-GCN


outperforms the variant without attention mechanism (DD-GCN-w/o-att) in
two datasets for all label rates.

the results of different components, attention mechanisms,


and correlation coefficients in this model with different label
rates. Moreover, accuracy is selected as a metric for evaluating
ablation experiments’ performance.
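The adaptive fusion examined in these ablations (Algorithm lines 13–18: ξ^i = q^T tanh(W(z^i)^T + b), α = softmax(ξ), Z_i = Σ_j α_ij z_ij) can be sketched as a per-node attention over the two domain embeddings. The NumPy shapes below are illustrative, not the paper's configuration.

```python
import numpy as np

def attention_fuse(ZT, ZF, W, b, q):
    # Per-node attention over the topology and feature embeddings:
    # xi = q . tanh(W z + b), alpha = softmax over the two domains,
    # fused Z = sum_domains alpha * z.
    Z = np.stack([ZT, ZF], axis=1)           # (n, 2, d)
    xi = np.tanh(Z @ W.T + b) @ q            # (n, 2) attention values
    xi = xi - xi.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(xi) / np.exp(xi).sum(axis=1, keepdims=True)
    return (alpha[:, :, None] * Z).sum(axis=1), alpha
```

Ablating the mechanism (DD-GCN-w/o-att) amounts to replacing `alpha` with a fixed 0.5/0.5 average of the two domains.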
1) Analysis of Components: We compare different components of DD-GCN on CERT r4.2 and CERT r6.2 to validate the effectiveness of the constraints and the different domains. The related parameters applied in this part are γ = 0.001, β = 5e-8, and ω = 0.6. DD-GCN-w/o(T) refers to DD-GCN without the constraints L_comb and L_diff, considering the topology domain only. DD-GCN-w/o(F) considers the feature domain without L_comb and L_diff. DD-GCN with the difference constraint L_diff alone is named DD-GCN-Diff. DD-GCN with the combination constraint L_comb is denoted as DD-GCN-Comb.

From the results shown in Table III, we make the following observations. As the label rate increases, each variant's accuracy increases; DD-GCN achieves the best accuracy of 98.26% and 98.65% at a label rate of 60%. DD-GCN is consistently superior to all the other four variants, demonstrating the effectiveness of combining these two constraint options; the largest relative growth in accuracy is 3.07% and 2.78%. Compared with DD-GCN-w/o, DD-GCN-Comb and DD-GCN-Diff achieve better performance at all label rates, which verifies the importance of these two constraints, and the largest relative accuracy improvement over DD-GCN-w/o(T) reaches 2.44%. The superior results of DD-GCN-Comb over DD-GCN-Diff indicate that the combination constraint performs better under the same situation, along with 0.54% and 0.49% improvements in accuracy. Moreover, the performance of DD-GCN-w/o(F) is better than DD-GCN-w/o(T), indicating the feature domain's necessity and the effectiveness of our proposed weighted feature similarity mechanism for generating the graph of the feature domain.

2) Analysis of Attention Mechanism: We test the variant of DD-GCN without the attention mechanism and take the average importance weight value of the topology domain and the feature domain for performance comparison.

Additionally, we analyze the distribution of attention over the two domains and specific features on both the CERT r4.2 and r6.2 datasets. DD-GCN learns two graph embeddings, and each node's embedding within them is associated with attention values. The parameters of this section are the same as in the ablation study (γ = 0.001, β = 5e-8, ω = 0.6). The results of the attention effectiveness are shown in Fig. 4. The attention distribution of the two domains is shown in Fig. 5, and the results of the attention values of specific features are shown in Fig. 6.

Fig. 5. Analysis of Attention Distribution of Different Domains. The attention distribution of the feature domain is higher than the topology domain, which verifies that the attention mechanism captures different information in these two domains and sets corresponding importance weights.

a) Attention effectiveness: The attention mechanism in DD-GCN extracts the most relevant information from the topology domain and the feature domain. The variant of DD-GCN without the attention mechanism is denoted as DD-GCN-w/o-att, and its accuracy results against DD-GCN for all label rates are shown in Fig. 4. We find that DD-GCN outperforms the variant without the attention mechanism on both datasets for all label rates, which verifies the effectiveness of the attention mechanism. From the results of Fig. 4, Table IV, and Table V, DD-GCN-w/o-att still achieves competitive performance against the other variants without the attention mechanism, implying that our framework is competitive.

b) Attention distribution: DD-GCN learns two specific embeddings in each domain. We first separate the two domains and examine the attention distribution for CERT r4.2 and CERT r6.2 in Fig. 5. As we can see in Fig. 5(a), the attention distribution of specific node embeddings in the feature domain is more significant than in the topology domain, which implies that the information in the feature domain is more important

TABLE III
VARIANTS COMPARISON
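Each per-domain variant compared in Table III reduces to the propagation rule of the algorithm above (lines 6 and 11), Z^l = σ(D̃^{−1/2} Ã D̃^{−1/2} Z^{l−1} W^l). A minimal sketch, assuming ReLU as σ and self-loops for the renormalized adjacency (the standard GCN convention suggested by the Ã/D̃ notation):

```python
import numpy as np

def gcn_layer(A, Z, W):
    # One propagation step: Z' = ReLU(D^-1/2 (A + I) D^-1/2 Z W).
    # Self-loops are added so the renormalized adjacency is well defined.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ Z @ W, 0.0)
```

DD-GCN runs this layer once per domain (with A_T and A_F respectively) and fuses the two resulting embeddings.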

than that in the topology domain on the corresponding dataset. Meanwhile, for the attention distribution in CERT r6.2 shown in Fig. 5(b), the attention difference between the topology and feature domains is more noteworthy than in CERT r4.2. The reason behind this could be that the anomalous users in CERT r6.2 are fewer than in CERT r4.2. Thus, the features of the user play a more significant role in insider threat detection.

For the insider threat detection task, the extracted information of the feature domain is critical, contributing to better results. On the same dataset, different domains contribute differently to insider threat detection, and the same domain contributes differently to insider threat detection on different datasets. Overall, the attention distribution experiments demonstrate that DD-GCN can focus on more critical information.

c) Attention value: With the same parameters, we test the attention value of each feature on the test set of the two datasets. The results are shown in Fig. 6. The feature IDs in both datasets represent the same features. Feature IDs 1-6 stand for user logon activities (i.e., work-hour logons, after-hour logons, or the total number of logon activities in a given time). IDs 7-15 represent removable device connection and file upload activities. IDs 16-20 represent email operations, and IDs 21-26 denote web browsing operations such as downloading, browsing, or uploading, categorized with simple NLP preprocessing. In CERT r4.2, anomalous users are more associated with visiting malicious hacking-related websites, then downloading a keylogger and using a thumb drive to transfer it to other machines, or with surfing job-hunting-related websites and soliciting employment from a competitor. Thus, the attention values of anomalous nodes on web browsing operations are higher than on other features, which verifies that our attention mechanism can assign different importance weights to different features and focus more on important ones. In CERT r6.2, malicious users are distinguished more by files, removable device operations, and hacking-related websites browsed during or after work; accordingly, the attention values of web browsing and device operations are higher than those of other features. In summary, the evaluation of the attention distribution and attention values indicates that our proposed DD-GCN is able to adaptively assign more significant importance weights to more critical information and features.

Fig. 6. Analysis of Attention Values of Different Features. Together with the insider scenarios in both datasets, the insider-threat-related features get higher attention values than other session-related features, which implies the attention mechanism assigns higher importance weights to crucial information.

3) Parameter Study: To examine the sensitivity of the parameters of our proposed model, we investigate the balance parameter ω for the feature similarity mechanism in the feature domain and the parameters of the two constraints on both datasets. The results are shown in Fig. 7.

a) Analysis of balance parameter ω of the feature domain: As ω is used to balance the feature similarity in the feature domain and the original linkage in the topology domain, we study the performance of DD-GCN for insider threat detection

with the balance parameter ω ranging from 0.1 to 0.9 on CERT r4.2 and CERT r6.2 in the first column of Fig. 7. The combination constraint parameter γ and the difference constraint coefficient β are set to 0.001 and 5e-9. In both figures, the accuracy increases first and then decreases. This may be because, when ω is smaller, the adjacency matrix in the feature domain is similar to that in the topology domain, which is not much different from GCN; conversely, a larger ω may introduce more noisy edges and ignore the original linkage among nodes.

b) Analysis of combination constraint parameter γ: In order to test the effect of the combination constraint coefficient γ, we deploy experiments with γ varying from 0 to 10,000. The results for CERT r4.2 and CERT r6.2 are shown in the second column of Fig. 7. The balance parameter ω is set to 0.6, and the difference constraint coefficient is set to 5e-9. With the increase of γ, the accuracy rises first and drops afterwards, probably because more labeled data brings more information to the feature and topology domains. Furthermore, the curves for all label rates display similar trends.

c) Analysis of difference constraint parameter β: After testing the effect of the combination constraint coefficient, we examine the coefficient of the difference constraint and vary it from 0 to 1e-5. The results for CERT r4.2 and CERT r6.2 are shown in the third column of Fig. 7. Similar to the result for the combination constraint parameter, the accuracy rises first and drops afterwards; the performance drops significantly when β increases to 1e-9. With β between 5e-9 and 1e-10, the result for the label rate of 60% is steadier than those for the label rates of 30% and 15%. Moreover, even for the label rate of 15%, the accuracy still increases, which implies that our proposed DD-GCN can achieve better results at all label rates.

C. Insider Threat Detection Analysis

We compare our proposed insider threat detection via Dual-Domain GCN with other state-of-the-art methods, including traditional machine learning-based and deep learning-based methods. The traditional machine learning methods compared in this section include support vector machine (SVM), random forest (RF), logistic regression (LR), and two outstanding machine learning-based anomaly detection methods. The deep learning-based schemes in this section comprise two categories: manually structured skeleton data approaches and graph-based architectures. We conducted the experiments 10 times and report the average values and standard deviations. The traditional machine learning methods and deep learning paradigms are described below, and the parameters applied in this part are γ = 0.001, β = 5e-8, ω = 0.6:

• Lightweight on-line Detector of Anomalies (LODA) [42] is a method that combines weak histogram-based anomaly detectors into a robust detector.
• Local Outlier Factor (LOF) [43] is a density-based algorithm whose core is characterizing the density of data points. LOF reflects the degree of anomaly of a sample mainly by calculating a score.
• Auto-Encoder (AE) [44] is a form of multi-layer neural network that compresses and reproduces data.
• Long Short-Term Memory (LSTM) is widely used to model long sequences and capture long-time dependence.
• Convolutional Neural Network (CNN) is a typical deep learning model for computer vision, which achieves shift, scale, and distortion invariance through local receptive fields, shared weights, and sub-sampling.
• Graph Convolutional Network (GCN) learns node representations by aggregating information from neighbors; it is a variant of the graph neural network and can be applied to semi-supervised multi-class classification tasks.
• Graph Convolutional Network with Feature Augmentation (GCN-FA) [19] is a GCN based model for insider threat detection with feature augmentation via a weighted feature function.

1) Comparison With Machine Learning-Based Methods: The comparison results are reported in Table IV. Different label rates mean different percentages of labeled nodes per class in the training set. We have the following observations:

• Our proposed DD-GCN achieves the best performance over all label rates compared with the machine learning baselines. Especially for accuracy, DD-GCN achieves around 98.21% and 98.65% for CERT r4.2 and r6.2, respectively; the maximum relative improvement is about 13.86% on CERT r4.2 and 14.24% on CERT r6.2. As for the F1 score, DD-GCN reaches the best result, around 93.04%, which is around 30% higher than the other traditional machine learning-based methods on both datasets. These results demonstrate the effectiveness of DD-GCN.
• Compared with the traditional machine learning methods, DD-GCN outperforms at all label rates, because DD-GCN extracts not only user behavior feature information but also the relationship information between the topology structure and the user features.

These observations indicate that the sophisticated features and the relationships of structural information are fundamental for the insider threat detection task, which traditional machine learning-based schemes neglect. Moreover, the number of representative features of user behavior considered in this work is less than traditional machine learning-based schemes require, which undermines the performance of the traditional machine learning baselines. The size of the dataset is another concern for the machine learning baselines; for example, LODA can theoretically be improved with more training data. In some cases, the performance of LOF is better than other algorithms because the identified anomalies are determined by local features that other algorithms may ignore. However, LOF suffers from long training and prediction times. LODA has lower time complexity, which makes it suitable for time-sensitive detection tasks.
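The Accuracy and macro F1 metrics used in these comparisons can be computed as below; a straightforward sketch, with macro F1 taken as the unweighted mean of per-class F1 scores.

```python
import numpy as np

def accuracy(y_true, y_pred):
    # Fraction of correctly classified nodes.
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def macro_f1(y_true, y_pred, num_classes):
    # Unweighted mean of per-class F1 scores over all classes.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))
```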

Fig. 7. Parameter Study. The sensitivity of the parameters: balance parameter ω, combination constraint coefficient γ, and difference constraint coefficient β.
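The sensitivity analysis in Fig. 7 amounts to a grid sweep over ω, γ, and β. A hypothetical helper sketching that loop, where `evaluate` is assumed to train DD-GCN once and return validation accuracy (both the helper and the callback are illustrative names, not the paper's code):

```python
def sweep(evaluate, omegas, gammas, betas):
    # Exhaustive grid search; keeps the (accuracy, omega, gamma, beta)
    # tuple with the best validation accuracy.
    best = None
    for w in omegas:
        for g in gammas:
            for b in betas:
                acc = evaluate(w, g, b)
                if best is None or acc > best[0]:
                    best = (acc, w, g, b)
    return best
```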

TABLE IV
RESULTS (%) COMPARED WITH MACHINE LEARNING-BASED METHODS

Comparing the results in Tables III and IV, we can find that, even without the two constraints, DD-GCN-w/o still outperforms the other schemes, indicating that our proposed framework is competitive.

2) Comparison With Deep Learning-Based Methods: The results for accuracy and F1-score are summarized in Table V. The first observation is the overwhelming performance of the graph-based architectures compared with the structured skeleton-based ones, such as CNN and LSTM. Even though LSTM can be considered to extract structural information successfully thanks to its data-driven architecture, solid improvements are not evident when modeling irregular skeleton data, such as a graph. For better performance of structured skeleton data schemes on insider threat detection tasks, LSTM can be applied to extract the sequence features of user behavior, which are then processed by a CNN for the classification task.

The results of DD-GCN and GCN-FA are better than GCN, which verifies that the feature domain contributes more significantly to this task. DD-GCN consistently outperforms GCN-FA and GCN at all label rates, which indicates the effectiveness of the adaptive attention mechanism and the transformation pattern in DD-GCN. DD-GCN can extract more information than GCN and achieves better accuracy, around 98.21% and 98.65% on the two datasets, respectively. The maximum relative improvement in accuracy is 6.57%, and for the F1 score, our approach improves by around 7.36%. For the 60% label rate, DD-GCN achieves higher accuracy than GCN with a similar F1 score.

Comparing GCN-FA and GCN, we can find that structural differences exist between the topology and feature domains, and performing the convolutional operation only on the topology domain does not show a better result than on the feature

TABLE V
RESULTS (%) COMPARED WITH DEEP LEARNING-BASED METHODS

TABLE VI
COMPUTATIONAL TIME ANALYSIS

Fig. 8. Precision-Recall curves of deep learning methods. The precision-recall curves of DD-GCN and other competitive deep learning methods. DD-GCN, GCN-FA, and GCN perform significantly better than the other deep learning methods.
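The PR-curves in Fig. 8 can be produced by sweeping a decision threshold over anomaly scores. A minimal sketch for binary labels; this ranking-based cumulative form is a standard construction, not the paper's code:

```python
import numpy as np

def pr_curve(y_true, scores):
    # Precision/recall pairs obtained by thresholding at each score,
    # from the highest score downwards.
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores))   # descending by score
    y = y_true[order]
    tp = np.cumsum(y)                         # true positives at each cut
    fp = np.cumsum(1 - y)                     # false positives at each cut
    precision = tp / (tp + fp)
    recall = tp / max(y_true.sum(), 1)
    return precision, recall
```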
domain, especially when the data have a stronger relative relationship in the feature domain than in the linkage. In addition, the fact that DD-GCN outperforms GCN-FA verifies the effectiveness of the attention mechanism. Our proposed Dual-Domain GCN is better deployed and more suitable for feature allocation and for learning representations for the insider threat detection task.

In addition to accuracy and F1-score, we also test the precision-recall curves (PR-curves) of the deep learning methods. The PR-curves are shown in Fig. 8, which presents the curves of DD-GCN and the competitive deep learning methods. Compared with GCN-FA and GCN, the performance gain of DD-GCN is significantly attributed to the introduction of both the topology domain and the feature domain. The recall at 90% precision is improved by around 7.32% and 9.84% on CERT r4.2 and r6.2, respectively. When comparing GCN-FA and GCN, the improvement is not prominent, which indicates that the performance lift from considering a single domain is not promising. Besides, DD-GCN, GCN-FA, and GCN perform significantly better than CNN, LSTM, and AE. Compared with the other competitive deep learning baselines, this promising performance indicates that DD-GCN can extract the most correlated user behavior information and distinguish normal and malicious users based on these activity tracks.

D. Model Complexity and Computational Time Analysis

As our proposed DD-GCN contains two convolution operations, the time complexity of DD-GCN depends on the numbers of nodes and edges in the two domains. Suppose the number of nodes in these two domains is n, m_T and m_F are the numbers of edges in the two domains, respectively, and m is the total number of edges in the two domains. For the training algorithm of DD-GCN, the numbers of layers of the convolution operations in the two domains are denoted as L_T and L_F; since they are the same, the number of layers is denoted as L. For simplicity, the dimensions of the node hidden features remain constant, denoted by d. The memory complexity of DD-GCN is O(Lnd + Kd²). The time complexity of DD-GCN is O(Lm_T d + Lnd²) + O(Lm_F d + Lnd²) = O(Lmd + Lnd²).

To investigate the computational performance of our proposed DD-GCN, we select Floating Point Operations (FLOPs) as a metric to examine the time performance against GCN-FA and GCN, since they have comparable accuracy. We choose the label rate of 60% and the CERT r6.2 dataset. The results of DD-GCN, GCN-FA, and GCN in terms of accuracy and efficiency trade-offs (latency) are shown in Table VI. The latency used in this section is in terms of running time; for comparison, we express latency in percent form, i.e., the less, the better. From this table, we find that DD-GCN unsurprisingly has higher FLOPs than GCN-FA and GCN, since DD-GCN conducts two domain convolutional operations and an attention mechanism, which take more time to process than the other methods. In the application of our proposed model and the other comparable graph-based paradigms, there is no significant difference in their latency due to the size of the datasets and the selection of features. In contrast, DD-GCN achieves the best accuracy with a clear margin.

V. SUMMARY AND FUTURE WORK

In this paper, we propose Dual-Domain GCN (DD-GCN), a novel graph-based model for insider threat detection. The key idea behind DD-GCN is fusing user behavior from structural topology and feature information. A weighted feature similarity mechanism is designed to convert user log entries into heterogeneous graphs by leveraging the relationship information between users and their corresponding activity features. We extract specific graph embeddings from the structural topology and the feature information simultaneously to analyze the graph, transforming behavior information into high-level representations. Meanwhile, an attention mechanism is applied to learn the importance weight of each node's embedding within these two domains so as to fuse them adaptively. The evaluation of DD-GCN on public datasets demonstrates that the DD-GCN-based detection model achieves improved detection accuracy and outperforms other state-of-the-art insider detection models based on traditional techniques.

In future work, we plan to apply the DD-GCN-based model to other practical applications to further evaluate its feasibility and scalability. We believe that anomaly detection models in many other areas (such as network traffic anomaly detection and fraud detection) could be created in a similar way to our proposed model, with only a few changes in the feature extraction process. In addition, since the number of labeled attack samples for insider threats is limited, we wonder whether a pre-trained GCN model is capable of tackling this situation. Thus, we will seek to incorporate a pre-training mechanism into our model in future work.

REFERENCES

[1] L. Liu, O. de Vel, Q.-L. Han, J. Zhang, and Y. Xiang, "Detecting and preventing cyber insider threats: A survey," IEEE Commun. Surveys Tuts., vol. 20, no. 2, pp. 1397–1417, 2nd Quart., 2018, doi: 10.1109/COMST.2018.2800740.
[2] U.S. Cybercrime Survey, CERT Division of the Software Engineering Institute and PricewaterhouseCoopers, 2015. [Online]. Available: http://www.pwc.com/us/en/increasing-it-effectiveness/publications/assets/2015-us-cybercrime-survey.pdf
[3] Verizon 2018 Data Breach Investigations Report, 2018. [Online]. Available: http://www.verizon-enterprise.com/resources/reports/
[4] Dtex Systems, 2018 Insider Threat Intelligence Report, 2018. [Online]. Available: http://www.dtex-systems.com/2018-insider-threat-intelligence-report/
[5] X. Feng, Z. Zheng, D. Cansever, A. Swami, and P. Mohapatra, "Stealthy attacks with insider information: A game theoretic model with asymmetric feedback," in Proc. IEEE Mil. Commun. Conf., Nov. 2016, pp. 277–282.
[6] B. A. Alahmadi, P. A. Legg, and J. R. C. Nurse, "Using Internet activity profiling for insider-threat detection," in Proc. 17th Int. Conf. Enterprise Inf. Syst., 2015, pp. 709–720.
[7] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. ICLR, 2017, pp. 1–14.
[8] Y. Chen, C. M. Poskitt, and J. Sun, "Learning from mutants: Using code mutation to learn and monitor invariants of a cyber-physical system," in Proc. IEEE Symp. Secur. Privacy (SP), May 2018, pp. 648–660, doi: 10.1109/SP.2018.00016.
[9] X. Shu, D. Yao, N. Ramakrishnan, and T. Jaeger, "Long-span program behavior modeling and attack detection," ACM Trans. Privacy Secur., vol. 20, no. 4, pp. 1–28, Nov. 2017, doi: 10.1145/3105761.
[10] P. A. Legg, O. Buckley, and M. Goldsmith, "Caught in the act of an insider attack: Detection and assessment of insider threat," in Proc. IEEE Symp. Technol. Homeland Secur., Apr. 2015, pp. 1–6.
[11] I. Homoliak, F. Toffalini, J. Guarnizo, Y. Elovici, and M. Ochoa, "Insight into insiders and IT: A survey of insider threat taxonomies, analysis, modeling, and countermeasures," ACM Comput. Surveys, vol. 52, no. 2, pp. 1–40, Mar. 2020.
[12] H. A. Kholidy, F. Baiardi, and S. Hariri, "DDSGA: A data-driven semiglobal alignment approach for detecting masquerade attacks," IEEE Trans. Dependable Secure Comput., vol. 12, no. 2, pp. 164–178, Jun. 2015.
[13] M. Du, F. Li, G. Zheng, and V. Srikumar, "DeepLog: Anomaly detection and diagnosis from system logs through deep learning," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, Oct. 2017, pp. 1285–1298, doi: 10.1145/3133956.3134015.
[14] A. Tuor, S. Kaplan, B. Hutchinson, N. Nichols, and S. Robinson, "Deep learning for unsupervised insider threat detection in structured cybersecurity data streams," 2017, arXiv:1710.00811.
[15] S. Yuan, P. Zheng, X. Wu, and Q. Li, "Insider threat detection via hierarchical neural temporal point processes," in Proc. IEEE Int. Conf. Big Data (Big Data), Dec. 2019, pp. 1343–1350.
[16] Y. Shen, E. Mariconti, P. A. Vervier, and G. Stringhini, "Tiresias: Predicting security events through deep learning," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, Oct. 2018, pp. 592–605, doi: 10.1145/3243734.3243811.
[17] T. Hu, W. Niu, X. Zhang, X. Liu, J. Lu, and Y. Liu, "An insider threat detection approach based on mouse dynamics and deep learning," Secur. Commun. Netw., vol. 2019, pp. 1–12, Feb. 2019.
[18] M. Oka, Y. Oyama, and K. Kato, "Eigen co-occurrence matrix method for masquerade detection," in Proc. 7th JSSST SIGSYS Workshop Syst. Program. Appl. (SPA), Tsukuba, Japan: Tsukuba Univ., 2004.
[19] J. Jiang et al., "Anomaly detection with graph convolutional networks for insider threat and fraud detection," in Proc. IEEE Mil. Commun. Conf. (MILCOM), Nov. 2019, pp. 109–114.
[20] F. Liu, Y. Wen, D. Zhang, X. Jiang, X. Xing, and D. Meng, "Log2vec: A heterogeneous graph embedding based approach for detecting cyber threats within enterprise," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, Nov. 2019, pp. 1777–1794, doi: 10.1145/3319535.3363224.
[21] M. N. Hossain et al., "SLEUTH: Real-time attack scenario reconstruction from COTS audit data," in Proc. USENIX Secur. Symp., 2017, pp. 487–504.
[22] S. Ma, J. Zhai, F. Wang, K. H. Lee, X. Zhang, and D. Xu, "MPI: Multiple perspective attack investigation with semantics aware execution partitioning," in Proc. USENIX Secur. Symp., 2017, pp. 1111–1128.
[23] Y. Tang et al., "NodeMerge: Template based efficient data reduction for big-data causality analysis," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, Oct. 2018, pp. 1324–1337, doi: 10.1145/3243734.3243763.
[24] J. Chen, T. Ma, and C. Xiao, "FastGCN: Fast learning with graph convolutional networks via importance sampling," in Proc. ICLR, 2018, pp. 1–15.
[25] Y. Ma, S. Wang, C. C. Aggarwal, and J. Tang, "Graph convolutional networks with EigenPooling," in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 723–731.
[26] M. Qu, Y. Bengio, and J. Tang, "GMNN: Graph Markov neural networks," in Proc. ICML, 2019, pp. 5241–5250.
[27] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, "Spectral networks and locally connected networks on graphs," in Proc. ICLR, 2013, pp. 1–14.
[28] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in Proc. NeurIPS, 2016, pp. 3844–3852.
[29] W. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in Proc. NeurIPS, 2017, pp. 1024–1034.
[30] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph attention networks," in Proc. ICLR, 2017, pp. 1–12.
[31] H. Pei, B. Wei, K. C. C. Chang, Y. Lei, and B. Yang, "Geom-GCN: Geometric graph convolutional networks," in Proc. ICLR, 2020, pp. 1–12.
[32] B. Xu, H. Shen, Q. Cao, Y. Qiu, and X. Cheng, "Graph wavelet neural network," in Proc. ICLR, 2019, pp. 1–13.
[33] F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M. Bronstein, "Geometric deep learning on graphs and manifolds using mixture model CNNs," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5425–5434.
[34] Q. Li, Z. Han, and X. Wu, "Deeper insights into graph convolutional networks for semi-supervised learning," in Proc. AAAI, 2018, pp. 3538–3545.


[35] N. T. Hoang and T. Maehara, "Revisiting graph neural networks: All we have is low-pass filters," 2019, arXiv:1905.09550.
[36] F. Wu, T. Zhang, A. H. D. Souza, C. Fifty, T. Yu, and K. Q. Weinberger, "Simplifying graph convolutional networks," in Proc. ICML, 2019, pp. 6861–6871.
[37] D. Bo, X. Wang, C. Shi, and H. Shen, "Beyond low-frequency information in graph convolutional networks," 2021, arXiv:2101.00797.
[38] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, "A comprehensive survey on graph neural networks," 2019, arXiv:1901.00596.
[39] W. D. K. Ma, J. P. Lewis, and W. B. Kleijn, "The HSIC bottleneck: Deep learning without back-propagation," in Proc. AAAI, 2019, pp. 5085–5092.
[40] J. Glasser and B. Lindauer, "Bridging the gap: A pragmatic approach to generating insider threat data," in Proc. IEEE Secur. Privacy Workshops, San Francisco, CA, USA, 2013, pp. 98–104, doi: 10.1109/SPW.2013.37.
[41] A. Paszke et al., "PyTorch: An imperative style, high-performance deep learning library," in Proc. NeurIPS, 2019, pp. 1–12.
[42] T. Pevny, "Loda: Lightweight on-line detector of anomalies," Mach. Learn., vol. 102, no. 2, pp. 275–304, 2016.
[43] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: Identifying density-based local outliers," in Proc. ACM SIGMOD, May 2000, vol. 29, no. 2, pp. 93–104.
[44] E. Pantelidis, G. Bendiab, S. Shiaeles, and N. Kolokotronis, "Insider threat detection using deep autoencoder and variational autoencoder neural networks," in Proc. IEEE Int. Conf. Cyber Secur. Resilience (CSR), Jul. 2021, pp. 129–134, doi: 10.1109/CSR51186.2021.9527925.

Ximing Li received the M.S. degree in cybersecurity from Villanova University in 2019. He is currently pursuing the Ph.D. degree in cyberspace security with the Key Laboratory of Trustworthy Distributed Computing and Service, Ministry of Education, Beijing University of Posts and Telecommunications. His current research interests include insider threat detection, graph neural networks, and network security.

Xiaoyong Li received the Ph.D. degree in computer science from Xi'an Jiaotong University, Xi'an, China, in 2009. He is currently a Professor and the Executive Director of the School of Cyberspace Security, Beijing University of Posts and Telecommunications. He has published more than 130 papers in prestigious journals and conference proceedings. His current research interests include network security, data security, and trusted distributed computing. In 2009, he was awarded an Outstanding Doctoral Graduate in Shaanxi Province, China. In 2012, he was awarded the New Century Excellent Talent in University, China. In 2015, he won the IET Information Security Premium Award. In 2019, he was selected as one of the Capital Science and Technology Leading Talents and won the First Prize of the Science and Technology Progress Award of the China Electronics Society. In 2021, he was selected into China's National Key Talent Plan. He has served as a member of the editorial boards of many journals, such as IEEE NETWORKING LETTERS and EURASIP Journal on Information Security.

Jia Jia (Graduate Student Member, IEEE) received the M.S. degree from North China Electric Power University in 2017. He is currently pursuing the Ph.D. degree in cyberspace security with the Key Laboratory of Trustworthy Distributed Computing and Service, Ministry of Education, Beijing University of Posts and Telecommunications. His current research interests include data security, network security, the Internet of Vehicles, and the Internet of Things.

Linghui Li (Member, IEEE) received the Ph.D. degree from the Institute of Computing Technology, University of Chinese Academy of Sciences. She was a Post-Doctoral Researcher with the Beijing University of Posts and Telecommunications (BUPT), where she is currently an Associate Professor of cyberspace security. Her current research interests are information security and artificial intelligence.

Jie Yuan (Member, IEEE) received the Ph.D. degree in cyberspace security from the Beijing University of Posts and Telecommunications. She has published several papers in journals and conference proceedings. Her current research focuses on cloud computing, network security, and trusted systems.

Yali Gao (Member, IEEE) received the Ph.D. degree from the Beijing University of Posts and Telecommunications, Beijing, China. She was a Post-Doctoral Researcher with the Beijing University of Posts and Telecommunications, where she is currently an Associate Professor of cyberspace security. Her current research interests include cyber security, distributed computing, and trusted services.

Shui Yu (Fellow, IEEE) received the Ph.D. degree from Deakin University, Australia, in 2004. He is currently a Professor with the School of Computer Science, University of Technology Sydney, Australia. His current H-index is 67. He has published five monographs, edited two books, and published more than 500 technical papers at different venues, such as the IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, the IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, the IEEE TRANSACTIONS ON COMPUTERS, the IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, the IEEE TRANSACTIONS ON MOBILE COMPUTING, the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, the IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, the IEEE/ACM TRANSACTIONS ON NETWORKING, and INFOCOM. He has been promoting the research field of networking for big data since 2013, and his research outputs have been widely adopted by industrial systems, such as Amazon cloud security. His research interests include cybersecurity, network science, big data, and mathematical modeling. He is an elected member of the Board of Governors of the IEEE VTS and ComSoc. He is a member of ACM and AAAS. He is also serving on the editorial boards of the IEEE COMMUNICATIONS SURVEYS AND TUTORIALS (Area Editor) and the IEEE INTERNET OF THINGS JOURNAL (Editor). He is also a Distinguished Visitor of the IEEE Computer Society. He served as a Distinguished Lecturer for the IEEE Communications Society from 2018 to 2021.
