
Available online at www.sciencedirect.com

ScienceDirect
Procedia Computer Science 139 (2018) 263–268
www.elsevier.com/locate/procedia
The International Academy of Information Technology and Quantitative Management,
the Peter Kiewit Institute, University of Nebraska
User Behavior Auditing in Electric Management Information
System based on Graph Clustering
Bingfeng Cui*, Hongbin Zhu
Department of Information and Communication Technology, State Grid Corporation of China, Beijing, 100000, China
Abstract
In this paper, we propose a user behavior auditing algorithm based on graph clustering. First, we record the user operations
in the electric management information system (MIS) and convert the log data into graph representation which includes not
only the operation itself but also the source and next step. Then the user log graph will be divided into sub-graphs with strong
inner connections, which stand for a continuous or specific user behavior. Next, the distance between behavior graphs is
defined based on their similarity. Finally, the clustering algorithm is applied in the behavior graphs to detect the abnormal
behavior. The experiment shows that the proposed method can effectively detect the abnormal behavior in a simulated electric
company management information system.
© 2018 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer review under responsibility of the scientific committee of The International Academy of Information Technology and
Quantitative Management, the Peter Kiewit Institute, University of Nebraska.
Keywords: Behavior auditing; Graph mining; Clustering

* Corresponding author. Tel.: +86-13911362102; fax: +86-10-66597486.
E-mail address: bfcui@sgcc.com.cn.
1877-0509 © 2018 The Authors. Published by Elsevier B.V.
10.1016/j.procs.2018.10.266

1. Introduction

Abnormal behavior detection is essential for the security of a management information system (MIS), especially for
infrastructure companies such as power grid, railway, and water supply operators. These companies require strict security
standards to avoid abnormal behavior that could cause huge damage. Even though many rules have been defined to
constrain operations in the MIS, it is still necessary to fully audit user behavior. Usually, the auditing is done
manually, based on statistical information from the system. For example, we can estimate the workload of a department
from its overtime records. However, detecting abnormal user behavior by hand is time-consuming and subjective.
In this paper, an automatic detection algorithm for abnormal user behavior is proposed based on graph analysis.
Compared with existing log analysis and abnormal detection methods, the proposed algorithm represents the user
behavior with a graph structure that preserves not only a single operation but also the relations among a set of
operations. Such a set suggests a behavior with meaningful semantic information, such as searching for an object and
revising a document according to it, or sending the search results to a colleague. Abnormal detection based on
the semantic behavior instead of a single user activity could be more accurate, since the relationships between
operations are considered.
Given the behavior graphs generated from user logs, the distance between two behaviors is defined from the
similarity of the two corresponding graphs. Then we cluster the user behaviors into different groups according to
their distances. Groups containing behaviors that are quite different from the others are identified as having a
high possibility of being abnormal and can be sent for further manual checking. Therefore, the
proposed framework can filter and identify the likely abnormal behaviors in an electric MIS, which
improves the processing efficiency and saves auditing manpower.
The rest of the paper is structured as follows. Section 2 introduces the related work in abnormal detection and
graph mining. The proposed methodology is described in Section 3. Experimental results on a simulated dataset are
given in Section 4 to illustrate the detection accuracy. Finally, Section 5 concludes the paper.

2. Related work

For abnormal detection, Fernandes et al. [1] compared Support Vector Machine (SVM) classification
and a number of clustering approaches to separate human from non-human users on Twitter in order to identify
normal human activity. Kang et al. [2] proposed a framework for user behavior analysis for bot detection in
online games. They focused on party play, which reflects the social activities among gamers. Yoo et al. [3]
presented a visual analytics system, LongLine, which enables interactive visual analyses of large-scale audit logs.
Bao et al. [4] proposed an anomaly detection algorithm that considers traces as sequence data and uses a
probabilistic suffix tree-based method to organize and differentiate significant statistical properties possessed by
the sequences. However, graph mining remains to be studied as a behavior representation for complex abnormal
detection.
Graph mining has been used in many applications. Aridhi et al. [5] gave an overview of existing data mining
and graph processing frameworks that deal with very big graphs. Anwar et al. [6] proposed a social graph
generation technique to model users' interactions, where ties (edges) between a pair of users (nodes) were
established only if they participated in at least one common group-chat session, and weights were assigned to the
ties based on the degree of overlap in users' interests and interactions. Maesa et al. [7] analyzed the
structure of the Bitcoin users graph, showing that some structural properties of the network are due to
unusual patterns in the user graph. Kent et al. [8] showed graph-based approaches to user classification and
intrusion detection with practical results. Khalilian et al. [9] applied graph mining to the opcode graphs of a
metamorphic family of malware to extract the frequent sub-graphs.

3. Methodology

First, we record the user behavior in a log file that contains the resources the user visited and the operations
performed. Then the behavior graph is generated from the log file. Next, the distance between behavior graphs is defined
according to the differences between their nodes and edges. Finally, a clustering algorithm is applied to detect abnormal
behavior based on the behavior graph distance. Figure 1 gives the flowchart of the proposed framework.

Fig. 1. Flowchart of graph based abnormal detection

3.1. Behavior Graph Generation

The behavior graph is generated from user logs that record the user activities and operations in the MIS. The log
file contains the timestamp, user id, user activity (login, logout, searching, copying, deleting, updating, inserting,
etc.) and related resource (table, file, network or computing). In the behavior graph, each node represents
a record in the log, including its resource and duration. An edge stands for the change from one record to
the next, including timestamp and activity information.
With this method, each log file generates one behavior graph, which may contain too much information,
for example the whole work of a user in one day. Since abnormal behavior may happen only
in a short period, the behavior graph has to be further divided so that each part represents a relatively independent
behavior with meaningful semantic information. In this paper, we first divide the user log files into segments that
each represent a complete user behavior. The segmentation is based on time gaps: if the timestamp difference
between two consecutive records is bigger than a threshold, the user log is divided into two parts between the
two records. This process is iterated until no more new parts are generated. Then a user behavior graph
is created for each log part.
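
The segmentation rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the record field names and the 30-minute threshold are assumptions, and a single pass over time-ordered records is used because it yields the same parts as splitting repeatedly until no gap exceeds the threshold.

```python
from datetime import datetime, timedelta

def split_log(records, max_gap):
    """Split time-ordered records into segments wherever the gap between
    two consecutive timestamps is bigger than max_gap."""
    segments = []
    current = []
    for record in records:
        if current and record["timestamp"] - current[-1]["timestamp"] > max_gap:
            segments.append(current)
            current = []
        current.append(record)
    if current:
        segments.append(current)
    return segments

# Hypothetical log of one user; field names and values are illustrative only.
t0 = datetime(2018, 1, 1, 9, 0)
log = [
    {"timestamp": t0, "activity": "login", "resource": "portal"},
    {"timestamp": t0 + timedelta(minutes=2), "activity": "search", "resource": "table_a"},
    {"timestamp": t0 + timedelta(minutes=5), "activity": "update", "resource": "table_a"},
    {"timestamp": t0 + timedelta(hours=3), "activity": "search", "resource": "table_b"},
    {"timestamp": t0 + timedelta(hours=3, minutes=1), "activity": "copy", "resource": "file_x"},
]

# The threshold is an assumed value; the paper does not state the one it used.
segments = split_log(log, max_gap=timedelta(minutes=30))
print([len(segment) for segment in segments])
```

Here the three-hour pause separates the morning records from the later ones, so each part can be turned into its own behavior graph.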
Initially, the graph is just a direct mapping from the user log. To improve the matching efficiency, nodes with
the same resource id are aggregated, and the duration of the new node is updated by summing the values
of the merged nodes. Likewise, two edges are aggregated if they connect the same nodes
and their activities are the same. The merged edge contains the activity attribute and a list recording the timestamps
of the edges it was merged from. The aggregated graph represents the accessed resources and user operations with
duration and timestamp information, which can be used to describe the user behavior.
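
A minimal sketch of this aggregation, using plain dictionaries rather than a graph library, is shown below. Several details are assumptions made for illustration: the record fields (`resource`, `activity`, `duration`, `timestamp`) are hypothetical names, and each edge is taken to carry the activity and timestamp of the later of the two records it connects, which the text does not specify.

```python
from collections import defaultdict

def build_behavior_graph(segment):
    """Aggregate one log segment into a behavior graph.

    Returns (nodes, edges):
      nodes maps resource id -> total duration spent on that resource;
      edges maps (source resource, target resource, activity) -> timestamps.
    """
    nodes = defaultdict(float)
    edges = defaultdict(list)
    # Records on the same resource collapse into one node; durations are summed.
    for record in segment:
        nodes[record["resource"]] += record["duration"]
    # Consecutive records form an edge. Edges that join the same pair of nodes
    # with the same activity are merged, keeping every timestamp.
    # Assumption: the edge takes the activity and timestamp of the later record.
    for prev, nxt in zip(segment, segment[1:]):
        key = (prev["resource"], nxt["resource"], nxt["activity"])
        edges[key].append(nxt["timestamp"])
    return dict(nodes), dict(edges)

# Hypothetical segment; timestamps and durations are in seconds.
segment = [
    {"timestamp": 0, "resource": "table_a", "activity": "search", "duration": 20.0},
    {"timestamp": 20, "resource": "file_x", "activity": "update", "duration": 60.0},
    {"timestamp": 80, "resource": "table_a", "activity": "search", "duration": 10.0},
    {"timestamp": 90, "resource": "file_x", "activity": "update", "duration": 30.0},
]
nodes, edges = build_behavior_graph(segment)
print(nodes)
print(edges)
```

The four records collapse into two nodes and two merged edges. The same attributes could equally be stored on a graph object from a graph library; the dictionary form is used here only to keep the sketch self-contained.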

3.2. Behavior Graph Distance

Abnormal detection needs to evaluate the difference between user behaviors. In this paper, we define the
behavior distance based on the similarity of their representative graphs. As defined in Section 3.1, the behavior graph
represents resources with nodes and user actions with edges. Therefore, the graph similarity can be computed
by combining node similarity and edge similarity.
A node in the behavior graph has two attributes: resource and duration. If two nodes have different resource attributes,
their similarity is defined as 0. Otherwise, their similarity is calculated by the formula
$1 - \frac{|\mathit{durationA} - \mathit{durationB}|}{|\mathit{durationA} + \mathit{durationB}|}$,
in which durationA and durationB are the durations of the two nodes.
If two nodes have the same duration, their similarity value is 1, and the bigger their duration difference, the
smaller their similarity value. The overall graph node similarity is defined as the sum of the similarities of
corresponding nodes (nodes with the same resource) divided by the total number of nodes.
Edge similarity is defined analogously. If the nodes connected by two edges are not corresponding
nodes (with the same resource attribute), or if their activity attributes differ, then the similarity of the two edges is
0. Otherwise, the two edges are corresponding edges and their similarity value is defined by
$1 - \frac{|\mathit{numA} - \mathit{numB}|}{|\mathit{numA} + \mathit{numB}|}$,
in which numA and numB are the numbers of elements in the timestamp lists of the two edges respectively. Then the
overall graph edge similarity is the sum of the similarities of corresponding edges divided by the total number of edges.
The overall similarity of two behavior graphs is the sum of their node similarity and edge similarity, normalized
so that the overall similarity of two identical graphs is 1 and that of two completely different graphs is 0,
which makes the values suitable for comparison.
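
The definitions above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: both attribute similarities are assumed to take the ratio form 1 − |a − b| / |a + b|, the "total number" of nodes or edges is taken as the size of the union over both graphs, the normalization is assumed to be a division of the sum by 2, and the distance is assumed to be 1 minus the similarity. A graph is a hypothetical `(nodes, edges)` pair, with `nodes` mapping resource to duration and `edges` mapping `(source, target, activity)` to a timestamp list.

```python
def ratio_similarity(a, b):
    """1 - |a - b| / |a + b|; two zero values count as identical."""
    if a + b == 0:
        return 1.0
    return 1.0 - abs(a - b) / abs(a + b)

def graph_similarity(graph_a, graph_b):
    nodes_a, edges_a = graph_a
    nodes_b, edges_b = graph_b

    # Node similarity: only nodes with the same resource correspond.
    # Assumption: "total nodes" is the union of resources in both graphs.
    all_nodes = set(nodes_a) | set(nodes_b)
    shared_nodes = set(nodes_a) & set(nodes_b)
    node_sum = sum(ratio_similarity(nodes_a[r], nodes_b[r]) for r in shared_nodes)
    node_sim = node_sum / len(all_nodes) if all_nodes else 1.0

    # Edge similarity: edges correspond when endpoints and activity match,
    # and are compared by the lengths of their timestamp lists.
    all_edges = set(edges_a) | set(edges_b)
    shared_edges = set(edges_a) & set(edges_b)
    edge_sum = sum(
        ratio_similarity(len(edges_a[e]), len(edges_b[e])) for e in shared_edges
    )
    edge_sim = edge_sum / len(all_edges) if all_edges else 1.0

    # Assumption: halve the sum so identical graphs score 1.
    return (node_sim + edge_sim) / 2

def graph_distance(graph_a, graph_b):
    # Assumption: distance is the complement of similarity.
    return 1.0 - graph_similarity(graph_a, graph_b)

g1 = ({"a": 30.0, "b": 90.0},
      {("a", "b", "update"): [20, 90], ("b", "a", "search"): [80]})
g2 = ({"a": 10.0, "b": 90.0},
      {("a", "b", "update"): [5]})
print(graph_similarity(g1, g2), graph_distance(g1, g2))
```

For `g1` and `g2`, the node similarity is (0.5 + 1) / 2 = 0.75 and the edge similarity is (2/3) / 2 = 1/3, so under these assumptions the overall similarity is 13/24.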

3.3. Abnormal Detection

Based on the defined behavior graph similarity, we can apply a clustering algorithm to detect abnormal
behaviors. Many clustering algorithms have been proposed for different applications. In this paper, we select
the DBSCAN method for its ability to group objects based on their pairwise distances. DBSCAN is a density-based
clustering algorithm: given a set of points in some space, it groups together points that are closely packed
(points with many nearby neighbors) and marks as outliers the points that lie alone in low-density regions (whose
nearest neighbors are too far away).
To implement the abnormal behavior detection, we first create the behavior graphs of a set of users over a
certain period, such as one month or one season. Then the distance matrix among these behavior graphs is
precomputed to speed up the processing. Next, the DBSCAN algorithm is run under different values of the threshold
that defines the maximum distance between two samples for them to be considered in the same neighborhood.
Finally, the clusters with few objects are considered abnormal behaviors and reported to the
system manager for further confirmation.
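
The detection step can be sketched with scikit-learn's `DBSCAN` on a precomputed distance matrix. This is a minimal illustration under assumptions: the toy matrix, the `min_samples` value and the `small_cluster` cut-off are hypothetical, and flagging both noise points and clusters below the cut-off is one possible reading of "clusters with few objects".

```python
from collections import Counter

import numpy as np
from sklearn.cluster import DBSCAN

def flag_abnormal(distance_matrix, eps, min_samples=2, small_cluster=3):
    """Return indices of behaviors that DBSCAN marks as noise (label -1)
    or places in a cluster with fewer than small_cluster members."""
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
    labels = model.fit_predict(distance_matrix).tolist()
    sizes = Counter(labels)
    return [
        i for i, label in enumerate(labels)
        if label == -1 or sizes[label] < small_cluster
    ]

# Toy distance matrix (hypothetical values): behaviors 0-3 are mutually close,
# 4 and 5 are close only to each other, and 6 is far from everything.
D = np.full((7, 7), 0.9)
D[:4, :4] = 0.1
D[4:6, 4:6] = 0.1
np.fill_diagonal(D, 0.0)

# Run under different thresholds, as described above.
for eps in (0.3, 0.95):
    print(eps, flag_abnormal(D, eps))
```

With the small threshold the pair and the isolated behavior are flagged; with the large threshold every behavior falls into one cluster and nothing is flagged, which is why the threshold has to be tuned.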

4. Results

To test the proposed method, we employ SimPy, a Python-based event simulation tool, to generate the user log
files. The extracted behavior graphs are saved and analyzed with NetworkX, a Python package for the creation,
manipulation, and study of the structure, dynamics, and functions of complex networks. The distance between
graphs is implemented on top of it. The DBSCAN algorithm is run with scikit-learn (sklearn), a Python package for
machine learning, on the distance matrix of the user behavior graphs.

4.1. Behavior graph generation

We simulate 300 users and generate a log file in which each record has the following attributes:
timestamp, user id, login IP address, user client interface (PC, mobile, web or others), operation type, operation
duration, and database table. In total, 8948 records covering one day are generated for testing, so each user has
about 30 operations during the day.
The log records are grouped by user id to create the behavior graph of each user. Some examples are given in Fig. 2.
Fig. 2a shows a user behavior related to 7 resources and their connections, and Fig. 2b shows another
user behavior related to 4 resources.

(a) (b)

Fig. 2. Generated User Behavior Graphs

4.2. DBSCAN clustering results

Based on the generated behavior graphs and the distance definition, we can calculate the distance matrix for graph
clustering. Fig. 3 gives the distance matrix generated for test data in which 50 users are analyzed. In Fig. 3, each
cell represents the distance between the behaviors of its row and column. A lighter color indicates a smaller distance.
We can observe that certain groups of users working on the same tasks are closer to each other than to others.
Considering the complexity of the distance matrix calculation, it is suggested to divide the users into groups first and
then calculate the matrix within each group. Since users can easily be divided according to their rank, task and
obligation, the distance matrix for the users in the same group can be computed quickly.
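
As a rough illustration of the saving, assume the $n$ users split into $k$ groups of equal size. This is an idealization, and the group count $k = 6$ below is a hypothetical value chosen only to make the arithmetic concrete. The number of pairwise graph comparisons then drops as follows:

```latex
\[
\binom{n}{2} = \frac{n(n-1)}{2}
\quad\longrightarrow\quad
k\binom{n/k}{2} = \frac{n\,(n/k - 1)}{2},
\]
\[
n = 300,\ k = 6:\qquad
\frac{300 \cdot 299}{2} = 44850
\quad\longrightarrow\quad
6 \cdot \frac{50 \cdot 49}{2} = 7350 .
\]
```

Under this assumption the work shrinks by roughly a factor of $k$, at the cost of never comparing behaviors across groups.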

Fig. 3. Distance Matrix of Behavior Graphs

Based on the distance matrix, the DBSCAN algorithm is applied to identify the abnormal behaviors. In the simulated
user log, we create 30 abnormal users who act differently from the others. The abnormal users can be classified into
four types: unauthorized file copying, account stealing, low-efficiency working, and abnormal accessing (login at
an unusual place and time). According to the experiment results, 90% of the abnormal behaviors are detected when
we set the neighborhood distance parameter (epsilon) of DBSCAN to 0.43 in our case. The epsilon parameter should
be adjusted for each application to achieve high detection accuracy. The traditional rule-based
abnormal detection method achieves 72% accuracy on the test dataset. Therefore, the proposed algorithm is
more effective for abnormal behavior detection than the baseline method.

5. Conclusions

In this paper, we propose a graph-based algorithm that describes user behavior in an electric management
information system and detects abnormal behavior by defining a graph distance and clustering on it. The simulation
results indicate that the proposed method can represent user behavior and separate the abnormal behaviors from the
normal ones. However, the neighborhood distance parameter (epsilon) of the DBSCAN clustering
algorithm has to be adjusted to improve the abnormal detection accuracy.

References

[1] M.A. Fernandes, P. Patel, T. Marwala, Automated Detection of Human Users in Twitter, Procedia Computer Science, Volume 53,
2015, Pages 224-231.
[2] Ah Reum Kang, Jiyoung Woo, Juyong Park, Huy Kang Kim, Online game bot detection based on party-play log analysis, Computers
& Mathematics with Applications, Volume 65, Issue 9, 2013, Pages 1384-1395.
[3] Seunghoon Yoo, Jaemin Jo, Bohyoung Kim, Jinwook Seo, LongLine: Visual Analytics System for Large-scale Audit Logs, Visual
Informatics, Volume 2, Issue 1, 2018, Pages 82-97.
[4] Liang Bao, Qian Li, Peiyao Lu, Jie Lu, Tongxiao Ruan, Ke Zhang, Execution anomaly detection in large-scale systems through
console log analysis, Journal of Systems and Software, Volume 143, 2018, Pages 172-186.
[5] Sabeur Aridhi, Engelbert Mephu Nguifo, Big Graph Mining: Frameworks and Techniques, Big Data Research, Volume 6, 2016, Pages
1-10.
[6] Tarique Anwar, Muhammad Abulaish, A social graph based text mining framework for chat log investigation, Digital Investigation,
Volume 11, Issue 4, 2014, Pages 349-362.
[7] Damiano Di Francesco Maesa, Andrea Marino, Laura Ricci, Detecting artificial behaviours in the Bitcoin users graph, Online Social
Networks and Media, Volumes 3–4, 2017, Pages 63-74.
[8] Alexander D. Kent, Lorie M. Liebrock, Joshua C. Neil, Authentication graphs: Analyzing user behavior within an enterprise network,
Computers & Security, Volume 48, 2015, Pages 150-166.
[9] Alireza Khalilian, Amir Nourazar, Mojtaba Vahidi-Asl, Hassan Haghighi, G3MD: Mining frequent opcode sub-graphs for
metamorphic malware detection of existing families, Expert Systems with Applications, Volume 112, 2018, Pages 15-33.
