
Management & Engineering 33 (2018) 1838-5745

Contents lists available at SEI

Management & Engineering


journal homepage: www.seidatacollection.com

Research on Internet Financial Fraud Detection Based on Knowledge Graph
Yanru XUE ∗, Lei HUANG

School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China

KEYWORDS
Internet Finance, Internet Financial Fraud, Knowledge graph, Fraud detection model

ABSTRACT
In recent years, with the development of Internet finance, the proportion of Internet financial fraud has been rising. At the same time, fraud organizations are becoming increasingly global, which makes fraud detection more complex and brings new challenges to traditional methods. Knowledge graph technology was introduced by Google to improve its search engine. It uses entities, attributes and relationships to describe real-world entities and stores them in a database, which offers a new approach to detecting financial fraud. This article combines knowledge graph techniques with mathematical models and neighborhood knowledge to build an Internet financial fraud detection model. First, it introduces the types of Internet financial fraud, analyzes the risks of Internet financial fraud in depth, and finds a breakthrough point for detecting fraudulent identity records. Second, it describes how to construct the knowledge graph. Finally, we use the knowledge graph to detect the likelihood of fraud, giving client institutions a clearer grasp of their users and thereby reducing the proportion of Internet financial fraud.

© ST. PLUM-BLOSSOM PRESS PTY LTD

∗Corresponding author.
E-mail address: 16120620@bjtu.edu.cn

English edition copyright © ST. PLUM-BLOSSOM PRESS PTY LTD


DOI:10.5503/J.ME.2018.33.004
1 Introduction

With the explosive growth of Internet finance, Internet financial fraud incidents have caused huge economic losses to society. Financial fraud accounts for 5% of the global economy, with annual losses of more than 2.3 trillion euros, and the proportion of Internet financial fraud has risen year by year. Because the Internet reaches most of society, fraud damages both social stability and people's property. Knowledge graph technology can handle the complex identification of Internet financial fraud and is of great significance to the new generation of Internet financial fraud detection technology.

2 Internet Financial Fraud

2.1 Research on Internet Finance

In recent years, with the development of the Internet and the spread of its infrastructure, traditional businesses have continued to evolve. 2013 was called the first year of "Internet finance" in China. The Internet financial industry has developed rapidly, and China is one of the fastest-growing countries. Internet finance combines the characteristics of the Internet and of finance, such as virtuality and real-time response. Conducting transactions over the Internet greatly reduces trading time and simplifies transaction processes, which makes them more convenient for users. At the same time, the transaction process requires a high degree of security.

2.2 Internet Financial Fraud Detection

Internet financial fraud presents two major characteristics: increasing complexity and globalization. Complexity means that, in order to complete transactions, fraudulent users conceal part of their identity information. At the same time, Internet transactions form information islands, so the evaluation of each user is insufficient and the complexity of fraud detection increases. On the other hand, Internet-related policies and laws differ greatly between countries. Internet financial fraud has brought serious economic losses to the country and its people, so it is very important to establish an effective fraud identification model.

The primary issue in effectively identifying Internet financial fraud is perceiving its risks. The risks involved in an Internet financial transaction include information asymmetry risk, moral hazard, operational risk and liquidity risk, and information asymmetry is the primary one. This paper mainly addresses the risk of information asymmetry through knowledge graph technology.

3 Knowledge Graph Technology

3.1 The development of knowledge graph

The Linking Open Data project promoted the development of RDF databases and graph databases. Google first proposed the concept of the knowledge graph in 2012. Half a year later, its knowledge graph semantic network contained 570 million "objects" and more than 18 billion "facts" and "relationships", which were used to improve search results. The knowledge graph quickly gained widespread attention.

The knowledge graph is essentially a semantic network with a graph-based data structure consisting of nodes and edges. In a knowledge graph, each node represents an "entity" that exists in the real world, and each edge represents a "relationship" between entities. The knowledge graph is a highly efficient way of expressing relationships. Each entity or concept is identified by a globally unique ID, attribute-value pairs characterize the intrinsic properties of an entity, and relationships connect two entities and describe the association between them. The entities in a knowledge graph are becoming ever richer, ranging over
people, places, organizations, movies, music, books, software, video games and so on, and together they give an objective description of the real world.

Figure 1 The diagram of knowledge graph

3.2 The construction of knowledge graph

Building a knowledge graph relies on knowledge-related data processing techniques. The construction process consists of "entity linking", "relation extraction", "knowledge reasoning" and "knowledge representation". To illustrate how a knowledge graph is constructed, we describe how the movie "Chinese Partner" is represented in one.

Figure 2 The construction of knowledge graph

The first step in building a knowledge graph is entity linking, that is, determining how each entity is connected. First, the information about "Chinese Partner" is stored in the knowledge graph, and any information about the entity can then be linked to it; this associated information is what users look for. The second step is to build the relationships between different entities. The third step is knowledge reasoning. The real world follows certain social rules and other rules, and applying these rules to the entities reveals further connections between them. The last step is knowledge representation: how the relationships of the knowledge graph are stored and analyzed in a graph database is the key to applying the knowledge graph.

4 Internet Financial Fraud Recognition Process Based on Knowledge Graph

4.1 The construction of the model

The types of entities stored in a knowledge graph include "people", "animals", "places", etc. When knowledge graph technology is applied to fraud detection, the relevant entity type is mostly "people". When an Internet user starts a financial transaction, the legitimacy, authenticity and completeness of the user's identity are essential for fraud detection, and the core problem is how to use knowledge to distinguish illegal and false identities. To make platform users more intuitively aware of an applicant's identity, we establish a model that scores the identity in each application. The higher the score, the more suspicious the identity and the higher the likelihood of fraud.

The model is mainly intended for detection on Internet financial platforms. Identity records, which contain specific features such as name, age, home address and ID number, enter the fraud detection model. The model evaluates the identity records with the assistance of other data records (identity record type standards and information, historical databases, etc.) and returns fraud scores and reason codes. The overall fraud detection model is shown in Figure 3.
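To make the role of the knowledge graph in this process more concrete, the sketch below stores identity records as (entity, attribute, value) triples, each record keyed by a unique ID as described in Section 3.1, and applies one simple knowledge-reasoning rule in the spirit of Section 3.2: two records that share the value of a linking attribute are treated as related. It is a minimal in-memory illustration under stated assumptions, not the authors' graph database; the class name, record IDs, attribute names and the set of linking attributes are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical choice of attributes whose shared values link two records.
LINKING_ATTRIBUTES = {"mobile_phone", "bank_card", "home_address"}


class IdentityGraph:
    """Minimal in-memory store of (entity, attribute, value) triples."""

    def __init__(self):
        self.triples = set()
        # Reverse index: (attribute, value) -> IDs of records that carry it.
        self.by_value = defaultdict(set)

    def add_record(self, record_id, attributes):
        """Store one identity record as triples keyed by its unique ID."""
        for attribute, value in attributes.items():
            self.triples.add((record_id, attribute, value))
            self.by_value[(attribute, value)].add(record_id)

    def infer_links(self):
        """Reasoning rule: records sharing a linking attribute value are related."""
        links = set()
        for (attribute, _value), record_ids in self.by_value.items():
            if attribute not in LINKING_ATTRIBUTES:
                continue
            for a, b in combinations(sorted(record_ids), 2):
                links.add((a, "shares_" + attribute, b))
        return links


if __name__ == "__main__":
    graph = IdentityGraph()
    graph.add_record("app-001", {"name": "Li Ming", "mobile_phone": "18892039484"})
    graph.add_record("app-002", {"name": "applicant B", "mobile_phone": "18892039484"})
    graph.add_record("app-003", {"name": "applicant C", "mobile_phone": "13800000000"})
    print(graph.infer_links())  # {('app-001', 'shares_mobile_phone', 'app-002')}
```

Only the first two hypothetical applications share a mobile number, so a single link is inferred. A real deployment would keep such triples in a graph database, as the paper proposes, rather than in Python sets.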

Figure 3 The model of fraud detection

This model checks identity records with three modules: the basic type of identity record, the score associated with historical identity records, and the intimate relationships affecting the score. The first module checks the authenticity of the basic types in the identity record. It requires type-related standards and supporting information, for example to determine whether the phone number in the record is valid and whether the e-mail address conforms to the e-mail format. The second and third modules are supported by the historical database. The second module determines a fraud score by examining the record's graph network model, and the third module determines a fraud score by building a graph network of this identity record and the identity records associated with it. Finally, the scores of the three parts are merged into a final score with the corresponding reason codes.

Figure 4 Internet financial fraud detection model (diagram labels: entity record; the basic type of identity record; standards; the score associated with the historical record; the intimate relationship affecting the score; history data)

4.2 The basic type of identity record

The first step in fraud detection is to review and authenticate the basic identity information, which is shown in Table 1. In this paper, Li Ming, an applicant on an Internet financial platform, is used to illustrate the model evaluation process.

Table 1 Information required by the Internet financial application business

Number  Type                   Number  Type
1       Li Ming                8       Beijing
2       man                    9       199
3       32                     10      Beijing
4       139092830394039201     11      6883948
5       18892039484            12      id
6       liming@163.coml        13      Personal credit report
7       6330293388993302933    14      others

The basic information needs support from credit agencies, public security departments, the telecommunications sector and other basic information departments. For example, verifying the authenticity of ID cards, names and gender requires the public security department to check whether the ID card number is valid. Besides, the validity of the phone number should also be checked. When verifying basic record information, attention should be paid to the fluidity of the information and, last but not least, to characteristics of the information that change over time.

Table 2 Information required by the Internet financial application business

Type  Weight  Score    Type  Weight  Score
1     100     10       8     70      8
2     100     10       9     50      0
3     100     10       10    80      10
4     100     10       11    80      9
5     80      10       12    100     10
6     50      3        13    100     10
7     80      9        14    90      7
Total score: 8.9766
Reason code: E-mail and postal code information is not perfect; part of the personal credit report information is missing.

4.3 The score associated with the historical identity record

The historical database stores the records of previous applications. The entities, attributes and connections of the knowledge graph are stored in a graph database, and the second and third parts of the model are based on mining the relevant models in the graph database. Testing against historical records reveals the authenticity and completeness of the identity information in an application.

First, we connect the historical identity records associated with each type in the current identity record and retrieve them from the historical database. The current application records include: name, ID number, mobile phone number, e-mail, bank card number, home address, postal code, work unit, office phone and personal credit report. This yields sub-databases at different levels, which reflect the identity from different dimensions. For example, a sub-database derived from home addresses can generally be used to describe identity application information from the same family.

The threshold is expressed as H/M/L (high/middle/low). The higher the score, the lower the likelihood of fraud. The final score is collected by the above model, and a certain weight can be assigned in a specific implementation.

Table 3 The score associated with the historical identity record

Type  Threshold  Score    Type  Threshold  Score
1     H          0        10    L          9
2     M          9        11    H          9
5     H          5        12    H          8
6     M          0        13    H          7
7     H          9        14    H          8
8     L          4        3
9     L          0        4
Total score: 68
Reason code: ID card number information has a fraud history; postal information is missing; e-mail information is incomplete.

4.4 The intimate relationship affecting the score

There are certain links between fraudulent identities, such as a shared workplace, IP address, bank card transfer account, communication contact or postal code, which may expose clues. The model maps the real-world community of people and their intimate relationships into the database. When one node is fraudulent, the likelihood that its associated entities are also fraudulent increases. For a close relationship, that is, a strong association between two entities, once one of them is found to be fraudulent, the likelihood of exposing the others in the community increases.

For Li Ming's identity record, this part is calculated as follows: retrieve the information that has an intimate relationship with Li Ming's record; index the different sub-models; compare the normal model with the fraud model to obtain the score; and finally output the module score and the reason codes. Each case can also be assigned a certain weight. Table 4 shows Li Ming's intimate relationship impact assessment score. Li Ming's identity record is summarized by the scores of the three modules, and the final score and reason codes are shown in Table 5.
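The paper does not state how the per-item weights and scores of Table 2 are combined into the first module's total. One natural reading, assumed here, is a weight-normalized mean, total = sum(w_i * s_i) / sum(w_i), taking the first number in each Table 2 row as the weight w_i and the second as the item score s_i on a 0 to 10 scale. Under that assumption the sketch below gives about 8.7966, whereas the paper reports 8.9766, so either a different aggregation rule was used or two digits were transposed. The function names and the low-score threshold are hypothetical.

```python
# Rows of Table 2 as (type number, weight, score).
# Reading the first number as the weight is an assumption.
TABLE_2 = [
    (1, 100, 10), (2, 100, 10), (3, 100, 10), (4, 100, 10),
    (5, 80, 10), (6, 50, 3), (7, 80, 9), (8, 70, 8),
    (9, 50, 0), (10, 80, 10), (11, 80, 9), (12, 100, 10),
    (13, 100, 10), (14, 90, 7),
]


def weighted_basic_score(rows):
    """Weight-normalized mean of item scores: sum(w * s) / sum(w)."""
    total_weight = sum(weight for _, weight, _ in rows)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(weight * score for _, weight, score in rows) / total_weight


def low_scoring_items(rows, threshold=5):
    """Type numbers whose item score falls below a hypothetical threshold."""
    return [type_no for type_no, _, score in rows if score < threshold]


if __name__ == "__main__":
    print(round(weighted_basic_score(TABLE_2), 4))  # 8.7966
    print(low_scoring_items(TABLE_2))  # [6, 9]
```

The two low-scoring items are type 6 (the e-mail address liming@163.coml) and type 9 (the postal code 199) in Table 1, which matches the first reason code reported for this module.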

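Section 4.4 describes how suspicion propagates along intimate relationships such as a shared workplace, IP address, bank transfer account, contact or postal code. The sketch below illustrates one possible retrieval step for the third module: a breadth-first search from the applicant over a relationship graph, up to a hop limit, that reports any known fraudulent records it reaches together with a reason-code string. The graph contents, record IDs, the `max_hops` parameter and the reason-code wording are assumptions for illustration, and the adjacency lists are assumed to already contain both directions of each relationship; the paper's comparison of normal and fraud sub-models is not reproduced.

```python
from collections import deque


def find_fraud_links(graph, start, known_fraud, max_hops=2):
    """Breadth-first search from `start` over relationship edges.

    graph: dict mapping a record ID to a list of (relation, neighbour ID) pairs.
    Returns (record ID, hop distance, relation used to reach it) for every
    known fraudulent record within `max_hops` of the start record.
    """
    seen = {start}
    queue = deque([(start, 0)])
    hits = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbour in graph.get(node, []):
            if neighbour in seen:
                continue
            seen.add(neighbour)
            if neighbour in known_fraud:
                hits.append((neighbour, depth + 1, relation))
            queue.append((neighbour, depth + 1))
    return hits


def relationship_reason(hits):
    """Turn search results into a reason-code string (hypothetical wording)."""
    if not hits:
        return "No intimate relationship with fraudulent identity records found"
    return "; ".join(f"{rid} linked via {rel} at {d} hop(s)" for rid, d, rel in hits)


if __name__ == "__main__":
    relations = {
        "li_ming": [("same_workplace", "rec_17"), ("same_ip", "rec_42")],
        "rec_42": [("bank_transfer", "rec_99")],
    }
    hits = find_fraud_links(relations, "li_ming", known_fraud={"rec_99"})
    print(hits)  # [('rec_99', 2, 'bank_transfer')]
    print(relationship_reason(hits))  # rec_99 linked via bank_transfer at 2 hop(s)
```

Breadth-first order means each fraudulent record is reported at its shortest hop distance, which is one simple way to express that closer relationships carry stronger evidence.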
Table 4 Li Ming intimate relationship impact assessment score

Type  Threshold  Score    Type  Threshold  Score
1     H          7        9     H          9
5     M          9        10    L          9
7     H          5        11    L          8
8     M          0
Total score: 47
Reason code: The information is relatively complete; no intimate relationship with fraudulent identity records was found.

Table 5 Final score and reason codes of Li Ming's identity record

Part one    Total score: 8.9766/14
            Reason code: E-mail and postal code information is not perfect; part of the personal credit report information is missing.
Part two    Total score: 68/120
            Reason code: ID card number information has a fraud history; postal information is missing; e-mail information is incomplete.
Part three  Total score: 47/70
            Reason code: The information is relatively complete; no intimate relationship with fraudulent identity records was found.

This model obtains the final fraud score and reason codes by summarizing the information from the three modules. The fraud detection model feeds this information back to the Internet financial portal, helping it judge potential fraud as early as possible. For identity records with high fraud scores, the portal can contact the customer to verify the information. By detecting identity records, this model reduces the possibility of fraud to a certain extent and reduces the losses of Internet financial portals.

5 Conclusion

This paper presents a new identity record detection model that uses a knowledge graph to examine the identities of applicants on Internet financial portals. The application information of different platforms is stored in a graph database. The detection process is divided into three steps: basic evaluation of the identity record, evaluation against historical identity records, and evaluation of intimate relationships. The calculation process is based on mathematical modeling, neighborhood knowledge and a graph network model. The model returns the score of the identity record and the reason codes to the transaction portal, so that business staff can judge the applicant's likelihood of fraud more accurately and reduce fraud as early as possible.

In future research, the calculation process of each module could be integrated into XML to automate it. At the same time, the normal model and the fraud model in the knowledge graph can be compared. Knowledge graph technology can also be applied to more identity applications: the more information the major portals share, the faster they can build the knowledge graph and thus reduce the occurrence of fraud.
