
White Paper:

Multi-Model Identifies Fraud At Scale

By Arthur Keen (Senior Solution Architect, ArangoDB)


May 2020
Table of Contents

The Significance of Fraud and Graphs
Why Multi-Model for Fraud Detection?
Converting from Relational Source to Multi-model Graph
Fraud Questions
Detect Fraud Rings From a Suspicious Account
Detect All Fraud Rings
Find Orphan Accounts
Find Most Influential Customers and Accounts
What are the top 3 most influential accounts?
Finding Money Laundering Patterns
Detecting Fraud At Scale
Conclusion
Hands-on with Fraud Detection & Anti-Money Laundering
Appendix A: Queries

The Significance of Fraud and Graphs 
Fraud is an enormous and ever-growing problem that affects all industries and government services. Global fraud results in over $3.7 trillion in losses annually. Businesses lose on average 5% of their income to fraud every year. In 2018, businesses incurred $3.13 in remediation costs for each dollar of fraud [1], dealing with chargebacks, fees, interest, and labor.
  
Traditional fraud detection views data through a straw, focusing on discrete data points such as specific accounts, individuals, devices, or IP addresses. However, today's sophisticated fraudsters escape detection by forming fraud rings composed of stolen and synthetic identities and circuitous back channels.
 
To uncover fraud rings, it is essential to look beyond individual data points in individual data sources. You need a broader view of the connection patterns that exist across multiple data modalities¹. These are multiple disparate data sources that store individual activities and relationships, and they need to be analyzed in concert to detect complex fraudulent behavior.
  
ArangoDB’s native multi-model architecture is ideal for tackling this challenge, because it supports graphs, documents, key-value stores, and relational models. It provides streamlined, flexible, and agile harmonization of the relevant multi-modal² user activity data. It delivers the performance and scale needed to detect complex fraud patterns, and it serves results in the different data models that stakeholders need.
  

¹ Multimodal data. Our experience of the world is multimodal: we see objects, hear sounds, feel textures, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced. A research problem is characterized as multimodal when it includes multiple such modalities.
² This juxtaposition of multi-model and multi-modal is deliberate; they are orthogonal terms.

 
Figure 1: Identify fraud patterns in the network of transactions and relationships. 
 
 

Why Multi-Model for Fraud Detection? 


ArangoDB’s multi-model graph allows you to easily fuse together disparate data and identify complex fraudulent patterns of connections, such as fraud rings, using the ArangoDB Query Language (AQL).
 
Identifying fraud ring patterns requires very deep (multi-hop) traversals across the graph. The query for detecting a fraud ring takes only six lines of AQL code, which are easy to write and maintain. ArangoDB can execute such queries with sub-second response times.
 
With a multi-model database, you do not have to convert the entire dataset to a graph to do this. Use graphs only where the analytics need them. Multi-model graphs allow you to combine documents, joins, and graphs to solve this problem.
 
   

Converting from Relational Source to Multi-model Graph 

The source of data for fraud detection is likely to be a relational database. Figure 3 shows an example schema with the foreign key relationships among the Bank, Branch, Customer, Account, and Transaction tables.
 

 
Figure 3: Relational Source Schema 
 
How do we convert this to a graph in ArangoDB? Because ArangoDB is a multi-model database, the tables can be ingested as-is, directly into ArangoDB as collections. For example, the Bank table becomes the Bank collection. You can then choose whether to convert all or part of the data to a graph model, based on the requirements for fraud analytics.
 
Since we need to do deep link/traversal analytics on the account transactions and the customers, it makes sense to add graph edges in this area of the schema. In this transformation, we use the Transaction collection as an edge collection and materialize the CustomerAccount foreign key as the AccountHolder edge.
 

 
 
Figure 4: Multi-Model Schema: Documents, Joins, Graph 
 
We use the convention of converting foreign key relationships to edges that are directed from the dependent entity to the independent entity. Resolution entities (also known as join tables) in the relational model can be used as edges in a graph model, as we have done with Transaction.
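Below is a minimal Python sketch of the conversion just described. It works on in-memory rows rather than a live ArangoDB instance. Some names are assumptions for illustration: the row fields `id`, `source_account_id`, `target_account_id`, and `amount`, and the collection name `accountHolder`. The `customer_id` field matches the attribute the Appendix A queries read from accounts. The `_key`, `_from`, and `_to` attributes follow ArangoDB's conventions for documents and edges.

```python
def to_multimodel(customers, accounts, transactions):
    """Convert relational rows into ArangoDB-style documents and edges."""
    # Tables become document collections; the primary key becomes _key.
    db = {
        "customer": [{"_key": str(c["id"]), **c} for c in customers],
        "account": [{"_key": str(a["id"]), **a} for a in accounts],
    }
    # The account -> customer foreign key becomes an AccountHolder edge,
    # directed from the dependent entity (account) to the independent one.
    db["accountHolder"] = [
        {"_from": f"account/{a['id']}", "_to": f"customer/{a['customer_id']}"}
        for a in accounts
    ]
    # Each Transaction row (a resolution entity between two accounts)
    # becomes an edge from the source account to the target account.
    db["transaction"] = [
        {
            "_from": f"account/{t['source_account_id']}",
            "_to": f"account/{t['target_account_id']}",
            "amount": t["amount"],
        }
        for t in transactions
    ]
    return db


# Hypothetical sample rows
customers = [{"id": 1, "Name": "Alice"}]
accounts = [
    {"id": 10, "customer_id": 1, "account_type": "checking"},
    {"id": 11, "customer_id": 1, "account_type": "savings"},
]
transactions = [{"source_account_id": 10, "target_account_id": 11, "amount": 500}]

db = to_multimodel(customers, accounts, transactions)
print(db["transaction"][0])  # {'_from': 'account/10', '_to': 'account/11', 'amount': 500}
```

Every other table (Bank, Branch) would stay a plain document collection, in line with the idea of using graphs only where the analytics need them.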
 
 

Fraud Questions 
We will describe how to use ArangoDB to answer various questions:   
 
● Are there potential fraud rings connected to a suspicious account? 
● Are there any potential fraud rings in my data? 
● Are there orphan accounts (those not transacting)?  
● Who are the most influential customers/accounts in transactions? 
● Are there any money laundering patterns? 
 
The following sections show how these questions can be answered in ArangoDB using synthetically generated transaction data. The queries are examples for detecting the patterns in this synthetic data set. They are meant to inspire practitioners to develop real-world fraud detection capabilities on ArangoDB with real data.

Detect Fraud Rings From a Suspicious Account 


 
Fraud rings consist of very long loops of transactions and relationships among individuals, which fraudsters use to evade detection. These long loops also appear in sophisticated cybercrime, where perpetrators create long paths of logins across multiple systems to avoid detection. These long paths are difficult to detect and understand because finding them requires deep multi-hop traversals into the graph of transactions and relationships among the individuals collaborating in the fraud.
 
In conventional systems, these multi-hop queries require a large number of joins, which can take a substantial amount of time and consume a large amount of computing resources. ArangoDB's graph model supports high-performance multi-hop queries. For example, 10-hop queries on large datasets can take less than 10 milliseconds, depending on the topology of the graph. In this example, the query finds long loops of transactions that start at a suspicious account and loop back to it over 5 to 10 transaction hops.
 
Figure 5 shows the fraud ring detection query, written in the ArangoDB Query Language (AQL), being developed and executed in the ArangoDB administrative panel. This sophisticated query is expressed in 6 lines of AQL code, and the compact representation is easy to understand and maintain. The query results are displayed as a circuit in the graph visualization. They are also available as JSON, so applications calling this query can process them. The query is parameterized by the suspicious account and the number of loops to return.
 

 
Figure 5: Finding fraud ring(s) from a suspicious account
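To make the traversal logic concrete, here is a hypothetical Python sketch of the same search on an in-memory list of `(from_account, to_account)` transactions. The AQL constructs it mirrors are:

- the outbound walk from the start account;
- `PRUNE`, which stops expanding a path once it returns to the start;
- the `5..10` depth range with `FILTER`, which keep only loops of 5 to 10 hops;
- `LIMIT`, which is matched by an optional cap on the number of results.

The sketch also refuses to revisit an intermediate account. That is a simplifying assumption, not part of the AQL query.

```python
from collections import defaultdict


def find_rings(edges, start, min_hops=5, max_hops=10, limit=None):
    """Return paths that leave `start` and come back to it in
    min_hops..max_hops hops. `edges` is a list of (from, to) transactions."""
    adjacency = defaultdict(list)
    for edge in edges:
        adjacency[edge[0]].append(edge)
    rings = []

    def dfs(vertex, path, visited):
        for edge in adjacency[vertex]:
            if limit is not None and len(rings) >= limit:
                return  # like LIMIT @numberOfLoopsReturned
            target = edge[1]
            hops = len(path) + 1
            if target == start:
                # Like PRUNE: never extend a path past the start account.
                # Like FILTER: keep it only if the loop is long enough.
                if hops >= min_hops:
                    rings.append(path + [edge])
                continue
            # Simplifying assumption: intermediate accounts are not revisited.
            if hops < max_hops and target not in visited:
                visited.add(target)
                dfs(target, path + [edge], visited)
                visited.discard(target)

    dfs(start, [], {start})
    return rings


ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")]
short_loop = [("a", "x"), ("x", "a")]
edges = ring + short_loop + [("c", "y")]
print(find_rings(edges, "a"))  # only the 5-hop ring; the 2-hop loop is too short
```

The depth cap is what keeps this search bounded. Without it, the number of candidate paths grows quickly with every extra hop.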

Detect All Fraud Rings 


 
In the previous example, we detected fraud rings connected to a suspicious account. What if we did not yet have a list of suspicious accounts, and wanted to analyze the graph to detect all of the fraud ring patterns in it?
 

This is easily accomplished in AQL by wrapping the fraud ring detector in an outer loop over all accounts, so that every account is treated as potentially suspicious. This sophisticated query is written in only 6 lines of AQL! Figure 6 shows the query for finding all fraud loops.
 

 
Figure 6: Find all fraud rings 
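The following self-contained Python sketch extends the idea by running the ring search from every account. It illustrates the logic on in-memory data and is not the paper's query. Because the outer loop starts from each member of a ring, the same ring is found once per member account, and the AQL query in Appendix A reports it that way too. The sketch collapses those rotations into one result by rotating each ring so that it starts at its smallest edge.

```python
from collections import defaultdict


def find_all_rings(edges, min_hops=5, max_hops=10):
    """Run the ring search from every account (the outer loop) and report each
    distinct ring once, even though each member account rediscovers it."""
    adjacency = defaultdict(list)
    accounts = set()
    for edge in edges:
        adjacency[edge[0]].append(edge)
        accounts.update(edge)

    found, seen = [], set()

    def dfs(start, vertex, path, visited):
        for edge in adjacency[vertex]:
            target = edge[1]
            hops = len(path) + 1
            if target == start:
                if hops >= min_hops:
                    ring = path + [edge]
                    # Rotate so the smallest edge comes first; every rotation
                    # of the same ring then maps to the same key.
                    i = ring.index(min(ring))
                    key = tuple(ring[i:] + ring[:i])
                    if key not in seen:
                        seen.add(key)
                        found.append(ring)
                continue
            # Intermediate accounts are not revisited (simplifying assumption).
            if hops < max_hops and target not in visited:
                visited.add(target)
                dfs(start, target, path + [edge], visited)
                visited.discard(target)

    for account in sorted(accounts):  # outer loop over all accounts
        dfs(account, account, [], {account})
    return found


ring1 = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")]
ring2 = [("f", "g"), ("g", "h"), ("h", "i"), ("i", "j"), ("j", "k"), ("k", "f")]
rings = find_all_rings(ring1 + ring2 + [("a", "x"), ("x", "a")])
print(len(rings))  # 2
```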

Find Orphan Accounts 


 
There are many patterns for finding suspicious accounts that may require further investigation. Most of these patterns amount to finding anomalous behavior that flags an account.
 
One pattern is the orphan account. Such an account is set up to take part in very specific fraudulent transaction patterns. Otherwise, it does not interact in a ‘normal’ way with other accounts, and it may be used very infrequently.
 
Figure 7 shows a query that finds orphan accounts and reports each account and its owner.
 

 
Figure 7: Find Suspicious “Orphan” Accounts 
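The orphan-account check reduces to a set difference. This Python sketch mirrors the Appendix A query on in-memory documents:

1. Collect every account id that appears as a transaction `_from` or `_to`.
2. Keep the accounts that are not in that set.
3. Sort them by account type and customer id.
4. Look up each owner's `Name`.

The sample documents are made up.

```python
def find_orphan_accounts(accounts, customers, transactions):
    """Accounts that never appear as a transaction's source or target, sorted
    by account type and customer id, reported with the owning customer."""
    # Same idea as UNION_DISTINCT over _from and _to in the AQL query.
    used = {t["_from"] for t in transactions} | {t["_to"] for t in transactions}
    # Customer keys equal the accounts' customer_id, as in customer/<customer_id>.
    names = {c["_key"]: c["Name"] for c in customers}
    orphans = sorted(
        (a for a in accounts if a["_id"] not in used),
        key=lambda a: (a["account_type"], a["customer_id"]),
    )
    return [
        {
            "customerName": names.get(a["customer_id"]),
            "customerID": a["customer_id"],
            "accountID": a["_id"],
            "type": a["account_type"],
        }
        for a in orphans
    ]


customers = [{"_key": "c1", "Name": "Alice"}, {"_key": "c2", "Name": "Bob"}]
accounts = [
    {"_id": "account/1", "customer_id": "c1", "account_type": "checking"},
    {"_id": "account/2", "customer_id": "c2", "account_type": "savings"},
    {"_id": "account/3", "customer_id": "c1", "account_type": "savings"},
    {"_id": "account/4", "customer_id": "c1", "account_type": "checking"},
]
transactions = [{"_from": "account/1", "_to": "account/3"}]
result = find_orphan_accounts(accounts, customers, transactions)
for row in result:
    print(row)
```

Building `used` as a set makes each membership test constant-time. On a large transaction collection, that cost matters more than the sort.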
 
 

Find Most Influential Customers and Accounts 


 
We can also use standard graph algorithms such as PageRank to find deeply coordinated activity, by looking for the most influential customers and accounts.
 
The PageRank algorithm scores how important or influential a vertex is relative to the rest of the network. In ArangoDB, this is done by running ArangoDB's PageRank algorithm on the graph via the Pregel interface and then visualizing the results.
 
Figure 8 shows a visualization of several clusters of customer/account/transaction activity. The size of each vertex is scaled in proportion to the PageRank computed for it. This provides visual cues to the relative dominance of customers and accounts in the network.
 

 
Figure 8: Find most influential accounts and customers 
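For readers who want to see what the PageRank score computes, here is a textbook power-iteration sketch in Python. Several choices are conventional assumptions made here:

- a damping factor of 0.85;
- a fixed iteration count;
- even redistribution of rank from vertices that have no outgoing edges.

ArangoDB's Pregel implementation may differ in its defaults and convergence rules.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (from, to) edges. Rank held by
    vertices without outgoing edges is spread evenly over all vertices."""
    vertices = sorted({v for edge in edges for v in edge})
    n = len(vertices)
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        # Teleport share plus the redistributed rank of dangling vertices.
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in vertices}
        for src in vertices:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Three accounts all pay into a hub, which pays one of them back.
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))  # hub
```

The scores always sum to 1. A vertex that many others transact with, like `hub` here, accumulates the largest share, which is the "influence" the visualization scales vertex size by.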
 
 

What are the top 3 most influential accounts? 


 
Top 3 or top 10 queries are often used to focus attention. In this example, we use an AQL query to find the top 3 most influential accounts. The query reads the PageRank value stored by ArangoDB's PageRank algorithm, sorts the results in descending order, and returns the first three. Figure 9 shows the query and its results.
 

 
Figure 9: Query for listing top 3 most influential accounts 
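Reading the stored scores back is a sort-and-limit operation. In AQL that is `SORT ... DESC` followed by `LIMIT 3`, and the Python sketch below does the same over in-memory documents. The attribute name `pagerank` is an assumption, because the name of the result attribute may depend on how the PageRank job was configured. The scores are made up.

```python
import heapq


def top_k_by_rank(vertices, k=3, attribute="pagerank"):
    """Return the k vertex documents with the highest score, the same result
    as sorting by that attribute in descending order and keeping k."""
    return heapq.nlargest(k, vertices, key=lambda doc: doc[attribute])


accounts = [
    {"_key": "a", "pagerank": 0.10},
    {"_key": "b", "pagerank": 0.40},
    {"_key": "c", "pagerank": 0.20},
    {"_key": "d", "pagerank": 0.30},
]
print([doc["_key"] for doc in top_k_by_rank(accounts)])  # ['b', 'd', 'c']
```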

 
 

Finding Money Laundering Patterns 


 
ArangoDB can also be used to find more specific patterns, for example, in money laundering. Money laundering involves a funds disaggregation/aggregation pattern with three stages:

1. Many small transactions (below some known triggering threshold) split up a large sum of money.
2. Multiple transaction hops across accounts further avoid detection.
3. A number of transactions aggregate the funds back into an account.
 
This fan-out/fan-in pattern can easily be detected using AQL. Figure 10 shows the query and its results.
 

 
Figure 10: Finding Money Laundering Patterns 
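The Appendix A query approximates the fan-out side of this pattern. It picks the account with the highest out-degree and follows that account's transactions 1 to 4 hops. A rough Python equivalent on in-memory edge documents is sketched below. Like that query, it does not check amounts against a reporting threshold. It reports each reachable transaction once, whereas the AQL traversal emits one row per path. The sample data is invented.

```python
from collections import Counter, defaultdict


def fan_out_candidates(transactions, max_hops=4):
    """Pick the account with the highest out-degree as the likely
    disaggregation source, then collect every transaction reachable from it
    within 1..max_hops hops (each transaction reported once)."""
    out_degree = Counter(t["_from"] for t in transactions)
    source = max(out_degree, key=out_degree.get)

    outgoing = defaultdict(list)
    for i, t in enumerate(transactions):
        outgoing[t["_from"]].append(i)

    reached, seen, frontier = [], set(), [source]
    for _ in range(max_hops):  # breadth-first, one hop per round
        next_frontier = []
        for account in frontier:
            for i in outgoing[account]:
                if i not in seen:
                    seen.add(i)
                    reached.append(transactions[i])
                    next_frontier.append(transactions[i]["_to"])
        frontier = next_frontier
    return source, reached


transactions = [
    {"_from": "account/s", "_to": "account/m1", "amount": 9000},
    {"_from": "account/s", "_to": "account/m2", "amount": 9500},
    {"_from": "account/s", "_to": "account/m3", "amount": 8000},
    {"_from": "account/m1", "_to": "account/t", "amount": 9000},
    {"_from": "account/m2", "_to": "account/t", "amount": 9500},
    {"_from": "account/m3", "_to": "account/t", "amount": 8000},
    {"_from": "account/x", "_to": "account/y", "amount": 50},
]
source, reached = fan_out_candidates(transactions)
print(source, len(reached))  # account/s 6
```

In the sample, the funds fan out from `account/s` to three intermediaries and fan back in to `account/t`. The unrelated `account/x` transfer is never reached.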
 

Detecting Fraud At Scale 
Real-world financial transactions generate billions of data points and 
relationships, which will rapidly overrun the capabilities of a single server. 
Providing fraud-detection performance at scale requires the underlying data 
systems to be able to scale out data across multiple nodes in a distributed 
cluster and to be able to efficiently distribute computation in parallel across 
the cluster.   
 
On a distributed database cluster, the limiting factor is network performance. Network access is roughly two orders of magnitude slower than memory access, and a distributed cluster carries data and communication traffic between its nodes. For example, detecting a fraud ring would slow down if many of the edges being traversed caused computation to hop back and forth between servers. Better network performance obviously improves overall performance. However, data distribution and query optimizations can also greatly reduce the inter-node communication needed to execute queries, and therefore improve distributed performance.
 
Optimizing the layout of data on the cluster can reduce the inter-node communication needed to perform queries. ArangoDB provides three features for this:

- SmartGraphs optimize graph distribution across a cluster.
- SmartJoins ensure that joins do not cross servers.
- Satellite collections replicate metadata across servers so that lookups happen locally on each server.
 

 
Figure 11: Bad distribution of graph data causes network hops during query execution 

 
The SmartGraph feature of ArangoDB lets us handle this problem in a smarter way. In fraud detection, past cases may show that fraudsters use banks in certain countries or regions to launder their money. We can use this domain knowledge as a sharding key for our graph data. All financial transactions performed in such a region go to DB-Server 1, and other transactions are distributed across the other DB-Servers. This keeps all data that needs to be grouped together on the same machine, so the query engine on each DB-Server can execute our queries in parallel.
 
 

Figure 12: Optimized data distribution with ArangoDB SmartGraphs 
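A toy Python model shows why the placement in Figure 12 helps. It counts how many transaction edges connect accounts stored on different servers under two placements. The region attribute, the server numbering, and both placement functions are hypothetical. This illustrates the idea behind SmartGraphs, not how ArangoDB implements them.

```python
def cross_server_edges(edges, placement):
    """Count edges whose endpoints sit on different servers; each one costs a
    network hop when a traversal follows it."""
    return sum(1 for src, dst in edges if placement[src] != placement[dst])


def place_round_robin(accounts, n_servers):
    """Naive placement that ignores how accounts are connected."""
    return {acct: i % n_servers for i, acct in enumerate(sorted(accounts))}


def place_by_region(region_of, n_servers):
    """SmartGraph-style placement: every account of a region shares a server."""
    regions = sorted(set(region_of.values()))
    server_of = {region: i % n_servers for i, region in enumerate(regions)}
    return {acct: server_of[region] for acct, region in region_of.items()}


region_of = {"a1": "EU", "a2": "EU", "b1": "US", "b2": "US"}
edges = [("a1", "a2"), ("b1", "b2")]  # transactions stay within a region
print(cross_server_edges(edges, place_round_robin(region_of, 2)))  # 2
print(cross_server_edges(edges, place_by_region(region_of, 2)))    # 0
```

The gain depends on the sharding key matching how transactions cluster. If most transactions crossed regions, sharding by region would not remove those network hops.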


 

Conclusion 
This paper points the way to using ArangoDB as part of a fraud detection solution. We encourage you to experiment with our sample data and sample queries. Adding to and modifying the data and queries is a good way to learn how to apply ArangoDB to fraud. We hope you will then be inspired and empowered to apply your knowledge of fraud to your own data in ArangoDB to detect fraudulent activity. To get started easily, you can follow the interactive demo provided on our cloud service, ArangoDB Oasis, described below.
  
 
 
Hands-on with Fraud Detection & Anti-Money Laundering
Testing ArangoDB and its capabilities for detecting fraud and money laundering is very simple. Many of the use cases shown in this white paper are part of an interactive demo, available for free on ArangoDB's cloud service, Oasis. No credit card is needed for a 14-day free trial deployment, and the examples can be installed with just one click. A detailed guide is provided so that anyone can follow along easily.
 
Just sign up for ArangoDB Oasis and follow the few steps below:
 
1. Create a deployment (a 2-minute video tutorial is available).
2. Install the Fraud Detection example in Oasis (Project -> Deployment tab -> View your deployment -> Examples tab), or just click "View Deployment" directly after initiating the deployment creation.
3. After the example is ready (about 1 minute), follow the Fraud Detection guide provided to run real queries against the demo data you just installed.
 
 

 
 
 
 
This white paper was written by Arthur Keen. For any questions about solving fraud detection cases with ArangoDB, feel free to reach out to arthur@arangodb.com.

Appendix A: Queries 
/*
Find all suspicious long loops of transactions
Show the graph and JSON results
Scroll to bottom of graph results and click "GraphViewer" to see results in Graph Viewer
*/

WITH transaction, account
FOR suspicious_account IN account
  FOR acct, tx, path IN 5..10 OUTBOUND suspicious_account._id GRAPH 'fraud-detection'
    // Stop expanding a path once it returns to the starting account
    PRUNE tx._to == suspicious_account._id
    // Keep only paths that actually close the loop
    FILTER tx._to == suspicious_account._id
    RETURN path
 
 
/*
Find a given number of suspicious loops from a suspicious account
Hints:
Try suspiciousAccountID = account/10000032
Rerun the query for different numbers of loops returned
Show the graph and JSON results
Scroll to bottom of graph results and click "GraphViewer" to see results in Graph Viewer
*/

WITH account, transaction
LET suspicious_account = DOCUMENT(@suspiciousAccountID)
FOR acct, tx, path IN 5..10 OUTBOUND suspicious_account._id GRAPH 'fraud-detection'
  // Stop at the starting account and keep only closed loops
  PRUNE tx._to == suspicious_account._id
  FILTER tx._to == suspicious_account._id
  // Return at most the requested number of loops
  LIMIT @numberOfLoopsReturned
  RETURN path
 
 
/*
Find Orphan Accounts
An orphan account is an account with few or no transactions.
These may be set up in advance of money laundering operations.
This query finds accounts with no transactions.
*/

// Every account id that appears as the source or target of any transaction
LET usedResources = UNION_DISTINCT(
  (FOR relationship IN transaction RETURN relationship._from),
  (FOR relationship IN transaction RETURN relationship._to))
FOR resource IN account
  FILTER resource._id NOT IN usedResources
  SORT resource.account_type, resource.customer_id
  // Look up the owner document via customer/<customer_id>
  RETURN {
    "customerName": DOCUMENT(CONCAT("customer/", resource.customer_id)).Name,
    "customerID": resource.customer_id,
    "accountID": resource._id,
    "type": resource.account_type
  }

 
 
/*
Anti-Money Laundering Pattern Detection
Find transaction patterns that contain a disaggregation and re-aggregation of funds.
This pattern is characterized by transactions that disaggregate funds from a source
account to multiple accounts in amounts below a reporting threshold (i.e., below $10,000),
followed by a series of small transactions into one or more accounts, followed by
re-aggregation of the small transactions into a destination account.
Show the graph and JSON results
Scroll to bottom of graph results and click "GraphViewer" to see results in Graph Viewer
*/

WITH account, transaction
// Out-degree: number of outgoing transactions per account
LET accountOutDegree = (FOR txOut IN transaction
  COLLECT accountOut = txOut._from WITH COUNT INTO outDegree
  RETURN {account: accountOut, outDegree: outDegree})
// In-degree: number of incoming transactions per account
LET accountInDegree = (FOR txIn IN transaction
  COLLECT accountIn = txIn._to WITH COUNT INTO inDegree
  RETURN {account: accountIn, inDegree: inDegree})
// Combined in/out degree per account (not used by the traversal below)
LET accountDegree = (FOR inRecord IN accountInDegree
  FOR outRecord IN accountOutDegree
    FILTER inRecord.account == outRecord.account
    RETURN MERGE(inRecord, outRecord))
// The account with the highest out-degree is the candidate disaggregation source
LET maxAccount = (FOR maxDegree IN accountOutDegree
  FILTER maxDegree.outDegree == MAX(accountOutDegree[*].outDegree)
  RETURN maxDegree)[0]
// Follow the funds 1 to 4 hops outbound from that source
FOR acct, tx IN 1..4 OUTBOUND maxAccount.account transaction
  RETURN tx
 
