with network
visualization
About fraud data
1. Identifying fraud more quickly
2. Finding anomalous activity
3. Performing an investigation
Bonus example: Credit Card Fraud
Visualize your own fraud data
What is KeyLines?
In this white paper, we will look at three fraud workflows that use KeyLines to help analysts make sense
of large and complex data, and derive useful and valuable insight. All of the use cases are real, but
the data has been synthesized for privacy reasons.
There are many different kinds of fraud. In this paper we will look at insurance fraud, healthcare fraud,
review fraud and credit card fraud.
What unites these four examples is the nature of the data an analyst needs to understand. Fraud data is
characterized by four attributes: it is large, complex, noisy and often incomplete. A powerful and well-
designed network visualization application is the ideal tool to overcome these challenges.
1. Identifying fraud more quickly
This example shows how network visualization can make a repetitive and time-consuming fraud detection
process faster and more intuitive.
After loading a claim, the user can ‘find matches’, which retrieves from the database all other claims
that share any attributes with it:
To emphasize unusual connections, we can use KeyLines’ combine feature to merge identical nodes:
Here we can see two claimants living at the same address in Colnbrook Street. Given they share a
surname, however, it is perhaps not as suspicious as it initially appeared and the claim should be approved.
This example is more suspicious. One vehicle is involved in claims from both Stewart Walter and Julia
Rodriguez. At this point, an investigator or analyst can decide to submit this case for further investigation:
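The matching step described above can be approximated outside the visualization layer. The following is a minimal sketch, not the KeyLines API: the record layout, the field names and the `find_matches` helper are all assumptions made for illustration, and the sample values are invented.

```python
from collections import defaultdict

# Hypothetical claim records; field names and values are invented for illustration.
claims = [
    {"id": "C1", "claimant": "Claimant A", "address": "Address 1", "vehicle": "PLATE-001"},
    {"id": "C2", "claimant": "Claimant B", "address": "Address 2", "vehicle": "PLATE-001"},
    {"id": "C3", "claimant": "Claimant C", "address": "Address 3", "vehicle": "PLATE-002"},
]

def find_matches(claims, attributes=("address", "vehicle")):
    """Group claim ids by each (attribute, value) pair they have in common."""
    index = defaultdict(list)
    for claim in claims:
        for attr in attributes:
            index[(attr, claim[attr])].append(claim["id"])
    # Keep only the values that link two or more claims.
    return {key: ids for key, ids in index.items() if len(ids) > 1}

matches = find_matches(claims)
for (attr, value), ids in matches.items():
    print(f"{attr} {value!r} is shared by claims {ids}")
```

Each surviving key corresponds to a node that would connect several claims once identical nodes are merged; an analyst would still need to judge, as in the shared-surname case, whether the link is suspicious.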
2. Finding anomalous activity
This example shows how network visualization harnesses the analyst’s natural pattern-recognition
capability to find anomalies that might otherwise go undetected.
The previous example uses an investigative approach to network visualization (starting with a small
group of nodes and working outwards). Fraud analysts can also use anomaly detection. This is where
many transactions are inspected at an overview level to find unusual behavior. This example shows how
KeyLines could help uncover cases of healthcare fraud.
As with all fraud detection, the key to detecting healthcare fraud is understanding connections: between
patients, procedures, practitioners and clinics.
• A practitioner will only perform certain kinds of procedure, based on their specialism.
One of the most common, and costly, types of Medicare fraud is when practitioners bill for services they
haven’t provided using stolen social security numbers.
Medicare authorities cannot check each bill with the patients concerned. In any case, many patients do
not fully know or remember which services they received, and rarely understand the obscure procedure
codes used.
Instead, an investigator could look for unusual practitioner claims using KeyLines. Here we have loaded a
dataset of all claims made by practitioners on a certain day. We can see a standard pattern emerging:
Generally, each procedure is connected to one patient and one practitioner. The practitioner is also
connected to a location or clinic. This standard pattern makes it easy to spot anomalies, like this very
productive doctor who appears to have claimed for 5 of these procedures in a single day:
This practitioner performed five procedures in one day, including four on a single patient
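The same anomaly can be surfaced with a simple count. This is a sketch under stated assumptions: the tuple layout, the `busy_practitioners` helper and the threshold of three claims per day are hypothetical, and the records are invented.

```python
from collections import Counter

# Hypothetical one-day claim records: (practitioner, patient, procedure code).
day_claims = [
    ("Dr A", "P1", "X100"),
    ("Dr B", "P2", "X200"),
    ("Dr C", "P3", "X300"),
    ("Dr C", "P4", "X301"),
    ("Dr C", "P4", "X302"),
    ("Dr C", "P4", "X303"),
    ("Dr C", "P4", "X304"),
]

def busy_practitioners(claims, threshold=3):
    """Return practitioners whose claim count for the day exceeds the threshold."""
    counts = Counter(practitioner for practitioner, _, _ in claims)
    return {name: n for name, n in counts.items() if n > threshold}

flagged = busy_practitioners(day_claims)
print(flagged)
```

A fixed threshold is a crude stand-in for the visual comparison against the standard pattern; in practice the cut-off would depend on the specialism and the procedure type.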
Another unusual pattern we could uncover in this data is practitioners whose patient lists significantly
overlap. This could indicate two doctors exploiting the same list of stolen social security numbers:
By displaying a large volume of data at once, KeyLines allows users to quickly see anomalous patterns.
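Overlapping patient lists can also be quantified. One possible measure, chosen here for illustration rather than taken from the paper, is the Jaccard similarity between two practitioners' patient sets. The data, the `overlapping_pairs` helper and the 0.5 threshold are assumptions.

```python
from itertools import combinations

# Hypothetical patient identifiers per practitioner.
patients_by_practitioner = {
    "Dr A": {"P1", "P2", "P3", "P4"},
    "Dr B": {"P2", "P3", "P4", "P5"},
    "Dr C": {"P6", "P7"},
}

def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def overlapping_pairs(patient_lists, threshold=0.5):
    """Return (practitioner, practitioner, score) for pairs at or above the threshold."""
    pairs = []
    for name_a, name_b in combinations(sorted(patient_lists), 2):
        score = jaccard(patient_lists[name_a], patient_lists[name_b])
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs

pairs = overlapping_pairs(patients_by_practitioner)
print(pairs)
```

Pairs returned by this check are candidates for review only; practitioners in the same clinic may legitimately share many patients.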
3. Performing an investigation
Giving analysts data in its fully connected context, with tools to help them explore the network,
makes investigations simpler and faster to complete.
Thousands of reviews are posted to the web every day. Sites like eBay, Yelp, Foursquare and Amazon own
huge volumes of user-generated review data that sits at the heart of their sales platforms.
Review fraud is when individuals or organizations manipulate that user-generated content to their own
advantage – creating false reviews to misrepresent their business or competitors.
For the websites, false reviews erode customer trust and damage the integrity of the data on which their
brands are built. Websites cannot monetize their content if the consumers don’t trust its accuracy or
validity.
A key difference between review fraud and financial fraud is that review websites rarely ask for verifiable
information, e.g. an address, credit card number, etc. This increases the number of reviews submitted,
but makes it difficult to crosscheck contributors’ reviews against a watch list.
Instead, investigators rely on device data, location data and behavioral patterns, such as:
• Review text
• Device fingerprints
• Profile data
• Geo-location data
Using an algorithmic approach, it’s possible to assign each piece of user-generated content a fraud
likelihood score. Many different behavior patterns could indicate fraud. These will
evolve over time as new techniques are developed, but some obvious patterns include:
• Creating an account, leaving a single (very high or very low) review and never returning.
• Reviewing a collection of businesses in one small area (e.g. all Italian restaurants in Cambridge),
leaving a single excellent review and a series of one-star reviews for the rest.
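The two patterns above could feed a rule-based score. The sketch below is illustrative only: the `Review` fields, the `score_account` helper and the weights 0.5 and 0.7 are invented, and it does not check the "never returning" or "one small area" conditions, which would need timestamps and location data.

```python
from dataclasses import dataclass

@dataclass
class Review:
    account: str
    business: str
    rating: int  # 0 to 5 stars

def score_account(reviews):
    """Return a hypothetical fraud likelihood score between 0 and 1 for one account."""
    score = 0.0
    ratings = [r.rating for r in reviews]
    # Pattern 1: the account has left exactly one review, with an extreme rating.
    if len(reviews) == 1 and ratings[0] in (0, 1, 5):
        score += 0.5
    # Pattern 2: one excellent review and very low ratings for every other business.
    if len(reviews) >= 3 and ratings.count(5) == 1 and all(r <= 1 for r in ratings if r != 5):
        score += 0.7
    return min(score, 1.0)

single = [Review("acct-1", "Business A", 5)]
spread = [Review("acct-2", "Business A", 5),
          Review("acct-2", "Business B", 1),
          Review("acct-2", "Business C", 1)]
print(score_account(single), score_account(spread))
```

A score like this would only prioritize content for review; the patterns and weights would need to evolve as new techniques appear.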
The circular nodes represent reviews, and are color-coded by rating (0 to 5, red to green). Reviews
previously removed as fraudulent show as ‘ghosted’ (faded) red ‘X’ nodes. The Time Bar shows review
volume over time.
There are three pieces of information associated with each review: the business reviewed (building icon),
the IP address used (computer icon), and the email address provided (@ symbol icon). Reviews flagged by
the system as suspicious use a heavy red link, instead of the default blue:
One IP address is responsible for 7 reviews for one establishment, three of which have already been deleted
With this visual data model, we can start to pick out and investigate unusual patterns of behavior. For
example, in the image above, one IP address has been used to submit seven reviews about a single
business, using four email addresses. Three reviews have already been removed as fake.
If we expand outwards on one of the deleted reviews, we see more clues of a possible attempt to
manipulate ratings:
One email address has been used to submit 6 zero-star reviews about a single business, using multiple IP addresses
This time, one email address has been used to submit 6 zero-star reviews about a single business, but using 4
different IP addresses (or, more likely, a proxy IP address).
This visualization approach provides a fast and intuitive way to digest large amounts of data, improving
the quality and speed of investigations and allowing fraud to be identified more quickly.
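The groupings that stand out in these charts (many reviews of one business from one IP address or one email address) can be computed directly from the review records. This is a minimal sketch: the record fields, the `suspicious_sources` helper and the minimum of three reviews are assumptions, and the addresses are reserved documentation values.

```python
from collections import defaultdict

# Hypothetical review records; all values are invented.
reviews = [
    {"id": 1, "business": "Business A", "ip": "203.0.113.5", "email": "a@example.com"},
    {"id": 2, "business": "Business A", "ip": "203.0.113.5", "email": "a@example.com"},
    {"id": 3, "business": "Business A", "ip": "203.0.113.5", "email": "b@example.com"},
    {"id": 4, "business": "Business B", "ip": "198.51.100.9", "email": "c@example.com"},
]

def suspicious_sources(reviews, key, min_reviews=3):
    """Return {(source, business): review ids} where one source reviewed a business repeatedly."""
    groups = defaultdict(list)
    for review in reviews:
        groups[(review[key], review["business"])].append(review["id"])
    return {pair: ids for pair, ids in groups.items() if len(ids) >= min_reviews}

by_ip = suspicious_sources(reviews, "ip")
by_email = suspicious_sources(reviews, "email")
print(by_ip, by_email)
```

Running the same grouping on both keys mirrors the two views in this section: one IP address behind several email addresses, and one email address behind several IP addresses.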
Bonus example: Credit Card Fraud
This final example shows how KeyLines can help investigators understand the chronology of events,
allowing us to identify a fraud’s point of origin and follow subsequent transactions.
This dataset (taken from the Neo4j GraphGist, Credit Card Fraud Detection) is fake but helps us to show a
useful visual model for credit card fraud data.
Here, the main graph shows us relations between people and merchants (nodes), with the transactions
shown as links. Red links indicate disputed transactions and merchant nodes are enhanced with glyphs
showing the total dollar value of transactions within the scope of the time bar:
The time bar shows two aggregated trends. The histogram represents the total volume of transactions in
the network. The red trend-line shows the value of disputed transactions.
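The two aggregates shown in the time bar can be reproduced with a simple hourly bucketing. This is a sketch under stated assumptions, not how KeyLines computes them: the tuple layout, the `hourly_buckets` helper, the year and all amounts are invented.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical transactions: (timestamp, amount in dollars, disputed flag).
transactions = [
    (datetime(2024, 6, 18, 17, 5), 40.0, False),
    (datetime(2024, 6, 18, 18, 10), 1200.0, True),
    (datetime(2024, 6, 18, 18, 40), 800.0, True),
    (datetime(2024, 6, 18, 18, 55), 25.0, False),
    (datetime(2024, 6, 18, 19, 20), 300.0, True),
]

def hourly_buckets(transactions):
    """Return (transaction count per hour, disputed dollar value per hour)."""
    volume = Counter()
    disputed_value = defaultdict(float)
    for timestamp, amount, disputed in transactions:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        volume[hour] += 1
        if disputed:
            disputed_value[hour] += amount
    return volume, dict(disputed_value)

volume, disputed_value = hourly_buckets(transactions)
# The hour with the highest disputed value is a natural place to zoom in.
peak_hour = max(disputed_value, key=disputed_value.get)
print(peak_hour, disputed_value[peak_hour])
```

The counts correspond to the histogram and the disputed totals to the trend line; the bucket size (one hour here) is a free choice that would change with the zoom level.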
Using this visual model, we can follow transactions and begin to find useful insight. For example, by
zooming into specific spikes we can find times of interest. The largest spike is at 18:00 on 18 June, when
four separate disputed transactions totaling $6009 took place:
A vendor-centric view
Or we can take a person-centric view. Here we can see that Madison’s card was used for 3 disputed
transactions in mid-June, starting at RadioShack on 01 June:
Figure 13: Madison’s card has been used for 3 disputed transactions
In fact, by selecting only those people with disputes we can see a cluster of potential frauds surrounding
a small group of merchants. The Time Bar helps us pinpoint RadioShack again as the earliest disputed
transaction, and a potential point of origin for the string of credit card frauds:
This example shows how simple and intuitive it is to perform fraud investigation using graph visualization.
With a few clicks, it is possible to translate a mass of complex data into specific and useful insight.
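The point-of-origin reasoning in this example can be sketched as a small calculation: find each cardholder's earliest disputed transaction, then count which merchants those earliest disputes involve. The data layout and both helper functions are assumptions, and the names, dates and year are invented rather than taken from the dataset.

```python
from collections import Counter
from datetime import date

# Hypothetical disputed transactions: (cardholder, merchant, date).
disputed = [
    ("Person A", "Merchant 1", date(2024, 6, 1)),
    ("Person A", "Merchant 2", date(2024, 6, 14)),
    ("Person B", "Merchant 1", date(2024, 6, 3)),
    ("Person B", "Merchant 3", date(2024, 6, 15)),
    ("Person C", "Merchant 2", date(2024, 6, 10)),
]

def first_disputes(disputed):
    """Return {cardholder: (merchant, date)} for each cardholder's earliest dispute."""
    first = {}
    for person, merchant, when in disputed:
        if person not in first or when < first[person][1]:
            first[person] = (merchant, when)
    return first

def likely_origins(disputed):
    """Rank merchants by how many cardholders' dispute trails start with them."""
    starts = Counter(merchant for merchant, _ in first_disputes(disputed).values())
    return starts.most_common()

origins = likely_origins(disputed)
print(origins)
```

A merchant at the top of this ranking is only a lead for investigation; it does not by itself establish where the card details were compromised.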
Visualize your own fraud data
What is KeyLines?
KeyLines, by Cambridge Intelligence, is a unique technology for building powerful web applications for
network visualization.
Using the KeyLines toolkit, developers can rapidly incorporate visualization components to deploy
alongside existing fraud detection and investigation platforms. These applications turn raw connected
data into powerful interactive charts, empowering users to ‘join the dots’ to discover patterns and
anomalies.
The applications built using KeyLines run in virtually any web browser and on any device. A flexible
architecture means they can be deployed into existing IT environments, as part of a dashboard or as
standalone tools. Data can be pulled from multiple sources and interactive charts can be shared for
reporting purposes.
• Easy filtering
Organizations that have deployed KeyLines for fraud management include Cifas (the UK fraud prevention
authority), TripAdvisor, FICO, Aviva, Visa, JP Morgan Chase, Western Union, Allianz and BAE Systems.