
Catching fraudsters with network visualization
About fraud data
1. Identifying fraud more quickly
2. Finding anomalous activity
3. Performing an investigation
Bonus example: Credit Card Fraud
Visualize your own fraud data
What is KeyLines?

www.cambridge-intelligence.com USA: +1 (775) 842-6665 UK: +44 (0)1223 362 000


Cambridge Intelligence Ltd, Mount Pleasant House, Cambridge, CB3 0RN, UK.
Catching fraudsters with network visualization
Fraud is an expensive and complex problem. It touches all organizations and its prevalence is increasing
as fraudsters take advantage of new technologies. To stop fraud, analysts need data – lots of data – and
tools to understand it.

In this white paper, we will look at three fraud detection workflows using KeyLines to help analysts make
sense of large and complex data, and derive valuable insight. All of the use cases are real, but
the data has been synthesized for privacy reasons.

About fraud data

There are many different kinds of fraud. In this paper we will look at insurance fraud, healthcare fraud,
review fraud and credit card fraud.

What unites these four examples is the nature of the data an analyst needs to understand. Fraud data is
characterized by four attributes: it is large, complex, noisy and often incomplete. A powerful and well-
designed network visualization application is the ideal tool to overcome these challenges.

1. Identifying fraud more quickly
This example shows how network visualization can make a repetitive and time-consuming fraud detection
process faster and more intuitive.

Most fraud detection systems work in a similar way. Data is collated on a huge scale, rule scored and
sorted into three categories: fraud, not fraud and unsure.
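
As a rough illustration of the rule-scoring step, the sketch below sorts scored claims into the three categories. The thresholds, the 0-to-1 score range and the claim IDs are assumptions made for this example; they are not taken from any particular fraud detection system.

```python
# Hypothetical thresholds; a real system would tune these against its own data.
FRAUD_THRESHOLD = 0.8
CLEAR_THRESHOLD = 0.2


def triage(score: float) -> str:
    """Sort a rule score in [0, 1] into one of three review categories."""
    if score >= FRAUD_THRESHOLD:
        return "fraud"
    if score <= CLEAR_THRESHOLD:
        return "not fraud"
    return "unsure"


# Made-up scores for three claims.
scores = {"claim-1": 0.95, "claim-2": 0.05, "claim-3": 0.5}
queues = {claim_id: triage(score) for claim_id, score in scores.items()}
```

Only the claims that land in the "unsure" queue would be passed to the analyst team for manual review.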

A team of analysts then manually reviews the ‘unsures’ – a careful balancing act between keeping genuine
customers happy with fast, accurate decisions and preventing real frauds from getting through.

Reviewing insurance claims with KeyLines

In this example we have synthesized a dataset of insurance claims. Nodes represent claims, vehicles,
people and addresses. These are presented using a hierarchical layout to more clearly show dependency:

After loading a claim, the user is offered the ability to ‘find matches’ – which retrieves all other claims
with any similar attributes from the database:

Here we can see claims with shared attributes side-by-side
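
The matching step can be pictured with a minimal sketch like the one below. It is not the KeyLines API: an in-memory list stands in for the database, the field names and IDs are invented for illustration, and "similar" is simplified to an exact match on attribute value.

```python
def find_matches(claim, all_claims, fields=("vehicle", "address", "person")):
    """Return (other_claim_id, shared_fields) for claims sharing any attribute value."""
    matches = []
    for other in all_claims:
        if other["id"] == claim["id"]:
            continue  # a claim does not match itself
        shared = [
            f for f in fields
            if claim.get(f) is not None and claim.get(f) == other.get(f)
        ]
        if shared:
            matches.append((other["id"], shared))
    return matches


# Hypothetical records; vehicle and address identifiers are made up.
claims = [
    {"id": "C1", "person": "Stewart Walter", "vehicle": "V-100", "address": "A-1"},
    {"id": "C2", "person": "Julia Rodriguez", "vehicle": "V-100", "address": "A-2"},
    {"id": "C3", "person": "Another Person", "vehicle": "V-200", "address": "A-3"},
]
matches = find_matches(claims[0], claims)
```

Here the loaded claim C1 matches C2 because both reference the same vehicle; C3 shares nothing and is left out.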

To emphasize unusual connections, we can use KeyLines’ combine feature to merge identical nodes:

Stephen and Kristina Porter share the same address

Here we can see two claimants living at the same address in Colnbrook Street. Given they share a
surname, however, it is perhaps not as suspicious as it initially appeared and the claim should be approved.
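
The idea behind merging identical nodes can be sketched as follows. This is an illustration of the concept only, not how the KeyLines combine feature is implemented; the record layout and the vehicle identifiers are assumptions.

```python
from collections import defaultdict


def combine_nodes(claims, fields=("person", "vehicle", "address")):
    """Build one node per distinct attribute value, linked to every claim that uses it."""
    links = defaultdict(set)  # (field, value) -> set of claim ids
    for claim in claims:
        for field in fields:
            value = claim.get(field)
            if value is not None:
                links[(field, value)].add(claim["id"])
    return links


def shared_nodes(links):
    """Return combined nodes connected to more than one claim, in sorted order."""
    return sorted((node, sorted(ids)) for node, ids in links.items() if len(ids) > 1)


# Hypothetical records based on the example above; vehicle IDs are made up.
claims = [
    {"id": "C1", "person": "Stephen Porter", "address": "Colnbrook Street", "vehicle": "V-1"},
    {"id": "C2", "person": "Kristina Porter", "address": "Colnbrook Street", "vehicle": "V-2"},
]
shared = shared_nodes(combine_nodes(claims))
```

Once identical values collapse into a single node, any node linked to more than one claim stands out, which is what makes the shared address visible at a glance.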

Here one vehicle is involved in two separate claims

This example is more suspicious. One vehicle is involved in claims from both Stewart Walter and Julia
Rodriguez. At this point, an investigator or analyst can decide to submit this case for further investigation:

Representing data as a network offers an engaging way for analysts to rapidly understand events. By
incorporating KeyLines into an existing claims management workflow, we have made the process simple
and intuitive.

2. Finding anomalous activity
This example shows how network visualization harnesses the analyst’s natural pattern-recognition
capability to find anomalies that might otherwise go undetected.

The previous example uses an investigative approach to network visualization (starting with a small
group of nodes and working outwards). Fraud analysts can also use anomaly detection. This is where
many transactions are inspected at an overview level to find unusual behavior. This example shows how
KeyLines could help uncover cases of healthcare fraud.

Detecting Medicare Fraud

As with all fraud detection, the key to detecting healthcare fraud is understanding connections: between
patients, procedures, practitioners and clinics.

Using this model, we can identify ‘normal’ patterns:

• A patient will see a limited number of practitioners.

• A practitioner will only perform certain kinds of procedure, based on their specialism.

• A practitioner and patient will be limited to a geographic area.

One of the most common, and costly, types of Medicare fraud is when practitioners bill for services they
haven’t provided using stolen social security numbers.

Medicare authorities cannot check each bill with the patients concerned. In any case, many patients do
not fully know or remember which services they received, and rarely understand the obscure procedure
codes used.

Instead, an investigator could look for unusual practitioner claims using KeyLines. Here we have loaded a
dataset of all claims made by practitioners on a certain day. We can see a standard pattern emerging:

One day of Medicare claims data

Generally, each procedure is connected to one patient and one practitioner. The practitioner is also
connected to a location or clinic. This standard pattern makes it easy to spot anomalies, like this very
productive doctor who appears to have claimed for 5 of these procedures in a single day:

This practitioner performed five procedures in one day, including four for a single patient
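
The same anomaly can also be surfaced with a simple count, which could be used to highlight nodes in the chart. The sketch below is a minimal example; the threshold of three procedures per day, the record layout, the names and the date are all assumptions made for illustration.

```python
from collections import Counter


def busy_practitioners(claims, max_per_day=3):
    """Return {(practitioner, date): count} where count exceeds max_per_day."""
    counts = Counter((c["practitioner"], c["date"]) for c in claims)
    return {key: n for key, n in counts.items() if n > max_per_day}


# Hypothetical claims: Dr A bills five procedures in one day, four for one patient.
claims = (
    [{"practitioner": "Dr A", "patient": "P1", "date": "2024-01-01"}] * 4
    + [{"practitioner": "Dr A", "patient": "P2", "date": "2024-01-01"}]
    + [{"practitioner": "Dr B", "patient": "P3", "date": "2024-01-01"}]
)
flagged = busy_practitioners(claims)
```

A fixed threshold is the crudest possible rule; the visual overview remains useful precisely because an analyst can judge what counts as unusual for a given specialism.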

Another unusual pattern we could uncover in this data is practitioners whose patient lists significantly
overlap. This could indicate two doctors exploiting the same list of stolen social security numbers:

The two left-hand practitioners have an unusual volume of shared patients
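
One way to quantify overlapping patient lists is the Jaccard index: the number of shared patients divided by the number of distinct patients across both lists. The sketch below assumes this measure and a cut-off of 0.5; neither comes from the KeyLines toolkit, and the names are invented.

```python
from itertools import combinations


def patient_overlap(patients_by_practitioner, threshold=0.5):
    """Return (practitioner_a, practitioner_b, jaccard) for pairs at or above threshold."""
    flagged = []
    for a, b in combinations(sorted(patients_by_practitioner), 2):
        pa, pb = patients_by_practitioner[a], patients_by_practitioner[b]
        union = pa | pb
        if not union:
            continue  # two empty lists have no meaningful overlap
        jaccard = len(pa & pb) / len(union)
        if jaccard >= threshold:
            flagged.append((a, b, jaccard))
    return flagged


# Hypothetical patient lists.
lists = {
    "Dr A": {"p1", "p2", "p3", "p4"},
    "Dr B": {"p2", "p3", "p4", "p5"},
    "Dr C": {"p9"},
}
overlaps = patient_overlap(lists)
```

Dr A and Dr B share three of five distinct patients, so the pair is flagged; Dr C shares no patients with either and is not.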

By displaying a large volume of data at once, KeyLines allows users to quickly see anomalous patterns.

3. Performing an investigation
By providing analysts with data in its fully connected context, with tools to help them explore the network,
investigations become simpler and faster to complete.

Investigating Review Fraud

Thousands of reviews are posted to the web every day. Sites like eBay, Yelp, Foursquare and Amazon own
huge volumes of user-generated review data that sits at the heart of their sales platforms.

Review fraud is when individuals or organizations manipulate that user-generated content to their own
advantage – creating false reviews to misrepresent their business or competitors.

For the websites, false reviews erode customer trust and damage the integrity of the data on which their
brands are built. Websites cannot monetize their content if the consumers don’t trust its accuracy or
validity.

Differences to financial fraud

A key difference between review fraud and financial fraud is that review websites rarely ask for verifiable
information, e.g. an address, credit card number, etc. This increases the number of reviews submitted,
but makes it difficult to crosscheck contributors’ reviews against a watch list.

Instead, investigators rely on device data, location data and behavioral patterns, such as:

• Review text

• Review submission velocity

• Device fingerprints

• Profile data

• Geo-location data

Using an algorithmic approach, it’s possible to assign each piece of user-generated content a fraud
likelihood score. Many different behavior patterns could indicate fraud. These will
evolve over time as new techniques are developed, but some obvious patterns include:

• Multiple accounts associated with one device.

• Creating an account, leaving a single (very high or low) review, never returning.

• Reviewing a collection of businesses in one small area (e.g. all Italian restaurants in Cambridge),
leaving a single excellent review and a series of one-star reviews for the rest.
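
To make the scoring idea concrete, the sketch below checks the first two patterns in the list above and adds up points for each one an account triggers. The point values, field names and sample data are assumptions for illustration; a production system would use many more signals and tuned weights.

```python
from collections import defaultdict

# Hypothetical points per pattern (integers keep the arithmetic exact).
WEIGHTS = {"shared_device": 50, "single_extreme_review": 30}


def score_accounts(reviews, min_rating=0, max_rating=5):
    """Return {account: points} based on two simple behavior patterns."""
    accounts_by_device = defaultdict(set)
    reviews_by_account = defaultdict(list)
    for r in reviews:
        accounts_by_device[r["device"]].add(r["account"])
        reviews_by_account[r["account"]].append(r)

    scores = {}
    for account, items in reviews_by_account.items():
        score = 0
        # Pattern 1: the account shares a device with at least one other account.
        if any(len(accounts_by_device[r["device"]]) > 1 for r in items):
            score += WEIGHTS["shared_device"]
        # Pattern 2: the account left exactly one review, at the very top or bottom rating.
        if len(items) == 1 and items[0]["rating"] in (min_rating, max_rating):
            score += WEIGHTS["single_extreme_review"]
        scores[account] = score
    return scores


# Made-up review records.
reviews = [
    {"account": "a1", "device": "d1", "rating": 5},
    {"account": "a2", "device": "d1", "rating": 0},
    {"account": "a3", "device": "d2", "rating": 3},
    {"account": "a3", "device": "d2", "rating": 4},
]
scores = score_accounts(reviews)
```

Accounts a1 and a2 trigger both patterns, while a3 triggers neither. Scores like these could drive the visual flagging of suspicious reviews described below.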

Let’s take a look at an example. Here is a subset of review data:

The circular nodes represent reviews, and are color-coded by rating (0 to 5, red to green). Reviews
previously removed as fraudulent show as ‘ghosted’ (faded) red ‘X’ nodes. The Time Bar shows review
volume over time.

The visual model we have used

There are three pieces of information associated with each review: the business reviewed (building icon),
the IP address used (computer icon), and the email address provided (@ symbol icon). Reviews flagged by
the system as suspicious use a heavy red link, instead of the default blue:

One IP address is responsible for 7 reviews for one establishment, three of which have already been deleted

With this visual data model, we can start to pick out and investigate unusual patterns of behavior. For
example, in the image above, one IP address has been used to submit seven reviews about a single
business, using four email addresses. Three reviews have already been removed as fake.
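
The pattern in this chart can also be expressed as a grouping query. The sketch below groups reviews by IP address and business, then summarizes each busy pair; the field names, the minimum of three reviews and the sample records (built to mirror the counts described above) are assumptions.

```python
from collections import defaultdict


def suspicious_ip_business_pairs(reviews, min_reviews=3):
    """Summarize (ip, business) pairs that have at least min_reviews reviews."""
    grouped = defaultdict(list)
    for r in reviews:
        grouped[(r["ip"], r["business"])].append(r)

    result = {}
    for key, items in grouped.items():
        if len(items) >= min_reviews:
            result[key] = {
                "reviews": len(items),
                "emails": len({r["email"] for r in items}),
                "removed": sum(1 for r in items if r["removed"]),
            }
    return result


# Made-up data: seven reviews from one IP, four email addresses, three already removed.
emails = ["e1", "e1", "e2", "e2", "e3", "e3", "e4"]
reviews = [
    {"ip": "203.0.113.7", "business": "B1", "email": e, "removed": i < 3}
    for i, e in enumerate(emails)
]
reviews.append({"ip": "198.51.100.2", "business": "B1", "email": "e5", "removed": False})
summary = suspicious_ip_business_pairs(reviews)
```

The single review from the second IP address falls below the minimum and is ignored, leaving only the pair an investigator would want to expand.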

If we expand outwards on one of the deleted reviews, we see more clues of a possible attempt to
manipulate ratings:

One email address has been used to submit 6 zero-star reviews about a single business, using multiple IP addresses

This time, one email address has been used to submit 6 zero-star reviews about a single business, but
using 4 different IP addresses (or, more likely, a proxy IP address).

This visualization approach provides a fast and intuitive way to digest large amounts of data, improving
the quality and speed of investigations and allowing fraud to be identified more quickly.

Bonus example: Credit Card Fraud

This final example shows how KeyLines can help investigators understand the chronology of events,
allowing us to identify a fraud’s point of origin and follow subsequent transactions.

This dataset (taken from the Neo4j GraphGist, Credit Card Fraud Detection) is fake but helps us to show a
useful visual model for credit card fraud data.

Here, the main graph shows us relations between people and merchants (nodes), with the transactions
shown as links. Red links indicate disputed transactions and merchant nodes are enhanced with glyphs
showing the total dollar value of transactions within the scope of the time bar:

The time bar shows two aggregated trends. The histogram represents the total volume of transactions in
the network. The red trend-line shows the value of disputed transactions.

Using this visual model, we can follow transactions and begin to find useful insight. For example, by
zooming into specific spikes we can find times of interest. The largest spike is at 18:00 on 18 June, when
four separate disputed transactions totaling $6009 took place:

A vendor-centric view
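
The two aggregates shown in the time bar, and the search for the largest spike, can be sketched as below. Hourly buckets, the field names, the year and the amounts are all assumptions for illustration; they are not the values in the dataset described here.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_by_hour(transactions):
    """Return (transaction count per hour, disputed value per hour)."""
    volume = defaultdict(int)
    disputed_value = defaultdict(int)
    for t in transactions:
        bucket = t["time"].replace(minute=0, second=0, microsecond=0)
        volume[bucket] += 1
        if t["disputed"]:
            disputed_value[bucket] += t["amount"]
    return dict(volume), dict(disputed_value)


# Made-up transactions with whole-dollar amounts.
transactions = [
    {"time": datetime(2024, 6, 18, 18, 5), "amount": 100, "disputed": True},
    {"time": datetime(2024, 6, 18, 18, 40), "amount": 250, "disputed": True},
    {"time": datetime(2024, 6, 18, 9, 15), "amount": 40, "disputed": False},
    {"time": datetime(2024, 6, 17, 12, 0), "amount": 60, "disputed": True},
]
volume, disputed_value = aggregate_by_hour(transactions)

# The hour with the highest disputed value is the spike worth zooming into.
peak = max(disputed_value, key=disputed_value.get)
```

The count per bucket corresponds to the histogram and the disputed value per bucket to the trend line.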

Or we can take a person-centric view. Here we can see that Madison’s card was used for 3 disputed
transactions in mid-June, starting at RadioShack on 01 June:

Figure 13: Madison’s card has been used for 3 disputed transactions

In fact, by selecting only those people with disputes we can see a cluster of potential frauds surrounding
a small group of merchants. The Time Bar helps us pin-point RadioShack again as the earliest disputed
transaction, and a potential point of origin for the string of credit card frauds:

This example shows how simple and intuitive it is to perform fraud investigation using graph visualization.
With a few clicks, it is possible to translate a mass of complex data into specific and useful insight.
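
The point-of-origin reasoning in this example can be approximated in code: for each person with a dispute, take the merchant of their earliest disputed transaction, then count which merchant appears most often. This is a simplified sketch; apart from Madison and RadioShack, the names, dates and record layout are invented.

```python
from collections import Counter
from datetime import date


def likely_points_of_origin(transactions):
    """Count merchants by how often they are a person's earliest disputed transaction."""
    first_disputed = {}
    for t in sorted(transactions, key=lambda t: t["date"]):
        if t["disputed"] and t["person"] not in first_disputed:
            first_disputed[t["person"]] = t["merchant"]
    return Counter(first_disputed.values()).most_common()


# Made-up transactions; the year is hypothetical.
transactions = [
    {"person": "Madison", "merchant": "RadioShack", "date": date(2024, 6, 1), "disputed": True},
    {"person": "Madison", "merchant": "Merchant X", "date": date(2024, 6, 15), "disputed": True},
    {"person": "Person B", "merchant": "Merchant Y", "date": date(2024, 6, 2), "disputed": False},
    {"person": "Person B", "merchant": "RadioShack", "date": date(2024, 6, 3), "disputed": True},
    {"person": "Person C", "merchant": "Merchant X", "date": date(2024, 6, 10), "disputed": True},
]
origins = likely_points_of_origin(transactions)
```

A ranking like this only suggests where to look first; the chart and time bar let the investigator confirm whether the sequence of events actually supports it.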

Visualize your own fraud data
What is KeyLines?

KeyLines, by Cambridge Intelligence, is a unique technology for building powerful web applications for
network visualization.

Using the KeyLines toolkit, developers can rapidly incorporate visualization components to deploy
alongside existing fraud detection and investigation platforms. These applications turn raw connected
data into powerful interactive charts, empowering users to ‘join the dots’ to discover patterns and
anomalies.

The applications built using KeyLines run in virtually any web browser and on any device. A flexible
architecture means they can be deployed into existing IT environments, as part of a dashboard or as
standalone tools. Data can be pulled from multiple sources and interactive charts can be shared for
reporting purposes.

The advanced functionality available includes:

• Easy filtering

• Multiple automated layouts

• Social network analysis (SNA) measures

• Node and link aggregation and grouping

• A time bar for visualizing temporal data

• A mapping integration to visualize connected data with a geospatial dimension.

Organizations that have deployed KeyLines for fraud management include Cifas, the UK fraud prevention
authority, TripAdvisor, Fico, Aviva, Visa, JP Morgan Chase, Western Union, Allianz and BAE Systems.

To register for a free trial, visit http://cambridge-intelligence.com/try-keylines.

Want to learn more?


We have extra resources and information available to download from our website.
http://cambridge-intelligence.com/keylines
If you have any questions about pricing, or would like a free trial, just get in touch.
We would be delighted to help!
http://cambridge-intelligence.com/contact

