This action might not be possible to undo. Are you sure you want to continue?
Hidden Markov Model
Abhinav Srivastava, Amlan Kundu, Shamik Sural, Senior Member, IEEE, and
Arun K. Majumdar, Senior Member, IEEE
Abstract—Due to a rapid advancement in the electronic commerce technology, the use of credit cards has dramatically increased. As
credit card becomes the most popular mode of payment for both online as well as regular purchase, cases of fraud associated with it are
also rising. In this paper, we model the sequence of operations in credit card transaction processing using a Hidden Markov Model
(HMM) and showhowit can be used for the detection of frauds. An HMMis initially trained with the normal behavior of a cardholder. If an
incoming credit card transaction is not accepted by the trained HMMwith sufficiently high probability, it is considered to be fraudulent. At
the same time, we try to ensure that genuine transactions are not rejected. We present detailed experimental results to show the
effectiveness of our approach and compare it with other techniques available in the literature.
Index Terms—Internet, online shopping, credit card, ecommerce security, fraud detection, Hidden Markov Model.
Ç
1 INTRODUCTION
T
HE popularity of online shopping is growing day by day.
According to an ACNielsen study conducted in 2005,
onetenth of the world’s population is shopping online [1].
Germany andGreat Britain have the largest number of online
shoppers, and credit card is the most popular mode of
payment (59percent). About 350milliontransactions per year
were reportedly carried out by Barclaycard, the largest credit
card company in the United Kingdom, toward the end of the
last century [2]. Retailers like WalMart typically handle
much larger number of credit card transactions including
online and regular purchases. As the number of credit card
users rises worldwide, the opportunities for attackers tosteal
credit card details and, subsequently, commit fraud are also
increasing. The total credit card fraud in the United States
itself is reported to be $2.7 billion in 2005 and estimated to be
$3.0 billion in 2006, out of which $1.6 billion and $1.7 billion,
respectively, are the estimates of online fraud [3].
Creditcardbased purchases can be categorized into two
types: 1) physical card and 2) virtual card. In a physicalcard
based purchase, the cardholder presents his card physically
to a merchant for making a payment. To carry out fraudulent
transactions in this kind of purchase, an attacker has to steal
the credit card. If the cardholder does not realize the loss of
card, it canleadtoa substantial financial loss tothe credit card
company. In the second kind of purchase, only some
important information about a card(card number, expiration
date, secure code) is required to make the payment. Such
purchases are normally done on the Internet or over the
telephone. To commit fraud in these types of purchases, a
fraudster simply needs to know the card details. Most of the
time, the genuine cardholder is not aware that someone else
has seenor stolenhis cardinformation. The onlywaytodetect
this kindof fraudis to analyze the spending patterns onevery
card and to figure out any inconsistency with respect to the
“usual” spending patterns. Fraud detection based on the
analysis of existing purchase data of cardholder is a
promising way to reduce the rate of successful credit card
frauds. Since humans tend to exhibit specific behavioristic
profiles, every cardholder can be represented by a set of
patterns containing information about the typical purchase
category, the time since the last purchase, the amount of
money spent, etc. Deviation from such patterns is a potential
threat to the system.
Several techniques for the detection of credit card fraud
have been proposed in the last few years. We briefly review
some of them in Section 2.
2 RELATED WORK ON CREDIT CARD FRAUD
DETECTION
Credit card fraud detection has drawn a lot of research
interest and a number of techniques, with special emphasis
on data mining and neural networks, have been suggested.
Ghosh and Reilly [4] have proposed credit card fraud
detection with a neural network. They have built a detection
system, which is trained on a large sample of labeled credit
card account transactions. These transactions contain exam
ple fraud cases due to lost cards, stolen cards, application
fraud, counterfeit fraud, mailorder fraud, and nonreceived
issue (NRI) fraud. Recently, Syeda et al. [5] have usedparallel
granular neural networks (PGNNs) for improving the speed
of data mining and knowledge discovery process in credit
card fraud detection. A complete system has been imple
mented for this purpose. Stolfo et al. [6] suggest a credit card
fraud detection system(FDS) using metalearning techniques
to learn models of fraudulent credit card transactions.
Metalearning is a general strategy that provides a means for
combining and integrating a number of separately built
classifiers or models. A metaclassifier is thus trained on the
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 5, NO. 1, JANUARYMARCH 2008 37
. The authors are with the School of Information Technology, Indian
Institute of Technology, Kharagpur, 721302 India.
Email: {abhinav.srivastava, amlan.kundu, shamik, akmj}
@sit.iitkgp.ernet.in.
Manuscript received 2 Aug. 2006; revised 11 Mar. 2007; accepted 21 Aug.
2007; published online 31 Aug. 2007.
For information on obtaining reprints of this article, please send email to:
tdsc@computer.org, and reference IEEECS Log Number TDSC01120806.
Digital Object Identifier no. 10.1109/TDSC.2007.70228.
15455971/08/$25.00 ß 2008 IEEE Published by the IEEE Computer Society
Authorized licensed use limited to: Rajarambapu Institute of Technology. Downloaded on August 2, 2009 at 09:30 from IEEE Xplore. Restrictions apply.
correlation of the predictions of the base classifiers. The same
group has also worked on a costbased model for fraud and
intrusiondetection[7]. Theyuse Java agents for Metalearning
(JAM), which is a distributed data mining system for credit
card fraud detection. A number of important performance
metrics like True Positive—False Positive (TPFP) spreadand
accuracy have been defined by them.
Aleskerov et al. [8] present CARDWATCH, a database
mining system used for credit card fraud detection. The
system, based on a neural learning module, provides an
interface to a variety of commercial databases. Kim and Kim
have identified skewed distribution of data and mix of
legitimate and fraudulent transactions as the two main
reasons for the complexity of credit card fraud detection [9].
Based on this observation, they use fraud density of real
transaction data as a confidence value and generate the
weighted fraud score to reduce the number of misdetections.
Fan et al. [10] suggest the application of distributed data
mining in credit card fraud detection. Brause et al. [11] have
developed an approach that involves advanced data mining
techniques and neural network algorithms to obtain high
fraud coverage. Chiu and Tsai [12] have proposed Web
services and data mining techniques to establish a colla
borative scheme for fraud detection in the banking industry.
With this scheme, participating banks share knowledge
about the fraud patterns in a heterogeneous and distributed
environment. To establish a smooth channel of data
exchange, Web services techniques such as XML, SOAP,
and WSDL are used. Phua et al. [13] have done an extensive
survey of existing dataminingbased FDSs and published a
comprehensive report. Prodromidis and Stolfo [14] use an
agentbased approach with distributed learning for detect
ing frauds in credit card transactions. It is based on artificial
intelligence and combines inductive learning algorithms
and metalearning methods for achieving higher accuracy.
Phua et al. [15] suggest the use of metaclassifier similar to
[6] infrauddetectionproblems. TheyconsidernaiveBayesian,
C4.5, and Back Propagation neural networks as the base
classifiers. A metaclassifier is used to determine which
classifier should be considered based on skewness of data.
Although they do not directly use credit card fraud detection
as thetarget application, their approachis quitegeneric. Vatsa
et al. [16] have recently proposed a gametheoretic approach
to credit card fraud detection. They model the interaction
betweenanattacker andanFDSas amultistagegamebetween
two players, each trying to maximize his payoff.
The problem with most of the abovementioned ap
proaches is that they require labeled data for both genuine,
as well as fraudulent transactions, to train the classifiers.
Getting realworld fraud data is one of the biggest problems
associated with credit card fraud detection. Also, these
approaches cannot detect new kinds of frauds for which
labeleddata is not available. In contrast, we present a Hidden
Markov Model (HMM)based credit card FDS, which does
not require fraudsignatures andyet is able todetect frauds by
consideringa cardholder’s spendinghabit. We model a credit
card transaction processing sequence by the stochastic
process of an HMM. The details of items purchased in
individual transactions are usually not known to an FDS
runningat thebankthat issues credit cards tothecardholders.
This can be represented as the underlying finite Markov
chain, which is not observable. The transactions can only be
observed through the other stochastic process that produces
the sequence of the amount of money spent in each
transaction. Hence, we feel that HMM is an ideal choice for
addressingthis problem. Another important advantage of the
HMMbasedapproach is a drastic reduction in the number of
False Positives (FPs)—transactions identified as malicious by
an FDS although they are actually genuine. Since the number
of genuine transactions is a few orders of magnitude higher
than the number of malicious transactions, an FDS should be
designed in such a way that the number of FPs is as low as
possible. Otherwise, due to the “base rate fallacy” effect [17],
bank administrators may tend to ignore the alarms. To the
best of our knowledge, there is no other published literature
on the application of HMM for credit card fraud detection.
The rest of the paper is organized as follows: In Section 3,
we first briefly explain the working principle of an HMM.
We then show how to model credit card transaction
processing using HMM in Section 4. We also describe the
complete process flow of the proposed FDS in this section.
Detailed experimental result is presented in Section 5.
Finally, we conclude the paper with some discussions.
3 HMM BACKGROUND
An HMM is a double embedded stochastic process with two
hierarchy levels. It can be used to model much more
complicatedstochastic processes as comparedto a traditional
Markovmodel. AnHMMhas a finite set of states governedby
a set of transition probabilities. In a particular state, an
outcome or observation can be generated according to an
associatedprobability distribution. It is onlythe outcome and
not the state that is visible to an external observer [18].
HMMbased applications are common in various areas
such as speech recognition, bioinformatics, and genomics. In
recent years, Joshi and Phoba [19] have investigated the
capabilities of HMMin anomaly detection. They classify TCP
network traffic as an attack or normal using HMM. Cho and
Park [20] suggest an HMMbased intrusion detection system
that improves the modeling time and performance by
considering only the privilege transition flows based on the
domain knowledge of attacks. Ourston et al. [21] have
proposed the application of HMM in detecting multistage
network attacks. Hoang et al. [22] present a new method to
process sequences of systemcalls for anomalydetectionusing
HMM. The key idea is tobuilda multilayer model of program
behaviors based on both HMMs and enumerating methods
for anomaly detection. Lane [23] has used HMM to model
humanbehavior. Once humanbehavior is correctly modeled,
any detecteddeviationis a cause for concernsince an attacker
is not expectedto have a behavior similar to the genuine user.
Hence, an alarm is raised in case of any deviation.
An HMM can be characterized by the following [18]:
1. ` is the number of states in the model. We denote
the set of states o ¼ fo
1
. o
2
. . . . o
`
g, where o
i
, i ¼
1. 2. . . . . ` is an individual state. The state at time
instant t is denoted by c
t
.
2. ` is the number of distinct observation symbols per
state. The observation symbols correspond to the
physical output of the system being modeled. We
denote the set of symbols \ ¼ f\
1
. \
2
. . . . \
`
g, where
\
i
, i ¼ 1. 2. . . . . ` is an individual symbol.
38 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 5, NO. 1, JANUARYMARCH 2008
Authorized licensed use limited to: Rajarambapu Institute of Technology. Downloaded on August 2, 2009 at 09:30 from IEEE Xplore. Restrictions apply.
3. Thestatetransitionprobabilitymatrix¹ ¼ ½o
i,
, where
o
i,
¼Pðc
tþ1
¼o
,
jc
t
¼o
i
Þ. 1i `. 1, `; t ¼1. 2. . . . .
ð1Þ
For the general case where any state , can be reached
from any other state i in a single step, we have o
i,
0 for all i, ,. Also,
P
`
,¼1
o
i,
¼ 1, 1 i `.
4. The observation symbol probability matrix 1 ¼
½/
,
ð/Þ, where
/
,
ð/Þ ¼ Pð\
/
jo
,
Þ. 1 , `. 1 / ` and
X
`
/¼1
/
,
ð/Þ ¼ 1. 1 , `.
ð2Þ
5. The initial state probability vector ¬ ¼ ½¬
i
, where
¬
i
¼ Pðc
1
¼ o
i
Þ. 1 i `. such that
X
`
i¼1
¬
i
¼ 1. ð3Þ
6. The observation sequence O ¼ O
1
. O
2
. O
3
. . . . O
1
,
where each observation O
t
is one of the symbols from
\ , and 1 is the number of observations in the
sequence.
It is evident that a complete specificationof anHMMrequires
the estimation of two model parameters, ` and `, and three
probability distributions ¹, 1, and ¬. We use the notation
` ¼ ð¹. 1. ¬Þ to indicate the complete set of parameters of the
model, where ¹, 1 implicitly include ` and `.
An observation sequence O, as mentioned above, can be
generated by many possible state sequences. Consider one
such particular sequence
Q ¼ q
1
. q
2
. . . . . q
R
. ð4Þ
where q
1
is the initial state.
The probability that O is generated from this state
sequence is given by
PðOjQ. `Þ ¼
Y
1
t¼1
PðO
t
jc
t
. `Þ. ð5Þ
where statistical independence of observations is assumed.
Equation (5) can be expanded as
PðOjQ. `Þ ¼ b
c1
ðO
1
Þ.b
c2
ðO
2
Þ . . . b
c1
ðO
R
Þ. ð6Þ
The probability of the state sequence Q is given as
PðQj`Þ ¼ ¬
c1
a
c1c2
a
c2c3
. . . a
c1À1c1
. ð7Þ
Thus, the probability of generation of the observation
sequence O by the HMM specified by ` can be written as
follows:
PðOj`Þ ¼
X
o Q
PðOjQ. `ÞPðQj`Þ. ð8Þ
Derivingthevalueof PðOj`Þ usingthedirect definitionof (8) is
computationally intensive. Hence, a procedure named as
ForwardBackwardprocedure[18] isusedtocomputePðOj`Þ.
4 USE OF HMM FOR CREDIT CARD FRAUD
DETECTION
An FDS runs at a credit card issuing bank. Each incoming
transaction is submitted to the FDS for verification. FDS
receives the card details and the value of purchase to verify
whether the transaction is genuine or not. The types of goods
that are bought inthat transactionare not knowntothe FDS. It
tries to find any anomaly in the transaction based on the
spending profile of the cardholder, shipping address, and
billing address, etc. If the FDS confirms the transaction to be
malicious, it raises analarm, andthe issuingbankdeclines the
transaction. The concernedcardholder maythenbe contacted
and alerted about the possibility that the card is compro
mised. In this section, we explain how HMM can be used for
credit card fraud detection.
A set of notations and acronyms used in the paper is
given in Table 1.
4.1 HMM Model for Credit Card Transaction
Processing
To map the credit card transaction processing operation in
terms of an HMM, we start by first deciding the observation
symbols inour model. We quantizethe purchase values rinto
` price ranges \
1
. \
2
. . . . \
`
, forming the observation
symbols at the issuing bank. The actual price range for each
symbol is configurable based on the spending habit of
individual cardholders. These price ranges can be deter
mineddynamically by applying a clustering algorithmonthe
values of each cardholder’s transactions, as shown in
Section 5.2. We use \
/
, / ¼ 1. 2. . . . `, to represent both the
observationsymbol, as well as the correspondingprice range.
In this work, we consider only three price ranges, namely,
low ðÞ, medium ðiÞ, and high ð/Þ. Our set of observation
symbols is, therefore, \ ¼ f. i. /g making ` ¼ 3. For
example, let  ¼ ð0. $100, i ¼ ð$100. $500, and / ¼ ð$500.
credit card limit. If a cardholder performs a transaction of
$190, then the corresponding observation symbol is i.
A credit cardholder makes different kinds of purchases of
different amounts over a period of time. One possibility is to
SRIVASTAVA ET AL.: CREDIT CARD FRAUD DETECTION USING HIDDEN MARKOV MODEL 39
TABLE 1
Notations and Acronyms
Authorized licensed use limited to: Rajarambapu Institute of Technology. Downloaded on August 2, 2009 at 09:30 from IEEE Xplore. Restrictions apply.
consider the sequence of transaction amounts and look for
deviations in them. However, the sequence of types of
purchase is more stable compared to the sequence of
transaction amounts. The reason is that, a cardholder makes
purchases depending on his need for procuring different
types of items over a period of time. This, in turn, generates a
sequence of transactionamounts. Eachindividual transaction
amount usually depends on the corresponding type of
purchase. Hence, we consider the transition in the type of
purchase as state transition in our model. The type of each
purchase is linkedto the line of business of the corresponding
merchant. This information about the merchant’s line of
business is not known to the issuing bank running the FDS.
Thus, the type of purchase of the cardholder is hidden from
the FDS. The set of all possible types of purchase and,
equivalently, the set of all possible lines of business of
merchants forms the set of hidden states of the HMM. It
should be noted at this stage that the line of business of the
merchant is known to the acquiring bank, since this
information is furnished at the time of registration of a
merchant. Also, some merchants may be dealing in various
types of commodities (For example, WalMart, KMart, or
Target sells tens of thousands of different items). Such types
of line of business are consideredas Miscellaneous, andwedo
not attempt to determine the actual types of items purchased
in these transactions. Any assumption about availability of
this information with the issuing bank and, hence, with the
FDS, is not practical and, therefore, would not have been
valid. In Section 5, we show the effect of the choice of the
number of states on the system performance.
After deciding the state and symbol representations, the
next step is to determine the probability matrices ¹, 1, and
¬ so that representation of the HMM is complete. These
three model parameters are determined in a training phase
using the BaumWelch algorithm [18]. The initial choice of
parameters affects the performance of this algorithm and,
hence, they should be chosen carefully.
We consider the special case of fully connected HMM in
which every state of the model can be reached in a single step
from every other state, as shown in Fig. 1. Gr, El, Mi, etc., are
names given to the states to denote purchase types like
Groceries, Electronic items, and Miscellaneous purchases.
Spending profiles of the individual cardholders are used to
obtain an initial estimate for probability matrix 1 of (2). In
Section 4.2, we explain how to determine observation
symbols dynamically from a cardholder’s transactions.
4.2 Dynamic Generation of Observation Symbols
For each cardholder, we train and maintain an HMM. To
find the observation symbols corresponding to individual
cardholder’s transactions dynamically, we run a clustering
algorithm on his past transactions. Normally, the transac
tions that are stored in the issuing bank’s database contain
many attributes. For our work, we consider only the amount
that the cardholder spent in his transactions. Although
various clustering techniques could be used, we use
1means clustering algorithm [24] to determine the clusters.
1means is an unsupervised learning algorithmfor grouping
a given set of data based on the similarity in their attribute
(often called feature) values. Each group formed in the
process is called a cluster. The number of clusters 1 is fixed a
priori. The grouping is performed by minimizing the sum of
squares of distances between each data point and the
centroid of the cluster to which it belongs.
In our work, 1 is the same as the number of observation
symbols `. Let c
1
. c
2
. . . . c
`
be the centroids of the
generated clusters. These centroids or mean values are
used to decide the observation symbols when a new
transaction comes in. Let r be the amount spent by the
cardholder n in transaction T. FDS generates the observa
tion symbol for r (denoted by O
r
) as follows:
O
r
¼ \
oio iii
i
jrÀc
i
j
. ð9Þ
As mentioned before, the number of symbols is 3 in our
system. Considering ` ¼ 3, if we execute 1means algo
rithm on the example transactions in Table 2, we get the
clusters, as shown in Table 3, with c

, c
i
, and c
/
as the
respective centroids. It may be noted that the dollar amounts
5, 10, and 10 have been clustered together as c

resulting in a
centroid of 8.3. The percentage ðjÞ of total number of
transactions in this cluster is thus 30 percent. Similarly,
40 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 5, NO. 1, JANUARYMARCH 2008
Fig. 1. HMM for credit card fraud detection.
Authorized licensed use limited to: Rajarambapu Institute of Technology. Downloaded on August 2, 2009 at 09:30 from IEEE Xplore. Restrictions apply.
dollar amounts 15, 15, 20, 25, and25 have beengroupedinthe
cluster c
i
with centroid 20, whereas amounts 40 and 80 have
been grouped together in cluster c
/
. c
i
and c
/
, thus, contain
50 percent and 20 percent of the total number of transactions.
When the FDS receives a transaction T for this cardholder, it
measures the distance of the purchase amount r with respect
to the means c

, c
i
, and c
/
to decide (using (9)) the cluster to
which T belongs and, hence, the corresponding observation
symbol. As an example, if r ¼ $10, then in Table 3 using (9),
the observation symbol is \
1
¼ .
4.3 Spending Profile of Cardholders
The spending profile of a cardholder suggests his normal
spending behavior. Cardholders can be broadly categorized
into three groups based on their spending habits, namely,
highspending (hs) group, mediumspending (ms) group,
and lowspending (ls) group. Cardholders who belong to the
hs group, normally use their credit cards for buying high
priced items. Similar definition applies to the other two
categories also.
Spending profiles of cardholders are determined at the
end of the clustering step. Let j
i
be the percentage of total
number of transactions of the cardholder that belong to
cluster with mean c
i
. Then, the spending profile (SP) of the
cardholder n is determined as follows:
o1ðnÞ ¼ oio ior
i
ðj
i
Þ. ð10Þ
Thus, spending profile denotes the cluster number to which
most of the transactions of the cardholder belong. In the
example inTable 3, the spendingprofile of the cardholder is 2,
that is, iand, hence, the cardholder belongs to the ms group.
4.4 Model Parameter Estimation and Training
We use BaumWelch algorithm to estimate the HMM
parameters for each cardholder. The algorithm starts with
an initial estimate of HMM parameters ¹, 1, and ¬ and
converges to the nearest local maximum of the likelihood
function. Initial state probability distribution is considered to
be uniform, that is, if there are ` states, then the initial
probability of each state is 1´`. Initial guess of transition and
observation probability distributions can also be considered
to be uniform. However, to make the initial guess of
observation symbol probabilities more accurate, spending
profile of the cardholder, as determined in Section 4.3, is
taken into account. We make three sets of initial probability
for observation symbol generation for three spending
groups—ls, ms, and hs. Based on the cardholder’s spending
profile, we choose the corresponding set of initial observation
probabilities. The initial estimate of symbol generation
probabilities using this method leads to accurate learning of
the model. Since there is noa priori knowledge about the state
transition probabilities, we consider the initial guesses to be
uniform. Incase of a collaborative workbetweenanacquiring
bank and an issuing bank, we can have better initial guess
about state transition probabilities as well.
We now start training the HMM. The training algorithm
has the following steps: 1) initialization of HMM para
meters, 2) forward procedure, and 3) backward procedure.
Details of these steps can be found in [18]. For training the
HMM, we convert the cardholder’s transaction amount into
observation symbols and form sequences out of them. At
the end of the training phase, we get an HMM correspond
ing to each cardholder. Since this step is done offline, it does
not affect the credit card transaction processing perfor
mance, which needs online response.
4.5 Fraud Detection
After the HMM parameters are learned, we take the
symbols from a cardholder’s training data and form an
initial sequence of symbols. Let O
1
. O
2
. . . . O
1
be one such
sequence of length 1. This recorded sequence is formed
from the cardholder’s transactions up to time t. We input
this sequence to the HMM and compute the probability of
acceptance by the HMM. Let the probability be c
1
, which
can be written as follows:
c
1
¼ PðO
1
. O
2
. O
3
. . . . O
1
j`Þ. ð11Þ
Let O
1þ1
be the symbol generated by a new transaction at
time t þ 1. To form another sequence of length 1, we drop
O
1
and append O
1þ1
in that sequence, generating
O
2
. O
3
. . . . O
1
. O
1þ1
as the new sequence. We input this
new sequence to the HMM and calculate the probability of
acceptance by the HMM. Let the new probability be c
2
c
2
¼ PðO
2
. O
3
. O
4
. . . . O
1þ1
j`Þ. ð12Þ
Let Ác ¼ c
1
Àc
2
. ð13Þ
If Ác 0, it means that the new sequence is accepted by the
HMM with low probability, and it could be a fraud. The
newly added transaction is determined to be fraudulent if
the percentage change in the probability is above a threshold,
that is,
Ác´c
1
! Threshold. ð14Þ
The threshold value can be learned empirically, as will be
discussed in Section 5. If O
1þ1
is malicious, the issuing bank
does not approve the transaction, and the FDS discards the
symbol. Otherwise, O
1þ1
is added in the sequence perma
nently, andthe newsequence is usedas the base sequence for
determining the validity of the next transaction. The reason
for including newnonmalicious symbols inthe sequence is to
capture the changing spending behavior of a cardholder.
Fig. 2 shows the complete process flow of the proposed FDS.
As shown in the figure, the FDS is divided into two
parts—one is the training module, and the other is detection.
SRIVASTAVA ET AL.: CREDIT CARD FRAUD DETECTION USING HIDDEN MARKOV MODEL 41
TABLE 3
Output of KMeans Clustering Algorithm
TABLE 2
Example Transactions with the Dollar Amount
Spent in Each Transaction
Authorized licensed use limited to: Rajarambapu Institute of Technology. Downloaded on August 2, 2009 at 09:30 from IEEE Xplore. Restrictions apply.
Training phase is performed offline, whereas detection is an
online process.
5 RESULTS
Testing credit card FDSs using real data set is a difficult task.
Banks do not, in general, agree to share their data with
researchers. There is also no benchmark data set available for
experimentation. We have, therefore, performed largescale
simulation studies to test the efficacy of the system. A
simulator is usedto generate a mix of genuine andfraudulent
transactions. The number of fraudulent transactions in a
given length of mixed transactions is normally distributed
with a user specified j (mean) and o (standard deviation),
taking cardholder’s spending behavior into account. j
specifies the average number of fraudulent transactions in a
given transaction mix. In a typical scenario, an issuing bank,
and hence, its FDS receives a large number of genuine
transactions sparingly intermixed with fraudulent transac
tions. Thegenuinetransactions aregeneratedaccordingtothe
cardholders’ profiles. The cardholders are classified into
three categories as mentioned before—the low, medium, and
hs groups. We have studiedthe effects of spendinggroupand
the percentage of transactions that belong to the low,
medium, and highpricerange clusters. We use standard
metrics—True Positive (TP) and FP, as well as TPFP spread
and Accuracy metrics, as proposed in [7] to measure the
effectiveness of the system. TP represents the fraction of
fraudulent transactions correctly identified as fraudulent,
whereas FP is the fraction of genuine transactions identified
as fraudulent. Most of the design choices for a FDS that result
in higher values of TP, also cause FP to increase. To
meaningfully capture the performance of such a system, the
difference between TP and FP, often called the TPFP spread,
is used as a metric. Accuracy represents the fraction of total
number of transactions (both genuine and fraudulent) that
have been detected correctly. It can be expressed as follows:
Accuracy ¼
No. of good trans. detected as goodþNo. of bad trans. detected as bad
Total No. of transactions
. ð15Þ
We first carried out a set of experiments to determine the
correct combination of HMM design parameters, namely,
the number of states, the sequence length, and the threshold
value. Once these parameters were decided, we performed
comparative study with another FDS.
5.1 Choice of Design Parameters
Since there are three parameters in an HMM, we need to
vary one at a time keeping the other two fixed, thus
generating a large number of possible combinations. For
choosing the design parameters, we generate transaction
sequences using 95 percent low value, 3 percent medium
value, and 2 percent high value transactions. The reason for
using this mix is that it represents a profile that strongly
resembles a ls customer profile. We also consider the j and
o values to be 1.0 and 0.5, respectively. This is chosen so
that, on the average, there will be 1 fraudulent transaction
in any incoming sequence with some scope for variation.
After the parameter values are fixed, we will see in
Section 5.2, how the system performs as we vary the profile
and the mix of fraudulent transactions.
For parameter selection, the sequence length is varied
from 5 to 25 in steps of 5. The threshold values considered
are 30 percent, 50 percent, 70 percent, and 90 percent. The
number of states is varied from 5 to 10 in steps of 1. We
consider both TP and FP for deciding the optimum
parameter values. Thus, there are a total of 120 ð5 Â 4 Â 6Þ
possible combinations of parameters. The number of
simulation runs required for obtaining results with a given
confidence interval (CI) was derived as follows [25], [26].
Aninitial set of fivesimulationruns, eachwith100samples,
was carried out to estimate the mean and standard
deviation of both TP and FP for a fixed sequence length,
number of states, andthresholdvalue. Mean TPwas foundto
be an order of magnitude higher than mean FP. Standard
deviation of TP was 0.1 and that for FP was 0.005. We set the
target 95 percent CI for TP and FP, respectively, as þ´ À
2.5 percent and þ´ À 0.25 percent around their mean values.
Using Student’s tdistribution, the minimum number of
simulation runs required for obtaining desired CI for TP
42 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 5, NO. 1, JANUARYMARCH 2008
Fig. 2. Process flow of the proposed FDS.
Authorized licensed use limited to: Rajarambapu Institute of Technology. Downloaded on August 2, 2009 at 09:30 from IEEE Xplore. Restrictions apply.
was derived as 83 and that for FP as 23. Based on these
observations, we set the number of simulation runs for all the
experiments to be 100. The results obtained were within the
desired CI, as mentioned above.
Since it is not convenient to present the detailed results
for each of the 120 combinations, we show summarized
results. In Table 4, we show the results for each value of
sequence length averaged over all the six states. Similarly,
we present results for each value of the number of states
averaged over all the five sequence lengths in Table 5.
In Tables 4 and 5, the highest value of TP, as well as the
lowest value of FP, has been highlighted for each row. It is
seen from the two tables that FP shows a clear trend of
decreasing with higher threshold and smaller sequence
lengths. However, the number of states does not have a
stronginfluence either onTPor onFP. InTable 4, it is seenthat
TP is high for sequence length 15 in 75 percent of the cases.
Also, fraud detection time increases linearly with the
sequence length, as shown in Fig. 3. The results have been
plotted for a Java implementation on a 1.8 GHz Pentium IV
processor machine. Hence, we choose 15 as the length of
observation symbol sequence for optimum performance.
Once sequence length is decided, it is seen in Table 4 that the
threshold could be set to either 30 percent or 50 percent.
Although TP is higher for threshold ¼ 30%, FP is also higher.
To minimize FP, we choose threshold ¼ 50%. After choosing
sequence lengthandthreshold, wehavetochoose thenumber
of states. Since there is no clear indication from the above
summaryinformation, as presentedinTables 4 and5, we take
a look at the detailed data for TP when threshold ¼ 50%, as
shown in Table 6.
It is seen that for sequence length ¼ 15, the highest value
of TP occurs for no. of states ¼ 10 and, hence, it would be a
good choice for our design.
We have also analyzed the time taken by the training
phase, which is performed offline for each cardholder’s
HMM. Fig. 4 shows the plot of model learning time against
the number of sequences in the training data. As the size of
training data increases, learning time increases, especially
beyond 100. We therefore, use 100 sequences for training the
HMM. Although done offline, the model learning time has a
strong impact on the scalability of the system. Since an HMM
SRIVASTAVA ET AL.: CREDIT CARD FRAUD DETECTION USING HIDDEN MARKOV MODEL 43
TABLE 4
Variation of TP and FP with Different Sequence Lengths
TABLE 5
Variation of TP and FP with Different Number of States
Fig. 3. Detection time versus length of sequence.
Authorized licensed use limited to: Rajarambapu Institute of Technology. Downloaded on August 2, 2009 at 09:30 from IEEE Xplore. Restrictions apply.
is trainedfor eachcardholder, it is imperative that thetraining
time is kept as low as possible especially when an issuing
bank is meant to handle millions of cardholders with many
newcards being issuedeveryday. The online processing time
of about 200 ms on a 1.8 GHz PentiumIVmachine also shows
that the system will be able to handle a large number of
concurrent operations and, hence, is scalable.
Thus, our design parameter setting is given as follows:
1. number of hidden states ` ¼ 10,
2. length of observation sequence 1 ¼ 15,
3. Threshold value ¼ 50%, and
4. number of sequences for training ¼ 100.
With this design parameter setting, we next proceed to
study the performance of the system under various
combinations of input data.
5.2 Comparative Performance
In this section, we showperformance of the proposed system
as we varythe number of fraudulent transactions andalso the
spending profile of the cardholder. Our design parameter
setting is as obtained in the previous section. We compare
performance of our approach (denoted by OA below) with
the credit card fraud detection technique proposed by Stolfo
et al. [6] (denoted by ST below). For comparison, we consider
the metrics TP and FP, as well as TPFP and Accuracy [7].
We carried out experiments by varying both the transac
tion amount mix, as well as the number of fraudulent
transactions intermixed with a sequence of genuine transac
tions. Transaction amount mix is captured by the cardhol
der’s profile. We consider four profiles. One of them is the
mixed profile, which means that spending profile is not
considered at all by our approach, as explained in
Section 4.3. The other profiles considered are (55 35 10),
(70 20 10), and (95 3 2). Here, ðo / cÞ profile represents a ls
profile cardholder who has been found to carry out
o percent of his transactions in the low, / percent in
medium, and c percent in the high range. Thus, our attempt
is to see how the system performs in the presence of
different mixes of transaction amount ranges in the
transactions. It may be noted that for cardholders in the
other two groups, namely, hs and ms, will show similar
performance as only the relative ordering of o, /, and c will
change. We also vary the mean value j of malicious
transactions from 0.5 to 4.0 in steps of 0.5. The o value is
kept fixed at 0.5 for all the experiments. Thus, every
sequence of transaction that we use for testing is a mixed
sequence containing both genuine, as well as malicious,
transactions. For each combination of spending profile and
malicious transaction distribution, we carried out 100 runs
and report the average result. The same set of data was used
to determine the performance of both OA and ST.
Fig. 5a shows the variation of TP and FP for the two
approaches using the spending profile (95 3 2). Variation of
TPFP and Accuracy is shown in Fig. 5b.
It is seenfromthe figures that TPof the proposedapproach
is very close to that of Stolfo et al’s approach. Also, both the
approaches have almost similar values of FP. As a result, the
two systems have comparable accuracies and average TPFP
spread. Further, the two approaches exhibit similar trend
with variation in j. Next, we show how the two systems
behave as we mix the transaction amounts. Percentages of
low value, medium value, and high value transactions are
changed from(95 3 2), as shown above to (70 20 10) in Figs. 6a
and 6b and (55 35 10) in Figs. 7a and 7b.
There are a number of interesting observations from the
above two sets of figures. The first observation is that, TPfalls
andFPrises for boththeapproaches, as transactions nolonger
remain strictly ls in nature. However, the FP rate of ST rises
sharply, although the TP rate does not degrade to a great
extent. On the other hand, for our approach, FP rate remains
low while there is a graceful degradation of the TP rate. As a
result, the Accuracy of the proposed system remains close to
80 percent for all the above settings. Accuracy by Stolfo et al’s
approach falls drastically to about 60 percent for (55 35 10).
Thus, our approach has 1520 percent better Accuracy.
Although TPFP values are close for the profile (70 20 10),
there is more than 15 percent difference in TPFP for the
profile (55 35 10) with our approach performing better.
In Figs. 8a and 8b, we show the performance of the two
approaches when the profile is mixed, which means that all
thethreerangesof transactionsareequiprobable. It isseenthat
theFPof theSTmethodhas goneupsharply. Infact, FPis even
higher than TP. As a result, the mean accuracy has dropped
below40 percent. Our approach shows a fall in TP, but the FP
has not degradeda lot. As a result, the Accuracyof our system
still remains around 80 percent. The TPFP value for the ST
approach has become negative even for j ¼ 0.5. For our
approach, theTPFPvaluebecamenegativeonlyafter j ¼ 2.5.
From the above results, it can be concluded that the
proposed system has an overall Accuracy of 80 percent even
under large input condition variations, which is much higher
44 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 5, NO. 1, JANUARYMARCH 2008
Fig. 4. Model learning time versus number of sequences in training data.
TABLE 6
Detailed Result of TP for threshold ¼ 50%
Authorized licensed use limited to: Rajarambapu Institute of Technology. Downloaded on August 2, 2009 at 09:30 from IEEE Xplore. Restrictions apply.
SRIVASTAVA ET AL.: CREDIT CARD FRAUD DETECTION USING HIDDEN MARKOV MODEL 45
Fig. 5. Performance variation of the two systems (OA and ST) with the mean ðjÞ of malicious transaction distribution for the spending profile (95 3 2).
(a) TP and FP. (b) TPFP spread (SP) and Accuracy (AC).
Fig. 6. Performance variation of the two systems (OA and ST) within the mean ðjÞ of malicious transaction distribution for the spending profile
(70 20 10). (a) TP and FP. (b) TPFP spread (SP) and Accuracy (AC).
Authorized licensed use limited to: Rajarambapu Institute of Technology. Downloaded on August 2, 2009 at 09:30 from IEEE Xplore. Restrictions apply.
46 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 5, NO. 1, JANUARYMARCH 2008
Fig. 7. Performance variation of the two systems (OA and ST) with the mean ðjÞ of malicious transaction distribution for the spending profile
(55 35 10). (a) TP and FP. (b) TPFP spread (SP) and Accuracy (AC).
Fig. 8. Performance variation of the two systems (OA and ST) with mean ðjÞ of malicious transaction distribution for mixed profile. (a) TP and FP.
(b) TPFP spread (SP) and Accuracy (AC).
Authorized licensed use limited to: Rajarambapu Institute of Technology. Downloaded on August 2, 2009 at 09:30 from IEEE Xplore. Restrictions apply.
than the overall Accuracy of the method proposed by Stolfo
et al. [6]. Our system can, therefore, correctly detect most of
the transactions. However, when there is no profile informa
tion at all, the system shows some performance degradation
interms of TPFP. This observationhighlights the importance
of profile selection as explainedinSection 4. Also, when there
is little difference between genuine transactions and mal
icious transactions, most of the credit card FDSs suffer
performance degradation, either due toa fall inthe number of
TPs or a rise in the number of FPs.
6 CONCLUSIONS AND DISCUSSIONS
In this paper, we have proposed an application of HMM in
credit card fraud detection. The different steps in credit
card transaction processing are represented as the under
lying stochastic process of an HMM. We have used the
ranges of transaction amount as the observation symbols,
whereas the types of item have been considered to be states
of the HMM. We have suggested a method for finding the
spending profile of cardholders, as well as application of
this knowledge in deciding the value of observation
symbols and initial estimate of the model parameters. It
has also been explained how the HMM can detect whether
an incoming transaction is fraudulent or not. Experimental
results show the performance and effectiveness of our
system and demonstrate the usefulness of learning the
spending profile of the cardholders. Comparative studies
reveal that the Accuracy of the system is close to 80 percent
over a wide variation in the input data. The system is also
scalable for handling large volumes of transactions.
ACKNOWLEDGMENTS
The authors would like to the anonymous reviewers for
their constructive and useful comments. This work is
partially supported by a research grant from the Depart
ment of Information Technology, Ministry of Communica
tion and Information Technology, Government of India,
under Grant 12(34)/04IRSD dated 07/12/2004.
REFERENCES
[1] “Global Consumer Attitude Towards OnLine Shopping,”
http://www2.acnielsen.com/reports/documents/2005_cc_online
shopping.pdf, Mar. 2007.
[2] D.J. Hand, G. Blunt, M.G. Kelly, and N.M. Adams, “Data Mining
for Fun and Profit,” Statistical Science, vol. 15, no. 2, pp. 111131,
2000.
[3] “Statistics for General and OnLine Card Fraud,” http://www.
epaynews.com/statistics/fraud.html, Mar. 2007.
[4] S. Ghosh and D.L. Reilly, “Credit Card Fraud Detection with a
NeuralNetwork,” Proc. 27th Hawaii Int’l Conf. System Sciences:
Information Systems: Decision Support and KnowledgeBased Systems,
vol. 3, pp. 621630, 1994.
[5] M. Syeda, Y.Q. Zhang, and Y. Pan, “Parallel Granular Networks
for Fast Credit Card Fraud Detection,” Proc. IEEE Int’l Conf. Fuzzy
Systems, pp. 572577, 2002.
[6] S.J. Stolfo, D.W. Fan, W. Lee, A.L. Prodromidis, and P.K. Chan,
“Credit Card Fraud Detection Using MetaLearning: Issues and
Initial Results,” Proc. AAAI Workshop AI Methods in Fraud and Risk
Management, pp. 8390, 1997.
[7] S.J. Stolfo, D.W. Fan, W. Lee, A. Prodromidis, and P.K. Chan,
“CostBased Modeling for Fraud and Intrusion Detection: Results
from the JAM Project,” Proc. DARPA Information Survivability Conf.
and Exposition, vol. 2, pp. 130144, 2000.
[8] E. Aleskerov, B. Freisleben, and B. Rao, “CARDWATCH: A
Neural Network Based Database Mining System for Credit Card
Fraud Detection,” Proc. IEEE/IAFE: Computational Intelligence for
Financial Eng., pp. 220226, 1997.
[9] M.J. Kim and T.S. Kim, “A Neural Classifier with Fraud Density
Map for Effective Credit Card Fraud Detection,” Proc. Int’l Conf.
Intelligent Data Eng. and Automated Learning, pp. 378383, 2002.
[10] W. Fan, A.L. Prodromidis, and S.J. Stolfo, “Distributed Data
Mining in Credit Card Fraud Detection,” IEEE Intelligent Systems,
vol. 14, no. 6, pp. 6774, 1999.
[11] R. Brause, T. Langsdorf, and M. Hepp, “Neural Data Mining for
Credit Card Fraud Detection,” Proc. IEEE Int’l Conf. Tools with
Artificial Intelligence, pp. 103106, 1999.
[12] C. Chiu and C. Tsai, “A Web ServicesBased Collaborative Scheme
for Credit Card Fraud Detection,” Proc. IEEE Int’l Conf.
eTechnology, eCommerce and eService, pp. 177181, 2004.
[13] C. Phua, V. Lee, K. Smith, and R. Gayler, “A Comprehensive
Survey of Data MiningBased Fraud Detection Research,” http://
www.bsys.monash.edu.au/people/cphua/, Mar. 2007.
[14] S. Stolfo and A.L. Prodromidis, “AgentBased Distributed Learn
ing Applied to Fraud Detection,” Technical Report CUCS01499,
Columbia Univ., 1999.
[15] C. Phua, D. Alahakoon, and V. Lee, “Minority Report in Fraud
Detection: Classification of Skewed Data,” ACM SIGKDD Explora
tions Newsletter, vol. 6, no. 1, pp. 5059, 2004.
[16] V. Vatsa, S. Sural, and A.K. Majumdar, “A Gametheoretic
Approach to Credit Card Fraud Detection,” Proc. First Int’l Conf.
Information Systems Security, pp. 263276, 2005.
[17] S. Axelsson, “The BaseRate Fallacy and the Difficulty of Intrusion
Detection,” ACM Trans. Information and System Security, vol. 3,
no. 3, pp. 186205, 2000.
[18] L.R. Rabiner, “A Tutorial on Hidden Markov Models and Selected
Applications in Speech Recognition,” Proc. IEEE, vol. 77, no. 2,
pp. 257286, 1989.
[19] S.S. Joshi and V.V. Phoha, “Investigating Hidden Markov Models
Capabilities in Anomaly Detection,” Proc. 43rd ACM Ann. South
east Regional Conf., vol. 1, pp. 98103, 2005.
[20] S.B. Cho and H.J. Park, “Efficient Anomaly Detection by Modeling
Privilege Flows Using Hidden Markov Model,” Computer and
Security, vol. 22, no. 1, pp. 4555, 2003.
[21] D. Ourston, S. Matzner, W. Stump, and B. Hopkins, “Applications
of Hidden Markov Models to Detecting MultiStage Network
Attacks,” Proc. 36th Ann. Hawaii Int’l Conf. System Sciences, vol. 9,
pp. 334344, 2003.
[22] X.D. Hoang, J. Hu, and P. Bertok, “A MultiLayer Model for
Anomaly Intrusion Detection Using Program Sequences of System
Calls,” Proc. 11th IEEE Int’l Conf. Networks, pp. 531536, 2003.
[23] T. Lane, “Hidden Markov Models for Human/Computer Inter
face Modeling,” Proc. Int’l Joint Conf. Artificial Intelligence, Work
shop Learning about Users, pp. 3544, 1999.
[24] L. Kaufman and P.J. Rousseeuw, Finding Groups in Data: An
Introduction to Cluster Analysis, Wiley Series in Probability and
Math. Statistics, 1990.
[25] K.S. Trivedi, Probability and Statistics with Reliability, Queuing, and
Computer Science Applications, second ed. John Wiley & Sons, 2001.
[26] J. Banks, J.S. Carson II, B.L. Nelson, and D.M. Nicol, DiscreteEvent
System Simulation, fourth ed. Prentice Hall, 2004.
Abhinav Srivastava received the BTech degree
in computer science and engineering from
Harcourt Butler Technological Institute, Kanpur,
in 2001 and the MTech degree in information
technology from the Indian Institute of Technol
ogy, Kharagpur, in 2006. He is currently working
toward the PhD degree in the School of Compu
ter Science, Georgia Institute of Technology. His
research interests include virtualization, systems
security, database security, and data mining.
SRIVASTAVA ET AL.: CREDIT CARD FRAUD DETECTION USING HIDDEN MARKOV MODEL 47
Authorized licensed use limited to: Rajarambapu Institute of Technology. Downloaded on August 2, 2009 at 09:30 from IEEE Xplore. Restrictions apply.
Amlan Kundu is working toward the MS degree
in information technology from the School of
Information Technology, Indian Institute of
Technology, Kharagpur, India. His research
interests include fraud detection, database
security, and network security.
Shamik Sural received the BE degree in
electronics and telecommunication engineering
from Jadavpur University, Calcutta, India, in
1990, the ME degree in electrical communica
tion engineering from the Indian Institute of
Science, Bangalore, India, in 1992 and the
PhD degree from Jadavpur University in 2000.
Before joining IIT, he held technical and man
agerial positions in a number of organizations in
India and the United States. He has served on
the program committee and executive committee of a number of
international conferences including International conference on Informa
tion Systems Security, International Database Engineering and Applica
tions Symposium, IEEE Conference on Fuzzy Systems, Asian Mobile
Computing Conference, International Workshop on Distributed Comput
ing, and others. He is an associate professor at the School of
Information Technology, IIT, Kharagpur, India. His research interests
include database security, data mining, and multimedia database
systems. His research work has been funded by the Ministry of
Communication and Information Technology, Department of Science
and Technology, government of India, and the National Semiconductors
Corp. He has published more than 70 research papers in reputed
international journals and conference proceedings. He is a senior
member of the IEEE and has served as the chairman of the IEEE
Kharagpur Section in 2006.
Arun K. Majumdar received the MTech and
PhD degrees in applied physics from the
University of Calcutta in 1968 and 1973,
respectively, and the PhD degree in electrical
engineering from the University of Florida,
Gainesville, in 1976. Before joining the IIT,
Kharagpur, in 1980, he served as a faculty
member at the Indian Statistical Institute, Cal
cutta, and Jawaharlal Nehru University, New
Delhi. He has held several administrative posi
tions at IIT Kharagpur that include dean (Faculty and Planning), head of
the Computer Science and Engineering Department, head of the
Computer and Informatics Centre, and head of the School of Medical
Science and Technology. He was a visiting professor in the Computing
and Information Sciences Department, University of Guelph, Canada,
and at George Mason University, Fairfax, Virginia. He is currently a
professor of the Computer Science and Engineering Department, Indian
Institute of Technology, Kharagpur, West Bengal. His research interests
include data and knowledgebased systems, multimedia systems,
medical information systems, information security, and design automa
tion. He has more than 160 research publications in international
journals and conference proceedings. He has supervised several PhD
and masters’ level theses. He is a fellow of the Indian National Academy
of Engineering, fellow of the Institution of Engineers (India), and a senior
member of the IEEE.
> For more information on this or any other computing topic,
please visit our Digital Library at www.computer.org/publications/dlib.
48 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 5, NO. 1, JANUARYMARCH 2008
Authorized licensed use limited to: Rajarambapu Institute of Technology. Downloaded on August 2, 2009 at 09:30 from IEEE Xplore. Restrictions apply.
2009 at 09:30 from IEEE Xplore. . With this scheme. N is the number of states in the model. They classify TCP network traffic as an attack or normal using HMM. [22] present a new method to process sequences of system calls for anomaly detection using HMM. . The details of items purchased in individual transactions are usually not known to an FDS running at the bank that issues credit cards to the cardholders. which does not require fraud signatures and yet is able to detect frauds by considering a cardholder’s spending habit. Chiu and Tsai [12] have proposed Web services and data mining techniques to establish a collaborative scheme for fraud detection in the banking industry. Brause et al. . 1. Once human behavior is correctly modeled. Downloaded on August 2. due to the “base rate fallacy” effect [17]. we feel that HMM is an ideal choice for addressing this problem. which is not observable. In a particular state. there is no other published literature on the application of HMM for credit card fraud detection. Kim and Kim have identified skewed distribution of data and mix of legitimate and fraudulent transactions as the two main reasons for the complexity of credit card fraud detection [9].5. It is based on artificial intelligence and combines inductive learning algorithms and metalearning methods for achieving higher accuracy. an alarm is raised in case of any deviation. V2 . [16] have recently proposed a gametheoretic approach to credit card fraud detection. as well as fraudulent transactions. provides an interface to a variety of commercial databases. To establish a smooth channel of data exchange. They use Java agents for Metalearning (JAM). where Vi . 2. . a database mining system used for credit card fraud detection. In recent years. SOAP. Based on this observation. [11] have developed an approach that involves advanced data mining techniques and neural network algorithms to obtain high fraud coverage. We denote the set of states S ¼ fS1 . The problem with most of the abovementioned approaches is that they require labeled data for both genuine. Another important advantage of the HMMbased approach is a drastic reduction in the number of False Positives (FPs)—transactions identified as malicious by an FDS although they are actually genuine. [10] suggest the application of distributed data mining in credit card fraud detection. The same group has also worked on a costbased model for fraud and intrusion detection [7]. and genomics. Hoang et al. . An HMM can be characterized by the following [18]: 1. Restrictions apply. and WSDL are used. SN g. C4. The key idea is to build a multilayer model of program behaviors based on both HMMs and enumerating methods for anomaly detection. We then show how to model credit card transaction processing using HMM in Section 4. The system. Web services techniques such as XML. and Back Propagation neural networks as the base classifiers. [21] have proposed the application of HMM in detecting multistage network attacks. based on a neural learning module. these approaches cannot detect new kinds of frauds for which labeled data is not available. Although they do not directly use credit card fraud detection as the target application. A metaclassifier is used to determine which classifier should be considered based on skewness of data. to train the classifiers. 3 HMM BACKGROUND An HMM is a double embedded stochastic process with two hierarchy levels. each trying to maximize his payoff. an FDS should be designed in such a way that the number of FPs is as low as possible. The state at time instant t is denoted by qt . they use fraud density of real transaction data as a confidence value and generate the weighted fraud score to reduce the number of misdetections. It is only the outcome and not the state that is visible to an external observer [18]. . Cho and Park [20] suggest an HMMbased intrusion detection system that improves the modeling time and performance by considering only the privilege transition flows based on the domain knowledge of attacks. Fan et al. Aleskerov et al. They consider naive Bayesian. an outcome or observation can be generated according to an associated probability distribution. bioinformatics. i ¼ 1.38 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING. It can be used to model much more complicated stochastic processes as compared to a traditional Markov model. This can be represented as the underlying finite Markov chain. [8] present CARDWATCH. . M is the number of distinct observation symbols per state. bank administrators may tend to ignore the alarms. Vatsa et al. The observation symbols correspond to the physical output of the system being modeled. Finally. S2 . Ourston et al. their approach is quite generic. . Lane [23] has used HMM to model human behavior. M is an individual symbol. Detailed experimental result is presented in Section 5. VM g. They model the interaction between an attacker and an FDS as a multistage game between two players. Phua et al. The rest of the paper is organized as follows: In Section 3. . HMMbased applications are common in various areas such as speech recognition. 5. The transactions can only be observed through the other stochastic process that produces the sequence of the amount of money spent in each transaction. we present a Hidden Markov Model (HMM)based credit card FDS. any detected deviation is a cause for concern since an attacker is not expected to have a behavior similar to the genuine user. where Si . We also describe the complete process flow of the proposed FDS in this section. 2. . Hence. JANUARYMARCH 2008 correlation of the predictions of the base classifiers. N is an individual state. participating banks share knowledge about the fraud patterns in a heterogeneous and distributed environment. To the best of our knowledge. NO. Getting realworld fraud data is one of the biggest problems associated with credit card fraud detection. which is a distributed data mining system for credit card fraud detection. A number of important performance metrics like True Positive—False Positive (TPFP) spread and accuracy have been defined by them. We denote the set of symbols V ¼ fV1 . Joshi and Phoba [19] have investigated the capabilities of HMM in anomaly detection. Otherwise. . . An HMM has a finite set of states governed by a set of transition probabilities. Phua et al. VOL. 2. . we first briefly explain the working principle of an HMM. Also. Since the number of genuine transactions is a few orders of magnitude higher than the number of malicious transactions. [15] suggest the use of metaclassifier similar to [6] in fraud detection problems. Authorized licensed use limited to: Rajarambapu Institute of Technology. . we conclude the paper with some discussions. We model a credit card transaction processing sequence by the stochastic process of an HMM. [13] have done an extensive survey of existing dataminingbased FDSs and published a comprehensive report. i ¼ 1. Prodromidis and Stolfo [14] use an agentbased approach with distributed learning for detecting frauds in credit card transactions. . Hence. In contrast.
An observation sequence O. VM . We use the notation ¼ ðA. A credit cardholder makes different kinds of purchases of different amounts over a period of time. 2009 at 09:30 from IEEE Xplore. Q ¼ q1 . . k ¼ 1. can be generated by many possible state sequences. as well as the corresponding price range. as shown in Section 5. B. . The initial state probability vector ¼ ½i . where aij ¼ Pðqtþ1 ¼ Sj jqt ¼ Si Þ. we have aij > P 0 for all i. . Þ ¼ bq1 ðO1 Þ:bq2 ðO2 Þ . N aij ¼ 1. we consider only three price ranges. . a procedure named as ForwardBackward procedure [18] is used to compute PðOjÞ. ÞPðQjÞ: ð8Þ Deriving the value of PðOjÞ using the direct definition of (8) is computationally intensive. . . $100. The concerned cardholder may then be contacted and alerted about the possibility that the card is compromised. In this section. . O3 . where A. ð4Þ 4 USE OF HMM DETECTION FOR CREDIT CARD FRAUD where q1 is the initial state. and billing address. where i ¼ Pðq1 ¼ Si Þ. . to represent both the observation symbol. . 1 i N. then the corresponding observation symbol is m. . For example. 1 M X k¼1 TABLE 1 Notations and Acronyms 4. $500. 1 i N. B. The actual price range for each symbol is configurable based on the spending habit of individual cardholders. qR . HMM Model for Credit Card Transaction Processing To map the credit card transaction processing operation in terms of an HMM. One possibility is to 4. and three probability distributions A. M. 2. where each observation Ot is one of the symbols from V . and . OR . Restrictions apply. and R is the number of observations in the sequence. O2 . . 1 j N. we start by first deciding the observation symbols in our model. therefore. it raises an alarm. and h ¼ ð$500. It tries to find any anomaly in the transaction based on the spending profile of the cardholder. Each incoming transaction is submitted to the FDS for verification. namely. The probability that O is generated from this state sequence is given by PðOjQ. FDS receives the card details and the value of purchase to verify whether the transaction is genuine or not. . forming the observation symbols at the issuing bank. Consider one such particular sequence 6. Our set of observation symbols is. Þ ¼ R Y t¼1 An FDS runs at a credit card issuing bank. as mentioned above. These price ranges can be determined dynamically by applying a clustering algorithm on the values of each cardholder’s transactions. j¼1 The observation symbol probability matrix B ¼ ½bj ðkÞ. q2 . j.: CREDIT CARD FRAUD DETECTION USING HIDDEN MARKOV MODEL 39 3. j N. credit card limit. . B implicitly include N and M. V2 . m ¼ ð$100.1 Authorized licensed use limited to: Rajarambapu Institute of Technology. N and M. . 1 5. medium ðmÞ.2.SRIVASTAVA ET AL. In this work. low ðlÞ. The state transition probability matrix A ¼ ½aij . The types of goods that are bought in that transaction are not known to the FDS. V ¼ fl. : ð1Þ For the general case where any state j can be reached from any other state i in a single step. 1 i N. A set of notations and acronyms used in the paper is given in Table 1. etc. Þ to indicate the complete set of parameters of the model. Þ. and high ðhÞ. aqRÀ1 qR : ð7Þ ð6Þ Thus. Equation (5) can be expanded as PðOjQ. . If the FDS confirms the transaction to be malicious. Hence. Also. PðOt jqt . 2. ð5Þ where statistical independence of observations is assumed. If a cardholder performs a transaction of $190. 1 j N: k M and ð2Þ bj ðkÞ ¼ 1. hg making M ¼ 3. shipping address. such that N X i¼1 i ¼ 1: ð3Þ The observation sequence O ¼ O1 . We use Vk . and the issuing bank declines the transaction. Downloaded on August 2. bqR ðOR Þ: The probability of the state sequence Q is given as PðQjÞ ¼ q1 aq1 q2 aq2 q3 . t ¼ 1. m. . let l ¼ ð0. . . . the probability of generation of the observation sequence O by the HMM specified by can be written as follows: PðOjÞ ¼ X all Q PðOjQ. We quantize the purchase values x into M price ranges V1 . It is evident that a complete specification of an HMM requires the estimation of two model parameters. . where bj ðkÞ ¼ PðVk jSj Þ. . we explain how HMM can be used for credit card fraud detection.
is not practical and. if we execute Kmeans algorithm on the example transactions in Table 2. we use Kmeans clustering algorithm [24] to determine the clusters. cM be the centroids of the generated clusters. In our work. This information about the merchant’s line of business is not known to the issuing bank running the FDS. Normally. they should be chosen carefully. The set of all possible types of purchase and. with cl . . Let x be the amount spent by the cardholder u in transaction T . we run a clustering algorithm on his past transactions. Considering M ¼ 3. Any assumption about availability of this information with the issuing bank and. the set of all possible lines of business of merchants forms the set of hidden states of the HMM. We consider the special case of fully connected HMM in which every state of the model can be reached in a single step from every other state. as shown in Fig. we show the effect of the choice of the number of states on the system performance. The percentage ðpÞ of total number of transactions in this cluster is thus 30 percent. Although various clustering techniques could be used. Downloaded on August 2. The reason is that. JANUARYMARCH 2008 Fig. and we do not attempt to determine the actual types of items purchased in these transactions. Authorized licensed use limited to: Rajarambapu Institute of Technology. After deciding the state and symbol representations. we consider only the amount that the cardholder spent in his transactions. The grouping is performed by minimizing the sum of squares of distances between each data point and the centroid of the cluster to which it belongs. VOL. HMM for credit card fraud detection. El. would not have been valid. This. 5. . 10. the number of symbols is 3 in our system. 1. .. To find the observation symbols corresponding to individual cardholder’s transactions dynamically. . we consider the transition in the type of purchase as state transition in our model. we explain how to determine observation symbols dynamically from a cardholder’s transactions. hence.2. NO. consider the sequence of transaction amounts and look for deviations in them. in turn. However. and 10 have been clustered together as cl resulting in a centroid of 8. Each group formed in the process is called a cluster. Let c1 . 1. c2 . etc. Each individual transaction amount usually depends on the corresponding type of purchase. therefore. Electronic items. B. Thus. as shown in Table 3.3. Such types of line of business are considered as Miscellaneous. Gr. since this information is furnished at the time of registration of a merchant. and so that representation of the HMM is complete. K is the same as the number of observation symbols M. generates a sequence of transaction amounts. and ch as the respective centroids. or Target sells tens of thousands of different items). In Section 5. the next step is to determine the probability matrices A. The type of each purchase is linked to the line of business of the corresponding merchant. Similarly. we train and maintain an HMM. the type of purchase of the cardholder is hidden from the FDS. WalMart. These three model parameters are determined in a training phase using the BaumWelch algorithm [18]. KMart. and Miscellaneous purchases. 4. the sequence of types of purchase is more stable compared to the sequence of transaction amounts. These centroids or mean values are used to decide the observation symbols when a new transaction comes in. are names given to the states to denote purchase types like Groceries. The initial choice of parameters affects the performance of this algorithm and. Kmeans is an unsupervised learning algorithm for grouping a given set of data based on the similarity in their attribute (often called feature) values.40 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING. It may be noted that the dollar amounts 5. cm . In Section 4. a cardholder makes purchases depending on his need for procuring different types of items over a period of time. FDS generates the observation symbol for x (denoted by Ox ) as follows: Ox ¼ Varg min jxÀci j : i ð9Þ As mentioned before. 2009 at 09:30 from IEEE Xplore. with the FDS. Restrictions apply. some merchants may be dealing in various types of commodities (For example. equivalently.2 Dynamic Generation of Observation Symbols For each cardholder. The number of clusters K is fixed a priori. we get the clusters. 1. Also. the transactions that are stored in the issuing bank’s database contain many attributes. For our work. Mi. hence. It should be noted at this stage that the line of business of the merchant is known to the acquiring bank. Hence. Spending profiles of the individual cardholders are used to obtain an initial estimate for probability matrix B of (2).
hence. . that is.5 Fraud Detection After the HMM parameters are learned. . 2009 at 09:30 from IEEE Xplore. Initial state probability distribution is considered to be uniform. Spending profiles of cardholders are determined at the end of the clustering step. Downloaded on August 2. . . Otherwise. We make three sets of initial probability for observation symbol generation for three spending groups—ls. the spending profile (SP) of the cardholder u is determined as follows: SP ðuÞ ¼ arg maxðpi Þ: i bank and an issuing bank. The training algorithm has the following steps: 1) initialization of HMM parameters. cm . ORþ1 jÞ. we choose the corresponding set of initial observation probabilities. When the FDS receives a transaction T for this cardholder. and ch to decide (using (9)) the cluster to which T belongs and. At the end of the training phase. For training the HMM. namely. O2 . Details of these steps can be found in [18]. then the initial probability of each state is 1=N.3 Spending Profile of Cardholders The spending profile of a cardholder suggests his normal spending behavior. O3 . m and. we drop O1 and append ORþ1 in that sequence.: CREDIT CARD FRAUD DETECTION USING HIDDEN MARKOV MODEL 41 TABLE 2 Example Transactions with the Dollar Amount Spent in Each Transaction TABLE 3 Output of KMeans Clustering Algorithm dollar amounts 15. Restrictions apply. . Fig. . to make the initial guess of observation symbol probabilities more accurate. O4 . then in Table 3 using (9). if there are N states. Let pi be the percentage of total number of transactions of the cardholder that belong to cluster with mean ci . 25. we can have better initial guess about state transition probabilities as well. B. Let ORþ1 be the symbol generated by a new transaction at time t þ 1. which needs online response. 2) forward procedure. we get an HMM corresponding to each cardholder. As an example. . mediumspending (ms) group. OR . generating O2 . . as will be discussed in Section 5. the cardholder belongs to the ms group. In case of a collaborative work between an acquiring If Á > 0. we consider the initial guesses to be uniform. it measures the distance of the purchase amount x with respect to the means cl . O3 . Based on the cardholder’s spending profile.SRIVASTAVA ET AL. The algorithm starts with an initial estimate of HMM parameters A. the observation symbol is V1 ¼ l. Since this step is done offline. we take the symbols from a cardholder’s training data and form an initial sequence of symbols. Initial guess of transition and observation probability distributions can also be considered to be uniform. O3 . and 25 have been grouped in the cluster cm with centroid 20. 4. Let the new probability be 2 2 ¼ PðO2 . and the other is detection. . the spending profile of the cardholder is 2. hence. To form another sequence of length R. 2 shows the complete process flow of the proposed FDS. O2 . highspending (hs) group. However. OR be one such sequence of length R. whereas amounts 40 and 80 have been grouped together in cluster ch . Cardholders who belong to the hs group. the issuing bank does not approve the transaction. As shown in the figure. We input this sequence to the HMM and compute the probability of acceptance by the HMM. We now start training the HMM. Let O1 . and and converges to the nearest local maximum of the likelihood function. normally use their credit cards for buying highpriced items. and lowspending (ls) group. . spending profile of the cardholder. and the new sequence is used as the base sequence for determining the validity of the next transaction. contain 50 percent and 20 percent of the total number of transactions. and hs. ORþ1 is added in the sequence permanently. Á=1 ! Threshold: ð14Þ The threshold value can be learned empirically. . 4. 15. OR jÞ: ð11Þ ð10Þ Thus. if x ¼ $10. ORþ1 as the new sequence. it means that the new sequence is accepted by the HMM with low probability. . . The newly added transaction is determined to be fraudulent if the percentage change in the probability is above a threshold. 20. spending profile denotes the cluster number to which most of the transactions of the cardholder belong. the FDS is divided into two parts—one is the training module. and the FDS discards the symbol. it does not affect the credit card transaction processing performance. Since there is no a priori knowledge about the state transition probabilities. The reason for including new nonmalicious symbols in the sequence is to capture the changing spending behavior of a cardholder. we convert the cardholder’s transaction amount into observation symbols and form sequences out of them. Cardholders can be broadly categorized into three groups based on their spending habits. which can be written as follows: 1 ¼ PðO1 . Let the probability be 1 . ms. This recorded sequence is formed from the cardholder’s transactions up to time t. We input this new sequence to the HMM and calculate the probability of acceptance by the HMM.4 Model Parameter Estimation and Training We use BaumWelch algorithm to estimate the HMM parameters for each cardholder. and it could be a fraud. and 3) backward procedure. Let Á ¼ 1 À 2 : ð12Þ ð13Þ 4. as determined in Section 4. Similar definition applies to the other two categories also. In the example in Table 3. the corresponding observation symbol.3. Then. that is. cm and ch . The initial estimate of symbol generation probabilities using this method leads to accurate learning of the model. thus. is taken into account. that is. Authorized licensed use limited to: Rajarambapu Institute of Technology. If ORþ1 is malicious.
as proposed in [7] to measure the effectiveness of the system. on the average. is used as a metric. It can be expressed as follows: Accuracy ¼ No: of good trans: detected as goodþNo: of bad trans: detected as bad: Total No: of transactions We first carried out a set of experiments to determine the correct combination of HMM design parameters. Process flow of the proposed FDS.5. A simulator is used to generate a mix of genuine and fraudulent transactions. 70 percent. and 2 percent high value transactions. 3 percent medium value. we will see in Section 5. 5. respectively. For parameter selection. We have. 2. and 90 percent.1 Choice of Design Parameters Since there are three parameters in an HMM. we generate transaction sequences using 95 percent low value. Using Student’s tdistribution. This is chosen so that. we performed comparative study with another FDS. the minimum number of simulation runs required for obtaining desired CI for TP Authorized licensed use limited to: Rajarambapu Institute of Technology. as well as TPFP spread and Accuracy metrics. its FDS receives a large number of genuine transactions sparingly intermixed with fraudulent transactions. [26]. VOL. and the threshold value. Training phase is performed offline. We also consider the and values to be 1. 2009 at 09:30 from IEEE Xplore. ð15Þ 5. NO. specifies the average number of fraudulent transactions in a given transaction mix. an issuing bank. The threshold values considered are 30 percent. whereas FP is the fraction of genuine transactions identified as fraudulent.1 and that for FP was 0. Most of the design choices for a FDS that result in higher values of TP.25 percent around their mean values. TP represents the fraction of fraudulent transactions correctly identified as fraudulent. The genuine transactions are generated according to the cardholders’ profiles. The number of states is varied from 5 to 10 in steps of 1. JANUARYMARCH 2008 Fig. We use standard metrics—True Positive (TP) and FP. thus generating a large number of possible combinations. and hence. After the parameter values are fixed. and hs groups. there will be 1 fraudulent transaction in any incoming sequence with some scope for variation. agree to share their data with researchers. how the system performs as we vary the profile and the mix of fraudulent transactions. there are a total of 120 ð5 Â 4 Â 6Þ possible combinations of parameters. each with 100 samples. and threshold value. 50 percent. as þ= À 2. Standard deviation of TP was 0. Once these parameters were decided. The cardholders are classified into three categories as mentioned before—the low. Thus. The number of simulation runs required for obtaining results with a given confidence interval (CI) was derived as follows [25]. To meaningfully capture the performance of such a system. and highpricerange clusters. often called the TPFP spread. .2. Mean TP was found to be an order of magnitude higher than mean FP. namely. the sequence length is varied from 5 to 25 in steps of 5. performed largescale simulation studies to test the efficacy of the system. We set the target 95 percent CI for TP and FP. The reason for using this mix is that it represents a profile that strongly resembles a ls customer profile. number of states.005.0 and 0. There is also no benchmark data set available for experimentation. An initial set of five simulation runs. 5 RESULTS Testing credit card FDSs using real data set is a difficult task. We have studied the effects of spending group and the percentage of transactions that belong to the low. taking cardholder’s spending behavior into account.42 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING. The number of fraudulent transactions in a given length of mixed transactions is normally distributed with a user specified (mean) and (standard deviation). the sequence length. in general. For choosing the design parameters. Downloaded on August 2. Restrictions apply. therefore. whereas detection is an online process. We consider both TP and FP for deciding the optimum parameter values. was carried out to estimate the mean and standard deviation of both TP and FP for a fixed sequence length. Banks do not. medium. the number of states. we need to vary one at a time keeping the other two fixed. In a typical scenario. medium. the difference between TP and FP. respectively. 1. Accuracy represents the fraction of total number of transactions (both genuine and fraudulent) that have been detected correctly.5 percent and þ= À 0. also cause FP to increase.
We therefore. we have to choose the number of states. After choosing sequence length and threshold. Although done offline. we present results for each value of the number of states averaged over all the five sequence lengths in Table 5. 2009 at 09:30 from IEEE Xplore. the number of states does not have a strong influence either on TP or on FP. Restrictions apply. we take a look at the detailed data for TP when threshold ¼ 50%.: CREDIT CARD FRAUD DETECTION USING HIDDEN MARKOV MODEL 43 TABLE 4 Variation of TP and FP with Different Sequence Lengths TABLE 5 Variation of TP and FP with Different Number of States was derived as 83 and that for FP as 23. we choose 15 as the length of observation symbol sequence for optimum performance. especially beyond 100. Hence. hence. we set the number of simulation runs for all the experiments to be 100. the highest value of TP. Also. 3. It is seen that for sequence length ¼ 15. as shown in Fig. Since there is no clear indication from the above summary information. Detection time versus length of sequence. Downloaded on August 2. has been highlighted for each row. In Table 4. use 100 sequences for training the HMM. In Tables 4 and 5. Once sequence length is decided. . we show summarized results. which is performed offline for each cardholder’s HMM. as shown in Table 6. The results have been plotted for a Java implementation on a 1. Fig. it is seen in Table 4 that the threshold could be set to either 30 percent or 50 percent. we choose threshold ¼ 50%. The results obtained were within the desired CI.SRIVASTAVA ET AL. In Table 4. 4 shows the plot of model learning time against the number of sequences in the training data. 3. FP is also higher. As the size of training data increases. as mentioned above. Since an HMM Fig. as presented in Tables 4 and 5. we show the results for each value of sequence length averaged over all the six states. it would be a good choice for our design. However. it is seen that TP is high for sequence length 15 in 75 percent of the cases. fraud detection time increases linearly with the sequence length. Since it is not convenient to present the detailed results for each of the 120 combinations. the highest value of TP occurs for no: of states ¼ 10 and. Similarly. as well as the lowest value of FP.8 GHz Pentium IV processor machine. Authorized licensed use limited to: Rajarambapu Institute of Technology. We have also analyzed the time taken by the training phase. It is seen from the two tables that FP shows a clear trend of decreasing with higher threshold and smaller sequence lengths. learning time increases. Based on these observations. To minimize FP. Although TP is higher for threshold ¼ 30%. the model learning time has a strong impact on the scalability of the system.
8a and 8b. the two approaches exhibit similar trend with variation in . the mean accuracy has dropped below 40 percent. 5b. As a result. transactions. Variation of TPFP and Accuracy is shown in Fig. Thus. it is imperative that the training time is kept as low as possible especially when an issuing bank is meant to handle millions of cardholders with many new cards being issued everyday. . 2. It may be noted that for cardholders in the other two groups. Model learning time versus number of sequences in training data. number of sequences for training ¼ 100. 4. the Accuracy of the proposed system remains close to 80 percent for all the above settings. is trained for each cardholder. Restrictions apply. The online processing time of about 200 ms on a 1. NO. there is more than 15 percent difference in TPFP for the profile (55 35 10) with our approach performing better. The first observation is that. the Accuracy of our system still remains around 80 percent. we show the performance of the two approaches when the profile is mixed. We consider four profiles. JANUARYMARCH 2008 TABLE 6 Detailed Result of TP for threshold ¼ 50% Fig. as shown above to (70 20 10) in Figs. 3. We also vary the mean value of malicious transactions from 0. Thus. b. The other profiles considered are (55 35 10). As a result. There are a number of interesting observations from the above two sets of figures.5. We compare performance of our approach (denoted by OA below) with the credit card fraud detection technique proposed by Stolfo et al. Downloaded on August 2. TP falls and FP rises for both the approaches. as transactions no longer remain strictly ls in nature. we next proceed to study the performance of the system under various combinations of input data. it can be concluded that the proposed system has an overall Accuracy of 80 percent even under large input condition variations. is scalable. our approach has 1520 percent better Accuracy. we show how the two systems behave as we mix the transaction amounts. length of observation sequence R ¼ 15. However. and c percent in the high range. as well as the number of fraudulent transactions intermixed with a sequence of genuine transactions.5 to 4. ða b cÞ profile represents a ls profile cardholder who has been found to carry out a percent of his transactions in the low. Here.2 Comparative Performance In this section. the FP rate of ST rises sharply. every sequence of transaction that we use for testing is a mixed sequence containing both genuine. the TPFP value became negative only after ¼ 2:5. Although TPFP values are close for the profile (70 20 10). In fact. With this design parameter setting.0 in steps of 0. the two systems have comparable accuracies and average TPFP spread.3. For each combination of spending profile and malicious transaction distribution. medium value. Fig. and high value transactions are changed from (95 3 2). we show performance of the proposed system as we vary the number of fraudulent transactions and also the spending profile of the cardholder. From the above results. Thus. For comparison. and c will change. Also. b percent in medium. which means that spending profile is not considered at all by our approach. Accuracy by Stolfo et al’s approach falls drastically to about 60 percent for (55 35 10).5 for all the experiments. as well as malicious. Next. our attempt is to see how the system performs in the presence of different mixes of transaction amount ranges in the transactions. as explained in Section 4. our design parameter setting is given as follows: 1. and (95 3 2). 5. VOL.44 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING. will show similar performance as only the relative ordering of a. we consider the metrics TP and FP. as well as TPFP and Accuracy [7]. Percentages of low value. We carried out experiments by varying both the transaction amount mix. Further. It is seen from the figures that TP of the proposed approach is very close to that of Stolfo et al’s approach. For our approach. hence. 5a shows the variation of TP and FP for the two approaches using the spending profile (95 3 2). Transaction amount mix is captured by the cardholder’s profile. but the FP has not degraded a lot. In Figs. The TPFP value for the ST approach has become negative even for ¼ 0:5. The value is kept fixed at 0. hs and ms. One of them is the mixed profile. On the other hand. both the approaches have almost similar values of FP. which means that all the three ranges of transactions are equiprobable. 5. 7a and 7b. [6] (denoted by ST below). It is seen that the FP of the ST method has gone up sharply. which is much higher Authorized licensed use limited to: Rajarambapu Institute of Technology. and 4. for our approach. FP is even higher than TP. number of hidden states N ¼ 10. The same set of data was used to determine the performance of both OA and ST. 2009 at 09:30 from IEEE Xplore. Thus. As a result. 1. FP rate remains low while there is a graceful degradation of the TP rate. namely.8 GHz Pentium IV machine also shows that the system will be able to handle a large number of concurrent operations and. Our approach shows a fall in TP. Threshold value ¼ 50%. As a result. 6a and 6b and (55 35 10) in Figs. although the TP rate does not degrade to a great extent. (70 20 10). Our design parameter setting is as obtained in the previous section. we carried out 100 runs and report the average result.
6. Fig. (b) TPFP spread (SP) and Accuracy (AC). 2009 at 09:30 from IEEE Xplore.: CREDIT CARD FRAUD DETECTION USING HIDDEN MARKOV MODEL 45 Fig. .SRIVASTAVA ET AL. (a) TP and FP. Restrictions apply. Performance variation of the two systems (OA and ST) within the mean ðÞ of malicious transaction distribution for the spending profile (70 20 10). Authorized licensed use limited to: Rajarambapu Institute of Technology. (a) TP and FP. Performance variation of the two systems (OA and ST) with the mean ðÞ of malicious transaction distribution for the spending profile (95 3 2). (b) TPFP spread (SP) and Accuracy (AC). Downloaded on August 2. 5.
Restrictions apply. NO. Authorized licensed use limited to: Rajarambapu Institute of Technology. Performance variation of the two systems (OA and ST) with mean ðÞ of malicious transaction distribution for mixed profile. 7. (a) TP and FP. 1.46 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING. (b) TPFP spread (SP) and Accuracy (AC). VOL. JANUARYMARCH 2008 Fig. Fig. (b) TPFP spread (SP) and Accuracy (AC). Performance variation of the two systems (OA and ST) with the mean ðÞ of malicious transaction distribution for the spending profile (55 35 10). Downloaded on August 2. (a) TP and FP. 2009 at 09:30 from IEEE Xplore. 8. . 5.
” Proc.B. [6].J. S. Georgia Institute of Technology.monash. 2.” IEEE Intelligent Systems. and P. J.J. Nelson. “Distributed Data Mining in Credit Card Fraud Detection. L. Experimental results show the performance and effectiveness of our system and demonstrate the usefulness of learning the spending profile of the cardholders. Kanpur. 257286. “CARDWATCH: A Neural Network Based Database Mining System for Credit Card Fraud Detection. “Applications of Hidden Markov Models to Detecting MultiStage Network Attacks. 572577. 2000. vol.J. S. and M. pp. [8] [9] [10] [11] [12] 6 CONCLUSIONS AND DISCUSSIONS [13] [14] [15] [16] [17] [18] [19] [20] [21] In this paper. “Neural Data Mining for Credit Card Fraud Detection. Int’l Joint Conf. K. Chan. V. Tsai.bsys.V. 22. under Grant 12(34)/04IRSD dated 07/12/2004. 2004. 2. 2002. “AgentBased Distributed Learning Applied to Fraud Detection. Hawaii Int’l Conf. 2002. Queuing. Rabiner. G. “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. S. and R. Nicol. Abhinav Srivastava received the BTech degree in computer science and engineering from Harcourt Butler Technological Institute. S. Prodromidis. either due to a fall in the number of TPs or a rise in the number of FPs. IEEE Int’l Conf. when there is no profile information at all. 2007. and P. It has also been explained how the HMM can detect whether an incoming transaction is fraudulent or not. B. Ghosh and D.L. pp. D. and Exposition. eCommerce and eService. Adams. as well as application of this knowledge in deciding the value of observation symbols and initial estimate of the model parameters. 621630. S. This observation highlights the importance of profile selection as explained in Section 4. pp. W. Matzner. “Credit Card Fraud Detection with a NeuralNetwork.” Proc. Mar.” http://www. IEEE/IAFE: Computational Intelligence for Financial Eng.” Proc. 6774. Government of India. C. S.K. pp. J. 103106. pp.J. John Wiley & Sons. Trivedi. Bertok. Lee.” Proc. pp. and Automated Learning. 1999. Prodromidis. we have proposed an application of HMM in credit card fraud detection. M. Kharagpur. Fuzzy Systems. Hepp. He is currently working toward the PhD degree in the School of Computer Science. 130144. Hand. 2007. “The BaseRate Fallacy and the Difficulty of Intrusion Detection. Stolfo. 14. D. vol. Southeast Regional Conf. 220226. whereas the types of item have been considered to be states of the HMM.S. Stolfo and A. Workshop Learning about Users. W. Fan. D. when there is little difference between genuine transactions and malicious transactions. no. Kelly. System Sciences: Information Systems: Decision Support and KnowledgeBased Systems. Reilly. systems security. vol. 1. 98103. Mar. IEEE Int’l Conf. A. Park. Pan. Alahakoon. 531536. “Hidden Markov Models for Human/Computer Interface Modeling. 2009 at 09:30 from IEEE Xplore.W. Prodromidis. in 2006. S. [22] [23] [24] [25] [26] REFERENCES [1] [2] [3] [4] “Global Consumer Attitude Towards OnLine Shopping. 1. 1999. pp.au/people/cphua/. Langsdorf. Banks.” Proc. and A. Chan.G. The system is also scalable for handling large volumes of transactions. M.R.” ACM SIGKDD Explorations Newsletter. “A Web ServicesBased Collaborative Scheme for Credit Card Fraud Detection. 43rd ACM Ann. “A Comprehensive Survey of Data MiningBased Fraud Detection Research. “Efficient Anomaly Detection by Modeling Privilege Flows Using Hidden Markov Model. Stump. We have used the ranges of transaction amount as the observation symbols. epaynews. Fan. 1990. Lane.K. Smith.” Computer and Security. Joshi and V. pp. Also. Brause. E. Axelsson. 4555. Comparative studies reveal that the Accuracy of the system is close to 80 percent over a wide variation in the input data. Freisleben. vol.” Proc. and V.M. 1997. eTechnology. This work is partially supported by a research grant from the Department of Information Technology.J. “A MultiLayer Model for Anomaly Intrusion Detection Using Program Sequences of System Calls. Blunt. 1989. Kim. 6. 2. D. Rousseeuw.Q. pp. Hu. 111131.S. S. Chiu and C. 378383.” Statistical Science.edu.” Proc. C. 2003. 2003. Probability and Statistics with Reliability.S. and N. W. Information and System Security. 2000. Phoha. 5059. DiscreteEvent System Simulation. Artificial Intelligence. 3. therefore. 2004.K. Aleskerov. most of the credit card FDSs suffer performance degradation. ACKNOWLEDGMENTS The authors would like to the anonymous reviewers for their constructive and useful comments. B. pp. 2001. Lee. Rao.pdf. vol. 263276. System Sciences. L. pp. 186205. T. Ministry of Communication and Information Technology. “Minority Report in Fraud Detection: Classification of Skewed Data. Restrictions apply. Downloaded on August 2. 1999. Wiley Series in Probability and Math. Columbia Univ. “Data Mining for Fun and Profit. second ed. Majumdar. V. A. 1997. Lee. “Credit Card Fraud Detection Using MetaLearning: Issues and Initial Results. “A Gametheoretic Approach to Credit Card Fraud Detection. W. [5] [6] [7] Authorized licensed use limited to: Rajarambapu Institute of Technology. 177181. pp. no.” http://www2. Cho and H. pp. Phua. vol. 3.” Proc. and data mining.J. Information Systems Security.” Proc.L. 1994. 11th IEEE Int’l Conf. no. pp.html. vol. pp.. 2003. Mar. and Y. K.J. Y. and P. Prentice Hall. His research interests include virtualization. correctly detect most of the transactions.S. 9.” Technical Report CUCS01499. 3544. 27th Hawaii Int’l Conf. “Investigating Hidden Markov Models Capabilities in Anomaly Detection. Int’l Conf. Tools with Artificial Intelligence. pp. A.L.” Proc. and B.com/reports/documents/2005_cc_online shopping. fourth ed. pp. and S. Our system can. 8390. Hoang. vol. pp. pp. “CostBased Modeling for Fraud and Intrusion Detection: Results from the JAM Project.” Proc. the system shows some performance degradation in terms of TPFP. J..com/statistics/fraud. 2005. 334344. Sural. no.” http:// www. and Computer Science Applications. vol. M.” Proc. in 2001 and the MTech degree in information technology from the Indian Institute of Technology. Prodromidis. Carson II. 2004. “Statistics for General and OnLine Card Fraud. 1. Networks. Vatsa.” Proc. no. Hopkins. DARPA Information Survivability Conf. Lee. 2005. IEEE Int’l Conf. and B. and D. “A Neural Classifier with Fraud Density Map for Effective Credit Card Fraud Detection. 15.acnielsen. 6. First Int’l Conf. Gayler. AAAI Workshop AI Methods in Fraud and Risk Management.L.L.M.SRIVASTAVA ET AL. However. 1999. S.” Proc. IEEE. D. vol. 2007. Statistics. R.D. “Parallel Granular Networks for Fast Credit Card Fraud Detection.W. 2000. The different steps in credit card transaction processing are represented as the underlying stochastic process of an HMM. We have suggested a method for finding the spending profile of cardholders. 77. Syeda. no. Zhang. Kaufman and P. X.. 3. Stolfo. Intelligent Data Eng. T.: CREDIT CARD FRAUD DETECTION USING HIDDEN MARKOV MODEL 47 than the overall Accuracy of the method proposed by Stolfo et al. Finding Groups in Data: An Introduction to Cluster Analysis. Ourston. database security.” ACM Trans. Fan. 36th Ann. Kim and T. Stolfo. . C. Phua.
University of Guelph. in 1976. For more information on this or any other computing topic. He has published more than 70 research papers in reputed international journals and conference proceedings. West Bengal. International Workshop on Distributed Computing. Authorized licensed use limited to: Rajarambapu Institute of Technology. head of the Computer Science and Engineering Department.computer. Before joining the IIT. 2009 at 09:30 from IEEE Xplore. India. He is a senior member of the IEEE and has served as the chairman of the IEEE Kharagpur Section in 2006. 1. database security. and others. he served as a faculty member at the Indian Statistical Institute. He was a visiting professor in the Computing and Information Sciences Department. Before joining IIT. He has more than 160 research publications in international journals and conference proceedings. data mining. VOL. His research interests include fraud detection. India. He is currently a professor of the Computer Science and Engineering Department. 5. information security. Bangalore. Majumdar received the MTech and PhD degrees in applied physics from the University of Calcutta in 1968 and 1973. IEEE Conference on Fuzzy Systems. and the PhD degree in electrical engineering from the University of Florida. Canada. Kharagpur. Downloaded on August 2. Shamik Sural received the BE degree in electronics and telecommunication engineering from Jadavpur University. Arun K. in 1990. Indian Institute of Technology. and multimedia database systems.48 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING. Kharagpur. Asian Mobile Computing Conference. and at George Mason University. Virginia. Calcutta. NO. Restrictions apply. Department of Science and Technology. he held technical and managerial positions in a number of organizations in India and the United States. and head of the School of Medical Science and Technology. JANUARYMARCH 2008 Amlan Kundu is working toward the MS degree in information technology from the School of Information Technology. and a senior member of the IEEE. He is an associate professor at the School of Information Technology. medical information systems. respectively. Gainesville. Fairfax. He has held several administrative positions at IIT Kharagpur that include dean (Faculty and Planning). and the National Semiconductors Corp. India. He has supervised several PhD and masters’ level theses. and network security. International Database Engineering and Applications Symposium. He has served on the program committee and executive committee of a number of international conferences including International conference on Information Systems Security. . His research interests include data and knowledgebased systems. IIT. fellow of the Institution of Engineers (India). government of India. He is a fellow of the Indian National Academy of Engineering. India. Calcutta. His research interests include database security. and Jawaharlal Nehru University.org/publications/dlib. the ME degree in electrical communication engineering from the Indian Institute of Science. head of the Computer and Informatics Centre. please visit our Digital Library at www. Kharagpur. . and design automation. in 1992 and the PhD degree from Jadavpur University in 2000. multimedia systems. New Delhi. Indian Institute of Technology. Kharagpur. in 1980. His research work has been funded by the Ministry of Communication and Information Technology.