A Framework For Data Mining-Based Anti-Money Laundering Research

The current issue and full text archive of this journal is available at
www.emeraldinsight.com/1368-5201.htm
JMLC 10,2
Zengan Gao and Mao Ye
School of Economics and Management,
Southwest Jiaotong University, Chengdu, People’s Republic of China
Abstract
Purpose – The purpose of this paper is to propose a framework for data mining (DM)-based
anti-money laundering (AML) research.
Design/methodology/approach – First, suspicion data are prepared by using DM techniques.
Also, DM methods are compared with traditional investigation techniques. Next, rare transactional
patterns are further categorized as unusual/abnormal/anomalous and suspicious patterns whose
recognition also includes fraud/outlier detection. Then, in summarizing the reporting of money
laundering (ML) crimes, an analysis is made on ML network generation, which involves link analysis,
community generation, and network destabilization. Future research directions are derived from a
review of literature.
Findings – The key to the framework lies in ML network analysis involving link analysis,
community generation, and network destabilization.
Originality/value – The paper offers insights into DM in the context of AML.
Keywords Data analysis, Money laundering, Crimes
Paper type Research paper
Introduction
Money laundering (ML) is the processing of criminal, “dirty” money to disguise its
illicit origin and make it appear legitimate and “clean.” IMF Managing Director
Michel Camdessus estimated in 1998 that between 2 and 5 percent of the global GDP is
laundered annually. Evidence also shows that ML finances terrorist attacks
worldwide. So anti-money laundering (AML) research is of critical significance to
national financial stability and international security.
ML behavioral patterns and ML network structural features are essential to AML,
but traditional research focuses on legislative considerations and compliance
requirements. It is methodologically limited to incident identification, avoidance
detection, and suspicion surveillance. Investigations are generally manual, tedious,
time-consuming, and resource-intensive, and they are often criticized for their high
false positive rate (FPR) and their inefficiency with voluminous data sets.
Journal of Money Laundering Control, Vol. 10 No. 2, 2007, pp. 170-179
© Emerald Group Publishing Limited, 1368-5201
DOI 10.1108/13685200710746875
The research was accomplished during the author’s visit to UQ Business School, the University of
Queensland, Australia. The author’s grateful thanks are due to Professor Peter Green,
Dr Dongming Xu, and Dr Jon Heales for their invitation and support.
Comparatively, data mining (DM) can reduce data preparation time, determine
detection priority, lower the FPR, and lessen the pressure on manpower, training, and
budget. It proves particularly effective and efficient in:
• using decision trees and Bayesian inference to rank suspicious ML cases based on
probability computations so as to help AML analysts focus on the most
likely suspects;
• employing consolidation, link analysis, and social network analysis, etc. to
identify central members, subgroups, and inter-/intra-group interaction patterns
in ML networks;
• applying regression and case-based reasoning to uncover hidden leads and
patterns that may prove valuable or timely and to predict prospective
trends; and
• using support vector machines (SVM) (Schölkopf et al., 2001; Vapnik, 1995) to deal
with high-dimensional, heterogeneous data sets.
DM also has limitations:
• analysts may only see what they are looking for;
• a sufficiently exhaustive search will always suggest patterns which are merely
the product of random fluctuations because of data inconsistency; and
• investigators may refuse to adopt it because they are unwilling to change their
conventional ways of doing things, lack training in customized artificial
intelligence (AI) models, or cannot access real financial databases for
legal and competitive reasons (Hand, 1998; Watkins et al., 2003).
The objective of this paper is to suggest a framework for DM-based AML research and
to highlight some promising directions for future studies drawn from a literature review. We also
propose some improvements to the current methodologies.
Suspicion data preparations

Suspicion data preparations involve data collection, data pre-processing, and database
restructuring.
Traditional suspicion data come from disgruntled employees, banks, and informants.
The financial intelligence unit (FIU) typically reads mountains of textual reports received
from financial institutions, hypothesizes crime group models, and forwards suspicious
cases to competent law enforcement agencies (LEAs) for further investigations. With DM
techniques, customer, account, product, geography, and time are all included in feature
vectors (Kingdon, 2004). Suspicion data are mostly derived by comparing unusual transaction
records with behavioral norms. An SVM-based approach grounded in statistical learning theory has been
developed to replace the suspicious transaction data filtering system based on predefined
rules (Tang and Yin, 2005; Vapnik, 1995). As publicly available real data are always in
short supply, synthetic data can be employed to train and adapt a system and even to benchmark
different systems. The important qualities of simulated data and a five-step generation
methodology are explored by Barse et al. (2003).
Since the data are often incomplete and noisy, we need to decide in many cases which records
represent facts about which particular real world entities (RWEs) and
combine features of these records to depict the complete picture of that RWE’s activity.
Owing to data entry errors, unforeseen requirements, data collection cost, and multiple
data reporters, a database must be restructured to allow for the identification of
interesting entities and the effective search of the linkage structure hidden in the
original transactions (Goldberg and Wong, 1998). Sometimes, information from
multiple databases is combined for link discovery.
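The consolidation step described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each record carries optional `name`, `account`, and `phone` fields and that two records sharing any normalized identifier value refer to the same RWE. The function `consolidate`, the field names, and the sample records are illustrative assumptions and are not taken from the cited systems; real consolidation would also need fuzzy matching and domain heuristics.

```python
from collections import defaultdict

def consolidate(records, fields=("name", "account", "phone")):
    """Group record ids that share any identifier value (union-find)."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the representative, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # (field, normalized value) -> id of first record seen
    for r in records:
        for field in fields:
            value = r.get(field)
            if not value:
                continue
            key = (field, value.strip().lower())
            if key in first_seen:
                # Same identifier seen before: merge the two groups.
                parent[find(r["id"])] = find(first_seen[key])
            else:
                first_seen[key] = r["id"]

    groups = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r["id"])
    return sorted(sorted(g) for g in groups.values())

# Hypothetical transaction records with inconsistent identification data.
records = [
    {"id": 1, "name": "J. Smith", "account": "A-100"},
    {"id": 2, "name": "John Smith", "account": "A-100"},
    {"id": 3, "name": "john smith", "phone": "555-0101"},
    {"id": 4, "name": "M. Lee", "account": "B-200"},
]
entities = consolidate(records)
print(entities)
```

Records 1 and 2 are linked through the shared account and record 3 joins them through the normalized name, so three transactions collapse into one candidate entity while record 4 remains separate.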
Rare transactional pattern recognition

ML is rarely, if ever, manifested by a single person, business, account, or transaction,
but rather by a behavioral pattern occurring over time and involving a set of related
RWEs (Goldberg and Senator, 1995).
The traditional AML regime assumes that a usual transaction is legal and that an
unusual transaction is suspicious and likely to be illegal. Yet when two different
persons are engaged in the same transaction, one can be guilty of ML while the other is
not, because ML is an offence of motive rather than a crime of activity. “Usual” and
“unusual” relate to one’s behavioral habit or business model while “suspicious” is a
personal judgment of the legitimacy of a transaction. As far as terms are concerned, the
USA, China, the Philippines, and Canada and The Netherlands use “suspicious
activity report” (SAR), “suspicious transaction report,” “covered transaction report,”
and “unusual transaction report,” respectively, for their AML reporting systems,
implying that different countries might understand the words “unusual” and
“suspicious” differently. So we argue for a continuum of transactional
patterns from “legal” to “usual,” “unusual/abnormal/anomalous,” “suspicious,” and
finally “illegal,” where “usual” means “most probably but not necessarily legitimate”
whereas “suspicious” implies “a larger likelihood to be illegal,” as shown in Figure 1.
As they are rare in comparison with the great number of “usual” or “legal”
transactions, “unusual/abnormal/anomalous,” “suspicious,” and “illegal” transactions
are referred to as “rare” transactions in this paper. In the following, we review the
DM techniques used to mine rare transactional patterns. For convenience,
fraud/outlier detection is also covered in this section.
Unusual/abnormal/anomalous transaction pattern recognition

There are no clues as to what constitutes suspicious behavior and SARs are not the best
basis for building an algorithm since they only rely on the observation that a customer
looks suspicious and his money smells of fish. Searchspace makes its first breakthrough
by searching for unusual behavior, which is non-prescriptive but statistically defined over
the massive dimensionality “customers × accounts × products × geography × time.”
With the exaggeration of support vectors, an enormous adaptive probabilistic matrix
is generated to compute the likelihood of each customer’s actions based on simple
weighted aggregations. Peer groups are also employed to confirm unusualness outside a
customer’s behavior norm. So far Searchspace has reported an approximate 1-in-14 FPR
and an average alert generation rate of 0.00001 in trial runs of its special investigator
modules, now known as sentinels (Kingdon, 2004).
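The idea of checking a customer against a peer group can be sketched as follows. This is not Searchspace's probabilistic matrix, whose details are not given here; it swaps in a plain z-score against peer values as a stand-in. The function name `peer_z_score`, the sample amounts, and the cut-off of 3 standard deviations are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def peer_z_score(value, peer_values):
    """Distance of one customer's value from the peer-group mean,
    measured in peer-group standard deviations."""
    mu = mean(peer_values)
    sigma = pstdev(peer_values)
    if sigma == 0:
        return 0.0  # peers are identical; no spread to measure against
    return (value - mu) / sigma

# Hypothetical monthly transfer totals for a peer group of similar customers.
peers = [900.0, 1000.0, 1100.0, 1000.0]
z_usual = peer_z_score(1050.0, peers)
z_unusual = peer_z_score(5000.0, peers)

CUTOFF = 3.0  # assumed threshold for "unusual"
print(abs(z_usual) > CUTOFF, abs(z_unusual) > CUTOFF)
```

A statistical definition of this kind is non-prescriptive in the sense used above: no rule says what is suspicious, only how far a behavior lies from what comparable customers do.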
Figure 1. Continuum of transactional patterns, from high legitimacy to low legitimacy:
legal transaction, usual transaction, unusual transaction, suspicious transaction, illegal transaction.
Alternatively, DM is used to detect ML and terrorist financing through abnormal
transfer pricing (ATP) in international trade, where imports/exports at prices
over/below the upper/lower quartile import/export prices are considered abnormal
according to the Section 482 regulations of the US Internal Revenue Service in 1994. The ATP
range is determined by the real-time transaction prices in the US Merchandise Trade
Database together with upper/lower quartile prices and US/World median prices. Price
frequency distribution and country/commodity heterogeneity are considered, and
prices outside the inter-quartile range are also adjusted and brought back to the
median prices. In 2001, the total amount laundered out of the USA through ATP was
estimated to be $156.22 billion, of which nearly $4.27 billion flowed to the 25 countries
appearing on the US State Department’s watch list and $3.65 billion to the top five
Al-Qaeda countries (Zdanowicz, 2004). These findings not only give a sense of the
scale of ATP’s contribution to international ML but also give evidence that ML
might finance terrorist attacks worldwide, implying that an effective audit and
inspection system must be established to monitor pricing behavior in international
trade in the fight against global ML crimes (MLCs).
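The quartile test behind ATP detection can be sketched in a few lines of Python. The sketch assumes a list of reference prices for one commodity and one country pair; the function name `abnormal_price_flags` and the sample prices are hypothetical, and the quartiles come from the standard library's default (exclusive) method, which may differ from the method used in the cited study.

```python
from statistics import quantiles

def abnormal_price_flags(reference_prices, observed_prices):
    """Label each observed price relative to the inter-quartile range
    of the reference prices."""
    q1, _median, q3 = quantiles(reference_prices, n=4)
    flags = []
    for price in observed_prices:
        if price > q3:
            flags.append("above upper quartile")
        elif price < q1:
            flags.append("below lower quartile")
        else:
            flags.append("within range")
    return flags

# Hypothetical unit prices for one commodity traded with one country.
reference = [8.0, 9.0, 10.0, 10.0, 11.0, 12.0, 13.0]
flags = abnormal_price_flags(reference, [2.0, 10.0, 30.0])
print(flags)
```

Which direction is of interest depends on the flow being examined, since the text treats over-priced and under-priced imports and exports separately.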
Anomaly detection is also identified as a KDD approach (Piatetsky-Shapiro and
Matheus, 1992). Previous research on skewed data mostly casts the problem as a
cost-sensitive one in which skewed examples receive a higher weight. Older
techniques are applicable when the cost model is clearly defined and the volume of
data is small. To mine trading “anomalies” from extremely skewed positive classes
(< 0.01 percent) in a very large volume of data (5 M with 144 features), an
ensemble-based approach is suggested by Fan et al. (2004). This research defines the
goals of using DM in the AML context as follows:
• to automate and shorten the development cycle;
• to output a score, such as a posterior probability, to indicate the likelihood for a
trade to be truly anomalous; and
• to obtain a much lower FPR but a higher recall rate so that analysts can focus on
the “really bad” cases.
Built upon probabilistic modeling and loss functions, its basic idea is to train an ensemble of
classifiers from “biased” samples taken from the database. Each
classifier in the ensemble outputs a posterior probability, and the probability estimates
from the multiple classifiers are averaged to compute the final posterior probability. When
the probability is higher than a threshold, the trade is classified as anomalous. The best
threshold here is determined by the given loss function in each application.
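The averaging and thresholding step can be sketched as below. The classifier outputs are made-up numbers, and the threshold rule is the textbook cost-based decision rule (flag when the expected loss of missing an anomaly exceeds that of a false alarm), which is assumed here as one possible loss function rather than the one used by Fan et al. (2004).

```python
def ensemble_posterior(estimates):
    """Average the posterior probabilities output by the classifiers."""
    return sum(estimates) / len(estimates)

def loss_threshold(cost_false_positive, cost_false_negative):
    """Threshold t such that flagging is cheaper in expectation when p > t.

    Flag when p * cost_false_negative > (1 - p) * cost_false_positive.
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)

# Hypothetical posterior estimates from three classifiers for one trade.
estimates = [0.10, 0.30, 0.20]
p = ensemble_posterior(estimates)

# Assumed costs: missing an anomaly is nine times worse than a false alarm.
t = loss_threshold(cost_false_positive=1.0, cost_false_negative=9.0)
is_anomalous = p > t
print(round(p, 3), t, is_anomalous)
```

With heavily skewed classes the threshold is usually far below 0.5, which is why it is tied to the loss function rather than fixed in advance.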
Suspicious transaction pattern recognition

“Suspicious” or “suspicion” is the most often cited term in the AML environment. It is
synonymous with “unusual” in the Wolfsberg Anti-Money Laundering Principles,
referring to any transaction involving a covered product that is relevant to a possible
violation of law or regulation. Traditional suspicion indicators are categorized into
account opening, large cash sums, deposits and withdrawals, payments, wire transfer,
and informal value transfer system (IVTS), but DM extends the scope of feature
vectors to cover customer, account, product, geography, and time, etc. (Kingdon, 2004).
Expert rules do not quantify the probability for a transaction to be suspicious, so we
need to weight each suspicion indicator, compute suspicion scores, and rank the items
according to these scores so as to help analysts focus on the most suspicious
subjects identified from Bank Secrecy Act filings. The FinCEN AI System (FAIS) integrates
intelligent human and software agents in a cooperative discovery task on a very large data space.
Rule-based reasoning, blackboard, and many other aspects of AI technology are
incorporated in it. Its unique analytic power arises primarily from the change in view
of the underlying data from a transaction-oriented perspective to a subject-oriented
(i.e. person or organization) one. Every transaction/subject/account is tested by each of
the 336 rules derived from the earlier customs AI system, and the rule results are
combined by Bayesian inference into a single suspiciousness rating for each item. Its use of nearest-neighbor
matching and inductive retrieval also differs from ordinary expert systems. In another
study, neural networks are used to correlate information from a variety of
technology and database sources for financial institutions to identify suspicious
account activity and handle illegitimate behavior (Vikram et al., 2004).
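How rule evidence might be combined into a single suspiciousness rating by Bayesian inference can be sketched as follows. The sketch assumes that each fired rule contributes an independent likelihood ratio (a naive Bayes simplification); the prior, the ratios, and the subject names are hypothetical, and this is not a description of how FAIS actually implements the combination.

```python
def suspicion_score(prior, likelihood_ratios):
    """Posterior probability of suspiciousness after applying each fired
    rule's likelihood ratio to the prior odds (independence assumed)."""
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)

# Hypothetical likelihood ratios of the rules that fired for each subject.
fired = {
    "subject_A": [5.0, 4.0],
    "subject_B": [2.0],
    "subject_C": [],
}
PRIOR = 0.01  # assumed base rate of suspicious subjects
scores = {s: suspicion_score(PRIOR, ratios) for s, ratios in fired.items()}

# Rank subjects so that analysts see the highest ratings first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The ranking, rather than the absolute score, is what directs analyst attention, so even roughly calibrated ratios can be useful.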
The hypothesis “If Q is a highly probable pattern (> 0.9) then Q constitutes a
normal pattern and not(Q) can constitute a suspicious (abnormal) pattern” is tested by
Kovalerchuk and Vityaev (2003). They further develop a four-step algorithm based on
a modified combination of first-order logic and probabilistic semantic inference to
recognize two suspicious patterns from ordinary or distributed databases that are
related to terrorism and other illegal activities:
• a manufacturer buys a precursor and sells the same precursor; and
• a trading company buys a precursor and sells the same precursor cheaper.
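The hypothesis above can be illustrated with a deliberately simplified sketch: estimate how often a pattern Q holds in the data and, if it holds in more than 90 percent of records, flag the records where it fails. This replaces the first-order logic and probabilistic semantic inference of the cited algorithm with a plain frequency count; the function `flag_negations`, the record fields, and the sample data are hypothetical.

```python
def flag_negations(records, pattern, threshold=0.9):
    """Return the support of `pattern` and, if the pattern is highly
    probable, the records that violate it."""
    matches = [bool(pattern(r)) for r in records]
    support = sum(matches) / len(records)
    if support <= threshold:
        return support, []  # Q is not a "normal" pattern; nothing to flag
    return support, [r for r, ok in zip(records, matches) if not ok]

# Hypothetical data: 19 manufacturers buy a precursor and sell a product,
# one manufacturer buys a precursor and sells the same precursor.
records = [
    {"firm": f"M{i}", "bought": "precursor", "sold": "product"}
    for i in range(19)
]
records.append({"firm": "M19", "bought": "precursor", "sold": "precursor"})

# Q: a manufacturer does not resell what it bought.
support, suspects = flag_negations(records, lambda r: r["bought"] != r["sold"])
print(support, [r["firm"] for r in suspects])
```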
To sum up, “unusualness” is a good surrogate for “suspiciousness,” and customer due
diligence should focus on recognizing behavior patterns instead of learning
background knowledge, simply because knowing how people behave is the best way
to identify risks. In order to improve the recognition of rare transactional patterns,
attention should be paid not to transactions but to subjects and accounts, which are
abstractions resulting from consolidation whereby similar identification information is
used to group transactions into clusters.
Fraud/outlier detection
Traditional fraud detection emphasizes access control such as identity verification and
customer profiling analysis based on transaction history. It is weak at uncovering
potential fraud and insider trading in time. Statistical analysis, neural networks,
decision trees, fuzzy logic, and genetic algorithms should be employed to reduce FPR.
FAIS seldom uses supervised techniques like case-based reasoning, nearest neighbor
retrieval, and decision trees due to propositional approaches, lack of clearly labeled
positive examples, and scalability issues. Unsupervised techniques are also avoided
because of difficulties in deriving appropriate attributes (Phua et al., 2005). Rule-based
learning and classifier neural networks are taken as part of an adaptive system to
detect cellular cloning account fraud from external threats (Fawcett and Provost, 1997).
Neural networks are also used together with transactional history data to recognize
underlying patterns in data sets and assign risk levels to specific transaction sets,
assuming account-related predictor variables (e.g. transaction on or access to an
account) are related with predicted variables (e.g. risk of fraud or degree of
unusualness on the account) (Vikram et al., 2004). This research is applicable to
internal fraud, but its effectiveness might be weakened by its variable-oriented rather
than history-oriented perspective of each transaction. If a time sequence analysis is
made, the results might be improved.
SVM is suitable for density estimation and outlier detection. Peer group analysis and
break point analysis can be used to detect fraudulent activities while local outlier and
global outlier are clearly distinguished. Pattern discovery techniques are advanced so
as to deal with complex (non-numeric) evidence and involve structured objects, texts,
and data in a variety of discrete and continuous scales (nominal, order, absolute and so
on) (Kovalerchuk and Vityaev, 2003).
Money laundering crime report

The present MLC reporting regime is predefined and expert rules-based, resulting in
under-fitting or the loss of logical associations between behaviors due to the
insufficient training of models. Financial institutions choose to over-report and
minimize false positive errors in the face of heavy fines for failure to comply with AML
reporting system whereas FIUs and LEAs would rather minimize false negative errors
for the sake of national financial stability and security, as shown in Table I. So in the
context of DM, we argue for a new MLC generation system capable of self-learning,
self-adaptation, self-decision-making, and self-explanation. Thus, reporting MLCs
becomes an issue of pattern classification to separate the unusual from the usual and
only those “bad” activities would be reported and investigated.
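The four outcomes of Table I map directly onto the usual classification metrics. The short sketch below computes them from counts; the counts themselves are invented for illustration, and the definitions used (false positive rate as FP / (FP + TN), recall as TP / (TP + FN), precision as TP / (TP + FP)) are the standard ones, assumed here since the text does not define them.

```python
def reporting_metrics(tp, fp, fn, tn):
    """Standard metrics derived from the four reporting outcomes."""
    return {
        "false_positive_rate": fp / (fp + tn),  # non-ML cases reported
        "recall": tp / (tp + fn),               # ML cases actually reported
        "precision": tp / (tp + fp),            # reports that are truly ML
    }

# Hypothetical counts: 10 real ML cases among 10,000 reviewed items.
metrics = reporting_metrics(tp=8, fp=92, fn=2, tn=9898)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With rare ML cases, a low false positive rate can still coexist with low precision, which is one reason over-reporting burdens the agencies that receive the reports.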
Further, MLC reporting might involve ML network generation (MLNG) in practice.
MLNG has three stages: entity extraction, association identification, and network
representation. It may take the form of information extraction from structured databases (IESD) or
knowledge discovery in unstructured databases or textual documents (KDUD).
For IESD, consolidation and heuristic-based link formation are used to relate the
identifiers present in a database to a set of RWEs which are not uniquely identified in the
database (Senator et al., 1995; Kingdon, 2004). A statistics-based concept space approach is
developed using co-occurrence weight to measure the frequency with which two words or
phrases appear in the same document, and crime incident data can be transformed further
into a network format (Chen and Lynch, 1992). For KDUD, text mining is a major tool.
Based on a set of predefined patterns and rules, relation-specifying words and phrases are
used to identify associations among extracted entities and events from free text (Lee,
1998). Named-entity extraction techniques (Chau et al., 2002; Chinchor, 1998) are able
to automatically identify the names of interesting entities such as person, time, location, and
organization from text documents.
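A co-occurrence weight of the kind mentioned above can be sketched as follows. The sketch uses a simplified, asymmetric weight (the share of documents mentioning entity a that also mention entity b) and is not the full concept space formula of Chen and Lynch (1992); the function name and the sample incident reports are hypothetical.

```python
from collections import Counter
from itertools import permutations

def cooccurrence_weights(documents):
    """Map each ordered pair (a, b) to the fraction of documents
    containing a that also contain b."""
    doc_freq = Counter()
    pair_freq = Counter()
    for entities in documents:
        unique = set(entities)
        doc_freq.update(unique)
        pair_freq.update(permutations(sorted(unique), 2))
    return {(a, b): n / doc_freq[a] for (a, b), n in pair_freq.items()}

# Hypothetical entity lists extracted from three incident reports.
reports = [
    ["A", "B"],
    ["A", "B", "C"],
    ["A"],
]
weights = cooccurrence_weights(reports)
print(weights[("A", "B")], weights[("B", "A")])
```

The resulting weighted pairs are already an edge list, which is how incident data can be carried over into a network representation.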
Money laundering network analysis

As a predicate offence of ML, organized crime is characterized by networking. Evidence
from a single case usually does not reveal any leads to aid investigations. What really
counts is the knowledge of ML network structure and role play. Link discovery, social
network analysis (SNA), and graph theory are frequently used money laundering
network analysis (MLNA) tools, and DM in this context involves link analysis,
community generation, and network destabilization.

Table I. Types of MLC reporting
                 ML               Non-ML
Reporting        True positive    False positive
Non-reporting    False negative   True negative
Money laundering network link analysis

Money laundering network link analysis appeals to advanced stochastic models,
algorithms and AI technologies. Existing criminal network analysis tools fall into three
generations. The first is the manual approach, typically represented by the Anacapa Chart,
which is inapplicable to very large data sets. The second is the graphics-based
approach, which automatically produces graphical representations of networks and
covers most of the current tools like Analyst’s Notebook, Netmap, and XANALYS Link
Explorer widely adopted by the USA and The Netherlands. The third is the structural
analysis approach which expects to discover more structural characteristics of criminal
networks such as central members, subgroups, inter-/intra-group relationships, and the
overall structure (Klerks, 2001). Spatio-temporal co-occurrence can be used to infer
associations between criminals, and different strengths of associations are determined
by frequency and intensity (Lauw et al., 2005; Xu and Chen, 2005).
Similar to correlation rules, link analysis can discern ML networks and hierarchies,
but it remains primarily a manual process except for network visualization. Particularly,
existing MLNA tools are limited to direct association search and hardly make any
material analysis. To fill the gap, the shortest-path algorithms priority-first-search (PFS)
and two-tree Dijkstra are used to identify the strongest associations between source
entities that are not directly related (Xu and Chen, 2004). The results show that PFS algorithms
outperform the classical association-search approach (the modified breadth-first-search
algorithm) in terms of effectiveness. As far as efficiency is concerned, two-tree PFS is
better for small, dense networks while one-tree PFS is better for large, sparse networks.
As suggested by the evaluation results, if named-entity extraction and domain-specific
heuristics can be used, the two-tree PFS algorithm might become more effective.
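Searching for the strongest association between two entities that are not directly linked can be sketched with a standard Dijkstra search. The sketch assumes that each link carries a strength in (0, 1] and that the strength of a path is the product of its link strengths, so taking the negative logarithm turns the strongest path into a shortest path. It is a plain one-tree Dijkstra, not the PFS or two-tree variants evaluated by Xu and Chen (2004), and the graph is hypothetical.

```python
import heapq
import math

def strongest_path(graph, source, target):
    """Return (path, strength) of the path maximizing the product of
    link strengths; graph is {node: {neighbor: strength in (0, 1]}}."""
    dist = {source: 0.0}  # accumulated -log(strength)
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for neighbor, strength in graph.get(node, {}).items():
            nd = d - math.log(strength)
            if nd < dist.get(neighbor, math.inf):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if target not in dist:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])

# Hypothetical undirected association network with link strengths.
links = [("A", "B", 0.9), ("B", "C", 0.8), ("A", "D", 0.5), ("D", "C", 0.9)]
graph = {}
for u, v, s in links:
    graph.setdefault(u, {})[v] = s
    graph.setdefault(v, {})[u] = s

path, strength = strongest_path(graph, "A", "C")
print(path, round(strength, 2))
```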
With the incorporation of several approaches from other disciplines like SNA,
concept space approach, hierarchical clustering, and multidimensional scaling (MDS),
the CrimeNet Explorer system is developed on the basis of a four-stage framework for
automated network structural analysis and visualization (Xu and Chen, 2005,
pp. 201-26). Structural analysis functionality facilitates the system’s detection of
central members, subgroups, and intergroup interaction patterns, as expected by the
third generation of criminal network analysis tools. It also improves the system’s KDD
ability significantly. However, it is methodologically limited by its simplistic use of
the concept space approach to mine association rules, its single focus on criminal-criminal
relationships, and MDS’s poor productivity.
In addition, an association matrix of the 19 hijackers in the September 11 attacks is
manually constructed using the public data that were available before but collected
after the event (Krebs, 2001). Incomplete as the information is, the analysis provides
valuable insights into the structure of the terrorist organization.
Money laundering network community generation

Clustering, fuzzy logic, K-means algorithm, and timeline analysis are used to
investigate MLCs, with person, time, and transaction being support vectors (Zhang
et al., 2003). The research has made four contributions:
(1) Identifying a new paradigm of uni-party data community generation where the
traditional, direct and explicit binary relationship does not exist in data items.
(2) Formulating an MLC group model generation problem to exemplify the use of
the paradigm.
(3) Developing the CORAL method to generate the MLC group model, following a
methodology of link discovery based on correlation analysis, where a correlation
measure with fuzzy logic is used to determine the similarity of patterns among
uni-party textual items.
(4) Implementing, evaluating, and testing a CORAL prototype on real MLC case
data provided by the National Institute of Justice with promising results.
It also complements SNA, collaborative filtering, and web mining, which focus on
automatic community generation based on a binary relationship given between data
items.
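Linking persons whose activity patterns are correlated, even when no explicit relationship between them is recorded, can be sketched as below. The sketch uses plain Pearson correlation between per-person activity timelines in place of the fuzzy-logic correlation measure of CORAL; the timelines, the names, and the threshold of 0.8 are assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat timeline carries no pattern to correlate
    return cov / sqrt(vx * vy)

def correlated_pairs(timelines, threshold=0.8):
    """Pairs of persons whose timelines correlate at or above threshold."""
    names = sorted(timelines)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if pearson(timelines[a], timelines[b]) >= threshold
    ]

# Hypothetical transaction counts per time period for three persons.
timelines = {
    "P1": [0, 3, 0, 4, 0],
    "P2": [0, 2, 0, 5, 0],
    "P3": [2, 0, 3, 0, 1],
}
pairs = correlated_pairs(timelines)
print(pairs)
```

The resulting pairs form implicit links from which candidate groups can then be clustered.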
Money laundering network destabilization
To destabilize an ML network, power analysis, cohesion analysis, and role analysis are
employed, respectively, to identify central members, subgroups, and inter-/intra-group
relationships which have long been subjects of interest for LEAs.
Degree centrality and eigenvector centrality are used for power analysis to estimate
the influence of a node/person in the network. The person with the largest
number/weight of connections is considered the central or most powerful one. Variants of
the clique model such as n-clique and n-clan are used to identify cohesive subgroups with three
structural properties: familiarity, reachability, and robustness (Luce, 1950; Mokken,
1979). Network efficiency E(G) is employed to quantify how efficiently the nodes of a
network exchange information and, finally, to determine the critical components of the network
(Latora and Marchiori, 2004). Followers and gatekeepers can also be labeled by
position indices. It is now possible for AML analysts to estimate how a network will be
affected if a particular node is removed from it.
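These measures can be sketched on a small unweighted network. The sketch assumes the common definition of network efficiency as the average of 1/d(i, j) over all ordered pairs of distinct nodes, with unreachable pairs contributing zero, and ranks nodes by how much efficiency drops when each is removed. The five-node star network and the helper names are hypothetical.

```python
from collections import deque

def degree_centrality(adj):
    """Share of the other nodes each node is directly connected to."""
    n = len(adj)
    return {u: len(nbrs) / (n - 1) for u, nbrs in adj.items()}

def efficiency(adj):
    """Average inverse shortest-path length over ordered node pairs."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))

def without(adj, node):
    """Copy of the network with one node and its links removed."""
    return {u: nbrs - {node} for u, nbrs in adj.items() if u != node}

# Hypothetical star network: one hub connected to four peripheral members.
star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}

base = efficiency(star)
drop = {u: base - efficiency(without(star, u)) for u in star}
most_critical = max(drop, key=drop.get)
print(round(base, 2), most_critical)
```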
Destabilizing ML networks implies disconnecting the key players from the
peripheries so that the network is disrupted as much as possible, but the effect is topology-specific.
While a star/wheel structure is centralized upon a hub, chain and complete structures
are decentralized networks. To disrupt a centralized network, removal of the hub can
be very effective, whereas overthrowing a decentralized structure is more difficult since
more than one sub-hub exists in it. These findings offer heuristics for AML agencies.
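The dependence on topology can be illustrated by removing one node and measuring the largest group of members that can still reach one another. The measure (size of the largest connected component) and the two five-node networks below are assumptions chosen for illustration.

```python
def largest_component_after_removal(adj, removed):
    """Size of the largest connected group once `removed` is taken out."""
    remaining = {u for u in adj if u != removed}
    seen = set()
    best = 0
    for start in remaining:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:  # depth-first search within the remaining nodes
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v in remaining and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

nodes = ["n1", "n2", "n3", "n4", "n5"]

# Centralized: n1 is the hub of a star.
star = {"n1": set(nodes[1:])}
star.update({u: {"n1"} for u in nodes[1:]})

# Decentralized: every member is linked to every other member.
complete = {u: set(nodes) - {u} for u in nodes}

print(largest_component_after_removal(star, "n1"))      # members left isolated
print(largest_component_after_removal(complete, "n1"))  # still fully connected
```

Removing the hub of the star leaves every remaining member isolated, whereas removing any single member of the complete network leaves the other four connected.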
Discussion and conclusions

Following the flow of data, we propose a framework for DM-based AML research.
Reporting MLCs is based on suspicion data preparations and rare transactional pattern
recognition including fraud/outlier detection. Also, it might involve MLNG in practice.
The key to the framework lies in ML network analysis involving link analysis,
community generation, and network destabilization.
Meanwhile, DM in the context of AML is a young, challenging field, and the
following subjects deserve further research:
• constructing and evaluating suspicious transaction indicator (STI) systems;
• improving ML network structural analysis in addition to visualization by more
use of unsupervised techniques;
• upgrading DM-based AML software in areas like interesting entities extraction,
consolidation, link formation, and STI ranking; and
• an integrated use of techniques from multiple disciplines such as semantics,
high-performance computation, signal processing, and spatio-temporal DM to
study ML in the broad environment of financial crimes.
References
Barse, E., Kvarnström, H. and Jonsson, E. (2003), “Synthesizing test data for fraud detection
systems”, Proceedings of the 19th Annual Computer Security Applications Conference,
pp. 384-95.
Chau, M., Xu, J. and Chen, H. (2002), “Extracting meaningful entities from police narrative
reports”, Proceedings of the National Conference on Digital Government Research,
Los Angeles, CA, pp. 271-5.
Chen, H. and Lynch, K.J. (1992), “Automatic construction of networks of concepts characterizing
document databases”, IEEE Trans. Syst. Man Cybernet, Vol. 22, pp. 885-902.
Chinchor, N.A. (1998), “Overview of MUC-7/MET-2”, Proceedings of the Seventh Message
Understanding Conference (MUC-7).
Fan, W., Yu, P.S. and Wang, H. (2004), “Mining extremely skewed trading anomalies”, in Bertino,
E. et al. (Eds), EDBT 2004, LNCS, Vol. 2992, Springer, Berlin, pp. 801-10.
Fawcett, T. and Provost, F. (1997), “Adaptive fraud detection”, Data Mining and Knowledge
Discovery, Vol. 1 No. 3, pp. 291-316.
Goldberg, H.G. and Senator, T.E. (1995), “Restructuring databases for knowledge discovery by
consolidation and link formation”, Proceedings of the First International Conference on
Knowledge Discovery in Databases (KDD-95), AAAI Press, Menlo Park, CA, pp. 136-41.
Goldberg, H.G. and Wong, R.W.H. (1998), “Restructuring transactional data for link analysis in
the FinCEN AI System”, Proceedings of 1998 AAAI Fall Symposium on Artificial
Intelligence and Link Analysis, AAAI Press, Menlo Park, CA.
Hand, D.J. (1998), “Data mining: statistics and more?”, The American Statistician, Vol. 52 No. 2,
pp. 112-8.
Kingdon, J. (2004), “AI fights money laundering”, IEEE Intelligent Systems, Vol. 5/6, pp. 87-9.
Klerks, P. (2001), “The network paradigm applied to criminal organizations: theoretical
nitpicking or a relevant doctrine for investigators? Recent developments in The
Netherlands”, Connections, Vol. 24 No. 3, pp. 53-65.
Kovalerchuk, B. and Vityaev, E. (2003), “Detecting patterns of fraudulent behavior in forensic
accounting”, in Palade, V., Howlett, R.J. and Jain, L.C. (Eds), KES 2003, LNAI, Vol. 2773,
Springer, Berlin, pp. 502-9.
Krebs, V.E. (2001), “Mapping networks of terrorist cells”, Connections, Vol. 24 No. 3, pp. 43-52.
Latora, V. and Marchiori, M. (2004), “How the science of complex networks can help developing
strategies against terrorism”, Chaos, Solitons & Fractals, Vol. 20, pp. 69-75.
Lauw, H.W., Lim, E., Pang, H. and Tan, T. (2005), “Social network discovery by mining
spatio-temporal events”, Computational and Mathematical Organization Theory, Vol. 11,
pp. 97-118.
Lee, R. (1998), “Automatic information extraction from documents: a tool for intelligence and law
enforcement analysts”, Proceedings of 1998 AAAI Fall Symposium on Artificial
Intelligence and Link Analysis, AAAI Press, Menlo Park, CA.
Luce, R. (1950), “Connectivity and generalized cliques in sociometric group structure”,
Psychometrika, Vol. 15, pp. 169-90.
Mokken, R. (1979), “Cliques, clubs, and clans”, Quality & Quantity, Vol. 13, pp. 161-73.
Phua, C., Lee, V., Smith, K. and Gayler, R. (2005), “A comprehensive survey of data mining-based
fraud detection research”, available at: www.bsys.monash.edu.au/people/cphua/papers/A%20Comprehensive%20Survey%20of%20Data%20Mining-based%20Fraud%20Detection%20Research%20%5BDRAFT%5D%20(v1.2).pdf
Piatetsky-Shapiro, G. and Matheus, C. (1992), “Knowledge discovery workbench for exploring
business databases”, Intl. J. Intell. Sys., Vol. 7 No. 7, pp. 675-86.
Schölkopf, B., Platt, J.C., Taylor, J.S. and Smola, A.L. (2001), “Estimating the support of a
high-dimensional distribution”, Neural Computation, Vol. 13 No. 7, pp. 1443-71.
Senator, T.E., Goldberg, H.G. and Wooton, J. (1995), “The financial crimes enforcement network
AI system (FAIS): identifying potential money laundering from reports of large cash
transactions”, AI Magazine, Vol. 16 No. 4, pp. 21-39.
Tang, J. and Yin, J. (2005), “Developing an intelligent data discriminating system of anti-money
laundering based on SVM”, Proceedings of the Fourth International Conference on
Machine Learning and Cybernetics. Guangzhou, pp. 3453-7.
Vapnik, V. (1995), The Nature of Statistical Learning Theory, 2nd ed., Springer, New York, NY,
pp. 138-70.
Vikram, A., Chennuru, S., Rao, H.R. and Upadhyaya, S. (2004), “A solution architecture for
financial institutions to handle illegal activities: a neural networks approach”, Proceedings
of the 37th Hawaii International Conference on System Sciences-2004.
Watkins, R.C., Reynolds, K.M., Demara, R., Georgiopoulos, M., Gonzalez, A. and Eaglin, R. (2003),
“Tracking dirty proceeds: exploring data mining technologies as tools to investigate
money laundering”, Police Practice and Research, Vol. 4 No. 2, pp. 163-78.
Xu, J.J. and Chen, H. (2004), “Fighting organized crimes: using shortest-path algorithms to
identify associations in criminal networks”, Decision Support Systems, Vol. 38, pp. 473-87.
Xu, J.J. and Chen, H. (2005), “CrimeNet Explorer: a framework for criminal network knowledge
discovery”, ACM Transactions on Information Systems, Vol. 23 No. 2, pp. 201-26.
Zdanowicz, J.S. (2004), “Detecting money laundering and terrorist financing via data mining”,
Communications of the ACM, Vol. 47 No. 5, pp. 53-5.
Zhang, Z., Salerno, J.J. and Yu, P.S. (2003), “Applying data mining in investigating money
laundering crimes”, paper presented at SIGKDD’03, Washington, DC, pp. 747-52.
About the authors

Zengan Gao, an Associate Professor in the Department of Finance and Accounting, School of
Economics and Management, Southwest Jiaotong University, PR China. Also a Visiting
Academic to UQ Business School, the University of Queensland, Australia. He has a keen interest
in the use of DM in the context of AML. His subjects also include international trade and
enterprise management. Zengan Gao is the corresponding author and can be contacted at:
gaozengan133@163.com
Mao Ye, a Professor in the Institute of Knowledge Management and Business Intelligence, School
of Economics and Management, Southwest Jiaotong University, PR China. He has a keen interest in
intelligent information processing theory and applications. Email: yem_mei29@hotmail.com
To purchase reprints of this article please e-mail: reprints@emeraldinsight.com

Or visit our web site for further details: www.emeraldinsight.com/reprints
