
REVIEW

A Review of Data Mining Applications in Crime


Hossein Hassani¹∗, Xu Huang², Emmanuel S. Silva³, and Mansi Ghodsi¹
¹ Institute for International Energy Studies, Tehran 1967743711, Iran
² Faculty of Management, Bournemouth University, Fern Barrow, Poole BH12 5BB, UK
³ Fashion Business School, London College of Fashion, University of the Arts London, London WC1V 7EY, UK

Received 23 September 2014; revised 25 February 2016; accepted 28 February 2016


DOI:10.1002/sam.11312
Published online 21 April 2016 in Wiley Online Library (wileyonlinelibrary.com).

Abstract: Crime continues to remain a severe threat to all communities and nations across the globe alongside the
sophistication in technology and processes that are being exploited to enable highly complex criminal activities. Data mining,
the process of uncovering hidden information from Big Data, is now an important tool for investigating, curbing and preventing
crime and is exploited by both private and government institutions around the world. The primary aim of this paper is to provide
a concise review of the data mining applications in crime. To this end, the paper reviews over 100 applications of data mining
in crime, covering a substantial quantity of research to date, presented in chronological order with an overview table of many
important data mining applications in the crime domain as a reference directory. The data mining techniques themselves are
briefly introduced to the reader and these include entity extraction, clustering, association rule mining, decision trees, support
vector machines, naive Bayes rule, neural networks and social network analysis amongst others. © 2016 Wiley Periodicals, Inc.
Statistical Analysis and Data Mining: The ASA Data Science Journal 9: 139–154, 2016

Keywords: big data, data mining, crime, review

1. INTRODUCTION

Crime has evolved rapidly over time, with criminals now exploiting the latest in technology not only to commit crimes but also to evade being captured. Criminal activity is no longer limited to the streets and back alleys in our neighborhoods. The Internet, which connects the entire world, is also a thriving playground for the more sophisticated criminals in the modern age. Building upon acts of terror, such as the 9/11 terrorist attacks and use of technology to hack into the most secure defense databases, the need for new and effective methods of crime prevention is increasingly significant [1]. The evolution of ‘Big Data’, which demands novel approaches towards the effective and accurate analysis of the growing volumes of crime data, has been a major challenge for all law enforcement and intelligence-gathering organizations [2].

It is against this backdrop that data mining is described as a powerful tool with great potential to help criminal investigators focus on the most important information hidden within the ‘Big Data’ on crime [3]. Data mining as a tool for crime analysis is recognized as a comparatively new and highly sought after area of research [4]. This is not surprising as data mining itself is a relatively new and rapidly evolving subject; those interested in the historical and modern definitions of data mining are referred to ref. [5]. Data mining is not concerned with estimation and tests or prespecified models but with discovering models through an algorithmic search process exploring linear and nonlinear models, explicit or not.

Alongside the increasing use of computerized systems to track crimes, computer data analysts have begun helping law enforcement officers and detectives speed up the process of solving crimes [6] and predict crimes in advance. The popularity of many data mining techniques is further influenced by the increasing availability

∗ Correspondence to: Hossein Hassani (hassani.stat@gmail.com)

of Big Data and its ease of use for people who lack data analysis skills and statistical knowledge [7]. As identified by many authors, access to data plays an important role in the effectiveness of data mining in crime, but problems arise as access is hindered by privacy concerns [2,8].

This paper is aimed at providing a concise review of the data mining applications used for identifying and preventing crime over the years. It is expected that this informative review paper will be useful for introducing data mining techniques to crime researchers and investigators in addition to supporting and encouraging future research into developing data mining for crime analysis. In order to enable such use, the review has been organized so that interested parties could easily refer to this article alone to apprise themselves of research that has already been conducted to date and the resulting outcomes that have been attained. The main contribution of this paper is two-fold as it not only captures a majority of the significant data mining applications in crime by classifying these based on different types of techniques but also presents a concise introduction to each of the relevant data mining techniques that have been exploited for mining crime. Moreover, the review also includes, in tabular format, a summary of data mining applications in crime that can act as a quick reference guide for researchers.

According to ref. [9], the main crime-related data mining techniques are clustering, association rule mining, classification and sequential pattern mining. In line with the findings in ref. [9], our research uncovered that the following data mining techniques are most frequently adopted for crime analysis: entity extraction, clustering, association rule mining, decision trees, support vector machines, naive Bayes rule, neural networks and social network analysis.

The remainder of this paper is organized such that a review of the applications of data mining for crime analysis is presented in Section 2 in chronological order along with more specific explanations on the implementation of data mining techniques. The paper concludes in Section 3, whilst introductions to various data mining techniques are presented in the Appendix.

2. DATA MINING APPLICATIONS IN CRIME

In this section, we present a summary of the data mining applications used to detect and prevent crime. In comparison to traditional data mining techniques, the advanced techniques focus on both structured and unstructured data to detect patterns [2,10]. With the emergence of Big Data, most existing systems use a combination of data mining techniques (which are introduced in the Appendix) in order to obtain more precise and accurate extractions.

As a broad research discipline, crime analysis can encompass a wide range of criminal activities, from simple violation of civic duties to internationally organized crimes [4]. According to ref. [9], complex conspiracies are often difficult to unravel because information on suspects can be geographically diffused and span long periods of time; detecting cyber crime can likewise be difficult because busy network traffic and frequent online transactions generate large amounts of data, whilst only a small portion would actually relate to illegal activities.

In order to provide a clear view of data mining applications in crime, Table 1 is provided as a reference directory. This table summarizes information based on the data mining technique used and provides information relating to software, regions and purpose of the underlying applications. A recent review of data mining techniques used for financial accounting fraud detection up until 2011 can be found in ref. [11] and is therefore not reproduced here.

2.1. Entity Extraction

Entity extraction can be defined as the process of extracting metadata from unstructured text documents.¹ In 2002, Chau et al. [8] proposed a neural network-based entity extractor, which applied named-entity extraction techniques to identify useful entities from police narrative reports. They highlighted the importance of valuable information stored as text objects in criminal-justice data (i.e., the free-text police narrative reports), which are considered unstructured data. Unlike the information from structured data, these unstructured data cannot be easily accessed and used by investigators or detectives. In ref. [8], four major named-entity extraction approaches are briefly summarized: lexical lookup [98], rule-based [99], statistic-based [100] and machine learning [98,101,102]. Similar to most existing information extraction systems, the entity extractor proposed by Chau et al. [8] combines more than one of these approaches: lexical lookup, machine learning and minimal hand-crafted rules. When the neural network-based entity extractor was applied to 36 reports randomly selected from the Phoenix Police Department database for narcotic-related crimes, the precision rates for extracting person and narcotic drug entities were 74.1% and 85.4%, with recall rates of 73.4% and 77.9%, respectively [8].

¹ http://www.dataversity.net/entity-extraction-and-the-semantic-web/

In order to detect what crimes may or may not have been committed by the same group of individuals, in ref. [12], a distance measure is proposed with a four-step paradigm and adaptation of the probability density function to extract entities from a collection of documents. It is then used to

Table 1. Overview of Data Mining applications in crime.

Technique: Entity Extraction
References: [8,10,12–20]
Key techniques or software: Named Entity Extraction (lexical lookup, rule-based, machine learning and hand-crafted rules), SPSS LexiQuest, Natural Language Processing, Investigative Interviewing Technique, Principles of the Cognitive Interview, General Architecture for Text Engineering System, Semantic Inferentialism, Part-of-Speech Tagging, LBJTatter Algorithm, Conditional Random Field
Specific tested regions: Phoenix [8], Taiwan [17], Arabic [18], Malaysian [19], (New Zealand, Australia and India) [20]
Purpose and function: Extract valuable information especially from unstructured text data (i.e., Person (also Name by Different Languages), Address, Location, Time, Vehicle, Nationality, Phone, Gender and Race, Crime Type, Personal Property, Organization, Narcotic Drug, Suspect Descriptions).

Technique: Cluster Analysis
References: [6,21–34]
Key techniques or software: GIS (Geographic Information System), Self-Organizing Map, Hierarchical Clustering Technique, Partitioning Clustering Technique, Concept Space Technique, Co-occurrence Analysis Algorithm, Link Analysis Technique (Shortest Path Algorithms, Priority-First-Search and Two Tree Priority-First-Search), K-Means Clustering, Heuristic Approach
Specific tested regions: United Kingdom [35], Tucson [26,36], Phoenix [37], Netherlands [27], Northamptonshire [28], Tucson [38], India [34,39], Italy [29], Nigeria [30]
Purpose and function: Detect crime hot spots; automatically identify associations from existing crime data and weight relationships to detect the strongest association among all possible pairs of crime related entities; overcome the challenges like ‘information overload, high search complexity and heavy reliance on domain knowledge’ [38].

Technique: Association Rule
References: [40–45]
Key techniques or software: Apriori Algorithm, Distributed Association Rule Mining (Survey in [46]), Transformed Categorical Similarities, Dynamically Adjusted Weights, Outlier Score Function, Distributed High Order Text Mining, Temporal Association Rule, Incremental Mining, Frequent Pattern Growth
Specific tested regions: Richmond (Virginia) [40,41], Hong Kong [43], USA [44]
Purpose and function: Link crime incidents, narrow down possible suspects, provide informative association among criminal entities or items, discover crime patterns.

Technique: Classification Techniques
References: [10,32,47–77]
Key techniques or software: Iterative Dichotomiser 3 Algorithm, C4.5 Algorithm, STAGE Algorithm, CART, Hunt’s Algorithm, Deceptive Theory
Specific tested regions: Greek [48], USA [50,51,54,56,74], Asian [62], Dutch [63], Brazil [66], Malaysia [76], San Francisco [77]
Purpose and function: Efficient detection of specific criminal activities among large-sized data sets; categorize crime data; predict crime hot spots.

Technique: Social Network Analysis
References: [78–97]
Key techniques or software: K-core, Core/periphery Ratio; Measure of Centrality, Closeness and Betweenness; Center Weights Algorithm; Borgatti’s Key Player Approach
Specific tested regions: Canada [81], United States [82], Richmond (Virginia, US) [96]
Purpose and function: Provide analyses of functions, structures and the interaction measurements, which, in the crime domain, detect and illustrate relationships and connections of criminal entities (including individuals, groups, events, organizations, etc.); identify key members and interaction patterns between sub-groups in criminal networks; depict the suspects and their connections to other individuals or artifacts (like phone numbers, bank accounts).

transform a high-dimensional vector table into input for a police-operable tool. By applying this proposed distance measure, the authors employed the SPSS LexiQuest text mining tool [103] to form a table of all the entities included in each investigation. Thereafter, the transformation stage compares the investigations on common entities and adapts a symmetrical distance measure and a probability density function with the normal distribution; finally, a two-dimensional representation of the distances between all possible pairs of investigations is obtained in order to help crime analysts and investigators attain a clear overall understanding of all ongoing investigations.

An Online Crime Reporting System was developed in ref. [104] to extract relevant crime information from witness narratives and also to generate additional questions based on the extracted information. The proposed system combined natural language processing and an investigative interviewing technique (based on the cognitive interview principles) with the adoption of the General Architecture for Text Engineering System as the information extraction tool. An evaluation of the suspect description module yielded an overall recall rate of 70% and a precision of 100%.

Ku et al. [13,14] recalled that the goal of applying information extraction techniques in crime analysis is to help investigators extract crime-related information quickly and effectively. They developed an online reporting system that is based on an information extraction technique and combined natural language processing with insights from the cognitive interview approach to obtain more information from witnesses and victims. According to ref. [13], this proposed system combines information extraction and principles of the cognitive interview with the purpose of efficiently gathering more valuable information from those victims and witnesses who are too scared or embarrassed to report crime incidents. As this system also encourages the use of natural language, it not only makes the process of reporting crime easier but also enables the gathering of more information. A large lexicon combined with a rule-based system is developed to extract crime-related entities, triggering the system to ask questions according to the principles of the cognitive interview. The proposed system shows high precision rates (94% for police narratives and 96% for witness narratives) and recall rates (85% for police narratives and 90% for witness narratives).

Pinheiro and Furtado [15] proposed a semantic inferentialism-based natural language processing model to extract crime information from unstructured text. In addition, this framework in particular provided for adaptation to collaborative environments on the Web. The evaluation of this framework was conducted on 100 crime-related texts on the Web, and a precision rate of 87% was achieved for extracting the crime scene alongside a 72% precision rate for extracting the type of crime.

In ref. [16], an expanded entity phrase is defined as the key component for extracting an entity. Effective performance is achieved by combining part-of-speech-based template matching and ontology-driven natural language processing. Preliminary results from implementing this approach on free-text law enforcement data outperformed the named-entity extraction technique, with precision and recall rates generally over 80%.

Moreover, Yang et al. [17] used the entity extraction technique combined with part-of-speech (POS) tagging for criminal information analysis and relationship visualization. They reported their method as an efficient and effective term-relationship mining technique, which showed great performance even with the most complex case of Chinese POS tagging during its implementation on criminal information data from Taiwan.

In ref. [18], a rule-based Arabic named entity recognition (NER) system is presented to identify and classify named entities in Arabic crime text. It comprises three modules in the pre-processing stage: sentence splitting, tokenization and part-of-speech tagging; grammatical rules, patterns and a gazetteer are also considered in the named entity identification stage. The proposed system achieved an overall 91% precision rate and 89% recall rate when tested on a corpus of Arabic crime documents from newspapers. Recently, Alkaff and Mohd [19] used named entity recognition with gazetteers and rule-based extraction for extracting information on nationalities from crime news in Malaysia.

In ref. [20], NER (the LBJTatter algorithm was chosen because it performed best among several algorithms) was used with a conditional random field (a machine-learning approach to classify a crime location sentence in an article) to extract crime information found in online news articles. Comparisons of two newspapers in New Zealand and of crime location extraction across countries (Australia and India) were conducted, with an overall 80–90% accuracy for tests on New Zealand newspapers and generally about 75% accuracy for cross-country scenarios.

2.2. Cluster Analysis

Cluster analysis is a technique used to group observations where observations within each group are similar, and the groups are different from each other. Cluster analysis was combined with the geographic information system (GIS) for detecting crime hotspots in [25], where the authors evaluated the performances of different clustering techniques and stated the concept of cluster significance
as the standard to define a spot as ‘hot’ or not in crime analysis.

Adderly and Musgrove [35] combined clustering techniques and self-organizing maps to model the behavior of sex offenders in the UK. Their results indicated that crimes within a single cluster have strong similarities and may contain crimes that have been committed by the same offender.

The Coplink system is recognized as one of the successful implementations of the clustering technique for crime data mining [2,26,36,37]. According to Hauck et al. [26], Coplink applies the concept space as its underlying structure to identify relationships between suspects, victims and other relevant data in order to accelerate criminal investigations and enhance law enforcement efforts. To elaborate, the concept space algorithm is stated as a statistics-based algorithmic technique that can automatically compute the strength of relationships between each possible pair of concept descriptors identified in a document collection [26]. As most criminal-justice data are properly structured in existing reports, the sources from which a concept space can be derived are first located. All the terms in this concept space are then filtered and indexed by an appropriate co-occurrence analysis algorithm. In ref. [26], terms are categorized into five main types: Person, Organization, Location, Crime and Vehicle. According to feedback from crime analysts and detectives, Coplink can provide valuable leads by discovering links between different types of terms and can provide the crucial advantage of operating efficiency in the investigating process.

In ref. [105], a link analysis technique is proposed to identify the strongest association paths between entities in a criminal network. In comparison to the existing link analysis tools used in crime analysis (for example, Anacapa charting system [106], Netmap [107], Analyst’s Notebook [108], Watson [109] and Coplink Detect [26]), the authors employed shortest-path algorithms and logarithmic transformation of the link weights to identify the shortest path; therefore, the strongest associations can be detected among the crime-related entities.

De Bruin et al. [27] describe a tool that extracts important factors that play a role in the analysis of criminal careers from databases and creates digital profiles for all offenders. Their method compares all individuals on these profiles using a new distance measure and clusters them accordingly, which, in turn, yields a visual clustering of these criminal careers and enables the identification of classes of criminals [27]. The clustering technique was employed again in [12] in an investigation process to identify the offenses that were committed by the same group of criminals.

K-means clustering was used in [28] to assess the performance of crime scene investigators. By modeling their performance, a clear and objective rationale is provided for further development as a viable path for improving crime investigation. Schroeder et al. [38] proposed the CrimeLink Explorer system, which combined several techniques, including co-occurrence analysis, a shortest-path algorithm and a heuristic approach, to provide a better link analysis to assist crime investigations. In terms of enhancing investigation effectiveness for Indian police, Gupta et al. [39] proposed a crime analysis tool with an interactive query-based interface to help the police in crime investigations. Furthermore, the clustering technique is adopted to find crime hotspots based on the National Crime Record Bureau database in India.

More recently, in ref. [29], cluster analysis was used to identify the effect of various economic factors on crime in Italy, while in ref. [30], a GIS is complemented with multivariate cluster analysis for assessing property crime in Nigeria. Self-organizing map and multilayer perceptron neural networks are employed with the clustering technique in ref. [31] for crime analysis and matching. Sukanya et al. [32] applied the clustering technique with GIS to identify criminals and crime hotspots by analyzing the spatial pattern. K-means clustering was used in combination with the RapidMiner tool in ref. [33] for crime analysis of offences recorded in England and Wales. A spatial-temporal analysis combining GIS and the clustering technique was adopted in ref. [34] for day-to-day crime forecasting in India.

2.3. Association Rule Mining

Association rule mining is a method that exploits the relationships among observations for uncovering crucial information hidden within Big Data. In ref. [21], the basic and advanced concepts and algorithms of association analysis are provided with detailed calculation and examples. In ref. [110], an association rule mining algorithm is proposed by using the sales data of customer transactions obtained from a large retailing company. Association rule mining is designed for applications where observations consist of transactions, and a subset of the available items appears in each transaction. Later, in ref. [111], two new algorithms are presented and compared with the known algorithms, and significant improvements are recorded by both new algorithms and their combined AprioriHybrid algorithm.

In ref. [40], association methods are employed for applications in law enforcement. The proposed approach automatically looks for similarities by using a new total similarity measure with information theoretic-based weights among attributes of robbery records data from the Richmond Police Department in order to associate incidents possibly committed by the same criminal or group of criminals. Thereafter, Lin and Brown presented a new association
method that combined an outlier score function and tested it on the same robbery data from the Richmond Police Department in ref. [41]. The new outlier-based method outperforms the similarity-based method, with promising results in providing more helpful information for police officers.

A distributed higher-order association rule mining algorithm is proposed by Li et al. [112] with example law enforcement data, which overcomes the requirements of knowledge of a global schema and the distribution of data being horizontal or vertical. In ref. [42], association rule mining is used for detecting suspicious e-mails by identifying unusual and deceptive communication in e-mails with the implementation of an Apriori algorithm. This proposed technique was designed to assist the investigators in efficiently obtaining information and taking effective actions to reduce or prevent criminal activities. Ng et al. [43] introduced an incremental algorithm for maintaining temporal association rules with numerical attributes by using the negative border method. Their new algorithm has been implemented to support the discoveries of crime patterns in a district of Hong Kong, and the results showed significant improvements.

Fuzzy association rule mining was introduced in ref. [44] as a novel means for knowledge discovery in the crime domain, supported by experimental results on open-source communities and crime datasets [113] in the USA. The most significant result is that the discovered crime patterns are consistent across all regions, subsets of regions and all states, which demonstrates the capability and potential benefit of implementing this proposed technique for crime analysis and police force investigations. A review of more applications of association rule mining for crime detection up until 2014 can be found in ref. [45] and is therefore not reproduced here.

2.4. Classification Techniques

As one of the most fundamental and significant data mining techniques, classification is used for classifying observations based on some significant rules/attributes that are discovered from the database. In terms of applying classification techniques in crime data mining, many implementations combined more than one specific type of classification technique. Therefore, the review that follows is organized by technique in chronological order, and depending on circumstances, those complex combination cases are not reproduced.

2.4.1. Decision Trees

The decision tree classification technique is used in ref. [47] for detecting suspicious e-mails and reported over 95% accuracy in correctly classifying e-mails in a large-sized dataset. The model of deceptive theory is applied to the dataset of e-mails, and the decision tree is generated via the ID3 algorithm. Kirkos et al. [48] explored and compared the performances of different classification techniques for auditors in detecting firms that issue fraudulent financial statements. Three models (decision trees, neural networks and Bayesian belief networks) are evaluated by dealing with the identification of factors associated with fraudulent financial statements on datasets from 76 Greek manufacturing firms. In ref. [49], decision trees, neural networks, support vector machines and stochastic boosting are utilized for hotspot detection in an urban development project dataset, which contains 1.4 million cases, 14 predictors and a binary response variable. Decision trees are adopted for predicting crime reporting in ref. [50] by identifying the variables that influence whether a crime is reported from survey data obtained via the Bureau of Justice Statistics of the USA, while in ref. [51], classification techniques like decision trees, neural networks, naive Bayes and support vector machines are applied and compared for predicting crime hotspots and crime forecasting. In ref. [52], an attribute reduction strategy is combined with the decision trees algorithm for analyzing criminal behavior. Naive Bayesian, rule-based classification and C4.5 decision trees are exploited in ref. [53] for detecting auto insurance fraud. The decision trees technique is particularly compared with other data mining techniques for automotive insurance fraud detection in ref. [54]. In ref. [55], a decision tree-based classification model is used for discovering crime patterns and predicting future trends. Recently, in ref. [56], classification algorithms are evaluated for crime prediction, where the authors find decision trees outperforming a naive Bayesian algorithm in accurately predicting the crime category for different states in the USA. An improved decision tree algorithm based on the Maclaurin-Priority Value First method is used in ref. [57] for computer crime forensics, and this improved algorithm outperformed the commonly used ID3 algorithm in terms of both efficiency and accuracy. More recently, a variety of classification techniques were evaluated for detecting insurance fraud in ref. [58], where an algorithm for determining the relationship between classification models and different types of insurance fraud data was proposed.

2.4.2. Neural Networks (NN)

Artificial neural networks (ANNs), decision trees and logistic regression are used in ref. [59] for uncovering lies from 371 statements of different types of crimes. In ref. [60], logistic regression and ANNs are applied to identify smuggling vessels, and the findings indicate that the ANN reports a higher accuracy than logistic regression.
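
As an illustration of the performance measures quoted throughout this section (accuracy, precision and recall), the following minimal Python sketch computes them from lists of actual and predicted labels. The labels are a hypothetical toy example, not data from any of the reviewed studies, and the function names (`confusion_counts`, `evaluate`) are introduced here for illustration only.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one class label."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # correctly flagged
        elif p == positive:
            fp += 1          # flagged, but actually negative
        elif a == positive:
            fn += 1          # missed positive case
        else:
            tn += 1          # correctly left unflagged
    return tp, fp, fn, tn


def evaluate(actual, predicted, positive):
    """Return (accuracy, precision, recall) as fractions in [0, 1]."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical toy labels (assumption: not taken from any reviewed study).
actual = ["fraud", "ok", "fraud", "ok", "ok", "fraud", "ok", "ok"]
predicted = ["fraud", "ok", "ok", "ok", "fraud", "fraud", "ok", "ok"]

accuracy, precision, recall = evaluate(actual, predicted, positive="fraud")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy sample the classifier is correct in six of eight cases (accuracy 0.75), while precision and recall for the "fraud" class are both 2/3, which suggests why precision and recall are usually reported alongside accuracy when the class of interest is rare.
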
Multilayer perceptron (MLP), decision trees, support vector machines, genetic programming (GP) and probabilistic neural networks were compared in ref. [61] for detecting phishing e-mails from a dataset of 2500 e-mails, whilst in ref. [62], decision trees, ANNs and support vector machines were compared for discriminating between those charged and not charged for initial juvenile offending. Logistic regression, boosting, SVM, neural networks, K-nearest neighbor and a variety of other data mining techniques are evaluated for predicting recidivism in ref. [63].

2.4.3. Support Vector Machines (SVM)
The SVM classification has been used to identify the sources of e-mail spamming based on the sender’s linguistic patterns and structural features [10,64]. An SVM model is successfully used in [64] to aid author identification forensics via mining e-mail content. SVM is used for crime hotspot prediction in ref. [65], where they find it outperforming neural networks and spatial auto-regression-based approaches. In ref. [66], SVM is exploited for crime scene classification using a database comprising 400 crime scenes. The authors find SVM-based approaches performing better than multilayer perceptron neural networks. Liu et al. [67] used SVM to help with identifying digital evidence relating to computer crimes via outlier detection. In ref. [68], SVM was used to help crime investigators produce a set of possible interpretations for domain-relevant concepts through kernel-based relation extraction. Wang et al. [69] used the SVM technique for predicting criminal recidivism and compared their results with those obtained from logistic regression and neural networks. Salem et al. [70] evaluated the performance of SVM for detecting identity theft using the Schonlau dataset, and whilst they found one-class SVM models to be practical in accurately locating identity theft in the general case, the results were not satisfactory where specific user profiles fit the average user profile. Advanced fee fraud activities are detected using both SVM and random forests (RF) in ref. [71], where they find SVM outperforming RF.² Moreover, an SVM model is used alongside logistic regression and RF for detecting credit card fraud in [72], where the performances are evaluated on real world transaction data from international financial institutions. The SVM technique, combined with the AdaBoost algorithm, is used in ref. [73] for cyber-crime detection and prevention based on a Facebook dataset. Alwee et al. [74] exploited a hybrid support vector regression (SVR) model in combination with ARIMA and particle swarm optimization for forecasting property crime rates in USA and found it outperforming forecasts from the individual models, while later in [75], SVR was employed for forecasting property crime rates along with grey relational analysis. The behavior of criminals is analyzed using SVM in ref. [76] for Malaysia, where the ratio of police to population is extremely low at 3.6 to 1000. More recently, it has been shown in ref. [77] that an SVM model can be used to provide live predictions of crime in urban areas based on Twitter data related to an urban subarea from within the city of San Francisco.

² It is noteworthy that RF have been successfully employed in ref. [114] in describing the precursors of homicide by accounting for the relative costs of forecasting errors. The RF algorithms are also explained in detail in Appendix A of Berk et al. [114].

2.5. Social Network Analysis (SNA)
Social network analysis (SNA) is a method built on the analysis of social structure of observations for identifying significant information. Sparrow [78] summarized existing concepts of network analysis in the applications of the crime analysis domain with detailed explanations and comparisons. Wang and Chiu [79] applied SNA and identified two transactional network structure measurements, k-core and core/periphery ratio, to detect the online auction inflated-reputation traders from regular accounts. Significant results proved that SNA can act as an effective indicator to distinguish criminal accounts and possibly prevent and reduce problematic transactions and online auction frauds. Qin et al. [80] proposed an analysis of the structure of the Global Salafi Jihad network with SNA and Web structural mining. Results showed that the proposed technique can be an effective tool to identify key members in a terrorist network and, therefore, help authorities develop efficient and effective disruptive strategies and measures.

In ref. [81], the SNA technique was used to investigate organized criminal groups by applying it to an outlaw motorcycle gang operating in Canada. Ressler [82] introduced SNA as a new type of intelligence for homeland security in USA, which can provide important information on the unique characteristics of terrorist organizations, understand terrorist networks and form the basis for a more effective countermeasure to net war. In ref. [83], SNA was employed on the Enron corporation’s e-mail archive, and it proved to be able to analyze and catalog patterns of communications between entities in an e-mail collection to extract social standing. The authors stated that this technique makes it possible to view a snapshot of a corporate community and effectively determine the real relationships and connections between individuals.

Chau and Xu [84] applied the web mining and network analysis technique to study hate groups in online blogs. Their proposed approach successfully identified and analyzed a selected set of hate groups (which contained 820 bloggers) on Xanga. Nair and Sarasamma [85] performed SNA alongside fuzzy theory with the aim of modeling

multi-modal social networks. A new fuzzy binary operation was proposed to satisfy the requirements of a fuzzy consolidation operator. Liu et al. [86] proposed an enhanced shortest-path algorithm, combining SNA to mine the core member of a terrorist group. Wang and Chiu [77] used two SNA indicators, k-core and center weights algorithms, to form a recommendation system that can suggest the risks of collusion associated with an account for online auction site users. The results are promising, with 76% detection accuracy on real world ‘blacklist’ data and sound evidence that the approach can provide effective warnings several months ahead of the official release of blacklists.

In ref. [88] the authors discussed the solution for the case of privacy-preserving SNA in collaboration with multiple investigators. The proposed algorithm, compute metrics, can operate the computation and analysis without knowing the personally identifiable data with high privacy guarantees. Chau and Xu [89] again addressed the importance of studying the emergence of cyber communities in blogs with a combination of web mining and SNA techniques. In ref. [90], the authors designed a simulation e-mail system based on personality trait dimensions to model the traffic behavior of e-mail account users and employed this with SNA to identify key members of a criminal group. Chen [91] shows the applicability of SNA for homeland security data mining with case studies based on gang/narcotic networks, US extremist networks, Al-Qaeda member networks and international Jihadist websites and forum networks.

Fard and Ester [92] proposed a framework for automated network data analysis and deduction from multiple social networks by converting a transaction dataset and applying association mining and statistical methods. The proposed method varies from previous work and combined the game theory concept in a multi-agent model with the purpose of building P2P applications for the police force to detect relationships between criminals and narrow down possible suspects. In ref. [93], the authors proposed an SNA-based model for targeting criminal networks. It is built on Borgatti’s key player approach with modifications on incorporating the relative strength of actors as well as the strength of the relationships binding network actors. Chiu et al. [94] employed SNA for identifying internet auction fraud. They performed experiments on data from the Yahoo Auctions website with comparisons of different types of internet auction accounts and achieved promising results pertaining to prediction accuracy.

In ref. [95], Carrington classified the applications of SNA in the crime domain into three areas (the influence of the personal network on ego’s delinquency or crime, the influence of neighborhood networks on crime in the neighborhood and the organization of criminal groups and activities) and provided detailed explanations together with significant theories and literature. The SNA technique was used as a tool for helping crime analysts and detectives in police forces develop interdiction strategies in ref. [96], with an example case study focusing on the Richmond City Police Department. Detailed examples of implementations are provided to show how the proposed approach can assist police in understanding complex behavioral motivations of offenders, strategically hot-spotting people of interest and developing stronger inter-jurisdictional working relationships [96]. More recently, in ref. [97], SNA was used to detect and analyze hidden activities in social networks. Both regression-based and Min Cut-based algorithms are adopted and compared with datasets from several sources.

3. CONCLUSION

This review paper begins with a brief introduction to the evolution of crime and the importance of data mining for crime detection, analysis and prevention. Given the vast amount of funding that is allocated for defense-related expenditure around the globe, it is clear that considerable savings could be attained through the correct application of data mining techniques, which are primarily aimed at uncovering hidden relationships in Big Data. Following thorough research, we are able to present a list of data mining techniques as the most frequently adopted at present for crime analysis. These include entity extraction, clustering, association rule mining and classification techniques like decision trees, support vector machines, naive Bayes rule, neural networks and social network analysis.

As opposed to providing a review of applications alone, this paper goes a step further and provides a concise introduction to these data mining techniques (see Appendix) as well as a brief summary of the specific functions of each technique in crime analysis and thereby enables readers from different backgrounds to obtain a much richer understanding of the underlying process whilst guiding them to relevant articles for a more detailed description. Table 1 particularly functions as a useful resource or ‘quick guide’, which summarizes the data mining applications in crime, providing useful information not only on the software and purpose but also the regions that have been tested.

This review itself is unique as it captures over 100 applications and is the most up-to-date and thorough review of data mining applications in crime to date. We find strong evidence based on the number of applications to conclude that classification techniques are the most popular form of data mining in crime. This is interesting as there exists a major difference between the findings from data mining in official statistics [5] and our findings here. We notice that SVM, neural networks and association rule mining are seldom used to mine official statistics,

whereas in crime data mining, these methods are extremely popular and well exploited. This further highlights the institutional and cultural differences that exist between national statistical institutes and organizations engaged in analyzing crime.

The increasing applications of data mining techniques alongside the emergence of Big Data also point toward the need for more training and investment in educating and empowering youth with knowledge on the advantages, developments and practical use of data mining techniques. In countries such as the UK, it is encouraging to see universities now offering not only units but also courses on data mining, which will most certainly have a positive impact on future generations as well as lead towards an exponential increase in data mining applications. We believe that this review paper can help researchers around the globe easily identify existing research so as to develop more complex and improved techniques for mining crime data in the future.

APPENDIX

A. DATA MINING TECHNIQUES IN CRIME

This appendix aims to give the reader an introduction to the data mining techniques that have been used for crime analysis.

A.1 Entity Extraction

Entity extraction is a process that depends on the availability of extensive amounts of clean input data for the identification of particular patterns such as text, images or audio materials that can then provide basic information for crime analysis [2,10]. The Message Understanding Conference (MUC) series, which coordinates multiple research groups and government agencies [115], has been the major forum for researchers in this area, where they meet and compare the performance of their entity extraction approaches [116] and seek to improve the technologies [115]. As concluded in ref. [8], the main approaches for entity extraction are lexical-lookup, rule-based, statistic-based and machine learning. Below, we provide some basic introductions to the major approaches, and in doing so, we mainly follow ref. [8].

Lexical lookup Most entity extraction systems maintain hand-crafted lexicons that contain lists of popular names for entities of interest. These systems work by looking up phrases in text that match the items specified in their lexicons. See, for example, [98].

are neural networks, decision trees [101], hidden Markov models [102] and entropy maximization [98].

A.2 Cluster Analysis

According to ref. [21], cluster analysis groups data objects based only on information found in the data that describes the objects and their relationships. The key objective of this process is to find groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups [21]. The fundamental concept of clustering is the distance measurement among objects. Euclidean distance, Manhattan distance and Minkowski distance are the most widely adopted distance measures for cluster analysis. In order to identify different types of structures or clusters in the data, different algorithms have been developed to date, which can be mainly classified into hierarchical methods and partitioning methods based on how the subsets are organized [24]. Other methods, except the two major approaches, include the density-based method, grid-based method, model-based method and constraint-based method. For the sake of briefly introducing the process of cluster analysis without covering all available algorithms to date, we introduce one specific algorithm of cluster analysis below by following ref. [22]. Note that alternatives exist for each step due to the significant developments in cluster analysis. More details can be found in refs. [21–24].

Given a set of points $S = \{s_1, \ldots, s_n\}$ in $\mathbb{R}^l$, which we wish to cluster into $k$ subsets, the first stage will be the ordination of the data (includes steps 1–4) followed by the second stage, which focuses on clustering (includes steps 5–6):

1. Form the affinity matrix $A \in \mathbb{R}^{n \times n}$ defined by $A_{ij} = \exp(-\|s_i - s_j\|^2 / 2\sigma^2)$ if $i \neq j$, and $A_{ii} = 0$. Note that $\|s_i - s_j\|$ refers to the normalized distance between $s_i$ and $s_j$.

2. Define $D$ as the diagonal matrix whose $(i, i)$-element is the sum of $A$’s $i$-th row, and construct the matrix $L = D^{-1/2} A D^{-1/2}$.

3. Find $x_1, x_2, \ldots, x_k$, the $k$ eigenvectors of $L$ with the largest eigenvalues (chosen to be orthogonal to each other in the case of repeated eigenvalues) and form the matrix $X = [x_1 \, x_2 \, \ldots \, x_k] \in \mathbb{R}^{n \times k}$ by stacking the eigenvectors in columns.

4. Form the matrix $Y$ from $X$ by re-normalizing each of $X$’s rows to have unit length (i.e., $Y_{ij} = X_{ij} / (\sum_j X_{ij}^2)^{1/2}$).

5. Treating each row of $Y$ as a point in $\mathbb{R}^k$, cluster them into $k$ clusters via K-means or any other algorithm (that attempts to minimize distortion).

6. Finally, assign the original point $s_i$ to cluster $j$ if and only if row $i$ of the matrix $Y$ was assigned to cluster $j$.
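
The six steps above can be traced in a short, self-contained Python sketch. It is only an illustration under stated assumptions, not the implementation of ref. [22]: the function names are hypothetical, the leading eigenvectors in step 3 are approximated with a plain orthogonal iteration on $L + I$ instead of a library eigensolver, K-means in step 5 uses a deterministic farthest-first initialization, and $\sigma$ is assumed large enough that no row of the affinity matrix sums to zero.

```python
import math
import random


def top_eigenvectors(L, k, iters=200, seed=0):
    """Approximate the k eigenvectors of symmetric L with the largest eigenvalues.

    Assumption: orthogonal iteration on L + I stands in for a library
    eigensolver (the shift keeps the wanted eigenvalues dominant).
    """
    n = len(L)
    rng = random.Random(seed)
    cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # Multiply every column by (L + I).
        cols = [[v[i] + sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
                for v in cols]
        # Gram-Schmidt keeps the columns orthonormal.
        for a in range(k):
            for b in range(a):
                dot = sum(p * q for p, q in zip(cols[a], cols[b]))
                cols[a] = [p - dot * q for p, q in zip(cols[a], cols[b])]
            norm = math.sqrt(sum(p * p for p in cols[a]))
            cols[a] = [p / norm for p in cols[a]]
    return cols


def kmeans(rows, k, iters=50):
    """Lloyd's algorithm with farthest-first initial centers (an assumed choice)."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(math.dist(r, c) for c in centers)))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def spectral_clustering(points, k, sigma=1.0):
    n = len(points)
    # Step 1: affinity matrix with zero diagonal.
    A = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    # Step 2: L = D^{-1/2} A D^{-1/2}, with D the diagonal matrix of row sums.
    d = [sum(row) for row in A]
    L = [[A[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    # Step 3: the k leading eigenvectors are the columns of X.
    cols = top_eigenvectors(L, k)
    # Step 4: rescale each row of X to unit length to obtain Y.
    Y = []
    for i in range(n):
        row = [cols[c][i] for c in range(k)]
        norm = math.sqrt(sum(v * v for v in row))
        Y.append([v / norm for v in row])
    # Steps 5-6: cluster the rows of Y; point i inherits the label of row i.
    return kmeans(Y, k)
```

Calling `spectral_clustering(points, 2)` on a list of coordinate tuples returns one cluster label per point, in input order. The label values themselves are arbitrary; only the grouping is meaningful.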
Rule-based The rule-based system relies on hand-crafted rules to identify named entities. The rules may be structural, contextual or lexical [99].

Statistic-based Statistical-based systems often use statistical models to identify occurrences of certain cues of particular patterns for entities in texts. A training dataset is needed for these systems to obtain the statistics. The statistical language model in ref. [100] is an example of statistical-based entity extraction systems.

Machine learning Instead of human-created rules, the machine learning system relies on what is termed as machine learning algorithms to extract knowledge and identify patterns from texts. Few examples

A.3 Association Rule Mining

Association rule mining was initially proposed by Agrawal et al. [110] as a method of discovering interesting co-occurrences in supermarket data. The association rule is defined as a technique for investigating the possibility of the simultaneous occurrence of data [117]. Yun et al. [117] suggested an association rule discovery technique that is able to identify significant, rare data associated with specific data in a way that the rare data occurs simultaneously with the specific data more frequently than the average co-occurrence frequency in the database. We follow ref. [111] and


briefly introduce the association rule mining technique below. For a more detailed explanation, see ref. [110].

Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of literals, called items. Let $D$ be a set of transactions, where each transaction $T$ is a set of items such that $T \subseteq I$. Associated with each transaction is a unique identifier, called TID. We say that a transaction $T$ contains $X$, a set of some items in $I$, if $X \subseteq T$. An association rule is an implication of the form $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$ and $X \cap Y = \emptyset$.

More specifically, in ref. [21], the strength of an association rule can be measured in terms of its support and confidence, in which support determines how often a rule is applicable to a given dataset, while confidence determines how frequently items in $Y$ appear in transactions that contain $X$. The formal definitions of these measures are as follows, in which $N$ refers to the number of transactions and $\sigma(\cdot)$ denotes the number of transactions that contain the given itemset:

$$\text{Support}, \; s(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{N}. \tag{A1}$$

$$\text{Confidence}, \; c(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}. \tag{A2}$$

Therefore, it can be concluded that support indicates that the rule $X \Rightarrow Y$ holds in the transaction set $D$ with support $s$ if $s$% of transactions in $D$ contain $X \cup Y$; and confidence indicates that the rule $X \Rightarrow Y$ holds in the transaction set $D$ with confidence $c$ if $c$% of transactions in $D$ that contain $X$ also contain $Y$. Given a set of transactions $D$, the problem of association rule mining is to generate all association rules in the database that have support and confidence greater than the user-specified minimum support (called minsup) and minimum confidence (called minconf), respectively [111]. There are many algorithms for association rule mining, among which, the most well-known and widely-used Apriori algorithm was proposed in refs. [110,111] as a seminal algorithm for identifying support using candidate generation. Wu et al. [118] stated that the Apriori algorithm is one of the most popular data mining approaches to find support from a transaction dataset and derive association rules. Other algorithms, like direct hashing and pruning (DHP), FP-growth and H-mine, can be found with more details in [119–122].

A.4 Classification Techniques

As one of the most fundamental and significant data mining techniques, classification is briefly defined in ref. [21] as the task of assigning objects to one of several predefined categories. The aim of classification rule mining is to discover a small set of rules in the database to form an accurate classifier [123]. The formal description of the algorithm underlying the classification rule is listed below by mainly following ref. [21].

The input data for a classification task can be regarded as a collection of records that contains the instances that are characterized by a tuple $(X, y)$ (note that $X$ is the attribute set, $y$ is a special attribute, also known as category). Therefore, classification can be explained as the task of learning a target function $f$ that maps each attribute set $X$ to one of the predefined class labels $y$ (see Figure A1). In order to evaluate the performance of a classification model, the counts of correct and incorrect predicted test records are summarized in a table named a confusion matrix. Then, the performance metric, which summarizes the information in the confusion matrix into a single number, provides an easier way for comparing the performance of different classification models [21].

Decision Trees

Decision trees ([124–126]) are applied to accomplish the classification task by giving a series of carefully crafted questions about the attributes of the test record [21]. When an answer is achieved, it will be followed by another question until the category of this attribute is concluded. All the series of questions and possible answers are carefully predefined as well as the process repeated to all subsets of the tree. The classification task is established at the same time a decision tree is constructed. Note that the training records are split based on an attribute test that optimizes a certain criterion; additionally, it is possible that many different decision trees can be constructed from the same given set of cases. The final choice depends on the research and the individual circumstances. Many algorithms have been developed for getting the most reasonably optimal decision tree with good accuracy in a timely manner. For example, CART [124], C4.5 [125], ID3 [126], Hunt’s Algorithm [127], SLIQ [128] and SPRINT [129].

Among all the effective algorithms, Hunt’s algorithm is the basis of many existing decision tree-induction algorithms, by which the decision tree is constructed via recursively partitioning until each path ends in a pure subset [21]. The process of Hunt’s algorithm is listed below, mainly following [21]. In addition, one example problem of predicting whether a loan applicant will succeed in repaying her loan obligations or become delinquent is adopted for applying Hunt’s algorithm and inducing decision trees in Figure A2. Let $D_t$ be the set of training records that are associated with node $t$, and $y = \{y_1, y_2, \ldots, y_c\}$ be the class labels.

Step One If all the records in $D_t$ belong to the same class $y_t$, then $t$ is a leaf node (which refers to a node that has exactly one incoming edge and no outgoing edges), and it is labeled as $y_t$.

Step Two If $D_t$ contains records that belong to more than one class, an attribute test condition is selected to recursively partition the records into smaller subsets. A child node is created for each outcome of the test condition, and the records in $D_t$ are distributed

Fig. A1 Classification as the task of mapping an input attribute set X into its class label y [21].

Fig. A2 Example of Hunt’s algorithm for inducing decision trees [21].
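
As a rough companion to Fig. A2, the recursive partitioning behind Hunt's algorithm (a pure node becomes a leaf, otherwise a test condition splits the records and the procedure recurses on every child) can be sketched in Python. This is a minimal illustration and not the procedure of ref. [21] verbatim: the function names are hypothetical, the test condition is chosen by lowest weighted Gini impurity because the text does not fix a selection criterion, and a node with no attributes left falls back to the majority class.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure subset)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def hunt(records, labels, attributes):
    """Grow a tree: a leaf is a class label, an internal node is
    (attribute, {attribute value: subtree})."""
    # Pure node: every record carries the same class, so it becomes a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # Assumed stopping rule: no attribute left, label with the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    def split(attr):
        groups = {}
        for record, label in zip(records, labels):
            recs, labs = groups.setdefault(record[attr], ([], []))
            recs.append(record)
            labs.append(label)
        return groups

    def weighted_gini(attr):
        return sum(len(labs) / len(labels) * gini(labs)
                   for _, labs in split(attr).values())

    # Mixed node: pick a test condition (assumed criterion), one child per outcome.
    best = min(attributes, key=weighted_gini)
    remaining = [a for a in attributes if a != best]
    return (best, {value: hunt(recs, labs, remaining)
                   for value, (recs, labs) in split(best).items()})


def predict(tree, record):
    """Follow the test conditions from the root down to a leaf label."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[record[attr]]
    return tree
```

Records are plain dictionaries of categorical attributes, for example `{"home_owner": "no", "married": "yes"}` with labels such as `"repaid"` or `"defaulted"` (made-up names in the spirit of the loan example). `predict` raises `KeyError` for an attribute value that never appeared during training, which a fuller implementation would need to handle.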


to the children based on the outcomes. The algorithm is then recursively applied to each child node.

Support Vector Machines

Support vector machines (SVM) were initially introduced in ref. [130] as a training algorithm for classification, which divides the objects into two classes by using an optimal separating hyperplane that minimizes the classification error (see Figure A3). We briefly present the theory underlying SVM below by following [130].

Consider the simplest problem of separating a set of training vectors $D$ belonging to two separate classes:

$$D = \{(x^1, y^1), \ldots, (x^l, y^l)\}, \quad x \in \mathbb{R}^n, \; y \in \{-1, 1\}. \tag{A3}$$

The hyperplane that geometrically separates the surfaces of two classes is defined as $\langle w, x \rangle + b = 0$, where $w$ is the weight vector and $b$ refers to the bias. Additionally, the parameters are constrained by $\min_i |\langle w, x^i \rangle + b| = 1$ [130]. A separating hyperplane in such canonical form must satisfy the following constraint: $y^i [\langle w, x^i \rangle + b] \geq 1$, where $i = 1, \ldots, l$. The set of vectors is said to be optimally separated by the hyperplane if it is separated without error, and the distance $d$ between the closest vectors to the hyperplane is maximal, which can be computed by

$$d(w, b, x^i) = \frac{|\langle w, x^i \rangle + b|}{\|w\|}. \tag{A4}$$

The optimal separating hyperplane is attainable via maximizing $\rho$ (the margin) subject to the constraints in the canonical form of the hyperplane

$$\rho(w, b) = \min_{x^i : y^i = -1} d(w, b, x^i) + \min_{x^i : y^i = 1} d(w, b, x^i) = \frac{2}{\|w\|}, \tag{A5}$$

and minimizes

$$\phi(w) = \frac{1}{2} \|w\|^2. \tag{A6}$$

assuming the attributes are conditionally independent given the label [132]. In general, it is simple and easy to understand; furthermore, it is also convenient for implementation. This classifier is recognized as one of the most efficient and effective inductive learning algorithms for machine learning and data mining [133]. We follow ref. [133] to present a detailed algorithm of the naive Bayes rule.

Assume an example $E$ is represented by a set of attribute values $(x_1, x_2, \ldots, x_n)$, where $x_i$ is the value of attribute $X_i$. Let $C$ represent the classification variable, and let $c$ be the value of $C$. Note that here, we assume that there are only two classes: $+$ (positive class) or $-$ (negative class).

According to Bayes Rule, the probability of an example $E$ being class $c$ is:

$$p(c \mid E) = \frac{p(E \mid c)\, p(c)}{p(E)}, \tag{A7}$$

where $E$ is classified as the class $C = +$ if and only if

$$f_b(E) = \frac{p(C = + \mid E)}{p(C = - \mid E)} \geq 1, \tag{A8}$$

where $f_b(E)$ is called a Bayesian classifier.

Assume that all attributes are independent given the value of the class variable, that is:

$$p(E \mid c) = p(x_1, x_2, \ldots, x_n \mid c) = \prod_{i=1}^{n} p(x_i \mid c). \tag{A9}$$

Then the resulting classifier is:

$$f_{nb}(E) = \frac{p(C = +)}{p(C = -)} \prod_{i=1}^{n} \frac{p(x_i \mid C = +)}{p(x_i \mid C = -)}. \tag{A10}$$

The function $f_{nb}(E)$ is called a naive Bayesian classifier (or Naive Bayes). Therefore, the $p(x_1 \mid c), p(x_2 \mid c), \ldots, p(x_n \mid c)$ can be estimated from a training sample, and the posterior probability for each class can be calculated respectively. The class with the highest posterior probability will be the prediction class.
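
To make the estimation step concrete, the ratio in (A10) can be computed from simple counts. The sketch below is illustrative only and rests on assumptions that are not in the text: the function names are hypothetical, attributes are categorical, and the probabilities $p(x_i \mid c)$ are estimated with add-one (Laplace) smoothing so that an unseen attribute value does not force the ratio to zero.

```python
from collections import Counter


def train_naive_bayes(examples, labels, alpha=1.0):
    """Estimate p(c) and p(x_i | c) from a training sample.

    alpha is an assumed Laplace smoothing term; alpha=0 gives raw frequencies.
    """
    class_counts = Counter(labels)
    n_attrs = len(examples[0])
    # value_counts[c][i][v]: class-c examples whose attribute i equals v.
    value_counts = {c: [Counter() for _ in range(n_attrs)] for c in class_counts}
    values = [set() for _ in range(n_attrs)]
    for x, c in zip(examples, labels):
        for i, v in enumerate(x):
            value_counts[c][i][v] += 1
            values[i].add(v)
    prior = {c: class_counts[c] / len(labels) for c in class_counts}

    def likelihood(i, v, c):
        return ((value_counts[c][i][v] + alpha)
                / (class_counts[c] + alpha * len(values[i])))

    return prior, likelihood


def f_nb(x, prior, likelihood, pos="+", neg="-"):
    """Ratio from (A10): the example is assigned to the positive class
    when the returned value is at least 1."""
    ratio = prior[pos] / prior[neg]
    for i, v in enumerate(x):
        ratio *= likelihood(i, v, pos) / likelihood(i, v, neg)
    return ratio
```

With a toy sample of two categorical attributes per example and labels `"+"` and `"-"`, `prior, likelihood = train_naive_bayes(examples, labels)` followed by `f_nb(x, prior, likelihood) >= 1` reproduces the decision rule in (A8) under the independence assumption of (A9).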

Naive Bayes Rule

The naive Bayes classifier was proposed in Langley et al. [131] and uses Bayes Rule to compute the probability of each class given the instance,

Fig. A3 Classification by Support Vector Machines [130].

Neural Networks

As one of the most important tools for classification, neural networks have proved their high tolerance to noisy data and the ability to classify untrained patterns with convincing evidence through recently published research. According to ref. [134], neural networks are able to estimate the posterior probabilities, which provides the basis for establishing classification rule and performing statistical analysis [134–138]. Effective and successful performances have been achieved by neural networks in many real-world classification tasks [139]. The algorithm of neural networks is introduced below by mainly following [135].

Consider a general $M$-group classification problem in which each object has an associated attribute vector $x$ of $d$ dimensions. Note that $\omega$ denotes the membership variable that takes a value of $\omega_j$ if an object belongs to group $j$. A neural network for a classification problem can be viewed as a mapping function

$$F : \mathbb{R}^d \to \mathbb{R}^M, \tag{A11}$$

where $d$-dimensional input $x$ is submitted to the network, and an $M$-vectored network output $y$ is obtained to make the classification decision. The network is built so that an overall error measure, such


as the mean squared error, is minimized. According to ref. [135], the mapping function $F : x \to y$ that minimizes the expected squared error $E[y - F(x)]^2$ is the conditional expectation of $y$ given $x$, $F(x) = E[y \mid x]$. Hence, the $j$th element of $F(x)$ is given by

$$\begin{aligned} F_j(x) &= E[y_j \mid x] \\ &= 1 \cdot P(y_j = 1 \mid x) + 0 \cdot P(y_j = 0 \mid x) \\ &= P(y_j = 1 \mid x) \\ &= P(\omega_j \mid x). \end{aligned} \tag{A12}$$

That is, the least squares estimate for the mapping function in a classification problem is exactly the posterior probability. The mean squared error function can be derived as

$$\mathrm{MSE} = \sum_{j=1}^{M} \int_{\mathbb{R}^d} [F_j(x) - P(\omega_j \mid x)]^2 f(x)\,dx + \sum_{j=1}^{M} \int_{\mathbb{R}^d} P(\omega_j \mid x)(1 - P(\omega_j \mid x)) f(x)\,dx. \tag{A13}$$

The first term of the right side, termed as the estimation error, is affected by the effectiveness of neural network mapping, while the second term is independent of neural networks and reflects the inherent irreducible error due to randomness of the data [135].

A.5 Social Network Analysis

Social network data is now a significant resource for criminal investigations. According to Mena [140], social network analysis (SNA) is a data mining technique that reveals the structure and content found in a body of information by representing the same as a set of interconnected, linked objects. The basic structure of a social network consists of nodes that are connected to other related nodes by links. The connection between two nodes is called an edge, which indicates that two persons participated in a certain event [141]. The measurement techniques used in social network analysis are based on the principles of graph theory, which consists of various mathematical formulae and concepts for the study of pattern of lines [141] (for more details see, [143–145]). Among all the measurement techniques, degree, density and centrality are the most significant and widely used techniques.

Degree The ‘degree’ of any node of the network is defined as the number of other nodes to which it is directly linked [78]. It simply represents the scores or count of the number of ties with other actors in the network.

Density Density is defined as the ratio of the number of edges in a portion of a social network to the maximum number of edges that theoretically make up the social network [104]. Assume $C$ is the number of observed edges and $n$ is the total number of nodes in the social network, then the formula of density is $\text{Density} = \frac{C}{n(n-1)/2}$.

Centrality According to ref. [146], centrality is the measure of closeness of a node to the center of high activity in a network, which indicates the structural importance level of the node. The centrality can be evaluated by degree, closeness and betweenness: Degree centrality refers to the direct count of the number of connections a node has to other nodes; Closeness centrality is considering the overall closeness of a node to all other nodes in the network [147]; Betweenness centrality measures the extent of a node’s placement on the shortest path between other pairs of nodes in the network [145].

REFERENCES

[1] P. Kanellis, E. Kiountouzis, N. Kolokotronis, and D. Martakos, Digital Crime and Forensic Science in Cyberspace, Hershey, PA, IGI Global, 2006.
[2] G. K. Gupta, Introduction to Data Mining with case studies, New Delhi, Prentice Hall, 2006.
[3] P. Thongtae and S. Srisuk, An analysis of data mining applications in crime domain, In IEEE 8th International Conference on Computer and Information Technology Workshops, Sydney, 2008, 8–11.
[4] H. Chen, W. Chung, Y. Qin, M. Chau, J. J. Xu, G. Wang, R. Zheng, and H. Atabakhsh, Crime data mining: an overview and case studies, In Proceedings of the 2003 Annual National Conference on Digital Government Research, Boston, MA, 2003, 1–5.
[5] H. Hassani, G. Saporta, and E. S. Silva, Data mining and official statistics: the past, the present and the future, Big Data 2 (1) (2014), 34–43.
[6] S. V. Nath, Crime pattern detection using data mining, In Proceedings of the International Conference on Web Intelligence and Intelligent Agent Technology Workshops, Hong Kong, 2006, 41–44.
[7] U. Fayyad and R. Uthurusamy, Evolving data into mining solutions for insights, Commun ACM 45 (8) (2002), 28–31.
[8] M. Chau, J. J. Xu, and H. Chen, Extracting meaningful entities from police narrative reports, In Proceedings of the 2002 Annual National Conference on Digital Government Research, Los Angeles, CA, 2002, 1–5.
[9] J. Hosseinkhani, M. Koochakzaei, S. Keikhaee, and Y. H. Amin, Detecting suspicion information on the web using crime Data Mining techniques, Int J Adv Comput Sci Inform Technol 3 (1) (2014), 32–41.
[10] H. Chen, W. Chung, J. J. Xu, G. Wang, Y. Qin, and M. Chau, Crime data mining: a general framework and some examples, Computer 37 (4) (2004), 50–56.
[11] A. Sharma and P. K. Panigrahi, A review of financial accounting fraud detection based on data mining techniques, Int J Comput Appl 39 (1) (2012), 37–47.
[12] T. K. Cocx and W. A. Kosters, A distance measure for determining similarity between criminal investigations, Adv Data Mining Appl Med Web Mining Marketing Image Signal Mining 4065 (2006), 511–525.
[13] C. H. Ku, A. Iriberri, and G. Leroy, Crime information extraction from police and witness narrative reports, In Proceedings of the IEEE Conference on Technologies for Homeland Security, 12–13 May, Waltham, MA, 2008, 193–198.
[14] C. H. Ku, A. Iriberri, and G. Leroy, Natural language processing and e-government: crime information extraction from heterogeneous data sources, In Proceedings of the 9th Annual International Digital Government Research Conference, Montreal, Canada, 2008, 18–21.
[15] V. Pinheiro, V. Furtado, T. Pequeno, and D. Nogueira, Natural language processing based on semantic inferentialism for extracting crime information from text, In Proceedings of the IEEE International Conference on Intelligence and Security Informatics (ISI), Vancouver, BC, 2010, 23–26.


[16] J. Johnson, A. Miller, L. Khan, B. Thuraisingham, and M. Kantarcioglu, Extraction of expanded entity phrases, In Proceedings of the IEEE International Conference on Intelligence and Security Informatics (ISI), Beijing, China, 2011, 10–12.
[17] K.-S. Yang, C.-C. Chen, Y.-H. Tseng, and Z.-P. Ho, Name entity extraction based on POS tagging for criminal information analysis and relation visualization, In Proceedings of the 6th International Conference on New Trends in Information Science and Service Science and Data Mining (ISSDM), 23–25 October, Taipei, Taiwan, 2012, 785–789.
[18] M. Asharef, N. Omar, and M. Albared, Arabic named entity recognition in crime documents, J Theor Appl Inform Technol 44 (1) (2012), 1–6.
[19] A. Alkaff, and M. Mohd, Extraction of nationality from crime news, J Theor Appl Inform Technol 54 (2) (2013), 304–312.
[20] R. Arulanandam, B. T. R. Savarimuthu, and M. A. Purvis, Extracting crime information from online newspaper articles, In Proceedings of the Second Australasian Web Conference, 20–23 January, Auckland, New Zealand, 2014, 31–38.
[21] T. Pang-Ning, M. Steinbach, and V. Kumar, Introduction to Data Mining, (1st ed.), London, Pearson, 2014.
[22] T. G. Dietterich, S. Becker, and Z. Ghahramani, Advances in neural information processing systems 14, In Proceedings of the Annual Conference on Neural Information Processing Systems, MIT Press, 2002.
[23] M. R. Anderberg, Cluster analysis for applications (No. OAS-TR-73-9). Office of the Assistant for Study Support Kirtland Afb N Mex, 1973.
[24] R. T. Ng and J. Han, Efficient and effective clustering methods for spatial data mining, In Proceedings of the 20th International Conference on Very Large Data Bases, September 12–15, Santiago de Chile, Chile, 1994, 144–155.
[25] T. Grubesic, and A. Murray, Detecting Hot-spots using Cluster Analysis and GIS, In Proceedings of the 5th Annual International Crime Mapping Research Conference, Dallas, 2001.
[26] R. V. Hauck, H. Atabakhsh, P. Ongvasith, H. Gupta, and H. Chen, Using Coplink to analyze criminal-justice data, Computer 35 (3) (2002), 30–37.
[27] J. S. De Bruin, T. K. Cocx, W. A. Kosters, J. F. J. Laros, and J. N. Kok, Data Mining approaches to criminal career analysis, In Proceedings of the 6th International Conference on Data Mining, Hong Kong, 2006, 18–22.
[28] R. Adderley, M. Townsley, and J. Bond, Use of data mining techniques to model crime scene investigator performance, Knowl Based Syst 20 (2) (2007), 170–176.
[29] R. Lombardo, and M. Falcone, Crime and Economic Performance. A cluster analysis of panel data on Italy’s NUTS 3 regions, Working Paper, University of Calabria, 2011, 1–33.
[30] Y. Bello, and S. A. Yelwa, Complementing GIS with Cluster Analysis in Assessing Property Crime in Katsina State, Nigeria, Am Int J Contemp Res 2 (7) (2012), 190–198.
[31] M. R. Keyvanpour, M. Javideh, and M. R. Ebrahimi, Detecting and investigating crime by means of data mining: a general crime matching framework, Procedia Comput Sci 3 (2011), 872–880.
[32] M. Sukanya, T. Kalaikumaran, and S. Karthik, Criminals and crime hotspot detection using data mining algorithms: clustering and classification, Int J Adv Res Comput Eng Technol 1 (10) (2012), 225–227.
[33] J. Agarwal, R. Nagpal, and R. Sehgal, Crime analysis using K-means clustering, Int J Comput Appl 83 (4) (2013), 1–4.
[34] M. Vijayakumar, S. Karthick, and N. Prakash, The day-to-day crime forecasting analysis of using spatial-temporal clustering simulation, Int J Sci Eng Res 4 (1) (2013), 1–6.
[35] R. Adderley, and P. B. Musgrove, Data mining case study: modeling the behavior of offenders who commit serious sexual assaults, In Proceedings of the 7th International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, 2001, 26–29.
[36] H. Chen, J. Schroeder, R. V. Hauck, L. Ridgeway, H. Atabakhsh, H. Gupta, C. Boarman, K. Rasmussen, and A. W. Clements, COPLINK connect: information and knowledge management for law enforcement, Decis Support Syst 34 (3) (2002), 271–285.
[37] H. Chen, D. Zeng, H. Atabakhsh, W. Wyzga, and J. Schroeder, COPLINK: managing law enforcement data and knowledge, Commun ACM 46 (1) (2003), 28–34.
[38] J. Schroeder, J. Xu, H. Chen, and M. Chau, Automated criminal link analysis based on domain knowledge, J Am Soc Inform Sci Technol 58 (6) (2007), 842–855.
[39] M. Gupta, B. Chandra, and M. P. Gupta, Crime data mining for Indian police information system, Computer Society of India, 2008, 389–397.
[40] D. E. Brown, and S. Hagen, Data association methods with applications to law enforcement, Decis Support Syst 34 (4) (2003), 369–378.
[41] S. Lin, and D. E. Brown, An outlier-based data association method for linking criminal incidents, Decis Support Syst 41 (3) (2006), 604–615.
[42] S. Appavu, M. Pandian, and R. Rajaram, Association rule mining for suspicious email detection: a data mining approach, In Proceedings of the IEEE Intelligence and Security Informatics, Brunswick, NJ, 2007, 23–24.
[43] V. Ng, S. Chan, D. Lau, and C. M. Ying, Incremental mining for temporal Association Rules for crime pattern discoveries, In Proceedings of the 18th Conference on Australasian Database, 29 January–2 February, Ballarat, Victoria, 2007, 123–132.
[44] A. L. Buczak, and C. M. Gifford, Fuzzy association rule mining for community crime pattern discovery, In Proceedings of the ACM SIGKDD Workshop on Intelligence and Security Informatics, Washington, DC, 2010, 25–28.
[45] D. Usha, and K. Rameshkumar, A complete survey on application of frequent pattern mining and association rule mining on crime pattern mining, Int J Adv Comput Sci Technol 3 (4) (2014), 264–275.
[46] M. J. Zaki, Parallel and distributed association mining: a survey, IEEE Concurrency 7 (4) (1999), 14–25.
[47] S. Appavu, and R. Rajaram, Suspicious e-mail detection via decision tree: a data mining approach, J Comput Inform Technol 15 (2) (2007), 161–169.
[48] E. Kirkos, C. Spathis, and Y. Manolopoulos, Data Mining techniques for the detection of fraudulent financial statements, Expert Syst Appl 32 (4) (2007), 995–1003.
[49] C. Wang, and P.-S. Liu, Data mining and hotspot detection in an urban development project, J Data Sci 6 (2008), 389–414.
[50] J. Gutierrez, Using decision trees to predict crime reporting, In Advanced Principles for Improving Database Design, Systems Modeling, and Software Development, K. Siau


and J. Erickson, eds. Hershey, PA, Information Science Reference, 2009, 132–145.
[51] C.-H. Yu, M. W. Ward, M. Morabito, and W. Ding, Crime forecasting using data mining techniques, In Proceedings of the 11th International Conference on Data Mining Workshops, 11 December, Vancouver, BC, 2011, 779–786.
[52] W. Hui, W. Jing, and Z. Tao, Analysis of decision tree classification algorithm based on attribute reduction and application in criminal behavior, In Proceedings of the 3rd International Conference on Computer Research and Development, 11–13 March, Shanghai, 2011, 27–30.
[53] R. Bhowmik, Detecting auto insurance fraud by data mining techniques, J Emerg Trends Comput Inform Sci 2 (4) (2011), 156–162.
[54] A. Gepp, J. H. Wilson, K. Kumar, and S. Bhattacharya, Comparative analysis of decision trees vis-à-vis other computational data mining techniques in automotive insurance fraud detection, J Data Sci 10 (2012), 537–561.
[55] A. Nasridinov, S.-Y. Ihm, and Y.-H. Park, A decision tree-based classification model for crime prediction, Inform Technol Convergence Lect Notes Electr Eng 253 (2013), 531–538.
[56] R. Iqbal, M. A. A. Murad, A. Mustapha, P. H. S. Panahy, and N. Khanahmadliravi, An experimental study of classification algorithms for crime prediction, Ind J Sci Technol 6 (3) (2013), 4219–4225.
[57] Y. Wang, X. Peng, and J. Bian, Computer crime forensics based on improved decision tree algorithm, J Netw 9 (4) (2014), 1005–1011.
[58] S. A. Muhammad, Fraud: the affinity of classification techniques to insurance fraud detection, Int J Innov Technol Explor Eng 3 (11) (2014), 62–66.
[59] C. M. Fuller, D. P. Biros, and D. Delen, An investigation of data and text mining methods for real world deception detection, Expert Syst Appl 38 (7) (2011), 8392–8398.
[60] C.-H. Wen, P.-Y. Hsu, C.-Y. Wang, and T. L. Wu, Identifying smuggling vessels with artificial neural network and logistics regression in criminal intelligence using vessels smuggling case data, Intell Inform Database Syst Lect Notes Comput Sci 7197 (2012), 539–548.
[61] M. Pandey, and V. Ravi, Detecting phishing e-mails using text and data mining, In Proceedings of the IEEE International Conference on Computational Intelligence & Computing Research, Coimbatore, India, 2012, 18–20.
[62] R. P. Ang, and D. H. Goh, Predicting juvenile offending: a comparison of data mining methods, Int J Offender Ther Comp Criminol 57 (2) (2013), 191–207.
[63] N. Tollenaar, and P. G. M. van der Heijden, Which method predicts recidivism best?: a comparison of statistical, machine learning and data mining predictive models, J R Stat Soc A 176 (2) (2013), 565–584.
[64] O. De-Vel, A. Anderson, M. Corney, and G. Mohay, Mining e-mail content for author identification forensics, ACM SIGMOD Rec 30 (4) (2001), 55–64.
[65] K. Kianmehr, and R. Alhajj, Effectiveness of support vector machine for crime hot-spots prediction, J Appl Artif Intell 22 (5) (2008), 433–458.
[66] R. Abu Hana, C. Freitas, L. S. Oliveira, and F. Bortolozzi, Crime Scene Classification, In Proceedings of the 23rd Annual ACM Symposium in Applied Computing, 16–20 March, Fortaleza, Ceará, Brazil, 2008, 419–423.
[67] Z. Liu, D. Lin, and F. Guo, A method for locating digital evidences with outlier detection using support vector machine, Int J Netw Secur 6 (3) (2008), 301–308.
[68] R. Basili, C. Giannone, C. D. Vescovo, A. Moschitti, and P. Naggar, Kernel-based relation extraction for crime investigation, In AI*IA 2009: Emergent Perspectives in Artificial Intelligence, 2009, 161–171.
[69] P. Wang, R. Mathieu, J. Ke, and H. J. Cai, Predicting criminal recidivism with support vector machine, In Proceedings of the International Conference on Management and Service Science, 24–26 August, Wuhan, 2010, 1–9.
[70] M. Salem, and S. Stolfo, Detecting masqueraders: a comparison of one-class bag-of-words user behavior modeling techniques, J Wireless Mob Netw Ubiquitous Comput Dependable Appl 1 (1) (2010), 3–13.
[71] A. Modupe, O. O. Olugbara, and S. O. Ojo, Exploring support vector machines and random forests to detect advanced fee fraud activities on internet, In Proceedings of the IEEE 11th Conference on Data Mining Workshops, 11 December, Vancouver, BC, 2011, 331–335.
[72] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C. Westland, Data mining for credit card fraud: a comparative study, Decis Support Syst 50 (3) (2011), 602–613.
[73] H. M. Deylami, and Y. P. Singh, Adaboost and SVM based cybercrime detection and prevention model, Artif Intell Res 1 (2) (2012), 117–130.
[74] R. Alwee, S. M. H. Shamsuddin, and R. Sallehuddin, Hybrid support vector regression and autoregressive integrated moving average models improved by particle swarm optimization for property crime states forecasting with economic indicators, Sci World J (2013), 1–11.
[75] R. Alwee, S. M. H. Shamsuddin, and R. Sallehuddin, Economic indicators selection for property crime rates using Grey Relational Analysis and Support Vector Regression, In Proceedings of the International Conference on Systems, Control, Signal Processing and Informatics, 16–19 July, Rhodes Island, 2013, 178–185.
[76] A. K. Junoh, M. N. Mansor, A. M. Yaacob, F. A. Adnan, S. A. Saad, and N. M. Yazid, Crime Detection with DCT and artificial intelligent approach, Adv Mat Res 816–817 (2013), 610–615.
[77] J. Bendler, T. Brandt, S. Wagner, and D. Neumann, Investigating crime-to-twitter relationships in urban environments–facilitating a virtual neighborhood watch, In Proceedings of the 22nd European Conference on Information Systems, Tel Aviv, 2014, 9–11.
[78] M. K. Sparrow, The application of network analysis to criminal intelligence: an assessment of the prospects, Social Netw 13 (3) (1991), 251–274.
[79] J. C. Wang, and C. Q. Chiu, Detecting online auction inflated-reputation behaviors using social network analysis, In Proceedings of the Annual Conference of the North American Association for Computational Social and Organizational Science, Notre Dame, 2005, 26–28.
[80] J. Qin, J. J. Xu, D. Hu, M. Sageman, and H. Chen, Analyzing terrorist networks: a case study of the global salafi jihad network, Intelligence and Security Informatics 3495 (2005), 287–304.
[81] D. McNally, and J. Alston, Use of social network analysis (SNA) in the examination of an outlaw motorcycle gang, J Gang Res 13 (3) (2006), 1–25.
[82] S. Ressler, Social network analysis as an approach to combat terrorism: Past, present, and future research, Homel Secur Aff 2 (2) (2006), 1–10.
[83] R. Rowe, G. Creamer, S. Hershkop, and S. J. Stolfo, Automated social hierarchy detection through email network


analysis, In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web mining and Social Network Analysis, San Jose, CA, 2007, 12–15.
[84] M. Chau, and J. Xu, Mining communities and their relationships in blogs: a study of online hate groups, Int J Hum Comput Stud 65 (1) (2007), 57–70.
[85] P. S. Nair, and S. T. Sarasamma, Data mining through fuzzy social network analysis, In Proceedings of the Annual Meeting of the North American, San Diego, CA, 2007, 24–27.
[86] Q. Liu, C. Tang, S. Qiao, Q. Liu, and F. Wen, Mining the core member of terrorist crime group based on social network analysis, Intelligence and Security Informatics 4430 (2007), 311–313.
[87] J. C. Wang, and C. C. Chiu, Recommending trusted online auction sellers using social network analysis, Expert Syst Appl 34 (3) (2008), 1666–1679.
[88] F. Kerschbaum, and A. Schaad, Privacy-preserving social network analysis for criminal investigations, In Proceedings of the 7th ACM workshop on Privacy in the electronic society, Alexandria, VA, 2008, 27–31.
[89] M. Chau, and J. Xu, Using web mining and social network analysis to study the emergence of cyber communities in blogs, Terror Inform 18 (2008), 473–494.
[90] S. Qiao, C. Tang, J. Peng, W. Liu, F. Wen, and J. Qiu, Mining key members of crime networks based on personality trait simulation email analysis system, Chin J Comput 31 (10) (2008), 1795–1803.
[91] H. Chen, Homeland security data mining using social network analysis, In Proceedings of the IEEE International Conference on Intelligence and Security Informatics, Taipei, xxvii, 2008, 17–20.
[92] A. M. Fard, and M. Ester, Collaborative mining in multiple social networks data for criminal group discovery, In Proceedings of the International Conference on Computational Science and Engineering, Vancouver, BC, 2009, 29–31.
[93] D. M. Schwartz, and T. D. A. Rouselle, Using social network analysis to target criminal networks, Trends Organ Crime 12 (2) (2009), 188–207.
[94] C. Chiu, Y. Ku, T. Lie, and Y. Chen, Internet auction fraud detection using social network analysis and classification tree approaches, Int J Electr Commer 15 (3) (2011), 123–147.
[95] P. J. Carrington, Crime and social network analysis, In The SAGE Handbook of Social Network Analysis, 2011, 236–255.
[96] J. A. Johnson, and J. D. Reitzel, Social network analysis in an operational environment: defining the utility of a network approach for crime analysis using the Richmond City Police Department as a case study, In Proceedings of the International Police Executive Symposium, Geneva, 2011, 1–23.
[97] D. Prakash, and S. Suren, Detection and analysis of hidden activities in social networks, Int J Comput Appl 77 (16) (2013), 34–38.
[98] A. Borthwick, J. Sterling, E. Agichtein, and R. Grishman, NYU: description of the MENE named entity system as used in MUC-7, In Proceedings of the Seventh Message Understanding Conference, 29 April–1 May, Fairfax, VA, 1998.
[99] G. R. Krupka and K. Hausman, IsoQuest Inc.: description of the NetOwl (TM) Extractor System as Used for MUC-7, In Proceedings of the Seventh Message Understanding Conference, 29 April–1 May, Fairfax, VA, 1998.
[100] I. H. Witten, Z. Bray, M. Mahoui, and W. J. Teahan, Using language models for generic entity extraction, In Proceedings of the 16th International Conference on Machine Learning Workshop on Text Mining, Bled, Slovenia, 1999.
[101] S. Baluja, V. O. Mittal, and R. Sukthankar, Applying machine learning for high-performance named-entity extraction, Comput Intell 16 (4) (2000), 586–595.
[102] S. Miller, M. Crystal, H. Fox, L. Ramshaw, R. Schwartz, R. Stone, and R. Weischedel, Algorithms that learn to extract information BBN: description of the sift system as used for MUC-7, In Proceedings of the Seventh Message Understanding Conference, 29 April–1 May, Fairfax, VA, 1998.
[103] LexiQuest Mine. Executive Report, SPSS, IBM. URL http://www.spss.ch/upload/1037287198_LexiQuestMine.pdf. Accessed: January 26, 2016.
[104] A. Iriberri, and G. Leroy, Natural language processing and e-government: extracting reusable crime report information, In Proceedings of the IEEE International Conference on Information Reuse and Integration, Las Vegas, NV, 2007, 13–15.
[105] J. J. Xu, and H. Chen, Fighting organized crimes: using shortest-path algorithms to identify associations in criminal networks, Decis Support Syst 38 (3) (2004), 473–487.
[106] W. R. Harper, and D. H. Harris, The application of link analysis to police intelligence, Human Factors 17 (2) (1975), 157–164.
[107] H. G. Goldberg, and R. W. H. Wong, Restructuring transactional data for link analysis in the FinCEN AI system, In AAAI Fall Symposia, Orlando, FL, 1998, 22–24.
[108] P. Klerks, The network paradigm applied to criminal organizations: theoretical nitpicking or a relevant doctrine for investigators? Recent developments in the Netherlands, Connections 24 (3) (2001), 53–65.
[109] T. Anderson, L. Arbetter, A. Benawides, and A. Longmore-Etheridge, Security works, Security Manage 38 (17) (1994), 17–20.
[110] R. Agrawal, T. Imieliński, and A. Swami, Mining Association Rules between sets of items in large databases, In Proceedings of the 1993 ACM SIGMOD International Conference on Management of data, 26–28 May, Washington, DC, 1993, 207–216.
[111] R. Agrawal and R. Srikant, Fast algorithms for mining association rules, In Proceedings of the 20th International Conference on Very Large Data Bases, September 12–15, Santiago de Chile, Chile, 1994, 487–499.
[112] S. Li, T. Wu, and W. M. Pottenger, Distributed higher order association rule mining using information extracted from textual data, SIGKDD Explorations 7 (1) (2005), 26–35.
[113] A. Asuncion, and D. J. Newman, UCI machine learning repository, Irvine, CA, Department of Information and Computer Science, University of California, 2007. URL http://www.ics.uci.edu/mlearn/MLRepository.html.
[114] R. Berk, L. Sherman, G. Barnes, E. Kurtz, and L. Ahlman, Forecasting murder within a population of probationers and parolees: a high stakes application of statistical learning, J R Stat Soc A 172 (1) (2009), 971–1008.
[115] J. Cowie, and W. Lehnert, Information extraction, Commun ACM 39 (1) (1996), 80–91.
[116] N. A. Chinchor, Overview of MUC-7/MET-2, In Proceedings of the Seventh Message Understanding Conference, 29 April–1 May, Fairfax, VA, 1998.


[117] H. Yun, D. Ha, B. Hwang, and K. H. Ryu, Mining Association Rules on significant rare data using relative support, J Syst Softw 67 (3) (2003), 181–191.
[118] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. F. McLachlan, A. Ng, B. Liu, P. S. Yu, Z.-H. Zhou, M. Steinbach, D. J. Hand, and D. Steinberg, Top 10 algorithms in data mining, Knowl Inform Syst 14 (1) (2008), 1–37.
[119] J. S. Park, M.-S. Chen, and P. S. Yu, An effective hash-based algorithm for mining association rules, ACM SIGMOD Record 24 (2) (1995), 175–186.
[120] J. S. Park, M. S. Chen, and P. S. Yu, Using a hash-based method with transaction trimming for mining association rules, IEEE Trans Knowl Data Eng 9 (5) (1997), 813–825.
[121] J. Han, J. Pei, and Y. Yin, Mining frequent patterns without candidate generation, ACM SIGMOD Record 29 (2) (2000), 1–12.
[122] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang, H-mine: Hyper-structure mining of frequent patterns in large databases, In Proceedings of the IEEE International Conference on Data Mining, 29 November–2 December, San Jose, CA, 2001, 441–448.
[123] B. Liu and W. Hsu, Integrating classification and association rule mining, In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, 27–31 August, New York, 1998.
[124] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and regression trees, London, Chapman & Hall/CRC, 1984.
[125] J. R. Quinlan, C4.5: Programs for Machine Learning, Burlington, MA, Morgan Kaufmann, 1992.
[126] J. R. Quinlan, Induction of decision trees, Mach Learn 1 (1) (1986), 81–106.
[127] J. W. Hunt, and T. G. Szymanski, A fast algorithm for computing longest common subsequences, Commun ACM 20 (5) (1977), 350–353.
[128] M. Mehta, R. Agrawal, and J. Rissanen, SLIQ: a fast scalable classifier for data mining, In Advances in Database Technology EDBT’96, 1057 (1996), 18–32.
[129] J. C. Shafer, R. Agrawal, and M. Mehta, SPRINT: a scalable parallel classifier for data mining, In Proceedings of the 22nd International Conference on Very Large Databases, 3–6 September, Bombay, India, 1996, 544–555.
[130] C. Cortes, and V. Vapnik, Support vector networks, Mach Learn 20 (3) (1995), 273–297.
[131] P. Langley, W. Iba, and K. Thompson, An analysis of Bayesian classifiers, In Proceedings of the 10th National Conference on Artificial Intelligence, 12–16 July, San Jose, CA, 1992, 223–228.
[132] R. Kohavi, Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid, In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, 1996, 2–4.
[133] H. Zhang, The optimality of naive Bayes, In Proceedings of the 17th International Florida Artificial Intelligence Research Society Conference, 12–14 May, Miami Beach, FL, 2004.
[134] M. D. Richard, and R. P. Lippmann, Neural network classifiers estimate Bayesian a posteriori probabilities, Neural Comput 3 (4) (1991), 461–483.
[135] G. P. Zhang, Neural networks for classification: a survey, IEEE Trans Syst Man Cybern C Appl Rev 30 (4) (2000), 451–462.
[136] H. Gish, A probabilistic approach to the understanding and training of neural network classifiers, In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Albuquerque, NM, 1990, 3–6.
[137] P. A. Shoemaker, A note on least-squares learning procedures and classification by neural network models, IEEE Trans Neural Netw 2 (1) (1991), 158–160.
[138] E. A. Wan, Neural network classification: a Bayesian interpretation, IEEE Trans Neural Netw 1 (4) (1990), 303–305.
[139] B. Widrow, D. E. Rumelhart, and M. A. Lehr, Neural networks: applications in industry, business and science, Commun ACM 37 (3) (1994), 93–105.
[140] J. Mena, Investigative data mining for security and criminal detection, Oxford, Butterworth-Heinemann, 2003.
[141] A. M. Fard, and M. Ester, Collaborative mining in multiple social networks data for criminal group discovery, In Proceedings of the International Conference on Computational Science and Engineering, Vancouver, BC, 2009, 29–31.
[142] C. Haythornthwaite, Social network analysis: an approach and technique for the study of information exchange, Libr Inform Sci Res 18 (4) (1996), 323–342.
[143] R. D. Alba, Taking stock of network analysis: A decade’s results, Res Sociol Organ 1 (1982), 39–74.
[144] J. Scott, Social Network Analysis: A Handbook, London, SAGE, 1991.
[145] S. Wasserman, and K. Faust, Social Network Analysis Methods and Applications, Cambridge, MA, Cambridge University Press, 1995.
[146] K. Chan, and J. Liebowitz, The synergy of social network analysis and knowledge mapping: a case study, Int J Manage Decis Mak 7 (1) (2006), 19–35.
[147] K. Faust, Centrality in affiliation networks, Soc Netw 19 (2) (1997), 157–191.
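
Illustrative sketch (not part of the original article). The degree, density and closeness measures defined in the social network analysis appendix can be tried out on a small undirected network as follows. This is a minimal sketch under stated assumptions: the five-person network and the function names (build_graph, degree, density, closeness) are hypothetical, density follows the appendix formula Density = C / (n(n−1)/2), and closeness uses one common normalisation (number of reachable nodes divided by the sum of shortest-path distances), which the appendix does not spell out.

```python
from collections import deque


def build_graph(edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph, node):
    """Number of other nodes to which `node` is directly linked."""
    return len(graph[node])


def density(graph):
    """Observed edges C divided by the maximum possible n(n-1)/2."""
    n = len(graph)
    if n < 2:
        return 0.0
    # Each undirected edge appears in two neighbour sets.
    c = sum(len(neighbours) for neighbours in graph.values()) // 2
    return c / (n * (n - 1) / 2)


def closeness(graph, node):
    """Reachable nodes divided by the sum of shortest-path distances
    (assumed normalisation; distances found by breadth-first search)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical network: an edge means two persons took part in the same event.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
network = build_graph(edges)

for person in sorted(network):
    print(person, degree(network, person), round(closeness(network, person), 3))
print("density:", density(network))
```

In this toy network person A has the highest degree and closeness, which is the kind of structural importance the centrality measures are meant to surface. Betweenness centrality is omitted because its definition is cut off in the text above.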
