
Application of Data Mining for Analysis and Prediction of Crime


Vaibhavi Shinde, Yash Bhatt, Sanika Wawage, Vishal Kongre,
and Rashmi Sonar

Abstract Crime is a significant component of every society. Its costs and
consequences touch just about everyone to a remarkable extent. About 10% of the
culprits commit about 50% of the crimes (Nath in Crime Pattern Detection Using Data
Mining. IEEE, 2006, [4]). Any research that helps solve crimes faster will pay for
itself. However, because of the massive increase in the number of crimes, it has
become challenging to analyze crime manually and to predict future crimes based
on location, pattern, and time. Criminals today are also becoming technologically
advanced, so advanced technologies are needed to keep the police ahead of them.
Data mining can be employed to model crime detection problems. A considerable
amount of research has already been published on this topic. In this work, we
thoroughly review some of it. The main focus is on the techniques and algorithms
used in those papers for the analysis and prediction of crime.

V. Shinde · Y. Bhatt (B) · S. Wawage · V. Kongre · R. Sonar


Computer Science and Engineering, Prof Ram Meghe College of Engineering and Management,
Amravati, Maharashtra, India
e-mail: yabhatt31@gmail.com
V. Shinde
e-mail: vaibhavicshinde15@gmail.com
S. Wawage
e-mail: sanika.wawage19@gmail.com
V. Kongre
e-mail: vkongre92@gmail.com
R. Sonar
e-mail: rashmi.sonar@gmail.com

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Singapore Pte Ltd. 2021
T. Senjyu et al. (eds.), Information and Communication Technology
for Intelligent Systems, Smart Innovation, Systems and Technologies 195,
https://doi.org/10.1007/978-981-15-7078-0_8

1 Introduction

The crime rate, which is increasing day by day, has become a major concern
and hinders the development of good governance. Crimes are neither entirely
systematic nor entirely random; otherwise, they could not be analyzed. Crimes such
as homicide, sexual abuse, and assault appear to have increased, whereas offenses
such as burglary and arson appear to have decreased [16].
Crime is a significant part of every society. Its costs and consequences touch
just about everyone to a remarkable degree. About 10% of offenders commit about
half of all crimes. However, because of the enormous increase in the number of
crimes, it has become difficult to analyze crime and to predict future crimes
based on location, pattern, and time. Criminals today are also becoming
technologically advanced, so advanced technologies are needed to keep the police
ahead of them. Data mining can be used to model crime problems. Any research that
helps solve crimes faster will pay for itself. A large number of research papers
have already been published on this topic. In this paper, we thoroughly survey
some of them. The primary focus is on the methods and algorithms used in those
papers for the analysis and prediction of crime.

2 Literature Analysis

Figure 1 shows the general strategy used by most researchers for crime prediction
and analysis. Table 1 presents the detailed classification and analysis of the
literature. The table lists the following attributes: the title of the research
paper, the focus of the research, the dataset used, the algorithms or tools, and
the future scope of the research.

3 Review Analysis

The review is based entirely on crime analysis and prediction using data mining
methods. We extensively studied research papers from the past few years, most of
them from 2017 and 2018. The pie chart in Figure 2 presents the algorithms most
frequently used for crime analysis and prediction.
1. K-means Clustering
Clustering is the task of grouping a collection of objects so that objects in the
same group, called a cluster, are more similar in some sense to each other than to
those in other groups. K-means clustering aims to partition n observations into k
clusters in which each observation belongs to the cluster with the nearest mean.
The K-means clustering algorithm plays a vital role in analyzing and predicting
crime and is used extensively.
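The assignment of each observation to the cluster with the nearest mean can be illustrated with a minimal sketch of the standard iterative procedure (Lloyd's algorithm). The function name `kmeans`, the two-feature crime rates, and the choice k = 2 are assumptions made for illustration only; none of the reviewed papers publishes this code, and several of them rely on tools such as Weka or RapidMiner instead.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial means
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: the means no longer move
        centroids = new_centroids
    return centroids, labels


# Hypothetical (murder rate, robbery rate) pairs for six areas.
rates = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 9.0), (8.5, 9.5), (9.0, 8.8)]
centroids, labels = kmeans(rates, k=2)
```

On this toy input the three low-rate areas and the three high-rate areas end up in separate clusters, which mirrors the high-crime and low-crime grouping described in the reviewed work.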

Fig. 1 Common approaches for crime analysis and prediction
[Flowchart: Data Collection → Preprocessing → Attribute Selection → Clustering (yielding Crime Clusters) and Classification (yielding Crime Prediction) → Visualization]

Yadav et al. [1] used k-means to form clusters according to crime rate, which can
be high or low. Two clusters are built: Group 1, a large number of people involved
in crime, and Group 2, a small number of people involved in crime. For
preprocessing, the data are first loaded into the Weka tool from a file saved as
book.csv, and k-means clustering is then applied to that dataset through the same
Weka GUI. Joshi et al. [3] applied the K-means clustering technique to their
dataset to identify cities with a high crime rate by examining the rate of each
type of crime. The methodology consists of dataset collection, followed by data
preprocessing, followed by k-means analysis with a clustering tool, which includes
(a) determining k by applying the silhouette measure and (b) loading the data into
the K-means clustering tool. Using k-means, cluster 0 to cluster 4 are then
obtained, followed by an analysis of the resulting clusters and a case study of
crime in different areas. Nath [4] used k-means clustering to detect patterns in
crime. Crimes vary widely in nature, crime datasets usually contain many unsolved
crimes, and the nature of crimes changes over time. For instance, cybercrimes or
crimes committed using cell phones were uncommon a few years ago. Because a
classification technique relies on existing, known solved crimes, it will not give
good predictive quality for future crimes. In view of these points, the author
proposed that clustering is better suited than supervised methods such as
classification. Hazarika et al. [7] presented an analysis of a missing children
dataset based on the place where the children were last seen before they were reported

Table 1 Classification and analysis of studied literature

Columns: Title, Focus, Dataset, Algorithms/tools, Future scope

[1]
Focus: To survey research related to crime prevention and to apply various data analysis algorithms for relating crime to its patterns
Dataset: National Crime Records Bureau Web site
Algorithms/tools: Algorithms: apriori, k-means, naïve Bayes, correlation, and regression. Tools: Weka tool and R tool
Future scope: To identify crime hot spots and to apply these methods to the complete dataset, which comprises 42 crime heads with 14 attributes

[2]
Focus: A model that recognizes crime patterns from inferences gathered at the crime scene and predicts the description of the perpetrator who is likely suspected of committing the crime
Dataset: San Francisco Homicide dataset
Algorithms/tools: Algorithms: multilinear regression, K-neighbors classifier, and neural networks
Future scope: Employing complex neural networks such as CNN and RNN to improve the accuracy of the system

[3]
Focus: Crimes including robbery, murder, and various drug offenses, along with suspicious activities, noise complaints, and burglar alarms, are investigated using a qualitative and quantitative approach
Dataset: Web site of the Bureau of Crime Statistics and Research of the New South Wales Government's Justice Department
Algorithms/tools: K-means clustering, RapidMiner tool
Future scope: –

[4]
Focus: Crime patterns are recognized using clustering algorithms, which speeds up the process of solving crime
Dataset: Real crime data from a sheriff's office
Algorithms/tools: K-means clustering
Future scope: Generate models for predicting crime hot spots at the most likely places of crime for any given span of time; develop social link networks to link criminals

[5]
Focus: A method of crime prediction based on the naïve Bayes classifier is proposed
Dataset: City police department
Algorithms/tools: KNN classifier and naïve Bayes classifier
Future scope: –

[6]
Focus: Identifying potential crime patterns using previously underutilized attributes from police-recorded crime data
Dataset: Computer Aid Dispatch System of Beijing Shijingshan Police Sub-bureau
Algorithms/tools: Apriori algorithm
Future scope: –

[7]
Focus: To locate the missing children in Delhi
Dataset: Missing children dataset of Delhi in year 2016
Algorithms/tools: K-means clustering, distance matrix (Haversine and Euclidean)
Future scope: A prediction model for approximately mapping the place where a child is expected to be seen, built using the lost and found children dataset

[8]
Focus: Building a prediction model for predicting the frequency of several types of crime by LSOA code, as well as the frequency of anti-social behaviour crimes
Dataset: UK police
Algorithms/tools: Regression, instance-based learning, and decision trees
Future scope: –

[9]
Focus: Compares two different classification algorithms, namely naïve Bayes and BP, for predicting the crime category for different states in the USA
Dataset: Socio-economic data from the 1990 US census
Algorithms/tools: Naïve Bayes and BP
Future scope: To assess the prediction performance of different classification algorithms on the dataset

[10]
Focus: Examines a variety of classification techniques to determine which is best suited for predicting crime areas. Explores classification based on increase or development
Dataset: US police department (real time)
Algorithms/tools: Classification, spatial data mining
Future scope: –

[11]
Focus: Highlights existing systems used by Indian police. An interactive query-based interface is proposed as a crime analysis tool
Dataset: CCIS database, NCRB, India
Algorithms/tools: Clustering
Future scope: –

[12]
Focus: Crimes related to credit card usage are detected
Dataset: No mention of dataset
Algorithms/tools: Communal detection and spike detection algorithms
Future scope: To demonstrate the concept of adaptivity properly

[13]
Focus: Clustering methods are adopted to predict crime in 6 cities of Tamil Nadu. Criminals are identified by applying classification approaches
Dataset: NCRB dataset
Algorithms/tools: Classification: K-nearest neighbor; clustering: k-means, DBSCAN, and agglomerative-hierarchical algorithms
Future scope: To improve classification algorithms and enhance privacy and security measures

[14]
Focus: Regression is adopted to predict crimes, and an integer linear programming formulation is employed for optimizing the distribution of police officers
Dataset: The US's FBI crime data (2013–2014)
Algorithms/tools: Utility-based regression, SVM, RF, and MARS
Future scope: Applying the proposed framework to other countries or regions is being considered

[15]
Focus: Establishment of criminal profiles. A novel two-level clustering algorithm is proposed
Dataset: Nationwide police extracted data
Algorithms/tools: Affinity propagation (AP) clustering algorithm
Future scope: Making the algorithm more versatile in order to assess its influence on cluster quality and durability

Fig. 2 Frequently used algorithms for crime analysis and prediction
[Pie chart: K-means clustering, 5 (25%); Others (SVM, RF, distance matrix, AP), 5 (25%); Naive Bayes classification, 3 (15%); KNN classification, 3 (15%); Regression, 3 (15%); Apriori algorithm, 1 (5%)]

missing. The research uses clustering to group the regions where the number of
missing children is higher, and also to identify patterns in order to predict
likely future crime areas. The K-means clustering algorithm is applied to the
dataset using both the Euclidean distance and the haversine distance. Sivaranjani
et al. [13] applied the K-means clustering algorithm to uncover hidden patterns and
relationships in the crime dataset. The approach provides an interface to large
crime data and simplifies managing, searching, and retrieving the desired crime
information. Other clustering algorithms include agglomerative-hierarchical
clustering [13], the DBSCAN clustering algorithm [13], and the affinity propagation
(AP) clustering algorithm [15].
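The difference between the two distance measures mentioned for [7] can be made concrete with a short sketch. The haversine formula gives the great-circle distance between two latitude/longitude pairs, whereas the Euclidean distance treats the coordinates as points on a plane. The coordinates below are hypothetical locations around Delhi, and the helper names and the Earth radius of 6371 km are assumptions for illustration, not values taken from the reviewed paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def euclidean_deg(a, b):
    """Straight-line distance in raw degrees, ignoring the Earth's curvature."""
    return math.dist(a, b)


def distance_matrix(points, metric):
    """Square matrix whose entry [i][j] is metric(points[i], points[j])."""
    return [[metric(p, q) for q in points] for p in points]


# Hypothetical last-seen locations (latitude, longitude) around Delhi.
places = [(28.6139, 77.2090), (28.7041, 77.1025), (28.5355, 77.3910)]
hav = distance_matrix(places, haversine_km)
euc = distance_matrix(places, euclidean_deg)
```

Either matrix could be supplied to a clustering step. The haversine version is expressed in kilometres, while the Euclidean version is in degrees and distorts east-west distances away from the equator.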
2. Naïve Bayes Classification
Classification is a data mining function that assigns items in a collection to
target categories or classes. The goal of classification is to accurately predict
the target class for each case in the data. Naive Bayes is a classification
algorithm that uses Bayes' theorem to compute probabilities and conditional
probabilities, which are then used for prediction. Yadav et al. [1] used naive
Bayes classification to understand the existing dataset and to predict how new
individual datasets will behave with respect to specific classification criteria.
Babakura et al. [9] compared naive Bayes and back propagation for predicting crime
categories; the average accuracy of naive Bayes was 92%, whereas that of back
propagation was 65%.
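How probabilities and conditional probabilities combine into a prediction can be sketched as follows. This is a minimal categorical naive Bayes with Laplace smoothing, written for illustration only: the features (time of day, area type), the crime categories, and the function names are hypothetical and are not drawn from the datasets or implementations used in [1] or [9].

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies for each class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, feature index) -> value counts
    vocab = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, row):
    """Return the class with the highest posterior score for one record."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed likelihood P(value | class).
            score *= (value_counts[(label, i)][value] + 1) / (count + len(vocab[i]))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical records: (time of day, area type) -> crime category.
rows = [("night", "commercial"), ("night", "residential"), ("day", "commercial"),
        ("day", "residential"), ("night", "commercial"), ("day", "residential")]
labels = ["robbery", "burglary", "theft", "burglary", "robbery", "theft"]
model = train_naive_bayes(rows, labels)
prediction = predict(model, ("night", "commercial"))
```

The smoothing term keeps an unseen feature value from forcing a class score to zero, which matters for small crime datasets in which many combinations never occur.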

3. KNN Classification
K-nearest neighbors (KNN) is one of several supervised learning algorithms used in
data mining and machine learning; it is a classifier in which learning is based on
how similar a data point is to others. Shermila et al. [2] used the KNN classifier
to perform classification when the target variable has multiple classes. In their
dataset, the target variable relationship has twenty-seven unique classes such as
friend, husband, and wife, and the target variable perpetrator gender has three
classes, namely male, female, and unknown. Hence, the KNN classifier is used to
classify these target variables, the perpetrator's gender and relationship. Kiran
and Kaishveen [5] compared KNN classification and naive Bayes for crime prediction
and concluded that naive Bayes has higher precision and lower execution time than
KNN. Sivaranjani et al. [13] applied the KNN classification technique, which
searches the dataset for the most similar instances when an input is given to it.
The input to the KNN algorithm consists of the attribute values of the crime
dataset. Based on that query, the KNN algorithm produces output that helps analyze
the large crime dataset and supports predicting future crime in several cities,
illustrating the crime patterns of various cities.
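The similarity search at the core of KNN can be sketched in a few lines: the query is compared with every training record, and the majority label among the k closest records is returned. The numeric features, the labels, and the function name `knn_predict` below are hypothetical and serve only to illustrate the voting step; they are not the attributes used in [2], [5], or [13].

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors: (victim age, hour of day) -> perpetrator gender.
points = [(25, 22), (30, 23), (28, 21), (60, 10), (65, 9), (58, 11)]
genders = ["male", "male", "male", "female", "female", "female"]
result = knn_predict(points, genders, query=(27, 22))
```

Because every prediction scans the whole training set, execution time grows with the dataset size, which is consistent with the longer execution time reported for KNN in [5].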
4. Regression
For a given dataset, regression is a data mining technique used to predict a range
of continuous values. Yadav et al. [1] applied linear regression to model a
continuous variable Y as a mathematical function of one or more variables X, so
that the regression model can be used to predict Y when only X is known. Using
regression, they predicted the number of persons who committed crimes versus the
number of trials conducted during the year. Shermila et al. [2] use multilinear
regression to find the relationship between a dependent variable, the
perpetrator's age, and the input evidence, a given set of independent variables
obtained from the crime scene. This method predicts the most likely age of the
perpetrator from the input features; it was chosen because their dataset had
features that are non-binary and the prediction involved more than two predictors.
Cavadas et al. [14] used regression to predict violent crimes, followed by
resource optimization that takes the earlier predictions into account. This
approach uses the concept of utility-based regression and relies on the definition
of a relevance function.
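Predicting Y from a single known X, as described for [1], corresponds to fitting a straight line by ordinary least squares. The sketch below is a minimal illustration under that assumption; the yearly figures are invented so that the fitted line is easy to verify by hand, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical yearly figures: X = trials conducted, Y = persons who committed crimes.
trials = [100, 200, 300, 400]
persons = [250, 450, 650, 850]   # constructed to lie exactly on Y = 50 + 2 * X
intercept, slope = fit_line(trials, persons)
predicted = intercept + slope * 500   # forecast for an unseen X
```

Multilinear regression, as used in [2], extends the same idea to several independent variables, with one coefficient per variable.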
5. Apriori Algorithm
The Apriori algorithm is designed to discover frequently occurring itemsets and
association rules in transactional datasets. Chen and Kurland [6] applied the
apriori algorithm to crime pattern detection. The approach works by first
generating candidate itemsets of length k from itemsets of length k − 1, using a
breadth-first

Fig. 3 Data mining techniques and algorithms
[Diagram: Data Mining Techniques branch into Clustering (K-means, DBSCAN, agglomerative hierarchical clustering, affinity propagation (AP) clustering algorithm); Classification (KNN, naïve Bayes, back propagation, decision tree, SVM, neural networks, random forest); Association mining (Apriori algorithm); Prediction (regression)]

search and a hash-tree structure to count the candidate itemsets. It then prunes
the candidates that have infrequent sub-patterns, so that the candidate set
contains all frequent k-length itemsets. After that, the transaction dataset is
scanned to determine the frequent itemsets among the candidates.
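The level-wise generate, prune, and count cycle described above can be sketched as follows. This is an illustrative version only: support is counted with a plain scan of the transactions rather than a hash tree, and the incident attributes, the `min_support` threshold, and the function name are hypothetical rather than taken from [6].

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Scan the transactions and count each candidate's support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        # Join step: build (k + 1)-item candidates from frequent k-itemsets.
        k += 1
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k - 1)-subset.
        candidates = {
            c for c in joined
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical incident records described by time, place, and modus operandi.
incidents = [
    {"night", "street", "snatching"},
    {"night", "street", "snatching"},
    {"night", "home", "break-in"},
    {"day", "street", "snatching"},
]
frequent = apriori(incidents, min_support=2)
```

With a minimum support of 2, the combination of street and snatching is found in three incidents, and the full triple of night, street, and snatching in two, while items that occur only once are discarded at the first level.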
6. Other Algorithms
Other algorithms and techniques used for the analysis and prediction of crime
include: (a) Support vector machine (SVM) [14], a supervised machine learning
model with associated learning algorithms that analyze data for classification and
regression analysis. (b) Random forest (RF) [14], which consists of a large number
of individual decision trees and is an ensemble learning method for
classification, regression, and other tasks; it works by building many decision
trees at training time and outputting the class that is the mode of the classes
(classification) or the mean prediction (regression) of the individual trees.
(c) Distance matrix [7], a two-dimensional array (a square matrix) containing the
pairwise distances between objects. (d) Back propagation (BP) [9], an important
algorithm for improving the accuracy of predictions in data mining and machine
learning; artificial neural networks use back propagation to compute the gradient
with respect to the weights that gradient descent requires.
Figure 3 shows the data mining methods and algorithms for crime analysis and
prediction used in the research work that we have examined.
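The way a random forest combines its trees, as described in item (b), can be shown with a small sketch of the aggregation step alone. Training the individual decision trees is omitted; the tree outputs below are hypothetical values standing in for the predictions of already-trained trees, and the function names are assumptions for illustration.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Ensemble class = the mode of the classes predicted by the individual trees."""
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_outputs):
    """Ensemble value = the mean of the individual trees' numeric predictions."""
    return mean(tree_outputs)


# Hypothetical outputs of five already-trained trees for one record.
votes = ["violent", "non-violent", "violent", "violent", "non-violent"]
estimates = [12.0, 15.0, 11.0, 14.0, 13.0]
forest_class = aggregate_classification(votes)   # majority class
forest_value = aggregate_regression(estimates)   # average estimate
```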

4 Conclusion

Crime analysis is a sensitive domain that is expanding day by day and has a
severe impact on society. Analyzing the expanding volumes of crime data manually,
yet efficiently and precisely, is the most prominent challenge faced by law
enforcement agencies today. This research work focuses on reviewing different
methods used for analyzing and predicting crime that can prove to be useful for the

police forces to handle crimes efficiently. Thus, a criminal investigation should
be able to identify crime patterns as quickly as possible and in an effective way
for future crime detection. The review analysis includes a detailed description of
the algorithms used in the studied literature, along with a pie chart of the
algorithms most frequently used for the analysis and prediction of crime using
data mining. This work is limited to social crime and can be further extended by
considering cybercrime.

5 Future Scope

For future work, we intend to extend this research by enhancing and implementing
crime analysis and prediction techniques that address the limitations of current
approaches, in order to obtain more precise results and better performance.

References

1. Yadav, S., Timbadia, M., Yadav, A., Vishwakarma, R., Yadav, N.: Crime pattern detection,
analysis and prediction. In: 2017 International Conference on Electronics, Communication
and Aerospace Technology ICECA 2017. IEEE (2017)
2. Shermila, M.A., Bellarmine, A.B., Santiago, N.: Crime data analysis and prediction of
perpetrator identity using machine learning approach. In: 2018 2nd International Conference
on Trends in Electronics and Informatics (ICOEI2018). IEEE (2018)
3. Joshi, A., Sabitha, A.S., Choudhury, T.: Crime analysis using k-means clustering. In: 2017
International Conference on Computational Intelligence and Networks. IEEE (2017)
4. Nath, S.: Crime pattern detection using data mining. In: 2006 IEEE/WIC/ACM International
Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2006 Workshops).
IEEE (2006)
5. Kiran, J., Kaishveen, K.: Prediction analysis of crime in India using a hybrid clustering
approach. In: The Second International conference on I-SMAC (IoT in Social, Mobile,
Analytics and Cloud) (I-SMAC 2018). IEEE (2018)
6. Chen, P., Kurland, J.: Time, place, and modus operandi: a simple apriori algorithm experiment
for crime pattern detection. IEEE (2018)
7. Hazarika, A.V., Sai Raghu Ram, G.J., Jain, E.: Cluster analysis of Delhi crimes using different
distance metrics. In: International Conference on Energy, Communication, Data Analytics and
Soft Computing (ICECDS-2017). IEEE (2017)
8. Saltos, G., Cocea, M.: An exploration of crime prediction using data mining on open data. Int.
J. Inf. Technol. Decis. Mak. (2017)
9. Babakura, A., Sulaiman, M.N., Yusuf, M.A.: Improved method of classification algorithms for
crime prediction. In: 2014 International Symposium on Biometrics and Security Technologies
(ISBAST). IEEE (2014)
10. Yu, C.-H., Ward, M.W., Morabito, M., Ding, W.: Crime forecasting using data mining
techniques. In: 2011 11th IEEE International Conference on Data Mining Workshops (2011)
11. Gupta, M., Chandra, B., Gupta, M.P.: Crime Data Mining for Indian Police information System.
IIT Delhi, India (2006)
12. Dutta, S., Gupta, A.K., Narayan, N.: Identity crime detection using data mining. In: 2017
International Conference on Computational Intelligence and Networks. IEEE (2017)

13. Sivaranjani, S., Sivakumari, S., Aasha, M.: Crime prediction & forecasting in Tamil Nadu using
clustering approaches. In: 2016 International Conference on Emerging Technological Trends
[ICETT]. IEEE (2016)
14. Cavadas, B., Branco, P., Pereira, S.: Crime Prediction Using Regression & Resources
Optimization. Springer International Publishing, Switzerland (2015)
15. Alphonse Inbaraj, X., Rao, A.S.: Hybrid clustering algorithms for crime pattern analysis. In:
2018 IEEE International Conference on Current Trends toward Converging Technologies,
Coimbatore, India
16. Chauhan, C., Sehgal, S.: A review: crime analysis using data mining techniques and algorithms.
In: International Conference on Computing, Communication and Automation (ICCCA2017).
IEEE (2017)
