Abstract: Crime is one of the unnecessary evils that lurk in our society. It is usually orchestrated by people with nefarious intentions, is inhumane in nature, and has a lasting negative impact on people. To combat this problem, various law enforcement agencies work around the clock. This keeps residents safe and secure from these crimes, but it also requires significant amounts of money from governments. Applying artificial intelligence or machine learning can therefore provide significant improvements in efficiency. To this end, various studies have attempted crime prediction, but most of them have fallen short of expectations. This publication outlines an effective and secure crime prediction system that utilizes K-Means clustering and Linear Regression along with Fuzzy Artificial Neural Networks and a Decision Tree. The experimental results indicate that the proposed system performs as intended.

Keywords: Crime Scenario, Crime Prediction, K-Means, Linear Regression, Fuzzy ANN, Decision Tree.

I. INTRODUCTION

Crime is a major part of every society. It has lasting effects on everyone who has come into contact with it to some extent. The price paid for crime and its effects varies widely. Some effects of crime are short-lived, whereas others last a lifetime. The biggest cost of crime is, of course, loss of life. Victims pay other costs as well, including medical costs, property losses, and loss of income.

Losses to both victims and non-victims also come from increased expenses such as medical bills and security measures, including stronger locks, additional lighting, parking in more expensive secure lots, security alarms for homes and cars, and keeping guard dogs. Significant money is spent to avoid being victimized. Other expenses that a victim, or a person afraid of crime, may have to bear include moving to a new neighborhood, funeral expenses, legal fees, and lost school days.

Some costs of crime are less tangible (not easily or precisely identified). These costs include pain and suffering and a lower quality of life. There are also the traumatic impacts on friends and the disruption of family. The behavior of those affected can be permanently altered and shaped by crime, whether through weighing the risks of going to certain places or through the fear of making new friends.

Crime not only affects economic productivity when victims miss work; communities are also affected through the loss of business and retail sales. Even the so-called victimless crimes of vice, drug abuse, and gambling have major social consequences. Drug abuse often affects worker productivity, uses public funds for drug treatment programs and medical attention, and often leads the user into criminal activity to support the cost of a drug habit.

Communities and governments spend public funds on police departments, prisons and jails, courts, and treatment programs, including the salaries of prosecutors, judges, public defenders, social workers, security guards, and probation officers. The time spent by victims, offenders, their families, and juries during court trials also takes away from community productivity. By the beginning of the 21st century, it was estimated that the annual cost of crime in the U.S. was approaching $1.7 trillion.

Crime is one of the largest and most dominant problems in our society, and reducing it is a very important task. The number of crimes committed increases immensely every day. This requires keeping track of all crimes and maintaining a database of them that can be used for future reference. The current problem lies in maintaining an accurate crime dataset and analyzing this data to assist in predicting and solving crimes in the future. The objective of this work is to analyze a dataset that contains various crimes and to predict which type of crime may happen in the future depending on various conditions. In this project, we use machine learning and data science techniques for crime prediction on the Chicago crime dataset. The crime data is extracted from the official portal of the Chicago police. It
consists of crime information such as location description, type of crime, date, time, latitude, and longitude. Before the model is trained, the data is preprocessed, after which feature selection and scaling are performed in order to increase accuracy. K-Nearest Neighbor (KNN) classification and various other algorithms are tested for crime prediction, and the one with the highest accuracy is used for training. The dataset is visualized graphically for many cases, for example, at which time of the year crime rates are high or in which month criminal activity is on the rise. The purpose of this project is to give a general idea of how machine learning can be used by law enforcement agencies to detect, predict, and solve crimes at a much faster rate and thereby reduce the crime rate. It is not restricted to Chicago; it can be used in other states or countries depending on the availability of a dataset.

In this analysis, Python was used to explore the crime data, perform multivariate analysis, and predict classes for the test data, so as to identify the best correlation between the features (Date, Pd-District, Address, Day of Week, Description, Resolution, X and Y) and the target (Category of Crime). All nominal values were converted into binary values by turning the values of each attribute into separate new attributes that take the value 0 or 1. Many trials of various regression methods were run on the training data by splitting it into two sets, training and validation. Both validation and cross-validation were conducted, and the method with the lowest log loss was applied to predict the results for the test data. Two main algorithms were used in this analysis. The first is K-nearest neighbors (KNN), a supervised learning algorithm for either classification or regression, used here with two different distance-weighting functions. The first function is uniform: all points in the neighborhood are weighted equally. The second function is inverse distance: points are weighted by the inverse of their distance, so that nearer neighbors of a query point have a greater influence than neighbors that are farther away. The second algorithm is Naive Bayes, a set of supervised learning algorithms based on applying Bayes' theorem with the "naive" assumption of independence between every pair of features, used here in three variants. The first variant is Bernoulli, which implements the naive Bayes training and classification algorithms for data with binary feature vectors. The second variant is multinomial, which implements the naive Bayes algorithm for multinomially distributed data and is usually used for discrete counts. The third variant is Gaussian, where the likelihood of the features is assumed to be Gaussian; rather than discrete counts, the features are continuous. In Python, the scikit-learn library was used to perform regression and classification.

KNN has some advantages and disadvantages compared to Naïve Bayes (NB). One advantage is that KNN's decision boundary can take any form, whereas Naïve Bayes can only have linear, elliptic, or parabolic decision boundaries. In addition, Naïve Bayes does not work well with correlated attributes: if the classification depends not on the marginal distributions but on correlations, NB is not a good choice. Naïve Bayes can also be misled by the absence of an attribute. One disadvantage of KNN is that it does not identify the most important attributes; distance is the only criterion used. In addition, it is non-parametric and therefore not as interpretable as NB: KNN cannot provide any relationship between the distribution of attributes and the classes. KNN does not handle missing data properly, whereas Naïve Bayes simply excludes the attribute with missing data. In KNN, the value of K has to be tuned and the best value chosen. Another disadvantage is that KNN is slower at prediction time; with large amounts of data the difference in speed is significant.

Crime hot-spot location prediction is important for public safety. The output of the prediction can provide useful information to improve activities aimed at detecting and preventing safety and security problems. Location prediction is a special case of spatial data mining classification. For example, in the public safety domain, it may be interesting to predict the location(s) of crime hot spots. In this study, we use a support vector machine (SVM)-based approach to predict the location as an alternative to existing modeling approaches. Support vector machines form a new generation of machine-learning techniques used to find the optimal separation between classes within datasets. We compare the performance of two types of SVM techniques: two-class SVMs and one-class SVMs. We also compare SVM with a neural network-based approach and a spatial auto-regression-based approach. Experiments on two different spatial datasets demonstrate that the former approach performs slightly better and the latter one gives reasonable results. Furthermore, in this study we provide a general framework to customize the spatial data classification task for other spatial domains that have datasets like the analyzed crime datasets.

Predictive models have many social uses, such as preventing suicides and crimes by analyzing past data available from various sources. Predictive analysis is usually done to predict the outcome of certain incidents using historical data. Traditionally, analysis is done by a user after he or she specifies what to look for in the datasets. But when there are thousands of data items, it becomes more complicated to look for certain numerical values or text to predict certain outcomes. When machine learning is paired with these prediction methods, the possibilities expand greatly. Machine learning has various concepts that can be put into effect based on the user's requirements. Deep learning is a concept used in many systems, such as Google's search engine and the predictive keyboards that everyone uses on their
phones every day. Deep ... testing data, and the output is compared with the actual data. The output is visualized through a graph. Multiple graphs are drawn for multiple training processes. The accuracy can be compared across the graphs.

This paper dedicates Section 2 to the analysis of past work as a literature survey, and Section 3 describes the proposed model in detail. Section 4 elaborates on the experimental setup and result evaluation, whereas Section 5 concludes the paper along with the future scope.

II. LITERATURE SURVEY

This section surveys the literature and presents key points drawn from an analysis of the work of several authors, as follows.

S. Yadav [1] discusses how the crime rate in India is increasing along with the population. The authors predict crime from previous years' records of crimes such as murder, kidnapping and abduction, dacoity, robbery, burglary, and rape. The model uses the Naive Bayes algorithm, a data mining technique that classifies data into predefined classes. Correlation and regression are used to relate two variables to each other: a correlation of 1 indicates a perfect relation, while 0 indicates no relation between the two variables. The proposed model thus helps to predict and reduce crime.

A. Babakura [2] states that it is necessary to analyze crime data using data mining techniques. The paper compares two data mining techniques, Naïve Bayesian and Back Propagation, for crime prediction, with the output provided in three categories: low, medium, and high. The authors report that the accuracy of Naïve Bayesian is better than that of Back Propagation.

S. Sivaranjani [3] observes that crime is one of the important problems to be curbed in India and is a curse in developing countries. In the paper, the data is extracted from the National Crime Records Bureau (NCRB) of India and contains 14 years of information on six cities with 9 attributes. The system uses different clustering methods, such as K-means and Density-Based Spatial Clustering, to find the best clustering method for crime prediction. The attributes are given as input to the KNN algorithm to analyze the large data and are then fed to k-means clustering as well. The system reports an improvement in results compared to others.

N. Mahmud [4] notes that crime prediction has been an emerging research topic in recent years. Using crime pattern theory, crime can be predicted from past data. The paper introduces CRIMECAST, a crime detection and strategy direction service that attempts to predict probable future crimes by combining a probabilistic model with an Artificial Neural Network. As it is a new model, the authors have not compared it with any other model.

K. R. Vineeth [5] introduces the factors responsible for the increasing crime rate in India, such as the growing population and limited job opportunities for youths, which drive them to commit crime due to stress. Crime analysis is necessary and helps investigation agencies take action to prevent crime. The authors use FP Max, a bottom-up approach that concentrates on frequent crimes and uses a linked list to reduce space complexity, and CIP to classify the data with labels such as high, low, and dangerous using a random forest, which yields promising accuracy in crime prediction.

Z. Wawrzyniak [6] explains that the prediction of future crime events depends on observational data and other factors that affect crime. Data can be fetched from police records or from open sources to form structured data that can be used to understand criminal behavior and predict future crime events. The model uses a deep learning Artificial Neural Network architecture to reach a good level of prediction. For the selection of hidden neurons, a new technique called the virtual leave-one-out test (VLOO) is developed, and Gram-Schmidt orthogonalization (GS) is used for the selection of network inputs. Long short-term memory (LSTM) recurrent neural networks (RNN) and convolutional neural networks (CNN) are used for short-term crime prediction.

M. Nakib [7] presents a new approach for predicting crime from the crime scene using objects such as blood, knives, and guns. A large number of CCTV cameras have been installed to monitor areas, but it is very hard to manage all cameras manually. The frightening item is extracted from the images, which gives a prediction of whether a crime has been committed and where the image was taken. Detection is based on the Rectified Linear Unit (ReLU), convolutional layer, fully connected layer, and dropout function of a CNN. An accuracy of 90.2% is achieved on the tested dataset.

C. Chauhan [8] discusses how the crime rate is increasing day by day, making it one of the major topics to be researched using Artificial Intelligence and Machine Learning techniques. Using these techniques, and with the guidance of crime data analysts, crime can be predicted to help investigation officers. It is not possible to handle the huge amount of data manually, and because criminals are becoming technically advanced, it is necessary to develop advanced technology to keep police officers ahead of them. For classification of the data, the authors use the Naïve Bayes classifier.

Z. Beiji [9] proposes a new approach, a neuro-fuzzy based model for evaluating crime. First, data is collected from violent scene detection (VSD) for extracting the fuzzy rules. Videos can be collected from various websites or from scenes of Hollywood movies. Second, video analysis is performed. The video analysis system has 3 modeled indicators: 52 action concepts, e.g. punch,
slap, kick, fall, run; 15 scene concepts, e.g. crowd, street, park, residential, bush; and 21 object concepts, e.g. gun, knife, fire, face mask, car. For representation, they use Bag of Concepts (BoC) and Co-occurrence of Concepts (CoC). The neuro-fuzzy based model is then implemented for computing crime.

they have used a new technique that accounts for external influences by using ARIMAX, which transfers the single input that is in the motorcycle.
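
The two KNN distance-weighting functions described in the introduction (uniform and inverse distance) can be illustrated with a small plain-Python sketch. This is only an illustration under stated assumptions, not the implementation used in the experiments: the function name `knn_predict`, the toy points, and the labels are hypothetical.

```python
import math
from collections import defaultdict


def knn_predict(train, query, k=3, weights="uniform"):
    """Classify `query` by a vote among its k nearest training points.

    train   -- list of (feature_tuple, label) pairs
    weights -- "uniform": every neighbor contributes a vote of 1
               "distance": every neighbor contributes 1 / distance
    """
    if weights not in ("uniform", "distance"):
        raise ValueError("weights must be 'uniform' or 'distance'")

    # Keep the k training points closest to the query (Euclidean distance).
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]

    votes = defaultdict(float)
    for features, label in neighbors:
        d = math.dist(features, query)
        if weights == "uniform":
            votes[label] += 1.0
        elif d == 0:
            return label  # an exact match decides the vote outright
        else:
            votes[label] += 1.0 / d
    return max(votes, key=votes.get)


# Hypothetical toy data: one nearby "theft" point, two farther "assault" points.
train = [((0.0, 0.0), "theft"), ((4.0, 0.0), "assault"), ((5.0, 0.0), "assault")]
query = (1.0, 0.0)

print(knn_predict(train, query, k=3, weights="uniform"))   # assault (2 votes to 1)
print(knn_predict(train, query, k=3, weights="distance"))  # theft (1.0 vs 1/3 + 1/4)
```

With uniform weights the two distant points outvote the single close one; with inverse-distance weights the close point dominates, which is the behavior the text attributes to the second weighting function.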
Distance Evaluation: For clustering, three main attributes are taken from the labeled list: age, sex, and charge description, each encoded as an integer. For each row of the labeled list, the Euclidean distance to every other row is evaluated using these attributes, and the mean of these distances gives the row distance. The obtained row distance RD is appended to the end of each row. The average over all rows is taken as the average row distance, or Euclidean distance of the complete dataset, EDD. This process is carried out using equations 1 and 2, where the subscripts 1 and 2 denote the two rows being compared.

$$RD = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \tag{1}$$

$$EDD = \sum_{k=0}^{n} RD \tag{2}$$

row of the clusters. The obtained Y-intercept regression values are examined for their maximum and minimum, which give the high and low regression ranges. These regression values are used in the next step, the fuzzy ANN model, for efficient crime prediction.

Step 4: Fuzzy Artificial Neural Network - This is the most important step of the proposed model, where the obtained maximum and minimum regression values are used to form the fuzzy crisp ranges. The difference between the maximum and minimum regression values is divided by five to get the quotient. This quotient is used as the width of the five fuzzy crisp ranges: VERY LOW, LOW, MEDIUM, HIGH, and VERY HIGH. For each regression cluster, the rows that fall in the HIGH and VERY HIGH crisp ranges are added to a list called the ANN input list. This input list is used to estimate the further prediction score.
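
The row-distance computation and the five-range segmentation of Step 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: all function names are hypothetical, EDD is computed as the average of the row distances (following the prose description), and a value equal to the maximum is assumed to fall into VERY HIGH.

```python
import math

LABELS = ["VERY LOW", "LOW", "MEDIUM", "HIGH", "VERY HIGH"]


def row_distances(rows):
    """RD per row: mean Euclidean distance from that row to every other row."""
    result = []
    for i, a in enumerate(rows):
        others = [math.dist(a, b) for j, b in enumerate(rows) if j != i]
        result.append(sum(others) / len(others))
    return result


def dataset_distance(rows):
    """EDD: average of all row distances (assumption: mean, as in the prose)."""
    rd = row_distances(rows)
    return sum(rd) / len(rd)


def fuzzy_label(value, low, high):
    """Map a regression value onto one of five equal-width crisp ranges."""
    width = (high - low) / 5  # the "quotient" described in Step 4
    if width == 0:
        return LABELS[0]  # degenerate case: all regression values are equal
    index = int((value - low) / width)
    index = max(0, min(index, 4))  # clamp so that value == high is VERY HIGH
    return LABELS[index]


# Hypothetical integer-encoded rows (two attributes, for brevity).
rows = [(0, 0), (3, 4), (6, 8)]
print(row_distances(rows))  # [7.5, 5.0, 7.5]

# Hypothetical regression values; keep only HIGH and VERY HIGH for the ANN input list.
values = [0.0, 4.0, 6.5, 9.5, 10.0]
low, high = min(values), max(values)
ann_input = [v for v in values if fuzzy_label(v, low, high) in ("HIGH", "VERY HIGH")]
print(ann_input)  # [6.5, 9.5, 10.0]
```

`math.dist` accepts points of any dimension, so the same sketch applies unchanged when all three encoded attributes (age, sex, charge description) are used.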
// Input: UCD = Charge Description, UAT = Arrest Type
//        (the comparisons below also use the user values UAN, UAG and USX)
// Output: Prediction String PSTR
Start
PSTR = "", count = 0
for i = 0 to size of PDL
    TL = ∅    [TL = Temporary List]
    RL = PDL[i]
    AN = RL[4], AG = RL[6], SX = RL[7]
    CD = RL[10], AT = RL[11]
    if (UAN == AN) then count++ end if
    if (UAG == AG) then count++ end if
    if (USX == SX) then count++ end if
    if (UCD == CD) then count++ end if
    if (UAT == AT) then count++ end if
    RF = (100 * count) / (size of PDL * 5)
    if (RF >= 0 AND RF <= 20) then PSTR = VERY LOW end if
    if (RF > 20 AND RF <= 40) then PSTR = LOW end if
    if (RF > 40 AND RF <= 60) then PSTR = MEDIUM end if
    if (RF > 60 AND RF <= 80) then PSTR = HIGH end if
    if (RF > 80 AND RF <= 100) then PSTR = VERY HIGH end if
end for
return PSTR
Stop

Performance Evaluation based on Precision and Recall

Precision and recall provide detailed information about the performance of the presented system. They are comprehensive and well-established metrics for measuring how well the system actually performs. Precision measures the relative accuracy of the presented technique.

Precision in this system is calculated as the ratio of the number of correctly predicted crimes to the total number of correctly and incorrectly predicted crimes. It is therefore a thorough assessment of the accuracy of the presented system.

Recall measures the absolute accuracy of the approach, which is distinct from precision. It is calculated as the ratio of the number of accurately predicted crimes to the total of accurately predicted crimes and crimes not predicted. This assessment is informative because it captures the absolute accuracy of the system. Precision and recall are defined mathematically below.

Precision can be explained mathematically as follows:

A = the number of accurately predicted crimes

B = the number of inaccurately predicted crimes

C = the number of crimes not predicted

So, precision can be defined as
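
Assuming the standard definitions apply to the counts A, B and C above, precision is A / (A + B) and recall is A / (A + C), usually reported as percentages. A minimal Python sketch under that assumption follows; the function names and the example counts are hypothetical and are not results from this work.

```python
def precision(a: int, b: int) -> float:
    """Percentage of predictions that were correct: 100 * A / (A + B)."""
    if a + b == 0:
        return 0.0  # no predictions were made
    return 100.0 * a / (a + b)


def recall(a: int, c: int) -> float:
    """Percentage of actual crimes that were predicted: 100 * A / (A + C)."""
    if a + c == 0:
        return 0.0  # there was nothing to predict
    return 100.0 * a / (a + c)


# Hypothetical counts, for illustration only.
A, B, C = 9, 1, 3
print(precision(A, B))  # 90.0
print(recall(A, C))     # 75.0
```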
For future research, the proposed methodology can be applied to real-time criminal activity data from police records. The accuracy of the methodology can be increased further by introducing more attributes for prediction.

REFERENCES

[1] Sunil Yadav, Meet Timbadia, Ajit Yadav, Rohit Vishwakarma, and Nikhilesh Yadav, "Crime Pattern

[10] Hao Jianzhong, Teng Yufa, Zhang Mingxue, Liu Gang, Gao Wei, "Application of Discrete Orthogonal Combinatorial Prediction in Crime Sites," 978-1-4577-2074-1/12/6.00c 2012 IEEE.

[11] Anahita Ghazvini, Mohd Zakree Bin Ahmad Nazri, Siti Norul Huda Sheikh Abdullah, Md Nawawi Junoh, "Biography Commercial Serial Crime Analysis Using Enhanced Dynamic Neural Network," 978-1-4673-9360-7/15/2015 IEEE.