
SPATIO-TEMPORAL CRIME HOTSPOT DETECTION USING HYBRID MACHINE LEARNING ALGORITHM TO IMPROVE PREDICTION ACCURACY

THEEBAN PILLAI ANBALAGU

UNIVERSITI SAINS MALAYSIA

2023

SPATIO-TEMPORAL CRIME HOTSPOT DETECTION USING HYBRID MACHINE LEARNING ALGORITHM TO IMPROVE PREDICTION ACCURACY

by

THEEBAN PILLAI ANBALAGU

Thesis submitted in fulfilment of the requirements


for the degree of
Master of Science

August 2023
ACKNOWLEDGEMENT

I would like to express my deepest gratitude to my esteemed lecturer, Dr. Sukumar, for their invaluable guidance, unwavering support, and encouragement throughout my academic journey. Dr. Sukumar's expertise and dedication have been instrumental in shaping my understanding of the subject matter and have inspired me to pursue excellence in my research. I am also profoundly thankful to Umair Butt for their mentorship and constructive feedback on building the code and evaluating the report. Their insightful suggestions and continuous motivation have significantly contributed to the success of this thesis. Their patient guidance and willingness to share knowledge have been pivotal in enhancing my research skills and critical thinking abilities. I would also like to extend my appreciation to all the faculty members of the department for providing a stimulating academic environment and fostering an atmosphere of learning. Lastly, I wish to acknowledge my family and friends for their constant encouragement and understanding throughout this academic pursuit, not forgetting my beloved late sister, Puntalir Anbalagu, for motivating and supporting me to enrol in this course after completing my degree. Their love and support have been the driving force behind my achievements. Thank you to all who have contributed to this endeavour in any capacity. Your support has been invaluable.

TABLE OF CONTENTS

ACKNOWLEDGEMENT ......................................................................................... ii

TABLE OF CONTENTS .......................................................................................... iii

LIST OF TABLES .................................................................................................... vi

LIST OF FIGURES ................................................................................................. vii

LIST OF ABBREVIATIONS ................................................................................ viii

LIST OF APPENDICES .......................................................................................... ix

ABSTRACT ................................................................................................................ x

CHAPTER 1 INTRODUCTION .......................................................................... 1

1.1 Motivation ........................................................................................................ 2

1.2 Research Questions .......................................................................................... 3

1.3 Problem Statement ........................................................................................... 3

1.4 Objective .......................................................................................................... 4

1.5 Research Contributions .................................................................................... 4

CHAPTER 2 LITERATURE REVIEW .............................................................. 5

2.1 Introduction ...................................................................................................... 5

2.2 Data Pre-Processing ......................................................................................... 5

2.3 Single Machine Learning Algorithms .............................................................. 6

2.3.1 Decision Tree and Random Forest ................................................... 6

2.3.2 Naïve Bayes...................................................................................... 7

2.3.3 Linear Regression (LR) .................................................................... 8

2.3.4 Autoregressive Integrated Moving Average (ARIMA) ................... 9

2.3.5 Kernel Density Estimation (KDE) ................................................. 10

2.3.6 Gradient Boosting (GB) ................................................................. 11

2.3.7 Long-Short Term Memory ............................................................. 12

2.4 Hybrid Deep Learning Algorithms ................................................................ 16

2.4.1 Bidirectional-LSTM (Bi-LSTM).................................................... 17

2.4.2 LSTM-CNN ................................................................................... 19

2.4.3 Bi-LSTM-CNN .............................................................................. 21

2.5 Performance Evaluation Metrics .................................................................... 23

2.6 Summary ........................................................................................................ 24

CHAPTER 3 METHODOLOGY ....................................................................... 30

3.1 Introduction .................................................................................................... 30

3.2 Experimental Setup ........................................................................................ 31

3.2.1 Hardware Setup .............................................................................. 31

3.2.2 Software Setup ............................................................................... 31

3.3 Experimental Dataset ..................................................................................... 31

3.3.1 Collection of Experimental Dataset ............................................... 31

3.3.2 Data Cleaning and Preprocessing ................................................... 32

3.4 Model Building and Training ......................................................................... 33

3.4.1 Building the Model......................................................................... 34

3.4.2 Compiling the Model ..................................................................... 35

3.4.3 Training the Model ......................................................................... 35

3.5 Model Evaluation ........................................................................................... 36

3.5.1 Root Mean Squared Error (RMSE) ................................................ 36

3.5.2 Mean Absolute Percentage Error (MAPE)..................................... 37

3.5.3 Mean Squared Error (MSE) ........................................................... 37

3.5.4 R-Squared (R²) ............................................................................... 38

3.5.5 Training Loss.................................................................................. 38

3.5.6 Accuracy of the Model Generated ................................................. 39

CHAPTER 4 RESULTS AND DISCUSSION................................................... 40

4.1 Introduction .................................................................................................... 40

4.2 Data Analysis ................................................................................................. 40

4.3 Results ............................................................................................................ 43

4.3.1 Accuracy of the Trained Models .................................................... 43

4.3.2 Training Loss.................................................................................. 43

4.3.3 RMSE ............................................................................................. 44

4.3.4 MAPE ............................................................................................. 45

4.3.5 MSE................................................................................................ 46

4.3.6 R-Squared ....................................................................................... 47

4.3.7 Time Taken for Each Algorithm for Training ................................ 48

4.4 Summary ........................................................................................................ 49

CHAPTER 5 CONCLUSION AND FUTURE RECOMMENDATIONS ...... 51

5.1 Conclusion ...................................................................................................... 51

5.2 Recommendations for Future Research ......................................................... 52

REFERENCES ......................................................................................................... 53

APPENDICES

LIST OF TABLES

Page

Table 2.1: Comparison of Algorithms ....................................................................... 28

Table 3.1: Hardware Specification ............................................................................. 31

Table 4.1: Accuracy of Trained Models .................................................................... 43

Table 4.2: Loss during Model Training ..................................................................... 44

Table 4.3: RMSE Average ......................................................................................... 45

Table 4.4: MAPE Average ......................................................................................... 45

Table 4.5: MSE Average ............................................................................................ 46

Table 4.6: R-Squared Average ................................................................................... 47

Table 4.7: Average Time Taken for Model Training ................................................. 48

LIST OF FIGURES

Page

Figure 2.1: Example of Gradient Boosting Model ..................................................... 12

Figure 2.2: LSTM Model ........................................................................................... 13

Figure 2.3: Bi-LSTM Model ...................................................................................... 17

Figure 2.4: CNN-LSTM Model ................................................................................. 20

Figure 2.5: Bi-LSTM-CNN Model ............................................................................ 22

Figure 3.1: Overall Methodology............................................................................... 30

Figure 3.2: Model Building and Training Flowchart ................................................. 33

Figure 4.1: Crime Type Distribution.......................................................................... 40

Figure 4.2: District-wise Crime Distribution ............................................................. 41

Figure 4.3: Hour-wise Crime Distribution ................................................................. 41

Figure 4.4: Year-wise Crime Distribution ................................................................. 42

Figure 4.5: Month-wise Crime Distribution............................................................... 42

Figure 4.6: Average Accuracy Comparison ............................................................... 43

Figure 4.7: Average Training Loss Comparison ........................................................ 44

Figure 4.8: Average RMSE Comparison ................................................................... 45

Figure 4.9: Average MAPE Comparison ................................................................... 46

Figure 4.10: Average MSE Comparison .................................................................... 47

Figure 4.11: Average R-Squared Value Comparison ................................................ 48

Figure 4.12: Average Time Taken for Model Training Comparison ......................... 49

LIST OF ABBREVIATIONS

ARIMA Autoregressive Integrated Moving Average


Bi-LSTM Bidirectional Long Short-Term Memory
BPD Boston Police Department
CNN Convolutional Neural Network
DDoS Distributed Denial-of-Service
DNN Deep Neural Network
DT Decision Tree
GBM Gradient Boosting Machine
KDE Kernel Density Estimation
KNN K-Nearest Neighbor
LR Linear Regression
LSTM Long Short-Term Memory
MAE Mean Absolute Error
MLP Multilayer Perceptron
MSE Mean Squared Error
NB Naïve Bayes
PCA Principal Component Analysis
ReLU Rectified Linear Unit
RF Random Forest
RMSE Root Mean Squared Error
RNN Recurrent Neural Network
SARIMA Seasonal Autoregressive Integrated Moving Average
STNN Spatio-Temporal Neural Network
SVM Support Vector Machine
TFF TensorFlow Federated

LIST OF APPENDICES

Appendix A Python Code for the 4 Algorithms

SPATIO-TEMPORAL CRIME HOTSPOT DETECTION USING

HYBRID MACHINE LEARNING ALGORITHM TO IMPROVE

PREDICTION ACCURACY

ABSTRACT

Crime hotspot detection and prediction are crucial for effective law

enforcement and proactive crime prevention methods. As a result, the goal of this

research is to find a suitable machine learning algorithm for detecting spatiotemporal

crime hotspots. Based on previous research, numerous machine learning methods such

as Decision Tree, Random Forest, Nave Bayes, Linear Regression, ARIMA, Kernel

Density Estimation, Gradient Boosting, and LSTM were explored. This paper further

investigated on hybrid models including Bi-LSTM, LSTM-CNN, and Bi-LSTM-CNN.

It was discovered that hybrid models outperform solitary models in forecasting

spatiotemporal crime hotspots. As a result, using the dataset acquired from the Boston

Police Department, a comparison was performed to establish the best

performing model. Bi-LSTM-CNN performed the best compared to the other models

by achieving the highest accuracy, highest R2 score, and lowest RMSE, MAPE,

training time and MSE. Overall, law enforcement agencies can use the Bi-LSTM-CNN

hybrid model to prevent crime more effectively.

CHAPTER 1

INTRODUCTION

The significant objective of a smart city is to improve the quality of life of its

residents by making better use of the city's resources. The dramatic alteration of urban

areas has a huge influence on cities' socioeconomic growth. Smart city infrastructure

has been developed because of technological improvements, and it primarily focuses

on the quality of citizen life, better management of urban population concerns, and

sustainability in all aspects of their lives. Smart cities have enriched human life by

leveraging technology to address socioeconomic issues such as education, health,

transportation, economics, and public safety. However, cities' rapidly expanding populations present challenges arising from the vast quantity of data created by the electronic devices used by a city's large population, including sensors, cameras and tracking devices.

Security is a critical component of a country's foundation. It is the obligation

of a country's law enforcement authorities to regulate crime incidences and crime

threats for the welfare of society. Crimes can have a tremendous influence on

a country's economic growth. As a result, countries spend a large amount of their GDP

on law enforcement agencies in order to fight crime. Thus, collaboration among

developers, research teams, legal authorities, the industrial community, and residents is critical for presenting and developing ideas to address smart city difficulties and attain smart city goals. Cities are becoming overcrowded, pushing governments to launch smart city programs to improve infrastructure management. Maintaining a safe and secure environment can be challenging for government officials. For successful

policymaking toward improved and peaceful communities, law enforcement

authorities must study crime trends and patterns.

Intelligent technologies can forecast future crimes and patterns by examining

prior crimes. Researchers can now gather and analyse massive volumes of data thanks

to the rising usage of powerful algorithms in criminal investigation. Crime detection generates patterns from existing data gathered by law enforcement about criminals, avoiding possible human error during classification and identification. This makes the analysis and prediction of crimes a quick and efficient procedure. Many existing studies

make use of artificial intelligence and machine learning to extract criminal trends and

detect crimes. Even though the data processing and classification time is rapidly

refined, accuracy is an important aspect to be considered.

Data mining techniques are becoming increasingly common in the security

sector as businesses and organizations seek to better their operations by collecting and

analysing huge volumes of data. This research examines based on the category of

crimes, time, and location of the crime occurrence. The targeted category of the crimes

are rape, murder, robbery, and physical assault, which usually happens in public areas.

These crimes can be associated directly with time and location of occurrence without

deviation.

1.1 Motivation

The motivation behind this research is to address the limitations of the

traditional crime analysis methods and explore the potential of spatio-temporal crime

prediction models. By considering the spatial and temporal aspects of crime data, we

aim to develop more accurate and efficient methods for crime hotspot predictions.

Our main aim is to increase the accuracy of crime hotspot identification by applying hybrid machine learning techniques. First and foremost, accuracy enhancement is necessary: traditional approaches may struggle to capture the intricate relationships and interactions between the various factors contributing to crime hotspots.

Hybrid algorithms have the potential to leverage the strengths of different models and

techniques, leading to more precise and reliable predictions.

Moreover, efficient resource allocation is important for crime prevention.

Hybrid machine learning algorithms can assist in optimizing resource allocation by

identifying crime hotspots with higher precision. Law enforcement agencies can

prioritize these areas, allocating personnel and surveillance systems, accordingly,

thereby maximizing the impact of crime prevention efforts.

Apart from that, assessing the efficiency and accuracy of hybrid

machine learning algorithms for crime hotspot detection is crucial for their practical

implementation, to avoid misinterpreting predictions produced by low-accuracy models.

By evaluating and comparing the performance of different algorithms, we can identify

the most effective approach in terms of both prediction accuracy and efficiency in

detecting crime hotspots.

1.2 Research Questions

• How can we increase accuracy in predicting crime hotspots for crime prevention?

• How can the existing models be enhanced to improve crime hotspot training efficiency and prediction accuracy?

1.3 Problem Statement

Crime analysis and prevention is an important aspect of maintaining public

safety. However, traditional methods of crime analysis often rely on manual inspection

of crime data, which can be time-consuming and prone to errors. To address this issue,

there is a need for an automated approach to crime hotspot detection that can analyze

spatio-temporal crime data and predict the likelihood of crime in certain locations and

times. Crime prediction algorithms for spatio-temporal crime hotspot detection using machine learning exist, but their limited performance ultimately increases the rate at which authorities fail to detect and stop crime.

1.4 Objective

The aims of this research are:

1. To study crime hotspot detection using the existing machine learning

algorithms.

2. To develop a more accurate and efficient machine learning algorithm based on

the available dataset.

3. To compare the performance of the existing machine learning algorithms with the new algorithm.

1.5 Research Contributions

The following are the research contributions:

1. A hybrid algorithm for spatio-temporal crime prediction yields higher accuracy compared to traditional models.

2. A combination of a Bi-LSTM and a CNN algorithm is developed, which results in high accuracy on the Boston Crime dataset for crime hotspot prediction.

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

Crime is a major concern in every society and identifying crime hotspots can

help prevent and reduce criminal activities. A crucial step in crime analysis and

prevention is the recognition of spatiotemporal crime hotspots. It can be helpful for law

enforcement organizations to deploy resources and create effective crime-fighting

tactics if they are able to pinpoint areas and times when criminal activity is most

prevalent. In recent years, hotspots have been identified and crime data has been

analysed using machine learning techniques. The section below summarizes the

application of various machine learning models for the detection of spatiotemporal

crime hotspots based on the prediction accuracy, dataset applied and limitations of each

model.

2.2 Data Pre-Processing

To be used in machine learning, data must be cleaned, processed, and prepared

as part of data preparation. This step is essential because missing values, outliers, or

unrelated properties in the raw data may have a major impact on the model's accuracy.

For spatio-temporal data, like the Boston Crime Dataset ((BPD), 2018), specific preprocessing steps are required to ensure optimal model training. Finding and resolving

missing or erroneous data is a common step in the data preprocessing process known as

"data cleaning". This process can be applied to eliminate inaccurate or unneeded data

as well as impute missing values with the aid of techniques like mean, median, or mode.

For the Boston Crime Dataset, it is only required to remove and drop unused columns

(Salam, 2022).

2.3 Single Machine Learning Algorithms

2.3.1 Decision Tree and Random Forest

As suggested by its name, this supervised machine learning algorithm creates a

tree-like model with decision nodes and leaf nodes. A leaf node here represents a final decision, whereas each decision node branches into two or more branches.

A decision tree can handle both categorical and continuous data. The algorithm is a simple and useful decision-making diagram: the tree provides a simple and practical way to comprehend how a decision is reached as well as to visualize the outcomes of the algorithm. A decision tree's key benefit is that it can swiftly adapt to the dataset.

The random forest is an algorithm that combines multiple decision trees and averages their outputs. This algorithm is widely used for both classification and regression problems. The forest and a decision tree share almost identical hyperparameters. Its ensemble of decision trees is built on randomly split data; the entire group may be compared to a forest in which each tree grows from an independent random sample. When there are too many trees, the random forest technique may become too slow and inefficient for real-time prediction. In contrast to a single decision tree, the random forest approach produces its findings from randomly selected observations and features across numerous decision trees.
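As a rough illustration of the difference between the two algorithms, the sketch below fits both with scikit-learn on synthetic data; the dataset, split, and hyperparameters are placeholders rather than settings from the studies cited in this section.

```python
# A minimal sketch comparing a single decision tree against a random
# forest; synthetic data stands in for a real crime dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

tree = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_train, y_train)
# n_estimators sets how many trees are averaged; more trees tend to help
# accuracy slightly but slow down real-time prediction, as noted above.
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

print("Decision tree accuracy:", tree.score(X_test, y_test))
print("Random forest accuracy:", forest.score(X_test, y_test))
```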

Based on the study conducted by Yin, Michael, & Afa (2020), the random forest model's accuracy increases with the number of decision trees used in the algorithm, although the improvement is not large. Even though the comparison between DT and RF was done, the approach suits very large datasets, since the study had reached the bottleneck of the algorithm used. Additionally,

according to another research, decision trees outperform other algorithms for the Boston

crime dataset in terms of precision, recall, and F1-score, resulting in a robust tree that

employs longitude and latitude (Aljuboori, Shaker, & Fadhil, 2022).

Moreover, the Decision Tree algorithm can also be enhanced for the Boston Crime dataset using Principal Component Analysis (PCA), a method used for dimensionality reduction that figures out how to project the initial data into fewer dimensions (Sharma, Choudhury, & Kandwal, 2021). Apart from this, another study (Jogendra, Sravani, Akhil, Sureshkumar, & Yasaswi, 2022) mentioned that the Decision Tree algorithm performs very well on the evaluation metrics (MAE, R-Squared, RMSE and Accuracy).

2.3.2 Naïve Bayes

A well-known supervised machine learning approach for classification

applications like text categorization is the Naive Bayes model. It replicates the input

distribution for one class or category and belongs to the family of generative learning

algorithms. This method is predicated on the idea that given the class, the properties of

the input data are conditionally independent, enabling the algorithm to predict outcomes

rapidly and precisely.

Naive Bayes classifiers are among the most basic Bayesian network models, yet

when used in conjunction with kernel density estimation, they may attain excellent

accuracy levels. With the aid of this method, the classifier may perform better in

challenging situations where the data distribution is ill-defined by estimating the

probability density function of the input data using a kernel function. As a result, the

naive Bayes classifier is an effective machine learning tool, especially for sentiment

analysis, spam filtering, and text categorization, among other applications.

Mathematically it can be stated:

$$p(h \mid x) = p(x \mid h) \cdot \frac{p(h)}{p(x)}$$

Equation 2.1: Naïve Bayes Equation

p(h|x) is the probability of event (h) occurring if (x) is true.

p(x|h) is the probability of event (x) occurring if (h) is true.

The Naïve Bayes model also has multiple variants such as Gaussian Naïve Bayes (GaussianNB), Multinomial Naïve Bayes (MultinomialNB) and Bernoulli Naïve Bayes (BernoulliNB). In a study (Kanimozhi, N, G, Ranjitha, & Yuvarani, 2021) comparing the three variants, Multinomial NB and Gaussian NB had the highest accuracy with quite low training times, which makes them well suited for real-time predictions as well.
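A minimal sketch of the three variants follows, assuming scikit-learn and synthetic non-negative count data rather than the Denver dataset used by Kanimozhi et al. (2021).

```python
# Fitting the three Naive Bayes variants on the same synthetic data;
# MultinomialNB expects non-negative (count-like) features.
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(500, 6)).astype(float)  # count-like features
y = rng.integers(0, 2, size=500)                      # two classes

for model in (GaussianNB(), MultinomialNB(), BernoulliNB()):
    model.fit(X, y)
    print(type(model).__name__, "training accuracy:", model.score(X, y))
```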

2.3.3 Linear Regression (LR)

Linear regression is a supervised machine learning technique that forecasts an event's outcome from the data of the independent variables by modelling relationships that can predict the outcome. The technique fits a straight line that correlates as closely as possible to the various data points. The outcome is continuous, a number; the output might include things like financial earnings or sales, the quantity of goods sold, etc. In such scenarios, there might be one or more independent variables. The mathematical notation for linear regression is:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

Equation 2.2: Linear Regression Equation

y = dependent variable
x = independent variable
β0 = intercept of the line
β1 = linear regression coefficient (slope of the line)
ε = random error
The linear relationship between a dependent (y) and one or more independent

(x) variables is shown by the Linear Regression procedure. In other words, it establishes

the way a change in the independent variable's value affects the value of the dependent

variable. Independent and dependent variables are related in a straight line with a slope.

Based on one study, linear regression was able to outperform the other three algorithms (Decision Tree, Random Forest & Neural Network) with the highest accuracy of 88% and an MAE score of 0.02538 (Mittal, Goyal, & Sethi, 2018).
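As a small, self-contained illustration of Equation 2.2, the sketch below fits a line with scikit-learn; the true intercept of 3.0 and slope of 2.0 are arbitrary values chosen for demonstration.

```python
# Fit y = b0 + b1*x + e on synthetic data and recover the coefficients.
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.arange(50, dtype=float).reshape(-1, 1)  # independent variable
y = 3.0 + 2.0 * x.ravel() + np.random.default_rng(1).normal(0, 1, 50)

model = LinearRegression().fit(x, y)
print("intercept (b0):", model.intercept_)  # close to 3.0
print("slope (b1):", model.coef_[0])        # close to 2.0
```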

2.3.4 Autoregressive Integrated Moving Average (ARIMA)

In an autoregression model, the variable of interest is predicted using a linear

combination of the variable's prior values. As suggested by the phrase "autoregression,"

it is a regression of the variable against itself. We utilize lagged values of the target

variable as our input variables to predict values for the future. An autoregression model of order p will look like:

$$m_t = o + a_1 m_{t-1} + a_2 m_{t-2} + a_3 m_{t-3} + \cdots + a_p m_{t-p}$$

Equation 2.3: Autoregression Part of the Model Equation

In the equation above, the present value of m is a linear function of its prior p values. The regression coefficients $a_1, \ldots, a_p$ are determined during training. One of the methods often used to determine the optimal value of p is looking at plots of the autocorrelation and partial autocorrelation functions. The "integrated" part represents any differencing that must be applied to make the data stationary. The data may be tested for stationarity using the Dickey-Fuller test, and after that, various differencing factors can be tried out. A differencing factor of d=1 indicates a lag of $m_t - m_{t-1}$. Instead of utilizing past values of the target variable, moving average methods use historical prediction errors in a regression-like model to forecast future values. A moving average model can be represented by the following equation:

$$m_t = o + a_1 e_{t-1} + a_2 e_{t-2} + \cdots + a_q e_{t-q}$$

Equation 2.4: Moving Average Part of the Model Equation

In the equation above, the order of the moving average component of the regression model is denoted by q, and the random residual deviations between the model and the target variable are denoted by the error term e. Since e can only be determined after the model has been fitted, it is an unobservable parameter in this case.

SARIMA, which stands for Seasonal-ARIMA, contains the forecast's

seasonality component. The significance of seasonality is obvious, yet ARIMA fails to

implicitly capture that information. The addition of Seasonality adds robustness to the

SARIMA model. Based on a study comparing ARIMA and SARIMA, SARIMA achieves better accuracy by using the seasonal component and setting the algorithm's parameters with a grid search technique (Noor, et al., 2022).
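A hedged sketch of fitting such a model with statsmodels is shown below; the (p, d, q) order, the seasonal order, and the Poisson-simulated counts are illustrative placeholders, not the grid-searched values of Noor et al. (2022).

```python
# SARIMAX covers both cases: order=(p, d, q) gives ARIMA, and adding
# seasonal_order=(P, D, Q, s) turns it into SARIMA.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Placeholder monthly crime counts (120 months of Poisson noise).
counts = np.random.default_rng(2).poisson(5, size=120).astype(float)

model = SARIMAX(counts, order=(1, 1, 1), seasonal_order=(1, 0, 1, 12))
result = model.fit(disp=False)
print(result.forecast(steps=6))  # forecast the next six periods
```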

2.3.5 Kernel Density Estimation (KDE)

The mathematical technique known as Kernel Density Estimation (KDE) is used

to calculate the probability density function of a random variable. The estimator

attempts to determine a population's characteristics based on a limited quantity of data.

It is often applied to data smoothing problems in signal processing and data science because it is a reliable approach for estimating probability density. The technique essentially

makes it possible to produce a smooth curve out of a batch of random data. However,

the estimate may also be used to generate points that appear to have come from a certain sample set. This function is particularly useful for modelling objects and project simulation.

By visualizing the data, the Kernel Density Estimation starts to shape the

distribution's curve. The distance between each point at a particular place in the

distribution is weighted to determine the curve's shape. The estimation is larger if there

are more points clustered nearby since there is a greater chance of seeing a point there.

The specific method utilized to balance the points throughout the data set is called the

kernel function. The kernel's form changes depending on its bandwidth. A smaller

bandwidth restricts the function's application space and makes the estimate curve appear

rough and jagged. The size and form of the estimate may be altered by adjusting the

kernel function's parameters such as bandwidth and amplitude.

For crimes in Bangalore, a study conducted with the KDE algorithm was able to solve the stated problems with the proposed approach, evaluating the kernel density function k over the features f at each spatial point for every distance between events (Boppuru, 2023).
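The sketch below estimates a density over synthetic (latitude, longitude) points with scikit-learn's KernelDensity; the coordinates, bandwidth, and kernel choice are illustrative assumptions only.

```python
# KDE over 2-D incident coordinates; a smaller bandwidth yields a
# rougher, more jagged density surface, as described above.
import numpy as np
from sklearn.neighbors import KernelDensity

coords = np.random.default_rng(3).normal(
    loc=[42.36, -71.06], scale=0.01, size=(300, 2))  # placeholder points

kde = KernelDensity(kernel="gaussian", bandwidth=0.005).fit(coords)
log_density = kde.score_samples(coords[:5])
print(np.exp(log_density))  # higher density suggests a likelier hotspot
```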

2.3.6 Gradient Boosting (GB)

A machine learning algorithm named Gradient Boosting (GB) is used to solve

classification and regression issues. It is an ensemble learning technique that creates a

powerful predictive model by combining several weak prediction models, often

decision trees.

The basic idea behind gradient boosting is to iteratively build an ensemble of

weak models and optimize them to minimize the errors of the previous models. The

ensemble is built by adding models sequentially, each one attempting to correct the

mistakes of its predecessors. This iterative process makes it a boosting algorithm.

Figure 2.1: Example of Gradient Boosting Model
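As a minimal sketch of the boosting idea, the scikit-learn example below adds shallow trees sequentially, each correcting its predecessors' errors; the dataset and hyperparameters are placeholders, not those of the studies discussed next.

```python
# Gradient boosting: n_estimators weak trees of depth max_depth are fit
# one after another, each on the residual errors of the ensemble so far.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gb = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                learning_rate=0.1, random_state=0)
gb.fit(X_train, y_train)
print("test accuracy:", gb.score(X_test, y_test))
```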

Tong et al. (2021) developed the Light Gradient Boosting Machine (LightGBM) to forecast crime occurrences based on a San Francisco dataset spanning 2001 to 2020. According to the authors, LightGBM was effective in predicting crime and provided accurate forecasts of crime likelihood compared to other classification models. Alsirhani et al. (2018) developed a DDoS detection framework using a Gradient Boosting technique and the Apache Spark processing engine. The authors discovered that integrating GBT with Apache Spark performed extremely well for detecting DDoS attacks with a greater depth of decision trees and a greater number of iterations. The results also show that the size of the dataset and the number of features, as well as the depth of the decision trees and the number of iterations, have a direct impact on processing delays. Lamari et al. (2020) used a gradient boosting model to forecast spatial crime occurrences, and it produced the most accurate predictions for violent crimes, property crimes, motor vehicle thefts, vandalism, and overall crime count when compared to a Poisson model and a neural network model. Khan et al. (2022) used crime data from San Francisco to compare Random Forest, Gradient Boosting Decision Tree, and Naïve Bayes models for crime prediction and prevention. It can be concluded that the GBDT performed the best after analysing the precision and recall data.

2.3.7 Long-Short Term Memory

A recurrent neural network (RNN) is an artificial neural network that uses

sequential input or time series data. Well-known programmes like Siri, voice search,

and Google Translate use these deep learning techniques. They are commonly used for

ordinal or temporal problems in speech recognition, picture captioning, and natural

language processing. Like feedforward and convolutional neural networks (CNNs), recurrent neural networks (RNNs) require training data to learn.

By retaining previous data in their memory, RNNs let earlier elements influence the current input and output. Unlike traditional neural networks, which assume that inputs

and outputs are independent of one another, recurrent neural networks' outputs are

dependent on the preceding components in the sequence. Even if they would be helpful

in determining the output of a certain sequence, future events cannot be considered in

the predictions made by unidirectional recurrent neural networks.

Recurrent neural networks (RNNs) of the Long Short-Term Memory (LSTM)

variety have gained popularity for addressing long-term dependencies in sequential data

analysis. Traditional RNNs are susceptible to the vanishing gradient problem, which

makes it challenging to learn long-term dependencies. By employing memory cells and

gates to selectively forget or remember information at certain time steps in a sequence,

LSTM is made to get around this problem. The input gate, forget gate, output gate, and

memory cell are the four main parts of the LSTM architecture.

Figure 2.2: LSTM Model

The input gate chooses which data from the input to maintain, the forget gate

chooses which data from the memory cell to erase, and the output gate chooses which

data from the memory cell to output. The memory cell gradually stores the data while

selectively updating its contents in response to input. The decision of whether to allow

input or forget a piece of information is made by the input and forget gates using

sigmoid activation functions, which have a range of 0 to 1. The memory cell updates its

state using a hyperbolic tangent function, and the output gate employs sigmoid

activation to choose which portion of the memory cell to output.
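A minimal Keras sketch of such an LSTM regressor follows; the window size, layer width, and random data are illustrative assumptions, not the tuned configuration used later in this thesis.

```python
# One LSTM layer (whose gates decide what to keep or forget) followed by
# a dense output that predicts the next value of the sequence.
import numpy as np
from keras.layers import LSTM, Dense
from keras.models import Sequential

window, n_features = 60, 1
X = np.random.rand(100, window, n_features)  # 100 sliding windows
y = np.random.rand(100, 1)                   # next-step target values

model = Sequential([
    LSTM(50, input_shape=(window, n_features)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```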

Several studies have reported the usage of an LSTM model for crime detection.

A study (Salam, 2022) comparing federated LSTM and LSTM with the same parameter values found almost identical metric values with no significant differences, whereby LSTM can still be considered a strong traditional algorithm.

In a study that employed historical crime data of public property from 2015 to

2018 in a coastal city in China, Zhang et al. (2020) proposed an LSTM-based crime

prediction model. When simply using historical crime data, the results demonstrated

that the LSTM model outperformed alternative approaches such as KNN, RF, SVM,

NB, and CNN.

Similarly, a federated long short-term memory (LSTM) model is suggested for

forecasting the frequency of crimes, and it is contrasted with a conventional LSTM

model in a study conducted by Abdul Salam (2022). Based on the Boston crime dataset

((BPD), 2018), the suggested model is created using TensorFlow Federated (TFF)

framework and the Keras API. To increase precision and reduce loss, the model's parameters

are adjusted. The results show that the federated LSTM model performs better than the

traditional LSTM model in terms of decreased loss, enhanced accuracy, and longer

training times.

Han et al. (2020) suggest a daily crime prediction model in their research using

a combination of Long Short-Term Memory Network (LSTM) and Spatial-Temporal

Graph Convolutional Network (ST-GCN) to identify high-risk regions automatically

and effectively in Chicago, America. The goal is to address crime-related issues in urban

neighborhoods. Based on the study, we can conclude that hybrid models have a better

prediction capability for crimes based on the sliding time window.

In another study by Dewan et al. (2022), a Convolutional Neural Network (CNN) and a Long-Short Term Memory (LSTM) network (thus, CLSTM-NN) were used to forecast the occurrence of criminal activity in Baltimore, USA. The results show that

while the significance thresholds for Random Forest and Ridge are both greater than

0.05, those for LSTM and the Integrated Model are both lower than 0.05. This implies

that while the findings from ridge and Random Forest are not significant, the results

from LSTM and the Integrated Model are substantial and trustworthy.

Zhuang et al. (2017) propose the Spatio-Temporal Neural Network (STNN) as

a technique for precisely anticipating crime hotspots by incorporating spatial

information. They evaluate the model using call-for-service data collected over a five-

year period between March 2012 and the end of December 2016 by the Portland,

Oregon Police Bureau (PPB). They contrast their model with the most advanced

classification methods, including Multi-Layer Perceptron, Gaussian Naive Bayes,

Random Forests, K-Nearest Neighbours, and Decision Trees. The STNN(LSTM) model

outperformed each of the conventional machine learning techniques, as can be

observed.

In a study conducted by Safat et al. (2021), several machine learning algorithms, namely LR, SVM, NB, KNN, DT, MLP, RF, and XGBoost, and time sequence models such as LSTM and ARIMA were used to forecast crime based on the criminal records

for the cities of Chicago and Los Angeles. It was found that in terms of root mean square

error (RMSE) and mean absolute error (MAE), LSTM performed reasonably well for

time series analysis compared to ARIMA on both data sets. However, the authors of

this work acknowledge several drawbacks to employing LSTM. First, since LSTM

models need a lot of training data to make accurate predictions, they performed better

when applied to predict crime using the Chicago dataset, which contains plenty of

instances, than when they were applied to the Los Angeles dataset, which contained

fewer instances. In addition, training LSTM models costs money and takes a long time.

Additionally, to reach optimal performance, LSTM models necessitate meticulous

hyperparameter tuning, which can be a difficult iterative process.

Researchers frequently integrate LSTM with other machine learning methods,

such as CNNs, random forests, or support vector machines, to get around these

constraints. These hybrid models can increase the precision of crime detection models

by utilizing the advantages of several methodologies. Overall, LSTM is an effective

method for processing temporal data, but because crime data includes both temporal

and geographical components, it might not be enough to detect crimes. The constraints

of crime detection models can be solved, and their accuracy increased by combining

LSTM with additional techniques.

2.4 Hybrid Deep Learning Algorithms

Deep Neural Networks are widely used for image classification, translating encoded facts into more comprehensible knowledge. At each layer, Deep Neural Networks (DNNs) transform the data and generate a new representation. In a classification problem, DNNs attempt to categorize the data, refining this process layer by layer until the desired result is obtained. This work may be thought of as the separation of lower-dimensional manifolds in a data space, which is in accordance with the manifold hypothesis, which claims that natural data forms lower-dimensional manifolds in its embedding space (Fefferman, Mitter, & Narayanan, 2016) (Olah, 2014).

2.4.1 Bidirectional-LSTM (Bi-LSTM)

A recurrent neural network called a Bidirectional LSTM (Bi-LSTM) is used

mostly for natural language processing. It is a valuable tool for modelling the sequential

dependencies between words and phrases in both directions of the sequence since,

unlike ordinary LSTM, the input flows in both directions and it can use information

from both sides. One more LSTM layer is added by Bi-LSTM, which changes the

information flow's direction. The additional LSTM layer's input sequence flows

backward in this case, and the outputs from the two LSTM layers are then combined in

a variety of ways, including average, sum, multiplication, and concatenation.

Figure 2.3: Bi-LSTM Model
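A minimal Keras sketch of the idea follows, with illustrative sizes and random data; wrapping an LSTM layer in Bidirectional reads the sequence in both directions, and merge_mode selects how the two directions are combined.

```python
# Bidirectional LSTM: the default merge_mode="concat" concatenates the
# forward and backward outputs; "sum", "mul", and "ave" are the other
# combination options mentioned above.
import numpy as np
from keras.layers import LSTM, Bidirectional, Dense
from keras.models import Sequential

X = np.random.rand(100, 30, 1)  # placeholder sequences
y = np.random.rand(100, 1)

model = Sequential([
    Bidirectional(LSTM(32), input_shape=(30, 1)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, verbose=0)
```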

According to Chandra et al. (2021), initially developed for word-embedding in

natural language processing, bi-directional LSTM networks (BD-LSTM) operate

similarly to BD-RNNs in accessing long-range context or state in both directions. BD-

LSTM would take inputs in two separate directions, as opposed to typical LSTM

networks, one from the past to the future and the other from the future to the past.

Reversing the information preserves state information from the future. Thus, by merging two hidden states, the network can always maintain data from the past and

the future. As a result, Chandra et al. (2021) investigated the performance of deep

learning techniques, such as simple RNN, LSTM networks, Bi-LSTM networks,

encoder-decoder LSTM networks, and CNN and it was found that bi-directional LSTM

networks with encoder-decoders perform better than other models for both simulated

and real-world time series.

Butt et al. (2022) proposed a Bi-LSTM and Exponential Smoothing (ES) hybrid for crime forecasting. Using crime statistics from 2010 to 2017 for New York City, the

suggested method is assessed. The suggested method performed better than cutting-

edge Seasonal Autoregressive Integrated Moving Averages (SARIMA) with low Mean

Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and Mean

Absolute Error (MAE). As a result, by predicting crime patterns, the suggested

technique can aid law enforcement authorities in reducing and eliminating crime.

The Bi-LSTM neural network that Deepak et al. (2021) presented categorizes

the various sorts of crime based on information gathered from Google News and

Twitter. HREACC, HANNCC, HMNCC, and GSCA were contrasted with the

suggested technique. For all four different standard datasets, it was discovered that the

suggested method beat the others with an average improvement in accuracy of 9.2%

and a very low False-Negatives Ratio value of 0.11. Thus, by using a Bi-LSTM neural

network, which is an effective option and has been trained on a sizable dataset with a

wide range of data, it is possible to classify fresh items of evidence with greater accuracy

and speed than with other methods.

Tasnim et al. (2022) conducted a study using deep learning techniques as an

efficient multi-module strategy for forecasting crime. Decision Level Fusion and

Feature Level Fusion are the two modules of the suggested methodology. The Fusion model, Stacked Bidirectional LSTM, and Temporal-based Attention LSTM are used by the first module; the training data from the first two models is used by the Fusion model. On the datasets of several cities, the key models for the transfer learning technique are the temporal-related models, hence reducing the learning model's training time. The second

module produces the final prediction using the Spatio-Temporal based Attention-

LSTM, Stacked Bidirectional LSTM, and the outcome of feature-level fusion. Based on

the information from the previous 24 hours, the suggested model forecasts the upcoming

hour. The output of the proposed model may be the estimated number of crimes in any

category for a specific location. Additionally, it gives law enforcement information on

potential criminal activity based on category, location, and time. The American cities of San Francisco and Chicago were the primary focus of the experimental analysis.

For the San Francisco and Chicago datasets respectively, the model's mean absolute error is 0.008 and 0.02, its coefficient of determination is 0.95 and 0.94, and its symmetric mean absolute percentage error is 1.03% and 0.6%. Thus, the suggested machine learning model performs better than several other well-known models, such as SARIMAX.

In conclusion, Bidirectional LSTM (BI-LSTM) has demonstrated promising

performance as a machine learning system for forecasting spatiotemporal crime. By

making use of the bidirectional feature of the LSTM architecture, bi-LSTM models may

successfully capture both past and future associations in sequential crime data.

2.4.2 LSTM-CNN

The Long Short-Term Memory and Convolutional Neural Network (LSTM-CNN) hybrid, a combination of two potent neural network models, is a suggested machine learning approach for identifying spatiotemporal crime. CNN is well suited to processing images and signals, whereas LSTM is made to cope with sequential data.

Several studies have employed the LSTM-CNN model to forecast crime in diverse

contexts.

Figure 2.4: CNN-LSTM Model
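A hedged Keras sketch of such a stack is shown below: a Conv1D layer extracts local features and an LSTM layer models their temporal order; all shapes and hyperparameters are placeholders rather than values from the studies cited here.

```python
# Convolution + pooling condense each window before the LSTM models the
# remaining sequence; a dense head produces the prediction.
import numpy as np
from keras.layers import LSTM, Conv1D, Dense, MaxPooling1D
from keras.models import Sequential

X = np.random.rand(100, 30, 1)
y = np.random.rand(100, 1)

model = Sequential([
    Conv1D(filters=64, kernel_size=2, activation="relu", input_shape=(30, 1)),
    MaxPooling1D(pool_size=2),
    LSTM(50),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, verbose=0)
```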

According to a study by Muthamizharasan and Ponnusamy (2022), combining the CNN and LSTM models can produce a reliable crime prediction approach with a high forecast accuracy. CNN was used to extract the attributes and the

LSTM to forecast the crime rate. Given that the calculated R-squared and MAE values

were 0.99 and 0.0027, respectively, the forecast is accurate (Muthamizharasan &

Ponnusamy, 2022).

As noted earlier, in the study by Dewan et al. (2022) a Convolutional Neural Network (CNN) and a Long-Short Term Memory (LSTM) network (thus, CLSTM-NN) were used to forecast the occurrence of criminal activity in Baltimore, USA. The results show that

while the significance thresholds for Random Forest and Ridge are both greater than

0.05, those for LSTM and the Integrated Model are both lower than 0.05. This implies

that while the findings from the ridge and Random Forest models are not significant,

the results from the regression models using the LSTM and the Integrated Model are

significant and reliable.

Convolutional neural networks with long short-term memories (CNN-LSTM)

were used in surveillance systems by Esan et al. (2020) to identify unusual behavioural

patterns in an academic setting. While the LSTM uses the gate mechanism to store

important information for memory, the CNN extracts the image features from the

picture frame sequences. The outcomes are contrasted with those from detection models

that are already in use, such as models built using dictionaries, motion deep nets, social

force, and probabilistic principal analysis. The outcomes demonstrate that the suggested

approach performs better than the others stated with 86% accuracy.

To conclude, various studies have demonstrated encouraging outcomes when

using LSTM-CNN machine learning algorithms for spatio-temporal crime hotspot

detection. These algorithms are capable of precisely capturing the temporal and spatial

characteristics of crime data as well as identifying high-risk locations for criminal

activity. For these algorithms to be useful in various circumstances, more study is

required.

2.4.3 Bi-LSTM-CNN

A Bidirectional-LSTM-CNN model is a model that combines the Bi-LSTM and CNN architectures. The CNN-Bi-LSTM model effectively makes predictions via the fully connected layer by selecting the spatial features of the data through the CNN, extracting the temporal information as the Bi-LSTM's input, and combining the three components (Zhuang & Cao, 2022).

Figure 2.5: Bi-LSTM-CNN Model
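A minimal Keras sketch of this hybrid follows, under assumed (illustrative) shapes and layer sizes rather than the tuned configuration of any cited study.

```python
# CNN layers pick up spatial features, a bidirectional LSTM layer the
# temporal ones, and a fully connected head makes the prediction.
import numpy as np
from keras.layers import (LSTM, Bidirectional, Conv1D, Dense, Dropout,
                          MaxPooling1D)
from keras.models import Sequential

X = np.random.rand(100, 30, 1)
y = np.random.rand(100, 1)

model = Sequential([
    Conv1D(64, kernel_size=3, activation="relu", input_shape=(30, 1)),
    MaxPooling1D(pool_size=2),
    Bidirectional(LSTM(32)),
    Dropout(0.5),
    Dense(1),  # fully connected prediction layer
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, verbose=0)
```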

The hybrid attention-based Bi-LSTM and CNN network was used by Kumar et

al. (2020) to detect offensive language in Dravidian code-mixed text.

Character embedding was done through the CNN network, and word embedding was

done via attention-based Bi-LSTM. For word embedding, FastText was developed

using language-specific code-mixed Tamil and Malayalam text for the Tamil and

Malayalam models, respectively. One-hot encoding vectors was utilised for character

encoding. After combining the output from the CNN and attention-based Bi-LSTM

layers, a softmax layer is used to predict offensive and neutral text. Hyper-parameters

can affect how well deep neural networks operate, so extensive testing was run by varying the learning rate, batch size, optimizer, epochs, loss function, and activation function. The parameters that gave the recommended system the highest performance were a learning rate of 0.001, batch size of 32, Adam as the optimizer, epochs

= 100, binary cross-entropy as the loss function, ReLU activation in the internal layers

of the network, and softmax activation at the output layer.

A Bi-attention-LSTM-CNN hybrid model was used by Guo et al. (2020) to

differentiate the discriminative characteristics of charges as an internal mapping

between fact descriptions and charges. In text classification, CNN seeks to identify

global text features while Bi-LSTM concentrates on local text features. The hybrid

model, which combines Bi-LSTM and CNN, was proposed to extract every spatiotemporal feature of the text description, since neither model can do so on its own.

As a result, the model obtained more accurate text feature information and boosted the

accuracy of text classification.

Singh et al. (2023) studied the detection of violence using advanced deep learning techniques, where ConvLSTM, 2D LSTM-CNN and 2D Bi-LSTM-CNN models were used to analyse fight scenes in surveillance videos. The ConvLSTM model was reported

to be slower and less effective overall than the other two models. The CNN Bi-LSTM

model consistently beat the CNN LSTM in the CCTV dataset and achieved the highest

degree of accuracy. In the hockey dataset, it marginally outperformed CNN LSTM.

Two-dimensional convolutional neural networks were used in both the LSTM and Bi-

LSTM models to extract features from frames. However, the Bi-LSTM layer showed

better results and was discovered to be more efficient for identification since it analysed

temporal data in both directions rather than just the forward direction.

According to previous studies, the Bi-LSTM-CNN hybrid model has great potential to be applied as a tool to study spatio-temporal data. Therefore, the hybrid Bi-LSTM-CNN model is proposed for the study of spatio-temporal crime detection.

2.5 Performance Evaluation Metrics

Butt et al. (2022) state that a machine learning model needs to be carefully

assessed to ensure its accuracy in interpreting a complex phenomenon given a small

number of data points and to investigate the proper application of the same models to

fresh datasets. In this study (Zhuang & Cao, 2022), the performance of the model in

predicting crime hotspots is validated using RMSE, MAPE, and MSE. R2 is mentioned

in another work by Butt et al. (2022) as being a crucial evaluation statistic for LSTM

hybrid models. The use of performance evaluation criteria is justified by the fact that

similar studies have frequently employed these evaluation metrics in prior research.

Additionally, MSE was chosen since it is scale-dependent and weights outliers heavily.

The large dataset in this study suggests (Butt, Letchmunan, Hassan, & Koh, 2022) that RMSE is a reasonable metric to assess the model's performance and is appropriate for LSTM-related models. Similarly, RMSE is sensitive to outliers and relies heavily on the proportion of data. It is also easily comprehensible and indirectly illustrates the model's expected accuracy.
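A small sketch of computing these metrics with scikit-learn and NumPy on placeholder predictions follows; mean_absolute_percentage_error requires a reasonably recent scikit-learn version.

```python
# RMSE, MAPE, MSE, and R-squared for a handful of dummy predictions.
import numpy as np
from sklearn.metrics import (mean_absolute_percentage_error,
                             mean_squared_error, r2_score)

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # placeholder observed values
y_pred = np.array([2.8, 5.4, 2.9, 6.6])  # placeholder model outputs

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
mape = mean_absolute_percentage_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"MSE={mse:.4f} RMSE={rmse:.4f} MAPE={mape:.4f} R2={r2:.4f}")
```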

2.6 Summary

| Ref. | Model | Accuracy | Parameters | Limitations | Dataset |
|---|---|---|---|---|---|
| (Yin, Michael, & Afa, 2020) | Decision Tree & Random Forest | 51.68% | - | The accuracy is very low and not significant | Boston Crime Dataset |
| (Ribeiro, Meneses, Costa, Miranda, & Alves, 2022) | Random Forest | 0.76 | Max depth = 12, bootstrap = true, n_estimators = 100, min samples leaf = 1, CCP alpha = 0.0, criterion = gini, max features = log2 | RF performs the best compared to other models, but the study covers a specific crime, homicides, only | Crime Data Para, Brazil |
| | Decision Tree | 0.71 | Min samples leaf = 40, criterion = entropy, splitter = random, min samples split = 2 | | |
| | Neural Network | 0.74 | Learning rate = invscaling, solver = adam, activation = tanh, epochs = 300, hidden layer size = 3 | | |
| (Zhuang, Almeida, Morabito, & Ding, 2017) | STNN-LSTM | 0.815 | Memory neurons = 20, learning rate = 0.0005, activation = relu | No limitation stated; future development will include more features in the hybrid model | CFS data, Portland |
| | Decision Tree | 0.76 | Criterion = gini, max depth = 5 | | |
| | Gaussian NB | 0.743 | - | | |
| | Random Forest | 0.7625 | Estimators = 10, min samples split = 2 | | |
| | KNN | 0.6375 | K = 1, distance measure = L2 | | |
| | Logistic Regression | 0.75 | Epochs = 300, penalty = L2 | | |
| | MLP | 0.7675 | Hidden layer size = (100, 50), learning rate = 0.001, activation = relu | | |
| (Aljuboori, Shaker, & Fadhil, 2022) | Decision Tree, Naïve Bayes, Logistic Regression | Precision, recall and F1 | No parameter info, but DT performs best on all performance metrics | To adopt gender and identity of offender as attributes | Boston Crime Dataset |
| (Salam, 2022) | LSTM / Federated LSTM | Almost similar output, but FLSTM is slightly better | Loss = Huber, optimizer = SGD/Adam, metrics = all, data size = 319073, batch size = 4, window size = 60, round number = 5 | Consider global optimization and reduce the communication overhead | Boston Crime Dataset |
| (Butt, Letchmunan, Hassan, & Koh, 2022) | ES-Bi-LSTM | Lowest MAPE, RMSE, MAE & MSE; R² | Seasonal = 24, batch size = 48, epochs = 10, optimizer = RMSProp | Use criminal-domain knowledge to improve transfer learning and increase the accuracy of crime prediction | New York City Crime Dataset |
| (Kanimozhi, N, G, Ranjitha, & Yuvarani, 2021) | Naïve Bayes | 93.07% | No parameter information | If a class label is absent, the estimated probability will be zero; use different models to enhance performance | Denver City Crime Dataset |
| (SAFAT, ASGHAR, & GILLANI, 2021) | ARIMA | No value | No parameter information | ARIMA performs better for future predictions and crime trends than LSTM; evaluated using RMSE and RAE only | Chicago and Los Angeles Crime Dataset |
| | LSTM | RMSE = 8.78, MAE = 6 | Epochs = 40, batch = 31 | | |
| (Boppuru, 2023) | Kernel Density Estimation (KDE) | 77.49% | No parameter information | Performance is similar to the ARIMA model | Bengaluru Crime Dataset |
| (Muthamizharasan & Ponnusamy, 2022) | LSTM-CNN | R² = 0.99, MAE = 0.0027 | No parameter information | Data missing on a monthly, quarterly, or seasonal basis is needed to produce a more accurate result | Chennai City Crime Dataset |
| (Noor, et al., 2022) | SARIMA | R² = 0.853, MAE = 0.066059 | Grid search method to determine the optimal parameters | Not all types of crime are used for training the model | Saudi Arabia Crime Dataset |
| (Stec & Klabjan, 2018) | Feed Forward | 71.3% | No parameter information | Accuracy drops, possibly due to fewer training examples; the cause is not clear | Chicago and Portland Crime Dataset |
| | CNN | 72.7% | | | |
| | RNN | 74.1% | | | |
| | CNN + RNN | 75.6% | | | |
| (Mittal, Goyal, & Sethi, 2018) | Decision Tree | 85.75% | No parameter information | No limitation stated | NCRB |
| | Random Forest | 88.61% | | | |
| | LR | 89.61% | | | |
| | NN | 88.31% | | | |
| (Jogendra, Sravani, Akhil, Sureshkumar, & Yasaswi, 2022) | Decision Tree + K-means | 80% | No parameter information | Recommended to use CNN or DNN in future work | Chicago Crime Dataset |
| (Kang & Kang, 2017) | DNN | 83.25% | Layer1 = 256, Layer2 = 256, Layer3 = 128, layer size = (1024, 1024, 2), activation = softmax | Unable to implement DNN with an insufficient dataset, which will cause performance degradation | Chicago Crime Dataset |
| (Anuvarshini, Deeksha, C, & Krishna, 2022) | LSTM-CNN | MAE = 24.56, RMSE = 30.11 | Kernel size = 2, Conv1D = 1, stride window size = 30, batch size = 32, epochs = 100 | Full CNN performs better than LSTM | Boston Crime Dataset |
| (Deepak, Rooban, & Santhanavijayan, 2021) | Bi-LSTM | Precision, recall, F-measure, accuracy, FNR | No parameter information | Performance varies with dataset, but averages 80% and above | Multiple crime datasets |
| (Singh, Rani, Bansal, & Techniques, 2023) | Bi-LSTM-CNN | Accuracy | MaxPooling2D, 16 filters of size 3×3 & 4×4, dropout = 0.75, Conv2D with 64 filters | Bi-LSTM-CNN has the highest performance | Image data |

Table 2.1: Comparison of Algorithms

Overall, the hybrid models performed better than the singular models (excluding the LSTM model) in previous research. Therefore, this study evaluates LSTM, Bi-LSTM, LSTM-CNN, and Bi-LSTM-CNN to determine the better-performing model for predicting spatio-temporal crime hotspots. The hybrid models Bi-LSTM and LSTM-CNN achieved convincing accuracy and training loss in previous studies based on structured crime datasets. According to previous studies, Bi-LSTM-CNN has shown great potential in classifying spatio-temporal data, despite the small number of studies on spatio-temporal crime hotspot prediction using the Bi-LSTM-CNN model. Hence, this study assesses the hybrid Bi-LSTM-CNN model for spatio-temporal crime hotspot prediction. Based on the best model, evaluation metrics such as RMSE, MAPE, MSE, and R² score can be used to obtain an optimal comparison result.

CHAPTER 3

METHODOLOGY

3.1 Introduction

The overall workflow for this experiment, shown in Figure 3.1, proceeds through five stages: experimental setup (hardware and software), collection of the experimental dataset, dataset preprocessing, model building and training, and model evaluation.

Figure 3.1: Overall Methodology

3.2 Experimental Setup

3.2.1 Hardware Setup

The experiment was conducted using the hardware specified below:

| Components | Specification |
| CPU | Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz |
| GPU | NVIDIA Quadro RTX 3000 |
| Storage | 250GB SSD + 2TB HDD |
| RAM | 32.0 GB |

Table 3.1: Hardware Specification

3.2.2 Software Setup

Using the Keras package, Python code was developed to implement all four algorithms for predicting spatio-temporal crime hotspot locations. TensorFlow was used to select between CPU and GPU cores for training and testing. Below is the list of dependencies and libraries installed prior to code execution (a matching import block is sketched after the list):

• Jupyter Notebook
• Keras Layers (LSTM, Conv1D, MaxPooling1D, Flatten, Dense, Dropout,
Bidirectional)
• Keras Optimizer (Adam)
• Tensorflow
• Pandas
• Numpy
• Matplotlib
• Sklearn
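
For reference, below is a minimal import block matching the dependency list above; it is a sketch that assumes a TensorFlow 2.x installation with the standalone Keras API, and the final line simply lists the devices (CPU and/or GPU) that TensorFlow can train on.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from keras.models import Sequential
from keras.layers import (LSTM, Conv1D, MaxPooling1D, Flatten, Dense,
                          Dropout, Bidirectional)
from keras.optimizers import Adam
from sklearn.model_selection import train_test_split

# List the devices TensorFlow can use, so training can target CPU or GPU.
print(tf.config.list_physical_devices())
```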

3.3 Experimental Dataset

3.3.1 Collection of Experimental Dataset

This dataset was collected from the Boston Police Department (BPD), which documents the initial details surrounding incidents to which BPD officers respond ((BPD), 2018). The dataset contains records from the crime incident report system from June 14, 2015 to September 3, 2018, with a reduced set of fields focused on capturing the type of incident as well as when and where it occurred. The dataset consists of 17 attributes (columns) and 327,820 records (rows), obtained as a CSV file.

Description of attributes:

• INCIDENT_NUMBER: Crime case number registered by the police.

• OFFENSE_CODE: Code for the specific kind of crime.

• OFFENSE_CODE_GROUP: Name of the crime activity conducted.

• OFFENSE_DESCRIPTION: Detailed specification of the crime.

• DISTRICT: Police district in Boston where the crime occurred.

• REPORTING_AREA: Location from which the incident was reported to the police.

• SHOOTING: “Y” means a shooting occurred during the crime.

• OCCURRED_ON_DATE/YEAR/MONTH/DAY_OF_WEEK/HOUR: Time of the crime.

• UCR_PART: Rank of the crime; Part 1 is the highest rank.

• STREET/LATITUDE/LONGITUDE/LOCATION: Location where the crime happened.

3.3.2 Data Cleaning and Preprocessing

First, the dataset was checked for empty cells. The “SHOOTING” column only contains ‘Y’ values, so the empty cells were filled with ‘N’. Some rows were also shifted during export; 2,434 such rows were removed to clean the dataset. The dataset was then partitioned into 80% for training the models and the remaining 20% for testing and evaluating them. A sketch of these steps is shown below.
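
The following is a minimal sketch of these cleaning and partitioning steps. The CSV file name and the rule used to detect shifted rows are assumptions for illustration; the column names follow the attribute list in Section 3.3.1.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("crime.csv", encoding="latin-1")  # file name assumed

# The SHOOTING column only records 'Y'; fill the empty cells with 'N'.
df["SHOOTING"] = df["SHOOTING"].fillna("N")

# Drop shifted rows -- detected here as rows whose latitude field is not
# numeric (this detection rule is an assumption for illustration).
df = df[pd.to_numeric(df["LATITUDE"], errors="coerce").notna()]

# Partition 80% for training and the remaining 20% for testing.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
```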

3.4 Model Building and Training

Below are the 4 algorithms planned to be built to generate training model and

test the Boston Crime Dataset. The following sections describe the hyperparameters

used for this project.

Figure 3.2: Model Building and Training Flowchart

3.4.1 Building the Model

Below is the flow for generating and testing each model:

1. A sequential model was initialized using the Sequential() function, to stack layers one after another.

2. The following layers were added for each respective model:

a. LSTM
i. An LSTM layer with 50 memory units.
ii. The input shape was determined by the number of time steps and the number of features, which are 1 for both.

b. Bi-LSTM
i. A Bi-LSTM layer was added to the model; the layer wraps an LSTM layer with 50 memory units.
ii. The input shape was determined by the number of time steps and the number of features, which are 1 for both.

c. LSTM-CNN
i. A 1D convolutional layer with 32 filters and a kernel size of 3, using the ReLU activation function.
ii. The input shape was determined by the number of time steps and the number of features, which are 1 for both.
iii. A 1D max pooling layer with a pool size of 2.
iv. An LSTM layer with 50 memory units.

d. Bi-LSTM-CNN
i. A 1D convolutional layer with 32 filters and a kernel size of 3, using the ReLU activation function.
ii. The input shape was determined by the number of time steps and the number of features, which are 1 for both.
iii. A 1D max pooling layer with a pool size of 2.
iv. A Bi-LSTM layer with 50 memory units.

3. A dropout layer with a rate of 0.2 was added to all the models.

4. A dense layer with a number of units equal to the number of classes in the target variable, using the SoftMax activation function. A Keras sketch of the Bi-LSTM-CNN variant is shown below.
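
The sketch below builds the Bi-LSTM-CNN variant following steps 1 to 4; the other three models follow the same pattern, dropping the convolutional front end and/or the Bidirectional wrapper. The number of classes is illustrative, and padding='same' is an assumption added so the convolution and pooling layers compose with a single time step.

```python
from keras.models import Sequential
from keras.layers import LSTM, Conv1D, MaxPooling1D, Dense, Dropout, Bidirectional

n_steps, n_features, n_classes = 1, 1, 10   # n_classes is illustrative

model = Sequential()
# 1D convolution: 32 filters, kernel size 3, ReLU activation.
model.add(Conv1D(32, kernel_size=3, activation="relu", padding="same",
                 input_shape=(n_steps, n_features)))
# 1D max pooling with a pool size of 2.
model.add(MaxPooling1D(pool_size=2, padding="same"))
# Bidirectional wrapper around an LSTM layer with 50 memory units.
model.add(Bidirectional(LSTM(50)))
# Dropout layer with a rate of 0.2.
model.add(Dropout(0.2))
# Dense output layer: one unit per class, SoftMax activation.
model.add(Dense(n_classes, activation="softmax"))
```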

3.4.2 Compiling the Model

The model was compiled after the layers were added:

1. The optimizer was set to ‘Adam()’

2. The loss function was set to ‘categorical_crossentropy’.

3. The metric used to evaluate the model during training is set to accuracy.

3.4.3 Training the Model

The compiled model was trained based on the declared layers:

1. The input dataset was reshaped so that its dimensions match the input shape of the model layers used: LSTM, Bi-LSTM, LSTM-CNN, and Bi-LSTM-CNN.

2. The categorical target variable (y_train) was converted into one-hot encoded vectors.

3. The epochs parameter was set to 50, which is the number of times the entire dataset is passed through the model during training.

4. The batch size, which is the number of samples used in each gradient update, was set to 32.

5. Optionally, progress display was also enabled (verbose=1). A sketch of the compile and training steps follows this list.
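
Below is a minimal sketch of the compile and training steps, assuming X_train and y_train come from the 80% split in Section 3.3.2 and that model, n_steps, n_features, and n_classes are defined as in the building sketch above.

```python
import numpy as np
from keras.optimizers import Adam
from keras.utils import to_categorical

# Compile: Adam optimizer, categorical cross-entropy loss, accuracy metric.
model.compile(optimizer=Adam(), loss="categorical_crossentropy",
              metrics=["accuracy"])

# Reshape inputs to (samples, time steps, features) and one-hot encode
# the categorical target variable.
X_train = np.asarray(X_train).reshape((-1, n_steps, n_features))
y_train_ohe = to_categorical(y_train, num_classes=n_classes)

# Train for 50 epochs with a batch size of 32, displaying progress.
history = model.fit(X_train, y_train_ohe, epochs=50, batch_size=32, verbose=1)
```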

3.5 Model Evaluation

This entails refining the model's parameters and assessing the model's effectiveness using validation data. The usefulness of the model in identifying crime hotspots is assessed using a variety of measures, including Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), Mean Squared Error (MSE), R-squared (R²), training loss, and accuracy. The model is then tested on new data to evaluate its generalizability and robustness. The final step involves analysing the results of the experiments and drawing conclusions about the effectiveness of the proposed approach.

3.5.1 Root Mean Squared Error (RMSE)

The square root of the mean of the square of all the errors is known as the root

mean squared error (RMSE). RMSE is regarded as a superior all-purpose error metric

for numerical forecasts. Since RMSE is scale-dependent, it should only be used to

compare prediction errors of various models or model configurations for a single

variable and not between variables. It evaluates how well a regression line matches the

observed data. The RMSE calculation formula is:

$$RMSE = \sqrt{\frac{\sum_{i=1}^{N}\left(Predicted_i - Actual_i\right)^2}{N}}$$

Equation 3.1: RMSE Equation

$Predicted_i$ = the predicted value for the i-th observation.
$Actual_i$ = the observed (actual) value for the i-th observation.
N = total number of observations.

3.5.2 Mean Absolute Percentage Error (MAPE)

The mean absolute percentage error (MAPE) measures the accuracy of a forecasting technique. It is the average of the absolute percentage errors over all items in a dataset, and may be used to assess how accurate the predicted quantities are in comparison with the actual values. MAPE, which requires dataset values other than zero, can often be used to examine large datasets successfully. The formula is shown below, where n is the sample size:

$$MAPE = \frac{1}{n}\sum\left|\frac{actual - forecast}{actual}\right| \times 100$$

Equation 3.2: MAPE Equation

3.5.3 Mean Squared Error (MSE)

The mean squared error (MSE) assesses how closely a regression line matches a collection of points. The distances between the points and the regression line, referred to as the "errors", are squared. Squaring removes negative signs and gives greater weight to larger differences. Because a group of errors is averaged, this error measure is called the mean squared error. The forecast becomes more precise as the MSE decreases.

$$MSE = \frac{1}{n}\sum\left(actual - forecast\right)^2$$

Equation 3.3: MSE Equation

n = number of observations
Σ = summation notation
actual = original or observed y-value
forecast = y-value from the regression

3.5.4 R-Squared (R²)

R-squared is a statistical evaluation metric used to assess the goodness of fit of a regression model. It measures both the degree to which the regression model accurately predicts the observed data and the percentage of the dependent variable's variation that can be accounted for by the model's independent variables. R-squared is commonly used to understand the performance of regression models and to compare different models.

$$R^2 = 1 - \frac{Sum\ of\ Squared\ Residuals\ (SSR)}{Total\ Sum\ of\ Squares\ (SST)}$$

Equation 3.4: R-Squared Formula

SSR: the sum of the squared differences between the predicted and the actual values.

SST: the sum of the squared differences between the actual values and the mean of the dependent variable.

R-squared is calculated by taking the proportion of the explained variance to the

total variance. A value of 1 for R-squared indicates that the regression model perfectly

fits the data, explaining all the variability. A score of 0 on the other hand signifies that

the model does not account for any variability and is essentially comparable to

forecasting the dependent variable's mean. R-squared can also take on negative values,

this typically happens when the model is poorly fitted or overfit.
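
As an illustration, the four regression metrics defined in Sections 3.5.1 to 3.5.4 can be computed with scikit-learn as sketched below; y_true and y_pred are assumed to be numeric arrays of actual and predicted values for the held-out test set.

```python
import numpy as np
from sklearn.metrics import (mean_squared_error,
                             mean_absolute_percentage_error, r2_score)

mse = mean_squared_error(y_true, y_pred)               # Equation 3.3
rmse = np.sqrt(mse)                                    # Equation 3.1
# scikit-learn returns MAPE as a fraction; multiply by 100 for percent.
mape = mean_absolute_percentage_error(y_true, y_pred)  # Equation 3.2
r2 = r2_score(y_true, y_pred)                          # Equation 3.4
```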

3.5.5 Training Loss

Categorical cross-entropy is a loss function used in machine learning models for classification tasks with multiple classes; it is calculated for all the proposed models during evaluation. It measures the dissimilarity, or error, between the predicted class probabilities and the true class labels, quantifying the difference between the predicted probability distribution and the true distribution. The equation for categorical cross-entropy loss is as follows:

$$loss = -\sum\left(y_{true} \cdot \log\left(y_{pred}\right)\right)$$

Equation 3.5: Training Loss Equation

Given:

• $y_{true}$: true labels (one-hot encoded vectors)

• $y_{pred}$: predicted probabilities for each class

Keras is used to calculate the categorical cross-entropy loss function for the model. Thus, during training, the loss is calculated using this equation for each batch of training samples. A small worked example follows.
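
Below is a worked example of Equation 3.5 for a single three-class sample; the probability values are illustrative.

```python
import numpy as np

y_true = np.array([0.0, 1.0, 0.0])   # one-hot true label (class 2)
y_pred = np.array([0.1, 0.8, 0.1])   # predicted class probabilities

# Only the true class contributes: loss = -log(0.8), approximately 0.223.
loss = -np.sum(y_true * np.log(y_pred))
print(loss)
```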

3.5.6 Accuracy of the Model Generated

The accuracy is calculated from the predictions made by the model on the test dataset input features (X_test) and the corresponding true labels from the categorical data (y_test). Below is the generic equation for calculating accuracy:

$$Accuracy = \frac{\text{number of correctly classified samples}}{\text{total number of samples}}$$

Equation 3.6: Formula to Calculate Accuracy

The evaluate() method calculates the loss and accuracy for the provided test data (X_test) and the true labels (y_test), which are converted into one-hot encoded vectors. In summary, the accuracy is obtained directly from the model evaluation process and indicates the proportion of samples in the test dataset that were correctly classified. A minimal sketch of this step is shown below.
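
The sketch below illustrates this evaluation step, assuming X_test and y_test come from the 20% split in Section 3.3.2 and that model, n_steps, n_features, and n_classes are defined as in Section 3.4.

```python
import numpy as np
from keras.utils import to_categorical

X_test = np.asarray(X_test).reshape((-1, n_steps, n_features))
y_test_ohe = to_categorical(y_test, num_classes=n_classes)

# evaluate() returns the loss and the accuracy metric declared at compile time.
loss, accuracy = model.evaluate(X_test, y_test_ohe, verbose=0)
print(f"test loss = {loss:.4f}, test accuracy = {accuracy:.4f}")
```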

CHAPTER 4

RESULTS AND DISCUSSION

4.1 Introduction

This section presents the results obtained from the dataset used to predict crime hotspots using LSTM, Bi-LSTM, LSTM-CNN, and Bi-LSTM-CNN. The models are evaluated using RMSE, MAPE, MSE, R-squared, accuracy, and training loss in order to compare their performance and choose the best-performing model.

4.2 Data Analysis

Below is the total number of crimes categorized under each crime type.

Figure 4.1: Crime Type Distribution

Figure 4.2: District-wise Crime Distribution

Figure 4.3: Hour-wise Crime Distribution

Figure 4.4: Year-wise Crime Distribution

Figure 4.5: Month-wise Crime Distribution

4.3 Results

4.3.1 Accuracy of the Trained Models

Accuracy (%)

| Models | Training 1 | Training 2 | Training 3 | Training 4 | Training 5 | Average |
| LSTM | 96.9 | 94.3 | 97.2 | 92.7 | 93.0 | 94.82 |
| Bi-LSTM | 90.4 | 85.5 | 91.7 | 86.3 | 92.5 | 89.28 |
| LSTM-CNN | 93.9 | 97.8 | 89.2 | 97.8 | 96.2 | 94.98 |
| Bi-LSTM-CNN | 98.8 | 98.6 | 96.6 | 94.8 | 93.2 | 96.4 |

Table 4.1: Accuracy of Trained Models

Figure 4.6: Average Accuracy Comparison

4.3.2 Training Loss

Loss During Training

| Models | Training 1 | Training 2 | Training 3 | Training 4 | Training 5 | Average |
| LSTM | 0.108 | 0.133 | 0.105 | 0.145 | 0.185 | 0.1352 |
| Bi-LSTM | 0.264 | 0.355 | 0.200 | 0.320 | 0.188 | 0.2654 |
| LSTM-CNN | 0.128 | 0.091 | 0.257 | 0.079 | 0.097 | 0.1304 |
| Bi-LSTM-CNN | 0.051 | 0.056 | 0.112 | 0.136 | 0.137 | 0.0984 |

Table 4.2: Loss during Model Training

Figure 4.7: Average Training Loss Comparison

4.3.3 RMSE

RMSE

| Models | Training 1 | Training 2 | Training 3 | Training 4 | Training 5 | Average |
| LSTM | 2.454 | 4.125 | 2.824 | 4.258 | 4.755 | 3.6832 |
| Bi-LSTM | 4.443 | 3.674 | 3.524 | 3.720 | 4.206 | 3.9134 |
| LSTM-CNN | 3.216 | 2.487 | 4.127 | 2.122 | 2.22 | 2.8344 |
| Bi-LSTM-CNN | 1.920 | 2.490 | 2.561 | 2.178 | 3.860 | 2.6018 |

Table 4.3: RMSE Average

Figure 4.8: Average RMSE Comparison

4.3.4 MAPE

MAPE

| Models | Training 1 | Training 2 | Training 3 | Training 4 | Training 5 | Average |
| LSTM | 0.0637 | 0.0670 | 0.0299 | 0.0828 | 0.0655 | 0.06178 |
| Bi-LSTM | 0.0737 | 0.103 | 0.0357 | 0.01504 | 0.0170 | 0.04889 |
| LSTM-CNN | 0.0421 | 0.022 | 0.0526 | 0.0211 | 0.0250 | 0.03256 |
| Bi-LSTM-CNN | 0.0180 | 0.025 | 0.0257 | 0.0618 | 0.061 | 0.03132 |

Table 4.4: MAPE Average

Figure 4.9: Average MAPE Comparison

4.3.5 MSE

MSE

| Models | Training 1 | Training 2 | Training 3 | Training 4 | Training 5 | Average |
| LSTM | 6.026 | 17.020 | 7.977 | 18.133 | 22.614 | 14.354 |
| Bi-LSTM | 19.747 | 13.501 | 12.424 | 13.844 | 17.691 | 15.4414 |
| LSTM-CNN | 10.346 | 6.190 | 17.036 | 4.505 | 4.970 | 8.6094 |
| Bi-LSTM-CNN | 3.688 | 6.204 | 6.559 | 4.747 | 14.902 | 7.22 |

Table 4.5: MSE Average

Figure 4.10: Average MSE Comparison

4.3.6 R-Squared

R-Squared

| Models | Training 1 | Training 2 | Training 3 | Training 4 | Training 5 | Average |
| LSTM | 0.975 | 0.930 | 0.967 | 0.926 | 0.908 | 0.9412 |
| Bi-LSTM | 0.919 | 0.945 | 0.949 | 0.943 | 0.928 | 0.9368 |
| LSTM-CNN | 0.957 | 0.975 | 0.9307 | 0.981 | 0.979 | 0.96454 |
| Bi-LSTM-CNN | 0.985 | 0.974 | 0.973 | 0.980 | 0.946 | 0.9716 |

Table 4.6: R-Squared Average

Figure 4.11: Average R-Squared Value Comparison

4.3.7 Time Taken for Each Algorithm for Training

Each model was run five times, with epochs set to 50 each time. Below is a comparison of the time taken to train each model, in seconds.

| Models | Training 1 | Training 2 | Training 3 | Training 4 | Training 5 | Average |
| LSTM | 1142.0 | 1183.13 | 1300.50 | 1168.66 | 1298.41 | 1218.54 |
| Bi-LSTM | 1429.61 | 1433.36 | 1481.91 | 1387.33 | 1444.75 | 1435.39 |
| LSTM-CNN | 823.2 | 812.15 | 840.09 | 802.90 | 888.34 | 833.336 |
| Bi-LSTM-CNN | 979.12 | 977.45 | 1099.96 | 994.04 | 986.88 | 1007.49 |

Table 4.7: Average Time Taken for Model Training

Figure 4.12: Average Time Taken for Model Training Comparison

4.4 Summary

Based on the data analysis, the type of crime was taken as the categorical target (y_test) for this experiment. In the crime type distribution, Motor Vehicle Accident Response is the most frequently reported crime. District B2 has the highest number of reported crimes, while district A15 has the lowest. Crimes were also analysed along the timeline, by hour, month, and year. Throughout the day, the highest crime rates are reported between 5 pm and 7 pm. The year-wise distribution shows no clear trend, so crime rates may rise or fall in the coming years. Month-wise, months 7, 8, and 9 record the highest number of crimes.

Among the models trained and compared, Bi-LSTM-CNN delivers convincing performance on all evaluation metrics relative to the other algorithms. Bi-LSTM-CNN records the highest average training accuracy (96.4%) and average R-squared value (0.9716), although the margins over the LSTM and LSTM-CNN models are not large. For training loss, Bi-LSTM-CNN has the lowest average (0.0984). Bi-LSTM-CNN also has a low average RMSE (2.6018), followed by LSTM-CNN (2.8344). There is little difference between the average MAPE values recorded for Bi-LSTM-CNN (0.03132) and LSTM-CNN (0.03256). For MSE, Bi-LSTM-CNN has the lowest recorded value (7.22) of the four models.

To choose the optimal model, a further evaluation based on training time was also carried out. Although LSTM-CNN has a better average training time of around 833 s, Bi-LSTM-CNN takes an additional 170 s and produces better results on the other evaluation metrics; since training is a one-time cost, this trade-off is acceptable.

CHAPTER 5

CONCLUSION AND FUTURE RECOMMENDATIONS

5.1 Conclusion

Crime hotspot identification is critical for increasing public safety and law

enforcement operations. The capacity to detect and anticipate high-crime regions

allows law enforcement organizations to better allocate resources and conduct targeted

crime prevention initiatives. The development of sophisticated data analysis

techniques and machine learning algorithms has resulted in substantial advances in

spatiotemporal crime hotspot detection in recent years.

The study focuses on spatiotemporal crime hotspot detection with a hybrid machine learning method to increase forecasting performance. Various machine learning algorithms were investigated to narrow the focus of the investigation, and it was found that hybrid machine learning algorithms outperform traditional machine learning algorithms in forecasting spatiotemporal crime hotspots. This research compares a non-hybrid model, LSTM, with hybrid machine learning models including Bi-LSTM, LSTM-CNN, and Bi-LSTM-CNN. Bi-LSTM-CNN outperformed the other models in terms of accuracy, R-squared value, RMSE, MAPE, and MSE. An additional training-time evaluation was also performed to show the differences among the models: LSTM-CNN has a shorter training time, but the additional training time of the Bi-LSTM-CNN model is compensated by its better accuracy. This demonstrates that the hybrid model, Bi-LSTM-CNN, may be employed for spatiotemporal crime hotspot detection, leading to improved crime prevention when deployed by police agencies.

5.2 Recommendations for Future Research

For future work, it is recommended to implement the Bi-LSTM-CNN algorithm to plot hotspots on a map of Boston, making the model more usable in real-life applications for the crime department. It is also recommended to include datasets from different regions to test the generality of the algorithm. Another idea is to develop a programme that fetches daily real-time data from the crime department and provides accurate hotspots based on the most recent information.

REFERENCES

(BPD), B. P. (2018). Crimes in Boston. Retrieved from


https://www.kaggle.com/datasets/AnalyzeBoston/crimes-in-boston
Aljuboori, F., Shaker, H., & Fadhil, A. (2022). A Crime Data Analysis of Prediction Based on Classification Approaches. Baghdad Science Journal, 19(5), 1073-1077.
Alsirhani, A., Sampalli, S., & Bodorik, P. (2018). DDoS Detection System: Utilizing
Gradient Boosting Algorithm and Apache Spark. 2018 IEEE Canadian
Conference on Electrical & Computer Engineering (CCECE).
Annie, S. S., Pathmanaban, J., Kingsley, S., Sriman, B., K, S. N., & E, S. K. (2023,
March 14). Prediction and Prevention Analysis Using Machine Learning
Algorithms for Detecting the Crime Data. 2022 1st International Conference
on Computational Science and Technology (ICCST), (pp. 986-991).
Anuvarshini, S. R., Deeksha, N., C, D. S., & Krishna, S. K. (2022). Crime Forecasting
: A Theoretical Approach. 2022 IEEE 7th International Conference on Recent
Advances and Innovations in Engineering (ICRAIE).
Boppuru, P. R. (2023, February). Geo-spatial crime density attribution using optimized
machine algorithms. International Journal of Information Technology.
doi:10.1007/s41870-023-01160-7
Butt, U. M., Letchmunan, S., Hassan, F. H., & Koh, T. W. (2022, September 7). Hybrid
of deep learning and exponential smoothing for enhancing crime forecasting
accuracy.
Cai, C., Tao, Y., Zhu, T., & Deng, Z. (2021). Short-Term Load Forecasting Based on Deep Learning Bidirectional LSTM Neural Network.
CHANDRA, R., GOYAL, S., & GUPTA, R. (2021). Evaluation of deep learning
models for multi-step ahead time series prediction.
Deepak, G., Rooban, S., & Santhanavijayan, A. (2021). A knowledge centric
hybridized approach for crime classification incorporating deep bi-LSTM
neural network. Multimedia Tools and Applications (2021), 28061–28085.
Dewan, A., Islam, K. M., Fariha, T. R., Murshed, M. M., Ishtiaque, A., Adnan, M. S.,
. . . Chowdhury, M. B. (2022). Spatial Pattern and Land Surface Features
Associated with Cloud-to-Ground Lightning in Bangladesh: An Exploratory
Study. Earth Systems and Environment (2022), 437-451.
Esan, D. O., Owolawi, P., & Tu, C. (2020). Detection of Anomalous Behavioural
Patterns In University Environment Using CNN-LSTM. 2020 International
Conference on Computational Science and Computational Intelligence
(CSCI), (pp. 29-35).
Fefferman, C., Mitter, S., & Narayanan, H. (2016). Testing the manifold hypothesis.
Journal of the Amer, 29(4), 983–1049.
Guo, J., Wu, B., & Zhou, P. (2020). 2020 IEEE Fifth International Conference on Data
Science in Cyberspace. Cyberspace (DSC). Beijing, China.
HAN, X., HU, X., WU, H., SHEN, B., & WU, J. (2020). Risk Prediction of Theft
Crimes in Urban Communities: An Integrated Model of LSTM and ST-GCN.
8, 217222-217230.
Huang, C., Zhang, J., Zheng, Y., & Chawla, N. V. (2018). DeepCrime: Attentive
Hierarchical Recurrent Networks for Crime Prediction. The 27th ACM

International Conference on Information and Knowledge Management (CIKM
’18). Turin, Italy.
Jogendra, K., Sravani, M., Akhil, M., Sureshkumar, P., & Yasaswi, V. (2022). Crime Rate Prediction Based on K-means Clustering and Decision Tree Algorithm. Computer Networks and Inventive Communication Technologies, 451-462.
Kang, H.-W., & Kang, H.-B. (2017, April 17). Prediction of crime occurrence from
multimodal data using deep learning. Dept. of Digital Media, Catholic
University of Korea, Bucheon, Gyonggi-Do, Korea.
Kanimozhi, N., N, V. K., G, S. P., Ranjitha, G., & Yuvarani, S. (2021). Crime Type and Occurrence Prediction Using Machine Learning Algorithm. International Conference on Artificial Intelligence and Smart Systems. doi:10.1109/ICAIS50930.2021.9395953
Khan, M., Ali, A., & Alharbi, Y. (2022). Predicting and Preventing Crime: A Crime Prediction Model Using San Francisco Crime Data by Classification Techniques.
Kumar, A., Saumyab, S., & Singh, J. P. (2020). NITP-AI-NLP@HASOC-Dravidian-
CodeMix-FIRE2020: A Machine Learning Approach to Identify Offensive
Languages from Dravidian Code-Mixed Text. Forum for Information Retrieval
Evaluation. Hyderabad, India.
Lamari, Y., Freskura, B., Abdessamad, A., Eichberg, S., & Bonviller, S. d. (2020).
Predicting Spatial Crime Occurrences through an Efficient Ensemble-Learning
Model. International Journal of Geo-Information, 9(645).
Mittal, M., Goyal, L. M., & Sethi, J. K. (2018). Monitoring the Impact of Economic
Crisis on Crime in India Using Machine Learning. 1469-1485.
Muthamizharasan, M., & Ponnusamy, R. (2022). Forecasting Crime Event Rate with
a CNN-LSTM Model. Innovative Data Communication Technologies and
Application, 461-470.
Noor, T. H., Almars, A. M., Alwateer, M., Almaliki, M., Gad, I., & Atlam, E.-S.
(2022). SARIMA: A Seasonal Autoregressive Integrated Moving Average
Model for Crime Analysis in Saudi Arabia. Electronics 2022, 11(3986).
Olah, C. (2014, April 6). Neural Networks, Manifolds, and Topology. Retrieved from
https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
Ribeiro, J., Meneses, L., Costa, D., Miranda, W., & Alves, R. (2022). Prediction of Homicides in Urban Centers: A Machine Learning Approach. 344-361.
SAFAT, W., ASGHAR, S., & GILLANI, S. A. (2021, May 6). Empirical Analysis for
Crime Prediction and Forecasting Using Machine Learning and Deep Learning
Techniques. 9, 70080-70094.
Salam, M. A. (2022, April). Time Series Crime Prediction Using a Federated Machine
Learning Model.
Sharma, H. K., Choudhury, T., & Kandwal, A. (2021). Machine learning based
analytical approach for geographical analysis and prediction of Boston City
crime using geospatial dataset. GeoJournal.
Singh, S., Rani, I., Bansal, P., & Techniques, A. S. (2023). Designing of an Efficient Model for Violence. 2023 International Conference on Advancement in Computation & Computer Technologies (InCACCT), (pp. 533-538).
Stec, A., & Klabjan, D. (2018, June 5). Forecasting Crime with Deep Learning.
TASNIM, N., IMAM, I. T., & HASHEM, M. M. (2022). A Novel Multi-Module
Approach to Predict Crime Based on Multivariate Spatio-Temporal Data Using
Attention and Sequential Fusion Model. 10, 48009-48030.

Tong, X., Ni, P., Li, Q., Yuan, Q., Liu, J., Lu, H., & Li, G. (2021). Urban Crime Trends
Analysis and Occurrence Possibility Prediction based on Light Gradient
Boosting Machine. 2021 IEEE 4th International Conference on Big Data and
Artificial Intelligence.
Ye, X., Duan, L., & Peng, Q. (2021). Spatiotemporal Prediction of Theft Risk with
Deep Inception-Residual Networks. Smart Cities, 4, 204 - 216.
Yin, J., Michael, I. A., & Afa, I. J. (2020, February 9). Machine Learning Algorithms
for Visualization and Prediction Modeling of Boston Crime Data. p. 15.
ZHANG, X., LIU, L., & XIAO, L. (2020). Comparison of Machine Learning
Algorithms for Predicting Crime Hotspots. 8, 181302-181310.
Zhuang, W., & Cao, Y. (2022). Short-Term Traffic Flow Prediction Based on CNN-BILSTM with Multicomponent Information. 12.
Zhuang, Y., Almeida, M., Morabito, M., & Ding, W. (2017). Crime Hot Spot Forecasting. IEEE International Conference on Big Knowledge. Lowell, Massachusetts.

APPENDICES

APPENDIX A PYTHON CODE FOR THE 4 ALGORITHMS
