Crime Hotspot Prediction Using Machine Learning v4

Crime Hotspot Prediction using Machine-Learning Algorithms
Abstract
Crime rates have been increasing in many cities over the past few years; therefore,
analyzing hotspots and proactively preventing them is critical. Crime and incident data are being
collected by the police data initiative and are available for the public to encourage joint problem-
solving.
This project focuses on analyzing and predicting hotspots for three major cities (Orlando,
Fort Lauderdale and Gainesville) in Florida using classification. The data were cleansed and
mapped based on the National Incident-Based Reporting System (NIBRS) crime types. Tableau
Public software was used to create a visual report with the ability to analyze data along various
dimensions for all the cities. Classification was done using four different algorithms for Fort
Lauderdale and Gainesville in the open source data mining tool Weka 3.8.
The ability to visualize crimes by NIBRS standard crime type at various times allows the
user to analyze multiple scenarios and identify hotspots quickly. Crime spot prediction was done
successfully by all four algorithms and the results were compared to each other to identify the
best one. Classification using k-nearest neighbor was the most precise (60.39%) for Fort
Lauderdale, whereas Ada Boosting was the most precise (84.19%) for Gainesville.
Introduction
Recent statistics show that the crime rate is increasing in some cities and is affecting the
quality of life in those communities [3]. In the past few years, some cities have been
implementing Data-Driven Approaches to Crime and Traffic Safety (DDACTS) to improve
public safety by decreasing crime and traffic crashes [1]. DDACTS is a model developed through
a partnership between the US Department of Transportation and the US Department of Justice. It
emphasizes locating crime “hotspots” so that law enforcement can be deployed effectively in
those areas. In 2015, the White House announced the “Smart Cities” initiative to help
communities tackle local challenges and improve city services [5]. As part of this, the Police Data
Initiative was identified as a key initiative that can provide local authorities with data to improve
community policing [6]. An immense amount of crime data is being collected as part of this
initiative [9]. Predicting crime from this historical data can help police departments
proactively monitor high-crime areas. This research focuses on predicting crime spots using
visualization and machine learning algorithms for three major cities in Florida for which data
are currently available in the Police Data Initiative repository.
Incident datasets from Orlando, Fort Lauderdale and Gainesville are used for prediction.
The data are visualized using Tableau Public software and analyzed to identify crime “hotspots” by
crime type for all three cities. Four classification algorithms are used to
classify the datasets on a binary class, Crime Status. The machine learning algorithms considered are
Decision Tree, Naïve Bayes, K-Nearest Neighbor and Boosting. The prediction
results of all four algorithms for the Fort Lauderdale and Gainesville incident data were recorded
experimentally and analyzed to identify the best algorithm to use. The Orlando data cannot be used
by these algorithms for prediction because its incident data include only crimes and thus do not
provide examples of both classes for training the models.

Tools
• Weka: Weka is a suite of machine learning software written in Java, developed at the
University of Waikato, New Zealand, for data mining tasks.
• Tableau Public: Tableau Public is a free service that allows the user to load and
analyze data visually. It also publishes the visualizations on the web.
• Data Sets
o For visualization and prediction, city incident data with the case ID, longitude
and latitude of the incident, street name and NIBRS case type were downloaded
from the Police Data Initiative site.
Approach
Data Collection
The data were collected from the Police Data Initiative (PDI) open data site. PDI is a law
enforcement community that promotes the use of open data to improve public safety through
collaboration between law enforcement agencies, technologists, and researchers. Currently,
around 130 law enforcement agencies have released more than 200 data sets, and the
inventory is growing. Many kinds of data sets are available on the site, such as accident and crash
data, incident data, complaints, and officer-involved shootings. Incident data include
information about incidents where the police department responds to an offense and a report of a
crime is generated. In Florida, Gainesville, Orlando, and Fort Lauderdale are the cities that have
provided incident data so far.
Data Cleansing
In data mining, data cleansing is an essential step that prepares the data by removing or
correcting incomplete or inconsistent records. In our datasets, crime type is not consistent across
the cities. For example, Fort Lauderdale has 938 different crime type categories whereas Orlando
has only 24. This is because each law enforcement agency classifies incidents
according to its own offense definitions. To analyze and visualize crimes across multiple cities,
it is important that crime types are standardized across them. NIBRS provides an offense lookup
table listing various types of crime and the NIBRS crime category covering each offense. Crime types
in all the data sets were mapped to NIBRS types by defining a mapping table for them.
Geographical longitude and latitude are critical for this study, but they were not provided as
dedicated attributes in every data set. For example, in the Gainesville set, street names and
geo-coordinates are listed in one field. This field was parsed and the appropriate substrings were
extracted for analysis.
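The two cleansing steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure used in the study: the raw offense labels in the mapping table are invented, and the layout of the combined location field (street text followed by "(latitude, longitude)") is assumed, since the actual Gainesville format is not given here.

```python
import re

# Hypothetical mapping table: agency-specific offense label -> NIBRS category.
# The labels on the left are invented for illustration only.
NIBRS_MAP = {
    "ROBBERY - STRONG ARM": "Robbery",
    "ROBBERY W/ FIREARM": "Robbery",
    "DUI": "Driving under the Influence",
}

# Assumed layout of a combined field: street text followed by "(lat, lon)".
COORD_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def to_nibrs(raw_label):
    """Return the NIBRS category for a raw label, or None if it is unmapped."""
    return NIBRS_MAP.get(raw_label.strip().upper())


def split_location(field):
    """Split a combined location field into (street, latitude, longitude)."""
    match = COORD_RE.search(field)
    if match is None:
        # No coordinates found: keep the street text and leave the rest empty.
        return field.strip(), None, None
    street = field[:match.start()].strip()
    return street, float(match.group(1)), float(match.group(2))
```

Returning None for unmapped labels and missing coordinates keeps incomplete records visible so that they can be reviewed or dropped explicitly.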
Data Exploration
Data exploration is an initial data analysis step to better understand the data and its
specific characteristics. Visual exploration is well suited to this because it allows the analyst to
quickly absorb large amounts of information. All three data sets were explored using
Tableau Public software.

Model Building
Prediction models were built using machine learning classification algorithms.
Classification is a supervised learning method used to predict a certain outcome based on a given
input. In general, there are two steps in data classification. The first step is to build a classifier
model from data with a predetermined set of classes. This is a “learning” process in which the
model is trained on known data. In this step, each instance of the data must be
labeled with an appropriate class label. This dataset is called the training set. Because the class label
of each instance in the dataset is provided to the model, this is known as “supervised” learning. In
the second step, the model is used to predict the class labels for the test set. The test set has
all the attributes except the class label, and the classification model predicts the class label for
each of its instances. Many data mining algorithms can be used. Various algorithms were analyzed
and prioritized based on their limitations and their ability to classify using supervised learning [13].
Algorithm                 Limitations   Classification   Supervised Learning   Final Score
C4.5 Decision Tree        4             5                5                     17
K-means Clustering        3             5                1                     12
Support Vector Machines   3             5                5                     15
Apriori                   4             1                1                     11
EM                        5             1                1                     10
PageRank                  2             5                1                     12
Ada Boosting              5             5                5                     18
K-nearest neighbor        4             5                5                     18
Naïve Bayes               4             5                5                     17
CART                      3             5                5                     16
Based on the final score, the following four classification algorithms are used to predict the
crime hotspots:
1. Decision Tree
2. Naïve Bayes
3. K-Nearest Neighbor
4. Ada Boosting
The following sections give a general description of how each algorithm works.
Decision Trees
A decision tree is a widely used algorithm in data mining. It uses a flowchart-like
structure to predict class labels based on several input attributes [6]. Decision trees have three
kinds of nodes:
1. Root Node – The topmost node in the tree, with no incoming edges and zero or more
outgoing edges
2. Internal Node – A node that has exactly one incoming edge and two or more outgoing edges
3. Leaf Node – A node that has exactly one incoming edge and no outgoing edges
In this tree, each internal node represents a test on an input attribute that separates records
with different characteristics, and each branch represents an outcome of that test. Each leaf node
of the tree represents a class label. The classification problem is resolved by asking a series of
questions (root or internal nodes) until a conclusion about the class label is reached (leaf node).
The diagram above shows a simple example of a decision tree that classifies a species as a mammal
or non-mammal [12]. In this example, two attributes, body temperature and gives birth, are
considered for classification. During training, the tree is built recursively.
For the root split, all the attributes are considered and the training data are divided into groups
based on each candidate split. In the example, the training data are first split on body temperature
and the cost of the split is calculated; the step is then repeated with the other attribute, gives
birth. The attribute with the lowest cost becomes the root node. Many measures can
be used to calculate the cost of a split. Classification error (number of incorrectly classified
instances / total number of instances) is one of them. The process continues recursively with the
other attributes until a stopping condition is met. One stopping condition is a minimum number
of training instances in each leaf. Another is a maximum depth for the tree, where
depth is the length of the longest path from the root to a leaf. Once the model is built,
each instance in the test set is classified starting at the root node. Test conditions are applied to
the record and the appropriate branch is followed until a leaf node with a class label is reached.
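As a minimal sketch of the root-split selection described above, the following Python computes the classification error of each candidate split and picks the attribute with the lowest cost. The six toy records are invented for illustration and the attribute names are hypothetical; the sketch covers a single split only, not a full recursive tree builder.

```python
from collections import Counter


def classification_error(rows, attribute, label="label"):
    """Fraction of rows misclassified if each branch predicts its majority class."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[label])
    wrong = 0
    for labels in groups.values():
        majority_count = Counter(labels).most_common(1)[0][1]
        wrong += len(labels) - majority_count
    return wrong / len(rows)


def best_split(rows, attributes):
    """Pick the attribute whose split has the lowest classification error."""
    return min(attributes, key=lambda a: classification_error(rows, a))


# Toy records (invented for illustration).
animals = [
    {"body_temperature": "warm", "gives_birth": "yes", "label": "mammal"},
    {"body_temperature": "warm", "gives_birth": "yes", "label": "mammal"},
    {"body_temperature": "warm", "gives_birth": "no", "label": "non-mammal"},
    {"body_temperature": "cold", "gives_birth": "no", "label": "non-mammal"},
    {"body_temperature": "cold", "gives_birth": "yes", "label": "non-mammal"},
    {"body_temperature": "cold", "gives_birth": "yes", "label": "non-mammal"},
]

root = best_split(animals, ["body_temperature", "gives_birth"])
```

On these toy records the split on body temperature misclassifies one of six instances and the split on gives birth misclassifies two of six, so body temperature is chosen as the root.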
Naïve Bayes Classifier
Naïve Bayes is a machine learning classification algorithm based on Bayes' probability
theorem. Let X denote the attribute set of the input instances and Y denote the class variable. By
Bayes' theorem:
$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}$$
• P(Y|X) is the posterior probability: the probability of hypothesis Y given the data X
• P(X|Y) is the class conditional probability: the probability of the data X given that
hypothesis Y is true
• P(Y) is the prior probability: the probability of hypothesis Y being true
regardless of the data X
• P(X) is the probability of the data regardless of the hypothesis
P(X) is constant when comparing posterior probabilities for different values of Y
and thus can be ignored. The prior probability can be calculated from the training set by dividing
the number of training records that belong to a class by the total number of instances. For
example, if the dataset has the same number of instances in each class, the prior probability of
each class is the same. The class conditional probability is the probability of each input value
given each class value. It is calculated by dividing the frequency of each attribute value for a
given class value by the frequency of instances with that class value. The Naïve Bayes classifier
estimates the class conditional probability P(X|Y) under the assumption that the attributes are
conditionally independent, given the class label y [12].
It is called naïve because it estimates the conditional probability of each Xi given Y
separately rather than attempting to calculate the joint probability of every combination of
attribute values, P(x1, x2, x3|Y) [7]. It assumes that the attributes are conditionally independent
and calculates P(x1|Y), P(x2|Y), etc. This can be represented as
$$P(X \mid Y = y) = \prod_{i=1}^{d} P(X_i \mid Y = y)$$
where each attribute set X = {X1, X2, …, Xd} consists of d attributes. To classify a test record, the
naïve Bayes classifier computes the posterior probability for each class Y:
$$P(Y \mid X) = \frac{P(Y) \prod_{i=1}^{d} P(X_i \mid Y)}{P(X)}$$
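The frequency-based estimates described above can be sketched in Python for categorical attributes. This is an illustrative sketch, not the Weka implementation used in the study: the attribute values and labels are invented, P(X) is omitted because it is the same for every class, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class)][value] -> frequency
    value_counts = defaultdict(Counter)
    for record, y in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, y)][value] += 1
    return class_counts, value_counts


def posterior_scores(record, class_counts, value_counts):
    """Return P(Y) * prod_i P(X_i | Y) for each class (P(X) is omitted)."""
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        score = n_y / total  # prior P(Y)
        for i, value in enumerate(record):
            score *= value_counts[(i, y)][value] / n_y  # P(X_i | Y)
        scores[y] = score
    return scores


def predict(record, class_counts, value_counts):
    """Return the class with the highest unnormalized posterior."""
    scores = posterior_scores(record, class_counts, value_counts)
    return max(scores, key=scores.get)


# Toy data (invented): attributes are (time of day, area).
records = [("night", "downtown"), ("night", "suburb"), ("day", "downtown"),
           ("day", "suburb"), ("night", "downtown"), ("day", "suburb")]
labels = ["crime", "crime", "non-crime", "non-crime", "crime", "non-crime"]
class_counts, value_counts = train_naive_bayes(records, labels)
scores = posterior_scores(("night", "downtown"), class_counts, value_counts)
```

For the record ("night", "downtown") the score for "crime" is 1/2 × 3/3 × 2/3 = 1/3, while the score for "non-crime" is 0 because "night" never occurs with that class in the toy data. An attribute value unseen for a class therefore zeroes out the whole product, which practical implementations usually avoid with smoothing.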
Ada Boosting
Boosting is an ensemble method that can improve classification accuracy by
aggregating the predictions of multiple classifiers. Multiple training sets are created by
resampling the original data, and a classifier is built from each training set using any standard
classification algorithm (for example, decision trees) [12]. The Ada Boosting algorithm attempts to
boost the accuracy of any underlying classification algorithm by assigning a weight to each
training sample and adaptively changing the weights at the end of each round. The steps of the
boosting algorithm are as follows:
1. Apply the same weight to all the instances in the training set
2. Create a training set by sampling with replacement
3. Train using any classifier and calculate the accuracy
4. Reset the weights for all samples. Assign higher weights to incorrectly classified instances so
that classifiers can focus on those in the next round
5. Repeat from step 3 until a set number of rounds is completed or the desired accuracy is reached
The following diagram provides a visual representation of how boosting works [8].
In the diagram above, the original data set D1 starts with equal weights for all data
points. The first trained classifier labels one ‘+’ and two ‘-’ points incorrectly. In the next round,
those data points are assigned higher weights (drawn larger than the rest of the data points).
The second classifier focuses on predicting them correctly because of their higher weights. This
continues round by round until the final classifier is obtained by combining
the multiple classifiers to obtain better accuracy.
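The reweighting idea can be sketched in Python as follows. This sketch is not the Weka implementation used in the study, and it differs from the steps above in one respect that should be stated plainly: instead of resampling, each weak classifier (a one-dimensional threshold "stump") is trained directly on the weighted data. The classifier weight alpha uses the standard AdaBoost formula, which is not given in the text, and the ten data points are a toy example.

```python
import math


def train_stump(xs, ys, weights):
    """Find the threshold/polarity stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x <= threshold else -polarity for x in xs]
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=3):
    """Return a list of (alpha, threshold, polarity) weak classifiers."""
    n = len(xs)
    weights = [1.0 / n] * n  # equal weights for all instances
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = max(error, 1e-10)  # guard against division by zero
        alpha = 0.5 * math.log((1 - error) / error)  # standard AdaBoost weight
        ensemble.append((alpha, threshold, polarity))
        # Raise the weights of misclassified points, lower the others.
        preds = [polarity if x <= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def boosted_predict(ensemble, x):
    """Combine the weak classifiers by a weighted vote."""
    score = sum(a * (pol if x <= thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy data: labels are +1 / -1; no single threshold separates them.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
```

On this toy data each single stump misclassifies at least three points, but the weighted vote of the three stumps classifies all ten training points correctly.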
K-Nearest Neighbor Classifier
The nearest neighbor classifier is an instance-based learner that makes predictions
using the specific training instances “closest” to the test instance. Such classifiers are also
called “lazy learners” and do not require model building. However, the classification process is
quite expensive because each test instance is classified by computing its proximity to the training
examples. The most common class label among its K nearest neighbors is chosen.
In the above diagram (a), classification is based on one nearest neighbor and the test instance is
assigned the label ‘-’. In (c), the test instance is assigned ‘+’ because two of the three neighbors
have ‘+’ as a label. In (b), where there is a tie, the algorithm randomly chooses one of the
labels for the test instance [2].
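A minimal Python sketch of this voting rule follows. It assumes Euclidean distance as the proximity measure, which the text does not specify, and uses four invented points arranged so that k = 1, 2 and 3 reproduce the three cases described above.

```python
import math
import random
from collections import Counter


def knn_predict(train_points, train_labels, query, k, rng=random):
    """Label `query` by majority vote among its k nearest training points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    top = max(votes.values())
    tied = sorted(label for label, count in votes.items() if count == top)
    return rng.choice(tied)  # random tie-break, as described in the text


# Toy data (invented): distances from the query (0, 0) are 1, 2, 2.5 and about 7.07.
points = [(1.0, 0.0), (2.0, 0.0), (0.0, 2.5), (5.0, 5.0)]
point_labels = ["-", "+", "+", "-"]
query = (0.0, 0.0)
```

With k = 1 the query is labeled ‘-’, with k = 3 it is labeled ‘+’, and with k = 2 the vote is tied and either label may be returned. Sorting every training point for each query also illustrates why classification is expensive for a lazy learner.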
Model Evaluation
Once the model is built by a classifier, its performance can be evaluated using several
metrics based on the counts of instances that the model labels correctly and incorrectly.
The confusion matrix is a table that summarizes the performance of a
model. In the notation below, the rare class is denoted as the positive class and the majority
class is denoted as the negative class.

                     Predicted Class
                     +       -
Actual Class    +    TP      FN
                -    FP      TN

• True positive (TP): Number of positive examples that are correctly predicted by the
model
• False negative (FN): Number of positive examples that are wrongly predicted as negative
by the model
• False positive (FP): Number of negative examples that are wrongly predicted as positive
by the model
• True negative (TN): Number of negative examples that are correctly predicted by the
model
In this paper, the following metrics are used to evaluate and compare the classification models [7].
Accuracy
Accuracy is the proportion of test set instances that the model predicts correctly.

$$\text{Accuracy, } a = \frac{TP + TN}{TP + TN + FP + FN}$$
Error rate
Error rate is the proportion of test set instances that the model predicts incorrectly.

$$\text{Error Rate, } e = \frac{FP + FN}{TP + TN + FP + FN}$$
Recall
Recall is the fraction of positive instances that are correctly predicted by the classifier.
It is also called the true positive rate. A large recall value means that the model misclassifies
few positive instances as negative.

$$\text{Recall, } r = \frac{TP}{TP + FN}$$
Precision
Precision is the fraction of records that are actually positive among those the classifier
predicted as positive. The higher the precision, the fewer false positive errors the classifier
makes.

$$\text{Precision, } p = \frac{TP}{TP + FP}$$
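The four metrics follow directly from the confusion matrix counts, as this short Python sketch shows. The counts passed in at the end are invented for illustration and are not results from this study.

```python
def evaluate(tp, fn, fp, tn):
    """Compute the four metrics from confusion matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


# Invented counts for illustration.
metrics = evaluate(tp=40, fn=10, fp=20, tn=30)
```

For these counts the accuracy is 70/100 = 0.7, the error rate is 30/100 = 0.3, the recall is 40/50 = 0.8 and the precision is 40/60, or about 0.667. Accuracy and error rate always sum to 1.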
Using these measures, the performance of a model needs to be evaluated on a test set
for which the labels are already known. When no separate test set is
available, alternative methods can be used [10]. The first is the holdout method,
in which the original data set is split into two disjoint sets. One is used as the training set and
the other as the test set. The proportion of the split between training and test sets is decided by
the analyst based on the problem at hand. It can be 1:1, 2:1, 3:1 or any other ratio. This method
has a few limitations. First, fewer instances are available for training the model because some
portion of the data is kept for testing. Conversely, if the training share of the split is too large,
the test accuracy will be less reliable. Moreover, the training and test sets are subsets of the
same original data, which means that a class overrepresented in one will be underrepresented in the
other. An alternative method that is widely used is “k-fold cross validation”. In this approach,
every instance is used for training as well as for testing. The data are randomly divided into k
subsets. K-1 subsets are used as the training set and one subset is used as the test set. The
procedure is repeated k times so that each partition is used exactly once for testing. The overall
accuracy is calculated by taking the mean of the accuracy from the k
runs. 10-fold cross validation is the variant most commonly used to evaluate or compare classifiers.
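The partitioning behind k-fold cross validation can be sketched in Python as follows. The helper names and the `score_fold` callback are hypothetical, and this is not how Weka implements the procedure; the sketch only shows that each instance lands in exactly one test fold and that the k per-fold scores are averaged.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # random assignment to folds
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(n, k, score_fold):
    """Average `score_fold(train, test)` over the k folds."""
    scores = [score_fold(train, test) for train, test in k_fold_indices(n, k)]
    return sum(scores) / len(scores)


# Example: 25 instances, 10 folds.
splits = list(k_fold_indices(25, 10))
```

In practice `score_fold` would train a classifier on the training indices and return its accuracy on the test indices.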
Results
For all the cities, the data are mapped using the longitude and latitude of the incident location.
These points are superimposed on a US geographical map to visualize the hotspots by NIBRS crime
type. Crime spots are weighted by the number of incidents that occurred at a specific
geographical location. This gives a visual view of “hotspots” in the cities. Selections can be made
by year of the crime, month of the crime or NIBRS crime type for further analysis.
The visualization provides parameters that allow users to select a subset of the data for further
analysis based on a scenario. For example, if the police department would like to focus on
drug-related crime, the chart allows selection of that specific crime type and helps to identify
the hotspots for targeted action.
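The weighting and filtering described above were done in Tableau Public. As a rough Python analogue only, the sketch below counts incidents per rounded coordinate cell with optional filters. The field names, the rounding precision and the sample coordinates are all assumptions made for illustration.

```python
from collections import Counter


def hotspot_counts(incidents, crime_type=None, year=None, precision=3):
    """Count incidents per rounded (latitude, longitude) cell, with optional filters."""
    counts = Counter()
    for inc in incidents:
        if crime_type is not None and inc["nibrs_type"] != crime_type:
            continue
        if year is not None and inc["year"] != year:
            continue
        cell = (round(inc["lat"], precision), round(inc["lon"], precision))
        counts[cell] += 1
    return counts


# Invented sample incidents (hypothetical field names and coordinates).
incidents = [
    {"lat": 26.1221, "lon": -80.1439, "nibrs_type": "Robbery", "year": 2016},
    {"lat": 26.1219, "lon": -80.1441, "nibrs_type": "Robbery", "year": 2016},
    {"lat": 26.1667, "lon": -80.1172, "nibrs_type": "Robbery", "year": 2017},
    {"lat": 26.1221, "lon": -80.1439, "nibrs_type": "Fraud", "year": 2016},
]
robbery_cells = hotspot_counts(incidents, crime_type="Robbery")
```

Here the two nearby 2016 robberies fall into the same cell, which therefore carries a weight of 2 and ranks as the top robbery hotspot in the sample.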
Visualization
Fort Lauderdale Top 10 Crimes

NIBRS Type                    Number of Crimes   % of Crimes
Robbery                       26,768             40.61%
Disorderly Conduct            13,374             20.29%
Assault                       6,081              9.23%
Fraud                         5,361              8.13%
Trespassing                   5,000              7.59%
Drug/ Narcotic                3,885              5.89%
Motor Vehicle Theft           2,427              3.68%
Family Offenses               2,300              3.49%
Liquor Violations             431                0.65%
Driving under the Influence   285                0.43%
In the city of Fort Lauderdale, robbery is the top crime and the following three areas are
“hotspots” for this crime: West Broward Blvd/ NW 25th Ave, E Oakland Park Blvd/ N Federal
Hwy and E Sunrise Blvd/ NE 4th Avenue.
Orlando Top 10 Crimes

NIBRS Type                    Number of Crimes   % of Crimes
Larceny                       43,018             27.19%
Motor Vehicle Theft           25,334             16.01%
Burglary                      24,239             15.32%
Drug/ Narcotics               16,880             10.67%
Assault                       16,700             10.56%
Stolen Property               15,407             9.74%
Extortion                     7,264              4.59%
Robbery                       6,114              3.86%
Fraud                         2,997              1.89%
Arson                         244                0.15%
In Orlando, larceny is the top crime and the visualization shows that it occurs most often at
Orlando International Airport, followed by Orlando International Premium Outlets and Conroy
Rd/ Eastgate Dr (next to The Mall at Millenia).
Gainesville Top 10 Crimes

NIBRS Type                    Number of Crimes   % of Crimes
Robbery                       11,122             32.70%
Assault                       5,801              17.05%
Curfew/ loitering             4,029              11.84%
Drug/ narcotics               3,238              9.52%
Disorderly conduct            2,784              8.18%
Stolen property               2,541              7.47%
Embezzlement                  1,947              5.72%
Fraud                         1,419              4.17%
Driving under the influence   617                1.81%
Runaway                       518                1.52%
In Gainesville, the top crime is robbery and the visualization shows that the “hotspots” for
robbery are at NE 12th Ave/ NE 19th Ter, followed by NW 23rd/ NW 34th and SW Archer Rd/ SW
34th St.
Prediction
The table below provides basic statistics of the datasets that were used in this study.
Dataset           Total # of Incidents   # of Crimes   % of Crimes   # of Non-Crimes   % of Non-Crimes
Fort Lauderdale   163,765                73,953        45.16         89,812            54.84
Gainesville       49,516                 8,888         17.95         40,628            82.05
Orlando           158,459                158,459       100           0                 0
In this study, 10-fold cross validation is used to obtain the results. The tool randomly
divides the data into 10 separate subsets. Nine of them are used for training and the remaining one
is used for testing. The process is repeated 10 times so that each subset is used as a test set
exactly once, and the average of the results is calculated. The table below provides the results of
the classification using the various algorithms.
Dataset           Algorithm          Accuracy (%)   Error Rate (%)   Precision (%)   Recall (%)
Fort Lauderdale   Decision Tree      57.53          42.47            59.45           70.98
Fort Lauderdale   Boosting           57.53          42.47            58.97           74.15
Fort Lauderdale   Nearest Neighbor   58.10          41.90            60.39           68.58
Fort Lauderdale   Naïve Bayes        55.69          44.31            57.97           69.84
Gainesville       Decision Tree      82.05          17.95            82.05           100.00
Gainesville       Boosting           80.01          19.99            84.19           93.13
Gainesville       Nearest Neighbor   79.13          20.87            84.01           92.10
Gainesville       Naïve Bayes        80.63          19.37            82.80           96.42
For Fort Lauderdale, K-Nearest Neighbor (KNN) provided the best accuracy (58.10%) and the
lowest error rate (41.90%). The best precision was also achieved by KNN, with 60.39%.
The recall of Boosting (74.15%) was the best among the four classifiers compared.
For Gainesville, Decision Tree provided the best accuracy (82.05%) and the lowest error rate
(17.95%). The best precision was achieved by Boosting, with 84.19%. The recall of Decision
Tree (100%) was the best among the four classifiers compared.
Prediction results are mapped for both Fort Lauderdale and Gainesville, based on the
results of the algorithm with the best precision. The pictures below highlight the predicted hotspots.
Conclusion
The aim of this project was to create an application to predict crime hotspots using
visualization and classification for three Florida cities. To achieve that, incident data for the
cities were obtained, mapped using visualization software and classified using data mining
algorithms. Visualization was done successfully and the application allows hotspot analysis
along multiple dimensions such as time and crime type. In addition, four different algorithms were
used to build classification models that can predict hotspots, and the models were compared using
evaluation techniques. Based on accuracy, k-nearest neighbor was the most effective algorithm
for Fort Lauderdale and decision tree was the most effective algorithm for Gainesville. When
precision is considered, boosting is the better algorithm for prediction in Gainesville and
k-nearest neighbor is best for Fort Lauderdale.

Future Research
The current research provides a baseline for how machine learning algorithms can be
used for proactive policing. This study can be extended by adding more Florida
cities as their incident data become available in the Police Data Initiative repository. It could
potentially become a single interface for users to quickly access crime spots for various cities. The
predictions can be enhanced by adding more features to the dataset. Adding the actual weather
at the time of each incident may provide more insight. For example, there were more
crimes in certain areas when there was a hurricane. Adding weather information can help the
model identify such patterns. Similarly, adding census data for the incident location may also
be beneficial. In particular, augmenting the dataset with median income level, ethnic
population distribution, and education level may improve the prediction of hotspots.
Limitations
The tool used for prediction (Weka 3.8) did not have a one-class classifier. Orlando's data
did not include all incidents; it contained only crime data. A one-class
classifier could have been used to predict the hotspots for Orlando. Some of the data points in the
datasets are “outliers” relative to most of the other data points for the city. For example,
some incident data recorded for Gainesville had geo-coordinates in Utah. These outliers tend to
limit the model learning process.
Applications
This application is a beneficial tool, not only for police officers but also for the
public. It provides a visualization of crime spots that can be further analyzed using multiple
parameters such as time, season and crime type. For example, police officers can filter the data for
the months when the “snowbird” population is higher in retirement community areas and see whether
there is a different crime pattern. They can also use the prediction models to proactively predict
hotspots and patrol those areas more actively. Members of the public can use it to understand
crime spots and apply that knowledge when they are considering an area in which to buy a property,
or to be more alert in problem areas while visiting.
References
1. [Special issue]. (2010). Geography Public Safety, 2(3). Retrieved from
https://www.nij.gov/topics/technology/maps/documents/gps-bulletin-v2i3.pdf
2. Brownlee, J. (2016, April 11). Naive Bayes for machine-learning [Online forum post].
Retrieved from https://machinelearningmastery.com/naive-bayes-for-machine-learning/
3. Castillo, M. (2015, June 4). Is a new crime wave on the horizon? CNN. Retrieved from
http://www.cnn.com/2015/06/02/us/crime-in-america/
4. Criminal Justice Information Services Division Uniform Crime Reporting Program.
(n.d.). Retrieved from https://ucr.fbi.gov/nibrs/nibrs-user-manual
5. FACT SHEET: Administration Announces New “Smart Cities” Initiative to Help
Communities Tackle Local Challenges and Improve City Services [Press release]. (2015,
September 14). Retrieved from
https://obamawhitehouse.archives.gov/the-press-office/2015/09/14/fact-sheet-administration-announces-new-smart-cities-initiative-help
6. FACT SHEET: Announcing Over $80 million in New Federal Investment and a Doubling
of Participating Communities in the White House Smart Cities Initiative [Press release].
(n.d.). Retrieved from
https://obamawhitehouse.archives.gov/the-press-office/2016/09/26/fact-sheet-announcing-over-80-million-new-federal-investment-and
7. Gunawardena, T. (2016, September 4). K Nearest Neighbors. Retrieved January 7, 2018,
from https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
8. Marsh, B. (2016, September). Multivariate Analysis of the Vector Boson Fusion Higgs
Boson. Retrieved from
https://www.researchgate.net/profile/Brendan_Marsh3/publication/306054843_Multivariate_Analysis_of_the_Vector_Boson_Fusion_Higgs_Boson/links/57ac9d6508ae7a6420c2ffa8/Multivariate-Analysis-of-the-Vector-Boson-Fusion-Higgs-Boson.pdf
9. Police Data Initiative. (2017). Retrieved from https://www.policedatainitiative.org/
10. Schneider, J. (1997, February 7). Cross Validation. Retrieved January 7, 2018, from
https://www.cs.cmu.edu/~schneide/tut5/node42.html
11. Shojaee, S., Mustapha, A., Sidi, F., & Jabar, M. Z. (2013, May). A Study on
Classification Learning Algorithms to Predict Crime Status. Retrieved from
https://www.researchgate.net/profile/Somayeh_Shojaee/publication/266971832_A_Study_on_Classification_Learning_Algorithms_to_Predict_Crime_Status/links/54436e830cf2e6f0c0f94761/A-Study-on-Classification-Learning-Algorithms-to-Predict-Crime-Status.pdf
12. Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to data mining. Boston, MA:
Addison-Wesley Longman Publishing Co.
13. Wu, X., Kumar, V., Quinlan, R. J., Ghosh, J., Yang, Q., Motoda, H., . . . Steinberg, D.
(2007, September). Top 10 algorithms in data mining. Retrieved from
https://atasehir.bel.tr/Content/Yuklemeler/Dokuman/Dokuman3_4.pdf
