
Artificial Intelligence for criminal behavior prediction
Technical University of Cluj-Napoca,
Faculty of Electronics, Telecommunications and Information Technology
Cluj-Napoca, Romania

Authors: Horatau Vlad, Pop Alexandru Iosif

Abstract

Nowadays crimes are increasing at a high rate, which is a great challenge for the police department of a city. An enormous amount of data from different locations is collected and stored, covering the diverse types of crimes taking place. It is necessary to analyze this data so that solutions for solving and reducing crime incidents, and for predicting similar incident patterns in the future, become possible. The recent specialization of deep learning models, along with the constant improvement of performance and memory space in computing machines, has radically boosted the role of images in pattern recognition. Compared with work on social media posts, which reveal individual characteristics of their authors, facial images may manifest some unexpected personality traits.

Keywords: crime, criminal, behavior, prediction, artificial intelligence, AI, deep learning, neural network, pattern recognition, facial recognition, training

I. INTRODUCTION

In recent years there has been an increasing use of Artificial Intelligence to assist decision making in areas of high relevance to society, such as criminal justice. For example, machine learning models are able to learn rules from large datasets and may improve decision-making processes by being more accurate and avoiding human cognitive biases [1]. In algorithmic fairness, as in real life, features related to the population, such as gender, race, religion and nationality, are known as sensitive features, and ideally they should not affect the outcome.

The face is the primary means of recognizing a person, transmitting information, communicating with others, and inferring people’s feelings, among others. Our faces might disclose more than we expect. A facial image can be informative of personal traits [2], such as race, gender, age, health, emotion, psychology, and profession. This could be used to inform decision-making situations in the criminal justice system, such as probation or bail decisions, sentencing, and other arrangements. Moreover, the usage of AI in these processes can cause substantial changes in criminal decision making and may bypass jurisdictive protocols [3].

In a study triggered by Lombroso's research [4], it was shown that criminals could be identified by their facial structure and emotions. Although this study was made from a physiology and psychiatry perspective, other studies have started from it in order to investigate whether machine learning algorithms would be able to learn and distinguish between criminal and non-criminal facial images. One of the problems such a system faces is the shortage of mugshots of women, compared with mugshots of men, that are publicly available and used to train the machine. This lack of data limits the performance achieved by training.

This literature review focuses on papers that use artificial intelligence and machine learning algorithms to extract features and predict possible criminal behavior.

II. SYSTEMATIC REVIEW AND SYSTEMATIC MAPPINGS

Systematic review and systematic mapping aim to make evidence synthesis as transparent, objective and comprehensive as possible. They are specific approaches, with required stages and processes. For example, they both start with setting out the methods that you plan to use for the research in a written protocol, which is then peer-reviewed. Other methods used include searching through multiple databases using a tried and tested search string, screening articles for relevance against a pre-determined set of inclusion criteria, and extracting data in a specific way.

Systematic review and systematic mapping use very similar approaches. Both start the same way, with setting up a peer-reviewed protocol that outlines the planned methods. In systematic mapping, the focus is then on identifying and describing the evidence base: selecting those studies that meet criteria of relevance and scientific credibility and detailing them in a searchable database.

Systematic reviews tend to be narrower, more about finding “what the science says” about a particular question.
Often, they are part of the same evidence synthesis pathway, with a systematic map followed by one or more systematic reviews.

III. RELATED WORK

The main focus in emotion detection through facial images is to train a machine to distinguish among the six emotional facial expressions: happiness, surprise, sadness, disgust, fear and anger [5]. Some of the approaches used for classifying facial emotions are Bayesian networks [6], fuzzy inference systems [7] and a hidden Markov model based on real-time tracking of mouth shape [8].

Machine learning has proved to be more efficient than humans in discovering personality traits through facial images. Examples include the model of Geng et al. [9], which estimates age from facial images, and the machine learning model of Reece and Danforth [10], which detects depression and psychiatric disorder in Instagram facial images.

Deep learning has drawn much attention in the last decade, due to its applicability in a wide range of applications. Among the most relevant applications of deep learning, we can point to the face and pattern recognition applications presented in [11] and [12].

Another relevant work was made by Cristani [13] and Segalin et al. [14, 15], in which they applied machine learning to predict the self-assessed personality traits (openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism) of a person from the images uploaded on social media, as well as the personality traits that those images trigger in viewers. The authors used a hybrid approach in which models, used as latent representations of features (color, composition, textural properties, etc.) extracted from images, are built and then passed to a discriminative classifier to predict each user's personality traits. Simplifying the problem into five distinct binary classification problems, one for each trait, Segalin et al. [15] applied an eight-layer CNN pre-trained on the ImageNet 2012 competition dataset. The results showed that the personality traits that others attribute to a person, based on social media images, can be predicted 10% more accurately than the personality traits that the individual attributes to himself or herself.

Criminal tendency is another personality trait. Wu and Zhang [16] demonstrated the correlation between criminality and facial features. They trained four classifiers: logistic regression, k-nearest neighbors (KNN), support vector machines (SVM), and a convolutional neural network (CNN), and claimed their machine can identify a criminal face with 90% accuracy. Their model was controlled for race, gender and facial expressions of emotions.

Paper [17] explores the capabilities of deep learning in distinguishing criminal from non-criminal facial images. The authors used two deep learning models, a standard feedforward neural network (SNN) and a convolutional neural network (CNN), and trained them with 10,000 neutral-emotion, mixed-gender, and mixed-race facial images. No control was imposed on race, due to the small dataset and low image quality. Both models were trained with and without controlling for gender. The results indicated that controlling for gender does not have much effect on accuracy, and both trainings reached high classification accuracy, up to 97%.

IV. VISUAL CRIMINAL TENDENCY DETECTION RESULTS AND DISCUSSION

Splitting a small dataset into training and testing sets would leave us with an even smaller training set. In cross-validation, all the samples can be used for both training and testing, while the model is still evaluated on previously unseen samples. Additionally, in k-fold cross-validation, we train and test k models. This allows us to be more confident in the performance results. Consequently, we can report not only a more solid test accuracy, but also the standard deviation of this test accuracy. Finally, cross-validation allows us to tune the number of layers in our neural network, which will be further elaborated at the end of this section. With these advantages in mind, the tenfold cross-validation approach is applied here. Tenfold is preferred over its fivefold counterpart to produce a more accurate standard deviation.

The neural networks are trained for up to 500 epochs, after which the change in training accuracy becomes imperceptible. The charts in Fig. 1 represent the average and standard deviation of training and test accuracies at each epoch. The tenfold cross-validation has been performed at each epoch. Thus, the training and test accuracies at each epoch are the average over the ten folds. The standard deviation of accuracies is also calculated over the ten folds at each epoch and depicted using the line’s thickness. The CNN achieves its highest test accuracy at epoch 306. While the training accuracy keeps rising after this epoch, the test accuracy starts dropping. The test accuracy of 97% achieved by the CNN (Fig. 1a) exceeds our expectations and is a clear indicator of the possibility to differentiate between criminals and non-criminals using their facial images. It is noteworthy that the criminal mugshots come from a different source than the non-criminal face shots. That means the conditions under which the criminal images were taken differ from those of the non-criminal images. These different conditions refer to the camera, illumination, angle, distance, background, resolution, etc. Such disparities, which are not related to facial structure, though negligible in the majority of cases, might have slightly contributed to training the classifier and helping it distinguish between the two categories. Therefore, it would be too ambitious to claim that this accuracy is easily generalizable.
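
The k-fold cross-validation procedure described in this section can be illustrated with a minimal sketch. It is not the code used to obtain the reported results: the helper names (k_fold_indices, cross_validate), the toy one-dimensional data and the threshold classifier that stands in for the neural networks are all assumptions made for illustration. The sketch only shows how every sample is used once for testing and how the mean and standard deviation of the test accuracy are obtained over the k folds.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs that partition range(n_samples) into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(samples, labels, fit, predict, k=10):
    """Train and test k models; return mean and standard deviation of test accuracy."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Toy stand-in for the neural networks (assumption): a threshold halfway
# between the two class means of a one-dimensional feature.
def fit_threshold(xs, ys):
    mean0 = statistics.mean([x for x, y in zip(xs, ys) if y == 0])
    mean1 = statistics.mean([x for x, y in zip(xs, ys) if y == 1])
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return int(x > threshold)


# Synthetic data for illustration only; not the facial image dataset.
rng = random.Random(1)
toy_x = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(3.0, 1.0) for _ in range(100)]
toy_y = [0] * 100 + [1] * 100

mean_acc, std_acc = cross_validate(toy_x, toy_y, fit_threshold, predict_threshold, k=10)
print(f"test accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

With k = 10, each model is trained on nine tenths of the samples and tested on the remaining tenth, which is why the standard deviation can be reported alongside the mean test accuracy.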
Fig. 1 Training and test accuracy with one standard deviation of uncertainty at different epochs for: a CNN, b SNN, c CNN when applied to only male images, and d SNN when applied to only male images

Interestingly but not surprisingly, the CNN (Fig. 1a) achieves a higher test accuracy than the SNN (Fig. 1b), and does so more consistently. The CNN’s best test accuracy (97%) is 8 percentage points higher than the SNN’s best test accuracy (89% with a standard deviation of 1.18%). This goes back to the SNN being general purpose while the CNN is specifically designed for image classification. The CNN is more consistent in learning because the variance around its training and test accuracy curves (Fig. 1a) is tighter than that of the SNN (Fig. 1b). The higher consistency and accuracy of the CNN are due to its assumption of locality of pixel dependencies and its fewer parameters.

The confusion matrices for the CNN and SNN are shown in Tables 1 and 2, respectively. The difference between the false positive and false negative rates is 1% for the CNN and 2% for the SNN. The classifier has no meaningful bias toward making either type of mistake more than the other. By analyzing the percentages of false positives (non-criminal images misclassified as criminal) and false negatives (criminal images misclassified as non-criminal), it can be stated that the classifier is not biased to put people of a specific gender or race in a specific category while ignoring their criminal tendency.

Table 1 Confusion matrix for CNN

Table 2 Confusion matrix for SNN

There are more females among non-criminal images than criminal ones. While 25% of non-criminal images are female, only 4% of criminal images are female. The machine might be unfairly taking advantage of this distinction to boost its classification accuracy. To observe and control the gender bias effect, we separate male and female images in each category. Since the number of female images is too small, we only train and cross-validate the models using male images. There are 4796 male images in the criminal and 3727 in the non-criminal category. Figure 1c, d show the average and standard deviation of training and test accuracies over different training epochs for the CNN and SNN, respectively. These charts very closely imitate their mixed-gender counterparts in Fig. 1a, b, a sign that gender has no effect on biasing the classifier one way or the other. The corresponding confusion matrices for the CNN and SNN when applied to only male images, shown in Tables 3 and 4, endorse the same conclusion.

Table 3 Confusion matrix for CNN when applied to only male images

Table 4 Confusion matrix for SNN when applied to only male images

A. Facial features and criminal tendency

Convolutional layers in a CNN are essentially feature generation layers. If a CNN achieves a high accuracy, it means that the features generated by its convolutional layers are effective in distinguishing between classes. Therefore, to understand what facial features are used by the CNN to classify the images, we need to look at the facial features that are emphasized or pinpointed by each convolutional layer. A convolutional layer usually has multiple filters. Each filter separately contributes to feature generation, though it is their cumulative knowledge that helps the CNN to classify the images. Our CNN contains 2 convolutional layers; the first one has 8 filters and the second one has 16.

In Fig. 2, the outputs of one of the filters from the first convolutional layer and one of the filters from the second convolutional layer are visualized. They highlight the facial characteristics that are learned and used by the CNN to distinguish between the two classes. Additionally, Fig. 2 compares these facial features between a criminal and a non-criminal face shot. It is noteworthy that neither these facial features nor their differences are hard coded into the machine. They are learned by the machine as most helpful in
classifying the two sets of images in the training dataset. Both convolutional layers detect and underscore the shape of the face, eyebrows, top of the eye, pupils, nostrils, and lips.

Fig. 2 Facial features detected by the first (a, c) and second (b, d) convolutional layers in CNN, for a criminal (a, b) vs. non-criminal (c, d) face shot

B. Why does the CNN achieve higher accuracy than the SNN?

Two architectural features of CNNs that make them more suitable than SNNs for image classification are as follows:

• Partial connectivity rather than full connectivity

A node in a CNN is connected only to a small number of nodes in the previous layer, while the same node in an SNN is connected to all nodes in the previous layer. This means that the number of synaptic weights that need to be calculated is much smaller in a CNN than in an SNN. If the image is n × m and the convolution window is z × z, the number of synaptic weights in the CNN is n × m/z² times fewer than in the SNN. We showed this only for the first hidden layer, but the same is true for all convolutional hidden layers. This has two advantages. First, far fewer unknown parameters (synaptic weights) can be learned more quickly (less computational complexity) and more accurately by the machine, with a significantly reduced chance of overfitting. Second, deriving the value of each node in the next layer from only a small number of neighboring pixels, rather than the entire image, is based on the assumption that the relationship between two distant pixels is probably less significant than that between two close neighbors. This assumption is inspired by the visual cortex system in humans and other animals.

• Shared weights

We mentioned that n × m synaptic weights need to be learned for one node in the first hidden layer of an SNN. With k nodes in the first hidden layer, a total of n × m × k synaptic weights must be calculated, because each node in the first hidden layer has its own synaptic weights, which are different from those of other nodes. In a CNN, however, the number of synaptic weights that need to be learned remains z², because nodes in the first hidden layer do not have different synaptic weights, but share the same weights. Therefore, regardless of how many nodes exist in the first hidden layer, the number of synaptic weights that need to be learned remains z². Consequently, the number of synaptic weights in the CNN is n × m × k/z² times fewer than in the SNN for the first hidden layer. Although this explanation concerned the first hidden layer, it is true for all convolutional hidden layers. This property gives CNNs two advantages over SNNs. The first advantage is even fewer parameters for the machine to learn, and the second is enabling the CNN to look for certain objects in the image, regardless of where in the image they are.

V. RESEARCH METHODOLOGY

The purpose of this research paper was to present a summary of the work done in the last 5 years on how AI predicts criminal behavior.

We defined two research questions, which will help us find the best results in our mapping process:
• What is the field distribution over half of the decade?
• What is the field distribution in different subdomains?

Time-Bound Research

The first step in finding the relevant documents for our project is to go through all studies done from 2015 until today. The search was made on three different websites, and the total number of results on which we start to work is: IEEExplore - 46, ScienceDirect - 4621, SpringerLink - 17575. We ran the search with 10 different search strings, and Table 5 contains the number of results from each website.

Search Strings                                       IEEExplore  ScienceDirect  SpringerLink
AI Criminal behaviour                                         0            562          1837
AI illegal behaviour                                          4            602          1377
AI villain prediction                                         0             10            51
AI criminal conduct                                           3            494          1696
Artificial Intelligence criminal behaviour                   24            936          3678
Artificial Intelligence illegal actions                      14            776          2944
Artificial Intelligence criminal bearing                      0            108          1520
Artificial Intelligence suspicious habits detection           0             92           248
Artificial Intelligence criminal habits                       1            184           829
Artificial Intelligence criminal conduct                     26            857          3395
Total Results                                                46           4621         17575

Table 5 - Total number of search results for half a decade

Inclusion and Exclusion

a. Only Articles, Journals.

For the primary part of our inclusion process, we considered Articles, Journals, or Articles from Journals to be the best-fitting types of documents for our research.

Table 6 below lists the numbers for every search string on the different websites and the total remaining after the first part of the inclusion.

Search Strings                                       IEEExplore  ScienceDirect  SpringerLink
AI Criminal behaviour                                         0            398           412
AI illegal behaviour                                          0            444           408
AI villain prediction                                         0              7            15
AI criminal conduct                                           1            364           405
Artificial Intelligence criminal behaviour                    2            617           779
Artificial Intelligence illegal actions                       0            554           580
Artificial Intelligence criminal bearing                      0             65           262
Artificial Intelligence suspicious habits detection           0             55           113
Artificial Intelligence criminal habits                       0            101           142
Artificial Intelligence criminal conduct                      8            599           751
Total Results                                                11           3204          3867

Table 6 - Inclusion of Articles and Journals

As said above, the process started with the inclusion part, where after the first filter was applied, we had a total of 7071 results from all the websites mentioned above.

Table 7 below presents the exclusion part, which counts the different types of work that are not necessarily relevant for our study because they can be too long and can contain unnecessary details.

The exclusion covered: Chapter, Conference Paper, Reference Work Entry, Protocol, Book, Conference Proceedings, Conference info, Conference Abstracts, Encyclopedia, Book Chapter, Case reports, Correspondence, Discussion, Editorials, Errata, Examinations, Reviews, News, Practice guidelines, Short Communications, Software publications.

Search Strings                                       IEEExplore  ScienceDirect  SpringerLink
AI Criminal behaviour                                         0            164          1425
AI illegal behaviour                                          4            158           969
AI villain prediction                                         0              3            36
AI criminal conduct                                           2            130          1291
Artificial Intelligence criminal behaviour                   22            319          2899
Artificial Intelligence illegal actions                      14            222          2364
Artificial Intelligence criminal bearing                      0             43          1258
Artificial Intelligence suspicious habits detection           0             37           135
Artificial Intelligence criminal habits                       1             83           687
Artificial Intelligence criminal conduct                     18            258          2644
Total Results                                                35           1417         13708

Table 7 - Exclusion results
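
The relation between the inclusion and exclusion steps can be checked with a short sketch: for each website, the included results (Table 6) plus the excluded results (Table 7) should add up to the initial results (Table 5). The figures below are transcribed from the "Total Results" rows of those tables; the function name reconcile and the dictionary layout are assumptions made for illustration.

```python
# Totals per source, transcribed from the "Total Results" rows of Tables 5, 6 and 7.
SOURCES = ("IEEExplore", "ScienceDirect", "SpringerLink")
initial_totals = {"IEEExplore": 46, "ScienceDirect": 4621, "SpringerLink": 17575}   # Table 5
included_totals = {"IEEExplore": 11, "ScienceDirect": 3204, "SpringerLink": 3867}   # Table 6
excluded_totals = {"IEEExplore": 35, "ScienceDirect": 1417, "SpringerLink": 13708}  # Table 7


def reconcile(initial, included, excluded):
    """Return, per source, whether included + excluded equals the initial total."""
    return {s: included[s] + excluded[s] == initial[s] for s in initial}


checks = reconcile(initial_totals, included_totals, excluded_totals)
for source in SOURCES:
    print(f"{source}: {included_totals[source]} + {excluded_totals[source]} "
          f"= {initial_totals[source]} -> {checks[source]}")
```

A check of this kind is useful in a mapping study because every record from the initial search should end up in exactly one of the two groups.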


b. Only Articles, Journals, with technical background, in English

The next filter keeps only articles and journals focused on technical fields and written in English.

Table 8 presents the results of the technical background filter for the mentioned websites. The number of relevant documents was 1167.

Search Strings                                       IEEExplore  ScienceDirect  SpringerLink
AI Criminal behaviour                                         0             93            44
AI illegal behaviour                                          4             56            25
AI villain prediction                                         0              5             0
AI criminal conduct                                           0             97            38
Artificial Intelligence criminal behaviour                    0            162            91
Artificial Intelligence illegal actions                       0            123            53
Artificial Intelligence criminal bearing                      0             18            26
Artificial Intelligence suspicious habits detection           0             23             4
Artificial Intelligence criminal habits                       0             26            15
Artificial Intelligence criminal conduct                      5            181            78
Total Results                                                 9            784           374

Table 8 - Articles and journals with technical background in English.

c. Only Articles, Journals, with technical background, in English without duplicates

After eliminating all the duplicates, we get to a result of 738 articles.

d. Research questions:

• What is the field distribution over half of the decade?

The first question that is relevant for the mapping process concerns the distribution of our project theme over the years. Figure 3 shows the share of documents released in the last five years: 107 documents were published in 2015, 150 documents in 2016, 172 documents in 2017 and 2018, 300 documents in 2019, and 172 documents so far in 2020.

Figure 3. Field distribution over half of the decade

• What is the field distribution in different subdomains?

The second step in the mapping process is to categorize the results based on their main theme and the subdomain they refer to. The graphic below shows the number of documents for some of the main subareas of everyday Computer Science: Artificial Intelligence, Biometrics, Security, Telecommunication, Big Data, IoT, Neurocomputing and also Psychology applied in IT.

Figure 4. Field distribution in different subdomains

e. Only relevant documents for our project

The next action in our project was to go through all the results and keep only the work relevant for us. We chose “crime” or “criminal” as the first keyword for our manual filter. Based on this, the number of results was 25. The graphic below shows the main websites that had relevant work for our project.
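
The duplicate elimination (step c) and the manual keyword filter (step e) can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the helper names (normalize, drop_duplicates, keyword_filter) and the example titles are hypothetical and are not taken from the actual search results, and the real screening was done manually.

```python
def normalize(title):
    """Lowercase a title and collapse whitespace so trivially different copies compare equal."""
    return " ".join(title.lower().split())


def drop_duplicates(records):
    """Keep the first record for each normalized title (step c)."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def keyword_filter(records, keywords):
    """Keep records whose title contains at least one keyword (step e)."""
    return [r for r in records if any(k in normalize(r["title"]) for k in keywords)]


# Hypothetical records for illustration only; not entries from the actual search results.
records = [
    {"title": "Deep learning for crime prediction", "source": "ScienceDirect"},
    {"title": "Deep Learning for  Crime Prediction", "source": "SpringerLink"},
    {"title": "Facial analysis of criminal tendency", "source": "IEEExplore"},
    {"title": "Traffic flow forecasting with neural networks", "source": "SpringerLink"},
]

unique = drop_duplicates(records)
relevant = keyword_filter(unique, ["crime", "criminal"])
print(len(records), len(unique), len(relevant))
```

Matching on normalized titles is a deliberately simple choice; records that differ in more than case or spacing would still need manual review.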
Figure 5. Results from each website for the first keyword

The next important keyword was “detection”, and we also added some of its synonyms. We did the filtering for “identification”, “prediction” and “investigation”, and we had a total of 5 results relevant for our work.

Figure 6. Percentage from each website for second keyword

VI. CONCLUSIONS AND FUTURE DIRECTIONS

Classifying people in any manner requires care, but predicting whether a person is a criminal demands even more caution and control and must be looked upon with suspicion. The danger of this technology lies in its imperfection, since misclassifying individuals can have grave repercussions. It would be too optimistic to claim that the 97% test accuracy achieved by the CNN in this work is easily generalizable to face shots from any other source. This is not only because of the small size of our dataset, but also because criminal and non-criminal images come from different sources. Thus, the conditions under which the images were taken are not exactly the same, which raises the question of whether this disparity in peripheral conditions was captured by the classifier to unfairly distinguish between the two classes.

Facial emotions and age, major sources of bias in classifying facial images based on criminal tendency, were controlled in our work by eliminating non-neutral facial images and images of the elderly and children. The bias due to background effects was mitigated by cropping the facial area out of images. The gender bias was not only eliminated by ignoring female images, but also measured and shown to be of little impact. Race, another source of bias, was not accounted for in this study because of our small dataset and the difficulty, and occasional subjectivity, of identifying race from low-quality facial images. However, both categories contain images of all races in roughly similar proportions. Enlarging our dataset, measuring the impact of racial bias, and detecting other personality traits form our future research avenues.

VII. REFERENCES

[1] Kleinberg J, Lakkaraju H, Leskovec J, Ludwig J, Mullainathan S. Human decisions and machine predictions. Q J Econ. 2017;133(1):237–293.
[2] Zebrowitz LA, Montepare JM. Social psychological face perception: why appearance matters. Soc Personal Psychol Compass. 2008;2(3):1497–517.
[3] Green B. 'Fair' risk assessments: a precarious approach for criminal justice reform. In: 5th Workshop on fairness, accountability, and transparency in machine learning; 2018.
[4] Lombroso C. Criminal man. 5th ed. Durham: Duke Univ. Press; 2006.
[5] Ekman P, Friesen W. The facial action coding system (FACS): a technique for the measurement of facial action. Palo Alto: Consulting Psychologists; 1978.
[6] Zhang Y, Ji Q. Active and dynamic information fusion for facial expression understanding from image sequences. IEEE Trans Pattern Anal Mach Intell. 2005;27(5):699–714.
[7] Tsapatsoulis N, Karpouzis K, Stamou G, Piat F, Kollias S. A fuzzy system for emotion classification based on the MPEG-4 facial definition parameter set. In: 10th European signal processing conference; 2000. p. 1–4.
[8] Oliver N, Pentland A, Bérard F. LAFTER: a real-time face and lips tracker with facial expression recognition. Pattern Recogn. 2000;33(8):1369–82.
[9] Geng X, Yin C, Zhou ZH. Facial age estimation by learning from label distributions. IEEE Trans Pattern Anal Mach Intell. 2013;35(10):2401–12.
[10] Reece AG, Danforth CM. Instagram photos reveal predictive markers of depression. EPJ Data Sci. 2017;6(1):15.
[11] Ouyang W, Wang X, Zeng X, Qiu S, Luo P, Tian Y, Li H, Yang S, Wang Z, Loy CC, et al. Deepid-net: deformable deep convolutional neural networks for object detection. In: IEEE conference on computer vision and pattern recognition; 2015. p. 2403–12.
[12] Sun Y, Liang D, Wang X, Tang X. Deepid3: face recognition with very deep neural networks. 2015. arXiv preprint. arXiv:1502.00873.
[13] Cristani M, Vinciarelli A, Segalin C, Perina A. Unveiling the multimedia unconscious: implicit cognitive processes and multimedia content analysis. In: The 21st ACM international conference on multimedia; 2013. p. 213–22.
[14] Segalin C, Perina A, Cristani M, Vinciarelli A. The pictures we like are our image: continuous mapping of favorite pictures into self-assessed and attributed personality traits. IEEE Trans Affect Comput. 2017;8(2):268–85.
[15] Segalin C, Cheng DS, Cristani M. Social profiling through image understanding: personality inference using convolutional neural networks. Comput Vis Image Underst. 2017;156(1):34–50.
[16] Wu X, Zhang X. Automated inference on criminality using face images. 2016. arXiv preprint. arXiv:1611.04135.
[17] Miron M, Tolan S, Gomez E, Castillo C. Evaluating causes of algorithmic bias in juvenile criminal recidivism. Artificial Intelligence and Law. 2020.
