prediction
Technical University of Cluj-Napoca,
Faculty of Electronics, Telecommunications and Information Technology
Cluj-Napoca, Romania
This literature review focuses on papers that use artificial intelligence and machine learning algorithms to extract features and predict possible criminal behavior.

Keywords: crime, criminal, behavior, prediction, artificial intelligence, AI, deep learning, neural network, pattern recognition, facial recognition, training
In recent years, artificial intelligence has increasingly been used to assist decision making in areas of high relevance to society, such as criminal justice. For example, machine learning models can learn rules from large datasets and may improve decision-making processes by being more accurate and by avoiding human cognitive biases [1]. In algorithmic fairness, as in real life, population-related features such as gender, race, religion and nationality are known as sensitive features, and ideally they should not affect the outcome.

The face is the primary means of recognizing a person, transmitting information, communicating with others and inferring people's feelings, among other things. Our faces might disclose more than we expect. A facial image can be informative of personal traits [2], such as race, gender, age, health, emotion, psychology and profession. This information could be used to inform decisions in the criminal justice system, such as probation or bail decisions.

Systematic review and systematic mapping aim to make evidence synthesis as transparent, objective and comprehensive as possible. They are specific approaches with required stages and processes. For example, both start by setting out the planned research methods in a written protocol, which is then peer-reviewed. Other common steps include searching multiple databases with a tried and tested search string, screening articles for relevance against a predetermined set of inclusion criteria, and extracting data in a specific way.

Systematic review and systematic mapping use very similar approaches. Both start the same way, with a peer-reviewed protocol that outlines the planned methods. In systematic mapping, the focus is then on identifying and describing the evidence base: selecting the studies that meet criteria of relevance and scientific credibility and detailing them in a searchable database. Systematic reviews tend to be narrower, focusing on "what the science says" about a particular question. Often, the two are part of the same evidence synthesis pathway, with a systematic map followed by one or more systematic reviews.

III. RELATED WORK

The main focus of emotion detection from facial images is to train a machine to distinguish among six emotional facial expressions: happiness, surprise, sadness, disgust, fear and anger [5]. Approaches used for classifying facial emotions include Bayesian networks [6], fuzzy inference systems [7] and hidden Markov models based on real-time tracking of mouth shape [8].

Paper [17] explores the capability of deep learning to distinguish criminal from non-criminal facial images. The authors used two deep learning models, a standard feedforward neural network (SNN) and a convolutional neural network (CNN), and trained them on 10,000 neutral-emotion, mixed-gender and mixed-face facial images. No control was imposed on race because of the small dataset and low image quality. Both models were trained with and without controlling for gender. The results indicated that controlling for gender has little effect on accuracy, and both trainings reached a classification accuracy of up to 97%.
Machine learning has proved more efficient than humans at discovering personality traits from facial images. For example, Gent et al. [9] trained a machine that estimates age from facial images, and Reece and Danforth [10] built a machine learning model that detects depression and psychiatric disorders in Instagram facial images.

Deep learning has drawn much attention in the last decade because of its applicability to a wide range of problems. Among the most relevant applications of deep learning are the face and pattern recognition applications presented in [11] and [12].

Another relevant line of work is by Cristiani [13] and Segalin et al. [14, 15], who applied machine learning to predict a person's self-assessed personality traits (openness to experience, conscientiousness, extraversion, agreeableness and neuroticism) from the images they upload to social media, as well as the personality traits those images lead others to attribute to them. The authors used a hybrid approach in which models are built as latent representations of features extracted from the images (color, composition, textual properties, etc.) and then passed to a discriminative classifier that predicts each user's personality traits. Simplifying the problem into five distinct binary classification problems, one per trait, Segalin et al. [15] applied an eight-layer CNN pre-trained on the ImageNet 2012 competition dataset. The results showed that the personality traits others attribute to a person based on their social media images can be predicted 10% more accurately than the traits the individual attributes to themselves.

Criminal tendency is another personality trait. Wu and Zhang [16] reported a correlation between criminality and facial features. They trained four classifiers, logistic regression, k-nearest neighbors (KNN), support vector machines (SVM) and a convolutional neural network (CNN), and claimed that their machine can identify a criminal face with 90% accuracy. Their model was controlled for race, gender and facial expressions of emotion.

IV. VISUAL CRIMINAL TENDENCY DETECTION RESULTS AND DISCUSSION

Splitting a small dataset into training and testing sets would leave an even smaller training set. In cross-validation, all samples can be used for both training and testing, while the model is always evaluated on previously unseen samples. Additionally, in k-fold cross-validation, k models are trained and tested, which gives more confidence in the performance results. Consequently, it is possible to report not only a more solid test accuracy but also its standard deviation. Finally, cross-validation allows the number of layers in the neural network to be tuned, which will be further elaborated at the end of this section. With these advantages in mind, tenfold cross-validation is applied here. Tenfold is preferred over its fivefold counterpart because it produces a more accurate standard deviation.

The neural networks are trained for up to 500 epochs, after which the change in training accuracy becomes imperceptible. The charts in Fig. 1 show the average and standard deviation of the training and test accuracies at each epoch. Tenfold cross-validation is performed at each epoch, so the training and test accuracies at each epoch are averages over the ten folds. The standard deviation of the accuracies is also calculated over the ten folds at each epoch and depicted by the line thickness. The CNN achieves its highest test accuracy at epoch 306. While the training accuracy keeps rising after this epoch, the test accuracy starts dropping. The test accuracy of 97% achieved by the CNN (Fig. 1a) exceeds expectations and is a clear indicator that criminals and non-criminals can be differentiated using their facial images. It is noteworthy that the criminal mugshots come from a different source than the non-criminal face shots, so the conditions under which the criminal images were taken differ from those of the non-criminal images. These conditions include the camera, illumination, angle, distance, background, resolution, etc. Such disparities, which are unrelated to facial structure and negligible in the majority of cases, might have contributed slightly to training the classifier and helped it distinguish between the two categories. Therefore, it would be too ambitious to claim that this accuracy is easily generalizable.
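The evaluation protocol described above (tenfold cross-validation, with accuracies averaged over the folds at every epoch and the epoch with the highest mean test accuracy reported) can be sketched as follows. This is a minimal, framework-agnostic sketch: the helper names `kfold_indices` and `summarize_per_epoch` are hypothetical, model training is omitted, and the accuracy curves in the usage example are made-up placeholders, not results from [17].

```python
import random
import statistics
from typing import List, Sequence, Tuple


def kfold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, test) pairs.

    Indices are shuffled once; each index lands in exactly one test fold,
    so every sample is tested once and used for training k - 1 times.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


def summarize_per_epoch(curves: Sequence[Sequence[float]]) -> Tuple[List[float], List[float], int]:
    """Aggregate per-fold accuracy curves (one list per fold, one value per epoch).

    Returns the per-epoch mean, the per-epoch population standard deviation
    across folds, and the 1-based epoch with the highest mean accuracy.
    """
    n_epochs = len(curves[0])
    means, stds = [], []
    for epoch in range(n_epochs):
        values = [curve[epoch] for curve in curves]
        means.append(statistics.mean(values))
        stds.append(statistics.pstdev(values))
    best_epoch = max(range(n_epochs), key=means.__getitem__) + 1
    return means, stds, best_epoch


if __name__ == "__main__":
    folds = kfold_indices(n_samples=25, k=10)
    print([len(test) for _, test in folds])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
    # Placeholder curves for two folds over three epochs (not data from [17]).
    means, stds, best = summarize_per_epoch([[0.60, 0.80, 0.75], [0.62, 0.78, 0.70]])
    print(best)  # 2
```

Picking the epoch at the peak of the averaged test curve mirrors the procedure described in the text; a stricter variant would choose the epoch on validation folds held out from the test folds.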
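Subsection B attributes the CNN's advantage to the number of synaptic weights each architecture must learn in its first hidden layer. That arithmetic can be previewed with a minimal sketch that follows the same simplified counting used in the text: a single-channel n × m image, a z × z convolution window, k first-hidden-layer nodes and no biases. The function names and the example sizes are hypothetical; in a real convolutional layer the node count is tied to the image size and stride, which this simplified model ignores.

```python
from fractions import Fraction


def first_layer_weights(n: int, m: int, z: int, k: int) -> dict:
    """Count first-hidden-layer synaptic weights (biases ignored).

    n x m : single-channel input image size
    z x z : convolution window size
    k     : number of nodes in the first hidden layer
    """
    return {
        # Full connectivity (SNN): each of the k nodes has its own n*m weights.
        "snn": n * m * k,
        # Partial connectivity only: each node has its own z*z window weights.
        "cnn_local": z * z * k,
        # Partial connectivity plus weight sharing: one z*z kernel for all k nodes.
        "cnn_shared": z * z,
    }


def reduction_factors(n: int, m: int, z: int, k: int) -> dict:
    """How many times fewer weights each CNN variant needs than the SNN."""
    w = first_layer_weights(n, m, z, k)
    return {
        "partial_connectivity": Fraction(w["snn"], w["cnn_local"]),  # = n*m / z^2
        "shared_weights": Fraction(w["snn"], w["cnn_shared"]),       # = n*m*k / z^2
    }


if __name__ == "__main__":
    # Hypothetical sizes: a 100 x 100 image, a 5 x 5 window, 1000 first-layer nodes.
    print(first_layer_weights(100, 100, 5, 1000))
    # {'snn': 10000000, 'cnn_local': 25000, 'cnn_shared': 25}
    print(reduction_factors(100, 100, 5, 1000))
    # {'partial_connectivity': Fraction(400, 1), 'shared_weights': Fraction(400000, 1)}
```

Using `Fraction` keeps the ratios exact when z² does not divide n × m evenly.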
Table 2 Confusion matrix for SNN

Fig. 2 Facial features detected by the first (a, c) and second (b, d) convolutional layers in the CNN, for a criminal (a, b) vs. a non-criminal (c, d) face shot

B. Why does the CNN achieve higher accuracy than the SNN?

Two architectural features make CNNs more convincing than SNNs for image classification:

Partial connectivity rather than full connectivity

A node in a CNN is connected to only a small number of nodes in the previous layer, while the same node in an SNN is connected to all nodes in the previous layer. This means that far fewer synaptic weights need to be calculated in a CNN than in an SNN. If the image is $n \times m$ and the convolution window is $z \times z$, each first-hidden-layer node has $z^2$ synaptic weights in a CNN instead of $n \times m$ in an SNN, i.e. $(n \times m)/z^2$ times fewer. This was shown only for the first hidden layer, but the same holds for all convolutional hidden layers. This has two advantages. First, far fewer unknown parameters (synaptic weights) can be learned more quickly (lower computational complexity) and more accurately by the machine, with a significantly reduced chance of overfitting. Second, deriving the value of each node in the next layer from only a small number of neighboring pixels, rather than the entire image, rests on the assumption that the relationship between two distant pixels is probably less significant than that between two close neighbors. This assumption is inspired by the visual cortex of humans and other animals.

Shared weights

As noted above, $n \times m$ synaptic weights need to be learned for one node in the first hidden layer of an SNN. With $k$ nodes in the first hidden layer, a total of $n \times m \times k$ synaptic weights must be calculated, because each node in the first hidden layer has its own synaptic weights, different from those of the other nodes. In a CNN, however, the nodes in the first hidden layer do not have different synaptic weights but share the same ones. Therefore, regardless of how many nodes exist in the first hidden layer, the number of synaptic weights that need to be learned remains $z^2$. Consequently, the number of synaptic weights in a CNN is $(n \times m \times k)/z^2$ times fewer than in an SNN.

V. RESEARCH METHODOLOGY

The purpose of this paper is to summarize the work done in the last 5 years on how AI is used to predict criminal behavior.

We defined two research questions to guide our mapping process:
What is the field distribution over half of the decade?
What is the field distribution in different subdomains?

Time-Bound Research

The first step in finding the relevant documents for our project is to go through all studies published from 2015 until today. The search was performed on three websites, and the total numbers of results we started from are: IEEExplore - 46, ScienceDirect - 4621, SpringerLink - 17575. We performed the search with 8 different search strings, and Table 5 contains the number of results from each website.

Search Strings                                      IEEExplore  ScienceDirect  SpringerLink
AI Criminal behaviour                                        0            562          1837
AI illegal behaviour                                         4            602          1377
AI villain prediction                                        0             10            51
AI criminal conduct                                          3            494          1696
Artificial Intelligence criminal behaviour                  24            936          3678
Artificial Intelligence illegal actions                     14            776          2944
Artificial Intelligence criminal bearing                     0            108          1520
Artificial Intelligence suspicious habits detection          0             92           248
Artificial Intelligence criminal habits                      1            184           829
Artificial Intelligence criminal conduct                     8            599           751
Total Results                                               11           3204          3867
Table 6 below shows the number of results for every search string on each website that remained after the first part of the inclusion process.

Search Strings                                      IEEExplore  ScienceDirect  SpringerLink
AI Criminal behaviour                                        0             93            44
AI illegal behaviour                                         4             56            25
AI villain prediction                                        0              5             0
AI criminal conduct                                          0             97            38
Artificial Intelligence criminal behaviour                   0            162            91
Artificial Intelligence illegal actions                      0            123            53
Artificial Intelligence criminal bearing                     0             18            26
Artificial Intelligence suspicious habits detection          0             23             4
Artificial Intelligence criminal habits                      0             26            15
Artificial Intelligence criminal conduct                     5            181            78

Search Strings                                      IEEExplore  ScienceDirect  SpringerLink
AI Criminal behaviour                                        0            164          1425
AI villain prediction                                        0              7            15
Artificial Intelligence criminal behaviour                  22            319          2899
AI criminal conduct                                          1            364           405
Artificial Intelligence illegal actions                     14            222          2364
Artificial Intelligence criminal behaviour                   2            617           779
Artificial Intelligence criminal bearing                     0             43          1258

Figure 3. Field distribution over half of the decade

What is the field distribution in different subdomains?

The second step in the mapping process is to categorize the results by their main theme and the subdomain they refer to. The graphic below shows the number of documents for some of the main subareas of everyday computer science: Artificial Intelligence, Biometrics, Security, Telecommunication, Big Data, IoT, Neurocomputing, and Psychology applied in IT.
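Both research questions reduce to counting screened documents along one dimension: publication year for the first, subdomain for the second. A minimal sketch, assuming each search hit has already been annotated with its year, source website and a subdomain label, applies the time-bound filter (2015 onward) and tallies the remaining records. The `Record` schema, the helper names and the sample titles are hypothetical placeholders, not data from Tables 5 and 6.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Record:
    """One search hit; a hypothetical schema used only for illustration."""
    title: str
    year: int
    database: str   # e.g. "IEEExplore", "ScienceDirect" or "SpringerLink"
    subdomain: str  # theme label assigned by the reviewer during screening


def time_bound(records: Iterable[Record], first_year: int = 2015) -> List[Record]:
    """Time-bound step: keep records published in first_year or later."""
    return [r for r in records if r.year >= first_year]


def distribution(records: Iterable[Record], field: str) -> Dict[object, int]:
    """Count records per value of a Record field, sorted by that value.

    field="year" addresses the first research question (distribution over time);
    field="subdomain" addresses the second (distribution over subdomains).
    """
    counts = Counter(getattr(r, field) for r in records)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Placeholder records, not results of the actual search.
    hits = [
        Record("Paper A", 2014, "IEEExplore", "Security"),
        Record("Paper B", 2016, "ScienceDirect", "Biometrics"),
        Record("Paper C", 2018, "SpringerLink", "Biometrics"),
        Record("Paper D", 2018, "SpringerLink", "Big Data"),
    ]
    kept = time_bound(hits)
    print(distribution(kept, "year"))       # {2016: 1, 2018: 2}
    print(distribution(kept, "subdomain"))  # {'Big Data': 1, 'Biometrics': 2}
```

The same `distribution` helper with `field="database"` gives a per-website count like the columns of the tables above.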
VII. REFERENCES