
International Journal of Scientific Research & Engineering Trends

Volume 8, Issue 4, July-Aug-2022, ISSN (Online): 2395-566X

Sketch To Face Generation


Pulkit Dhingra, Ritika Pandey
pulkit12dhingraa@gmail.com, pandeyvaishali77@gmail.com
Abstract - Automation has been impacting various industries on a large scale by efficiently tackling challenging tasks. In criminal investigations, eyewitnesses play a vital part in putting the accused behind bars. Sketches of suspects drawn from the information provided by eyewitnesses make it easier to identify the accused, but this process of sketch generation is often time-consuming. Modern machine learning models are capable of tackling such demanding tasks, and with their help we can produce models that eliminate the demand for a sketch artist. This paper deals with the use of a generative adversarial network to build a machine learning model that lets an eyewitness draw a free-hand sketch and receive a colored image as the output of the model.

Keywords - Deep Learning, Machine Learning, Python, Generative Adversarial Networks, Computer Science

I. INTRODUCTION

Face sketch synthesis and recognition have attracted significant attention in pattern recognition and computer vision due to their wide range of applications in law enforcement agencies [1]. The sketch is useful to automatically recognize or narrow down face photos of potential suspects from police mug-shot databases.

Fig 1. The Sketch Interface

At a crime scene, if limited information about the suspect is available due to the low quality of surveillance videos, or there are no video/image clues at all, a sketch drawn according to the description of the witnesses is usually taken as the substitute for suspect identification. With the sketches, the police can narrow down the suspects by searching law enforcement face datasets or surveillance camera footage [2]. In this project, we have used a Generative Adversarial Network (GAN) for the deep learning part of pattern recognition, to get predictions for the sketches. The core idea of a GAN is "indirect" training through the discriminator, another neural network that can tell how "realistic" an input is, and which is itself updated dynamically [5]. This means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator. This enables the model to learn in an unsupervised manner.

The generative network generates candidates while the discriminative network evaluates them [1]. The contest operates in terms of data distributions: the generative network learns to map from a latent space to a data distribution of interest, while the discriminative network distinguishes candidates produced by the generator from the true data distribution. The generative network's training objective is to increase the error rate of the discriminative network, i.e., to "fool" the discriminator by producing novel candidates that the discriminator judges to belong to the true data distribution rather than being synthesized.

A known dataset serves as the initial training data for the discriminator [1][6]. Training involves presenting it with samples from the training dataset until it achieves acceptable accuracy. The generator trains based on whether it succeeds in fooling the discriminator. Typically the generator is seeded with randomized input sampled from a predefined latent space (e.g., a multivariate normal distribution). Thereafter, candidates synthesized by the generator are evaluated by the discriminator. Independent backpropagation procedures are applied to both networks so that the generator produces better samples, while the discriminator becomes more skilled at flagging synthetic samples [7]. When used for image generation, the generator is typically a deconvolutional neural network, and the discriminator is a convolutional neural network.
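To make the training procedure above concrete, the following is a minimal TensorFlow sketch of one adversarial training step. The toy layer sizes, the 28x28 image shape, and the optimizer settings are illustrative assumptions, not the architecture used in this project.

import tensorflow as tf

latent_dim = 100  # size of the predefined latent space

# Toy generator: maps latent vectors to 28x28 single-channel images.
generator = tf.keras.Sequential([
    tf.keras.Input(shape=(latent_dim,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(28 * 28, activation="tanh"),
    tf.keras.layers.Reshape((28, 28, 1)),
])

# Toy discriminator: predicts the probability that an image is real.
discriminator = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_images):
    # Seed the generator with noise from a multivariate normal.
    noise = tf.random.normal((tf.shape(real_images)[0], latent_dim))
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_scores = discriminator(real_images, training=True)
        fake_scores = discriminator(fake_images, training=True)
        # The discriminator learns to flag synthetic samples...
        d_loss = (bce(tf.ones_like(real_scores), real_scores)
                  + bce(tf.zeros_like(fake_scores), fake_scores))
        # ...while the generator is trained to "fool" it.
        g_loss = bce(tf.ones_like(fake_scores), fake_scores)
    # Independent backpropagation for the two networks.
    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    return g_loss, d_loss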
The project incorporates these basic aspects of the interface:
● We are developing a graphical user interface that will interact with the user to take a sketch and process that sketch with the help of our custom-trained model to generate facial output.
● At the backend of the project, we propose to develop an API that will interact with our front end to take in sketch images from the user. The proposed model is developed using TensorFlow and data collected through various channels over the internet.


● With the help of this data, a neural network model is trained that can detect patterns in the sketch to draw real-life colored images of the sketches.
● The predicted sketches of the model will be sent back to our front end and will be displayed to the user in different canvas spaces.
● The user can then make the required changes to the original sketch to add details to the output image.

II. LITERATURE REVIEW

Existing face sketch–photo synthesis methods can be divided into three main categories [1]:
1. Subspace learning-based
2. Sparse representation-based
3. Bayesian inference-based approaches

The subspace learning-based methods include the linear subspace method, which is based on principal component analysis [21], and the nonlinear subspace method, which is based on locally linear embedding (LLE) [12]. Tang and Wang [10], [21] first treated the face-sketch synthesis procedure as a linear process and proposed an eigensketch transformation method. The input photo was projected onto the training photo set to obtain projection coefficients. The target sketch was then synthesized by linearly combining the corresponding training sketches with the previously obtained projection coefficients.

Table 1. Previous Approaches

Sparse representation has been applied to various computer vision tasks, in which subsets weighted by a sparse vector are selected to represent the input signal. Chang et al. [2] assumed that the sketch patch and the corresponding face photo patch could be decomposed on the photo patch dictionary and the face patch dictionary with the same sparse coefficients. By building a coupled dictionary with the photo and sketch patch pairs using sparse coding [20], the input test photo was decomposed on the photo elements in the coupled dictionary to obtain sparse coefficients. The face patch could then be computed using the sketch elements in the coupled dictionary and the previously obtained sparse coefficients.

Bayesian inference-based methods can be further divided into embedded hidden Markov model (EHMM)-based methods and MRF-based methods. Hidden Markov models have been extensively applied to speech recognition problems [21]. Considering the large amount of 2-D spatial information in face images, Gao et al. [12] employed EHMMs to model the nonlinear relationship between the sketches and their photo counterparts.

Deep learning-based methods: One of the earliest and most popular deep learning approaches is the AlexNet deep convolutional neural network (DCNN) architecture [18], which was trained for the task of object classification. Several superior approaches that rely on deep learning have since been introduced, along with new methods to improve the performance of a network. Of particular interest in this project are face recognition methods such as Facebook's DeepFace [14], the DeepID series [21]–[12], Google's FaceNet [15], and VGG-Face [16], which have provided some valuable observations, such as the superior performance generally obtained by using more layers [12], the benefit of a large amount of training data (especially for 'deeper' networks with more trainable parameters) [15], [16], the use of multiple DCNNs [19], and a "triplet-based" objective function which aims at decreasing the distance between features of the same subject and increasing the distance between features of different subjects [15], [16].

III. DATASET DESCRIPTION

The dataset used is custom-designed from images and sketches. The images are taken from a variety of datasets, capturing as many face shapes as possible across regions around the globe. More than 10,000 images are used to train the model. Free-hand sketches are created for a set of 2,000 photos. A machine learning model is trained on these 2,000 face-sketch pairs to generate sketches for the remaining photos, giving the model more data to learn from. These sketches and images are then divided into eight batches, where the images in each batch are grouped by facial structure. A model is trained on these batches using Generative Adversarial Networks.
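As an illustration of how such photo-sketch pairs might be fed to the model during training, the following tf.data sketch pairs each sketch with its photo. The directory layout, the JPEG format, the 256x256 resolution, and the sketch-as-input/photo-as-target direction are assumptions for illustration only.

import tensorflow as tf

IMG_SIZE = 256  # assumed training resolution

def load_pair(photo_path, sketch_path):
    def _load(path):
        img = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
        img = tf.image.resize(img, (IMG_SIZE, IMG_SIZE))
        return tf.cast(img, tf.float32) / 127.5 - 1.0  # scale to [-1, 1]
    # The sketch is the generator input; the photo is the target.
    return _load(sketch_path), _load(photo_path)

photo_paths = sorted(tf.io.gfile.glob("data/photos/*.jpg"))
sketch_paths = sorted(tf.io.gfile.glob("data/sketches/*.jpg"))

dataset = (tf.data.Dataset.from_tensor_slices((photo_paths, sketch_paths))
           .map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
           .shuffle(1000)
           .batch(16)
           .prefetch(tf.data.AUTOTUNE))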
IV. METHODOLOGY

Although the U-Net was first developed for biomedical image segmentation, its impressive performance in image-to-image synthesis has significantly promoted many other computer vision applications such as shape generation [14] and image deblurring [15]. Motivated by its remarkable success, we employ U-Net as the generator of the GAN to perform the face sketch synthesis in this project. In addition, we modify the U-Net model with the residual dense block (RDB) module, which can extract abundant local features via densely connected convolutional layers. Unlike the original RDB, we apply instance normalization [16] after each convolutional layer in the RDB to improve the quality of the synthesized sketch images.
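The following is a minimal Keras sketch of what such an RDB module could look like. The growth rate and number of layers are assumed values, and GroupNormalization with groups=-1 is used here as instance normalization (older code often used the tensorflow_addons InstanceNormalization layer instead).

import tensorflow as tf
from tensorflow.keras import layers

def residual_dense_block(x, growth=32, n_layers=4):
    # Densely connected convolutions: each layer sees the concatenation
    # of the block input and all previous layer outputs.
    inputs = x
    features = [x]
    for _ in range(n_layers):
        h = features[0] if len(features) == 1 else layers.Concatenate()(features)
        h = layers.Conv2D(growth, 3, padding="same")(h)
        # GroupNormalization with groups=-1 acts as instance normalization.
        h = layers.GroupNormalization(groups=-1)(h)
        h = layers.ReLU()(h)
        features.append(h)
    # A 1x1 convolution fuses the dense features back to the input width,
    # followed by the local residual connection.
    fused = layers.Conv2D(inputs.shape[-1], 1, padding="same")(
        layers.Concatenate()(features))
    return layers.Add()([inputs, fused])

# Example: one RDB applied to a 64-channel feature map, as it might
# appear inside the U-Net generator.
inp = layers.Input(shape=(256, 256, 64))
model = tf.keras.Model(inp, residual_dense_block(inp))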


Fig. 2 Neural Net Model

Face sketch recognition with the discriminator network: The basic structure of the discriminator network consists of five convolutional layers with kernel size 3, padding 1, and stride 2, as shown in Fig. 2. The numbers of filters are 32, 32, 64, 64, and 128, respectively. After each convolutional layer, batch normalization and a ReLU activation are stacked. At the end of the basic discriminator network, a feature map of size 7×7×128 is obtained. Connected to the basic discriminator network, two branches are designed to implement the functions of real/fake sketch discrimination and face feature extraction.

Fig 3. Prescribed GAN Deep learning Model

The discrimination branch is a convolutional layer with an output size of 7×7 and a sigmoid activation layer that predicts probability scores between 0 and 1. This is used to determine whether the input sketch image is real or fake. The face feature extraction branch is a fully connected layer with an output size of 1024, with which a face sketch image can be represented as a 1024-dimensional feature vector. To address the lack of training photo-sketch data, we adopt the triplet loss [10] to train the feature extraction branch, which is introduced in the following section.
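A minimal Keras sketch of this two-branch discriminator follows. The 224x224 input resolution is an assumption chosen so that five stride-2 convolutions yield the 7x7x128 feature map described above; the input size actually used in the project is not stated.

import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(input_shape=(224, 224, 1)):
    inp = layers.Input(shape=input_shape)
    x = inp
    # Five 3x3 convolutions with stride 2, each followed by batch
    # normalization and ReLU, as described in the text.
    for filters in (32, 32, 64, 64, 128):
        x = layers.Conv2D(filters, 3, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    # x now has shape (7, 7, 128) for a 224x224 input.

    # Discrimination branch: a 7x7 map of real/fake probability scores.
    real_fake = layers.Conv2D(1, 3, padding="same", activation="sigmoid",
                              name="real_fake")(x)

    # Face feature extraction branch: a 1024-D embedding per sketch,
    # trained with the triplet loss in the paper.
    features = layers.Dense(1024, name="face_features")(layers.Flatten()(x))
    return tf.keras.Model(inp, [real_fake, features])

discriminator = build_discriminator()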

Loss functions: Assume x and y are the training photo-sketch pair, and ŷ is the synthesized sketch. The loss functions used for training the proposed GAN model are introduced below. The loss function for the generator: face sketch synthesis can be regarded as a kind of style transfer [17]. Thus, the synthesized sketch should have a sketch style and preserve the content of the photo image. To deal with the style effectively, it was found that the perceptual loss [18] is improved by extracting high-level features from deep networks such as VGG-19 [19].
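As a sketch of this perceptual loss idea, the following compares high-level VGG-19 features of the synthesized sketch and the ground truth. The choice of the block4_conv2 layer, the [0, 255] input range, and the plain L2 distance are illustrative assumptions.

import tensorflow as tf

# Frozen VGG-19 feature extractor; block4_conv2 is an assumed choice
# of high-level feature layer.
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
feature_extractor = tf.keras.Model(
    vgg.input, vgg.get_layer("block4_conv2").output)
feature_extractor.trainable = False

def perceptual_loss(y_true, y_pred):
    # Inputs are assumed to be 3-channel images in [0, 255].
    f_true = feature_extractor(
        tf.keras.applications.vgg19.preprocess_input(y_true))
    f_pred = feature_extractor(
        tf.keras.applications.vgg19.preprocess_input(y_pred))
    return tf.reduce_mean(tf.square(f_true - f_pred))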
V. INTERFACE DESIGN

We are developing a web interface that will interact with the user to take sketches and process those sketches with the help of our custom-trained model to generate facial output. At the backend of the project, we propose to develop an API that will interact with our front end to take sketch images from users. The proposed model is developed using TensorFlow, and the data is collected through various channels over the internet. With the help of this data, a neural network model is trained that can detect patterns in the sketch to draw real-life colored images of the sketches. The predicted sketches of the model will be sent back to our front end and displayed to users in different canvas spaces. The user can then make the required changes to the original sketch to add details to the output image.
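A minimal sketch of this backend idea is given below, assuming a Flask endpoint and a saved Keras generator; the route, the model file name, and the 256x256 grayscale input are hypothetical placeholders rather than the project's actual implementation.

import io
import numpy as np
import tensorflow as tf
from flask import Flask, request, send_file
from PIL import Image

app = Flask(__name__)
model = tf.keras.models.load_model("sketch2face_generator.h5")  # hypothetical path

@app.route("/generate", methods=["POST"])
def generate():
    # Read the uploaded sketch and scale it to the model's input range.
    sketch = Image.open(request.files["sketch"].stream).convert("L").resize((256, 256))
    x = np.asarray(sketch, dtype=np.float32)[None, ..., None] / 127.5 - 1.0
    face = model.predict(x)[0]  # assumed generator output in [-1, 1]
    face = ((face + 1.0) * 127.5).clip(0, 255).astype(np.uint8)
    # Return the generated face image to the front end as a PNG.
    buf = io.BytesIO()
    Image.fromarray(face).save(buf, "PNG")
    buf.seek(0)
    return send_file(buf, mimetype="image/png")

if __name__ == "__main__":
    app.run()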
Fig 4. Program Flow Chart

From the above flow chart, we can see that the Sketch to Face model first accepts a video as input. After getting the video, it splits it into frames. Once the video is split into frames, the model passes the images to a Convolutional Neural Network (CNN). A CNN is a type of artificial neural network, used in image recognition and processing, that is specifically designed to process pixel data. CNNs are powerful image-processing and artificial intelligence (AI) tools that use deep learning to perform both generative and descriptive tasks, often in machine vision applications that include image and video recognition, along with recommender systems and natural language processing (NLP). A CNN consists of a system much like a multilayer perceptron that has been designed for reduced processing requirements.

After the images are passed to the CNN, a condition is checked to determine whether the data is training data. If it is training data, the CNN is trained and the process continues. If it is not, predictions are generated by the Sketch to Face model and we finally get our desired output.
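As a sketch of the first step in this flow, the following OpenCV snippet splits an input video into frames before they are handed to the CNN; the file name is a placeholder.

import cv2

def video_to_frames(path):
    frames = []
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()  # ok becomes False when the video ends
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames

frames = video_to_frames("input_video.mp4")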
REFERENCES

[1] B. Klare, Z. Li, and A. K. Jain, "Matching forensic sketches to mug shot photos," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 3, pp. 639–646, 2011.
[2] B. Klare and A. K. Jain, "Heterogeneous face recognition using kernel prototype similarities," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 6, pp. 1410–1422, Jun. 2013.
[3] S. Klum, H. Han, A. K. Jain, and B. Klare, "Sketch based face recognition: Forensic vs. composite sketches," in Int. Conf. Biometrics (ICB), 2013, pp. 1–8.
[4] H. Han, B. F. Klare, K. Bonnen, and A. K. Jain, "Matching composite sketches to face photos: A component-based approach," IEEE Trans. Inf. Forensics Security, vol. 8, no. 1, pp. 191–204, Jan. 2013.
[5] C. Galea and R. A. Farrugia, "A large-scale software-generated face composite sketch database," in Int. Conf. Biometrics Special Interest Group (BIOSIG), Sep. 2016, pp. 1–5.
[6] Identi-Kit, Identi-Kit Solutions. http://www.identikit.net/.
[7] VisionMetric, About EFIT-V. http://www.visionmetric.com/products/aboute-fit/.
[8] D. McQuiston, L. Topp, and R. Malpass, "Use of facial composite systems in US law enforcement agencies," Psychology, Crime and Law, vol. 12, no. 5, pp. 505–517, 2006.
[9] H. Galoogahi and T. Sim, "Inter-modality face sketch recognition," in IEEE Int. Conf. Multimedia and Expo (ICME), July 2012, pp. 224–229.
[10] N. Wang, D. Tao, X. Gao, X. Li, and J. Li, "A comprehensive survey to face hallucination," Int. J. Comput. Vision, vol. 106, no. 1, pp. 9–30, 2014.
[11] C. Galea and R. A. Farrugia, "Fusion of intra- and inter-modality algorithms for face-sketch recognition," in Computer Analysis of Images and Patterns, vol. 9257, 2015, pp. 700–711.
[12] C. Galea and R. A. Farrugia, "Face photo-sketch recognition using local and global texture descriptors," in European Signal Processing Conference (EUSIPCO), Budapest, Hungary, Aug. 2016.
[13] W. Zhang, X. Wang, and X. Tang, "Coupled information-theoretic encoding for face photo-sketch recognition," in IEEE Conf. Comput. Vision Pattern Recog., 2011, pp. 513–520.
[14] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: Closing the gap to human-level performance in face verification," in IEEE Conf. Comput. Vision Pattern Recog., June 2014, pp. 1701–1708.
[15] C. Peng, N. Wang, X. Gao, and J. Li, "Face recognition from multiple stylistic sketches: Scenarios, datasets, and evaluation," Pattern Recognition, vol. 84, pp. 262–272, 2018.
[16] N. Wang, X. Gao, J. Sun, and J. Li, "Anchored neighborhood index for face sketch synthesis," IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 9, pp. 2154–2163, 2018.
[17] X. Wang and X. Tang, "Face photo-sketch synthesis and recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 11, pp. 1955–1967, 2009.
[18] H. Zhou, Z. Kuang, and K. Wong, "Markov weight fields for face sketch synthesis," in IEEE Conf. Comput. Vision Pattern Recog., 2012, pp. 1091–1097.
[19] N. Wang, X. Gao, and J. Li, "Random sampling for fast face sketch synthesis," Pattern Recognition, vol. 76, pp. 215–227, 2018.
[20] L. Zhang, L. Lin, X. Wu, S. Ding, and L. Zhang, "End-to-end photo-sketch generation via fully convolutional representation learning," in Proc. 5th ACM Int. Conf. Multimedia Retrieval, 2015, pp. 627–634.
[21] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
