
S2HConv-Caps-BiGRU: Deep Learning based Heterogeneous Face Recognition Model with Divergent Stages

*1 Narasimhula Balayesu, 2 Dr. Avuthu Avinash Reddy

*1 Research Scholar, CSE Department,
Vignan's Foundation for Science, Technology and Research,
Vadlamudi, Andhra Pradesh - 522213, India.
2 Assistant Professor,
Department of Advanced Computer Science and Engineering,
Vignan's Foundation for Science, Technology and Research,
Vadlamudi, Andhra Pradesh - 522213, India.
Email: narasimhulabalayesu23@gmail.com
Abstract: Real-time images of faces captured in different spectrum bands are considered heterogeneous images. Heterogeneous Face Recognition (HFR) matches faces across domains and is crucial to public safety. This paper proposes an HFR approach based on Deep Neural Networks (DNN). Feature maps are extracted from two images, a gallery image and a sketch image, using S2HConv-Caps-BiGRU (Squirrel Search Heterogeneous Convolutional-Capsule-Bidirectional Gated Recurrent Unit). To recognize faces efficiently, the coupled representation similarity metric (CRSM) measures the similarity of the two feature maps. The experimental results are evaluated against state-of-the-art (SOTA) methods using statistical measures such as accuracy, recall, Jaccard score, Dice score, mean square error (MSE), image similarity, performance, and RMSE. Compared to other SOTA methods, the model produces the best results: the accuracy on the CUFS dataset is 98.7%.

Keywords: Heterogeneous Face Recognition, Squirrel Search Heterogeneous Convolutional-Capsule-Bidirectional Gated Recurrent Unit, coupled representation similarity metric, state-of-the-art, deep neural networks, CUHK Face Sketch database.
1. Introduction

Today's digital life is equipped with various digital devices such as laptops, palmtops, and smartphones [1]. With the increase in digital devices, security threats and online fraud have multiplied; hence, biometric authentication is becoming more necessary for system security. A biometric system aims to match a gallery sample and a query sample using invariant features extracted from both samples [2]. Various biometric traits, such as palm prints, fingerprints, faces, DNA, and gait, are used in forensic investigation and authentication [3]. It is widely accepted that biometrics and human characteristics and traits can effectively be used to identify and authenticate people. They are used for access control, surveillance, and national and international security systems [4]. There are various image-capturing devices, such as CCTV, thermal-infrared (TIR) cameras, and near-infrared (NIR) cameras [5]. NIR cameras are necessary for taking images at night, while thermal cameras are used to examine the aliveness of a living body [6]. There are many problems with existing face recognition algorithms; one of the significant challenges is detecting faces across different modalities [7].

There are two types of face recognition: homogeneous and heterogeneous face recognition (HFR) [8]. If a similar sensor or camera captures the face images of both the gallery and the query, it is termed homogeneous face recognition [9]. The major challenges of this type of face recognition are occlusions, poses, intra-class variations, expressions, and illuminations [10]. The most commonly used databases, such as FERET [11], ORL [12], and extended Yale [13], are composed of images taken with a similar camera device under some changes in illumination and pose. If a dissimilar sensor or camera captures the face images of the gallery and the query, it is termed heterogeneous face recognition. The sensors or cameras include NIR, TIR, and visible light (VIS) [14]. The current advancement of modern digital devices produces images of a face in different modalities. Previous face recognition systems used only a single modality of photographs: visible light. Due to enhancements in image-capturing sensors, there are various changes in modern image modalities [15]. The usage of these modern cameras has grown due to needs in security, multimedia applications, and law enforcement. TIR cameras sense body heat levels for aliveness identification [16]; hence, they are essential at night, especially for the forest department, to observe intruders and animals. NIR cameras take images under night illumination to recognize faces [17]. The major issue with heterogeneous face recognition is the difference in modalities between gallery and query images [18].

Synthesis-based strategies first convert an image of one modality into another modality, so that the images become homogeneous in nature, and then apply conventional homogeneous face recognition strategies [19]. Feature descriptor-based strategies extract features directly from the image, and those features are used for recognition [20]. Projection space strategies project heterogeneous face images, including the gallery, into a latent common space [21]. Deep convolutional neural networks (DCNNs) are broadly used in face recognition. DCNN techniques learn from large datasets and extract the most important discriminative features that explain the non-linear relationship between different modalities [22]. Hence, many researchers have started to introduce DCNN-based HFR systems. DCNN methods show great performance in face recognition even on small-scale HFR datasets with naïve training schemes [23]. HFR strategies exploit several properties of DCNNs; the first is how the network is trained with the available training data [24]. Publicly available training datasets such as VGG2-Face, MS-Celeb-1M, VGG-Face, Oulu-CASIA NIR-VIS, CASIA WebFace, and CASIA NIR-VIS 2.0 range from several thousand to half a million identities [25]. However, this suffers from the drawback of model discrepancy, because all parameters are shared while the inputs are of different modalities [26]. Compared to the VGG network, Inception-ResNet performs well in HFR systems. The third property is the design of the loss function [27, 28]. Therefore, effective DL techniques must be used to enhance the performance of HFR in face recognition.
Contribution:

The key contributions of this proposed work are:
• The preprocessing phase in face recognition applications commonly includes cropping, alignment, resizing, normalizing, and filtering methods.
• An enhanced guided filtering approach is used for the removal of noise.
• S2HConv-Caps-BiGRU is proposed to extract the feature maps of the gallery and sketch images.
• The similarity between the two feature maps is measured using CRSM to recognize the face efficiently.

The rest of the paper is organized as follows: Section 2 reviews recent related works on predicting a similar image. Section 3 provides a detailed description of the proposed similarity score framework. Section 4 validates the performance of the proposed model by comparing it with existing models. Section 5 presents the experimental evaluation. Finally, the paper is concluded in Section 6.
2. Related Work

Peng et al. [29] proposed the graphical representation-based HFR approach (G-HFR). Markov networks are used to represent heterogeneous image patches individually, taking the spatial compatibility between adjacent image patches into account. To measure the degree of similarity between learned graphical representations, a coupled representation similarity metric (CRSM) is used. Extensive testing on a variety of HFR scenarios (viewed sketch, forensic sketch) demonstrates that the suggested method performs better than state-of-the-art approaches.

Wang et al. [30] proposed a Bayesian framework consisting of two parts: a neighbour selection model and a weight computation model. The framework is applied to Bayesian face sketch synthesis. The essential rationale behind the proposed Bayesian method is that it considers the neighbouring spatial constraint between adjacent image patches in both models, whereas conventional methods ignore this constraint when selecting neighbours or calculating weights. In both subjective perception and objective evaluation, extensive experiments on the CUHK face sketch database demonstrate that the proposed Bayesian method performs better than state-of-the-art methods.

The Wasserstein CNN (WCNN) was proposed by He et al. [31] to learn features that are invariant between near-infrared and visible face images (i.e., NIR-VIS face recognition). WCNN's low-level layers are trained using readily accessible visible spectrum face photos. The high-level layer has three components: the NIR layer, the VIS layer, and the NIR-VIS shared layer. The NIR-VIS shared layer is intended to learn a modality-invariant feature subspace, whereas the former two layers learn modality-specific features. The Wasserstein distance is added to the shared NIR-VIS layer to quantify the differences between the heterogeneous feature distributions. For an invariant deep feature representation of heterogeneous face images, WCNN learning tries to minimize the Wasserstein distance between the NIR and VIS distributions.

The heterogeneous face interpretable disentangled representation (HFIDR), which can explicitly interpret dimensions of the face representation rather than performing a simple mapping, was proposed by Liu et al. [32]. The interpretable structure is used to extract latent identity information for cross-modality recognition and to convert the modality component to synthesize cross-modality faces. A multimodality heterogeneous face interpretable disentangled representation (M-HFIDR) was further proposed to extend the basic method to multimodality face recognition and synthesis. To assess generalization capacity, a new large-scale face sketch dataset was built.

George et al. [33] suggested a surprisingly straightforward and efficient technique for matching facial images across different sensing modalities. The basic concept behind the suggested method is to close the domain gap by adding a new neural network component called the Prepended Domain Transformer (PDT) before a pre-trained face recognition (FR) model. State-of-the-art performance on many HFR benchmarks was attained by retraining this new block with a small number of paired samples in a contrastive learning configuration. Using the suggested general framework, the PDT blocks may be retrained for various source-target pairings.

Cho et al. [34] proposed a graph-structured module that gathers generic facial traits as well as global relational information. The relational information between each identity's intra-facial components is comparable across all modalities; therefore, cross-domain matching can be aided by modelling the links between features. Relational graph modelling (RGM) allows relation propagation to reduce texture reliance while maintaining the benefits of pre-trained features. The RGM extracts global face geometrics from locally linked convolutional features to identify long-distance associations. Additionally, a new conditional margin loss function (C-softmax) is provided for effective projection learning of the embedding vector in HFR. Table 1 shows the comparison with the state-of-the-art techniques.
Table 1: Comparison with the state-of-the-art techniques

| Reference | Methods | Merits | Demerits |
|---|---|---|---|
| Peng et al. [29] | Graphical HFR (G-HFR) | Face recognition precision is better with G-HFR. | No support in situations with more HFR scenarios. |
| Wang et al. [30] | Anchored neighbourhood index (ANI) | ANI may be performed offline; it does not raise the computational complexity. | Face recognition accuracy is low. |
| He et al. [31] | Wasserstein CNN (WCNN) | Performs well, particularly when the false acceptance rate (FAR) is low. | Compared to some other methods, the ROC value is low. |
| Liu et al. [32] | Heterogeneous face interpretable disentangled representation (HFIDR) | The symmetric adversarial loss enhances both training stability and synthesis quality. | Due to the large disparity and small training sample, it is difficult to produce satisfying images. |
| George et al. [33] | Prepended Domain Transformer (PDT) | More compatible with various "imaging" modalities, since these structurally resemble visible face images. | In the visual domain, heterogeneous modalities like hand-drawn sketches fall short and may be more difficult to adapt. |
| Cho et al. [34] | Relational Graph Module (RGM), Node Attention Unit (NAU) | Relation propagation reduces texture dependency through the RGM without compromising the pre-trained features. | Data in relational databases can become complex as the amount of data grows, and the relationships between data pieces become more complex. |
Existing approaches to computing a face recognition similarity score have a few disadvantages, such as high computational cost and manually assigned weights for training. To resolve these problems, a new intelligent approach is proposed: a deep learning-based heterogeneous face recognition model with divergent stages for gallery and sketch images.
3. System Model for the Deep Learning-Based Heterogeneous Face Recognition Model

When the query and gallery face images are captured by different cameras or sensors, such as visible light versus near-infrared or visible light versus thermal, the face recognition process is called heterogeneous face recognition. Active infrared relies on reflected signals from infrared-illuminated objects; short-wave infrared (SWIR, 1-3 μm) and near-infrared (NIR, 0.74-1 μm) make up this class. Passive (thermal) infrared involves measuring the radiation emitted by the body; it is divided into long-wave infrared (8-14 μm) and middle-wave infrared (3-5 μm). Normally, face recognition applications require alignment, filtering, and cropping during the preprocessing phase. The initial stage of most face recognition systems is to detect facial regions by segmenting an image into face and non-face parts. The segmented face area is then aligned, resized, and normalized for better face recognition results. Image filtering is one of the most important operations in image processing, as it reduces noise and improves the visual quality of an image. A filter can be used either to remove the noise in images or to blur them. The filter used here is an enhanced guided filtering approach to remove noise. These filters are intended to help compensate for intensity changes, uneven lighting, and appearance changes. Figure 1 shows the stages of the heterogeneous face recognition pipeline.

Figure 1: Stages of the heterogeneous face recognition pipeline (cropping of eyebrows, eyes, nose, and mouth; resizing; normalization; guided filtering; S2HConv-Caps-BiGRU feature-map extraction; matching; and evaluation of accuracy and performance on the CUFS dataset in Python).
Figure 2: Flow diagram of the S2HConv-Caps-BiGRU heterogeneous face recognition model (the sketch image and the gallery image are partitioned, S2HConv-Caps-BiGRU extracts a feature map for each, and the feature maps are compared against the representation dataset using the similarity measure (CRSM) to produce a similarity score).
A deep learning bionic model is proposed for HFR. This paper suggests an HFR approach based on neural network representations. The next stage is feature extraction: S2HConv-Caps-BiGRU (Squirrel Search Heterogeneous Convolutional-Capsule-Bidirectional Gated Recurrent Unit) provides a feature map for each of the two images, the gallery image and the sketch image. Here, a bionic squirrel search model is introduced to reduce the loss function of the network model. Finally, the similarity between the gallery image and the sketch image is calculated using the coupled representation similarity metric (CRSM). Figure 2 shows the flow diagram of the proposed heterogeneous face recognition model.

Matching visible and near-infrared images is one of the most studied research topics in heterogeneous face recognition (HFR). Recognition and synthesis are used together: new heterogeneous images are synthesized to improve matching results across the near-infrared and visible spectrums.
3.1. Preprocessing

Face recognition applications commonly use alignment, cropping, and filtering methods during preprocessing. An affine transformation maps the corresponding images into canonical coordinates using basis points (e.g., nose, eyes, mouth) that are either automatically detected or manually located. As a result, a set of geometrically aligned images is produced from the different modalities. The images are then cropped around the faces after they have been aligned: a Region of Interest (ROI) is created based on the detected rectangular bounding box of the face. Depending on how the ROI is cropped, the resolution varies. Hence, normalization is applied, which changes the range of pixel intensity values in a digital image into a standard range.
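A minimal sketch of this crop-resize-normalize pipeline is shown below, assuming OpenCV's bundled Haar cascade face detector; the cascade choice, the 128×128 output size, and the helper name are illustrative assumptions, not part of the proposed method, and the alignment step is omitted for brevity.

```python
import cv2
import numpy as np

def preprocess_face(image_path, size=(128, 128)):
    """Detect, crop, resize, and normalize a face region (illustrative sketch)."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Detect the face region with OpenCV's bundled Haar cascade.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(img, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]          # ROI from the detected bounding box
    roi = img[y:y + h, x:x + w]    # crop around the face
    roi = cv2.resize(roi, size)    # resize to a fixed resolution
    # Normalize pixel intensities to the standard range [0, 1].
    return roi.astype(np.float32) / 255.0
```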
3.1.1. Guided filtering approach
Guided filters perform edge-preserving smoothing whose strength can be controlled through their parameters. To smooth the input images while accounting for this property and for the intensity variations in the gallery and sketch images, the present paper adopts a guided filter. Guided filters rely on a local linear model between the filter output $s$ and the guidance $G$; the input image is smoothed using the guidance image. In a window $c_k$ centred on pixel $k$ [34], $s$ is assumed to be a linear transform of $G$:

$$s_i = y_k G_i + z_k, \quad \forall i \in c_k \qquad (1)$$

Here $(y_k, z_k)$ are linear coefficients assumed constant in $c_k$. Since $\nabla s = y \nabla G$, the local linear model implies that $s$ has an edge only where $G$ has one. Accordingly, the linear coefficients are determined by minimizing the difference between the input image $P$ and the output image:

$$U(y_k, z_k) = \sum_{i \in c_k} \left( (y_k G_i + z_k - P_i)^2 + \epsilon\, y_k^2 \right) \qquad (2)$$

The linear regression method can be used to solve equation (2):

$$y_k = \frac{\frac{1}{|c|} \sum_{i \in c_k} G_i P_i - \mu_k \bar{P}_k}{\sigma_k^2 + \epsilon} \qquad (3)$$

$$z_k = \bar{P}_k - y_k \mu_k \qquad (4)$$

Here $\epsilon$ denotes the regularization parameter preventing $y_k$ from becoming too large, $\mu_k$ and $\sigma_k^2$ are the mean and variance of $G$ in $c_k$, $\bar{P}_k$ denotes the mean value of $P$ in $c_k$, and $|c|$ denotes the number of pixels in $c_k$. After computing $(y_k, z_k)$ for all patches $c_k$ in the image, the filter output is computed as:

$$s_i = \frac{1}{|c|} \sum_{k:\, i \in c_k} (y_k G_i + z_k) = \bar{y}_i G_i + \bar{z}_i \qquad (5)$$

Here $\bar{y}_i$ and $\bar{z}_i$ are the mean values of $y_k$ and $z_k$ over all windows containing pixel $i$. The guided filter has better edge-preserving smoothing properties than other filters due to the linear relationship between filter output and guidance. Because the implementation is non-approximate, the generated results are of good quality. Moreover, the running time of the algorithm is linear in the number of pixels.
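A minimal NumPy/OpenCV sketch of equations (1)-(5) follows; the box-filter window radius r and the regularization strength eps are illustrative parameter choices, not values specified in the paper.

```python
import cv2
import numpy as np

def guided_filter(G, P, r=8, eps=0.01):
    """Guided filter: smooth input P under guidance G (eqs. 1-5, sketch)."""
    mean = lambda x: cv2.boxFilter(x, -1, (2 * r + 1, 2 * r + 1))
    mu_G, mu_P = mean(G), mean(P)            # per-window means of G and P
    var_G = mean(G * G) - mu_G * mu_G        # variance of G in each window
    cov_GP = mean(G * P) - mu_G * mu_P       # numerator of eq. (3)
    y = cov_GP / (var_G + eps)               # eq. (3): linear coefficient y_k
    z = mu_P - y * mu_G                      # eq. (4): offset z_k
    return mean(y) * G + mean(z)             # eq. (5): output via averaged y, z

# For denoising, the guidance is typically the noisy image itself (G = P),
# which reduces the filter to edge-preserving smoothing.
```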
running time of the algorithm only depends on the number of pixels
13 Feature Mapping is a method of representing features on a graph and their relevance. In this
14 way, the features can be visualized, and their corresponding information can be accessed
15 visually. By excluding the irrelevant features, only the relevant ones are retained. CNN’s
16 produce feature maps due to applying a filter to the previous layer. In a feature map, different
17
18 features are mapped according to where they appear in an image. The neural network looks for
19 ‘features’ like straight lines, edges, or even objects.
3.2. Squirrel Search Heterogeneous Convolutional-Capsule-Bidirectional Gated Recurrent Unit (S2HConv-Caps-BiGRU)

To capture contextual information in unique orientations, this work introduces S2HConv-Caps-BiGRU, a deep neural network model combining a convolutional layer, a capsule network, and a BiGRU. The following section provides a detailed overview of the various components of the S2HConv-Caps-BiGRU system: convolutional layers integrated with squirrel search optimization, capsule networks, and BiGRU layers, followed by fully connected and output layers. Figure 3 shows the architecture of S2HConv-Caps-BiGRU.
Figure 3: Architecture of S2HConv-Caps-BiGRU (an input layer (w1 … wn) feeds a convolutional layer, a capsule layer, and a BiGRU layer, with the squirrel search algorithm optimizing the weights).
55
3.2.1. Input layer
56
57 It passes preprocessed gallery and sketch images to the input layer of the S2HConv-Caps-
58 BiGRU model. As an outcome, the input layer converts the text input into a numerical vector.
59 According to mathematical theory, the input gallery image and sketch image are represented
60
61
62
63
64
65
as follows: When there is a gallery image and sketch image, each image will be replaced by its
1 dictionary index, such as T ∈<1×n.
2
3
4 3.2.2. Convolution layer
5
6 To extract the feature map, a convolutional layer is applied over the embedding vector in
7 S2HConv-Caps-BiGRU. Because the input embedding image is a row image, the proposed
8
9
model uses a one-dimensional convolutional operation. Squirrel Search optimization is
10 integrated to reduce weight. Using 32 filters of three different filter sizes, the convolutional
11 layer extracts 32 sequences of hate-related temporal and spatial features. Convolution is
12 performed on the input image by 32 filters, extracting features as f g  [ f1 , f 2 ...., f16 ] input
13
14 images. The underlying feature map is obtained via a max-pooling process.
15 f m  f ( wtc .d t  z ) (6)
16
th c
17
Here m feature of sequence, f m generate from d t word window, then wt , z , and f (.)
18
19 indicates filter weight, bias, and ReLU, respectively.
3.2.3. Capsule Network Layer

One capsule can hold several other capsules in a network of capsules. The image classification network uses the capsule module to represent the capsule orientation and classification probability, describing the different orientations of the image. Therefore, capsule networks are richer and more efficient than traditional neural network models, including CNNs: a capsule produces a vector rather than a scalar value like a CNN pooling layer. Based on the advantages discussed above, the S2HConv-Caps-BiGRU model uses the capsule network. This is accomplished by passing the final hidden state $f_m$, the output of the convolution layer, to the capsule network layer:

$$\hat{e}_{j|i} = w_t^{cap} e_i \qquad (7)$$

A non-linear activation function converts the final hidden state $f_m$ of S2HConv-Caps-BiGRU into a feature capsule $e_i$. Here, $e_i$ determines the correlation between the input and output layers, and $\hat{e}_{j|i}$ is used to predict the outputs, where $w_t^{cap}$ is the weight for the input.

$$g_j = \sum_{i=1}^{n} a_{ij} \hat{e}_{j|i} \qquad (8)$$

The coupling coefficients $a_{ij}$ are calculated using the dynamic routing process. This process ignores input images that contain trivial and irrelevant elements. Gallery and sketch images are weighted according to the coupling coefficient $a_{ij}$: the weight of a feature is higher when it has a high $a_{ij}$ value, and vice versa. The capsule output $g_j$ is calculated by summing all prediction feature maps:

$$g_j = \sum_{i=1}^{n} a_{ij} \hat{e}_{j|i} \qquad (9)$$

The softmax function calculates the coupling coefficient $a_{ij}$:

$$a_{ij} = \frac{\exp(z_{ij})}{\sum_{e} \exp(z_{ie})} \qquad (10)$$

The routing logit $z_{ij}$ is updated by the following equation, which represents the capsule network of the higher layers:

$$z_{ij} = z_{ij} + R_j^T f(e_i, j) \qquad (11)$$

Finally, the output $r_j$ is normalized using a squash function (a non-linear activation function) that accounts for the different orientations:

$$r_j = \frac{\|g_j\|^2}{1 + \|g_j\|^2} \, \frac{g_j}{\|g_j\|} \qquad (12)$$
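A NumPy sketch of the squash function and the dynamic routing loop behind equations (8)-(12) follows; the three routing iterations and the array shapes are conventional assumptions from the capsule network literature, not values stated in this paper.

```python
import numpy as np

def squash(g, axis=-1, eps=1e-9):
    """Eq. (12): rescale vector g so its norm lies in [0, 1)."""
    sq_norm = np.sum(g * g, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * g / np.sqrt(sq_norm + eps)

def dynamic_routing(e_hat, num_iters=3):
    """Dynamic routing over predictions e_hat of shape (n_in, n_out, dim)."""
    n_in, n_out, _ = e_hat.shape
    z = np.zeros((n_in, n_out))                    # routing logits z_ij
    for _ in range(num_iters):
        a = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)   # eq. (10)
        g = (a[:, :, None] * e_hat).sum(axis=0)    # eqs. (8)-(9): weighted sum
        r = squash(g)                              # eq. (12): squashed output
        z = z + np.einsum('iod,od->io', e_hat, r)  # eq. (11): agreement update
    return r
```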
3.2.4. BiGRU layer

BiGRU is a type of RNN used to extract sequences in both the backward and forward directions in sequential modelling problems. In BiGRU, a backward GRU ($\overleftarrow{GRU}$) and a forward GRU ($\overrightarrow{GRU}$) process the feature sequence in reverse order (i.e., $f_{32}$ to $f_1$) and in forward order (i.e., $f_1$ to $f_{32}$), respectively. Using the BiGRU layer at the output of the convolutional layer, the proposed S2HConv-Caps-BiGRU model obtains forward and reverse sequences with contextual information. The following equations give the BiGRU outputs in both directions. The BiGRU-based representation of a feature sequence concatenates the forward hidden state $\overrightarrow{h_v}$ and the reverse hidden state $\overleftarrow{h_z}$; information collected around $w_t^{GRU}$ is incorporated into the two hidden states. Finally, equation (15) gives the final hidden state $h_q$, representing the concatenated sequence that incorporates the contextual information:

$$\overrightarrow{h_v} = \overrightarrow{GRU}(w_t^{GRU}), \quad m \in [1, 32] \qquad (13)$$

$$\overleftarrow{h_z} = \overleftarrow{GRU}(w_t^{GRU}), \quad m \in [1, 32] \qquad (14)$$

$$h_q = [\overrightarrow{h_v}, \overleftarrow{h_z}] \qquad (15)$$

To optimize the weights of the Conv-Caps-BiGRU, a squirrel search algorithm is integrated here:

$$W = \{w_t^c, w_t^{cap}, w_t^{GRU}\} \qquad (16)$$

Here $W$ indicates the weights of the Conv-Caps-BiGRU neural network; the weights $w_t^c$, $w_t^{cap}$, and $w_t^{GRU}$ are the ones to be optimized. This paper introduces the squirrel search algorithm (SSA) to optimize the weight function.
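The bidirectional wrapper in Keras realizes equations (13)-(15) directly, as in the sketch below; the 64 hidden units and 64-dimensional feature steps are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Eqs. (13)-(15): a forward and a backward GRU whose final hidden states
# are concatenated into h_q (minimal sketch; unit sizes are assumed).
bigru = tf.keras.Sequential([
    layers.Bidirectional(
        layers.GRU(64),               # forward pass over f_1 ... f_32
        merge_mode="concat",          # h_q = [h_forward, h_backward]
        input_shape=(32, 64)),        # 32 feature steps from the conv layer
])
```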
3.2.5. Squirrel search algorithm for Weight Optimization

The proposed method has two search strategies: the first is a jumping strategy and the second is a progressive strategy. As part of the evolutionary process, the appropriate strategy is automatically selected through a linear-regression selection strategy, which increases the robustness of the squirrel search algorithm. The escape operation sufficiently develops the search space, and the death operation further explores the developed space via the jumping strategy; this balances SSA's exploitation and exploration capabilities. In the progressive search strategy, the mutation operation preserves the current evolutionary information and pays more attention to maintaining population diversity.
3.2.5.1. Random Initialization

The initial population $W$ is generated randomly with $k$ flying squirrels:

$$W = \begin{bmatrix} W_{1,1} & W_{1,2} & \cdots & W_{1,d} \\ W_{2,1} & W_{2,2} & \cdots & W_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ W_{m,1} & W_{m,2} & \cdots & W_{m,d} \end{bmatrix} \qquad (17)$$

Here $W_{i,j}$ indicates the $j$th dimension of the $i$th flying squirrel. In the forest, flying squirrels are allocated their initial locations according to a uniform distribution:

$$W_i = W_L + U(0,1) \times (W_U - W_L) \qquad (18)$$

where $U(0,1)$ is a uniformly distributed random number in the range [0, 1], and $W_L$ and $W_U$ represent the lower and upper bounds of the $j$th dimension of the $i$th flying squirrel.
3.2.5.2. Fitness evaluation

To evaluate each location, the decision variables (solution vectors) are inserted into a user-defined fitness function, and the resulting values are stored in the following array:

$$f = \begin{bmatrix} f(W_{1,1}, W_{1,2}, \ldots, W_{1,d}) \\ f(W_{2,1}, W_{2,2}, \ldots, W_{2,d}) \\ \vdots \\ f(W_{n,1}, W_{n,2}, \ldots, W_{n,d}) \end{bmatrix} \qquad (19)$$

The fitness value of the neural network layer weights depicts the quality of the optimal weights; here $W = (w_t^c, w_t^{cap}, w_t^{GRU})$. After storing the fitness values, the array is sorted in ascending order. The squirrel with the minimum fitness value is declared to be on the hickory nut tree; afterwards, the next best three flying squirrels are considered to move towards the hickory nut tree from the acorn trees.
3.2.5.3. Generating location

Case 1: Flying squirrels may move from the acorn nut trees ($W_{i,j}^{ant}$) to the hickory nut trees ($W_{i,j}^{hnt}$). The new location of a squirrel in this case is determined as follows:

$$W_{i,j}^{ant} = \begin{cases} W_{i,j}^{ant} + x_g \, s_c \, (W_{i,j}^{hnt} - W_{i,j}^{ant}), & R_1 \geq P \\ \text{random location}, & \text{otherwise} \end{cases} \qquad (20)$$

where $x_g$ indicates a random gliding distance, $R_1$ indicates a random number in the range [0, 1], $W_{i,j}^{hnt}$ denotes the location of a squirrel that has reached a hickory nut tree, and $t$ represents the current iteration. In the mathematical model, the gliding constant $s_c$ is used to balance exploration and exploitation.

Case 2: A squirrel on a normal tree ($W_{i,j}^{nort}$) can move towards an acorn tree; the new location of the squirrel can be determined as follows:

$$W_{i,j}^{nort} = \begin{cases} W_{i,j}^{nort} + x_g \, s_c \, (W_{i,j}^{ant} - W_{i,j}^{nort}), & R_2 \geq P \\ \text{random location}, & \text{otherwise} \end{cases} \qquad (21)$$

Case 3: In some cases, squirrels that have already consumed acorns on normal trees may move towards hickory trees to store hickory nuts for times of food scarcity. Accordingly, new squirrel locations can be obtained as follows:

$$W_{i,j}^{nort} = \begin{cases} W_{i,j}^{nort} + x_g \, s_c \, (W_{i,j}^{hnt} - W_{i,j}^{nort}), & R_3 \geq P \\ \text{random location}, & \text{otherwise} \end{cases} \qquad (22)$$

Here $R_2$ and $R_3$ represent random numbers in the range [0, 1]. The optimization algorithm was designed using an approximated model of the gliding behaviour. The flying squirrels' fitness values are computed and their locations updated in each iteration until the maximum number of iterations is reached.
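A compact sketch of this SSA loop applied to a generic fitness function is given below; the population size, bounds, predator probability P = 0.1, gliding-distance range, and gliding constant s_c = 1.9 are values commonly used in the SSA literature and are assumptions here, not parameters reported by this paper.

```python
import numpy as np

def squirrel_search(fitness, dim, bounds, n=20, iters=100, P=0.1, sc=1.9):
    """Minimal squirrel search optimizer sketching eqs. (17)-(22)."""
    lo, hi = bounds
    W = lo + np.random.rand(n, dim) * (hi - lo)        # eq. (18): random init
    for _ in range(iters):
        W = W[np.argsort([fitness(w) for w in W])]     # eq. (19): rank by fitness
        hickory, acorns = W[0].copy(), W[1:4].copy()   # best tree, next three
        for i in range(1, n):
            # acorn-tree squirrels glide to the hickory tree (Case 1); the
            # rest glide to an acorn or hickory tree (Cases 2-3).
            target = hickory if i < 4 else acorns[np.random.randint(3)]
            if np.random.rand() >= P:                  # eqs. (20)-(22): glide
                xg = np.random.uniform(0.5, 1.11)      # random gliding distance
                W[i] = np.clip(W[i] + xg * sc * (target - W[i]), lo, hi)
            else:                                      # predator: random location
                W[i] = lo + np.random.rand(dim) * (hi - lo)
    return W[np.argmin([fitness(w) for w in W])]
```

In the proposed model, the fitness would be the network loss evaluated at the flattened weight vector W = (w_t^c, w_t^{cap}, w_t^{GRU}).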
3.3. Coupled Representation Similarity Metric (CRSM)
To compare the two representations $C^t$ and $C^{gl}$, the similarity of each coupled patch pair is computed individually. A "couple" is defined as two column images with the same column order, extracted from $C^t$ and $C^{gl}$. Many metric functions measure the similarity between two vectors, including the $P_1$ norm, the $P_2$ norm, the $P_\infty$ norm, the cosine distance, and the chi-square distance. Although these are useful metrics, in a coupled heterogeneous image two weights corresponding to the same position have similar semantic meanings. This is captured by $w_{y_i,z}$ and $w_{x_i^l,z}$, which describe the weights of the sketch and gallery images in the representation dataset. To describe semantic similarity, weights that share the same neighbours in the neural network representations are used. To accommodate this principle, a novel similarity measure, namely CRSM, is proposed based on the rank-based similarity measure. Using the shared nearest-neighbour weights of the sketch patch $y_i$ and the gallery photo patch $x_i^l$, a similarity score is computed:

$$g(y_i, x_i^l) = 0.5 \sum_{z=1}^{N} m_z \left( w_{y_i,z} + w_{x_i^l,z} \right) \qquad (23)$$

$$m_z = \begin{cases} 1, & w_{y_i,z} > 0 \ \text{and} \ w_{x_i^l,z} > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (24)$$

The similarity maps were quantified using binary images, in which the bright areas represent similarity scores over 0.5. Similarity maps of heterogeneous faces of the same person tend to have more bright areas than similarity maps of heterogeneous faces of different individuals.

$$\sum_{z=1}^{N} w_{y_i,z} = 1, \qquad \sum_{z=1}^{N} w_{x_i^l,z} = 1 \qquad (25)$$

Given constraint (25), the proposed similarity measure ranges from 0 to 1. When matching the probe sketch with a gallery photo, the average of the similarity scores over all patch positions is used.
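A NumPy sketch of equations (23)-(25) for one patch pair and for a whole probe-gallery match follows; it assumes each weight vector is non-negative and sums to 1, as required by constraint (25).

```python
import numpy as np

def crsm_score(w_sketch, w_gallery):
    """Eqs. (23)-(24): similarity of one coupled patch pair from its
    nearest-neighbour weight vectors (non-negative, each summing to 1)."""
    m = (w_sketch > 0) & (w_gallery > 0)             # eq. (24): shared neighbours
    return 0.5 * np.sum((w_sketch + w_gallery)[m])   # eq. (23): in [0, 1]

def crsm_match(sketch_weights, gallery_weights):
    """Average patch similarity between a probe sketch and a gallery photo."""
    scores = [crsm_score(s, g) for s, g in zip(sketch_weights, gallery_weights)]
    return float(np.mean(scores))
```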
4. Result and Discussion

This section examines the proposed approach's performance using gallery and sketch images. It describes the experimental setup, performance measurement, evaluation datasets, and experimental outcomes. The experiments were built on the CUHK Face Sketch database, which addresses face sketch recognition and face sketch synthesis. The proposed system is implemented on the Python platform, and the experimental results are evaluated and compared with earlier machine learning prediction models using statistical measures.
4.1 Dataset Description

Research on face sketch recognition and face sketch synthesis is conducted using the CUHK Face Sketch database (CUFS). It combines the Chinese University of Hong Kong (CUHK) student database (188 faces), the AR database, and the XM2VTS database, for a total of 606 faces. An artist sketched each face from a frontal photo taken under normal lighting conditions with a neutral expression.
4.2 Simulation setup

The tests are performed using an Intel(R) Core(TM) i5-3470 processor with four cores and four logical processors operating at 3.20 GHz. The machine runs Microsoft Windows 10 Pro as its operating system and has 8 GB of internal physical memory (RAM).
5. Performance Evaluation

Performance is evaluated in terms of accuracy, the preprocessed image, the feature map image, accuracy versus loss, and the variation of face recognition accuracy against the number of dimensions reduced. The suggested method is compared to several methods, such as generative adversarial networks (GANs), fully convolutional networks (FCNs), BP-GAN, multidomain adversarial learning (MDAL), collaborative nets (Col-Nets), Long Short-Term Memory (LSTM), and BP-LSTM.
Figure 4: Preprocessed images for the CUFS dataset.
Figure 4 shows the preprocessed gallery and sketch images. Face recognition applications typically involve alignment, cropping, resizing, normalizing, and filtering during the preprocessing phase. Filtering is used here to remove noise via the enhanced guided filtering approach; the filters are designed to compensate for changes in intensity, illumination, and appearance caused by non-uniform lighting.
Figure 5: Feature maps for the CUFS dataset.
Figure 5 shows the gallery and sketch images' feature maps extracted using S2HConv-Caps-BiGRU for the CUFS dataset. The squirrel search algorithm is employed to optimize the convolution-layer weights.
Figure 6: Accuracy vs. loss for the CUFS dataset.
A loss function is typically used to determine the "optimal" parameter values when training a model. The suggested technique offers high accuracy and low loss values compared to current approaches during training and testing. Figure 6 shows accuracy vs. loss for the CUFS dataset.

Face recognition accuracy is one of the most effective metrics for evaluating the quality of results; a high level of recognition accuracy is achieved with high-quality synthesized images. A random selection of 150 synthesized sketches and their corresponding artist drawings was used for classifier training in the CUFS database, while the remaining 188 synthesized sketches were used in the gallery sets. Using the CUFS database, Figure 7 shows the face recognition accuracy as a function of the number of dimensions reduced by null-space linear discriminant analysis (NLDA).
Figure 7: Face recognition accuracy against variations in the number of dimensions reduced, on the CUFS database.
Figure 8: CUFS database results compared with existing methods.
The suggested method was compared with existing methods such as FCN, GAN, BP-GAN, MDAL, Col-Nets, LSTM, and BP-LSTM. BP-GAN produces sketches with a finer texture than the FCN and GAN methods but results in a blurry face structure comparable to GAN and FCN; both of those methods produce sketches with noisy textures. The MDAL and Col-Nets methods provide smoother textures and more detail than the FCN or GAN methods. Compared with the existing methods, the suggested method gives the best performance. Figure 8 shows the CUFS database results compared with existing methods; the proposed method's accuracy is 98.7%.

An important performance indicator for a regression model is its Root Mean Square Error (RMSE). A model's accuracy is determined by its ability to predict the target value, and lower RMSE values indicate a better fit; if the purpose of the model is prediction, this is the most important criterion. The suggested method is compared with cycle-GAN and conditional-GAN; the RMSE value of the proposed method is 0.132. Figure 9 shows the RMSE for the CUFS database; compared with the existing methods, the proposed method gives a low RMSE value. Table 2 shows the accuracy, recall, Jaccard score, Dice score, MSE (mean square error), RMSE, PSNR (peak signal-to-noise ratio), SSIM (structural similarity), and image similarity values.
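As a reference for how the error metrics in Table 2 relate to one another, a short sketch follows; scikit-image's PSNR and SSIM helpers are one common choice and are used here as an assumption, not as the paper's stated tooling.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def error_metrics(y_true, y_pred):
    """MSE, RMSE, PSNR, and SSIM between two images with values in [0, 1]."""
    mse = float(np.mean((y_true - y_pred) ** 2))
    rmse = float(np.sqrt(mse))      # RMSE is the square root of the MSE
    psnr = peak_signal_noise_ratio(y_true, y_pred, data_range=1.0)
    ssim = structural_similarity(y_true, y_pred, data_range=1.0)
    return {"MSE": mse, "RMSE": rmse, "PSNR": psnr, "SSIM": ssim}
```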
Figure 9: RMSE for the CUFS database.
Table 2: Proposed method performance

| Parameter | Value |
|---|---|
| Accuracy (%) | 98.7 |
| Recall | 0.856 |
| Jaccard Score | 0.597 |
| Dice Score | 0.904 |
| MSE | 0.813 |
| RMSE | 0.132 |
| PSNR | 190.39 |
| SSIM | 0.999 |
| Image Similarity | 0.74 |
6. Conclusion

Capturing faces in different modalities is very important for real-life security, multimedia applications, and law enforcement, where conventional homogeneous face recognition systems fail. Storing images of a person in different modalities therefore poses great challenges for HFR, and one of the major problems in HFR is the difference in illumination patterns. By contrast, homogeneous face recognition occurs when the same camera or sensor captures both the query and the gallery images. The experimental results were evaluated and compared with earlier models in terms of accuracy, recall, Jaccard score, Dice score, MSE, image similarity, performance, and RMSE. Based on the CUFS dataset, the suggested method identifies the similarity between gallery and sketch images and was compared with several other methods; the accuracy on the CUFS dataset is 98.7%.
Compliance with Ethical Standards

Funding: No funding was provided for the preparation of this manuscript.
Conflict of Interest: The authors declare that they have no conflict of interest.
Ethical Approval: This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to Participate: All the authors involved have agreed to participate in this submitted article.
Consent to Publish: All the authors involved in this manuscript give full consent for publication of this submitted article.
Authors' Contributions: All authors contributed equally to this work.
Data Availability Statement: Data sharing is not applicable to this article.
References

[1] Fu C, Wu X, Hu Y, Huang H, He R (2019) Dual variational generation for low shot heterogeneous face recognition. Advances in Neural Information Processing Systems 32.
[2] Luo M, Ma X, Li Z, Cao J, He R (2021) Partial NIR-VIS heterogeneous face recognition with automatic saliency search. IEEE Transactions on Information Forensics and Security 16:5003-5017.
[3] Wu X, Huang H, Patel VM, He R, Sun Z (2019) Disentangled variational representation for heterogeneous face recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence 33(1):9005-9012.
[4] Menotti D, Chiachia G, Pinto A, Schwartz WR, Pedrini H, Falcao AX, Rocha A (2015) Deep representations for iris, face, and fingerprint spoofing detection. IEEE Transactions on Information Forensics and Security 10(4):864-879.
[5] Liu D, Gao X, Wang N, Li J, Peng C (2020) Coupled attribute learning for heterogeneous face recognition. IEEE Transactions on Neural Networks and Learning Systems 31(11):4699-4712.
[6] Deng Z, Peng X, Qiao Y (2019) Residual compensation networks for heterogeneous face recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence 33(1):8239-8246.
[7] Hu W, Hu H (2019) Fine tuning dual streams deep network with multi-scale pyramid decision for heterogeneous face recognition. Neural Processing Letters 50:1465-1483.
[8] de Freitas Pereira T, Anjos A, Marcel S (2018) Heterogeneous face recognition using domain specific units. IEEE Transactions on Information Forensics and Security 14(7):1803-1816.
[9] Roy H, Bhattacharjee D (2018) A novel local wavelet energy mesh pattern (LWEMeP) for heterogeneous face recognition. Image and Vision Computing 72:1-3.
[10] Roy H, Bhattacharjee D (2018) A novel quaternary pattern of local maximum quotient for heterogeneous face recognition. Pattern Recognition Letters 113:19-28.
[11] Cament LA, Galdames FJ, Bowyer KW, Perez CA (2015) Face recognition under pose variation with local Gabor features enhanced by active shape and statistical models. Pattern Recognition (11):3371-3384.
[12] Sánchez D, Melin P, Castillo O (2015) Optimization of modular granular neural networks using a hierarchical genetic algorithm based on the database complexity applied to human recognition. Information Sciences 309:73-101.
[13] Al-Dabagh MZ, Alhabib MM, AL-Mukhtar FH (2018) Face recognition system based on kernel discriminant analysis, k-nearest neighbor and support vector machine. International Journal of Research and Engineering 5(3):335-338.
[14] Yang S, Fu K, Yang X, Lin Y, Zhang J, Peng C (2020) Learning domain-invariant discriminative features for heterogeneous face recognition. IEEE Access 8:209790-209801.
[15] Kar A, Neogi PP (2020) Triangular coil pattern of local radius of gyration face for heterogeneous face recognition. Applied Intelligence 50(3):698-716.
[16] Cao B, Wang N, Li J, Gao X (2018) Data augmentation-based joint learning for heterogeneous face recognition. IEEE Transactions on Neural Networks and Learning Systems 30(6):1731-1743.
[17] Song L, Zhang M, Wu X, He R (2018) Adversarial discriminative heterogeneous face recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence 32(1).
[18] Hu W, Hu H (2019) Discriminant deep feature learning based on joint supervision loss and multi-layer feature fusion for heterogeneous face recognition. Computer Vision and Image Understanding 184:9-21.
[19] Peng C, Gao X, Wang N, Li J (2019) Sparse graphical representation based discriminant analysis for heterogeneous face recognition. Signal Processing 156:46-61.
[20] Bhattacharya S, Dasgupta A, Routray A (2020) Multi-directional local adjacency descriptors (MDLAD) for heterogeneous face recognition. IET Image Processing 14(5):982-994.
[21] Chethana HT, Nagavi TC (2021) A heterogeneous face recognition approach for matching composite sketch with age variation digital images. In: 2021 Sixth International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), pp 335-339. IEEE.
[22] Mingxin T. An unsupervised method based on unpaired multimodality data for heterogeneous face recognition.
[23] Cao Z, Cen X, Zhao H, Pang L (2021) Balancing heterogeneous image quality for improved cross-spectral face recognition. Sensors 21(7):2322.
[24] Cao X, Wen K, Huang L, Tang B, Zhang W (2020) Coupled discriminant mappings for heterogeneous face recognition. In: MIPPR 2019: Pattern Recognition and Computer Vision 11430:141-148. SPIE.
[25] Lu X, Tian Y (2019) Heterogeneous kernel based convolutional neural network for face liveness detection. In: Bio-inspired Computing: Theories and Applications: 14th International Conference, BIC-TA 2019, Zhengzhou, China, November 22-25, 2019, Revised Selected Papers, Part II 14, pp 381-392. Springer Singapore.
[26] Zhang Y, Liu C, Sun B, He J, Yu L (2021) NIR-VIS heterogeneous face synthesis via enhanced asymmetric CycleGAN. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp 1-8. IEEE.
[27] Lu J, Liong VE, Zhou J (2017) Simultaneous local binary feature learning and encoding for homogeneous and heterogeneous face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 40(8):1979-1993.
[28] Zhang W, Shu Z, Samaras D, Chen L (2017) Improving heterogeneous face recognition with conditional adversarial networks. arXiv preprint arXiv:1709.02848.
[29] Peng C, Gao X, Wang N, Li J (2016) Graphical representation for heterogeneous face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(2):301-312.
[30] Wang N, Gao X, Sun L, Li J (2017) Bayesian face sketch synthesis. IEEE Transactions on Image Processing 26(3):1264-1274.
[31] He R, Wu X, Sun Z, Tan T (2018) Wasserstein CNN: Learning invariant features for NIR-VIS face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(7):1761-1773.
[32] Liu D, Gao X, Peng C, Wang N, Li J (2021) Heterogeneous face interpretable disentangled representation for joint face recognition and synthesis. IEEE Transactions on Neural Networks and Learning Systems 33(10):5611-5625.
[33] George A, Mohammadi A, Marcel S (2022) Prepended Domain Transformer: Heterogeneous face recognition without bells and whistles. IEEE Transactions on Information Forensics and Security 18:133-146.
[34] Cho M, Kim T, Kim IJ, Lee K, Lee S (2020) Relational deep feature learning for heterogeneous face recognition. IEEE Transactions on Information Forensics and Security 16:376.