Periocular Recognition in the Wild with Generalized Label Smoothing Regularization

Yoon Gyo Jung, Cheng Yaw Low, Jaewoo Park and Andrew Beng Jin Teoh, Senior Member, IEEE

Abstract— Periocular biometrics, covering the immediate vicinity of the human eye, is a synergistic alternative to the face, particularly when the face is masked or occluded. Most present work on periocular recognition in the wild relies on convolutional neural networks (CNNs) learned with the cross-entropy loss. However, periocular images capture only the least salient facial features and therefore suffer from poor intra-class compactness and inter-class dispersion, hampering discriminative deep feature learning. Recently, label smoothing regularization (LSR) has been shown to diminish the intra-class variation by minimizing the Kullback-Leibler divergence between a uniform distribution and the network prediction distribution. In this paper, we extend LSR to the Generalized LSR (GLSR) by learning a pre-task network prediction in place of the predefined uniform distribution. Extensive experiments on four periocular-in-the-wild datasets disclose that the GLSR-trained networks prevail over the LSR-based counterparts and other recent state of the arts. This is supported by our empirical analysis that the periocular embedding features rendered by GLSR attain better class-wise cluster separation than those of the conventional LSR.

Index Terms— Convolutional neural networks, label smoothing regularization, periocular recognition in the wild.

I. INTRODUCTION

The face has long been one of the most popular biometric modalities, along with the iris and fingerprint. However, face recognition may suffer under circumstances such as occlusion, aging, etc., which often deteriorate its accuracy. On the other hand, the periocular biometric leverages the sub-area of the face that embraces the periocular region: eyelids, eyelashes, eyes, skin color, and eyebrow. The periocular biometric appears to be more tolerant to expression and occlusion, such as crime scenes where criminals purposely mask part of their faces. This facilitates the matching of partial faces [1][2].

Most past studies of periocular biometrics were accomplished under controlled environments and mainly adopt hand-crafted features for periocular representation [3-7]. Unfortunately, their accuracy may deteriorate miserably when they are thrown out into the wild, where pose, lighting, resolution, etc. are unconstrained. This is known as the periocular recognition in the wild problem [8]. The challenge is associated with the inter-class variation in periocular images caused by sensor placement, illumination changes, pose misalignment, etc. On the other hand, unlike face recognition, periocular images capture less salient features and therefore incur larger intra-class variability. In other words, the unconstrained periocular recognition problem is more complicated due to the co-modulation of identity information and the aforementioned external distractors.

The great challenge of periocular recognition is dominantly addressed by deep convolutional neural networks (CNNs). In general, most of the recent state of the arts emphasize the strictly constrained environment, either exploring different CNN backbones or interleaving the basic CNN with additional manipulations to single out the most important regions for deep embedding learning [6][7]. Other relevant works involve biometric fusion with multiple modalities, of which periocular recognition is not the targeted task [20][21]. Although [8] and [19] are two representative contributions to periocular recognition in the wild, both are supplied with pre-extracted features as the CNN inputs, e.g., local binary patterns, local ternary patterns, and histograms of oriented gradients, on top of periocular images. In a nutshell, these CNNs neglect the importance of the loss function formulation. To be more precise, CNNs are typically appended with a softmax classifier learned with respect to a cross-entropy loss estimated from the one-hot-encoded ground truth (hard targets) and the network prediction. However, CNNs learned from hard targets may undesirably be over-trained to capture noisy features specific to non-identity artifacts, resulting in poor generalization. Therefore, this may not be ideal for periocular recognition in the wild.

Szegedy et al. [9] put forward the label smoothing (LS) notion and demonstrate that LS improves the overall performance by computing cross-entropy with smoothed (soft) targets. Recently, Müller et al. [10] reveal that LS encourages the feature representations learned by the penultimate layer of the network to group into tight per-class clusters, which suggests that LS can reduce intra-class variation. Apart from that, [11] reformulates LS as a weighted combination of cross-entropy and a regularization term, the latter manifested as the Kullback-Leibler (KL) divergence between a uniform distribution and the network prediction.

It is commonly known that overfitting is likely to occur if a CNN places all prediction probability on a single class of the training set. LS regularization (LSR) attempts to smooth the network prediction distribution with a predefined uniform distribution so that confidence scores are distributed over all classes [12]. However, we remark that LSR is incapable of resolving unconstrained periocular recognition problems, which demand not only intra-class compactness but also inter-class dispersion.

We propose in this paper the Generalized LSR (GLSR) for periocular recognition in the wild. Specifically, we leverage the KL divergence between a smoothened periocular network prediction and a learned pre-task prediction as a means of regularization, in place of the predefined uniform distribution in LSR. This permits the periocular network to explore and, in the meantime, learn inter-class dissimilarities from the pre-task network, while mitigating the risk of over-fitting. In principle, the pre-task prediction is rendered by an auxiliary network, coined the pre-task network henceforward. Architecturally, the pre-task network shares an identical construction with the periocular network. Our experimental results in Section III reveal that the periocular network co-supervised with GLSR produces better class-wise cluster separation than LSR, thereby improving the overall network generalizability by learning a generally more discriminative embedding space for identification and verification tasks.

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2019R1A2C1003306). The authors are with the School of Electrical and Electronic Engineering, College of Engineering, Yonsei University, Seoul 120749, South Korea (e-mail: jungyg@yonsei.ac.kr; chengyawlow@yonsei.ac.kr; julypraise@yonsei.ac.kr; bjteoh@yonsei.ac.kr).


II. PROPOSED METHOD

A. Label Smoothing Regularization (LSR)

In this section, we give a brief account of label smoothing. Suppose p_i = exp(z_i) / Σ_{k=1}^{c} exp(z_k), i = 1, ..., c, or p = softmax(z) ∈ ℝ^c, where c refers to the total number of classes and p denotes the prediction output of a particular CNN, given that z_i = y^T w_i is a logit of the output layer, w_i is a weight vector along with the bias term, and y denotes the activations of the penultimate layer. p_i estimates the probability of the CNN assigning an image to class i. For CNNs trained on the one-hot encoded target vector t ∈ {0,1}^c, the cross-entropy (CE) loss comparing t and p is expressed as H(t, p) = −t log(p). In place of that, for the counterparts involving the soft target t^LSR, the CE loss between t^LSR and p is minimized, where t^LSR is defined as follows:

t^LSR = (1 − α)t + αu ∈ ℝ^c    (1)

From (1), we observe that t^LSR is a mixture of t and a uniform distribution u, weighted by 0 ≤ α ≤ 1 [9]. It is easy to infer that:

H(t^LSR, p) = −t^LSR log(p)
            = (1 − α)H(t, p) + αH(u, p)
            = (1 − α)H(t, p) + α(D_KL(u, p) + H(u))    (2)

where D_KL refers to the KL divergence loss, and H(u) is the entropy of the fixed distribution u and hence a constant. As a whole, the joint loss function for LSR is:

ℒ_LSR = (1 − α)H(t, p) + α D_KL(u, p)    (3)

From (3), ℒ_LSR is a composition of the standard CE loss and the KL divergence loss weighted by α, of which the latter serves as a regularization term. Accordingly, this regularization attempts to smooth the network prediction p = softmax(z) with respect to a predefined distribution u against overfitting. However, since optimizing D_KL(u, p) compels p to be equal over all classes, the underlying class-wise cluster separation learned by p may be corrupted undesirably.
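To make (1)–(3) concrete, a minimal PyTorch-style sketch of the LSR loss follows. It is an illustration under our own placeholder names (lsr_loss, logits, targets, alpha), not the authors' released code.

```python
import torch
import torch.nn.functional as F

def lsr_loss(logits, targets, alpha=0.1):
    """Label smoothing regularization, Eq. (3):
    (1 - alpha) * H(t, p) + alpha * D_KL(u, p), with u the uniform distribution."""
    num_classes = logits.size(1)
    log_p = F.log_softmax(logits, dim=1)             # log p
    ce = F.nll_loss(log_p, targets)                  # H(t, p) with one-hot t
    u = torch.full_like(log_p, 1.0 / num_classes)    # uniform distribution u
    kl = F.kl_div(log_p, u, reduction="batchmean")   # D_KL(u, p)
    return (1.0 - alpha) * ce + alpha * kl

# Example usage with random logits for a 5-class problem.
logits = torch.randn(8, 5)
targets = torch.randint(0, 5, (8,))
print(lsr_loss(logits, targets))
```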
B. Generalized Label Smoothing Regularization (GLSR)

The LSR network is architecturally a composition of:
i. a primary network f_p resolving the problem of interest for deep feature embedding learning by minimizing ℒ_LSR in (3), and
ii. a pre-task network f_pt learning a uniform distribution u to minimize the Kullback-Leibler (KL) term D_KL(u, p).

The output u suggests that f_pt attains only a random-guess accuracy of 1/c for the c-class classification problem [11]. Therefore, it is instructive to utilize f_pt to produce a pre-task prediction p^pt for regularization.

We opt for p_τ^pt = softmax(z^pt / τ), where z^pt refers to the logit vector of the pre-task network and τ is a temperature term regulating the smoothness of p^pt, given that τ ≥ 1 [16]. This formulation is known as the generalized LSR (GLSR) in this paper. For the GLSR-based network instances, the two networks f_p and f_pt are equivalent in terms of architecture and capacity. To be more precise, f_pt can be any pre-learned model yielded from a training set similar to that of f_p, and it is active only during the training stage. This implies that f_pt is no longer required at the inferencing stage. Based on (3), the GLSR loss function ℒ_GLSR is expressed as follows:

ℒ_GLSR = (1 − α)H(t, p) + α τ² D_KL(p_τ^pt, p_τ)    (4)

The generic GLSR architecture constituting a pre-task and a periocular network is portrayed in Figure 1.

[Figure 1. GLSR comprises a pre-task and a periocular network during training: the pre-task branch (logit → softmax w.r.t. τ) contributes the KL divergence loss τ² D_KL(p_τ^pt, p_τ), the periocular branch (logit → softmax) contributes the softmax loss H(t, p), and the two terms form the GLSR joint loss. Only the periocular network is required to infer an unseen periocular sample.]

The adoption of τ and p_τ is motivated by knowledge distillation (KD) reported in [16], wherein τ is a temperature term. However, KD is outlined primarily for network compression: the teacher and student networks are typically designated with different architectures and capacities, of which the teacher is generally deeper and more sophisticated.

Different from KD, GLSR leverages the pre-task network as a means of regularization for the primary periocular network to render a generally more discriminative embedding feature set for identification and verification tasks. We assert that GLSR incites class-wise cluster separation. As expressed in (4), minimizing ℒ_GLSR diminishes D_KL(p_τ^pt, p_τ), thus leading p_τ to approximate the fairly sparse pre-task prediction p_τ^pt and thereby inducing the ideal class-wise cluster separation in the ultimate classifier. On the other hand, GLSR also serves to regularize the over-confident p for a performance gain. Note that each class probability of p_τ^pt is non-zero owing to the smoothing effect imposed by τ ≥ 1 (hence unlike the one-hot encoded t in (3) and (4)). In principle, smoothing p with respect to p_τ^pt thwarts GLSR from learning an overly confident classifier as in the hard-label network.
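Analogously, a hedged PyTorch-style sketch of the GLSR loss in (4) is given below; the pre-task logits are assumed to come from a frozen, pre-learned network, and all names are placeholders rather than the authors' implementation.

```python
import torch.nn.functional as F

def glsr_loss(logits, pt_logits, targets, alpha=0.1, tau=2.5):
    """Generalized LSR, Eq. (4):
    (1 - alpha) * H(t, p) + alpha * tau^2 * D_KL(p_tau^pt, p_tau)."""
    ce = F.cross_entropy(logits, targets)                       # H(t, p)
    log_p_tau = F.log_softmax(logits / tau, dim=1)              # temperature-scaled prediction (log p_tau)
    p_pt_tau = F.softmax(pt_logits.detach() / tau, dim=1)       # pre-task prediction p_tau^pt (no gradient)
    kl = F.kl_div(log_p_tau, p_pt_tau, reduction="batchmean")   # D_KL(p_tau^pt, p_tau)
    return (1.0 - alpha) * ce + alpha * (tau ** 2) * kl
```

The τ² factor compensates for the gradient scaling introduced by the temperature, as in knowledge distillation [16]; a γ-weighted sum of two such KL terms, one per pre-task modality, yields the PF-GLSR variant introduced next.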


In this paper, GLSR is explored with three pre-task networks fed with different biometric modalities, as follows:
i. P-GLSR – pre-task network trained with periocular images;
ii. F-GLSR – pre-task network trained with face images;
iii. PF-GLSR – a linear combination of the periocular and face pre-task networks with the joint loss function ℒ_PF-GLSR in (5), where γ denotes a weighting factor for the two pre-task networks (periocular and face) and f_τ^pt is the smoothened face prediction.

ℒ_PF-GLSR = (1 − α)H(t, p) + α τ² [(1 − γ) D_KL(p_τ^pt, p_τ) + γ D_KL(f_τ^pt, p_τ)]    (5)

C. Inference Stage

To identify an unknown probe image against N galleries in the dataset, both probe and gallery images are presented to the trained periocular network f_p to extract the embedded features from the penultimate layer. This is followed by a one-to-N matching for identity retrieval according to cosine similarity indexes. In the meantime, we also assess the verification performance based on cosine similarity to determine whether two paired images are positive or negative.
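The matching protocol above reduces to nearest-neighbour search in cosine space. Below is a minimal PyTorch-style sketch under assumed placeholder names (probe_feats, gallery_feats, and gallery_labels denote the penultimate-layer embeddings and identity labels; the verification threshold is illustrative and would be tuned on validation data).

```python
import torch
import torch.nn.functional as F

def identify(probe_feats, gallery_feats, gallery_labels):
    """One-to-N identification: each probe takes the identity of the gallery
    embedding with the highest cosine similarity."""
    p = F.normalize(probe_feats, dim=1)
    g = F.normalize(gallery_feats, dim=1)
    sims = p @ g.t()                          # cosine similarity matrix (probes x gallery)
    return gallery_labels[sims.argmax(dim=1)]

def verify(feats_a, feats_b, threshold=0.5):
    """Verification: a pair is declared positive (same identity) when its cosine
    similarity exceeds a threshold chosen on validation data."""
    scores = F.cosine_similarity(feats_a, feats_b, dim=-1)
    return scores > threshold
```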
III. EXPERIMENTAL EVALUATION

A. Experimental Setup

The training repository is composed of the Ethnic [8] and IMDB Wiki [13] datasets. There are four testing sets, namely Ethnic [8], Pubfig [14], Facescrub [15], and IMDB Wiki [13], of which the subject identities do not overlap with those of the training set. We detail these training and testing sets in Table 1.

Table 1. Description of the training and testing datasets (the four rightmost columns are the testing sets, each including gallery and probe partitions).
| | Train. Set | Ethnic | Pubfig | Facescrub | IMDB Wiki |
| # of subjects | 1,054 | 328 | 200 | 530 | 2,129 |
| # of train | 166,737 | - | - | - | - |
| # of val | 47,372 | - | - | - | - |
| # of gallery | - | 1,645 | 9,221 | 31,066 | 40,241 |
| # of probe1 | - | 24,171 | 6,138 | 21,518 | 17,658 |
| # of probe2 | - | - | 6,101 | 27,292 | 15,252 |
| # of probe3 | - | - | 7,680 | - | 16,273 |

We employ MobileNetV2 [22], a lightweight CNN with a computational complexity of only 0.1 GFLOPS, as the basic backbone for both the pre-task and the periocular network. To induce data diversity, the training set is augmented with horizontally flipped images. We rescale the pixel intensities to the range of 0 to 1, followed by normalization with a mean of 0.5 and a standard deviation of 0.5. We opt for stochastic gradient descent with a batch size of 128 images, an initial learning rate of 0.1, a momentum of 0.9, and a weight decay of 5e-4. All pre-task networks are first trained for 90 epochs, with the learning rate decayed by a factor of 0.1 at epochs 30, 60 and 80. Subsequently, the periocular network for each of P-GLSR, F-GLSR, and PF-GLSR is trained for 50 epochs, with the learning rate decreased by a factor of 0.1 at epochs 25, 40, and 45. The optimal α and τ in (4) are empirically set to 0.1 and 2.5 for P-GLSR, and 0.05 and 2.5 for F-GLSR, using the validation set. For PF-GLSR, the optimal α, τ and γ are fixed to 0.15, 2.5 and 0.33, respectively.

In the following sections, the GLSR-trained instances, i.e., P-GLSR, F-GLSR and PF-GLSR, are discussed from four perspectives: (1) the network generalizability is reported in terms of the rank-1 identification rate (%) with the Cumulative Match Characteristic (CMC) curve (for the identification task), and the equal error rate (EER) (%) with the Receiver Operating Characteristic (ROC) curve (for the verification task); note that we sample only 4 random images per subject for the verification task, so each subject yields 12 (= 4×(4−1)) positive pairs along with 16 (= 4×4) negative pairs; (2) the Davies-Bouldin index (DBI) [17] is applied to analyze the intra-class and inter-class variations; (3) we further justify why P-GLSR prevails over F-GLSR with respect to the Bhattacharyya coefficient (BC) [18]; and (4) we provide a thorough comparison with other competing periocular works.

B. Experimental Results

As disclosed in Table 2 and Table 3, the deep PF-GLSR features are among the most discriminative for periocular recognition in the wild, followed by P-GLSR, uniform LSR (ULSR) [9], and F-GLSR. We discern that PF-GLSR improves the average rank-1 identification rate and EER by 2.1% and 1.17%, respectively, compared to the CNN trained with hard targets, i.e., setting α = 0. We term this typical CNN the hard-label network hereinafter. Interestingly, we also observe that F-GLSR is outperformed by P-GLSR, despite the former being learned from the face modality. This contradicts the rationale that the pre-task face network is expected to distill more identity information to supervise the periocular network. We provide justifications in Sections III(C) and (D), and we compare the proposed GLSR networks with other state of the arts in Section III(E).

Table 2. Performance comparison in terms of rank-1 identification rate (%) for the hard-label network (involving only hard targets), ULSR, P-GLSR, F-GLSR and PF-GLSR.
| | Ethnic | Pubfig | Facescrub | IMDB Wiki | Average |
| Hard-Label | 83.5 | 96.97 | 92.5 | 73.52 | 86.67 |
| ULSR [9] | 85.36 | 97.03 | 93.62 | 75.36 | 87.75 |
| P-GLSR | 86.43 | 97.53 | 93.35 | 76.46 | 88.35 |
| F-GLSR | 84.34 | 97.05 | 92.81 | 74.26 | 87.08 |
| PF-GLSR | 86.55 | 97.58 | 93.92 | 77.07 | 88.7 |

Table 3. Performance comparison in terms of EER (%) for the hard-label network (involving only hard targets), ULSR, P-GLSR, F-GLSR and PF-GLSR.
| | Ethnic | Pubfig | Facescrub | IMDB Wiki | Avg. |
| Hard-Label | 13.88 | 11.74 | 8.02 | 12.96 | 11.65 |
| ULSR [9] | 13.42 | 11.98 | 7.45 | 13.69 | 11.64 |
| P-GLSR | 12.22 | 11.15 | 7.74 | 11.96 | 10.77 |
| F-GLSR | 13.74 | 12.34 | 7.86 | 13.79 | 11.93 |
| PF-GLSR | 12.31 | 11.10 | 6.86 | 11.60 | 10.47 |

C. Davies-Bouldin Index (DBI)

We analyze how the intra-class and inter-class variations perturb the deep features learned for each gallery set y_g with respect to the various CNNs. Our analysis employs the Davies-Bouldin index (DBI) [17], a well-known evaluation metric for clustering algorithms that simultaneously measures intra-class and inter-class variations. In general, a lower DBI value implies a higher inter-class variation (cluster separation) and a lower intra-class variation (cluster compactness), which is equivalent to the class-wise cluster separation in our exposition.
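DBI can be computed directly with a standard implementation; the sketch below uses scikit-learn's davies_bouldin_score and random placeholder data standing in for the gallery embeddings and subject labels. It is illustrative only, not the authors' evaluation script.

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

# Placeholder data: in practice, gallery_feats holds the (N, D) penultimate-layer
# embeddings of a gallery set and gallery_labels the corresponding subject identities.
rng = np.random.default_rng(0)
gallery_feats = rng.normal(size=(600, 128))
gallery_labels = rng.integers(0, 20, size=600)

dbi = davies_bouldin_score(gallery_feats, gallery_labels)
print(f"DBI = {dbi:.2f}")   # lower = tighter, better-separated class-wise clusters
```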


From Table 4, we observe that the average DBI values for PF-GLSR and P-GLSR are relatively lower than that of F-GLSR, while the latter is on a par with the hard-label network. On the other hand, although the identification rate of ULSR is higher than the baseline, the DBI reveals otherwise. This suggests that the strong prior assumed in ULSR impairs the clustering, owing to the uniform inter-class distance constraint induced by the uniform-distribution prior.

Table 4. DBI for the periocular hard-label network, ULSR, P-GLSR, F-GLSR, and PF-GLSR with respect to different datasets.
| | Train | Ethnic | Pubfig | Facescrub | IMDB Wiki | Avg. |
| Hard-Label | 2.38 | 2.56 | 2.61 | 2.42 | 2.94 | 2.58 |
| ULSR [9] | 2.29 | 2.64 | 2.7 | 2.38 | 3.02 | 2.61 |
| P-GLSR | 2.28 | 2.43 | 2.48 | 2.3 | 2.81 | 2.46 |
| F-GLSR | 2.33 | 2.57 | 2.63 | 2.38 | 2.97 | 2.58 |
| PF-GLSR | 2.22 | 2.45 | 2.5 | 2.27 | 2.84 | 2.49 |

D. Bhattacharyya Coefficient

This section elaborates why F-GLSR is outperformed by P-GLSR, as disclosed earlier in Table 2 and Table 3. Recall that the network prediction vector p is in fact an approximation of a probability mass function (PMF). Since the key ingredient of GLSR is to minimize D_KL(p_τ^pt, p_τ) along with the cross-entropy term, the "similarity" between p_τ^pt and p_τ is crucial. We conjecture that the higher the similarity between them, the better GLSR performs, as the KL divergence is then easier to minimize. For this analysis, the similarity of two PMFs is measured through the Bhattacharyya coefficient (BC) [18]. Note that the higher the BC, the more similar the two PMFs. We select a random subject from each of the training and validation sets, and another, different subject from the testing set that is not seen during training. All samples are forwarded through the competing CNNs, and the prediction outputs are averaged to compute the BC. We summarize in Table 5 that the BC values for the pre-task face network and F-GLSR are relatively lower than those of the pre-task periocular network and P-GLSR. This shows that a pre-task network with a stronger biometric, e.g., face in this case, does not necessarily benefit the recognition performance. However, it does offer a favorable effect when GLSR is applied to both face and periocular, as revealed by PF-GLSR.

Table 5. BC for the face and periocular hard-label networks, ULSR, P-GLSR, F-GLSR and PF-GLSR.
| Dataset | Hard-Label (Periocular) | Hard-Label (Face) | ULSR [9] | P-GLSR | F-GLSR | PF-GLSR |
| Train | 1.0 | 0.986 | 0.971 | 0.996 | 0.991 | 0.995 |
| Val | 1.0 | 0.966 | 0.947 | 0.988 | 0.98 | 0.988 |
| Test | 1.0 | 0.766 | 0.877 | 0.967 | 0.931 | 0.966 |
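The BC of two PMFs is simply the sum of their element-wise geometric means. A minimal sketch follows; the example prediction vectors are made up for illustration, and this is not the authors' analysis code.

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    """BC(p, q) = sum_i sqrt(p_i * q_i): 1.0 for identical PMFs and
    progressively smaller values for increasingly dissimilar ones."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(np.sqrt(p * q)))

# Toy example comparing two averaged prediction vectors.
p_pretask = np.array([0.7, 0.1, 0.1, 0.1])
p_network = np.array([0.6, 0.2, 0.1, 0.1])
print(bhattacharyya_coefficient(p_pretask, p_network))   # ~0.99
```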
E. Comparison with Other Work

In this section, we compare the best-performing PF-GLSR with the other most recent CNNs under the same evaluation protocol, including Attnet [7], RGB-OCLBCP [8], and Multi-Fusion [21]. We re-implemented these CNNs following the reported settings and configurations. We disclose in Table 6 that the proposed PF-GLSR, on average, achieves the most promising identification rate of 88.7% and EER of 10.47%, outperforming all competing counterparts by a distinguishable margin. It is noteworthy that the inference stage of RGB-OCLBCP and Multi-Fusion demands a plurality of inputs or multiple biometric modalities, whilst PF-GLSR involves only a single periocular image. In addition, the PF-GLSR construction equips no other manipulation on top of the basic convolution blocks, e.g., the explicit attention module in Attnet serving to underline the most discriminative region of a periocular image for deep feature learning. We show the CMC and ROC curves comparing all three GLSR instances with Attnet, RGB-OCLBCP, and Multi-Fusion in Figures 2 and 3.

Table 6. Performance comparison with other periocular networks in terms of rank-1 identification rate (%) and EER (%), averaged over the four testing sets – Ethnic, Pubfig, Facescrub, and IMDB Wiki.
| CNNs | Ident. Rate (%) | EER (%) |
| Attnet [7] | 67.74 | 18.58 |
| Multi-Fusion [21] | 84.71 | 13.27 |
| RGB-OCLBCP [8] | 80.83 | 11.5 |
| PF-GLSR | 88.7 | 10.47 |

[Figure 2. Averaged CMC curves over all four testing sets for the hard-label, ULSR, and GLSR-trained network instances, Attnet [7], Multi-Fusion [21], and OCLBCP [8].]

[Figure 3. Averaged ROC curves and EER over all four testing sets for the hard-label, ULSR, and GLSR-trained network instances, Attnet [7], Multi-Fusion [21], and OCLBCP [8].]

IV. CONCLUSION AND FUTURE WORK

In this paper, we extended label smoothing regularization (LSR) to the generalized LSR (GLSR) to cope with the periocular recognition in the wild challenge. Different from LSR, the GLSR-regularized network accomplishes class-wise cluster separation by learning from a pre-task network configured with multiple biometric modalities. The experimental results disclose that the pre-task network designated with a mixture of face and periocular modalities outperformed the baseline periocular network and the LSR-based model. As future work, we will explore pre-task networks of greater capacity with other biometric combinations, such as the iris.


REFERENCES
[1] Rattani, A., & Derakhshani, R. (2017). Ocular biometrics in the visible spectrum: A survey. Image and Vision Computing, 59, 1-16.
[2] Alonso-Fernandez, F., & Bigun, J. (2016). A survey on periocular biometrics research. Pattern Recognition Letters, 82, 92-105.
[3] Park, U., Jillela, R. R., Ross, A., & Jain, A. K. (2010). Periocular biometrics in the visible spectrum. IEEE Transactions on Information Forensics and Security, 6(1), 96-106.
[4] Cao, Z., & Schmid, N. A. (2016). Fusion of operators for heterogeneous periocular recognition at varying ranges. Pattern Recognition Letters, 82, 170-180.
[5] Xu, J., Cha, M., Heyman, J. L., Venugopalan, S., Abiantun, R., & Savvides, M. (2010). Robust local binary pattern feature sets for periocular biometric identification. In 2010 Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS) (pp. 1-8). IEEE.
[6] Proença, H., & Neves, J. C. (2017). Deep-PRWIS: Periocular recognition without the iris and sclera using deep learning frameworks. IEEE Transactions on Information Forensics and Security, 13(4), 888-896.
[7] Zhao, Z., & Kumar, A. (2018). Improving periocular recognition by explicit attention to critical regions in deep neural network. IEEE Transactions on Information Forensics and Security, 13(12), 2937-2952.
[8] Tiong, L. C. O., Teoh, A. B. J., & Lee, Y. (2019). Periocular recognition in the wild with orthogonal combination of local binary coded pattern in dual-stream convolutional neural network. In International Conference on Biometrics (ICB) (pp. 1-6). doi: 10.1109/ICB45273.2019.8987278.
[9] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2818-2826).
[10] Müller, R., Kornblith, S., & Hinton, G. E. (2019). When does label smoothing help? In Advances in Neural Information Processing Systems (pp. 4696-4705).
[11] Yuan, L., Tay, F. E., Li, G., Wang, T., & Feng, J. (2019). Revisit knowledge distillation: A teacher-free framework. arXiv preprint arXiv:1909.11723.
[12] Pereyra, G., Tucker, G., Chorowski, J., Kaiser, Ł., & Hinton, G. (2017). Regularizing neural networks by penalizing confident output distributions. arXiv preprint arXiv:1701.06548.
[13] Internet Movie Database (IMDB). http://www.imdb.com
[14] Kumar, N., Berg, A. C., Belhumeur, P. N., & Nayar, S. K. (2009). Attribute and simile classifiers for face verification. In 2009 IEEE 12th International Conference on Computer Vision (pp. 365-372). IEEE.
[15] Ng, H. W., & Winkler, S. (2014). A data-driven approach to cleaning large face datasets. In 2014 IEEE International Conference on Image Processing (ICIP) (pp. 343-347). IEEE.
[16] Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
[17] Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 224-227.
[18] Bhattacharyya, A. (1943). On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin of the Calcutta Mathematical Society, 35, 99-109.
[19] Tiong, L. C. O., Kim, S. T., & Ro, Y. M. (2019). Implementation of multimodal biometric recognition via multi-feature deep learning networks and feature fusion. Multimedia Tools and Applications, 78(16), 22743-22772.
[20] Zhang, Q., Li, H., Sun, Z., & Tan, T. (2018). Deep feature fusion for iris and periocular biometrics on mobile devices. IEEE Transactions on Information Forensics and Security, 13, 2897-2912.
[21] Soleymani, S., Dabouei, A., Kazemi, H., Dawson, J., & Nasrabadi, N. M. (2018). Multi-level feature abstraction from convolutional neural networks for multimodal biometric identification. In International Conference on Pattern Recognition (ICPR) (pp. 3469-3476).
[22] Sandler, M., et al. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
