
2021 3rd International Conference on Applied Machine Learning (ICAML)

Low Resolution Face Recognition System Based on ESRGAN

Zhujiang He, Chengkun Song, Zhenni Zhang, Yinghuai Yu*
Guangdong Ocean University, Zhanjiang, Guangdong, 524088, China
*yuyinghuai@126.com

2021 3rd International Conference on Applied Machine Learning (ICAML) | 978-1-6654-2125-6/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICAML54311.2021.00024

Abstract- Low-resolution face recognition is one of the research hotspots in face recognition today. It can be widely used in various scenarios, such as identity verification at stations and classroom check-in. Prior methods have achieved good performance in ideal scenarios, but low-resolution images make human faces difficult to recognize, and accuracy eventually decreases. This is why improving the accuracy of low-resolution face recognition (LRFR) is still challenging. Our research aims to solve the LRFR problem. The super-resolution GAN (SRGAN) and the enhanced super-resolution GAN (ESRGAN) are used in this research. By comparing these methods, we finally obtain a model that mitigates the low-accuracy problem of LRFR. Our system uses super-resolution reconstruction as the preprocessing step for LRFR and then uses Facenet to recognize the image. The datasets used are Labeled Faces in the Wild (LFW), the YouTube Faces Database (YTF), and the WIDER FACE dataset. The experimental results show that the accuracy of the proposed ESRGAN-based Facenet system in the unconstrained natural environment is as high as 98.78%. At the same time, the system increases the number and speed of face detections and effectively realizes multiple-face recognition, giving it practical application value and robustness.

Keywords-Face detection; Face recognition; YOLO; Super-resolution reconstruction; ESRGAN

I. INTRODUCTION
Through our observation of campus life, we found frequent problems such as lax management of study discipline and students' absence from classes, mainly because classroom attendance checking is time-consuming and laborious for college staff. However, the hardware supporting face recognition in most colleges and universities is extremely inefficient, while attendance machines equipped with cameras and other hardware for face recognition algorithms are expensive[1][2]. Therefore, we propose an LRFR algorithm better suited to teaching scenarios.

With the development of related fields, face recognition technology has developed rapidly. However, in the case of LRFR, the recognition rate declines, which makes existing models difficult to apply in practice[3]. Therefore, in this paper, we not only improve an existing face detection model, but also build an LRFR model by combining Facenet[4] with ESRGAN.

II. OVERALL SYSTEM DESIGN
In this paper, a training module and a recognition module with a reconfiguration function are the key parts of our system, as shown in Fig. 1.

Figure 1. System designed by ourselves

The LRFR system consists of three parts: multi-target recognition, super-resolution face reconstruction, and face feature extraction. Multi-target recognition relies on techniques such as multi-threading, high concurrency, and high availability to recognize multiple targets simultaneously. The system then uses ESRGAN to perform super-resolution face reconstruction on a single low-resolution face image,

Authorized licensed use limited to: VIT University. Downloaded on January 20,2024 at 07:08:54 UTC from IEEE Xplore. Restrictions apply.
supplement and restore part of the face information, and transfer the obtained high-resolution (HR) face image to Facenet for face feature extraction. The face recognition model then performs face feature matching and recognition; the recognition result is obtained by retrieving the corresponding record from the student information database and is fed back to the user.

III. FACE DETECTION

A. YOLOv4-Tiny
The two-stage method based on region proposals and the one-stage method based on regression are the two mainstream object detection approaches. YOLO is a single-stage detector with a unified architecture: it extracts a feature map and uses the whole feature map as the candidate region to predict bounding boxes and categories. On this basis, later versions such as YOLOv3 and YOLOv4[5][6][7][8] were improved in many aspects.

Among deep-learning-based face detection methods, we chose the YOLOv4-tiny face detection model. We compared multiple models, including MTCNN[9][10] and YOLO. YOLOv4-tiny has yielded successful results in the field of object detection. The typical MTCNN can detect many regular faces, but its speed does not meet our system's real-time requirement.

B. Model Structure of YOLOv4-Tiny
We chose YOLOv4-tiny as our main structure because it satisfies the requirement that the detection method be deployable on embedded systems or mobile devices.

Figure 2. The structure of YOLOv4-tiny

The CSPDarknet53-tiny network is used as the backbone of YOLOv4-tiny, as shown in Fig. 2. CSPDarknet53-tiny uses the CSPBlock module in its cross-layer parts and the ResBlock module in the rest of the network. The CSPBlock module not only divides the feature map into two parts, but also fuses them again through a residual edge across layers. This lets the gradient flow propagate along two different network paths, which enriches the gradient information. In addition, the CSPBlock module can enhance the learning ability of the convolutional network and remove computational bottlenecks that involve a large amount of calculation. Instead of the Mish activation function used in YOLOv4, CSPDarknet53-tiny keeps LeakyReLU as its activation function to further simplify computation. The LeakyReLU function is

$$ y_i = \begin{cases} x_i, & x_i \ge 0 \\ \dfrac{x_i}{a_i}, & x_i < 0 \end{cases} \qquad (1) $$

where a_i ∈ (1, +∞) is a constant parameter.

YOLOv4-tiny uses a feature pyramid network for feature fusion to extract feature maps at different scales, which improves the speed of object detection. Assuming an input size of 416×416 and a single class (face), the structure of YOLOv4-tiny is shown in Fig. 2.

TABLE I. THE EFFECT OF FACE DETECTION BY VARIOUS METHODS

Method        FPS   mAP (%)
YOLOv3         49      52.6
YOLOv4         41      64.7
YOLOv4-tiny   275      38.5

In our design, we used LFW[11] and the WIDER FACE dataset[12] as training and test samples. After training and testing several YOLO-series models, we finally chose YOLOv4-tiny because it guarantees real-time speed while keeping an acceptable mAP. Tested on the WIDER FACE dataset, YOLOv4-tiny runs at 275 FPS with an mAP of 38.5%, as listed in Table I, and achieves the detection results shown in Fig. 3.

Figure 3. Result of detecting faces

C. Face Recognition
1) Enhanced Super-Resolution GAN (ESRGAN)
GAN[13] was proposed by Ian Goodfellow and his colleagues in 2014. The core of GAN lies in generation and adversarial training: "generation" refers to a generative model, "adversarial" refers to an adversarial (confrontation) training method, and GAN is specifically designed to optimize generation tasks. One of the difficulties of generative models is how to measure the similarity between the generated distribution and the true distribution. Under normal circumstances, we only know the sampling results of these

two distributions, and their explicit expressions are hard to know, so it is difficult to find a suitable measure. The idea of GAN is to hand this measurement task to a neural network, which is called the discriminator.

ESRGAN[15] further improves the quality of the images restored by SRGAN[14]. ESRGAN removes all BN layers from the generator and replaces the original basic block with the Residual-in-Residual Dense Block (RRDB), which combines a multi-level residual network with dense connections, as shown in Fig. 4.

Figure 4. The BN layers are removed from the SRGAN residual block; the RRDB block is used in our deeper model, and β is the residual scaling parameter.

The adversarial loss is improved mainly by using a relativistic GAN, which judges whether an image is relatively more realistic rather than absolutely real. The perceptual loss is computed using features before activation (previously, features after activation). The network is first pre-trained to optimize PSNR and then fine-tuned with the GAN.

ESRGAN does not use batch normalization. During training, a BN layer normalizes the features using the mean and variance of the current batch; during testing, it normalizes the test data using the mean and variance estimated over the whole training set. If the statistics of the training set and the test set differ greatly, the BN layer limits the generalization ability of the model. Removing batch normalization can improve stability and reduce computational cost.

2) Loss Function
Compared with SRGAN, ESRGAN uses the relativistic average discriminator (RaD), which estimates the probability that a real image is relatively more realistic than a fake one. The loss function of the discriminator is

$$ L_D^{Ra} = -\mathbb{E}_{x_r}\left[\log\left(D_{Ra}(x_r, x_f)\right)\right] - \mathbb{E}_{x_f}\left[\log\left(1 - D_{Ra}(x_f, x_r)\right)\right] \qquad (2) $$

The corresponding adversarial loss of the generator is

$$ L_G^{Ra} = -\mathbb{E}_{x_r}\left[\log\left(1 - D_{Ra}(x_r, x_f)\right)\right] - \mathbb{E}_{x_f}\left[\log\left(D_{Ra}(x_f, x_r)\right)\right] \qquad (3) $$

where x_r is the real HR image and x_f is the image obtained by passing the original low-resolution image through the generator. Since the adversarial loss involves both x_r and x_f, the generator benefits from the gradients of both generated and real data during adversarial training, and this adjustment helps the network learn clearer edges and textures.

For the perceptual loss, using features before activation overcomes two shortcomings of using activated features for super-resolution restoration. First, activated features are very sparse, especially in deep networks, so they provide only weak supervision, which leads to poor performance. Second, using features after activation makes the brightness of the reconstructed image inconsistent with the ground truth (GT), and some content is lost. The final loss function of the generator consists of three parts:

$$ L_G = L_{percep} + \lambda L_G^{Ra} + \eta L_1 \qquad (4) $$

where λ and η are coefficients that balance the loss terms, and

$$ L_1 = \mathbb{E}_{x_i} \lVert G(x_i) - y \rVert_1 \qquad (5) $$

is the content loss between the super-resolved image G(x_i) and the ground-truth HR image y.

IV. CONCLUSION
This work combines YOLOv4-tiny and Facenet with ESRGAN to solve the LRFR problem. First, we propose a face recognition method for low-resolution scenes based on YOLOv4-tiny; second, Facenet combined with ESRGAN is used for face recognition. LFW and YTF[16] are used in our model.

TABLE II. THE EVALUATION OF PRE-PROCESSING

Method   Low resolution   High resolution   PSNR    SSIM   Accuracy   Validation rate
None     32×32            -                 -       -      75.34%     -
SRGAN    32×32            256×256           31.23   0.87   98.51%     94.98%
ESRGAN   32×32            256×256           39.1    0.96   98.95%     96.56%

In this system, our model is proposed to solve the LRFR problem based on YOLOv4 and GAN. First, the system builds a face detection model by comparing MTCNN and YOLOv4. Second, it compares the super-resolution methods ESRGAN and SRGAN. Last but not least, it uses Facenet to address the LRFR problem. The model was evaluated on detecting faces with the WIDER FACE dataset and on recognizing students' faces with LFW. Without super-resolution, a 32×32-pixel image reached an accuracy of only 75%. In our experiment, the latest accuracy is 98.95%. Finally, our experiment achieved an accuracy of 98.78% and a VAL of 96.56%, approaching the performance of Facenet on HR images.
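The accuracy and VAL figures above follow the pair-verification protocol used by Facenet[4]: two face embeddings are declared the same identity when their squared L2 distance is at most a threshold d, VAL(d) is the fraction of same-identity pairs accepted, and FAR(d) is the fraction of different-identity pairs wrongly accepted. The Python sketch below shows how these quantities could be computed from embedding pairs. It assumes L2-normalized embeddings and a hand-picked threshold; the function name `verification_metrics` and the toy vectors are hypothetical illustrations, not the authors' evaluation code.

```python
import numpy as np


def verification_metrics(emb_a, emb_b, is_same, threshold):
    """Hypothetical helper: Facenet-style pair verification metrics.

    emb_a, emb_b : (N, D) arrays, assumed L2-normalized face embeddings
    is_same      : (N,) booleans, True if the pair shows the same person
    threshold    : distance d; a pair is predicted "same" if dist <= d
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    is_same = np.asarray(is_same, dtype=bool)

    # Squared Euclidean distance between the two embeddings of each pair.
    dist = np.sum((emb_a - emb_b) ** 2, axis=1)
    predict_same = dist <= threshold

    true_accepts = np.sum(predict_same & is_same)     # |TA(d)|
    false_accepts = np.sum(predict_same & ~is_same)   # |FA(d)|
    true_rejects = np.sum(~predict_same & ~is_same)

    val = true_accepts / np.sum(is_same)              # VAL(d) = |TA(d)| / |P_same|
    far = false_accepts / np.sum(~is_same)            # FAR(d) = |FA(d)| / |P_diff|
    accuracy = (true_accepts + true_rejects) / len(is_same)
    return float(accuracy), float(val), float(far)


# Toy data (hypothetical): four pairs of unit vectors, two same-identity pairs.
a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([[0.8, 0.6], [0.0, 1.0], [0.0, 1.0], [0.6, 0.8]])
same = np.array([True, True, False, False])

acc, val, far = verification_metrics(a, b, same, threshold=1.0)
print(acc, val, far)  # 0.75 1.0 0.5
```

In the toy data the last pair is a different-identity pair that lies within the threshold, so it counts as a false accept and lowers both accuracy and FAR. In practice the threshold is usually chosen on held-out pairs, so reported accuracy and VAL depend on that choice as well as on the quality of the super-resolved faces.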

For further research, we hope to develop the system further to achieve real-time performance. Based on the YOLOv4-tiny network, we have studied face detection in detail and, combined with current techniques, improved the detection accuracy while guaranteeing the real-time detection of the YOLO network to a certain extent, but there is still considerable room for improvement.

ACKNOWLEDGEMENT
In our experiment, the National Undergraduate Innovation and Entrepreneurship Program (202010566022) and the Undergraduate Innovation Team Program of Guangdong Ocean University (CXTD2019004) provided us with a lot of practical help, including cameras, venues, and technical guidance.

REFERENCES
[1] Lin Shouguang. Research on fast face recognition algorithm based on laboratory attendance system [J]. Information Technology, 2019, 43(04): 16-18+22.
[2] Chen Qi. Design and implementation of student attendance system based on face recognition [D]. University of Electronic Science and Technology of China, 2019.
[3] Kaibing Zhang, Dongdong Zheng, Junfeng Jing. Review of low resolution face recognition [J]. Computer Engineering and Applications, 2019, 55(22): 14-24.
[4] Schroff F, Kalenichenko D, Philbin J. Facenet: A unified embedding for face recognition and clustering [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 815-823.
[5] Bochkovskiy A, Wang C Y, Liao H Y M. Yolov4: optimal speed and accuracy of object detection [J]. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[6] Redmon J, Farhadi A. Yolov3: an incremental improvement [J]. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[7] Joseph Redmon, Ali Farhadi. Yolo9000: better, faster, stronger [J]. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[8] Redmon J, Divvala S, Girshick R. You only look once: unified, real-time object detection [J]. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[9] Jiang N, Yu W, Tang S, et al. A cascade detector for rapid face detection [C]// IEEE 7th International Colloquium on Signal Processing and Its Applications. Penang: Ignatius, 2011: 155-158.
[10] Zhang K, Zhang Z, Li Z, et al. Joint face detection and alignment using multitask cascaded convolutional networks [J]. IEEE Signal Processing Letters, 2016, 23(10): 1499-1503.
[11] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments," Tech. rep., 2008.
[12] S. Yang, et al., "WIDER FACE: A Face Detection Benchmark," 2015.
[13] I. J. Goodfellow et al., "Generative Adversarial Networks," pp. 1-9, 2014. [Online]. Available: http://arxiv.org/abs/1406.2661.
[14] Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al. Photo-realistic single image super-resolution using a generative adversarial network [J]. In CVPR, 2017.
[15] X. Wang et al., "ESRGAN: Enhanced super-resolution generative adversarial networks," 2019.
[16] L. Wolf, "Face Recognition in Unconstrained Videos with Matched Background Similarity."
