
Conference Paper · August 2021


DOI: 10.1117/12.2597194



Face Detection in the Darkness Using Infrared Imaging: A
Deep Learning-Based Study
Zhicheng Cao^a, Heng Zhao^a, Shufen Cao^b, and Liaojun Pang^a*
^a Xidian University, Xi'an, China.
^b Case Western Reserve University, Cleveland, OH, USA.

ABSTRACT
Face detection is one of the most important research topics in the field of computer vision, and it is also the premise and an essential part of face recognition. With the advent of deep learning-based techniques, the performance of face detection has improved greatly and more and more daily applications have emerged. However, face detection is strongly affected by environmental illumination. Most existing face detection algorithms neglect harsh illumination conditions such as nighttime, where lighting is insufficient or entirely absent. These conditions are often encountered in real-world scenarios, e.g., nighttime surveillance in law enforcement or civil settings. Overcoming the problem of face detection in the darkness has therefore become a critical and urgent need. In this paper we thus study face detection in the darkness using infrared (IR) imaging. We build an IR face detection dataset and utilize a deep learning-based model to study the face detection performance. Specifically, the deep learning model is a single-stage detector, which offers faster speed and lower computational cost than face detectors consisting of multiple stages. In the experiment, we also compare the performance of the deep learning model with that of a well-known traditional face detection algorithm, Haar (i.e., AdaBoost). In terms of precision, our model significantly outperforms Haar by more than 30%, a dramatic boost from 68.75% to 98.01%, which suggests that our deep learning-based method with IR imaging can indeed meet the requirements of real-world nighttime face detection applications.
Keywords: face detection, low illumination, nighttime, infrared, deep learning.

1. INTRODUCTION
Face recognition has become a relatively mature and widely-used biometrics technology in our daily life after
rapid development in recent years. As a prerequisite for face recognition, face detection has a great influence
on the effect and accuracy of subsequent face recognition. Therefore, the topic of face detection has drawn a
great amount of attention. Many research works have been conducted,1 among which some algorithms have proved especially successful and efficient, such as the AdaBoost algorithm by Viola and Jones.2 With the arrival of the deep learning era, many researchers have designed sophisticated convolutional networks to address the problem of face detection, and great achievements have been witnessed, such as Faster RCNN,3 MTCNN,4 and Cascade CNN.5
At present, however, face detection is mostly performed in the traditional part of the light spectrum, the visible band. Although visible-light detection technology is very mature, it is limited by environmental lighting and visibility. In harsh environments, such as under uneven lighting or low visibility, the detection results are usually not satisfactory, since detection accuracy is greatly affected and reduced by those lighting factors. Moreover, the face images need to be pre-processed prior to the actual detection process to eliminate the influence of light, which leads to the loss of some useful information.
A more serious case emerges when the face detection task takes place in a totally dark environment where visible-light illumination is infeasible or should be avoided, such as in nighttime surveillance. Many real-world
Further author information: (Send correspondence to Liaojun Pang)
* Liaojun Pang: E-mail: ljpang@mail.xidian.edu.cn, Telephone: +86 2981891070;
Zhicheng Cao: E-mail: zccao@xidian.edu.cn.
surveillance tasks occur during the night. These scenarios are especially common in civil, law enforcement, and military applications. All of them require a quite different light band: the infrared (IR). Recent advances have been made in the manufacturing of small and cheap imaging devices sensitive in the infrared range. Cameras equipped with these sensors can see at night and through fog and rain even at long ranges, which makes surveillance in harsh environments more practical and reliable but at the same time poses new research problems.6–9 Compared with visible light, the IR bands have a very different imaging nature. The reflectivity properties of facial tissues under IR imaging and visible-light imaging are quite different from each other. Also, owing to different propagation properties, the electromagnetic waves of active IR are less affected than those of visible light by the scattering and absorption of smoke or dust.10 Moreover, unlike imaging in visible light, IR imaging can be used to extract not only exterior but also useful subcutaneous anatomical information.11
Therefore, we in this paper study the feasibility of using the infrared as the light band to address the problem
of face detection in the darkness. So far, there have been a few works on face detection that used IR as the
working light band. For example, the work of Yan and Wang utilized the famous AdaBoost algorithm to study the IR face detection problem.12 Zheng et al. proposed a projection profile analysis algorithm for face detection and eyeglasses detection using the thermal IR.13 Shen et al. utilized the Active Shape Model (ASM) method to detect the facial features, and low-dimensional Gabor features were extracted for recognition by combining Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA).14 The work of Zhou et al. designed and prototyped a novel type of hybrid sensor by combining a pair of near-infrared cameras and a thermal camera.15 To summarize, all these works adopted traditional, non-deep-learning methods. Therefore, IR face detection, and especially deep learning-based IR face detection, is of great significance and remains a challenge.

2. DEEP LEARNING-BASED INFRARED FACE DETECTION


Traditionally, facial features are extracted by hand-designed operators to complete the face detection task. These traditional methods can be divided into four categories: feature-based, statistics-based, template-based, and learning-based. More recently, deep neural networks have been shown to be more advantageous than the traditional methods, due to the advancement of high-computation-power machines (especially more sophisticated GPUs) and the advent of the big data era, i.e., the availability of large training datasets.16, 17
Therefore, we study the problem of infrared face detection using DNN-based methods. More specifically, we utilize a state-of-the-art deep learning model, the Single Shot Scale-Invariant Face Detector (S3FD).18 We chose this model to study IR face detection because it is a single-stage method, which means it is designed for real-time applications and can run faster than multi-stage models. Moreover, S3FD proposes a scale-equitable face detection framework to handle faces of different scales, which is especially beneficial for boosting the detection rate.

2.1 Model Structure


The most prominent features of the deep model we utilize for IR face detection are that it is a single-stage face detector and that it has a scale-equitable structure. A single-stage model has the advantage of faster detection speed, which makes it more suitable for real-time requirements. The scale-equitable structure is especially efficient at handling faces of different scales, which benefits the final detection accuracy. In addition, this model improves the recall rate of small faces by a scale compensation anchor matching strategy. On the other hand, it also reduces the false positive rate of small faces via a max-out background label.
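The max-out background idea can be sketched in a few lines. The following is a simplified NumPy illustration, not the authors' implementation; the number of background scores per anchor (`n_bg`, here 3) is an assumption:

```python
import numpy as np

def maxout_background(bg_logits, face_logits):
    """Reduce the per-anchor background prediction to a single score.

    bg_logits:   (N, n_bg) array, several background scores per anchor
    face_logits: (N,) array, one face score per anchor
    Returns an (N, 2) array of [background, face] logits, where each
    anchor keeps only its highest background score. Keeping the max
    makes it easier to reject background patches that resemble small
    faces, lowering the false positive rate.
    """
    bg = bg_logits.max(axis=1)
    return np.stack([bg, face_logits], axis=1)

bg_logits = np.array([[0.2, 1.5, 0.7],
                      [2.0, 0.1, 0.3]])
face_logits = np.array([0.9, 0.4])
logits = maxout_background(bg_logits, face_logits)
# logits == [[1.5, 0.9], [2.0, 0.4]]
```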
As illustrated in Figure 1, the overall model consists of six modules: the base convolutional layers, the extra convolutional layers, the detection convolutional layers, the normalization layers, the predicted convolutional layers, and the multi-task loss layer. The base convolutional layers are the same as the convolutional layers of the VGG16 model. This module consists of a set of 5 stacks of convolutional layers, with the last layer of each of the last three stacks set as a detection layer. The module of extra convolutional layers contains additional convolutional layers beyond the VGG16 model, with the fully connected layers replaced by convolutional layers. The layers conv_fc7, conv6_2, and conv7_2 are also set as detection layers. The module of normalization layers deals with the issue of different feature scales at the conv3_3, conv4_3, and conv5_3 layers, where an L2 normalization is applied. Each detection layer is followed by a p × 3 × 3 × q convolutional layer, where p and q are the channel numbers of the input and output, and 3 × 3 is the kernel size.
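The L2 normalization step can be sketched as follows. This is a minimal NumPy illustration; in the actual model the scale is a learnable parameter, and the value 10.0 used here is only an assumed initialization:

```python
import numpy as np

def l2_norm(feat, scale=10.0):
    """L2-normalize a (C, H, W) feature map along the channel axis,
    then multiply by a scale factor (learnable in the real model).
    This equalizes the magnitudes of features coming from layers at
    different depths (conv3_3, conv4_3, conv5_3) before detection."""
    norm = np.sqrt((feat ** 2).sum(axis=0, keepdims=True)) + 1e-12
    return scale * feat / norm

feat = np.random.rand(256, 8, 8) + 0.1  # a dummy feature map
out = l2_norm(feat, scale=10.0)
# after normalization, every spatial position has channel-wise
# L2 norm equal to the scale
norms = np.sqrt((out ** 2).sum(axis=0))
```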

Figure 1: Structure of the S3FD model.

2.2 The Loss Function


The loss function consists of two terms, the classification loss, Lcls, and the regression loss, Lreg. The classification loss is basically a softmax loss over two classes, i.e., the face and the background; the regression loss is an L1 loss responsible for correcting the coordinates of the detection boxes. The mathematical expression of the total loss is as follows:
L({p_i}, {t_i}) = (λ/N_cls) Σ_i L_cls(p_i, p_i*) + (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*),    (1)
where i is the index of an anchor and pi is the predicted probability that anchor i is a face. The ground-truth
label p∗i is 1 if the anchor is positive, and 0 otherwise. ti is a vector representing the 4 parametrized coordinates
of the predicted bounding box, and t∗i is that of the ground truth box associated with a positive anchor.
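The loss of Eq. (1) can be sketched numerically as below. This is a NumPy illustration under the assumption that L_reg is the smooth L1 loss commonly used in detection models; λ is left at 1:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """L_cls: softmax loss over the two classes (background=0, face=1)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def smooth_l1(t, t_star):
    """L_reg over the 4 parametrized box coordinates of each anchor."""
    d = np.abs(t - t_star)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum(axis=1)

def total_loss(logits, labels, t, t_star, lam=1.0):
    """L({p_i}, {t_i}) from Eq. (1): classification over all anchors,
    regression only over positive anchors (the p_i* factor)."""
    pos = labels == 1
    n_cls = len(labels)
    n_reg = max(int(pos.sum()), 1)
    l_cls = softmax_cross_entropy(logits, labels).sum() / n_cls
    l_reg = smooth_l1(t, t_star)[pos].sum() / n_reg
    return lam * l_cls + l_reg

# near-perfect predictions should give a loss close to zero
logits = np.array([[0.0, 10.0], [10.0, 0.0]])  # anchor 0: face, anchor 1: background
labels = np.array([1, 0])
t = np.zeros((2, 4)); t_star = np.zeros((2, 4))
loss = total_loss(logits, labels, t, t_star)
```

Note how the ground-truth label p_i* acts as a gate: anchors labeled background contribute nothing to the regression term, exactly as in Eq. (1).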

3. EXPERIMENTS AND ANALYSIS


This section presents the numerical results and analysis of detecting faces in IR images. We hereby examine the performance of IR face detection using a deep neural network-based method. The DNN-based method, as well as two traditional baseline methods (i.e., HOG and Haar), are compared. The performance of the three methods is displayed as Precision-Recall (PR) curves and Receiver Operating Characteristic (ROC) curves, as shown in Figure 3 and Figure 4. Performance metrics, such as mAP and AUC, are summarized in Table 2.

3.1 Datasets
In our experiments we utilize the Q-FIRE dataset collected by Schuckers et al. at Clarkson University.19 The Q-FIRE dataset consists of face images collected under visible light and IR. We consider the thermal infrared (or long-wave infrared) subset of Q-FIRE, which involves a total of 84 individuals. The total number of thermal IR images is 1340, among which 1101 images form the training set, 85 the validation set, and the remaining 154 the test set, as listed in Table 1. A sample face image from the Q-FIRE dataset is shown in Figure 2.

Table 1: Datasets
Dataset Training Validation Test
Q-FIRE 1101 85 154
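A split with the proportions of Table 1 can be reproduced by a simple shuffle-and-slice. This is only a sketch: the filenames and the random protocol are assumptions, since the paper does not state how the split was drawn:

```python
import random

def split_dataset(paths, n_train=1101, n_val=85, seed=0):
    """Shuffle the image list and slice it into train/val/test subsets;
    whatever remains after train and val forms the test set."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]
    return train, val, test

# hypothetical filenames standing in for the 1340 thermal IR images
files = [f"qfire_{i:04d}.png" for i in range(1340)]
train, val, test = split_dataset(files)
```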

Figure 2: A sample face image of the dataset Q-FIRE.

3.2 Experimental Setup


In order to study the performance of IR face detection under deep learning, we chose S3FD as the deep learning model and compare its performance against that of two traditional methods that were representative and popular before deep learning methods emerged: HOG and Haar. Precision-Recall (PR) and Receiver Operating Characteristic (ROC) curves are plotted, and the values of mAP and AUC are calculated. All the experiments are run in our lab on a PC with an i9 CPU and an RTX 2080 Ti GPU.

3.3 Performance of IR Face Detection


In this subsection, we carry out experiments of IR face detection on the Q-FIRE dataset using three different methods: the DNN model we use in this paper, and the two traditional methods of HOG and Haar. We then compare the detection performance of the three in order to study the feasibility of using the IR band for face detection, as well as to validate the advantage of the DNN model over traditional detection methods.

3.3.1 The Precision-Recall Curve


We first plot and compare the precision-recall curves of the three methods: the DNN model we use in this paper, and the two traditional methods of HOG and Haar. As we can see from Figure 3, the PR curve of our DNN method (i.e., S3FD) is clearly higher than the PR curves of both traditional methods, HOG and Haar, demonstrating an apparent advantage over the other two. Between the two traditional methods, HOG performs better than Haar. Also, the DNN method yields a PR curve that stays almost as high as 1 over the majority of the x-axis, which means the DNN method is satisfactory in performance for real-world applications of IR face detection.
Figure 3: The precision-recall curves of all the three methods.
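A precision-recall curve of this kind is computed by ranking detections by confidence and accumulating true and false positives. The following is a generic sketch, not the authors' evaluation code; it assumes a detection counts as a true positive when it matches a ground-truth face (e.g., at IoU ≥ 0.5):

```python
import numpy as np

def pr_curve(scores, is_tp, n_gt):
    """Precision and recall at every confidence threshold.

    scores: detection confidences; is_tp: 1 if the detection matches a
    ground-truth face, else 0; n_gt: total number of ground-truth faces.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    matched = np.asarray(is_tp, dtype=float)[order]
    tp = np.cumsum(matched)          # true positives so far
    fp = np.cumsum(1.0 - matched)    # false positives so far
    precision = tp / (tp + fp)
    recall = tp / n_gt
    return precision, recall

precision, recall = pr_curve([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1], n_gt=4)
# precision == [1.0, 1.0, 0.6667, 0.75], recall == [0.25, 0.5, 0.5, 0.75]
```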

3.3.2 The ROC Curve


In addition to the PR curves, we plot the ROC curves of the three methods to verify our conclusion. As we can see from Figure 4, once again the ROC curve of our DNN method is significantly higher than the ROC curves of both HOG and Haar. This result again suggests that the DNN method we use is superior to the other two methods. Between the two traditional methods, HOG performs better than Haar. Similar to the PR plot, the DNN method again yields an ROC curve that stays almost as high as 1 over a large portion of the x-axis, which means the DNN method can actually be applied to real-world applications.

Figure 4: The ROC curves of all the three methods.


3.3.3 Performance Metrics
In order to validate how well our DNN method performs, we also calculate and compare the detection metrics of the three methods. The metrics of precision, recall, mAP, and AUC are used to evaluate the detection performance.
As can be seen from Table 2, our method of S3FD yields a precision value of 0.9801, compared with precision values of 0.6875 and 0.9740 for Haar and HOG, respectively. This result validates that our DNN method is more advanced for the purpose of IR face detection. When recall is chosen as the metric, once again our DNN method yields a higher value than HOG and Haar: the recall value of our method is 0.9610, while the recall values of Haar and HOG are 0.5000 and 0.4870, respectively. Finally, when the metrics are chosen to be mAP and AUC, the DNN method demonstrates a similar advantage: much higher mAP and AUC values are obtained. To summarize, this experiment clearly demonstrates the advantage of using our deep learning-based method for IR face detection.

Table 2: Performance Metrics


Metrics Haar HOG S3FD
Precision 0.6875 0.9740 0.9801
Recall 0.5000 0.4870 0.9610
mAP 0.3377 0.4775 0.9606
AUC 0.2568 0.2469 0.4931
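The mAP values in Table 2 reduce to an average precision over the single face class. One common way to compute AP from a PR curve is the all-points interpolation below; this particular variant is an assumption, since the paper does not state which AP definition was used:

```python
import numpy as np

def average_precision(precision, recall):
    """Area under the PR curve, using the monotone precision envelope."""
    p = np.concatenate(([0.0], np.asarray(precision, float), [0.0]))
    r = np.concatenate(([0.0], np.asarray(recall, float), [1.0]))
    # enforce a non-increasing precision envelope from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # sum the area over the segments where recall increases
    idx = np.where(r[1:] != r[:-1])[0]
    return float(((r[idx + 1] - r[idx]) * p[idx + 1]).sum())

# a detector that reaches recall 0.5 at precision 1.0 and then stops
ap = average_precision([1.0, 0.5], [0.5, 0.5])
# ap == 0.5
```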

4. CONCLUSIONS
In this paper, we study the problem of face detection at nighttime using the IR light band, since the IR band is needed for some special real-world face detection applications. We take a deep learning-based approach to study the feasibility of IR face detection. A DNN model is utilized for the study due to its advantages in accuracy and speed. We carry out experiments on a thermal IR dataset to study the performance of this DNN model. We also compare its performance against that of two representative traditional methods. We find that thermal IR is a promising subband of IR for reliable face detection with deep learning as the detection tool. We also observe that deep learning-based methods perform better than traditional hand-crafted operators.

ACKNOWLEDGMENTS
This research is funded by the National Natural Science Foundation of China under Grant 61906149, the Fundamental Research Funds for the Central Universities under Grants JB181206 and XJS201201, and the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2021JM-136.

REFERENCES
[1] Hjelmas, E. and Low, B. K., “Face detection: A survey,” Computer Vision and Image Understanding 83(3),
236–274 (2001).
[2] Viola, P. and Jones, M. J., “Robust real-time face detection,” International Journal of Computer Vi-
sion 57(2), 137–154 (2004).
[3] Ren, S., He, K., Girshick, R., and Sun, J., “Faster r-cnn: Towards real-time object detection with region
proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149
(2017).
[4] Zhang, K., Zhang, Z., Li, Z., and Qiao, Y., “Joint face detection and alignment using multitask cascaded
convolutional networks,” IEEE Signal Processing Letters 23(10), 1499–1503 (2016).
[5] Deng, J. and Xie, X., “Nested shallow cnn-cascade for face detection in the wild,” in [IEEE International
Conference on Automatic Face and Gesture Recognition ], (2017).
[6] Maeng, H., Liao, S., Kang, D., Lee, S. W., and Jain, A. K., “Nighttime face recognition at long distance:
cross-distance and cross-spectral matching,” in [Asian Conference on Computer Vision], 708–721 (2012).
[7] Nicolo, F. and Schmid, N. A., “Long range cross-spectral face recognition: Matching SWIR against visible
light images,” IEEE Trans. on Inf. Forensics and Security 7(6), 1717–1726 (2012).
[8] Cao, Z., Schmid, N. A., and Li, X., “Image disparity in cross-spectral face recognition: mitigating camera
and atmospheric effects,” Proc. SPIE 9844, 98440Z–98440Z–10 (2016).
[9] Cao, Z. and Schmid, N. A., “Heterogeneous sharpness for cross-spectral face recognition,” Proc. SPIE 10202,
10202 – 10202 – 11 (2017).
[10] Bohren, C. F. and Huffman, D. R., [Absorption and scattering of light by small particles ], John Wiley &
Sons (2008).
[11] Ghiass, R. S., Arandjelovic, O., Bendada, H., and Maldague, X., “Infrared face recognition: a literature
review,” in [Neural Networks (IJCNN), The 2013 International Joint Conference on], 1–10, IEEE (2013).
[12] Yan, C. and Wang, Y., “A novel multi-user face detection under infrared illumination by real adaboost,” in
[International Conference on Computational Intelligence and Software ], 1–6 (2009).
[13] Zheng, Y. et al., "Face detection and eyeglasses detection for thermal face recognition," in
[Proceedings of SPIE - The International Society for Optical Engineering ], 8300 (2012).
[14] Shen, P., Liang, Z., Song, J., Hu, X., and Zeng, M., “A near-infrared face detection and recognition system
using ASM and PCA+LDA,” Journal of Networks 9(10) (2014).
[15] Zhou, M., Lin, H., Susan, Y. S., and Yu, J., “Hybrid sensing face detection and registration for low-light
and unconstrained conditions,” Applied Optics 57(1), 69 (2018).
[16] Lecun, Y. and Bengio, Y., [Convolutional networks for images, speech, and time series], MIT Press (1998).
[17] Krizhevsky, A., Sutskever, I., and Hinton, G. E., “Imagenet classification with deep convolutional neural
networks,” in [International Conference on Neural Information Processing Systems], 1097–1105 (2012).
[18] Zhang, S., Zhu, X., Lei, Z., Shi, H., Wang, X., and Li, S. Z., “S3FD: Single shot scale-invariant face
detector,” in [IEEE International Conference on Computer Vision], 192–201 (2017).
[19] Johnson, P. A., Lopez-Meyer, P., Sazonova, N., Hua, F., and Schuckers, S., “Quality in face and iris research
ensemble (Q-FIRE),” in [2010 Fourth IEEE International Conference on Biometrics: Theory, Applications
and Systems (BTAS)], 1–6 (2010).
