Professional Documents
Culture Documents
Abstract—Person recognition from pose-variant face images is a well addressed, yet challenging problem, especially for surveillance in a crowded place where the pose variation is large in the test set compared to the training set. Conventional feature extraction based face recognition techniques are not efficient enough to solve the problem. In this paper, a novel mechanism has been proposed to learn the training set consisting of few pose variant images and many frontal images of different persons using deep learning algorithms. At first, autoencoders are trained to build the templates for representing the pose variant training images. The left (−45∘) and right (+45∘) templates cover all pose variations of test images from −90∘ to +90∘. In the next step the convolution neural network (CNN) architectures are used in supervised mode for transforming the templates into person specific frontal images present in the training set. Left and right clusters of trained CNNs are obtained with respect to the left and right templates.

In the testing phase, the head-pose of the test image is estimated using the collaborative representation based classifier (CRC) in order to select the appropriate cluster of CNN architectures for generation of the frontal image. The CNN architecture which provides the best match frontal image with the training set recognizes the specific person. The matching score is measured using the correlation coefficient and the Frobenius norm. For a frontal test image, if the matching score is below the predefined threshold then the proposed method does not recognize the image; however, the training set is updated with the unrecognized frontal test images for future recognition. The accuracy of the proposed method is around 99% when tested on the CMU PIE database, which is much higher in comparison to the existing face-recognition methods.

Index Terms- Face recognition, Pose estimation, Convolution Neural Network, Autoencoders, Template

I. INTRODUCTION

Person identification has immense scope of real time applications in surveillance systems for monitoring the activity of persons. However, the existing feature extraction based face recognition[1], [2], [9] techniques are hardly effective for real time applications. For instance, face images acquired from a crowded place are mostly nonfrontal, while the training images consist of few pose variant images and more frontal images of recognized criminals. The task of a surveillance system is to recognize the persons from their pose variant nonfrontal face images.

Dimension reduction based face recognition methods include principal component analysis (PCA), linear discriminant analysis (LDA), Independent Component Analysis (ICA), Multi Dimensional Scaling (MDS) and Isomap[3]. However, PCA[3], [4] and LDA, though effective for dimension reduction, are not very suitable for face recognition of pose variant images. MDS cannot capture non-linearity in the feature space, whereas Isomap is not efficient for new data points. In Kernel PCA[3], [4] based methods, choosing the kernel is a major problem. In feature based face recognition methods, accuracy depends on the feature descriptors[5]. Selection of the feature descriptor is a major task which depends on the application or the problem, and becomes complex when dealing with large data sets.

Gradient based machine learning algorithms using Neural Networks (NN)[6] are a feature engineering approach and extensively applied for face recognition. But shallow NN architectures fail to manage complex and large data sets due to their limitation in representing the wide variability in features.

Deep neural architectures automatically extract high level abstract features, unlike the manual feature engineering approach. Deep architectures are suitable for analyzing complex structure and extracting the underlying patterns from raw data, required for decision making. The autoencoder is a type of unsupervised deep architecture used in sentiment analysis and object recognition[19].

Autoencoders[15], [16], [17] provide a higher level representation of the input by repeatedly mapping the input into a lower dimensional space and then back to a higher dimension, used for reconstruction of signals by eliminating the redundant and irrelevant features. Convolution Neural Networks (CNN)[11], [12], [13], [14] are used in both supervised and unsupervised mode following deep architectures. CNN looks for local receptive fields in an image, based on the assumption that a cell in an image is influenced more by its neighbouring cells than by other cells. It uses different kernels[12], [13] or functions, observes the response corresponding to different local areas of an image and constructs the feature set. CNNs have widespread applications including object recognition, digit classification, and satellite image analysis[20]. CNN along with softmax regression has been used in face recognition[21], [22]. The CNN along with the fully connected layers extracts high level features, while the softmax layer gives the probability of the test instance belonging to each of the training classes. CNN has been used to directly transform faces into a Euclidean space where distances measure similarity between the faces. However, the existing methods are applied to recognize the faces from non-frontal and frontal training images, where the training set is usually denser than the test set. This paper recognizes faces considering training images which are mostly frontal.

In this paper, we employ the autoencoder architecture to obtain a representation for each pose-specific class using a set of training images, considered as a template. The images with left-oriented head poses and right-oriented head poses are represented by the −45∘ and +45∘ templates, respectively. The idea is to train the autoencoder using images of different persons of a particular head pose, and we obtain a representation of the group of images as the output of the encoder. The standard autoencoder algorithm has been modified to construct the templates. Next, another deep architecture, CNN, has been applied for transforming the template image into a frontal face image. Here CNN is used in supervised mode to learn the network parameters for conversion of the −45∘ and +45∘ head pose templates to frontal images of the training set. The CNNs are grouped into the left cluster architecture (LCA) and the right cluster architecture (RCA), where LCA contains all the person-specific CNN architectures for conversion of the −45∘ template to frontal images and RCA those for the +45∘ template.
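The local-response idea described above — sliding a kernel over the image and recording the response of each local receptive field — can be sketched in plain NumPy. This is an illustrative valid-mode correlation with a made-up kernel, not the paper's architecture:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a kernel over the image and record the response at
    each local receptive field (valid mode, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # response of one local neighbourhood to the kernel
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[1.0, -1.0]])     # horizontal difference kernel
resp = conv2d_valid(img, edge)     # one response map, shape (4, 3)
```

A real CNN layer applies many such kernels and learns their weights; this loop only shows where each local response comes from.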
978-1-5386-2241-4/17/$31.00 ©2017 IEEE
Authorized licensed use limited to: Indian Instt of Engg Science & Tech- SHIBPUR. Downloaded on April 16,2024 at 13:53:32 UTC from IEEE Xplore. Restrictions apply.
Fig. 2: Autoencoder Architecture
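The encode-decode path of Fig. 2 can be sketched as a minimal forward pass: the input is mapped to a lower dimensional code and back to the input dimension. Layer sizes and the random weights are hypothetical; the paper's modified template-building algorithm is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical sizes: a 32x32 gray image flattened to 1024, code of 64
n_in, n_code = 1024, 64
W_enc = rng.normal(0.0, 0.01, (n_code, n_in))
W_dec = rng.normal(0.0, 0.01, (n_in, n_code))

def relu(z):
    return np.maximum(z, 0.0)

def autoencode(x):
    """Encode to the lower-dimensional space, then decode back."""
    code = relu(W_enc @ x)      # compressed representation
    recon = relu(W_dec @ code)  # reconstruction of the input
    return code, recon

x = rng.random(n_in)            # stand-in for a flattened face image
code, recon = autoencode(x)
```

Training minimizes the reconstruction error between `x` and `recon`; the encoder output is what the paper uses as the pose-specific template representation.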
total responses of a group. This characteristic reduces dimensionality and ensures translation invariance. The most commonly used pooling method is max pooling, which selects the maximum response from a group of local responses. The pooling layer has been omitted in the proposed CNN architecture because the image databases do not have translation invariance in the images. An example of a CNN architecture is shown in Figure 3.

head pose of the test image in order to select the appropriate cluster (LCA or RCA) of CNN architectures. The test image is reconstructed using that particular cluster to find out the best match training image.

III. PROPOSED METHOD

Face recognition is challenging under the constraint that the
training set contains more frontal images compared to pose variant
images, especially when the test images have wide pose variations.
The aim of the paper is to solve the problem in a real time and
dynamic environment, like surveillance in a crowded place. In the
training phase, first we construct templates of −45∘ and +45∘ pose
variant images using autoencoders and the few pose variant images
available in the training set. Then we propose a learning model to
transform the nonfrontal templates to frontal person-specific images
of the training set using CNN architectures.
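The max-pooling step discussed earlier (and deliberately omitted from the proposed CNN architecture) can be sketched in NumPy; `max_pool_2x2` is an illustrative helper, not part of the paper:

```python
import numpy as np

def max_pool_2x2(x):
    """Keep only the maximum response of each non-overlapping
    2x2 group, halving each spatial dimension."""
    h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

resp = np.array([[1., 2., 0., 1.],
                 [3., 4., 1., 0.],
                 [0., 1., 5., 2.],
                 [1., 0., 2., 6.]])
pooled = max_pool_2x2(resp)   # [[4., 1.], [1., 6.]]
```

Each output cell keeps only the strongest local response, which is exactly the translation tolerance the authors do not want when pixel-accurate reconstruction matters.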
where w_ij denotes a particular weight or parameter to be learnt (the j-th node of the i-th layer), η is the learning rate and E is the squared error.
However, it has been observed that in the initial phase of training, the weights of the autoencoder become negative due to the difference in the gray level images and the ReLU. Therefore, a correction is needed, and we consider two heuristics to modify the algorithm.
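The update referenced above is the standard gradient-descent rule, w_ij ← w_ij − η ∂E/∂w_ij. A one-layer sketch with toy shapes (this is the generic rule only, not the paper's corrected algorithm or its two heuristics):

```python
import numpy as np

# toy linear layer y = W x with squared error E = 0.5 * ||W x - t||^2
rng = np.random.default_rng(1)
W = rng.random((2, 3)) * 0.1       # small positive initial weights
x = rng.random(3)                  # toy input
t = np.zeros(2)                    # toy target output

def squared_error(W):
    d = W @ x - t
    return 0.5 * float(d @ d)

eta = 0.1                          # learning rate
grad = np.outer(W @ x - t, x)      # dE/dW_ij = (y_i - t_i) * x_j
W_new = W - eta * grad             # w_ij <- w_ij - eta * dE/dw_ij
```

Nothing in this rule keeps the entries of `W_new` non-negative, which is the behaviour the heuristics above are meant to correct.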
Fig. 5: Testing Method Flowchart

Fig. 7: Pose Variation from −90∘ to +90∘ in Database 1 with a gap of 22.5∘ between successive images of a particular person

for the person, and the training data set is updated. The proposed method therefore dynamically updates the training set to strengthen the training phase. Even if recognition fails due to low confidence value, we can still use the test image for enriching the dataset. The updating condition is given in Table I.

Fig. 8: Template of +45∘ built using 8 images of +45∘
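The updating condition of Table I can be read as a small decision rule: only a new frontal test image that is recognized correctly but with low confidence is added to the training set. A sketch (`should_update` is our name for it, not the paper's):

```python
def should_update(frontal: bool, correct: bool, high_confidence: bool) -> bool:
    """Decision rule behind Table I: update the training set only for a
    frontal image, correctly recognized, with low confidence."""
    return frontal and correct and not high_confidence

# the three rows of Table I
row1 = should_update(frontal=True,  correct=False, high_confidence=True)   # No
row2 = should_update(frontal=False, correct=True,  high_confidence=False)  # No
row3 = should_update(frontal=True,  correct=True,  high_confidence=False)  # Yes
```

Combinations not listed in Table I (for example an incorrect, low-confidence frontal image) are treated as "No" by this sketch, which is an assumption.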
IV. RESULTS AND DISCUSSIONS

For experimentation the database is built using CMU PIE[8], which consists of 43168 images of 68 persons with 13 different poses and 43 different illuminations. The pose variant images of an individual available in the dataset are −90∘, −67.5∘, −45∘, −22.5∘, 0∘, +22.5∘, +45∘, +67.5∘ and +90∘. Frontal images of 40 persons are sampled randomly, and 8 persons with +45∘ and −45∘ images are sampled randomly from the database to build the training set. Pose variant images of different persons in the database are shown in Figure 7. The test set consists of the +45∘, −45∘, +67.5∘, −67.5∘, +90∘ and −90∘ images of all the 40 persons. After estimating the head pose of the test image, it is passed through the corresponding architectures (LCA or RCA). The reconstructed image is compared with the corresponding frontal images and the best match is recorded as the recognized person. Table II shows the accuracy of the proposed method for various groups of pose-varying test images.

TABLE II: Accuracy of the proposed method

Head-pose of test images | Number of images | Architecture group used | Accuracy (%)
−90∘   | 40 | −45∘ to 0∘ | 99
−67.5∘ | 40 | −45∘ to 0∘ | 99
−45∘   | 40 | −45∘ to 0∘ | 100
+45∘   | 40 | +45∘ to 0∘ | 100
+67.5∘ | 40 | +45∘ to 0∘ | 99
+90∘   | 40 | +45∘ to 0∘ | 99
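The matching between a reconstructed image and the frontal training images uses the correlation coefficient and the Frobenius norm, per the abstract. Both scores can be sketched in NumPy (illustrative arrays; any threshold on these scores is a hypothetical choice):

```python
import numpy as np

def correlation_score(a, b):
    """Pearson correlation coefficient between two gray images;
    closer to 1 means a better match."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

def frobenius_distance(a, b):
    """Frobenius norm of the pixel-wise difference;
    smaller means a better match."""
    return np.linalg.norm(a - b, ord='fro')

a = np.arange(16, dtype=float).reshape(4, 4)
b = 2.0 * a + 1.0                  # linearly related image
corr = correlation_score(a, b)     # 1.0 for a perfect linear relation
dist = frobenius_distance(a, a)    # 0.0 for identical images
```

In the pipeline above, the training image maximizing the match score against the CNN reconstruction identifies the person.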
TABLE I: Updation Table

Test Image | Recognition (confidence) | Updation
New (Frontal or Non-frontal) | Incorrect (High) | No
New (Non-frontal) | Correct (Low) | No
New (Frontal) | Correct (Low) | Yes

Fig. 9: Template of +90∘ built using 8 images of +90∘
Fig. 10: Comparison of ROC using Database (True Positive Rate vs. False Positive Rate for Gabor PCA, Gabor LDA, ICA, LPP, DCT and the Proposed Method)

A. Comparisons

Comparison[10] of the accuracy of the proposed method with other existing face recognition methods applied on the CMU-PIE database[8] is listed in Table III, while the ROC curves are shown in Figure 10.

TABLE III: Comparison of the face recognition methods using CMU-PIE database

Methods | Accuracy (%) | TPR
Gabor PCA | 58 | 0.60
Gabor LDA | 65.5 | 0.66
ICA | 59 | 0.60
Gabor Supervised LPP | 74.3 | 0.74
Local DCT + Feature Fusion | 70.9 | 0.72
NN | 78.8 | 0.80
LRC | 81.9 | 0.83
S-SRC | 90 | 0.93
Proposed Method | 99.3 | 0.99

V. CONCLUSIONS

The proposed algorithm performs much better than the existing algorithms, as depicted in the tables and figures. The CNN features of local patterns, the exploitation of connectivity, and the autoencoder for template reconstruction are intelligently used in the paper for solving the problem in question. The only drawback is that many architectures need to be built for this method, though they can be constructed in offline mode. During testing, only prediction with the existing architectures is required, which does not take much time. The novelty of the proposed method is the updation of the database from the test images, and the system performs well even for a small number of pose variant images in the training set.

Future scope includes bringing down the number of architectures required for person recognition without sacrificing accuracy, so as to have a single architecture that will convert any non-frontal image to its frontal counterpart, thus saving space and time for recognition.

REFERENCES

[1] D.G. Balakrishnan and S. Chitra, A survey of face recognition on feature extraction process of dimensionality reduction techniques, Theoretical and Applied Information Technology, vol. 36, no. 1, 2012.
[2] Xiaozheng Zhang and Yongsheng Gao, Face recognition across pose: A review, Pattern Recognition, vol. 42, no. 11.
[3] Sukhvinder Singh, Meenakshi Sharma, and N. Suresh Rao, Accurate face recognition using PCA and LDA, International Conference on Emerging Trends in Computer and Image Processing (ICETCIP 2011).
[4] Ming-Hsuan Yang, Kernel eigenfaces vs. kernel fisherfaces: Face recognition using kernel methods, in FGR, IEEE, 2002, p. 0215.
[5] Cong Geng and Xudong Jiang, Face recognition using SIFT features, in Image Processing (ICIP), 2009 16th IEEE International Conference on, IEEE, 2009, pp. 3313-3316.
[6] M. Nandini, P. Bhargavi, and G. Raja Sekhar, Face recognition using neural networks, International Journal of Scientific and Research Publications, vol. 3, no. 3, p. 1, 2013.
[7] Lei Zhang, Meng Yang, Xiangchu Feng, Yi Ma, and David Zhang, Collaborative representation based classification for face recognition, CoRR, vol. abs/1204.2358, 2012.
[8] T. Sim, S. Baker, and M. Bsat, The CMU pose, illumination, and expression (PIE) database, in Proceedings of the 5th International Conference on Automatic Face and Gesture Recognition, 2002.
[9] Xiujuan Chai, Shiguang Shan, Xilin Chen, and Wen Gao, Locally linear regression for pose-invariant face recognition, Image Processing, IEEE Transactions on, vol. 16, no. 7, pp. 1716-1725, 2007.
[10] Srija Chowdhury and Jaya Sil, Head pose estimation for recognizing face images using collaborative representation based classification, Advances in Computing, Communications and Informatics (ICACCI), 2016 International Conference on, IEEE, 2016.
[11] Yann LeCun and Yoshua Bengio, Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks, 3361.10, 1995.
[12] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, Deep learning, Nature, 521.7553 (2015): 436-444.
[13] Steve Lawrence et al., Face recognition: A convolutional neural network approach, IEEE Transactions on Neural Networks, 8.1, 1997.
[14] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, 2012.
[15] Pascal Vincent et al., Extracting and composing robust features with denoising autoencoders, Proceedings of the 25th International Conference on Machine Learning, ACM, 2008.
[16] Pascal Vincent et al., Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, Journal of Machine Learning Research, 11, Dec. 2010.
[17] Yoshua Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning, 2.1 (2009): 1-127.
[18] Vinod Nair and Geoffrey E. Hinton, Rectified linear units improve restricted Boltzmann machines, Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.
[19] Richard Socher et al., Semi-supervised recursive autoencoders for predicting sentiment distributions, Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2011.
[20] Saptarshi Pal, Srija Chowdhury, and Soumya K. Ghosh, DCAP: A deep convolution architecture for prediction of urban growth, Geoscience and Remote Sensing Symposium (IGARSS), 2016 IEEE International, IEEE, 2016.
[21] Yi Sun et al., Deep learning face representation by joint identification-verification, Advances in Neural Information Processing Systems, 2014.
[22] Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman, Deep face recognition, BMVC, vol. 1, no. 3, 2015.