Professional Documents
Culture Documents
Abstract—Person recognition from pose-variant face images is a well addressed, yet challenging problem, especially for surveillance in a crowded place where the pose variation is large in the test set compared to the training set. Conventional feature extraction based face recognition techniques are not efficient enough to solve the problem. In this paper, a novel mechanism has been proposed to learn the training set consisting of few pose variant images and many frontal images of different persons using deep learning algorithms. At first, autoencoders are trained to build the templates for representing the pose variant training images. The left (−45∘) and right (+45∘) templates cover all pose variations of test images from −90∘ to +90∘. In the next step the convolution neural network (CNN) architectures are used in supervised mode for transforming the templates into person specific frontal images present in the training set. Left and right clusters of trained CNNs are obtained with respect to the left and right templates.

In the testing phase, the head-pose of the test image is estimated using the collaborative representation based classifier (CRC) in order to select the appropriate cluster of CNN architectures for generation of the frontal image. The CNN architecture which provides the best match frontal image with the training set recognizes the specific person. The matching score is measured using the correlation coefficient and the Frobenius norm. For a frontal test image, if the matching score is below the predefined threshold then the proposed method does not recognize the image; however, the training set is updated with the unrecognized frontal test images for future recognition. The accuracy of the proposed method is around 99% when tested on the CMU PIE database, which is much higher in comparison to the existing face-recognition methods.

Index Terms- Face recognition, Pose estimation, Convolution Neural Network, Autoencoders, Template

I. INTRODUCTION

Person identification has immense scope of real time applications in surveillance systems for monitoring the activity of persons. However, the existing feature extraction based face recognition[1], [2], [9] techniques are hardly effective for real time applications. For instance, face images acquired from a crowded place are mostly nonfrontal, while the training images consist of few pose variant images and more frontal images of recognized criminals. The task of a surveillance system is to recognize the persons from their pose variant nonfrontal face images.

Dimension reduction based face recognition methods include principal component analysis (PCA), linear discriminant analysis (LDA), Independent Component Analysis (ICA), Multi Dimensional Scaling (MDS) and Isomap[3]. However, PCA[3], [4] and LDA, though effective for dimension reduction, are not very suitable for face recognition of pose variant images. MDS cannot capture non-linearity in the feature space, whereas Isomap is not efficient for new data points. In Kernel PCA[3], [4] based methods, choosing the kernel is a major problem. In feature based face recognition methods, accuracy depends on the feature descriptors[5]. Selection of the feature descriptor is a major task which depends on the application or the problem, and becomes complex when dealing with large data sets.

Gradient based machine learning algorithms using Neural Networks (NN)[6] are a feature engineering approach and extensively applied for face recognition. But shallow NN architectures fail to manage complex and large data sets due to their limitation in representing the wide variability in features.

Deep neural architectures automatically extract high level abstract features, unlike the manual feature engineering approach. Deep architectures are suitable for analyzing complex structure and extracting the underlying patterns from raw data, required for decision making. The autoencoder is a type of unsupervised deep architecture used in sentiment analysis and object recognition[19].

Autoencoders[15], [16], [17] provide a higher level representation of the input by repeatedly mapping the input into a lower dimensional space and then back to a higher dimension, used for reconstruction of signals by eliminating the redundant and irrelevant features. Convolution Neural Networks (CNN)[11], [12], [13], [14] are used in both supervised and unsupervised mode following deep architectures. CNN looks for local receptive fields in an image, based on the assumption that a cell in an image is influenced more by its neighbouring cells than by other cells. It uses different kernels[12], [13] or functions, observes the response corresponding to different local areas of an image and constructs the feature set. CNNs have widespread applications including object recognition, digit classification, and satellite image analysis[20]. CNN along with softmax regression has been used in face recognition[21], [22]. The CNN along with the fully connected layers extracts high level features, while the softmax layer gives the probability of the test instance belonging to each of the training classes. CNN has been used to directly transform faces into a Euclidean space where distances measure similarity between the faces. However, the existing methods are applied to recognize the faces from non-frontal and frontal training images, where the training set is usually denser than the test set. This paper recognizes faces considering training images which are mostly frontal.

In this paper, we employ the autoencoder architecture to obtain a representation for each pose-specific class using a set of training images, considered as a template. The images with left-oriented head poses and right-oriented head poses are represented by the −45∘ and +45∘ templates, respectively. The idea is to train the autoencoder using images of different persons of a particular head pose, and we obtain a representation of the group of images as the output of the encoder. The standard autoencoder algorithm has been modified to construct the templates. Next, another deep architecture, CNN, has been applied for transforming the template image into a frontal face image. Here CNN is used in supervised mode to learn the network parameters for conversion of the −45∘ and +45∘ head pose templates to frontal images of the training set. The CNNs are grouped into the left cluster architecture (LCA) and the right cluster architecture (RCA), where LCA contains all the person-specific CNN architectures for conversion of the −45∘ template to frontal images and RCA those for the +45∘ template.
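The local-response idea described above — sliding a kernel over the image and recording the response of each local receptive field — can be sketched in plain NumPy. This is an illustrative valid-mode correlation with a made-up kernel, not the paper's architecture:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a kernel over the image and record the response at
    each local receptive field (valid mode, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # response of one local neighbourhood to the kernel
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[1.0, -1.0]])     # horizontal difference kernel
resp = conv2d_valid(img, edge)     # one response map, shape (4, 3)
```

A real CNN layer applies many such kernels and learns their weights; this loop only shows where each local response comes from.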
978-1-5386-2241-4/17/$31.00 ©2017 IEEE
Authorized licensed use limited to: Indian Instt of Engg Science & Tech- SHIBPUR. Downloaded on April 16,2024 at 13:53:32 UTC from IEEE Xplore. Restrictions apply.
Fig. 2: Autoencoder Architecture
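The encode-decode path of Fig. 2 can be sketched as a minimal forward pass: the input is mapped to a lower dimensional code and back to the input dimension. Layer sizes and the random weights are hypothetical; the paper's modified template-building algorithm is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical sizes: a 32x32 gray image flattened to 1024, code of 64
n_in, n_code = 1024, 64
W_enc = rng.normal(0.0, 0.01, (n_code, n_in))
W_dec = rng.normal(0.0, 0.01, (n_in, n_code))

def relu(z):
    return np.maximum(z, 0.0)

def autoencode(x):
    """Encode to the lower-dimensional space, then decode back."""
    code = relu(W_enc @ x)      # compressed representation
    recon = relu(W_dec @ code)  # reconstruction of the input
    return code, recon

x = rng.random(n_in)            # stand-in for a flattened face image
code, recon = autoencode(x)
```

Training minimizes the reconstruction error between `x` and `recon`; the encoder output is what the paper uses as the pose-specific template representation.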
total responses of a group. This characteristic reduces dimensionality and ensures translation invariance. The most commonly used pooling method is max pooling, which selects the maximum response from a group of local responses. The pooling layer has been omitted in the proposed CNN architecture because the image databases do not have translation invariance in the images. An example of a CNN architecture is shown in Figure 3.

head pose of the test image in order to select the appropriate cluster (LCA or RCA) of CNN architectures. The test image is reconstructed using that particular cluster to find out the best match training image.

III. PROPOSED METHOD

Face recognition is challenging under the constraint that the
training set contains more frontal images compared to pose variant
images, especially when the test images have wide pose variations.
The aim of the paper is to solve the problem in a real time and
dynamic environment, like surveillance in a crowded place. In the
training phase, first we construct templates of −45∘ and +45∘ pose
variant images using autoencoders and the few pose variant images
available in the training set. Then we propose a learning model to
transform the nonfrontal templates to frontal person-specific images
of the training set using CNN architectures.
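The max-pooling step discussed earlier (and deliberately omitted from the proposed CNN architecture) can be sketched in NumPy; `max_pool_2x2` is an illustrative helper, not part of the paper:

```python
import numpy as np

def max_pool_2x2(x):
    """Keep only the maximum response of each non-overlapping
    2x2 group, halving each spatial dimension."""
    h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

resp = np.array([[1., 2., 0., 1.],
                 [3., 4., 1., 0.],
                 [0., 1., 5., 2.],
                 [1., 0., 2., 6.]])
pooled = max_pool_2x2(resp)   # [[4., 1.], [1., 6.]]
```

Each output cell keeps only the strongest local response, which is exactly the translation tolerance the authors do not want when pixel-accurate reconstruction matters.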
where w_ij denotes a particular weight or parameter to be learnt (the j-th node of the i-th layer), η is the learning rate and E is the squared error.
However, it has been observed that in the initial phase of training, the weights of the autoencoder become negative due to the difference in the gray level images and the ReLU. Therefore, a correction is needed, and we consider two heuristics to modify the algorithm.
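The update referenced above is the standard gradient-descent rule, w_ij ← w_ij − η ∂E/∂w_ij. A one-layer sketch with toy shapes (this is the generic rule only, not the paper's corrected algorithm or its two heuristics):

```python
import numpy as np

# toy linear layer y = W x with squared error E = 0.5 * ||W x - t||^2
rng = np.random.default_rng(1)
W = rng.random((2, 3)) * 0.1       # small positive initial weights
x = rng.random(3)                  # toy input
t = np.zeros(2)                    # toy target output

def squared_error(W):
    d = W @ x - t
    return 0.5 * float(d @ d)

eta = 0.1                          # learning rate
grad = np.outer(W @ x - t, x)      # dE/dW_ij = (y_i - t_i) * x_j
W_new = W - eta * grad             # w_ij <- w_ij - eta * dE/dw_ij
```

Nothing in this rule keeps the entries of `W_new` non-negative, which is the behaviour the heuristics above are meant to correct.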
Fig. 5: Testing Method Flowchart

Fig. 7: Pose Variation from −90∘ to +90∘ in Database 1 with a gap of 22.5∘ between successive images of a particular person

for the person, and the training data set is updated. The proposed method therefore dynamically updates the training set to strengthen the training phase. Even if recognition fails due to low confidence value, we can still use the test image for enriching the dataset. The updating condition is given in Table I.

Fig. 8: Template of +45∘ built using 8 images of +45∘
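The updating condition of Table I can be read as a small decision rule: only a new frontal test image that is recognized correctly but with low confidence is added to the training set. A sketch (`should_update` is our name for it, not the paper's):

```python
def should_update(frontal: bool, correct: bool, high_confidence: bool) -> bool:
    """Decision rule behind Table I: update the training set only for a
    frontal image, correctly recognized, with low confidence."""
    return frontal and correct and not high_confidence

# the three rows of Table I
row1 = should_update(frontal=True,  correct=False, high_confidence=True)   # No
row2 = should_update(frontal=False, correct=True,  high_confidence=False)  # No
row3 = should_update(frontal=True,  correct=True,  high_confidence=False)  # Yes
```

Combinations not listed in Table I (for example an incorrect, low-confidence frontal image) are treated as "No" by this sketch, which is an assumption.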
IV. RESULTS AND DISCUSSIONS

For experimentation the database is built using CMU PIE[8], which consists of 43168 images of 68 persons with 13 different poses and 43 different illuminations. The pose variant images of an individual available in the dataset are −90∘, −67.5∘, −45∘, −22.5∘, 0∘, +22.5∘, +45∘, +67.5∘ and +90∘. Frontal images of 40 persons are sampled randomly, and 8 persons with +45∘ and −45∘ images are sampled randomly from the database to build the training set. Pose variant images of different persons in the database are shown in Figure 7. The test set consists of the +45∘, −45∘, +67.5∘, −67.5∘, +90∘ and −90∘ images of all the 40 persons. After estimating the head pose of the test image, it is passed through the corresponding architectures (LCA or RCA). The reconstructed image is compared with the corresponding frontal images and the best match is recorded as the recognized person. Table II shows the accuracy of the proposed method for various groups of pose-varying test images.

TABLE II: Accuracy of the proposed method

Head-pose of test images | Number of images | Architecture group used | Accuracy (%)
−90∘   | 40 | −45∘ to 0∘ | 99
−67.5∘ | 40 | −45∘ to 0∘ | 99
−45∘   | 40 | −45∘ to 0∘ | 100
+45∘   | 40 | +45∘ to 0∘ | 100
+67.5∘ | 40 | +45∘ to 0∘ | 99
+90∘   | 40 | +45∘ to 0∘ | 99
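The matching between a reconstructed image and the frontal training images uses the correlation coefficient and the Frobenius norm, per the abstract. Both scores can be sketched in NumPy (illustrative arrays; any threshold on these scores is a hypothetical choice):

```python
import numpy as np

def correlation_score(a, b):
    """Pearson correlation coefficient between two gray images;
    closer to 1 means a better match."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

def frobenius_distance(a, b):
    """Frobenius norm of the pixel-wise difference;
    smaller means a better match."""
    return np.linalg.norm(a - b, ord='fro')

a = np.arange(16, dtype=float).reshape(4, 4)
b = 2.0 * a + 1.0                  # linearly related image
corr = correlation_score(a, b)     # 1.0 for a perfect linear relation
dist = frobenius_distance(a, a)    # 0.0 for identical images
```

In the pipeline above, the training image maximizing the match score against the CNN reconstruction identifies the person.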
TABLE I: Updation Table

Test Image | Recognition (confidence) | Updation
New (Frontal or Non-frontal) | Incorrect (High) | No
New (Non-frontal) | Correct (Low) | No
New (Frontal) | Correct (Low) | Yes

Fig. 9: Template of +90∘ built using 8 images of +90∘
Fig. 10: Comparison of ROC using Database (True Positive Rate vs. False Positive Rate for Gabor PCA, Gabor LDA, ICA, LPP, DCT and the Proposed Method)

A. Comparisons

Comparison[10] of the accuracy of the proposed method with other existing face recognition methods applied on the CMU-PIE database[8] is listed in Table III, while the ROC curves are shown in Figure 10.

TABLE III: Comparison of the face recognition methods using CMU-PIE database

Methods | Accuracy (%) | TPR
Gabor PCA | 58 | 0.60
Gabor LDA | 65.5 | 0.66
ICA | 59 | 0.60
Gabor Supervised LPP | 74.3 | 0.74
Local DCT + Feature Fusion | 70.9 | 0.72
NN | 78.8 | 0.80
LRC | 81.9 | 0.83
S-SRC | 90 | 0.93
Proposed Method | 99.3 | 0.99

V. CONCLUSIONS

The proposed algorithm performs much better than the existing algorithms, as depicted in the tables and figures. The CNN features of local patterns, the exploitation of connectivity, and the autoencoder for template reconstruction are intelligently used in the paper for solving the problem in question. The only drawback is that many architectures need to be built for this method, though they can be constructed in offline mode. During testing, only prediction with the existing architectures is required, which does not take much time. The novelty of the proposed method is the updation of the database from the test images, and the system performs well even for a small number of pose variant images in the training set.

Future scope includes bringing down the number of architectures required for person recognition without sacrificing accuracy, so as to have a single architecture that will convert any non-frontal image to its frontal counterpart, thus saving space and time for recognition.

REFERENCES

[1] D.G. Balakrishnan and S. Chitra, A survey of face recognition on feature extraction process of dimensionality reduction techniques, Theoretical and Applied Information Technology, vol. 36, no. 1, 2012.
[2] Xiaozheng Zhang and Yongsheng Gao, Face recognition across pose: A review, Pattern Recognition, vol. 42, no. 11.
[3] Sukhvinder Singh, Meenakshi Sharma, and N. Suresh Rao, Accurate face recognition using PCA and LDA, International Conference on Emerging Trends in Computer and Image Processing (ICETCIP 2011).
[4] Ming-Hsuan Yang, Kernel eigenfaces vs. kernel fisherfaces: Face recognition using kernel methods, in FGR, IEEE, 2002, p. 0215.
[5] Cong Geng and Xudong Jiang, Face recognition using SIFT features, in Image Processing (ICIP), 2009 16th IEEE International Conference on, IEEE, 2009, pp. 3313-3316.
[6] M. Nandini, P. Bhargavi, and G. Raja Sekhar, Face recognition using neural networks, International Journal of Scientific and Research Publications, vol. 3, no. 3, p. 1, 2013.
[7] Lei Zhang, Meng Yang, Xiangchu Feng, Yi Ma, and David Zhang, Collaborative representation based classification for face recognition, CoRR, vol. abs/1204.2358, 2012.
[8] T. Sim, S. Baker, and M. Bsat, The CMU pose, illumination, and expression (PIE) database, in Proceedings of the 5th International Conference on Automatic Face and Gesture Recognition, 2002.
[9] Xiujuan Chai, Shiguang Shan, Xilin Chen, and Wen Gao, Locally linear regression for pose-invariant face recognition, Image Processing, IEEE Transactions on, vol. 16, no. 7, pp. 1716-1725, 2007.
[10] Srija Chowdhury and Jaya Sil, Head pose estimation for recognizing face images using collaborative representation based classification, Advances in Computing, Communications and Informatics (ICACCI), 2016 International Conference on, IEEE, 2016.
[11] Yann LeCun and Yoshua Bengio, Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks, 3361.10, 1995.
[12] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, Deep learning, Nature, 521.7553 (2015): 436-444.
[13] Steve Lawrence et al., Face recognition: A convolutional neural network approach, IEEE Transactions on Neural Networks, 8.1, 1997.
[14] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, 2012.
[15] Pascal Vincent et al., Extracting and composing robust features with denoising autoencoders, Proceedings of the 25th International Conference on Machine Learning, ACM, 2008.
[16] Pascal Vincent et al., Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, Journal of Machine Learning Research, 11, Dec. 2010.
[17] Yoshua Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning, 2.1 (2009): 1-127.
[18] Vinod Nair and Geoffrey E. Hinton, Rectified linear units improve restricted Boltzmann machines, Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.
[19] Richard Socher et al., Semi-supervised recursive autoencoders for predicting sentiment distributions, Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2011.
[20] Saptarshi Pal, Srija Chowdhury, and Soumya K. Ghosh, DCAP: A deep convolution architecture for prediction of urban growth, Geoscience and Remote Sensing Symposium (IGARSS), 2016 IEEE International, IEEE, 2016.
[21] Yi Sun et al., Deep learning face representation by joint identification-verification, Advances in Neural Information Processing Systems, 2014.
[22] Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman, Deep face recognition, BMVC, vol. 1, no. 3, 2015.