Abstract – Face recognition has a wide range of real-time applications such as criminal identification, access and security, healthcare, finding missing persons, and helping the blind. These face recognition applications suffer from inaccuracy due to similar attributes existing between male and female faces. To overcome this problem, the facial attributes of a person are predicted from the original image datasets. Based on these attributes, faces can be classified as male or female, along with their expressions. This paper proposes a global transformation method and a global transfer learning technique to predict the attributes of facial images. The global transformation is done by bilinear transformation of the image pixels using the adjacent pixels, with no constraints such as equal scaling or rotation. This gives full flexibility to discover a transformation that is beneficial for predicting the attributes of any specific input image. The global learning net is used to establish dependencies among multiple face attributes. Learning a shared face representation for multiple attribute prediction is thus far better than learning a separate face representation for each individual attribute. This deep learning technique provides higher accuracy than traditional handcrafted feature techniques. The experimental results show that this method can effectively predict the gender, expression and youthfulness of a facial image with an accuracy of 91.05% and a validation loss of 0.2079 on the CelebA dataset, with a lower learning rate than the alignment-based method, whose accuracy is 88%.

I. INTRODUCTION
A face recognition system is a technology capable of identifying or verifying a person from a digital image or a video frame from a video source. There are multiple methods by which facial recognition systems work, but in general they work by comparing selected facial attributes from a given image with faces within a database. Thus face attribute prediction is an important task in face analysis. It is also described as a biometric artificial-intelligence-based application that can uniquely identify a person by analyzing patterns based on the person's facial attributes and texture.

While initially a form of computer application, it has seen wider use in recent times on mobile platforms and in other forms of technology, such as robotics. It is typically used for access control in security systems and can be compared to other biometrics such as fingerprint or iris recognition systems. Although the accuracy of facial recognition as a biometric technology is lower than that of iris recognition and fingerprint recognition, it is widely adopted due to its contactless and non-invasive process. Recently, it has also become popular as a commercial identification and marketing tool. Other applications include advanced human-computer interaction, video surveillance, automatic indexing of images, and video databases, among others.

Face attribute prediction is usually tackled via a Detection-Alignment-Recognition (DAR) pipeline. Within DAR, an off-the-shelf face detector is used to detect faces in images in the detection stage. Then, in the alignment stage, a face landmark detector is applied to the faces, followed by the operation of establishing correspondence between detected landmarks and canonical locations, where domain experts' input is required. Finally, faces are aligned by transformations estimated from the correspondence. In the recognition stage, features are extracted from the aligned faces and fed into a classifier to predict the face attributes.

However, the alignment stage in the DAR pipeline suffers from several issues. It depends heavily on the quality of the landmark detection results. Despite good performance on near-frontal faces, current face landmark detectors cannot give satisfactory results on unconstrained faces with large pose angles, occlusion or blurriness. Errors in landmark localization would harm the performance of attribute prediction. Besides, even with accurate facial landmarks, one still needs to handcraft specific face alignment protocols (canonical locations, transformation methods, etc.), which demands dense domain expert knowledge. Some warping artifacts from mapping landmark locations to canonical positions are also inevitable when aligning the faces. Thus facial attribute prediction error accumulates due to a combination of erroneous landmark detection and handcrafted protocols.

In this work, we propose a landmark-free global facial attribute detection method which directly learns a global transformation and part localizations on each input face end-to-end, removing the reliance on landmarks and hard-wired face alignment used in DAR. The method learns the transformation and localization globally over all images of the datasets.

II. RELATED WORKS
Attributes, such as person attributes, object attributes and face attributes, are mid-level representations which convey compact semantic information. Traditional methods usually use hand-crafted features such as SIFT and HOG to
3.1). GLOBAL COMPONENT IN AFFAIR:
GLOBAL TRANSNET:
specific classification. Denote the overall mapping from an input face image to the i-th attribute prediction as $f_{\theta^{C}_{g_i},\theta^{F}_{g},\theta^{T}_{g}}(I)$.

transformed by the part localization parameter $T_{p_i}$. The locally transformed face image is then processed by the i-th part representation learning net parametrized by $\theta^{F}_{p_i}$ and the i-th part classifier with parameter $\theta^{C}_{p_i}$.

Note that some attributes correspond to the same local regions, e.g. attribute “Mouth Open” and attribute “Wearing Lipstick” both correspond to the mouth region. To save computation, different attributes that correspond to the same local face region may share the same part Loc-Net parameter $\theta^{T}_{p_i}$ and part feature extraction net parameter $\theta^{F}_{p_i}$.

3.3) DATA AUGMENTATION:
The proposed AFFAIR method is evaluated on the large-scale Celebrity Faces Attributes (CelebA) [11] dataset. The CelebA dataset contains over 200k celebrity images, each fully annotated with 40 attributes like “Pointy Nose”, “Wavy Hair”, “Oval Face”. The CelebA dataset has two versions: one of unaligned face images in the wild and the other of faces aligned by ground truth facial landmarks. We use the unaligned version in this experiment. The face images cover large pose variations and cluttered backgrounds, and are thus quite challenging. For evaluation on each dataset, the official training/testing split protocol is used. An ablation study of AFFAIR is also provided on all the datasets.

mirroring to augment the training data. No alignment or other pre-processing is performed.

IV RESULTS AND DISCUSSION
The facial attributes such as gender, expression and youthfulness of the facial image are simulated and detected using Google Colab with a TensorFlow backend for the CelebA dataset, which is publicly available.

I. GLOBAL TRANSNET:
Global Transnet is bilinear interpolation of the image. Fig 4.1 shows the bi-linearly interpolated image.
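The bilinear interpolation that the global TransNet relies on can be illustrated with a minimal NumPy sketch. It is not the TensorFlow implementation used in the experiments. The function names `bilinear_sample` and `affine_warp`, the grayscale input, and the border clamping are assumptions made for illustration only. The sketch warps an image with an unconstrained 2x3 affine matrix and computes each output pixel from its four adjacent input pixels.

```python
import numpy as np


def bilinear_sample(image, xs, ys):
    """Sample a 2-D image at real-valued (xs, ys) from its four adjacent pixels."""
    h, w = image.shape
    # Clamp coordinates so every sample has valid neighbours (assumed border rule).
    xs = np.clip(xs, 0.0, w - 1.0)
    ys = np.clip(ys, 0.0, h - 1.0)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Fractional offsets act as the interpolation weights.
    wx = xs - x0
    wy = ys - y0
    top = (1.0 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1.0 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1.0 - wy) * top + wy * bottom


def affine_warp(image, theta, out_shape):
    """Warp `image` with a 2x3 matrix `theta` (hypothetical helper).

    `theta` maps output pixel coordinates (x, y, 1) to input coordinates.
    No constraint such as equal scaling or pure rotation is imposed on it.
    """
    out_h, out_w = out_shape
    ys, xs = np.meshgrid(np.arange(out_h), np.arange(out_w), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])  # 3 x N
    src_x, src_y = np.asarray(theta, dtype=float) @ coords
    return bilinear_sample(image, src_x, src_y).reshape(out_h, out_w)


# Usage: the identity matrix leaves the image unchanged.
image = np.arange(12.0).reshape(3, 4)
warped = affine_warp(image, [[1, 0, 0], [0, 1, 0]], (3, 4))
```

In the proposed method the transformation parameters would be predicted by the network for each input face. Here the matrix is supplied by hand, so the sketch covers only the resampling step.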
accuracy obtained in this method is approximately 91.5%
and the validation accuracy obtained is approximately
92.5%.
systems that depend on the attributes of the facial image for classifying and recognizing human faces. It can be used in the healthcare field to monitor the expressions of patients. It can be used in applications that need to distinguish between male and female faces. It also finds a variety of applications in safety and security systems, criminal identification, human-robot interaction, mobile applications, etc.

REFERENCES

[1] J. Li, F. Zhao, J. Feng, S. Roy, and S. Yan, 2018, “Landmark free face attribute prediction”, IEEE Transactions on Image Processing, vol. 27, pp. 4651–4662.
[2] M. M. Kalayeh, B. Gong, and M. Shah, 2017, “Improving facial attribute prediction using semantic segmentation”, IEEE Conference on Computer Vision and Pattern Recognition.
[3] M. Ehrlich, T. J. Shields, T. Almaev, and M. R. Amer, 2016, “Facial attributes classification using multi-task representation learning”, IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 47–55.
[4] C. Huang, Y. Li, C. Change Loy, and X. Tang, 2016, “Learning deep representation for imbalanced classification”, IEEE Conference on Computer Vision and Pattern Recognition, pp. 5375–5384.
[5] H. Lai, S. Xiao, Y. Pan, Z. Cui, J. Feng, C. Xu, J. Yin, and S. Yan, 2016, “Deep recurrent regression for facial landmark detection”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, pp. 1144–1157.
[6] Y. Zhong, J. Sullivan, and H. Li, 2016, “Face attribute prediction using off-the-shelf CNN features”, International Conference on Biometrics, pp. 1–7.
[7] Y. Zhong, J. Sullivan, and H. Li, 2016, “Leveraging mid-level deep representations for predicting face attributes in the wild”, IEEE International Conference on Image Processing, pp. 3239–3243.
[8] E. M. Rudd, M. Günther, and T. E. Boult, 2016, “Moon: A mixed objective optimization network for the recognition of facial attributes”, European Conference on Computer Vision, pp. 19–35, Springer.
[9] R. Torfason, E. Agustsson, R. Rothe, and R. Timofte, 2016, “From face images and attributes to attributes”, Asian Conference on Computer Vision, pp. 313–329, Springer.
[10] H. Dibeklioglu, F. Alnajar, A. A. Salah, and T. Gevers, 2015, “Combining facial dynamics with appearance for age estimation”, IEEE Transactions on Image Processing, vol. 24, pp. 1928–1943.