Image Size, Color Depth, Age Variant on Convolution Neural Network

Hady Pranoto (1,2)
1 Computer Science Department, School of Computer Science,
2 Computer Science Department, BINUS Graduate Program - Doctor of Computer Science,
Bina Nusantara University, Jakarta, Indonesia 11480
hadypranoto@binus.ac.id

Widodo Budiharto
Computer Science Department, School of Computer Science,
Bina Nusantara University, Jakarta, Indonesia 11480
wbudiharto@binus.edu

Harco Leslie Hendric Spits Warnars
Computer Science Department, BINUS Graduate Program - Doctor of Computer Science,
Bina Nusantara University, Jakarta, Indonesia 11480
spits.hendric@binus.ac.id

Tokuro Matsuo
Graduate School of Industrial Technology,
Advanced Institute of Industrial Technology, Tokyo, Japan
matsuo@tokuro.net

Yaya Heryadi
Computer Science Department, BINUS Graduate Program - Doctor of Computer Science,
Bina Nusantara University, Jakarta, Indonesia 11480
yayaheryadi@binus.edu

Abstract—Facial recognition, as a biometric authentication used in security, military, finance, and daily applications, has become popular because of its natural and non-intrusive nature. Many methods exist for face recognition, such as holistic learning, local features, shallow learning, and deep learning; some of these methods are susceptible to variations in pose, illumination, expression, and age. The state of the art in face recognition today is deep learning, which delivers high accuracy. In this paper the authors replicate face recognition using a deep-learning architecture, the OpenFace Convolutional Neural Network. The authors vary image size, color depth, and age, and examine how these factors impact the accuracy of face recognition in that architecture. The results show that the accuracy of a model depends on image size, color depth, and age variation, but OpenFace CNN still provides fairly good accuracy when image size and color depth are reduced, as long as the facial landmarks can still be detected in the image, so that the alignment process can be performed on the face image.

Keywords—Image Size, Color Depth, Age Variant, Convolutional Neural Network, Face Recognition

I. INTRODUCTION

A. Face Recognition

Face recognition is now widely used in biometric authentication, often for security, financial, and military purposes, or in daily applications. It has become popular because it is non-intrusive and natural. Face recognition research began in the early 1990s: Eigenfaces, a holistic learning technique of that period, made face recognition a famous research topic. The Eigenface technique achieved 60% accuracy in the experiments conducted by Turk et al. [1].

Feature-based enhancement of facial recognition over the past few years falls into four essential techniques: the holistic approach, local handcrafted features, shallow learning, and deep learning. The holistic approach derives a low-dimensional representation from a distributional assumption, such as a linear subspace [2][3][4], a manifold [5][6][7], or a sparse representation [8][9][10]. This approach dominated the face-recognition community in the 1990s and 2000s, but, as is widely known from both theoretical and practical problems, the holistic method fails when faces depart from the assumptions made beforehand. In particular, it has a disadvantage under changes in age, pose, illumination, and expression (A-PIE): when A-PIE changes, this technique loses accuracy in detecting a person's face. To address these challenges, new techniques emerged in the early 2000s that used local features as a base, such as Gabor [11] and LBP [12] together with their multilevel and multi-dimensional extensions [13][14][15], achieving robust performance against variant properties through local filtering. However, such local or handcrafted features are difficult to design and not compact. In the early 2010s, learned local descriptors were introduced into the face-recognition community [16][17][18], in which local filters offer better cohesiveness; but the representation of this technique still has limited robustness when faced with complex nonlinear variations in facial appearance.

In 2014, DeepFace [19] and DeepID [20] achieved the best verification accuracy on Labeled Faces in the Wild (LFW), surpassing human performance in the unconstrained scenario for the first time. Since then, face-recognition research has shifted to deep-learning-based approaches, which progressively acquire invariant features by stacking nonlinear filters. Deep-learning architectures, including convolutional neural networks (CNNs) [21][22][23][24], deep belief networks (DBNs) [25], and stacked autoencoders (SAEs) [26], simulate the perceptrons of the human brain; a deep network can represent high-level abstraction through its many layers of nonlinear transformation.
B. Background and Terminology in Face Recognition

The basis of face recognition is classification: a face image of any person is classified as belonging to one person, based on a calculation of the similarity between the face image given as input and the face images in a database. A person is classified as the same person if the similarity between input and database is high, and as a different person if the similarity is low. To find a good feature or discriminator for face recognition, one must find a technique that decreases the intra-variance among face images of the same person and enlarges the inter-variance between different people. The function for calculating the similarity between two face images can be described as in equation (1) [27], where I_i and I_j are the two images whose resemblance we want to measure, P_i and P_j are preprocessing functions, f is a function that performs feature extraction, and S is the similarity level of the two images.

S(f(P_i(I_i)), f(P_j(I_j)))    (1)
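As a sketch, equation (1) maps onto code as follows; preprocess and extract_features are hypothetical placeholders for P and f (in the experiments below, detection/alignment and a CNN embedding), and L2 distance is used as one possible choice of S.

```python
import numpy as np

def l2_distance(u, v):
    # One possible similarity S: a smaller L2 distance means more similar faces.
    return np.linalg.norm(u - v)

def same_person(img_i, img_j, preprocess, extract_features, threshold):
    # P_i, P_j: preprocessing (detection, cropping, alignment);
    # f: feature extraction, e.g. a CNN that outputs an embedding vector.
    e_i = extract_features(preprocess(img_i))
    e_j = extract_features(preprocess(img_j))
    return l2_distance(e_i, e_j) < threshold  # True -> classified as same person
```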
one view of facial CNN[50][51][52][53][54]
images from one GAN[55][56][57][58]
As mentioned before when face Ages, poses, illumination face or many faces
and expression (A-PIE) change will decrease the accuracy of that have a non-
the face recognition. A posture changed in the same person frontal view
will make a big difference in the classification process, the
same person will be classified as difference person, because C. Face Recognition in Deep Learning
of differences of appearance the images (if using a 2D camera In deep learning technique, a computer is asked to learn by
or CCTV). If a face gets a different level of illumination may themselves to determine the essential features that will be
be classified as a different person because illumination level used, this job need high computational resources, especially
gave significantly different change on an appearance of the if it involves a substantial number of datasets in the learning
images. Fifth factors are expression and age differences, that process, but now this no problem again because the current
factor will reduce the accuracy in face recognition. computer technology and supply can handle this task.
To prevent the reduction of accuracy in classification, face
recognition must be following some steps. As mentioned by
Ranjan et al. [28] there are three things that a facial D. Feature extraction uses deep learning.
recognition system needs, the first is a face detector that Network architecture deep learning in facial recognition
detects and localizes faces in a video or image. The second is can be categorized in a single network or multiple networks.
a facial landmark detector to determine the landmark points Get inspired by the success of the challenges given by
in the face, faces that have been successfully detected facial ImageNet [59], common CNN architectures like AlexNet,
landmarks will be aligned into official coordinate positions. VGGNet, GoogleNet and RestNet [21][22][60][24],
Then after the video or face image is in canonical coordinates, introduced and became the model base in the face recognition
then the next is the process of face recognition. The face directly or modified. Until now a new face recognition
detection steps its essential to reduce the intra-variant, more architecture is still being developed to improve efficiency. By
substantial area to process in face recognition, more gave adopting a single network, face recognition network is trained
significant differences, so we need to cut the field of attention in multiple networks with multiple inputs [55] or various
by the only process the face part, we need to detect the face tasks. [61], result from research shows this way provides an
on an image, and crop into the sufficient space for face- increase in performance when results are accumulated from
recognition. This process also reduces the computation cost various networks.
because the part of an image processed decreased. Face Various architectural of deep learning built such as
alignment even importance step because the face was aligned ImageNet, AlexNet, VGGNet, GoogleNet, and ResNet are
into legal coordinate position have more prominent similarity using the CNN method, used as a basis for face recognition,
in face space, than not aligned. This technique also knows as and some other architectures built to improve efficiency such
terminology Many-to-one normalization: fix the canonical as multiple networks. But basically, the function used on the
view of the face image of one or many photos that have a non- construction is the same loss of function.
frontal image angle, an image with an authoritative frontal
view; so facial recognition can work in controlled conditions.

978-1-5386-9422-0/18/$31.00 ©2018 IEEE


The 1st 2018 INAPR International Conference, 7 Sept 2018, Jakarta, Indonesia 40
E. Loss Function

The softmax loss function is the most widely used supervision signal in object recognition, but softmax alone is not enough to accommodate intra-class variation in face recognition. Researchers have therefore created other loss functions, such as:

• Euclidean-distance-based loss: minimizes intra-class variation and enlarges the variation between different people using the Euclidean distance.
• Angular/cosine-margin-based loss: learns distinguishing features in terms of angular similarity, seeking large angular/cosine separation between the learned classes.
• Softmax loss and its variations: keep the softmax loss function but modify it to improve performance, for example by L2-normalizing the features or by noise injection.
F. Comparison of Similarities

After the deep network has been trained on large data with an appropriate loss function, it yields discriminative features. Many methods calculate the similarity between two elements using cosine distance or L2 distance; the nearest-neighbor (NN) technique and threshold comparison can then be used for recognition and verification. Another technique that can be used is the Support Vector Machine (SVM) classifier. From the shallow-method literature, post-processing can also be applied to compare similarities, for example metric learning or the sparse-representation classifier (SRC).
II. EXPERIMENTS

To produce a system that accommodates variations in pose, lighting, expression, and age, the author pays attention to the quality of the database used. Several factors need to be considered: image resolution, image noise, color depth, and the completeness of the database across age variation.

In this research, the author makes experimental modifications based on experiments conducted by Krasser. Krasser's tests use a deep convolutional neural network (CNN) to extract features; his project is inspired by the OpenFace project, with some modifications to the architecture. The project uses the Dlib and OpenCV libraries: Krasser detects facial landmarks with Dlib and processes the images with OpenCV. In his research he uses a small portion of the LFW dataset for experiments and verification.

Krasser performs face detection and crops the input image, an important step because it increases the speed of the CNN architecture. The architecture then extracts a 128-dimensional representation, called the embedding, from the already-aligned image. Euclidean distance in the embedding space measures face similarity by comparing the embedding vector of an input image with the embedding vectors in a database. After comparing resemblances, classification is performed with a Support Vector Machine (SVM) and a k-NN classifier. With a labeled dataset, the architecture is trained on embedding vectors using the triplet loss function; after training, the model recognizes an input image, that is, predicts the label of a new input.

In detail, the CNN architecture used by Krasser in this project is a variation of the Inception architecture [23], the NN4 variant from the OpenFace project. Implemented with the Keras library, the network ends in a fully connected layer with 128 hidden units followed by an L2 normalization layer on top; these two top layers, called the embedding layer, produce the 128-dimensional embedding vector [62]. The network is trained with the triplet loss in equation (2), where [z]_+ denotes max(z, 0) and m is the number of triplets from the training set.

In the training process it is important to select the positive pairs (x_i^a, x_i^p) and the negative pairs (x_i^a, x_i^n) and compute their L2 distances: the positive distance should be smaller than the negative distance by at least the margin α, and triplets that violate this limit are used to retrain in the next iteration.

L = Σ_{i=1}^{m} [ ||f(x_i^a) − f(x_i^p)||_2^2 − ||f(x_i^a) − f(x_i^n)||_2^2 + α ]_+    (2)
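Equation (2) can be transcribed directly into NumPy as in the sketch below; the embeddings are assumed to be precomputed 128-dimensional vectors, and the margin value 0.2 is only illustrative (the paper does not state the α used).

```python
import numpy as np

def triplet_loss(anchors, positives, negatives, alpha=0.2):
    """Equation (2): hinged gap between positive and negative squared L2 distances.

    anchors, positives, negatives: arrays of shape (m, 128) holding the
    embeddings f(x_i^a), f(x_i^p), f(x_i^n) for m triplets.
    alpha: margin; 0.2 is an illustrative value, not taken from the paper.
    """
    pos_d2 = np.sum((anchors - positives) ** 2, axis=1)   # ||f(x^a) - f(x^p)||^2
    neg_d2 = np.sum((anchors - negatives) ** 2, axis=1)   # ||f(x^a) - f(x^n)||^2
    return np.sum(np.maximum(pos_d2 - neg_d2 + alpha, 0.0))  # [z]_+ = max(z, 0)
```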
The model was trained on the FaceScrub and CASIA-WebFace datasets; the OpenFace project converts the weights of the pretrained model to CSV as well as binary format. Krasser uses a subset of the LFW dataset consisting of 100 images of 10 subjects; each image passes through the alignment process based on the outer eyes and nose, and after alignment the L2 distances are calculated.

Fig 1. L2 distance between a positive pair (above) and a negative pair (below)
As expected, two images of the same identity have a small L2 distance, and two images of different identities have a sizeable L2 distance. It can still happen that two images of different people have a small L2 distance if the two people are indeed very similar, such as twins. To decide whether a distinct identity is the same person or not, we need to find the optimal value of the threshold τ: within the threshold value, two embedding vectors are considered the same person, and beyond it, different people. Because this involves skewed classes, the F1 score is used in place of accuracy.

Replicating the experimental verification conducted by Krasser, at the threshold value τ the accuracy is 95.5%, while the accuracy is 96% with the k-NN classifier and 96% with the SVM classifier.

Fig 2. Accuracy at threshold value τ = 0.56 (95.5%)
To test the hypothesis that image resolution, image noise, color depth, and the age-variant completeness of the dataset influence the accuracy of face recognition, the author performs many experiments varying these factors. The author collected 263 images of 11 identities, with an average IPD (inter-pupillary distance) of 26.69 pixels and an image size of 250 x 250 pixels.

From the same source images, the author creates variants with reduced size and color depth, maintaining the aspect ratio of the images. The schemes conducted by the author are described below (Table II):

TABLE II. EXPERIMENTAL SCHEME

Scheme | Description
1 | Original size, 250 x 250 pixels, original 24-bit color depth
2 | Size reduced to 100 x 100 pixels, original 24-bit color depth
3 | Size reduced to 75 x 75 pixels, original 24-bit color depth
4 | Size reduced to 50 x 50 pixels, original 24-bit color depth
5 | Original size, 250 x 250 pixels, color depth reduced to 8-bit color
6 | Original size, 250 x 250 pixels, color depth reduced to 7-bit color
7 | Original size, 250 x 250 pixels, color depth reduced to 6-bit color
8 | Original size, 250 x 250 pixels, color depth reduced to 5-bit color
9 | Original size, 250 x 250 pixels, color depth reduced to 4-bit color
10 | Original size, 250 x 250 pixels, color depth reduced to 3-bit color
11 | Original size, 250 x 250 pixels, color depth reduced to 2-bit color
12 | Original size, 250 x 250 pixels, color depth reduced to 1-bit color
13 | Original size, 250 x 250 pixels, color mode changed to grayscale
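The Table II variants could be generated with a short script such as the sketch below; Pillow and the file name are assumptions here, since the paper does not name the tool used to resize and quantize the images.

```python
from PIL import Image

def make_variant(path, size=None, color_bits=None, grayscale=False):
    """Produce one Table II test image: optionally resize (the images here
    are square, so the aspect ratio is preserved), quantize to
    2**color_bits colors, or convert to grayscale."""
    img = Image.open(path).convert("RGB")
    if size is not None:                       # schemes 2-4: reduce image size
        img = img.resize((size, size), Image.LANCZOS)
    if color_bits is not None:                 # schemes 5-12: reduce color depth
        img = img.convert("P", palette=Image.ADAPTIVE,
                          colors=2 ** color_bits).convert("RGB")
    if grayscale:                              # scheme 13: grayscale mode
        img = img.convert("L").convert("RGB")
    return img

variant = make_variant("subject01.png", color_bits=3)  # e.g. scheme 10
```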
For each scheme, every image in the database is loaded as a 250 x 250 matrix. The author then performs face detection and places a square bounding box on the face; the images are cropped and converted to 96 x 96 pixels. If the source image is smaller than 96 x 96 pixels, it is likewise cropped to the square bounding box and enlarged to 96 x 96 pixels. After cropping and resizing, each image is aligned to a frontal face based on the outer-eye and nose landmarks, producing aligned images for the next stage.

The aligned images are fed into the OpenFace CNN as in Krasser's project: the same architecture with a fully connected layer of 128 hidden units followed by an L2 normalization layer on top, whose two top layers form the embedding layer producing the 128-dimensional embedding vector [62], trained with the triplet loss of equation (2).
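A rough sketch of this detection-and-alignment step with Dlib and OpenCV is shown below. The landmark-model file name and the target coordinates of the outer eyes and nose tip in the 96 x 96 crop are illustrative assumptions; the OpenFace project's AlignDlib helper performs the same outer-eyes-and-nose alignment in a single call.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Pretrained 68-point landmark model; the file path is an assumption.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_and_align(img_bgr, out_size=96):
    """Detect the largest face and warp it so the outer eye corners and the
    nose tip land on fixed positions in a 96 x 96 crop; returns None when
    no face (and hence no landmarks) can be found."""
    rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    faces = detector(rgb, 1)
    if not faces:
        return None                               # landmark placement failed
    face = max(faces, key=lambda r: r.width() * r.height())
    pts = predictor(rgb, face)
    # 68-point indices: 36 = left outer eye, 45 = right outer eye, 33 = nose tip.
    src = np.float32([[pts.part(i).x, pts.part(i).y] for i in (36, 45, 33)])
    dst = np.float32([[18, 30], [78, 30], [48, 66]])  # illustrative targets
    return cv2.warpAffine(rgb, cv2.getAffineTransform(src, dst),
                          (out_size, out_size))
```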
The next step is to find the optimal distance threshold: the threshold value that gives the best result when classifying an image pair as the same person or not. Because we are dealing with skewed classes, the author prefers the F1 score as the evaluation metric instead of accuracy.
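A minimal sketch of this threshold search, assuming the pairwise L2 distances and the same-identity labels have already been computed; scikit-learn's f1_score is used here, which is an implementation assumption.

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(distances, same_identity):
    """Scan candidate values of tau and keep the one with the highest F1 score.

    distances: L2 distances for all image pairs;
    same_identity: 1 if a pair shows the same person (the rarer, skewed class).
    """
    taus = np.linspace(0.1, 1.0, 100)
    f1s = [f1_score(same_identity, (distances < t).astype(int)) for t in taus]
    best = int(np.argmax(f1s))
    return taus[best], f1s[best]
```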
With the estimated distance threshold, face recognition becomes as simple as calculating the distance between an input embedding vector and all embedding vectors in the database: the input is assigned a label if the distance is less than the threshold value. The author also classifies the database using k-NN classification with k = 3 under a Euclidean distance metric, and alternatively a linear support vector machine (SVM) as classifier, mapping the database into the embedding space. For training these classifiers 50% of the dataset is used, and the other 50% for evaluation. The results are shown in Table III (the level of accuracy under varied image resolution and color depth).
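The classifier comparison could be run as in the sketch below, where embeddings and labels stand for the 128-dimensional vectors and identity labels of the collected images; the scikit-learn estimators are an implementation assumption, since the paper does not name the library used.

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# embeddings: (n, 128) array of face vectors; labels: one identity per image
# (both assumed to be produced by the pipeline above).
X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.5, stratify=labels)  # 50%/50% split

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
svm = LinearSVC()

knn.fit(X_train, y_train)
svm.fit(X_train, y_train)

print("k-NN accuracy:", accuracy_score(y_test, knn.predict(X_test)))
print("SVM accuracy:", accuracy_score(y_test, svm.predict(X_test)))
```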
To optimize the value of k, note that setting k to a small value leads to large variance, while setting k to a large value leads to large model bias [63]. Some authors suggest the rule of thumb:
k = √N    (3)

where N is the number of classes; for the 11 identities used here this gives k = √11 ≈ 3, consistent with the k = 3 used above.
TABLE III. THE LEVEL OF ACCURACY UNDER VARIED IMAGE RESOLUTION AND COLOR DEPTH

Scheme | Image size (pixels) | Color depth | Threshold value | Acc. on threshold | SVM classification acc. | KNN classification acc. (k=1) | Desc.
1 | 250 x 250 | 24 bit | 0.56 | 95.5% | 96% | 96% | Color
2 | 100 x 100 | 24 bit | 0.63 | 95.3% | 96% | 100% | Color
3 | 75 x 75 | 24 bit | 0.59 | 93.2% | 82.0% | 80.0% | Color
4 | 50 x 50 | 24 bit | 0.30 | 10.0% | 10% | 10% | Color, can't be aligned
5 | 250 x 250 | 8 bit | 0.56 | 95.5% | 96% | 94% | Color
6 | 250 x 250 | 7 bit | 0.58 | 95.3% | 98% | 94% | Color
7 | 250 x 250 | 6 bit | 0.62 | 94.7% | 98% | 94% | Color
8 | 250 x 250 | 5 bit | 0.59 | 94.3% | 98% | 94% | Color
9 | 250 x 250 | 4 bit | 0.63 | 93.1% | 94% | 88% | Color
10 | 250 x 250 | 3 bit | 0.63 | 92.3% | 84% | 78% | Color
11 | 250 x 250 | 2 bit | 0.78 | 89.2% | 76% | 64% | Color
12 | 250 x 250 | 1 bit | 0.77 | 77.0% | 24% | 24% | B/W
13 | 250 x 250 | 24 bit | 0.56 | 95.0% | 98% | 94% | Grayscale

From schemes 1, 2, 3, and 4 of Table III we can see that, when the image size is modified, the accuracy at the threshold value does not change significantly until the image size is reduced to 50 pixels wide. This means the architecture gives good accuracy as long as the facial landmarks are detected and the face can be aligned; when the width of the image drops to 50 pixels, the system fails to place the facial landmarks (Fig. 3).

Fig 3. Accuracy results when varying the resolution (accuracy on threshold, SVM accuracy, and k-NN accuracy vs. image width in pixels: 250, 100, 75, 50)

From schemes 5 through 12 of Table III we can also see that changes in color depth give a significant impact only once the color depth becomes low. Accuracy begins to decrease significantly when the color depth reaches 3 bits (Fig. 4); even with only 3 bits of color, the facial landmark detection process can still detect the facial points, and the alignment process can be conducted.

Fig 4. Accuracy change when reducing the color depth

The author then also uses his own dataset, obtained by collecting images from the internet: 263 images of 13 subjects or identities, with age variation in the dataset, to test whether age variation decreases the accuracy of detecting a person's face. The following results are obtained (Table IV):

TABLE IV. THE LEVEL OF ACCURACY WITH AGE VARIATION

Image size (pixels) | Threshold value | Acc. on threshold | Color depth | KNN acc. | SVM acc. | Desc.
250 x 250 | 0.54 | 94.5% | 24 bit | 98.8 | 96.3 | Color, less age variation
250 x 250 | 0.56 | 92.8% | 24 bit | 98.3 | 98.3 | Color, more age variation
III. CONCLUSION

In conclusion, decreasing the image resolution in face recognition with this deep-learning technique has an effect, but not a significant one until a certain image size is reached; at that point (75 x 75 px) the technique begins to lose the ability to compute a useful L2 distance. The model fails to achieve high accuracy at a certain size (50 x 50 px) because the landmarks cannot be detected and placed on the image, so the alignment process cannot be performed; this shows that the alignment process plays an essential role in the accuracy of facial recognition. From this fact it can be concluded that higher-resolution datasets and input data will always give better results (see the accuracy on the threshold). The decrease in color depth also begins to have a significant effect when the color depth is lowered to 2 bits (4 colors). Image quality also matters: the high noise in the AgeDB dataset affects the degree of accuracy, as shown in Table III. If a dataset has a much broader age variation, the efficiency of facial recognition also decreases, as shown in Table IV.

From the results of the experiments, however, we can conclude that although image size, color depth, and age variation affect face recognition, OpenFace CNN still provides good accuracy regardless of the various conditions above. Face recognition still achieves good accuracy as long as, at the given resolution and color depth, the facial landmarks can still be detected, so that the alignment process can be performed on the face image.
REFERENCES

[1] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cogn. Neurosci., vol. 3, no. 1, pp. 71–86, 1991.
[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, 1997.
[3] B. Moghaddam, W. Wahid, and A. Pentland, "Beyond Eigenfaces: Probabilistic Matching for Face Recognition," 3rd IEEE Int'l Conf. Autom. Face Gesture Recognit., no. 443, 1998.
[4] W. Deng, J. Hu, J. Lu, and J. Guo, "Transform-Invariant PCA: A Unified Approach to Fully Automatic Face Alignment, Representation, and Recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 6, pp. 1275–1284, 2014.
[5] X. He, S. Yan, Y. Hu, P. Niyogi, and H. J. Zhang, "Face recognition using Laplacianfaces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 3, pp. 328–340, 2005.
[6] Y. Gan, T. Yang, and C. He, "A Deep Graph Embedding Network Model for Face Recognition," 12th Int. Conf. Signal Process., no. 2, pp. 1268–1271, 2014.
[7] J. H. Xue and D. M. Titterington, "Comment on 'On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes,'" Neural Process. Lett., vol. 28, no. 3, pp. 169–187, 2008.
[8] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, 2009.
[9] L. Zhang, M. Yang, and X. Feng, "Sparse representation or collaborative representation: Which helps face recognition?," Proc. IEEE Int. Conf. Comput. Vis., pp. 471–478, 2011.
[10] W. Deng, J. Hu, and J. Guo, "Extended SRC: Undersampled face recognition via intraclass variant dictionary," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 9, pp. 1864–1870, 2012.
[11] C. Liu and H. Wechsler, "Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition," IEEE Trans. Image Process., vol. 11, no. 4, pp. 467–476, 2002.
[12] T. Ahonen, A. Hadid, and M. Pietikäinen, "Face description with local binary patterns: Application to face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2037–2041, 2006.
[13] W. Zhang, S. Shan, W. Gao, X. Chen, and H. Zhang, "Local Gabor Binary Pattern Histogram Sequence (LGBPHS): A novel non-statistical model for face representation and recognition," Proc. IEEE Int. Conf. Comput. Vis., vol. I, pp. 786–791, 2005.
[14] D. Chen, X. Cao, F. Wen, and J. Sun, "Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 3025–3032, 2013.
[15] W. Deng, J. Hu, and J. Guo, "Compressive Binary Patterns: Designing a Robust Binary Face Descriptor with Random-Field Eigenfilters," IEEE Trans. Pattern Anal. Mach. Intell., vol. 8828, pp. 1–10, 2018.
[16] Z. Cao, Q. Yin, X. Tang, and J. Sun, "Face recognition with learning-based descriptor," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 2707–2714, 2010.
[17] Z. Lei, M. Pietikainen, and S. Z. Li, "Learning discriminant face descriptor," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 2, pp. 289–302, 2014.
[18] T. H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, "PCANet: A Simple Deep Learning Baseline for Image Classification?," IEEE Trans. Image Process., vol. 24, no. 12, pp. 5017–5032, 2015.
[19] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: Closing the gap to human-level performance in face verification," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 1701–1708, 2014.
[20] Y. Sun, X. Wang, and X. Tang, "Deep Learning Face Representation by Joint Identification-Verification," pp. 1–9, 2014.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Adv. Neural Inf. Process. Syst., pp. 1–9, 2012.
[22] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," pp. 1–14, 2014.
[23] C. Szegedy et al., "Going Deeper with Convolutions," pp. 1–9, 2014.
[24] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," Comput. Vis. Pattern Recognit. 2015, pp. 1–17, Dec. 2015.
[25] G. E. Hinton, S. Osindero, and Y. W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527–1554, 2006.
[26] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion," J. Mach. Learn. Res., vol. 11, pp. 3371–3408, 2010.
[27] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 815–823, 2015.
[28] R. Ranjan et al., "Deep Learning for Understanding Faces: Machines May Be Just as Good, or Better, than Humans," IEEE Signal Process. Mag., vol. 35, no. 1, pp. 66–83, 2018.
[29] I. Masi, A. T. Trân, T. Hassner, J. T. Leksut, and G. Medioni, "Do we really need to collect millions of faces for effective face recognition?," Lect. Notes Comput. Sci., vol. 9909 LNCS, pp. 579–596, 2016.
[30] I. Masi, T. Hassner, A. T. Tran, and G. Medioni, "Rapid Synthesis of Massive Face Sets for Improved Face Recognition," Proc. 12th IEEE Int. Conf. Autom. Face Gesture Recognition (FG 2017), pp. 604–611, 2017.
[31] E. Richardson, M. Sela, and R. Kimmel, "3D Face Reconstruction by Learning from Synthetic Data," 2016.
[32] E. Richardson, M. Sela, R. Or-El, and R. Kimmel, "Learning Detailed Face Reconstruction from a Single Image," pp. 1–15, 2016.
[33] P. Dou, S. K. Shah, and I. A. Kakadiaris, "End-to-end 3D face reconstruction with deep neural networks," 2017.
[34] Y. Guo, J. Zhang, J. Cai, B. Jiang, and J. Zheng, "CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images," IEEE Trans. Pattern Anal. Mach. Intell., pp. 1–16, 2018.
[35] A. T. Tran, T. Hassner, I. Masi, and G. Medioni, "Regressing robust and discriminative 3D morphable models with a very deep neural network," Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognition (CVPR 2017), pp. 1493–1502, 2017.
[36] A. Tewari et al., "MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction," Proc. 2017 IEEE Int. Conf. Comput. Vis. Workshops (ICCVW 2017), pp. 1274–1283, 2018.
[37] Z. Zhu, P. Luo, X. Wang, and X. Tang, "Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations," Adv. Neural Inf. Process. Syst., pp. 1–9, 2014.
[38] J. Zhao et al., "Dual-Agent GANs for Photorealistic and Identity Preserving Profile Face Synthesis," NIPS 2017, no. 15, pp. 1–11, 2017.
[39] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb, "Learning from simulated and unsupervised images through adversarial training," Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognition (CVPR 2017), pp. 2242–2251, 2017.
[40] J. Liu, "Targeting Ultimate Accuracy: Face Recognition via Deep Embedding," CVPR, pp. 4–7, 2015.
[41] E. Zhou, Z. Cao, and Q. Yin, "Naive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not?," Jan. 2015.
[42] C. Ding and D. Tao, "Robust Face Recognition via Multimodal Deep Face Representation," IEEE Trans. Multimed., vol. 17, no. 11, pp. 2049–2058, 2015.
[43] Y. Sun, X. Wang, and X. Tang, "Deeply learned face representations are sparse, selective, and robust," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 2892–2900, 2015.
[44] Y. Sun, D. Liang, X. Wang, and X. Tang, "DeepID3: Face Recognition with Very Deep Neural Networks," pp. 2–6, 2015.
[45] Y. Sun, X. Wang, and X. Tang, "Sparsifying Neural Network Connections for Face Recognition," 2015.
[46] D. Wang, C. Otto, and A. K. Jain, "Face Search at Scale," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1122–1136, 2017.
[47] M. Kan, S. Shan, H. Chang, and X. Chen, "Stacked progressive auto-encoders (SPAE) for face recognition across poses," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 1883–1890, 2014.
[48] Y. Zhang, M. Shao, E. K. Wong, and Y. Fu, "Random faces guided sparse many-to-one encoder for pose-invariant face recognition," Proc. IEEE Int. Conf. Comput. Vis., pp. 2416–2423, 2013.
[49] J. Yang, S. Reed, M.-H. Yang, and H. Lee, "Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis," pp. 1–11, 2016.
[50] Z. Zhu, P. Luo, X. Wang, and X. Tang, "Deep learning identity-preserving face space," Proc. IEEE Int. Conf. Comput. Vis., pp. 113–120, 2013.
[51] Z. Zhu, P. Luo, X. Wang, and X. Tang, "Recover Canonical-View Faces in the Wild with Deep Neural Networks," pp. 1–10, Apr. 2014.
[52] L. Hu, M. Kan, S. Shan, X. Song, and X. Chen, "LDF-Net: Learning a Displacement Field Network for Face Recognition across Pose," Proc. 12th IEEE Int. Conf. Autom. Face Gesture Recognition (FG 2017), pp. 9–16, 2017.
[53] F. Cole, D. Belanger, D. Krishnan, A. Sarna, I. Mosseri, and W. T. Freeman, "Synthesizing normalized faces from facial identity features," Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognition (CVPR 2017), pp. 3386–3395, 2017.
[54] J. Yim, H. Jung, B. Yoo, C. Choi, D. Park, and J. Kim, "Rotating your face using multi-task deep neural network," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 676–684, 2015.
[55] R. Huang, S. Zhang, T. Li, and R. He, "Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis," Proc. IEEE Int. Conf. Comput. Vis., pp. 2458–2467, 2017.
[56] L. Tran, X. Yin, and X. Liu, "Disentangled Representation Learning GAN for Pose-Invariant Face Recognition."
[57] J. Deng, S. Cheng, N. Xue, Y. Zhou, and S. Zafeiriou, "UV-GAN: Adversarial Facial UV Map Completion for Pose-invariant Face Recognition," 2017.
[58] X. Yin, X. Yu, K. Sohn, X. Liu, and M. Chandraker, "Towards Large-Pose Face Frontalization in the Wild," Proc. IEEE Int. Conf. Comput. Vis., pp. 4010–4019, 2017.
[59] O. Russakovsky et al., "ImageNet Large Scale Visual Recognition Challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
[60] C. Szegedy et al., "Going deeper with convolutions," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 1–9, 2015.
[61] G. Hu et al., "When Face Recognition Meets with Deep Learning: an Evaluation of Convolutional Neural Networks for Face Recognition," Proc. IEEE Int. Conf. Comput. Vis. Workshops, pp. 4321–4329, 2015.
[62] M. Krasser, "Deep face recognition with Keras, Dlib and OpenCV," Jul. 2018.
[63] S. Fortmann-Roe, "Understanding the Bias-Variance Tradeoff," Sep. 2016.
