
2018 International Conference on Communication, Information & Computing Technology (ICCICT), Feb. 2-3, Mumbai, India

A comparative analysis of feature extraction techniques for face recognition

Saket Karve, Vasisht Shende
Department of Computer and Information Technology
V. J. Technological Institute, Mumbai, India
saketk1.sk@gmail.com, vasishtshende77@gmail.com

Rizwan Ahmed
Department of Electrical Engineering
V. J. Technological Institute, Mumbai, India
rizwan.vjti@ieee.org

Abstract— Face recognition finds extensive applications in various places. To build a fully functional and reliable system for recognizing faces, a robust and efficient face detection algorithm is required. Existing systems incorporate geometrical and template-based approaches such as distances between different points on the face, the shape of the face, matching against existing templates, and similar methods. These methods work successfully on specific images but tend to fail when an unusual image is tested. To overcome this, we propose a statistical feature-based approach for face recognition in this study. For this purpose, we make a detailed study of various feature extraction techniques involving principal and independent component analysis. Four different classifiers have been used for the extracted features. Accuracy and execution time have been recorded for the analysis. It is found that the factor analysis method outperforms principal component analysis and independent component analysis with improved accuracy.

Keywords—face recognition, Eigen decomposition, independent component analysis, factor analysis, feature extraction.

I. INTRODUCTION

Face recognition has been an active area of research in the field of computer vision for the past few years. It finds extensive use in a wide range of applications such as social media, law enforcement, securing online payments, tracking down criminals, reliable biometrics, and many more. Various techniques have been put forth for recognizing faces, for example structural features considering the distances between different points on the face, color gradients, template matching, and so on. Face recognition involves localization of the face within a larger image, followed by pre-processing to suit the requirements of the feature extraction technique to be employed. Various feature extraction approaches have been used, most of which focus on structural or geometric features. Statistical approaches are also used in the literature [2]-[9]. In this paper we investigate this further for face recognition.

Principal Component Analysis (PCA) [1] is a method used to extract features from images. The principal components of an image capture its most relevant and significant information and project it onto the Eigen subspace, which has fewer dimensions than the original image space. Principal components as features help extract the useful information from an image into a lower-dimensional space, thus providing fewer variables for the learning phase. This method is also referred to as Eigenface analysis; the image transformed into the Eigen subspace is called an Eigenface [2], [3]. In the Eigen space, the basis vectors are the eigenvectors with the largest eigenvalues extracted from the original 2D data. The eigenvectors are determined by considering the relative variance between the pixels in the image. This helps extract a subspace of lower dimensionality representing the most relevant and useful information from the image.

PCA has been found to work well on correlated data. Since images in general manifest as highly correlated data, PCA works well for extracting features from them.

An image is considered as a matrix of highly correlated data. This matrix is transformed to a lower-dimensional Eigen subspace by performing different operations on it. First, the covariance matrix of the image is found using (1); this covariance matrix represents the relative variance between pixels of the image. The basis vectors of the Eigen subspace are then found as the eigenvectors of the covariance matrix using (2). The eigenvectors with the highest eigenvalues are taken as the principal components, which together represent what is called the Eigenface. Thus, the image is transformed into the feature plane.

C = E{(X − μ_x)(X − μ_x)^T}    (1)

C e_i = λ_i e_i,  i = 1, 2, 3, …, n    (2)

where X is the 2D array (image), μ_x is the row-wise mean of X, C is the covariance matrix of X, e_i is an eigenvector of C, and λ_i is an eigenvalue of C.

Independent Component Analysis (ICA) is another method to extract features [6], [7]. It uses an objective function which minimizes intra-class scatter and maximizes inter-class scatter for the given dataset. Equation (3) represents the objective that is optimized to obtain the independent components; such an optimization leads to components which are independent of each other, so ICA can be used for feature extraction.

S_b W = λ S_w W    (3)
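The PCA feature extraction outlined in equations (1) and (2) can be sketched in Python with NumPy. This is an illustrative sketch, not the authors' implementation; the image shapes and the Gram-matrix shortcut used to avoid the full pixel-covariance matrix are assumptions.

```python
import numpy as np

def pca_features(images, n_components=3):
    """Project flattened face images onto the top eigenvectors
    (Eigenfaces) of their covariance matrix, cf. (1) and (2)."""
    X = images.reshape(len(images), -1).astype(float)   # (n_samples, n_pixels)
    Xc = X - X.mean(axis=0)                             # mean-centred data
    # Eigendecompose the small (n_samples x n_samples) Gram matrix instead of
    # the huge pixel covariance matrix; both share the nonzero eigenvalues.
    vals, vecs = np.linalg.eigh(Xc @ Xc.T / len(X))     # ascending order
    top = np.argsort(vals)[::-1][:n_components]         # largest eigenvalues
    eigenfaces = Xc.T @ vecs[:, top]                    # back to pixel space
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0)    # unit-norm basis
    return Xc @ eigenfaces                              # per-image features

# Toy usage with random stand-in "faces"
feats = pca_features(np.random.rand(8, 16, 16))
print(feats.shape)  # (8, 3)
```

The projection coefficients returned here play the role of the three-component feature vectors used later in the paper.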

978-1-5386-2051-9/18/$31.00 ©2018 IEEE


2018 International Conference on Communication, Information & Computing Technology (ICCICT), Feb. 2-3, Mumbai, India

S_b = Σ_{i=1}^{c} n_i (X_i − μ)(X_i − μ)^T    (4)

S_w = Σ_{i=1}^{c} Σ_{j=1}^{n_i} (X_ij − μ_i)(X_ij − μ_i)^T    (5)

where W is the weight matrix; S_b is the inter-class scatter matrix; S_w is the intra-class scatter matrix; n_i is the number of data points in class i; X is the image; μ is the row-wise mean; μ_i is the mean of class i; and c is the number of classes.

Factor analysis (FA) [1] is a statistical approach which searches for underlying variability in observed images. A combination of latent variables and error terms defines the features of an individual image. The inter-dependencies in the data are removed and the dimension of the image is reduced, thereby restoring or combining the unobserved relevant data.

In this paper, we apply PCA, ICA and FA to extract features for the face recognition problem. All these features are used as input to different classifiers for the comparative analysis. This paper is organized as follows. After this brief introduction in Section I, Section II explains the methodology used for the comparative analysis, covering the pre-processing performed, the feature extraction methods used, and the classifiers employed. Section III covers the results obtained from the observations. Finally, conclusions are drawn in Section IV.

II. METHODOLOGY

A. Pre-Processing

Before extracting the facial features, the face images are converted to gray-scale. Then, the image is resized to a lower dimension if required.

B. Feature Extraction

Various dimensionality reduction techniques have been employed for the comparative analysis. Dimensionality reduction is a statistical technique used to extract important pixel information from the original image. The image features in the reduced-dimension space should make clear distinctions between the different classes in the dataset. Three of the most popular dimensionality reduction techniques have been used for comparing face recognition performance, viz. Principal Component Analysis (PCA) [1], Independent Component Analysis (ICA) and Factor Analysis (FA) [1]. The study also involves analysis of the classification using various combinations of extracted features. The pre-processed images are reduced in dimension by the techniques mentioned above. After many trials, three components were found to be a plausible choice in the mean-squared sense. The first three principal components are extracted from every image and used as features; these three components carry the most relevant information. The combination of features from PCA and ICA helps pick the best features from the given image, resulting in better performance.

C. Classification

The extracted features are then passed on to the classifier, which classifies the images into the available classes. For the comparative study, we employ four classifiers, namely Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), Decision Tree and Gaussian Naive Bayes. The dataset is divided into two sets, training data and test data: 70 % of the images are kept in the training set and the remaining go into the test set. While dividing the dataset into the two sets, it is ensured that 70 % of each class goes into the training set, which ensures a fair division of the dataset. The classifier is trained using 70 % of all images of every person, followed by testing the trained classifier on the test set containing the remaining 30 % of the images.

As part of the analysis, different combinations of training and test sets have been used to record the results. Every image in the dataset has some unique characteristic, which influences the performance of the classifiers significantly. The combinations chosen are selected greedily based on the observations made and the performance measured for the different classifiers. This provides a good sample space for generalizing the results.

Support Vector Machine [10] minimizes the upper bound on the expected generalization error (structural risk minimization). The linear classifier chooses a separating hyperplane so as to minimize the expected classification error on unseen face patterns. A weighted combination of a subset of the training vectors forms the optimal hyperplane; these vectors are called support vectors. SVM is implemented to overcome the disadvantage of the extensive tuning required by the MLP.

Multi-Layer Perceptron (MLP) [11] is used extensively in pattern recognition problems to identify complex conditional patterns in dense classes. A five-layered MLP is implemented to identify face patterns within the images. The network consists of three hidden layers with up to 150 units each. The inputs to the MLP are the extracted features of the input images. Performance is calculated for several combinations of the number of units across the layers.

Decision Tree [12] is a model generated from induction rules obtained from the input data. The inputs used to build the decision tree are the features obtained from the feature extraction techniques, together with the entropy, mean and standard deviation of the pixel intensity values of the input images, along with the label of the corresponding images.

Naive Bayes is a probabilistic model, useful for determining the local face pattern through the joint probability of face pattern appearance. It treats each feature of an input image as independent of the other features when building the joint probability of the image. With minimal correlation and supervised learning, Naive Bayes can be compared against the more advanced SVM and MLP. As the image data we provide is large and the number of features is reduced, the curse of dimensionality in Naive Bayes is avoided.

The dataset used contains 10 images of each of 40 distinct persons, hence a total of 400 images, of which 280 are used for training and the remaining 120 for testing. As shown in Figure 1, some images of each individual contain faces at different angles, under different lighting, with different expressions and other facial characteristics. While testing on the dataset, combinations of images with peculiar characteristics were considered for detailed analysis. The test set was generated based on the characteristics of the face images in the dataset.
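The stratified 70/30 split and the classifier comparison described in this section can be sketched with scikit-learn. This is an assumed implementation on synthetic stand-in features, not the paper's actual code; the MLP is omitted here for brevity.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for extracted features: 40 classes x 10 images, 3 features each,
# shifted per class so the classes are separable.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(40), 10)
X = rng.normal(scale=0.1, size=(400, 3)) + y[:, None]

# stratify=y keeps exactly 70 % of *each* person's images in the training set
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
print(len(X_tr), len(X_te))  # 280 120

for clf in (SVC(), DecisionTreeClassifier(random_state=0), GaussianNB()):
    print(type(clf).__name__, round(clf.fit(X_tr, y_tr).score(X_te, y_te), 2))
```

The `stratify` argument is what enforces the paper's per-class fairness requirement: every person contributes 7 images to training and 3 to testing.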


Fig. 1: Sample images from the dataset

III. RESULTS AND ANALYSIS

The following features have been used to calculate the accuracy and execution time for analyzing the performance:

• Principal Component Analysis (3 components)
• Independent Component Analysis (3 components)
• Factor Analysis (3 components)
• Principal Component Analysis (2 components) and Independent Component Analysis (1 component)

These features have been tested against the following classifiers:

• Multi-Layer Perceptron (Neural Network)
• Support Vector Machine (SVM)
• Decision Tree
• Gaussian Naive Bayes

Figure 2 shows a histogram plot indicating the incorrect classification frequency for images of different individuals in the dataset used. It helps us understand which images are misclassified with the various features extracted against the different classifiers used. We make the following observations from the results obtained.

1. Principal Component Analysis works best with Naive Bayes, with 87.42 % average accuracy. Decision Tree shows the worst performance when used with PCA. However, the performance remains similar irrespective of the classifier.

2. Independent Component Analysis shows better results than Principal Component Analysis on average for all classifiers. A significant improvement can be observed in the case of the Decision Tree learning algorithm. As can be seen from Table I, ICA works best with Decision Tree.

3. The combination of Principal Component Analysis and Independent Component Analysis improved the performance with all classifiers except Decision Tree. This shows that introducing one ICA component in place of a principal component remarkably improves the performance.

4. Factor Analysis produced the best results with all the classifiers tested. 100 % accuracy was achieved with the Neural Network, Support Vector Machine and Naive Bayes, and Decision Tree produced an almost perfect result with 99.17 % accuracy. This shows that Factor Analysis on its own is the method best suited to classifying facial images; it is the dimensionality reduction technique that most closely represents the human face.

Fig. 2: Plot of various performance metrics

Table I details the average accuracy attained with all combinations of features and classifiers.

Table I: Average Accuracy

             Neural     Support Vector   Decision   Naive
             Network    Machine          Tree       Bayes
PCA          85.67%     86.23%           74.23%     87.42%
ICA          99.28%     88.82%           99.23%     91.78%
FA           100.00%    100.00%          99.17%     100.00%
PCA + ICA    99.40%     95.00%           99.17%     98.12%
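The four feature sets compared in Table I can be produced with scikit-learn's decomposition module; this is an assumed tooling choice for illustration (the paper does not name its implementation), and the random matrix stands in for the flattened face images.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA, FactorAnalysis

rng = np.random.default_rng(1)
X = rng.random((120, 64))   # stand-in for 120 flattened face images

pca3 = PCA(n_components=3).fit_transform(X)                       # "PCA" row
ica3 = FastICA(n_components=3, random_state=1).fit_transform(X)   # "ICA" row
fa3 = FactorAnalysis(n_components=3).fit_transform(X)             # "FA" row
# "PCA + ICA" row: two principal components plus one independent component
combo = np.hstack([pca3[:, :2], ica3[:, :1]])
print(pca3.shape, fa3.shape, combo.shape)
```

Each transform yields a three-dimensional feature vector per image, matching the component counts listed above.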


Table II: Average execution time (seconds)

             Neural     Support Vector   Decision   Naive
             Network    Machine          Tree       Bayes
PCA          5.38       0.13             0.22       0.04
ICA          6.12       0.13             0.38       0.04
FA           4.41       0.18             0.26       0.08
PCA + ICA    5.75       0.15             0.40       0.06

Table III: Dataset summary

Type/Characteristic    Number of Instances
Male                   30
Female                 10
Beard                  6
Mustache               6
Spectacles             1

Execution time was also noted for the analysis of the various techniques. It has been observed that the Neural Network takes the maximum time to execute, and this increases with the number of hidden units in the architecture. The average execution time for the Neural Network with three hidden layers of 150 units each is 5.41 sec. The Naive Bayes classifier takes the least execution time, with an average of 0.055 sec. Decision Tree and Support Vector Machine take 0.31 and 0.15 sec respectively on average (Table II). Even though Factor Analysis has the best performance in terms of accuracy, it is the slowest among the techniques used, except with the Neural Network; but the difference is very small and does not affect the overall effective performance.

Other metrics, viz. precision, recall and F-score, have also been measured over the dataset. The line graph in Figure 2 shows the plot of these metrics against all combinations of feature extraction techniques and classifiers. It can be observed that PCA fares relatively poorly in terms of these metrics, while ICA and FA show good results. When one component of ICA is used along with two PCA components, the performance improves drastically. Among the classifiers, Decision Tree fares the worst; the other classifiers perform almost the same. High precision shows that the algorithm misclassifies only a few images in the respective classes.

Observations have also been made on individual images, analyzing which individuals' images are classified incorrectly. The dataset used contains different types of individuals, as summarized in Table III.

Fig. 3: Comparison of classifiers with PCA features
Fig. 4: Comparison of classifiers with ICA features
Fig. 5: Comparison of classifiers with FA features
Fig. 6: Comparison of classifiers with PCA and ICA features
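The precision, recall and F-score figures discussed above can be computed per feature/classifier combination as follows; the scikit-learn call is an assumed implementation detail, and the toy labels are hypothetical.

```python
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical true and predicted labels for a 3-person subset of the dataset
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# Macro averaging weights every person equally, matching a per-class view
p, r, f, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.89 0.83 0.82
```

Here one of person 1's images leaks into class 2, which lowers recall for class 1 and precision for class 2, exactly the kind of per-individual error the histograms visualize.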


The histograms in Figures 2 to 5 show the frequency of incorrect classification of images of each individual when used with the different dimensionality reduction techniques. The histograms in Figures 6 to 9 show the frequency of incorrect classification of each individual when used with the different classifiers. In the ORL dataset the candidates are labeled from 0 to 39, and the same labeling is used to calculate the error frequency. The following observations can be made from these histograms:

1. Image 0 in the ORL dataset has the highest misclassification irrespective of the combination of feature space and classifier. Factor Analysis correctly classified all images except image 0.

2. Images labeled 30 and above in the ORL dataset also have a high frequency of misclassification. These images mostly include individuals wearing spectacles.

3. People having a beard and/or moustache have a lower frequency of misclassification. This indicates that such changes in facial features do not have any impact on the classification.

4. Both men and women have been classified correctly in most cases. This indicates that gender has an insignificant impact on classification.

Fig. 7: Comparison of feature extraction techniques with Neural Networks
Fig. 8: Comparison of feature extraction techniques with SVM
Fig. 9: Comparison of feature extraction techniques with Decision Tree
Fig. 10: Comparison of feature extraction techniques with Gaussian Naive Bayes

IV. CONCLUSIONS

In this paper, images of various individuals have been used for face recognition. PCA, ICA and FA were used to extract features from the images, and these features were tested against four classifiers for recognizing the identity of the individuals. After carefully carrying out the experiments, this paper leads to a number of conclusions:

• Factor Analysis has the highest accuracy on the ORL database among the dimensionality reduction techniques considered, and it works best irrespective of the classifier used.

• Although Principal Component Analysis is outperformed by Independent Component Analysis, as reported in the literature, the combination of the two is more sensitive to variation in facial expression and orientation. This makes for a better distinction among individuals than either PCA or ICA used independently.

• The Neural Network leads the classifiers in terms of accuracy but has the highest execution time with all dimensionality reduction techniques.

• Naive Bayes can be considered an optimal classifier in terms of both accuracy and execution time.


REFERENCES
[1] Jolliffe I.T. (1986) Principal Component Analysis and Factor Analysis.
In: Principal Component Analysis. Springer Series in Statistics.
Springer, New York, NY
[2] M. Turk and A. Pentland, “Eigenfaces for Recognition,” J. Cognitive
Neuroscience, vol. 3, no. 1, pp. 71-86, 1991
[3] Bartlett, Marian Stewart, Javier R. Movellan, and Terrence J. Sejnowski.
“Face Recognition by Independent Component Analysis.” IEEE
transactions on neural networks / a publication of the IEEE Neural
Networks Council 13.6 (2002): 1450–1464. PMC. Web. 20 Sept. 2017.
[4] V. V. Mokeyev, "On application of generalized Jacobi method in face
recognition by linear discriminant analysis and principal component
analysis," 2016 2nd International Conference on Industrial Engineering,
Applications and Manufacturing (ICIEAM), Chelyabinsk, 2016, pp. 1-6.
doi: 10.1109/ICIEAM.2016.7911673
[5] P. Dave and J. Agarwal, "Study and analysis of face recognition system
using Principal Component Analysis (PCA)," 2015 International
Conference on Electrical, Electronics, Signals, Communication and
Optimization (EESCO), Visakhapatnam, 2015, pp. 1-4.
doi: 10.1109/EESCO.2015.7253718
[6] M. Mollaee and M. H. Moattar, "Face recognition based on modified
discriminant independent component analysis," 2016 6th International
Conference on Computer and Knowledge Engineering (ICCKE),
Mashhad, 2016, pp. 60-65.
doi: 10.1109/ICCKE.2016.7802116
[7] Z. Lihong, W. Ye and T. Hongfeng, "Face recognition based on
independent component analysis," 2011 Chinese Control and Decision
Conference (CCDC), Mianyang, 2011, pp. 426-429.
doi: 10.1109/CCDC.2011.5968217
[8] A. M. C. Machado, "The 2D factor analysis and its application to face
recognition with a single sample per person," 2015 23rd European
Signal Processing Conference (EUSIPCO), Nice, 2015, pp. 1148-1152.
doi: 10.1109/EUSIPCO.2015.7362563
[9] B. Tunç, V. Dağlı and M. Gökmen, "Robust face recognition with class
dependent factor analysis," 2011 International Joint Conference on
Biometrics (IJCB), Washington, DC, 2011, pp. 1-6.
doi: 10.1109/IJCB.2011.6117508
[10] R. Senthilkumar and R. K. Gnanamurthy, "Performance improvement in
classification rate of appearance based statistical face recognition
methods using SVM classifier," 2017 4th International Conference on
Advanced Computing and Communication Systems (ICACCS),
Coimbatore, 2017, pp. 1-7.
doi: 10.1109/ICACCS.2017.8014584
[11] H. Boughrara, M. Chtourou and C. B. Amar, "MLP neural network
based face recognition system using constructive training algorithm,"
2012 International Conference on Multimedia Computing and Systems,
Tangier, 2012, pp. 233-238.
doi: 10.1109/ICMCS.2012.6320263
[12] L. Luo, S. Hu, J. Cai, F. Tang, Z. Qiu and X. Hu, "Face Classification
Based on Natural Features and Decision Tree," 2016 International
Conference on Virtual Reality and Visualization (ICVRV), Hangzhou,
2016, pp. 1-7.
doi: 10.1109/ICVRV.2016.10
