Business Analysis
Winter 2023
FACIAL RECOGNITION
DSEB 63 - GROUP 02
Professor: Pham Tuan Minh, PhD
List of members:
Vu Minh An
This research focuses on the advancement of facial recognition technology, a field that
has drawn substantial interest in recent years, as exemplified by its adoption in consumer
devices such as Apple's iPhone, whose security features have evolved from password
entry to fingerprint to facial identification, owing to the non-intrusive collection process
and the rich structural information of the face. Our study examines dimensionality
reduction techniques, primarily Principal Component Analysis (PCA) variants, in
combination with different conventional machine learning classifiers, and evaluates their
effectiveness in facial recognition tasks. Finally, we extend this theoretical groundwork to
an application in Facial Emotion Recognition, which can be useful across domains, from
medicine to customer service and even politics.
Chapter 2 investigates the efficacy of diverse dimensionality reduction methods and their
integration with conventional machine learning classifiers such as Naive Bayes, Support
Vector Machines (SVM), and Logistic Regression. This analysis aims to understand the
potential advantages these combinations offer over the more intuitive distance-based
method, k-nearest neighbors (KNN), that we started with in Chapter 1.
Figure 11: Accuracy of combinations of PCA variants and machine learning classifiers
Figure 15: The evolution of facial expression recognition in terms of datasets and
methods.
Figure 16: Algorithm analysis of 2D FER Techniques (Huang, Chen, Lv, & Wang, 2019)
Figure 17: Survey done by Mollahosseini, A., Hasani, B., & Mahoor, M. H. (2017)
Figure 21: Accuracy of various classifier following two feature extraction methods
Chapter 1: Principal Component Analysis
1.1 Introduction
In this chapter, we reimplement and investigate in depth one of the most basic
approaches to face recognition, PCA, and complete the identification phase by
coupling it with a distance-based K-nearest neighbors (KNN) classifier.
The ORL Database of Faces, also known as the AT&T Database of Faces,
comprises 400 images spanning 40 subjects. For certain subjects, images capture
variations in lighting, facial expressions, and accessories like glasses. Taken
against a dark background, each subject is positioned upright, allowing for slight
side movement. Images are 92x112 pixels, in 256 gray levels (Figure 1). This
diversity and structure make the dataset a fundamental resource for facial
recognition research, emphasizing its versatility in different conditions (AT&T
Laboratories Cambridge, 2001).
Figure 1. Samples from ORL (Our Database of Faces)
One related study is "Face Identification Based on K-Nearest Neighbor" by
Wirdiani and Hridayami in Indonesia. Their facial recognition process uses PCA
for feature extraction and K-nearest neighbors (KNN) for face identification,
restricted to 30 people, with 3 images per person used for training and 2 images
used for testing. Across several tested values of k, the best accuracy or F1-score
was 81% with k = 1, compared with 53% for k = 2 and 47% for k = 3.
1.3 Methodology
1.3.1. Overview
The PCA method transforms the original dataset into feature vectors and then into
principal components, or eigenfaces.
Figure 4. Eigenfaces in facial recognition
The elbow chart method graphically depicts the relationship between the number of
eigenfaces and the cumulative explained variance ratio. Through observation, we
identify a range of 30-40 eigenfaces as the 'elbow point,' where the increase in
explained variance ratio begins to plateau, indicating a good balance between
information retention and dimensionality reduction. We ultimately chose 30 as the
number of eigenfaces.
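The cumulative explained variance curve behind an elbow chart can be sketched as below. This is a minimal illustration under stated assumptions, not the code used in our experiments: the eigenvalues are hypothetical toy values, and the helper `components_for_threshold` replaces the visual reading of the elbow with a simple variance threshold, which is a different (though related) selection rule.

```python
from itertools import accumulate


def cumulative_explained_variance(eigenvalues):
    """Return the cumulative explained variance ratio, largest eigenvalue first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


def components_for_threshold(eigenvalues, threshold):
    """Smallest number of components whose cumulative ratio reaches the threshold."""
    curve = cumulative_explained_variance(eigenvalues)
    for k, ratio in enumerate(curve, start=1):
        if ratio >= threshold:
            return k
    return len(curve)


# Hypothetical eigenvalues for illustration only (not from the ORL experiment).
toy_eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]
curve = cumulative_explained_variance(toy_eigenvalues)
k = components_for_threshold(toy_eigenvalues, threshold=0.85)
print(curve, k)
```

Plotting `curve` against the component index gives the elbow chart; the point where the curve flattens corresponds to the range we identified by eye.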
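Euclidean:
$$d(P, Q) = \sqrt{(P_1 - Q_1)^2 + (P_2 - Q_2)^2 + \dots + (P_n - Q_n)^2}$$
Manhattan:
$$d(P, Q) = \sum_{i=1}^{n} |P_i - Q_i|$$
Cosine:
$$S_c(A, B) = \cos(\Theta) = \frac{A \cdot B}{|A|\,|B|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$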
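The three measures above can be written directly from their definitions. The sketch below is only illustrative: the function names are our own, and the inputs are assumed to be equal-length sequences of numbers (for example, the eigenface weights of two images).

```python
import math


def euclidean(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(euclidean([0, 0], [3, 4]))          # 5.0
print(manhattan([0, 0], [3, 4]))          # 7
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```

Note that cosine is a similarity rather than a distance: larger values mean closer vectors, so a KNN search would typically rank by one minus this value.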
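Accuracy is one of the most common metrics for evaluating the performance of a
model. It represents the proportion of correctly identified instances out of the total
instances.
Accuracy formula:
$$\text{Accuracy} = \frac{\text{number of correctly identified instances}}{\text{total number of instances}}$$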
Moreover, a first look at the dataset showed that splitting the data often caused
problems such as an imbalance between the training images and the test images.
Therefore, we decided to use K-fold cross-validation to find the average
performance of the distance metrics we use later in different cases.
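A minimal sketch of K-fold cross-validation with accuracy averaged over folds is given below. It assumes a hypothetical `fit_predict(train_x, train_y, test_x)` callback standing in for any classifier, and it uses a plain shuffled split; it is not stratified, so it does not by itself guarantee that every subject is equally represented in each fold.

```python
import random


def k_fold_indices(n_samples, n_splits, seed):
    """Yield (train_indices, test_indices) pairs for a shuffled K-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i in range(n_splits):
        test_idx = folds[i]
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def cross_validated_accuracy(features, labels, fit_predict, n_splits=5, seed=0):
    """Average accuracy of `fit_predict` over all folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), n_splits, seed):
        predictions = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        scores.append(accuracy([labels[i] for i in test_idx], predictions))
    return sum(scores) / len(scores)
```

Changing `seed` changes which images land in each fold, so the averaged score can shift slightly from run to run.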
The KNN model also offers two options for weighting neighbors: Uniform and
Distance.
Weight refers to how much influence each neighbor has on the prediction of a new
data point. The weight assigned to each neighbor affects how its contribution is
considered in the prediction process.
● Uniform: All neighboring points have equal weight in the prediction
process.
● Distance: The contribution of each neighbor is weighted by its distance from
the query point. Closer neighbors have a higher influence on the prediction
compared to farther neighbors.
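The difference between the two weighting options can be sketched with a small voting function. This is an illustration under assumptions rather than a library implementation: the function name is our own, Euclidean distance is used, and the Distance option is modeled as the inverse of the distance, which is one common choice.

```python
import math
from collections import defaultdict


def knn_predict(train_points, train_labels, query, k, weights="uniform"):
    """Predict a label by voting among the k nearest training points."""
    neighbors = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbors:
        if weights == "uniform":
            votes[label] += 1.0  # every neighbor counts the same
        elif weights == "distance":
            if distance == 0.0:
                return label  # an exact match decides the vote
            votes[label] += 1.0 / distance  # closer neighbors count more
        else:
            raise ValueError("weights must be 'uniform' or 'distance'")
    return max(votes, key=votes.get)


# Toy data: one very close "A" point and two farther "B" points.
points = [(0.0, 0.0), (3.0, 0.0), (3.2, 0.0)]
labels = ["A", "B", "B"]
print(knn_predict(points, labels, (1.0, 0.0), k=3, weights="uniform"))   # B
print(knn_predict(points, labels, (1.0, 0.0), k=3, weights="distance"))  # A
```

With uniform weights the two "B" points outvote the single "A" point; with distance weights the much closer "A" point wins.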
In face biometrics, the dataset normally has an inherent structure, because
different images of one individual typically share common characteristics such as
the arrangement of facial features, overall shape, and proportions. Conversely,
faces of different individuals have distinct features and variations that
differentiate them from each other.
The dataset includes 40 labels, one for each of the 40 people. Each person has 10
images with different angles, expressions, and so on. Looking through the dataset,
we found that person number 4 has an imbalance across his 10 images (6 images
with glasses and 4 without). Therefore, we decided to investigate this case further
to gain more insight into how sensitive PCA is to the problem of an imbalanced
dataset.
● The projection of the test image onto the eigenspace still shows blurred
glasses, although the input shows that he is not wearing any.
● The test image's 1st to 3rd closest neighbors normally return the correct
person, wearing glasses.
● The other neighbors return incorrect people.
● The blurred glasses result from how PCA works. PCA aims to capture
the most significant variations across the dataset. Therefore, the overall
facial structure should be preserved rather than the glasses. However,
enough images of one person wearing glasses might lead to the blurred
detail of glasses still being captured in the eigenspace.
● The features extracted from the images with glasses might have higher
variability compared to the images without them.
● The closest neighbors still return the correct person because general
patterns such as hair, nose, and mouth are still recognized across the images
with or without the glasses.
● The difference between the test image and its eigenface might cause a
complex decision boundary. As a result, images of this person not wearing
glasses could be closer in the feature space to images of other people.
Wearing glasses case:
Comments: The overall result for the test case with glasses and closed eyes is
relatively good. All of the 5 closest neighbors return the same person across all
the metrics we tried.
The returned closest neighbors share some details with the test image, such as the
same curly hair, the smiling expression of his mouth, or the monolid eyes.
2.1 Introduction
Furthering this line of research, a subsequent study explored the use of Support
Vector Machine (SVM) as a classifier with PCA as a feature extractor, attaining an
impressive recognition rate of 98.75%. This highlights the robustness of SVMs in
handling high-dimensional data, particularly when combined with efficient
dimensionality reduction through PCA.
In 2019, Pranati et al. continued to leverage the PCA technique for feature
extraction and dimensionality reduction, implementing facial recognition systems
using SVM and achieving a solid accuracy rate of 92% on the ORL database. This
consistency in results reaffirms the reliability of PCA as a preprocessing step in
facial recognition systems.
Kak et al. utilized DWT and PCA for preprocessing and feature extraction,
coupled with the K-Nearest Neighbors (KNN) classifier, achieving an exceptional
99.25% recognition rate on the ORL database. This study exemplifies the potential
of combining powerful feature extraction techniques with effective classifiers like
KNN.
Exploring PCA further, the literature identifies various adaptations of the standard
PCA method, addressing its computational challenges, especially with large
datasets. Variants like Sparse PCA, Randomized PCA, Incremental PCA, Kernel
PCA, and Modular PCA offer solutions that balance computational efficiency and
analytical precision, each tailored to specific types of data challenges.
Incremental PCA is tailored for large datasets that exceed memory capacity,
processing data in batches and updating the PCA model incrementally, ideal for
streaming data scenarios (Ross, Lim, Lin, & Yang, 2008).
Kernel PCA extends PCA to nonlinear data structures using kernel methods,
facilitating the identification of principal components in a higher-dimensional
space, thus enabling the linear separation of complex, nonlinear data patterns
(Schölkopf, Smola, & Müller, 1998).
Modular PCA is useful for the segmentation of a dataset into smaller, more
manageable modules, on which PCA can be independently applied. The primary
objective of Modular PCA is to facilitate efficient computation and improved
interpretability of principal components within large or complex datasets by
leveraging the inherent structure or partitioning of the data (Gottumukkal &
Asari, "An improved face recognition technique based on modular PCA
approach").
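The partitioning step that Modular PCA relies on can be sketched as follows. This covers only the division of an image into non-overlapping sub-images, under the assumption that the image is a list of pixel rows and that its size is divisible by the chosen grid; applying PCA to each block is not shown, and the function name is hypothetical.

```python
def split_into_blocks(image, grid_rows, grid_cols):
    """Split a 2D image (list of rows) into grid_rows x grid_cols sub-images."""
    height, width = len(image), len(image[0])
    if height % grid_rows or width % grid_cols:
        raise ValueError("image size must be divisible by the grid size")
    sub_h, sub_w = height // grid_rows, width // grid_cols
    blocks = []
    for r in range(grid_rows):
        for c in range(grid_cols):
            rows = image[r * sub_h:(r + 1) * sub_h]
            blocks.append([row[c * sub_w:(c + 1) * sub_w] for row in rows])
    return blocks


# Toy 4x4 "image" whose pixel values are 0..15.
toy_image = [[4 * r + c for c in range(4)] for r in range(4)]
blocks = split_into_blocks(toy_image, grid_rows=2, grid_cols=2)
print(blocks[0])  # [[0, 1], [4, 5]]
print(blocks[3])  # [[10, 11], [14, 15]]
```

For a 92x112 ORL image (112 rows of 92 pixels), a 2x2 grid would give four blocks of 56 rows by 46 columns, each of which would then get its own PCA model.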
Logistic Regression: Contrary to what its name might suggest, Logistic Regression
is a linear model used for classification purposes. It employs the logistic function
to estimate probability distributions, offering a straightforward and efficient means
of modeling binary dependent variables. Its widespread application is partly due to
the interpretability of its output, which directly represents the probability, and
therefore the odds, of class membership (Hosmer Jr, Lemeshow, & Sturdivant, 2013).
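The link between the linear model, the logistic function, and the odds can be illustrated with a short sketch. The weights, bias, and feature values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(weights, bias, features):
    """Probability of the positive class from a linear score."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def odds(probability):
    """Odds of the positive class: p / (1 - p)."""
    return probability / (1.0 - probability)


# Hypothetical parameters: linear score z = 0.5 + 2.0*1.0 - 1.0*1.0 = 1.5
p = predict_probability([2.0, -1.0], 0.5, [1.0, 1.0])
print(p, odds(p), math.log(odds(p)))
```

The logarithm of the odds recovers the linear score, which is why each coefficient can be read as the change in log-odds per unit change of its feature.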
Naive Bayes Classifier: The Naive Bayes Classifier is a probabilistic classifier that
applies Bayes' theorem under the assumption that features are independent of each
other. This "naive" assumption simplifies computations and, despite its simplicity,
often results in surprisingly effective classification, especially in high-dimensional
datasets. Its efficiency and straightforward implementation have made it a popular
choice for text classification and spam detection tasks (McCallum & Nigam,
1998).
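Because PCA features are continuous, a Gaussian form of Naive Bayes is a natural fit for this setting. The sketch below is a minimal illustration under that assumption (one independent normal distribution per feature and class, with a small variance floor that we add for numerical safety); the function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The 1e-9 floor keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(samples)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        # Independence assumption: per-feature log likelihoods simply add up.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
            for value, mean, var in zip(x, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=log_posterior)


toy_samples = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
toy_labels = ["a", "a", "b", "b"]
nb_model = fit_gaussian_nb(toy_samples, toy_labels)
print(predict_gaussian_nb(nb_model, [1.1, 0.9]))  # a
print(predict_gaussian_nb(nb_model, [5.1, 4.9]))  # b
```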
2.3 Methodology
Our analysis extended to comparing the accuracy of different PCA methods when
combined with various classifiers. Modular PCA demonstrated superior
performance in most cases, attributed to its ability to better accommodate the
inherent variability in the dataset related to lighting and facial expressions. This
comparative exercise underscored the efficacy of Modular PCA over traditional
PCA and its variants in handling the complexities of facial recognition tasks.
Insights gleaned from visualizing incorrectly predicted cases revealed that certain
facial features (e.g., beards, glasses, bald foreheads) could confound classification
efforts, highlighting areas for further model refinement. Additionally, we explored
the application of our optimized models, particularly SVM combined with Modular
PCA, on a new image class to validate the generalizability and robustness of our
approach.
2.4.1. Results
Figure 11: Accuracy of combinations of PCA variants and machine learning classifiers
Analyzing the chart, it becomes evident that Support Vector Machines (SVM) and
k-Nearest Neighbors (KNN) exhibit superior performance in addressing face
recognition challenges. Following closely are Logistic Regression and Random
Forest, demonstrating commendable results. However, Decision Tree yields
comparatively poor outcomes. Moreover, the utilization of Principal Component
Analysis (PCA) variants proves beneficial in enhancing the performance of
classification algorithms with lower scores, as observed with Modular PCA in
conjunction with Decision Tree and Naive Bayes. However, for SVM or KNN, the
incorporation of PCA variants does not yield discernible differences in
performance, emphasizing the algorithm-specific impact of PCA in face
recognition problems.
This chart shows the scores of the PCA variants combined with the classification
methods. SVM and KNN achieve the highest scores, 0.985 and 0.9775 respectively,
and Random Forest scores above 0.94. Meanwhile, Decision Tree produced poor
results, at only about 0.63.
It can be seen that Modular PCA gives the best results for most classification
methods. A likely explanation lies in the data: it contains 400 images from 40
distinct subjects, and for some subjects the images were taken at different times,
with varying lighting and facial expressions. Standard PCA-based face recognition
is limited in handling variations in pose and illumination. The method, which
considers global information and represents each face with a single set of weights,
struggles when faced with significant changes in pose and illumination. Dividing
face images into smaller regions and calculating weight vectors for each region can
address this issue. By focusing on local information, the weights become more
representative, allowing for better recognition under varying conditions. The
modular PCA approach, which considers individual face regions independently, is
proposed as a solution to improve recognition rates when dealing with pose and
illumination variations.
Note: Because there is no separate test dataset, the data must be divided into
train and test folds, so the scores are only relative and will fluctuate when the
folds change, for example when random_state is changed.
2.4.2. Discussions
Below are some interesting insights from our experiments. Visualizing the
incorrectly predicted cases reveals the cases where most classification methods
give wrong results.
3.1 Introduction
In recent years, the field of Facial Expression Recognition (FER) has seen
substantial growth due to its wide-ranging applications across various domains.
FER systems are pivotal in interpreting human emotions by analyzing facial
features, which can significantly enhance human-computer interaction. This
research paper delves into the intricate mechanisms of FER with a conventional
machine learning approach focusing on interpretability.
The applications of FER are manifold and profoundly impactful in sectors such as
medical, customer service, and political campaigns. In the medical field, FER can
play a crucial role in understanding patient emotions, thereby enabling healthcare
providers to offer better care by recognizing non-verbal cues of discomfort or pain.
This is particularly significant in scenarios where patients are unable to verbally
communicate their feelings. Similarly, in the realm of customer satisfaction, FER
technology can transcend traditional feedback mechanisms by analyzing
customers' facial expressions at the point of interaction. This can provide authentic
insights into customer experiences, distinguishing genuine satisfaction from polite
or courtesy responses. In automatic driver monitoring, FER systems can detect the
driver's fatigue level. Moreover, the application of FER in political campaigns, especially
during presidential debates in the USA, can offer a novel perspective on public
engagement and reaction. By analyzing the facial expressions of the audience,
campaign strategists can gauge the emotional impact of their messages, enabling
them to refine their communication strategies effectively.
3.2 Literature Review
Figure 15: The evolution of facial expression recognition in terms of datasets and methods.
Figure 16: Algorithm analysis of 2D FER Techniques (Huang, Chen, Lv, & Wang, 2019)
Conventional Facial Expression Recognition (FER) approaches rely heavily on
manual feature engineering, requiring meticulous preprocessing, feature
extraction, and classification tailored to specific datasets. These methods are
grounded in three primary steps: image preprocessing to enhance relevant
information and eliminate distractions, feature extraction to distill useful data from
images, and expression classification to categorize facial expressions accurately.
Image Preprocessing:
This initial phase focuses on preparing images for feature extraction by reducing
noise, detecting faces, normalizing scale and grayscale, and applying histogram
equalization to improve image quality. These processes are crucial for mitigating
interference factors like complex backgrounds, light intensity, and occlusion,
which can vary significantly across datasets due to differences in size, color, and
the equipment used for image capture.
Feature Extraction:
Feature extraction is pivotal in translating images into a form more amenable to
classification. Techniques such as Gabor feature extraction, Local Binary Pattern
(LBP), the optical flow method, Haar-like feature extraction, feature point
tracking, and facial landmarks are commonly employed. Each method has its
strengths and weaknesses, with Gabor wavelets offering robustness to texture
transformation and LBP providing efficiency and lower storage demand. However,
challenges like dimensionality and sensitivity to noise can affect their
performance.
Expression Classification:
Figure 17: Survey done by Mollahosseini, A., Hasani, B., & Mahoor, M. H. (2017)
Figure 18: State-of-the-art Algorithms and their performance on the database listed in
Fig. 17
In this paper, we investigate and perform testing and inference on the most recent
version of AffectNet, chosen for its recency and wide coverage, using strictly
conventional machine learning classifiers for interpretability.
3.3 Methodology
a. Face Detection
To initiate the facial expression recognition process, the first step is face detection,
where the goal is to identify and locate faces within an image. This is a crucial
preliminary step before predicting facial landmarks. In Figure (1), a face detection
algorithm is applied using the built-in Dlib function, which provides an object
detector specifically designed for detecting faces in images.
The Dlib face detection function identifies potential face regions in the image and
returns bounding boxes around these regions. These bounding boxes represent the
detected faces and serve as input for the subsequent facial landmark prediction
step.
After successful face detection, the next stage involves predicting facial landmarks
within the identified face regions. Once face detection is complete (Figure (1)), the
built-in Dlib function is used to predict the facial landmark points. The popular
pretrained model, which can be downloaded from the dlib website, is used to
predict 68 landmark points. Yellow dots indicate the predicted facial landmark
positions in Figure (2).
c. Feature Extraction
The most crucial stage in facial expression recognition is calculating the feature
vector that characterizes a person's emotion. It is important to understand how the
facial landmark points relate to one another. This is accomplished by taking the
mean of the landmark coordinates along both axes, which yields a central point
(x_mean, y_mean) near the nose region, considered the center of the face. The
position of every point relative to this center can then be determined by drawing a
line between the center point and each facial landmark. Each resulting line has a
magnitude and a direction (i.e., it is a vector) and contributes to the feature
vector used for both training and classification. The magnitude is the Euclidean
distance between the two points, while the direction is the angle the line forms
with the horizontal reference axis. The feature vector can therefore be summarized
as follows: feature vector = <point1.x, point1.y, magnitude1, direction1, . . .,
point68.x, point68.y, magnitude68, direction68>
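The construction of this feature vector can be sketched as below. It is an illustration under assumptions rather than our exact code: landmarks are taken as a list of (x, y) pairs, the direction is computed with `atan2` and expressed in degrees (the unit and sign convention are our choice), and the function name is hypothetical.

```python
import math


def landmark_feature_vector(landmarks):
    """Build [x, y, magnitude, direction] for each landmark around the mean point."""
    n = len(landmarks)
    x_mean = sum(x for x, _ in landmarks) / n
    y_mean = sum(y for _, y in landmarks) / n
    features = []
    for x, y in landmarks:
        dx, dy = x - x_mean, y - y_mean
        magnitude = math.hypot(dx, dy)  # Euclidean distance to the center
        direction = math.degrees(math.atan2(dy, dx))  # angle to the horizontal axis
        features.extend([x, y, magnitude, direction])
    return features


# Toy example with 4 points instead of 68; their center is (1, 1).
toy_landmarks = [(0, 0), (2, 0), (2, 2), (0, 2)]
toy_features = landmark_feature_vector(toy_landmarks)
print(len(toy_features))  # 16
```

With the 68 landmark points, the same function yields a vector of 68 x 4 = 272 values per face.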
With the preprocessed facial expression data, which includes the detected faces,
predicted facial landmarks, and extracted feature vectors, the next step is to
employ machine learning algorithms for facial expression recognition. The dataset,
derived from the AffectNet dataset and split into training and testing sets in an
80:20 ratio, serves as the basis for training and evaluating the machine learning
models. Subsequently, SVM, Logistic Regression, Random Forest, KNN, and
Naive Bayes algorithms are utilized to classify the emotions. The accuracy metric
is employed to evaluate the performance of each algorithm.
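An 80:20 split of this kind can be sketched as follows. The function name, the fixed seed, and the plain shuffled (non-stratified) split are assumptions made for illustration; they are not a record of the exact procedure used on AffectNet.

```python
import random


def train_test_split(samples, labels, test_ratio=0.2, seed=0):
    """Shuffle indices with a fixed seed, then hold out test_ratio of the data."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(sequence, idx):
        return [sequence[i] for i in idx]

    return (
        pick(samples, train_idx),
        pick(samples, test_idx),
        pick(labels, train_idx),
        pick(labels, test_idx),
    )


# Toy data: 10 feature vectors with placeholder emotion labels.
toy_x = [[float(i)] for i in range(10)]
toy_y = ["happy" if i % 2 else "sad" for i in range(10)]
x_train, x_test, y_train, y_test = train_test_split(toy_x, toy_y)
print(len(x_train), len(x_test))  # 8 2
```

Each classifier is then fitted on the training part only, and accuracy is computed on the held-out 20%.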
Figure 21: Accuracy of various classifier following two feature extraction methods
The table highlights an unexpected finding: Logistic Regression outperforms
Support Vector Machines (SVM) marginally, showcasing notable efficacy despite
its simplicity compared to advanced, multi-layered neural network approaches in
Facial Expression Recognition (FER). This outcome suggests that the
straightforwardness of Logistic Regression does not hinder its performance,
potentially offering a robust, efficient alternative for FER tasks, especially in
scenarios where the complexity of neural networks may not be warranted. Such
insights advocate for a balanced consideration of both sophisticated and simpler
models, depending on the specific characteristics and requirements of the dataset
at hand.
Chapter 4: Conclusion and Future work
First is the integration of more advanced machine learning models and techniques.
Incorporating more sophisticated machine learning algorithms, such as deep
learning models, could significantly improve the system's accuracy and efficiency
in recognizing facial emotions.
Second, the model fails to generalize in labels where data was insufficient,
imbalanced or labeled wrongly from what is expected. Enriching the dataset with a
broader and more diverse range of facial expressions and emotions could enhance
the model's ability to generalize and perform accurately across different
demographics and contexts.
Çarıkçı, M., & Özen, F. (Year). A Face Recognition System Based on Eigenfaces
Method. Haliç University, Electrical and Electronics Engineering Department,
Şişli, Istanbul, Turkey.
Task 2:
Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis.
Journal of Computational and Graphical Statistics, 15(2), 265-286.
Halko, N., Martinsson, P. G., & Tropp, J. A. (2011). Finding structure with
randomness: Probabilistic algorithms for constructing approximate matrix
decompositions. SIAM Review, 53(2), 217-288.
Ross, D. A., Lim, J., Lin, R. S., & Yang, M. H. (2008). Incremental learning for
robust visual tracking. International Journal of Computer Vision, 77(1-3), 125-141.
Schölkopf, B., Smola, A., & Müller, K. R. (1998). Nonlinear component analysis
as a kernel eigenvalue problem. Neural computation, 10(5), 1299-1319.
M. O. Faruqe and M. A. M. Hasan, "Face recognition using PCA and SVM," 2009
3rd International Conference on Anti-counterfeiting, Security, and Identification in
Communication, Hong Kong, China, 2009.
Singhal, Nikita & Ganganwar, Vaishali & Yadav, Menka & Chauhan, Asha &
Jakhar, Mahender & Sharma, Kareena. (2021). Comparative study of machine
learning and deep learning algorithm for face recognition. Jordanian Journal of
Computers and Information Technology.
Task 3:
Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE
Transactions on Information Theory, 13(1), 21-27.
Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic
Regression. John Wiley & Sons.
McCallum, A., & Nigam, K. (1998). A comparison of event models for Naive
Bayes text classification. In AAAI-98 workshop on learning for text categorization
(Vol. 752, pp. 41-48).
Mollahosseini, A., Hasani, B., & Mahoor, M. H. (2017). AffectNet: A database for
facial expression, valence, and arousal computing in the wild. IEEE.