
NATIONAL ECONOMICS UNIVERSITY

FACULTY OF MATHEMATICAL ECONOMICS

Business Analysis

Winter 2023

FACIAL RECOGNITION
DSEB 63 - GROUP 02
Professor: Pham Tuan Minh, PhD

List of members:

Vu Minh An

Nguyen Khanh Huyen

Truong Dinh Viet

Truong Minh Hung


Acknowledgment
We would like to thank our mentor Prof. Pham Tuan Minh for his support, guidance, and
help throughout the project.
Abstract

This research focuses on the advancement of facial recognition technology, a field that
has attracted substantial interest in recent years. Its adoption in consumer devices
illustrates this: Apple's iPhone has moved its security features from password entry to
fingerprint and then to facial identification, which offers a non-intrusive collection
process and exploits the rich structural information of the face. Our study examines
dimensionality reduction techniques, primarily variants of Principal Component Analysis
(PCA), in combination with conventional machine learning classifiers, and evaluates their
effectiveness in facial recognition tasks. Finally, we apply this theoretical groundwork to
the task of Facial Emotion Recognition, which can be helpful across domains, from medicine
to customer service and even politics.

In Chapter 1, we explore various distance metrics used in our PCA implementation to
assess their impact on facial recognition accuracy, aiming to identify the most effective
methodological combinations for this purpose.

Chapter 2 investigates the efficacy of diverse dimensionality reduction methods and their
integration with conventional machine learning classifiers such as Naive Bayes, Support
Vector Machines (SVM), and Logistic Regression. This analysis aims to understand the
potential advantages these combinations offer over the more intuitive distance-based
method, k-nearest neighbors (KNN), that we started with in Chapter 1.

Chapter 3 is the culmination of our research: the development of a real-time human
emotion detection application. This application builds on the insights gained from the
initial chapters and implements machine learning techniques to achieve promising results
in detecting human emotions through facial expressions.

Our work provides a detailed comparison of dimensionality reduction methods and
classifiers. This comparative analysis helped us better understand the nuances of facial
recognition tasks, particularly how different algorithmic strategies and their combinations
can influence the system's accuracy and efficiency. By exploring a range of techniques,
we have illuminated the strengths and limitations of each approach, offering valuable
insights into the most effective practices in the field of facial biometrics and
demonstrating its potential for further research.

Keywords: facial recognition, PCA methods, machine learning, facial emotion recognition
Table of contents
Acknowledgment
Abstract
List of figures
Chapter 1: Principal Component Analysis
1.1 Introduction
1.2 Literature review
1.2.1. Datasets overview
1.2.2 Literature Review
1.3 Methodology
1.3.1. Overview
1.3.2 Feature Extraction using Principal Components Analysis
1.3.3. Distance Calculation
1.4 Results and Discussions
1.4.1. Model evaluation
1.4.2. Test case and insights
Chapter 2: Face Recognition
2.1 Introduction
2.2 Literature Review
2.3 Methodology
2.3.1 Model Evaluation Framework
2.3.2 Preliminary Analysis
2.3.3 Modular PCA Integration
2.3.4 Hyperparameter Tuning
2.3.5 Comparative Analysis
2.3.6 Insight and Further Testing
2.4 Results and Discussions
2.4.1. Results
2.4.2. Discussions
Chapter 3: Human emotion Detection
3.1 Introduction
3.2 Literature Review
3.2.1. Dataset Reviews
3.3 Methodology
3.4 Results and Discussions
Chapter 4: Conclusion and Future work
References
List of figures
Figure 1. Samples from ORL (Our Database of Faces)

Figure 2. Face recognition steps

Figure 3. Dimensional reduction using PCA

Figure 4. Eigenfaces in facial recognition

Figure 5: PCA elbow chart

Figure 6: Visualization of different distance method

Figure 7: Distance method calculation formula

Figure 8: Reported accuracy using different number of eigenfaces

Figure 9: Reporting accuracy score

Figure 10: Original datasets of people label 4

Figure 11: Accuracy of combinations of PCA variants and machine learning classifiers

Figure 12: Accuracy of classifiers following base-PCA implementation

Figure 13: Average accuracy of PCA variants

Figure 14: False prediction case

Figure 15: The evolution of facial expression recognition in terms of datasets and
methods.

Figure 16: Algorithm analysis of 2D FER Techniques (Huang, Chen, Lv, & Wang, 2019)
Figure 17: Survey done by Mollahosseini, A., Hasani, B., & Mahoor, M. H. (2017)

Figure 18: State-of-the-art Algorithms and their performance on the database listed in
Fig. 17

Figure 19: Step-by-step for Facial Emotion Recognition systems

Figure 20: Feature extraction process

Figure 21: Accuracy of various classifier following two feature extraction methods
Chapter 1: Principal Component Analysis

1.1 Introduction

Principal Component Analysis (PCA), despite being developed decades ago,
remains a cornerstone in the field of face recognition. It exemplifies the enduring
relevance of classical algorithms in the era of modern computer vision. PCA’s
ability to reduce dimensionality while preserving vital facial information makes it
particularly effective for the holistic approach, offering a straightforward and
efficient method for deploying face recognition technologies in real-world
applications.

In this chapter, we reimplement and investigate in depth one of the most basic
approaches to face recognition, PCA, and complete the identification phase by
pairing it with distance-based matching using k-nearest neighbors (KNN).

1.2 Literature review

1.2.1. Datasets overview

The ORL Database of Faces, also known as the AT&T Database of Faces,
comprises 400 images spanning 40 subjects. For certain subjects, the images capture
variations in lighting, facial expressions, and accessories such as glasses. The
images were taken against a dark background with each subject in an upright
position, allowing for slight side movement. Images are 92x112 pixels, in 256 gray
levels (Figure 1). This diversity and structure make the dataset a fundamental
resource for facial recognition research, emphasizing its versatility in different
conditions (AT&T Laboratories Cambridge, 2001).
Figure 1. Samples from ORL (Our Database of Faces)

1.2.2 Literature Review

Facial recognition technology, particularly work based on machine learning
algorithms and Principal Component Analysis (PCA), has evolved considerably,
with a wide range of methodologies applied over recent years.
A large body of research already exists in this area. Kanade built the first
automated system in 1973, and Sirovich and Kirby introduced the concept of
eigenfaces in 1987, using principal component analysis (PCA) to search for a
reduced-dimensional representation of face images. This concept was further
developed by Turk and Pentland in 1991 and has since proved to be one of the
most commonly used algorithms for face recognition.

One related study is the journal article entitled "Face Identification Based on
K-Nearest Neighbor" by Wirdiani and Hridayami in Indonesia. The proposed facial
recognition process uses PCA for feature extraction and K-nearest neighbors (KNN)
for face identification. It is restricted to 30 people, with 3 images per person
used for training and 2 images used for testing. Across several values of k, the
best score (reported as accuracy or F1-score) was 81% at k = 1, compared with 53%
at k = 2 and 47% at k = 3.

1.3 Methodology

1.3.1. Overview

Prior to discussing the implementation of facial recognition technology, it is
essential to outline the preliminary steps involved in the process:

● Face Detection: Localize the human faces in a given image.
● Face Extraction: Extract features from the face images after detection.
● Face Recognition: Compare the extracted features with the known faces in the
dataset.

Figure 2. Face recognition steps


In this research, we mainly use the PCA method for face extraction and
distance-based methods to recognize the face.

1.3.2 Feature Extraction using Principal Components Analysis

Principal Components Analysis (PCA) is a machine learning algorithm for dimensionality
reduction: it transforms the original dataset into a new, smaller set of variables
for prediction purposes. In face recognition it performs relatively well and
accurately thanks to the following strengths:

● Handling high-dimensional datasets: A high-resolution picture is mapped to a
lower-dimensional representation for calculation purposes.
● Capturing general patterns in a small number of dimensions: PCA captures and
stores the basic human features: eyes, nose, mouth, ears, hair, …
● Reducing the model's sensitivity to noise: Face images often contain nuisance
variation such as lighting conditions, pose changes, and facial expressions,
which can lead to overfitting of the model.

Figure 3. Dimensional reduction using PCA

The PCA method first converts each image in the dataset into a feature vector and
then derives the principal components of those vectors, which are called eigenfaces.
Figure 4. Eigenfaces in facial recognition

The primary objective of utilizing eigenfaces is to maintain maximal information
content while reducing data dimensionality, emphasizing the importance of
preserving the dataset's distribution. Consequently, selecting an appropriate
number of eigenfaces is crucial to conserve as much variance as possible, which
represents the dataset's distribution.

To accurately and effectively determine the optimal number of eigenfaces, our
team employs two methodologies: the Elbow Chart method and Testing.

The Elbow Chart method uses an elbow chart to graphically depict the
relationship between the number of eigenfaces and the cumulative explained
variance ratio. By inspection, we identify a range of 30-40 eigenfaces as the
'elbow point', where the increase in explained variance ratio begins to plateau,
indicating a good balance between information retention and dimensionality
reduction. We ultimately chose 30 as the number of eigenfaces.

Figure 5: PCA elbow chart
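
The elbow reading can be cross-checked numerically. The sketch below is a minimal illustration, not our actual implementation: it computes the cumulative explained variance ratio with NumPy and returns the smallest number of components that reaches a chosen threshold. The synthetic matrix X, the helper names, and the 0.95 threshold are assumptions made for the example; with real data, each row of X would be one flattened face image.

```python
import numpy as np

def cumulative_explained_variance(X):
    """Cumulative explained-variance ratio of the principal components of X.

    X has shape (n_samples, n_features); each row is one flattened image.
    """
    X_centered = X - X.mean(axis=0)  # subtract the mean face
    # Squared singular values of the centered data are proportional
    # to the variance carried by each principal component.
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    variances = singular_values ** 2 / (X.shape[0] - 1)
    return np.cumsum(variances) / variances.sum()

def components_for_threshold(cum_ratio, threshold):
    """Smallest number of components whose cumulative ratio reaches threshold."""
    return int(np.searchsorted(cum_ratio, threshold) + 1)

# Synthetic stand-in (NOT the ORL images): 400 samples, 200 features,
# generated from 10 latent directions plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(400, 10))
mixing = rng.normal(size=(10, 200))
X = latent @ mixing + 0.05 * rng.normal(size=(400, 200))

cum_ratio = cumulative_explained_variance(X)
k = components_for_threshold(cum_ratio, 0.95)  # 0.95 is an illustrative threshold
print(k, round(float(cum_ratio[k - 1]), 3))
```

Plotting cum_ratio against the component index gives the elbow chart; a fixed threshold is only one possible way to turn the visual elbow into a number.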

1.3.3. Distance Calculation

Following extensive investigation, we have identified several methods suitable for
calculating distances, including Euclidean, Manhattan, Cosine, and Mahalanobis,
among others.

Figure 6: Visualization of different distance method


Given the objectives of our study, which focuses on reimplementing a facial
recognition system, we have chosen to implement the K-Nearest Neighbors
(KNN) model. This model ships with predefined distance metrics, which spares us
from hand-coding them and reduces the risk of calculation errors.

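Distance calculation method    Formula

Euclidean    $d(P, Q) = \sqrt{(P_1 - Q_1)^2 + (P_2 - Q_2)^2 + \dots + (P_n - Q_n)^2}$

Manhattan    $d(P, Q) = \sum_{i=1}^{n} |P_i - Q_i|$

Cosine       $S_c(A, B) = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$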

Figure 7: Distance method calculation formula
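
As a quick sanity check of these formulas, the following sketch implements the three measures with NumPy. The function names and the two sample vectors are illustrative assumptions; in the KNN experiments the metrics come from the library rather than from hand-written code.

```python
import numpy as np

def euclidean(p, q):
    """Square root of the summed squared coordinate differences."""
    return float(np.sqrt(np.sum((p - q) ** 2)))

def manhattan(p, q):
    """Sum of the absolute coordinate differences."""
    return float(np.sum(np.abs(p - q)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])

print(euclidean(p, q))          # 5.0
print(manhattan(p, q))          # 7.0
print(cosine_similarity(p, q))  # similarity, not a distance
```

Note that the cosine formula is a similarity: a distance-based classifier typically uses 1 minus this value, so that identical directions give a distance of 0.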

Additionally, we ran experiments with 20 to 180 eigenfaces and computed the
accuracy score for each setting. The results confirm that the elbow chart helped
us choose a reasonably good number of eigenfaces.
Figure 8: Reported accuracy using different number of eigenfaces

1.4 Results and Discussions

1.4.1. Model evaluation

Accuracy is one of the most common metrics in evaluating the performance of the
model. It represents the proportion of correctly identified instances out of the total
instances.

Accuracy formula:

                Predicted = Yes    Predicted = No

Actual = Yes    True Positive      False Negative

Actual = No     False Positive     True Negative

$$\text{Accuracy Score} = \frac{\text{True Positive} + \text{True Negative}}{\text{Total sample size}}$$
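
With 40 subjects the task is multi-class, and the formula reduces to the share of images assigned to the correct subject. The sketch below shows that computation in plain Python; the function name and the labels are made up for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical subject labels for five test images.
y_true = [4, 4, 7, 12, 12]
y_pred = [4, 9, 7, 12, 12]
print(accuracy(y_true, y_pred))  # 0.8
```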

Moreover, a first look at the dataset showed that a single train/test split often
runs into problems, such as an imbalance between the training and test images.
We therefore used K-fold cross-validation to estimate the average performance of
each metric that we later use to measure distance in the different cases.

Figure 9: Reporting accuracy score
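
A cross-validated comparison of this kind could be set up as in the sketch below, assuming scikit-learn. The data are a synthetic stand-in (40 classes with 10 samples each), not the ORL images, and n_neighbors=5, the 5 folds, and the seeds are illustrative assumptions rather than our reported settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the face data: 40 "subjects" x 10 samples each.
rng = np.random.default_rng(0)
centers = rng.normal(scale=3.0, size=(40, 100))
y = np.repeat(np.arange(40), 10)
X = centers[y] + rng.normal(size=(400, 100))

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for metric in ("euclidean", "manhattan", "cosine"):
    for weights in ("uniform", "distance"):
        # PCA is refit inside every fold, so the test fold never
        # influences the eigenfaces it is projected onto.
        model = make_pipeline(
            PCA(n_components=30),
            KNeighborsClassifier(n_neighbors=5, metric=metric, weights=weights),
        )
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[(metric, weights)] = float(scores.mean())

for combo, mean_acc in results.items():
    print(combo, round(mean_acc, 3))
```

Wrapping PCA and KNN in one pipeline is a design choice of this sketch: it keeps the dimensionality reduction inside the cross-validation loop.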

The KNN model also offers two options for weighting the neighbors: Uniform and
Distance.

Weight refers to how much influence each neighbor has on the prediction of a new
data point. The weight assigned to each neighbor affects how its contribution is
considered in the prediction process.
● Uniform: All neighboring points will have equal weight in the prediction
process
● Distance: the contribution of each neighbor is weighted by its distance from
the query point. Closer neighbors have a higher influence on the prediction
compared to farther neighbors.
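
The difference between the two options can be seen on a toy example. The sketch below is a simplified stand-in for what a KNN library does internally: the function name and the neighbor list are hypothetical, and the small constant that guards against division by zero is a shortcut, not how a library necessarily treats exact matches.

```python
from collections import defaultdict

def knn_vote(neighbors, weights="uniform"):
    """Pick a label from (label, distance) pairs of the k nearest images."""
    scores = defaultdict(float)
    for label, distance in neighbors:
        if weights == "uniform":
            scores[label] += 1.0                       # one vote per neighbor
        elif weights == "distance":
            scores[label] += 1.0 / (distance + 1e-12)  # closer counts more
        else:
            raise ValueError("weights must be 'uniform' or 'distance'")
    return max(scores, key=scores.get)

# One very close neighbor of person A, two farther neighbors of person B.
neighbors = [("A", 0.5), ("B", 2.0), ("B", 2.5)]
print(knn_vote(neighbors, "uniform"))   # B
print(knn_vote(neighbors, "distance"))  # A
```

Under uniform weights the two distant neighbors outvote the close one, while under distance weights the single close neighbor dominates.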

In face biometrics, the dataset normally has an inherent structure. Different
images of one individual typically share common characteristics such as the
arrangement of facial features, overall shape, and proportions. Conversely, faces
of different individuals have distinct features and variations that differentiate
them from each other.

Therefore, in the task of Face Identification, where the goal is to identify
individuals based on facial features, the similarity between faces depends on
the closeness of their feature representations in the feature space. Distance
weighting is thus well suited to our task, since it prioritizes the influence of
nearby faces, which are more likely to be similar. Consistent with this, after
cross-validating on the dataset, the average performance of the methods with
distance weighting is better than with uniform weighting.

1.4.2. Test case and insights

Test case: the person with label 4


Figure 10: Original datasets of people label 4

The dataset includes 40 labels, one per person. Each person has 10 images with
different angles, expressions, and so on. Looking through the dataset, we found
that person number 4 has an imbalance among his 10 images (6 images with glasses
and 4 without). We therefore investigated this case further to gain insight into
how sensitive PCA is to an imbalanced dataset.

Not wearing glasses case:

Test image Eigenface


Euclidean Uniform output:

Manhattan Uniform output:

Cosine uniform output:


Comments:

● The eigenface of the test image projected onto the eigenspace still shows
blurred glasses, although the person in the input image is not wearing any.
● The one to three closest neighbors of the test image are usually the correct
person, wearing glasses.
● The remaining neighbors belong to other people.

Reason and insight:

● The blurred glasses are a consequence of how PCA works. PCA aims to capture
the most significant variations across the dataset, so it should preserve the
overall facial structure rather than the glasses. However, when enough images
of a person show glasses, a blurred trace of the glasses can still be
captured in the eigenspace.
● The features extracted from the images with glasses might have higher
variability than those from the images without.
● The closest neighbors are still the correct person because general patterns
such as hair, nose, and mouth are recognized across the images with or
without glasses.
● The difference between the test image and its eigenface might lead to a
complex decision boundary. As a result, images of this person without
glasses could be closer in the feature space to images of other people.
Wearing glasses case:

Test image Eigenface

Euclidean Uniform output:

Manhattan Uniform output:


Cosine uniform output:

Comments: The result for the test image with glasses and closed eyes is
relatively good. All 5 closest neighbors are the same person under every metric
we tried.

Testing image not from datasets:


Comments:

The closest neighbors returned share some details with the test image. Most of
them have the same curly hair, a similar smiling mouth, or monolid eyes.

This suggests that PCA helps to preserve the general facial pattern.


Chapter 2: Face Recognition

2.1 Introduction

In Chapter 1, we tackled facial recognition using a simple distance-based
approach. Building on this, the present chapter explores machine learning
classification techniques, employing StratifiedKFold for dataset division and
accuracy for performance evaluation. Preliminary results highlighted the
effectiveness of PCA with 30 components across various classifiers, where SVM
and KNN notably excelled. After tuning, SVM, Logistic Regression, and KNN
showed improvements, emphasizing the importance of hyperparameter
optimization. Insights from incorrect predictions revealed challenges in
distinguishing similar facial features.

2.2 Literature Review

Facial recognition technology, particularly focusing on machine learning
algorithms and Principal Component Analysis (PCA), demonstrates a significant
evolution and application of various methodologies over recent years.

Starting with the advancements in 2015, Vaishali et al. utilized a combination of
PCA, Discrete Cosine Transform (DCT), and Discrete Wavelet Transform (DWT)
for preprocessing and feature vector extraction on the ORL database, achieving a
notable accuracy of 96%. This study underscored the effectiveness of integrating
multiple preprocessing techniques to enhance facial recognition accuracy.

Furthering this line of research, a subsequent study explored the use of Support
Vector Machine (SVM) as a classifier with PCA as a feature extractor, attaining an
impressive recognition rate of 98.75%. This highlights the robustness of SVMs in
handling high-dimensional data, particularly when combined with efficient
dimensionality reduction through PCA.
In 2019, Pranati et al. continued to leverage the PCA technique for feature
extraction and dimensionality reduction, implementing facial recognition systems
using SVM and achieving a solid accuracy rate of 92% on the ORL database. This
consistency in results reaffirms the reliability of PCA as a preprocessing step in
facial recognition systems.

By 2021, Muhammad et al. conducted a comparative study of four traditional
machine learning algorithms - PCA, 1-Nearest Neighbor (1-NN), Linear
Discriminant Analysis (LDA), and SVM - using the ORL database. Their
evaluation, based on a 5-fold cross-validation technique, demonstrated that
systems based on these algorithms could achieve high accuracy rates, with SVM
leading at 98%.

Kak et al. utilized DWT and PCA for preprocessing and feature extraction,
coupled with the K-Nearest Neighbors (KNN) classifier, achieving an exceptional
99.25% recognition rate on the ORL database. This study exemplifies the potential
of combining powerful feature extraction techniques with effective classifiers like
KNN.

a) Principal Component Analysis

Exploring PCA further, the literature identifies various adaptations of the standard
PCA method, addressing its computational challenges, especially with large
datasets. Variants like Sparse PCA, Randomized PCA, Incremental PCA, Kernel
PCA, and Modular PCA offer solutions that balance computational efficiency and
analytical precision, each tailored to specific types of data challenges.

Sparse Principal Component Analysis (Sparse PCA) refines conventional PCA by
introducing sparsity in the principal component loadings, enhancing
interpretability and relevance in fields like finance, genomics, and statistics. This
method incorporates a sparsity-inducing regularization, akin to LASSO,
facilitating the extraction of more meaningful components (Zou, Hastie, &
Tibshirani, 2006).

Randomized PCA offers an efficient approximation of principal components,
particularly useful for large datasets. It employs stochastic algorithms to estimate
the dominant singular values and vectors quickly, significantly reducing
computation time (Halko, Martinsson, & Tropp, 2011).

Incremental PCA is tailored for large datasets that exceed memory capacity,
processing data in batches and updating the PCA model incrementally, ideal for
streaming data scenarios (Ross, Lim, Lin, & Yang, 2008).

Kernel PCA extends PCA to nonlinear data structures using kernel methods,
facilitating the identification of principal components in a higher-dimensional
space, thus enabling the linear separation of complex, nonlinear data patterns
(Schölkopf, Smola, & Müller, 1998).

Modular PCA segments a dataset into smaller, more manageable modules, to each
of which PCA can be applied independently. The primary objective of Modular
PCA is to facilitate efficient computation and improved interpretability of
principal components within large or complex datasets by leveraging the inherent
structure or partitioning of the data (Gottumukkal & Asari, "An improved face
recognition technique based on modular PCA approach").

b) Machine Learning Classifier

Support Vector Classifier (SVC): The Support Vector Classifier (SVC) is
recognized for its superior performance in high-dimensional spaces, making it
particularly effective for datasets with a large number of features. Its versatility in
handling different types of data is attributed to the use of kernel functions, which
enable the algorithm to operate in a transformed feature space. This capability
allows SVC to achieve high accuracy across a variety of tasks (Cortes & Vapnik,
1995).

Random Forest Classifier: The Random Forest Classifier employs an ensemble
learning method that constructs numerous decision trees at training time. By
taking the mode of the classes predicted by individual trees, it offers robust
classification outcomes. This method is celebrated for its ability to reduce
overfitting and improve model generalizability, making it effective across a broad
range of applications (Breiman, 2001).

K-Nearest Neighbors (KNN): KNN is a non-parametric method that classifies an
instance by the labels of its closest neighbors in the feature space. This
simplicity and the method's reliance on local information make it highly adaptable
to specific classification problems. However, its performance is heavily dependent
on the choice of distance metric and the value of k, the number of neighbors
considered (Cover & Hart, 1967).

Logistic Regression: Contrary to what its name might suggest, Logistic Regression
is a linear model used for classification purposes. It employs the logistic function
to estimate class probabilities, offering a straightforward and efficient means
of modeling binary dependent variables. Its widespread application is partly due to
the interpretability of its output, which directly represents the probability of
class membership (Hosmer Jr, Lemeshow, & Sturdivant, 2013).

Naive Bayes Classifier: The Naive Bayes Classifier is a probabilistic classifier that
applies Bayes' theorem under the assumption that features are independent of each
other given the class. This "naive" assumption simplifies computations and, despite
its simplicity, often results in surprisingly effective classification, especially in
high-dimensional datasets. Its efficiency and straightforward implementation have
made it a popular choice for text classification and spam detection tasks
(McCallum & Nigam, 1998).

2.3 Methodology

2.3.1 Model Evaluation Framework

Our evaluation framework is predicated on established metrics of accuracy, as
detailed in Section 1.4.1 of our paper. These metrics serve as the cornerstone for
assessing the effectiveness of our classification models prior to and following the
optimization process.

2.3.2 Preliminary Analysis

Initially, we applied PCA with a fixed number of components (n_components = 30)
to reduce the dimensionality of our dataset, which comprises 400 images
across 40 distinct subjects. Subsequent classification was performed using a suite
of algorithms: Support Vector Machine (SVM), Decision Tree, Naive Bayes,
Random Forest, Logistic Regression, and K-Nearest Neighbors (KNN). The
performance of these classifiers was evaluated based on accuracy.
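
The preliminary comparison can be outlined as follows, assuming scikit-learn. This is a sketch under stated assumptions, not our experiment code: the data are a synthetic stand-in rather than the ORL images, the classifiers use library defaults, GaussianNB is an assumed choice of Naive Bayes variant, and the 5 folds and seeds are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 40 "subjects" x 10 samples, 100 features each.
rng = np.random.default_rng(0)
centers = rng.normal(scale=3.0, size=(40, 100))
y = np.repeat(np.arange(40), 10)
X = centers[y] + rng.normal(size=(400, 100))

classifiers = {
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
}

# Stratified folds keep the same number of images per subject in each fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
mean_accuracy = {}
for name, clf in classifiers.items():
    model = make_pipeline(PCA(n_components=30), clf)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    mean_accuracy[name] = float(scores.mean())

for name, acc in mean_accuracy.items():
    print(f"{name}: {acc:.3f}")
```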

2.3.3 Modular PCA Integration

The incorporation of Modular PCA was motivated by its potential to enhance
classification outcomes by addressing variations in pose and illumination, a noted
limitation of traditional PCA in facial recognition contexts. Modular PCA, by
dividing facial images into smaller, independently analyzed regions, allows for a
more nuanced representation and recognition process, effectively mitigating the
challenges posed by global information processing.
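
One way to read this description is sketched below with NumPy: each image is cut into a grid of regions, PCA is fitted per region, and the per-region weights are concatenated into one feature vector. This is a simplified variant written for illustration. The helper names, the 2x2 grid, and the 5 components per region are assumptions, and published formulations of modular PCA may differ in detail, for example in how the sub-images are pooled.

```python
import numpy as np

def split_into_blocks(images, rows, cols):
    """Cut images of shape (n, H, W) into rows*cols flattened regions.

    Returns an array of shape (rows*cols, n, block_h*block_w).
    """
    n, h, w = images.shape
    bh, bw = h // rows, w // cols
    blocks = []
    for r in range(rows):
        for c in range(cols):
            block = images[:, r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            blocks.append(block.reshape(n, -1))
    return np.stack(blocks)

def fit_block_pca(block, n_components):
    """Mean and leading principal directions of one region."""
    mean = block.mean(axis=0)
    _, _, vt = np.linalg.svd(block - mean, full_matrices=False)
    return mean, vt[:n_components]

def modular_pca_features(images, rows=2, cols=2, n_components=5):
    """Concatenate the per-region PCA weights of every image."""
    features = []
    for block in split_into_blocks(images, rows, cols):
        mean, components = fit_block_pca(block, n_components)
        features.append((block - mean) @ components.T)
    return np.hstack(features)

# Random stand-in images with the ORL resolution (112 rows x 92 columns).
rng = np.random.default_rng(0)
images = rng.random((20, 112, 92))
features = modular_pca_features(images)
print(features.shape)  # (20, 20): 4 regions x 5 weights per image
```

For brevity the sketch fits and transforms the same images; in an evaluation, the region means and components would be fitted on the training fold only and then applied to the test fold.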

2.3.4 Hyperparameter Tuning

Following the identification of high-performing classifiers (SVM, Logistic
Regression, KNN), we embarked on a hyperparameter tuning exercise to refine
their performance further. This process involved adjusting parameters such as 'C',
'gamma', and 'kernel' for SVM; 'C', 'max_iter', 'penalty', and 'solver' for Logistic
Regression; and 'n_neighbors', 'p', and 'weights' for KNN, leading to marginal yet
noteworthy improvements in their respective accuracies.
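
A tuning run over the SVM parameters named above might look like the following sketch, assuming scikit-learn's GridSearchCV. The search tool, the grid values, and the synthetic data are assumptions made for illustration; they are not the settings behind our reported results.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Small synthetic stand-in: 10 classes x 10 samples, 60 features.
rng = np.random.default_rng(0)
centers = rng.normal(scale=3.0, size=(10, 60))
y = np.repeat(np.arange(10), 10)
X = centers[y] + rng.normal(size=(100, 60))

pipeline = Pipeline([("pca", PCA(n_components=30)), ("clf", SVC())])

# The "clf__" prefix routes each parameter to the SVC step of the pipeline.
param_grid = {
    "clf__C": [0.1, 1, 10],
    "clf__gamma": ["scale", 0.01],
    "clf__kernel": ["linear", "rbf"],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(round(float(search.best_score_), 3))
```

The same pattern applies to Logistic Regression and KNN by swapping the final pipeline step and the parameter grid.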

2.3.5 Comparative Analysis

Our analysis extended to comparing the accuracy of different PCA methods when
combined with various classifiers. Modular PCA demonstrated superior
performance in most cases, attributed to its ability to better accommodate the
inherent variability in the dataset related to lighting and facial expressions. This
comparative exercise underscored the efficacy of Modular PCA over traditional
PCA and its variants in handling the complexities of facial recognition tasks.

2.3.6 Insight and Further Testing

Insights gleaned from visualizing incorrectly predicted cases revealed that certain
facial features (e.g., beards, glasses, bald foreheads) could confound classification
efforts, highlighting areas for further model refinement. Additionally, we explored
the application of our optimized models, particularly SVM combined with Modular
PCA, on a new image class to validate the generalizability and robustness of our
approach.

2.4 Results and Discussions

2.4.1. Results

Prior to optimization, we utilized some variants of Principal Component Analysis
(PCA) with n_components = 30 and subsequently tested various classification
algorithms, including Support Vector Machine (SVM), Decision Tree, Naive
Bayes, Random Forest, Logistic Regression, and K-Nearest Neighbors (KNN).
The accuracy metric is employed to evaluate the performance of each algorithm.

Figure 11: Accuracy of combinations of PCA variants and machine learning classifiers

Analyzing the chart, it becomes evident that Support Vector Machines (SVM) and
k-Nearest Neighbors (KNN) exhibit superior performance in addressing face
recognition challenges. Following closely are Logistic Regression and Random
Forest, demonstrating commendable results. However, Decision Tree yields
comparatively poor outcomes. Moreover, PCA variants prove beneficial for the
classification algorithms with lower scores, as observed with Modular PCA in
conjunction with Decision Tree and Naive Bayes. For SVM and KNN, however, the
PCA variants do not yield discernible differences in performance, which shows
that the impact of the PCA variant in face recognition depends on the classifier.

Figure 12: Accuracy of classifiers following base-PCA implementation

This chart shows the accuracy of base PCA combined with each classification
method. SVM and KNN achieve the highest scores, 0.985 and 0.9775 respectively,
and Random Forest scores above 0.94. Decision Tree, by contrast, performs poorly
at only about 0.63.

We also compared the accuracy of the other PCA variants combined with the
classification methods.
Figure 13: Average accuracy of PCA variants

It can be seen that the Modular PCA method gives the best results for most
classification methods. This can be explained by the nature of the data: the 400
images of 40 distinct subjects were, for some subjects, taken at different times
with varying lighting and facial expressions, whereas PCA (Principal Component
Analysis) based face recognition is limited in handling variations in pose and
illumination. Standard PCA considers global information and represents each
face with a single set of weights, so it struggles when faced with significant
changes in pose and illumination. Dividing face images into smaller regions and
calculating weight vectors for each region can address this issue. By focusing on
local information, the weights become more representative, allowing for better
recognition under varying conditions. The modular PCA approach, which considers
individual face regions independently, is therefore proposed as a solution to
improve recognition rates when dealing with pose and illumination variations.

Note: Because there is no separate test dataset, we split the data into train and
test folds, so the scores are only relative and will fluctuate when random_state
changes and the data are divided into different folds.

2.4.2. Discussions

Our experiments produced some interesting insights. By visualizing the cases
that were incorrectly predicted, we identified the cases for which most
classification methods give wrong results.

Figure 14: False prediction case


For example, comparing persons 28 and 37 with the naked eye shows many
similarities between the two faces, such as beards, glasses, and bald foreheads,
so the model confuses them in its predictions.
Chapter 3: Human emotion Detection

3.1 Introduction

In recent years, the field of Facial Expression Recognition (FER) has seen
substantial growth due to its wide-ranging applications across various domains.
FER systems are pivotal in interpreting human emotions by analyzing facial
features, which can significantly enhance human-computer interaction. This
research paper delves into the intricate mechanisms of FER with a conventional
machine learning approach focusing on interpretability.

The applications of FER are manifold and profoundly impactful in sectors such as
medical, customer service, and political campaigns. In the medical field, FER can
play a crucial role in understanding patient emotions, thereby enabling healthcare
providers to offer better care by recognizing non-verbal cues of discomfort or pain.
This is particularly significant in scenarios where patients are unable to verbally
communicate their feelings. Similarly, in the realm of customer satisfaction, FER
technology can transcend traditional feedback mechanisms by analyzing
customers' facial expressions at the point of interaction. This can provide authentic
insights into customer experiences, distinguishing genuine satisfaction from polite
or courteous responses. In automatic driver monitoring, FER systems can detect the
driver's level of fatigue. Moreover, the application of FER in political campaigns, especially
during presidential debates in the USA, can offer a novel perspective on public
engagement and reaction. By analyzing the facial expressions of the audience,
campaign strategists can gauge the emotional impact of their messages, enabling
them to refine their communication strategies effectively.
3.2 Literature Review

Figure 15: The evolution of facial expression recognition in terms of datasets and methods.
Figure 16: Algorithm analysis of 2D FER Techniques (Huang, Chen, Lv, & Wang, 2019)
Conventional Facial Expression Recognition (FER) approaches rely heavily on
manual feature engineering, requiring meticulous preprocessing, feature
extraction, and classification tailored to specific datasets. These methods are
grounded in three primary steps: image preprocessing to enhance relevant
information and eliminate distractions, feature extraction to distill useful data from
images, and expression classification to categorize facial expressions accurately.

Image Preprocessing:

This initial phase focuses on preparing images for feature extraction by reducing
noise, detecting faces, normalizing scale and grayscale, and applying histogram
equalization to improve image quality. These processes are crucial for mitigating
interference factors like complex backgrounds, light intensity, and occlusion,
which can vary significantly across datasets due to differences in size, color, and
the equipment used for image capture.
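
As an illustration of one of the preprocessing steps above, histogram equalization can be sketched in a few lines. This is a minimal, pure-Python sketch and not the implementation used in this project: the function name `equalize_histogram`, the list-of-rows image representation, and the 256-level assumption are ours.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image given as rows of ints in [0, levels)."""
    # Count how often each gray level occurs.
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1

    # Cumulative distribution of gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is no contrast to spread out.
        return [row[:] for row in image]

    # Map each level so that the occupied levels cover the full output range.
    scale = (levels - 1) / (total - cdf_min)
    lookup = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lookup[v] for v in row] for row in image]
```

A low-contrast patch such as `[[10, 20], [30, 40]]` is stretched to the full 0 to 255 range, which is the effect that makes facial features easier to extract under uneven lighting.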

Feature Extraction:

Feature extraction is pivotal in translating images into a form more amenable to
classification. Techniques such as Gabor feature extraction, Local Binary Pattern
(LBP), the optical flow method, Haar-like feature extraction, feature point
tracking, and facial landmarks are commonly employed. Each method has its
strengths and weaknesses, with Gabor wavelets offering robustness to texture
transformation and LBP providing efficiency and lower storage demand. However,
challenges like high dimensionality and sensitivity to noise can affect their
performance.

Specific Techniques Highlighted:

● Gabor Feature Extraction excels in handling multi-scale and directional
texture features but may require significant memory for global features.
● Local Binary Pattern (LBP) is efficient and requires less storage but might
miss critical feature information due to its focus on local pixel features.
● Active Shape Model (ASM) and Active Appearance Model (AAM) offer
precise feature point extraction on expression contours, with AAM
incorporating local texture features for better fitting.
● Optical Flow Method captures the motion in facial expressions, identifying
the direction and magnitude of movements.
● Haar-like Feature Extraction focuses on local grayscale variations, effective
in stable illumination conditions.
● Feature Point Tracking synthesizes expressions based on the displacement
of feature points, aiding in dynamic expression recognition.
● Facial Landmarks: Identifying and using key facial landmarks, such as the
eyes, nose, and mouth, can be a powerful feature-based approach. The
relative positions and distances between these landmarks are used for
recognition.
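
To make the "local pixel features" behind LBP concrete, the sketch below computes the basic 8-neighbour LBP code and its histogram. It is an illustrative sketch under our own assumptions: the function names, the clockwise bit ordering starting at the top-left neighbour, and the `>=` comparison are one common convention, not something specified in the surveyed work.

```python
def lbp_code(image, r, c):
    """8-neighbour LBP code for the interior pixel at row r, column c."""
    center = image[r][c]
    # Neighbours visited clockwise, starting from the top-left corner (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the center sets its bit.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

Each pixel is described only by comparisons with its immediate neighbours, which is why LBP is cheap to compute and store, and also why it can miss information that spans a larger region of the face.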

Expression Classification:

Selecting an appropriate classifier is crucial for predicting facial expressions.
Widely used classifiers include k-Nearest Neighbours (kNN), Support Vector
Machine (SVM), Adaptive Boosting (AdaBoost), Bayesian classifiers, Sparse
Representation-based Classifier (SRC), and Probabilistic Neural Network (PNN).
Each classifier has its specific advantages and challenges, such as SVM's ability to
handle high-dimensional data and AdaBoost's sensitivity to noisy data. The choice
of classifier affects the FER system's overall accuracy and efficiency.

Conventional FER approaches, while less dependent on extensive data and
advanced hardware, are limited by the need for manual optimization of feature
extraction and classification methods. This separation prevents simultaneous
optimization of these phases, potentially capping the effectiveness of the FER
method to the performance of individual components.
3.2.1. Dataset Reviews

Figure 17: Survey done by Mollahosseini, A., Hasani, B., & Mahoor, M. H. (2017)
Figure 18: State-of-the-art algorithms and their performance on the databases listed in
Figure 17

In this paper we train and evaluate our models on the most recent version of
AffectNet, chosen for its recency and wide coverage, and we restrict ourselves to
conventional machine learning classifiers for the sake of interpretability.

3.3 Methodology

Figure 19: Step-by-step for Facial Emotion Recognition systems

a. Face Detection

To initiate the facial expression recognition process, the first step is face detection,
where the goal is to identify and locate faces within an image. This is a crucial
preliminary step before predicting facial landmarks. As shown in Figure (1), face
detection is performed with Dlib's built-in face detector, an object detector
specifically designed for detecting faces in images.

The Dlib face detection function identifies potential face regions in the image and
returns bounding boxes around these regions. These bounding boxes represent the
detected faces and serve as input for the subsequent facial landmark prediction
step.

Figure 20: Feature Extraction process

b. Facial Landmark Prediction

After successful face detection, the next stage involves predicting facial landmarks
within the identified face regions. Once face detection is complete (Figure (1)),
the built-in landmark prediction function of Dlib is used to predict the facial
landmark points. The popular pretrained model, which can be downloaded from the
dlib website, is used to predict 68 landmark points. The yellow dots in Figure (2)
indicate the predicted facial landmark positions.
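
Steps (a) and (b) can be sketched with the dlib Python bindings as follows. This is a hedged sketch rather than the project's actual code: we assume the detector is `dlib.get_frontal_face_detector()` and the landmark model is loaded with `dlib.shape_predictor`, and the file name `shape_predictor_68_face_landmarks.dat` is the usual name of the pretrained 68-point model, assumed here. The helper names `shape_to_points` and `detect_landmarks` are our own.

```python
def shape_to_points(shape):
    """Convert a dlib landmark result into a list of (x, y) tuples."""
    return [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]


def detect_landmarks(image, predictor_path="shape_predictor_68_face_landmarks.dat"):
    """Return one list of landmark points per face detected in `image`.

    `image` is expected to be an 8-bit grayscale or RGB numpy array.
    The default model path is an assumption; point it at the downloaded file.
    """
    import dlib  # imported lazily so shape_to_points works without dlib installed

    detector = dlib.get_frontal_face_detector()       # step (a): face detection
    predictor = dlib.shape_predictor(predictor_path)  # step (b): landmark model

    # The second argument upsamples the image once to help find smaller faces.
    faces = detector(image, 1)
    return [shape_to_points(predictor(image, face)) for face in faces]
```

With dlib installed, an image loaded through `dlib.load_rgb_image` can be passed directly to `detect_landmarks`; each returned list should contain 68 points when the 68-point model is used.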

c. Feature Extraction

The most crucial stage in facial expression recognition is computing the feature
vector that characterizes a person's emotion. It is essential to capture how the facial
landmark points relate to one another. This is accomplished by taking the mean of
the landmark coordinates along both axes, which yields a central point
(x_mean, y_mean) near the nose region; this point is treated as the center of the
face. The position of every landmark relative to this center can then be determined
by forming a line between the center point and each landmark position. Each
resulting line has a magnitude and a direction (i.e., it is a vector). The magnitude
is the Euclidean distance between the two points, while the direction is the angle
formed by the line with the horizontal reference axis. The feature vector used for
both training and classification can therefore be summarized as follows:

feature vector = <point1.x, point1.y, magnitude1, direction1, ...,
point68.x, point68.y, magnitude68, direction68>
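
The construction above can be written compactly as follows. This is a minimal sketch of the described computation; the function name `landmark_features` is ours, and expressing the direction in radians via `atan2` is an assumption, since the text does not fix the angular unit.

```python
import math


def landmark_features(points):
    """Build <x, y, magnitude, direction> for every landmark.

    `points` is a list of (x, y) landmark coordinates (68 for the dlib model).
    """
    # Center of the face: mean of the landmark coordinates on each axis.
    x_mean = sum(x for x, _ in points) / len(points)
    y_mean = sum(y for _, y in points) / len(points)

    features = []
    for x, y in points:
        dx, dy = x - x_mean, y - y_mean
        magnitude = math.hypot(dx, dy)   # Euclidean distance to the center
        direction = math.atan2(dy, dx)   # angle with the horizontal axis (radians)
        features.extend([x, y, magnitude, direction])
    return features
```

For 68 landmarks this yields a vector of 68 x 4 = 272 values. Note that in image coordinates the y axis points downward, so the sign of the angle is mirrored relative to the usual mathematical convention.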

d. Machine learning predictions

With the preprocessed facial expression data, which includes the detected faces,
predicted facial landmarks, and extracted feature vectors, the next step is to
employ machine learning algorithms for facial expression recognition. The dataset,
derived from the AffectNet dataset and split into training and testing sets in an
80:20 ratio, serves as the basis for training and evaluating the machine learning
models. Subsequently, SVM, Logistic Regression, Random Forest, KNN, and
Naive Bayes algorithms are utilized to classify the emotions. The accuracy metric
is employed to evaluate the performance of each algorithm.
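
The evaluation protocol described above (an 80:20 hold-out split followed by an accuracy comparison) can be sketched as below. This is an illustrative sketch only: the helper names, the fixed random seed, and the unstratified shuffle are our assumptions, and the text does not state which library or hyperparameters were used for the classifiers.

```python
import random


def train_test_split_80_20(samples, labels, seed=0):
    """Shuffle paired samples/labels and split them 80% train, 20% test."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = len(indices) * 4 // 5
    train, test = indices[:cut], indices[cut:]
    return ([samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test])


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def evaluate(classifiers, x_train, y_train, x_test, y_test):
    """Fit each classifier and return its hold-out accuracy by name.

    `classifiers` maps a name to any object exposing scikit-learn style
    fit(X, y) and predict(X) methods.
    """
    scores = {}
    for name, clf in classifiers.items():
        clf.fit(x_train, y_train)
        scores[name] = accuracy(y_test, list(clf.predict(x_test)))
    return scores
```

Any estimator with `fit` and `predict` can be plugged in, for example scikit-learn's `LogisticRegression`, `SVC`, `RandomForestClassifier`, `KNeighborsClassifier`, and `GaussianNB`, so that all five models are scored on the same held-out 20%.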

3.4 Results and Discussions

Features                   Logistic Regression   SVM                         Random Forest   KNN    Naive Bayes
Landmarks                  0.53                  0.52                        0.48            0.37   0.30
Landmarks + HOG + Filter   0.34                  Computationally intensive   0.48            0.20   0.20

Figure 21: Accuracy of various classifiers under two feature extraction methods

The table highlights an unexpected finding: with landmark features, Logistic
Regression marginally outperforms Support Vector Machines (SVM), 0.53 versus
0.52, showing notable efficacy despite being far simpler than the advanced,
multi-layered neural network approaches used in Facial Expression Recognition
(FER). This outcome suggests that the simplicity of Logistic Regression does not
hinder its performance, and that it can offer a robust, efficient alternative for FER
tasks, especially in scenarios where the complexity of neural networks may not be
warranted. Such insights argue for weighing both sophisticated and simpler
models against the specific characteristics and requirements of the dataset
at hand.
Chapter 4: Conclusion and Future work

In this research, we meticulously evaluated and compared a range of
dimensionality reduction techniques tailored for facial recognition classifiers,
employing a substantial dataset of 4000 images. Our investigation addressed
complex variables including background interferences, eyewear, and diverse facial
expressions, although some instances of failure were observed. The findings
underscore the efficacy of the Manhattan distance measure, which, in conjunction
with optimized distance weight metrics in KNN, significantly enhances model
performance. Notably, SVMs emerged as superior learning algorithms,
demonstrating a remarkable 98.5% accuracy rate post-cross-validation.
Furthermore, our pioneering development of a real-time facial emotion recognition
system has shown promising results, aligning well with established machine
learning benchmarks, albeit with a modest accuracy rate of 52%.

Despite these achievements, we acknowledge several avenues for enhancement
that could further augment the utility and performance of our project. To elevate
its practical application and accuracy, we propose the following improvements:

First is the integration of more advanced machine learning models and techniques.
Incorporating more sophisticated machine learning algorithms, such as deep
learning models, could significantly improve the system's accuracy and efficiency
in recognizing facial emotions.

Second, the model fails to generalize on labels for which the data was insufficient,
imbalanced, or mislabeled. Enriching the dataset with a
broader and more diverse range of facial expressions and emotions could enhance
the model's ability to generalize and perform accurately across different
demographics and contexts.

Third, further exploration and optimization of dimensionality reduction methods
could lead to more efficient feature extraction, thereby improving the overall
performance of the facial recognition system.

By addressing these areas, we can significantly enhance the project's effectiveness
and broaden its applicability, paving the way for more accurate, efficient, and
user-centric facial and facial-emotion recognition systems.
References
Task 1:

AT&T Laboratories Cambridge. (2001). The Database of Faces [formerly 'The
ORL Database of Faces']. Retrieved from
ftp://ftp.uk.research.att.com/pub/data/att_faces.tar.Z or
ftp://ftp.uk.research.att.com/pub/data/att_faces.zip. Used in collaboration with the
Speech, Vision and Robotics Group of the Cambridge University Engineering
Department.

Kumar Rusia, M., & Kumar Singh, D. (2022). A comprehensive survey on
techniques to handle face identity threats: Challenges and opportunities. Springer
Science+Business Media, LLC, part of Springer Nature.

Çarıkçı, M., & Özen, F. (Year). A Face Recognition System Based on Eigenfaces
Method. Haliç University, Electrical and Electronics Engineering Department,
Şişli, Istanbul, Turkey.

Perlibakas, V. (2003). Distance measures for PCA-based face recognition. Image
Processing and Multimedia Laboratory, Kaunas University of Technology,
Studentu st. 56-305, LT-3031 Kaunas, Lithuania.

Kim, K. (Year). Face Recognition using Principal Component Analysis.
Department of Computer Science, University of Maryland, College Park, MD
20742, USA.

Wirdiani, N. K. A., Hridayami, P., Widiari, N. P. A., Rismawan, K. D.,
Candradinatha, P. B., & Jayantha, I. P. D. (Year). Face Identification Based on
K-Nearest Neighbor. Information Technology Department, Faculty of Engineering,
Universitas Udayana, Indonesia.

T. Kanade, “Picture Processing by Computer Complex and Recognition of Human
Faces”, Tech. Report, Kyoto University, Dept. of Information Science, 1973.

L. Sirovich and M. Kirby, “Low-Dimensional Procedure for the Characterization
of Human Faces”, Journal of the Optical Society of America A, Vol. 4, pages
519-524, March 1987.

M. Turk, A. Pentland, “Face Recognition using Eigenfaces”, Computer Vision and
Pattern Recognition, 1991. Proceedings CVPR'91, IEEE Computer Society
Conference on. IEEE, 1991.

M. Turk, A. Pentland, “Eigenfaces for Recognition”, Journal of Cognitive
Neuroscience, March 1991.

Task 2:

Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis.
Journal of Computational and Graphical Statistics, 15(2), 265-286.

Halko, N., Martinsson, P. G., & Tropp, J. A. (2011). Finding structure with
randomness: Probabilistic algorithms for constructing approximate matrix
decompositions. SIAM Review, 53(2), 217-288.

Ross, D. A., Lim, J., Lin, R. S., & Yang, M. H. (2008). Incremental learning for
robust visual tracking. International Journal of Computer Vision, 77(1-3), 125-141.

Schölkopf, B., Smola, A., & Müller, K. R. (1998). Nonlinear component analysis
as a kernel eigenvalue problem. Neural computation, 10(5), 1299-1319.

Rajkiran Gottumukkal, Vijayan K. Asari, An improved face recognition technique
based on modular PCA approach.

M. O. Faruqe and M. A. M. Hasan, "Face recognition using PCA and SVM," 2009
3rd International Conference on Anti-counterfeiting, Security, and Identification in
Communication, Hong Kong, China, 2009.

Singhal, Nikita & Ganganwar, Vaishali & Yadav, Menka & Chauhan, Asha &
Jakhar, Mahender & Sharma, Kareena. (2021). Comparative study of machine
learning and deep learning algorithm for face recognition. Jordanian Journal of
Computers and Information Technology.
Task 3:

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning,
20(3), 273-297.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE
Transactions on Information Theory, 13(1), 21-27.

Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic
Regression. John Wiley & Sons.

McCallum, A., & Nigam, K. (1998). A comparison of event models for Naive
Bayes text classification. In AAAI-98 workshop on learning for text categorization
(Vol. 752, pp. 41-48).

Mollahosseini, A., Hasani, B., & Mahoor, M. H. (2017). AffectNet: A database for
facial expression, valence, and arousal computing in the wild. IEEE.

P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar and I. Matthews, "The
Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and
emotion-specified expression," 2010 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition - Workshops, San Francisco, CA, USA,
2010, pp. 94-101, doi: 10.1109/CVPRW.2010.5543262.

V. Kazemi and J. Sullivan, "One Millisecond Face Alignment with an Ensemble of
Regression Trees," CVPR 2014.

Huang Y, Chen F, Lv S, Wang X. Facial Expression Recognition: A Survey.
Symmetry. 2019; 11(10):1189. https://doi.org/10.3390/sym11101189
