Emotion Recognition For An E-Learning Platform Using Deep Learning: A Comparison of Different Approaches
All content following this page was uploaded by Mohammed Kodad on 09 July 2023.
BOUSSELHAM Abdelmajid
Informatique, Intelligence Artificielle et
Cyber Sécurité (L2IAS)
ENSET Mohammedia, Université
Hassan II de Casablanca
Mohammedia, Morocco
bousselham@enset-media.ac.ma
Abstract— This abstract provides a brief summary of utilizing deep learning techniques to recognize emotions through facial expressions. Deep learning models, specifically Convolutional Neural Networks (CNNs), have gained significant popularity in accurately analyzing and understanding emotions from facial images.

By training these models on extensive datasets of labeled facial expression images, they can effectively learn and extract crucial features. CNNs excel in capturing spatial details from facial images.

The application of deep learning-based emotion recognition extends to various domains, including human-computer interaction, healthcare, and entertainment. Real-time emotion detection enables personalized interventions, adaptive content delivery, and the creation of emotionally captivating experiences, particularly in the context of E-Learning. However, challenges remain, such as the limited availability of diverse and well-annotated datasets and the need to account for variations in facial expressions across individuals and cultures.

Nevertheless, the integration of deep learning techniques for emotion recognition has the potential to revolutionize human-computer interaction, enhance user experiences, and foster more empathetic and adaptable technologies across different fields. Continuous research and advancements in deep learning approaches are expected to further refine the accuracy and reliability of emotion recognition systems based on facial expressions.

Keywords— emotion recognition, e-learning, facial expression, online machine learning, real-time, Deep Learning, DataSet

I. INTRODUCTION

Deep learning has emerged as a powerful technology that can help machines understand complex patterns in data. One area where deep learning shows promise is in recognizing emotions, especially by analyzing facial expressions. Emotion recognition is important in how computers interact with humans, and it can greatly improve e-learning platforms.

E-learning platforms have become popular for their flexibility and accessibility in education. However, they often struggle to understand and respond to learners' emotions, which affects the effectiveness of personalized and engaging learning experiences.

To address this, researchers are using deep learning techniques to build emotion recognition systems. These systems analyze facial expressions using specialized neural networks like CNNs and RNNs to accurately identify and classify emotions in real-time. This opens up exciting possibilities for integrating emotion recognition technology into e-learning platforms, making learning more personalized and adaptive.

This paper explores the potential of using deep learning and facial expression analysis for emotion recognition in e-learning platforms. We will discuss why emotion recognition is important for improving human-computer interaction and its specific benefits in e-learning. We will also explain the methods and techniques used in deep learning systems, focusing on how CNNs analyze facial expressions.

By integrating deep learning-based emotion recognition into e-learning platforms, we can gain real-time insights into learners' emotions, allowing for personalized support and customized content. Deep learning models can also create engaging learning materials that make the learning experience more enjoyable.

However, while integrating emotion recognition technology has many benefits, it's important to balance its use with respect for privacy and user autonomy. This paper will also discuss the ethical considerations and challenges associated with using emotion recognition in e-learning platforms.
In summary, this paper highlights the potential of deep learning techniques for emotion recognition and their application in e-learning platforms. By using facial expressions to understand learners' emotions, we can transform the way people interact with online education, creating a more empathetic, adaptable, and effective learning environment.

II. RESEARCH PROBLEM

In e-learning, the way teachers and students communicate is usually through written messages, which makes it difficult to express and understand emotions. Unlike in traditional classrooms, where students can show their feelings through their words and actions, e-learning lacks this ability.

Emotions are important in learning because they affect how motivated and engaged students are, as well as how well they remember information. That's why it's essential to include emotions in e-learning systems to make the learning experience better and improve results. By finding ways for students to express their emotions and for the system to recognize those emotions, e-learning can become more like face-to-face learning and help students feel more connected to what they're learning.

Researchers have looked into different ways to recognize emotions, such as by analyzing speech, body signals, and facial expressions. Facial expressions, in particular, have shown promise for recognizing emotions in e-learning. However, most studies have focused on analyzing pre-recorded data and not on recognizing emotions in real-time during online learning.

One challenge in e-learning is figuring out if students are satisfied and engaged. How students feel about their learning experience affects how motivated they are and how well they do. But the usual ways of measuring satisfaction, like feedback forms or surveys, can be subjective and may not capture the true emotions of students. That's why it's important to develop more accurate and objective ways to measure how satisfied and engaged students are in e-learning.

There's also a lack of research on recognizing emotions in real-time using facial expressions in e-learning. This is an area that deserves more attention. Developing a system that can recognize emotions in real-time using facial expressions would greatly improve the e-learning experience. This study aims to fill this research gap by creating an online system that uses machine learning to recognize emotions from facial expressions in real-time. The system will be tested with a group of students in an e-learning environment to see how well it measures their satisfaction and engagement.

III. RELATED WORKS

The study [1] describes an algorithm for real-time emotion recognition using virtual markers, facial landmarks, and EEG signals. The study focused on physically disabled individuals and children with Autism. The algorithm used CNN and LSTM classifiers to classify six facial emotions and EEG signals. The study involved fifty-five undergraduate students for facial emotion recognition and nineteen for collecting EEG signals. Virtual markers were placed on the subject's face, and the markers were tracked using an optical flow algorithm. The distance between the center of the subject's face and each marker position was used as a feature for facial expression classification, while the fourteen signals collected from the EEG signal reader were used for emotional classification.

The article [2] discusses the importance of facial recognition in various applications, such as security, identity verification, and database management systems. The article presents a deep learning algorithm for accurate facial recognition and identification, using Haar cascade detection and a convolutional neural network model. The proposed work includes three objectives: face detection, recognition, and emotion classification, using OpenCV, Python programming, and a dataset. An experiment was conducted to identify the emotions of multiple students, and the results demonstrate the efficacy of the face analysis system. Finally, the accuracy of the automatic face detection and recognition is measured.

The authors of [3] proposed a Convolutional Neural Network (CNN) based LeNet architecture for facial expression recognition. First, they merged three datasets (JAFFE, KDEF, and their custom dataset). Then they trained the LeNet architecture for emotion state classification. In this study, they achieved an accuracy of 96.43% and a validation accuracy of 91.81% for the classification of 7 different emotions through facial expressions.

The authors of [4] present an approach to Facial Expression Recognition (FER) using Convolutional Neural Networks (CNN). This model created using CNN can be used to detect facial expressions in real time. The system can be used for analysis of emotions while users watch movie trailers or video lectures.

The authors of [5] discuss how computer-animated agents and robots can add a social dimension to human-computer interaction. Real-time face-to-face communication requires relying on sensory-rich perceptual primitives rather than slow symbolic inference processes due to the high level of uncertainty at this time scale. The system presented in the paper detects frontal faces and codes them with respect to 7 dimensions in real time. It employs boosting techniques and SVM classifiers to enhance performance and has been tested on a dataset of posed facial expressions. The system's outputs change smoothly over time, providing a potentially valuable representation to code facial expression dynamics in a fully automatic and unobtrusive manner.

The authors of [6] compared five different methods for real-time emotion recognition from facial images, specifically for the four basic emotions of happiness, sadness, anger, and fear. Three of the approaches are based on convolutional neural networks (CNN) and two are conventional methods using Histogram of Oriented Gradients (HOG) features. The approaches compared are: AlexNet CNN, Affdex CNN, FER-CNN, SVM using HOG features, and an MLP artificial neural network using HOG features. The paper presents the results of testing these methods in real-time on a group of eight volunteers.

This paper [7] presents an advanced deep learning technique for emotion prediction through facial expression
analysis. The proposed approach employs a two-stage convolutional neural network (CNN) model. The first CNN predicts the primary emotion of the input image as happy or sad, while the second CNN predicts the secondary emotion. The model was trained on the FER2013 and JAFFE datasets and achieved superior results compared to existing state-of-the-art methods for emotion prediction from facial expressions.

This paper [8] addresses the challenging task of real-time emotion recognition through facial expression in live video using an automatic facial feature tracker for face localization and feature extraction. The extracted facial features are fed into a Support Vector Machine classifier to infer emotions. The paper presents the results of experiments evaluating the accuracy of the approach for various scenarios, including person-dependent and person-independent recognition. The findings show that the proposed method is effective in achieving fully automatic and unobtrusive expression recognition in live video. The paper concludes by discussing the significance of the research for affective and intelligent man-machine interfaces and suggesting possible future improvements.

This paper [9] focuses on the importance of analyzing users' facial expressions to improve the interaction between humans and machines. The paper proposes a method for extracting facial features and recognizing the user's emotional state that is robust to facial expression variations among different users. The method extracts facial animation parameters (FAPs) and uses a novel neuro-fuzzy system to analyze FAP variations both in the discrete emotional space and in the 2D continuous activation-evaluation space. The system can further learn and adapt to specific users' facial expression characteristics using clustering analysis. The paper reports experimental results on emotionally expressive datasets, indicating the good performance and potential of the proposed approach.

The aim of the study [10] is to develop predictive models that can classify emotions in real-time from videos of workshop participants engaging with an educational robot. The authors combine the two best generalizing models (Inception-v3 and ResNet-34) to achieve better prediction accuracy. To test their approach, they apply the models to video data and analyze the predicted emotions based on the participants' gender, activities, and tasks. Statistical analysis reveals that female participants are more likely to show emotions in almost all activity types, and happiness is the most frequently predicted emotion for all activity types, regardless of gender. Additionally, programming is the activity type where the analyzed emotions were the most frequent. These findings highlight the potential of using facial expressions to improve teaching practices and understand student engagement.

1) Machine Learning:
Machine Learning involves training algorithms to analyze and interpret data, and make predictions or decisions based on patterns and statistical models. ML algorithms learn from labeled data and use features derived from that data to make predictions on new, unseen examples. ML algorithms can be broadly categorized into supervised, unsupervised, and reinforcement learning.

Supervised Learning: In supervised learning, algorithms are trained using labeled data, where each data point is associated with a corresponding label or outcome. The goal is to learn a mapping function that can predict labels for new, unseen data accurately. Examples of supervised learning algorithms include linear regression, decision trees, and support vector machines.

Unsupervised Learning: Unsupervised learning involves training algorithms on unlabeled data, without any predefined labels or outcomes. The algorithms learn to identify patterns, similarities, and structures within the data. Clustering algorithms and dimensionality reduction techniques are common examples of unsupervised learning.

Reinforcement Learning: Reinforcement learning involves training algorithms to make decisions or take actions in an environment to maximize a cumulative reward signal. The algorithms learn through trial and error, receiving feedback from the environment based on their actions. Reinforcement learning has been successful in applications such as game playing and robotics.

2) Deep Learning:
Deep Learning is a subset of ML that focuses on training deep neural networks with multiple layers to automatically learn hierarchical representations of data. Deep Learning algorithms are inspired by the structure and function of the human brain, and they excel at capturing complex patterns and relationships in large-scale datasets. Deep neural networks consist of interconnected layers of artificial neurons (nodes), with each layer extracting increasingly abstract features from the input data.

Deep Learning architectures, such as Convolutional Neural Networks (CNNs) for image recognition and Recurrent Neural Networks (RNNs) for sequential data, have achieved remarkable performance in various domains, including computer vision, natural language processing, and speech recognition. Deep Learning algorithms often require substantial amounts of labeled training data and powerful computational resources for training due to their complex architectures.

Performance and Scalability: Deep Learning algorithms can achieve state-of-the-art performance in certain tasks when trained on large amounts of data. ML algorithms may be more suitable for smaller datasets or when interpretability of the model is critical.
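As a minimal, hypothetical illustration of supervised learning (not a method used in this paper), a 1-nearest-neighbor classifier is "trained" simply by storing labeled points and predicts the label of the closest stored example; the toy data and labels below are invented for the sketch:

```python
# A minimal sketch of supervised learning: a 1-nearest-neighbor classifier
# "trained" on labeled points, then used to predict labels for new data.
# Real systems would use a library such as scikit-learn; this is only a toy.

def euclidean_sq(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict_1nn(train_X, train_y, query):
    """Return the label of the training point closest to `query`."""
    best = min(range(len(train_X)), key=lambda i: euclidean_sq(train_X[i], query))
    return train_y[best]

# Toy labeled dataset: two clusters with labels "happy" and "sad".
train_X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
train_y = ["happy", "happy", "sad", "sad"]

print(predict_1nn(train_X, train_y, (1.1, 0.9)))  # near the first cluster -> happy
print(predict_1nn(train_X, train_y, (5.1, 5.1)))  # near the second cluster -> sad
```

The same train-then-predict pattern carries over to the deep models discussed below, only with learned features instead of raw coordinates.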
less centered face occupying a similar amount of space in each image.

It's important to note that this dataset was obtained from the "Challenges in Representation Learning: Facial Expression Recognition Challenge" competition. The dataset was prepared by Pierre-Luc Carrier and Aaron Courville as part of their ongoing research project. They generously provided a preliminary version of their dataset to the workshop organizers for use in this contest.

B. VGG16 Model

The VGG16 [11] model is a convolutional neural network (CNN) architecture that has been widely used in various computer vision tasks, including emotion recognition based on facial expressions. It was developed by the Visual Geometry Group (VGG) at the University of Oxford.

Through iterative training, the VGG16 model adjusts its weights to minimize the loss and improve its prediction accuracy.

The VGG16 model has demonstrated strong performance in various computer vision tasks, including emotion recognition based on facial expressions. Its deep architecture allows it to learn complex patterns and features from images, enabling accurate recognition of different emotions. However, it's worth noting that the VGG16 model can be computationally intensive and may require substantial computational resources for training and inference, particularly when dealing with large-scale datasets.

In summary, the VGG16 model is a powerful CNN architecture commonly employed in emotion recognition based on facial expressions. Its deep structure, along with its ability to learn intricate features, makes it suitable for capturing meaningful representations from images and accurately predicting different emotions.
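As a sketch of how such a backbone can be used (assuming TensorFlow/Keras; the head mirrors the flatten, 1000-unit dense, and softmax layers described later in the paper, but the exact training configuration is not reproduced here):

```python
# Hypothetical sketch: VGG16 as a frozen feature extractor for 7 emotion classes.
# weights=None avoids downloading ImageNet weights here; in practice one would use
# weights="imagenet" and fine-tune on a facial-expression dataset such as FER2013.
import tensorflow as tf

base = tf.keras.applications.VGG16(
    include_top=False,        # drop the original 1000-class ImageNet head
    weights=None,
    input_shape=(48, 48, 3),  # FER-style 48x48 inputs
)
base.trainable = False        # "freeze" the pre-trained convolutional layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1000, activation="relu"),
    tf.keras.layers.Dense(7, activation="softmax"),  # one unit per emotion class
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The frozen backbone keeps the learned visual features intact while only the small head is trained, which is the transfer-learning pattern discussed later in this paper.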
A softmax activation function is commonly used to produce a probability distribution over the emotion classes, allowing the model to make predictions about the dominant emotion in the input facial expression.

Training the VGG19 model for emotion recognition typically involves using a labeled dataset of facial expression images. The model is optimized using algorithms such as stochastic gradient descent (SGD) or the Adam optimizer, and the categorical cross-entropy loss function is commonly employed to measure the discrepancy between the predicted probabilities and the true emotion labels. Through iterative training, the VGG19 model adjusts its weights to minimize the loss and improve its ability to accurately classify emotions.

The VGG19 model's increased depth compared to VGG16 enables it to capture more intricate and nuanced features, potentially leading to improved performance in emotion recognition tasks. However, it's important to note that the additional depth also increases the model's complexity and computational requirements, demanding more computational resources during training and inference.

In summary, the VGG19 model is an extension of the VGG16 architecture widely utilized in emotion recognition tasks based on facial expressions. Its deeper structure enables it to capture more complex patterns, allowing for better discrimination among different emotions. By leveraging the convolutional and fully connected layers, the VGG19 model can effectively extract features from facial images and provide accurate predictions for various emotion classes.

D. ResNet50V2 Model

The ResNet50V2 [13] model is a convolutional neural network (CNN) architecture that has been widely employed in various computer vision tasks, including emotion recognition based on facial expressions.

ResNet50V2 is an extension of the original ResNet architecture introduced by Microsoft Research. The "50" in the name refers to the number of layers in the network, indicating its depth. The "V2" denotes that it is an updated version of the model with improved performance and efficiency.

The network is built from residual blocks containing convolutional layers, batch normalization, and activation functions. The residual connections within the blocks facilitate the flow of information and improve gradient flow during training.

In the context of emotion recognition based on facial expressions, the ResNet50V2 model takes an input image of a face and processes it through the layers to extract discriminative features. These features capture the unique characteristics of facial expressions associated with different emotions.

The output layer of the ResNet50V2 model is typically configured to have multiple units corresponding to different emotion classes. The final activation function, often softmax, generates a probability distribution over these emotion classes, enabling the model to make predictions about the dominant emotion exhibited in the facial expression.

Training the ResNet50V2 model for emotion recognition involves using a labeled dataset of facial expression images. The model's weights are optimized using algorithms such as stochastic gradient descent (SGD) or the Adam optimizer. The choice of loss function, such as categorical cross-entropy, helps measure the dissimilarity between predicted probabilities and the true emotion labels. Through iterative training, the model adjusts its weights to minimize the loss and improve its ability to accurately classify emotions.

The ResNet50V2 architecture has shown remarkable performance in various computer vision tasks due to its deep structure, residual connections, and efficient training. These attributes make it capable of capturing complex visual patterns and effectively recognizing emotions based on facial expressions.

In summary, the ResNet50V2 model is a deep CNN architecture with residual connections, designed for tasks such as emotion recognition based on facial expressions. Its ability to learn intricate features, along with improved gradient flow through residual connections, allows it to effectively capture and classify different emotions. By leveraging its layers and connections, the ResNet50V2 model demonstrates strong performance in recognizing emotions from facial expressions.
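The residual connection at the heart of ResNet can be sketched in a few lines of numpy: the block's output is the transformed input added back to the input itself, so information and gradients can always flow through the identity path. The transformation below is a toy linear map standing in for a real convolutional block, not ResNet50V2's actual layers:

```python
import numpy as np

def residual_block(x, W):
    """Toy residual block: y = F(x) + x.
    In ResNet50V2, F would be BN -> ReLU -> convolutions; here it is a
    placeholder linear map followed by a ReLU nonlinearity."""
    F = np.maximum(0.0, W @ x)  # stand-in for the block body
    return F + x                # identity shortcut: input is added back unchanged

x = np.array([1.0, -2.0, 3.0])
y = residual_block(x, np.zeros((3, 3)))  # even if the body learns nothing (F(x) = 0)...
print(y)                                 # ...the input passes through: [ 1. -2.  3.]
```

Because the shortcut makes the identity function trivial to represent, very deep stacks of such blocks remain trainable, which is the property the text above attributes to ResNet50V2.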
maintaining computational efficiency, making them suitable for resource-constrained environments.

The EfficientNetB0 architecture follows a compound scaling method, which uniformly scales the network's depth, width, and resolution. This scaling allows the model to achieve a good balance between model capacity and computational efficiency. The "B0" in the name signifies the base configuration of the EfficientNet family, where "B0" represents the smallest and least computationally expensive variant.

EfficientNetB0 consists of multiple stacked layers of depth-wise separable convolutions, which reduce the number of parameters and computational cost while maintaining model performance. These depth-wise separable convolutions split the standard convolutional operation into two separate steps: a depth-wise convolution, which processes each input channel separately, and a point-wise convolution, which combines the output of the depth-wise convolution across channels.

The EfficientNetB0 model also incorporates other techniques such as batch normalization, activation functions, and skip connections. These techniques aid in improving the learning process, increasing model accuracy, and facilitating gradient flow during training.

In the context of emotion recognition based on facial expressions, the EfficientNetB0 model takes an input image of a face and passes it through its layers to extract meaningful features. These features capture the relevant patterns and expressions associated with different emotions.

The output layer of the EfficientNetB0 model is typically configured to have multiple units corresponding to the different emotion classes. The final activation function, often softmax, produces a probability distribution over these emotion classes, enabling the model to make predictions about the dominant emotion displayed in the facial expression.

Training the EfficientNetB0 model for emotion recognition involves using a labeled dataset of facial expression images. The model's weights are optimized using algorithms such as stochastic gradient descent (SGD) or the Adam optimizer. The choice of a suitable loss function, such as categorical cross-entropy, helps measure the dissimilarity between the predicted probabilities and the true emotion labels. Through iterative training, the model adjusts its weights to minimize the loss and improve its ability to accurately classify emotions.

The EfficientNetB0 model's efficiency and performance make it well-suited for emotion recognition based on facial expressions. Its ability to capture important features while being computationally efficient allows for accurate emotion classification, even in resource-limited settings.

In summary, the EfficientNetB0 model is a highly efficient CNN architecture that achieves excellent performance in emotion recognition tasks based on facial expressions. Its compound scaling approach, depth-wise separable convolutions, and other techniques contribute to its efficiency and accuracy. By leveraging these features, the EfficientNetB0 model demonstrates strong performance in accurately recognizing emotions from facial expressions.

F. EfficientNetB7 Model

EfficientNetB7 is a deep learning model and a part of the EfficientNet family, which is a series of convolutional neural networks (CNNs) designed to achieve state-of-the-art performance with significantly fewer parameters compared to other models. The EfficientNetB7 model is the largest and most powerful variant in the EfficientNet series.

The main idea behind EfficientNet models is compound scaling, which involves scaling the depth, width, and resolution of the network in a balanced manner. This allows EfficientNetB7 to achieve better performance and efficiency by effectively utilizing computational resources.

Specifically, the EfficientNetB7 model has the following characteristics:

● Depth: EfficientNetB7 has a deep network architecture with a large number of layers, allowing it to capture complex patterns and features from the input data.

● Width: It has a significantly larger number of channels or filters in each layer compared to smaller variants, which enables it to learn more expressive representations.

● Resolution: The input images to EfficientNetB7 are of higher resolution, which allows the model to capture fine-grained details and improve recognition accuracy.

EfficientNetB7 has been pre-trained on large-scale datasets, such as ImageNet, using techniques like transfer learning. As a result, it has learned to recognize a wide range of features from different images. This pre-training makes it a powerful feature extractor that can be fine-tuned on specific tasks or datasets with relatively few additional training samples.

Due to its efficiency and high performance, EfficientNetB7 is commonly used in various computer vision tasks, such as image classification, object detection, and semantic segmentation, where it consistently achieves top-tier results. However, it should be noted that EfficientNetB7 might require significant
computational resources, especially during training, due to its large size.

The key advantage of SGD is its efficiency in processing large datasets. Since it operates on mini-batches, it requires less memory and computational resources compared to batch gradient descent, where the entire dataset is used in each iteration. The stochastic nature of SGD also adds a regularizing effect, helping the model generalize better and avoid overfitting.

5. Bias Correction: In the early iterations of training, the estimates of the first and second moments can be biased towards zero due to their initialization as zero vectors. To address this, bias correction is applied to the first and second moment estimates to make them unbiased.
6. Learning Rate Scaling: The Adam optimizer scales the gradients by dividing them by the square root of the second moment estimate. This scaling allows for adaptive learning rates, where the learning rate is automatically adjusted based on the magnitudes of the gradients.

7. Parameter Update: Finally, the model parameters are updated by subtracting the scaled gradient estimate, that is, the bias-corrected first moment divided by the square root of the second moment estimate and multiplied by the learning rate. This step effectively moves the parameters in the direction that minimizes the loss function.

The Adam optimizer's adaptive learning rate mechanism makes it less sensitive to the choice of an initial learning rate and helps achieve faster convergence. It combines the benefits of AdaGrad, which adjusts the learning rate for each parameter individually, and RMSprop, which performs adaptive learning rate scaling. Additionally, the bias correction step ensures that the estimates of the first and second moments are accurate, particularly during the early stages of training.

A. Experimental Setup

Scaling and resizing an image involves adjusting its size while preserving its aspect ratio or changing the aspect ratio as desired. The process typically involves two steps: scaling and resizing.

The pixel values of an image typically range from 0 to 255, representing the intensity of each pixel. Scaling the image by dividing it by 255 transforms the pixel values to a range between 0 and 1. This normalization is often performed to ensure that the pixel values are within a consistent and standardized range, which can be beneficial for various image processing algorithms and models.

In the context of facial expression recognition, resizing an image to (48, 48) is commonly done to preprocess facial images and prepare them as input for emotion recognition models. The dimensions of 48x48 pixels have been widely adopted in facial expression datasets and models. This size is typically sufficient to capture important facial features while keeping the computational requirements manageable.
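The Adam update rules enumerated earlier (bias correction in step 5, learning-rate scaling in step 6, and the parameter update in step 7) can be sketched in numpy as follows; this is a minimal single-parameter illustration of the standard algorithm, not the exact training code used in the experiments:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter `theta` given gradient `grad` at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * grad       # first moment (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment (running mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)             # bias correction (step 5)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # scaling + update (steps 6 and 7)
    return theta, m, v

# Minimize f(theta) = theta^2 (gradient 2 * theta) starting from theta = 1.
theta, m, v = np.array(1.0), 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
print(abs(theta) < 0.1)  # theta has moved close to the minimum at 0
```

Note how the effective step size depends on the ratio of the two moment estimates rather than the raw gradient magnitude, which is what makes Adam relatively insensitive to the initial learning rate.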
The term "freezing" refers to the process of preventing the weights and parameters of specific layers in a pre-trained model from being updated or trained further during the fine-tuning or transfer learning process.

When we load a pre-trained model, all the layers in the model have already been trained on a large dataset to learn meaningful representations. However, in some cases, we may want to fine-tune the model for a specific task using our own dataset.

By freezing layers, we keep the learned representations intact, especially in the early layers that capture basic features and patterns. This is useful because the pre-trained model has been trained on a dataset similar to our task, and we want to leverage the pre-existing knowledge while fine-tuning on our specific data.

Keras Tuner explores the hyperparameter space using different strategies to find the best combination of hyperparameters.

● Objective Functions: You can define an objective function that quantifies the performance of your model based on specific metrics, such as accuracy, loss, or any custom evaluation metric. The tuner uses this objective function to guide the search process and optimize the hyperparameters.

● Early Stopping: Keras Tuner supports early stopping, which allows you to stop the search process if the performance of the model plateaus or worsens. This helps save computational resources by terminating the search early if no further improvements are observed.

● Results Analysis: Keras Tuner provides utilities to analyze and visualize the results of the hyperparameter search, such as the best hyperparameters found, performance metrics across different trials, and search history.
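The search loop that such a tuner runs can be illustrated with a self-contained random-search sketch; this is pure Python standing in for Keras Tuner's machinery, and the objective below is a made-up function rather than a real model's validation loss:

```python
import random

def objective(lr, units):
    """Toy stand-in for a validation loss: smallest near lr=0.01 and units=1000."""
    return (lr - 0.01) ** 2 + ((units - 1000) / 1000) ** 2

def random_search(n_trials=200, patience=50, seed=0):
    """Sample hyperparameters at random, keep the best trial,
    and stop early when no improvement is seen for `patience` trials."""
    rng = random.Random(seed)
    best, since_improved = None, 0
    for _ in range(n_trials):
        trial = {"lr": rng.uniform(1e-4, 1e-1), "units": rng.randrange(100, 2001)}
        score = objective(**trial)
        if best is None or score < best[0]:
            best, since_improved = (score, trial), 0
        else:
            since_improved += 1
        if since_improved >= patience:  # early stopping: the search has plateaued
            break
    return best

score, params = random_search()
print(score < objective(0.05, 500))  # better than a naive hand-picked guess
```

Keras Tuner replaces the random sampling with smarter strategies (e.g. Hyperband or Bayesian optimization) and evaluates a real trained model inside the loop, but the keep-the-best-and-stop-early structure is the same.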
Fig. 7. Keras-tuner_2

To summarize, the optimal model is built upon the VGG16 architecture, which serves as the foundational backbone. It is complemented by a flatten layer to reshape the output, a dense layer with 1000 units to capture intricate patterns, and an output layer with softmax activation for effective multiclass classification. The softmax function plays a pivotal role in transforming the output of the final layer into meaningful class probabilities, enabling reliable and interpretable predictions.

Fig. 10. The graphical representation depicting the loss and accuracy metrics

The confusion matrix provides a comprehensive view of the model's performance by showing the distribution of correct and incorrect predictions across different classes. It helps in understanding the types of errors made by the model, such as false positives and false negatives.
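As a small numpy illustration of these two ideas (the scores and predictions below are invented for the sketch, not the paper's actual results): softmax turns raw output scores into class probabilities, and a confusion matrix tallies predicted against true labels:

```python
import numpy as np

def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.argmax())  # class 0 has the highest probability

# Toy 3-class example: diagonal entries are correct predictions,
# off-diagonal entries are the model's confusions.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)
```

Reading the matrix row by row shows, for each true class, where the model's predictions went, which is exactly the per-class error breakdown described above.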
VIII. LIMITATIONS AND FUTURE WORK

While the tested algorithms showed promise, we encountered several limitations that we aim to address in future work. Below, we outline these limitations and discuss our objectives for future research.

1. Limitations:

● Limited Accuracy: The tested algorithms for emotion recognition based on facial expressions did not provide highly accurate results. Further improvements are needed to enhance their performance.
FIGURE TABLE

Figure number  Figure title
Figure 1       The steps of emotion recognition using facial expression
Figure 2       The pipeline of our experiment
Figure 3       Data importation
Figure 6       Keras-tuner_1
Figure 7       Keras-tuner_2

REFERENCES

[10] Wisal Hashim Abdulsalam, Rafah Shihab Alhamdani, and Mohammed Najm Abdullah, "Facial Emotion Recognition from Videos Using Deep Convolutional Neural Networks." [CrossRef]
[11] Philipp Michel and Rana El Kaliouby, "Real time facial expression recognition in video using support vector machines." [CrossRef]
[12] David Dukić and Ana Sovic Krzic, "Real-Time Facial Expression Recognition Using Deep Learning with Application in the Active Classroom Environment." [CrossRef]
[13] VGG16. [CrossRef]
[14] VGG19. [CrossRef]
[15] ResNet50V2. [CrossRef]
[16] EfficientNetB0. [CrossRef]
[17] EfficientNetB7. [CrossRef]
[18] Adam optimizer. [CrossRef]