
Robust Facial Emotion Detection using CNN Architecture

Aiman Shahid, Hamad Rizwan


Department of Computer Science,
University of Engineering and Technology,
Taxila, Pakistan
18-cs-21@students.uettaxila.edu.pk, 18-cs-40@students.uettaxila.edu.pk

Abstract— Emotions are a natural and significant element of human behavior that influence how we communicate. Precise study and interpretation of the emotional content of human facial expressions are essential for a better understanding of human behavior. AI has gained popularity in almost every field, and deep learning approaches have been studied as a set of methodologies for gaining scalability and robustness on new forms of data. In this article, deep learning is used to recognize human emotions from facial expressions. A new CNN model that can predict human expression from an image has been developed. The CK+ and FER2013 datasets were used to experiment with and train a deep convolutional network. We also applied cross-validation techniques such as the K-fold method while working with the CK+ dataset. We attained an accuracy rate of 89 percent on CK+ and 67.9 percent on FER2013, which is far better than many existing systems. Using our CNN model, we recognized seven different emotions. When tested with different people under varying ambient and light conditions, our work has shown positive outcomes. Results were evaluated using the confusion matrix and ROC curve.

Keywords— facial emotion recognition, deep learning-based facial emotion recognition, classification, CNN, K-fold technique.

I. INTRODUCTION

Facial expressions are important in human communication because they help us understand what others are thinking. People infer other people's emotional states, such as joy, despair, and hostility, from their facial expressions and conversational tones. According to several studies [1, 2], verbal components transmit one-third of human communication, whereas nonverbal components transmit two-thirds. Because they carry emotional meaning, facial expressions are among the most essential nonverbal components of social communication. As a result, it is no wonder that facial emotion research has gained a lot of interest in recent years, with applications ranging from the visual processing sciences to affective computing and computer animation [2].

Automatic facial emotion recognition (FER) is a popular topic; the acronym is expanded differently across studies, as either facial emotion recognition or facial expression recognition. Because this study focuses on the broad characteristics of the problem, the term FER here stands for facial emotion recognition. Artificial intelligence applications such as augmented reality (AR) [3], virtual reality (VR) [3, 4], human-computer interaction (HCI) [5, 6], entertainment [7, 8], and advanced driver assistance systems (ADASs) [9] have all seen rapid progress in recent years. Although numerous sensors can supply FER inputs, such as the electrocardiogram (ECG), electromyograph (EMG), camera, and electroencephalograph (EEG), the camera has proven to be the most successful sensor for FER since it provides the most valuable information and requires no maintenance.

In this research, we offer a deep learning-based emotion recognition algorithm that operates on pictures of facial expressions. Unlike previous applications, the system maintains high accuracy across a variety of environments and subjects. In our experiment, we developed a novel CNN model and trained it on two datasets, FER2013 and CK+. The accuracy of the produced CNNs was then checked against the test data.

II. LITERATURE REVIEW
Deep learning has been used to solve problems in a range of sectors [8], whereas prior approaches required manual data collection and decision-making based on a set of constraints. In those earlier techniques, sensory error is quite likely to occur, and judgments are made based on a set of human-made rules, which may or may not always be correct. Support vector machines (SVM) [10], principal component analysis (PCA) [11], and convolutional neural networks (CNN) [12, 13] have all been used by researchers to classify human facial expressions.

The majority of real-time emotion detection research has focused on visuals. The literature on facial emotion identification with computer vision techniques over the last several years indicates considerable progress alongside the development of computer technology, and there are several applications for facial recognition and detection of emotions. Breuer and Kimmel [14] employed visual approaches for interpreting a CNN model for emotion identification that was trained with a range of datasets. They put the CNN to the test on datasets for facial emotion detection as well as several facial emotion recognition applications. Jung et al. [15] devised a method that combined two forms of CNNs: one identifies geometric aspects of facial parts, while the other extracts features from visual data.

Liu and colleagues [16] categorize distinct facial expressions using a two-layer CNN and the FER2013 dataset. They also compared it to four other existing models, finding that the suggested model had an accuracy rate of 49.8%. S. Suresh et al. [17] created a deep neural network (DNN)-based sign language recognition system that distinguishes six distinct sign languages. Their comparison of two models, one with an Adam optimizer and the other with an SGD optimizer, shows that the Adam-optimized model is more accurate. K. Bouaziz et al. [18] demonstrated an analytical strategy that includes image recognition technologies and processes; the suggested model uses a CNN architecture to classify different types of handwriting. Automatic facial expression recognition was achieved in [19] by extracting features using wavelet transformations and classifying emotions using the K-nearest neighbors (KNN) algorithm.

Because a person's face often has many traits, scientists have used principal component analysis (PCA) to identify facial features. Krithika et al. [20] used local binary patterns (LBP) and the Viola-Jones algorithm to distinguish facial expression, head, face, and eye movement during learning in order to evaluate boredom, the learner's interest, and other factors. In comparison to other modern systems, machine learning and deep learning algorithms are more efficient in terms of implementation and outcomes [21].

A CNN can not only recognize facial expressions but can also be fine-tuned to detect certain areas of the face rather than the entire face. Furthermore, on numerous datasets, including JAFFE [22-24], FERG, FER2013, and CK+, the CNN model outperforms previous approaches. Even though there has been a lot of work done on emotion recognition, it has focused on discovering the fundamental emotions, while the complex emotions that humans confront during the learning process have been neglected.

III. PROBLEM STATEMENT

Facial emotion identification is crucial in the disciplines of artificial intelligence and computer vision. Deep learning aspires to get as near to the human brain as feasible, and convolutional neural networks are particularly useful in vision-based applications. If a personal robot can closely perceive human emotion, humans will have higher trust in dealing with or relying on robots. Emotion is one of the most fundamental human expressions. It is an essential component of our nonverbal communication and aids in determining a person's thoughts, actions, and feelings.

According to research, a human's ability to interpret the emotions of other humans is only about 65 percent effective, as measured on the FER2013 dataset [25]. To anticipate human emotion, researchers have used a variety of methodologies.
It is still difficult to achieve real-time precision. The difficulty of the task can be gauged by manually predicting human emotion on the FER2013 dataset. In previous studies, networks were developed to predict emotion using CNNs, SVMs, and even probabilistic approaches. The issue with those methods is that they do not always meet accuracy criteria, and most of the time the deep learning networks developed have such large parameter requirements that they are nearly impossible to implement in real-time systems.

Real-time implementable systems are required in people's daily lives to make instant predictions. People with developmental difficulties, in particular, require help in their daily lives. According to the authors of [26], the 2012 Canadian Survey on Disability (CSD) found that 90% of disabled Canadians require support in daily life, and 72% of them seem unable to fulfill their demands in at least one of their tasks. Many of these everyday requirements, as well as regular therapy, could be met by customizable personal robots once those robots acquire their users' trust, allowing users to depend on them. Many people with developmental disabilities require social robots that can assist them in all aspects of their lives, and in [26] an attempt is made to create such robots.

For all of these needs, we require a CNN model that can be implemented in real-time systems such as personal robots. To reduce computational complexity, we used a variety of strategies in our network. In summary, we have focused primarily on developing a robust system that can be implemented in real time on a robot.

IV. PROPOSED SOLUTION

For this research, the steps are as follows. This experiment employed two datasets: FER2013 and CKPlus. Each dataset is augmented and divided into a training and a testing set. The training set was used to train the CNN, and the network's performance was evaluated using the testing set. The resulting network is then used to categorize facial expressions. Figure 1 depicts the flowchart for this research.

Figure 1. Experiment flow: Gathering Dataset → Data Preprocessing → Splitting Dataset → Model Training → Model Evaluation

A. Pre-processing

The proposed approach began with preprocessing, in which every image in the dataset was scaled to 48×48 pixels, resulting in a normalized dataset.
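As a minimal sketch of this step (assuming images are loaded with OpenCV; the paper does not specify the loading pipeline), the resizing and normalization could look like this:

```python
import cv2
import numpy as np

def preprocess(image_path: str) -> np.ndarray:
    """Scale a face image to 48x48 grayscale and normalize its pixel values."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)  # read as grayscale
    img = cv2.resize(img, (48, 48))                     # scale to 48x48 pixels
    img = img.astype("float32") / 255.0                 # normalize to [0, 1]
    return img.reshape(48, 48, 1)                       # channel axis for the CNN
```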

B. Dataset Used

Robust testing is performed to ensure network efficiency using two independent annotated datasets, FER2013 and CKPlus. The FER2013 dataset contains 28,709 training images and 3,589 test images. The emotion labels are distributed as follows: 4,593 images for angry, 5,121 for fear, 547 for disgust, 8,989 for happy, 6,077 for sad, 4,002 for surprise, and 6,198 for neutral.

Figure 2. Examples of basic facial emotions from the dataset: (a) anger, (b) fear, (c) sad, (d) disgust, (e) neutral, (f) surprise, (g) happy

The Cohn-Kanade+ (CKPlus) collection contains 593 image sequences depicting seven emotions from 123 subjects.
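For concreteness, FER2013 is commonly distributed as a CSV file of space-separated pixel strings; the following loading sketch assumes that common fer2013.csv layout (emotion, pixels, and Usage columns), which the paper itself does not describe:

```python
import numpy as np
import pandas as pd

# Assumes the widely distributed fer2013.csv layout: emotion, pixels, Usage
df = pd.read_csv("fer2013.csv")
X = np.stack([
    np.asarray(p.split(), dtype="float32").reshape(48, 48, 1) / 255.0
    for p in df["pixels"]
])
y = df["emotion"].to_numpy()
train_mask = (df["Usage"] == "Training").to_numpy()
X_train, y_train = X[train_mask], y[train_mask]   # 28,709 training images
X_test, y_test = X[~train_mask], y[~train_mask]   # remaining rows are test images
```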

Table 1 and Table 2 show the dataset properties for FER2013 and CKPlus, respectively.

Table 1. FER2013 dataset

Total Classes 7
Training Images 28,709
Testing Images 7,178

Table 2. CKPlus dataset

Total Classes 7
Training Images 784
Testing Images 197
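The abstract notes that the K-fold method was applied while working with CK+. A minimal sketch of such a split, assuming scikit-learn and the X, y arrays from the loading sketch above (the fold count is illustrative; the paper does not report it):

```python
from sklearn.model_selection import KFold

# X: preprocessed 48x48 face images, y: integer emotion labels
kfold = KFold(n_splits=5, shuffle=True, random_state=42)  # 5 folds (assumed)
for fold, (train_idx, test_idx) in enumerate(kfold.split(X)):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # train a fresh model on (X_train, y_train), evaluate on (X_test, y_test)
```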
C. Convolutional Neural Network (CNN)

CNN is a multilayer artificial neural network (ANN) model created specifically for computer vision applications. Its architecture is made up of separate layers with distinct tasks, including convolution, pooling, and fully connected layers, and a CNN is created by aligning these structures in a specific order. Feature extraction is carried out in the early layers of this structure, and classification is carried out in the final layers [27].

Figure 3. Summary of the proposed model

Several convolution layers, pooling functions, and some fully connected layers were used to develop the CNN architecture in this work, and the resulting 5-layer CNN model was trained on the prepared dataset. The CNN's architecture is depicted in Figure 3. The network is made up of several convolution layers, some of which are followed by max-pooling, and some fully connected layers with a softmax output at the end. The number of neurons in the last fully connected layer has been reduced to seven because there are seven different classes in this study.
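The paper does not list exact filter counts or kernel sizes, so the following Keras sketch only illustrates the described structure (stacked convolution layers with max-pooling, fully connected layers, and a seven-unit softmax output); all layer widths are assumptions:

```python
from tensorflow.keras import layers, models

def build_model(num_classes: int = 7) -> models.Sequential:
    """Illustrative CNN: conv/max-pool feature extractor, dense classifier."""
    return models.Sequential([
        layers.Input(shape=(48, 48, 1)),               # 48x48 grayscale input
        layers.Conv2D(32, (3, 3), activation="relu"),  # filter counts are assumed
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),             # fully connected layer
        layers.Dense(num_classes, activation="softmax"),  # seven emotion classes
    ])
```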

V. RESULTS

The FER2013 and CKPlus datasets are used to train the CNN; the optimizer used is Adam and the loss function is categorical cross-entropy. Table 3 lists the model's parameters.

Table 3. Model parameters for training

Total Images 981 (CKPlus), 35,887 (FER2013)
Activation Softmax
Batch Size 32
Epochs 50
Optimizer Adam and RMSprop
Loss Function Categorical cross-entropy

Table 4 shows the loss and accuracy of the model for both the FER2013 and CKPlus datasets.

Table 4. Loss and accuracy

FER2013: Accuracy 67.9%, Loss 90.9%
CKPlus: Accuracy 89%, Loss 25.6%
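Given the parameters in Table 3, the training call might look like the following sketch (Keras assumed; build_model and the training arrays carry over from the earlier sketches, and the validation split is an assumption, not a reported setting):

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

model = build_model(num_classes=7)
model.compile(
    optimizer=Adam(),                 # Adam (RMSprop was also tried, per Table 3)
    loss="categorical_crossentropy",  # loss function from Table 3
    metrics=["accuracy"],
)
history = model.fit(
    X_train,
    to_categorical(y_train, 7),       # one-hot labels for categorical cross-entropy
    batch_size=32,                    # batch size from Table 3
    epochs=50,                        # epochs from Table 3
    validation_split=0.1,             # assumed held-out fraction for monitoring
)
```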

Instead of utilizing the usual stochastic gradient descent procedure, Adam is an optimization technique that updates network weights using individual learning rates [28]. To adjust the learning rate, it employs first- and second-moment gradient estimates for each weight in the neural network.
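Concretely, the Adam update rule from [28] maintains exponential moving averages of the gradient and its square, then applies a bias-corrected step:

\begin{align}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \frac{\alpha\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{align}

where g_t is the gradient of the loss with respect to the weight θ at step t, α is the step size, and β_1, β_2, and ε are the Adam hyperparameters.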
The suggested DCNN model's normalized confusion matrix for the test samples is shown in Figure 4. The recall, that is, the proportion of positive samples correctly identified, is high for most classes, with class 2 (angry) as the exception. The prediction results for class 3 (disgust) and class 7 (surprise) are good.

Figure 4. Confusion matrix for the CKPlus-trained model
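The results were evaluated with a confusion matrix and ROC curve; a minimal scikit-learn sketch of that evaluation, assuming the trained model and test arrays from the sketches above:

```python
import numpy as np
from sklearn.metrics import auc, confusion_matrix, roc_curve
from sklearn.preprocessing import label_binarize

y_prob = model.predict(X_test)        # per-class softmax probabilities
y_pred = np.argmax(y_prob, axis=1)    # predicted emotion labels

# Row-normalized confusion matrix, as plotted in Figure 4
cm = confusion_matrix(y_test, y_pred, normalize="true")

# One-vs-rest ROC curve and AUC per emotion class
y_true_bin = label_binarize(y_test, classes=list(range(7)))
for c in range(7):
    fpr, tpr, _ = roc_curve(y_true_bin[:, c], y_prob[:, c])
    print(f"class {c}: AUC = {auc(fpr, tpr):.3f}")
```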

Figures 5 and 6 show the model accuracy and training loss for CKPlus over all epochs, respectively.

Figure 5. Accuracy graph for CKPlus
Figure 6. Loss graph for CKPlus

Figures 7 and 8 show the model accuracy and training loss for FER2013 over all epochs, respectively.


Figure 7. Accuracy graph for FER2013
Figure 8. Loss graph for FER2013

VI. CONCLUSION

This research proposes a neural network model for facial emotion identification. The model was developed using two datasets, FER2013 and CKPlus, and uses the image collections to classify seven different facial expressions. The proposed model has equivalent training and validation accuracy, indicating that it fits the data well and can generalize. The model reduces the loss function using an Adam optimizer, and it has been tested to have an accuracy of 67.9 percent on FER2013 and 89 percent on the CKPlus dataset. The study might be enhanced to use video sequences to detect changes in emotion, which could then be used for several real-time applications, including feedback analysis. This system can also be integrated with other electrical devices for their successful control.

VII. FUTURE WORK

Further testing with datasets containing photographs from many other perspectives, such as side views, bottom views, and top views, could be part of the future scope of this work. It would lead to a model that can distinguish human facial expressions from any angle and against any backdrop. Looking only at photos of facial expressions is insufficient to accurately assess human emotions; other subtle facial cues, such as curves and micro-expressions, as well as eye movement, eye blinks, changes in eye focus, and eyebrow movement, may help in emotion recognition.

VIII. REFERENCES

1. Kaulard, K., et al., The MPI facial expression database—a validated database of emotional and conversational facial expressions. PloS One, 2012. 7(3): p. e32321.
2. Sohrab, F., J. Raitoharju, and M. Gabbouj, Facial expression based satisfaction index for empathic buildings. in Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers. 2020.
3. Chen, C.-H., I.-J. Lee, and L.-Y. Lin, Augmented reality-based self-facial modeling to promote the emotional expression and social skills of adolescents with autism spectrum disorders. Research in Developmental Disabilities, 2015. 36: p. 396-403.
4. Hickson, S., et al., Eyemotion: Classifying facial expressions in VR using eye-tracking cameras. in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). 2019. IEEE.
5. Bartneck, C. and M.J. Lyons, HCI and the face: Towards an art of the soluble. in International Conference on Human-Computer Interaction. 2007. Springer.
6. Sandoval, F., et al., Computational and Ambient Intelligence: 9th International Work-Conference on Artificial Neural Networks, IWANN 2007, San Sebastián, Spain, June 20-22, 2007, Proceedings. Vol. 4507. 2007: Springer.
7. Semwal, V.B., K. Mondal, and G.C. Nandi, Robust and accurate feature selection for humanoid push recovery and classification: deep learning approach. Neural Computing and Applications, 2017. 28(3): p. 565-574.
8. Zhan, C., et al., A real-time facial expression recognition system for online games. International Journal of Computer Games Technology, 2008.
9. Assari, M.A. and M. Rahmati, Driver drowsiness detection using face expression recognition. in 2011 IEEE International Conference on Signal and Image Processing Applications (ICSIPA). 2011. IEEE.
10. Chen, L., C. Zhou, and L. Shen, Facial expression recognition based on SVM in E-learning. IERI Procedia, 2012. 2: p. 781-787.
11. Ren, X.-D., et al., Convolutional neural network based on principal component analysis initialization for image classification. in 2016 IEEE First International Conference on Data Science in Cyberspace (DSC). 2016. IEEE.
12. Alizadeh, S. and A. Fazel, Convolutional neural networks for facial expression recognition. arXiv preprint arXiv:1704.06756, 2017.
13. Chen, Z., et al., Only look once, mining distinctive landmarks from ConvNet for visual place recognition. in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2017. IEEE.
14. Breuer, R. and R. Kimmel, A deep learning perspective on the origin of facial expressions. arXiv preprint arXiv:1705.01842, 2017.
15. Jung, H., et al., Joint fine-tuning in deep neural networks for facial expression recognition. in Proceedings of the IEEE International Conference on Computer Vision. 2015.
16. Modi, S. and M.H. Bohara, Facial emotion recognition using convolution neural network. in 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS). 2021. IEEE.
17. Solanki, S., Deep convolutional neural networks for facial emotion recognition. Turkish Journal of Physiotherapy and Rehabilitation. 32(3).
18. Zhou, F., et al., Ship detection based on deep convolutional neural networks for PolSAR images. in IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium. 2018. IEEE.
19. Ou, J., Classification algorithms research on facial expression recognition. Physics Procedia, 2012. 25: p. 1241-1244.
20. Krithika, L. and L.P. GG, Student emotion recognition system (SERS) for e-learning improvement based on learner concentration metric. Procedia Computer Science, 2016. 85: p. 767-776.
21. Revina, I.M. and W.S. Emmanuel, A survey on human face expression recognition techniques. Journal of King Saud University - Computer and Information Sciences, 2021. 33(6): p. 619-628.
22. Minaee, S., M. Minaei, and A. Abdolrashidi, Deep-Emotion: Facial expression recognition using attentional convolutional network. Sensors, 2021. 21(9): p. 3046.
23. Sun, A., et al., Using facial expression to detect emotion in e-learning system: A deep learning method. in International Symposium on Emerging Technologies for Education. 2017. Springer.
24. Zadeh, M.M.T., M. Imani, and B. Majidi, Fast facial emotion recognition using convolutional neural networks and Gabor filters. in 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI). 2019. IEEE.
25. Goodfellow, I.J., et al., Challenges in representation learning: A report on three machine learning contests. in International Conference on Neural Information Processing. 2013. Springer.
26. Wu, X. and L. Bartram, Social robots for people with developmental disabilities: A user study on design features of a graphical user interface. arXiv preprint arXiv:1808.00121, 2018.
27. Aydilek, İ.B., Approximate estimation of the nutritions of consumed food by deep learning. in 2017 International Conference on Computer Science and Engineering (UBMK). 2017. IEEE.
28. Kingma, D.P. and J. Ba, Adam: A method for stochastic optimization. ICLR, 2015.
