
2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA)

Galgotias University, Greater Noida, UP, India. Oct 30-31, 2020

Facial Expression Recognition using Deep Learning
for Children with Autism Spectrum Disorder

Jasmine Awatramani, Amity University, Uttar Pradesh, jasmineawatramani@gmail.com
Nitasha Hasteer, Amity University, Uttar Pradesh, nitasha78@gmail.com

Abstract- Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that can affect communication and socialization and lead to behavioral challenges. Facial emotion recognition is a methodology that helps detect human emotions, but people with ASD generally struggle to recognize expressions. In order to improve their ability to detect emotions, it is necessary to teach them at an early stage, i.e., in childhood. This research work showcases the use and implementation of a basic Convolutional Neural Network architecture to help children with ASD identify human emotions. An existing dataset from the literature has been used to validate the model, and an accuracy of 67.50% has been achieved.

Keywords- Autism Spectrum Disorder, Deep Learning, Convolutional Neural Network, Emotions, Face Recognition.

I. INTRODUCTION

Autism Spectrum Disorder (ASD) is a combination of developmental disorders, and each child with ASD will have a different spectrum of symptoms and deficits. Although everyone develops at a slightly different pace, there are some general developmental milestones, among them socializing and language and communication. When these skills do not develop normally, it can lead to isolation, which is where the word "autism" originated: autism refers to a state in which somebody may be withdrawn from social interaction and communication, left alone. The disorder requires significant input from caretakers and can be identified at any phase of life.

Fig. 1. Conversation (Person A: autistic; Person B: non-autistic)

With the increase in the application of Machine Learning (ML) and Deep Learning (DL) techniques, early detection of various health issues now seems possible [1]. Early detection of this neurodevelopmental disorder can therefore help in maintaining a person's mental and physical health. Environmental and genetic factors are assumed to be among the reasons behind the disorder, as its exact cause has not been found. ASD cannot be completely treated, but if it is detected early there is a possibility that some of its effects can be reduced.

The objective is to educate the child with ASD to detect the facial expressions of another person while having a conversation. This work has been implemented in Jupyter along with various libraries such as Matplotlib, Keras, TensorFlow and OpenCV [15]. The paper presents the outcome of a basic CNN architecture. The rest of the paper is organized as follows: Section II provides an overview of the deep learning algorithm used (the convolutional neural network). Section III showcases the literature study in the same field. Section IV represents the methodology used in this paper. Section V showcases the experimental outcomes along with discussions. Finally, we conclude the work in Section VI.

Fig. 2. Working of CNN

II. CONVOLUTIONAL NEURAL NETWORK

Deep learning (DL) is a subgroup of machine learning, which in turn is a subgroup of artificial intelligence (AI); it is a kind of machine learning influenced by the anatomy of the human brain. A Convolutional Neural Network (CNN) filters the images before training the deep neural network. After filtering, the features within the images come to the forefront, and those features can then be spotted to identify objects. A filter is a set of multipliers. With the help of a CNN, the size of an image can be reduced while its features are still maintained. When an image is fed into the convolutional layer, a number of arbitrarily initialised filters pass over that image. Their results are fed into the next layer, and matching is performed by the neural network. Over time, the filters that give the best outputs are learnt; this procedure is known as feature extraction.

CNNs are similar to ordinary neural networks. The architecture of a ConvNet permits encoding particular properties into the architecture by making the explicit assumption that the images are the

978-1-7281-6324-6/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: AMITY University. Downloaded on June 29,2021 at 11:39:10 UTC from IEEE Xplore. Restrictions apply.
input. This allows a more efficient forward function that largely reduces the number of parameters in the network. A CNN is therefore a kind of Multi-Layer Perceptron (MLP) designed for minimal processing, but without full connectivity among the nodes [10].

The following is a synopsis of the layers and their respective features:

1. Convolutional Layer: The most important building block of a CNN, it does most of the heavy computational lifting. This layer consists of a set of filters, and its output can be viewed as neurons arranged in a 3D volume. Every filter is small spatially, but it extends through the full depth of the input volume.
2. Pooling Layer: This layer is inserted between successive convolutional layers. Its work is to gradually reduce the spatial size, in order to reduce the number of parameters, reduce computation in the CNN and also control overfitting. This layer uses the MAX function.
3. ReLU: It stands for Rectified Linear Unit. It applies an element-wise activation function; even after its application, the size of the volume remains the same.
4. Fully Connected Layer: It calculates the class scores, producing an output volume of size [1 x 1 x n], where n is the number of classes. As in an ordinary NN, every neuron in this layer is connected to all the numbers in the previous volume.

III. LITERATURE REVIEW

The research was done by accessing the IEEE Digital Library to retrieve the relevant learnings in this field. This section presents literature studies that share the same objective, namely facial emotion and expression recognition.

O. Arriaga et al. [3] conducted research with the aim of implementing a fully-sequential CNN architecture; with an accuracy of about 66%, the model was able to detect both the emotion and the gender of a person in real time. Research by A. Ankit et al. [12] compared machine learning algorithms, among which SVM showcased good accuracy. In [5], the authors designed three CNN architectures, of which the best was further optimised. In [7], the authors implemented a neural network with the help of MATLAB along with fine-tuning methodologies. In [8], the authors showcased an accuracy of 59%. B. Balasubramanian et al. [9] conducted a comparative work in which CNN outperformed the other algorithms.

A limitation common to the surveyed work is the detection of fewer emotions in real time: the model tends to get confused about which emotion to report. This problem can occur when too few images are present in the dataset for a particular emotion or emotions.
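To make the layer synopsis in Section II concrete, the sketch below (an illustration using NumPy, not code from the paper; the image and filter values are made up) applies a hand-set 3x3 vertical-edge filter and 2x2 max pooling to a tiny grayscale image. These are the same operations a convolutional layer and a pooling layer perform, except that a real CNN learns its filter values during training.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a grayscale image with one filter."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keeps the strongest response per patch."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size          # trim to a multiple of `size`
    patches = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return patches.max(axis=(1, 3))

# 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-set vertical-edge filter (a CNN would learn such values instead).
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])

features = conv2d(image, kernel)   # 4x4 map, strongest along the edge
pooled = max_pool(features)        # 2x2 map: smaller, edge still visible
print(pooled)
```

Pooling halves each spatial dimension here while the strong edge response survives, which is exactly the size-reduction-with-feature-retention property described above.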
TABLE I. LITERATURE WORK

Authors: O. Arriaga, P. G. Plöger, M. Valdenegro [3]
Objective: Propose a CNN architecture; accuracies were reported on two datasets (FER-2013 and IMDB), as the work also focused on gender classification.
Results: 66% accuracy with the proposed sequential fully-CNN architecture.

Authors: V. E. Correa, A. Jonker, M. Ozo, R. Stolk [5]
Objective: Work with three customized neural network architectures; after evaluation, the best network is further optimized.
Results: ANN with an accuracy of 66.6%.

Authors: A. Ankit, D. Narayan, A. Kumar [6]
Objective: Use an API that can fetch images, combined with the HAAR Cascade Classifier.
Results: SVM exhibited good accuracy.

Authors: E. Canbalaban, M. Ö. Efe [7]
Objective: Showcase the performance of a neural network using MATLAB; various training and fine-tuning methodologies were implemented.
Results: CNN with an accuracy of 61.80%.

Authors: Y. Zeng, N. Xiao, K. Wang, H. Yuan [8]
Objective: Implement a CNN in order to learn facial expressions.
Results: CNN with an accuracy of 59.00%.

Authors: B. Balasubramanian, P. Diwan, R. Nadar, A. Bhatia [9]
Objective: Implement and compare various machine learning and deep learning techniques and conclude which algorithm detects facial emotion with the greatest precision.
Results: Convolutional Neural Networks outperformed the other algorithms.

IV. METHODOLOGY

A. Attributes and Data Set
The dataset used in this project is FER-2013 [2]. It consists of images with 7 different emotions: 0=Angry, 1=Disgust, 2=Happy, 3=Sad, 4=Scared, 5=Surprise, 6=Neutral.
The dataset consists of 48x48 pixel grayscale images. The train.csv file consists of two columns: emotion (0-6) and pixels (the space-separated pixel string of each image, surrounded by quotes), whereas validation.csv consists of only the pixels column. The task is to predict the emotion.
While splitting, 80% of the images are contained inside the train folder and 20% inside the validation folder.
Training set: 28,709 samples
Validation set: 3,589 samples
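As a concrete illustration of the csv layout described above (a sketch assuming NumPy; the sample row below is made up, not taken from FER-2013), each pixels entry can be decoded into a 48x48 array as follows:

```python
import numpy as np

IMG_SIZE = 48  # FER-2013 faces are 48x48 grayscale

def decode_pixels(pixel_string):
    """Turn the space-separated pixel string of one csv row into a
    normalized 48x48 float array ready to feed a CNN."""
    values = np.array(pixel_string.split(), dtype=np.float32)
    return values.reshape(IMG_SIZE, IMG_SIZE) / 255.0  # scale 0..255 to 0..1

# A made-up row in the spirit of train.csv: emotion label 3 ("Sad" in the
# paper's mapping) plus 48*48 = 2304 pixel values.
emotion = 3
pixels = " ".join(["128"] * (IMG_SIZE * IMG_SIZE))
img = decode_pixels(pixels)
print(img.shape)  # (48, 48)
```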
B. Working
The FER-2013 raw data (images) was converted into a csv file with the columns emotion and pixels. While exploring the dataset, the images were zoomed, rotated and mirrored in order to increase accuracy and improve real-time emotion recognition. The dataset was then divided into training and validation sets as mentioned in the methodology. After the division, a CNN was implemented, as its dimension-reduction hypothesis suits the large number of parameters in an image. After the implementation, the model was connected to the webcam in order to test it in real time for emotion detection.

C. CNN Architecture
Initially, the system detects the face in the real-time video using HAAR features [12]. A HAAR feature is like a kernel in a Convolutional Neural Network, except that the values of a HAAR feature are decided manually. These features can be edge features (used to detect edges), line features, rectangle features and so on [11].
Then, with the OpenCV library, face detection is implemented. In the backend, the detected faces are cropped from the image and normalized to the fixed size mentioned in the beginning.
The proposed CNN architecture consists of 10 convolutional layers along with batch normalization (which normalizes an input channel with respect to the mini-batch), ReLU (which performs a threshold operation, setting values below 0 to 0), average pooling (which calculates the average of each patch), and global average pooling layers. A softmax layer converts the outputs to class probabilities, and the network concludes with a classification layer. The number of filters increases when moving forward in the network. A dropout layer with a probability of 0.5 means that some neurons are dropped out randomly. Fig. 3 shows the CNN architecture used in this work.

Fig. 3. CNN Architecture

V. EXPERIMENTAL RESULTS AND DISCUSSION

After implementation of the CNN, the average accuracy came out to be 67.50%, generally ranging between 65% and 70%.
On examination of the FER-2013 dataset, the outcome is that the number of images present for the disgust emotion is low, which results in average performance of the model in recognizing that emotion. The accuracy and training of the model could be improved if a larger dataset were used with an equivalent number of disgust-emotion images.
During the design, problems occurred because the resolution of the sample images is quite low, at only 48x48 pixels.

Fig. 4. Accuracy vs. validation loss

VI. CONCLUSION

Our CNN model gave an accuracy of 67.50% in real-time facial emotion recognition. The approach performed better with the use of softmax in the network compared to other models. The convolutional neural network that emerged from the literature review is capable of differentiating seven different groups with both static testing and real-time data, and the model produced accurate results. However, by using different feature-extraction methodologies, a more powerful machine, and a larger dataset with balanced, higher-resolution images, even higher accuracies can be achieved.
This work will be of great use in the future for the cognitive development of children suffering from ASD. The model can be further incorporated with IoT in order to serve those who need it.
REFERENCES
[1] S. Raj and S. Masood, "Analysis and Detection of Autism Spectrum Disorder Using Machine Learning Techniques," Procedia Computer Science, vol. 167, pp. 994-1004, 2020, doi: 10.1016/j.procs.2020.03.399.
[2] I. J. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B.
Hamner, W. Cukierski, Y. Tang, D. Thaler, D.-H. Lee, Y. Zhou, C.
Ramaiah, F. Feng, R. Li, X. Wang, D. Athanasakis, J. Shawe-Taylor, M.
Milakov, J. Park, R. Ionescu, M. Popescu, C. Grozea, J. Bergstra, J. Xie,
L. Romaszko, B. Xu, Z. Chuang, and Y. Bengio. Challenges in
representation learning: A report on three machine learning contests.
Neural Networks, 64:59--63, 2015. Special Issue on "Deep Learning of
Representations"
[3] https://www.kaggle.com/c/challenges-in-representation-learning-facial-
expression-recognition-challenge
[4] O. Arriaga, M. Valdenegro-Toro, and P. Plöger, "Real-time Convolutional Neural Networks for emotion and gender classification," ArXiv, abs/1710.07557, 2017.
[5] https://edlab.tc.columbia.edu/blog/10822-Do-Kids-Learn-Faster-than-
Adults
[6] V. E. Correa, A. Jonker, M. Ozo, and R. Stolk, "Emotion Recognition using Deep Convolutional Neural Networks," 2016.
[7] A. Ankit, D. Narayan, A. Kumar, " Transformation of Facial Expression
into corresponding Emoticons," in International Journal of Advanced
Research in Computer Engineering and Technology (IJARCET)
Volume-8 Issue-5, June 2019.
[8] E. Canbalaban and M. Ö. Efe, "Facial Expression Classification Using
Convolutional Neural Network and Real Time Application," 2019 4th
International Conference on Computer Science and Engineering
(UBMK), Samsun, Turkey, 2019, pp. 23-27, doi:
10.1109/UBMK.2019.8907065.
[9] Y. Zeng, N. Xiao, K. Wang and H. Yuan, "Real-Time Facial Expression
Recognition Using Deep Convolutional Neural Network," 2019 IEEE
International Conference on Mechatronics and Automation (ICMA),
Tianjin, China, 2019, pp. 1536-1541, doi:
10.1109/ICMA.2019.8816322.
[10] B. Balasubramanian, P. Diwan, R. Nadar and A. Bhatia, "Analysis of
Facial Emotion Recognition," 2019 3rd International Conference on
Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 2019,
pp. 945-949, doi: 10.1109/ICOEI.2019.8862731.
[11] https://cs231n.github.io/convolutional-networks/
[12] https://towardsdatascience.com/whats-the-difference-between-haar-
feature- classifiers-and-convolutional-neural-networks-ce6828343aeb
[13] H. Jung et al., "Development of deep learning-based facial expression recognition system," 2015 21st Korea-Japan Joint Workshop on Frontiers of Computer Vision (FCV), Mokpo, 2015, pp. 1-4, doi: 10.1109/FCV.2015.7103729.

[14] https://pathmind.com/wiki/neural-network
[15] https://cs231n.github.io/neural-networks-1/
[16] 'Jupyter', 2020. [Online]. Available: https://jupyter.org/ [Accessed: May, 2020].
[17] P. Rani, "Emotion Detection of Autistic Children Using Image Processing," 2019 Fifth International Conference on Image Information Processing (ICIIP), Shimla, India, 2019.
[18] S. Paul, J. K. Verma, A. Datta, R. N. Shaw and A. Saikia, "Deep Learning and its Importance for Early Signature of Neuronal Disorders," 2018 4th International Conference on Computing Communication and Automation (ICCCA), Greater Noida, India, 2018, pp. 1-5, doi: 10.1109/CCAA.2018.8777527.
