
Chapter 1

Facial expression recognition : state of the art

1- Introduction

Because facial expressions play an important role in human interaction, the ability to
recognize them automatically through computer vision enables a range of applications
such as human-computer interaction and data analytics.

In this chapter, we present some notions of emotions and the different coding
systems, as well as the general architecture of a facial expression recognition system.

We then present some approaches that help us to recognize facial expressions, and we
end the chapter with different machine learning techniques.

2- Facial expressions and emotions :

2.1. Definitions :

2.1.1. Emotions : emotion is expressed through many channels, such as body
posture, voice and facial expressions.

Scherer proposes the following definition : « Emotion is a set of episodic variations
in several components of the organism, in response to events assessed as
important by the organism. »

2.1.2. Facial expressions : a facial expression is a meaningful movement of the face (a facial mimic).

The meaning can be the expression of an emotion, a semantic index, or an intonation in
sign language. The interpretation of a given set of muscle movements as an
expression depends on the context of the application.
For example, in a Human-Machine interaction application where we want an
indication of the emotional state of an individual, we will try to classify the
measurements in terms of emotions.
2.2. The six universal facial expressions :
Charles Darwin wrote in his 1872 book « The Expression of the Emotions in Man
and Animals » that facial expressions of emotion are universal, not learned differently
in each culture.
Several studies since have attempted to classify human emotions and demonstrate
how the face can give away one's emotional state.[1]
In the 1960s, Ekman and Friesen defined six basic emotions on the basis of a cross-cultural
study, which indicated that humans perceive certain basic emotions in the same way
regardless of culture. These prototypical facial expressions are anger, disgust, fear,
happiness, sadness, and surprise.[2]

2.3. Coding systems :

Facial expressions are a consequence of the activity of the facial muscles,
also called mimetic muscles or muscles of facial expression.
The study of facial expressions cannot be done without studying the anatomy of
the face and the underlying structure of these muscles.
This is why some researchers have focused on coding systems for facial expressions.
Several systems have been proposed, such as Ekman's : in 1978, Ekman developed a
tool for coding facial expressions that is still widely used today.
We present some of these systems below.
2.3.1. FACS : the Facial Action Coding System is a system developed by Ekman and
Friesen which has become a standard way of describing facial expressions in both
psychology and computer animation.
FACS is based on 44 Action Units (AUs) that represent facial movements which cannot
be decomposed into smaller ones.
FACS is very successful, but it suffers from some drawbacks :
 Complexity : it takes about 100 hours of training to master the main concepts.
 Difficulty of handling by a machine : FACS was created for psychologists, and some
measurements remain vague and difficult to assess by a machine.
 Lack of precision : the transitions between two states of a muscle are
represented linearly, which is an approximation of reality.
2.3.2. MPEG-4 : the MPEG-4 video coding standard includes a model of the human face
developed by the Face and Body Ad Hoc Group.
This is a 3D model.
The model is built on a set of facial attributes called Facial Feature Points (FFP).
Measurements are used to describe muscle movements (Facial Animation
Parameters, the equivalents of Ekman's Action Units).

Figure : MPEG-4 model
2.3.3. Candide : Candide is a face model containing 75 vertices and 100 triangles.
It is composed of a generic face model and a set of parameters (Shape Units).
These parameters are used to adapt the generic model to a particular individual.
They represent the differences between individuals and are 12 in number :
1/ head height
2/ vertical position of the eyebrows
3/ vertical position of the eyes
4/ eye width
5/ eye height
6/ eye separation distance
7/ depth of the cheeks
8/ depth of the nose
9/ vertical position of the nose
10/ degree of curvature of the nose
11/ vertical position of the mouth
12/ width of the mouth

Figure : CANDIDE model

2.4. Facial emotion recognition : areas of application [1']

The challenging problem of automatically recognizing human emotions has become a
research field involving many scientists specializing in different areas such as artificial
intelligence, computer vision, psychology, physiology, education and website
customization.
3- Facial expression recognition architecture :
A system that performs automatic recognition of facial expressions consists of three
modules :
The first one detects and registers the face in the input image or image sequence.
It can be a detector that locates the face in each image, or one that detects the face
in the first image only and then tracks it through the rest of the video sequence.
The second module extracts and represents the facial changes caused by facial
expressions.
The last one determines the similarity between the set of extracted characteristics and
a set of reference characteristics.
Other filters or data preprocessing modules can be inserted between these main
modules to improve the results of detection, feature extraction or
classification.

Fig : general architecture of the facial expression recognition system
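
To fix ideas, this three-module pipeline can be sketched as a simple chain of Python functions (a skeleton only; the function names are hypothetical and do not correspond to a specific library) :

def detect_face(image):
    # Module 1: locate and crop the face in the input image (or track it in a video)
    ...

def extract_features(face):
    # Module 2: extract and represent the facial changes (geometric or appearance features)
    ...

def classify_expression(features, reference_set):
    # Module 3: measure the similarity between the extracted features and the reference characteristics
    ...

def recognize_expression(image, reference_set):
    face = detect_face(image)
    features = extract_features(face)
    return classify_expression(features, reference_set)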


3.1. Face detection :
Face detection consists of determining the presence or absence of faces in a picture.
This is a preliminary task necessary for most face analysis techniques.
The techniques used generally come from the field of pattern recognition.
There are several techniques for detecting faces; we mention the most used ones.
 Automatic facial processing : a method that characterizes faces by distances
and proportions between particular points around the eyes, nose and corners of
the mouth, but it is not effective when the lighting is low.
 Eigenfaces : an effective characterization method in facial processing,
used for tasks such as face detection and recognition.
It is based on representing facial features from a set of model grayscale
images.
 LDA (Linear Discriminant Analysis) : based on predictive discriminant
analysis, it explains and predicts the membership of an individual
in a predefined class based on characteristics measured using prediction
variables.
 LBP (Local Binary Patterns) : this technique divides the face into square
sub-regions of equal size in which the LBP features are computed; the vectors obtained
are concatenated to get the final feature vector (see the sketch after this list).
 Haar filter : this face detection method uses a multiscale Haar filter. The
characteristics of a face are described in an XML file.
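
To illustrate the LBP idea mentioned above, the following sketch (not part of the system described later in this report) computes a concatenated LBP histogram with scikit-image; the 4x4 grid of sub-regions is an arbitrary choice made for the example :

import numpy as np
from skimage.feature import local_binary_pattern

def lbp_feature_vector(gray_face, regions=4, points=8, radius=1):
    # compute the uniform LBP code of every pixel of the (grayscale) face
    lbp = local_binary_pattern(gray_face, points, radius, method="uniform")
    n_bins = points + 2                       # number of distinct uniform patterns
    h, w = lbp.shape
    rh, rw = h // regions, w // regions
    features = []
    for i in range(regions):                  # divide the face into equal square sub-regions
        for j in range(regions):
            block = lbp[i * rh:(i + 1) * rh, j * rw:(j + 1) * rw]
            hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins), density=True)
            features.append(hist)
    return np.concatenate(features)           # final feature vector

# usage on a 48x48 grayscale face crop:
# vector = lbp_feature_vector(face_image)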
3.2. Feature extraction
The characteristic points of the face are mainly located around the facial
components such as the eyes, mouth, eyebrows, nose and chin.
The detection of the characteristic points of the face usually starts from a rectangular
bounding box returned by a face detector, which locates the face. Geometric features,
such as the contours of the facial components, are then extracted from this region.
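
For illustration only (dlib is not among the tools used in the rest of this report, and the 68-landmark model file below must be downloaded separately), characteristic points are commonly obtained with a pre-trained landmark predictor applied inside the bounding box returned by the face detector :

import cv2
import dlib

detector = dlib.get_frontal_face_detector()                                # face bounding boxes
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # external model file

image = cv2.imread("face.jpg")                                             # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

for rect in detector(gray, 1):
    shape = predictor(gray, rect)
    points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    # geometric features can then be derived from these points,
    # for example the mouth width as the distance between the two mouth corners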
3.3. Facial emotion recognition :

3.4. Facial expression databases [2]

Having sufficient labeled training data that includes as many variations of
populations and environments as possible is important for the design of a deep
expression recognition system.
We introduce below some databases that contain a large number of affective images,
collected in controlled or real-world conditions, that are used to train deep neural networks.
 CK+ : the Extended Cohn-Kanade database is the most extensively used
laboratory-controlled database for evaluating FER systems.
CK+ contains 593 video sequences from 123 subjects.
The sequences vary in duration from 10 to 60 frames and show a shift from a
neutral facial expression to the peak expression. Among these videos, 327
sequences from 118 subjects are labeled with seven basic expression
labels (anger, contempt, disgust, fear, happiness, sadness and surprise) based
on the Facial Action Coding System (FACS).
Because CK+ does not provide specified training, validation and test sets, the
algorithms evaluated on this database are not uniform.
 MMI : this database is laboratory-controlled and includes 326 sequences from
32 subjects. A total of 213 sequences are labeled with the six basic expressions
and 205 sequences are captured in frontal view. In contrast to CK+, sequences
in MMI are onset-apex-offset labeled :
the sequence begins with a neutral expression, reaches the peak near the
middle and then returns to the neutral expression.
 JAFFE : the Japanese Female Facial Expression database is a laboratory-
controlled image database that contains 213 samples of posed expressions
from 10 Japanese females. Each person has 3 to 4 images for each of the six basic
facial expressions (anger, disgust, fear, happiness, sadness and surprise) and
one image with a neutral expression. The database is challenging because it
contains few examples per subject and per expression.
 FER-2013 : this database was introduced during the ICML 2013 Challenges in
Representation Learning. FER-2013 is a large-scale, unconstrained
database collected automatically with the Google image search API. All images
have been registered and resized to 48×48 pixels after rejecting wrongly
labeled frames and adjusting the cropped region. FER-2013 contains 28,709
training images, 3,589 validation images and 3,589 test images with seven
expression labels (anger, disgust, fear, happiness, sadness, surprise and
neutral). A loading sketch for this database is given just after this list.
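
For illustration, assuming the single-CSV distribution of FER-2013 from the ICML 2013 challenge (one row per image, columns emotion, pixels and Usage), the database can be loaded as follows :

import numpy as np
import pandas as pd

data = pd.read_csv("fer2013.csv")   # usual file name of the challenge release

def to_arrays(df):
    # turn each space-separated 'pixels' string into a 48x48 grayscale array
    X = np.stack([np.array(p.split(), dtype=np.uint8).reshape(48, 48) for p in df["pixels"]])
    y = df["emotion"].values        # integer labels 0..6 for the seven expressions
    return X, y

X_train, y_train = to_arrays(data[data["Usage"] == "Training"])
X_val, y_val = to_arrays(data[data["Usage"] == "PublicTest"])
X_test, y_test = to_arrays(data[data["Usage"] == "PrivateTest"])
print(X_train.shape, X_val.shape, X_test.shape)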
3.5. Machine learning
Chapter II
Deep learning
1. Introduction :
Deep learning is a subset of machine learning that uses neural networks to
analyze different factors with a structure similar to the human neural system.
It uses complex multi-layered neural networks, where the level of abstraction
increases gradually through non-linear transformations of the input data.[4]
It concerns algorithms inspired by the structure and function of the brain, which can
learn several levels of representation in order to model complex relationships
between data.
2. Machine learning vs Deep learning
Machine learning algorithms work well for a wide variety of problems. However, they
have failed to solve some major AI problems such as speech, face and emotion
recognition.
A classical machine learning method includes the following four steps (illustrated by the
small example at the end of this section) :
 Feature engineering : choose the attributes (features) used as a basis for prediction.
 Choose the appropriate machine learning algorithm (such as a classification
algorithm or a regression algorithm).
 Train and evaluate model performance (for different algorithms, evaluate and
select the best performing model).
 Use the trained model to classify or predict unknown data.[5]
Most of the characteristics of an application must be determined by an expert and
then encoded as a data type. Features can be pixel values, shapes, etc.
The performance of machine learning algorithms depends on the accuracy of
the features extracted.
Deep learning reduces the task of developing new feature extractors by
automating the phase of extracting and learning features.[6]
Deep learning uses neural networks to learn representations of characteristics
directly from the data.
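
As a small illustration of these four steps (with dummy data and an arbitrarily chosen classifier; scikit-learn is among the tools listed in Chapter III), consider the following sketch :

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 1- Feature engineering: each sample is assumed to be a hand-crafted
#    feature vector (for example an LBP histogram of a face).
X = np.random.rand(200, 64)            # 200 samples, 64 features (dummy data)
y = np.random.randint(0, 6, 200)       # 6 expression labels (dummy labels)

# 2- Choose the machine learning algorithm (here an SVM classifier).
clf = SVC(kernel="rbf", C=1.0)

# 3- Train and evaluate the model.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# 4- Use the trained model on unknown data.
new_sample = np.random.rand(1, 64)
print("predicted class:", clf.predict(new_sample))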
3. Artificial neural network [8]
An artificial neural network (ANN) is a computing model that tries to mimic the human brain,
in a very primitive way, to emulate the capabilities of human beings in a very limited
sense. ANNs have been developed as a generalization of mathematical models of
human cognition or neural biology.
An ANN takes an input vector X and produces an output vector Y; the relationship between
X and Y is determined by the network architecture.[9]
An ANN is a parallel, distributed information-processing network. It consists of a
number of information-processing elements, called neurons or nodes, which are
grouped in layers.
The processing elements of the input layer receive the input vector and transmit the values
to the next layer of processing elements across connections, where this process is
continued. This type of network, where data flow one way (forward), is known as a
feed-forward network. A feed-forward ANN has an input layer, an output layer and one
or more hidden layers between the input and the output layers.
Each neuron in a layer is connected to all the neurons of the next layer, and
the neurons in one layer are connected only to the neurons of the immediately following
layer. The strength of the signal passing from one neuron to another depends on
the weight of the interconnection.
The hidden layers enhance the network's ability to model complex functions.
In [8], the performance of a BPANN (back-propagation artificial neural network) model was
compared with a linear transfer function (LTF) model and was found to be
superior.
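
For illustration only (the layer sizes are arbitrary and are not those of the system described later), such a feed-forward network can be written in a few lines of Keras :

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(100,)))   # first hidden layer (100 inputs)
model.add(Dense(32, activation='relu'))                       # second hidden layer
model.add(Dense(6, activation='softmax'))                     # output layer (6 classes)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()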

4. Convolutional neural network (CNN)

4.1. Presentation
The convolutional neural network (CNN) is a type of artificial neural network proposed
by Yann LeCun (LeNet, 1998).
CNNs are one of the most popular deep learning architectures for image
classification, recognition and segmentation.
A CNN consists of a hierarchy of multiple hidden layers. Its artificial neurons take an
input from the image, multiply it by weights, add a bias and then apply an activation function.
Artificial neurons can thus be used for image classification, recognition and
segmentation by performing simple convolutions, provided the convolutional neural
network is fed with a large amount of data.[10]
4.2. Architecture [3][6]
Convolutional neural networks are among the most efficient models for classifying image
data.
They were inspired by the mammalian visual cortex.[7]
Each CNN channel is made up of convolutional layers, max pooling layers, fully
connected layers and an output layer.[11]

Figure : architecture of a convolutional neural network


4.2.1. The convolution layer (CONV)
The convolution layer is the first layer used to extract features from an input image[6]. It is
the fundamental unit of a convnet[12].
It contains a set of filters whose parameters need to be learned.
Once the information reaches a convolution layer, the layer convolves every filter across
the spatial dimensions of the data to produce a 2D activation map.
The result of convolving an (N, M) image matrix with an (n, m) filter matrix is called
a « feature map ».
Convolving an image with different filters can perform operations such as
edge detection, blurring or sharpening[12].
During the forward pass, each filter is convolved across the width and height of the
input volume, computing dot products between the entries of the filter and the input
at every position. As the filter slides over the width and the height of the input
volume, it produces a 2-dimensional activation map that gives the responses of the
filter at every spatial position.
There is an entire set of such filters, and each of them produces a separate 2-
dimensional activation map.[14]
The 2D convolution between an image A and a filter B can be written as :

C(i, j) = Σ_{m=0}^{Ma−1} Σ_{n=0}^{Na−1} A(m, n) · B(i − m, j − n)

where the size of A is (Ma × Na), the size of B is (Mb × Nb), 0 ≤ i < Ma + Mb − 1 and 0 ≤ j < Na + Nb − 1.
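
A direct (and deliberately naive) NumPy implementation of this formula, given only as an illustrative sketch, is :

import numpy as np

def conv2d_full(A, B):
    # C(i,j) = sum_m sum_n A(m,n) * B(i-m, j-n), output of size (Ma+Mb-1) x (Na+Nb-1)
    Ma, Na = A.shape
    Mb, Nb = B.shape
    C = np.zeros((Ma + Mb - 1, Na + Nb - 1))
    for i in range(C.shape[0]):
        for j in range(C.shape[1]):
            s = 0.0
            for m in range(Ma):
                for n in range(Na):
                    if 0 <= i - m < Mb and 0 <= j - n < Nb:
                        s += A[m, n] * B[i - m, j - n]
            C[i, j] = s
    return C

# example: a 5x5 image convolved with a 3x3 edge-detection filter gives a 7x7 output
A = np.arange(25, dtype=float).reshape(5, 5)
B = np.array([[-1., -1., -1.], [-1., 8., -1.], [-1., -1., -1.]])
print(conv2d_full(A, B).shape)   # (7, 7)

In practice, the convolution layers of a CNN use an optimized version of this operation (and usually the 'valid' or 'same' variants rather than the full output).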

The CNN learns the values of these filters on its own during the training process (although
parameters such as the number of filters, the filter size and the architecture of the network
still need to be specified before training).
The more filters we use, the more image features get extracted and the better the
network becomes.
Three parameters control the size of the feature map (convolved feature) :
 Depth : corresponds to the number of filters we use for the convolution
operation.
 Stride : the number of pixels by which the filter is shifted over the input matrix at each step.
 Zero padding : it is convenient to pad the input matrix with zeros around the
border, so that the filter can be applied to the bordering elements of the input image
matrix.
An additional operation is applied after every convolution operation, called the ReLU layer.
A rectified linear unit applies the activation function
f(x) = max(0, x).
Other non-linear functions, such as tanh or sigmoid, can also be used
instead of ReLU, but most data scientists use ReLU since, performance-wise, it is better
than the other two.[13]
4.2.2. The pooling layer [6][13][14]
A pooling layer is inserted between successive convolution layers; it applies a
downsampling operation along the spatial dimensions (width and height), which
reduces the dimensionality of each map while retaining the important information.
Spatial pooling can be of different types, such as max pooling, average pooling and
sum pooling.
In max pooling, a spatial neighborhood (for example a 2×2 window) is defined and the
largest element of the rectified feature map within that window is taken.
In average (or sum) pooling, the average (or sum) of all elements in that window is
taken instead.
In practice, max pooling has been shown to work better.
Max pooling reduces the input by applying the max function over the input x. Let m
be the size of the filter; then the output is computed as :

M(x_{i,j}) = max { x_{i+k, j+l} : |k| ≤ m/2, |l| ≤ m/2, k, l ∈ ℕ }

Figure : max pooling
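
A minimal NumPy sketch of 2x2 max pooling with stride 2 (illustrative only) :

import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # slide a (size x size) window over the feature map and keep the largest value
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 0],
               [1, 2, 9, 8],
               [0, 3, 4, 7]], dtype=float)
print(max_pool(fm))   # [[6. 5.]
                      #  [3. 9.]]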

4.2.3. The fully connected layer [13][14]

At the end of the network, the extracted features (the CNN code) are concatenated
into a single vector and fed into a fully connected layer (a multilayer
perceptron).
The term « fully connected » indicates that every neuron in the previous layer is
connected to every neuron in the next layer. The output of the convolutional and
pooling layers represents high-level features of the input image.
The purpose of the fully connected layer is to use these features to classify the
input image into the various classes defined by the training dataset.
4.2.4. Activation function
The activation function is a mathematical function applied to the signal at the output of
an artificial neuron.
The term comes from its biological equivalent, the « activation potential », a stimulation
threshold which, once reached, leads to a response of the
neuron.
Softmax is used as the output activation function; it treats the outputs as scores for each class.
With softmax, the function mapping stays unchanged and these scores are
interpreted as the unnormalized log probabilities of each class.
Softmax is calculated as :

f(z)_j = e^{z_j} / Σ_{k=1}^{K} e^{z_k}

where j is the index of the class and K is the total number of facial expression classes.
The ReLU is an activation function which eliminates all negative values.
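
Both activation functions can be written in a few lines of NumPy (illustrative sketch) :

import numpy as np

def relu(x):
    # eliminates all negative values
    return np.maximum(0, x)

def softmax(z):
    # turns a vector of K class scores into probabilities that sum to 1
    e = np.exp(z - np.max(z))        # subtracting the max improves numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, -1.0, 0.5])
print(relu(scores))                            # [2.  1.  0.  0.5]
print(softmax(scores), softmax(scores).sum())  # probabilities, and their sum (1.0)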
5. Visualisation of some CNN architectures [4]
In recent years, CNN architectures have evolved considerably. These networks
have become so deep that it is extremely difficult to visualise an entire
model.
5.1. LeNet-5 (1998) : one of the simplest architectures, with 2 convolutional
and 3 fully-connected layers.
This architecture has about 60,000 parameters.
5.2. AlexNet (2012) : with 60M parameters, AlexNet has 8 layers: 5
convolutional and 3 fully connected.
AlexNet simply stacked a few more layers. It was one of the
largest convolutional neural networks trained to date on subsets of ImageNet,
and one of the first to use ReLU as an activation function.
5.3. VGG-16 (2014) : with this architecture, we notice that CNNs were starting to
get deeper and deeper. This is because the most straightforward way of
improving the performance of deep neural networks is to increase their size.
VGG-16 has 13 convolutional and 3 fully connected layers, carrying over
the ReLU tradition from AlexNet. It has 138M parameters
and takes about 500 MB of storage space.
5.4. Inception-v1 (2014) : this 22-layer architecture with 5M parameters is
called Inception-v1. The design of the Inception
module is a product of research on approximating sparse structures.
5.5. ResNet-50 (2015) : in the previous CNNs, we have seen nothing but an
increasing number of layers in the design to achieve better
performance. But as the network depth increases, accuracy gets
saturated and then degrades rapidly.
Researchers at Microsoft addressed this problem with ResNet,
using skip connections to build deeper models.
ResNet was also one of the early adopters of batch normalisation and has about 26M
parameters.
5.6. Xception (2016) : Xception is an adaptation of Inception in which the
Inception modules are replaced with depthwise separable
convolutions. It has roughly the same number of parameters as
Inception-v1 (23M).
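
Several of these architectures are available directly in tf.keras.applications; the short sketch below builds three of them without their pre-trained weights (weights=None), simply to compare their parameter counts :

from tensorflow.keras.applications import VGG16, ResNet50, Xception

for name, builder in [("VGG16", VGG16), ("ResNet50", ResNet50), ("Xception", Xception)]:
    model = builder(weights=None, include_top=True, classes=1000)
    print(name, "->", model.count_params(), "parameters")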
6. Conclusion
In this chapter, we have presented the neural network and its different
architectures.
We focused on CNNs, their structure and their different layers, and then
presented a few example architectures.
In the next chapter, we explain the choice of architecture for our facial
expression recognition system.
Chapter III
Design of a CNN based FER system
1. Introduction
2. System presentation
3. General architecture
3.1. Face detection
3.2. Facial features extraction
3.3. Presentation of VGG16
Training output (epoch 15) :
accuracy : training (min: 0.317, max: 0.649, cur: 0.649), validation (min: 0.391, max: 0.622, cur: 0.622)
loss : training (min: 0.931, max: 1.790, cur: 0.931), validation (min: 1.021, max: 1.600, cur: 1.021)

Epoch 00015: saving model to C:\Users\HP\Desktop\Project\model_weights.h5
448/448 [==============================] - 1775s 4s/step - loss: 0.9309 - accuracy: 0.6492 - val_loss: 1.0208 - val_accuracy: 0.6223

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 48, 48, 64) 640
_________________________________________________________________
batch_normalization (BatchNo (None, 48, 48, 64) 256
_________________________________________________________________
activation (Activation) (None, 48, 48, 64) 0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 24, 24, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 24, 24, 64) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 24, 24, 128) 204928
_________________________________________________________________
batch_normalization_1 (Batch (None, 24, 24, 128) 512
_________________________________________________________________
activation_1 (Activation) (None, 24, 24, 128) 0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 128) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 12, 12, 128) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 12, 12, 512) 590336
_________________________________________________________________
batch_normalization_2 (Batch (None, 12, 12, 512) 2048
_________________________________________________________________
activation_2 (Activation) (None, 12, 12, 512) 0
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 6, 6, 512) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 6, 6, 512) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 6, 6, 512) 2359808
_________________________________________________________________
batch_normalization_3 (Batch (None, 6, 6, 512) 2048
_________________________________________________________________
activation_3 (Activation) (None, 6, 6, 512) 0
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 3, 3, 512) 0
_________________________________________________________________
dropout_3 (Dropout) (None, 3, 3, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 4608) 0
_________________________________________________________________
dense (Dense) (None, 256) 1179904
_________________________________________________________________
batch_normalization_4 (Batch (None, 256) 1024
_________________________________________________________________
activation_4 (Activation) (None, 256) 0
_________________________________________________________________
dropout_4 (Dropout) (None, 256) 0
_________________________________________________________________
dense_1 (Dense) (None, 512) 131584
_________________________________________________________________
batch_normalization_5 (Batch (None, 512) 2048
_________________________________________________________________
activation_5 (Activation) (None, 512) 0
_________________________________________________________________
dropout_5 (Dropout) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 7) 3591
=================================================================
Total params: 4,478,727
Trainable params: 4,474,759
Non-trainable params: 3,968
_________________________________________________________________

1. Software and tools used for implementation :

 Python 3.7
 OpenCV
 TensorFlow
 Keras
 NumPy
 scikit-learn
 Matplotlib
2. Database :
For the best performance, we should train the network with a large number of sample
images; this increases the accuracy and improves the performance of the model.
Unfortunately, such large labeled datasets are rarely publicly available, but we
have access to two public databases (FER2013 and CK+).
For our system, we will use the FER2013 database.
3. Implementation
The Facial Emotion Recognition system consists of two modules :
the first one is for face detection and the second one is for emotion recognition.
In the following, we detail each module.
3.1 Face detection :
To detect faces, we chose the method proposed by Paul Viola and
Michael Jones.
In our system, we opted for Haarcascade_eye.xml from the OpenCV library,
which provides the Haar cascade method.
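
A minimal sketch of this detection step with OpenCV is given below (the frontal-face cascade file name is used here for illustration; cv2.data.haarcascades is the directory where OpenCV ships its XML cascades) :

import cv2

cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread('test.jpg')                        # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

for (x, y, w, h) in faces:
    roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48))   # face region, resized to the network input size
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

cv2.imwrite('detected.jpg', img)

The emotion recognition module is trained with the code below :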
from __future__ import print_function
import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Dropout,Activation,Flatten,BatchNormalization
from tensorflow.keras.layers import Conv2D,MaxPooling2D
from tensorflow.keras.models import load_model
import os
import numpy as np
import matplotlib.pyplot as plt
from livelossplot.tf_keras import PlotLossesCallback

from sklearn.utils import shuffle


from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report,confusion_matrix
import itertools

num_classes = 5
img_rows,img_cols = 48,48
batch_size = 8

train_data_dir = 'C:/Users/HP/Desktop/projectEssai/train'
validation_data_dir = 'C:/Users/HP/Desktop/projectEssai/validation'

train_datagen = ImageDataGenerator(
rescale=1./255,
rotation_range=30,
shear_range=0.3,
zoom_range=0.3,
width_shift_range=0.4,
height_shift_range=0.4,
horizontal_flip=True,
fill_mode='nearest')

validation_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
train_data_dir,
color_mode='grayscale',
target_size=(img_rows,img_cols),
batch_size=batch_size,
class_mode='categorical',
shuffle=True)

validation_generator = validation_datagen.flow_from_directory(
validation_data_dir,
color_mode='grayscale',
target_size=(img_rows,img_cols),
batch_size=batch_size,
class_mode='categorical',
shuffle=True)

model = Sequential()
# Block-1
model.add(Conv2D(32,(3,3),padding='same',kernel_initializer='he_normal',input_shape=(img_rows,img_cols,1)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Conv2D(32,(3,3),padding='same',kernel_initializer='he_normal',input_shape=(img_rows,img_cols,1)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))
# Block-2
model.add(Conv2D(64,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Conv2D(64,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))
# Block-3
model.add(Conv2D(128,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Conv2D(128,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))
# Block-4
model.add(Conv2D(256,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Conv2D(256,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))
# Block-5
model.add(Flatten())
model.add(Dense(64,kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
# Block-6
model.add(Dense(64,kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
# Block-7
model.add(Dense(num_classes,kernel_initializer='he_normal'))
model.add(Activation('softmax'))

print(model.summary())

from tensorflow.keras.optimizers import RMSprop,SGD,Adam
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

checkpoint = ModelCheckpoint('Emotion_little_vgg.h5',
monitor='val_loss',
mode='min',
save_best_only=True,
verbose=1)

#earlystop = EarlyStopping(monitor='val_loss',
#min_delta=0,
#patience=3,
#verbose=1,
#restore_best_weights=True
#)
reduce_lr = ReduceLROnPlateau(monitor='val_loss',
factor=0.2,
patience=3,
verbose=1,
min_delta=0.0001)

callbacks = [PlotLossesCallback(),checkpoint,reduce_lr]

model.compile(loss='categorical_crossentropy',
optimizer = Adam(lr=0.001),
metrics=['accuracy'])

nb_train_samples = 24176
nb_validation_samples = 3006
epochs=15

history=model.fit_generator(
train_generator,
steps_per_epoch=nb_train_samples//batch_size,
epochs=epochs,
callbacks=callbacks,
validation_data=validation_generator,
validation_steps=nb_validation_samples//batch_size)
Training output (epoch 15) :
accuracy : training (min: 0.255, max: 0.464, cur: 0.464), validation (min: 0.294, max: 0.586, cur: 0.586)
loss : training (min: 1.295, max: 1.725, cur: 1.295), validation (min: 1.043, max: 1.551, cur: 1.043)

Epoch 00015: val_loss improved from 1.05397 to 1.04265, saving model to Emotion_little_vgg.h5
3022/3022 [==============================] - 1431s 473ms/step - loss: 1.2950 - accuracy: 0.4640 - val_loss: 1.0426 - val_accuracy: 0.5860

 If we increase the number of epochs up to 25, the accuracy increases up to about 65%.

Confusion matrix in Keras

test_dir=validation_data_dir

import sklearn
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report,confusion_matrix
import matplotlib.pyplot as plt

%matplotlib inline
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
test_dir,
target_size=(img_rows, img_cols),
color_mode='grayscale',
batch_size=batch_size,
class_mode='categorical',
shuffle=True)   # note: shuffle=False is needed for y_pred below to line up with test_generator.classes

FLOW1_model = load_model('C:/Users/HP/Desktop/projectEssai/Emotion_little_vgg(3).h5')

# Confusion Matrix and Classification Report
Y_pred = FLOW1_model.predict_generator(test_generator,
test_generator.samples // test_generator.batch_size)
y_pred = np.argmax(Y_pred, axis=1)
y_pred = y_pred.reshape(-1)
print('Confusion Matrix')
print(confusion_matrix(test_generator.classes, y_pred))
print('Classification Report')

target_names = ['happy','sad','neutral','angry', 'surprise']

print(classification_report(test_generator.classes, y_pred,
target_names=target_names))

# Evaluating using Keras model.evaluate:
x, y = zip(*(test_generator[i] for i in range(len(test_generator))))
x_test, y_test = np.vstack(x), np.vstack(y)
loss, acc = FLOW1_model.evaluate(x_test, y_test, batch_size=batch_size)

print("Accuracy: ", acc)
print("Loss: ", loss)
Confusion Matrix
[[ 17 346 253 168 176]
[ 26 684 496 291 327]
[ 15 468 321 204 208]
[ 15 441 300 194 189]
[ 9 312 213 125 138]]
Classification Report
precision recall f1-score support
happy 0.21 0.02 0.03 960
sad 0.30 0.38 0.34 1824
neutral 0.20 0.26 0.23 1216
angry 0.20 0.17 0.18 1139
surprise 0.13 0.17 0.15 797

accuracy 0.23 5936


macro avg 0.21 0.20 0.19 5936
weighted avg 0.22 0.23 0.21 5936

742/742 [==============================] - 60s 81ms/step - loss: 1.0494 - accuracy: 0.5864
Accuracy: 0.5864218473434448
Loss: 1.0493624210357666

The second architecture was inspired by VGGNet, but we reduced the number
of layers, as shown in the code below :
# Initialising the CNN
model = Sequential()
# 1 - Convolution
model.add(Conv2D(64,(3,3), padding='same', input_shape=(48, 48,1)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# 2nd Convolution layer
model.add(Conv2D(128,(5,5), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# 3rd Convolution layer
model.add(Conv2D(512,(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# 4th Convolution layer
model.add(Conv2D(512,(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# Flattening
model.add(Flatten())
# Fully connected layer 1st layer
model.add(Dense(256))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.25))
# Fully connected layer 2nd layer
model.add(Dense(512))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.25))

model.add(Dense(5, activation='softmax'))

opt = Adam(lr=0.001)
model.compile(optimizer=opt, loss='categorical_crossentropy',
metrics=['accuracy'])
model.summary()

Training output (epoch 15) :
accuracy : training (min: 0.377, max: 0.671, cur: 0.671), validation (min: 0.410, max: 0.690, cur: 0.690)
loss : training (min: 0.855, max: 1.499, cur: 0.855), validation (min: 0.829, max: 1.384, cur: 0.829)

Epoch 00015: val_loss improved from 0.78075 to 0.72863, saving model to model_weights.h5
3035/3035 [==============================] - 1215s 400ms/step - loss: 0.6999 - accuracy: 0.7360 - val_loss: 0.7286 - val_accuracy: 0.7279
Bibliographie :
[1] https://www.kairos.com/blog/the-universally-recognized-facial-expressions-of-emotion
[1'] Agata Kołakowska, Agnieszka Landowska, Mariusz Szwoch, Wioleta Szwoch, « Emotion recognition and its applications », Gdansk University of Technology.
[2] Shan Li and Weihong Deng (member, IEEE), « Deep facial expression recognition : a survey ».
[3] https://towardsdatascience.com/a-simple-guide-to-convolutional-neural-networks-751789e7bd88
[4] https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d
[4] https://medium.com/ai-in-plain-english/artificial-intelligence-vs-machine-learning-vs-deep-learning-whats-the-difference-dccce18efe7f
[5] Yang Xin, Lingshuang Kong, Zhi Liu (IEEE member), Yuling Chen, Yanmiao Li, Hongliang Zhu, Mingcheng Gao, Haixia Hou, Chunhua Wang, « Machine learning and deep learning methods for cybersecurity », Beijing University of Posts and Telecommunications ; Shandong University ; Guizhou University ; China Changfeng Science Technology Industry Group Corp.
[6] Nadia Jmour, Sehla Zayen, Afef Abdelkrim, « Convolutional neural networks for image classification », L.A.R.A laboratory, National Engineering School of Tunis, National Engineering School of Carthage.
[7] Aditya Kakde, Durgansh Sharma, Nitin Arora, « A comparative study of different types of CNN and highway CNN », University of Petroleum and Energy Studies, Indian Institute of Technology Roorkee.
[8] Dinesh Bisht, Shilpa Jain, Mohan Raju Mann, « Prediction of water table elevation fluctuation through fuzzy logic and Artificial Neural Networks », School of Engineering and Technology, ITM University, Gurgaon, India.
[9] link.springer.com/chapter/ Basic learning principles of Artificial Neural Networks.
[10] Mehmet Akif Ozdemir, Berkay Elagoz, Aysegul Alaybeyoglu, Reza Sadighzadeh, Aydin Akan, « Real-time emotion recognition from facial expressions using CNN architecture », Department of Biomedical Engineering, Department of Computer Engineering, Business Administration, Izmir Katip Celebi University, Izmir, Turkey.
[11] Lucy Nwosu, Hui Wang, Jiang Lu, Ishaq Unwala, Xiaokun Yang and Ting Zhang, « Deep neural network for facial expression recognition using facial parts », Department of Computer Engineering and Department of CSET, University of Houston.
[12] Shadman Sakib, Nazib Ahmed, Ahmed Jawad Kabir, and Hridon Ahmed, « An overview of convolutional neural networks : its architecture and applications », Department of EEE, International University of Business Agriculture and Technology, Dhaka 1230, Bangladesh ; Department of EEE, Independent University of Bangladesh.
[13] https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
[14] Deepesh Leakhak, « Facial expression recognition system using convolutional neural network », Department of Electronics and Computer Engineering.
