
Republic of Tunisia
Ministry of Higher Education, Scientific Research and Information and Communication Technologies
Tunis El Manar University
National School of Engineering of Tunis
LR-SITI-ENIT / ST-EN07/00
Serial N°: 2015 / DIMA-033

Master Project
Report
presented at

National School of Engineering of Tunis


(LR-SITI-ENIT)

in order to obtain the

Master's degree in Systems, Science and Data

by

Awatef MESSAOUDI

Defended on 18/12/2020 before the committee composed of:

Mr Foulen Fouleni President


Ms Foulena Foulenia Supervisor
Mr Foulen Fouleni Reviewer
Dedication

Put your dedication lines here


And try to be expressive ;)
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum

To all of you,
I dedicate this work.

Awatef MESSAOUDI
Thanks

And put your thanks here.

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure
dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit
anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis
aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia
deserunt mollit anim id est laborum.

Contents

Dedication i

Thanks ii

Contents iv

List of Figures v

Acronyms vi

Introduction 1

1 Facial expression recognition: state of the art 2


1.1 Introduction: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Facial expressions and emotions: . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Definitions: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 The universal facial expressions: . . . . . . . . . . . . . . . . . . . . 3
1.2.3 Coding systems: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.4 Areas of application of FER: . . . . . . . . . . . . . . . . . . . . . 7
1.3 Architecture of Facial expression recognition: . . . . . . . . . . . . . . . . . 7
1.3.1 Face detection: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.2 Feature extraction: . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.3 Emotion recognition: . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.4 Facial expression databases: . . . . . . . . . . . . . . . . . . . . . . 10
1.3.5 Machine learning: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.6 Deep learning: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12


1.4 Conclusion: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2 Deep learning 14
2.1 Introduction: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Machine learning vs Deep learning: . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Artificial neural network:[11] . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Convolutional neural network CNN: . . . . . . . . . . . . . . . . . . . . . . 17
2.4.1 Presentation: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4.2 Architecture:[7][12] . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 Visualisation of some CNN architectures:[5] . . . . . . . . . . . . . . . . . 20
2.5.1 LeNet-5 (1998): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5.2 AlexNet(2012): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5.3 VGG-16(2014): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5.4 Inception-v1(2014): . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5.5 ResNet-50(2015): . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5.6 Xception(2016): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 Conclusion: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6.1 Sub section Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3 Chapter Two 24
3.1 Section One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.1 Sub section One . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.2 Sub section Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4 Chapter Two 26
4.1 Section One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.1 Sub section One . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.2 Sub section Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Conclusion 28

Appendix 29

Webography 30

Bibliography 30


List of Figures

1 The six universal emotions . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 MPEG4 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 Candide Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4 Test Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5 Machine learning vs Deep learning . . . . . . . . . . . . . . . . . . . . . . 15
6 Machine learning vs Deep learning . . . . . . . . . . . . . . . . . . . . . . 17
7 Architecture for a convolutional neural network . . . . . . . . . . . . . . . 18
8 Test Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
9 Test Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
10 Test Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26


Acronyms

ANN Artificial Neural Network
CNN Convolutional Neural Network
ENIT National School of Engineering of Tunis
FACS Facial Action Coding System
FER Facial Expression Recognition


Introduction

Welcome to National School of Engineering of Tunis (ENIT).


Again, welcome to ENIT.
Your introduction goes here.

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure
dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit
anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis
aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia
deserunt mollit anim id est laborum.

Chapter 1

Facial expression recognition: state of the art

1.1 Introduction:

Due to the important role of facial expressions in human interaction, the ability to
perform facial expression recognition automatically via computer vision enables a range
of applications such as human-computer interaction and data analytics. In this chapter,
we present some notions of emotions and the different coding theories, as well as the
architecture of facial expression recognition. We present some approaches that help us
to recognize facial expressions, and we end the chapter with the different machine learning
techniques.

1.2 Facial expressions and emotions:

1.2.1 Definitions:

1.2.1.1 Emotions:

Emotion is expressed through many channels such as body posture, voice and facial
expressions. It is a mental and physiological state which is subjective and private, and
it involves many behaviours, actions, thoughts and feelings.


Scherer proposes the following definition: « Emotion is a set of episodic variations
in several components of the organism in response to events assessed as important by
the organism. »

1.2.1.2 Facial expressions:

A facial expression is a meaningful movement of the face. The meaning can be the
expression of an emotion, a semantic index or an intonation in sign language. The
interpretation of a set of muscle movements as an expression depends on the context
of the application. For example, in a human-machine interaction application where we
want an indication of the emotional state of an individual, we will try to classify the
measurements in terms of emotions.

1.2.2 The universal facial expressions:

Charles Darwin wrote in his 1872 book « The Expression of the Emotions in Man and
Animals » that facial expressions of emotion are universal, not learned differently in each
culture. Several studies since have attempted to classify human emotions and demonstrate
how the face can give away one's emotional state.[2] In the 1960s, Ekman and Friesen defined
six basic emotions based on a cross-cultural study, which indicated that humans perceive
certain basic emotions in the same way regardless of culture. These prototypical facial
expressions are anger, disgust, fear, happiness, sadness and surprise.[2]

Figure 1. The six universal emotions


1.2.3 Coding systems:

Facial expressions are a consequence of the activity of the facial muscles, which are also
called mimetic muscles or muscles of facial expression. The study of facial expressions
cannot be done without studying the anatomy of the face and the underlying structure
of the muscles. That is why some researchers have focused on coding systems for facial
expressions. Several systems have been proposed, such as Ekman's: in 1978, Ekman
developed a tool for coding facial expressions that is still widely used today. We present
some of these systems below.

1.2.3.1 FACS:

The Facial Action Coding System, developed by Ekman and Friesen, is a standard
way of describing facial expressions in both psychology and computer animation.
FACS is based on 44 action units (AUs) that represent facial movements that cannot be
decomposed into smaller ones. FACS is very successful, but it suffers from some drawbacks,
such as:

• Complexity: it takes about 100 hours of learning to master the main concepts.

• Difficulty of handling by a machine: FACS was created for psychologists, and some
measurements remain vague and difficult to assess by a machine.

• Lack of precision: the transitions between two states of a muscle are represented in a
linear way, which is an approximation of reality.



1.2.3.2 MPEG-4:

The MPEG-4 video coding standard includes a model of the human face developed by
the Face and Body Ad Hoc Group. This is a 3D model built on a set of facial attributes
called Facial Feature Points (FFP). Measurements are used to describe muscle movements
(Facial Animation Parameters, the equivalents of Ekman's action units).

Figure 2. MPEG4 Model

1.2.3.3 Candide:

It is a model of the face containing 75 vertices and 100 triangles. It is composed of a
generic face model and a set of parameters (Shape Units). These parameters are used


to adapt the generic model to a particular individual. They represent the differences
between individuals and are 12 in number:

1. head height.

2. vertical position of the eyebrows.

3. vertical eye position.

4. eye width.

5. eye height.

6. eye separation distance.

7. depth of the cheeks.

8. depth of the nose.

9. vertical position of the nose.

10. degree of the curvature of the nose.

11. vertical position of the mouth.

12. width of the mouth.


Figure 3. Candide Model

1.2.4 Areas of application of FER:

Automatic facial expression recognition systems have many applications, including human
behavior understanding and the detection of mental disorders[3]. FER has become a research
field involving many scientists specializing in different areas such as artificial intelligence,
computer vision, psychology, physiology, education, website customization, etc.

1.3 Architecture of Facial expression recognition:

The system that performs automatic recognition of facial expressions consists of three
modules. The first detects and registers the face in the input image or image sequence;
it can detect the face in each image, or detect the face in the first image only and then
track it through the rest of the video sequence. The second module extracts and represents
the facial changes caused by facial expressions. The last one determines a similarity between
the set of extracted characteristics and a set of reference characteristics. Other filters or
data preprocessing modules can be used between these main modules to improve the results
of detection, feature extraction or classification. A minimal sketch of this pipeline is given
below.
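As an illustration, here is the three-module pipeline in Python; detect_face, extract_features and classify_expression are hypothetical placeholders standing in for the detector, feature extractor and trained classifier, not functions defined in this report:

```python
# Hypothetical three-module FER pipeline; the three helpers are placeholders
# standing in for the detection, extraction and classification modules.
def recognize_expression(image):
    face = detect_face(image)             # module 1: locate and crop the face
    if face is None:
        return None                       # no face found in this frame
    features = extract_features(face)     # module 2: represent facial changes
    return classify_expression(features)  # module 3: compare with references
```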

1.3.1 Face detection:

Face detection consists of determining the presence or absence of faces in a picture. This
is a preliminary task necessary for most face analysis techniques, and the techniques used
come from the field of pattern recognition. There are several techniques for detecting
faces; we mention the most used (a code sketch of the last one follows the list):

• Automatic facial treatment: a method that specifies faces by distances and
proportions between particular points around the eyes, nose and corners of the mouth,
but it is not effective when the light is low.

• Eigenfaces: an effective characterization method for facial processing tasks such as
face detection and recognition. It is based on representing face features from model
grayscale images.

• LDA (linear discriminant analysis): based on predictive discriminant analysis, it
explains and predicts the membership of an individual in a predefined class based on
characteristics measured using prediction variables.

• LBP (local binary patterns): this technique divides the face into square subregions
of equal size in which the LBP features are calculated. The vectors obtained are
concatenated to form the final feature vector.

• Haar filter: this face detection method uses a multiscale Haar filter. The
characteristics of a face are described in an XML file.
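As an illustration of the Haar approach, the following sketch uses OpenCV's pretrained frontal-face cascade, the kind of XML file mentioned above (face.jpg is a hypothetical input image):

```python
import cv2

# Pretrained frontal-face Haar cascade shipped with the opencv-python package.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("face.jpg")  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detectMultiScale scans the image at several scales (multiscale Haar filter).
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```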


1.3.2 Feature extraction:

The characteristics of the face are mainly located around the facial components such as
the eyes, mouth, eyebrows, nose and chin. The detection of the characteristic points of
the face is done within a rectangular box returned by a detector which locates the face.
The extraction of geometric features, such as the contours of facial components and facial
distances, provides the location or appearance of these characteristics. There are therefore
two types of approaches:

1.3.2.1 Geometric characteristics:

These represent the shape and location of the components of the face (including the
mouth, eyes, eyebrows and nose). The facial components or facial feature points are
extracted to form a feature vector representing the geometry of the face.

1.3.2.2 Appearance characteristics:

These represent changes in the appearance of the face such as wrinkles and furrows. In
these methods, the effect of head rotation and of the different facial shooting scales can
be eliminated by a normalization step before feature extraction, or by a representation of
the features before the expression recognition step.

1.3.3 Emotion recognition:

Research in this area is divided into three families: global approaches, local approaches
and hybrid approaches. Each family has advantages and disadvantages related to
environmental conditions, image orientation, head position, etc.


1.3.3.1 Global approach:

These approaches are independent of head position (top, bottom) and face image
orientation. They are effective but require a heavy learning phase, and the result depends
on the number of samples used.

1.3.3.2 Local approach:

These approaches are based on facial object detection and are robust to changes in
luminance. The position of the head and its orientation can cause some gaps in the
system.

1.3.3.3 Hybrid approach:

The alternative is to combine the two approaches (local and global) in order to take
advantage of both. The recognition phase in this system is based on machine learning
theory: a feature vector is formed to describe the facial expression, and the first part of
the classifier's work is learning. Classifier training consists of labeling the images after
detection; once the classifier is trained, it can recognize input images. The classification
methods can be divided into two groups:

• Recognition based on static data, which only concerns images.

• Recognition based on dynamic data, concerning image sequences or videos.

Various classifiers have been applied, such as neural networks, Bayesian networks, SVMs,
etc. A sketch of such a training and recognition phase follows.
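As a minimal sketch of this learning/recognition split, the following trains one of the classifiers cited above (an SVM) with scikit-learn; the random X and y are stand-ins for real feature vectors and expression labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Stand-in data: in a real system X holds one feature vector per face image
# (e.g. geometric or LBP features) and y holds the expression labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
y = rng.integers(0, 6, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = SVC(kernel="rbf")        # SVM classifier
clf.fit(X_train, y_train)      # learning phase on labeled data
pred = clf.predict(X_test)     # recognition phase on unseen input
print(accuracy_score(y_test, pred))
```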

1.3.4 Facial expression databases:

Having sufficient labeled training data that includes as many variations of the populations
and environments as possible is important for the design of a deep expression recognition
system. We will introduce some databases that contain a large amount of affective images
collected from the real world to benefit the training of deep neural networks.


1.3.4.1 CK+:

The Extended Cohn-Kanade database is the most extensively used laboratory-controlled
database for evaluating FER systems. CK+ contains 593 video sequences from 123
subjects. The sequences vary in duration from 10 to 60 frames and show a shift from a
neutral facial expression to the peak expression. Among these videos, 327 sequences from
118 subjects are labeled with seven basic expression labels (anger, contempt, disgust, fear,
happiness, sadness and surprise) based on the Facial Action Coding System (FACS). Because
CK+ does not provide specified training, validation and test sets, the algorithms evaluated
on this database are not uniform.

1.3.4.2 MMI:

This database is laboratory-controlled and includes 326 sequences from 32 subjects. A
total of 213 sequences are labeled with six basic expressions, and 205 sequences are captured
in frontal view. In contrast to CK+, sequences in MMI are onset-apex-offset labeled: each
sequence begins with a neutral expression and reaches the peak near the middle before
returning to the neutral expression.

1.3.4.3 JAFFE:

The Japanese Female Facial Expression database is a laboratory-controlled image database
that contains 213 samples of posed expressions from 10 Japanese females. Each person has
3-4 images with each of the six basic facial expressions (anger, disgust, fear, happiness,
sadness and surprise) and one image with a neutral expression. The database is challenging
because it contains few examples per subject/expression.

1.3.4.4 FER-2013:

This database was introduced during the ICML 2013 Challenges in Representation Learning.
FER-2013 is a large-scale and unconstrained database collected automatically by the
Google image search API. All images have been registered and resized to 48×48 pixels
after rejecting wrongfully labeled frames and adjusting the cropped region. FER-2013
contains 28,709 training images, 3,589 validation images and 3,589 test images with seven
expression labels (anger, disgust, fear, happiness, sadness, surprise and neutral).
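As an illustration, a common way to load this database is sketched below; the fer2013.csv layout (emotion, pixels, Usage columns) is an assumption about the widely circulated release, not something specified in this report:

```python
import numpy as np
import pandas as pd

# Assumed layout: 'emotion' is a label in 0-6, 'pixels' holds 2304
# space-separated grey values, 'Usage' marks the train/validation/test split.
df = pd.read_csv("fer2013.csv")
pixels = df["pixels"].apply(lambda s: np.array(s.split(), dtype=np.uint8))
images = np.stack(pixels.tolist()).reshape(-1, 48, 48)   # 48x48 grey images
labels = df["emotion"].to_numpy()
train_images = images[(df["Usage"] == "Training").to_numpy()]  # 28,709 images
```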

Figure 4. Test Image

1.3.5 Machine learning:

Machine learning is one of the most exciting areas of technology at the moment. We
see daily stories that herald new breakthroughs in facial recognition technology,
self-driving cars or computers that can hold a conversation just like a person[1]. Machine
learning technology is set to revolutionise almost any area of human life and work. One
primary reason for using machine learning is to automate complex tasks and to analyze
the variety and complexity of data.

1.3.6 Deep learning:

Deep learning, or deep machine learning, is a branch of machine learning that takes data
as input and makes intuitive and intelligent decisions using an artificial neural network
stacked layer-wise. It is being applied in various domains for its ability to find patterns
in data, extract features and generate intermediate representations.


1.4 Conclusion:

In this chapter, we presented facial expressions and emotions, the main coding systems,
and the architecture of a facial expression recognition system, together with the databases
and learning techniques used to build one. The next chapter presents deep learning, and
in particular convolutional neural networks, in more detail.


Chapter 2

Deep learning

2.1 Introduction:

Deep learning is a subset of machine learning which uses neural networks to analyze
different factors with a structure that is similar to the human neural system. It uses
complex multi-layered neural networks, where the level of abstraction increases gradually
through nonlinear transformations of the input data.[5] It concerns algorithms inspired
by the structure and function of the brain, which can learn several levels of representation
in order to model complex relationships between data.

2.2 Machine learning vs Deep learning:

Machine learning algorithms work well for a wide variety of problems. However, they have
failed to solve some major AI problems such as speech, face and emotion recognition.


Figure 5. Machine learning vs Deep learning.

The machine learning method includes the following four steps:

• Feature engineering: choose the basis for prediction (attributes, features).

• Choose the appropriate machine learning algorithm (such as a classification or
regression algorithm).

• Train and evaluate model performance (for the different algorithms, evaluate and
select the best-performing model).

• Use the trained model to classify or predict unknown data.[9]

Most features must be determined by an expert and then encoded as a data type.
Features can be pixel values, shapes, etc. The performance of machine learning
algorithms depends on the accuracy of the features extracted. Deep learning reduces
the task of developing new feature extractors by automating the feature extraction
and learning phase.[10] Deep learning uses neural networks to learn representations of
characteristics directly from the data.


2.3 Artificial neural network:[11]

An artificial neural network is a computing model that tries to mimic the human brain
in a very primitive way, emulating the capabilities of human beings in a very limited
sense. ANNs have been developed as a generalization of mathematical models of human
cognition or neural biology. An ANN takes an input vector X and produces an output
vector Y; the relationship between X and Y is determined by the network architecture.[12]
An ANN is a parallel, distributed information-processing network. It consists of a number
of information-processing elements called neurons or nodes, which are grouped in layers.
The input-layer processing elements receive the input vector and transmit the values to
the next layer of processing elements across connections, where this process is continued.
This type of network, where data flows one way (forward), is known as a feedforward
network. A feedforward ANN has an input layer, an output layer and one or more hidden
layers between them. Each neuron in a layer is connected to all the neurons of the
immediately following layer, and only to those. The strength of the signal passing from
one neuron to the other depends on the weight of the interconnection. The hidden layers
enhance the network's ability to model complex functions. The performance of a BPANN
(back-propagation artificial neural network) model has been compared with a developed
linear transfer function (LTF) model and found superior.
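To make the forward flow concrete, here is a minimal NumPy sketch of a feedforward pass (the layer sizes, random weights and sigmoid choice are illustrative assumptions, not a model from this report):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny feedforward network: 5 inputs -> 8 hidden units -> 3 outputs.
W1, b1 = rng.normal(size=(5, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(x @ W1 + b1)      # hidden layer: weighted sum + activation
    return sigmoid(h @ W2 + b2)   # output layer

y = forward(rng.normal(size=5))   # data flows one way: input -> output
```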


Figure 6. Machine learning vs Deep learning.

2.4 Convolutional neural network CNN:

2.4.1 Presentation:

The convolutional neural network (CNN) is a type of artificial neural network proposed
by Yann LeCun in the late 1980s. CNNs are among the most popular deep learning
architectures for image classification, recognition and segmentation. A CNN consists of
multiple hierarchical hidden layers. Its artificial neurons take input from the image,
multiply it by weights, add a bias and then apply an activation function, so that they can
be used for image classification, recognition and segmentation by performing simple
convolutions, provided the network is fed with enough data (huge amounts of data).[16]

2.4.2 Architecture:[7][12]

Convolutional neural networks are the most efficient models for classifying image data.
They were inspired by the mammalian visual cortex.[10] Each CNN is made up of
convolutional layers, max-pooling layers, fully connected layers and an output layer.[14]
A sketch of such a stack is given below.
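For illustration, here is a minimal Keras sketch of this layer stack, sized for 48×48 grey-scale expression images with seven classes; the layer sizes are assumptions for the example, not the model used in this work:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(48, 48, 1)),
    layers.MaxPooling2D((2, 2)),            # downsample the feature maps
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                       # CNN code fed to the MLP part
    layers.Dense(128, activation="relu"),   # fully connected layer
    layers.Dense(7, activation="softmax"),  # one score per expression class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```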


Figure 7. Architecture for a convolutional neural network.

2.4.2.1 The convolution layer CONV:

The convolution layer is the first layer used to extract features from an input image[12];
it is the fundamental unit of a convnet[15]. It contains a set of filters whose parameters
need to be learned. Once the information reaches a convolution layer, the layer convolves
every filter across the spatial dimensionality of the data to produce a 2D activation map.
The convolution of an (N, M) image matrix with an (n, m) filter matrix is called a
« feature map ». The convolution of an image with different filters can perform operations
such as edge detection, blurring and sharpening[15]. During the forward pass, each filter
is convolved across the width and height of the input volume, computing dot products
between the entries of the filter and the input at each position. As the filter convolves
over the width and height of the input volume, it produces a 2-dimensional activation
map that gives the responses of the filter at every spatial position; there is an entire set
of filters, and each of them produces a separate 2-dimensional activation map.[17] The 2D
convolution between image A and filter B can be given as:

C(i, j) = \sum_{m=0}^{M_a - 1} \sum_{n=0}^{N_a - 1} A(m, n) \, B(i - m,\, j - n)


where the size of A is (M_a × N_a), the size of B is (M_b × N_b),
0 ≤ i < M_a + M_b − 1 and 0 ≤ j < N_a + N_b − 1.
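A direct NumPy transcription of this formula, as a naive reference implementation for clarity rather than an efficient one:

```python
import numpy as np

def conv2d_full(A, B):
    """Full 2D convolution: C(i,j) = sum_m sum_n A(m,n) * B(i-m, j-n)."""
    Ma, Na = A.shape
    Mb, Nb = B.shape
    C = np.zeros((Ma + Mb - 1, Na + Nb - 1))
    for i in range(C.shape[0]):
        for j in range(C.shape[1]):
            for m in range(Ma):
                for n in range(Na):
                    # Only accumulate where the shifted filter index is valid.
                    if 0 <= i - m < Mb and 0 <= j - n < Nb:
                        C[i, j] += A[m, n] * B[i - m, j - n]
    return C

# Agrees with scipy.signal.convolve2d(A, B, mode="full") on small examples.
```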

A CNN learns the values of these filters on its own during the training process (although
parameters such as the number of filters, the filter size and the architecture of the network
still need to be specified before training). By increasing the number of filters, more image
features get extracted and the better the network becomes. Three parameters control
the size of the feature map (convolved feature):

• Depth: corresponds to the number of filters we use for the convolution operation.

• Stride: the number of pixels by which the filter slides over the input matrix at each
step.

• Zero padding: it is convenient to pad the input matrix with zeros around the
border, so that the filter can be applied to the bordering elements of the input image
matrix.

These three parameters determine the output size, as the small helper below illustrates.
An additional operation is used after every convolution, the ReLU layer: a rectified linear
unit applies the activation function f(x) = max(0, x). Other nonlinear functions such as
tanh or sigmoid can also be used instead of ReLU, but most data scientists use ReLU
since, performance-wise, it is better than the other two.[16]
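A small helper showing the standard output-size arithmetic for these parameters (a well-known relation stated here for illustration; it is not given explicitly in the text):

```python
def conv_output_size(n, f, stride=1, padding=0):
    # Output width for input width n, filter width f, given stride and padding.
    return (n + 2 * padding - f) // stride + 1

print(conv_output_size(48, 3, stride=1, padding=1))  # 48: "same" padding
print(conv_output_size(48, 3, stride=2, padding=0))  # 23: stride shrinks the map
```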

2.4.2.2 The pooling layer:[12][16][17]

A pooling layer is inserted between successive convolution layers, applying a downsampling
operation along the spatial dimensions (width and height), which reduces the
dimensionality of each map while retaining the important information. Spatial pooling
can be of different types, such as max pooling, average pooling and sum pooling. In max
pooling, a spatial neighborhood (for example a 2×2 window) is defined and the largest
element of the rectified feature map within that window is taken. In average pooling, the
average (or, for sum pooling, the sum) of all elements in that window is taken. In practice,
max pooling has been shown to work better. Max pooling reduces the input by applying
the max function over the input X; let m be the size of the filter, then the output is
calculated as follows:

M(X_{i,j}) = \max \{ X_{i+k,\, j+l} : |k| \le m/2,\ |l| \le m/2,\ k, l \in \mathbb{N} \}
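A compact NumPy sketch of non-overlapping max pooling (the 2×2 window and the sample input are illustrative):

```python
import numpy as np

def max_pool(X, m=2):
    # m x m window with stride m: each output entry is the max of one window.
    H, W = X.shape
    H2, W2 = H // m, W // m
    return X[:H2 * m, :W2 * m].reshape(H2, m, W2, m).max(axis=(1, 3))

X = np.arange(16).reshape(4, 4)
print(max_pool(X))   # [[ 5  7] [13 15]]
```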


2.4.2.3 The fully connected layer:[16][17]

In the end, a feature vector (or CNN code) concatenates the output information into a
single vector, which is fed into the fully connected layers (a multilayer perceptron). The
term « fully connected » indicates that every neuron in the previous layer is connected to
every neuron in the next layer. The outputs from the convolutional and pooling layers
represent high-level features of the input image, and the purpose of the fully connected
layer is to use these features to classify the input image into the various classes of the
training dataset.

2.4.2.4 Activation function:

The activation function is a mathematical function applied to a signal at the output of
an artificial neuron. The term comes from the biological equivalent « activation
potential », a stimulation threshold which, once reached, leads to a response of the neuron.
Softmax is used as the output activation function; it treats the outputs as scores for
each class. In the softmax, the function mapping stays unchanged and these scores are
interpreted as the unnormalized log probabilities for each class. Softmax is calculated as:

p_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}

where j indexes the class scores z for a given image and K is the total number of facial
expression classes. ReLU is an activation function which eliminates all the negative values.
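A two-line NumPy sketch of this computation (the max subtraction is a standard numerical-stability trick, an implementation detail not discussed in the text):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtracting the max does not change the result
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # unnormalized log probabilities
print(softmax(scores))                # class probabilities summing to 1.0
```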

2.5 Visualisation of some CNN architectures:[5]

In recent years, CNN architectures have evolved considerably. These networks have
gotten so deep that it has become extremely difficult to visualise an entire model. We
review a few milestones below.


2.5.1 LeNet-5 (1998):

It is one of the simplest architectures: 2 convolutional and 3 fully connected layers, for a
total of about 60,000 parameters.

2.5.2 AlexNet(2012):

With 60M parameters, AlexNet has 8 layers: 5 convolutional and 3 fully connected.
Compared with LeNet-5, AlexNet essentially stacked a few more layers. It was one of
the largest convolutional neural networks trained to date on subsets of ImageNet, and
its authors were the first to implement ReLU as an activation function.

2.5.3 VGG-16(2014):

With this architecture, we notice that CNNs were starting to get deeper and deeper,
because the most straightforward way of improving the performance of deep neural
networks is to increase their size. VGG-16 has 13 convolutional and 3 fully connected
layers, carrying on the ReLU tradition from AlexNet. It has 138M parameters and takes
about 500MB of storage space.

2.5.4 Inception-v1(2014):

This 22-layer architecture with 5M parameters is called Inception-v1. The design of its
inception module is the product of research on approximating sparse structures.

2.5.5 ResNet-50(2015):

In the previous CNNs, we have seen nothing but an increasing number of layers achieving
better and better performance. But as network depth increases, accuracy gets saturated
and then degrades rapidly. Researchers from Microsoft addressed this problem with
ResNet, using skip connections to build deeper models. ResNet-50 was also one of the
early adopters of batch normalisation and has 26M parameters. The sketch below
illustrates the skip-connection idea.
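An illustrative Keras sketch of an identity residual block, a generic example of the skip-connection pattern rather than ResNet-50's exact block (it assumes the input already has `filters` channels so the addition is shape-compatible):

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x
    y = layers.Conv2D(filters, (3, 3), padding="same")(x)
    y = layers.BatchNormalization()(y)   # ResNet helped popularize batch norm
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])      # skip connection: add input to output
    return layers.Activation("relu")(y)
```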

2.5.6 Xception(2016):

Xception is an adaptation of Inception in which the inception modules are replaced with
depthwise separable convolutions. It has roughly the same number of parameters as
Inception-v1 (23M).

2.6 Conclusion:

In this chapter, we presented neural networks and their different architectures. We
focused on CNNs, their structure and their different layers, and then presented a few
example architectures. In the next chapter, we will explain the architecture we have
chosen for our facial expression recognition system.

And your chapter one goes here[2].


Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est
laborum.

Figure 8. Test Image


2.6.1 Sub section Two

This is a second subsection[1].


Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est
laborum.


Chapter 3

Chapter Two

3.1 Section One

3.1.1 Sub section One

And your chapter one goes here[2].


Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est
laborum.

Figure 9. Test Image


3.1.2 Sub section Two

This is a second subsection[1].


Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est
laborum.


Chapter 4

Chapter Two

4.1 Section One

4.1.1 Sub section One

And your chapter one goes here[2].


Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est
laborum.

Figure 10. Test Image


4.1.2 Sub section Two

This is a second subsection[1].


Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est
laborum.



Conclusion

And a very interesting conclusion here.


Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
est laborum. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute
irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit
anim id est laborum.


Appendix

An appendix if you need it.

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure
dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit
anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis
aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia
deserunt mollit anim id est laborum.


Webography

[2] The universally recognized facial expressions of emotion. url: https://www.kairos.com/blog/the-universally-recognized-facial-expressions-of-emotion (visited on 04/2016).

[3] ENIS. url: http://www.enis.rnu.tn/site/enis_fr/ (visited on 04/2016).


Bibliography

[1] Charles Bazerman et al. Shaping written knowledge: The genre and activity of the
experimental article in science. Vol. 356. University of Wisconsin Press Madison,
1988.

[4] Ashraf Aboulnaga, Alaa R Alameldeen, and Jeffrey F Naughton. “Estimating the
selectivity of XML path expressions for internet scale applications”. In: VLDB. Vol. 1.
2001, pp. 591–600.

[5] Ashbindu Singh. “Review article digital change detection techniques using
remotely-sensed data”. In: International journal of remote sensing 10.6 (1989),
pp. 989–1003.
