1- Introduction
Due to the important role of facial expressions in human interaction, the ability to
perform facial expression recognition automatically via computer vision enables a
range of applications such as human-computer interaction and data analytics.
In this chapter, we present some notions of emotions and the different coding
theories, as well as the architecture of facial expression recognition.
We then present some approaches that help us to recognize facial expressions, and we
end the chapter with different machine learning techniques.
2.1. Definitions :
2.1.1. Emotions : emotion is expressed through many channels, such as body
position, voice and facial expressions.
MPEG4 model
2.3.3. Candide : Candide is a face model containing 75 vertices and 100 triangles.
It is composed of a generic face model and a set of parameters (shape units).
These parameters are used to adapt the generic model to a particular individual:
they represent the differences between individuals and are 12 in number :
1/ head height
2/ vertical position of the eyebrows
3/ vertical position of the eyes
4/ eye width
5/ eye height
6/ eye separation distance
7/ depth of the cheeks
8/ depth of the nose
9/ vertical position of the nose
10/ degree of curvature of the nose
11/ vertical position of the mouth
12/ width of the mouth
CANDIDE model
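As an illustrative sketch only (the mesh and displacement values below are random placeholders, not the real CANDIDE data), adapting the generic model with the 12 shape-unit parameters is a simple linear deformation of the vertices:

```python
import numpy as np

# Hypothetical illustration of a CANDIDE-style adaptation.
# g_bar : the generic face mesh (75 vertices in 3-D)
# S     : one 3-D displacement field per shape unit (12 in total)
# sigma : the 12 shape-unit parameters for one individual
rng = np.random.default_rng(0)
g_bar = rng.standard_normal((75, 3))   # generic mesh: 75 vertices
S = rng.standard_normal((75, 3, 12))   # 12 shape-unit displacement fields
sigma = np.zeros(12)
sigma[0] = 0.5                         # e.g. adjust the first unit (head height)

# Adapted mesh: generic mesh plus the weighted shape-unit displacements
g = g_bar + S @ sigma                  # shape (75, 3)
print(g.shape)
```

Setting all parameters to zero recovers the generic face; each nonzero entry of sigma moves the vertices along the corresponding shape-unit direction.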
A CNN learns the values of these filters on its own during the training process
(although parameters such as the number of filters, the filter size and the
architecture of the network still need to be specified before training).
The more filters we use, the more image features get extracted and the better the
network becomes.
Three parameters control the size of the feature map (convolved feature) :
Depth : corresponds to the number of filters we use for the convolution
operation.
Stride : the number of pixels by which we slide the filter matrix over the
input matrix; with a stride of 1 the filter moves one pixel at a time.
Zero padding : it is sometimes convenient to pad the input matrix with zeros around
the border, so that the filter can be applied to the bordering elements of the input
image matrix.
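Together with the input size, these parameters determine the spatial size of the feature map. As a small sketch (using the standard output-size formula, which the text does not state explicitly):

```python
def feature_map_size(w, f, stride, padding):
    """Spatial size of a convolution output for an input of width w,
    filter size f, given stride and zero padding (standard formula:
    (w - f + 2*padding) / stride + 1)."""
    return (w - f + 2 * padding) // stride + 1

# A 48x48 input with a 3x3 filter, stride 1 and padding 1 ('same')
# keeps its size:
print(feature_map_size(48, 3, 1, 1))   # 48
# Without zero padding the map shrinks:
print(feature_map_size(48, 3, 1, 0))   # 46
```

This is why the 'same'-padded convolutions in the models later in this chapter preserve the 48×48 spatial size, while only the pooling layers reduce it.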
An additional operation, called the ReLU layer, is used after every convolution
operation. A rectified linear unit applies the activation function
f(x) = max(0, x).
Other non-linear functions, such as tanh or sigmoid, can also be used instead of
ReLU, but most data scientists use ReLU since, performance-wise, it is better than
the other two. [13]
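A minimal sketch of what the ReLU layer computes element-wise (using NumPy for illustration):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): negative values are clipped to zero,
    # positive values pass through unchanged
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))   # [0.  0.  0.  1.5 3. ]
```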
4.2.2. The pooling layer [6][13][14]
A pooling layer is inserted between successive convolution layers, applying a
downsampling operation along the spatial dimensions (width and height), which
reduces the dimensionality of each feature map but retains the important information.
Spatial pooling can be of different types, such as max pooling, average pooling and
sum pooling.
In max pooling, a spatial neighborhood (for example a 2×2 window) is defined and the
largest element is taken from the rectified feature map within that window.
In the case of average pooling, the average of all elements in that window is
taken.
In practice, max pooling has been shown to work better.
Max pooling reduces the input by applying the max function over the input x. Let m
be the size of the filter; then the output at position (i, j) is calculated as follows :
M(x)i,j = max { x(i+k, j+l) : |k| ≤ m/2, |l| ≤ m/2, k, l ∈ ℕ }
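The pooling operations described above can be sketched in NumPy as follows (a simplified single-channel version with non-overlapping windows, mirroring the 2×2 pooling used in the models in this chapter):

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over a 2-D feature map with a
    size x size window (stride = size), like MaxPooling2D(pool_size=(2, 2))."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]          # drop any ragged border
    blocks = x.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))           # max pooling
    return blocks.mean(axis=(1, 3))              # average pooling

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 8],
                 [4, 3, 6, 5]], dtype=float)
print(pool2d(fmap, 2, "max"))   # [[6. 4.] [7. 9.]]
print(pool2d(fmap, 2, "avg"))   # [[3.75 1.75] [4.   7.  ]]
```

Each 2×2 window of the 4×4 map collapses to a single value, halving both spatial dimensions, exactly as in the MaxPooling2D layers of the model summary below.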
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 48, 48, 64) 640
_________________________________________________________________
batch_normalization (BatchNo (None, 48, 48, 64) 256
_________________________________________________________________
activation (Activation) (None, 48, 48, 64) 0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 24, 24, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 24, 24, 64) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 24, 24, 128) 204928
_________________________________________________________________
batch_normalization_1 (Batch (None, 24, 24, 128) 512
_________________________________________________________________
activation_1 (Activation) (None, 24, 24, 128) 0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 128) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 12, 12, 128) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 12, 12, 512) 590336
_________________________________________________________________
batch_normalization_2 (Batch (None, 12, 12, 512) 2048
_________________________________________________________________
activation_2 (Activation) (None, 12, 12, 512) 0
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 6, 6, 512) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 6, 6, 512) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 6, 6, 512) 2359808
_________________________________________________________________
batch_normalization_3 (Batch (None, 6, 6, 512) 2048
_________________________________________________________________
activation_3 (Activation) (None, 6, 6, 512) 0
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 3, 3, 512) 0
_________________________________________________________________
dropout_3 (Dropout) (None, 3, 3, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 4608) 0
_________________________________________________________________
dense (Dense) (None, 256) 1179904
_________________________________________________________________
batch_normalization_4 (Batch (None, 256) 1024
_________________________________________________________________
activation_4 (Activation) (None, 256) 0
_________________________________________________________________
dropout_4 (Dropout) (None, 256) 0
_________________________________________________________________
dense_1 (Dense) (None, 512) 131584
_________________________________________________________________
batch_normalization_5 (Batch (None, 512) 2048
_________________________________________________________________
activation_5 (Activation) (None, 512) 0
_________________________________________________________________
dropout_5 (Dropout) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 7) 3591
=================================================================
Total params: 4,478,727
Trainable params: 4,474,759
Non-trainable params: 3,968
_________________________________________________________________
# Data preparation
from keras.preprocessing.image import ImageDataGenerator

num_classes = 5
img_rows, img_cols = 48, 48
batch_size = 8

train_data_dir = 'C:/Users/HP/Desktop/projectEssai/train'
validation_data_dir = 'C:/Users/HP/Desktop/projectEssai/validation'

# Data augmentation for the training set
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=30,
    shear_range=0.3,
    zoom_range=0.3,
    width_shift_range=0.4,
    height_shift_range=0.4,
    horizontal_flip=True,
    fill_mode='nearest')

# The validation set is only rescaled, not augmented
validation_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    color_mode='grayscale',
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True)

validation_generator = validation_datagen.flow_from_directory(
    validation_data_dir,
    color_mode='grayscale',
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True)
from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, BatchNormalization,
                          Activation, Dropout, Flatten, Dense)

model = Sequential()

# Block-1
model.add(Conv2D(32, (3,3), padding='same', kernel_initializer='he_normal',
                 input_shape=(img_rows, img_cols, 1)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Conv2D(32, (3,3), padding='same', kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

# Block-2
model.add(Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

# Block-3
model.add(Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

# Block-4
model.add(Conv2D(256, (3,3), padding='same', kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Conv2D(256, (3,3), padding='same', kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

# Block-5
model.add(Flatten())
model.add(Dense(64, kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

# Block-6
model.add(Dense(64, kernel_initializer='he_normal'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

# Block-7
model.add(Dense(num_classes, kernel_initializer='he_normal'))
model.add(Activation('softmax'))

print(model.summary())
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras.optimizers import Adam

# Save the weights of the best epoch (lowest validation loss)
checkpoint = ModelCheckpoint('Emotion_little_vgg.h5',
                             monitor='val_loss',
                             mode='min',
                             save_best_only=True,
                             verbose=1)

# earlystop = EarlyStopping(monitor='val_loss',
#                           min_delta=0,
#                           patience=3,
#                           verbose=1,
#                           restore_best_weights=True)

# Lower the learning rate when the validation loss plateaus
reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                              factor=0.2,
                              patience=3,
                              verbose=1,
                              min_delta=0.0001)

# PlotLossesCallback is provided by the livelossplot package
callbacks = [PlotLossesCallback(), checkpoint, reduce_lr]

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.001),
              metrics=['accuracy'])
nb_train_samples = 24176
nb_validation_samples = 3006
epochs = 15

history = model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples//batch_size,
    epochs=epochs,
    callbacks=callbacks,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples//batch_size)
Training curves for this architecture:
accuracy: training (min: 0.255, max: 0.464, cur: 0.464); validation (min: 0.294, max: 0.586, cur: 0.586)
loss: training (min: 1.295, max: 1.725, cur: 1.295); validation (min: 1.043, max: 1.551, cur: 1.043)
import sklearn
from sklearn.metrics import classification_report
from keras.models import load_model
import numpy as np
%matplotlib inline

test_dir = validation_data_dir

test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(img_rows, img_cols),
    color_mode='grayscale',
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False)  # shuffle must be False so predictions line up with test_generator.classes

FLOW1_model = load_model('C:/Users/HP/Desktop/projectEssai/Emotion_little_vgg(3).h5')

# Obtain the predicted classes and label names needed by classification_report
predictions = FLOW1_model.predict_generator(test_generator)
y_pred = np.argmax(predictions, axis=1)
target_names = list(test_generator.class_indices.keys())

print(classification_report(test_generator.classes, y_pred,
                            target_names=target_names))
The second architecture was inspired by VGGNet, but we reduced the number
of layers, as shown in the code below :
# Initialising the CNN
model = Sequential()
# 1 - Convolution
model.add(Conv2D(64,(3,3), padding='same', input_shape=(48, 48,1)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# 2nd Convolution layer
model.add(Conv2D(128,(5,5), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# 3rd Convolution layer
model.add(Conv2D(512,(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# 4th Convolution layer
model.add(Conv2D(512,(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# Flattening
model.add(Flatten())
# Fully connected layer 1st layer
model.add(Dense(256))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.25))
# Fully connected layer 2nd layer
model.add(Dense(512))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(5, activation='softmax'))
opt = Adam(lr=0.001)
model.compile(optimizer=opt, loss='categorical_crossentropy',
metrics=['accuracy'])
model.summary()
Training curves for this architecture:
accuracy: training (min: 0.377, max: 0.671, cur: 0.671); validation (min: 0.410, max: 0.690, cur: 0.690)
loss: training (min: 0.855, max: 1.499, cur: 0.855); validation (min: 0.829, max: 1.384, cur: 0.829)
[11] Lucy Nwosu, Hui Wang, Jiang Lu, Ishaq Unwala, Xiaokun Yang and Ting
Zhang, « Deep Neural Network for Facial Expression Recognition Using Facial
Parts », Department of Computer Engineering, University of Houston.
[12] Shadman Sakib, Nazib Ahmed, Ahmed Jawad Kabir and Hridon Ahmed, « An
Overview of Convolutional Neural Networks: Its Architecture and Applications »,
Department of EEE, International University of Business Agriculture and Technology,
Dhaka 1230, Bangladesh; Department of EEE, Independent University of Bangladesh.
[13] https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148