https://doi.org/10.1007/s00371-019-01628-3
ORIGINAL ARTICLE
Abstract
Detection of emotion from facial expressions is a growing field of research. Facial expression detection also helps to
identify a person's behavior during human–computer interaction. In this work, facial expression recognition with
respect to changes in the facial geometry is proposed. First, the image is enhanced by means of the discrete wavelet transform
and a fuzzy combination. Then, the facial geometry is found using the modified eyemap and mouthmap algorithms after finding
the landmarks. Finally, the areas and angles of the constructed triangles are computed and classified using a neural network with the
help of the TensorFlow central processing unit version. Results show that the proposed algorithm is efficient in finding the facial
emotion.
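The triangle features summarized above (an area and three interior angles per landmark triangle) can be sketched as follows. The three landmark coordinates used in the example are hypothetical stand-ins, not the paper's actual eye and mouth points.

```python
import math

def triangle_features(p1, p2, p3):
    """Compute the area and interior angles (degrees) of a triangle
    defined by three (x, y) landmark points."""
    a = math.dist(p2, p3)  # side opposite p1
    b = math.dist(p1, p3)  # side opposite p2
    c = math.dist(p1, p2)  # side opposite p3
    # Shoelace formula for the area
    area = abs((p2[0] - p1[0]) * (p3[1] - p1[1])
               - (p3[0] - p1[0]) * (p2[1] - p1[1])) / 2.0
    # Law of cosines for two interior angles; the third closes to 180
    ang_a = math.degrees(math.acos((b * b + c * c - a * a) / (2 * b * c)))
    ang_b = math.degrees(math.acos((a * a + c * c - b * b) / (2 * a * c)))
    ang_c = 180.0 - ang_a - ang_b
    return area, (ang_a, ang_b, ang_c)

# Hypothetical landmarks: left eye, right eye, mouth centre
area, angles = triangle_features((60, 80), (140, 80), (100, 160))
```

Such area/angle vectors are what a downstream classifier would consume.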
A. Joseph, P. Geetha
Sect. 4 deals with experimental results and analysis. Finally, Sect. 5 concludes the paper.

2 Related works

Many works have been carried out on facial emotion recognition. Happy et al. [7] proposed a framework for expression recognition with the help of appearance features of selected facial patches. This method classifies the emotion by assessing a few facial patches; they worked on grayscale images from the Japanese Female Facial Expressions (JAFFE) [21] and CK+ databases. Malakar et al. [23] proposed a method to identify facial emotions based on histogram of gradients features, using the JAFFE and Cohn–Kanade [13] databases. They classified their features using a support vector machine (SVM).

Lajevardi et al. [16] extract features based on Gabor filters, the log Gabor filter, the local binary pattern operator, higher-order local autocorrelation (HLAC), and HLAC-like features, with JAFFE and Cohn–Kanade as their databases. The classifier they used was naive Bayes, which is a probabilistic method. They claim that experimental results on both the Cohn–Kanade and JAFFE databases show that holistic methods are time-consuming and more complex. Buciu et al. [3] worked on a comparison of independent component analysis (ICA) approaches for facial expression recognition. In their work, in addition to the InfoMax approach present in ICA, other methods such as extended InfoMax ICA, undercomplete ICA, and nonlinear kernel-ICA were used for extracting features from two facial expression databases, namely JAFFE and Cohn–Kanade. They use the cosine similarity measure and SVM classifiers for classification of facial expression. They conclude that sparse basis images do not necessarily lead to more accurate facial expression classification, as ICA yields an efficient coding by performing a sparse image representation.

Ilbeygi et al. [10] proposed a fuzzy-based facial expression recognition system based on facial feature extraction. The features that they extract are eye opening, mouth opening, eyebrow constriction, mouth corner displacement, mouth length, nose-side wrinkles, the existence of teeth, eyebrow slope, and lip thickness. They compare the accuracy of the proposed fuzzy system with the K-nearest neighbor (KNN) classifier and claim that their method outperforms KNN.

Huang et al. [9] extract the contour deformation of the facial features and classify using the minimum distance classifier to cluster the parameters. They conclude that further study is required to improve the accuracy of the facial feature extraction. Karthigayan et al. [14] used three feature extraction methods, namely projection profile, contour profile, and moments; a genetic algorithm was used for face emotion recognition. Wong et al. [29] work on face emotion recognition based on a tree structure representation with probabilistic recursive neural network modeling. The two databases that they used are JAFFE and Cohn–Kanade. Gabor feature extraction was used to extract the features of the face, and the classifier used was a probability-based recursive neural network model.

Alugupally et al. [2] used the Cohn–Kanade database for detecting facial expression. In their work, they use statistical analysis to determine the landmarks and features that are best suited to recognize the expressions in a face. They suggest that the texture and spatial information of features such as the eyes, mouth, eyebrows, and nose can be used for feature extraction and classification of facial expression. Jain et al. [12] proposed an approach using multi-scale Gaussian derivatives and SVM for detecting facial expressions. Kim [15] proposed a facial expression recognition system using active shape model (ASM) landmark information and an appearance-based classification algorithm using an embedded hidden Markov model. He claims that the proposed approach gives performance improvements over the ASM-based face alignment method on the Cohn–Kanade and JAFFE databases. Yu et al. [32] worked on classifying strong and weak emotions. Another work on facial emotion recognition is [18], where a super wide regression network for facial expression is used. Facial expression recognition by de-expression residue learning is proposed in [31], where the authors use an expressive component and a neutral component to classify facial expression.

From all the works discussed above, one can find that facial features are extracted and classified with the help of appropriate classification algorithms. The extraction of features and their manipulation by classification algorithms play a vital role in identifying the emotions.

In this work, facial geometry-based expression detection is proposed by detecting the eye and mouth with the help of Viola Jones and modified mouthmap and eyemap algorithms. TensorFlow is used for classification. We also compare our work with different classification algorithms.

3 Proposed system

The overall block diagram of the proposed system is shown in Fig. 1. The main components are preprocessing, face detection, mouth detection, and eye detection.

3.1 Preprocessing

The input image is first preprocessed in order to enhance its clarity. The input RGB image is converted to an HSV image, and the manipulations are carried out on the V plane after extracting it separately. DWT is used
Facial emotion detection using modified eyemap–mouthmap algorithm on an enhanced image…
to transform the V plane. The image obtained after transformation is shown in Fig. 2. Here, we get the approximation coefficients (LL), horizontal coefficients (HL), vertical coefficients (LH), and diagonal coefficients (HH). Figure 3 shows the output for the given input image after applying DWT.

Fuzzy logic [24]-based manipulations are carried out on the approximation coefficients of the DWT in order to get the enhanced image, and the inverse DWT (IDWT) is used to reconstruct the image. Figure 4 shows the membership function details, and Table 1 shows the fuzzy rule base for image enhancement obtained based on the experiments. The following rules were applied: if the input is mf1, then the output is mf1; similar if–then rules were applied for mf2, mf3, mf4, and mf5.

Fig. 4 Membership function details: a input, b output

Table 1 Fuzzy rule base for image enhancement

Input                      Output
MF    a    b    c          MF    a    b    c
mf1   0    0    100        mf1   −50  0    50
mf2   0    100  180        mf2   50   100  150
mf3   30   120  180        mf3   100  150  200
mf4   100  180  255        mf4   185  220  255
mf5   180  255  255        mf5   180  255  330

MF membership function; a, b, c parameters of the triangular membership function

3.2 Face detection

The enhanced image shown in Fig. 5 is used for face detection. The face is detected using the Viola Jones algorithm [28], which has four stages, namely (1) Haar feature selection, (2) creating an integral image, (3) Adaboost training, and (4) cascading classifiers. For this work, the Viola Jones algorithm proved to be very efficient in detecting the face. A rectangular bounding box is created for the detected face, and then the face region alone is cropped. The cropped image is resized to 256 × 256 pixels because the actual image size is 562 × 762 pixels.

For mouth detection, a mouthmap is computed from the chroma components, where

η = 0.5 · [ (1/n) Σ_(x,y)∈FG Cr(x, y)² ] / [ (1/n) Σ_(x,y)∈FG Cr(x, y)/Cb(x, y) ],

with n being the number of pixels within the face mask, FG.

After finding the mouthmap, the image is eroded using a non-flat, ball-shaped structuring element whose radius in the X–Y plane is 5 and whose maximum offset height is 5. The image is then dilated using a non-flat, ball-shaped structuring element whose radius in the X–Y plane is 11 and whose maximum offset height is 11. The values for the morphological operations were obtained by trial and error.

A mask is created near the region surrounding the mouth in order to extract the mouth region alone using active contours by the Chan–Vese method [4]. It is based on level sets that are evolved iteratively to minimize an energy, which is defined by weighted values corresponding to the sum of intensity differences from the average value outside the segmented region, the sum of differences from the average value inside the segmented region, and a term that depends on the length of the boundary of the segmented region. The detected face, the cropped image, the mouthmap, the mask, and the segmented active contour image are shown in Fig. 6a–e, respectively. Based on area, the largest blob in the segmented image is extracted in order to detect the mouth. The detected eye and mouth are shown in Fig. 7.

Eye detection is also performed using the Viola Jones algorithm. The detected eye is further split into the right eye and left eye, and then the modified eyemap algorithm is applied in order to detect the landmarks in the eye. The eyemap is a combination of EmC and EmL. The EmC is found using the equation

EmC = (1/3) [ (Cb)² + (C̃r)² + (Cb/Cr) ],  (2)

where C̃r is the negative of Cr.

Before finding the EmL, a non-flat, ball-shaped structuring element, whose radius in the X–Y plane is 5 and whose maximum offset height is 5, was chosen for the morphological
Fig. 6 a Detected face, b cropped face, c mouthmap, d mask, e segmented mouth

Fig. 8 Constructed geometry of the face
structuring function gδ : Gδ ⊂ R → R with gδ(x) = |δ| g(|δ|⁻¹ x) and x ∈ Gδ, δ ≠ 0, where δ is the scale parameter and Gδ = {x : |δ|⁻¹ x ∈ G}, as defined in [11]. The product of EmC and EmL is then found using Eq. 4 to get the eyemap:

EyeMap = EmC · EmL.  (4)
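A minimal sketch of Eqs. 2 and 4 in NumPy follows. The normalization of Cb and Cr to [0, 1] is an assumption, and since Eq. 3 (EmL) is not reproduced above, the luminance eyemap here uses the classical dilation-over-erosion form with a flat square window standing in for the ball-shaped structuring element.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def eye_map(y, cb, cr, size=5):
    """Sketch of the chroma/luminance eyemap of Eqs. 2 and 4.

    y, cb, cr: float arrays; cb and cr are assumed normalized to [0, 1].
    """
    cr_neg = 1.0 - cr                              # C~r, the negative of Cr
    ratio = cb / (cr + 1e-6)                       # Cb/Cr, guarded against /0
    emc = (cb ** 2 + cr_neg ** 2 + ratio) / 3.0    # Eq. 2
    # EmL: grayscale dilation over erosion of the luminance plane
    # (assumed form; the paper's Eq. 3 uses a ball-shaped element)
    pad = size // 2
    win = sliding_window_view(np.pad(y, pad, mode="edge"), (size, size))
    eml = win.max(axis=(-1, -2)) / (win.min(axis=(-1, -2)) + 1.0)
    return emc * eml                               # Eq. 4: EyeMap = EmC * EmL
```

Bright eye regions with high Cb and low Cr produce large responses, which is what the landmark search exploits.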
4 Results

4.1 Dataset

Here, we use 478 images from the Karolinska Directed Emotional Faces (KDEF) database, which is a set of human facial expressions of emotion. The size of the images is 562 × 762 pixels with a resolution of 72 dpi. The database uses 16.7 million (32-bit) colors and a JPEG file format with a compression quality of 94%. A sample set is shown in Fig. 9. We also use two other datasets, namely the Oulu-CASIA dataset and the CK+ dataset, for testing the proposed algorithm. The Oulu-CASIA dataset consists of six expressions, namely surprise, happiness, sadness, anger, fear, and disgust, from 80 people who were between 23 and 58 years old. The color images in the CK+ dataset were used for testing.

4.2 Experiments and discussion

4.2.1 Experiments on KDEF dataset

First, we find the structural similarity index (SSIM) of the enhanced image in order to assess its quality. The structural similarity index of two images I1 and I2 is given by the luminance term, the contrast term, and the structural term using the formula:

SSIM(I1, I2) = [l(I1, I2)]^α · [c(I1, I2)]^β · [s(I1, I2)]^γ,  (5)

where

l(I1, I2) = (2 μ_I1 μ_I2 + C1) / (μ_I1² + μ_I2² + C1),
c(I1, I2) = (2 σ_I1 σ_I2 + C2) / (σ_I1² + σ_I2² + C2),
s(I1, I2) = (σ_I1I2 + C3) / (σ_I1 σ_I2 + C3),

with μ_I1, μ_I2, σ_I1, σ_I2, and σ_I1I2 being the local means, standard deviations, and cross-covariance of the images. α, β, and γ are the weights, and C1, C2, and C3 are constants that stabilize the division when the denominator is weak. The results of SSIM for the enhanced image are shown in Table 2. The SSIM values show that the image enhancement technique used is robust. This robustness can be improved by changing the values of the input membership function while keeping the output membership function constant.
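Eq. 5 can be sketched as follows for the global (single-window) case with the common choice α = β = γ = 1; the constants follow the usual K1 = 0.01, K2 = 0.03 and C3 = C2/2 convention, which is an assumption here, not a value stated in the paper.

```python
import numpy as np

def ssim_global(i1, i2, data_range=255.0):
    """Single-window SSIM of Eq. 5 with alpha = beta = gamma = 1."""
    i1 = i1.astype(np.float64)
    i2 = i2.astype(np.float64)
    mu1, mu2 = i1.mean(), i2.mean()
    s1, s2 = i1.std(), i2.std()
    s12 = ((i1 - mu1) * (i2 - mu2)).mean()   # cross-covariance
    c1 = (0.01 * data_range) ** 2            # assumed K1 = 0.01
    c2 = (0.03 * data_range) ** 2            # assumed K2 = 0.03
    c3 = c2 / 2.0
    lum = (2 * mu1 * mu2 + c1) / (mu1 ** 2 + mu2 ** 2 + c1)
    con = (2 * s1 * s2 + c2) / (s1 ** 2 + s2 ** 2 + c2)
    struc = (s12 + c3) / (s1 * s2 + c3)
    return lum * con * struc
```

Identical images score 1.0, and any degradation pulls the index below 1; the reference SSIM additionally averages this over local windows.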
Fig. 9 Samples of the KDEF dataset (the first row gives the seven emotional expressions of a female face and the second row gives the seven
emotional expressions of a male face)
Fig. 10 a Input image, b Viola Jones algorithm for detecting the mouth, c modified mouthmap algorithm for detecting the mouth
Fig. 11 Examples of detection from KDEF dataset: a surprised, b anger, c disgust, d afraid, e happy, f neutral, g sad
Table 4 Performance comparison with state-of-the-art methods on the KDEF database

Method                        Accuracy in %
Sun et al. [27]               77.98
Lekdioui et al. [17]          91.87
Yaddaden et al. [30]          90.61
Ruiz-Garcia et al. [25]       95.58
Proposed (MoMMEM algorithm)   98.1

Fig. 12 Mean and standard deviation accuracy rates using different algorithms (LR, LDA, KNN, CART, NB, SVM)

The mean and standard deviation accuracy rates are shown in Fig. 12. From the figure, one can infer that among all the methods used, naive Bayes produced better results, since it is highly scalable. Tenfold cross-validation was used for all the classification algorithms.

4.2.2 Experiments on Oulu-CASIA and CK+ datasets

Images from the Oulu-CASIA dataset and CK+ dataset were subjected to face and eye detection using the Viola Jones algorithm, and then we used the modified mouthmap algorithm to detect the mouth, as it proved more efficient in detecting the mouth than the Viola Jones algorithm. The geometry is constructed after finding the landmarks. This can be visualized in Figs. 13 and 14, respectively. Except for the alignment of the left eye, the same settings were used for the detection of geometry on the two datasets. The detected geometry showed some variations when a person closed his/her eyes. This remains an area of research which requires challenging algorithms for the detection of facial emotions.

A detailed analysis on the Oulu-CASIA dataset reveals that the algorithm detects the geometry even when there are occlusions in the face, and the detection of mouth opening and closing is also good, which can be used for the identification of emotion. Experiments show that the detected geometry varies significantly in terms of mouth and eye movements, and the variations can be captured for identifying the emotions. Based on the detected geometry, classification algorithms can be designed for the classification of emotions.

Experiments on the CK+ dataset show that the proposed algorithm can be used for identifying the geometry even when a person's emotion is very complex. This can be visualized in the second row of Fig. 14.

Fig. 13 Example detections from Oulu-CASIA dataset (the first row represents the input images and the second row represents the detected geometry)

Finally, to conclude, the proposed algorithm is effective in detecting the geometry of the face, which in turn can be used for identifying the emotion with good accuracy. The analysis reveals that a proper machine learning algorithm is also required to identify the emotions.
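The tenfold cross-validation comparison summarized in Fig. 12 can be sketched with scikit-learn. The synthetic feature matrix below is an illustrative stand-in for the paper's area/angle features, and the estimator settings are assumptions, not the authors' configuration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for area/angle feature vectors over 7 emotions
X = rng.normal(size=(210, 4)) + np.repeat(np.arange(7), 30)[:, None]
y = np.repeat(np.arange(7), 30)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=0),
    "NB": GaussianNB(),
    "SVM": SVC(),
}

# Tenfold cross-validation, one mean accuracy per classifier
cv = KFold(n_splits=10, shuffle=True, random_state=0)
results = {name: cross_val_score(m, X, y, cv=cv).mean()
           for name, m in models.items()}
```

Plotting the per-fold means and standard deviations of `results` reproduces the style of comparison shown in Fig. 12.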
system in an ambient assisted environment. Expert Syst. Appl. 112, 173–189 (2018)
31. Yang, H., Ciftci, U., Yin, L.: Facial expression recognition by de-expression residue learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2168–2177 (2018)
32. Yu, Z., Liu, Q., Liu, G.: Deeper cascaded peak-piloted network for weak expression recognition. Vis. Comput. 34(12), 1691–1699 (2018)
33. Zhao, G., Huang, X., Taini, M., Li, S.Z., Pietikäinen, M.: Facial expression recognition from near-infrared videos. Image Vis. Comput. 29(9), 607–619 (2011)

P. Geetha is a Professor in the Department of Computer Science and Engineering, College of Engineering campus, Anna University, Chennai, India. Her research focuses on Image Processing and Pattern Recognition.