
10th International Conference on Electrical and Computer Engineering

20-22 December, 2018, Dhaka, Bangladesh

High Performance Facial Expression Recognition System Using Facial Region
Segmentation, Fusion of HOG & LBP Features and Multiclass SVM
Bayezid Islam,1,* Firoz Mahmud,1 and Arfat Hossain1
1Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi-6204, Bangladesh
*bayezid.shouvik@gmail.com

Abstract—Facial expression is an effective method of non-verbal communication and of expressing feelings. A method to recognize emotions through facial expressions is proposed in this paper. After some preprocessing of the input image, the facial region is segmented into four expression regions according to the proposed segmentation method. Features from these segmented parts are extracted using a fusion of Histogram of Oriented Gradients (HOG) and Local Binary Patterns (LBP). The dimension of the feature vector is reduced using Principal Component Analysis (PCA). For classifying the features, and thus the expression images, a multiclass Support Vector Machine (SVM) is used. The performance of the proposed method is measured on three publicly available and widely used datasets (JAFFE, CK+, RaFD). Finally, the achieved performance is compared with that of other available methods on these datasets to show that the proposed method achieves state-of-the-art performance.

Index Terms—Facial Expression Recognition (FER), PCA, Emotion Recognition, Image Segmentation, Fusion of HOG and LBP, Multiclass Support Vector Machine (SVM).

I. INTRODUCTION

Facial expressions are helpful in expressing the feelings of a person. Mental condition and sentiment can also be analyzed from facial expressions, so emotion recognition depends heavily on facial expression recognition. Facial expressions play a vital role in nonverbal communication. In a classic work [1], A. Mehrabian showed that facial expression contributes 55% of a speaker's message, which is more than the contribution of vocal and textual information. A speaker's feelings and emotion can therefore be successfully analyzed if the speaker's facial expression can be analyzed successfully. Successfully recognized expressions can be used in many sectors of our lives to improve everyday experience: in security, in robots to enhance their performance, in automated machines equipped with expression-analyzing features, and in systems that allow or prohibit a person from performing crucial tasks depending on the analyzed expression. One probable future use is in social networks, where a user could be suggested a status to post depending on the expression in an uploaded image. As the world heads towards automation, the ultimate goal is to recognize expressions flawlessly and spontaneously using machines, as humans can.

Usually, seven basic facial expressions are considered in FER problems: neutral, angry, fear, happy, sad, surprise, and disgust. Due to its immense range of applications, FER has been an active research topic in computer vision and human-computer interaction for the last few decades. Many efforts have been made, and many are still being made, to develop a robust and accurate system. The basic steps of FER include image preprocessing, analyzing action units, segmenting the facial image, feature extraction, and classification.

Popular feature extraction techniques include Gabor wavelets, Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Speeded-Up Robust Features (SURF), moments, Scale Invariant Feature Transform (SIFT), the Gray Level Co-occurrence Matrix, and many others, but each has some drawbacks. For example, 2D Gabor wavelets work well in FER problems, but their high dimensionality is a reason for not choosing them as the feature extraction technique. For extracting features, the proposed method uses a fusion of Histogram of Oriented Gradients (HOG) and Local Binary Patterns (LBP); the combination of HOG and LBP has been used successfully in many FER systems [2]. Different classification algorithms have been used as well: nearest neighbour classification [3], Artificial Neural Networks (ANN) [4], Extreme Learning Machine (ELM) [5], random forests [6], AdaBoost classification [7], and Support Vector Machine (SVM) [6], [8-9] are among the few worth mentioning.

Classifiers used in FER systems have some issues. Choosing a proper distance metric and the number of neighbours to consider, as well as inefficient memory usage, are challenges with nearest neighbour classification. ANN requires long training time, a lot of training samples, and many parameters to be tuned. The optimal number of nodes in hidden layers and overfitting are issues with ELM. Visualization is a problem with random forests. Outliers and noisy data are challenging for AdaBoost. To avoid these problems, the proposed method uses a multiclass SVM for the classification task. SVM is highly effective in high-dimensional spaces, and images usually have a lot of features. SVM is memory efficient, can separate linearly inseparable data, and can be used with different kernel functions as well as custom kernels. These attractive properties motivated the selection of SVM as the classification algorithm. The next section of this paper briefly describes the proposed method, and the subsequent sections describe each step of the proposed method in detail, including implementation to some extent. The last few sections of the paper are dedicated to result analysis, state-of-the-art comparison, and the conclusion.
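The "linearly inseparable" property cited above for SVM can be demonstrated in a few lines. This is an illustrative sketch, not part of the proposed method; it assumes scikit-learn is available and shows an RBF-kernel SVM separating two classes that no straight line can separate (one class enclosed by the other):

```python
import numpy as np
from sklearn.svm import SVC

# Two concentric rings: class 0 at radius 0.5, class 1 at radius 2.0.
# No linear boundary separates them, but an RBF kernel can.
rng = np.random.default_rng(42)
angles = rng.uniform(0, 2 * np.pi, 200)
inner = np.c_[0.5 * np.cos(angles[:100]), 0.5 * np.sin(angles[:100])]
outer = np.c_[2.0 * np.cos(angles[100:]), 2.0 * np.sin(angles[100:])]
X = np.vstack([inner, outer])
y = np.array([0] * 100 + [1] * 100)

clf = SVC(kernel="rbf").fit(X, y)  # radial basis function kernel
print("training accuracy:", clf.score(X, y))
```

A linear kernel on the same data fails badly, which is exactly the motivation for kernelized SVMs on complex feature spaces.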

978-1-5386-7482-6/18/$31.00 ©2018 IEEE



II. PROPOSED METHOD

An input image, which may be colour or grayscale, is taken as the input of the system. It is converted to grayscale if it is a colour image. Then the face region is detected in the image using the Viola-Jones face detection method [10], if there is any face in it. The detected facial region is then resized to a fixed size for convenient use in the subsequent steps. These three steps together constitute image preprocessing. The resized image is then segmented into four facial expression regions (right eye, left eye, nose, mouth) according to the proposed image segmentation method. Features are extracted from the segmented parts using both HOG and LBP. The dimension of the feature vector is reduced using PCA. Finally, a multiclass SVM is trained on some of the images, and the rest of the images are used to test the system. The methodology is illustrated in Fig. 1.

Fig. 1 The proposed method of facial expression recognition (FER): image preprocessing (conversion to grayscale, face detection, image resizing), image segmentation, feature extraction (HOG+LBP), dimension reduction (PCA), and training/testing using a multiclass SVM.

III. IMAGE PREPROCESSING AND IMAGE SEGMENTATION

During preprocessing, an input image is first checked to determine whether it is a colour or a grayscale image. If it is a colour image, it is converted into grayscale; otherwise this step is skipped. Then the face region is detected in the grayscale image using the Viola-Jones face detection method [10]. Finally, the face region is resized to a fixed size of 150×150 pixels for use in the subsequent steps. The image preprocessing step is shown in Fig. 2 on a sample image from the RaFD dataset [11].

Fig. 2 Steps of image preprocessing.

The full facial region could be used for feature extraction, but many regions of the face do not contribute to facial expressions, so it is computationally efficient to use only the contributory regions. There are popular methods, such as the Viola-Jones object detection method [10] and Active Appearance Models (AAM) [12], for detecting the contributory parts of the full face, but each has drawbacks. For example, the Viola-Jones method fails to detect the eye region when the eye is closed or almost closed. To overcome such problems, we opted to segment the facial region manually into four expression regions: right eye, left eye, nose, and mouth. The segmentation method is quite simple but works well on the desired category of images. To describe the image segmentation process, consider the example of segmenting the left eye. First, the coordinate point (88.29, 46.58) is located in the facial image resized to 150×150. Then a region of width 44 and height 29.15 starting from that coordinate point is selected and used as the left eye. All four parts are segmented in this way. The coordinate values and the corresponding width and height values used to segment the four parts from a 150×150 facial image are given in TABLE I.

TABLE I
VALUES FOR SEGMENTATION

Facial Parts | Coordinate (x, y) | Width (w) | Height (h)
Right Eye    | 23.55, 46.58      | 46.01     | 30.67
Left Eye     | 88.29, 46.58      | 44.00     | 29.15
Nose         | 54.33, 81.84      | 45.43     | 38.00
Mouth        | 50.24, 114.0      | 57.00     | 34.18

These values were defined by analyzing many facial images and the positions of these four parts in them. A challenge was to segment the four parts as accurately as possible with the smallest possible dimensions. When these values are applied to an image of size 150×150, the image is segmented into four parts as illustrated in Fig. 3.

Fig. 3 Proposed image segmentation method on a block of size 150×150.

For a clear understanding of the proposed image segmentation method, the whole process is represented pictorially, step by step, in Fig. 4 on a sample image from the RaFD dataset [11].

Fig. 4 Proposed segmentation method (step by step) on a sample image.

IV. FEATURE EXTRACTION AND DIMENSION REDUCTION

The classification step is highly dependent on the features it receives, so feature extraction is an important step in the whole process. For extracting features from the segmented parts, both HOG and LBP features are used.

A. Histogram of Oriented Gradients (HOG)

HOG features are robust against photometric and geometric transformations, which is among the characteristics that made HOG the prime choice as a feature extraction technique in the proposed method. It has been used successfully in different tasks such as object detection [13]. In short, the steps [13] of calculating HOG features from an image are: (1) divide the whole image into small cells; (2) for each pixel in each cell, compute the gradient magnitude and direction; (3) calculate the corresponding bin for every gradient magnitude and direction, and represent them using a histogram of gradients; (4) build blocks from adjacent cells and perform block normalization to calculate the feature vector.

Choosing the size of cells and blocks and the number of bins is a crucial, problem-dependent task. In the implementation, cells of 8×8 pixels, blocks of 2×2 cells, 9 histogram bins, and unsigned gradient orientations were used. Feature extraction using HOG is illustrated in Fig. 5 on the right eye of a sample image from the RaFD dataset [11].
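The segmentation procedure of Section III reduces to four fixed array crops. The following sketch (assuming NumPy, with the fractional TABLE I values rounded to whole pixels, a detail the paper does not specify) crops the four expression regions from a 150×150 face:

```python
import numpy as np

# Segmentation values from TABLE I: (x, y, width, height) for each
# expression region of a 150x150 face image. Rounding the fractional
# values to whole pixels is an assumption and may change a region's
# size by one pixel (e.g. the nose comes out 46 px wide, not 45.43).
REGIONS = {
    "right_eye": (23.55, 46.58, 46.01, 30.67),
    "left_eye":  (88.29, 46.58, 44.00, 29.15),
    "nose":      (54.33, 81.84, 45.43, 38.00),
    "mouth":     (50.24, 114.0, 57.00, 34.18),
}

def segment_face(face):
    """Crop the four expression regions from a 150x150 grayscale face."""
    assert face.shape == (150, 150)
    parts = {}
    for name, (x, y, w, h) in REGIONS.items():
        x0, y0 = round(x), round(y)
        x1, y1 = round(x + w), round(y + h)
        parts[name] = face[y0:y1, x0:x1]  # rows are y, columns are x
    return parts

face = np.zeros((150, 150), dtype=np.uint8)  # stand-in for a real face crop
parts = segment_face(face)
for name, p in parts.items():
    print(name, p.shape)
```

Because the crop rectangles are fixed, segmentation costs only four slices per image, which is what makes this step so cheap compared with running a detector per region.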

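As a sanity check on the HOG dimensionality reported later in the Fusion subsection, the feature-vector length implied by the stated parameters (8×8-pixel cells, 2×2-cell blocks, 9 bins) can be computed arithmetically. This back-of-the-envelope sketch assumes the region sizes obtained by rounding TABLE I to whole pixels and blocks that slide one cell at a time, the common default:

```python
def hog_length(h, w, cell=8, block=2, bins=9):
    # Number of whole cells in each direction, then the number of
    # block positions when a block of `block`x`block` cells slides
    # one cell at a time; each block contributes block*block*bins values.
    ch, cw = h // cell, w // cell
    bh, bw = ch - block + 1, cw - block + 1
    return bh * bw * block * block * bins

# Region sizes (height, width) after rounding the TABLE I values.
sizes = {"right_eye": (30, 46), "left_eye": (29, 44),
         "nose": (38, 46), "mouth": (34, 57)}
total = sum(hog_length(h, w) for h, w in sizes.values())
print(total)  # 1656, matching the HOG feature count reported in the paper
```

The per-region lengths come out as 288, 288, 432, and 648, summing to 1656, which agrees with the 1656 HOG features stated in Section IV.C.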
Fig. 5 HOG feature extraction on the right eye of a sample image.

B. Local Binary Patterns (LBP)

The ability to handle changes in illumination and comparatively fast computation are among the benefits of LBP. LBP has been used successfully for texture analysis [14], facial analysis, image analysis, and motion analysis. The steps [14] for calculating LBP features from an image are: (1) divide the whole image into small cells with a specified radius and a specified number of neighbours; (2) threshold the neighbour pixels of each cell against the center pixel; (3) the thresholding yields a combination of zeroes and ones, and this binary number (the LBP) is converted to a decimal number; (4) store the count of each LBP value; (5) compute, over the cell, the histogram of the frequency of each count; (6) calculate the feature vector by concatenating the histograms of all the cells.

Choosing an appropriate radius and number of neighbours is a challenging and problem-specific task. In the implementation, a radius of 1 with 8 neighbours was used.

C. Fusion

From the four segmented parts, HOG features are extracted first, and then LBP features are extracted from the same four parts. The final feature vector is formed by concatenating the HOG feature vector and the LBP feature vector. The length of the concatenated feature vector was 1892; of these, 1656 features came from the HOG descriptor and the rest from LBP.

To reduce the dimension of the feature vector, PCA [15-16] is used for its ability to represent data in terms of principal components. The steps of PCA are: (1) normalize the data and calculate the covariance matrix; (2) calculate the eigenvectors and eigenvalues of the covariance matrix; (3) choose the principal components and translate the data in terms of those components. In the PCA implementation, the aim was to retain a 99% variance ratio, which resulted in a dimension reduction of up to 89.85%. For example, on the JAFFE dataset [17], the length of the feature vector was 1892 before applying PCA and 192 after, which means a dimension reduction of 89.85% (1 − (192 ÷ 1892)). Although the dimensions of the four segmented parts differ, they are the same for all images, so their sum is also the same for all images. Hence the length of the feature vector is the same for all sample images, making them suitable for comparison and calculation in the classification step.

V. CLASSIFICATION

The classification step is one of the most vital steps in the whole process, so the importance of choosing and applying an appropriate classifier cannot be overstated. For classifying the expressions, a multiclass SVM is used. SVM works by creating a hyperplane that separates the available data depending on its characteristics. Inherently, SVM is suited to binary classification problems. Multiclass classification problems are handled by the One-Versus-Rest (OVR) and One-Versus-One (OVO) approaches [18], which work by reducing a multiclass problem to multiple binary problems. Crammer and Singer [19] proposed a more convenient method that casts the separation of all classes of a multiclass problem as a single optimization problem, solved via its Lagrangian dual. Given a set of training samples (x_1, y_1), …, (x_m, y_m), where each label y_i is an integer from the set {1, …, k}; H: X → {1, …, k} a multiclass classifier function; K a kernel function; Q the objective function of the dual program; β a regularization constant; and τ_p = 1_{y_p} − α_p the difference between the point distribution 1_{y_p} and the distribution α_p obtained by the optimization problem, the algorithm [19] is summarized as follows:

Input: (x_1, y_1), …, (x_m, y_m).
Initialize: τ_1 = 0, …, τ_m = 0.
Loop:
1. Choose an example p.
2. Calculate the constants for the reduced problem:
   • A_p = K(x_p, x_p)
   • B_p = Σ_{i≠p} K(x_i, x_p) τ_i − β 1_{y_p}
3. Set τ_p to be the solution of the reduced problem:
   min_τ Q(τ) = (1/2) A_p (τ · τ) + B_p · τ
   subject to: τ ≤ 1_{y_p} and τ · 1 = 0
Output: H(x) = argmax_r { Σ_i τ_{i,r} K(x_i, x) }

Bias terms are not required for the algorithm to classify the inputs. In the implementation, an accuracy of 1e-4 was set as the stopping criterion for the algorithm, a linear kernel function was used, and a value of 1 was set as the penalty parameter of the error (C).

VI. RESULT

A computer with a 64-bit system, 4 GB of memory, and a Core i5 processor was used to implement the proposed method. Three widely used facial expression datasets, JAFFE [17], CK+ [20], and RaFD [11], were used to assess the performance of the system. All 213 facial expression images of JAFFE, 1219 images of 22 people from CK+, and 1407 front-facing images of 67 people from the RaFD dataset were used. The seven basic facial expressions were considered from all datasets, and the contemptuous expression was excluded from the RaFD dataset. To avoid biased results, K-fold cross-validation was used. Training and testing sets were selected randomly without considering any statistical relation.

TABLE II
THE ACCURACY OF DIFFERENT DATASETS

Dataset | Fold Used | Proposed Avg. | Proposed F. V. L. | Traditional Avg. | Traditional F. V. L.
JAFFE   | 5         | 93.41         | 1892              | 94.37            | 10463
JAFFE   | 10        | 94.42         | 1892              | 95.76            | 10463
CK+     | 5         | 99.59         | 1892              | 99.67            | 10463
CK+     | 10        | 99.59         | 1892              | 99.75            | 10463
RaFD    | 5         | 99.29         | 1892              | 99.65            | 10463
RaFD    | 10        | 99.65         | 1892              | 99.93            | 10463

TABLE II reports the average (Avg.) accuracy achieved with 5 and 10 folds on the different datasets by the proposed method; here, accuracy means the Correct Recognition Rate (CRR). TABLE II also reports the feature vector length (F. V. L.) without any dimension reduction. The traditional method refers to extracting features from the full face without any segmentation. It is apparent that the proposed method achieves almost the same accuracy as the traditional method, but with just 18.08% of the traditional method's features. Fewer features mean less computational overhead and a smoother path to a real-time, efficient system. Admittedly, the dimension of the feature vector can be reduced by dimension reduction techniques, as done by the proposed method. But reducing 1892 features to a few hundred and 10463 features to a few

hundred are not computationally equivalent. The latter would require much more computation and time and would therefore be an obstacle to creating a real-time system. For image processing problems like FER, accuracy is not the only metric for analyzing the performance of a system; confusion matrices are also analyzed in problems like FER. TABLE III presents the confusion matrix of a random test case on the RaFD dataset in which some fear expression images were not classified properly.

TABLE III
CONFUSION MATRIX OF CRR ON RAFD DATASET

   | Ne  | Ha  | An  | Su   | Fe    | Di  | Sa  | Acc.
Ne | 100 | 0   | 0   | 0    | 0     | 0   | 0   | 100
Ha | 0   | 100 | 0   | 0    | 0     | 0   | 0   | 100
An | 0   | 0   | 100 | 0    | 0     | 0   | 0   | 100
Su | 0   | 0   | 0   | 100  | 0     | 0   | 0   | 100
Fe | 0   | 0   | 0   | 1.52 | 98.48 | 0   | 0   | 98.48
Di | 0   | 0   | 0   | 0    | 0     | 100 | 0   | 100
Sa | 0   | 0   | 0   | 0    | 0     | 0   | 100 | 100

VII. CONCLUSION

Performance comparison of the proposed method with other FER methods is shown in TABLE IV.

TABLE IV
STATE-OF-THE-ART COMPARISON

Study           | Technique                                                                          | Dataset              | Acc.
Proposed Method | Viola-Jones face detection, facial region segmentation, HOG+LBP, PCA, multiclass SVM | JAFFE / CK+ / RaFD | 94.42 / 99.59 / 99.65
2017 [6]        | Viola-Jones face detection, Zernike moments, LBP, DCT, SVM                         | JAFFE                | 90.14
2017 [8]        | Salient facial areas, LBP+HOG, PCA, SVM                                            | JAFFE / CK+          | 89.60 / 98.30
2016 [9]        | Viola-Jones face detection, facial landmarks, active facial patches, HOG, SVM      | CK+ / RaFD           | 89.70 / 96.00
2015 [21]       | SURF, gentle AdaBoost                                                              | RaFD                 | 90.64

The result section, along with the state-of-the-art comparison, indicates the effectiveness of the proposed method compared to other available FER methods. The effective and distinctive facial region segmentation method, together with the selection of appropriate feature extraction and classification techniques, yielded a high-performance emotion recognition system based on recognizing facial expressions. Some subtle expressions are tough even for a human to recognize properly, let alone machines; such expression images slightly affected the overall performance of the system. A debatable step might be the facial region segmentation step, but the system was tested with almost 3000 front-facing images from standard datasets and demonstrated its ability to segment the four parts properly. Images with unusual facial structure, however, might not be segmented properly by the proposed segmentation method. The proposed segmentation method also helps in achieving a real-time system by considering only the required features.

The system is capable of handling front-facing images only. But since the concern was only front-facing images, this is not a great issue for the time being. Developing a more robust system capable of handling images rotated at any angle in any direction remains future work. The ultimate goal is to recognize emotions, so developing a multimodal information fusion system that combines facial, vocal, and textual information to recognize emotion would be an interesting and challenging work in the near future.

REFERENCES

[1] A. Mehrabian, "Communication without Words", Psychology Today, Vol. 2, No. 4, pp. 53-56, 1968.
[2] S. An and Q. Ruan, "3D facial expression recognition algorithm using local threshold binary pattern and histogram of oriented gradient", 2016 IEEE 13th International Conference on Signal Processing (ICSP), Chengdu, pp. 265-270, 2016.
[3] S. Agrawal and S. Yadav, "Approach Based on HOG Descriptor and K nearest Neighbour for Facial Expressions Recognition", International Journal of Innovations & Advancement in Computer Science, Vol. 7, No. 5, pp. 239-249, 2018.
[4] N. Kauser and J. Sharma, "Facial expression recognition using LBP template of facial parts and multilayer neural network", 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, pp. 445-449, 2017.
[5] Z. Liu et al., "A facial expression emotion recognition based human-robot interaction system", IEEE/CAA Journal of Automatica Sinica, Vol. 4, No. 4, pp. 668-676, 2017.
[6] J. Jayalekshmi and T. Mathew, "Facial expression recognition and emotion classification system for sentiment analysis", 2017 International Conference on Networks & Advances in Computational Technologies (NetACT), Thiruvananthapuram, pp. 1-8, 2017.
[7] R. Verma and M. Y. Dabbagh, "Fast facial expression recognition based on local binary patterns", 2013 26th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Regina, SK, pp. 1-4, 2013.
[8] Y. Liu, Y. Li, X. Ma, and R. Song, "Facial Expression Recognition with Fusion Features Extracted from Salient Facial Areas", Sensors, Vol. 17, No. 4, p. 712, Mar. 2017.
[9] P. Kumar, S. L. Happy and A. Routray, "A real-time robust facial expression recognition system using HOG features", 2016 International Conference on Computing, Analytics and Security Trends (CAST), Pune, pp. 289-293, 2016.
[10] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features", Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Vol. 1, pp. I-511-I-518, 2001.
[11] O. Langner, R. Dotsch, G. Bijlstra, D. H. Wigboldus, S. T. Hawk and A. van Knippenberg, "Presentation and validation of the Radboud Faces Database", Cognition and Emotion, Vol. 24, pp. 1377-1388, 2010.
[12] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active Appearance Models", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 6, pp. 681-685, 2001.
[13] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection", 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, Vol. 1, pp. 886-893, 2005.
[14] T. Ojala, M. Pietikainen and T. Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 7, pp. 971-987, Jul. 2002.
[15] F. Mahmud, S. Afroge, M. A. Mamun and A. Matin, "PCA and back-propagation neural network based face recognition system", 2015 18th International Conference on Computer and Information Technology (ICCIT), Dhaka, pp. 582-587, 2015.
[16] A. Matin, F. Mahmud, T. Ahmed and M. S. Ejaz, "Weighted score level fusion of iris and face to identify an individual", 2017 International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox's Bazar, pp. 1-4, 2017.
[17] M. J. Lyons, S. Akemastu, M. Kamachi and J. Gyoba, "Coding Facial Expressions with Gabor Wavelets", 3rd IEEE International Conference on Automatic Face and Gesture Recognition, pp. 200-205, 1998.
[18] Z. Wang and X. Xue, "Multi-Class Support Vector Machine", in Support Vector Machines Applications, Y. Ma and G. Guo, Eds. New York: Springer, 2014, pp. 23-48.
[19] K. Crammer and Y. Singer, "On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines", The Journal of Machine Learning Research, Vol. 2, pp. 265-292, 2002.
[20] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar and I. Matthews, "The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression", IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, San Francisco, CA, June 13-18, pp. 94-101, 2010.
[21] Q. Rao, X. Qu, Q. Mao and Y. Zhan, "Multi-pose facial expression recognition based on SURF boosting", 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), Xi'an, pp. 630-635, 2015.
