
Facial Expression Recognition Using Deep Neural Networks

Junnan Li and Edmund Y. Lam
Department of Electrical and Electronic Engineering
The University of Hong Kong, Pokfulam, Hong Kong

Abstract—We develop a technique using a deep neural network for human facial expression recognition. Images of human faces are preprocessed with photometric normalization and histogram manipulation to remove illumination variance. Facial features are then extracted by convolving each preprocessed image with 40 Gabor filters. Kernel PCA is applied to the features before feeding them into the deep neural network, which consists of one input layer, two hidden layers and a softmax classifier. The deep network is trained using a greedy layer-wise strategy. We use the Extended Cohn-Kanade Dataset for training and testing. Recognition tests are performed on the six basic expressions (i.e. surprise, fear, disgust, anger, happiness, sadness). To test the robustness of the classification system further, and for benchmark comparison, we add a seventh emotion, namely "contempt", for additional recognition tests. We construct confusion matrices to evaluate the performance of the deep network. It is demonstrated that the network generalizes to new images fairly successfully, with an average recognition rate of 96.8% for six emotions and 91.7% for seven emotions. In comparison with shallower neural networks and SVM methods, the proposed deep network method provides better recognition performance.

Keywords—Multi-layer neural network; emotion recognition; Gabor filters; Kernel principal component analysis.

Fig. 1. Sample face images of the six basic emotions (Angry, Disgust, Fear, Happiness, Sadness, Surprise) from the CK+ database [2]

I. INTRODUCTION

In computer vision, the automatic recognition of facial expressions has been an active research area for a long time. Facial expression recognition refers to detecting human emotions based on expressions. Since the early 1970s, studies performed by Ekman have shown that there are six "universal categories of emotional expressions" that are easiest to be recognized by humans. Those prototypical facial expressions are: surprise, fear, disgust, anger, happiness and sadness [1]. Since facial expression recognition is characterized as a pattern recognition problem in human cognition, computers with pattern recognition abilities have the potential to perform as well as, or even better than, humans.

Automatic expression recognition is significant to many applications. With the advances in robotics, the requirement for a robust real-time facial expression recognition system is urgent. It could improve the performance of human-computer interaction and help to construct more intelligent robots with the ability to understand human emotions. Apart from robotics and human-computer interaction, facial expression recognition systems also find uses in other fields such as education software, animation, automobile safety and behavioral science.

Facial expression recognition falls within the research framework of pattern recognition. A recognition system typically consists of three stages: face detection, feature extraction and expression classification. A large amount of research has been carried out on these three central issues of facial expression recognition, and a variety of facial expression recognition systems have been developed using different feature extraction and classification methods. Tian et al. studied the combination of permanent features and transient features with artificial neural networks (ANN) as the classifier [3]. Wang and Yin applied topographic context (TC) expression descriptors and a combined classifier of quadratic discriminant classifier (QDC), linear discriminant analysis (LDA) and naive Bayesian network classifier (NBC) [4]. More recent work by Sánchez et al. applied optical flow-based methods for feature extraction and a Support Vector Machine (SVM) as the classification method [5].

Among the published work, neural network-based methods have shown promising results, obtaining over 85% accuracy for six basic emotion classification. Our study continues to explore the potential of neural networks to recognize facial expressions. In order to better represent the mapping from the feature space to the facial expression space, we investigate feedforward deep neural networks, which have the ability to model more complex nonlinear functions.

We have developed a real-time facial expression recognition system with high recognition rates. We design a multistep image preprocessing and feature extraction procedure that improves the classification performance. In particular, histogram remapping to a normal distribution is used after photometric normalization to remove illumination variance. Then we use Gabor wavelets and Kernel Principal Component Analysis to extract representative features of the preprocessed image. The nonlinear features extracted by Kernel PCA provide better recognition rates compared with linear PCA. The features are fed into the deep neural network for classification. The deep network applies a greedy layer-wise training strategy that can learn from both labeled and unlabeled data. The use of unlabeled data helps the system to learn better features when the amount of labeled data is limited.



In the remainder of the paper, we first discuss the preprocessing techniques in Section 2. In Section 3, the feature extraction methods are presented. The training procedure of the deep network is proposed in Section 4. Experimental results for recognition tests on the Extended Cohn-Kanade Dataset are demonstrated and analyzed in Section 5. Conclusions are stated in Section 6.

II. PREPROCESSING OF IMAGES

Image preprocessing represents an essential part of a facial expression recognition system. It has a significant impact on the robustness and performance of the system. The object of preprocessing is to reduce the influence of noise on feature extraction and to enhance the discriminative information contained in images.

A. Face detection

In our facial expression recognition system, we need images of frontal faces that are normalized in scale. Hence, it is important that we can localize and extract the face region from an image. The exclusion of background is crucial for reliable expression classification.

Different face detection algorithms have been proposed, including the shape-information-based approach [6] and the skin-color-based approach [7]. The shape-information-based approach is often not fast enough for real-time detection, while the skin-color-based approach only works for color images. In our system, we use the Viola-Jones face detection framework, which is a robust algorithm capable of processing images extremely rapidly in real-time situations [8]. It is most effective on images of frontal faces, which is exactly the type of images we use. Tests show that the Viola-Jones algorithm achieves a detection rate of 100% for the images used in this study.

Fig. 2. Viola-Jones algorithm applied to a sample image of a human face

The Viola-Jones algorithm constructs an intermediate image representation called the integral image, which can be computed rapidly. It builds a simple and efficient classifier using the AdaBoost learning algorithm. Face detection is achieved by combining classifiers in a cascade structure that is capable of increasing the detection performance while reducing computational complexity.
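For concreteness, a minimal sketch of this detection step is given below, using the pretrained Haar cascade shipped with OpenCV; the cascade file name and the 128×128 crop size are our assumptions, since the paper specifies only the framework [8].

import cv2

# A minimal Viola-Jones detection sketch using OpenCV's pretrained
# frontal-face Haar cascade (our assumption; the paper names no library).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_and_crop(gray, size=(128, 128)):
    """Return the largest detected face region, rescaled to a fixed size."""
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
    return cv2.resize(gray[y:y + h, x:x + w], size)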
B. Photometric normalization

The variation of illumination conditions of a facial image can introduce large changes to the image, hence impairing the performance of the facial expression classification. Many photometric normalization algorithms have been proposed that can remove illumination variance at the preprocessing level, including the multiscale retinex method [9], the isotropic diffusion based normalization method, and the anisotropic diffusion based normalization method [10].

Among those photometric normalization algorithms, it is shown by Short et al. that homomorphic filtering based normalization yields the most consistent results compared with other techniques [11]. Hence, we apply homomorphic filtering based normalization in our recognition system.

An image is the product of two components, illumination and reflectance. A homomorphic filter decomposes them by taking the logarithm, which turns the product into a sum. The Fourier transform then maps the two components into F_L and F_R, where F_L (illumination) mainly comprises low frequency components, while F_R (reflectance) mainly comprises high frequency components. Multiplying the spectrum by a high-emphasis homomorphic filter amplifies the high frequency components and attenuates the low frequency components. Therefore, the contrast of the image is improved and the dynamic range is compressed. The image is transformed back into the spatial domain by applying the inverse Fourier transform and exponentiating to undo the logarithm.

Fig. 3. Sample images of homomorphic filtering based normalization
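As an illustration, a minimal homomorphic filtering sketch follows; the Gaussian high-emphasis filter shape and the constants gamma_l, gamma_h and d0 are our assumptions, since [11] admits several filter designs.

import numpy as np

def homomorphic_filter(image, gamma_l=0.5, gamma_h=2.0, d0=30.0):
    """Log -> FFT -> high-emphasis filter -> inverse FFT -> exp."""
    log_img = np.log1p(image.astype(np.float64))   # product becomes a sum
    spectrum = np.fft.fftshift(np.fft.fft2(log_img))
    rows, cols = image.shape
    u = np.arange(rows) - rows / 2
    v = np.arange(cols) - cols / 2
    d2 = u[:, None] ** 2 + v[None, :] ** 2         # squared distance from center
    # Gaussian high-emphasis filter: attenuate illumination (low frequency),
    # boost reflectance (high frequency).
    h = (gamma_h - gamma_l) * (1 - np.exp(-d2 / (2 * d0 ** 2))) + gamma_l
    filtered = np.fft.ifft2(np.fft.ifftshift(h * spectrum)).real
    return np.expm1(filtered)                      # undo the logarithm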
C. Histogram remapping to normal distribution

Histogram remapping is a common pre- or post-processing step for photometric normalization. The most common histogram remapping approach is histogram equalization, where pixel values are mapped to a uniform distribution so as to improve contrast and compensate for the illumination variance. A study by Ranawade has shown that using histogram equalization in conjunction with photometric normalization leads to a better classification performance than using photometric normalization on its own [12]. However, the usefulness of histogram equalization is determined empirically rather than theoretically. Histogram equalization is only a special case of a more general concept: altering pixel intensity values so that their distribution fits a predefined function. Rather than fitting a uniform distribution to the histogram of the images, as is done in histogram equalization, we propose to use a normal distribution. The experimental results in Section 5 will show that normal distribution mapping leads to better recognition rates than histogram equalization.

The expression for the normal distribution curve is given by

f(p) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(p - \mu)^2}{2\sigma^2} \right)    (1)

where μ represents the mean value, and σ denotes the standard deviation.

In our system, we set μ to be 0 and σ to be 1. However, due to the nature of neural networks, μ has no influence on the classification results. Figure 4 gives a visual example of the histogram remapping. Here the mapped pixel values are rescaled back to the 8-bit interval for visualization purposes.

Fig. 4. A sample image and its histogram before and after mapping the histogram to the normal distribution
III. FEATURE EXTRACTION

Feature extraction is an essential component of the recognition system. It aims to identify the most appropriate and meaningful representation of the face images for recognition. In our system, Gabor filtering and Kernel Principal Component Analysis are used to extract features.

A. Gabor Wavelets

The Gabor filter is a useful tool to extract meaningful facial features. It is similar to the receptive field profile of cortical simple cells, which is characterized as localized, frequency selective and orientation selective [13]. A study by Zhang et al. suggests that Gabor wavelets extracted from face images can achieve a much better performance than geometric positions of a set of fiducial points [14].

A Gabor filter bank in the 2D spatial domain (a, b) is defined by

G_{u,v}(a, b) = \frac{f_u^2}{\pi \gamma \alpha} \exp\left( -f_u^2 \left( \frac{a'^2}{\gamma^2} + \frac{b'^2}{\alpha^2} \right) \right) \exp(j 2\pi f_u a')    (2)

where

a' = a \cos\theta + b \sin\theta, \quad b' = -a \sin\theta + b \cos\theta    (3)

f_u = f_{max} / 2^{u/2}, \quad \theta = \pi v / 8    (4)

Here f_{max} denotes the maximal frequency, γ and α determine the sharpness along the a and b axes, and u and v index the scales and orientations of the filter bank.

In this system, we construct a Gabor filter bank consisting of 40 filters. We choose eight orientations to capture subtle features of the facial expression, and five scales to efficiently represent features of a 128×128 image. The other parameters selected are γ = α = √2 and f_{max} = 0.25, which are also appropriate for the image size.

We extract the Gabor features of a grey-scale image by convolving the image I(a, b) with the Gabor filter bank G_{u,v}(a, b), i.e.,

F(a, b) = I(a, b) * G_{u,v}(a, b).    (5)

The magnitude responses of a sample image filtered by two of the Gabor filters are shown in Fig. 5.

Fig. 5. Magnitude responses of a preprocessed sample image

Since we have resized the images to 128×128 before convolving with the Gabor filters, the Gabor features reside within a space of dimension 655360 (128×128×40). The features are of too high a dimension to efficiently process and store. In order to reduce the dimension, we apply downsampling by a factor of 64 to all feature vectors. However, the features still have a very large dimensionality, which is computationally expensive and contains redundant information. Therefore, Kernel Principal Component Analysis is applied.
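A minimal sketch of the filter bank and the feature extraction of Eqs. (2)-(5) follows; the 31×31 window size is our assumption, as the paper does not state the spatial support of the filters.

import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(u, v, size=31, f_max=0.25, gamma=2 ** 0.5, alpha=2 ** 0.5):
    """Gabor filter of scale u and orientation v, following Eqs. (2)-(4)."""
    f_u = f_max / 2 ** (u / 2.0)                 # Eq. (4): frequency of scale u
    theta = np.pi * v / 8.0                      # Eq. (4): orientation angle
    half = size // 2
    b, a = np.mgrid[-half:half + 1, -half:half + 1]
    a_r = a * np.cos(theta) + b * np.sin(theta)  # Eq. (3): rotated coordinates
    b_r = -a * np.sin(theta) + b * np.cos(theta)
    envelope = np.exp(-f_u ** 2 * (a_r ** 2 / gamma ** 2 + b_r ** 2 / alpha ** 2))
    carrier = np.exp(2j * np.pi * f_u * a_r)
    return f_u ** 2 / (np.pi * gamma * alpha) * envelope * carrier

# 40 filters: five scales and eight orientations.
bank = [gabor_kernel(u, v) for u in range(5) for v in range(8)]

def gabor_features(image):
    """Stack of magnitude responses, Eq. (5), one 2-D map per filter."""
    return np.stack([np.abs(fftconvolve(image, g, mode="same")) for g in bank])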
B. Kernel Principal Component Analysis

PCA (Principal Component Analysis) and KPCA (Kernel Principal Component Analysis) are important techniques used to reduce the dimensionality of features. Dimensionality reduction helps create uncorrelated features and reduces computation cost [15]. Features of lower dimension can also provide a better representation of the face images.

While conventional PCA aims to extract a subspace where the variance of the features is maximized, some undesired variations might be retained. The linear projection of PCA may be suboptimal for representing information based on higher order dependencies in an image, such as nonlinear relations of pixel values [16]. Schölkopf et al. have extended linear PCA to Kernel PCA, where inputs are mapped from their original space to a space of higher dimension. KPCA can extract nonlinear features and thus provides better recognition performance [17].

Given a set of samples \{x_1, x_2, \ldots, x_k\} \in R^n with zero mean and unit variance, linear PCA finds the projection directions by finding the eigenvalues λ and eigenvectors u of the covariance matrix C, such that λu = Cu.

In KPCA, each vector x_i in R^n is projected to a feature space R^f of higher dimension, using a nonlinear mapping function φ. Hence, the eigenvalue problem in the feature space becomes

\lambda u^{\phi} = C^{\phi} u^{\phi}    (6)

where C^φ is the new covariance matrix, and u^φ is the eigenvector with u^{\phi} = \sum_{i=1}^{m} \alpha_i \phi(x_i).

Then, we can project φ(x_i) in the feature space R^f onto a low dimensional space spanned by the eigenvectors u^φ. For an input vector x_j, those projections are the nonlinear principal components corresponding to φ:

u^{\phi} \cdot \phi(x_j) = \sum_{i=1}^{m} \alpha_i \left( \phi(x_i) \cdot \phi(x_j) \right).    (7)

Denote the kernel function by

k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j).    (8)

Hence, the nonlinear principal components can be extracted implicitly using the kernel function, without the explicit projection of the input vectors into the high dimensional space R^f. This gives Kernel PCA a computational complexity similar to that of linear PCA.

In this system, we use the fractional power polynomial kernel, which is defined by

k(x_i, x_j) = \mathrm{sgn}(x_i^T x_j) \cdot |x_i^T x_j|^{0.8}.    (9)

In Section 5, we will show that the nonlinear principal components extracted by KPCA achieve better recognition rates than the linear principal components extracted using PCA.
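A minimal sketch of KPCA with this kernel follows, implemented directly via the eigendecomposition of the centered kernel matrix; the numerical floor on the eigenvalues and the omission of test-kernel centering are our simplifications.

import numpy as np

def fpp_kernel(X, Y, d=0.8):
    """Fractional power polynomial kernel, Eq. (9)."""
    s = X @ Y.T
    return np.sign(s) * np.abs(s) ** d

def kpca_fit(X, n_components):
    """Eigendecomposition of the centered kernel matrix (training step)."""
    K = fpp_kernel(X, X)
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n  # centering in R^f
    eigvals, eigvecs = np.linalg.eigh(Kc)
    top = np.argsort(eigvals)[::-1][:n_components]      # largest eigenvalues
    alphas = eigvecs[:, top] / np.sqrt(np.maximum(eigvals[top], 1e-12))
    return X, alphas

def kpca_transform(model, X_new):
    """Nonlinear principal components of new samples, Eq. (7)."""
    X_train, alphas = model
    return fpp_kernel(X_new, X_train) @ alphas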
IV. FEEDFORWARD DEEP NEURAL NETWORKS

Deep neural networks are ones in which there are multiple hidden layers. Since each hidden layer computes a nonlinear transform of the previous layer, multiple hidden layers have the power to generate much more complex features of the input. As a result, a deep network can learn significantly more complex functions than a shallow network. It has been shown that a k-layer network can represent functions that a (k - 1)-layer network can only represent with an exponentially large number of hidden units [18].

A. The network architecture

The deep network we design contains one input layer, two hidden layers and one output layer, as shown in Fig. 6. The inputs are the feature vectors obtained after KPCA. Each hidden layer contains 200 units.

Fig. 6. Architecture of the deep network

The activation function of the hidden units is the logistic sigmoid function:

g(z) = \frac{1}{1 + \exp(-z)}    (10)

where z = Wx, with x the input vector and W the weight parameter.

The output layer is a softmax classifier. Each output unit outputs the probability of the input image being its corresponding expression. The output h_W(x_i) of the softmax classifier for an input vector x_i is given by

h_W(x_i) = [\, p(y_i = 1 \mid x_i; W), \; p(y_i = 2 \mid x_i; W), \; \ldots, \; p(y_i = m \mid x_i; W) \,]^T = \frac{1}{\sum_{j=1}^{m} e^{W_j^T x_i}} [\, e^{W_1^T x_i}, \; e^{W_2^T x_i}, \; \ldots, \; e^{W_m^T x_i} \,]^T    (11)

where m equals the number of emotions, and W is the weight parameter.

B. Greedy layer-wise training

Even though the significant power of deep networks has been proved theoretically and appreciated for decades, researchers found it difficult to train deep networks. Traditional gradient-based optimization algorithms are not effective when the gradient is propagated across multiple layers of nonlinear functions [19], [20]. Reasons include the insufficiency of labeled data, convergence to local optima and diffusion of gradients.

In order to address those problems, Hinton et al. proposed a greedy layer-wise unsupervised training strategy based on restricted Boltzmann machines (RBM) [21]. Bengio et al. further improved the greedy layer-wise procedure with autoassociator networks. The main idea of the method is to train the different layers of the deep network one at a time [18]. It is the training strategy that we apply.

In our training process, the two hidden layers of the network are first trained using unlabeled images. They try to learn an identity function, where the desired output is the same as the input. This process is unsupervised feature learning. The use of unlabeled images helps the network learn good feature representations prior to supervised learning. Then we feed the labeled image data into the two pre-trained hidden layers and perform forward propagation to obtain feature vectors. Those feature vectors are used to train the output layer, which is a softmax classifier. Supervised training is applied here, where the target value of an output unit is 1 if the labeled emotion is the same as the one it represents, and 0 otherwise. We apply fine-tuning of the whole network as the final step: we treat all layers as one single model and use the back propagation algorithm to improve all the weights in each iteration.
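A compact sketch of this training procedure is given below, written with PyTorch as an assumed framework; the layer sizes follow the paper, while the optimizer, learning rate, epoch count and placeholder data are our assumptions.

import torch
import torch.nn as nn

def pretrain_layer(layer, data, epochs=50, lr=1e-3):
    """Train `layer` as the encoder of a one-hidden-layer autoencoder,
    i.e. learn to reconstruct its own input (unsupervised feature learning)."""
    decoder = nn.Linear(layer.out_features, layer.in_features)
    opt = torch.optim.Adam(
        list(layer.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        recon = decoder(torch.sigmoid(layer(data)))
        nn.functional.mse_loss(recon, data).backward()
        opt.step()

n_in, n_classes = 200, 7                     # KPCA feature size is a placeholder
h1, h2 = nn.Linear(n_in, 200), nn.Linear(200, 200)
unlabeled = torch.randn(500, n_in)           # placeholder unlabeled features

pretrain_layer(h1, unlabeled)                                # layer 1 on inputs
pretrain_layer(h2, torch.sigmoid(h1(unlabeled)).detach())    # layer 2 on codes

# Supervised stage: attach the softmax output layer, then fine-tune the whole
# stack with backpropagation (cross-entropy on the labeled images).
model = nn.Sequential(h1, nn.Sigmoid(), h2, nn.Sigmoid(), nn.Linear(200, n_classes))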
V. EXPERIMENTAL RESULTS

We use the Extended Cohn-Kanade Dataset for training and testing the deep neural network. It is a very popular database used to evaluate the performance of facial expression recognition systems. The number of images contained in the dataset is shown in TABLE I.

TABLE I. Number of images of the seven emotions in the CK+ Dataset

Emotion      Number of Images
Angry        45
Disgust      59
Fear         25
Happiness    69
Sadness      28
Surprise     82
Contempt     19
Total        327

A. Recognition of six basic facial expressions

Recognition tests are first performed on the six basic expressions, where the effects of different preprocessing techniques on classification performance are studied. We conduct six sets of tests. Each set uses the same 327 images, but the images are preprocessed with different techniques. We use the leave-one-out subject cross-validation strategy, which is an exhaustive cross-validation method: one image is used as the validation set and the remaining images as the training set. For each set of tests, the cross-validation is repeated 327 times, and the correct recognition rate is defined as the number of correctly recognized images divided by the total number of images.
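A minimal sketch of this evaluation loop follows, where train_fn and predict_fn are hypothetical hooks standing in for the full preprocessing, feature extraction and classification pipeline.

import numpy as np

def leave_one_out_rate(features, labels, train_fn, predict_fn):
    """Each image serves as the validation set exactly once; the recognition
    rate is correctly recognized images divided by the total number of images."""
    n = len(labels)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i                       # hold out image i
        model = train_fn(features[mask], labels[mask])
        correct += int(predict_fn(model, features[i:i + 1])[0] == labels[i])
    return correct / n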
For each test set, we have recorded the recognition rate for each emotion as well as the total recognition rate. TABLE II shows the recognition rates using the original/uniform/normal histogram distributions and PCA/KPCA.

TABLE II. Recognition rates of six emotions with different histogram distributions and PCA/KPCA

Emotion      PCA      KPCA     Hist. eq.   Hist. eq.   Normal dist.   Normal dist.
                               + PCA       + KPCA      + PCA          + KPCA
Angry        80%      84.4%    91.1%       91.1%       91.1%          95.6%
Disgust      94.9%    94.9%    98.3%       98.3%       98.3%          100%
Fear         64%      68%      72%         72%         76%            84%
Happiness    100%     100%     100%        100%        100%           100%
Sadness      64.3%    67.9%    78.6%       78.6%       85.7%          85.7%
Surprise     100%     100%     100%        100%        100%           100%
Average      89.9%    91.2%    94.2%       94.2%       95.1%          96.8%

Based on the results, we can see that histogram remapped images significantly outperform images with no histogram manipulation. Of the two histogram manipulation techniques, fitting a normal distribution leads to better recognition rates than histogram equalization. Moreover, Kernel PCA is more powerful than PCA in terms of improving recognition rates. Overall, fitting a normal histogram distribution combined with KPCA yields the best performance.

B. Recognition of seven facial expressions

In order to further test the robustness of the system and to allow a reliable benchmark comparison, we perform recognition tests on seven emotions, including the six basic emotions and contempt. A confusion matrix of the test results is presented in TABLE III.

TABLE III. Confusion matrix of seven emotion recognition (%)

        An      Di      Fe      Ha      Sa      Su      Co
An      84.4    3.0     2.2     0       10.4    0       0
Di      5.2     94.7    0       0       0       0       0
Fe      6.9     4.2     81.9    0       4.2     2.8     0
Ha      0       0       0       100     0       0       0
Sa      19.8    0       4.9     0       66.7    7.4     0
Su      0       0       0       0       0       100     0
Co      0       0       1.9     0       18.5    0       79.6

The overall recognition rate for seven emotions is 91.7%, which is lower than the recognition rate for six emotions (96.8%). An explanation for the drop in recognition rate is that adding one more possibility to the output of the network dilutes the probability of the correct emotion, since the total probability over the seven emotions has to equal 1. Another reason may be that contempt is a very subtle emotion and can easily be confused with other, stronger emotions.
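For reference, a confusion matrix such as TABLE III can be accumulated from the per-image predictions of the cross-validation loop; a minimal sketch:

import numpy as np

def confusion_matrix_percent(y_true, y_pred, n_classes=7):
    """Row-normalized confusion matrix in percent, as reported in TABLE III."""
    m = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1                                   # count (true, predicted)
    return 100.0 * m / m.sum(axis=1, keepdims=True)    # rows sum to 100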
The results given in TABLE III are in line with the study by Lucey et al., where a Support Vector Machine is used for recognizing seven emotions on the Extended Cohn-Kanade Dataset [2]. In their study, Lucey et al. applied two SVMs, one using SPTS (similarity-normalized shape) features and the other using CAPP (canonical appearance) features. The same leave-one-out subject cross-validation is applied. It is worth mentioning that Lucey et al. were also involved in the creation of the Extended Cohn-Kanade Dataset.

TABLE IV shows a comparison of the performance between our deep neural network and the SVM method. Generally speaking, the deep network performs better, with a 3.4% higher average recognition rate. Specifically, the deep network performs better for Angry, Fear and Surprise. For Disgust and Happiness, both systems have the same performance, while the SVM performs better for Sadness and Contempt.

TABLE IV. Recognition rates of seven emotions for the Deep Network and SVM

Emotion      Deep Network   SVM      Difference
Angry        84.4%          75%      +9.4%
Disgust      94.7%          94.7%    0
Fear         81.9%          65.2%    +16.7%
Happiness    100%           100%     0
Sadness      66.7%          68%      -1.3%
Surprise     100%           96%      +4%
Contempt     79.6%          84.4%    -4.8%
Average      91.7%          88.3%    +3.4%
VI. CONCLUSION

In this paper, a facial expression recognition system based on feedforward deep neural networks is built. The system consists of three major stages: image preprocessing, feature extraction and expression classification. Recognition tests were performed on the Extended Cohn-Kanade Dataset. We have shown that fitting a normal distribution to the histogram of images, combined with Kernel PCA, yields an improved recognition rate compared with conventional histogram equalization and linear PCA. In the experimental results presented, the deep network provides better performance on seven-emotion recognition compared with the SVM method proposed in [2]. However, since the recognition tests were performed only on one dataset, future work is to improve the system so that it can adapt to a variety of datasets. Moreover, the application of the system to real-life engineering problems will be studied.

REFERENCES

[1] P. Ekman and W. Friesen, "Constants across cultures in the face and emotion," Journal of Personality and Social Psychology, vol. 17, no. 2, pp. 124-129, 1971.
[2] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression," in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2010, pp. 94-101.
[3] Y. Tian, T. Kanade, and J. F. Cohn, "Recognizing action units for facial expression analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 97-115, 2001.
[4] J. Wang and L. Yin, "Static topographic modeling for facial expression recognition and analysis," Computer Vision and Image Understanding, vol. 108, no. 1, pp. 19-34, 2007.
[5] A. Sánchez, J. V. Ruiz, A. B. Moreno, A. S. Montemayor, J. Hernández, and J. J. Pantrigo, "Differential optical flow applied to automatic facial expression recognition," Neurocomputing, vol. 74, no. 8, pp. 1272-1282, 2011.
[6] J. Wang and T. Tan, "A new face detection method based on shape information," Pattern Recognition Letters, vol. 21, no. 6, pp. 463-471, 2000.
[7] J. Kovac, P. Peer, and F. Solina, "Human skin color clustering for face detection," in EUROCON 2003: Computer as a Tool, The IEEE Region 8, September 2003, pp. 144-148.
[8] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[9] D. J. Jobson, Z. Rahman, and G. A. Woodell, "A multiscale retinex for bridging the gap between color images and the human observation of scenes," IEEE Transactions on Image Processing, vol. 6, no. 7, pp. 965-976, 1997.
[10] R. Gross and V. Brajovic, "An image preprocessing algorithm for illumination invariant face recognition," in Proceedings of the 4th International Conference on Audio- and Video-Based Biometric Person Authentication, June 2003, pp. 10-18.
[11] J. Short, J. Kittler, and K. Messer, "A comparison of photometric normalisation algorithms for face verification," in Sixth IEEE International Conference on Automatic Face and Gesture Recognition, May 2004, pp. 254-259.
[12] S. S. Ranawade, "Face recognition and verification using artificial neural network," International Journal of Computer Applications, vol. 1, no. 14, pp. 21-25, 2010.
[13] S. R. Zhou, P. J. Yin, and M. J. Zhang, "Local binary pattern (LBP) and local phase quantization (LPQ) based on Gabor filter for face representation," Neurocomputing, vol. 116, pp. 260-264, 2013.
[14] Z. Zhang, M. Lyons, M. Schuster, and S. Akamatsu, "Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron," in Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, April 1998, pp. 454-459.
[15] L. Cao, K. Chua, W. Chong, H. Lee, and Q. Gu, "A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine," Neurocomputing, vol. 55, no. 1-2, pp. 321-336, 2003.
[16] M.-H. Yang, "Face recognition using kernel methods," in Advances in Neural Information Processing Systems (NIPS), vol. 14, 2001.
[17] B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, no. 5, pp. 1299-1319, 1998.
[18] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," in Advances in Neural Information Processing Systems, vol. 19, pp. 153-160, 2007.
[19] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, "Exploring strategies for training deep neural networks," Journal of Machine Learning Research, vol. 10, pp. 1-40, 2009.
[20] H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, "An empirical evaluation of deep architectures on problems with many factors of variation," in Proceedings of the 24th International Conference on Machine Learning, June 2007, pp. 473-480.
[21] G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
