
INTERNATIONAL CONFERENCE ON COMMUNICATION, COMPUTER AND POWER (ICCCP’09) MUSCAT, FEBRUARY 15-18, 2009

Facial Expression Recognition: Gabor Filters versus Higher-Order Correlators
Seyed Mehdi Lajevardi, Zahir M. Hussain
School of Electrical & Computer Engineering, RMIT University, Melbourne, Australia
seyed.lajevardi@rmit.edu.au, zmhussain@ieee.org

Abstract: In this paper we investigate the performance of different feature extraction methods for facial expression recognition based on higher-order local autocorrelation (HLAC) coefficients and Gabor wavelet filters. We use the Cohn-Kanade database of facial images, organized into training and testing sets, for evaluation. Autocorrelation coefficients are computationally inexpensive, inherently shift-invariant and quite robust against changes in facial expression. The focus is on the difficult problem of recognizing an expression at different resolutions. Results indicate that local autocorrelation coefficients have surprisingly high information content.

Index Terms: Feature extraction, higher-order statistics, facial expression recognition, Gabor filters.

I. INTRODUCTION

FACIAL expression is a visible manifestation of the affective state, cognitive activity, intention, personality, and psychopathology of a person; it not only expresses our emotions, but also provides important communicative cues during social interaction. According to psychologists, facial expression accounts for 55% of the effect of a communicated message, while language and voice account for 7% and 38% respectively. Analysis and automatic recognition of facial expression can therefore improve human-computer interaction or even social interaction.

An automatic classification of facial expressions consists of two stages: feature extraction and feature classification. Feature extraction is of key importance to the whole classification process. If inadequate features are used, even the best classifier could fail to achieve accurate recognition. In most cases of facial expression classification, the process of feature extraction yields a prohibitively large number of features, and subsequently a smaller sub-set of features needs to be selected according to some optimality criteria.

Lyons et al. [2] adopted a wavelet-based face representation. Input images were convolved with Gabor filters of five spatial frequencies, and the amplitudes of the complex-valued filter responses were sampled at 34 manually selected facial points and combined into a single vector containing 1020 elements. Zhang et al. [3] used a similar representation, but applied wavelets of 3 scales and 6 orientations. They also considered the geometric positions of the 34 facial points as features.

In this study, Higher-Order Local Auto-Correlation (HLAC) features are used for feature extraction. HLAC features, an extension of autocorrelation features (second-order statistics), are based on higher-order statistics (HOS); however, the dimensionality of the resulting data is high. The dimensionality can be reduced by selecting the more informative features based on mutual information.

The mutual information (MI) [5] parameter was investigated as a measure of information overlap between features. The mutual information was used as an objective criterion in the selection of optimal sub-sets of features in a feature reduction task.

In contrast to the classical correlation-based feature selection methods, the mutual information can measure arbitrary relations between variables and it does not depend on transformations applied to different variables. It can be potentially useful in problems where methods based on linear relations between data do not perform well.

A functional block diagram of the proposed facial expression recognition system is illustrated in Fig. 1.

Fig. 1: Block diagram of the facial recognition system

The remainder of this paper provides detailed descriptions of the proposed system, experiments and results. Section 2 explains the image pre-processing stages. In Section 3, the feature extraction methods are explained. Section 4 describes the feature selection. In Section 5, the classification process based on the NB classifier is explained. Section 6 contains the experiments and results, and Section 7 presents the conclusions.
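
The claim in the Introduction that mutual information can capture relations that linear correlation misses can be illustrated with a small numerical sketch. This is not the authors' code: the histogram estimator, the number of bins (16), the base-2 logarithm, and the names `to_bins`, `mutual_information` and `pearson` are all assumptions made for illustration. The data are synthetic: `ys` is a deterministic but non-linear function of `xs`, while `zs` is independent of `xs`.

```python
import math
import random
from collections import Counter


def to_bins(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant input
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys, bins=16):
    """Histogram estimate of I(X;Y) in bits: sum p(x,y) log2[p(x,y) / (p(x) p(y))]."""
    bx, by = to_bins(xs, bins), to_bins(ys, bins)
    n = len(xs)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # With counts c, px[i], py[j]: p(x,y) / (p(x) p(y)) = c * n / (px[i] * py[j]).
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(5000)]
ys = [x * x for x in xs]                                # non-linear dependence on xs
zs = [random.uniform(-1.0, 1.0) for _ in range(5000)]   # independent of xs

print("correlation(x, x^2):", round(pearson(xs, ys), 3))
print("MI(x, x^2) in bits :", round(mutual_information(xs, ys), 3))
print("MI(x, z) in bits   :", round(mutual_information(xs, zs), 3))
```

In this sketch the correlation between `xs` and `ys` is close to zero although `ys` is fully determined by `xs`, whereas the MI estimate is clearly larger for the dependent pair than for the independent pair. Histogram estimates are biased upwards for small samples, so the independent pair gives a small positive value rather than exactly zero.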

© SQU-2009 ISSN: 1813-419X



II. IMAGE PRE-PROCESSING

The image pre-processing procedure is a very important step in the facial expression recognition task. The aim of the pre-processing phase is to obtain sequences of images which have normalized intensity, are uniform in size and shape, and depict only the face region. The image intensity was normalized using histogram equalization. The face area of an image was detected using the Viola-Jones method [6], based on Haar-like features and the AdaBoost learning algorithm.

The Viola-Jones method is an object detection algorithm providing competitive object detection rates in real time. It was primarily designed for the problem of face detection. The features used by Viola and Jones are derived from pixels selected from rectangular areas imposed over the picture and show high sensitivity to vertical and horizontal lines.

AdaBoost is an adaptive learning algorithm that can be used in conjunction with many other learning algorithms to improve their performance. AdaBoost is adaptive in the sense that subsequent classifiers built iteratively are made to fix instances misclassified by previous classifiers. At each iteration a distribution of weights is updated such that the weights of each incorrectly classified example are increased (or alternatively, the weights of each correctly classified example are decreased), so that the new classifier focuses more on those examples.

The final stage of the pre-processing involved detection of the image which depicted a certain emotion with the maximum level of arousal (emotion intensity). This was done using the minimum mutual information (MI) criterion. For each frame, the mutual information between the current frame and the initial frame was calculated, and the frame with the minimum mutual information was selected as the frame that represents the emotion with the maximum arousal [14].

Fig. 2: Six facial expression images after pre-processing.

III. FEATURE EXTRACTION

A. Higher-order local Auto Correlation

The features are generated using higher-order local autocorrelation. The Nth-order autocorrelation functions, extensions of autocorrelation functions, are defined as:

$$ x(a_1, a_2, \ldots, a_N) = \int f(r)\, f(r + a_1) \cdots f(r + a_N)\, dr \tag{1} $$

where $f(r)$ denotes the intensity at the observing pixel $r$, and $a_1, a_2, \ldots, a_N$ are $N$ displacements. HLAC features [4] are primitive image features based on Eq. (1). Their orders and displacements are arbitrary. However, higher-order features with a large displacement region become extremely numerous. Hence, the original HLAC features are restricted to at most the second order (three-point relations) and to a 3 × 3 displacement region. They are represented by 25 mask patterns with 0, 1 and 2 displacements (Fig. 3). Each mask pattern is scanned over the entire image, and for each possible position, the product of the pixels marked in white is computed.

All the products corresponding to a mask are then summed so as to provide one feature. This operation is performed using the 25 different mask patterns to create the feature vector for each facial image. Each feature value represents the power spectrum of the mask pattern, which corresponds to a basis function of frequency analysis [8]. In a rough comparison with a Fourier transform, the mask size corresponds to the frequency component, and the distribution of the displacements corresponds to the direction component. Since the HLAC features use the information of two-dimensional distributions as well as the directions, they analyze an image more closely.

Fig. 3: 25 mask patterns of the HLAC features (3x3).

Furthermore, we use large mask patterns to support large displacement regions (Fig. 4) and extract the features of low resolutions or low frequencies. Therefore, we use masks of different sizes together and construct multi-resolution features [11].

Fig. 4: An extension of HLAC features.
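
The mask-based computation described in Section III-A can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `hlac_masks` and `hlac_features`, the representation of an image as a list of rows, and the choice to skip border pixels are assumptions. The masks are enumerated as sets of distinct pixel positions containing the reference pixel, with patterns that differ only by a translation counted once; the resulting order is not claimed to match the order in Fig. 3.

```python
from itertools import combinations


def hlac_masks():
    """Enumerate HLAC point patterns of order 0, 1 and 2 inside a 3x3 window.

    Each pattern is a tuple of (dy, dx) offsets that includes the reference
    pixel (0, 0). Patterns equal up to translation are kept only once.
    """
    neighbours = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (dy, dx) != (0, 0)]
    seen, masks = set(), []
    for order in range(3):  # number of displacements: 0, 1 or 2
        for extra in combinations(neighbours, order):
            pts = ((0, 0),) + extra
            # Canonical form: shift the pattern so its top-left corner is (0, 0).
            min_y = min(p[0] for p in pts)
            min_x = min(p[1] for p in pts)
            key = tuple(sorted((y - min_y, x - min_x) for y, x in pts))
            if key not in seen:
                seen.add(key)
                masks.append(pts)
    return masks


def hlac_features(img):
    """Sum, over all interior pixels, the product of the pixels under each mask."""
    h, w = len(img), len(img[0])
    feats = []
    for pts in hlac_masks():
        total = 0
        for y in range(1, h - 1):          # border pixels are skipped (assumption)
            for x in range(1, w - 1):
                prod = 1
                for dy, dx in pts:
                    prod *= img[y + dy][x + dx]
                total += prod
        feats.append(total)
    return feats


if __name__ == "__main__":
    image = [[(3 * y + x) % 4 for x in range(6)] for y in range(6)]  # toy data
    print(len(hlac_masks()), "masks")
    print(hlac_features(image))
```

Under these assumptions the enumeration yields 1 zeroth-order, 4 first-order and 20 second-order patterns, which agrees with the 25 masks stated in the text. A multi-resolution variant in the spirit of Fig. 4 could scale the offsets by a step larger than one pixel, but that extension is not shown here.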


B. Gabor Wavelet Filters

Gabor filters have been applied to various image recognition problems for feature extraction due to their optimal localization properties in both the spatial and the frequency domain. A Gabor filter can be represented by the following equation:

$$ g(x, y) = \frac{1}{2\pi S_x S_y} \exp\left[-\frac{1}{2}\left(\frac{x'^2}{S_x^2} + \frac{y'^2}{S_y^2}\right)\right] \times \exp\left[j\,\frac{2\pi x'}{\lambda}\right] \tag{2} $$

where

$$ x' = x\cos\theta + y\sin\theta \quad \text{and} \quad y' = -x\sin\theta + y\cos\theta \tag{3} $$

where $(x, y)$ is the pixel position in the spatial domain, λ is the wavelength (the reciprocal of frequency) in pixels, θ is the orientation of the Gabor filter, and $S_x$, $S_y$ are the standard deviations along the x and y directions respectively.

The Gabor features are calculated as a convolution of the input image with the Gabor filter bank (Eq. 2, Eq. 3). In this study, Gabor filters with five frequencies and eight orientations (Fig. 5) are used. Other parameters for the design of the filters are based on [12].

Fig. 5: Five frequencies and eight orientations of Gabor filters.

IV. FEATURE SELECTION

Optimal sub-sets of features were selected based on the mutual information (MI) criterion [7]. The mutual information is a measure of the information shared by two random variables, say X and Y, and it is given as:

$$ I(X; Y) = \sum_{x}\sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \tag{4} $$

where $p(x)$ is the probability mass function, defined as $p(x) = \Pr\{X = x\}$, and $p(x, y)$ is the joint probability mass function, defined as $p(x, y) = \Pr\{X = x \text{ and } Y = y\}$. The MI can also be expressed in terms of the entropy:

$$ I(X; Y) = H(X) - H(X \mid Y) \tag{5} $$

where $H(X)$ is the entropy of a random variable X, given as:

$$ H(X) = -\sum_{x} p(x) \log p(x) \tag{6} $$

$H(X \mid Y)$ in Eq. 5 is the conditional entropy, given as:

$$ H(X \mid Y) = -\sum_{x}\sum_{y} p(x, y) \log p(x \mid y) \tag{7} $$

The mutual information feature selection (MIFS) algorithm described in [5] was applied to perform the feature selection. In this approach, starting from the empty set, the best available feature vectors were added one by one to the selected feature set until the size of the set reached the desired value of $N_S$. The sub-set S of feature vectors was selected through a simultaneous maximization of the mutual information between the selected feature vectors in S and the class labels C, and minimization of the mutual information between the selected feature vectors within S (Eq. 8).

$$ I(C; f_i \mid S) = I(C; f_i) - \beta \sum_{s_k \in S} I(f_i; s_k) \tag{8} $$

where $I(C; f_i)$ is the MI between the candidate feature $f_i$ and the class label, $I(f_i; s_k)$ is the MI between the candidate feature and an already selected feature $s_k$, and β weights the redundancy term. As a result, an optimal sub-set of mutually independent and highly representative feature vectors was obtained.

V. CLASSIFICATION

The facial expressions depicted in images were classified using a Naive Bayesian (NB) classifier. The NB classifier is a probabilistic method that has been shown to be effective in many classification problems [9], [10]. It assumes that the presence (or lack) of a particular feature of a class is unrelated to the presence (or lack) of any other feature. The classification decision is made using the following formula:

$$ C = \arg\max_{C_i} \Big\{ P(C_i) \prod_{j} P(f_j \mid C_i) \Big\} \tag{9} $$

where $P(f_j \mid C_i)$ are conditional tables (or conditional densities) of the individual features $f_j$ given the class $C_i$, learned in training using examples. Despite the independence assumption, NB has been shown to have very good classification performance for many real data sets, on par with many more sophisticated classifiers.

VI. EXPERIMENTAL RESULTS

Facial expression image sequences from the Cohn-Kanade database [1] were used to train and test the facial expression recognition system. The Cohn-Kanade database consists of approximately 400 image sequences from 100 subjects. The subjects range in age from 18 to 30 years. Sixty-five percent of the subjects are female; fifteen percent are African-American and three percent are Asian or Latino. Each sequence contained 12-16 frames. The image sequences expressed different stages of an expression development, starting from a low arousal stage, reaching a peak of arousal and then declining. The facial expressions of each subject represented six basic emotions: anger, disgust, fear, happiness, sadness and surprise. The training set size was 216 and the test set size was 172 image sequences. Each test was performed 3 times using randomly selected testing and training sets and an average result was calculated. The training set contained the same number of image sequences for each expression. The subjects represented in the training set were not included in the testing set of images, thus ensuring a person-independent classification of
facial expressions. The tested images were classified using HLAC features and the NB classifier.

For comparison, another popular method, Gabor features [12], was applied to the same classification problem. We used the filter banks designed in [12]. The numbers of scales (5) and orientations (8) of the filter bank were adjusted in feature extraction.

The classification results are presented in Fig. 7 and Fig. 8. The results are shown for different resolutions from 16x16 to 128x128. Table II and Table III show the confusion tables for the different expressions at low resolution (16x16). Fig. 6 illustrates the average correct classification rate for both HLAC and Gabor features. The classification results based on HLAC features are better than those based on Gabor features for low-resolution samples; for high-resolution samples, however, classification based on Gabor features is more accurate than classification based on HLAC features.

Table I shows an example of the CPU times of the feature extraction process at different resolutions. The times are given (CPU speed 2.4 GHz and 2 GB RAM) for two cases: the feature extraction process based on higher-order local autocorrelation and the feature extraction process based on Gabor filters.

TABLE I: Comparison of feature extraction time.

Image Resolution    HLAC Features Time (sec)    Gabor Features Time (sec)
16x16               0.22                        2.02
32x32               0.29                        2.25
64x64               0.27                        3.57
128x128             0.50                        9.65

Fig. 6: A comparison of recognition rates between HLAC features and Gabor filters.

TABLE II: Confusion table for HLAC features (person-independent), values in %.

            Anger    Disgust    Fear    Happy    Sad     Surprise
Anger       56.2     25.6       0       0        18.2    0
Disgust     22.2     55.6       14.8    3.7      3.7     0
Fear        5.6      6.9        52.8    27.8     0       6.9
Happy       0.6      4.5        19.2    75.0     0       0.7
Sad         16.1     6.1        5.1     0        72.7    0
Surprise    0        12.5       4.2     0        0       83.3

TABLE III: Confusion table for Gabor features (person-independent), values in %.

            Anger    Disgust    Fear    Happy    Sad     Surprise
Anger       63.3     20.0       6.7     0        10.0    0
Disgust     30.3     48.5       3.0     6.1      12.1    0
Fear        11.5     9.0        55.1    16.7     2.6     5.1
Happy       9.2      1.0        18.5    71.3     0       0
Sad         20.8     9.2        3.4     0        58.3    8.3
Surprise    3.6      3.5        3.0     0        8.9     81.0

Fig. 7: Recognition rates for six expressions based on Gabor filters

VII. CONCLUSION

A comparison of feature extraction methods for facial expression recognition from image sequences was presented and tested. The method is fully automatic and includes: face detection, maximum arousal detection, feature extraction, selection of optimal features and classification. Compared with Gabor features, the higher-order local autocorrelation (HLAC) features in low-resolution images increased the average percentage of correct classifications from 62.9% to 65.9% and from 65.2% to 67.2% for 16 × 16 and 32 × 32 images respectively. Furthermore, the CPU time decreased from 2.02 s to 0.22 s and from 2.25 s to 0.29 s for 16 × 16 and 32 × 32 images respectively.

The presented feature selection method is based on the mutual information (MI) criterion, and does not assume linear dependencies between data. It can therefore handle arbitrary relations between the pattern coordinates and the different classes. The additional advantages of feature selection based on the MI criterion include computational simplicity and invariance to data transformations. The system not only offers an optimized feature selection, but also automatically finds an optimal frame to represent a given class of emotion.

An overall improvement of the classification results and the discrimination between different facial expressions was observed when using HLAC features. The accuracy for high-resolution images was better with Gabor filters than with HLAC features; however, the Gabor filter feature extraction process was more complex and more time-consuming.

Though the HLAC features can improve the total recognition rate, they are not as good as Gabor filters for some expressions in high-resolution images. We will experiment further to find the reason and combine other methods to solve these problems in future work.
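
The 16 × 16 figures quoted in the conclusion can be cross-checked against Tables I to III with a short calculation. The sketch below assumes that the reported averages are unweighted means of the per-expression rates on the diagonals of the confusion tables; the paper does not state this explicitly, and the variable and function names are illustrative only.

```python
# Diagonal entries (correct classification rates, in %) at 16x16 resolution.
hlac_diag = {"Anger": 56.2, "Disgust": 55.6, "Fear": 52.8,
             "Happy": 75.0, "Sad": 72.7, "Surprise": 83.3}   # Table II
gabor_diag = {"Anger": 63.3, "Disgust": 48.5, "Fear": 55.1,
              "Happy": 71.3, "Sad": 58.3, "Surprise": 81.0}  # Table III


def mean_rate(diagonal):
    """Unweighted mean of the per-expression recognition rates (assumption)."""
    return sum(diagonal.values()) / len(diagonal)


# Feature extraction times (sec) at 16x16 resolution, from Table I.
hlac_time, gabor_time = 0.22, 2.02

print("HLAC mean rate  :", round(mean_rate(hlac_diag), 1))
print("Gabor mean rate :", round(mean_rate(gabor_diag), 1))
print("Gabor/HLAC time :", round(gabor_time / hlac_time, 1))
```

Under this assumption the means come out at 65.9% for HLAC and 62.9% for Gabor, matching the averages quoted for 16 × 16 images, and the Gabor extraction time is roughly nine times the HLAC time at that resolution.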


Fig. 8: Recognition rates for six expressions based on HLAC features

REFERENCES
[1] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, pp. 46-53, 2000.
[2] M. Lyons, J. Budynek, and S. Akamatsu, "Automatic Classification of Single Facial Images," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 21, pp. 1357-1362, 1999.
[3] Z. Zhang, M. Lyons, M. Schuster, and S. Akamatsu, "Comparison Between Geometry-based and Gabor-Wavelet-based Facial Expression Recognition Using Multi-layer Perceptron," Proc. 3rd Int. Conf. Automatic Face and Gesture Recognition, pp. 454-459, 1998.
[4] N. Otsu and T. Kurita, "A new scheme for practical flexible and intelligent vision systems," Proceedings of the IAPR Workshop on Computer Vision, pp. 431-435, 1988.
[5] R. Battiti, "Using Mutual Information for Selecting Features in Supervised Neural Net Learning," IEEE Trans. on Neural Networks, vol. 5, no. 4, pp. 537-550, July 1994.
[6] P. Viola and M. Jones, "Robust Real-time Object Detection," International Journal of Computer Vision, 2004.
[7] N. Kwak and C. Choi, "Input Feature Selection for Classification Problems," IEEE Trans. on Neural Networks, vol. 13, no. 1, pp. 143-159, 2002.
[8] T. Toyoda and O. Hasegawa, "Texture classification using extended higher order local autocorrelation features," Proceedings of the 4th International Workshop on Texture Analysis and Synthesis, pp. 131-136, 2005.
[9] R. O. Duda, P. E. Hart, and D. G. Stork, "Pattern Classification," New York: John Wiley and Sons, Inc., 2001.
[10] I. Rish, "An empirical study of the naive Bayes classifier," IJCAI Workshop on Empirical Methods in Artificial Intelligence, 2001.
[11] F. Liu, et al., "Facial expression recognition using HLAC features and WPCA," Springer Verlag, Heidelberg, Beijing, China, pp. 88-94, 2005.
[12] S. M. Lajevardi and M. Lech, "Averaged Gabor Filter Features for Facial Expression Recognition," DICTA08, Canberra, Australia, 2008.
[13] S. M. Lajevardi and M. Lech, "Facial Expression Recognition Using a Bank of Neural Networks and Logarithmic Gabor Filters," DICTA08, Canberra, Australia, 2008.
[14] S. M. Lajevardi and M. Lech, "Facial Expression Recognition from Image Sequences Using Optimised Feature Selection," IVCNZ08, Christchurch, New Zealand, 2008.
