
INTERNATIONAL CONFERENCE ON COMMUNICATION, COMPUTER AND POWER (ICCCP’09) MUSCAT, FEBRUARY 15-18, 2009

Zernike Moments for Facial Expression Recognition


Seyed Mehdi Lajevardi, Zahir M. Hussain
School of Electrical & Computer Engineering, RMIT University, Melbourne, Australia
seyed.lajevardi@rmit.edu.au, zmhussain@ieee.org

© SQU-2009 ISSN: 1813-419X

Abstract: This study presents a facial expression recognition system that uses an orthogonal invariant moment, the Zernike moment (ZM), as a feature extractor together with an LDA classifier. Changes in illumination, pose, rotation, noise and other conditions are challenging for pattern recognition systems. Simulation results on the Cohn-Kanade database show that higher-order ZM features give good results on images with noise and rotation, although feature extraction is slower than with other methods.

Keywords: Feature extraction, Facial Expression Recognition, Zernike moments

I. INTRODUCTION

During recent years, facial expression recognition has attracted significant interest in the scientific community due to its importance for human-centred interfaces. Applications include border security systems, forensics, virtual reality, computer games, robotics, machine vision, video conferencing, user profiling for customer satisfaction, broadcasting and web services [1],[2],[4],[6].

Various methods have been proposed for automatic recognition of facial expressions in the past several decades. They can be roughly classified into three categories: 1) appearance-based methods, represented by eigenfaces, fisherfaces and other methods using machine-learning techniques such as neural networks and Support Vector Machines (SVM); 2) model-based methods, including graph matching, optical-flow-based methods and others; and 3) hybrids of appearance-based and model-based methods, such as the Active Appearance Model (AAM) [7],[4]. Appearance-based methods are superior to model-based methods in system complexity and performance reproducibility. Further, appearance-based methods allow efficient characterization of a low-dimensional subspace within the overall space of raw image measurements, which deepens our understanding of facial expressions from their manifolds in subspace and provides a statistical framework for the theoretical analysis of system performance.

Most of the recently reported studies classify facial expressions into six basic classes: anger, disgust, fear, happiness, sadness and surprise [2]. Numerous methodologies have been proposed for facial expression analysis from both static images and image sequences [3],[4]. This study presents a computationally efficient approach to feature extraction for facial expression recognition from facial images with noise and rotation.

Moments and functions of moments have been utilized as pattern features in a number of applications [8], [9], [10]. Such features capture global information about the image and do not require closed boundaries, as boundary-based methods such as Fourier descriptors do. Teague [11] suggested orthogonal moments, based on the theory of orthogonal polynomials, to overcome the problems associated with regular moments. The Zernike moments used in the presented approach are a class of such orthogonal moments. They were selected from among the other orthogonal moments because they possess a useful rotation invariance property: rotating the image does not change the magnitudes of its Zernike moments [12]. Hence, they can be used as rotation-invariant features for image representation. The effect of noise is also considered as a second criterion for facial expression recognition.

In this paper, high-order Zernike moment invariants are used for feature extraction from the preprocessed face images, and a multi-class LDA classifier is employed. The process of the facial expression recognition system is shown in Fig. 1.

The remainder of this paper describes the methods, experiments and results. Section II explains the image pre-processing steps. In Section III, the feature extraction method based on Zernike moments is explained. Section IV explains the LDA classifier. Section V contains the experimental results, and Section VI presents the final conclusion.

Fig. 1: Block diagram of the facial recognition system

II. IMAGE PRE-PROCESSING

One of the key problems in building automated systems for the facial expression recognition task is face localization. Many algorithms have been proposed for face localization and detection, based on shape, color information, motion, etc. The aim of the pre-processing phase was to obtain

images which have normalized intensity, uniform size and shape, and depict only a face expressing a certain emotion. The parts of the images that contained only the faces were extracted using the Sobel operator [13]. After the Sobel kernels were applied, the area of the face was found based on blob analysis [9]. In image processing, a blob is defined as a region of connected pixels. The blob analysis algorithm identifies these regions in an image and places them in one of two categories: the foreground (typically pixels with a non-zero value) or the background (pixels with a zero value). The parts representing faces were cut out from the images and their histograms were equalized. Finally, the images were scaled to the same size of 128 × 128 pixels. Fig. 2 shows examples of images after the pre-processing.

Fig. 2: Examples of images after the pre-processing step.

III. FEATURE EXTRACTION

The advantages of considering orthogonal moments are that they are shift, rotation and scale invariant and very robust in the presence of noise. The invariant properties of moments are utilized as pattern-sensitive features in classification and recognition applications [9].

The kernel of Zernike moments is a set of orthogonal Zernike polynomials defined over the polar coordinate space inside a unit circle. The complex Zernike moments of order $n$ with repetition $l$ of a function $f(r, \theta)$ are defined as:

$$A_{nl} = \frac{n+1}{\pi} \int_{0}^{2\pi} \int_{0}^{1} f(r, \theta)\, Z_{nl}^{*}(r, \theta)\, r \, dr \, d\theta \tag{1}$$

where * denotes the complex conjugate and the circular Zernike polynomials in a unit circle are defined as:

$$Z_{nl}(r, \theta) = Z_{nl}(r\cos\theta, r\sin\theta) = R_{nl}(r)\, e^{il\theta} \tag{2}$$

The real-valued radial polynomials are given by:

$$R_{nl}(r) = \sum_{s=0}^{\frac{n-|l|}{2}} (-1)^{s} \frac{(n-s)!}{s!\left(\frac{n+|l|}{2}-s\right)!\left(\frac{n-|l|}{2}-s\right)!}\, r^{n-2s} \tag{3}$$

where $l = -\infty, \ldots, -2, -1, 0, 1, 2, \ldots, \infty$; the integer $n \geq 0$, $|l| \leq n$ and $n - |l|$ is always even.

The discrete approximation of the continuous Zernike integral based on Eq. 1 and Eq. 2 for an image function $I(i, j)$ with spatial dimension $M \times N$ is written as follows:

$$A_{nl} = \frac{n+1}{\pi} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} I(i, j)\, R_{nl}^{*}(r_{ij})\, e^{-il\theta_{ij}} \tag{4}$$

where the discrete polar coordinates

$$r_{ij} = \sqrt{x_j^{2} + y_i^{2}}, \qquad \theta_{ij} = \arctan\left(\frac{y_i}{x_j}\right) \tag{5}$$

are transformed by

$$x_j = c + \frac{j(d-c)}{N-1}, \qquad y_i = d - \frac{i(d-c)}{M-1} \tag{6}$$

where $i = 0, \ldots, M-1$, $j = 0, \ldots, N-1$ and the real numbers $c$ and $d$ take values according to whether the image function is mapped outside or inside a unit circle. In this study, we set $c$ and $d$ to map the image function inside the unit circle. Fig. 3 shows an example of feature extraction from a face image. The list of the Zernike moments of the first 10 orders is given in Table I.

Fig. 3: Example of ZM for feature extraction with different orders and repetitions

TABLE I: The first 10 orders of Zernike moments

Order   Dimensionality   Zernike moments
0       1                A0,0
1       2                A1,1
2       4                A2,0, A2,2
3       6                A3,1, A3,3
4       9                A4,0, A4,2, A4,4
5       12               A5,1, A5,3, A5,5
6       16               A6,0, A6,2, A6,4, A6,6
7       20               A7,1, A7,3, A7,5, A7,7
8       25               A8,0, A8,2, A8,4, A8,6, A8,8
9       30               A9,1, A9,3, A9,5, A9,7, A9,9
10      36               A10,0, A10,2, A10,4, A10,6, A10,8, A10,10

IV. MULTI-CLASS LDA CLASSIFIER

A linear classifier based on discriminant analysis is used to classify the six different expressions. A natural extension of the Fisher linear discriminant that deals with more than two classes exists [8], which uses multiple discriminant analysis. The projection is from a high-dimensional space to a low-dimensional space, and the transformation sought is the one that maximizes the ratio of the inter-class scatter to the intra-class scatter. The maximization should be done among several competing classes. The intra-class scatter matrix is defined as:

$$\hat{\Sigma}_{\omega} = S_1 + \cdots + S_n = \sum_{i=1}^{n} \sum_{x \in c_i} (x - \bar{x}_i)(x - \bar{x}_i)^{T} \tag{7}$$

The inter-class scatter matrix is given by:

$$\hat{\Sigma}_{b} = \sum_{i=1}^{n} m_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^{T} \tag{8}$$

Here $n$ is the number of classes, $m_i$ is the number of training samples in class $i$, $\bar{x}_i$ is the mean of class $i$ and $\bar{x}$ is the total mean vector given by:

$$\bar{x} = \frac{1}{m} \sum_{i=1}^{n} m_i \bar{x}_i \tag{9}$$

where $m = \sum_{i} m_i$ is the total number of training samples.
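As a rough illustration of Eqs. (7)-(9), the two scatter matrices can be computed with NumPy as sketched below. The function name `scatter_matrices` and the row-per-sample layout of `X` are assumptions made for this example; the paper does not give an implementation.

```python
import numpy as np

def scatter_matrices(X, y):
    """Intra-class and inter-class scatter matrices of Eqs. (7)-(9).

    X: (m, d) array, one feature vector per row (layout assumed here).
    y: (m,) array of class labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    total_mean = X.mean(axis=0)              # Eq. (9): sample-weighted mean of class means
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for label in np.unique(y):
        X_c = X[y == label]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        S_w += centered.T @ centered         # Eq. (7): one S_i per class
        diff = (mean_c - total_mean)[:, None]
        S_b += len(X_c) * (diff @ diff.T)    # Eq. (8): weighted by class size m_i
    return S_w, S_b, total_mean
```

A quick sanity check on such a sketch is that the two matrices add up to the total scatter of the data about the total mean.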


After obtaining $\hat{\Sigma}_{\omega}$ and $\hat{\Sigma}_{b}$, based on Fisher's criterion the linear transformation $\Phi$ can be obtained by solving the generalized eigenvalue problem:

$$\hat{\Sigma}_{b} \Phi = \lambda \hat{\Sigma}_{\omega} \Phi \tag{10}$$

Once the transformation $\Phi$ is given, the classification can be performed in the transformed space based on some distance measure $d$, such as the Euclidean distance. The new instance, $x_{new}$, is classified to

$$C_{new} = \arg\min_{k}\, d(x_{new}\Phi, \bar{x}_k \Phi) \tag{11}$$

where $\bar{x}_k$ is the centroid of the $k$-th class.

Fig. 5: Face images for different orientations.
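The projection of Eq. (10) and the nearest-centroid rule of Eq. (11) might be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the names `fit_lda` and `classify`, the small ridge term `reg` added to the intra-class scatter, and the Cholesky-based solution of the generalized eigenproblem are all assumptions of this example.

```python
import numpy as np

def fit_lda(X, y, n_components, reg=1e-6):
    """Solve S_b Phi = lambda S_w Phi (Eq. 10); return Phi and projected centroids."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    d = X.shape[1]
    total_mean = X.mean(axis=0)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c, mean_c in zip(classes, centroids):
        centered = X[y == c] - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - total_mean)[:, None]
        S_b += np.sum(y == c) * (diff @ diff.T)
    # Assumption: a small ridge keeps S_w positive definite for the Cholesky step.
    S_w += reg * np.eye(d)
    # Whitening with S_w = L L^T turns Eq. (10) into an ordinary symmetric eigenproblem.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w))
    eigvals, V = np.linalg.eigh(L_inv @ S_b @ L_inv.T)   # eigenvalues in ascending order
    Phi = L_inv.T @ V[:, ::-1][:, :n_components]         # keep the largest eigenvalues
    return Phi, classes, centroids @ Phi

def classify(x_new, Phi, classes, projected_centroids):
    """Eq. (11): nearest class centroid in the projected space, Euclidean distance."""
    z = np.asarray(x_new, dtype=float) @ Phi
    distances = np.linalg.norm(projected_centroids - z, axis=1)
    return classes[np.argmin(distances)]
```

With six expression classes at most five eigenvalues of Eq. (10) are non-zero, so `n_components` would normally not exceed five in this setting.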


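The feature vectors passed to the classifier are the Zernike moment magnitudes of Section III. The sketch below shows one way Eqs. (3)-(6) could be evaluated with NumPy. Several details are assumptions rather than values taken from the paper: the function names, the choice c = -1/sqrt(2) and d = 1/sqrt(2) to place the whole image inside the unit circle, the use of `arctan2` as a four-quadrant form of Eq. (5), and the restriction to repetitions l >= 0 as listed in Table I.

```python
import numpy as np
from math import factorial

def radial_poly(n, l, r):
    """Real-valued radial polynomial R_nl(r) of Eq. (3)."""
    l = abs(l)
    R = np.zeros_like(r, dtype=float)
    for s in range((n - l) // 2 + 1):
        coeff = ((-1) ** s * factorial(n - s)
                 / (factorial(s)
                    * factorial((n + l) // 2 - s)
                    * factorial((n - l) // 2 - s)))
        R += coeff * r ** (n - 2 * s)
    return R

def zernike_magnitudes(image, max_order):
    """|A_nl| for 0 <= n <= max_order, 0 <= l <= n, n - l even (Eqs. 4-6)."""
    image = np.asarray(image, dtype=float)
    M, N = image.shape
    # Assumed mapping constants for Eq. (6): image square inscribed in the unit circle.
    c, d = -1 / np.sqrt(2), 1 / np.sqrt(2)
    x = c + np.arange(N) * (d - c) / (N - 1)
    y = d - np.arange(M) * (d - c) / (M - 1)
    xx, yy = np.meshgrid(x, y)           # xx[i, j] = x_j, yy[i, j] = y_i
    r = np.hypot(xx, yy)                 # Eq. (5), radius
    theta = np.arctan2(yy, xx)           # Eq. (5), four-quadrant angle
    features = []
    for n in range(max_order + 1):
        for l in range(n % 2, n + 1, 2):
            kernel = radial_poly(n, l, r) * np.exp(-1j * l * theta)
            A_nl = (n + 1) / np.pi * np.sum(image * kernel)   # Eq. (4)
            features.append(abs(A_nl))
    return np.array(features)
```

For `max_order=10` this returns 36 magnitudes, matching the dimensionality listed in Table I. A rotation by a multiple of 90 degrees (for example `np.rot90`) on a square image leaves the magnitudes unchanged up to rounding, whereas arbitrary angles involve interpolation and are only approximately invariant.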
V. EXPERIMENTAL RESULTS

For the experimental studies, we considered static images from the Cohn-Kanade dataset [5]. A total of 359 face images from 100 subjects were selected. The images depicted six different facial expressions: anger, disgust, fear, happiness, sadness and surprise. In the training phase 180 images were used (30 for each expression) and in the testing phase 179 images were classified. The images used in the testing set were not included in the training set. The subjects represented in the training set were not included in the testing set of images, thus ensuring a person-independent classification of facial expressions. Because of the limited number of samples, each test was performed 3 times using randomly selected testing and training sets, and the average results were calculated. For the testing set, impulsive noise was added and the images were also rotated to different orientations. Fig. 4 and Fig. 5 show examples of images with different noise levels and orientations.

Fig. 4: Face images with different impulsive noise.

Table II shows the various orders considered in the experiments, the average accuracy and the computation time. The recognition rate generally improves as the order of the Zernike moments increases. The maximum rate, 73.2%, is reached at ZM order 10; beyond that, the recognition rate decreases as the order increases. We therefore chose ZM order 10 for the other experiments. Fig. 6 shows the error rate (number of misclassified samples / total number of samples) for noisy face images and indicates that noise has little effect on correct classification when Zernike moments are used as the feature extractor.

TABLE II: Percentage of correct classifications for different ZM orders.

Order   Accuracy (%)   Time (sec)
3       32.4           0.13
4       42.5           0.19
5       50.3           0.27
6       65.4           0.38
7       63.7           0.51
8       65.4           0.68
9       69.3           0.88
10      73.2           1.10
11      72.6           1.37
12      67.0           1.68

Fig. 6: Error rate for noisy image based on 10th ZM order

Table III shows the recognition rate for different rotations of facial expressions. Under rotation, each Zernike moment acquires only a phase shift, so the magnitudes of the moments of a rotated image remain identical to those before rotation.

For comparison, we used a bank of Gabor filters as the feature extractor [13],[14],[15]. Table V shows the results based on 40 Gabor filters (5 frequencies and 8 orientations). The results show that approximately the same results are obtained with ZM and Gabor filter features for images without noise and rotation, but Gabor filters are not rotation and noise invariant.

VI. CONCLUSION

In this research, we have evaluated one kind of feature extraction method (ZM) for facial expression recognition. After preprocessing, the high-order Zernike moments are used as feature vectors, which are fed to the classifier stage. We have


used a multi-class LDA classifier. Experimental results on the Cohn-Kanade database indicate that approximately the same results are obtained for different noise levels and rotations of the face images when using ZM, but the feature extraction is complex and time consuming. In future work we plan to optimize the feature extraction process by combining it with other methods.

TABLE III: Percentage of correct classifications for different orientations

Order   30°    45°    60°    90°
3       32.8   31.8   32.4   32.4
4       43.6   46.4   44.1   42.5
5       52.0   52.5   51.4   50.3
6       65.4   64.3   64.3   65.4
7       62.6   61.5   62.0   63.7
8       63.7   63.1   62.6   65.4
9       68.7   66.5   67.6   69.3
10      67.0   72.1   65.9   73.2
11      68.7   72.1   68.2   72.6
12      59.2   69.8   58.1   67.0

Order   120°   135°   150°   180°
3       31.3   31.8   32.4   32.4
4       44.7   46.4   44.1   42.5
5       53.6   52.5   51.4   50.3
6       64.3   64.3   64.3   65.4
7       63.7   61.5   62.0   63.7
8       62.7   63.1   62.7   65.4
9       67.0   66.5   67.6   69.3
10      68.2   72.1   66.0   73.2
11      67.6   72.1   68.2   72.6
12      57.0   69.8   58.1   67.0

TABLE IV: Confusion table based on 10th-order Zernike moments

          A      D      F      H      S      Su
A         69.2   15.4   0      0      15.4   0
D         22.2   70     0      0      7.8    0
F         0      8.3    83.3   0      8.3    0
H         5.8    1.9    19.2   71     2.1    0
S         17.9   9.1    0      0      73     0
Su        4.2    8.3    4.2    2.1    8.3    72.9
Average   73.2
A: anger  D: disgust  F: fear  H: happy  S: sad  Su: surprise

TABLE V: Percentage of correct classifications using Gabor filters

          A      D      F      H      S      Su
A         61.5   13.1   15.3   0      10.1   0
D         11.1   77.8   11.1   0      0      0
F         12.5   4.2    54.5   12.5   8.3    8.0
H         0      1.9    2.0    96.1   0      0
S         12.1   3.2    9.0    0      75.7   0
Su        2.1    2.1    8.3    0      0      87.5
Average   75.5
A: anger  D: disgust  F: fear  H: happy  S: sad  Su: surprise

REFERENCES

[1] F. Bourel, C. Chibelushi, A. Low, "Recognition of facial expressions in the presence of occlusion," Proceedings of the 12th BMVC, vol. 1, pp. 213-222, Manchester, 2001.
[2] P. Ekman, "Facial Expressions of Emotion: an Old Controversy and New Findings," Philosophical Transactions of the Royal Society, B 335: 63-69, London, 1992.
[3] A. Essa, P. Pentland, "Coding, Analysis, Interpretation, and Recognition of Facial Expressions," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, pp. 757-763, 1997.
[4] B. Fasel, J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, pp. 259-275, 2003.
[5] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), pp. 46-53, Grenoble, France, 2000.
[6] N. Kwak, C. Choi, "Input Feature Selection for Classification Problems," IEEE Trans. on Neural Networks, vol. 13, no. 1, pp. 143-159, Jan. 2002.
[7] M. Pantic and L.J.M. Rothkrantz, "Automatic analysis of facial expressions: The state of the art," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 12, pp. 1424-1445, 2000.
[8] Hyung Shin Kim, Heung-Kyu Lee, "Invariant image watermark using Zernike moments," Circuits and Systems for Video Technology, IEEE Trans., pp. 766-775, 2003.
[9] Kanan, H. R., K. Faez, et al., "Face recognition using adaptively weighted patch PZM array from a single exemplar image per person," Pattern Recognition 41(12): 3799-3812, 2008.
[10] Zhi, R. and Q. Ruan, "A comparative study on region-based moments for facial expression recognition," Institute of Electrical and Electronics Engineers Computer Society, pp. 600-604, 2008.
[11] M. Teague, "Image analysis via the general theory of moments," J. Opt. Soc. Amer., vol. 70, no. 8, pp. 920-930, Aug. 1980.
[12] Khotanzad, A. and Hong, Y.H., "Invariant Image Recognition by Zernike Moments," IEEE Trans. on PAMI, 12(5): 489-497, 1990.
[13] Lajevardi S.M., Lech M., "Averaged Gabor Filter Features for Facial Expression Recognition," Conference Proceedings, DICTA08, Canberra, Australia, 2008.
[14] Lajevardi S.M., Lech M., "Facial Expression Recognition Using a Bank of Neural Networks and Logarithmic Gabor Filters," Conference Proceedings, DICTA08, Canberra, Australia, 2008.
[15] Lajevardi S.M., Lech M., "Facial Expression Recognition from Image Sequences Using Optimised Feature Selection," IVCNZ08, Christchurch, New Zealand, 2008.
