
LOCAL ZERNIKE MOMENTS: A NEW REPRESENTATION FOR FACE RECOGNITION

Evangelos Sarıyanidi^a, Volkan Dağlı^b, Salih Cihan Tek^c, Birkan Tunç^b, Muhittin Gökmen^c

^a Department of Control Engineering, Istanbul Technical University, Istanbul. sariyanidi@itu.edu.tr
^b Informatics Institute, Istanbul Technical University, Istanbul. {tuncbi, daglivo}@itu.edu.tr
^c Department of Computer Engineering, Istanbul Technical University, Istanbul. {tek, gokmen}@itu.edu.tr

ABSTRACT

In this paper, we propose a new image representation called Local Zernike Moments (LZM) for face recognition. In recent years, local image representations such as Gabor and Local Binary Patterns (LBP) have attracted great interest due to their success in handling the difficulties of face recognition. In this study, we aim to develop an alternative representation that further improves face recognition performance. We achieve this by utilizing Zernike moments, which have been used successfully as shape descriptors for character recognition. We modify global Zernike moments to obtain a local representation by computing the moments at every pixel of a face image from its local neighborhood, thus decomposing the image into a set of images, the moment components, which capture the micro structure around each pixel. Our experiments on the FERET face database reveal the superior performance of LZM over the Gabor and LBP representations.

Index Terms— Face recognition, feature extraction, Zernike moments, regional histograms

1. INTRODUCTION

Robust 2D face recognition is one of the most active research areas in computer vision. Despite the impressive advances in recent years, it remains a challenging problem, mainly because of the range of possible variations in the appearance of a face that can result from illumination conditions, viewpoint, and facial expressions. Robust face recognition requires robust features with the ability to deal with such variations. Most face recognition approaches adopt feature extraction methods involving local shape descriptors such as Gabor filtering-based features [1, 2] and LBP [3] to handle some of these variations. In this study, we aim to develop a simple yet powerful representation as an alternative to these representations by utilizing moments.

Image moments are regarded as well-established shape descriptors and have frequently been used for content-based retrieval and pattern recognition tasks. One of the most inspiring early studies illustrating the potential of image moments is that of Hu [4], which presents moment invariants that enable successful recognition under scaling, translation, and rotation.

In [5], the use of Zernike moments (ZMs) for character recognition yielded very promising recognition rates. In [6] and [7], the authors proposed methods to extract features from face images through ZMs and discussed the rotation invariance of ZMs. In [8], pseudo-ZM invariants are used to extract feature vectors from faces. Another ZM-based approach that considers spatial locality by partitioning input images is presented in [9]. This approach involves dividing the face images into non-overlapping subregions and then extracting features from each of these subregions using pseudo-ZMs.

In this paper, we present a novel representation based on local Zernike moments (LZMs) and demonstrate its superiority for face recognition. The moments are calculated at every pixel by considering its local neighborhood. A complex moment image of the same size as the input image is obtained for each moment component, resulting in a set of images, as shown in Figure 1. Then, each moment image is divided into non-overlapping subregions, and phase-magnitude histograms are extracted from the complex output of each moment at each subregion. The final face representation is obtained by concatenating the extracted phase-magnitude histograms.

2. GLOBAL ZERNIKE MOMENTS

The ZMs of an image are defined as the projection of the image onto an orthogonal set of polynomials called Zernike polynomials. The Zernike polynomials are defined as

    V_nm(ρ, θ) = R_nm(ρ) e^{jmθ},    (1)

where n is the order of the polynomial and m is the repetition, such that |m| ≤ n and n − |m| is even. The radial polynomials R_nm are given as

    R_nm(ρ) = Σ_{s=0}^{(n−|m|)/2} (−1)^s ρ^{n−2s} (n − s)! / [ s! ((n+|m|)/2 − s)! ((n−|m|)/2 − s)! ].    (2)

The ZMs of a digital image f(i, j) are calculated as

    Z_nm = ((n+1)/π) Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} f(i, j) V*_nm(ρ_ij, θ_ij) Δx_i Δy_j,    (3)

where x_i and y_j are the image coordinates mapped to the range [−1, 1], ρ_ij = √(x_i² + y_j²), θ_ij = tan⁻¹(y_j / x_i), and Δx_i = Δy_j = 2/N. The terms (n+1)/π, Δx_i, and Δy_j are constant and will be ignored in the remainder of this paper for clarity.

The global form of the ZMs expressed in (3) was used for face recognition in [7]. The authors utilized both the phase and magnitude information. However, the results reported on the FERET face database [10] are not promising enough to demonstrate the real potential of ZMs.

To further investigate the potential of ZMs, we have divided the whole image into non-overlapping subregions and calculated the ZMs of each subregion separately.

This work is partially supported by The Scientific and Technological Research Council of Turkey with grant number 109E268. The third and fourth authors are partially supported by ITU-BAP with grants 34120 and 34385, respectively.

978-1-4673-2533-2/12/$26.00 ©2012 IEEE 585 ICIP 2012
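The Zernike basis in Eqs. (1)-(2) can be evaluated directly from the factorial form. A minimal NumPy sketch (function names are ours, not from the paper):

```python
# Zernike radial polynomial R_nm and polynomial V_nm, per Eqs. (1)-(2).
import numpy as np
from math import factorial

def radial_poly(n, m, rho):
    """R_nm(rho) from Eq. (2); requires |m| <= n and n - |m| even."""
    m = abs(m)
    assert (n - m) % 2 == 0
    total = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s)
                * factorial((n + m) // 2 - s)
                * factorial((n - m) // 2 - s)))
        total += c * rho ** (n - 2 * s)
    return total

def zernike_poly(n, m, rho, theta):
    """V_nm(rho, theta) = R_nm(rho) * exp(j*m*theta), from Eq. (1)."""
    return radial_poly(n, m, rho) * np.exp(1j * m * theta)
```

For instance, R_20(ρ) = 2ρ² − 1 and R_11(ρ) = ρ, which the sum above reproduces term by term.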


Recognition rates achieved by this approach on the FaFb subset of the FERET face database are given in Table 1 for various numbers of subregions. The results indicate that such a partitioning increases the performance up to a point.

# subregions   1²     3²     5²     7²     9²     11²    13²
Performance    59.0   71.4   78.2   78.3   79.5   78.5   76.3

Table 1. Recognition rates using global ZMs for different levels of partitioning.
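The partitioned global-ZM experiment behind Table 1 can be sketched as follows: each square subregion is mapped onto the unit disk and its moment is computed per Eq. (3), with the constant terms dropped as in the text. This is a hypothetical illustration restricted to the n = m case, for which R_nn(ρ) = ρⁿ; all names are ours:

```python
# Global ZM of a block, then |Z_nm| features over a grid x grid partition.
import numpy as np

def zernike_moment(block, n, m):
    """Z_nm of one square block per Eq. (3), constants dropped."""
    N = block.shape[0]
    coords = (2 * np.arange(N) + 1 - N) / N        # pixel centers in (-1, 1)
    x, y = np.meshgrid(coords, coords)
    rho = np.sqrt(x**2 + y**2)
    theta = np.arctan2(y, x)
    V = rho**n * np.exp(1j * m * theta)            # R_nn(rho) = rho**n case
    mask = rho <= 1.0                              # stay on the unit disk
    return np.sum(block * np.conj(V) * mask)

def partitioned_features(img, grid, n, m):
    """Concatenate |Z_nm| over a grid x grid partition of the image."""
    h, w = img.shape[0] // grid, img.shape[1] // grid
    feats = [abs(zernike_moment(img[i*h:(i+1)*h, j*w:(j+1)*w], n, m))
             for i in range(grid) for j in range(grid)]
    return np.array(feats)
```

On a constant block the m ≠ 0 moments vanish by symmetry, which is a quick sanity check on the sampling.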

3. LOCAL MOMENT TRANSFORMATION

Global moment descriptors must be applied to images containing explicit shape features, such as characters or fingerprints, to yield a useful description. In this study, we define a novel transformation to bring out the shape characteristics of raw intensity images. The proposed transformation gives a new description of the image by extracting the shape characteristics out of the texture information contained in it. The resulting moment components prove to be very useful for capturing the micropatterns of the face.

Fig. 1. LZM representation of different face images. Images of two persons, under two illumination conditions, are shown. The images in each column stand for: the input image, |Z^3_11|, |Z^3_22|, |Z^3_31|, ∠Z^3_11, ∠Z^3_22, ∠Z^3_31.

Real part:            Imaginary part:
−0.1  −0.1  −0.1      −0.1   0.0   0.1
 0.0   0.0   0.0      −0.1   0.0   0.1
 0.1   0.1   0.1      −0.1   0.0   0.1

Fig. 2. The moment-based filter V^3_11. The real and imaginary parts are given, respectively.

3.1. Local computation of ZMs

The key difference between our approach and the global approaches is that we calculate moments around each pixel to obtain a new image representation through a transformation referred to as the LZM transformation.

We derive moment-based operators V^k_nm, which are k × k kernels calculated using the equation V^k_nm(i, j) = V_nm(ρ_ij, θ_ij). These operators are similar to the 2D convolution kernels used for image filtering. The LZM transformation can be defined by these kernels as follows:

    Z^k_nm(i, j) = Σ_{p,q=−(k−1)/2}^{(k−1)/2} f(i − p, j − q) V^k_nm(p, q).    (4)

This transformation provides a rich image representation by successfully exposing the intensity variations around each pixel. The components of the representation (i.e., the complex images corresponding to different moment orders) prove to be very robust to illumination variations, as shown in Figure 1. The real and imaginary parts of an example V^k_nm filter for size k = 3, n = m = 1 are shown in Figure 2.

The number of components depends on the moment order n. We impose an additional constraint on the second parameter m, namely m ≠ 0, since the imaginary part of the kernel V^k_nm becomes zero when m = 0, which is undesirable when the outcome of (4) is used to extract the phase-magnitude histograms explained in Section 4.2. The number of effective moment components can be calculated through the following expression:

    K(n) = n(n+2)/4 if n is even,  (n+1)²/4 if n is odd.    (5)

3.2. Face description from moment components Z_nm

The advantage of dividing images into subregions has been expressed in many face recognition studies. In this study, we follow the same approach and divide the moment components into subregions after the proposed LZM transformation is applied. A two-step partitioning is performed. First, we divide the image into N × N equal-sized blocks beginning from the top-left of the image. Then, we divide the image into (N − 1) × (N − 1) blocks of the same size as the previous ones, with the grid shifted by half a block size from the top-left. Hence, we have N² + (N − 1)² subregions for each moment component. To further increase the robustness against illumination variations, each subregion is z-normalized.

We use a histogram representation to combine the information gathered from the different moment components. This representation has been shown to be very successful for face recognition [2, 3]. Figure 3 illustrates the whole process.

4. EXPERIMENTAL RESULTS

We have carried out extensive experiments to assess the performance of our approach. The complex output of the ZM images is represented using phase-magnitude histograms. The performance of the method is substantially increased by applying the LZM transformation twice.

4.1. Experimental Setup

In our method, several parameters need to be determined: the size of the ZM-based operator k, the moment order n, the grid size N × N, and the number of histogram bins b. After extensive experiments, we found that good performance is obtained when setting k = 7, n = 4, N = 10, and b = 24. When the moment order n is set to 4, the number of resulting moments K is 6, corresponding to the moment components Z^k_11, Z^k_22, Z^k_31, Z^k_33, Z^k_42, Z^k_44. The length of the final feature vector is (N² + (N − 1)²) × b × K = 26064.

All tests are performed on the FERET face database [10] using the standard FERET evaluation protocol: a gallery set of 1196 images of 1196 subjects and four probe sets: FaFb (1195 images), FaFc (194 images), Dup1 (722 images), and Dup2 (234 images).
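Eq. (5) and the stated feature-vector length can be cross-checked with a few lines of arithmetic. The enumeration of (n, m) pairs reflects the constraints |m| ≤ n, n − |m| even, and m ≠ 0 discussed above; the helper names are ours:

```python
# Cross-check of Eq. (5) and the 26064-dimensional feature length.
def lzm_components(n):
    """(n', m) pairs kept by the LZM transform: m <= n', n'-m even, m != 0."""
    return [(p, m) for p in range(1, n + 1)
            for m in range(1, p + 1) if (p - m) % 2 == 0]

def K(n):
    """Number of effective moment components, Eq. (5)."""
    return n * (n + 2) // 4 if n % 2 == 0 else (n + 1) ** 2 // 4
```

With n = 4 this enumerates exactly Z_11, Z_22, Z_31, Z_33, Z_42, Z_44 (six components), and with N = 10, b = 24 the length (N² + (N − 1)²) × b × K works out to 181 × 24 × 6 = 26064, as stated in the text.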
Fig. 3. An illustration of the proposed method. Only the magnitude images are shown. During histogram construction, both magnitude and phase values are considered.

The input images are normalized to have zero mean and unit variance after they are cropped, and their sizes are fixed to 130 × 150. The final feature vectors are compared using the L1 distance.

4.2. Local phase-magnitude histograms (H-LZM)

Several types of histograms can be employed to utilize the output of the LZM transformation, including magnitude histograms (MHs) and phase-magnitude histograms (PMHs). Our experiments have led us to conclude that PMHs perform substantially better than MHs. The PMH of a moment component is extracted as follows: the angle interval [0, 2π] is divided into b bins. Then, the magnitude value |Z^k_nm(i, j)| at each pixel location (i, j) is added to the bin corresponding to the phase value ∠Z^k_nm(i, j) of the same pixel location.

The PMHs of the moment components are extracted at each subregion, as illustrated in Figure 3. Each local PMH is normalized to have unit norm, and all normalized local PMHs are concatenated to form the final feature vector. Most face recognition methods that involve partitioning use a weight map similar to the one in [3] to assign a different weight to each subregion. In this study, we adopt a similar weighting procedure: we use the weight map proposed in [3], simply interpolating it to fit our grid size. Although it is possible to determine the optimal weights for each subregion of each moment component, this is left for future work.

Furthermore, we perform a different, pixel-wise weighting inside each subregion when computing the PMHs. We use a Gaussian weighting kernel of the same size as a subregion, with σ = 8, and multiply the magnitude of each pixel by the corresponding kernel weight before adding it to the relevant histogram bin.

The performance of the proposed method is given in Table 2 under the abbreviation H-LZM. Although the results are acceptable, they do not show the full potential of LZMs. The reason for the relatively low performance is discussed in the next section, where a solution to increase the performance is also described.

4.3. Cascaded LZM transformation (H-LZM²)

In general, image moments perform better on images containing explicit shape information, such as fingerprints or character images. As indicated before, the proposed LZM transformation gives a description of shape at a micro scale around each pixel. In the previous experiments, we obtained statistics from this new description by calculating PMHs. Since the resulting description exhibits shape information, it is possible to obtain more useful statistics by applying another feature extraction step before computing the histograms.

Motivated by this property, we modify the method explained so far by replacing the input intensity images in Figure 3 with LZM-transformed images. In other words, we apply the LZM transformation twice: first to bring out the local shape characteristics, and second to describe the local shape statistics of the transformed images. The moment components obtained by the cascaded LZM transformation are shown in Figure 4, while the block diagram of the cascaded transform is shown in Figure 5.

Fig. 4. Moment components obtained by the repeated LZM transformation. The blocks on the left and right show the outputs of the first and second LZM transformations, respectively.

Since the output of the first stage is complex, one can use either its real part, its imaginary part, or both as the input to the next step. One can also compute the magnitude or the angle and use them instead.
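The two-stage cascade of Section 4.3 can be sketched as two chained LZM stages. The function below takes the single-stage transform as a parameter and follows the text's choice of feeding the real and imaginary parts of the first stage separately into the second; this is a sketch of the idea, not the authors' implementation:

```python
# Cascaded LZM (H-LZM^2): stage 1 exposes micro shape, stage 2 describes it.
# Default kernel sizes follow the text: k1 = 5 and k2 = 7.
def cascaded_lzm(img, lzm_transform, components, k1=5, k2=7):
    """Return all second-stage moment images for the given (n, m) pairs."""
    outputs = []
    for n1, m1 in components:               # first stage
        z1 = lzm_transform(img, n1, m1, k1)
        for part in (z1.real, z1.imag):     # real/imag parts fed separately
            for n2, m2 in components:       # second stage
                outputs.append(lzm_transform(part, n2, m2, k2))
    return outputs
```

With the six components of order n = 4 in both stages, this yields 6 × 2 × 6 = 72 second-stage moment images, consistent with the reported feature length (N² + (N − 1)²) × b × K₁ × K₂ × 2 = 181 × 24 × 72 = 312768.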

Fig. 5. An illustration of the cascaded approach. Magnitude images are shown after the second transformation. During histogram construction, both magnitude and phase values are considered.
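The phase-magnitude histogram of Section 4.2 can be sketched as follows: each pixel's magnitude is accumulated into the bin indexed by its phase, optionally scaled by the Gaussian subregion weight (σ = 8 in the text). The unit normalization matches the paper; the function name and bin-edge details are our assumptions:

```python
# Phase-magnitude histogram (PMH) of one complex moment subregion.
import numpy as np

def phase_magnitude_histogram(z, b=24, weights=None):
    """PMH of subregion z with b phase bins over [0, 2*pi), unit-normalized."""
    phase = np.angle(z) % (2 * np.pi)                 # map phases into [0, 2pi)
    bins = np.minimum((phase / (2 * np.pi) * b).astype(int), b - 1)
    mag = np.abs(z) if weights is None else np.abs(z) * weights
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=b)
    return hist / (np.linalg.norm(hist) + 1e-12)      # unit norm, as in text
```

Passing a Gaussian array as `weights` reproduces the pixel-wise weighting step; concatenating these histograms over all subregions and components gives the final feature vector.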

We have found that using the imaginary or real parts directly leads to better results and therefore report the results only for these inputs in Table 2, where H-LZM²-I, H-LZM²-R, and H-LZM²-IR indicate the results with imaginary, real, and combined inputs, respectively. As expected, preprocessing the input image to extract the shape characteristics increases the performance substantially. The moment orders used in this approach are n₁ = 4 for the first stage and n₂ = 4 for the second stage. The kernel sizes are k₁ = 5 and k₂ = 7, respectively. The length of the final feature vector is (N² + (N − 1)²) × b × K₁ × K₂ × 2 = 312768 when combining the real and imaginary parts.

Method                 FaFb   FaFc   Dup1   Dup2
H-LZM                  95.0   87.1   73.8   70.9
H-LZM (Weighted)       97.5   95.4   78.5   76.1
H-LZM²-I               96.2   97.9   79.6   76.9
H-LZM²-I (Weighted)    98.7   99.5   83.9   82.5
H-LZM²-R               96.2   96.9   77.4   73.5
H-LZM²-R (Weighted)    98.7   99.0   83.2   81.2
H-LZM²-IR              96.3   97.9   79.9   76.5
H-LZM²-IR (Weighted)   98.7   99.5   84.8   82.5

Table 2. Recognition rates with different settings.

Finally, we compare our best results with some well-known face recognition approaches in Table 3. The results indicate that the proposed LZM transformation outperforms similar methods based on the LBP and Gabor representations.

Method                 FaFb   FaFc   Dup1   Dup2
LBP [3]                93.0   51.0   61.0   50.0
LBP (Weighted) [3]     97.0   79.0   66.0   64.0
HGPP [2]               97.6   98.9   77.7   76.1
HGPP (Weighted) [2]    97.5   99.5   79.5   77.8
H-LZM²-IR              96.3   97.9   79.9   76.5
H-LZM²-IR (Weighted)   98.7   99.5   84.8   82.5

Table 3. Recognition rates compared to other methods.

5. CONCLUDING REMARKS

In this paper, we have introduced a new representation called LZM for face recognition, obtained by computing Zernike moments at every pixel of a face image using the local neighborhood of each pixel. In this representation, a face image is decomposed into a set of complex images corresponding to the moment components, each of which exposes different characteristics of the face image by capturing the micro structure around each pixel. We have demonstrated that the phase-magnitude histograms of LZMs are very useful for building successful face recognition schemes. With the help of the cascaded LZM transformation and two overlapping grids, a face recognition scheme that outperforms the Gabor and LBP representations is obtained without any training. We expect that LZM will be used successfully in a large variety of problems, including feature extraction, face/object detection, and object recognition, in addition to face recognition.

6. REFERENCES

[1] Laurenz Wiskott, Jean-Marc Fellous, Norbert Krüger, and Christoph von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 775-779, 1997.
[2] Baochang Zhang, Shiguang Shan, Xilin Chen, and Wen Gao, "Histogram of Gabor Phase Patterns (HGPP): A novel object representation approach for face recognition," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 57-68, 2007.
[3] Timo Ahonen, Abdenour Hadid, and Matti Pietikäinen, "Face recognition with local binary patterns," in Proc. 8th Eur. Conf. Computer Vision, 2004, pp. 469-481.
[4] Ming-Kuei Hu, "Visual pattern recognition by moment invariants," IRE Transactions on Information Theory, vol. 8, no. 2, pp. 179-187, 1962.
[5] A. Khotanzad and Y. H. Hong, "Rotation invariant image recognition using features selected via a systematic method," Pattern Recognition, vol. 23, pp. 1089-1101, 1990.
[6] Atsushi Ono, "Face recognition with Zernike moments," Systems and Computers in Japan, vol. 34, no. 10, pp. 26-35, 2003.
[7] Chandan Singh, Neerja Mittal, and Ekta Walia, "Face recognition using Zernike and complex Zernike moment features," Pattern Recognition and Image Analysis, vol. 21, pp. 71-81, 2011.
[8] Javad Haddadnia, Majid Ahmadi, and Karim Faez, "An efficient feature extraction method with pseudo-Zernike moment in RBF neural network-based human face recognition system," EURASIP J. Appl. Signal Process., vol. 2003, pp. 890-901, January 2003.
[9] Hamidreza Rashidy Kanan, Karim Faez, and Yongsheng Gao, "Face recognition using adaptively weighted patch PZM array from a single exemplar image per person," Pattern Recognition, vol. 41, no. 12, pp. 3799-3812, 2008.
[10] Syed A. Rizvi, P. Jonathon Phillips, and Hyeonjoon Moon, "The FERET verification testing protocol for face recognition algorithms," 1998.

