


SUBSPACE COMBINATION TECHNIQUES FOR FACE RECOGNITION

A thesis submitted by Amit C. Kale for the award of the degree of Master of Science (by Research)



This is to certify that the thesis titled SUBSPACE COMBINATION TECHNIQUES FOR FACE RECOGNITION, submitted by Amit C. Kale, to the Indian Institute of Technology, Madras, for the award of the degree of Master of Science, is a bona fide record of the research work done by him under my supervision. The contents of this thesis, in full or in parts, have not been submitted to any other Institute or University for the award of any degree or diploma.

Place: Chennai
Date:

Dr. R. Aravind and Dr. Devendra Jalihal
(Research Guides)
Dept. of Electrical Engineering
IIT Madras, 600 036


I would like to express my sincere thanks to my guide Dr. R. Aravind for his excellent guidance in my research. His wide knowledge and his logical way of thinking have been of great value to me. His understanding, encouragement and personal guidance have provided a good basis for the present thesis. I am grateful to Dr. Devendra Jalihal for his valuable interactions and for teaching courses like Signal Compression with such flamboyance. During this work I have collaborated with many colleagues for whom I have great regard, and I wish to extend my warmest thanks to all those who have helped me with my work. I would also like to thank my friends Sachin, Balaji, Arnav, Rakesh, and Ashish for making my time in IIT memorable. I thank my other friends Jitendra, Anand, Kavita and Elodie for their valuable support and for encouraging me to work hard. I am immensely grateful to IIT-Madras for providing this excellent opportunity. Finally, I am greatly indebted to my parents and brother; all my success and achievements go to their credit.



KEYWORDS: Face Recognition, Dual Eigenfaces, Illumination, Bayes, Energy Difference Classifier, Log-Gabor

Face Recognition is a research problem that has received a lot of attention due to the requirement of robust biometric recognition systems for commercial applications. Given a face image as input, a face recognition system is required to identify the person by finding a match among the images stored in its database. The system should give acceptable performance under varying expression, illumination and pose. The main focus of this thesis is the combination of subspace techniques for recognition. We propose two distinct methods. The first method uses Canonical Correlation Analysis (CCA) to combine the outputs of two feature extractors, such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), to improve system performance. CCA finds a transformation for each feature extractor's output that maximizes the correlation between them. Using CCA, we obtain improved performance of the combined system over the individual feature extractors. The second method uses Intrapersonal (intensity differences between different images of the same subject) and Extrapersonal (intensity differences between images of different subjects) classes. PCA is performed on both of these classes to obtain two sets of eigenvectors. Given a test image, we compute intensity differences with the training images and project them on the two subspaces. A simple Energy Difference Classifier (EDC) is proposed for classification. The EDC performs better than popular techniques like PCA and LDA, and gives a marginal improvement over the Bayesian Face Recognition system with reduced complexity. We also propose the use of Log-Gabor filters as a pre-processor, which further boosts the recognition rates.


ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS

1 INTRODUCTION
  1.1 Background
  1.2 Motivation and Goals
  1.3 Outline of the Thesis

2 OVERVIEW OF POPULAR FACE RECOGNITION TECHNIQUES
  2.1 Introduction
  2.2 Principal Component Analysis
      2.2.1 DCT-based PCA
  2.3 Linear Discriminant Analysis
  2.4 Bayesian Face Recognition
      2.4.1 Subspace Density Estimation
      2.4.2 Classification

3 PRE-PROCESSING AND NORMALIZATION OF DATABASES
  3.1 Pre-Processing and Normalization
      3.1.1 De-Rotation
      3.1.2 Scaling
      3.1.3 Intensity Normalization
  3.2 Log-Gabor Filters

4 FACE RECOGNITION BASED ON CANONICAL CORRELATION ANALYSIS
  4.1 Introduction
  4.2 Canonical Correlation Analysis
      4.2.1 CCA based Feature Fusion for Face Recognition
  4.3 Performance Evaluation
      4.3.1 Performance without Log-Gabor Filtering
      4.3.2 Performance with Log-Gabor Filtering

5 DUAL EIGENFACES AND ENERGY DIFFERENCE CLASSIFIER
  5.1 Introduction
  5.2 Dual Eigenfaces
  5.3 Intrapersonal and Extrapersonal Classifier
  5.4 Energy Difference Classifier
  5.5 Performance of EDC

6 CONCLUSIONS
  6.1 Summary
  6.2 Future Work

REFERENCES

A VARIOUS FACE DATABASES
  A.1 The Face Recognition Technology (FERET) Database
      A.1.1 Nomenclature
  A.2 The AR Database

LIST OF PAPERS BASED ON THESIS

LIST OF TABLES

3.1 Various Filter Parameters used for Log-Gabor filters
4.1 No. of subjects and testing images used to test the system
4.2 Performance Comparison of PCA, LDA and FFM-1, 2 using Euclidean distance
4.3 Performance Comparison of PCA, LDA and FFM-1, 2 using Cosine Similarity Measure
4.4 Performance Comparison of DCT, LDA and FFM-1, 2 using Euclidean distance
4.5 Performance Comparison of DCT, LDA and FFM-1, 2 using Cosine Similarity Measure
4.6 Performance Comparison of PCA, LDA and FFM-1, 2 for Log-Gabor feature vectors using Euclidean distance
4.7 Performance Comparison of PCA, LDA and FFM-1, 2 with Log-Gabor feature vectors using Cosine Similarity Measure
5.1 Comparison of various techniques without Log-Gabor filtering
5.2 Performance on Log-Gabor filtered images


LIST OF FIGURES

2.1 Eigenfaces for different eigenvalues
2.2 Original (a) and Reconstructed Image (b) using Principal Components
2.3 2-D Illustration of LDA and PCA
2.4 Feature Space Decomposition and eigenvalue spectrum
3.1 De-Rotation of an image
3.2 Non-integer pixel mapping due to de-rotation of image
3.3 De-rotation of an Image
3.4 A neighborhood of 4 × 4 is used to estimate the pixel values
3.5 (a) Images without any form of equalization, (b) Same images after BHE
4.1 Some of the unequalized training and test images
4.2 Mean performance over all databases using FFM-1 and FFM-2
5.1 Distribution of intrapersonal and extrapersonal classes for first 3 principal components
5.2 Cross-section of 2-D Gaussian distributions
5.3 Intrapersonal and Extrapersonal subspace
5.4 Plot of subspace energies
5.5 Mean performance over all databases




ABBREVIATIONS

PCA  Principal Component Analysis
LDA  Linear Discriminant Analysis
DCT  Discrete Cosine Transform
BHE  Block Histogram Equalization
CCA  Canonical Correlation Analysis
FFM  Feature Fusion Method
EDC  Energy Difference Classifier






The field of biometrics is the study of methods to automate the recognition of a human being based on physical or behavioral traits [1] that are unique to a particular person. Physical traits like finger-prints, iris scans, hand geometry, gait, speech and the face are used for identification. Due to the requirement of reliable biometric identification methods, for example in law enforcement [2],[3], extensive research has been carried out on such methods over the past decade. In identification methods that employ finger-prints and iris scans, cooperation from the subject is imperative, which renders them difficult to use for surveillance applications. For facial-image-based identification systems, cooperation from the subject is not a must. Face Recognition is defined as the automated recognition of human faces using facial characteristics. A recognition system can perform either verification (Is he the person he claims to be?) or identification (Who is he?). In face verification, the image of an unknown person who claims an identity is input to the system; the system compares the image with the stored images in the database and confirms or rejects the claim. In the identification task, the system compares the given unknown image with the database of known subjects and finds the closest match. In identification, it is assumed that the unknown subject is already present in the

database [4]. In this thesis we address the problem of subject identification. Image-based face recognition algorithms can be classified into feature-based and template-matching techniques [2]. In the feature-based approach, knowledge of facial features (eyes, nose, etc.) is used [3]. Template-matching techniques take a holistic approach, where the face image is converted to a vector by lexicographical ordering of its rows or columns. The dimension of this vector is large, making it difficult to operate on directly. Dimensionality reduction techniques like Principal Component Analysis (PCA) [5] and Linear Discriminant Analysis (LDA) [6] are used to obtain a subspace representation for the face images/vectors.


Motivation and Goals

All face recognition algorithms require a training phase before they can be used on test images. In the training phase, important features are extracted from the training images. The extracted features have characteristics unique to a subject and are used for recognition: the same features are extracted from the test images and given to a classifier, which compares them with the stored features and provides the best match. PCA is a well-known dimension reduction technique which computes eigenvectors from the covariance matrix of the data. Faces are projected on these eigenvectors to obtain their low-dimensional representations. PCA works well when the samples are from the same subject/class; its projection vectors are more useful for reconstruction than for discrimination, and it performs poorly under varying illumination [7].

A supervised technique like LDA uses class-specific information to discriminate between classes. However, LDA suffers from the small sample size problem [8],[7], making it less effective with a small number of training images per subject. Another effective technique, Bayesian Face Recognition [9], uses Bayes rule [10] to find a match. In this technique, instead of classifying face images directly, difference images are classified as intrapersonal (belonging to the same person) or extrapersonal (belonging to different subjects); it thus approaches a multiple-class pattern classification problem as a two-class problem. The main challenge is to recognize faces with varying expression, illumination and pose. Many algorithms have been implemented to tackle the issues of pose and illumination variation [2],[11]. In our work we attempt to address the illumination problem using combinations of subspaces, and we propose two techniques to combat it. The first uses Canonical Correlation Analysis (CCA) [12] to combine different subspace techniques like PCA and LDA so as to obtain the advantages of the individual methods. The second is the Energy Difference Classifier (EDC), which uses the intrapersonal and extrapersonal subspaces [9] of Bayesian Face Recognition. As is common in the face recognition literature [2],[3],[13], our goal is to develop a robust recognition system for about 150 subjects. We test the techniques on the FERET [14],[15] and AR [16] databases. It is observed that, for small databases like Yale [17], acceptable recognition rates can be obtained using PCA alone [5].


Outline of the Thesis

Chapter 2 reviews popular dimensionality reduction techniques like PCA and LDA. We also propose a Discrete Cosine Transform-based PCA, which performs dimensionality reduction twice, and discuss Bayesian Face Recognition, which uses a probabilistic similarity measure as a metric for classification. Face images normally contain many ambient variations, such as a changing distance between the subject and the camera or changing illumination; Chapter 3 discusses the normalization and equalization of images to counteract these variations. In Chapter 4 we explain the first of the two proposed techniques: we review Canonical Correlation Analysis (CCA) and then use CCA to combine two subspace techniques and perform recognition. In Chapter 5 we propose the second technique, in which dual eigenfaces and an Energy Difference Classifier (EDC) are used to train and classify images. Finally, we conclude the thesis in Chapter 6.





In this chapter we review some well-known face recognition techniques. Any such technique has two main aspects: dimensionality reduction (feature extraction) of the face image, and classification of the corresponding low-dimensional representation. An image is converted to a one-dimensional vector by lexicographical ordering of its rows or columns; it can thus be considered a point in a high-dimensional space. Face images occupy only a small cluster of points in this high-dimensional space, so the cluster can be represented with a low-dimensional subspace. Principal Component Analysis (PCA) [5] is a widely used technique for face representation. Faces can also be represented with discriminant features that help us discriminate between two classes/subjects; these features are obtained using Linear Discriminant Analysis (LDA) [18],[6]. The other technique we review is Bayesian Face Recognition [9], which uses probabilistic modeling and Bayes rule for recognition.


Principal Component Analysis

PCA, also known as the Karhunen-Loève Transform [19], is a data-dependent transform that finds a new coordinate system which best accounts for the directions of maximum data variation. It is used for dimensionality reduction of a data set while retaining those characteristics that contribute most to the variance. PCA was first proposed for representing faces in low dimensions in [20]; this idea was extended to face recognition in [5]. As mentioned before, a face image is considered a point in a high-dimensional space whose dimension equals the number of image pixels. A normalized¹ image F of size M × N pixels is observed as a 2-D matrix, with each element corresponding to an intensity. Lexicographical ordering of this matrix yields a vector of dimension MN. Let there be I subjects with J images per subject available for performing PCA, i.e. training; a total of T = IJ images. Let x_i represent the i-th training image. A training set matrix X is formed, with each image stored as a column

X = [x1 , x2 , . . . , xT ] .


Before performing PCA, the data is origin-centered by subtracting the empirical mean image, which is given as

m = (1/T) Σ_{i=1}^{T} x_i.


¹ Image Normalization is explained in Chapter 3.

A zero-centered image x̃_i is given as

x̃_i = x_i − m.

The matrix X̃ is formed by subtracting the mean m from all the training images:

X̃ = [x̃₁, x̃₂, . . . , x̃_T].


The covariance matrix C_xx is obtained for the face space as

C_xx = (1/T) Σ_{i=1}^{T} x̃_i x̃_i^T = (1/T) X̃ X̃^T.


The eigenvectors of C_xx correspond to the basis vectors of the new coordinate system, and the data is projected on these vectors to obtain a low-dimensional representation. Obtaining the eigenvectors directly is non-trivial because of the dimensionality of the data: the size of C_xx is MN × MN, and computing eigenvectors and eigenvalues for such a huge matrix is infeasible. However, since only T images are used and T ≪ MN, the maximum rank of C_xx is T − 1. This indicates that C_xx has only T − 1 significant eigenvectors, the remaining eigenvectors being associated with the eigenvalue zero. Using this property, the significant eigenvectors of C_xx are obtained as follows. A new matrix Y of size T × T is defined as

Y = X̃^T X̃.


Its eigenvectors v_i and eigenvalues λ_i are obtained from

Y v_i = λ_i v_i,   1 ≤ i ≤ T − 1.


From Eqns. (2.5)-(2.9), we obtain

X̃^T X̃ v_i = λ_i v_i,
X̃ X̃^T (X̃ v_i) = λ_i (X̃ v_i),

i.e.

C_xx (X̃ v_i) = (1/T) λ_i (X̃ v_i),
C_xx w_i = (1/T) λ_i w_i.



Thus, w_i = X̃ v_i are the eigenvectors of C_xx. The number of calculations is significantly reduced, without having to compute the eigenvectors of C_xx directly. The eigenvectors are arranged in descending order of their eigenvalues. A threshold R (R ≤ 1) is applied to the eigenvalues normalized by their total sum, and the first T_p eigenvectors, which capture the maximum variance, are selected. For a particular threshold R, T_p is the smallest number satisfying

( Σ_{t=1}^{T_p} λ_t ) / ( Σ_{t=1}^{T−1} λ_t ) ≥ R.

The first T_p eigenvectors are thus retained such that they account for the fraction R of the data variance. A projection matrix W_pca is formed using these vectors:

W_pca = [w₁, w₂, . . . , w_{T_p}].


(a) First five eigenfaces   (b) Last five eigenfaces

Figure 2.1: Eigenfaces for different eigenvalues

When these eigenvectors are viewed as images, they have ghost-like facial characteristics; they are called eigenfaces. Fig. 2.1(a) shows the first five eigenfaces, while Fig. 2.1(b) shows the eigenfaces for the last five eigenvalues. Zero-centered face images are now projected on the W_pca matrix to obtain low-dimensional representations of the images. When any image x̃_i is projected on W_pca, we obtain a vector representation of size T_p given as

y_i = W_pca^T x̃_i.   (2.12)


The dimension is thus reduced from the number of pixels MN to the much smaller number T_p (T_p ≤ T − 1), reducing the complexity of finding a match. All the eigenvectors are orthogonal to each other, i.e.

W_pca^T W_pca = I_{T_p × T_p},   W_pca^{−1} = W_pca^T.


Using the above property, the original image is reconstructed from the low-dimensional weights y_i as

x̂_i = W_pca y_i + m.   (2.14)

Figure 2.2: Original (a) and Reconstructed Image (b) using Principal Components

The mismatch between x̂_i and x_i is termed the reconstruction error, defined as

ε = ‖x̂_i − x_i‖.


In Figs. 2.2(a)-2.2(b), an original image and its reconstruction using T_p eigenvectors are shown; the image is projected on 100 eigenvectors and reconstructed using Eqn. (2.14). A match for a particular test image x is found in its reduced-dimension PCA domain: we project x onto W_pca and find the nearest match using the Euclidean distance or cosine similarity measure, as explained in Section 4.2.1. The complexity of finding a match is significantly reduced because of the dimensionality reduction. PCA is a class (subject) independent technique [7]; it finds directions of maximum variance. It retains variations between images of the same subject due to illumination and change in viewing direction, and these are large compared to variations caused by a change in face identity [7],[21]. PCA is optimal for reconstruction from the low-dimensional representation, but not for discrimination.
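The training procedure above can be condensed into a short NumPy sketch. This is an illustrative reconstruction, not code from the thesis; the function names and the default threshold R = 0.95 are our own assumptions.

```python
import numpy as np

def train_eigenfaces(X, R=0.95):
    """Sketch of eigenface training via the T x T trick described above.

    X : (MN, T) matrix whose columns are lexicographically ordered training
        images.  Returns the mean image m and the projection matrix Wpca,
        whose Tp columns capture at least a fraction R of the data variance.
    """
    T = X.shape[1]
    m = X.mean(axis=1, keepdims=True)      # empirical mean image
    Xc = X - m                             # zero-centered training matrix
    Y = Xc.T @ Xc                          # T x T surrogate for Cxx
    lam, V = np.linalg.eigh(Y)             # eigh returns ascending order
    lam, V = lam[::-1][:T - 1], V[:, ::-1][:, :T - 1]   # rank(Cxx) = T - 1
    W = Xc @ V                             # w_i = X~ v_i
    W /= np.linalg.norm(W, axis=0)         # unit-norm eigenfaces
    Tp = int(np.searchsorted(np.cumsum(lam) / lam.sum(), R)) + 1
    return m, W[:, :Tp]

def project(W, m, x):
    """Low-dimensional representation y = Wpca^T (x - m), as in Eqn. (2.12)."""
    return W.T @ (x - m)
```

A test image is then matched by projecting it with `project` and taking the training image whose coefficients are nearest in Euclidean distance or cosine similarity.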



DCT-based PCA

We propose DCT-based PCA as a feature extraction technique for face recognition. The DCT of an N × N image {f(x, y)}, as used in image compression [19], is defined as

F(u, v) = α(u) α(v) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x, y) cos[(2x + 1)uπ / 2N] cos[(2y + 1)vπ / 2N],   (2.16)

where α(0) = √(1/N) and α(k) = √(2/N) for k = 1, . . . , N − 1. Due to its energy compaction property, most of the energy is contained in a small number of low-frequency components. The DCT thus acts as a simple dimensionality reduction technique, allowing us to discard redundant information based on frequency while retaining information that is vital for characterizing a face. In DCT-based PCA we compute the DCT of the whole face image. The DCT image is scanned in a zigzag manner [19] and the low-amplitude frequency coefficients lying at the tail are discarded. PCA is then performed on the set of DCT vectors corresponding to the training images, and the DCT vectors are projected on the principal components to obtain a low-dimensional representation of the face vectors. Dimensionality reduction is thus done twice: the first pass removes the redundancy within an image, while the PCA removes the redundancy across different images.
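The DCT and zigzag-truncation steps can be sketched as follows. This is a hedged illustration rather than the thesis implementation: `dct2` builds the orthonormal DCT-II matrix directly (practical code would use a library FFT-based DCT), and the number of retained coefficients `keep` is a free design parameter.

```python
import numpy as np

def dct2(f):
    """Orthonormal 2-D DCT-II of an N x N image, as in Eqn. (2.16)."""
    N = f.shape[0]
    k = np.arange(N)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
    alpha = np.full(N, np.sqrt(2.0 / N))
    alpha[0] = np.sqrt(1.0 / N)
    C = alpha[:, None] * C                 # orthonormal DCT matrix
    return C @ f @ C.T

def zigzag_features(f, keep):
    """Zigzag-scan the DCT image and keep the first `keep` coefficients,
    discarding the low-amplitude tail before PCA is applied."""
    F = dct2(f)
    N = F.shape[0]
    # JPEG-style zigzag: traverse anti-diagonals, alternating direction
    order = sorted(((i, j) for i in range(N) for j in range(N)),
                   key=lambda p: (p[0] + p[1],
                                  -p[0] if (p[0] + p[1]) % 2 == 0 else p[0]))
    return np.array([F[i, j] for i, j in order[:keep]])
```

Because the DCT matrix is orthonormal, the transform preserves image energy; truncating the zigzag tail keeps the energy-compacted low frequencies.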




Figure 2.3: 2-D Illustration of LDA and PCA


Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) [6],[18] is a class-specific method which discriminates between classes by finding projections that maximize class separation. Two scatter matrices, within-class and between-class, are formed. The within-class scatter matrix characterizes the variance from the mean within the same class/subject, while the between-class scatter matrix characterizes the variance of the class means from the overall mean of all the images. A 2-D illustration is given in Fig. 2.3: PCA finds the direction of maximum variation, making discrimination difficult, whereas LDA gives the direction of maximum discrimination, making it easy to separate the classes A and B. A projection matrix W_lda needs to be computed which maximizes the ratio of the determinant of the between-class scatter matrix to that of the within-class scatter matrix of the projected samples. For I subjects (classes) in the set of training images, we label class i as ω_i and denote by T_i the number of training images in ω_i. The mean vector for class i is m_i and the mean vector over all training images is m. The between-class scatter matrix is given as

S_B = Σ_{i=1}^{I} (m_i − m)(m_i − m)^T,   (2.17)


while the within-class scatter matrix is defined as

S_W = Σ_{i=1}^{I} Σ_{x_j ∈ ω_i} (x_j − m_i)(x_j − m_i)^T.   (2.18)


If the matrix S_W is non-singular, the optimal projection W_lda_opt [18],[6] is computed as

W_lda_opt = arg max_W |W^T S_B W| / |W^T S_W W| = [w₁, w₂, . . . , w_{T_l}],

where {w_i | i = 1, 2, . . . , T_l} is a set of generalized eigenvectors of the two scatter matrices S_B and S_W. The upper bound on T_l is I − 1, the maximum rank of S_B. The problem of evaluating W_lda_opt is non-trivial [6],[18], and the solution is given by

S_B w_i = λ_i S_W w_i,   1 ≤ i ≤ T_l.   (2.20)

Since the number of training images is small compared to the image vector dimension, the matrices S_W and S_B are not of full rank. This problem is solved by reducing the dimensions of the scatter matrices [6]. We perform PCA exactly as in Section 2.2 to obtain low-dimensional representations y_i (1 ≤ i ≤ T). We use y_i instead of x_i to calculate the scatter matrices (Eqns. (2.17)-(2.18)) and the generalized eigenvector matrix W₁ (Eqn. (2.20)). The LDA representation z_i for x_i is obtained by projecting y_i on W₁:

z_i = W₁^T y_i,   1 ≤ i ≤ T
    = W₁^T W_pca^T x̃_i   (from Eqn. (2.12))
    = W_lda^T x̃_i.   (2.21)

The final W_lda matrix is given as

W_lda^T = W₁^T W_pca^T.


Zero-centered face images can be directly projected on the W_lda matrix to obtain the low-dimensional representation. As explained in Section 4.2.1, a nearest match for a test image x is found using Eqn. (4.13) and Eqn. (4.14).
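The PCA-then-LDA training step (Eqns. (2.17)-(2.21)) can be sketched as below, assuming the PCA coefficients y_i have already been computed as in Section 2.2. The helper name and the use of SciPy's generalized symmetric eigensolver are our own choices, and the between-class scatter is formed without per-class sample weighting, matching Eqn. (2.17).

```python
import numpy as np
from scipy.linalg import eigh

def train_fisherfaces(Y, labels, Tl=None):
    """Sketch: generalized eigenvectors of (S_B, S_W) in the PCA-reduced
    space, as in Eqn. (2.20).

    Y : (d, T) matrix of PCA coefficients y_i; labels : length-T class
    labels.  Returns W1 with Tl <= I - 1 discriminant directions.
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    m = Y.mean(axis=1, keepdims=True)          # overall mean
    SB = np.zeros((Y.shape[0], Y.shape[0]))
    SW = np.zeros_like(SB)
    for c in classes:
        Yc = Y[:, labels == c]
        mc = Yc.mean(axis=1, keepdims=True)    # class mean m_i
        SB += (mc - m) @ (mc - m).T            # between-class scatter
        SW += (Yc - mc) @ (Yc - mc).T          # within-class scatter
    if Tl is None:
        Tl = len(classes) - 1                  # maximum rank of S_B
    lam, W = eigh(SB, SW)                      # solves S_B w = lam S_W w
    return W[:, ::-1][:, :Tl]                  # largest-ratio directions
```

The PCA step that precedes this makes S_W nonsingular in the reduced space, which is what `scipy.linalg.eigh` requires of its second argument.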


Bayesian Face Recognition

Bayesian Face Recognition [9] is based on probabilistic modeling and Bayes rule. Unlike the methods discussed in the previous sections, this technique uses a probabilistic similarity measure to find a match, i.e. it employs a non-Euclidean approach to face recognition. In this technique, for images F₁ and F₂, the intensity difference Δ = F₁ − F₂ characterizes typical variations in the appearance of an individual. Two classes of facial variations are defined: intrapersonal variations (Ω^(in)) and extrapersonal variations (Ω^(ex)). Ω^(in) characterizes facial expression variations for the same subject, while Ω^(ex) corresponds to variations occurring due to a change in subject [9]. The probabilistic similarity measure is given as

S(F₁, F₂) = P(Ω^(in) | Δ).


Using Bayes rule and the estimates of P(Δ | Ω^(in)) and P(Δ | Ω^(ex)), the a posteriori probability P(Ω^(in) | Δ) is calculated as

P(Ω^(in) | Δ) = P(Δ | Ω^(in)) P(Ω^(in)) / [ P(Δ | Ω^(in)) P(Ω^(in)) + P(Δ | Ω^(ex)) P(Ω^(ex)) ],

with P(Ω^(ex) | Δ) = 1 − P(Ω^(in) | Δ) and P(Δ) = P(Δ | Ω^(in)) P(Ω^(in)) + P(Δ | Ω^(ex)) P(Ω^(ex)).

The priors P(Ω^(in)) and P(Ω^(ex)) can be set to reflect specific conditions (e.g., the number of test images versus the size of the database). For a test image x we compute the intensity difference with each training image x_k, i.e. Δ_k = x − x_k. The best match is x_m, where m is the index of the training image that maximizes P(Ω^(in) | Δ_k) [22],[23]:

m = arg max_k P(Ω^(in) | Δ_k),   1 ≤ k ≤ T
  = arg max_k [ P(Δ_k | Ω^(in)) P(Ω^(in)) ] / [ P(Δ_k | Ω^(in)) P(Ω^(in)) + P(Δ_k | Ω^(ex)) P(Ω^(ex)) ]
  = arg min_k [ 1 + ( P(Δ_k | Ω^(ex)) P(Ω^(ex)) ) / ( P(Δ_k | Ω^(in)) P(Ω^(in)) ) ]
  = arg min_k ( P(Δ_k | Ω^(ex)) P(Ω^(ex)) ) / ( P(Δ_k | Ω^(in)) P(Ω^(in)) )
  = arg min_k [ log P(Δ_k | Ω^(ex)) + log P(Ω^(ex)) − log P(Δ_k | Ω^(in)) − log P(Ω^(in)) ]
  = arg min_k [ log P(Δ_k | Ω^(ex)) − log P(Δ_k | Ω^(in)) ],   (2.25)

where the priors, being independent of k, are dropped in the last step.


As observed from the above equation, we find the closest match m by minimizing the difference between the log-likelihoods obtained for the extrapersonal and intrapersonal classes. The training data is used to estimate these log-likelihoods (Eqn. (2.26)), as discussed in Section 2.4.1.


Subspace Density Estimation

It is difficult to estimate these densities directly because of the dimensionality of the image difference vector: Δ ∈ R^{MN}, where MN is typically O(10⁴). With a limited number of training samples and the high computational cost, density estimation in R^{MN} becomes intractable, even though the intrinsic dimension of Δ is significantly smaller than MN. We follow the method given in [9],[24] to deal with this problem. The vector space R^{MN} is divided into two complementary subspaces using eigenspace decomposition [9], for both the intrapersonal and extrapersonal classes: we form the two classes and perform PCA on each. An illustration of the decomposition is shown in Fig. 2.4(a). Consider the intrapersonal class. We divide the feature space R^{MN} into the principal subspace F and its orthogonal complement F̄. The subspace F contains the first T_b^(in) principal components, while F̄ consists of the residual MN − T_b^(in) components. The image difference vector Δ thus has two components: the component lying in F gives the Distance In Feature Space (DIFS) [9], while the component lying in F̄ gives the Distance From Feature Space (DFFS) [9].

(a) Feature Space Decomposition   (b) Typical eigenvalue spectrum

Figure 2.4: Feature Space Decomposition and eigenvalue spectrum

We perform PCA on the intrapersonal set to obtain low-dimensional estimates using only the first T_b^(in) principal components w_i^(in), 1 ≤ i ≤ T_b^(in), where T_b^(in) ≪ MN [9],[24].

The likelihood estimate can be written as the product of two independent Gaussian densities, as in [9],[24]:

P(Δ | Ω^(in)) = [ exp( −½ Σ_{i=1}^{T_b^(in)} y²_{(in),i} / λ_{(in),i} ) / ( (2π)^{T_b^(in)/2} Π_{i=1}^{T_b^(in)} λ_{(in),i}^{1/2} ) ] · [ exp( −ε²_{(in)}(Δ) / 2ρ_{(in)} ) / ( 2π ρ_{(in)} )^{(MN − T_b^(in))/2} ],   (2.26)

where

y_{(in),i} = w_i^(in)T Δ   (1 ≤ i ≤ T_b^(in)),
ρ_{(in)} = ( 1 / (MN − T_b^(in)) ) Σ_{i=T_b^(in)+1}^{MN} λ_{(in),i},
ε²_{(in)}(Δ) = ‖Δ‖² − Σ_{i=1}^{T_b^(in)} y²_{(in),i}.


Here, the y_{(in),i} are the principal component coefficients and ε²_{(in)}(Δ) is the residual energy. The parameter ρ_{(in)} is estimated as given above; it is a simple average of the F̄ eigenvalues. Due to the limited number of training images, the majority of the eigenvalues of F̄ are estimated by fitting a non-linear decaying function to the eigenvalues obtained by performing PCA. A typical eigenvalue spectrum is shown in Fig. 2.4(b), over which a decaying non-linear curve can be fitted [9],[24]. The above likelihood estimation is performed for both the intrapersonal and extrapersonal subspaces. The eigenvalues {λ_{(in),i}} and {λ_{(ex),i}} have similar decaying characteristics, but {λ_{(ex),i}} has a wider spread, and T_b^(ex) > T_b^(in).




We find the best match for a test image x by computing difference vectors Δ_k with all the training images and calculating the likelihoods (Eqn. (2.26)) for both subspaces. Substituting the likelihoods in Eqn. (2.25) and simplifying further gives

m = arg min_k { [ Σ_{i=1}^{T_b^(in)} y²_{(in),i}(Δ_k) / λ_{(in),i} + ε²_{(in)}(Δ_k) / ρ_{(in)} ] − [ Σ_{i=1}^{T_b^(ex)} y²_{(ex),i}(Δ_k) / λ_{(ex),i} + ε²_{(ex)}(Δ_k) / ρ_{(ex)} ] },   k = 1, 2, . . . , T.   (2.27)

The match m is thus found by minimizing the above expression. This classifier is more computationally intensive than the Euclidean distance and cosine similarity measures, since the image differences and energies are not calculated in subspaces. We call this technique the Bayesian Recognizer.
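The dual-subspace likelihood and the matching rule of Eqn. (2.27) can be sketched as follows, under simplifying assumptions not in the text: the covariance of each difference class is formed densely (feasible only for small MN, e.g. heavily downsampled images), ρ is the average of the discarded eigenvalues, and class-dependent normalization constants, being the same for every k, are dropped from the scores.

```python
import numpy as np

class SubspaceDensity:
    """Sketch of the dual-subspace Gaussian of Eqn. (2.26), fitted to the
    difference vectors of one class (intrapersonal or extrapersonal)."""

    def __init__(self, deltas, Tb):
        # deltas: (MN, n) difference vectors for one class
        MN = deltas.shape[0]
        C = deltas @ deltas.T / deltas.shape[1]
        lam, W = np.linalg.eigh(C)
        lam, W = lam[::-1], W[:, ::-1]             # descending eigenvalues
        self.W = W[:, :Tb]                         # principal subspace F
        self.lam = np.maximum(lam[:Tb], 1e-12)
        # rho: average of the residual (F-bar) eigenvalues
        self.rho = max(lam[Tb:].sum() / (MN - Tb), 1e-12)

    def score(self, delta):
        """Data-dependent part of -2 log P(delta | class); the constants
        are identical for every k and drop out of the arg min."""
        y = self.W.T @ delta                       # DIFS coefficients
        eps2 = delta @ delta - y @ y               # DFFS residual energy
        return np.sum(y ** 2 / self.lam) + eps2 / self.rho

def best_match(x, train, intra, extra):
    """Eqn. (2.27): minimize the intrapersonal-minus-extrapersonal score
    over the columns of the training matrix."""
    d = [intra.score(x - xk) - extra.score(x - xk) for xk in train.T]
    return int(np.argmin(d))
```

In the thesis's setting, the residual eigenvalues would additionally be estimated by fitting a decaying curve, as described above, rather than taken directly from the sample covariance.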



In this chapter, we explain the various pre-processing steps used in face recognition. We also discuss the use of Log-Gabor filtered face images as features for recognition.


Pre-Processing and Normalization

The pre-processor is the first block of any face recognition system. Pre-processing is a very important operation, as it has a significant effect on recognition rates. All face images must be normalized before they are used for training and testing: pre-processing normalizes out the various variations and extracts relevant information about the subject. It involves steps such as feature detection, de-rotation, scaling and intensity normalization. As in [14],[15], we assume that the locations of the facial features, namely the eyes, nose and mouth, are known; these locations play a pivotal role in the normalization process. The techniques discussed here apply to frontal facial images only.



Due to in-plane rotation of the subject's head, faces are positioned differently in the images. The in-plane rotation is normalized using the coordinate locations of


(a) Original image   (b) Image after de-rotation

Figure 3.1: De-Rotation of an image

the eyes. The images are de-rotated once the angle of rotation is calculated from the eye coordinates. The axis convention used for images differs from the conventional Cartesian coordinate system: the top-left corner of the image is taken as the origin, and if (x_l, y_l) and (x_r, y_r) are the coordinates¹ of the left and right eye respectively, then x is the vertical distance from the top-left corner, while y is the horizontal distance. The distance between any two locations is measured in pixels. Using the eye locations, the angle of rotation is calculated as

θ = tan⁻¹[ (x_l − x_r) / (y_l − y_r) ].   (3.1)

From Fig. 3.1 it is observed that, for an angle of rotation θ, the whole image must be de-rotated by −θ about some point of rotation. We choose the center

¹ Eye coordinates are provided with the databases.


(a) Original image   (b) De-rotated image

Figure 3.2: Non-integer pixel mapping due to de-rotation of image

of the image as the point of rotation. For an image of size $N \times M$, we take location $(N/2, M/2)$ as the point of rotation. When a point $(x, y)$ is rotated by an angle $\theta$ about the center $(N/2, M/2)$, we obtain the new location $(\hat{x}, \hat{y})$ as
$$\begin{bmatrix} \hat{x} \\ \hat{y} \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x - N/2 \\ y - M/2 \end{bmatrix} + \begin{bmatrix} N/2 \\ M/2 \end{bmatrix}. \qquad (3.2)$$


Here $R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ is the rotation matrix; in order to de-rotate a particular point by $\theta$, we use $R(-\theta)$ instead of $R(\theta)$ in Eqn. (3.2). We de-rotate the whole image by applying Eqn. (3.2) to every point in the image. Fig. 3.2 shows the forward and reverse mapping of pixels $(x_1, y_1)$ and $(\hat{x}_2, \hat{y}_2)$ respectively; the grid represents pixel locations with integer values. It is observed that some pixel locations of the de-rotated image map to non-integer pixel locations in the original image, and vice versa. A simple solution would be gray-level interpolation [25], where a nearest-neighbor approach is used. Instead, we estimate the intensity value at non-integer pixel locations of the original image using bilinear interpolation [25]

Figure 3.3: De-rotation of an image ((a) original image; (b) image after de-rotation)

and map it onto the final image. An example of a de-rotated face image is shown in Fig. 3.3.
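The de-rotation step described above can be sketched as follows. This is a minimal illustration, not the thesis code: it computes θ from the eye coordinates (Eqn. (3.1)), inverse-maps each target pixel through the rotation of Eqn. (3.2), and estimates non-integer source locations by bilinear interpolation. The function name and the axis convention (x = row, y = column) follow the text.

```python
import numpy as np

def derotate(img, xl, yl, xr, yr):
    """De-rotate a face image so the eyes lie on a horizontal line.

    (xl, yl), (xr, yr): left/right eye coordinates in the thesis convention
    (origin at top-left, x = vertical/row, y = horizontal/column).
    Assumes yl != yr (the eyes are horizontally separated).
    """
    N, M = img.shape
    theta = np.arctan((xl - xr) / (yl - yr))      # Eqn (3.1)
    c, s = np.cos(theta), np.sin(theta)
    cx, cy = N / 2.0, M / 2.0
    out = np.zeros_like(img, dtype=float)
    for i in range(N):
        for j in range(M):
            # inverse mapping: rotate the target pixel by +theta to find
            # its (generally non-integer) source location
            x = c * (i - cx) - s * (j - cy) + cx
            y = s * (i - cx) + c * (j - cy) + cy
            if 0 <= x < N - 1 and 0 <= y < M - 1:
                i0, j0 = int(x), int(y)
                dx, dy = x - i0, y - j0
                # bilinear interpolation over the 2 x 2 neighborhood
                out[i, j] = ((1 - dx) * (1 - dy) * img[i0, j0]
                             + dx * (1 - dy) * img[i0 + 1, j0]
                             + (1 - dx) * dy * img[i0, j0 + 1]
                             + dx * dy * img[i0 + 1, j0 + 1])
    return out
```

When the eyes are already level, θ = 0 and the image passes through unchanged (apart from the border pixels, which fall outside the interpolation neighborhood and are left at zero here).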



De-rotation is not performed on all images, since only some of them have head rotation; scale normalization, however, is performed on all training and test images. Scale normalization keeps the distances between facial features constant across all images. Here we keep the distance between the eyes constant, and a fixed distance between the eyes and the nose tip is also maintained, so the two eyes and the nose tip form the vertices of a fixed triangle. In our experiments we maintained a distance of 25 pixels between the features. We use bi-cubic interpolation to resample each image so that the distances between features are constant. The distances between the features are used to compute the scaling factors $\alpha_x$ and $\alpha_y$ for the two dimensions. Assuming the images are de-rotated, the distance between the eyes is given as
$$d_y = y_l - y_r. \qquad (3.3)$$


Figure 3.4: A neighborhood of 4 × 4 pixels is used to estimate the pixel values ((a) original image; (b) interpolated pixel $I(i_1, j_1)$ mapped into the original image)

Similarly, the distance between the nose and the eyes is given as
$$d_x = x_n - x_l, \qquad (3.4)$$
where $x_n$ is the vertical distance of the nose from the origin. Also, it is safe to assume $x_l = x_r$ since the images are de-rotated. We calculate $\alpha_x$ and $\alpha_y$ as
$$\alpha_x = \frac{d_1}{d_x}, \qquad \alpha_y = \frac{d_2}{d_y}, \qquad (3.5)$$
where $d_1$ and $d_2$ are the fixed distances in pixels between the features in the vertical and horizontal directions. If an image is of size $N \times M$, after scaling the new image is of size $N\alpha_x \times M\alpha_y$. In order to estimate a value $I(i_1, j_1)$ in the new image, the pixel location $(i_1, j_1)$ in the target image is mapped to the corresponding pixel location in the original image.


As observed in Fig. 3.4, the target pixel may map to a non-integer pixel location in the original image. We find the closest integer-valued pixel location $(i, j)$ in the original image using the integer parts of
$$x = \frac{i_1}{\alpha_x} \quad \text{and} \quad y = \frac{j_1}{\alpha_y}. \qquad (3.6)$$
Let $dx$ and $dy$ be the fractional parts, given as
$$dx = x - i \quad \text{and} \quad dy = y - j. \qquad (3.7)$$


The interpolated value at location $(i_1, j_1)$ in the new image is given as
$$I(i_1, j_1) = \sum_{m=-1}^{2} \sum_{n=-1}^{2} I(i + m, j + n)\, R(m - dx)\, R(dy - n), \qquad (3.8)$$


where $R(x)$ is a cubic weight function, given as
$$R(x) = \frac{1}{6}\left[ b^3(x + 2) - 4 b^3(x + 1) + 6 b^3(x) - 4 b^3(x - 1) \right], \qquad (3.9)$$
where
$$b(x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0. \end{cases} \qquad (3.10)$$
We normalized all the databases using bi-cubic interpolation, maintaining a distance of 30 pixels between the eyes and between the eyes and the nose.
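The scale-normalization resampling of Eqns. (3.6)–(3.10) can be sketched as follows. This is a simplified illustration: it uses the cubic weight function above over a 4 × 4 neighborhood, with a border clamp added so the sum stays inside the image (the thesis does not specify its border handling).

```python
import numpy as np

def b(x):
    # b(x) = x for x > 0, 0 otherwise, Eqn (3.10)
    return np.maximum(x, 0.0)

def R(x):
    # cubic weight function, Eqn (3.9)
    return (b(x + 2)**3 - 4*b(x + 1)**3 + 6*b(x)**3 - 4*b(x - 1)**3) / 6.0

def scale_image(img, ax, ay):
    """Resample img by factors ax (rows) and ay (columns), Eqn (3.8)."""
    N, M = img.shape
    N1, M1 = int(N * ax), int(M * ay)
    out = np.zeros((N1, M1))
    for i1 in range(N1):
        for j1 in range(M1):
            x, y = i1 / ax, j1 / ay        # map back to source coords, Eqn (3.6)
            i, j = int(x), int(y)          # nearest integer pixel (floor)
            dx, dy = x - i, y - j          # fractional parts, Eqn (3.7)
            v = 0.0
            for m in range(-1, 3):
                for n in range(-1, 3):
                    ii = min(max(i + m, 0), N - 1)   # clamp at the borders
                    jj = min(max(j + n, 0), M - 1)
                    v += img[ii, jj] * R(m - dx) * R(dy - n)
            out[i1, j1] = v
    return out
```

A useful sanity check is that the weights $R(m - dx)$ sum to one over the 4 × 4 neighborhood, so a constant image stays constant after resampling.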


Figure 3.5: (a) Images without any form of equalization; (b) the same images after BHE


Intensity Normalization

Recognizing faces under changing illumination is an important issue in face recognition, and pre-processing face images to normalize the effect of illumination improves system performance. Various approaches to reduce illumination effects have been proposed; we use a simple Block Histogram Equalization (BHE) approach that has been applied specifically to face images [26]. In BHE [26] we consider two images: the first is a reference face image without any illumination variation, while the second is the image to be normalized. We divide both images into overlapping square or rectangular blocks and perform histogram equalization on each block of the second image, matching its histogram to that of the corresponding block in the reference image. The overlapping blocks are scaled by a windowing filter and added to smooth the transition between adjacent blocks. Fig. 3.5 shows original images and their corresponding BHE images. In our implementation we used square blocks of size 8 × 8. After pre-processing, the face images can either be fed directly to the face recognition system, or be filtered with Log-Gabor filters to obtain new features which are then passed to the system. The face recognition techniques use both normalized face images and Log-Gabor filtered images.
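A simplified BHE sketch is given below. It is an assumption-laden illustration, not the implementation of [26]: blocks are matched by CDF-based histogram matching, and a raised-cosine (Hann) window is assumed for smoothing the overlaps, since the exact windowing filter is not specified in the text.

```python
import numpy as np

def match_block(src, ref):
    """Map src gray levels so its histogram approximates ref's (CDF matching)."""
    s_vals, s_counts = np.unique(src, return_counts=True)
    r_vals, r_counts = np.unique(ref, return_counts=True)
    s_cdf = np.cumsum(s_counts) / src.size
    r_cdf = np.cumsum(r_counts) / ref.size
    # for each source level, pick the reference level with the closest CDF value
    mapped = np.interp(s_cdf, r_cdf, r_vals)
    lut = dict(zip(s_vals, mapped))
    return np.vectorize(lut.get)(src)

def bhe(img, ref, block=8, step=4):
    """Block histogram equalization of img against reference image ref.

    Overlapping block x block windows (stride = step) are histogram-matched
    to the corresponding block of ref; a Hann window (our assumption) smooths
    the transitions between blocks.
    """
    out = np.zeros(img.shape)
    weight = np.zeros(img.shape)
    win = np.outer(np.hanning(block), np.hanning(block)) + 1e-6
    for i in range(0, img.shape[0] - block + 1, step):
        for j in range(0, img.shape[1] - block + 1, step):
            eq = match_block(img[i:i+block, j:j+block],
                             ref[i:i+block, j:j+block])
            out[i:i+block, j:j+block] += win * eq
            weight[i:i+block, j:j+block] += win
    return out / np.maximum(weight, 1e-6)
```

Matching an image against itself is a convenient self-test: every block maps to itself, so the windowed average reproduces the input.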


Log-Gabor Filters

Log-Gabor filters were proposed in 1987 as an alternative to Gabor filters [27]. Natural images are best coded with filters having a Gaussian response on a logarithmic frequency axis [27]. Log-Gabor filters are derived from Gabor filters [28], which are traditionally used to obtain simultaneous localization of spatial and frequency information. The Fourier transform of a Log-Gabor filter, in polar frequency coordinates, is given as $H(\rho, \theta) = H_\rho H_\theta$, where $H_\rho$ is the radial component and $H_\theta$ the angular component:
$$H_\rho = \exp\left( -\frac{[\log(\rho/\rho_o)]^2}{2[\log(\sigma/\rho_o)]^2} \right) \quad \text{and} \quad H_\theta = \exp\left( -\frac{(\theta - \theta_o)^2}{2(T\Delta\theta)^2} \right),$$
where $\rho_o$ is the center frequency of the filter, $\sigma$ is the bandwidth in the radial direction, $\theta_o$ is the orientation angle of the filter, $T$ is a scaling factor and $\Delta\theta$ is the angular spacing between the filters. The Log-Gabor filter is thus given as
$$H(\rho, \theta) = \exp\left( -\frac{[\log(\rho/\rho_o)]^2}{2[\log(\sigma/\rho_o)]^2} \right) \exp\left( -\frac{(\theta - \theta_o)^2}{2(T\Delta\theta)^2} \right). \qquad (3.11)$$

An important thing to notice is that Log-Gabor filters have an extended tail at the high-frequency end, and their amplitude spectrum falls off as $1/\rho$. Natural images have similar amplitude spectra [27], so Log-Gabor filters are a good

Table 3.1: Filter parameters used for the Log-Gabor filters

Filter Parameter                Value
Number of Scales                5
Number of Orientations          8
Minimum Frequency ($\rho_o$)    0.333
Scale Multiplication Factor     2
$(T\Delta\theta)^2$             0.0685
$\sigma/\rho_o$                 0.65

choice for feature extraction [29],[30]. In our experiments we use Log-Gabor filters with 5 scales and 8 orientations; for each scale the center frequency of the filter is changed by the multiplication factor. We filter the images through these 40 filters and down-sample the results by a factor of 4 in both directions. The 40 filtered and down-sampled images are converted to vector form by row stacking and concatenated to form a single vector, which is given as input to the system. The values of the various parameters in our Log-Gabor implementation are given in Table 3.1. Note that the parameter $\sigma/\rho_o$ in Table 3.1 is kept fixed as $\rho_o$ changes. Also, the filter coefficient is not defined at $\rho = 0$; we set the filter value there to zero.
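A sketch of the filter-bank construction is given below. Parameter names follow Table 3.1, but two details are our assumptions, not spelled out in the text: the constant `t` is taken as the squared angular width $(T\Delta\theta)^2$, and the center frequency is divided (rather than multiplied) by the scale factor at each scale so it stays below the Nyquist frequency.

```python
import numpy as np

def log_gabor_bank(rows, cols, n_scales=5, n_orient=8,
                   min_freq=0.333, mult=2.0, sigma_on_f=0.65, t=0.0685):
    """Build a bank of frequency-domain Log-Gabor filters (Eqn 3.11)."""
    y, x = np.mgrid[0:rows, 0:cols]
    y = y - rows // 2
    x = x - cols // 2
    rho = np.sqrt((x / cols) ** 2 + (y / rows) ** 2)   # normalized radius
    rho[rows // 2, cols // 2] = 1.0                    # avoid log(0) at DC
    theta = np.arctan2(y, x)
    filters = []
    for s in range(n_scales):
        f0 = min_freq / mult ** s                      # center frequency per scale
        radial = np.exp(-np.log(rho / f0) ** 2 / (2 * np.log(sigma_on_f) ** 2))
        radial[rows // 2, cols // 2] = 0.0             # filter set to zero at rho = 0
        for o in range(n_orient):
            angle = o * np.pi / n_orient
            # wrapped angular distance to the filter orientation
            d = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
            angular = np.exp(-d ** 2 / (2 * t))
            filters.append(np.fft.ifftshift(radial * angular))
    return filters

def log_gabor_features(img, filters, downsample=4):
    """Filter img with the bank, down-sample the magnitudes and row-stack."""
    F = np.fft.fft2(img)
    feats = [np.abs(np.fft.ifft2(F * h))[::downsample, ::downsample].ravel()
             for h in filters]
    return np.concatenate(feats)
```

For an 80 × 48 image this yields 40 responses of 20 × 12 each, i.e. the 9600-dimensional feature vector used later in the thesis.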






Subspace techniques like PCA and LDA have performance limitations [8]: the PCA recognition rate deteriorates under varying illumination, while LDA performs poorly when the number of training images is small. We propose a method which combines the features extracted by the two techniques based on the correlation between them, and hence obtains the advantages of both. Correlation analysis is useful for finding a linear relationship between two variables. For multi-dimensional variables, correlation analysis depends on the orientation of the two variables, although rotation in space leaves the properties of the variables preserved. Canonical analysis [12] finds the relationship between two sets of multi-dimensional variables in a way that is independent of affine transformations, while Canonical Correlation Analysis (CCA) [12] finds basis vectors for the two sets of variables such that the correlation between the variables after projection is maximized. CCA makes use of two semantic views of the same object to extract a representation of both. CCA is widely used in the fields of meteorology, economics and information processing [31]. It has also been applied to the problem of Blind Source Separation [32]. In [33] CCA is applied to learn a semantic representation of web images and their associated text, while in [34] it is used for the segmentation of functional Magnetic

Resonance Images. We employ CCA to extract maximally correlated representations obtained from different feature extraction techniques applied to face images. CCA finds a transformation for the two feature data sets such that the covariation of the transformed data sets is maximized. The resulting vectors can then be combined, either by concatenation or by addition, to obtain a new low-dimensional representation for face images. In the following sections we explain CCA and show how to combine two subspace face recognition techniques.


Canonical Correlation Analysis

Given two zero-mean random vectors $\mathbf{x}$ and $\mathbf{y}$ of length $n_1$ and $n_2$ respectively, the overall covariance matrix $C$ of $\mathbf{x}$ and $\mathbf{y}$ is given as
$$C = E\left[ \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix}^T \right] = \begin{bmatrix} C_{xx} & C_{xy} \\ C_{xy}^T & C_{yy} \end{bmatrix}, \qquad (4.1)$$


where $C_{xy}$, as given by Eqn. (4.10), is the cross-covariance matrix of $\mathbf{x}$ and $\mathbf{y}$, and $C_{xx}$ and $C_{yy}$ are the covariance matrices of $\mathbf{x}$ and $\mathbf{y}$ respectively. We wish to find vectors $\mathbf{w}_x$ and $\mathbf{w}_y$ that maximize the correlation between the projections. The correlation coefficient between the projections $\tilde{x} = \mathbf{w}_x^T \mathbf{x}$ and $\tilde{y} = \mathbf{w}_y^T \mathbf{y}$ is given as
$$\rho = \frac{E[\tilde{x}\tilde{y}]}{\sqrt{E[\tilde{x}^2]\, E[\tilde{y}^2]}} = \frac{E[\mathbf{w}_x^T \mathbf{x} \mathbf{y}^T \mathbf{w}_y]}{\sqrt{E[\mathbf{w}_x^T \mathbf{x} \mathbf{x}^T \mathbf{w}_x]\, E[\mathbf{w}_y^T \mathbf{y} \mathbf{y}^T \mathbf{w}_y]}} = \frac{\mathbf{w}_x^T C_{xy} \mathbf{w}_y}{\sqrt{\mathbf{w}_x^T C_{xx} \mathbf{w}_x \; \mathbf{w}_y^T C_{yy} \mathbf{w}_y}}. \qquad (4.2)$$


$\mathbf{w}_x$ and $\mathbf{w}_y$ are constrained to unit length, and it is sufficient to consider only positive values of $\rho$. We estimate the $\mathbf{w}_x$ and $\mathbf{w}_y$ that maximize $\rho$ by computing the partial derivatives of $\rho^2$ with respect to $\mathbf{w}_x$ and $\mathbf{w}_y$ [12]:
$$\frac{\partial \rho^2}{\partial \mathbf{w}_x} = \alpha \left( C_{xy}\mathbf{w}_y - \frac{\mathbf{w}_x^T C_{xy}\mathbf{w}_y}{\mathbf{w}_x^T C_{xx}\mathbf{w}_x}\, C_{xx}\mathbf{w}_x \right), \qquad \frac{\partial \rho^2}{\partial \mathbf{w}_y} = \beta \left( C_{xy}^T\mathbf{w}_x - \frac{\mathbf{w}_y^T C_{xy}^T\mathbf{w}_x}{\mathbf{w}_y^T C_{yy}\mathbf{w}_y}\, C_{yy}\mathbf{w}_y \right), \qquad (4.3)$$
where $\alpha = \dfrac{\mathbf{w}_y^T C_{xy}^T \mathbf{w}_x}{(\mathbf{w}_x^T C_{xx}\mathbf{w}_x)(\mathbf{w}_y^T C_{yy}\mathbf{w}_y)}$ and $\beta = \dfrac{\mathbf{w}_x^T C_{xy} \mathbf{w}_y}{(\mathbf{w}_x^T C_{xx}\mathbf{w}_x)(\mathbf{w}_y^T C_{yy}\mathbf{w}_y)}$; setting the partial derivatives to zero gives the following set of equations:
$$C_{xy}\mathbf{w}_y = \frac{\mathbf{w}_x^T C_{xy}\mathbf{w}_y}{\mathbf{w}_x^T C_{xx}\mathbf{w}_x}\, C_{xx}\mathbf{w}_x \qquad (4.4)$$
$$C_{xy}^T\mathbf{w}_x = \frac{\mathbf{w}_x^T C_{xy}\mathbf{w}_y}{\mathbf{w}_y^T C_{yy}\mathbf{w}_y}\, C_{yy}\mathbf{w}_y. \qquad (4.5)$$


Equations (4.4) and (4.5) can be further simplified to obtain
$$C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{xy}^T \mathbf{w}_x = \left( \frac{\mathbf{w}_x^T C_{xy}\mathbf{w}_y}{\sqrt{\mathbf{w}_x^T C_{xx}\mathbf{w}_x \; \mathbf{w}_y^T C_{yy}\mathbf{w}_y}} \right)^2 \mathbf{w}_x = \rho^2 \mathbf{w}_x \qquad (4.6)$$
$$C_{yy}^{-1} C_{xy}^T C_{xx}^{-1} C_{xy} \mathbf{w}_y = \left( \frac{\mathbf{w}_x^T C_{xy}\mathbf{w}_y}{\sqrt{\mathbf{w}_x^T C_{xx}\mathbf{w}_x \; \mathbf{w}_y^T C_{yy}\mathbf{w}_y}} \right)^2 \mathbf{w}_y = \rho^2 \mathbf{w}_y. \qquad (4.7)$$


Thus $\mathbf{w}_x$ and $\mathbf{w}_y$ are the eigenvectors of $C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{xy}^T$ and $C_{yy}^{-1} C_{xy}^T C_{xx}^{-1} C_{xy}$ respectively, with the squared correlation coefficients $\rho^2$ being the eigenvalues. Solving Eqn. (4.6) and Eqn. (4.7) we get $T_c$ solutions for $\mathbf{w}_x$, $\mathbf{w}_y$ and $\rho$, where $T_c = \min(n_1, n_2)$. A complete description of the canonical correlations is also given as
$$\begin{bmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{bmatrix} \begin{bmatrix} \mathbf{w}_x \\ \mathbf{w}_y \end{bmatrix} = \rho \begin{bmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{bmatrix} \begin{bmatrix} \lambda_x \mathbf{w}_x \\ \lambda_y \mathbf{w}_y \end{bmatrix}, \qquad (4.8)$$
where $\rho, \lambda_x, \lambda_y > 0$ and $\lambda_x \lambda_y = 1$. The linear combinations $\tilde{x}_i = \mathbf{w}_{xi}^T \mathbf{x}$ and $\tilde{y}_i = \mathbf{w}_{yi}^T \mathbf{y}$ are called canonical variates, and the correlation coefficients $\rho_i$ are called canonical correlations [12]. The canonical variates corresponding to different roots of Equations (4.4) and (4.5) are uncorrelated, i.e. for $i \ne j$
$$\mathbf{w}_{xi}^T C_{xx} \mathbf{w}_{xj} = 0, \qquad \mathbf{w}_{yi}^T C_{yy} \mathbf{w}_{yj} = 0, \qquad \mathbf{w}_{xi}^T C_{xy} \mathbf{w}_{yj} = 0. \qquad (4.9)$$
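The derivation above can be checked numerically; the sketch below solves the eigenproblem of Eqn. (4.6) directly with NumPy. The small ridge term added to the covariance matrices is an implementation choice for numerical invertibility, not part of the derivation.

```python
import numpy as np

def cca(X, Y):
    """Canonical correlations and basis vectors for data matrices X (n1 x T)
    and Y (n2 x T) whose columns are samples; direct solution of Eqn (4.6)."""
    T = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Cxx = Xc @ Xc.T / T + 1e-8 * np.eye(X.shape[0])   # ridge for invertibility
    Cyy = Yc @ Yc.T / T + 1e-8 * np.eye(Y.shape[0])
    Cxy = Xc @ Yc.T / T
    # Eqn (4.6): Cxx^-1 Cxy Cyy^-1 Cxy^T wx = rho^2 wx
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    rho2, Wx = np.linalg.eig(M)
    order = np.argsort(-rho2.real)
    rho = np.sqrt(np.clip(rho2.real[order], 0.0, 1.0))
    Wx = Wx.real[:, order]
    # Eqn (4.5) gives wy from wx up to scale: wy ~ Cyy^-1 Cxy^T wx
    Wy = np.linalg.solve(Cyy, Cxy.T) @ Wx
    Wx = Wx / np.linalg.norm(Wx, axis=0)
    Wy = Wy / np.linalg.norm(Wy, axis=0)
    return rho, Wx, Wy
```

For perfectly linearly related data the leading canonical correlation is 1, which makes a convenient sanity check.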





CCA-based Feature Fusion for Face Recognition

In [35], two methods were proposed to combine the outputs of two arbitrary feature extractors (for character recognition), i.e. feature fusion. Here we apply the same methods to combine two features obtained from extractors such as PCA and LDA; the goal is to get the maximum information out of the two feature extractors. We apply CCA to the feature extractor outputs and fuse the two features as discussed below. Let the two feature extractors be trained on $T$ training images. Let $X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T]$ and $Y = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_T]$ be the outputs of the two extractors, and $n_1$ and $n_2$ the dimensions of the two outputs, where $n_1, n_2 \le T - 1$. Let $\bar{\mathbf{x}}_i = \mathbf{x}_i - \frac{1}{T}\sum_{j=1}^{T}\mathbf{x}_j$ and $\bar{\mathbf{y}}_i = \mathbf{y}_i - \frac{1}{T}\sum_{j=1}^{T}\mathbf{y}_j$. The covariance matrices of $X$ and $Y$, $C_{xx}$ and $C_{yy}$ respectively, are calculated using Eqn. (2.5); due to the dimensionality reduction performed by the feature extractors, $C_{xx}$ and $C_{yy}$ are full rank. $C_{xy}$ is the between-set covariance matrix, given as
$$C_{xy} = \frac{1}{T} \sum_{i=1}^{T} \bar{\mathbf{x}}_i \bar{\mathbf{y}}_i^T = \frac{1}{T} \bar{X} \bar{Y}^T. \qquad (4.10)$$


Canonical correlations and basis vectors are computed from these covariance matrices using Eqns. (4.6) and (4.7). Let $W_x$ and $W_y$ be the sets of canonical basis vectors obtained for the features $X$ and $Y$ respectively. We now consider the fusion of the two features for an arbitrary image $i$, with $\bar{\mathbf{x}}_i$ and $\bar{\mathbf{y}}_i$ being the outputs of the two feature extractors. These features are projected onto the canonical basis vectors and combined as
$$\mathbf{z}_i^1 = \begin{bmatrix} W_x^T \bar{\mathbf{x}}_i \\ W_y^T \bar{\mathbf{y}}_i \end{bmatrix} = \begin{bmatrix} W_x & 0 \\ 0 & W_y \end{bmatrix}^T \begin{bmatrix} \bar{\mathbf{x}}_i \\ \bar{\mathbf{y}}_i \end{bmatrix} \qquad (4.11)$$
$$\mathbf{z}_i^2 = W_x^T \bar{\mathbf{x}}_i + W_y^T \bar{\mathbf{y}}_i = \begin{bmatrix} W_x \\ W_y \end{bmatrix}^T \begin{bmatrix} \bar{\mathbf{x}}_i \\ \bar{\mathbf{y}}_i \end{bmatrix}. \qquad (4.12)$$


Equations (4.11) and (4.12) are referred to as Feature Fusion Method 1 (FFM-1) and Feature Fusion Method 2 (FFM-2) respectively; $\mathbf{z}_i^1$ and $\mathbf{z}_i^2$ are the combined features for image $i$. Note that we are combining two different features independently of their coordinates and their lengths. We tested the CCA-based fusion method on three feature extractors, namely PCA, LDA and Discrete Cosine Transform (DCT)-based PCA, and we use the Euclidean distance and the cosine similarity measure to find a match. The test image is represented by $\mathbf{z}^k$, where $k$ corresponds to the fusion method used. For the Euclidean distance measure, we find the best match $\hat{i}$ which satisfies
$$\hat{i} = \arg\min_{i \in [1, 2, \ldots, T]} \left\| \mathbf{z}^k - \mathbf{z}_i^k \right\|, \qquad k = 1, 2. \qquad (4.13)$$
For the cosine similarity measure, the cosine of the angle between the probe vector and the training vectors is computed, and the best match for the test vector $\mathbf{z}^k$ is found by performing
$$\hat{i} = \arg\max_{i \in [1, 2, \ldots, T]} \frac{(\mathbf{z}^k)^T \mathbf{z}_i^k}{\left\| \mathbf{z}^k \right\| \left\| \mathbf{z}_i^k \right\|}, \qquad k = 1, 2. \qquad (4.14)$$
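Assuming canonical bases $W_x$, $W_y$ from the previous section, FFM-1, FFM-2 and the cosine matcher of Eqn. (4.14) reduce to a few lines; the helper names here are ours.

```python
import numpy as np

def ffm1(Wx, Wy, x, y):
    # Eqn (4.11): concatenate the two projected feature vectors
    return np.concatenate([Wx.T @ x, Wy.T @ y])

def ffm2(Wx, Wy, x, y):
    # Eqn (4.12): add the two projected feature vectors
    return Wx.T @ x + Wy.T @ y

def best_match(z, gallery):
    """Cosine similarity matcher, Eqn (4.14); gallery is a list of fused vectors."""
    sims = [z @ g / (np.linalg.norm(z) * np.linalg.norm(g)) for g in gallery]
    return int(np.argmax(sims))
```

Note that FFM-2 halves the fused vector length relative to FFM-1, which is where its computational saving comes from.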



Table 4.1: Number of subjects and testing images used to test the system

Database   Subjects   Testing images   Sets
FERET      157        507              4
AR         134        1248             4


Performance Evaluation

We employ the FERET and AR face databases to design and test our recognition system; details of the databases are given in Appendix A. Images are normalized to a size of 80 × 48 using the normalization techniques explained in Chapter 3. Tests are conducted on equalized [26] and Log-Gabor filtered [27] images. All the images used are frontal, with variations in expression and illumination. For training we select two images per subject for all the databases. Table 4.1 contains the details of the training and test images used. Four sets of training and testing images are formed, ensuring a proper shuffling between training and testing sets; this makes the recognition rates more robust. Some of the training and testing images are shown in Fig. 4.1. For images of size 80 × 48 the image vector is of length 3840; after performing the DCT, only 1500 coefficients are retained. Using CCA-based fusion we combine the outputs of spatial-domain PCA and LDA, and of DCT-based PCA¹ (Section 2.2.1) and LDA. To find the best match we independently employ both the Euclidean distance and the cosine similarity measure. The mean performance over all four sets of images is given in the following tables.

¹ For simplicity we will refer to the DCT-based PCA feature vector as the DCT vector.


Figure 4.1: Some of the unequalized training and test images ((a) training images; (b) test images)


Performance without Log-Gabor Filtering

The performance comparison of spatial-PCA, LDA and CCA-based fusion is shown in Tables 4.2 and 4.3. For the FERET database the lengths of the reduced-dimensional vector representations for spatial-PCA and LDA are 260 and 143 respectively; the length of the FFM-1 vector is 286 and that of FFM-2 is 143. For the FERET and AR databases, CCA-based fusion clearly works better than the individual PCA and LDA methods for both distance measures: an improvement of 8% and 9% is observed for the FERET and AR databases respectively. In general, LDA by itself has a low recognition rate. FFM gives an improvement in performance over spatial-PCA and LDA, and FFM-2 is computationally half as complex as FFM-1 with the same performance. The length of the DCT-based PCA vectors for the FERET database is 170; as the DCT vector is shorter than the PCA feature vector, the complexity of CCA for the DCT and LDA combination is reduced. The performance of DCT, LDA and the CCA-based combination for both databases is given in Tables 4.4 and 4.5, where we observe an improvement of 6% and 11% for the FERET and AR databases over the individual methods.

Table 4.2: Performance comparison of PCA, LDA and FFM-1, 2 using Euclidean distance

Database   PCA       LDA       FFM-1     FFM-2
FERET      77.91%    71.01%    80.62%    81.76%
AR         72.12%    69.43%    74.34%    76.79%
Mean       73.79%    69.89%    76.15%    78.23%

Table 4.3: Performance comparison of PCA, LDA and FFM-1, 2 using cosine similarity measure

Database   PCA       LDA       FFM-1     FFM-2
FERET      78.21%    72.19%    86.79%    86.84%
AR         75.70%    73.11%    83.70%    84.34%
Mean       76.43%    72.84%    84.59%    85.06%


Performance with Log-Gabor Filtering

We also test the system with Log-Gabor filtered images. The filter parameters are defined in Table 3.1; we down-sample the filtered images by a factor of four in both dimensions. For each image there are 40 filtered images, and we concatenate all the down-sampled images to obtain a feature vector of dimension 9600. We perform

Table 4.4: Performance comparison of DCT, LDA and FFM-1, 2 using Euclidean distance

Database   DCT       LDA       FFM-1     FFM-2
FERET      78.25%    71.01%    83.24%    83.68%
AR         68.26%    69.43%    80.24%    80.45%
Mean       71.15%    69.89%    81.11%    81.38%


Table 4.5: Performance comparison of DCT, LDA and FFM-1, 2 using cosine similarity measure

Database   DCT       LDA       FFM-1     FFM-2
FERET      80.38%    72.19%    86.39%    86.79%
AR         69.89%    73.11%    84.58%    84.76%
Mean       74.23%    72.92%    85.10%    85.12%

Table 4.6: Performance comparison of PCA, LDA and FFM-1, 2 for Log-Gabor feature vectors using Euclidean distance

Database   PCA       LDA       FFM-1     FFM-2
FERET      83.24%    77.57%    85.41%    86.29%
AR         83.22%    79.33%    81.07%    82.84%
Mean       83.23%    78.82%    82.32%    83.83%

PCA and LDA are performed on the Log-Gabor feature vectors, and the results are used for recognition. A boost in recognition rates is observed for PCA and LDA with the use of Log-Gabor feature vectors, while a marginal improvement is observed in the case of CCA-based fusion. Log-Gabor filtering cannot be performed on DCT-processed images and vice-versa, so the CCA-based fusion methods with DCT-based PCA as a feature extractor cannot be used here. From Tables 4.2-4.7 it is observed that the mean performance of the combined techniques is better than that of the individual methods. All combinations of

Table 4.7: Performance comparison of PCA, LDA and FFM-1, 2 with Log-Gabor feature vectors using cosine similarity measure

Database   PCA       LDA       FFM-1     FFM-2
FERET      84.37%    78.75%    87.77%    88.22%
AR         86.01%    83.00%    85.30%    86.33%
Mean       85.53%    81.77%    86.01%    86.88%


Figure 4.2: Mean performance over all databases using FFM-1 and FFM-2

feature extractors are almost equally effective: the best recognition rates obtained for all three combinations are within 2% of each other. It is also seen that the use of the cosine similarity measure boosts the recognition rates throughout. A comparative bar plot of mean performance over the databases is shown in Fig. 4.2, where FFM-1 and FFM-2 are compared with the other subspace methods. Both fusion methods are equally effective, and the combination of two techniques gives an improvement in recognition rate. As mentioned earlier, FFM-2 is computationally half as complex as FFM-1 with the same performance. The CCA computations for the LDA-DCT combination are less intensive and can thus be preferred over the spatial-PCA and LDA combination.





Bayesian face recognition [9] uses a probability rule for classification. As explained in Chapter 2, density estimation is performed for the intrapersonal and extrapersonal classes using Eqn. (2.26), while classification is performed as explained in Section 2.4. Due to the dimensionality of the data, the problems of density estimation and classification are non-trivial [9],[24]. We propose a significantly simpler approach which uses the difference in subspace projection energies as a metric for classification. As explained in Section 2.4, we create intrapersonal and extrapersonal classes by computing intensity differences between images. Principal components [9] of both classes are obtained; intensity difference images are projected onto the two subspaces, and the images are classified using the difference in these projection energies.


Dual Eigenfaces

As mentioned in Chapter 2, there are $I$ subjects (classes) and $J$ training images per class, where $J \ge 2$. We form the intrapersonal training class by taking intensity differences between images of the same class, for all classes. The extrapersonal set is computed as the intensity differences between images of different classes. There are $I\binom{J}{2}$ and $J^2\binom{I}{2}$ distinct difference images in the intrapersonal and extrapersonal classes respectively. Using combinatorial analysis we observe that only $P = I(J-1)$ images in the intrapersonal set and $R = (I-1)J^2$ images in the extrapersonal set are linearly independent. We have $T = IJ$ training images; let $\mathbf{x}_1^{(i)}, \mathbf{x}_2^{(i)}, \ldots, \mathbf{x}_J^{(i)}$ belong to Subject $i$. The intrapersonal set for Subject $i$ is formed by computing the differences between $\mathbf{x}_j^{(i)}$ and $\mathbf{x}_{j+1}^{(i)}$, $(1 \le j \le J-1)$; all these difference vectors are linearly independent. There are $J-1$ difference images for each subject, giving a total of $I(J-1)$ images over all $I$ subjects. For the extrapersonal class, consider Subjects $i$ and $i+1$ $(1 \le i \le I-1)$. We compute the intensity difference between the vector $\mathbf{x}_j^{(i)}$ and every $\mathbf{x}_k^{(i+1)}$, $(1 \le j, k \le J)$; there are $J$ such linearly independent differences for each $\mathbf{x}_j^{(i)}$, so for all $J$ images of Subject $i$ there are $J \times J$ such differences with the images of Subject $i+1$. Similarly we compute the differences for Subjects $i+1$ and $i+2$, and so on. We get $I-1$ such combinations, and thus find $(I-1)J^2$ linearly independent extrapersonal differences. For example, if $J = 2$, then images $\mathbf{x}_1^{(i)}, \mathbf{x}_2^{(i)}$ belong to Subject $i$ and $\mathbf{x}_1^{(i+1)}, \mathbf{x}_2^{(i+1)}$ belong to Subject $i+1$; there are thus 4 linearly independent image differences for each such pair, and for $I$ subjects there are $I-1$ combinations, giving $4(I-1)$ difference vectors. Let $\Delta_{in} = [\boldsymbol{\delta}_1^{(in)}, \boldsymbol{\delta}_2^{(in)}, \ldots, \boldsymbol{\delta}_P^{(in)}]$ contain the intrapersonal vectors, while the matrix $\Delta_{ex} = [\boldsymbol{\delta}_1^{(ex)}, \boldsymbol{\delta}_2^{(ex)}, \ldots, \boldsymbol{\delta}_R^{(ex)}]$ contains the extrapersonal vectors. The

covariance matrices $C_{in}$ and $C_{ex}$ are given as
$$C_{in} = \frac{1}{P} \sum_{k=1}^{P} \boldsymbol{\delta}_k^{(in)} \boldsymbol{\delta}_k^{(in)T} = \frac{1}{P} \Delta_{in} \Delta_{in}^T \qquad (5.1)$$
$$C_{ex} = \frac{1}{R} \sum_{k=1}^{R} \boldsymbol{\delta}_k^{(ex)} \boldsymbol{\delta}_k^{(ex)T} = \frac{1}{R} \Delta_{ex} \Delta_{ex}^T. \qquad (5.2)$$


Clearly, $C_{in}$ and $C_{ex}$ can have maximum rank $P$ and $R$ respectively. PCA [9] is performed on $C_{in}$ and $C_{ex}$ to obtain the matrices $W_{in}$ and $W_{ex}$, consisting of the $p$ $(p \le P)$ and $r$ $(r \le R)$ eigenvectors corresponding to the significant eigenvalues of $C_{in}$ and $C_{ex}$. These eigenvectors are called dual eigenfaces [9].
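The construction of the two difference sets and the dual eigenfaces can be sketched as follows. For clarity this sketch eigendecomposes Eqns. (5.1)-(5.2) directly; for image-sized vectors one would work with the smaller Gram matrix instead, as in standard eigenfaces.

```python
import numpy as np

def dual_eigenfaces(X, I, J, p, r):
    """Train dual eigenfaces from X, an array of shape (I, J, d) holding the
    J training image vectors of each of the I subjects; keep p intrapersonal
    and r extrapersonal eigenvectors."""
    # intrapersonal differences: x_j - x_{j+1} within each subject
    intra = [X[i, j] - X[i, j + 1] for i in range(I) for j in range(J - 1)]
    # extrapersonal differences: every image of subject i vs subject i+1
    extra = [X[i, j] - X[i + 1, k]
             for i in range(I - 1) for j in range(J) for k in range(J)]
    D_in = np.stack(intra, axis=1)          # d x P, P = I(J-1)
    D_ex = np.stack(extra, axis=1)          # d x R, R = (I-1)J^2
    C_in = D_in @ D_in.T / D_in.shape[1]    # Eqn (5.1)
    C_ex = D_ex @ D_ex.T / D_ex.shape[1]    # Eqn (5.2)
    # eigenvectors of each covariance matrix, largest eigenvalues first
    w_in, V_in = np.linalg.eigh(C_in)
    w_ex, V_ex = np.linalg.eigh(C_ex)
    W_in = V_in[:, np.argsort(-w_in)[:p]]
    W_ex = V_ex[:, np.argsort(-w_ex)[:r]]
    return W_in, W_ex
```

With $I = 3$ and $J = 2$ this produces $P = 3$ intrapersonal and $R = 8$ extrapersonal difference vectors, matching the counts derived above.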


Intrapersonal and Extrapersonal Classifier

Classification of a given difference image as intrapersonal or extrapersonal is non-trivial. A 3-D scatter plot of the first three PCA components for the dual eigenfaces is shown in Fig. 5.1. We observe that the intrapersonal subspace is embedded in the extrapersonal subspace; the intrapersonal class is therefore a subset of the extrapersonal class. In order to illustrate the complexity of classification, we consider 2-D Gaussian distributions for intrapersonal-like (A) and extrapersonal-like (B) classes, with zero mean vectors. As pointed out earlier, intrapersonal variations are smaller than extrapersonal variations. Using the Bayes rule [10] we find the decision boundary for classification. For example, consider class $w_1$ with zero mean and covariance matrix $\Sigma_1 = \begin{bmatrix} 1/2 & 0 \\ 0 & 1/4 \end{bmatrix}$, while class $w_2$ is also zero mean, with covariance matrix $\Sigma_2 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$.

Figure 5.1: Distribution of the intrapersonal and extrapersonal classes for the first 3 principal components

Assuming equal prior probabilities, i.e. $P(w_i) = 0.5$ for $i = 1, 2$,

the discriminant function is given as

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{1}{2}\ln|\Sigma_i|. \qquad (5.3)$$


The decision region is obtained by setting $g_1(\mathbf{x}) = g_2(\mathbf{x})$. Substituting the means and covariance matrices simplifies the equation to
$$\frac{x_1^2}{2.3105} + \frac{x_2^2}{0.9902} = 1. \qquad (5.4)$$


It is observed that the decision boundary is an elliptical contour centered at the origin. A cross-section of the two Gaussian distributions is shown in Fig. 5.2.

Figure 5.2: Cross-section of the 2-D Gaussian distributions

As the number of dimensions increases, the decision region becomes a hyper-ellipsoid; obtaining the decision boundary becomes non-trivial and computationally intensive. Instead, we propose an Energy Difference Classifier (EDC).
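The constants of Eqn. (5.4) can be verified numerically from the two covariance matrices: with zero means, $g_1(\mathbf{x}) = g_2(\mathbf{x})$ reduces to $\mathbf{x}^T(\Sigma_1^{-1} - \Sigma_2^{-1})\mathbf{x} = \ln(|\Sigma_2|/|\Sigma_1|)$.

```python
import numpy as np

S1 = np.diag([0.5, 0.25])   # intrapersonal-like class: small variances
S2 = np.diag([2.0, 2.0])    # extrapersonal-like class: large variances

# With zero means, g1(x) = g2(x) reduces to
#   x^T (S1^-1 - S2^-1) x = ln(|S2| / |S1|)
A = np.linalg.inv(S1) - np.linalg.inv(S2)
c = np.log(np.linalg.det(S2) / np.linalg.det(S1))

# rewrite as x1^2/a + x2^2/b = 1, the ellipse of Eqn (5.4)
a, b = c / A[0, 0], c / A[1, 1]
print(round(a, 4), round(b, 4))   # prints: 2.3105 0.9902
```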


Energy Difference Classifier

The Energy Difference Classifier works on the principle of the difference in projection energies and does not require any probability estimation. Given a test image, the intensity difference with each training image is computed and projected onto the intrapersonal and extrapersonal eigenvectors; the match is the training image which gives the minimum difference between the projection energies.

Figure 5.3: The intrapersonal (A) and extrapersonal (B) subspaces

For a test image $\mathbf{x}$ which needs to be classified, we consider an image $\mathbf{x}_k$ of a known class, $k = 1, 2, \ldots, T$ $(T = IJ)$. If $\mathbf{x}$ is from the same class as $\mathbf{x}_k$, the difference vector is from the intrapersonal set; otherwise it belongs to the extrapersonal set. The difference vector $\boldsymbol{\delta}_k = \mathbf{x}_k - \mathbf{x}$ is projected onto the dual eigenfaces to obtain
$$\mathbf{y}_k^{(in)} = W_{in}^T \boldsymbol{\delta}_k, \qquad k = 1, 2, \ldots, T \qquad (5.5)$$
$$\mathbf{y}_k^{(ex)} = W_{ex}^T \boldsymbol{\delta}_k, \qquad k = 1, 2, \ldots, T. \qquad (5.6)$$

We now compute the difference in projection energies as
$$P_{d_k} = \left\| \mathbf{y}_k^{(ex)} \right\|^2 - \left\| \mathbf{y}_k^{(in)} \right\|^2, \qquad k = 1, 2, \ldots, T. \qquad (5.7)$$
The best match for the test image $\mathbf{x}$ is found as $\mathbf{x}_m$, where $m$ is the image index which minimizes $P_{d_k}$:
$$m = \arg\min_k \{P_{d_k}\} = \arg\min_k \left\{ \left\| \mathbf{y}_k^{(ex)} \right\|^2 - \left\| \mathbf{y}_k^{(in)} \right\|^2 \right\}, \qquad k = 1, 2, \ldots, T. \qquad (5.8)$$

A two-dimensional illustration of the intrapersonal and extrapersonal subspaces is shown in Fig. 5.3: the dark meshed region is the intrapersonal subspace A, while the grey shaded region is the extrapersonal subspace B. We refer to Fig. 5.3 to explain the proposed Energy Difference Classifier (EDC). If $\boldsymbol{\delta}_k$ is from region A (the intrapersonal subspace), then the projection energy $\|\mathbf{y}_k^{(in)}\|^2$ is close to the energy of $\boldsymbol{\delta}_k$; since $A \subset B$, the same is true for the extrapersonal projection energy $\|\mathbf{y}_k^{(ex)}\|^2$. The difference in projection energies $P_{d_k}$ is therefore small for $\boldsymbol{\delta}_k \in A$. If $\boldsymbol{\delta}_k \in B$ and $\boldsymbol{\delta}_k \notin A$, then $\|\mathbf{y}_k^{(ex)}\|^2$ is large while $\|\mathbf{y}_k^{(in)}\|^2$ is small with respect to the energy of $\boldsymbol{\delta}_k$, as the intrapersonal class fails to capture the variation due to the change in subject; hence $P_{d_k}$ is large. It turns out that $\|\mathbf{y}_k^{(ex)}\|^2$ is always at least as large as $\|\mathbf{y}_k^{(in)}\|^2$; this behavior can be observed from the plots of the two projection energies and the difference between them (Fig. 5.4). It is not sufficient to consider $\|\mathbf{y}_k^{(ex)}\|^2$ or $\|\mathbf{y}_k^{(in)}\|^2$ independently; the two metrics complement each other. Minimizing $P_{d_k}$ (Eqn. (5.8)) employs both metrics and gives us the best match.

Figure 5.4: Plot of the subspace projection energies and their difference; the best match minimizes $P_{d_k}$
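The classifier itself is a few lines; a sketch under the assumption that the gallery images and the test image are given as vectors and that $W_{in}$, $W_{ex}$ come from the training step:

```python
import numpy as np

def edc_match(x, gallery, W_in, W_ex):
    """Energy Difference Classifier: return the index of the gallery image
    whose difference with test vector x minimizes Pd (Eqn 5.8)."""
    best, best_pd = -1, np.inf
    for k, xk in enumerate(gallery):
        d = xk - x                              # difference vector delta_k
        e_in = np.sum((W_in.T @ d) ** 2)        # intrapersonal projection energy
        e_ex = np.sum((W_ex.T @ d) ** 2)        # extrapersonal projection energy
        pd = e_ex - e_in                        # Eqn (5.7)
        if pd < best_pd:
            best, best_pd = k, pd
    return best
```

Since the projections are linear, in practice the gallery projections $W_{in}^T\mathbf{x}_k$ and $W_{ex}^T\mathbf{x}_k$ can be pre-computed once, as noted later in the text.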


Table 5.1: Comparison of various techniques without Log-Gabor filtering

Database   FFM-1     FFM-2     Bayes     EDC
FERET      86.79%    86.84%    86.24%    86.39%
AR         84.58%    84.76%    85.47%    86.23%
Mean       85.22%    85.36%    85.69%    86.28%


Performance of EDC

Training and testing are performed with the same image sets as given in Table 4.1. We compare the performance of the EDC with Bayesian face recognition and with the best results obtained using CCA-based fusion. Tests are performed on the shuffled sets, and training is done with two images per subject for all databases. For the FERET database we select 157 subjects to train the system, and 507 images are used for testing; a total of 157 and 303 difference vectors are used to train the intrapersonal and extrapersonal classes respectively. For the AR database 134 subjects are used for training and 1248 images for testing; a total of 134 and 266 difference vectors are used to train the intrapersonal and extrapersonal classes respectively. From Table 5.1, for equalized images a mean improvement of 1% is seen over CCA-based fusion, and a 1% improvement is observed over the Bayesian recognizer. All the training and testing images are then filtered through the 40 Log-Gabor filters and down-sampled by a factor of four in both dimensions, as discussed in Section 3.2, and the system is trained and tested with these Log-Gabor features. An improvement is observed when Log-Gabor filtered images are used; the mean efficiency of the EDC


Table 5.2: Performance on Log-Gabor filtered images

Database   FFM-1     FFM-2     Bayes     EDC
FERET      87.77%    88.22%    91.96%    91.96%
AR         85.30%    86.33%    89.53%    90.48%
Mean       86.01%    86.88%    90.23%    90.91%














Figure 5.5: Mean performance over all Databases

is as high as 90.91%, giving a minimum improvement of 4% over the CCA-based fusion methods; an improvement of 2% over the Bayesian recognizer is also observed. A bar plot of the mean performance of each technique, over all features and databases, is shown in Fig. 5.5; the EDC's performance is the best among all the techniques compared. We repeated the experiments on unequalized images: the fall in recognition rates is smallest for the EDC, whose mean efficiency decreases by at most 3%, unlike the other methods. It is thus observed that the EDC is robust to illumination changes, unlike the other methods.

Tests conducted on the FERET and AR databases show a substantial improvement in recognition rate over the benchmark PCA/LDA. At the same time, the EDC is simple and computationally efficient compared to LDA and the Bayesian recognizer. A further improvement is obtained by using down-sampled Log-Gabor filtered face images as the input to the system, boosting the mean efficiency. From Eqn. (2.27) we observe that finding a match with the Bayesian recognizer requires $T(MN + 2)(T_b^{(in)} + T_b^{(ex)} + 1)$ multiplications, where $T_b^{(in)}$ and $T_b^{(ex)}$ are the numbers of eigenvectors retained for the two classes, while the EDC requires $T(MN + 1)(p + r)$ multiplications to find a match. In the EDC we need not explicitly compute the image differences and their energies: we can directly project the test image onto the eigenvector matrices and minimize the difference energies, since the training image projections are pre-calculated. For the Bayesian recognizer all the training images must be stored, since the intensity differences with test images have to be calculated; in the EDC we store only the projections of the training images onto the two subspaces. The Bayesian recognizer is thus more complex than the EDC, in both training and testing.






In this thesis, we have proposed two subspace combination techniques for face recognition. The first method uses Canonical Correlation Analysis to combine the outputs of two feature extractors, such as PCA and LDA, to obtain the advantages of both; a boost in the recognition rates is observed compared to the individual methods. We also propose a DCT-based PCA feature extractor which performs a double dimensionality reduction: the DCT reduces the redundancy within an image, while PCA removes the redundancy across different images. This system was tested on two different databases, obtaining recognition rates as high as 87%. The second technique uses dual eigenfaces to extract features from intensity difference images. Using the projection energies in the intrapersonal and extrapersonal subspaces, an Energy Difference Classifier (EDC) is proposed to classify the difference images. The EDC is robust to illumination changes, and its fall in recognition rate is the smallest when unequalized images are used. The use of Log-Gabor filters as a pre-processor is proposed, which boosts the recognition rates above 90%. The EDC gives an improvement over Bayesian face recognition, which is computationally intensive, and also improves over the CCA-based fusion methods, with reduced complexity.


Future Work

In CCA-based fusion, combinations of feature extractors other than PCA and LDA can be used for the task of recognition. Instead of considering the whole face image, we could use features such as the eyes, nose and mouth, and perform recognition with both the EDC and the CCA-based fusion methods. For Log-Gabor features, instead of filtering the whole face we could consider the filter responses at $n$ feature points of the face (eyes, nose, mouth, etc.), reducing the feature vector length to $n \times p$, where $p$ is the number of Log-Gabor filters used.


[1] A. K. Jain, A. Ross, and S. Pankanti, "Biometrics: A tool for information security," IEEE Trans. on Information Forensics and Security, vol. 1, no. 2, pp. 125–143, June 2006.
[2] W. Zhao, R. Chellappa, P. Phillips, and A. Rosenfeld, "Face recognition: A literature survey," ACM Computing Surveys, vol. 35, no. 4, pp. 399–458, 2003.
[3] R. Chellappa, C. Wilson, and S. Sirohey, "Human and machine recognition of faces: A survey," Proceedings of the IEEE, vol. 83, no. 5, pp. 705–741, May 1995.
[4] P. Phillips, P. Grother, R. Micheals, D. Blackburn, E. Tabassi, and J. Bone, "FRVT 2002: Overview and summary," 2002 Face Recognition Vendor Test, 2003. URL:
[5] M. A. Turk and A. P. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, March 1991.
[6] D. Swets and J. Weng, "Using discriminant eigenfeatures for image retrieval," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 18, 1996.
[7] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. fisherfaces: Recognition using class specific linear projection," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, 1997.

[8] A. Martinez and A. Kak, "PCA versus LDA," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228–233, Feb. 2001.
[9] B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian face recognition," Pattern Recognition, vol. 33, no. 11, pp. 1771–1782, Nov. 2000.
[10] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. New York: Wiley Interscience, 2000.
[11] W. Zhao and R. Chellappa, "Image based face recognition, issues and methods," in B. Javidi, editor, Image Recognition and Classification, pp. 375–402, 2002.
[12] H. Hotelling, "Relation between two sets of variables," Biometrika, vol. 28, 1936.
[13] R. Gross, S. Baker, I. Matthews, and T. Kanade, "Face recognition across pose and illumination," in Handbook of Face Recognition, S. Z. Li and A. K. Jain, Eds. Springer-Verlag, June 2004.
[14] P. J. Phillips, H. Wechsler, J. Huang, and P. Rauss, "The FERET database and evaluation procedure for face recognition algorithms," Image and Vision Computing Journal, vol. 16, no. 5, 1998.
[15] P. J. Phillips, H. Moon, S. A. Rizvi, and P. Rauss, "The FERET database and evaluation procedure for face recognition algorithms," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 10, 2000.
[16] A. M. Martinez and R. Benavente, "The AR face database," CVC Technical Report, vol. 24, 1998.

[17] Yale Face Database. URL: yalefaces.html
[18] R. Fisher, "The statistical utilization of multiple measurements," Annals of Eugenics, vol. 8, 1938.
[19] K. Sayood, Introduction to Data Compression. San Francisco: Elsevier, 2003.
[20] L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of human faces," Journal of the Optical Society of America A, vol. 4, no. 3, 1987.
[21] Y. Moses, Y. Adini, and S. Ullman, "Face recognition: The problem of compensating for changes in illumination direction," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 721–732, July 1997.
[22] M. L. Teixeira, The Bayesian Intrapersonal/Extrapersonal Classifier. Fort Collins, Colorado: M.S. thesis, Colorado State University, 2003.
[23] J. Czyz, Decision Fusion in Identity Verification using Facial Images. Louvain-la-Neuve, Belgium: Ph.D. thesis, Université catholique de Louvain, 2003.
[24] B. Moghaddam and A. Pentland, "Probabilistic visual learning for object representation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, 1997.
[25] R. Gonzalez and R. Woods, Digital Image Processing. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1992.


[26] K. S. Rao and A. N. Rajagopalan, "A probabilistic fusion methodology for face recognition," EURASIP Journal on Applied Signal Processing, vol. 2005, no. 17, pp. 2772–2787, 2005.
[27] D. Field, "Relation between the statistics of natural images and the response profiles of cortical cells," Journal of the Optical Society of America A, vol. 4, no. 12, 1987.
[28] L. Wiskott, J.-M. Fellous, N. Krüger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Trans. on Pattern Analysis and Machine Intelligence, July 1997.
[29] N. Rose, "Facial expression classification using Gabor and Log-Gabor filters," in FGR '06: Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR06). Washington, DC, USA: IEEE Computer Society, 2006, pp. 346–350.
[30] P. Yao, J. Li, X. Ye, Z. Zhuang, and B. Li, "Iris recognition algorithm using modified Log-Gabor filters," in ICPR '06: Proceedings of the 18th International Conference on Pattern Recognition. Washington, DC, USA: IEEE Computer Society, 2006, pp. 461–464.
[31] S. V. Vaerenbergh, J. Via, and I. Santamaria, "Online kernel Canonical Correlation Analysis for supervised equalization of Wiener systems," in Proc. International Joint Conference on Neural Networks (IJCNN), July 16–21, 2006.
[32] W. Liu, D. Mandic, and A. Cichocki, "An analysis of the CCA approach for blind source separation and its adaptive realization," in Proc. 2006 IEEE International Symposium on Circuits and Systems (ISCAS 2006), 2006.
[33] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, "Canonical Correlation Analysis: An overview with application to learning methods," Neural Computation, vol. 16, pp. 2639–2664, 2004.
[34] O. Friman, M. Borga, P. Lundberg, and H. Knutsson, "Canonical correlation as a tool in functional MRI data analysis," in Proceedings of the SSAB Symposium on Image Analysis, March 2001.
[35] Q.-S. Sun, Z. dong Liu, P.-A. Heng, and D.-S. Xia, "A theorem on the generalized canonical projective vectors," Pattern Recognition, vol. 38, 2005.




Standard face databases are essential for testing any face recognition system. Each database is unique, containing face images captured with varying pose, illumination, and expressions. The images are also captured over a period of time; since facial features change over time, a system should be robust to such changes. For our experiments we use two standard face databases: the FERET database and the AR database.


The Face Recognition Technology (FERET) Database

The FERET database [14], [15] is one of the most widely used face databases. The FERET program ran from 1993 to 1997; it was sponsored by the Department of Defense's Counterdrug Technology Development Program through the Defense Advanced Research Projects Agency (DARPA). Its primary mission was to develop automatic face recognition capabilities that could be employed to assist security, intelligence and law enforcement personnel in the performance of their duties. The FERET database consists of 14051 eight-bit gray-scale images of human heads with views ranging from frontal to left and right profiles, with varying expressions and illumination. A total of 1,199 subjects contributed to the formation of the

database. The images are stored in Tagged Image File (TIF) format as raw 8-bit data. All the images are of size 384 × 256 pixels.



The naming convention for the FERET imagery is of the form nnnnnxxfffq_yymmdd.ext, where:

1. nnnnn is a five-digit integer that uniquely identifies the subject.

2. xx is a two-character lowercase string that indicates the kind of imagery:
fa indicates a regular frontal image
fb indicates an alternative frontal image, taken seconds after the corresponding fa
ba is a frontal image entirely analogous to the fa series
bj is an alternative frontal image, corresponding to a ba image, and analogous to the fb image
bk is also a frontal image corresponding to ba, but taken under different lighting
bb through bi is a series of images taken with the express intention of investigating pose-angle effects; specifically, bf through bi are symmetric analogues of bb through be
ra through re are random orientations; their precise angle is unknown

3. fff is a set of three binary (zero or one) single-character flags. In order these denote:


Whether the image is releasable for publication (this flag has fallen into disuse).
Whether the image is histogram adjusted (1 if adjusted).
Whether the image was captured using ASA 200 or 400 film; 0 implies 200, while 1 implies 400.

4. q is a modifier that is not always present. When it is, the meanings are as follows:
a : Glasses worn. Note that this flag is a sufficient condition only; images of subjects wearing glasses do not necessarily carry this flag
b : Duplicate with different hair length
c : Glasses worn and different hair length
d : Electronically scaled (resized) and histogram adjusted
e : Clothing has been electronically retouched
f : Image brightness has been reduced by 40%
g : Image brightness has been reduced by 80%
h : Image size has been reduced by 10%, with white border replacement
i : Image size has been reduced by 20%, with white border replacement
k : Image size has been reduced by 30%, with white border replacement

5. yymmdd is the date on which the picture was taken, in year, month, day format.

6. The filename extension is .TIF.

As the database contains a large number of images with large variations in characteristics, the FERET group segregated it into four groups, given as

Fa: Regular frontal images under normal conditions, with a total of 1196 face images.
FaFb: Alternative frontal images taken seconds after the corresponding Fa images; a total of 1195 images are present.
Duplicate I: Obtained anywhere between one minute and 1021 days after the FaFb images were taken. A total of 272 images were captured under this category.
Duplicate II: A strict subset of the Duplicate I images, taken at least 18 months after the Fa images were captured. This group contains 234 face images.
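The naming convention described above can be captured in a short parser. This is a sketch only: the underscore before the date and the lowercase extension are assumptions based on the description, not verified against the database itself.

```python
import re

# Sketch parser for the FERET convention nnnnnxxfffq_yymmdd.ext
# (e.g. "00001fa010_930831.tif"); separators are assumed.
FERET_RE = re.compile(
    r"(?P<subject>\d{5})"      # nnnnn: unique subject id
    r"(?P<kind>[a-z]{2})"      # xx: image kind (fa, fb, ba, ...)
    r"(?P<flags>[01]{3})"      # fff: release / histogram / film flags
    r"(?P<modifier>[a-k]?)"    # q: optional modifier
    r"_(?P<date>\d{6})"        # yymmdd: capture date
    r"\.(?P<ext>\w+)$"
)

def parse_feret_name(name):
    """Split a FERET-style filename into its fields, or None if the
    name does not follow the assumed convention."""
    m = FERET_RE.match(name)
    return m.groupdict() if m else None
```

For example, a name like "00002fb001a_940307.tif" would decompose into subject 00002, kind fb, flags 001, modifier a, and date 940307.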


The AR Database

This face database was created by Aleix Martinez and Robert Benavente at the Computer Vision Center (CVC) of the Universitat Autònoma de Barcelona [16]. The database contains over 4,000 color images corresponding to the faces of 136 people (76 men and 60 women). The images feature frontal-view faces with different facial expressions, illumination conditions, and occlusions (sun glasses and scarf). The pictures were taken at the CVC under strictly controlled conditions. Each person participated in two sessions, separated by two weeks (14 days). The same pictures were taken in both sessions. All images are stored as RGB RAW files. They are of size 768 × 576 pixels with 24 bits of depth. For each subject, 13 pictures were photographed in each session. Images belonging to the first session are


indexed as 1 to 13, while those in the second session are indexed as 14 to 26. The nomenclature for the AR database is as follows. Male images are stored as m-xxx-yy.raw, while female images are stored as f-xxx-yy.raw, where xxx is a unique person identifier (from 001 to 076 for males and from 001 to 060 for females) and yy specifies the features of each image, with the following meanings:
01 : Neutral expression
02 : Smile
03 : Anger
04 : Scream
05 : Left light on
06 : Right light on
07 : All side lights on
08 : Wearing sun glasses
09 : Wearing sun glasses and left light on
10 : Wearing sun glasses and right light on
11 : Wearing scarf
12 : Wearing scarf and left light on
13 : Wearing scarf and right light on
14 - 26 : Second session (same conditions as 1 to 13)
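The AR naming scheme above can likewise be decoded with a small helper. The hyphen separator is an assumption for this sketch; the condition table follows the list above, with indices 14 to 26 mapped back to their first-session counterparts.

```python
import re

AR_CONDITIONS = {
    1: "Neutral expression", 2: "Smile", 3: "Anger", 4: "Scream",
    5: "Left light on", 6: "Right light on", 7: "All side lights on",
    8: "Wearing sun glasses", 9: "Wearing sun glasses and left light on",
    10: "Wearing sun glasses and right light on", 11: "Wearing scarf",
    12: "Wearing scarf and left light on",
    13: "Wearing scarf and right light on",
}

def parse_ar_name(name):
    """Decode an AR filename (e.g. 'm-003-11.raw') into sex, person id,
    session number, and capture condition; None if it does not match."""
    m = re.match(r"([mf])-(\d{3})-(\d{2})\.raw$", name)
    if not m:
        return None
    sex, person, idx = m.group(1), int(m.group(2)), int(m.group(3))
    session = 1 if idx <= 13 else 2
    # Indices 14-26 repeat conditions 1-13 in the second session.
    condition = AR_CONDITIONS[idx if idx <= 13 else idx - 13]
    return {"sex": sex, "person": person,
            "session": session, "condition": condition}
```

For instance, "f-010-16.raw" would decode to female subject 010, second session, "Anger".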



1. Amit C. Kale and R. Aravind, Face Recognition using Canonical Correlation Analysis, Proc. of the Thirteenth National Conference on Communications, Kanpur, Jan. 2007.