
Face Recognition using Orthogonal Weighted Locally Linear Discriminant Embedding

Hadiseh Ghafari Mejlej
Department of Computer Engineering
Shahid Bahonar University of Kerman
Kerman, Iran
ghafari.hadiseh@gmail.com

Majid Mohammadi
Department of Computer Engineering
Shahid Bahonar University of Kerman
Kerman, Iran
mjmohamadi@yahoo.com

Abstract— In this paper, an efficient feature extraction method called Orthogonal Weighted Locally Linear Discriminant Embedding (OWLLDE) is proposed for face recognition. The OWLLDE algorithm is motivated by the locally linear embedding (LLE) algorithm, the modified maximizing margin criterion (MMMC), and the cam weighted distance. In OWLLDE, the LLE algorithm is modified based on the weighted distance measurement to select more suitable neighbors for each datum. In this way, the performance of OWLLDE in feature extraction is improved for deformed distributed data. Moreover, OWLLDE preserves the local geometry structure of the data based on the modified LLE and also makes full use of class information to improve the discriminant ability through a vector translation and rescaling model. Finally, to improve the recognition accuracy, we use Gram-Schmidt orthogonalization to obtain orthogonal basis vectors. The results of experiments on the ORL and Yale databases show the superior performance of OWLLDE.

Keywords— cam weighted distance; feature extraction; locally linear discriminant embedding; manifold learning.

I. INTRODUCTION

In recent decades, face recognition has received a lot of attention in various applications such as video coding, human-computer interfaces, and surveillance [1], [2]. Appearance-based face recognition has been studied widely since the 1990s.

Two central issues in appearance-based face recognition are feature extraction for face representation and image classification. In appearance-based techniques, an m-by-n pixel face image can be viewed as an mn-dimensional vector. This approach encounters problems in the cases where there are only a small number of high-dimensional samples. Feature extraction methods can solve this problem by reducing the dimensions. The aim of feature extraction methods is to project the high-dimensional data into low-dimensional feature spaces. These methods can be classified into two categories based on whether or not they use class information: supervised and unsupervised methods. They can also be divided into linear and nonlinear methods. Linear methods obtain the low-dimensional space from the high-dimensional one using a linear transformation. The most well-known linear methods are Principal Component Analysis (PCA) [3] and Linear Discriminant Analysis (LDA) [4].

Linear methods fail in the cases where the data have a nonlinear distribution. To overcome this problem, nonlinear methods such as kernel-based methods and manifold-based ones were presented for feature extraction. In kernel-based methods, the data are first mapped into a space of higher dimension, and then linear methods are used for dimension reduction in this new space. Kernel-PCA (KPCA) [5] and kernel-LDA (KLDA) [6] are two well-known methods that can be considered as kernel versions of PCA and LDA.

Unlike kernel-based methods, the manifold learning-based methods are based on the idea that the data points are actually samples from a low-dimensional manifold that is embedded in a high-dimensional space. Methods such as isometric feature mapping (ISOMAP) [7], Locally Linear Embedding (LLE) [8], [9], Laplacian Eigenmap (LE) [10], [11] and Local Tangent Space Alignment (LTSA) [12] can be classified as the most well-known manifold-based methods.

LLE is designed to maintain the local linear reconstruction relationship among neighboring points in the low-dimensional space. However, LLE has limitations and is not directly suitable for face recognition. It only produces an embedding of the training data points and does not provide a method for mapping new data points that do not exist in the training set, which is the well-known out-of-sample problem. Another limitation is that LLE is an unsupervised algorithm; this characteristic can impair the recognition accuracy. Furthermore, the LLE algorithm depends on a distance measure, so the performance of the method relies on the choice of an appropriate measure. To address this, the cam weighted distance was proposed in [13] to improve nearest neighbor finding. The cam weighted distance yields deflective, cam-shaped equal-distance contours in classification, as mentioned and shown in [13, Fig. 1]. Since the samples are not isolated instances, the inter-prototype relationship should not be neglected. As a result, to globally improve the distance measure, we should consider, for each sample, both the variance with its own orientation and the discrimination with respect to its different surroundings.

Bo Li et al. proposed locally linear discriminant embedding (LLDE) for face recognition [2].



In their proposed algorithm, a vector translation and distance rescaling model was constructed to enhance the recognition accuracy of the original LLE in two ways. One is the property of the objective function of LLDE, which tries to preserve the objective function of LLE; the other is the transformation process to maximize the modified maximizing margin criterion (MMC) [14], [15].

In this paper, we propose a new dimensionality reduction method called Orthogonal Weighted Locally Linear Discriminant Embedding (OWLLDE). The proposed method modifies LLDE based on the weighted distance to improve the performance of feature extraction in the cases where the data distribution is deformed. In these cases, the performance of the Euclidian distance for measuring similarity decreases. By considering the distribution of the information surrounding each datum to optimize the distance measure, we can improve the procedure of selecting neighbors in LLDE and avoid the redundancy and overlapping caused by improper neighbor choice. Finally, to improve the discriminating power, we use Gram-Schmidt orthogonalization to obtain orthogonal basis vectors.

The rest of this paper is organized as follows. Section II introduces the cam weighted distance used in the following sections to modify LLDE. Section III briefly reviews the LLDE algorithm. In Section IV, our algorithm is presented in detail. Experimental results on the Yale and ORL databases are given in Section V. Finally, conclusions are presented in Section VI.

II. THE CAM WEIGHTED DISTANCE

The nearest neighbor method assumes a standard normal distribution on the data, so that each datum can be considered as the center of a probability distribution. It also uses the Euclidian distance to measure the similarity between a datum and its neighbors. However, because of the attraction, repulsion, strengthening and weakening effects between data, the standard normal distributions will be greatly deformed. So, if we ignore the deformation of the data and use the Euclidian distance for measuring similarity, the performance will be considerably reduced.

To solve this problem, we can use the weighted distance measurement proposed in [13] to obtain the nearest neighbors of a datum. The main idea of the weighted distance criterion is to give a suitable but different distance scale to each datum, to make the distance measure more reasonable for representing the global distribution of the data set.

Definition 1 (Cam distribution). Consider an m-dimensional random vector $Z = (z_1, \ldots, z_m)^{\mathsf T}$ that takes a standard m-dimensional normal distribution $N(0, I)$, that is, it has a probability density function

$p(z) = (2\pi)^{-m/2} \exp\left(-\|z\|^2 / 2\right)$.   (1)

Let a random vector $X$ be defined by the transformation

$X = \left(a + b\,\frac{Z^{\mathsf T}\vec{n}}{\|Z\|}\right) Z$,   (2)

where $a \ge 0$ and $b \ge 0$ are the parameters reflecting the overall scale and orientation of the distribution, $\vec{n}$ is a normalized vector denoting the deformation orientation, $a > \sqrt{2}\,b$, and $\mathrm{Cam}(a, b, \vec{n})$ represents the cam distribution with parameters $a$ and $b$ in the direction $\vec{n}$, denoted as $X \sim \mathrm{Cam}(a, b, \vec{n})$ [13].

Theorem 1. If a random vector $X \sim \mathrm{Cam}(a, b, \vec{n})$, then $E(\|X\|) = C_1 \cdot a$ and $E(X) = C_2 \cdot b \cdot \vec{n}$, where $C_1$ and $C_2$ are constants:

$C_1 = \sqrt{2}\,\Gamma\!\left(\tfrac{m+1}{2}\right) \big/ \Gamma\!\left(\tfrac{m}{2}\right)$,   (3)

$C_2 = \frac{\sqrt{2}}{m}\,\Gamma\!\left(\tfrac{m+1}{2}\right) \big/ \Gamma\!\left(\tfrac{m}{2}\right)$.   (4)

$\Gamma(\cdot)$ is the Gamma function, $\Gamma(s) = \int_0^{\infty} t^{s-1} e^{-t}\,dt$ $(s > 0)$ [13].

The cam distribution is obtained from a standard normal distribution transformed by (2), which makes it an eccentric distribution that biases towards a given direction. Therefore, the Euclidian distance is not suitable for describing the similarity between such data. To measure the similarity, it is possible to use the inverse transformation, $Z = X \big/ \left(a + b\,\frac{X^{\mathsf T}\vec{n}}{\|X\|}\right)$, to eliminate the deformation and then compute the distance. In this case the resulting distribution is no longer eccentric. Thus, we obtain the cam weighted distance to redress the deformation.

Definition 2 (Cam weighted distance). Assume $p$ is the center of a cam distribution $\mathrm{Cam}(a, b, \vec{n})$. The cam weighted distance from a point $x$ to $p$ is defined to be [13]

$\mathrm{CamDist}(x, p) = \frac{\|x - p\|}{a + b\,\dfrac{(x - p)^{\mathsf T}\vec{n}}{\|x - p\|}}$.   (5)
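To make Definition 2 concrete, the following NumPy sketch evaluates (5) directly. The function name and signature are ours, not from [13], and the cam parameters (a, b, n) are assumed to have been estimated beforehand, as described next.

```python
import numpy as np

def cam_weighted_distance(x, p, a, b, n):
    """Cam weighted distance from point x to prototype p, Eq. (5).

    A sketch of the definition in [13] with our own naming;
    (a, b, n) are the cam parameters estimated at p."""
    d = x - p
    r = np.linalg.norm(d)
    if r == 0.0:
        return 0.0
    cos_theta = np.dot(d, n) / r      # cosine of the angle between (x - p) and n
    return r / (a + b * cos_theta)    # the constraint on a, b keeps this positive
```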
The expectation of a random variable is equal to the sum, over each possible outcome, of its probability multiplied by its value. Assuming that the random vector has a cam distribution, we can approximate its expected value using the origin point of the distribution, $x_i$, and its $k'$ nearest neighbors $q_1, \ldots, q_{k'}$, and then estimate the parameters of the cam distribution, namely $a$, $b$ and $\vec{n}$.

To do this, the neighbors should first be converted to the set of vectors $Y = \{y_1, \ldots, y_{k'}\}$, where $y_j = q_j - x_i$, $j = 1, \ldots, k'$. For a given datum $x_i$, its $k'$ nearest neighbors may belong to different classes, so these neighbors should not be used directly to estimate the parameters. Assume $c$ is the label of $x_i$ and $c_j$ is the label of the $j$th neighbor, $j = 1, \ldots, k'$. We convert $y_j$ to $y'_j$ according to:

$y'_j = \begin{cases} y_j, & c_j = c \\ -y_j, & c_j \ne c \end{cases}$   (6)

Then, we calculate the mean of the $\|y'_j\|$ and of the $y'_j$,

$\bar{r} = \frac{1}{k'} \sum_{j=1}^{k'} \|y'_j\|$,   (7)

$\bar{y} = \frac{1}{k'} \sum_{j=1}^{k'} y'_j$,   (8)

to estimate $E(\|X\|)$ and $E(X)$, respectively.

According to Theorem 1, and with the approximations of the expectations $E(\|X\|) \approx \bar{r}$ and $E(X) \approx \bar{y}$, we can compute the parameters $a$, $b$ and $\vec{n}$:

$a = \bar{r} / C_1$,   (9)

$b = \|\bar{y}\| / C_2$,   (10)

$\vec{n} = \bar{y} / \|\bar{y}\|$.   (11)
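A small sketch of this estimation procedure under our reading of (6)-(11); the helper name is hypothetical, and reflecting opposite-class neighbors through the center is our interpretation of (6).

```python
import numpy as np
from scipy.special import gammaln

def estimate_cam_parameters(xi, neighbors, labels, label_i):
    """Estimate (a, b, n) of the cam distribution centred at xi from its
    k' nearest neighbors, Eqs. (6)-(11). Assumes the mean vector y_bar is
    nonzero so that the orientation n is defined."""
    m = xi.shape[0]
    Y = neighbors - xi                              # y_j = q_j - x_i
    signs = np.where(labels == label_i, 1.0, -1.0)  # Eq. (6): flip different-class vectors
    Yp = Y * signs[:, None]
    r_bar = np.linalg.norm(Yp, axis=1).mean()       # Eq. (7): approximates E(||X||)
    y_bar = Yp.mean(axis=0)                         # Eq. (8): approximates E(X)
    # Theorem 1 constants, Eqs. (3)-(4); gammaln keeps large m numerically stable
    c1 = np.sqrt(2.0) * np.exp(gammaln((m + 1) / 2.0) - gammaln(m / 2.0))
    c2 = c1 / m
    a = r_bar / c1                                  # Eq. (9)
    b = np.linalg.norm(y_bar) / c2                  # Eq. (10)
    n = y_bar / np.linalg.norm(y_bar)               # Eq. (11)
    return a, b, n
```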

III. LOCALLY LINEAR DISCRIMINANT EMBEDDING

Given a set of samples $X = [x_1, x_2, \ldots, x_N]$ in which the $x_i$ are sampled from a manifold, LLE characterizes the local geometry structure of the data by linear coefficients that reconstruct each sample $x_i$ from its nearest neighbors, $i = 1, 2, \ldots, N$. The weight matrix $W$ of the samples can be calculated by minimizing the following cost function:

$\varepsilon(W) = \sum_i \left\| x_i - \sum_j W_{ij} x_j \right\|^2$   (12)

with the constraints $\sum_j W_{ij} = 1$ and $W_{ij} = 0$ if $x_i$ and $x_j$ are not neighbors.

LLE computes the best low-dimensional embedding $Y$ based on the obtained weight matrix $W$. This corresponds to minimizing the following cost function:

$\varepsilon(Y) = \sum_i \left\| y_i - \sum_j W_{ij} y_j \right\|^2$   (13)
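In practice, (12) decouples into one small linear system per sample, solved over the local Gram matrix of its neighborhood. A standard sketch of this step follows; the regularization term is our addition for numerical stability, not part of the original formulation.

```python
import numpy as np

def lle_weights(X, neighbor_idx, reg=1e-3):
    """Reconstruction weights W of Eq. (12): for each x_i, solve the local
    Gram system over its neighbors and normalize the row to sum to one [8].
    X: (N, D) array of samples; neighbor_idx[i]: indices of x_i's neighbors."""
    N = X.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        idx = np.asarray(neighbor_idx[i])
        Z = X[idx] - X[i]                          # shift neighbors so x_i is at the origin
        G = Z @ Z.T                                # local Gram matrix
        G += reg * np.trace(G) * np.eye(len(idx))  # regularize when k exceeds local dimension
        w = np.linalg.solve(G, np.ones(len(idx)))
        W[i, idx] = w / w.sum()                    # enforce sum_j W_ij = 1
    return W
```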
To overcome the out-of-sample problem, a linear transformation, i.e. $Y = V^{\mathsf T} X$ or $y_i = V^{\mathsf T} x_i$, was plugged into the cost function (13), where $V = [v_1, v_2, \ldots, v_d]$ is a transformation matrix. By simple algebraic manipulation, the objective function (13) can be reduced to

$\varepsilon(Y) = \sum_i \left\| V^{\mathsf T} x_i - \sum_j W_{ij} V^{\mathsf T} x_j \right\|^2 = \operatorname{tr}\!\left(V^{\mathsf T} X M X^{\mathsf T} V\right)$,   (14)

where $M = (I - W)^{\mathsf T}(I - W)$, $X = [x_1, x_2, \ldots, x_N]$, and $I$ is an identity matrix.

Equation (14) is an objective of LLDE. However, this objective only emphasizes the local geometry structure of the data. In order to enhance the discriminant power, a modified maximizing margin criterion was combined into the objective of LLE. LLDE attempts to solve the following constrained optimization problem:

$\min_V\ \operatorname{tr}\!\left(V^{\mathsf T}\!\left(X M X^{\mathsf T} - \mu\,(S_b - S_w)\right)V\right)$   (15)
$\text{s.t.}\ V^{\mathsf T} V = I$,

where $S_b = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^{\mathsf T}$ and $S_w = \sum_{i=1}^{c} \sum_{x_j \in X_i} (x_j - m_i)(x_j - m_i)^{\mathsf T}$. $m_i$ is the average vector of the $i$th class $X_i$, $m_i = (1/n_i)\sum_{x_j \in X_i} x_j$; $m$ is the average vector of all samples, $m = (1/N)\sum_{j=1}^{N} x_j$. $n_i$ is the number of samples in the $i$th class, and $X_i$ and $\mu$ are the set of samples in the $i$th class and a positive constant, respectively.
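Under this reading, (15) reduces to an ordinary symmetric eigenproblem: the columns of $V$ are the eigenvectors of $X M X^{\mathsf T} - \mu(S_b - S_w)$ associated with the smallest eigenvalues. A sketch with our own function and variable names:

```python
import numpy as np

def llde_projection(X, labels, W, mu, d):
    """Solve Eq. (15): keep the d eigenvectors of XMX^T - mu*(S_b - S_w)
    with the smallest eigenvalues. X: (D, N), samples stored as columns."""
    D, N = X.shape
    M = (np.eye(N) - W).T @ (np.eye(N) - W)        # M from Eq. (14)
    m_all = X.mean(axis=1, keepdims=True)
    Sb = np.zeros((D, D))
    Sw = np.zeros((D, D))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sb += Xc.shape[1] * (mc - m_all) @ (mc - m_all).T   # interclass scatter
        Sw += (Xc - mc) @ (Xc - mc).T                       # intraclass scatter
    A = X @ M @ X.T - mu * (Sb - Sw)
    A = (A + A.T) / 2.0                            # symmetrize against round-off
    vals, vecs = np.linalg.eigh(A)                 # eigenvalues in ascending order
    return vecs[:, :d]                             # d smallest -> projection matrix V
```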
IV. ORTHOGONAL WEIGHTED LOCALLY LINEAR DISCRIMINANT EMBEDDING

The LLE step in LLDE finds neighbors using k-nearest neighbors (KNN), or by choosing the neighbors within a ball of fixed radius (ε-neighborhoods), based on the Euclidean distance for each data point in the given data set, under the assumption that the samples are well-distributed. As mentioned in [8], the data set should be sufficient and well-sampled. If there are not enough available data for the problem (loss of the well-sampled property), the performance of the LLDE algorithm will decline.

Fig. 1. Choosing nearest neighbors using the ε-neighborhoods algorithm with the Euclidian distance (solid line) and the weighted distance (dashed line) [16].

For example, as shown in Fig. 1, the samples are not well-distributed; the test point is marked by a cross, and its neighbors are marked by circles. If the ε-neighborhoods algorithm based on the standard Euclidean distance measurement is used to find the nearest neighbors of a test point in a dataset with a deformed distribution, it chooses neighbors from a single direction, and these neighbors are closely gathered. If these neighbors are used to reconstruct the test point by linear coefficients, the information captured in this direction will have serious redundancy; at the same time, no information from other directions is preserved for the reconstruction of the test point. These chosen neighbors cannot represent and reconstruct the test point well.

In order to solve this problem, the weighted distance can measure similarity better than the standard Euclidian distance in the cases in which the data do not have a suitable distribution. Fig. 1 shows the advantage of the weighted distance measurement. The modified ε-neighborhoods method based on the weighted distance measurement selects neighbors more reasonably than the one based on the standard Euclidean distance, by giving a smaller weight scaling to the data with high density and a larger weight scaling to the data with low density.

Hence, we propose the new dimensionality reduction algorithm called OWLLDE, which uses the weighted distance measurement and Gram-Schmidt orthogonalization to improve the performance of dimension reduction, especially for deformed distributed data.

The proposed algorithm uses the weighted distance for each sample to determine its nearest neighbors. In addition, OWLLDE obtains orthogonal basis vectors, which can preserve the metric structure of the data. The orthogonal basis vectors can be computed using Gram-Schmidt orthogonalization. We set $u_1 = v_1$ and assume that the $k-1$ orthogonal basis vectors $u_1, \ldots, u_{k-1}$ are available. The $k$th vector is calculated as

$u_k = v_k - \sum_{i=1}^{k-1} \frac{u_i^{\mathsf T} v_k}{u_i^{\mathsf T} u_i}\, u_i$.   (16)

It is easy to check that the vectors $u_1, \ldots, u_k$ are orthogonal; we can then re-normalize them to obtain orthonormal basis vectors. The main procedure of OWLLDE is illustrated in Table I.
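A direct sketch of this orthogonalization step. Since each $u_i$ is normalized as soon as it is produced, the denominator $u_i^{\mathsf T} u_i$ of (16) drops out of the projection.

```python
import numpy as np

def gram_schmidt(V):
    """Orthogonalize the columns of V by Eq. (16) and re-normalize,
    as in the final step of OWLLDE (an unpivoted, classical sketch)."""
    U = np.zeros_like(V, dtype=float)
    for k in range(V.shape[1]):
        u = V[:, k].copy()
        for i in range(k):                      # subtract projections onto u_1..u_{k-1}
            u -= (U[:, i] @ V[:, k]) * U[:, i]  # U columns are already unit length
        U[:, k] = u / np.linalg.norm(u)         # re-normalize to an orthonormal basis
    return U
```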
TABLE I. ORTHOGONAL WEIGHTED LOCALLY LINEAR DISCRIMINANT EMBEDDING.

Input: a set of high-dimensional data $x_i$, $i = 1, 2, \ldots, N$, with class labels, and a constant $k$.

Phase 1: Estimate the cam parameters
1. For each sample $x_i$, find its $k'$ nearest neighbors $q_1, \ldots, q_{k'}$ by comparing the Euclidian distances between all data and $x_i$.
2. Obtain $Y'$ with its elements $y'_j$ by (6), $j = 1, \ldots, k'$.
3. Calculate $\bar{r}$ and $\bar{y}$ by (7) and (8), respectively.
4. Calculate $a$, $b$, $\vec{n}$ by (9), (10) and (11), respectively.

Phase 2: Find neighbors
1. Calculate the weighted distance of $x_i$ to all data by (5).
2. Find the $k$ nearest neighbors $q_1, q_2, \ldots, q_k$ of $x_i$, i.e. the $k$ data points with the smallest cam weighted distances to $x_i$.

Phase 3: Compute the reconstruction weights
1. For each sample $x_i$, compute the reconstruction weights using (12).

Phase 4: Calculate the low-dimensional embedding
1. Calculate the interclass scatter $S_b$ and the intraclass scatter $S_w$ and their weighted difference $\mu(S_b - S_w)$.
2. Compute $M = (I - W)^{\mathsf T}(I - W)$.
3. Construct $X M X^{\mathsf T} - \mu(S_b - S_w)$.
4. Calculate the eigenvalues and eigenvectors of $X M X^{\mathsf T} - \mu(S_b - S_w)$; the basis vectors correspond to the eigenvectors associated with the smallest eigenvalues.
5. Use Gram-Schmidt orthogonalization to obtain orthogonal basis vectors.
V. EXPERIMENTAL RESULTS
In this section, a set of experiments is conducted to verify the effectiveness of OWLLDE by comparing it with some well-known methods, namely PCA, LLE, LPP [17], [18], NPE [19], LDE [20] and LLDE, on two well-known face databases: ORL [21] and Yale [22]. The nearest neighborhood parameter for constructing the nearest neighbor graph in LLE, LPP, NPE, LDE, LLDE and OWLLDE is set to $l - 1$, where $l$ denotes the number of training samples per class. Finally, a nearest neighbor classifier with the Euclidean distance is employed for classification.

In all the experiments, all images are gray scale and were cropped and resized to a resolution of 64 × 64 pixels. We applied PCA to the face images in order to reduce the computational complexity and noise and to avoid the "small sample problem" situation. The dimensionality of the PCA step is $N - c$, where $c$ is the number of classes.

A. Face Database

In the ORL database, there are 400 face images of 40 persons (each one has ten images). The images were captured at different times and have different variations, including expressions (open or closed eyes, smiling or non-smiling) and facial details (glasses or no glasses). 10 sample images of one person are displayed in Fig. 2.

Fig. 2. Images of one person in ORL.

The Yale database contains 165 gray scale images of 15 persons. The images demonstrate variations in lighting condition and facial expression (normal, happy, sad, sleepy, surprised, and wink). 11 sample images of one individual in the Yale database are displayed in Fig. 3.

Fig. 3. Images of one person in Yale.

B. Recognition Rate versus Feature Dimension

In this section, 5 images of each person are selected for training, while the remaining images are used for testing, on the ORL and Yale face databases. The weight matrix used in LPP and LDE is obtained using the cosine similarity. The coefficient $\mu$ is set to 100. The experiments were repeated 20 times to reduce random effects on the results.

The maximal average recognition rates on the ORL face database for PCA, LLE, LPP, NPE, LDE, LLDE and OWLLDE are 92.7%, 91.7%, 77.5%, 64.37%, 92.35%, 90.47% and 94.82%, respectively. Fig. 4 shows the recognition rates of the existing methods for different feature dimensionalities on the ORL database.

Fig. 4. Average recognition rate versus dimension on the ORL database.

The maximal average recognition rates on the Yale face database for PCA, LLE, LPP, NPE, LDE, LLDE and OWLLDE are 77.22%, 74.72%, 71.89%, 61.17%, 84.89%, 95.39% and 96.28%, respectively. Fig. 5 shows a curve of the recognition rates of the different methods for different feature dimensionalities on the Yale database.
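The evaluation protocol used throughout this section can be summarized by the following sketch; the split and scoring conventions follow our reading of the text, and project_fn stands for any of the compared methods (our naming).

```python
import numpy as np

def evaluate(X, labels, n_train, project_fn, dims, repeats=20, seed=0):
    """Protocol sketch: random per-class splits, a 1-NN classifier with the
    Euclidean distance, averaged over `repeats` runs. The reported score is
    the maximal average rate over the candidate feature dimensions `dims`.
    project_fn(X_train, y_train, d) -> (D, d) projection; labels: np.array."""
    rng = np.random.default_rng(seed)
    rates = np.zeros((repeats, len(dims)))
    for r in range(repeats):
        train = np.zeros(len(labels), dtype=bool)
        for c in np.unique(labels):                      # n_train samples per class
            idx = rng.permutation(np.where(labels == c)[0])
            train[idx[:n_train]] = True
        for j, d in enumerate(dims):
            V = project_fn(X[train], labels[train], d)
            Ztr, Zte = X[train] @ V, X[~train] @ V
            # 1-NN with Euclidean distance in the embedded space
            dist = np.linalg.norm(Zte[:, None, :] - Ztr[None, :, :], axis=2)
            pred = labels[train][np.argmin(dist, axis=1)]
            rates[r, j] = np.mean(pred == labels[~train])
    avg = rates.mean(axis=0)
    return dims[int(np.argmax(avg))], avg.max()          # best dimension, best rate
```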


Fig. 5. Average recognition rate for different feature dimensions on the Yale database.

TABLE III. THE MAXIMAL AVERAGE RECOGNITION RATES (%) AND THE CORRESPONDING STANDARD DEVIATIONS WITH THE REDUCED DIMENSIONS ON THE YALE DATABASE.

Method  | 3 Train         | 4 Train         | 5 Train         | 6 Train
PCA     | 74.92±3.54 (28) | 77.67±2.65 (50) | 77.22±3.94 (48) | 78.87±4.11 (71)
LLE     | 72.5±3.92 (24)  | 75.38±3.64 (34) | 74.72±7.48 (42) | 78.07±4.08 (46)
LPP     | 67.5±5.26 (30)  | 71.05±4.01 (50) | 71.89±3.83 (45) | 75.93±3.41 (50)
NPE     | 69.21±7.84 (28) | 76.86±5.63 (50) | 61.17±6.92 (50) | 71.33±7.63 (74)
LDE     | 73.62±6.49 (22) | 83.29±3.64 (27) | 84.89±7.17 (24) | 87.73±6.12 (27)
LLDE    | 92.42±2.7 (15)  | 95.19±2.07 (15) | 95.39±2.28 (15) | 96.87±1.54 (15)
OWLLDE  | 93.79±2.35 (14) | 95.48±1.85 (14) | 96.28±1.66 (14) | 97.8±1.59 (18)

From the experimental results, we can see that OWLLDE achieves the highest recognition rate compared with the existing methods.

C. Recognition Rate with Different Numbers of Training Samples

In this section, we investigate the effect of the number of training samples on the recognition rate obtained by the different methods. First, 3, 4 and 5 images of each person from the ORL face database are randomly chosen to form the ORL training set. From the Yale face database, we randomly selected 3, 4, 5 and 6 images from each person for training. The rest of each database is considered as the test set. We repeated the experiments 20 times and calculated the maximal average recognition rates, the corresponding dimensions, and the standard deviations of the different methods. The experimental results on the ORL and Yale databases are shown in Tables II and III, respectively.

As can be seen, the performance of all methods improves significantly when the number of training samples increases. It is easy to see that our proposed algorithm performs better than the other existing methods.

TABLE II. THE MAXIMAL AVERAGE RECOGNITION RATES (%) AND THE CORRESPONDING STANDARD DEVIATIONS WITH THE REDUCED DIMENSIONS ON THE ORL DATABASE.

Method  | 3 Train          | 4 Train           | 5 Train
PCA     | 84.71±2.15 (80)  | 89.5±2.0052 (115) | 92.7±1.7 (152)
LLE     | 81.54±3.13 (69)  | 85.69±5.35 (105)  | 91.7±2.36 (117)
LPP     | 67.84±2.23 (79)  | 72.71±3.21 (92)   | 77.5±2.76 (110)
NPE     | 59.77±5.28 (67)  | 60.79±5.58 (120)  | 64.37±5.08 (160)
LDE     | 75.07±5.62 (54)  | 87.87±3.05 (57)   | 92.35±1.6 (66)
LLDE    | 83.7±2.65 (40)   | 88.31±2.15 (40)   | 90.47±1.47 (39)
OWLLDE  | 90.7±1.56 (52)   | 93.1±1.75 (56)    | 94.82±1.33 (52)

VI. CONCLUSIONS

In this paper, we presented a new dimensionality reduction method for face recognition called OWLLDE, which optimizes the distance measure to find more suitable neighbors, especially for deformed distributed data, and uses Gram-Schmidt orthogonalization to obtain orthogonal basis vectors. As a result, it improves the performance of dimension reduction, especially for deformed distributed data. Experimental results on the ORL and Yale face databases demonstrate the effectiveness of the proposed method.

REFERENCES

[1] X. Chen and J. Zhang, "A novel maximum margin neighborhood preserving embedding for face recognition," Future Generation Computer Systems, vol. 28, no. 1, pp. 212-217, January 2012.
[2] B. Li, C. H. Zheng, and D. S. Huang, "Locally linear discriminant embedding: An efficient method for face recognition," Pattern Recognition, vol. 41, no. 12, pp. 3813-3821, December 2008.
[3] M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces," in Proc. IEEE Computer Vision and Pattern Recognition Conf., 1991, pp. 586-591.
[4] A. M. Martinez and A. C. Kak, "PCA versus LDA," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, February 2001.
[5] K. I. Kim, K. Jung, and H. J. Kim, "Face recognition using kernel principal component analysis," IEEE Signal Processing Letters, vol. 9, no. 2, pp. 40-42, 2002.
[6] M. H. Yang, "Kernel eigenfaces vs. kernel Fisherfaces: Face recognition using kernel methods," in Proc. IEEE Int. Conf. Automatic Face and Gesture Recognition, 2002, pp. 215-220.
[7] J. B. Tenenbaum, V. De Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319-2323, 2000.
[8] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323-2326, 2000.
[9] L. K. Saul and S. T. Roweis, "Think globally, fit locally: unsupervised learning of low dimensional manifolds," The Journal of Machine Learning Research, vol. 4, pp. 119-155, 2003.
[10] M. Belkin and P. Niyogi, "Laplacian eigenmaps and spectral techniques for embedding and clustering," Advances in Neural Information Processing Systems, vol. 14, pp. 585-591, 2001.
[11] M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality
reduction and data representation," Neural Computation, vol. 15,
no. 6, pp. 1373-1396, 2003.
[12] Z. Zhang and H. Zha, "Principal manifolds and nonlinear
dimensionality reduction via tangent space alignment,"
SIAM J. Sci. Comput., vol. 26, pp. 313-338, 2004.
[13] C. Y. Zhou and Y. Q. Chen, "Improving nearest neighbor
classification with cam weighted distance," Pattern Recognition,
vol. 39, no. 4, pp. 635-645, April 2006.
[14] H. Li, T. Jiang, and K. Zhang, "Efficient and robust feature
extraction by maximum margin criterion," IEEE Trans. Neural Networks, vol. 17, no. 1, pp. 157-165, 2006.
[15] W. Zheng, C. Zou, and L. Zhao, "Weighted maximum margin
discriminant analysis with kernels," Neurocomputing, vol. 67, pp.
357-362, August 2005.
[16] Y. Pan, S. S. Ge, and A. Al Mamun, "Weighted locally linear
embedding for dimension reduction," Pattern Recognition, vol. 42,
no. 5, pp. 798-811, May 2009.
[17] X. He and P. Niyogi, "Locality preserving projections," Advances in Neural Information Processing Systems, vol. 16, pp. 153-160, 2004.
[18] X. He, S. Yan, Y. Hu, P. Niyogi, and H. J. Zhang, "Face
recognition using Laplacianfaces," IEEE Trans. Pattern Analysis
and Machine Intelligence, vol. 27, no. 3, pp. 328-340, 2005.
[19] X. He, D. Cai, S. Yan, and H. J. Zhang, "Neighborhood preserving
embedding," in Proc. IEEE Conf. Computer Vision, Beijing,
2005, pp. 1208-1213.
[20] H. T. Chen, H. W. Chang, and T. L. Liu, "Local discriminant
embedding and its variants," in Proc. IEEE Conf. Computer Vision
and Pattern Recognition, 2005, pp. 846-853.
The ORL Face Database, Cambridge, U.K.: AT&T (Olivetti)
Research Laboratories. [Online]. Available:
http://www.uk.research.att.com/facedatabase.html.
[22] Yale Univ. Face Database,
http://cvc.yale.edu/projects/yalefaces/yalefaces.html, 2002.
