Linear Regression for Face Recognition

Imran Naseem, Roberto Togneri, Senior Member, IEEE, and Mohammed Bennamoun

Abstract—In this paper, we present a novel approach of face identification by formulating the pattern recognition problem in terms of linear regression. Using a fundamental concept that patterns from a single-object class lie on a linear subspace, we develop a linear model representing a probe image as a linear combination of class-specific galleries. The inverse problem is solved using the least-squares method, and the decision is ruled in favor of the class with the minimum reconstruction error. The proposed Linear Regression Classification (LRC) algorithm falls in the category of nearest subspace classification. The algorithm is extensively evaluated on several standard databases under a number of exemplary evaluation protocols reported in the face recognition literature. A comparative study with state-of-the-art algorithms clearly reflects the efficacy of the proposed approach. For the problem of contiguous occlusion, we propose a Modular LRC approach, introducing a novel Distance-based Evidence Fusion (DEF) algorithm. The proposed methodology achieves the best results ever reported for the challenging problem of scarf occlusion.

Index Terms—Face recognition, linear regression, nearest subspace classification.

1 INTRODUCTION

Face recognition systems are known to be critically dependent on manifold learning methods. A gray-scale face image of order a × b can be represented as an ab-dimensional vector in the original image space. However, any attempt at recognition in such a high-dimensional space is vulnerable to a variety of issues often referred to as the curse of dimensionality. Therefore, at the feature extraction stage, images are transformed to low-dimensional vectors in the face space. The main objective is to find a basis for this transformation that can distinguishably represent faces in the face space. A number of approaches have been reported in the literature, such as Principal Component Analysis (PCA) [1], [2], Linear Discriminant Analysis (LDA) [3], and Independent Component Analysis (ICA) [4], [5]. Primarily, these approaches fall into two categories, i.e., reconstructive and discriminative methods. Reconstructive approaches (such as PCA and ICA) are reported to be robust to contaminated pixels [6], whereas discriminative approaches (such as LDA) are known to yield better results in clean conditions [7]. Apart from these traditional approaches, it has recently been shown that unorthodox features, such as downsampled images and random projections, can serve equally well. In fact, the choice of the feature space may no longer be so critical [8]. What really matters is the dimensionality of the feature space and the design of the classifier.

In this research, we propose a fairly simple but efficient linear regression-based classification (LRC) for the problem of face identification. Samples from a specific object class are known to lie on a linear subspace [3], [9]. We use this concept to develop class-specific models of the registered users simply using the downsampled gallery images, thereby defining the task of face recognition as a problem of linear regression. Least-squares estimation is used to estimate the vector of parameters for a given probe against all class models. Finally, the decision rules in favor of the class with the most precise estimation. The proposed classifier can be categorized as a Nearest Subspace (NS) approach.

An important related work is presented in [8], where downsampled images from all classes are used to develop a dictionary matrix during the training session. Each probe image is represented as a linear combination of all gallery images, thereby resulting in an ill-conditioned inverse problem. Drawing on the latest research in compressive sensing and sparse representation, the sparsity of the vector of coefficients is harnessed to solve the ill-conditioned problem using l1-norm minimization. In [10], the concept of Locally Linear Regression (LLR) is introduced specifically to tackle the problem of pose. The main thrust of that research is to indicate an approximate linear mapping between a nonfrontal face image and its frontal counterpart; the estimation of the linear mapping is further formulated as a prediction problem with a regression-based solution. For the case of severe pose variations, the nonfrontal image is sampled to obtain many overlapping local segments. Linear regression is applied to each small patch to predict the corresponding virtual frontal patch; the LLR approach has shown good results in the presence of coarse alignment. In [11], a two-step approach is adopted, fusing the concepts of wavelet decomposition and discriminant analysis to design a sophisticated feature extraction stage. These discriminant features are used to develop feature planes (for the Nearest Feature Plane, NFP, classifier) and feature spaces (for the Nearest Feature Space, NFS, classifier). The query image is projected onto the subspaces, and the decision rules in favor of the subspace with the minimum distance. The proposed LRC approach, in contrast, for the first time simply uses downsampled images in combination with linear regression classification to achieve results superior to the benchmark techniques.

Further, for the problem of severe contiguous occlusion, a modular representation of images is expected to solve the problem [12]. Based on this concept, we propose an efficient Modular LRC approach. The proposed approach segments a given occluded image and reaches an individual decision for each block. These intermediate decisions are combined using a novel Distance-based Evidence Fusion (DEF) algorithm to reach the final decision. The proposed DEF algorithm uses the distance metrics of the intermediate decisions to decide about the "goodness" of a partition. There are two major advantages of the DEF approach. First, nonface partitions are rejected dynamically and therefore take no part in the final decision making. Second, the overall recognition performance is better than the best individual result of the combined partitions, owing to the efficient decision fusion of the face segments.
The proposed LRC and Modular LRC algorithms are described in Section 2, followed by extensive experiments using standard databases under a variety of evaluation protocols in Section 3. The paper concludes in Section 4.

2 LINEAR REGRESSION FOR FACE RECOGNITION

2.1 Linear Regression Classification Algorithm

Let there be N distinguished classes with p_i training images from the ith class, i = 1, 2, ..., N. Each gray-scale training image is of order a × b and is represented as u_i^{(m)} \in \mathbb{R}^{a \times b}, i = 1, 2, ..., N and m = 1, 2, ..., p_i. Each gallery image is downsampled to order c × d and transformed to a vector through column concatenation, such that u_i^{(m)} \in \mathbb{R}^{a \times b} \rightarrow w_i^{(m)} \in \mathbb{R}^{q \times 1}, where q = cd, cd < ab. Each image vector is normalized so that the maximum pixel value is 1. Using the concept that patterns from the same class lie on a linear subspace [9], we develop a class-specific model X_i by stacking the q-dimensional image vectors:

    X_i = \big[ w_i^{(1)} \; w_i^{(2)} \; \cdots \; w_i^{(p_i)} \big] \in \mathbb{R}^{q \times p_i}, \quad i = 1, 2, \ldots, N. \qquad (1)

Each vector w_i^{(m)}, m = 1, 2, ..., p_i, spans a subspace of \mathbb{R}^q, also called the column space of X_i. Therefore, at the training level, each class i is represented by a vector subspace X_i, also called the regressor or predictor for class i. Let z be an unlabeled test image; our problem is to classify z as one of the classes i = 1, 2, ..., N. We transform and normalize the gray-scale image z to an image vector y \in \mathbb{R}^{q \times 1} as discussed for the gallery. If y belongs to the ith class, it should be representable as a linear combination of the training images from that class (lying in the same subspace), i.e.,

    y = X_i \beta_i, \quad i = 1, 2, \ldots, N, \qquad (2)

where \beta_i \in \mathbb{R}^{p_i \times 1} is the vector of parameters. Given that q \geq p_i, the system of equations in (2) is well conditioned and \beta_i can be estimated using least-squares estimation [13], [14], [15]:

    \hat{\beta}_i = \big( X_i^T X_i \big)^{-1} X_i^T y. \qquad (3)

The estimated vector of parameters \hat{\beta}_i, along with the predictor X_i, is used to predict the response vector for each class i:

    \hat{y}_i = X_i \hat{\beta}_i = X_i \big( X_i^T X_i \big)^{-1} X_i^T y = H_i y, \quad i = 1, 2, \ldots, N, \qquad (4)

where the predicted vector \hat{y}_i \in \mathbb{R}^{q \times 1} is the projection of y onto the ith subspace. In other words, \hat{y}_i is the closest vector, in the ith subspace, to the observation vector y in the Euclidean sense [16]. H_i is called a hat matrix since it maps y into \hat{y}_i. We now calculate the distance between the predicted response vector \hat{y}_i, i = 1, 2, ..., N, and the original response vector y,

    d_i(y) = \| y - \hat{y}_i \|_2, \quad i = 1, 2, \ldots, N, \qquad (5)

and rule in favor of the class with minimum distance, i.e.,

    \min_i \, d_i(y), \quad i = 1, 2, \ldots, N. \qquad (6)

2.2 Modular Approach for the LRC Algorithm

A face image with contaminated pixels can be handled by dividing it into a number of subimages. Each subimage is then processed individually, and a final decision is made by fusing information from all of the subimages. A commonly reported technique for decision fusion is majority voting [12]. However, a major pitfall of majority voting is that it treats noisy and clean partitions equally. For instance, if three out of four partitions of an image are corrupted, majority voting is likely to be erroneous no matter how significant the clean partition may be in the context of facial features. The task is made even more complicated by the fact that the distribution of occlusion over a face image is never known a priori; therefore, along with face and nonface subimages, we are likely to have face portions corrupted by occlusion. Some sophisticated approaches have been developed to filter out potentially contaminated image pixels (for example, [17]). In this section, we make use of the specific nature of distance classification to develop a fairly simple but efficient fusion strategy which implicitly deemphasizes corrupted subimages, significantly improving the overall classification accuracy. We propose using the distance metric as evidence of our belief in the "goodness" of intermediate decisions taken on the subimages; the approach is called "Distance-based Evidence Fusion."

To formulate the concept, suppose that each training image is segmented into M partitions and each partitioned image is designated v_n, n = 1, 2, ..., M. The nth partition of all p_i training images from the ith class is subsampled and transformed to vectors, as discussed in Section 2.1, to develop a class-specific and partition-specific subspace U_i^{(n)}:

    U_i^{(n)} = \big[ w_i^{(1)(n)} \; w_i^{(2)(n)} \; \cdots \; w_i^{(p_i)(n)} \big], \quad i = 1, 2, \ldots, N. \qquad (7)

Each class is now represented by M subspaces, and altogether we have M × N subspace models. A given probe image is partitioned into M segments accordingly, and each partition is transformed to an image vector y^{(n)}, n = 1, 2, ..., M. Given that i is the true class for the probe image, y^{(n)} is expected to lie on the nth subspace of the ith class, U_i^{(n)}, and should satisfy

    y^{(n)} = U_i^{(n)} \beta_i^{(n)}. \qquad (8)

The vector of parameters and the response vector are estimated as in Section 2.1:

    \hat{\beta}_i^{(n)} = \big( U_i^{(n)T} U_i^{(n)} \big)^{-1} U_i^{(n)T} y^{(n)}, \qquad (9)

    \hat{y}_i^{(n)} = U_i^{(n)} \hat{\beta}_i^{(n)}, \quad i = 1, 2, \ldots, N. \qquad (10)

The distance between the estimated and the original response vectors is computed as

    d_i\big(y^{(n)}\big) = \big\| y^{(n)} - \hat{y}_i^{(n)} \big\|_2, \quad i = 1, 2, \ldots, N. \qquad (11)

Now, for the nth partition, an intermediate decision j^{(n)} is reached with a corresponding minimum distance

    d_j^{(n)} = \min_i \, d_i\big(y^{(n)}\big), \quad i = 1, 2, \ldots, N. \qquad (12)

Therefore, we now have M decisions j^{(n)} with M corresponding distances d_j^{(n)}, and we decide in favor of the class with the overall minimum distance:

    \text{Decision} = \arg\min_j \, d_j^{(n)}, \quad n = 1, 2, \ldots, M. \qquad (13)
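The LRC decision rule in (1)-(6) and the Modular LRC fusion in (7)-(13) can be condensed into a few lines of NumPy. The following is an illustrative sketch, not the authors' implementation; the function names and the data layout (one column per gallery image vector) are assumptions made here, and `np.linalg.lstsq` is used in place of the explicit normal-equations formula in (3) for numerical stability.

```python
import numpy as np

def fit_class_models(gallery):
    """Stack downsampled, normalized gallery vectors per class (Eq. (1)).

    gallery: dict mapping class label -> (q, p_i) array whose columns are
    the q-dimensional image vectors w_i^(m).
    """
    return {label: np.asarray(X, dtype=float) for label, X in gallery.items()}

def lrc_distances(models, y):
    """Distance of probe y to each class subspace via least squares (Eqs. (3)-(5))."""
    distances = {}
    for label, X in models.items():
        # beta_hat solves min ||y - X beta||_2, i.e., Eq. (3) computed stably
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        y_hat = X @ beta                              # projection onto the subspace, Eq. (4)
        distances[label] = np.linalg.norm(y - y_hat)  # Eq. (5)
    return distances

def lrc_classify(models, y):
    """Rule in favor of the class with minimum reconstruction error (Eq. (6))."""
    d = lrc_distances(models, y)
    return min(d, key=d.get)

def modular_lrc_classify(partition_models, probe_partitions):
    """Modular LRC with Distance-based Evidence Fusion (Eqs. (7)-(13)).

    partition_models: list of M dicts, one set of class models per partition.
    probe_partitions: list of M probe vectors y^(n).
    """
    best = []
    for models, y_n in zip(partition_models, probe_partitions):
        d = lrc_distances(models, y_n)   # Eq. (11)
        j_n = min(d, key=d.get)          # intermediate decision, Eq. (12)
        best.append((d[j_n], j_n))
    # final decision: class of the partition with minimum distance, Eq. (13)
    return min(best)[1]
```

A practical side benefit of `lstsq` over the explicit inverse in (3) is that it also copes with rank-deficient galleries, where (X_i^T X_i)^{-1} would not exist.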
Fig. 1. A typical subject from the AT&T database.

Fig. 2. (a) Recognition accuracy for the AT&T database with respect to feature dimension for a randomly selected leave-one-out experiment. (b) Feature dimensionality curve for the GT database.
3 EXPERIMENTAL RESULTS

Standard databases, including AT&T [18], Georgia Tech [19], FERET [20], Extended Yale B [21], [22], and AR [23], have been addressed. These databases incorporate several deviations from ideal conditions, including pose, illumination, occlusion, and gesture alterations. Several standard evaluation protocols reported in the face recognition literature have been adopted, and a comprehensive comparison of the proposed approach with state-of-the-art techniques is presented. It is appropriate to indicate that the developed approach has been shown to perform well for the cases of severe gesture variations and contiguous occlusion with little change in pose, scale, illumination, and rotation. However, it is not meant to be robust to other deviations such as severe pose and illumination variations.

3.1 AT&T Database

The AT&T database is maintained at the AT&T Laboratories, Cambridge University; it consists of 40 subjects with 10 images per subject. The database incorporates facial gestures, such as smiling or nonsmiling and open or closed eyes, and alterations such as glasses or no glasses. It also characterizes a maximum of 20 degrees of rotation of the face, with scale variations of about 10 percent (see Fig. 1).

We follow two evaluation protocols frequently proposed in the literature [24], [25], [26], [27]. Evaluation Protocol 1 (EP1) takes the first five images of each individual as the training set, while the last five are designated as probes. For Evaluation Protocol 2 (EP2), the "leave-one-out" strategy is adopted. All experiments are conducted by downsampling 112 × 92 images to an order of 10 × 5. A detailed comparison of the results for the two evaluation protocols is summarized in Table 1. For EP1, the LRC algorithm achieves a comparable recognition accuracy of 93.5 percent in a 50D feature space; the best results are reported for the latest Eigenfeature Regularization and Extraction (ERE) approach, which are 3.5 percent better than the LRC method. Note that the recognition error rate is converted to a recognition success rate for [27]. For EP2, the LRC approach attains a high recognition success rate of 98.75 percent in a 50D feature space; it outperforms the ICA approach by approximately 5 percent and is fairly comparable to the Fisherfaces, Eigenfaces, Kernel Eigenfaces, 2DPCA, and ERE approaches.

The choice of dimensionality for the AT&T database is illustrated in Fig. 2a, which reflects that the recognition rate becomes fairly constant above a 40-dimensional feature space.

3.2 Georgia Tech (GT) Database

The Georgia Tech database consists of 50 subjects with 15 images per subject [19]. It characterizes several variations, such as pose, expression, cluttered background, and illumination (see Fig. 3). Images were downsampled to an order of 15 × 15 to constitute a 225D feature space; the choice of dimensionality is depicted in Fig. 2b, which reflects consistent performance above a 100D feature space. The first eight images of each subject were used for training, while the remaining seven served as probes [27]; all experiments were conducted on the original database without any cropping or normalization. Table 2 shows a detailed comparison of the LRC with a variety of approaches; all results are as reported in [27], with recognition error rates converted to recognition success rates. Also, the results in [27] are shown for a large range of feature dimensions; for the sake of fair comparison, we have picked the best reported results. The proposed LRC algorithm outperforms the traditional PCAM and PCAE approaches by margins of 12 and 18.57 percent, respectively, achieving a high recognition accuracy of 92.57 percent. It is also shown to be fairly comparable to all other methods, including the latest ERE approaches.
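As a concrete illustration of an EP1-style protocol (the first images of each subject form the gallery, the rest serve as probes), the split and the resulting recognition rate can be sketched as below. This is a hypothetical sketch: the data used here are synthetic stand-ins for downsampled image vectors, and the function names and shapes are assumptions of this example, not the AT&T data or the authors' code.

```python
import numpy as np

def ep1_split(images_per_subject, n_train=5):
    """EP1: first n_train images of each subject train, the rest are probes.

    images_per_subject: dict mapping subject label -> list of q-dim vectors.
    Returns (gallery dict of column-stacked arrays, list of (probe, label)).
    """
    gallery, probes = {}, []
    for label, imgs in images_per_subject.items():
        imgs = [np.asarray(v, dtype=float) for v in imgs]
        gallery[label] = np.column_stack(imgs[:n_train])
        probes += [(v, label) for v in imgs[n_train:]]
    return gallery, probes

def recognition_rate(gallery, probes):
    """Fraction of probes assigned to their true class by the LRC rule."""
    correct = 0
    for y, true_label in probes:
        dists = {}
        for label, X in gallery.items():
            # least-squares fit of the probe against each class model
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            dists[label] = np.linalg.norm(y - X @ beta)
        correct += (min(dists, key=dists.get) == true_label)
    return correct / len(probes)
```

On synthetic subjects whose images are random combinations of a subject-specific basis, this split recovers every probe, which is consistent with the subspace assumption behind LRC rather than a claim about real databases.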
TABLE 1
Results for EP1 and EP2 Using the AT&T Database

TABLE 2
Results for the Georgia Tech Database

TABLE 3
Results for the FERET Database

Fig. 4. A typical subject from the FERET database; fa and fb represent frontal shots with gesture variations, while ql and qr correspond to pose variations.

Fig. 5. Starting from the top, each row illustrates samples from subsets 1, 2, 3, 4, and 5, respectively.
2110 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 32, NO. 11, NOVEMBER 2010
TABLE 4
Results for the Extended Yale B Database

TABLE 5
Recognition Results for Gesture Variations Using the LRC Approach

Fig. 6. Gesture variations in the AR database. Note the changing position of the head with different poses. (a)-(d) and (e)-(h) correspond to two different sessions incorporating neutral, happy, angry, and screaming expressions, respectively.

Fig. 7. Examples of contiguous occlusion in the AR database.
TABLE 6
Recognition Results for Occlusion

Fig. 8. (a) The recognition accuracy versus feature dimension for scarf occlusion using the LRC approach. (b) A sample image indicating eye and mouth locations for the purpose of manual alignment.

Fig. 10. Case studies for the Modular LRC approach for the problem of scarf occlusion.
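The contrast between equal-weight sum-rule fusion and DEF can be made concrete on hypothetical per-partition distances. The numbers below are invented purely for illustration (they are not the paper's Table 7 data), and the function names are this sketch's own.

```python
def sum_rule(partition_distances):
    """Equal-weight sum rule: add each class's distances over all partitions."""
    classes = partition_distances[0].keys()
    totals = {c: sum(d[c] for d in partition_distances) for c in classes}
    return min(totals, key=totals.get)

def def_rule(partition_distances):
    """Distance-based Evidence Fusion: trust the partition with the smallest distance."""
    best = [(min(d.values()), min(d, key=d.get)) for d in partition_distances]
    return min(best)[1]

dists = [{"A": 0.1, "B": 5.0},   # clean face partition: class A fits very well
         {"A": 6.0, "B": 1.0}]   # occluded partition: misleadingly favors B

# sum_rule(dists) -> "B": the corrupted partition sways the equal-weight total
# def_rule(dists) -> "A": the clean partition's near-zero distance dominates
```

This mirrors the qualitative behavior discussed in this section: a single badly corrupted partition can flip the equal-weight sum rule, whereas DEF defers to the most confident (smallest-distance) partition.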
TABLE 7
Comparison of the DEF with the Sum Rule for Three Case Studies

A 93 percent classification accuracy is reported for 50 subjects in [17]. Finally, we compare the proposed DEF approach with the weighted sum rule, which is perhaps the major workhorse in the field of combining classifiers [31]. The comparison for the three case studies in Fig. 10 is presented in Table 7. Note that, without any prior knowledge of the goodness of a specific partition, we used equal weights for all subimages of a given partitioned image. The DEF approach comprehensively outperforms the sum rule for the three case studies, showing improvements of 38.5, 33.5, and 20 percent, respectively. The performance of the sum rule improves with the increase in pure face regions, thereby demonstrating a strong dependency on the way partitioning is performed. The proposed DEF approach, on the other hand, shows quite consistent performance even for the worst-case partitioning, i.e., Fig. 10a, consisting of an equal number of face and nonface partitions.

4 CONCLUSION

In this research, a novel classification algorithm is proposed which formulates the face identification task as a problem of linear regression. The proposed LRC algorithm is extensively evaluated using standard databases with a variety of evaluation methods reported in the face recognition literature. Specifically, the challenges of varying facial expressions and contiguous occlusion are addressed. A considerable comparative analysis with state-of-the-art algorithms clearly reflects the potency of the proposed approach.

The proposed LRC approach reveals a number of interesting outcomes. Apart from the Modular LRC approach for face identification in the presence of disguise, the LRC approach yields high recognition accuracies without requiring any preprocessing steps of face localization and/or normalization. We argue that in the presence of nonideal conditions such as occlusion, illumination, and severe gestures, a cropped and aligned face is generally not available. A consistent, reliable performance on unprocessed standard databases therefore makes the LRC algorithm appropriate for real scenarios. For the case of varying gestures, the LRC approach has been shown to cope well with the most severe screaming expression, where the state-of-the-art techniques lag behind, indicating consistency for both mild and severe changes. For the problem of face recognition in the presence of disguise, the Modular LRC algorithm, using an efficient evidential fusion strategy, yields the best results reported in the literature.

In the paradigm of view-based face recognition, the choice of features for a given case study has been a debatable topic. Recent research has, however, shown the competency of unorthodox features such as downsampled images and random projections, indicating a divergence from the conventional ideology [8]. The proposed LRC approach in fact conforms to this emerging belief. It has been shown that, with an appropriate choice of classifier, downsampled images can produce good results compared to the traditional approaches. The simple architecture of the proposed approach makes it computationally efficient, suggesting a strong candidacy for realistic video-based face recognition applications. Other future directions include the robustness issues related to illumination, random pixel corruption, and pose variations.

ACKNOWLEDGMENTS

This research is partly funded by the Australian Research Council (ARC) grant No. DP0664228.

REFERENCES

[1] I.T. Jolliffe, Principal Component Analysis. Springer, 1986.
[2] M. Turk and A. Pentland, "Eigenfaces for Recognition," J. Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[3] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.
[4] P. Comon, "Independent Component Analysis—A New Concept?" Signal Processing, vol. 36, pp. 287-314, 1994.
[5] M. Bartlett, H. Lades, and T. Sejnowski, "Independent Component Representations for Face Recognition," Proc. SPIE Conf. Human Vision and Electronic Imaging III, pp. 528-539, 1998.
[6] A. Leonardis and H. Bischof, "Robust Recognition Using Eigenimages," Computer Vision and Image Understanding, vol. 78, no. 1, pp. 99-118, 2000.
[7] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification. John Wiley & Sons, 2000.
[8] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, and Y. Ma, "Robust Face Recognition via Sparse Representation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, Feb. 2009.
[9] R. Basri and D. Jacobs, "Lambertian Reflectance and Linear Subspaces," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 2, pp. 218-233, Feb. 2003.
[10] X. Chai, S. Shan, X. Chen, and W. Gao, "Locally Linear Regression for Pose-Invariant Face Recognition," IEEE Trans. Image Processing, vol. 16, no. 7, pp. 1716-1725, July 2007.
[11] J. Chien and C. Wu, "Discriminant Waveletfaces and Nearest Feature Classifiers for Face Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1644-1649, Dec. 2002.
[12] A. Pentland, B. Moghaddam, and T. Starner, "View-Based and Modular Eigenspaces for Face Recognition," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1994.
[13] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2001.
[14] G.A.F. Seber, Linear Regression Analysis. Wiley-Interscience, 2003.
[15] T.P. Ryan, Modern Regression Methods. Wiley-Interscience, 1997.
[16] R.G. Staudte and S.J. Sheather, Robust Estimation and Testing. Wiley-Interscience, 1990.
[17] S. Fidler, D. Skocaj, and A. Leonardis, "Combining Reconstructive and Discriminative Subspace Methods for Robust Classification and Regression by Subsampling," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 3, pp. 337-350, Mar. 2006.
[18] F. Samaria and A. Harter, "Parameterisation of a Stochastic Model for Human Face Identification," Proc. Second IEEE Workshop Applications of Computer Vision, Dec. 1994.
[19] "Georgia Tech Face Database," http://www.anefian.com/face_reco.htm, 2007.
[20] P.J. Phillips, H. Wechsler, J.S. Huang, and P.J. Rauss, "The FERET Database and Evaluation Procedure for Face-Recognition Algorithms," Image and Vision Computing, vol. 16, no. 5, pp. 295-306, 1998.
[21] A. Georghiades, P. Belhumeur, and D. Kriegman, "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, June 2001.
[22] K.C. Lee, J. Ho, and D. Kriegman, "Acquiring Linear Subspaces for Face Recognition under Variable Lighting," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 684-698, May 2005.
[23] A. Martinez and R. Benavente, "The AR Face Database," CVC Technical Report 24, June 1998.
[24] J. Yang, D. Zhang, A.F. Frangi, and J. Yang, "Two-Dimensional PCA: A New Approach to Appearance-Based Face Representation and Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131-137, Jan. 2004.
[25] M.H. Yang, "Kernel Eigenfaces vs. Kernel Fisherfaces: Face Recognition Using Kernel Methods," Proc. Fifth IEEE Int'l Conf. Automatic Face and Gesture Recognition, pp. 215-220, May 2002.
[26] P.C. Yuen and J.H. Lai, "Face Representation Using Independent Component Analysis," Pattern Recognition, vol. 35, no. 6, pp. 1247-1257, 2002.
[27] X. Jiang, B. Mandal, and A. Kot, "Eigenfeature Regularization and Extraction in Face Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 383-394, Mar. 2008.
[28] J. Lu, K.N. Plataniotis, A.N. Venetsanopoulos, and S.Z. Li, "Ensemble-Based Discriminant Learning with Boosting for Face Recognition," IEEE Trans. Neural Networks, vol. 17, no. 1, pp. 166-178, Jan. 2006.
[29] S.Z. Li and A.K. Jain, eds., Handbook of Face Recognition. Springer, 2005.
[30] L. Zhang and G.W. Cottrell, "When Holistic Processing Is Not Enough: Local Features Save the Day," Proc. 26th Ann. Cognitive Science Soc. Conf., 2004.
[31] J. Kittler, M. Hatef, R.P.W. Duin, and J. Matas, "On Combining Classifiers," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226-239, Mar. 1998.