W. ZHAO
Sarnoff Corporation
R. CHELLAPPA
University of Maryland
P. J. PHILLIPS
National Institute of Standards and Technology
AND
A. ROSENFELD
University of Maryland
As one of the most successful applications of image analysis and understanding, face
recognition has recently received significant attention, especially during the past
several years. At least two reasons account for this trend: the first is the wide range of
commercial and law enforcement applications, and the second is the availability of
feasible technologies after 30 years of research. Even though current machine
recognition systems have reached a certain level of maturity, their success is limited by
the conditions imposed by many real applications. For example, recognition of face
images acquired in an outdoor environment with changes in illumination and/or pose
remains a largely unsolved problem. In other words, current systems are still far away
from the capability of the human perception system.
This paper provides an up-to-date critical survey of still- and video-based face
recognition research. There are two underlying motivations for us to write this survey
paper: the first is to provide an up-to-date review of the existing literature, and the
second is to offer some insights into the studies of machine recognition of faces. To
provide a comprehensive survey, we not only categorize existing recognition techniques
but also present detailed descriptions of representative methods within each category.
In addition, relevant topics such as psychophysical studies, system evaluation, and
issues of illumination and pose variation are covered.
An earlier version of this paper appeared as Face Recognition: A Literature Survey, Technical Report CAR-TR-948, Center for Automation Research, University of Maryland, College Park, MD, 2000.
Authors' addresses: W. Zhao, Vision Technologies Lab, Sarnoff Corporation, Princeton, NJ 08543-5300; email: wzhao@sarnoff.com; R. Chellappa and A. Rosenfeld, Center for Automation Research, University of Maryland, College Park, MD 20742-3275; email: {rama,ar}@cfar.umd.edu; P. J. Phillips, National Institute of Standards and Technology, Gaithersburg, MD 20899; email: jonathon@nist.gov.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.
© 2003 ACM 0360-0300/03/1200-0399 $5.00
ACM Computing Surveys, Vol. 35, No. 4, December 2003, pp. 399-458.
Table II. Available Commercial Face Recognition Systems. (Some of these Web sites may have changed or been removed.) [The identification of any company, commercial product, or trade name does not imply endorsement or recommendation by the National Institute of Standards and Technology or any of the authors or their institutions.]

Commercial product            Website
FaceIt from Visionics         http://www.FaceIt.com
Viisage Technology            http://www.viisage.com
FaceVACS from Plettac         http://www.plettac-electronics.com
FaceKey Corp.                 http://www.facekey.com
Cognitec Systems              http://www.cognitec-systems.de
Keyware Technologies          http://www.keywareusa.com/
Passfaces from ID-arts        http://www.id-arts.com/
ImageWare Software            http://www.iwsinc.com/
Eyematic Interfaces Inc.      http://www.eyematic.com/
BioID sensor fusion           http://www.bioid.com
Visionsphere Technologies     http://www.visionspheretech.com/menu.htm
Biometric Systems, Inc.       http://www.biometrica.com/
FaceSnap Recorder             http://www.facesnap.de/htdocs/english/index2.html
SpotIt for face composite     http://spotit.itc.it/SpotIt.html
findings have important consequences for engineers who design algorithms and systems for machine recognition of human faces. Section 2 will present a concise review of these findings.

Barring a few exceptions that use range data [Gordon 1991], the face recognition problem has been formulated as recognizing three-dimensional (3D) objects from two-dimensional (2D) images.[1] Earlier approaches treated it as a 2D pattern recognition problem. As a result, during the early and mid-1970s, typical pattern classification techniques, which use measured attributes of features (e.g., the distances between important points) in faces or face profiles, were used [Bledsoe 1964; Kanade 1973; Kelly 1970]. During the 1980s, work on face recognition remained largely dormant. Since the early 1990s, research interest in FRT has grown significantly. One can attribute this to several reasons: an increase in interest in commercial opportunities; the availability of real-time hardware; and the increasing importance of surveillance-related applications.

[1] There have been recent advances on 3D face recognition in situations where range data acquired through structured light can be matched reliably [Bronstein et al. 2003].

Over the past 15 years, research has focused on how to make face recognition systems fully automatic by tackling problems such as localization of a face in a given image or video clip and extraction of features such as eyes, mouth, etc. Meanwhile, significant advances have been made in the design of classifiers for successful face recognition. Among appearance-based holistic approaches, eigenfaces [Kirby and Sirovich 1990; Turk and Pentland 1991] and Fisherfaces [Belhumeur et al. 1997; Etemad and Chellappa 1997; Zhao et al. 1998] have proved to be effective in experiments with large databases. Feature-based graph matching approaches [Wiskott et al. 1997] have also been quite successful. Compared to holistic approaches, feature-based methods are less sensitive to variations in illumination and viewpoint and to inaccuracy in face localization. However, the feature extraction techniques needed for this type of approach are still not reliable or accurate enough [Cox et al. 1996]. For example, most eye localization techniques assume some geometric and textural models and do not work if the eye is closed. Section 3 will present a review of still-image-based face recognition.

During the past 5 to 8 years, much research has been concentrated on video-based face recognition. The still image problem has several inherent advantages and disadvantages. For applications such as driver's licenses, due to the controlled nature of the image acquisition process, the segmentation problem is rather easy. However, if only a static picture of an airport scene is available, automatic location and segmentation of a face could pose serious challenges to any segmentation algorithm. On the other hand, if a video sequence is available, segmentation of a moving person can be more easily accomplished using motion as a cue. But the small size and low image quality of faces captured from video can significantly increase the difficulty in recognition. Video-based face recognition is reviewed in Section 4.

As we propose new algorithms and build more systems, measuring the performance of new systems and of existing systems becomes very important. Systematic data collection and evaluation of face recognition systems is reviewed in Section 5.

Recognizing a 3D object from its 2D images poses many challenges. The illumination and pose problems are two prominent issues for appearance- or image-based approaches. Many approaches have been proposed to handle these issues, with the majority of them exploring domain knowledge. Details of these approaches are discussed in Section 6.

In 1995, a review paper [Chellappa et al. 1995] gave a thorough survey of FRT at that time. (An earlier survey [Samal and Iyengar 1992] appeared in 1992.) At that time, video-based face recognition was still in a nascent stage. During the past 8 years, face recognition has received increased attention and has advanced
technically. Many commercial systems for still face recognition are now available. Recently, significant research efforts have been focused on video-based face modeling/tracking, recognition, and system integration. New datasets have been created and evaluations of recognition techniques using these databases have been carried out. It is not an overstatement to say that face recognition has become one of the most active applications of pattern recognition, image analysis and understanding.

In this paper we provide a critical review of current developments in face recognition. This paper is organized as follows: in Section 2 we briefly review issues that are relevant from a psychophysical point of view. Section 3 provides a detailed review of recent developments in face recognition techniques using still images. In Section 4 face recognition techniques based on video are reviewed. Data collection and performance evaluation of face recognition algorithms are addressed in Section 5 with descriptions of representative protocols. In Section 6 we discuss two important problems in face recognition that can be mathematically studied, lack of robustness to illumination and pose variations, and we review proposed methods of overcoming these limitations. Finally, a summary and conclusions are presented in Section 7.

2. PSYCHOPHYSICS/NEUROSCIENCE ISSUES RELEVANT TO FACE RECOGNITION

Human recognition processes utilize a broad spectrum of stimuli, obtained from many, if not all, of the senses (visual, auditory, olfactory, tactile, etc.). In many situations, contextual knowledge is also applied; for example, surroundings play an important role in recognizing faces in relation to where they are supposed to be located. It is futile to even attempt to develop a system, using existing technology, that will mimic the remarkable face recognition ability of humans. However, the human brain has its limitations in the total number of persons that it can accurately remember. A key advantage of a computer system is its capacity to handle large numbers of face images. In most applications the images are available only in the form of single or multiple views of 2D intensity data, so that the inputs to computer face recognition algorithms are visual only. For this reason, the literature reviewed in this section is restricted to studies of human visual perception of faces.

Many studies in psychology and neuroscience have direct relevance to engineers interested in designing algorithms or systems for machine recognition of faces. For example, findings in psychology [Bruce 1988; Shepherd et al. 1981] about the relative importance of different facial features have been noted in the engineering literature [Etemad and Chellappa 1997]. On the other hand, machine systems provide tools for conducting studies in psychology and neuroscience [Hancock et al. 1998; Kalocsai et al. 1998]. For example, a possible engineering explanation of the bottom lighting effects studied in Johnston et al. [1992] is as follows: when the actual lighting direction is opposite to the usually assumed direction, a shape-from-shading algorithm recovers incorrect structural information and hence makes recognition of faces harder.

A detailed review of relevant studies in psychophysics and neuroscience is beyond the scope of this paper. We only summarize findings that are potentially relevant to the design of face recognition systems. For details the reader is referred to the papers cited below. Issues that are of potential interest to designers are:[2]

[2] Readers should be aware of the existence of diverse opinions on some of these issues. The opinions given here do not necessarily represent our views.

Is face recognition a dedicated process? [Biederman and Kalocsai 1998; Ellis 1986; Gauthier et al. 1999; Gauthier and Logothetis 2000]: It is traditionally believed that face recognition is a dedicated process different from other object recognition tasks. Evidence for the existence of a dedicated face processing system comes from several sources [Ellis 1986]. (a) Faces are more easily remembered by humans than other
objects when presented in an upright orientation. (b) Prosopagnosia patients are unable to recognize previously familiar faces, but usually have no other profound agnosia. They recognize people by their voices, hair color, dress, etc. It should be noted that prosopagnosia patients recognize whether a given object is a face or not, but then have difficulty in identifying the face. Seven differences between face recognition and object recognition can be summarized [Biederman and Kalocsai 1998] based on empirical evidence: (1) configural effects (related to the choice of different types of machine recognition systems), (2) expertise, (3) differences verbalizable, (4) sensitivity to contrast polarity and illumination direction (related to the illumination problem in machine recognition systems), (5) metric variation, (6) rotation in depth (related to the pose variation problem in machine recognition systems), and (7) rotation in plane/inverted face. Contrary to the traditionally held belief, some recent findings in human neuropsychology and neuroimaging suggest that face recognition may not be unique. According to [Gauthier and Logothetis 2000], recent neuroimaging studies in humans indicate that level of categorization and expertise interact to produce the specification for faces in the middle fusiform gyrus.[3] Hence it is possible that the encoding scheme used for faces may also be employed for other classes with similar properties. (On recognition of familiar vs. unfamiliar faces see Section 7.)

[3] The fusiform gyrus or occipitotemporal gyrus, located on the ventromedial surface of the temporal and occipital lobes, is thought to be critical for face recognition.

Is face perception the result of holistic or feature analysis? [Bruce 1988; Bruce et al. 1998]: Both holistic and feature information are crucial for the perception and recognition of faces. Studies suggest the possibility of global descriptions serving as a front end for finer, feature-based perception. If dominant features are present, holistic descriptions may not be used. For example, in face recall studies, humans quickly focus on odd features such as big ears, a crooked nose, a staring eye, etc. One of the strongest pieces of evidence to support the view that face recognition involves more configural/holistic processing than other object recognition has been the face inversion effect, in which an inverted face is much harder to recognize than a normal face (first demonstrated in [Yin 1969]). An excellent example is given in [Bartlett and Searcy 1993] using the Thatcher illusion [Thompson 1980]. In this illusion, the eyes and mouth of an expressing face are excised and inverted, and the result looks grotesque in an upright face; however, when shown inverted, the face looks fairly normal in appearance, and the inversion of the internal features is not readily noticed.

Ranking of significance of facial features [Bruce 1988; Shepherd et al. 1981]: Hair, face outline, eyes, and mouth (not necessarily in this order) have been determined to be important for perceiving and remembering faces [Shepherd et al. 1981]. Several studies have shown that the nose plays an insignificant role; this may be due to the fact that almost all of these studies have been done using frontal images. In face recognition using profiles (which may be important in mugshot matching applications, where profiles can be extracted from side views), a distinctive nose shape could be more important than the eyes or mouth [Bruce 1988]. Another outcome of some studies is that both external and internal features are important in the recognition of previously presented but otherwise unfamiliar faces, but internal features are more dominant in the recognition of familiar faces. It has also been found that the upper part of the face is more useful for face recognition than the lower part [Shepherd et al. 1981]. The role of aesthetic attributes such as beauty, attractiveness, and/or pleasantness has also been studied, with the conclusion that
the more attractive the faces are, the better is their recognition rate; the least attractive faces come next, followed by the midrange faces, in terms of ease of being recognized.

Caricatures [Brennan 1985; Bruce 1988; Perkins 1975]: A caricature can be formally defined [Perkins 1975] as a symbol that exaggerates measurements relative to any measure which varies from one person to another. Thus the length of a nose is a measure that varies from person to person, and could be useful as a symbol in caricaturing someone, but not the number of ears. A standard caricature algorithm [Brennan 1985] can be applied to different qualities of image data (line drawings and photographs). Caricatures of line drawings do not contain as much information as photographs, but they manage to capture the important characteristics of a face; experiments based on nonordinary faces comparing the usefulness of line-drawing caricatures and unexaggerated line drawings decidedly favor the former [Bruce 1988].

Distinctiveness [Bruce et al. 1994]: Studies show that distinctive faces are better retained in memory and are recognized better and faster than typical faces. However, if a decision has to be made as to whether an object is a face or not, it takes longer to recognize an atypical face than a typical face. This may be explained by different mechanisms being used for detection and for identification.

The role of spatial frequency analysis [Ginsburg 1978; Harmon 1973; Sergent 1986]: Earlier studies [Ginsburg 1978; Harmon 1973] concluded that information in low spatial frequency bands plays a dominant role in face recognition. Recent studies [Sergent 1986] have shown that, depending on the specific recognition task, the low, band-pass, and high-frequency components may play different roles. For example, gender classification can be successfully accomplished using low-frequency components only, while identification requires the use of high-frequency components [Sergent 1986]. Low-frequency components contribute to global description, while high-frequency components contribute to the finer details needed in identification.

Viewpoint-invariant recognition? [Biederman 1987; Hill et al. 1997; Tarr and Bülthoff 1995]: Much work in visual object recognition (e.g., [Biederman 1987]) has been cast within a theoretical framework introduced in [Marr 1982] in which different views of objects are analyzed in a way which allows access to (largely) viewpoint-invariant descriptions. Recently, there has been some debate about whether object recognition is viewpoint-invariant or not [Tarr and Bülthoff 1995]. Some experiments suggest that memory for faces is highly viewpoint-dependent. Generalization even from one profile viewpoint to another is poor, though generalization from one three-quarter view to the other is very good [Hill et al. 1997].

Effect of lighting change [Bruce et al. 1998; Hill and Bruce 1996; Johnston et al. 1992]: It has long been informally observed that photographic negatives of faces are difficult to recognize. However, relatively little work has explored why it is so difficult to recognize negative images of faces. In [Johnston et al. 1992], experiments were conducted to explore whether difficulties with negative images and inverted images of faces arise because each of these manipulations reverses the apparent direction of lighting, rendering a top-lit image of a face apparently lit from below. It was demonstrated in [Johnston et al. 1992] that bottom lighting does indeed make it harder to identify familiar faces. In [Hill and Bruce 1996], the importance of top lighting for face recognition was demonstrated using a different task: matching surface images of faces to determine whether they were identical.

Movement and face recognition [O'Toole et al. 2002; Bruce et al. 1998; Knight and Johnston 1997]: A recent study [Knight
and Johnston 1997] showed that famous faces are easier to recognize when shown in moving sequences than in still photographs. This observation has been extended to show that movement helps in the recognition of familiar faces shown under a range of different types of degradations (negated, inverted, or thresholded) [Bruce et al. 1998]. Even more interesting is the observation that there seems to be a benefit due to movement even if the information content is equated in the moving and static comparison conditions. However, experiments with unfamiliar faces suggest no additional benefit from viewing animated rather than static sequences.

Facial expressions [Bruce 1988]: Based on neurophysiological studies, it seems that analysis of facial expressions is accomplished in parallel to face recognition. Some prosopagnosic patients, who have difficulties in identifying familiar faces, nevertheless seem to recognize expressions due to emotions. Patients who suffer from organic brain syndrome suffer from poor expression analysis but perform face recognition quite well.[4] Similarly, separation of face recognition and focused visual processing tasks (e.g., looking for someone with a thick mustache) has been claimed.

[4] From a machine recognition point of view, dramatic facial expressions may affect face recognition performance if only one photograph is available.

3. FACE RECOGNITION FROM STILL IMAGES

As illustrated in Figure 1, the problem of automatic face recognition involves three key steps/subtasks: (1) detection and rough normalization of faces, (2) feature extraction and accurate normalization of faces, and (3) identification and/or verification. Sometimes, different subtasks are not totally separated. For example, the facial features (eyes, nose, mouth) used for face recognition are often used in face detection. Face detection and feature extraction can be achieved simultaneously, as indicated in Figure 1. Depending on the nature of the application, for example, the sizes of the training and testing databases, clutter and variability of the background, noise, occlusion, and speed requirements, some of the subtasks can be very challenging. Though fully automatic face recognition systems must perform all three subtasks, research on each subtask is critical. This is not only because the techniques used for the individual subtasks need to be improved, but also because they are critical in many different applications (Figure 1). For example, face detection is needed to initialize face tracking, and extraction of facial features is needed for recognizing human emotion, which is in turn essential in human-computer interaction (HCI) systems. Isolating the subtasks makes it easier to assess and advance the state of the art of the component techniques. Earlier face detection techniques could only handle single or a few well-separated frontal faces in images with simple backgrounds, while state-of-the-art algorithms can detect faces and their poses in cluttered backgrounds [Gu et al. 2001; Heisele et al. 2001; Schneiderman and Kanade 2000; Viola and Jones 2001]. Extensive research on the subtasks has been carried out, and relevant surveys have appeared on, for example, the subtask of face detection [Hjelmas and Low 2001; Yang et al. 2002].

In this section we survey the state of the art of face recognition in the engineering literature. For the sake of completeness, in Section 3.1 we provide a highlighted summary of research on face segmentation/detection and feature extraction. Section 3.2 contains detailed reviews of recent work on intensity image-based face recognition and categorizes methods of recognition from intensity images. Section 3.3 summarizes the status of face recognition and discusses open research issues.

3.1. Key Steps Prior to Recognition: Face Detection and Feature Extraction

The first step in any automatic face recognition system is the detection of faces in images. Here we only provide a summary on this topic and highlight a few
very recent methods. After a face has been detected, the task of feature extraction is to obtain features that are fed into a face classification system. Depending on the type of classification system, features can be local features such as lines or fiducial points, or facial features such as eyes, nose, and mouth. Face detection may also employ features, in which case features are extracted simultaneously with face detection. Feature extraction is also a key to animation and recognition of facial expressions.

Without considering feature locations, face detection is declared successful if the presence and rough location of a face have been correctly identified. However, without accurate face and feature location, noticeable degradation in recognition performance is observed [Martinez 2002; Zhao 1999]. The close relationship between feature extraction and face recognition motivates us to review a few feature extraction methods that are used in the recognition approaches to be reviewed in Section 3.2. Hence, this section also serves as an introduction to the next section.

3.1.1. Segmentation/Detection: Summary. Up to the mid-1990s, most work on segmentation was focused on single-face segmentation from a simple or complex background. These approaches included using a whole-face template, a deformable feature-based template, skin color, and a neural network.

Significant advances have been made in recent years in achieving automatic face detection under various conditions. Compared to feature-based methods and template-matching methods, appearance- or image-based methods [Rowley et al. 1998; Sung and Poggio 1997] that train machine systems on large numbers of samples have achieved the best results. This may not be surprising, since face objects are complicated, very similar to each other, and different from nonface objects. Through extensive training, computers can be quite good at detecting faces.

More recently, detection of faces under rotation in depth has been studied. One approach is based on training on multiple-view samples [Gu et al. 2001; Schneiderman and Kanade 2000]. Compared to invariant-feature-based methods [Wiskott et al. 1997], multiview-based methods of face detection and recognition seem to be able to achieve better results when the angle of out-of-plane rotation is large (35°). In the psychology community, a similar debate exists on whether face recognition is viewpoint-invariant or not. Studies in both disciplines seem to support the idea that for small angles, face perception is view-independent, while for large angles, it is view-dependent.

In a detection problem, two statistics are important: true positives (also referred to as detection rate) and false positives (reported detections in nonface regions). An ideal system would have very high true positive and very low false positive rates. In practice, these two requirements are conflicting. Treating face detection as a two-class classification problem helps to reduce false positives dramatically [Rowley et al. 1998; Sung and Poggio 1997] while maintaining true positives. This is achieved by retraining systems with false-positive samples that are generated by previously trained systems.

3.1.2. Feature Extraction: Summary and Methods

3.1.2.1. Summary. The importance of facial features for face recognition cannot be overstated. Many face recognition systems need facial features in addition to the holistic face, as suggested by studies in psychology. It is well known that even holistic matching methods, for example, eigenfaces [Turk and Pentland 1991] and Fisherfaces [Belhumeur et al. 1997], need accurate locations of key facial features such as eyes, nose, and mouth to normalize the detected face [Martinez 2002; Yang et al. 2002].

Three types of feature extraction methods can be distinguished: (1) generic methods based on edges, lines, and curves; (2) feature-template-based methods that are used to detect facial features such as eyes; and (3) structural matching methods
that take into consideration geometrical constraints on the features. Early approaches focused on individual features; for example, a template-based approach was described in [Hallinan 1991] to detect and recognize the human eye in a frontal face. These methods have difficulty when the appearances of the features change significantly, for example, closed eyes, eyes with glasses, or an open mouth. To detect the features more reliably, recent approaches have used structural matching methods, for example, the Active Shape Model [Cootes et al. 1995]. Compared to earlier methods, these recent statistical methods are much more robust in terms of handling variations in image intensity and feature shape.

An even more challenging situation for feature extraction is feature restoration, which tries to recover features that are invisible due to large variations in head pose. The best solution here might be to hallucinate the missing features, either by using the bilateral symmetry of the face or by using learned information. For example, a view-based statistical method claims to be able to handle even profile views in which many local features are invisible [Cootes et al. 2000].

3.1.2.2. Methods. A template-based approach to detecting the eyes and mouth in real images was presented in [Yuille et al. 1992]. This method is based on matching a predefined parameterized template to an image that contains a face region. Two templates are used for matching the eyes and mouth respectively. An energy function is defined that links edges, peaks, and valleys in the image intensity to the corresponding properties in the template, and this energy function is minimized by iteratively changing the parameters of the template to fit the image. Compared to this model, which is manually designed, the statistical shape model (Active Shape Model, ASM) proposed in [Cootes et al. 1995] offers more flexibility and robustness. The advantages of using the so-called analysis-through-synthesis approach come from the fact that the solution is constrained by a flexible statistical model. To account for texture variation, the ASM has been expanded to statistical appearance models, including the Flexible Appearance Model (FAM) [Lanitis et al. 1995] and the Active Appearance Model (AAM) [Cootes et al. 2001]. In [Cootes et al. 2001], the proposed AAM combined a model of shape variation (i.e., ASM) with a model of the appearance variation of shape-normalized (shape-free) textures. A training set of 400 images of faces, each manually labeled with 68 landmark points, and approximately 10,000 intensity values sampled from facial regions were used. The shape model (mean shape, orthogonal mapping matrix Ps, and projection vector bs) is generated by representing each set of landmarks as a vector and applying principal-component analysis (PCA) to the data. Then, after each sample image is warped so that its landmarks match the mean shape, texture information can be sampled from this shape-free face patch. Applying PCA to this data leads to a shape-free texture model (mean texture, Pg, and bg). To explore the correlation between the shape and texture variations, a third PCA is applied to the concatenated vectors (bs and bg) to obtain the combined model, in which one vector c of appearance parameters controls both the shape and texture of the model. To match a given image and the model, an optimal vector of parameters (displacement parameters between the face region and the model, parameters for linear intensity adjustment, and the appearance parameters c) is searched for by minimizing the difference between the synthetic image and the given one. After matching, a best-fitting model is constructed that gives the locations of all the facial features and can be used to reconstruct the original images. Figure 2 illustrates the optimization/search procedure for fitting the model to the image. To speed up the search procedure, an efficient method is proposed that exploits the similarities among optimizations. This allows the direct method to find and apply directions of rapid convergence which are learned off-line.
Fig. 2. Multiresolution search from a displaced position using a face model. (Courtesy of T. Cootes,
K. Walker, and C. Taylor.)
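The analysis-through-synthesis fitting loop described above, with update directions learned off-line, can be sketched for a purely linear toy model (all names, dimensions, and values here are illustrative, not taken from Cootes et al.; for a linear model the off-line-learned update matrix reduces to a pseudo-inverse):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear appearance model: a face patch is synthesized as
# mean + basis @ c, where c is the vector of appearance parameters.
n_pixels, n_params = 100, 5
mean = rng.normal(size=n_pixels)
basis, _ = np.linalg.qr(rng.normal(size=(n_pixels, n_params)))  # orthonormal modes

def synthesize(c):
    return mean + basis @ c

# "Image" to be matched: generated from known parameters plus noise.
c_true = rng.normal(size=n_params)
image = synthesize(c_true) + 0.01 * rng.normal(size=n_pixels)

# Off-line-learned update matrix: for this linear model, the optimal mapping
# from residual to parameter correction is just the pseudo-inverse of the basis.
R = np.linalg.pinv(basis)

# Analysis-through-synthesis loop: synthesize, compare, correct.
c = np.zeros(n_params)
for _ in range(10):
    residual = image - synthesize(c)
    c = c + R @ residual

print(np.allclose(c, c_true, atol=0.05))  # True: parameters recovered
```

In the real model the relationship between image residuals and parameter updates is only locally linear, so the update directions are learned off-line by perturbing training examples and regressing the parameter displacements, roughly as the survey describes.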
3.2. Recognition from Intensity Images

Many methods of face recognition have been proposed during the past 30 years. Face recognition is such a challenging yet interesting problem that it has attracted researchers from different backgrounds: psychology, pattern recognition, neural networks, computer vision, and computer graphics. It is due to this fact that the literature on face recognition is vast and diverse. Often, a single system involves techniques motivated by different principles. This mixture of techniques makes it difficult to classify these systems based purely on what types of techniques they use for feature representation or classification. To obtain a clear, high-level categorization, we instead follow a guideline suggested by the psychological study of how humans use holistic and local features. Specifically, we have the following categorization:

(1) Holistic matching methods. These methods use the whole face region as the raw input to a recognition system. One of the most widely used representations of the face region is eigenpictures [Kirby and Sirovich 1990; Sirovich and Kirby 1987], which are based on principal component analysis.

(2) Feature-based (structural) matching methods. Typically, in these methods, local features such as the eyes, nose, and mouth are first extracted, and their locations and local statistics (geometric and/or appearance) are fed into a structural classifier.

(3) Hybrid methods. Just as the human perception system uses both local features and the whole face region to recognize a face, a machine recognition system should use both. One can argue that these methods could potentially offer the best of the two types of methods.

Within each of these categories, further classification is possible (Table III). Using principal-component analysis (PCA), many face recognition techniques have been developed: eigenfaces [Turk and Pentland 1991], which use a nearest-neighbor classifier; feature-line-based methods, which replace the point-to-point distance with the distance between a point and the feature line linking two stored sample points [Li and Lu 1999]; Fisherfaces [Belhumeur et al. 1997; Liu and Wechsler 2001; Swets and Weng 1996b; Zhao et al. 1998], which use linear/Fisher discriminant analysis (FLD/LDA) [Fisher 1938]; Bayesian methods, which use a probabilistic distance metric [Moghaddam and Pentland 1997]; and SVM methods, which use a support vector machine as the classifier [Phillips 1998]. Utilizing higher-order statistics, independent-component
analysis (ICA) is argued to have more representative power than PCA, and hence may provide better recognition performance than PCA [Bartlett et al. 1998]. Being able to offer potentially greater generalization through learning, neural networks/learning methods have also been applied to face recognition. One example is the Probabilistic Decision-Based Neural Network (PDBNN) method [Lin et al. 1997], and another is the evolutionary pursuit (EP) method [Liu and Wechsler 2000a].

Most earlier methods belong to the category of structural matching methods, using the width of the head, the distances between the eyes and from the eyes to the mouth, etc. [Kelly 1970], or the distances and angles between eye corners, mouth extrema, nostrils, and chin top [Kanade 1973]. More recently, a mixture-distance based approach using manually extracted distances was reported [Cox et al. 1996]. Without finding the exact locations of facial features, Hidden Markov Model (HMM)-based methods use strips of pixels that cover the forehead, eye, nose, mouth, and chin [Nefian and Hayes 1998; Samaria 1994; Samaria and Young 1994]. Nefian and Hayes [1998] reported better performance than Samaria [1994] by using the KL projection coefficients instead of the strips of raw pixels. One of the most successful systems in this category is the graph matching system [Okada et al. 1998; Wiskott et al. 1997], which is based on the Dynamic Link Architecture (DLA) [Buhmann et al. 1990; Lades et al. 1993]. Using an unsupervised learning method based on a self-organizing map (SOM), a system based on a convolutional neural network (CNN) has been developed [Lawrence et al. 1997].

In the hybrid method category, we will briefly review the modular eigenface method [Pentland et al. 1994], a hybrid representation based on PCA and local feature analysis (LFA) [Penev and Atick 1996], a flexible appearance model-based method [Lanitis et al. 1995], and a recent development [Huang et al. 2003] along this direction. In [Pentland et al. 1994],
the use of hybrid features obtained by combining eigenfaces and other eigenmodules (eigeneyes, eigenmouth, and eigennose) is explored. Though experiments show only slight improvements over holistic eigenfaces or eigenmodules based on structural matching, we believe that these types of methods are important and deserve further investigation. Perhaps many relevant problems need to be solved before fruitful results can be expected, for example, how to optimally arbitrate the use of holistic and local features.

Many types of systems have been successfully applied to the task of face recognition, but they all have some advantages and disadvantages. Appropriate schemes should be chosen based on the specific requirements of a given task. Most of the systems reviewed here focus on the subtask of recognition, but others also include automatic face detection and feature extraction, making them fully automatic systems [Lin et al. 1997; Moghaddam and Pentland 1997; Wiskott et al. 1997].

3.2.1. Holistic Approaches

3.2.1.1. Principal-Component Analysis. Starting from the successful low-dimensional reconstruction of faces using KL or PCA projections [Kirby and Sirovich 1990; Sirovich and Kirby 1987], eigenpictures have been one of the major driving forces behind face representation, detection, and recognition. It is well known that there exist significant statistical redundancies in natural images [Ruderman 1994]. For a limited class of objects such as face images that are normalized with respect to scale, translation, and rotation, the redundancy is even greater [Penev and Atick 1996; Zhao 1999]. One of the best global compact representations is KL/PCA, which decorrelates the outputs. More specifically, sample vectors x can be expressed as linear combinations of the orthogonal basis vectors Φ_i:

x = Σ_{i=1}^{n} a_i Φ_i ≈ Σ_{i=1}^{m} a_i Φ_i

(typically m ≪ n), where the Φ_i are obtained by solving the eigenproblem

C Φ = Φ Λ,   (1)

where C is the covariance matrix of the input x, Φ is the matrix of eigenvectors, and Λ is the diagonal matrix of eigenvalues.

An advantage of using such representations is their reduced sensitivity to noise. Some of this noise may be due to small occlusions, as long as the topological structure does not change. For example, good performance under blurring, partial occlusion, and changes in background has been demonstrated in many eigenpicture-based systems, as illustrated in Figure 3. This should not come as a surprise, since the PCA reconstructed images are much better than the original distorted images in terms of their global appearance (Figure 4).

For a better approximation of face images outside the training set, using an extended training set that adds mirror-imaged faces was shown to achieve lower approximation error [Kirby and Sirovich 1990]. Using such an extended training set, the eigenpictures are either symmetric or antisymmetric, with the most leading eigenpictures typically being symmetric.
Fig. 4. Reconstructed images using 300 PCA projection coefficients for electronically modi-
fied images (Figure 3). (From Zhao [1999].)
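As a concrete illustration, the eigenproblem of Eq. (1) and the resulting eigenpicture representation can be sketched in NumPy (a toy example with random vectors standing in for normalized face images; the dimensions, subject count, and the choice m = 10 are illustrative). The same weights support the nearest-neighbor matching used by eigenfaces:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy gallery: 20 "faces" (flattened, normalized images), one per subject.
n_pixels, n_subjects = 64, 20
X = rng.normal(size=(n_subjects, n_pixels))

# Solve the eigenproblem C Phi = Phi Lambda for the data covariance (Eq. (1)).
mean = X.mean(axis=0)
C = np.cov(X - mean, rowvar=False)
eigvals, Phi = np.linalg.eigh(C)     # eigenvalues in ascending order
Phi = Phi[:, ::-1][:, :10]           # keep the m = 10 leading eigenpictures

# Each face is represented by its vector of projection weights a_i.
weights = (X - mean) @ Phi

def identify(probe):
    """Nearest-neighbor identification in the eigenpicture weight space."""
    w = (probe - mean) @ Phi
    return int(np.argmin(np.linalg.norm(weights - w, axis=1)))

# A noisy new image of subject 7 should be matched back to subject 7.
probe = X[7] + 0.1 * rng.normal(size=n_pixels)
print(identify(probe))
```

Real systems compute the eigenvectors from the much smaller Gram matrix when the number of training images is far below the pixel count, but the representation is the same.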
The first really successful demonstration of machine recognition of faces was made in [Turk and Pentland 1991] using eigenpictures (also known as eigenfaces) for face detection and identification. Given the eigenfaces, every face in the database can be represented as a vector of weights; the weights are obtained by projecting the image onto the eigenface components through a simple inner product operation. When a new test image whose identification is required is given, it too is represented by its vector of weights. The test image is identified by locating the image in the database whose weights are closest to the weights of the test image. Using the observation that the projections of a face image and a nonface image onto the face space are usually different, a method of detecting the presence of a face in a given image is obtained. The method was demonstrated using a database of 2,500 face images of 16 subjects, in all combinations of three head orientations, three head sizes, and three lighting conditions.

Using a probabilistic measure of similarity instead of the simple Euclidean distance used with eigenfaces [Turk and Pentland 1991], the standard eigenface approach was extended [Moghaddam and Pentland 1997] to a Bayesian approach. Practically, the major drawback of a Bayesian method is the need to estimate probability distributions in a high-dimensional space from very limited numbers of training samples per class. To avoid this problem, a much simpler two-class problem was created from the multiclass problem by using a similarity measure based on a Bayesian analysis of image differences. Two mutually exclusive classes were defined: Ω_I, representing intrapersonal variations between multiple images of the same individual, and Ω_E, representing extrapersonal variations due to differences in identity. Assuming that both classes are Gaussian-distributed, likelihood functions P(Δ|Ω_I) and P(Δ|Ω_E) were estimated for a given intensity difference Δ = I_1 − I_2. Given these likelihood functions and using the MAP rule, two face images are determined to belong to the same individual if P(Δ|Ω_I) > P(Δ|Ω_E). A large performance improvement of this probabilistic matching technique over standard nearest-neighbor eigenspace matching was reported using large face datasets including the FERET database [Phillips et al. 2000]. In Moghaddam and Pentland [1997], an efficient technique of probability density estimation was proposed by decomposing the input space into two mutually exclusive subspaces: the principal subspace F and its orthogonal complement F̄ (a similar idea was explored in Sung and Poggio [1997]). Covariances are estimated only in the principal subspace, for use in the Mahalanobis distance [Fukunaga 1989]. Experimental results have been reported using different subspace dimensionalities M_I and M_E for Ω_I and Ω_E. For example, M_I = 10 and M_E = 30 were used for internal tests, while M_I = M_E = 125 were used for the FERET test. In Figure 5, the so-called dual eigenfaces, separately trained on samples from Ω_I and Ω_E, are plotted along with the standard eigenfaces. While the extrapersonal
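The intrapersonal/extrapersonal MAP rule described above can be sketched minimally as follows, assuming zero-mean Gaussians with diagonal covariance fitted to synthetic difference vectors (a simplification; as discussed above, the actual method estimates the densities in principal subspaces):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 30  # dimensionality of the (projected) difference image Delta

# Synthetic training differences: intrapersonal differences (same person)
# are small, extrapersonal differences (different people) are large.
intra = 0.3 * rng.normal(size=(500, d))   # samples of Delta from Omega_I
extra = 1.0 * rng.normal(size=(500, d))   # samples of Delta from Omega_E

# Zero-mean Gaussian likelihoods with diagonal covariance.
var_i = intra.var(axis=0)
var_e = extra.var(axis=0)

def log_likelihood(delta, var):
    return -0.5 * np.sum(delta**2 / var + np.log(2.0 * np.pi * var))

def same_person(i1, i2):
    """MAP rule (equal priors): same iff P(Delta|Omega_I) > P(Delta|Omega_E)."""
    delta = i1 - i2
    return log_likelihood(delta, var_i) > log_likelihood(delta, var_e)

face = rng.normal(size=d)
print(same_person(face, face + 0.3 * rng.normal(size=d)))  # small difference: True
print(same_person(face, rng.normal(size=d)))               # new identity: False
```

Note that matching reduces to a two-class decision on the difference vector, which is what makes the Bayesian reformulation tractable with few samples per identity.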
Fig. 7. Two architectures for performing ICA on images. Left: architecture for
finding statistically independent basis images. Performing source separation on
the face images produces independent images in the rows of U . Right: architecture
for finding a factorial code. Performing source separation on the pixels produces a
factorial code in the columns of the output matrix U [Bartlett et al. 1998]. (Courtesy
of M. Bartlett, H. Lades, and T. Sejnowski.)
Fig. 8. Comparison of basis images using two architectures for performing ICA: (a) 25 indepen-
dent components of Architecture I, (b) 25 independent components of Architecture II [Bartlett
et al. 1998]. (Courtesy of M. Bartlett, H. Lades, and T. Sejnowski.)
Fig. 10. The bunch graph representation of faces used in elastic graph matching [Wiskott et al.
1997]. (Courtesy of L. Wiskott, J.-M. Fellous, and C. von der Malsburg.)
Kelly 1970] as well as 1D [Samaria and Young 1994] and pseudo-2D [Samaria 1994] HMM methods. One of the most successful of these systems is the Elastic Bunch Graph Matching (EBGM) system [Okada et al. 1998; Wiskott et al. 1997], which is based on DLA [Buhmann et al. 1990; Lades et al. 1993]. Wavelets, especially Gabor wavelets, play a building-block role for facial representation in these graph matching methods. A typical local feature representation consists of wavelet coefficients for different scales and rotations based on fixed wavelet bases (called jets in Okada et al. [1998]). These locally estimated wavelet coefficients are robust to illumination change, translation, distortion, rotation, and scaling. The basic 2D Gabor function and its Fourier transform are

g(x, y : u_0, v_0) = exp(−(x²/2σ_x² + y²/2σ_y²) + 2πi[u_0 x + v_0 y]),
G(u, v) = exp(−2π²(σ_x²(u − u_0)² + σ_y²(v − v_0)²)),   (5)

where σ_x and σ_y represent the spatial widths of the Gaussian and (u_0, v_0) is the frequency of the complex sinusoid.

DLAs attempt to solve some of the conceptual problems of conventional artificial neural networks, the most prominent of these being the representation of syntactical relationships in neural networks. DLAs use synaptic plasticity and are able to form sets of neurons grouped into structured graphs while maintaining the advantages of neural systems. Both Buhmann et al. [1990] and Lades et al. [1993] used Gabor-based wavelets (Figure 10(a)) as the features. As described in Lades et al. [1993], the DLA's basic mechanism, in addition to the connection parameter T_ij between two neurons (i, j), is a dynamic variable J_ij. Only the J-variables play the role of synaptic weights for signal transmission. The T-parameters merely act to constrain the J-variables, for example, 0 ≤ J_ij ≤ T_ij. The T-parameters can be changed slowly by long-term synaptic plasticity. The weights J_ij are subject to rapid modification and are controlled by the signal correlations between neurons i and j: negative signal correlations lead to a decrease, and positive signal correlations to an increase, in J_ij. In the absence of any correlation, J_ij slowly returns to a resting state, a fixed fraction of T_ij. Each stored image is formed by picking a rectangular grid of points as graph nodes. The grid is appropriately positioned over the image and is stored with each grid point's locally determined jet (Figure 10(a)), and serves to represent the pattern classes. Recognition of a new image takes place by transforming the image into the grid of jets and matching all stored model graphs to the image. Conformation of the DLA is done by establishing and dynamically modifying links between vertices in the model domain.

The DLA architecture was recently extended to Elastic Bunch Graph Matching [Wiskott et al. 1997] (Figure 10). This is similar to the graph described above, but instead of attaching only a single jet to each node, the authors attached a set
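A direct implementation of the Gabor kernel of Eq. (5), together with a toy "jet" of magnitude responses over several scales and orientations, might look as follows (the frequency, orientation, and width choices are illustrative; EBGM systems use specific wavelet families and also exploit phase information):

```python
import numpy as np

def gabor(size, sigma_x, sigma_y, u0, v0):
    """2D Gabor kernel g(x, y : u0, v0) of Eq. (5): a complex sinusoid of
    frequency (u0, v0) under a Gaussian envelope of widths sigma_x, sigma_y."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = np.exp(-(x**2 / (2 * sigma_x**2) + y**2 / (2 * sigma_y**2)))
    carrier = np.exp(2j * np.pi * (u0 * x + v0 * y))
    return envelope * carrier

def jet(image, cx, cy, freqs=(0.05, 0.1, 0.2), n_orient=4, size=21):
    """A toy 'jet': Gabor magnitude responses at one point over several
    scales and orientations (parameter choices here are illustrative)."""
    half = size // 2
    patch = image[cy - half:cy + half + 1, cx - half:cx + half + 1]
    coeffs = []
    for f in freqs:
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            g = gabor(size, 4.0, 4.0, f * np.cos(theta), f * np.sin(theta))
            coeffs.append(abs(np.sum(patch * np.conj(g))))
    return np.array(coeffs)

img = np.random.default_rng(3).normal(size=(64, 64))
print(jet(img, 32, 32).shape)  # (12,)
```

Taking magnitudes, as here, is what gives the coefficients their robustness to small translations noted above.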
Fig. 12. LFA kernels K(x_i, y) at different grids x_i [Penev and Atick 1996].
small fraction of them are active, corresponding to natural objects/signals that are statistically redundant [Ruderman 1994]. From the activity of these sparsely distributed receptors, the brain has to discover where and what objects are in the field of view and recover their attributes. Consequently, one expects to represent the natural objects/signals in a subspace of lower dimensionality by finding a suitable parameterization. For a limited class of objects such as faces which are correctly aligned and scaled, this suggests that even lower dimensionality can be expected [Penev and Atick 1996]. One good example is the successful use of the truncated PCA expansion to approximate frontal face images in a linear subspace [Kirby and Sirovich 1990; Sirovich and Kirby 1987].

Going a step further, the whole face region stimulates a full 2D array of receptors, each of which corresponds to a location in the face, but some of these receptors may be inactive. To exploit this redundancy, LFA is used to extract topographic local features from the global PCA modes. Unlike the PCA kernels Φ_i, which contain no topographic information (their supports extend over the entire grid of images), the LFA kernels K(x_i, y) (Figure 12) at selected grids x_i have local support. (These kernels, indexed by the grids x_i, are similar to the ICA kernels in the first ICA system architecture [Bartlett et al. 1998; Bell and Sejnowski 1995].) The search for the best topographic set of sparsely distributed grids {x_o} based on reconstruction error is called sparsification and is described in Penev and Atick [1996]. Two interesting points are demonstrated in this paper: (1) using the same number of kernels, the perceptual reconstruction quality of LFA based on the optimal set of grids is better than that of PCA; for a particular input, the mean square error is 227 for PCA and 184 for LFA; (2) keeping the second PCA eigenmodel in LFA reconstruction reduces the mean square error to 152, suggesting the hybrid use of PCA and LFA. No results on recognition performance based on LFA were reported. LFA is claimed to be used in Visionics's commercial system FaceIt (Table II).

A flexible appearance model based method for automatic face recognition was presented in [Lanitis et al. 1995]. To identify a face, both shape and gray-level information are modeled and used. The shape model is an ASM; these are statistical models of the shapes of objects which iteratively deform to fit an example of the shape in a new image. The statistical shape model is trained on example images using PCA, where the variables are the coordinates of the shape model points. For the purpose of classification, the shape variations due to interclass variation are separated from those due to within-class variations (such as small variations in 3D orientation and facial expression) using discriminant analysis. Based on the average shape of the
that were not strongly overlapped and contained gray-scale structures were used for classification. In addition, the face region was added to the nine components to form a single feature vector (a hybrid method), which was then used to train an SVM classifier [Vapnik 1995]. Training on three images and testing on 200 images per subject led to the following recognition rates on a set of six subjects: 90% for the hybrid method and roughly 10% for the global method that used the face region only; the false positive rate was 10%.

3.3. Summary and Discussion

Face recognition based on still images or frames captured from a video stream can be viewed as 2D image matching and recognition; range images are not available in most commercial/law enforcement applications. Face recognition based on other sensing modalities such as sketches and infrared images is also possible. Even though this is an oversimplification of the actual recognition problem of 3D objects based on 2D images, we have focused on this 2D problem, and we will address two important issues about 2D recognition of 3D face objects in Section 6. Significant progress has been achieved on various aspects of face recognition: segmentation, feature extraction, and recognition of faces in intensity images. Recently, progress has also been made on constructing fully automatic systems that integrate all these techniques.

3.3.1. Status of Face Recognition. After more than 30 years of research and development, basic 2D face recognition has reached a mature level, and many commercial systems are available (Table II) for various applications (Table I).

Early research on face recognition was primarily focused on the feasibility question, that is: is machine recognition of faces possible? Experiments were usually carried out using datasets consisting of as few as 10 images. Significant advances were made during the mid-1990s, with many methods proposed and tested on datasets consisting of as many as 100 images. More recently, practical methods have emerged that aim at more realistic applications. In the recent comprehensive FERET evaluations [Phillips et al. 2000; Phillips et al. 1998b; Rizvi et al. 1998], aimed at evaluating different systems using the same large database containing thousands of images, the systems described in Moghaddam and Pentland [1997], Swets and Weng [1996b], Turk and Pentland [1991], Wiskott et al. [1997], and Zhao et al. [1998], as well as others, were evaluated. The EBGM system [Wiskott et al. 1997], the subspace LDA system [Zhao et al. 1998], and the probabilistic eigenface system [Moghaddam and Pentland 1997] were judged to be among the top three, with each method showing different levels of performance on different subsets of sequestered images. A brief summary of the FERET evaluations will be presented in Section 5. Recently, more extensive evaluations using commercial systems and thousands of images have been performed in the FRVT 2000 [Blackburn et al. 2001] and FRVT 2002 [Phillips et al. 2003] tests.

3.3.2. Lessons, Facts, and Highlights. During the development of face recognition systems, many lessons have been learned which may provide some guidance in the development of new methods and systems.

Advances in face recognition have come from considering various aspects of this specialized perception problem. Earlier methods treated face recognition as a standard pattern recognition problem; later methods focused more on the representation aspect, after realizing its uniqueness (using domain knowledge); more recent methods have been concerned with both representation and recognition, so that a robust system with good generalization capability can be built. Face recognition continues to adopt state-of-the-art techniques from learning, computer vision, and pattern recognition. For example, distribution modeling using mixtures of Gaussians, and SVM learning methods, have been used in face detection/recognition.
shape. In Craw and Cameron [1996], manually selected points are used to warp the input images to the mean shape, yielding shape-free images. Because of this difference, PCA was a good classifier in Moghaddam and Pentland [1997] for the shape-free representations, but it may not be as good for the simply normalized representations. Recently, systematic comparisons and independent reevaluations of existing methods have been published [Beveridge et al. 2001]. This is beneficial to the research community. However, since the methods need to be reimplemented, and not all the details of the original implementations can be taken into account, it is difficult to carry out absolutely fair comparisons.

Over 30 years of research has provided us with a vast number of methods and systems. Recognizing the fact that each method has its advantages and disadvantages, we should select methods and systems appropriate to the application. For example, local feature based methods cannot be applied when the input image contains a small face region, say 15 × 15 pixels. Another issue is when to use PCA and when to use LDA in building a system. Apparently, when the number of training samples per class is large, LDA is the best choice. On the other hand, if only one or two samples are available per class (a degenerate case for LDA), PCA is a better choice. For a more detailed comparison of PCA versus LDA, see Beveridge et al. [2001] and Martinez and Kak [2001]. One way to unify PCA and LDA is to use regularized subspace LDA [Zhao et al. 1999].

3.3.3. Open Research Issues. Though machine recognition of faces from still images has achieved a certain level of success, its performance is still far from that of human perception. Specifically, we can list the following open issues:

- Hybrid face recognition systems that use both holistic and local features resemble the human perceptual system. While the holistic approach provides a quick recognition method, the discriminant information that it provides may not be rich enough to handle very large databases. This insufficiency can be compensated for by local feature methods. However, many questions need to be answered before we can build such a combined system. One important question is how to arbitrate the use of holistic and local features. As a first step, a simple, naive engineering approach would be to weight the features. But how to determine whether and how to use the features remains an open problem.

- The challenge of developing face detection techniques that report not only the presence of a face but also the accurate locations of facial features under large pose and illumination variations still remains. Without accurate localization of important features, accurate and robust face recognition cannot be achieved.

- How to model face variation under realistic settings (for example, outdoor environments and natural aging) is still challenging.

4. FACE RECOGNITION FROM IMAGE SEQUENCES

A typical video-based face recognition system automatically detects face regions, extracts features from the video, and recognizes facial identity if a face is present. In surveillance, information security, and access control applications, face recognition and identification from a video sequence is an important problem. Face recognition based on video is preferable to recognition from still images since, as demonstrated in Bruce et al. [1998] and Knight and Johnston [1997], motion helps in the recognition of (familiar) faces when the images are negated, inverted, or thresholded. It was also demonstrated that humans can recognize animated faces better than randomly rearranged images from the same set. Though recognition of faces from a video sequence is a direct extension of still-image-based recognition, in our opinion, true video-based face recognition techniques that coherently use both spatial and temporal information started only a few years ago
and still need further investigation. Significant challenges for video-based recognition still exist; we list several of them here.

(1) The quality of video is low. Usually, video acquisition occurs outdoors (or indoors, but under bad conditions for video capture) and the subjects are not cooperative; hence there may be large illumination and pose variations in the face images. In addition, partial occlusion and disguise are possible.

(2) Face images are small. Again, due to the acquisition conditions, the face image sizes are smaller (sometimes much smaller) than the sizes assumed in most still-image-based face recognition systems. For example, the valid face region can be as small as 15 × 15 pixels (notice that this is totally different from the situation where we have images with large face regions but the final face region fed into a classifier is 15 × 15), whereas the face image sizes used in feature-based still-image-based systems can be as large as 128 × 128. Small-size images not only make the recognition task more difficult, but also affect the accuracy of face segmentation, as well as the accurate detection of the fiducial points/landmarks that are often needed in recognition methods.

(3) The characteristics of faces/human body parts. During the past 8 years, research on human action/behavior recognition from video has been very active and fruitful. Generic description of human behavior not particular to an individual is an interesting and useful concept. One of the main reasons for the feasibility of generic descriptions of human behavior is that the intraclass variation of human bodies, and in particular faces, is much smaller than the difference between objects inside and outside the class. For the same reason, recognition of individuals within the class is difficult. For example, detecting and localizing faces is typically much easier than recognizing a specific face.

Before we examine existing video-based face recognition algorithms, we briefly review three closely related techniques: face segmentation and pose estimation, face tracking, and face modeling. These techniques are critical for the realization of the full potential of video-based face recognition.

4.1. Basic Techniques of Video-Based Face Recognition

In Chellappa et al. [1995], four computer vision areas were mentioned as being important for video-based face recognition: segmentation of moving objects (humans) from a video sequence; structure estimation; 3D models for faces; and nonrigid motion analysis. For example, in Jebara et al. [1998] a face modeling system which estimates facial features and texture from a video stream was described. This system utilizes all four techniques: segmentation of the face based on skin color to initiate tracking; use of a 3D face model based on laser-scanned range data to normalize the image (by facial feature alignment and texture mapping to generate a frontal view) and construction of an eigensubspace for 3D heads; use of structure from motion (SfM) at each feature point to provide depth information; and nonrigid motion analysis of the facial features based on simple 2D SSD (sum of squared differences) tracking constrained by a global 3D model. Based on the current development of video-based face recognition, we think it is better to review three specific face-related techniques instead of the above four general areas. The three video-based face-related techniques are: face segmentation and pose estimation, face tracking, and face modeling.

4.1.1. Face Segmentation and Pose Estimation. Early attempts [Turk and Pentland 1991] at segmenting moving faces from an image sequence used simple pixel-based change detection procedures based on difference images. These techniques may run into difficulties when multiple moving objects and occlusion are present. More sophisticated methods use estimated flow
fields for segmenting humans in motion [Shio and Sklansky 1991]. More recent methods [Choudhury et al. 1999; McKenna and Gong 1998] have used motion and/or color information to speed up the process of searching for possible face regions. After candidate face regions are located, still-image-based face detection techniques can be applied to locate the faces [Yang et al. 2002]. Given a face region, important facial features can be located. The locations of the feature points can be used for pose estimation, which is important for synthesizing a virtual frontal view [Choudhury et al. 1999]. Newly developed segmentation methods locate the face and estimate its pose simultaneously, without extracting features [Gu et al. 2001; Li et al. 2001b]. This is achieved by learning from multiview face examples which are labeled with manually determined pose angles.

4.1.2. Face and Feature Tracking. After faces are located, the faces and their features can be tracked. Face tracking and feature tracking are critical for reconstructing a face model (depth) through SfM, and feature tracking is essential for facial expression recognition and gaze recognition. Tracking also plays a key role in spatiotemporal-based recognition methods [Li and Chellappa 2001; Li et al. 2001a] which directly use the tracking information.

In its most general form, tracking is essentially motion estimation. However, general motion estimation has fundamental limitations, such as the aperture problem. For images like faces, some regions are too smooth to estimate flow accurately, and sometimes the change in local appearance is too large to give reliable flow. Fortunately, these problems are alleviated by face modeling, which exploits domain knowledge. In general, tracking and modeling are dual processes: tracking is constrained by a generic 3D model

(1) head tracking, which involves tracking the motion of a rigid object that is performing rotations and translations; (2) facial feature tracking, which involves tracking nonrigid deformations that are limited by the anatomy of the head, that is, articulated motion due to speech or facial expressions and deformable motion due to muscle contractions and relaxations; and (3) complete tracking, which involves tracking both the head and the facial features.

Early efforts focused on the first two problems: head tracking [Azarbayejani et al. 1993] and facial feature tracking [Terzopoulos and Waters 1993; Yuille and Hallinan 1992]. In Azarbayejani et al. [1993], an approach to head tracking using points with high Hessian values was proposed. Several such points on the head are tracked, and the 3D motion parameters of the head are recovered by solving an overconstrained set of motion equations. Facial feature tracking methods may make use of the feature boundary or the feature region. Feature boundary tracking attempts to track and accurately delineate the shape of the facial feature, for example, to track the contours of the lips and mouth [Terzopoulos and Waters 1993]. Feature region tracking addresses the simpler problem of tracking a region, such as a bounding box, that surrounds the facial feature [Black et al. 1995].

In Black et al. [1995], a tracking system based on local parameterized models is used to recognize facial expressions. The models include a planar model for the head, local affine models for the eyes, and local affine models and curvature for the mouth and eyebrows. A face tracking system was used in Maurer and Malsburg [1996b] to estimate the pose of the face. This system used a graph representation with about 20-40 nodes/landmarks to model the face. Knowledge about faces is used to find the landmarks in the first frame. Two tracking systems described in Jebara et al. [1998] and Strom et al. [1999] model faces completely with
or a learned statistical model under de- texture and geometry. Both systems use
formation, and individual models are re- generic 3D models and SfM to recover
fined through tracking. Face tracking can the face structure. Jebara et al. [1998] re-
be roughly divided into three categories: lied fixed feature points (eyes, nose tip),
while Strom et al. [1999] tracked only points with high Hessian values. Also, Jebara et al. [1998] tracked 2D features in 3D by deforming them, while Strom et al. [1999] relied on direct comparison of a 3D model to the image. Methods have been proposed in Black et al. [1998] and Hager and Belhumeur [1998] to solve the varying appearance (both geometry and photometry) problem in tracking. Some of the newest model-based tracking methods calculate the 3D motions and deformations directly from image intensities [Brand and Bhotika 2001], thus eliminating information-lossy intermediate representations.

4.1.3. Face Modeling. Modeling of faces includes 3D shape modeling and texture modeling. Large texture variations due to changes in illumination are addressed in Section 6; here we focus on 3D shape modeling. 3D models of faces have been employed in the graphics, animation, and model-based image compression literature. More complicated models are used in applications such as forensic face reconstruction from partial information.

In computer vision, one of the most widely used methods of estimating 3D shape from a video sequence is SfM, which estimates the 3D depths of interesting points. The unconstrained SfM problem has been approached in two ways. In the differential approach, one computes some type of flow field (optical, image, or normal) and uses it to estimate the depths of visible points. The difficulty in this approach is reliable computation of the flow field. In the discrete approach, a set of features such as points, edges, corners, lines, or contours is tracked over a sequence of frames, and the depths of these features are computed. To overcome the difficulty of feature tracking, bundle adjustment [Triggs et al. 2000] can be used to obtain better and more robust results.

Recently, multiview-based 2D methods have gained popularity. In Li et al. [2001b], the model consists of a sparse 3D shape model learned from 2D images labeled with pose and landmarks, a shape-and-pose-free texture model, and an affine geometrical model. An alternative approach is to use 3D models such as the deformable model of DeCarlo and Metaxas [2000] or the linear 3D object class model of Blanz and Vetter [1999]. (In Blanz and Vetter [1999] a morphable 3D face model consisting of shape and texture was directly matched to single/multiple input images; as a consequence, head orientation, illumination conditions, and other parameters could be free variables subject to optimization.) In Blanz and Vetter [1999], real-time 3D modeling and tracking of faces was described; a generic 3D head model was aligned to match frontal views of the face in a video sequence.

4.2. Video-Based Face Recognition

Historically, video face recognition originated from still-image-based techniques (Table IV). That is, the system automatically detects and segments the face from the video, and then applies still-image face recognition techniques. Many methods reviewed in Section 3 belong to this category: eigenfaces [Turk and Pentland 1991], probabilistic eigenfaces [Moghaddam and Pentland 1997], the EBGM method [Okada et al. 1998; Wiskott et al. 1997], and the PDBNN method [Lin et al. 1997]. An improvement over these methods is to apply tracking; this can help
Fig. 14. Varying the most significant identity parameters (top) and manipulating residual variation without affecting identity (bottom) [Edwards et al. 1998].
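The discrete SfM approach discussed above (track features over frames, then recover their depths) reduces, in the simplest two-frame case with a known sideways camera translation, to triangulation from disparity. A minimal sketch of that special case; all numbers are hypothetical and not taken from any system reviewed here:

```python
def depth_from_disparity(focal_px, baseline_m, x_first, x_second):
    """Depth of a tracked feature from its pixel shift between two frames.

    Assumes a purely horizontal camera translation of baseline_m meters,
    so depth = focal_length * baseline / disparity (two-view triangulation).
    """
    disparity = x_first - x_second  # feature shift in pixels
    if disparity <= 0:
        raise ValueError("feature must shift against the camera motion")
    return focal_px * baseline_m / disparity

# A feature tracked from x=320 to x=310 px under a 0.1 m sideways move,
# with a 500 px focal length, lies at 500 * 0.1 / 10 = 5 m.
print(depth_from_disparity(500, 0.1, 320, 310))
```

Real SfM solves an overconstrained system over many frames and features; bundle adjustment [Triggs et al. 2000] refines all depths and camera parameters jointly instead of triangulating each feature independently.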
the thresholds used to initiate, track, and delete an object.

To find the face region in an image, the preselector uses a generic sparse graph consisting of 16 nodes learned from eight example face images. The landmark finder uses a dense graph consisting of 48 nodes learned from 25 example images to find landmarks such as the eyes and the nose tip. Finally, an elastic graph matching scheme is employed to identify the face. A recognition rate of about 90% was achieved; the size of the database is not known.

A multimodal person recognition system was described in Choudhury et al. [1999]. This system consists of a face recognition module, a speaker identification module, and a classifier fusion module. It has the following characteristics: (1) the face recognition module can detect and compensate for pose variations, and the speaker identification module can detect and compensate for changes in the auditory background; (2) the most reliable video frames and audio clips are selected for recognition; (3) 3D information about the head obtained through SfM is used to detect the presence of an actual person as opposed to an image of that person.

Two key parts of the face recognition module are face detection/tracking and eigenface recognition. The face is detected using skin color information and a learned mixture-of-Gaussians model. The facial features are then located using symmetry transforms and image intensity gradients. Correlation-based methods are used to track the feature points. The locations of these feature points are used to estimate the pose of the face. This pose estimate and a 3D head model are used to warp the detected face image into a frontal view. For recognition, the feature locations are refined and the face is normalized with eyes and mouth in fixed locations. Images from the face tracker are used to train a frontal eigenspace, and the leading 35 eigenvectors are retained. Face recognition is then performed using a probabilistic eigenface approach where the projection coefficients of all images of each person are modeled as a Gaussian distribution.

Finally, the face and speaker recognition modules are combined using a Bayes net. The system was tested in an ATM scenario, a controlled environment. An ATM session begins when the subject enters the camera's field of view and the system detects his/her face. The system then greets the user and begins the banking transaction, which involves a series of questions by the system and answers by the user. Data for 26 people were collected; the normalized face images were 40×80 pixels and the audio was sampled at 16 kHz. These experiments on small databases and well-controlled environments showed that the combination of audio and video improved performance, and that 100% recognition and verification were achieved when the image/audio clips with the highest confidence scores were used.

In Li and Chellappa [2001], a face verification system based on tracking facial features was presented. The basic idea of this approach is to exploit the temporal information available in a video sequence to improve face recognition. First, the feature points defined by Gabor attributes on a regular 2D grid are tracked. Then, the trajectories of these tracked feature points are exploited to identify the person presented in a short video sequence. The proposed tracking-for-verification scheme differs from a pure tracking scheme in that one template face from a database of known persons is selected for tracking. For each template with a specific personal ID, tracking can be performed and trajectories can be obtained. Based on the characteristics of these trajectories, identification can be carried out. According to the authors, the trajectories of the same person are more coherent than those of different persons, as illustrated in Figure 15. Such characteristics can also be observed in the posterior probabilities over time by assuming different classes. In other words, the posterior probabilities for the true hypothesis tend to be higher than those for false hypotheses. This in turn can be used for identification. Testing results on a small database of 19
Fig. 15. Corresponding feature points obtained from 20 frames: (a) result of matching the same person to a video, (b) result of matching a different person to the video, (c) trajectories of (a), (d) trajectories of (b) [Li and Chellappa 2001].
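The general principle behind such temporal schemes, that evidence accumulated over many frames favors the true hypothesis, can be illustrated with a toy sketch. This is only a schematic of the idea, not the authors' actual trajectory-based algorithm, and the per-frame scores are hypothetical:

```python
import math

def identify(frame_scores_by_id):
    """Pick the identity whose per-frame match scores, accumulated over
    the whole clip, are highest.

    frame_scores_by_id maps a candidate ID to a list of per-frame
    likelihoods in (0, 1]. Summing log-likelihoods over time rewards the
    hypothesis that stays consistently plausible, mirroring the observation
    that posteriors for the true hypothesis tend to dominate over time.
    """
    totals = {pid: sum(math.log(s) for s in scores)
              for pid, scores in frame_scores_by_id.items()}
    return max(totals, key=totals.get)

# Hypothetical per-frame scores for two candidate templates: person_B
# matches well in one frame but is inconsistent across the clip.
scores = {"person_A": [0.9, 0.8, 0.85, 0.9],
          "person_B": [0.9, 0.3, 0.4, 0.2]}
print(identify(scores))  # person_A
```

A single-frame (still-image) decision on the first frame alone could not separate the two candidates here; the temporal accumulation does.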
Fig. 17. Images from the FERET dataset; these images are of size 384 × 256.
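The evaluations reviewed below report two basic measures: the rank-1 identification rate and the verification equal error rate. A minimal sketch of both, on hypothetical similarity scores (a real evaluation would report full CMC and ROC curves):

```python
def rank1_rate(similarity, true_id):
    """Fraction of probes whose highest-scoring gallery entry is correct.

    similarity: list of dicts, one per probe, mapping gallery ID -> score.
    true_id:    list of the correct gallery ID for each probe.
    """
    hits = sum(max(row, key=row.get) == t
               for row, t in zip(similarity, true_id))
    return hits / len(true_id)

def equal_error_rate(genuine, impostor):
    """Sweep a decision threshold and return the operating point where the
    false-reject rate (genuine scores below threshold) and false-accept
    rate (impostor scores at or above it) are closest: an EER estimate."""
    best = None
    for t in sorted(genuine + impostor):
        frr = sum(g < t for g in genuine) / len(genuine)
        far = sum(i >= t for i in impostor) / len(impostor)
        if best is None or abs(frr - far) < best[0]:
            best = (abs(frr - far), (frr + far) / 2)
    return best[1]

genuine = [0.9, 0.8, 0.7]   # hypothetical matching-pair scores
impostor = [0.3, 0.2, 0.1]  # hypothetical non-matching-pair scores
print(equal_error_rate(genuine, impostor))  # 0.0 -- perfectly separable
```

On dense score sets the threshold sweep traces the ROC mentioned in the text; the EER is simply the point where the two error curves cross.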
evaluation (Sep96) will be briefly reviewed here. Because the entire FERET data set has been released, the Sep96 protocol provides a good benchmark for the performance of new algorithms. For the Sep96 evaluation, there was a primary gallery consisting of one frontal image (fa) per person for 1196 individuals. This was the core gallery used to measure performance for the following four different probe sets:

- fb probes: gallery and probe images of an individual taken on the same day with the same lighting (1195 probes);
- fc probes: gallery and probe images of an individual taken on the same day with different lighting (194 probes);
- Dup I probes: gallery and probe images of an individual taken on different days (duplicate images; 722 probes); and
- Dup II probes: gallery and probe images of an individual taken over a year apart (the gallery consisted of 894 images; 234 probes).

Performance was measured using two basic methods. The first measured identification performance, where the primary performance statistic is the percentage of probes that are correctly identified by the algorithm. The second measured verification performance, where the primary performance measure is the equal error rate between the probability of false alarm and the probability of correct verification. (A more complete method of reporting identification performance is a cumulative match characteristic; for verification performance it is a receiver operating characteristic (ROC).)

The Sep96 evaluation tested the following 10 algorithms:

- an algorithm from Excalibur Corporation (Carlsbad, CA) (Sept. 1996);
- two algorithms from the MIT Media Laboratory (Sept. 1996) [Moghaddam et al. 1996; Turk and Pentland 1991];
- three linear discriminant analysis-based algorithms from Michigan State University [Swets and Weng 1996b] (Sept. 1996) and the University of Maryland [Etemad and Chellappa 1997; Zhao et al. 1998] (Sept. 1996 and March 1997);
- a gray-scale projection algorithm from Rutgers University [Wilder 1994] (Sept. 1996);
- an Elastic Graph Matching algorithm from the University of Southern California [Okada et al. 1998; Wiskott et al. 1997] (March 1997);
- a baseline PCA algorithm [Moon and Phillips 2001; Turk and Pentland 1991]; and
- a baseline normalized correlation matching algorithm.

Three of the algorithms performed very well: probabilistic eigenface from MIT [Moghaddam et al. 1996], subspace LDA from UMD [Zhao et al. 1998, 1999], and Elastic Graph Matching from USC [Wiskott et al. 1997].

A number of lessons were learned from the FERET evaluations. The first is that performance depends on the probe category and that there is a difference between best and average algorithm performance.

Another lesson is that the scenario has an impact on performance. For
identification, on the fb and duplicate probes, the USC scores were 94% and 59%, and the UMD scores were 96% and 47%. However, for verification, the equal error rates were 2% and 14% for USC, and 1% and 12% for UMD.

5.1.3. Summary. The availability of the FERET database and evaluation technology has had a significant impact on progress in the development of face recognition algorithms. The series of tests has allowed advances in algorithm development to be quantified; for example, the performance improvements in the MIT algorithms between March 1995 and September 1996, and in the UMD algorithms between September 1996 and March 1997.

Another important contribution of the FERET evaluations is the identification of areas for future research. In general, the test results revealed three major problem areas: recognizing duplicates, recognizing people under illumination variations, and recognizing them under pose variations.

5.1.4. FRVT 2000. The Sep96 FERET evaluation measured performance on prototype laboratory systems. After March 1997 there was rapid advancement in the development of commercial face recognition systems. This advancement represented both a maturing of face recognition technology and the development of the supporting systems and infrastructure necessary to create commercial off-the-shelf (COTS) systems. By the beginning of 2000, COTS face recognition systems were readily available.

To assess the state of the art in COTS face recognition systems, the Face Recognition Vendor Test (FRVT) 2000^18 was organized [Blackburn et al. 2001]. FRVT 2000 was a technology evaluation that used the Sep96 evaluation protocol, but was significantly more demanding than the Sep96 FERET evaluation.

Participation in FRVT 2000 was restricted to COTS systems, with companies from Australia, Germany, and the United States participating. The five companies evaluated were Banque-Tec International Pty. Ltd., C-VIS Computer Vision und Automation GmbH, Miros, Inc., Lau Technologies, and Visionics Corporation.

A greater variety of imagery was used in FRVT 2000 than in the FERET evaluations. FRVT 2000 reported results in eight general categories: compression, distance, expression, illumination, media, pose, resolution, and temporal. There was no common gallery across all eight categories; the sizes of the galleries and probe sets varied from category to category.

We briefly summarize the results of FRVT 2000. Full details, including identification and verification performance statistics, can be found in Blackburn et al. [2001]. The media experiments showed that changes in media do not adversely affect performance. Images of a person were taken simultaneously on conventional film and on digital media. The compression experiments showed that compression does not adversely affect performance. Probe images compressed up to 40:1 did not reduce recognition rates. The compression algorithm was JPEG.

FRVT 2000 also examined the effect of pose angle on performance. The results show that pose does not significantly affect performance up to 25°, but that performance is significantly affected when the pose angle reaches 40°.

In the illumination category, two key effects were investigated. The first was lighting change indoors. This was equivalent to the fc probes in FERET. For the best system in this category, the indoor change of lighting did not significantly affect performance. A second experiment tested recognition with an indoor gallery and an outdoor probe set. Moving from indoor to outdoor lighting significantly affected performance, with the best system achieving an identification rate of only 0.55.

The temporal category is equivalent to the duplicate probes in FERET. To compare progress since FERET, dup I and dup II scores were reported. For FRVT 2000 the dup I identification rate was 0.63

^18 http://www.frvt.org.
compared with 0.58 for FERET. The corresponding rates for dup II were 0.64 for FRVT 2000 and 0.52 for FERET. These results showed that there was algorithmic progress between the FERET and FRVT 2000 evaluations. FRVT 2000 showed that two common concerns, the effects of compression and recording media, do not affect performance. It also showed that future areas of interest continue to be duplicates, pose variations, and illumination variations generated when comparing indoor images with outdoor images.

5.1.5. FRVT 2002. The Face Recognition Vendor Test (FRVT) 2002 [Phillips et al. 2003]^18 was a large-scale evaluation of automatic face recognition technology. The primary objective of FRVT 2002 was to provide performance measures for assessing the ability of automatic face recognition systems to meet real-world requirements. Ten participants were evaluated under the direct supervision of the FRVT 2002 organizers in July and August 2002.

The heart of the FRVT 2002 was the high computational intensity test (HCInt). The HCInt consisted of 121,589 operational images of 37,437 people. The images were provided from the U.S. Department of State's Mexican nonimmigrant Visa archive. From this data, real-world performance figures on a very large data set were computed. Performance statistics were computed for verification, identification, and watch list tasks.

FRVT 2002 results showed that normal changes in indoor lighting do not significantly affect performance of the top systems. Approximately the same performance results were obtained using two indoor data sets, with different lighting, in FRVT 2002. In both experiments, the best performer had a 90% verification rate at a false accept rate of 1%. Compared with similar experiments conducted 2 years earlier in FRVT 2000, the results of FRVT 2002 indicated that there had been a 50% reduction in error rates. For the best face recognition systems, the recognition rate for faces captured outdoors, at a false accept rate of 1%, was only 50%. Thus, face recognition from outdoor imagery remains a research challenge area.

A very important question for real-world applications is the rate of decrease in performance as time increases between the acquisition of the database images and the new images presented to a system. FRVT 2002 found that for the top systems, performance degraded at approximately 5% per year.

One open question in face recognition is: how does database and watch list size affect performance? Because of the large number of people and images in the FRVT 2002 data set, FRVT 2002 reported the first large-scale results on this question. For the best system, the top-rank identification rate was 85% on a database of 800 people, 83% on a database of 1,600, and 73% on a database of 37,437. For every doubling of database size, performance decreases by two to three overall percentage points. More generally, identification performance decreases linearly in the logarithm of the database size.

Previous evaluations have reported face recognition performance as a function of imaging properties. For example, previous reports compared the differences in performance when using indoor versus outdoor images, or frontal versus nonfrontal images. FRVT 2002, for the first time, examined the effects of demographics on performance. Two major effects were found. First, recognition rates for males were higher than for females. For the top systems, identification rates for males were 6 to 9 percentage points higher than those for females. For the best system, identification performance on males was 78% and for females it was 79%. Second, recognition rates for older people were higher than for younger people. For 18- to 22-year-olds the average identification rate for the top systems was 62%, and for 38- to 42-year-olds it was 74%. For every 10-year increase in age, performance increased on the average by approximately 5% through age 63.

FRVT 2002 looked at two of these new techniques. The first was the three-dimensional morphable models technique
primarily in the number of subjects (295 rather than 37). The M2VTS database contains five shots of each subject taken at sessions over a period of 3 months; the XM2VTS database contains eight shots of each subject taken at four sessions over a period of 4 months (so that each session contains two repetitions of the sequence). The XM2VTS database was acquired using a Sony VX1000E digital camcorder and a DHR1000UX digital VCR.

In the XM2VTS database, the first shot is a speaking head shot. Each subject, who wore a clip-on microphone, was asked to read three sentences that were written on a board positioned just below the camera. The subjects were asked to read the three simple sentences twice at their normal pace and to pause briefly at the end of each sentence.

The second shot is a rotating head sequence. Each subject was asked to rotate his/her head to the left, to the right, up, and down, and finally to return to the center. The subjects were told that a full profile was required and were asked to repeat the entire sequence twice. The same sequence was used in all four sessions.

An additional dataset containing a 3D model of each subject's head was acquired during each session using a high-precision stereo-based 3D camera developed by the Turing Institute.^20

5.2.2. Evaluation. The M2VTS Lausanne protocol was designed to evaluate the performance of vision- and speech-based person authentication systems on the XM2VTS database. This protocol was defined for the task of verification. The features of the observed person are compared with stored features corresponding to the claimed identity, and the system decides whether the identity claim is true or false on the basis of a similarity score. The subjects whose features are stored in the system's database are called clients, whereas persons claiming a false identity are called imposters.

^20 Turing Institute Web address: http://www.turing.gla.ac.uk/.

The database is divided into three parts: a training set, an evaluation set, and a test set. The training set is used to build client models. The evaluation set is used to compute client and imposter scores. On the basis of these scores, a threshold is chosen that determines whether a person is accepted or rejected. In multimodal classification, the evaluation set can also be used to optimally combine the outputs of several classifiers. The test set is selected to simulate a real authentication scenario. The 295 subjects were randomly divided into 200 clients, 25 evaluation imposters, and 70 test imposters. Two different evaluation configurations were used with different distributions of client training and client evaluation data. For more details, see Messer et al. [1999].

In order to collect face verification results on this database using the Lausanne protocol, a contest was organized in conjunction with ICPR 2000 (the International Conference on Pattern Recognition). There were twelve algorithms from four participants in this contest [Matas et al. 2000]: an EBGM algorithm from IDIAP (Dalle Molle Institute for Perceptual Artificial Intelligence), a slightly modified EBGM algorithm from Aristotle University of Thessaloniki, an FND-based (Fractal Neighbor Distance) algorithm from the University of Sydney, and eight variants of LDA algorithms and one SVM algorithm from the University of Surrey. The performance measures of a verification system are the false acceptance rate (FA) and the false rejection rate (FR). Both FA and FR are influenced by an acceptance threshold. According to the Lausanne protocol, the threshold is set to satisfy certain performance levels on the evaluation set. The same threshold is applied to the test data, and FA and FR on the test data are computed. The best results of FA and FR on the test data (FA/FR: 2.3%/2.5% and 1.2%/1.0% for evaluation configurations I and II, respectively) were obtained using an LDA algorithm with a non-Euclidean metric (University of Surrey) when the threshold was set so that FA was equal to FR on the evaluation result. This result seems to concur with the
equal error rates reported in the FERET protocol. In addition, FA and FR on the test data were reported when the threshold was set so that FA or FR was zero on the evaluation result. For more details on the results, see Matas et al. [2000].

5.2.3. Summary. The results of the M2VTS/XM2VTS projects can be used for a broad range of applications. In the telecommunication field, the results should have a direct impact on network services, where security of information and access will become increasingly important. (Telephone fraud in the U.S. has been estimated to cost several billion dollars a year.)

6. TWO ISSUES IN FACE RECOGNITION: ILLUMINATION AND POSE VARIATION

In this section, we discuss two important issues that are related to face recognition. The best face recognition techniques reviewed in Section 3 were successful in terms of their recognition performance on large databases in well-controlled environments. However, face recognition in an uncontrolled environment is still very challenging. For example, the FERET evaluations and FRVTs revealed that there are at least two major challenges: the illumination variation problem and the pose variation problem. Though many existing systems build in some sort of performance invariance by applying preprocessing methods such as histogram equalization or pose learning, significant illumination or pose change can cause serious performance degradation. In addition, face images can be partially occluded, or the system may need to recognize a person from an image in the database that was acquired some time ago (referred to as the duplicate problem in the FERET tests).

These problems are unavoidable when face images are acquired in an uncontrolled, uncooperative environment, as in surveillance video clips. It is beyond the scope of this paper to discuss all these issues and possible solutions. In this section we discuss only two well-defined problems and review approaches to solving them. Pros and cons of these approaches are pointed out so that an appropriate approach can be applied to a specific task. The majority of the methods reviewed here are generative approaches that can synthesize virtual views under desired illumination and viewing conditions. Many of the reviewed methods have not yet been applied to the task of face recognition, at least not on large databases.^21 This may be for several reasons: some methods may need many sample images per person, pixel-wise accurate alignment of images, or high-quality images for reconstruction; or they may be computationally too expensive to apply to recognition tasks that process thousands of images in near-real-time.

To facilitate discussion and analysis, we adopt a varying-albedo Lambertian reflectance model that relates the image I of an object to the object's surface gradients (p, q) and albedo ρ [Horn and Brooks 1989]:

I = \frac{\rho\,(1 + p P_s + q Q_s)}{\sqrt{1 + p^2 + q^2}\,\sqrt{1 + P_s^2 + Q_s^2}},    (6)

where (p, q) and ρ are the partial derivatives and the varying albedo of the object, respectively, and (P_s, Q_s, 1) represents a single distant light source. The light source can also be represented by the illuminant slant and tilt angles: the slant σ is the angle between the opposite lighting direction and the positive z-axis, and the tilt τ is the angle between the opposite lighting direction and the x-z plane. These angles are related to P_s and Q_s by P_s = tan σ cos τ and Q_s = tan σ sin τ. To simplify the notation, we replace the constant \sqrt{1 + P_s^2 + Q_s^2} by K. For easier analysis, we assume that frontal face objects are bilaterally symmetric about the vertical midlines of the faces.

^21 One exception is a recent report [Blanz and Vetter 2003] where faces were represented using 4448 images from the CMU-PIE databases and 1940 images from the FERET database.
Fig. 18. In each row, the same face appears differently under different illuminations (from the Yale face database).
6.1. The Illumination Problem in Face Recognition

The illumination problem is illustrated in Figure 18, where the same face appears different due to a change in lighting. The changes induced by illumination are often larger than the differences between individuals, causing systems based on comparing images to misclassify input images. This was experimentally observed in Adini et al. [1997] using a dataset of 25 individuals.

In Zhao [1999], an analysis was carried out of how illumination variation changes the eigen-subspace projection coefficients of images, under the assumption of a Lambertian surface. Consider the basic expression for the subspace decomposition of a face image I: I \approx I_A + \sum_{i=1}^{m} a_i \Phi_i, where I_A is the average image, the \Phi_i are the eigenimages, and the a_i are the projection coefficients. Assume that for a particular individual we have a prototype image I_p that is a normally lighted frontal view (P_s = 0, Q_s = 0 in Equation (6)) in the database, and we want to match it against a new image \tilde{I} of the same class under lighting (P_s, Q_s, 1). The corresponding subspace projection coefficient vectors a = [a_1, a_2, \ldots, a_m]^T (for I_p) and \tilde{a} = [\tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_m]^T (for \tilde{I}) are computed as follows:

a_i = I_p \otimes \Phi_i - I_A \otimes \Phi_i,
\tilde{a}_i = \tilde{I} \otimes \Phi_i - I_A \otimes \Phi_i,    (7)

where \otimes denotes the sum of all elementwise products of two matrices (vectors). If we divide the images and the eigenimages into two halves, for example, left and right, we have

a_i = I_p^L \otimes \Phi_i^L + I_p^R \otimes \Phi_i^R - I_A \otimes \Phi_i,
\tilde{a}_i = \tilde{I}^L \otimes \Phi_i^L + \tilde{I}^R \otimes \Phi_i^R - I_A \otimes \Phi_i.    (8)

Based on Equation (6) and the symmetry properties of the eigenimages and face objects, we have

a_i = 2\, I_p^L[x, y] \otimes \Phi_i^L[x, y] - I_A \otimes \Phi_i,
\tilde{a}_i = \frac{2}{K}\bigl(I_p^L[x, y] + I_p^L[x, y]\, q^L[x, y]\, Q_s\bigr) \otimes \Phi_i^L[x, y] - I_A \otimes \Phi_i,    (9)

leading to the following relation:

\tilde{a} = \frac{1}{K}\, a + \frac{Q_s}{K}\, \bigl[f_1^a, f_2^a, \ldots, f_m^a\bigr]^T - \frac{K - 1}{K}\, a_A,    (10)

where f_i^a = 2\,(I_p^L[x, y]\, q^L[x, y]) \otimes \Phi_i^L[x, y] and a_A is the projection coefficient vector of the average image I_A: [I_A \otimes \Phi_1, \ldots, I_A \otimes \Phi_m]. Now let us assume that the training set is extended to include mirror images as in Kirby and Sirovich [1990]. A similar analysis can be carried out, since in such a case the eigenimages are either symmetric (for most leading eigenimages) or antisymmetric.

In general, Equation (11) suggests that a significant illumination change can seriously degrade the performance of
Fig. 19. Changes of projection vectors due to class variation (a) and illumination change (b) are of the same order [Zhao 1999].
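The projection of Equation (7) is just a correlation of the mean-subtracted image with each eigenimage. A toy sketch on hypothetical 2×2 "images" flattened to vectors:

```python
def project(image, mean_image, eigenimages):
    """Subspace projection coefficients a_i = (I - I_A) (x) Phi_i, where
    (x) is the sum of all elementwise products as in Equation (7)."""
    diff = [x - m for x, m in zip(image, mean_image)]
    return [sum(d * e for d, e in zip(diff, phi)) for phi in eigenimages]

# Hypothetical flattened 2x2 images and two orthonormal eigenimages.
I   = [0.9, 0.1, 0.4, 0.6]
I_A = [0.5, 0.5, 0.5, 0.5]
Phi = [[0.5, -0.5, 0.5, -0.5],
       [0.5, 0.5, -0.5, -0.5]]
print(project(I, I_A, Phi))  # approximately [0.3, 0.0]
```

Scaling I (as an illumination change in Equation (10) roughly does through the factor 1/K) scales the coefficients in the same way, which is why a significant illumination change can shift a probe's coefficient vector by as much as a change of identity does (Figure 19).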
point-wise distance. For more details about these methods and about the evaluation database, see Adini et al. [1997]. It was concluded that none of these representations alone can overcome the image variations due to illumination.

A recently proposed image comparison method [Jacobs et al. 1998] used a new measure robust to illumination change. The rationale for developing such a method of directly comparing images is the potential difficulty of building a complete representation of an object's possible images, as suggested in [Belhumeur and Kriegman 1997]. The authors argued that it is not clear whether it is possible to construct the complete representation using a small number of training images taken under uncontrolled viewing conditions and containing multiple light sources. It was shown that given two images of an object with unknown structure and albedo, there is always a large family of solutions. Even in the case of given light sources, only two out of three independent components of the Hessian of the surface can be determined. Instead, the authors argued that the ratio of two images of the same object is simpler than if the images are from different objects. Based on this observation, the complexity of the ratio of two aligned images was proposed as the similarity measure. More specifically, we have

    $\frac{I_1}{I_2} = \frac{K_2}{K_1}\,\frac{1 + p_I P_{s,1} + q_I Q_{s,1}}{1 + p_I P_{s,2} + q_I Q_{s,2}}$    (11)

for images of the same object, and

    $\frac{I_1}{J_2} = \frac{K_2}{K_1}\,\frac{\rho_I}{\rho_J}\,\frac{1 + p_I P_{s,1} + q_I Q_{s,1}}{1 + p_J P_{s,2} + q_J Q_{s,2}}\,\sqrt{\frac{1 + p_J^2 + q_J^2}{1 + p_I^2 + q_I^2}}$    (12)

for images of different objects. They chose the integral of the magnitude of the gradient of the function (ratio image) as the measure of complexity and proposed the following symmetric similarity measure:

    $d_G(I, J) = \int\!\!\int \min(I, J)\left(\left\|\nabla\!\left(\frac{I}{J}\right)\right\| + \left\|\nabla\!\left(\frac{J}{I}\right)\right\|\right) dx\, dy.$    (13)

They noticed the similarity between this measure and a measure that simply compares the edges. It is also clear that the measure is not strictly illumination-invariant, because it changes for a pair of images of the same object when the illumination changes. Experiments on face recognition showed improved performance over eigenfaces, though the results were somewhat worse than those of the illumination cone-based method [Georghiades et al. 1998] on the same set of data.

6.1.3. Class-Based Approaches. Under the assumptions of Lambertian surfaces and no shadowing, a 3D linear illumination subspace for a person was constructed in Belhumeur and Kriegman [1997], Hallinan [1994], Murase and Nayar [1995], Riklin-Raviv and Shashua [1999], and Shashua [1994] for a fixed viewpoint, using three aligned faces/images acquired under different lighting conditions. Under ideal assumptions, recognition based on this subspace is illumination-invariant. More recently, an illumination cone has been proposed as an effective method of handling illumination variations, including shadowing and multiple light sources [Belhumeur and Kriegman 1997; Georghiades et al. 1998]. This method is an extension of the 3D linear subspace method [Hallinan 1994; Shashua 1994] and has the same drawback: it requires at least three aligned training images acquired under different lighting conditions per person. A more detailed review of this approach and its extension to handle the combined illumination and pose problem will be presented in Section 6.2.
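A discrete version of the gradient-of-ratio measure in Equation (13) is straightforward to write down. The sketch below is an illustrative approximation (the function names and the `eps` guard against division by zero are our own, not from Jacobs et al. [1998]):

```python
import numpy as np

def grad_mag(F):
    # Magnitude of the image gradient via central differences.
    gy, gx = np.gradient(F)
    return np.sqrt(gx**2 + gy**2)

def d_G(I, J, eps=1e-6):
    """Discrete version of the symmetric measure in Eq. (13):
    sum of min(I, J) * (|grad(I/J)| + |grad(J/I)|) over all pixels.
    Inputs are assumed positive (e.g., intensities in (0, 1])."""
    R1 = I / (J + eps)
    R2 = J / (I + eps)
    return np.sum(np.minimum(I, J) * (grad_mag(R1) + grad_mag(R2)))

rng = np.random.default_rng(1)
I = 0.2 + 0.8 * rng.random((32, 32))
# A globally rescaled version of I: the ratio image is nearly constant,
# so its gradient, and hence the score, is small.
J_same = 0.7 * I
J_other = 0.2 + 0.8 * rng.random((32, 32))
print(d_G(I, J_same) < d_G(I, J_other))
```

As the text notes, the measure is not strictly illumination-invariant: a spatially varying lighting change produces a non-constant ratio image and a nonzero score even for the same object.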
Fig. 20. Testing the invariance of the quotient image (Q-image) to varying illu-
mination. (a) Original images of a novel face taken under five different illumina-
tions. (b) The Q-images corresponding to the novel images, computed with respect
to the bootstrap set of ten objects [Riklin-Raviv and Shashua 1999]. (Courtesy of
T. Riklin-Raviv and A. Shashua.)
More recently, a method based on quotient images was introduced [Riklin-Raviv and Shashua 1999]. Like other class-based methods, this method assumes that the faces of different individuals have the same shape and different textures. Given two objects $a$ and $b$, the quotient image $Q$ is defined to be the ratio of their albedo functions $\rho_a / \rho_b$, and hence is illumination-invariant. Once $Q$ is computed, the entire illumination space of object $a$ can be generated by $Q$ and a linear illumination subspace constructed from three images of object $b$. To make this basic idea work in practice, a training set (called the bootstrap set in the paper) is needed that consists of images of $N$ objects under various lighting conditions, and the quotient image of a novel object $y$ is defined relative to the average object of the bootstrap set. More specifically, the bootstrap set consists of $3N$ images taken from three fixed, linearly independent light sources $s_1$, $s_2$, and $s_3$ that are not known. Under this assumption, any light source $s$ can be expressed as a linear combination of the $s_i$: $s = x_1 s_1 + x_2 s_2 + x_3 s_3$. The authors further defined the normalized albedo function of the bootstrap set as the squared sum of the $\rho_i$, where $\rho_i$ is the albedo function of object $i$. An interesting energy/cost function is defined that is quite different from the traditional bilinear form. Let $A_1, A_2, \ldots, A_N$ be $m \times 3$ matrices whose columns are images of object $i$ (from the bootstrap set) that contain the same $m$ pixels; then the bilinear energy/cost function [Freeman and Tenenbaum 2000] for an image $y_s$ of object $y$ under illumination $s$ is

    $\left\| y_s - \sum_{i=1}^{N} \alpha_i A_i x \right\|^2,$    (14)

which is a bilinear problem in the $N$ unknowns $\alpha_i$ and the 3 unknowns $x$. For comparison, the proposed energy function is

    $\sum_{i=1}^{N} \| \alpha_i y_s - A_i x \|^2.$    (15)

This formulation of the energy function is a major reason why the quotient image method works better than reconstruction methods based on Equation (14), in terms of a smaller bootstrap set and a weaker requirement for pixel-wise image alignment. As pointed out by the authors, another factor contributing to the success of using only a small bootstrap set is that the albedo functions occupy only a small subspace. Figure 20 demonstrates the invariance of the quotient image against change in illumination conditions; the image synthesis results are shown in Figure 21.

Fig. 21. Image synthesis example. Original image (a) and its quotient image (b) from the N = 10 bootstrap set. The quotient image is generated relative to the average object of the bootstrap set, shown in (c), (d), and (e). Images (f) through (k) are synthetic images created from (b) and (c), (d), (e) [Riklin-Raviv and Shashua 1999]. (Courtesy of T. Riklin-Raviv and A. Shashua.)

6.1.4. Model-Based Approaches. In model-based approaches, a 3D face model is used to synthesize the virtual image from a given image under desired illumination conditions. When the 3D model is unknown, recovering the shape from the images accurately is difficult without using any priors. Shape-from-shading (SFS) can be used if only one image is available; stereo or structure from motion can be used when multiple images of the same object are available.

Fortunately, for face recognition the differences in the 3D shapes of different face objects are not dramatic. This is especially true after the images are aligned and normalized. Recall that this assumption was used in the class-based methods reviewed above. Using a statistical representation of 3D heads, PCA was suggested as a tool for solving the parametric SFS problem [Atick et al. 1996]. An eigenhead approximation of a 3D head was obtained after training on about 300 laser-scanned range images of real human heads. The ill-posed SFS problem is thereby transformed into a parametric problem. The authors also demonstrated that such a representation helps to determine the light source. For a new face image, its 3D head can be approximated as a linear combination of eigenheads and then used to determine the light source. Using this complete 3D model, any virtual view of the face image can be generated. A major drawback of this approach is the assumption of constant albedo. This assumption does not hold for most real face images, even though it is the most common assumption used in SFS algorithms.

To address the issue of varying albedo, a direct 2D-to-2D approach was proposed, based on the assumption that front-view faces are symmetric and making use of a generic 3D model [Zhao et al. 1999]. Recall that a prototype image $I_p$ is a frontal view with $P_s = 0$, $Q_s = 0$. Substituting this into Equation (6), we have

    $I_p[x, y] = \frac{\rho[x, y]}{\sqrt{1 + p^2 + q^2}}.$    (16)

Comparing Equations (6) and (16), we obtain

    $I_p[x, y] = \frac{K}{2\,(1 + q\, Q_s)}\,(I[x, y] + \tilde{I}[x, y]).$    (17)

This simple equation relates the prototype image $I_p$ to $I[x, y] + \tilde{I}[x, y]$, which is already available. The two advantages of this approach are: (1) there is no need to recover the varying albedo $\rho[x, y]$; (2) there is no need to recover the full shape gradients $(p, q)$; $q$ can be approximated by a value derived from a generic 3D face shape. As part of the proposed automatic method, a model-based light source identification method was also proposed to improve existing source-from-shading algorithms. Figure 22 shows some comparisons between rendered images obtained using this method and using a
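Equations (16) and (17) can be checked on synthetic data. The sketch below assumes a Lambertian image model of the form of Equation (6), $I = \rho\,(1 + pP_s + qQ_s)/(\sqrt{1+p^2+q^2}\,K)$ with $K = \sqrt{1+P_s^2+Q_s^2}$, and takes $\tilde{I}$ to be the horizontally mirrored input image, which is what the frontal-symmetry assumption makes available; all names are illustrative:

```python
import numpy as np

def render(rho, p, q, P_s, Q_s):
    # Lambertian image under distant source (P_s, Q_s, 1); assumed form of Eq. (6).
    K = np.sqrt(1 + P_s**2 + Q_s**2)
    return rho * (1 + p * P_s + q * Q_s) / (np.sqrt(1 + p**2 + q**2) * K)

def prototype_from_symmetry(I, q, P_s, Q_s):
    """Eq. (17): recover the normally lit prototype from one side-lit image
    plus its mirror image I~, without recovering the albedo rho."""
    K = np.sqrt(1 + P_s**2 + Q_s**2)
    return K / (2.0 * (1.0 + q * Q_s)) * (I + np.fliplr(I))

# Toy symmetric "face": albedo rho and slope q mirror-symmetric in x,
# slope p antisymmetric, as the method assumes for frontal views.
rng = np.random.default_rng(2)
half = rng.random((16, 8))
rho = 0.3 + np.hstack([half, np.fliplr(half)])
qh = 0.3 * rng.random((16, 8))
q = np.hstack([qh, np.fliplr(qh)])
ph = 0.3 * rng.random((16, 8))
p = np.hstack([ph, -np.fliplr(ph)])

I = render(rho, p, q, P_s=0.4, Q_s=0.2)          # side-lit input
I_p = render(rho, p, q, P_s=0.0, Q_s=0.0)        # ground-truth prototype, Eq. (16)
I_hat = prototype_from_symmetry(I, q, P_s=0.4, Q_s=0.2)
print(np.allclose(I_hat, I_p))
```

In the actual method, $q$ is not known per pixel but is approximated from a generic 3D face shape, so the recovery is approximate rather than exact as in this synthetic check.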
Fig. 22. Image rendering comparison. The original images are shown
in the first column. The second column shows prototype images rendered
using the local SFS algorithm [Tsai and Shah 1994]. Prototype images
rendered using symmetric SFS are shown in the third column. Finally,
the fourth column shows real images that are close to the prototype
images [Zhao and Chellappa 2000].
local SFS algorithm [Tsai and Shah 1994]. Using the Yale and Weizmann databases (Table V), significant performance improvements were reported when the prototype images were used in a subspace LDA system in place of the original input images [Zhao et al. 1999]. In these experiments, the gallery set contained about 500 images from various databases; the probe set contained 60 images from the Yale database and 96 images from the Weizmann database.

Recently, a general method of approximating Lambertian reflectance using second-order spherical harmonics has been reported [Basri and Jacobs 2001]. Assuming Lambertian objects under distant, isotropic lighting, the authors were able to show that the set of all reflectance functions can be approximated using the surface spherical harmonic expansion. Specifically, they proved that using a second-order approximation (nine harmonics, i.e., a nine-dimensional (9D) space), the accuracy for any light function exceeds 97.97%. They then extended this analysis to image formation, which is a much more difficult problem due to possible occlusion, shape, and albedo variations. As indicated by the authors, the worst-case image approximation can be arbitrarily bad, but most cases are good. Using their method, an image can be decomposed into so-called harmonic images, which are produced when the object is illuminated by harmonic functions. The nine harmonic images of a face are plotted in Figure 23. An interesting comparison was made between the proposed method and the 3D linear illumination subspace methods [Hallinan 1994; Shashua 1994]: the 3D linear methods are just first-order harmonic approximations without the DC component.

Assuming precomputed object pose and known color albedo/texture, the authors reported an 86% correct recognition rate when applying this technique to the task of face recognition using a probe set of 10 people and a gallery set of 42 people.

6.2. The Pose Problem in Face Recognition

It is not surprising that the performance of face recognition systems drops significantly when large pose variations are present in the input images. This difficulty was documented in the FERET and FRVT test reports [Blackburn et al. 2001; Phillips et al. 2002b, 2003], and was suggested as a major research issue. When illumination variation is also present, the task of face recognition becomes even more difficult. Here we focus on the out-of-plane rotation problem, since in-plane rotation is a pure 2D problem and can be solved much more easily.
Fig. 24. The process of constructing the illumination cone. (a) The seven training im-
ages from Subset 1 (near frontal illumination) in frontal pose. (b) Images correspond-
ing to the columns of B. (c) Reconstruction up to a GBR transformation. On the left,
the surface was rendered with flat shading, that is, the albedo was assumed to be con-
stant across the surface, while on the right the surface was texture-mapped using the
first basis image of B shown in Figure 24(b). (d) Synthesized images from the illumina-
tion cone of the face under novel lighting conditions but fixed pose. Note the large vari-
ations in shading and shadowing as compared to the seven training images. (Courtesy of
A. Georghiades, P. Belhumeur, and D. Kriegman.)
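The 3D linear illumination subspace underlying the basis images in Figure 24 can be sketched on synthetic Lambertian data (no shadowing): fit a rank-3 basis to a few differently lit images of one person and match new images by their distance to that subspace. All names and data below are illustrative, not from the cited implementation:

```python
import numpy as np

def illumination_subspace(images, k=3):
    """Least-squares rank-k basis for one person's illumination subspace.
    images: (n_images, n_pixels), each row taken under a different (unknown)
    distant light source at a fixed pose. Rows of the returned B are an
    orthonormal basis (k=3 for the classic shadow-free Lambertian model)."""
    _, _, Vt = np.linalg.svd(images, full_matrices=False)
    return Vt[:k]

def subspace_distance(probe, B):
    # Distance from the probe image to span(B); used as the matching score.
    proj = B.T @ (B @ probe)
    return np.linalg.norm(probe - proj)

# Toy Lambertian person: X holds albedo-scaled normals, image = X @ s.
rng = np.random.default_rng(4)
X = rng.standard_normal((100, 3))                     # n_pixels x 3
train = np.array([X @ rng.standard_normal(3) for _ in range(7)])
B = illumination_subspace(train, k=3)
probe_same = X @ np.array([0.3, -1.0, 0.5])           # novel lighting, same face
probe_other = rng.standard_normal(100)                # unrelated image
print(subspace_distance(probe_same, B) < subspace_distance(probe_other, B))
```

Under the ideal shadow-free model, every image of the person lies exactly in the 3D subspace, so the same-face distance is numerically zero; the illumination cone extends this construction to handle attached shadows and multiple sources.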
et al. 2000]. Some of the reviewed methods are very closely related; for example, methods 3, 4, and 5. Despite their popularity, these methods have two common drawbacks: (1) they need many example images to cover the range of possible views; (2) the illumination problem is not explicitly addressed, though in principle it can be handled if images captured under the same pose but different illumination conditions are available.

The popular eigenface approach [Turk and Pentland 1991] to face recognition has been extended to a view-based eigenface method in order to achieve pose-invariant recognition [Pentland et al. 1994]. This method explicitly codes the pose information by constructing an individual eigenface for each pose. More recently, a unified framework called the bilinear model was proposed in Freeman and Tenenbaum [2000] that can handle either pure pose variation or pure class variation. (A bilinear example is given in Equation (14) for the illumination problem.)

In Wiskott et al. [1997], a robust face recognition scheme based on EBGM was proposed. The authors assumed a planar surface patch at each feature point (landmark), and learned the transformations
a framework for benchmarking new algo- surveillance where the image size of the
rithms. The scripts can be modified to run face area is very small. On the other
different sets of images against the base- hand, the subspace LDA method [Zhao
line. For on-line resources related to face et al. 1999] works well for both large and
recognition, such as research papers and small images, for example, 96 84 or
databases, see Table V. 12 11.
We give below a concise summary of our Recognition of faces from a video se-
discussion, followed by our conclusions, quence (especially a surveillance video)
in the same order as the topics have ap- is still one of the most challenging prob-
peared in this paper: lems in face recognition because video is
Machine recognition of faces has of low quality and the images are small.
emerged as an active research area Often, the subjects of interest are not co-
spanning disciplines such as image operative, for example, not looking into
processing, pattern recognition, com- the camera. One particular difficulty
puter vision, and neural networks. in these applications is how to obtain
There are numerous applications of good-quality gallery images. Neverthe-
FRT to commercial systems such as less, video-based face recognition sys-
face verification-based ATM and access tems using multiple cues have demon-
control, as well as law enforcement strated good results in relatively con-
applications to video surveillance, etc. trolled environments.
Due to its user-friendly nature, face A crucial step in face recognition is the
recognition will remain a powerful tool evaluation and benchmarking of algo-
in spite of the existence of very reliable rithms. Two of the most important face
methods of biometric personal identifi- databases and their associated evalua-
cation such as fingerprint analysis and tion methods have been reviewed: the
iris scans. FERET, FRVT, and XM2VTS protocols.
Extensive research in psychophysics The availability of these evaluations has
and the neurosciences on human recog- had a significant impact on progress in
nition of faces is documented in the lit- the development of face recognition al-
erature. We do not feel that machine gorithms.
recognition of faces should strictly fol- Although many face recognition tech-
low what is known about human recog- niques have been proposed and have
nition of faces, but it is beneficial for en- shown significant promise, robust face
gineers who design face recognition sys- recognition is still difficult. There are
tems to be aware of the relevant find- at least three major challenges: illumi-
ings. On the other hand, machine sys- nation, pose, and recognition in outdoor
tems provide tools for conducting stud- imagery. A detailed review of methods
ies in psychology and neuroscience. proposed to solve these problems has
Numerous methods have been proposed been presented. Some basic problems re-
for face recognition based on image in- main to be solved; for example, pose dis-
tensities [Chellappa et al. 1995]. Many crimination is not difficult but accurate
of these methods have been success- pose estimation is hard. In addition to
fully applied to the task of face recog- these two problems, there are other even
nition, but they have advantages and more difficult ones, such as recognition
disadvantages. The choice of a method of a person from images acquired years
should be based on the specific require- apart.
ments of a given task. For example, The impressive face recognition capabil-
the EBGM-based method [Okada et al. ity of the human perception system has
1998] has very good performance, but one limitation: the number and types of
it requires an image size, for exam- faces that can be easily distinguished.
ple, 128 128, which severely restricts Machines, on the other hand, can
its possible application to video-based store and potentially recognize as many
people as necessary. Is it really possible BASRI, R. AND JACOBS, D. W. 2001. Lambertian ref-
that a machine can be built that mimics electances and linear subspaces. In Proceedings,
International Conference on Computer Vision.
the human perceptual system without Vol. II. 383390.
its limitations on number and types? BELHUMEUR, P. N., HESPANHA, J. P., AND KRIEGMAN, D.
J. 1997. Eigenfaces vs. Fisherfaces: Recogni-
To conclude our paper, we present a con- tion using class specific linear projection. IEEE
jecture about face recognition based on Trans. Patt. Anal. Mach. Intell. 19, 711720.
psychological studies and lessons learned BELHUMEUR, P. N. AND KRIEGMAN, D. J. 1997. What
from designing algorithms. We conjecture is the set of images of an object under all possible
that different mechanisms are involved in lighting conditions? In Proceedings, IEEE Con-
ference on Computer Vision and Pattern Recog-
human recognition of familiar and unfa- nition. 5258.
miliar faces. For example, it is possible BELL, A. J. AND SEJNOWSKI, T. J. 1995. An informa-
that 3D head models are constructed, by tion maximisation approach to blind separation
extensive training for familiar faces, but and blind deconvolution. Neural Computation 7,
for unfamiliar faces, multiview 2D images 11291159.
are stored. This implies that we have full BELL, A. J. AND SEJNOWSKI, T. J. 1997. The indepen-
probability density functions for familiar dent components of natural scenes are edge fil-
ters. Vis. Res. 37, 33273338.
faces, while for unfamiliar faces we only
BEVERIDGE, J. R., SHE, K., DRAPER, B. A., AND
have discriminant functions. GIVENS, G. H. 2001. A nonparametric statisi-
cal comparison of principal component and
linear discriminant subspaces for face recog-
REFERENCES
nition. In Proceedings, IEEE Conference on
ADINI, Y., MOSES, Y., AND ULLMAN, S. 1997. Face Computer Vision and Pattern Recognition.
recognition: The problem of compensating for (An updated version can be found online
changes in illumination direction. IEEE Trans. at http://www.cs.colostate.edu/evalfacerec/news.
Patt. Anal. Mach. Intell. 19, 721732. html.)
AKAMATSU, S., SASAKI, T., FUKAMACHI, H., MASUI, N., BEYMER, D. 1995. Vectorizing face images by in-
AND SUENAGA, Y. 1992. An accurate and robust terleaving shape and texture computations. MIT
face identification scheme. In Proceedings, In- AI Lab memo 1537. Massachusetts Institute of
ternational Conference on Pattern Recognition. Technology, Cambridge, MA.
217220. BEYMER, D. J. 1993. Face recognition under vary-
ATICK, J., GRIFFIN, P., AND REDLICH, N. 1996. Sta- ing pose. Tech. Rep. 1461. MIT AI Lab, Mas-
tistical approach to shape from shading: Re- sachusetts Institute of Technology, Cambridge,
construction of three-dimensional face surfaces MA.
from single two-dimensional images. Neural BEYMER, D. J. AND POGGIO, T. 1995. Face recognition
Computat. 8, 13211340. from one example view. In Proceedings, Interna-
AZARBAYEJANI, A., STARNER, T., HOROWITZ, B., AND tional Conference on Computer Vision. 500507.
PENTLAND, A. 1993. Visually controlled graph- BIEDERMAN, I. 1987. Recognition by components: A
ics. IEEE Trans. Patt. Anal. Mach. Intell. 15, theory of human image understanding. Psych.
602604. Rev. 94, 115147.
BACHMANN, T. 1991. Identification of spatially BIEDERMAN, I. AND KALOCSAI, P. 1998. Neural and
quantized tachistoscopic images of faces: How psychophysical analysis of object and face recog-
many pixels does it take to carry identity? Euro- nition. In Face Recognition: From Theory to Ap-
pean J. Cog. Psych. 3, 87103. plications, H. Wechsler, P. J. Phillips, V. Bruce, F.
BAILLY-BAILLIERE, E., BENGIO, S., BIMBOT, F., HAMOUZ, F. Soulie, and T. S. Huang, Eds. Springer-Verlag,
M., KITTLER, J., MARIETHOZ, J., MATAS, J., MESSER, Berlin, Germany, 325.
K., POPOVICI, V., POREE, F., RUIZ, B., AND THIRAN, BIGUN, J., DUC, B., SMERALDI, F., FISCHER, S., AND
J. P. 2003. The BANCA database and evalua- MAKAROV, A. 1998. Multi-modal person au-
tion protocol. In Proceedings of the International thentication. In Face Recognition: From The-
Conference on Audio- and Video-Based Biometric ory to Applications, H. Wechsler, P. J. Phillips,
Person Authentication. 625638. V. Bruce, F. F. Soulie, and T. S. Huang, Eds.
BARTLETT, J. C. AND SEARCY, J. 1993. Inversion and Springer-Verlag, Berlin, Germany, 2650.
configuration of faces. Cog. Psych. 25, 281316. BLACK, M., FLEET, D., AND YACOOB, Y. 1998. A
BARTLETT, M. S., LADES, H. M., AND SEJNOWSKI, T. Framework for modelling appearance change in
1998. Independent component representation image sequences. In Proceedings, International
for face recognition. In Proceedings, SPIE Sym- Conference on Computer Vision, 660667.
posium on Electronic Imaging: Science and Tech- BLACK, M. AND YACOOB, Y. 1995. Tracking and rec-
nology. 528539. ognizing facial expressions in image sequences
using local parametrized models of image mo- based active appearance models. In Proceedings,
tion. Tech. rep. CS-TR-3401. Center for Automa- International Conference on Automatic Face and
tion Research, Unversity of Maryland, College Gesture Recognition.
Park, MD. COOTES, T. F., EDWARDS, G. J., AND TAYLOR, C. J. 2001.
BLACKBURN, D., BONE, M., AND PHILLIPS, P. J. 2001. Active appearance models. IEEE Trans. Patt.
Face recognition vendor test 2000. Tech. rep. Anal. Mach. Intell. 23, 681685.
http://www.frvt.org. COX, I. J., GHOSN, J., AND YIANILOS, P. N. 1996.
BLANZ, V. AND VETTER, T. 1999. A Morphable model Feature-based face recognition using mixture-
for the synthesis of 3D faces. In Proceedings, distance. In Proceedings, IEEE Conference on
SIGGRAPH99, 187194. Computer Vision and Pattern Recognition. 209
BLANZ, V. AND VETTER, T. 2003. Face recognition 216.
based on fitting a 3D morphable model. IEEE CRAW, I. AND CAMERON, P. 1996. Face recognition by
Trans. Patt. Anal. Mach. Intell. 25, 10631074. computer. In Proceedings, British Machine Vi-
BLEDSOE, W. W. 1964. The model method in facial sion Conference. 489507.
recognition. Tech. rep. PRI:15, Panoramic re- DARWIN, C. 1972. The Expression of the Emotions
search Inc., Palo Alto, CA. in Man and Animals. John Murray, London, U.K.
BRAND, M. AND BHOTIKA, R. 2001. Flexible flow for DECARLO, D. AND METAXAS, D. 2000. Optical flow
3D nonrigid tracking and shape recovery. In Pro- constraints on deformable models with applica-
ceedings, IEEE Conference on Computer Vision tions to face tracking. Int. J. Comput. Vis. 38,
and Pattern Recognition. 99127.
BRENNAN, S. E. 1985. The caricature generator. DONATO, G., BARTLETT, M. S., HAGER, J. C., EKMAN, P.,
Leonardo, 18, 170178. AND SEJNOWSKI, T. J. 1999. Classifying facial
BRONSTEIN, A., BRONSTEIN, M., GORDON, E., AND KIMMEL, actions. IEEE Trans. Patt. Anal. Mach. Intell. 21,
R. 2003. 3D face recognition using geometric 974989.
invariants. In Proceedings, International Confer- EDWARDS, G. J., TAYLOR, C. J., AND COOTES, T. F.
ence on Audio- and Video-Based Person Authen- 1998. Learning to identify and track faces
tication. in image sequences. In Proceedings, Interna-
BRUCE, V. 1988. Recognizing faces, Lawrence Erl- tional Conference on Automatic Face and Gesture
baum Associates, London, U.K. Recognition.
BRUCE, V., BURTON, M., AND DENCH, N. 1994. Whats EKMAN, P. Ed., 1998. Charles Darwins The Ex-
distinctive about a distinctive face? Quart. J. pression of the Emotions in Man and Animals,
Exp. Psych. 47A, 119141. Third Edition, with Introduction, Afterwords
BRUCE, V., HANCOCK, P. J. B., AND BURTON, A. M. 1998. and Commentaries by Paul Ekman. Harper-
Human face perception and identification. In Collins/Oxford University Press, New York,
Face Recognition: From Theory to Applications, NY/London, U.K.
H. Wechsler, P. J. Phillips, V. Bruce, F. F. Soulie, ELLIS, H. D. 1986. Introduction to aspects of face
and T. S. Huang, Eds. Springer-Verlag, Berlin, processing: Ten questions in need of answers. In
Germany, 5172. Aspects of Face Processing, H. Ellis, M. Jeeves,
BRUNER, I. S. AND TAGIURI, R. 1954. The percep- F. Newcombe, and A. Young, Eds. Nijhoff, Dor-
drecht, The Netherlands, 313.
tion of people. In Handbook of Social Psychology,
Vol. 2, G. Lindzey, Ed., Addison-Wesley, Reading, ETEMAD, K. AND CHELLAPPA, R. 1997. Discriminant
MA, 634654. analysis for recognition of human face images.
J. Opt. Soc. Am. A 14, 17241733.
BUHMANN, J., LADES, M., AND MALSBURG, C. V. D. 1990.
Size and distortion invariant object recognition FISHER, R. A. 1938. The statistical utilization of
by hierarchical graph matching. In Proceedings, multiple measuremeents. Ann. Eugen. 8, 376
International Joint Conference on Neural Net- 386.
works. 411416. FREEMAN, W. T. AND TENENBAUM, J. B. 2000. Sepa-
CHELLAPPA, R., WILSON, C. L., AND SIROHEY, S. 1995. rating style and contents with bilinear models.
Human and machine recognition of faces: A sur- Neural Computat. 12, 12471283.
vey. Proc. IEEE, 83, 705740. FUKUNAGA, K. 1989. Statistical Pattern Recogni-
CHOUDHURY, T., CLARKSON, B., JEBARA, T., AND PENTLAND, tion, Academic Press, New York, NY.
A. 1999. Multimodal person recognition us- GALTON, F. 1888. Personal identification and de-
ing unconstrained audio and video. In Proceed- scription. Nature, (June 21), 173188.
ings, International Conference on Audio- and GAUTHIER, I., BEHRMANN, M., AND TARR, M. J. 1999.
Video-Based Person Authentication. 176181. Can face recognition really be dissociated from
COOTES, T., TAYLOR, C., COOPER, D., AND GRAHAM, J. object recognition? J. Cogn. Neurosci. 11, 349
1995. Active shape modelstheir training and 370.
application. Comput. Vis. Image Understand. 61, GAUTHIER, I. AND LOGOTHETIS, N. K. 2000. Is face
1823. recognition so unique after All? J. Cogn. Neu-
COOTES, T., WALKER, K., AND TAYLOR, C. 2000. View- ropsych. 17, 125142.
GEORGHIADES, A. S., BELHUMEUR, P. N., AND KRIEGMAN, HORN, B. K. P. AND BROOKS, M. J. 1989. Shape from
D. J. 1999. Illumination-based image synthe- Shading. MIT Press, Cambridge, MA.
sis: Creating novel images of human faces un- HUANG, J., HEISELE, B., AND BLANZ, V. 2003.
der differing pose and lighting. In Proceedings, Component-based face recognition with 3D mor-
Workshop on Multi-View Modeling and Analysis phable models. In Proceedings, International
of Visual Scenes, 4754. Conference on Audio- and Video-Based Person
GEORGHIADES, A. S., BELHUMEUR, P. N., AND KRIEGMAN, Authentication.
D. J. 2001. From few to many: Illumination ISARD, M. AND BLAKE, A. 1996. Contour tracking by
cone models for face recognition under variable stochastic propagation of conditional density. In
lighting and pose. IEEE Trans. Patt. Anal. Mach. Proceedings, European Conference on Computer
Intell. 23, 643660. Vision.
GEORGHIADES, A. S., KRIEGMAN, D. J., AND BELHUMEUR, JACOBS, D. W., BELHUMEUR, P. N., AND BASRI, R.
P. N. 1998. Illumination cones for recognition 1998. Comparing images under variable illu-
under variable lighting: Faces. In Proceedings, mination. In Proceedings, IEEE Conference on
IEEE Conference on Computer Vision and Pat- Computer Vision and Pattern Recognition. 610
tern Recognition, 5258. 617.
GINSBURG, A. G. 1978. Visual information process- JEBARA, T., RUSSEL, K., AND PENTLAND, A. 1998.
ing based on spatial filters constrained by bio- Mixture of eigenfeatures for real-time struc-
logical data. AMRL Tech. rep. 78129. ture from texture. Tech. rep. TR-440, MIT Me-
GONG, S., MCKENNA, S., AND PSARROU, A. 2000. Dy- dia Lab, Massachusetts Institute of Technology,
namic Vision: From Images to Face Recognition. Cambridge, MA.
World Scientific, Singapore. JOHNSTON, A., HILL, H., AND CARMAN, N. 1992. Rec-
GORDON, G. 1991. Face recognition based on depth ognizing faces: Effects of lighting direction, in-
maps and surface curvature. In SPIE Proceed- version and brightness reversal. Cognition 40,
ings, Vol. 1570: Geometric Methods in Computer 119.
Vision. SPIE Press, Bellingham, WA 234247. KALOCSAI, P. K., ZHAO, W., AND ELAGIN, E. 1998.
GU, L., LI, S. Z., AND ZHANG, H. J. 2001. Learning Face similarity space as perceived by humans
probabilistic distribution model for multiview and artificial systems. In Proceedings, Interna-
face dectection. In Proceedings, IEEE Conference tional Conference on Automatic Face and Gesture
on Computer Vision and Pattern Recognition. Recognition. 177180.
HAGER, G. D., AND BELHUMEUR, P. N. 1998. Efficient KANADE, T. 1973. Computer recognition of hu-
region tracking with parametri models of geom- man faces. Birkhauser, Basel, Switzerland, and
etry and illumination. IEEE Trans. Patt. Anal. Mach. Intell. 20, 115.
HALLINAN, P. W. 1991. Recognizing human eyes. In SPIE Proceedings, Vol. 1570: Geometric Methods in Computer Vision. 214–226.
HALLINAN, P. W. 1994. A low-dimensional representation of human faces for arbitrary lighting conditions. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition. 995–999.
HANCOCK, P., BRUCE, V., AND BURTON, M. 1998. A comparison of two computer-based face recognition systems with human perceptions of faces. Vis. Res. 38, 2277–2288.
HARMON, L. D. 1973. The recognition of faces. Sci. Am. 229, 71–82.
HEISELE, B., SERRE, T., PONTIL, M., AND POGGIO, T. 2001. Component-based face detection. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition.
HILL, H. AND BRUCE, V. 1996. Effects of lighting on matching facial surfaces. J. Exp. Psych.: Human Percept. Perform. 22, 986–1004.
HILL, H., SCHYNS, P. G., AND AKAMATSU, S. 1997. Information and viewpoint dependence in face recognition. Cognition 62, 201–222.
HJELMAS, E. AND LOW, B. K. 2001. Face detection: A survey. Comput. Vis. Image Understand. 83, 236–274.
Stuttgart, Germany.
KELLY, M. D. 1970. Visual identification of people by computer. Tech. rep. AI-130, Stanford AI Project, Stanford, CA.
KIRBY, M. AND SIROVICH, L. 1990. Application of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Trans. Patt. Anal. Mach. Intell. 12.
KLASEN, L. AND LI, H. 1998. Faceless identification. In Face Recognition: From Theory to Applications, H. Wechsler, P. J. Phillips, V. Bruce, F. F. Soulie, and T. S. Huang, Eds. Springer-Verlag, Berlin, Germany, 513–527.
KNIGHT, B. AND JOHNSTON, A. 1997. The role of movement in face recognition. Vis. Cog. 4, 265–274.
KRUGER, N., POTZSCH, M., AND MALSBURG, C. V. D. 1997. Determination of face position and pose with a learned representation based on labelled graphs. Image Vis. Comput. 15, 665–673.
KUNG, S. Y. AND TAUR, J. S. 1995. Decision-based neural networks with signal/image classification applications. IEEE Trans. Neural Netw. 6, 170–181.
LADES, M., VORBRUGGEN, J., BUHMANN, J., LANGE, J., MALSBURG, C. V. D., WURTZ, R., AND KONEN, W. 1993. Distortion invariant object recognition in the dynamic link architecture. IEEE Trans. Comput. 42, 300–311.
LANITIS, A., TAYLOR, C. J., AND COOTES, T. F. 1995. Automatic face identification system using flexible appearance models. Image Vis. Comput. 13, 393–401.
LAWRENCE, S., GILES, C. L., TSOI, A. C., AND BACK, A. D. 1997. Face recognition: A convolutional neural-network approach. IEEE Trans. Neural Netw. 8, 98–113.
LI, B. AND CHELLAPPA, R. 2001. Face verification through tracking facial features. J. Opt. Soc. Am. 18.
LI, S. Z. AND LU, J. 1999. Face recognition using the nearest feature line method. IEEE Trans. Neural Netw. 10, 439–443.
LI, Y., GONG, S., AND LIDDELL, H. 2001a. Constructing facial identity surfaces in a nonlinear discriminating space. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition.
LI, Y., GONG, S., AND LIDDELL, H. 2001b. Modelling face dynamics across view and over time. In Proceedings, International Conference on Computer Vision.
LIN, S. H., KUNG, S. Y., AND LIN, L. J. 1997. Face recognition/detection by probabilistic decision-based neural network. IEEE Trans. Neural Netw. 8, 114–132.
LIU, C. AND WECHSLER, H. 2000a. Evolutionary pursuit and its application to face recognition. IEEE Trans. Patt. Anal. Mach. Intell. 22, 570–582.
LIU, C. AND WECHSLER, H. 2000b. Robust coding scheme for indexing and retrieval from large face databases. IEEE Trans. Image Process. 9, 132–137.
LIU, C. AND WECHSLER, H. 2001. A shape- and texture-based enhanced Fisher classifier for face recognition. IEEE Trans. Image Process. 10, 598–608.
LIU, J. AND CHEN, R. 1998. Sequential Monte Carlo methods for dynamic systems. J. Am. Stat. Assoc. 93, 1031–1041.
MANJUNATH, B. S., CHELLAPPA, R., AND MALSBURG, C. V. D. 1992. A feature based approach to face recognition. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition. 373–378.
MARR, D. 1982. Vision. W. H. Freeman, San Francisco, CA.
MARTINEZ, A. 2002. Recognizing imprecisely localized, partially occluded and expression variant faces from a single sample per class. IEEE Trans. Patt. Anal. Mach. Intell. 24, 748–763.
MARTINEZ, A. AND KAK, A. C. 2001. PCA versus LDA. IEEE Trans. Patt. Anal. Mach. Intell. 23, 228–233.
MAURER, T. AND MALSBURG, C. V. D. 1996a. Single-view based recognition of faces rotated in depth. In Proceedings, International Workshop on Automatic Face and Gesture Recognition. 176–181.
MAURER, T. AND MALSBURG, C. V. D. 1996b. Tracking and learning graphs and pose on image sequences of faces. In Proceedings, International Conference on Automatic Face and Gesture Recognition. 176–181.
MCKENNA, S. J. AND GONG, S. 1997. Non-intrusive person authentication for access control by visual tracking and face recognition. In Proceedings, International Conference on Audio- and Video-Based Person Authentication. 177–183.
MCKENNA, S. AND GONG, S. 1998. Recognising moving faces. In Face Recognition: From Theory to Applications, H. Wechsler, P. J. Phillips, V. Bruce, F. F. Soulie, and T. S. Huang, Eds. Springer-Verlag, Berlin, Germany, 578–588.
MATAS, J. ET AL. 2000. Comparison of face verification results on the XM2VTS database. In Proceedings, International Conference on Pattern Recognition, Vol. 4, 858–863.
MESSER, K., MATAS, J., KITTLER, J., LUETTIN, J., AND MAITRE, G. 1999. XM2VTSDB: The extended M2VTS database. In Proceedings, International Conference on Audio- and Video-Based Person Authentication. 72–77.
MIKA, S., RATSCH, G., WESTON, J., SCHOLKOPF, B., AND MULLER, K.-R. 1999. Fisher discriminant analysis with kernels. In Proceedings, IEEE Workshop on Neural Networks for Signal Processing.
MOGHADDAM, B., NASTAR, C., AND PENTLAND, A. 1996. A Bayesian similarity measure for direct image matching. In Proceedings, International Conference on Pattern Recognition.
MOGHADDAM, B. AND PENTLAND, A. 1997. Probabilistic visual learning for object representation. IEEE Trans. Patt. Anal. Mach. Intell. 19, 696–710.
MOON, H. AND PHILLIPS, P. J. 2001. Computational and performance aspects of PCA-based face recognition algorithms. Perception 30, 301–321.
MURASE, H. AND NAYAR, S. 1995. Visual learning and recognition of 3D objects from appearances. Int. J. Comput. Vis. 14, 5–25.
NEFIAN, A. V. AND HAYES III, M. H. 1998. Hidden Markov models for face recognition. In Proceedings, International Conference on Acoustics, Speech and Signal Processing. 2721–2724.
OKADA, K., STEFFANS, J., MAURER, T., HONG, H., ELAGIN, E., NEVEN, H., AND MALSBURG, C. V. D. 1998. The Bochum/USC face recognition system and how it fared in the FERET Phase III test. In Face Recognition: From Theory to Applications, H. Wechsler, P. J. Phillips, V. Bruce, F. F. Soulie, and T. S. Huang, Eds. Springer-Verlag, Berlin, Germany, 186–205.
O'TOOLE, A. J., ROARK, D., AND ABDI, H. 2002. Recognizing moving faces: A psychological and neural synthesis. Trends Cogn. Sci. 6, 261–266.
PANTIC, M. AND ROTHKRANTZ, L. J. M. 2000. Automatic analysis of facial expressions: The state of the art. IEEE Trans. Patt. Anal. Mach. Intell. 22, 1424–1446.
PENEV, P. AND SIROVICH, L. 2000. The global dimensionality of face space. In Proceedings, International Conference on Automatic Face and Gesture Recognition.
PENEV, P. AND ATICK, J. 1996. Local feature analysis: A general statistical theory for object representation. Netw.: Computat. Neural Syst. 7, 477–500.
PENTLAND, A., MOGHADDAM, B., AND STARNER, T. 1994. View-based and modular eigenspaces for face recognition. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition.
PERKINS, D. 1975. A definition of caricature and recognition. Stud. Anthro. Vis. Commun. 2, 1–24.
PHILLIPS, P. J., GROTHER, P. J., MICHEALS, R. J., BLACKBURN, D. M., TABASSI, E., AND BONE, J. M. 2003. Face recognition vendor test 2002: Evaluation report. NISTIR 6965. Available online at http://www.frvt.org.
PHILLIPS, P. J. 1998. Support vector machines applied to face recognition. Adv. Neural Inform. Process. Syst. 11, 803–809.
PHILLIPS, P. J., MCCABE, R. M., AND CHELLAPPA, R. 1998. Biometric image processing and recognition. In Proceedings, European Signal Processing Conference.
PHILLIPS, P. J., MOON, H., RIZVI, S., AND RAUSS, P. 2000. The FERET evaluation methodology for face-recognition algorithms. IEEE Trans. Patt. Anal. Mach. Intell. 22.
PHILLIPS, P. J., WECHSLER, H., HUANG, J., AND RAUSS, P. 1998b. The FERET database and evaluation procedure for face-recognition algorithms. Image Vis. Comput. 16, 295–306.
PIGEON, S. AND VANDENDORPE, L. 1999. The M2VTS multimodal face database (Release 1.00). In Proceedings, International Conference on Audio- and Video-Based Person Authentication. 403–409.
RIKLIN-RAVIV, T. AND SHASHUA, A. 1999. The quotient image: Class based re-rendering and recognition with varying illuminations. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition. 566–571.
RIZVI, S. A., PHILLIPS, P. J., AND MOON, H. 1998. A verification protocol and statistical performance analysis for face recognition algorithms. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition. 833–838.
ROWLEY, H. A., BALUJA, S., AND KANADE, T. 1998. Neural network based face detection. IEEE Trans. Patt. Anal. Mach. Intell. 20.
CHOUDHURY, A. K. R. AND CHELLAPPA, R. 2003. Face reconstruction from monocular video using uncertainty analysis and a generic model. Comput. Vis. Image Understand. 91, 188–213.
RUDERMAN, D. L. 1994. The statistics of natural images. Netw.: Comput. Neural Syst. 5, 598–605.
SALI, E. AND ULLMAN, S. 1998. Recognizing novel 3-D objects under new illumination and viewing position using a small number of example views or even a single view. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition. 153–161.
SAMAL, A. AND IYENGAR, P. 1992. Automatic recognition and analysis of human faces and facial expressions: A survey. Patt. Recog. 25, 65–77.
SAMARIA, F. 1994. Face recognition using hidden Markov models. Ph.D. dissertation. University of Cambridge, Cambridge, U.K.
SAMARIA, F. AND YOUNG, S. 1994. HMM based architecture for face identification. Image Vis. Comput. 12, 537–583.
SCHNEIDERMAN, H. AND KANADE, T. 2000. Probabilistic modelling of local appearance and spatial relationships for object recognition. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition. 746–751.
SERGENT, J. 1986. Microgenesis of face perception. In Aspects of Face Processing, H. D. Ellis, M. A. Jeeves, F. Newcombe, and A. Young, Eds. Nijhoff, Dordrecht, The Netherlands.
SHASHUA, A. 1994. Geometry and photometry in 3D visual recognition. Ph.D. dissertation. Massachusetts Institute of Technology, Cambridge, MA.
SHEPHERD, J. W., DAVIES, G. M., AND ELLIS, H. D. 1981. Studies of cue saliency. In Perceiving and Remembering Faces, G. M. Davies, H. D. Ellis, and J. W. Shepherd, Eds. Academic Press, London, U.K.
SHIO, A. AND SKLANSKY, J. 1991. Segmentation of people in motion. In Proceedings, IEEE Workshop on Visual Motion. 325–332.
SIROVICH, L. AND KIRBY, M. 1987. Low-dimensional procedure for the characterization of human face. J. Opt. Soc. Am. 4, 519–524.
STEFFENS, J., ELAGIN, E., AND NEVEN, H. 1998. PersonSpotter: Fast and robust system for human detection, tracking and recognition. In Proceedings, International Conference on Automatic Face and Gesture Recognition. 516–521.
STROM, J., JEBARA, T., BASU, S., AND PENTLAND, A. 1999. Real time tracking and modeling of faces: An EKF-based analysis by synthesis approach. Tech. rep. TR-506, MIT Media Lab, Massachusetts Institute of Technology, Cambridge, MA.
SUNG, K. AND POGGIO, T. 1997. Example-based learning for view-based human face detection. IEEE Trans. Patt. Anal. Mach. Intell. 20, 39–51.
SWETS, D. L. AND WENG, J. 1996a. Discriminant analysis and eigenspace partition tree for face and object recognition from views. In Proceedings, International Conference on Automatic Face and Gesture Recognition. 192–197.
SWETS, D. L. AND WENG, J. 1996b. Using discriminant eigenfeatures for image retrieval. IEEE Trans. Patt. Anal. Mach. Intell. 18, 831–836.
TARR, M. J. AND BULTHOFF, H. H. 1995. Is human object recognition better described by geon structural descriptions or by multiple views? Comment on Biederman and Gerhardstein (1993). J. Exp. Psych.: Hum. Percep. Perf. 21, 71–86.
TERZOPOULOS, D. AND WATERS, K. 1993. Analysis and synthesis of facial image sequences using physical and anatomical models. IEEE Trans. Patt. Anal. Mach. Intell. 15, 569–579.
THOMPSON, P. 1980. Margaret Thatcher: A new illusion. Perception 9, 483–484.
TRIGGS, B., MCLAUCHLAN, P., HARTLEY, R., AND FITZGIBBON, A. 2000. Bundle adjustment: A modern synthesis. In Vision Algorithms: Theory and Practice, Springer-Verlag, Berlin, Germany.
TSAI, P. S. AND SHAH, M. 1994. Shape from shading using linear approximation. Image Vis. Comput. 12, 487–498.
TURK, M. AND PENTLAND, A. 1991. Eigenfaces for recognition. J. Cogn. Neurosci. 3, 72–86.
ULLMAN, S. AND BASRI, R. 1991. Recognition by linear combinations of models. IEEE Trans. Patt. Anal. Mach. Intell. 13, 992–1006.
VAPNIK, V. N. 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York, NY.
VETTER, T. AND POGGIO, T. 1997. Linear object classes and image synthesis from a single example image. IEEE Trans. Patt. Anal. Mach. Intell. 19, 733–742.
VIOLA, P. AND JONES, M. 2001. Rapid object detection using a boosted cascade of simple features. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition.
WECHSLER, H., KAKKAD, V., HUANG, J., GUTTA, S., AND CHEN, V. 1997. Automatic video-based person authentication using the RBF network. In Proceedings, International Conference on Audio- and Video-Based Person Authentication. 85–92.
WILDER, J. 1994. Face recognition using transform coding of gray scale projection and the neural tree network. In Artificial Neural Networks with Applications in Speech and Vision, R. J. Mammone, Ed. Chapman Hall, New York, NY, 520–536.
WISKOTT, L., FELLOUS, J.-M., AND VON DER MALSBURG, C. 1997. Face recognition by elastic bunch graph matching. IEEE Trans. Patt. Anal. Mach. Intell. 19, 775–779.
YANG, M. H., KRIEGMAN, D., AND AHUJA, N. 2002. Detecting faces in images: A survey. IEEE Trans. Patt. Anal. Mach. Intell. 24, 34–58.
YIN, R. K. 1969. Looking at upside-down faces. J. Exp. Psych. 81, 141–151.
YUILLE, A. L., COHEN, D. S., AND HALLINAN, P. W. 1992. Feature extraction from faces using deformable templates. Int. J. Comput. Vis. 8, 99–112.
YUILLE, A. AND HALLINAN, P. 1992. Deformable templates. In Active Vision, A. Blake and A. Yuille, Eds., Cambridge, MA, 21–38.
ZHAO, W. 1999. Robust image based 3D face recognition. Ph.D. dissertation. University of Maryland, College Park, MD.
ZHAO, W. AND CHELLAPPA, R. 2000a. Illumination-insensitive face recognition using symmetric shape-from-shading. In Proceedings, Conference on Computer Vision and Pattern Recognition. 286–293.
ZHAO, W. AND CHELLAPPA, R. 2000b. SFS based view synthesis for robust face recognition. In Proceedings, International Conference on Automatic Face and Gesture Recognition.
ZHAO, W., CHELLAPPA, R., AND KRISHNASWAMY, A. 1998. Discriminant analysis of principal components for face recognition. In Proceedings, International Conference on Automatic Face and Gesture Recognition. 336–341.
ZHAO, W., CHELLAPPA, R., AND PHILLIPS, P. J. 1999. Subspace linear discriminant analysis for face recognition. Tech. rep. CAR-TR-914, Center for Automation Research, University of Maryland, College Park, MD.
ZHOU, S., KRUEGER, V., AND CHELLAPPA, R. 2003. Probabilistic recognition of human faces from video. Comput. Vis. Image Understand. 91, 214–245.