Are you sure?
This action might not be possible to undo. Are you sure you want to continue?
ATHESISSUBMITTEDTO
THEGRADUATESCHOOLOFNATURALSCIENCES
OF
THEMIDDLEEASTTECHNICALUNIVERSITY
BY
BURCUKEPENEKCI
INPARTIALFULLFILMENTOFTHEREQUIREMENTSFORTHEDEGREE
OF
MASTEROFSCIENCE
IN
THEDEPARTMENTOFELECTRICALANDELECTRONICSENGINEERING
SEPTEMBER2001
ii
ABSTRACT
FACERECOGNITIONUSINGGABORWAVELET
TRANSFORM
Kepenekci,Burcu
M.S,DepartmentofElectricalandElectronicsEngineering
Supervisor:A.AydınAlatan
CoSupervisor:GözdeBozdaı Akar
September2001,118pages
Face recognition is emerging as an active research area with numerous
commercial and law enforcement applications. Although existing methods
performs well under certain conditions, the illumination changes, out of plane
rotations and occlusions are still remain as challenging problems. The proposed
algorithm deals with two of these problems, namely occlusion and illumination
changes.Inourmethod,Gaborwavelettransformisusedforfacialfeaturevector
constructionduetoitspowerfulrepresentationofthebehaviorofreceptivefields
iii
in human visual system (HVS). The method is based on selecting peaks (high
energizedpoints)oftheGabor wavelet responses as feature points. Compared to
predefined graph nodes of elastic graph matching, our approach has better
representativecapabilityforGaborwavelets.Thefeaturepointsareautomatically
extracted using the local characteristics of each individual face in order to
decrease the effect of occluded features. Since there is no training as in neural
network approaches, a single frontal face for each individual is enough as a
reference. The experimental results with standard image libraries, show that the
proposed method performs better compared to the graph matching and eigenface
basedmethods.
Keywords:Automaticfacerecognition,Gaborwavelettransform,humanface
perception.
iv
ÖZ
GABORDALGACIKLARINIKULLANARAKYÜZ
TANIMA
Kepenekci,Burcu
YüksekLisans, ElektrikElektronikMühendisliiBölümü
TezYöneticisi:A.AydınAlatan
YardımcıTezYöneticisi:GözdeBozdaı Akar
Eylül2001,118 sayfa
Yüz tanıma günümüzde hem ticari hem de hukuksal alanlarda artan
sayıda uygulaması olan bir problemdir. Varolan yüz tanıma metodları kontrollü
ortamda baarılı sonuçlar verse de örtme, yönlenme, ve aydınlatma deiimleri
hala yüz tanımada çözülememi üç problemdir. Önerilen metod ile bu üç
problemdenaydınlanmadeiimleriveörtmeetkisielealınmıtır. Buçalımada
hem yüze ait öznitelik noktaları hem de vektörleri Gabor dalgacık dönü ümü
v
kullanılarak bulunmutur. Gabor dalgacık dönü ümü, insan görme sistemindeki
duyumsal bölgelerin davranıını modellemesinden dolayı kullanılmıtır.Önerilen
metod, daha önceden tanımlanmı çizge düümleri yerine, Gabor dalgacık
tepkeleri tepelerinin (yüksek enerjili noktalarının) öznitelik noktaları olarak
seçilmesine dayanmaktadır. Böylece Gabor dalgacıklarının en verimli ekilde
kullanılması salanmıtır. Öznitelik noktaları otomatik olarak her yüzün farklı
yerel özellikleri kullanılarak bulunmakta, bunun sonucu olarak örtük
özniteliklerin etkisi de azaltılmaktadır. Sinir aları yaklaımlarında olduu gibi
örenme safhası olmaması nedeniyle tanıma için her kiinin sadece bir ön yüz
görüntüsü yeterli olmaktadır. Yapılan deneylerin sonucunda önerilen metodun
varolan çizge eleme ve özyüzler yöntemleriyle karıla tırıldıında daha baarılı
sonuçlar verdii gözlenmitir.
Anahtar Kelimeler: Yüz tanıma, Gabor dalgacık dönü ümü, insan yüz algısı,
öznitelikbulma,öznitelikeleme,örüntütanıma.
vi
ACKNOWLEDGEMENTS
IwouldliketoexpressmygratitudetoAssoc.Prof.Dr.GözdeBozdaı Akarand
Assist. Prof. Dr. A. Aydın Alatan for their guidance, suggestions and insight
throughout this research. I also thank to F. Boray Tek for his support, useful
discussions, and great friendship. To my family, I offer sincere thanks for their
unshakablefaithinme.SpecialthankstofriendsinSignalProcessingandRemote
Sensing Laboratory for their reassurance during thesis writing. I would like to
acknowledge Dr. Uur Murat Lelolu for his comments. Most importantly, I
express my appreciation to Assoc. Prof. Dr. Mustafa Karaman for his guidance
duringmyundergraduatestudiesandhisencouragingmetoresearch.
vii
TABLEOFCONTENTS
ABSTRACT…………………………………………………...……..……….…...ii
ÖZ………………………………………………………………...…….…………iv
ACKNOWLEDGEMENTS……………………………………………...……….vi
TABLEOFCONTENTS……………………………………………………...…vii
LISTOFTABLES………..…………………………………………………….....x
LISTOFFIGURES...…………………..………………………………………...xi
CHAPTER
1. INTRODUCTION…………………………………………………….....1
1.1. WhyFaceRecognition?………..…………………………………3
1.2. ProblemDefinition………………………...………………………6
1.3. Organizationofthethesis……………………………………….....8
2. PASTRESEARCHONFACERECOGNITION...……………………..9
2.1. HumanFaceRecognition………………………………………….9
2.1.1. Discussion…………..……………………………………13
2.2. AutomaticFaceRecognition……………………………………..15
2.2.1. Representation,MatchingandStatisticalDecision……...16
2.2.2. EarlyFaceRecognitionMethods………………………...19
2.2.3. StatisticalApproachstoFaceRecognition……………....21
viii
2.2.3.1. KarhunenLoeveExpansionBasedMethods…..21
2.2.3.1.1. Eigenfaces………………………….21
2.2.3.1.2. FaceRecognitionusing
Eigenfaces………….……………….26
2.2.3.1.3. Eigenfeatures……………………….28
2.2.3.1.4. KarhunenLoeveTransformofthe
FourierTransform……………….…29
2.2.3.2. LinearDiscriminantMethodsFisherfaces….....30
2.2.3.2.1. Fisher’sLinearDiscriminant……….30
2.2.3.2.2. FaceRecognitionUsingLinear
DiscriminantAnalysis…………..….32
2.2.3.3. SingularValueDecompositionMethods………35
2.2.3.3.1. SingularValueDecomposition…….35
2.2.3.3.2. FaceRecognitionUsingSingular
ValueDecomposition………………35
2.2.4. HiddenMarkovModelBasedMethods………………….38
2.2.5. NeuralNetworksApproach………………………...…….44
2.2.6. TemplateBasedMatching……………………………......49
2.2.7. FeatureBasedMatching……………………………….....51
2.2.8. CurrentStateofTheArt………………………………….60
3. FACECOMPARISONANDMATCHING
USINGGABORWAVELETS……..………………………………….62
3.1. FaceRepresentationUsingGaborWavelets……………..………64
ix
3.1.1. GaborWavelets………………………………………......64
3.1.2. 2DGaborWaveletRepresentationofFaces……………..68
3.1.3. FeatureExtraction………………………………………..70
3.1.3.1. FeaturepointLocalization…………………….70
3.1.3.2. FeaturevectorExtraction……………………...73
3.2. MatchingProcedure……………………………………………...74
3.2.1. SimilarityCalculation……………………………………74
3.2.2. FaceComparison………………………………..……….75
4. RESULTS……………………………………………………………...82
4.1. SimulationSetup…………………..…………………..…………82
4.1.1. ResultsforUniversityofStirlingFaceDatabase………...83
4.1.2. ResultsforPurdueUniversityFaceDatabase……………84
4.1.3. ResultsforTheOlivettiandOracleResearchLaboratory
(ORL)faceDatabase…..…………………………………87
4.1.4. ResultsforFERETFaceDatabase…….………………....89
5. CONCLUSIONSANDFUTUREWORK…………………………...103
REFERENCES……………………………………………………….………....108
x
LISTOFTABLES
4.1 Recognition performances of eigenface, eigenhills and proposed method
onthePurduefacedatabase………………………….…………………..85
4.2 PerformanceresultsofwellknownalgorithmsonORLdatabase……….88
4.3 Probesetsandtheirgoalofevaluation………………………………...…92
4.4 ProbesetsforFERETperformanceevaluation………………………..…92
4.5 FERET performance evaluation results for various face recognition
algorithms…………………………………………………………...……94
xi
LISTOFFIGURES
2.1 AppearancemodelofEigenfaceAlgorithm….…...……………………...26
2.2 DiscriminativemodelofEigenfaceAlgorithm.………………………….27
2.3 ImagesamplingtechniqueforHMMrecognition…………………...…...40
2.4 HMMtrainingscheme…………………………………………………...43
2.5 HMMrecognitionscheme……………………...………………………...44
2.6 Autoassociationandclassificationnetworks…………..……………….45
2.7 ThediagramoftheConvolutionalNeuralNetworkSystem……………..48
2.8 A2Dimagelattice(gridgraph)onMarilynMonroe’sface……..………55
2.9 Abrunchgraph………………………………………………………...…59
2.10 Bunchgraphmatchedtoaface…………………………………..………59
3.1 AnensembleofGaborwavelets………………………..………………..66
3.2 Smallsetoffeaturescanrecognizefacesuniquely,andreceptivefields
thataremachedtothelocalfeaturesontheface…………………………67
3.3 Gaborfilterscorrespondto5spatialfrequencyand8orientation…..…...68
3.4 ExampleofafacialimageresponsetoGaborfilters……..……………..69
3.5 FacialfeaturepointsfoundasthehighenergizedpointsofGaborwavelet
responses………………………………..…………………..…………....71
3.6 Flowchartofthefeatureextractionstageofthefacialimages ………......72
xii
3.7 Testfacesvs.matchingfacefromgallery……………………...………..81
4.1 ExamplesofdifferentfacialexpressionsfromStirlingdatabase……...…83
4.2 ExampleofdifferentfacialimagesforapersonfromPurduedatabase....85
4.3 Wholesetoffaceimagesof40individuals10imagesperperson….. …..86
4.4 ExampleofmisclassifiedfacesofORLdatabase…………………..……88
4.5 ExampleofdifferentfacialimagesforapersonfromORLdatabasethat
areplacedattrainingorprobesetsbydifferent…………….…………....89
4.6 FERETidentificationperformancesagainstfbprobes………...…..……96
4.7 FERETidentificationperformancesagainstduplicateIprobes….…..…97
4.8 FERETidentificationperformancesagainst fcprobes….…….…………98
4.9 FERETidentificationperformancesagainstduplicateIIprobes..…….…99
4.10 FERETaverageidentificationperformances………...…………………100
4.11 FERETcurrentupperboundidentificationperformances..…………....101
4.12 FERETidentificationperformanceofproposedmethod
againstfbprobes…………..…………………………………………....102
4.13 FERETidentificationperformanceofproposedmethod
againstfcprobes………………………………...……………………....102
4.14 FERETidentificationperformanceofproposedmethod
againstduplicateIprobes.……………………….…………...…………103
4.15 FERETidentificationperformanceofproposedmethod
againstduplicateIIprobes….………………………..…………………103
1
CHAPTER1
INTRODUCTION
Machine recognition of faces is emerging as an active research area
spanning several disciplines such as image processing, pattern recognition,
computervisionandneuralnetworks.Facerecognitiontechnologyhasnumerous
commercial and law enforcement applications. These applications range from
static matching of controlled format photographs such as passports, credit cards,
photoID’s,driver’slicenses,andmugshotstorealtimematchingofsurveillance
videoimages[82].
Humans seem to recognize faces in cluttered scenes with relative ease,
having the ability to identify distorted images, coarsely quantized images, and
faces with occluded details. Machine recognition is much more daunting task.
Understandingthehuman mechanisms employed to recognize faces constitutesa
challenge for psychologists and neural scientists. In addition to the cognitive
aspects, understanding face recognition is important, since the same underlying
2
mechanisms could be used to build a system for the automatic identification of
facesbymachine.
AformalmethodofclassifyingfaceswasfirstproposedbyFrancisGalton
in 1888 [53, 54]. During the 1980’s work on face recognition remained largely
dormant. Since the 1990’s, the research interest in face recognition has grown
significantlyasaresultofthefollowingfacts:
1. Theincreaseinemphasisoncivilian/commercialresearchprojects,
2. The reemergence of neural network classifiers with emphasis on real time
computationandadaptation,
3. Theavailabilityofrealtimehardware,
4. The increasing need for surveillance related applications due to drug
trafficking,terroristactivities,etc.
Still most of the access control methods, with all their legitimate
applications in an expanding society, have a bothersome drawback. Except for
human and voice recognition, these methods require the user to remember a
password, to enter a PIN code, to carry a batch, or, in general, require a human
action in the course of identification or authentication. In addition, the
corresponding means (keys, batches, passwords, PIN codes) are prone to being
lost or forgotten, whereas fingerprints and retina scans suffer from low user
acceptance. Modern face recognition has reached an identification rate greater
than 90% with wellcontrolled pose and illumination conditions. While this is a
high rate for face recognition, it is not comparable to methods using keys,
passwordsorbatches.
3
1.1. Whyfacerecognition?
Within today’s environment of increased importance of security and
organization,identificationandauthenticationmethodshavedevelopedintoakey
technology in various areas: entrance control in buildings; access control for
computers in general or for automatic teller machines in particular; daytoday
affairs like withdrawing money from a bank account or dealing with the post
office; or in the prominent field of criminal investigation. Such requirement for
reliable personal identification in computerized access control has resulted in an
increasedinterestinbiometrics.
Biometric identification is the technique of automatically identifying or
verifying an individual by a physical characteristic or personal trait. The term
“automatically”meansthebiometricidentificationsystemmustidentifyorverify
ahumancharacteristicortraitquicklywithlittleornointerventionfromtheuser.
Biometric technology was developed for use in highlevel security systems and
lawenforcementmarkets.Thekeyelementofbiometrictechnologyisitsabilityto
identifyahumanbeingandenforcesecurity[83].
Biometriccharacteristicsandtraitsaredividedintobehavioralorphysical
categories. Behavioral biometrics encompasses such behaviors as signature and
typing rhythms. Physical biometric systems use the eye, finger, hand, voice, and
face,foridentification.
A biometricbased system was developed by Recognition Systems Inc.,
Campbell, California, as reported by Sidlauskas [73]. The system was called
ID3D Handkey and used the three dimensional shape of a person’s hand to
4
distinguish people. The side and top view of a hand positioned in a controlled
captureboxwereusedtogenerateasetofgeometricfeatures.Capturingtakesless
than two seconds and the data could be stored efficiently in a 9byte feature
vector.Thissystemcouldstoreupto20000differenthands.
Another wellknown biometric measure is that of fingerprints. Various
institutions around the world have carried out research in the field. Fingerprint
systemsareunobtrusiveandrelativelycheap to buy. They are used in banks and
to control entrance to restricted access areas. Fowler [51] has produced a short
summaryoftheavailablesystems.
Fingerprintsareuniquetoeachhumanbeing.Ithasbeenobservedthatthe
iris of the eye, like fingerprints, displays patterns and textures unique to each
humanandthatitremainsstableoverdecadesoflifeasdetailedbySiedlarz[74].
Daugman designed a robust pattern recognition method based on 2D Gabor
transformstoclassifyhumanirises.
Speechrecognitionisalsooffersoneofthemostnaturalandlessobtrusive
biometric measures, where a user is identified through his or her spoken words.
AT&Thaveproducedaprototypethatstoresaperson’svoiceonamemorycard,
detailsofwhicharedescribedbyMandelbaum[67].
While appropriate for bank transactions and entry into secure areas, such
technologies have the disadvantage that they are intrusive both physically and
socially. They require the user to position their body relative to the sensor, then
pauseforasecondtodeclarehimselforherself.Thispauseanddeclareinteraction
isunlikelytochangebecauseofthefinegrainspatialsensingrequired.Moreover,
5
since people can not recognize people using this sort of data, these types of
identification do not have a place in normal human interactions and social
structures.
While the pause and present interaction perception are useful in high
security applications, they are exactly the opposite of what is required when
building a store that recognizing its best customers, or an information kiosk that
remembersyou,orahousethatknowsthepeoplewholivethere.
A face recognition system would allow user to be identified by simply
walkingpastasurveillancecamera.Humanbeingsoftenrecognizeoneanotherby
uniquefacialcharacteristics.Oneofthenewestbiometrictechnologies,automatic
facial recognition, is based on this phenomenon. Facial recognition is the most
successful form of human surveillance. Facial recognition technology, is being
used to improve human efficiency when recognizing faces, is one of the fastest
growing fields in the biometric industry. Interest in facial recognition is being
fueled by the availability and low cost of video hardware, the everincreasing
number of video cameras being placed in the workspace, and the noninvasive
aspectoffacialrecognitionsystems.
Althoughfacialrecognitionisstillintheresearchanddevelopmentphase,
several commercial systems are currently available and research organizations,
such as Harvard University and the MIT Media Lab, are working on the
developmentofmoreaccurateandreliablesystems.
6
1.2. ProblemDefinition
Ageneralstatementoftheproblemcanbeformulatedasfollows:givenstill
or video images of a scene, identify one or more persons in the scene using a
storeddatabaseoffaces.
The environment surrounding a face recognition application can cover a
wide spectrum from a wellcontrolled environment to an uncontrolled one. In a
controlledenvironment,frontaland profile photographs are taken, complete with
uniform background and identical poses among the participants. These face
images are commonly called mug shots. Each mug shot can be manually or
automatically cropped to extract a normalized subpart called a canonical face
image.Inacanonicalfaceimage,thesizeandpositionofthefacearenormalized
approximatelytothepredefinedvaluesandbackgroundregionisminimal.
Generalfacerecognition,ataskthatisdonebyhumansindailyactivities,
comes from virtually uncontrolled environment. Systems, which automatically
recognizefacesfromuncontrolledenvironment,mustdetectfacesinimages.Face
detectiontaskistoreportthelocation,andtypicallyalsothesize,ofallthefaces
from a given image and completely a different problem with respect to face
recognition.
Facerecognitionisadifficultproblemduetothegeneralsimilarshapeof
faces combined with the numerous variations between images of the same face.
Recognition of faces from an uncontrolled environment is a very complex task:
lightingconditionmayvarytremendously;facialexpressionsalsovaryfromtime
to time; face may appear at different orientations and a face can be partially
7
occluded.Further,dependingontheapplication,handlingfacialfeaturesovertime
(aging)mayalsoberequired.
Although existing methods performs well under constrained conditions,
theproblemswiththeilluminationchanges,outofplanerotationsandocclusions
arestillremainsunsolved.Theproposedalgorithm,dealswithtwoofthesethree
importantproblems,namelyocclusionandilluminationchanges.
Sincethetechniquesusedinthebestfacerecognitionsystemsmaydepend
ontheapplicationofthesystem,onecanidentifyatleasttwobroadcategoriesof
facerecognitionsystems[19]:
1. Findingapersonwithinlargedatabaseoffaces(e.g.inapolicedatabase).
(Often only one image is available per person. It is usually not necessary for
recognitiontobedoneinrealtime.)
2. Identifyingparticularpeopleinrealtime(e.g.locationtrackingsystem).
(Multiple images per person are often available for training and real time
recognitionisrequired.)
Inthisthesis,weprimarilyinterestedinthefirstcase.Detectionoffaceis
assumed to be done beforehand. We aim to provide the correct label (e.g. name
label) associated withthat face from all the individuals in its database in case of
occlusionsandilluminationchanges.Databaseoffacesthatarestoredinasystem
iscalledgallery.Ingallery,thereexistsonlyonefrontalviewofeachindividual.
We do not consider cases with high degrees of rotation, i.e. we assume that a
minimalpreprocessingstageisavailableifrequired.
8
1.3. OrganizationoftheThesis
Over the past 20 years extensive research has been conducted by
psychophysicists, neuroscientists and engineers on various aspects of face
recognitionbyhumanandmachines.Inchapter2,wesummarizetheliteratureon
bothhumanandmachinerecognitionoffaces.
Chapter 3 introduces the proposed approach based on Gabor wavelet
representationoffaceimages.Thealgorithmisexplainedexplicitly.
Performance of our method is examined on four different standard face
databases with different characteristics. Simulation results and their comparisons
towellknownfacerecognitionmethodsarepresentedinChapter4.
In chapter 5, concluding remarks are stated. Future works, which may
followthisstudy,arealsopresented.
9
CHAPTER2
PASTRESEARCHONFACERECOGNITION
Thetaskofrecognizingfaceshasattractedmuchattentionbothfrom
neuroscientistsandfromcomputervisionscientists.Thischapterreviewssomeof
thewellknownapproachesfromthesebothfields.
2.1. Human Face Recognition: Perceptual and
CognitiveAspects
Themajorresearchissuesofinteresttoneuroscientistsincludethehuman
capacity for face recognition, the modeling of this capability, and the apparent
modularityoffacerecognition.Inthissectionsomefindings,reachedastheresult
ofexperimentsabouthumanfacerecognitionsystem,thatarepotentiallyrelevant
tothedesignoffacerecognitionsystemswillbesummarized
One of the basic issues that have been argued by several scientists is the
existence of a dedicated face processing system [82, 3]. Physiological evidence
10
indicates that the brain possesses specialized ‘face recognition hardware’ in the
formoffacedetectorcellsintheinferotemporalcortexandregionsinthefrontal
right hemisphere; impairment in these areas leads to a syndrome known as
prosapagnosia. Interestingly, prosapagnosics, although unable to recognize
familiar faces, retain their ability to visually recognize nonface objects. As a
resultofmanystudiesscientistscomeupwiththedecisionthatfacerecognitionis
notlikeotherobjectrecognition[42].
Hence,thequestioniswhatfeatureshumansuseforfacerecognition.The
results of the related studies are very valuable in the algorithm design of some
face recognition systems. It is interesting that when all facial features like nose,
mouth, eye etc. are contained in an image, but in different order than ordinary,
recognitionisnotpossibleforhuman.Explanationoffaceperceptionastheresult
of holistic or feature analysis alone is not possible since both are true. In human
both global and local features are used in a hierarchical manner [82]. Local
features provide a finer classification system for face recognition. Simulations
showthatthemostdifficultfacesforhumanstorecognizearethosefaces,which
are neither attractive nor unattractive [4]. Distinctive faces are more easily
recognizedthantypicalones.Informationcontainedinlowfrequencybandsused
in order to make the determination of the sex of the individual, while the higher
frequency components are used in recognition. The low frequency components
contribute to the global description, while the high frequency components
contributetothefinerdetailsrequiredintheidentificationtask[8,11,13].Ithas
11
alsobeenfoundthattheupperpartofthefaceismoreusefulforrecognitionthan
thelowerpart[82].
In [42], Bruce explains an experiment that is realized by superimposing
thelowspatialfrequencyMargaretThatcher’sfaceonthehigh spatial frequency
components of Tony Blair’s face. Although when viewed close up only Tony
Blair was seen, viewed from distance, Blair disappears and Margaret Thatcher
becomesvisible.Thisdemonstratesthattheimportantinformationforrecognizing
familiarfacesiscontainedwithinaparticularrangeofspatialfrequencies.
Another important finding is that human face recognition system is
disrupted by changes in lighting direction and also changes of viewpoint.
Althoughsomescientiststendtoexplainhumanfacerecognitionsystembasedon
derivation of 3D models of faces using shape from shading derivatives, it is
difficulttounderstandwhyface recognition appears so viewpoint dependent [1].
The effects of lighting change on face identification and matching suggest that
representationsforfacerecognitionarecruciallyaffectedbychangesinlowlevel
imagefeatures.
BruceandLangtonfoundthatnegation(invertingbothhueandluminance
values of an image) effects badly the identification of familiar faces [124]. They
alsoobservethatnegationhasnosignificanteffectonidentificationandmatching
ofsurfaceimages that lacked any pigmented and textured features, this led them
toattributethenegationeffecttothealterationofthebrightnessinformationabout
pigmentedareas.AnegativeimageofadarkhairedCaucasian,forexample,will
appear to be a blonde with dark skin. Kemp et al. [125] showed that the hue
12
values of these pigmented regions do not themselves matters for face
identification. Familiar faces presented in ‘hue negated’ versions, with preserved
luminance values, were recognized as well as those with original hue values
maintained,thoughtherewas a decrement in recognition memory for pictures of
faceswhenhuewasalteredinthisway[126].Thissuggeststhatepisodicmemory
forpicturesofunfamiliarfacescanbesensitivetohue,thoughtherepresentations
of familiar faces seems not to be. This distinction between memory for pictures
andfacesisimportant.Itisclearthatrecognitionoffamiliarandunfamiliarfaces
isnotthesameforhumans.Itislikelythatunfamiliarfacesareprocessedinorder
to recognize a picture where as familiar faces are fed into the face recognition
system of human brain. A detailed discussion of recognizing familiar and
unfamiliarfacescanbefoundin[41].
Young children typically recognize unfamiliar faces using unrelated cues
such as glasses, clothes, hats, and hairstyle. By the age of twelve, these
paraphernalia are usually reliably ignored. Curiously, when children as young as
five years are asked to recognize familiar faces, they do pretty well in ignoring
paraphernalia. Several other interesting studies related to how children perceive
invertedfacesaresummarizedin[6,7].
Humans recognize people from their own race better than people from
another race. Humans may encode an ‘average’ face; these averages may be
different for different races and recognition may suffer from prejudice and
unfamiliarity with the class of faces from another race or gender [82]. The poor
identification of other races is not a psychophysical problem but more likely a
13
psychosocialone.Oneoftheinterestingresultsofthestudiestoquantifytherole
of gender in face recognition is that in Japanese population, majority of the
women’sfacialfeaturesismoreheterogeneousthanthemen’sfeatures.Ithasalso
been found that white women’s faces are slightly more variable than men’s, but
thattheoverallvariationissmall[9,10].
2.1.1. Discussion
The recognition of familiar faces plays a fundamental role in our social
interactions. Humans are able to identify reliably a large number of faces and
psychologists are interested in understanding the perceptual and cognitive
mechanisms at the base of the face recognition process. Those researches
illuminatecomputervisionscientists’studies.
We can summarize the founding of studies on human face recognition
systemasfollows:
1. Thehumancapacityforfacerecognitionisadedicatedprocess,notmerelyan
application of the general object recognition process. Thus artificial face
recognitionsystemsshouldalsobefacespecific.
2. Distinctivefacesaremoreeasilyrecognizedthantypicalones.
3. Bothglobalandlocalfeaturesareusedforrepresentingandrecognizingfaces.
4. Humansrecognizepeoplefromtheirownracebetterthanpeoplefromanother
race.Humansmayencodean‘average’face.
14
5. Certain image transformations, such as intensity negation, strange viewpoint
changes, and changes in lighting direction can severely disrupt human face
recognition.
Usingthepresenttechnologyitisimpossibletocompletelymodelhuman
recognition system and reach its performance. However, the human brain has its
shortcomings in the total number of persons that it can accurately ‘remember’.
Thebenefitofacomputersystemwouldbeitscapacitytohandlelargedatasetsof
faceimages.
The observations and findings about human face recognition system will
be a good starting point for automatic face recognition methods. As it is
mentionedaboveanautomatedfacerecognitionsystemshouldbefacespecific.It
shouldeffectivelyusefeaturesthatdiscriminateafacefromothers,andmoreasin
caricaturesitpreferablyamplifiessuchdistinctivecharacteristicsofface[5,13].
Differencebetweenrecognitionoffamiliarandunfamiliarfacesmustalso
benoticed.Firstofallweshouldfindoutwhatmakesusfamiliartoaface.Seeing
a face in many different conditions (different illuminations, rotations,
expressions…etc.) make us familiar to that face, or by just frequently looking at
the same face image can we be familiar to that face? Seeing a face in many
differentconditionsissomethingrelatedtotraininghowevertheinterestingpoint
isthatbyusingonlythesame2Dinformationhowwecanpassfromunfamiliarity
to familiarity. Methods, which recognize faces from a single view, should pay
attentiontothisfamiliaritysubject.
15
Someoftheearlyscientistswereinspiredbywatchingbirdflightandbuilt
their vehicles with mobile wings. Although a single underlying principle, the
Bernoulli effect, explains both biological and manmade flight, we note that no
modernaircrafthasflappingwings.Designersoffacerecognitionalgorithmsand
systems should be aware of relevant psychophysics and neurophysiological
studies but should be prudent in using only those that are applicable or relevant
fromapractical/implementationpointofview.
2.2. AutomaticFaceRecognition
Although humans perform face recognition in an effortless manner,
underlying computations within the human visual system are of tremendous
complexity. The seemingly trivial task of finding and recognizing faces is the
resultofmillionsyearsofevolutionandwearefarawayfromfullyunderstanding
howthebrainperformsthistask.
Up to date, no complete solution has been proposed that allow the
automatic recognition of faces in real images. In this section we will review
existing face recognition systems in five categories: early methods, neural
networks approaches, statistical approaches, template based approaches, and
feature based methods. Finally current state of the art of the face recognition
technologywillbepresented.
16
2.2.1. Representation,MatchingandStatisticalDecision
The performance of face recognition depends on the solution of two
problems:representationandmatching.
At an elementary level, the image of a face is a two dimensional (2D)
arrayofpixelgraylevelsas,
x={x
i,j
,i,j ∈S}, (2.1)
where S is a square lattice. However in some cases it is more convenient to
express the face image, x, as onedimensional (1D) column vector of
concatenatedrowsofpixels,as
x=[x
1
,x
2
,.....,x
n
]
T
(2.2)
Wheren=Sisthetotalnumberofpixelsintheimage.Thereforex∈ R
n
,then
dimensionalEuclideanspace.
For a given representation, two properties are important: discriminating
powerandefficiency;i.e.howfarapartarethefacesundertherepresentationand
howcompactistherepresentation.
While many previous techniques represent faces in their most elementary
forms of (2.1) or (2.2), many others use a feature vector, F(x)=[ f
1
(x), f
2
(x),....,
f
m
(x)]
T
, where f
1
(.),f
2
(.),...,f
m
(.) are linear or nonlinear functionals. Featurebased
representationsareusuallymoreefficientsincegenerallymismuchsmallerthan
n.
17
A simple way to achieve good efficiency is to use an alternative
orthonormal basis of R
n
. Specifically, suppose e
1
, e
2
,..., e
n
are an orthonormal
basis.ThenXcanbeexpressedas
i
n
i
i
e x x
¿
=
=
1
~
(2.3)
where
i i
e x x ,
~
∆ (inner product), and x can be equivalently represented by
[ ]
T
n
x x x x
~
,...,
~
,
~ ~
2 1
= .Twoexamplesoforthonormalbasisarethenaturalbasisused
in (2.2) with e
i
=[0,...,0,1,0,...,0]
T
, where one is i
th
position, and the Fourier basis
( )
T
n
i n
j
n
i
j
n
i
j
i
e e e
n
e
(
(
¸
(
¸

.

\

=

.

\
 −

.

\


.

\
 1
2
2
2 2
2 / 1
,..., , , 1
1
π π π
.Iffor a given orthonormalbasis
i
x
~
are small when m i ≥ , then the face vector x
~
can be compressed into an m
dimensionalvector, [ ]
T
m
x x x x
~
,...,
~
,
~ ~
2 1
≅ .
It is important to notice that an efficient representation does not
necessarilyhavegooddiscriminatingpower.
Inthematchingproblem,anincomingfaceisrecognizedbyidentifyingit
with a prestored face. For example, suppose the input face is x and there are K
prestoredfacesc
k
,k=1,2,...,K.Onepossibilityistoassignxto
0
k
c if
k
c x k
K k
− =
≤ ≤ 1
min arg
0
(2.4)
18
where . represents the Euclidean distance in R
n
. If c
k
 is normalized so that
c
k
=cforallk,theminimumdistancematchingin(2.4)simplifiestocorrelation
matching
k
c x k
K k
, min arg
1
0
≤ ≤
= . (2.5)
Since distance and inner product are invariant to change of orthonormal
basis, minimum distance and correlation matching can be performed using any
orthonormal basis and the recognition performance will be the same. To do this,
simply replace x and c
k
in (2.4) or (2.5) by x
~
and
k
c
~
. Similarly (2.4) and (2.5)
alsocouldbeusedwithfeaturevectors.
Duetosuchfactorssuchasviewingangle,illumination,facialexpression,
distortion, and noise, the face images for a given person can have random
variations and therefore are better modeled as a random vector. In this case,
maximumlikelihood(ML)matchingisoftenused,
( ) ( )
k
c x p k
K k
 log min arg
1
0
≤ ≤
= (2.6)
wherep(xc
k
)isthedensityofxconditioningonitsbeingthek
th
person.TheML
criterion minimizes the probability of recognition error when a priori, the
incomingfaceisequallylikelytobethatofanyoftheKpersons.Furthermoreif
we assume that variations in face vectors are caused by additive white Gaussian
noise(AWGN)
x
k
=c
k
+w
k
(2.7)
19
wherew
k
isazeromeanAWGNwithpowerσ
2
,thentheMLmatchingbecomes
theminimumdistancematchingof(2.4).
2.2.2. Earlyfacerecognitionmethods
Theinitialworkinautomaticfaceprocessingdatesbacktotheendofthe
19
th
century, as reported by Benson and Perrett [39]. In his lecture on personal
identificationattheRoyalInstitutionon25May1888,SirFrancisGalton[53],an
Englishscientist,explorerandacousinofCharlesDarwin,explainedthathehad
“frequently chafed under the sense of inability to verbally explain hereditary
resemblance and types of features”. In order to relieve himself from this
embarrassment, he took considerable trouble and made many experiments. He
described how French prisoners were identified using four primary measures
(head length, head breadth, foot length and middle digit length of the foot and
hand respectively). Each measure could take one of the three possible values
(large, medium, or small), giving a total of 81 possible primary classes. Galton
felt it would be advantageousto have an automatic method of classification. For
thispurpose,hedevisedanapparatus,whichhecalledamechanicalselector,that
could be used to compare measurements of face profiles. Galton reported that
mostofthemeasureshehadtriedwerefairyefficient.
Early face recognition methods were mostly feature based. Galton’s
proposed method, and a lot of work to follow, focused on detecting important
facial features as eye corners, mouth corners, nose tip, etc. By measuring the
relativedistancesbetweenthesefacialfeaturesafeaturevectorcanbeconstructed
20
todescribeeachface.Bycomparingthefeaturevectorofanunknownfacetothe
feature vectors of known vectors from a database of known faces, the closest
matchcanbedetermined.
One of the earliest works is reported by Bledsoe [84]. In this system, a
human operatorlocated the feature points on the faceand entered their positions
into the computer. Given a set of feature point distances of an unknown person,
nearest neighbor or other classification rules were used for identifying the test
face. Since feature extraction is manually done, this system could accommodate
widevariationsinheadrotation,tilt,imagequality,andcontrast.
InKanade’swork[62],seriesfiducialpointsaredetectedusingrelatively
simple image processing tools (edge maps, signatures etc.) and the Euclidean
distancesarethenusedasafeaturevectortoperformrecognition.Thefacefeature
pointsarelocatedintwostages.Thecoarsegrainstagesimplifiedthesucceeding
differential operation and feature finding algorithms. Once the eyes, nose and
mouth are approximately located, more accurate information is extracted by
confiningtheprocessingtofoursmallergroups,scanningathigherresolution,and
using‘bestbeamintensity’ fortheregion.Thefourregionsaretheleftandright
eye, nose, and mouth. The beam intensity is based on the local area histogram
obtainedinthecoarsegainstage.Asetof16facialparameters,whicharerations
ofdistances,areas,andanglestocompensateforthevaryingsizeofthepictures,
isextracted.Toeliminatescaleanddimensiondifferencesthecomponentsofthe
resulting vector are normalized. A simple distance measure is used to check
similaritybetweentwofaceimages.
21
2.2.3. Statisticalapproachestofacerecognition
2.2.3.1. KarhunenLoeveExpansionBasedMethods
2.2.3.1.1. Eigenfaces
A face image, I(x,y), of size NxN is simply a matrix with beach element
representingtheintensityatthatparticularpixel.I(x,y)mayalsobeconsideredas
a vector of length N
2
or a single point in an N
2
dimentional space. So a 128x128
pixel image can be represented as a point in a 16,384 dimensional space. Facial
images in general will occupy only a small subregion of this high dimensional
‘imagespace’andthusarenotoptimallyrepresentedinthiscoordinatesystem.
As mentioned in section 2.2.1, alternative orthonormal bases are often
usedtocompressfacevectors.OnesuchbasisistheKarhunenLoeve(KL).
The‘Eigenfaces’methodproposedbyTurkandPentland[20],isbasedon
theKarhunenLoeveexpansionandismotivatedbytheearlierworkofSirovitch
andKirby[63]forefficientlyrepresentingpictureoffaces.Eigenfacerecognition
derivesitisnamefromtheGermanprefix‘eigen’,meaning‘own’or‘individual’.
TheEigenfacemethodoffacialrecognitionisconsideredthefirstworkingfacial
recognitiontechnology.
TheeigenfacesmethodpresentedbyTurkandPentlandfindstheprincipal
components (KarhunenLoeve expansion) of the face image distribution or the
eigenvectors of the covariance matrix of the set of face images. These
eigenvectors can be thought as a set of features, which together characterize the
variationbetweenfaceimages.
22
LetafaceimageI(x,y)beatwodimensionalarrayofintensityvalues,ora
vectorofdimensionn.LetthetrainingsetofimagesbeI
1
,I
2
,...,I
N
.Theaverage
face image of the set is defined by
¿
=
=
N
i
i
I
N
1
1
ψ . Each face differs from the
average by the vector ψ φ − =
i i
I . This set of very large vectors is subject to
principal component analysis which seeks a set of K orthonormal vectors v
k
,
k=1,...,K and their associated eigenvalues λ
k
which best describe the distribution
ofdata.
Vectors v
k
and scalars λ
k
are the eigenvectors and eigenvalues of the
covariancematrix:
,
1
1
T T
i
N
i
i
AA
N
C = =
¿
=
φ φ (2.9)
where the matrix A=[φ
1
, φ
2
,..., φ
N
]. Finding the eigenvectors of matrix C
nxn
is
computationallyintensive.However,theeigenvectorsofCcanbedeterminedby
first finding the eigenvectors of a much smaller matrix of sizeNxN and taking a
linearcombinationoftheresultingvectors.
k k k
v Cv λ = (2.10)
k k
T
k k
T
k
v v Cv v λ = (2.11)
sinceeigenvectors,v
k
,areorthogonalandnormalizedv
k
T
v
k
=1.
k k
T
k
Cv v λ = (2.12)
23
¿
=
=
N
i
k
T
i i
T
k k
v v
N
1
1
φ φ λ (2.13)
¿
¿
¿
¿
¿
=
=
=
=
=
=
− =
=
=
=
N
i
T
i k
N
i
T
i k
T
i k
N
i
T
i k
N
i
T
i k
T T
i k
N
i
k
T
i i
T
k
I v
N
I v mean I v
N
v
N
v v
N
v v
N
1
1
2
1
2
1
1
) var(
1
)) ( (
1
) (
1
) ( ) (
1
1
φ
φ φ
φ φ
Thus eigenvalue k represents the variance of the representative facial image set
alongtheaxisdescribedbyeigenvectork.
The space spanned by the eigenvectors v
k
, k=1,...,K corresponding to the
largest K eigenvalues of the covariance matrix C, is called the face space. The
eigenvectorsofmatrixC,whicharecalledeigenfacesfromabasissetfortheface
images. A new face image Γ is transformed into its eigenface components
(projectedontothefacespace)by:
) ( ) ( , φ φ ω − Γ >= − Γ =<
T
k
vk vk (2.14)
For k=1,...,K. The projections w
k
form the feature vector Ω=[ w
1
, w
2
,..., w
K
]
which describes the contribution of each of each eigenface in representing the
inputimage.
Given a set of face classes E
q
and the corresponding feature vectors Ωq,
thesimplestmethodfordeterminingwhichfaceclassprovidesthebestdescription
24
of an input face image Γ is to find the face class that minimizes the Euclidean
distanceinthefeaturespace:
q q
Ω − Ω = ξ (2.15)
AfaceisclassifiedasbelongingtoclassE
q
whentheminimumξqisbelowsome
thresholdθ
∈
andalso
{ }
q q q
E ξ min arg = . (2.16)
Otherwise,thefaceisclassifiedasunknown.
Turk and Pentland [20] test how their algorithm performs in changing
conditions,byvaryingillumination,sizeandorientationofthefaces.Theyfound
thattheirsystemhadthemosttroublewithfacesscaledlargerorsmallerthanthe
original dataset. To overcome this problem they suggest using a multiresolution
methodinwhichfacesarecomparedtoeigenfacesofvaryingsizestocomputethe
best match. Also they note that image background can have significant effect on
performance, which they minimize by multiplying input images with a 2D
Gaussiantodiminishthecontributionofthebackgroundandhighlightthecentral
facial features. System performs face recognition in realtime. Turk and
Pentland’s paper was very seminal in the field of face recognition and their
methodisstillquitepopularduetoitseaseofimplementation.
MuraseandNayar[85] extended the capabilities of the eigenface method
to general 3Dobject recognition under different illumination and viewing
conditions. Given N object images taken under P views and L different
25
illumination conditions, a universal image set is built which contains all the
availabledata.Inthiswayasingle‘parametricspace’describestheobjectidentity
aswellastheviewingorilluminationconditions.Theeigenfacedecompositionof
thisspacewasusedforfeatureextractionandclassification.Howeverinorderto
insure discrimination between different objects the number of eigenvectors used
inthismethodwasincreasedcomparedtotheclassicalEigenfacemethod.
Later,basedontheeigenfacedecomposition,Pentlandetal[86]developed
a ‘view based’ eigenspace approach for human face recognition under general
viewing conditions. Given N individuals under P different views, recognition is
performed over P separate eigenspaces, each capturing the variation of the
individuals in a common view. The ‘view based’ approach is essentially an
extensionoftheeigenfacetechniquetomultiplesetsofeigenvectors,oneforeach
face orientation. In order to deal with multiple views, in the first stage of this
approach,theorientationofthetestfaceisdeterminedandtheeigenspacewhich
bestdescribestheinputimageisselected.Thisisaccomplishedbycalculatingthe
residual description error (distance from feature space: DFFS) for each view
space. Once the proper view is determined, the image is projected onto
appropriate view space and then recognized. The view based approach is
computationallymoreintensivethantheparametricapproachbecausePdifferent
sets of V projections are required (V is the number of eigenfaces selected to
represent each eigenspace). Naturally, the viewbased representation can yield
moreaccuraterepresentationoftheunderlyinggeometry.
26
2.2.3.1.2. FaceRecognitionusingEigenfaces
Therearetwomainapproachesofrecognizingfacesbyusingeigenfaces.
Appearancemodel:
1 Adatabaseoffaceimagesiscollected
2 A set of eigenfaces is generated by performing principal component analysis
(PCA) on the face images. Approximately, 100 eigenvectors are enough to
codealargedatabaseoffaces.
3 Eachfaceimageisrepresentedasalinearcombinationoftheeigenfaces.
4 A given test image is approximated by a combination of eigenfaces. A
distancemeasureisusedtocomparethesimilaritybetweentwoimages.
Figure2.1:Appearancemodel
g p
U
Ω
k −
∆
) , ( g p S
27
Figure2.2:Discriminativemodel
Discriminativemodel:
1 Two datasets Ω
l
and Ω
E
are obtained by computing intrapersonal differences
(by matching two views of each individual in the dataset) and the other by
computingextrapersonaldifferences(bymatchingdifferentindividualsinthe
dataset),respectively.
2 TwodatasetsofeigenfacesaregeneratedbyperformingPCAoneachclass.
E
Ω
l
Ω
g p
( )
( )
1
1


Ω ∆
∆ Ω
P
P
) , ( g p S
28
3 Similarity score between two images is derived by calculating S=P(Ω
l
∆),
where ∆ is the difference between a pair of images. Two images are
determinedtobethesameindividual,ifS>0.5.
Although the recognition performance is lower than the correlation
method, the substantial reduction in computational complexity of the eigenface
methodmakesthismethodveryattractive.Therecognitionratesincreasewiththe
number of principal components (eigenfaces) used and in the limit, as more
principal components are used, performance approaches that of correlation. In
[20], and [86], authors reported that the performances level off at about 45
principalcomponents.
Ithasbeenshownthatremovingfirstthreeprincipalcomponentsresultsin
better recognition performances (the authors reported an error rate of %20 when
usingtheeigenfacemethodwith30principalcomponentsonadatabasestrongly
affectedbyilluminationvariationsandonly%10errorrateafterremovingthefirst
three components). The recognition rates in this case were better than the
recognitionratesobtainedusingthecorrelationmethod.Thiswasarguedbasedon
the fact that first components are more influenced by variations of lighting
conditions.
2.2.3.1.3. Eigenfeatures
Pentland et al. [86] discussed the use of facial features for face
recognition.Thiscanbeeitheramodularora layered representation of the face,
29
where a coarse (lowresolution) description of the whole head is augmented by
additional (highresolution) details in terms of salient facial features. The
eigenfacetechniquewasextendedtodetectfacialfeatures.Foreachofthefacial
features, a feature space is built by selecting the most significant eigenfeatures
(eigenvectors corresponding to the largest eigenvalues of the features correlation
matrix).
After the facial features in a test image were extracted, a score of
similarity between the detected features and the features corresponding to the
model images is computed. A simple approach for recognition is to compute a
cumulative score in terms of equal contribution by each of the facial feature
scores. More elaborate weighting schemes can also be used for classification.
Once the cumulative score is determined, a new face is classified such that this
scoreismaximized.
The performance of eigenfeatures method is close to that of eigenfaces,
howeveracombinedrepresentationofeigenfacesandeigenfeaturesshowshigher
recognitionrates.
2.2.3.1.4. TheKarhunenLoeveTransformoftheFourierSpectrum
Akamatsu et. al. [87], illustrated the effectiveness of KarhunenLoeve
Transform of Fourier Spectrum in the Affine Transformed Target Image (KL
FSAT) for face recognition. First, the original images were standardized with
respect to position, size, and orientation using an affine transform so that three
referencepointssatisfyaspecificspatialarrangement.Thepositionofthesepoints
30
isrelatedtothepositionofsomesignificantfacialfeatures.Theeigenfacemethod
is applied discussed in the section 2.2.3.1.1. to the magnitude of the Fourier
spectrum of the standardized images (KLFSAT). Due to the shift invariance
property of the magnitude of the Fourier spectrum, the KLFSAT performed
better than classical eigenfaces method under variations in head orientation and
shifting. However the computational complexity of KLFSAT method is
significantly greater than the eigenface method due to the computation of the
Fourierspectrum.
2.2.3.2. LinearDiscriminantMethodsFisherfaces
In [88], [89], the authors proposed a new method for reducing the
dimensionality of the feature space by using Fisher’s Linear Discriminant (FLD)
[90]. The FLD uses the class membership information and develops a set of
feature vectors in which variations of different faces are emphasized while
different instances of faces due to illumination conditions, facial expression and
orientationsaredeemphasized.
2.2.3.2.1. Fisher’sLinearDiscriminant
Given c classes with a priori probabilities P
i
, let N
i
be the number of
samples of class i, i=1,...,c . Then the following positive semidefinite scatter
matricesaredefinedas:
31
¿
=
− − =
c
i
T
i i i B
P S
1
) )( ( µ µ µ µ (2.17)
¿ ¿
= =
− − =
c
i
N
j
T
i
i
j i
i
j i w
i
x x P S
1 1
) )( ( µ µ (2.18)
Where
i
j
x denotes the j
th
ndimensional sample vector belonging to class i, µ
i
is
themeanofclassi:
,
1
1
¿
=
=
i
N
j
i
j
i
i
x
N
µ (2.19)
andµistheoverallmeanofsamplevectors:
,
1
1 1
1
¿¿
¿
= =
=
=
c
i
N
j
i
j
c
i
i
i
x
N
µ (2.20)
S
w
isthewithin–classscattermatrixandrepresentsthe average scatter of sample
vectorofclassi;S
B
isthebetweenclassscattermatrixandrepresentsthescatter
ofthemean µ
i
ofclassiaroundtheoverallmeanvector µ.IfS
w
isnonsingular,
the Linear Discriminant Analysis (LDA) selects a matrix V
opt
∈R
nxk
with
orthonormal columns which maximizes the ratio of the determinant of the
betweenclassscattermatrixoftheprojectedsamples,
[ ], ,..., , max arg
2 1 k
w
T
B
T
v opt
v v v
V S V
V S V
V =


.

\

= (2.21)
32
Where {v
i
i=1,...,k} is the set of generalized eigenvectors of S
B
and S
w
correspondingtothesetofdecreasingeigenvalues{λ
i
i=1,...,k},i.e.
.
i w i i B
v S v S λ = (2.22)
As shown in [91], the upper bound of k is c1. The matrix V
opt
describes the
OptimalLinearDiscriminantTransformorthe FoleySammon Transform. While
theKarhunenLoeveTransformperformsarotationonasetofaxesalongwhich
the projection of sample vectors differ most in the autocorrelation sense, the
LinearDiscriminantTransformperformsarotationonasetofaxes[v
1
,v
2
,...,v
k
]
alongwhichtheprojectionofsamplevectorsshowmaximumdiscrimination.
2.2.3.2.2. FaceRecognitionUsingLinearDiscriminantAnalysis
LetatrainingsetofNfaceimagesrepresentscdifferentsubjects.Theface
image in the training set are twodimensional arrays of intensity values,
represented as vectors of dimension n. Different instances of a person’s face
(variations in lighting, pose or facial expressions) are defined to be in the same
classandfacesofdifferentsubjectsaredefinedtobefromdifferentclasses.
The scatter matrices S
B
and Sw are defined in Equations (2.17), (2.18).
However the matrix V
opt
cannot be found directly from Equation (2.21), because
ingeneralmatrixS
w
issingular.ThisstemsfromthefactthattherankofS
w
isless
thanNc,andingeneral,thenumberofpixelsineachimagenismuchlargerthan
the number of images in the learning set N. There have been presented many
solutionsintheliteratureinordertoovercomethisproblem[92,93].In[88],the
33
authors propose a method which is called Fisherfaces. The problem of S
w
being
singularisavoidedbyprojectingtheimagesetontoalowerdimensionalspaceso
that the resulting within class scatter is non singular. This is achieved by using
PrincipalComponentAnalysis(PCA)toreducethedimensionofthefeaturespace
to Nc and then, applying the standard linear discriminant defined in Equation
(2.21)toreducethedimensiontoc1.MoreformallyV
opt
isgivenby:
V
opt
=V
fld
V
pca
, (2.23)
Where
[ ], max arg CV V V
T
v pca
= (2.24)
and,


.

\

=
V SwV V V
V SBV V V
V
pca
T
pca
T
pca
T
pca
T
v fld
max arg , (2.25)
Where C is the covariance matrix of the set of training images and is computed
fromEquation(2.9).ThecolumnsofV
opt
areorthogonalvectorswhicharecalled
Fisherfaces.UnliketheEigenfaces,theFisherfacesdonotcorrespondtofacelike
patterns.AllexamplefaceimagesE
q
,q=1,...,QintheexamplesetSareprojected
ontothevectorscorrespondingtothecolumnsoftheV
fld
andasetoffeaturesis
extractedforeachexamplefaceimage.Thesefeaturevectorsareuseddirectlyfor
classification.
34
Having extracteda compact and efficient feature set, the recognition task
canbeperformedbyusingtheEuclideandistanceinthefeaturespace.However,
in [89] as a measure in the feature space, is proposed a weighted mean
absolute/square distance with weights obtained based on the reliability of the
decisionaxis.
¿
=
∈ Ε
Ε − Γ Σ
Ε − Γ
= Ε Γ
K
v
v
v v S
v v
D
1
2
2
) (
) (
) , ( α , (2.26)
Therefore,foragivenfaceimageΓ,thebestmatchE
0
isgivenby
{ } ) , ( min arg
0
Ε Γ = Ε
∈ Ε
D
S
. (2.27)
Theconfidencemeasureisdefinedas:


.

\

Ε Γ
Ε Γ
− = Ε Γ
) , (
) , (
1 ) , (
1
0
0
D
D
Conf ,(2.28)
whereE
1
isthesecondbestcandidate.
In [87], Akamatsu et. al. applied LDA to the Fourier Spectrum of the
intensity image. The results reported by the authors showed that LDA in the
FourierdomainissignificantlymorerobusttovariationsinlightingthantheLDA
applieddirectlytotheintensityimages.Howeverthecomputationalcomplexityof
this method is significantly greater than classical Fisherface method due to the
computationoftheFourierspectrum.
35
2.2.3.3. SingularValueDecompositionMethods
2.2.3.3.1 Singularvaluedecomposition
MethodsbasedontheSingularValueDecompositionforfacerecognition
usethegeneralresultstatedbythefollowingtheorem:
Theorem: Let I
pxq
be a real rectangular matrix Rank(I)=r, then there exists two
orthonormal matrices U
pxp
, V
qxq
and a diagonal matrix Σ
pxq
and the following
formulaholds:
¿
=
= Σ =
r
i
T
i i i
T
v v V U I
1
, λ (2.29)
where
U=(u
1
,u
2
,...,u
r
,u
r+1
,...,u
p
),
V=(v
1
,v
2
,...,v
r
,v
r+1
,...,v
q
),
Σ=diag(λ
1
,λ
2
,...,λ
r
,0,...,0),
λ
1
>λ
2
>...>λ
r
>0,
2
i
λ , i=1,...,r are the eigenvalues of II
T
and I
T
I, u
i
, v
j
, i=1,...,p,
j=1,...,qaretheeigenvectorscorrespondingtoeigenvaluesofII
T
andI
T
I.
2.2.3.3.2. FacerecognitionUsingSingularValueDecomposition
Let a face image I(x,y) be a two dimensional (mxn) array of intensity
valuesand[λ
1
,λ
2
,...,λ
r
]beitssingularvalue(SV)vector.In[93],Zhongrevealed
the importance of using SVD for human face recognition by proving several
important properties of the SV vector as: the stability of the SV vector to small
36
perturbations caused by stochastic variation in the intensity image, the
proportional variance of the SV vector to proportional variance of pixels in the
intensity image, the invariance of the SV feature vector to rotation transform,
translation and mirror transform. The above properties of the SV vector provide
the theoretical basis for using singular values as image features. However, it has
beenshownthatcompressingtheoriginalSVvectorintoalowdimensionalspace,
by means of various mathematic transforms leads to higher recognition
performances.Amongvarioustransformationsofcompressingdimensionality,the
FoleySammon transform based on Fisher criterion, i.e. optimal discriminant
vectors,isthemost popular one. Given N faceimages, which present c different
subjects, the SV vectors are extracted from each image. According to Equations
(2.17)and(2.18),thescattermatricesS
B
andS
w
oftheSVvectorsareconstructed.
Ithasbeenshownthatitisdifficulttoobtaintheoptimaldiscriminantvectorsin
the case of small number of samples, i.e. the number of samples is less than the
dimensionality of the SV vector because the scatter matrix S
w
is singular in this
case. Many solutions have been proposed to overcome this problem. Hong [93],
circumvented the problem by adding a small singular value perturbation to S
w
resulting in S
w
(t) such that S
w
(t) becomes nonsingular. However the perturbation
of S
w
introduces an arbitrary parameter, and the range to which the authors
restrictedtheperturbationisnotappropriatetoensurethattheinversionofS
w
(t)is
numericallystable.Chengetal[92],solvedtheproblembyrankdecompositionof
S
w
. This is a generalization of Tian’s method [94], who substitute S
w

by the
positivepseudoinverseS
w
+
.
37
After the set of optimal discriminant vectors {v
1
, v
2
, ..., v
k
} has been
extracted, the feature vectors are obtained by projecting the SV vectors onto the
spacespannedby{v
1
,v
2
,...,v
k
}.
When a test image is acquired, its SV vector is projected onto the space
spanned by {v
1
, v
2
, ..., v
k
} and classification is performed in the feature space by
measuringtheEuclideandistanceinthisspaceandassigningthetestimagetothe
classofimagesforwhichtheminimumdistanceisachieved.
AnothermethodtoreducethefeaturespaceoftheSVfeaturevectorswas
describedbyChengetal[95].Thetrainingsetusedconsistedofasmallsampleof
face images of the same person. If
i
j
I represents the j
th
face image of person i,
then the average image is given by
¿
=
N
j
i
j
I
N
1
1
. Eigenvalues and eigenvectors are
determinedforthisaverageimageusingSVD.Theeigenvaluesare tresholdedto
disregard the values close to zero. Average eigenvectors (called feature vectors)
foralltheaveragefaceimagesarecalculated.Atestimageisthenprojectedonto
thespacespannedbytheeigenvectors.TheFrobeniusnormisusedasacriterion
todeterminewhichpersonthetestimagebelongs.
2.2.4. HiddenMarkovModelBasedMethods
Hidden Markov Models (HMM) are a set of statistical models used to
characterize the statistical properties of a signal. Rabiner [69][96], provides an
extensive and complete tutorial on HMMs. HMM are made of two interrelated
processes:
38
 anunderlying,unobservableMarkovchainwithfinitenumberofstates,astate
transitionprobabilitymatrixandaninitialstateprobabilitydistribution.
 Asetofprobabilitydensityfunctionsassociatedtoeachstate.
TheelementsofHMMare:
N, the number of states in the model. If S is the set of states, then S={ S
1
, S
2
,...,
S
N
}. The state of the model q
t
time t is given by q
t
∈S, T t ≤ ≤ 1 , where T is the
lengthoftheobservationsequence(numberofframes).
M, the number of different observation symbols. If V is the set of all possible
observationsymbols(alsocalledthecodebookofthemodel),thenV={V
1
,V
2
,...,
V
M
}.
A,thestatetransitionprobabilitymatrix;A={a
ij
}where
N j i S q S q P A
i t j t ij
≤ ≤ = = =
−
, 1 , ]  [
1
(2.30)
1 0 ≤ ≤
ij
a , N i a
N
j
ij
≤ ≤ =
¿
=
1 , 1
1
(2.31)
B,theobservationsymbolprobabilitymatrix;B=b
j
(k)where,
M k N j S q vk Q P k B
j t t j
≤ ≤ ≤ ≤ = = = 1 , 1 , ]  [ ) ( (2.32)
AndQ
t
istheobservationsymbolattimet.
π,theinitialstatedistribution;π=π
i
where
[ ] N i S q P
i i
≤ ≤ = = 1 ,
1
π (2.33)
Usingashorthandnotation,aHMMisdefinedas:
39
λ=(A,B,π).(2.34)
The above characterization corresponds to a discrete HMM, where the
observations characterized as discrete symbols chosen from a finite alphabet
V={v
1
, v
2
,...,v
M
}. In a continuous density HMM, the states are characterized by
continuous observation density functions. The most general representation of the
modelprobabilitydensityfunction(pdf)isafinitemixtureoftheform:
¿
=
≤ ≤ =
M
k
ik ik ik i
N i U O N c O b
1
1 ), , , ( ) ( µ (2.35)
where c
ik
is the mixture coefficient for the k
th
mixture in state i. Without loss of
generalityN(O,µ
ik
,U
ik
)isassumedtobeaGaussianpdfwithmeanvectorµ
ik
and
covariancematrixU
ik
.
HMM have been used extensively for speech recognition, where data is
naturally onedimensional (1D) along time axis. However, the equivalent fully
connected twodimensional HMM would lead to a very high computational
problem [97]. Attempts have been made to use multimodel representations that
lead to pseudo 2D HMM [98]. These models are currently used in character
recognition[99][100].
40
Figure2.3:ImagesamplingtechniqueforHMMrecognition
In[101],Samariaetalproposedtheuseofthe1DcontinuousHMMfor
face recognition. Assuming that each face is in an upright, frontal position,
featureswilloccurinapredictableorder.Thisorderingsuggeststheuseofatop
bottom model, where only transitions between adjacent states in a top to bottom
manner are allowed [102]. The states of the model correspond to the facial
featuresforehead,eyes,nose,mouthandchin[103].TheobservationsequenceO
isgeneratedfromanXxYimageusinganXxLsamplingwindowwithXxMpixels
overlap(Figure2.3).EachobservationvectorisablockofLlines.ThereisanM
lineoverlapbetweensuccessiveobservations.Theoverlappingallowsthefeatures
to be captured in a manner, which is independent of vertical position, while a
disjoint partitioning of the image could result in the truncation of features
occurring across block boundaries. In [104], the effect of different sampling
parametershasbeendiscussed.Withnooverlap,ifasmallheightofthesampling
41
window is used, the segmented data do not correspond to significant facial
features.However,asthewindowheightincreases,thereisahigherprobabilityof
cuttingacrossthefeatures.
Given c face images for each subject of the training set, the goal of the
training stage is to optimize the parameters λ
i
=(A,B,π) to describe ‘best’, the
observations O={ o
1
, o
2
,...,o
T
}, in the sense of maximizing P(Oλ). The general
HMMtrainingschemeisillustratedinFigure2.4andisavariantoftheKmeans
iterativeprocedureforclusteringdata:
1. The training images are collected for each subject in the database and are
sampledtogeneratetheobservationsequence.
2. A common prototype (state) model is constructed with the purpose of
specifyingthenumberofstatesintheHMMandthestatetransitionsallowed,
A(modelinitialization).
3. A set of initial parameter values using the training data and the prototype
model are computed iteratively. The goal of this stage is to find a good
estimate for the observation model probability matrix B. In [96], it has been
shown that a good initial estimates of the parameters are essential for rapid
andproperconvergence(totheglobalmaximumofthelikelihoodfunction)of
thereestimationformulas.Onthefirstcycle,thedataisuniformlysegmented,
matchedwitheachmodelstateandtheinitialmodelparametersareextracted.
Onsuccessivecycles,thesetoftrainingobservationsequenceswassegmented
intostatesviatheViterbialgorithm[50].Theresultofsegmentingeachofthe
training sequences is for each of N states, a maximum likelihood estimate of
42
the set of observations that occur within each state according to the current
model.
4. Following the Viterbi segmentation, the model parameters are reestimated
using the BaumWelch reestimation procedure. This procedure adjusts the
modelparameterssoastomaximizetheprobabilityofobservingthetraining
data,giveneachcorrespondingmodel.
5. Theresultingmodelisthencomparedtothepreviousmodel(bycomputinga
distancescorethatreflectsthestatisticalsimilarityoftheHMMs).Ifthemodel
distance score exceeds a threshold, then the old model λ is replaced by the
newmodel λ
~
,andtheoveralltrainingloopisrepeated.Ifthemodeldistance
score falls below the threshold, then model convergence is assumed and the
finalparametersaresaved.
Recognitionis carried out by matchingthe testimage against each of the
trainedmodels(Figure2.5).Inordertoachievethis,theimageisconvertedtoan
observationsequenceandthenmodellikelihoodsP(O
test
λ
i
)arecomputedforeach
λ
i
,i=1,...,c.Themodelwithhighestlikelihoodrevealstheidentityoftheunknown
face,as
[ ] )  ( max arg
1 i test c i
O P V λ
≤ ≤
= . (2.36)
43
Figure2.4:HMMtrainingscheme
The HMM based method showed significantly better performances for
facerecognitioncomparedtotheeigenfacemethod.ThisisduetofactthatHMM
based method offers a solution to facial features detection as well as face
recognition.
However the 1D continuous HMM are computationally more complex
thantheEigenfacemethod.Asolutioninreducingtherunningtimeofthismethod
is the use of discrete HMM. Extremely encouraging preliminary results (error
rates below %5) were reported in [105] when pseudo 2D HMM are used.
Furthermore, the authors suggested that Fourier representation of the images can
lead to better recognition performance as frequency and frequencyspace
representationcanleadtobetterdataseparation.
YES
NO Training
Data
Model
Initialization
StateSequence
Segmentation
Estimationof
BParameters
Model
Reestimation
Sampling
Model
Convergence
Model
Parameters
44
Figure2.5:HMMrecognitionscheme
2.2.5. NeuralNetworksApproach
Inprincipal,thepopularbackpropagation(BP) neural network [106] can
be trained to recognize face images directly. However, a simple network can be
verycomplexanddifficulttotrain.Atypicalimagerecognitionnetworkrequires
N=mxninputneurons,oneforeachofthepixelsinannxmimage.Forexample,if
theimagesare128x128,thenumberofinputsofthenetworkwouldbe16,384.In
order to reduce the complexity, Cottrell and Fleming [107] used two BP nets
(Figure2.6).Thefirstnetoperatesintheautoassociationmode[108]andextracts
Training
Set
Sampling
Probability
Computation
Probability
Computation
Probability
Computation
λ
1
λ
2
λ
N
Select
Maximum
45
features for the second net, which operates in the more common classification
mode.
The autoassociation nethas ninputs,noutputsand phiddenlayer nodes.
Usually p is much smaller than n. The network takes a face vector x as an input
and is trained to produce an output y that is a ‘best approximation’ of x. In this
way,thehiddenlayeroutputhconstitutesacompressedversionofx,orafeature
vector,andcanbeusedastheinputtoclassificationnet.
Figure2.6:Autoassociationandclassificationnetworks
Bourland and Kamp [108] showed that “under the best circumstances”,
when the sigmoidal functions at the network nodes are replaced by linear
classificationnet
hiddenlayeroutputs
Autoassociationnet
46
functions (when the network is linear), the feature vector is the same as that
produced by the KarhunenLoeve basis, or the eigenfaces. When the network is
nonlinear,thefeaturevectorcoulddeviatefromthebest.Theproblemhereturns
outtobeanapplicationofthesingularvaluedecomposition.
Specifically,supposethatforeachtrainingfacevectorx
k
(ndimensional),
k=1, 2,...,N, the outputs of the hidden layer and output layer for the auto
association net are h
k
(pdimensional, usually p<<n and p<N) and y
k
(n
dimensional),respectively,with
. ), (
2 1 k k k k
h W y x W F h = = (2.37)
Here,W
1
(pbyn)andW
2
(nbyp)arecorrespondingweightmatrixesandF(.)is
either linear or a nonlinear function, applied ‘component by component’. If we
packx
k
,y
k
andh
k
intomatrixesasintheeigenfacecase,thenaboverelationscan
berewrittenas
. ), (
2 1
H W Y X W F H = = (2.38)
Minimizingthetrainingerrorfortheautoassociationnetamountstominimizing
theFrobeniusmatrixnorm
¿
=
− = −
n
k
k k
y x Y X
1
2 2
. (2.39)
since Y=W
2
H, its rank is no more than p. Hence, in order to minimize training
error,Y=W
2
HshouldbethebestrankpapproximationtoX,whichmeans
T
p p
V U H W Λ =
2
(2.40)
47
where
U
p
=[u
1
,u
2
,...,u
p
]
T
,
V
p
=[v
1
,v
2
,...,v
p
]
T
,
ArethefirstpleftandrightsingularvectorsintheSVDofXrespectively,which
alsoarethefirstpeigenvectorsofXX
T
andX
T
X.
OnewaytoachievethisoptimumistohavealinearF(.)andtosettheweightsto
p
T
U W W = =
2 1
(2.41)
SinceU
p
containsthefirsteigenvectorsofXX
T
,wehaveforanyinputx
x U x W h
p
= =
1
(2.42)
which is the same as the feature vector in the eigenface approach. However, it
mustbenotedthattheautoassociationnet,whenitistrainedbytheBPalgorithm
withannonlinearF(.),generallycannotachievethisoptimalperformance.
In[109],thefirst50principalcomponentsoftheimagesareextractedand
reduced to 5 dimensions using an autoassociative neural network. The resulting
representationisclassifiedusingastandardmultilayerperceptron.
In a different approach, a hierarchical neural network, which is grown
automaticallyandnottrainedwithgradientdescent,wasusedforfacerecognition
byWengandHuang[110].
The most successive face recognition with neural networks is a recent
work of Lawrence et. al. [19] which combines local image sampling, a self
organizing map neural network, and a convolutional neural network. In the
48
corresponding work two different methods of representing local image samples
have been evaluated. In each method, a window is scanned over the image. The
firstmethodsimplycreatesavectorfromalocalwindowontheimageusingthe
intensity values at each point in the window. If the local window is a square of
sides 2W+1 long, centered on x
ij
, then the vector associated with this window is
simply [x
iw,jw
, x
iw,jw+1
,..., x
ij
,..., x
i+w,j+w1
, x
i+w,j+w
]. The second method creates a
representationofthelocalsamplebyformingavector out of the intensity of the
centerpixelandthedifferenceinintensitybetweenthecenterpixelandallother
pixels within the square window. Then the vector is given by [x
ij
x
iw,j w
, x
ij
x
iw,j
w+1
,...,w
ij
x
ij
,...,x
ij
x
i+w,j+w1
,x
ij
x
i+w,j+w
],w
i,j
istheweightofcenterpixelvaluex
i,j
.
The resulting representation becomes partially invariant to variations in intensity
ofthecompletesampleandthedegreeofinvariancecanbemodifiedbyadjusting
theweightw
ij
.
Figure2.7:ThediagramoftheConvolutionalNeuralNetworkSystem
imag
Classificati
on
Image
Sampling
Self
Organizin
gMap
Feature
Extractio
nLayers
Multilayer
Perceptron
Style
Classifier
ConvolutionalNeural Network
Dimensionality
Reduction
49
Theselforganizingmap,SOM,introducedbyTeudoKohonen[111,112],
is an unsupervised learning process which learns the distribution of a set of
patterns without any class information. The SOM defines a mapping from an
input space R
n
onto a topologically ordered set of nodes, usually in a lower
dimensional space. For classification a convolutional network is used, as it
achievessomedegreeofshiftanddeformationinvarianceusingthreeideas:local
receptive field, shared weights, and spatial subsampling. The diagram of the
systemisshowninFigure2.7.
2.2.6. TemplateBasedMethods
Themostdirectoftheproceduresusedforfacerecognitionisthematching
between the test images and a set of training images based on measuring the
correlation. The matching technique is based on the computation of the
normalizedcrosscorrelationcoefficientC
N
definedby,
{ } { } { }
{ } { }
G T
T G T G
N
I I
I E I E I I E
C
σ σ
−
= (2.43)
WhereI
G
isthegalleryimagewhichmustbematchedtothetestimage,I
T
.I
G
I
T
is
the pixel by pixel product, E is the expectation operator and σ is the standard
deviation over the area being matched. This normalization rescales the test and
gallery images energy distribution so that their variances and averages match.
However correlation based methods are highly sensitive to illumination, rotation
and scale changes. The best results for the reduction of the illumination changes
50
wereobtainedusingtheintensityofgradient ( )
G Y G X
I I δ δ + .Correlationmethod
is computationally expensive, so the dependency of the recognition on the
resolutionoftheimagehasbeeninvestigated.
In[16],Brunelliand Poggiodescribea correlationbasedmethodforface
recognition in which templates corresponding to facial features of relevant
significance as the eyes, nose and mouth are matched. In order to reduce
complexity,inthismethodfirstpositionsofthosefeaturesaredetected.Detection
of facial features is also subjected to a lot of studies [36, 81, 113, 114]. The
methodpurposedbyBrunelliandPoggiousesasetoftemplatestodetecttheeye
position in a new image, by looking for the maximum absolute values of the
normalized correlation coefficient of these templates at each point in the test
image. In order to handle scale variations, five eye templates at different scales
were used. However, this method is computationally expensive, and also it must
benotedthateyesofdifferentpeoplecanbemarkedlydifferent.Suchdifficulties
canbereducedbyusingahierarchicalcorrelation[115].
Afterfacialfeaturesaredetectedforatestface,theyarecomparedtothose
ofgalleryfacesreturningavectorofmatchingscores(oneperfeature)computed
throughnormalizedcrosscorrelation.
The similarity scores of different features can be integrated to obtain a
globalscore.Thiscumulativescorecanbecomputedinseveralways:choosethe
score of the most similar feature or sum the feature scores or sum the feature
scores using weights. After cumulative scores are computed, a test face is
assignedtothefaceclassforwhichthisscoreismaximized.
51
The recognition rate reported in [16] is higher than 96%. The correlation
method as described above requires a robust feature detection algorithm with
respect to variations in scale, illumination and rotations. Moreover the
computationalcomplexityofthismethodisquitehigh.
Beymer [116] extended the correlation based approach to a view based
approach for recognizing faces under varying orientations, including rotations in
depth. In the first stage three facial features (eyes and nose lobe) are detected to
determinefacepose.Althoughfeaturedetectionissimilartopreviouslydescribed
correlation method to handle rotations, templates from different views and
differentpeopleareused.Afterfaceposeisdetermined,matchingproceduretakes
placewiththecorrespondingviewofthegalleryfaces.Inthiscase,asthenumber
of model views for each person in the database is increased, computational
complexityisalsoincreased.
2.2.7. FeatureBasedMethods
Since most face recognition algorithms are minimum distance classifiers
insomesense,itisimportanttoconsidermorecarefullyhowa“distance”should
be defined. In the previous examples (eigenface, neural nets… etc.) the distance
betweenanobservedfacexandagalleryfacecisthecommonEuclideandistance
c x c x d − = ) , ( , and this distance is sometimes computed using an alternative
orthonormalbasisas c x c x d
~ ~
) , ( − = .
52
Whilesuchanapproachiseasytocompute,italsohassomeshortcomings.
When there is an affine transformation between two faces (shift and dilation),
d(x,c) will not be zero; in fact it can be quite large. As another example, when
there is local transformations and deformations (x is a “smiling” version of c),
again d(x,c)willnotbezero.Moreover,itisveryusefultostoreinformationonly
aboutthekeypointsoftheface.Featurebasedapproachescanbeasolutiontothe
aboveproblems.
Manjunath et al. [36] proposed a method that recognizes faces by using
topological graphs that are constructed from feature points obtained from Gabor
waveletdecompositionoftheface.Thismethodreducesthestoragerequirements
by storing facial feature points detected using the Gabor wavelet decomposition.
Comparison of two faces begins with the alignment of those two graphs by
matching centroids of the features. Hence, this method has some degree of
robustness to rotation in depth but only under restrictedly controlled conditions.
Moreover,illuminationchangesandoccludedfacesarenottakenintoaccount.
In the proposed method by Manjunaht et. al. [36], the identification
process utilizes the information present in a topological graph representation of
thefeaturepoints.ThefeaturepointsarerepresentedbynodesV
i
i={1,2,3,…},in
a consistent numbering technique. The information about a feature point is
contained in {S, q}, where S represents the spatial locations and q is the feature
vectordefinedby,
[ ] ) , , ( ),..., , , (
1 N i i i
y x Q y x Q q θ θ = (2.44)
53
corresponding to i
th
feature point. The vector q
i
is a set of spatial and angular
distances from feature point i to its N nearest neighbors denoted by Q
i
(x,y,θ
j
),
where j is the j
th
of the N neighbors. N
i
represents a set of neighbors. The
neighbors satisfying both maximum number N and minimum Euclidean distance
d
ij
betweentwopointsV
i
andV
j
aresaidtobeofconsequencefori
th
featurepoint.
In order to identify an input graph with a stored one, which might be
differenteitherintotalnumberoffeaturepointsorinthelocationoftherespective
faces,twocostvaluesareevaluated[36].Oneisthetopologicalcostandtheother
isasimilaritycost.Ifi,jrefertonodesintheinputgraphIand n m y x ′ ′ ′ ′ , , , refer
tonodesinthestoredgraphOthenthetwographsarematchedasfollows[36]:
1. ThecentroidsofthefeaturepointsofIandOarealigned.
2. LetN
i
bethei
th
featurepoint{V
i
}ofI.Searchforthebestfeaturepoint{V
i’
}
inOusingthecriterion
m i
N m
i i
i i
i i
S
q q
q q
S
i
′
∈ ′
′
′
′
= − = min
.
1 (2.45)
3. Aftermatching,thetotalcostiscomputedtakingintoaccountthetopologyof
thegraphs.Letnodesi andjoftheinputgraphmatchnodes i′ and j′ ofthe
stored graph and let
i
N j ∈ (i.e., V
j
is a neighbor of V
i
). Let
¦
)
¦
`
¹
¦
¹
¦
´
¦
=
′ ′
′ ′
′ ′
ij
j i
j i
ij
j j i i
d
d
d
d
, min ρ .Thetopologycostisgivenby
. 1
j j i j j i i ′ ′ ′ ′
− = Τ ρ (2.46)
54
4. Thetotalcostiscomputedas
¿ ¿ ¿
∈
′ ′ ′
Τ + =
i N j
j j i i t
i
i i
i
S C λ
1
(2.47)
where
t
λ is a scaling parameter assigning relative importance to the two cost
functions.
5. The total cost is scaled appropriately to reflect the possible difference in the
totalnumberofthefeaturepointsbetweentheinputandthestoredgraph.Ifn
i
,
n
o
are the numbers of the feature points in the input and stored graph,
respectively, then scaling factor
)
`
¹
¹
´
¦
=
I
O
O
I
f
n
n
n
n
s , max and the scaled cost is
C(I,O)=s
f
C
1
(I,O).
6. Thebestcandidateistheonewiththeleastcost,
) , ( min ) , (
*
O I C O I C
O
′ =
′
(2.48)
Therecognizedfaceistheonethathastheminimumofthecombinedcost
value. In this method [36], since comparison of two face graphs begins with
centroid alignment, occluded cases will cause a great performance decrease.
Moreover, directly using the number of feature points of faces can be result in
wrong classifications while the number of feature points can be changed due to
exteriorfactors(glasses…etc).
Anotherfeaturebasedapproachistheelasticmatchingalgorithmproposed
by Lades et al. [117], which has roots in aspectgraph matching. Let S be the
55
originaltwodimensionalimagelattice(Figure2.8).Thefacetemplateisavector
fieldbydefininganewtypeofrepresentation,
c={c
i
,i ∈S
1
} (2.49)
where S
1
is a lattice embedded in S and c
i
is a feature vector at position i. S
1
is
much coarser and smaller than S. c should contain only the most critical
information about the face, since c
i
is composed of the magnitude of the Gabor
filter responses at position i∈S
1
. The Gabor features c
i
provide multiscale edge
strengthsatpositioni.
Figure2.8:A2Dimagelattice(gridgraph)onMarilynMonroe’sface.
AnobservedfaceimageisdefinedasavectorfieldontheoriginalimagelatticeS.
x={x
j
,j∈S} (2.50)
wherex
j
isthesametypeoffeaturevectorasc
i
butdefinedonthefinegridlattice
S.
56
Intheelasticgraphmatchingapproach[23,27,117],thedistanced(c,x)is
defined through ‘best match’ between x and c. A match between x and c can be
describeduniquelythroughamappingbetweenS
1
andS.
M:S
1
→S (2.51)
Howeverwithoutrestriction,thetotalnumberofsuchmappingsisS
s
1

.
Recognizingaface,thebestmatchshouldpreservebothfeaturesandlocal
geometry. If i∈S
1
and j=M(i), then feature preserving means that x
j
is not much
different from c
i
. In addition, if i
1
and i
2
are close in S
1
then preserving local
geometry means j
1
=M(i
1
) to be close to j
2
=M(i
2
). Such a match is called elastic
sincethepreservationcan be approximate rather thanexact, the lattice S
1
can be
stretchedunevenly.
Finding the best match is based on minimizing the following energy function
[117],
[ ] . ) ( ) (
,
) (
2 1
,
2
2 1 2 1 ¿ ¿
− − − +
(
(
¸
(
¸
− =
i i i
j i
j i
j j i i
x c
x c
M E λ (2.52)
Duetothelargenumberofpossiblematches,thismightbedifficulttoobtaininan
acceptabletime.Hence,anapproximatesolutionhastobefound.Thisisachieved
intwostages:Rigidmatchinganddeformablematching[117].Inrigidmatchingc
is moved around in x like conventional template matching and at each position
x c ′ − is calculated. Here x′ is the part of x that, after being matched with c,
does not lead to any deformation of S
1
. In deformable matching, lattice S
1
is
57
stretched through random local perturbations to further reduce the energy
function.
There is still a question of face template construction. An automated
schemehasbeenshowntoworkquitewell.Theprocedureisasfollows[117]:
1. Pick the face image of an individual and generate a face vector field using
Gaborfilters.Denotethisasvectorfieldx.
2. Place a coarse grid lattice S
1
‘by hand’ on x such that vertices are close to
importantfacialfeaturessuchastheeyes.
3. CollectthefeaturevectorsatverticesofS
1
toformthetemplatec.
4. Pick another face image (of a different person) and generate a face vector
field,denotedasy.
5. Performtemplatematchingbetweenthecwithyusingrigidmatching.
6. Whenabestmatchisfound,collectthefeaturevectorsatverticesofS
1
toform
thetemplateofy.
7. Repeatsteps46.
Malsburg et al. [23, 27] developed a system based on graph matching
approach on with several major modifications. First, they suggested using not
only magnitude but also phase information of the Gabor filter responses. Due to
phaserotation,vectorstakenfromimagepointsonlyafewpixelsapartfromeach
otherhaveverydifferentcoefficients,althoughrepresentingalmostthesamelocal
feature.Therefore,phaseshouldeitherbeignoredorcompensatedforitsvariation
explicitly.
58
As mentioned in [23], using phase information has two potential
advantages,
1. Phase information is required to discriminate between patterns with similar
magnitudes.
2. Sincephasevariesquicklywithlocationprovidesameansforaccuratevector
localizationinanimage.
Malsburg et al. proposed to compensate phase shifts by estimating small
relative displacements between two feature vectors to use a phase sensitive
similarityfunction[23].
Second modification is the use of bunch graphs (Figure 2.9, 2.10). The
face bunch graph is able to represent a wide variety of faces, which allows
matching on face images of previously unseen individuals. These improvements
makeitpossibletoextractanimagegraphfromanewfaceimageinonematching
process. Computational efficiency, and ability to deal with different poses
explicitlyarethemajoradvantagesofthesystemin[23],comparedto[117].
59
(a) (b)
Figure 2.9: A brunch graph from the a) artistic point of view (``Unfinished
Portrait''byTullioPericoli(1985)),b)scientificpointofview
Figure2.10:Bunchgraphmatchedtoaface.
60
2.2.8. CurrentStateoftheArt
Comparing recognition results of different face recognition systems is a
complextask,becausegenerallyexperimentsarecarriedoutondifferentdatasets.
However, the following 5 questions can help while reviewing different face
recognitionmethods:
1. Wereexpression,headorientationandlightingconditionscontrolled?
2. Where subjects allowed wearing glasses and having beards or other facial
masks?
3. Was the subject sample balanced? Where gender, age and ethnic origin
spannedevenly?
4. How many subjects there in database? How many images were used for
trainingandtesting?
5. Werethefacefeatureslocatedmanually?
Answering the above questions contributes to building better description
oftheconstraintswithineach approach operated. This helpsto make a more fair
comparisonbetweendifferentsetofexperimentalresults.Howeverthemostdirect
and reliable comparison between different approaches is obtained by
experimentingwiththesamedatabase.
By 1993, there were several algorithms claiming to have accurate
performanceinminimallyconsternatedenvironments.Forabettercomparisonof
thosealgorithmsDARPAandArmyResearchLaboratoryestablishedtheFERET
program (see section 4.1.4) with the goals of both evaluating their performance
andencouragingadvancesinthetechnology[83].
61
Today,therearethreealgorithmsthathavedemonstratedthehighestlevel
of recognition accuracy on large databases (1196 people or more) under double
blind testing conditions. These are the algorithms from University of Southern
California(USC),UniversityofMaryland(UMD),andMITMediaLab[83,128].
The MIT, Rockefeller and UMD algorithms all use a version of the eigenface
transform followed by discriminative modeling (section 2.2.3.1.2). However the
USC system uses a very different approach. It begins by computing Gabor
Wavelettransformoftheimageanddoesthecomparisonbetweenimagesusinga
graph matching algorithm (section 2.2.7). Only two of these algorithms, from
USC and MIT are capable of both minimally constrained detection and
recognition; the others require approximate eye locations to operate. Algorithm
developedatRockefellerUniversity,wasanearlycontender,droppedfromtesting
toformacommercialenterprise.TheMITandUSCalgorithmshavealsobecome
thebasisforcommercialsystems.InFERETtesting,theperformanceofthefour
algorithmsissimilarenoughthatitisdifficultorimpossibletomake meaningful
distinctionsbetweenthem.ResultsofFERETtestwillbegivenatsection4.1.4.
62
CHAPTER3
FACEREPRESENTATIONANDMATCHINGUSING
GABORWAVELETS
Usinglocalfeaturesisamatureapproachtofacerecognitionproblem[11,
14, 17, 18, 23, 35, 59, 36, 62]. One of the main motivations of feature based
methods is due to: representation of the face image in a very compact way and
hence lowering the memory needs. This fact especially gains importance when
thereisahugefacedatabase.Featurebasedmethodsarebasedonfindingfiducial
points(orlocalareas)onafaceandrepresentingcorrespondinginformationinan
efficientway.However,choosingsuitablefeaturelocationsandthecorresponding
values are extremely critical for the performance of a recognition system.
Searching nature for finding an answer has lead researchers to examine the
behaviorofhumanvisualsystem(HVS).
Physiological studies found simple cells, in human visual cortex, that are
selectively tuned to orientation as well as to spatial frequency. It was suggested
that the response of a simple cell could be approximated by 2 D Gabor filters
63
[119].Overthelastcoupleofyears,ithasbeenshownthatusingGaborfiltersas
thefrontendofanautomatedfacerecognitionsystemcouldbehighlysuccessful
[23,36,35,27,32].Oneofthemostsuccessfulfacerecognitionmethodisbased
ongraphmatchingofcoefficientswhichareobtainedfromGaborfilterresponses
[83, 23]. However, such graph matching algorithm methods have some
disadvantages due to their matching complexity, manual localization of training
graphs, and overall execution time. They use general face structure to generate
graphs and such an approach brings the question of how efficient the feature
represents the special facial characteristics of each individual. A novel Gabor
basedmethodmayovercomethosedisadvantages.
2 D Gabor functions are similar to enhancing edge contours, as well as
valleys and ridge contours of the image. This corresponds to enhancing eye,
mouth,noseedges,whicharesupposedtobethemainimportantpointsonaface.
Moreover,such an approach also enhances moles, dimples, scars, etc. Hence, by
using such enhanced points as feature locations, a feature map for each facial
image can be obtained and each face can be represented with its own
characteristicswithoutanyinitialconstrains.Havingfeaturemapsspecializedfor
eachfacemakesitpossibletokeepoverallfaceinformationwhileenhancinglocal
characteristics.
Inthisthesis,anovelmethodisproposedbasedonselectingpeaks(high
energized points) of the Gabor wavelet responses as feature points, instead of
usingpredefinedgraphnodesasinelasticgraphmatching[23]whichreducesthe
64
representative capability of Gabor wavelets. Feature vectors are constructed by
samplingGaborwavelettransformcoefficientsatfeaturepoints.
In the following sections, the details about the Gabor wavelet transform
will be presented while giving the reasons of using it for face recognition. Then
theproposedalgorithmwillbeexplained,explicitly.
3.1. FaceRepresentationUsingGaborWavelets
3.1.1. GaborWavelets
Sincethediscoveryofcrystallineorganizationoftheprimaryvisualcortex
in mammalian brains thirty years ago by Hubel and Wiesel [121], an enormous
amount of experimental and theoretical research has greatly advanced our
understanding of this area and the response properties of its cells. On the
theoretical side, an important insight has been advanced by Marcelja [122] and
Daugman [118, 119] that simple cells in the visual cortex can be modeled by
Gabor functions. The Gabor functions proposed by Daugman are local spatial
bandpass filters that achieve the theoretical limit for conjoint resolution of
informationinthe2Dspatialand2DFourierdomains.
Gabor functions first proposed by Dennis Gabor as a tool for signal
detectioninnoise.Gabor[120]showedthatthereexistsa“quantumprinciple”for
information;theconjointtimefrequency domain for 1D signals must necessarily
bequantizedsothatnosignalorfiltercanoccupylessthancertainminimalarea
in it. However, there is a trade off between time resolution and frequency
65
resolution. Gabor discovered that Gaussian modulated complex exponentials
provide the best trade off. For such a case, the original Gabor elementary
functions are generated with a fixed Gaussian, while the frequency of the
modulatingwavevaries.
Gabor filters, rediscovered and generalized to 2D, are now being used
extensively in various computer vision applications. Daugman [118, 119]
generalized the Gabor function to the following 2D form in order to model the
receptivefieldsoftheorientationselectivesimplecells:
(
(
¸
(
¸
− = Ψ
− −
2 2
2
2
2
2
2
2
) (
σ
σ
σ
e e e
k
x
x k j
x k
i
i
i
i
(3.1)
Eachψ
i
isaplanewavecharacterizedbythevector
i
k
envelopedbyaGaussian
function,whereσisthestandarddeviationofthisGaussian.Thecenterfrequency
ofi
th
filterisgivenbythecharacteristicwavevector,


.

\

=


.

\

=
µ
µ
θ
θ
sin
cos
v
v
iy
ix
i
k
k
k
k
k
, (3.2)
havingascaleandorientationgivenby(k
v
,θ
µ
).Thefirstterminthebrackets(3.1)
determinestheoscillatorypartofthekernel,andthesecondtermcompensatesfor
the DC value of the kernel. Subtracting the DC response, Gabor filters becomes
insensitivetotheoveralllevelofillumination.
66
Recent neurophysiological evidence suggests that the spatial structure of
the receptive fields of simple cells having different sizes is virtually invariant.
Daugman[118]and others [121, 122] have proposed that an ensemble of simple
cells is best modeled as a family of 2D Gabor wavelets sampling the frequency
domain in a logpolar manner. This class is equivalent to a family of affine
coherentstatesgeneratedbyrotationanddilation.Thedecompositionofanimage
Iintothesestatesiscalledthewavelettransformoftheimage:
}
′ ′ − Ψ ′ = x d x x x I x R
i i
) ( ) ( ) ( (3.3)
Where ) (x I
istheimageintensityvalueat x
.
(a) (b)
Figure 3.1: (a) an ensemble of Gabor wavelets, (b) their coverage of spatial
frequencyplane
67
Each member of this family of Gabor wavelets models the spatial
receptive field structure of a simple cell in the primary visual cortex. The Gabor
decompositioncanbeconsideredasadirectionalmicroscopewithanorientation
and scaling sensitivity. Due to the endinhibition property of these cells, they
response to short lines, line endings and sharp changes in curvature. Since such
curvescorrespondtosome lowlevel salient features in animage, these cells can
beassumedtoformalowlevelfeaturemapoftheintensityimage(Figure3.2).
(a)(b)(c)(d)(e)
Figure 3.2: Small set of features can recognize faces uniquely, and receptive
fields that are matched to the local features of the face (a) mouth, (b) nose, (c)
eyebrow,(d)jawline,(e)cheekbone.
SincetheGaborwavelet transform is introduced to computer vision area,
oneofthemostimportantapplication areas for 2D Gabor wavelet representation
is face recognition (see Section 2.2.4). In a U.S. government activity (FERET
program) to find the best face recognition system, a system based on Gabor
wavelet representation of the face image performed among other systems on
several tests. Although the recognition performance of this system shows
qualitative similarities to that of humans by now means, it still leaves plenty of
roomforimprovement.
Utilizationofthe2DGaborwaveletrepresentationincomputervisionwas
pioneered by Daugman in the 1980s. More recently, B. S. Manjunath [36] et al
68
hasdevelopedafacerecognitionsystembasedonthisrepresentation.Afterwards,
studiesforGaborwaveletrepresentationinthefieldoffacerecognitionbyusing
is continued with appending dynamic link architecture [117] and elastic graph
matching[23]totheprevioussystem.
3.1.2. 2DGaborWaveletRepresentationofFaces
Sincefacerecognitionisnotadifficulttaskforhumanbeings,selectionof
biologically motivated Gabor filters is well suited to this problem. Gabor filters,
modeling the responses of simple cells in the primary visual cortex, are simply
planewavesrestrictedbyaGaussianenvelopefunction(3.1)[12].
Figure3.3:Gaborfilterscorrespondto5spatialfrequencyand8
orientation.
Spatialfrequencyvaries
Orientationvaries
69
AnimagecanberepresentedbytheGaborwavelettransformallowingthe
description of both the spatial frequency structure and spatial relations.
Convolving the image with complex Gabor filters with 5 spatial frequency (v =
0,…,4) and 8 orientation (µ = 0,…,7) captures the whole frequency spectrum,
bothamplitudeandphase(Figure3.3).InFigure3.4,aninputfaceimageandthe
amplitudeoftheGaborfilterresponsesareshown.
(a)
(b)
Figure3.4:ExampleofafacialimageresponsetoaboveGaborfilters,a)original
faceimage(fromStirlingdatabase),andb)filterresponses.
Spatialfrequency
varies
Orientationvaries
70
One of the techniques used in the literature for Gabor based face
recognition is based on using the response of a grid representing the facial
topographyforcodingtheface.[23,25,26,35].Insteadofusingthegraphnodes,
highenergized points can be used in comparisons which forms the basis of this
work. This approach not only reduces computational complexity, but also
improvestheperformanceinthepresenceofocclusions.
3.1.3. Featureextraction
Featureextraction algorithm for the proposed method has two main steps
(Figure3.6):(1)Featurepointlocalization,(2)Featurevectorcomputation.
3.1.3.1. Featurepointlocalization
In this step, feature vectors are extracted from points with high
information content on the face image. In most featurebased methods, facial
featuresareassumedtobetheeyes,noseandmouth.However,wedonotfixthe
locations and also the number of feature points in this work. The number of
feature vectors and their locations can vary in order to better represent diverse
facial characteristics of different faces, such as dimples, moles, etc., which are
alsothefeaturesthatpeoplemightuseforrecognizingfaces(Figure3.5).
71
Figure 3.5: Facial feature points found as the highenergized points of Gabor
waveletresponses.
FromtheresponsesofthefaceimagetoGaborfilters,peaksarefoundby
searchingthelocationsina windowW
0
ofsizeWxWbythefollowingprocedure:
Afeaturepointislocatedat(x
0
,y
0
),if
)) , ( ( max ) , (
0
) , (
0 0
y x R y x R
j
W y x
j
∈
= (3.4)
¿¿
= =
>
1 2
1 1 2 1
0 0
) , (
1
) , (
N
x
N
y
j j
y x R
N N
y x R ,(3.5)
j=1,…,40
whereR
j
istheresponseofthefaceimagetothej
th
Gaborfilter(3.3).N
1
N
2
isthe
sizeoffaceimage,thecenterofthewindow,W
0
isat(x
0
,y
0
).WindowsizeW is
one of the important parameters of proposed algorithm, and it must be chosen
small enough to capture the important features and large enough to avoid
redundancy. Equation (3.5) is applied in order not to get stuck on a local
maximum,insteadoffindingthepeaksoftheresponses.Inourexperimentsa9x9
windowisusedtosearchfeaturepointsonGaborfilterresponses.Afeaturemap
isconstructedforthefacebyapplyingaboveprocesstoeachof40Gaborfilters.
72
Figure3.6:Flowchartofthefeatureextractionstageofthefacialimages.
Gaborwavelettransformcoefficients
1234567404142
Coordinatesof
thefeaturepoint
end
x
i
y
i
start
GWT
Findfeaturepoints
Feature
vectors
Image
73
3.1.3.2. Featurevectorgeneration
Feature vectors are generated at the feature points as a composition of
Gabor wavelet transform coefficients. k
th
feature vector of i
th
reference face is
definedas,
{ } 40 ,....., 1 ) , ( , ,
, ,
= = j y x R y x v
k k j i k k k i
.(3.7)
While there are 40 Gabor filters, feature vectors have 42 components. The first
two components represent the location of that feature point by storing (x, y)
coordinates.Sincewehavenootherinformationaboutthelocationsofthefeature
vectors, the first two components of feature vectors are very important during
matching(comparison)process.Theremaining40componentsarethesamplesof
theGaborfilterresponsesatthatpoint.
Although one may use some edge information for feature point selection,
here it is important to construct feature vectors as the coefficients of Gabor
wavelettransform.Featurevectors,asthesamplesofGaborwavelettransformat
featurepoints,allowrepresenting both the spatial frequency structure and spatial
relationsofthelocalimageregionaroundthecorrespondingfeaturepoint.
74
3.2. MatchingProcedure
3.2.1. SimilarityCalculation
Inordertomeasure the similarity of two complex valued feature vectors,
followingsimilarityfunctionisusedwhichignoresthephase:
,l=3,…,42. (3.8)
S
i
(k,j) represents the similarity of j
th
feature vector of the test face, (v
i,j
), to k
th
featurevectorofi
th
referenceface,(v
i,k
), where listhenumberofvectorelements.
Proposedsimilaritymeasurebetweentwovectorssatisfiesfollowingconstrains:
1 0 < <
i
S ,
andifi
th
galleryfaceimageisusedalsoasthetestimage,
1 ) , ( = j j S
i
.
Thelocationinformationisnotusedforvectorsimilaritycalculation,butonlythe
magnitudesofthewaveletcoefficientsaretakeplaceat(3.8).Itmustbeclarified
thatthesimilarityfunction(3.8)isonlyonecomponentoftheproposedmatching
procedure (Section 3.2.2). Location information of feature vectors will also be
usedduringmatching.
Equation (3.8) is a very common similarity measure between feature
vectors,containingGaborwavelet transform coefficients [36], but sometimes we
might have small variations [23, 27]. In [23] similarity function at (3.8) is used
¿ ¿
¿
=
l
j t
l
k i
l
j t k i
i
l v l v
l v l v
j k S
2
,
2
,
, ,
) ( ) (
) ( ) (
) , (
75
with complex valued coefficients and an additional phase compensating term. In
theearlyexperimentsitisobservedthatsmallspatialdisplacementscausechange
in complex valued coefficients due to phase rotation. Then phase can either be
ignored or compensated as in [23]. Although phase compensated similarity
function is found to increase recognition performance significantly [23,27],
similarityfunctionwithoutphaseischosentoavoidcomputationalcomplexity.
3.2.2. Facecomparison
After feature vectors are constructed from the test image, they are
compared to the feature vectors of each reference image in the database. This
comparison stage takes place in two steps. In the first step, we eliminate the
featurevectorsofthereferenceimageswhicharenotcloseenoughtothefeature
vectors of the test image in terms of location and similarity. Only the feature
vectorsthatfitthefollowingtwocriterionsareexaminedinthenextstep.
1. ( ) ( )
1
2 2
th y y x x
t r t r
< − + − ,
whereth
1
istheapproximateradiusoftheareathatcontainseithereye,mouth
or nose, (x
r
, y
r
) and (x
t
, y
t
) represents the location of a feature point on a
reference face and test face respectively. Comparing the distances between
thecoordinatesofthefeaturepointssimplyavoidsthe matching of a feature
point located around the eye with a point of a reference facial image that is
located around the mouth. After such a localization, we may disregard the
location information in the second step. Moreover here topology of face is
76
alsoexaminedtousecorrespondinginformationatthefinalmatchingbyonly
lettingfeaturepointsthatarematcheachotherinatopologicalmanner.
2. S
i
(k,j)>th
2,
Similarityoftwofeaturevectorsisgreaterthanth2,whereth
2
ischosenasthe
standarddeviationofsimilaritiesofallfeaturevectorsinthereferencegallery
andthesimilarityoftwovectorsiscomputedbyEquation(3.8).
Although this thresholding seems to be done very roughly it reduces the
number of feature vectors at the gallery faces and increases the speed of
algorithmatthefollowingsteps.
Bychangingth
1
andth
2
onecancontrolthe topology and vectorsimilarity costs.
In other words, increasing th
1
gives more area for searching the feature points
withsimilaritieslargerthanth
2
.Thiscanbeusefulwhenthelocationsoffeatures
changedduetosomereasons,suchasdifferentexpression.However,ifth
1
istoo
large then the face topology information could be totally wrong. By keeping th
1
constant, increasing th
2
in a large extent will result in finding no match, and
conversely decreasing th
2
could result in a redundant feature vector that will
increasethecomputationalcost.However,smallvariationsinth
1
andth
2
willnot
effecttheperformanceofthemethod.Onecanchoosethesevaluesstatedatabove
steps1and2.
As a result of the first step, we are left with N
k
feature vectors of the
reference face and N
kj
of them are known to be close enough in terms of both
location similarity to the j
th
feature vector of the test face. Hence, possible
77
matches are determined to each of the feature vector on the test face from the
featurevectorsofgalleryfaces.
After elimination of feature vectors in the gallery, at the end of the first
step, there could be no remaining feature vector from the gallery faces to be
matchedtoafeaturevectorofthetestface.Whensuchacaseoccurs,thatfeature
vectoroftestfaceisignoredandmatchingprocedureiscontinuedwithothers.
In the second step, Equation (3.9) is applied as a second elimination of
featurevectorsofthereferenceface in orderto guarantee surviving at most only
one feature vector of a reference face to be matched with a feature vector of the
testface:
)) , ( ( max
,
,
j l S Sim
i
N l
j i
j k
∈
= ,(3.9)
InEquation(3.9)Sim
i,j
givesthesimilarityofthe i
th
referencefacetothetestface
basedonthe j
th
featurevector.
Eventually,theoverallsimilarityofeachreferencefaceiscomputedasthe
meanoffeaturevectorsimilaritiesthatpassedthetwosteps(3.10).
{ }
j i i
Sim mean OS
,
= .(3.10)
OS
i
representstheoverallsimilarityoftestfacetoi
th
referenceface,anditsvalues
changefrom0to1.Ifi
th
galleryimageusedalsoasthetestimage,thenOS
i
will
beequalto1.
Although OS gives a good measure for face similarity, it can further be
improvedbyconsideringthenumberoffeaturevectors.Itshouldbenotedthatthe
numberoffeaturepointsforany of two face image is not equal even for images
78
fromthesameindividual.Thereasonisduetothehugevariationofthenumberof
feature vectors (e.g. glasses/no glasses, illumination changes, etc.) from face to
face and also for the same face under different conditions. Since the number of
featurevectorsthatgivetheoverallsimilarity(3.10)isnotdeterminedbyOS,itis
not very meaningful to use it alone. As an example, OS=0.85 as the mean of 20
vectorsimilaritiesshouldbemorevaluablethanOS=0.95asthemeanof2vector
similarities. Moreover, numbers of matched feature vectors of each gallery face
gives an information about the topological matching. In order to emphasize the
informationcontainedbythenumberofmatchedfeaturepoints,anewparameter
C is computed, by only counting the number of feature vectors of a gallery face
thathavehighsimilaritytoafeaturevectorofthetestface:
¿
= =
j
j l j i i
Sim Sim C )) max( (
, ,
δ ,(3.11)
¿
=
l
t l
N C
,(3.12)
l=1,…,numberofreferencefaces
where N
t
is the total number of feature vectors of test face, and δ(.) is the Delta
Diracfunction.
Foreachfeaturevectoroftestfacewesortreferencefacesbymeansofsimilarity,
andcountthenumberoftimesthateachreferencefacegetsthefirstplace(3.11).
The simulation results are shown in Figure 3.7. In this representation for
ideal results, the corresponding figure must be a stair case function whose
horizontalwidthincludesthetestimageforeachpersoninthegalleryandvertical
level is the correct result. Recognition results achieved by both comparing only
79
similarities(Figure3.7a)andcomparingonlynumberoftimeshavingmaximum
similarfeaturevector(Figure3.7b)arepresentedinFigure3.7.
Although results achieved in Figure 3.7b are better, a problem occurs
when a test face has more feature vectors (due to illumination changes, having
glasses…etc.)thanthecorrespondingreferenceface.Wecanexplainthisbetterby
an example: Assume that there are two reference faces i and j, and a test face,
havingthenumberoffeaturevectorsN
i
,N
j
andN
t
respectively.i
th
referenceface
reaches the maximum similarity at its all feature points, (C
i
=N
i
). Having more
feature points j
th
reference face gets maximum similarity at more points than i
th
does, (C
i
< C
j
<N
t
) and( C
j
<N
j
). In the case, one can use the number of feature
vectorsofreferencefacesasanadditionalcriteria.
Inordertofindthebestmatch,theweightedsumofthesetworesultscan
be used. This improves the performance, in case different false matches are
obtainedbyusingindividualresultsofOSandC.However,successivematching
canbedonebyusingtheexpectation that ifi
th
reference face is correctit should
gethighvaluesforbothC
i
andOS
i
.
Hence,thebestcandidatematchissearchedtomaximizethefollowingfunction,


.

\

+ =
i
i
i i
N
C
OS FSF β α . (3.13)
where C
i
isthenumberoffeaturevectorsofi
th
referencefacethathavemaximum
similaritytoafeaturevectoroftestface,N
i
isthenumberoffeaturevectorsofi
th
referenceimage.Ifi
th
galleryimageusedalsoasthetestimageFSF
i
willbeequal
tounity.
80
Although for the corresponding database %100 recognition result is
achieved (Figure 3.7c), it is seen that for larger databases counting only the
maximum similar feature vectors of gallery faces as in (3.11) becomes useless.
Instead, Equation (3.11) can be generalized for large databases by counting the
number of feature vectors of each gallery face which is in the first %10 of
similarityrank,
¿
= =
j
j l j i i
Sim rank Sim C )) ( (
, 10 ,
δ , (3.14)
l=1,…,numberofreferencefaces .
81
Figure 3.7: Test faces (1624) vs. matching face from gallery (148) by
comparing, a) only similarities, b) only number of maximum similar feature
vectors,c)bothsimilaritiesandnumberofmaximumsimilarfeaturevectors.
100 200 300 400 500 600
5
10
15
20
25
30
35
40
45
100 200 300 400 500 600
5
10
15
20
25
30
35
40
45
(a)
(b)
(c)
100 200 300 400 500 600
5
10
15
20
25
30
35
40
45
82
CHAPTER4
RESULTS
4.1. SimulationSetup
The proposed method is tested on four different face databases: Stirling
[28], Purdue [29], ORL [30] and FERET [128] face databases. For each dataset,
onefrontalfaceimagewithneutralposeofeachindividualisplacedinthegallery
andtheothersareplacedintheprobe(test)set.Eachdatabasemorethanoneface
image with different conditions (expression, illumination…etc.), of each
individual.Itmustbenotedthatnoneofthegalleryimagesisfoundintheprobe
set. Each probe image is matched against the data in the gallery, and the ranked
matchesareanalyzed.
Each of the above mentioned databases have different characteristics to
test the proposed algorithm. Stirling face database [28] is used to measure the
robustness against the expression changes. Purdue face database [29] contains
occluded cases (some obstacles on face). Head pose variations and images that
83
were taken at different times have been included in ORL face database [30].
Finally FERET face database [128] is used. FERET is not only a large face
database (1196 individuals); but also challenging benchmark of all face
recognitionsystems.Thisgivesusanopportunitytocomparetheperformanceof
our face recognition method by the others using a standardized dataset and test
procedure.
Inthefollowingsection,detailedinformationforthosefourfacedatabases
and their corresponding performance results for the proposed face recognition
method are given with the comparisons with some major face recognition
methods.
(a) (b)
Figure4.1:Examplesofdifferent facial expressions of two people fromStirling
database,a)galleryfaces,b)probefaces.
4.1.1. ResultsforUniversityofStirlingfacedatabase
UniversityofStirlingfacedatabasecontainsgrayscalefacialimagesof35
people (18 female, 17 male), in 3 poses and 3 expressions with constant
illumination conditions (Figure 4.1). In the experiments, we used three frontal
images with different expressions for each of 35 persons. The neutral views are
placed in the gallery, whereas two other expressions (smiling and speaking) are
84
used as probes. In this database the proposed algorithm gives 100% correct
recognition.Wedidnotreachanyreportedperformanceresultsonthisdatabase.
4.1.2. ResultsforPurdueUniversityfacedatabase
PurdueUniversityfacedatabasewascreatedbyAlexMartinezandRobert
Benavente in the Computer Vision Center (CVC) at the U.A.B. It contains over
4,000colorimagescorrespondingto126people'sfaces(70menand56women).
Imagesconsistoffrontalviewfaceswithdifferentfacialexpressions,illumination
conditions, and occlusions (sunglasses and scarf). The pictures were taken at the
CVC under strictly controlled conditions. No restrictions on wear (clothes,
glasses,etc.);makeup,hairstyle,etc.wereimposedtoparticipants.
We have used images of first 58 (18 female, 40 male) individuals in our
test setup due to some availability problems of database. For each individual,
facialimagewithneutralexpressionisplacedinthegalleryandtheremaining12
with different conditions are used as probe. An example of 13 different facial
images for a typical person from Purdue database is shown in Figure 4.2. After
simulations,itisobservedthat100%ofpeoplearecorrectlyclassified.Evennone
of the occluded cases are included, recognition performances of eigenface and a
related eigenhills method for this database is reported as 82.3, 89.4 respectively
[31]. Eigenhills and eigenfaces methods were highly effected by illumination
changes, however proposed method is more robust to illumination changes as a
property of Gabor wavelets (Section 3.1.1). Moreover as a result of using local
distinctfeatures,insteadofafacetemplateorlocalfeaturesboundedbyagraph,
85
proposed method gives a high performance result on occluded cases. In the
proposed algorithm the facial features are compared locally, instead of using a
generalstructure,henceitallowsustomakeadecisionfromthepartsoftheface.
Forexample,whentherearesunglasses,thealgorithmcomparesfacesintermsof
mouth,noseandanyotherfeaturesratherthaneyes.
Since the simulation results on this database are encouraging, we pass to
ORL database on which simulation results of other wellkonwn face recognition
methodsarereported.
Figure4.2:ExampleofdifferentfacialimagesforapersonfromPurduedatabase;
firstimageisusedforgalleryandtherest12areforprobeset.
Method Recognitionrate(%)
Eigenface[20] 82.3
Eigenhills[31] 89.4
Proposed face recognition
methodusingGWT
100.0
Table 4.1: Recognition performances of eigenface, eigenhills and proposed
methodonthePurduefacedatabase.
86
Figure4.3:Wholesetoffaceimagesof40individuals10imagesperperson.
87
4.1.3. ResultsforTheOlivettiandOracleResearchLaboratory
(ORL)facedatabase
TheOlivettiandOracleResearchLaboratory(ORL)facedatabaseisalso
usedinordertotestourmethodinthepresenceofheadposevariations.Thereare
tendifferentimagesofeachof40distinctsubjects.Forsomesubjects,theimages
were taken at different times, varying lighting, facial expressions (open / closed
eyes, smiling / not smiling), facial details (glasses / no glasses) and head pose
(tilting and rotation up to 20 degrees). All the images were taken against a dark
homogeneous background. Figure 4.3 shows the whole set of 40 individuals 10
imagesperperson from the ORL database. We took the first images for each 40
individuals as reference and the rest is used for testing purposes. It is observed
that the performance of the method decreased slightly due to the orientation in
depth of the head. The proposed method achieved 95.25% correct classification
with ORL database. In Figure 4.4 erroneously classified faces of ORL database
are presented. These results were expected, since locations are important in
featurevectorscomparison.Locationsofthefacialfeatures(eyes,nose,mouth….)
arequitechangingwithrotationofthehead.
Recognition performances on ORL database of wellknown methods are
tabulated in Table 4.2. Although, the reported recognition rates are better for
convolutionalneuralnetworkandlinebasedmethods,itmustbenotedthatthese
two are using more than one facial image for each individual in training (Figure
4.5). The proposed method achieves %95,25 correct recognition rate by using
onlyonereferencefacialimageforeachindividual.
88
Method RecognitionPerformance(%)
Eigenface[20] 80.0
Elasticgraphmatching[23] 80.0
Neuralnetwork[19] 96.2
Linebased[33] 97.7
Proposedfacerecognition
methodusingGWT
95.25
Table4.2:PerformanceresultsofwellknownalgorithmsonORLdatabase.
(a) (b)
Figure4.4:ExamplesofmisclassifiedfacesofORLdatabase,a)reference faces
fortwoindividuals,b)misclassifiedtestfaces.
89
Figure 4.5: Example of different facial images for a person from ORL database
that are placed at training or probe sets by neural network and line based
algorihms.
4.1.4. ResultsforFERET
Until recently, there did not exist a common face recognition technology
evaluation protocol which includes large databases and standard evaluation
methods. The Face Recognition Technology (FERET) program has addressed
bothissuesthroughtheFERETdatabaseoffacialimagesandtheestablishmentof
theFERETtests.Uptodate,14126imagesfrom1199individualsareincludedin
thedatabase.
PrimaryobjectivesofFERETtestcanbestatedas:
1. assessthestateoftheart
2. identifyfutureareasofresearch
3. measurealgorithmperformance
The FERET database has made it possible for researchers to develop
algorithms on a common database and to report results to the literature based on
this database. The results that exist in the literature do not provide a direct
comparison between algorithms, since each researcher reports results using
facesfortraining facesasprobes
90
different assumptions, scoring methods, and images. The independently
administeredFERETtestallowsforadirectquantitativeassessmentoftherelative
strengthsandweaknessesofdifferentapproaches.
The testing protocol is based on a set of design principals. Stating the
design principle allows one to assess how appropriate the FERET test is for a
particularfacerecognitionalgorithm.
There are two sets of images, gallery and probe. Thegallery is the set of
known individuals. An image of an unknown face presented to the algorithm is
calledaprobe,andthecollectionofprobesiscalledthe probeset.
Themaintestdesignprincipalsare:
1. Algorithmscannotbetrainedduringtesting,
2. Eachfacialimageistreatedasauniqueface,
3. The similarity score between probe and a gallery image is a function of only
thosetwoimages.
Withtheaboveprinciples,asimilaritybetweeneachprobeandeachgalleryimage
mustbegeneratedforperformanceevaluation.
Although,atimelimit for a complete test is assigned as three days while
using less than 10 UNIX workstations, time or number of workstations is not
recorded for any of the algorithms. Hence, FERET program aims to encourage
algorithmdevelopment,notcodeoptimization.
The basic models for evaluating the performance of an algorithm are
closed and open universes. In a closed universe, every probe is in the gallery.
Whereas, in an open universe some probes are not in the gallery. The open
91
universemodelsverification applications. On the other hand,the closed universe
model allows oneto ask how good an algorithm is at identifying a probe image;
thequestionisnotalways“Isthetopmatchiscorrect?”but“Isthecorrectanswer
inthetopnmatches?”.Suchanapproachletsonetoknowhowmanyimageshave
tobeexaminedtogetadesiredlevelofperformance.Theperformancestaticsare
reported as cumulative match scores and also the rank is plotted along the
horizontal axis, and the vertical axis is the percentage of correct matches. The
performanceofthealgorithmisevaluatedondifferentprobecategories.
Notethatalltheabovetestsusedasinglegallerycontaining1196images.
The Duplicate I probe images were obtained anywhere between one minute and
1031 days after their respective gallery matches. The harder Duplicate II probe
images are a strict subset of the Duplicate I images; they are those taken only at
least 18 months after their gallery entries. For assessment of the effect of facial
expression, images of subjects with alternate facial expressions (fafb) have been
used.Thereisusuallyonlyafewsecondsbetweenthecaptureofthegalleryprobe
pairs. Finally, some images are obtained under different illumination conditions
and gathered under another set (fafc). All these sets are tabulated in Table 4.3.
Table4.4showsdifferentsimulationsperformedusingthesesets.
All of the latest results of FERET test is presented in Table 4.5;
Cumulative match scores for each of 4 probe sets (dupI, dupII, fafb, fafc) of 14
algorithms that attend to the FERET test and proposed face recognition method
are presented. On frontal images taken the same day, typical first choice
recognition performance is 95% accuracy. For images taken with a different
92
camera and lighting, typical performance drops to 80% accuracy for the first
choice recognition. For images taken one year later, the typical accuracy is
approximately50%(Table4.5).Notethat,even50%accuracyis600timesbetter
thanarandomselectionfrom1196faces.
Study Goal
Dup1 DuplicateI
Dup2 DuplicateII
fafb expression
fafc illumination
Table4.3:Probesetsandtheirgoalofevaluation.
Numberoffaces
EvaluationTask RecognizedNames
Gallery ProbeSet
Agingofsubjects DuplicateI 1196 722
Agingofsubjects DuplicateII 1196 234
FacialExpression fafb 1196 1195
Illumination fafc 1196 194
Table4.4:ProbesetsforFERETperformanceevaluation.
93
Arl_corisanormalizedcorrelationbasedalgorithm[127].Fornormalized
correlation,theimageswere(1)translated,rotated,andscaledsothatthecenterof
the eyes were placed on specific pixels and (2) faces were masked to remove
background and hair. Arl_ef is a principal components analysis (PCA) based
algorithm [20]. These algorithms are developed by U.S. Army Research
Laboratoryandtheyprovideaperformancebaseline.Intheimplementationofthe
PCAbased algorithm, all images were (1) translated, rotated, and scaled so that
the center of the eyes were placed on specific pixels, (2) faces were masked to
removebackgroundandhair,and(3)thenonmaskedfacialpixelswereprocessed
by a histogram equalization algorithm. The training set consisted of 500 faces.
Faces were represented by their projection onto the first 200 eigenvectors and
were identified by a nearest neighbor classifier using the L1 metric.
ef_hist_dev_ang, ef_hist_dev_anm, ef_hist_dev_l1, ef_hist_dev_l2,
ef_hist_dev_md, ef_hist_dev_ml1, ef_hist_dev_ml2 are seven eigenface based
system from National Institute of Standards and Technology (NIST) with a
commonimageprocessingfrontendandeigenfacerepresentationbutdifferingin
the distance metric. Algorithms are also tested from Excalibur Corp. (Carlsbad,
CA),andfromUniversityof Maryland (umd_mar_97) [129, 130]. There are two
algorithms developed by MIT Media Laboratory using a version of eigenface
transform; mit_mar_95 is the algorithm is the same algorithm that was tested in
March 1995 [86], algorithm retested in order to measure improvements, and
mit_sep_96 is the algorithm developed since March 1995 [22]. And finally the
94
usc_mar_97isanelasticgraphmatchingalgorithm[23]fromSouthernCalifornia
University.
FirstChoiceRecognitionRatio
dupI dupII fafb fafc
arl_cor 0.363 0.171 0.827 0.052
arl_ef 0.410 0.222 0.797 0.186
ef_hist_dev_ang 0.341 0.124 0.701 0.072
ef_hist_dev_anm 0.446 0.209 0.774 0.237
ef_hist_dev_l1 0.350 0.132 0.772 0.258
ef_hist_dev_l2 0.331 0.137 0.716 0.041
ef_hist_dev_md 0.422 0.167 0.741 0.232
ef_hist_dev_ml1 0.305 0.128 0.733 0.392
ef_hist_dev_ml2 0.346 0.128 0.772 0.309
excalibur 0.414 0.197 0.794 0.216
mit_mar_95 0.338 0.171 0.834 0.155
mit_sep_96 0.576 0.342 0.948 0.320
umd_mar_97 0.472 0.209 0.962 0.588
usc_mar_97 0.591 0.521 0.950 0.820
Proposedfacerecognition
methodusingGWT
0.448 0.239 0.963 0.676
Table 4.5: FERET performance evaluation results for various face recognition
algorithms.
InFigures4.6to4.9,recognitionperformancesofvariousfacerecognition
methods are presented. There can be also seen the improvements on algorithms’
performances from September 1996 to March 1997. Moreover, In Figure 4.10
average and in Figure 4.11 current upper bound identification performances of
abovemethodsoneachprobesetarepresented.
Simulation results show that, face recognition performance of the
proposedmethodiscompetitivetothoseofotherpopularmethods,suchaselastic
graph matching, eigenfaces, etc. Moreover, on fafb and fafc probe sets proposed
95
method achieves higher performance results than the most of the FERET test
contenders (Table 4.5). In Figure 4.114.15 cumulative performance results of
proposedmethodoneachprobesetarepresented.
96
Figure4.6:Identificationperformanceagainstfbprobes.(a)algorithmstestedin
September1996.(b)algorithmstestedinMarch1997.
97
Figure 4.7: Identification performance against duplicate I probes. (a) algorithms
testedinSeptember1996.(b)algorithmstestedinMarch1997.
98
Figure4.8:Identificationperformanceagainstfcprobes.(a)algorithmstestedin
September1996.(b)algorithmstestedinMarch1997.
99
Figure4.9:IdentificationperformanceagainstduplicateIIprobes.(a)algorithms
testedinSeptember1996.(b)algorithmstestedinMarch1997.
100
Figure4.10:AverageidentificationperformanceofFERETtestcontenderson
eachprobecategory.
Figure4.11:CurrentupperboundidentificationperformanceofFERET
contenderforeachprobecategory.
101
Figure4.12:Identificationperformanceofproposedmethodagainstfbprobes.
Figure4.13:Identificationperformanceofproposedmethodagainstfcprobes.
102
Figure4.14:IdentificationperformanceofproposedmethodagainstduplicateI
probes.
Figure4.15:IdentificationperformanceofproposedmethodagainstduplicateII
probes.
103
CHAPTER5
CONCLUSIONANDFUTUREWORK
Face recognition has been an attractive field of research for both
neuroscientists and computer vision scientists. Humans are able to identify
reliably a large number of faces and neuroscientists are interested in
understanding the perceptual and cognitive mechanisms at the base of the face
recognition process. Those researches illuminate computer vision scientists’
studies.Althoughdesignersoffacerecognitionalgorithmsandsystemsareaware
of relevant psychophysics and neurophysiological studies, they also should be
prudent in using only those that are applicable or relevant from a
practical/implementationpointofview.
Since 1888, many algorithms have been proposed as a solution to
automatic face recognition. Although none of them could reach the human
recognition performance, currently two biologically inspired methods, namely
104
eigenfaces and elastic graph matching methods, have reached relatively high
recognitionrates.
Eigenfacesalgorithmhassomeshortcomingsduetotheuseofimagepixel
gray values. As a result system becomes sensitive to illumination changes,
scaling,etc.andneedsabeforehandpreprocessingstep.Satisfactoryrecognition
performancescouldbereachedbysuccessfullyalignedfaceimages.Whenanew
face attend to the database system needs to run from the beginning, unless a
universaldatabaseexists.
Unlike the eigenfaces method, elastic graph matching method is more
robusttoilluminationchanges,sinceGaborwavelettransformofimagesisbeing
used, instead of directly using pixel gray values. Although recognition
performance of elastic graph matching method is reported higher than the
eigenfacesmethod[83],duetoitscomputationalcomplexityandexecutiontime,
the elastic graph matching approach is less attractive for commercial systems.
Although using 2D Gabor wavelet transform seems to be well suited to the
problem, graph matching makes algorithm bulky. Moreover, as the local
informationisextractedfrom the nodes of a predefined graph, some details on a
face,whicharethespecialcharacteristicsofthatfaceandcouldbeveryusefulin
recognitiontask,mightbelost.
In this thesis, a new approach to face recognition with Gabor wavelets is
presented. The method uses Gabor wavelet transform for both finding feature
pointsandextractingfeaturevectors.Fromtheexperimentalresults,itisseenthat
105
proposed method achieves better results compared to the graph matching and
eigenfacemethods,whichareknowntobethemostsuccessivealgorithms.
Although the proposed method shows some resemblance to graph
matching algorithm, in our approach, the location of feature points also contains
information about the face. Feature points are obtained from the special
characteristicsofeachindividualfaceautomatically,insteadoffittingagraphthat
is constructed from the general face idea. In the proposed algorithm, since the
facialfeaturesarecomparedlocally,insteadofusingageneralstructure,itallows
us to make a decision from the parts of the face. For example, when there are
sunglasses, the algorithm compares faces in terms of mouth, nose and any other
featuresratherthaneyes.Moreover,havingasimplematchingprocedureandlow
computational cost proposed method is faster than elastic graph matching
methods.Proposedmethodisalsorobusttoilluminationchangesasapropertyof
Gaborwavelets,whichisthemainproblemwiththeeigenfaceapproaches.There
isnotrainingasinmanysupervisedapproaches,suchasneuralnetworks.Anew
facial image can also be simply added by attaching new feature vectors to
reference gallery while such an operation might be quite time consuming for
systemsthatneedtraining.
The algorithm proposed by Manjunath et. al. [36] that shows some
similaritytoouralgorithm,especially in terms of the utilized features. However,
Manjunathet.al.Disregardapointwhileusinganadditionaltopologycost.Their
topologycostisdefinedastheratioofthevectoraldistancesoffeaturepointpairs
betweentwofaceimages.Duringsimulations,itisobservedthatthelocationsof
106
feature points, found from Gabor responses of the face image, can give small
deviations between different conditions (expression, illumination, having glasses
ornot,rotation,etc.),forthesameindividual.Therefore,anexactmeasurementof
corresponding distances is not possible unlike the geometrical feature based
methods.Moreover,duetoautomaticalfeaturedetection,features represented by
thosepointsarenotexplicitlyknown,whethertheybelongtoaneyeoramouth,
etc. Giving an information about the match of the overall facial structure, the
locationsoffeaturepointsareveryimportant.Howeverusingsuchatopologycost
amplifies the small deviations of the locations of feature points that are not a
measureofmatch.
Gabor wavelet transform of a face image takes 1.5 seconds, feature
extraction step of a single face image takes 0.3 seconds and matching an input
image with a single gallery image takes 0.15 seconds on a Pentium III 550 Mhz
PC.Notethataboveexecutiontimesaremeasuredwithoutcodeoptimization.
Although recognition performance of the proposed method is satisfactory
by any means, it can further be improved with some small modifications and/or
additionalpreprocessingoffaceimages.Suchimprovementscanbesummarized
as;
• Since feature points are found from the responses of image to Gabor filters
separately,asetofweightscanbeassignedtothesefeaturepointsbycounting
thetotaltimesofafeaturepointoccursatthoseresponses.
• A motion estimation stage using feature points followed by an affine
transformationcouldbeappliedtominimizerotationeffects.Thisprocesswill
107
not create much computational complexity since we already have feature
vectorsforrecognition.Bythehelpofthisstepfaceimageswouldbealigned.
• Whenthereisavideosequenceastheinputtothesystem,aframegivingthe
“mostfrontal”poseofapersonshouldbeselectedtoincreasetheperformance
of face recognition algorithm. This could be realized by examining the
distances between the main facial features which can be determined as the
locations that the feature points become dense. While trying to maximize
those distances, for example distance between two eyes, existing frame that
has the closest pose to the frontal will be found. Although there is still only
onefrontalfacepereach individual in the gallery, information provided bya
video sequence that includes the face to be recognized would be efficiently
usedbythisstep.
• As it is mentioned in problem definition, a face detection algorithm is
supposedtobedonebeforehand. A robust and successive face detection step
willincreasetherecognitionperformance.Implementingsuchafacedetection
methodisanimportantfutureworkforsuccessfulapplications.
• In order to further speed up the algorithm, number of Gabor filters could be
decreasedwithanacceptablelevelofdecreaseinrecognitionperformance.
It must be noted that performance of recognition systems is highly
application dependent and suggestions for improvements on the proposed
algorithm must be directed to a specific purpose of the face recognition
application.
108
REFERENCES
[1] V.Bruce,RecognizingFaces.London:Erlbaum,1988.
[2] G. Davies, H. Ellis, and E. J. Shepherd, Perceiving and Remembering
Faces,NewYork:Academic,1981.
[3] H. Ellis, M. Jeeves, F. Newcombe, and A. Young, Aspects of Face
Processing.Dordrecht:Nijhoff,1986.
[4] R. Baron, “Mechanisms of human facial recognition,” Int. J. Man
MachineStudies,vol.15,pp.137178,1981.
[5] D.C.HayandA.W.Young,“Thehumanface,”NormalityandPathology
inCognitivefunction,A.W.EllisEd.London:Academic,1982,pp.173
202.
[6] S. Carey, “A case study: Face Recognition,” Explorations in the
BiologicalLanguage,E.WalkerEd.NewYork:Bradford,1987,pp.175
201.
[7] S. Carey, R. Diamond, and B. Woods, “The development of face
recognition A maturational component?” Develop. Psych., vol. 16, pp.
257269,1980.
[8] A. P. Ginsburg, “Visual Information processing based on spatial filters
constrainedbybiologicaldata,”AMRLtech.Rep.,pp.78129,1978.
[9] A.G.Goldstein,“Facial feature variation:Anthropometric Data II,” Bull.
PsychonomicSoc.,vol.13,pp.191193,1979.
[10] A.G.Goldstein,“Facerelatedvariationoffacialfeatures:Anthropometric
DataI,”Bull.PsychonomicSoc.,vol.13,pp.187190,1979.
109
[11] L. D. Harmon, “The recognition of faces.” Scientific American, vol. 229,
pp.7182,1973.
[12] D. Perkins, “ A definition of caricature and recognition,” Studies in the
AnthropologyofVisualCommun.,vol.2,pp.124,1975.
[13] J.Sergent,“Microgenesisoffaceperception,”AspectsofFaceProcessing,
H. D. Ellis, M. A. Jeeves, F. Newcombe, and A. Young Eds. Dordrecht:
Nijhoff,1986.
[14] T. Kanade, “Picture processing by computer complex and recognition of
human faces”. Technical report, Kyoto University, Dept. of Information
Science,1973.
[15] S. Lin, S. Kung, and L. Lin, “Face Recognition / Detection by
Probabilistic DecisionBased Neural Network,” IEEE Trans. Neural
Networks,vol.8,pp.114132,1997.
[16] R.Brunelli,T.Poggio,“FaceRecognition:Featuresvs.Templates,”IEEE
Trans.onPAMI,Vol.12,No.1,Jan.1990.
[17] M. H. Yang, N. Ahuja, and D. Kriegman, “A survey on face detection
methods,”IEEETrans. On Pattern analysis and Machine Intelligance, to
appear2001.
[18] S. Ranganath and K. Arun, “Face Recognition Using Transform Features
andNeuralNetwork,”PatternRecognition,vol.30,pp.16151622,1997.
[19] S. Lawrence, C. Giles, A. Tsoi, and A. Back, “Face Recognition: A
Convolutional Neural Network Approach,” IEEE Trans. on Neural
Networks,vol.8,pp.98113,1997.
[20] M. Turk and A. Pentland. “Eigenfaces for recognition.” Journal of
CognitiveScience,pp.7186,1991.
[21] P.Belhumeur,J.Hespanha,andD.Kriegman,“Eigenfacesvs.Fisherfaces:
RecognitionusingclassSpecificlinearprojection,”IEEETrans.onPAMI,
vol.19,no.7,1997.
[22] B.Moghaddam,C.Nastar,andA.Pentland,“BayesianFaceRecognition
using Deformable Intensity Surfaces,” IEEE Conference on CVPR, San
Francisco,CA,June1996.
[23] L. Wiskott, J. M. Fellous, N. Krüger and Christoph von der Malsburg,
“Face Recognition by Elastic Graph Matching,” In Intelligent Biometric
110
Techniques in fingerprint and Face Recognition, CRC Press, Chapter 11,
pp.355396,1999.
[24] J. G. Daugman, “Complete Discrete 2D Gabor Transform by Neural
Networks for Image Analysis and Compression,” IEEE Trans. on
Acoustics, Speech and Signal Processing, vol. 36, no.7, pp.11691179,
1988.
[25] M. J. Lyons, J.Budynek and S. Akamatsu, “Automatic Classification of
Single Facial Images,” IEEE Trans. on PAMI, vol. 21, no.12, December
1999.
[26] N. Krüger, M. Pötzsch and C. von der Malsburg, “Determining of Face
Position and Pose with a learned Representation Based on Labeled
Graphs,”ImageandVisionComputing,vol.15,pp.665673,1997.
[27] L. Wiskott, J. M. Fellous, N. Krüger and Christoph von der Malsburg,
“Face Recognition and Gender Determination,” Int’l Workshop on
AutomaticFaceRecognitionandGestureRecognition,pp.9297,1995.
[28] Stirlingdatabase,http://pics.psych.stir.ac.uk.
[29] Purduefacedatabase,http://www.purdue.edu.
[30] Olivetti & Oracle Research Laboratory, The Olivetti & Oracle Research
Laboratory Face Database of Faces, http://www.cam
orl.co.uk/facedatabase.html
[31] A.Yılmaz,M.Gökmen,“Eigenhillvs.EigenfaceandEigenedge,”Proc.of
InternationalConf.onPatternRecognition,BarcelonaSpain,2000.
[32] J. Zhang, Y. Yan, and M. Lades, “Face Recognition: Eigenface, Elastic
Matching,andNeuralNets,”Proc.IEEE,vol.85,pp.14231435,1997.
[33] O. de Vel and S. Aeberhard, “LineBased Face Recognition Under
VaryingPose,”IEEETrans.onPAMI,vol.21,no.10,October1999.
[34] B. Moghaddam, W. Wahid, and A. Pentland, “Beyond Eigenfaces:
Probabilistic Matching for Face Recognition,” The 3
rd
IEEE Int’l
Conference on Automatic Face and Gesture Recognition, Nara, Japan,
April1998.
[35] B. Duc, S. Fisher, and J. Bigün, “Face Authentication with Gabor
InformationonDeformableGraphs,”IEEETrans.OnImageProc.,vol.8,
no.4,pp.504515,1999.
111
[36] B.S. Manjunath, R. Chellappa, and C. von der Malsburg, “A Feature
Based Approach to Face Recognition,” Proc. of International Conf. on
ComputerVision,1992.
[37] N.M. Allinson, A.W. Ellis, B.M. Flude, and A.J. Luckman, “A
connectionist model of familiar face recognition,” IEE Colloquium on
Machine Storage and Recognition of Faces, vol. 5, pp. 110 (Digest No:
1992/017),1992.
[38] R.J. Baron, “Mechanisms of human facial recognition,” International
JournalonMachineStudies,vol.15,pp.137178,1981.
[39] P. Benson and D. Perrett, “Face to face with the perfect image,” New
Scientist,pp.3235,22February1992.
[40] P.J.BensonandD.I.Perrett,“Perceptionandrecognitionof photographic
quality facial caricatures: implications for the recognition of natural
images,” Face Recognition, pp. 105135, Laurence Erlbaum Associates,
1991.
[41] T. Valentine, V. Bruce, “Recognizing familiar faces: the role of
distinctivenessandfamiliarity”,CanadianJournalofPsychology,vol.40,
pp.300305,1986.
[42] V. Bruce and A.W. Young, “Understanding face recognition,” British
JournalofPsychology,vol.77,pp.305327,1986.
[43] N.CantorandW.Mischel,“Prototypesinpersonperception,”Advancesin
ExperimentalSocialPsychology,vol.12,pp.352,AcademicPress,1979.
[44] G.W. Cottrell and M. Fleming, “Face recognition using unsupervised
feature extraction,” International Neural Network Conference, vol. 1, pp.
322325,1990.
[45] I. Craw, “Recognizing face features and faces,” IEE Colloquium on
Machine Storage and Recognition of Faces, vol. 7, pp. 14 (Digest No:
1992/017),1992.
[46] I.CrawandP.Cameron,“Facerecognitionbycomputer,”BritishMachine
VisionConference,pp.488507,SpringerVerlag,1992.
[47] J. Daugman, “Uncertainty relation for resolution in space, spatial
frequency, and orientation optimized by twodimensional visual cortical
filters,” Journal of the Optical Society of America, vol. 2(7), 11601169,
July1985.
112
[48] J.G.Daugman,“Highconfidencevisualrecognitionofpersonsbyatestof
statistical independence,” IEEE Transactions on Pattern Analysis and
MachineIntelligence,15(11),1993.
[49] R.DiamondandS.Carey,“Whyfacesareandarenotspecial:aneffectof
expertise,” Journal of Experimental Psychology, vol. 115, pp. 107117,
1986.
[50] G.D. Forney, “The Viterbi Algorithm,” Proceedings of the IEEE, vol.
61(3),pp.268278,March1973.
[51] R.C. Fowler, “FINGERPRINT: an old touchstone decriminalized,” IEEE
Spectrum,pp.26,February1994.
[52] R. Gallery and T.I.P. Trew, “An architecture for face classification,” IEE
ColloquiumonMachineStorageandRecognitionofFaces,vol.2,pp.15
(DigestNo:1992/017),1992.
[53] F. Galton, “Personal identification and description 1,” Nature, pp. 173
177,21June1888.
[54] Sir Francis Galton, “Personal identification and descriptionII”, Nature
201203,28June1888.
[55] L. Gillick and S.J. Cox, “Some statistical issues in the comparison of
speech recognition algorithms,” Proceedings of the Int. Conf. on
Acoustics,SpeechandSignalProcessing,pp.532535,1989.
[56] A.J. Goldstein, L.D. Harmon, and A.B. Lesk, “Identification of human
faces,”Proc.oftheIEEE,vol.59(5),pp.748760,1971.
[57] L.D. Harmon, M.K. Khan, R. Lasch, and P.F. Ramig, “Machine
identification of human faces,” Pattern Recognition, vol. 13(2), pp. 97
110,1981.
[58] D.C. Hay and A.W. Young, “The human face,” Normality and pathology
incognitivefunctions,pp.173202.AcademicPress,1982.
[59] C.L. Huang and C.W. Chen, Human facial feature extraction for face
interpretationandrecognition,”PatternRecognition,vol.25(12),pp.1435
1444,1992.
[60] T. Hutcheson, “FACE: smile, you're on candid camera,” IEEE Spectrum,
pp.2829,February1994.
[61] R.A. Hutchinson and W.J. Welsh, “Comparison of neural networks and
conventional techniques for feature location in facial images,” IEE Int.
113
Conf. on Artificial Neural Networks, Conf. Publication Number 313, pp.
201205,1989.
[62] T. Kanade, “Computer recognition of human faces,” Interdisciplinary
SystemsResearch,BirkhauserVerlag,1977.
[63] M. Kirby and L. Sirovich, “Application of the KarhunenLoueve
procedure for the characterisation of human faces,” IEEE Trans. On
PatternAnalysisandMachineIntelligence,vol.12(1),pp.103108,1990.
[64] T. Kohonen, Selforganization and associate memory, SpringerVerlag,
2ndedition,1988.
[65] A.Lanitis,C.J.Taylor,andT.F.Cootes,“Anautomaticfaceidentification
system using flexible appearance models,” British Machine Vision
Conference,volume1,pp.6574,BMVAPress,1994.
[66] L.L. Light, F. KayraStuart, and S. Hollander, “Recognition memory for
typical and unusual faces,” Journal of experimental Psychology: Human
LearningandMemory,vol.5,pp.212228,1979.
[67] R.Mandelbaum,“SPEECH:justsaytheword,”IEEESpectrum,page30,
February1994.
[68] B.Miller,“Vitalsignsofidentity,”IEEESpectrum,pages2230,February
1994.
[69] L.R. Rabiner, “A tutorial on Hidden Markov Models and selected
applicationsinspeechrecognition,”Proc.oftheIEEE,vol.77(2),pp.257
286,1989.
[70] G. Robertson and I. Craw, “Testing face recognition systems,” British
MachineVisionConference,vol.1,pp.2534,BMVAPress,1993.
[71] A.SamalandP.A.Iyengar,“Automaticrecognitionandanalysisofhuman
faces and facial expressions: A survey,” Pattern Recognition, vol. 25(1),
pp.6577,1992.
[72] J.W. Shepherd, F. Gibling, and H.D. Ellis, “The effect of distinctiveness,
presentation time and delay on face recognition,” Face Recognition, pp.
137145,LaurenceErlbaumAssociates,1991.
[73] D. Sidlauskas, “HAND: give me five,” IEEE Spectrum, pp. 2425,
February1994.
[74] J.E.Siedlarz,“IRIS:moredetailedthanafingerprint,”IEEESpectrum,pp.
27,February1994.
114
[75] T.J.Stonham,“PracticalfacerecognitionandverificationwithWISARD,”
Aspects of face processing, pp. 426441, Martinus Nijhoff Publishers,
1986.
[76] R. Want and A. Hopper, “Active badges and personal interactive
computing objects,” IEEE Trans. On Consumer Electronics, February
1992.
[77] E. Winograd, “Elaboration and distinctiveness in memory for faces,”
JournalofExperimental Psychology: Human Learning and Memory, vol.
7,pp.181190,1981.
[78] K.H.Wong,H.H.M.Law,andP.W.M.Tsang,“A system for recognizing
human faces,” Proc.oftheInt.Conf.onAcoustics,pp.16381642,1989.
[79] R.K. Yin, “Face recognition by braininjured patients: a dissociable
ability?,”Neuropsychologia,vol.8,pp.395402,1970.
[80] A.W.YoungandV.Bruce,“Perceptualcategoriesandthecomputationof
‘grandmother’,” Face Recognition, pp. 549. Laurence Erlbaum
Associates,1991.
[81] A.L. Yuille, P.W. Hallinan, and D.S. Cohen, “Feature extraction from
faces using deformable templates,” Int. Journal of Computer Vision, vol.
8(2),pp.99111,1992.
[82] R. Chellappa, C. Wilson, and S, Sirobey, “Human and machine
recognitionoffaces:Asurvey,”ProceedingsofIEEE,vol.83,May1995.
[83] P. Phillips, “The FERET database and evaluation procedure for face
recognitionalgorithms,”Image and Vision Computing, vol. 16, no. 5, pp.
295306,1998.
[84] W. Bledsoe, “The model method in facial recognition,” Panoramic
ResearchInc.,Tech.Rep.PRI:15,PaloAlto,CA,1964.
[85] M. Muarseand S. Nayar, “Visual learning and recognition of 3d objects
fromappearance,”Int.JournalofComputerVision,vol.14,pp.524,1995.
[86] A. Pentland, B. Moghadam, T. Starter, and M. Turk, “View based and
modular eigenspaces for face recognition,” Proc. on IEEE Computer
SocietyConferenceonComputerVisionandPatternRecognition,pp.84
91,1994.
[87] H. F. S. Akamatsu, T. Sasaki and Y. Suenuga, “A robust face
identificationscheme KL expansion of an invariant feature space,”SPIE
115
Proc.:: Intelligent Robots and Computer Vision X: Algorithms and
Technology,vol.1607,pp.7184,1991.
[88] P.Belhumeur,J.Hespanha,andD.Kriegman,“EigenfacesvsFisherfaces:
Recognition using class specific linear projection,” Proc. of Fourth
EuropeanConf.onComputerVision,ECCV’96,pp.4556,April1996.
[89] K. Etemad and R. Chellapa, “Face recognition using discriminant
eigenvectors,”Proc.ofICASSP,1996.
[90] K. Fukunaga, Introduction to Statistical Pattern Recognition. Academic
Press,1990.
[91] R. Duda and P. Hart, Pattern Classification and Scene Analysis. New
York,Wiley,1973.
[92] Y.Cheng,K.Liu,J.Yang,Y.Zang,andN.Gu,“Humanfacerecognition
methodbasedonthestatisticalmodelofsmallsamplesize,”SPIEProc.::
IntelligentRobotsandComputerVisionX:Alg.AndTechn.,vol1607,pp.
8595,1991.
[93] Z.Hong,“Algebraicfeatureextractionofimagefor recognition,”Pattern
Recognition,vol.24,pp.211219,1991.
[94] Q Tian, “Comparison of statistical pattern recognition algorithms for
hybrid processing, ii: eigenvectorbased algorithms,” Journal of the
OpticalSocietyofAmerica,vol.5,pp.16701672,1988.
[95] Y.Cheng,K.Liu,J.Yang,andH.Wang,“Arobustalgebraicmethodfor
human face recognition,” Proc. of 11
th
Int. Conf. on Pattern Recognition,
pp.221224,1992.
[96] L. Rabiner and B. Huang, Fundamentals of speech recognition,
EnglewoodCliffs,NJ:PrenticeHall,1993.
[97] E.LevinandR.Pieraccini,“Dynamicplanarwarpingforopticalcharacter
recognition,”ICAASP ,pp.149152,1992.
[98] O.Agazzi,S.Kuo,E.Levin,andR.Pieraccini,“Connectedanddegraded
textrecognitionusingplanarHMM,”ICASSP,vol.5,pp.113116,1993.
[99] S. Kuo and O. Agazzi, “Keyword spotting in poorly printed documents
usingpseudo2dHMM,”IEEETrans.OnPatternAnalysisandMachine
Intelligence,1994.
116
[100] O. Agazzi and S. Kuo, “Hidden Markov Models based optical character
recognitioninpresenceofdeterministictransformations,”IEEETrans.On
PatternAnalysisandMachineIntelligence,1994.
[101] F. Samaria, “Face segmentation for identification using Hidden Markov
Models,”BritishMachineVisionConference,1993.
[102] F. Samaria and F. Fallside, “Face identification and feature extraction
using Hidden Markov Models,” Image Processing: Theory and
Applications,1993.
[103] F. Samaria and F. Fallside, “Automated face identification using Hidden
MarkovModels,”Proc.oftheInt.ConferenceonAdvancedMechatronics,
1993.
[104] F. Samaria and A. Harter, “Parameterization of stochastic model for
human face identification,” Proc. Of the second IEEE Workshop on
ApplicationofComputerVision,1994.
[105] F. Samaria and S. Young, “HMM based architecture for face
identification,”ImageandComputerVision,vol.12,October1994.
[106] D.E.RumelhartandJ.L.McClellan,ParallelandDistributedProcessing,
vol.1,Cambridge,MA:MITPress,1986.
[107] G. W. Cottrell and M. Fleming, “Face recognition using unsupervised
featureextraction,”Proc.Int.NeuralNetworkConf.,vol.1,Paris,France,
July913,1990,pp.322325.
[108] H. Bourlard and Y. Kamp, “Autoassociation by multilayer perceptrons
and singular value decomposition,” Biological Cybern., vol. 59, pp. 291
294,1988.
[109] David DeMers and G. W. Cottrell, “Nonlinear dimension reduction,”
Advances in Neural Information Processing Systems 5, pages 580587,
SanMateo,CA,1993.MorganKauffmannPublishers.
[110] J. Weng, N. Ahuja, and T. S. Huang, “ Learning recognition and
segmentation of 3d objects from 2d images,” Proc. of Int. Conf. on
ComputerVision,ICCV93,pp.121128,1993.
[111] T. Kohonen, “The selforginizing map,” Proc. of the IEEE, vol. 78, pp.
14641480,1990.
[112] T. Kohonen, Self Orginizing Maps, SpringerVerlag, Berlin, Germany,
1995.
117
[113] G. Chow and X. Li, “Towards a system for automatic facial feature
detection,”PatternRecognition ,vol.26,no.12,pp.17391755,1993.
[114] L. Stringa, “Eyes detection for face recognition,” Applied Artificial
Intelligence,vol.7,pp.365382,OctDec1993.
[115] P.Burt,“Smartsensingwithinapyramidvisionmachine,”Proc.ofIEEE,
vol.76,pp.10061015,August1988.
[116] D. Beymer, “Face recognition under varying pose,” Proc. of 23
rd
Image
understandingWorkshop,vol.2,pp.837842,1994.
[117] M.Lades,JVorbruggen,J,Buhmann,J.Lange,vonderMalsburg,andR.
Wurtz, “ Distortion invariant object recognition in the dynamic link
architecture,”IEEETrans. Comput.,vol.42,no.3,pp.300311,1993.
[118] T.S.Lee,“Imagerepresentationusing2dGaborwavelets,”IEEETrans.
On Pattern Analysis Pattern Analysis and Machine Intelligence, vol. 18,
no.10,October,1996.
[119] J. G. Daugman, “Two dimensional spectral analysis of cortical receptive
fieldprofile”,VisionResearch,vol.20,pp.847856,1980.
[120] D.Gabor,“Theoryofcommunication,”J.IEE,vol.93,pp.429459,1946.
[121] D. H. Hubel and T. N. Wiesel, “Functional architecture of macaque
monkey visual cortex,” Proc. Royal Soc. B (London), vol. 198, pp.159,
1978.
[122] S.Marcelja,“Mathematicaldescriptionoftheresponsesofsimplecortical
cells,”J.OpticalSoc.Am., vol.70,pp.12971300,1980.
[123] A. M. Burton, and V. Bruce, and I. Craw, “Modeling face recognition,”
PhilosophicalTrans.OftheRoyalSoc.B,vol.1,p.457480,1993.
[124] V. Bruce and S. Langton, “The use of pigmentation and shading
information in recognizing the sex and identities of faces,” Perception,
vol.23,pp.803822,1994.
[125] R. Kemp, G. Pike, P. White, and A. Musselman, “A perception and
recognition of normal and negative faces the role of shape from shading
andpigmentationcues,”Perception,vol.25,pp.3752,1996.
[126] R. E. Galper, “Recognition of faces in photographic negative”,
PsychonomicScience,vol.19,pp.207208,1970.
118
[127] H. Moon and P. J. Phillips, ‘Analysis of PCAbased face recognition
algorithms,’ Empirical Evaluation Techniques in Computer Vision, IEEE
ComputerSocietyPress,LosAlamitos,CA,1998.
[128] FERETlatestevaluationresults,
http://www.itl.nist.gov/iad/humanid/feret/perf/eval.html
[129] W.Zhao,R.Chellappa,andA.Krishnaswamy,“Discriminantanalysisof
principalcomponentsforfacerecognition,”In3rdInternational
ConferenceonAutomaticFaceandGestureRecognition,pp.336341,
1998.
[130] K.EtemadandR.Chellappa,“Discriminantanalysisforrecognitionof
humanfaceimages,”J.Opt.Soc.Am.A,vol.14,pp.17241733,August
1997.
ABSTRACT
FACE RECOGNITION USING GABOR WAVELET TRANSFORM
Kepenekci, Burcu M.S, Department of Electrical and Electronics Engineering Supervisor: A. AydınAlatan CoSupervisor: GözdeBozda ı Akar
September 2001, 118 pages
Face recognition is emerging as an active research area with numerous commercial and law enforcement applications. Although existing methods performs well under certain conditions, the illumination changes, out of plane rotations and occlusions are still remain as challenging problems. The proposed algorithm deals with two of these problems, namely occlusion and illumination changes. In our method, Gabor wavelet transform is used for facial feature vector construction due to its powerful representation of the behavior of receptive fields
ii
in human visual system (HVS). The method is based on selecting peaks (highenergized points) of the Gabor wavelet responses as feature points. Compared to predefined graph nodes of elastic graph matching, our approach has better representative capability for Gabor wavelets. The feature points are automatically extracted using the local characteristics of each individual face in order to decrease the effect of occluded features. Since there is no training as in neural network approaches, a single frontal face for each individual is enough as a reference. The experimental results with standard image libraries, show that the proposed method performs better compared to the graph matching and eigenface based methods.
Keywords: Automatic face recognition, Gabor wavelet transform, human face perception.
iii
Önerilen metod ile bu üç problemden aydınlanma de i imleri ve örtme etkisi ele alınmı tır. ve aydınlatma de i imleri hala yüz tanımada çözülememi üç problemdir. AydınAlatan Yardımcı Tez Yöneticisi: Gözde Bozda ı Akar Eylül 2001. 118 sayfa Yüz tanıma günümüzde hem ticari hem de hukuksal alanlarda artan sayıda uygulaması olan bir problemdir. yönlenme. Bu çalı mada hem yüze ait öznitelik noktaları hem de vektörleri Gabor dalgacık dönü ümü iv . Varolan yüz tanıma metodları kontrollü ortamda ba arılı sonuçlar verse de örtme.ÖZ GABOR DALGACIKLARINI KULLANARAK YÜZ TANIMA Kepenekci. Elektrik Elektronik Mühendisli i Bölümü Tez Yöneticisi: A. Burcu YüksekLisans.
Yapılan deneylerin sonucunda önerilen metodun varolan çizge e leme ve özyüzler yöntemleriyle kar ıla tırıldı ında daha ba arılı sonuçlar verdi i gözlenmi tir. insan yüz algısı. Gabor dalgacık dönü ümü. öznitelik bulma. insan görme sistemindeki duyumsal bölgelerin davranı ını modellemesinden dolayı kullanılmı tır. v . örüntü tanıma. Gabor dalgacık tepkeleri tepelerinin (yüksek enerjili noktalarının) öznitelik noktaları olarak seçilmesine dayanmaktadır. Öznitelik noktaları otomatik olarak her yüzün farklı yerel özellikleri kullanılarak bulunmakta. Sinir a ları yakla ımlarında oldu u gibi ö renme safhası olmaması nedeniyle tanıma için her ki inin sadece bir ön yüz görüntüsü yeterli olmaktadır. Anahtar Kelimeler: Yüz tanıma. Böylece Gabor dalgacıklarının en verimli ekilde kullanılması sa lanmı tır. öznitelik e leme. daha önceden tanımlanmı çizge düümleri yerine. bunun sonucu olarak örtük özniteliklerin etkisi de azaltılmaktadır. Önerilen metod.kullanılarak bulunmu tur. Gabor dalgacık dönü ümü.
I also thank to F. Dr. Aydın Alatan for their guidance. Dr. suggestions and insight throughout this research. Most importantly. and great friendship. I would like to acknowledge Dr. A.ACKNOWLEDGEMENTS I would like to express my gratitude to Assoc. Prof. Boray Tek for his support. I offer sincere thanks for their unshakable faith in me. Prof. vi . useful discussions. Gözde Bozda ı Akar and Assist. Special thanks to friends in Signal Processing and Remote Sensing Laboratory for their reassurance during thesis writing. Mustafa Karaman for his guidance during my undergraduate studies and his encouraging me to research. Prof. To my family. I express my appreciation to Assoc. Dr. U ur Murat Lelo lu for his comments.
.…….. Why Face Recognition? ……….. Problem Definition……………………….1.2... PAST RESEARCH ON FACE RECOGNITION..……… ... Early Face Recognition Methods……………………….2... Human Face Recognition………………………………………….…………………….2.x ..vi TABLE OF CONTENTS……………………………………………………..…vii .………… iv ACKNOWLEDGEMENTS…………………………………………….8 2.. LIST OF FIGURES.1.…………………………………………………….1 1....……………………………………….9 2.. Discussion………….………………………6 1.9 2..…………………………………3 1.3..ii ..2..16 2....3.…….15 2.2. Organization of the thesis……………………………………….xi . Matching and Statistical Decision ……...….. Representation..... Automatic Face Recognition…………………………………….TABLE OF CONTENTS ABSTRACT………………………………………………….1.………...2...... CHAPTER 1. Statistical Approachs to Face Recognition…………….……………………………………13 2..19 2. LIST OF TABLES………. ÖZ………………………………………………………………. INTRODUCTION…………………………………………………….1.………………….1.21 vii ..
3..5.2.21 2..30 2. Current State of The Art………………………………….21 2.3.. Singular Value Decomposition…….4.. Feature Based Matching ……………………………. Neural Networks Approach………………………. Face Recognition Using Singular Value Decomposition………………35 2. Eigenfaces………………………….2.2.1. Linear Discriminant Methods.2.8...2.2.26 2. Face Recognition using Eigenfaces…………. Hidden Markov Model Based Methods………………….3.2...3.1.2.3.3. FACE COMPARISON AND MATCHING USING GABOR WAVELETS……. KarhunenLoeve Expansion Based Methods….1.3. Face Recognition Using Linear Discriminant Analysis…………...51 … 2.3.3.2.2.2. Eigenfeatures……………………….3.3.3... KarhunenLoeve Transform of the Fourier Transform……………….1.2.2.1.……………….2.28 2.2.………64 viii . Fisher’s Linear Discriminant………..2.44 2.…29 2. Face Representation Using Gabor Wavelets…………….6.4..7.1.…….62 3..2..3.32 2.3.60 3.………………………………….….3.2.38 2.35 2.49 2. Template Based Matching…………………………….2..2.2.1. Singular Value Decomposition Methods………35 2.2.1..1..Fisherfaces….30 2.2.
. Results for Purdue University Face Database……………84 4. Face Comparison……………………………….2..1.... CONCLUSIONS AND FUTURE WORK…………………………..2.108 ix .1.1.75 4. Simulati n Setup…………………..3.1.…………………. Gabor Wavelets………………………………………..3. Featurepoint Localization……………………..70 Featurevector Extraction…………………….3. Matching Procedure…………………………………………….2.103 REFERENCES……………………………………………………….…………82 o 4. RESULTS…………………………………………………………….1..4.……………….. Similarity Calculation……………………………………74 3.1.2.1..3.1...3.. Results for The Olivetti and Oracle Research Laboratory (ORL) face Database….1..74 3..64 3.1.6 8 3..2.………. 3. 2D Gabor Wavelet Representation of Faces…………….1.1. Results for University of Stirling Face Database………..82 4.73 3...1.2.83 4...... Feature Extraction………………………………………..89 5.70 3.1.…………………………………87 4.……….2. Results for FERET Face Database…….
1 Recognition performances of eigenface...3 4.85 4..…92 Probe sets for FERET perf rmance evaluation……………………….…92 o FERET performance evaluation results for various face recognition algorithms………………………………………………………….4 4.5 Performance results of wellknownalgorithms on ORL database………... eigenhills and proposed method on the Purdue face database………………………….88 Probe sets and their goal of evaluation……………………………….LIST OF TABLES 4..2 4.………………….……94 x .
40 … HMM training scheme…………………………………………………..…………….LIST OF FIGURES 2. .…………………..........3 2.6 2..68 … Example of a facial image response to Gabor filters …….44 Autoassociation and classification networks………….45 The diagram of the Convolutional Neural Network System…………….4 2.………59 An ensemble of Gabor wavelets………………………..………………………….…………………….9 2...…..2 Appearance model of Eigenface Algorithm…....43 HMM recognition scheme……………………..8 2.27 Image sampling technique for HMM recognition ………………..……………………….……………….6 Flowchart of the feature extraction stageof the facial images ……….. and receptive fields that are mached to the local features on the face…………………………67 3...5 2..………5 5 A brunch graph……………………………………………………….1 2.7 2.66 Small set of features can recognize faces uniquely.………….48 A 2D image lattice (grid graph) on Marilyn Monroe’s face…….5 Gabor filters correspond to 5 spatial frequency and 8 orientation…...4 3.………………..10 3....69 Facial feature points found as the highenergized points of Gabor wavelet responses……………………………….…59 Bunch graph matched to a face…………………………………..72 xi ..2 6 Discriminative model of Eigenface Algorithm...3 3.1 3.2 2.71 3....….
10 4.102 4. matching face from gallery ……………………....……...…97 a FERET identification performances aginst fc probes….14 FERET identification performance of proposed method against duplicate I probes...………………………………………….13 FERET identification performance of proposed method against fc probs………………………………..….………….2 4.………….85 Whole set of face images of 40 individuls 10 images per person….…………98 a FERET identification performances against duplicate II probes..1 4...……………………….………..... …...…………………103 xii ...7 4.9 4...15 FERET identification performance of proposed method against duplicate II probes…..8 4.5 Test faces vs.3 4...………………………..11 4.…………………100 FERET current upper bound identification performances.102 e 4..……96 FERET identification performances aginst duplicate I probes….4 4.89 4.86 a Example of misclassified faces of ORL database………………….7 4..……...…99 FERET average identification performances……….81 Examples of different facial expressions from Stirling database…….…...101 FERET identification performance of proposed method against fb probes………….6 4...…………………….……88 Example of different facial images for a person from ORL database that are placed at training or probe sets by different…………….…………...…………103 4.…83 Example of different facial images for a person from Purdue database...3.12 FERET identification performances against fb probes ……….
photo ID’s. pattern recognition. having the ability to identify distorted images. driver’s licenses. since the same underlying 1 .CHAPTER 1 INTRODUCTION Machine recognition of faces is emerging as an active research area spanning several disciplines such as image processing. Humans seem to recognize faces in cluttered scenes with relative ease. computer vision and neural networks. In addition to the cognitive aspects. and faces with occluded details. These applications range from static matching of controlled format photographs such as passports. Machine recognition is much more daunting task. Face recognition technology has numerous commercial and law enforcement applications. coarsely quantized images. understanding face recognition is important. credit cards. and mug shots to real time matching of surveillance video images [82]. Understanding the human mechanisms employed to recognize faces constitutes a challenge for psychologists and neural scientists.
it is not comparable to methods using keys. whereas fingerprints and retina scans suffer from low user acceptance. or. terrorist activities.mechanisms could be used to build a system for the automatic identification of faces by machine. In addition. have a bothersome drawback. passwords. Still most of the access control methods. The availability of real time hardware. these methods require the user to remember a password. 2. 54]. etc. batches. passwords or batches. with all their legitimate applications in an expanding society. A formal method of classifying faces was first proposed by Francis Galton in 1888 [53. During the 1980’s work on face recognition remained largely dormant. 2 . the corresponding means (keys. require a human action in the course of identification or authentication. The increasing need for surveillance related applications due to drug trafficking. PIN codes) are prone to being lost or forgotten. in general. Since the 1990’s. 3. to enter a PIN code. to carry a batch. While this is a high rate for face recognition. The reemergence of neural network classifiers with emphasis on real time computation and adaptation. The increase in emphasis on civilian/commercial research projects. 4. Except for human and voice recognition. the research interest in face recognition has grown significantly as a result of the following facts: 1. Modern face recognition has reached an identification rate greater than 90% with wellcontrolled pose and illumination conditions.
Biometric identification is the technique of automatically identifying or verifying an individual by a physical characteristic or personal trait.1. voice. for identification.1. Such requirement for reliable personal identification in computerized access control has resulted in an increased interest in biometrics. daytoday affairs like withdrawing money from a bank account or dealing with the post office. A biometricbased system was developed by Recognition Systems Inc. and face. finger. access control for computers in general or for automatic teller machines in particular. Physical biometric systems use the eye.. The key element of biometric technology is its ability to identify a human being and enforce security [83]. Behavioral biometrics encompasses such behaviors as signature and typing rhythms. Campbell. California. Why face recognition? Within today’s environment of increased importance of security and organization. Biometric characteristics and traits are divided into behavioral or physical categories. Biometric technology was developed for use in highlevel security systems and law enforcement markets. The system was called ID3D Handkey and used the three dimensional shape of a person’s hand to 3 . or in the prominent field of criminal investigation. as reported by Sidlauskas [73]. The term “automatically” means the biometric identification system must identify or verify a human characteristic or trait quickly with little or no intervention from the user. hand. identification and authentication methods have developed into a key technology in various areas: entrance control in buildings.
like fingerprints. It has been observed that the iris of the eye. where a user is identified through his or her spoken words. This pause and declare interaction is unlikely to change because of the finegrain spatial sensing required. Various institutions around the world have carried out research in the field. AT&T have produced a prototype that stores a person’s voice on a memory card. Fingerprint systems are unobtrusive and relatively cheap to buy. Speech recognition is also offers one of the most natural and less obtrusive biometric measures. This system could store up to 20000 different hands. They require the user to position their body relative to the sensor. 4 . details of which are described by Mandelbaum [67]. Fingerprints are unique to each human being. While appropriate for bank transactions and entry into secure areas. such technologies have the disadvantage that they are intrusive both physically and socially. then pause for a second to declare himself or herself. Moreover.distinguish people. Fowler [51] has produced a short summary of the available systems. Another wellknown biometric measure is that of fingerprints. The side and top view of a hand positioned in a controlled capture box were used to generate a set of geometric features. Daugman designed a robust pattern recognition method based on 2D Gabor transforms to classify human irises. Capturing takes less than two seconds and the data could be stored efficiently in a 9byte feature vector. They are used in banks and to control entrance to restricted access areas. displays patterns and textures unique to each human and that it remains stable over decades of life as detailed by Siedlarz [74].
since people can not recognize people using this sort of data. Facial recognition is the most successful form of human surveillance. is one of the fastest growing fields in the biometric industry. Human beings often recognize one another by unique facial characteristics. such as Harvard University and the MIT Media Lab. While the pause and present interaction perception are useful in high security applications. and the noninvasive aspect of facial recognition systems. these types of identification do not have a place in normal human interactions and social structures. Although facial recognition is still in the research and development phase. are working on the development of more accurate and reliable systems. One of the newest biometric technologies. A face recognition system would allow user to be identified by simply walking past a surveillance camera. 5 . they are exactly the opposite of what is required when building a store that recognizing its best customers. Facial recognition technology. the everincreasing number of video cameras being placed in the workspace. several commercial systems are currently available and research organizations. is based on this phenomenon. Interest in facial recognition is being fueled by the availability and low cost of video hardware. or a house that knows the people who live there. is being used to improve human efficiency when recognizing faces. automatic facial recognition. or an information kiosk that remembers you.
facial expressions also vary from time to time. Face recognition is a difficult problem due to the general similar shape of faces combined with the numerous variations between images of the same face. The environment surrounding a face recognition application can cover a wide spectrum from a wellcontrolled environment to an uncontrolled one. frontal and profile photographs are taken. the size and position of the face are normalized approximately to the predefined values and background region is minimal. These face images are commonly called mug shots. face may appear at different orientations and a face can be partially 6 . complete with uniform background and identical poses among the participants. must detect faces in images. Recognition of faces from an uncontrolled environment is a very complex task: lighting condition may vary tremendously. comes from virtually uncontrolled environment. Face detection task is to report the location. Each mug shot can be manually or automatically cropped to extract a normalized subpart called a canonical face image. a task that is done by humans in daily activities. Systems. In a controlled environment. which automatically recognize faces from uncontrolled environment. General face recognition. Problem Definition A general statement of the problem can be formulated as follows: given still or video images of a scene.1. and typically also the size. identify one or more persons in the scene using a stored database of faces. In a canonical face image. of all the faces from a given image and completely a different problem with respect to face recognition.2.
The proposed algorithm. name label) associated with that face from all the individuals in its database in case of occlusions and illumination changes. Since the techniques used in the best face recognition systems may depend on the application of the system. (Often only one image is available per person. Identifying particular people in real time (e. location tracking system). We aim to provide the correct label (e. namely occlusion and illumination changes. there exists only one frontal view of each individual. Although existing methods performs well under constrained conditions. in a police database). We do not consider cases with high degrees of rotation. Database of faces that are stored in a system is called gallery. deals with two of these three important problems. Further. one can identify at least two broad categories of face recognition systems [19]: 1. the problems with the illumination changes.) 2.e. In gallery. i. 7 .g. Finding a person within large database of faces (e.occluded. out of plane rotations and occlusions are still remains unsolved. we assume that a minimal preprocessing stage is available if required.) In this thesis. It is usually not necessary for recognition to be done in real time. depending on the application. (Multiple images per person are often available for training and real time recognition is required. handling facial features over time (aging) may also be required.g. we primarily interested in the first case. Detection of face is assumed to be done beforehand.g.
The algorithm is explained explicitly. Future works. In chapter 2. 8 . we summarize the literature on both human and machine recognition of faces. In chapter 5. Organization of the Thesis Over the past 20 years extensive research has been conducted by psychophysicists. Chapter 3 introduces the proposed approach based on Gabor wavelet representation of face images. are also presented. Performance of our method is examined on four different standard face databases with different characteristics. Simulation results and their comparisons to wellknown face recognition methods are presented in Chapter 4. which may follow this study.1.3. concluding remarks are stated. neuroscientists and engineers on various aspects of face recognition by human and machines.
1. 2. the modeling of this capability. that are potentially relevant to the design of face recognition systems will be summarized One of the basic issues that have been argued by several scientists is the existence of a dedicated face processing system [82. This chapter reviews some of the wellknown approaches from these both fields. In this section some findings. and the apparent modularity of face recognition. 3]. Human Face Recognition: Cognitive Aspects Perceptual and The major research issues of interest to neuroscientists include the human capacity for face recognition. reached as the result of experiments about human face recognition system.CHAPTER 2 PAST RESEARCH ON FACE RECOGNITION The task of recognizing faces has attracted much attention both from neuroscientists and from computer vision scientists. Physiological evidence 9 .
the question is what features humans use for face recognition. Simulations show that the most difficult faces for humans to recognize are those faces. Interestingly. Explanation of face perception as the result of holistic or feature analysis alone is not possible since both are true.indicates that the brain possesses specialized ‘face recognition hardware’ in the form of face detector cells in the inferotemporal cortex and regions in the frontal right hemisphere. which are neither attractive nor unattractive [4]. mouth. Hence. Local features provide a finer classification system for face recognition. It has 10 . eye etc. 11. recognition is not possible for human. It is interesting that when all facial features like nose. while the high frequency components contribute to the finer details required in the identification task [8. As a result of many studies scientists come up with the decision that face recognition is not like other object recognition [42]. The low frequency components contribute to the global description. prosapagnosics. Distinctive faces are more easily recognized than typical ones. In human both global and local features are used in a hierarchical manner [82]. impairment in these areas leads to a syndrome known as prosapagnosia. but in different order than ordinary. 13]. are contained in an image. The results of the related studies are very valuable in the algorithm design of some face recognition systems. while the higher frequency components are used in recognition. retain their ability to visually recognize nonface objects. Information contained in low frequency bands used in order to make the determination of the sex of the individual. although unable to recognize familiar faces.
A negative image of a darkhaired Caucasian. it is difficult to understand why face recognition appears so viewpoint dependent [1]. Although when viewed close up only Tony Blair was seen. for example. They also observe that negation has no significant effect on identification and matching of surface images that lacked any pigmented and textured features. In [42]. This demonstrates that the important information for recognizing familiar faces is contained within a particular range of spatial frequencies. viewed from distance. Bruce and Langton found that negation (inverting both hue and luminance values of an image) effects badly the identification of familiar faces [124]. Another important finding is that human face recognition system is disrupted by changes in lighting direction and also changes of viewpoint. this led them to attribute the negation effect to the alteration of the brightness information about pigmented areas. Kemp et al.also been found that the upper part of the face is more useful for recognition than the lower part [82]. Blair disappears and Margaret Thatcher becomes visible. Although some scientists tend to explain human face recognition system based on derivation of 3D models of faces using shape from shading derivatives. [125] showed that the hue 11 . Bruce explains an experiment that is realized by superimposing the low spatial frequency Margaret Thatcher’s face on the high spatial frequency components of Tony Blair’s face. The effects of lighting change on face identification and matching suggest that representations for face recognition are crucially affected by changes in low level image features. will appear to be a blonde with dark skin.
and hairstyle. these paraphernalia are usually reliably ignored. they do pretty well in ignoring paraphernalia. Curiously. Several other interesting studies related to how children perceive inverted faces are summarized in [6. Young children typically recognize unfamiliar faces using unrelated cues such as glasses. hats. though there was a decrement in recognition memory for pictures of faces when hue was altered in this way [126]. with preserved luminance values. This distinction between memory for pictures and faces is important. Familiar faces presented in ‘hue negated’ versions. were recognized as well as those with original hue values maintained. It is likely that unfamiliar faces are processed inorder to recognize a picture where as familiar faces are fed into the face recognition system of human brain. though the representations of familiar faces seems not to be. The poor identification of other races is not a psychophysical problem but more likely a 12 . 7]. A detailed discussion of recognizing familiar and unfamiliar faces can be found in [41]. these averages may be different for different races and recognition may suffer from prejudice and unfamiliarity with the class of faces from another race or gender [82]. By the age of twelve. when children as young as five years are asked to recognize familiar faces. It is clear that recognition of familiar and unfamiliar faces is not the same for humans. clothes. Humans may encode an ‘average’ face. Humans recognize people from their own race better than people from another race. This suggests that episodic memory for pictures of unfamiliar faces can be sensitive to hue.values of these pigmented regions do not themselves matters for face identification.
but that the overall variation is small [9. 4. Distinctive faces are more easily recognized than typical ones. Those researches illuminate computer vision scientists’ studies. Discussion The recognition of familiar faces plays a fundamental role in our social interactions.psychosocial one. Humans are able to identify reliably a large number of faces and psychologists are interested in understanding the perceptual and cognitive mechanisms at the base of the face recognition process.1. Humans may encode an ‘average’ face. One of the interesting results of the studies to quantify the role of gender in face recognition is that in Japanese population. 13 . The human capacity for face recognition is a dedicated process. majority of the women’s facial features is more heterogeneous than the men’s features. Thus artificial face recognition systems should also be face specific. Both global and local features are used for representing and recognizing faces. 2. 2. 10]. 3. Humans recognize people from their own race better than people from another race. not merely an application of the general object recognition process. We can summarize the founding of studies on human face recognition system as follows: 1. It has also been found that white women’s faces are slightly more variable than men’s.1.
As it is mentioned above an automated face recognition system should be face specific. should pay attention to this familiarity subject. Methods. such as intensity negation. The observations and findings about human face recognition system will be a good starting point for automatic face recognition methods. Certain image transformations. or by just frequently looking at the same face image can we be familiar to that face? Seeing a face in many different conditions is something related to training however the interesting point is that by using only the same 2D information how we can pass from unfamiliarity to familiarity. However. which recognize faces from a single view. It should effectively use features that discriminate a face from others. and changes in lighting direction can severely disrupt human face recognition. Seeing a face in many different conditions (different illuminations.13]. expressions…etc. rotations. the human brain has its shortcomings in the total number of persons that it can accurately ‘remember’.5. The benefit of a computer system would be its capacity to handle large datasets of face images. and more as in caricatures it preferably amplifies such distinctive characteristics of face [5.) make us familiar to that face. 14 . strange viewpoint changes. Difference between recognition of familiar and unfamiliar faces must also be noticed. First of all we should find out what makes us familiar to a face. Using the present technology it is impossible to completely model human recognition system and reach its performance.
explains both biological and manmade flight. 2. In this section we will review existing face recognition systems in five categories: early methods. Designers of face recognition algorithms and systems should be aware of relevant psychophysics and neurophysiological studies but should be prudent in using only those that are applicable or relevant from a practical/implementation point of view. underlying computations within the human visual system are of tremendous complexity. The seemingly trivial task of finding and recognizing faces is the result of millions years of evolution and we are far away from fully understanding how the brain performs this task. Automatic Face Recognition Although humans perform face recognition in an effortless manner. template based approaches. Finally current state of the art of the face recognition technology will be presented. Although a single underlying principle. the Bernoulli effect.2. Up to date. 15 .Some of the early scientists were inspired by watching bird flight and built their vehicles with mobile wings. we note that no modern aircraft has flapping wings. and feature based methods. no complete solution has been proposed that allow the automatic recognition of faces in real images. neural networks approaches. statistical approaches.
the n dimensional Euclidean space. the image of a face is a two dimensional (2D) array of pixel gray levels as.2) is the total number of pixels in the image.j. x.. F(x)=[ f1 (x). However in some cases it is more convenient to express the face image. Featurebased representations are usually more efficient since generally m is much smaller than n.1) or (2. While many previous techniques represent faces in their most elementary forms of (2.2.1..2.. i...f2(.). For a given representation. many others use a feature vector.i.1) where S is a square lattice...x n] T Where n= S (2. x={xi. where f1(.j ∈ S}. (2.. Therefore x∈ Rn.2). x2. how far apart are the faces under the representation and how compact is the representation.. Representation.. two properties are important: discriminating power and efficiency.)...fm(. f2(x).e.. 16 . as onedimensional (1D) column vector of concatenated rows of pixels. At an elementary level. fm(x)]T.. as x=[ x1. Matching and Statistical Decision The performance of face recognition depends on the solution of two problems: representation and matching.) are linear or nonlinear functionals..
A simple way to achieve good efficiency is to use an alternative orthonormal basis of Rn. Specifically, suppose e1, e2,..., en are an orthonormal basis. Then X can be expressed as
n i =1
x=
~e xi i
(2.3)
x where ~i ∆ x, ei
(inner product), and x can be equivalently represented by
~ = [~ , ~ ,..., ~ ]T . Two examples of orthonormal basis are the natural basis used x x1 x 2 xn
in (2.2) with ei=[0,...,0,1,0,...,0]T, where one is ith position, and the Fourier basis
ei = 1 n1 / 2
j 2π i n j 2π 2i n j 2π
( n −1)i
n
T
1, e
,e
,..., e
. If for a given orthonormal basis ~i x
are small when i ≥ m , then the face vector ~ can be compressed into an m x
T dimensional vector, ~ ≅ [~1 , ~2 ,..., ~m ] . x x x x
It is important to notice that an efficient representation does not necessarily have good discriminating power. In the matching problem, an incoming face is recognized by identifying it with a prestored face. For example, suppose the input face is x and there are K prestored faces ck, k=1,2,...,K. One possibility is to assign x to c k0 if
k 0 = arg min x − c k
1≤ k ≤ K
(2.4)
17
where . represents the Euclidean distance in Rn. If ck is normalized so that
ck=c for all k, the minimum distance matching in (2.4) simplifies to correlation
matching
k 0 = arg min x, c k .
1≤ k ≤ K
(2.5)
Since distance and inner product are invariant to change of orthonormal basis, minimum distance and correlation matching can be performed using any orthonormal basis and the recognition performance will be the same. To do this, ~ simply replace x and ck in (2.4) or (2.5) by ~ and c k . Similarly (2.4) and (2.5) x also could be used with feature vectors. Due to such factors such as viewing angle, illumination, facial expression, distortion, and noise, the face images for a given person can have random variations and therefore are better modeled as a random vector. In this case, maximum likelihood (ML) matching is often used,
k 0 = arg min log( p( x  c k ))
1≤ k ≤ K
(2.6)
where p(xck) is the density of x conditioning on its being the kth person. The ML criterion minimizes the probability of recognition error when a priori, the incoming face is equally likely to be that of any of the K persons. Furthermore if we assume that variations in face vectors are caused by additive white Gaussian noise (AWGN)
xk=ck+wk
18
(2.7)
where wk is a zeromean AWGN with power σ2, then the ML matching becomes the minimum distance matching of (2.4).
2.2.2.
Early face recognition methods
The initial work in automatic face processing dates back to the end of the
19th century, as reported by Benson and Perrett [39]. In his lecture on personal identification at the Royal Institution on 25 May 1888, Sir Francis Galton [53], an English scientist, explorer and a cousin of Charles Darwin, explained that he had “frequently chafed under the sense of inability to verbally explain hereditary resemblance and types of features”. In order to relieve himself from this embarrassment, he took considerable trouble and made many experiments. He described how French prisoners were identified using four primary measures (head length, head breadth, foot length and middle digit length of the foot and hand respectively). Each measure could take one of the three possible values (large, medium, or small), giving a total of 81 possible primary classes. Galton felt it would be advantageous to have an automatic method of classification. For this purpose, he devised an apparatus, which he called a mechanical selector, that could be used to compare measurements of face profiles. Galton reported that most of the measures he had tried were fairy efficient. Early face recognition methods were mostly feature based. Galton’s proposed method, and a lot of work to follow, focused on detecting important facial features as eye corners, mouth corners, nose tip, etc. By measuring the relative distances between these facial features a feature vector can be constructed
19
and using ‘best beam intensity’ for the region. this system could accommodate wide variations in head rotation. series fiducial points are detected using relatively simple image processing tools (edge maps. and mouth. Since feature extraction is manually done. By comparing the feature vector of an unknown face to the feature vectors of known vectors from a database of known faces. signatures etc. 20 . more accurate information is extracted by confining the processing to four smaller groups. and contrast. The coarsegrain stage simplified the succeeding differential operation and feature finding algorithms. In Kanade’s work [62].) and the Euclidean distances are then used as a feature vector to perform recognition. In this system.to describe each face. a human operator located the feature points on the face and entered their positions into the computer. nearest neighbor or other classification rules were used for identifying the test face. image quality. Once the eyes. The four regions are the left and right eye. which are rations of distances. scanning at higher resolution. and angles to compensate for the varying size of the pictures. nose and mouth are approximately located. The beam intensity is based on the local area histogram obtained in the coarsegain stage. nose. To eliminate scale and dimension differences the components of the resulting vector are normalized. A set of 16 facial parameters. the closest match can be determined. The face feature points are located in two stages. One of the earliest works is reported by Bledsoe [84]. A simple distance measure is used to check similarity between two face images. Given a set of feature point distances of an unknown person. is extracted. areas. tilt.
2.2.3.
2.2.3.1.
Statistical approaches to face recognition
KarhunenLoeve Expansion Based Methods
2.2.3.1.1. Eigenfaces
A face image, I(x,y), of size NxN is simply a matrix with beach element representing the intensity at that particular pixel. I(x,y) may also be considered as a vector of length N2 or a single point in an N2dimentional space. So a 128x128 pixel image can be represented as a point in a 16,384 dimensional space. Facial images in general will occupy only a small subregion of this high dimensional ‘image space’ and thus are not optimally represented in this coordinate system. As mentioned in section 2.2.1, alternative orthonormal bases are often used to compress face vectors. One such basis is the KarhunenLoeve (KL). The ‘Eigenfaces’ method proposed by Turk and Pentland [20], is based on the KarhunenLoeve expansion and is motivated by the earlier work of Sirovitch and Kirby [63] for efficiently representing picture of faces. Eigenface recognition derives it is name from the German prefix ‘eigen’, meaning ‘own’ or ‘individual’. The Eigenface method of facial recognition is considered the first working facial recognition technology. The eigenfaces method presented by Turk and Pentland finds the principal components (KarhunenLoeve expansion) of the face image distribution or the eigenvectors of the covariance matrix of the set of face images. These eigenvectors can be thought as a set of features, which together characterize the variation between face images.
21
Let a face image I(x,y) be a two dimensional array of intensity values, or a vector of dimension n. Let the training set of images be I1, I2, ..., IN. The average face image of the set is defined by ψ = 1 N
N i =1
I i . Each face differs from the
average by the vector φ i = I i − ψ . This set of very large vectors is subject to principal component analysis which seeks a set of K orthonormal vectors vk,
k=1,...,K and their associated eigenvalues λk which best describe the distribution
of data. Vectors vk and scalars λk are the eigenvectors and eigenvalues of the covariance matrix:
C=
1 N
N i =1
φ iφ iT = AAT ,
(2.9)
where the matrix A=[φ1, φ2,..., φN]. Finding the eigenvectors of matrix Cnxn is computationally intensive. However, the eigenvectors of C can be determined by first finding the eigenvectors of a much smaller matrix of size NxN and taking a linear combination of the resulting vectors.
Cv k = λ k v k
v k Cv k = v k λ k v k
T T
(2.10) (2.11)
since eigenvectors, vk, are orthogonal and normalized vkTvk=1.
v k Cv k = λ k
T
(2.12)
22
1 λk = v k T N
= = = = =
N i =1
φ iφ iT v k
(2.13)
1 N 1 N 1 N 1 N 1 N
N i =1 N i =1 N i =1 N i =1 N i =1
v k φ iφ iT v k
T
(v k φ iT ) T (v k φ iT ) (v k φ iT ) 2 (v k I iT − mean(v k I iT )) 2 var(v k I iT )
Thus eigenvalue k represents the variance of the representative facial image set along the axis described by eigenvector k. The space spanned by the eigenvectors vk, k=1,...,K corresponding to the largest K eigenvalues of the covariance matrix C, is called the face space. The eigenvectors of matrix C, which are called eigenfaces from a basis set for the face images. A new face image Γ is transformed into its eigenface components (projected onto the face space) by:
ω k =< vk , (Γ − φ ) >= vk T (Γ − φ )
(2.14)
For k=1,...,K. The projections wk form the feature vector Ω=[ w1, w2,..., wK] which describes the contribution of each of each eigenface in representing the input image. Given a set of face classes Eq and the corresponding feature vectors Ωq, the simplest method for determining which face class provides the best description
23
ξ (2. by varying illumination. System performs face recognition in realtime. Given N object images taken under P views and L different 24 . They found that their system had the most trouble with faces scaled larger or smaller than the original dataset. which they minimize by multiplying input images with a 2D Gaussian to diminish the contribution of the background and highlight the central facial features. Turk and Pentland’s paper was very seminal in the field of face recognition and their method is still quite popular due to its ease of implementation.15) A face is classified as belonging to class Eq when the minimum ξq is below some threshold θ∈ and also E q = arg min q { q } . Turk and Pentland [20] test how their algorithm performs in changing conditions. size and orientation of the faces.of an input face image Γ is to find the face class that minimizes the Euclidean distance in the feature space: ξq = Ω − Ωq (2. the face is classified as unknown.16) Otherwise. Murase and Nayar [85] extended the capabilities of the eigenface method to general 3Dobject recognition under different illumination and viewing conditions. To overcome this problem they suggest using a multiresolution method in which faces are compared to eigenfaces of varying sizes to compute the best match. Also they note that image background can have significant effect on performance.
the viewbased representation can yield more accurate representation of the underlying geometry. Given N individuals under P different views. Later. one for each face orientation. The eigenface decomposition of this space was used for feature extraction and classification. based on the eigenface decomposition. In order to deal with multiple views. recognition is performed over P separate eigenspaces. the image is projected onto appropriate view space and then recognized. each capturing the variation of the individuals in a common view. a universal image set is built which contains all the available data. The ‘view based’ approach is essentially an extension of the eigenface technique to multiple sets of eigenvectors. This is accomplished by calculating the residual description error (distance from feature space: DFFS) for each view space. Once the proper view is determined. However in order to insure discrimination between different objects the number of eigenvectors used in this method was increased compared to the classical Eigenface method. The view based approach is computationally more intensive than the parametric approach because P different sets of V projections are required (V is the number of eigenfaces selected to represent each eigenspace). Pentland et al [86] developed a ‘view based’ eigenspace approach for human face recognition under general viewing conditions. Naturally. in the first stage of this approach. 25 . the orientation of the test face is determined and the eigenspace which best describes the input image is selected.illumination conditions. In this way a single ‘parametric space’ describes the object identity as well as the viewing or illumination conditions.
Each face image is represented as a linear combination of the eigenfaces.A given test image is approximated by a combination of eigenfaces.1: Appearance model 26 . A distance measure is used to compare the similarity between two images.2. 100 eigenvectors are enough to code a large database of faces. 4.3.A set of eigenfaces is generated by performing principal component analysis (PCA) on the face images. Face Recognition using Eigenfaces There are two main approaches of recognizing faces by using eigenfaces. Approximately.2. p g ΩU ∆ −k S ( p. Appearance model: 1.1.A database of face images is collected 2. 3. g ) Figure 2.2.
Two datasets Ωl and ΩE are obtained by computing intrapersonal differences (by matching two views of each individual in the dataset) and the other by computing extrapersonal differences (by matching different individuals in the dataset). 2.2: Discriminative model Discriminative model: 1. g ) Figure 2.Two datasets of eigenfaces are generated by performing PCA on each class. 27 .p g Ωl ΩE P(Ω1  ∆ ) P(∆  Ω1 ) S ( p. respectively.
performance approaches that of correlation. Two images are determined to be the same individual. as more principal components are used.5.Similarity score between two images is derived by calculating S=P(Ωl∆). Although the recognition performance is lower than the correlation method. In [20]. Eigenfeatures Pentland et al.1. authors reported that the performances level off at about 45 principal components. 2. if S>0. The recognition rates increase with the number of principal components (eigenfaces) used and in the limit. This was argued based on the fact that first components are more influenced by variations of lighting conditions. It has been shown that removing first three principal components results in better recognition performances (the authors reported an error rate of %20 when using the eigenface method with 30 principal components on a database strongly affected by illumination variations and only %10 error rate after removing the first three components). The recognition rates in this case were better than the recognition rates obtained using the correlation method. where ∆ is the difference between a pair of images. and [86]. This can be either a modular or a layered representation of the face. the substantial reduction in computational complexity of the eigenface method makes this method very attractive. 28 .2.3.3. [86] discussed the use of facial features for face recognition.3.
a new face is classified such that this score is maximized. and orientation using an affine transform so that three reference points satisfy a specific spatial arrangement. A simple approach for recognition is to compute a cumulative score in terms of equal contribution by each of the facial feature scores. [87].4. After the facial features in a test image were extracted. The performance of eigenfeatures method is close to that of eigenfaces. For each of the facial features. The position of these points 29 . The KarhunenLoeve Transform of the Fourier Spectrum Akamatsu et.where a coarse (lowresolution) description of the whole head is augmented by additional (highresolution) details in terms of salient facial features. illustrated the effectiveness of KarhunenLoeve Transform of Fourier Spectrum in the Affine Transformed Target Image (KLFSAT) for face recognition. a feature space is built by selecting the most significant eigenfeatures (eigenvectors corresponding to the largest eigenvalues of the features correlation matrix). 2. The eigenface technique was extended to detect facial features. size. al. the original images were standardized with respect to position. however a combined representation of eigenfaces and eigenfeatures shows higher recognition rates.3. First. a score of similarity between the detected features and the features corresponding to the model images is computed.1. Once the cumulative score is determined. More elaborate weighting schemes can also be used for classification.2.
facial expression and orientations are deemphasized.1. the authors proposed a new method for reducing the dimensionality of the feature space by using Fisher’s Linear Discriminant (FLD) [90]. let Ni be the number of samples of class i. Linear Discriminant Methods. The FLD uses the class membership information and develops a set of feature vectors in which variations of different faces are emphasized while different instances of faces due to illumination conditions.3... The eigenface method is applied discussed in the section 2.2. 2.Fisherfaces In [88].is related to the position of some significant facial features. i=1 . the KLFSAT performed better than classical eigenfaces method under variations in head orientation and shifting. to the magnitude of the Fourier spectrum of the standardized images (KLFSAT).3. However the computational complexity of KLFSAT method is significantly greater than the eigenface method due to the computation of the Fourier spectrum.1. [89]. Due to the shift invariance property of the magnitude of the Fourier spectrum. Fisher’s Linear Discriminant Given c classes with a priori probabilities Pi. Then the following positive semidefinite scatter matrices are defined as: 30 .2..c .1.2.3.2. 2..2.
SB is the betweenclass scatter matrix and represents the scatter of the mean µi of class i around the overall mean vector µ.. v k ].17) Sw = Pi j =1 ( x ij − µ i )( x ij − µ i ) T (2.20) Sw is the within –classscatter matrix and represents the average scatter of sample vector of class i.. (2.19) and µ is the overall mean of sample vectors: µ= 1 c i =1 c Ni Ni i =1 j =1 x ij . (2.. v 2 . µi is the mean of class i: 1 µi = Ni Ni j =1 x ij .SB = c i =1 c i =1 Pi ( µ i − µ )( µ i − µ ) T Ni (2.21) 31 . Vopt = arg max v V T S BV V T S wV = [v1 . the Linear Discriminant Analysis (LDA) selects a matrix Vopt∈Rnxk with orthonormal columns which maximizes the ratio of the determinant of the between class scatter matrix of the projected samples. (2.. If Sw is non singular.18) Where x ij denotes the jth ndimensional sample vector belonging to class i.
.k} is the set of generalized eigenvectors of SB and Sw corresponding to the set of decreasing eigenvalues {λi i=1.18).22) As shown in [91].17). because in general matrix Sw is singular. (2.. In [88]... The scatter matrices SB and Sw are defined in Equations (2. pose or facial expressions) are defined to be in the same class and faces of different subjects are defined to be from different classes.2. the 32 . However the matrix Vopt cannot be found directly from Equation (2. the upper bound of k is c1. the number of pixels in each image n is much larger than the number of images in the learning set N.Where {vii=1 . and in general.... The matrix Vopt describes the Optimal Linear Discriminant Transform or the FoleySammon Transform. the Linear Discriminant Transform performs a rotation on a set of axes [v1..e.. (2. There have been presented many solutions in the literature in order to overcome this problem [92. i. The face image in the training set are twodimensional arrays of intensity values.2.. 2. represented as vectors of dimension n.3.21). Different instances of a person’s face (variations in lighting. vk] along which the projection of sample vectors show maximum discrimination.2. Face Recognition Using Linear Discriminant Analysis Let a training set of N face images represents c different subjects...k}. This stems from the fact that the rank of Sw is less than Nc. S B v i = λi S w v i . While the KarhunenLoeve Transform performs a rotation on a set of axes along which the projection of sample vectors differ most in the autocorrelation sense. 93]. v2.
24) and.. applying the standard linear discriminant defined in Equation (2.. q=1.authors propose a method which is called Fisherfaces. the Fisherfaces do not correspond to face like patterns. These feature vectors are used directly for classification.23) [ ] (2.. The problem of Sw being singular is avoided by projecting the image set onto a lower dimensional space so that the resulting within class scatter is non singular. Unlike the Eigenfaces.. V fld = arg max v T V T V pca SBV pcaV T V T V pca SwV pcaV . More formally Vopt is given by: Vopt =VfldVpca. All example face images Eq. The columns of Vopt are orthogonal vectors which are called Fisherfaces. 33 .21) to reduce the dimension to c1.9).25) Where C is the covariance matrix of the set of training images and is computed from Equation (2. (2. (2. This is achieved by using Principal Component Analysis (PCA) to reduce the dimension of the feature space to Nc and then.Q in the example set S are projected on to the vectors corresponding to the columns of the Vfld and a set of features is extracted for each example face image. Where V pca = arg max v V T CV .
26) Therefore. Ε 0 ) . 2 v =1 Σ Ε∈S (Γv − Ε v ) K (2. However. 34 . However the computational complexity of this method is significantly greater than classical Fisherface method due to the computation of the Fourier spectrum. Conf (Γ. The results reported by the authors showed that LDA in the Fourier domain is significantly more robust to variations in lighting than the LDA applied directly to the intensity images. the best match E0 is given by Ε 0 = arg min Ε∈S {D(Γ. in [89] as a measure in the feature space. In [87]. is proposed a weighted mean absolute/square distance with weights obtained based on the reliability of the decision axis. Ε ) = αv .28) where E1 is the second best candidate. Ε ) = 1 − D (Γ. for a given face image Γ. al. the recognition task can be performed by using the Euclidean distance in the feature space. Akamatsu et. Ε)}. Ε 1 ) 0 (2. (2.27) The confidence measure is defined as: D (Γ. applied LDA to the Fourier Spectrum of the intensity image.Having extracted a compact and efficient feature set. (Γv − Ε v ) 2 D ( Γ.
.2. i=1. ur... r i =1 λi vi viT .3. vq).2.1 Singular value decomposition Methods based on the Singular Value Decomposition for face recognition use the general result stated by the following theorem: Theorem: Let Ipxq be a real rectangular matrix Rank(I)=r. In [93]. Zhong revealed the importance of using SVD for human face recognition by proving several important properties of the SV vector as: the stability of the SV vector to small 35 . λ2. 0.2.. V=( v1.....2. ui..3.... 0)... vr.3. 2. then there exists two orthonormal matrices Upxp.y) be a two dimensional (mxn) array of intensity values and [λ1.. v2. Vqxq and a diagonal matrix Σpxq and the following formula holds: I = UΣV T = where U=( u1..29) Σ=diag(λ1. λi2 .. ur+1.q are the eigenvectors corresponding to eigenvalues of IIT and ITI.. Face recognition Using Singular Value Decomposition Let a face image I(x.....>λr>0.3.2. λr] be its singular value (SV) vector. λ1>λ2>.. up)...... λ2.... j=1.. vj.. u2...r are the eigenvalues of IIT and ITI.3....p. λr.3..... i=1. Singular Value Decomposition Methods 2. (2. vr+1.
is the most popular one. It has been shown that it is difficult to obtain the optimal discriminant vectors in the case of small number of samples. circumvented the problem by adding a small singular value perturbation to Sw resulting in Sw(t) such that Sw(t) becomes nonsingular. optimal discriminant vectors. solved the problem by rank decomposition of Sw. Hong [93]. who substitute Sw. However. i.e. 36 . which present c different subjects.17) and (2. Among various transformations of compressing dimensionality.perturbations caused by stochastic variation in the intensity image. it has been shown that compressing the original SV vector into a low dimensional space. Given N face images. the SV vectors are extracted from each image. i.18). Many solutions have been proposed to overcome this problem. However the perturbation of Sw introduces an arbitrary parameter.e. by means of various mathematic transforms leads to higher recognition performances. the proportional variance of the SV vector to proportional variance of pixels in the intensity image. the scatter matrices SB and Sw of the SV vectors are constructed. the invariance of the SV feature vector to rotation transform. and the range to which the authors restricted the perturbation is not appropriate to ensure that the inversion of Sw(t) is numerically stable. the number of samples is less than the dimensionality of the SV vector because the scatter matrix Sw is singular in this case. Cheng et al [92]. the FoleySammon transform based on Fisher criterion. translation and mirror transform. According to Equations (2.by the positive pseudoinverse Sw+. This is a generalization of Tian’s method [94]. The above properties of the SV vector provide the theoretical basis for using singular values as image features.
vk} has been extracted.. When a test image is acquired. Rabiner [69][96]. its SV vector is projected onto the space spanned by {v1. The training set used consisted of a small sample of face images of the same person. HMM are made of two interrelated processes: 37 . .After the set of optimal discriminant vectors {v1. The Frobenius norm is used as a criterion to determine which person the test image belongs. provides an extensive and complete tutorial on HMMs... the feature vectors are obtained by projecting the SV vectors onto the space spanned by {v1. The eigenvalues are tresholded to disregard the values close to zero. v2. . A test image is then projected onto the space spanned by the eigenvectors. v2. vk}. then the average image is given by 1 N N j =1 I ij ..2.. If I ij represents the jth face image of person i.. vk} and classification is performed in the feature space by measuring the Euclidean distance in this space and assigning the test image to the class of images for which the minimum distance is achieved. Eigenvalues and eigenvectors are determined for this average image using SVD. Average eigenvectors (called feature vectors) for all the average face images are calculated. Hidden Markov Model Based Methods Hidden Markov Models (HMM) are a set of statistical models used to characterize the statistical properties of a signal.. . 2..4. v2. Another method to reduce the feature space of the SV feature vectors was described by Cheng et al [95]..
j ≤ N 1≤ i ≤ N (2. The elements of HMM are: N.. VM}. 0 ≤ aij ≤ 1 . V2. the number of different observation symbols. π. If S is the set of states. 1≤ i ≤ N (2.. the observation symbol probability matrix. the number of states in the model. A={aij} where Aij = P[qt = S j  qt −1 = S i ] .  A set of probability density functions associated to each state. SN}. A. (2. then S={ S1. M.. then V={ V1..33) Using a shorthand notation..32) And Qt is the observation symbol at time t. 1 ≤ k ≤ M (2. S2. 1 ≤ j ≤ N. N j =1 1 ≤ i. the initial state distribution. The state of the model qt time t is given by qt∈S. a state transition probability matrix and an initial state probability distribution. unobservable Markov chain with finite number of states.30) aij = 1.. 1 ≤ t ≤ T . If V is the set of all possible observation symbols (also called the codebook of the model). B=bj(k) where. an underlying. π=πi where π i = P[q1 = S i ] . a HMM is defined as: 38 .. where T is the length of the observation sequence (number of frames).. B j (k ) = P[Qt = vk  qt = S j ] .31) B. the state transition probability matrix.
The most general representation of the model probability density function (pdf) is a finite mixture of the form: bi (O) = M k =1 cik N (O. where the observations characterized as discrete symbols chosen from a finite alphabet V={v1. Without loss of generality N(O.vM}. However. 39 . v2.µik. (2. where data is naturally onedimensional (1D) along time axis.34) The above characterization corresponds to a discrete HMM.λ=(A. HMM have been used extensively for speech recognition. Attempts have been made to use multimodel representations that lead to pseudo 2D HMM [98].. the states are characterized by continuous observation density functions... 1≤ i ≤ N (2.π). U ik ). These models are currently used in character recognition [99][100]. Uik) is assumed to be a Gaussian pdf with mean vector µik and covariance matrix Uik. the equivalent fully connected twodimensional HMM would lead to a very high computational problem [97]. In a continuous density HMM.B.35) where cik is the mixture coefficient for the kth mixture in state i.. µ ik .
frontal position. where only transitions between adjacent states in a top to bottom manner are allowed [102]. In [104]. while a disjoint partitioning of the image could result in the truncation of features occurring across block boundaries. The observation sequence O is generated from an XxY image using an XxL sampling window with XxM pixels overlap (Figure 2. the effect of different sampling parameters has been discussed. Assuming that each face is in an upright. With no overlap. This ordering suggests the use of a topbottom model.Figure 2. The overlapping allows the features to be captured in a manner. There is an M line overlap between successive observations. Samaria et al proposed the use of the 1D continuous HMM for face recognition. if a small height of the sampling 40 . features will occur in a predictable order. eyes. mouth and chin [103].3). The states of the model correspond to the facial features forehead. which is independent of vertical position.3: Image sampling technique for HMM recognition In [101]. nose. Each observation vector is a block of L lines.
the set of training observation sequences was segmented into states via the Viterbi algorithm [50]. there is a higher probability of cutting across the features. The training images are collected for each subject in the database and are sampled to generate the observation sequence. A common prototype (state) model is constructed with the purpose of specifying the number of states in the HMM and the state transitions allowed. a maximum likelihood estimate of 41 .4 and is a variant of the Kmeans iterative procedure for clustering data: 1. The general HMM training scheme is illustrated in Figure 2.π) to describe ‘best’.. The result of segmenting each of the training sequences is for each of N states. The goal of this stage is to find a good estimate for the observation model probability matrix B. Given c face images for each subject of the training set.. 3. On the first cycle.B. 2. it has been shown that a good initial estimates of the parameters are essential for rapid and proper convergence (to the global maximum of the likelihood function) of the reestimation formulas. as the window height increases. the observations O={ o1.oT}.window is used. However. matched with each model state and the initial model parameters are extracted.. In [96]. in the sense of maximizing P(Oλ).. A (model initialization). On successive cycles. the goal of the training stage is to optimize the parameters λi=(A. o2. the segmented data do not correspond to significant facial features. the data is uniformly segmented. A set of initial parameter values using the training data and the prototype model are computed iteratively.
i=1. 5.5). If the model distance score falls below the threshold. In order to achieve this. (2. Following the Viterbi segmentation. the model parameters are reestimated using the BaumWelch reestimation procedure. the image is converted to an observation sequence and then model likelihoods P(Otestλi) are computed for each λi. and the overall training loop is repeated.... as V = arg max 1≤i ≤c [P(Otest  λi )]. This procedure adjusts the model parameters so as to maximize the probability of observing the training data.the set of observations that occur within each state according to the current model. given each corresponding model. Recognition is carried out by matching the test image against each of the trained models (Figure 2.. 4. then model convergence is assumed and the final parameters are saved. The resulting model is then compared to the previous model (by computing a distance score that reflects the statistical similarity of the HMMs).36) 42 . If the model distance score exceeds a threshold. then the old model λ is replaced by the ~ new model λ . The model with highest likelihood reveals the identity of the unknown face.c.
43 .4: HMM training scheme The HMM based method showed significantly better performances for face recognition compared to the eigenface method. Extremely encouraging preliminary results (error rates below %5) were reported in [105] when pseudo 2D HMM are used. Furthermore. However the 1D continuous HMM are computationally more complex than the Eigenface method.Model Initialization Training Data Sampling State Sequence Segmentation NO Model Convergence YES Model Parameters Estimation of B Parameters Model Reestimation Figure 2. A solution in reducing the running time of this method is the use of discrete HMM. This is due to fact that HMM based method offers a solution to facial features detection as well as face recognition. the authors suggested that Fourier representation of the images can lead to better recognition performance as frequency and frequencyspace representation can lead to better data separation.
A typical image recognition network requires N=mxn input neurons. For example. if the images are 128x128. a simple network can be very complex and difficult to train.6). Neural Networks Approach In principal. Cottrell and Fleming [107] used two BP nets (Figure 2.λ1 Probability Computation Training Set λ2 Sampling Probability Computation Select Maximum λN Probability Computation Figure 2. one for each of the pixels in an nxm image. the popular backpropagation (BP) neural network [106] can be trained to recognize face images directly.2. However. The first net operates in the autoassociation mode [108] and extracts 44 .5: HMM recognition scheme 2.5.384. the number of inputs of the network would be 16. In order to reduce the complexity.
which operates in the more common classification mode. when the sigmoidal functions at the network nodes are replaced by linear 45 . The autoassociation net has n inputs. In this way. or a feature vector. the hidden layer output h constitutes a compressed version of x. Usually p is much smaller than n. and can be used as the input to classification net. n outputs and p hidden layer nodes.6: Autoassociation and classification networks Bourland and Kamp [108] showed that “under the best circumstances”. The network takes a face vector x as an input and is trained to produce an output y that is a ‘best approximation’ of x. classification net hidden layer outputs Autoassociation net Figure 2.features for the second net.
in order to minimize training error.37) Here. usually p<<n and p<N) and yk (ndimensional). If we pack xk. then above relations can be rewritten as H = F (W1 X ). respectively.40) 46 . When the network is nonlinear. yk and hk into matrixes as in the eigenface case.. W1 (p by n) and W2 (n by p) are corresponding weight matrixes and F(. Y=W2H should be the best rankp approximation to X. 2.39) since Y=W2H. Hence.functions (when the network is linear).. (2.. the feature vector is the same as that produced by the KarhunenLoeve basis. The problem here turns out to be an application of the singular value decomposition. (2. Specifically. the outputs of the hidden layer and output layer for the autoassociation net are hk (pdimensional.38) Minimizing the training error for the autoassociation net amounts to minimizing the Frobenius matrix norm X −Y 2 = n k =1 xk − y k .) is either linear or a nonlinear function. 2 (2. or the eigenfaces. k=1.. y k = W2 hk . the feature vector could deviate from the best. which means W2 H = U p ΛV pT (2. with hk = F (W1 x k ).N. its rank is no more than p. applied ‘component by component’. suppose that for each training face vector xk (ndimensional). Y = W2 H .
[19] which combines local image sampling.41) SinceU p contains the first eigenvectors of XXT. vp]T. when it is trained by the BP algorithm with an nonlinear F(. al. the first 50 principal components of the images are extracted and reduced to 5 dimensions using an autoassociative neural network. v2. a hierarchical neural network. In the 47 . In [109]. One way to achieve this optimum is to have a linear F(.) and to set the weights to W1T = W2 = U p (2. which also are the first p eigenvectors of XXT and XTX..).. a self organizing map neural network.. which is grown automatically and not trained with gradient descent. it must be noted that the autoassociation net.. was used for face recognition by Weng and Huang [110]. generally can not achieve this optimal performance. up]T. Are the first p left and right singular vectors in the SVD of X respectively.. In a different approach.. The resulting representation is classified using a standard multilayer perceptron. Vp =[ v1. The most successive face recognition with neural networks is a recent work of Lawrence et. we have for any input x h = W1 x = U p x (2. u2. and a convolutional neural network...where Up =[ u1.42) which is the same as the feature vector in the eigenface approach. However.
j w+1. xiw. The second method creates a representation of the local sample by forming a vector out of the intensity of the center pixel and the difference in intensity between the center pixel and all other pixels within the square window. The resulting representation becomes partially invariant to variations in intensity of the complete sample and the degree of invariance can be modified by adjusting the weightw ij.corresponding work two different methods of representing local image samples have been evaluated..j.j is the weight of center pixel value xi. The first method simply creates a vector from a local window on the image using the intensity values at each point in the window. wi.jw. Then the vector is given by [xijxiw ..j+w]...j wijxij. then the vector associated with this window is simply [xiw..j+w1. xi+w. xijxi+w..7: The diagram of the Convolutional Neural Network System 48 .. xij..... imag Image Sampling SelfOrganizin g Map Dimensionality Reduction Feature Extractio n Layers Multilayer Perceptron Style Classifier Classificati on Convolutional Neural Network Figure 2.. In each method.. xijxiw.. If the local window is a square of sides 2W+1 long.j+w]. a window is scanned over the image. xi+w.jw+1.. w.j+w1. xijxi+w.. centered on xij.
IGIT is the pixel by pixel product. 2. SOM.The self organizing map.43) Where IG is the gallery image which must be matched to the test image. Template Based Methods The most direct of the procedures used for face recognition is the matching between the test images and a set of training images based on measuring the correlation. is an unsupervised learning process which learns the distribution of a set of patterns without any class information. The matching technique is based on the computation of the normalized cross correlation coefficient CN defined by. usually in a lower dimensional space. 112].6. E is the expectation operator and σ is the standard deviation over the area being matched. The best results for the reduction of the illumination changes 49 . For classification a convolutional network is used. as it achieves some degree of shift and deformation invariance using three ideas: local receptive field. and spatial subsampling. However correlation based methods are highly sensitive to illumination.2. E{I G I T } − E{I G }E{I T } σ {I T }σ {I G } CN = (2.7. introduced by Teudo Kohonen [111. The diagram of the system is shown in Figure 2. This normalization rescales the test and gallery images energy distribution so that their variances and averages match. The SOM defines a mapping from an input space Rn onto a topologically ordered set of nodes. rotation and scale changes. IT. shared weights.
in this method first positions of those features are detected. by looking for the maximum absolute values of the normalized correlation coefficient of these templates at each point in the test image. 50 . The method purposed by Brunelli and Poggio uses a set of templates to detect the eye position in a new image. they are compared to those of gallery faces returning a vector of matching scores (one per feature) computed through normalized cross correlation. In order to handle scale variations. 81. Detection of facial features is also subjected to a lot of studies [36. In order to reduce complexity. 113. After cumulative scores are computed. 114]. In [16].were obtained using the intensity of gradient ( δ X I G + δ Y I G ) . However. and also it must be noted that eyes of different people can be markedly different. Correlation method is computationally expensive. Such difficulties can be reduced by using a hierarchical correlation [115]. five eye templates at different scales were used. After facial features are detected for a test face. This cumulative score can be computed in several ways: choose the score of the most similar feature or sum the feature scores or sum the feature scores using weights. The similarity scores of different features can be integrated to obtain a global score. a test face is assigned to the face class for which this score is maximized. this method is computationally expensive. Brunelli and Poggio describe a correlation based method for face recognition in which templates corresponding to facial features of relevant significance as the eyes. so the dependency of the recognition on the resolution of the image has been investigated. nose and mouth are matched.
) the distance between an observed face x and a gallery face c is the common Euclidean distance d ( x. In this case. 2. The correlation method as described above requires a robust feature detection algorithm with respect to variations in scale. it is important to consider more carefully how a “distance” should be defined. and this distance is sometimes computed using an alternative x ~ orthonormal basis as d ( x.2. Although feature detection is similar to previously described correlation method to handle rotations. 51 . c) = x − c . After face pose is determined.The recognition rate reported in [16] is higher than 96%. c) = ~ − c . templates from different views and different people are used. computational complexity is also increased. including rotations in depth. Moreover the computational complexity of this method is quite high. Feature Based Methods Since most face recognition algorithms are minimum distance classifiers in some sense. neural nets… etc. In the first stage three facial features (eyes and nose lobe) are detected to determine face pose. matching procedure takes place with the corresponding view of the gallery faces. illumination and rotations. as the number of model views for each person in the database is increased. Beymer [116] extended the correlation based approach to a view based approach for recognizing faces under varying orientations.7. In the previous examples (eigenface.
When there is an affine transformation between two faces (shift and dilation). Hence. The feature points are represented by nodes Vi i={1. Feature based approaches can be a solution to the above problems. Comparison of two faces begins with the alignment of those two graphs by matching centroids of the features.c) will not be zero.…}. y. it is very useful to store information only about the key points of the face. The information about a feature point is contained in {S. qi = [Qi ( x. where S represents the spatial locations and q is the feature vector defined by. This method reduces the storage requirements by storing facial feature points detected using the Gabor wavelet decomposition. in a consistent numbering technique. the identification process utilizes the information present in a topological graph representation of the feature points. Moreover. this method has some degree of robustness to rotation in depth but only under restrictedly controlled conditions..44) 52 . y. it also has some shortcomings.θ 1 ). Manjunath et al. In the proposed method by Manjunaht et. again d(x.. q}. al. 3. [36]. [36] proposed a method that recognizes faces by using topological graphs that are constructed from feature points obtained from Gabor wavelet decomposition of the face. when there is local transformations and deformations (x is a “smiling” version of c). in fact it can be quite large.θ N )] (2. As another example..While such an approach is easy to compute. d(x. Moreover. 2. illumination changes and occluded faces are not taken into account.. Qi ( x.c) will not be zero.
where j is the jth of the N neighbors. n ′ refer to nodes in the stored graph O then the two graphs are matched as follows [36]: 1.46) 53 . which might be different either in total number of feature points or in the location of the respective faces.45) 3. Search for the best feature point {Vi’} in O using the criterion S ii ′ = 1 − qi . Ni represents a set of neighbors. Let Ni be the ith feature point { Vi }of I. One is the topological cost and the other is a similarity cost.corresponding to ith feature point.e. After matching. two cost values are evaluated [36]. Let ρ ii′jj′ = min d ij d i′j′ .. If i.y. Let nodes i and j of the input graph match nodes i ′ and j ′ of the stored graph and let j ∈ Ni (i. The vector qi is a set of spatial and angular distances from feature point i to its N nearest neighbors denoted by Qi(x.qi′ = min S im′ m′∈N i q i q i′ (2.θj). the total cost is computed taking into account the topology of the graphs. j refer to nodes in the input graph I and x ′. (2. Vj is a neighbor of Vi). The centroids of the feature points of I and O are aligned. In order to identify an input graph with a stored one. m′. 2. y ′. The neighbors satisfying both maximum number N and minimum Euclidean distance dij between two points Vi and Vj are said to be of consequence for ith feature point. The topology cost is given by d i′j′ d ij Τii′jj′ = 1 − ρ i′jj′ . .
The total cost is scaled appropriately to reflect the possible difference in the total number of the feature points between the input and the stored graph. 6. Let S be the 54 . If ni. directly using the number of feature points of faces can be result in wrong classifications while the number of feature points can be changed due to exterior factors (glasses…etc). occluded cases will cause a great performance decrease. The best candidate is the one with the least cost. [117]. which has roots in aspectgraph matching.48) The recognized face is the one that has the minimum of the combined cost value. no are the numbers of the feature points in the input and stored graph. C ( I . Moreover.O).47) where λt is a scaling parameter assigning relative importance to the two cost functions.4. O ′) O′ n I nO and the scaled cost is . 5. since comparison of two face graphs begins with centroid alignment.O)=sf C1(I. Another feature based approach is the elastic matching algorithm proposed by Lades et al. then scaling factor s f = max C(I. In this method [36]. O * ) = min C ( I . The total cost is computed as C1 = i S ii′ + λt i j∈N i Τii′jj′ (2. respectively. nO n I (2.
c={ i.original twodimensional image lattice (Figure 2. 55 . j∈S } x (2. x={ j.49) where S1 is a lattice embedded in S and ci is a feature vector at position i. An observed face image is defined as a vector field on the original image lattice S. since ci is composed of the magnitude of the Gabor filter responses at position i∈S1. S1 is much coarser and smaller than S.50) where xj is the same type of feature vector as ci but defined on the finegrid lattice S.8). c should contain only the most critical information about the face.8: A 2D image lattice (grid graph) on Marilyn Monroe’s face.i ∈S1 } c (2. Figure 2. The face template is a vector field by defining a new type of representation. The Gabor features ci provide multiscale edge strengths at position i.
51) E (M ) = i − ci .52) Due to the large number of possible matches. after being matched with c. an approximate solution has to be found. x j ci x j +λ i1 . this might be difficult to obtain in an acceptable time.In the elastic graph matching approach [23. Such a match is called elastic since the preservation can be approximate rather than exact. Here x ′ is the part of x that. Finding the best match is based on minimizing the following energy function [117]. A match between x and c can be described uniquely through a mapping between S1 andS. In rigid matching c is moved around in x like conventional template matching and at each position c − x ′ is calculated. Recognizing a face. if i1 and i2 are close in S1 then preserving local geometry means j1=M(i1) to be close to j2=M(i2). the best match should preserve both features and local geometry. the total number of such mappings is Ss1. This is achieved in two stages: Rigid matching and deformable matching [117]. 27.x) is defined through ‘best match’ between x and c. In addition.i2 [(i1 − i2 ) − ( j1 − j 2 )]2 . Hence. then feature preserving means that xj is not much different from ci. 117]. lattice S1 is 56 . In deformable matching. (2. If i∈S1 and j=M(i). the lattice S1 can be stretched unevenly. the distance d(c. (2. does not lead to any deformation of S1. M:S1→S However without restriction.
4. Due to phase rotation. Perform template matching between the c with y using rigid matching. Therefore. Malsburg et al. Pick another face image (of a different person) and generate a face vector field. Collect the feature vectors at vertices of S1 to form the template c. Repeat steps 46. Pick the face image of an individual and generate a face vector field using Gabor filters. When a best match is found. Place a coarse grid lattice S1 ‘by hand’ on x such that vertices are close to important facial features such as the eyes. phase should either be ignored or compensated for its variation explicitly. denoted as y. Denote this as vector field x. 27] developed a system based on graph matching approach on with several major modifications. 5. vectors taken from image points only a few pixels apart from each other have very different coefficients. [23. The procedure is as follows [117]: 1. 3. 6. collect the feature vectors at vertices of S1 to form the template of y. An automated scheme has been shown to work quite well. First. 57 . they suggested using not only magnitude but also phase information of the Gabor filter responses.stretched through random local perturbations to further reduce the energy function. although representing almost the same local feature. 2. 7. There is still a question of face template construction.
These improvements make it possible to extract an image graph from a new face image in one matching process.As mentioned in [23].9.10). which allows matching on face images of previously unseen individuals. Malsburg et al. Second modification is the use of bunch graphs (Figure 2. The face bunch graph is able to represent a wide variety of faces. using phase information has two potential advantages. Computational efficiency. 1. compared to [117]. 2. and ability to deal with different poses explicitly are the major advantages of the system in [23]. 58 . Phase information is required to discriminate between patterns with similar magnitudes. proposed to compensate phase shifts by estimating small relative displacements between two feature vectors to use a phase sensitive similarity function [23]. 2. Since phase varies quickly with location provides a means for accurate vector localization in an image.
(a) (b) Figure 2.10: Bunch graph matched to a face.9: A brunch graph from the a) artistic point of view (``Unfinished Portrait'' by Tullio Pericoli (1985)). b) scientific point of view Figure 2. 59 .
Were expression. age and ethnic origin spanned evenly? 4. 60 . For a better comparison of those algorithms DARPA and Army Research Laboratory established the FERET program (see section 4. Was the subject sample balanced? Where gender. However.4) with the goals of both evaluating their performance and encouraging advances in the technology [83]. By 1993.1. How many subjects there in database? How many images were used for training and testing? 5. because generally experiments are carried out on different datasets.8.2. However the most direct and reliable comparison between different approaches is obtained by experimenting with the same database. Current State of the Art Comparing recognition results of different face recognition systems is a complex task.2. Were the face features located manually? Answering the above questions contributes to building better description of the constraints within each approach operated. the following 5 questions can help while reviewing different face recognition methods: 1. there were several algorithms claiming to have accurate performance in minimally consternated environments. This helps to make a more fair comparison between different set of experimental results. Where subjects allowed wearing glasses and having beards or other facial masks? 3. head orientation and lighting conditions controlled? 2.
the performance of the four algorithms is similar enough that it is difficult or impossible to make meaningful distinctions between them.2). the others require approximate eye locations to operate. Algorithm developed at Rockefeller University. there are three algorithms that have demonstrated the highest level of recognition accuracy on large databases (1196 people or more) under double blind testing conditions.2. It begins by computing Gabor Wavelet transform of the image and does the comparison between images using a graph matching algorithm (section 2. The MIT.1. These are the algorithms from University of Southern California (USC).3. The MIT and USC algorithms have also become the basis for commercial systems. Results of FERET test will be given at section 4.4.7). from USC and MIT are capable of both minimally constrained detection and recognition. 128]. dropped from testing to form a commercial enterprise. and MIT Media Lab [83. However the USC system uses a very different approach. 61 . was an early contender. Rockefeller and UMD algorithms all use a version of the eigenface transform followed by discriminative modeling (section 2. Only two of these algorithms.2.1.Today. University of Maryland (UMD). In FERET testing.
This fact especially gains importance when there is a huge face database. 14. 18. that are selectively tuned to orientation as well as to spatial frequency. 59. in human visual cortex. It was suggested that the response of a simple cell could be approximated by 2 D Gabor filters 62 . Feature based methods are based on finding fiducial points (or local areas) on a face and representing corresponding information in an efficient way. 36. choosing suitable feature locations and the corresponding values are extremely critical for the performance of a recognition system. However. Searching nature for finding an answer has lead researchers to examine the behavior of human visual system (HVS). 17. 62]. 23.CHAPTER 3 FACE REPRESENTATION AND MATCHING USING GABOR WAVELETS Using local features is a mature approach to face recognition problem [11. Physiological studies found simple cells. One of the main motivations of feature based methods is due to: representation of the face image in a very compact way and hence lowering the memory needs. 35.
35. One of the most successful face recognition method is based on graph matching of coefficients which are obtained from Gabor filter responses [83. 2 D Gabor functions are similar to enhancing edge contours. Over the last couple of years. as well as valleys and ridge contours of the image. such graph matching algorithm methods have some disadvantages due to their matching complexity. However.[119]. which are supposed to be the main important points on a face. by using such enhanced points as feature locations. and overall execution time. manual localization of training graphs. A novel Gabor based method may overcome those disadvantages. nose edges. 32]. Hence. a feature map for each facial image can be obtained and each face can be represented with its own characteristics without any initial constrains. 27. dimples. such an approach also enhances moles. Moreover. In this thesis. They use general face structure to generate graphs and such an approach brings the question of how efficient the feature represents the special facial characteristics of each individual. This corresponds to enhancing eye. 23]. mouth. etc. a novel method is proposed based on selecting peaks (highenergized points) of the Gabor wavelet responses as feature points. Having feature maps specialized for each face makes it possible to keep overall face information while enhancing local characteristics. scars. 36. it has been shown that using Gabor filters as the frontend of an automated face recognition system could be highly successful [23. instead of using predefined graph nodes as in elastic graph matching [23] which reduces the 63 .
3. The Gabor functions proposed by Daugman are local spatial bandpass filters that achieve the theoretical limit for conjoint resolution of information in the 2D spatial and 2D Fourier domains. On the theoretical side. 119] that simple cells in the visual cortex can be modeled by Gabor functions. an important insight has been advanced by Marcelja [122] and Daugman [118.1. Face Representation Using Gabor Wavelets Gabor Wavelets Since the discovery of crystalline organization of the primary visual cortex in mammalian brains thirty years ago by Hubel and Wiesel [121]. the conjoint timefrequency domain for 1D signals must necessarily be quantized so that no signal or filter can occupy less than certain minimal area in it. Feature vectors are constructed by sampling Gabor wavelet transform coefficients at feature points. In the following sections. Then the proposed algorithm will be explained. explicitly. 3. an enormous amount of experimental and theoretical research has greatly advanced our understanding of this area and the response properties of its cells. Gabor functions first proposed by Dennis Gabor as a tool for signal detection in noise.1. Gabor [120] showed that there exists a “quantum principle” for information. However. the details about the Gabor wavelet transform will be presented while giving the reasons of using it for face recognition.1. there is a trade off between time resolution and frequency 64 .representative capability of Gabor wavelets.
The center frequency of ith filter is given by the characteristic wave vector. Daugman [118.resolution.1) determines the oscillatory part of the kernel. 65 .1) Each ψi is a plane wave characterized by the vector k i enveloped by a Gaussian function. and the second term compensates for the DC value of the kernel. For such a case. where σ is the standard deviation of this Gaussian. ki = k ix k v cosθ µ = .θµ ). Subtracting the DC response. rediscovered and generalized to 2D. Gabor filters. are now being used extensively in various computer vision applications.2) having a scale and orientation given by (kv. kiy k v sinθ µ (3. the original Gabor elementary functions are generated with a fixed Gaussian. Gabor filters becomes insensitive to the overall level of illumination. The first term in the brackets (3. 119] generalized the Gabor function to the following 2D form in order to model the receptive fields of the orientation selective simple cells: Ψi ( x ) = ki 2 σ 2 e − ki 2 x 2 2 2σ e jki x − e − σ2 2 (3. while the frequency of the modulating wave varies. Gabor discovered that Gaussian modulated complex exponentials provide the best trade off.
(b) their coverage of spatial frequency plane 66 . 122] have proposed that an ensemble of simple cells is best modeled as a family of 2D Gabor wavelets sampling the frequency domain in a logpolar manner. (a) (b) Figure 3.Recent neurophysiological evidence suggests that the spatial structure of the receptive fields of simple cells having different sizes is virtually invariant. This class is equivalent to a family of affine coherent states generated by rotation and dilation. The decomposition of an image I into these states is called the wavelet transform of the image: Ri ( x ) = I ( x ′)Ψi ( x − x ′)dx ′ (3.3) Where I (x ) is the image intensity value at x . Daugman [118] and others [121.1: (a) an ensemble of Gabor wavelets.
B.2). (c) eyebrow.Each member of this family of Gabor wavelets models the spatial receptive field structure of a simple cell in the primary visual cortex. Manjunath [36] et al 67 . line endings and sharp changes in curvature. it still leaves plenty of room for improvement. Since such curves correspond to some lowlevel salient features in an image. (b) nose. (d)jawline. More recently. one of the most important application areas for 2D Gabor wavelet representation is face recognition (see Section 2. S. government activity (FERET program) to find the best face recognition system. these cells can be assumed to form a low level feature map of the intensity image (Figure 3. Utilization of the 2D Gabor wavelet representation in computer vision was pioneered by Daugman in the 1980s.4). Due to the endinhibition property of these cells. and receptive fields that are matched to the local features of the face (a) mouth. Although the recognition performance of this system shows qualitative similarities to that of humans by now means. The Gabor decomposition can be considered as a directional microscope with an orientation and scaling sensitivity. a system based on Gabor wavelet representation of the face image performed among other systems on several tests. (e) cheekbone. (a) (b) (c) (d) (e) Figure 3.2: Small set of features can recognize faces uniquely. Since the Gabor wavelet transform is introduced to computer vision area.2. they response to short lines. In a U.S.
2D Gabor Wavelet Representation of Faces Since face recognition is not a difficult task for human beings.1. 3. 68 . Gabor filters.1) [12]. are simply plane waves restricted by a Gaussian envelope function (3. modeling the responses of simple cells in the primary visual cortex.3: Gabor filters correspond to 5 spatial frequency and 8 orientation. studies for Gabor wavelet representation in the field of face recognition by using is continued with appending dynamic link architecture [117] and elastic graph matching [23] to the previous system.has developed a face recognition system based on this representation. selection of biologically motivated Gabor filters is well suited to this problem. Afterwards.2. Spatial frequency varies Orientation varies Figure 3.
an input face image and the amplitude of the Gabor filter responses are shown.4: Example of a facial image response to above Gabor filters.7) captures the whole frequency spectrum. Convolving the image with complex Gabor filters with 5 spatial frequency (v = 0. a) original face image (from Stirling database). both amplitude and phase (Figure 3. and b) filter responses. (a) Spatial frequency varies Orientation varies (b) Figure 3.An image can be represented by the Gabor wavelet transform allowing the description of both the spatial frequency structure and spatial relations.3).4.4) and 8 orientation (µ = 0.….…. In Figure 3. 69 .
Feature point localization In this step. 35].1. 3. 70 . highenergized points can be used in comparisons which forms the basis of this work. feature vectors are extracted from points with high information content on the face image.1. (2) Feature vector computation. 25. which are also the features that people might use for recognizing faces (Figure 3. Feature extraction Feature extraction algorithm for the proposed method has two main steps (Figure 3.6): (1) Feature point localization. such as dimples. etc.1. This approach not only reduces computational complexity. [23.One of the techniques used in the literature for Gabor based face recognition is based on using the response of a grid representing the facial topography for coding the face. 3. However.. moles. but also improves the performance in the presence of occlusions. we do not fix the locations and also the number of feature points in this work. Instead of using the graph nodes. nose and mouth. The number of feature vectors and their locations can vary in order to better represent diverse facial characteristics of different faces.3. facial features are assumed to be the eyes. In most featurebased methods.5). 26.3.
y0). y0 ) = max ( R j ( x. 71 . instead of finding the peaks of the responses.4) 1 R j ( x0 .5) is applied in order not to get stuck on a local . peaks are found by searching the locations in a window W0 of size WxW by the following procedure: A feature point is located at (x0.5: Facial feature points found as the highenergized points of Gabor wavelet responses. y )) ( x .5) where Rj is the response of the face image to the jth Gabor filter (3. y ) . y0 ) > N1 N 2 j=1. From the responses of the face image to Gabor filters.Figure 3.3). (3.40 N1 N 2 x =1 y =1 R j ( x. W0 is at (x0. maximum. A feature map is constructed for the face by applying above process to each of 40 Gabor filters. In our experiments a 9x9 window is used to search feature points on Gabor filter responses. Window size W is one of the important parameters of proposed algorithm. N1 N2 is the size of face image. if R j ( x0 . y0). the center of the window. y )∈W0 (3. and it must be chosen small enough to capture the important features and large enough to avoid redundancy Equation (3.….
6: Flowchart of the feature extraction stage of the facial images.start Image GWT Find feature points Feature vectors 1 2 3 4 5 6 7 40 41 42 xi yi end Gabor wavelet transform coefficients Coordinates of the feature point Figure 3. 72 .
v i . y k ) j = 1. j ( x k . allow representing both the spatial frequency structure and spatial relations of the local image region around the corresponding feature point.3. The remaining 40 components are the samples of the Gabor filter responses at that point. 73 . k = x k .40 . Since we have no other information about the locations of the feature vectors.1. feature vectors have 42 components. { } (3. as the samples of Gabor wavelet transform at feature points. Ri . Although one may use some edge information for feature point selection. here it is important to construct feature vectors as the coefficients of Gabor wavelet transform. Feature vector generation Feature vectors are generated at the feature points as a composition of Gabor wavelet transform coefficients.. kth feature vector of ith reference face is defined as... The first two components represent the location of that feature point by storing (x..3. y k .7) While there are 40 Gabor filters.2. y) coordinates. the first two components of feature vectors are very important during matching (comparison) process... Feature vectors.
2. In [23] similarity function at (3. 27].8). Equation (3.8) is a very common similarity measure between feature vectors. The location information is not used for vector similarity calculation. j (l ) 2 Si(k. k (l ) 2 l (3.k). S i ( j.3. where l is the number of vector elements. Matching Procedure Similarity Calculation In order to measure the similarity of two complex valued feature vectors.2).j).j) represents the similarity of jth feature vector of the test face. but sometimes we might have small variations [23. It must be clarified that the similarity function (3. to kth feature vector of ith reference face. vi .1.42. Proposed similarity measure between two vectors satisfies following constrains: 0 < Si < 1.2. j ) = l l . and if ith gallery face image is used also as the test image. (vi. Location information of feature vectors will also be used during matching. j (l ) S i (k . containing Gabor wavelet transform coefficients [36]. l = 3.2. but only the magnitudes of the wavelet coefficients are take place at (3. k (l ) vt .8) is used 74 .….8) is only one component of the proposed matching procedure (Section 3. j ) = 1 . following similarity function is used which ignores the phase: vi . 3.8) vt . (vi.
Moreover here topology of face is 75 . After such a localization. we may disregard the location information in the second step. Then phase can either be ignored or compensated as in [23]. In the early experiments it is observed that small spatial displacements cause change in complex valued coefficients due to phase rotation. Although phase compensated similarity function is found to increase recognition performance significantly [23.2. we eliminate the feature vectors of the reference images which are not close enough to the feature vectors of the test image in terms of location and similarity. yt) represents the location of a feature point on a reference face and test face respectively. similarity function without phase is chosen to avoid computational complexity. In the first step.27]. This comparison stage takes place in two steps. yr) and (xt. Face comparison After feature vectors are constructed from the test image. Comparing the distances between the coordinates of the feature points simply avoids the matching of a feature point located around the eye with a point of a reference facial image that is located around the mouth. Only the feature vectors that fit the following two criterions are examined in the next step. they are compared to the feature vectors of each reference image in the database.2. where th1 is the approximate radius of the area that contains either eye. mouth or nose. (xr − xt )2 + ( yr − yt )2 < th1 . (xr. 1. 3.with complex valued coefficients and an additional phase compensating term.
8). if th1 is too large then the face topology information could be totally wrong. Although this thresholding seems to be done very roughly it reduces the number of feature vectors at the gallery faces and increases the speed of algorithm at the following steps. such as different expression. we are left with Nk feature vectors of the reference face and Nkj of them are known to be close enough in terms of both location similarity to the jth feature vector of the test face. small variations in th1 and th2 will not effect the performance of the method. increasing th1 gives more area for searching the feature points with similarities larger than th2. possible 76 .j)>th2. This can be useful when the locations of features changed due to some reasons. increasing th2 in a large extent will result in finding no match. and conversely decreasing th2 could result in a redundant feature vector that will increase the computational cost. 2. By keeping th1 constant. where th2 is chosen as the standard deviation of similarities of all feature vectors in the reference gallery and the similarity of two vectors is computed by Equation (3. In other words. However. Similarity of two feature vectors is greater than th2. As a result of the first step. By changing th1 and th2 one can control the topology and vector similarity costs. Si(k. Hence. One can choose these values stated at above steps 1 and 2. However.also examined to use corresponding information at the final matching by only letting feature points that are match each other in a topological manner.
(3.9) Simi.matches are determined to each of the feature vector on the test face from the feature vectors of gallery faces. If ith gallery image used also as the test image.9) is applied as a second elimination of feature vectors of the reference face in order to guarantee surviving at most only one feature vector of a reference face to be matched with a feature vector of the test face: Simi . j = max ( S i (l .10). it can further be improved by considering the number of feature vectors. l∈N k . at the end of the first step. j )) . then OSi will be equal to 1. OSi = mean{Simi . Eventually. Equation (3. there could be no remaining feature vector from the gallery faces to be matched to a feature vector of the test face.j gives the similarity of the ith reference face to the test face based on the jth feature vector. When such a case occurs. j }. In the second step.9) In Equation (3.10) OSi represents the overall similarity of test face to ith reference face. and its values change from 0 to 1. Although OS gives a good measure for face similarity. that feature vector of test face is ignored and matching procedure is continued with others. the overall similarity of each reference face is computed as the mean of feature vector similarities that passed the two steps (3. After elimination of feature vectors in the gallery. It should be noted that the number of feature points for any of two face image is not equal even for images 77 . j (3.
85 as the mean of 20 vector similarities should be more valuable than OS=0. by only counting the number of feature vectors of a gallery face that have high similarity to a feature vector of the test face: Ci = j δ ( Sim i .) is the Delta Dirac function.from the same individual. glasses/no glasses. l (3. Recognition results achieved by both comparing only 78 . As an example. numbers of matched feature vectors of each gallery face gives an information about the topological matching.11) (3. In this representation for ideal results. j )) .95 as the mean of 2 vector similarities.11). Since the number of feature vectors that give the overall similarity (3. The reason is due to the huge variation of the number of feature vectors (e.g. number of reference faces where Nt is the total number of feature vectors of test face. and δ(.10) is not determined by OS. The simulation results are shown in Figure 3. etc. OS=0. For each feature vector of test face we sort reference faces by means of similarity. Cl = N t .…. In order to emphasize the information contained by the number of matched feature points. illumination changes. a new parameter C is computed.12) l=1. j = max( Sim l .7.) from face to face and also for the same face under different conditions. it is not very meaningful to use it alone. the corresponding figure must be a stair case function whose horizontal width includes the test image for each person in the gallery and vertical level is the correct result. and count the number of times that each reference face gets the first place (3. Moreover.
one can use the number of feature vectors of reference faces as an additional criteria.7a) and comparing only number of times having maximum similar feature vector (Figure 3. In order to find the best match. Having more feature points jth reference face gets maximum similarity at more points than ith does. (Ci < Cj <Nt) and( Cj<Nj).13) where Ci is the number of feature vectors of ith reference face that have maximum similarity to a feature vector of test face.7b are better.7. Ni is the number of feature vectors of ith reference image. the weighted sum of these two results can be used. Although results achieved in Figure 3. This improves the performance.7b) are presented in Figure 3. having the number of feature vectors Ni. We can explain this better by an example: Assume that there are two reference faces i and j. 79 . In the case. successive matching can be done by using the expectation that if ith reference face is correct it should get high values for both Ci and OSi. in case different false matches are obtained by using individual results of OS and C. If ith gallery image used also as the test image FSFi will be equal to unity. Nj and Nt respectively. the best candidate match is searched to maximize the following function. having glasses…etc. FSFi = α OS i + β Ci .similarities (Figure 3. ith reference face reaches the maximum similarity at its all feature points. (Ci=Ni).) than the corresponding reference face. a problem occurs when a test face has more feature vectors (due to illumination changes. However. Hence. and a test face. Ni (3.
11) becomes useless.14) l=1. Ci = j δ ( Sim i . Equation (3. it is seen that for larger databases counting only the maximum similar feature vectors of gallery faces as in (3. Instead.11) can be generalized for large databases by counting the number of feature vectors of each gallery face which is in the first %10 of similarity rank.….number of reference faces . 80 . (3. j )) .Although for the corresponding database %100 recognition result is achieved (Figure 3.7c). j = rank 10 ( Sim l .
a) only similarities.45 40 35 30 25 20 15 10 5 100 200 300 400 500 600 (a) 45 40 35 30 25 20 15 10 5 100 200 300 400 500 600 (b) 45 40 35 30 25 20 15 10 5 100 200 300 400 500 600 (c) Figure 3. c) both similarities and number of maximum similar feature vectors. matching face from gallery (148) by comparing. 81 . b) only number of maximum similar feature vectors.7: Test faces (1624) vs.
Purdue [29]. Purdue face database [29] contains occluded cases (some obstacles on face).1. Stirling face database [28] is used to measure the robustness against the expression changes. one frontal face image with neutral pose of each individual is placed in the gallery and the others are placed in the probe (test) set. Each probe image is matched against the data in the gallery. Simulation Setup The proposed method is tested on four different face databases: Stirling [28]. Head pose variations and images that 82 . and the ranked matches are analyzed.CHAPTER 4 RESULTS 4. It must be noted that none of the gallery images is found in the probe set. For each dataset. ORL [30] and FERET [128] face databases. illumination…etc.). Each database more than one face image with different conditions (expression. Each of the above mentioned databases have different characteristics to test the proposed algorithm. of each individual.
1: Examples of different facial expressions of two people from Stirling database.were taken at different times have been included in ORL face database [30]. 17 male). In the following section. Results for University of Stirling face database University of Stirling face database contains grayscale facial images of 35 people (18 female. but also challenging benchmark of all face recognition systems.1. FERET is not only a large face database (1196 individuals). detailed information for those four face databases and their corresponding performance results for the proposed face recognition method are given with the comparisons with some major face recognition methods. a) gallery faces. b) probe faces. whereas two other expressions (smiling and speaking) are 83 . (a) b () Figure 4. Finally FERET face database [128] is used. This gives us an opportunity to compare the performance of our face recognition method by the others using a standardized dataset and test procedure. The neutral views are placed in the gallery. 4.1.1). in 3 poses and 3 expressions with constant illumination conditions (Figure 4. we used three frontal images with different expressions for each of 35 persons. In the experiments.
3.1.4 respectively [31]. It contains over 4. 4.). The pictures were taken at the CVC under strictly controlled conditions. No restrictions on wear (clothes. however proposed method is more robust to illumination changes as a property of Gabor wavelets (Section 3. 89. In this database the proposed algorithm gives 100% correct recognition.used as probes. 84 . For each individual. and occlusions (sunglasses and scarf). etc. recognition performances of eigenface and a related eigenhills method for this database is reported as 82. were imposed to participants.1. Eigenhills and eigenfaces methods were highly effected by illumination changes. glasses. An example of 13 different facial images for a typical person from Purdue database is shown in Figure 4. Moreover as a result of using local distinct features. Even none of the occluded cases are included. Results for Purdue University face database Purdue University face database was created by Alex Martinez and Robert Benavente in the Computer Vision Center (CVC) at the U.000 color images corresponding to 126 people's faces (70 men and 56 women). etc. makeup. Images consist of frontal view faces with different facial expressions. 40 male) individuals in our test setup due to some availability problems of database.1). it is observed that 100% of people are correctly classified. illumination conditions.A.2. We have used images of first 58 (18 female. hairstyle. instead of a face template or local features bounded by a graph.B. facial image with neutral expression is placed in the gallery and the remaining 12 with different conditions are used as probe. We did not reach any reported performance results on this database. After simulations.2.
Figure 4. In the proposed algorithm the facial features are compared locally. nose and any other features rather than eyes. Since the simulation results on this database are encouraging. we pass to ORL database on which simulation results of other wellkonwn face recognition methods are reported.0 Table 4. the algorithm compares faces in terms of mouth.4 100.3 89. eigenhills and proposed method on the Purdue face database.1: Recognition performances of eigenface. instead of using a general structure.2: Example of different facial images for a person from Purdue database. For example.proposed method gives a high performance result on occluded cases. Method Recognition rate (%) Eigenface [20] Eigenhills [31] Proposed face recognition method using GWT 82. first image is used for gallery and the rest 12 are for probe set. 85 . when there are sunglasses. hence it allows us to make a decision from the parts of the face.
86 .Figure 4.3: Whole set of face images of 40 individuals 10 images per person.
For some subjects. Locations of the facial features (eyes. These results were expected. facial expressions (open / closed eyes. mouth…. 87 . smiling / not smiling).4 erroneously classified faces of ORL database are presented. the reported recognition rates are better for convolutional neural network and linebased methods.) are quite changing with rotation of the head. We took the first images for each 40 individuals as reference and the rest is used for testing purposes.2. nose. since locations are important in feature vectors comparison.5). Although. it must be noted that these two are using more than one facial image for each individual in training (Figure 4. Recognition performances on ORL database of wellknown methods are tabulated in Table 4.3 shows the whole set of 40 individuals 10 images per person from the ORL database. There are ten different images of each of 40 distinct subjects.25% correct classification with ORL database. varying lighting. It is observed that the performance of the method decreased slightly due to the orientation in depth of the head. Results for The Olivetti and Oracle Research Laboratory (ORL) face database The Olivetti and Oracle Research Laboratory (ORL) face database is also used in order to test our method in the presence of head pose variations.1. The proposed method achieved 95. facial details (glasses / no glasses) and head pose (tilting and rotation up to 20 degrees). In Figure 4. Figure 4. The proposed method achieves %95.25 correct recognition rate by using only one reference facial image for each individual. All the images were taken against a dark homogeneous background.4.3. the images were taken at different times.
7 95. a) reference faces for two individuals.2: Performance results of wellknown algorithms on ORL database.0 96. (a) (b) Figure 4. b) misclassified test faces. 88 .Method Recognition Performance (%) Eigenface [20] Elastic graph matching [23] Neural network [19] Line based [33] Proposed face recognition method using GWT 80.0 80.2 97.4: Examples of misclassified faces of ORL database.25 Table 4.
4. Primary objectives of FERET test can be stated as: 1.faces for training faces as probes Figure 4. 3. assess the state of the art 2. The Face Recognition Technology (FERET) program has addressed both issues through the FERET database of facial images and the establishment of the FERET tests. identify future areas of research measure algorithm performance The FERET database has made it possible for researchers to develop algorithms on a common database and to report results to the literature based on this database. 14126 images from 1199 individuals are included in the database.5: Example of different facial images for a person from ORL database that are placed at training or probe sets by neural network and line based algorihms. Up to date. The results that exist in the literature do not provide a direct comparison between algorithms. 4.1. Results for FERET Until recently. there did not exist a common face recognition technology evaluation protocol which includes large databases and standard evaluation methods. since each researcher reports results using 89 .
Hence. time or number of workstations is not recorded for any of the algorithms. The gallery is the set of known individuals. gallery and probe. The main test design principals are: 1. 2. Algorithms can not be trained during testing. In a closed universe. With the above principles. every probe is in the gallery. An image of an unknown face presented to the algorithm is called aprobe and the collection of probes is called the probe set. and images. Each facial image is treated as a unique face. Although. The testing protocol is based on a set of design principals. a similarity between each probe and each gallery image must be generated for performance evaluation. The basic models for evaluating the performance of an algorithm are closed and open universes. scoring methods. Stating the design principle allows one to assess how appropriate the FERET test is for a particular face recognition algorithm. in an open universe some probes are not in the gallery. 3. a time limit for a complete test is assigned as three days while using less than 10 UNIX workstations. The open 90 . There are two sets of images. Whereas. not code optimization. . FERET program aims to encourage algorithm development.different assumptions. The independently administered FERET test allows for a direct quantitative assessment of the relative strengths and weaknesses of different approaches. The similarity score between probe and a gallery image is a function of only those two images.
The Duplicate I probe images were obtained anywhere between one minute and 1031 days after their respective gallery matches. fafc) of 14 algorithms that attend to the FERET test and proposed face recognition method are presented. Finally. There is usually only a few seconds between the capture of the galleryprobe pairs. they are those taken only at least 18 months after their gallery entries. On the other hand. and the vertical axis is the percentage of correct matches. Table 4. The performance of the algorithm is evaluated on different probe categories. Cumulative match scores for each of 4 probe sets (dupI. All of the latest results of FERET test is presented in Table 4. Note that all the above tests used a single gallery containing 1196 images. the question is not always “Is the top match is correct?” but “Is the correct answer in the top n matches?”.3. dupII. images of subjects with alternate facial expressions (fafb) have been used. some images are obtained under different illumination conditions and gathered under another set (fafc). For images taken with a different 91 .universe models verification applications. the closed universe model allows one to ask how good an algorithm is at identifying a probe image. typical first choice recognition performance is 95% accuracy. On frontal images taken the same day.4 shows different simulations performed using these sets.5. fafb. For assessment of the effect of facial expression. The harder Duplicate II probe images are a strict subset of the Duplicate I images. The performance statics are reported as cumulative match scores and also the rank is plotted along the horizontal axis. All these sets are tabulated in Table 4. Such an approach lets one to know how many images have to be examined to get a desired level of performance.
4: Probe sets for FERET performance evaluation. Number of faces Evaluation Task Recognized Names Gallery Probe Set Aging of subjects Aging of subjects Facial Expression Illumination Duplicate I Duplicate II fafb fafc 1196 1196 1196 1196 722 234 1195 194 Table 4. typical performance drops to 80% accuracy for the first choice recognition. the typical accuracy is approximately 50% (Table 4. even 50% accuracy is 600 times better than a random selection from 1196 faces. For images taken one year later.5). 92 .3: Probe sets and their goal of evaluation. Note that. Study Goal Dup1 Dup2 fafb fafc Duplicate I Duplicate II expression illumination Table 4.camera and lighting.
mit_mar_95 is the algorithm is the same algorithm that was tested in March 1995 [86]. These algorithms are developed by U. rotated. (2) faces were masked to remove background and hair. (Carlsbad. and (3) the nonmasked facial pixels were processed by a histogram equalization algorithm. algorithm retested in order to measure improvements. Algorithms are also tested from Excalibur Corp. There are two algorithms developed by MIT Media Laboratory using a version of eigenface transform. rotated. ef_hist_dev_md. ef_hist_dev_anm. In the implementation of the PCAbased algorithm. Faces were represented by their projection onto the first 200 eigenvectors and were identified by a nearest neighbor classifier using the L1 metric. ef_hist_dev_ang. ef_hist_dev_l2. The training set consisted of 500 faces. ef_hist_dev_ml1. ef_hist_dev_l1. Arl_ef is a principal components analysis (PCA) based algorithm [20]. ef_hist_dev_ml2 are seven eigenface based system from National Institute of Standards and Technology (NIST) with a common image processing front end and eigenface representation but differing in the distance metric. and from University of Maryland (umd_mar_97) [129. and scaled so that the center of the eyes were placed on specific pixels. and mit_sep_96 is the algorithm developed since March 1995 [22]. And finally the 93 .S.Arl_cor is a normalized correlation based algorithm [127]. all images were (1) translated. 130]. and scaled so that the center of the eyes were placed on specific pixels and (2) faces were masked to remove background and hair. Army Research Laboratory and they provide a performance baseline. For normalized correlation. CA). the images were (1) translated.
950 0.521 0.576 0.222 0.331 0.772 0.041 0.305 0.usc_mar_97 is an elastic graph matching algorithm [23] from Southern California University. In Figures 4.591 0.363 0.072 0. recognition performances of various face recognition methods are presented.171 0.414 0.320 0. Simulation results show that.258 0.128 0.797 0.155 0.5: FERET performance evaluation results for various face recognition algorithms.237 0.171 0.701 0.962 0.963 fafc 0. etc.392 0.124 0.733 0.948 0.232 0. In Figure 4.588 0.448 dupII 0.827 0.128 0.209 0. Moreover.309 0.772 0.346 0.6 to 4.422 0. face recognition performance of the proposed method is competitive to those of other popular methods.338 0.10 average and in Figure 4.342 0.239 fafb 0.137 0.410 0.11 current upper bound identification performances of above methods on each probe set are presented. such as elastic graph matching. There can be also seen the improvements on algorithms’ performances from September 1996 to March 1997.774 0.472 0.820 0.052 0.794 0.341 0.186 0.132 0.676 Table 4.834 0.741 0.350 0.716 0.167 0.197 0. Moreover. on fafb and fafc probe sets proposed 94 . eigenfaces.446 0.9.209 0. First Choice Recognition Ratio arl_cor arl_ef ef_hist_dev_ang ef_hist_dev_anm ef_hist_dev_l1 ef_hist_dev_l2 ef_hist_dev_md ef_hist_dev_ml1 ef_hist_dev_ml2 excalibur mit_mar_95 mit_sep_96 umd_mar_97 usc_mar_97 Proposed face recognition method using GWT dupI 0.216 0.
95 .15 cumulative performance results of proposed method on each probe set are presented. In Figure 4.114.5).method achieves higher performance results than the most of the FERET test contenders (Table 4.
96 .6: Identification performance against fb probes.Figure 4. (a) algorithms tested in September 1996. (b) algorithms tested in March 1997.
7: Identification performance against duplicate I probes. (b) algorithms tested in March 1997. 97 .Figure 4. (a) algorithms tested in September 1996.
Figure 4. (b) algorithms tested in March 1997.8: Identification performance against fc probes. (a) algorithms tested in September 1996. 98 .
Figure 4. 99 . (a) algorithms tested in September 1996.9: Identification performance against duplicate II probes. (b) algorithms tested in March 1997.
100 . Figure 4.11: Current upper bound identification performance of FERET contender for each probe category.Figure 4.10: Average identification performance of FERET test contenders on each probe category.
101 .Figure 4. Figure 4.12: Identification performance of proposed method against fb probes.13: Identification performance of proposed method against fc probes.
15: Identification performance of proposed method against duplicate II probes. 102 .14: Identification performance of proposed method against duplicate I probes. Figure 4.Figure 4.
CHAPTER 5 CONCLUSION AND FUTURE WORK Face recognition has been an attractive field of research for both neuroscientists and computer vision scientists. they also should be prudent in using only those that are applicable or relevant from a practical/implementation point of view. Since 1888. namely 103 . currently two biologically inspired methods. Those researches illuminate computer vision scientists’ studies. Although none of them could reach the human recognition performance. Humans are able to identify reliably a large number of faces and neuroscientists are interested in understanding the perceptual and cognitive mechanisms at the base of the face recognition process. Although designers of face recognition algorithms and systems are aware of relevant psychophysics and neurophysiological studies. many algorithms have been proposed as a solution to automatic face recognition.
instead of directly using pixel gray values. the elastic graph matching approach is less attractive for commercial systems. Moreover. As a result system becomes sensitive to illumination changes. Although using 2D Gabor wavelet transform seems to be well suited to the problem. as the local information is extracted from the nodes of a predefined graph. In this thesis. The method uses Gabor wavelet transform for both finding feature points and extracting feature vectors. since Gabor wavelet transform of images is being used. Eigenfaces algorithm has some shortcomings due to the use of image pixel gray values. a new approach to face recognition with Gabor wavelets is presented. unless a universal database exists. it is seen that 104 . graph matching makes algorithm bulky.eigenfaces and elastic graph matching methods. have reached relatively high recognition rates. Unlike the eigenfaces method. etc. elastic graph matching method is more robust to illumination changes. some details on a face. due to its computational complexity and execution time. might be lost. Although recognition performance of elastic graph matching method is reported higher than the eigenfaces method [83]. and needs a beforehand preprocessing step. scaling. When a new face attend to the database system needs to run from the beginning. From the experimental results. Satisfactory recognition performances could be reached by successfully aligned face images. which are the special characteristics of that face and could be very useful in recognition task.
A new facial image can also be simply added by attaching new feature vectors to reference gallery while such an operation might be quite time consuming for systems that need training. since the facial features are compared locally. such as neural networks. Feature points are obtained from the special characteristics of each individual face automatically. in our approach. Proposed method is also robust to illumination changes as a property of Gabor wavelets.proposed method achieves better results compared to the graph matching and eigenface methods. especially in terms of the utilized features. instead of fitting a graph that is constructed from the general face idea. Disregard a point while using an additional topology cost. it allows us to make a decision from the parts of the face. In the proposed algorithm. Their topology cost is defined as the ratio of the vectoral distances of feature point pairs between two face images. There is no training as in many supervised approaches. the algorithm compares faces in terms of mouth. Moreover. Manjunath et. when there are sunglasses. nose and any other features rather than eyes. However. the location of feature points also contains information about the face. having a simple matching procedure and low computational cost proposed method is faster than elastic graph matching methods. which are known to be the most successive algorithms. al. it is observed that the locations of 105 . During simulations. instead of using a general structure. Although the proposed method shows some resemblance to graph matching algorithm. which is the main problem with the eigenface approaches. The algorithm proposed by Manjunath et. al. For example. [36] that shows some similarity to our algorithm.
whether they belong to an eye or a mouth. Moreover. etc. for the same individual. a set of weights can be assigned to these feature points by counting the total times of a feature point occurs at those responses. However using such a topology cost amplifies the small deviations of the locations of feature points that are not a measure of match. • Since feature points are found from the responses of image to Gabor filters separately. feature extraction step of a single face image takes 0.feature points. Such improvements can be summarized as. rotation. illumination. found from Gabor responses of the face image. This process will 106 . due to automatical feature detection. it can further be improved with some small modifications and/or additional preprocessing of face images. Although recognition performance of the proposed method is satisfactory by any means. the locations of feature points are very important.5 seconds.). Gabor wavelet transform of a face image takes 1.15 seconds on a Pentium III 550 Mhz PC. an exact measurement of corresponding distances is not possible unlike the geometrical feature based methods. • A motion estimation stage using feature points followed by an affine transformation could be applied to minimize rotation effects. etc. can give small deviations between different conditions (expression. Therefore. Giving an information about the match of the overall facial structure.3 seconds and matching an input image with a single gallery image takes 0. features represented by those points are not explicitly known. Note that above execution times are measured without code optimization. having glasses or not.
It must be noted that performance of recognition systems is highly application dependent and suggestions for improvements on the proposed algorithm must be directed to a specific purpose of the face recognition application. This could be realized by examining the distances between the main facial features which can be determined as the locations that the feature points become dense. • As it is mentioned in problem definition. • In order to further speed up the algorithm. number of Gabor filters could be decreased with an acceptable level of decrease in recognition performance. Although there is still only one frontal face per each individual in the gallery. for example distance between two eyes. 107 . While trying to maximize those distances. a frame giving the “most frontal” pose of a person should be selected to increase the performance of face recognition algorithm. existing frame that has the closest pose to the frontal will be found. A robust and successive face detection step will increase the recognition performance. information provided by a video sequence that includes the face to be recognized would be efficiently used by this step. • When there is a video sequence as the input to the system. By the help of this step face images would be aligned. a face detection algorithm is supposed to be done beforehand. Implementing such a face detection method is an important future work for successful applications.not create much computational complexity since we already have feature vectors for recognition.
REFERENCES [1] [2] V. 1986. A.. and E. G. 1978. G. Aspects of Face Processing. F.13. Diamond. Psychonomic Soc.13. “Visual Information processing based on spatial filters constrained by biological data. Perceiving and Remembering Faces New York: Academic. 137178. “Face related variation of facial features: Anthropometric Data I. W. 175201. and B. pp. A. S. “A case study: Face Recognition. J.” Int. ManMachine Studies. Psychonomic Soc. Goldstein. “The development of face recognition. C.” Bull. 78129. R. J.. Young. 1988. Davies.” Bull. 173202. 1987. Psych. Dordrecht: Nijhoff. 187190.A maturational component?” Develop. Jeeves.” Explorations in the Biological Language. and A. R. Young. Rep. 16. 1981. H. Recognizing Faces. “Mechanisms of human facial recognition.. A. A. Hay and A. 1980. M. “The human face. vol. pp. Baron. London: Erlbaum. pp. 1981. Carey. “Facial feature variation: Anthropometric Data II. Shepherd. D. Woods. 191193. H. pp. G. Ellis Ed. [3] [4] [5] [6] [7] [8] [9] [10] 108 . Ellis. Walker Ed. 1979. Ellis. Bruce. pp. 15. pp. vol. 1979.” AMRL tech. Carey. Goldstein. W. S. New York: Bradford. E.” Normality and Pathology in Cognitive function. . 1982. pp. P. 257269. London: Academic. vol. vol.. Newcombe. Ginsburg.
[11]
L. D. Harmon, “The recognition of faces.” Scientific American, vol. 229, pp. 7182, 1973. D. Perkins, “ A definition of caricature and recognition,” Studies in the Anthropology of Visual Commun., vol. 2, pp. 124, 1975. J. Sergent, “Microgenesis of face perception,” Aspects of Face Processing, H. D. Ellis, M. A. Jeeves, F. Newcombe, and A. Young Eds. Dordrecht: Nijhoff, 1986. T. Kanade, “Picture processing by computer complex and recognition of human faces”. Technical report, Kyoto University, Dept. of Information Science, 1973. S. Lin, S. Kung, and L. Lin, “Face Recognition / Detection by Probabilistic DecisionBased Neural Network,” IEEE Trans. Neural Networks, vol.8, pp.114132, 1997. R. Brunelli, T. Poggio, “Face Recognition: Features vs. Templates,” IEEE Trans. on PAMI, Vol. 12, No. 1, Jan. 1990. M. H. Yang, N. Ahuja, and D. Kriegman, “A survey on face detection methods,” IEEE Trans. On Pattern analysis and Machine Intelligance, to appear 2001. S. Ranganath and K. Arun, “Face Recognition Using Transform Features and Neural Network,” Pattern Recognition, vol. 30, pp. 16151622, 1997. S. Lawrence, C. Giles, A. Tsoi, and A. Back, “Face Recognition: A Convolutional Neural Network Approach,” IEEE Trans. on Neural Networks, vol. 8, pp. 98113, 1997. M. Turk and A. Pentland. “Eigenfaces for recognition.” Journal of Cognitive Science, pp.7186, 1991. P. Belhumeur, J. Hespanha, and D. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition using class Specific linear projection,” IEEE Trans. on PAMI, vol.19, no.7, 1997. B. Moghaddam, C. Nastar, and A. Pentland, “Bayesian Face Recognition using Deformable Intensity Surfaces,” IEEE Conference on CVPR, San Francisco, CA, June 1996. L. Wiskott, J. M. Fellous, N. Krüger and Christoph von der Malsburg, “Face Recognition by Elastic Graph Matching,” In Intelligent Biometric
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
109
Techniques in fingerprint and Face Recognition, CRC Press, Chapter 11, pp. 355396, 1999. [24] J. G. Daugman, “Complete Discrete 2D Gabor Transform by Neural Networks for Image Analysis and Compression,” IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 36, no.7, pp.11691179, 1988. M. J. Lyons, J.Budynek and S. Akamatsu, “Automatic Classification of Single Facial Images,” IEEE Trans. on PAMI, vol. 21, no.12, December 1999. N. Krüger, M. Pötzsch and C. von der Malsburg, “Determining of Face Position and Pose with a learned Representation Based on Labeled Graphs,” Image and Vision Computing, vol. 15, pp.665673, 1997. L. Wiskott, J. M. Fellous, N. Krüger and Christoph von der Malsburg, “Face Recognition and Gender Determination,” Int’l Workshop on Automatic Face Recognition and Gesture Recognition, pp.9297, 1995. Stirling database, http://pics.psych.stir.ac.uk. Purdue face database, http://www.purdue.edu. Olivetti & Oracle Research Laboratory, The Olivetti & Oracle Research Laboratory Face Database of Faces, http://www.camorl.co.uk/facedatabase.html A. Yılmaz, M. Gökmen, “Eigenhill vs. Eigenface and Eigenedge,”Proc. of International Conf. on Pattern Recognition, Barcelona Spain, 2000. J. Zhang, Y. Yan, and M. Lades, “Face Recognition: Eigenface, Elastic Matching, and Neural Nets,” Proc. IEEE, vol. 85, pp.14231435, 1997. O. de Vel and S. Aeberhard, “LineBased Face Recognition Under Varying Pose,” IEEE Trans. on PAMI, vol. 21, no. 10, October 1999. B. Moghaddam, W. Wahid, and A. Pentland, “Beyond Eigenfaces: Probabilistic Matching for Face Recognition,” The 3rd IEEE Int’l Conference on Automatic Face and Gesture Recognition, Nara, Japan, April 1998. B. Duc, S. Fisher, and J. Bigün, “Face Authentication with Gabor Information on Deformable Graphs,” IEEE Trans. On Image Proc., vol.8, no.4, pp.504515, 1999.
[25]
[26]
[27]
[28] [29] [30]
[31]
[32]
[33]
[34]
[35]
110
[36]
B.S. Manjunath, R. Chellappa, and C. von der Malsburg, “A Feature Based Approach to Face Recognition,” Proc. of International Conf. on Computer Vision, 1992.
[37]
N.M. Allinson, A.W. Ellis, B.M. Flude, and A.J. Luckman, “A connectionist model of familiar face recognition,” IEE Colloquium on Machine Storage and Recognition of Faces, vol. 5, pp. 110 (Digest No: 1992/017), 1992. R.J. Baron, “Mechanisms of human facial recognition,” International Journal on Machine Studies, vol. 15, pp. 137178, 1981. P. Benson and D. Perrett, “Face to face with the perfect image,” New Scientist, pp. 3235, 22 February 1992. P.J. Benson and D.I. Perrett, “Perception and recognition of photographic quality facial caricatures: implications for the recognition of natural images,” Face Recognition, pp. 105135, Laurence Erlbaum Associates, 1991. T. Valentine, V. Bruce, “Recognizing familiar faces: the role of distinctiveness and familiarity”, Canadian Journal of Psychology, vol. 40, pp. 300305, 1986. V. Bruce and A.W. Young, “Understanding face recognition,” British Journal of Psychology, vol. 77, pp. 305327, 1986. N. Cantor and W. Mischel, “Prototypes in person perception,” Advances in Experimental Social Psychology, vol. 12, pp. 352, Academic Press, 1979. G.W. Cottrell and M. Fleming, “Face recognition using unsupervised feature extraction,” International Neural Network Conference, vol. 1, pp. 322325, 1990. I. Craw, “Recognizing face features and faces,” IEE Colloquium on Machine Storage and Recognition of Faces, vol. 7, pp. 14 (Digest No: 1992/017), 1992. I. Craw and P. Cameron, “Face recognition by computer,” British Machine Vision Conference, pp. 488507, SpringerVerlag, 1992. J. Daugman, “Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by twodimensional visual cortical filters,” Journal of the Optical Society of America, vol. 2(7), 11601169, July 1985.
[38] [39] [40]
[41]
[42] [43] [44]
[45]
[46] [47]
111
532535.B. R.J.” Proceedings of the Int. 21 June1888.” IEEE Transactions on Pattern Analysis and Machine Intelligence. February 1994. and A. 2829. 1989. 15(11). pp. 1982. Huang and C. 59(5). 26. Hutcheson. Ramig. pp. Gillick and S. L. Welsh. D. March1973.C. pp. A.L. [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61] 112 . M. Lasch. 25(12). “Why faces are and are not special: an effect of expertise. vol. Fowler. “Machine identification of human faces.” Nature.F. pp.” IEE Int. pp. Diamond and S. vol.” IEEE Spectrum. Goldstein. L. Harmon. “Personal identification and description 1. R. 115. G.C. Galton. 1971. 61(3). Gallery and T. C. Conf.” Pattern Recognition. Sir Francis Galton. Forney. February 1994. 1986.” Pattern Recognition. 1993. L. you're on candid camera. “An architecture for face classification. R. “Identification of human faces. Harmon.J. 1992. T. F. Daugman.G. Cox. of the IEEE.P.” Journal of Experimental Psychology. “Some statistical issues in the comparison of speech recognition algorithms. Hay and A. Nature 201203.” IEE Colloquium on Machine Storage and Recognition of Faces. 28 June 1888. on Acoustics.D.W. Hutchinson and W. Academic Press. 13(2).14351444.J. 268278.D. “The Viterbi Algorithm. Young. 173202. pp. “FACE: smile. Human facial feature extraction for face interpretation and recognition. pp. “Comparison of neural networks and conventional techniques for feature location in facial images. Lesk. pp. Trew. Carey.W. Khan. pp. 15 (Digest No: 1992/017).” IEEE Spectrum. vol.I.” Proc.A. 748760. Speech and Signal Processing.” Normality and pathology in cognitive functions. “High confidence visual recognition of persons by a test of statistical independence. “Personal identification and descriptionII”. pp. and P. vol. “FINGERPRINT: an old touchstone decriminalized. 1992. R. vol. 107117. 173177. vol.” Proceedings of the IEEE. 2.[48] J. R. 97110.D. Chen. 1981. pp. “The human face.K.
L. SpringerVerlag. Laurence Erlbaum Associates. Rabiner.” IEEE Spectrum. Shepherd. Cootes. M. 212228. Conf. February 1994.Conf.L.” IEEE Trans. page 30. “A tutorial on Hidden Markov Models and selected applications in speech recognition. pp. 1.E. and T. Craw. G. Samal and P. Kirby and L. Taylor. vol. J. KayraStuart. February 1994. 1991. 257286. “Recognition memory for typical and unusual faces. “Computer recognition of human faces. pp. 1992.A. Mandelbaum. Kanade. pp. 5. 2425. of the IEEE. 6574. 25(1). pp. F. 77(2). pp. 1994. On Pattern Analysis and Machine Intelligence. pp. February 1994. J. A. 1993. “IRIS: more detailed than a fingerprint. and S. February 1994.” Proc. B. C. Kohonen. vol. 1989.” IEEE Spectrum. [62] T. Birkhauser Verlag. 103108.” Journal of experimental Psychology: Human Learning and Memory. [63] [64] [65] [66] [67] [68] [69] [70] [71] [72] [73] [74] 113 . vol. F. pp. vol. “Automatic recognition and analysis of human faces and facial expressions: A survey.D. 1977. R. on Artificial Neural Networks. presentation time and delay on face recognition. 27. Robertson and I.12(1). Miller. Light. Sirovich. 201205. Hollander.” British Machine Vision Conference. Ellis. Gibling.” Face Recognition.” British Machine Vision Conference volume 1. pp.” Pattern Recognition. pages 2230. 2534. D. BMVA Press. and H. “Testing face recognition systems. .R. 137145. A. “An automatic face identification system using flexible appearance models. pp. T. Siedlarz. 2nd edition. 1990. Iyengar.” IEEE Spectrum. pp.F.” IEEE Spectrum. 1979. Sidlauskas. Publication Number 313.” Interdisciplinary Systems Research. BMVA Press. “SPEECH: just say the word. Lanitis. Selforganization and associate memory. 6577. “Application of the KarhunenLoueve procedure for the characterisation of human faces. “The effect of distinctiveness.W. L. 1989. “HAND: give me five. “Vital signs of identity.J. vol. 1988.
“Feature extraction from faces using deformable templates. Laurence Erlbaum Associates. F. Yin. S.H.” Aspects of face processing. 1994. on Acoustics.L. “A system for recognizing human faces. pp. A.S. K. pp. Young and V. 1964. “Active badges and personal interactive computing objects. and S. Journal of Computer Vision. Wong. Journal of Computer Vision. 395402. “A robust face identification scheme. P. May 1995. “Visual learning and recognition of 3d objects from appearance. vol. 1989. Moghadam. Conf. B. vol. Starter. “Practical face recognition and verification with WISARD. Suenuga. E. Yuille. and M. 1970. H. “Elaboration and distinctiveness in memory for faces. C. T.” Neuropsychologia.[75] T. Muarse and S. Want and A.K. Chellappa. Sasaki and Y.M. Tsang. Wilson.H. no. Law. PRI:15.” Proc. Hallinan. CA. 1995.” Proceedings of IEEE. M. 1992. 99111. Bruce. pp. 16381642.” Journal of Experimental Psychology: Human Learning and Memory. Cohen. and P. 5. Nayar. “Human and machine recognition of faces: A survey. “The model method in facial recognition. 295306. 1998. pp. Winograd. vol. pp. A. 181190.” IEEE Trans.14.” Int. Stonham.W. 8. on IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 16. of the Int. 1986. Turk. 524.J. pp. Palo Alto.” Proc. pp.” Int. 1981. Martinus Nijhoff Publishers. T. “View based and modular eigenspaces for face recognition.W.W.M. Rep. Tech. 83. Bledsoe. On Consumer Electronics. R. P. R. pp. H.” SPIE [76] [77] [78] [79] [80] [81] [82] [83] [84] [85] [86] [87] 114 .. “The FERET database and evaluation procedure for face recognition algorithms. 426441. vol. vol. February 1992. 8(2). Pentland. “Face recognition by braininjured patients: a dissociable ability?. 549. Sirobey.” Panoramic Research Inc. Hopper. “Perceptual categories and the computation of ‘grandmother’. 8491. pp. 7. vol. Phillips.” Face Recognition. R.KL expansion of an invariant feature space. 1991. Akamatsu. and D. W. A.” Image and Vision Computing.
1991. 5. 1996. Kuo. Zang. Rabiner and B.” Proc. ECCV’96. Y.” Journal of the Optical Society of America. pp. On Pattern Analysis and Machine Intelligence.” Proc. Belhumeur. on Pattern Recognition. 1992. R. Levin. “Algebraic feature extraction of image for recognition.” SPIE Proc. pp. 1992. 221224. And Techn. ii: eigenvectorbased algorithms. “Keyword spotting in poorly printed documents using pseudo 2d HMM.”ICAASP . 113116. Agazzi. of 11th Int. April 1996. Yang. E. and D. Englewood Cliffs. “Eigenfaces vs Fisherfaces: Recognition using class specific linear projection. Wang. 211219. vol. Liu. of Fourth European Conf. Z. Conf. Etemad and R. 1990.:: Intelligent Robots and Computer Vision X: Algorithms and Technology.” Proc. L. K. Y. 1607.” ICASSP. 1993. NJ: Prentice Hall. 7184. 1991. K. pp. vol 1607. “Comparison of statistical pattern recognition algorithms for hybrid processing.149152. Y. Duda and P. of ICASSP. pp. Yang. [88] P. 1991. Fundamentals of speech recognition. Kriegman. Hespanha. “ Connected and degraded text recognition using planar HMM. Pattern Classification and Scene Analysis. S. Agazzi. Cheng.” Pattern Recognition. S. Academic Press. vol. 4556. O. 1994.” IEEE Trans. “A robust algebraic method for human face recognition. 1993. Wiley. New York. “Face recognition using discriminant eigenvectors. and H. Hart.Proc. J. Gu. pp. pp.:: Intelligent Robots and Computer Vision X: Alg. Liu. “Human face recognition method based on the statistical model of small sample size. Chellapa.. J. Cheng. Pieraccini. Pieraccini. E. vol. Introduction to Statistical Pattern Recognition. Q Tian. 1988. Hong. and R. 5. J. Levin and R. pp. Huang. Fukunaga. “Dynamic planar warping for optical character recognition. K. vol. 24. Kuo and O. K. [89] [90] [91] [92] [93] [94] [95] [96] [97] [98] [99] 115 . 8595. 16701672. and N. pp. 1973. on Computer Vision.
Parallel and Distributed Processing. Neural Network Conf. “ Learning recognition and segmentation of 3d objects from 2d images. On Pattern Analysis and Machine Intelligence. W. 1. 12. 1993. 291294. Int. “Nonlinear dimension reduction. Conf. “The selforginizing map. 1994. Berlin. “Automated face identification using Hidden Markov Models. Kohonen.” Image Processing: Theory and Applications. W. Fleming. ICCV 93. France. of the IEEE. Young. vol. [103] F. [111] T. Fallside. 1993. Rumelhart and J. [112] T.” Image and Computer Vision. “HMM based architecture for face identification. [104] F. Samaria. Cambridge. 59. 116 . Self Orginizing Maps. 1990. N. of Int. pp. October 1994. “Face segmentation for identification using Hidden Markov Models. vol. [106] D. vol. Kohonen.” Advances in Neural Information Processing Systems 5. 78. [107] G. 1993. San Mateo. Fallside. [105] F. [108] H. 1988. Cottrell. 1993. Germany. L. “Hidden Markov Models based optical character recognition in presence of deterministic transformations. pp. 1986. Kamp. on Computer Vision. “Face identification and feature extraction using Hidden Markov Models.. Of the second IEEE Workshop on Application of Computer Vision. “Parameterization of stochastic model for human face identification. Huang. SpringerVerlag. 14641480.” Proc. 1993. pp. Samaria and F.” Proc. Ahuja. Conference on Advanced Mechatronics. E. Samaria and A.” Proc. S. [102] F. 322325. Harter. Weng..” Proc. Morgan Kauffmann Publishers. Samaria and S. Samaria and F. pp. Kuo.” British Machine Vision Conference. McClellan. 1990. [110] J.1. “Autoassociation by multilayer perceptrons and singular value decomposition. [109] David DeMers and G.[100] O. Bourlard and Y. “Face recognition using unsupervised feature extraction. CA.” IEEE Trans. vol. of the Int. Cottrell and M. vol. MA: MIT Press. 1995. 121128. July 913. Paris. pages 580587. Agazzi and S. and T.” Biological Cybern. 1994.” Proc. [101] F.
20. 70. “Recognition of faces in photographic negative”. Wurtz. Wiesel.the role of shape from shading and pigmentation cues. August 1988. Lades. [115] P. 1980. pp. Stringa. Hubel and T. 76. Comput. Burton.” Perception.2. J. [122] S. Lange. Musselman. vol. and V. 1993. pp. 93. no.” IEEE Trans. 19. Vision Research. 429459. Lee. 3. vol. 207208. vol. 837842.” Proc. October. Royal Soc. 1994. 1996.10. “Image representation using 2d Gabor wavelets. IEE. “Mathematical description of the responses of simple cortical cells. [123] A. OctDec 1993. [126] R. 17391755. “The use of pigmentation and shading information in recognizing the sex and identities of faces. 42. pp. J. N. “Face recognition under varying pose. no. and A. pp.”IEEE Trans. vol. 365382. pp. 12. P. 7. 10061015. Chow and X. Kemp. 12971300. “Two dimensional spectral analysis of cortical receptive field profile”. von der Malsburg. pp.” Perception. vol. 26. Of the Royal Soc.” Philosophical Trans. of IEEE. “Functional architecture of macaque monkey visual cortex. 1978. 1996. B. 117 . 300311. 3752. Beymer. Bruce. On Pattern Analysis Pattern Analysis and Machine Intelligence.”J. of 23rd Image understanding Workshop. [117] M. “Towards a system for automatic facial feature detection. [125] R. J Vorbruggen. 1970. pp. 1993.”Pattern Recognition . 25. [116] D.[113] G. “ Distortion invariant object recognition in the dynamic link architecture. 1. “Theory of communication. Optical Soc. pp. pp. S.” Applied Artificial Intelligence. M. Buhmann. pp. Langton. 847856.. 1993. Pike. vol. vol. Am. 1980. vol. “Smart sensing within a pyramid vision machine. no. [124] V. [119] J.” J. vol. 198. B (London). [120] D. and I. “Modeling face recognition. 1994.” Proc. 1946.. vol. G. vol. “A perception and recognition of normal and negative faces. vol. Marcelja. Gabor. Craw. Burt. “Eyes detection for face recognition. pp.803822.” Proc. [114] L. Bruce and S. [121] D. vol. 457480.159. White. E. pp. Psychonomic Science. Li. and R. p. [118] T. Daugman. Galper. 18. 23. G. vol. H.
IEEE Computer Society Press.” J. 1998. pp. [128] FERET latest evaluation results.html [129] W.” In 3rd International Conference on Automatic Face and Gesture Recognition. http://www. R. CA.nist. 336341.17241733. Moon and P. August 1997. Soc. 1998. “Discriminant analysis for recognition of human face images. [130] K. 118 .gov/iad/humanid/feret/perf/eval. J. Etemad and R. Am. Zhao. A. and A. vol.itl.[127] H. 14. Los Alamitos. “Discriminant analysis of principal components for face recognition. Phillips. Krishnaswamy. Chellappa.’ Empirical Evaluation Techniques in Computer Vision. ‘Analysis of PCAbased face recognition algorithms. Opt. Chellappa. pp.
This action might not be possible to undo. Are you sure you want to continue?
Use one of your book credits to continue reading from where you left off, or restart the preview.