
Zürich
Autonomous Systems Lab

From eigenfaces to AdaBoost
Cedric Pradalier
An introduction to the processing of large-dimensionality datasets
Input:
A database of normalised face photos
A normalised face photo
Output:
Face identification: whose photo is that?
Face representation in minimal dimension
Face comparison
[Figure: an image to identify is matched against the face database for identification]
Why?
Different centering
Different mouth shape
Different eye opening
Solution:
Extract the most important features
Discard the details
PCA is one solution
Each image is an n x m matrix of pixels.
Convert it into an nm vector by stacking the columns.
A small image is 100x100 -> a 10000-element vector, i.e. a point in a 10000-dimensional space.
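The column-stacking step can be sketched as follows (NumPy assumed; the image here is a random stand-in for a normalised face photo):

```python
import numpy as np

n, m = 100, 100
image = np.random.rand(n, m)   # stand-in for a normalised 100x100 face photo

# Stack the columns: column-major ('F') flattening gives an n*m vector,
# i.e. a single point in a 10000-dimensional space.
x = image.flatten(order="F")
```

Element n of the vector is the top of the second column, which confirms the column-wise stacking order.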
M = compute the average vector.
Subtract M from each vector -> zero-centered distribution.
C = compute the covariance matrix (10000x10000).
Compute the 10000 eigenvalues and eigenvectors $v_1, \dots, v_D$ of C.
Change all points into the eigenvector frame:
$x \mapsto (v_1^\top x, \dots, v_D^\top x)$
Select just enough dimensions according to the strength of their eigenvalues.
A typical value of 30-100 dimensions seems enough for faces.
Discard all the remaining dimensions.
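The pipeline so far (mean subtraction, covariance, eigen-decomposition, truncation) can be sketched as below; the data array and sizes are toy stand-ins, not the lecture's dataset:

```python
import numpy as np

# `faces` is a hypothetical (N, d) array of column-stacked face vectors.
rng = np.random.default_rng(0)
faces = rng.random((165, 400))        # toy stand-in for 165 face vectors

M = faces.mean(axis=0)                # average face vector
X = faces - M                         # zero-centred distribution

# Covariance matrix and its eigen-decomposition (np.linalg.eigh is for
# symmetric matrices; eigenvalues come back in ascending order).
C = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(C)

# Keep the k eigenvectors with the largest eigenvalues; discard the rest.
k = 50
eigenfaces = eigvecs[:, ::-1][:, :k]  # columns = top-k eigenvectors

# Project every face into the (truncated) eigenvector frame.
coords = X @ eigenfaces               # shape (N, k)
```

For real 100x100 images the covariance matrix is 10000x10000, which is why the slides later note that constructing the eigenvectors can be expensive.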
Prepare the image:
Start with a face to identify.
Convert the image to a vector.
Subtract M.
Change to the eigenvector frame.
Keep only the required dimensions.
Find the closest point in the remaining dimensions.
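The recognition steps above can be sketched as a nearest-neighbour search in the reduced space; `M`, `eigenfaces`, `db_coords` and `labels` are hypothetical stand-ins for an already-trained eigenface model:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 400, 50
M = rng.random(d)                                  # mean face (stand-in)
eigenfaces = np.linalg.qr(rng.random((d, k)))[0]   # orthonormal columns
db_coords = rng.random((165, k))                   # projected database faces
labels = np.repeat(np.arange(15), 11)              # 15 persons x 11 photos

def identify(image_vector):
    # Subtract the mean and project onto the kept eigenfaces ...
    q = (image_vector - M) @ eigenfaces
    # ... then return the label of the closest database point.
    dists = np.linalg.norm(db_coords - q, axis=1)
    return labels[np.argmin(dists)]

query = rng.random(d)
who = identify(query)
```

Reconstructing a database face exactly and identifying it returns that face's own label, since its projection lands at distance zero.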
Database from Yale:
http://cvc.yale.edu/projects/yalefaces/yalefaces.html
165 faces of 15 persons, with varying lighting, expression, glasses.
Results and algorithms from:
http://www.cs.princeton.edu/~cdecoro/eigenfaces/
30% of faces used for testing, 70% for learning.
Variance: normalised cumulative sum of the eigenvalues.
About 55 eigenfaces are required to represent 80% of the information.
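The selection rule behind that number can be sketched as follows; the eigenvalue spectrum here is synthetic, not the Yale data:

```python
import numpy as np

# Pick the smallest k whose normalised cumulative eigenvalue sum
# reaches a target fraction of the total variance.
eigvals = 1.0 / np.arange(1, 201) ** 2          # toy spectrum, descending
variance = np.cumsum(eigvals) / eigvals.sum()   # normalised cumulative sum

target = 0.80
k = int(np.searchsorted(variance, target) + 1)  # smallest k with >= 80%
```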
Adding eigenfaces one at a time
Adding eigenfaces eight at a time
Reconstructing the image perfectly requires a lot of eigenfaces, but far fewer than the number of pixels.
All faces with glasses have been
ignored: not a huge difference.
Most recognitions are correct, even with a wide range of expression variation:
PCA has relatively low sensitivity to local changes.
9/23 recognitions are wrong: PCA is sensitive to global changes.
Normalisation:
Normalisation of the range and center of each dimension
Computational tools:
Eigenvalues and eigenvectors
SVD decomposition
Principal Component Analysis is a good tool to identify the main characteristics of a data set.
It is computationally efficient for recognition and dimensionality reduction.
The construction of the eigenvectors can be very expensive (esp. for images).
Online PCA techniques have been researched.
For image recognition, images must be pre-cut very accurately, with consistent lighting, for the technique to work.
$\{x_n\}$ a set of points in a D-dimensional space X
$\{u_i\}$ an orthonormal basis of X
Then:
$x_n = \sum_{i=1}^{D} \alpha_{n,i}\, u_i = \sum_{i=1}^{D} (x_n^\top u_i)\, u_i$
Approximating on a sub-basis:
$\tilde{x}_n = \sum_{i=1}^{M} z_{n,i}\, u_i + \sum_{i=M+1}^{D} b_i\, u_i$
(the $b_i$ are independent of n)
Approximation error:
$J = \frac{1}{N} \sum_{n=1}^{N} \| x_n - \tilde{x}_n \|^2$
Minimising J:
W.r.t. z: $\frac{\partial J}{\partial z_{n,j}} = 0 \;\Rightarrow\; z_{n,j} = x_n^\top u_j$
W.r.t. b: $\frac{\partial J}{\partial b_j} = 0 \;\Rightarrow\; b_j = \left(\frac{1}{N}\sum_{n=1}^{N} x_n\right)^{\top} u_j = \bar{x}^\top u_j$
Substituting:
$x_n - \tilde{x}_n = \sum_{i=M+1}^{D} \{(x_n - \bar{x})^\top u_i\}\, u_i$
Leads to:
$J = \frac{1}{N}\sum_{n=1}^{N}\sum_{i=M+1}^{D} \{(x_n - \bar{x})^\top u_i\}^2 = \sum_{i=M+1}^{D} u_i^\top S u_i$
Data covariance matrix:
$S = \frac{1}{N}\sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^\top$
Minimising J:
$J = \sum_{i=M+1}^{D} u_i^\top S u_i$
Finding the optimal $u_i$ requires a minimisation with constraints: $\|u_i\| = 1$.
Introduce Lagrange multipliers $\lambda_i$: minimise $J + \sum_{i=M+1}^{D} \lambda_i (1 - u_i^\top u_i)$.
The optimum is found when $u_i$ is an eigenvector of S: $S u_i = \lambda_i u_i$, and then $J = \sum_{i=M+1}^{D} \lambda_i$.
Eigenvalues are positive, so J is minimal if the discarded $u_i$ are the eigenvectors with the smallest eigenvalues.
PCA is the orthogonal projection of the data onto a lower-dimensional subspace such that the variance of the projected data is maximised.
Informally: more variance means more information.
Probabilistic formulation:
Latent variable z: projection on the subspace
$p(z) = \mathcal{N}(z \mid 0, I) \qquad p(x \mid z) = \mathcal{N}(x \mid Wz + \mu,\, \sigma^2 I)$
EM algorithm:
Maximise the log-likelihood of p(x).
Find the optimal W, $\mu$ and $\sigma^2$: they correspond to the data mean and the principal components of the data.
-> Can deal with missing data (among other advantages).
ICA: Independent Component Analysis
Similar to the probabilistic formulation, except the latent variables have a non-linear, non-Gaussian distribution:
$p(z) = \prod_{j=1}^{M} p(z_j)$
Used in signal processing. The typical example is blind source separation in audio signal analysis.
CCA: Canonical Correlation Analysis
Creates a model that maximally correlates two sets of variables.
Used in data analysis/statistics to find what is common between two sets of observations.
A good way to build a classifier
What is classification (in layman's terms)?
Computational learning theory distinguishes between:
A strong learning algorithm: finds, with high probability, an arbitrarily accurate classifier.
A weak learning algorithm: only finds a classifier with a bounded accuracy.
For example: Support Vector Machines with a linear kernel only achieve a bounded accuracy.
But: they are at least better than random guessing!
(i.e. the classification error is lower than 0.5)
SVM
Support vector machines for joint multivariable optimization [Spinello08]
Slide from prof. Buhmann: Machine Learning
Decision stumps are a class of very simple weak classifiers.
Goal: find an axis-aligned hyperplane that minimizes the classification error.
This can be done for each feature (i.e. for each dimension in feature space).
It can be shown that the classification error is always better than 0.5 (random guessing).
Idea: apply many weak classifiers, where each is trained on the misclassified examples of the previous.
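A decision stump can be trained by exhaustively trying every feature, threshold and polarity and keeping the split with the lowest weighted error. A minimal sketch (data and weights are illustrative):

```python
import numpy as np

def train_stump(X, y, w):
    """X: (N, d) features, y: labels in {-1, +1}, w: sample weights."""
    best = (np.inf, 0, 0.0, 1)                 # (error, feature, thresh, sign)
    for j in range(X.shape[1]):
        for thresh in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = sign * np.where(X[:, j] > thresh, 1, -1)
                err = w[pred != y].sum()       # weighted classification error
                if err < best[0]:
                    best = (err, j, thresh, sign)
    return best

def stump_predict(X, j, thresh, sign):
    return sign * np.where(X[:, j] > thresh, 1, -1)

# Toy separable data: the stump should recover a split between 0.3 and 0.7.
X = np.array([[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]])
y = np.array([-1, -1, -1, 1, 1, 1])
w = np.full(len(y), 1 / len(y))
err, j, thresh, sign = train_stump(X, y, w)
```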
Weak classifiers (in AdaBoost) are binary classifiers.
Stump: the simplest non-trivial type of decision tree (equivalent to a linear classifier defined by an affine hyperplane):
$c(x \mid j, \theta, m) = \begin{cases} +m & \text{if } x_j > \theta \\ -m & \text{otherwise} \end{cases} \qquad m \in \{-1, +1\}$
The hyperplane is orthogonal to the $j$-th axis, which it intersects at $x_j = \theta$ (it ignores all entries of x except $x_j$).
Boosting is a technique to build a strong learning algorithm from a given weak learning algorithm.
The most popular boosting algorithm is AdaBoost (adaptive boosting).
It assigns a weight to each training data point.
In the beginning, all weights are equal.
In each round, AdaBoost finds a weak classifier and re-weights the misclassified points.
Correctly classified points are weighted less, misclassified points are weighted higher.
Algorithm TrainAdaBoost:
1. for $n = 1, \dots, N$ do $w_n \leftarrow 1/N$
2. for $m = 1, \dots, M$ do
3. Find a classifier $c_m$ that minimizes the weighted error $\epsilon_m = \sum_{n:\, c_m(x_n) \neq y_n} w_n \,/\, \sum_n w_n$
4. Compute $\alpha_m = \ln\frac{1-\epsilon_m}{\epsilon_m}$ and update $w_n \leftarrow w_n \exp\!\left(\alpha_m\, \mathbf{1}[c_m(x_n) \neq y_n]\right)$
5. return $\{(\alpha_m, c_m)\}_{m=1}^{M}$
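With decision stumps as the weak learner, the training loop can be sketched as below; the standard log-ratio weight update is assumed, and the toy data and exhaustive stump search are illustrative:

```python
import numpy as np

def train_adaboost(X, y, M):
    N = len(y)
    w = np.full(N, 1 / N)                          # all weights equal at start
    ensemble = []
    for _ in range(M):
        # Weak learner: best threshold/sign over every feature (a stump).
        best = None
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                for s in (+1, -1):
                    pred = s * np.where(X[:, j] > t, 1, -1)
                    eps = w[pred != y].sum() / w.sum()
                    if best is None or eps < best[0]:
                        best = (eps, j, t, s, pred)
        eps, j, t, s, pred = best
        eps = max(eps, 1e-12)                      # guard against log(0)
        alpha = np.log((1 - eps) / eps)
        w = w * np.exp(alpha * (pred != y))        # boost misclassified points
        ensemble.append((alpha, j, t, s))
    return ensemble

def classify(x, ensemble):
    score = sum(a * s * (1 if x[j] > t else -1) for a, j, t, s in ensemble)
    return 1 if score > 0 else -1

# Toy separable data.
X = np.array([[0.0], [0.2], [0.6], [0.8]])
y = np.array([-1, -1, 1, 1])
model = train_adaboost(X, y, M=3)
preds = [classify(x, model) for x in X]
```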
Algorithm ClassifyAdaBoost:
1. return $\hat{c}(x) = \mathrm{sign}\!\left(\sum_{m=1}^{M} \alpha_m c_m(x)\right)$
Major features:
The accuracy of the classifier increases with the number M of weak classifiers, i.e. the algorithm is arbitrarily accurate.
Classification can be done very fast (in contrast to training).
Slide from prof. Buhmann: Machine Learning
Slide from prof. Buhmann: Machine Learning
Slide from prof. Buhmann: Machine Learning
The state of the art:
Robust Real-time Object Detection,
Paul Viola and Michael Jones, IWSCTV, 2001
Features for face detection
Quick evaluation through the integral image approach
Classifier selection
How to select a minimal set of features/weak classifiers to detect a face
Classifier cascade
How to efficiently assemble classifiers
Defined as differences of rectangular integral areas:
the sum of the pixels which lie within the white rectangles is subtracted from the sum of the pixels in the grey rectangles:
$f = \int_{\text{Grey}} I(x, y)\, dx\, dy \;-\; \int_{\text{White}} I(x, y)\, dx\, dy$
One feature is defined by:
Feature type: A, B, C or D
Feature position and size
Defined as:
$I_{\text{int}}(X, Y) = \sum_{x \le X} \sum_{y \le Y} I(x, y)$
The integral over rectangle D can be computed in 4 accesses to $I_{\text{int}}$:
$\int_D I(x, y)\, dx\, dy = I_{\text{int}}(4) + I_{\text{int}}(1) - I_{\text{int}}(2) - I_{\text{int}}(3)$
A very efficient way to compute features.
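The integral image and the 4-access rectangle sum can be sketched as follows; the zero border is an implementation convenience, not part of the original formulation:

```python
import numpy as np

img = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"

# Integral image with a zero border so corner lookups need no edge cases:
# iimg[r, c] = sum of img[:r, :c].
iimg = np.zeros((5, 5))
iimg[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(iimg, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] using four accesses to the integral image
    (the corners 4, 1, 2, 3 of the rectangle, in the slide's notation)."""
    return (iimg[r1, c1] + iimg[r0, c0]
            - iimg[r0, c1] - iimg[r1, c0])

total = rect_sum(iimg, 1, 1, 3, 3)   # sum of the central 2x2 block
```

Whatever the rectangle size, the cost stays at four array accesses, which is what makes Haar-feature evaluation fast.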
A weak classifier has 3 attributes:
A feature $f_j$ (type, size and position)
A threshold $\theta_j$
A comparison operator $op_j \in \{<, >\}$
The resulting weak classifier is:
$h_j(x) = f_j(x) \;\, op_j \;\, \theta_j$
x is a 24x24-pixel window in the image.
A classifier with only these two features can be trained to recognise 100% of the faces, with 40% false positives.
scale = 24x24
do {
    for each position in the image {
        Try classifying the part of the image starting at this position,
        with the current scale, using the classifier selected by AdaBoost
    }
    scale = scale x 1.5
} until maximum scale
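The multi-scale scan above can be sketched in code; `looks_like_face` is a hypothetical stub standing in for the AdaBoost-selected classifier, and the step size is an assumption (the original loop does not specify one):

```python
def looks_like_face(window):
    return False                       # hypothetical detector stub

def scan(image_w, image_h, max_scale=None):
    detections = []
    size = 24                          # base window is 24x24 pixels
    limit = max_scale or min(image_w, image_h)
    while size <= limit:
        step = max(1, size // 24)      # shift proportionally to the scale
        for y in range(0, image_h - size + 1, step):
            for x in range(0, image_w - size + 1, step):
                if looks_like_face((x, y, size)):
                    detections.append((x, y, size))
        size = int(size * 1.5)         # scale = scale x 1.5
    return detections

boxes = scan(96, 96)
```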
Basic idea:
It is easy to detect that something is not a face.
Tune (boost) each classifier to be very reliable at saying NO (i.e. a very low false-negative rate).
Stop evaluating the cascade of classifiers as soon as one classifier says NO.
Faster processing
Quick elimination of useless windows
Each individual classifier is trained to deal only with the examples that the previous ones could not process
Very specialised
The deeper in the cascade, the more complex (the more features) the classifiers.
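The early-exit evaluation can be sketched as below; the stages are hypothetical score thresholds, not trained classifiers, and are assumed to have been tuned per stage for a very low false-negative rate:

```python
def run_cascade(window_score, stages):
    """stages: list of thresholds; a window passes a stage if its score
    exceeds the threshold. Rejection stops evaluation immediately."""
    for i, threshold in enumerate(stages):
        if window_score <= threshold:
            return False, i            # rejected at stage i: stop early
    return True, len(stages)           # survived every stage

stages = [0.1, 0.3, 0.6]               # deeper stages are stricter
accepted, depth = run_cascade(0.2, stages)
```

Most windows are rejected by the cheap early stages, so the expensive deep stages run on only a small fraction of the image.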
Face detection is solved:
Algorithms such as Viola-Jones AdaBoost are very efficient and easily implemented in hardware.
They now ship on digital cameras and camcorders.
The approach used in the Viola-Jones algorithm is generic enough to be used for other detection tasks.
PCA can still be useful, but only in very controlled settings.