Curse of Dimensionality

[figure: classification performance vs. number of features, peaking at an optimal number of features d*]
Dimensionality Reduction (cont’d)
$$y = Tx \in \mathbb{R}^K, \quad \text{where } K \ll D$$

This is a projection/transformation $f(x)$ from D dimensions to K dimensions: $x = [x_1, x_2, \ldots, x_D]^T$ is mapped to $y = [y_1, y_2, \ldots, y_K]^T$. Each new feature $y_i$ is a linear combination of the original features $x_i$.
Feature Extraction (cont’d)
• From a mathematical point of view, finding an optimum mapping $y = f(x)$ can be formulated as an optimization problem (i.e., minimize or maximize an objective criterion).
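As a concrete illustration (not from the slides), PCA fits this template: it chooses an orthonormal projection that minimizes the reconstruction error of the (centered) data,

$$\min_{U \in \mathbb{R}^{D \times K},\ U^T U = I} \ \sum_{i=1}^{M} \left\| (x_i - \bar{x}) - U U^T (x_i - \bar{x}) \right\|^2$$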
Feature Extraction (cont’d)
• Popular linear feature extraction methods:
− Principal Components Analysis (PCA): seeks a projection that minimizes information loss.
− Linear Discriminant Analysis (LDA): seeks a projection that maximizes discriminatory information.
Vector Representation
• A vector $x \in \mathbb{R}^D$ can be represented by D components: $x = [x_1, x_2, \ldots, x_D]^T$.

• Assuming the standard basis $\langle v_1, v_2, \ldots, v_D \rangle$ (i.e., unit vectors in each dimension), $x_i$ can be obtained by projecting $x$ along the direction $v_i$:

$$x_i = \frac{x^T v_i}{v_i^T v_i} = x^T v_i$$

The $x_i$ are called projection coefficients.

• Example assuming D = 2: for $x = [3, 4]^T$, the projection coefficients are $x_1 = x^T v_1 = 3$ and $x_2 = x^T v_2 = 4$.
PCA - Steps

Step 3: compute the sample covariance matrix $\Sigma_x$:

$$\Sigma_x = \frac{1}{M} \sum_{i=1}^{M} (x_i - \bar{x})(x_i - \bar{x})^T = \frac{1}{M} \sum_{i=1}^{M} \Phi_i \Phi_i^T = \frac{1}{M} A A^T$$

where $\Phi_i = x_i - \bar{x}$ and $A = [\Phi_1\ \Phi_2\ \ldots\ \Phi_M]$ is a D x M matrix, i.e., the columns of A are the $\Phi_i$.
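A minimal NumPy sketch of this step, using the 2-D dataset from the example later in these slides (note the slide's 1/M normalization, rather than the 1/(M−1) that `np.cov` uses by default):

```python
import numpy as np

# Eight 2-D points (the dataset from the PCA example slide)
X = np.array([(1, 2), (3, 3), (3, 5), (5, 4), (5, 6), (6, 5), (8, 7), (9, 8)], dtype=float)
M, D = X.shape

x_bar = X.mean(axis=0)      # sample mean
A = (X - x_bar).T           # D x M matrix whose columns are Phi_i = x_i - x_bar
Sigma_x = A @ A.T / M       # sample covariance: (1/M) A A^T

# Equivalent form: (1/M) * sum of outer products Phi_i Phi_i^T
Sigma_check = sum(np.outer(phi, phi) for phi in (X - x_bar)) / M
assert np.allclose(Sigma_x, Sigma_check)
```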
PCA - Steps
Step 4: compute the eigenvalues/eigenvectors of $\Sigma_x$:

$$\Sigma_x u_i = \lambda_i u_i$$

where we assume $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_D$.

Note: most software packages return the eigenvalues (and corresponding eigenvectors) in decreasing order; if not, you should explicitly put them in this order.
Step 5: dimensionality reduction step — represent $x$ using the first K eigenvectors:

$$\hat{x} - \bar{x} = U y = U \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{bmatrix}, \quad \text{where } U = [u_1\ u_2\ \ldots\ u_K] \text{ is a D x K matrix}$$

i.e., the columns of U are the first K eigenvectors of $\Sigma_x$. Equivalently:

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{bmatrix} = U^T (\hat{x} - \bar{x}), \quad \text{i.e., } T = U^T \text{ is a K x D matrix}$$

whose rows are the first K eigenvectors of $\Sigma_x$.
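A self-contained NumPy sketch of Steps 4–5 on hypothetical data (as the slide warns, `np.linalg.eigh` returns eigenvalues in *increasing* order, so they must be re-sorted):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))       # hypothetical data: M = 100 samples, D = 5
x_bar = X.mean(axis=0)
Sigma_x = (X - x_bar).T @ (X - x_bar) / len(X)

# Step 4: eigenvalues/eigenvectors, sorted into decreasing eigenvalue order
lam, vecs = np.linalg.eigh(Sigma_x)
order = np.argsort(lam)[::-1]
lam, vecs = lam[order], vecs[:, order]

# Step 5: keep the first K eigenvectors
K = 2
U = vecs[:, :K]             # D x K; columns are the top eigenvectors
T = U.T                     # K x D projection matrix

Y = (X - x_bar) @ U         # projected data y = U^T (x - x_bar), one row per sample
X_hat = x_bar + Y @ U.T     # reconstruction from K components
```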
What is the form of Σy?

$$\Sigma_x = \frac{1}{M} \sum_{i=1}^{M} (x_i - \bar{x})(x_i - \bar{x})^T = \frac{1}{M} \sum_{i=1}^{M} \Phi_i \Phi_i^T$$

Using diagonalization: $\Sigma_x = P \Lambda P^T$, where the columns of P are the eigenvectors of $\Sigma_x$ and the diagonal elements of Λ are the eigenvalues of $\Sigma_x$, or the variances (see review).

Taking K = D (so that U = P):

$$y_i = U^T (x_i - \bar{x}) = P^T \Phi_i$$

Since $\bar{y} = 0$:

$$\Sigma_y = \frac{1}{M} \sum_{i=1}^{M} (y_i - \bar{y})(y_i - \bar{y})^T = \frac{1}{M} \sum_{i=1}^{M} y_i y_i^T = \frac{1}{M} \sum_{i=1}^{M} (P^T \Phi_i)(P^T \Phi_i)^T$$

$$= P^T \left( \frac{1}{M} \sum_{i=1}^{M} \Phi_i \Phi_i^T \right) P = P^T \Sigma_x P = P^T (P \Lambda P^T) P = \Lambda$$

i.e., $\Sigma_y$ is diagonal: PCA decorrelates the features.
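A quick numeric check of this result (hypothetical random data; the projected covariance should equal the diagonal matrix of eigenvalues up to floating-point error):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0], [[4, 2, 0], [2, 3, 1], [0, 1, 2]], size=1000)
M = len(X)

Phi = X - X.mean(axis=0)
Sigma_x = Phi.T @ Phi / M
lam, P = np.linalg.eigh(Sigma_x)            # Sigma_x = P Lambda P^T

Y = Phi @ P                                 # y_i = P^T Phi_i, stacked as rows
Sigma_y = Y.T @ Y / M
assert np.allclose(Sigma_y, np.diag(lam))   # Sigma_y = Lambda (diagonal)
```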
Example
• Compute the PCA of the following dataset:
(1,2),(3,3),(3,5),(5,4),(5,6),(6,5),(8,7),(9,8)
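A sketch of this computation in NumPy (the eigenpairs are printed rather than worked out by hand):

```python
import numpy as np

X = np.array([(1, 2), (3, 3), (3, 5), (5, 4), (5, 6), (6, 5), (8, 7), (9, 8)], dtype=float)
M = len(X)

x_bar = X.mean(axis=0)
Sigma_x = (X - x_bar).T @ (X - x_bar) / M

lam, U = np.linalg.eigh(Sigma_x)
lam, U = lam[::-1], U[:, ::-1]      # decreasing eigenvalue order

print("mean:", x_bar)
print("eigenvalues:", lam)
print("eigenvectors (columns):\n", U)
```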
Example (cont’d)
• The eigenvectors are the solutions of:

$$\Sigma_x u_i = \lambda_i u_i$$
Approximation or Reconstruction Error

$$\|x - \hat{x}\| = \frac{1}{2} \sum_{i=K+1}^{D} \lambda_i$$

i.e., the error is determined by the eigenvalues of the discarded eigenvectors.
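A small numeric check of this relationship (this sketch verifies that the mean squared reconstruction error equals the sum of the discarded eigenvalues; the slide's expression uses a slightly different error convention):

```python
import numpy as np

X = np.array([(1, 2), (3, 3), (3, 5), (5, 4), (5, 6), (6, 5), (8, 7), (9, 8)], dtype=float)
M, D = X.shape
Phi = X - X.mean(axis=0)

lam, U = np.linalg.eigh(Phi.T @ Phi / M)
lam, U = lam[::-1], U[:, ::-1]                   # decreasing order

for K in range(1, D + 1):
    Phi_hat = Phi @ U[:, :K] @ U[:, :K].T        # reconstruction from K components
    mse = np.mean(np.sum((Phi - Phi_hat) ** 2, axis=1))
    print(K, mse, lam[K:].sum())                 # the last two columns match
```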
Data Normalization
Application to Images
• Goal: represent images in a space of lower dimensionality using PCA.
− Useful for various applications, e.g., face recognition, image compression, etc.

• Given M images of size N x N, first represent each image as an N² x 1 vector (i.e., by stacking the rows together); the number of features is D = N².
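A minimal sketch of this vectorization step (random arrays standing in for a real image set):

```python
import numpy as np

M, N = 100, 64
images = np.random.rand(M, N, N)    # stand-in for M grayscale images of size N x N
X = images.reshape(M, N * N)        # each row is one image vector; D = N^2 features
A = (X - X.mean(axis=0)).T          # D x M matrix of centered image vectors
```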
Application to Images (cont’d)
$$\Sigma_x = \frac{1}{M} A A^T$$

where A is an N² x M matrix. Note that $AA^T$ is N² x N², which is typically far too large to decompose directly.
Application to Images (cont’d)
• We will use a simple "trick" to get around this by relating the eigenvalues/eigenvectors of $AA^T$ to those of $A^TA$ (an M x M matrix, which is much smaller when M << N²).
Application to Images (cont’d)

Step 3: compute $A^TA$ (i.e., instead of $AA^T$)

Step 4a: compute the eigenvalues $\mu_i$ and eigenvectors $v_i$ of $A^TA$.

Step 4b: obtain the $\lambda_i, u_i$ of $AA^T$ using $\lambda_i = \mu_i$ and $u_i = A v_i$; then, normalize $u_i$ to unit length. (This works because $A^TA v_i = \mu_i v_i$ implies $AA^T (A v_i) = \mu_i (A v_i)$.)

Step 5: dimensionality reduction step — approximate x using only the "largest" K eigenvectors (K << M):

$$x - \bar{x} = \sum_{i=1}^{M} y_i u_i = y_1 u_1 + y_2 u_2 + \ldots + y_M u_M$$

approximate x by $\hat{x}$ using the K largest eigenvectors:

$$\hat{x} - \bar{x} = \sum_{i=1}^{K} y_i u_i = y_1 u_1 + y_2 u_2 + \ldots + y_K u_K$$
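A self-contained NumPy sketch of this procedure (random data standing in for real face images; all names are illustrative):

```python
import numpy as np

M, N = 50, 32
X = np.random.rand(M, N * N)            # stand-in for M vectorized face images
x_bar = X.mean(axis=0)
A = (X - x_bar).T                       # D x M, with D = N^2

# Steps 3-4a: solve the small M x M problem instead of the D x D one
mu, V = np.linalg.eigh(A.T @ A / M)
mu, V = mu[::-1], V[:, ::-1]            # decreasing eigenvalue order

# Step 4b: u_i = A v_i are eigenvectors of (1/M) A A^T; normalize to unit length
U = A @ V
U /= np.linalg.norm(U, axis=0)

# Step 5: represent an image by its K eigen-coefficients and reconstruct
K = 10
y = U[:, :K].T @ (X[0] - x_bar)
x_hat = x_bar + U[:, :K] @ y
```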
Dataset

[figure: sample face images from the training set]
Example (cont’d)
K largest eigenvectors $u_1, \ldots, u_K$, visualized as images (called "eigenfaces").

[figure: eigenfaces $u_1$, $u_2$, $u_3$ and the mean face $\bar{x}$]
Application to Images (cont’d)
• Interpretation: approximate a face image using eigenfaces.

The K largest eigenvectors $u_1, \ldots, u_K$ serve as basis vectors:

$$\hat{x} = \bar{x} + \sum_{i=1}^{K} y_i u_i = \bar{x} + y_1 u_1 + y_2 u_2 + \ldots + y_K u_K$$

where $y_1, y_2, \ldots, y_K$ are the eigen-coefficients of the image.

[figure: a face image expressed as the mean face plus a weighted sum of eigenfaces]
Case Study: Eigenfaces for Face Detection/Recognition

• Face Recognition
− The simplest approach is to think of it as a template matching problem.
Training Phase
• Given a set of face images from a group of people in an image database (each person could have one or more images), perform the following steps:
Face Detection Using Eigenfaces
Given an unknown image x, follow these steps:

Step 1: Subtract the mean face $\bar{x}$ (computed from the training data): $\Phi = x - \bar{x}$

Step 2: Project onto the eigenspace and reconstruct:

$$y_i = \Phi^T u_i, \quad i = 1, 2, \ldots, K; \qquad \hat{\Phi} = \sum_{i=1}^{K} y_i u_i$$

Step 3: Compute the reconstruction error $e_d = \|\Phi - \hat{\Phi}\|$, the "distance from face space" (dffs).

For a face image, the reconstruction looks like a face again; non-face inputs yield a large dffs.
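A minimal sketch of this computation (`x_bar` and `U` are assumed to come from a training step like the one sketched above):

```python
import numpy as np

def dffs(x, x_bar, U):
    """Distance from face space: reconstruction error of x in the eigenspace.

    x: flattened image (D-vector); x_bar: mean face; U: D x K eigenface matrix.
    """
    phi = x - x_bar
    phi_hat = U @ (U.T @ phi)           # project onto eigenspace, then reconstruct
    return np.linalg.norm(phi - phi_hat)

# Small dffs: x is well explained by the eigenfaces (likely a face).
# Large dffs: x lies far from the face space (likely not a face).
```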
Face Detection Using Eigenfaces
We can use the dffs, $e_d = \|\Phi - \hat{\Phi}\|$, to find faces in an image.
Limitations (cont’d)
• PCA is not always an optimal dimensionality-reduction technique for classification purposes.
Linear Discriminant Analysis (LDA)
[figure: the same data projected onto two candidate projection directions; one separates the classes, the other does not]

• Let $\mu_i$ be the mean of the i-th class, i = 1, 2, ..., C, and μ be the mean of the whole dataset:

$$\mu = \frac{1}{C} \sum_{i=1}^{C} \mu_i$$

• LDA seeks a projection $y = U^T x$ that maximizes class separability. Suppose the scatter matrices of the projected data y are $\tilde{S}_b, \tilde{S}_w$; LDA maximizes:

$$\max \frac{|\tilde{S}_b|}{|\tilde{S}_w|} = \max \frac{|U^T S_b U|}{|U^T S_w U|}$$

The solution is given by the generalized eigenvalue problem:

$$S_b u_k = \lambda_k S_w u_k$$

• It can be shown that $S_b$ has at most rank C−1; therefore, the max number of eigenvectors with non-zero eigenvalues is C−1, which implies that the projected space has at most C−1 dimensions (K ≤ C−1).
Linear Discriminant Analysis (LDA) (cont’d)
$$S_b u_k = \lambda_k S_w u_k \quad \Rightarrow \quad S_w^{-1} S_b u_k = \lambda_k u_k$$

(assuming $S_w$ is non-singular)
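A minimal LDA sketch using one common definition of the scatter matrices (`lda_directions` is an illustrative helper, not from the slides):

```python
import numpy as np
from scipy.linalg import eigh

def lda_directions(X, labels):
    """Solve S_b u = lambda S_w u; returns eigenpairs by decreasing eigenvalue."""
    classes = np.unique(labels)
    mu = X.mean(axis=0)
    D = X.shape[1]
    S_b, S_w = np.zeros((D, D)), np.zeros((D, D))
    for c in classes:
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)
        S_b += len(Xc) * np.outer(mu_c - mu, mu_c - mu)   # between-class scatter
        S_w += (Xc - mu_c).T @ (Xc - mu_c)                # within-class scatter
    # Generalized symmetric eigenproblem; requires S_w to be non-singular
    lam, U = eigh(S_b, S_w)
    order = np.argsort(lam)[::-1]    # at most C-1 non-zero eigenvalues
    return lam[order], U[:, order]
```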
Linear Discriminant Analysis (LDA) (cont’d)
• To alleviate this problem (i.e., $S_w$ becoming singular, e.g., when the number of training samples is small relative to D), PCA could be applied first:

First map $x \in \mathbb{R}^D$ to $y \in \mathbb{R}^{D'}$ using PCA, with $D' < D$ chosen so that $S_w$ is non-singular in the reduced space; then apply LDA to the reduced data.
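A sketch of this two-stage pipeline using scikit-learn (the component count 50 is an arbitrary illustrative choice for $D'$):

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

# PCA first reduces D to D' so the within-class scatter is non-singular;
# LDA then projects to at most C-1 discriminant dimensions.
pipeline = make_pipeline(PCA(n_components=50), LinearDiscriminantAnalysis())
# pipeline.fit(X_train, y_train)
# features = pipeline.transform(X_test)
```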
Case Study I (cont’d)
• Assumptions
− Well-framed images are required as input for training and query-by-example test probes.
− Only a small variation in the size, position, and orientation of the objects in the images is allowed.
Case Study I (cont’d)
• Terminology
− Most Expressive Features (MEF): obtained using PCA.
− Most Discriminating Features (MDF): obtained using LDA.

• Numerical instabilities
− Computing the eigenvalues/eigenvectors of $S_w^{-1} S_B u_k = \lambda_k u_k$ could lead to unstable computations since $S_w^{-1} S_B$ is not always symmetric.
− Check the paper for more details about how to deal with this issue.
Case Study I (cont’d)
• Comparing projection directions between MEFs and MDFs:
− PCA eigenvectors show the tendency of PCA to capture major variations in the training set, such as lighting direction.
− LDA eigenvectors discount those factors unrelated to classification.
Case Study I (cont’d)
• Clustering effect
Case Study I (cont’d)
• Methodology
1) Represent each training image in terms of MDFs (or MEFs for comparison).
Case Study I (cont’d)
• Experiments and results
− A set of face images was used with 2 expressions and 3 lighting conditions.
− Testing was performed using a disjoint set of images.
Case Study I (cont’d)
− Examples of correct search probes
Case Study II
Case Study II (cont’d)
AR database