Dimensionality Reduction
• What is the objective?
− Choose an optimal set of lower-dimensional features to improve classification accuracy.
Dimensionality Reduction (cont’d)
[Figure: reducing an N-dimensional feature vector to a K-dimensional one, where $K \ll N$]
Feature Extraction
• Linear combinations are particularly attractive because they are simpler to compute and analytically tractable.
• In this case, the mapping has the form:
$y = Tx \in \mathbb{R}^K$, where $K \ll N$
Feature Extraction (cont’d)
• From a mathematical point of view, finding an optimum mapping $y = f(x)$ is equivalent to optimizing an objective criterion.
Feature Extraction (cont’d)
• Popular linear feature extraction methods:
− Principal Components Analysis (PCA): seeks a projection that preserves as much information in the data as possible.
− Linear Discriminant Analysis (LDA): seeks a projection that best discriminates the data.
Vector Representation
• A vector $x \in \mathbb{R}^N$ can be represented by N components with respect to an orthonormal basis $v_1, v_2, \ldots, v_N$:
$x = \sum_{i=1}^{N} x_i v_i$
Principal Component Analysis (PCA)
PCA - Steps
• Suppose we are given M vectors $x_1, x_2, \ldots, x_M$, each of size $N \times 1$ (N: number of features, M: number of data points).
Step 1: compute the sample mean:
$\bar{x} = \frac{1}{M} \sum_{i=1}^{M} x_i$
PCA - Steps (cont’d)
Step 4: compute the eigenvalues and eigenvectors of $\Sigma_x$:
$\Sigma_x u_i = \lambda_i u_i$, where we assume $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_N$
(Note: most software packages return the eigenvalues, and corresponding eigenvectors, in decreasing order; if not, you can explicitly put them in this order.)
Step 5: represent each mean-subtracted sample in the basis of eigenvectors (i.e., this is just a “change” of basis):
$x - \bar{x} = \sum_{i=1}^{N} y_i u_i$, where $y_i = u_i^T (x - \bar{x})$
Step 6: approximate $x$ using the first K eigenvectors only (reconstruction):
$\hat{x} = \bar{x} + \sum_{i=1}^{K} y_i u_i$
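These steps map directly onto a few lines of NumPy. The following is a minimal sketch; the function name pca and the 1/(M−1) covariance normalization are illustrative choices, not from the slides.

```python
import numpy as np

def pca(X, K):
    """Minimal PCA sketch following the steps above.
    X: M x N data matrix (one sample per row); K: target dimensionality."""
    x_bar = X.mean(axis=0)                 # Step 1: sample mean
    A = X - x_bar                          # Step 2: subtract the mean
    cov = A.T @ A / (len(X) - 1)           # Step 3: N x N sample covariance
    eigvals, U = np.linalg.eigh(cov)       # Step 4: eigen-decomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]      # put eigenvalues in decreasing order, as noted above
    eigvals, U = eigvals[order], U[:, order]
    Y = A @ U[:, :K]                       # Step 5: change of basis, keeping the first K components
    X_hat = x_bar + Y @ U[:, :K].T         # Step 6: reconstruction from the first K eigenvectors
    return Y, X_hat, eigvals
```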
What is the Linear Transformation Implied by PCA?
• The linear transformation $y = Tx$ which performs the dimensionality reduction in PCA is $T = U^T$, a $K \times N$ matrix whose rows are the first K eigenvectors of $\Sigma_x$.
What is the form of $\Sigma_y$?
• Using the diagonalization $\Sigma_x = P \Lambda P^T$, where the columns of $P$ are the eigenvectors of $\Sigma_x$ and the diagonal elements of $\Lambda$ are the eigenvalues of $\Sigma_x$ (i.e., the variances), we get $\Sigma_y = \Lambda$.
• In other words, PCA decorrelates the data: $\Sigma_y$ is diagonal.
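A one-line derivation of this fact (assuming the full change of basis $y = P^T(x - \bar{x})$, so that $\bar{y} = 0$):

```latex
\Sigma_y = E\big[y\,y^T\big]
         = P^T\, E\big[(x - \bar{x})(x - \bar{x})^T\big]\, P
         = P^T \Sigma_x P
         = P^T \big(P \Lambda P^T\big) P
         = \Lambda
```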
Example
• Compute the PCA of the following dataset:
(1,2), (3,3), (3,5), (5,4), (5,6), (6,5), (8,7), (9,8)
Example (cont’d)
• The eigenvectors are the solutions of the systems:
$(\Sigma_x - \lambda_i I)\, u_i = 0, \quad i = 1, 2$
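The arithmetic can be checked with a short NumPy snippet (assuming the 1/M covariance normalization; if 1/(M−1) is used instead, the eigenvalues scale accordingly but the eigenvectors are unchanged):

```python
import numpy as np

X = np.array([(1, 2), (3, 3), (3, 5), (5, 4), (5, 6), (6, 5), (8, 7), (9, 8)], dtype=float)
x_bar = X.mean(axis=0)                   # sample mean: (5, 5)
A = X - x_bar
cov = A.T @ A / len(X)                   # 2 x 2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # returned in increasing order
print(eigvals[::-1])                     # eigenvalues, largest first
print(eigvecs[:, ::-1])                  # corresponding eigenvectors (columns)
```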
How do we choose K?
• Choose the smallest K that preserves a sufficiently large fraction of the total variance in the data, e.g., using the criterion:
$\frac{\sum_{i=1}^{K} \lambda_i}{\sum_{i=1}^{N} \lambda_i} > T$, where T is a threshold (e.g., 0.9 or 0.95)
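A minimal sketch of this criterion (the default threshold value is an assumption for illustration):

```python
import numpy as np

def choose_k(eigvals, threshold=0.95):
    """Smallest K whose leading eigenvalues preserve the given fraction of the total variance.
    eigvals: eigenvalues of the covariance matrix, sorted in decreasing order."""
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratios, threshold)) + 1
```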
Approximation Error
• The approximation error due to keeping only the first K eigenvectors is $\|x - \hat{x}\|$, where $\hat{x} = \bar{x} + \sum_{i=1}^{K} y_i u_i$ (reconstruction).
• It can be shown that the average squared error equals the sum of the discarded eigenvalues, $\sum_{i=K+1}^{N} \lambda_i$, which is why K is chosen based on the eigenvalue spectrum.
Data Normalization
• In practice, the features are typically normalized prior to applying PCA (e.g., to zero mean and unit variance), so that features with large value ranges do not dominate the principal components.
Application to Images
• The goal is to represent images in a space of lower
dimensionality using PCA.
− Useful for various applications, e.g., face recognition, image
compression, etc.
• Given M images of size N x N, first represent each image as a 1D vector (i.e., by stacking the rows together); a small sketch of this step follows below.
− Note that for face recognition, faces must be centered and of the same size.
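A small sketch of the image-to-vector step (the helper name flatten_images is illustrative):

```python
import numpy as np

def flatten_images(images):
    """images: list of M arrays, each N x N. Returns an N^2 x M matrix whose
    columns are the images flattened by stacking their rows together."""
    return np.column_stack([img.reshape(-1) for img in images])
```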
Application to Images (cont’d)
• Form the $N^2 \times M$ matrix $A$ whose columns are the mean-subtracted image vectors; the covariance matrix is then proportional to $AA^T$.
• However, $AA^T$ is $N^2 \times N^2$, which is typically huge, so computing its eigenvalues/eigenvectors directly is impractical.
Application to Images (cont’d)
• We will use a simple “trick” to get around this by relating the eigenvalues/eigenvectors of $AA^T$ to those of $A^TA$.
Application to Images (cont’d)
• Key observation: if $v_i$ is an eigenvector of $A^TA$, i.e., $A^TA\, v_i = \mu_i v_i$, then multiplying both sides by $A$ gives $AA^T (A v_i) = \mu_i (A v_i)$, so $A v_i$ is an eigenvector of $AA^T$ with the same eigenvalue.
• $A^TA$ is only $M \times M$, with $M \ll N^2$, so its eigenvalues/eigenvectors are much cheaper to compute.
Application to Images (cont’d)
Step 3: compute $A^TA$ (i.e., instead of $AA^T$)
Step 4a: compute the eigenvalues $\mu_i$ and eigenvectors $v_i$ of $A^TA$
Step 4b: compute the $\lambda_i$, $u_i$ of $AA^T$ using $\lambda_i = \mu_i$ and $u_i = A v_i$, then normalize $u_i$ to unit length.
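A minimal sketch of Steps 3, 4a, and 4b (the helper name eigenfaces is illustrative; A is the N² x M matrix of mean-subtracted image columns):

```python
import numpy as np

def eigenfaces(A, K):
    """Compute the top K eigenfaces via the A^T A trick above."""
    mu, V = np.linalg.eigh(A.T @ A)      # Steps 3, 4a: eigenpairs of the small M x M matrix
    order = np.argsort(mu)[::-1]         # decreasing order
    mu, V = mu[order], V[:, order]
    U = A @ V[:, :K]                     # Step 4b: u_i = A v_i, with lambda_i = mu_i
    U /= np.linalg.norm(U, axis=0)       # normalize each u_i to unit length
    return mu[:K], U
```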
Example
[Figure: the training dataset of face images]
Example (cont’d)
[Figure: the top eigenvectors $u_1, \ldots, u_K$ visualized as images (“eigenfaces”): $u_1$, $u_2$, $u_3$; and the mean face]
Application to Images (cont’d)
• Interpretation: represent a face as a linear combination of eigenfaces:
$\hat{x} - \bar{x} = y_1 u_1 + y_2 u_2 + y_3 u_3 + \cdots + y_K u_K$
[Figure: a face image decomposed into eigenfaces $u_1, u_2, u_3$ with coefficients $y_1, y_2, y_3$]
Case Study: Eigenfaces for Face Detection/Recognition
• Face Recognition
− The simplest approach is to think of it as a template matching problem.
Face Recognition Using Eigenfaces
• Process the image database (i.e., the set of images with labels), typically referred to as the “training” phase:
− compute the eigenfaces and represent each training face i by its vector of projection coefficients $\Omega_i$.
Face Recognition Using Eigenfaces (cont’d)
• Given an unknown face x, follow these steps:
Step 1: subtract the mean face (computed from the training data): $\phi = x - \bar{x}$
Step 2: project $\phi$ onto the eigenspace to obtain the coefficient vector $\Omega = [y_1\; y_2\; \cdots\; y_K]^T$, where $y_i = u_i^T \phi$
Step 3: find the closest training face, i.e., $\min_i \|\Omega - \Omega_i\|$; if the minimum distance is below a threshold, classify x as that person.
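A sketch of these recognition steps (variable names and the use of a plain Euclidean distance threshold are illustrative assumptions):

```python
import numpy as np

def recognize(x, x_bar, U, Omegas, threshold):
    """x: unknown face (flattened); x_bar: mean face; U: N^2 x K eigenface matrix;
    Omegas: M x K matrix of training coefficient vectors, one row per labeled face."""
    phi = x - x_bar                                 # Step 1: subtract the mean face
    omega = U.T @ phi                               # Step 2: project onto the eigenfaces
    dists = np.linalg.norm(Omegas - omega, axis=1)  # distances to the training faces
    i = int(np.argmin(dists))                       # Step 3: closest training face
    return i if dists[i] < threshold else None      # None: no confident match
```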
Face Detection Using Eigenfaces
• Given an unknown image x, follow these steps:
Step 1: subtract the mean face (computed from the training data): $\phi = x - \bar{x}$
Step 2: project $\phi$ onto the eigenspace and reconstruct: $\hat{\phi} = \sum_{i=1}^{K} y_i u_i$, where $y_i = u_i^T \phi$
Step 3: compute the reconstruction error $\|\phi - \hat{\phi}\|$; x contains a face if the error is below a threshold (faces are reconstructed well by the eigenfaces, non-faces are not).
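And the corresponding detection test, again as a hedged sketch (the threshold is application-dependent):

```python
import numpy as np

def is_face(x, x_bar, U, threshold):
    """Faces lie close to the eigenface subspace; a small reconstruction error suggests a face."""
    phi = x - x_bar                       # Step 1: subtract the mean face
    phi_hat = U @ (U.T @ phi)             # Step 2: reconstruct from the K eigenfaces
    return np.linalg.norm(phi - phi_hat) < threshold   # Step 3: reconstruction error test
```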
Eigenfaces
[Figure: input image vs. its reconstruction; the reconstructed image looks like a face again!]
Reconstruction from Partial Information
[Figure: input images with partial information vs. their reconstructions]
Limitations
• Background changes cause problems
− De-emphasize the outside of the face, e.g., by multiplying the input image by a 2D Gaussian window centered on the face (see the sketch after this list).
• Light changes degrade performance
− Light normalization might help but this is a challenging issue.
• Performance decreases quickly with changes to face size
− Scale input image to multiple sizes.
− Multi-scale eigenspaces.
• Performance decreases with changes to face orientation (but
not as fast as with scale changes)
− Out-of-plane rotations are more difficult to handle.
− Multi-orientation eigenspaces.
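A sketch of the Gaussian-window idea from the first bullet (the sigma parameter is an assumption, and the face is assumed to be centered in the image):

```python
import numpy as np

def deemphasize_background(image, sigma):
    """Multiply the image by a 2D Gaussian window centered on the face."""
    h, w = image.shape
    y, x = np.mgrid[0:h, 0:w]
    g = np.exp(-((x - w / 2) ** 2 + (y - h / 2) ** 2) / (2 * sigma ** 2))
    return image * g
```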
Limitations (cont’d)
• Not robust to misalignment.
Limitations (cont’d)
• PCA is not always an optimal dimensionality-reduction
technique for classification purposes.
Linear Discriminant Analysis (LDA)
[Figure: two candidate projection directions; one separates the classes well, the other mixes them]
• Let $\mu_i$ be the mean of the i-th class, $i = 1, 2, \ldots, C$, and $\mu$ be the mean of the whole dataset:
$\mu = \frac{1}{M} \sum_{j=1}^{M} x_j$
Linear Discriminant Analysis (LDA) (cont’d)
• Suppose the desired projection transformation is:
$y = U^T x$
or, component-wise, $y_i = u_i^T x$
Linear Discriminant Analysis (LDA) (cont’d)
• Define the within-class and between-class scatter matrices:
$S_w = \sum_{i=1}^{C} \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T \qquad S_b = \sum_{i=1}^{C} M_i (\mu_i - \mu)(\mu_i - \mu)^T$
where $M_i$ is the number of samples in class $\omega_i$.
Linear Discriminant Analysis (LDA) (cont’d)
• LDA seeks the projection $U$ that maximizes the between-class scatter while minimizing the within-class scatter, i.e., maximizes $J(U) = \frac{|U^T S_b U|}{|U^T S_w U|}$; the columns of the optimal $U$ are the eigenvectors of $S_w^{-1} S_b\, u_k = \lambda_k u_k$.
• A problem arises when $S_w$ is singular (e.g., when the number of training samples is small relative to the dimensionality), since then $S_w^{-1}$ does not exist.
Linear Discriminant Analysis (LDA) (cont’d)
• To alleviate this problem, PCA could be applied first:
− first reduce the dimensionality of the data using PCA so that $S_w$ becomes non-singular, then apply LDA in the reduced space; a sketch of the LDA step follows below.
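A minimal NumPy sketch of the LDA computation above (using a pseudo-inverse in place of $S_w^{-1}$ for robustness, which is an illustrative choice; a more careful numerical treatment is discussed in Case Study I below):

```python
import numpy as np

def lda(X, labels, K):
    """X: M x N data (one sample per row); labels: length-M class labels; K <= C-1."""
    mu = X.mean(axis=0)                          # mean of the whole dataset
    N = X.shape[1]
    Sw, Sb = np.zeros((N, N)), np.zeros((N, N))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)                   # class mean
        Sw += (Xc - mu_c).T @ (Xc - mu_c)        # within-class scatter
        d = (mu_c - mu).reshape(-1, 1)
        Sb += len(Xc) * (d @ d.T)                # between-class scatter
    # eigenvectors of Sw^{-1} Sb (not symmetric in general, hence np.linalg.eig)
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    U = eigvecs[:, order[:K]].real               # top K discriminant directions
    return X @ U                                 # project the data
```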
Case Study I
Case Study I (cont’d)
• Assumptions
− Well-framed images are required as input for training and query-by-example test probes.
− Only a small variation in the size, position, and orientation of the objects in the images is allowed.
Case Study I (cont’d)
• Terminology
− Most Expressive Features (MEF): features obtained using PCA.
− Most Discriminating Features (MDF): features obtained using LDA.
• Numerical instabilities
− Computing the eigenvalues/eigenvectors of $S_w^{-1} S_B u_k = \lambda_k u_k$ could lead to unstable computations since $S_w^{-1} S_B$ is not always symmetric.
− Check the paper for more details about how to deal with this issue.
Case Study I (cont’d)
• Comparing projection directions between MEF and MDF:
− PCA eigenvectors show the tendency of PCA to capture major variations in the training set, such as lighting direction.
− LDA eigenvectors discount those factors unrelated to classification.
Case Study I (cont’d)
• Clustering effect
Case Study I (cont’d)
• Methodology
1) Represent each training image in terms of MDFs (or MEFs for comparison).
Case Study I (cont’d)
• Experiments and results
− A set of face images was used, with 2 expressions and 3 lighting conditions.
− Testing was performed using a disjoint set of images.
Case Study I (cont’d)
[Figure: examples of correct search probes]
Case Study II
Case Study II (cont’d)
[Figure: sample images from the AR database]