
Dimensionality Reduction

Curse of Dimensionality

• Increasing the number of features will not always improve classification accuracy.

• In practice, the inclusion of more features might actually lead to worse performance.

• The number of training examples required increases exponentially with the dimensionality D (i.e., k^D, where k is the number of bins per feature).

[Figure: partitioning each feature into k=3 bins gives 3^1, 3^2, and 3^3 total bins for D = 1, 2, and 3 features.]
Dimensionality Reduction

• What is the objective?
− Choose an optimum set of features d* of lower dimensionality to improve classification accuracy.

• Different methods can be used to reduce dimensionality:
− Feature extraction
− Feature selection
Dimensionality Reduction (cont’d)

• Feature extraction: computes a new set of features from the original features through some transformation f(); f() could be linear or non-linear:

  x = [x1, x2, ..., xD]^T  →  y = f(x) = [y1, y2, ..., yK]^T,  K << D

• Feature selection: chooses a subset of the original features:

  x = [x1, x2, ..., xD]^T  →  [x_i1, x_i2, ..., x_iK]^T,  K << D
Feature Extraction

• Linear transformations are particularly attractive because they are simpler to compute and analytically tractable.

• Given x ∈ R^D, find a K x D matrix T such that:

  y = Tx ∈ R^K,  where K << D

  This is a projection transformation from D dimensions to K dimensions; each new feature yi is a linear combination of the original features xi.
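A minimal numpy sketch of such a linear projection (the matrix T here is random, purely to illustrate the shapes involved; PCA and LDA below choose T in principled ways):

import numpy as np

D, K = 10, 3                      # original and reduced dimensionality
rng = np.random.default_rng(0)

x = rng.normal(size=D)            # a single sample x in R^D
T = rng.normal(size=(K, D))       # a K x D projection matrix (placeholder, not optimized)

y = T @ x                         # y = Tx in R^K; each y_i is a linear combination of the x_i
print(y.shape)                    # (3,)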
Feature Extraction (cont’d)

• From a mathematical point of view, finding an optimum mapping y = f(x) can be formulated as an optimization problem (i.e., minimize or maximize an objective criterion).

• Commonly used objective criteria:

− Minimize Information Loss: the projection onto the lower-dimensional space preserves as much information in the data as possible.

− Maximize Discriminatory Information: the projection onto the lower-dimensional space increases class separability.
Feature Extraction (cont’d)

• Popular linear feature extraction methods:
− Principal Components Analysis (PCA): seeks a projection that minimizes information loss.
− Linear Discriminant Analysis (LDA): seeks a projection that maximizes discriminatory information.

• Many other methods:
− Making features as independent as possible (Independent Component Analysis).
− Retaining interesting directions (Projection Pursuit).
− Embedding to lower dimensional manifolds (Isomap, Locally Linear Embedding).
Vector Representation

• A vector x ∈ R^D can be represented by D components:

  x = [x1, x2, ..., xD]^T

• Assuming the standard basis <v1, v2, ..., vD> (i.e., unit vectors in each dimension), xi can be obtained by projecting x along the direction of vi:

  xi = (x^T vi) / (vi^T vi) = x^T vi

  The xi are called projection coefficients.

• x can be “reconstructed” from its D projection coefficients:

  x = Σ_{i=1}^{D} xi vi = x1 v1 + x2 v2 + ... + xD vD

  The xi are called expansion coefficients in this context.

• In summary, the components of a vector are associated with a specific basis; if the basis is changed, the components change too (essentially a change of coordinate systems).
Vector Representation (cont’d)

• Example assuming D=2:  x = [x1, x2]^T = [3, 4]^T

• Assuming the standard basis <v1 = i, v2 = j>, xi can be obtained by projecting x along the direction of vi:

  x1 = x^T i = [3 4] [1 0]^T = 3
  x2 = x^T j = [3 4] [0 1]^T = 4

• x can be “reconstructed” from its projection coefficients as follows:

  x = 3i + 4j
PCA – Main Idea

• Any x ∈ R^D can be written as a linear combination of an orthonormal set of D basis vectors <v1, v2, ..., vD>, vi ∈ R^D (e.g., using the standard basis):

  x = Σ_{i=1}^{D} xi vi = x1 v1 + x2 v2 + ... + xD vD

  where vi^T vj = 1 if i = j and 0 otherwise, and xi = (x^T vi)/(vi^T vi) = x^T vi

• PCA seeks to approximate x in a subspace of R^D using a new set of K<<D basis vectors <u1, u2, ..., uK>, ui ∈ R^D:

  x̂ = Σ_{i=1}^{K} yi ui = y1 u1 + y2 u2 + ... + yK uK   (reconstruction)

  where ui^T uj = 1 if i = j and 0 otherwise, and yi = (x^T ui)/(ui^T ui) = x^T ui

  such that ||x − x̂|| is minimized (i.e., information loss is minimized).
Principal Component Analysis (PCA)

• The “optimal” set of basis vectors <u1, u2, ..., uK> can be found as follows (we’ll explain the details later):

  (1) Find the eigenvectors ui of the covariance matrix Σx of the (training) data (i.e., typically, D distinct eigenvectors):

      Σx ui = λi ui   (reminder: the ui form an orthogonal basis)

  (2) Choose the K “largest” eigenvectors ui (i.e., those corresponding to the K “largest” eigenvalues λi).

  <u1, u2, ..., uK> form the “optimal” basis!

  We refer to the “largest” eigenvectors ui as principal components.

  http://people.whitman.edu/~hundledr/courses/M350F14/M350/BestBasisRedone.pdf
PCA - Steps

• Suppose we are given M vectors x1, x2, ..., xM, each of size D x 1
  (D: number of features, M: number of data points)

Step 1: compute the sample mean

  x̄ = (1/M) Σ_{i=1}^{M} xi

Step 2: subtract the sample mean (i.e., center the data at zero)

  Φi = xi − x̄

Step 3: compute the sample covariance matrix Σx

  Σx = (1/M) Σ_{i=1}^{M} (xi − x̄)(xi − x̄)^T = (1/M) Σ_{i=1}^{M} Φi Φi^T = (1/M) A A^T

  where A = [Φ1 Φ2 ... ΦM] is a D x M matrix (i.e., the columns of A are the Φi).
PCA - Steps

Step 4: compute the eigenvalues/eigenvectors of Σx

  Σx ui = λi ui,  where we assume λ1 ≥ λ2 ≥ ... ≥ λD

  Note: most software packages return the eigenvalues (and corresponding eigenvectors) in decreasing order – if not, you should explicitly put them in this order.

  Since Σx is symmetric, <u1, u2, ..., uD> form an orthogonal basis in R^D; therefore, we can represent any x ∈ R^D as:

  x − x̄ = Σ_{i=1}^{D} yi ui = y1 u1 + y2 u2 + ... + yD uD

  where yi = ((x − x̄)^T ui)/(ui^T ui) = (x − x̄)^T ui if ||ui|| = 1

  The yi are called eigen-coefficients; this is just a “change” of basis!

  Note: most software packages normalize ui to unit length to simplify calculations; if not, you should explicitly normalize them.
PCA - Steps

Step 5: dimensionality reduction step – approximate x by x̂ using only the first K eigenvectors (K<<D), i.e., those corresponding to the K largest eigenvalues (K is a parameter):

  x − x̄ = Σ_{i=1}^{D} yi ui = y1 u1 + y2 u2 + ... + yD uD   (change of basis)

  approximate x by x̂ using the first K eigenvectors:

  x̂ − x̄ = Σ_{i=1}^{K} yi ui = y1 u1 + y2 u2 + ... + yK uK   (reconstruction)

  The representation [y1, y2, ..., yD]^T is replaced by [y1, y2, ..., yK]^T (dimensionality reduction).

  K << D; note that if K = D, then x̂ = x (i.e., zero reconstruction error).
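Steps 1-5 can be sketched in a few lines of numpy (a sketch, not a reference implementation; np.linalg.eigh is used because Σx is symmetric, and the example data are synthetic):

import numpy as np

def pca(X, K):
    """X: M x D data matrix (one sample per row); returns mean, top-K eigenvectors, eigenvalues, coefficients."""
    M, D = X.shape
    x_bar = X.mean(axis=0)                     # Step 1: sample mean
    Phi = X - x_bar                            # Step 2: center the data
    Sigma_x = (Phi.T @ Phi) / M                # Step 3: covariance (1/M) A A^T, with A = Phi^T
    lam, U = np.linalg.eigh(Sigma_x)           # Step 4: eigenvalues/eigenvectors (returned in ascending order)
    order = np.argsort(lam)[::-1]              # sort in decreasing order of eigenvalue
    lam, U = lam[order], U[:, order]
    Y = Phi @ U[:, :K]                         # Step 5: eigen-coefficients y_i = u_i^T (x - x_bar)
    return x_bar, U[:, :K], lam, Y

# usage: reconstruct x_hat = x_bar + sum_i y_i u_i
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))   # 100 synthetic samples, D = 5
x_bar, U_K, lam, Y = pca(X, K=2)
X_hat = x_bar + Y @ U_K.T                      # reconstruction from the K coefficients
print(np.mean(np.sum((X - X_hat) ** 2, axis=1)))           # average squared reconstruction error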
What is the Linear Transformation implied by PCA?

• The linear transformation y = Tx associated with PCA can be found as follows:

  x̂ − x̄ = Σ_{i=1}^{K} yi ui = y1 u1 + y2 u2 + ... + yK uK

  In matrix form:  x̂ − x̄ = U [y1, y2, ..., yK]^T,  where U = [u1 u2 ... uK] is a D x K matrix
  (i.e., the columns of U are the first K eigenvectors of Σx)

  Therefore:  [y1, y2, ..., yK]^T = U^T (x̂ − x̄),  i.e., T = U^T is a K x D matrix
  (i.e., the rows of T are the first K eigenvectors of Σx)
What is the form of Σy ?

  Σx = (1/M) Σ_{i=1}^{M} (xi − x̄)(xi − x̄)^T = (1/M) Σ_{i=1}^{M} Φi Φi^T

• Using diagonalization:  Σx = P Λ P^T
  (the columns of P are the eigenvectors of Σx; the diagonal elements of Λ are the eigenvalues of Σx, i.e., the variances)

• The projected data are yi = U^T (xi − x̄) = P^T Φi (taking K = D, so that U = P), and ȳ = 0; therefore:

  Σy = (1/M) Σ_{i=1}^{M} (yi − ȳ)(yi − ȳ)^T
     = (1/M) Σ_{i=1}^{M} yi yi^T
     = (1/M) Σ_{i=1}^{M} (P^T Φi)(P^T Φi)^T
     = P^T ( (1/M) Σ_{i=1}^{M} Φi Φi^T ) P
     = P^T Σx P = P^T (P Λ P^T) P = Λ

  Σy = Λ  →  PCA de-correlates the data and preserves the original variances!
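A quick numerical check of this result (a self-contained sketch on synthetic data, projecting onto all D eigenvectors so that U = P):

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # correlated synthetic data, M = 500, D = 4

Phi = X - X.mean(axis=0)
Sigma_x = Phi.T @ Phi / len(X)
lam, P = np.linalg.eigh(Sigma_x)             # Sigma_x = P Lambda P^T

Y = Phi @ P                                  # project onto all D eigenvectors (K = D)
Sigma_y = Y.T @ Y / len(Y)                   # covariance of the projected data

# Sigma_y is (numerically) diagonal, with the eigenvalues of Sigma_x on its diagonal:
print(np.allclose(Sigma_y, np.diag(lam), atol=1e-10))      # True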
Interpretation of PCA

• PCA chooses the eigenvectors corresponding to the largest eigenvalues.
• The eigenvalues correspond to the variance of the data along the eigenvector directions.
• Therefore, PCA projects the data along the directions where the data varies most.
• PCA preserves as much information as possible in the data by preserving as much variance as possible.

[Figure: u1 is the direction of maximum variance; u2 is orthogonal to u1.]
Example

• Compute the PCA of the following dataset:

  (1,2), (3,3), (3,5), (5,4), (5,6), (6,5), (8,7), (9,8)

• Compute the sample covariance matrix:

  Σ̂ = (1/n) Σ_{k=1}^{n} (xk − μ̂)(xk − μ̂)^T

• The eigenvalues can be computed by finding the roots of the characteristic polynomial:

  det(Σ̂ − λI) = 0
Example (cont’d)

• The eigenvectors are the solutions of:

  Σx ui = λi ui

  Note: if ui is a solution, then c·ui is also a solution, for any c ≠ 0.

• Eigenvectors are typically normalized to have unit length:

  ûi = ui / ||ui||
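For the small 2-D dataset above, the covariance matrix, eigenvalues, and unit-length eigenvectors can be checked with a short numpy sketch (np.linalg.eigh already returns unit-length eigenvectors):

import numpy as np

X = np.array([(1, 2), (3, 3), (3, 5), (5, 4), (5, 6), (6, 5), (8, 7), (9, 8)], dtype=float)

mu = X.mean(axis=0)                          # sample mean (5, 5)
Phi = X - mu
Sigma = Phi.T @ Phi / len(X)                 # sample covariance (1/n convention, as above)

lam, U = np.linalg.eigh(Sigma)               # eigenvalues (ascending) and unit-length eigenvectors
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

print("covariance:\n", Sigma)
print("eigenvalues:", lam)
print("principal direction u1:", U[:, 0])    # direction of maximum variance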
How should we choose K ?

• K is typically chosen based on how much information (variance) we want to preserve in the data:

  Choose the smallest K that satisfies the following inequality:

  ( Σ_{i=1}^{K} λi ) / ( Σ_{i=1}^{D} λi ) > T,  where T is a threshold (e.g., 0.9)

• If T=0.9, for example, K is chosen to “preserve” 90% of the information (variance) in the data.

• If K=D, then we “preserve” 100% of the information in the data (i.e., just a “change” of basis, and x̂ = x).
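A sketch of this rule in numpy (it assumes the eigenvalues are already sorted in decreasing order, as in the PCA steps above; the example eigenvalues are made up):

import numpy as np

def choose_k(eigvals, T=0.9):
    """Smallest K whose leading eigenvalues capture more than a fraction T of the total variance."""
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.argmax(ratios > T) + 1)        # first K with cumulative ratio above T

# example: with these (made-up) eigenvalues, preserving 90% of the variance needs K = 2 components
print(choose_k(np.array([6.0, 3.5, 0.3, 0.2]), T=0.9))     # 2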
Approximation or Reconstruction Error

• The approximation error (or reconstruction error) can be computed by:

  ||x − x̂||,  where x̂ = x̄ + Σ_{i=1}^{K} yi ui   (reconstruction)

• It can also be computed from the discarded eigenvalues:

  ||x − x̂|| = (1/2) Σ_{i=K+1}^{D} λi
Data Normalization

• The principal components depend both on the units used to measure the original variables (i.e., features) and on the range of values they assume.

• If different units and/or ranges are involved, features should always be normalized prior to applying PCA.

• A common normalization method is to transform all the features to have zero mean and unit standard deviation:

  (xi − μ) / σ,  where μ and σ are the mean and standard deviation of the i-th feature xi
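A sketch of this normalization in numpy (the small epsilon guarding against constant features is an implementation detail, not part of the slide):

import numpy as np

def zscore(X, eps=1e-12):
    """Normalize each feature (column) of X to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / (sigma + eps)

X = np.array([[180.0, 70000.0], [165.0, 52000.0], [172.0, 61000.0]])   # e.g., height (cm) and income ($)
print(zscore(X).std(axis=0))       # each column now has (approximately) unit standard deviation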
Application to Images

• Goal: represent images in a space of lower dimensionality using PCA.
  − Useful for various applications, e.g., face recognition, image compression, etc.

• Given M images of size N x N, first represent each image as an N² x 1 vector (i.e., by stacking its rows together).

  The number of features is then D = N².
Application to Images (cont’d)

• The key challenge is that the covariance matrix Σx is now very large – here is Step 3 again:

  Step 3: compute the covariance matrix Σx

    Σx = (1/M) Σ_{i=1}^{M} Φi Φi^T = (1/M) A A^T,  where A = [Φ1 Φ2 ... ΦM] is an N² x M matrix

• Σx is now an N² x N² matrix – it is computationally expensive to compute its eigenvalues/eigenvectors λi, ui:

  (A A^T) ui = λi ui
Application to Images (cont’d)

• We will use a simple “trick” to get around this by relating the eigenvalues/eigenvectors of A A^T to those of A^T A.

• A^T A is an M x M matrix (i.e., typically much smaller):

  − Suppose its eigenvalues/eigenvectors are μi, vi:  (A^T A) vi = μi vi

  − Multiply both sides by A:  A (A^T A) vi = A μi vi,  or  (A A^T)(A vi) = μi (A vi)

  − Since (A A^T) ui = λi ui, it follows that λi = μi and ui = A vi

  (recall A = [Φ1 Φ2 ... ΦM] is an N² x M matrix)
Application to Images (cont’d)

• Do A A^T and A^T A have the same number of eigenvalues and eigenvectors?

  − A A^T can have up to N² eigenvalues/eigenvectors.
  − A^T A can have up to M eigenvalues/eigenvectors.
  − It turns out that the M eigenvalues/eigenvectors of A^T A correspond to the M largest eigenvalues/eigenvectors of A A^T.

• Steps 3-5 of PCA need to be updated as follows:
Application to Images (cont’d)

Step 3: compute A^T A (i.e., instead of A A^T)

Step 4a: compute the eigenvalues/eigenvectors μi, vi of A^T A

Step 4b: obtain the λi, ui of A A^T using λi = μi and ui = A vi; then normalize ui to unit length.

Step 5: dimensionality reduction step – approximate x using only the “largest” K eigenvectors (K << M):

  x − x̄ = Σ_{i=1}^{M} yi ui = y1 u1 + y2 u2 + ... + yM uM

  approximate x by x̂ using the K largest eigenvectors:

  x̂ − x̄ = Σ_{i=1}^{K} yi ui = y1 u1 + y2 u2 + ... + yK uK

  The representation [y1, ..., yM]^T is replaced by [y1, ..., yK]^T.

  K << M; note that if K = M, then x̂ ≈ x (i.e., close to zero reconstruction error).
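A sketch of the updated steps in numpy (the A^T A “trick”): the eigen-decomposition is done on the small M x M matrix, and the eigenvectors of A A^T are recovered as ui = A vi and normalized. The image size, number of images, and random data are made up for illustration:

import numpy as np

rng = np.random.default_rng(3)
N, M, K = 32, 20, 8                              # image size N x N, M training images, K << M
X = rng.random((M, N * N))                       # each row: one image flattened to N^2 values

x_bar = X.mean(axis=0)
A = (X - x_bar).T                                # A = [Phi_1 ... Phi_M], an N^2 x M matrix

# Step 3: work with the small M x M matrix A^T A instead of the N^2 x N^2 matrix A A^T
mu, V = np.linalg.eigh(A.T @ A)                  # Step 4a: eigenvalues/eigenvectors of A^T A
order = np.argsort(mu)[::-1]
mu, V = mu[order], V[:, order]

U = A @ V[:, :K]                                 # Step 4b: u_i = A v_i (eigenvectors of A A^T)
U /= np.linalg.norm(U, axis=0)                   # normalize each u_i to unit length

Y = (X - x_bar) @ U                              # Step 5: K eigen-coefficients per image
X_hat = x_bar + Y @ U.T                          # reconstruction from the K "eigenfaces"
print(U.shape, Y.shape)                          # (1024, 8) (20, 8)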
Example

[Figure: the training dataset of face images.]
Example (cont’d)

The K largest eigenvectors u1, ..., uK, visualized as images, are called “eigenfaces”.

[Figure: the eigenfaces u1, u2, u3 and the mean face x̄.]
Example (cont’d)

• How can you visualize an eigenvector v = [x1, x2, ..., xD]^T as a PGM image?

  − Need to map its values xi to integer values yi in the interval [0, 255] (i.e., as required by the PGM format).

  − Suppose fmin and fmax are the min/max values of v (note that they could be negative).

  − The following transformation achieves the desired mapping, i.e., [fmin, fmax] → [0, 255]:

    yi = (int) 255 (xi − fmin) / (fmax − fmin)
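A sketch of this mapping in Python (it writes a binary “P5” PGM; the image side length N and the file name are assumptions):

import numpy as np

def eigenvector_to_pgm(v, N, path):
    """Map an eigenvector v (length N*N, possibly with negative values) to [0, 255] and save it as a PGM image."""
    f_min, f_max = v.min(), v.max()
    y = (255 * (v - f_min) / (f_max - f_min)).astype(np.uint8)    # [f_min, f_max] -> [0, 255]
    with open(path, "wb") as f:
        f.write(f"P5\n{N} {N}\n255\n".encode("ascii"))            # PGM header
        f.write(y.reshape(N, N).tobytes())                        # pixel data, row by row

# usage (hypothetical 32 x 32 eigenvector):
eigenvector_to_pgm(np.random.default_rng(0).normal(size=32 * 32), 32, "eigenface.pgm")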
Application to Images (cont’d)

• Interpretation: approximate a face image using eigenfaces.

  The K largest eigenvectors u1, ..., uK act as basis vectors (eigenfaces):

  x̂ = Σ_{i=1}^{K} yi ui + x̄ = y1 u1 + y2 u2 + ... + yK uK + x̄

  where y1, y2, ..., yK are the eigen-coefficients, i.e., x̂ − x̄ is represented by [y1, y2, ..., yK]^T.

  [Figure: a face image approximated as a weighted sum of eigenfaces plus the mean face x̄.]
Case Study: Eigenfaces for Face Detection/Recognition

− M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.

• Face Recognition
  − The simplest approach is to think of it as a template matching problem.
  − Problems arise when performing recognition in a high-dimensional space.
  − Use PCA for dimensionality reduction!
Training Phase

• Given a set of face images from a group of people in an image database (each person could have one or more images), perform the following steps:

  − Compute the PCA space (the K largest eigenvectors, or eigenspace) using the image database (i.e., the training data).

  − Represent each image i in the database with a vector Ωi of K eigen-coefficients:

    Ωi = [y1, y2, ..., yK]^T

  Note: faces must be centered and of the same size.
Recognition Phase

Given an unknown face x, apply the following steps:

Step 1: Subtract the mean face x̄ (computed from the training data):  Φ = x − x̄

Step 2: Project the unknown face onto the eigenspace to obtain its eigen-coefficients Ω = [y1, y2, ..., yK]^T:

  yi = Φ^T ui,  i = 1, 2, ..., K

Step 3: Find the closest match Ωp from the training set:

  p = arg min_i ||Ω − Ωi||,  i = 1, 2, ..., M

  where er = min_i ||Ω − Ωi|| is computed using either the Euclidean distance
  Σ_{j=1}^{K} (yj − yj^i)²  or the Mahalanobis distance  Σ_{j=1}^{K} (1/λj)(yj − yj^i)².

  The distance er is called distance in face space (difs).

Step 4: Recognize x as the person associated with the ID of Ωp.

  Note: for intruder detection, we also impose er < Tr, for some threshold Tr.
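A sketch of the recognition phase in numpy. It assumes the training phase produced the mean face x_bar, the eigenvector matrix U (N² x K), the stored eigen-coefficient vectors Omegas (M x K) with their person ids, and optionally the eigenvalues lam for the Mahalanobis-style distance; all of these names are illustrative:

import numpy as np

def recognize(x, x_bar, U, Omegas, ids, lam=None, T_r=None):
    """Project an unknown face x and return the ID of the closest training face (or None for an intruder)."""
    Omega = U.T @ (x - x_bar)                       # Steps 1-2: eigen-coefficients of the unknown face
    diff = Omegas - Omega                           # Step 3: compare against all M stored faces
    if lam is None:
        d2 = np.sum(diff ** 2, axis=1)              # squared Euclidean distance (difs)
    else:
        d2 = np.sum(diff ** 2 / lam, axis=1)        # Mahalanobis-style distance, weighting by 1/lambda_j
    p = int(np.argmin(d2))
    e_r = np.sqrt(d2[p])
    if T_r is not None and e_r >= T_r:              # optional intruder test: reject if too far
        return None, e_r
    return ids[p], e_r                              # Step 4: ID of the closest match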
Face Detection vs. Recognition

[Figure: detection finds where the faces are; recognition identifies whose face it is (e.g., “Sally”).]
Face Detection Using Eigenfaces

Given an unknown image x, follow these steps:

Step 1: Subtract the mean face x̄ (computed from the training data):  Φ = x − x̄

Step 2: Project the unknown face onto the eigenspace and reconstruct it using the K projections:

  yi = Φ^T ui,  i = 1, 2, ..., K        Φ̂ = Σ_{i=1}^{K} yi ui   (reconstruction)

Step 3: Compute ed = ||Φ − Φ̂||

  The distance ed is called distance from face space (dffs).

Step 4: If ed < Td, then x is a face.
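A sketch of the detection test in numpy (x_bar and U as produced by a training step like the one sketched earlier; T_d is a user-chosen threshold):

import numpy as np

def dffs(x, x_bar, U):
    """Distance from face space: reconstruction error of x using the eigenface basis U (N^2 x K)."""
    Phi = x - x_bar                       # Step 1: subtract the mean face
    y = U.T @ Phi                         # Step 2: project onto the K eigenfaces...
    Phi_hat = U @ y                       # ...and reconstruct
    return np.linalg.norm(Phi - Phi_hat)  # Step 3: e_d = ||Phi - Phi_hat||

def is_face(x, x_bar, U, T_d):
    return dffs(x, x_bar, U) < T_d        # Step 4: accept as a face if e_d < T_d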
Why does this work?

[Figure: several input images Φ and their reconstructions Φ̂.]

• The reconstruction Φ̂ of any input looks like a face, since it is a linear combination of eigenfaces.
• For a non-face input, however, the reconstruction differs substantially from the input, i.e., the dffs ed = ||Φ − Φ̂|| is large.
Face Detection Using Eigenfaces

We can use the dffs, ed = ||Φ − Φ̂||, to find faces in an image:

1) Compute ed at each image location.
2) Pick the location with the lowest distance (red circle).

[Figure: visualization of the dffs as an image – dark: small ed value; bright: large ed value.]
Limitations (cont’d)

• PCA is not always an optimal dimensionality-reduction technique for classification purposes.
Linear Discriminant Analysis (LDA)

• What is the goal of LDA?
  − Seeks to find directions along which the classes are best separated (i.e., to find more discriminatory features).
  − It takes into consideration the scatter (i.e., variance) within classes and between classes.

[Figure: two candidate projection directions – one gives bad class separability, the other gives good separability.]
Linear Discriminant Analysis (LDA) (cont’d)

• Let us assume C classes, with the i-th class containing Mi samples, i=1,2,...,C (each of dimensionality D), and with M being the total number of samples:

  M = Σ_{i=1}^{C} Mi

• Let μi be the mean of the i-th class, i=1,2,...,C, and μ be the mean of the whole dataset:

  μ = (1/C) Σ_{i=1}^{C} μi

• Within-class scatter matrix:

  Sw = Σ_{i=1}^{C} Σ_{j=1}^{Mi} (xj − μi)(xj − μi)^T   (the inner sum is over the samples xj of class i)

• Between-class scatter matrix:

  Sb = Σ_{i=1}^{C} (μi − μ)(μi − μ)^T
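A sketch of these two scatter matrices in numpy, following the definitions above (X is an M x D data matrix, labels holds the class index of each row):

import numpy as np

def scatter_matrices(X, labels):
    """Within-class (Sw) and between-class (Sb) scatter matrices, as defined on this slide."""
    classes = np.unique(labels)
    D = X.shape[1]
    class_means = np.array([X[labels == c].mean(axis=0) for c in classes])
    mu = class_means.mean(axis=0)                      # mean of the class means, as above

    Sw = np.zeros((D, D))
    for c, mu_c in zip(classes, class_means):
        Xi = X[labels == c] - mu_c
        Sw += Xi.T @ Xi                                # sum_j (x_j - mu_i)(x_j - mu_i)^T

    diffs = class_means - mu
    Sb = diffs.T @ diffs                               # sum_i (mu_i - mu)(mu_i - mu)^T
    return Sw, Sb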
Linear Discriminant Analysis (LDA) (cont’d)

• Suppose the desired projection transformation is:

  y = U^T x

• Let S̃b and S̃w be the scatter matrices of the projected data y.

• LDA seeks a transformation U that maximizes the between-class scatter and minimizes the within-class scatter:

  max |S̃b| / |S̃w|,  or equivalently,  max |U^T Sb U| / |U^T Sw U|

• What is the solution U to the above optimization problem?
Linear Discriminant Analysis (LDA) (cont’d)

• It turns out that the columns of U are the eigenvectors (called Fisherfaces) corresponding to the largest eigenvalues of the following generalized eigen-problem:

  Sb uk = λk Sw uk

• It can be shown that Sb has rank at most C−1; therefore, the maximum number of eigenvectors with non-zero eigenvalues is C−1, which implies that:

  the maximum dimensionality of the LDA sub-space is C−1

  e.g., when C=2, we always end up with one LDA feature, no matter what the original number of features D was!
Example

[Figure: LDA projection example.]
Linear Discriminant Analysis (LDA) (cont’d)

• If Sw is non-singular, we can instead solve a conventional eigenvalue problem:

  Sb uk = λk Sw uk   →   Sw⁻¹ Sb uk = λk uk

• In practice, Sw is singular due to the high dimensionality of the data (e.g., images) and the relatively low number of samples.
Linear Discriminant Analysis (LDA) (cont’d)

• To alleviate this problem, PCA could be applied first (see the sketch below):

  1) First, apply PCA to reduce the data dimensionality:

     x = [x1, ..., xD]^T  --PCA-->  y = [y1, ..., yD′]^T,  D′ < D

  2) Then, apply LDA to find the most discriminative directions:

     y = [y1, ..., yD′]^T  --LDA-->  z = [z1, ..., zK]^T,  K < D′
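A sketch of this PCA-then-LDA pipeline using scikit-learn (assuming it is available; the Iris data and the intermediate dimensionality D′ = 3 are only illustrative stand-ins for image data):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, labels = load_iris(return_X_y=True)        # stand-in dataset: D = 4 features, C = 3 classes

# 1) PCA: reduce dimensionality from D to D' (here D' = 3), so that Sw becomes better conditioned
X_pca = PCA(n_components=3).fit_transform(X)

# 2) LDA: find the most discriminative directions; at most C - 1 = 2 components are possible
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_pca, labels)

print(X_pca.shape, X_lda.shape)               # (150, 3) (150, 2)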
Case Study I

− D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, 1996.

• Content-based image retrieval:
  − Application: query-by-example content-based image retrieval.
  − Question: how should we select a good set of features?
Case Study I (cont’d)

• Assumptions
  − Well-framed images are required as input for training and query-by-example test probes.
  − Only a small variation in the size, position, and orientation of the objects in the images is allowed.
Case Study I (cont’d)

• Terminology
  − Most Expressive Features (MEF): obtained using PCA.
  − Most Discriminating Features (MDF): obtained using LDA.

• Numerical instabilities
  − Computing the eigenvalues/eigenvectors of Sw⁻¹ Sb uk = λk uk could lead to unstable computations since Sw⁻¹ Sb is not always symmetric.
  − Check the paper for more details about how to deal with this issue.
Case Study I (cont’d)

• Comparing projection directions between MEF and MDF:
  − PCA eigenvectors show the tendency of PCA to capture major variations in the training set, such as lighting direction.
  − LDA eigenvectors discount those factors unrelated to classification.
Case Study I (cont’d)

• Clustering effect

[Figure: the same data plotted in the PCA space vs. the LDA space.]
Case Study I (cont’d)

• Methodology
  1) Represent each training image in terms of MDFs (or MEFs for comparison).
  2) Represent a query image in terms of MDFs (or MEFs for comparison).
  3) Find the r closest neighbors (e.g., using Euclidean distance).
Case Study I (cont’d)

• Experiments and results
  − A set of face images was used, with 2 expressions and 3 lighting conditions.
  − Testing was performed using a disjoint set of images.

[Figure: sample face images.]
Case Study I (cont’d)

[Figure: top match (r=1) results.]
Case Study I (cont’d)

− Examples of correct search probes.
Case Study I (cont’d)

− Example of a failed search probe.
Case Study II

− A. Martinez, A. Kak, "PCA versus LDA", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, 2001.

• Is LDA always better than PCA?
  − There has been a tendency in the computer vision community to prefer LDA over PCA.
  − This is mainly because LDA deals directly with discrimination between classes, while PCA does not pay attention to the underlying class structure.
Case Study II (cont’d)

[Figure: sample images from the AR database.]
Case Study II (cont’d)

• LDA is not always better when the training set is small.

  PCA w/o 3: PCA without the first three principal components, which seem to encode mostly variations due to lighting.

[Figure: recognition results for PCA, PCA w/o 3, and LDA with a small training set.]
Case Study II (cont’d)

• LDA outperforms PCA when the training set is large.

  PCA w/o 3: PCA without the first three principal components, which seem to encode mostly variations due to lighting.

[Figure: recognition results for PCA, PCA w/o 3, and LDA with a large training set.]
