Lecture 9 - PCA
CS498
Today's lecture
Adaptive Feature Extraction
Principal Component Analysis
How, why, when, which
A dual goal
Find a good representation
The features part
Example case
Describe this input
A good feature
Simplify the explanation of the input
Represent repeating patterns
When defined, makes the input simpler
Linear features

Z = WX

(Diagram: the Feature Matrix Z is features x samples, the Weight Matrix W is features x dimensions, and the Input Matrix X is dimensions x samples.)
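As a quick sanity check on these shapes, here is a minimal NumPy sketch; the sizes (3 dimensions, 2 features, 100 samples) are made up for illustration and not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 100))   # Input Matrix: dimensions x samples
W = rng.standard_normal((2, 3))     # Weight Matrix: features x dimensions

Z = W @ X                           # Feature Matrix: features x samples
print(Z.shape)                      # (2, 100)
```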
A 2D case
Matrix representation of data:

Z = WX, i.e.

[ z1^T ]   [ w1^T ] [ x1^T ]
[ z2^T ] = [ w2^T ] [ x2^T ]

where the rows x1^T, x2^T are the two input dimensions and z1^T, z2^T the two features.

(Figure: scatter plot of the data in the x1-x2 plane.)
Defining a goal
Desirable features:
Give simple weights
Avoid feature similarity
Feature similarity
Same thing!
Decorrelate: w1^T w2 = 0

Cov(z1, z2) = [ z1^T z1   z1^T z2 ]
              [ z2^T z1   z2^T z2 ] / N

            = [ 1   0 ]
              [ 0   1 ] = I
Problem definition
For a given input X, find a feature matrix W so that the weights decorrelate:

(WX)(WX)^T = N I,  i.e.  Z Z^T = N I

Expanding:

W X X^T W^T = N I
W Cov(X) W^T = I
One solution comes from the eigendecomposition of A = Cov(X):

U^T A U = Λ, where U has the eigenvectors of A in its columns
Λ = diag(λ_i), where λ_i are the eigenvalues of A

W = Λ^(-1/2) U^T
For example (values from the slide):

Cov(X) = [ 14.9   0.05 ]
         [ 0.05   1.08 ] / N

W = [ 0.12   30.4 ]
    [ 8.17   0.03 ]

Cov(WX) = [ 1   0 ]
          [ 0   1 ]
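A minimal NumPy sketch of this recipe on made-up 2-D data (the numbers below are not the slide's, and X is assumed zero-mean): eigendecompose Cov(X), set W = Λ^(-1/2) U^T, and check that Cov(WX) comes out as the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 1000))        # toy 2-D data: dimensions x samples
X[1] += 0.8 * X[0]                        # make the two dimensions correlated
X -= X.mean(axis=1, keepdims=True)        # zero-mean, so X X^T / N is the covariance
N = X.shape[1]

C = (X @ X.T) / N                         # Cov(X)
lam, U = np.linalg.eigh(C)                # eigenvalues / eigenvectors of Cov(X)
W = np.diag(lam ** -0.5) @ U.T            # W = Lambda^(-1/2) U^T

Z = W @ X
print(np.round((Z @ Z.T) / N, 3))         # ~ [[1, 0], [0, 1]]: decorrelated, unit variance
```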
Another solution
This is not the only solution to the problem. Consider this one:

(WX)(WX)^T = N I
W X X^T W^T = N I

W = (X X^T)^(-1/2), which can be computed as W = U S^(-1/2) V^T from the SVD of X X^T
(similar, but out of scope for now)
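For completeness, a hedged sketch of this alternative (often called the symmetric or "ZCA"-style solution), computing the inverse matrix square root of the covariance through its SVD; whether the factor of N sits in W or in the covariance is just a convention:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 1000))
X[1] += 0.8 * X[0]
X -= X.mean(axis=1, keepdims=True)
N = X.shape[1]

C = (X @ X.T) / N                          # covariance (symmetric, positive semi-definite)
U, s, Vt = np.linalg.svd(C)                # for symmetric C this is U S U^T
W = U @ np.diag(s ** -0.5) @ Vt            # W = C^(-1/2)

Z = W @ X
print(np.round((Z @ Z.T) / N, 3))          # again ~ identity
```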
Decorrelation in pictures
An implicit Gaussian assumption: N-D data has N directions of variance.
(Figure: Input Data and the Resulting transform.)
The input gets scaled to a well-behaved Gaussian with unit variance in all dimensions.
Relationship to eigendecomposition
In our case (covariance input A), U and S will hold the eigenvectors/eigenvalues of A.

(Diagram 1: SVD of the Input Matrix (dimensions x samples) into a Weight Matrix (features x dimensions), an eigenvalue matrix (features x features), and a Feature Matrix (features x samples).)

(Diagram 2: SVD of the Input Covariance (dimensions x dimensions) into a Weight Matrix, an Eigenvalue matrix, and a Feature Matrix.)
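A small NumPy check of this relationship on toy data (not the lecture's): the left singular vectors of X match the eigenvectors of Cov(X), and the squared singular values divided by N match its eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 500))
X[1] += 0.7 * X[0]                         # introduce some correlation
X -= X.mean(axis=1, keepdims=True)
N = X.shape[1]

# Route 1: eigendecomposition of the input covariance
lam, U_eig = np.linalg.eigh((X @ X.T) / N)             # ascending eigenvalues

# Route 2: SVD of the input matrix itself
U_svd, s, Vt = np.linalg.svd(X, full_matrices=False)   # descending singular values

print(np.allclose(s**2 / N, lam[::-1]))                    # eigenvalues = singular values^2 / N
print(np.allclose(np.abs(U_svd), np.abs(U_eig[:, ::-1])))  # same directions, up to sign
```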
Dimensionality reduction
PCA is great for high dimensional data
Allows us to perform dimensionality reduction
Helps us find relevant structure in data
Helps us throw away things that won't matter
A simple example
Two very correlated dimensions
e.g. size and weight of fruit
One effective variable
(Slide shows the resulting whitening matrix W; its visible entries are 13.7 and 28.2.)
A simple example
The second principal component needs to be super-boosted to whiten the weights.
Maybe it is useless?
Example
Eigenvalues of 1200-dimensional video data
Little variance after component 30
We don't need to keep the rest of the data
(Plot: eigenvalue magnitude, on a x10^5 scale, versus component index from 10 to 60.)
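A hedged sketch of using the spectrum for dimensionality reduction: keep only the eigenvectors whose eigenvalues carry most of the variance. The toy data and the cutoff k below are made up; in the lecture's video example, k would be around 30.

```python
import numpy as np

def pca_reduce(X, k):
    """Project zero-mean data X (dimensions x samples) onto its top-k principal components."""
    lam, U = np.linalg.eigh((X @ X.T) / X.shape[1])
    U_top = U[:, ::-1][:, :k]              # eigenvectors of the k largest eigenvalues
    return U_top.T @ X                     # k x samples

# Toy usage: 50-D data that really lives in ~3 dimensions plus a little noise
rng = np.random.default_rng(0)
basis = rng.standard_normal((50, 3))
X = basis @ rng.standard_normal((3, 400)) + 0.01 * rng.standard_normal((50, 400))
X -= X.mean(axis=1, keepdims=True)

Z = pca_reduce(X, 3)
print(Z.shape)                             # (3, 400): three numbers per sample instead of fifty
```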
Face analysis
Analysis of a face database
What are good features for faces?
The Eigenfaces
Low-rank model
Instead of using 780 pixel values we use
the PCA weights (here 50 and 5)
(Figure: Input face, Mean Face, the dominant eigenfaces with values 985.953, 732.591, 655.408, 229.737, 227.179, the cumulative approximation, and the Full Approximation.)
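A rough sketch of the low-rank model (variable names and the random stand-in data are mine; only the 780-pixel figure comes from the slide): subtract the mean face, take the dominant eigenfaces, and keep just the k projection weights per image.

```python
import numpy as np

def eigenface_weights(faces, image, k):
    """faces: pixels x num_faces training matrix; returns k weights and the rank-k approximation."""
    mean_face = faces.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(faces - mean_face, full_matrices=False)
    eigenfaces = U[:, :k]                              # dominant eigenfaces
    w = eigenfaces.T @ (image[:, None] - mean_face)    # the k PCA weights
    approx = mean_face + eigenfaces @ w                # cumulative approximation
    return w.ravel(), approx.ravel()

# Toy usage with random "faces": 780 pixels per image, 100 images
rng = np.random.default_rng(0)
faces = rng.random((780, 100))
w, approx = eigenface_weights(faces, faces[:, 0], k=50)
print(w.shape)                                         # (50,) weights instead of 780 pixel values
```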
EM-PCA in a nutshell
Alternate between successive approximations
Start with a random C and loop over (here ^+ denotes the pseudoinverse):
Z = C^+ X
C = X Z^+
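A minimal NumPy sketch of these two updates, using pseudoinverses for C^+ and Z^+; the toy data, k, and iteration count are made up:

```python
import numpy as np

def em_pca(X, k, n_iters=50, seed=0):
    """EM-PCA: alternate Z = C+ X and C = X Z+ starting from a random C (X: dims x samples)."""
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((X.shape[0], k))
    for _ in range(n_iters):
        Z = np.linalg.pinv(C) @ X          # Z = C+ X
        C = X @ np.linalg.pinv(Z)          # C = X Z+
    return C, Z

# Toy usage: noise-free rank-3 data is reconstructed exactly by C Z
rng = np.random.default_rng(1)
X = rng.standard_normal((10, 3)) @ rng.standard_normal((3, 200))
X -= X.mean(axis=1, keepdims=True)
C, Z = em_pca(X, k=3)
print(np.allclose(C @ Z, X))               # True: C spans the 3-D principal subspace
```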
A Video Example
The movie is a series of frames
Each frame is a data point
126 frames of 80x60 pixels each
The data matrix will be 4800x126
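A small sketch of how the frames become that data matrix (random pixels standing in for the actual movie):

```python
import numpy as np

# Stand-in for the movie: 126 frames of 80x60 pixels each
frames = np.random.default_rng(0).random((126, 60, 80))

# Each frame is one data point (one column): 4800 pixels x 126 frames
X = frames.reshape(126, -1).T
print(X.shape)                             # (4800, 126)
```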
PCA Results
An example
Let's take a time series which is not white.
Each sample is somewhat correlated with the previous one (a Markov process): x(t), ..., x(t+T)

X = [ x(t)      x(t+1)      ... ]
    [ ...       ...             ]
    [ x(t+N)    x(t+1+N)    ... ]
An example
In this context, features will be repeating temporal patterns smaller than N.

Z = W [ x(t)      x(t+1)      ... ]
      [ ...       ...             ]
      [ x(t+N)    x(t+1+N)    ... ]

Cov(X) ≈ [ 1   ε   0   ... ]
         [ ε   1   ε       ]
         [ 0   ε   1   ε   ]
         [ ...         ... ]
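A sketch of this on a synthetic first-order Markov (AR(1)) series, with my own choice of window length N and correlation 0.9: build the delay-embedded matrix X and look at the principal components of the windows, which come out as smooth repeating temporal patterns.

```python
import numpy as np

rng = np.random.default_rng(0)

# AR(1) series: each sample is correlated with the previous one
T = 5000
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()

# Delay-embed: column j holds the window x(j), x(j+1), ..., x(j+N-1)
N = 32
X = np.lib.stride_tricks.sliding_window_view(x, N).T   # N x (T - N + 1)
X = X - X.mean(axis=1, keepdims=True)

# PCA on the windows: eigenvectors of Cov(X) are the repeating temporal patterns
lam, U = np.linalg.eigh((X @ X.T) / X.shape[1])
print(np.round(U[:, -1], 2))               # top component: a slowly varying, near-constant pattern
```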
Recap
Principal Component Analysis
Get used to it!
Decorrelates multivariate data, finds useful
components, reduces dimensionality
Incremental SVD
http://www.merl.com/publications/TR2002-024/
EM-PCA
http://cs.nyu.edu/~roweis/papers/empca.ps.gz