Lecture 9 - PCA
CS498
Today's lecture
Z = WX
(features × samples) = (features × dimensions) · (dimensions × samples)

With $z_i^T$, $w_i^T$, $x_i^T$ as the rows of Z, W, X:

$$Z = WX \;=\; \begin{bmatrix} z_1^T \\ z_2^T \end{bmatrix} \;=\; \begin{bmatrix} w_1^T \\ w_2^T \end{bmatrix} \begin{bmatrix} x_1^T \\ x_2^T \end{bmatrix}$$
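A minimal NumPy sketch of this product (the shapes and variable names are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100                       # number of samples
X = rng.normal(size=(2, N))   # input data: dimensions x samples
W = rng.normal(size=(2, 2))   # weights: features x dimensions

Z = W @ X                     # extracted features: features x samples
print(Z.shape)                # (2, 100)
```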
Scatter plot of same data
[Figure: scatter of the same data, x1 vs. x2]
Defining a goal
Simple weights
Minimize the correlation between the two dimensions
Feature similarity
Same thing!
One way to proceed
Simple weights: decorrelate, $w_1^T w_2 = 0$
Minimize the correlation between the two dimensions: decorrelate, $z_1^T z_2 = 0$
Feature similarity: same thing!
Diagonalizing the covariance
Covariance matrix:

$$\mathrm{Cov}(z_1, z_2) = \begin{bmatrix} z_1^T z_1 & z_1^T z_2 \\ z_2^T z_1 & z_2^T z_2 \end{bmatrix} / N$$

Diagonalizing the covariance suppresses cross-dimensional co-activity: if z1 is high, z2 won't be, etc.
$$\mathrm{Cov}(z_1, z_2) = \begin{bmatrix} z_1^T z_1 & z_1^T z_2 \\ z_2^T z_1 & z_2^T z_2 \end{bmatrix} / N = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$
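In code, the covariance above is a one-liner; a sketch (assuming zero-mean rows, as the slides do implicitly):

```python
import numpy as np

def covariance(Z):
    """Covariance of the zero-mean rows of Z (features x samples): Z Z^T / N."""
    return Z @ Z.T / Z.shape[1]

rng = np.random.default_rng(1)
Z = rng.normal(size=(2, 5000))      # two independent unit-variance rows
print(np.round(covariance(Z), 2))   # off-diagonals ~0: close to the identity
```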
Problem definition
For a given input X, find a W such that

$$(WX)(WX)^T = N I$$
Solving for diagonalization
$$(WX)(WX)^T = N I$$
$$W X X^T W^T = N I$$
$$W\,\mathrm{Cov}(X)\,W^T = I$$
Solving for diagonalization
Covariance matrices are symmetric and positive semi-definite
They therefore have orthogonal eigenvectors and real eigenvalues
and are factorizable by:

$$U^T A U = \Lambda$$

where U has the eigenvectors of A in its columns, and $\Lambda = \mathrm{diag}(\lambda_i)$, where $\lambda_i$ are the eigenvalues of A
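A quick numerical check of this factorization; `numpy.linalg.eigh` (my choice here, not named in the lecture) returns the real eigenvalues and an orthogonal U for a symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(3, 3))
A = B @ B.T                      # symmetric positive semi-definite

lam, U = np.linalg.eigh(A)       # eigenvalues, eigenvectors in columns of U
assert np.allclose(U.T @ A @ U, np.diag(lam))   # U^T A U = diag(lambda_i)
```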
Solving for diagonalization
The solution is a function of the eigenvectors U and eigenvalues $\lambda_i$ of Cov(X):

$$W\,\mathrm{Cov}(X)\,W^T = I
\quad\Rightarrow\quad
W = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}^{-1/2} U^T = \Lambda^{-1/2}\,U^T$$
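Putting the construction together, a sketch under the lecture's assumptions (zero-mean data; `eigh` stands in for a generic symmetric eigensolver):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10_000
X = np.array([[2.0, 0.0],
              [1.5, 0.5]]) @ rng.normal(size=(2, N))   # correlated inputs

C = X @ X.T / N                  # Cov(X)
lam, U = np.linalg.eigh(C)
W = np.diag(lam ** -0.5) @ U.T   # W = Lambda^{-1/2} U^T

Z = W @ X
print(np.round(Z @ Z.T / N, 2))  # Cov(WX): approximately the identity
```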
So what does it do?
Input data covariance:
$$\mathrm{Cov}(X) = \begin{bmatrix} 14.9 & 0.05 \\ 0.05 & 1.08 \end{bmatrix}$$

Extracted feature matrix:
$$W = \begin{bmatrix} 0.12 & 30.4 \\ 8.17 & 0.03 \end{bmatrix} / N$$

Covariance of the extracted features:
$$\mathrm{Cov}(WX) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Another solution
This is not the only solution to the problem. Consider this one:

$$(WX)(WX)^T = N I$$
$$W X X^T W^T = N I$$
$$W = \left( X X^T \right)^{-1/2}$$

(up to the factor of $\sqrt{N}$ implied by the $NI$ on the right-hand side)

Similar, but out of scope for now:

$$W = U S^{-1/2} V^T, \quad [U, S, V] = \mathrm{SVD}\!\left( X X^T \right)$$
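A sketch of this SVD-based variant, applied to Cov(X) = XX^T / N so that the factor of N is absorbed (numpy's `svd` returns U, the singular values, and V^T):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10_000
X = np.array([[2.0, 0.0],
              [1.5, 0.5]]) @ rng.normal(size=(2, N))

C = X @ X.T / N                  # symmetric, so its SVD has Vt == U.T
U, S, Vt = np.linalg.svd(C)
W = U @ np.diag(S ** -0.5) @ Vt  # W = U S^{-1/2} V^T = C^{-1/2}

Z = W @ X
print(np.round(Z @ Z.T / N, 2))  # again approximately the identity
```

Unlike $\Lambda^{-1/2} U^T$, this W is symmetric, so it whitens without an additional net rotation (the variant sometimes called ZCA whitening).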
Decorrelation in pictures
An implicit Gaussian assumption
N-D data has N directions of variance
[Figure: input data scatter]
Undoing the variance
The decorrelating matrix W contains two vectors that normalize the input's variance

[Figure: input data scatter]
Resulting transform
The input gets scaled to a well-behaved Gaussian with unit variance in all dimensions

[Figure: input data vs. transformed data (feature weights)]
A more complex case
Having correlation between two dimensions,
we still find the directions of maximal variance,
but we also rotate in addition to scaling
[Diagram: Feature Matrix (dimensions × features) · eigenvalue matrix · Weight Matrix (features × samples) = SVD of the Input Matrix (dimensions × samples)]
[Diagram: Feature Matrix (dimensions × features) · Eigenvalue matrix (features × features) · Weight Matrix (features × dimensions) = SVD of the Input Covariance (dimensions × dimensions)]
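The two diagrams agree: the feature matrix from the SVD of X matches the eigenvectors of XX^T, and the squared singular values match its eigenvalues. A quick check (a sketch; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(5, 200))      # dimensions x samples

U, s, Vt = np.linalg.svd(X, full_matrices=False)
lam = np.linalg.eigvalsh(X @ X.T)  # eigenvalues of X X^T (ascending)

# squared singular values of X == eigenvalues of X X^T
assert np.allclose(np.sort(s ** 2), np.sort(lam))
```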
Dimensionality reduction
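Dimensionality reduction keeps only the k eigenvectors with the largest eigenvalues and drops the rest. A sketch of that truncation (assuming zero-mean data; `pca_reduce` is an illustrative name, not the lecture's code):

```python
import numpy as np

def pca_reduce(X, k):
    """Project zero-mean X (dimensions x samples) onto its k leading
    principal directions."""
    C = X @ X.T / X.shape[1]
    lam, U = np.linalg.eigh(C)   # eigenvalues in ascending order
    top = U[:, ::-1][:, :k]      # k eigenvectors with largest eigenvalues
    return top.T @ X             # reduced features: k x samples
```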
So where are the features?
We strayed off-subject
What happened to the features?
We only mentioned that they are orthogonal
Dominant eigenfaces
[Figure: dominant eigenfaces 1–5 and cumulative approximation]
PCA for large dimensionalities
Sometimes the data is high-dimensional
e.g. video: 1280 × 720 pixels × T frames = 921,600 dimensions × T samples
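A standard workaround here, used by the eigenfaces papers below though not spelled out on this slide: when the dimensionality D far exceeds the number of samples N, eigendecompose the small N × N Gram matrix X^T X instead of the huge D × D covariance, then map its eigenvectors back through X. A sketch:

```python
import numpy as np

def pca_gram(X, k):
    """k leading principal directions of zero-mean X (D x N) via the
    N x N Gram matrix X^T X, never forming the D x D covariance."""
    lam, V = np.linalg.eigh(X.T @ X)           # N x N eigenproblem
    lam, V = lam[::-1][:k], V[:, ::-1][:, :k]  # k largest
    return X @ V / np.sqrt(lam)                # D x k, orthonormal columns

rng = np.random.default_rng(6)
X = rng.normal(size=(10_000, 50))              # D >> N
U = pca_gram(X - X.mean(axis=1, keepdims=True), k=5)
print(U.shape)                                 # (10000, 5)
```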
Eigenfaces
http://en.wikipedia.org/wiki/Eigenface
http://www.cs.ucsb.edu/~mturk/Papers/mturk-CVPR91.pdf
http://www.cs.ucsb.edu/~mturk/Papers/jcn.pdf
Incremental SVD
http://www.merl.com/publications/TR2002-024/
EM-PCA
http://cs.nyu.edu/~roweis/papers/empca.ps.gz