Equivalent to 1 slot. Lecturers can use extra materials for other slots in the chapter.
ThienNV Chapter 10. Dimensionality Reduction with PCA 1 / 23
Chapter 10. Dimensionality Reduction with PCA
\max_{b_m} V_m = \frac{1}{N} \sum_{n=1}^{N} z_{mn}^2 = \frac{1}{N} \sum_{n=1}^{N} (b_m^\top \hat{x}_n)^2 = b_m^\top \hat{S} b_m    (13)

subject to \|b_m\|^2 = 1.
This means that b1 , . . . , bm−1 are also eigenvectors of Ŝ, but they are
associated with eigenvalue 0 so that b1 , . . . , bm−1 span the null space of Ŝ.
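The variance-maximization view can be checked numerically. The following NumPy sketch (my own toy data, columns as data points, not from the slides) verifies that the leading eigenvector of the covariance matrix attains projected variance λ₁, and that no other unit vector does better:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 3, 200
# Toy centered dataset with anisotropic variance; columns are data points.
X = np.diag([3.0, 1.0, 0.3]) @ rng.normal(size=(D, N))
X = X - X.mean(axis=1, keepdims=True)

S = X @ X.T / N                       # data covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
b1 = eigvecs[:, -1]                   # unit eigenvector of the largest eigenvalue

# Variance of the projected coordinates z_n = b1^T x_n equals b1^T S b1 = lambda_1.
z = b1 @ X
assert np.isclose(z.var(), b1 @ S @ b1)
assert np.isclose(z.var(), eigvals[-1])

# No other unit vector achieves larger projected variance.
for _ in range(1000):
    b = rng.normal(size=D)
    b /= np.linalg.norm(b)
    assert b @ S @ b <= eigvals[-1] + 1e-12
```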
We look for a projection x̃ in a lower-dimensional subspace that is as similar to x as possible, measured by the average squared reconstruction error
J_M := \frac{1}{N} \sum_{n=1}^{N} \|x_n - \tilde{x}_n\|^2 .    (23)
x_n = \sum_{d=1}^{D} z_{dn} b_d = \sum_{d=1}^{D} (x_n^\top b_d) b_d = \Big( \sum_{d=1}^{D} b_d b_d^\top \Big) x_n
    = \Big( \sum_{m=1}^{M} b_m b_m^\top \Big) x_n + \Big( \sum_{j=M+1}^{D} b_j b_j^\top \Big) x_n .    (29)
Hence the difference vector between the original data point and its
projection is
x_n - \tilde{x}_n = \Big( \sum_{j=M+1}^{D} b_j b_j^\top \Big) x_n = \sum_{j=M+1}^{D} (x_n^\top b_j) b_j .    (30)
To maximize the variance of the projected data, PCA chooses the columns
of U in (37) to be the eigenvectors that are associated with the M largest
eigenvalues of the data covariance matrix S so that we identify U as the
projection matrix B in (3), which projects the original data onto a
lower-dimensional subspace of dimension M. The Eckart-Young theorem
(Section 4.6) offers a direct way to estimate the low-dimensional
representation. Consider the best rank-M approximation of X
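The SVD route can be illustrated in a few lines. A NumPy sketch (toy data of my own, columns as data points) showing that truncating the SVD to the M largest singular values, as in the Eckart-Young theorem, gives exactly the PCA projection onto the M leading eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(2)
D, N = 6, 50
X = rng.normal(size=(D, N))
X = X - X.mean(axis=1, keepdims=True)

# SVD of the data matrix: the left-singular vectors U are the
# eigenvectors of the covariance matrix S = (1/N) X X^T.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

M = 2
# Eckart-Young: the best rank-M approximation keeps the M largest singular values.
X_M = (U[:, :M] * s[:M]) @ Vt[:M, :]

# It coincides with the PCA projection onto the M leading eigenvectors.
X_pca = U[:, :M] @ U[:, :M].T @ X
assert np.allclose(X_M, X_pca)
```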
We now provide a method for PCA when there are fewer data points than
dimensions, N ≪ D.
Assume we have a centered dataset x1 , . . . , xN ∈ RD with no
duplicate data points. Then rk(S) = N, so S has D − N + 1 eigenvalues
that are 0. We will exploit this and turn the D × D covariance matrix into
an N × N matrix whose eigenvalues are all positive.
Let bm , m = 1, . . . , M, be basis vectors of the principal subspace.
S b_m = \frac{1}{N} X X^\top b_m = \lambda_m b_m    (41)
\Rightarrow \frac{1}{N} X^\top X X^\top b_m = \lambda_m X^\top b_m \iff \frac{1}{N} X^\top X c_m = \lambda_m c_m ,    (42)

where we defined c_m := X^\top b_m.
Multiplying both sides by X from the left gives

\frac{1}{N} X X^\top X c_m = \lambda_m X c_m ,    (43)

which means X c_m is an eigenvector of S.
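This trick is easy to demonstrate. A NumPy sketch (toy data of my own) that eigendecomposes the small N × N matrix and recovers eigenvectors of the D × D covariance matrix via X c_m:

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 20, 500                        # far fewer data points than dimensions
X = rng.normal(size=(D, N))           # columns are the data points
X = X - X.mean(axis=1, keepdims=True)

# Eigendecompose the small N x N matrix (1/N) X^T X instead of the
# D x D covariance matrix S = (1/N) X X^T.
K = X.T @ X / N
lam, C = np.linalg.eigh(K)
lam, C = lam[::-1], C[:, ::-1]        # descending eigenvalues

# Recover eigenvectors of S as b_m proportional to X c_m, then normalize.
B = X @ C
B = B / np.linalg.norm(B, axis=0)

S = X @ X.T / N
for m in range(3):                    # check the leading eigenpairs
    assert np.allclose(S @ B[:, m], lam[m] * B[:, m])
```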
1. Mean subtraction
2. Standardization
3. Eigendecomposition of the covariance matrix
4. Projection
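The four steps above can be sketched as one small NumPy function (function and variable names are my own, columns as data points; an illustration, not a reference implementation):

```python
import numpy as np

def pca(X, M):
    """The four PCA steps; X is (D, N) with data points as columns."""
    N = X.shape[1]
    # 1. Mean subtraction.
    Xc = X - X.mean(axis=1, keepdims=True)
    # 2. Standardization: divide every dimension by its standard deviation.
    Xs = Xc / Xc.std(axis=1, keepdims=True)
    # 3. Eigendecomposition of the covariance matrix S = (1/N) X X^T.
    S = Xs @ Xs.T / N
    lam, B = np.linalg.eigh(S)
    lam, B = lam[::-1], B[:, ::-1]    # descending eigenvalues
    # 4. Projection onto the M leading eigenvectors.
    Z = B[:, :M].T @ Xs               # (M, N) low-dimensional codes
    return Z, B[:, :M], lam

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 300))
Z, B, lam = pca(X, M=2)
# The variance of each code coordinate equals its eigenvalue.
assert np.isclose(Z[0].var(), lam[0])
assert np.isclose(Z[1].var(), lam[1])
```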
We have studied: