
MA614 - Applied Linear Algebra

Principal Component Analysis (PCA)

Dr. A. Sairam Kaliraj

Assistant Professor
Department of Mathematics
Indian Institute of Technology Ropar
Email: sairam@iitrpr.ac.in

Principal Component Analysis

Theorem
The j-th principal direction of a normalized data matrix A is its j-th
unit singular vector q_j. The corresponding principal standard
deviation √ν σ_j is proportional to its j-th singular value σ_j.

The Eckart-Young theorem says that the number of significant
singular values determines the approximate rank k of the
covariance matrix K = ν A^T A, and hence of the data matrix A.
Thus, the normalized data lies (approximately) in a k-dimensional
subspace.

The variance in any direction orthogonal to the principal
directions is very small, and hence relatively unimportant.

As a consequence, dimension reduction by orthogonally projecting
the (normalized) data vectors onto the k-dimensional subspace
spanned by the principal directions q_1, q_2, . . . , q_k serves to
eliminate the redundancies in the data.
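
As an illustration, here is a minimal numerical sketch of the theorem in
Python/NumPy, assuming the rows of A are mean-centered observations and
taking ν = 1/(n − 1); the synthetic data and variable names are
illustrative assumptions, not part of the course material.

```python
# A sketch of the theorem, assuming nu = 1/(n-1) and that the n rows of
# A are mean-centered observations (all data here is synthetic).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # correlated synthetic data
A = X - X.mean(axis=0)                                   # normalize: subtract the means
n = A.shape[0]
nu = 1.0 / (n - 1)

# SVD of the normalized data matrix: QT[j] is the principal direction q_j.
_, sigma, QT = np.linalg.svd(A, full_matrices=False)

# The principal standard deviations sqrt(nu)*sigma_j match the square
# roots of the eigenvalues of the covariance matrix K = nu * A^T A.
K = nu * A.T @ A
eigvals = np.sort(np.linalg.eigvalsh(K))[::-1]
print(np.allclose(np.sqrt(nu) * sigma, np.sqrt(eigvals)))  # True
```
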
Principal Component Analysis

Approximating the data by its k principal components coincides
with the closest rank-k approximation A_k = P_k Σ_k Q_k^T to the data
matrix A. Rewriting the data in terms of the principal components
serves to diagonalize the covariance matrix K_k = ν A_k^T A_k, since

Q_k^T (ν A_k^T A_k) Q_k = ν Σ_k^2,

the result being a diagonal matrix containing the k principal
variances along the diagonal.

This means that the covariance between any two principal
components of the data is zero, and thus the principal
components are all uncorrelated!
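
A quick numerical check of this diagonalization, under the same
illustrative assumptions as before (ν = 1/(n − 1), synthetic data):

```python
# Sketch: the covariance of the k principal components is diagonal,
# i.e., the principal components are uncorrelated (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 6))
A -= A.mean(axis=0)                      # normalized data matrix
nu = 1.0 / (A.shape[0] - 1)

P, sigma, QT = np.linalg.svd(A, full_matrices=False)
k = 3
Qk, Sk = QT[:k].T, sigma[:k]
Ak = (P[:, :k] * Sk) @ Qk.T              # closest rank-k approximation A_k

# Q_k^T (nu * A_k^T A_k) Q_k = nu * Sigma_k^2  -- a diagonal matrix
C = Qk.T @ (nu * Ak.T @ Ak) @ Qk
print(np.allclose(C, np.diag(nu * Sk**2)))  # True: off-diagonal covariances vanish
```
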
Applications of Principal Component Analysis

1 Classification of handwritten digits
2 Expression recognition
3 Netflix recommendation algorithm

The applications are not limited to the real-world problems
mentioned above; PCA also has huge applications in artificial
intelligence and machine learning.
Applications of SVD - III - Handwritten digit classification

Problem: How to classify an unknown digit? Precisely, given a set
of manually classified digits (the training set), classify a set of
unknown digits (the test set).
A simple algorithm: Distance to the known digits

Measure the distance between the unknown digit and the
known digits using the Euclidean distance.

Stack the columns of the image into a vector and identify
each digit as a vector in R^256.

Then define the distance function

d(x, y) = ||x − y||_2 = √((x_1 − y_1)^2 + · · · + (x_256 − y_256)^2).

All the digits of one kind in the training set form a cluster of
points in the Euclidean space R^256. (Assumption)

Ideally the clusters are well separated, and the separation
between the clusters depends on how well written the
training digits are.
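
In code, the vectorization and the distance function look like this (a
sketch with random placeholder images; real inputs would be 16 × 16
grayscale digit images):

```python
# Sketch of the vectorization and Euclidean distance; the two images
# here are random stand-ins for real 16x16 digit images.
import numpy as np

rng = np.random.default_rng(2)
img_x = rng.random((16, 16))
img_y = rng.random((16, 16))

# Stack the columns into vectors in R^256 (column-major order).
x = img_x.flatten(order="F")
y = img_y.flatten(order="F")

# d(x, y) = ||x - y||_2
d = np.linalg.norm(x - y)
print(x.shape, d)   # (256,) and the Euclidean distance
```
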
[Figure: The means ("averages") of all digits in the training set.]
A simple algorithm: Distance to the known digits

Algorithm

Given the manually classified training set, compute the
means m_i, i = 0, 1, 2, . . . , 9, of all the 10 digits.

For each digit in the test set, classify it as k if m_k is the
closest mean.

For some test sets, the success rate of this algorithm is around
75%. The reason for the relatively poor performance is that the
algorithm does not use any information about the variation
within each class of digits.

Using the singular value decomposition (SVD), we will see a
classification algorithm for which the success rate is around
93%.
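
A sketch of this nearest-mean classifier in NumPy; the training vectors
and labels below are random stand-ins for a real training set:

```python
# Sketch of the nearest-mean classifier. train_vecs/train_labels are
# random stand-ins for a real set of vectorized 16x16 digit images.
import numpy as np

def nearest_mean_classifier(train_vecs, train_labels, test_vec):
    """Classify test_vec as k if the class mean m_k is the closest."""
    means = np.stack([train_vecs[train_labels == d].mean(axis=0)
                      for d in range(10)])             # m_0, ..., m_9
    dists = np.linalg.norm(means - test_vec, axis=1)   # d(test_vec, m_i)
    return int(np.argmin(dists))

rng = np.random.default_rng(3)
train_vecs = rng.random((500, 256))
train_labels = np.arange(500) % 10                     # 50 samples per digit
print(nearest_mean_classifier(train_vecs, train_labels, rng.random(256)))
```
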
Algo. for classification of handwritten digits using SVD

Let us consider the 16 × 16 matrix representation of the
image as a vector in R^256, by stacking all the columns of
the image above each other.

The matrix consisting of all the training digits of one kind,
the 3's say, is an element A ∈ R^{256×n}.

Each column of A represents an image of the digit 3. If
A = Σ_{i=1}^{m} σ_i u_i v_i^T, then the left singular vectors u_i
form an orthonormal basis for the range space of A, i.e.,
"the image of the 3's".

Each digit is well characterized by a few of the first left
singular vectors of its own kind.
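
For instance, the per-class basis can be computed like this (a sketch;
the random matrix, the class size n, and the cutoff k are illustrative
stand-ins for real training data):

```python
# Sketch: an orthonormal basis for "the space of 3's" from the SVD of the
# class matrix A whose columns are column-stacked training images of a 3.
import numpy as np

rng = np.random.default_rng(4)
n = 40                                  # assumed number of training 3's
A = rng.random((256, n))                # stand-in for real 16x16 images

U, sigma, VT = np.linalg.svd(A, full_matrices=False)
k = 10                                  # keep the first k left singular vectors
basis = U[:, :k]                        # orthonormal basis u_1, ..., u_k
print(basis.shape)                      # (256, 10)
```
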
Algo. for classification of handwritten digits using SVD

Algorithm
Training: For the training set of known digits, compute the SVD
of each set of digits of one kind.
Classification: For a given test digit, compute its relative
residual in all 10 bases. If one residual is significantly smaller
than all the others, classify it as that digit. Otherwise, give up.
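
A sketch of the full classification step, assuming each entry of `bases`
holds the first k left singular vectors of one digit's training matrix;
the 0.9 "significantly smaller" margin is an illustrative threshold, not
something the slides specify:

```python
# Sketch of the SVD classifier; bases[d] holds the first k left singular
# vectors of digit d's training matrix (stand-in data, k = 10 assumed).
import numpy as np

def classify(test_vec, bases, margin=0.9):
    """Return the digit with the smallest relative residual, or None."""
    residuals = []
    for U in bases:
        proj = U @ (U.T @ test_vec)                    # projection onto the basis
        residuals.append(np.linalg.norm(test_vec - proj)
                         / np.linalg.norm(test_vec))   # relative residual
    residuals = np.array(residuals)
    best = int(np.argmin(residuals))
    others = np.delete(residuals, best)
    # "significantly smaller": beat every other residual by the margin
    if residuals[best] < margin * others.min():
        return best
    return None                                        # otherwise give up

# Toy usage with random stand-in bases (replace with real per-digit SVDs):
rng = np.random.default_rng(5)
bases = [np.linalg.qr(rng.random((256, 10)))[0] for _ in range(10)]
print(classify(rng.random(256), bases))
```
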
