
MA614 - Applied Linear Algebra

Principal Component Analysis (PCA)

Dr. A. Sairam Kaliraj

Assistant Professor
Department of Mathematics
Indian Institute of Technology Ropar
Email: sairam@iitrpr.ac.in

Principal Component Analysis

Theorem
The j-th principal direction of a normalized data matrix A is its j-th
unit singular vector q_j. The corresponding principal standard
deviation √ν σ_j is proportional to its j-th singular value σ_j.

The Eckart-Young theorem says that the number of significant
singular values determines the approximate rank k of the
covariance matrix K = ν A^T A, and hence of the data matrix A.
Thus, the normalized data lies (approximately) in a k-dimensional
subspace.

The variance in any direction orthogonal to the principal
directions is very small, and hence relatively unimportant.

As a consequence, dimension reduction by orthogonally projecting
the (normalized) data vectors onto the k-dimensional subspace
spanned by the principal directions q_1, q_2, . . . , q_k serves to
eliminate the redundancies in the data.
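
As an illustration, here is a minimal numerical sketch of the theorem in
Python/NumPy, assuming the rows of A are mean-centered observations and
taking ν = 1/(n − 1); the synthetic data and variable names are
illustrative assumptions, not part of the course material.

```python
# A sketch of the theorem, assuming nu = 1/(n-1) and that the n rows of
# A are mean-centered observations (all data here is synthetic).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # correlated synthetic data
A = X - X.mean(axis=0)                                   # normalize: subtract the means
n = A.shape[0]
nu = 1.0 / (n - 1)

# SVD of the normalized data matrix: QT[j] is the principal direction q_j.
_, sigma, QT = np.linalg.svd(A, full_matrices=False)

# The principal standard deviations sqrt(nu)*sigma_j match the square
# roots of the eigenvalues of the covariance matrix K = nu * A^T A.
K = nu * A.T @ A
eigvals = np.sort(np.linalg.eigvalsh(K))[::-1]
print(np.allclose(np.sqrt(nu) * sigma, np.sqrt(eigvals)))  # True
```
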
Principal Component Analysis

Approximating the data by its k principal components coincides
with the closest rank-k approximation A_k = P_k Σ_k Q_k^T to the data
matrix A. Rewriting the data in terms of the principal components
serves to diagonalize the covariance matrix K_k = ν A_k^T A_k, since

Q_k^T (ν A_k^T A_k) Q_k = ν Σ_k^2,

the result being a diagonal matrix containing the k principal
variances along the diagonal.

This means that the covariance between any two principal
components of the data is zero, and thus the principal
components are all uncorrelated!
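
A quick numerical check of this diagonalization, under the same
illustrative assumptions as before (ν = 1/(n − 1), synthetic data):

```python
# Sketch: the covariance of the k principal components is diagonal,
# i.e., the principal components are uncorrelated (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 6))
A -= A.mean(axis=0)                      # normalized data matrix
nu = 1.0 / (A.shape[0] - 1)

P, sigma, QT = np.linalg.svd(A, full_matrices=False)
k = 3
Qk, Sk = QT[:k].T, sigma[:k]
Ak = (P[:, :k] * Sk) @ Qk.T              # closest rank-k approximation A_k

# Q_k^T (nu * A_k^T A_k) Q_k = nu * Sigma_k^2  -- a diagonal matrix
C = Qk.T @ (nu * Ak.T @ Ak) @ Qk
print(np.allclose(C, np.diag(nu * Sk**2)))  # True: off-diagonal covariances vanish
```
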
Applications of Principal Component Analysis

1 Classification of handwritten digits
2 Expression recognition
3 Netflix recommendation algorithm

The applications are not limited to the real-world problems
mentioned above; PCA also has huge applications in artificial
intelligence and machine learning.
Applications of SVD - III - Handwritten digit classification

Problem: How to classify an unknown digit? Precisely, given a set
of manually classified digits (the training set), classify a set of
unknown digits (the test set).
A simple algorithm: Distance to the known digits

Measure the distance between the unknown digit and the
known digits using the Euclidean distance.

Stack the columns of the image into a vector and identify
each digit as a vector in R^256.

Then define the distance function

d(x, y) = ||x − y||_2 = √((x_1 − y_1)^2 + · · · + (x_256 − y_256)^2).

All the digits of one kind in the training set form a cluster of
points in the Euclidean space R^256. (Assumption)

Ideally the clusters are well separated, and the separation
between the clusters depends on how well written the
training digits are.
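
In code, the vectorization and the distance function look like this (a
sketch with random placeholder images; real inputs would be 16 × 16
grayscale digit images):

```python
# Sketch of the vectorization and Euclidean distance; the two images
# here are random stand-ins for real 16x16 digit images.
import numpy as np

rng = np.random.default_rng(2)
img_x = rng.random((16, 16))
img_y = rng.random((16, 16))

# Stack the columns into vectors in R^256 (column-major order).
x = img_x.flatten(order="F")
y = img_y.flatten(order="F")

# d(x, y) = ||x - y||_2
d = np.linalg.norm(x - y)
print(x.shape, d)   # (256,) and the Euclidean distance
```
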
[Figure: The means ("averages") of all digits in the training set.]
A simple algorithm: Distance to the known digits

Algorithm

Given the manually classified training set, compute the
means m_i, i = 0, 1, 2, . . . , 9, of all the 10 digits.

For each digit in the test set, classify it as k if m_k is the
closest mean.

For some test sets, the success rate of this algorithm is around
75%. The reason for the relatively poor performance is that the
algorithm does not use any information about the variation
within each class of digits.

Using the singular value decomposition (SVD), we will see a
classification algorithm for which the success rate is around
93%.
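
A sketch of this nearest-mean classifier in NumPy; the training vectors
and labels below are random stand-ins for a real training set:

```python
# Sketch of the nearest-mean classifier. train_vecs/train_labels are
# random stand-ins for a real set of vectorized 16x16 digit images.
import numpy as np

def nearest_mean_classifier(train_vecs, train_labels, test_vec):
    """Classify test_vec as k if the class mean m_k is the closest."""
    means = np.stack([train_vecs[train_labels == d].mean(axis=0)
                      for d in range(10)])             # m_0, ..., m_9
    dists = np.linalg.norm(means - test_vec, axis=1)   # d(test_vec, m_i)
    return int(np.argmin(dists))

rng = np.random.default_rng(3)
train_vecs = rng.random((500, 256))
train_labels = np.arange(500) % 10                     # 50 samples per digit
print(nearest_mean_classifier(train_vecs, train_labels, rng.random(256)))
```
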
Algo. for classification of handwritten digits using SVD

Let us consider the 16 × 16 matrix representation of the
image as a vector in R^256, by stacking all the columns of
the image above each other.

The matrix consisting of all the training digits of one kind,
the 3's say, is an element A ∈ R^{256×n}.

Each column of A represents an image of the digit 3. If
A = Σ_{i=1}^{m} σ_i u_i v_i^T, then the left singular vectors u_i
form an orthonormal basis for the range space of A, i.e.,
"the image of the 3's".

Each digit is well characterized by a few of the first left
singular vectors of its own kind.
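
For instance, the per-class basis can be computed like this (a sketch;
the random matrix, the class size n, and the cutoff k are illustrative
stand-ins for real training data):

```python
# Sketch: an orthonormal basis for "the space of 3's" from the SVD of the
# class matrix A whose columns are column-stacked training images of a 3.
import numpy as np

rng = np.random.default_rng(4)
n = 40                                  # assumed number of training 3's
A = rng.random((256, n))                # stand-in for real 16x16 images

U, sigma, VT = np.linalg.svd(A, full_matrices=False)
k = 10                                  # keep the first k left singular vectors
basis = U[:, :k]                        # orthonormal basis u_1, ..., u_k
print(basis.shape)                      # (256, 10)
```
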
Algo. for classification of handwritten digits using SVD

Algorithm
Training: For the training set of known digits, compute the SVD
of each set of digits of one kind.
Classification: For a given test digit, compute its relative
residual in all 10 bases. If one residual is significantly smaller
than all the others, classify it as that digit. Otherwise, give up.
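
A sketch of the full classification step, assuming each entry of `bases`
holds the first k left singular vectors of one digit's training matrix;
the 0.9 "significantly smaller" margin is an illustrative threshold, not
something the slides specify:

```python
# Sketch of the SVD classifier; bases[d] holds the first k left singular
# vectors of digit d's training matrix (stand-in data, k = 10 assumed).
import numpy as np

def classify(test_vec, bases, margin=0.9):
    """Return the digit with the smallest relative residual, or None."""
    residuals = []
    for U in bases:
        proj = U @ (U.T @ test_vec)                    # projection onto the basis
        residuals.append(np.linalg.norm(test_vec - proj)
                         / np.linalg.norm(test_vec))   # relative residual
    residuals = np.array(residuals)
    best = int(np.argmin(residuals))
    others = np.delete(residuals, best)
    # "significantly smaller": beat every other residual by the margin
    if residuals[best] < margin * others.min():
        return best
    return None                                        # otherwise give up

# Toy usage with random stand-in bases (replace with real per-digit SVDs):
rng = np.random.default_rng(5)
bases = [np.linalg.qr(rng.random((256, 10)))[0] for _ in range(10)]
print(classify(rng.random(256), bases))
```
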
