
PRINCIPAL COMPONENT ANALYSIS

& SINGULAR VALUE DECOMPOSITION


PROF. NAVNEET GOYAL

The material in this presentation has been adapted from articles on PCA by Jon Shlens and Lindsay Smith, and from the SVD tutorial by Kirk Baker.
PCA
• A black box for dimensionality reduction, used extensively in the ML community
• How and why does PCA work?
• The mathematics behind PCA!
• One of the most celebrated results in Applied Linear Algebra
• Gives the best representation of the data – without regard to the ML task at hand
• An unsupervised metric learning technique
PCA: MOTIVATIONAL EXAMPLE
• Motion of the physicist’s ideal spring.
• Ball of mass m attached to a
massless, frictionless spring.
• The ball is released a small distance
away from equilibrium (i.e. the
spring is stretched).
• Because the spring is “ideal,” it oscillates indefinitely along the x-axis about its equilibrium at a set frequency.
PCA: MOTIVATIONAL EXAMPLE
• Underlying dynamics can be expressed as a function of a single variable x
• Being ignorant experimenters, we do not know any of this.
• We do not know which, let alone how many, axes and dimensions are important to measure.
• Thus, we decide to measure the ball’s position in 3D space (since we live in a 3D world).
• Specifically, we place three movie cameras around the system.
PCA: MOTIVATIONAL EXAMPLE
• At 200 Hz, each movie camera records an image indicating a 2D position of the ball (a projection).
• Unfortunately, because of our ignorance, we do not even know what the real “x”, “y”, and “z” axes are, so we choose three camera axes {a, b, c} at some arbitrary angles with respect to the system.
• The angles between our measurements might not even be 90°! Now, we record with the cameras for 2 minutes.
• The big question remains: how do we get from this data set to a simple equation of x?
PCA: MOTIVATIONAL EXAMPLE
• A smart experimenter would have used just one camera, aligned with the x-axis
• In the real world, we end up recording many additional dimensions!!
• Noise!!
PCA: MOTIVATIONAL EXAMPLE
• PCA computes the most meaningful basis to re-express a noisy, garbled data set.
• The new basis is expected to filter out the noise and reveal the hidden dynamics.
• In the example of the spring, the explicit goal of PCA is to determine that “the dynamics are along the x-axis.”
• In other words, the goal of PCA is to determine that x̂, the unit basis vector along the x-axis, is the important dimension.
• Determining this fact allows an experimenter to discern which dynamics are important and which are just redundant.
PCA: MOTIVATIONAL EXAMPLE
• In our data set, at one point in time, camera A records a corresponding ball position (x_A, y_A). One sample (or trial) can then be expressed as a 6-dimensional column vector, where each camera contributes a 2-dimensional projection of the ball’s position to the entire vector X. If we record the ball’s position for 10 minutes at 120 Hz, then we have recorded 10 × 60 × 120 = 72,000 of these vectors.
PICTORIAL ILLUSTRATION
MATHEMATICAL BACKGROUND FOR PCA
• Statistics
• Mean, std. dev., variance, covariance, covariance matrix
• Matrix Algebra
• Eigen-vectors & Eigen-values
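To make this background concrete, here is a minimal NumPy sketch (not from the slides) that computes the mean, the covariance matrix, and the eigenvalues/eigenvectors of that covariance matrix for a small illustrative 2D data set; all values and names are for illustration only.

```python
import numpy as np

# Small illustrative 2D data set: each row is one observation (x, y)
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
                 [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

mean = data.mean(axis=0)                 # per-dimension mean
centered = data - mean                   # mean-subtracted data
cov = np.cov(centered, rowvar=False)     # 2x2 covariance matrix

# Eigen-decomposition of the (symmetric) covariance matrix:
# columns of eig_vecs are the eigenvectors, eig_vals the eigenvalues
eig_vals, eig_vecs = np.linalg.eigh(cov)
print(cov, eig_vals, eig_vecs, sep="\n")
```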
PCA: STEPS
• Step 1: Get some data
• Step 2: Subtract the mean
• Step 3: Calculate the covariance matrix
• Step 4: Calculate the eigenvectors and eigenvalues of the
covariance matrix
• Step 5: Choosing components and forming a feature vector
• Step 6: Deriving the new data set
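As a hedged illustration of these six steps, the following NumPy sketch (not part of the original slides; the function name pca and the parameter n_components are illustrative) performs them end to end.

```python
import numpy as np

def pca(data, n_components):
    # data: (n_samples, n_features); n_components: number of dimensions to keep
    # Step 2: subtract the mean of each feature
    mean = data.mean(axis=0)
    centered = data - mean
    # Step 3: covariance matrix of the mean-adjusted data
    cov = np.cov(centered, rowvar=False)
    # Step 4: eigenvectors and eigenvalues of the covariance matrix
    eig_vals, eig_vecs = np.linalg.eigh(cov)
    # Step 5: order components by decreasing eigenvalue and keep the top ones
    order = np.argsort(eig_vals)[::-1][:n_components]
    feature_vector = eig_vecs[:, order]
    # Step 6: derive the new data set by projecting onto the kept components
    return centered @ feature_vector, feature_vector, mean

# Step 1: get some data (random values, purely illustrative)
X = np.random.default_rng(0).normal(size=(100, 3))
Y, components, mu = pca(X, n_components=2)
```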
STEP 6: DERIVING THE NEW DATA SET
• The final step in PCA
• Once we have chosen the components (eigenvectors) that we wish to keep in our data and formed a feature vector, we simply take the transpose of the feature vector and multiply it on the left of the original (mean-adjusted) data set, transposed.
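In the row-oriented notation of Lindsay Smith's tutorial, this step is a single matrix product; here is a minimal sketch, assuming RowFeatureVector holds the kept eigenvectors as rows and RowDataAdjust holds the mean-adjusted data with one column per sample.

```python
import numpy as np

# Illustrative orthonormal eigenvectors as rows: shape (k, d) with k = d = 2
RowFeatureVector = np.array([[ 0.678, 0.735],
                             [-0.735, 0.678]])
# Mean-adjusted data, transposed: shape (d, n) with one column per sample
RowDataAdjust = np.random.default_rng(1).normal(size=(2, 10))

# Final step: transposed feature vector times transposed mean-adjusted data
FinalData = RowFeatureVector @ RowDataAdjust    # shape (k, n)
```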
SVD
• SVD produces the same new axes as PCA
• Does not require covariance matrix
• Based on a theorem in Linear Algebra, according to which a rectangular matrix A can be decomposed into the product of 3 matrices: an orthogonal matrix U, a diagonal matrix S, and the transpose of an orthogonal matrix V, so that A = U S Vᵀ

• Full SVD & Reduced SVD
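A minimal NumPy sketch (not from the slides) of the decomposition; numpy.linalg.svd returns the diagonal of S as a vector, and the full_matrices flag switches between the full and the reduced SVD.

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(5, 3))   # illustrative rectangular matrix

# Full SVD: U is 5x5, s holds the 3 singular values, Vt is 3x3
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Reduced (thin) SVD: U is 5x3, s is the same, Vt is 3x3
U_r, s_r, Vt_r = np.linalg.svd(A, full_matrices=False)

# Reconstruction check: A = U S V^T (up to floating-point round-off)
assert np.allclose(A, U_r @ np.diag(s_r) @ Vt_r)
```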


SVD: 3 POINTS OF VIEW
• Converting correlated variables into uncorrelated variables that
better expose the relationships among original data items
• Ordering of dimensions according to maximum variability
• Finding best approximation (representation) of data in fewer
dimensions
SVD: EXAMPLE
• Find the SVD of the matrix:
SVD: EXAMPLE
• Find the eigenvalues and the corresponding eigenvectors of AAᵀ
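The left singular vectors (columns of U) come from the eigenvectors of AAᵀ, and the right singular vectors (columns of V) from the eigenvectors of AᵀA; the nonzero eigenvalues of both products coincide, and their square roots are the singular values. A minimal NumPy sketch of this relationship, using an illustrative 2×3 matrix since the example matrix is not reproduced here:

```python
import numpy as np

A = np.array([[ 3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])          # illustrative 2x3 matrix

# Eigenvectors of A A^T are the columns of U (left singular vectors)
vals_u, U = np.linalg.eigh(A @ A.T)
# Eigenvectors of A^T A are the columns of V (right singular vectors)
vals_v, V = np.linalg.eigh(A.T @ A)

# Singular values = square roots of the (nonzero) eigenvalues
print(np.sqrt(np.sort(vals_u)[::-1]))
print(np.linalg.svd(A, compute_uv=False))  # same values, in decreasing order
```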
SVD: EXAMPLE
• Gram-Schmidt orthonormalization process (a short sketch follows this slide)

• The V Matrix
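A minimal sketch of classical Gram-Schmidt orthonormalization (illustrative code, not from the tutorial): each vector has its projections onto the already-accepted basis vectors removed and is then normalized to unit length.

```python
import numpy as np

def gram_schmidt(vectors):
    # Orthonormalize the rows of `vectors` via classical Gram-Schmidt
    basis = []
    for v in vectors:
        # Subtract the components of v along the already-accepted basis vectors
        for b in basis:
            v = v - np.dot(v, b) * b
        norm = np.linalg.norm(v)
        if norm > 1e-12:                  # skip (nearly) dependent vectors
            basis.append(v / norm)
    return np.array(basis)

Q = gram_schmidt(np.array([[1.0, 1.0, 0.0],
                           [1.0, 0.0, 1.0],
                           [0.0, 1.0, 1.0]]))
print(Q @ Q.T)                            # approximately the identity matrix
```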
SVD: EXAMPLE
• The V Matrix
• Matrix multiplication compatibility
SVD: EXAMPLE
• The S Matrix
SVD: EXAMPLE
• SVD Decomposition

• The diagonal entries of S are the singular values of A
• Columns of U are the left singular vectors of A
• Columns of V are the right singular vectors of A
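As a quick numerical check (illustrative, not from the slides): the i-th singular value σᵢ, left singular vector uᵢ, and right singular vector vᵢ satisfy A vᵢ = σᵢ uᵢ.

```python
import numpy as np

A = np.random.default_rng(2).normal(size=(4, 3))    # illustrative matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

for i in range(len(s)):
    # Row i of Vt is v_i; column i of U is u_i; s[i] is the singular value
    assert np.allclose(A @ Vt[i], s[i] * U[:, i])
```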
SVD: EXAMPLE
• SVD Decomposition
• Find out how SVD helps in dimensionality reduction
• Find out the role of SVD in NLP (document classification)
• What we discussed so far is the Full SVD
• Find out what the Reduced SVD is
REDUCED SVD: EXAMPLE

• How to choose S?
• The full S is 5x5 in the case of the full SVD
• In the Reduced SVD, we choose the size of S based on how many dimensions we want to retain
• Suppose we want to retain only 3 dimensions
• Reduced versions: U is 5x3, S is 3x3, and Vᵀ is 3x5 (their product is still 5x5)
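A minimal NumPy sketch of this truncation (using an illustrative 5×5 matrix, not the one from the example): keeping only the 3 largest singular values shrinks U to 5×3, S to 3×3, and Vᵀ to 3×5, and their product is a 5×5 rank-3 approximation of A.

```python
import numpy as np

A = np.random.default_rng(3).normal(size=(5, 5))    # illustrative 5x5 matrix
U, s, Vt = np.linalg.svd(A)

k = 3                                               # dimensions to retain
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]  # 5x3, 3x3, 3x5

A_hat = U_k @ S_k @ Vt_k                            # 5x5 rank-3 approximation of A
```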

REDUCED SVD: EXAMPLE

• Â is the approximation of A in 3 dimensions
