
PRINCIPAL COMPONENT ANALYSIS

& SINGULAR VALUE DECOMPOSITION


PROF. NAVNEET GOYAL

The material in this presentation has been adapted from articles on PCA by Jon Shlens and Lindsay Smith, and from the SVD tutorial by Kirk Baker.
PCA
• A black box for dimensionality reduction, used extensively in the ML community
• How and why does PCA work?
• The mathematics behind PCA!
• One of the most celebrated results in Applied Linear Algebra
• Gives the best representation of the data – without regard to the ML task at hand
• An unsupervised metric learning technique
PCA: MOTIVATIONAL EXAMPLE
• Motion of the physicist’s ideal spring.
• Ball of mass m attached to a
massless, frictionless spring.
• The ball is released a small distance
away from equilibrium (i.e. the
spring is stretched).
• Because the spring is “ideal,” it oscillates indefinitely along the x-axis about its equilibrium at a set frequency.
PCA: MOTIVATIONAL EXAMPLE
• Underlying dynamics can be expressed as a function of a single variable x
• Being ignorant experimenters, we do not know any of this.
• We do not know which, let alone how many, axes and dimensions are important to measure.
• Thus, we decide to measure the ball’s position in 3D space (since we live in a 3D world).
• Specifically, we place three movie cameras around the system.
PCA: MOTIVATIONAL EXAMPLE
• At 200 Hz, each movie camera records an image indicating a 2D position of the ball (a projection).
• Unfortunately, because of our ignorance, we do not even know what the real “x”, “y”, and “z” axes are, so we choose three camera axes {a, b, c} at some arbitrary angles with respect to the system.
• The angles between our measurements might not even be 90°! Now, we record with the cameras for 2 minutes.
• The big question remains: how do we get from this data set to a simple equation of x?
PCA: MOTIVATIONAL EXAMPLE
• A smart experimenter would have used just one camera, aligned with the x-axis
• In the real world, we end up recording many additional dimensions!!
• Noise!!
PCA: MOTIVATIONAL EXAMPLE
• PCA computes the most meaningful basis to re-express a noisy, garbled data set.
• The new basis is expected to filter out the noise and reveal the hidden dynamics.
• In the example of the spring, the explicit goal of PCA is to determine that “the dynamics are along the x-axis.”
• In other words, the goal of PCA is to determine that x̂, the unit basis vector along the x-axis, is the important dimension.
• Determining this fact allows an experimenter to discern which dynamics are important and which are just redundant.
PCA: MOTIVATIONAL EXAMPLE
• In our data set, at one point in time, camera A records a corresponding ball position (x_A, y_A). One sample (or trial) can then be expressed as a 6-dimensional column vector, where each camera contributes a 2-dimensional projection of the ball’s position to the entire vector X. If we record the ball’s position for 10 minutes at 120 Hz, then we have recorded 10 × 60 × 120 = 72,000 of these vectors.
PICTORIAL ILLUSTRATION
MATHEMATICAL BACKGROUND FOR PCA
• Statistics
• Mean, std. dev., variance, covariance, covariance matrix
• Matrix Algebra
• Eigen-vectors & Eigen-values
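To make this background concrete, here is a minimal NumPy sketch (not from the slides) that computes the mean, the covariance matrix, and the eigenvalues/eigenvectors of that covariance matrix for a small illustrative 2D data set; all values and names are for illustration only.

```python
import numpy as np

# Small illustrative 2D data set: each row is one observation (x, y)
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
                 [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

mean = data.mean(axis=0)                 # per-dimension mean
centered = data - mean                   # mean-subtracted data
cov = np.cov(centered, rowvar=False)     # 2x2 covariance matrix

# Eigen-decomposition of the (symmetric) covariance matrix:
# columns of eig_vecs are the eigenvectors, eig_vals the eigenvalues
eig_vals, eig_vecs = np.linalg.eigh(cov)
print(cov, eig_vals, eig_vecs, sep="\n")
```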
PCA: STEPS
• Step 1: Get some data
• Step 2: Subtract the mean
• Step 3: Calculate the covariance matrix
• Step 4: Calculate the eigenvectors and eigenvalues of the
covariance matrix
• Step 5: Choosing components and forming a feature vector
• Step 6: Deriving the new data set
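As a hedged illustration of these six steps, the following NumPy sketch (not part of the original slides; the function name pca and the parameter n_components are illustrative) performs them end to end.

```python
import numpy as np

def pca(data, n_components):
    # data: (n_samples, n_features); n_components: number of dimensions to keep
    # Step 2: subtract the mean of each feature
    mean = data.mean(axis=0)
    centered = data - mean
    # Step 3: covariance matrix of the mean-adjusted data
    cov = np.cov(centered, rowvar=False)
    # Step 4: eigenvectors and eigenvalues of the covariance matrix
    eig_vals, eig_vecs = np.linalg.eigh(cov)
    # Step 5: order components by decreasing eigenvalue and keep the top ones
    order = np.argsort(eig_vals)[::-1][:n_components]
    feature_vector = eig_vecs[:, order]
    # Step 6: derive the new data set by projecting onto the kept components
    return centered @ feature_vector, feature_vector, mean

# Step 1: get some data (random values, purely illustrative)
X = np.random.default_rng(0).normal(size=(100, 3))
Y, components, mu = pca(X, n_components=2)
```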
STEP 6: DERIVING THE NEW DATA SET
• The final step in PCA
• Once we have chosen the components (eigenvectors) that we wish to keep in our data and formed a feature vector, we simply take the transpose of the feature vector and multiply it on the left of the original (mean-adjusted) data set, transposed.
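In the row-oriented notation of Lindsay Smith's tutorial, this step is a single matrix product; here is a minimal sketch, assuming RowFeatureVector holds the kept eigenvectors as rows and RowDataAdjust holds the mean-adjusted data with one column per sample.

```python
import numpy as np

# Illustrative orthonormal eigenvectors as rows: shape (k, d) with k = d = 2
RowFeatureVector = np.array([[ 0.678, 0.735],
                             [-0.735, 0.678]])
# Mean-adjusted data, transposed: shape (d, n) with one column per sample
RowDataAdjust = np.random.default_rng(1).normal(size=(2, 10))

# Final step: transposed feature vector times transposed mean-adjusted data
FinalData = RowFeatureVector @ RowDataAdjust    # shape (k, n)
```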
SVD
• SVD produces the same new axes as PCA
• Does not require covariance matrix
• Based on a theorem in Linear Algebra, according to which a rectangular matrix A can be decomposed into the product of 3 matrices: an orthogonal matrix U, a diagonal matrix S, and the transpose of an orthogonal matrix V, so that A = U S Vᵀ

• Full SVD & Reduced SVD
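A minimal NumPy sketch (not from the slides) of the decomposition; numpy.linalg.svd returns the diagonal of S as a vector, and the full_matrices flag switches between the full and the reduced SVD.

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(5, 3))   # illustrative rectangular matrix

# Full SVD: U is 5x5, s holds the 3 singular values, Vt is 3x3
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Reduced (thin) SVD: U is 5x3, s is the same, Vt is 3x3
U_r, s_r, Vt_r = np.linalg.svd(A, full_matrices=False)

# Reconstruction check: A = U S V^T (up to floating-point round-off)
assert np.allclose(A, U_r @ np.diag(s_r) @ Vt_r)
```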


SVD: 3 POINTS OF VIEW
• Converting correlated variables into uncorrelated variables that
better expose the relationships among original data items
• Ordering of dimensions according to maximum variability
• Finding best approximation (representation) of data in fewer
dimensions
SVD: EXAMPLE
• Find the SVD of the matrix:
SVD: EXAMPLE
• Find the eigenvalues and the corresponding eigenvectors of AAᵀ
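The left singular vectors (columns of U) come from the eigenvectors of AAᵀ, and the right singular vectors (columns of V) from the eigenvectors of AᵀA; the nonzero eigenvalues of both products coincide, and their square roots are the singular values. A minimal NumPy sketch of this relationship, using an illustrative 2×3 matrix since the example matrix is not reproduced here:

```python
import numpy as np

A = np.array([[ 3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])          # illustrative 2x3 matrix

# Eigenvectors of A A^T are the columns of U (left singular vectors)
vals_u, U = np.linalg.eigh(A @ A.T)
# Eigenvectors of A^T A are the columns of V (right singular vectors)
vals_v, V = np.linalg.eigh(A.T @ A)

# Singular values = square roots of the (nonzero) eigenvalues
print(np.sqrt(np.sort(vals_u)[::-1]))
print(np.linalg.svd(A, compute_uv=False))  # same values, in decreasing order
```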
SVD: EXAMPLE
• Gram-Schmidt orthonormalization process (a short sketch follows this slide)

• The V Matrix
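A minimal sketch of classical Gram-Schmidt orthonormalization (illustrative code, not from the tutorial): each vector has its projections onto the already-accepted basis vectors removed and is then normalized to unit length.

```python
import numpy as np

def gram_schmidt(vectors):
    # Orthonormalize the rows of `vectors` via classical Gram-Schmidt
    basis = []
    for v in vectors:
        # Subtract the components of v along the already-accepted basis vectors
        for b in basis:
            v = v - np.dot(v, b) * b
        norm = np.linalg.norm(v)
        if norm > 1e-12:                  # skip (nearly) dependent vectors
            basis.append(v / norm)
    return np.array(basis)

Q = gram_schmidt(np.array([[1.0, 1.0, 0.0],
                           [1.0, 0.0, 1.0],
                           [0.0, 1.0, 1.0]]))
print(Q @ Q.T)                            # approximately the identity matrix
```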
SVD: EXAMPLE
• The V Matrix
• Matrix multiplication compatibility
SVD: EXAMPLE
• The S Matrix
SVD: EXAMPLE
• SVD Decomposition

• The diagonal entries of S are the singular values of A
• Columns of U are the left singular vectors of A
• Columns of V are the right singular vectors of A
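As a quick numerical check (illustrative, not from the slides): the i-th singular value σᵢ, left singular vector uᵢ, and right singular vector vᵢ satisfy A vᵢ = σᵢ uᵢ.

```python
import numpy as np

A = np.random.default_rng(2).normal(size=(4, 3))    # illustrative matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

for i in range(len(s)):
    # Row i of Vt is v_i; column i of U is u_i; s[i] is the singular value
    assert np.allclose(A @ Vt[i], s[i] * U[:, i])
```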
SVD: EXAMPLE
• SVD Decomposition
• Find out how SVD helps in dimensionality reduction
• Find out the role of SVD in NLP (document classification)
• What we discussed so far is the Full SVD
• Find out what the Reduced SVD is
REDUCED SVD: EXAMPLE

• How to choose S?
• The full S is 5x5 in the case of the full SVD
• In the Reduced SVD, we choose the size of S based on how many dimensions we want to retain
• Suppose we want to retain only 3 dimensions
• Reduced versions: U is 5x3, S is 3x3, and Vᵀ is 3x5 (their product is still 5x5)
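A minimal NumPy sketch of this truncation (using an illustrative 5×5 matrix, not the one from the example): keeping only the 3 largest singular values shrinks U to 5×3, S to 3×3, and Vᵀ to 3×5, and their product is a 5×5 rank-3 approximation of A.

```python
import numpy as np

A = np.random.default_rng(3).normal(size=(5, 5))    # illustrative 5x5 matrix
U, s, Vt = np.linalg.svd(A)

k = 3                                               # dimensions to retain
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]  # 5x3, 3x3, 3x5

A_hat = U_k @ S_k @ Vt_k                            # 5x5 rank-3 approximation of A
```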

REDUCED SVD: EXAMPLE

• Â is the approximation of A in 3 dimensions
