
LVMs for Dimensionality Reduction

CS771: Introduction to Machine Learning


Piyush Rai
Plan
▪ A latent variable model for dimensionality reduction
▪ Probabilistic PCA
▪ Expectation maximization (EM) algorithm for MLE for PPCA

Probabilistic PCA (PPCA)

▪ Assume each observation is a linear mapping of a latent variable plus Gaussian noise

  x_n = μ + W z_n + ε_n

  where z_n is drawn from a zero-mean K-dim Gaussian, ε_n is zero-mean Gaussian noise with covariance σ²I_D, W is the D×K mapping matrix, and μ is a D-dim offset vector

▪ Equivalent to saying p(x_n | z_n, W, μ, σ²) = N(x_n | μ + W z_n, σ²I_D)

▪ Assume a zero-mean Gaussian prior on z_n, so p(z_n) = N(z_n | 0, I_K)

▪ A "reverse" (generative) way of thinking: first generate a low-dim latent variable z_n, then map it via W z_n + μ to generate the high-dim observation x_n

▪ The joint distribution of x_n and z_n is Gaussian (since z_n and ε_n are individually Gaussian), and the marginal distribution of x_n is therefore also Gaussian:

  p(x_n | W, μ, σ²) = N(x_n | μ, W W^⊤ + σ²I_D)

▪ Thus PPCA does a low-rank approximation of the covariance matrix and needs to estimate fewer parameters: roughly DK + 1 (for W and σ²), as opposed to D(D+1)/2 for a full covariance. As σ² → 0, the covariance becomes approximately low-rank (rank K) and only the DK parameters of W are needed.
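To make the generative view concrete, here is a minimal NumPy sketch (not part of the original slides; dimensions and variable names are illustrative) that samples data from the PPCA model and checks that the sample covariance approaches W W^⊤ + σ²I_D:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, N = 10, 2, 50000        # observed dim, latent dim, number of samples (illustrative)

W = rng.normal(size=(D, K))   # D x K mapping matrix
mu = rng.normal(size=D)       # D-dim offset vector
sigma = 0.1                   # noise standard deviation

# Generative process: z_n ~ N(0, I_K), then x_n = mu + W z_n + eps_n with eps_n ~ N(0, sigma^2 I_D)
Z = rng.normal(size=(N, K))
X = mu + Z @ W.T + sigma * rng.normal(size=(N, D))

# The marginal covariance of x_n should be W W^T + sigma^2 I_D (rank-K plus spherical noise)
C_model = W @ W.T + sigma**2 * np.eye(D)
C_empirical = np.cov(X, rowvar=False)
print(np.abs(C_model - C_empirical).max())   # small for large N
```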
Benefits of Generative Models for Dim-Red

▪ One benefit: once the model parameters are learned, we can even generate new data, e.g.,
  ▪ Generate a random z_n from the prior p(z_n) = N(0, I_K)
  ▪ Generate x_n conditioned on z_n from p(x_n | z_n) = N(μ + W z_n, σ²I_D)

[Figure: samples generated using a more sophisticated generative model, not PPCA (but similar in formulation)]
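As a small illustration of this two-step generation (a hypothetical helper, assuming W, mu and sigma have already been learned; the function name is mine, not from the slides):

```python
import numpy as np

def sample_ppca(W, mu, sigma, n, seed=0):
    """Draw n new observations from a learned PPCA model:
    z_n ~ N(0, I_K), then x_n | z_n ~ N(mu + W z_n, sigma^2 I_D)."""
    rng = np.random.default_rng(seed)
    D, K = W.shape
    Z = rng.normal(size=(n, K))                            # step 1: sample latents from the prior
    return mu + Z @ W.T + sigma * rng.normal(size=(n, D))  # step 2: map to data space and add noise
```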

Learning PPCA using EM

▪ The incomplete-data log-likelihood (ILL) is log p(X | W, μ, σ²), with the latent variables z_n integrated out
▪ Ignoring μ for notational simplicity (assume the data have been centered), the ILL is

  log p(X | W, σ²) = Σ_n log N(x_n | 0, W W^⊤ + σ²I_D)

▪ We can maximize the ILL directly, but this requires solving an eigen-decomposition (PRML: 12.2.1)
▪ EM will instead maximize the expected CLL, with the complete-data log-likelihood (CLL) given by

  log p(X, Z | W, σ²) = Σ_n [ log p(x_n | z_n, W, σ²) + log p(z_n) ]

▪ Using p(x_n | z_n, W, σ²) = N(x_n | W z_n, σ²I_D) and p(z_n) = N(z_n | 0, I_K), and simplifying, the CLL equals (up to additive constants)

  −Σ_n [ (D/2) log σ² + (1/(2σ²)) ‖x_n − W z_n‖² + (1/2) z_n^⊤ z_n ]

▪ The expected CLL will need E[z_n] and E[z_n z_n^⊤], where the expectations are w.r.t. the conditional posterior of z_n
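For reference, a small sketch (the function name is illustrative, not from the slides) of the ILL being discussed, evaluated via the Gaussian marginal from the previous slide:

```python
import numpy as np
from scipy.stats import multivariate_normal

def ppca_ill(X, W, sigma2):
    """Incomplete-data log-likelihood: sum_n log N(x_n | 0, W W^T + sigma^2 I_D).
    X is N x D and assumed centered (mu ignored, as on the slide)."""
    D = X.shape[1]
    C = W @ W.T + sigma2 * np.eye(D)   # low-rank + spherical-noise covariance
    return multivariate_normal(mean=np.zeros(D), cov=C).logpdf(X).sum()
```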
Learning PPCA using EM
(Uses p(z_n) and p(x_n | z_n) and the Gaussian "reverse conditional" property; the μ parameter is again ignored)

▪ The EM algorithm for PPCA alternates between two steps
  ▪ E step: compute the conditional posterior of each z_n given the current parameters

    p(z_n | x_n, W, σ²) = N(z_n | M⁻¹ W^⊤ x_n, σ² M⁻¹),  where M = W^⊤ W + σ²I_K

  ▪ M step: maximize the expected CLL w.r.t. the parameters W and σ²
▪ Taking the derivative of the expected CLL w.r.t. W and setting it to zero gives

  W = [ Σ_n x_n E[z_n]^⊤ ] [ Σ_n E[z_n z_n^⊤] ]⁻¹

  (σ² can likewise be estimated by setting the corresponding derivative to zero)
▪ The required expectations can be found from the conditional posterior of z_n:

  E[z_n] = M⁻¹ W^⊤ x_n,   E[z_n z_n^⊤] = σ² M⁻¹ + E[z_n] E[z_n]^⊤

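A minimal NumPy sketch of the E step above (the function and variable names are mine, not from the slides); it returns E[z_n] for all n together with the summed second moments needed by the M step:

```python
import numpy as np

def e_step(Xc, W, sigma2):
    """Posterior moments of z_n under p(z_n | x_n) = N(M^{-1} W^T x_n, sigma^2 M^{-1}),
    where M = W^T W + sigma^2 I_K and Xc is the N x D (centered) data matrix."""
    N, K = Xc.shape[0], W.shape[1]
    M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(K))  # M^{-1} (K x K, symmetric)
    Ez = Xc @ W @ M_inv                                  # row n is E[z_n] = M^{-1} W^T x_n
    sum_Ezz = N * sigma2 * M_inv + Ez.T @ Ez             # sum_n E[z_n z_n^T]
    return Ez, sum_Ezz
```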
Full EM algo for PPCA
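Since the algorithm box from the original slide is not reproduced here, the following is a minimal self-contained sketch of the standard EM iterations for PPCA described on the previous slides (all names, the random initialization, and the fixed iteration count are illustrative choices, not the course's reference implementation):

```python
import numpy as np

def em_ppca(X, K, n_iters=100, seed=0):
    """Fit a PPCA model (W, mu, sigma^2) to an N x D data matrix X with EM."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X.mean(axis=0)            # MLE of mu is the data mean; work with centered data below
    Xc = X - mu
    W = rng.normal(size=(D, K))    # random initialization
    sigma2 = 1.0

    for _ in range(n_iters):
        # E step: posterior moments of z_n, with M = W^T W + sigma^2 I_K
        M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(K))
        Ez = Xc @ W @ M_inv                         # E[z_n] for all n (N x K)
        sum_Ezz = N * sigma2 * M_inv + Ez.T @ Ez    # sum_n E[z_n z_n^T]

        # M step: W = [sum_n x_n E[z_n]^T] [sum_n E[z_n z_n^T]]^{-1}
        W = (Xc.T @ Ez) @ np.linalg.inv(sum_Ezz)
        # M step for sigma^2: expected squared reconstruction error per dimension
        sigma2 = (np.sum(Xc ** 2)
                  - 2.0 * np.sum(Ez * (Xc @ W))
                  + np.trace(sum_Ezz @ W.T @ W)) / (N * D)

    return W, mu, sigma2
```

For example, W_hat, mu_hat, s2_hat = em_ppca(X, K=2) would fit a 2-dimensional latent space to the data in X.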

Other Generative Models for Dim-Red

▪ Factor Analysis (FA) is similar to PPCA, except that the noise covariance is a general diagonal matrix Ψ instead of the spherical σ²I_D
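A tiny illustrative snippet (not from the slides) contrasting the two noise models through the marginal covariances they imply:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 5, 2
W = rng.normal(size=(D, K))

C_ppca = W @ W.T + 0.1 * np.eye(D)                       # PPCA: spherical noise sigma^2 I_D
C_fa   = W @ W.T + np.diag(rng.uniform(0.05, 0.5, D))    # FA: per-dimension noise variances (diagonal Psi)
```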

▪ Can use a mixture of probabilistic PCA models for nonlinear dimensionality reduction
  ▪ Data assumed to come from a mixture of low-rank Gaussians
  ▪ Each low-rank Gaussian is a PPCA model
  ▪ Basically does clustering + dimensionality reduction in each cluster

▪ Variational auto-encoders (VAE): the z-to-x mapping is defined by a deep neural network
  (Will look at VAE and GAN briefly when talking about deep learning)
▪ Generative adversarial networks (GAN) are generative models that can only generate x
▪ Some variants of GANs (e.g., bi-directional GAN) can also be used to learn z from x
Coming up next
▪ Deep neural networks
