
Generative Models

M. Soleymani
Sharif University of Technology
Fall 2017

Slides are based on Fei-Fei Li and colleagues' lectures, CS231n, Stanford, 2017.
Supervised Learning
• Data: (x, y)
– x is data
– y is label

• Goal: Learn a function to map x -> y

• Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
Unsupervised Learning
• Data: x
– Just data, no labels!

• Goal: Learn some underlying hidden structure of the data

• Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.
Autoencoders
• Unsupervised approach for learning a lower-dimensional feature
representation from unlabeled training data
– Train such that features can be used to reconstruct original data
• “Autoencoding”: the input is encoded into itself (the reconstruction target is the input)
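A minimal sketch of this idea in PyTorch (the fully-connected architecture and layer sizes are illustrative assumptions, not taken from the slides):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Compress x to a lower-dimensional code z, then reconstruct x from z."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, code_dim))           # z: the learned features
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim))          # x_hat: the reconstruction

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

# Train so that the features z suffice to reconstruct the original data:
model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                         # a dummy batch of flattened images
opt.zero_grad()
loss = nn.functional.mse_loss(model(x), x)      # L2 reconstruction loss
loss.backward(); opt.step()
```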
Autoencoders
• Pre-train the encoder with (unlabeled) data
Generative Models
• Task: Given a dataset of images {X1, X2, ...}, can we learn the distribution of X?

• Typically, “generative model” implies modeling P(X).
– On its own this is very limited: given an input, the model outputs a probability.

• We are more interested in models from which we can sample from the distribution of the data:
– producing examples that are similar to the training examples but not exactly the same
• i.e., synthesizing new, unseen examples similar to the seen ones
Taxonomy of Generative Models
Why Generative Models?
• Realistic samples for artwork, super-resolution, colorization, etc.

• Generative models of time-series data can be used for simulation and planning (reinforcement learning applications!)
• Training generative models can also enable inference of latent
representations that can be useful as general features
Fully visible belief network
• Intuition: use the chain rule to decompose the likelihood of an image x into a product of 1-d conditional distributions:

$$\log p(x) = \sum_{i=1}^{n} \log p(x_i \mid x_1, \ldots, x_{i-1})$$

• Then train to maximize the likelihood of the training data.

Oord et al., Pixel recurrent neural networks, ICML 2016
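To make the factorization concrete, here is a small sketch that evaluates log p(x) as a sum of per-pixel conditional log-probabilities; the `conditional_prob` callable is a hypothetical stand-in for whatever network (RNN or CNN) defines each term:

```python
import numpy as np

def log_likelihood(x, conditional_prob):
    """log p(x) = sum_i log p(x_i | x_1, ..., x_{i-1})  (chain rule).

    x: 1-d array of pixel values (image scanned in generation order).
    conditional_prob(prefix, v): model's probability that the next
    pixel equals v given the pixels seen so far (hypothetical API).
    """
    total = 0.0
    for i in range(len(x)):
        total += np.log(conditional_prob(x[:i], x[i]))
    return total

# Toy example: a "model" that ignores context and is uniform over 256 values.
uniform = lambda prefix, v: 1.0 / 256
x = np.array([3, 250, 17])
print(log_likelihood(x, uniform))  # == 3 * log(1/256)
```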
PixelRNN
• Generate image pixels starting from the corner
• Dependency on previous pixels modeled using an RNN (LSTM)
• Drawback: sequential generation is slow!

Oord et al., Pixel recurrent neural networks, ICML 2016
PixelCNN
• Still generate image pixels starting from the corner
• Dependency on previous pixels now modeled using a CNN over a context region (softmax loss at each pixel)
• Training: maximize likelihood of training images
• Training is faster than PixelRNN
– convolutions can be parallelized, since the context region values are known from the training images
• Generation must still proceed sequentially => still slow
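The key ingredient is a convolution whose kernel is masked so each output depends only on already-generated context pixels. A hedged PyTorch sketch, following the masking idea described in Oord et al. (channel counts here are illustrative):

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Convolution whose kernel is zeroed at and after the center position,
    so the prediction for pixel (i, j) depends only on the context region
    of already-generated pixels (above and to the left)."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kH, kW = self.kernel_size
        mask = torch.ones(kH, kW)
        mask[kH // 2, kW // 2:] = 0   # zero the center and everything to its right
        mask[kH // 2 + 1:, :] = 0     # zero all rows below the center
        self.register_buffer("mask", mask)

    def forward(self, x):
        return nn.functional.conv2d(x, self.weight * self.mask,
                                    self.bias, self.stride, self.padding)

# All conditionals of a training image are computed in one parallel pass:
layer = MaskedConv2d(1, 64, kernel_size=7, padding=3)
out = layer(torch.rand(8, 1, 28, 28))   # (batch, 64, 28, 28)
```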
Generated Samples
PixelRNN and PixelCNN
So far
Variational Autoencoders
• Probabilistic spin on autoencoders
– will let us sample from the model to generate data!
Variational Autoencoders

Learn model parameters to maximize the likelihood of the training data
Variational Autoencoders: Intractability

Cannot optimize the likelihood directly; derive and optimize a lower bound on the likelihood instead
Lower bound for log likelihood

$$\log p(x) = E_{z \sim q(z|x)}[\log p(x)] = E_{z \sim q(z|x)}\left[\log \frac{p(x|z)\,p(z)}{p(z|x)}\right] \quad \text{(Bayes rule)}$$

$$= E_{z \sim q(z|x)}\left[\log \frac{p(x|z)\,p(z)}{p(z|x)} \times \frac{q(z|x)}{q(z|x)}\right]$$

$$= E_{z \sim q(z|x)}[\log p(x|z)] + E_{z \sim q(z|x)}\left[\log \frac{q(z|x)}{p(z|x)}\right] - E_{z \sim q(z|x)}\left[\log \frac{q(z|x)}{p(z)}\right]$$

$$= E_{z \sim q(z|x)}[\log p(x|z)] + D_{KL}(q(z|x) \,\|\, p(z|x)) - D_{KL}(q(z|x) \,\|\, p(z))$$

Since $D_{KL}(q(z|x) \,\|\, p(z|x)) \geq 0$, this yields the lower bound (ELBO):

$$\log p(x) \geq E_{z \sim q(z|x)}[\log p(x|z)] - D_{KL}(q(z|x) \,\|\, p(z))$$
Variational Autoencoders
Maximizing ELBO
• The ELBO being maximized is composed of:
– Approximate posterior distribution q(z|x): best match to the true posterior p(z|x)
– Reconstruction cost: the expected log-likelihood measures how well samples from q(z|x) are able to explain the data x
– Penalty: ensures that the explanation of the data q(z|x) doesn’t deviate too far from the prior beliefs p(z)
Encoder network
• Model q(z|x) with a neural network (called the encoder network)
• Assume q(z|x) to be Gaussian
• The neural network outputs the mean $\mu_{z|x}$ and (diagonal) covariance matrix $\Sigma_{z|x}$
Decoder network
• Model p(x|z) with a neural network (called the decoder network)
– let f(z) be the network output
– Assume p(x|z) to be i.i.d. Gaussian, e.g.,
• x = f(z) + η, where η ~ N(0, I)
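A minimal PyTorch sketch of these two networks, assuming a diagonal-Gaussian q(z|x) whose encoder outputs the mean and the log of the diagonal variances (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=20, hidden=400):
        super().__init__()
        # Encoder: model q(z|x) as N(mu_{z|x}, Sigma_{z|x}) with diagonal Sigma
        self.enc = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(hidden, latent_dim)   # log of the diagonal of Sigma
        # Decoder: model p(x|z) through f(z); x = f(z) + noise
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, input_dim))

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def decode(self, z):
        return self.dec(z)     # f(z), the mean of p(x|z)
```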
Variational Autoencoders
Cost function
If we consider:

$$p(x|z) \sim N(f(z), \sigma^2 I)$$

we have:

$$\log p(x|z) = C - \frac{1}{2\sigma^2} \| x - f(z) \|^2$$

• Putting it all together (given an (x, z) pair):

$$L = \frac{1}{2\sigma^2} \| x - f(z) \|^2 + D_{KL}(q(z|x) \,\|\, p(z))$$

The first term corresponds to the $E_{z \sim q(z|x)}[\log p(x|z)]$ term, since $z \sim q(z|x)$ on the training data.
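A hedged sketch of this loss for the diagonal-Gaussian encoder above; it uses the closed-form KL divergence between N(μ, diag(σ²)) and N(0, I), and folds the decoder noise scale 1/(2σ²) into a `recon_weight` constant:

```python
import torch

def vae_loss(x, x_recon, mu, logvar, recon_weight=1.0):
    """L = ||x - f(z)||^2 / (2 sigma^2)  +  D_KL(q(z|x) || p(z)).
    recon_weight plays the role of 1 / (2 sigma^2)."""
    recon = recon_weight * ((x - x_recon) ** 2).sum(dim=1)
    # Closed-form KL between N(mu, diag(exp(logvar))) and the prior N(0, I):
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)
    return (recon + kl).mean()
```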
Reparametrization Trick

Sampling z ~ N(μ, σ²) directly is a non-continuous operation with no gradient, so the loss cannot be backpropagated to the encoder.

But z ~ N(μ, σ²) is equivalent to z = μ + σ ⋅ ε, where ε ~ N(0, 1). Now z is a deterministic function of μ and σ, and we can easily backpropagate the loss to the encoder.
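In code the trick is one line: sampling becomes a deterministic function of (μ, σ) plus parameter-free noise, so gradients reach the encoder. A sketch:

```python
import torch

def reparametrize(mu, logvar):
    sigma = torch.exp(0.5 * logvar)     # standard deviation
    eps = torch.randn_like(sigma)       # eps ~ N(0, I): no parameters inside
    return mu + sigma * eps             # differentiable w.r.t. mu and sigma
```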
Generate Image: At Test Time
• Remove the encoder
• Sample z ~ N(0, I) and pass it through the decoder
• No good quantitative metric; evaluation relies on visual inspection
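Test-time generation is then just prior sampling plus a decoder pass; a sketch reusing the hypothetical `VAE` class from earlier:

```python
import torch

model = VAE()                        # the sketch class defined earlier (assumption)
model.eval()
with torch.no_grad():
    z = torch.randn(16, 20)          # sample z ~ N(0, I)
    images = model.decode(z)         # pass through the decoder only
```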
Variational Autoencoders: Generating Data!
Common VAE Architecture for Image Generation
Conditional VAE (CVAE)
• Replace all p(x|z) with p(y|z,x)
• Replace all q(z|x) with q(z|x,y).
CVAE
Go through the same KL divergence procedure to get the analogous lower bound.

Lower bound of the log conditional likelihood:

$$\log p(y|x) - D_{KL}(q(z|y,x) \,\|\, p(z|y,x)) = E_{z \sim q(z|x,y)}[\log p(y|x,z)] - D_{KL}(q(z|y,x) \,\|\, p(z))$$
Attribute2Image

Xinchen Yan et al., Attribute2Image: Conditional Image Generation from Visual Attributes, 2015.
Variational Autoencoders
• Probabilistic spin on traditional autoencoders => allows generating data
• Defines an intractable density => derive and optimize a (variational) lower bound
• Pros:
– Principled approach to generative models
– Allows inference of q(z|x), can be useful feature representation for other tasks
• Cons:
– Maximizes lower bound of likelihood: okay, but not as good evaluation as PixelRNN/PixelCNN
– Samples blurrier and lower quality compared to state-of-the-art (GANs)
• Active areas of research:
– More flexible approximations, e.g. richer approximate posterior instead of diagonal Gaussian
– Incorporating structure in latent variables
So far
Generative Adversarial Networks
• Problem: Want to sample from a complex, high-dimensional training distribution.
– No direct way to do this!
• Solution:
– Sample from a simple distribution, e.g. random noise.
– Then, learn a transformation to the training distribution.

• Q: What can we use to represent this complex transformation?
– A neural network!

Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.


Training GANs: Two-player game
• Generator network: tries to fool the discriminator by generating real-looking images
• Discriminator network: tries to distinguish between real and fake images

Discriminator (θd) wants to maximize the objective such that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake).

Generator (θg) wants to minimize the objective such that D(G(z)) is close to 1 (the discriminator is fooled into thinking the generated G(z) is real).

Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.
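Written out, the minimax objective of this game (from Goodfellow et al.) is:

$$\min_{\theta_g} \max_{\theta_d} \; E_{x \sim p_{\text{data}}}\big[\log D_{\theta_d}(x)\big] + E_{z \sim p(z)}\big[\log\big(1 - D_{\theta_d}(G_{\theta_g}(z))\big)\big]$$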
Training GANs: Two-player game

Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.


Training GANs: Two-player game

Aside: Jointly training two networks is challenging and can be unstable. Choosing objectives with better loss landscapes helps training; this is an active area of research.

Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.
Putting it together: GAN training algorithm

Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.
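A hedged PyTorch sketch of the alternating updates this algorithm performs (one discriminator step, then one generator step, per iteration). It assumes `D` ends in a sigmoid and `G` maps flat noise vectors to images, and it uses the non-saturating generator loss, maximizing log D(G(z)), which the paper recommends in practice:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def gan_step(G, D, opt_g, opt_d, real, z_dim=100):
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # 1) Discriminator step: push D(x) -> 1 on real, D(G(z)) -> 0 on fake
    opt_d.zero_grad()
    fake = G(torch.randn(batch, z_dim)).detach()   # don't backprop into G here
    d_loss = bce(D(real), ones) + bce(D(fake), zeros)
    d_loss.backward(); opt_d.step()

    # 2) Generator step: push D(G(z)) -> 1 (fool the discriminator)
    opt_g.zero_grad()
    fake = G(torch.randn(batch, z_dim))
    g_loss = bce(D(fake), ones)                    # non-saturating loss
    g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```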




GAN: Example Generation
Generative Adversarial Nets: Convolutional Architectures

• Generator is an upsampling network with fractionally-strided convolutions
• Discriminator is a convolutional network
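A hedged sketch of such an upsampling generator in the spirit of DCGAN (Radford et al.); the slides do not give exact layer configurations, so the channel counts and the 64×64 output are illustrative:

```python
import torch
import torch.nn as nn

# Fractionally-strided (transposed) convolutions upsample z into an image.
G = nn.Sequential(
    nn.ConvTranspose2d(100, 512, 4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.BatchNorm2d(512), nn.ReLU(),
    nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),  # 4x4 -> 8x8
    nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 8x8 -> 16x16
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 16x16 -> 32x32
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),     # 32x32 -> 64x64
    nn.Tanh(),
)

z = torch.randn(16, 100, 1, 1)   # noise reshaped to a 1x1 spatial map
imgs = G(z)                      # (16, 3, 64, 64) images in [-1, 1]
```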
DCGAN: Convolutional Architectures
Generative Adversarial Nets: Interpretable Vector Math
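The “vector math” is arithmetic on latent z vectors: average the z's that produced each concept, combine the averages, and decode. A hedged sketch (the random tensors below are placeholders for z vectors actually collected by inspecting generated samples; `G` is the generator sketch above):

```python
import torch

# Average the z vectors that produced each concept (placeholder tensors of
# shape (n, 100, 1, 1); in practice collected from inspected samples):
z_smiling_woman = torch.randn(3, 100, 1, 1).mean(0, keepdim=True)
z_neutral_woman = torch.randn(3, 100, 1, 1).mean(0, keepdim=True)
z_neutral_man = torch.randn(3, 100, 1, 1).mean(0, keepdim=True)

# "smiling woman" - "neutral woman" + "neutral man" ~ "smiling man"
z_new = z_smiling_woman - z_neutral_woman + z_neutral_man
smiling_man = G(z_new)   # decode the arithmetic result with the generator
```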
Conditional GAN

M. Mirza, S. Osindero, Conditional Generative Adversarial Nets, 2014.




Energy-Based GAN (EBGAN)
• EBGAN architecture with an auto-encoder discriminator

• Auto-encoders have the ability to learn an energy manifold without supervision or negative examples
J. Zhao et al., Energy-based Generative Adversarial Network, 2016.
Combining VAE with GAN

Anders Boesen Lindbo Larsen et al., Autoencoding beyond pixels using a learned similarity metric, ICML 2016.
Generative Adversarial Text to Image Synthesis

S. Reed et al., Generative Adversarial Text to Image Synthesis, 2016.


GANs
• Don’t work with an explicit density function
• Take a game-theoretic approach: learn to generate from the training distribution through a 2-player game

• Pros:
– Beautiful, state-of-the-art samples!
• Cons:
– Trickier / more unstable to train
– Can’t solve inference queries such as p(x), p(z|x)
• Active areas of research:
– Better loss functions, more stable training (Wasserstein GAN, LSGAN, many others)
– Conditional GANs, GANs for all kinds of applications
Recap
Resources
• Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014.
• Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.
• Ian Goodfellow, “Generative Adversarial Nets”, NIPS 2016 Tutorial.
