
CMSC498L

Introduction to Deep Learning


Abhinav Shrivastava
Supervised Unsupervised
Learning Learning

Discriminative Generative
Models Models
Supervised Learning
• Data: (x, y); x is datum, y is label
• Goal: Learn P(y|x); classify x_new
• Method: Learn a function to map x → y

Unsupervised Learning
• Data: (x); x is datum
• Goal: Learn P(x, y); classify x_new, generate x_new
• Method: Learn some underlying hidden structure of the data

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Supervised Unsupervised
Learning Learning

Discriminative Generative
Models Models
Discriminative vs. Generative Models

Figure from: https://datawarrior.wordpress.com/2016/05/08/generative-discriminative-pairs/, http://www.evolvingai.org/fooling



Supervised Unsupervised
Learning Learning

Discriminative Generative
Models Models
Discussion
Generative tasks
Learn to generate Images from
• Random Noise

noise → G → Image

Slide inspired by Svetlana Lazebnik (link)


Generative tasks

Slide inspired by Svetlana Lazebnik (link)


Generative tasks
Learn to generate Images from
• Random Noise
• Conditional Generation (e.g., noise and scalar/one-hot class)

noise + class → G → Image

Slide inspired by Svetlana Lazebnik (link), figure from: Self-Attention Generative Adversarial Networks
Generative tasks

Slide inspired by Svetlana Lazebnik (link), figure from: Self-Attention Generative Adversarial Networks

Generative tasks

Slide inspired by Svetlana Lazebnik (link), figure from: Pix2Pix


Generative tasks
Learn to generate Images from
• Random Noise
• Conditional Generation (e.g., noise and scalar/one-hot class)
• Image-to-Image Generation (Conditional without Noise)

Slide inspired by Svetlana Lazebnik (link), figure from: Pix2Pix


Generative tasks

noise + class → G → Image

Diagram build: replace the noise/class input with features f produced from an image by an encoder E (Image → E → f → G), then swap the generator G for a decoder D, giving Image → E → f → D → Image: an Auto-Encoder.
Autoencoder

x (input data) → E → f (latent features) → D → x̂ (reconstructed data)
Autoencoder

E f D

𝑥 𝑥!
Input Data Reconstructed Data

Trained using reconstruction loss


Autoencoder

x → E → f → D → x̂

Trained using reconstruction loss (L2): ‖x − x̂‖²

No labels* required!
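The pipeline above (x → E → f → D → x̂, trained with an L2 reconstruction loss and no labels) can be sketched numerically. This is an illustrative linear autoencoder on made-up data, not code from the course:

```python
import numpy as np

# Illustrative sketch (not code from the course): a linear autoencoder
# x -> E -> f -> D -> x_hat trained with the L2 reconstruction loss on made-up data.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))                 # 256 toy samples, 8 dims; no labels used
W_enc = rng.normal(scale=0.1, size=(8, 3))    # encoder E: 8 -> 3 latent features
W_dec = rng.normal(scale=0.1, size=(3, 8))    # decoder D: 3 -> 8

def recon_loss(X, W_enc, W_dec):
    X_hat = X @ W_enc @ W_dec                 # reconstruction x_hat
    return np.mean((X - X_hat) ** 2)          # ||x - x_hat||^2, averaged

loss_start = recon_loss(X, W_enc, W_dec)
lr = 0.01
for _ in range(200):
    F = X @ W_enc                             # latent features f
    G = 2.0 * (F @ W_dec - X) / X.size        # d(loss)/d(x_hat)
    grad_dec = F.T @ G                        # chain rule through the decoder
    grad_enc = X.T @ (G @ W_dec.T)            # chain rule through the encoder
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
loss_end = recon_loss(X, W_enc, W_dec)        # reconstruction improves with training
```

A real autoencoder would use nonlinear deep networks for E and D, but the training signal is the same: only reconstruction error.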
Autoencoder: Unsupervised or Generative?

E f D
Autoencoder: Unsupervised or Generative?

E f Use features for recognition


Autoencoder: Unsupervised or Generative?

E f Use features for recognition


• Static Features
• Clustering
• Classification
• etc.
• Update Features
• Pre-training + Fine-tuning
• Etc.
Autoencoder: Unsupervised or Generative?

E f D
Autoencoder: Unsupervised or Generative?

f D
Autoencoder: Unsupervised or Generative?

f → D

Generate new data: sample a feature f̂ and decode it (f̂ → D)
Autoencoder: Unsupervised or Generative?

E f D
Supervised Unsupervised
Learning Learning

Discriminative Generative
Models Models
Generative Models for Image Generation

Training data ~ Pdata(x) Generated samples ~ Pmodel(x)

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Generative Models for Image Generation

Training data ~ Pdata(x) Generated samples ~ Pmodel(x)

Want to learn a Pmodel(x) similar to Pdata(x)

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Generative Models for Image Generation

Training data ~ Pdata(x) Generated samples ~ Pmodel(x)

Want to learn a Pmodel(x) similar to Pdata(x)

Has flavors of density estimation – a core problem in Unsupervised Learning


• Explicit density estimation: explicitly define and solve for Pmodel(x)
• Implicit density estimation: learn a model that can sample from Pmodel(x), without
explicitly defining it
Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Generative Models
Direct

Explicit Density Implicit Density

Tractable Density Approximate Density Markov Chain

Variational Markov Chain

Inspired from:
Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017
Generative Models
Direct
• GAN

Explicit Density Implicit Density

Tractable Density Approximate Density Markov Chain


• Fully Visible Belief Nets • Generative Stochastic Networks
• NADE
• MADE
• PixelRNN/CNN Variational Markov Chain
• Variational Auto-encoders • Boltzmann Machine

Inspired from:
Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017
Pixel-RNN/CNN
• Fully visible belief network
• Explicit density model
• Use chain rule to decompose likelihood of image x:

P(x) = ∏_{i=1}^{n} P(x_i | x_1, x_2, …, x_{i-1})

• Then maximize likelihood of training data

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
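The chain-rule factorization can be illustrated with a toy explicit density over tiny binary images. The conditional used here is a made-up hand-written rule, not a PixelRNN:

```python
# Toy illustration of the chain-rule factorization (not the PixelRNN architecture):
# P(x) = prod_i P(x_i | x_1, ..., x_{i-1}) over tiny binary "images" x of n pixels.
# The conditional here is a made-up rule: a pixel tends to copy its predecessor.
def cond_prob(x_i, prev):
    # P(x_i = 1 | x_{<i}): 0.5 for the first pixel, else 0.8 if the previous pixel is 1
    if not prev:
        p1 = 0.5
    else:
        p1 = 0.8 if prev[-1] == 1 else 0.2
    return p1 if x_i == 1 else 1.0 - p1

def joint_prob(x):
    p = 1.0
    for i in range(len(x)):
        p *= cond_prob(x[i], x[:i])           # chain rule, pixel by pixel
    return p

# Because every factor is a proper conditional, summing P(x) over all 2^n
# possible images gives exactly 1: an explicit, normalized density.
n = 4
total = sum(joint_prob([(k >> i) & 1 for i in range(n)]) for k in range(2 ** n))
```

Training a PixelRNN/CNN amounts to replacing the hand-made rule with a network and maximizing this joint likelihood over the training data.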
Pixel-RNN/CNN
• Fully visible belief network
• Explicit density model
• Use chain rule to decompose likelihood of image x:

P_θ(x) = ∏_{i=1}^{n} P_θ(x_i | x_1, x_2, …, x_{i-1})

• Then maximize likelihood of training data

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Pixel-RNN
State-to-State Component

Credit: Logan Lebanoff


Input-to-State Component

Credit: Logan Lebanoff


Combine State Components

Credit: Logan Lebanoff


Row LSTM

Credit: Hsiao-Ching Chang, Ameya Patil, Anand Bhattad


Diagonal LSTM

Credit: Logan Lebanoff


Diagonal LSTM

Credit: Hsiao-Ching Chang, Ameya Patil, Anand Bhattad


Credit: Hsiao-Ching Chang, Ameya Patil, Anand Bhattad
Comparison

Credit: Hsiao-Ching Chang, Ameya Patil, Anand Bhattad


Results

Credit: Hsiao-Ching Chang, Ameya Patil, Anand Bhattad


Multi-scale PixelRNN

Credit: Hsiao-Ching Chang, Ameya Patil, Anand Bhattad


Multi-scale PixelRNN

Credit: Hsiao-Ching Chang, Ameya Patil, Anand Bhattad


Conditional Image Generation

Credit: Hsiao-Ching Chang, Ameya Patil, Anand Bhattad


Results

Credit: Hsiao-Ching Chang, Ameya Patil, Anand Bhattad


Pixel-RNN/CNN – aka Auto-regressive Models
Pros:
• Can explicitly compute P(x)
• Explicit P(x) gives a good evaluation metric

Cons:
• Sequence generation is slow
• Optimizing P(x) is hard (see PixelRNN/CNN/CNN++)

Tricks:
• Gated conv. layers
• Skip connections
• Discretized logits
• Multi-scale
• Clever training
• Etc.

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Generative Models
Direct
• GAN

Explicit Density Implicit Density

Tractable Density Approximate Density Markov Chain


• Fully Visible Belief Nets • Generative Stochastic Networks
• NADE
• MADE
• PixelRNN/CNN Variational Markov Chain
• Variational Auto-encoders • Boltzmann Machine

Inspired from:
Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017
Generative Adversarial Networks

z → G → G(z) (Image)

GAN: Goodfellow et al., NIPS 2014; Slide inspired by Svetlana Lazebnik (link)
Generative Adversarial Networks

z → G → G(z) → D → D(G(z)): Fake

GAN: Goodfellow et al., NIPS 2014; Slide inspired by Svetlana Lazebnik (link)
Generative Adversarial Networks

z → G → G(z) → D → D(G(z)): Fake
x_data → D → D(x_data): Real

GAN: Goodfellow et al., NIPS 2014; Slide inspired by Svetlana Lazebnik (link)
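The diagram above corresponds to two coupled objectives, one per network. Below is a minimal numeric sketch of the two losses; the score arrays are made-up stand-ins for discriminator outputs, and the generator uses the non-saturating loss rather than log(1 − D(G(z))):

```python
import numpy as np

# Minimal sketch of the two GAN objectives (not full training code). D outputs a
# probability that its input is real; G is trained here with the non-saturating
# loss -log D(G(z)), as suggested in Goodfellow et al., 2014.
def d_loss(d_real, d_fake):
    # Discriminator maximizes log D(x_data) + log(1 - D(G(z))) -> minimize the negative
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    # Non-saturating generator loss: minimize -log D(G(z))
    return -np.mean(np.log(d_fake))

d_real = np.array([0.9, 0.8])   # made-up D scores on real data
d_fake = np.array([0.2, 0.1])   # made-up D scores on generated samples
ld, lg = d_loss(d_real, d_fake), g_loss(d_fake)
```

Training alternates gradient steps on the two losses; the instability discussed later comes from this adversarial coupling.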
Conditional Generative Adversarial Networks

z + class → G → G(z) → D → D(G(z)): Fake
x_data → D → D(x_data): Real
Image-to-Image Conditional GANs

input image → E → G → G(z); D(G(z)): fake, D(x_data): real
Examples
pix2pix

Contribution:
• Add L1 loss to the loss function

• U-net generator

• PatchGAN discriminator

Isola et al., Image-to-Image Translation with Conditional Adversarial Nets, CVPR 2017
pix2pix – L1 Loss
• Compared to L2, L1 loss results in less blurring*
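The combined pix2pix generator objective (conditional GAN term plus a weighted L1 term) can be sketched as follows; the arrays are made-up stand-ins for discriminator scores and images, with the L1 weight set to 100 as in the paper's experiments:

```python
import numpy as np

# Sketch of the pix2pix generator objective: conditional GAN term plus lambda * L1
# between the generated image and the ground-truth target.
def pix2pix_g_loss(d_fake, generated, target, lambda_l1=100.0):
    gan_term = -np.mean(np.log(d_fake))            # fool the discriminator
    l1_term = np.mean(np.abs(generated - target))  # L1: less blurring than L2
    return gan_term + lambda_l1 * l1_term

target = np.zeros((4, 4))
close = np.full((4, 4), 0.01)   # output near the target
far = np.full((4, 4), 0.5)      # output far from the target
d_fake = np.array([0.5])        # made-up discriminator score on the generated image
```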
pix2pix – U-Net Generator
pix2pix – PatchGAN Discriminator
• A traditional discriminator outputs one value representing the real/fake probability of the whole image.
• PatchGAN outputs N×N values, one real/fake probability per patch of the image.
• Finally, the mean of the N×N probabilities is used.
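The aggregation step can be sketched directly; `patch_logits` below is a made-up example of a discriminator's N×N output map:

```python
import numpy as np

# Sketch of the PatchGAN idea: the discriminator emits an N x N grid of real/fake
# probabilities (one per image patch) instead of a single scalar, and the final
# score is the mean over the grid.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def patchgan_score(patch_logits):
    probs = sigmoid(patch_logits)   # per-patch real/fake probability
    return probs.mean()             # aggregate by averaging the N*N values

patch_logits = np.array([[2.0, 2.0],
                         [-2.0, 2.0]])  # 3 patches look real, 1 looks fake
score = patchgan_score(patch_logits)
```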
pix2pix Results
CycleGAN
pix2pix – Paired image-to-image translation model
CycleGAN – Unpaired image-to-image translation model

Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017
CycleGAN
CycleGAN
• Adversarial Loss: L_GAN(G, D_Y, X, Y) = E_y[log D_Y(y)] + E_x[log(1 − D_Y(G(x)))], and likewise L_GAN(F, D_X, Y, X) for the reverse direction
• Cycle Loss: L_cyc(G, F) = E_x[‖F(G(x)) − x‖₁] + E_y[‖G(F(y)) − y‖₁]
• CycleGAN Loss: L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + λ L_cyc(G, F)
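The cycle-consistency idea can be sketched with stand-in mappings. G translates X → Y and F translates Y → X; the loss asks F(G(x)) to come back to x and G(F(y)) back to y. The toy G and F below are hypothetical invertible functions, not learned networks:

```python
import numpy as np

# Sketch of the cycle-consistency loss with made-up, toy translators.
G = lambda x: x + 1.0      # hypothetical X -> Y translator
F = lambda y: y - 1.0      # hypothetical Y -> X translator

def cycle_loss(x_batch, y_batch):
    forward = np.mean(np.abs(F(G(x_batch)) - x_batch))   # ||F(G(x)) - x||_1
    backward = np.mean(np.abs(G(F(y_batch)) - y_batch))  # ||G(F(y)) - y||_1
    return forward + backward

x_batch = np.linspace(0, 1, 5)
y_batch = np.linspace(1, 2, 5)
loss = cycle_loss(x_batch, y_batch)   # near 0: F perfectly inverts G here
```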
CycleGAN Results
CycleGAN Results
CycleGAN → StarGAN
CycleGAN – transfer between 2 domains
StarGAN – transfer between multiple domains

Choi et al., StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
Generative Adversarial Networks

z → G → G(z) → D → D(G(z)): Fake
x_data → D → D(x_data): Real

GAN: Goodfellow et al., NIPS 2014; Slide inspired by Svetlana Lazebnik (link)
GAN Training Problems
• Instability
• Difficult to keep G and D in sync
• Mode Collapse

Figure from: Goodfellow’s Tutorial on GANs; Metz et al. & Hung-yi Lee & Arjovsky et al.
Heuristic Solutions
• GAN Hacks (https://github.com/soumith/ganhacks)

• GAN Tutorial
• Improved Techniques for Training GANs

Also see: https://medium.com/@jonathan_hui/gan-why-it-is-so-hard-to-train-generative-advisory-networks-819a86b3750b


Numerous variants
• DCGAN: Deep Convolutional GAN
• WGAN: Wasserstein GAN
• Improved WGAN
• IWGAN: Importance Weighted GAN
• LSGAN: Least-squares GAN
• EBGAN: Energy-based GAN
• BEGAN: Boundary Equilibrium GAN
• InfoGAN: Information Maximizing GAN
• BiGAN: Bidirectional GAN
• Etc.
• Etc.
• Etc.
DCGAN: Deep Convolutional GAN
• Make GAN stable
• Combine CNN and GAN

Figure credit: Radford et al.


DCGAN: Deep Convolutional GAN

Figure credit: Radford et al.


Experiments
• Generating more complicated scenes

Figure credit: Radford et al.


Experiments
• Vector arithmetic on face samples

Figure credit: Radford et al.


Numerous variants
• DCGAN: Deep Convolutional GAN
• WGAN: Wasserstein GAN
• Improved WGAN
• IWGAN: Importance Weighted GAN
• LSGAN: Least-squares GAN
• EBGAN: Energy-based GAN
• BEGAN: Boundary Equilibrium GAN
• InfoGAN: Information Maximizing GAN
• BiGAN: Bidirectional GAN
• Etc.
• Etc.
• Etc.
GAN Zoo

https://github.com/hindupuravinash/the-gan-zoo
E.g., LSGANs

Mao et al., ICCV 2017


High-fidelity Samples (256x256)

ProgressiveGAN, ICLR 2018


High-fidelity Samples (1024x1024)

ProgressiveGAN, ICLR 2018


High-fidelity Samples (512x512)

BigGAN, 2018
High-fidelity Samples (interpolation)

BigGAN, 2018
How to evaluate GANs?
• Showing pictures of samples is not enough, especially for
simpler datasets like MNIST, CIFAR, faces, bedrooms, etc.
• We cannot directly compute the likelihoods of high-dimensional samples (real or generated), or compare their distributions
• Many GAN approaches claim mainly to improve stability,
which is hard to evaluate
• For discussion, see Ian Goodfellow’s Twitter thread

Slide inspired by Svetlana Lazebnik (link)


Evaluating GANs

Slide inspired by Svetlana Lazebnik (link)


Evaluating GANs
• Turing Test

Slide inspired by Svetlana Lazebnik (link)


T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
Evaluating GANs
• Turing Test
• Inception Score

Slide inspired by Svetlana Lazebnik (link)


T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, NIPS 2016
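The Inception Score can be sketched from a matrix of class posteriors p(y|x) (normally produced by an Inception network; here made-up rows): IS = exp(E_x[KL(p(y|x) ∥ p(y))]), where p(y) is the marginal over samples. Sharp, diverse conditionals give a high score; uniform conditionals give exactly 1:

```python
import numpy as np

# Sketch of the Inception Score from a made-up p(y|x) matrix (one row per sample).
def inception_score(p_yx):
    p_y = p_yx.mean(axis=0, keepdims=True)                   # marginal class distribution
    kl = np.sum(p_yx * (np.log(p_yx) - np.log(p_y)), axis=1) # KL(p(y|x) || p(y)) per sample
    return float(np.exp(kl.mean()))

confident = np.array([[0.98, 0.01, 0.01],    # each sample confidently one class,
                      [0.01, 0.98, 0.01],    # classes evenly covered
                      [0.01, 0.01, 0.98]])
uniform = np.full((3, 3), 1.0 / 3.0)         # model cannot tell classes apart
```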
Evaluating GANs
• Turing Test
• Inception Score
• Fréchet Inception Distance

Slide inspired by Svetlana Lazebnik (link)


M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter,
GANs trained by a two time-scale update rule converge to a local Nash equilibrium, NIPS 2017
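The Fréchet Inception Distance fits a Gaussian to each feature set (normally Inception activations; here random made-up features) and computes FID = ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁^{1/2} Σ₂ Σ₁^{1/2})^{1/2}). A minimal numpy sketch:

```python
import numpy as np

# Sketch of FID between two Gaussians fitted to made-up feature sets.
def _sqrtm_psd(mat):
    # Matrix square root of a symmetric PSD matrix via eigendecomposition
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def fid(feats1, feats2):
    mu1, mu2 = feats1.mean(axis=0), feats2.mean(axis=0)
    s1 = np.cov(feats1, rowvar=False)
    s2 = np.cov(feats2, rowvar=False)
    s1_half = _sqrtm_psd(s1)
    covmean = _sqrtm_psd(s1_half @ s2 @ s1_half)   # symmetric form of (s1 s2)^{1/2}
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2.0 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))   # stand-in for real-image features
shifted = real + 3.0               # same covariance, shifted mean -> large distance
```

Unlike the Inception Score, FID compares generated samples against real data, which is why it has become the more common metric.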
Are GANs created equal?

Slide inspired by Svetlana Lazebnik (link)


M. Lucic, K. Kurach, M. Michalski, O. Bousquet, S. Gelly, Are GANs created equal? A large-scale study, NIPS 2018
Are GANs created equal?
Abstract:

“We find that most models can reach similar scores with enough hyperparameter optimization and random restarts. This suggests that improvements can arise from a higher computational budget and tuning more than fundamental algorithmic changes … We did not find evidence that any of the tested algorithms consistently outperforms the non-saturating GAN introduced in Goodfellow et al. (2014)”
Slide inspired by Svetlana Lazebnik (link)
M. Lucic, K. Kurach, M. Michalski, O. Bousquet, S. Gelly, Are GANs created equal? A large-scale study, NIPS 2018
Generative Models
Direct
• GAN

Explicit Density Implicit Density

Tractable Density Approximate Density Markov Chain


• Fully Visible Belief Nets • Generative Stochastic Networks
• NADE
• MADE
• PixelRNN/CNN Variational Markov Chain
• Variational Auto-encoders • Boltzmann Machine

Inspired from:
Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017
Variational Auto-encoders (VAEs)
• Probabilistic twist to standard auto-encoders

x → E → z → D → x̂

Variational Auto-encoders (VAEs)
• Probabilistic twist to standard auto-encoders
• Recall: how can we generate new data using Auto-encoders?

x → E → z → D → x̂   (generate by sampling ẑ → D → x̂)
Generating Samples using VAEs

z → D → x, where z ~ P(z) and x ~ P(x|z)

Prior P(z):
• Choose something simple, e.g., Gaussian

Conditional P(x|z):
• Complex, generates image
• Parameterized by the decoder network D
• P_θ(x|z), or P(x|z; θ)
Training a VAE Generator

z → D → x, where z ~ P(z) and x ~ P(x|z)

Maximize likelihood of training data:
P(x) = ∫ P(x|z) P(z) dz
Training a VAE Generator

Data likelihood: P(x) = ∫ P(x|z) P(z) dz
• P(z): simple prior (gaussian)
• P(x|z): Decoder Network
• ∫ dz: intractable to compute for every z. Why is that?

Approximate with samples of z during training: P(x) ≈ (1/n) ∑_{i=1}^{n} P(x|z_i)
• Need a lot of samples of z
• Most of the P(x|z) ≈ 0
• Can we learn which z will generate P(x|z) ≫ 0?

Posterior density is also intractable: P(z|x) = P(x|z) P(z) / P(x)

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung (link), and Svetlana Lazebnik (link)
Variational Auto-encoder (VAE)
We want, but impractical:
P(x) = E_{z~P(z)}[P(x|z)] ≈ (1/n) ∑_{i=1}^{n} P(x|z_i)

Assume we can learn a distribution Q(z), such that z ~ Q(z) generates P(x|z) ≫ 0.

Then, we can compute E_{z~Q(z)}[P(x|z)]

Questions:
• How can we learn such a Q(z)?
• How are P(x) (i.e., E_{z~P(z)}) and E_{z~Q(z)} related?

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung (link), and Svetlana Lazebnik (link)
Variational Auto-encoder (VAE)

x → E_φ → Q_φ(z|x)
z → D_θ → P_θ(x|z)

If Q_φ and P_θ are diagonal gaussian distributions:
• x → E_φ → (μ_{z|x}, Σ_{z|x}): mean and (diagonal) covariance of Q_φ(z|x); sample z from z|x ~ N(μ_{z|x}, Σ_{z|x})
• z → D_θ → (μ_{x|z}, Σ_{x|z}): mean and (diagonal) covariance of P_θ(x|z); sample x from x|z ~ N(μ_{x|z}, Σ_{x|z})

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung (link), and Svetlana Lazebnik (link)
VAE: Relating P(x) and E_{z~Q(z|x)}

P(x) = E_{z~P(z)}[P(x|z)] ≈ (1/n) ∑_{i=1}^{n} P(x|z_i)

How are P(x) (i.e., E_{z~P(z)}) and E_{z~Q(z)} related?

Definition of KL-divergence:
D_KL(Q(z|x) ∥ P(z|x))
= E_{z~Q(z|x)}[log Q(z|x) − log P(z|x)]
= E_{z~Q(z|x)}[log Q(z|x) − log (P(x|z) P(z) / P(x))]   (Bayes rule)
= E_{z~Q(z|x)}[log Q(z|x) − log P(x|z) − log P(z)] + log P(x)   (re-arrange; P(x) independent of z)
= E_{z~Q(z|x)}[log Q(z|x) − log P(z)] − E_{z~Q(z|x)}[log P(x|z)] + log P(x)   (re-arrange)
= D_KL(Q(z|x) ∥ P(z)) − E_{z~Q(z|x)}[log P(x|z)] + log P(x)   (KL-divergence definition)

Re-arranging:
log P(x) − D_KL(Q(z|x) ∥ P(z|x)) = E_{z~Q(z|x)}[log P(x|z)] − D_KL(Q(z|x) ∥ P(z))

• log P(x): data likelihood, which we want to maximize, but is not tractable.
• E_{z~Q(z|x)}[log P(x|z)]: decoder output is P(x|z); can compute an estimate of this term by sampling (coming soon).
• D_KL(Q(z|x) ∥ P(z)): recall that both distributions are simple (e.g., gaussian), therefore this KL-term has a nice closed form.
• D_KL(Q(z|x) ∥ P(z|x)): can we compute this? As seen earlier, P(z|x) is intractable. However, a KL-term is always ≥ 0, therefore:

log P(x) ≥ E_{z~Q(z|x)}[log P(x|z)] − D_KL(Q(z|x) ∥ P(z))

Tractable lower bound which we can take the gradient of and optimize!
(known as the Variational Lower Bound, or ELBO)
VAE: Putting everything together

Objective: E_{z~Q(z|x)}[log P(x|z)] − D_KL(Q(z|x) ∥ P(z))

x → E → (μ_{z|x}, Σ_{z|x}); KL-term: D_KL(N(μ_{z|x}, Σ_{z|x}) ∥ N(0, I))
Sample z from z|x ~ N(μ_{z|x}, Σ_{z|x})
z → D → (μ_{x|z}, Σ_{x|z}); sample x̂ from x|z ~ N(μ_{x|z}, Σ_{x|z})
Reconstruction term: ‖x − x̂‖²
Modeling P(x|z)
Let f(z) be the network output.
• Assume P(x|z) to be i.i.d. Gaussian
• x̂ = f(z) + η, where η ~ N(0, 1) (re: Linear Regression)
• Simplifies to an L2 loss: ‖x − f(z)‖²

Also, approximate E_{z~Q(z|x)}[log P(x|z)] with ‖x − f(z)‖² for a single z.
Why is this reasonable?

Inspired from: Fei-Fei Li & Justin Johnson & Serena Yeung (link), and Svetlana Lazebnik (link).
Details of second approximation in Tutorial on VAE.
VAE: Putting everything together

E_{z~Q(z|x)}[log P(x|z)] − D_KL(Q(z|x) ∥ P(z))

x → E → (μ_{z|x}, Σ_{z|x}); KL-term: D_KL(N(μ_{z|x}, Σ_{z|x}) ∥ N(0, I))
Sample z from z|x ~ N(μ_{z|x}, Σ_{z|x})
z → D → f(z); reconstruction term: ‖x − f(z)‖²
Re-parameterization Trick
z ~ N(μ, σ²) is equivalent to:
• z = μ + (σ ⋅ ε), where ε ~ N(0, 1)
• Now we can easily backpropagate the loss to the Encoder.

Figure from Kingma’s workshop slides
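The trick can be sketched in a few lines; the values of μ and σ below are made up:

```python
import numpy as np

# Sketch of the re-parameterization trick: instead of sampling z ~ N(mu, sigma^2)
# directly (not differentiable w.r.t. mu and sigma), draw eps ~ N(0, 1) and set
# z = mu + sigma * eps. The randomness is now an input, so gradients flow to mu, sigma.
rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.5
eps = rng.normal(size=100_000)   # noise drawn outside the computation graph
z = mu + sigma * eps             # deterministic, differentiable in mu and sigma

# dz/dmu = 1 and dz/dsigma = eps for every sample: simple closed-form gradients,
# which is exactly what lets the loss backpropagate through the sampling step.
```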


VAE: Putting everything together

E_{z~Q(z|x)}[log P(x|z)] − D_KL(Q(z|x) ∥ P(z))

x → E → (μ_{z|x}, Σ_{z|x}); KL-term: D_KL(N(μ_{z|x}, Σ_{z|x}) ∥ N(0, I))
z = μ_{z|x} + (Σ_{z|x}^{1/2} ⋅ ε), where ε sampled from N(0, 1)
z → D → f(z); reconstruction term: ‖x − f(z)‖²
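Putting the two terms together numerically: for a diagonal Gaussian Q(z|x) = N(μ, diag(σ²)) and prior N(0, I), the KL-term has the closed form 0.5 ∑(μ² + σ² − 1 − log σ²). A minimal sketch with made-up values:

```python
import numpy as np

# Sketch of the full VAE objective for diagonal Gaussians: L2 reconstruction plus
# the closed-form KL between Q(z|x) = N(mu, diag(var)) and the prior N(0, I).
def vae_loss(x, x_recon, mu, log_var):
    recon = np.sum((x - x_recon) ** 2)                            # ||x - f(z)||^2
    kl = 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var)  # KL(Q(z|x) || N(0,I))
    return recon + kl

x = np.array([0.5, -0.5])
perfect = vae_loss(x, x, np.zeros(3), np.zeros(3))  # Q(z|x) = N(0, I): KL-term is 0
```

Parameterizing the encoder output as log σ² (rather than σ²) keeps the variance positive without constraints, which is the usual implementation choice.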
From: A Tutorial on VAE

Figure from Doersch, Tutorial on VAE.


Sampling from VAE

Figure from Doersch, Tutorial on VAE.


Conditional VAE

Figure from Doersch, Tutorial on VAE.


Generated Samples

Samples from: Fei-Fei Li & Justin Johnson & Serena Yeung (link)
Application: Expression Magnification and Suppression

Yeh et al., Semantic Facial Expression Editing using Autoencoded Flow


Application: Expression Magnification and Suppression

Yeh et al., Semantic Facial Expression Editing using Autoencoded Flow


Application: Expression Magnification and Suppression

Yeh et al., Semantic Facial Expression Editing using Autoencoded Flow


VAEs
Pros
• Principled approach to generative models
• Allows inference of q(z|x) – which can be used as feature representation
Cons
• Maximizes ELBO (remember tightness?)
• Samples are blurrier
• Why?
Active areas of research:
• More flexible approximations, e.g. richer approximate posterior instead of
diagonal Gaussian
• Incorporating structure in latent variables

Slide from: Fei-Fei Li & Justin Johnson & Serena Yeung (link)
VAE-GAN
Encoder → z → Decoder/Generator → D (Discriminator) → real/fake

Slide inspired by Svetlana Lazebnik (link). Bottom diagram from Larsen et al.
Application: Fader Networks

Lample et al., Fader Networks: Manipulating Images by Sliding Attributes


VAE: many schools of thought
Not blurry, but noisy*: sample vs. mean/expected value,
i.e., x|z ~ N(μ_{x|z}, Σ_{x|z}) vs. x|z = μ_{x|z}

L2 Loss**

*see reddit discussion
**see blog
KL Divergence

From:
Ian Goodfellow,
Tutorial on GANs, 2017
KL Divergence

From: Pedro’s blog


Supervised Unsupervised
Learning Learning

Discriminative Generative
Models Models
Which generative models haven’t we covered?
Videos
• Future generation
• Future prediction
• Future action prediction
• Future pose prediction
• Future pose generation
• Etc.
• Etc.
Generative Models
Direct
• GAN

Explicit Density Implicit Density

Tractable Density Approximate Density Markov Chain


• Fully Visible Belief Nets • Generative Stochastic Networks
• NADE
• MADE
• PixelRNN/CNN Variational Markov Chain
• Variational Auto-encoders • Boltzmann Machine

Inspired from:
Fei-Fei Li & Justin Johnson & Serena Yeung. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017
Text Generation Models

source
Text Generation Models

source
Text Generation Models

source
Text Generation Models

source
Text Generation Models
3 solutions:
• Gumbel-Softmax (continuous approximation of Softmax)
• Work with continuous spaces
• Reinforcement learning (e.g., REINFORCE)

source
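The first of the three solutions can be sketched concretely. Gumbel-Softmax replaces the non-differentiable "sample a one-hot token" step with a continuous softmax over logits perturbed by Gumbel noise; the logits below are made up:

```python
import numpy as np

# Sketch of Gumbel-Softmax: a continuous approximation to sampling a one-hot token
# from a categorical distribution. Low temperature tau -> near one-hot samples;
# higher tau -> smoother points on the simplex, through which gradients can flow.
def gumbel_softmax(logits, tau, rng):
    u = rng.uniform(1e-10, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))          # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = np.exp(y - y.max())          # numerically stable softmax
    return y / y.sum()

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, 0.1])   # made-up token scores
soft = gumbel_softmax(logits, tau=1.0, rng=rng)
hardish = gumbel_softmax(logits, tau=0.05, rng=rng)
```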
Text Generation Models – Examples

source
Text Generation Models – Examples

source
Text Generation Models – Examples

source
Text Generation Models – Examples

MaskGAN: Better Text Generation via Filling in the ______


Text Generation Models – Examples

source
Text Generation Models – Examples

source
Text Generation Models – Examples

source
Text to Image – Examples

source
Text to Image – Examples

source
Text to Image – Examples

source
Text to Image – Examples

source
Text to Video – Examples

source
Text Generation Models – Good References
• GAN for text generation – Part I
• GAN for text generation – Part II: RL (part III & IV coming shortly)
• OpenAI GPT-2 model
• Generating Natural-language text with Neural Networks
• Generative Model for text: An overview of recent advancements
GAN and VAE references
• Why it is so hard to train GANs!
• VAE explanations (link, link, link, link, link, link, tutorial)
Supervised Unsupervised
Learning Learning

Discriminative Generative
Models Models
Generative and Unsupervised Gap

cat dog
Self-supervised Learning – teaser
Proxy Tasks for “Self”- supervised learning

Doersch et al., Unsupervised Visual Representation Learning by Context Prediction


Proxy Tasks for “Self”- supervised learning

Pathak et al., Context Encoders: Feature Learning by Inpainting


Proxy Tasks for “Self”- supervised learning

Wang et al., Unsupervised Learning of Visual Representations using Videos


Evaluate on Recognition

Table from: Pathak et al., Context Encoders: Feature Learning by Inpainting


Evaluate on Recognition

Table from: Donahue et al., Adversarial Feature Learning


Back to image generative models
• GANs
• VAEs
•?
Back to image generative models
• GANs
• VAEs
• Flow-based Methods
Back to image generative models

source
Back to image generative models

source
Conclusion

source
