Dlvu Lecture10
GENERATIVE MODELS

Model class (example)              Training    Likelihood    Sampling    Compression
Autoregressive (e.g., PixelCNN)    Stable      Exact         Slow        No
Flow-based (e.g., RealNVP)         Stable      Exact         Fast/Slow   No
Implicit (e.g., GANs)              Unstable    No            Fast        No
Prescribed (e.g., VAEs)            Stable      Approximate   Fast        Yes
ARMS: AUTOREGRESSIVE MODELS
REPRESENTING A JOINT DISTRIBUTION
p(x) = ∫ p(x, z) dz = ∫ p(x | z) p(z) dz
Now, we will use the product rule to express the distribution of x ∈ ℝD:
p(x) = p(x1) ∏_{d=2}^{D} p(xd | x<d)

where x<d = [x1, x2, …, xd−1]⊤.
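The chain-rule factorization above can be made concrete with a toy model. The logistic conditional and the parameters w, b below are made-up stand-ins for what would in practice be a neural network:

```python
import numpy as np

def log_sigmoid(a):
    # numerically stable log(sigmoid(a))
    return -np.logaddexp(0.0, -a)

def autoregressive_log_prob(x, w, b):
    """log p(x) = log p(x1) + sum_d log p(xd | x<d) for a binary vector x.

    Toy conditional: p(xd = 1 | x<d) = sigmoid(w . x<d + b),
    where w, b are made-up parameters (a learned network in practice).
    """
    log_p = 0.0
    for d in range(len(x)):
        a = np.dot(w[:d], x[:d]) + b  # logits computed from the history x<d
        log_p += log_sigmoid(a) if x[d] == 1 else log_sigmoid(-a)
    return log_p

x = np.array([1, 0, 1, 1])
w = np.array([0.5, -0.3, 0.8, 0.1])
b = 0.0
print(autoregressive_log_prob(x, w, b))
```

Because each factor is a proper conditional distribution, the probabilities of all 2^D configurations sum to one: the chain rule gives a normalized joint for free.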
p(x) = p(x1) p(x2 | x1) ∏_{d=3}^{D} p(xd | xd−2, xd−1)

Now, we can model p(xd | xd−2, xd−1) by a single model.
Oord, Aaron van den, et al. "Wavenet: A generative model for raw audio." arXiv preprint arXiv:1609.03499 (2016).
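WaveNet realizes the shared conditional model with causal convolutions: the output at step t may only depend on inputs up to step t. A minimal numpy sketch of a causal 1D convolution, implemented by left-padding with zeros:

```python
import numpy as np

def causal_conv1d(x, kernel):
    """1D causal convolution: output[t] depends only on x[t-k+1 .. t].

    Left-padding with k-1 zeros guarantees position t never sees future
    inputs -- exactly the property an autoregressive model needs.
    """
    k = len(kernel)
    x_pad = np.concatenate([np.zeros(k - 1), x])
    return np.array([np.dot(kernel, x_pad[t:t + k]) for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
y = causal_conv1d(x, np.array([0.5, 0.5]))  # average of current and previous sample
print(y)  # [0.5 1.5 2.5 3.5]
```

A quick sanity check of causality: perturbing a future input must leave all earlier outputs unchanged.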
PIXELCNN

We can utilize the very same idea for images, but using 2D convolutions. We need to remember about causal convolutions!

Moreover, we should think of 2 or even 3 dimensions (C×H×W). We can accomplish that by composing two Conv2D layers:
- The first Conv2D layer covers the upper part, using masking for kernel weights.
- The second Conv2D layer covers the left part.

Van den Oord, Aaron, et al. "Conditional image generation with PixelCNN decoders." NIPS 2016.
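The kernel masking mentioned above can be sketched in numpy. The mask zeros out weights on "future" pixels; the A/B naming follows the PixelCNN papers (type A, used in the first layer, also masks the center pixel; type B keeps it):

```python
import numpy as np

def pixelcnn_mask(k, mask_type="A"):
    """Binary mask for a k x k PixelCNN convolution kernel.

    Allowed positions: all rows above the center, plus pixels to the left
    of the center in the center row. Type 'A' masks the center pixel
    (first layer); type 'B' keeps it (deeper layers).
    """
    mask = np.zeros((k, k))
    c = k // 2
    mask[:c, :] = 1.0   # rows above the center row: fully visible
    mask[c, :c] = 1.0   # left part of the center row
    if mask_type == "B":
        mask[c, c] = 1.0
    return mask

print(pixelcnn_mask(3, "A"))
# [[1. 1. 1.]
#  [1. 0. 0.]
#  [0. 0. 0.]]
```

Multiplying the kernel weights elementwise by this mask before every forward pass keeps the convolution causal with respect to the raster-scan pixel ordering.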
PIXELCNN

The pixel distribution is modeled with a discretized logistic mixture:

P(x ∣ π, μ, s) = ∑_{i=1}^{K} πi [σ((x + 0.5 − μi)/si) − σ((x − 0.5 − μi)/si)]

where σ is the sigmoid function, and π, μ, s are learnable parameters.

Salimans, Tim, et al. "PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications." arXiv (2017).
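The formula can be evaluated directly: each mixture component assigns pixel value x the logistic probability mass of the bin [x − 0.5, x + 0.5]. A minimal numpy sketch, with made-up parameter values:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def discretized_logistic_mixture(x, pi, mu, s):
    """P(x | pi, mu, s) = sum_i pi_i [sigma((x+0.5-mu_i)/s_i) - sigma((x-0.5-mu_i)/s_i)].

    x is an integer pixel value; the sigmoid differences are the CDF mass
    each logistic component puts on the bin [x-0.5, x+0.5].
    """
    cdf_plus = sigmoid((x + 0.5 - mu) / s)
    cdf_minus = sigmoid((x - 0.5 - mu) / s)
    return np.sum(pi * (cdf_plus - cdf_minus))

pi = np.array([0.7, 0.3])     # mixture weights (sum to 1)
mu = np.array([100.0, 200.0])  # component means
s = np.array([10.0, 5.0])      # component scales

# The bins partition the real line, so the masses telescope to (almost) 1:
total = sum(discretized_logistic_mixture(x, pi, mu, s) for x in range(-500, 800))
print(total)
```

Because the sum over adjacent bins telescopes, the probabilities over all integer values add up to one, so this is a proper discrete distribution over pixel intensities.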
PIXELCNN

[Figure: sampled vs. real images]
AUTOREGRESSIVE MODELS

Advantages:
✓ Exact likelihood.
✓ Stable training.

Disadvantages:
- Very slow sampling.
- No compression.
- Sometimes, low visual quality.
FLOW-BASED MODELS
CHANGE OF VARIABLES
For example, take z ~ 𝒩(0, 1) and u = 0.75 z + 1. Then:

p(u) = p_z((u − 1)/0.75) · (4/3) = (1/√(2π · 0.75²)) exp{−(u − 1)²/(2 · 0.75²)}

i.e., u ~ 𝒩(1, 0.75²).
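This 1D change of variables can be checked numerically. A small sketch, assuming the linear map u = 0.75 z + 1 from the example above, comparing the change-of-variables density against the Gaussian density it should equal:

```python
import numpy as np

def standard_normal_pdf(z):
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

# Change of variables: z ~ N(0, 1), u = f(z) = 0.75 * z + 1, so
# p(u) = p_z(f^{-1}(u)) * |d f^{-1}(u) / du| = p_z((u - 1) / 0.75) * (1 / 0.75).
def p_u(u):
    return standard_normal_pdf((u - 1.0) / 0.75) / 0.75

# The result should be exactly the N(1, 0.75^2) density:
def normal_pdf(u, mean, std):
    return np.exp(-0.5 * ((u - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

u = np.linspace(-2.0, 4.0, 7)
print(np.max(np.abs(p_u(u) - normal_pdf(u, 1.0, 0.75))))  # ~0
```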
CHANGE OF VARIABLES

Multidimensional case:

p(u) = p(f⁻¹(u)) |∂f⁻¹(u)/∂u|

where:

|∂f⁻¹(u)/∂u| = |det J_{f⁻¹}(u)|

and J_{f⁻¹} is the Jacobian matrix:

J_{f⁻¹} = ⎡ ∂f1⁻¹/∂u1 … ∂f1⁻¹/∂uD ⎤
          ⎢     ⋮      ⋱     ⋮     ⎥
          ⎣ ∂fD⁻¹/∂u1 ⋯ ∂fD⁻¹/∂uD ⎦

How can we utilize this idea?
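A minimal numeric check of the multidimensional formula, using a linear bijection u = A z (with a made-up matrix A) so the analytic answer is known: for z ~ 𝒩(0, I), the transformed density must equal that of 𝒩(0, AAᵀ).

```python
import numpy as np

def standard_normal_pdf(z):
    """Density of N(0, I) in len(z) dimensions."""
    return np.exp(-0.5 * np.sum(z ** 2)) / (2.0 * np.pi) ** (len(z) / 2)

# Linear bijection u = f(z) = A z, so f^{-1}(u) = A^{-1} u and
# p(u) = p_z(A^{-1} u) * |det A^{-1}|  (the Jacobian of f^{-1} is A^{-1}).
A = np.array([[2.0, 0.5],
              [0.0, 1.0]])
A_inv = np.linalg.inv(A)

def p_u(u):
    return standard_normal_pdf(A_inv @ u) * abs(np.linalg.det(A_inv))

# Analytic density of N(0, A A^T) at the same point:
u = np.array([1.0, -0.5])
cov = A @ A.T
analytic = np.exp(-0.5 * u @ np.linalg.inv(cov) @ u) / (
    2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
print(abs(p_u(u) - analytic))  # ~0
```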
Composing invertible mappings f1, …, fK from the “latent” space (with base density π) to the pixel space, z0 → z1 → … → zK = x, results in:

p(x) = π(z0) ∏_{i=1}^{K} |det(∂fi(zi−1)/∂zi−1)|⁻¹

Rippel, O., & Adams, R. P. (2013). High-dimensional probability estimation with deep density models. arXiv
2D EXAMPLE

[Figure: a 2D density transformed forward by {fk} and backward by {fk⁻¹}]
DENSITY MODELING WITH INVERTIBLE NEURAL NETWORKS

The density model: p(x) = π(z0) ∏_{i=1}^{K} |det(∂fi(zi−1)/∂zi−1)|⁻¹
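Evaluating log p(x) under this density model amounts to inverting the flow layer by layer while accumulating log-determinants. A minimal sketch with toy elementwise-affine layers zi = ai ⊙ zi−1 + bi standing in for learned invertible networks:

```python
import numpy as np

def log_standard_normal(z):
    """log pi(z) for the base distribution N(0, I)."""
    return -0.5 * np.sum(z ** 2) - 0.5 * len(z) * np.log(2.0 * np.pi)

def flow_log_prob(x, layers):
    """log p(x) = log pi(z0) + sum_i log |det J_{f_i}|^{-1}.

    layers: list of (a, b) arrays defining z_i = a * z_{i-1} + b;
    made-up stand-ins for learned invertible layers.
    """
    z = x
    log_det = 0.0
    for a, b in reversed(layers):               # invert the flow: x -> z0
        z = (z - b) / a                         # f_i^{-1}
        log_det -= np.sum(np.log(np.abs(a)))    # log |det J_{f_i}|^{-1}
    return log_standard_normal(z) + log_det

layers = [(np.array([2.0, 0.5]), np.array([1.0, -1.0])),
          (np.array([1.5, 3.0]), np.array([0.0, 2.0]))]
x = np.array([0.3, 4.0])
print(flow_log_prob(x, layers))
```

Since a composition of elementwise affine maps is again affine, the result can be verified against the closed-form Gaussian the flow induces.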
REALNVP

y1:d = x1:d
yd+1:D = xd+1:D ⊙ exp(s(x1:d)) + t(x1:d)

Known as the affine coupling layer. It is invertible in closed form:

x1:d = y1:d
xd+1:D = (yd+1:D − t(y1:d)) ⊙ exp(−s(y1:d))

The Jacobian determinant is easy to calculate:

det(J) = ∏_{j=1}^{D−d} exp(s(x1:d)_j) = exp(∑_{j=1}^{D−d} s(x1:d)_j)

Dinh, L., Sohl-Dickstein, J., & Bengio, S. (2016). Density estimation using RealNVP. arXiv preprint arXiv:1605.08803.
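A minimal numpy sketch of the affine coupling layer and its inverse; s_fn and t_fn below are made-up stand-ins for the learned networks s(·) and t(·):

```python
import numpy as np

def coupling_forward(x, d, s_fn, t_fn):
    """RealNVP affine coupling: identity on x[:d], affine transform on x[d:]."""
    y = x.copy()
    y[d:] = x[d:] * np.exp(s_fn(x[:d])) + t_fn(x[:d])
    log_det = np.sum(s_fn(x[:d]))   # log det(J) = sum_j s(x_{1:d})_j
    return y, log_det

def coupling_inverse(y, d, s_fn, t_fn):
    """Exact inverse: the first d dims pass through, so s and t are recomputable."""
    x = y.copy()
    x[d:] = (y[d:] - t_fn(y[:d])) * np.exp(-s_fn(y[:d]))
    return x

# Hypothetical stand-ins for the learned networks:
s_fn = lambda h: np.tanh(np.sum(h)) * np.ones(2)
t_fn = lambda h: np.array([np.sum(h), np.prod(h)])

x = np.array([0.5, -1.0, 2.0, 3.0])
y, log_det = coupling_forward(x, 2, s_fn, t_fn)
x_rec = coupling_inverse(y, 2, s_fn, t_fn)
print(np.max(np.abs(x - x_rec)))  # ~0: the layer is exactly invertible
```

Note that invertibility holds for arbitrary s and t: since y1:d = x1:d, the inverse can recompute s(y1:d) and t(y1:d) exactly, which is why the coupling networks themselves need not be invertible.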
GLOW: REALNVP WITH 1X1 CONVOLUTIONS
Kingma, D. P., & Dhariwal, P. (2018). GLOW: Generative flow with invertible 1x1 convolutions. NeurIPS 2018
GLOW: SAMPLES

[Figure: samples on CelebA-HQ]

GLOW: LATENT INTERPOLATION

[Figure: latent interpolations on CelebA-HQ]
VAES WITH NORMALIZING FLOWS

q̃(z | x) ∝ p(x | z) p(z)

Variational inference with normalizing flows:
- Rezende & Mohamed, "Variational inference with normalizing flows."
- v.d. Berg, Hasenclever, Tomczak, Welling, "Sylvester normalizing flows for variational inference."

Flow-based priors:
- Chen, Kingma, Salimans, Duan, Dhariwal, Schulman, Abbeel, "Variational lossy autoencoder."
- Gatopoulos, Tomczak, "Self-Supervised Variational Auto-Encoders."

Thank you!