Professional Documents
Culture Documents
DeepLearning.AI makes these slides available for educational purposes. You may not use or
distribute these slides for commercial purposes. You may make copies of these slides and
use or distribute them for educational purposes as long as you cite DeepLearning.AI as the
source of the slides.
Transformation
“This bird is red with white and has a very short beak”
Collie
Conditional
+ Generator
Pix2Pix
+ Generator
Pix2Pix
+ Generator
Real input
Or
Pix2Pix
Generator
Based on: https://arxiv.org/abs/1611.07004
Pix2Pix Upgrades
0.8
Pix2Pix Pix2Pix
Generator Discriminator
(Left) Based on: https://arxiv.org/abs/1611.07004
(Right) Based on: https://arxiv.org/abs/1803.07422
Goal is still to produce realistic
Pix2Pix Upgrades outputs!
0.8
Pix2Pix Pix2Pix
Generator Discriminator
(Left) Based on: https://arxiv.org/abs/1611.07004
(Right) Based on: https://arxiv.org/abs/1803.07422
Summary
0.8
.1
0.4 0.5 0
0.9 .2
0.6 0
0.9 0.3
0.7 .5 0.4
0 0.9
0.8 0.7
0.6
0.3
Image available from: https://arxiv.org/abs/1611.07004
Based on: https://arxiv.org/abs/1803.07422
Pix2Pix Discriminator: PatchGAN
.1
0.4 0.5 0
0.9 .2
0.6 0
0.9 0.3
0.7 .5 0.4
0 0.9
0.8 0.7
0.6
0.3
Image available from: https://arxiv.org/abs/1611.07004
Based on: https://arxiv.org/abs/1803.07422
Pix2Pix Discriminator: PatchGAN
.1
0.4 0.5 0
0.9 .2
0.6 0
0.9 0.3
0.7 0.4
0.5 0.9
0.8 0.7
0.6
0.3
.1
0.4 0.5 0
0.9 .2
0.6 0
0.9 0.3
0.7 0.4
0.5 0.9
0.8 0.7
0.6
0.3
Still uses BCE Loss
Image available from: https://arxiv.org/abs/1611.07004
Based on: https://arxiv.org/abs/1803.07422
Pix2Pix Discriminator: PatchGAN Values closer to 0 = fake label
0
0 0
0 0 0
0
0
0 0 0
0 0 0
0
0
.1
0.4 0.5 0
0.9 .2
0.6 0
0.9 0.3
0.7 0.4
0.5 0.9
0.8 0.7
0.6
0.3
Still uses BCE Loss
Image available from: https://arxiv.org/abs/1611.07004
Based on: https://arxiv.org/abs/1803.07422
Pix2Pix Discriminator: PatchGAN Values closer to 1 = real label
1
1 1
1 1 1
1
1
1 1 1
1 1
1 1
1
.1
0.4 0.5 0
0.9 .2
0.6 0
0.9 0.3
0.7 0.4
0.5 0.9
0.8 0.7
0.6
0.3
Still uses BCE Loss
Image available from: https://arxiv.org/abs/1611.07004
Based on: https://arxiv.org/abs/1803.07422
Summary
Image Segmentation
Image Segmentation
Forward pass
Skip connections
allow information
flow to the decoder
Backward pass
Skip connections
improve gradient
flow to encoder
Block
Block
Block
Block
Block
Block
Block
Block
Pix2Pix Encoder
Input size: 256 x 256 x 3
Block
Block
Block
Block
Block
Block
Block
Block
Block
Block
Block
Block
Block
Block
Pix2Pix Encoder
Input size: 256 x 256 x 3
Block
Block
Block
Block
Block
Block
Block
Block
Block
Block
Block
Block
Block
Block
Input size: 1 x 1 x 512 Block
Pix2Pix Decoder
Block
Block
Block
Block
Block
Block
Input size: 1 x 1 x 512 Block
Pix2Pix Decoder
Output size: 256 x 256 x 3
Block
Block
Block
Block
Block
Block
Input size: 1 x 1 x 512 Block
Block
Block
Block
Block
ReLU
Block
BatchNorm
Block
Input size: 1 x 1 x 512 Block Trans Conv
Block
Block
Block
Dropout in some decoder blocks
Block
adds noise to the network ReLU
Block
BatchNorm
Block
Input size: 1 x 1 x 512 Block Trans Conv
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Pix2Pix Encoder-Decoder
Same input & output size:
256 x 256 x 3
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Bottleneck size:
Block 1 x 1 x 512 Block
Block Block
Block Block
Block Block
8 encoder blocks
Block 8 decoder blocks Block
Block Block
Block Block
Bottleneck size:
Block 1 x 1 x 512 Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Pix2Pix U-Net
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Block Block
Pix2Pix U-Net
Skip connections concatenate encoder to
decoder blocks at the same resolutions
Block Block
Block
Block Block
Block
Block Block
Block
Block Block
Block
Block Block
Block
Block Block
Block
Block Block
Block
Block Block
Summary
—
Generated output Real output
BCE Loss + —
Real input
Real input
Classification matrix
Real input
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Generated PatchGAN
output Discriminator vs.
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Real input
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
Real PatchGAN
output Discriminator vs.
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
Real input
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
Generated PatchGAN
output Discriminator vs.
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
Real input
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1 Pixel Distance
Generated PatchGAN
Discriminator vs. +
output 0 0 0 0 1 1 1 1 Loss Term
0 0 0 0 1 1 1 1
Real input
● PatchGAN discriminator
○ Inputs input image and paired output (either real target or fake)
○ Outputs classification matrix