
2019MSCE-13

Ume Habiba

Image-to-Image Translation with Conditional Adversarial Networks (Review)

This paper investigates conditional adversarial networks (cGANs) as a general-purpose solution to image-to-image translation problems. It demonstrates that this approach is effective at synthesizing photos from label maps, reconstructing photos from edge maps, generating sketches from photos, translating maps to aerial photos, converting scenes from day to night (and vice versa), and colorizing images, among other tasks. The authors release the pix2pix software associated with this paper. CNNs trained to minimize an L2 distance between predicted and ground-truth pixels produce blurry images; GANs address this problem by learning the loss function itself. GANs learn a generative model, while cGANs learn a conditional generative model in which we condition on an input image to produce a corresponding output image. Unlike plain CNNs, cGANs use a structured loss, which penalizes the joint configuration of the output rather than treating each output pixel independently.
The generator G learns to fool the discriminator, while the discriminator D learns to classify real versus synthesized pairs. Unlike in an unconditional GAN, both the generator and the discriminator observe the input image. The objective of a conditional GAN is expressed as:

L_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]

To test the importance of conditioning the discriminator, an unconditional variant is compared in which the discriminator does not observe x:

L_{GAN}(G, D) = \mathbb{E}_{y}[\log D(y)] + \mathbb{E}_{x,z}[\log(1 - D(G(x, z)))]
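As a minimal sketch (not the paper's code), the conditioning difference amounts to what the discriminator receives as input; here x and y are assumed to be image batches in NCHW layout and D an arbitrary convolutional discriminator:

```python
import torch

# Minimal sketch: the conditioning difference is what D receives as input.
# x is the input image batch, y a real or synthesized output batch (NCHW).
def d_conditional(D, x, y):
    # cGAN: D observes the input/output pair, concatenated along channels
    return D(torch.cat([x, y], dim=1))

def d_unconditional(D, y):
    # ablation variant: D does not observe the input x
    return D(y)
```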

Previous methodologies have found it beneficial to combine the GAN objective with a more traditional loss, such as L2 distance [43]. The discriminator's job remains unchanged, but the generator is tasked not only with fooling the discriminator but also with being near the ground-truth output in an L2 sense. This option is explored using L1 distance rather than L2, since L1 encourages less blurring:

L_{L1}(G) = \mathbb{E}_{x,y,z}[\lVert y - G(x, z) \rVert_1]

The final objective is:

G^* = \arg\min_G \max_D L_{cGAN}(G, D) + \lambda L_{L1}(G)
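A minimal sketch of the generator side of this combined objective in PyTorch (the function and variable names are illustrative; lambda_l1 = 100 is the weight reported in the paper):

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # adversarial term (log-loss on D's logits)
l1 = nn.L1Loss()              # reconstruction term
lambda_l1 = 100.0             # L1 weight used in the paper

def generator_loss(D, x, y, fake):
    """cGAN term + lambda * L1, i.e. the generator side of
    G* = arg min_G max_D L_cGAN(G, D) + lambda * L_L1(G)."""
    pred_fake = D(torch.cat([x, fake], dim=1))        # D observes (input, output)
    adv = bce(pred_fake, torch.ones_like(pred_fake))  # fool D: label fakes "real"
    rec = l1(fake, y)                                 # stay near the ground truth
    return adv + lambda_l1 * rec
```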

GANs learn a mapping from a random noise vector z to an output image y, while cGANs learn a mapping from an observed image x and a random noise vector z to y. In many image-to-image translation problems a great deal of low-level information is shared between input and output, and it is desirable to shuttle this information directly across the network. To circumvent the information bottleneck, a generator with skip connections, following the general shape of a U-Net, is used. Skip connections also help with vanishing gradients: by the chain rule, we must keep multiplying terms with the error gradient as we go backwards through the layers, and in a long chain of multiplications of factors less than one, the resulting gradient becomes very small as we approach the earlier layers of a deep architecture. In some cases the gradient becomes zero, meaning that the early layers are not updated at all.
early layers at all. L1 loss function is used which produces the blurry results. This loss fails to
produce high-resolution results. This failure restricts GAN discriminator to only model high-
frequency structure, relying on L1 for to force low frequency-correctness. Therefore, patchGAN
is used as discriminator. The difference between a PatchGAN and regular GAN discriminator is
that rather the regular GAN maps from a 256x256 image to a single scalar output, which
signifies "real" or "fake", whereas the PatchGAN maps from 256x256 to an NxN array of outputs
X, where each X_ij signifies whether the patch ij in the image is real or fake. The receptive fields
of the discriminator turn out to be 70x70 patches in the input image. During training step
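A sketch of a PatchGAN along these lines (layer sizes follow the common 70x70 configuration; an illustration, not the authors' exact code):

```python
import torch.nn as nn

# Stacked stride-2 convolutions give each output unit a fixed receptive
# field (70x70 here); the output is an NxN grid of real/fake logits
# rather than one scalar. in_ch=6: input and output images, concatenated.
class PatchDiscriminator(nn.Module):
    def __init__(self, in_ch=6, nf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, nf, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(nf, nf * 2, 4, 2, 1),
            nn.BatchNorm2d(nf * 2), nn.LeakyReLU(0.2),
            nn.Conv2d(nf * 2, nf * 4, 4, 2, 1),
            nn.BatchNorm2d(nf * 4), nn.LeakyReLU(0.2),
            nn.Conv2d(nf * 4, nf * 8, 4, 1, 1),
            nn.BatchNorm2d(nf * 8), nn.LeakyReLU(0.2),
            nn.Conv2d(nf * 8, 1, 4, 1, 1),  # per-patch logit; pair with BCEWithLogits
        )

    def forward(self, xy):
        return self.net(xy)  # NxN map; each entry judges one patch
```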
During training, the procedure alternates between one gradient descent step on D and one step on G. Minibatch SGD is used with the Adam solver, a learning rate of 0.0002, and momentum parameters β1 = 0.5, β2 = 0.999. At test time, dropout is also applied.
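A sketch of this alternating loop, reusing the hypothetical generator_loss above and assuming G, D, and a data loader of aligned (x, y) pairs already exist:

```python
import torch

# Alternating update with the hyperparameters quoted above.
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = torch.nn.BCEWithLogitsLoss()

for x, y in loader:  # minibatch SGD over aligned input/output pairs
    fake = G(x)

    # one gradient descent step on D: real pair vs synthesized pair
    opt_d.zero_grad()
    pred_real = D(torch.cat([x, y], dim=1))
    pred_fake = D(torch.cat([x, fake.detach()], dim=1))
    loss_d = 0.5 * (bce(pred_real, torch.ones_like(pred_real)) +
                    bce(pred_fake, torch.zeros_like(pred_fake)))
    loss_d.backward()
    opt_d.step()

    # one step on G: fool D while staying near y in an L1 sense
    opt_g.zero_grad()
    loss_g = generator_loss(D, x, y, fake)
    loss_g.backward()
    opt_g.step()
```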
AMT (Amazon Mechanical Turk) is used for human evaluation, and FCN-like models are used to score original versus generated images against their labels.
labels. During ablation test patch size is important, 70x70 is the optimal one cGAN + L1 loss is
the best loss option U-net + skips is better than no skips. The advantage of patchGAN is that
fixed size patch discriminator can be applied to large images. Generator may be applied on
large images than those it was trained.
In this paper, their primary contribution is to demonstrate that on a wide variety of problems,
conditional GANs produce reasonable results. Their second contribution is to present a simple
framework sufficient to achieve good results, and to analyze the effects of several important
architectural choices. However, there are some limitations; for example, the cGAN produces sharp images that look at a glance like the ground truth, but in fact include many small, hallucinated objects.
