Image-to-Image Translation With Conditional Adversarial Networks (Review)
Ume Habiba
Previous methodologies have found it beneficial to combine the GAN objective with a more
traditional loss, such as L2 distance [43]. The discriminator's job remains unchanged, but the
generator is tasked not only with fooling the discriminator but also with staying near the
ground-truth output in an L2 sense. The paper explores this idea using L1 distance instead of
L2, as L1 encourages less blurring:

L_L1(G) = E_{x,y,z}[ ||y - G(x, z)||_1 ]

The final objective is

G* = arg min_G max_D L_cGAN(G, D) + λ·L_L1(G)
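As a rough illustration (not the paper's code), the combined generator objective — a GAN term plus a λ-weighted L1 term — can be sketched in plain Python; the function and argument names here are placeholders:

```python
import math

def generator_loss(fake_logits, fake_img, real_img, lam=100.0):
    """Sketch of the cGAN + L1 generator objective (the paper uses lambda = 100).

    fake_logits: discriminator logits on (input, generated) pairs
    fake_img, real_img: generated and ground-truth pixel values
    """
    # Non-saturating GAN term: -log(sigmoid(logit)), averaged over the batch.
    # softplus(-x) = -log(sigmoid(x)), written with log1p for numerical stability.
    gan_term = sum(math.log1p(math.exp(-l)) for l in fake_logits) / len(fake_logits)
    # L1 term: mean absolute error against the ground truth (encourages less
    # blurring than L2, per the paper's argument).
    l1_term = sum(abs(r - f) for r, f in zip(real_img, fake_img)) / len(real_img)
    return gan_term + lam * l1_term
```

With a logit of 0 (discriminator is maximally unsure) and a unit L1 error, the loss comes out to log(2) + 100·1, matching the two terms of the objective.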
GANs learn a mapping from a random noise vector z to an output image y, while cGANs learn a
mapping from an observed image x and a random noise vector z to y. In many image-to-image
translation problems, a great deal of low-level information is shared between the input and the
output, and it is necessary to shuttle this information across the network. To get this kind of
information past the bottleneck, the generator uses skip connections, following the general
shape of a U-Net.
There is low-level information shared between the input and output, and it would be desirable
to pass this information directly across the net. By the chain rule, backpropagation keeps
multiplying terms into the error gradient as it moves backwards; in a long chain of
multiplications where many factors are less than one, the resulting gradient becomes very
small by the time it reaches the earlier layers of a deep architecture. In some cases the
gradient becomes zero, meaning the early layers are not updated at all. Skip connections route
both the low-level information and the gradient around this bottleneck.

The L1 loss on its own produces blurry results and fails to capture high-frequency detail. This
motivates restricting the GAN discriminator to model only high-frequency structure, relying on
L1 to force low-frequency correctness. Therefore a PatchGAN is used as the discriminator. The
difference between a PatchGAN and a regular GAN discriminator is that a regular discriminator
maps a 256x256 image to a single scalar output signifying "real" or "fake", whereas the
PatchGAN maps a 256x256 image to an NxN array of outputs X, where each X_ij signifies
whether patch ij in the image is real or fake. The receptive fields of the discriminator turn out
to be 70x70 patches in the input image.

During training, optimization alternates between one gradient descent step on D and one step
on G. Minibatch SGD is used with the Adam solver, a learning rate of 0.0002, and momentum
parameters β1 = 0.5, β2 = 0.999. At test time, dropout is still applied. Evaluation uses AMT
(Amazon Mechanical Turk) for human judgments, and FCN-based models to score original and
generated images against labels. The ablations show that patch size matters, with 70x70
optimal; that cGAN + L1 is the best loss option; and that U-Net with skips beats the same
architecture without skips. An advantage of the PatchGAN is that a fixed-size patch
discriminator can be applied to arbitrarily large images, so the generator may also be applied
to larger images than those it was trained on.
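The 70x70 figure falls directly out of the discriminator's conv stack. A back-of-the-envelope calculation (the layer list below is an assumption matching the commonly cited 70x70 PatchGAN: three stride-2 4x4 convs, then two stride-1 4x4 convs) reproduces it:

```python
def receptive_field(layers):
    """Receptive field of a stack of conv layers given as (kernel, stride) pairs."""
    rf, jump = 1, 1             # receptive field and cumulative stride ("jump")
    for k, s in layers:
        rf += (k - 1) * jump    # each layer widens the field by (k-1) * current jump
        jump *= s
    return rf

# Assumed 70x70 PatchGAN stack: C64-C128-C256 with stride 2,
# then C512 and the 1-channel output conv with stride 1.
patchgan = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan))  # 70
```

Because every output unit only ever sees a 70x70 window, the same discriminator runs convolutionally over images of any size, which is the property the review points to above.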
In this paper, their primary contribution is to demonstrate that on a wide variety of problems,
conditional GANs produce reasonable results. Their second contribution is to present a simple
framework sufficient to achieve good results, and to analyze the effects of several important
architectural choices. However, there are some limitations; for example, the cGAN produces
sharp images that at a glance look like the ground truth but in fact include many small,
hallucinated objects.