VANILLA GAN
We start with some input training data and feed it into our model. The model then makes a prediction, and we compare the predicted output with the expected output from the training data set. Based on the difference between the expected output and the predicted output, we can figure out how to update the model so it produces better output. This is an example of supervised learning.
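As a concrete illustration, the predict/compare/update loop above can be sketched with a toy linear model. All names and values here are our own, not from any particular framework:

```python
import numpy as np

# Toy supervised loop: predict, compare with the expected output, update the model.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0]          # expected outputs from the "training data set"

w = 0.0                    # the model's single parameter
lr = 0.1
for _ in range(100):
    pred = w * X[:, 0]                      # model makes a prediction
    error = pred - y                        # compare predicted vs expected output
    w -= lr * (error * X[:, 0]).mean()      # update the model to do better

print(round(w, 2))  # should approach 3.0, the true coefficient
```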
A GAN is an example of unsupervised learning: it effectively supervises itself, and it consists of two sub-models.
We have a generator model and a discriminator model. The generator's job is to create fake samples, and the discriminator's job is to take a given sample and figure out whether it is a fake sample or a real sample from the domain. Therein lies the adversarial nature of the setup: the generator creates fake samples, while the discriminator looks at a given sample and decides whether it is a fake from the generator or a real sample from the domain.
Let's take an example of how this works with, say, a flower. We want to train a generator to create really convincing fake flowers, so first of all we need to train the discriminator model to recognize what a picture of a flower looks like.
After each round, the answer — whether the flower was real or fake — is revealed to both the generator and the discriminator, and based on that each will change its behaviour. This is a zero-sum game: there is always a winner and a loser. The winner gets to remain blissfully unchanged (its model does not change at all), whereas the loser has to update its model. So if the discriminator successfully spotted the fake, the discriminator remains unchanged, but the generator will need to change its model to generate better fakes.
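The adversarial back-and-forth described above can be sketched as a minimal 1-D GAN where the "images" are just numbers drawn from a Gaussian. The generator and discriminator below are stand-in linear models, not a real architecture:

```python
import numpy as np

# Minimal 1-D GAN sketch (toy illustration, not a real GAN implementation).
# Real data ~ N(4, 1); generator g(z) = w*z + b maps noise N(0, 1) toward it.
# Discriminator d(x) = sigmoid(a*x + c) outputs a "probability this is real".
rng = np.random.default_rng(1)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

w, b = 1.0, 0.0        # generator parameters
a, c = 0.0, 0.0        # discriminator parameters
lr = 0.05

for step in range(2000):
    real = rng.normal(4.0, 1.0, size=64)
    z = rng.normal(size=64)
    fake = w * z + b

    # Discriminator step: push d(real) -> 1 and d(fake) -> 0.
    dr, df = sigmoid(a * real + c), sigmoid(a * fake + c)
    a += lr * ((1 - dr) * real - df * fake).mean()
    c += lr * ((1 - dr) - df).mean()

    # Generator step: push d(fake) -> 1, i.e. fool the discriminator.
    df = sigmoid(a * (w * z + b) + c)
    w += lr * ((1 - df) * a * z).mean()
    b += lr * ((1 - df) * a).mean()

print(b)  # the generator's offset drifts toward the real mean (about 4)
```

In this toy setup the generator's offset b drifts toward the real mean as the two models push against each other, mirroring the winner/loser dynamic described above.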
This means a DCGAN would likely be more fitting for image/video data, whereas the general idea of a GAN can be applied to wider domains, since the model specifics are left open to be addressed by individual architectures.
DCGANs can be customized for different use cases. Practical applications of DCGANs include the generation of anime characters: currently, animators draw characters manually with computer software, and sometimes on paper as well. This manual process usually takes a lot of time.
Image-to-image translation is the task of taking images from one domain and transforming them so they have the style of images from another domain. The discriminator's job is to figure out whether an input image is fake or real, and the generator's job is to produce an image that looks realistic but is fake.
The random latent vector remains the basic input: the generator converts that latent vector into a fake image via a dense layer followed by reshaping.
In addition to the latent vector, we also provide a class label — "dog" or "cat", say. What goes into the generator as input is the concatenated vector of random noise plus the class label. The discriminator likewise receives the class-label information alongside the real image or the fake input, and is trained on both.
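A minimal sketch of the conditional inputs just described, assuming a one-hot label and illustrative shapes (a 100-dimensional latent vector and a 28x28 single-channel image):

```python
import numpy as np

# Sketch of conditional GAN inputs; all shapes and names are illustrative.
rng = np.random.default_rng(0)
num_classes, latent_dim = 2, 100            # e.g. class 0 = cat, class 1 = dog

z = rng.normal(size=(latent_dim,))          # random latent vector
label = 1                                   # "dog"
one_hot = np.eye(num_classes)[label]        # class label as a one-hot vector

gen_input = np.concatenate([z, one_hot])    # noise + label -> generator input
print(gen_input.shape)   # (102,)

# The discriminator likewise sees the (real or fake) image plus the label,
# e.g. the label broadcast to extra channels of a 28x28 image:
image = rng.normal(size=(28, 28, 1))
label_map = np.broadcast_to(one_hot, (28, 28, num_classes))
disc_input = np.concatenate([image, label_map], axis=-1)
print(disc_input.shape)  # (28, 28, 3)
```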
CONDITIONAL GAN
The GAN models have no control over the generated data, especially in cases of
data with more than one labeled class. Therefore, extra information is needed
to guide the direction of the distribution to a specific labeled class in order to
direct the generated results to more than one labeled class. To this end, a
conditional generative adversarial network (CGAN) has been introduced to
control the data generation process in a supervised manner. A CGAN combines the random noise z with the conditional variable y into a joint hidden representation; e.g., G(z, y) is used to direct the generation process, where y is an additional parameter.
min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x|y)] + E_{z~p_z(z)}[log(1 − D(G(z|y)))]
The conditional variable y could be text or a number; it turns the GAN into a supervised model. CGAN can be used with images, sequence models, and other model types. CGAN is used to model complex and large-scale datasets that have different labels by adding the conditional information y to both the generator and the discriminator.
The Pix2Pix GAN is a general approach for image-to-image translation. It is
based on the conditional generative adversarial network, where a target image
is generated, conditional on a given input image.
PIX2PIX
Pix2Pix takes an image and is typically used for semantic segmentation: given a training image and its training mask, you train the U-Net-style generator, and it produces the semantic segmentation of the image.
In Pix2Pix, the L1 loss measures how different the fake image is from the real image — here, comparing the generated segmentation with the actual ground-truth segmentation mask.
The total loss is a combination of the GAN (adversarial) loss and the mean absolute error (L1) loss, obtained by looking at the real and fake images and measuring the difference between them.
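A sketch of that combined objective, assuming the non-saturating adversarial term and an L1 weight of 100 as in the Pix2Pix paper; the discriminator score here is a placeholder number, not a real network output:

```python
import numpy as np

# Sketch of the Pix2Pix combined generator objective: adversarial + lambda * L1.
rng = np.random.default_rng(0)
real = rng.uniform(size=(8, 8))                    # target (real) image
fake = real + rng.normal(0, 0.1, size=(8, 8))      # generator's output

l1_loss = np.abs(fake - real).mean()               # mean absolute error vs real

d_fake = 0.3                                       # placeholder discriminator score
adv_loss = -np.log(d_fake)                         # generator's adversarial term

lam = 100.0                                        # L1 weight (100 in the paper)
total = adv_loss + lam * l1_loss                   # combined generator loss
print(total > adv_loss)  # True: the L1 term adds on top of the GAN loss
```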
It is able to translate images of one kind into another without needing pairs of images.
In some cases, like the one shown here, the images can be split into two distinct domains; here the domains are satellite images and map images.
CycleGAN
Let's say we wanted to make an AI which translates satellite images into map images. The simplest approach would be relatively straightforward: we could train an autoencoder to translate the photo, then compare the translated photo to its actual pair. The autoencoder would use convolution blocks to downsample the image a few times, then upsample a few times to return the image to its original size. The reason for doing this is to compress the image, so that features which might be far apart spatially but are related in the image can be brought together.
The autoencoder is a key concept in image-to-image translation and is used in CycleGAN as well. On its own it is far from the best approach for paired translation, but it is a relatively simple solution nonetheless.
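The down-then-up sampling pattern can be sketched with plain array operations: 2x average pooling standing in for the convolutional downsampling, and nearest-neighbour repetition standing in for the upsampling (both are simplifications of what a real autoencoder does):

```python
import numpy as np

# Toy down/up-sampling round trip, as in an autoencoder's spatial bottleneck.
def downsample(img):
    """2x average pooling: each 2x2 block becomes one value."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Nearest-neighbour 2x upsampling: each value becomes a 2x2 block."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

x = np.arange(16.0).reshape(4, 4)   # toy 4x4 "image"
code = downsample(x)                # compressed 2x2 representation
y = upsample(code)                  # back to the original 4x4 size
print(code.shape, y.shape)  # (2, 2) (4, 4)
```

The round trip loses detail, which is exactly the compression the text describes: spatially distant but related features get squeezed into a small representation.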
CycleGAN: released in 2017, it can perform image-to-image translation; once it is trained, you can translate any image from one domain to another. For example, if we have trained the GAN on horses, we can convert a horse image into a zebra image — this domain translation keeps the background the same.
The architecture uses two generators and two discriminators. The two generators are often variations of autoencoders: each takes an image as input and returns an image as output. Each discriminator takes an image as input and outputs a single number. The main objectives for the main generator are:
1) To ensure that the translated image looks like it is, in fact, a zebra.
This is trained into the generator using the GAN setup: the generator and discriminator train adversarially. The discriminator's goal is to classify whether an image is a real zebra or a fake zebra made by the generator. The generator, training alongside the discriminator, tries to make images which fool the discriminator into thinking they are real; it must learn to make convincing-looking images of zebras.
2) To ensure the translated image resembles the original in some way. This is done using a cycle-consistency loss and a second generator model. The two generators work together in this loss: one generator translates however it sees fit, while the second generator learns alongside it to translate the image back to the original. Both generators are penalized for any difference between the original image and the image that has been through both generators. This ensures the main generator does not completely disregard its input, while using a second generator allows flexibility in the translation process.
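The cycle-consistency idea reduces to a simple computation. Here G and F are stand-in linear maps rather than real networks, chosen so that F happens to invert G exactly:

```python
import numpy as np

# Cycle-consistency sketch: G translates horse -> zebra, F translates back.
# G and F are toy stand-ins for the two generator networks.
G = lambda x: 2.0 * x + 1.0          # forward "generator"
F = lambda y: (y - 1.0) / 2.0        # second generator: learns the inverse

x = np.linspace(0.0, 1.0, 5)         # a toy "image"
cycle = F(G(x))                      # horse -> zebra -> horse
cycle_loss = np.abs(cycle - x).mean()  # penalize any round-trip difference
print(cycle_loss)  # 0.0 here, because F perfectly inverts G
```

In real training F does not invert G exactly, so cycle_loss is positive and both generators are pushed to shrink it.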
So those are the main concepts behind CycleGAN.
InfoGAN
The authors of InfoGAN proposed learning disentangled representations by maximizing mutual information in an unsupervised manner. In InfoGAN, the input to the generator is decomposed into two parts: the incompressible noise vector z and the latent variable c. Similar to CGAN, the generator uses the latent code c; however, the latent code c of InfoGAN is unknown, and it is to be discovered through training. InfoGAN is implemented by adding a regularization term to the original GAN's objective function.
min_G max_D V_I(D, G) = V(D, G) − λ I(c; G(z, c))
where V(D, G) is the GAN loss function, I(c; G(z, c)) is the mutual information, and λ is a constant. InfoGAN maximizes the mutual information between the generator's output G(z, c) and the latent code c to discover meaningful features of the real data distribution. However, the mutual information I(c; G(z, c)) requires access to the posterior probability p(c|x), which makes it difficult to optimize directly. Later, other InfoGAN variants were proposed, such as the semi-supervised InfoGAN (ss-InfoGAN) and the causal InfoGAN.
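In practice the intractable mutual-information term is replaced by a variational lower bound: an auxiliary head Q predicts the code c from the generated sample, and the bound is the mean log-probability Q assigns to the true code. A toy numeric sketch (all numbers here are made up):

```python
import numpy as np

# InfoGAN regularizer sketch: Q predicts the latent code c from G(z, c);
# the tractable lower bound on I(c; G(z, c)) is Q's mean log-probability
# on the true codes (up to the constant entropy H(c)).
c = np.array([0, 1, 2, 1])                  # categorical latent codes used
q_probs = np.array([                        # Q's predicted p(c | G(z, c))
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.1, 0.7, 0.2],
])
info_term = np.log(q_probs[np.arange(4), c]).mean()  # lower bound on I(...)

lam = 1.0
gan_loss = 1.23                              # placeholder V(D, G) value
total = gan_loss - lam * info_term           # V(D, G) - lambda * I(c; G(z, c))
print(total > gan_loss)  # True: log-probabilities are negative
```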
Problem:
How do we get a high-resolution (HR) image from just one lower-resolution (LR) image?
Answer: We use super-resolution (SR) techniques.
SRGAN
SRGAN - Generator
● G: generator that takes a low-res image I^LR and outputs its high-res counterpart I^SR
● θ_G: parameters of G, {W_{1:L}, b_{1:L}}
● l^SR: loss function measuring the difference between the two high-res images
SRGAN - Discriminator
● D: discriminator that classifies whether a high-res image is I^HR or I^SR
● θ_D: parameters of D
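A toy sketch of the quantities above, with nearest-neighbour upsampling standing in for G and pixel-wise MSE standing in for l^SR (SRGAN's real loss also includes perceptual and adversarial terms):

```python
import numpy as np

# Super-resolution setup sketch: make an LR image from an HR one, "upscale"
# it with a naive stand-in generator, and score it against the HR target.
rng = np.random.default_rng(0)
I_HR = rng.uniform(size=(8, 8))                     # high-res target
I_LR = I_HR.reshape(4, 2, 4, 2).mean(axis=(1, 3))   # 2x downscaled input

I_SR = I_LR.repeat(2, axis=0).repeat(2, axis=1)     # stand-in G: 2x upscale
l_SR = ((I_SR - I_HR) ** 2).mean()                  # pixel-wise MSE loss
print(I_SR.shape)  # (8, 8): same size as I_HR, but detail has been lost
```

The positive loss reflects the detail lost by naive upscaling; SRGAN trains G to recover that detail instead.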