
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

"JNANA SANGAMA", BELAGAVI - 590018.

Technical Seminar Report


On
“Generative Adversarial Network”
Submitted in partial fulfillment of the requirements for the award of the degree of

BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING

Submitted By

Rhythm Bhatnagar [1JB18CS116]

Under The Guidance Of

Dr. Krishna A N
Head and Professor,
Dept. of C.S.E, SJBIT

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SJB INSTITUTE OF TECHNOLOGY


BGS HEALTH AND EDUCATION CITY
Dr. Vishnuvardhan Road, Kengeri, Bangalore - 560060.
2021-22
||Jai Sri Gurudev||
Sri Adichunchanagiri Shikshana Trust ®
S. J. B INSTITUTE OF TECHNOLOGY
No.67, Dr Vishnuvardhan Road, BGS Health & Education City, Kengeri, Bengaluru,
Karnataka 560060.

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE
This is to certify that the Seminar work entitled "Generative Adversarial Network" carried out by Mr. RHYTHM BHATNAGAR, bearing USN 1JB18CS116, a bonafide student of SJB INSTITUTE OF TECHNOLOGY, is in partial fulfillment for the award of "BACHELOR OF ENGINEERING" in COMPUTER SCIENCE AND ENGINEERING as prescribed by VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELAGAVI during the academic year 2021-2022. It is certified that all corrections/suggestions indicated for internal assessment have been incorporated in the report deposited in the departmental library. The Technical Seminar report has been approved as it satisfies the academic requirements in respect of the seminar work prescribed for the said degree.
ACKNOWLEDGEMENT

We would like to express our profound gratefulness to His Divine Soul Jagadguru
Padmabhushan Sri Sri Sri Dr. Balagangadharanatha Maha Swamiji and His Holiness
Jagadguru Sri Sri Sri Dr. Nirmalanandanatha Maha Swamiji for providing us an opportunity
to complete our academics in this esteemed Institution.

We would like to express our profound thanks to Reverend Sri Sri Dr. Prakashnath Swamiji,
Managing Director, SJB Institute of Technology, for his continuous support in providing amenities
to carry out this Technical Seminar in this admired institution.

We express our gratitude to Dr. K.V. Mahendra Prashanth, Principal, SJB Institute of
Technology, for providing us excellent facilities and academic ambience, which have helped us
in the satisfactory completion of the Technical Seminar.

We extend our sincere thanks to Dr. Krishna A N, Professor and Head of the Department of
Computer Science and Engineering, SJB Institute of Technology, for providing us invaluable
support throughout the period of our Technical Seminar.

We express our deepest gratitude and sincere thanks to Seminar guide Dr. Krishna A N,
Professor and Head of the Department of Computer Science and Engineering, SJB Institute of
Technology, Bengaluru, for his valuable guidance, support and cheerful encouragement during the
entire period of the Technical Seminar.

Finally, we take this opportunity to extend our earnest gratitude and respect to our parents, the
teaching and non-teaching staff of the department, the library staff and all our friends, who have
directly or indirectly supported us during the period of our Technical Seminar.

Regards,
Rhythm Bhatnagar
[1JB18CS116]
ABSTRACT

Deep learning has achieved great success in the field of artificial intelligence, and many deep
learning models have been developed. The Generative Adversarial Network (GAN) is one such
model; it was proposed on the basis of zero-sum game theory and has become a new research
hotspot. The significance of this model is that it learns the data distribution through unsupervised
learning and uses it to generate realistic data. GANs are now widely studied owing to their enormous
application prospects, which include image and vision computing, video processing, and language
processing. This report gives an overview of GANs and discusses the popular variants of the model,
its common applications, the evaluation metrics proposed for it, and its drawbacks. The background
of the GAN, its theoretical model, and the extensional variants of GANs are introduced, where the
variants either further optimize the original GAN or change its basic structure. The typical
applications of GANs are then explained. Finally, the existing problems of GANs are summarized
and future directions are given.
Table of Contents

Acknowledgement
Abstract
Table of Contents
CHAPTER 1: Introduction
CHAPTER 2: Literature Survey
CHAPTER 3: Problem Statement
CHAPTER 4: Methodology
CHAPTER 5: Design and Architecture
CHAPTER 6: Implementation
CHAPTER 7: Results and Discussion
CONCLUSION
REFERENCES
CHAPTER 1:
INTRODUCTION

A Generative Adversarial Network (GAN) belongs to the category of Machine Learning (ML)
frameworks. These networks were introduced by Ian Goodfellow and his colleagues; the loss
function used in the present GAN builds on noise-contrastive estimation (Grnarova et al., 2019).
Practical work with GANs on human faces began around 2017, adopting image enhancement to
produce better illustrations at high intensity. Adversarial networks were also fundamentally
anticipated in a blog post written by Olli Niemitalo in 2010, where the same idea is now known
as the Conditional GAN. To examine the impact of GANs on 2D-to-3D image conversion, the
corresponding dataset initially has to be fetched live and a benchmark created with the key
features (Wu, Zhang, Xue, Freeman & Tenenbaum, 2016). Thereafter, image merging is done to
calculate the threshold and suitability score. Image data pre-processing involves segmentation
and cleansing, which is followed by GAN training. The expected outcomes are pattern analysis
and accurate image generation.

A GAN works on three principles. First, the generative model learns, so that data can be
generated from some probabilistic representation. Second, the model can be trained in an
adversarial setting. Third, deep neural networks and artificial-intelligence algorithms are used to
train the complete system (Liu & Tuzel, 2016). GANs were basically designed for unsupervised
ML techniques, but they have also proved to be good solutions for semi-supervised and
reinforcement learning.

Dept of CSE, SJBIT 4


CHAPTER 2:

LITERATURE SURVEY

1. Zhao Fan and Jin Hu, "Review and Prospect of Research on Generative Adversarial
Networks," 2019. This paper discusses the generative adversarial model and its applications.

2. Shailender Kumar and Sumit Dhawan, "A Detailed Study on Generative Adversarial
Networks," 2020. This paper includes a detailed study of Generative Adversarial Networks
and their types.

3. Liang Gonog and Yimin Zhou, "A Review: Generative Adversarial Networks," 2019. This
paper explains the basics of GANs, the training of GANs, and the usage of GANs.

4. Piotr N. and Teresa P., "Application of Deep Learning for Generative Adversarial
Networks," 2020. In this paper, the role of deep learning for Generative Adversarial
Networks is discussed in detail.

5. Md. Shafiqul Islam, Maher Arebey, M. A. Hannan and Hasan Basri, "Overview of Deep
Convolutional Generative Adversarial Network," 2018. This paper discusses an overview of
the DC-GAN, its applications and its limitations.


CHAPTER 3:
PROBLEM STATEMENT

Person re-identification is an application in which a person or multiple people are re-identified
either through the same camera or through multiple cameras.

Generative adversarial networks (GANs) are generative models with implicit density
estimation; they are part of unsupervised learning and use two neural networks. Thus we
understand the terms "generative" and "networks" in "generative adversarial networks".
The Generative Adversarial Network is one of the few techniques used in person
re-identification to achieve efficient and fast results. Therefore, I have chosen the Generative
Adversarial Network as my topic for the Technical Seminar.



CHAPTER 4:
METHODOLOGY

GANs are structurally inspired by two-person zero-sum games in game theory (i.e., the sum of the
two players' interests is zero, and the gain of one side is exactly the loss of the other). A GAN sets up
one generator and one discriminator as the two participants in the game. The aim of the generator is
to learn and capture the potential distribution of the actual data samples as much as possible, and to
generate new data samples. The discriminator is a binary classifier whose aim is to determine whether
the input data comes from the actual data or from the generator. In order to win the game, the two
players need to constantly improve their capabilities to generate and to discriminate, so the process of
optimal learning is a minimax game problem. The aim is to find a Nash equilibrium between the two
sides, at which the generator has estimated the distribution of the data samples. Any differentiable
function can be used to represent the generator and discriminator of a GAN, which means that both
can adopt deep neural networks.
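The zero-sum game described above can be sketched numerically. The following minimal NumPy illustration (not from the report; the toy generator, discriminator and distributions are invented for demonstration) estimates the value function V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))] from samples, and shows that the discriminator's payoff is exactly the negative of the generator's payoff:

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z):
    # Toy generator: shifts and scales the noise (stand-in for a neural network).
    return 0.5 * z + 1.0

def discriminator(x):
    # Toy discriminator: a fixed logistic classifier (stand-in for a neural network).
    return 1.0 / (1.0 + np.exp(-(2.0 * x - 3.0)))

# Samples from the "real" data distribution and the noise prior.
x_real = rng.normal(loc=2.0, scale=0.5, size=10_000)
z = rng.normal(size=10_000)

# Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].
value = np.mean(np.log(discriminator(x_real))) + \
        np.mean(np.log(1.0 - discriminator(generator(z))))

# Zero-sum game: D tries to maximize `value`, G tries to minimize it,
# so the two payoffs always sum to zero.
payoff_D, payoff_G = value, -value
print(payoff_D + payoff_G)
```

At a Nash equilibrium of this game, neither player can improve its payoff by changing its own parameters alone, which is the condition the training described above tries to reach.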



CHAPTER 5:
DESIGN AND ARCHITECTURE

A GAN learns to map a simple latent distribution to the more complex data distribution. To capture
the complex data distribution Pdata, the GAN architecture should have enough capacity. A GAN is
based on the concept of a non-cooperative game between two networks, a generator G and a
discriminator D, in which G and D play against each other. GANs are deep generative models in
which G and D are parameterized via neural networks and updates are made in parameter space.

Fig 5.1 Basic Architecture of GAN

G and D play a minimax game in which G's main aim is to produce samples similar to those drawn
from the real data distribution, and D's main goal is to discriminate between samples generated by G
and samples drawn from the real data distribution, assigning higher probabilities to real samples and
lower probabilities to those generated by G. The main target of GAN training, in turn, is to keep
moving the generated samples toward the real data manifold using the gradient information from D.
In a GAN, x is data drawn from the real data distribution Pdata, the noise vector z is drawn from a
Gaussian prior Pz with zero mean and unit variance, and Pg denotes G's distribution over the data x.
The latent vector z is passed to G as input, and G outputs an image G(z) with the aim that D cannot
differentiate between G(z) and real data samples x, i.e., that G(z) resembles x as closely as possible.
Meanwhile, D simultaneously tries to keep itself from being fooled by G. D is a classifier for which
D(x) = 1 if x ∼ Pdata and D(x) = 0 if x ∼ Pg, i.e., it decides whether x is from Pdata or from Pg.
The basic GAN architecture for the above discussion is given in Figure 5.1.
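As a concrete sketch of this architecture (a minimal NumPy mock-up; the layer sizes, weight scales and function names are chosen arbitrarily for illustration and are not from the report), the generator maps a latent vector z drawn from the zero-mean, unit-variance Gaussian prior Pz into data space, and the discriminator maps any data-space sample to a probability D(x) in (0, 1):

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, HIDDEN_DIM, DATA_DIM = 8, 16, 4   # arbitrary illustrative sizes

# Randomly initialised weights (training would update these in parameter space).
G_W1 = rng.normal(scale=0.1, size=(LATENT_DIM, HIDDEN_DIM))
G_W2 = rng.normal(scale=0.1, size=(HIDDEN_DIM, DATA_DIM))
D_W1 = rng.normal(scale=0.1, size=(DATA_DIM, HIDDEN_DIM))
D_w2 = rng.normal(scale=0.1, size=(HIDDEN_DIM, 1))

def G(z):
    # Generator: latent vector -> data-space sample.
    return np.tanh(z @ G_W1) @ G_W2

def D(x):
    # Discriminator: data-space sample -> probability of being real.
    logits = np.tanh(x @ D_W1) @ D_w2
    return 1.0 / (1.0 + np.exp(-logits))

z = rng.normal(size=(5, LATENT_DIM))   # z ~ Pz: zero mean, unit variance
fake = G(z)                            # G(z): five generated samples
p_real = D(fake)                       # D(G(z)): probabilities in (0, 1)
print(fake.shape, p_real.shape)        # (5, 4) (5, 1)
```

Because both networks are built from differentiable operations, gradients from D can flow back through G(z), which is exactly the gradient information that moves the generated samples toward the real data manifold.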



CHAPTER 6:
IMPLEMENTATION

Fig 6.1 Implementation Structure of GAN

Here, we introduce the mathematical model and the training method of GANs. First, we describe
the optimization of the discriminator D given the generator G. Similar to the training of
sigmoid-function-based classifiers, training the discriminator involves minimizing the cross
entropy [10]. The loss function is formulated as

Obj_D(θ_D, θ_G) = −(1/2) E_{x∼p_data(x)}[log D(x)] − (1/2) E_{z∼p_z(z)}[log(1 − D(G(z)))]    (1)

where x is sampled from the real data distribution p_data(x), z is sampled from a prior
distribution p_z(z) such as a uniform or Gaussian distribution, and E(·) represents the expectation.
It should be noted that the training data consists of two parts: one part from the real data
distribution p_data(x) and another part from the generated data distribution p_g(x). This is slightly
different from the conventional methods for binary classification. Given the generator, we need
to minimize Eq. (1) to obtain the optimal solution [11]. In the continuous space, Eq. (1) can be
reformulated as

Obj_D(θ_D, θ_G) = −(1/2) ∫_x p_data(x) log(D(x)) dx − (1/2) ∫_z p_z(z) log(1 − D(G(z))) dz
               = −(1/2) ∫_x [p_data(x) log(D(x)) + p_g(x) log(1 − D(x))] dx    (2)




For any (m, n) ∈ R² and y ∈ [0, 1], the expression

−m log(y) − n log(1 − y)    (3)

achieves its minimum value at y = m/(m + n). Given the generator G, the objective function
Eq. (2) therefore achieves its minimum value at

D*_G(x) = p_data(x) / (p_data(x) + p_g(x))    (4)

This is the optimal solution of the discriminator D. From Eq. (4), the discriminator of the GAN
can estimate the ratio of the two probability densities, which is the key difference from Markov
chain or lower-bound based methods. On the other hand, D(x) denotes the probability that x is
sampled from the real data rather than the generated data. If the input data is real data x,
the discriminator strives to make D(x) close to 1. If the input data is generated data
G(z), the discriminator strives to make D(G(z)) close to 0 while the generator G tries to make it
close to 1. Since this is a zero-sum game between G and D, the optimization of the GAN can be
formulated as a minimax problem:

min_G max_D f(D, G) = E_{x∼p_data}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))]    (5)


In summary, for learning the parameters of the GAN, we need to train the discriminator D to
maximize its accuracy in telling whether the input data comes from the real data x or the
generated data G(z). In addition, we should train the generator G to minimize log(1 − D(G(z))).
Here, an alternating training method can be used: first, G is fixed and D is trained to maximize
the discrimination accuracy of D; then, D is fixed and G is trained to minimize the discrimination
accuracy of D.
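The alternating scheme can be sketched end to end on a one-dimensional toy problem. This is an illustrative hand-rolled implementation, not the report's code: the generator is reduced to a single trainable shift G(z) = z + b, the discriminator is a logistic classifier D(x) = sigmoid(w·x + c), D is trained on the cross entropy of Eq. (1) for k steps per generator step, and G descends −log D(G(z)) (the common non-saturating form of the generator update, equivalent in gradient direction to making D(G(z)) approach 1):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

REAL_MEAN, REAL_STD = 4.0, 0.5   # real data: x ~ N(4, 0.5)
b = 0.0                          # generator:     G(z) = z + b  (only b is trained)
w, c = 0.0, 0.0                  # discriminator: D(x) = sigmoid(w*x + c)
lr, k, batch = 0.05, 5, 128

for step in range(2000):
    for _ in range(k):           # k discriminator updates per generator update
        xr = rng.normal(REAL_MEAN, REAL_STD, batch)
        xf = rng.normal(size=batch) + b
        dr, df = sigmoid(w * xr + c), sigmoid(w * xf + c)
        # gradients of the cross entropy -E[log D(xr)] - E[log(1 - D(xf))]
        w -= lr * (-np.mean((1 - dr) * xr) + np.mean(df * xf))
        c -= lr * (-np.mean(1 - dr) + np.mean(df))
    # one generator update: descend -log D(G(z))
    xf = rng.normal(size=batch) + b
    df = sigmoid(w * xf + c)
    b -= lr * np.mean(-(1 - df) * w)

print(round(b, 1))  # the generated mean b should drift from 0 toward REAL_MEAN
```

With the fake distribution initially centered at 0, the discriminator first learns w > 0 to separate real from fake, and the generator's gradient then pushes b upward until the two distributions overlap and D can no longer tell them apart, illustrating the k-steps-of-D-per-step-of-G recipe in miniature.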



CHAPTER 7:
RESULTS AND DISCUSSION

Generative adversarial networks occupy an important place among generative models. GANs have
great power to deal with the problem of generating data that can be interpreted naturally. In this
section, we analyze the main advantages and disadvantages of generative adversarial networks, as
well as their prospects.

Advantages: Compared with other generative models, generative adversarial networks have the
following three advantages. 1) They can produce better samples than other models (the images are
sharper and clearer), and a GAN can train any kind of generative network, whereas most other
frameworks require the generative network to have some specific functional form; notably, all other
frameworks require the generative distribution to have non-zero mass everywhere. 2) Generative
adversarial networks can learn to generate points only on the thin manifolds close to the data, and
their training relies neither on the inefficient Markov chain method nor on approximate inference.
3) There is no complex variational lower bound, which greatly reduces the training difficulty and
improves the training efficiency.

Disadvantages: Although GANs can perform well at the Nash equilibrium, gradient descent can
guarantee a Nash equilibrium only in the case of convex functions. The training process requires the
two adversarial networks to remain balanced and synchronized, otherwise ideal performance cannot
be achieved; however, it is difficult to control this synchronization, so training may be unstable. In
addition, the GAN model is defined as a minimax problem with no explicit loss function, so it is
difficult to tell whether progress is being made during training. The GAN learning process may also
suffer from the collapse problem, i.e., the generator degenerates and continuously generates the same
sample points, unable to continue learning; when the generative model collapses, the discriminative
model also points in the same direction for similar sample points, and training cannot continue.
Furthermore, although the samples generated by GANs are diverse, the mode collapse problem
exists: mode collapse refers to scenarios in which the generator makes multiple images that contain
the same color or texture themes, leaving little difference for human understanding.

Prospect: In spite of these disadvantages and limitations, it is worth investigating the future
applications and progress of GANs. For example, the Wasserstein GAN (WGAN) offers two great
improvements over the initial generative adversarial network: first, WGAN can greatly overcome the
training instability problem, and second, it can partially solve the mode collapse problem at the same
time. How to completely avoid mode collapse and further optimize the training process remains a
research direction for GANs.
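To make the WGAN improvement concrete, here is a minimal NumPy sketch (my own illustration, not taken from the report; the linear score function and the sample values are invented) contrasting the two discriminator objectives on one batch: the original GAN's cross-entropy loss of Eq. (1), versus the WGAN critic loss, which drops the sigmoid and the log and instead enforces the Lipschitz restriction by clipping the critic's weights to a fixed constant:

```python
import numpy as np

rng = np.random.default_rng(0)
x_real = rng.normal(4.0, 1.0, 256)   # samples from the real data distribution
x_fake = rng.normal(0.0, 1.0, 256)   # samples from the generator

w, c = 0.3, -0.1                     # shared linear score s(x) = w*x + c

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Original GAN: D(x) = sigmoid(s(x)); minimize the cross entropy.
d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
gan_loss = -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

# WGAN: the critic outputs the raw score; its loss estimates the (negative)
# Wasserstein distance between the real and generated batches.
wgan_loss = -(np.mean(w * x_real + c) - np.mean(w * x_fake + c))

# Lipschitz restriction via weight clipping to a fixed constant clip_c.
clip_c = 0.01
w_clipped = float(np.clip(w, -clip_c, clip_c))

print(gan_loss > 0, w_clipped == clip_c)
```

Because the critic score is unbounded and unsaturated, its gradient does not vanish when the two distributions are far apart, which is the intuition behind WGAN's improved training stability.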


Furthermore, the theory of model convergence and the existence of the equilibrium point remains an
important research direction for the future. How to generate, from simple random inputs, a variety of
data that can interact with humans is another important research direction. From the perspective of
combining GANs with other methods, it would be quite meaningful to integrate GANs with feature
learning, imitation learning, and reinforcement learning to develop new AI applications and promote
the development of these methods. In the future, GANs could be used to accelerate the development
and application of AI, giving AI the ability to understand human beings and explore the world.



CONCLUSION

As an unsupervised learning method, the GAN is one of the most important research directions in deep
learning. The explosion of interest in GANs is driven not only by their potential to learn deep, highly
nonlinear mappings from a latent space into a data space, but also by their potential to make use of the
vast quantities of unlabeled data. Although our world is almost overwhelmed by data, a large part of it
is unlabeled, and therefore unavailable to most current supervised learning methods. Generative
adversarial networks, which rely on the internal confrontation between real data and models to achieve
unsupervised learning, are a glimmer of light for AI's self-learning ability. There are therefore many
opportunities for developments in both theory and algorithms, and, by using deep networks, vast
opportunities for new applications. New work is continuously being done to overcome the limitations
of GANs. For example, WGAN can resolve the mode collapse problem as well as the training
instability problem, but only partially; preventing mode collapse in GANs thus remains an open
research problem. Other research areas include the existence of the Nash equilibrium and the theory of
the convergence of a GAN model. GANs are widely utilized in the area of Computer Vision but
comparatively little in other fields such as Natural Language Processing, because of the different
properties of image and non-image data. GANs can be used for interesting applications in various
fields, so research continues both on such applications and on how to increase the efficiency and
improve the performance of GANs so that they can be applied in a much better way.



REFERENCES

[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and
Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems,
2014, pp. 2672–2680.

[2] A. Brock, J. Donahue, and K. Simonyan, "Large scale GAN training for high fidelity natural image
synthesis," arXiv preprint arXiv:1809.11096, 2018.

[3] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of GANs for improved quality,
stability, and variation," arXiv preprint arXiv:1710.10196, 2017.

[4] C. Vondrick, H. Pirsiavash, and A. Torralba, "Generating videos with scene dynamics," in
Advances in Neural Information Processing Systems, 2016, pp. 613–621.

[5] H. Chang, J. Lu, F. Yu, and A. Finkelstein, "PairedCycleGAN: Asymmetric style transfer for
applying and removing makeup," in 2018 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2018.

[6] L. Li, Y.-L. Lin, D.-P. Cao, N.-N. Zheng, and F. Wang, "Parallel learning: a new framework for
machine learning," Acta Automatica Sinica, vol. 43, no. 1, pp. 1–8, 2017.

[7] J. Nash, "Two-person cooperative games," Econometrica, vol. 21, no. 1, pp. 128–140, 1953.

[8] F.-Y. Wang, J. Zhang, Q. Wei, X. Zheng, and L. Li, "PDP: parallel dynamic programming,"
IEEE/CAA Journal of Automatica Sinica, vol. 4, no. 1, pp. 1–5, 2017.

[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional
neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[10] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P.
Nguyen, T. N. Sainath et al., "Deep neural networks for acoustic modeling in speech recognition:
The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6,
pp. 82–97, 2012.


A Review: Generative Adversarial Networks
Liang Gonog (1,2) and Yimin Zhou (1,*)
(1) Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
(2) School of Electrical Engineering, University of South China, Hengyang 421000, China
Email: {liang.gong, ym.zhou}@siat.ac.cn
2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA)

Abstract—Deep learning has achieved great success in the field of artificial intelligence, and many
deep learning models have been developed. The Generative Adversarial Network (GAN) is one of
these deep learning models; it was proposed based on zero-sum game theory and has become a new
research hotspot. The significance of the model is to obtain the data distribution through
unsupervised learning and to generate more realistic data. Currently, GANs are widely studied due
to their enormous application prospects, including image and vision computing, video and language
processing, etc. In this paper, the background of the GAN, theoretical models and extensional
variants of GANs are introduced, where the variants can further optimize the original GAN or
change its basic structure. Then the typical applications of GANs are explained. Finally, the existing
problems of GANs are summarized and the future work on GAN models is given.

Index Terms—Generative Adversarial Networks (GANs), Deep learning, Generative models, Application

I. INTRODUCTION

The generative adversarial network (GAN)[1] is an emerging generative model proposed by Ian
Goodfellow of Google Brain in 2014. As a new method of learning a generative model, the GAN
can avoid some deficiencies of traditional generation models in practical application, can subtly
optimize through adversarial learning some loss functions that are hard to deal with, and realizes
semi-supervised and unsupervised learning by implicitly modeling the high-dimensional distribution
of the data. GANs excel in various challenging tasks, such as realistic image generation[2][3], video
frame generation[4], artistic style migration[5], etc.

In this paper, we provide the background and development of the GAN proposal from the
algorithmic perspective, including the basic theory and extensional models of GANs in Section II.
Then we introduce the variants of GANs for different application fields in Section III. In Section IV,
the problems and advantages of GANs are summarized, and the future prospects of GANs are
discussed in Section V.

II. THEORY AND EXTENSIONAL MODELS OF GANS

A. Basic Theory of GANs

GANs are structurally inspired by two-person zero-sum games in game theory (i.e., the sum of the
two players' interests is zero, and the gain of one side is exactly what the other side loses). A GAN
sets up one generator and one discriminator as the participants in the game. The aim of the generator
is to learn and capture the potential distribution of the actual data samples as much as possible, and
to generate new data samples. The discriminator is a binary classifier, and its aim is to determine
whether the input data is from the actual data or from the generator. In order to win the game, the
two players need to constantly improve their capabilities to generate and discriminate, so the process
of optimal learning is a minimax game problem[6]. The aim is to find a Nash equilibrium[7] between
the two sides, so that the generator can estimate the distribution of the data samples. The computation
procedure and structure of GANs are shown in Fig. 1.

Fig. 1. The Structure of GAN

Any differentiable function can be used to represent the generator and discriminator of a GAN,
which means that the generator and discriminator can adopt deep neural networks. If the
differentiable functions D and G are used to represent the discriminator and generator respectively,
their inputs are real data x and a random variable z respectively. G(z) is a sample generated by G,
intended to obey the distribution of the real data. If the input of the discriminator comes from the
real data, it is tagged as one; if the input sample is G(z), it is tagged as zero. The goal of D is to
achieve a correct binary classification of data sources: true (from the distribution of real data x) or
false (from the fake data G(z) of the generator)[8], while the goal of G is to make its self-generated
false data G(z) perform on D the same as the real data x performing on D(x); the two goals are
mutually antagonistic. The performance of D and G can be improved through this process and
iteratively optimized[9]. When the discriminative ability of D is improved to a certain
degree and the data source can no longer be correctly identified, the generator G can be considered
to have learned the distribution of the actual data.

B. Mathematical Model and Training Method

Here, we introduce the mathematical model and the training method of GANs. First, we describe
the optimization of the discriminator D given the generator G. Similar to the training of
sigmoid-function-based classifiers, training the discriminator involves minimizing the cross
entropy[10]. The loss function is formulated as

Obj_D(θ_D, θ_G) = −(1/2) E_{x∼p_data(x)}[log D(x)] − (1/2) E_{z∼p_z(z)}[log(1 − D(G(z)))]    (1)

where x is sampled from the real data distribution p_data(x), z is sampled from the prior
distribution p_z(z) such as a uniform or Gaussian distribution, and E(·) represents the expectation.
It should be noted that the training data consists of two parts: one part from the real data
distribution p_data(x) and another part from the generated data distribution p_g(x). This is slightly
different from the conventional methods for binary classification. Given the generator, we need to
minimize Eq. (1) to obtain the optimal solution[11]. In the continuous space, Eq. (1) can be
reformulated as

Obj_D(θ_D, θ_G) = −(1/2) ∫_x p_data(x) log(D(x)) dx − (1/2) ∫_z p_z(z) log(1 − D(G(z))) dz
               = −(1/2) ∫_x [p_data(x) log(D(x)) + p_g(x) log(1 − D(x))] dx    (2)

For any (m, n) ∈ R² and y ∈ [0, 1], the expression

−m log(y) − n log(1 − y)    (3)

can achieve its minimum value at y = m/(m + n). Given the generator G, the objective function
Eq. (2) can achieve its minimum value at

D*_G(x) = p_data(x) / (p_data(x) + p_g(x))    (4)

This is the optimal solution of the discriminator D. From Eq. (4), the discriminator of the GAN can
estimate the ratio of the two probability densities, which is the key difference from Markov chain or
lower-bound based methods. On the other hand, D(x) denotes the probability of x being sampled
from the real data rather than the generated data. If the input data is real data x, the discriminator
strives to make D(x) close to 1. If the input data is generated data G(z), the discriminator strives to
make D(G(z)) close to 0 while the generator G tries to make it close to 1. Since this is a zero-sum
game between G and D, the optimization of the GAN can be formulated as a minimax problem:

min_G max_D f(D, G) = E_{x∼p_data}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))]    (5)

In summary, for learning the parameters of the GAN, we need to train the discriminator D to
maximize its accuracy in telling whether the input data is from the real data x or the generated data
G(z). In addition, we should train the generator G to minimize log(1 − D(G(z))). Here, an alternating
training method can be used: first, G is fixed and D is trained to maximize the discrimination
accuracy of D; then, D is fixed and G is trained to minimize the discrimination accuracy of D[14].
This process alternates, and the global optimal solution is achieved if and only if p_data = p_g. In
the training process, the parameters of D are updated k times while the parameters of G are updated
only once.

C. Extensional Models of GANs

Compared to other generative models, higher-quality samples (sharper and clearer images) can be
produced by GANs. A shorter runtime is consumed by a GAN to generate a sample compared to
PixelRNN[12]. A GAN can approximate arbitrary probability distributions in theory, overcoming
the problem that the final simulation results of VAEs[13] (Variational Auto-Encoders) are biased.
However, the GAN still has the problem of divergence, for the model is too free and uncontrollable.
Therefore, some improvements have been made to improve the performance of GANs.

1) Optimization of GAN: We have introduced the basic structure of the original GAN, whose
generator samples directly from a distribution without any requirement for pre-modeling, thus
achieving a theoretically complete fitting of the actual data distribution, which is the biggest
advantage of the GAN. Many GAN models realize an optimization of the GAN without changing
the original structure.

DCGAN (deep convolutional GAN)[14] is one of the extensions of the GAN. It replaces the G and
D of the original GAN with two CNNs (convolutional neural networks) without changing the basic
structure of the GAN; a strided convolution is used instead of the upsampling layer, and a
convolutional layer is used to replace the fully connected layer to increase the stability of training.

Unlike DCGAN, the replacement of the fully connected layer can be improved via WGAN
(Wasserstein GAN)[15]. The Jensen-Shannon divergence is not suitable for measuring the distance
between distributions with disjoint parts, and the Wasserstein distance is used instead to measure
the distance between the generated data distribution and the real data distribution, so the problems
of instability in training and model collapse can be partially solved. In fact, the Lipschitz restriction
in WGAN requires cutting off the absolute value of the discriminator parameters so as not to exceed
a fixed constant c. Hence, Gulrajani proposed WGAN-GP (WGAN with gradient penalty)[16], with
a gradient penalty as the replacement, so that the discriminator can learn reasonable parameters and
solve
the slow convergence problem of the WGAN.

TABLE I
THE CLASSIFICATION OF THE GANS

Category                        GANs Networks
Optimization of GAN             DCGAN[14], WGAN[15], LSGAN[17], EBGAN[19]
Different Structure from GAN    CGAN[21], Semi-supervised-GAN[22], InfoGAN[23], BigGAN[2], CycleGAN[24]

Although WGAN and WGAN-GP have basically solved the problem of training failure, both the training process and the convergence speed are slower than those of the conventional GAN. Inspired by WGAN, Mao et al. proposed the least squares GAN (LSGAN)[17]; one of its starting points is to improve the quality of the generated pictures. Its main idea is to provide the discriminator D with a loss function that has smooth and unsaturated gradients. There is another LSGAN (loss-sensitive GAN), proposed by Qi[18], in which the loss function obtained from the minimum objective function is restricted to the class of Lipschitz-continuous functions, in order to limit the modeling ability of the model and mitigate over-fitting. Similar to WGAN, EBGAN (Energy-Based GAN)[19] uses energy values from an energy model as its measure; BEGAN (Boundary Equilibrium GAN)[20] further proposes a "boundary balance" architecture based on EBGAN and WGAN, using standard training steps to achieve fast and stable convergence.

2) Different Structure from the Original GAN: The above GANs are improvements on the GAN foundation. However, the underlying GAN is sometimes insufficient in practice to meet our requirements for data generation. For example, it is sometimes necessary to generate a certain type of image instead of randomly simulating sample data, such as generating a certain text; sometimes only parts of an image need to be generated rather than the whole image, for example when removing a mosaic. Based on these real-life requirements, the structure of the original GAN model also has to be adjusted to the data to be generated.

In applications, the vast majority of data is multi-labelled, and generating data with a specified label is the contribution of the conditional GAN (cGAN)[21] to the GAN model. In the basic GAN model, the generator is driven by a string of random numbers drawn from a certain distribution. In cGAN, not only the random numbers but also the label category is spliced in and input, so the generator can generate the required data. For the discriminator, the real or generated data is likewise spliced with the corresponding label category before being input to the discriminator network for identification and judgment.

Since cGAN was proposed, many scholars have applied cGAN or improved it in follow-up work. For example, Laplacian Generative Adversarial Networks (LAPGAN)[25] combines the principles of GAN and cGAN using a tandem network: the pictures generated by the level above are used as condition variables, forming a Laplacian pyramid that generates images from rough to precise.

InfoGAN[23] can also be seen as a kind of cGAN. As its starting point, InfoGAN builds on the simplicity of GAN: it decomposes the input z of the original GAN, separating an implicit code c from the original noise z. The code c contains a variety of variables; taking the MNIST dataset as an example, it can indicate the direction of the illumination, the tilt angle of the font, the thickness of the stroke, and so on. The basic idea of InfoGAN is that if c can explain the generated G(z, c), then c should be highly correlated with G(z, c). The important significance of InfoGAN is that it extracts the structured implicit code c from the noise z, which gives the generation process a certain degree of controllability and makes the generated result interpretable to some extent.

Pix2Pix[26], based on cGAN, treats the generator as a mapping that maps an image to another desired image, pixels to pixels, so as to realize different image translation functions with the same model. This has inspired researchers to explore further. However, the fatal flaw of Pix2Pix is that its training requires pictures x and y that are paired with each other. This type of data is extremely scarce, which greatly limits the application of the model. In this regard, CycleGAN[24] proposes an image translation method that does not require paired data.

As an extension of CycleGAN, StarGAN[27] turns the pairwise mapping into a mapping among multiple domains, which is another major breakthrough in the field of image translation. In addition, StarGAN can train the same model by joint training on multiple datasets (such as the CelebA dataset, with labels such as skin color and age, and the RaFD dataset, with angry, scared, and other expression tags). Accomplishing this compression of models is a major success in the field of image translation.

At the same time, we can classify GANs into unsupervised learning, semi-supervised learning and supervised learning according to the presence or absence of category label information in the training data. Table II shows the categories of some GANs.

TABLE II
CLASSIFICATION OF THE GANS BASED ON THE LEARNING METHOD

Category                    Characteristics           Networks
Unsupervised Learning       Untagged data             InfoGAN, CycleGAN, GAN
Semi-supervised Learning    Partially tagged data     Improved-GAN[28], SGAN
Supervised Learning         Tagged data               CGAN, ACGAN, AM-GAN

III. APPLICATION

The most important power of GANs is that the networks can generate samples with the same distribution as the real data,

such as generating photorealistic images. GANs can also be used to tackle the problem of insufficient training samples for supervised or semi-supervised learning. Currently, the most successful applications of GANs are in computer vision, covering images and video, such as image translation, image coloring, image restoration, and video generation. In addition, GANs have been applied in speech and language processing, for example to generate dialogues. In this section, we discuss the application range of GANs.

A. The Application in the Image

1) Image-to-Image Translation: CycleGAN is an important application model of GANs in the image field. CycleGAN is based on two sets of images that require no pairing. It can turn a crying face into a smiling one, or change a zebra into a horse, as shown in Fig. 2. StarGAN is a further extension of CycleGAN that can translate one category into several others: a smiling face can be transformed into a crying face via StarGAN, along with a variety of other expressions such as surprise, frustration, etc.

Fig. 2. Image-to-image translation results of CycleGAN

2) Image Super-Resolution Reconstruction: The use of GANs for super-resolution addresses a shortcoming of the conventional methods, including deep learning methods, namely the lack of high-frequency information. A traditional deep CNN can only mitigate this defect through the choice of objective function; GANs, on the other hand, can solve this problem and obtain satisfying perceptual quality. SRGAN[29] uses a perceptual loss and an adversarial loss to enhance the realism of the recovered picture and realizes 4x super-resolution reconstruction. The perceptual loss operates on features extracted by a convolutional neural network: by comparing the CNN features of the generated image with those of the target image, the generated image is made more similar to the target image in semantics and style.

3) Style Transformation: Based on the characteristics of GANs (autonomous learning and random sample generation), combined with the method of adding conditional variables to generate specific samples, a number of GANs have been developed to learn from unpaired training data under unsupervised conditions. MC-GAN[30] was proposed by BAIR with a multi-content architecture that re-customizes the training network for each observed character set, rather than decorating a single training network for all possible fonts. The multi-content GAN model includes a stacked cGAN architecture for predicting rough glyph shapes and an ornamental network for predicting the final glyph colors and textures. The first network, called GlyphNet, is used to predict the glyph mask; the second, called OrnaNet, is used to fine-tune the colors and decorations of the glyphs generated by the first network. Each sub-network follows the structure of cGAN and modifies the structure to achieve the specific purpose of stylizing glyphs or decorating predictions.

4) Image Generation: The Composite GAN[31] proposed by Kwak et al. generates partial images through multiple generators and finally synthesizes the entire image. For example, in face generation, the background and the face are divided into two parts and generated by different generators; combined through an RNN[32], this can produce more realistic images. TP-GAN[33], proposed by CASIA, synthesizes a frontal face image from side-face photos, achieving state-of-the-art results. The generative network of the model is characterized by dual-path generation: one path is generated from partial facial features extracted from the side face, so that the facial features can be completed, while the other path produces a blurry full face; local patch networks then deal with the facial features so as to finally form a complete, realistic face image. BigGAN[2] applies orthogonal regularization to the generator to make it amenable to a simple truncation trick, allowing fine-tuning of the trade-off between sample fidelity and diversity by truncating the latent space. This modification allows the model to achieve the best performance in class-conditional image synthesis. When trained on ImageNet at 128x128 resolution, BigGAN achieves an Inception Score (IS) of 166.3 and a Frechet Inception Distance (FID) of 9.6, while the previous best IS and FID were only 52.52 and 18.65.

B. The Application with the Video

1) Video Frame Prediction: Mathieu et al.[34] first applied GAN training to video prediction: the generator generates the last frame of the video based on the previous series of frames, and the discriminator is used to judge that frame; all frames except the last one are real pictures. The advantage is that the discriminator can effectively use information along the time dimension, and it also helps to make the generated frame consistent with all the previous frames. Experimental results show that the frames generated by adversarial training are clearer than those of other algorithms.

2) Video Generation: Vondrick et al.[4] made great progress in the video field: realistic 32-frame videos at 64x64 resolution can be generated, depicting golf courses, beaches, train stations and newborns. After testing, 20% of the markers did not recognize the authenticity of these videos. MD-GAN (Multi-stage Dynamic GAN)[35] predicts future video frames from a given first-frame image. In its two-stage model, the first stage generates a time-lapse video with realistic content; the second stage optimizes the results of the first stage, mainly by adding dynamic

motion information to increase the realism. In order to give the resulting video vivid motion information, a Gram matrix is introduced to describe the motion information more accurately. Moreover, a large-scale time-lapse photography video dataset was built, and the model was tested on it. Using this model, realistic time-lapse videos with a resolution of 128x128 and up to 32 frames can be generated. Both qualitative and quantitative experiments demonstrate the superiority of the method over available models.

C. The Application of Human-Computer Interaction

1) Auxiliary Automatic Driving: Santana et al.[36] implemented assisted autonomous driving with a GAN. First, an image is generated that is consistent with the distribution of the official traffic-scene images; then a transition model is trained based on a recurrent neural network to predict the next traffic scene.

2) Text to Image: This field is the result of a collision between NLP (Natural Language Processing) and CV (Computer Vision). The task is described as follows: generate a picture that matches a given textual description. StackGAN[37] generates high-resolution images from text descriptions. The model decomposes the generation process into two more controllable steps: first draw the basic shape and color of the object, then correct the shortcomings of the first-stage results and add more detail. Extensive experiments are performed to prove that the method is effective. By introducing an attentional generative network, AttnGAN[38] can synthesize fine-grained details in different sub-regions of an image by focusing on the related words in the natural language description. In addition, a deep attentional multimodal similarity model is proposed to calculate a fine-grained image-text matching loss for generator training. It is the first time that a layered attentional GAN can automatically select word-level conditions to generate different parts of an image.

In addition, some researchers expect to use GAN learning methods in the fields of pharmaceutical molecules and materials science, to generate pharmaceutical molecular structures and formulations of new synthetic materials. The idea is quite creative, and if it can be realized in reality, Artificial Intelligence will seem almost omnipotent.

IV. DISCUSSION

Generative adversarial networks are quite important among generative models. GANs have great power to deal with the problem of generating data that can be interpreted naturally. In this section, we analyze the main advantages, disadvantages and prospects of generative adversarial networks.

A. Advantages

Compared with other generative models, generative adversarial networks have the following three advantages. 1) They can produce better samples than other models (the images are sharper and clearer), and GANs can train any kind of generative network, whereas most other frameworks require the generative network to have some specific functional form. It is also important that all other frameworks require the generative network to put non-zero mass everywhere, while generative adversarial networks can learn to generate points only on thin manifolds that are close to the data. 2) The training relies neither on the inefficient Markov chain method nor on approximate inference. 3) There is no complex variational lower bound, which greatly reduces the training difficulty and improves the training efficiency.

B. Disadvantages

Although GANs can perform well at a Nash equilibrium, gradient descent can guarantee reaching the Nash equilibrium only in the convex case. The training process requires balance and synchronization between the two adversarial networks, otherwise ideal performance cannot be achieved. However, it is difficult to control the synchronization of the two adversarial networks, so the training process may be unstable. In addition, the GAN model is defined as a minimax problem with no explicit loss function, so it is difficult to tell whether progress is being made during training. The GAN learning process may also suffer from the collapse problem: the generator degenerates, continually producing the same sample points and becoming unable to continue learning. When the generative model collapses, the discriminative model also points in the same direction for similar sample points, and the training cannot continue. Furthermore, although the samples generated by GANs are diverse overall, there exists the mode collapse problem. Mode collapse refers to scenarios in which the generator produces multiple images containing the same color or texture themes, which therefore show little difference to a human observer.

C. Prospect

In spite of these disadvantages and limitations, it is worth investigating the future applications and progress of GANs. For example, the Wasserstein GAN (WGAN) makes two great improvements over the initial generative adversarial network: first, it can largely overcome the training instability problem, and second, it can partially solve the mode collapse problem at the same time. How to completely avoid mode collapse and further optimize the training process remains a research direction for GANs. Furthermore, the theory of model convergence and the existence of the equilibrium point remain important research directions for the future. How to generate, from simple random inputs, a variety of data that can interact with humans is another important research direction. From the perspective of combining GANs with other methods, integrating GANs with feature learning, imitation learning, and reinforcement learning to develop new AI applications and promote the development of these methods is quite meaningful. In the future, GANs could be used to accelerate the development and application of AI, giving AI the ability to understand human beings and explore the world.
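The Wasserstein distance behind the WGAN improvements mentioned above can be illustrated on one-dimensional toy data. The following stdlib-only Python sketch is ours, for illustration (the name wasserstein_1d is not taken from any WGAN implementation); it computes the empirical 1-D Wasserstein-1 distance, which, unlike the Jensen-Shannon divergence, keeps shrinking smoothly as the generated distribution approaches the real one:

```python
import random

def wasserstein_1d(xs, ys):
    """Empirical 1-D Wasserstein-1 distance between two equal-size
    sample sets: the mean absolute difference of the sorted samples."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

random.seed(0)
real = [random.gauss(0.0, 1.0) for _ in range(5000)]       # "real" data
fake_far = [random.gauss(3.0, 1.0) for _ in range(5000)]   # poor generator
fake_near = [random.gauss(0.2, 1.0) for _ in range(5000)]  # better generator

# The distance decreases as the generated distribution moves toward
# the real one, even when the two distributions barely overlap.
print(wasserstein_1d(real, fake_far))   # roughly 3
print(wasserstein_1d(real, fake_near))  # roughly 0.2
```

This smooth behaviour is what lets the WGAN critic supply informative gradients where the original GAN loss saturates.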

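The mode collapse phenomenon described in the discussion above can be quantified crudely on toy data. A minimal sketch, assuming a known set of true modes (the mode_coverage helper is hypothetical, invented here for illustration):

```python
import random

def mode_coverage(samples, centers, tol=0.5):
    """Fraction of true modes that receive at least one generated sample
    within `tol` of the mode center: a crude mode-collapse check."""
    hit = set()
    for s in samples:
        for i, c in enumerate(centers):
            if abs(s - c) <= tol:
                hit.add(i)
    return len(hit) / len(centers)

random.seed(1)
centers = list(range(8))  # a toy real distribution with 8 modes
healthy = [random.choice(centers) + random.gauss(0, 0.1) for _ in range(400)]
collapsed = [random.choice([2, 5]) + random.gauss(0, 0.1) for _ in range(400)]

print(mode_coverage(healthy, centers))    # close to 1.0: every mode is hit
print(mode_coverage(collapsed, centers))  # low: only modes 2 and 5 are hit
```

A collapsed generator may keep fooling the discriminator while covering only a few modes, which is why coverage-style diagnostics are useful alongside sample-quality metrics.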
V. CONCLUSION

As an unsupervised learning method, GANs are one of the most important research directions in deep learning. The explosion of interest in GANs is driven not only by their potential to learn deep, highly nonlinear mappings from a latent space into a data space, but also by their potential to make use of the vast quantities of unlabeled data. Although our world is almost overwhelmed by data, a large part of it is unlabeled, which means that it is not available to most current supervised learning. Generative adversarial networks, which rely on the internal confrontation between real data and models to achieve unsupervised learning, are a glimmer of light for AI's self-learning ability. Therefore, there are many opportunities for developments in both theory and algorithms, and, by using deep networks, there are vast opportunities for new applications.

ACKNOWLEDGMENT

This work is supported under the Shenzhen Science and Technology Innovation Commission Project Grant Ref. JCYJ20160510154736343 and Ref. JCYJ20170818153635759, the Science and Technology Planning Project of Guangdong Province Ref. 2017B010117009, and the Guangdong Provincial Engineering Technology Research Center of Intelligent Unmanned System and Autonomous Environmental Perception.

REFERENCES

[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[2] A. Brock, J. Donahue, and K. Simonyan, "Large scale GAN training for high fidelity natural image synthesis," arXiv preprint arXiv:1809.11096, 2018.
[3] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of GANs for improved quality, stability, and variation," arXiv preprint arXiv:1710.10196, 2017.
[4] C. Vondrick, H. Pirsiavash, and A. Torralba, "Generating videos with scene dynamics," in Advances in Neural Information Processing Systems, 2016, pp. 613–621.
[5] H. Chang, J. Lu, F. Yu, and A. Finkelstein, "PairedCycleGAN: Asymmetric style transfer for applying and removing makeup," in 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[6] L. Li, Y.-L. Lin, D.-P. Cao, N.-N. Zheng, and F. Wang, "Parallel learning - a new framework for machine learning," Acta Automatica Sinica, vol. 43, no. 1, pp. 1–8, 2017.
[7] J. Nash, "Two person cooperative games," Econometrica, vol. 21, no. 1, pp. 128–140, 1953.
[8] F.-Y. Wang, J. Zhang, Q. Wei, X. Zheng, and L. Li, "PDP: parallel dynamic programming," IEEE/CAA Journal of Automatica Sinica, vol. 4, no. 1, pp. 1–5, 2017.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[10] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
[11] Y. Zhang, Z. Gan, and L. Carin, "Generating text via adversarial training," in NIPS Workshop on Adversarial Training, vol. 21, 2016.
[12] A. v. d. Oord, N. Kalchbrenner, and K. Kavukcuoglu, "Pixel recurrent neural networks," arXiv preprint arXiv:1601.06759, 2016.
[13] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," arXiv preprint arXiv:1312.6114, 2013.
[14] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," Computer Science, 2015.
[15] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein GAN," arXiv preprint arXiv:1701.07875, 2017.
[16] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, "Improved training of Wasserstein GANs," in Advances in Neural Information Processing Systems, 2017, pp. 5767–5777.
[17] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley, "Least squares generative adversarial networks," 2016.
[18] G. J. Qi, "Loss-sensitive generative adversarial networks on Lipschitz densities," 2017.
[19] J. Zhao, M. Mathieu, and Y. LeCun, "Energy-based generative adversarial network," 2017.
[20] D. Berthelot, T. Schumm, and L. Metz, "BEGAN: Boundary equilibrium generative adversarial networks," 2017.
[21] M. Mirza and S. Osindero, "Conditional generative adversarial nets," Computer Science, pp. 2672–2680, 2014.
[22] T. Chavdarova and F. Fleuret, "SGAN: An alternative training of generative adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2018.
[23] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, "InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets," 2016.
[24] J. Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," pp. 2242–2251, 2017.
[25] E. Denton, S. Chintala, A. Szlam, and R. Fergus, "Deep generative image models using a Laplacian pyramid of adversarial networks," in International Conference on Neural Information Processing Systems, 2015.
[26] P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," pp. 5967–5976, 2016.
[27] Y. Choi, M. Choi, M. Kim, J. W. Ha, S. Kim, and J. Choo, "StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation," 2017.
[28] A. Odena, "Semi-supervised learning with generative adversarial networks," arXiv preprint arXiv:1606.01583, 2016.
[29] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang et al., "Photo-realistic single image super-resolution using a generative adversarial network," in CVPR, vol. 2, no. 3, 2017, p. 4.
[30] H. Park, Y. Yoo, and N. Kwak, "MC-GAN: Multi-conditional generative adversarial network for image synthesis," 2018.
[31] H. Kwak and B. T. Zhang, "Generating images part by part with composite generative adversarial networks," 2016.
[32] Z. C. Lipton, J. Berkowitz, and C. Elkan, "A critical review of recurrent neural networks for sequence learning," Computer Science, 2015.
[33] R. Huang, S. Zhang, T. Li, and R. He, "Beyond face rotation: Global and local perception GAN for photorealistic and identity preserving frontal view synthesis," pp. 2458–2467, 2017.
[34] M. Mathieu, C. Couprie, and Y. LeCun, "Deep multi-scale video prediction beyond mean square error," arXiv preprint arXiv:1511.05440, 2015.
[35] W. Xiong, W. Luo, L. Ma, W. Liu, and J. Luo, "Learning to generate time-lapse videos using multi-stage dynamic generative adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2364–2373.
[36] E. Santana and G. Hotz, "Learning a driving simulator," arXiv preprint arXiv:1608.01230, 2016.
[37] H. Zhang, T. Xu, H. Li, S. Zhang, X. Huang, X. Wang, and D. Metaxas, "StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks," arXiv preprint, 2017.
[38] T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He, "AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks," arXiv preprint, 2017.

Proceedings of the Fifth International Conference on Communication and Electronics Systems (ICCES 2020)
IEEE Conference Record # 48766; IEEE Xplore ISBN: 978-1-7281-5371-1

A Detailed Study on Generative Adversarial Networks

Shailender Kumar
Dept. of Computer Science and Engineering, Delhi Technological University, Delhi, India
shailenderkumar@dce.ac.in

Sumit Dhawan
Dept. of Computer Science and Engineering, Delhi Technological University, Delhi, India
sumitdhawan6@gmail.com

Abstract—Over the past decades, along with the increase in computing power, different generative models have been developed in the area of machine learning. Among all such generative models, one very popular model, called Generative Adversarial Networks, has been introduced and researched upon in only the past few years. It is based on the concept of two adversaries constantly trying to outperform each other. The objective of this review is to study the GAN-related literature extensively and provide a summary of the available literature on GANs, including the concept behind them, their objective function, proposed modifications to the base model, and recent trends in this field. This paper will help in giving a thorough understanding of GANs. It gives an overview of GANs and discusses the popular variants of this model, its common applications, different evaluation metrics proposed for it, and finally its drawbacks, the conclusions of the paper and the future course of action.

Keywords—deep learning; generative adversarial networks; neural network; generative model; image processing

I. INTRODUCTION

There have been many advancements in the field of machine learning, especially deep learning, as more and more computing or processing power becomes available. Deep learning helps us extract high-level, useful and abstract features from the input data and use those features in classification and generation tasks. This approach is commonly known as representation learning and is modeled on how the human mind learns. The concept of generative models (based on deep learning) forms the basis of Generative Adversarial Networks (or GANs). Traditional generative models like the Restricted Boltzmann Machine [15] and the Variational Auto-Encoder [14] were based on concepts like Markov chains and maximum likelihood estimation. Based on the distribution of the input data, they estimate the distribution of the generated data, but since they do not generalize well, their performance and outcome are affected. A new concept in the field of generative models, called Generative Adversarial Networks (GANs), was introduced in the year 2014 by Goodfellow et al. [1]. It consists of one generator and one discriminator, which are adversaries of each other, constantly trying to outperform each other and thus improving themselves in the process of doing so. GANs are based on learning the joint probability distribution.

Image synthesis is a problem which has been researched upon a lot. It means generating an image based on the hidden and visible features of an already existing image. GANs are used largely in the area of image synthesis (or image processing in general), as they have proven to work really well with images. GANs consist of two models which are trained simultaneously against each other. The Generator is tasked with generating new data points based on how the data points are distributed in the input samples, and thus with deceiving the Discriminator into accepting the generated (fake) data points as real. The Discriminator is tasked with catching the bluff of the Generator by classifying the received data points as generated or real (taken directly from the input sample space). It is similar to a zero-sum game between two opponents. The models are trained using back-propagation [17] and dropout (to prevent over-fitting).

GANs can be classified based on their architecture and the loss function used to train the generator [16]. It has been shown that GANs can produce high-quality images which seem convincing enough to be considered real. GANs have grown exponentially since they were first proposed in 2014. As of now, there are many variants of generative adversarial networks, each created for a specific problem in research areas like image synthesis, style transfer, image enhancement, image-to-text conversion, text-to-image conversion, object detection, etc., making GANs a hot topic for research as well as for real-world applications. Despite all the growth and research going on, GANs suffer from some problems and shortcomings [9] like vanishing gradients, mode collapse, non-convergence, and the absence of universal performance evaluation metrics. Various solutions have been proposed for the above-mentioned drawbacks.

This survey analyses and presents the theory, applications, shortcomings and state-of-the-art variants of GANs, along with recent trends in this field. The survey is organized as follows. Section II defines the background, structure and loss function of GANs. A comparison of variants of GANs is presented in Section III. Section IV discusses different evaluation metrics. The applications of GANs are given in Section V. Section VI

978-1-7281-5371-1/20/$31.00 ©2020 IEEE 641


discusses the shortcomings of GANs. Finally, Section VII gives the conclusions and future scope.

II. THEORY AND STRUCTURE OF GAN

A. Basic Theory

Behind the motivation for GANs is the Nash equilibrium of Game Theory [1], which is a solution for a non-cooperative game between two adversaries, in which each player already knows all the strategies of the other player, so that no player gains anything by modifying their own strategy.

Fig. 1. Structure of Generative Adversarial Networks

The main aim of GAN training is to arrive at this Nash equilibrium. Fig. 1 depicts the basic structural model of a standard GAN. Any differentiable function can be used as the function in the equations of the generator as well as the discriminator. Here, G and D are two differentiable functions that represent the generator and the discriminator respectively. The inputs are x (real data), given to D, and z (random data or, simply, noise), given to G. The output of G is fake data produced as per the probability distribution of the actual data (or pdata), which is termed G(z). If actual data is given as input to the Discriminator, it should classify the input as real data, labeling it 1. If fake or generated data is given as input to the Discriminator, it should classify the input as fake data, labeling it 0. The Discriminator strives to classify the input data correctly as per its source; the Generator, on the other hand, strives to deceive the Discriminator by making the generated data G(z) similar to, and in line with, the real data x. This game-like adversarial process gradually improves the performance of both the Discriminator and the Generator throughout training. Therefore, slowly, the Generator becomes able to generate better images that look more real, because it has to fool an improved and more efficient Discriminator (compared to the previous training iteration).

B. Loss Function

In this part of the section, the loss function and learning methodology of GANs are discussed. The following is the mathematical representation of the loss function (or objective function) of GANs, as discussed in [1]:

min_G max_D V(D, G) = E_x~p(x)[log D(x)] + E_z~p(z)[log(1 - D(G(z)))]    (1)

where
x - obtained by sampling the distribution of real data, p(x);
z - obtained by sampling the distribution of prior data, p(z) (which can be a Gaussian or uniform distribution).

The main aim of the Discriminator is to maximize the objective function by minimizing D(G(z)) along with maximizing D(x), which signifies that the Discriminator is classifying real data as real and fake data as fake. The main aim of the Generator is to minimize the objective function by maximizing D(G(z)), which signifies that the Discriminator is classifying generated (fake) data as real. The training data consists of real data as well as generated data. The Discriminator is initially trained on real as well as fake data. After that, its weights are made non-trainable, and the Generator is trained. This process is repeated until the required level of performance is achieved. In the initial phase of learning, due to the poor performance of the Generator, the Discriminator can catch the lie and reject generated samples easily, as they are easily distinguishable from the real samples. Therefore, log(1 - D(G(z))) saturates and its derivative tends to 0, preventing gradient descent from occurring. To correct this, the Generator is instead trained to maximize the value of log(D(G(z))) rather than minimize the value of log(1 - D(G(z))). This modified objective function results in a large derivative and provides much better gradients in the initial phases of learning.

III. GAN MODELS

Since the idea of GAN was proposed, many variations of the original model have been made, resulting in different GAN models. These variations include improvements of the structure for efficiency and changes made for a specific application or problem statement, such as style transfer from one image into another [18], image enhancement [13], completing an incomplete image [20-21], and generating an image from text [19].

A. Fully Connected or Vanilla GAN

It was first introduced as a baseline model for GANs in 2014 by Goodfellow et al. [1]. As the overlap between the probability distributions of the actual data and the fake data is very small, the measure of similarity between the two distributions, known as the Jensen-Shannon divergence, may become constant, which results in the vanishing gradient problem. It does not perform that well in the case of more complex images.

B. Deep Convolutional GAN (DCGAN)

Another variant of GANs was proposed by Radford et al. (2015) [4], known as Deep Convolutional Generative Adversarial Networks (DCGANs), which is based on Convolutional Neural Networks with the following changes:
 Removing fully connected hidden layers.
 In the generator, fractional-strided convolutions are used instead of pooling layers. In the discriminator, strided convolutions are used instead of pooling layers.
 Using batch normalization in the generator as well as the discriminator.
 Applying the ReLU activation function in all but the last
E[x] - Expected value of any random variable x.
layer of generative model. LeakyReLU activation
D(x) - The probability that x is sampled from actual data and
function is applied in each and every layer of the
not from the generated data.
discriminator.

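The saturation argument above can be sketched numerically. The snippet below is an illustrative toy, not the training code of [1] (the function names are ours): it evaluates the value function in (1) and compares the gradient of the original generator loss log(1 − D(G(z))) with that of the modified objective log(D(G(z))), both taken with respect to the discriminator's sigmoid logit.

```python
import numpy as np

def value_function(d_real, d_fake):
    """V(D, G) from Eq. (1): E[log D(x)] + E[log(1 - D(G(z)))]."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Gradients w.r.t. the discriminator logit a, where D(G(z)) = sigmoid(a).
def saturating_grad(a):
    """d/da log(1 - sigmoid(a)) = -sigmoid(a): the original generator loss."""
    return -sigmoid(a)

def non_saturating_grad(a):
    """d/da log(sigmoid(a)) = 1 - sigmoid(a): the modified objective."""
    return 1.0 - sigmoid(a)

# Early in training the Discriminator rejects fakes confidently, so the
# logit for a generated sample is very negative and D(G(z)) is near 0.
a = -6.0
print(abs(saturating_grad(a)))      # ~0.0025: the gradient has all but vanished
print(abs(non_saturating_grad(a)))  # ~0.9975: the Generator still gets a signal
```

This is exactly the behaviour described in the text: with the original loss the early-training gradient shrinks toward 0, whereas the modified objective keeps it close to 1.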
978-1-7281-5371-1/20/$31.00 ©2020 IEEE 642
Proceedings of the Fifth International Conference on Communication and Electronics Systems (ICCES 2020)
IEEE Conference Record # 48766; IEEE Xplore ISBN: 978-1-7281-5371-1
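A quick way to see what replacing pooling with fractionally strided (transposed) convolutions buys the DCGAN generator is to track feature-map sizes using the standard transposed-convolution output formula out = (in − 1)·stride − 2·padding + kernel. The kernel/stride/padding values below (4, 2, 1) are a common DCGAN-style choice, assumed here for illustration rather than quoted from [4].

```python
def tconv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a square 2-D transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel

# A DCGAN-style generator upsamples a small feature map to image resolution
# by stacking fractionally strided convolutions instead of using pooling.
size = 4  # e.g. a 4x4 map produced by projecting the latent vector z
for layer in range(4):
    size = tconv_out(size)
print(size)  # 4 -> 8 -> 16 -> 32 -> 64
```

Each layer doubles the spatial resolution, which is why four such layers turn a 4x4 projection of z into a 64x64 image without any pooling.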
C. Wasserstein GAN (WGAN)
Wasserstein GAN (WGAN) was proposed by Arjovsky et al. [5] to resolve the vanishing-gradient issue. Instead of the Jensen-Shannon divergence, it uses the Earth-Mover distance to calculate the similarity between the probability distributions of real and generated data. The discriminator is replaced by a critic function f, which must satisfy a Lipschitz constraint. Although WGAN makes GAN training comparatively stable, it still produces samples of low quality and sometimes fails to converge.

D. Wasserstein GAN with Gradient Penalty (WGAN-GP)
This model, proposed by Gulrajani et al. [12], can be considered an improvement on the basic WGAN. The authors found that weight clipping, used to enforce the Lipschitz constraint on the critic, leads to training failures and abnormal behavior. As an alternative, they enforce the Lipschitz constraint by penalizing the norm of the gradient of the critic function f with respect to its input, instead of clipping the weights. This method tends to converge quickly and generates samples of relatively high quality compared to the weight-clipping version of WGAN.

E. Least Square GAN (LS-GAN)
Conventional GANs use the sigmoid cross-entropy loss function during back-propagation. However, this loss function can cause vanishing gradients during learning. As a solution, Mao et al. [22] proposed LS-GAN, which uses a least-squares loss function. LS-GANs generate higher-quality images and are more stable throughout the learning process than standard GANs.

F. Semi-GAN (SGAN)
In a basic GAN, only one class label is used during training, specifying the data source (real or fake). Odena [10] proposed the Semi-GAN (SGAN), which additionally provides the class labels (say, N of them) of the actual data while the discriminator D is trained. After training, D classifies its input into one of N+1 classes, where the extra class indicates that the data came from G. This method yields a better classifier and generates samples of higher quality than a standard GAN, since it is trained to learn features and associate them with the correct class labels.

G. Conditional GAN (C-GAN)
In a conventional GAN, only the latent vector is provided to the generator. The conditional GAN, proposed by Mirza [11], additionally provides a parameter (label y) along with the latent vector, and trains the generator to produce the corresponding images. The discriminator receives real images together with their labels as input, which helps it distinguish real images better. This model can produce digits similar to those of the MNIST dataset, conditioned on the class labels (0, 1, 2, 3…9); hence the name Conditional GAN.

H. Bidirectional GAN (BiGAN)
In normal GANs, the generator learns a mapping from the latent feature vector to the real data distribution, but there is no efficient mechanism for mapping real data back into the latent space. Bidirectional GAN (BiGAN), proposed by Donahue et al. [23], maps the actual data distribution onto the latent space, thereby helping the model learn how to extract relevant features.

IV. EVALUATION METRICS

Apart from manual inspection of the samples produced by the Generator module of a GAN, there are quantitative measures such as Average Log-Likelihood [24], Inception Score (IS) [9], the Wasserstein metric [5], and the Frechet Inception Distance (FID) [25]. IS should be high and FID should be low for high-quality generated images. Based on these evaluation metrics, a GAN model's performance can be better judged, and thus improved by introducing the required modifications into the model.

Log-likelihood (closely related to the Kullback-Leibler divergence, or KL divergence) is one of the widely used standard metrics for evaluating the performance of generative models [24]. It measures how consistent the generated distribution is with the actual data distribution. Maximum likelihood (i.e., 0 KL divergence) would produce perfect samples that seem real.

The Inception Score is probably the most widely used evaluation metric for GANs; it was proposed by Salimans et al. [9]. A pre-trained neural network (for example, the Inception v3 model) is used to capture the relevant properties of generated samples: image quality and image diversity. High image quality corresponds to a narrow label distribution for each sample, while high image diversity corresponds to a uniform distribution of labels over all samples combined. When the two distributions are very dissimilar (one narrow, one uniform), the KL divergence is high, giving a good Inception Score.

KL divergence = KL(C‖M) = Σ_y p(y|x) (log p(y|x) − log p(y))    (2)

where,
C is the conditional probability distribution, p(y|x), and
M is the marginal probability distribution, p(y).

The Inception Score does not measure how the generated images vary relative to the actual images, so Heusel et al. [25] introduced the Frechet Inception Distance. It maps generated data into a feature space obtained from the last pooling layer of the network, just before the output classification. Each distribution is summarized by the mean and covariance of these features for fake and actual data. The Frechet Inception Distance (also known as the Wasserstein-2 distance) between these Gaussian distributions then measures the quality and variation of the produced data.
As of yet, there is no unanimous decision on which evaluation measure is best. Different scores focus on different aspects of the image-generation process, but some metrics seem more plausible than others; FID, for example, has the advantage of being more robust to noise. As claimed in [25], FID can tell us how similar real and fake images are, and this measure is considered more effective than IS.

V. APPLICATIONS OF GANS

GANs are widely used in fields such as image processing and natural language processing. The following are a few of the main applications of GANs, beyond the classical application of generating data similar to the input data.
1) Inter-domain conversion of the content of one image to another using CGAN was proposed by Isola et al. [8]. Named pix2pix, it is effective at generating photos from label maps and at colorizing images.
2) Zhu et al. [18] propose CycleGAN, which can also translate an image from one domain to another, in the absence of paired training examples. It can perform style transfer, photo enhancement, object transfiguration, attribute transfer, etc.
3) A solution to the problem of image super-resolution was provided in [13], where Ledig et al. present SRGAN, using a VGG network [26] as the discriminator. It is shown in [13] that photorealistic, high-resolution images can be generated. However, the texture SRGAN generates does not look very real and is noisy, so Wang et al. [29] propose Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN), which improves the network architecture.
4) Zhang et al. [28] show that a GAN can generate realistic-seeming sentences by utilizing long short-term memory (LSTM). Li et al. [30] use GANs for neural dialogue generation by capturing the relevance of the dialogue. SeqGAN [31] uses reinforcement learning to generate not only speech but even poems and music.
5) Antipov et al. [27] introduced a GAN-based model for automatic face aging, producing synthetic images. Similar work modifies facial attributes, but here the emphasis is on preserving the basic facial attributes and the original person's identity in the aged version of the face.

Fig. 2. Few examples of applications of GANs [8]

VI. DRAWBACKS

The main shortcoming of the GAN training process is the ever-present risk of mode collapse. It occurs when the data produced by a GAN is concentrated on very few modes, or sometimes even a single mode, resulting in comparatively less diverse samples. One solution is batch or mode regularization, in which reasonably sized batches of data samples covering various data points are included so that diversity is improved. Another solution is proposed in [33], in which the samples generated by different models are combined. Moreover, optimizing the objective function (as done in WGAN [5]) can also mitigate the problem.

The lack of stability in the training process is a big issue in itself. A GAN's parameters can oscillate and destabilize, never converging to the Nash equilibrium: each opponent keeps countering the other's actions, making the models harder to converge. In contrast to other generative models, evaluating GANs is much more challenging, as there is no consensus on which evaluation metrics should be used [34]. Another shortcoming is the problem of diminished gradients: if the discriminator becomes very powerful during training, the gradient of the generator's loss shrinks and slowly vanishes, so the generator learns nothing. This imbalance between the generator and discriminator models can result in overfitting. GANs are also very sensitive to the selection of hyperparameters.

VII. CONCLUSION AND FUTURE SCOPE

This paper analyses and summarizes the background of GANs: the basic theory, structure, variants, evaluation metrics, applications, shortcomings, and future scope. It gives a broad understanding of the literature on GANs and shows a wide variety of their applications. New and better solutions to the existing challenges of GANs are yet to be explored, with the aim of increasing their efficiency. The field of GANs is a promising research area, but even with all its development, it has problems such as unstable training, non-convergence, lack of consensus on an evaluation metric, high computing requirements, and model complexity. In conclusion, GAN is an interesting and useful research field with many applications, but much work remains to overcome the existing challenges, as it is still a comparatively new field.

New work is continuously being done to overcome the limitations of GANs. For example, WGAN can resolve the mode-collapse and training-instability problems, but only partially; preventing mode collapse in GANs therefore remains an open research problem. Other research areas include the existence of the Nash equilibrium and the theory of convergence of a GAN model. GANs are widely utilized in computer vision but comparatively little in other fields, such as natural language processing. This limitation is
because of the different properties of image and non-image data. GANs can be used for interesting applications in various fields, so research continues both on such applications and on how to increase the efficiency and improve the performance of GANs.

REFERENCES

[1] I. Goodfellow et al., "Generative adversarial nets," in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680. [Online]. Available: http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
[2] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. New York, NY, USA: MIT Press, 2016.
[3] I. Goodfellow, "NIPS 2016 tutorial: Generative adversarial networks," arXiv:1701.00160, 2016.
[4] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv:1511.06434, 2015.
[5] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein GAN," arXiv:1701.07875, 2017.
[6] L. J. Ratliff, S. A. Burden, and S. S. Sastry, "Characterization and computation of local Nash equilibria in continuous games," in Proc. 51st Annu. Allerton Conf. Communication, Control, and Computing (Allerton), Monticello, IL, USA, 2013, pp. 917–924.
[7] S. Hitawala, "Comparative study on generative adversarial networks," arXiv:1801.04271, 2018.
[8] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proc. CVPR, 2017.
[9] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, "Improved techniques for training GANs," in Advances in Neural Information Processing Systems, 2016, pp. 2234–2242.
[10] A. Odena, "Semi-supervised learning with generative adversarial networks," arXiv:1606.01583, 2016.
[11] M. Mirza and S. Osindero, "Conditional generative adversarial nets," arXiv:1411.1784, 2014.
[12] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, "Improved training of Wasserstein GANs," arXiv:1704.00028, 2017.
[13] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken et al., "Photo-realistic single image super-resolution using a generative adversarial network," in Proc. CVPR, vol. 2, no. 3, p. 4, 2017.
[14] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv:1312.6114, 2014.
[15] A. Fischer and C. Igel, "An introduction to restricted Boltzmann machines," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, L. Alvarez et al., Eds. Springer, 2012, pp. 14–36.
[16] Z. Wang et al., "Generative adversarial networks: A survey and taxonomy," arXiv:1906.01529, 2019.
[17] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533–536, Oct. 1986.
[18] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," arXiv:1703.10593v6, 2017.
[19] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, "Generative adversarial text to image synthesis," arXiv:1605.05396, 2016.
[20] Z. Chen, S. Nie, T. Wu, and C. G. Healey, "High resolution face completion with multiple controllable attributes via fully end-to-end progressive generative adversarial networks," arXiv:1801.07632, 2018.
[21] Y. Li, S. Liu, J. Yang, and M.-H. Yang, "Generative face completion," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017, pp. 3911–3919.
[22] X. Mao et al., "Least squares generative adversarial networks," arXiv:1611.04076, 2017.
[23] J. Donahue et al., "Adversarial feature learning," arXiv:1605.09782, 2017.
[24] H. Eghbal-zadeh and G. Widmer, "Likelihood estimation for generative adversarial networks," 2017.
[25] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," in Advances in Neural Information Processing Systems, 2017, pp. 6629–6640.
[26] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556, 2015.
[27] G. Antipov et al., "Face aging with conditional generative adversarial networks," 2017.
[28] Y. Z. Zhang, Z. Gan, and L. Carin, "Generating text via adversarial training," in Proc. Workshop on Adversarial Training, Barcelona, Spain, 2016.
[29] X. Wang et al., "ESRGAN: Enhanced super-resolution generative adversarial networks," arXiv:1809.00219, 2018.
[30] J. W. Li, W. Monroe, T. L. Shi, S. Jean, A. Ritter, and D. Jurafsky, "Adversarial learning for neural dialogue generation," arXiv:1701.06547, 2017.
[31] L. T. Yu, W. N. Zhang, J. Wang, and Y. Yu, "SeqGAN: Sequence generative adversarial nets with policy gradient," arXiv:1609.05473, 2016.
[32] N. Killoran et al., "Generating and designing DNA with deep generative models," arXiv:1712.06148, 2017.
[33] A. Ghosh, V. Kulharia, V. P. Namboodiri, P. H. S. Torr, and P. K. Dokania, "Multi-agent diverse generative adversarial networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8513–8521.
[34] A. Borji, "Pros and cons of GAN evaluation measures," arXiv:1802.03446, 2018.
[35] T. Vijayakumar, "Comparative study of capsule neural network in various applications," Journal of Artificial Intelligence, vol. 1, no. 1, pp. 19–27, 2019.
|| JAI SRI GURUDEV ||
Sri Adichunchanagiri Shikshana Trust(R)
SJB Institute of Technology
No. 67, BGS Health & Education City, Dr. Vishnuvardhan Road Kengeri, Bengaluru -560060
Department of Computer Science and Engineering
Technical Seminar Outcome
Seminar Title: Generative Adversarial Network Year 2021-22
Sl No. | Course Outcomes | Applicable PO's and PSO's | Justification
1. | Acquire, establish and emphasize the information from literature and beyond of upcoming technologies. | |
2. | Based on the engineering knowledge, analyze the comprehensive solution related to societal, health and safety. | |
3. | To impart skills in report writing describing the paper and results. | |
4. | Ability to work independently and demonstrate for effective collection, analyze and organize scientific information. | |
Signature of Students Signature of Guide
Signature of Students Signature of Guide