Abstract—Over the past decades, along with the increase in computing power, various generative models have been developed in the area of machine learning. Among these, one very popular generative model, called Generative Adversarial Networks (GANs), has been introduced and researched upon in the past few years. It is based on the concept of two adversaries constantly trying to outperform each other. The objective of this review is to extensively study the GAN-related literature and provide a summarized form of it, including the concept behind GANs, their objective function, proposed modifications to the base model, and recent trends in this field. This paper will help in giving a thorough understanding of GANs. It gives an overview of GANs and discusses the popular variants of the model, its common applications, the different evaluation metrics proposed for it, and finally its drawbacks, the conclusions of the paper and the future course of action.

Keywords—deep learning; generative adversarial networks; neural network; generative model; image processing

I. INTRODUCTION

There have been many advancements in the field of machine learning, especially deep learning, as more and more computing power becomes available. Deep learning helps in extracting high-level, useful and abstract features from the input data and using those features in classification and generation tasks. This approach is commonly known as representation learning and is based on how the human mind learns. The concept of generative models (based on deep learning) forms the basis of Generative Adversarial Networks (or GANs). Traditional generative models like the Restricted Boltzmann Machine [15] and the Variational Auto-Encoder [14] are based on concepts like Markov chains and maximum likelihood estimation. Based on the distribution of the input data, they estimate the distribution of the generated data, but because they do not generalize well, their performance and output quality suffer. A new concept in the field of generative models, called Generative Adversarial Networks (GANs), was introduced in 2014 by Goodfellow et al. [1]. A GAN consists of one generator and one discriminator, which are adversaries of each other and thus constantly try to outperform each other, improving themselves in the process of doing so. GANs are based on learning the joint probability distribution.

Image synthesis is a problem that has been researched upon extensively. It means generating an image based on the hidden and visible features of an already existing image. GANs are being used largely in the area of image synthesis (or image processing in general), as they have been proven to work really well with images. GANs consist of two models which are trained simultaneously against each other. The generator is tasked with generating new data points based on how the data points in the input samples are distributed, and thus deceiving the discriminator into accepting the generated (fake) data points as real. The discriminator is tasked with catching the generator's bluff by classifying the received data points as generated or real (taken directly from the input sample space). This is similar to a zero-sum game between two opponents. The models are trained using back-propagation [17] and dropout (to prevent over-fitting).

GANs can be classified based on their architecture and the loss function used to train the generator [16]. It has been shown that GANs can produce high-quality images which seem convincing enough to be considered real. GANs have grown exponentially since they were first proposed in 2014. As of now, there are many variants of generative adversarial networks, each created for a specific problem in research areas like image synthesis, style transfer, image enhancement, image-to-text conversion, text-to-image conversion, object detection, etc., making GANs a hot topic for research as well as for real-world applications. Despite all the growth and research going on, GANs suffer from some problems and shortcomings [9] like vanishing gradients, mode collapse, non-convergence, and the absence of universal performance evaluation metrics. Various solutions have been proposed for these drawbacks.

This survey analyses and presents the theory, applications, shortcomings, state-of-the-art variants of GANs and recent trends in this field. The survey is organized as follows. Section II defines the background, structure and loss function of GANs. A comparison of GAN variants is presented in Section III. Section IV discusses different evaluation metrics. The applications of GANs are given in Section V. Section VI discusses the shortcomings of GANs. Finally, Section VII gives the conclusions and future scope.
Authorized licensed use limited to: University of Exeter. Downloaded on July 15,2020 at 23:56:25 UTC from IEEE Xplore. Restrictions apply.
Proceedings of the Fifth International Conference on Communication and Electronics Systems (ICCES 2020)
IEEE Conference Record # 48766; IEEE Xplore ISBN: 978-1-7281-5371-1
II. THEORY AND STRUCTURE OF GAN

A. Basic Theory

The motivation behind GANs is the Nash equilibrium of game theory [1], which is a solution for a non-cooperative game between two adversaries in which each player already knows all the strategies of the other player, so that no player gains anything by modifying their own strategy.

Fig. 1. Structure of Generative Adversarial Networks

The main aim of GAN training is to arrive at this Nash equilibrium. Fig. 1 depicts the basic structural model of a standard GAN. Any differentiable function can be used for the generator as well as the discriminator. Here, G and D are two differentiable functions that represent the generator and the discriminator respectively. The inputs are x (real data) and z (random data or, simply, noise). The output of G is fake data produced according to the probability distribution of the actual data (p_data), which is termed G(z). If actual data is given as input to the discriminator, it should classify it as real, labeling it 1. If fake (generated) data is given as input to the discriminator, it should classify it as fake, labeling it 0. The discriminator strives to classify the input data correctly according to its source; the generator, on the other hand, strives to deceive the discriminator by making the generated data G(z) similar to, and in line with, the real data x. This game-like adversarial process gradually improves the performance of both the discriminator and the generator. Thus, slowly, the generator is able to generate better images that look more real, because it has to fool the improved, more efficient discriminator (compared with the previous training iteration).

B. Loss Function

In this part of the section, the loss function and learning methodology of GANs are discussed. The mathematical representation of the loss function (or objective function) of GANs, as discussed in [1], is:

min_G max_D V(D, G) = E_x~p(x)[log D(x)] + E_z~p(z)[log(1 − D(G(z)))]   (1)

where
x - obtained by sampling the distribution of real data, p(x);
z - obtained by sampling the prior distribution, p(z) (which can be a Gaussian or uniform distribution);
E[x] - expected value of a random variable x;
D(x) - the probability that x was sampled from the actual data and not from the generated data.

The main aim of the discriminator is to maximize the objective function by maximizing D(x) and minimizing D(G(z)), which signifies that the discriminator classifies real data as real and fake data as fake. The main aim of the generator is to minimize the objective function by maximizing D(G(z)), which signifies that the discriminator classifies generated (fake) data as real. The training data consists of real as well as generated data. The discriminator is first trained on real as well as fake data; after that, its weights are made non-trainable, and the generator is trained. This process is repeated until the required level of performance is achieved. In the initial phase of learning, due to the poor performance of the generator, the discriminator can catch the lie and reject generated samples easily, as they are easily distinguishable from the real samples. As a result, log(1 − D(G(z))) saturates and its derivative tends to 0, preventing gradient descent from making progress. To correct this, the generator is instead trained to maximize the value of log(D(G(z))) rather than minimize the value of log(1 − D(G(z))). This modified objective function results in the derivative being large and provides much better gradients in the initial phases of learning.

III. GAN MODELS

Since the idea of GAN was proposed, many variations of the original model have been developed, resulting in different GAN models. These variations include improvements to the structure for efficiency, and changes made for specific applications or problem statements such as style transfer from one image to another [18], image enhancement [13], completing an incomplete image [20-21], and generating an image from text [19].

A. Fully Connected or Vanilla GAN

This was first introduced as the baseline GAN model in 2014 by Goodfellow et al. [1]. When the overlap between the probability distributions of actual data and fake data is very small, the measure of similarity between the two distributions, also known as the Jensen-Shannon divergence, may become constant, which results in the problem of vanishing gradients. It does not perform that well in the case of more complex images.

B. Deep Convolutional GAN (DCGAN)

Another variant of GANs was proposed by Radford et al. (2015) [4], known as the Deep Convolutional Generative Adversarial Network (DCGAN), which is based on convolutional neural networks with the following changes:
• Removing fully connected hidden layers.
• In the generator, fractional-strided convolutions are used instead of pooling layers. In the discriminator, strided convolutions are used instead of pooling layers.
• Using batch normalization in the generator as well as the discriminator.
• Applying the ReLU activation function in all but the last layer of the generator. The LeakyReLU activation function is applied in every layer of the discriminator.
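As a concrete illustration of the fractional-strided (transposed) convolutions mentioned above, the following sketch computes the spatial sizes produced by a stack of such layers. The kernel/stride/padding values and the 4×4-to-64×64 progression are assumptions matching a common DCGAN-style configuration, not code from [4]:

```python
def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Output size of a square 2-D transposed (fractional-strided) convolution."""
    # out = (in - 1) * stride - 2 * padding + kernel
    return (size - 1) * stride - 2 * padding + kernel

# A DCGAN-style generator projects the noise vector z to a small 4x4
# feature map and then upsamples it with transposed convolutions,
# doubling the spatial size at each layer instead of using pooling.
size, sizes = 4, [4]
for _ in range(4):
    size = conv_transpose_out(size)
    sizes.append(size)

print(sizes)  # [4, 8, 16, 32, 64]
```

With kernel 4, stride 2 and padding 1, each layer exactly doubles the resolution, which is why this combination is a frequent choice for GAN generators.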
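The vanishing-gradient argument behind the non-saturating generator loss of Section II-B can be checked numerically. The sketch below (plain Python, an illustration rather than any reference implementation) differentiates both generator objectives with respect to D(G(z)) when the discriminator confidently rejects fakes:

```python
import math

def g_loss_saturating(d_fake):
    # Original generator objective: minimize log(1 - D(G(z)))
    return math.log(1.0 - d_fake)

def g_loss_nonsaturating(d_fake):
    # Modified objective: maximize log D(G(z)), i.e. minimize -log D(G(z))
    return -math.log(d_fake)

def grad(f, x, eps=1e-6):
    # Central-difference numerical derivative with respect to D(G(z))
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Early in training the discriminator easily rejects fakes: D(G(z)) ~ 0.001
d_fake = 1e-3
g_sat = grad(g_loss_saturating, d_fake)       # ~ -1/(1 - d_fake), magnitude ~ 1
g_nonsat = grad(g_loss_nonsaturating, d_fake) # ~ -1/d_fake, magnitude ~ 1000

# The saturating loss gives an almost-flat gradient, while the
# non-saturating loss still provides a strong learning signal.
print(abs(g_sat), abs(g_nonsat))
```

Near D(G(z)) ≈ 0 the saturating objective's derivative is about −1/(1 − D(G(z))) ≈ −1, while the non-saturating one is about −1/D(G(z)), three orders of magnitude larger here; this is precisely the better early-training gradient noted in Section II-B.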
because of the different properties related to image and non-image data. GANs can be used for interesting applications in various fields; therefore, research is ongoing both into such applications and into how to increase the efficiency and improve the performance of GANs.

REFERENCES

[1] I. Goodfellow et al., "Generative adversarial nets," in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680. [Online]. Available: http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
[2] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. New York, NY, USA: MIT Press, 2016.
[3] I. Goodfellow, "NIPS 2016 tutorial: Generative adversarial networks," arXiv:1701.00160, 2016.
[4] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv:1511.06434, 2015.
[5] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein GAN," arXiv:1701.07875, 2017.
[6] L. J. Ratliff, S. A. Burden, and S. S. Sastry, "Characterization and computation of local Nash equilibria in continuous games," in Proc. 51st Annu. Allerton Conf. Communication, Control, and Computing (Allerton), Monticello, IL, USA, 2013, pp. 917–924.
[7] S. Hitawala, "Comparative study on generative adversarial networks," arXiv:1801.04271, 2018.
[8] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proc. CVPR, 2017.
[9] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, "Improved techniques for training GANs," in Advances in Neural Information Processing Systems, 2016, pp. 2234–2242.
[10] A. Odena, "Semi-supervised learning with generative adversarial networks," arXiv:1606.01583, 2016.
[11] M. Mirza and S. Osindero, "Conditional generative adversarial nets," arXiv:1411.1784, 2014.
[12] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, "Improved training of Wasserstein GANs," arXiv:1704.00028, 2017.
[13] C. Ledig et al., "Photo-realistic single image super-resolution using a generative adversarial network," in Proc. CVPR, 2017.
[14] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv:1312.6114, 2014.
[15] A. Fischer and C. Igel, "An introduction to restricted Boltzmann machines," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, Springer, 2012, pp. 14–36.
[16] Z. Wang et al., "Generative adversarial networks: A survey and taxonomy," arXiv:1906.01529, 2019.
[17] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533–536, Oct. 1986.
[18] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," arXiv:1703.10593, 2017.
[19] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, "Generative adversarial text to image synthesis," arXiv:1605.05396, 2016.
[20] Z. Chen, S. Nie, T. Wu, and C. G. Healey, "High resolution face completion with multiple controllable attributes via fully end-to-end progressive generative adversarial networks," arXiv:1801.07632, 2018.
[21] Y. Li, S. Liu, J. Yang, and M.-H. Yang, "Generative face completion," in Proc. CVPR, 2017, pp. 3911–3919.
[22] X. Mao et al., "Least squares generative adversarial networks," arXiv:1611.04076, 2017.
[23] J. Donahue et al., "Adversarial feature learning," arXiv:1605.09782, 2017.
[24] H. Eghbal-zadeh and G. Widmer, "Likelihood estimation for generative adversarial networks," 2017.
[25] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," in Advances in Neural Information Processing Systems, 2017, pp. 6629–6640.
[26] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556, 2015.
[27] Antipov et al., "Face aging with conditional generative adversarial networks," 2017.
[28] Y. Z. Zhang, Z. Gan, and L. Carin, "Generating text via adversarial training," in Proc. Workshop on Adversarial Training, Barcelona, Spain, 2016.
[29] X. Wang et al., "ESRGAN: Enhanced super-resolution generative adversarial networks," arXiv:1809.00219, 2018.
[30] J. W. Li, W. Monroe, T. L. Shi, S. Jean, A. Ritter, and D. Jurafsky, "Adversarial learning for neural dialogue generation," arXiv:1701.06547, 2017.
[31] L. T. Yu, W. N. Zhang, J. Wang, and Y. Yu, "SeqGAN: Sequence generative adversarial nets with policy gradient," arXiv:1609.05473, 2016.
[32] N. Killoran et al., "Generating and designing DNA with deep generative models," arXiv:1712.06148, 2017.
[33] A. Ghosh, V. Kulharia, V. P. Namboodiri, P. H. S. Torr, and P. K. Dokania, "Multi-agent diverse generative adversarial networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8513–8521.
[34] A. Borji, "Pros and cons of GAN evaluation measures," arXiv:1802.03446, 2018.
[35] T. Vijayakumar, "Comparative study of capsule neural network in various applications," Journal of Artificial Intelligence, vol. 1, no. 01, pp. 19–27, 2019.