Dissertation on Automatic Image Colourization using Generative Models
Mudit Jha 01FB16ECS214
Saahil Jain 01FB16ECS321
Sayantan Nandy 01FB16ECS345
Under the guidance of
Internal Guide: Prof. Suresh Jamadagni, Associate Professor, PES University
External Guide: Name of the Guide, Designation, Company Name
January – May 2020
PES UNIVERSITY
(Established under Karnataka Act No. 16 of 2013)
100ft Ring Road, Bengaluru – 560 085, Karnataka, India
FACULTY OF
Automatic Image Colourization using Generative Models is a bonafide work carried out by Mudit Jha
01FB16ECS214, Saahil Jain 01FB16ECS321 and Sayantan Nandy 01FB16ECS345
in partial fulfilment for the completion of eighth semester project work in the
Program of Study Bachelor of Technology in Computer Science and Engineering
under rules and regulations of
PES University, Bengaluru during the period Jan. 2020 – May 2020.
Automatic Image Colourization using Generative Models has been carried out by us under the guidance of
Prof. Suresh Jamadagni, Associate Professor,
and submitted in partial fulfillment of the course requirements for the award of the
degree of Bachelor of Technology in Computer Science and Engineering of PES
University, Bengaluru during the academic
The matter embodied in this report has not been submitted to any other university
or institution for the award of any degree.
01FB16ECS214 Mudit Jha 01FB16ECS321 Saahil Jain 01FB16ECS345 Sayantan Nandy
ACKNOWLEDGEMENT
We thank Prof. Suresh Jamadagni, Associate Professor from PES University, for his continuous guidance, assistance and
encouragement throughout the development of this project. We would also like to thank Dr. Mamta HR and
Dr. Jayashree R for all the support and guidance given to us while doing this project. We are grateful to
our Project Coordinator, Dr. Anant Koppar, for organising, managing and helping out with the entire process.
We take this opportunity to thank Dr. Shylaja S S, Chairperson,
Department of Computer Science and Engineering, PES University, for all the
knowledge and support
we have received from the department. We would like to thank Dr. B.K. Keshavan, Dean of Faculty, PES
University for his help. We are deeply grateful to Dr.
Suryaprasad J, Vice-Chancellor, PES University for providing us with various opportunities and enlightenment
every step of the way. Finally, this project could not have been completed without the continual support and
encouragement we have received from our parents, colleagues and friends.
ABSTRACT
Generative models are becoming increasingly common in applied machine learning because they offer an
alternative to the classical deep learning approach, in which multiple layers of neurons learn an abstraction
of the input data through backpropagation. That approach has served well and has expanded into many
architectures and problem domains. Generative models, namely Generative Adversarial Networks (GANs) and
autoencoders, take a different approach: the model tries to learn the data distribution from which
the input dataset is sampled. As a result, generative models are increasingly used
in tasks that involve generating new data, such as image-to-image translation. Our problem
statement of image colorization falls under this very domain. Applying generative models to colorization is
a step away from existing solutions, which generally rely on Convolutional Neural Networks; it holds a lot
of promise and is an area of active research.
TABLE OF CONTENTS
1. INTRODUCTION
2. PROBLEM DEFINITION
3. LITERATURE SURVEY
3.1 Exploring Convolutional Neural Networks for Automatic Image Colorization
3.1.1 Introduction
3.1.2 Approach
3.1.3 Method
3.2
3.2.1 Introduction
3.2.2 Method
3.3 Introduction
3.3.1 Introduction
3.3.2 Network Architecture
4. PROJECT REQUIREMENTS SPECIFICATION
5. SYSTEM REQUIREMENTS SPECIFICATION
6. SYSTEM DESIGN
7. DETAILED DESIGN
8. IMPLEMENTATION AND PSEUDOCODE
9. TESTING
10. RESULTS AND DISCUSSION
11. SNAPSHOTS
12. CONCLUSIONS
13. FURTHER ENHANCEMENTS
REFERENCES/BIBLIOGRAPHY
APPENDIX A DEFINITIONS, ACRONYMS AND ABBREVIATIONS
APPENDIX B USER MANUAL (OPTIONAL)
1 Autoencoder - Test Gray Images
2 Autoencoder - Generated Color Images
3 Autoencoder - Original Color Images
4 GAN - Test Gray Images
5 GAN - Original Color Images
6 GAN - Generated Color Images
7 GAN - Code
8 GAN - model build function
9 GAN - model build function
10 GAN - model initializer function
11 GAN - load cifar dataset function
12 AutoEncoder - model initializer function
13 AutoEncoder - encoder model summary
14 AutoEncoder - decoder model summary
1. INTRODUCTION
1.1. Overview
The
This is due to the wide range of applications, such as color restoration and
image colorization for animations.
Photography may appear to be the mere snap of a picture, but the best photographs often undergo intensive
post-processing on the computer. Image colorization is one technique for adding style to a photograph or
applying a combination of styles. Additionally, image colorization can add color to photographs that were
originally taken in black and white. This can provide a best guess as to the context of the picture and help
bridge the gap between the past and the present. The goal of our model is to produce realistic colorized
photos. In 2014, Goodfellow
start with a simple 256 x 256 pixel grayscale image as an input. We then use a neural network to output a
predicted colorized image. We have trained our model to output photos with realistic colors by training it on
realistic images. This does not mean that the output photo will match the ground truth every time. Instead,
the model should produce a colorized image so realistic that a viewer shown a true color image and an image
produced by our model could not tell which is the fake. Image colourization
and
The network not only needs to generate an output with the same spatial dimensions
as the input, but also to provide color information for each pixel in the grayscale
input image.
The approach followed is to try an autoencoder-based model for image colorization before trying out
GANs. The reason is the often-cited difficulty of training a GAN model,
along with the complexity of the model. Training a GAN effectively means iteratively training two fully fledged
neural networks, as opposed to one in an autoencoder’s case. The goal of this project is a comparative
study of an autoencoder-based model and a GAN model that uses an autoencoder as the convolutional network
for the GAN discriminator and generator. Our current approach of starting with an autoencoder model instead
of GANs therefore has the advantage of being more time efficient: it gives us the chance to try out different
models more quickly and evaluate the results. GANs, in contrast, have their own class of
difficulties owing to the need to train both the generator and the discriminator to similar levels.
2. Problem Definition
The aim is to present models for image re-colorization and to compare the
complexity of the models, their training parameters and the output they generate after training
on 2000 images from CIFAR-10. They used the L2 loss function as the objective,
this model. As is evident from these images, the baseline model does not colorize the images well, producing
very faint, dull colors and muted tonality.
Objective Function
One of the most important challenges in auto-colorization is choosing an objective function that accounts for
the multimodal nature of the problem. To investigate this further, they experimented with several loss
functions. Based on the results from the baseline model, they expected the L2 loss to give under-saturated
images with muted colors due to its averaging effect. This is
because the L2 loss, given various colorizations of an object that can take multiple colors (e.g. a car that can
be blue, green or red), will choose the mean colorization to reduce the model loss on the input grayscale
image. This predicted mean pixel value causes the output images to have muted colors and appear sepia
toned. Thus, while the L2 loss might seem well suited for this task, it does not work well for objects that
could take multiple colors.
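The averaging effect can be made concrete with a small numerical sketch. The three colors below are a hypothetical stand-in for one object seen in red, green and blue; they are our illustration, not data from the surveyed paper.

```python
import numpy as np

# Hypothetical example: the same grey car appears in three training images,
# painted red, green and blue respectively (RGB values in [0, 1]).
plausible_colors = np.array([
    [1.0, 0.0, 0.0],  # red
    [0.0, 1.0, 0.0],  # green
    [0.0, 0.0, 1.0],  # blue
])

def l2_loss(prediction, targets):
    """Mean squared error of one predicted color against all plausible targets."""
    return np.mean((targets - prediction) ** 2)

# The single prediction that minimizes the L2 loss is the per-channel mean,
# which here is a dull grey with equal values in all three channels.
mean_color = plausible_colors.mean(axis=0)

loss_for_mean = l2_loss(mean_color, plausible_colors)
loss_for_red = l2_loss(plausible_colors[0], plausible_colors)

print("mean color:", mean_color)
print("L2 loss when predicting the mean:", loss_for_mean)
print("L2 loss when committing to red:", loss_for_red)
```

Committing to any one vivid color is penalized more heavily than predicting the washed-out mean, which matches the muted outputs described above.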
gives a probability distribution over the classes that each pixel can belong to, helping to select the best
class for each pixel. They get the most vivid, realistic and saturated images using this method, since they
are not trying
to minimize the difference between the generated image and the ground truth as
in the
L2 loss, but are instead working with classes that offer the model
more flexibility. However, the number of classes is an important and sensitive hyperparameter here. If the
number of classes they generate with the bin() function is too large, then there is a high likelihood of the
model making an inaccurate prediction, as it becomes tougher for the model to choose the correct class
amongst the increased class set.
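As a minimal sketch of how color values can be turned into classes, the snippet below quantizes values in [0, 1] into equal-width bins. The helper names to_bins and from_bins are hypothetical and are not the bin() function of the surveyed paper; only the bin count of 10 comes from the survey.

```python
import numpy as np

NUM_BINS = 10  # bin count reported in the survey

def to_bins(values, num_bins=NUM_BINS):
    """Map color values in [0, 1] to integer class labels 0..num_bins-1."""
    # Interior bin edges only, so that 0.0 maps to class 0 and 1.0 to the last class.
    edges = np.linspace(0.0, 1.0, num_bins + 1)[1:-1]
    return np.digitize(values, edges)

def from_bins(labels, num_bins=NUM_BINS):
    """Map class labels back to the centre of their bin."""
    return (labels + 0.5) / num_bins

channel = np.array([0.0, 0.04, 0.33, 0.55, 0.99, 1.0])
labels = to_bins(channel)
reconstructed = from_bins(labels)

print("labels:", labels)
print("reconstructed:", reconstructed)
```

With more bins the reconstruction error shrinks, but the classifier has to pick among more classes, which is the trade-off described above.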
Activation Function
They use the rectified linear unit (ReLU) as the nonlinearity
that follows each of their convolutional layers.
They found that ReLU helped accelerate training convergence. It is also extremely simple to compute.
One drawback of this function
is that the model parameters could be updated in such a way that the function's
active region ends up in the zero-gradient
region, which causes a gradient of 0 to backpropagate through the network, effectively ‘killing’ neurons and
preventing the network from training well. However, they did not run into this challenge in practice, and so
used ReLU as the activation function for their model.
Dropout
Dropout is an extremely effective regularization technique introduced. While
They introduced dropout right after the batchnorm layers in their network, but observed bad results. This is
because, given the problem and the dataset size, overfitting is not a challenge. In fact, they want the model
to learn as many diverse colorizations as possible. Dropout hinders this process by preventing the model
from learning ‘too much’. Thus, they left dropout out of the final model.
Hyperparameter Tuning
They used the Adam optimization method (default values
of 1e-3 which was decayed to 1e-4 when the loss started to plateau (which usually happened around epoch
150). They trained for 200 epochs using batch sizes of 250 (varied slightly for different models) on an NVIDIA
Tesla K80 GPU. Another important hyperparameter was the number of bins in their classification model,
where they used 10 bins. All of these hyperparameter values were found through a random search over
the hyperparameter space. More specifically, they started off with a small training set of 100 images and
used random search to narrow in on a range for these parameters. For instance, for the learning rate, they
ran multiple training trials to see which learning rate would yield the fastest convergence
over a fixed number of iterations. Within the set of learning rates sampled on a
logarithmic scale, they found that a learning rate of
1e-3
achieved one of the largest per-iteration decreases in training loss as well as the
lowest training loss of the learning rates sampled.
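The random search over learning rates on a logarithmic scale can be sketched as follows. The search range, the number of candidates and the toy quadratic objective are all assumptions made for illustration; a real search would replace loss_after_fixed_iterations with a short training run of the actual network.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_learning_rates(n, low=1e-5, high=1e-1):
    """Draw n learning rates uniformly on a logarithmic scale."""
    exponents = rng.uniform(np.log10(low), np.log10(high), size=n)
    return 10.0 ** exponents

def loss_after_fixed_iterations(lr, steps=50):
    """Stand-in for a short training run: gradient descent on f(w) = w^2."""
    w = 5.0
    for _ in range(steps):
        w -= lr * 2.0 * w  # gradient of w^2 is 2w
    return w ** 2

candidates = sample_learning_rates(8)
losses = [loss_after_fixed_iterations(lr) for lr in candidates]

# Keep the candidate with the lowest loss after the fixed iteration budget.
best = candidates[int(np.argmin(losses))]
print("best learning rate in this toy search:", best)
```

Sampling exponents rather than raw values gives every order of magnitude the same chance of being tried, which is why learning rates are usually searched on a logarithmic scale.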
3.2
The goal in this paper is to develop a common framework for all these problems.
3.2.2 Method
Optimization and inference
To optimize their networks, they follow the standard
approach: they alternate between one gradient descent step on D, then one
step on G. As suggested in the original GAN paper, rather than training G to minimize
log(1 − D(x, G(x, z))), they instead train it to maximize log D(x, G(x, z)). In addition, they
divide the objective by 2 while optimizing D, which slows down the rate at which D learns
relative to G. They use minibatch SGD and apply the Adam solver, with a learning rate of
0.0002 and momentum parameters
β1 = 0.5, β2 =
0.999. At inference time, they run the generator net in exactly the same manner as
during the training phase. This differs from the usual protocol in that they apply
dropout at test time, and they apply batch normalization using the statistics of the test
batch, rather than aggregated statistics of the training batch. This approach to batch
normalization, when the batch size is set to 1, has been termed “instance normalization”
and has been
demonstrated to be effective at image generation tasks. In their experiments, they use
batch sizes between 1 and 10 depending on the experiment.
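The two generator objectives and the halved discriminator objective can be compared numerically. This is a sketch on scalar discriminator outputs, with function names of our own choosing; it is not the surveyed implementation, and the value 0.01 is an assumed example of a confident discriminator.

```python
import numpy as np

def saturating_g_loss(d_fake):
    """Original minimax generator loss: minimize log(1 - D(x, G(x, z)))."""
    return np.log(1.0 - d_fake)

def non_saturating_g_loss(d_fake):
    """Alternative: maximize log D(x, G(x, z)), i.e. minimize its negative."""
    return -np.log(d_fake)

def d_loss(d_real, d_fake):
    """Discriminator loss, divided by 2 to slow D down relative to G."""
    return -0.5 * (np.log(d_real) + np.log(1.0 - d_fake))

# Early in training the discriminator easily rejects generated images,
# so its output on a generated image is close to 0.
d_fake = 0.01

# Gradient magnitude of each generator loss with respect to the
# discriminator output p = D(x, G(x, z)).
grad_saturating = 1.0 / (1.0 - d_fake)  # |d/dp log(1 - p)|
grad_non_saturating = 1.0 / d_fake      # |d/dp -log(p)|

print("saturating loss gradient:", grad_saturating)
print("non-saturating loss gradient:", grad_non_saturating)
print("D loss at chance level:", d_loss(0.5, 0.5))
```

When the discriminator is confident, the log D form gives the generator a far larger learning signal, which is the motivation for the substitution.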
3.3.1 Introduction
Deep convolutional networks have outperformed the state of the art in many visual
recognition tasks. While convolutional networks have already existed for a long
time, their success was limited due to the size of the available training sets and the size of
the considered networks. The breakthrough by Krizhevsky et al. was due to supervised training
of a large network with 8 layers and millions of parameters on the ImageNet dataset with
1 million training images. Since then, even larger and deeper networks have been trained.
The typical use of convolutional networks is on classification tasks, where the output for
an image is a single class label. However, in many visual tasks, especially in biomedical
image processing, the desired output should include localization, i.e., a class label is
supposed to be assigned to each pixel. Moreover, thousands of training images are
usually beyond reach in biomedical tasks. Hence, Ciresan et al. trained a network in a
sliding-window setup to predict the class label of each pixel by providing a local region
(patch) around that pixel as input. First, this network can localize. Secondly, the training
data in terms of patches is much larger than the number of training images.
Built
upon a more elegant architecture, the so-called “fully convolutional network”. They
modify and extend this architecture such that it works with very few training images
and yields more precise segmentations. The main idea is to supplement a usual
contracting network by successive layers, where pooling operators are replaced by
upsampling operators. Hence, these layers increase the resolution of the output. In order
to localize, high resolution features from the contracting path are combined with the
upsampled output. A successive convolution layer can then learn to assemble a more
precise output based on this information.
3.
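The way high resolution features from the contracting path are combined with the upsampled output can be illustrated at the level of array shapes. This sketch uses plain numpy with no learned convolutions; the feature map size and the nearest-neighbour upsampling are assumptions chosen for brevity.

```python
import numpy as np

def max_pool_2x2(x):
    """Halve height and width of an (H, W, C) feature map by 2x2 max pooling."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample_2x(x):
    """Double height and width by nearest-neighbour repetition."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# A hypothetical feature map from the contracting path.
features = np.random.default_rng(0).random((32, 32, 16))

contracted = max_pool_2x2(features)  # lower resolution, same channel count
expanded = upsample_2x(contracted)   # back to the original resolution

# Skip connection: stack the high resolution features with the upsampled
# output along the channel axis so a later convolution can use both.
merged = np.concatenate([features, expanded], axis=-1)

print(contracted.shape, expanded.shape, merged.shape)
```

The upsampled map alone has lost fine detail; concatenating it with the earlier feature map is what lets a following convolution recover precise per-pixel output.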
4. Project Requirements Specification
Our chosen topic revolves around generative models, which are a
comparatively newer domain of machine learning than classifiers. With most of our previous
experience in machine learning being with classifiers (both discriminative and generative), a particular
requirement was to build an understanding of how generative models work, specifically of terms in this
space such as latent space and learning a data distribution. With respect to the implementation, we have
an autoencoder model as well as a GAN model.
5. System Requirements Specification
Functional Requirements:
Perform image de-colorization: This is a necessary prerequisite step for our application. There are not many
readily available datasets of black-and-white, i.e. grayscale, images, which we need to provide as
the input to our model. As a result, it is necessary to work with existing datasets of color
images and process them to remove the color. The technical aspect to take care of in this regard
is the number of channels in the image.
The transformation takes the image from three channels (RGB) to just one. As a result, the shape of
the numpy ndarray which holds the pixel intensities across the channels has to be
changed accordingly.
Train the model: The two different models each require their own specific architecture.
The encoder and decoder work together to first learn the abstract information from an image and then perform
transpose convolution to add new information in order to come up with a new image.
Save the model (weights and biases learnt): The model’s weights need to be saved as the training process
goes on. This helps in saving the state of progress of the model: in case training is
interrupted due to a break in connection, it can be continued from the last saved state. It also helps
isolate the model from its training environment, as the
without having to undergo the training process again.
Test the model: The model needs to be tested with
a particular loss function so that the usefulness of the model can be quantified.
Non-Functional Requirements:
The model should not take too long to train: Many generative models are complex; some state-of-the-art
ones are trained on specialized hardware and still take many
hours (and sometimes days) to train, so it is important to have a model which is not so intensive. In keeping
with our limited experience and the well-documented issues which occur in training GAN models and their
variants, the model must train fast enough for us to analyze the progress made in
different metrics across epochs, so that we can decide whether training is progressing well
enough to let it continue.
The model should generalise well across images of different kinds: The success of
a deep learning model depends on how well it is able to generalise to different kinds of input, and the same
holds true for our model. Though our
are quite similar in terms of various photometric attributes (the gradient of the image and the general color
combinations), it should ideally be able to colorize images from different domains and provide realistic output
in different cases.
Hardware Requirements:
As with most deep learning models, the availability of a GPU
instance for training is highly beneficial in terms of saving time. We made use of the free GPU instance
available on Google Colab for training our models. The CPU and GPU specifications of the cloud instance
are as follows:
GPU: 1x Tesla K80, compute capability 3.7, 2496 CUDA cores, 12 GB GDDR5 VRAM
CPU: 1x single-core hyper-threaded Xeon processor @ 2.3 GHz (1 core, 2 threads)
RAM: 12.6 GB available
Disk: 33 GB available
Software Requirements
OpenCV (for basic image processing tasks such as converting colour images
to greyscale): cv2 is the Python binding for the OpenCV library (which is
natively written in C++).
Pandas, numpy: These are the core packages for loading and wrangling
data. Numpy’s ndarray object is used extensively for handling data. It has an advantage over native
Python data types as it is implemented in C, and thus provides faster data access as well as being more
memory efficient.
Keras: It
is a high-level API for building and training deep learning models.
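The de-colorization step listed under the functional requirements, and the change in ndarray shape it causes, can be sketched with numpy alone. The decolorize helper and the luma weights are our assumptions for illustration; the project itself may rely on OpenCV's cv2.cvtColor, which performs a comparable conversion.

```python
import numpy as np

def decolorize(rgb_images):
    """Convert a batch of RGB images (N, H, W, 3) to grayscale (N, H, W, 1)."""
    weights = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights
    gray = rgb_images @ weights                # weighted channel sum, shape (N, H, W)
    return gray[..., np.newaxis]               # restore a single channel axis

# A hypothetical batch of four 32 x 32 color images with values in [0, 1].
batch = np.random.default_rng(0).random((4, 32, 32, 3))
gray_batch = decolorize(batch)

print(batch.shape, "->", gray_batch.shape)
```

Keeping the trailing channel axis of size one means the grayscale batch can be fed to a convolutional input layer without further reshaping.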
The
The
A known dataset serves as the initial training data for the discriminator. Training it
involves presenting it with samples from the training dataset until it achieves
acceptable accuracy. The generator trains based on whether it succeeds in fooling the
discriminator. Typically the generator is seeded with randomized input that is sampled
from a predefined latent space (e.g. a multivariate normal distribution). Thereafter,
candidates synthesized by the generator are evaluated by the discriminator.
Backpropagation is applied in both networks so that the generator produces better
images, while the discriminator becomes more skilled at flagging synthetic images. The
generator is typically a deconvolutional neural network, and the discriminator is a
convolutional neural network. GANs often suffer from "mode collapse", where they fail to
generalize properly, missing entire modes from the input data. For example, a GAN
trained on the MNIST dataset,
containing many samples of each digit, might nevertheless timidly omit a subset of the
digits from its output. Some researchers perceive the root problem to be a weak
discriminative network that fails to notice the pattern of omission, while others assign
blame to a bad choice of objective function.
The general GAN structure: Our GAN model uses an autoencoder for the generator model
7. Detailed Design
For the autoencoder model:
As our implementation consists primarily of a deep learning
model, training it, validating it, and finally testing it on various mutually exclusive subsets of our image
dataset, we have not divided our work into separate modules. Instead, the implementation can
be viewed as a collection of multiple Jupyter Notebook cells, each of which introduces or progresses the
work of building a functioning model. In terms of functionality, the autoencoder-based model can be
considered to be made up of the following parts:
1. Loading all the necessary packages, sub-packages and methods into the environment namespace.
2. The function to de-colorize an RGB image. This is later applied to both the training and testing
sets before passing them through the model.
3. The feature to stack up a variable number of images from the
dataset and view them side by side to get an idea of the variance in the dataset.
4. Defining the network parameters. This includes defining the input layer, which should be in accordance
with the shape and
a training-specific parameter and other parameters relevant to the CNN-based structure of the encoder
model. This includes the kernel
size and the stride. Another important parameter is the size of the
latent vector. It is this value that specifies the amount of detail in what our model has learnt.
5. Generating the structure and summary of both the encoder and decoder models.
6. More training-phase-specific parameters: We make use of the ReduceLROnPlateau callback, which
dynamically lowers the learning rate if there is no sufficient improvement after a set number of epochs.
Checkpoints are also defined for saving the model periodically as well as at the end of training.
7. Starting the actual training process, followed by applying the trained model to the
testing set of our data. The resulting output is then saved locally or can be compared to the original input
which yielded it.
For the GAN model, the design is more detailed as it involves the simultaneous training of
two full-fledged deep learning models. As a result, the implementation is more modularized. For this
codebase, we have the following modules:
● dataset.py
● main.py
● models.py
● networks.py
● ops.py
● options.py
● utils.py
The dataset.py module consists of a class-based wrapper over the core dataset (CIFAR-10)
that we have used to train and test our core model. Instead of simply importing and splitting the dataset,
we have defined custom TestDataset, BaseDataset and Cifar10Dataset classes with helper
methods that provide convenient utilities over the datasets, such as
unpickling, shuffling and stacking of images.
The main.py module is the entry point to the
workflow of the project, as this is where the computational graph which powers a TensorFlow model
is defined. This is also where the TensorFlow session is initialized, along with the conditions for training
and building the model and for loading the defined model afterwards for future usage.
The models.py module is again an object-oriented definition of the GAN models which are trained and used.
It consists of a BaseModel class as well as a child class which inherits the basic
attributes from this model. This setup makes the code extensible by allowing us to incorporate other
GAN models in the same project by defining them here and making them inherit from the BaseModel parent class.
The networks.py module defines the individual neural networks which make up our GAN model, i.e. the
discriminator and the generator. The reasoning behind organizing it this way is that it allows us to combine
differently defined discriminators and generators as desired in the future. This is reflected in the BaseModel
class, which takes class instances of both the discriminator and generator neural networks as arguments.
The ops.py module defines both pre-processing and post-processing steps, such as decolorizing input to
feed into the network and storing output images.
In the options.py module, we make use of the
standard library package argparse to define command line arguments which can be passed when running
our model, with customized input for different behaviours. Some of these are the number of epochs
to be used for training, the batch size for the stochastic gradient descent process, the learning rate
decay feature (something we made use of in our autoencoder model as well) and the training status
logging frequency, among other such options.
In the utils.py module, we have simple helper methods which provide
features such as saving the pickled model, showing a progress bar and plotting graphs from an array input.
8. Implementation and Pseudocode
Implementation for the autoencoder:
The autoencoder model exists as a single Jupyter Notebook which is modularized into different functions and
cells (each Jupyter Notebook cell can be executed independently while sharing a namespace with the cells
defined above it), each with a specific utility. This involves loading the required modules from
different packages as well as the dataset, defining pre-processing functions for converting the default
three-channel RGB images into single-channel grayscale ones, and viewing the original (ground truth),
decolorized and colorized output images as a stack of multiple images (whose dimensions can be configured).
9. Testing
The testing for the autoencoder model effectively looks at how well the model is able to construct an
output image by placing it in direct contrast with the input image. The model uses the mean square error
loss function, which in this case effectively tells us the average pixel-wise
broken up
of 5:1 and training is done in batches of size 32, with a new batch of images being trained on every epoch.
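The quantities mentioned in this section can be sketched in a few lines. We assume here that the 5:1 ratio refers to the split between training and testing images; the helper names and the dataset size of 600 are hypothetical and chosen only for illustration.

```python
import numpy as np

def mse(predicted, target):
    """Average squared difference over every pixel and channel."""
    return np.mean((predicted - target) ** 2)

def split_5_to_1(images):
    """Split a dataset into training and testing parts in a 5:1 ratio."""
    n_train = len(images) * 5 // 6
    return images[:n_train], images[n_train:]

def batches(images, batch_size=32):
    """Yield consecutive mini-batches of at most batch_size images."""
    for start in range(0, len(images), batch_size):
        yield images[start:start + batch_size]

# A hypothetical dataset of 600 color images of size 32 x 32.
images = np.random.default_rng(0).random((600, 32, 32, 3))
train, test = split_5_to_1(images)
train_batches = list(batches(train))

print(len(train), len(test), len(train_batches))
print("MSE of an image set against itself:", mse(train, train))
```

An MSE of zero would mean a pixel-perfect reconstruction; larger values indicate a larger average deviation from the ground truth colors.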
10. Results and Discussion
Results for the Autoencoder Model:
As can be seen, the autoencoder model outputs realistic colors for the different varieties of images, though
they do not match up exactly with the corresponding ground truth images. The autoencoder fails to capture
the subtlety in the original (ground truth) images. This is also seen in other images in the dataset: the
coloring obtained from the autoencoder is consistent and realistic but doesn't always match the original,
which is predictably due to the fact that there are multiple images from the same
‘class’.
Results for the GAN model
Original colored image (Ground truth) above
Colored GAN image (Predicted output)
The output provided by the GAN model, in contrast to the
autoencoder model, does not provide consistent coloring: the images in the output look to be colored in
patches, with out-of-place tinges in multiple places. This phenomenon is noticed in other images
too, where certain parts of the image look thoroughly colored but other parts look discolored,
with only patches of some color present. Compared to the autoencoder model, the output images of the GAN
model are much patchier: they look like black-and-white images with patches of color
filled in rather than fully colored images.
11. Snapshot Figure-1
12. Conclusions
After implementing the two kinds of generative models, we gained firsthand experience of the difficulties
in training a GAN model, whose efficacy depends on how well two neural networks train concurrently. In
our experiments, the autoencoder model produced better results than the GAN model. Although GAN models
are generally used for, and are capable of, more complex tasks than autoencoders, the difficulty of
training them in a stable manner remains, as does the need for extensive computational resources. Thus,
for our particular use case, an autoencoder based model worked out well. The autoencoder model was
trained for 30 epochs, resulting in a loss of 0.1196. This was reflected in the generated images, which
were generally accurate and continuous in nature.
13. Further enhancements
Our work revolved around exploring generative models, in
contrast to the established CNN based solutions in the problem domain. The field of GANs, however, is
rapidly advancing, with new ideas and architectures being proposed all the time. An example of this is the
fact that GANs are now applicable to text based generation problems, something which the creator of vanilla
GANs, Ian Goodfellow himself, didn't envision initially. Similarly, for tasks such as this one, some novel
GAN based approaches are being proposed. One of these is the Pix2Pix GAN model. It has similarities to
the GAN model we used for our project in that it uses an autoencoder as the generator of the GAN.
However, its definition of loss differs from that of our discriminator: whereas our discriminator
classifies a whole image as real or fake, the Pix2Pix model’s discriminator takes a novel approach in that
it classifies sections/chunks of the generator’s output as real or fake.
Another aspect of improvement in our work is in the evaluation metrics for the GAN model.
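The difference between the image-level discriminator and the patch-wise discriminator described above for Pix2Pix can be illustrated with a small sketch. It is not the Pix2Pix implementation, nor the code used in this project: the function names and the probability values are hypothetical, the patch grid is assumed to be already produced by some discriminator, and binary cross-entropy is assumed as the loss.

```python
import math

def bce(prob, target):
    """Binary cross-entropy of one predicted probability against a 0/1 target."""
    eps = 1e-7
    p = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def image_level_loss(prob, is_real):
    """Loss when the discriminator emits ONE real/fake probability per image."""
    return bce(prob, 1.0 if is_real else 0.0)

def patch_level_loss(prob_grid, is_real):
    """Loss when the discriminator emits a GRID of per-patch probabilities.

    The per-patch losses are averaged, so each chunk of the image
    contributes its own real/fake decision.
    """
    target = 1.0 if is_real else 0.0
    cells = [p for row in prob_grid for p in row]
    return sum(bce(p, target) for p in cells) / len(cells)

# Hypothetical scores for one generated image, seen from the generator's side
# (the generator wants every score to be close to 1, i.e. "real").
whole_image_score = 0.9            # image-level discriminator is mostly fooled
patch_scores = [[0.9, 0.9],
                [0.9, 0.1]]        # one patch is clearly judged as fake

loss_whole = image_level_loss(whole_image_score, is_real=True)
loss_patches = patch_level_loss(patch_scores, is_real=True)
print(f"image-level generator loss: {loss_whole:.4f}")
print(f"patch-level generator loss: {loss_patches:.4f}")
```

In this toy setting the single badly scored patch dominates the patch-level loss, whereas the image-level score gives no indication of where the problem lies. This suggests, without proving, why a patch-wise discriminator might help with the patchy coloring observed in our GAN outputs.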
Unlike other deep learning neural network models that are trained with a loss
function until convergence, a GAN generator model is trained using a second
model called a discriminator that learns to classify images as real or generated. Both the
generator and discriminator models are trained together to maintain an equilibrium. As
such, there is no objective loss function used to train the GAN generator models and no
way to objectively assess the progress of the training and the relative or absolute quality
of the model from loss alone.
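A small numerical sketch may help show why the loss alone says little about quality. It assumes the standard binary cross-entropy GAN losses (the exact loss used in this project is not restated here), and the discriminator scores are hypothetical values chosen for illustration.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Discriminator loss on one real and one generated image.

    d_real: predicted probability that the real image is real.
    d_fake: predicted probability that the generated image is real.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants d_fake close to 1."""
    return -math.log(d_fake)

# Hypothetical scores for the SAME generated image judged by two discriminators.
weak_discriminator_score = 0.5    # cannot tell real from generated
strong_discriminator_score = 0.2  # fairly sure the image is generated

loss_vs_weak = generator_loss(weak_discriminator_score)
loss_vs_strong = generator_loss(strong_discriminator_score)
print(f"generator loss vs weak discriminator:   {loss_vs_weak:.4f}")
print(f"generator loss vs strong discriminator: {loss_vs_strong:.4f}")
```

The generated image is identical in both cases, yet the generator loss rises as the discriminator improves. A change in the generator loss therefore reflects the state of the discriminator as much as the quality of the generated images, which is why a separate evaluation measure is needed.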
There is some scope for enhancement here, however: quantitative approaches to evaluation include low
level image statistics as well as measures such as the Inception score and the Wasserstein critic.
References/Bibliography
Source for Google Colab instance specs: https://colab.research.google.com/drive/151805XTDg--dgHb3-AXJCpnWaqRhop_2
Google Developer’s introductory post on GAN models: https://developers.google.com/machine-learning/gan/
Tips for training stable GAN models: https://machinelearningmastery.com/how-to-train-stable-generative-adversarial-networks/
Beginner’s guide to GANs: https://pathmind.com/wiki/generative-adversarial-network-gan
Visual progression of training of a GAN model from a randomly picked data distribution (helped in developing an intuition for the training process): https://poloclub.github.io/ganlab/
Proposed alternative to vanilla GANs (also discusses shortcomings of vanilla GAN models): https://towardsdatascience.com/pix2pix-869c17900998
Stanford CS231 course lecture on generative models covering everything from probability distributions to autoencoders and their variants to GAN models: https
autoencoders: https://www.kaggle.com/shivamb/how-autoencoders-work-intro-and-usecases
1. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. 2016.
2. Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs.
Automatic Image Colorization using Generative Model
Dept. Of CSE Jan-May, 2020