Available online at www.sciencedirect.com
ScienceDirect

Procedia Computer Science 173 (2020) 28–35

www.elsevier.com/locate/procedia
International Conference on Smart Sustainable Intelligent Computing and Applications under ICITETM2020

Super-Resolution using GANs for Medical Imaging

Rohit Gupta a, Anurag Sharma b,∗, Anupam Kumar c

a,b,c Department of Computer Science, Maharaja Agrasen Institute of Technology, Rohini, Sector 22, PSP Area, Delhi, 110086, India

∗ Corresponding author. Tel.: +91-997-145-5852
E-mail address: anurag.sharma.2p@gmail.com

Abstract

Generative Adversarial Networks (GANs) have been quite popular and are currently an active area of research. They can be used for generating new data and for studying adversarial samples and attacks. We have used a similar approach to apply super-resolution to medical images. In radiology, MRI is a commonly used method to produce medical imaging, but the limitations of lab equipment and the health hazard of staying in an MRI radiation environment to obtain good quality scans lead to lower quality scans, and it also takes a lot of time to get high-resolution data. This problem can be solved by applying super-resolution with deep learning as a post-processing step to improve the resolution of the scans. Super-resolution is the process of generating higher resolution images from lower resolution data. For this, we propose a generative adversarial network architecture, a dual neural network designed to generate lifelike images. In this deep learning algorithm, two neural networks compete with each other to improve alternately. Given a training set, this technique learns to generate new data with the same statistics as the training set. To apply this technique to our problem statement, we use the generator as the network that improves the resolution and the discriminator as a network that trains the generator better. We used transfer learning in our generative neural network, trained our discriminator from scratch, and used the perceptual loss [1] to train our network. This helps in improving the performance of the network. We used lung MRI scans of tuberculosis: a set of 216 MRI samples containing around 60-130 channels each, with each channel having 512x512 dimensions.
© 2020 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the International Conference on Smart Sustainable Intelligent Computing and Applications under ICITETM2020.

Keywords: Generative Adversarial Network; Deep Learning; Computer Vision; Super resolution; Magnetic resonance imaging; Transfer Learning; Artificial Intelligence

1. Introduction

Magnetic resonance imaging is a widely used form of medical imaging which is carried out for the diagnosis of various disorders in body parts such as the brain, lungs, and kidneys. The amount of time required by a patient to get the desired modalities at times increases beyond normal tolerance, which is dangerous for their health.



Image modalities are often affected by the equipment's physical limitations, complexity, and cost constraints. Modalities like resolution, contrast, and texture can be improved without compromising the patient's health and without using expensive equipment by post-processing the images with deep learning to boost their quality. Deep learning can be used to enhance a lower quality image by reducing noise and enhancing sharpness and contrast along with the resolution. Neural networks have performed well in the past and provide a proof of concept for our solution as well. The generative adversarial network that we propose is expected to give better results than traditional deep learning models, some of which we review in the next section. In this paper we use the basic concept of transfer learning: instead of training the generator from scratch, we use a pretrained model for the encoder in our generator, train the generator first in isolation, and then train it together with a discriminator. This helps the training in two ways: first, by using transfer learning the model can identify features even before training; second, training becomes faster and more accurate. Within our generator we used a pretrained encoder for better and faster convergence. Moreover, we used perceptual losses [1], which we discuss in Section 3.3.

2. Literature Review

The paper [2] uses representation learning to improve resolution with deep neural networks on three sets of imaging data (mammary gland, prostate tissue, and human brain) for training their proposed neural network, achieving average PSNR scores of 28.99, 36.98, and 46.12 and SSIM scores of 0.7889, 0.9566, and 0.9983 respectively. They proposed a network in which a randomized ReLU, which incorporates a nonzero slope for the negative part and helps to solve the problem of hyper-compression, is followed by Nesterov's Accelerated Gradient method on the SRCNN to accelerate the convergence of the loss function, reduce over-fitting noise, and enhance the quality of the generated results. This network takes advantage of a nonlinear mapping from LR space to HR space directly.
In the paper [3], Qing Lyu, Chenyu You et al. achieved PSNR scores of 33.2031 and 32.2995 and SSIM scores of 0.9563 and 0.9465 respectively on the IXI dataset using their previously published GAN-CPCE and GAN-CIRCLE [4] models, which were built along similar lines to our approach. GAN-CPCE combines a VGG-network GAN with the Wasserstein distance as a perceptual similarity measure. The generator G of GAN-CPCE has four convolutional layers with 32 3×3 filters, three deconvolutional layers with 32 3×3 filters, and one deconvolutional layer with one 3×3 filter; the discriminator D has 6 convolutional layers with 64, 64, 128, 128, 256, and 256 3×3 filters respectively, followed by 2 fully-connected layers of sizes 1024 and 1. The GAN-CIRCLE architecture that they proposed includes two generative networks, each consisting of a feature extraction network and a reconstruction network.
A recently published paper [5] surveyed various types of GAN networks and their applications in reconstructing MRI images with improved edges and textures from lower resolution images.
A similar concept is published in the paper [6]: a 3D densely connected super-resolution network which achieved PSNR and SSIM scores of 35.05 and 0.9320 respectively on a brain MRI dataset with 1,113 subjects, at resolutions up to 4x.
The paper [7] also demonstrates the strong performance of deep learning models in super-resolution applications. They proposed a 3D convolutional network to enhance the resolution by a factor of 2, with PSNR and SSIM scores of 37.51 and 0.9735 respectively.

3. Methodology

3.1. Training and Testing Datasets

We started with 216 images in NIfTI format, in which each instance contained about 60-130 channels, each holding a 512x512 image. We sampled the data in a way that minimized similarity between samples to prevent overfitting: we took every third channel from the first and last quarter of each Neuroimaging Informatics Technology Initiative (NIfTI-1) instance and alternate samples from the remainder. This also led to some cleaning and filtering of redundant and biased data.

Fig. 1. ResUnet Architecture.

Fig. 2. Decoder block of ResUnet Architecture.

It also led to the removal of low-feature images, giving us sets of PNG images corresponding to each instance. This boosted the performance significantly because the process helped create a uniform dataset. Now, to create our training data, we downsample by 2x to create the training input, with the original images as the output.
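As a rough illustration of the sampling scheme above, here is a minimal Python sketch; the use of nibabel and all names in it are our assumptions, not the authors' code.

```python
# Hypothetical sketch of the slice-sampling scheme described in Section 3.1.
import nibabel as nib  # assumed NIfTI reader, not named in the paper

def sample_slices(nifti_path):
    """Select a de-correlated subset of slices from one NIfTI-1 instance."""
    volume = nib.load(nifti_path).get_fdata()  # assumed shape: (512, 512, n_slices)
    n = volume.shape[-1]
    q = n // 4
    head = range(0, q, 3)        # every third slice in the first quarter
    tail = range(3 * q, n, 3)    # every third slice in the last quarter
    middle = range(q, 3 * q, 2)  # alternate slices in the middle half
    keep = sorted(set(head) | set(middle) | set(tail))
    return [volume[:, :, i] for i in keep]
```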

3.2. Neural Network Architecture

The generator is a UNet [8] architecture in which the encoder is a pretrained ResNet34 [9] network. The pretrained network lets the generator pick up relevant features without any training. We fine-tuned the encoder in the second stage to adapt it to the dataset and improve the metric score further. The decoder is a basic network with each block as shown in Fig. 1 and Fig. 2.
The discriminator is an 11-layer neural network with several conv blocks and an adaptive average pooling layer. The kernel size is 3x3 in all conv layers, and the number of filters is shown in Fig. 3. The stride in the conv layers alternates between 1 and 2, and the padding size is 1 in all layers. Each block consists of a conv layer, a batch-norm layer, and a LeakyReLU layer, as shown in Fig. 4.
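A minimal PyTorch sketch of one such block follows; the LeakyReLU slope of 0.2, the single-channel input, and the filter counts in the stacked example are our assumptions, since the paper gives the actual counts only in Fig. 3.

```python
# Sketch of a discriminator block: 3x3 conv (stride 1 or 2, padding 1),
# batch norm, then LeakyReLU, as described in Section 3.2.
import torch.nn as nn

def disc_block(in_ch, out_ch, stride):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),  # slope 0.2 is an assumption
    )

# Illustrative stack with stride alternating between 1 and 2, ending in
# the adaptive average pooling layer mentioned in the text.
discriminator = nn.Sequential(
    disc_block(1, 64, 1), disc_block(64, 64, 2),
    disc_block(64, 128, 1), disc_block(128, 128, 2),
    nn.AdaptiveAvgPool2d(1),
)
```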
Rohit Gupta et al. / Procedia Computer Science 173 (2020) 28–35 31
4 Rohit, Anurag, et al. / Procedia Computer Science 00 (2019) 000–000

Fig. 3. Decoder Architecture.

Fig. 4. Decoder Block.

3.3. Training Process

A pretrained network expects a 3-channel input, so we converted the slices extracted from the NIfTI-1 formatted dataset to 3-channel RGB inputs, while the output is a single-channel image. The training was done in 3 stages. In the first stage we froze the encoder completely and trained the decoder from scratch. In the second stage we unfroze the encoder and fine-tuned the whole generator. At this point we have a pretrained generator and a discriminator. Finally, in the third stage we trained the generator and discriminator together to improve the generator further and raise the metric score. The model was trained for 10 epochs in each stage.
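A short sketch of the stage-1/stage-2 freezing scheme; the one-line decoder is a placeholder for the real decoder of Fig. 1 and Fig. 2, and all names are ours.

```python
# Freezing scheme for the first two training stages (Section 3.3).
import torch.nn as nn
from torchvision.models import resnet34

encoder = resnet34(pretrained=True)            # pretrained encoder [9]
decoder = nn.Sequential(nn.Conv2d(512, 1, 1))  # placeholder decoder

# Stage 1: freeze the pretrained encoder; only the decoder is trained.
for p in encoder.parameters():
    p.requires_grad = False

# Stage 2: unfreeze the encoder and fine-tune the whole generator.
for p in encoder.parameters():
    p.requires_grad = True
```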
To get each input image we downscaled the data to 256x256 and then resized it back to 512x512 using bilinear interpolation; this resized image was the input to our model and the original data was the output. Using just the generator we got good results, but the discriminator helped improve them further. To enforce the mapping between the source and target domains and regularize the training procedure, our proposed network combines two types of loss functions: mean absolute error and perceptual loss [1]. While training with the discriminator in the final stage, we also added an adversarial loss [10].
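The low-resolution inputs can be produced as sketched below; the random tensor stands in for one RGB-converted slice.

```python
# Build a low-resolution input: downscale the 512x512 slice to 256x256,
# then resize back to 512x512 with bilinear interpolation (Section 3.3).
import torch
import torch.nn.functional as F

hr = torch.rand(1, 3, 512, 512)  # stand-in for one RGB-converted slice
lr = F.interpolate(hr, size=(256, 256), mode="bilinear", align_corners=False)
model_input = F.interpolate(lr, size=(512, 512), mode="bilinear", align_corners=False)
target = hr[:, :1]               # single-channel target, as described above
```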

3.3.1. Mean Absolute Error


Mean Absolute Error (MAE) calculates the average absolute error, in other words the magnitude of the difference between two values; in our case, the target pixels and the predicted pixels. This error does not include direction and hence is not biased. It is linear in nature, which prevents small differences from vanishing, the limitation of the mean squared error in this use case. For notational convenience, we define $N$ as the batch size, $y_i$ as the target image, and $\hat{y}_i$ as the generated image. We write the MAE as:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \tag{1}$$
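In PyTorch this is simply the L1 loss; a quick check of eq. (1), noting that nn.L1Loss averages over every element rather than only over the batch:

```python
import torch

y = torch.rand(4, 1, 512, 512)      # target batch
y_hat = torch.rand(4, 1, 512, 512)  # generated batch
mae = torch.nn.L1Loss()(y_hat, y)   # mean of |y - y_hat| over all elements
assert torch.isclose(mae, (y - y_hat).abs().mean())
```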

3.3.2. Perceptual Loss


Perceptual loss was initially introduced for neural style transfer [1] to calculate the similarity between features extracted from an input image. We use this loss here to measure feature similarity between the target and the output. So, we calculated the content loss (eq. 2) and the gram loss (eqs. 3-4) and added them to the final loss.

Fig. 5. (a) Learning rate over each iteration; (b) Momentum over each iteration.

For notational convenience, we define $\ell^{\phi,j}_{feat}$ as the content loss and $G^{\phi}_j(x)_{c,c'}$ as the Gram matrix for the $j$th layer; $C_j$, $H_j$, $W_j$ are the channels, height, and width of the activation maps from the $j$th layer, and $\phi_j(\hat{y})$ and $\phi_j(y)$ are the activation maps from the $j$th layer for the generated and target image respectively. We write the content loss and gram loss as:

$$\ell^{\phi,j}_{feat}(\hat{y}, y) = \frac{1}{C_j H_j W_j} \left\| \phi_j(\hat{y}) - \phi_j(y) \right\|_2^2 \tag{2}$$

$$G^{\phi}_j(x)_{c,c'} = \frac{1}{C_j H_j W_j} \sum_{h=1}^{H_j} \sum_{w=1}^{W_j} \phi_j(x)_{h,w,c} \, \phi_j(x)_{h,w,c'} \tag{3}$$

$$\ell^{\phi,j}_{style}(\hat{y}, y) = \left\| G^{\phi}_j(\hat{y}) - G^{\phi}_j(y) \right\|_F^2 \tag{4}$$
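A hedged sketch of eqs. (2)-(4) using VGG16 activations, following [1]; tapping the relu2_2 layer is our assumption, since the paper does not name the layers it uses.

```python
# Content loss (eq. 2) and Gram-matrix style loss (eqs. 3-4) from VGG16
# features; only one layer is tapped here for brevity.
import torch
from torchvision.models import vgg16

phi = vgg16(pretrained=True).features[:9].eval()  # up to relu2_2 (assumption)
for p in phi.parameters():
    p.requires_grad = False

def gram(f):
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)    # eq. (3)

def perceptual_losses(y_hat, y):
    f_hat, f = phi(y_hat), phi(y)
    content = ((f_hat - f) ** 2).mean()           # eq. (2), mean over C, H, W
    style = ((gram(f_hat) - gram(f)) ** 2).sum()  # eq. (4), squared Frobenius norm
    return content, style
```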

3.3.3. Adversarial Loss


Adversarial loss [10] pushes our solution toward the natural image manifold using a discriminator network that is trained to differentiate between generated images and original photo-realistic images. For notational convenience, we define $N$ as the batch size, $D_{\theta_D}$ as the discriminator, and $G_{\theta_G}$ as the generator. We write the adversarial loss as:
$$l^{SR}_{adv} = -\frac{1}{N} \sum_{n=1}^{N} \log D_{\theta_D}\!\left( G_{\theta_G}\!\left( I^{LR} \right) \right) \tag{5}$$

The loss function we used is the MAE loss along with the perceptual loss [1] computed using a pretrained VGG16 [11] network. The perceptual loss helped remove blur from the generator's output and made the results more realistic, because the feature loss is calculated between features produced by different layers of a pretrained VGG16 [11] network, while the gram loss improved the texture representation of the output image. We used a weighted loss with a different weight for each layer and added it to the L1 loss between the output and the target. The final loss is the sum of all the individual losses. This pushed our solution toward the natural image manifold using a discriminator network trained to differentiate between super-resolved images and original photo-realistic images. The discriminator loss is a basic one, shown in eq. 6. Finally, after 30 epochs of training we obtained the results reported in the results section.
$$l_{D} = 1 - \frac{1}{N} \sum_{n=1}^{N} D_{\theta_D}\!\left( I^{HR} \right) + \frac{1}{N} \sum_{n=1}^{N} D_{\theta_D}\!\left( G_{\theta_G}\!\left( I^{LR} \right) \right) \tag{6}$$
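Eqs. (5) and (6) could be computed as sketched below, assuming the discriminator ends in a sigmoid so that its output lies in (0, 1); that output head is our assumption.

```python
# Generator adversarial loss (eq. 5) and discriminator loss (eq. 6).
import torch

def generator_adv_loss(D, fake):
    # eq. (5): -(1/N) * sum_n log D(G(I_LR))
    return -torch.log(D(fake) + 1e-8).mean()

def discriminator_loss(D, real, fake):
    # eq. (6): 1 - (1/N) sum D(I_HR) + (1/N) sum D(G(I_LR))
    return 1.0 - D(real).mean() + D(fake.detach()).mean()
```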

3.3.4. Learning rate scheduler


In the 1st and 2nd stages we used the one-cycle scheduler [12], as shown in Fig. 5, to update the learning rate and momentum. In stage 1 the learning rate was 0.01 for the decoder, and in stage 2 we used discriminative learning rates [13] ranging from 0.00001 to 0.001. In the final stage the learning rate was 0.0001 for both the generator and the discriminator. We also used a weight decay of 0.01 in the first 2 stages. In the first 2 stages we set the Adam beta hyperparameters to (0.9, 0.99), and in the last stage to (0, 0.99).
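A sketch of the stage-2 optimizer set-up with PyTorch's built-in one-cycle scheduler [12] and discriminative learning rates [13] expressed as parameter groups; steps_per_epoch is a stand-in value, and encoder/decoder are the modules from the earlier sketch.

```python
# Stage-2 optimization: Adam with betas (0.9, 0.99) and weight decay 0.01,
# a one-cycle schedule, and smaller learning rates for earlier layers.
import torch

opt = torch.optim.Adam(
    [
        {"params": encoder.parameters(), "lr": 1e-5},  # earliest layers
        {"params": decoder.parameters(), "lr": 1e-3},  # newest layers
    ],
    betas=(0.9, 0.99),
    weight_decay=0.01,
)
# OneCycleLR also cycles momentum (here the first Adam beta), as in Fig. 5.
sched = torch.optim.lr_scheduler.OneCycleLR(
    opt, max_lr=[1e-5, 1e-3], epochs=10, steps_per_epoch=100
)
```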

Fig. 6. (a) Learning rate over each iteration; (b) Momentum over each iteration.

4. Results

As shown in Fig. 6, the PSNR and SSIM scores of our model after about 10 epochs turn out to be 38.1321 and 0.9430, which are better than those of most deep learning models on MRI image datasets. The losses and scores for each epoch are shown in Tables 1, 2, and 3.

Table 1. Results for stage-1 training.

epoch   train_loss   valid_loss   PSNR      SSIM
1       0.3268       0.3403       32.8676   0.9144
2       0.3016       0.3744       24.9024   0.7608
3       0.2818       0.3011       33.0228   0.9151
4       0.2616       0.314        28.1288   0.8552
5       0.2525       0.302        22.2078   0.8878
6       0.2336       0.2529       33.6734   0.9161
7       0.221        0.2603       35.2767   0.9272
8       0.2062       0.2243       36.9965   0.9251
9       0.2031       0.215        36.7158   0.9229
10      0.1984       0.2146       37.0486   0.9225

Table 2. Results for stage-2 training.

epoch   train_loss   valid_loss   PSNR      SSIM
1       0.1982       0.2165       37.0001   0.9235
2       0.2028       0.2161       37.1124   0.9244
3       0.2026       0.2276       36.8247   0.9257
4       0.1995       0.2123       36.9128   0.9246
5       0.194        0.2128       37.0009   0.925
6       0.1878       0.2092       36.9901   0.9244
7       0.1862       0.2069       37.1532   0.9257
8       0.1756       0.2018       37.1373   0.9251
9       0.1839       0.2011       37.1515   0.9244
10      0.1818       0.1983       37.1261   0.9253
34 Rohit Gupta et al. / Procedia Computer Science 173 (2020) 28–35
Rohit, Anurag, et al. / Procedia Computer Science 00 (2019) 000–000 7

Table 3. Results after discriminator is attached and a GAN is trained.


Epochs   1         2         3         4         5         6         7         8         9         10
PSNR     37.3634   37.6275   37.827    38.0794   37.8383   38.0339   37.9235   37.955    38.0926   38.1321
SSIM     0.9373    0.9373    0.9429    0.945     0.943     0.9433    0.945     0.9452    0.9466    0.943

Fig. 7. Input image (left, bilinear interpolated), generated image (middle), and target image (right).

5. Conclusion and Future scope

The results achieved by our approach show that the problem statement we started with has been successfully addressed, and the proposed generative adversarial network architecture outperforms many standard deep learning models previously applied to MRI datasets. The higher resolution images we obtained after training our model, shown in Fig. 7, are indistinguishable from the target images, and all the edges and textures are well defined, which is a great sign and shows that our model can improve the resolution of MRI images better than most deep learning models.

The limited dataset and the hardware constraints we faced during training can be overcome as computation power increases in the future, which will allow us to increase the number of epochs and achieve higher accuracy than before. In the future, we can experiment with higher up-sampling rates like 3x, 4x, and so on. This approach might also be useful for MRI datasets of other body parts with modified versions of this model. To achieve success on other datasets, retraining of the model may be required, as the image features may change depending on the organ chosen.


References

[1] Johnson, J., Alahi, A., & Fei-Fei, L. (2016). Perceptual Losses for Real-Time Style Transfer and Super-Resolution. Computer Vision – ECCV 2016, Lecture Notes in Computer Science, 694–711. doi: 10.1007/978-3-319-46475-6_43
[2] Yang, X., Zhant, S., Hu, C., Liang, Z., & Xie, D. (2016). Super-resolution of medical image using representation learning. 2016 8th International
Conference on Wireless Communications & Signal Processing (WCSP). doi: 10.1109/wcsp.2016.7752617
[3] Lyu, Q., You, C., Shan, H., & Wang, G. (2018, October 16). Super-resolution MRI through Deep Learning. Retrieved November 25, 2019, from https://arxiv.org/abs/1810.06776
[4] You, C., Li, G., Zhang, Y., Zhang, X., Shan, H., Li, M., . . . Wang, G. (2019). CT Super-resolution GAN Constrained by the Identical, Residual,
and Cycle Learning Ensemble (GAN-CIRCLE). IEEE Transactions on Medical Imaging, 1–1. doi: 10.1109/tmi.2019.2922960
[5] Shende, P., Pawar, M., & Kakde, S. (2019). A Brief Review on: MRI Images Reconstruction using GAN. 2019 International Conference on
Communication and Signal Processing (ICCSP). doi: 10.1109/iccsp.2019.8698083
[6] Chen, Y., Xie, Y., Zhou, Z., Shi, F., Christodoulou, A. G., & Li, D. (2018). Brain MRI super resolution using 3D deep densely connected neural
networks. 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). doi: 10.1109/isbi.2018.8363679
[7] Pham, C.-H., Ducournau, A., Fablet, R., & Rousseau, F. (2017). Brain MRI super-resolution using deep 3D convolutional networks. 2017 IEEE
14th International Symposium on Biomedical Imaging (ISBI 2017). doi: 10.1109/isbi.2017.7950500
[8] Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, 234–241. doi: 10.1007/978-3-319-24574-4_28
[9] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR). doi: 10.1109/cvpr.2016.90
[10] Wang, C., Xu, C., Wang, C., & Tao, D. (2018). Perceptual Adversarial Networks for Image-to-Image Transformation. IEEE Transactions on
Image Processing, 27(8), 4066–4079. doi: 10.1109/tip.2018.2836316
[11] Simonyan, K., & Zisserman, A. (2015, April 10). Very Deep Convolutional Networks for Large-Scale Image Recognition. Retrieved November
25, 2019, from https://arxiv.org/abs/1409.1556
[12] Smith, L. N., & Topin, N. (2019). Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates. In T. Pham (Ed.), Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications (p. 1100612). International Society for Optics and Photonics.
[13] Howard, J., & Ruder, S. (2018). Universal Language Model Fine-tuning for Text Classification. arXiv preprint arXiv:1801.06146.
