ScienceDirect

Procedia Computer Science 173 (2020) 28–35

www.elsevier.com/locate/procedia

International Conference on Smart Sustainable Intelligent Computing and Applications under ICITETM2020

Super-Resolution using GANs for Medical Imaging

Rohit Gupta, Anurag Sharma∗, Anupam Kumar

Department of Computer Science, Maharaja Agrasen Institute of Technology, Rohini, Sector 22, PSP Area, Delhi, 110086, India
Abstract

Generative Adversarial Models (GANs) have been quite popular and are currently an active area of research. They can be used for generating new data and for studying adversarial samples and attacks. We have used a similar approach to apply super-resolution to medical images. In radiology, MRI is a commonly used method to produce medical imaging, but the limitations of lab equipment and the health hazard of being in an MRI radiation environment to obtain good quality scans lead to lower quality scans, and it also takes a lot of time to get high-resolution data. This problem can be solved by using super-resolution with deep learning as a post-processing step to improve the resolution of the scans. Super-resolution is a process of generating higher resolution images from lower resolution data. For this, we are proposing a generative adversarial network architecture, a dual neural network designed to generate lifelike images. In this deep learning algorithm, two neural networks compete with each other to improve alternately. Given a training set, this technique learns to generate new data with the same statistics as the training set. To apply this technique to our problem statement, we use the generator as the network that improves the resolution and the discriminator as a network that trains the generator better. We used transfer learning in our generative neural network, trained our discriminator from scratch, and used the perceptual loss [1] to train our network. This helps improve the performance of the network. We use lung MRI scans of tuberculosis with a set of 216 MRI samples containing around 60–130 channels each, with each channel having 512x512 dimensions.
© 2020 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the International Conference on Smart Sustainable Intelligent Computing and Applications under ICITETM2020.
Keywords: Generative Adversarial Network; Deep Learning; Computer Vision; Super resolution; Magnetic resonance imaging; Transfer Learning; Artificial Intelligence
∗ Corresponding author. Tel.: +91-997-145-5852
E-mail address: anurag.sharma.2p@gmail.com

1. Introduction

Magnetic resonance imaging is a widely used form of medical imaging which is carried out for the diagnosis of various disorders in body parts such as the brain, lungs and kidneys. The amount of time required by a patient to get the desired modalities at times increases beyond normal tolerance, which is dangerous for their health. Image modalities are often
affected by the equipment’s physical limitations, complexity and cost constraints. Modalities like resolution, contrast and texture can be achieved without compromising the patient’s health and without using expensive equipment by performing post-processing on the image to boost its quality using deep learning. Deep learning can be used to enhance a lower quality image by reducing noise and enhancing sharpness and contrast along with the resolution. Neural networks have performed well in the past and provide a proof of concept for our solution as well. The generative adversarial network that we are proposing is expected to give better results than traditional deep learning models, some of which we will look at in the next section. In this paper we use the basic concept of transfer learning: instead of training the generator from scratch, we use a pretrained model for the encoder in our generator, train the generator first in isolation, and then train it together with a discriminator. This helps train the model in two ways: first, by using transfer learning the model can identify features even before training, and second, it makes training faster and more accurate. Within our generator we used a pretrained encoder for better and faster convergence. Moreover, we used perceptual losses [1], which we will discuss in Section 3.3.
2. Literature Review
The paper [2] uses representation learning to improve resolution with deep neural networks on three sets of imaging data (mammary gland, prostate tissue and human brain) for training their proposed neural network, achieving average PSNR scores of 28.99, 36.98 and 46.12 respectively and SSIM scores of 0.7889, 0.9566 and 0.9983 respectively. They proposed a network in which a Randomized ReLU, which incorporates a nonzero slope for the negative part and helps solve the problem of hyper-compression, is combined with Nesterov’s Accelerated Gradient method on the SRCNN to accelerate the convergence of the loss function, reduce over-fitting noise and enhance the quality of the generated results. This network takes advantage of a nonlinear mapping from LR space to HR space directly.
In the paper [3], Qing Lyu, Chenyu You et al. achieved PSNR scores of 33.2031 and 32.2995 and SSIM scores of 0.9563 and 0.9465 respectively using their previously published GAN-CPCE and GAN-CIRCLE [4] models on the IXI dataset, which were built on similar footsteps as our approach. GAN-CPCE is a combination of the VGG network GAN with the Wasserstein distance as a perceptual similarity measure. The generator G of GAN-CPCE has four convolutional layers with 32 3×3 filters, three deconvolutional layers with 32 3×3 filters, and one deconvolutional layer with one 3×3 filter, while the discriminator D has 6 convolutional layers with 64, 64, 128, 128, 256, and 256 3×3 filters respectively, followed by 2 fully-connected layers of sizes 1024 and 1. The GAN-CIRCLE architecture that they proposed includes two generative networks, where each generative network consists of a feature extraction network and a reconstruction network.
A recently published paper [5] showed various types of GAN networks and their applications in reconstructing
MRI images with improved edges and textures for lower resolution images.
A similar concept is published in the paper [6], that is, a 3D Densely Connected Super-Resolution Network, which resulted in PSNR and SSIM scores of 35.05 and 0.9320 respectively on a brain MRI dataset with 1,113 subjects, and resolution up to 4x.
The paper [7] also demonstrates the strong performance of deep learning models in super-resolution applications. They proposed a 3D convolutional network to enhance the resolution by a factor of 2, with PSNR and SSIM scores of 37.51 and 0.9735 respectively.
3. Methodology
We started with about 216 images in NIfTI format, in which each instance contained about 60–130 channels, each containing a 512x512 image. We sampled the data in a way that minimized data similarity to prevent overfitting. This was achieved by taking every third channel in the first and last quarters of each Neuroimaging Informatics Technology Initiative (NIfTI-1) instance and taking alternate samples from the remaining channels. This also led to some cleaning and filtering of redundant and biased data, and to the removal of low-feature images, giving sets of png images corresponding to each instance. This boosted performance significantly because the process helped create a uniform dataset. To create our training data, we then used downsampling by 2x to create the training input, with the original images as the output.
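The sampling scheme above can be sketched as follows. The helper names `select_slices` and `make_pair` are hypothetical, and plain nearest-neighbour subsampling stands in for whatever resampling filter was actually used:

```python
import numpy as np

def select_slices(n_channels):
    """Pick slice indices as described: every third slice in the first and
    last quarters of a NIfTI-1 volume, every second (alternate) slice in
    the middle half, to reduce similarity between training samples."""
    q = n_channels // 4
    idx = list(range(0, q, 3))                           # first quarter
    idx += list(range(q, n_channels - q, 2))             # middle half
    idx += list(range(n_channels - q, n_channels, 3))    # last quarter
    return idx

def make_pair(hr_slice):
    """Create a (low-res, high-res) training pair by 2x subsampling of a
    512x512 slice."""
    return hr_slice[::2, ::2], hr_slice

hr = np.zeros((512, 512))
lr, target = make_pair(hr)    # lr has shape (256, 256)
```

The exact stride boundaries between the quarters are an assumption; the paper specifies only the "every third / alternate" pattern.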
The generator is a UNet [8] architecture in which the encoder is a pretrained ResNet34 [9] network. The pretrained network helped the generator obtain the relevant features without any training. We fine-tuned the encoder in the second stage to make it adapt to the dataset and improve the metric score further. The decoder is a basic network with each block as shown in Fig. 1 and Fig. 2.
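Since the decoder blocks of Fig. 1 and Fig. 2 are not reproduced in the text, the PyTorch sketch below shows one common shape such a U-Net decoder block can take (upsample, concatenate the encoder skip connection, then convolve). The layout, channel counts and activations here are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One U-Net-style decoder block: 2x upsample, concatenate the encoder
    skip connection, then two 3x3 convolutions. A common choice, offered
    only as a guess at the blocks shown in Fig. 1 and Fig. 2."""
    def __init__(self, c_in, c_skip, c_out):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.conv = nn.Sequential(
            nn.Conv2d(c_in + c_skip, c_out, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                                  # double spatial size
        return self.conv(torch.cat([x, skip], dim=1))   # fuse skip features

x = torch.randn(1, 64, 16, 16)      # low-resolution feature map
skip = torch.randn(1, 32, 32, 32)   # matching encoder activation
out = DecoderBlock(64, 32, 16)(x, skip)
```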
The discriminator is an 11-layered neural net with conv blocks and an adaptive average pooling layer. The kernel size is 3x3 in all the conv layers and the number of filters is shown in Fig. 3. The stride in each conv layer alternates between 1 and 2, and the padding size is 1 in all the layers. Each block consists of a conv layer, a batch-norm layer and a LeakyReLU layer as shown in Fig. 4.
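The description pins down the kernel size, stride pattern and padding but leaves the filter counts to Fig. 3, so the widths in this PyTorch sketch are illustrative, and the final linear head is likewise an assumption:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, stride):
    # conv (3x3, padding 1) -> batch-norm -> LeakyReLU, as in Fig. 4
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.2, inplace=True),
    )

class Discriminator(nn.Module):
    """Sketch of the 11-layer critic: ten conv blocks with stride
    alternating 1/2, plus an adaptive average pooling layer. The filter
    counts below are assumed, not taken from Fig. 3."""
    def __init__(self):
        super().__init__()
        widths = [64, 64, 128, 128, 256, 256, 512, 512, 512, 512]  # assumed
        layers, c_in = [], 1  # single-channel MRI slice as input
        for i, c_out in enumerate(widths):
            layers.append(conv_block(c_in, c_out, stride=1 if i % 2 == 0 else 2))
            c_in = c_out
        layers.append(nn.AdaptiveAvgPool2d(1))   # 11th layer
        self.features = nn.Sequential(*layers)
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(c_in, 1))

    def forward(self, x):
        return self.head(self.features(x))

scores = Discriminator()(torch.randn(2, 1, 64, 64))  # one score per image
```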
The input to a pretrained network should have 3 channels, so we converted the slices extracted from the NIfTI-1 formatted dataset to RGB channel input, while the output is a single-channel image. The training was done in 3 stages. In the first stage we froze the encoder completely and trained the decoder from scratch. In the second stage we unfroze the encoder and fine-tuned the whole generator. At that point we had a pretrained generator and a discriminator. Finally, in the third stage we trained both the generator and discriminator simultaneously to improve the generator further and improve the metric score. The model was trained for 10 epochs in each stage.
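The three-stage schedule reduces to a small freeze/unfreeze helper; the modules below are toy stand-ins for the actual encoder, decoder and discriminator:

```python
import torch.nn as nn

def set_trainable(module, flag):
    """Freeze (flag=False) or unfreeze (flag=True) all parameters."""
    for p in module.parameters():
        p.requires_grad = flag

# Toy stand-ins for the real networks; names are illustrative only.
encoder, decoder = nn.Linear(4, 4), nn.Linear(4, 4)

# Stage 1: frozen pretrained encoder, decoder trained from scratch.
set_trainable(encoder, False)
set_trainable(decoder, True)
stage1_frozen = [p.requires_grad for p in encoder.parameters()]

# Stage 2: unfreeze the encoder and fine-tune the whole generator.
set_trainable(encoder, True)

# Stage 3 (loop not shown): alternate generator/discriminator updates,
# adding the adversarial term to the generator loss.
```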
To get the input image we downscaled the data to 256x256 and then resized it back to 512x512 using bilinear interpolation; this upsampled image was the input to our model, and the original data was the output. Using just the generator we got good results, but the discriminator helped improve them further. To enforce the mappings between the source and target domains and regularize the training procedure, our proposed network combines two types of loss functions: mean absolute error and perceptual loss [1]. While training with the discriminator in the final stage we added the adversarial loss [10]:
MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \qquad (1)
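Eq. (1) is straightforward to compute; a minimal NumPy version:

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error of eq. (1), averaged over all N pixels."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean(np.abs(y - y_hat))

err = mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])   # (0 + 0 + 2) / 3
```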
Fig. 5. (a) Learning rate over each iteration; (b) Momentum over each iteration.
where $C_j$, $H_j$ and $W_j$ are the channels, height and width of the activation maps from the $j$-th layer, and $\phi_j(\hat{y})$ and $\phi_j(y)$ are the activation maps from the $j$-th layer for the generated and target image respectively. We write the content loss and gram loss as:

l_{feat}^{\phi,j}(\hat{y}, y) = \frac{1}{C_j H_j W_j} \left| \phi_j(\hat{y}) - \phi_j(y) \right| \qquad (2)

G_j^{\phi}(x)_{c,c'} = \frac{1}{C_j H_j W_j} \sum_{h=1}^{H_j} \sum_{w=1}^{W_j} \phi_j(x)_{h,w,c} \, \phi_j(x)_{h,w,c'} \qquad (3)

l_{style}^{\phi,j}(\hat{y}, y) = \left| G_j^{\phi}(\hat{y}) - G_j^{\phi}(y) \right| \qquad (4)
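The feature, Gram and style losses of eqs. (2)-(4) can be checked numerically; a NumPy sketch for a single activation map of shape (H, W, C), with the L1 distances taken as elementwise sums:

```python
import numpy as np

def gram(phi):
    """Gram matrix of eq. (3) for one activation map phi of shape (H, W, C),
    normalised by C*H*W."""
    h, w, c = phi.shape
    feats = phi.reshape(h * w, c)
    return feats.T @ feats / (c * h * w)

def content_loss(phi_y_hat, phi_y):
    """Feature (content) loss of eq. (2): normalised L1 distance between
    activation maps."""
    return np.abs(phi_y_hat - phi_y).sum() / phi_y.size   # size = C*H*W

def style_loss(phi_y_hat, phi_y):
    """Style loss of eq. (4): distance between Gram matrices."""
    return np.abs(gram(phi_y_hat) - gram(phi_y)).sum()

a = np.ones((2, 2, 1))   # toy 2x2 activation map with one channel
g = gram(a)              # 1x1 matrix: 4 / (1*2*2) = 1.0
```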
The loss function we used is the MAE loss along with the perceptual loss [1] computed using a pretrained VGG16 [11] network. The perceptual loss helped remove blur from the output of the generator and made the results more realistic, because in the feature loss the loss is calculated between the features generated from different layers of another pretrained VGG16 [11] network, and the gram loss helped improve the texture representation of the output image. We used a weighted loss with a different weight for each layer and added that to the L1 loss between the output and the target.
The final loss is the sum of all the losses calculated individually. This pushed our solution toward the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. The discriminator loss is a basic one as shown in eq. 6. Finally, after 30 epochs of training we got the results mentioned in the results section.
l_{adv}^{SR} = 1 - \frac{1}{N} \sum_{n=1}^{N} D_{\theta_D}\left( I^{HR} \right) + \frac{1}{N} \sum_{n=1}^{N} D_{\theta_D}\left( G_{\theta_G}\left( I^{LR} \right) \right) \qquad (6)
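Given discriminator scores on a batch of real high-resolution images and on the corresponding super-resolved outputs, eq. (6) reduces to a one-liner:

```python
import numpy as np

def adv_loss(d_real, d_fake):
    """Adversarial term of eq. (6): d_real are discriminator scores on
    high-resolution images, d_fake are scores on super-resolved outputs."""
    return 1.0 - np.mean(d_real) + np.mean(d_fake)

d_real = np.array([0.9, 0.8])   # scores on real HR images
d_fake = np.array([0.1, 0.2])   # scores on generated images
loss = adv_loss(d_real, d_fake)  # 1 - 0.85 + 0.15 = 0.3
```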
Fig. 6. (a) Learning rate over each iteration; (b) Momentum over each iteration.
4. Results
As shown in Fig. 6, the PSNR and SSIM scores of our model after about 10 epochs turn out to be 38.1321 and 0.9430, which are better than most deep learning models on MRI image datasets. The losses and scores for each epoch are shown in Tables 1, 2, and 3.
Epoch   1        2        3        4        5        6        7        8        9        10
PSNR    37.3634  37.6275  37.827   38.0794  37.8383  38.0339  37.9235  37.955   38.0926  38.1321
SSIM    0.9373   0.9373   0.9429   0.945    0.943    0.9433   0.945    0.9452   0.9466   0.943
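The PSNR values reported above follow the standard definition; for images scaled to [0, 1]:

```python
import numpy as np

def psnr(y, y_hat, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a target image y and a
    reconstruction y_hat, assuming pixel values in [0, max_val]."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    mse = np.mean((y - y_hat) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

score = psnr([0.0, 0.0], [0.1, 0.1])   # mse = 0.01 -> 20 dB
```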
Fig. 7. Input image (left) is bilinear interpolated, target image (right) and generated image (middle).
The results achieved by our approach show that the problem statement we started with has been successfully solved, and the generative adversarial network architecture proved to outperform many standard deep learning models on MRI datasets. The higher resolution images that we achieved after training our model, as shown in Fig. 7, are indistinguishable from the target images, with all the edges and textures well defined, which is a great sign and proves our model can improve the resolution of MRI images better than most deep learning models. The limited dataset and hardware limitations that we faced during training can be mitigated as computation power increases in the future, which will allow us to increase the number of epochs and achieve higher accuracy. In the future, we can experiment with higher up-sampling rates like 3x, 4x, and so on. This approach might also be useful for MRI datasets of other body parts with modified versions of this model. To achieve success on other datasets, retraining of the model may be required, as the features of the images may change depending on the organ chosen.
References
[1] Johnson, J., Alahi, A., & Fei-Fei, L. (2016). Perceptual Losses for Real-Time Style Transfer and Super-Resolution. Computer Vision – ECCV 2016, Lecture Notes in Computer Science, 694–711. doi: 10.1007/978-3-319-46475-6_43
[2] Yang, X., Zhant, S., Hu, C., Liang, Z., & Xie, D. (2016). Super-resolution of medical image using representation learning. 2016 8th International
Conference on Wireless Communications & Signal Processing (WCSP). doi: 10.1109/wcsp.2016.7752617
[3] Lyu, Q., You, C., Hongming, S., & Wang, G. (2018, October 16). Super-resolution MRI through Deep Learning. Retrieved November 25, 2019,
from https://arxiv.org/abs/1810.06776
[4] You, C., Li, G., Zhang, Y., Zhang, X., Shan, H., Li, M., . . . Wang, G. (2019). CT Super-resolution GAN Constrained by the Identical, Residual,
and Cycle Learning Ensemble (GAN-CIRCLE). IEEE Transactions on Medical Imaging, 1–1. doi: 10.1109/tmi.2019.2922960
[5] Shende, P., Pawar, M., & Kakde, S. (2019). A Brief Review on: MRI Images Reconstruction using GAN. 2019 International Conference on
Communication and Signal Processing (ICCSP). doi: 10.1109/iccsp.2019.8698083
[6] Chen, Y., Xie, Y., Zhou, Z., Shi, F., Christodoulou, A. G., & Li, D. (2018). Brain MRI super resolution using 3D deep densely connected neural
networks. 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). doi: 10.1109/isbi.2018.8363679
[7] Pham, C.-H., Ducournau, A., Fablet, R., & Rousseau, F. (2017). Brain MRI super-resolution using deep 3D convolutional networks. 2017 IEEE
14th International Symposium on Biomedical Imaging (ISBI 2017). doi: 10.1109/isbi.2017.7950500
[8] Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, 234–241. doi: 10.1007/978-3-319-24574-4_28
[9] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR). doi: 10.1109/cvpr.2016.90
[10] Wang, C., Xu, C., Wang, C., & Tao, D. (2018). Perceptual Adversarial Networks for Image-to-Image Transformation. IEEE Transactions on
Image Processing, 27(8), 4066–4079. doi: 10.1109/tip.2018.2836316
[11] Simonyan, K., & Zisserman, A. (2015, April 10). Very Deep Convolutional Networks for Large-Scale Image Recognition. Retrieved November
25, 2019, from https://arxiv.org/abs/1409.1556
[12] Smith, L. N., & Topin, N. (2019). Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates. In T. Pham (Ed.), Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications (p. 1100612). International Society for Optics and Photonics.
[13] Howard, J., & Ruder, S. (2018). Universal Language Model Fine-tuning for Text Classification. arXiv preprint arXiv:1801.06146.