
Generative Adversarial Network with Residual Dense Generator for Remote Sensing Image Super Resolution
Rika Sustika (1,3), Andriyan B. Suksmono (1), Donny Danudirdjo (1), Ketut Wikantika (2)

(1) School of Electrical Engineering and Informatics, Institut Teknologi Bandung (ITB), Bandung, Indonesia
(2) Faculty of Earth Sciences and Technology, Institut Teknologi Bandung (ITB), Bandung, Indonesia
(3) Research Center for Informatics, Indonesian Institute of Sciences (LIPI), Bandung, Indonesia

Email: rika002@lipi.go.id

Abstract— Improving image resolution, especially spatial resolution, has been one of the most important concerns of the remote sensing research community. An efficient solution for improving spatial resolution is an algorithmic approach known as super-resolution (SR). The super-resolution technique that has received special attention recently is super-resolution based on deep learning. In this paper, we propose a deep learning approach based on a generative adversarial network (GAN) for remote sensing image super resolution, using a residual dense network (RDN) as the generator. Generally, deep learning with a residual dense network (RDN) gives high performance on classical (objective) evaluation metrics, while the generative adversarial network (GAN) based approach shows high perceptual quality. Experiment results show that the combination of a residual dense network generator with generative adversarial training is effective: our proposed method outperforms the baseline method in terms of both objective evaluation metrics and perceptual quality.

Keywords— convolutional neural network, generative adversarial network, remote sensing, image, residual dense network, super-resolution

I. INTRODUCTION
Remote sensing is the process of obtaining information about targeted objects or areas by measuring their reflected and emitted radiation from a distance. Remote sensing imaging can cover larger areas than other methods of telemetry data acquisition, but it has low spatial resolution relative to the dimensions of the sensed objects. An effective way to increase image spatial resolution at lower cost is an algorithm-based approach known as super-resolution (SR). SR is important in remote sensing because it can assist the visual interpretation of images in applications such as surveillance, target detection, agriculture, land use mapping, and meteorology.

For many years, a number of traditional methods were used for super-resolution. The simplest and fastest methods are interpolation, such as bilinear and bicubic interpolation. Interpolation projects the initial low resolution (LR) image onto a high resolution (HR) grid and estimates the missing pixel values with an interpolation function [1]. Learning-based super-resolution, especially deep learning with convolutional neural network (CNN) architectures, has received considerable attention recently. The pioneering CNN model for SR was proposed by Dong et al., known as the super-resolution convolutional neural network (SRCNN) [2][3]. It consists of only three convolutional layers but provides better performance than traditional methods. Improvements over SRCNN have been actively explored by many researchers. For example, Kim et al. proposed very deep super-resolution (VDSR) [4], which uses residual learning with more convolutional layers than SRCNN. Lim et al. proposed the enhanced deep super-resolution network (EDSR) by removing unnecessary modules in the residual block and expanding the model size [5]. Tai et al. proposed the deep recursive residual network, which uses recursive learning and skip connections to ease the difficulty of training [6]. Zhang et al. proposed the residual dense network (RDN), which combines a dense network with residual learning to fully exploit the hierarchical features from all convolutional layers in the network [7].

Most super resolution methods aim to maximize PSNR by minimizing the pixel-wise mean squared error (MSE) between the super-resolved image and the target image. However, measuring pixel-wise differences cannot easily capture perceptual differences between images, so a higher PSNR does not always produce a perceptually better image [8]. To overcome this problem, Ledig et al. proposed generative adversarial network (GAN) based training for super resolution, known as SRGAN (super-resolution generative adversarial network) [8]. The generative adversarial network (GAN) was introduced by Goodfellow et al. [9] to produce realistic fake images. It consists of two competing neural networks, a generator and a discriminator. The generator tries to generate a realistic image to deceive the discriminator, while the discriminator tries to distinguish the generated images from the originals. SRGAN employs GAN-based training with a perceptual loss function to obtain visually pleasing super-resolved images rather than to maximize PSNR. Other GAN-based architectures and training strategies have continuously improved on this performance. Sajjadi et al. proposed EnhanceNet, which uses a texture matching loss to encourage super-resolved results to have the same textures as the ground truth images [10]. SRFeat [11] is another GAN-based method that produces better perceptual quality; it has two discriminators and uses adversarial loss terms in both the image and feature domains. Another approach is ESRGAN, which uses a residual-in-residual dense block and features before activation to improve perceptual quality [12].
In this research, we evaluate a GAN-based approach for super resolution of remote sensing images. Different from SRGAN, our proposed method uses the residual dense network (RDN) [7], which is based on residual learning and dense connections, as the GAN generator, and uses ten convolutional layers with different numbers of filters in the discriminator network.

This paper is arranged as follows. In Section II, we present the proposed method. Section III explains the experiment details. Results and discussion are given in Section IV, and Section V concludes.
II. PROPOSED METHOD
The aim of single image super resolution (SISR) is to estimate a super resolution image from a low resolution (LR) input image, as depicted in Fig. 1. The super resolved (SR) image has higher resolution (sharper and larger) than the low resolution input image.

Fig. 1. Single image super resolution

The SISR method proposed in this research is a deep learning based approach. There are two main phases in deep learning for super-resolution, training and prediction, as depicted in Fig. 2. In the training phase, a deep learning model is trained to analyze the statistical relationship between low resolution (LR) images and their corresponding high resolution (HR) images from a training dataset. In the prediction phase, the trained SR model is used to predict HR images from a set of LR images.

Fig. 2. Deep learning super-resolution diagram

The LR image in the training phase is a low resolution version of its HR image. In the experiment, the LR image is modeled as the output of a degradation process applied to the HR image, as stated by the following equation:

    X = D(Y, δ)    (1)

where Y is the HR image, X is the LR image, D represents the degradation mapping function from the HR to the LR image, and δ denotes the parameters of the degradation process. The degradation process can be affected by various factors such as blurring, noise, artefacts, defocusing, etc. The super-resolution technique tries to recover an HR image Ŷ from the LR image X, where Ŷ is expected to be as similar as possible to Y.
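For illustration, the following is a minimal sketch of the degradation model D in (1), assuming bicubic downsampling with a 4x scale factor (the setting used in the experiments of Section III); the file names are placeholders:

    # Python sketch: model X = D(Y, delta) as bicubic 4x downsampling (PIL).
    from PIL import Image

    scale = 4                                  # degradation parameter (delta)
    hr = Image.open("hr_patch.png")            # HR image Y (placeholder name)
    lr = hr.resize((hr.width // scale, hr.height // scale),
                   resample=Image.BICUBIC)     # LR image X = D(Y, delta)
    lr.save("lr_patch.png")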
A. Generative Adversarial Training

In this research, we use a GAN-based super resolution approach to improve the spatial resolution of remote sensing images. GAN-based super resolution consists of two networks, a generator and a discriminator, that compete with each other. A diagram of the training process is shown in Fig. 3.

Fig. 3. GAN-based approach

The generator network generates a super resolution (SR) image from the low resolution (LR) input image. The SR image obtained from the generator and the ground truth (HR image) are fed into the discriminator to be assessed repeatedly until the SR image and the ground truth cannot be distinguished. We update the parameters of the adversarial networks using the SRGAN loss function (L_SRGAN) and repeat the process until the optimization is finished. The loss used in this approach is the perceptual loss function of SRGAN [8]. It consists of a content loss (L_content) and an adversarial loss (L_adversarial):

    L_SRGAN = L_content + L_adversarial    (2)

Two types of content loss are evaluated in this GAN-based super resolution: the pixel-wise MSE loss (L_MSE) and the VGG19 loss (L_VGG) [8]:

    L_MSE = (1 / (W·H)) Σ_{a=1..W} Σ_{b=1..H} (Y(a,b) − Ŷ(a,b))²    (3)

    L_VGG = (1 / (W_{i,j}·H_{i,j})) Σ_{a=1..W_{i,j}} Σ_{b=1..H_{i,j}} (φ_{i,j}(Y)(a,b) − φ_{i,j}(Ŷ)(a,b))²    (4)

where φ_{i,j} denotes a feature map within the VGG network and W_{i,j} and H_{i,j} describe the dimensions of that feature map. The adversarial loss (L_adversarial) is stated by the following equation [8]:

    L_adversarial = 10⁻³ Σ_n −log D(Ŷ_n)    (5)

where D(Ŷ_n) is the probability that the n-th super resolved image is a real HR image and the sum runs over the training samples.
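As a concrete rendering of (2)-(5), the sketch below computes an SRGAN-style loss in PyTorch. It is a sketch under assumptions, not the exact training code: sr and hr stand for batches of super-resolved and ground-truth images, d_sr for the discriminator output D(Ŷ), and the choice of VGG19 layer for φ_{i,j} is ours.

    # Python (PyTorch) sketch of the SRGAN loss (2)-(5); the VGG19 layer used
    # for the feature space phi_{i,j} is an assumption.
    import torch
    import torch.nn.functional as F
    from torchvision.models import vgg19

    vgg = vgg19(pretrained=True).features[:36].eval()  # deep VGG19 feature maps
    for p in vgg.parameters():
        p.requires_grad = False                         # VGG stays fixed

    def srgan_loss(sr, hr, d_sr, use_vgg=True):
        # d_sr = D(sr): probability that the SR image is a real HR image
        if use_vgg:
            content = F.mse_loss(vgg(sr), vgg(hr))      # VGG loss, eq. (4)
        else:
            content = F.mse_loss(sr, hr)                # pixel-wise MSE, eq. (3)
        adversarial = (-torch.log(d_sr + 1e-8)).mean()  # sum term of eq. (5)
        return content + 1e-3 * adversarial             # eqs. (2) and (5)

As in standard GAN training, the discriminator itself would be updated alternately with a binary cross-entropy loss on real/fake labels; that step is omitted here.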
B. Residual Dense Generator

The proposed generator network in this research is inspired by residual learning with a densely connected network, known as the residual dense network (RDN), proposed by Zhang et al. [7]. Fig. 4 shows the architecture of this network. The input to the network is the low resolution image. The network starts with two convolutional layers as a feature extractor, followed by several residual dense blocks (RDB). An RDB consists of a dense network with local feature fusion. The dense connections allow direct connections from the state of the preceding RDB to all the layers of the current RDB, and local fusion is used to learn the features from the preceding and current states more effectively.
Fig. 4. Residual dense network architecture

Global feature fusion is used after the residual dense blocks to learn global hierarchical features jointly and adaptively. After extracting the local and global features, we use an upsampling network followed by a convolutional layer to upscale the images [7].
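The following is a minimal sketch of one residual dense block, assuming PyTorch; the channel width and growth rate are illustrative assumptions, while the six convolutional layers per block follow the configuration reported in Section III.

    # Python (PyTorch) sketch of a residual dense block (RDB) in the spirit of
    # Zhang et al. [7]; `channels` and `growth` are illustrative assumptions.
    import torch
    import torch.nn as nn

    class ResidualDenseBlock(nn.Module):
        def __init__(self, channels=64, growth=32, num_layers=6):
            super().__init__()
            self.layers = nn.ModuleList(
                nn.Sequential(
                    nn.Conv2d(channels + i * growth, growth, 3, padding=1),
                    nn.ReLU(inplace=True))
                for i in range(num_layers))
            # Local feature fusion: 1x1 conv over all concatenated states
            self.fusion = nn.Conv2d(channels + num_layers * growth, channels, 1)

        def forward(self, x):
            states = [x]
            for layer in self.layers:
                # Dense connections: each layer sees all preceding states
                states.append(layer(torch.cat(states, dim=1)))
            # Local residual learning: fuse, then add the block input back
            return x + self.fusion(torch.cat(states, dim=1))

In the full generator, the shallow feature extractor precedes a stack of such blocks, and global feature fusion plus the upsampling network follows, as described above.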
C. Discriminator Network

The discriminator network is a classification network that is trained to distinguish the super resolved images from the ground truth HR images. Fig. 5 shows the structure of the discriminator network used in this research.

Fig. 5. Discriminator network

The discriminator network starts with a convolutional layer with a leaky ReLU activation function, followed by 9 discriminator blocks (DB). The kernel size is the same in all layers, 3x3. Every DB consists of one convolutional layer, batch normalization, and a leaky ReLU. The DBs have different numbers of filters: DB-1 has 64 filters, DB-2 has 128 filters, DB-3 has 256 filters, and DB-4 has 512 filters. The resulting 512 feature maps are followed by two dense layers. The last layer uses a sigmoid activation function to obtain the probability for sample classification.
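A sketch of this discriminator, assuming PyTorch, is given below. The 3x3 kernels, the first conv + leaky ReLU, the nine conv/batch-norm/leaky-ReLU blocks, the 64/128/256/512 filter progression, the two dense layers, and the sigmoid output follow the description above; the strides, the filter counts of DB-5 through DB-9, and the pooling before the dense layers are assumptions.

    # Python (PyTorch) sketch of the discriminator; strides, pooling, and the
    # filters of DB-5..DB-9 are assumptions (the text specifies DB-1..DB-4).
    import torch.nn as nn

    def disc_block(in_ch, out_ch, stride):
        # One DB: 3x3 conv + batch normalization + leaky ReLU
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2))

    filters = [64, 128, 256, 512, 512, 512, 512, 512, 512]  # DB-1..DB-9
    strides = [2, 1, 2, 1, 2, 1, 2, 1, 2]                   # assumed
    layers = [nn.Conv2d(3, 64, 3, padding=1), nn.LeakyReLU(0.2)]
    in_ch = 64
    for out_ch, stride in zip(filters, strides):
        layers.append(disc_block(in_ch, out_ch, stride))
        in_ch = out_ch
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),       # 512 feature maps
               nn.Linear(512, 1024), nn.LeakyReLU(0.2),     # two dense layers
               nn.Linear(1024, 1), nn.Sigmoid()]            # real/fake probability
    discriminator = nn.Sequential(*layers)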
III. EXPERIMENT

A. Dataset

In the experiment, we used the PatternNet dataset as training data. PatternNet is a remote sensing dataset collected from Google Earth imagery of US cities for remote sensing image retrieval (RSIR) [13]. This dataset consists of 38 classes with 800 images per class, each 256x256 pixels in size. The spatial resolution ranges from 0.062 m to 4.693 m. We used 30,250 images from this dataset for training.
For testing, we used four datasets. The first is the Bdg dataset, aerial photos of the Bandung area. The second is the Indo dataset, remote sensing images of Indonesia collected from Google Earth imagery. The third dataset is NWPU-RESISC45, a remote sensing image dataset for scene classification [14], and the last is a part of the PatternNet dataset that was not used for training.

B. Experiment Detail

The training and testing images above are considered the high resolution images. These HR images were downsampled by bicubic interpolation with a scale factor of 4 to create the LR images. Training used batches of size 32, with the Adam algorithm as the optimization solver. For the generator network, we employed 20 residual dense blocks with six layers in every residual dense block. A combination of content loss and adversarial loss was used for GAN training, and we evaluated two types of content loss for the proposed method: mean squared error in the first experiment (RDGAN-mse) and VGG loss in the second experiment (RDGAN-vgg). All experiments were trained for 100 epochs.

C. Image Quality Assessment

We employed two types of image quality assessment (IQA) to evaluate the performance of our proposed method. The first is full reference image quality assessment (FR-IQA) and the other is no reference image quality assessment (NR-IQA). FR-IQA evaluates the super resolved image by statistically measuring distortion values between the SR image and the reference image (the HR image); such a metric compares the differences between the pixel values at the same positions in the two images. Fig. 6 shows the process of full reference image quality assessment in our research. We used PSNR (peak signal to noise ratio) and SSIM (structural similarity) as FR-IQA metrics.

Fig. 6. Full reference image quality assessment

No reference image quality assessment (NR-IQA) evaluates the image quality without any kind of reference image. These metrics are used to assess perceptual quality and are also known as subjective indicators. The mean opinion score (MOS) is a popular method, but in this experiment we used the NIQE score [15] as the NR-IQA metric.

1) Peak Signal to Noise Ratio (PSNR)

PSNR is a widely used metric for image quality assessment. It computes the peak signal to noise ratio, in decibels, between two images. It depends on the mean squared error (MSE) between the ground truth image (Y) and the super resolved image (Ŷ), as stated in these equations:

    MSE = (1 / (k·N)) Σ_{c=1..k} Σ_{n=1..N} (Y_c(n) − Ŷ_c(n))²    (6)

    PSNR(Y, Ŷ) = 10 log₁₀ (M² / MSE)    (7)

where k is the number of image channels, N is the total number of pixels in each image, and M is the maximum possible pixel value (255 for 8-bit images).
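A direct implementation sketch of equations (6) and (7) in NumPy, assuming 8-bit images so that the peak value M is 255:

    # Python (NumPy) sketch of eqs. (6)-(7); assumes 8-bit images (M = 255).
    import numpy as np

    def psnr(y_hr, y_sr, max_val=255.0):
        y_hr = y_hr.astype(np.float64)
        y_sr = y_sr.astype(np.float64)
        mse = np.mean((y_hr - y_sr) ** 2)   # averages over pixels and channels
        return 10.0 * np.log10(max_val ** 2 / mse)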
2) Structural Similarity Index (SSIM)

SSIM is used for measuring the similarity between two images, here the super resolved image (Ŷ) and the ground truth image (Y). SSIM is a multiplicative combination of three terms, namely the luminance (L), contrast (c), and structural (s) terms:

    SSIM(Y, Ŷ) = L(Y, Ŷ) · c(Y, Ŷ) · s(Y, Ŷ)    (8)

    L(Y, Ŷ) = (2 μ_Y μ_Ŷ + C1) / (μ_Y² + μ_Ŷ² + C1)    (9)

    c(Y, Ŷ) = (2 σ_Y σ_Ŷ + C2) / (σ_Y² + σ_Ŷ² + C2)    (10)

    s(Y, Ŷ) = (σ_YŶ + C3) / (σ_Y σ_Ŷ + C3)    (11)

where μ_Y and μ_Ŷ indicate the averages of Y and Ŷ, σ_Y² and σ_Ŷ² their variances, σ_YŶ the covariance of Y and Ŷ, and C1, C2, and C3 are constants.

3) Naturalness Image Quality Evaluator (NIQE)

NIQE is an image quality evaluator that calculates the quality score of an image without a reference [15]. We used it to assess the perceptual quality of the super resolved images. The NIQE score is calculated by measuring a simple distance between a natural scene statistics model and the SR image.

Higher values of PSNR and SSIM indicate better performance of a super resolution method, while a smaller NIQE score indicates better perceptual quality.
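In practice, these metrics need not be implemented by hand. As one possibility, the hedged sketch below computes SSIM with scikit-image (the channel_axis argument assumes scikit-image >= 0.19; older releases used multichannel=True); random arrays stand in for a real image pair:

    # Python sketch: SSIM via scikit-image; random arrays stand in for images.
    import numpy as np
    from skimage.metrics import structural_similarity

    rng = np.random.default_rng(0)
    hr = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)  # stand-in HR image
    sr = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)  # stand-in SR image
    score = structural_similarity(hr, sr, data_range=255, channel_axis=2)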
IV. RESULTS AND DISCUSSION

A. Quantitative Results

Table I shows the performance of remote sensing image super resolution at a 4x upscaling factor on the four testing datasets. RDGAN in this first experiment used MSE as the content loss. We compared the results with bilinear interpolation, bicubic interpolation, and several deep learning based SISR methods (SRCNN [3], VDSR [4], RDN [7], and SRGAN [8]).

TABLE I. PSNR, SSIM, AND NIQE SCORE FOR 4X UPSCALING FACTOR
Testing Data        Methods    PSNR    SSIM    NIQE
Bdg dataset         Bilinear   26.86   0.59    7.59
                    Bicubic    27.29   0.61    7.33
                    SRCNN      27.76   0.65    6.88
                    VDSR       28.02   0.67    7.03
                    RDN        28.45   0.69    8.01
                    SRGAN      24.94   0.56    4.56
                    RDGAN      28.16   0.68    6.96
Indo dataset        Bilinear   29.80   0.69    7.98
                    Bicubic    30.48   0.72    7.60
                    SRCNN      30.99   0.74    6.90
                    VDSR       31.35   0.76    6.89
                    RDN        32.15   0.78    7.97
                    SRGAN      27.00   0.65    4.59
                    RDGAN      28.58   0.66    7.02
NWPU dataset        Bilinear   26.63   0.71    7.89
                    Bicubic    27.19   0.73    7.47
                    SRCNN      27.81   0.75    6.84
                    VDSR       28.27   0.77    6.95
                    RDN        28.88   0.79    7.57
                    SRGAN      24.50   0.68    4.72
                    RDGAN      25.94   0.70    6.88
PatternNet dataset  Bilinear   25.38   0.64    8.08
                    Bicubic    25.99   0.66    7.74
                    SRCNN      26.96   0.70    7.22
                    VDSR       27.36   0.74    7.82
                    RDN        28.18   0.76    8.63
                    SRGAN      24.71   0.67    5.75
                    RDGAN      25.13   0.67    7.02

Table I shows that, compared with the other evaluated methods, the residual dense network (RDN) architecture achieves the best performance on the objective metrics (highest PSNR and SSIM) but the worst on the perceptual quality index, while SRGAN is superior to all evaluated methods on the perceptual index. In our proposed method (RDGAN), we combined the residual dense network with generative adversarial training; compared with SRGAN, the objective measures increase by around 1.67 dB in PSNR and 0.04 in SSIM on average, while the perceptual quality degrades, with the NIQE score rising by 2.065 on average. This degradation occurs because MSE is defined on differences in pixel space, so its ability to capture perceptual differences is limited.

In the second experiment we applied the VGG loss as the content loss when training the network; the results are shown in Table II.

TABLE II. PERFORMANCE OF DIFFERENT CONTENT LOSSES ON RDGAN

Testing Dataset     Methods      PSNR    SSIM    NIQE
Bdg dataset         SRGAN        24.94   0.56    4.56
                    RDGAN-mse    28.16   0.68    6.96
                    RDGAN-vgg    26.30   0.59    4.26
Indo dataset        SRGAN        27.00   0.65    4.59
                    RDGAN-mse    31.65   0.77    7.02
                    RDGAN-vgg    28.81   0.67    4.40
NWPU dataset        SRGAN        24.50   0.68    4.72
                    RDGAN-mse    28.28   0.78    6.88
                    RDGAN-vgg    26.12   0.71    4.44
PatternNet dataset  SRGAN        24.71   0.67    5.75
                    RDGAN-mse    28.18   0.77    7.62
                    RDGAN-vgg    25.47   0.68    5.57

Table II shows that RDGAN-vgg has a significantly better perceptual quality index than RDGAN-mse, even though the PSNR and SSIM of RDGAN-vgg are smaller than those of RDGAN-mse. We observe that the VGG loss gives better perceptual quality because it works in a feature space that is closer to perceptual similarity.

In this experiment, we also compared the performance of RDGAN-vgg with SRGAN as the baseline GAN-based super resolution method. The comparison between SRGAN and RDGAN with VGG loss is plotted in Fig. 7. The graphs indicate that the PSNR and SSIM of RDGAN are consistently greater than those of SRGAN on all four testing datasets, and the NIQE score of RDGAN is also smaller than that of SRGAN on all testing datasets. This shows that the proposed method improves the quality of the super resolved images not only in perceptual quality but also in the PSNR and SSIM metrics.
Fig. 7. Comparison of RDGAN with SRGAN

B. Qualitative Results

Fig. 8 provides examples of the super resolved images at a scale factor of 4x on the four testing datasets. We compared our proposed method (RDGAN with the VGG loss function) with bicubic interpolation, SRCNN, VDSR, RDN, and SRGAN. We can observe from Fig. 8 that the images reconstructed by bicubic interpolation and by the deep learning approaches without GAN training are blurrier or more distorted than the super resolved images with GAN training. RDGAN improves on SRGAN in both the objective evaluation metrics (PSNR and SSIM) and the perceptual quality index (NIQE score).

Fig. 8. Visual results with upscaling factor 4x on the four testing datasets (from top: Bdg, Indo, NWPU, and PatternNet dataset)
V. CONCLUSION

In this paper, we proposed a GAN-based approach for remote sensing image super resolution. A residual dense network (RDN) was used as the generator to produce a higher resolution image from the low resolution input image. The dense network with residual learning and skip connections leads to better PSNR and SSIM of the super resolved image, and applying the VGG loss as the content loss gives this GAN-based approach better perceptual quality than the other evaluated SR methods. Our proposed method consistently outperforms SRGAN on four testing datasets in terms of both the full reference image quality assessments (PSNR and SSIM) and the perceptual quality evaluator (NIQE score).
ACKNOWLEDGMENT

The experiments in this research were carried out using the High Performance Computing (HPC) facility of the Research Center for Informatics, Indonesian Institute of Sciences (LIPI).
REFERENCES

[1] R. Fernandez-Beltran, P. Latorre-Carmona, and F. Pla, "Single-Frame Super-Resolution in Remote Sensing: A Practical Overview," International Journal of Remote Sensing, vol. 38, no. 1, pp. 314–354, 2017.
[2] C. Dong, C. C. Loy, K. He, and X. Tang, "Learning a Deep Convolutional Network for Image Super-Resolution," in Computer Vision – European Conference on Computer Vision 2014, 2014, pp. 184–199.
[3] C. Dong, C. C. Loy, and K. He, "Image Super-Resolution Using Deep Convolutional Networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, 2016.
[4] J. Kim, J. K. Lee, and K. M. Lee, "Accurate Image Super-Resolution Using Very Deep Convolutional Networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1646–1654.
[5] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, "Enhanced Deep Residual Networks for Single Image Super-Resolution," in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 1132–1140.
[6] Y. Tai, J. Yang, and X. Liu, "Image Super-Resolution via Deep Recursive Residual Network," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 3147–3155.
[7] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, "Residual Dense Network for Image Super-Resolution," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 2472–2481.
[8] C. Ledig et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 105–114.
[9] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, and D. Warde-Farley, "Generative Adversarial Nets," in Neural Information Processing Systems (NIPS), 2014, pp. 2672–2680.
[10] M. S. M. Sajjadi, B. Scholkopf, and M. Hirsch, "EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4501–4510.
[11] S. Park, H. Son, S. Cho, K. Hong, and S. Lee, "SRFeat: Single Image Super-Resolution with Feature Discrimination," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 439–455.
[12] X. Wang, K. Yu, S. Wu, J. Gu, and Y. Liu, "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks," in The European Conference on Computer Vision Workshops (ECCVW), 2018.
[13] W. Zhou, S. Newsam, C. Li, and Z. Shao, "PatternNet: A Benchmark Dataset for Performance Evaluation of Remote Sensing Image Retrieval," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 145, pp. 197–209, 2018, doi: 10.1016/j.isprsjprs.2018.01.004.
[14] G. Cheng, J. Han, and X. Lu, "Remote Sensing Image Scene Classification: Benchmark and State of the Art," Proceedings of the IEEE, vol. 105, no. 10, pp. 1865–1883, 2017.
[15] A. Mittal, R. Soundararajan, and A. C. Bovik, "Making a 'Completely Blind' Image Quality Analyzer," IEEE Signal Processing Letters, vol. 20, no. 3, pp. 209–212, 2013.
