Authorized licensed use limited to: National Technical University of Athens (NTUA). Downloaded on December 26,2022 at 20:03:24 UTC from IEEE Xplore. Restrictions apply.
5601117 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 60, 2022
corresponding HR image (as shown in Fig. 1). Intuitively, a Google Earth HR image can provide extra information and may help reconstruct fine textures in LR images. Therefore, for the remote sensing SR task, we consider using publicly available HR images from Google Earth as reference (Ref) images to help reconstruct the fine texture from LR images. Specifically, the Ref images can guide the SR process where the LR and Ref images contain similar contents, leading to SR images that have sharp visual quality and preserve the ground-truth class [18].

In the computer vision field, some studies have diverted from SISR and explored reference-based SR (RefSR) [18]–[20]. RefSR introduces additional reference images to compensate for the lost details in the LR images. The state-of-the-art methods follow the pattern of combining image alignment or patch matching with texture synthesis [20]. Existing RefSR studies are mainly based on two different types of assumptions: 1) the Ref images and LR images are aligned well or have high content similarity (e.g., the same object from different viewpoints or video frames) [19], [21], [22] or 2) the Ref images and LR images sometimes are significantly misaligned or have uncertain content similarity (e.g., from web image searches) [20], [23], [24]. However, the above assumptions do not fully match the remote sensing scenario in this article, mainly due to the following two reasons.

1) In remote sensing tasks, the HR Ref images can be easily matched to the LR images at the same location using the latitude and longitude information. Therefore, the Ref and LR images have a certain content similarity. However, image alignment is still necessary due to the different shooting viewpoints, which result in different tilting directions of tall buildings and deviation in geographic coordinates (usually within several pixels). The challenges of alignment are the large spectral difference between different sensors, land cover changes at different times, shifts in the viewpoint, and illumination variations.
2) Due to inevitable land cover changes, cloud coverage, and missing HR Ref images, there are higher requirements for the robustness of the model, especially in the Ref texture transfer process.

To address the above-mentioned issues, we propose a novel reference-based remote sensing GAN (RRSGAN) for SR, as shown in Fig. 2. Our approach works in a “feature extraction–alignment–transfer” fashion, as shown in Fig. 3. In the feature alignment process, objects with a larger offset usually come from tall buildings, which generally have apparent boundaries. Therefore, we propose a gradient-assisted feature alignment (GAFA) method to match the Ref features with the LR features for better use in the subsequent process. In the feature transfer process, we propose a relevance attention module (RAM) to suppress irrelevant information and enhance the relevant information of the Ref features to improve the robustness of the model.

The contributions of our work are summarized as follows.
1) To the best of our knowledge, we are one of the first to explore RefSR on remote sensing images. Based on the publicly available HR images from Google Earth, we build an open-source reference-based remote sensing super-resolution data set (RRSSRD).
2) We propose an end-to-end RefSR approach for remote sensing images, named RRSGAN. RRSGAN contains a feature extraction and alignment module, a multilevel texture transformer, and GAN-based losses in both the image and gradient domains. Considering the misalignment problem between LR and Ref images and model robustness with different qualities of Ref images, we propose a GAFA method and an RAM to further release the potential of RefSR in remote sensing scenarios.
3) We demonstrate that our proposed RRSGAN is superior to both the state-of-the-art SISR methods and existing RefSR methods on RRSSRD. Our proposed method is also robust with different qualities of Ref images and performs well on real-world images. This work proves the great potential of the RefSR approach in the field of remote sensing.

The rest of this article is organized as follows. We introduce the related work in Section II, including the existing SR methods for remote sensing images, SISR, and the deep learning-based RefSR methods. In Section III, we give a detailed description of the proposed approach. Experimental results and discussion are provided in Section IV, and Section V concludes our work.

II. RELATED WORK

A. SR for Remote Sensing Images

In the field of remote sensing, depending on the number of input images, the two main categories of techniques for improving the resolution of remotely sensed images are SISR and multi-image super-resolution (MISR) [25]. MISR techniques in remote sensing aim to reconstruct high spatial frequency details from multiple LR versions of the same remote sensing scenes [26]–[30]. However, MISR approaches can hardly be applied where multiple remotely sensed images of the same scene are impossible or difficult to obtain [31]. For such situations, SISR has become the more feasible SR technology [32].

With the rapid progress made in deep learning, deep learning-based SISR methods outperform the traditional SISR methods and prove to be promising in the domain of remote sensing SR [31], [33]. The early CNN-based methods [34]–[36] usually retrained a network designed for natural images, such as SRCNN and VDSR, with remote sensing images. Recent progress of SR for remote sensing images can be summarized in four aspects.
1) Improving the structure of the SR network for remote sensing images. For example, Pan et al. [37] proposed a residual dense backprojection network (RDBPN)-based SISR method, which can utilize residual learning in both global and local manners. Jiang et al. [8] improved the performance regarding the details of remote sensing images by adding an edge enhancement module to the
DONG et al.: RRSGAN: RefSR FOR REMOTE SENSING IMAGE 5601117
Fig. 3. Generator of the proposed RRSGAN. The approach is designed in a “feature extraction–alignment–transfer” structure. First, the Ref features are
extracted and aligned to the LR features. In the texture transfer process, we first extract the LR features and then transfer the aligned Ref features in a
multiscale way.
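For reference, the gradient map M(I) used by the alignment stage (Section III-A) can be computed with fixed central-difference convolution kernels. The sketch below is our illustrative PyTorch version, not the authors' released code:

```python
import torch
import torch.nn.functional as F

def gradient_map(img: torch.Tensor) -> torch.Tensor:
    """img: (B, 1, H, W). Returns M(I), the per-pixel L2 norm of the
    central-difference gradient (I_x, I_y), computed with fixed conv kernels."""
    kx = torch.tensor([[[[-1.0, 0.0, 1.0]]]])      # I(x+1, y) - I(x-1, y)
    ky = torch.tensor([[[[-1.0], [0.0], [1.0]]]])  # I(x, y+1) - I(x, y-1)
    ix = F.conv2d(img, kx, padding=(0, 1))
    iy = F.conv2d(img, ky, padding=(1, 0))
    return torch.sqrt(ix ** 2 + iy ** 2)

# A horizontal ramp has I_x = 2 and I_y = 0 in the interior,
# so the gradient map equals 2 away from the borders.
ramp = torch.arange(5.0).repeat(5, 1).view(1, 1, 5, 5)
print(gradient_map(ramp)[0, 0, 2, 2].item())  # 2.0
```

Because the kernels are fixed (non-learnable), this operator adds no parameters and can feed both the gradient encoder and the gradient-domain losses.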
information can help the model focus on neighboring configurations and better infer the local intensity of sharpness. Therefore, the generator can benefit from the two discriminators to learn the fine appearance and focus on avoiding the distortions of geometric details.

The generator of RRSGAN is an end-to-end network, as shown in Fig. 3. The approach is designed in a “feature extraction–alignment–transfer” structure. First, we need to extract the Ref features and align them to the LR features. Otherwise, the Ref features cannot be used effectively in the subsequent texture transfer process. Different from the patch-based method in SRNTT [20] and the flow-based method in CrossNet [19], we adopt deformable convolutions to align the Ref and LR features. Also, considering that objects with a larger offset usually come from tall buildings, we propose a gradient-assisted alignment method to improve the performance. Then, we extract the LR features and perform texture transfer from the Ref features. Following SRNTT [20], we transfer the aligned Ref features in a multiscale way. Instead of directly concatenating the LR and Ref features in the texture transfer, we propose an RAM to improve the robustness of the model. The RAM can enhance the relevant information to improve the performance and suppress the irrelevant information of the Ref features, which is useful for preventing irrelevant Ref features from making the results worse than those of SISR methods.

We will elaborate on the details of the RRSGAN in the following sections. In Section III-A, we introduce the GAFA method. In Section III-B, we describe the end-to-end network, including the LR feature extractor, RAM, texture transfer, and discriminator. A group of loss functions is introduced in Section III-C, and the implementation details of the proposed method are presented in Section III-D.

A. Gradient Assisted Feature Alignment Method

The proposed GAFA contains a feature extraction module and a feature alignment module, as shown in Fig. 4(a) and (b), respectively. The purpose is to extract the Ref features and align them to the LR features. As we analyzed in Section I, the most difficult objects for feature alignment are usually tall buildings. Intuitively, the gradient information from the object boundary is useful for alignment. Therefore, in addition to extracting the image features, we explicitly calculate the image gradient to assist with feature alignment. Note that although the feature extractor can implicitly learn to extract useful features, including the image gradient, the explicit retention and utilization of the gradient features make the network concentrate more on the objects that are prone to deviation, as shown in the ablation studies in Section IV-F2. The gradient map of an image is obtained by computing the difference between adjacent pixels. The gradient calculation of a pixel x = (x, y) in image I is defined as follows:

    I_x(x) = I(x + 1, y) − I(x − 1, y)
    I_y(x) = I(x, y + 1) − I(x, y − 1)
    ∇I(x) = (I_x(x), I_y(x))
    M(I) = ‖∇I‖_2    (1)

where M(·) computes the L2 norm of the gradient at each location and M(I) is referred to as the gradient map of image I. The image gradient ∇I can be efficiently computed by using convolution layers with fixed kernels.

To match the size of the LR image to the Ref image, we first upscale the LR image to obtain LR↑ (↑ denotes the bilinear upsampling process). Instead of directly using the Ref image for alignment, we use Ref↓↑ (↓ denotes the bilinear downsampling process), resampled by first applying bilinear downsampling and then upsampling to the original Ref image. Then, LR↑ and Ref↓↑ are used to estimate the offset between the LR image and the Ref image. Eventually, we use the offset to align the Ref features. We use Ref↓↑ rather than the original Ref image for computing the offset because the different blur degrees of the Ref and LR↑ images increase the
Fig. 4. Gradient-assisted feature alignment method. The symbol ↑ denotes the bicubic upsampling process, and ↓ denotes the bicubic downsampling process. The concatenation symbol denotes feature concatenation on the channel dimension. (a) Feature extraction module. (b) Feature alignment module.
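A minimal sketch of the two parallel encoders in Fig. 4(a), reconstructed from the description in the text (four 5 × 5 conv + Leaky ReLU blocks, stride 2 in the last two, 64 feature maps for the image encoder and 16 for the gradient encoder); the input channel counts and negative slope are our assumptions, not the authors' released code:

```python
import torch
import torch.nn as nn

def make_encoder(in_ch: int, out_ch: int) -> nn.Sequential:
    """Four conv blocks (5x5 conv + LeakyReLU); the last two use stride 2,
    so features are produced at full, full, 1/2, and 1/4 resolution."""
    layers = []
    for stride in (1, 1, 2, 2):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=5, stride=stride, padding=2),
                   nn.LeakyReLU(0.1)]
        in_ch = out_ch
    return nn.Sequential(*layers)

image_encoder = make_encoder(3, 64)     # RGB image -> 64 feature maps
gradient_encoder = make_encoder(1, 16)  # gradient map -> 16 feature maps

img, grad = torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64)
f_img, f_grad = image_encoder(img), gradient_encoder(grad)
print(f_img.shape, f_grad.shape)  # coarsest level: (1, 64, 16, 16) and (1, 16, 16, 16)
```

In the full model, multiscale features are tapped after intermediate blocks and the image and gradient features are concatenated at each level; the sketch returns only the coarsest level.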
difficulty of the alignment. In contrast, the blurry Ref↓↑ image is domain-consistent with LR↑, which helps the alignment. The effectiveness of the resampling strategy for Ref images is demonstrated in Section IV-F2. As shown in Fig. 4(b), three pairs of data are used as inputs for the alignment, in which each pair contains an image and its corresponding gradient map.

1) Feature Extraction Module: The feature extraction module contains two parallel encoders to extract the image and gradient features simultaneously. Each encoder consists of four convolution blocks, and each convolution block is composed of a convolution layer and a Leaky ReLU. The kernel size of the convolution layers is set to 5. The stride of the last two convolutional layers is set to 2 to obtain multiscale features. The number of feature maps in the image encoder and the image gradient encoder is set to 64 and 16, respectively. The multiscale features include both semantic and textural information, contributing to feature alignment and transfer. Besides, the features extracted from the two parallel encoders are concatenated at the same level.

2) Feature Alignment Module: The structure of the feature alignment module is shown in Fig. 4(b). We use deformable convolutions [51] to align the Ref and LR features. Deformable convolution aligns the features more flexibly compared to explicit motion estimation or image warping. Inspired by the success of deformable convolution for aligning neighboring frames in the video field [52], we adopt a pyramid structure to estimate and propagate the offsets and generate aligned Ref features at multiple levels.

Specifically, we align the features in a coarse-to-fine manner for the entire image. After we obtain three-level features in the feature extraction module, the offsets are predicted at each level by two convolution layers. Then, deformable convolution can utilize the predicted offsets to align the Ref features. Besides, the offsets and aligned features at the lth level can be further used to help predict the offsets and aligned features at the (l − 1)th level. At level 1, cascading refinement [53] is used to improve the performance of the alignment further: a subsequent deformable alignment is cascaded to refine the coarsely aligned Ref features [52]. Note that the alignment module does not require explicit supervision. Also, the offset is jointly learned with the whole model instead of being trained separately. The effectiveness of the feature alignment module is evaluated in Section IV-F2.

B. End-to-End Network Structure

Without the requirement for a separate SISR network as in CrossNet [19] or a separate feature alignment network as in SRNTT [20], the proposed RRSGAN is trained in an end-to-end manner. We use the GAN architecture. The generator includes a feature alignment module (elaborated in Section III-A), an LR feature extractor, and a texture transformer, which contains the RAMs. We use two discriminators D_I and D_G for the image and gradient domains, respectively.

1) LR Feature Extractor: The input LR image is first forwarded to the feature extractor, which contains N residual blocks. Each block consists of two convolution layers and a Leaky ReLU, as shown in the pink box in Fig. 3. In this work, the kernel size of the convolution layers is set to 3 × 3, and the number of feature maps in each layer is set to 64.

2) RAM: Instead of directly combining the Ref and LR features for texture transfer, we use the correlation between the Ref and LR features to modify the Ref features before combining. Intuitively, the relevant information between the Ref and LR features should be enhanced, and the less relevant information should be suppressed in the Ref features and
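The RAM described above modulates the aligned Ref features by their relevance to the LR features before combination. The exact formulation is not reproduced here, so the following is a hypothetical single-level sketch using a per-pixel cosine-similarity mask:

```python
import numpy as np

def relevance_mask(lr_feat: np.ndarray, ref_feat: np.ndarray) -> np.ndarray:
    """lr_feat, ref_feat: (C, H, W) feature maps. Returns an (H, W) mask in
    [0, 1] from the per-pixel cosine similarity between the two feature maps."""
    eps = 1e-8
    dot = (lr_feat * ref_feat).sum(axis=0)
    norms = np.linalg.norm(lr_feat, axis=0) * np.linalg.norm(ref_feat, axis=0)
    cosine = dot / (norms + eps)
    return 0.5 * (cosine + 1.0)  # map [-1, 1] to [0, 1]

def modulate_ref(lr_feat: np.ndarray, ref_feat: np.ndarray) -> np.ndarray:
    # Relevant Ref features pass through; irrelevant ones are suppressed.
    return ref_feat * relevance_mask(lr_feat, ref_feat)

lr = np.ones((8, 4, 4))
print(round(float(relevance_mask(lr, lr).max()), 3))   # identical features -> mask near 1
print(round(float(relevance_mask(lr, -lr).max()), 3))  # opposite features -> mask near 0
```

This captures the intended behavior (irrelevant Ref content is driven toward zero before fusion); the actual RAM is learned jointly with the network.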
D. Implementation Details

Following the standard protocol, we obtain the LR images during training by downsampling the HR images using a bicubic kernel with downsampling factor r = 4. For each input minibatch, we randomly crop 16 patches of size 64 × 64 from the LR images. The corresponding HR patches have a size of 256 × 256. The texture transformer in the SR network contains three stages, each consisting of 16 residual blocks. For the discriminators, we adopt a VGG-style network without BN layers. We set the weight hyperparameters α, β, γ, and δ to 0.1, 0.001, 1, and 0.001, respectively. The Adam optimizer is used for optimization with the parameters β1 = 0.9, β2 = 0.999, and ε = 1 × 10⁻⁸. The learning rates for both the generator and the discriminators are set to 1 × 10⁻⁴ and are halved at 50k, 100k, and 200k iterations. We first warm up the network for 30k iterations, where only L_rec and L_g_rec are applied. Then, we use all losses to train for a total of 300k iterations. We implement our models with the PyTorch framework and train them using 16 NVIDIA GTX 1080Ti GPUs.

IV. EXPERIMENTS

A. Data Sets

To the best of our knowledge, the existing common data sets used for SR of remote sensing tasks [56], [57] do not provide coordinate information for each image, limiting the matching of reference images. Therefore, we build a benchmark data set for RefSR technology in this work, named the RRSSRD. This data set covers common classes of remote sensing scenes, including airport, bare land, beach, bridge, center, commercial, dense-residential, farmland, forest, industrial, meadow, medium-residential, park, parking, playground, pond, port, river, sparse-residential, viaduct, and so on.

Information about the RRSSRD is shown in Table I. Examples of the HR-Ref pairs in RRSSRD are shown in Fig. 6. RRSSRD consists of 4047 pairs of HR-Ref images with RGB bands. The HR images are acquired from WorldView-2 and GaoFen-2, and depict Xiamen and Jinan City, China. The Ref images are collected from Google Earth in 2019 with a spatial resolution of 0.6 m. We downsample each HR image 4 times to obtain an LR image. The HR and Ref images are sized 480 × 480 pixels, and correspondingly, the LR images are sized 120 × 120 pixels.

Considering the model performance on different image sources and locations, we build four test data sets. Each test set consists of 40 pairs of HR-Ref images. In the first test set, the images are collected from WorldView-2 and depict Xiamen City, China. The images in the second test set are also taken in Xiamen City, but the HR images are collected from Microsoft Virtual Earth in 2018 with a spatial resolution of 0.5 m. In the third test set, the HR images are acquired from the GaoFen-2 (GF-2) satellite in 2018 with a spatial resolution of 0.8 m and depict Jinan City, China. The HR images in the fourth test set are collected from Microsoft Virtual Earth and depict Jinan in 2018 with a spatial resolution of 0.5 m. Note that all the Ref images are collected from Google Earth in 2019 with a spatial resolution of 0.6 m. LR images are obtained by ×4 bicubic downsampling from the HR images and are sized 120 × 120
pixels. The Ref images are resized to 480 × 480 pixels, which is the same size as the HR images.

Furthermore, we test our method on real-world remotely sensed images from the GaoFen-1 (GF-1) satellite with a spatial resolution of 2 m. The corresponding Ref images are collected from Google Earth and have a spatial resolution of 0.6 m.

B. Evaluation Metrics

The PSNR and SSIM have been used as standard evaluation metrics in image SR [58]. Nevertheless, as revealed in some recent studies [59], [60], super-resolved images may sometimes have high PSNR and SSIM scores with oversmoothed results but tend to lack realistic visual quality. Hence, apart from the PSNR, the perception index (PI) [59] and the learned perceptual image patch similarity (LPIPS) [60] are included in our experiments. Besides, the PI and the natural image quality evaluator (NIQE) [61] can be used as evaluation metrics on real-world images. The NIQE and PI were originally introduced as no-reference image quality assessment methods based on low-level statistical features [62]. The NIQE is obtained by computing the 36 identical natural scene statistics (NSS) features from patches of the same size from the image [61]. The PI is calculated by incorporating the criteria of Ma et al. [63] and the NIQE as follows:

    PI = (1/2)((10 − Ma) + NIQE).    (11)

The LPIPS is a full-reference metric that measures perceptual image similarity using a pretrained deep network. We use the AlexNet [64] model to compute the l2 distance in the feature space. LPIPS can be calculated for a given image y and a ground-truth image y_0 as follows:

    LPIPS(y, y_0) = Σ_l (1 / (H_l W_l)) Σ_{h,w} ‖w_l ⊙ (f^l_{h,w} − f^l_{0,h,w})‖²_2    (12)

where H_l and W_l represent the height and width of the lth layer, respectively; f^l_{h,w} and f^l_{0,h,w} represent the features of y and y_0 at the lth layer and location (h, w), respectively; w_l is a learned weight vector; and ⊙ is the elementwise multiplication operation. Note that, in contrast to PSNR and SSIM, lower PI and LPIPS indicate better SR results.

C. Quantitative and Qualitative Comparison With Different Methods

In this section, we compare our proposed method with state-of-the-art SISR and RefSR methods on the four test data sets. The compared SISR methods include five CNN-based SISR methods (i.e., VDSR [11], SRResNet [14], MDSR [12], WDSR [13], and DBPN [47]), two state-of-the-art SR methods for remote sensing images (i.e., RDBPN [37] and Cycle-CNN [65]), and two GAN-based SISR methods (i.e., ESRGAN [15] and SPSR [16]). Two RefSR methods, i.e., the recently proposed CrossNet [19] and SRNTT [20], are also included in the comparison. Note that of these two methods, CrossNet is a CNN-based method and SRNTT is a GAN-based method. All these methods are fully optimized on our training data set to obtain their best performance for a fair comparison. Note that Cycle-CNN aims to reconstruct real-world images and requires real LR images that are not generated from HR images. Thus, we add the real LR images from the GF-1 satellite for the Cycle-CNN method and also compare our method with Cycle-CNN both on the four test data sets and on the real-world GF-1 data. The results using bicubic interpolation (Bicubic) are also included for comparison.

For better comparison with both GAN-based and CNN-based methods, we train two networks, i.e., RRSGAN and RRSNet. RRSNet, a simplified version of RRSGAN with the discriminators removed, uses only the reconstruction loss. RRSNet is evaluated to make a fair comparison with the CNN-based methods.

We quantitatively evaluated the SR results using four metrics: PI, LPIPS, PSNR, and SSIM. In each row, the best result is highlighted in red. As shown in Table II, RRSNet exhibits the highest scores in the metrics of PSNR and SSIM, whereas RRSGAN achieves the best performance in the metric of LPIPS on all four test data sets. For the PI metric, our proposed approach outperforms the other SR methods on most test data sets.

Generally, CNN-based methods have better PSNR and SSIM because they focus on preserving the spatial structure of the LR images. However, the SR results of CNN-based methods suffer from a lack of realistic visual appearance, which causes worse LPIPS and PI. In contrast, GAN-based methods obtain better LPIPS and PI, as they use adversarial loss and perceptual loss, encouraging the network to generate visually favorable results. Besides, we notice that the performances of CrossNet and SRNTT are not satisfactory. The reason may be that these models are designed based on specific assumptions for common scenarios, and it is unreasonable to apply them to remote sensing scenarios directly. Owing to the usage of Ref features, RRSNet surpasses the other CNN-based methods by a large margin. Compared with GAN-based methods, RRSGAN is not only very competitive on the image quality assessment metrics but also performs well on PSNR and SSIM. The reason is that RRSGAN can utilize rich texture information from Ref images to reconstruct the details in LR images.

A visual comparison is presented in Fig. 7 and can further explain the quantitative results. The results of bicubic interpolation cannot produce extra details. Owing to learning-based technologies, CNN-based SISR methods, such as SRResNet, MDSR, WDSR, DBPN, RDBPN, and Cycle-CNN, can reconstruct some texture details but still suffer from blurry contours due to the simplex optimization objective function. GAN-based SISR methods, such as ESRGAN and SPSR, have a better visual appearance but generate artificial artifacts that worsen the reconstruction results. The SR results of SRNTT suffer from the problem of blocky artifacts due to its patch matching method. Compared with other SR methods, our proposed approach recovers finer texture details, and the results are more natural and realistic.
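As a worked instance of (11): a hypothetical image with a Ma score of 8.2 and an NIQE of 4.6 obtains PI = ((10 − 8.2) + 4.6)/2 = 3.2. In code:

```python
def perception_index(ma: float, niqe: float) -> float:
    """PI = ((10 - Ma) + NIQE) / 2, as in (11); lower indicates better quality."""
    return 0.5 * ((10.0 - ma) + niqe)

print(round(perception_index(8.2, 4.6), 3))  # 3.2
```

Since both a high Ma score and a low NIQE indicate good perceptual quality, the two criteria pull PI in the same direction.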
Fig. 7. Visual comparison of our methods with different SR methods on the test sets.
D. Robustness of Our Proposed Method

In practical applications, RefSR methods need to be sufficiently robust against various Ref images with different quality levels. To test the robustness of our proposed approach, we simulate four scenarios, including Ref images from different image sources, covered by clouds, mismatched, and missing. Correspondingly, the four kinds of Ref images are retrieved from Microsoft Virtual Earth (MS), covered
TABLE II
QUANTITATIVE COMPARISON WITH DIFFERENT METHODS. VDSR, SRRESNET, MDSR, WDSR, DBPN, RDBPN, AND CYCLE-CNN ARE CNN-BASED SISR METHODS. ESRGAN AND SPSR ARE GAN-BASED SISR METHODS. CROSSNET AND SRNTT ARE REFSR METHODS. FOR PSNR AND SSIM, A HIGHER SCORE INDICATES BETTER, WHEREAS FOR PI AND LPIPS, A LOWER SCORE INDICATES BETTER. IN EACH ROW, THE BEST RESULT IS HIGHLIGHTED IN RED
TABLE III
RESULTS OF THE USE OF DIFFERENT REF IMAGES ON THE FIRST TEST SET. FOR PSNR AND SSIM, A HIGHER SCORE INDICATES BETTER, WHEREAS FOR LPIPS, A LOWER SCORE INDICATES BETTER. RED INDICATES THE BEST AND BLUE INDICATES THE SECOND-BEST RESULTS
TABLE IV
RESULTS OF THE ABLATION STUDY ON REF TEXTURE TRANSFER. FOR PSNR AND SSIM, A HIGHER SCORE INDICATES BETTER, WHEREAS FOR PI AND LPIPS, A LOWER SCORE INDICATES BETTER. IN EACH COLUMN, THE BEST RESULT IS HIGHLIGHTED IN RED
by clouds (Cloud), irrelevant images (Irrelevant), and black images (Black). To further demonstrate the robustness and effectiveness of our proposed method, we also use the bicubic upsampled LR images (LR X4) or the HR images (HR) as references.

We calculate the quantitative results of the reconstruction under the conditions mentioned above on the first test set. The results presented in Table III show that the proposed approach is robust in handling the most common distortion cases in remote sensing. Even when the “Black” images or “Cloud” images are used as the references, the results of our method are still better than those of the GAN-based SISR methods (compared with Table II). This is due to the multiscale reconstruction structure and the gradient loss, which guarantee the baseline performance, and the RAM, which suppresses the irrelevant information in the
Fig. 8. SR results of using different Ref images. The first and the third rows represent the different Ref images. The second and the fourth rows represent
the corresponding SR results.
Ref features. Although the SR results of the model with irrelevant Ref images have qualitative scores similar in terms of PSNR and SSIM to those of the model with the Google Earth image, better relevance can produce more realistic textures, which is reflected in the metric of LPIPS. As shown in Fig. 8, using relevant Ref images, the SR results show sharp edges and display clear image content. In contrast, using irrelevant Ref images, the model degenerates to an SISR method and cannot reconstruct more details.

Specifically, the model achieves the best performance when we use the HR images as references. This demonstrates the effectiveness of the texture transfer from the Ref images,
Fig. 9. Experimental results in a real-world scenario. The results of the PI and NIQE of each SR image are presented. A lower score indicates better results.
F. Ablation Studies

In this section, we verify the effectiveness of each component of our approach, including the Ref texture transfer, the GAFA method, the RAM, and the gradient loss. We also discuss the hyperparameter tuning of the loss weights and the model efficiency.

1) Effectiveness of Ref Texture Transfer: To verify the effectiveness of the Ref texture transfer, we conduct ablation experiments. We use the same training strategy and network parameters as introduced in Section III-C, except for the used level of Ref features in texture transfer. Note that we keep the same feature alignment module and the same number of residual blocks at each stage in the comparison experiments, instead of removing each entire stage of texture transfer, to avoid performance differences caused by different network depths. “Without texture transfer” means that we do not use Ref features in the SR process, and this method can be regarded as an SISR method. Then, we gradually add different levels of Ref features into the network from level 1 to level 3. “With 1-level texture transfer” means that we only use the Ref level-1 features, which is shown as the largest orange box in Fig. 3. “With 2-level texture transfer” means that we use the Ref level-1 and level-2 features. “With 3-level texture transfer” means that we use all three levels of Ref features. As shown in Table IV, the use of Ref features can significantly improve the performance of SR results compared with the SISR method. Gradually combining more in-depth Ref features further improves the performance of SR results. Therefore, we use three levels of Ref features in our method.

2) Effectiveness of the GAFA Method: We discuss the effect of feature alignment between Ref and LR images. We conduct
TABLE VI
RESULTS OF THE ABLATION STUDY ON THE FEATURE ALIGNMENT METHOD IN TERMS OF THE NUMBER OF ALIGNMENT LEVELS. FOR PSNR AND SSIM, A HIGHER SCORE INDICATES BETTER, WHEREAS FOR PI AND LPIPS, A LOWER SCORE INDICATES BETTER. IN EACH COLUMN, THE BEST RESULT IS HIGHLIGHTED IN RED
TABLE IX
RESULTS OF THE ABLATION STUDY ON GRADIENT LOSS. FOR PSNR AND SSIM, A HIGHER SCORE INDICATES BETTER, WHEREAS FOR PI AND LPIPS, A LOWER SCORE INDICATES BETTER. IN EACH COLUMN, THE BEST RESULT IS HIGHLIGHTED IN RED
TABLE X
Results of different loss weights. For PSNR and SSIM, a higher score indicates better, whereas for PI and LPIPS, a lower score indicates better. In each column, the best result is highlighted in red.
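Table X varies the weights α, β, γ, and δ of the loss terms in (1). A minimal sketch of how such a weighted objective could be assembled, under one plausible reading of (1) in which α and β weight the image-domain perceptual and adversarial terms and γ and δ weight the two gradient-domain terms (the exact association is defined by (1) earlier in the paper):

```python
# Default weights follow the values chosen in the text:
# alpha = 0.1, beta = 0.001, gamma = 1, delta = 0.001.
def total_loss(l_rec, l_per, l_adv, l_g_rec, l_g_adv,
               alpha=0.1, beta=0.001, gamma=1.0, delta=0.001):
    """Weighted sum of the five loss terms; setting alpha = beta = gamma =
    delta = 0 recovers the reconstruction-only (RRSNet) objective."""
    return (l_rec
            + alpha * l_per
            + beta * l_adv
            + gamma * l_g_rec
            + delta * l_g_adv)
```

Note that zeroing all four weights reproduces the "reconstruction loss only" row of Table X, which attains the highest PSNR/SSIM but the weakest perceptual scores.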
TABLE XI
Comparison of model parameters and inference runtime. VDSR, SRResNet, MDSR, WDSR, DBPN, RDBPN, and Cycle-CNN are CNN-based SISR methods. ESRGAN and SPSR are GAN-based SISR methods. CrossNet and RRSNet are CNN-based RefSR methods. SRNTT and RRSGAN are GAN-based RefSR methods.
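Table XI reports inference runtime measured on an NVIDIA GTX 1080Ti GPU. As a generic sketch of how per-image inference time is commonly averaged (warm-up runs before timing; timing actual GPU code would additionally require device synchronization, which this CPU-only sketch omits):

```python
import time

def average_inference_time(run, n_warmup=5, n_runs=20):
    """Average wall-clock time per call of `run` (seconds), after warm-up."""
    for _ in range(n_warmup):  # warm-up: exclude one-time setup costs
        run()
    start = time.perf_counter()
    for _ in range(n_runs):
        run()
    return (time.perf_counter() - start) / n_runs
```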
the use of RAM can effectively improve the robustness of the model in different scenarios. The reason is that RAM can suppress the influence of the less relevant information in the Ref features. The attention masks of different levels in RAM are presented in Fig. 10 and can further explain the process. As shown by the dark areas in the red rectangles, the areas with land cover changes between the LR image and the Ref image, caused by different seasons or building changes, receive less attention. The attention is focused on the relevant area between the LR image and the Ref image, as shown by the bright areas in the green rectangles. Therefore, the RAM can improve the robustness of the model by suppressing the irrelevant information and enhancing the relevant information between the LR features and the Ref features.

4) Effectiveness of the Gradient Loss: We analyze the effect of the gradient loss. “Baseline” means that we only use the common SR loss functions, including the reconstruction loss Lrec, the adversarial loss Ladv, and the perceptual loss Lper. The gradient-based reconstruction loss Lg_rec and the gradient-based adversarial loss Lg_adv are added sequentially. As shown in Table IX, the gradient loss improves the PI and LPIPS compared with those of the baseline model. This indicates that the use of the gradient loss can yield a more realistic visual appearance. In addition, although the additional gradient discriminator increases the training time and training difficulty, it does not increase the inference time of the test phase.

5) Hyperparameter Tuning of Loss Weights: We perform ablation experiments to understand the impact of the different loss terms in (1). We use the same training strategy and network parameters as introduced in Section III-C, except for different loss weights. First, we examine the effect of using only the reconstruction loss Lrec, i.e., with RRSNet, when α, β, γ, and δ are all set to 0. As shown in Table X, the highest PSNR and SSIM values are obtained using only the reconstruction loss compared with other loss weight settings. However, the reconstruction loss often leads to overly smoothed results and is weak in restoring natural and realistic textures. The introduction of the gradient losses Lg_rec and Lg_adv can greatly improve the visual quality of the reconstruction, which has been verified in Section IV-F4.

To determine the appropriate setting of the loss weights, based on the commonly used loss weights in the SR methods [15], [16], we experiment with three sets of hyperparameters. Following the setting of different loss weights in [16], the gradient-based loss weight settings are consistent with the image-based loss weight settings, i.e., γ is set to 1 and δ is equal
to β. From Table X, we can see that the reasonable use of the adversarial loss Ladv and the perceptual loss Lper can greatly improve the perceptual effect and contribute to better PI and LPIPS values. However, excessive adversarial and perceptual loss weights can reduce the performance of the SR results or even lead to the failure of the feature alignment module. The reason is that our model learns to align feature maps in an unsupervised fashion, where we do not explicitly define a loss term for pixelwise offset estimation. This means that the feature alignment indirectly benefits from the final supervision of the SR results. In such a situation, the reconstruction loss provides clearer guidance for learning feature alignment than the adversarial loss and the perceptual loss. To balance the image reconstruction effect and the perceptual effect, we set the weight hyperparameters α, β, γ, and δ to 0.1, 0.001, 1, and 0.001, respectively.

6) Model Efficiency: In Table XI, we report the number of model parameters, the computational complexity, the training time, and the inference time (in GPU mode) of different SISR and RefSR methods. For the inference time, all the approaches are run on an NVIDIA GTX 1080Ti GPU and tested on 120 × 120 LR images. Correspondingly, the Ref input images are 480 × 480 pixels and are only used in the RefSR methods. For the training time, some of the SISR methods, including VDSR, SRResNet, EDSR, MDSR, and WDSR, are trained for 1 000 000 iterations. Cycle-CNN and CrossNet are trained for 500 000 iterations. SRNTT is trained for 400 000 iterations. The rest of the methods are trained for 250 000 iterations. Note that the training time of SRNTT is measured only for the network training phase, excluding the offline feature swapping phase. In general, the training time of GAN-based methods is longer than that of CNN-based methods. The inference time of the RefSR methods is longer than that of the SISR methods due to the extra processing of the Ref images. Compared with SRNTT, our proposed method effectively reduces the inference time. In future work, we will further optimize our approach in terms of model efficiency.

V. CONCLUSION

In this article, we explore the use of reference (Ref) images to assist in the reconstruction of LR images in remote sensing tasks. We build a benchmark data set and propose RRSGAN, an end-to-end network with a GAFA module and a texture transformer. GAFA extracts the Ref features and aligns them to the LR features. The texture transformer can effectively utilize the aligned Ref features to help reconstruct the fine textures in LR images. Experimental results demonstrate the effectiveness and robustness of RRSGAN. This work also proves the great potential of the RefSR approach in the field of remote sensing. In future work, we will further explore the performance of RefSR at a larger upscaling factor (e.g., eight times) and optimize our approach in terms of model efficiency.

REFERENCES

[1] R. Mathieu, C. Freeman, and J. Aryal, “Mapping private gardens in urban areas using object-oriented techniques and very high-resolution satellite imagery,” Landscape Urban Planning, vol. 81, no. 3, pp. 179–192, Jun. 2007.
[2] B. Pan, Z. Shi, X. Xu, T. Shi, N. Zhang, and X. Zhu, “CoinNet: Copy initialization network for multispectral imagery semantic segmentation,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 5, pp. 816–820, May 2019.
[3] R. Dong, W. Li, H. Fu, M. Xia, J. Zheng, and L. Yu, “Semantic segmentation based large-scale oil palm plantation detection using high-resolution satellite images,” Proc. SPIE, vol. 10988, May 2019, Art. no. 109880D.
[4] W. Li, C. He, J. Fang, J. Zheng, H. Fu, and L. Yu, “Semantic segmentation-based building footprint extraction using very high-resolution satellite images and multi-source GIS data,” Remote Sens., vol. 11, no. 4, p. 403, Feb. 2019.
[5] S. Yuan et al., “Long time-series analysis of urban development based on effective building extraction,” Proc. SPIE, vol. 11398, Apr. 2020, Art. no. 113980M.
[6] Y. Bai, Y. Zhang, M. Ding, and B. Ghanem, “SOD-MTGAN: Small object detection via multi-task generative adversarial network,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 206–221.
[7] W. Yang, X. Zhang, Y. Tian, W. Wang, J.-H. Xue, and Q. Liao, “Deep learning for single image super-resolution: A brief review,” IEEE Trans. Multimedia, vol. 21, no. 12, pp. 3106–3121, Dec. 2019.
[8] K. Jiang, Z. Wang, P. Yi, G. Wang, T. Lu, and J. Jiang, “Edge-enhanced GAN for remote sensing image superresolution,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 8, pp. 5799–5812, Aug. 2019.
[9] N. Huang, Y. Yang, J. Liu, X. Gu, and H. Cai, “Single-image super-resolution for remote sensing data using deep residual-learning neural network,” in Proc. Int. Conf. Neural Inf. Process. Guangzhou, China: Springer, 2017, pp. 622–630.
[10] S. Lei, Z. Shi, and Z. Zou, “Coupled adversarial training for remote sensing image super-resolution,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 5, pp. 3633–3643, May 2020.
[11] J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1646–1654.
[12] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep residual networks for single image super-resolution,” 2017, arXiv:1707.02921.
[13] J. Yu et al., “Wide activation for efficient and accurate image super-resolution,” 2018, arXiv:1808.08718.
[14] C. Ledig et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4681–4690.
[15] X. Wang et al., “ESRGAN: Enhanced super-resolution generative adversarial networks,” in Proc. Eur. Conf. Comput. Vis. Workshops (ECCVW), Sep. 2018.
[16] C. Ma, Y. Rao, Y. Cheng, C. Chen, J. Lu, and J. Zhou, “Structure-preserving super resolution with gradient guidance,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 7769–7778.
[17] Q. Liu, J. C. Trinder, and I. L. Turner, “Automatic super-resolution shoreline change monitoring using Landsat archival data: A case study at Narrabeen–Collaroy Beach, Australia,” Proc. SPIE, vol. 11, no. 1, Mar. 2017, Art. no. 016036.
[18] Z.-S. Liu, W.-C. Siu, and Y.-L. Chan, “Reference based face super-resolution,” IEEE Access, vol. 7, pp. 129112–129126, 2019.
[19] H. Zheng, M. Ji, H. Wang, Y. Liu, and L. Fang, “CrossNet: An end-to-end reference-based super resolution network using cross-scale warping,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 88–104.
[20] Z. Zhang, Z. Wang, Z. Lin, and H. Qi, “Image super-resolution by neural texture transfer,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 7982–7991.
[21] H. Yue, X. Sun, J. Yang, and F. Wu, “Landmark image super-resolution by retrieving Web images,” IEEE Trans. Image Process., vol. 22, no. 12, pp. 4865–4878, Dec. 2013.
[22] Y. Wang, Y. Liu, W. Heidrich, and Q. Dai, “The light field attachment: Turning a DSLR into a light field camera using a low budget camera ring,” IEEE Trans. Vis. Comput. Graph., vol. 23, no. 10, pp. 2357–2364, Oct. 2017.
[23] H. Zheng et al., “Learning cross-scale correspondence and patch-based synthesis for reference-based super-resolution,” in Proc. Brit. Mach. Vis. Conf. (BMVC), T.-K. Kim, S. Zafeiriou, G. Brostow, and K. Mikolajczyk, Eds. BMVA Press, Sep. 2017, pp. 138.1–138.13, doi: 10.5244/C.31.138.
[24] F. Yang, H. Yang, J. Fu, H. Lu, and B. Guo, “Learning texture transformer network for image super-resolution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 5791–5800.
[25] L. Yue, H. Shen, J. Li, Q. Yuan, H. Zhang, and L. Zhang, “Image super-resolution: The techniques, applications, and future,” Signal Process., vol. 128, pp. 389–408, Nov. 2016.
[26] R. Y. Tsai and T. S. Huang, “Multiframe image restoration and registration,” Adv. Comput. Vis. Image Process., vol. 1, no. 2, pp. 317–339, 1984.
[27] T. Akgun, Y. Altunbasak, and R. M. Mersereau, “Super-resolution reconstruction of hyperspectral images,” IEEE Trans. Image Process., vol. 14, no. 11, pp. 1860–1875, Nov. 2005.
[28] J. Ma, J. C.-W. Chan, and F. Canters, “An operational superresolution approach for multi-temporal and multi-angle remotely sensed imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 5, no. 1, pp. 110–124, Feb. 2012.
[29] H. Shen, M. K. Ng, P. Li, and L. Zhang, “Super-resolution reconstruction algorithm to MODIS remote sensing images,” Comput. J., vol. 52, no. 1, pp. 90–100, Feb. 2008.
[30] F. Li, X. Jia, D. Fraser, and A. Lambert, “Super resolution for remote sensing images based on a universal hidden Markov tree model,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 3, pp. 1270–1278, Mar. 2010.
[31] R. Fernandez-Beltran, P. Latorre-Carmona, and F. Pla, “Single-frame super-resolution in remote sensing: A practical overview,” Int. J. Remote Sens., vol. 38, no. 1, pp. 314–354, Jan. 2017.
[32] D. Yang, Z. Li, Y. Xia, and Z. Chen, “Remote sensing image super-resolution: Challenges and approaches,” in Proc. IEEE Int. Conf. Digit. Signal Process. (DSP), Jul. 2015, pp. 196–200.
[33] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep residual networks for single image super-resolution,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jul. 2017, pp. 136–144.
[34] Y. Luo, L. Zhou, S. Wang, and Z. Wang, “Video satellite imagery super resolution via convolutional neural networks,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 12, pp. 2398–2402, Dec. 2017.
[35] Z. Shao and J. Cai, “Remote sensing image fusion with deep convolutional neural network,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 5, pp. 1656–1669, May 2018.
[36] A. Xiao, Z. Wang, L. Wang, and Y. Ren, “Super-resolution for ‘Jilin-1’ satellite video imagery via a convolutional network,” Sensors, vol. 18, no. 4, p. 1194, Apr. 2018.
[37] Z. Pan, W. Ma, J. Guo, and B. Lei, “Super-resolution of single remote sensing image based on residual dense backprojection networks,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 10, pp. 7918–7933, Oct. 2019.
[38] S. Zhang, Q. Yuan, J. Li, J. Sun, and X. Zhang, “Scene-adaptive remote sensing image super-resolution using a multiscale attention network,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 7, pp. 4764–4779, Jul. 2020.
[39] L. Zhang, D. Chen, J. Ma, and J. Zhang, “Remote-sensing image superresolution based on visual saliency analysis and unequal reconstruction networks,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 6, pp. 4099–4115, Jun. 2020.
[40] I. Yanovsky, B. H. Lambrigtsen, A. B. Tanner, and L. A. Vese, “Efficient deconvolution and super-resolution methods in microwave imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 9, pp. 4273–4283, Sep. 2015.
[41] S. Kanakaraj, M. S. Nair, and S. Kalady, “SAR image super resolution using importance sampling unscented Kalman filter,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 2, pp. 562–571, Feb. 2018.
[42] X. Xu et al., “A new spectral-spatial sub-pixel mapping model for remotely sensed hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 11, pp. 6763–6778, Nov. 2018.
[43] C. Yi, Y.-Q. Zhao, and J. C.-W. Chan, “Hyperspectral image super-resolution based on spatial and spectral correlation fusion,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 7, pp. 4165–4177, Jul. 2018.
[44] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in Proc. Eur. Conf. Comput. Vis. Zürich, Switzerland: Springer, 2014, pp. 184–199.
[45] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, Feb. 2016.
[46] J. Kim, J. K. Lee, and K. M. Lee, “Deeply-recursive convolutional network for image super-resolution,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1637–1645.
[47] M. Haris, G. Shakhnarovich, and N. Ukita, “Deep back-projection networks for super-resolution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 1664–1673.
[48] I. Goodfellow et al., “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
[49] J. Caballero et al., “Real-time video super-resolution with spatio-temporal networks and motion compensation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4778–4787.
[50] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Munich, Germany: Springer, 2015, pp. 234–241.
[51] J. Dai et al., “Deformable convolutional networks,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 764–773.
[52] X. Wang, K. C. K. Chan, K. Yu, C. Dong, and C. C. Loy, “EDVR: Video restoration with enhanced deformable convolutional networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2019.
[53] T.-W. Hui, X. Tang, and C. C. Loy, “LiteFlowNet: A lightweight convolutional neural network for optical flow estimation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8981–8989.
[54] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014, arXiv:1409.1556.
[55] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in Proc. Eur. Conf. Comput. Vis. Amsterdam, The Netherlands: Springer, 2016, pp. 694–711.
[56] Y. Yang and S. Newsam, “Bag-of-visual-words and spatial extensions for land-use classification,” in Proc. 18th SIGSPATIAL Int. Conf. Adv. Geographic Inf. Syst. (GIS), 2010, pp. 270–279.
[57] G.-S. Xia et al., “Structural high-resolution satellite image indexing,” in Proc. ISPRS TC 7th Symp.–100 Years ISPRS, Vienna, Austria, Jul. 2010, pp. 298–303.
[58] M. Irani and S. Peleg, “Super resolution from image sequences,” in Proc. 10th Int. Conf. Pattern Recognit., vol. 2, Jun. 1990, pp. 115–120.
[59] Y. Blau, R. Mechrez, R. Timofte, T. Michaeli, and L. Zelnik-Manor, “The 2018 PIRM challenge on perceptual image super-resolution,” in Proc. Eur. Conf. Comput. Vis. Workshops (ECCVW), Sep. 2018.
[60] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 586–595.
[61] A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a ‘completely blind’ image quality analyzer,” IEEE Signal Process. Lett., vol. 20, no. 3, pp. 209–212, Mar. 2013.
[62] L. Liu, B. Liu, H. Huang, and A. C. Bovik, “No-reference image quality assessment based on spatial and spectral entropies,” Signal Process., Image Commun., vol. 29, no. 8, pp. 856–863, Sep. 2014.
[63] C. Ma, C.-Y. Yang, X. Yang, and M.-H. Yang, “Learning a no-reference quality metric for single-image super-resolution,” Comput. Vis. Image Understand., vol. 158, pp. 1–16, May 2017.
[64] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017, doi: 10.1145/3065386.
[65] P. Wang, H. Zhang, F. Zhou, and Z. Jiang, “Unsupervised remote sensing image super-resolution using cycle CNN,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2019, pp. 3117–3120.
[66] A. Dosovitskiy et al., “FlowNet: Learning optical flow with convolutional networks,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 2758–2766.

Runmin Dong received the bachelor’s degree in information and computing science from the Department of Science, Beijing Jiaotong University, Beijing, China, in 2017. She is currently pursuing the Ph.D. degree in ecology with the Department of Earth System Science, Tsinghua University, Beijing. Her research interests include remote sensing image processing, deep learning, land cover mapping, image super-resolution reconstruction, and self-supervised representation learning.
Lixian Zhang received the bachelor’s degree in engineering of surveying and mapping from the School of Geodesy and Geomatics, Wuhan University, Hubei, China, in 2018. He is currently pursuing the Ph.D. degree in ecology with the Department of Earth System Science, Tsinghua University, Beijing, China. His research interests include building extraction from remote sensing images, deep learning, and remote sensing image super-resolution reconstruction.

Haohuan Fu (Member, IEEE) received the Ph.D. degree in computing from Imperial College London, London, U.K., in 2009. He is a Professor with the Ministry of Education Key Laboratory for Earth System Modeling and the Department of Earth System Science, Tsinghua University, Beijing, China. He is also the Deputy Director of the National Supercomputing Center, Wuxi, China. His research interests include design methodologies for highly efficient and highly scalable simulation applications that can take advantage of emerging multicore, many-core, and reconfigurable architectures, and make full utilization of current Peta-Flops and future Exa-Flops supercomputers; and intelligent data management, analysis, and data mining platforms that combine statistical methods and machine learning technologies.