ABSTRACT The acquisition of remote sensing images is affected by imaging equipment and environmental conditions; on lower-performance devices, the resolution of the acquired images is correspondingly low. Among many methods, super-resolution reconstruction based on generative adversarial networks has obvious advantages over previous network models in reconstructing image texture details. However, experiments show that not all of these reconstructed textures exist in the original image. To address the question of whether the texture details of the reconstructed image are accurate and clear, we propose a super-resolution reconstruction method that combines the wavelet transform with a generative adversarial network. Using wavelet multi-resolution analysis, training the wavelet decomposition coefficients within the generative adversarial network can effectively improve the local detail information of the reconstructed image. Experimental results show that our method reconstructs more natural image textures and makes the images visually clearer. On the remote sensing image test set, the four indicators of the algorithm, peak signal-to-noise ratio (PSNR), structural similarity (SSIM), feature similarity (FSIM), and universal image quality (UIQ), are slightly better than those of the compared algorithms.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
18764 VOLUME 8, 2020
Z.-X. Huang, C.-W. Jing: Super-Resolution Reconstruction Method of Remote Sensing Image Based on Multi-Feature Fusion
values of the pixels to be interpolated. This method has low complexity and high speed, but the interpolation results easily produce artificial effects such as jagged edges and ringing interference. The reconstruction-based method [9] relies on a specific degradation model to impose constraints on high-resolution image reconstruction from the observed low-resolution image sequence, and then fuses different information of the same scene to obtain high quality. The reconstruction results can better suppress the artificial effects, but they also lose detail information, and the method is complicated to operate, difficult to guarantee in accuracy, and low in efficiency. With the optimization of processor performance, convenient conditions have been provided for the fields of big data and artificial intelligence, and deep learning applications have become more widespread [10], [11]. Learning-based algorithms are currently a hotspot in the field of super-resolution [12]: the algorithm learns the mapping relationship between high-resolution and low-resolution images by extracting features, and finally realizes image reconstruction.

The super-resolution reconstruction method based on deep learning is an emerging method that has advantages over traditional methods in terms of accuracy and clarity. However, some deep-learning-based super-resolution reconstruction algorithms take minimizing the mean square error as the optimization goal [13]–[16]. Although this can improve the objective evaluation indices of the reconstructed image, it cannot effectively reconstruct the detailed texture of the image. Ledig et al. proposed SRGAN (Super-Resolution Using a Generative Adversarial Network), which applies a generative adversarial network to the field of super-resolution reconstruction [17]. The reconstructed image texture is relatively clear, but some details are not faithful to the real image. In order to reconstruct more realistic image detail information, we propose a super-resolution reconstruction algorithm that combines wavelet transform technology with generative adversarial network technology, aiming to restore the high-frequency detail information missing from low-resolution images. In this algorithm, we map the wavelet decomposition and reconstruction processes onto a convolutional layer and a deconvolution layer in the generator network structure. The decomposed wavelet coefficients are trained in parallel, independent subnets, and the outputs of the subnets are then combined into a reconstructed image by deconvolution. Finally, the discriminative network is used to judge whether the output image of the generator network is a high-resolution image. This method takes advantage of both the wavelet transform and generative adversarial networks in the super-resolution field, so that the reconstructed image has better texture information and is visually clearer.

II. MODEL DESIGN
Our proposed super-resolution reconstruction algorithm combines wavelet transform technology with a generative adversarial network. The advantage of generative adversarial networks is that they can reconstruct images with more realistic texture details. In the network, wavelet multi-resolution analysis is used to train the wavelet decomposition coefficients, improving the network's understanding of the local details of the image during training. As a result, it can reconstruct high-resolution images with better high-frequency detail and sharpness. This section introduces the specific operation of this image super-resolution reconstruction method in detail.

A. WAVELET TRANSFORM
In the method proposed in this paper, super-resolution reconstruction is completed by training the image wavelet decomposition coefficients during network training. In the network, we use the transformation between the channels of the convolutional layer to implement wavelet decomposition of the feature image. The four decomposed coefficients are used as the inputs of four independent subnets. Finally, the trained wavelet coefficients are merged through a deconvolution layer. The subnets are independent of each other and do not affect each other. This method realizes multi-resolution analysis of images within the generative adversarial network, so that the network can both understand the entire image and analyze its details. The network thus gains a broader and deeper understanding of the image, thereby reconstructing image details more accurately and closer to the original image.

When performing the wavelet transform on an image, we consider the image as a two-dimensional signal with index [a, b], where I^HR[a, b] is the pixel value in row a and column b. The two-dimensional signal I^HR[a, b] can be treated as a one-dimensional signal along a given b-th row I^HR[a, :] or a given a-th column I^HR[:, b]. We set the high-pass and low-pass filters in the wavelet transform to h[k] and l[k], respectively. In the two-dimensional discrete wavelet transform, we first perform high-pass and low-pass filtering with downsampling along the b direction:

I_L^HR[a, b] = Σ_{j=0}^{J−1} I^HR[a, 2b − j] · l[j]   (1)

I_H^HR[a, b] = Σ_{j=0}^{J−1} I^HR[a, 2b − j] · h[j]   (2)

Then the high-pass and low-pass filtering and downsampling are extended to the a direction:

I_LL^HR[a, b] = Σ_{j=0}^{J−1} I_L^HR[2a − j, b] · l[j]   (3)

I_HL^HR[a, b] = Σ_{j=0}^{J−1} I_L^HR[2a − j, b] · h[j]   (4)
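The column-then-row filtering and downsampling above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's network implementation: it assumes the Haar filter pair l = [1/√2, 1/√2], h = [1/√2, −1/√2], and the function names are our own.

```python
import numpy as np

# Haar filter pair (an illustrative assumption; any orthogonal
# wavelet filter pair is applied the same way).
l = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass
h = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass

def filter_downsample(signal, filt):
    # y[b] = sum_j signal[2b - j] * filt[j]: convolve, then keep
    # every other sample, as in Eqs. (1)-(4).
    full = np.convolve(signal, filt)  # full[n] = sum_j signal[n - j] * filt[j]
    return full[1::2]

def dwt2_level(img):
    # Filter and downsample along the b direction (within each row) ...
    I_L = np.apply_along_axis(filter_downsample, 1, img, l)
    I_H = np.apply_along_axis(filter_downsample, 1, img, h)
    # ... then along the a direction, yielding the four subbands:
    # one approximation band and three detail bands.
    LL = np.apply_along_axis(filter_downsample, 0, I_L, l)
    HL = np.apply_along_axis(filter_downsample, 0, I_L, h)
    LH = np.apply_along_axis(filter_downsample, 0, I_H, l)
    HH = np.apply_along_axis(filter_downsample, 0, I_H, h)
    return LL, HL, LH, HH
```

For a flat image every detail band is zero and all energy moves into the approximation band; because the Haar pair is orthonormal, the total energy of the four subbands equals that of the input image.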
I_LH^HR[a, b] = Σ_{j=0}^{J−1} I_H^HR[2a − j, b] · l[j]   (5)

I_HH^HR[a, b] = Σ_{j=0}^{J−1} I_H^HR[2a − j, b] · h[j]   (6)

Among them, I_LL^HR obtained by equation (3) represents the approximation coefficients of the original image, that is, the low-frequency component of the image; I_HL^HR obtained by equation (4), I_LH^HR obtained by equation (5), and I_HH^HR obtained by equation (6) represent the horizontal, vertical, and diagonal detail coefficients of the image, that is, the high-frequency components of the image.

In addition, when performing discrete wavelet decomposition and reconstruction of two-dimensional images, Haar wavelet functions are often used as wavelet basis functions. The Haar wavelet is the earliest compactly supported orthogonal wavelet basis function. It is simple, widely used, and has good symmetry. Applying the Haar wavelet basis function during training can effectively avoid the phase distortion generated by image decomposition.

B. GENERATIVE ADVERSARIAL NETWORK
In generative adversarial networks, we need to define a generative network and a discriminative network. The training of a generative adversarial network is a process of continuously and alternately optimizing the generative network and the discriminative network. The optimization process can be defined as:

min_G max_D E_{Y∼P_train(Y)}[log D(Y)] + E_{X∼P_G(X)}[log(1 − D(G(X)))]   (7)

Among them, G and D represent the generative network and the discriminative network, and θ_g and θ_d represent the network parameters of the generative network and the discriminative network. In this way, the generative network can continuously optimize and produce outputs that are very similar to the original image. Therefore, super-resolution reconstruction based on generative adversarial networks can obtain reconstructed images with good visual effects.

1) GENERATING NETWORK
According to the definition of the generative adversarial network, the parameters θ_g of G are obtained by solving the minimization problem in Eq. (8):

θ_g = arg min (1/N) Σ_{n=1}^{N} l_G[G(X)]   (8)

In the formula, G consists of four functional modules: edge information extraction, residual dense learning, edge enhancement fusion, and image upsampling. The edge extraction module obtains a preliminary edge image through a Laplacian sharpening filter [19], and transforms the prior edge information into highly discriminative features through a 3 × 3 convolutional layer. The residual dense block (RDB) [20] can form a continuous memory mechanism; local feature fusion in the RDB can adaptively learn more effective features from previous ones, which stabilizes the network training process. For the perceptual loss in the RDB, the Euclidean distance between image features is minimized by calculating the gap in the feature space of the image, so that the generated image differs less semantically from the high-resolution remote sensing image features and can be fitted to the potential distribution of high-resolution remote sensing image features. We define φ_i(·) as the activation value of the output feature map after the i-th convolutional layer of the RDB. The expression of the perceptual loss function is:

l_feature = E_{(X,Y)∼p_train(X,Y)} [ Σ_{i=1}^{n} ||φ_i(Y) − φ_i(G(X))|| ]   (9)

2) DISCRIMINATION NETWORK
The discriminative network D is used to estimate the probability that an image comes from the real sample data rather than from the G network, where θ_d is obtained by optimizing the minimum–maximum problem in formula (7) jointly with formula (8). D is mainly used for binary classification to distinguish the authenticity of the generated image. The network structure of D is shown in Figure 1. The convolutional layers with s = 1 are used to extract the shallow features of the image, and the convolutional layers with s = 2 are used to reduce the image size. Then four dense connection blocks (DDB) are used for dense connection learning of the image features, and dense layers with PReLU and Sigmoid activation functions are used to discriminate the true and false attributes of image features [21].

l_adv_G(X, Y) = −E_{X∼P_train(X)} log D(G(X))   (10)

l_adv_D(X, Y) = E_{Y∼P_train(Y)} log D(Y) + E_{X∼p_train(X)} log(1 − D(G(X)))   (11)

Because the texture of the reconstructed image appears unnatural when enlarged, we add a wavelet transform process to the generative adversarial network so that the network can train the wavelet decomposition coefficients of the image, thereby improving the high-frequency detail information and making the texture of the image more realistic. Therefore, our loss function adds the loss of the wavelet coefficients to the SRGAN perceptual loss function. The loss function can be expressed as:

loss = l + l_W
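The adversarial terms in Eqs. (10) and (11) can be sketched numerically as follows. This is a minimal NumPy illustration, not the paper's training code: d_real and d_fake are our own names for the discriminator outputs D(Y) and D(G(X)), and Eq. (11) is negated here so that both quantities are losses to minimize, the usual convention when the discriminator is trained by gradient descent.

```python
import numpy as np

EPS = 1e-8  # guards against log(0)

def adv_loss_G(d_fake):
    # Eq. (10): l_adv_G = -E[log D(G(X))]; the generator loss shrinks
    # as the discriminator scores the generated patches as real.
    return -np.mean(np.log(d_fake + EPS))

def adv_loss_D(d_real, d_fake):
    # Eq. (11), negated into a minimization objective: the discriminator
    # should score real patches high and generated patches low.
    return -(np.mean(np.log(d_real + EPS))
             + np.mean(np.log(1.0 - d_fake + EPS)))
```

As a sanity check, the generator loss decreases as d_fake approaches 1, and the discriminator loss decreases as d_real approaches 1 and d_fake approaches 0, which is exactly the alternating pull described by Eq. (7).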
The minimum mean square error l_W of the wavelet decomposition coefficients can be expressed as:

l_W = a1·||I_LL^HR − I_LL^SR||² + a2·||I_LH^HR − I_LH^SR||² + a3·||I_HL^HR − I_HL^SR||² + a4·||I_HH^HR − I_HH^SR||²   (12)

III. EXPERIMENT AND ANALYSIS
A. DATA SET
1) DOTA
Contains 2806 remote sensing images in 15 categories, including: football fields, helicopters, swimming pools, roundabouts, carts, planes, ships, tanks, baseball fields, tennis courts, basketball courts, track and field fields, ports, bridges, and trolleys.

2) RSOD
Contains 5364 targets, including aircraft, playgrounds, overpasses, and fuel tanks.

3) UCAS
A total of 7,000 images annotated by the Pattern Recognition and Intelligent System Development Laboratory of the Chinese Academy of Sciences, including two types of targets: cars and aircraft.

B. EVALUATION INDEX
The most commonly used indicators for evaluating the quality of reconstructed images in the field of super-resolution are the peak signal-to-noise ratio (PSNR) and the Structural Similarity Index (SSIM) [22]. However, many studies have shown that the mean square error and the peak signal-to-noise ratio cannot accurately reflect the visual quality of an image, so we also adopted Feature Similarity (FSIM) [23] and Universal Image Quality (UIQ) [25]–[27] to evaluate the reconstructed image quality. According to the Human Visual System (HVS), which mainly understands images through their low-level features, Zhang et al. proposed FSIM. Phase congruency can extract highly informative image features but cannot respond to changes in image contrast, so gradient features are introduced to extract contrast information. Phase congruency is used as the primary feature of FSIM and the gradient as the secondary feature. The larger the value of FSIM, the higher the feature similarity between the images, and the better the quality of the reconstructed image. FSIM is represented as follows:

FSIM = Σ_{x∈Ω} S_L(x) · PC_m(x) / Σ_{x∈Ω} PC_m(x)   (13)

where Ω represents the pixel domain of the entire image, and S_PC(x) and S_G(x) are expressed as the phase consistency
TABLE 1. Average PSNR and SSIM values of the various super-resolution methods on the DOTA, RSOD, and UCAS data sets at magnification factors of 2, 3, and 4.
TABLE 2. Average FSIM and UIQ values of the various super-resolution methods on the DOTA, RSOD, and UCAS data sets at magnification factors of 2, 3, and 4.
and gradient magnitude of the image, respectively. S_L(x) and PC_m(x) are expressed as:

S_L(x) = S_PC(x)^α · S_G(x)^β   (14)

PC_m(x) = max(PC_1(x), PC_2(x))   (15)

UIQ is an image quality evaluation indicator that combines three factors: linear correlation loss, brightness distortion, and contrast distortion. The larger the value of UIQ, the higher the correlation between the reconstructed image and the original image in terms of linearity, brightness, and contrast, and the better the quality of the reconstructed image. UIQ is defined as follows:

UIQ = [σ_xy / (σ_x·σ_y)] · [2·x̄·ȳ / ((x̄)² + (ȳ)²)] · [2·σ_x·σ_y / (σ_x² + σ_y²)]   (16)

Among them, the first term in formula (16) represents the linear correlation coefficient between x and y; when its value is 1, x and y are linearly correlated. The second term indicates the correlation of image brightness, and the third term indicates the correlation of image contrast.

C. EXPERIMENTAL RESULTS AND ANALYSIS
In order to show the performance of the method in this paper, we quantitatively compare it with the SRGAN method. Experiments are performed on the DOTA, RSOD, and UCAS data sets, and the PSNR and SSIM results of the reconstructed images at magnification factors of 2, 3, and 4 are shown in Table 1. From the results in Table 1, it can be found that the image quality assessment indicators PSNR and SSIM of the improved method proposed in this paper show a significant improvement over most previous methods. In order to evaluate the reconstructed image quality more accurately, we used FSIM and UIQ to further analyze the experimental results. As can be seen from Table 2, our proposed method achieves good results in both the FSIM and UIQ evaluation indicators.

IV. CONCLUSION
By studying super-resolution reconstruction algorithms based on deep learning in recent years, we learned that generative adversarial network models can reconstruct high-resolution images with sharper textures. Because the wavelet transform can decompose the high-frequency details and low-frequency content of an image and express them separately, we proposed fusing the wavelet transform into the structure of the adversarial network to further improve the quality of the reconstructed image. A series of experiments verified that the super-resolution reconstruction algorithm proposed in this paper can effectively remedy the low scores of the SRGAN algorithm on reconstructed image quality evaluation indicators, with PSNR, SSIM, FSIM, and UIQ all increasing. At the same time, the algorithm generates reconstructed images with realistic texture details. This fully reflects the advantages of the method in the field of super-resolution reconstruction.

REFERENCES
[1] S. T. Alvarado, T. Fornazari, A. Cóstola, L. P. C. Morellato, and T. S. F. Silva, ‘‘Drivers of fire occurrence in a mountainous Brazilian cerrado savanna: Tracking long-term fire regimes using remote sensing,’’ Ecol. Indicators, vol. 78, pp. 270–281, Jul. 2017.
[2] J. S. Isaac and R. Kulkarni, ‘‘Super resolution techniques for medical image processing,’’ in Proc. Int. Conf. Technol. Sustain. Develop. (ICTSD), Feb. 2015, pp. 1–6.
[3] H. Greenspan, ‘‘Super-resolution in medical imaging,’’ Comput. J., vol. 52, no. 1, pp. 43–63, 2008.
[4] Y. Huang, L. Shao, and A. F. Frangi, ‘‘Simultaneous super-resolution and cross-modality synthesis of 3D medical images using weakly-supervised joint convolutional sparse coding,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 6070–6079.
[5] C. Wang, S. Jiang, H. Zhang, F. Wu, and B. Zhang, ‘‘Ship detection for high-resolution SAR images based on feature analysis,’’ IEEE Geosci. Remote Sens. Lett., vol. 11, no. 1, pp. 119–123, Jan. 2014.
[6] O. Cossairt, M. Gupta, and S. K. Nayar, ‘‘When does computational imaging improve performance?’’ IEEE Trans. Image Process., vol. 22, no. 2, pp. 447–458, Feb. 2013.
[7] J. Yang, Z. Lin, and S. Cohen, ‘‘Fast image super-resolution based on in-place example regression,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 1059–1066.
[8] F. Zhou, W. Yang, and Q. Liao, ‘‘Interpolation-based image super-resolution using multisurface fitting,’’ IEEE Trans. Image Process., vol. 21, no. 7, pp. 3312–3318, Jul. 2012.
[9] Z. Lin and H.-Y. Shum, ‘‘Fundamental limits of reconstruction-based superresolution algorithms under local translation,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 1, pp. 83–97, Jan. 2004.
[10] H. Zhang, Y. Fu, L.-B. Feng, Y. Zhang, and R. Hua, ‘‘Implementation of hybrid alignment algorithm for protein database search on the SW26010 many-core processor,’’ IEEE Access, vol. 7, pp. 128054–128063, 2019.
[11] L.-Q. Zuo, H.-M. Sun, Q.-C. Mao, R. Qi, and R.-S. Jia, ‘‘Natural scene text recognition based on encoder-decoder framework,’’ IEEE Access, vol. 7, pp. 62616–62623, 2019.
[12] J. H. Zhou, C. Zhou, J. J. Zhu, and D. H. Fan, ‘‘Super-resolution reconstruction method of remote sensing image based on non-downsampling contour wave transform,’’ J. Opt., vol. 35, no. 1, pp. 106–114, 2014.
[13] C. Dong, C. C. Loy, K. He, and X. Tang, ‘‘Learning a deep convolutional network for image super-resolution,’’ in Proc. ECCV, 2014, pp. 184–199.
[14] J. Kim, J. K. Lee, and K. M. Lee, ‘‘Deeply-recursive convolutional network for image super-resolution,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1637–1645.
[15] C. Dong, C. C. Loy, K. He, and X. Tang, ‘‘Image super-resolution using deep convolutional networks,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, Feb. 2016.
[16] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, ‘‘Deep Laplacian pyramid networks for fast and accurate super-resolution,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 624–632.
[17] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, ‘‘Photo-realistic single image super-resolution using a generative adversarial network,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 105–114.
[18] X. Wang, ‘‘Laplacian operator-based edge detectors,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 5, pp. 886–890, May 2007.
[19] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, ‘‘Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,’’ IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Jul. 2017.
[20] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, ‘‘Residual dense network for image super-resolution,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 2472–2481.
[21] F. Li, H. Bai, L. Zhao, and Y. Zhao, ‘‘Dual-streams edge driven encoder-decoder network for image super-resolution,’’ IEEE Access, vol. 6, pp. 33421–33431, 2018.
[22] Y. Wang, L. Wang, H. Wang, and P. Li, ‘‘End-to-end image super-resolution via deep and shallow convolutional networks,’’ IEEE Access, vol. 7, pp. 31959–31970, 2019.
[23] Z. Lu, Z. Yu, P. Yali, L. Shigang, W. Xiaojun, L. Gang, and R. Yuan, ‘‘Fast single image super-resolution via dilated residual networks,’’ IEEE Access, vol. 7, pp. 109729–109738, 2019.
[24] A. K. Moorthy and A. C. Bovik, ‘‘Visual importance pooling for image quality assessment,’’ IEEE J. Sel. Topics Signal Process., vol. 3, no. 2, pp. 193–201, Apr. 2009.
[25] Z. Wang and A. Bovik, ‘‘A universal image quality index,’’ IEEE Signal Process. Lett., vol. 9, no. 3, pp. 81–84, Mar. 2002.
[26] N. Damera-Venkata, T. Kite, W. Geisler, B. Evans, and A. Bovik, ‘‘Image quality assessment based on a degradation model,’’ IEEE Trans. Image Process., vol. 9, no. 4, pp. 636–650, Apr. 2000.
[27] H. Sheikh, M. Sabir, and A. Bovik, ‘‘A statistical evaluation of recent full reference image quality assessment algorithms,’’ IEEE Trans. Image Process., vol. 15, no. 11, pp. 3440–3451, Nov. 2006.
ZHI-XING HUANG was born in Wenzhou, Zhejiang, China, in 1984. He received the B.S. degree from Chongqing University, in 2008, and the master's degree from the Beijing Institute of Technology, in 2013. He is currently an Engineer with the Zhejiang Marine Aquaculture Research Institute, a Teacher with the Zhejiang Security Vocational and Technical College, and a registered Surveyor and Technical Consultant with Zhejiang Yijia Geographic Information Technology Company, Ltd. His research interests include new technologies of surveying and mapping remote sensing images, marine environmental detection, and resource disaster prevention and control.

CHANG-WEI JING was born in Liaocheng, Shandong, China, in 1983. He received the B.S. and M.S. degrees from Jilin University and the Ph.D. degree from Zhejiang University. Since 2013, he has been an Assistant Researcher with the Zhejiang Provincial Key Laboratory of Urban Wetland and Regional Change, Hangzhou, China. He is the author of two books and more than ten articles. His research interests include spatial data processing and remote sensing applications for natural resources and the environment.