
IEEE SIGNAL PROCESSING LETTERS, VOL. 25, NO. 11, NOVEMBER 2018

A Progressively Enhanced Network for Video Satellite Imagery Superresolution

Kui Jiang, Zhongyuan Wang, Member, IEEE, Peng Yi, and Junjun Jiang, Member, IEEE

Abstract—Deep convolutional neural networks (CNNs) have been extensively applied to image and video processing and analysis tasks. For single-image superresolution (SR), previous CNN-based methods have led to significant improvements over shallow learning-based methods. However, CNN-based algorithms with simple direct or skip connections are not suitable for satellite imagery SR because of complex imaging conditions and an unknown degradation process. More importantly, they ignore the extraction and utilization of the structural information in satellite images, which is very unfavorable for video satellite imagery SR with such characteristics as small ground targets, weak textures, and over-compression distortion. To this end, this letter proposes a novel progressively enhanced network for satellite image SR called PECNN, which is composed of a pretraining CNN-based network and an enhanced dense connection network. The pretraining part extracts low-level feature maps and reconstructs a basic high-resolution image from the low-resolution input. In particular, we propose a transition unit to obtain structural information from the base output. Then, the obtained structural information and the extracted low-level feature maps are transmitted to the enhanced network for further extraction to enforce the feature expression. Finally, a residual image with enhanced fine details obtained from the dense connection network is used to enrich the basic image for the ultimate SR output. Experiments on real-world Jilin-1 video satellite images and the Kaggle Open Source Dataset show that the proposed PECNN outperforms state-of-the-art methods both in visual effects and in quantitative metrics. Code is available at https://github.com/kuihua/PECNN.

Index Terms—Dense connection, residual network, superresolution, subpixel convolution, video satellite imagery.

Manuscript received July 11, 2018; revised August 12, 2018; accepted September 6, 2018. Date of publication September 17, 2018; date of current version September 21, 2018. This work was supported in part by the National Natural Science Foundation of China under Grants 61671332, U1736206, 61501413, 41771452, and 41771454, and in part by the Hubei Province Technological Innovation Major Project (2017AAA123). The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Joao Paulo Papa. (Corresponding author: Zhongyuan Wang.)

K. Jiang, Z. Wang, and P. Yi are with the School of Computer Science, Wuhan University, Wuhan 430072, China (e-mail: kuijiang@whu.edu.cn; wzy_hope@163.com; yipeng@whu.edu.cn).

J. Jiang is with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China (e-mail: jiangjunjun@hit.edu.cn).

Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/LSP.2018.2870536

I. INTRODUCTION

THE task of recovering a high-resolution (HR) image or video from its low-resolution (LR) counterpart is referred to as superresolution (SR) [1], [2]. Recently, SR has evolved to deal with image resolution enhancement in various applications, such as medical imaging [3], satellite imaging [4], [5], face recognition [6], [7], and video surveillance [8]. Remote sensing imagery is a direct application of SR because of its requirement for high spatial observation precision. The video satellite is a new type of remote sensing satellite that captures continuous dynamic video rather than still imagery. Video satellites are thus particularly well suited for observing large dynamic targets, such as ships and aircraft. Compared to traditional remote sensing satellites, video satellites improve temporal resolution at the cost of spatial resolution, which calls for spatial enhancement through image SR [9]–[11].

Reconstructing an SR image from its LR counterpart is a highly ill-posed problem, because high-frequency information is lost to various factors such as subsampling, compression, and insufficient imaging resolution. In past years, various SR algorithms [12]–[15] based on shallow learning approaches have been proposed. The shallow-learning-based methods assume a shared manifold consistency between LR images and the corresponding HR counterparts [16], thereby learning a linear mapping from LR to HR. However, since this assumption does not always hold under noise or large magnification (e.g., ×4), they cannot be applied to practical applications involving more complex mapping relationships.

Recently, the emergence of deep neural networks has greatly improved performance on single-image SR (SISR) through multihidden-layer structures [17]–[19]. Dong et al. pioneered a CNN-based SR method [2], which learns a mapping from LR to HR with an end-to-end model. Afterwards, various CNN variants were developed for SISR. Chen et al. proposed multistage trainable nonlinear reaction diffusion as an alternative to CNNs, in which both the weights and the nonlinearity are trainable [20]. Wang et al. trained an end-to-end cascaded sparse coding network inspired by the learned iterative shrinkage and thresholding algorithm [21] to fully exploit the natural sparsity of images [22].

Furthermore, residual learning was introduced into computer vision by the work of [23] for constructing deeper CNNs. He et al. [23] explicitly reformulate the layers as learning a residual with respect to the input, instead of learning unreferenced functions. Then, Kim et al. proposed a residual network named VDSR [24] for image SR, using adaptive gradient clipping and skip connections to alleviate the training difficulty.

Although these CNN-based frameworks and residual networks have led to significant improvements in general image SR, their simple topological structures restrict fine feature expression. In particular, satellite images involve complex imaging conditions and an unknown degradation process, thus placing high requirements on feature extraction. In addition, these traditional CNN-based algorithms ignore structural information, which is quite crucial for satellite image processing and analysis tasks because of the small ground objects, weak textures, and heavy compression distortion in satellite imagery.



Fig. 1. Outline of the proposed progressively enhanced network (PECNN). The components in the red box refer to the pretrained part and the components in the green one denote the dense connection subnetwork.

In this letter, we propose a progressively enhanced network for satellite image SR, namely PECNN, which is composed of a pretrained network and a dense connection network. In the front part of PECNN, low-level features are extracted from the LR input by a simple three-layer CNN, followed by an efficient subpixel convolution layer [25] to obtain an upsampled base image. Then, we obtain a set of structure-similar images by pixel offset and extract structural information by projecting these images onto the LR space with a transition unit. Furthermore, we concatenate and fuse these components (the structural information and the low-level features) and pass them to the dense connection network for further extraction. In particular, an effective dense connectivity pattern between every two residual blocks is proposed to promote information propagation: each block is directly connected to all subsequent blocks, enforcing the feature expression. At the final layer, the feature maps extracted layer by layer are transmitted to the HR space by a subpixel layer. As for the training manner, we first optimize the pretrained network by setting the loss to the difference between the base image and the ground truth. The parameters of the dense connection network are then updated with the pretraining module fixed, until convergence.

In summary, the main contributions are highlighted as follows.

1) We propose a novel progressively enhanced network (PECNN) for video satellite image SR reconstruction in a convenient and effective end-to-end training manner.

2) Two effective and practical feature processing subnetworks (the pretrained network and the dense connection network) are jointly proposed to promote feature extraction and enforce feature expression. In particular, we design a transition unit to obtain the sequential structure-related information.
II. PROPOSED METHOD

As shown in Fig. 1, the proposed PECNN can be roughly partitioned into two substructures: a pretraining CNN-based network and a dense connection network. Residuals represent the high-frequency components of an image, so learning specifically for high-frequency residuals is beneficial to detail reconstruction. Therefore, the former learns a preliminarily magnified HR image (the so-called base image), and the latter learns the residual image between the supervised (in training) or target (in testing) HR image and the base image. Except for the upsampling operation, motivated by prior work on SISR [24], [26]–[28], the entire process of local feature extraction and fusion is performed in the LR space. In this letter, I_LR and I_SR denote the input and output of the proposed PECNN, respectively. We refer to r as the up-scaling ratio. Both I_LR and I_HR have C color channels and are represented as real-valued tensors of size W × H × C and rW × rH × C, respectively.
tion. Therefore, the former learns a preliminarily magnified HR ture, where one state is mainly influenced by its direct former
image (so-called base image), and the latter learns the residual state. As the depth increases, it encounters the problem that
image between the supervised (in training) or target (in testing) weakens the impact on the front layers. We thereby conduct a
HR image and the base image. Except the upsampling opera- transition unit and a dense-connection-based network for feature
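A minimal TensorFlow sketch of this subnetwork follows. The letter fixes the three-layer structure and the final subpixel layer, while the 64 filters and 3 × 3 kernels are illustrative assumptions; `tf.nn.depth_to_space` serves as the periodic shuffling operator PS(·).

```python
# Sketch of the pretrained branch: three conv layers, the last producing
# C * r^2 maps that the periodic shuffling PS(.) rearranges into the base
# image. Filter counts and kernel sizes are assumptions, not from the letter.
import tensorflow as tf

def build_pretrained_net(r=3, channels=3, filters=64):
    i_lr = tf.keras.Input(shape=(None, None, channels))  # I_LR: H x W x C
    x = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(i_lr)
    x = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    maps = tf.keras.layers.Conv2D(channels * r * r, 3, padding="same")(x)
    i_br = tf.nn.depth_to_space(maps, r)                  # Eq. (1): PS(.)
    return tf.keras.Model(i_lr, [maps, i_br])             # LR features + I_BR
```

The base-image output can then be fitted against I_HR with a mean-squared-error loss, which corresponds to (2).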
B. Dense Connection Network

Residual learning differs from traditional learning methods in that the learned content is an extremely sparse residual image produced by a deeper network. VDSR [24] is a typical residual network architecture for the SR problem. However, VDSR requires a Bicubic-interpolated image as input, thus consuming huge computation and memory overhead. In addition, VDSR uses a single-path feed-forward architecture, in which each state is mainly influenced by its direct predecessor; as the depth increases, the influence of the front layers weakens. We therefore design a transition unit and a dense-connection-based network for feature extraction and expression, as shown in Fig. 2.


Fig. 2. Outlines of the proposed transition unit and ResNet. “C” and “+” denote the concatenation and adding operation, respectively.

TABLE I
COMPARISON OF THE NETWORK PARAMETERS

As shown in Fig. 2 (top panel), the transition unit is composed of such operations as pixel offset, projection, and concatenation. A group of structure-similar images is obtained from the base image by the pixel offset. Then, we convert them from the HR space to the LR space to generate the structure-related information by a projection, which applies a Gaussian filter to each basic image to obtain the structure information and downsamples the base images using a Bicubic kernel with down-sampling factor r. Finally, the low-level features from the pretrained network and the structure-related information from the transition unit are fused for further extraction in the residual blocks.
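A hedged sketch of the transition unit follows. The letter names the operations (pixel offset, Gaussian filtering, Bicubic downsampling by r, concatenation) but not the offset pattern or the kernel width, so the four one-pixel shifts and the 3 × 3 kernel below are assumptions.

```python
# Sketch of the transition unit: structure-similar images via pixel offsets,
# Gaussian blur, Bicubic downsampling by r, and concatenation with the
# low-level LR features. Offset pattern and kernel width are assumptions.
import tensorflow as tf

def transition_unit(base, lr_features, r=3):
    """base: N x rH x rW x C base image; lr_features: N x H x W x F maps."""
    # Pixel offset: one-pixel shifts yield structure-similar images.
    shifts = [tf.roll(base, shift=s, axis=a)
              for s, a in [(1, 1), (-1, 1), (1, 2), (-1, 2)]]
    gauss = tf.constant([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]]) / 16.0
    kernel = tf.tile(gauss[:, :, None, None], [1, 1, base.shape[-1], 1])
    outs = []
    for img in [base] + shifts:
        # Projection onto LR space: Gaussian filter, then Bicubic downscale.
        blurred = tf.nn.depthwise_conv2d(img, kernel, [1, 1, 1, 1], "SAME")
        h, w = tf.shape(img)[1] // r, tf.shape(img)[2] // r
        outs.append(tf.image.resize(blurred, [h, w], method="bicubic"))
    # Fuse the structure-related maps with the low-level LR features.
    return tf.concat(outs + [lr_features], axis=-1)
```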
Second, Fig. 2 (middle panel) shows the main components of the dense connection network. Contrary to conventional skip-connection-based networks, we adopt a more helpful dense connection fashion between every two blocks to enhance information propagation, which benefits fine feature expression.

Finally, Fig. 2 (bottom panel) represents a residual block, which uses three convolution layers to extract local features. To reduce the parameters and fuse the information, a 1 × 1 convolution layer is embedded at the end. The comparison of model complexity is reported in Table I. Our model enjoys the fewest parameters (only about 1/3 of VDSR), thus greatly relieving the computation and memory burden.
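The block and its dense wiring can be sketched as follows; the filter count and the number of blocks are assumptions, since the letter does not state them here.

```python
# Sketch of the residual block of Fig. 2 (bottom) and the dense pattern of
# Fig. 2 (middle): three 3x3 convs plus an ending 1x1 fusion conv per block,
# with every block's output forwarded to all subsequent blocks.
import tensorflow as tf
L = tf.keras.layers

def residual_block(x, filters=32):
    y = L.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = L.Conv2D(filters, 3, padding="same", activation="relu")(y)
    y = L.Conv2D(filters, 3, padding="same", activation="relu")(y)
    y = L.Conv2D(filters, 1, padding="same")(y)      # 1x1 conv: fuse, cut params
    return L.Add()([L.Conv2D(filters, 1)(x), y])     # local residual path

def dense_stack(x, num_blocks=4):
    outputs = [x]
    for _ in range(num_blocks):
        # Dense connection: each block sees all previous block outputs.
        inp = outputs[0] if len(outputs) == 1 else L.Concatenate()(outputs)
        outputs.append(residual_block(inp))
    return L.Concatenate()(outputs)
```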
C. Model Optimization

Image SR is an inherently ill-posed problem whose solution is not unique. In training, the loss function has an important influence on the convergence speed and approximation accuracy of the network model. Commonly used loss functions include the pixel-based l1-norm [26] and l2-norm [2], [24], [29], and the feature-based cosine distance. They usually directly force the learned images toward the target results.

As presented before, since our network follows a residual learning structure, we explicitly optimize the residual images rather than the target images. We progressively approach the target by learning two images at different feature levels. Specifically, we obtain a base SR image I_BR from the pretrained network by minimizing the divergence between I_BR and I_HR. Then, the dense connection part is used to obtain a residual image I_RES by optimizing the discrepancy between I_BR and Î_BR:

    Î_BR = I_HR − I_RES    (3)

where Î_BR refers to the difference between the ground truth I_HR and the prediction I_RES of the dense connection network. We acquire the suitable I_RES by optimizing the following function:

    θ*(I_RES, θ_2) = arg min_{θ_2} ||Î_BR − (I_HR − f(g(I_BR), f_n(I_LR), θ_2))||_2^2    (4)

where g(·) represents the operation that obtains the structural information by a pixel offset and a downscaling operation, f_n(I_LR) denotes the feature maps extracted from I_LR by the pretrained network, f(g(I_BR), f_n(I_LR), θ_2) refers to the output of the dense connection network, and θ_2 denotes the model parameters of the dense connection network.
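One plausible reading of (2)-(4) as a two-stage procedure is sketched below: the pretrained branch is fitted with an l2 loss and then frozen while the dense branch learns the residual I_HR − I_BR. The Adam optimizer and the single-pass loops are assumptions; the learning rates follow Sec. III-A.

```python
# Hedged sketch of the two-stage training; pre_net and dense_net are Keras
# models as in the earlier sketches, and dataset yields (I_LR, I_HR) pairs.
import tensorflow as tf

def stage1_loss(i_hr, i_br):                  # Eq. (2): fit the base image
    return tf.reduce_mean(tf.square(i_hr - i_br))

def stage2_loss(i_hr, i_br, i_res):           # Eq. (4): fit the residual
    return tf.reduce_mean(tf.square((i_hr - i_br) - i_res))

def train_two_stage(pre_net, dense_net, dataset, lr1=1e-3, lr2=1e-4):
    opt1 = tf.keras.optimizers.Adam(lr1)      # optimizer choice is an assumption
    for i_lr, i_hr in dataset:                # stage 1: pretrained branch
        with tf.GradientTape() as tape:
            _, i_br = pre_net(i_lr)
            loss = stage1_loss(i_hr, i_br)
        grads = tape.gradient(loss, pre_net.trainable_variables)
        opt1.apply_gradients(zip(grads, pre_net.trainable_variables))
    pre_net.trainable = False                 # freeze, then stage 2
    opt2 = tf.keras.optimizers.Adam(lr2)
    for i_lr, i_hr in dataset:                # stage 2: dense branch only
        feats, i_br = pre_net(i_lr)
        with tf.GradientTape() as tape:
            i_res = dense_net([i_br, feats])
            loss = stage2_loss(i_hr, i_br, i_res)
        grads = tape.gradient(loss, dense_net.trainable_variables)
        opt2.apply_gradients(zip(grads, dense_net.trainable_variables))
```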
III. EXPERIMENTS AND DISCUSSIONS

In this section, we conduct a group of experiments on video satellite images to validate PECNN. Comparison results with Bicubic, SRCNN [2], and VDSR [24] are reported in the following. In addition, because the test images have no reference objects (i.e., the observed LR images, rather than downscaled LR images, serve as input), we provide visual results and introduce the average gradient (AG) [31] and the naturalness image quality evaluator (NIQE) [32] to further evaluate the reconstruction quality. These metrics can reasonably assess image clarity when a ground truth image is not available, since they sensitively reflect content sharpness, detail contrast, and texture diversity. In particular, the larger the AG and the smaller the NIQE, the clearer the image.
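For reference, a sketch of the AG score follows, using the common definition as the mean root-mean-square of horizontal and vertical intensity differences; this exact formulation is an assumption, since [31] is not reproduced here.

```python
# Sketch of the average gradient (AG) metric; larger AG indicates sharper
# detail. Definition assumed: mean RMS of horizontal/vertical differences.
import numpy as np

def average_gradient(img):
    """img: 2-D grayscale array in [0, 255]."""
    g = img.astype(np.float64)
    gx = np.diff(g, axis=1)[:-1, :]   # horizontal differences, cropped to match
    gy = np.diff(g, axis=0)[:, :-1]   # vertical differences, cropped to match
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0))
```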


TABLE II
COMPARISON RESULTS OF THE RUNNING TIME ON Kaggle Open Source Dataset WITH SCALE FACTOR OF 3

Fig. 3. Comparison results on Jilin-1 satellite imagery with scaling factor ×3. Bold indicates the best performance (AG/NIQE).

TABLE III
COMPARISON RESULTS OF THE AG AND NIQE ON Jilin-1 VIDEO SATELLITE IMAGERY WITH SCALE FACTOR OF 3

A. Dataset and Parameters

Because the real-world video satellite images are of low quality and lose details from over-compression, they are not suitable for training. We thus seek suitably clear images to serve as training samples. The DIV2K dataset [33] is a newly produced high-quality general image dataset for image restoration tasks, which contains 900 HR images. We randomly select 291 images as our training dataset. The test datasets are composed of images from the Chinese Jilin-1 video satellite imagery and the Kaggle Open Source Dataset. For training, the input patches are cropped to a size of 32 × 32 pixels, with a batch size of 32. The learning rate is initialized as 10^-3 and 10^-4 for the pretrained network and the dense connection network, respectively. We implement the proposed network with the TensorFlow framework on an NVIDIA Titan Xp GPU platform. In experiments, the entire training takes about 50 epochs in 5 h (15 and 35 epochs for the pretrained and dense networks, respectively).
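The input pipeline can be sketched as follows; the patch and batch sizes come from the text above, whereas the Bicubic degradation used to synthesize LR training inputs and the dataset path are assumptions.

```python
# Hedged sketch of the training input pipeline: random 32x32 LR patches with
# batch size 32. "DIV2K/*.png" and the Bicubic degradation are assumptions.
import tensorflow as tf

R, PATCH, BATCH = 3, 32, 32   # scale factor, LR patch size, batch size

def load_image(path):
    return tf.image.decode_png(tf.io.read_file(path), channels=3)

def make_pair(hr_image):
    hr = tf.cast(tf.image.random_crop(hr_image, [PATCH * R, PATCH * R, 3]),
                 tf.float32) / 255.0
    lr = tf.image.resize(hr, [PATCH, PATCH], method="bicubic")
    return lr, hr

dataset = (tf.data.Dataset.list_files("DIV2K/*.png")   # hypothetical path
           .map(load_image)
           .map(make_pair)
           .batch(BATCH))
```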
B. Comparison Results

We conduct comparison experiments on Jilin-1 video satellite imagery and the Kaggle Open Source Dataset (covering agriculture, airports, buildings, warships, forest, freeways, parking lots, storage tanks, and harbors) against VDSR [24], SRCNN [2], and Bicubic interpolation. As given in Table III, PECNN exhibits the highest AG and the lowest NIQE. In addition, we evaluate the time complexity on 30 test images, which is one of the most crucial metrics for practical application. The comparison results are reported in Table II.

As shown in Fig. 3, most of the compared methods [2], [24] produce noticeable artifacts and blurred outlines, whereas the proposed PECNN yields more realistic results with fewer jagged lines and ringing artifacts. Overall, our method outperforms VDSR, SRCNN, and Bicubic in recovering the high-frequency details of satellite images.

IV. CONCLUSION

In this letter, we propose a novel progressively enhanced network, namely PECNN, for superresolving video satellite imagery. It is composed of a pretrained network and a dense connection network. In particular, an effective transition unit is embedded in the middle of the network to capture the profile structure-related information. We also promote the feature expression by adopting more effective dense connections and a progressive feature learning fashion. Since the constructed network uses fewer layers and filters but more dense connections among layers, it enjoys pronounced SR performance within acceptable computational complexity and is therefore applicable to huge-size satellite imagery. The experimental results on Jilin-1 video satellite imagery and the Kaggle Open Source Dataset show its superiority over VDSR, SRCNN, and Bicubic.


REFERENCES

[1] J. Jiang, X. Ma, C. Chen, T. Lu, Z. Wang, and J. Ma, "Single image super-resolution via locally regularized anchored neighborhood regression and nonlocal means," IEEE Trans. Multimedia, vol. 19, no. 1, pp. 15–26, Jan. 2017.
[2] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, Feb. 2016.
[3] F. Shi, J. Cheng, L. Wang, P.-T. Yap, and D. Shen, "LRTV: MR image super-resolution with low-rank and total variation regularizations," IEEE Trans. Med. Imag., vol. 34, no. 12, pp. 2459–2466, Dec. 2015.
[4] Z. Shao and J. Cai, "Remote sensing image fusion with deep convolutional neural network," IEEE J. Sel. Topics Appl. Earth Observ., vol. 11, no. 5, pp. 1656–1669, May 2018.
[5] A. Xiao, Z. Wang, L. Wang, and Y. Ren, "Super-resolution for "Jilin-1" satellite video imagery via a convolutional network," Sensors, vol. 18, no. 4, 2018, Art. no. 1194.
[6] Z. Wang, R. Hu, S. Wang, and J. Jiang, "Face hallucination via weighted adaptive sparse regularization," IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 5, pp. 802–813, May 2014.
[7] J. Jiang, R. Hu, Z. Wang, and Z. Han, "Noise robust face hallucination via locality-constrained representation," IEEE Trans. Multimedia, vol. 16, no. 5, pp. 1268–1281, Aug. 2014.
[8] Z. Shao, J. Cai, and Z. Wang, "Smart monitoring cameras driven intelligent processing to big surveillance video data," IEEE Trans. Big Data, vol. 4, no. 1, pp. 105–116, Mar. 2018.
[9] Q. Luo, X. Shao, L. Peng, Y. Wang, and L. Wang, "Super-resolution imaging in remote sensing," Proc. SPIE, vol. 9501, May 2015, Art. no. 950108.
[10] W. Dong et al., "Hyperspectral image super-resolution via non-negative structured sparse representation," IEEE Trans. Image Process., vol. 25, no. 5, pp. 2337–2352, May 2016.
[11] S. Yang, F. Sun, M. Wang, Z. Liu, and L. Jiao, "Novel super resolution restoration of remote sensing images based on compressive sensing and example patches-aided dictionary learning," in Proc. Int. Workshop Multi-Platform/Multi-Sensor Remote Sens. Mapping, Jan. 2011, pp. 1–6.
[12] H. He and W. C. Siu, "Single image super-resolution using Gaussian process regression," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2011, pp. 449–456.
[13] S. Farsiu, M. D. Robinson, M. Elad, and P. Milanfar, "Fast and robust multiframe super resolution," IEEE Trans. Image Process., vol. 13, no. 10, pp. 1327–1344, Oct. 2004.
[14] W. T. Freeman and E. C. Pasztor, "Learning low-level vision," in Proc. 7th IEEE Int. Conf. Comput. Vis., 1999, vol. 2, pp. 1182–1189.
[15] J. Jiang, X. Ma, Z. Cai, and R. Hu, "Sparse support regression for image super-resolution," IEEE Photon. J., vol. 7, no. 5, pp. 1–11, Oct. 2015.
[16] H. Chang, D.-Y. Yeung, and Y. Xiong, "Super-resolution through neighbor embedding," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2004, vol. 1, pp. I–I.
[17] C. Szegedy et al., "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 1–9.
[18] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Sep. 2014, arXiv:1409.1556.
[19] E. Zerhouni, D. Lányi, M. Viana, and M. Gabrani, "Wide residual networks for mitosis detection," in Proc. IEEE 14th Int. Symp. Biomed. Imag., Apr. 2017, pp. 924–928.
[20] Y. Chen and T. Pock, "Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1256–1272, Jun. 2017.
[21] K. Gregor and Y. LeCun, "Learning fast approximations of sparse coding," in Proc. 27th Int. Conf. Mach. Learn., 2010, pp. 399–406.
[22] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, "Deeply improved sparse coding for image super-resolution," in Proc. Int. Conf. Comput. Vis., Jul. 2015, pp. 370–378.
[23] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 770–778.
[24] J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 1646–1654.
[25] W. Shi et al., "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 1874–1883.
[26] W. S. Lai, J. B. Huang, N. Ahuja, and M. H. Yang, "Deep Laplacian pyramid networks for fast and accurate super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 5835–5843.
[27] T. Tong, G. Li, X. Liu, and Q. Gao, "Image super-resolution using dense skip connections," in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2017, pp. 4809–4817.
[28] C. Dong, C. L. Chen, and X. Tang, "Accelerating the super-resolution convolutional neural network," in Proc. 14th Eur. Conf. Comput. Vis., Aug. 2016, pp. 391–407.
[29] J. Kim, J. K. Lee, and K. M. Lee, "Deeply-recursive convolutional network for image super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 1637–1645.
[30] X.-J. Mao, C. Shen, and Y.-B. Yang, "Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections," Mar. 2016, arXiv:1603.09056.
[31] A. Chen, B. Chen, X. Chai, R. Bian, and H. Li, "A novel stochastic stratified average gradient method: Convergence rate and its complexity," 2017, arXiv:1710.07783.
[32] A. Mittal, R. Soundararajan, and A. C. Bovik, "Making a "completely blind" image quality analyzer," IEEE Signal Process. Lett., vol. 20, no. 3, pp. 209–212, Mar. 2013.
[33] R. Timofte et al., "NTIRE 2017 challenge on single image super-resolution: Methods and results," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2017, pp. 1110–1121.

