Professional Documents
Culture Documents
Abstract—With the emergence of huge volumes of high- A necessity for developing a successful remote sensing im-
resolution remote sensing images produced by all sorts of satellites age retrieval system is the extraction of representative features
and airborne sensors, processing and analysis of these images to describe the query image and the ones in the database [36].
require effective retrieval techniques. To alleviate the dramatic
variation of the retrieval accuracy among queries caused by the Local and holistic features are two types of descriptors for
single image feature algorithms, we developed a novel graph-based representing images. Local features are particularly capable of
learning method for effectively retrieving remote sensing images. attending to local image patterns or textures, whereas holistic
The method utilizes a three-layer framework that integrates the features describe the overall layouts of an image. One drawback
strengths of query expansion and fusion of holistic and local of the two features is that the retrieved images often look
features. In the first layer, two retrieval image sets are obtained
by, respectively, using the retrieval methods based on holistic and alike but may be irrelevant to the query because remote sens-
local features, and the top-ranked and common images from both ing images often represent large natural geographical scenes
of the top candidate lists subsequently form graph anchors. In the that contain abundant and complex visual contents [35]. They
second layer, the graph anchors as an expansion query retrieve six are usually hindered by the noise from irrelevant content [7].
image sets from the image database using each individual feature. Clearly, integration of their strengths can greatly improve re-
In the third layer, the images in the six image sets are evaluated for
generating positive and negative data, and SimpleMKL is applied trieval precision. Directly combining different feature vectors
to learn suitable query-dependent fusion weights for achieving the into one vector is not a good idea for improving image retrieval
final image retrieval result. Extensive experiments were performed accuracy because the feature characteristics and the algorith-
on the UC Merced Land Use–Land Cover data set. The source mic procedures are dramatically different [6]. Although query
code has been available at our website. Compared with other expansion [12], [13] can achieve precise retrieval results, the
related methods, the retrieval precision is significantly enhanced
without sacrificing the scalability of our approach. performance of query expansion tends to degrade because of
false-positive search results [5].
Index Terms—Expansion query, graph-based learning, image Inspired by the strengths of the query expansion and specific
retrieval, query fusion, reranking.
rank fusion method [6], we propose a three-layered graph-
I. I NTRODUCTION based learning approach to retrieve remote sensing images.
Fig. 1 illustrates the overview of our proposed approach. In
W ITH the availability of huge volumes of high-resolution
remote sensing images produced by all sorts of satellites
and airborne sensors, remote sensing image processing and
this approach, we extract the holistic feature by means of
GIST and the local feature through dense SIFT and ScSPM
analysis have been an active research topic in the geoscience [43]. Next, the image retrieval process is divided into three
field [1]–[4] in recent decades. As a result, many applications layers. In the first layer, by using the holistic feature learning
involving high-resolution satellite image analysis are in need algorithm, one image list in which every image is similar to
of an image retrieval task. Due to the huge number of images, the query is retrieved from the remote sensing image database.
it is important to develop effective and efficient methods for Accordingly, we obtain another image list by using the local
searching, retrieving, and recognizing candidates within the feature learning algorithm. In the second layer, we first rerank
expanding collections. the images in the aforementioned two lists and then obtain
three types of graph anchors: PH , PL , and PC . PH and PL are
Manuscript received October 25, 2015; revised April 21, 2016; accepted the top-ranked images of the two lists. PC contains common
June 6, 2016. Date of publication June 27, 2016; date of current version similar images of the two lists. We take PH , PL , and PC as
August 11, 2016. This work was supported by the National Natural Science the queries for retrieving database images by means of the
Foundation of China under Grant 41371324.
Y. Wang, L. Zhang, L. Zhang, Z. Zhang, H. Liu, and X. Xing are with the holistic feature or the local feature learning algorithm. Thus, six
State Key Laboratory of Remote Sensing Science, Beijing Normal Univer- lists containing retrieved images can be obtained. In the third
sity, Beijing 100875, China (e-mail: zhanglq@bnu.edu.cn; xxgcdxwyb@163. layer, positive and negative sets are selected by evaluating the
com; yammer_0303@126.com; zhenxin066@163.com; apprentice_g@163.
com; xyxing@mail.bnu.edu.cn). images in the six lists. The parameters in SimpleMKL [45] are
X. Tong is with the School of Surveying and Geo-informatics, Tongji trained to fuse retrieval results, and we obtain the final retrieved
University, Shanghai 200092, China (e-mail: xhtong@tongji.edu.cn). images.
P. T. Mathiopoulos is with the Department of Informatics and Telecom-
munications, National and Kapodestrian University of Athens, 15784 Athens, The main contributions of this paper are summarized as
Greece (e-mail: mathio@di.uoa.gr). follows.
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org. 1) A novel three-layered graph-based learning approach
Digital Object Identifier 10.1109/TGRS.2016.2579648 for remote sensing image retrieval is developed. The
0196-2892 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
WANG et al.: THREE-LAYERED GRAPH-BASED LEARNING APPROACH FOR REMOTE SENSING IMAGE RETRIEVAL 6021
approach refines the original input query by combining it The global morphological texture descriptors, including circu-
with the obtained top-ranked retrieved images. The qual- lar covariance histograms (CCHs) and the rotation-invariant
ity of the retrieval results obtained by using local or holis- point triplets (RITs) [19], have been developed to retrieve
tic feature methods is verified, and the ordered retrieval content-based images. The texture descriptors in [19] are then
image sets are fused to generate the final retrieval result. adapted to a local scale by means of the popular bag-of-visual-
The retrieval precision is significantly enhanced without words paradigm [8]. The performance of such holistic feature
sacrificing the scalability of the proposed method. descriptors is usually affected by irrelevant parts of the image,
2) A new expansion query method that is robust to unstable thus making it sensitive to noises. On the other hand, the local
feature extraction is introduced. Its main advantage is that feature extraction strategy avoids this problem by focusing on
it mines multiple relevant images by a single input image detecting and describing salient image regions/points [19]. The
rather than requiring users to input multirelevant images. spin image descriptor [20] combines the descriptive nature of
Combining the retrieved images, along with the original the global object properties with robustness to partial views and
query, a richer expansion query image set is formed as clutters of the local shape descriptions, and it is widely used in
prior knowledge for the next retrieval. By means of the classification and recognition. In [21] and [22], quantized local
set, the accuracy of image retrieval can be improved. features are employed to detect complex geospatial objects
The proposed method can eliminate the deficiency of the such as nuclear and coal power plants in 1-m resolution digital
single image-based retrieval. globe imagery. Yang and Newsam [23] utilized local invariant
3) A novel approach for fusing different image retrieval features for content-based geographic image retrieval, and they
results is presented to allow accurate evaluation of the concluded that the local invariant features are more effective
quality of each return. SimpleMKL is applied to learn than features such as color and texture for image retrieval of
suitable query-dependent fusion weights for holistic and land-use/land-cover (LULC) classes. Aptoula et al. [8] later
local features. The various image retrieval results are extended the low-level feature analysis in [23] to the “bag of
fused to enhance the retrieval accuracy. morphological words” approach for content-based geograph-
ical retrieval. In scene classification, dense low-level feature
descriptors can also be extracted to characterize the local spatial
II. R ELATED W ORK
patterns for aerial scene classification [9].
In this section, we will present the related work about feature Clearly, combining the complementary advantages offered
extraction, similarity measure metrics, and image retrieval in by both local and holistic features can enhance the overall
the following sections. retrieval precision. However, combining local and holistic cues
at the feature level makes it difficult to preserve the efficiency
and scalability induced by the vocabulary tree structure and
A. Feature Representation and Fusion
compact hashing [6]. For this reason, some relevant publica-
Over the last two decades, a variety of feature descriptions tions mainly focused on one reranked retrieval method [24] or
has been proposed [14]–[23]. They are generally divided into text-based retrieval approach [25]. Xia et al. [26] proposed a
two categories: holistic and local features. Holistic features class-specific hypergraph to integrate local features and global
often represent the entire feature distribution of images, e.g., geometric constraints for object recognition. In a very recent
extended Gaussian images (EGIs) [14], texture features [15], publication [6], a graph-based query-specific fusion approach
[16], complex EGIs [17], and spherical attribute images [18]. for fusing the retrieval results based on local and holistic
6022 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 54, NO. 10, OCTOBER 2016
features was proposed. Although this method has high compu- machines using weighted histograms as descriptors. In [39], a
tational efficiency and improves the image retrieval precision, support vector machine approach was employed to recognize
the retrieval results are significantly affected by the quality of land cover information corresponding to spectral characteris-
the single input query image. tics, and Gabor filters were used to extract textural features
that characterize spatial information. Integration of the land
cover information and textural features led to the retrieval of
B. Similarity Measure Approaches
the spectral and spatial patterns of remotely sensed imagery.
Similarity measure is often used on the extracted features In [40], the latent Dirichlet allocation model was utilized to
to identify the images similar to the query [19]. Chen et al. map low-level features of clusters and segment the high-level
[27] proposed a visual similarity-based 3-D model retrieval map labels for remote sensing image annotation and mapping.
system using the L1 distance of features to measure model Finally, Kalaycilar et al. [41] computed the topological and
similarity. Zhang et al. [28] presented an upright-based nor- distance-based spatial relationships between the regions. The
malization method, which accelerates the matching of the spatial relationship histograms were constructed by classifying
shape descriptors by correctly rotating these building models. these regions in an image.
In their approach, they also calculated the L1 distance of
the features to measure the similarity of the building models.
Kang et al. [29] formulated the image similarity assessment III. I MAGE F EATURE D ESCRIPTORS
in terms of sparse representation. To evaluate the applicability In this section, we will present the procedure for extracting
of the proposed feature-based sparse representation using an holistic and local feature descriptors.
image similarity assessment approach, they applied FSRISA
to three popular image applications (namely, copy detection,
retrieval, and recognition) by properly formulating them to A. Holistic Features
sparse representation problems. In [5], a voting-based method
was employed to calculate the similarity measure for object A GIST feature [42] is a holistic feature descriptor repre-
retrieval. Note that the similarity measure computation in this senting the image by defining the spatial envelop with a low-
work only considered the matched feature pairs with spatial dimension approach. We calculate the 2048-dimensional GIST
consistency. In [30], an endmember-based distance measure, descriptors of the query image and every image in the image
particularly adaptive for hyperspectral image retrieval, was database. To reduce the computational cost without decreasing
presented for image retrieval. In another contribution [31], a the retrieval accuracy, a reduction of the linear-dimensionality
probabilistic method was used to match multiple views in object is performed on the descriptors to generate the projected data
retrieval. However, in this approach, the Hausdorff distance D. Then, (1) is employed to minimize the quantization loss and
fails to describe three or more objects that look similar be- implement binary quantization in the resulting space
cause the distance only reflects the information of the closest
views of a pair of objects. When comparing two objects, the Ω(B, R) = B − DR2F (1)
similarity is measured by summing up the similarity from
all of the corresponding images in [32]. Li and Fonseca [33] where B is the GIST feature descriptor and R is the orthogonal
applied different priorities to topology, direction, and distance matrix. • F denotes the Forbenius norm.
similarities for qualitative spatial similarity assessment, where Fig. 2 illustrates the GIST features of six categories in the
they considered commonality and differences in their similarity UC Merced image database. It is noted that each category has
assessment studies. its own discriminative holistic feature.
Fig. 2. GIST features of six scene categories in the UC Merced Land Use–Land Cover dataset. The first row shows the categories in the high-resolution satellite
images, and the second row presents their corresponding GIST features. (a) Agricultural. (b) Airplane. (c) Freeway. (d) Harbor. (e) Parking lot. (f) Storage tanks.
Equations (3) and (4) are adopted to extract more discrimina- A. First-Layer Graph-Based Retrieval
tive features [43] through multiscale maximum pooling
In this layer, a semisupervised learning method [46] is intro-
z = F (U) (3) duced for retrieving images. We only consider the query image
as the labeled image, and the remaining images are considered
where z is the local image feature by a prechosen pooling unlabeled. To achieve this goal, we first construct a weighted
function graph to obtain the optimized ranking function. Afterward,
zj = max {|u1j |, |u2j |, . . . , |uMj |} (4) database images are retrieved based on the local and holistic
features of these images.
where zj is the jth element of z, uij is the matrix element at the 1) Graph Construction: Given an image dataset X = {x1 ,
ith row and the jth column of U, and M is the number of local . . . , xl , xl+1 , . . . , xn }. Some of the images are labeled as query
feature descriptors in the region. images, and the rest of them need to be ranked according to their
Fig. 3 illustrates the process for extracting the local feature. relevance with the queries. For each query image, a weighted
Because the extracted local feature is highly dimensional, it is undirected graph G = (V, E, W) is constructed from each
difficult to learn the feature joint relevance without consider- individual feature-based retrieval method, where the retrieval
able data overfitting and to obtain optimized parameters with quality or the relevance is modeled by the weights on the edges,
limited training samples. The dimension reduction method [44] where V is a set of vertices, E is a set of edges, and W is
is applied to reduce the dimension. a set of edge weights. Each database image corresponds to a
vertex in G. For each image, we identify its k nearest neighbors
and connect the corresponding vertices in G with edges that
IV. T HREE -L AYERED G RAPH -BASED I MAGE R ETRIEVAL
are associated with the distance between the two vertices.
As illustrated in Fig. 4, the process of the proposed image G is usually not a connected graph. To obtain a connected
retrieval can be divided into three layers. In the first layer, by graph, the two nearest connected components of G are com-
using the graph-based holistic feature learning algorithm, one bined into one by an edge that links their two nearest points.
list whose images are similar to the input query is obtained from The process ends when G becomes a connected and undirected
6024 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 54, NO. 10, OCTOBER 2016
regularization term Ω(f ) and the empirical loss term R(f ) can
be combined to form the following optimization function:
= f T (I − Θs)f (7)
where
Θs = K−1/2 WK−1/2 , K is a degree matrix, kii =
j Wij , I is the identity matrix, and
f = (I − αΘs)−1 y (9)
graph. E corresponds to the similarity among vertices with the In this layer, the task is to use the retrieved images as prior
weight W defined by knowledge to improve the retrieval precision. To achieve this,
2 we utilize the reranking method to obtain the top-ranked images
d (Vi , Vj ) as graph anchors. Then, these anchors are taken as new queries
Wij = exp − (5)
σ2 to further improve the retrieval accuracy.
1) Construction of Graph Anchors: Through the first layer,
where d(Vi , Vj ) denotes the feature distance between the we have obtained two image retrieval results. They are stored in
vertices Vi and Vj , and Wij is the edge weight of Eij . σ is two lists: H_LQ and L_LQ . We separately rerank the images in
a constant that controls the strength of the weight, and it is set each of the lists and obtain two reranked image retrieval lists.
as the median distance among all images. The top-ranked images and common images are taken as the
If the similarity among vertices is determined by the holistic graph anchors for expanding the query. In the following, we
feature based on pairwise image distances, G is called the discuss the procedure for generating the graph anchors.
graph GH of the holistic feature. Otherwise, it is called the To generate accurate graph anchors for learning the joint
graph GL of the local feature. relevance of the holistic and local features, it is necessary to
2) Learning on the Graph: After the graphs are constructed, precisely measure the similarity among the images. We define
the optimal pairwise relevance is obtained through a learning the similarity degree of the two images as the relative score
algorithm. Given the initial labeled data (a query image), the (RS). RS can be computed by using the following reranking
WANG et al.: THREE-LAYERED GRAPH-BASED LEARNING APPROACH FOR REMOTE SENSING IMAGE RETRIEVAL 6025
retrieved results (LHH , LHL , and LHC ) corresponding to the evaluation. These images are stored in the aforementioned
queries PH , PL , and PC . Accordingly, we also obtain other image list LG .
three retrieved results (LLH , LLL , and LLC ) by GL . LHH and The evaluation processes of other retrieval lists are the same
LLL are the results of reinforcement learning. LHL , LLH , LHC , as the process of LHH . To the retrieved results LHL , LHC , LLH ,
and LLC are the results of the preliminary fusion. We know LLL , and LLC , we compute their consistency degree by (15)
that LHL and LHC are the retrieval results corresponding to and choose the good retrieval results for the fusion of the final
the graph GH of the holistic feature, but the queries are the result. These images are also stored in LG .
graph anchors PL and PC which are not generated from GH . 2) Further Feature Fusion by SimpleMKL: After the eval-
Similarly, LLH and LLC are the retrieval results corresponding uation, we need to address the problem of obtaining the best
to GL , but the queries are the graph anchors PL and PC fusion weights. SimpleMKL is applied to learn suitable query-
which are not generated from GL . LHC and LLC are also dependent fusion weights for two types of features. In the image
the results of the graph-based learning algorithm of holistic lists LG , the images are taken as the positive data, and the
and local features, which use the common graph anchors as images in LD with low similarity are taken as negative data. We
the queries. This approach can reduce the influence of some train the parameters of SimpleMKL through these data and the
inappropriate graph anchors on the retrieval result to a certain two features. SimpleMKL determines the combination of the
extent. feature weights by solving a standard SVM optimization prob-
lem based on a gradient descent method. The kernel K(x, x )
represents the similarity between two images using one feature.
C. Third-Layer Fusion-Based Retrieval Results
Supposing that there are M types of features in an image xi , we
To generate the fusion retrieval result of different features, formulate the corresponding kernel matrix for the mth feature
we evaluate the performance of the retrieved images in the
aforementioned six lists and find similar and dissimilar images. Km (xi , xj ) = exp −γm xm i − xj
m 2
(16)
Then, we learn the weights of holistic and local features and
where m = 1, 2, . . . , M, xm
i corresponds to the mth feature of
related parameters of the fusion. SimpleMKL [45] is introduced
xi and γm is a positive parameter that can be computed accord-
to obtain the final fusion result.
ing to the data distribution.
For the fusion process, positive data and negative data are
To combine multiple features, the conventional approach is
needed. Thus, we introduce an image list (LG ) that contains
to combine the basic kernels in a convex way
similar images (positive data), as well as a second one (LD )
that contains the dissimilar images (negative data). To create
M
LD , we randomly selected a certain number of images from the K(xi , xj ) = ωm Km (xi , xj ) (17)
bottom of LHH , LHL , LHC , LLH , LLL , and LLC separately and m=1
LHH as the graph anchors for an expansion query. We modified s.t. yi fm (xi ) + yi b ≥ 1 − ζi ∀i
y in (9), where the values of the graph anchors are set to 1. m=1
The graph GH of the holistic feature and f computed in ζi ≥ 0 ∀i
Section IV-A are employed to obtain the retrieved result La . For
the top-ranked images in reranked LHH and La , their similari-
M
ties to the query image and the sequence in the aforementioned ωm = 1, ωm ≥ 0 ∀m (18)
m=1
two lists are considered. Assume that the position of a retrieved
image in reranked LHH is i and that in La is j. The consistency where fm is a function corresponding to a reproducing kernel
degree of the retrieval results is Hilbert space Hm of kernel Km (xm m
i , xj ). ζi is the slack vari-
able, C is the regularization parameter that trades off between
num
cn
1 the decision function and the slack variable, ω is the weight
DC = (15)
cn 1+i+j vector, and b is the coefficients to be learned from the training
k=1
data.
where num is the number of common images in the reranked After the optimized parametersare obtained, we compute
lists LHH and La . In practice, cn is usually set to 20, 30, 40, the decision function f (x) + b = M fm (x) + b, where each
and 50. function fm corresponds to a different Hm associated with a
From the retrieved result LHH , we can obtain a certain kernel Km . The decision function describes the relevance of the
number of the top-ranked images by using the retrieval result retrieved images to the query image. The higher the relevance
WANG et al.: THREE-LAYERED GRAPH-BASED LEARNING APPROACH FOR REMOTE SENSING IMAGE RETRIEVAL 6027
Fig. 6. Some examples of the UC Merced Land Use–Land Cover dataset. (a) Agricultural. (b) Airplane. (c) Baseball diamond. (d) Beach. (e) Buildings.
(f) Chaparral. (g) Dense residential. (h) Forest. (i) Freeway. (j) Golf course. (k) Harbor. (l) Intersection. (m) Medium residential. (n) Mobile home park.
(o) Overpass. (p) Parking lot. (q) River. (r) Runway. (s) Sparse residential. (t) Storage tanks. (u) Tennis court.
value is, the more similar the two images are. After all of the
images in the retrieved image list are sorted in descending order,
they represent the final retrieval result.
V. E XPERIMENTS each individual feature and then further compare our method
with other related methods. mAP and precision–recall curves
In this section, we evaluate the performance of our proposed are employed to evaluate the retrieval performance. We also use
method for image retrieval on the UC Merced image data set the average normalized modified retrieval rank (ANMRR) [52]
[48]. The data set consists of images of 21 LULC categories to provide an overall rating of the performance.
with a pixel resolution of 30 cm. Each class contains 100 RGB
color samples of size 256 × 256 pixels [19]. The example
A. Validation of Retrieval Performance in Each Layer
images are shown in Fig. 6. The source code of the proposed
method is available at the website: http://geogother.bnu.edu.cn/ We extract the 2048-dimensional GIST feature and apply the
teacherweb/zhangliqiang/ algorithm in [49] to reduce the dimension of the feature to 512.
To evaluate the effectiveness of our method, we conduct The dimension reduction method [44] is used to reduce the local
extensive experiments on the UC Merced dataset. We first feature to 32. Fig. 7 demonstrates that features whose dimen-
compare our approach in different layers with those by using sions have been reduced have little influences on image retrieval
6028 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 54, NO. 10, OCTOBER 2016
Fig. 8. Comparisons of retrieval results. (a) Performance of the retrieval results obtained by the GIST feature and Cityblock distance, with our holistic feature
method in the first layer. (b) Performance of the retrieval results obtained by the SIFT feature and Cityblock distance, with our local feature method in the first
layer.
accuracy, but the retrieval efficiency is enhanced, saving more method are nearly the same. As the number of retrieved images
than 80% of the computational time in the graph construction. increases, the reranking method outperforms the other methods.
1) Retrieval Performance in the First Layer: In the first For the local feature, Fig. 9(d)–(f) shows the different results
layer, we use the holistic and local features to construct the obtained by the local feature. From the mAP curves, we note
graph. α is set to 0.1 in the graph of the holistic feature and that the performance of the reranking method is still higher than
0.01 in the graph of the local feature. those of other methods. The retrieval precision obtained in the
To validate the retrieval performance in the first layer, we first layer is further improved after the reranking.
use two methods to retrieve images. One is the holistic or Assume that RSH_ max denotes the maximum RS in the
local feature-based retrieval method with Cityblock distance graph-based retrieval method of the holistic feature and
(the first method). The other is the proposed method in the first RSL_ max denotes the maximum RS in the graph-based re-
layer. The results shown in Fig. 8 are the average values of trieval method of the local feature. The thresholds are defined
retrieval performance of 21 categories. The retrieval precision as tH_sim = 0.1 × RSH_ max and tL_sim = 0.1 × RSL_ max to
of some categories, such as agricultural, buildings, and harbor, extract the graph anchors PH and PL . Equation (14) and the
is even higher than the values in Fig. 8. The learned ranking threshold 0.01 × max(simHL ) are utilized to extract the graph
function takes holistic and local features into account and anchor PC . In the second layer, the graph anchors PH , PL , and
uses the smoothness and fitting constraint to solve the image PC are used to retrieve images from the database.
similarity. Thus, the retrieval results in the first layer are more From Figs. 10 and 11, it is noted that use of the top-ranked
preferable. From Fig. 8, we notice that our presented method images in the expansion query can improve the retrieval accu-
always performs better than the first method. racy. For example, by using the graph anchors PL , the obtained
2) Retrieval Performance in the Second Layer: In the second retrieval result LHL is more accurate than the other retrieval
layer, we rerank the retrieval results of the first layer. The results shown in Fig. 10. The reason is that the reranking
number of top reranked images is set to 10, 20, and 30. mAP process can offer more precise prior knowledge for PL . In
is used to evaluate different retrieval results. The first retrieval Fig. 11, the retrieval results of some categories, such as dense
result is obtained by the holistic or local feature-based retrieval residential, golf course, and tennis courts, may not be very good
method with Cityblock distance. The second result is from the in LLH. The main reason is that the graph anchors PH cannot
retrieval of the first layer. The third result is the reranked result provide more useful information for the local feature learning
by using our method. The three retrieval results are shown in in the second layer.
Fig. 9. It is shown that the retrieval performance of the reranked 3) Retrieval Performance in the Third Layer: We have
method in the second layer is always better than in the other reranked the retrieval result lists LHH , LHL , LHC , LLH , LLL ,
two methods. Fig. 9(a)–(c) shows the different results obtained and LLC . We usually select the top 50 images in the afore-
by the holistic feature. The method in the first layer and the mentioned reranked lists to evaluate the retrieval consistency.
reranking process can retrieve more accurate images than the These good results are considered as the positive data, and
holistic feature-based retrieval method with Cityblock distance. the images at the bottom of the reranked lists are selected as
When the nearest neighbor number is equal to 3, the retrieval the negative data. We remove the duplicate elements of these
performances of the method in the first layer and the reranking positive and negative data. The parameters of SimpleMKL are
WANG et al.: THREE-LAYERED GRAPH-BASED LEARNING APPROACH FOR REMOTE SENSING IMAGE RETRIEVAL 6029
Fig. 9. Comparisons of different retrieval methods. (a) Retrieval results obtained by using the GIST feature with Cityblock distance. The number of top-ranked
images is 10. (b) Retrieval results obtained by using the holistic feature-based retrieval method in the first layer. The number of top-ranked images is 20.
(c) Reranking result. The number of top-ranked images is 30. (d) Retrieval results obtained by using the SIFT feature with Cityblock distance. The number
of top-ranked images is 10. (e) Retrieval results obtained by using the local feature-based retrieval method in the first layer. The number of top-ranked images
is 20. (f) Reranking result. The number of top-ranked images is 30.
6030 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 54, NO. 10, OCTOBER 2016
Fig. 10. Comparison of four retrieval results by using the graph of the holistic
feature-based retrieval method. LHC denotes the retrieval result when PC rep-
resents the query images. LHH denotes the retrieval result when PH represents
the query images. LHL denotes the retrieval result when PL represents the query
images.
Fig. 12. Comparisons of the retrieval results obtained by using the holistic
feature. LHC, LHH, and LHL denote retrieval results. PC, PH, and PL represent
query images.
ACKNOWLEDGMENT
Fig. 15. Comparisons of the retrieval results using ANMRR. The authors would like to thank the editors and anony-
TABLE I
mous reviewers for their valuable comments, and also thank
R ETRIEVAL P ERFORMANCE OF D IFFERENT A PPROACHES E. Aptoula for providing the comparison features.
IN T ERMS OF ANMRR AND C OMPUTATIONAL T IME
R EFERENCES
[1] S. M. Schweizer and J. M. Moura, “Efficient detection in hyperspectral
imagery,” IEEE Trans. Image Process., vol. 10, no. 4, pp. 584–597,
Apr. 2001.
[2] C. Li, T. Sun, K. Kelly, and Y. Zhang, “A compressive sensing and
unmixing scheme for hyperspectral data processing,” IEEE Trans. Image
Process., vol. 21, no. 3, pp. 1200–1210, Mar. 2012.
[3] M. Fauvel, Y. Tarabalka, J. A. Benediktsson, J. Chanussot, and
J. C. Tilton, “Advances in spectral–spatial classification of hyperspectral
images,” Proc. IEEE, vol. 101, no. 3, pp. 652–675, Mar. 2013.
[4] J. Li, J. Bioucas-Dias, and A. Plaza, “Semisupervised hyperspectral image
segmentation using multinomial logistic regression with active learning,”
IEEE Trans. Geosci. Remote Sens., vol. 48, no. 11, pp. 4085–4098,
Nov. 2010.
learning suitable query-dependent fusion weights for holistic [5] X. Shen, Z. Lin, J. Brandt, and Y. Wu, “Spatially-constrained similarity
and local features by using SimpleMKL costs most of the measure for large-scale object retrieval,” IEEE Trans. Pattern Anal. Mach.
retrieval time. We also notice that the overall computational Intell., vol. 36, no. 6, pp. 1229–1241, Jun. 2014.
[6] S. Zhang, M. Yang, T. Cour, K. Yu, and D. N. Metaxas, “Query specific
time of the second layer-SIFT is less than the methods in [19] rank fusion for image retrieval,” IEEE Trans. Pattern Anal. Mach. Intell.,
and [51], but the average ANMRR is the smallest. vol. 34, no. 4, pp. 803–815, Apr. 2015.
In our image retrieval procedure, most of the steps are paral- [7] S. Özkan, T. Ates, E. Tola, M. Soysal, and E. Esen, “Performance analysis
of state-of-the-art representation methods for geographical image retrieval
lelizable, such as the holistic and local feature extraction of the and categorization,” IEEE Geosci. Remote Sens. Lett. 2014, vol. 11,
query images and the image retrieval by using each individual no. 11, pp. 1996–2000, Nov. 2014.
feature. Therefore, they can be implemented by employing a [8] E. Aptoula, “Bag of morphological words for content-based geograph-
ical retrieval,” in Proc. IEEE Int. Content-Based Multimedia Indexing
parallelization scheme to further reduce computation time. Workshop, 2014, pp. 1–5.
[9] A. M. Cheriyadat, “Unsupervised feature learning for aerial scene classi-
VI. C ONCLUSION fication,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 1, pp. 439–451,
Jan. 2014.
In this paper, we have proposed a three-layered remote [10] R. Ji, L.-Y. Duan, J. Chen, H. Yao, Y. Rui, and W. Gao, “Location discrim-
inative vocabulary coding for mobile landmark search,” Int. J. Comput.
sensing image retrieval method based on local and holistic Vis., vol. 96, no. 3, pp. 290–314, Feb. 2012.
features. Unlike previous image retrieval methods, which often [11] X. Yang, X. Qian, and Y. Xue, “Scalable mobile image retrieval by ex-
concatenate all types of features into one vector to retrieve ploring contextual saliency,” IEEE Trans. Image Process., vol. 24, no. 6,
pp. 1709–1721, Jun. 2015.
images, we have extended the query and applied a novel [12] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman, “Total recall:
three-layered learning image retrieval approach to fuse various Automatic query expansion with a generative feature model for object
features. To extend the query, the proposed approach refines retrieval,” in Proc. IEEE Int. Conf. Comput. Vis., Rio de Janeiro, Brazil,
Oct. 2007, pp. 1–8.
the original input query by combining it with the top-ranked [13] O. Chum, A. Mikulík, M. Perdoch, and J. Matas, “Total recall II: Query
retrieval results obtained by using the holistic and local feature- expansion revisited,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern
based methods. The expansion query images are taken as graph Recognit., Jun. 2011, pp. 889–896.
[14] P. Siva, C. Russell, X. Tao, and L. Agapito, “Looking beyond the image:
anchors for further retrieving six image sets. The images in Unsupervised learning for object saliency and detection,” in Proc. IEEE
each set are evaluated for generating positive data and negative Int. Conf. Comput. Vis. Pattern Recognit., Portland, OR, USA, Jun. 2013,
data. SimpleMKL is applied to learn suitable query-dependent pp. 3238–3245.
[15] M. Ferecatu and N. Boujemaa, “Interactive remote sensing image retrieval
fusion weights for the holistic and local features. Experiments using active relevance feedback,” IEEE Trans. Geosci. Remote Sens.,
in each layer were conducted, and they demonstrate that the vol. 45, no. 4, pp. 818–826, Apr. 2007.
WANG et al.: THREE-LAYERED GRAPH-BASED LEARNING APPROACH FOR REMOTE SENSING IMAGE RETRIEVAL 6033
[16] A. Samal, S. Bhatia, P. Vadlamani, and D. Marx, “Searching satellite [40] D. Bratasanu, I. Nedelcu, and M. Datcu, “Bridging the semantic gap for
imagery with integrated measures,” Pattern Recognit., vol. 42, no. 11, satellite image annotation and automatic mapping applications,” IEEE J.
pp. 2502–2513, Nov. 2009. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 4, no. 1, pp. 193–204,
[17] S. Zheng et al., “Dense semantic image segmentation with objects and Mar. 2011.
attributes,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., [41] F. Kalaycilar, A. Kale, D. Zamalieva, and S. Aksoy, “Mining of re-
Columbus, OH, USA, Jun. 2014, pp. 3214–3221. mote sensing image archives using spatial relationship histograms,” in
[18] N. Dalal and B. Triggs, “Histograms of oriented gradients for human Proc. IEEE Int. Geosci. Remote Sens. Symp., Boston, MA, USA, 2008,
detection,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., pp. III-589–III-592.
San Diego, CA, USA, Jun. 2005, pp. 1063–6919. [42] A. Oliva and A. Torralba, “Modeling the shape of the scene: A holistic
[19] E. Aptoula, “Remote sensing image retrieval with global morphological representation of the spatial envelope,” Int. J. Comput. Vis., vol. 42, no. 3,
texture descriptors,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 5, pp. 145–175, May 2001.
pp. 3023–3034, May 2014. [43] J. Yang, K. Yu, Y. Gong, and T. Huang, “Linear spatial pyra-
[20] A. E. Johnson and M. Hebert, “Using spin images for efficient object mid matching using sparse coding for image classification,” in
recognition in cluttered 3-D scenes,” IEEE Trans. Pattern Anal. Mach. Proc. IEEE Int. Conf. Comput. Vis., Miami, FL, USA, Jun. 2009,
Intell., vol. 21, no. 5, pp. 433–449, May 1999. pp. 1794–1801.
[21] S. Gleason, R. Ferrell, A. Cheriyadat, R. Vatsavai, and S. De, “Semantic [44] L. J. van der Maaten, E. O. Postma, and H. J. van den Herik, “Dimen-
information extraction from multispectral geospatial imagery via a flex- sionality reduction: A comparative review,” J. Mach. Learn. Res., vol. 10,
ible framework,” in Proc. IEEE Int. Geosci. Remote Sens. Symp., 2010, pp. 66–71, Jan. 2008.
pp. 166–169. [45] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, “SimpleMKL,”
[22] R. R. Vatsavai, A. Cheriyadat, and S. Gleason, “Unsupervised semantic J. Mach. Learn. Res., vol. 9, pp. 2491–2521, Nov. 2008.
labeling framework for identification of complex facilities in high resolu- [46] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Scholkopf, “Learn-
tion remote sensing images,” in Proc. Int. Conf. Data Mining Workshops, ing with local and global consistency,” Proc. Adv. Neural Inf. Process.,
2010, pp. 273–280. vol. 16, no. 16, pp. 321–328, 2004.
[23] Y. Yang and S. Newsam, “Geographic image retrieval using local invariant [47] D. Lowe, “Distinctive image features from scale-invariant keypoints,” Int.
features,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 2, pp. 818–832, J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004.
Feb. 2013. [48] Y. Yang and S. Newsam, “Bag-of-visual-words and spatial extensions for
[24] D. Qin, S. Gammeter, L. Bossard, T. Quack, and L. van Cool, “Hello land-use classification,” in Proc. ACM Int. Conf. Adv. Geograph. Inf. Syst.,
neighbor: Accurate object retrieval with k-reciprocal nearest neighbors,” 2010, pp. 270–279.
in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Providence, RI, USA, [49] Y. Gong and S. Lazebnik, “Iterative quantization: A procrustean approach
Jun. 2011, pp. 777–784. to learning binary codes,” in Proc. IEEE Conf. Comput. Vis. Pattern
[25] B. Xu et al., “Efficient manifold ranking for image retrieval,” in Proc. Int. Recognit., Providence, RI, USA, Jun. 2011, pp. 817–824.
ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2011, pp. 525–534. [50] A. Bosch, A. Zisserman, and X. Munoz, “Representing shape with
[26] S. Xia and E. Hancock, “3-D object recognition using hyper-graphs and a spatial pyramid kernel,” in Proc. Int. Conf. Image Video Retrieval,
ranked local invariant features,” in Structural, Syntactic, and Statisti- New York, NY, USA, 2007, pp. 401–408.
cal Pattern Recognition. New York, NY, USA: Springer-Verlag, 2008, [51] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of fea-
pp. 117–126. tures: Spatial pyramid matching for recognizing natural scene cate-
[27] D. Y. Chen, X. P. Tian, Y. T. Shen, and M. Ouhyoung, “On visual simi- gories,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2006,
larity based 3-D model retrieval,” Comput. Graph. Forum, vol. 22, no. 3, pp. 2169–2178.
pp. 223–232, Sep. 2003. [52] B. S. Manjunath, J. R. Ohm, V. V. Vasudevan, and A. Yamada, “Color and
[28] M. Zhang, L. Zhang, P. T. Mathiopoulos, Y. Ding, and H. Wang, texture descriptors,” IEEE Trans. Circuits Syst. Video Technol., vol. 11,
“Perception-based shape retrieval for 3-D building models,” ISPRS J. no. 6, pp. 703–715, Jun. 2011.
Photogramm. Remote Sens., vol. 75, pp. 76–91, Jan. 2013. [53] W. Liu, J. He, and S. Chang, “Large graph construction for scalable
[29] L. Kang, C. Hsu, H. Chen, C. Lu, C. Lin, and S. Pei, “Feature-based sparse semi-supervised learning,” in Proc. 27th Int. Conf. Mach. Learn., 2010,
representation for image similarity assessment,” ACM Trans. Multimedia, pp. 679–686.
vol. 13, no. 5, pp. 1019–1030, Oct. 2011. [54] E. Aptoula, “Extending morphological covariance,” Pattern Recognit.,
[30] M. Graña and M. A. Veganzones, “An endmember-based distance for vol. 45, no. 12, pp. 4524–4535, Dec. 2012.
content based hyperspectral image retrieval,” Pattern Recognit., vol. 45, [55] E. Aptoula and B. Yanikoglu, “Morphological features for leaf based plant
no. 9, pp. 3472–3489, Sep. 2012. recognition,” in Proc. IEEE Int. Conf. Image Process., Melbourne, Vic.,
[31] Y. Gao, M. Wang, D. Tao, R. Ji, and Q. Dai, “3-D object retrieval Australia, Sep. 2013, pp. 1496–1499.
and recognition with hypergraph analysis,” IEEE Trans. Image Process., [56] Z. Wang et al., “A multiscale and hierarchical feature extrac-
vol. 21, no. 9, pp. 4290–4303, Sep. 2012. tion method for terrestrial laser scanning point cloud classification,”
[32] P. Daras and A. Axenopoulos, “A 3-D shape retrieval framework sup- IEEE Trans. Geosci. Remote Sens., vol. 53, no. 5, pp. 2409–2425,
porting multimodal queries,” Int. J. Comput. Vis., vol. 89, pp. 229–247, May 2015.
Jul. 2010. [57] Z. Zhang et al., “A multi-level point cluster-based discriminative feature
[33] B. Li and F. Fonseca, “TDD: A comprehensive model for qualitative for ALS point cloud classification,” IEEE Trans. Geosci. Remote Sens.,
spatial similarity assessment,” Spatial Cognition Comput., vol. 6, no. 1, vol. 54, no. 6, pp. 3309–3321, Jun. 2016.
pp. 31–62, Jan. 2006. [58] S. Mei, M. He, Y. Zhang, Z. Wang, and D. Feng, “Improving spatial–
[34] J. A. Piedra-Fernandez, G. Ortega, J. Z. Wang, and M. Canton-Garbin, spectral endmember extraction in the presence of anomalous ground ob-
“Fuzzy content-based image retrieval for oceanic remote sensing,” IEEE jects,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4210–4222,
Trans. Geosci. Remote Sens., vol. 52, no. 9, pp. 5422–5431, Sep. 2014. Nov. 2011.
[35] M. Wang and T. Song, “Remote sensing image retrieval by scene se-
mantic matching,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 5,
pp. 2874–2886, May 2013.
[36] G. Scott, M. Klaric, C. Davis, and C.-R. Shyu, “Entropy-balanced
bitmap tree for shape-based object retrieval from large-scale satellite
imagery databases,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 5,
pp. 1603–1616, May 2011.
[37] Y. Li and T. R. Bretschneider, “Semantic-sensitive satellite image re- Yuebin Wang is currently working toward the Ph.D.
trieval,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp. 853–860, degree in the School of Geography, Beijing Normal
Apr. 2007. University, Beijing, China.
[38] M. Schröder, H. Rehrauer, K. Seidel, and M. Datcu, “Interactive His research interests are remote sensing imagery
learning and probabilistic retrieval in remote sensing image archives,” processing and 3-D urban modeling.
IEEE Trans. Geosci. Remote Sens., vol. 38, no. 5, pp. 2288–2298,
Sep. 2000.
[39] J. Li and R. M. Narayanan, “Integrated spectral and spatial information
mining in remote sensing imagery,” IEEE Trans. Geosci. Remote Sens.,
vol. 42, no. 3, pp. 673–685, Mar. 2004.
6034 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 54, NO. 10, OCTOBER 2016
Liqiang Zhang received the Ph.D. degree in geoin- Xiaoyue Xing is currently working toward the
formatics from the Institute of Remote Sensing Ap- Bachelor’s degree in the School of Geography,
plications, Chinese Academy of Sciences, Beijing, Beijing Normal University, Beijing, China.
China, in 2004. Her research interests are TLS point cloud
He is currently a Professor with the School processing.
of Geography, Beijing Normal University, Beijing.
His research interests include remote sensing image
processing, 3-D urban reconstruction, and spatial
object recognition.