
A Three-Layered Graph-Based Learning Approach for Remote Sensing Image Retrieval

Yuebin Wang, Liqiang Zhang, Xiaohua Tong, Liang Zhang, Zhenxin Zhang, Hao Liu, Xiaoyue Xing, and P. Takis Mathiopoulos, Senior Member, IEEE

Abstract—With the emergence of huge volumes of high-resolution remote sensing images produced by all sorts of satellites and airborne sensors, processing and analysis of these images require effective retrieval techniques. To alleviate the dramatic variation of retrieval accuracy among queries caused by single-image-feature algorithms, we developed a novel graph-based learning method for effectively retrieving remote sensing images. The method uses a three-layer framework that integrates the strengths of query expansion and the fusion of holistic and local features. In the first layer, two retrieval image sets are obtained by, respectively, using retrieval methods based on holistic and local features, and the top-ranked and common images from both candidate lists subsequently form graph anchors. In the second layer, the graph anchors serve as an expansion query and retrieve six image sets from the image database using each individual feature. In the third layer, the images in the six sets are evaluated to generate positive and negative data, and SimpleMKL is applied to learn suitable query-dependent fusion weights that yield the final image retrieval result. Extensive experiments were performed on the UC Merced Land Use–Land Cover data set; the source code is available at our website. Compared with other related methods, the retrieval precision is significantly enhanced without sacrificing the scalability of our approach.

Index Terms—Expansion query, graph-based learning, image retrieval, query fusion, reranking.

Manuscript received October 25, 2015; revised April 21, 2016; accepted June 6, 2016. Date of publication June 27, 2016; date of current version August 11, 2016. This work was supported by the National Natural Science Foundation of China under Grant 41371324.
Y. Wang, L. Zhang, L. Zhang, Z. Zhang, H. Liu, and X. Xing are with the State Key Laboratory of Remote Sensing Science, Beijing Normal University, Beijing 100875, China (e-mail: zhanglq@bnu.edu.cn; xxgcdxwyb@163.com; yammer_0303@126.com; zhenxin066@163.com; apprentice_g@163.com; xyxing@mail.bnu.edu.cn).
X. Tong is with the School of Surveying and Geo-informatics, Tongji University, Shanghai 200092, China (e-mail: xhtong@tongji.edu.cn).
P. T. Mathiopoulos is with the Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece (e-mail: mathio@di.uoa.gr).
Digital Object Identifier 10.1109/TGRS.2016.2579648

I. INTRODUCTION

WITH the availability of huge volumes of high-resolution remote sensing images produced by all sorts of satellites and airborne sensors, remote sensing image processing and analysis have been an active research topic in the geoscience field [1]–[4] in recent decades. As a result, many applications involving high-resolution satellite image analysis are in need of an image retrieval task. Due to the huge number of images, it is important to develop effective and efficient methods for searching, retrieving, and recognizing candidates within the expanding collections.

A necessity for developing a successful remote sensing image retrieval system is the extraction of representative features to describe the query image and the ones in the database [36]. Local and holistic features are two types of descriptors for representing images. Local features are particularly capable of attending to local image patterns or textures, whereas holistic features describe the overall layouts of an image. One drawback of both feature types is that the retrieved images often look alike but may be irrelevant to the query, because remote sensing images often represent large natural geographical scenes that contain abundant and complex visual contents [35]; both are usually hindered by noise from irrelevant content [7]. Clearly, integrating their strengths can greatly improve retrieval precision. However, directly combining different feature vectors into one vector is not a good way to improve retrieval accuracy, because the feature characteristics and the algorithmic procedures are dramatically different [6]. Although query expansion [12], [13] can achieve precise retrieval results, its performance tends to degrade because of false-positive search results [5].

Inspired by the strengths of query expansion and of the query-specific rank fusion method [6], we propose a three-layered graph-based learning approach to retrieve remote sensing images. Fig. 1 illustrates the overview of the proposed approach. In this approach, we extract the holistic feature by means of GIST and the local feature through dense SIFT and ScSPM [43]. Next, the image retrieval process is divided into three layers. In the first layer, by using the holistic feature learning algorithm, one image list in which every image is similar to the query is retrieved from the remote sensing image database. Accordingly, we obtain another image list by using the local feature learning algorithm. In the second layer, we first rerank the images in the aforementioned two lists and then obtain three types of graph anchors: PH, PL, and PC. PH and PL are the top-ranked images of the two lists, and PC contains the common similar images of the two lists. We take PH, PL, and PC as the queries for retrieving database images by means of the holistic feature or the local feature learning algorithm. Thus, six lists containing retrieved images are obtained. In the third layer, positive and negative sets are selected by evaluating the images in the six lists. The parameters of SimpleMKL [45] are trained to fuse the retrieval results, and we obtain the final retrieved images.

The main contributions of this paper are summarized as follows.

Fig. 1. Overview of the proposed approach.

1) A novel three-layered graph-based learning approach for remote sensing image retrieval is developed. The approach refines the original input query by combining it with the obtained top-ranked retrieved images. The quality of the retrieval results obtained by using the local or holistic feature methods is verified, and the ordered retrieval image sets are fused to generate the final retrieval result. The retrieval precision is significantly enhanced without sacrificing the scalability of the proposed method.

2) A new expansion query method that is robust to unstable feature extraction is introduced. Its main advantage is that it mines multiple relevant images from a single input image rather than requiring users to input multiple relevant images. By combining the retrieved images with the original query, a richer expansion query image set is formed as prior knowledge for the next retrieval. By means of this set, the accuracy of image retrieval can be improved, and the proposed method can eliminate the deficiency of single-image-based retrieval.

3) A novel approach for fusing different image retrieval results is presented to allow accurate evaluation of the quality of each return. SimpleMKL is applied to learn suitable query-dependent fusion weights for the holistic and local features. The various image retrieval results are fused to enhance the retrieval accuracy.

II. RELATED WORK

In this section, we present the related work on feature extraction, similarity measure metrics, and image retrieval.

A. Feature Representation and Fusion

Over the last two decades, a variety of feature descriptions has been proposed [14]–[23]. They are generally divided into two categories: holistic and local features. Holistic features often represent the entire feature distribution of images, e.g., extended Gaussian images (EGIs) [14], texture features [15], [16], complex EGIs [17], and spherical attribute images [18]. Global morphological texture descriptors, including circular covariance histograms (CCHs) and rotation-invariant point triplets (RITs) [19], have been developed for content-based image retrieval. The texture descriptors in [19] were later adapted to a local scale by means of the popular bag-of-visual-words paradigm [8]. The performance of such holistic feature descriptors is usually affected by irrelevant parts of the image, which makes them sensitive to noise. On the other hand, the local feature extraction strategy avoids this problem by focusing on detecting and describing salient image regions/points [19]. The spin image descriptor [20] combines the descriptive nature of global object properties with the robustness of local shape descriptions to partial views and clutter, and it is widely used in classification and recognition. In [21] and [22], quantized local features are employed to detect complex geospatial objects such as nuclear and coal power plants in 1-m resolution DigitalGlobe imagery. Yang and Newsam [23] utilized local invariant features for content-based geographic image retrieval, and they concluded that local invariant features are more effective than features such as color and texture for image retrieval of land-use/land-cover (LULC) classes. Aptoula [8] later extended the low-level feature analysis in [23] to the "bag of morphological words" approach for content-based geographical retrieval. In scene classification, dense low-level feature descriptors can also be extracted to characterize local spatial patterns for aerial scene classification [9].

Clearly, combining the complementary advantages offered by both local and holistic features can enhance the overall retrieval precision. However, combining local and holistic cues at the feature level makes it difficult to preserve the efficiency and scalability induced by the vocabulary tree structure and compact hashing [6]. For this reason, some relevant publications have mainly focused on a reranked retrieval method [24] or a text-based retrieval approach [25]. Xia and Hancock [26] proposed a class-specific hypergraph to integrate local features and global geometric constraints for object recognition. In a very recent publication [6], a graph-based query-specific fusion approach for fusing the retrieval results based on local and holistic features was proposed.

Although this method has high computational efficiency and improves the image retrieval precision, the retrieval results are significantly affected by the quality of the single input query image.

B. Similarity Measure Approaches

Similarity measures are often applied to the extracted features to identify the images similar to the query [19]. Chen et al. [27] proposed a visual similarity-based 3-D model retrieval system that uses the L1 distance of features to measure model similarity. Zhang et al. [28] presented an upright-based normalization method, which accelerates the matching of shape descriptors by correctly rotating the building models; in their approach, they also calculated the L1 distance of the features to measure the similarity of the building models. Kang et al. [29] formulated image similarity assessment in terms of sparse representation. To evaluate the applicability of the proposed feature-based sparse representation for image similarity assessment (FSRISA), they applied FSRISA to three popular image applications (namely, copy detection, retrieval, and recognition) by properly formulating them as sparse representation problems. In [5], a voting-based method was employed to calculate the similarity measure for object retrieval; note that the similarity computation in that work only considered the matched feature pairs with spatial consistency. In [30], an endmember-based distance measure, particularly adapted to hyperspectral image retrieval, was presented. In another contribution [31], a probabilistic method was used to match multiple views in object retrieval; however, in this approach, the Hausdorff distance fails to describe three or more objects that look similar, because the distance only reflects the information of the closest views of a pair of objects. When comparing two objects in [32], the similarity is measured by summing up the similarities of all of the corresponding images. Li and Fonseca [33] applied different priorities to topology, direction, and distance similarities for qualitative spatial similarity assessment, where they considered commonality and differences in their similarity assessment studies.

C. Image Retrieval

In this section, we briefly review the subjects of image matching and retrieval related to our paper. In [34], a content-based image retrieval system was presented, which aimed at classifying and retrieving oceanic structures from satellite images. In [35], semantic matching of multilevel image scenes was utilized to retrieve remote sensing images. In [36], the authors described content-based shape retrieval of objects from a large-scale satellite imagery database. Li and Bretschneider [37] proposed a context-sensitive Bayesian network for semantic inference of segmented scenes; the semantics were employed for extracting candidate scenes, which were evaluated and ranked in a consecutive step. Schröder et al. [38] developed a probabilistic retrieval methodology based on interactive learning. Ferecatu and Boujemaa [15] developed an active relevance feedback solution based on support vector machines using weighted histograms as descriptors. In [39], a support vector machine approach was employed to recognize land cover information corresponding to spectral characteristics, and Gabor filters were used to extract textural features that characterize spatial information; integration of the land cover information and textural features led to the retrieval of the spectral and spatial patterns of remotely sensed imagery. In [40], the latent Dirichlet allocation model was utilized to map low-level features of clusters and segment the high-level map labels for remote sensing image annotation and mapping. Finally, Kalaycilar et al. [41] computed the topological and distance-based spatial relationships between regions; spatial relationship histograms were constructed by classifying these regions in an image.

III. IMAGE FEATURE DESCRIPTORS

In this section, we present the procedure for extracting the holistic and local feature descriptors.

A. Holistic Features

The GIST feature [42] is a holistic feature descriptor that represents an image by describing its spatial envelope in a low-dimensional way. We calculate the 2048-dimensional GIST descriptors of the query image and of every image in the image database. To reduce the computational cost without decreasing the retrieval accuracy, a linear dimensionality reduction is performed on the descriptors to generate the projected data D. Then, (1) is employed to minimize the quantization loss and implement binary quantization in the resulting space

Ω(B, R) = ‖B − DR‖²_F    (1)

where B is the binary code matrix of the GIST descriptors, R is an orthogonal rotation matrix, and ‖·‖_F denotes the Frobenius norm.

Fig. 2 illustrates the GIST features of six categories in the UC Merced image database. It is noted that each category has its own discriminative holistic feature.
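The projection and quantization step in (1) can be made concrete with a small sketch. The following Python fragment is an illustrative implementation only, not the released source code: it centers the GIST descriptors, projects them with PCA, and alternates between the binary codes B and the orthogonal rotation R to reduce ‖B − DR‖²_F, in the spirit of iterative quantization [49]. The array names, bit count, and iteration budget are assumptions chosen for the example.

```python
import numpy as np

def binary_quantize(gist, n_bits=32, n_iters=20, seed=0):
    """Project descriptors and learn an orthogonal rotation R that reduces
    the quantization loss ||B - D R||_F^2 of Eq. (1).
    `gist` is an (n_images, n_dims) array of GIST descriptors."""
    rng = np.random.default_rng(seed)
    X = gist - gist.mean(axis=0)                      # center the data
    # linear dimensionality reduction (PCA) to n_bits dimensions
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    D = X @ Vt[:n_bits].T                             # projected data D
    R = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))[0]   # random orthogonal init
    for _ in range(n_iters):
        B = np.sign(D @ R)                            # binary codes for fixed R
        B[B == 0] = 1
        U, _, Wt = np.linalg.svd(B.T @ D)             # orthogonal Procrustes: best R for fixed B
        R = (U @ Wt).T
    return np.sign(D @ R) > 0, R

# toy usage with random stand-ins for 512-D GIST vectors
codes, R = binary_quantize(np.random.rand(100, 512), n_bits=32)
```

In the experiments of Section V the holistic descriptor is reduced to 512 dimensions, so the bit budget would be chosen accordingly; the value 32 above is only for the toy example.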

Fig. 2. GIST features of six scene categories in the UC Merced Land Use–Land Cover dataset. The first row shows the categories in the high-resolution satellite
images, and the second row presents their corresponding GIST features. (a) Agricultural. (b) Airplane. (c) Freeway. (d) Harbor. (e) Parking lot. (f) Storage tanks.

B. Local Features

In the following, we first introduce dense SIFT to achieve a representative local feature by means of spatial pyramid maximum pooling with sparse coding.

Dense SIFT is computed from 16 × 16 image patches with a step size of 8 pixels, and it is further encoded by using sparse coding. Similar to principal component analysis, sparse coding uses a group of "overcomplete" base vectors to represent the sample data, which can reveal the structures and patterns within the input data [43].

For an image O = [o_1, . . . , o_n] ∈ R^{m×n}, where o_1, o_2, . . . , o_n are the feature vectors of the patches, O can be factorized into two matrices: U ∈ R^{m×k} and T ∈ R^{k×n}. The basis U captures the intrinsic geometric structure of O, and each column vector of T is a sparse representation. Norms such as the Frobenius norm are used to measure how well UT approximates O. We make use of (2) to implement the matrix factorization of the image

min_{U,T} ‖O − UT‖² + λ‖T‖₁
s.t. ‖u_k‖ ≤ 1, ∀ k = 1, 2, . . . , K.    (2)

Equations (3) and (4) are adopted to extract more discriminative features [43] through multiscale maximum pooling

z = F(U)    (3)

where z is the local image feature given by a prechosen pooling function

z_j = max {|u_1j|, |u_2j|, . . . , |u_Mj|}    (4)

where z_j is the jth element of z, u_ij is the matrix element at the ith row and jth column of U, and M is the number of local feature descriptors in the region.

Fig. 3 illustrates the process for extracting the local feature. Because the extracted local feature is high dimensional, it is difficult to learn the joint feature relevance without considerable data overfitting and to obtain optimized parameters with limited training samples. The dimension reduction method [44] is therefore applied to reduce the dimension.

Fig. 3. Block diagram of the local feature extraction procedure.
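To illustrate the local-feature encoding of (2)–(4), here is a minimal numpy sketch. It is not the authors' implementation: the dictionary is a random stand-in for the learned basis, the sparse codes are obtained with a plain ISTA solver, and the pooling of (3) and (4) is applied to the codes of the patches in a region, as in ScSPM [43]. The value of λ and the iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_codes(O, U, lam=0.15, n_iters=100):
    """Approximate the sparse codes T of Eq. (2), min ||O - U T||^2 + lam*||T||_1,
    with plain ISTA.  O is an (m, n) matrix of patch descriptors and U is an
    (m, k) dictionary whose columns satisfy ||u_k|| <= 1."""
    L = 2.0 * np.linalg.norm(U, 2) ** 2          # Lipschitz constant of the gradient
    T = np.zeros((U.shape[1], O.shape[1]))
    for _ in range(n_iters):
        grad = 2.0 * U.T @ (U @ T - O)
        T = soft_threshold(T - grad / L, lam / L)
    return T

def max_pool(T):
    """Eqs. (3)-(4): z_j is the maximum absolute code value over all patches."""
    return np.max(np.abs(T), axis=1)

# toy usage: 128-D "dense SIFT" patches and a random 256-atom dictionary
rng = np.random.default_rng(0)
O = rng.standard_normal((128, 50))
U = rng.standard_normal((128, 256))
U /= np.linalg.norm(U, axis=0)                    # enforce ||u_k|| <= 1
z = max_pool(sparse_codes(O, U))                  # pooled local feature of Eq. (3)
```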

IV. THREE-LAYERED GRAPH-BASED IMAGE RETRIEVAL

As illustrated in Fig. 4, the process of the proposed image retrieval can be divided into three layers. In the first layer, by using the graph-based holistic feature learning algorithm, one list whose images are similar to the input query is obtained from the image database. Accordingly, we obtain another image list by using the graph-based local feature learning algorithm. In the second layer, we first rerank the images in each of the two lists and then obtain three types of graph anchors PH, PL, and PC. PH and PL are the top-ranked images of the two candidate lists, whereas PC contains the common retrieved images in both of the top candidate lists. We take PH, PL, and PC as the queries to retrieve database images by means of the graph-based holistic feature learning algorithm and the graph-based local feature learning algorithm, respectively. Thus, six image retrieval lists can be obtained. In the third layer, positive and negative data are generated by evaluating the quality of these retrieval results, and SimpleMKL [45] is introduced to fuse the retrieval results.

Fig. 4. Three-layered graph-based learning framework for image retrieval.

A. First-Layer Graph-Based Retrieval

In this layer, a semisupervised learning method [46] is introduced for retrieving images. We only consider the query image as a labeled image, and the remaining images are considered unlabeled. To achieve this goal, we first construct a weighted graph to obtain the optimized ranking function. Afterward, database images are retrieved based on the local and holistic features of these images.

1) Graph Construction: Given an image data set X = {x_1, . . . , x_l, x_{l+1}, . . . , x_n}, some of the images are labeled as query images, and the rest of them need to be ranked according to their relevance to the queries. For each query image, a weighted undirected graph G = (V, E, W) is constructed for each individual feature-based retrieval method, where V is a set of vertices, E is a set of edges, and W is a set of edge weights; the retrieval quality, or the relevance, is modeled by the weights on the edges. Each database image corresponds to a vertex in G. For each image, we identify its k nearest neighbors and connect the corresponding vertices in G with edges that are associated with the distance between the two vertices.

G is usually not a connected graph. To obtain a connected graph, the two nearest connected components of G are combined into one by an edge that links their two nearest points. The process ends when G becomes a connected and undirected graph. E corresponds to the similarity among vertices, with the edge weight defined by

W_ij = exp( −d²(V_i, V_j) / σ² )    (5)

where d(V_i, V_j) denotes the feature distance between the vertices V_i and V_j, and W_ij is the edge weight of E_ij. σ is a constant that controls the strength of the weight, and it is set as the median distance among all images.

If the similarity among vertices is determined by the holistic feature based on pairwise image distances, G is called the graph GH of the holistic feature; otherwise, it is called the graph GL of the local feature.

2) Learning on the Graph: After the graphs are constructed, the optimal pairwise relevance is obtained through a learning algorithm. Given the initial labeled data (a query image), the regularization term Ω(f) and the empirical loss term R(f) can be combined to form the following optimization function:

arg min_f {Ω(f) + μR(f)}    (6)

where f is the ranking function to be obtained and μ is a weighting factor controlling the balance between the two terms, which are mathematically expressed as

Ω(f) = (1/2) Σ_{V_i, V_j ∈ V} W_ij ( f(V_i)/√d(V_i) − f(V_j)/√d(V_j) )²
     = Σ_{V_i, V_j ∈ V} W_ij ( f²(V_i)/d(V_i) − f(V_i)f(V_j)/√(d(V_i)d(V_j)) )
     = f^T (I − Θ_s) f    (7)

where Θ_s = K^{−1/2} W K^{−1/2}, K is the degree matrix with k_ii = Σ_j W_ij, I is the identity matrix, and

R(f) = ‖f − y‖²    (8)

where y is the initial label vector, with y_i = 1 if x_i is a query and y_i = 0 otherwise.

The ranking function f that minimizes (6) is computed by setting the derivative of (6) to zero. It is mathematically expressed as

f = (I − αΘ_s)^{−1} y    (9)

where α = 1/(1 + μ). A large f_i indicates a high similarity between database image i and the query.

We have thus obtained two retrieval image lists by using the graph-based holistic and local feature learning algorithms. The two lists represent different retrieval results and have different image arrangements. Next, we select the top-ranked images from the lists as the new input to further refine the retrieval results.
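A compact sketch of this first-layer retrieval is given below, assuming Euclidean feature distances and omitting the connected-component merging step for brevity: it builds the k-NN graph with the Gaussian weights of (5), normalizes it to Θ_s, and evaluates the closed-form ranking function of (9). The value α = 0.1 mirrors the holistic-graph setting reported in Section V, but the data and the neighborhood size are synthetic placeholders.

```python
import numpy as np

def ranking_scores(feats, query_idx, k=10, alpha=0.1):
    """Graph-based ranking of Section IV-A: k-NN graph with Gaussian weights
    (Eq. (5)), symmetric normalization, and f = (I - alpha*Theta_s)^{-1} y
    from Eq. (9).  `feats` is (n_images, dim); `query_idx` is the labeled query."""
    n = feats.shape[0]
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)   # pairwise distances
    sigma = np.median(d[d > 0])                     # sigma set to the median distance
    W = np.zeros((n, n))
    nn = np.argsort(d, axis=1)[:, 1:k + 1]          # k nearest neighbors (skip self)
    for i in range(n):
        W[i, nn[i]] = np.exp(-d[i, nn[i]] ** 2 / sigma ** 2)
    W = np.maximum(W, W.T)                          # keep the graph undirected
    Dm = np.diag(1.0 / np.sqrt(W.sum(axis=1) + 1e-12))
    theta_s = Dm @ W @ Dm                           # Theta_s = K^{-1/2} W K^{-1/2}
    y = np.zeros(n)
    y[query_idx] = 1.0                              # only the query is labeled
    return np.linalg.solve(np.eye(n) - alpha * theta_s, y)

# toy usage: a large score f_i marks image i as similar to the query
scores = ranking_scores(np.random.rand(200, 32), query_idx=0, k=10, alpha=0.1)
ranked = np.argsort(-scores)                        # first-layer retrieval list
```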

B. Second-Layer Graph-Anchor-Based Retrieval

In this layer, the task is to use the retrieved images as prior knowledge to improve the retrieval precision. To achieve this, we utilize a reranking method to obtain the top-ranked images as graph anchors. Then, these anchors are taken as new queries to further improve the retrieval accuracy.

1) Construction of Graph Anchors: Through the first layer, we have obtained two image retrieval results. They are stored in two lists: H_LQ and L_LQ. We separately rerank the images in each of the lists and obtain two reranked image retrieval lists. The top-ranked images and the common images are taken as the graph anchors for expanding the query. In the following, we discuss the procedure for generating the graph anchors.

To generate accurate graph anchors for learning the joint relevance of the holistic and local features, it is necessary to precisely measure the similarity among the images. We define the similarity degree of two images as the relative score (RS). The RS can be computed by the following reranking method, which considers the similarity among image features as well as the image clustering information.

Fig. 5. Reranking process. For the query image Q, the first-row elements A_1, A_2, . . . , A_m are the retrieval results of the first layer. Then, A_1, A_2, . . . , A_m are also considered as query images to retrieve images.

In Fig. 5, every circle represents one image. Q is the query image, and N is used to represent the retrieved image matrix. A_1, A_2, . . . , A_m are the top-ranked images in the retrieval results of the first layer, which are stored in a list L_Q. The images in L_Q are further used as queries to perform the search. For example, N_11, N_12, . . . , N_1m are the query results of A_1. Similarly, each of the other images in L_Q retrieves the m best matching database images. Finally, m + 1 lists are generated

[ L_Q  ]   [ A_1   A_2   · · ·  A_m  ]
[ L_A1 ]   [ N_11  N_12  · · ·  N_1m ]
[ L_A2 ] = [ N_21  N_22  · · ·  N_2m ]
[  ⋮   ]   [  ⋮     ⋮            ⋮   ]
[ L_Am ]   [ N_m1  N_m2  · · ·  N_mm ].

We compute the similarity score (SS) between two images. The SS between a retrieved image N_ij and Q is derived by

SS(Q, N_ij) = SS(Q, A_i) · SS(A_i, N_ij)    (10)

where SS(Q, A_i) indicates the similarity between images Q and A_i, and SS(A_i, N_ij) represents the similarity between images A_i and N_ij.

When SS(Q, N_ij) is computed, we first consider the relationship between Q and A_i. If they are very similar, many common images exist in the lists L_Q and L_Ai, and the spatial distributions of the image features are similar between the two lists. Then, con(e_j, A_i) is defined to obtain the contributions of the common elements e_1, e_2, . . . , e_num to A_i by using the following:

con(e_j, A_i) = ln(1 + λ_con · SS(e_j, A_i)) · p^index    (11)

where SS(e_j, A_i) is computed by (9), j ∈ [1, num], index is the position of the retrieved image in the list, 0 < p < 1, and λ_con is the contribution constraint coefficient.

After all of the contributions of the common elements between L_Q and L_Ai are computed, the similarity between Q and A_i is as follows:

SS(Q, A_i) = (num/n) · R( Σ_{j=1}^{num} con(e_j, Q), Σ_{j=1}^{num} con(e_j, A_i) )    (12)

where R(a, b) = a/b if a ≤ b, and R(a, b) = b/a otherwise.

The similarity score SS(A_i, N_ij) between A_i and N_ij is also obtained by (11). Because a retrieved image RI may have several positions in N, we search for RI in the lists L_Q, L_A1, . . . , L_Am, and we compute the RS between Q and RI as follows:

RS(Q, RI) = Σ_{i=1}^{m+1} SS(Q, A_i) · con(N_ij, A_i) · δ(N_ij)    (13)

where δ(N_ij) = 1 if N_ij ∈ L_Ai, and δ(N_ij) = 0 otherwise.

After the images in H_LQ and L_LQ are reranked, the top retrieved images are taken as the prior knowledge for further image retrieval. Graph anchors [53] were initially proposed for scalable graph-based semisupervised learning; in our approach, we seek a better anchor set to enhance the retrieval accuracy. In the following, we obtain three types of graph anchors: PH, PL, and PC. PH and PL are the top reranked images of H_LQ and L_LQ, respectively, whereas PC contains the common images in both of the top reranked lists (H_LQ and L_LQ). It is easy to obtain PH and PL; the steps for obtaining PC are described as follows.

Suppose that CI is a common image in the reranked H_LQ and L_LQ. Its similarity simHL to the query image can be mathematically expressed as follows:

simHL = ln(1 + λ_con · RS_H(Q, CI)) · p^i + ln(1 + λ_con · RS_L(Q, CI)) · p^j    (14)

where i and j denote the positions of the common image CI in H_LQ and L_LQ, respectively, RS is the relative score, and λ_con and p have been defined in (11).

If simHL is no smaller than a given similarity threshold, the corresponding common image in the lists is taken as a graph anchor in PC. This process continues until all of the common images in H_LQ and L_LQ are traversed.
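The reranking of (11)–(13) can be sketched as follows. This is an illustrative reading of the equations rather than the authors' code: the graph-ranking similarities SS(·, ·) of (9) are passed in as a plain dictionary, the contribution of each common element follows (11), the query-to-anchor similarity follows (12), and every retrieved image accumulates the RS of (13). The values of λ_con and p and the toy lists are assumptions.

```python
import math

def con(ss_value, rank, lam_con=1.0, p=0.9):
    """Eq. (11): contribution of a common element at position `rank` in a list."""
    return math.log(1.0 + lam_con * ss_value) * (p ** rank)

def relative_scores(L_Q, anchor_lists, ss, lam_con=1.0, p=0.9):
    """Sketch of Eqs. (11)-(13).  `L_Q` is the first-layer list for the query,
    `anchor_lists[a]` is the list retrieved with top-ranked image `a` as query,
    and `ss[a][x]` is the graph-ranking similarity SS(a, x) from Eq. (9)."""
    n = len(L_Q)
    rs = {}
    for a, L_a in anchor_lists.items():
        # Eq. (12): similarity between Q and the anchor a from their common elements
        common = set(L_Q) & set(L_a)
        s_q = sum(con(ss['Q'][e], L_Q.index(e), lam_con, p) for e in common)
        s_a = sum(con(ss[a][e], L_a.index(e), lam_con, p) for e in common)
        r = 0.0 if max(s_q, s_a) == 0 else min(s_q, s_a) / max(s_q, s_a)
        ss_q_a = len(common) / n * r
        # Eq. (13): every image retrieved by `a` accumulates RS, weighted by SS(Q, a)
        for rank, img in enumerate(L_a):
            rs[img] = rs.get(img, 0.0) + ss_q_a * con(ss[a][img], rank, lam_con, p)
    return rs            # images with a large RS become graph-anchor candidates

# toy usage: two anchor lists and hand-made similarity scores
L_Q = ['a', 'b', 'c']
lists = {'a': ['b', 'c', 'd'], 'b': ['a', 'c', 'e']}
ss = {'Q': {'a': .9, 'b': .8, 'c': .7},
      'a': {'b': .6, 'c': .5, 'd': .4},
      'b': {'a': .6, 'c': .5, 'e': .3}}
scores = relative_scores(L_Q, lists, ss)
```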

2) Retrieval Reinforcement and Retrieval Fusion: We take PH, PL, and PC as query images to retrieve images from the image database using the learning algorithm of the holistic and local features introduced in the first layer. We label these queries as the prior knowledge and modify the vector y in (9), where the values of the graph anchors are all set to 1. Thus, we obtain six retrieval lists through the two graphs GH and GL. That is, through the graph GH, we separately derive three retrieval results (LHH, LHL, and LHC) corresponding to the queries PH, PL, and PC. Accordingly, we also obtain three other retrieval results (LLH, LLL, and LLC) through GL. LHH and LLL are the results of reinforcement learning, whereas LHL, LLH, LHC, and LLC are the results of the preliminary fusion. Note that LHL and LHC are the retrieval results corresponding to the graph GH of the holistic feature, but their queries are the graph anchors PL and PC, which are not generated from GH. Similarly, LLH and LLC are the retrieval results corresponding to GL, but their queries are the graph anchors PH and PC, which are not generated from GL. LHC and LLC are also the results of the graph-based learning algorithms of the holistic and local features that use the common graph anchors as the queries. This approach can reduce, to a certain extent, the influence of inappropriate graph anchors on the retrieval result.

C. Third-Layer Fusion-Based Retrieval Results

To generate the fused retrieval result of the different features, we evaluate the performance of the retrieved images in the aforementioned six lists and find similar and dissimilar images. Then, we learn the weights of the holistic and local features and the related parameters of the fusion. SimpleMKL [45] is introduced to obtain the final fusion result.

For the fusion process, positive data and negative data are needed. Thus, we introduce an image list (LG) that contains similar images (positive data), as well as a second one (LD) that contains dissimilar images (negative data). To create LD, we randomly select a certain number of images from the bottom of LHH, LHL, LHC, LLH, LLL, and LLC separately and store them in LD. Clearly, obtaining LG is a more difficult task, which can be achieved by the following retrieval evaluation procedure.

1) Graph-Based Retrieval Evaluation: The images in LHH, LHL, LHC, LLH, LLL, and LLC are reranked separately. The top-ranked images are usually very similar to the query image and belong to the same category. We apply the retrieval consistency to evaluate the aforementioned reranked results. For example, we select the cn top-ranked images from the reranked LHH as the graph anchors for an expansion query. We modify y in (9), where the values of the graph anchors are set to 1. The graph GH of the holistic feature and the ranking function f computed in Section IV-A are employed to obtain the retrieved result La. For the top-ranked images in the reranked LHH and in La, their similarities to the query image and their positions in the two lists are considered. Assume that the position of a retrieved image in the reranked LHH is i and that in La is j. The consistency degree of the retrieval results is

DC = (1/cn) Σ_{k=1}^{cn} num / (1 + i + j)    (15)

where num is the number of common images in the reranked lists LHH and La. In practice, cn is usually set to 20, 30, 40, or 50.

From the retrieved result LHH, we can thus obtain a certain number of top-ranked images by using this retrieval evaluation. These images are stored in the aforementioned image list LG.

The evaluation processes of the other retrieval lists are the same as that of LHH. For the retrieved results LHL, LHC, LLH, LLL, and LLC, we compute their consistency degrees by (15) and choose the good retrieval results for the fusion of the final result. These images are also stored in LG.

2) Further Feature Fusion by SimpleMKL: After the evaluation, we need to address the problem of obtaining the best fusion weights. SimpleMKL is applied to learn suitable query-dependent fusion weights for the two types of features. The images in the list LG are taken as the positive data, and the images in LD with low similarity are taken as the negative data. We train the parameters of SimpleMKL through these data and the two features. SimpleMKL determines the combination of the feature weights by solving a standard SVM optimization problem based on a gradient descent method. The kernel K(x, x') represents the similarity between two images using one feature. Supposing that there are M types of features for an image x_i, we formulate the corresponding kernel matrix for the mth feature as

K_m(x_i, x_j) = exp( −γ_m ‖x_i^m − x_j^m‖² )    (16)

where m = 1, 2, . . . , M, x_i^m corresponds to the mth feature of x_i, and γ_m is a positive parameter that can be computed according to the data distribution.

To combine multiple features, the conventional approach is to combine the basic kernels in a convex way

K(x_i, x_j) = Σ_{m=1}^{M} ω_m K_m(x_i, x_j)    (17)

where ω_m ≥ 0 and Σ_{m=1}^{M} ω_m = 1.

SimpleMKL is optimized by solving the following minimization problem:

min_{{f_m}, b, ζ, ω}  (1/2) Σ_{m=1}^{M} (1/ω_m) ‖f_m‖²_{H_m} + C Σ_i ζ_i
s.t.  y_i Σ_{m=1}^{M} f_m(x_i) + y_i b ≥ 1 − ζ_i  ∀i
      ζ_i ≥ 0  ∀i
      Σ_{m=1}^{M} ω_m = 1,  ω_m ≥ 0  ∀m    (18)

where f_m is a function in the reproducing kernel Hilbert space H_m of the kernel K_m(x_i^m, x_j^m), ζ_i is the slack variable, C is the regularization parameter that trades off between the decision function and the slack variables, ω is the weight vector, and b is the bias coefficient to be learned from the training data.

After the optimized parameters are obtained, we compute the decision function f(x) + b = Σ_{m=1}^{M} f_m(x) + b, where each function f_m corresponds to a different H_m associated with a kernel K_m. The decision function describes the relevance of the retrieved images to the query image: the higher the relevance value is, the more similar the two images are. After all of the images in the retrieved image list are sorted in descending order of this value, they represent the final retrieval result.
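The kernel construction of (16) and (17) is easy to sketch. The fragment below builds one Gaussian kernel per feature type and combines them with convex weights ω_m; the actual weight learning of (18) is performed by SimpleMKL [45] through gradient descent on the SVM objective and is not reproduced here. The heuristic for γ_m and the random feature matrices are assumptions for the example.

```python
import numpy as np

def gaussian_kernel(F, gamma):
    """Eq. (16): K_m(x_i, x_j) = exp(-gamma_m * ||x_i^m - x_j^m||^2) for one feature type."""
    sq = np.sum(F ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * F @ F.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combined_kernel(features, weights, gammas):
    """Eq. (17): convex combination K = sum_m w_m K_m with w_m >= 0 and sum w_m = 1.
    `features[m]` holds the (n_images, dim_m) matrix of the m-th feature."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and abs(weights.sum() - 1.0) < 1e-9
    return sum(w * gaussian_kernel(F, g) for w, F, g in zip(weights, features, gammas))

# toy usage: a holistic and a local feature for 50 images with equal starting weights;
# SimpleMKL would iteratively update `weights` by gradient descent on the SVM dual.
rng = np.random.default_rng(0)
holistic, local = rng.random((50, 512)), rng.random((50, 32))
gammas = [1.0 / (2 * np.var(holistic)), 1.0 / (2 * np.var(local))]   # set from the data spread
K = combined_kernel([holistic, local], weights=[0.5, 0.5], gammas=gammas)
```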

Fig. 6. Some examples of the UC Merced Land Use–Land Cover dataset. (a) Agricultural. (b) Airplane. (c) Baseball diamond. (d) Beach. (e) Buildings.
(f) Chaparral. (g) Dense residential. (h) Forest. (i) Freeway. (j) Golf course. (k) Harbor. (l) Intersection. (m) Medium residential. (n) Mobile home park.
(o) Overpass. (p) Parking lot. (q) River. (r) Runway. (s) Sparse residential. (t) Storage tanks. (u) Tennis court.

D. Computational Complexity Analysis

The common way to express the complexity of an algorithm is big O notation. Assume that we have n database images. The computational cost of the proposed method mainly lies in the three layers. In the first layer, the image retrieval is computed with a cost of O(n³ + n²). In the second layer, since the number of reranked images is far less than n, the complexity of the reranking process is negligible, and the cost of the second layer is O(n³ + n²). In the third layer, the main computational cost is for optimizing the parameters of SimpleMKL, with a time complexity of O(N³ + lN² + dlN), where d is the feature dimension, l is the number of training samples, and N is the number of support vectors. Since d ≪ n and l ≪ n, the different features can be fused quickly in the third layer. The whole cost of the proposed method is O(n³ + n² + N³ + lN² + dlN).

V. EXPERIMENTS

In this section, we evaluate the performance of our proposed method for image retrieval on the UC Merced image data set [48]. The data set consists of images of 21 LULC categories with a pixel resolution of 30 cm, and each class contains 100 RGB color samples of size 256 × 256 pixels [19]. Example images are shown in Fig. 6. The source code of the proposed method is available at the website http://geogother.bnu.edu.cn/teacherweb/zhangliqiang/.

To evaluate the effectiveness of our method, we conduct extensive experiments on the UC Merced data set. We first compare our approach in different layers with retrieval based on each individual feature, and we then further compare our method with other related methods. mAP and precision–recall curves are employed to evaluate the retrieval performance. We also use the average normalized modified retrieval rank (ANMRR) [52] to provide an overall rating of the performance.
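As a concrete reference for the evaluation protocol mentioned above, the following sketch computes the average precision of one ranked list and averages it into mAP; the label convention is illustrative, and the ANMRR score [52], which follows the MPEG-7 definition, is not reproduced here.

```python
import numpy as np

def average_precision(ranked_labels, query_label):
    """Average precision of one ranked retrieval list: the mean of the precision
    values at each position that holds a relevant (same-category) image."""
    relevant = np.asarray(ranked_labels) == query_label
    if relevant.sum() == 0:
        return 0.0
    precisions = np.cumsum(relevant) / (np.arange(len(relevant)) + 1)
    return float(precisions[relevant].mean())

# toy usage: mAP over two queries of the same category
ap = [average_precision(['harbor', 'beach', 'harbor', 'harbor'], 'harbor'),
      average_precision(['beach', 'harbor', 'harbor', 'forest'], 'harbor')]
mAP = sum(ap) / len(ap)
```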

Fig. 8. Comparisons of retrieval results. (a) Performance of the retrieval results obtained by the GIST feature and Cityblock distance, with our holistic feature
method in the first layer. (b) Performance of the retrieval results obtained by the SIFT feature and Cityblock distance, with our local feature method in the first
layer.

A. Validation of Retrieval Performance in Each Layer

We extract the 2048-dimensional GIST feature and apply the algorithm in [49] to reduce its dimension to 512. The dimension reduction method [44] is used to reduce the local feature to 32 dimensions. Fig. 7 demonstrates that the features whose dimensions have been reduced have little influence on the image retrieval accuracy, while the retrieval efficiency is enhanced, saving more than 80% of the computational time in the graph construction.

Fig. 7. Retrieval results by using different bits in the UC Merced data set with the local feature.

1) Retrieval Performance in the First Layer: In the first layer, we use the holistic and local features to construct the graphs. α is set to 0.1 in the graph of the holistic feature and to 0.01 in the graph of the local feature.

To validate the retrieval performance in the first layer, we use two methods to retrieve images. One is the holistic or local feature-based retrieval method with the Cityblock distance (the first method); the other is the proposed method in the first layer. The results shown in Fig. 8 are the average retrieval performance values over the 21 categories; the retrieval precision of some categories, such as agricultural, buildings, and harbor, is even higher than the values in Fig. 8. The learned ranking function takes the holistic and local features into account and uses the smoothness and fitting constraints to solve the image similarity; thus, the retrieval results in the first layer are preferable. From Fig. 8, we notice that our presented method always performs better than the first method.

2) Retrieval Performance in the Second Layer: In the second layer, we rerank the retrieval results of the first layer. The number of top reranked images is set to 10, 20, and 30, and mAP is used to evaluate the different retrieval results. The first retrieval result is obtained by the holistic or local feature-based retrieval method with the Cityblock distance, the second result is from the retrieval of the first layer, and the third result is the reranked result obtained by using our method. The three retrieval results are shown in Fig. 9. It is shown that the retrieval performance of the reranking method in the second layer is always better than those of the other two methods. Fig. 9(a)–(c) shows the different results obtained by the holistic feature. The method in the first layer and the reranking process can retrieve more accurate images than the holistic feature-based retrieval method with the Cityblock distance. When the nearest neighbor number is equal to 3, the retrieval performances of the method in the first layer and of the reranking method are nearly the same; as the number of retrieved images increases, the reranking method outperforms the other methods. Fig. 9(d)–(f) shows the different results obtained by the local feature. From the mAP curves, we note that the performance of the reranking method is still higher than those of the other methods. The retrieval precision obtained in the first layer is further improved after the reranking.

Assume that RSH_max denotes the maximum RS in the graph-based retrieval method of the holistic feature and RSL_max denotes the maximum RS in the graph-based retrieval method of the local feature. The thresholds are defined as tH_sim = 0.1 × RSH_max and tL_sim = 0.1 × RSL_max to extract the graph anchors PH and PL. Equation (14) and the threshold 0.01 × max(simHL) are utilized to extract the graph anchor PC. In the second layer, the graph anchors PH, PL, and PC are used to retrieve images from the database.
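A small sketch of the anchor selection just described is given below, under the stated thresholds tH_sim = 0.1 × RSH_max, tL_sim = 0.1 × RSL_max, and 0.01 × max(simHL) for (14). The dictionaries holding the RS values and list positions are illustrative containers, not the authors' data structures, and λ_con and p are placeholder values.

```python
import math

def anchors_from_rs(rs_scores, fraction=0.1):
    """PH / PL: keep the reranked images whose relative score exceeds
    fraction * max(RS), e.g. tH_sim = 0.1 * RSH_max."""
    rs_max = max(rs_scores.values())
    return {img for img, s in rs_scores.items() if s >= fraction * rs_max}

def common_anchors(rs_h, rs_l, pos_h, pos_l, lam_con=1.0, p=0.9, fraction=0.01):
    """PC from Eq. (14): common images of the two reranked lists whose combined
    similarity simHL exceeds 0.01 * max(simHL)."""
    common = set(rs_h) & set(rs_l) & set(pos_h) & set(pos_l)
    sim = {c: math.log(1 + lam_con * rs_h[c]) * p ** pos_h[c]
              + math.log(1 + lam_con * rs_l[c]) * p ** pos_l[c]
           for c in common}
    if not sim:
        return set()
    t = fraction * max(sim.values())
    return {c for c, s in sim.items() if s >= t}

# toy usage: RS scores and positions of images in the reranked H_LQ and L_LQ lists
P_H = anchors_from_rs({'img1': 0.9, 'img2': 0.4, 'img3': 0.05})
P_C = common_anchors({'img1': 0.9, 'img2': 0.4}, {'img1': 0.7, 'img2': 0.1},
                     {'img1': 0, 'img2': 1}, {'img1': 1, 'img2': 0})
```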

Fig. 9. Comparisons of different retrieval methods. (a) Retrieval results obtained by using the GIST feature with Cityblock distance. The number of top-ranked
images is 10. (b) Retrieval results obtained by using the holistic feature-based retrieval method in the first layer. The number of top-ranked images is 20.
(c) Reranking result. The number of top-ranked images is 30. (d) Retrieval results obtained by using the SIFT feature with Cityblock distance. The number
of top-ranked images is 10. (e) Retrieval results obtained by using the local feature-based retrieval method in the first layer. The number of top-ranked images
is 20. (f) Reranking result. The number of top-ranked images is 30.

From Figs. 10 and 11, it is noted that the use of the top-ranked images in the expansion query can improve the retrieval accuracy. For example, by using the graph anchors PL, the obtained retrieval result LHL is more accurate than the other retrieval results shown in Fig. 10; the reason is that the reranking process can offer more precise prior knowledge for PL. In Fig. 11, the retrieval results of some categories, such as dense residential, golf course, and tennis courts, may not be very good in LLH. The main reason is that the graph anchors PH cannot provide much useful information for the local feature learning in the second layer.

Fig. 10. Comparison of four retrieval results by using the graph of the holistic feature-based retrieval method. LHC denotes the retrieval result when PC represents the query images, LHH denotes the retrieval result when PH represents the query images, and LHL denotes the retrieval result when PL represents the query images.

Fig. 11. Comparison of four retrieval results by using the graph of the local feature-based retrieval method. LLC denotes the retrieval result when PC represents the query images, LLH denotes the retrieval result when PH represents the query images, and LLL denotes the retrieval result when PL represents the query images.

3) Retrieval Performance in the Third Layer: We have reranked the retrieval result lists LHH, LHL, LHC, LLH, LLL, and LLC. We usually select the top 50 images in the aforementioned reranked lists to evaluate the retrieval consistency. These good results are considered as the positive data, and the images at the bottom of the reranked lists are selected as the negative data. We remove the duplicate elements of these positive and negative data. The parameters of SimpleMKL are trained through the positive and negative data. The results are shown in Figs. 12 and 13.

Fig. 12. Comparisons of the retrieval results obtained by using the holistic feature. LHC, LHH, and LHL denote retrieval results; PC, PH, and PL represent the query images.

As illustrated in Fig. 12, when the nearest neighbor number is equal to 100, the mAPs of the harbor category are 13.85%, 15.40%, 14.70%, 21.81%, and 35.48%, corresponding to the retrieval results in the first layer, LHC, LHH, LHL, and the third layer, respectively. The mAPs of the parking lot category are 13.97%, 15.26%, 16.12%, 25.45%, and 38.52%. The retrieval results in the third layer perform best with the holistic feature.

In Fig. 13, by using the local feature-based retrieval method, the mAPs of the agricultural category are 23.19%, 23.17%, 23.4%, 24.23%, and 42.58%, corresponding to the retrieval results in the first layer, LLC, LLH, LLL, and the third layer, respectively. The mAPs of the baseball diamond category are 34.95%, 39.91%, 42.83%, 42.39%, and 45.45%, and the mAPs of the beach category are 30.99%, 33.21%, 35.09%, 34.36%, and 41.56%. For most categories, the retrieval results have high precision. The retrieval processes in the first two layers only consider a single feature type, i.e., the holistic or local feature, whereas in the third layer we extract positive and negative data and train the parameters of SimpleMKL with the holistic and local features at the same time. This approach combines the advantages of the holistic and local feature-based retrieval methods, and it fuses the two retrieval results efficiently. Therefore, the performance in this layer outperforms those in the first two layers, which means that fusion of the ranked retrieval image sets given by the holistic and local feature-based retrieval methods can remarkably enhance the final retrieval accuracy.

From Figs. 12 and 13, we conclude that the retrieval results in the third layer are higher than those of the other two layers.

B. Comparisons of the Related Methods

In this section, we compare our retrieval result with other methods: the pyramid of histograms of orientation gradients (PHOG) [50], local invariants [23], the pyramid histogram of words (PHOW) [51], and global morphological texture descriptors [19]. In [50], a shape feature descriptor together with a spatial pyramid kernel is introduced to represent the local shape and spatial layout of the objects in the image. They generalize the spatial pyramid kernel and learn its level weighting parameters for combining multilevel features.

Fig. 13. Comparisons of the retrieval results obtained by using the local feature. LLC denotes the retrieval result when PC represents the query images, LLH denotes the retrieval result when PH represents the query images, and LLL denotes the retrieval result when PL represents the query images.

Fig. 14. Comparisons of the retrieval results with different retrieval approaches.

In this experiment, to build the pyramid histograms, we discretize the gradient orientations into eight bins and set the number of pyramid levels to 3. In [23], the dense SIFT descriptor is extracted to describe the local invariance of images. The space between dense SIFT samples is set to 8, and the size of each patch for the SIFT descriptor is set to 16. The SIFT descriptors are quantized using codebooks resulting from k-means clustering; the dictionary size is set to 400, and 50 texton images are used to build the histogram bins. The sum-of-squared-error function is used for the k-means clustering, and the point with a local minimum is taken as the center of a cluster. In [51], PHOW partitions the image into fine subregions and computes histograms of the local features found inside each subregion, introducing the spatial pyramid to compute the feature histograms. For PHOW, the space between dense SIFT samples is set to 8, the size of each patch for the SIFT descriptor is set to 16, and the number of pyramid levels is 3. The approach for obtaining each pyramid feature is the same as in [23]. In [19], the CCHs [55], RITs [54], and two texture descriptors based on the Fourier power spectrum of an image's quasi-flat zone representation (i.e., FPS1 and FPS2) are combined into morphological descriptors for retrieving remote sensing images. FPS1 and FPS2 have a number of parameters to set, i.e., the number of scales (m), the step (s) for computing the QFZ scale space, and, of course, the number of disks and sectors (n). Specifically, the parameters of FPS1 are set as m = 2 using the scales QFZ15,15 and QFZ30,30 with n = 7 disks, and FPS2 is also calculated with two scales, QFZ10,10 and QFZ20,20, and n = 16 sectors.
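For reference, the bag-of-visual-words quantization used by the local-invariant baseline [23], as configured above (dense SIFT with a step of 8 and a patch size of 16, quantized against a 400-word k-means codebook), could be sketched as follows. The dense SIFT extraction itself is assumed to come from an external library; the descriptors here are random placeholders, and scikit-learn's KMeans stands in for the clustering step.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(train_descriptors, n_words=400, seed=0):
    """k-means codebook over dense SIFT descriptors (dictionary size 400 as above)."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(train_descriptors)

def bow_histogram(image_descriptors, codebook):
    """Quantize one image's descriptors to the nearest visual word and
    return the L1-normalized word histogram used for retrieval."""
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# toy usage with random 128-D stand-ins for dense SIFT descriptors
rng = np.random.default_rng(0)
codebook = build_codebook(rng.random((2000, 128)), n_words=50)
h = bow_histogram(rng.random((300, 128)), codebook)
```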

In our method, the input is still a single query image, but we can explore more relevant images iteratively to overcome the deficiency of single-image-based retrieval methods. This allows us to accurately validate each return, suppressing the false positives. The top-ranked retrieved images are used as prior knowledge for the next retrieval, and the retrieval results are fused by using SimpleMKL, which greatly improves the retrieval precision. The other four methods use a single query image for retrieving similar images from the image database.

In order to further validate the efficiency and effectiveness of our presented method, we have also computed the average ANMRR, feature extraction time, retrieval time, and overall computational time for obtaining the retrieved images using only the SIFT feature and the first two layers (second layer-SIFT) of our method.

As shown in Fig. 14, the second layer-SIFT achieves a higher retrieval accuracy compared with the methods in [19], [23], [50], and [51]. The first 5% of the retrieval result of the three layers is not much better than those of the other methods; the reason is that some inappropriate positive data are brought into the training data. However, our method subsequently achieves the best retrieval precision among all of the methods.

Fig. 15. Comparisons of the retrieval results using ANMRR.

In Fig. 15, the chaparral land use is easy to recognize, but some categories are confused with other categories, such as buildings and overpasses. The performance of our method still outperforms the other methods in most categories.

TABLE I
RETRIEVAL PERFORMANCE OF DIFFERENT APPROACHES IN TERMS OF ANMRR AND COMPUTATIONAL TIME

All experiments are performed on a PC with a 3.0-GHz CPU and 8-GB memory. As listed in Table I, it is noted that the overall computational time of our method is approximately equal to those of the methods in [23] and [51] and more than that of the method in [50]; however, our retrieval accuracy is higher than those of the other methods. In our method, feature extraction consumes most of the overall time, and learning suitable query-dependent fusion weights for the holistic and local features by using SimpleMKL costs most of the retrieval time. We also notice that the overall computational time of the second layer-SIFT is less than those of the methods in [19] and [51], while its average ANMRR is the smallest.

In our image retrieval procedure, most of the steps are parallelizable, such as the holistic and local feature extraction of the query images and the image retrieval by using each individual feature. Therefore, they can be implemented by employing a parallelization scheme to further reduce the computation time.

VI. CONCLUSION

In this paper, we have proposed a three-layered remote sensing image retrieval method based on local and holistic features. Unlike previous image retrieval methods, which often concatenate all types of features into one vector to retrieve images, we have extended the query and applied a novel three-layered learning image retrieval approach to fuse various features. To extend the query, the proposed approach refines the original input query by combining it with the top-ranked retrieval results obtained by using the holistic and local feature-based methods. The expansion query images are taken as graph anchors for further retrieving six image sets. The images in each set are evaluated for generating positive and negative data, and SimpleMKL is applied to learn suitable query-dependent fusion weights for the holistic and local features. Experiments were conducted in each layer, and they demonstrate that the precision in the current layer outperforms those in the previous layers. Therefore, the developed framework is reasonable and scalable for remote sensing image retrieval. Comparisons of our method with the other methods further demonstrated that our method, with a single input query image, generates very competitive retrieval performance. The work in this paper can help in classifying and recognizing ground objects in remote sensing images; for example, we can automatically provide prior information for the objects to be detected or generate training data through image retrieval.

Future work will focus on decreasing the computational complexity and on fusing geometric [56], [57] and spectral features [58] to improve the retrieval performance of remote sensing images.

ACKNOWLEDGMENT

The authors would like to thank the editors and anonymous reviewers for their valuable comments, and also thank E. Aptoula for providing the comparison features.

REFERENCES

[1] S. M. Schweizer and J. M. Moura, "Efficient detection in hyperspectral imagery," IEEE Trans. Image Process., vol. 10, no. 4, pp. 584–597, Apr. 2001.
[2] C. Li, T. Sun, K. Kelly, and Y. Zhang, "A compressive sensing and unmixing scheme for hyperspectral data processing," IEEE Trans. Image Process., vol. 21, no. 3, pp. 1200–1210, Mar. 2012.
[3] M. Fauvel, Y. Tarabalka, J. A. Benediktsson, J. Chanussot, and J. C. Tilton, "Advances in spectral–spatial classification of hyperspectral images," Proc. IEEE, vol. 101, no. 3, pp. 652–675, Mar. 2013.
[4] J. Li, J. Bioucas-Dias, and A. Plaza, "Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 11, pp. 4085–4098, Nov. 2010.
[5] X. Shen, Z. Lin, J. Brandt, and Y. Wu, "Spatially-constrained similarity measure for large-scale object retrieval," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 6, pp. 1229–1241, Jun. 2014.
[6] S. Zhang, M. Yang, T. Cour, K. Yu, and D. N. Metaxas, "Query specific rank fusion for image retrieval," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 4, pp. 803–815, Apr. 2015.
[7] S. Özkan, T. Ates, E. Tola, M. Soysal, and E. Esen, "Performance analysis of state-of-the-art representation methods for geographical image retrieval and categorization," IEEE Geosci. Remote Sens. Lett., vol. 11, no. 11, pp. 1996–2000, Nov. 2014.
[8] E. Aptoula, "Bag of morphological words for content-based geographical retrieval," in Proc. IEEE Int. Content-Based Multimedia Indexing Workshop, 2014, pp. 1–5.
[9] A. M. Cheriyadat, "Unsupervised feature learning for aerial scene classification," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 1, pp. 439–451, Jan. 2014.
[10] R. Ji, L.-Y. Duan, J. Chen, H. Yao, Y. Rui, and W. Gao, "Location discriminative vocabulary coding for mobile landmark search," Int. J. Comput. Vis., vol. 96, no. 3, pp. 290–314, Feb. 2012.
[11] X. Yang, X. Qian, and Y. Xue, "Scalable mobile image retrieval by exploring contextual saliency," IEEE Trans. Image Process., vol. 24, no. 6, pp. 1709–1721, Jun. 2015.
[12] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman, "Total recall: Automatic query expansion with a generative feature model for object retrieval," in Proc. IEEE Int. Conf. Comput. Vis., Rio de Janeiro, Brazil, Oct. 2007, pp. 1–8.
[13] O. Chum, A. Mikulík, M. Perdoch, and J. Matas, "Total recall II: Query expansion revisited," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Jun. 2011, pp. 889–896.
[14] P. Siva, C. Russell, X. Tao, and L. Agapito, "Looking beyond the image: Unsupervised learning for object saliency and detection," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Portland, OR, USA, Jun. 2013, pp. 3238–3245.
[15] M. Ferecatu and N. Boujemaa, "Interactive remote sensing image retrieval using active relevance feedback," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp. 818–826, Apr. 2007.

[16] A. Samal, S. Bhatia, P. Vadlamani, and D. Marx, "Searching satellite imagery with integrated measures," Pattern Recognit., vol. 42, no. 11, pp. 2502–2513, Nov. 2009.
[17] S. Zheng et al., "Dense semantic image segmentation with objects and attributes," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Columbus, OH, USA, Jun. 2014, pp. 3214–3221.
[18] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., San Diego, CA, USA, Jun. 2005, pp. 886–893.
[19] E. Aptoula, "Remote sensing image retrieval with global morphological texture descriptors," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 5, pp. 3023–3034, May 2014.
[20] A. E. Johnson and M. Hebert, "Using spin images for efficient object recognition in cluttered 3-D scenes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 5, pp. 433–449, May 1999.
[21] S. Gleason, R. Ferrell, A. Cheriyadat, R. Vatsavai, and S. De, "Semantic information extraction from multispectral geospatial imagery via a flexible framework," in Proc. IEEE Int. Geosci. Remote Sens. Symp., 2010, pp. 166–169.
[22] R. R. Vatsavai, A. Cheriyadat, and S. Gleason, "Unsupervised semantic labeling framework for identification of complex facilities in high resolution remote sensing images," in Proc. Int. Conf. Data Mining Workshops, 2010, pp. 273–280.
[23] Y. Yang and S. Newsam, "Geographic image retrieval using local invariant features," IEEE Trans. Geosci. Remote Sens., vol. 51, no. 2, pp. 818–832, Feb. 2013.
[24] D. Qin, S. Gammeter, L. Bossard, T. Quack, and L. Van Gool, "Hello neighbor: Accurate object retrieval with k-reciprocal nearest neighbors," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Providence, RI, USA, Jun. 2011, pp. 777–784.
[25] B. Xu et al., "Efficient manifold ranking for image retrieval," in Proc. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2011, pp. 525–534.
[26] S. Xia and E. Hancock, "3-D object recognition using hyper-graphs and ranked local invariant features," in Structural, Syntactic, and Statistical Pattern Recognition. New York, NY, USA: Springer-Verlag, 2008, pp. 117–126.
[27] D. Y. Chen, X. P. Tian, Y. T. Shen, and M. Ouhyoung, "On visual similarity based 3-D model retrieval," Comput. Graph. Forum, vol. 22, no. 3, pp. 223–232, Sep. 2003.
[28] M. Zhang, L. Zhang, P. T. Mathiopoulos, Y. Ding, and H. Wang, "Perception-based shape retrieval for 3-D building models," ISPRS J. Photogramm. Remote Sens., vol. 75, pp. 76–91, Jan. 2013.
[29] L. Kang, C. Hsu, H. Chen, C. Lu, C. Lin, and S. Pei, "Feature-based sparse representation for image similarity assessment," ACM Trans. Multimedia, vol. 13, no. 5, pp. 1019–1030, Oct. 2011.
[30] M. Graña and M. A. Veganzones, "An endmember-based distance for content based hyperspectral image retrieval," Pattern Recognit., vol. 45, no. 9, pp. 3472–3489, Sep. 2012.
[31] Y. Gao, M. Wang, D. Tao, R. Ji, and Q. Dai, "3-D object retrieval and recognition with hypergraph analysis," IEEE Trans. Image Process., vol. 21, no. 9, pp. 4290–4303, Sep. 2012.
[32] P. Daras and A. Axenopoulos, "A 3-D shape retrieval framework supporting multimodal queries," Int. J. Comput. Vis., vol. 89, pp. 229–247, Jul. 2010.
[33] B. Li and F. Fonseca, "TDD: A comprehensive model for qualitative spatial similarity assessment," Spatial Cognition Comput., vol. 6, no. 1, pp. 31–62, Jan. 2006.
[34] J. A. Piedra-Fernandez, G. Ortega, J. Z. Wang, and M. Canton-Garbin, "Fuzzy content-based image retrieval for oceanic remote sensing," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 9, pp. 5422–5431, Sep. 2014.
[35] M. Wang and T. Song, "Remote sensing image retrieval by scene semantic matching," IEEE Trans. Geosci. Remote Sens., vol. 51, no. 5, pp. 2874–2886, May 2013.
[36] G. Scott, M. Klaric, C. Davis, and C.-R. Shyu, "Entropy-balanced bitmap tree for shape-based object retrieval from large-scale satellite imagery databases," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 5, pp. 1603–1616, May 2011.
[37] Y. Li and T. R. Bretschneider, "Semantic-sensitive satellite image retrieval," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp. 853–860, Apr. 2007.
[38] M. Schröder, H. Rehrauer, K. Seidel, and M. Datcu, "Interactive learning and probabilistic retrieval in remote sensing image archives," IEEE Trans. Geosci. Remote Sens., vol. 38, no. 5, pp. 2288–2298, Sep. 2000.
[39] J. Li and R. M. Narayanan, "Integrated spectral and spatial information mining in remote sensing imagery," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 3, pp. 673–685, Mar. 2004.
[40] D. Bratasanu, I. Nedelcu, and M. Datcu, "Bridging the semantic gap for satellite image annotation and automatic mapping applications," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 4, no. 1, pp. 193–204, Mar. 2011.
[41] F. Kalaycilar, A. Kale, D. Zamalieva, and S. Aksoy, "Mining of remote sensing image archives using spatial relationship histograms," in Proc. IEEE Int. Geosci. Remote Sens. Symp., Boston, MA, USA, 2008, pp. III-589–III-592.
[42] A. Oliva and A. Torralba, "Modeling the shape of the scene: A holistic representation of the spatial envelope," Int. J. Comput. Vis., vol. 42, no. 3, pp. 145–175, May 2001.
[43] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Proc. IEEE Int. Conf. Comput. Vis., Miami, FL, USA, Jun. 2009, pp. 1794–1801.
[44] L. J. van der Maaten, E. O. Postma, and H. J. van den Herik, "Dimensionality reduction: A comparative review," J. Mach. Learn. Res., vol. 10, pp. 66–71, Jan. 2008.
[45] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "SimpleMKL," J. Mach. Learn. Res., vol. 9, pp. 2491–2521, Nov. 2008.
[46] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Scholkopf, "Learning with local and global consistency," Proc. Adv. Neural Inf. Process., vol. 16, no. 16, pp. 321–328, 2004.
[47] D. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004.
[48] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Proc. ACM Int. Conf. Adv. Geograph. Inf. Syst., 2010, pp. 270–279.
[49] Y. Gong and S. Lazebnik, "Iterative quantization: A procrustean approach to learning binary codes," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Providence, RI, USA, Jun. 2011, pp. 817–824.
[50] A. Bosch, A. Zisserman, and X. Munoz, "Representing shape with a spatial pyramid kernel," in Proc. Int. Conf. Image Video Retrieval, New York, NY, USA, 2007, pp. 401–408.
[51] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2006, pp. 2169–2178.
[52] B. S. Manjunath, J. R. Ohm, V. V. Vasudevan, and A. Yamada, "Color and texture descriptors," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 6, pp. 703–715, Jun. 2001.
[53] W. Liu, J. He, and S. Chang, "Large graph construction for scalable semi-supervised learning," in Proc. 27th Int. Conf. Mach. Learn., 2010, pp. 679–686.
[54] E. Aptoula, "Extending morphological covariance," Pattern Recognit., vol. 45, no. 12, pp. 4524–4535, Dec. 2012.
[55] E. Aptoula and B. Yanikoglu, "Morphological features for leaf based plant recognition," in Proc. IEEE Int. Conf. Image Process., Melbourne, Vic., Australia, Sep. 2013, pp. 1496–1499.
[56] Z. Wang et al., "A multiscale and hierarchical feature extraction method for terrestrial laser scanning point cloud classification," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 5, pp. 2409–2425, May 2015.
[57] Z. Zhang et al., "A multi-level point cluster-based discriminative feature for ALS point cloud classification," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 6, pp. 3309–3321, Jun. 2016.
[58] S. Mei, M. He, Y. Zhang, Z. Wang, and D. Feng, "Improving spatial–spectral endmember extraction in the presence of anomalous ground objects," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4210–4222, Nov. 2011.

Yuebin Wang is currently working toward the Ph.D. degree in the School of Geography, Beijing Normal University, Beijing, China.
His research interests are remote sensing imagery processing and 3-D urban modeling.
Liqiang Zhang received the Ph.D. degree in geoinformatics from the Institute of Remote Sensing Applications, Chinese Academy of Sciences, Beijing, China, in 2004.
He is currently a Professor with the School of Geography, Beijing Normal University, Beijing. His research interests include remote sensing image processing, 3-D urban reconstruction, and spatial object recognition.

Xiaohua Tong received the Ph.D. degree from Tongji University, Shanghai, China, in 1999.
He was a Postdoctoral Researcher with the State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China, between 2001 and 2003. He was a Research Fellow with The Hong Kong Polytechnic University, Hung Hom, Hong Kong, in 2006, and a Visiting Scholar with the University of California, Santa Barbara, CA, USA, between 2008 and 2009. He is a Chang-Jiang Scholar Chair Professor appointed by the Ministry of Education, China. He also received the National Natural Science Fund for Distinguished Young Scholars in 2013. He is the author of more than 40 publications in international journals. His current research interests include remote sensing, GIS, trust in spatial data, and image processing for high-resolution and hyperspectral images.
Prof. Tong serves as the Vice-Chair of the Commission on Spatial Data Quality of the International Cartographical Association and the Co-Chair of the ISPRS Working Group (WG II/4) on Spatial Statistics and Uncertainty Modeling. He received the State Natural Science Award (Second Place) from the State Council of the People's Republic of China in 2007.

Liang Zhang is currently working toward the Ph.D. degree in the School of Geography, Beijing Normal University, Beijing, China.
His research interests are remote sensing imagery processing and 3-D urban modeling.

Zhenxin Zhang is currently working toward the Ph.D. degree in geoinformatics in the School of Geography, Beijing Normal University, Beijing, China.
His research interests include light detection and ranging data processing, quality analysis of geographic information systems, and algorithm development.

Hao Liu received the Bachelor's degree from Beihang University (BUAA), Beijing, China, in 2012 and the Master's degree from Kyushu University, Fukuoka, Japan, in 2015. He is currently working toward the Ph.D. degree in the School of Geography, Beijing Normal University, Beijing.
His present research interests include machine learning and computer vision.

Xiaoyue Xing is currently working toward the Bachelor's degree in the School of Geography, Beijing Normal University, Beijing, China.
Her research interests are TLS point cloud processing.

P. Takis Mathiopoulos (SM'94) received the Ph.D. degree in digital communications from the University of Ottawa, Ottawa, ON, Canada, in 1989.
From 1982 to 1986, he was with Raytheon Canada Ltd., working in the areas of air navigational and satellite communications. In 1988, he joined the Department of Electrical and Computer Engineering (ECE), University of British Columbia (UBC), Vancouver, BC, Canada, where he was a faculty member until 2003, holding the rank of Professor from 2000 to 2003. From 2000 to 2014, he was with the Institute for Space Applications and Remote Sensing (ISARS), National Observatory of Athens (NOA), first as the Director and then as the Director of Research, where he established the Wireless Communications Research Group. As ISARS' Director (2000–2004), he led the institute to significant expansion, R&D growth, and international scientific recognition. For these achievements, ISARS was selected as a National Centre of Excellence for 2005 to 2008. Since 2014, he has been an Adjunct Researcher with the Institute of Astronomy, Astrophysics, Space Applications and Remote Sensing (IAASARS), NOA. Since 2003, he has also been teaching part-time at the Department of Informatics and Telecommunications, University of Athens, Athens, Greece, where he has been a Professor of digital communications since 2014. From 2008 to 2013, he was appointed as a Guest Professor with Southwest Jiaotong University, Chengdu, China. He has also been appointed by the Government of China as a Senior Foreign Expert at the School of Information Engineering, Yangzhou University (2014–2015), Yangzhou, China, and by Keio University, Minato, Japan, as a Visiting Professor with the Department of Information and Computer Science (2015–2016) under the Top Global University Project of the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Government of Japan. For the last 25 years, he has been conducting research mainly on the physical layer of digital communication systems for terrestrial and satellite applications, including digital communications over fading and interference environments. He is the coauthor of a paper in GLOBECOM'89 establishing for the first time in the open technical literature the link between MLSE and multiple (or multisymbol) differential detection for the AWGN and fading channels. He is also interested in channel characterization and measurements, modulation and coding techniques, synchronization, SIMO/MIMO, UWB, OFDM, software/cognitive radios, and green communications. In addition, since 2010, he has been actively involved with research activities in the fields of remote sensing, LiDAR systems, and photogrammetry. In these areas, he is the coauthor of more than 100 journal papers, mainly published in various IEEE and IET journals, 4 book chapters, and more than 120 conference papers. He has been the PI of more than 40 research grants and has supervised the theses of 11 Ph.D. and 23 Master's students. He has regularly acted as a consultant for various governmental and private organizations. Since 1993, he has been serving on a regular basis as a Scientific Advisor and a Technical Expert for the European Commission (EC). In addition, from 2001 to 2014, he served as a Greek representative to high-level committees in the EC and the European Space Agency.
Dr. Mathiopoulos has been or currently serves on the editorial board of several archival journals, including the IET Communications and the IEEE TRANSACTIONS ON COMMUNICATIONS (1993–2005). He has been a member of the Technical Program Committee (TPC) of more than 70 international IEEE conferences, as well as the TPC Vice Chair of the 2006-S IEEE VTC and 2008-F IEEE VTC and the Cochair of FITCE2011. He has delivered numerous invited presentations, including plenary and keynote lectures, and has taught many short courses all over the world. As a faculty member at the ECE of UBC, he was elected as an ASI Fellow and a Killam Research Fellow. He was a corecipient of two best paper awards for papers published in the 2nd International Symposium on Communication, Control, and Signal Processing (2008) and 3rd International Conference on Advances in Satellite and Space Communications (2011).