
2874 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 51, NO. 5, MAY 2013

Remote Sensing Image Retrieval by Scene Semantic Matching

Min Wang and Tengyi Song

Abstract—This paper proposes a remote sensing (RS) image retrieval scheme that uses image scene semantic (SS) matching. The low-level image visual features (VFs) are first mapped into multilevel spatial semantics via VF extraction, object-based classification with support vector machines, spatial relationship inference, and SS modeling. Furthermore, a spatial SS matching model that involves the object area, attribution, topology, and orientation features is proposed for the implementation of sample-scene-based image retrieval. Moreover, a prototype system that uses a coarse-to-fine retrieval scheme is implemented with high retrieval accuracy. Experimental results show that the proposed method is suitable for spatial SS modeling, particularly geographic SS modeling, and performs well in spatial scene similarity matching.

Index Terms—Attributed relational graph (ARG), image retrieval, object-based image analysis, remote sensing (RS) image, scene matching, semantic, semantic gap.

Manuscript received March 17, 2012; revised May 29, 2012, July 27, 2012, and August 22, 2012; accepted August 30, 2012. Date of publication October 22, 2012; date of current version April 18, 2013. This work was supported in part by the National Natural Science Foundation of China under Grant 41171321 and Grant 40871189, by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant 11KJA420001, and by the Priority Academic Program Development of Jiangsu Higher Education Institutions.
The authors are with the Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210046, China (e-mail: sysj@njnu.edu.cn; 563383778@qq.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TGRS.2012.2217397
0196-2892/$31.00 © 2012 IEEE

I. INTRODUCTION

Content-based image retrieval (CBIR) is the traditional process of finding interesting images in databases based on the visual contents of the images, including color, texture, salient points, and shape features. With decades of development, important progress has been made in this field [1]. Some prototype and commercial systems have emerged, such as QBIC, Photobook, VisualSEEK, SIMPLIcity, and GazoPa, among others [2]–[6]. Although many achievements have been accomplished, visual-feature-based (VF-based) CBIR is known to have limited capability because human image interpretation has been found to be highly semantically related [7]. It depends not only on image VFs but also on an inspector's understanding and judgment originating from his/her experience. Semantic-based image retrieval (SBIR) is regarded as more accurate than VF-based CBIR; thus, it has received more attention [8], [9].

Remote sensing (RS) image retrieval (RSIR) is the extension and utilization of CBIR techniques in the field of RS applications. RSIR offers a promising solution to problems in spatial information retrieval. Moreover, RSIR can be more accurate and reliable if spatial similarity evaluation does not rely only on the low-level VFs of an image, such as color and texture. Employing semantics in RSIR, i.e., semantic-based RSIR (SBRSIR), can meet these requirements. However, semantic extraction and utilization in RSIR are more complicated than those in common CBIR because RS images often represent large natural geographical scenes that contain abundant and complex visual contents.

This paper proposes a suite of technical schemes suitable for the semantic-based retrieval of RS image scenes. Image low-level VFs are gradually converted to high-level spatial semantics, including object, spatial relationship, and scene semantics (SS). Then, a spatial SS matching model that involves the object area, attribution, topology, and orientation features is proposed for the implementation of sample-scene-based image retrieval. Furthermore, a prototype system is designed and implemented for method validation. Many known computational methods, including VF extraction, image decomposition, and classification, are used in the proposed methods. However, the main contribution and novelty of this paper lie in the distinctive SS modeling and matching schemes. Experimental results show that the proposed techniques are suitable for spatial, particularly geographic, SS modeling and similarity matching and thus have good application prospects in RSIR.

The rest of this paper is organized as follows. Section II reviews related work, including the technical backgrounds and typical cases of SBRSIR. Section III introduces the proposed SS-based retrieval schemes in detail. Section IV discusses the experiments, and Section V provides a summary and discusses future research.

II. RELATED WORK

A. Technical Backgrounds

SBIR and SBRSIR are not surveyed separately because they share similar key problems and techniques. SBIR/SBRSIR currently involves the following three main aspects [1], [10], [11]: 1) image semantic representation; 2) image semantic extraction for bridging image low-level VFs to high-level semantics; and 3) semantic matching and utilization.

Stratified image semantic models are in common use for image semantic description. Colombo et al. [14] stated that image VFs (called the perceptual data) are hierarchically organized into expressive and emotional levels. In [12] and [13], image semantics were decomposed into six levels according to their degree of abstraction and distance to human thinking. From bottom to top, these levels include the feature, object, spatial relationship, scene, behavior, and emotion semantics.
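For illustration, the six-level hierarchy of [12] and [13] can be written down as a simple ordered enumeration. This is only a sketch of the taxonomy described above; the identifier names are our own, not from the cited works:

```python
from enum import IntEnum

class SemanticLevel(IntEnum):
    """Six-level image semantic hierarchy of [12], [13], bottom to top.
    Higher values are more abstract and closer to human thinking.
    The names are illustrative, not taken from the cited papers."""
    FEATURE = 1               # low-level visual features (color, texture, ...)
    OBJECT = 2                # object semantics (OS)
    SPATIAL_RELATIONSHIP = 3  # topology, orientation, distance between objects
    SCENE = 4                 # scene semantics (SS)
    BEHAVIOR = 5
    EMOTION = 6

# Scene semantics sit above object semantics in the hierarchy.
print(SemanticLevel.SCENE > SemanticLevel.OBJECT)  # -> True
```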


The so-called "semantic gap" is regarded to exist between the first and second of these levels, i.e., in mapping image VFs to object semantics (OS).

The major problem lies in the extraction of image semantics. Although image semantic extraction is difficult, possible solutions exist. State-of-the-art techniques fall into the following five categories: 1) object ontology; 2) machine learning; 3) relevance feedback (RF); 4) semantic templates; and 5) web-page context analysis. SBRSIR mainly falls into categories 1), 2), and 3).

Object-ontology-based methods quantize image VFs, such as color and texture, to form a vocabulary, with each interval corresponding to an intermediate-level descriptor of images. Images are classified by mapping such descriptors into high-level semantics (keywords) [15], [16].

Machine learning, which generally includes supervised and unsupervised learning, seems a more popular way of extracting RS image semantics. Supervised learning methods, such as decision trees, Bayesian classification, neural networks, and support vector machines (SVMs), are commonly used for learning high-level concepts, e.g., OS, from low-level VFs. Unsupervised learning tends to group image data into clusters by maximizing the inner similarity of a cluster and minimizing the similarity among clusters. VFs are first clustered, and then the clusters are mapped to some concepts (e.g., some categories). The mapping rules can then be used to index unlabeled images.

RF learns semantics on demand. A typical RF-based SBIR system includes the following steps [17]: 1) the system offers an initial retrieval set by some query method; 2) a user selects the most relevant/irrelevant images; and 3) the system learns the "semantics" of the selection and offers a finer retrieval. Steps 2) and 3) can be repeated until satisfactory results are obtained.

Semantic utilization generally includes two types of methods. The first extracts semantics from both the query templates and the images in databases and then performs semantics-based similarity matching. The second instead matches a user's queries with some predefined semantic templates. Users are generally required to offer some concept sketches as system inputs, and the similarity matching process is transferred to common CBIR systems [18].

B. Typical Cases

Many studies have been dedicated to feature or OS extraction and application in SBRSIR. For example, Shyu et al. [19] and Scott et al. [20] proposed an RSIR system named GeoIRIS that could implement content-based shape retrieval of objects from a large-scale satellite imagery database. Durbha and King [21] proposed a retrieval framework based on a concept-based model by using domain-dependent ontologies. Their framework obtained image semantics by employing image segmentation, primitive descriptor calculation, and object ontology learning via SVMs. Li and Narayanan [22] proposed an integrated method to retrieve the spectral and spatial patterns of remotely sensed imagery. In their method, land cover information that corresponds to spectral characteristics was identified using SVM classification, whereas textural features that characterize spatial information were extracted using Gabor filters. Durbha and King [23] proposed I3KR, a semantic-enabled image knowledge retrieval system for the exploration of distributed RS image archives. The OS were obtained using image segmentation, primitive feature extraction, unsupervised learning, and SVM classification. Sun et al. [24] designed and implemented an SBRSIR prototype system in a grid environment by using ontology and grid techniques. Tobin et al. [25] proposed a method for indexing and retrieving high-resolution image regions in large geospatial data libraries. The steps of their method include feature extraction from the segments of tessellated images, region merging, indexing, and retrieval in a query-by-example environment. Li and Bretschneider [26] proposed an approach using a context-sensitive Bayesian network for the semantic inference of segmented scenes. The RS-related semantics of the regions were inferred in a multistage process based on the spectral and textural characteristics of the regions and on the semantics of the adjacent regions. Blanchart and Datcu [27] proposed a semisupervised method for the autoannotation of satellite image databases and for the discovery of unknown semantic image classes in such databases. They used latent variable models to map low-level VFs to high-level image semantics and showed that the use of unlabeled data could generate reliable estimates of the model parameters. Bratasanu et al. [28] proposed the use of the latent Dirichlet allocation model for mapping low-level features of clusters and segments to high-level map labels for RS image annotation and mapping. Furthermore, Ferecatu and Boujemaa [29] and Li and Bretschneider [26] proposed methods using RF techniques to support interactive RSIR and effectively alleviate the "semantic gap" problem.

On the other hand, many studies have focused on applying higher level spatial semantics, such as the spatial relationship or SS, in SBRSIR. For example, El-Kwae and Kabuka [30] proposed an algorithm named SIM_DTC, in which the similarity between two images was a weighted function of the number of their common objects and the closeness of the directional and topological spatial relationships between object pairs in both images. Li and Fonseca [31] proposed a comprehensive model named Topology–Direction–Distance for qualitative spatial similarity assessment. They applied different priorities to topology, direction, and distance similarities, and they considered both commonality and difference in similarity assessments. In their report, they addressed only simple scene matching issues with two objects. Aksoy [32] modeled RS image content using attributed relational graphs (ARGs) and regarded image retrieval as a problem of relational matching and subgraph isomorphism. The "editing distance," defined as the minimum cost taken over all sequences of operations that transform one ARG into another, was used as the similarity measure. Kalaycilar et al. [33] proposed the retrieval of images by using spatial relationship histograms. These histograms were constructed by classifying the regions in an image, computing the topological and distance-based spatial relationships between these regions, and counting the number of times that different groups of regions were observed in the image.

Semantics hidden in RS images have complex representations. RS image analysis is generally a high-level spatial analysis that is more complicated than ordinary image processing, and spatial semantic utilization in SBRSIR is more complicated than that in common SBIR. Although much progress has been achieved, SBRSIR remains a young research field. Operative spatial semantic description, extraction, and matching methods for SBRSIR remain open topics.
Fig. 1. Technical framework.

III. METHODOLOGY

A. Technical Framework

This paper utilizes spatial object, relationship, and SS in the proposed SBRSIR system. Behavior and emotion semantics are not considered because of their remoteness from RSIR. The proposed semantic extraction scheme includes image decomposition, segmentation, object-based classification, spatial relationship reasoning, and scene modeling, as shown in Fig. 1. An image is first decomposed into a number of blocks according to a quintree structure. Then, a multiresolution image segmentation method divides the image blocks into several parcels. The VFs of the parcels, including color and texture, are extracted and stored. The OS map is obtained by applying object-based image classification to the blocks. The area, orientation, and topology semantic features are calculated and stored for the classified parcels. An SS model is then built on the semantics-labeled parcels. Different types of image retrieval can then be conducted based on image visual, object, spatial relationship, and SS. This paper primarily discusses SS-based retrieval because it is arguably the most advanced retrieval scheme for RS images.

B. Image Decomposing

RSIR often involves matching a query template with some parts of a whole-scene RS image. A large image generally needs to be decomposed into several blocks suitable for retrieval. Without decomposition, image features require online extraction, and CBRSIR systems may be inefficient when facing voluminous data. Block-oriented image decomposition is a practical solution [34], [35]. Chessboard decomposition is commonly used, but it has two shortcomings. First, only areas of the same size as the blocks can be queried. Second, many areas located across neighboring blocks cannot be retrieved. An image may be decomposed in a multiscale manner, e.g., by quadtree decomposition, to overcome the first shortcoming. Some additional blocks may be added to the quadtree, which yields quintree and nonatree decomposition, among others, to address the second shortcoming [34]. Although these additional blocks improve retrieval accuracy, they also cause redundant storage and lower query efficiency. Thus, a decomposition strategy that balances retrieval accuracy and query efficiency well is necessary.

The redundancy is measured using the block cover ratio. As illustrated in Fig. 2(b), given a query image M as large as four subblocks derived from the quadtree decomposition, the cover ratio of M is the ratio of its maximum overlap area to these blocks. Since block 1 mostly overlaps M, the cover ratio of M is given by (L − x)(L − y)/(L × L) × 100%. Given an arbitrary query image of the same size as the blocks, the cover ratio of the quadtree decomposition ranges from 25% to 100%, with an average ratio of 56.25% [35].

Another commonly used image decomposition method is quintree decomposition. A quintree is a hierarchical structure that decomposes an image into five subblocks of the same size, as shown in Fig. 2. The middle block does not need to be decomposed further because its upper left, upper right, lower left, and lower right subblocks overlap with some same-level subblocks derived from its neighbors. The decomposition is implemented recursively until the minimum subblock size reaches a certain threshold. After the first-level decomposition, the minimum, maximum, and average cover ratios of the quintree are 50%, 100%, and 68.75%, respectively. A higher cover ratio improves retrieval accuracy but causes redundant block storage and lowers query efficiency.
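The quintree block generation described above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation; the function and parameter names are our own:

```python
def quintree_blocks(x, y, size, min_size):
    """Recursively decompose a square block into five same-size subblocks:
    four corner quadrants plus a centered middle block. Returns a list of
    (x, y, size) tuples; recursion stops when subblocks would fall below
    min_size."""
    blocks = [(x, y, size)]
    if size // 2 < min_size:
        return blocks
    half = size // 2
    # Four corner quadrants, as in a quadtree.
    for cx, cy in [(x, y), (x + half, y), (x, y + half), (x + half, y + half)]:
        blocks += quintree_blocks(cx, cy, half, min_size)
    # Extra centered middle block; it is stored but never decomposed further,
    # because its own quadrants coincide with subblocks of its neighbors.
    blocks.append((x + size // 4, y + size // 4, half))
    return blocks

# A 512-pixel image with 256-pixel minimum blocks yields one root,
# four quadrants, and one middle block.
print(len(quintree_blocks(0, 0, 512, 256)))  # -> 6
```

After one level, each parent holds five children instead of a quadtree's four, which is the source of the higher average cover ratio (68.75% versus 56.25%) at the cost of 25% more stored blocks per level.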
Fig. 2. Image decomposing. (a) Quintree decomposing. (b) Overlapping area of quadtree decomposing.

Quintree decomposition is adopted in this paper because of its fair balance between retrieval accuracy and efficiency.

C. Object Extraction and Classification

In this paper, an object-based image classification method is used for OS extraction. Object-based image classification first segments an image into parcels, extracts features, and then classifies the image [36], [37]. A parcel is a group of pixels that forms an "object" rather than a single pixel; thus, such a classification scheme is called object-based image classification. Compared with pixel-based image analysis, object-based image analysis offers more abundant features for image interpretation [38]; e.g., the shape and spatial relationship features of the parcels may also be utilized for image classification. In addition, it removes pepper noise more robustly, providing more comprehensible interpretation results for medium- and high-spatial-resolution RS images [39]. Object-based image analysis has been successfully applied in many applications, including land cover classification and change detection using high-spatial-resolution RS images [40]–[42].

Image segmentation, which divides an image into homogeneous parcels to facilitate subsequent analyses, is the first and most important step of object-based image analysis. Wang [43] designed and implemented a multiresolution image segmentation method that can produce multiresolution homogeneous parcels. This segmentation method is conducted in two main steps. Several initial small parcels are first obtained using the rainfall watershed segmentation method. Then, fast region merging is conducted to merge the small parcels in a hierarchical manner. Merging is controlled by a scale parameter: the merging process stops when the minimal parcel merging cost exceeds the scale parameter. Multiresolution segmentation can be implemented by setting different scale parameters; smaller scales mean less allowed merging cost, which creates smaller parcels, and vice versa.

After image segmentation, the parcels are vectorized, and their features are calculated and stored. The commonly used parcel-based features are as follows: 1) spectral features, such as the parcel spectral mean and standard deviation values; 2) shape features, such as the parcel area, perimeter, main direction, and length/width ratio; and 3) texture features, such as GLCM and Gabor textures. In this paper, an SVM [44], [45] is used to classify the parcels based on their spectral features. For a nonlinear classification problem, the spectral mean values of these parcels are mapped into a high-dimensional feature space by using a certain kernel function. An optimal hyperplane is created in the high-dimensional space, and the classification is implemented to maximize the margin, which is defined as the distance of the closest vectors in certain classes to the hyperplane. The function for calculating the best hyperplane is defined as

$$Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \,(x_i \cdot x_j) \tag{1}$$

where $n$ is the number of samples, $y$ is the class label, $\alpha_i$ is the Lagrange multiplier, and $x$ is the sample. Accordingly, the decision function is defined as

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right) \tag{2}$$

where sgn is the sign function, $b$ is the threshold value of classification, and $K$ is the kernel function. Vapnik et al. [44], [45] examined SVMs in detail.

D. SS Modeling

This paper first introduces the "conceptual neighborhoods" model [46], [47] for the simple scene modeling of only two regular objects. However, the "conceptual neighborhoods" model is too limited to model complex spatial scenes, which commonly comprise many irregular objects. Thus, this paper proposes a new orientation model that can adapt to irregular objects. Moreover, complex spatial scenes are modeled using ARGs.

1) "Conceptual Neighborhood" Model: The "conceptual neighborhood" model comprehensively models spatial relationships, including topology, orientation, and approximate distances between a pair of objects. Each node in the graph denotes a scene of two regular objects, as shown in Fig. 3. Two nodes are called "conceptual neighborhoods" and are linked with an edge if they are regarded as the most similar to each other. For example, scene 1 has two objects with a disjoint topology and a north-to-south orientation relationship. Scene 1 thus has the three "conceptual neighborhoods" 2, 16, and 17 because only one step is needed to convert it into any of these three scenes.

The dissimilarity among simple scenes can be measured conveniently using the "conceptual neighborhood" model. A scene can gradually change into another scene through modifications of its topology, distance, and orientation relationships. More modifications indicate less similarity.
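Measured this way, scene dissimilarity is simply a shortest-path length in the "conceptual neighborhood" graph, which a breadth-first search computes. The sketch below is ours, and the adjacency list is only a hypothetical fragment of the graph in Fig. 3, not the complete model:

```python
from collections import deque

def scene_dissimilarity(graph, a, b):
    """Minimal number of one-step "gradual changes" between scenes a and b,
    i.e., the shortest-path length between the corresponding nodes of the
    "conceptual neighborhood" graph (breadth-first search)."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return dist[node]
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return None  # not mutually convertible in this graph fragment

# Hypothetical fragment of the neighborhood graph (undirected edges listed
# both ways); NOT the complete scene graph of Fig. 3.
fragment = {41: [33], 33: [41, 32], 32: [33, 16], 16: [32]}
print(scene_dissimilarity(fragment, 41, 16))  # -> 3
```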
For example, scene 41 in Fig. 3 requires at least three steps to change into scene 16 (e.g., 41 → 33 → 32 → 16), indicating that the two scenes have a dissimilarity of 3. Any two scenes in the "conceptual neighborhood" model are at most six steps of mutual conversion apart.

In principle, the above scene similarity evaluation process is quite simple. However, similarity evaluation can become very complicated when scenes become complex and comprise multiple objects. In addition, object category and shape features can influence similarity evaluation by humans, but they are not considered in the "conceptual neighborhood" model.

Fig. 3. "Conceptual neighborhood" model.

2) Section-Based Orientation Model: The "conceptual neighborhood" model can effectively model topology and orientation relationships for simple scenes that comprise two regular objects, e.g., the rectangles in Fig. 3, and it also offers an objective similarity measure for such scenes. However, it is very limited for general spatial scene modeling. For regular objects, orientation relationships can be appropriately calculated from their centroids. Spatial scenes, however, are often composed of objects with irregular shapes, which sometimes makes centroid-based orientation computation unreasonable. Fig. 4(a) illustrates a common spatial scene that consists of two irregular objects with an intuitive "east-northeast" orientation relationship. The centroid-based orientation model, however, will report an east-to-west orientation because the centroid of object B is located approximately at the dotted position.

Thus, a section-centroid-based orientation model is proposed in this paper to overcome this shortcoming. A nine-section orientation matrix is constructed around the smaller object A by using its minimum bounding rectangle, as shown in Fig. 4(a). The section centroid of object B is (1 + 2 + 3 + 4)/4 = 2.5 because object B bestrides sections 1 to 4. This position denotes the orientation NEE because it lies between sections 2 (northeast) and 3 (east). The principle is quite simple, although some cases require particular treatment. For example, when object B bestrides sections 0, 1, and 7, index 7 should be converted to −1 to make the index array contiguous. The computation then becomes (−1 + 0 + 1)/3 = 0, denoting the correct northwest orientation. In addition, when object B covers sections 0 to 7, B fully contains A, and further computation is no longer necessary, as shown in Fig. 4(c).

Fig. 4. Section-centroid-based orientation model.

The section-centroid-based model precisely denotes all 16 orientations in the "conceptual neighborhood" model. Moreover, the model fits our intuition; thus, it is used as the orientation relationship descriptor for complex shapes in the scene model of this paper.

3) Scene Modeling Using ARGs: A classification map is converted into an ARG. An ARG is represented as G = (V, E), where V is the set of nodes (objects) and E represents the edges (relationships among nodes). The nodes and edges can be labeled with different kinds of attributes. In the proposed ARG model, the nodes are labeled with the object areas and categories, and the edges are labeled with the scene IDs in the "conceptual neighborhood" model.

In addition to the orientation relationships, the topology relationships between any two parcels also need to be obtained to label the edges. Parcels in a classification map only have "containing," "touching," and "disjointing" topology relationships, to the exclusion of "intersecting" and "overlapping." Thus, scenes 33 to 40 do not exist in the proposed scene model, and two parcels with a "containing" topology relationship are regarded as scene 41 (overlapping).

In the proposed scene model, a scene with object A in the north and object B in the south and a scene with A in the south and B in the north are considered two different scenes. If A and B are disjoint in both scenes, the distance between the two scenes is six steps (the maximum step in Fig. 3). To facilitate scene matching, the classification categories are first arranged in order, e.g., A, B, C, and D, without loss of generality. The directed edges in an ARG always point from objects of front-position categories to objects of back-position categories. For example, A → B (PosID = 26) indicates that an object labeled B is to the southeast of an object labeled A and that they have a "touching" topology relationship. Thus, their unique scene ID is 26.

A same-category case requires particular treatment. For example, if two objects of the same category have a north-to-south orientation and a "disjointing" topology, the scene ID may be 1 or 9, which is not unique. Such scenes are labeled with the smaller ID. In this manner, the simple scenes composed of any two parcels in a classification map have unique scene IDs, and the ARG can then be generated uniquely. Fig. 5 illustrates a typical spatial scene with four different objects and its ARG.

E. Scene Matching Model

The proposed solution is as follows. Given a pair of query and target scenes, the target scene needs to fully "contain" the query scene, i.e., the number of objects of each category in the target scene must not be less than that in the query scene.
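This containment precheck amounts to a per-category count comparison. A minimal sketch (our own helper names, not the paper's implementation):

```python
from collections import Counter

def contains(target_categories, query_categories):
    """True if, for every category, the target scene has at least as many
    objects as the query scene (the containment constraint above)."""
    target = Counter(target_categories)
    query = Counter(query_categories)
    return all(target[cat] >= n for cat, n in query.items())

# A target with two A parcels, one B, and one C "contains" a query scene
# of one A and one B, but not a query scene that needs two Bs.
print(contains(["A", "A", "B", "C"], ["A", "B"]))  # -> True
print(contains(["A", "A", "B", "C"], ["B", "B"]))  # -> False
```

Only targets passing this cheap test proceed to ARG construction and the more expensive relationship matching.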
This constraint controls the adaptation of the retrieval size to practical applications. Then, the query and target scenes are converted into ARGs. The additional categories in the target scene that do not exist in the query scene are excluded from the former's ARG. Consequently, the multiple spatial relationships within the query ARG are decomposed into mutual relationships between any two objects, and their similarities with the corresponding pairs in the target ARG are then evaluated and aggregated to obtain the overall scene similarity.

Fig. 5. (a) Scene and (b) its ARG.

The proposed scene matching model is formulated as follows. Let $P$ and $Q$ be two ARGs corresponding to the query and target scenes, which are composed of the same categories and include $p$ and $q$ objects, respectively. The number of objects in $Q$ for each category must not be less than that in $P$. $F(\cdot)$ is defined as an injective function that maps nodes in $P$ to nodes in $Q$ with the same categories. For example, $F(i)$ and $F(j)$, which can also be written as $V_P^i \to V_Q^m$ and $V_P^j \to V_Q^n$, are the mappings of nodes $i$ and $j$ in $P$ to nodes $m$ and $n$ in $Q$, respectively, where nodes $i$ and $m$, and nodes $j$ and $n$, have the same class labels. The spatial relationship similarity between $P$ and $Q$ under $F(\cdot)$ is defined as

$$\mathrm{SR}_F(P, Q) = \sum_{i,j \in [1,p],\, i \neq j} \mathrm{SE}\bigl(F(i), F(j)\bigr) \tag{3}$$

where $\mathrm{SE}(\cdot)$ denotes the similarity between $E_P^{i,j}$ and $E_Q^{m,n}$, defined as six minus the minimal number of "gradual change" steps needed to convert the scene composed of objects $i$ and $j$ into the scene composed of objects $m$ and $n$. After summation, $\mathrm{SR}(\cdot)$ represents the total spatial relationship similarity between $P$ and $Q$.

The measure $\mathrm{SR}$ increases without bound as the number of nodes in $P$ grows. Thus, it is normalized into $[0, 1]$ as

$$\mathrm{NSR}_F(P, Q) = \mathrm{SR}_F(P, Q) \,/\, \bigl(6 \cdot \mathrm{card}(E_P)\bigr) \tag{4}$$

where $\mathrm{card}(\cdot)$ is the number of edges in $P$.

The corresponding OS similarity between $P$ and $Q$ is defined as

$$\mathrm{SO}_F(P, Q) = \sum_{i \in [1,p]} \mathrm{SN}\bigl(F(i)\bigr) \tag{5}$$

where $\mathrm{SN}(\cdot)$ denotes the similarity between $V_P^i$ and $V_Q^m$, defined as the areal similarity of the two nodes. After summation, $\mathrm{SO}(\cdot)$ represents the total OS similarity between $P$ and $Q$. It is also normalized into $[0, 1]$ by using the histogram intersection method:

$$\mathrm{NSO}(P, Q) = \frac{\sum_{i=1}^{L} \min\bigl[H_P(i), H_Q(i)\bigr]}{\sum_{i=1}^{L} H_P(i)} \tag{6}$$

where $L$ is the number of matched objects and $H(\cdot)$ is the relative area of an object.

The maximized weighted sum of NSR and NSO is an intuitive definition of scene similarity. However, the weighting is difficult because NSR and NSO exhibit significant differences from each other. Thus, to avoid this problem, the scene similarity $\mathrm{SS}(P, Q)$ is directly defined as the maximized $\mathrm{NSR}_F(P, Q)$:

$$\mathrm{SS}(P, Q) = \max_F \bigl\{ \mathrm{NSR}_F(P, Q) \bigr\} \tag{7}$$

Although (7) indicates that the scene matching primarily relies on spatial relationship similarity, the influence of object area is not neglected. In the proposed retrieval prototype system, the images are first retrieved and sorted by their scene similarities to the sample. Secondary sorting is then implemented using the OS similarity for scenes with the same scene similarity. This solution reduces the algorithm inputs and improves the stability of the system performance.

TABLE I
MATCHING SCHEME

Fig. 6. Stratified digraph.

F. Matching Process

The matching includes the following five steps.
1) The categories, areas, and spatial relationships of all the objects in the query scene are obtained, and its ARG P is constructed.
2) All the scenes containing all the categories in the query scene P are retrieved from the image database, and their ARGs {Q} are constructed.
3) SS(P, Q) is obtained using (3), (4), and (7) between P and each Q by traversing all the possible matching patterns.
2880 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 51, NO. 5, MAY 2013

Fig. 7. Query process.

4) {Q} is sorted in descending order of SS(P, Q). A secondary sort by NSO(P, Q) is conducted for scenes with the same scene similarity.
5) The sorted images are returned to the user, and the retrieval is finished.

In step 3), a stratified digraph structure suitable for scene matching with unfixed numbers of categories and objects is designed. Table I and Fig. 6 present a typical case with four categories and six objects. As shown in Table I, columns 2 and 3 list the object IDs of each category, and column 4 lists all the possible matching patterns of each category. All the scene-level matching patterns are obtained by selecting and combining the category-level matching patterns. For example, 1↔2, 2↔5, 3↔1, 4↔6, 5↔3, and 6↔4 and 1↔2, 2↔5, 3↔6, 4↔1, 5↔3, and 6↔4 are two typical scene-level matching patterns.

To facilitate the scene-level matching process, the stratified digraph is composed of nodes that represent the category-level matching patterns. Each layer in the digraph represents the possible matching patterns of one category. One-way pointers link the nodes within the same layer, and the last node of each layer points to the first node of the next layer. From top to bottom, the method visits and picks a single node from each layer by using the depth-first traversal algorithm and then gathers all scene-level matching patterns to determine the best one.

G. Query Process

In this paper, a two-step coarse-to-fine query scheme is designed to apply the proposed scene-matching model to SBRSIR. The proposed scheme includes two substages, namely, rough retrieval by using image OS and fine retrieval by using SS matching, as shown in Fig. 7. A user first issues a query by selecting a number of semantic categories of interest. The image blocks that contain all the queried classes are retrieved as the first "rough" retrieval set. Subsequent queries are limited to this first set, which reduces the subsequent search scope and improves query efficiency. Then, the user browses and selects an image block of interest as the query template and further selects some parcels of interest by mouse-clicking on the classification map of the query image. These parcels form the query scene. The matching algorithm is thus induced to focus on "important" objects and to omit trivial ones, improving feedback efficiency.

In the second step, the scene similarities between the query scene and each candidate image in the rough retrieval set are calculated. A candidate image is marked as eligible and returned if its total similarity is higher than a certain threshold. The fine retrieval sorts the rough retrieval set and returns a queue in descending order of similarity; the threshold is used to truncate the queue and accelerate feedback.

IV. EXPERIMENTAL ANALYSIS

A. Experimental Setting

The prototype system was implemented using a Microsoft development environment, and the database system was implemented using Microsoft SQL Server 2005. The testing platform had an Intel(R) Core(TM) 2 Quad 8200 2.33-GHz CPU with 2 GB of memory. Several RS multispectral images collected from 2000 to 2011 were used as testing data, including three SPOT-5 scenes, five GeoEye scenes, six ALOS scenes, and two TM scenes. A mixed data source was used to test the adaptability of the method. The imaging bands of SPOT-5 were 0.5–0.59 (green to yellow), 0.61–0.68 (red),
TABLE II
CLASSIFICATION ACCURACY OF THE FIRST-LEVEL CLASSES

TABLE III
CLASSIFICATION ACCURACY OF THE SECOND-LEVEL CLASSES

0.78–0.87 [near-infrared (NIR)], and 1.58–1.75 μm [shortwave infrared (SWIR)]. SPOT-5 does not have a blue band, which renders true-color composition difficult. The ALOS imagery has bands of 0.42–0.5 (blue to cyan), 0.52–0.6 (green to yellow), 0.61–0.69 (red), and 0.76–0.89 μm (NIR). The GeoEye imagery has bands of 0.45–0.51 (blue), 0.51–0.58 (green), 0.655–0.69 (red), and 0.78–0.92 μm (NIR), with 2-m resolution. Six bands of TM imagery, including 0.45–0.52 (blue), 0.52–0.60 (green), 0.63–0.69 (red), 0.76–0.90 (NIR), 1.55–1.75 (SWIR), and 2.08–2.35 μm (SWIR), were used with 30-m resolution. The ALOS and GeoEye imageries were displayed with true-color composition (bands 3, 2, and 1), and the SPOT-5 and TM imageries were displayed with false-color composition (bands 3, 2, and 1 for SPOT-5 and bands 4, 3, and 2 for TM), in which vegetation appears red. These color schemes helped distinguish the kinds of images in the illustrations.

To facilitate processing, the images were split into 580 separate 1024 × 1024 subimages. Each original image was processed using the principal component analysis (PCA) transform and then decomposed into quintree subblocks. In this paper, the blocks larger than 512 × 512 were termed "query blocks." Only these blocks participated in visual and semantic similarity matching and were returned in a query. This scheme prevented trivial retrieval results because too small geographical areas were of no value in CBRSIR. In addition, the query template and the candidates could have different sizes, which helped identify similar geographical configurations at different observation scales. The database essentially stored 580 separate 1024 × 1024 first-level blocks and 2900 separate 512 × 512 second-level blocks; a total of 3480 subblocks composed the query blocks. These blocks were iteratively decomposed into final 16 × 16 feature blocks, and their VFs were calculated and stored.

The object-based classification was conducted on the query blocks, which were first segmented into parcels. The experimental settings of the segmentation were as follows: scale of 10; weights of 0.9 and 0.1 for color and shape, respectively; and weight of 0.5 for both smoothness and compactness. Wang [43] discussed these parameters in depth.

In the experiments, the parcels were classified by an SVM classifier using the Gaussian radial basis function (RBF) kernel. The spectral features used in image classification were all four bands of the SPOT-5, ALOS, and GeoEye imagery and the six bands of the TM imagery mentioned above. The Gaussian RBF kernel had a penalty factor C of 25 and a kernel width parameter σ of 40. These parameters were held constant in every classification, whereas the training samples for the SVM classifier were chosen separately for every classified image.

The selection of classification granularity is important in image classification. Very fine granularity occasionally renders the choice of representative training samples difficult, further causing low-quality classification due to spectral confusion. On the other hand, excessively coarse granularity is meaningless to SBRSIR. Thus, to ensure the applicability of the classification framework in this paper, an image was classified into water body, urban and built-up area, woodland, cropland and grassland, and bare land and others. The classification granularity was selected to simulate a nonexpert's visual interpretation of RS images. The simulation was easy to implement, and accuracy could be guaranteed.

The second-level classification was implemented using shape analysis on the parcels. After image segmentation, river and lake parcels generally had distinguishable shape features: river parcels generally had a high length/width ratio and a low rectangle ratio, in contrast to those of lake parcels. Water bodies were then divided into areal water bodies (lakes and seas) and linear water bodies (rivers) by setting thresholds on their shape features. In this paper, water parcels with length/width ratios higher than 3 and rectangle ratios lower than 0.3 were classified as rivers, and the rest as lakes and seas. Errors were reduced or removed via manual editing. The object, topology, and orientation semantics of the query blocks were obtained by overlaying the blocks with the corresponding classification maps. These semantic features were stored for future use.

The system required two steps, i.e., rough retrieval and fine retrieval, to fetch the images of interest. The rough retrieval collected the images containing the specified semantic categories, and the fine retrieval conducted scene (ARG) matching. The time complexity of the matching was O(MN), where M and N are
the edge numbers in the source and target ARGs, respectively. If all objects in the target scenes participate in the matching, the retrieval process becomes overburdened because spatial scenes often include a large number of objects. To accelerate the matching and improve query efficiency, only the large parcels in a target scene were allowed to participate. A minimal area threshold of 0.05 was applied so that only parcels larger than 5% of the entire image participated in the scene matching.
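The area-threshold filtering described above can be sketched as follows; the parcel records and pixel counts are hypothetical illustrations.

```python
# Keep only parcels whose area exceeds a given fraction of the image
# (5% here) before they are allowed into the scene matching.

def filter_parcels(parcels, image_area, min_fraction=0.05):
    """parcels: list of (parcel_id, area_in_pixels) tuples."""
    return [(pid, a) for pid, a in parcels if a / image_area > min_fraction]

# Hypothetical 1024 x 1024 query block with four parcels.
image_area = 1024 * 1024
parcels = [(1, 90000), (2, 30000), (3, 260000), (4, 40000)]
kept = filter_parcels(parcels, image_area)
print([pid for pid, _ in kept])  # parcels 1 and 3 pass the 5% cut
```

Filtering before matching shrinks both ARGs, which directly reduces the O(MN) edge-matching cost noted above.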

B. Accuracy Evaluation Schemes

1) Rough Retrieval: Rough retrieval returns an image based on whether or not it contains the selected semantic categories. Thus, the recall, precision, and F-measure were used to evaluate the accuracy of the rough retrieval as follows:

    Recall = TP / (TP + FN)
    Precision = TP / (TP + FP)
    F-measure = (2 × Precision × Recall) / (Precision + Recall)    (8)

where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives. The F-measure combines precision and recall as the harmonic mean of the two measures.
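The measures in (8) can be computed directly from the confusion counts; the counts below are hypothetical.

```python
def rough_retrieval_scores(tp, fp, fn):
    """Recall, precision, and F-measure as in (8)."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure

# Hypothetical counts: 40 true positives, 10 false positives, 10 false negatives.
r, p, f = rough_retrieval_scores(40, 10, 10)
print(r, p, f)  # approximately 0.8, 0.8, 0.8
```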
Fig. 8. Classification on a SPOT-5 image. (a) Source data. (b) Image segmentation. (c) First-level classification. (d) Results after postprocessing.

TABLE IV
ACCURACY OF THE ROUGH RETRIEVAL. "L AND S" INDICATES LAKE AND SEA, "U AND B" INDICATES URBAN AND BUILT-UP AREA, "R" INDICATES RIVER, "C AND G" INDICATES CROPLAND AND GRASSLAND, AND "W" INDICATES WOODLAND

2) Fine Retrieval: The fine image retrieval is not used for evaluating whether some objects exist but rather for scene similarity matching; thus, the measures of recall and precision are not suitable in this stage. Instead, the feedback sequence order is crucial, particularly when querying practical voluminous RS image databases. Given a rough retrieval set and a query template, a user picks the sequence with the top similarities to the template. The machine sequence was compared with the human sequence by using the tau index [48]

    H = (V1 − V2) / V    (9)

where V1 is the number of correctly ordered pairs, V2 is the number of wrongly ordered pairs, and V is the total number of possible ordered pairs. A higher tau index value indicates a greater similarity between the two sequences. In this paper, the sequence length was limited to six, which is necessary for the tau index calculation.
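The tau index in (9) can be sketched by counting concordant (V1) and discordant (V2) item pairs between the machine and human rankings; the two six-item sequences below are hypothetical.

```python
from itertools import combinations

def tau_index(machine, human):
    """Tau index as in (9): (V1 - V2) / V over all item pairs.

    Both arguments are rankings (best first) of the same set of items.
    """
    pos_m = {item: i for i, item in enumerate(machine)}
    pos_h = {item: i for i, item in enumerate(human)}
    v1 = v2 = 0
    for a, b in combinations(machine, 2):
        # The pair is concordant if both rankings order a and b the same way.
        if (pos_m[a] - pos_m[b]) * (pos_h[a] - pos_h[b]) > 0:
            v1 += 1
        else:
            v2 += 1
    return (v1 - v2) / (v1 + v2)

# Hypothetical six-image sequences (length six, as used in the paper).
machine = ['A', 'B', 'C', 'D', 'E', 'F']
human   = ['A', 'B', 'D', 'C', 'E', 'F']
print(tau_index(machine, human))  # 14 concordant, 1 discordant: 13/15
```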

C. Performance Analysis

1) Image Classification: Approximately 92% first-level classification accuracy was achieved using parcel-based SVM classification, whereas approximately 90.5% second-level classification accuracy was achieved using parcel shape analysis and manual editing on the testing data set. Tables II and III list the average sample size (the ratio of the number of sample parcels to the total number of parcels in the classified images), the classification accuracy of each class in each kind of imagery, and the average classification accuracy. Fig. 8 shows a typical classification case on a SPOT-5 image.

Fig. 9. Method comparison.

2) Rough Retrieval: The rough retrieval accuracy was investigated by averaging over five arbitrarily selected category-combination schemes with different numbers of categories. The results are listed in Table IV. The average accuracy was found to be higher than that of the classification. This result is reasonable because objects within the same category may be distributed widely within a retrieved image; this paper only verified whether the category was contained in the image.

3) Fine Retrieval: In this stage, an image from the rough retrieval set was picked as the query template, some
Fig. 10. Retrieval by a two-object scene. The first row represents the original image, the classification map, and the query scene. The second to fourth rows
show the eight most similar images returned by the SS, VF, and OS methods. These images were labeled with image types and similarity measures. “S” indicates
SPOT-5, “G” indicates GeoEye, “A” indicates ALOS, and “T” indicates TM.

"important" objects (large classified parcels) were selected to compose the query scene, and the SS-based fine retrieval was performed (SS method). Two different retrieval schemes, namely, retrieval via VFs (VF method), including color (second and third color moments) and texture (2-D Gabor coefficients), and retrieval via OS (OS method), were also implemented for method comparison. The query process of the two contrast methods coincided with that of the SS method. First, the same rough retrieval was conducted. Then, the fine retrievals were conducted by comparing the VF histograms (VF method) or the object area histograms (OS method) of the query template with those of the roughly retrieved images. The fine retrieval sorted the rough retrieval set by similarity and returned a queue in descending order. In all three methods, a candidate image was returned if its total similarity to the template exceeded 0.6. This threshold truncated the queue and accelerated system feedback.

The above query was repeated ten times for the three methods, and their accuracy values were evaluated. Fig. 9 shows the tau curves of the ten queries with different numbers of scene objects. The SS method had the highest tau mean and the lowest tau standard deviation, indicating that the machine sequences returned by the SS method were stably the closest to the human sequences. The other two methods had distinctly lower tau means, and their tau curves fluctuated more intensely. Therefore, the SS method proved superior to the other two methods.

Figs. 10–12 show three retrieval cases. In each case, the original image, classification map, and query scene (nonwhite parcels) are arranged in the first row. The images retrieved using the SS, VF, and OS methods are arranged in three rows from top to bottom. The eight most similar images retrieved by each method are arranged with descending similarities to the query template from left to right. The similarity values are labeled under the images. The SS method has two similarity measures, namely, scene similarity SS and OS similarity SO, whereas the other two methods only have one measure each.

In the first case, the query scene was composed of two objects with the categories "woodland" and "lake and sea." The objects were distributed closely with a northwest-to-southeast orientation relationship. The scenes returned by the SS method were highly close to the template in terms of spatial configuration. Although scenes 4–6 contained visually very different bodies of water, they were returned because their semantics were consistent with the template. On the contrary, scenes 5 and 6 returned by the VF method and scenes 4–8 returned by the OS method were relatively different from the template.

In the second case, the scene included three objects. From northwest to southeast, the objects were parcels of cropland and grassland, river, and urban and built-up area. The SS method retrieved the scenes most similar to the template, whereas the other two methods sometimes returned scenes with relatively different spatial configurations.

The last case comprised four objects in three categories: one woodland parcel, one river parcel, and two urban and built-up parcels distributed on both sides of the river. In this case, the SS method continued to exhibit better performance than the other two methods. In all these cases, the SS method exceeded the VF and OS methods in retrieving semantically similar scenes. In addition, the images retrieved using the SS method were arranged in a more precise order in terms of OS similarity. The scene modeling and matching schemes thus proved effective and superior.
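The sort-and-truncate feedback shared by all three methods can be sketched as follows; the candidate IDs and similarity values are hypothetical.

```python
def rank_candidates(similarities, threshold=0.6):
    """Sort candidates by similarity (descending) and drop those at or
    below the threshold, as in the fine-retrieval feedback step."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return [(cid, s) for cid, s in ranked if s > threshold]

sims = {'img1': 0.91, 'img2': 0.55, 'img3': 0.74, 'img4': 0.62}
print(rank_candidates(sims))  # [('img1', 0.91), ('img3', 0.74), ('img4', 0.62)]
```

Only the similarity measure differs between the SS, VF, and OS methods; the ranking and the 0.6 truncation threshold are common to all three.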
Fig. 11. Retrieval by a three-object scene. The other settings are the same as those in Fig. 10.

Fig. 12. Retrieval by a four-object scene. The other settings are the same as those in Fig. 10.

4) System Efficiency: The efficiency and scalability of the rough retrieval were tested on databases ranging from 500 to 3480 full images. One to five semantic classes were arbitrarily selected, and the average feedback time versus the database volume was recorded. After a rough retrieval, two to four parcels were arbitrarily selected as the sample scene, the fine retrievals were conducted, and their average feedback times were recorded. The total feedback time was the sum of the rough and fine retrieval times. Fig. 13 shows the rough and total feedback times versus the database volume. These curves were almost linear, indicating that the system scalability was high. On the proposed platform, the rough and fine retrievals required an average of 46 and 66 s, respectively, on the entire database. In the current prototype system, spatial
relationship inference and ARG constructions were conducted via real-time computing for flexibility. System efficiency could be improved by developing and utilizing appropriate spatial indices, prestoring these spatial relationship features, and using more efficient matching algorithms.

Fig. 13. Retrieval efficiency.

V. CONCLUSION

This paper offers a novel SBRSIR approach whose main contributions lie in the SS modeling and matching schemes. Multilevel image semantics are first extracted based on VF extraction, object-based image classification, spatial relationship inference, and scene modeling. Then, a novel spatial scene-matching model is designed and extended to an SBRSIR prototype system that is distinctive in its coarse-to-fine two-stage query scheme. Semantic-based retrieval experiments were successfully conducted on high/medium-resolution multispectral RS images with resolutions in the range of 2–30 m. The proposed retrieval scheme can effectively be applied to lower-resolution images, as long as the images are correctly classified. RS images with lower resolutions can commonly be classified using simpler pixel-based classification methods but, perhaps, with coarser classification granularity.

The proposed scheme is most recommended for retrieving 10–30-m RS images because the current classification granularity basically matches these image spatial resolutions and because object-based image classification can be utilized with good accuracy on such images. It is not recommended for very high resolution (VHR) RS image (e.g., <1 m) retrieval. First, the current classification granularity seems coarse for VHR images. Second, fair and stable classification accuracy is difficult to obtain by classifying spectral features alone because of the serious spectral confusion caused by trivial image details, noise, shadow, and the mutual occlusion of ground features in these images. Low classification accuracy will seriously influence semantic matching and retrieval. Despite the limitations above, this paper provides a practical solution for spatial, particularly geographic, scene matching, which has good application prospects in SBRSIR.

More robust classification schemes to refine the classification and retrieval granularity will be investigated in future research. In addition, shape and distance features will be considered in the current scene-matching model. The SBRSIR prototype system is currently template based and needs a user to search for his/her query templates. Thus, sketch-map-based or natural-language-based query schemes will gradually be added to make the SBRSIR prototype system more intelligent and powerful.

ACKNOWLEDGMENT

The authors would like to thank the two anonymous reviewers for their very constructive comments that improved the manuscript significantly.

REFERENCES

[1] R. Datta, D. Joshi, J. Li, and J. Z. Wang, "Image retrieval: Ideas, influences, and trends of the new age," ACM Comput. Surv., vol. 40, no. 2, pp. 1–60, Apr. 2008.
[2] C. Faloutsos, R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic, and W. Equitz, "Efficient and effective querying by image content," J. Intell. Inf. Syst., vol. 3, no. 3/4, pp. 231–262, Jul. 1994.
[3] A. Pentland, R. W. Picard, and S. Sclaroff, "Photobook: Content-based manipulation of image databases," Int. J. Comput. Vis., vol. 18, no. 3, pp. 233–254, Jun. 1996.
[4] J. R. Smith and S. F. Chang, "VisualSEEK: A fully automated content-based image query system," in Proc. 4th ACM Int. Multimedia Conf., 1996, pp. 87–98.
[5] J. Z. Wang, J. Li, and G. Wiederhold, "SIMPLIcity: Semantics-sensitive integrated matching for picture libraries," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 9, pp. 947–963, Sep. 2001.
[6] GazoPa, Feb. 2012. [Online]. Available: http://en.Wikipedia.org/wiki/GazoPa
[7] R. L. Gregory, Eye and Brain: The Psychology of Seeing. Princeton, NJ: Princeton Univ. Press, 1997.
[8] Y. Liu, D. S. Zhang, and G. J. Lu, "Region-based image retrieval with high-level semantics using decision tree learning," Pattern Recognit., vol. 41, no. 8, pp. 2554–2570, Aug. 2008.
[9] Z. Shi, H. Qing, and Z. Z. Shi, "An index and retrieval framework integrating perceptive features and semantics for multimedia databases," Multimedia Tools Appl., vol. 42, no. 2, pp. 207–231, Apr. 2009.
[10] W. Huang, Y. Gao, and K. L. Chan, "A review of region-based image retrieval," J. Signal Process. Syst., vol. 59, no. 2, pp. 143–161, May 2010.
[11] Y. Liu, D. S. Zhang, G. J. Lu, and W. Y. Ma, "A survey of content-based image retrieval with high-level semantics," Pattern Recognit., vol. 40, no. 1, pp. 262–282, Jan. 2007.
[12] H. F. Wang and Z. X. Sun, "The methods of semantics processing in content-based image retrieval," J. Image Graph., vol. 6, no. 10, pp. 945–952, 2001.
[13] N. Wu and F. M. Song, "An image retrieval method based on high-level image semantic information," J. Image Graph., vol. 11, no. 12, pp. 1774–1780, 2006.
[14] C. Colombo, A. D. Bimbo, and P. Pala, "Semantics in visual information retrieval," IEEE Multimedia, vol. 6, no. 3, pp. 38–53, Jul.–Sep. 1999.
[15] V. Mezaris, I. Kompatsiaris, and M. G. Strintzis, "An ontology approach to object-based image retrieval," in Proc. Int. Conf. Image Process., Barcelona, Spain, 2003, pp. 511–514.
[16] M. Obeid, B. Jedynak, and M. Daoudi, "Image indexing & retrieval using intermediate features," in Proc. 9th ACM Int. Conf. Multimedia, Ottawa, ON, Canada, 2001, pp. 531–533.
[17] X. S. Zhou and T. S. Huang, "Relevance feedback in image retrieval: A comprehensive review," Multimedia Syst., vol. 8, no. 6, pp. 536–544, Apr. 2003.
[18] S. F. Cheng, W. Chen, and H. Sundaram, "Semantic visual templates: Linking visual features to semantics," in Proc. Int. Conf. Image Process., Chicago, IL, 1998, pp. 531–535.
[19] C. R. Shyu, M. Klaric, G. J. Scott, A. S. Barb, C. H. Davis, and K. Palaniappan, "GeoIRIS: Geospatial information retrieval and indexing system—Content mining, semantics modeling, and complex queries," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp. 839–852, Apr. 2007.
[20] G. J. Scott, M. N. Klaric, C. H. Davis, and C. R. Shyu, "Entropy-balanced bitmap tree for shape-based object retrieval from large-scale satellite imagery databases," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 5, pp. 1603–1616, May 2011.
[21] S. S. Durbha and R. L. King, "Knowledge mining in earth observation data archives: A domain ontology perspective," in Proc. IEEE Int. Geosci.
Remote Sens. Symp.: Sci. Soc.: Exploring Manag. Changing Planet, Anchorage, AK, 2004, pp. 172–173.
[22] J. Li and R. M. Narayanan, "Integrated spectral and spatial information mining in remote sensing imagery," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 3, pp. 673–685, Mar. 2004.
[23] S. S. Durbha and R. L. King, "Semantics-enabled framework for knowledge discovery from earth observation data archives," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 11, pp. 2563–2572, Nov. 2005.
[24] H. Sun, S. X. Li, W. J. Li, Z. Ming, and S. B. Cai, "Semantic-based retrieval of remote sensing images in a grid environment," IEEE Geosci. Remote Sens. Lett., vol. 2, no. 4, pp. 440–444, Oct. 2005.
[25] K. W. Tobin, B. L. Bhaduri, E. A. Bright, A. Cheriyadat, T. P. Karnowski, P. J. Palathingal, T. E. Potok, and J. R. Price, "Automated feature generation in large-scale geospatial libraries for content-based indexing," Photogramm. Eng. Remote Sens., vol. 72, no. 5, pp. 531–540, May 2006.
[26] Y. Li and T. R. Bretschneider, "Semantic-sensitive satellite image retrieval," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp. 853–860, Apr. 2007.
[27] P. Blanchart and M. Datcu, "A semi-supervised algorithm for auto-annotation and unknown structures discovery in satellite image databases," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 3, no. 4, pp. 698–717, Dec. 2010.
[28] D. Bratasanu, I. Nedelcu, and M. Datcu, "Bridging the semantic gap for satellite image annotation and automatic mapping applications," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 4, no. 1, pp. 193–204, Mar. 2011.
[29] M. Ferecatu and N. Boujemaa, "Interactive remote-sensing image retrieval using active relevance feedback," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp. 818–826, Apr. 2007.
[30] E. A. El-Kwae and M. R. Kabuka, "A robust framework for content-based retrieval by spatial similarity in image databases," ACM Trans. Inf. Syst., vol. 17, no. 2, pp. 174–198, Apr. 1999.
[31] B. Li and F. Fonseca, "TDD: A comprehensive model for qualitative spatial similarity assessment," Spatial Cognit. Comput., vol. 6, no. 1, pp. 31–62, Jan. 2006.
[32] S. Aksoy, "Modeling of remote sensing image content using attributed relational graphs," in Proc. Lecture Notes Comput. Sci., 2006, vol. 4109, pp. 475–483.
[33] F. Kalaycilar, A. Kale, D. Zamalieva, and S. Aksoy, "Mining of remote sensing image archives using spatial relationship histograms," in Proc. IEEE Int. Geosci. Remote Sens. Symp., Boston, MA, 2008, pp. III-589–III-592.
[34] E. Remias, G. Sheikholeslami, and A. D. Zhang, "Block-oriented image decomposition and retrieval in image database systems," in Proc. Int. Workshop Multi-Media Database Manage. Syst., Blue Mountain Lake, NY, 1996, pp. 85–92.
[35] D. Li and X. Ning, "A new image decomposition method for content-based remote sensing image retrieval," Geomatics Inf. Sci. Wuhan Univ., vol. 31, no. 8, pp. 659–662, 2006.
[36] U. Benz, M. Baatz, and G. Schreier, "OSCAR—Object-oriented segmentation and classification of advanced radar allow automated information extraction," in Proc. IEEE Int. Geosci. Remote Sens. Symp., Sydney, Australia, 2001, pp. 1913–1915.
[37] T. Blaschke and G. J. Hay, "Object-oriented image analysis and scale-space: Theory and methods for modeling and evaluating multiscale landscape structure," in Proc. Int. Archives Photogramm. Remote Sens., 2001, vol. 34, pp. 22–29.
[38] H. G. Akçay and S. Aksoy, "Automatic detection of geospatial objects using multiple hierarchical segmentations," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 7, pp. 2097–2111, Jul. 2008.
[39] P. Aplin, P. M. Atkinson, and P. Curran, "Per-field classification of land use using the forthcoming very fine spatial resolution satellite sensors: Problems and potential solutions," in Advances in Remote Sensing and GIS Analysis, P. M. Atkinson and N. J. Tate, Eds. Chichester, U.K.: Wiley, 1999, pp. 219–239.
[40] J. Im, J. R. Jensen, and J. A. Tullis, "Object-based change detection using correlation image analysis and image segmentation," Int. J. Remote Sens., vol. 29, no. 2, pp. 399–423, Jan. 2008.
[41] W. Zhou and A. Troy, "An object-oriented approach for analysing and characterizing urban landscape at the parcel level," Int. J. Remote Sens., vol. 29, no. 11, pp. 3119–3135, Jun. 2008.
[42] I. A. Rizvi and B. K. Mohan, "Object-based image analysis of high-resolution satellite images using modified cloud basis function neural network and probabilistic relaxation labeling process," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 12, pp. 4815–4820, Dec. 2011.
[43] M. Wang, "A multiresolution remotely sensed image segmentation method combining rainfalling watershed algorithm and fast region merging," in Proc. Int. Archives Photogramm., Remote Sens. Spatial Inf. Sci., 2008, vol. 38, part 4, pp. 1213–1218.
[44] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 2000.
[45] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[46] C. C. Chang and S. Y. Lee, "Retrieval of similar pictures on pictorial databases," Pattern Recognit., vol. 24, no. 7, pp. 675–680, Jul. 1991.
[47] T. Bruns and M. Egenhofer, "Similarity of spatial scenes," in Proc. 7th Int. Symp. Spatial Data Handling, Delft, The Netherlands, 1996, pp. 173–184.
[48] J. S. Payne, L. Hepplewhite, and T. J. Stonham, "Perceptually based metrics for the evaluation of textural image retrieval methods," in Proc. IEEE Int. Conf. Multimedia Comput. Syst., 1999, vol. 2, pp. 793–797.

Min Wang received the M.Sc. degree from Zhejiang University, Hangzhou, China, in 2000 and the Ph.D. degree from the Chinese Academy of Sciences, Beijing, China, in 2003. He is currently a Professor with the Key Laboratory of Virtual Geographic Environment of the Ministry of Education of China, Nanjing Normal University, Nanjing, China. His research interests include remote sensing image processing, feature extraction and classification, and remote sensing image mining.

Tengyi Song is currently working toward the Ph.D. degree with the Key Laboratory of Virtual Geographic Environment of the Ministry of Education of China, Nanjing Normal University, Nanjing, China. His research interests include remote sensing image processing and information extraction.
