Abstract—This paper proposes a remote sensing (RS) image retrieval scheme by using image scene semantic (SS) matching. The low-level image visual features (VFs) are first mapped into multilevel spatial semantics via VF extraction, object-based classification with support vector machines, spatial relationship inference, and SS modeling. Furthermore, a spatial SS matching model that involves the object area, attribution, topology, and orientation features is proposed for the implementation of sample-scene-based image retrieval. Moreover, a prototype system that uses a coarse-to-fine retrieval scheme is implemented with high retrieval accuracy. Experimental results show that the proposed method is suitable for spatial SS modeling, particularly geographic SS modeling, and performs well in spatial scene similarity matching.

Index Terms—Attributed relational graph (ARG), image retrieval, object-based image analysis, remote sensing (RS) image, scene matching, semantic, semantic gap.

I. INTRODUCTION

[…] and reliable if spatial similarity evaluation will not only rely on the low-level VFs of an image, such as color and texture. Employing semantics in RSIR, or semantic-based RSIR (SBRSIR), can meet these requirements. However, semantic extraction and utilization in RSIR are more complicated than those in common CBIR because RS images often represent large natural geographical scenes that contain abundant and complex visual contents.

This paper proposes a suite of technical schemes suitable for the semantic-based retrieval of RS image scenes. Image low-level VFs are gradually converted to high-level spatial semantics, including object, spatial relationship, and scene semantics (SS). Then, a spatial SS matching model that involves the object area, attribution, topology, and orientation features is proposed for the implementation of sample-scene-based image retrieval. Furthermore, a prototype system is designed and implemented for method validation. Many known computational methods, including VF extraction, image decomposition, and […] first and second levels of mapping image VFs to object semantics (OS).

The major problem lies in the extraction of image semantics. Although image semantic extraction is difficult, it still offers possible solutions. State-of-the-art techniques include the following five categories: 1) object ontology; 2) machine learning; 3) relevance feedback (RF); 4) semantic template; and 5) web-page context analysis. SBRSIR may fall into categories 1), 2), and 3).

Object ontology-based methods quantize image VFs, such as color and texture, to form a vocabulary, with each interval corresponding to an intermediate-level descriptor of images. Images are classified by mapping such descriptors into high-level semantics (keywords) [15], [16].

Machine learning, which generally includes supervised and unsupervised learning, seems a more popular way of extracting RS image semantics. Supervised learning methods, such as decision trees, Bayesian classification, neural networks, and support vector machines (SVMs), are commonly used for learning high-level concepts, e.g., OS, from low-level VFs. Unsupervised learning tends to group image data into clusters by maximizing the inner similarity of a cluster and minimizing the similarity among clusters. VFs are first clustered, and then the clusters are mapped to some concepts (e.g., some categories). The mapping rules can then be used to index unlabeled images.

RF learns semantics on demand. A typical RF-based SBIR system includes the following steps [17]: 1) the system offers an initial retrieval set by some query method; 2) a user selects the most relevant/irrelevant images; and 3) the system learns the "semantics" of the selection and offers a finer retrieval. Steps 2) and 3) can be repeated until satisfactory results are obtained.

Semantic utilization generally includes two types of methods. The first method extracts semantics from both the query templates and the images in databases and then performs semantics-based similarity matching. The second method alternatively matches a user's queries with some predefined semantic templates. Users are generally required to offer some concept sketches as system inputs, and the similarity matching process is transferred to common CBIR systems [18].

B. Typical Cases

Many studies have been dedicated to feature or OS extraction and application in SBRSIR. For example, Shyu et al. [19] and Scott et al. [20] proposed an RSIR system named GeoIRIS that could implement content-based shape retrieval of objects from a large-scale satellite imagery database. Durbha and King [21] proposed a retrieval framework based on a concept-based model by using domain-dependent ontologies. Their framework obtained image semantics by employing image segmentation, primitive descriptor calculation, and object ontology learning via SVMs. Li and Narayanan [22] proposed an integrated method to retrieve the spectral and spatial patterns of remotely sensed imagery. In their method, land cover information that corresponds to spectral characteristics was identified using SVM classification, whereas textural features that characterize spatial information were extracted using Gabor filters. Durbha and King [23] proposed I3KR, a semantic-enabled image knowledge retrieval system for the exploration of distributed RS image archives. The OS were obtained using image segmentation, primitive feature extraction, unsupervised learning, and SVM classification. Sun et al. [24] designed and implemented an SBRSIR prototype system in a grid environment by using ontology and grid techniques. Tobin et al. [25] proposed a method for indexing and retrieving high-resolution image regions in large geospatial data libraries. The steps of their method include feature extraction from the segments of tessellated images, region merging, indexing, and retrieval in a query-by-example environment. Li and Bretschneider [26] proposed an approach by using a context-sensitive Bayesian network for the semantic inference of segmented scenes. The RS-related semantics of the regions were inferred in a multistage process based on the spectral and textural characteristics of the regions and on the semantics of the adjacent regions. Blanchart and Datcu [27] proposed a semisupervised method for the autoannotation of satellite image databases and for the discovery of unknown semantic image classes in such databases. They used latent variable models to map low-level VFs to high-level image semantics. They showed that the use of unlabeled data could generate reliable estimates of the model parameters. Bratasanu et al. [28] proposed the use of the latent Dirichlet allocation model for mapping low-level features of clusters and segments to high-level map labels for RS image annotation and mapping. Furthermore, Ferecatu and Boujemaa [29] and Li and Bretschneider [26] proposed methods by using RF techniques to support interactive RSIR and effectively alleviate the "semantic gap" problem.

On the other hand, many studies focused on applying higher level spatial semantics, such as the spatial relationship or SS, in SBRSIR. For example, El-Kwae and Kabuka [30] proposed an algorithm named SIM_DTC, in which the similarity between two images was a weighted function of the number of their common objects and the closeness of directional and topological spatial relationships between object pairs in both images. Li and Fonseca [31] proposed a comprehensive model named Topology–Direction–Distance for qualitative spatial similarity assessment. They applied different priorities to topology, direction, and distance similarities, and they considered both commonality and difference in similarity assessments. In their report, they only addressed simple scene matching issues with two objects. Aksoy [32] modeled RS image content using attributed relational graphs (ARGs) and regarded image retrieval as a problem of relational matching and subgraph isomorphism. The "editing distance," defined as the minimum cost taken over all sequences of operations that transform one ARG into another, was used as the similarity measure. Kalaycilar et al. [33] proposed the retrieval of images by using spatial relationship histograms. These histograms were constructed by classifying the regions in an image, by computing the topological and distance-based spatial relationships between these regions, and by counting the number of times that different groups of regions were observed in the image.

The semantics hidden in RS images have complex representations. RS image analysis is generally a high-level spatial analysis that is more complicated than ordinary image processing. Spatial semantic utilization in SBRSIR is likewise more complicated than that in common SBIR. Although much progress has been achieved, SBRSIR remains a young research field. Operative spatial semantic description, extraction, and matching methods for SBRSIR remain open topics.
2876 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 51, NO. 5, MAY 2013
III. METHODOLOGY

A. Technical Framework

This paper utilizes spatial object, relationship, and SS in the proposed SBRSIR system. Behavior and emotion semantics are not considered because of their remoteness to RSIR. The proposed semantic extraction scheme includes image decomposition, segmentation, object-based classification, spatial relationship reasoning, and scene modeling, as shown in Fig. 1. An image is first decomposed into a number of blocks according to a quintree structure. Then, a multiresolution image segmentation method divides the image blocks into several parcels. The VFs of the parcels, including color and texture, are extracted and stored. The OS map is obtained by applying object-based image classification on the blocks. The area, orientation, and topology semantic features are calculated and stored for the classified parcels. An SS model is then built on the semantics-labeled parcels. Different types of image retrieval methods are then conducted based on image visual, object, spatial relationship, and SS. This paper primarily discusses the SS-based retrieval because it is arguably the most advanced retrieval scheme for RS images.

B. Image Decomposing

RSIR often involves the matching of a query template with some parts of a whole-scene RS image. A large image generally needs to be decomposed into several blocks suitable for retrieval. Without decomposing, image features require online extraction, and CBRSIR systems may be inefficient when facing voluminous data. Block-oriented image decomposition is a practical solution [34], [35]. Chessboard decomposition is commonly used, but it is limited by two shortcomings. First, only areas with the same size can be queried. Second, many areas located across neighboring blocks cannot be retrieved. An image may be decomposed in a multiscale manner, e.g., quadtree decomposing, to overcome the first shortcoming. Some additional blocks may be added into the quadtree, which originates quintree and nonatree decomposing, among others, to address the second shortcoming [34]. Although these additional blocks improve retrieval accuracy, they also cause redundant storage and lower query efficiency. Thus, a decomposing strategy that can balance retrieval accuracy and query efficiency well is necessary.

The redundancy is measured using the block cover ratio. As illustrated in Fig. 2(b), given a query image M as large as four subblocks derived from the quadtree decomposition, the cover ratio of M is the ratio of its maximum overlap area to these blocks. Since block 1 mostly overlaps M, the cover ratio of M is given by (L − x)(L − y)/(L × L) × 100%. Given an arbitrary query image that has the same size as the blocks, the cover ratio of the quadtree decomposing ranges from 25% to 100%, with an average ratio of 56.25% [35].

Another commonly used image decomposing method is quintree decomposing. A quintree is a hierarchical structure that decomposes an image into five subblocks of the same size, as shown in Fig. 2. The middle block does not need to be decomposed because its upper left, upper right, lower left, and lower right subblocks overlap with some same-level subblocks derived from its neighbors. The decomposition is recursively implemented until the minimum subblock size reaches a certain threshold. After the first-level decomposing, the minimum, maximum, and average cover ratios of the quintree are 50%, 100%, and 68.75%, respectively. A higher cover ratio improves retrieval accuracy but causes redundant block storage and reduces query efficiency.
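The quintree construction and the cover-ratio figures above can be checked with a short sketch. This is an illustrative reimplementation, not the authors' code: the `(x, y, size)` block tuples, the `min_size` stopping rule, and the midpoint-grid estimate are all assumptions.

```python
def quintree(x, y, size, min_size):
    """Decompose a square block at (x, y) into five half-size subblocks:
    four corner quadrants plus one centered middle block.  As described
    in the paper, the middle block is stored but never subdivided, and
    recursion stops once subblocks would fall below min_size."""
    blocks = [(x, y, size)]
    half = size // 2
    if half < min_size:
        return blocks
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        blocks += quintree(x + dx, y + dy, half, min_size)
    # the centered middle block overlaps the four corner quadrants
    blocks.append((x + half // 2, y + half // 2, half))
    return blocks

def quadtree_avg_cover_ratio(n=200):
    """Midpoint-grid estimate of the average cover ratio of plain
    quadtree decomposition for a block-sized query at a random offset
    (the paper cites 56.25%)."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            u, v = (i + 0.5) / n, (j + 0.5) / n  # normalized offsets
            # maximum overlap of the query with the four blocks it touches
            total += max((1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v)
    return total / (n * n)
```

For a 512-pixel block with `min_size = 256`, `quintree` returns the root plus its five first-level subblocks, and `quadtree_avg_cover_ratio()` converges to 9/16 = 0.5625, matching the 56.25% average quoted from [35].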
WANG AND SONG: REMOTE SENSING IMAGE RETRIEVAL BY SCENE SEMANTIC MATCHING 2877
Fig. 2. Image decomposing. (a) Quintree decomposing. (b) Overlapping area of quadtree decomposing.
In this paper, quintree decomposition is used because of its fair retrieval accuracy and efficiency.

C. Object Extraction and Classification

[…] used for OS extraction. The object-based image classification method first segments an image into parcels, extracts features, and then classifies the image [36], [37]. A parcel is a group of pixels that forms an "object" rather than a single pixel; thus, such a classification scheme is called object-based image classification. Compared with pixel-based image analysis, object-based image analysis offers more abundant features for image interpretation [38]; e.g., the shape and spatial relationship features of the parcels may also be utilized for image classification. In addition, it has a more robust pepper-noise-removing ability, providing more comprehensible interpretation results for medium- and high-spatial-resolution RS images [39]. Object-based image analysis has been successfully applied in many applications, including land cover classification and change detection by using high-spatial-resolution RS images [40]–[42].

Image segmentation, which segments an image into homogeneous parcels to facilitate subsequent analyses, is the first and most important step of object-based image analysis. Wang [43] designed and implemented a multiresolution image segmentation method that could issue multiresolution homogeneous parcels. This segmentation method is conducted in two main steps. Several initial small parcels are first obtained using the rainfall watershed segmentation method. Then, fast region merging is conducted to merge the small parcels in a hierarchical manner. Merging is conducted under the control of a scale parameter that stops the merging process when the minimal parcel merging cost exceeds its power. Multiresolution segmentation can be implemented by setting different scale parameters; smaller scales mean less merging cost, which creates smaller parcels, and vice versa.

After image segmentation, the parcels are vectorized, and then their features are calculated and stored. The commonly used parcel-based features are as follows: 1) spectral features, such as the parcel spectral mean and standard deviation values; 2) shape features, such as the parcel area, perimeter, main direction, and length/width ratio; and 3) texture features, such as GLCM and Gabor textures. In this paper, an SVM [44], [45] is used to classify the parcels based on their spectral features. For a nonlinear classification problem, the spectral mean values of these parcels are mapped into a high-dimensional feature space by using a certain kernel function. An optimal hyperplane is created in the high-dimensional space, and the classification is implemented to maximize the margin, which is defined as the distance of the closest vectors in certain classes to the hyperplane. The function for calculating the best hyperplane is defined as follows:

Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)    (1)

where n is the number of samples, y_i is the class label, \alpha_i is the Lagrange multiplier, and x_i is a sample. Accordingly, the decision function is defined as follows:

f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right)    (2)

where sgn is the symbolic function, b is the threshold value of classification, and K is the kernel function. Vapnik et al. [44], [45] examined SVMs in detail.

D. SS Modeling

This paper initially introduces the "conceptual neighborhoods" model [46], [47] for the simple scene modeling of only two regular objects. However, the "conceptual neighborhoods" model is limited in modeling complex spatial scenes that commonly comprise many irregular objects. Thus, this paper proposes a new orientation modeling that can adapt to irregular objects. Moreover, complex spatial scenes are modeled using ARGs.

1) "Conceptual Neighborhood" Model: The "conceptual neighborhood" model comprehensively models spatial relationships, including topology, orientation, and approximate distances between a pair of objects. Each node in the graph denotes a scene of two regular objects, as shown in Fig. 3. Two nodes are called "conceptual neighborhoods" and are linked with an edge if they are regarded as the most similar to each other. For example, scene 1 has two objects with a disjoint topology and a north-to-south orientation relationship. Thus, scene 1 has the three "conceptual neighborhoods" of scenes 2, 16, and 17 because it needs only one step to convert itself into each of the other three scenes.

The dissimilarity among simple scenes can be measured conveniently using the "conceptual neighborhood" model. A scene can gradually change into another scene with modifications of its topology, distance, and orientation relationships. More modifications indicate less similarity. For example, scene 41 in Fig. 3 requires at least three steps to change itself into scene 16 […]
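The step-counting dissimilarity of the "conceptual neighborhood" model amounts to a shortest-path search over the scene graph. The adjacency below is a hypothetical fragment, wired only to reproduce the two examples in the text (scene 1 neighboring scenes 2, 16, and 17, and scene 41 reaching scene 16 in three steps); it is not the full graph of Fig. 3.

```python
from collections import deque

# Hypothetical fragment of the "conceptual neighborhood" graph; edges
# link scenes that are one transformation step apart (illustrative).
NEIGHBORS = {
    1: {2, 16, 17},
    2: {1, 3, 16},
    3: {2, 41},
    16: {1, 2, 17},
    17: {1, 16},
    41: {3},
}

def scene_steps(graph, a, b):
    """Dissimilarity as the minimum number of neighborhood steps (BFS)."""
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # unreachable in this fragment
```

Fewer steps mean higher similarity, so `scene_steps` plays the role of the modification count described above.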
TABLE I
MATCHING SCHEME
4) {Q} is sorted in a descending order using SS(P, Q). The secondary sorting is conducted by NSO(P, Q) for the scenes with the same scene similarities.

5) These sorted images are returned to the users, and the retrieval is finished.

In step 3), a stratified digraph structure suitable for scene matching with unfixed numbers of categories and objects is designed. Table I and Fig. 6 present a typical case with four categories and six objects. As shown in Table I, columns 2 and 3 list the object IDs of each category, and column 4 lists all the possible matching patterns of each category. All the scene-level matching patterns are obtained by selecting and combining the category-level matching patterns. For example, 1 ↔ 2, 2 ↔ 5, 3 ↔ 1, 4 ↔ 6, 5 ↔ 3, and 6 ↔ 4 and 1 ↔ 2, 2 ↔ 5, 3 ↔ 6, 4 ↔ 1, 5 ↔ 3, and 6 ↔ 4 are two typical scene-level matching patterns.

To facilitate the scene-level matching process, the stratified digraph is composed of nodes that represent the matching patterns at the category level. Each layer in the digraph represents the possible matching patterns of the same category. One-way pointers link the nodes in the same layer, and the last node of the front layer points to the first node of the next layer. From top to bottom, the method visits and picks a single node from each layer by using the depth-first traversal algorithm and then gathers all scene-level matching patterns to determine the best one.

G. Query Process

In this paper, a two-step coarse-to-fine query scheme is designed to apply the proposed scene-matching model to SBRSIR. The proposed scheme includes two substages, namely, rough retrieval by using image OS and fine retrieval by using SS matching, as shown in Fig. 7. A user first issues a query by selecting a number of interested semantic categories. The image blocks that contain all the queried classes are retrieved as the first "rough" retrieval set. The subsequent queries are limited to the first set, which helps reduce the succeeding searching scope and improves query efficiency. Then, the user browses and selects an interested image block as the query template. The user further selects some interested parcels by mouse-clicking on the classification map of the query image. These parcels form the query scene. The matching algorithm is induced to focus on "important" objects and omit trivial ones to improve feedback efficiency.

In the second step, the scene similarities between the query scene and each candidate image in the rough retrieval set are calculated. A candidate image is marked as eligible and returned if the total similarity is higher than a certain threshold. The fine retrieval sorts the rough retrieval set and returns a queue in a descending order of similarities. The threshold is used to truncate the queue and accelerate feedback.

IV. EXPERIMENTAL ANALYSIS

A. Experimental Setting

The prototype system was implemented using Microsoft Visual C++.NET 2003, and the database system was implemented using Microsoft SQL Server 2005. The testing platform had an Intel(R) Core(TM) 2 Quad 8200 2.33-GHz CPU with 2-GB memory. Several RS multispectral images were collected as testing data, including three scene SPOT-5 images, five scene GeoEye images, six scene ALOS images, and two scene TM images collected from 2000–2011. A mixed data source was used to test method adaptability. The imaging bands of SPOT-5 were 0.5–0.59 μm (green to yellow), 0.61–0.68 μm (red), […]
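The depth-first traversal over the stratified layers described in step 3) can be sketched as follows. `category_patterns` and the object IDs are illustrative stand-ins for the Table I data, assuming each category's objects in scene P are matched one-to-one against those in scene Q.

```python
from itertools import permutations

def category_patterns(objs_p, objs_q):
    """All one-to-one matching patterns between the objects of a single
    category in the query scene P and a candidate scene Q."""
    return [list(zip(objs_p, perm)) for perm in permutations(objs_q)]

def scene_patterns(layers):
    """Depth-first traversal of the stratified layers: pick exactly one
    category-level pattern per layer and concatenate the picks into a
    scene-level matching pattern."""
    results = []

    def dfs(layer, acc):
        if layer == len(layers):
            results.append(acc)
            return
        for pattern in layers[layer]:
            dfs(layer + 1, acc + pattern)

    dfs(0, [])
    return results
```

With a single-object category {1} vs. {2} and a two-object category {2, 5} vs. {5, 3}, `scene_patterns` yields two scene-level patterns, one of which is 1 ↔ 2, 2 ↔ 5, 5 ↔ 3; the best pattern would then be chosen by scoring each candidate with the SS similarity model.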
TABLE II
CLASSIFICATION ACCURACY OF THE FIRST-LEVEL CLASSES
C. Performance Analysis

1) Image Classification: Approximately 92% first-level classification accuracy was achieved using parcel-based SVM classification, whereas approximately 90.5% second-level classification accuracy was achieved using parcel shape analysis and manual editing on the testing data set. Tables II and III list the average sample size (the ratio of the number of sample parcels to the total number of parcels in classified images), the classification accuracy of each class in each kind of imagery, and the average classification accuracy. Fig. 8 shows a typical classification case on a SPOT-5 image.

2) Rough Retrieval: The rough retrieval accuracy was investigated by averaging over five types of arbitrarily selected category combination schemes with different category numbers. The results are listed in Table IV. The average accuracy was found to be higher than that of the classification. This result is reasonable because objects within the same category may be distributed widely within a retrieved image; this paper only verified whether the category was contained in the image.

Fig. 9. Method comparison.

3) Fine Retrieval: In this stage, an image from the rough retrieval set was picked as the query template, some
Fig. 10. Retrieval by a two-object scene. The first row represents the original image, the classification map, and the query scene. The second to fourth rows
show the eight most similar images returned by the SS, VF, and OS methods. These images were labeled with image types and similarity measures. “S” indicates
SPOT-5, “G” indicates GeoEye, “A” indicates ALOS, and “T” indicates TM.
"important" objects (large classified parcels) were selected to compose the query scene, and the SS-based fine retrieval was performed (SS method). Two different retrieval schemes, namely, retrieval via VFs (VF method), including color (second and third color moments) and texture (2-D Gabor coefficients), and retrieval via OS (OS method), were also implemented for method comparison. The query process of the two contrasting methods coincided with the SS method. First, the same rough retrieval was conducted. Then, the fine retrievals were conducted by comparing the VF histograms (VF method) or the object area histograms (OS method) of the query template with those of the roughly retrieved images. The fine retrieval sorted the rough retrieval set by similarity and returned a queue in descending order. In all three methods, a candidate image was returned if its total similarity to the template exceeded 0.6. This threshold truncated the queue and accelerated system feedback.

The above query was repeated ten times for the three methods, and their accuracy values were evaluated. Fig. 9 shows the tau curves of the ten queries with different numbers of scene objects. The SS method had the highest tau mean and the lowest tau standard deviation, indicating that the machine sequences returned by the SS method were stably the closest to the human sequences. The other two methods had distinctively lower tau means, and their tau curves fluctuated more intensively. Therefore, the SS method was proven superior to the other two methods.

Figs. 10–12 show three retrieval cases. In each case, the original image, classification map, and query scene (nonwhite parcels) were arranged in the first row. The images retrieved using the SS, VF, and OS methods were arranged in three rows from top to bottom. The eight most similar images retrieved by each method were arranged with descending similarities to the query template from left to right. The similarity values were labeled under the images. The SS method has two similarity measures, namely, scene similarity SS and OS similarity SO, whereas the other two methods only have one measure each.

In the first case, the query scene was composed of two objects with the categories "woodland" and "lake and sea." The objects were distributed closely with a northwest-to-southeast orientation relationship. The scenes returned by the SS method were found to be highly close to the template in terms of spatial configurations. Although scenes 4–6 contained very visually different bodies of water, they were returned because their semantics were consistent with the template. On the contrary, scenes 5 and 6 returned using the VF method and scenes 4 to 8 returned using the OS method were relatively different from the template.

In the second case, the scene included three objects. From northwest to southeast, the objects were parcels of cropland and grassland, river, and urban and built-up areas. The SS method retrieved the scenes most similar to the template, whereas the other two methods sometimes returned scenes with relatively different spatial configurations.

The last case comprised four objects with three categories. They were parcels of one woodland, one river, and two urban and built-up areas distributed on both sides of the river. In this case, the SS method continued to exhibit better performance than the other two methods. In all these cases, the SS method exceeded the VF and OS methods in retrieving semantically similar scenes. In addition, the images retrieved using the SS method were arranged in a more precise order in terms of OS similarity. The scene modeling and matching schemes were proven effective and superior.
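The coarse-to-fine query and the tau evaluation behind Fig. 9 can be sketched as follows, assuming a plain dict of per-block categories and of per-candidate similarities. Only the 0.6 threshold comes from the text; the names, the data shapes, and the naive tie handling in `kendall_tau` are illustrative assumptions.

```python
def rough_retrieval(block_classes, query_classes):
    """Coarse stage: keep the image blocks whose classification maps
    contain every queried semantic category."""
    wanted = set(query_classes)
    return [bid for bid, cats in block_classes.items() if wanted <= set(cats)]

def fine_retrieval(similarities, threshold=0.6):
    """Fine stage: sort candidates by similarity (descending) and
    truncate the queue at the similarity threshold."""
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    return [img for img in ranked if similarities[img] > threshold]

def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation between two orderings of the same items,
    e.g., a machine-returned queue versus a human ranking."""
    pos = {item: i for i, item in enumerate(rank_b)}
    n = len(rank_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # is this pair ordered the same way in both rankings?
            if pos[rank_a[i]] < pos[rank_a[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

`kendall_tau` returns 1.0 for identical orderings and -1.0 for reversed ones, so a high tau mean with a low standard deviation, as reported for the SS method, indicates rankings that stably agree with the human ones.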
Fig. 11. Retrieval by a three-object scene. The other settings are the same as those in Fig. 10.
Fig. 12. Retrieval by a four-object scene. The other settings are the same as those in Fig. 10.
4) System Efficiency: The efficiency and scalability of the times were recorded. The total feedback time was the sum of
rough retrieval on databases resizing from containing 500 to the rough and fine retrievals. Fig. 13 shows the relationships of
3480 full images were tested. One to five semantic classes were rough and total feedback times versus database volume. These
arbitrarily selected, and their average feedback time versus the curves were almost linear, indicating that system scalability
database volume was recorded. After a rough retrieval, two was high. Using the proposed platform, the rough and fine
to four parcels were arbitrarily selected as the sample scene, retrievals required an average of 46 and 66 s, respectively, on
the fine retrievals were conducted, and their average feedback the entire database. In the current prototype system, spatial
WANG AND SONG: REMOTE SENSING IMAGE RETRIEVAL BY SCENE SEMANTIC MATCHING 2885
ACKNOWLEDGMENT
The authors would like to thank the two anonymous review-
ers for their very constructive comments that improved the
manuscript significantly.
R EFERENCES
[1] R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image retrieval: Ideas, influ-
ences, and trends of the new age,” ACM Comput. Surv., vol. 40, no. 2,
Fig. 13. Retrieval efficiency. pp. 1–60, Apr. 2008.
[2] C. Faloutsos, R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic,
relationship inference and ARG constructions were conducted and W. Equitz, “Efficient and effective querying by image content,” J.
via real-time computing for flexibility. System efficiency could Intell. Inf. Syst., vol. 3, no. 3/4, pp. 231–262, Jul. 1994.
be improved by developing and utilizing appropriate spatial [3] A. Pentland, R. W. Picard, and S. Sclaroff, “Photobook: Content-based
manipulation of image databases,” Int. J. Comput. Vis., vol. 18, no. 3,
indices, prestoring these spatial relationship features, and using pp. 233–254, Jun. 1996.
more efficient matching algorithms. [4] J. R. Smith and S. F. Chang, “VisualSEEK: A fully automated content-
based image query system,” Proc. 4th ACM Int. Multimedia Conf., pp. 87–
98, 1996.
V. C ONCLUSION [5] J. Z. Wang, J. Li, and G. Wiederhold, “SIMPLIcity: Semantics-sensitive
integrated matching for picture libraries,” IEEE Trans. Pattern Anal.
This paper offers a novel SBRSIR approach whose main Mach. Intell., vol. 23, no. 9, pp. 947–963, Sep. 2001.
[6] GazoPa, Feb. 2012. [Online]. Available: http://en.Wikipedia.org/wiki/
contributions lie in the SS modeling and matching schemes. GazoPa
Multilevel image semantics are first extracted based on VF [7] R. L. Gregory, Eye and Brain: The Psychology of Seeing. Princeton, NJ:
extraction, object-based image classification, spatial relation- Princeton Univ. Press, 1997.
ship inference, and scene modeling. Then, a novel spatial [8] Y. Liu, D. S. Zhang, and G. J. Lu, “Region-based image retrieval with
high-level semantics using decision tree learning,” Pattern Recognit.,
scene-matching model is designed and extended to an SBRSIR vol. 41, no. 8, pp. 2554–2570, Aug. 2008.
prototype system that was distinctive in the coarse-to-fine two- [9] Z. Shi, H. Qing, and Z. Z. Shi, “An index and retrieval framework in-
stage query scheme. Semantic-based retrieval experiments were tegrating perceptive features and semantics for multimedia databases,”
Multimedia Tools Appl., vol. 42, no. 2, pp. 207–231, Apr. 2009.
successfully conducted on high/medium-resolution multispec- [10] W. Huang, Y. Gao, and K. L. Chan, “A review of region-based image
tral RS images with resolutions in the range of 2–30 m. The retrieval,” J. Signal Process. Syst., vol. 59, no. 2, pp. 143–161, May 2010.
proposed retrieval scheme can effectively be applied to lower- [11] Y. Liu, D. S. Zhang, G. J. Lu, and W. Y. Ma, “A survey of content-based
resolution images, as long as the images are correctly classified. image retrieval with high-level semantics,” Pattern Recognit., vol. 40,
no. 1, pp. 262–282, Jan. 2007.
RS images with lower resolutions can commonly be classified [12] H. F. Wang and Z. X. Sun, “The methods of semantics processing in
using simpler pixel-based classification methods but, perhaps, content-based image retrieval,” J. Image Graph., vol. 6, no. 10, pp. 945–
with coarse classification granularity. 952, 2001.
The proposed scheme is most recommended for retrieving 10–30-m RS images because the current classification granularity basically matches these spatial resolutions and because object-based image classification can be applied to such images with good accuracy. It is not recommended for very high resolution (VHR) RS image (e.g., < 1 m) retrieval. First, the current classification granularity seems too coarse for VHR images. Second, fair and stable classification accuracy is difficult to obtain by classifying spectral features alone because of the serious spectral confusion caused by trivial image details, noise, shadow, and the mutual occlusion of ground features in these images. Low classification accuracy will seriously degrade semantic matching and retrieval. Despite the limitations above, this paper provides a practical solution for spatial, particularly geographic, scene matching, which has good application prospects in SBRSIR.
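The dependence of semantic matching on classification quality can be made concrete with a sketch of a scene similarity that combines the feature groups named in this paper: object area, attribution, topology, and orientation. The weights, the equal-split combination, and the assumption that matched object and relation pairs are given as input are all illustrative simplifications, not the paper's exact matching model.

```python
def object_similarity(q, s, w_area=0.5, w_attr=0.5):
    """Similarity of one matched object pair on area and class attribution."""
    area_sim = min(q["area"], s["area"]) / max(q["area"], s["area"])
    attr_sim = 1.0 if q["cls"] == s["cls"] else 0.0
    return w_area * area_sim + w_attr * attr_sim

def relation_similarity(q_rel, s_rel, w_topo=0.5, w_orient=0.5):
    """Similarity of one matched relation pair on topology and orientation;
    orientation is compared as the smallest angular difference in degrees."""
    topo_sim = 1.0 if q_rel["topology"] == s_rel["topology"] else 0.0
    diff = abs(q_rel["angle"] - s_rel["angle"]) % 360.0
    orient_sim = 1.0 - min(diff, 360.0 - diff) / 180.0
    return w_topo * topo_sim + w_orient * orient_sim

def scene_similarity(obj_pairs, rel_pairs, w_obj=0.5, w_rel=0.5):
    """Average node and edge similarities over a given object matching."""
    obj_sim = sum(object_similarity(q, s) for q, s in obj_pairs) / len(obj_pairs)
    rel_sim = (sum(relation_similarity(q, s) for q, s in rel_pairs) / len(rel_pairs)
               if rel_pairs else 1.0)
    return w_obj * obj_sim + w_rel * rel_sim
```

Under this kind of score, a misclassified object zeroes its attribution term and distorts every relation it participates in, which is why low classification accuracy propagates directly into the matching result.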
More robust classification schemes that refine classification and retrieval granularity will be investigated in future research. In addition, shape and distance features will be incorporated into the current scene-matching model. The SBRSIR prototype system is currently based on templates and needs a user to
2886 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 51, NO. 5, MAY 2013
Min Wang received the M.Sc. degree from Zhejiang University, Hangzhou, China, in 2000 and the Ph.D. degree from the Chinese Academy of Sciences, Beijing, China, in 2003.
He is currently a Professor with the Key Laboratory of Virtual Geographic Environment of the Ministry of Education of China, Nanjing Normal University, Nanjing, China. His research interests include remote sensing image processing, feature extraction and classification, and remote sensing image mining.

Tengyi Song is currently working toward the Ph.D. degree with the Key Laboratory of Virtual Geographic Environment of the Ministry of Education of China, Nanjing Normal University, Nanjing, China.
His research interests include remote sensing image processing and information extraction.