
A Unified Relevance Feedback Framework for Web Image Retrieval


T S Prathyusha, N B L Prasanna
Department of IT
II/IV B.Tech
Sri Vasavi Engineering College.

Abstract

Although relevance feedback (RF) has been extensively studied in the content-based image retrieval community, no commercial Web image search engines support RF because of scalability, efficiency, and effectiveness issues. In this paper, we propose a unified relevance feedback framework for Web image retrieval. Our framework shows advantages over traditional RF mechanisms in the following three aspects. First, during the RF process, both textual and visual features are used in a sequential way. To seamlessly combine textual feature-based RF and visual feature-based RF, a query concept-dependent fusion strategy is automatically learned. Second, the textual feature-based RF mechanism employs an effective search result clustering (SRC) algorithm to obtain salient phrases, based on which we can construct an accurate and low-dimensional textual space for the resulting Web images. Thus, we can integrate RF into Web image retrieval in a practical way. Last, a new user interface (UI) is proposed to support implicit RF. On the one hand, unlike a traditional RF UI, which forces users to make explicit judgments on the results, the new UI regards the users' click-through data as implicit relevance feedback in order to relieve the users of this burden. On the other hand, unlike a traditional RF UI, which simply replaces previous results with subsequent ones, a recommendation scheme is used to help the users better understand the feedback process and to mitigate the possible waiting caused by RF. Experimental results on a database consisting of nearly three million Web images show that the proposed framework is wieldy, scalable, and effective.

Introduction

With the explosive growth of both the World Wide Web and the number of digital images, there is an increasingly urgent need for effective Web image retrieval systems. Most of the popular commercial search engines, such as Google, Yahoo and AltaVista, support image retrieval by keywords. There are also commercial search engines dedicated to image retrieval, e.g., Picsearch. A common limitation of most of the existing Web image retrieval systems is that their search process is passive, i.e., it disregards the informative interactions between users and retrieval systems. An active system should get the user into the loop so that personalized results can be provided for the specific user. To be active, the system could take advantage of relevance feedback techniques.

Relevance feedback, originally developed for information retrieval, is an online learning technique that aims to improve the effectiveness of the information retrieval system. The main idea of relevance feedback is to let the user guide the system. During the retrieval process, the user interacts with the system and rates the relevance of the retrieved documents according to his/her subjective judgment. With this additional information, the system dynamically learns the user's intention and gradually presents better results. Since the introduction of relevance feedback to image retrieval in the mid-1990s, it has attracted tremendous attention in the content-based image retrieval (CBIR) community and has been shown to provide dramatic performance improvement. However, no commercial Web image search engines support relevance feedback because of usability, scalability, and efficiency issues.

Note that the textual features, on which most of the commercial search engines depend, are extracted from the file name, ALT text, URL, and surrounding text of the images. The usefulness of the textual features is demonstrated by the popularity of the currently available Web image search engines. However, directly using the textual information to construct the textual space leads to time-consuming computation, and the performance suffers from noisy terms. Since the user is interacting with the search engine in real time, the relevance feedback mechanism should be sufficiently fast and, if possible, avoid heavy computations over millions of retrieved images. To integrate relevance feedback into Web image retrieval in a practical way, an efficient and effective mechanism is required for constructing an accurate and low-dimensional textual space with respect to the resulting Web images.

Although all existing commercial Web image retrieval systems depend solely on textual information, Web images are characterized by both textual and visual features. With effective utilization of textual features, image retrieval greatly benefits from leveraging mature techniques from text retrieval. However, just as the proverb says, "a picture is worth a thousand words," the textual representation of an image is always insufficient compared to the visual content of the image itself. Therefore, visual features are required for finer granularity of image description. Considering the characteristics of both textual and visual features, it is reasonable to conclude that RF in textual space could guarantee relevance and RF in visual space could meet the need for finer granularity. Thus, it is meaningful to introduce a unified relevance feedback framework for Web image retrieval which seamlessly combines textual feature-based RF and visual feature-based RF in a sequential way.

To strengthen our proposed framework, we employ implicit feedback to overcome the limitation of explicit feedback techniques, where an increased cognitive burden is placed on the users. Unlike explicit feedback, implicit feedback can be collected at much lower cost, in much larger quantities and without burden on the users. As one of the most effective kinds of implicit feedback information, click-through data has been used either as absolute relevance judgments or relative relevance judgments in text retrieval. Fortunately, image retrieval has the following two characteristics when compared with text retrieval. First, the thumbnail of an image reflects more information than the title and snippet of a Web page, so click-through information in image retrieval tends to be less noisy than that in text retrieval. Second, unlike a textual document, the content of an image can be taken in at a glance. As a result, the user will possibly click more results in image retrieval than in text retrieval. Both characteristics imply that click-through data could be helpful for image retrieval.

We propose a unified relevance feedback framework for Web image retrieval. There are three main contributions, as follows:

• A dynamic multimodal fusion scheme is proposed to seamlessly combine textual feature-based RF (TBRF) and visual feature-based RF (VBRF). More specifically, a TBRF algorithm is first used to quickly select a possibly relevant image set. Then, a VBRF algorithm is combined with the TBRF algorithm to further re-rank the resulting Web images. The fusion of VBRF and TBRF is query concept-dependent and automatically learned.

• The textual feature-based RF mechanism employs an effective search result clustering (SRC) algorithm to obtain salient phrases, based on which we can construct an accurate and low-dimensional textual space for the resulting Web images. As a result, we can integrate RF into Web image retrieval in a practical way.

• A new UI is proposed to support implicit RF. On the one hand, unlike a traditional RF UI, which forces the users to make explicit judgments on the results, the new UI regards the user's click-through data as implicit relevance

feedback in order to relieve the user of this burden. On the other hand, unlike a traditional RF UI, which simply replaces previous results with subsequent ones, a recommendation scheme is used to help the user better understand the feedback process and to mitigate the possible waiting caused by RF.

Image representation:

Quantized images are commonly represented as a set of pixels encoding color or brightness information in matrix form. An alternative model is based on contour lines. A contour representation allows for easy retrieval of the full image in bitmap form. It has been used primarily for data compression of images. The idea is to encode, for each level, the boundaries of the connected regions of pixels at levels greater than or equal to that level. It is easy to reconstruct the original image from those boundaries. There exist output-sensitive algorithms for computing contour representations. One problem is how to store such a representation in a compact manner. In practice one seldom needs the entire contour representation. A typical use is a query asking for the contours matching a given gray level. Current data structuring techniques should be called upon to provide efficient solutions. So far they have not been, but attempts to remedy this are under way.

Here is a typical problem encountered with this type of representation. Suppose that we wish to erase wrinkles around the eyes of a person in a digitized picture which is given in the contour representation. Because the wrinkles must intersect the contour lines, these lines might become disconnected after removal of the wrinkles. Reconnecting them is not so easy. Dynamic programming might be a natural approach for this problem. In fact it has been tried successfully in several experiments. However, dynamic programming tends to be costly.

Another problem amenable to a computational-geometric attack is resolution enhancement. Suppose we have an image scanner with six-bit resolution for each of the three colors (RGB), and we wish to increase the resolution to eight bits. The naive approach, which consists of adding the gray levels of four pictures of the same object, has immediate flaws. Of the numerous solutions proposed in the computer vision literature, most suffer the drawback of causing blurring. As it turns out, the problem has a natural formulation in terms of weighted Voronoi diagrams, a well-studied construction in computational geometry.

Since such a Voronoi diagram is hard to compute, especially in the presence of high degeneracy, a different approach might be preferable. An important observation is that here the objects are not continuous but discrete. That is the main difference from the interpolation of contour lines in geographic information systems. Suppose that we want to improve the resolution at one intensity level. This means partitioning the pixels at that intensity level into several finer levels. Consider the sets of contour edges that bound these pixels, such as the edges between them and the pixels at lower intensity levels. We can compute the Euclidean distances from each pixel to the nearest boundary edges and the ratio between these distances, and use this ratio to classify the pixels into finer levels. Without going into details, it is apparent that several variants of this heuristic can be designed. Efficient implementation and clarification of these heuristics would be very useful.
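The distance-ratio heuristic for resolution enhancement described above can be sketched as follows. This is only one possible reading of the text, under several assumptions that the source does not state: distances are measured between pixel centers rather than to contour edges, both the nearest lower-level and the nearest higher-level pixels are used, the finer level is chosen by splitting the ratio into equal intervals, and a pixel with no boundary on one side is placed in the middle. The function name `refine_levels` and the parameter `sub_levels` are hypothetical. The brute-force search is quadratic in the number of pixels, so it is meant for small illustrations only.

```python
import numpy as np


def refine_levels(img, sub_levels=4):
    """Split every coarse gray level into `sub_levels` finer levels.

    For each pixel, d_low is the Euclidean distance to the nearest pixel with
    a lower coarse level and d_high the distance to the nearest pixel with a
    higher one. The ratio d_low / (d_low + d_high) selects the finer level.
    """
    img = np.asarray(img, dtype=int)
    coords = np.argwhere(np.ones_like(img, dtype=bool))  # all (row, col) pairs
    values = img.ravel()                                  # same row-major order
    out = np.empty(values.shape, dtype=int)
    for idx, (pos, v) in enumerate(zip(coords, values)):
        dist = np.linalg.norm(coords - pos, axis=1)
        lower = dist[values < v]
        higher = dist[values > v]
        if lower.size == 0 or higher.size == 0:
            frac = 0.5  # assumption: no boundary on one side, stay in the middle
        else:
            d_low, d_high = lower.min(), higher.min()
            frac = d_low / (d_low + d_high)
        sub = min(int(frac * sub_levels), sub_levels - 1)
        out[idx] = v * sub_levels + sub
    return out.reshape(img.shape)


# A one-row image with three coarse levels; 4 sub-levels correspond to
# going from six-bit to eight-bit resolution (two extra bits).
result = refine_levels([[0, 1, 1, 1, 1, 1, 2]])
```

In this toy row, the pixels at coarse level 1 receive increasing fine levels as they move away from the darker neighbor and toward the brighter one, which is the behavior the heuristic is after.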

The images collected from several photo forum sites, e.g., Photosig, have rich metadata such as title, category, photographer's comment and other people's critiques. These images constitute the evaluation dataset for the proposed relevance feedback framework. All the aforementioned metadata is used as the textual source for the textual space construction. To build the textual space, there are two available approaches in our work. One straightforward approach is to use the above metadata directly to obtain the textual feature. The other is based on the Search Result Clustering (SRC) algorithm to construct the textual space.

To represent the textual feature, the vector space model with the TF-IDF weighting scheme is adopted. More specifically, the textual feature of an image I is an L-dimensional vector and can be given by

$$F_I = (w_1, w_2, \ldots, w_L), \qquad w_i = tf_i \cdot \log\frac{N}{n_i}$$

where
$F_I$ is the textual feature of an image I;
$w_i$ is the weight of the ith term in I's textual space;
L is the number of all distinct terms of all images in the textual space;
$tf_i$ is the frequency of the ith term in I's textual space;
N is the total number of images;
$n_i$ is the number of images whose metadata contains the ith term.

To illustrate the straightforward approach, where all the metadata is utilized to construct the textual space, we use the photo Pem introduced at the beginning of this section as an example. Given the query "early morning", we have the resultant images, including the photo Pem. Based on those resulting images, we collect all distinct terms from the metadata. For Pem, the distinct terms are: early, morning, landscape, nature, rural, I, found, this, special, light, one, in, Pyrenees, along, the, Vicdessos, river, near, our, house, wow, like, picture, very, much, guess, has, to, do, with, everything, is, great, on, snow, and, sky, strange, looking, by, way, greatly, composed, nice, crafted, border, a, and beauty.

To visually represent an image, a 64-dimensional feature was extracted. It is a combination of three features: six-dimensional color moments, a 44-dimensional banded auto-correlogram and 14-dimensional color texture moments. For color moments, the first two moments from each channel of the CIE-LUV color space were extracted. For the correlogram, the HSV color space with inhomogeneous quantization into 44 colors is adopted. For color texture moments, we operate on the original image with templates derived from the local Fourier transform and obtain characteristic maps, each of which characterizes some information on a certain aspect of the original image. Similar to color moments, we calculate the first and second moments of the characteristic maps, which represent the color texture information of the original image. The resulting visual feature of an image is a 64-dimensional vector. Each feature dimension is normalized to [0, 1] using Gaussian normalization for the convenience of further computation.

Relevance feedback:

Search systems operate using a standard retrieval model, where a searcher, with a need for information, searches for documents that will help supply this information. Searchers are typically expected to describe the information they require via a set of query words submitted to the search system. This query is compared to each document in the collection, and a set of potentially relevant documents is returned. It is rare that searchers will retrieve the information they seek in response to their initial retrieval formulation (Van Rijsbergen, 1986). However, such problems can be resolved by iterative,

interactive techniques. The initial query can be reformulated during each iteration either explicitly by the searcher or based on searcher interaction.

The direct involvement of the searcher in interactive IR results in a dialogue between the IR system and the searcher that is potentially muddled and misdirected (Ingwersen, 1992). Searchers may lack a sufficiently developed idea of what information they seek and may be unable to conceptualize their needs into a query statement understandable by the search system. When unfamiliar with the collection of documents being searched they may have insufficient search experience to adapt their query formulation strategy (Taylor, 1968; Kuhlthau, 1988), and it is often necessary for searchers to interact with the retrieval system to clarify their query.

Relevance feedback (RF) is a technique that helps searchers improve the quality of their query statements and has been shown to be effective in non-interactive experimental environments (e.g., Salton and Buckley, 1990) and to a limited extent in IIR (Beaulieu, 1997). It allows searchers to mark documents as relevant to their needs and present this information to the IR system. The information can then be used to retrieve more documents like the relevant documents and rank documents similar to the relevant ones before other documents (Ruthven, 2001, p. 38). RF is a cyclical process: a set of documents retrieved in response to an initial query is presented to the searcher, who indicates which documents are relevant. This information is used by the system to produce a modified query, which is used to retrieve a new set of documents that are presented to the searcher. This process is known as an iteration of RF, and repeats until the required set of documents is found.

To work effectively, RF algorithms must obtain feedback from searchers about the relevance of the retrieved search results. This feedback typically involves the explicit marking of documents as relevant. The system takes terms from the documents marked and these are used to expand the query or re-weight the existing query terms. This process is referred to as query modification. The process increases the score of terms that occur in relevant documents and decreases the weights of those in non-relevant documents. The terms chosen by the RF system are typically those that discriminate most between the documents marked and those that are not. The query statement that evolves can be thought of as a representation of a searcher's interests within a search session (Ruthven et al., 2002a).

The classic model of IR involves the retrieval of documents in response to a query devised and submitted by the searcher. The query is a one-time static conception of the problem, where the need is assumed constant for the entire search session, regardless of the information viewed. RF is an iterative process to improve a search system's representation of a static information need. That is, the need after a number of iterations is assumed to be the same as at the beginning of the search (Bates, 1989). The aim of RF is not to provide information that enables a change in the topic of the search.

The evolution of the query statement across a number of feedback iterations is best viewed as a linear process, resulting in the formulation of an improved query. Initially, this model of RF was not regarded as an interaction between searcher and system and a potential source of relevance information. However, current accounts of feedback in IIR expand the notion of feedback to one in which the system and the searcher engage in direct dialogue, with feedback flowing from searcher to system and vice versa (Spink and Losee, 1996).

The value of IIR systems that use RF over systems that do not offer RF has already been established (Koenemann and Belkin, 1996). As this study demonstrates, it is possible to gain a deeper understanding of what searchers want from RF systems through empirical investigation. A number of studies have found that searchers exhibit a desire for explicit relevance feedback features and, in particular, term suggestion features (Hancock-Beaulieu and Walker, 1992; Koenemann and Belkin, 1996; Beaulieu, 1997; Belkin et al., 2000). However, evidence from these and related studies has indicated that the features of RF systems are not used in

interactive searching (Beaulieu, 1997; Belkin et al., 2001; Ruthven et al., 2001); there appears to be an inconsistency between what searchers say they want and what they actually use when confronted with RF systems. Searchers may lack the cognitive resources to effectively manage the additional requirements of marking documents whilst trying to complete their search task. The interface support for explicit RF can often take the form of checkboxes next to each document at the interface, allowing searchers to mark documents as relevant, or a sliding scale that allows them to indicate the extent to which a document is relevant (Ruthven et al., 2002b). The process of indicating which information is relevant is unfamiliar to searchers, and is adjunct to the activity of locating relevant information. The feedback mechanism is not implemented as part of the routine search activity; searchers may forget to use the feature or find it too onerous (Furnas, 2002).

Despite the apparent advantages of RF, there have been relatively few attempts to implement it in a full commercial environment. Aalbersberg (1992) cited two possible reasons for this trend: the high computational load necessitated by the RF algorithms and the unfriendliness of the RF interface. With recent improvements in processing power, the computational expense is no longer of real concern. Although the user interface challenge remains, technological advances mean that interfaces can be constructed that make RF more easily understood by searchers (Tague and Schultz, 1988; Gauch, 1992).

RF systems suffer from a trade-off between the searcher visiting documents because the system expects them to (i.e., to gauge their relevance) and the searcher visiting documents because they genuinely want to (i.e., they are interested in their content). This problem is perhaps more acute after submission of the first query, where the searcher is required by the retrieval system to peruse and assess documents in the first page of results. The first query is merely tentative, designed to retrieve a set of documents to then be assessed.

In operational environments searchers may be unable or unwilling to visit documents to assess their relevance. Documents may be lengthy or complex, searchers may have time restrictions or the initial query may have retrieved a poor set of documents. In RF systems the searcher is only able to judge the relevance of the documents that are presented to them. If a small number of relevant documents are retrieved then the ability of the system to approximate the searcher's information need (via modified queries taken from searchers' relevance judgements) can be adversely affected. RF systems can suffer badly if the corpus consists of a large number of multi-topic or partially relevant documents. In such documents, it is more likely that the relevant parts will contain the appropriate potential query modification terms, and terms in the remainder of the document may be erroneous, irrelevant and inappropriate. However, RF systems treat documents as single entities with an inherent notion of relevance and non-relevance encompassing the whole entity, not the constituent parts. For this reason, it may be worthwhile to base relevance assessments for such documents not on the whole document, but only on the pertinent parts (Salton et al., 1993; Callan, 1994; Allan, 1995). Query-biased summarization (Tombros and Sanderson, 1998) can reveal the most relevant parts of the document (based on the query), and also remove the need to browse to documents to assess them. The summaries may allow searchers to assess documents for relevance, and give feedback, more quickly. Similar approaches have been shown to be effective in a number of studies (Strzalkowski et al., 1998; Lam-Adesina and Jones, 2001; White et al., 2003b) and are used in this thesis to create many representations of documents than can be assessed through traditional implicit or explicit relevance feedback.

Relevance is an 'intuitive' concept (Saracevic, 1996) of which there are many different types (Mizzaro, 1998), and as such is not easy to define or measure. Traditional RF systems use a binary notion of relevance: either a document is relevant, or it is not. This is an overly simplified view of what is an implicitly variable and immeasurable concept. Many studies in IR have either used binary notions of relevance directly (Rees, 1967; Schamber et al., 1990), or collapsed more complex scales

(incorporating the 'fuzzy regions of relevance' (Spink et al., 1998)) into binary scales for analysis purposes (Saracevic et al., 1988; Schamber, 1991; Pao, 1993). Partial relevance, despite its usefulness (Spink et al., 1998), is typically ignored in RF systems since the formulae used to select query expansion terms and re-weight existing terms use a binary notion of relevance. There is therefore a need to incorporate less concrete, more fuzzy notions of relevance into the term selection process that underlies RF (Ruthven et al., 2002b).

Another potential application of RF techniques is in negative relevance feedback: the selection of important terms in non-relevant documents that are then de-emphasized or removed completely from the query. This approach has been shown to not detract from, and may improve, searching behavior when used in interactive IR applications (Belkin et al., 1996a; 1998). In these studies it was suggested that the technique was difficult to use, not helpful and its effectiveness was dependent on the search topic. This may be due to how negative relevance feedback was supported at the interface.

The RF features investigated in some of the studies described in this section may have been influenced by the environment in which they were evaluated (i.e., in a controlled, laboratory setting). In a study looking at different types of query expansion techniques, Dennis et al. (1998) found that although searchers could successfully use novel expansion techniques and could be convinced of the benefits of these techniques in a laboratory or training environment, they often stopped using these techniques in operational environments. Anick (2003) recently found, in a Web-based study, that many searchers made use of a term suggestion feature to refine their query. The results suggest the potential of term suggestion features in some types of searching environments, especially for single-session interactions. The different findings in these two studies suggest that RF may be situation-dependent and that many factors other than its usefulness influence its use. In the next section techniques to help searchers use RF systems are discussed.

RF in textual space:

CBIR systems perform retrieval based on similarity defined in terms of visual features, which is more objective. Although some new methods, such as relevance feedback, have been developed to improve the performance of CBIR systems, low-level features still play an important role and are in some sense the bottleneck for the development and application of CBIR techniques.

A very basic issue in designing a CBIR system is to select the most effective image features to represent image contents. Many low-level features have been researched so far. Currently, the widely used features include color features, such as color correlogram, color moments and color histogram, and texture features, such as the Gabor wavelet feature and MR-SAR. As the color and texture features capture different aspects of images, their combination may be useful. Therefore, some pioneering works attempted to characterize the color and texture information of an image in one feature representation. Lakmann et al. [2] proposed a reduced covariance color texture model,

which suggests a set of covariance matrices CCij(Δx, Δy) between different color channels i, j plus some color histogram to describe a color micro-texture. Palm et al. [7] proposed another scheme to combine the color and texture information together. It interprets the hue and saturation as polar coordinates, which allows the direct use of the HSV color space for the Fourier transform.

However, despite many research efforts, the existing low-level features are still not powerful enough to represent image content. Some features can achieve relatively good performance, but their feature dimensions are usually too high, or the implementation of the algorithm is difficult.

We propose a novel low-level feature, named color texture moments, for representing image contents. It is able to integrate the color and texture characteristics of an image in one compact form. Preliminary experimental results show that the new feature achieves better performance than many existing low-level features. More importantly, the dimension of this new feature is only 48, much lower than that of many features with good performance. Furthermore, the feature extraction algorithm is very easy to implement. It is valuable for the development and application of CBIR systems.

A texture feature based on the local Fourier transform (LFT) has been developed to classify textures and segment images. We operate on the original image with eight templates derived from LFT and obtain eight characteristic maps, each of which characterizes some information on a certain aspect of the original image. Similar to color moments, we calculate the first and second moments of the characteristic maps, which represent the color texture information of the original image.

To perform RF in textual space, Rocchio's algorithm is used. The algorithm was developed in the mid-1960s and has been proven to be one of the most effective RF algorithms in information retrieval. The key idea of Rocchio's algorithm is to construct a so-called optimal query so that the difference between the average score of a relevant document and the average score of a non-relevant document is maximized. Cosine similarity is used to calculate the similarity between an image and the optimal query. Since only clicked images are available for our proposed framework, we assume clicked images to be relevant and define the feature of the optimal query as follows:

$$F_{opt} = F_{ini} + \frac{\alpha}{N_{Rel}} \sum_{I \in Rel} F_I - \frac{\beta}{N_{NonRel}} \sum_{J \in NonRel} F_J$$

where
$F_{ini}$ is the vector of the initial query;
$F_I$ is the vector of the relevant image;
$F_J$ is the vector of the non-relevant image;
$Rel$ is the relevant image set;
$NonRel$ is the non-relevant image set;
$N_{Rel}$ is the number of the relevant images;
$N_{NonRel}$ is the number of the non-relevant images;
$\alpha$ is the parameter controlling the relative contribution of the relevant images and the initial query;
$\beta$ is the parameter controlling the relative contribution of the non-relevant images and the initial query.

In our case, only relevant images are available for our proposed mechanism, so we set $\alpha$ to 1 and $\beta$ to zero in our experiments. Although Rocchio's algorithm is used currently, any vector-based RF algorithm could be used in the unified framework.

RF in visual space:

To perform RF in visual space, Rui's algorithm is used. Assuming clicked images to be relevant, both an optimal query and feature weights are learned from the clicked images. More specifically, the feature vector of the optimal query is the mean of all features of clicked images. The weight of a feature dimension is proportional to the inverse of the standard deviation of the feature values of all clicked images. Weighted Euclidean distance is used to calculate the distance between an image and the

optimal query. Although Rui's algorithm is used currently, any RF algorithm using only relevant images could be used in the unified framework.

Relevance feedback (RF) has been extensively studied in the content-based image retrieval community. However, no commercial Web image search engines support RF because of scalability, efficiency and effectiveness issues. In this paper we propose a scalable relevance feedback mechanism using click-through data for Web image retrieval. The proposed mechanism regards users' click-through data as implicit feedback, which can be collected at lower cost, in larger quantities and without extra burden on the user. During the RF process, both textual and visual features are used in a sequential way. To seamlessly combine textual feature-based RF and visual feature-based RF, a query concept-dependent fusion strategy is automatically learned. Experimental results on a database consisting of nearly three million Web images show that the proposed mechanism is wieldy, scalable and effective.

Relevance feedback, originally developed for information retrieval, is an online learning technique used to improve the effectiveness of the information retrieval system. The main idea of relevance feedback is to let the user guide the system. During the retrieval process, the user interacts with the system and rates the relevance of the retrieved documents according to his/her subjective judgment. With this additional information, the system dynamically learns the user's intention and gradually presents better results. Since the introduction of relevance feedback into image retrieval in the mid-1990s, it has attracted tremendous attention in the CBIR community and has been shown to provide dramatic performance improvement. However, almost all the existing relevance feedback algorithms in image retrieval systems are performed in an explicit way. It is noted that explicit relevance feedback techniques have been underutilized, as they place an increased cognitive burden on users while the benefits are not always obvious to them. Compared with explicit feedback, implicit feedback can be collected at much lower cost, in much larger quantities and without burden on the user. As one of the most effective kinds of implicit feedback information, click-through data has been used either as absolute relevance judgments or relative relevance judgments.

With the explosive growth of both To the best of our knowledge, implicit
World Wide Web and the number of digital images, feedback using click through data has not been
there is more and more urgent need for effective Web applied to Web image retrieval systems in both textual
image retrieval systems. Most of the popular space and visual space. Comparing with text retrieval,
commercial search engines, such as Google, Yahoo! image retrieval has the following two characteristics.
And AltaVista, support image retrieval by keywords. First, since the thumbnail of an image reflects more
There are also commercial search engines dedicated to information than the title and snippet of a Web page,
image retrieval, e.g. Pic search. A common limitation click-through information of image retrieval tends to
of most of the existing Web image retrieval systems is be less noisy than that of text retrieval. Second, unlike
that their search process is passive, i.e disregarding textual document, the content of an image can be
the informative interactions between users and taken in at a glance. As a result, the user will possibly
retrieval systems. An active system should let the user click more results in image retrieval than in page
involve into the loop so that personalized results could retrieval. Both characteristics imply that click-through
be provided for specific users. To be active, the data could be more helpful for image retrieval. Thus
system could use relevance feedback techniques. there has been study on generating semantic features
based on user’s click-through data for image search
results clustering.
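The visual-space feedback step described earlier (optimal query as the mean of the clicked images, per-dimension weights inversely proportional to the standard deviation, weighted Euclidean distance to the optimal query) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the small constant `eps` that guards against a zero standard deviation, and the normalisation of the weights so that they sum to 1 are all assumptions introduced here.

```python
import math
from statistics import mean, pstdev


def learn_visual_feedback(clicked, eps=1e-6):
    """Learn (optimal_query, weights) from the feature vectors of clicked images.

    clicked: list of equal-length feature vectors (lists of floats).
    eps: hypothetical smoothing constant so a zero deviation does not divide by zero.
    """
    dims = list(zip(*clicked))  # one tuple of values per feature dimension
    optimal_query = [mean(d) for d in dims]
    # Weight of a dimension is proportional to the inverse of its standard deviation.
    raw = [1.0 / (pstdev(d) + eps) for d in dims]
    total = sum(raw)
    weights = [w / total for w in raw]  # assumed normalisation: weights sum to 1
    return optimal_query, weights


def weighted_euclidean(x, query, weights):
    """Weighted Euclidean distance between an image feature vector and the optimal query."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, query)))


# Clicked images agree on dimension 0 and disagree on dimension 1.
clicked = [[1.0, 0.0], [1.0, 2.0], [1.0, 4.0]]
query, weights = learn_visual_feedback(clicked)
# A candidate that deviates in the consistent dimension is penalised more.
far = weighted_euclidean([2.0, 2.0], query, weights)
near = weighted_euclidean([1.0, 3.0], query, weights)
```

In this toy example the clicked images are consistent in the first dimension, so that dimension receives almost all of the weight, and a candidate that differs there ends up farther from the optimal query than one that differs only in the noisy second dimension.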

Although all existing commercial Web image retrieval systems solely depend on textual information, Web images could be characterized by both textual and visual features. Note that making effective use of textual features makes image retrieval by high-level concepts more efficient and leverages mature techniques from text retrieval. However, just as the proverb says, “a picture is worth a thousand words,” the textual representation of an image is always insufficient compared to the visual content of the image itself. Therefore, visual features are required for finer granularity of image description. Considering both textual and visual features, RF in textual space could guarantee relevance, and RF in visual space could meet the need for finer granularity. In this paper, we propose a scalable relevance feedback mechanism using click-through data for Web image retrieval. There are three main contributions of the paper.

- Implicit feedback using click-through data is introduced to Web image retrieval in both textual space and visual space for the first time.

- An efficient and effective scheme is proposed to seamlessly combine visual feature-based RF (VBRF) and textual feature-based RF (TBRF). More specifically, a TBRF algorithm is first used to quickly select a possibly relevant image set. Then, a VBRF algorithm is combined with the TBRF algorithm to further re-rank the result images. The fusion of VBRF and TBRF is query concept-dependent and automatically learned.

- Extensive experiments have been done to evaluate the proposed mechanism using a database consisting of nearly three million Web images.

Dynamic multimodal fusion:

There has been some work on fusion of relevance feedback in different feature spaces. A straightforward and widely used strategy is linear combination. A nonlinear combination using a support vector machine (SVM) has also been proposed. Since this super-kernel fusion algorithm needs irrelevant images, it is unsuitable for systems that offer only relevant images.

Since textual features are more semantic-oriented and efficient than visual features, while visual features have finer descriptive granularity than textual features, we combine the RF in both feature spaces in a sequential way. First, RF in textual space is performed to rank the initial resulting images using the optimal query. Then, RF in visual space is performed to re-rank the top images. The re-ranking process is based on a dynamic linear combination of the RF in both visual and textual spaces.

Note that restricting the re-ranking to the top images has two advantages. First, the relevance of the top images could be guaranteed by the preceding RF in textual space. Second, the efficiency of the RF process could be ensured, since RF in visual space could be inefficient on a very large image set. The number of top images, which affects both the efficiency and the effectiveness of the RF process, is predetermined experimentally. The re-ranking process is based on a dynamic multimodal fusion of the RF in visual and textual spaces. The combination weights that reflect the relative contribution of both spaces are automatically learned and query concept-dependent. Assume there are clicked images. The similarity metric used to re-rank a top image I using RF in both visual and textual spaces is defined as follows:

[Equation not preserved in the source]

Where:
S is the similarity metric in both the visual and the textual spaces

[…] is the similarity between I’s visual feature and the optimal query in the visual space

[…] is the cosine similarity between I’s textual feature and the optimal query in the textual space

[…] is the dynamic linear combination parameter for the similarity metric in both visual and textual spaces.

This dynamic multimodal fusion is illustrated by the following flow chart:

Figure: Flow chart of the RF of the unified framework

1. […] and […] are the parameters which control the relative contribution of the RF in visual space.
2. […] is the deviation of the clicked images in the visual space.
3. […] is the visual feature vector of the clicked image […].
4. […] is the feature vector of the optimal query in the visual space.
5. […] is the weighted Euclidean distance between I’s visual feature and the optimal query in the visual space.

Note that the deviation tunes the visual feature’s contribution to the overall similarity metric according to the query concept. Of the two parameters in item 1, one controls the overall contribution of RF in visual space and the other fine-tunes that contribution. If the query concept can be well characterized by visual features, so that the clicked images are visually consistent, the deviation will be small (near 0) and the combination parameter should be large. Thus, the visual feature will be important. This is consistent with our intuition. Since the deviation is query concept-dependent, the resulting combination parameter is query concept-dependent as well. This property of the parameter results in a query concept-dependent fusion strategy for relevance feedback in both textual and visual space.

User interface:
To make the best of the implicit feedback information, a new Web image search UI named Mind Tracer is proposed. Mind Tracer consists of two types of pages: the main page and the detail page. The main page has three frames: the search frame, the recommendation frame, and the result frame. The search frame contains an edit box for users to type a query phrase. Only text-based queries are supported by Mind Tracer, since they are friendly and familiar to the typical Web surfer. After a user submits a query to Mind Tracer, the thumbnails of the result images are shown in the result frame in five rows and four columns. Initially, no images are shown in the recommendation frame. When the user clicks an image of interest in the result frame, the recommendation function is activated and the dynamic multimodal RF is carried out. As a result, a finer ranking of the initial results is obtained, and the top 20 recommended images are shown in the recommendation frame. The images iteratively roll in the recommendation window, with a scroll bar that can be manually controlled by the user. These are shown in the following figures:

Figure: Main page of Mind Tracer

Figure: Flow chart of the user interface

Accompanying the user’s click-through, the corresponding original image will be shown in a detail page. The detail page has two frames: the image frame and the snapshot frame. If the user clicks another image in the result frame or the recommendation frame, then besides the aforementioned system reactions, the former recommended images will be shown in the snapshot frame of the detail page, in case the user wants more images from the former recommended image list. If the user clicks an image in the snapshot frame, the corresponding original image will be shown in the image frame. Once the user is satisfied with the recommended results, he/she could click the refine button to move all the recommended images from the recommendation frame to the result frame. With the asynchronous scheme for refreshing the detail page and the recommendation frame of the main page, no extra waiting time is required to support the recommendation scheme.

Figure: Detail page of Mind Tracer
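As a closing illustration of the sequential feedback process described under dynamic multimodal fusion (rank all results by RF in textual space, then re-rank only the top images with a linear combination of visual and textual similarity), a rough sketch is given below. Several parts are assumptions made for illustration and are not taken from the paper, whose equation did not survive in this copy: the textual and visual optimal queries are simply the centroids of the clicked images, the visual similarity is taken as 1 / (1 + distance) with a plain rather than weighted Euclidean distance, and the combination weight is assumed to have the form alpha = a * exp(-b * sigma), with 0 <= a <= 1 and b >= 0, so that a small deviation sigma of the clicked images gives a large visual weight. The names `rerank`, `top_k`, `a` and `b` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Mean vector; used here as a simplified optimal query."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rerank(images, clicked_ids, top_k=3, a=1.0, b=1.0):
    """images: dict id -> (text_vector, visual_vector). Returns ids, best first."""
    clicked = [images[i] for i in clicked_ids]
    text_query = centroid([t for t, _ in clicked])
    visual_query = centroid([v for _, v in clicked])

    # Step 1: RF in textual space ranks every result image.
    s_text = {i: cosine(t, text_query) for i, (t, _) in images.items()}
    ranked = sorted(images, key=lambda i: s_text[i], reverse=True)
    top, rest = ranked[:top_k], ranked[top_k:]

    # Deviation of the clicked images around the visual optimal query.
    sigma = sum(math.dist(v, visual_query) for _, v in clicked) / len(clicked)
    alpha = a * math.exp(-b * sigma)  # assumed form of the combination weight

    # Step 2: only the top images are re-ranked with the fused similarity.
    def fused(i):
        s_visual = 1.0 / (1.0 + math.dist(images[i][1], visual_query))
        return alpha * s_visual + (1.0 - alpha) * s_text[i]

    return sorted(top, key=fused, reverse=True) + rest


images = {
    "a": ([1.0, 0.0], [0.0, 0.0]),
    "b": ([1.0, 0.0], [5.0, 5.0]),
    "c": ([1.0, 0.1], [0.1, 0.0]),
    "d": ([0.0, 1.0], [0.0, 0.0]),
}
fused_order = rerank(images, ["a"], top_k=3)
text_only_order = rerank(images, ["a"], top_k=3, a=0.0)
```

In this toy data set, image "c" is visually close to the clicked image "a" and is promoted above "b" after fusion, while "d" stays at the bottom because it never enters the top set selected by the textual step, even though its visual feature matches the query. That mirrors the stated reason for restricting the re-ranking to the top images.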

REFERENCES

[1] Google Image Search. [Online]. Available: http://images.google.com

[2] J. Rocchio, Relevance Feedback in Information Retrieval. Upper Saddle River, NJ: Prentice-Hall, 1971.

[3] X. S. Zhou and T. S. Huang, “Relevance feedback in image retrieval: A comprehensive review,” ACM Multimedia Syst., vol. 8, no. 6, pp. 536–544, 2003.

[4] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, “Relevance feedback: A power tool for interactive content-based image retrieval,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 5, pp. 644–655, May 1998.

[5] H. J. Zeng, Q. C. He, Z. Chen, W. Y. Ma, and J. W. Ma, “Learning to cluster web search results,” in Proc. 27th Annu. Int. ACM SIGIR Conference on Research and Development in Information Retrieval, 2004, pp. 210–217.
