
Visual Information Retrieval for Content Based Relevance Feedback: A Review

Prof. B. J. Joglekar
Assistant Professor, Department of Information Technology, MAEER's MIT, Kothrud, Pune-38

Karishma Mutha (Pursuing M.E.)
Department of Information Technology, MAEER's MIT, Kothrud, Pune-38

Abstract:
Content-based visual information retrieval (CBVIR), also known as content-based image retrieval (CBIR), has been one of the most active research areas in computer vision over the last ten years. This paper gives an overview of the literature on content-based access with relevance feedback and its application to image retrieval. It also identifies explanations for some of the outstanding problems in the field: many systems have been proposed in the image-retrieval domain, and research prototypes using huge datasets have been developed in computer-science departments, yet very few systems appear to be used in practice.

The rest of the paper is organized as follows. Section 1 gives an introduction to generic content-based image retrieval with relevance feedback and the technologies used. Section 2 explains the visual features of images; example systems and application areas are described. Section 3 describes the applications of CBVIR systems.

General terms: relevance feedback, content-based image retrieval.

1. Introduction

Among the various applications of computer science to the field of image retrieval, content-based image retrieval plays an important role and has been an extremely active research area. Review articles from various years explain the state-of-the-art techniques of the corresponding period and reference a large number of system descriptions of the technologies implemented. Guo-Dong Guo and Anil K. Jain [7] proposed a new scheme for learning similarity measures for CBIR: it learns a boundary that separates the images in the database into two clusters. Ritendra Datta [1] (2008) presented an article on new challenges and ideas in the field of image retrieval. The article describes common problems related to CBIR, such as the semantic gap and the sensory gap, gives links to a large number of articles describing the various techniques used in the domain, and provides a systematic overview of the techniques used, the visual features employed, the images indexed, and the use of relevance feedback.

1.1 Content-Based Retrieval Systems


Content-based image retrieval (CBIR) is any technology that, in principle, helps to organize digital picture archives by their visual content. Anything ranging from an image similarity function to a robust image annotation engine falls under the purview of CBIR. This characterization of CBIR as a field of study places it at a unique position within the scientific community. While we witness continued effort in solving the fundamental open problem of robust image understanding, we also see people from different fields, such as computer vision, machine learning, information retrieval, human-computer interaction, database systems, Web and data mining, information theory, statistics, and psychology, contributing to and becoming part of the CBIR community. As a real-world technology, CBIR has certain shortcomings. One problem shared by all current approaches is the dependence on visual similarity for judging semantic similarity, which is problematic because of the semantic gap between low-level content and higher-level concepts. In spite of this difficulty in solving the core problems, the current state of the art in CBIR holds enough promise to be useful for real-world applications if focused efforts are made. For example, online photo sharing has become extremely popular with Flickr, which hosts hundreds of millions of pictures with diverse content. Such potential real-world applications, together with advances in image analysis technologies, have helped renew interest in image retrieval using CBIR. The need of the hour is to establish how this technology can reach the common man in the way text retrieval techniques have. Methods for visual similarity, or even semantic similarity, will remain techniques for building systems. For some applications, visual similarity may in fact be more critical than semantic similarity; for others, visual similarity may have little significance. Under what scenarios a typical user feels the need for a CBIR system, what the user sets out to achieve with the system, and how she expects the system to aid in this process are key questions that need to be answered in order to produce a successful system design.
Unfortunately, user studies of this nature have been scarce so far. Comprehensive surveys exist on the topic of CBIR, all of which deal primarily with work prior to the year 2000. Surveys also exist on closely related topics such as relevance feedback, high-dimensional indexing of multimedia data, face recognition (useful for face-based image retrieval), applications of CBIR to medicine [6], and applications to art and cultural imaging [6]. Multimedia information retrieval, a broader research area covering video, audio, image, and text analysis, has also been extensively surveyed [6]. In the current survey, we restrict the discussion to image-related research only. One of the reasons for writing this survey is that CBIR, as a field, has grown tremendously after the year 1999 in terms of both the people involved and the papers published. Lateral growth has also occurred in terms of the associated research questions addressed, which span various fields.

1.2 Relevance feedback

The concept of relevance feedback was developed during the 1960s to improve document retrieval: user feedback on the relevance of search results is used to improve their quality through iterative steps. By collecting feedback from the user, a CBIR system can dramatically increase its performance, reducing the gap between the high-level semantics in the user's mind and the low-level image descriptors. Relevance feedback is a powerful technique for interactive image retrieval [10]. Minka and Picard [9] presented a learning technique for iterative image retrieval. The key idea behind this approach is that each feature model has its own strength in representing a certain aspect of image content, and thus the best way to achieve effective content-based retrieval is to utilize a society of models. A typical approach in relevance feedback is to adjust the weights of the various features to accommodate the user's need. Another method is to modify the query into a new representation by using the positive and negative examples provided by the user [7]. Chang (1998) [12] proposed a model which allows interactive construction of a set of queries that detect visual concepts such as sunsets. Sclaroff (2001) [15] described the first WWW image search engine that used relevance feedback to guide the feature selection process; it was found that positive examples were more important for maximizing accuracy than negative examples. Rui and Huang (2001) [11] compared heuristic with optimization-based parameter updating and found that the optimization-based method achieves higher accuracy. Chen (2001) [12] described a one-class SVM method for updating the feedback space which shows substantially improved results over previous work. He (2002) used both short-term and long-term perspectives to infer a semantic space from users' relevance feedback for image retrieval. The short-term perspective was obtained by marking the top three incorrect examples in the results as irrelevant and selecting at most three images as relevant examples from the current iteration. The long-term perspective was obtained by updating the semantic space from the results of the short-term perspective. Yin (2005) [16] found that combining multiple relevance feedback strategies gives superior results compared to any single strategy. Tieu and Viola (2004) [17] proposed a method applying the AdaBoost learning algorithm and noted that it is quite suitable for relevance feedback because AdaBoost works well with small training sets. Howe (2003) compared different strategies using AdaBoost. Dy (2003) used a customized-queries approach and introduced a new unsupervised learning method, feature subset selection using expectation-maximization clustering; their method doubled the accuracy for a set of lung images. Guo (2001) [7] performed a comparison between AdaBoost and SVM and found that SVM gives superior retrieval results. Haas (2004) [19] described a general paradigm which integrates external knowledge sources with a relevance feedback mechanism and demonstrated on real test sets that the external knowledge substantially improves the relevance of the results. A good overview can also be found in Muller (2002) [20].
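A classical instance of the query-modification idea described above is the Rocchio update: the query's feature vector is moved toward the mean of the user's positive examples and away from the mean of the negative ones. The sketch below uses conventional default weights; the toy 3-D feature vectors and weight values are illustrative assumptions, not parameters from any system cited here.

```python
import numpy as np

def rocchio_update(query, positives, negatives,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query vector toward the mean of relevant images
    and away from the mean of non-relevant ones (Rocchio-style)."""
    q = alpha * query
    if len(positives) > 0:
        q = q + beta * np.mean(positives, axis=0)
    if len(negatives) > 0:
        q = q - gamma * np.mean(negatives, axis=0)
    return q

# Toy 3-D feature vectors (e.g., a tiny color histogram).
query = np.array([0.5, 0.5, 0.0])
positives = np.array([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]])  # marked relevant
negatives = np.array([[0.0, 0.0, 1.0]])                   # marked irrelevant

new_query = rocchio_update(query, positives, negatives)
print(new_query)  # -> [ 1.175  0.575 -0.25 ]
```

After one iteration the query has shifted toward the cluster of relevant examples; in an interactive system this update runs once per feedback round.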

1.3 Image Indexing

Images can be indexed and retrieved by textual descriptors or by image content. In textual queries, text is used to retrieve images or image sets; in visual queries (content-based retrieval), images or image sets are retrieved by visual characteristics such as color, texture, and shape. Image retrieval strategies based on text alone or on content alone both have their limitations. Images are subject to a wide range of interpretations, and textual descriptions can only begin to capture the richness and complexity of the semantic content of a visual image. Human indexing of images is highly labor-intensive and limiting when large databases are involved. Retrieval based on visual characteristics, however, is computationally intense and has not yet reached the point where it can be used efficiently to formulate intellectually subtle queries, especially for non-specialist users. Christos Faloutsos (1995) proposed a very promising idea for fast searching in traditional and multimedia databases: map objects into points in k-dimensional space using k feature-extraction functions provided by a domain expert. Highly fine-tuned spatial access methods (SAMs) can then answer several types of queries, including the query-by-example type (which translates to a range query), the all-pairs query (which translates to a spatial join), and the nearest-neighbor or best-match query.
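The mapping idea can be sketched in a few lines: each image is reduced to a k-dimensional feature point and a spatial index answers nearest-neighbor queries. The feature function below (mean RGB, k = 3) and the one-pixel "images" are hypothetical stand-ins for the domain expert's feature-extraction functions; a k-d tree plays the role of the spatial access method.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical feature extraction: map each "image" (an RGB pixel array)
# to a k = 3 point: its mean red, green, and blue values.
def features(image):
    return image.reshape(-1, 3).mean(axis=0)

# A toy database of four one-pixel "images".
db = [np.array([[[255, 0, 0]]]),    # red
      np.array([[[250, 10, 5]]]),   # almost red
      np.array([[[0, 255, 0]]]),    # green
      np.array([[[0, 0, 255]]])]    # blue

points = np.array([features(im) for im in db])
tree = cKDTree(points)              # the spatial access method (SAM)

# Query by example: nearest neighbors of a reddish query image.
query = features(np.array([[[240, 20, 20]]]))
dist, idx = tree.query(query, k=2)
print(idx)  # -> [1 0]: the two red-ish images rank first
```

In a real system the linear feature array would be replaced by a disk-based index (e.g., an R-tree) over millions of points, but the query pattern is the same.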

Christos Faloutsos (2001) described ImageMap, a method for indexing and similarity searching in image databases (IDBs). ImageMap answers queries by example involving any number of objects or regions, taking their interrelationships into account. The authors adopted the most general image content representation, attributed relational graphs (ARGs), in conjunction with the well-accepted editing distance on ARGs, and tested ImageMap on real and realistic medical images. The method not only provides for visualization of the dataset, clustering, and data mining, but also achieves up to a 1,000-fold speed-up in search over sequential scanning, with zero or very few false dismissals.

2. Visual features used

Visual features were distinguished [7] into primitive features such as color or shape, logical features such as the identity of the objects shown, and abstract features such as the significance of the scenes depicted. Still, all currently available systems use only primitive features unless manual annotation is tied to the visual features. Even systems using segments and local features, such as Blobworld, are still far from identifying objects reliably. No system offers a rendition of images at even the medium-level concepts that can easily be captured with text. This loss of information from an image to its representation by features is called the semantic gap. The situation is surely unsatisfactory, and the semantic gap definitely accounts for part of the reluctance to use image retrieval applications, but the technology can still be valuable when its advantages and problems are understood by the users. The more a retrieval application is specialized for a certain restricted domain, the smaller the gap can be made by using domain knowledge. Another gap is the sensory gap, which describes the loss between the actual structure and its representation in a (digital) image.

2.1. Color

While most images are stored in the red, green, blue (RGB) color space, this space is only rarely used for indexing and querying, as it does not match human color perception well. It seems sensible only for images taken under exactly the same conditions each time, such as trademark images. Other spaces, such as hue, saturation, value (HSV) or the CIE Lab and Luv spaces, correspond much better to human perception and are used more frequently: differences in these color spaces are similar to the differences between colors as humans perceive them.
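As a minimal illustration of why perceptual color spaces are preferred for indexing, the sketch below converts RGB triples to HSV with Python's standard-library colorsys: two reds that differ only in brightness share the same hue, while a blue is clearly separated on the hue channel. The sample colors are made up for the demonstration.

```python
import colorsys

# Two reds that differ only in brightness, and a blue.
red1 = (1.00, 0.10, 0.10)
red2 = (0.80, 0.08, 0.08)
blue = (0.10, 0.10, 1.00)

# In HSV, hue isolates the perceptual color: the two reds share
# hue 0.0, while blue sits at hue 2/3, regardless of brightness.
hsv1 = colorsys.rgb_to_hsv(*red1)
hsv2 = colorsys.rgb_to_hsv(*red2)
hsv3 = colorsys.rgb_to_hsv(*blue)
print(hsv1[0], hsv2[0], hsv3[0])  # -> 0.0 0.0 0.6666...
```

An index built on hue (or on full HSV histograms) therefore groups the two reds together even when lighting varies, which plain RGB distances do not guarantee.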

2.2. Texture
Texture measures show an even larger variety than color measures. Among the most common measures for capturing the texture of images are wavelets and Gabor filters; Gabor filters seem to perform better and correspond well to the properties of the human visual cortex for edge detection. These texture measures try to capture the characteristics of an image or of image parts with respect to changes in certain directions and the scale of the changes, which is most useful for regions or images with homogeneous texture.
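A minimal sketch of a Gabor-based texture descriptor: the real part of a Gabor kernel (a Gaussian-windowed cosine wave) is correlated with the image at several orientations, and the mean response energy per orientation serves as the feature vector. All parameter values here (kernel size, wavelength, bandwidth) are arbitrary illustrative choices, not taken from any cited system.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(size=15, theta=0.0, lam=6.0, sigma=3.0, gamma=0.5):
    """Real part of a Gabor filter oriented at angle theta
    with wavelength lam."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam)
    return envelope * carrier

def texture_features(image,
                     orientations=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Mean response energy of a small Gabor bank, one value per
    orientation (a common, simplified texture recipe)."""
    feats = []
    for theta in orientations:
        k = gabor_kernel(theta=theta)
        windows = sliding_window_view(image, k.shape)  # valid-mode windows
        resp = np.einsum('ijkl,kl->ij', windows, k)    # 2-D correlation
        feats.append(np.abs(resp).mean())
    return np.array(feats)

# Vertical stripes of period 6 resonate with the theta = 0 filter,
# whose wavelength lam = 6 matches the stripe period.
stripes = np.tile([1.0, 1.0, 1.0, 0.0, 0.0, 0.0], (32, 6))[:, :32]
f = texture_features(stripes)
print(f.argmax())  # -> 0: strongest energy at orientation theta = 0
```

The dominant orientation and the spread of energies across the bank capture the directional character of the texture that this section describes.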

2.3. Local and global features

Both color and texture features can be used at a global image level or locally on parts of the image. The easiest way to use regional features is to use blocks of fixed size and location, a so-called partitioning of the image for local feature extraction.
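The fixed-block partitioning just described can be sketched as follows: a grayscale image is cut into a 4x4 grid and a small intensity histogram is computed per block, giving a local feature vector whose entries are tied to image position. The grid and bin counts are illustrative choices.

```python
import numpy as np

def block_features(image, grid=(4, 4), bins=8):
    """Partition a grayscale image into a fixed grid of blocks and
    compute a normalized intensity histogram per block."""
    h, w = image.shape
    bh, bw = h // grid[0], w // grid[1]
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = image[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
            feats.append(hist / hist.sum())   # per-block normalization
    return np.concatenate(feats)              # grid[0]*grid[1]*bins values

# A 32x32 image: dark left half, bright right half.
img = np.zeros((32, 32))
img[:, 16:] = 0.9
f = block_features(img)
print(f.shape)  # -> (128,)
```

Unlike a single global histogram, this vector distinguishes "dark left, bright right" from its mirror image, which is exactly what local features buy over global ones.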

2.4. Segmentation and shape features

Full segmentation of images into objects is itself an unsolved problem. Even in fairly specialized domains, fully automated segmentation causes many problems, and the resulting regions are often hard to recognize. In image retrieval, several systems attempt an automatic segmentation of the images in the collection for feature extraction. To segment images effectively across varied image databases, the segmentation process has to be based on the color and texture properties of the image regions.
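A crude sketch of color-based region segmentation, assuming a plain k-means clustering of pixel colors (a real system would add texture features and spatial smoothing, as the text notes):

```python
import numpy as np

def kmeans_segment(image, k=2, iters=10):
    """Cluster pixels by color with k-means and return a label map.
    Deterministic init: k evenly spaced pixels as starting centers."""
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    centers = pixels[np.linspace(0, len(pixels) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign each pixel to its nearest center (Euclidean in color space).
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean color of its assigned pixels.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = pixels[labels == c].mean(axis=0)
    return labels.reshape(image.shape[:2])

# Toy image: blue "sky" top half, green "grass" bottom half.
img = np.zeros((8, 8, 3))
img[:4] = (0.2, 0.4, 1.0)   # sky
img[4:] = (0.1, 0.8, 0.2)   # grass
seg = kmeans_segment(img)
print(len(np.unique(seg)))  # -> 2 recovered regions
```

On such a clean two-color image the clusters recover the regions exactly; on real photographs, color alone fragments textured regions, which is why the section calls for combining color with texture properties.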

2.5. Semantics?

All these visual features, even those derived from segmented regions, are still fairly low-level compared to the high-level concepts contained in the images. They do not necessarily correspond to the objects in the images or to the semantic concepts or structures that a user is interested in.

3. Applications of CBVIR

Applications based on CBVIR can be used in the field of medical technology, in information retrieval on the World Wide Web, and in retrieving information from huge datasets.

Conclusion

There is a large number of articles and publications in the area of content-based visual information retrieval with relevance feedback; much work has been done, and several prototypes have been developed. Still, the topic remains an active area of research for the many researchers working to deliver an efficient application based on CBIR.

Acknowledgment

We would like to thank the reviewers for their comments, which helped to improve the quality of this paper.

References

1) Datta, R., Joshi, D., Li, J., and Wang, J. Z. 2008. Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys 40(2), Article 5 (April 2008), 60 pages.
2) Falaki, H. AdaBoost Algorithm. Computer Science Department, University of California, Los Angeles.
3) Farooque, Md. 2003. Image indexing and retrieval. Documentation Research and Training Centre, Indian Statistical Institute, Bangalore 560059.
4) Faloutsos, C. 2001. ImageMap: An image indexing method based on spatial similarity. Dept. of Computer Science, Carnegie Mellon University.
5) Zhu, J. 2006. Multi-class AdaBoost. Department of Statistics, University of Michigan, Ann Arbor, MI 48109.
6) Lew, M. S., Sebe, N., Djeraba, C., and Jain, R. 2006. Content-based multimedia information retrieval: State of the art and challenges.
7) Guo, G.-D., Jain, A. K., Ma, W.-Y., and Zhang, H.-J. 2002. Learning similarity measure for natural image retrieval with relevance feedback. IEEE Transactions on Neural Networks 13(4), July 2002.
8) Huang, J., Kumar, S. R., and Mitra, M. Combining supervised learning with color correlograms for content-based image retrieval. In Proc. ACM Multimedia, 1997, 325-334.
9) Minka, T. P. and Picard, R. W. 1995. Interactive learning using a society of models. MIT Media Lab., Cambridge, MA, Tech. Rep. 349.
10) Rui, Y. et al. 1997. A relevance feedback architecture in content-based multimedia information retrieval systems. In Proc. IEEE Workshop on Content-Based Access of Image and Video Libraries.
11) Rui, Y., Huang, T. S., Ortega, M., and Mehrotra, S. 1998. Relevance feedback: A powerful tool for interactive content-based image retrieval. IEEE Trans. Circuits and Systems for Video Technology 8, 644-655, Sept. 1998.
12) Chang, S., Chen, W., and Sundaram, H. 1998. Semantic visual templates: Linking visual features to semantics. In Proceedings of the IEEE International Conference on Image Processing, IEEE Computer Society Press, Los Alamitos, Calif., 531-535.
13) Sebe, N. and Lew, M. S. 2001. Color based retrieval. Pattern Recognition Letters 22(2), 223-230.
14) Wang, J. Z., Boujemaa, N., Del Bimbo, A., Geman, D., Hauptmann, A., and Tesic, J. 2006. Diversity in multimedia information retrieval research. In Proceedings of the ACM SIGMM International Workshop on Multimedia Information Retrieval (MIR) at the International Conference on Multimedia.
15) Sclaroff, S., La Cascia, M., Sethi, S., and Taycher, L. 2001. Mix and match features in the ImageRover search engine. In Principles of Visual Information Retrieval, M. S. Lew, Ed. Springer-Verlag, London, 259-277.
16) Yin, P. Y., Bhanu, B., Chang, K. C., and Dong, A. 2005. Integrating relevance feedback techniques for image retrieval using reinforcement learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(10), 1536-1551.
17) Tieu, K. and Viola, P. 2004. Boosting image retrieval. International Journal of Computer Vision 56(1), 17-36.
18) Howe, N. 2003. A closer look at boosted image retrieval. In Proceedings of the 2nd International Conference on Image and Video Retrieval, Urbana, July 2003, E. M. Bakker, T. S. Huang, M. S. Lew, N. Sebe, and X. Zhou, Eds. Springer-Verlag, London, 61-70.
19) Haas, M., Rijsdam, J., and Lew, M. 2004. Relevance feedback: Perceptual learning and retrieval in biocomputing, photos, and video. In Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, New York, October, 151-156.
20) Muller, H., Marchand-Maillet, S., and Pun, T. 2002. The truth about Corel: Evaluation in image retrieval. In Proceedings of the 1st International Conference on Image and Video Retrieval, London, July 2002, M. S. Lew, N. Sebe, and J. P. Eakins, Eds. Springer-Verlag, London, 38-49.