
International Journal of Advances in Science and Technology, Vol. 3, No. 6, 2011

Review Paper on Content Based Image Retrieval Using Feature Extraction


Pranali Prakash Lokhande (1) and P. A. Tijare (2)

(1) Department of Computer Science and Engineering, Sipnas College of Engineering & Technology, Amravati, Maharashtra, India. pranali.lokhande@gmail.com

(2) Associate Professor, Department of Computer Science and Engineering, Sipnas College of Engineering & Technology, Amravati, Maharashtra, India. pritishtijare@rediffmail.com

Abstract
Nowadays people make extensive use of digital images, and there is a great need for efficient techniques for finding them. A significant and increasingly popular approach that helps retrieve image data from huge collections is Content Based Image Retrieval (CBIR). In order to find an image, the image has to be represented by certain features; color, texture and shape are three important visual features, and an efficient image retrieval technique can be built on all three. Color is one of the most widely used low-level visual features and is invariant to image size and orientation. Conventional color features used in CBIR include the color histogram, the color correlogram and the Dominant Color Descriptor (DCD). Although a precise definition of texture is elusive, the notion generally refers to the presence of a spatial pattern that has some properties of homogeneity. In particular, the homogeneity cannot result from the presence of only a single color in the regions, but requires the interaction of various colors. There is no universal definition of shape either. Impressions of shape can be conveyed by color or intensity patterns, or by texture, from which a geometrical representation can be derived, so shape can be considered as something geometrical. On this view, edge orientation over all pixels acts as texture, while edge orientation at region contours alone acts as shape information.

Keywords: Content Based Image Retrieval, Linear Block Algorithm, Dominant Color Descriptor, Gray Level Co-occurrence Matrix, Color Co-occurrence Matrix, Block Difference of Inverse Probabilities, Block Variation of Local Correlation coefficients, Alexandria Digital Library, Color Hierarchical Representation Oriented Management Architecture, Picture Self-Organizing Map, Query By Image Content.

1. Introduction
Due to the proliferation of video and image data in digital form, Content Based Image Retrieval (CBIR) has become a prominent research topic, motivating extensive research into image retrieval systems. From a historical perspective, the earlier image retrieval systems were text-based: images had to be annotated and indexed accordingly. However, with the substantial increase in the number of images and in the size of image databases, manual annotation becomes very cumbersome and, to some extent, subjective and thereby incomplete, since text often fails to convey the rich structure of images. This motivated the research into what is now referred to as CBIR. An important problem that needs to be addressed, therefore, is fast retrieval of images from large databases. Image retrieval systems attempt to search through a database to find images that are perceptually similar to a query image. CBIR can greatly enhance the accuracy of the information returned and is an important alternative and complement to traditional text-based image searching.

Special Issue

Page 1 of 86

ISSN 2229-5216


CBIR is desirable because most web-based image search engines rely purely on metadata, which produces a lot of garbage in the results. Also, having humans manually enter keywords for images in a large database is inefficient and expensive, and may not capture every keyword that describes an image. A system that can filter images based on their content would therefore provide better indexing and return more accurate results. Color, texture and shape features have been used for describing image content. Color is one of the most widely used low-level visual features and is invariant to image size and orientation; the color histogram and the color correlogram are conventional color features used in CBIR. A color space is a model for representing color in terms of intensity values, with each dimension being a color component or color channel. The RGB model uses three primary colors, red, green and blue, in an additive fashion to reproduce other colors. As this is the basis of most computer displays today, the model has the advantage of being easy to extract: in a true-color image each pixel has a red, green and blue value ranging from 0 to 255. Without any other information, many objects in an image can be distinguished solely by their textures. Texture may describe the structural arrangement of a region and its relationship to the surrounding regions, and may consist of some basic primitives. Texture is a very general notion that can be attributed to almost everything in nature. For a human, texture relates mostly to a specific, spatially repetitive (micro)structure of surfaces formed by repeating a particular element, or several elements, in different relative spatial positions; generally the repetition involves local variations of scale, orientation, or other geometric and optical features of the elements. Texture is a key component of human visual perception.
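As a concrete illustration of the true-color representation above, a global color histogram can be computed by quantizing each 0-255 channel and counting pixels per quantized color. The sketch below is minimal; the choice of 4 levels per channel is illustrative and is not taken from any of the surveyed systems.

```python
import numpy as np

def rgb_histogram(image, levels=4):
    """Global color histogram of an RGB image (H x W x 3, uint8 values 0-255).
    Each channel is quantized to `levels` bins, giving levels**3 colors."""
    bin_width = 256 // levels
    quantized = np.asarray(image) // bin_width          # per-channel bin index
    # Combine the three channel indices into a single color index per pixel.
    idx = (quantized[..., 0] * levels + quantized[..., 1]) * levels + quantized[..., 2]
    hist = np.bincount(idx.ravel().astype(np.int64), minlength=levels ** 3).astype(float)
    return hist / idx.size                              # normalize to sum to 1
```

Because the histogram discards all spatial information, it is invariant to image orientation, which is exactly the property the text attributes to color features.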
Like color, this makes texture an essential feature to consider when querying image databases. Everyone can recognize texture, but it is difficult to define. Unlike color, texture occurs over a region rather than at a point; it is normally perceived through intensity levels and as such is orthogonal to color. Texture can be described in terms of direction, coarseness, contrast and so on, and has qualities such as periodicity and scale. This is what makes texture a particularly interesting facet of images and has produced a plethora of ways of extracting texture features. Image textures are defined as images of natural textured surfaces, together with artificially created visual patterns that approach these natural objects within certain limits. Image sensors introduce additional geometric and optical transformations of the perceived surfaces, and these transformations should not affect the particular class of textures to which the surface belongs. It is almost impossible to describe textures in words, although each human definition involves various informal qualitative structural features such as fineness, coarseness, smoothness, granularity, lineation, directionality, roughness, regularity and randomness. These features, which define the spatial arrangement of texture constituents, help to single out the desired texture types, e.g. fine or coarse, close or loose, plain or twilled or ribbed textile fabrics. It is difficult to use such human classifications as a basis for formal definitions of image texture, because there is no obvious way of associating these features, easily perceived by human vision, with the computational models meant to describe textures. Nonetheless, after several decades of research on texture analysis and synthesis, a variety of computational characteristics and properties for indexing and retrieving textures have been found. In many cases, the textural features follow from a particular random-field model of textured images.
As humans can recognize objects solely from their shapes, object shape features can also provide powerful information for image retrieval. Usually the shape carries semantic information, and shape features differ from other elementary visual features such as color or texture. Basically, shape features can be categorized as boundary-based or region-based: the former are extracted from the outer boundary of a region, the latter from the entire region. Shape matching is a well-explored research area, with many shape representation and similarity measurement techniques in the literature, and shape features have been used extensively in retrieval systems. At the primitive level, shape is the most obvious requirement, and there is considerable evidence that natural objects are recognized primarily by their shape. We will review CBIR systems based on color, texture and shape.


The remainder of this paper is organized as follows. Section 2 presents methods used for implementing CBIR. Section 3 outlines various CBIR systems. Section 4 describes some potential applications of CBIR. Section 5 presents our conclusion.

2. Methods used for implementing CBIR


A combination of color-based, texture-based and shape-based methods can be used to develop a CBIR system.

2.1. Color based methods


2.1.1. LBA (Linear Block Algorithm)

In [1] and [2], a color quantization method for dominant color extraction called the Linear Block Algorithm (LBA) is presented; LBA is efficient in both color quantization and computation. For simplicity and without loss of generality, the RGB color space is used. First, the color space is uniformly divided into 8 coarse partitions, and the centroid of each partition is selected as its quantized color. Let X = (XR, XG, XB) represent the red, green and blue components of a pixel, and let Ci be the quantized color for partition i. The Dominant Color Descriptor (DCD) is one of the color descriptors proposed by MPEG-7 and has been used extensively for image retrieval. Among the color descriptors, DCD describes the salient color distributions in an image or a region of interest, providing an intuitive, effective and compact representation of the colors present. In [6], an efficient scheme for dominant color extraction is developed; it also uses the LBA quantization method, which significantly improves the computational efficiency of dominant color extraction.

2.1.2. HSV color space

In [4] and [5], the HSV color space is used for color extraction. It is widely used in computer graphics, visualization in scientific computing and other fields. In this space, hue distinguishes colors, saturation is the percentage of white light added to a pure color, and value refers to the perceived light intensity. The benefit of the HSV color space is its ability to separate chromatic and achromatic components; it is also closer to the human conceptual understanding of color. It is essential to quantize the HSV components to reduce computation and improve efficiency. Unequal-interval quantization according to human color perception can be applied to the H, S and V components: hue can be divided into eight parts, and saturation and value into three parts each, in accordance with the discriminating ability of the human eye. The three HSV components can then be combined into a one-dimensional vector that quantizes the whole color space into 72 main colors, so the color content can be handled as a 72-bin one-dimensional histogram. A color histogram is derived by first quantizing the colors in the image into a number of bins in a specific color space and then counting the number of image pixels in each bin. [4] constructs the one-dimensional vector G by building a cumulative histogram of the color characteristics of the image after this non-uniform HSV quantization.

2.1.3. Color autocorrelogram

In [7], the color autocorrelogram is used for color extraction. It captures the spatial correlation between identical colors only: the color autocorrelogram gives the probability of finding a pixel of the same color at a given distance from a pixel of a specified color.
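The 72-bin HSV quantization described above can be sketched as follows. The exact hue boundaries below are one common choice from the CBIR literature and should be treated as assumptions; [4] and [5] may use different interval edges.

```python
import numpy as np

def quantize_hsv(h, s, v):
    """Map an HSV pixel to one of 72 bins (8 hue x 3 saturation x 3 value).
    h in [0, 360), s and v in [0, 1]. The unequal hue intervals below are
    an assumed perceptual partition, not the only possibility."""
    hue_edges = [20, 40, 75, 155, 190, 270, 295, 316]   # 8 perceptual ranges
    for hq, edge in enumerate(hue_edges):
        if h < edge:
            break
    else:
        hq = 0                                          # >= 316 wraps to red
    sq = 0 if s < 0.2 else (1 if s < 0.7 else 2)        # 3 saturation levels
    vq = 0 if v < 0.2 else (1 if v < 0.7 else 2)        # 3 value levels
    return 9 * hq + 3 * sq + vq                         # single index in [0, 71]

def hsv_histogram(pixels):
    """Normalized 72-bin histogram over an iterable of (h, s, v) pixels."""
    hist = np.zeros(72)
    for h, s, v in pixels:
        hist[quantize_hsv(h, s, v)] += 1
    return hist / max(len(pixels), 1)
```

The combination 9*H + 3*S + V is what collapses the three quantized components into the single one-dimensional index the text describes.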


2.1.4. CIE Lab color space

In [9], the CIE Lab color space is used to characterize the color content of an image. The Lab coordinate system has the benefit that equal distances in the color space correspond to equal perceived color differences, and that lightness is distinct from and independent of the chromaticity coordinates.
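The perceptual-uniformity property above is what makes the classic CIE76 color difference nothing more than a Euclidean distance in Lab coordinates. The sketch below shows that formula; note that [9] does not necessarily use this exact difference measure.

```python
import math

def delta_e_cie76(lab1, lab2):
    """CIE76 color difference: plain Euclidean distance in L*a*b* space.
    Inputs are (L, a, b) triples. Under the original CIE76 guideline a
    Delta-E of roughly 2.3 corresponds to a just-noticeable difference."""
    return math.dist(lab1, lab2)
```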

2.2. Texture based methods


2.2.1. GLCM (Gray Level Co-occurrence Matrix) and CCM (Color Co-occurrence Matrix)

In [1], a texture representation for image retrieval based on the GLCM is proposed. The GLCM is created in four directions with a distance of one between pixels, and texture features are extracted from the statistics of this matrix. The GLCM expresses the texture of an image through the correlation of the gray-level values of pixel pairs at different positions, describing the texture quantitatively. Four texture features are considered: energy, contrast, entropy and inverse difference. In [4], a texture representation based on both the GLCM and the CCM is proposed; the same four features are extracted from each matrix.

2.2.2. Steerable filter decomposition

In [2], a new rotation- and scale-invariant texture representation for image retrieval based on steerable filter decomposition is proposed. A steerable filter belongs to a class of filters in which a filter of arbitrary orientation is synthesized as a linear combination of a set of basis filters. Edges located at different orientations in an image can be detected by splitting the image into orientation sub-bands obtained with basis filters at those orientations. This allows one to adaptively steer a filter to any orientation and to determine analytically the filter output as a function of orientation.

2.2.3. Wavelet transform

A texture feature based on the wavelet transform is proposed in [5]. Wavelet analysis has been widely used in image processing, especially in image retrieval, because of its unique characteristics and advantages for signal analysis. The powerful time-frequency analysis ability of the wavelet provides a feasible way to build a high-accuracy retrieval system and describes image characteristics well. Through the wavelet transform the signal is decomposed into a series of basis functions, which are obtained from the wavelet generating function. A two-dimensional wavelet transform decomposes an image into four sub-bands, known as LL, LH, HL and HH, and the decomposition can be repeated at each level according to the frequency characteristics. The mean and standard deviation of the energy distribution express the image texture features. To decompose the image, the Daubechies-4 wavelet transform can be applied; the mean and standard deviation of the four sub-bands' coefficients are extracted after a one-level wavelet decomposition.

2.2.4. BDIP (Block Difference of Inverse Probabilities) and BVLC (Block Variation of Local Correlation coefficients)

In [7], BDIP and BVLC are used for texture extraction. BDIP is a texture feature that effectively extracts edges and valleys. In images, edges are regions with an abrupt change of intensity, and valleys are regions containing local intensity minima; both are very important features in human vision, and valleys in particular are fundamental to the visual perception of object shape. BVLC represents the variation of block-based local correlation coefficients along four orientations and is known to measure texture smoothness well. Each local correlation coefficient is


defined as the local covariance normalized by the local variance, and the value of BVLC is the difference between the maximum and minimum of the block-based local correlation coefficients over the four orientations.

2.2.5. Tamura and Gabor

In [8], Tamura features and Gabor filters are used for texture extraction. Tamura et al. took the approach of devising texture features that correspond to human visual perception. They defined six textural features, coarseness, contrast, directionality, line-likeness, regularity and roughness, and compared them with psychological measurements on human subjects. The first three attained very successful results and were used in that evaluation, both separately and as joint values. Gabor filters have been one of the most popular signal-processing approaches to texture feature extraction; they enable filtering in both the spatial and the frequency domain, and it has been proposed that they can model the responses of the human visual system. Turner first used a bank of Gabor filters to analyze texture. A bank of filters at different scales and orientations allows multichannel filtering of an image to extract frequency and orientation information, decomposing the image into texture features. The feature is computed by first filtering the image with the bank of orientation- and scale-sensitive filters and then computing the mean and standard deviation of the output in the frequency domain.

2.2.6. Wavelet frames

In [9], wavelet frames are used for texture extraction. The basic tools for processing textured images are a filter bank and the wavelet frames, which ensure translation invariance of the features. The filter bank simplifies the classification problem by decomposing the image into orthogonal components, and the input signal is decomposed into wavelet coefficients corresponding to different scales. Important texture characteristics such as periodicity and translational invariance can be handled by discrete wavelet frames.
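Of the texture descriptors above, the GLCM of Section 2.2.1 is the most direct to sketch. Below is a minimal pure-NumPy version for a single pixel offset ([1] builds the matrix in four directions at distance one; only one direction is shown), together with the four statistics the text lists: energy, contrast, entropy and inverse difference.

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=8):
    """Gray Level Co-occurrence Matrix for one offset (dx, dy).
    `image` is a 2-D array of integer gray levels in [0, levels)."""
    img = np.asarray(image)
    m = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(max(0, -dy), h - max(0, dy)):
        for x in range(max(0, -dx), w - max(0, dx)):
            m[img[y, x], img[y + dy, x + dx]] += 1   # count co-occurring pair
    return m / max(m.sum(), 1)                       # normalize to probabilities

def glcm_features(p):
    """The four statistics used in [1] and [4]: energy, contrast,
    entropy and inverse difference, computed from a normalized GLCM."""
    i, j = np.indices(p.shape)
    energy = np.sum(p ** 2)
    contrast = np.sum(((i - j) ** 2) * p)
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    inv_diff = np.sum(p / (1.0 + np.abs(i - j)))
    return energy, contrast, entropy, inv_diff
```

A perfectly uniform image concentrates all mass at one matrix cell, giving maximal energy and inverse difference and zero contrast and entropy, which matches the intuition behind these statistics.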

2.3. Shape based method


2.3.1. Pseudo-Zernike moments

In [2] and [3], pseudo-Zernike moments are proposed for shape extraction. Pseudo-Zernike moments are built from a set of complex polynomials that form a complete orthogonal set over the interior of the unit circle. They are not scale or translation invariant by themselves, so translation and scale invariance are first obtained by normalizing the image, after which a shape feature set can be selected for image retrieval.
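The normalization step described above, translating by the centroid and scaling the shape into the unit disk over which the pseudo-Zernike polynomials are orthogonal, can be sketched as follows. This is a hypothetical preprocessing helper; the moment computation itself is not shown, and mapping to the unit disk is one common convention rather than the method prescribed by [2] or [3].

```python
import numpy as np

def normalize_shape(image):
    """Translation- and scale-normalize a binary shape image, as a
    preprocessing step before computing (pseudo-)Zernike moments.
    Returns foreground pixel coordinates centered on the centroid and
    scaled so the farthest pixel lies on the unit circle."""
    ys, xs = np.nonzero(np.asarray(image))
    # Translation invariance: move the origin to the shape centroid.
    xs = xs - xs.mean()
    ys = ys - ys.mean()
    # Scale invariance: scale so the shape just fits the unit disk.
    r = np.sqrt(xs ** 2 + ys ** 2).max()
    if r > 0:
        xs, ys = xs / r, ys / r
    return xs, ys
```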

3. CBIR systems
A number of content-based image retrieval systems are described in [10]; several of them are summarized below in alphabetical order. If no matching technique, indexing data structure or result presentation is mentioned for a system, then either it is not applicable (e.g. there is no relevance feedback) or no such information is available (e.g. indexing data structures are often not described).

3.1. ADL (Alexandria Digital Library)


Developer: University of California, Santa Barbara.
Matching: When a query is sent to the databases, the query area is compared to the object footprints according to the selected matching operator (contains or overlaps), and the objects that match are potential members of the result set for the query.


Result presentation: The matched images are shown as thumbnails, and their footprints are shown on the map in the map browser.
Drawback: Many evaluations were short-term activities closely bound to development. Most evaluations were too brief to capture characteristics of adaptability, though the evaluators did carry what they had learned from evaluating one digital library project into the assessment of others. Finally, there is little evidence that digital library evaluations incorporate triangulation in data collection and analysis.

3.2. CHROMA (Color Hierarchical Representation Oriented Management Architecture)


Developer: School of Computing, Engineering and Technology, University of Sunderland, UK.
Matching: The matching score of two images is the Jaccard coefficient of their matrices: the number of matrix elements that are equal, divided by the maximum number, 225.
Result presentation: The resulting images are presented in linear order of matching score.
Drawback: Despite its effectiveness for initial query formulation, it would not be effective as a sole query mechanism. CHROMA's indexing scheme has several drawbacks; for instance, the same object photographed with different levels of "zoom" may index quite differently, even though a human indexer or retriever may perceive little difference between the photographs.

3.3. DrawSearch
Developer: Department of Electrical and Electronic Engineering, Technical University of Bari, Italy.
Matching: In the color/shape subsystem, the similarity between two feature vectors is given by the cosine metric. The similarity score between a query and a database image is calculated as a weighted sum of the distances between the two color vectors and between the two shape descriptors.
Result presentation: Images are displayed in decreasing order of similarity.
Drawback: There is a lack of acknowledged "ground truth" performance measures, mainly because of the inherent uncertainty of similarity-based retrieval: images considered similar by one user may differ in the judgement of another.
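The cosine metric used by DrawSearch's color/shape subsystem is straightforward; a minimal sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors: 1.0 means the
    vectors point in the same direction, 0.0 means they are orthogonal."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because the cosine depends only on direction, feature vectors that differ only by overall magnitude score as identical, which is often desirable for histogram-like features.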

3.4. ImageFinder
Developer: Attrasoft Inc.
Matching: Matching is done using a Boltzmann machine, a special kind of probabilistic artificial neural network. The network must first be trained on a set of images, which is done simply by entering the training images (key images); there is no control over the features on which the network is trained.
Result presentation: The result is an HTML page with a list of links to the images found.
Drawback: It remains unclear on what features the matching is done.

3.5. PicSOM (Picture Self-Organizing Map)


Developer: Laboratory of Computer and Information Sciences, Helsinki University of Technology, Finland.
Matching: The matching is based on the Self-Organizing Map (SOM). The SOM consists of a regular grid of units, each having a feature vector. The SOM is fed with all database images by a sequential regression process, which orders the images on the grid so that similar images are close to each other; a unit in a higher-level SOM represents a cluster of similar images. The distance between units in the SOM is the Euclidean distance between their feature vectors.
Result presentation: Starting from the SOM unit with the largest positive convolved response, PicSOM retrieves the representative images associated with the map units.


Drawback: The MPEG-7-defined content descriptors can be used as such in the PicSOM system, even though the Euclidean distance calculation inherently used by PicSOM is not optimal for all of them. Also, the PicSOM technique is somewhat slower to start finding relevant images than a system based on vector quantization.
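The core SOM lookup, finding the grid unit whose feature vector is nearest a query vector in Euclidean distance, can be sketched as follows. Training the map by the sequential regression process mentioned above is not shown, and the function name is illustrative.

```python
import numpy as np

def best_matching_unit(som_grid, query):
    """Find the best-matching unit on a SOM: the grid cell whose feature
    vector has the smallest Euclidean distance to the query vector.
    `som_grid` has shape (rows, cols, dim); returns the (row, col) winner."""
    grid = np.asarray(som_grid, float)
    d = np.linalg.norm(grid - np.asarray(query, float), axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)
```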

3.6. QBIC (Query By Image Content)


Developer: IBM Almaden Research Center, San Jose, CA.
Matching: For the average color, the distance between a query object and a database object is a weighted Euclidean distance, where the weights are the inverse standard deviations of each component over the samples in the database. In matching two color histograms, two distance measures are used.
Result presentation: The best matches are presented in decreasing order of similarity, with the matching score alongside.
Drawback: The semantic gap and the subjectivity of human perception affect the results, and comparing feature vectors with distance functions can lead to bad, sometimes unacceptable, response times.
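QBIC's average-color matching, a Euclidean distance with inverse-standard-deviation weights, can be sketched as follows. This is a literal reading of the description above (the weight is applied to each squared component difference), not IBM's implementation, and the function name is hypothetical.

```python
import numpy as np

def weighted_color_distance(query, target, database_features):
    """Weighted Euclidean distance where each component is weighted by the
    inverse of that component's standard deviation over the database,
    so high-variance components contribute less to the distance."""
    feats = np.asarray(database_features, float)
    std = feats.std(axis=0)
    std[std == 0] = 1.0                       # guard degenerate components
    w = 1.0 / std
    d = np.asarray(query, float) - np.asarray(target, float)
    return float(np.sqrt(np.sum(w * d ** 2)))
```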

4. Applications
In [11], a wide range of possible applications for CBIR technology is identified. They include:

4.1. Crime prevention


Law enforcement agencies typically maintain large archives of visual evidence, including past suspects' facial photographs (generally known as mugshots), fingerprints and shoeprints. Whenever a serious crime is committed, they can compare evidence from the crime scene against the records in their archives. Strictly speaking, this is an example of identity rather than similarity matching, though since all such images vary naturally over time, the distinction is of little practical significance. Of more relevance is the distinction between systems designed to verify the identity of a known individual (requiring matching against only a single stored record) and those capable of searching an entire database to find the closest matching records.

4.2. The military


Military applications of imaging technology are probably the best developed, though least publicized. Recognition of enemy aircraft from radar screens, identification of targets in satellite photographs, and guidance systems for cruise missiles are known examples, though these almost certainly represent only the tip of the iceberg. Many of the surveillance techniques used in crime prevention could also be relevant to the military field.

4.3. Intellectual property


Trademark image registration, where a new candidate mark is compared with existing marks to ensure that there is no risk of confusion, has long been recognized as a prime application area for CBIR. Copyright protection is also a potentially important application area. Enforcing image copyright when electronic versions of the images can easily be transmitted over the Internet in a variety of formats is an increasingly difficult task. There is a growing need for copyright owners to be able to seek out and identify unauthorized copies of images, particularly if they have been altered in some way.


4.4. Architectural and engineering design


Architectural and engineering design share a number of common features: the use of stylized 2-D and 3-D models to represent design objects, the need to visualize designs for the benefit of non-technical clients, and the need to work within externally imposed constraints, often financial ones. Such constraints mean that the designer needs to be aware of previous designs, particularly if these can be adapted to the problem at hand. Hence the ability to search design archives for previous examples which are similar in some way, or meet specified suitability criteria, can be valuable.

4.5. Fashion and interior design


Similarities can also be observed in the design process in other fields, including fashion and interior design. Here again, the designer has to work within externally-imposed constraints, such as choice of materials. The ability to search a collection of fabrics to find a particular combination of color or texture is increasingly being recognized as a useful aid to the design process. So far, little systematic development activity has been reported in this area. Attempts have been made to use general-purpose CBIR packages for specific tasks such as color matching of items from electronic versions of mail-order catalogues, and identifying textile samples bearing a desired pattern, but no commercial use appears to be made of this at present.

4.6. Journalism and advertising


Both newspapers and stock shot agencies maintain archives of still photographs to illustrate articles or advertising copy. These archives can often be extremely large (running into millions of images), and dauntingly expensive to maintain if detailed keyword indexing is provided. Broadcasting corporations are faced with an even bigger problem, having to deal with millions of hours of archive video footage, which are almost impossible to annotate without some degree of automatic assistance. This application area is probably one of the prime users of CBIR technology at present though not in the form originally envisaged. In the early years of CBIR development, hopes were high that the technology would provide efficient and effective retrieval of still images from photo libraries, eliminating or at least substantially reducing the need for manual keyword indexing.

5. Conclusion
CBIR is an active research topic in image processing, pattern recognition and computer vision. Color can be represented by the color histogram, color autocorrelogram or color moments under a given color space. Texture can be represented by Tamura features, the GLCM (Gray Level Co-occurrence Matrix), Gabor filters or the wavelet transform. Shape can be represented by pseudo-Zernike moments. Combining the color and texture features of an image with its shape features provides a robust feature set for image retrieval.

6. References
[1] M. Babu Rao, B. Prabhakara Rao, A. Govardhan, "Content based image retrieval using dominant color and texture features", International Journal of Computer Science and Information Security, vol. 9, no. 2, pp. 41-46, February 2011.
[2] X.-Y. Wang et al., "An effective image retrieval scheme using color, texture and shape features", Computer Standards & Interfaces, vol. 33, pp. 59-68, 2011. doi:10.1016/j.csi.2010.03.004.
[3] Chia-Hung Wei, Yue Li, Wing-Yin Chau, Chang-Tsun Li, "Trademark image retrieval using synthetic features for describing global shape and interior structure", Pattern Recognition, vol. 42, no. 3, pp. 127, 2009.


[4] Fan-Hui Kong, "Image retrieval using both color and texture features", Proceedings of the 8th International Conference on Machine Learning and Cybernetics, Baoding, pp. 2228-2232, 12-15 July 2009.
[5] Ji-Quan Ma, "Content-based image retrieval with HSV color space and texture features", Proceedings of the International Conference on Web Information Systems and Mining, pp. 61-63, 2009.
[6] Nai-Chung Yang, Wei-Han Chang, Chung-Ming Kuo, Tsia-Hsing Li, "A fast MPEG-7 dominant color extraction with new similarity measure for image retrieval", Journal of Visual Communication and Image Representation, vol. 19, no. 2, pp. 92-105, 2008.
[7] Young Deok Chun, Nam Chul Kim, Ick Hoon Jang, "Content-based image retrieval using multiresolution color and texture features", IEEE Transactions on Multimedia, vol. 10, no. 6, pp. 1073-1084, 2008.
[8] P. Howarth, S. Ruger, "Robust texture features for still-image retrieval", IEE Proceedings - Vision, Image and Signal Processing, vol. 152, no. 6, pp. 868-874, December 2005.
[9] S. Liapis, G. Tziritas, "Color and texture image retrieval using chromaticity histograms and wavelet frames", IEEE Transactions on Multimedia, vol. 6, no. 5, pp. 676-686, 2004.
[10] Remco C. Veltkamp, Mirela Tanase, "Content-Based Image Retrieval Systems: A Survey", Technical Report UU-CS-2000-34, pp. 1-62, October 28, 2002.
[11] http://www.jisc.ac.uk/uploaded_documents/jtap-039.doc

Authors' Profiles

Pranali P. Lokhande is currently pursuing her M.E. in Computer Science & Engineering at Sipnas College of Engineering & Technology, Amravati. She completed her B.E. in Computer Science & Engineering at the same college in 2003, under Amravati University, India. She has 5 years of teaching experience.

Prof. P. A. Tijare received his M.E. in Computer Technology & Application from CSVTU, Bhilai, in 2008, his B.E. in Computer Science & Engineering from SRTMU, Nanded, in 2003, and his D.E. in Computer Engineering from MSBTE, Mumbai, in 2000. He has more than 8 years of teaching experience and holds certifications in Oracle and WebSphere Studio V5.0.

