
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 12, DECEMBER 2010, ISSN 2151-9617

HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 51

Image Retrieval from Web


Mrs. Tripti Bhatiyani

Abstract—In recent years there has been much growth in digital imaging technologies, and the popularity of digital images is increasing very fast. However, obtaining images from the Internet that meet a user's requirements is not a trivial task. Images were initially retrieved on the basis of textual information supplied through various sources, e.g. URL, filename, surrounding text and alt text. But not all images on the Web are annotated properly, and the results obtained on this basis were not always satisfactory. Imprecise annotation, subjectivity of perception, and the time and labor required for annotation paved the way for content-based image retrieval.

Index Terms—Image retrieval, shape, texture, color, edge, low level features


Mrs. Tripti Bhatiyani is with the Institute of Applied Sciences & Computer Applications, ITM Universe, Gwalior.

1 INTRODUCTION
The World Wide Web is a medium from which information can be obtained on a large scale, very easily and at very low cost. In the past few years the growth of the World Wide Web has been phenomenal, and because of this growth, interest in retrieving images from the Web has also increased. Digital imaging devices make it easy to capture many kinds of digital images of the real world. With the growing popularity of digital images on the Web, research on technologies for Web image retrieval has become important. Images can be retrieved on the basis of either text or content. In text-based search, URLs, page titles, alt text, hyperlinks, surrounding text, etc. are used to find images, while in content-based search, images are retrieved by matching low-level features such as color, texture, edges, shape, shadow and temporal details.

Image retrieval is the mechanism of browsing, searching and retrieving images from image databases. The most common methods of image retrieval add captions, keywords and descriptions to images, and retrieval in these methods is based mainly on the annotation words. Annotating images manually requires a lot of time and effort, so much research has been done on automatic image annotation. Because of the problems with traditional methods of image indexing, there is keen interest in techniques that retrieve images on the basis of automatically derived features such as color and texture, i.e. Content-Based Image Retrieval (CBIR). The first microcomputer-based image database retrieval system was developed at MIT during the 1980s by Bani Reddy Prasad, Amar Gupta, Toong and Madnick.

2 CONTENT-BASED IMAGE RETRIEVAL
In content-based image retrieval, search is performed by analyzing the actual contents of an image rather than the metadata (e.g. keywords, tags, descriptions) attached to it. The contents of an image are low-level features such as colors, shapes, textures and any other information derived from the image itself. The origin of and need for content-based image retrieval lie in the difficulties faced by text-based image retrieval. In text-based search, images must first be annotated with text, and then a text-based database management system is used to perform retrieval. Annotating every image is not feasible for a very large collection, because it requires a vast amount of labor. Images are also rich in content, and there is an even more difficult problem: different people perceive the same image differently. Imprecise annotations and perception subjectivity paved the way for content-based image retrieval.
Content-based image retrieval rests mainly on three fundamental tasks: feature extraction, indexing, and design of the retrieval system.

3 FEATURE EXTRACTION
A feature is a function of one or more measurements, each being the value of a quantifiable property, computed to capture some significant characteristic of an object. Feature extraction is the primary task of CBIR: it is the preprocessing step that obtains image features such as color histograms, shapes and textures. Features such as corner points or interest points are also used in image retrieval. Scale- and affine-invariant interest points, which cope with affine transformations and illumination changes, are important features for image retrieval, and wavelet-based salient points can also be used. To improve generalization and efficiency in classification and indexing over a large number of image features, a feature subset can be derived. Once the visual features have been decided, the next step is the retrieval of images. Color, texture and shape are application-independent features. These features can be further divided into:
1. Pixel-level features, which include color and its location.
2. Local features, e.g. those obtained by segmentation and edge detection.
3. Global features, which are calculated over the entire image or a regular sub-area of the image.
Histogram-based descriptors, dominant color descriptors, spatial color descriptors and texture descriptors are used for browsing and image retrieval. The representation of shape attributes in segmented image regions plays an important role in image retrieval. Shape context is a descriptor for shape matching that is very compact and robust to geometric transformations [9]. Reliable segmentation is very important for estimating shapes in images. Segmentation that agrees with human perception is a difficult problem, but some solutions have been proposed; for example, the normalized cuts criterion uses contour and texture-difference cues for segmentation [10].
Another classification is:
1. Low-level features
2. High-level features
Low-level features are extracted directly from images, e.g. edges, texture, color, shape and corners, while high-level features are extracted on the basis of low-level features.

3.1 COLOR
Color is the most widely used visual feature in image retrieval. Images can be retrieved on the basis of color similarity by computing a color histogram for each image. The color histogram is the most common color feature representation: it gives the proportion of pixels in an image that hold particular values. A color histogram denotes the joint probability of the intensities of the three color channels; for example, a histogram may count the number of pixels for each red channel value in the range 0-255. Histograms lose information about the spatial distribution of colors, so two different images can have similar histograms, and such spatial information must be captured separately. Correlograms and anglograms are two well-known approaches for this. A correlogram describes the distribution of colors of pixels in a certain neighborhood around pixels of a given color, while an anglogram finds a signature of the spatial arrangement of areas that share common properties, such as the same color.
Color similarity is measured by histogram intersection, an L1 metric [1]. Similarity between similar but non-identical colors can be measured by an L2-related metric when comparing histograms [2].
In addition to color histograms, color moments and color sets are also used to retrieve images. Any color distribution can be characterized by its moments. The first moment (mean), second moment (variance) and third moment (skewness) are computed, since most of the information is concentrated in the low-order moments [3]. Color sets can be used to search large-scale image collections quickly: the (R, G, B) color space is transformed into a perceptually uniform space and quantized into M bins, and a color set is a selection of colors from the quantized color space [4].

3.2 TEXTURE
Texture is the look of visual patterns in images and their spatial arrangement. It is a natural property of all surfaces that describes their structural composition: visual patterns are innate properties of surfaces that carry information about the surfaces and their relationship to the surrounding environment [5]. A co-occurrence matrix can be constructed from the orientation and distance between image pixels. Textures can be divided into various subcategories, e.g. surface texture, image texture, deterministic texture and statistical texture.
Texture can also be represented by computational approximations of visual texture properties such as contrast, regularity, roughness, coarseness, directionality and line-likeness [6]. Texture has also been represented by the mean and variance extracted from wavelet sub-bands; this approach achieved 90% accuracy on 112 Brodatz texture images. A tree-structured wavelet transform was used to explore middle-band features in order to improve classification accuracy.

3.3 SHAPE
The shapes of images, and of specific regions within them, are identified by applying segmentation or edge detection. Once edges are detected and meaningful information is extracted, the outer shape can be described using statistical expressions. Shape descriptions can be either geometric or topological. Geometric descriptors include area, length, perimeter, eccentricity, principal axes of inertia, compactness and moments of inertia. Topological descriptors include connectivity (i.e. the neighboring features adjoining a region) and the Euler number.
Shapes are retrieved on the basis of features such as lines, boundaries, aspect ratio and circularity. Areas of change or stability are identified by region growing and edge detection. Boundary-based shape representation uses the outer boundary of the shape, while region-based representation uses the entire shape region. Fourier descriptors and moment invariants are examples of boundary-based and region-based shape representations, respectively.

4 IMAGE INDEXING
Image indexing is the process of sorting the feature vectors in S so that a query can be processed without sequentially scanning the database. Each image in the database goes through feature extraction and is represented as a k-dimensional feature vector, where k is the number of features of the image. If S is the set of feature vectors and $E^k$ is the k-dimensional feature space, then $S \subseteq E^k$.
Images are 2-D arrays of pixels, where each pixel is represented with fixed precision, e.g. 8 or 16 bits per pixel. Higher-level abstractions are formed from raw images in many ways, one of which is building histograms of color distributions and object contours. This is known as feature extraction.
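To make the k-dimensional feature vectors and the color similarity measure of Section 3.1 concrete, here is a minimal Python sketch. It assumes an image is given as rows of (R, G, B) tuples with 8-bit channels; the function names and the choice of 4 quantization levels per channel (so k = 64) are hypothetical illustration choices, not taken from the paper. It builds a normalized joint color histogram per image, collects the database vectors into S, and ranks images by histogram intersection [1] using a plain sequential scan, which is exactly the cost an index is meant to avoid.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel in 0..255
Image = List[List[Pixel]]      # hypothetical representation: rows of pixels


def color_histogram(image: Image, levels: int = 4) -> List[float]:
    """Return the normalized joint (R, G, B) histogram of `image`.

    Each channel is quantized into `levels` bins, so the result is a
    k-dimensional feature vector with k = levels ** 3. Entry i is the
    proportion of pixels that fall into color bin i.
    """
    counts = [0] * (levels ** 3)
    n_pixels = 0
    for row in image:
        for r, g, b in row:
            # Quantize each 8-bit channel to 0..levels-1.
            qr, qg, qb = (r * levels) // 256, (g * levels) // 256, (b * levels) // 256
            counts[(qr * levels + qg) * levels + qb] += 1
            n_pixels += 1
    if n_pixels == 0:
        raise ValueError("image has no pixels")
    return [c / n_pixels for c in counts]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection: the sum of bin-wise minima.

    For normalized histograms the value lies in [0, 1] (1 = identical), and
    1 - intersection equals half the L1 distance between the histograms.
    """
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_color(query: Image, database: List[Image], levels: int = 4) -> List[int]:
    """Rank database images by color similarity to `query` (best first).

    This is a sequential scan: the query is compared with every vector in S.
    """
    S = [color_histogram(img, levels) for img in database]  # one vector per image
    hq = color_histogram(query, levels)
    scores = [histogram_intersection(hq, h) for h in S]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)


# Tiny 4x4 example images.
red = [[(255, 0, 0)] * 4 for _ in range(4)]
blue = [[(0, 0, 255)] * 4 for _ in range(4)]
mostly_red = [[(250, 10, 10)] * 4 for _ in range(3)] + [[(0, 0, 255)] * 4]

print(rank_by_color(red, [blue, mostly_red, red]))  # [2, 1, 0]
```

Coarse quantization keeps k small, which makes the vectors cheaper to store and index, at the price of merging nearby colors into the same bin.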
The feature vectors of all database images thus form the set $S \subseteq E^k$ [7], and indexing organizes S so that executing a query does not require a sequential scan through the entire database.
Nearest-neighbor search for a given query point can be accelerated by sorting (indexing) the points in the database in advance. Partitioning methods divide the multidimensional vector space into partitions based on the absolute coordinate values of the space. Such partitioning structures are useful for queries based on absolute coordinates but not for nearest-neighbor search, because the search structure does not maintain distance information between the points within a partition and the partition's boundaries.
The choice of the feature space $E^k$ and of the distance metric significantly affects the performance of the search mechanism: the data image closest to the query image under the distance metric may not always be the most similar one according to visual perception or other objective criteria. Deciding which criteria should be used to match image similarity is still a major research issue.

5 RELEVANCE FEEDBACK FOR CONTENT-BASED IMAGE RETRIEVAL
In CBIR, a representation scheme is first chosen for each feature; then, during retrieval, the visual features to be used are selected. If there are multiple features, weights are specified for their representations, and images matching the user's query are searched for on the basis of the selected features and the specified weights. This approach depends entirely on the computer and involves no human interaction. Its performance suffers from the gap between high-level concepts and low-level features. Secondly, it ignores the subjectivity of human perception: different people may view similar things differently, and the same person may perceive the same visual content differently under different circumstances.
Because of these difficulties with the computer-centric approach, relevance feedback is applied. In this approach, human and computer interact to refine high-level queries into representations based on low-level features. Queries are adjusted automatically using the relevance feedback given by the user, and the adjusted query is a better approximation of the user's information need. The user is relieved of the responsibility of concept mapping and weight specification: the weights in the query object are dynamically updated to model high-level concepts and perception subjectivity [8].
Relevance feedback attempts to capture the user's precise needs through iterative feedback and query refinement. Since there is no framework that characterizes the high-level semantics of images and the subjectivity of human perception, relevance feedback provides case-specific query semantics. It is an active learning process in which the user's feedback is collected over several rounds. Many algorithms use just two classes of images, i.e. relevant and irrelevant, while some consider multiple relevant and irrelevant groups of images through suitable user interfaces.

6 REQUIREMENTS
Content-based image retrieval is applied in various fields, e.g. medical science, astronomy, space research, mineralogy and remote sensing. When CBIR is applied in any of these fields, the system must fulfill several requirements. The most critical is the performance of the system, i.e. the quality and relevance of the retrieved results; efforts are directed at improving precision and recall. The Web is a large storehouse of images that grows enormously every day, so the system must be able to handle, index and retrieve a large volume of images. Images taken from different sources differ in quality, resolution and color depth, so the extracted color and texture features may vary, and a robust system is needed to tackle such variations. The system should not exhaust the host server's resources, since an online image retrieval system serves multiple concurrent users. Time is also a very important factor in online retrieval: for good performance, response time must be low.

7 IMAGE RETRIEVAL SYSTEMS
Many image retrieval systems are available today for commercial and research use. These systems support tasks such as browsing images, searching for images by text, sketch or example, and navigating customized image categories. QBIC (Query By Image Content) was the first content-based image retrieval system. In QBIC, queries are based on example images, drawings, sketches, etc., and it uses (R, G, B) color features and k-element color histograms. Virage is another content-based image retrieval system, developed by Virage Inc., in which queries are based on color, texture and shape. The RetrievalWare system was developed by Excalibur Technologies; it searches images on the basis of color, shape, texture, brightness, color layout and aspect ratio, and allows users to adjust the weights of the visual features. Photobook, developed by the MIT Media Lab, consists of three subparts that extract shape, texture and surface features, respectively. Netra, developed in the UCSB Alexandria Digital Library project, retrieves images on the basis of color, texture, shape and spatial location information. VisualSeek and WebSeek were developed at Columbia University and search images on the basis of visual features; WebSeek also accepts text-based queries. WebSeek consists of three modules: image collection, classification and indexing, and search and retrieval. MARS (Multimedia Analysis and Retrieval System), developed at the University of Illinois, incorporates relevance feedback into image retrieval.
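Section 5 describes relevance feedback in which the weights of the query are updated from the user's judgments [8], and MARS is cited as a system that includes it. The Python sketch below illustrates one simple form of this idea. It assumes each image is already a numeric feature vector and uses a common heuristic: weight each component inversely to its standard deviation over the images the user marks as relevant. This heuristic is an illustrative assumption, not MARS's exact algorithm, and all names and data are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def update_weights(relevant: Sequence[Vector], eps: float = 1e-6) -> List[float]:
    """Derive per-component weights from the images the user marked relevant.

    Assumed heuristic: a component on which the relevant images agree (small
    standard deviation) probably matters to the user, so it gets a large
    weight; a component that varies widely gets a small one. Weights are
    normalized to sum to 1. `eps` avoids division by zero.
    """
    n, k = len(relevant), len(relevant[0])
    raw = []
    for j in range(k):
        values = [v[j] for v in relevant]
        mean = sum(values) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
        raw.append(1.0 / (std + eps))
    total = sum(raw)
    return [w / total for w in raw]


def weighted_distance(a: Vector, b: Vector, weights: Sequence[float]) -> float:
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def rank(query: Vector, database: Sequence[Vector], weights: Sequence[float]) -> List[int]:
    """Indices of database vectors, nearest to the query first."""
    return sorted(range(len(database)),
                  key=lambda i: weighted_distance(query, database[i], weights))


# Hypothetical 2-D feature vectors: [color feature, texture feature].
database = [[0.5, 0.5], [0.8, 0.1]]
query = [0.8, 0.5]

# Round 0: fixed, equal weights (the computer-centric approach).
print(rank(query, database, [0.5, 0.5]))   # [0, 1]

# Round 1: the user marks two images as relevant; they agree on color only.
weights = update_weights([[0.8, 0.1], [0.8, 0.9]])
print(rank(query, database, weights))      # [1, 0]
```

After one round of feedback the color component dominates, so the image that matches the query in color moves to the top even though its texture value differs; the user never had to specify the weights directly.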

8 CONCLUSIONS
Image retrieval is not an independent task but an integration of the efforts of multiple research communities. In recent years there has been much research on image retrieval on the basis of text and content. The earlier approach to content-based retrieval was computer-centric: images were retrieved on the basis of the similarity of individual feature representations with fixed weights. This approach was not satisfactory, because it is very difficult to represent high-level concepts using low-level features, and differences in perception are another big issue. As a consequence, human intervention was brought into the retrieval process in the form of relevance feedback: after giving a query, the user can refine it using relevance feedback.

Much remains to be done in the field of image retrieval. Even after much research and the development of efficient image retrieval systems, there is still a need for robust and reliable image understanding technology.

REFERENCES
[1] Michael J. Swain and Dana Ballard. Color indexing. International Journal of Computer Vision, 1991.
[2] W. Niblack and R. Barber. The QBIC project: Querying images by content using color, texture and shape. 1994.
[3] Markus Stricker and Orengo. Similarity of color images. 1995.
[4] John Smith and Shih-Fu Chang. Tools and techniques for color image retrieval. 1995.
[5] Robert M. Haralick, K. Shanmugam and Itshack Dinstein. Textural features for image classification. 1973.
[6] Tamura, Shunji Mori and Takashi Yamawaki. Textural features corresponding to visual perception. 1978.
[7] Tzi-cker Chiueh. Content-based image indexing.
[8] Yong Rui, Thomas S. Huang and Sharad Mehrotra. Relevance feedback technique in interactive CBIR.
[9] S. Belongie, J. Malik and J. Puzicha. Shape matching and object recognition using shape contexts.
[10] J. Shi and J. Malik. Normalized cuts and image segmentation. 2000.
