Region Based Color Histogram Features For Efficient Web Image Retrieval
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 83
Abstract—In this paper, we present a method that uses region based color histogram features for efficient retrieval of similar images from the
Web. Our search algorithm uses color histogram features of the query image and of the database images. These features are condensed into a
small “signature” for each image, and the signature is stored in the database for retrieving similar images. In the retrieval phase, we use a
region-to-region distance measure to retrieve the similar images. Because only the computed signature is stored, the approach accommodates a
large number of images and requires little storage per image. The main advantage of this approach is that it is fast enough to be performed
on a database of 20,000 images. We focus mainly on retrieving similar images without considering basic properties of the image such as size,
shape, and texture. The speed of the retrieval system can be improved by using color histogram feature techniques.
Index Terms— Feature Extraction, Similarity metrics, Web Images, Web Indexing, Image retrieval system.
—————————— ——————————
1 INTRODUCTION
Dr. Suresha is a reader with the Department of Studies in Computer Science, University of Mysore, Mysore.

Algorithm: Color Based Histogram Feature Extraction.
... histogram probability for every region (block) of 8 x 8 pixels.

Input: We have collected 20,000 general-purpose, low-resolution color images of size 128 x 85 and 128 x 96, crawled from the World Wide Web.
Assumption: We assume that all the images collected from the Web are in JPEG format. N is the number of Web images collected.

For each Web image do

Step 1. Convert the RGB image into the HSV color space.
Step 2. Apply the contrast component to enhance the image, bringing pixel values closer together to improve the results.
Step 3. Normalize the image size to 128 x 128.

For every block of the image, partitioned into NB blocks (NB is the number of blocks), do

Step 4. Each block is 8 x 8 pixels; extract a feature vector for each of the NB blocks.
Step 5. Compute the signature using the following statistical analysis:

a) The mean is the average of all the intensity values in the two-dimensional block:
$$\bar{g} = \sum_{g=0}^{NB-1} g\,P(g) = \sum_{r} \sum_{c} \frac{I(r,c)}{M}$$
where I(r, c) is the intensity at row r and column c, and M is the number of pixels in the block.

b) The variance is a measure of the distance of each value from the mean:
$$\sigma_g^2 = \sum_{g=0}^{NB-1} (g - \bar{g})^2\,P(g)$$

c) The standard deviation $\sigma_g$, the square root of the variance, describes the spread in the data; it tells us something about the contrast of the image. A high-contrast image has a high variance, and a low-contrast image has a low variance.

d) The skew measures the asymmetry about the mean in the intensity level distribution. It is defined as follows:
$$\text{skew} = \frac{1}{\sigma_g^3} \sum_{g=0}^{NB-1} (g - \bar{g})^3\,P(g)$$

Energy:
$$\text{Energy} = \sum_{g=0}^{NB-1} [P(g)]^2$$

g) Entropy is a statistical measure; it describes how many bits are required to code the image data, and is defined as
$$\text{Entropy} = -\sum_{g=0}^{NB-1} P(g)\,\log_2 [P(g)]$$

End for
End for

We have described how to compute the feature vector using the first-order histogram probability of each region. Histogram search algorithms characterize an image by its color distribution, or histogram. Many distances have been used to define the similarity of two color histogram representations; Euclidean distance and its variations are the most commonly used [12]. The drawback of a global histogram representation is that information about object location, shape, and texture [13] is discarded. In this section, we have discussed the region based color histogram feature extraction method. In the next subsection, we describe the feature space analysis for representing an image. In section 3, we discuss the distance and similarity measures for retrieving similar images.

2.2 Feature space analysis
Let N denote the total number of images in the image database. We obtain a set of $n_i$ feature vectors after the computation process. Each of these $n_i$ d-dimensional feature vectors represents the statistical visual features of one region of an image.
Suppose the feature vectors in the d-dimensional feature space are $\{x_i : i = 1, \ldots, L\}$, where NB is the total number of blocks in the image database. Then
$$d(r_i, c_j) = \min_{1 \le l \le k} d(r_i, c_l)$$
The goal of the feature clustering algorithm is to partition the features into k groups with centroids $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_k$ such that
$$D(k) = \sum_{i=1}^{L} \min_{1 \le j \le k} (x_i - \hat{x}_j)^2$$
is minimized. That is, the average distance between a feature vector and the group with the nearest centroid to it is minimized. The k groups must satisfy two necessary conditions.
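The statistical features that make up each region's feature vector (mean, variance, standard deviation, skew, energy, and entropy of the first-order histogram) can be sketched as follows. This is a minimal sketch and not our implementation: the function names `block_signature` and `image_signature` are hypothetical, the statistics are assumed to be taken on the HSV value channel quantized to 256 levels, energy and entropy are assumed to follow their standard first-order definitions, and the contrast enhancement and resizing steps are omitted.

```python
import colorsys
import math


def block_signature(block, levels=256):
    """Return (mean, variance, std, skew, energy, entropy) for one block.

    `block` is a 2-D list of integer intensities in the range [0, levels).
    """
    pixels = [value for row in block for value in row]
    m = len(pixels)                       # M: number of pixels in the block
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    p = [c / m for c in counts]           # first-order histogram probability P(g)

    mean = sum(g * p[g] for g in range(levels))
    variance = sum((g - mean) ** 2 * p[g] for g in range(levels))
    std = math.sqrt(variance)
    # Skew is undefined for a flat block (std == 0); report 0.0 in that case.
    third_moment = sum((g - mean) ** 3 * p[g] for g in range(levels))
    skew = third_moment / std ** 3 if std > 0 else 0.0
    energy = sum(q * q for q in p)
    entropy = -sum(q * math.log2(q) for q in p if q > 0)
    return (mean, variance, std, skew, energy, entropy)


def image_signature(rgb, block_size=8, levels=256):
    """Compute one signature per block of an RGB image.

    `rgb` is a list of rows, each row a list of (r, g, b) tuples in 0..255.
    Assumption: the statistics use the V channel of HSV.
    """
    value = [
        [int(round(colorsys.rgb_to_hsv(red / 255, green / 255, blue / 255)[2]
                   * (levels - 1)))
         for (red, green, blue) in row]
        for row in rgb
    ]
    height, width = len(value), len(value[0])
    signatures = []
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            block = [row[left:left + block_size]
                     for row in value[top:top + block_size]]
            signatures.append(block_signature(block, levels))
    return signatures
```

Under these assumptions, a 128 x 128 image with 8 x 8 blocks yields 256 six-dimensional feature vectors, which together form the stored signature.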
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2010, ISSN 2151-9617
These requirements lead our feature grouping process to find the k cluster means.
The distance between the two region sets is the summation of all the weighted matching strengths.
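Finding the k cluster means that minimize D(k) can be sketched as follows. This is a minimal sketch, assuming the standard Lloyd iteration for k-means (assign each feature vector to its nearest centroid, then recompute each centroid as the mean of its group). The function names, the random initialization, and the stopping rule are illustrative assumptions, not a description of our implementation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def objective(vectors, centroids):
    """D(k): sum over all vectors of the squared distance to the nearest centroid."""
    return sum(min(squared_distance(v, c) for c in centroids) for v in vectors)


def kmeans(vectors, k, iterations=100, seed=0):
    """Return k centroids found by Lloyd's iteration (assumed variant)."""
    rng = random.Random(seed)
    # Assumption: initialize the centroids with k distinct sample positions.
    centroids = [tuple(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k),
                          key=lambda j: squared_distance(v, centroids[j]))
            groups[nearest].append(v)
        # Update step: move each centroid to the mean of its group.
        updated = []
        for j, group in enumerate(groups):
            if group:
                dim = len(group[0])
                updated.append(tuple(sum(v[d] for v in group) / len(group)
                                     for d in range(dim)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty group
        if updated == centroids:              # converged: no centroid moved
            break
        centroids = updated
    return centroids


# Usage: two well-separated groups of 2-D feature vectors.
features = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
means = kmeans(features, k=2)
print(sorted(means), objective(features, means))
```

Each iteration cannot increase D(k), so the loop stops at a local minimum; different initializations may give different groupings on real data.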
Figure 1: Top matches for a sample car image query. The database contains 20,000 general-purpose crawled images. The upper left corner is the query image. The second image in the first row is the best match using region based color feature extraction.
Figure 2: Top matches for a sample query. The database contains 20,000 general-purpose crawled images. The upper left corner is the query image. The second image in the first row is the best match using the global feature extraction technique.
Figure 3: Top matches for two other sample queries.
... of all the remaining images were recorded. For every query, we computed the precision within the first 50 retrieved images, as shown in Figure 5. The results are recorded in Table 1 for the different categories of images in our database. IRRM achieved better precision; the results shown are the images similar to the given query image.

We carried out similar evaluation tests for color histogram matching. We used the HSV color space and a similar matching metric to extract region based color histogram features and match them against the image database. Table 1 compares the performance of the Image Region-to-Region Matching (IRRM) method and the Global Feature Extraction (GFE) method. The IRRM feature extraction method performs better than the GFE color histogram system because of the separation between the global feature space and the region based feature space obtained with more filled bins. However, with the global feature space color histogram it is impossible to obtain matches on large databases. The region based color histogram approach gives better search accuracy than the global feature extraction system.

Figures 1 and 2 show the results of sample queries. Because of space limitations, we show only two rows of images with the top 19 matches to each query. In the next section, we provide numerical evaluation results by semantically comparing several systems.
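The precision within the first 50 retrieved images, used in the evaluation above, can be sketched as follows. This is a minimal sketch: the function names and the image identifiers are hypothetical, and precision is assumed to be divided by the number of images actually returned when fewer than the cutoff are available.

```python
def precision_at(retrieved, relevant, cutoff=50):
    """Fraction of the first `cutoff` retrieved images that are relevant."""
    top = retrieved[:cutoff]
    if not top:
        return 0.0
    hits = sum(1 for item in top if item in relevant)
    return hits / len(top)


def recall_at(retrieved, relevant, cutoff=50):
    """Fraction of all relevant images found in the first `cutoff` results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in set(retrieved[:cutoff]) if item in relevant)
    return hits / len(relevant)


# Usage with hypothetical image identifiers.
ranked = ["car_01", "car_07", "dog_03", "car_02"]
ground_truth = {"car_01", "car_02", "car_07", "car_09", "car_11"}
print(precision_at(ranked, ground_truth, cutoff=4))  # 0.75
print(recall_at(ranked, ground_truth, cutoff=4))     # 0.6
```

Reporting both values for the same cutoff makes the trade-off between precision and recall visible for each query.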
In our experiment, we normalized the images to a common size. This assumption may not always be ideal. The set of features for a particular image category is determined based on the perception of the users. For example, shape-related features are not used. A major difficulty in feature selection is the lack of information about whether any two images in the database match each other.
To explore feature selection, we conducted a few experiments to measure recall and precision. The main limitation of our current evaluation results is that they are based mainly on precision or variations of precision. In practice, a system with a high overall precision may have a low overall recall. Precision and recall often trade off against each other.

... network bandwidth. This helps to reduce the bandwidth used on the wide area network and to speed up the transmission rate.

REFERENCES
[1] http://images.google.com/
[2] http://www.ditto.com/
[3] http://www.altavista.com/
[4] Michael J. Swain, Charles Frankel, and Vassilis Athitsos, “WebSeer: An Image Search Engine for the World Wide Web”, Technical Report 96-14, 1997.
[5] J. R. Smith, “Integrated Spatial and Feature Image Systems: Retrieval, Compression and Analysis”, PhD thesis, Graduate School of Arts and Sciences, Columbia University, February 1997.
[6] S. Sclaroff, L. Taycher, and M. La Cascia, “ImageRover: A content-based image browser for the World Wide Web”, In Proceedings IEEE Workshop on Content-based Access of Image and Video Libraries, June 1997.
[7] W. Niblack, R. Barber, et al., “The QBIC project: Querying images by content using color, texture and shape”, In Proc. SPIE Storage and Retrieval for Image and Video Databases, Feb. 1994.
[8] M. L. Kherfi, D. Ziou, and A. Bernardi, “Image Retrieval from the World Wide Web: Issues, Techniques, and Systems”, ACM Computing Surveys, Vol. 36, No. 1, March 2004, pp. 35-67.
[9] Michael J. Swain and Dana H. Ballard, “Color Indexing”, International Journal of Computer Vision, Kluwer Academic Publishers, Netherlands, 7:1, pp. 11-32, 1991.
[10] Michael Ortega, Yong Rui, Kaushik Chakrabarti, Sharad Mehrotra, and Thomas S. Huang, “Supporting similarity queries in MARS”, In Proc. of ACM Conf. on Multimedia, 1997.
[11] Y. Chen, J. Z. Wang, and R. Krovetz, “Content-based image retrieval by clustering”, In Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, pp. 193-200, ACM Press, 2003.
[12] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, et al., “Query by Image and Video Content: The QBIC System”, IEEE Computer, vol. 28, no. 9, 1995.
[13] K. Karu, A. K. Jain, and R. M. Bolle, “Is there any Texture in the image?”, Pattern Recognition, vol. 29, pp. 1437-1446, 1996.
[14] J. Li, J. Z. Wang, and G. Wiederhold, “Integrated Region Matching for Image Retrieval”, Proc. ACM Multimedia Conference, pp. 147-156, Los Angeles, ACM, October 2000.