Welcome to Scribd, the world's digital library. Read, publish, and share books and documents. See more
Standard view
Full view
of .
0 of .
Results for:
P. 1
13 Paper 30121111 IJCSIS Camera Ready Paper Pp. 74-79

# 13 Paper 30121111 IJCSIS Camera Ready Paper Pp. 74-79

Ratings: (0)|Views: 16|Likes:

### Availability:

See more
See less

08/31/2014

pdf

text

original

(IJCSIS) International Journal of Computer Science and Information Security,Vol. 10, No. 1, 201
2

Image Retrieval Using Histogram Based Bins of Pixel Counts and Average of Intensities
H. B. Kekre
Sr. ProfessorDepartment of Computer Engineering,NMIMS University,Mumbai, Vileparle, Indiahbkekre@yahoo.com
Kavita Sonawane
Ph. D. Research Scholar,Department of Computer EngineeringNMIMS University,Mumbai, Vileparle, Indiakavitavinaysonawane@gmail.com
Abstract
—This In this paper we are introducing a noveltechnique to extract the feature vectors using color contents of the image. These features are nothing but the grouping of similarintensity levels in to bins into three forms. One of its formincludes count of number of pixels, and other two are based onbins average intensity levels and the average of averageintensities of R,G and B planes of image having some similarityamongst them. These Bins formation is based on the histogramsof the R, G and B planes of the image. In this work each imageseparated into R, G and B planes. Obtain the histogram for eachplane which is partitioned into two, three and four parts suchthat each part will have equal pixel intensity levels. As the 3histograms are partitioned into 2, 3and 4 parts we could form 8,27 and 64 bins out of it. We have considered three ways torepresent the features of the image. First thing we taken intoconsideration is the count of the number of pixels in theparticular bin. Second thing considered is calculate the averageof the R, G and B intensities of the pixels in the particular binand third form is based on average distribution of the totalnumber of pixels with the average R, G, B intensities in all bins.Further some variations are made while selecting these bins inthe process where query and database images will be compared.To compare these bins Euclidean distance and Absolute distanceare used as similarity measures. First set of 100 images havingless distances between their respective bins which are sorted intoascending order will be selected in the final retrieval set.Performance of the system is evaluated using the plots obtainedin the form of cross over points of precision and recallparameters in terms of percentage retrieval for only out of first100 images retrieved based on the minimum distance.Experimental results are obtained for augmented Wang databaseof 1000 bmp images from 10 different categories which includesFlowers, Sunset, Mountain, Building, Bus, Dinosaur, Elephant,Barbie, Mickey and Horse images. We have taken 10 randomlyselected sample query images from each of the 10 classes. Resultsobtained for 100 queries are used in the discussion.
Keywords-component; Histogram, Bins approach, Image retrieval,CBIR, Euclidean distance, Absolute distance.

I.

Introduction (Heading 1)This paper describes the new technique for Content BasedImage Retrieval based on the spatial domain data of the image.CBIR systems are based on the use of spatial domain orfrequency domain information. Many CBIR approaches useslocal and global information such as color, texture, shape,edges, histograms, histogram bins etc to represent the featurevectors of the images [1], [2], [3], [4], [5]. Color is the mostwidely used visual feature which is independent of the imagesize and orientation. Many researchers have used colorhistograms as the color feature representation of the image forimage retrieval. Most of these techniques are using global orlocal histograms of images, some are using equalizedhistogram bins, some are using local bins formation methodusing histograms of multiple image blocks [6], [7], [8], [9].Main idea used in this paper is instead of changing theintensity distribution of the original image by taking theequalized histogram [10], [11]; we are using the originalhistograms of the image as it is. We are separating the imageinto R, G and B planes; obtain the histogram for each planeseparately which is partitioned into two parts having equalpixel intensities. By taking R, G and B value of each pixelintensity of an image we are checking in which of the twoparts of R, G, B histograms it falls respectively and then thebin for that pixel will be finalized where it will be counted[12]. Second thing we are taking into account is the intensitiesof the pixels in each of the 8 bins and new set of 8 bins isobtained in which each bin has the count of average of R, G, Bintensity values of each pixel in that bin. A little variation ismade in second types of bins is that we are taking average of average R, G, B values of all pixels in the respective bin countand a third set of bins holding average of average is formed.After analyzing the results of 8 bins, we have increased the noof bins from 8 to 27 and 64 by dividing the histogram of eachplane into 3 and 4 parts respectively. Once the bins formationis done comparison process is performed to obtain the resultsand evaluate the system performance. Comparison of queryand database images requires similarity measure. It issignificant factor which quantifies the resemblance in databaseimage and query image [13],[14]. Depending on the type of features, the formulation of the similarity measure variesgreatly The different types of

distances which are used bymany typical CBIR systems are Mahalanobis distance [15],intersection distance [16], the Earth mover’s distance (EMD),Euclidian distance [15], [17], and Absolute distance [19]. Inthis paper we are focusing on Euclidean distance and absolutedistance as similarity measures, using this we are calculatingthe distance between the query and 1000 database imagefeature vectors. These distances are then sorted in ascending

(IJCSIS) International Journal of Computer Science and Information Security,Vol. 10, No. 1, 201
2

order from minimum to maximum Out of these 1000 sorteddistances images with respect to create these components,
first100 distances in ascending order are selected as images retrievedas there are 100 images of each class in the database [18].
Number of relevant images in these 100 images gives us theprecision and recall cross over point (PRCP), which is theperformance evaluation parameter of the system.This paper is organized as follows: Section 2 will discussthe algorithmic view of the CBIR system based on 8, 27 and64 bins using histogram plots. Section 3 describes the Role of the similarity measures in the CBIR system. Section4.highlights the experimental results obtained along with theanalysis. Finally section 5 summarizes the work done alongwith their comparative study.II.

A
LGORITHMIC
V
IEW OF
B
INS
F
ORMATION
A.

Feature Extraction and Formation of Feature Databases
Figure 1. Feature vector Database Formation
Bins Formation Process: 8 BinsStep1
. Spilt the image into R, G and B planes.
Step2
. Obtain the histogram for each plane.
Step3
. Divide each histogram into 2 parts and assign a uniqueflag to each part.
Step4
.
To extract the color feature of the image, pick up theoriginal image pixel and check its R, G and B values find outin the histogram that in which range these values exactly falls,based on it assign the unique flags to the r, g and b values of that pixel with respect to the partition of the histogram itbelongs.
Step5.
Count of pixels in the bin: Based on the flags assignedto each pixel with respect to the R, G B values and 2 partitions(e. g. 0 and 1) of the histogram we can have 8 combinationsfrom 000 to 111 which are the total 8 bins”.
B.

Formation of Extended Bins 27 and Bins 64

Formation of 27 and 64 bins feature vector database isextended version of the 8 bins feature extraction process. Herefor 27 bins only difference is in step3 of the above algorithm,here to get 27 and 64 bins we are partitioning the histogramsinto 3 and 4 parts respectively which are named as 0, 1, 2 for27 bins and 0, 1, 2, 3 for 64 bins approach. As explained instep 4 to 5 here also same process is applied and 3 bit flags areassigned to each pixel of the image for which the featurevector is being extracted. For 3 partitions the 3 flag bits (eitherof 0, 1 and 2) can have 27 combinations and for 4 partitionsthe 3 flag bits (either of 0, 1, 2 and 3) can have 64combinations, these are the addresses of the 27 and 64 binsrespectively. Based on this process two feature databases of feature vector size 27 and 64 holding the count of no of pixelsaccording to the r, g, and b intensity values are obtained asBins27_database and Bins64_database respectively.
C.

Variations to Obtain Multiple Feature Databases
As shown in Figure.1 Three different databases for 8, 27 and64 bins can further have 2 different sets of feature vectorsnamed “Count of no of pixels”, “Average of R, G and Bvalues for all pixels in a Bin” which are simply obtained bymodifying the process of extracting the feature vectors ;instead of just taking the count of pixels we have consideredthe significance of actual intensity levels of each pixel in eachof the 8, 27 or 64 bins and taken the average values of them.III.

A
PPLICATION OF SIMILARITY MEASURE
Many similarity measures used in different CBIR systems arestudied [21], [22], [23], [24], [25]. We have used Euclideandistance given in equation (1) and absolute distance inequation (2) as similarity measures in our work to produce theretrieval results. Once the query image is accepted by thesystem it will calculate the Euclidean distance as well asAbsolute distance between the query image feature vector anddatabase image feature vectors. In our system database size is1000 images, so we obtained two sets of results one based oneach similarity measure. When query image will be comparedwith 1000 database images which generate 1000 Euclideandistances and 1000 Absolute distances. These are then sortedin ascending order to select the images having minimumdistance for the final retrieval.
Euclidean Distance :
( )
21
=
=
niiiQI
FI FQ D

(1)
Absolute Distance :
)(
1
I n I QI
FI FQ D
=

(2)Bins Formation8 Bins27 64Count of:Number of PixelsAverage of R, Gand B values for theno of pixels

(IJCSIS) International Journal of Computer Science and Information Security,Vol. 10, No. 1, 201
2

Final Retrieval Process
Images having less distance are to be selected in the final set.For this we kept one simple criterion that we are taking firstminimum 100 distances from the sorted list and correspondingimages of those distances only taken into the final retrieval set.Same process is applied for all the features databases usingboth similarity measuresIV.

E
XPERIMENTAL
R
ESULTS
A
ND
D
ISCUSSIONS

A.

Database and Query Images
Experimental set up for this work uses 1000 BMP imagesincludes 10 different classes where each class has 100 imageswithin it. The classes we have used are Flower, Sunset,Mountain, Building, Bus, Dinosaur, Elephant, Barbie, Mickeyand Horse images. Feature vectors for all these images arecalculated in advance using different methods described abovein section 2 and multiple feature databases are obtained.Query is given as example image to this system. Once thequery enters into the system feature vectors using all differentways will be extracted and will be compared with therespective feature vector databases by calculating the Euclideandistance and Absolute distance between them. Selection of query images is from the database itself; it includes 10 imagesfrom each class means total 100 images are selected to be givenas query to the system for all the approaches based onvariations in bins formation to test and evaluate theirperformance. Sample Images from the database is shown inFigure 2.

Figure 2. Sample Database Images from 10 Different Classes(Database is of Total 1000 bmp images from above 10 classes, includes 100from each class
B.

Results, Observations and Comparison
Results using 100 queries are obtained for 3 approaches basedon formation of bins, that are 8 bins, 27 bins and 64 bins whereeach approach includes the 2 variations while extracting thepixel’s color information to form the feature vector which areclassified as ‘Count of Number of pixels’ and ‘Single average’that is average intensities of the number of pixels in each bin.Results obtained are segregated in three tables as 8 bins, 27bins, and 64 bins. First column of each table is indicating thequery image classes used for the experimentation. Remainingtwo columns are showing the total retrieval results obtained forCount of pixels and Single average approaches with respect toboth the similarity measures that are Euclidean distance (ED)and Absolute distance (AD). Percentage retrieval is shown inChart 1, 2 and 3 for 8, 27 and 64 bins respectively. Since thereare 100 images of each class in the database percentageretrieval will be a cross over point of precision and recall [26].In Table 1 we can see the total and average of retrieval of 10 queries from each of the 10 classes. In all the three results,results based on just the count of pixels are poor as compare tothe other approaches. Results obtained for Single_Average arefar better than ‘Count of Number of Pixels’. We can note downthe two sets of results are obtained for each approach; one isEuclidean distance and other is for Absolute distance named asED and AD respectively. When we observe these results of EDand AD, we found that AD is giving very good performance asa similarity measure in both the approaches. Chart1 is showingthe percentage retrieval where Single average proving its bestfor the class flower as it shows the highest retrieval that isalmost 55%. After observing the results obtained for 8 bins wethought of extending these bins to 27 which are formed bydividing the histogram of each plane into 3 parts instead of 2parts as in case of 8 bins.
TABLE I. R
ESULTS FOR
8

B
INS AS FEATURE VECTOR
Query ImagesCount Of No of Pixels TotalRetrievalSingle AverageTotal RetrievalED AD ED AD
Flower246 253 480 547Sunset503 504 458 460Mountain161 170 236 252Building171 168 219 240Bus404 413 455 481Dinosaur216 234 375 342Elephant187 180 303 301Barbie165 173 289 273Mickey277 286 492 475Horse374 369 463 468Average of 100 queries2704 2750 3770 3839