Relevance Feedback
Master's Thesis
By
Beirut, Lebanon
ENGINEERING
Approved by:
Supervisor
DEDICATION
I would like to dedicate this project to my family for their support and care
throughout the different stages of the project, to my instructor Dr. Ismail who always tried his
best to bring out the best in me, and of course to the Almighty God for His daily limitless
blessings.
Zahraa Loubany
I dedicate the fruit of this hard work to my family and friends who gave me the
needed support, to Dr. Ismail who pushed me all the way to show the very best in me, and to
our university, LIU, for providing the educational foundation needed to achieve a successful work.
Most of all, the biggest gratitude goes to Allah for helping me all the way.
Dina Balchy
ACKNOWLEDGMENT
We would like to express our gratitude to our supervisor Dr. Ismail El-Sayad for his
invaluable assistance, support, and supervision. Dr. El-Sayad has been beside us through the
whole journey, offered constant assistance and regular supervision to help us achieve this
project, and never hesitated to give us the information and advice needed to attain this level.
Gratitude is also shared with our Thesis Committee Members: ‘Dr. Samir Omar’ and ‘Dr.
High appreciation goes to our parents, who gave us all the needed support and confidence
Finally, we would like to express our gratitude to Almighty God for granting us strength and
ABSTRACT
Relevance Feedback. In this report, the image is represented at a higher level based
on the users' feedback as they evaluate the veracity and accuracy of the retrieved images.
The visual data (hereafter called visual descriptors) are extracted using TOP-SURF,
while the textual data (textual descriptors) are extracted from the TF-IDF values of the
annotated tags of the images. The use of the Bag of Visual Words (BoVW) approach is
motivated by the success of the Bag of Words model in textual classification and retrieval.
BoVW represents an image by all the visual words that describe it or can be generated from
it. The empirical distribution of these words is captured in a histogram. Relevance
assignment is performed by the user; the system in turn considers the images with a
high relevancy value to enhance the image representation according to a weighted schema.
TABLE OF CONTENTS
DEDICATION
ACKNOWLEDGMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
TABLE OF EQUATIONS
LIST OF TABLES
LIST OF SYMBOLS
CHAPTER 1 INTRODUCTION
1.1 Background
2.1 Introduction
2.3.2.2 Texture Feature
2.3.2.3 Shape
3.1 Introduction
3.2.1.2 Algorithm
3.2.2 A Proposed Log-based Relevance Feedback Technique in CBIR using Positive and Negative Examples
3.2.2.2 Algorithm
REFERENCES
LIST OF FIGURES
Figure 3.1: Semantic Image Retrieval Using Relevance Feedback Block Diagram
Figure 4.1: General Overview of the CBIR Proposed System with RF
TABLE OF EQUATIONS
LIST OF TABLES
LIST OF SYMBOLS
RF : Relevance Feedback
TF : Term Frequency
IR : Information Retrieval
CHAPTER 1
INTRODUCTION
1.1 Background
Over time, the amount of digital image collections has grown dramatically due to
the rapid increase of online users and web applications. This considerable growth reflects
the fact that images are one of the easiest ways to convey information and reach the
audience smoothly, which has made the demand for images grow significantly.
With the popularity of social media applications, it is no surprise that images are
becoming increasingly important for content sharing and viewing. Generally, audiences and
readers like to visualize stories, not just read them. Images illustrate ideas for the
readers, so the overall experience is more tangible and less demanding on the reader's
attention. Visual content engages, inspires, and sparks the readers' interest more readily
than comprehensive texts.
The importance of web applications, especially social media, is undeniable. People
spend most of their time navigating browsers, checking news, and keeping an eye on their
friends' news feeds, which most of the time consist of publicly shared pictures. Moreover,
the influence of video games, television, and photographs has also contributed to this rise.
In this context, the development of suitable systems to appropriately manage these massive
loads of images has become essential. Efficient systems for managing this load use
Content-Based Image Retrieval (CBIR), a common technique created to solve the main
problems of query by text, which is commonly known as TBIR (Text-Based Image Retrieval).
In TBIR [Pet] systems, images are annotated manually by textual tags, which are
used in the image retrieval process by a database management system. The user provides a
query in terms of keywords, and the system in turn retrieves all images matching those keywords.
Any image found within the system database is manually annotated with textual tags;
however, this approach still has some difficulties. Firstly, the manual annotation of a huge
number of images requires an exhaustive and considerable amount of human labor. Secondly,
the labeled images may hold unexpressed feelings and emotions that cannot be described by
textual phrases. Thirdly, the manual annotation of images may significantly lack accuracy
due to the subjectivity of human cognition.
CBIR systems, also known as query by image content (QBIC), were introduced to solve
such problems by taking a query image instead of a text as input. The system then seeks
images that are highly close to the query image in color, texture, or shape.
Training images represent all images found within the database. Each training image is
represented by a vector of features, also known as code words. Similarly, any query image
is represented in the same way so that common features (similarities) are easily
distinguished. Similarity detection is performed by measuring the Euclidean distance
between the feature vector of the query image and those of the training images.
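As an illustrative sketch of this similarity step, consider the toy example below with short feature vectors; the thesis's actual descriptors come from TOP-SURF and are much longer, so the vectors and rankings here are purely hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query_vec, training_vecs):
    """Return training-image indices ordered from most to least similar."""
    distances = [(i, euclidean_distance(query_vec, v))
                 for i, v in enumerate(training_vecs)]
    return [i for i, _ in sorted(distances, key=lambda p: p[1])]

# Toy example: three training images, one query
training = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [1.0, 0.05]
print(rank_by_similarity(query, training))  # -> [0, 2, 1], nearest first
```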
Great advantage can be taken of Relevance Feedback (RF), which can be implemented
within the CBIR technique. RF adds interaction between the system and the users after the
search and retrieval operation. This feedback could be a reordering of the yielded results,
a word description, or a rating mark given to each image according to its relevancy to the
query image; the rating approach is our main concern.
Our framework can thus be generally summarized as offering a higher-level image
representation using the users' relevance feedback, which improves the system's performance.
A major difficulty for TBIR is the incompatibility of keyword assignments
among different annotators. Each annotator may interpret the content of an image in
distinct text phrases depending on their own point of view. TBIR stores these annotations
in the form of keywords or textual phrases together with the image. Most TBIR systems
search the text surrounding the image for keywords that are physically close to it,
relying on the assumption that the image is usually described by its surrounding text.
Famous search engines that use this technique are Google, Yahoo and, long ago,
AltaVista [Neh13].
CBIR, an alternative mechanism for image search, emerged in the early
1990s. Instead of manually assigned phrases, CBIR systems use the visual content of the
images, such as color, texture, and shape features. CBIR systems mainly use low-level
features that are automatically extracted by the computer using computer vision techniques,
while humans use high-level features, which are mainly the concepts lying behind the
scenes. The difference between these two paradigms of feature extraction is known as the
'semantic gap'. Learning users' intention and their level of satisfaction through relevance
feedback is one of the major techniques for diminishing the semantic gap [Neh13].
Our main purpose is to exploit the feedback taken from the user to create a reordered
form of the retrieved images, and thus a better result for the user's query. We also aim to
enhance the visual description of the image by filtering the noisy visual words from the
image. The implementation is done using Visual Studio with the aid of the TopSurf image
descriptor. All data are stored using SQL Server 2014 as the relational database
management system.
To do so, two hierarchical approaches will be developed. The first is performed by the
user, who is asked to give feedback according to a voting system; this is a simple way
to capture the user's response or satisfaction without any complicated process. The second
approach is done by the system, by comparing the features of the relevant images and
updating the image representation accordingly.
Briefly, the project aims to enhance the results retrieved by a CBIR system and to
improve the image representation at the level of the BoVW by taking advantage of RF.
The project provides an extensive description of our work through five chapters.
Chapter two introduces CBIR as a searching technique for digital images in large databases,
together with an inclusive description of the BoVW model.
Chapter three presents the literature review. The methodologies adopted by
previous related works are presented, followed by a brief description of the algorithms
used, a general comparison of the methods, and a listing of the advantages and
disadvantages of each. Chapter four covers the construction of our implementation by
demonstrating the algorithms, then analyzing the effect of various parameters and factors
to determine which give the best results.
Finally, in chapter five we draw a conclusion of our entire work and briefly outline the
future work.
CHAPTER 2
2.1 Introduction
This chapter introduces a general review of image processing and of CBIR, which
relies on low-level features. This system involves three main steps that are explained
extensively: image segmentation, low-level feature extraction, and similarity matching,
stated respectively. Furthermore, the Bag of Visual Words (BoVW) approach is demonstrated
and introduced.
Image processing is the performance of operations on an image after converting it into
digital form, in order to enhance its presentation or to extract useful information from it,
such as feature extraction. It is a type of signal processing where the input may be a
video frame, image, or photograph, and the output may be an image or a set of
characteristics associated with it. Image processing is among the rapidly growing
technologies nowadays, with several applications in various aspects of business, remote
sensing, medical interpretation, and other applications associated with image retrieval. An
image retrieval system is a system which permits the user to browse, search, and retrieve
images from a large database.
2.3 Content Based Image Retrieval- CBIR
CBIR is a retrieval process that collects in-demand images from a huge database based
on the visual information of the query image. The retrieval of a particular image from a
large database is mainly affected by general factors such as color, texture, shape, and
local features. The extraction of these features is automatically performed by the system
using computer vision techniques; hence they are known as low-level features.
To perform CBIR, we have to go through the low-level image features as a first step;
they are considered the base of CBIR systems. Features can be extracted from the whole
image or after applying segmentation. The latter process is called region-based image
retrieval (RBIR), a special type of CBIR in which a region is a part of the image with
homogeneous low-level features. RBIR is preferred by users interested in specific regions
of an image.
To execute RBIR, the first step is to perform image segmentation. Through this step,
low-level features can be extracted from the segmented regions; similarity matching is then
performed on these features. Over the years, several segmentation techniques have been
proposed, such as curve evolution [HFe01], energy diffusion [WYM97], and graph partitioning
[19]. Various existing segmentation techniques work properly only for images containing
homogeneous color regions, and these are mainly used in retrieval systems that deal only
with colors [PLS03][KAH99], such as direct clustering methods in color
space [DCo97]. However, natural scenes are
substantially rich in both texture and color, and a broad range of natural pictures can be
properly segmented only when both are considered.
The majority of systems build their own segmentation technique in order to obtain,
through the stage of segmentation, the required region features, which could be color,
texture, or both. Such algorithms are mainly based on k-means feature clustering [9].
Firstly, an image is sectioned into 4x4 blocks from which color and texture features are
extracted. Secondly, k-means clustering is applied to group the features into independent
classes, each corresponding to one region; blocks within the same class correspond to the
same region. K-means is commonly chosen for its simplicity and speed.
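The block-clustering idea above can be sketched with a minimal k-means implementation. This is an illustrative sketch only: the toy one-dimensional "block features" stand in for real color/texture vectors, and the initialization and iteration counts are our assumptions.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means clustering: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(points, k)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(d) / len(members) for d in zip(*members)]
    return centroids, labels

# Toy "block features": e.g. the mean intensity of each 4x4 block
blocks = [[10.0], [12.0], [11.0], [200.0], [205.0], [198.0]]
_, labels = kmeans(blocks, k=2)
# Blocks with similar features fall into the same class (region)
print(labels)
```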
Primal features characterizing image content, such as color, texture, and shape, are
automatically extracted from images and used in content-based visual queries. Various
algorithms have been proposed; however, our main focus will be on the features used in RBIR.
The color feature is one of the most important and widely used features in image
retrieval and representation. Colors are assigned according to multiple color spaces, which
serve several applications; descriptions of several color spaces can be found in [KNP00].
Color spaces closer to human perceptual abilities are widely used in RBIR. These spaces
include RGB (red, green, and blue), LAB (lightness plus the two color dimensions a and b),
CMY (cyan, magenta, and yellow), CMYK (cyan, magenta, yellow, and black), HSV (hue,
saturation, and value), and YCrCb (luminance and the red and blue chroma components
respectively) [PLS03][YLi04].
Common color features in retrieval systems include the color histogram, color moments,
and the color coherence vector [FJi03][CCa02], all considered descriptors. Color is a
crucial feature as it is invariant with respect to scaling, translation, and rotation of an
image. Color space, color quantization, and similarity measurement are indispensable key
components of color feature extraction; however, they are not directly related to
high-level semantics.
Notably, in most CBIR systems the color images do not undergo preprocessing even
though they are usually affected by noise, which may be introduced by camera sensors or
capturing devices. Hence, improving the retrieval accuracy necessitates applying an
appropriate preprocessing (denoising) step.
The texture feature is not as well defined as the color feature. Texture describes
the content of many real-world images, such as fruits, clouds, skin, trees, sky, and
fabrics; therefore, texture is a significant feature for image retrieval, as it helps
characterize the high-level semantics of an image. It depends on the intensity distribution
over the whole image and is not defined for a single pixel. Among the various texture
features, Gabor features and wavelet features are widely used for image retrieval, and they
match the results of human vision studies. Note that texture analysis by means of Gabor
filters is a special case of the wavelet approach.
2.3.1.3 Shape
Along with texture and color features, image comparison also considers the shape of
objects. Various methods are used for shape representation, classified into external and
internal: external methods represent the boundary, while internal ones represent the pixels
encompassing the region. Simple global shape descriptors include measures such as
compactness. The grid-based method is a clustering approach commonly used for object shape
description using a multi-resolution grid data structure.
Some objects or scenes may hold information with close color and texture features but
different spatial locations. For instance, 'sky' and 'sea' share a common color, blue, but
differ in their usual locations: sky is usually located at the upper part of the image,
while sea is at the bottom. Generally, spatial locations are defined as top, upper, or
bottom according to the location of the region in the image. Directional relationships
alone are not sufficient for representing the semantic content of the image; topological
relationships must be taken into consideration as well.
The BoVW (Bag of Visual Words) model is motivated by the achievements attained by
using BoW (Bag of Words) in document classification and retrieval. Each document in the
BoW model is represented by a set of unordered common words present in the document. These
words are formally represented by a histogram of frequency of occurrence for each word,
which is used for document retrieval and classification. Analogously, an image is
represented by a set of visual words.
The BoVW model represents the image by all the visual words that describe it or can be
derived from it. The empirical distribution of these words is captured with a histogram
that counts how many times each word appears in the image.
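A hedged sketch of how such a histogram could be built, assuming the visual-word IDs have already been extracted from the image (the IDs below are made up for illustration):

```python
from collections import Counter

# Hypothetical visual-word IDs detected in one image
visual_words = ["w3", "w1", "w3", "w7", "w1", "w3"]

# The empirical distribution of words, captured as a histogram
histogram = Counter(visual_words)
print(histogram["w3"], histogram["w1"], histogram["w7"])  # -> 3 2 1
```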
While performing visual word extraction, foreground and background features may be
mixed together. By definition, foreground features represent the part of a scene or picture
that is nearest to and in front of the viewer, lying closest to the picture plane, whereas
background features represent the parts farthest from the viewer, which attract little
attention. To avoid this feature mix-up, interest point detection can be employed.
In the retrieval and classification process, researchers have recently been using
interest point detection. This methodology refers to the detection of interest points in an
image and is relevant for higher-level processing. Interest points represent remarkable and
conspicuous image patches that hold salient information or a noticeable object. These
points are commonly used by image stabilization and structure-from-motion applications to
track how the image changes from one frame to another.
Several techniques have been proposed for interest point detection. Corners are a
natural choice since they are easy to identify inside images with detectors such as Harris
and KLT, which find corners by looking through a small window: as the window is shifted in
any direction, the intensity changes noticeably, and thus the interest point is detected.
Moreover, blob detection has shown great results in detecting points across different
scales by localizing the centers of blobs.
Each interest point is then represented with a local descriptor, such as SURF, that
describes the local information. The descriptors are then grouped into clusters, where
similar descriptors lie within the same cluster. Each cluster is considered a visual word,
and thus we obtain a dictionary of visual words that describes all kinds of image patterns.
When interest points are mapped into visual words, we can represent an image as a
"Bag of Visual Words", that is, a vector containing the count/weight of each visual word in
the image. Generally, there are four sequential steps for building a Bag of Visual Words.
In this step, features are extracted from the image as interest points. For
instance, the features found in the image of the figure above might be eyes, a window, a
mouth, or hands, each as a local patch. These features are then presented as numerical
vectors called "feature descriptors". One of the most famous descriptors is the SURF
descriptor, which presents each patch as a 64-dimensional vector. Note that there are other
keypoint description techniques such as Harris and SIFT; the SURF algorithm is similar to
SIFT but faster.
The figure above illustrates how the system extracts the features at the detected
interest points. Each training image has its own features that will be extracted in this
manner.
This step is defined as the process of organizing objects into distinct groups whose
members are highly similar to one another. Here, patches are converted into "code words",
analogous to words in text documents, producing a "code book" analogous to a dictionary. A
standard k-means procedure is:
1. Choose k initial cluster centroids.
2. Compute the distance between each patch and the centroids of the clusters.
3. Assign each patch to the cluster with the nearest centroid (minimum distance).
4. Recompute each centroid as the mean of its assigned patches, and repeat from step 2 until the assignments stabilize.
Figure 2.3 – Feature Clustering
After assigning similar features to the same cluster, a codebook containing all visual
words can be formed. All similar features are gathered within the same code word, forming
indexed visual words. The figure above represents a set of different clusters (groups of
similar features).
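Once the codebook exists, mapping a descriptor to its code word is a nearest-centroid lookup, as sketched below. The 2-D codebook here is hypothetical (real SURF descriptors are 64-dimensional):

```python
def nearest_word(descriptor, codebook):
    """Map a feature descriptor to the index of the closest code word."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))

# Hypothetical 2-D codebook with three code words
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(nearest_word([0.9, 0.8], codebook))  # -> 1
```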
At this stage, images are no longer represented as sets of pixels. Instead, each image
can be represented at a higher, more semantically oriented level as a set of patches or
visual words known as a "Bag of Visual Words". Each image can be represented as a vector
whose components correspond to the visual words found in the dictionary, where each
component has its own tf-idf value used as a weighting factor, as shown in the figure below.
Figure 2.4 – Image Representation as Weighted Vector
The "tf" (term frequency) is the number of occurrences of a visual word in the image
divided by the total number of visual words in that image. The other factor, "idf" (inverse
document frequency), is the logarithm of the total number of images divided by the number
of images in which the visual word appears. Accordingly, the tf-idf weighting factor of
each component in the vector is the product of these two factors.
Beyond the vector representation of the image, we can easily visualize a histogram
showing how many features the image has in each cluster, in other words, a histogram
illustrating the frequency of each visual word contained in the image. These histograms are
later used for similarity comparison.
Figure 2.5- Constructing Histograms of Frequency Features
The figure above shows a set of histograms representing the frequency of features of different
images.
We extract features from the tested image and form its corresponding histogram of visual
word frequencies, to be compared with the histograms obtained previously. Through this
approach, matches can be effectively computed: a positive or true match is the training
image whose histogram lies closest to the query's histogram.
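One possible way to compare such histograms is histogram intersection, sketched below; this measure and the toy histograms are illustrative choices of ours, not necessarily what the thesis's system uses.

```python
def hist_intersection(h1, h2):
    """Similarity of two word-frequency histograms (dicts); higher = more alike."""
    words = set(h1) | set(h2)
    return sum(min(h1.get(w, 0), h2.get(w, 0)) for w in words)

def best_match(query_hist, training_hists):
    """Index of the training image whose histogram best matches the query."""
    return max(range(len(training_hists)),
               key=lambda i: hist_intersection(query_hist, training_hists[i]))

training = [{"w1": 5, "w2": 1}, {"w2": 4, "w3": 3}]
query = {"w2": 3, "w3": 2}
print(best_match(query, training))  # -> 1
```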
Figure 2.6 – Evaluating Images Against Obtained Histograms
The BoVW model has proven to be a successful tool for classifying images according to
their content. It is largely immune to the position and orientation of objects in the
image, and it represents all images with fixed-length vectors irrespective of the number of
detections. However, this model still requires further testing for large changes in scale
and viewpoint, since it has not yet been extensively tested for viewpoint and scale
invariance, and its performance still suffers some ambiguity.
Lately, recent retrieval systems have embedded the user's relevance feedback to
further improve the retrieval process and produce more meaningful and related retrieved
images [6]. This online process is optimized by considering the most positive image
selections on each feedback iteration. Through continuous learning and interaction with end
users, the retrieval results improve iteratively:
(1) The system provides an initial set of retrieved images for the query image.
(2) The user evaluates these results as relevant (positive examples) or irrelevant (negative examples).
(3) A machine learning algorithm is applied to learn from the user's feedback; then go back to (2).
Steps (2)-(3) are repeated until the user is satisfied with the results.
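The loop above can be sketched as follows. The learning step here is stubbed with a Rocchio-style query update (moving the query toward the mean of the positive examples); this is our assumption for illustration, not the thesis's exact algorithm.

```python
def retrieve(query, database, top_k=3):
    """Rank database vectors by squared Euclidean distance to the query."""
    d = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return sorted(range(len(database)), key=lambda i: d(query, database[i]))[:top_k]

def refine_query(query, positives, alpha=0.5):
    """Rocchio-style update: move the query toward the mean of positive examples."""
    if not positives:
        return query
    mean = [sum(dim) / len(positives) for dim in zip(*positives)]
    return [(1 - alpha) * q + alpha * m for q, m in zip(query, mean)]

database = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [5.0, 5.0]]
query = [0.2, 0.2]
results = retrieve(query, database)            # step (1)
# Step (2): suppose the user marks images 1 and 2 as relevant
positives = [database[i] for i in results if i in (1, 2)]
# Step (3): learn from the feedback and retrieve again
query = refine_query(query, positives)
results = retrieve(query, database)
print(results)  # -> [1, 2, 0]: the relevant images now rank first
```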
Figure 2.7 shows a general overview of the RF algorithm. The process in step (3) is
performed by the system; that is, the burden of specifying the weights is removed from the
user.
CHAPTER 3
LITERATURE REVIEW
3.1 Introduction
In this chapter, we illustrate Content-Based Image Retrieval using several relevance
feedback approaches. Our focus is on methods that have been adopted in previous retrieval
techniques, describing the methodology considered for each. We then demonstrate in depth
two well-known RF methodologies and explain the algorithm of each.
We conclude by listing the advantages and disadvantages of each considered related work,
drawing out a general conclusion, and paving the way for our proposed approach.
Recently, user feedback has become very popular at many different levels. Any smart
application, employment system, or new business intends to obtain the user's or customer's
feedback in order to improve its services and meet the user's satisfaction. Outlining the
feedback process as well as the desired outcomes is essential for gathering user feedback
properly; otherwise, we may be blindly asking for feedback that will only confuse our
understanding of users' intentions and desires. Hence, embedding RF in retrieval systems is
a natural step.
First, we introduce two related RF paradigms and the methodologies they follow. These
systems are "Semantic Image Retrieval Using Relevance Feedback" and "A Proposed Log-based
Relevance Feedback Technique in CBIR using Positive and Negative Examples".
To reduce the considerable gap between low-level properties and high-level concepts,
this approach embeds relevance feedback in the retrieval process. It is found that the
proposed relevance feedback system, based on the AdaBoost technique applied to relevant and
irrelevant patterns, offers effective retrieval. Extensive trials of the RF technique with
AdaBoost on different databases showed important improvements.
AdaBoost is an effective machine learning algorithm introduced by Freund and Schapire
in 1995 [Yoa]. It is used to enhance classification performance by combining weak learners.
The experiments testing this approach were performed by selecting a query image at random,
after which the retrieved images were obtained; the user is then asked to label each
retrieved image as relevant or irrelevant.
3.1.1.2 Algorithm
In each iteration, the irrelevant images are removed from the database, which helps
optimize the testing data and hence increases efficiency by reducing the retrieval time.
xxxii
Figure 3.1 - Semantic Image Retrieval Using Relevance Feedback
Note that the Canberra distance metric is used to measure similarity. Here x
represents a feature vector from the database, y the query image's feature vector, and both
have the same dimension. This allows computing the average precision and thus assessing the
efficiency of the system.
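A minimal implementation of the Canberra distance between two feature vectors (zero-zero coordinate pairs are skipped by convention to avoid division by zero):

```python
def canberra(x, y):
    """Canberra distance between feature vectors x and y of equal length."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y)
               if a != 0 or b != 0)

print(canberra([1.0, 2.0, 0.0], [3.0, 2.0, 0.0]))  # -> 0.5, i.e. |1-3|/(1+3)
```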
3.1.2 A Proposed Log-based Relevance Feedback Technique in CBIR using Positive and Negative Examples
Relevance feedback has been shown to be an effective model for improving the retrieval
performance of CBIR systems, and it is already considered a crucial component when modeling
any CBIR system. After the user submits a query image, the system returns a set of similar
images that may not be totally relevant to the user's targets. The relevance feedback
mechanism asks the user to mark the relevance of the retrieved images; in turn, the system
modifies the outcome by learning from the user's responses. Previous systems that integrate
the user's responses have only taken positive feedback into account. Here, a log-based
relevance feedback technique for CBIR is used, which merges both positive and negative
feedback.
3.1.2.2 Algorithm
The main idea of RF is to shift the burden of finding the right query formulation from
the user to the system. Figure 3.2 shows the general representation of the planned system.
Figure 3.2 – General Scheme of Proposed System
Initially, the images are supplied to the system and stored in the system's database.
When the user submits the query image, the system in turn extracts the visual features of
the supplied image and compares them with the database images to retrieve the set of images
relevant to the user. On the retrieved set, the user gives feedback on whether the images
are relevant or not. If an image is irrelevant, the system goes back to the previous step;
if it is relevant, the system determines the weight vector of the query image using
short-term learning and builds the semantic space of the image using long-term learning.
In the log-based semantic approach, the similarity space is determined based on the
semantics of an image. The system maintains logs of user feedback, recording the positives
and negatives (relevant and irrelevant) from the retrieved set of images, which are useful
in the next iterations; after n iterations, each image has its own log. This improves the
system's performance and speed. The steps are repeated until the user is satisfied.
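A hedged sketch of the logging idea: accumulate per-image votes across iterations and blend them with visual similarity when ranking. The blending weight and the scoring form are our assumptions for illustration, not the paper's exact formulation, and the image IDs are made up.

```python
from collections import defaultdict

# Feedback log: image id -> accumulated votes (+1 relevant, -1 irrelevant)
feedback_log = defaultdict(int)

def record_feedback(image_id, relevant):
    """Append one user judgment to the long-term log."""
    feedback_log[image_id] += 1 if relevant else -1

def combined_score(image_id, visual_similarity, weight=0.1):
    """Blend visual similarity with the long-term feedback log."""
    return visual_similarity + weight * feedback_log[image_id]

# Two iterations of user feedback
record_feedback("img7", relevant=True)
record_feedback("img7", relevant=True)
record_feedback("img9", relevant=False)
print(combined_score("img7", 0.5))  # 0.5 + 0.1 * 2  -> boosted
print(combined_score("img9", 0.5))  # 0.5 - 0.1      -> demoted
```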
The method used in "Semantic Image Retrieval Using Relevance Feedback" shows a fast
increase in retrieval performance with each iteration, using the AdaBoost learning
algorithm. The method adopted by "A Proposed Log-based Relevance Feedback" has the major
advantage of integrating short-term learning with long-term learning.
The advantages and disadvantages of the two considered methods are summarized below.
The related works are numbered 1 and 2 for "Semantic Image Retrieval Using Relevance
Feedback" and "A Proposed Log-based Relevance Feedback" respectively.

Method 1
Advantages: fast improvement in retrieval performance with each feedback iteration;
performance enhancement starting from 57.2% and increasing with every iteration.
Disadvantages: no enhancement on the level of image representation; lacks the presence of
a voting system.

Method 2
Advantages: integrates short-term learning with long-term learning; handles a large
database which contains similar images.
Disadvantages: no enhancement on the level of image representation; lacks a voting system
that allows the user to express satisfaction.
Although image retrieval systems have achieved high and efficient retrieval outcomes,
these systems still lack direct interaction between the user and the system, and the
absence of such interaction limits the relevance of the results.
We believe that effective CBIR requires integration between the retrieval models found in
the Information Retrieval (IR) literature and the features extracted in the image
processing area. For this purpose, it is necessary to increase the level of accuracy in
such systems in order to return the desired results for users.
Relevance feedback in CBIR has gained great popularity in the field of image
processing and retrieval. This approach sparked the attention of many researchers involved
in the enhancement of retrieval systems, and in the past few years research has
increasingly focused on it. Henceforth, implementing RF in CBIR systems is a major key to
improving retrieval accuracy.
From here, our objective is to offer an improved content-based image retrieval system
using the users' relevance feedback, which improves the system's performance. The system is
implemented using Visual Studio and Microsoft SQL Server 2014.
By coding and testing algorithms, we will be able to accurately study the system's
behavior. The next chapter will introduce our methodology for interlacing the RF of the user
into an image retrieval system. The main deliverable of the proposed approach is a list of
retrieved images with the highest degree of relevancy to the image query requested by the
user.
CHAPTER 4
IMPLEMENTATION
4.1 Introduction
In this chapter, the implementation of the proposed system is demonstrated in detail.
Three processes are mainly introduced: two performed by the system and one performed by
the user. First, the system performs the retrieval process, followed by the feedback process
done by the user. After the feedback submission, the system updates the order of the
retrieved images and rearranges them based on TopSurf, particularly the tf-idf values of each
descriptor. A Bag of Words (BoW) is a list of words with their word counts, generally
represented as a table. Each row represents a document, in this case an image, and each
column represents a visual word. The cells hold the count of each word in an image. The
figure below illustrates this representation.
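Such a table can be sketched in a few lines. The thesis builds this structure in C# over TOP-SURF visual words; the image names and word identifiers below are purely illustrative.

```python
# Toy bag-of-visual-words table: rows are images, columns are visual
# words, and each cell holds the count of that word in that image.
images = {
    "img1": ["w3", "w1", "w3", "w7"],
    "img2": ["w1", "w1", "w7"],
}

# The vocabulary is the set of all visual words seen in any image.
vocabulary = sorted({w for words in images.values() for w in words})

# One histogram row per image, one column per visual word.
bow_table = {
    name: [words.count(w) for w in vocabulary]
    for name, words in images.items()
}

print(vocabulary)            # column headers: the visual words
for name, row in bow_table.items():
    print(name, row)         # word counts for this image
```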
Term frequency–inverse document frequency (tf-idf) is an alternative way to estimate the
subject of an image by the visual words it contains. With tf-idf, visual words are given
weights rather than raw counts. Generally, the tf-idf value is composed of two factors. The
first is the normalized Term Frequency (TF), i.e. the number of occurrences of a visual word
in the image divided by the total number of visual words in that image. The second is the
Inverse Document Frequency (IDF), computed as the logarithm of the total number of images
divided by the number of images in which that visual word appears.
tf(v) = (Number of times visual word v appears in an image) / (Total number of visual words
in the image)
idf(v) = log(Total number of images / Number of images with visual word v in them)
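The two factors above can be sketched directly. The thesis computes these values in C# over TOP-SURF descriptors; this Python sketch uses illustrative word lists.

```python
import math

def tf(word, image_words):
    """Normalized term frequency: occurrences of the visual word divided
    by the total number of visual words in the image."""
    return image_words.count(word) / len(image_words)

def idf(word, all_images):
    """Inverse document frequency: log of the total number of images
    divided by the number of images containing the visual word."""
    containing = sum(1 for words in all_images if word in words)
    return math.log(len(all_images) / containing)

# Illustrative visual-word lists for three images.
images = [["w1", "w2", "w1", "w3"], ["w2", "w3"], ["w3"]]
print(tf("w1", images[0]))                      # 0.5 (2 of 4 words)
print(idf("w3", images))                        # 0.0 (appears in every image)
print(tf("w1", images[0]) * idf("w1", images))  # tf-idf weight of w1
```

A word that appears in every image gets an idf of zero, so it contributes nothing to distinguishing images, which is exactly the weighting behavior described above.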
We then evaluate the system using the Relevance Feedback after we study its performance
with respect to precision and the user's satisfaction.
The proposed system we intend to implement is illustrated in the figure below. In its
final step, each user reviews the rearranged form of the results after marking each retrieved
image as relevant or irrelevant.
The system will rearrange the set of retrieved images according to the RF performed
by various users. The feedbacks of all users are taken into account in every single iteration,
so that the user is able to visualize both the original order of the retrieved images produced
by the system and the rearranged one.
4.3 Implementation/Simulation Tools
A main design consideration for our retrieval software was the programming language
that best suits our simulation. While various options such as C++ and Java were considered,
we chose C# due to the availability of libraries for image processing, and Microsoft SQL
Server for the database due to its capability in dealing with large databases.
Using the open-source TOP-SURF code was our starting point. TOP-SURF is an
image descriptor that combines interest points with visual words, thereby enhancing its
performance. TOP-SURF offers flexibility in varying the descriptor size and supports very
efficient image matching. Besides visual word extraction, visualization, and comparison, it
also provides a high-level API and very large pre-computed visual word dictionaries. Since
the original SURF [29] descriptor is closed source, we used OpenSURF as an open-source
alternative; it depends on OpenCV, which is released under the BSD license.
For better database management, our work is connected to Microsoft SQL Server 2014,
which contains all the tables and procedures needed for executing the retrieval process and
engaging the RF. Note that in the SQL software, we have used one table and three stored
procedures, as shown in the figure below.
Figure 2
Figure 2 shows that the table includes Rating_ID as the primary key along with the
other attributes used by the database procedures. The data type of the images is chosen to be
varbinary, because the size of images varies considerably; the MAX specifier indicates that
the column can hold values up to the maximum storage size. The dataset contains images
from four different categories (babies, cats, flowers, and musical instruments), all of which
are stored in the database.
The very first step in the proposed system is to extract the local descriptors from the
images. In this work, the SURF descriptor was chosen in order to evaluate the system
performance. After the key points are extracted from an image, the local description of each
key point is computed.
Figure 4.1 – System Architecture of Image Retrieval using BOVW
As shown in Figure 4.1, the system consists of two consecutive stages: a training stage
and a testing stage. During the training stage, each image in the dataset is converted to
grayscale and resized, and its features are extracted and associated with local descriptors.
Next, the set of local descriptors is clustered using the K-means algorithm to construct a
vocabulary histogram, which is saved for all images. In the testing stage, the input image is
pre-processed for keypoint extraction, local descriptors are computed from it, and the BOVW
vector is computed. Finally, in the matching step, SVM (Support Vector Machine)
classification is used to select the best results, i.e. those with the highest similarity to the
query image.
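The two stages above can be sketched end to end. The thesis implementation is in C# with 64-dimensional SURF descriptors; this Python sketch uses 2-D points and a toy K-means purely for illustration, and every name in it is an assumption.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy K-means: clusters local descriptors into k visual words."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if the cluster emptied
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers

def bovw_vector(descriptors, centers):
    """Histogram of an image's descriptors over the visual vocabulary."""
    hist = [0] * len(centers)
    for d in descriptors:
        hist[min(range(len(centers)), key=lambda i: math.dist(d, centers[i]))] += 1
    return hist

# Training stage: pool local descriptors from all images and cluster them
# (2-D points standing in for 64-D SURF descriptors).
training = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
vocabulary = kmeans(training, k=2)

# Testing stage: compute the BOVW vector of a new image's descriptors.
hist = bovw_vector([(0.1, 0.1), (5.0, 5.0), (5.1, 5.1)], vocabulary)
print(hist)
```

In the thesis, the matching step then feeds such histograms to an SVM classifier; only the vocabulary construction and histogram computation are shown here.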
The user is prompted to drag the file of training images after choosing the “Extract
Descriptors” option in order to have the system perform the feature extraction process, as
shown in the figure below.
Figure 3
The retrieval demo shows the query image together with the set of images compared
against it.
Figure 4
As shown at the top of the screen, the user has two options: either he chooses to
submit his feedback by rearranging the set of retrieved images, or he directly views the
updated order.
Figure 5
The above figure shows how the user is allowed to submit his feedback regarding the
top ten retrieved images. Here, the user gives a rating mark for each image ranging from 1
up to 10. The mark represents the degree of relevancy between the retrieved image and the
query image: number 1 marks the image that is most relevant and closely connected to the
query image, from the user's own perception, whereas number 10 marks the lowest degree of
relevancy. Once the user submits his feedback, the system, which is automatically connected
to the SQL database, will update the table of images and insert the ranking mark
corresponding to each image.
As for the second button, the user is allowed to visualise the updated order of the
retrieved images related to the same query image. In other words, the system collects all
feedbacks associated with the tested query image and averages them to give the new order of
the retrieved images. Notably, the images are arranged in ascending order, as the lowest
number marks the most relevant. Remarkably, the user is capable of visualising the new
order of the retrieved images even if he hasn't submitted his own feedback. This means that
the step of inserting the feedback is optional, and the user is not obliged to do it if he is not
interested in doing so. The figure below shows a demo of the set of retrieved images in the
new rearranged order.
Figure 6
The above figure clearly shows how the images are rearranged taking the user's
feedback into account. After each iteration, the system averages the total feedback entries
and returns the order of each image according to the equation below.
Average Feedback = (Σ Feedback Entries) / (Number of feedbacks submitted)
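A minimal sketch of this averaging and reordering step, assuming the feedback entries are simply stored per image (the actual system keeps them in SQL Server; the image names below are illustrative):

```python
def rearrange(feedbacks):
    """Average each image's rating entries and sort ascending,
    since the lowest average marks the most relevant image."""
    averages = {
        image: sum(entries) / len(entries)
        for image, entries in feedbacks.items()
    }
    return sorted(averages, key=averages.get)

# Hypothetical feedback logs: image -> rating entries (1-10) from users.
feedbacks = {"a.jpg": [3, 1, 2], "b.jpg": [1, 2, 1], "c.jpg": [7, 9]}
print(rearrange(feedbacks))  # ['b.jpg', 'a.jpg', 'c.jpg']
```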
We evaluate the performance of the proposed system using precision. Precision relates the
number of relevant images retrieved to the total number of retrieved images (relevant and
irrelevant), as shown in the equation below.
Precision = (Number of relevant images retrieved / Total number of images retrieved) × 100
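As a check on the formula, a small sketch (the image identifiers and relevance judgments are hypothetical):

```python
def precision(retrieved, relevant):
    """Precision (%): relevant retrieved images over all retrieved images."""
    hits = sum(1 for img in retrieved if img in relevant)
    return 100 * hits / len(retrieved)

# Hypothetical top-10 result list and ground-truth relevant set.
retrieved = ["i1", "i2", "i3", "i4", "i5", "i6", "i7", "i8", "i9", "i10"]
relevant = {"i1", "i2", "i3", "i5", "i9"}
print(precision(retrieved, relevant))  # 50.0: 5 of the top 10 are relevant
```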
The precision is computed for each category separately in order to evaluate the overall
performance of our system before and after consecutive rounds of relevance feedback. The
results are presented in the table below; the percentages are computed taking the top ten
retrieved images into account.
Table
4.5 Discussion
Regarding the first category (babies), the precision of 50% before relevance feedback
means that the system originally retrieves 5 relevant images out of the top 10. After the first
round of feedback submission, the precision increases to 60%, which means that 6 images out
of the top 10 are now relevant to the query image. After each round, more feedbacks are
taken into account, which remarkably increases the precision.
To test the performance of the entire system, the mean average precision (MAP) is
computed for each round. The table below shows the MAP of each round for each category.
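The exact MAP equation is not reproduced in this excerpt; a simple reading consistent with the surrounding text, in which each round's MAP averages the per-category precision values, can be sketched as follows (the precision values are hypothetical):

```python
def mean_average_precision(per_category_precision):
    """Mean of the per-category precision values for one feedback round."""
    return sum(per_category_precision) / len(per_category_precision)

# Hypothetical per-category precisions (%) for a single round.
print(mean_average_precision([50.0, 60.0, 70.0, 80.0]))  # 65.0
```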
The results are clear enough to demonstrate how important the engagement of RF in
content-based image retrieval systems is. That is, for a CBIR system, a relevancy percentage
of 87.5% indicates a high performance in the field of query-by-image-content retrieval
systems.
4.6 Conclusion
Engaging relevance feedback in image retrieval systems has become a must. It permits
interaction between the user and the system in a useful, efficient way. Moreover, the user's
feedback is an additional feature of CBIR systems that helps the model retrieve many more
images visually related to the query concept expeditiously. This is clearly shown by the
accuracy and precision of the retrieved results, which were quite satisfying.
As to performance measures, the precision percentage and the mean average precision
were used to evaluate the proposed system.
CHAPTER 5
CONCLUSION AND FUTURE WORK
5.1 Conclusion
With the enormous growth of digital image collections, effective retrieval has become very
crucial; from here, CBIR systems were born. CBIR is the field of representing, organizing,
and searching images based on their visual content rather than the textual tags describing
them. Retrieval of images is no longer based on textual phrases and annotations but on
features extracted directly from the image data. This approach retrieves digital images from
large databases using the content of the images themselves, without relying on external
annotations. By measuring the user's satisfaction and engaging it in the system, the retrieval
process becomes much more efficient and precise. That is, the retrieved images met the
users' desires efficiently after engaging the feedbacks of several users within the retrieval
process.
5.2 Future Work
Our future work focuses on enhancing the way we search. Besides the relevance feedback
we have added, searching would be better if we combined the two approaches (CBIR and
TBIR) instead of searching in terms of an image solely. In other words, we would join textual
and visual features using an algorithm that fuses Visual Descriptors and Textual Descriptors
to produce a multimodal global feature that helps in the retrieval process. In addition, we
will use content-based image retrieval to create a higher-level image representation, for
instance a visual phrase. Furthermore, this model can be used to retrieve frames from videos
using a query image.
REFERENCES
retrieval technique for large image databases," in Proceedings of the Seventh ACM
234.
[9] D. Comaniciu and P. Meer, "Robust analysis of feature spaces: color image
segmentation," pp. 931–938.
[16] R. Shi, H. Feng, T.-S. Chua, and C.-H. Lee, "An adaptive image content representation
and segmentation approach to automatic image annotation," 2003, pp. 206–215.
[19] C. Carson, S. Belongie, H. Greenspan, and J. Malik, "Blobworld: image segmentation
using expectation-maximization and its application to image querying," in IEEE Trans.
Pattern Anal. Mach. Intell.
level concepts, in Proceedings of the SPIE Data Mining and Knowledge Discovery, vol. III,
2001.
[21] S. K. Alphonsa Thomas, "A Survey on Image Feature Descriptors-Color, Shape and
Texture".
[22] M. G. J. Eakins, "Content-based image retrieval," in Technical Report, 1999.
[23] F. Long, H. Zhang, and D. Feng, "Fundamentals of content-based image retrieval," in
Multimedia Information Retrieval and Management, Berlin, 2003.
[24] R. Shekhar and C. V. Jawahar, "Word Image Retrieval using Bag of Visual Words,"
India, 2012.
[25] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," pp. 91–110,
2004.
[26] S. T. Avi Mehta, "Image Segmentation using k-means clustering, EM and Normalized
Cuts".
[27] Y. Freund and R. E. Schapire, "A Short Introduction to Boosting," USA.
[28] [Online]. Available: http://press.liacs.nl/researchdownloads/topsurf/.
[29] H. Bay, "SURF: Speeded Up Robust Features," p. 14.