Relevance Feedback
Master's Thesis
By
Beirut, Lebanon
ENGINEERING
Approved by:
Supervisor
DEDICATION
I would like to dedicate this project to my family for their support and care
throughout the different stages of the project, to my instructor Dr. Ismail who always tried his
best to bring out the best in me, and of course to the Almighty God for His daily limitless
blessings.
Zahraa Loubany
I dedicate the fruit of this hard work to my family and friends who gave me the
needed support, to Dr. Ismail who pushed me all the way to show the very best in me, and to
our university, LIU, for providing the educational foundation needed to achieve a successful work.
Most of all, the biggest gratitude goes to Allah for helping me all the way.
Dina Balchy
ACKNOWLEDGMENT
We would like to express our gratitude to our supervisor Dr. Ismail El-Sayad for his
invaluable assistance, support, and supervision. Dr. El-Sayad has been beside us through the
whole journey, offered constant assistance and regular supervision to help us achieve this
project, and never hesitated to give us the information and advice needed to attain this level.
Gratitude is also shared with our Thesis Committee Members: ‘Dr. Samir Omar’ and ‘Dr.
High appreciation goes to our parents, who gave us all the needed support and confidence
Finally, we would like to express our gratitude to Almighty God for granting us strength and
ABSTRACT
Relevance Feedback. In this report, the image is represented at a higher level based
on the users' feedback as they evaluate the veracity and accuracy of the retrieved images.
The visual data (hereafter called visual descriptors) are extracted using TOP-SURF,
while the textual data (textual descriptors) are extracted from the TF-IDF values of the
annotated tags of the images. The use of the Bag of Visual Words (BoVW) approach is
motivated by the success of the Bag of Words model in textual classification and retrieval.
BoVW represents an image by all the visual words that describe it or can be generated from
it. The empirical distribution of these words is captured in a histogram. Relevance
assignment is performed by the user; the system in turn considers the images with a
high relevancy value to enhance the image representation according to a weighted schema.
TABLE OF CONTENTS
DEDICATION
ACKNOWLEDGMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
TABLE OF EQUATIONS
LIST OF TABLES
LIST OF SYMBOLS
CHAPTER 1 INTRODUCTION
1.1 Background
2.1 Introduction
2.3.2.2 Texture Feature
2.3.2.3 Shape
3.1 Introduction
3.2.1.2 Algorithm
3.2.2 A Proposed Log-based Relevance Feedback Technique in CBIR using Positive and Negative Examples
3.2.2.2 Algorithm
REFERENCES
LIST OF FIGURES
Figure 3.1: Semantic Image Retrieval Using Relevance Feedback Block Diagram
Figure 4.1: General Overview of the CBIR Proposed System with RF
TABLE OF EQUATIONS
LIST OF TABLES
LIST OF SYMBOLS
RF : Relevance Feedback
TF : Term Frequency
IR : Information Retrieval
CHAPTER 1
INTRODUCTION
1.1 Background
Over time, the amount of digital image collections has grown dramatically due to
the rapid increase of online users and web applications. This considerable growth reflects
the fact that images are one of the easiest ways to convey information and reach the
audience smoothly, which has made the demand for images grow significantly.
With the popularity of social media applications, it is no surprise that images are
becoming increasingly important for content sharing and viewing. Generally, audiences and
readers like to visualize stories, not just read them. Images illustrate ideas for the
readers, so the overall experience is more tangible and less demanding on the reader's
attention. Visual content engages, inspires, and sparks the readers' interest more readily
than comprehensive texts.
The importance of web applications, especially social media, is undeniable. People
spend most of their time navigating browsers, checking news, and keeping an eye on their
friends' news feeds, which most of the time consist of publicly shared pictures. Moreover,
the influence of video games, television, and photographs has also contributed to this rise.
In this context, the development of suitable systems to appropriately manage these massive
loads of images has become essential. Efficient systems for managing this load use
Content-Based Image Retrieval (CBIR), a common technique created to solve the main
problems of query by text, which is commonly known as TBIR (Text-Based Image Retrieval).
In TBIR [Pet] systems, images are annotated manually by textual tags, which are
used in the image retrieval process by a database management system. The user provides a
query in terms of keywords, and the system in turn retrieves all images matching those keywords.
Any image found within the system database is manually annotated with textual tags;
however, this approach still has some difficulties. Firstly, the manual annotation of a huge
number of images requires an exhaustive and considerable amount of human labor. Secondly,
the labeled images may hold unexpressed feelings and emotions that cannot be described by
textual phrases. Thirdly, the manual annotation of images may significantly lack accuracy
due to the subjectivity of human cognition.
CBIR systems, also known as query by image content (QBIC), were introduced to solve
such problems by taking a query image instead of a text as input. The system then seeks
images that are highly close to the query image in color, texture, or shape.
Training images represent all images found within the database. Each training image is
represented by a vector of features, also known as code words. Similarly, any query image
is represented in the same way so that common features (similarities) are easily
distinguished. Similarity detection is performed by measuring the Euclidean distance
between the feature vector of the query image and those of the training images.
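As an illustrative sketch of this similarity step, consider the toy example below with short feature vectors; the thesis's actual descriptors come from TOP-SURF and are much longer, so the vectors and rankings here are purely hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query_vec, training_vecs):
    """Return training-image indices ordered from most to least similar."""
    distances = [(i, euclidean_distance(query_vec, v))
                 for i, v in enumerate(training_vecs)]
    return [i for i, _ in sorted(distances, key=lambda p: p[1])]

# Toy example: three training images, one query
training = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [1.0, 0.05]
print(rank_by_similarity(query, training))  # -> [0, 2, 1], nearest first
```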
Great advantage can be taken of Relevance Feedback (RF), which can be implemented
within the CBIR technique. RF adds interaction between the system and the users after the
search and retrieval operation. This feedback could be a reordering of the yielded results,
a word description, or a rating mark given to each image according to its relevancy to the
query image; the rating approach is our main concern.
Our framework can thus be generally summarized as offering a higher-level image
representation using the users' relevance feedback, which improves the system's performance.
A major difficulty for TBIR is the incompatibility of keyword assignments
among different annotators. Each annotator may interpret the content of an image in
distinct text phrases depending on their own point of view. TBIR stores these annotations
in the form of keywords or textual phrases together with the image. Most TBIR systems
search the text surrounding the image for keywords that are physically close to it,
relying on the assumption that the image is usually described by its surrounding text.
Famous search engines that use this technique are Google, Yahoo and, long ago,
AltaVista [Neh13].
CBIR, an alternative mechanism for image search, emerged in the early
1990s. Instead of manually assigned phrases, CBIR systems use the visual content of the
images, such as color, texture, and shape features. CBIR systems mainly use low-level
features that are automatically extracted by the computer using computer vision techniques,
while humans use high-level features, which are mainly the concepts lying behind the
scenes. The difference between these two paradigms of feature extraction is known as the
'semantic gap'. Learning users' intention and their level of satisfaction through relevance
feedback is one of the major techniques for diminishing the semantic gap [Neh13].
Our main purpose is to exploit the feedback taken from the user to create a reordered
form of the retrieved images, and thus a better result for the user's query. We also aim to
enhance the visual description of the image by filtering the noisy visual words from the
image. The implementation is done using Visual Studio with the aid of the TopSurf image
descriptor. All data are stored using SQL Server 2014 as the relational database
management system.
To do so, two hierarchical approaches will be developed. The first is performed by the
user, who is asked to give feedback according to a voting system; this is a simple way
to capture the user's response or satisfaction without any complicated process. The second
approach is done by the system, by comparing the features of the relevant images and
updating the image representation accordingly.
Briefly, the project aims to enhance the results retrieved by a CBIR system and to
improve the image representation at the level of the BoVW by taking advantage of RF.
The project provides an extensive description of our work through five chapters.
Chapter two introduces CBIR as a searching technique for digital images in large databases,
together with an inclusive description of the BoVW model.
Chapter three presents the literature review. The methodologies adopted by
previous related works are presented, followed by a brief description of the algorithms
used, a general comparison of the methods, and a listing of the advantages and
disadvantages of each. Chapter four covers the construction of our implementation by
demonstrating the algorithms, then analyzing the effect of various parameters and factors
to determine which give the best results.
Finally, in chapter five we draw a conclusion of our entire work and briefly outline the
future work.
CHAPTER 2
2.1 Introduction
This chapter introduces a general review of image processing and of CBIR, which
relies on low-level features. This system involves three main steps that are explained
extensively: image segmentation, low-level feature extraction, and similarity matching,
stated respectively. Furthermore, the Bag of Visual Words (BoVW) approach is demonstrated
and introduced.
Image processing is the performance of operations on an image after converting it into
digital form, in order to enhance its presentation or to extract useful information from it,
such as feature extraction. It is a type of signal processing where the input may be a
video frame, image, or photograph, and the output may be an image or a set of
characteristics associated with it. Image processing is among the rapidly growing
technologies nowadays, with several applications in various aspects of business, remote
sensing, medical interpretation, and other applications associated with image retrieval. An
image retrieval system is a system which permits the user to browse, search, and retrieve
images from a large database.
2.3 Content Based Image Retrieval- CBIR
CBIR is a retrieval process that collects in-demand images from a huge database based
on the visual information of the query image. The retrieval of a particular image from a
large database is mainly affected by general factors such as color, texture, shape, and
local features. The extraction of these features is automatically performed by the system
using computer vision techniques; hence they are known as low-level features.
To perform CBIR, we have to go through the low-level image features as a first step;
they are considered the base of CBIR systems. Features can be extracted from the whole
image or after applying segmentation. The latter process is called region-based image
retrieval (RBIR), a special type of CBIR in which a region is a part of the image with
homogeneous low-level features. RBIR is preferred by users interested in specific regions
of an image.
To execute RBIR, the first step is to perform image segmentation. Through this step,
low-level features can be extracted from the segmented regions; similarity matching is then
performed on these features. Over the years, several segmentation techniques have been
proposed, such as curve evolution [HFe01], energy diffusion [WYM97], and graph partitioning
[19]. Various existing segmentation techniques work properly only for images containing
homogeneous color regions, and these are mainly used in retrieval systems that deal only
with colors [PLS03][KAH99], such as direct clustering methods in color
space [DCo97]. However, natural scenes are
substantially rich in both texture and color, and a broad range of natural pictures can be
properly segmented only when both are considered.
The majority of systems build their own segmentation technique in order to obtain,
through the stage of segmentation, the required region features, which could be color,
texture, or both. Such algorithms are mainly based on k-means feature clustering [9].
Firstly, an image is sectioned into 4x4 blocks from which color and texture features are
extracted. Secondly, k-means clustering is applied to group the features into independent
classes, each corresponding to one region; blocks within the same class correspond to the
same region. K-means is commonly chosen for its simplicity and speed.
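The block-clustering idea above can be sketched with a minimal k-means implementation. This is an illustrative sketch only: the toy one-dimensional "block features" stand in for real color/texture vectors, and the initialization and iteration counts are our assumptions.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means clustering: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(points, k)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(d) / len(members) for d in zip(*members)]
    return centroids, labels

# Toy "block features": e.g. the mean intensity of each 4x4 block
blocks = [[10.0], [12.0], [11.0], [200.0], [205.0], [198.0]]
_, labels = kmeans(blocks, k=2)
# Blocks with similar features fall into the same class (region)
print(labels)
```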
Primal features characterizing image content, such as color, texture, and shape, are
automatically extracted from images and used in content-based visual queries. Various
algorithms have been proposed; however, our main focus will be on the features used in RBIR.
The color feature is one of the most important and widely used features in image
retrieval and representation. Colors are assigned according to multiple color spaces, which
serve several applications; descriptions of several color spaces can be found in [KNP00].
Color spaces closer to human perceptual abilities are widely used in RBIR. These spaces
include RGB (red, green, and blue), LAB (lightness plus the two color dimensions a and b),
CMY (cyan, magenta, and yellow), CMYK (cyan, magenta, yellow, and black), HSV (hue,
saturation, and value), and YCrCb (luminance and the red and blue chroma components
respectively) [PLS03][YLi04].
Common color features in retrieval systems include the color histogram, color moments,
and the color coherence vector [FJi03][CCa02], all considered descriptors. Color is a
crucial feature as it is invariant with respect to scaling, translation, and rotation of an
image. Color space, color quantization, and similarity measurement are indispensable key
components of color feature extraction; however, they are not directly related to
high-level semantics.
Notably, in most CBIR systems the color images do not undergo preprocessing even
though they are usually affected by noise, which may be introduced by camera sensors or
capturing devices. Hence, improving the retrieval accuracy necessitates applying an
appropriate preprocessing (denoising) step.
The texture feature is not as well defined as the color feature. Texture describes
the content of many real-world images, such as fruits, clouds, skin, trees, sky, and
fabrics; therefore, texture is a significant feature for image retrieval, as it helps
characterize the high-level semantics of an image. It depends on the intensity distribution
over the whole image and is not defined for a single pixel. Among the various texture
features, Gabor features and wavelet features are widely used for image retrieval, and they
match the results of human vision studies. Note that texture analysis by means of Gabor
filters is a special case of the wavelet approach.
2.3.1.3 Shape
Along with texture and color features, image comparison also considers the shape of
objects. Various methods are used for shape representation, classified into external and
internal: external methods represent the boundary, while internal ones represent the pixels
encompassing the region. Simple global shape descriptors include measures such as
compactness. The grid-based method is a clustering approach commonly used for object shape
description using a multi-resolution grid data structure.
Some objects or scenes may hold information with close color and texture features but
different spatial locations. For instance, 'sky' and 'sea' share a common color, blue, but
differ in their usual locations: sky is usually located at the upper part of the image,
while sea is at the bottom. Generally, spatial locations are defined as top, upper, or
bottom according to the location of the region in the image. Directional relationships
alone are not sufficient for representing the semantic content of the image; topological
relationships must be taken into consideration as well.
The BoVW (Bag of Visual Words) model is motivated by the achievements attained by
using BoW (Bag of Words) in document classification and retrieval. Each document in the
BoW model is represented by a set of unordered common words present in the document. These
words are formally represented by a histogram of frequency of occurrence for each word,
which is used for document retrieval and classification. Analogously, an image is
represented by a set of visual words.
The BoVW model represents the image by all the visual words that describe it or can be
derived from it. The empirical distribution of these words is captured with a histogram
that counts how many times each word appears in the image.
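A hedged sketch of how such a histogram could be built, assuming the visual-word IDs have already been extracted from the image (the IDs below are made up for illustration):

```python
from collections import Counter

# Hypothetical visual-word IDs detected in one image
visual_words = ["w3", "w1", "w3", "w7", "w1", "w3"]

# The empirical distribution of words, captured as a histogram
histogram = Counter(visual_words)
print(histogram["w3"], histogram["w1"], histogram["w7"])  # -> 3 2 1
```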
While performing visual word extraction, foreground and background features may be
mixed together. By definition, foreground features represent the part of a scene or picture
that is nearest to and in front of the viewer, lying closest to the picture plane, whereas
background features represent the parts farthest from the viewer, which attract little
attention. To avoid this feature mix-up, interest point detection can be employed.
In the retrieval and classification process, researchers have recently been using
interest point detection. This methodology refers to the detection of interest points in an
image and is relevant for higher-level processing. Interest points represent remarkable and
conspicuous image patches that hold salient information or a noticeable object. These
points are commonly used by image stabilization and structure-from-motion applications to
track how the image changes from one frame to another.
Several techniques have been proposed for interest point detection. Corners are a
natural choice since they are easy to identify inside images with detectors such as Harris
and KLT, which find corners by looking through a small window: as the window is shifted in
any direction, the intensity changes noticeably, and thus the interest point is detected.
Moreover, blob detection has shown great results in detecting points across different
scales by localizing the centers of blobs.
Each interest point is then represented with a local descriptor, such as SURF, that
describes the local information. The descriptors are then grouped into clusters, where
similar descriptors lie within the same cluster. Each cluster is considered a visual word,
and thus we obtain a dictionary of visual words that describes all kinds of image patterns.
When interest points are mapped into visual words, we can represent an image as a
"Bag of Visual Words", that is, a vector containing the count/weight of each visual word in
the image. Generally, there are four sequential steps for building a Bag of Visual Words.
In this step, features are extracted from the image as interest points. For
instance, the features found in the image of the figure above might be eyes, a window, a
mouth, or hands, each as a local patch. These features are then presented as numerical
vectors called "feature descriptors". One of the most famous descriptors is the SURF
descriptor, which presents each patch as a 64-dimensional vector. Note that there are other
keypoint description techniques such as Harris and SIFT; the SURF algorithm is similar to
SIFT but faster.
The figure above illustrates how the system extracts the features at the detected
interest points. Each training image has its own features that will be extracted in this
manner.
This step is defined as the process of organizing objects into distinct groups whose
members are highly similar to one another. Here, patches are converted into "code words",
analogous to words in text documents, producing a "code book" analogous to a dictionary. A
standard k-means procedure is:
1. Choose k initial cluster centroids.
2. Compute the distance between each patch and the centroids of the clusters.
3. Assign each patch to the cluster with the nearest centroid (minimum distance).
4. Recompute each centroid as the mean of its assigned patches, and repeat from step 2 until the assignments stabilize.
Figure 2.3 – Feature Clustering
After assigning similar features to the same cluster, a codebook containing all visual
words can be formed. All similar features are gathered within the same code word, forming
indexed visual words. The figure above represents a set of different clusters (groups of
similar features).
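Once the codebook exists, mapping a descriptor to its code word is a nearest-centroid lookup, as sketched below. The 2-D codebook here is hypothetical (real SURF descriptors are 64-dimensional):

```python
def nearest_word(descriptor, codebook):
    """Map a feature descriptor to the index of the closest code word."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))

# Hypothetical 2-D codebook with three code words
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(nearest_word([0.9, 0.8], codebook))  # -> 1
```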
At this stage, images are no longer represented as sets of pixels. Instead, each image
can be represented at a higher, more semantically oriented level as a set of patches or
visual words known as a "Bag of Visual Words". Each image can be represented as a vector
whose components correspond to the visual words found in the dictionary, where each
component has its own tf-idf value used as a weighting factor, as shown in the figure below.
Figure 2.4 – Image Representation as Weighted Vector
The "tf" (term frequency) is the number of occurrences of a visual word in the image
divided by the total number of visual words in that image. The other factor, "idf" (inverse
document frequency), is the logarithm of the total number of images divided by the number
of images in which the visual word appears. Accordingly, the tf-idf weighting factor of
each component in the vector is the product of these two factors.
Beyond the vector representation of the image, we can easily visualize a histogram
showing how many features the image has in each cluster, in other words, a histogram
illustrating the frequency of each visual word contained in the image. These histograms are
later used for similarity comparison.
Figure 2.5- Constructing Histograms of Frequency Features
The figure above shows a set of histograms representing the frequency of features of different
images.
We extract features from the tested image and form its corresponding histogram of visual
word frequencies, to be compared with the histograms obtained previously. Through this
approach, matches can be effectively computed: a positive or true match is the training
image whose histogram lies closest to the query's histogram.
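One possible way to compare such histograms is histogram intersection, sketched below; this measure and the toy histograms are illustrative choices of ours, not necessarily what the thesis's system uses.

```python
def hist_intersection(h1, h2):
    """Similarity of two word-frequency histograms (dicts); higher = more alike."""
    words = set(h1) | set(h2)
    return sum(min(h1.get(w, 0), h2.get(w, 0)) for w in words)

def best_match(query_hist, training_hists):
    """Index of the training image whose histogram best matches the query."""
    return max(range(len(training_hists)),
               key=lambda i: hist_intersection(query_hist, training_hists[i]))

training = [{"w1": 5, "w2": 1}, {"w2": 4, "w3": 3}]
query = {"w2": 3, "w3": 2}
print(best_match(query, training))  # -> 1
```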
Figure 2.6 – Evaluating Images Against Obtained Histograms
The BoVW model has proven to be a successful tool for classifying images according to
their content. It is largely immune to the position and orientation of objects in the
image, and it represents all images with fixed-length vectors irrespective of the number of
detections. However, this model still requires further testing for large changes in scale
and viewpoint, since it has not yet been extensively tested for viewpoint and scale
invariance, and its performance still suffers some ambiguity.
Lately, recent retrieval systems have embedded the user's relevance feedback to
further improve the retrieval process and produce more meaningful and related retrieved
images [6]. This online process is optimized by considering the most positive image
selections on each feedback iteration. Through continuous learning and interaction with end
users, the retrieval results improve iteratively:
(1) The system provides an initial set of retrieved images for the query image.
(2) The user evaluates these results as relevant (positive examples) or irrelevant (negative examples).
(3) A machine learning algorithm is applied to learn from the user's feedback; then go back to (2).
Steps (2)-(3) are repeated until the user is satisfied with the results.
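The loop above can be sketched as follows. The learning step here is stubbed with a Rocchio-style query update (moving the query toward the mean of the positive examples); this is our assumption for illustration, not the thesis's exact algorithm.

```python
def retrieve(query, database, top_k=3):
    """Rank database vectors by squared Euclidean distance to the query."""
    d = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return sorted(range(len(database)), key=lambda i: d(query, database[i]))[:top_k]

def refine_query(query, positives, alpha=0.5):
    """Rocchio-style update: move the query toward the mean of positive examples."""
    if not positives:
        return query
    mean = [sum(dim) / len(positives) for dim in zip(*positives)]
    return [(1 - alpha) * q + alpha * m for q, m in zip(query, mean)]

database = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [5.0, 5.0]]
query = [0.2, 0.2]
results = retrieve(query, database)            # step (1)
# Step (2): suppose the user marks images 1 and 2 as relevant
positives = [database[i] for i in results if i in (1, 2)]
# Step (3): learn from the feedback and retrieve again
query = refine_query(query, positives)
results = retrieve(query, database)
print(results)  # -> [1, 2, 0]: the relevant images now rank first
```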
Figure 2.7 shows a general overview of the RF algorithm. The process in step (3) is
performed by the system; that is, the burden of specifying the weights is removed from the
user.
CHAPTER 3
LITERATURE REVIEW
3.1 Introduction
In this chapter, we illustrate Content-Based Image Retrieval using several relevance
feedback approaches. Our focus is on methods that have been adopted in previous retrieval
techniques, describing the methodology considered for each. We then demonstrate in depth
two well-known RF methodologies and explain the algorithm of each.
We conclude by listing the advantages and disadvantages of each considered related work,
drawing out a general conclusion, and paving the way for our proposed approach.
Recently, user feedback has become very popular at many different levels. Any smart
application, employment system, or new business intends to obtain the user's or customer's
feedback in order to improve its services and meet the user's satisfaction. Outlining the
feedback process as well as the desired outcomes is essential for gathering user feedback
properly; otherwise, we may be blindly asking for feedback that will only confuse our
understanding of users' intentions and desires. Hence, embedding RF in retrieval systems is
a natural step.
First, we introduce two related RF paradigms and the methodologies they follow. These
systems are "Semantic Image Retrieval Using Relevance Feedback" and "A Proposed Log-based
Relevance Feedback Technique in CBIR using Positive and Negative Examples".
To reduce the considerable gap between low-level properties and high-level concepts,
this approach embeds relevance feedback in the retrieval process. It is found that the
proposed relevance feedback system, based on the AdaBoost technique applied to relevant and
irrelevant patterns, offers effective retrieval. Extensive trials of the RF technique with
AdaBoost on different databases showed important improvements.
AdaBoost is an effective machine learning algorithm introduced by Freund and Schapire
in 1995 [Yoa]. It is used to enhance classification performance by combining weak learners.
The experiments testing this approach were performed by selecting a query image at random,
after which the retrieved images were obtained; the user is then asked to label each
retrieved image as relevant or irrelevant.
3.1.1.2 Algorithm
In each iteration, the irrelevant images are removed from the database, which helps
optimize the testing data and hence increases efficiency by reducing the retrieval time.
xxxii
Figure 3.1 - Semantic Image Retrieval Using Relevance Feedback
Note that the Canberra distance metric is used to measure similarity. Here x
represents a feature vector from the database, y the query image's feature vector, and both
have the same dimension. This allows computing the average precision and thus assessing the
efficiency of the system.
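A minimal implementation of the Canberra distance between two feature vectors (zero-zero coordinate pairs are skipped by convention to avoid division by zero):

```python
def canberra(x, y):
    """Canberra distance between feature vectors x and y of equal length."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y)
               if a != 0 or b != 0)

print(canberra([1.0, 2.0, 0.0], [3.0, 2.0, 0.0]))  # -> 0.5, i.e. |1-3|/(1+3)
```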
3.1.2 A Proposed Log-based Relevance Feedback Technique in CBIR using Positive and Negative Examples
Relevance feedback has been shown to be an effective model for improving the retrieval
performance of CBIR systems, and it is already considered a crucial component when modeling
any CBIR system. After the user submits a query image, the system returns a set of similar
images that may not be totally relevant to the user's targets. The relevance feedback
mechanism asks the user to mark the relevance of the retrieved images; in turn, the system
modifies the outcome by learning from the user's responses. Previous systems that integrate
the user's responses have only taken positive feedback into account. Here, a log-based
relevance feedback technique for CBIR is used, which merges both positive and negative
feedback.
3.1.2.2 Algorithm
The main idea of RF is to shift the burden of finding the right query formulation from
the user to the system. Figure 3.2 shows the general representation of the planned system.
Figure 3.2 – General Scheme of Proposed System
Initially, the images are supplied to the system and stored in the system's database.
When the user submits the query image, the system in turn extracts the visual features of
the supplied image and compares them with the database images to retrieve the set of images
relevant to the user. On the retrieved set, the user gives feedback on whether the images
are relevant or not. If an image is irrelevant, the system goes back to the previous step;
if it is relevant, the system determines the weight vector of the query image using
short-term learning and builds the semantic space of the image using long-term learning.
In the log-based semantic approach, the similarity space is determined based on the
semantics of an image. The system maintains logs of user feedback, recording the positives
and negatives (relevant and irrelevant) from the retrieved set of images, which are useful
in the next iterations; after n iterations, each image has its own log. This improves the
system's performance and speed. The steps are repeated until the user is satisfied.
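A hedged sketch of the logging idea: accumulate per-image votes across iterations and blend them with visual similarity when ranking. The blending weight and the scoring form are our assumptions for illustration, not the paper's exact formulation, and the image IDs are made up.

```python
from collections import defaultdict

# Feedback log: image id -> accumulated votes (+1 relevant, -1 irrelevant)
feedback_log = defaultdict(int)

def record_feedback(image_id, relevant):
    """Append one user judgment to the long-term log."""
    feedback_log[image_id] += 1 if relevant else -1

def combined_score(image_id, visual_similarity, weight=0.1):
    """Blend visual similarity with the long-term feedback log."""
    return visual_similarity + weight * feedback_log[image_id]

# Two iterations of user feedback
record_feedback("img7", relevant=True)
record_feedback("img7", relevant=True)
record_feedback("img9", relevant=False)
print(combined_score("img7", 0.5))  # 0.5 + 0.1 * 2  -> boosted
print(combined_score("img9", 0.5))  # 0.5 - 0.1      -> demoted
```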
The method used in "Semantic Image Retrieval Using Relevance Feedback" shows a fast
increase in retrieval performance with each iteration, using the AdaBoost learning
algorithm. The method adopted by "A Proposed Log-based Relevance Feedback" has the major
advantage of integrating short-term learning with long-term learning.
The advantages and disadvantages of the two considered methods are summarized below.
The related works are numbered 1 and 2 for "Semantic Image Retrieval Using Relevance
Feedback" and "A Proposed Log-based Relevance Feedback" respectively.

Method 1
Advantages: fast improvement in retrieval performance with each feedback iteration;
performance enhancement starting from 57.2% and increasing with every iteration.
Disadvantages: no enhancement on the level of image representation; lacks the presence of
a voting system.

Method 2
Advantages: integrates short-term learning with long-term learning; handles a large
database which contains similar images.
Disadvantages: no enhancement on the level of image representation; lacks a voting system
that allows the user to express satisfaction.
Although image retrieval systems have achieved high and efficient retrieval outcomes,
these systems still lack direct interaction between the user and the system, and the
absence of such interaction limits the relevance of the results.
We believe that effective CBIR requires integration between the retrieval models found in
the Information Retrieval (IR) literature and the features extracted in the image
processing area. For this purpose, it is necessary to increase the level of accuracy in
such systems in order to return the desired results for users.
Relevance feedback in CBIR has gained great popularity in the field of image
processing and retrieval. This approach sparked the attention of many researchers involved
in the enhancement of retrieval systems, and in the past few years research has
increasingly focused on it. Henceforth, implementing RF in CBIR systems is a major key to
improving retrieval accuracy.
From here, our objective is to offer an improved content-based image retrieval system
using the users' relevance feedback, which improves the system's performance. The system is
implemented using Visual Studio and Microsoft SQL Server 2014.
By coding and testing algorithms, we will be able to accurately study the system's
behavior. The next chapter will introduce our methodology for interlacing the RF of the user
into an image retrieval system. The main deliverable of the proposed approach is a list of
retrieved images with the highest degree of relevancy to the image query requested by the
user.
CHAPTER 4
IMPLEMENTATION
4.1 Introduction
In this chapter, the implementation of the proposed system is demonstrated in detail.
Three processes are mainly introduced: two performed by the system and one performed by
the user. First, the system performs the retrieval process, followed by the feedback process
done by the user. After the feedback submission, the system updates the order of the
retrieved images and rearranges them based on TopSurf, particularly the tf-idf values of each
descriptor. A Bag of Words (BoW) is a list of words with their word counts, generally
represented as a table. Each row represents a document, in this case an image, and each
column represents a visual word. The cells hold the count of each word in an image. The
figure below illustrates this representation.
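Such a table can be sketched in a few lines. The thesis builds this structure in C# over TOP-SURF visual words; the image names and word identifiers below are purely illustrative.

```python
# Toy bag-of-visual-words table: rows are images, columns are visual
# words, and each cell holds the count of that word in that image.
images = {
    "img1": ["w3", "w1", "w3", "w7"],
    "img2": ["w1", "w1", "w7"],
}

# The vocabulary is the set of all visual words seen in any image.
vocabulary = sorted({w for words in images.values() for w in words})

# One histogram row per image, one column per visual word.
bow_table = {
    name: [words.count(w) for w in vocabulary]
    for name, words in images.items()
}

print(vocabulary)            # column headers: the visual words
for name, row in bow_table.items():
    print(name, row)         # word counts for this image
```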
Term frequency–inverse document frequency (tf-idf) is an alternative way to estimate the
subject of an image by the visual words it contains. With tf-idf, visual words are given
weights rather than raw counts. Generally, the tf-idf value is composed of two factors. The
first is the normalized Term Frequency (TF), i.e. the number of occurrences of a visual word
in the image divided by the total number of visual words in that image. The second is the
Inverse Document Frequency (IDF), computed as the logarithm of the total number of images
divided by the number of images in which that visual word appears.
tf(v) = (Number of times visual word v appears in an image) / (Total number of visual words
in the image)
idf(v) = log(Total number of images / Number of images with visual word v in them)
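The two factors above can be sketched directly. The thesis computes these values in C# over TOP-SURF descriptors; this Python sketch uses illustrative word lists.

```python
import math

def tf(word, image_words):
    """Normalized term frequency: occurrences of the visual word divided
    by the total number of visual words in the image."""
    return image_words.count(word) / len(image_words)

def idf(word, all_images):
    """Inverse document frequency: log of the total number of images
    divided by the number of images containing the visual word."""
    containing = sum(1 for words in all_images if word in words)
    return math.log(len(all_images) / containing)

# Illustrative visual-word lists for three images.
images = [["w1", "w2", "w1", "w3"], ["w2", "w3"], ["w3"]]
print(tf("w1", images[0]))                      # 0.5 (2 of 4 words)
print(idf("w3", images))                        # 0.0 (appears in every image)
print(tf("w1", images[0]) * idf("w1", images))  # tf-idf weight of w1
```

A word that appears in every image gets an idf of zero, so it contributes nothing to distinguishing images, which is exactly the weighting behavior described above.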
We then evaluate the system using the Relevance Feedback after we study its performance
with respect to precision and the user's satisfaction.
The proposed system we intend to implement is illustrated in the figure below. In its
final step, each user reviews the rearranged form of the results after marking each retrieved
image as relevant or irrelevant.
The system will rearrange the set of retrieved images according to the RF performed
by various users. The feedbacks of all users are taken into account in every single iteration,
so that the user is able to visualize both the original order of the retrieved images produced
by the system and the rearranged one.
4.3 Implementation/Simulation Tools
A main design consideration for our retrieval software was the programming language
that best suits our simulation. While various options such as C++ and Java were considered,
we chose C# due to the availability of libraries for image processing, and Microsoft SQL
Server for the database due to its capability in dealing with large databases.
Using the open-source TOP-SURF code was our starting point. TOP-SURF is an
image descriptor that combines interest points with visual words, thereby enhancing its
performance. TOP-SURF offers flexibility in varying the descriptor size and supports very
efficient image matching. Besides visual word extraction, visualization, and comparison, it
also provides a high-level API and very large pre-computed visual word dictionaries. Since
the original SURF [29] descriptor is closed source, we used OpenSURF as an open-source
alternative; it depends on OpenCV, which is released under the BSD license.
For better database management, our work is connected to Microsoft SQL Server 2014,
which contains all the tables and procedures needed for executing the retrieval process and
engaging the RF. Note that in the SQL software, we have used one table and three stored
procedures, as shown in the figure below.
Figure 2
Figure 2 shows that the table includes Rating_ID as the primary key along with the
other attributes used by the database procedures. The data type of the images is chosen to be
varbinary, because the size of images varies considerably; the MAX specifier indicates that
the column can hold values up to the maximum storage size. The dataset contains images
from four different categories (babies, cats, flowers, and musical instruments), all of which
are stored in the database.
The very first step in the proposed system is to extract the local descriptors from the
images. In this work, the SURF descriptor was chosen in order to evaluate the system
performance. After the key points are extracted from an image, the local description of each
key point is computed.
Figure 4.1 – System Architecture of Image Retrieval using BOVW
As shown in Figure 4.1, the system consists of two consecutive stages: a training stage
and a testing stage. During the training stage, each image in the dataset is converted to
grayscale and resized, and its features are extracted and associated with local descriptors.
Next, the set of local descriptors is clustered using the K-means algorithm to construct a
vocabulary histogram, which is saved for all images. In the testing stage, the input image is
pre-processed for keypoint extraction, local descriptors are computed from it, and the BOVW
vector is computed. Finally, in the matching step, SVM (Support Vector Machine)
classification is used to select the best results, i.e. those with the highest similarity to the
query image.
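The two stages above can be sketched end to end. The thesis implementation is in C# with 64-dimensional SURF descriptors; this Python sketch uses 2-D points and a toy K-means purely for illustration, and every name in it is an assumption.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy K-means: clusters local descriptors into k visual words."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if the cluster emptied
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers

def bovw_vector(descriptors, centers):
    """Histogram of an image's descriptors over the visual vocabulary."""
    hist = [0] * len(centers)
    for d in descriptors:
        hist[min(range(len(centers)), key=lambda i: math.dist(d, centers[i]))] += 1
    return hist

# Training stage: pool local descriptors from all images and cluster them
# (2-D points standing in for 64-D SURF descriptors).
training = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
vocabulary = kmeans(training, k=2)

# Testing stage: compute the BOVW vector of a new image's descriptors.
hist = bovw_vector([(0.1, 0.1), (5.0, 5.0), (5.1, 5.1)], vocabulary)
print(hist)
```

In the thesis, the matching step then feeds such histograms to an SVM classifier; only the vocabulary construction and histogram computation are shown here.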
The user is prompted to drag the file of training images after choosing the “Extract
Descriptors” option in order to have the system perform the feature extraction process, as
shown in the figure below.
Figure 3
The retrieval demo shows the query image together with the set of images compared
against it.
Figure 4
As shown at the top of the screen, the user has two options: either he chooses to
submit his feedback by rearranging the set of retrieved images, or he directly views the
updated order.
Figure 5
The above figure shows how the user is allowed to submit his feedback regarding the
top ten retrieved images. Here, the user gives a rating mark for each image ranging from 1
up to 10. The mark represents the degree of relevancy between the retrieved image and the
query image: number 1 marks the image that is most relevant and closely connected to the
query image, from the user's own perception, whereas number 10 marks the lowest degree of
relevancy. Once the user submits his feedback, the system, which is automatically connected
to the SQL database, will update the table of images and insert the ranking mark
corresponding to each image.
As for the second button, the user is allowed to visualise the updated order of the
retrieved images related to the same query image. In other words, the system collects all
feedbacks associated with the tested query image and averages them to give the new order of
the retrieved images. Notably, the images are arranged in ascending order, as the lowest
number marks the most relevant. Remarkably, the user is capable of visualising the new
order of the retrieved images even if he hasn't submitted his own feedback. This means that
the step of inserting the feedback is optional, and the user is not obliged to do it if he is not
interested in doing so. The figure below shows a demo of the set of retrieved images in the
new rearranged order.
Figure 6
The above figure clearly shows how the images are rearranged taking the user's
feedback into account. After each iteration, the system averages the total feedback entries
and returns the order of each image according to the equation below.
Average Feedback = (Σ Feedback Entries) / (Number of feedbacks submitted)
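A minimal sketch of this averaging and reordering step, assuming the feedback entries are simply stored per image (the actual system keeps them in SQL Server; the image names below are illustrative):

```python
def rearrange(feedbacks):
    """Average each image's rating entries and sort ascending,
    since the lowest average marks the most relevant image."""
    averages = {
        image: sum(entries) / len(entries)
        for image, entries in feedbacks.items()
    }
    return sorted(averages, key=averages.get)

# Hypothetical feedback logs: image -> rating entries (1-10) from users.
feedbacks = {"a.jpg": [3, 1, 2], "b.jpg": [1, 2, 1], "c.jpg": [7, 9]}
print(rearrange(feedbacks))  # ['b.jpg', 'a.jpg', 'c.jpg']
```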
We evaluate the performance of the proposed system using precision. Precision relates the
number of relevant images retrieved to the total number of retrieved images (relevant and
irrelevant), as shown in the equation below.
Precision = (Number of relevant images retrieved / Total number of images retrieved) × 100
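As a check on the formula, a small sketch (the image identifiers and relevance judgments are hypothetical):

```python
def precision(retrieved, relevant):
    """Precision (%): relevant retrieved images over all retrieved images."""
    hits = sum(1 for img in retrieved if img in relevant)
    return 100 * hits / len(retrieved)

# Hypothetical top-10 result list and ground-truth relevant set.
retrieved = ["i1", "i2", "i3", "i4", "i5", "i6", "i7", "i8", "i9", "i10"]
relevant = {"i1", "i2", "i3", "i5", "i9"}
print(precision(retrieved, relevant))  # 50.0: 5 of the top 10 are relevant
```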
The precision is computed for each category separately in order to evaluate the overall
performance of our system before and after consecutive rounds of relevance feedback. The
results are presented in the table below; the percentages are computed taking the top ten
retrieved images into account.
Table
4.5 Discussion
Regarding the first category (babies), the precision of 50% before relevance feedback
means that the system originally retrieves 5 relevant images out of the top 10. After the first
round of feedback submission, the precision increases to 60%, which means that 6 images out
of the top 10 are now relevant to the query image. After each round, more feedbacks are
taken into account, which remarkably increases the precision.
To test the performance of the entire system, the mean average precision (MAP) is
computed for each round. The table below shows the MAP of each round for each category.
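The exact MAP equation is not reproduced in this excerpt; a simple reading consistent with the surrounding text, in which each round's MAP averages the per-category precision values, can be sketched as follows (the precision values are hypothetical):

```python
def mean_average_precision(per_category_precision):
    """Mean of the per-category precision values for one feedback round."""
    return sum(per_category_precision) / len(per_category_precision)

# Hypothetical per-category precisions (%) for a single round.
print(mean_average_precision([50.0, 60.0, 70.0, 80.0]))  # 65.0
```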
The results are clear enough to demonstrate how important the engagement of RF in
content-based image retrieval systems is. That is, for a CBIR system, a relevancy percentage
of 87.5% indicates a high performance in the field of query-by-image-content retrieval
systems.
4.6 Conclusion
Engaging relevance feedback in image retrieval systems has become a must. It permits
interaction between the user and the system in a useful, efficient way. Moreover, the user's
feedback is an additional feature of CBIR systems that helps the model retrieve many more
images visually related to the query concept expeditiously. This is clearly shown by the
accuracy and precision of the retrieved results, which were quite satisfying.
As to performance measures, the precision percentage and the mean average precision
were used to evaluate the proposed system.
CHAPTER 5
CONCLUSION AND FUTURE WORK
5.1 Conclusion
With the enormous growth of digital image collections, effective retrieval has become very
crucial; from here, CBIR systems were born. CBIR is the field of representing, organizing,
and searching images based on their visual content rather than the textual tags describing
them. Retrieval of images is no longer based on textual phrases and annotations but on
features extracted directly from the image data. This approach retrieves digital images from
large databases using the content of the images themselves, without relying on external
annotations. By measuring the user's satisfaction and engaging it in the system, the retrieval
process becomes much more efficient and precise. That is, the retrieved images met the
users' desires efficiently after engaging the feedbacks of several users within the retrieval
process.
5.2 Future Work
Our future work focuses on enhancing the way we search. Besides the relevance feedback
we have added, searching would be better if we combined the two approaches (CBIR and
TBIR) instead of searching in terms of an image solely. In other words, we would join textual
and visual features using an algorithm that fuses Visual Descriptors and Textual Descriptors
to produce a multimodal global feature that helps in the retrieval process. In addition, we
will use content-based image retrieval to create a higher-level image representation, for
instance a visual phrase. Furthermore, this model can be used to retrieve frames from videos
using a query image.
REFERENCES
retrieval technique for large image databases," in Proceedings of the Seventh ACM
234.
[9] D. Comaniciu and P. Meer, "Robust analysis of feature spaces: color image
segmentation," pp. 931–938.
[16] R. Shi, H. Feng, T.-S. Chua, and C.-H. Lee, "An adaptive image content representation
and segmentation approach to automatic image annotation," 2003, pp. 206–215.
[19] C. Carson, S. Belongie, H. Greenspan, and J. Malik, "Blobworld: image segmentation
using expectation-maximization and its application to image querying," in IEEE Trans.
Pattern Anal. Mach. Intell.
level concepts, in Proceedings of the SPIE Data Mining and Knowledge Discovery, vol. III,
2001.
[21] S. K. Alphonsa Thomas, "A Survey on Image Feature Descriptors-Color, Shape and
Texture".
[22] M. G. J. Eakins, "Content-based image retrieval," in Technical Report, 1999.
[23] F. Long, H. Zhang, and D. Feng, "Fundamentals of content-based image retrieval," in
Multimedia Information Retrieval and Management, Berlin, 2003.
[24] R. Shekhar and C. V. Jawahar, "Word Image Retrieval using Bag of Visual Words,"
India, 2012.
[25] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," pp. 91–110,
2004.
[26] S. T. Avi Mehta, "Image Segmentation using k-means clustering, EM and Normalized
Cuts".
[27] Y. Freund and R. E. Schapire, "A Short Introduction to Boosting," USA.
[28] [Online]. Available: http://press.liacs.nl/researchdownloads/topsurf/.
[29] H. Bay, "SURF: Speeded Up Robust Features," p. 14.