
Enhanced Content Based Image Retrieval System Using

Relevance Feedback

Master Thesis

By

Zahraa Loubany, Dina Balchy

Submitted to the School of Engineering of the

Lebanese International University

Beirut, Lebanon

In partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE IN COMPUTER AND COMMUNICATIONS

ENGINEERING

Fall 2016 – Spring 2017

Approved by:

Supervisor

Dr. Ismail ElSayad


Committee Member Dr. Samir Omar

Committee Member Dr. Rawad Abu Assi

DEDICATION

I would like to dedicate this project to our family for their support and caring throughout the different stages of the project, to my instructor Dr. Ismail who always tried his best to bring out the best in me, and of course to the Almighty God for his daily limitless blessings.

Zahraa Loubany

I dedicate the fruit of this hard work to my family and friends who gave me the needed support, to Dr. Ismail who pushed me all the way to show the very best in me, and to our university LIU for providing the education needed to achieve a successful work. And most of all, the biggest gratitude to Allah for helping me out all the way.

Dina Balchy

ACKNOWLEDGMENT

We would like to express our gratitude to our supervisor Dr. Ismail El-Sayad for his invaluable assistance, support, and supervision. Dr. El-Sayad has been next to us through the whole journey, offering constant assistance and regular supervision to help us achieve this project, and never hesitating to provide the information and advice needed to attain this level. Gratitude is also extended to our thesis committee members, Dr. Samir Omar and Dr. Rawad Abu Assi, for their encouragement, comments, and support.

High appreciation goes to our parents, who gave us all the needed support and confidence

during this journey.

Finally, we would like to express our gratitude to Almighty God for granting us strength and

volition to complete this project successfully.

ABSTRACT

This thesis introduces a new methodology for image retrieval using relevance feedback. The image is represented at a higher level depending on the users' feedback as they evaluate the veracity and accuracy of the retrieved images. The visual data (hereby known as visual descriptors) are extracted using TOP-SURF, whilst the textual data (known as textual descriptors) are extracted based on the TF-IDF values of the images' annotated tags. The use of the BoVW (Bag of Visual Words) approach is influenced by the success of the Bag of Words model in textual classification and retrieval. The BoVW represents the image by all the words that describe it or can be generated from it. The empirical distribution of words is captured with a histogram, which makes distinguishing similarities much easier. The feedback is represented by a ranked relevancy assignment performed by the user. The system in turn considers the images with high relevancy values to enhance the image representation according to a weighted schema using the concept of BoVW.

TABLE OF CONTENTS

DEDICATION..........................................................................................................................ii

ACKNOWLEDGMENT........................................................................................................iii

ABSTRACT............................................................................................................................iv

TABLE OF CONTENTS.........................................................................................................v

LIST OF FIGURES...............................................................................................................vii

TABLE OF EQUATIONS...................................................................................................viii

LIST OF TABLES...............................................................................................................viii

LIST OF SYMBOLS..............................................................................................................ix

CHAPTER 1 INTRODUCTION............................................................................................2

1.1 Background................................................................................................................2

1.2 Problem Statement.....................................................................................................2

1.3 General overview of the project.................................................................................5

1.4 Thesis Outline............................................................................................................5

CHAPTER 2 BACKGROUND AND DEFINITIONS..........................................................7

2.1 Introduction................................................................................................................7

2.2 Digital Image Processing...........................................................................................7

2.3 Content Based Image Retrieval- CBIR......................................................................8

2.3.1 Image Segmentation………………………………...8

2.3.2 Low-Level Feature Extraction...............................................................................9

2.3.2.1 Color Feature………………………………...………………………………...9

2.3.2.2 Texture Feature………………………………………………………………11

2.3.2.3 Shape…………………………………………………………………………12

2.3.2.4 Spatial Location ……………………………………………………………...12

2.4 Image Representation with a Bag of Visual Words...........................................................13

2.4.1 BoVW Methodology............................................................................................16

2.4.1.1 Extracting Features from Training Images……...……………………………16

2.4.1.2 Clustering Like-features Together……………………………………………16

2.4.1.3 Representing the Images as a Set of Weighted Visual Words……………….17

2.4.1.4 Constructing Histograms of Frequency of Features………………………….18

2.4.1.5 Evaluating Unknown Images Against the Obtained Histograms…………….19

2.5 Relevance Feedback in Image Retrieval…………………………………………………20

CHAPTER 3 LITERATURE REVIEW..............................................................................22

3.1 Introduction..............................................................................................................22

3.2 Image Retrieval Relevance Feedback Methods.........................................22

3.2.1 Semantic Image Retrieval Using Relevance Feedback.........................23

3.2.1.1 Proposed System …………………………………………………………….23

3.2.1.2 Algorithm…………………………………………………………………….23

3.2.2 A Proposed Log-based Relevance Feedback Technique in CBIR using Positive and

Negative Examples...................................................................................................................25

3.2.2.1 Proposed Algorithm …………………………………………………………25

3.2.2.2 Algorithm…………………………………………………………………….26

3.3 Methods/Codes Comparison....................................................................................30

3.4 Conclusion and Motivation......................................................................................32

REFERENCES.......................................................................................................................35

LIST OF FIGURES

Figure 1.1: Text-Based Image Retrieval System.......................................................................2

Figure 1.2: Content Based Image Retrieval System...................................................3

Figure 2.1: Bag of Visual Words.............................................................................................15

Figure 2.2: Feature Extraction Illustration ..............................................................................16

Figure 2.3: Feature Clustering..................................................................................................17

Figure 2.4: Image Representation as a Weighted Vector.........................................................18

Figure 2.5: Constructing Histogram of Frequency Features....................................................19

Figure 2.6: Evaluating Images Against Obtained Histograms.................................................20

Figure 2.7: Relevance Feedback Algorithm.............................................................................22

Figure 3.1: Semantic Image Retrieval Using Relevance Feedback Block Diagram................34

Figure 3.2: General Scheme of the Proposed System..............................................................27

Figure 4.1: General Overview of the CBIR Proposed System with RF..................20

TABLE OF EQUATIONS

Equation 3.1: Canberra Distance 24

Equation 3.2: Average Precision 25

LIST OF TABLES

Table 2.1: Texture features for image retrieval........................................................................11

Table 3.1: Advantages and Disadvantages of Considered Work.............................................33

Table 4.1: Tabulated form of BOW.........................................................................................41

LIST OF SYMBOLS

CBIR: Content Based Image Retrieval

RBIR: Region Based Image Retrieval

TBIR: Text Based Image Retrieval

BoVW: Bag of Visual Words

QBIC: Query by Image Content

CBVIR: Content-Based Visual Information Retrieval

RF: Relevance Feedback

CRT: Composite Region Template

KLT: Kanade–Lucas–Tomasi Feature Tracker

Blob: Binary Large OBject

TF: Term Frequency

IDF: Inverse Document Frequency

IR: Information Retrieval

SVM: Support Vector Machines

MAP: Mean Average Precision

CHAPTER 1

INTRODUCTION

1.1 Background

Over time, digital image collections have grown dramatically due to the rapid increase of online users and web applications. This considerable increase is a result of the technological breakthroughs we have witnessed. Admittedly, images are one of the easiest means of communication used to convey information and reach the audience smoothly, which has made the demand for images grow significantly.

With the popularity of social media applications, it is no surprise that images are becoming increasingly important for content sharing and viewing. Generally, audiences and readers like to visualize stories, not just read them. Images illustrate ideas for the readers, so the overall experience is more tangible and less demanding on the reader's attention, with obvious and explicit visual content. Visual content engages, inspires, and sparks the reader's interest more easily than comprehensive text.

The importance of web applications, especially social media, is undeniable. People spend most of their time browsing the web, checking news, and keeping an eye on their friends' news feeds, which most of the time are pictures shared publicly. Moreover, the influence of video games, television, and photography has also contributed to this rise. In this context, the development of suitable systems to appropriately manage these massive loads of images and collections is a necessity.

The known efficient systems for managing this load use Content Based Image Retrieval, known as CBIR [ CITATION DrF00 \l 1033 ]. Generally, CBIR is a common technique for image retrieval which was created to solve the main problems of query by text, commonly known as TBIR, Text Based Image Retrieval [ CITATION Pet \l 1033 ].

In TBIR [ CITATION Pet \l 1033 ] systems, images are annotated manually with textual tags, which are used in the image retrieval process by a database management system. The user provides a query in terms of keywords, and the system in turn retrieves all images whose textual annotations match or are similar to the user's query.

The TBIR system is illustrated in the Figure below:

Figure 1.1 – Text-Based Image Retrieval System

Any image found within the system database is manually annotated with textual keywords. Although this technique is known to be computationally fast in image retrieval, it still has some difficulties. Firstly, the manual annotation of a huge number of images requires an exhaustive and considerable amount of human labor. Secondly, the labeled images may hold unexpressed feelings and emotions that cannot be described by textual phrases. Thirdly, the manual annotation of images may significantly lack accuracy due to the subjectivity of human cognition.
CBIR systems were introduced to solve such problems by taking a query image as input instead of text. In turn, the system seeks images that are highly similar to the query image in color, texture, or shape. CBIR, also known as query by image content (QBIC), applies computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in huge databases.

Figure 1.2 shows a typical schema of a CBIR system:

Figure 1.2 – Content Based Image Retrieval System

Training images represent all images found within the database. Each training image is represented by a vector of features, also known as code words. Similarly, any query image is represented in the same way so that common features (similarities) are easily distinguished. Similarity detection is performed by measuring the Euclidean distance between the feature vector of the query image and those of the training images already in the database.
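As an illustrative sketch of this matching step (the four-dimensional vectors below are toy stand-ins, not actual TOP-SURF descriptors), the Euclidean ranking can be written as:

```python
import numpy as np

def rank_by_euclidean(query_vec, db_vecs):
    """Rank database images by Euclidean distance to the query's feature vector."""
    dists = np.linalg.norm(db_vecs - query_vec, axis=1)  # distance to each training image
    return np.argsort(dists)                             # image indices, nearest first

# Toy example: 3 database images described by 4-dimensional feature vectors.
db = np.array([[1.0, 0.0, 2.0, 1.0],
               [0.9, 0.1, 2.1, 1.0],   # very close to the query
               [5.0, 4.0, 0.0, 3.0]])  # far from the query
query = np.array([1.0, 0.0, 2.0, 1.0])
order = rank_by_euclidean(query, db)
print(order)  # nearest database image first: [0 1 2]
```

The retrieved result set is then simply the database images read off in this order.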

A great advantage can be gained from Relevance Feedback (RF), which can be implemented within the CBIR technique. RF adds interaction between the system and the users after the search and retrieval operation. This feedback could be a reordering of the yielded results, a word description, or a rating mark for each image according to its relevancy to the query image, which is our main concern.

Our framework can thus be generally summarized as offering a higher-level image representation using the users' relevance feedback, which improves the relevance of the images delivered from the database to the user.

1.2 Problem Statement

A major problem of TBIR, compared with CBIR, is the incompatibility of keyword assignments among different annotators. Each annotator may interpret the content of the image in distinct text phrases, each depending on their own point of view. TBIR stores these annotations in the form of keywords or textual phrases together with the image. Most TBIR systems search the text surrounding the image for keywords that are physically close to it. This technique relies on the assumption that an image is usually described by its surrounding text. Famous search engines that use this technique are Google, Yahoo, and, long ago, AltaVista. [ CITATION Neh13 \l 1033 ]

CBIR, an alternative mechanism for image search, was introduced in the early 1990s. Instead of using manually assigned phrases, CBIR systems use the visual content of the images, such as color, texture, and shape features. CBIR systems mainly use low-level features that are automatically extracted by the computer using its own vision techniques, while humans use high-level features, which are mainly the concepts lying behind the scenes. The difference between these two paradigms of feature extraction is called the 'semantic gap'. Learning users' intentions and their level of satisfaction using relevance feedback is one of the major techniques for diminishing the semantic gap. [ CITATION Neh13 \l 1033 ]

Our main purpose is to exploit the feedback taken from the user to create a reordered form of the retrieved images and thus a better result for the user's query. We also aim to enhance the image's descriptors by filtering the noisy visual words out of the image. The implementation is done in Visual Studio with the aid of the TOP-SURF image descriptor. All data are stored using SQL Server 2014 as a relational database management system.

1.3 General overview of the project

This report highlights the effect of RF on enhancing image representation. To do so, two hierarchical approaches are developed. The first is performed by the user, who is asked to give his own feedback through a voting system; this is a simple way to capture the user's response or satisfaction without any complicated process. The second approach is done mainly by the system, by comparing the features of the relevant images and ranking the features according to a weighting schema.

Briefly, the project aims to enhance the results retrieved by the CBIR system and to improve the image representation at the level of the BoVW by taking advantage of RF, thereby increasing the performance of the system.

1.4 Thesis Outline

The project provides an extensive description of our work across five chapters, structured as follows:

Starting with Chapter 2, we introduce the general specification of CBIR systems as a searching technique for digital images in large databases. Moreover, an inclusive description showing the significance of using RF in CBIR systems is introduced.

Chapter 3 mainly presents the literature review. The methodologies adopted by previous related works are described, followed by a brief description of the algorithms used before and a general comparison of the methods, listing the advantages and disadvantages of each in tabulated form. A general overview of the proposed work is introduced with an illustration of the proposed approach.

Chapter 4 covers the construction of our implementation by demonstrating the algorithms, then analyzing the effect of various parameters and factors to determine which yields the best results.

Finally, in Chapter 5 we draw a conclusion from our entire work and briefly explain the future work.

CHAPTER 2

BACKGROUND AND DEFINITIONS

2.1 Introduction

This chapter introduces a general review of image processing and of CBIR, which relies on low-level features. This system involves three main steps that are explained extensively: image segmentation, low-level feature extraction, and similarity matching, stated respectively. Furthermore, we demonstrate the BoVW (Bag of Visual Words) approach [ CITATION Rad11 \l 1033 ].

Finally, a brief introduction of the RF integration process in a general CBIR system is given.

2.2 Digital Image Processing

Image processing is a well-known methodology used to perform different operations on an image after converting it into digital form, in order to enhance its presentation or to extract useful information from it, such as features. It is a type of signal processing where the input could be a video frame, image, or photograph, and the output may be an image or parameters related to it.

Image processing is among the most rapidly growing technologies nowadays, with several applications in business, remote sensing, medical interpretation, and other areas associated with image retrieval. An image retrieval system is a system which permits the user to browse, search, and retrieve images related to the user's query.

2.3 Content Based Image Retrieval- CBIR

CBIR is a retrieval process that collects requested images from a huge database based on the visual information of the query image. The retrieval of a particular image from a large database is mainly affected by general factors such as color, texture, shape, and local features. CBIR systems are referred to as 'computer-centric' systems, in which the retrieval process is automatically performed by the system using computer vision techniques (low-level features).

To perform CBIR we have to start from the low-level image features, which are considered the base of CBIR systems. They can be extracted from the whole image or after applying segmentation. The latter process is called region based image retrieval (RBIR), a special type of CBIR where a region is a part of the image with homogeneous low-level features. RBIR is often preferred because it is closer to human cognition.

To carry out RBIR, the first step is to perform image segmentation. Low-level features can then be extracted from the segmented regions, and the similarities between two images are defined based on region features.

2.3.1 Image Segmentation and Low-Level Feature Extraction

Performing image segmentation automatically is a complicated process. Through the years, several techniques have been proposed, such as curve evolution [ CITATION HFe01 \l 1033 ], energy diffusion [ CITATION WYM97 \l 1033 ], and graph partitioning [19]. Various existing segmentation techniques work properly only for images containing homogeneous color regions, and are mainly used in retrieval systems that deal only with color [ CITATION PLS03 \l 1033 ][ CITATION KAH99 \l 1033 ], such as direct clustering methods in color space [ CITATION DCo97 \l 1033 ]. However, natural scenes are substantially rich in both texture and color, and a broad range of natural pictures can be classified as a mosaic of regions with distinctive textures and colors.

The majority of systems build their own segmentation technique in order to obtain the required region features, which could be color, texture, or both, through the segmentation stage [ CITATION CFa94 \l 1033 ][ CITATION JZW01 \l 1033 ][ CITATION CPT11 \l 1033 ]. Such algorithms are mainly based on k-means feature clustering [9]. Firstly, an image is sectioned into 4×4 blocks from which color and texture features are extracted. Secondly, k-means clustering is applied to group the features into independent classes, each corresponding to one region; blocks within the same class correspond to the same region. K-means with a connectivity constraint, known as KMCC, is proposed in Ref. [ CITATION VMe03 \l 1033 ] for image segmentation.

Primary features characterizing image content, such as color, texture, and shape, are automatically extracted from images and used in content-based visual queries. Various algorithms have been proposed; however, our main focus will be on the features used in RBIR systems with high-level semantics:

2.3.1.1 Color Feature

The color feature is one of the most important and widely used features in image retrieval and representation. Colors are assigned according to multiple color spaces, which often serve different applications. Descriptions of several color spaces can be found in Ref. [ CITATION KNP00 \l 1033 ]. Color spaces are known to be close to human perceptual abilities and are widely used in RBIR. These spaces include RGB (red, green, blue), LAB (lightness plus the two color dimensions a and b), CMY (cyan, magenta, yellow), CMYK (cyan, magenta, yellow, black), HSV (hue, saturation, value), and YCrCb (Y′ is the luma component; Cb and Cr are the blue-difference and red-difference chroma components) [ CITATION PLS03 \l 1033 ][ CITATION YLi04 \l 1033 ][ CITATION RSh04 \l 1033 ][ CITATION VMe03 \l 1033 ][ CITATION BSM \l 1033 ].

Common color features in retrieval systems include the color histogram, color moments, and the color coherence vector [ CITATION FJi031 \l 1033 ][ CITATION CCa02 \l 1033 ], all considered descriptors. Color is a crucial feature as it is invariant with respect to scaling, translation, and rotation of an image. Color space, color quantification, and similarity measurement are indispensable key components of color feature extraction; however, they are not directly related to high-level semantics. In order to evaluate the effectiveness and efficiency of color features, color descriptors are considered.
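As a hedged illustration (a generic sketch, not this thesis's implementation), a normalized color histogram descriptor over the three RGB channels, with 4 bins per channel, can be computed as follows:

```python
import numpy as np

def color_histogram(image, bins=4):
    """Normalized per-channel color histogram of an RGB image (values in 0..255)."""
    hist = []
    for c in range(3):  # R, G, B channels
        counts, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        hist.append(counts)
    hist = np.concatenate(hist).astype(float)
    return hist / hist.sum()  # normalize so image size does not affect the descriptor

# Synthetic 8x8 image: pure red, so every red pixel falls in the highest red bin.
img = np.zeros((8, 8, 3), dtype=np.uint8)
img[..., 0] = 255
h = color_histogram(img)
print(h)
```

The resulting 12-component vector is the kind of descriptor that the distance measures above compare; its invariance to scaling follows from the normalization step.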

Notably, in most CBIR systems the color images do not undergo preprocessing, even though they are usually affected by noise. This corruption may occur due to camera sensors or capturing devices. Hence, improving the retrieval accuracy necessitates applying an effective filter to remove the color noise.

2.3.1.2 Texture feature

The texture feature is not as well defined as the color feature, and it is not used in all systems [ CITATION IKS01 \l 1033 ][ CITATION PLS03 \l 1033 ]. However, it describes the content of different real-world images such as fruits, clouds, skin, trees, sky, and fabrics. Texture is therefore a significant feature for image retrieval, as it helps define high-level semantics and provides important information for image classification.

Texture gives information on the structural arrangement of surfaces and objects in the image. It depends on the intensity distribution over the entire image and is not defined for a single pixel. Among the various texture features, Gabor features and wavelet features are widely used for image retrieval, and they match the results of human vision studies. Note that texture analysis by means of Gabor filters is a special case of the wavelet approach [ CITATION Alp \l 1033 ].


The table below summarizes the advantages of the Gabor filter and wavelet texture features.

Texture Feature       Advantage
Gabor filter          Detects frequency and orientation
Wavelet transform     Filters with salient point features

Table 2.1 - Texture features for image retrieval

2.3.1.3 Shape

Shape is a moderately well-defined concept. Compared to color and texture, it is difficult to apply due to the inaccuracy of segmentation [ CITATION JEa99 \l 1033 ]. Along with texture and color features, image comparison also considers the shape of objects. Various methods are used for shape representation, classified into external and internal: external methods represent the boundary, while internal ones represent the pixels encompassing the region. Accordingly, shape features are classified into two types: boundary descriptors and region descriptors.

Regions can be represented by a set of simple geometrical parameters, such as area or compactness. The grid-based method is a clustering approach commonly used for object shape description using a multi-resolution grid data structure. Currently, the most popular region descriptors are the moment invariants [ CITATION FLo03 \l 1033 ].

2.3.1.4 Spatial Location

Equally important, spatial location is also effective in region classification. Some objects or scenes may have close color and texture features but different spatial locations. For instance, 'sky' and 'sea' share a common color, blue, but have different usual locations: sky is usually located at the upper part of the image, while sea is at the bottom. Generally, spatial locations are defined as top, upper, or bottom according to the location of the region in the image. Directional relationships alone are not sufficient for representing the semantic content of the image; topological relationships must be taken into consideration as well.

2.4 Image Representation with a Bag of Visual Words

The BoVW (Bag of Visual Words) model is motivated by the achievements attained by using BoW (Bag of Words) in document classification and retrieval. Each document in the BoW model is represented by an unordered set of common words present in the document. These words are formally represented by a histogram of frequencies of occurrence, which is used for document retrieval and classification. Analogously, an image is represented by an unordered set of common discrete visual features, called a vocabulary [ CITATION Jaw12 \l 1033 ].

The BoVW represents the image by all the words that describe it or can be derived from it. The empirical distribution of words is captured with a histogram that counts how many times each word occurs in the image [ CITATION Jaw12 \l 1033 ].

While performing the visual word extraction process, foreground and background features may be mixed together. By definition, foreground features represent the part of a scene or picture that is nearest to and in front of the viewer, lying closest to the picture plane, whereas background features represent the parts farthest from the viewer, which attract little attention or lie behind the scene. To avoid this feature mix-up, the image is segmented into regions and a bag is extracted per region.

In the retrieval and classification process, researchers have recently been using interest point detection. This methodology refers to the detection of an image's interest points, which are relevant for higher-level processing. Interest points represent remarkable and conspicuous image patches that hold salient information or a noticeable object. These points are commonly used by image stabilization and structure-from-motion applications to track how the image changes from one frame to another. Valid interest points are robust to noise and image transformations.

Several techniques have been proposed for interest point detection. Corners are a natural choice since they are easy to identify inside images, for example with the Harris detector or the KLT (Kanade–Lucas–Tomasi) feature tracker. Briefly, a corner is detected by looking through a small window: as the window is shifted in any direction, the intensity changes noticeably, and thus the interest point is detected. Moreover, blob (Binary Large OBject) detection has shown great results in detecting points across different scales by localizing the centers of blobs.
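The window-shifting idea can be sketched in a few lines of numpy (a simplified Harris response without Gaussian weighting or non-maximum suppression, shown here only to illustrate the principle):

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris corner response: large and positive where intensity changes in all directions."""
    Ix = np.gradient(img, axis=0)   # vertical intensity change
    Iy = np.gradient(img, axis=1)   # horizontal intensity change

    def box3(a):                    # sum over a 3x3 neighborhood (the shifted "small window")
        p = np.pad(a, 1)
        h, w = a.shape
        return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    det = Sxx * Syy - Sxy * Sxy     # both gradient directions strong -> corner
    trace = Sxx + Syy
    return det - k * trace * trace

# Synthetic image: a bright square on a dark background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
# The response is high at the square's corners, negative along an edge,
# and zero on flat regions, matching the window-shifting intuition above.
```

A real detector would then threshold this response and keep local maxima as the interest points.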

At each of these interest points, we extract a Speeded Up Robust Features (SURF) [ CITATION Low04 \l 1033 ] descriptor (this citation should reference the original SURF article) to describe the local information. The descriptors are then grouped into clusters, where similar descriptors lie within the same cluster. Each cluster is considered a visual word, and thus we get a dictionary of visual words that describes all kinds of image patterns.

An example of BoVW is shown in the figure below.

Figure 2.1 – Bag of Visual Words

xxiii
When interest points are mapped to visual words, we can represent an image as a "Bag of Visual Words", that is, a vector containing the count/weight of each visual word in that image, which is used as a feature vector.
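A minimal sketch of this mapping (using a hypothetical two-dimensional descriptor space and a three-word dictionary purely for illustration; real SURF descriptors are 64-dimensional):

```python
import numpy as np

def bovw_vector(descriptors, codebook):
    """Map each local descriptor to its nearest visual word and count occurrences."""
    # Distance from every descriptor (rows) to every codeword (columns).
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = d.argmin(axis=1)                           # nearest visual word per patch
    counts = np.bincount(words, minlength=len(codebook))
    return counts / counts.sum()                       # normalized word frequencies

# Toy dictionary of 3 visual words in a 2-D descriptor space.
codebook = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
descs = np.array([[0.1, 0.2], [9.8, 0.1], [0.2, 9.9], [0.0, 0.1]])
vec = bovw_vector(descs, codebook)
print(vec)  # [0.5, 0.25, 0.25]: word 0 occurs twice, words 1 and 2 once each
```

The resulting vector is exactly the histogram-style representation the text describes, ready to be compared between images.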

2.4.1 BoVW Methodology

Generally, there are five sequential steps for performing a Bag of Visual Words.

2.4.1.1 Extracting Features from Training Images

In this step, features are extracted from the image as interest points. For instance, the features found in the image of the figure above might be eyes, a window, a mouth, or hands, each as a local patch. These features are then represented as numerical vectors called "feature descriptors". One of the most famous descriptors is the SURF descriptor, which represents each patch as a 64-dimensional vector. Note that there are other keypoint description techniques such as Harris and SIFT. The SURF algorithm is similar to SIFT, but it is more simplified and computationally faster.

Figure 2.2 – Feature Extraction Illustration

xxiv
The figure above illustrates how the system extracts the features at the detected interest points. Each training image has its own features, which will later be clustered together to form different groups of similar features.

2.4.1.2 Clustering Like-Features Together

This step is defined as the process of organizing objects into distinct groups with similar members. A cluster is therefore a collection of common features (patches), or features with high similarity between them. Here, patches are converted to "code words", analogous to words in text documents, producing a "code book" or vocabulary, analogous to a dictionary of words. K-means clustering [ CITATION Avi \l 1033 ] is an effective algorithm for this purpose and works as follows:

1. Select initial cluster centroids “c” at random.

2. Compute the distance between each patch and the centroids of the clusters.

3. Assign each patch to the cluster with the nearest centroid (minimum distance).

4. Recalculate each centroid as the mean of the objects assigned to it.

5. Repeat steps 2–4 until the cluster assignments no longer change.
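As an illustrative sketch (pure Python on toy 2-D points, not the thesis's actual C# implementation), the five steps above can be written as:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal k-means following the steps above: pick random centroids,
    assign each patch to the nearest one, recompute, repeat until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)               # step 1: random initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                            # steps 2-3: nearest-centroid assignment
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        new = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]    # step 4: recompute centroids
        if new == centroids:                        # step 5: stop when nothing changes
            break
        centroids = new
    return centroids

# Two well-separated groups of toy 2-D "descriptors"
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers = sorted(kmeans(pts, 2))
```

In the real system the points are 64-dimensional SURF descriptors and k is the vocabulary size, but the loop is the same.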

Figure 2.3 – Feature Clustering

After similar features are assigned to the same cluster, a codebook containing all visual words can be formed. All similar features are gathered under the same code word, forming indexed visual words. The figure above represents a set of different clusters (groups of features) after clustering.

2.4.1.3 Representing the image as a set of weighted Visual Words

At this stage, images are no longer represented as sets of pixels. Instead, an image can be represented at a higher, more semantically oriented level as a set of patches or visual words, known as a "Bag of Visual Words". Each image can be represented as a vector whose components correspond to the visual words in the dictionary. Each component has its own "tf-idf" value, which is used as a weighting factor as shown in the figure below.

Figure 2.4 – Image Representation as Weighted Vector

The "tf" (Term Frequency) is the number of occurrences of a visual word in the image divided by the total number of visual words in that image. The other factor, "idf" (Inverse Document Frequency), is based on the total number of images divided by the number of images in which the visual word appears. Accordingly, the tf-idf weighting factor of each component in the vector is the product of these two values.
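As a rough sketch of this weighting (using the logarithmic idf form given later in Section 4.1, not the exact TOP-SURF code; the word ids are hypothetical):

```python
import math
from collections import Counter

def tfidf_vector(image_words, corpus):
    """tf-idf weight for each visual word of one image.
    image_words: list of visual-word ids in the image;
    corpus: one such list per database image."""
    counts = Counter(image_words)
    weights = {}
    for word, c in counts.items():
        tf = c / len(image_words)                     # occurrences / total words in image
        df = sum(1 for img in corpus if word in img)  # images containing the word
        idf = math.log(len(corpus) / df)              # rarer words weigh more
        weights[word] = tf * idf                      # product of the two factors
    return weights

corpus = [["w1", "w1", "w2"], ["w2", "w3"], ["w3", "w3", "w1"]]
weights = tfidf_vector(corpus[0], corpus)
# "w1": tf = 2/3, and it appears in 2 of 3 images, so idf = log(3/2)
```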

2.4.1.4 Constructing histograms of frequency of features

Beyond the vector representation of the image, we can easily visualize a histogram showing how many features the image has in each cluster; in other words, a histogram illustrating the frequency of each visual word contained in the image. Note that these histograms capture the frequency of occurrences, not the positions.
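A minimal sketch of this construction (with hypothetical word indices, not the system's real data):

```python
from collections import Counter

def word_histogram(assignments, vocab_size):
    """Frequency histogram over the vocabulary for one image;
    `assignments` holds the visual-word index of each detected patch.
    Only frequencies are kept; patch positions are discarded."""
    counts = Counter(assignments)
    return [counts.get(w, 0) for w in range(vocab_size)]

hist = word_histogram([0, 2, 2, 1, 2], vocab_size=4)   # -> [1, 1, 3, 0]
```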

Figure 2.5- Constructing Histograms of Frequency Features

The figure above shows a set of histograms representing the frequency of features of different

images.

2.4.1.5 Evaluating unknown images against the obtained histograms

We extract features from the tested image and form its corresponding histogram of visual word frequencies, which is compared with the histograms obtained previously. Through this approach, matches can be computed effectively. A positive (true) match is one that falls within the same category, or one having a high correlation with the query.
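One plausible way to score such comparisons is cosine similarity, consistent with the cosine difference used later in Chapter 4 (a sketch with made-up histograms, not the thesis code):

```python
import math

def cosine_similarity(h1, h2):
    """Cosine similarity between two visual-word histograms
    given as {word: count} dictionaries (1.0 = identical direction)."""
    dot = sum(c * h2.get(w, 0) for w, c in h1.items())
    n1 = math.sqrt(sum(c * c for c in h1.values()))
    n2 = math.sqrt(sum(c * c for c in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def best_match(query, database):
    """Database image whose histogram is closest to the query's."""
    return max(database, key=lambda name: cosine_similarity(query, database[name]))

db = {"cat":    {"fur": 8, "eye": 2},
      "flower": {"petal": 9, "leaf": 3}}
query = {"fur": 4, "eye": 1}           # parallel to the "cat" histogram
```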

The figure below illustrates how this comparison takes place.

Figure 2.6 – Evaluating Images Against Obtained Histograms

The hierarchical algorithm followed in BoVW is considered a very successful tool for classifying images according to their content. It is largely immune to the position and orientation of objects in the image, and it represents all images with fixed-length vectors irrespective of the number of detections. However, the model has not yet been extensively tested for large changes in scale and viewpoint, and its performance still suffers from some ambiguity.

2.5 Relevance Feedback in Image Retrieval

Lately, recent retrieval systems have embedded the user's relevance feedback to further improve the retrieval process and produce more meaningful and related retrieved images [6]. This online process is optimized by considering the most positive image selection on each feedback iteration. Through continuous learning and interaction with end users, RF has provided a significant enhancement in the performance of CBIR systems.

A typical scenario for RF in CBIR is shown below [1] :

(1) The system provides an initial set of retrieved images for the query image.

(2) The user evaluates the above results as relevant (positive examples) or irrelevant (negative examples) to the query.

(3) A machine learning algorithm is applied to learn from the user's feedback, a refined set of results is returned, and the process goes back to (2).

Steps (2)–(3) are repeated until the user is satisfied with the results.
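The three-step scenario can be sketched as a generic loop with a pluggable learning step; the toy numeric "retrieval" below merely stands in for a real CBIR back end:

```python
def relevance_feedback_loop(query, retrieve, get_feedback, learn, max_rounds=5):
    """Steps (1)-(3): retrieve, collect relevant/irrelevant labels,
    learn from them, and repeat until no result is marked irrelevant."""
    results = retrieve(query)                         # step (1)
    for _ in range(max_rounds):
        relevant, irrelevant = get_feedback(results)  # step (2)
        if not irrelevant:                            # user satisfied
            break
        query = learn(query, relevant, irrelevant)    # step (3)
        results = retrieve(query)
    return results

# Toy stand-in: "images" are numbers, retrieval is nearness to the query,
# and the user considers everything >= 10 relevant
items = [1, 2, 3, 10, 11, 12]
retrieve = lambda q: sorted(items, key=lambda x: abs(x - q))[:3]
get_feedback = lambda res: ([x for x in res if x >= 10], [x for x in res if x < 10])

def learn(q, relevant, irrelevant):
    # Move toward the relevant items, or away from the irrelevant mean
    if relevant:
        return sum(relevant) / len(relevant)
    return q + (q - sum(irrelevant) / len(irrelevant))

final = relevance_feedback_loop(5, retrieve, get_feedback, learn)
```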

Figure 2.7 - Relevance Feedback Algorithm

Figure 2.7 shows a general overview of the RF algorithm. The process in step (3) is performed by the system; that is, the burden of specifying the weights is removed from the user and carried by the system instead.

CHAPTER 3

LITERATURE REVIEW

3.1 Introduction

In this chapter, we go through the sections by illustrating Content Based Image

Retrieval by using several Relevance Feedback approaches. Our focus is on several methods

that have been adopted in previous retrieval techniques, by describing the methodology

considered for each of them. This is carried out by demonstrating in depth two well-known methodologies in the RF field, and explaining the algorithm of each.

Accordingly, this chapter will be wrapped up by stating the advantages and

disadvantages of each considered related work, drawing out a general conclusion, and paving

the way for introducing our new RF approach.

Recently, user feedback has become very popular and is present at many different levels. Any smart application, employment system, or new business intends to get the user's or customer's feedback in order to improve its offerings to meet the user's satisfaction.

satisfaction. Outlining the feedback process as well as desired outcomes is essential for

gathering user feedback properly; otherwise, we may be blindly asking for feedback that will

only confuse our understanding of user’s intentions and desires. From here, embedding RF in

the CBIR systems turned to be a necessity.

Foremost, we are going to introduce two related paradigms of RF and the methodologies they follow. These systems are "Semantic Image Retrieval Using Relevance Feedback"

and “A Proposed Log-based Relevance Feedback Technique in CBIR using Positive and

Negative Examples”.

3.1.1 Semantic Image Retrieval Using Relevance Feedback

3.1.1.1 Proposed System

To reduce the considerable gap between low-level properties and high-level concepts, this RF approach is proposed and evaluated using the AdaBoost process.

It is found that the proposed Relevance Feedback system offers effective retrieval performance within a countable number of feedback iterations. This recent relevance feedback method, based on the AdaBoost technique, exploits both relevant and irrelevant patterns. Extensive trials using the RF technique with AdaBoost on different databases showed important improvements in the performance of the retrieval process.

AdaBoost is an effective learning algorithm that was introduced by Freund and Schapire in 1995 [27]. It is used to enhance the classification performance of a weak learner by merging a collection of weak classification functions into a more powerful classifier.

The experiments to test this approach were performed by selecting a query image at random, from which the retrieved images were obtained. The user is then asked to label the retrieved images as relevant or irrelevant.

3.1.1.2 Algorithm

The proposed methodology is described by the following algorithm. On every iteration, the irrelevant images are removed from the database, which optimizes the testing data and hence increases efficiency by reducing the retrieval time. The figure below summarizes the algorithm of the proposed approach.

Figure 3.1 - Semantic Image Retrieval Using Relevance Feedback

Note that the Canberra distance metric is used to measure similarity. Here x represents a feature vector from the database while y refers to the query image, both of dimension d; the Canberra distance is then given by

Canb(x, y) = Σ_{i=1}^{d} |x_i − y_i| / (|x_i| + |y_i|)

Equation 3-1: Canberra Distance
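Equation 3-1 translates directly into code; a small sketch (with zero-component pairs skipped to avoid division by zero):

```python
def canberra(x, y):
    """Canberra distance between feature vectors x and y; each term is
    |x_i - y_i| / (|x_i| + |y_i|), skipping pairs where both are zero."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if a or b)

d = canberra([1.0, 2.0, 0.0], [1.0, 0.0, 3.0])
# 0/2 + 2/2 + 3/3 = 2.0
```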

At the experimental results stage, it is important to define a suitable measure for evaluating the performance. This is defined as:

Precision = (relevant images retrieved in top T returns) / T

Equation 3-2: Average Precision

This allows computing the average precision and thus knowing the efficiency of the

system as we proceed with repeating iterations.
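A sketch of this measure (with hypothetical image names):

```python
def precision_at_t(retrieved, relevant, t=10):
    """Fraction of the first t retrieved images that are relevant,
    matching the precision measure defined above."""
    top = retrieved[:t]
    return sum(1 for img in top if img in relevant) / t

retrieved = ["img1", "img2", "img3", "img4", "img5"]
relevant = {"img1", "img3", "img5", "img9"}
p = precision_at_t(retrieved, relevant, t=5)   # 3 of the top 5 are relevant -> 0.6
```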

3.1.2 A Proposed Log-based Relevance Feedback Technique in CBIR using

Positive and Negative Examples

3.1.2.1 Proposed System

Relevance feedback has been shown to be an effective model for improving the retrieval performance of CBIR systems, and it is already considered a crucial parameter when modeling any CBIR system. After the user submits a query image, the system returns a set of similar images that may not be totally relevant to the user's targets. The relevance feedback mechanism asks the user to mark the relevance of the retrieved images; in turn, the system refines the results by learning from the user's responses. Previous systems that integrate the user's feedback have taken only the positive feedback into consideration, discarding the substantial negative data.

To overcome the limitations of conventional CBIR systems, log-based relevance feedback was used, which merges both positive and negative feedback.

3.1.2.2 Algorithm

The main idea of RF is to shift the burden of finding the right query formulation from the user to the system.

Figure 3.2 shows the general representation of the planned system using relevance feedback in CBIR.

Figure 3.2 – General Scheme of Proposed System

Initially, the images are supplied to the system and stored in the system's database. When the user submits the query image, the system extracts the visual features of the supplied image and compares them with the database images to retrieve the set of images relevant to the user. On the retrieved set, the user gives feedback on whether the images are relevant or not. If an image is irrelevant, the system goes back to the previous step. If the image is relevant, the system determines the weight vector of the query image using short-term learning and builds the semantic space of the image using long-term learning. Finally, the results are passed to the log-based semantic approach.

In the log-based semantic approach, the similarity space is determined based on the semantics of an image. The system maintains logs of user feedback, recording the positives and negatives (relevant and irrelevant) from the retrieved set of images, which are useful in subsequent iterations. After n iterations, an image will have accumulated logs, which help improve the system's performance and speed. The steps are repeated until the user gets relevant images.


3.2 Methods/Codes Comparison

The method used in "Semantic Image Retrieval Using Relevance Feedback" shows a fast increase in retrieval performance with each iteration using the AdaBoost learning algorithm. The method adopted by "A Proposed Log-based Relevance Feedback" makes a major contribution by integrating short-term learning with long-term learning to semantically update the query weight vector.

The advantages and disadvantages of the two considered methods are illustrated in tabulated form below. The related works are numbered 1 and 2 for "Semantic Image Retrieval Using Relevance Feedback" and "A Proposed Log-based Relevance Feedback" respectively.

Considered Work 1 – "Semantic Image Retrieval Using Relevance Feedback"

Advantages:
- Enhanced retrieval performance compared with the approach R. Ding uses.
- Fast growth in retrieval performance with every feedback iteration.
- Performance enhancement from 57.2% to 92.5% between the 1st and 5th iterations.

Disadvantages:
- Human perception subjectivity.
- No enhancement at the level of image representation (BoVW presentation).
- Lacks the presence of a voting system.

Considered Work 2 – "A Proposed Log-based Relevance Feedback"

Advantages:
- Integrates short-term learning with long-term learning.
- Achieves high retrieval accuracy for a large database containing similar semantic categories.

Disadvantages:
- No enhancement at the level of image representation.
- Lacks the presence of a voting system that allows the user to grade images according to their relevancy.

Table 3.1 – Advantages and Disadvantages of Considered Work

3.3 Conclusion and Motivation

Although image retrieval systems have achieved high and efficient retrieval outcomes, these systems still lack direct interaction between the user and the system. The absence of such interaction consequently results in inaccuracy in the output results.

We believe that effective CBIR requires integration between the retrieval models found in the Information Retrieval (IR) literature and the feature extraction techniques found in the image processing area. For this purpose, it is necessary to increase the level of accuracy in such systems in order to return the desired results to users.

Relevance Feedback in CBIR has obtained its maximum popularity in the field of

image processing and retrieval. This approach sparked the attention of many researchers who

were involved in the enhancement of retrieval systems. In the past few years, research has been devoted to relevance feedback as an efficient way to improve the performance of CBIR systems. Henceforth, implementing RF in CBIR systems is the recent major key to improving retrieval systems and enhancing retrieval accuracy significantly.

From here, our objective is to offer an improved content based image retrieval

system using the users’ relevance feedback which improves the system’s performance by

grading the images driven from the database to the user.

This will be managed by the implementation of "Enhanced Content Based Image Retrieval System Using Relevance Feedback" using Visual Studio and Microsoft SQL Server 2014.

By coding and testing algorithms we will be able to accurately study system

performance with respect to the retrieved results incorporated with RF.

The next chapter will introduce our methodology in interlacing the RF of the user in an

iterative loop within the system.

The main deliverable of the proposed approach is a list of retrieved images with the highest degree of relevancy to the query image requested by the user.

CHAPTER 4

CBIR SYSTEMS WITH RELEVANCE FEEDBACK

IMPLEMENTATION

4.1 Introduction

This chapter focuses on the implementation of Relevance Feedback in CBIR systems and demonstrates it in detail. Three processes are introduced: two performed by the system and one by the user. First, the system performs the retrieval process, followed by the feedback process done by the user. After the feedback submission, the system updates the order of the retrieved images and rearranges them according to the users' accumulated feedback.

The retrieval process is managed using bag-of-visual-words descriptors based on TOP-SURF, particularly the tf-idf values of each descriptor. A Bag of Words (BoW) is a list of words with their word counts, generally represented by a table. Each column represents a document, in this case an image, and each row represents a visual word. The cells hold the count of each word in an image. The table below illustrates the BoW in its tabulated form.

            Image 1   Image 2   Image 3   Image 4
V.Word 1      12         0         0         6
V.Word 2      34        12         0         0
V.Word 3      26        32         0         7
V.Word 4       0        23        42         3

Table 4.1 – Tabulated Form of BoVW

Term frequency–inverse document frequency (tf-idf) is an alternative way to estimate the subject of an image by the visual words it contains. With tf-idf, visual words are weighted according to their number of occurrences.

Generally, the tf-idf value is composed of two factors. The first is the normalized Term Frequency (TF), i.e. the number of occurrences of a visual word in the image divided by the total number of visual words in that image. The second is the Inverse Document Frequency (IDF), computed as the logarithm of the total number of images divided by the number of images in which the visual word appears.

tf(v) = (Number of times visual word v appears in an image) / (Total number of visual words in the image)

idf(v) = log(Total number of images / Number of images containing visual word v)

As a result, we will be able to draw conclusions about the retrieved images using Relevance Feedback after studying its performance with respect to precision and user satisfaction.

4.2 Proposed Design

The proposed system we intend to implement is illustrated in the figure below, and it

can be briefly described by the following steps:

1. Image retrieval based on BOVW.

2. User's feedback submission via an image voting criterion (priority for the most relevant)

3. System rearranging of retrieved images

4. Saving the updated order in the database

5. Review the rearranged form by each user

Figure 4.1 – Architecture of the RF Proposed System

The system will rearrange the set of retrieved images according to the RF performed by various users. The feedbacks of all users are taken into account on every iteration, so that the user is able to visualize both the original order of the images retrieved by the system and the order of the images after his own feedback.

4.3 Implementation/Simulation Tools

A key design consideration for our retrieval software was the programming language that best suits our simulation. While various options such as C++ and Java were available, we chose C# due to the availability of image processing libraries, together with Microsoft SQL Server for the database due to its capable handling of large databases.

Using TOP-SURF as an open source code was our starting point. TOP-SURF is an image descriptor that combines interest points with visual words, thus enhancing its performance. TOP-SURF offers flexibility in descriptor size and supports very efficient image matching. Besides visual word extraction, visualization, and comparison, it also provides a high-level API and very large pre-computed codebooks [28]. The TOP-SURF descriptor is fully open source, although it depends on libraries that require different licenses. As the original SURF [29] descriptor is closed source, we used OpenSURF as an open-source alternative, which depends on OpenCV, released under the BSD license.

For better database practice, our work is connected to Microsoft SQL Server 2014, containing all tables and stored procedures needed for executing the retrieval process and engaging the RF. Note that in the SQL database we use one table and three stored procedures (retrieve, update, and delete), as shown below.

Figure 4.2 – Rating Table and Database Procedures

Figure 4.2 shows that the table includes Rating_ID as a primary key, along with the other attributes used by the database procedures. The data type of the images is chosen as varbinary, because the size of images varies considerably; the MAX specifier indicates that a column's data entries may exceed 8,000 bytes.

4.4 Implementation/Simulation Results

Our dataset is composed of training images drawn from four different categories (babies, cats, flowers, and musical instruments). All images are stored in JPEG format.

The very first step in the proposed system is to extract the local descriptors from the images. In this work, the SURF descriptor was chosen in order to evaluate the system performance. After the extraction of the key points from an image, the local descriptor of each key point is computed as shown in the figure below.

Figure 4.3 – System Architecture of Image Retrieval using BOVW

As shown in Figure 4.3, the system consists of two consecutive stages: a training stage and a testing stage. In the training stage, each image in the dataset is converted to grayscale and resized, and the features are extracted and associated with local descriptors. After this step, the set of local descriptors is clustered using the K-means algorithm to construct a vocabulary of K clusters. Then, the BoVW image descriptor is computed as a normalized histogram over the vocabulary and saved for all images. In the test stage, the input image is pre-processed for keypoint extraction, local descriptors are computed from it, and the BoVW vector is derived. Finally, in the matching step, SVM (Support Vector Machine) classification is used to select the best results, those with the highest similarity to the query image.

The user is prompted to drag the file of training images after choosing the "Extract Descriptors" option in order to perform the feature extraction process, as shown in the figure below.

Figure 4.4

The retrieval demo of the query image, together with the set of images compared to it using the cosine difference, is illustrated in the figure below.

Figure 4.5

As shown at the top of the screen, the user has two options: either submit feedback by rearranging the set of retrieved images, or directly view the updated rearranged form of the images after all previous feedbacks are engaged.

Figure 4.6

The above figure shows how the user is allowed to submit feedback regarding the top ten retrieved images. Here, the user gives a rating mark for each image ranging from 1 up to 10. The mark represents the degree of relevancy between the retrieved image and the query image: number 1 denotes the image that is most relevant and closely connected to the query image, from the user's own perception, while number 10 denotes the lowest degree of relevancy. Once the user submits the feedback, the system, which is automatically connected to the SQL database, updates the table of images and inserts the ranking mark corresponding to each image.

As for the second button, the user is allowed to visualize the updated order of the retrieved images related to the same query image. In other words, the system collects all feedbacks associated with the tested query image and averages them to give the new order of each retrieved image. Notably, the images are arranged in ascending order, as the lowest number denotes the most relevant. Remarkably, the user is capable of visualizing the new order of the retrieved images even if he hasn't submitted his own feedback. This means that the step of inserting the feedback is optional, and the user is not obliged to do it if not interested. The figure below shows a demo of the set of retrieved images in the new rearranged order.

Figure 4.7

The above figure clearly shows how the images are rearranged taking into account the users' feedback. After each iteration, the system averages the total feedback entries and returns the order of each image according to the equation below.

Average Feedback = (Σ Feedback Entries) / (Number of feedbacks submitted)

Equation 4-1: Average Feedback
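A sketch of this averaging and re-ranking step (with hypothetical image ids and marks; the real system performs this through the SQL stored procedures):

```python
def rerank(feedback_log):
    """Average each image's submitted ranking marks (1 = most relevant)
    and sort ascending, so the lowest average comes first."""
    avg = {img: sum(marks) / len(marks) for img, marks in feedback_log.items()}
    return sorted(avg, key=avg.get)

# Marks given by three users to three retrieved images
log = {"img_a": [3, 2, 4], "img_b": [1, 1, 2], "img_c": [2, 3, 1]}
order = rerank(log)   # averages: a=3.0, b=1.33, c=2.0 -> b, c, a
```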

Precision is used as the performance measure to evaluate the retrieval system. Precision relates the number of relevant images retrieved to the total number of retrieved images (relevant and irrelevant), as shown in the equation below.

Precision = (Number of relevant images retrieved) / (Number of images retrieved) × 100

Equation 4-2: Precision

The precision is computed for each category separately in order to evaluate the overall performance of our system before and after consecutive rounds of relevance feedback. The results are presented in the table below. The percentages are computed over the top 10 retrieved images of the set under study.

Category               Before RF   After RF    After RF    After RF
                                   Round 1     Round 2     Round 3
Babies                    50          60          70          80
Cats                      60          70          80         100
Flowers                   50          60          80          90
Musical Instruments       40          60          60          80

Table 4.2 – Precision (%) per Category Before and After RF Rounds

4.5 Discussion

Regarding the first category (Babies), the precision of 50% before relevance feedback means that originally the system retrieves 5 relevant images out of the top 10. After the first round of feedback submission, the precision increases to 60%, meaning that 6 images out of the top 10 are now relevant to the query image. After each round, more feedbacks are taken into account, which remarkably increased the precision to 80% by the third round.

To test the performance of the entire system, the mean average precision (MAP) is computed according to the following equation.

Mean Average Precision (per round) = (Σ Precision of each category) / (Number of categories)

Equation 4-3: Mean Average Precision

The table below shows the MAP of each round.

         Before RF   Round 1   Round 2   Round 3
MAP        50%        62.5%     72.5%     87.5%

Table 4.3 – Comparison of Mean Average Precision
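The MAP values can be reproduced directly from the per-category precision table:

```python
def mean_average_precision(per_category):
    """Mean of the per-category precisions for one feedback round."""
    return sum(per_category) / len(per_category)

# Top-10 precision (%) for Babies, Cats, Flowers, Musical Instruments
rounds = {"Before RF": [50, 60, 50, 40],
          "Round 1":   [60, 70, 60, 60],
          "Round 2":   [70, 80, 80, 60],
          "Round 3":   [80, 100, 90, 80]}
maps = {name: mean_average_precision(p) for name, p in rounds.items()}
# -> 50.0, 62.5, 72.5, 87.5
```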

The results are clear enough to demonstrate how important the engagement of RF in content-based image retrieval systems is: for a CBIR system, a relevancy of 87.5% indicates high performance in the field of query-by-image-content retrieval.

4.6 Conclusion

The use of relevance feedback to enhance the performance of CBIR systems has become a must. It permits interaction between the user and the system in a useful, efficient way. Moreover, the user's feedback is an additional feature of CBIR systems that helps the model retrieve many more images visually related to the query concept expeditiously. This is clearly shown by the accuracy and precision of the retrieved results, which were quite high compared to systems that do not consider users' relevance feedback.

As for the performance measures, the precision percentage and the mean average precision were sufficient proof to emphasize the significance of relevance feedback in CBIR systems.

CHAPTER 5

CONCLUSION AND FUTURE WORK

5.1 Conclusion

As huge image databases become a vast necessity in scientific, medical, and advertising/marketing domains, approaches for organizing a database of images and for effective retrieval have become crucial. From here, CBIR systems were born. CBIR is the field of representing, organizing, and searching images based on their visual content rather than textual tags describing them. Retrieval of images is no longer based on textual phrases and annotations but on features extracted directly from the image data. This approach retrieves digital images from large databases using the content of the images themselves, without human intervention, thereby eliminating inefficient and subjective manual labeling.

The implementation of Relevance Feedback in CBIR systems proved that by testing

the user’s satisfaction and engaging it in the system, the retrieval process is much more

efficient and precise. That is, the result of the retrieved images met the users’ desires

efficiently after engaging the feedbacks of several users within the retrieval process.

5.2 Future Work

As for future work, we aim to increase the performance of CBIR systems by enhancing the way we search. Besides the relevance feedback we have added, searching would be better if we combined the two approaches (CBIR and TBIR) instead of searching in terms of an image alone. In other words, we would join textual and visual features using an algorithm that fuses visual descriptors and textual descriptors to produce a multimodal global feature that helps the retrieval process. In addition, we will use content-based image retrieval to create a higher-level image representation, for instance a visual phrase. Furthermore, this model can be used to retrieve frames from videos using a query image.

REFERENCES

[1] F. Long, H. Zhang, and D. Feng, "Fundamentals of Content-Based Image Retrieval," 2000.


[2] P. F. A. F. S. a. C. G. Peter Wilkins, "Text Based Approaches for Content-Based Image

Retrieval on Large Image Collections," Glasnevin, Dublin 9, Ireland.


[3] S. S. R. M. S. Neha Jain, "Content Base Image Retrieval using Combination of Color,

Shape and Texture Features," 2013, p. 2.


[4] M. P. C. G. Radu Tudor Ionescu, "Local Learning to Improve Bag of Visual Words

Model for Facial Expression Recognition," 2011.


[5] D. C. W. K. H. Feng, "A curve evolution approach for image segmentation using

adaptive flows," 2001, p. 494–499.


[6] B. M. W.Y. Ma, "Edge flow: a framework of boundary detection and image

segmentation," in IEEE Conference on Computer Vision and Pattern Recognition

(CVPR), 1997, p. 744–749.


[7] D. G. J. B. D. P.L. Stanchev, "High level color similarity retrieval," (2003), p. 363–369.
[8] K. V. J.-H. O. K.A. Hua, "SamMatch: a flexible and efficient sampling-based image

retrieval technique for large image databases," in Proceedings of the Seventh ACM

International Multimedia Conference (ACM Multimedia’99), November 1999, p. 225–

234.
[9] P. M. D. Comaniciu, "Robust analysis of feature spaces: color image segmentation," in

Proceedings of the IEEE Conference on Computer, 1997, p. 750–755.


[10] R. B. M. F. J. H. W. N. D. W. E. C. Faloutsos, "Efficient and effective querying by

image content," in J. Intell. Inf. Syst. 3 (3–4) , (1994) , p. 231–262.


[11] J. L. G. W. J.Z. Wang, "SIMPLIcity: semantics-sensitive integrated matching for picture

libraries," in IEEE Trans. Pattern Anal. Mach. Intell. 23 , (2001) , p. 947–963..


[12] D. S. C.P. Town, "Content-based image retrieval using semantic visual categories," in

Society for Manufacturing Engineers, 2011, p. 201.


[13] I. K. M. S. V. Mezaris, " An ontology approach to object-based image retrieval, ,," in

Proceedings of the ICIP, vol. II, 2003, p. 511–514.


[14] A. V. K.N. Plataniotis, " Color Image Processing and Applications," 2000.
[15] D. Z. G. L. W.-Y. M. Y. Liu, "Region-based image retrieval with perceptual colors," in

Proceedings of the Pacific-Rim Multimedia Conference (PCM), December 2004, p.

931–938.
[16] H. F. T.-S. C. C.-H. L. R. Shi, " An adaptive image content representation and

segmentation approach to automatic image annotation," in International Conference on

Image and Video Retrieval, 2004, p. 545–554.


[17] B. Manjunath, "Color and texture descriptors".
[18] M. L. L. Z. H.-J. Z. B. Z. F. Jing, "Learning in region regionbased image retrieval," in

Proceedings of the International Conference on Image and Video Retrieval (CIVR2003),

2003, p. 206–215.
[19] S. B. H. G. J. M. C. Carson, " Blobworld: image segmentation using expectation-

maximization and its application to image querying," in IEEE Trans. Pattern Anal.

Mach. Intell. 8 (8) , (2002), p. 1026–1038.


[20] I. C. I.K. Sethi, "Mining association rules between low-level image features and high-

level concepts, Proceedings of the SPIE Data Mining and Knowledge Discovery, vol.

III,," 2001.
[21] S. K. Alphonsa Thomas, "A Survey on Image Feature Descriptors-Color, Shape and

Texture".
[22] M. G. J. Eakins, "Content-based image retrieval," in Technical Report, 1999.
[23] F. Long, H. Zhang, and D. Feng, "Fundamentals of content-based image retrieval," in Multimedia Information Retrieval and Management, Berlin, 2003.
[24] R. S. a. C. Jawahar, "Word Image Retrieval using Bag of Visual Words," India, 2012.
[25] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[26] S. T. Avi Mehta, "Image Segmentation using k-means clustering, EM and Normalized

Cuts".
[27] Y. Freund and R. E. Schapire, "A Short Introduction to Boosting."
[28] TOP-SURF. [Online]. Available: http://press.liacs.nl/researchdownloads/topsurf/.
[29] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features."

