
Tackling Curse of Dimensionality for Efficient Content Based Image Retrieval
Presented By: Dr. Minakshi Banerjee
RCC Institute of Information Technology
Canal South Road, Beliaghata, Kolkata - 700015, West Bengal, India

Wednesday, 01.07.2015

CBIR: What is it?


- Retrieval of images based not on keywords or annotations but on features extracted directly from the image data.
- To aid image retrieval, techniques from statistics, pattern recognition, signal processing, and computer vision are commonly deployed.
- The feature extraction process may produce a high-dimensional feature space.
- Although a high-dimensional feature space reduces the semantic gap, high-dimensional features are difficult to handle when classification and similarity-based retrieval are involved.

Objectives of the paper


- Tackling the curse of dimensionality by a non-linear mapping, as most real-world data requires nonlinear methods to successfully perform tasks that involve the analysis and discovery of patterns.
- Search space reduction by clustering, using the optimum number of clusters.
- Outlier detection for performance improvement.
- A one-class support vector machine is proposed for classification, as this classifier is biased toward the learned concept of a particular category.

Proposed Method
Figure 1: Proposed method. The pipeline: extract CSD visual features from the image database (training and test set preparation); map the high-dimensional feature space to a lower-dimensional space using kernel PCA; cluster the mapped space with PAM, choosing the number of clusters from the optimum silhouette-width plot; compare the query features in the mapped space using the L1 norm and display the 36 nearest images; after one-time user interaction marking relevant and non-relevant images, accumulate training samples from the KPCA-mapped space corresponding to all relevant images, and test samples from the query image's cluster with outliers removed by SVC (reduced database); classify with a one-class SVM, automatically select all relevant images, pick the original CSD feature vectors corresponding to all positive samples, and display the 36 nearest images using the L1 norm.
Proposed Method: Algorithm

Figure 2: Proposed method

Tackling the curse of dimensionality: What is this and Why?
- It produces a new feature space with a significantly smaller dimension, which comprises a large part of the original information.
- This also allows us to de-noise the data.

Kernel Principal Component Analysis (KPCA)


- The basic idea is to first map the input space into a feature space via a nonlinear map Φ(x), and then compute the principal components in that feature space.
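A minimal sketch of this reduction step, assuming scikit-learn's KernelPCA is available; the RBF kernel, gamma value, and target dimension here are illustrative choices, not the paper's settings:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def reduce_features(X, n_components=32, gamma=0.1):
    """Map feature vectors X (n_samples x n_features) into a
    lower-dimensional space via RBF-kernel PCA."""
    kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma)
    return kpca.fit_transform(X)

# Toy example: 100 descriptors of dimension 256 reduced to 32.
X = np.random.RandomState(0).rand(100, 256)
Z = reduce_features(X)
print(Z.shape)  # (100, 32)
```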

Definition
A reproducing kernel k is a function k : X × X → R.
- The domain of k consists of the data patterns {x1, x2, ..., xl}.
- X is a compact set in which the data lives, typically a subset of R^N.
- Computing k is equivalent to mapping data patterns into a higher dimensional space F and then taking the dot product there.
- A feature map Φ : R^N → F is a function that maps the input data patterns into the higher dimensional space F.

Illustration
Using a feature map Φ to map the data from the input space into a higher dimensional feature space F:

Kernel Trick
We would like to compute the dot product in the higher dimensional space, Φ(x) · Φ(y). To do this we only need to compute k(x, y), since

  k(x, y) = Φ(x) · Φ(y).

Note that the feature map Φ is never explicitly computed. We avoid this, and therefore avoid a burdensome computational task.
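The trick can be checked numerically. A small sketch for the homogeneous degree-2 polynomial kernel in 2-D, where the explicit feature map phi is the standard one for this kernel:

```python
import numpy as np

def phi(v):
    # Explicit degree-2 feature map for a 2-D vector:
    # phi(v) = (v1^2, sqrt(2)*v1*v2, v2^2)
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

def k(x, y):
    # Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2
    return float(x @ y) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, 4.0])
print(k(x, y), phi(x) @ phi(y))  # both equal 121.0
```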

Example kernels
Gaussian: k(x, y) = exp(−||x − y||² / (2σ²))
Polynomial: k(x, y) = (x · y + c)^d, c ≥ 0
Sigmoid: k(x, y) = tanh(κ⟨x, y⟩ + θ)
Nonlinear separation can be achieved.
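These three kernels can be written down directly; a sketch, with illustrative parameter values:

```python
import numpy as np

def gaussian(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def polynomial(x, y, c=1.0, d=2):
    # k(x, y) = (x . y + c)^d, c >= 0
    return (x @ y + c) ** d

def sigmoid(x, y, kappa=0.1, theta=0.0):
    # k(x, y) = tanh(kappa * <x, y> + theta)
    return np.tanh(kappa * (x @ y) + theta)

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(gaussian(x, y))    # exp(-1) ~ 0.368
print(polynomial(x, y))  # (0 + 1)^2 = 1.0
print(sigmoid(x, y))     # tanh(0) = 0.0
```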

Nonlinear Separation

Mercer Theory
Input Space to Feature Space
Necessary condition for the kernel-Mercer trick:

  k(x, y) = Σ_{i}^{N_F} λ_i φ_i(x) φ_i(y)   and   A = Σ_i λ_i u_i u_iᵀ

- N_F is equal to the rank of u_i u_iᵀ (the outer product).
- φ_i is the normalized eigenfunction, analogous to a normalized eigenvector.

Mercer :: Linear Algebra

Kernel Principal Component Analysis....

KPCA and Dot Products

From Feature Space to Input Space

Projection Distance Illustration

Minimizing Projection Distance

Fixed-point iteration

One Class Support Vector Machine (OCSVM)


- OCSVM maps input data into a high dimensional feature space using a kernel and iteratively finds the maximal-margin hyperplane which best separates the training data from the origin.
- The quadratic programming minimization function is (the standard formulation of Schölkopf et al.):

    min over w, ξ, ρ of  (1/2)||w||² + (1/(νl)) Σ_i ξ_i − ρ
    subject to  (w · Φ(x_i)) ≥ ρ − ξ_i,  ξ_i ≥ 0

One Class Support Vector Machine (OCSVM)....


Here, (w, ρ) are a weight vector and offset parameterizing a hyperplane in the feature space associated with the kernel.

Parameter ν:
- it sets an upper bound on the fraction of outliers (training examples regarded as out-of-class), and
- it is a lower bound on the fraction of training examples used as support vectors.

Parameter ξ_i:
- To prevent the SVM classifier from over-fitting with noisy data (i.e., to create a soft margin), slack variables ξ_i are introduced to allow some data points to lie within the margin. Using Lagrange techniques, and using a kernel function for the dot-product calculations, the decision function becomes:

    f(x) = sgn((w · Φ(x)) − ρ) = sgn(Σ_i α_i k(x_i, x) − ρ)

One Class Support Vector Machine (OCSVM)....

This method thus creates a hyperplane, characterized by w and ρ, which has maximal distance from the origin in feature space F and separates all the data points from the origin.
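A minimal sketch of one-class classification, assuming scikit-learn's OneClassSVM; the synthetic data and the ν value are illustrative, not the paper's:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = rng.normal(0.0, 1.0, size=(200, 8))  # samples of the learned concept
X_far = rng.normal(8.0, 1.0, size=(20, 8))     # clearly off-concept samples

# nu upper-bounds the outlier fraction and lower-bounds the SV fraction.
clf = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_train)
print(clf.predict(X_far))  # -1 marks out-of-class points
```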

Partitioning Around Medoids (PAM)


A medoid is the most central object (the best representative) of each cluster. This allows using only the dissimilarities d(r, s) of all pairs (r, s) of the objects. The aim is to find the clusters C1, C2, ..., Ck that minimize the target function:

  Σ_{i=1}^{k} Σ_{r ∈ C_i} d(r, m_i),  where for each i the medoid m_i minimizes Σ_{r ∈ C_i} d(r, m_i).

Partitioning Around Medoids (PAM): Algorithm


1. Randomly select k objects m1, m2, ..., mk as the initial medoids.
2. Until the maximum number of iterations is reached or no improvement of the target function has been found, do:
   - Calculate the clustering based on m1, m2, ..., mk by associating each point with the nearest medoid, and calculate the value of the target function.
   - For all pairs (m_i, x_s), where x_s is a non-medoid point, try to improve the target function by taking x_s to be a new medoid point and m_i to be a non-medoid point.
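The swap-based search above can be sketched as a naive PAM over a precomputed dissimilarity matrix; this is a sketch for clarity, not an optimized implementation:

```python
import numpy as np

def pam(D, k, max_iter=100, seed=0):
    """Naive PAM: D is an (n x n) dissimilarity matrix.
    Returns medoid indices and cluster labels."""
    n = len(D)
    medoids = np.random.RandomState(seed).choice(n, k, replace=False)
    for _ in range(max_iter):
        cost = D[:, medoids].min(axis=1).sum()
        best_cost, best_medoids = cost, medoids
        # Try every (medoid, non-medoid) swap and keep the best one.
        for i in range(k):
            for s in range(n):
                if s in medoids:
                    continue
                cand = medoids.copy()
                cand[i] = s
                c = D[:, cand].min(axis=1).sum()
                if c < best_cost:
                    best_cost, best_medoids = c, cand
        if best_cost >= cost:
            break  # no improving swap found
        medoids = best_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# Two well-separated 1-D groups:
X = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])[:, None]
D = np.abs(X - X.T)
medoids, labels = pam(D, 2)
print(sorted(X[medoids].ravel()))  # the central objects 1.0 and 11.0
```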

Number of clusters selection using PAM


p(r): the average dissimilarity of the object r to the objects of the same cluster.
q(r): the average dissimilarity of the object r to the objects of the neighboring cluster.
Silhouette of the object r: the measure of how well r is clustered,

  silw(r) = (q(r) − p(r)) / max(p(r), q(r)) ∈ [−1, 1]

When silw(r) is:
- close to 1: the object r is well clustered;
- close to 0: the object r is at the boundary of clusters;
- less than 0: the object r is probably placed in a wrong cluster.

Number of clusters selection using PAM: Example


In the following figure, for each number of clusters k (k = 1, 2, 3, ..., 25) the silhouette width of every point is computed and the average is found. The cluster number that gives the maximum average silhouette width is then selected. The figure shows that, taking up to 25 clusters, the optimum number of clusters is 3.
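This selection loop can be sketched as follows; here scikit-learn's silhouette_score is used with KMeans as a stand-in for PAM, on synthetic three-blob data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.RandomState(0)
# Three well-separated blobs, so the best k should come out as 3.
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0.0, 5.0, 10.0)])

scores = {}
for k in range(2, 8):  # the silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # 3
```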

[Figure: average silhouette width versus number of clusters (k up to 25), with the maximum at k = 3]
Outliers detection criteria of Support Vector Clustering (SVC)
Let {x_i} be a dataset with dimensionality d. SVC computes a sphere of radius R and centre a containing all these data. The smallest such sphere is obtained by solving a minimization problem in its Lagrangian formulation, which produces the following expression:

  ||x − a||² = (x · x) − 2 Σ_{i=1}^{N} β_i (x · x_i) + Σ_{i=1}^{N} Σ_{j=1}^{N} β_i β_j (x_i · x_j) ≥ R²,

where the β_i are the Lagrange multipliers. For a data point x to be an outlier, the necessary condition is β_i > 0.
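With a linear kernel the slide's expansion can be checked numerically. In the sketch below the multipliers beta are arbitrary placeholders normalized to sum to 1 (as the SVDD constraints require), not solutions of the actual quadratic program:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(10, 3)          # toy "support vector" points x_i
beta = rng.rand(10)
beta /= beta.sum()           # sum_i beta_i = 1, so a lies in the convex hull
a = beta @ X                 # sphere centre a = sum_i beta_i * x_i

def dist_sq(x):
    # ||x - a||^2 expanded into dot products, as on the slide
    return x @ x - 2 * beta @ (X @ x) + beta @ (X @ X.T) @ beta

x = rng.rand(3)
print(np.isclose(dist_sq(x), np.sum((x - a) ** 2)))  # True

# A point is flagged as an outlier when its distance exceeds the radius.
R_sq = max(dist_sq(xi) for xi in X)
print(dist_sq(np.array([5.0, 5.0, 5.0])) > R_sq)  # True
```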

Outliers detection criteria of Support Vector Clustering (SVC): Example