
**Understanding the bag-of-words model: a statistical framework**

Yin Zhang · Rong Jin · Zhi-Hua Zhou

Received: 27 February 2010 / Accepted: 2 July 2010 / Published online: 28 August 2010

© Springer-Verlag 2010

Abstract  The bag-of-words model is one of the most popular representation methods for object categorization. The key idea is to quantize each extracted key point into one of the visual words, and then represent each image by a histogram of the visual words. For this purpose, a clustering algorithm (e.g., K-means) is generally used to generate the visual words. Although a number of studies have shown encouraging results of the bag-of-words representation for object categorization, theoretical study of the properties of the bag-of-words model remains almost untouched, possibly due to the difficulty introduced by the heuristic clustering process. In this paper, we present a statistical framework which generalizes the bag-of-words representation. In this framework, the visual words are generated by a statistical process rather than by a clustering algorithm, while the empirical performance remains competitive with clustering-based methods. A theoretical analysis based on statistical consistency is presented for the proposed framework. Moreover, based on the framework we develop two algorithms which do not rely on clustering, yet achieve competitive performance in object categorization when compared to clustering-based bag-of-words representations.

Keywords  Object recognition · Bag-of-words model · Rademacher complexity

1 Introduction

Inspired by the success of text categorization [6, 10], the bag-of-words representation has become one of the most popular methods for representing image content and has been successfully applied to object categorization. In a typical bag-of-words representation, "interesting" local patches are first identified from an image, either by dense sampling [14, 25] or by an interest point detector [9]. These local patches, represented by vectors in a high-dimensional space [9], are often referred to as key points.

To efficiently handle these key points, the key idea is to quantize each extracted key point into one of the visual words, and then represent each image by a histogram of the visual words. This vector quantization procedure, often referred to as the bag-of-words representation, converts the object categorization problem into a text categorization problem. A clustering procedure (e.g., K-means) is often applied to group the key points from all training images into a large number of clusters, with the center of each cluster corresponding to a different visual word. Studies [3, 20] have shown promising performance of the bag-of-words representation in object categorization. Various methods [7, 8, 12, 13, 17, 21, 25] have been proposed for visual vocabulary construction to improve both the computational efficiency and the classification accuracy of object categorization. However, to the best of our knowledge, there is no theoretical analysis of the statistical properties of vector quantization for object categorization.

Y. Zhang · Z.-H. Zhou (corresponding author)
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
e-mail: zhouzh@lamda.nju.edu.cn

Y. Zhang
e-mail: zhangyin@lamda.nju.edu.cn

R. Jin
Department of Computer Science & Engineering, Michigan State University, East Lansing, MI 48824, USA
e-mail: rongjin@cse.msu.edu

Int. J. Mach. Learn. & Cyber. (2010) 1:43–52
DOI 10.1007/s13042-010-0001-0

In this paper, we present a statistical framework which generalizes the bag-of-words representation, and aim to provide a theoretical understanding of vector quantization and its effect on object categorization from the viewpoint of statistical consistency. In particular, we view

1. each visual word as a quantization function $f_k(x)$ that is randomly sampled from a class of functions $\mathcal{F}$ according to an unknown distribution $P_{\mathcal{F}}$; and
2. each key point of an image as a random sample from an unknown distribution $q_i(x)$.

The above statistical description of key points and visual words allows us to interpret the similarity between two images in the bag-of-words representation, the key quantity in object categorization, as an empirical expectation over the distributions $q_i(x)$ and $P_{\mathcal{F}}$. Based on the proposed statistical framework, we present two random algorithms for vector quantization, one based on the empirical distribution and the other based on kernel density estimation. We show that both random algorithms for vector quantization are statistically consistent in estimating the similarity between two images. Our empirical study with object recognition also verifies that the two proposed algorithms (i) yield recognition accuracy comparable to the clustering-based bag-of-words representation, and (ii) are resilient to the number of visual words when the number of training examples is limited. The success of the two simple algorithms validates the proposed statistical framework for vector quantization.

The rest of this paper is organized as follows. Section 2 presents an overview of existing approaches for key point quantization used in object recognition. Section 3 presents a statistical framework that generalizes the classical bag-of-words representation, and two random algorithms for vector quantization based on the proposed framework. We show that both algorithms are statistically consistent in estimating the similarity between two images. The empirical study with object recognition reported in Sect. 4 shows encouraging results of the proposed algorithms for vector quantization, which in turn validates the proposed statistical framework for the bag-of-words representation. Section 5 concludes this work.

2 Related work

In object recognition and texture analysis, a number of algorithms have been proposed for key point quantization. Among them, K-means is probably the most popular one. To reduce the high computational cost of K-means, hierarchical K-means was proposed in [13] for more efficient vector quantization. In [25], a supervised learning algorithm is proposed to reduce the visual vocabulary initially obtained by K-means into a more descriptive and compact one. Farquhar et al. [5] model the problem as a Gaussian mixture model, where each visual word corresponds to a Gaussian component, and use the maximum a posteriori (MAP) approach to learn the parameters. A method based on mean-shift is proposed in [7] for vector quantization, to resolve the problem that K-means tends to 'starve' medium-density regions in feature space; there, each key point is allocated to the first visual word similar to it. Moosmann et al. [12] use extremely randomized clustering forests to efficiently generate a highly discriminative coding of visual words. To minimize the loss of information in vector quantization, Lazebnik and Raginsky [8] seek a compressed representation of vectors that preserves the sufficient statistics of features. In [16], images are characterized using a set of category-specific histograms describing whether the content can best be modeled by the universal vocabulary or a specific vocabulary. Tuytelaars and Schmid [21] propose a quantization method that discretizes the feature space by a regular lattice. van Gemert et al. [22] use kernel density estimation to address the problems of 'codeword uncertainty' and 'codeword plausibility'.

Although many studies have shown encouraging results of the bag-of-words representation for object categorization, none of them provides a statistical consistency analysis, which would reveal the asymptotic behavior of the bag-of-words model for object recognition. Unlike the existing statistical approaches for key point quantization, which are designed to reduce the training error, the proposed framework generalizes the bag-of-words model by statistical expectation, making it possible to analyze the statistical consistency of the bag-of-words model. Finally, we would like to point out that although several randomized approaches [12, 14, 24] have been proposed for key point quantization, none of them provides a theoretical analysis of statistical consistency. In contrast, we present not only the theoretical results for the two proposed random algorithms for vector quantization, but also the results of an empirical study with object recognition that support the theoretical claims.

3 A statistical framework for the bag-of-words representation

In this section, we first present a statistical framework for the bag-of-words representation in object categorization, followed by two random algorithms derived from the proposed framework. The analysis of statistical consistency is also presented for the two proposed algorithms.


3.1 A statistical framework

We consider the bag-of-words representation for images, with each image represented by a collection of local descriptors. We denote by $N$ the number of training images, and by $X_i = (x_i^1, \ldots, x_i^{n_i})$ the collection of key points used to represent image $I_i$, where $x_i^l \in \mathcal{X}$, $l = 1, \ldots, n_i$, is a key point in the feature space $\mathcal{X}$. To facilitate the statistical analysis, we assume that each key point $x_i^l$ in $X_i$ is randomly drawn from an unknown distribution $q_i(x)$ associated with image $I_i$.

The key idea of the bag-of-words representation is to quantize each key point into one of the visual words, which are often derived by clustering. We generalize this idea of quantization by viewing the mapping to a visual word $v_k \in \mathcal{X}$ as a quantization function $f_k(x): \mathcal{X} \mapsto [0, 1]$. Due to the uncertainty in constructing the vocabulary, we assume that the quantization function $f_k(x)$ is randomly drawn from a class of functions, denoted by $\mathcal{F}$, via an unknown distribution $P_{\mathcal{F}}$. To capture the behavior of quantization, we design the function class $\mathcal{F}$ as follows:

$$\mathcal{F} = \{ f(x; v) \mid f(x; v) = I(\|x - v\| \le \rho),\ v \in \mathcal{X} \} \qquad (1)$$

where the indicator function $I(z)$ outputs 1 when $z$ is true, and 0 otherwise. In the above definition, each quantization function $f(x; v)$ is essentially a ball of radius $\rho$ centered at $v$: it outputs 1 when a point $x$ is within the ball, and 0 when $x$ is outside the ball. This definition of the quantization function is clearly related to vector quantization by data clustering.
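As an illustrative sketch (not from the paper), the ball-shaped quantization function of Eq. 1 can be written directly in code; the radius `rho` and the center below are hypothetical values chosen only for the example:

```python
import numpy as np

def quantize(x, v, rho):
    """Quantization function f(x; v) of Eq. 1: a ball of radius rho centered at v.

    Returns 1.0 when the key point x falls inside the ball, 0.0 otherwise.
    """
    return 1.0 if np.linalg.norm(np.asarray(x, float) - np.asarray(v, float)) <= rho else 0.0

center = np.zeros(2)
print(quantize([0.3, 0.4], center, rho=1.0))  # inside: distance 0.5 <= 1.0, prints 1.0
print(quantize([3.0, 4.0], center, rho=1.0))  # outside: distance 5.0 > 1.0, prints 0.0
```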

Based on the above statistical interpretation of key points and quantization functions, we can now provide a statistical description for the histogram of visual words, which is the key to the bag-of-words representation. Let $\hat{h}_i^k$ denote the normalized number of key points in image $I_i$ that are mapped to visual word $v_k$. Given $m$ visual words, or $m$ quantization functions $\{f_k(x)\}_{k=1}^m$ sampled from $\mathcal{F}$, $\hat{h}_i^k$ is computed as

$$\hat{h}_i^k = \frac{1}{n_i} \sum_{l=1}^{n_i} f_k\big(x_i^l\big) = \hat{E}_i[f_k(x)] \qquad (2)$$

where $\hat{E}_i[f_k(x)]$ stands for the empirical expectation of the function $f_k(x)$ based on the samples $x_i^1, \ldots, x_i^{n_i}$. We can generalize the above computation by replacing the empirical expectation $\hat{E}_i[f_k(x)]$ with an expectation over the true distribution $q_i(x)$, i.e.,

$$h_i^k = E_i[f_k(x)] = \int dx\, q_i(x) f_k(x). \qquad (3)$$

The bag-of-words representation for image $I_i$ is then expressed by the vector $h_i = (h_i^1, \ldots, h_i^m)$.
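A minimal sketch of Eq. 2, computing the empirical histogram $\hat{h}_i$ of one image from its key points and a set of sampled centers (the arrays below are made-up illustrations, not the paper's data):

```python
import numpy as np

def bow_histogram(keypoints, centers, rho):
    """Empirical histogram of Eq. 2: entry k is the fraction of the image's
    key points that fall inside the ball of radius rho around center k."""
    keypoints = np.asarray(keypoints, dtype=float)   # shape (n_i, d)
    centers = np.asarray(centers, dtype=float)       # shape (m, d)
    # Pairwise distances between key points and centers: shape (n_i, m)
    dists = np.linalg.norm(keypoints[:, None, :] - centers[None, :, :], axis=2)
    # Average the indicator I(||x - v_k|| <= rho) over the n_i key points
    return (dists <= rho).mean(axis=0)

kps = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [2.1, 2.0]])
cents = np.array([[0.0, 0.0], [2.0, 2.0]])
print(bow_histogram(kps, cents, rho=0.5))  # -> [0.5 0.5]
```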

In the next step, we analyze the pairwise similarity between two images. It is important to note that the pairwise similarity plays a critical role in pattern classification problems, including object categorization. According to learning theory [18], it is the pairwise similarity, not the vector representation of images, that determines the classification performance. Using the vector representations $h_i$ and $h_j$, the similarity between two images $I_i$ and $I_j$, denoted by $\bar{s}_{ij}$, is computed as

$$\bar{s}_{ij} = \frac{1}{m} h_i^{\top} h_j = \frac{1}{m} \sum_{k=1}^{m} E_i[f_k(x)]\, E_j[f_k(x)] \qquad (4)$$

Similar to the previous analysis, the summation in the above expression can be viewed as an empirical expectation over the sampled quantization functions $f_k(x)$, $k = 1, \ldots, m$. We thus generalize the definition of pairwise similarity in Eq. 4 by replacing the empirical expectation with the true expectation, and obtain the true similarity between two images $I_i$ and $I_j$ as

$$s_{ij} = E_{f \sim P_{\mathcal{F}}}\big[ E_i[f(x)]\, E_j[f(x)] \big] \qquad (5)$$

According to the definition in Eq. 1, each quantization function is parameterized by a center $v$. Thus, to define $P_{\mathcal{F}}$, it suffices to define a distribution for the center $v$, denoted by $q(v)$. Eq. 5 can then be expressed as

$$s_{ij} = E_v\big[ E_i[f(x)]\, E_j[f(x)] \big]. \qquad (6)$$
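For illustration, the fully empirical counterpart of the similarity in Eqs. 4-6 (both expectations replaced by sample averages) can be sketched as follows; the point sets and radius are made-up examples:

```python
import numpy as np

def pairwise_similarity(kps_i, kps_j, centers, rho):
    """Empirical image similarity: the average over sampled centers v_k of
    E_i[f_k] * E_j[f_k], each expectation estimated by a sample mean (Eq. 4)."""
    centers = np.asarray(centers, float)
    def empirical_expectations(kps):
        d = np.linalg.norm(np.asarray(kps, float)[:, None, :]
                           - centers[None, :, :], axis=2)
        return (d <= rho).mean(axis=0)        # one value per center
    return float(np.mean(empirical_expectations(kps_i)
                         * empirical_expectations(kps_j)))

a = np.array([[0.0, 0.0], [0.2, 0.0]])
b = np.array([[0.1, 0.0], [2.0, 2.0]])
cents = np.array([[0.0, 0.0], [2.0, 2.0]])
print(pairwise_similarity(a, b, cents, rho=0.5))  # -> 0.25
```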

3.2 Random algorithms for key point quantization and their statistical consistency

We emphasize that the pairwise similarity in Eq. 6 cannot be computed directly. This is because both distributions $q_i(x)$ and $q(v)$ are unknown, which makes it intractable to compute $E_i[\cdot]$ and $E_v[\cdot]$. In real applications, approximations are needed. In this section, we study how these approximations affect the estimation of the pairwise similarity. In particular, given the pairwise similarity estimated by different kinds of approximated distributions, we aim to bound its difference from the underlying true similarity. To simplify our analysis, we assume that each image has at least $n$ key points.

Assuming that the key points in all the images are sampled from $q(v)$, we have an empirical distribution for $q(v)$, i.e.,

$$\hat{q}(v) = \frac{1}{\sum_{i=1}^{N} n_i} \sum_{i=1}^{N} \sum_{l=1}^{n_i} \delta\big(v - x_i^l\big) \qquad (7)$$

where $\delta(x)$ is the Dirac delta function, with $\int \delta(x)\, dx = 1$ and $\delta(x) = 0$ for $x \ne 0$. Direct estimation of pairwise similarities using the above empirical distribution is computationally expensive, because the number of key points in all images can be very large. In the bag-of-words model, $m$ visual words are used as prototypes for the key points in all the images. Let $v_1, \ldots, v_m$ be the $m$ visual words randomly sampled from the key points in all the images. The empirical distribution $\hat{q}(v)$ is then

$$\hat{q}(v) = \frac{1}{m} \sum_{k=1}^{m} \delta(v - v_k) \qquad (8)$$

In the next step, we aim to approximate the unknown distribution $q_i(x)$ in two different ways, and show the statistical consistency of each approximation.

3.2.1 Empirically estimated density function for $q_i(x)$

First, we approximate $q_i(x)$ by the empirical distribution $\hat{q}_i(x)$ defined as follows:

$$\hat{q}_i(x) = \frac{1}{n_i} \sum_{l=1}^{n_i} \delta\big(x - x_i^l\big) \qquad (9)$$

Given the approximations for the distributions $q_i(x)$ and $q(v)$, we can now compute the approximation of the pairwise similarity $s_{ij}$ defined in Eq. 6. For Eq. 9, the pairwise similarity, denoted by $\hat{s}_{ij}$, is computed as

$$\hat{s}_{ij} = \hat{E}_v\big[ \hat{E}_i[f(x)]\, \hat{E}_j[f(x)] \big]
= \frac{1}{m} \sum_{k=1}^{m} \left( \frac{1}{n_i} \sum_{l=1}^{n_i} I\big(\big\|x_i^l - v_k\big\| \le \rho\big) \right) \times \left( \frac{1}{n_j} \sum_{l=1}^{n_j} I\big(\big\|x_j^l - v_k\big\| \le \rho\big) \right) \qquad (10)$$
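Putting Eqs. 8-10 together, the estimator $\hat{s}_{ij}$ can be sketched end to end: visual words are drawn uniformly from the pooled key points of all images, and Eq. 10 is then evaluated. Everything below is an illustrative sketch with synthetic data; `m` and `rho` are free parameters of the example:

```python
import numpy as np

def estimate_similarity(images, m, rho, rng=None):
    """Estimate the pairwise similarity matrix of Eq. 10.

    images: list of (n_i, d) arrays of key points, one per image.
    The m centers are sampled uniformly from the pooled key points (Eq. 8).
    """
    rng = np.random.default_rng(rng)
    pool = np.vstack(images)
    centers = pool[rng.choice(len(pool), size=m, replace=True)]
    # h[i, k] = empirical expectation E_i[f_k] for image i and center k
    h = np.array([(np.linalg.norm(X[:, None, :] - centers[None, :, :],
                                  axis=2) <= rho).mean(axis=0)
                  for X in images])
    return h @ h.T / m            # s_hat[i, j] = (1/m) * sum_k h[i,k] * h[j,k]

imgs = [np.random.default_rng(0).normal(size=(50, 2)),
        np.random.default_rng(1).normal(size=(50, 2)) + 5.0]
S = estimate_similarity(imgs, m=200, rho=1.0, rng=42)
# Key points of the same image cluster together, so the diagonal dominates.
print(S)
```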

To show the statistical consistency of $\hat{s}_{ij}$, we need to bound $|s_{ij} - \hat{s}_{ij}|$. Since two approximate distributions are used in our estimation, we divide the analysis into two steps. First, we measure $|\bar{s}_{ij} - s_{ij}|$, i.e., the difference in similarity caused by the approximate distribution for $P_{\mathcal{F}}$. Next, we measure $|\hat{s}_{ij} - \bar{s}_{ij}|$, i.e., the difference caused by using the approximate distribution for $q_i(x)$. The overall difference $|s_{ij} - \hat{s}_{ij}|$ is bounded by the sum of the two differences.

We first state the McDiarmid inequality [11], which is used throughout our analysis.

Theorem 1 (McDiarmid's inequality)  Given independent random variables $v_1, v_2, \ldots, v_n, v_i' \in V$, and a function $f: V^n \mapsto \mathbb{R}$ satisfying

$$\sup_{v_1, v_2, \ldots, v_n, v_i' \in V} |f(v) - f(v')| \le c_i \qquad (11)$$

where $v = (v_1, v_2, \ldots, v_n)$ and $v' = (v_1, v_2, \ldots, v_{i-1}, v_i', v_{i+1}, \ldots, v_n)$, the following statement holds:

$$\Pr\big( |f(v) - E[f(v)]| \ge \epsilon \big) \le 2 \exp\left( -\frac{2\epsilon^2}{\sum_{i=1}^{n} c_i^2} \right) \qquad (12)$$

Using the McDiarmid inequality, we have the following theorem, which bounds $|\bar{s}_{ij} - s_{ij}|$.

Theorem 2  Assume $f_k(x)$, $k = 1, \ldots, m$, are randomly drawn from the class $\mathcal{F}$ according to an unknown distribution, and further assume that every function in $\mathcal{F}$ is universally bounded between 0 and 1. With probability $1 - \delta$, the following inequality holds for any two training images $I_i$ and $I_j$:

$$|\bar{s}_{ij} - s_{ij}| \le \sqrt{\frac{1}{2m} \ln \frac{2}{\delta}} \qquad (13)$$

Proof  For any $f \in \mathcal{F}$, we have $0 \le E_i[f(x)]\, E_j[f(x)] \le 1$. Thus, for any $k$, $c_k \le 1/m$. By setting

$$\delta = 2 \exp(-2m\epsilon^2), \quad \text{or} \quad \epsilon = \sqrt{\frac{1}{2m} \ln \frac{2}{\delta}}, \qquad (14)$$

we have $\Pr(|\bar{s}_{ij} - s_{ij}| \le \epsilon) \ge 1 - \delta$. ∎

The above theorem indicates that, if we had the true distribution $q_i(x)$ of each image $I_i$, then with a large number of sampled quantization functions $f_k(x)$ we would have a very good chance of recovering the true similarity $s_{ij}$ with a small error.
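As a quick numerical sanity check of the $O(1/\sqrt{m})$ behavior in Theorem 2 (not an experiment from the paper), one can pick toy one-dimensional densities where $E_i[f_v]$ has a closed form and watch the Monte Carlo error shrink as $m$ grows; the intervals and center distribution below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.5

def E_uniform(v, a, b):
    """E_i[f_v] in closed form when q_i is uniform on [a, b] and
    f_v(x) = I(|x - v| <= rho): the overlap of [v - rho, v + rho] with [a, b]."""
    return max(0.0, min(b, v + rho) - max(a, v - rho)) / (b - a)

def s_bar(m):
    """Empirical similarity (Eq. 5) with m sampled centers v ~ Uniform[-1, 3]."""
    vs = rng.uniform(-1.0, 3.0, size=m)
    return np.mean([E_uniform(v, 0.0, 1.0) * E_uniform(v, 0.5, 1.5) for v in vs])

s_true = s_bar(200_000)          # high-m proxy for the true similarity s_ij
for m in (10, 100, 10_000):
    err = np.mean([abs(s_bar(m) - s_true) for _ in range(50)])
    print(m, err)                # the mean error shrinks roughly like 1/sqrt(m)
```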

The next theorem bounds $|\hat{s}_{ij} - s_{ij}|$.

Theorem 3  Assume each image has at least $n$ randomly sampled key points, and that $f_k(x)$, $k = 1, \ldots, m$, are randomly drawn from an unknown distribution over the class $\mathcal{F}$. With probability $1 - \delta$, the following inequality is satisfied for any two images $I_i$ and $I_j$:

$$|\hat{s}_{ij} - s_{ij}| \le \sqrt{\frac{1}{2m} \ln \frac{2}{\delta}} + 2\sqrt{\frac{1}{2n} \ln \frac{4m^2}{\delta}} \qquad (15)$$

Proof  We first need to bound the difference between $\hat{E}_i[f_k(x)]$ and $E_i[f_k(x)]$. Since $0 \le f(x) \le 1$ for any $f \in \mathcal{F}$, using the McDiarmid inequality we have

$$\Pr\Big( \big| \hat{E}_i[f_k(x)] - E_i[f_k(x)] \big| \ge \epsilon \Big) \le 2\exp(-2n\epsilon^2) \qquad (16)$$

By setting

$$2\exp(-2n\epsilon^2) = \frac{\delta}{2m^2}, \quad \text{or} \quad \epsilon = \sqrt{\frac{1}{2n} \ln \frac{4m^2}{\delta}},$$

with probability $1 - \delta/2$ we have $|\hat{E}_i[f_k(x)] - E_i[f_k(x)]| \le \epsilon$ and $|\hat{E}_j[f_k(x)] - E_j[f_k(x)]| \le \epsilon$ for all $\{f_k(x)\}_{k=1}^m$ simultaneously. As a result, with probability $1 - \delta/2$, for any two images $I_i$ and $I_j$ we have


$$\begin{aligned}
|\hat{s}_{ij} - \bar{s}_{ij}| &\le \frac{1}{m} \sum_{k=1}^{m} \big| \hat{E}_i^k \hat{E}_j^k - E_i^k E_j^k \big| \\
&\le \frac{1}{m} \sum_{k=1}^{m} \Big( \big| (\hat{E}_i^k - E_i^k)\, \hat{E}_j^k \big| + \big| E_i^k\, (\hat{E}_j^k - E_j^k) \big| \Big) \\
&\le \frac{1}{m} \sum_{k=1}^{m} \Big( \big| \hat{E}_i^k - E_i^k \big| + \big| \hat{E}_j^k - E_j^k \big| \Big) \\
&\le 2\epsilon = 2\sqrt{\frac{1}{2n} \ln \frac{4m^2}{\delta}} \qquad (17)
\end{aligned}$$

where $\hat{E}_i^k$ stands for $\hat{E}_i[f_k(x)]$ for simplicity. According to Theorem 2, with probability $1 - \delta/2$, we have

$$|\bar{s}_{ij} - s_{ij}| \le \sqrt{\frac{1}{2m} \ln \frac{2}{\delta}} \qquad (18)$$

Combining Eqs. 17 and 18, with probability $1 - \delta$ the following inequality is satisfied, which is the result in the theorem:

$$|\hat{s}_{ij} - s_{ij}| \le \sqrt{\frac{1}{2m} \ln \frac{2}{\delta}} + 2\sqrt{\frac{1}{2n} \ln \frac{4m^2}{\delta}} \qquad (19)$$
∎

Remark  Theorem 3 reveals an interesting relationship between the estimation error $|s_{ij} - \hat{s}_{ij}|$ and the number of quantization functions (i.e., the number of visual words). The upper bound in Theorem 3 consists of two terms: the first term decreases at a rate of $O(1/\sqrt{m})$, while the second term increases at a rate of $O(\ln m)$. When the number of visual words $m$ is small, the first term dominates the upper bound, and therefore increasing $m$ reduces the difference $|\hat{s}_{ij} - s_{ij}|$. As $m$ becomes significantly larger than $n$, the second term dominates the upper bound, and therefore increasing $m$ leads to a larger $|\hat{s}_{ij} - s_{ij}|$. This result appears to be consistent with empirical observations on the size of the visual vocabulary: a large vocabulary tends to perform well in object categorization, but too many visual words can deteriorate the classification accuracy. Finally, we emphasize that although the idea of vector quantization by randomly sampled centers was already discussed in [7, 24], to the best of our knowledge, this is the first work that presents a statistical consistency analysis of it.

3.2.2 Kernel density function estimation for $q_i(x)$

In this section, we approximate $q_i(x)$ by kernel density estimation. To this end, we assume that the density function $q_i(x)$ belongs to a family of smooth functions $\mathcal{F}_D$ defined as follows:

$$\mathcal{F}_D = \left\{ q(x): \mathcal{X} \mapsto \mathbb{R}^+ \,\middle|\, \langle q(x), q(x) \rangle_{\mathcal{H}_\kappa} \le B^2,\ \int q(x)\, dx = 1 \right\} \qquad (20)$$

where $\kappa(x, x'): \mathcal{X} \times \mathcal{X} \mapsto \mathbb{R}^+$ is a local kernel function with $\int \kappa(x, x')\, dx' = 1$, and $B$ controls the functional norm of $q(x)$ in the reproducing kernel Hilbert space $\mathcal{H}_\kappa$. An example of $\kappa(x, x')$ is the RBF function, i.e., $\kappa(x, x') \propto \exp(-\lambda\, d(x, x')^2)$, where $d(x, x') = \|x - x'\|_2$. The distribution $q_i(x)$ is then approximated by the kernel density estimate $\tilde{q}_i(x)$ defined as follows:

$$\tilde{q}_i(x) = \sum_{l=1}^{n_i} \alpha_i^l\, \kappa\big(x, x_i^l\big), \qquad (21)$$

where the combination weights $\alpha_i^l$ ($1 \le l \le n_i$) satisfy (i) $\alpha_i^l \ge 0$, (ii) $\sum_{l=1}^{n_i} \alpha_i^l = 1$, and (iii) $\alpha_i^{\top} K_i \alpha_i \le B^2$, where $K_i = \big[\kappa(x_i^l, x_i^{l'})\big]_{n_i \times n_i}$.

Using the kernel density function, we approximate the pairwise similarity for Eq. 21 as follows:

$$\tilde{s}_{ij} = \hat{E}_v\big[ \tilde{E}_i[f(x)]\, \tilde{E}_j[f(x)] \big]
= \frac{1}{m} \sum_{k=1}^{m} \left( \sum_{l=1}^{n_i} \alpha_i^l\, h\big(x_i^l, v_k\big) \right) \left( \sum_{l=1}^{n_j} \alpha_j^l\, h\big(x_j^l, v_k\big) \right) \qquad (22)$$

where the function $h(x, v)$ is defined as

$$h(x, v) = \int dz\, I\big(d(z, v) \le \rho\big)\, \kappa(x, z) \qquad (23)$$
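As an illustrative sketch (not the paper's implementation), $h(x, v)$ in Eq. 23 can be estimated by Monte Carlo: since $\kappa(x, \cdot)$ integrates to one, draw samples $z$ from $\kappa(x, \cdot)$ and count how often they land inside the ball around $v$. With a Gaussian $\kappa$ of assumed width `sigma`, this reads:

```python
import numpy as np

def h_monte_carlo(x, v, rho, sigma, n_samples=20_000, rng=None):
    """Monte Carlo estimate of h(x, v) = ∫ I(d(z, v) <= rho) κ(x, z) dz (Eq. 23),
    with κ(x, ·) a Gaussian of width sigma centered at x, so the integral equals
    the probability that z ~ N(x, sigma^2 I) falls inside the ball around v."""
    rng = np.random.default_rng(rng)
    x, v = np.asarray(x, float), np.asarray(v, float)
    z = rng.normal(loc=x, scale=sigma, size=(n_samples, x.size))
    return float((np.linalg.norm(z - v, axis=1) <= rho).mean())

# Mass inside the ball is large when x sits at v, and negligible when x is far away.
print(h_monte_carlo([0.0, 0.0], [0.0, 0.0], rho=1.0, sigma=0.5, rng=0))
print(h_monte_carlo([5.0, 0.0], [0.0, 0.0], rho=1.0, sigma=0.5, rng=0))
```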

To bound the difference between $\tilde{s}_{ij}$ and $s_{ij}$, we follow the analysis of [19] by viewing $E_i[f(x)]\, E_j[f(x)]$ as a mapping, denoted by $g: \mathcal{F} \mapsto \mathbb{R}^+$, i.e.,

$$g(f; q_i, q_j) = E_i[f(x)]\, E_j[f(x)] \qquad (24)$$

The domain for the function $g$, denoted by $\mathcal{G}$, is defined as

$$\mathcal{G} = \left\{ g: \mathcal{F} \mapsto \mathbb{R}^+ \,\middle|\, \exists\, q_i, q_j \in \mathcal{F}_D \ \text{s.t.}\ g(f) = E_i[f(x)]\, E_j[f(x)] \right\} \qquad (25)$$

To bound the complexity of a class of functions, we introduce the concept of Rademacher complexity [2]:

Definition 1 (Rademacher complexity)  Suppose $x_1, \ldots, x_n$ are sampled i.i.d. from a set $X$. Let $F$ be a class of functions mapping from $X$ to $\mathbb{R}$. The Rademacher complexity of $F$ is defined as

$$R_n(F) = E_{x_1, \ldots, x_n, \sigma} \left[ \sup_{f \in F} \frac{2}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \right] \qquad (26)$$

where the $\sigma_i$ are independent uniform $\pm 1$-valued random variables.

Assuming at least $n$ key points are randomly sampled from each image, we have the following lemma, which bounds the complexity of the domain $\mathcal{G}$:

Lemma 1  The Rademacher complexity of the function class $\mathcal{G}$, denoted by $R_m(\mathcal{G})$, is bounded as

$$R_m(\mathcal{G}) \le \frac{2 B C_\kappa\, E_f\!\left[ \int dx\, |f(x)| \right]}{\sqrt{m}} \qquad (27)$$

where $C_\kappa = \max_{x,z} \sqrt{\kappa(x, z)}$.

Proof  Denote $F = \{f_1, \ldots, f_m\}$. According to the definition, we have

$$\begin{aligned}
R_m(\mathcal{G}) &= E_{\sigma, F}\left[\sup_{g \in \mathcal{G}} \frac{2}{m} \sum_{k=1}^{m} \sigma_k\, g(f_k)\right]
= E_F\left[E_\sigma\left[\sup_{g \in \mathcal{G}} \frac{2}{m} \sum_{k=1}^{m} \sigma_k\, g(f_k) \,\middle|\, F\right]\right] \\
&= E_F\left[E_\sigma\left[\sup_{q_i, q_j \in \mathcal{F}_D} \frac{2}{m} \sum_{k=1}^{m} \sigma_k\, E_i[f_k]\, E_j[f_k] \,\middle|\, F\right]\right] \\
&\le E_F\left[E_\sigma\left[\sup_{\|x_i\| \le B} \frac{2}{m} \sum_{k=1}^{m} \sigma_k\, E_i[f_k] \,\middle|\, F\right]\right] \\
&= \frac{2}{m}\, E_F\left[E_\sigma\left[\sup_{\|x_i\| \le B} \left\langle x_i, \sum_{k=1}^{m} \sigma_k U_k \right\rangle \,\middle|\, F\right]\right]
\end{aligned}$$

where $U_k = \big(\langle \phi_1(\cdot), f_k(\cdot)\rangle, \langle \phi_2(\cdot), f_k(\cdot)\rangle, \ldots\big)$ and $\phi_k(x)$ is an eigenfunction of $\kappa(x, x')$. Continuing,

$$\begin{aligned}
R_m(\mathcal{G}) &\le \frac{2B}{m}\, E_F\left[E_\sigma\left[\left\| \sum_{k=1}^{m} \sigma_k U_k \right\| \,\middle|\, F\right]\right]
= \frac{2B}{m}\, E_F\left[E_\sigma\left[\left( \sum_{k,t} \sigma_k \sigma_t \langle U_k, U_t\rangle \right)^{1/2} \,\middle|\, F\right]\right] \\
&\le \frac{2B}{m}\, E_F\left[\left( \sum_{k,t} E_\sigma\big[\sigma_k \sigma_t \langle U_k, U_t\rangle \,\big|\, F\big] \right)^{1/2}\right]
= \frac{2B}{m}\, E_F\left[\left( \sum_{k} E_\sigma\big[\sigma_k^2 \langle U_k, U_k\rangle \,\big|\, F\big] \right)^{1/2}\right] \\
&= \frac{2B}{m}\, E_F\left[\left( \sum_{k} \langle U_k, U_k\rangle \right)^{1/2}\right]
= \frac{2B}{m}\, E_F\left[\left( \sum_{k} \int dz\, dx\, f_k(x) f_k(z)\, \kappa(x, z) \right)^{1/2}\right] \\
&\le \frac{2 B C_\kappa\, E_f\!\left[\int dx\, |f(x)|\right]}{\sqrt{m}} \qquad (28)
\end{aligned}$$

where the first inequality holds because $E_j[f_k] \le 1$, the second inequality follows from Cauchy's inequality, and the third and fourth inequalities follow from Jensen's inequality. The last equality uses

$$\langle U_k, U_k\rangle = \sum_i \langle \phi_i(\cdot), f_k(\cdot)\rangle^2 = \int dz\, dx\, f_k(x) f_k(z)\, \kappa(x, z) \qquad (29)$$
∎

From [2], we have the following lemmas:

Lemma 2 (Theorem 12 in [2])  For $1 \le q < \infty$, let $\mathcal{L} = \{ |f - h|^q : f \in F \}$, where $h$ and $\|f - h\|_\infty$ are uniformly bounded. Then

$$R_n(\mathcal{L}) \le 2q \|f - h\|_\infty \left( R_n(F) + \frac{\|h\|_\infty}{\sqrt{n}} \right) \qquad (30)$$

Lemma 3 (Theorem 8 in [2])  With probability $1 - \delta$, the following inequality holds:

$$E[\phi(Y, f(X))] \le \hat{E}_n[\phi(Y, f(X))] + R_n(\tilde{\phi} \circ F) + \sqrt{\frac{8 \ln(2/\delta)}{n}} \qquad (31)$$

where $\phi(x, y)$ is the loss function, $n$ is the number of samples, and $\tilde{\phi} \circ F = \{ (x, y) \mapsto \phi(y, f(x)) - \phi(y, 0) : f \in F \}$.

Based on the above lemmas, we have the following theorem.

Theorem 4  Assume the density functions $q_i(x), q_j(x) \in \mathcal{F}_D$, and let $\tilde{q}_i(x), \tilde{q}_j(x) \in \mathcal{F}_D$ be density functions estimated from $n$ sampled key points. Then, with probability $1 - \delta$, the following inequality holds:

$$\begin{aligned}
E_f\big[ |g(f; \tilde{q}_i, \tilde{q}_j) - g(f; q_i, q_j)| \big]
&\le \hat{E}_f\big[ |g(f; \tilde{q}_i, \tilde{q}_j) - g(f; \hat{q}_i, \hat{q}_j)| \big] \\
&\quad + 2\left( \frac{2 B C_\kappa\, E_f\!\left[\int dx\, |f(x)|\right]}{\sqrt{m}} + \frac{1}{\sqrt{m}} \right)
+ \sqrt{\frac{\ln(8/\delta)}{2m}} + 2\sqrt{\frac{\ln(8m^2/\delta)}{2n}} \qquad (32)
\end{aligned}$$

Proof  From Lemma 3, with probability $1 - \delta/2$, we have

$$E_f\big[ |g(f; \tilde{q}_i, \tilde{q}_j) - g(f; q_i, q_j)| \big]
\le \hat{E}_f\big[ |g(f; \tilde{q}_i, \tilde{q}_j) - g(f; q_i, q_j)| \big]
+ R_m\big( |\mathcal{G} - g(f; q_i, q_j)| \big) + \sqrt{\frac{8 \ln(4/\delta)}{m}} \qquad (33)$$

Since $0 \le g(f; q_i, q_j) \le 1$, using the results in Lemmas 1 and 2, we have

$$R_m\big( |\mathcal{G} - g(f; q_i, q_j)| \big) \le 2\left( R_m(\mathcal{G}) + \frac{1}{\sqrt{m}} \right)
\le 2\left( \frac{2 B C_\kappa\, E_f\!\left[\int dx\, |f(x)|\right]}{\sqrt{m}} + \frac{1}{\sqrt{m}} \right) \qquad (34)$$

Hence, with probability $1 - \delta/2$, the following inequality holds:

$$E_f\big[ |g(f; \tilde{q}_i, \tilde{q}_j) - g(f; q_i, q_j)| \big]
\le \hat{E}_f\big[ |g(f; \tilde{q}_i, \tilde{q}_j) - g(f; q_i, q_j)| \big]
+ 2\left( \frac{2 B C_\kappa\, E_f\!\left[\int dx\, |f(x)|\right]}{\sqrt{m}} + \frac{1}{\sqrt{m}} \right) + \sqrt{\frac{8 \ln(4/\delta)}{m}} \qquad (35)$$


Next, we aim to bound $\hat{E}_f\big[ |g(f; \tilde{q}_i, \tilde{q}_j) - g(f; q_i, q_j)| \big]$. Note that

$$\begin{aligned}
\hat{E}_f\big[ |g(f; \tilde{q}_i, \tilde{q}_j) - g(f; q_i, q_j)| \big]
&= \frac{1}{m} \sum_{k=1}^{m} |g(f_k; \tilde{q}_i, \tilde{q}_j) - g(f_k; q_i, q_j)| \\
&\le \frac{1}{m} \sum_{k=1}^{m} \Big( |g(f_k; \tilde{q}_i, \tilde{q}_j) - g(f_k; \hat{q}_i, \hat{q}_j)| + |g(f_k; \hat{q}_i, \hat{q}_j) - g(f_k; q_i, q_j)| \Big) \qquad (36)
\end{aligned}$$

Using the same logic as in the proof of Theorem 3, we have, with probability $1 - \delta/2$,

$$\frac{1}{m} \sum_{k=1}^{m} |g(f_k; \hat{q}_i, \hat{q}_j) - g(f_k; q_i, q_j)| \le \sqrt{\frac{\ln(8/\delta)}{2m}} + 2\sqrt{\frac{\ln(8m^2/\delta)}{2n}} \qquad (37)$$

From the above results, with probability $1 - \delta/2$ the following inequality holds:

$$\frac{1}{m} \sum_{k=1}^{m} |g(f_k; \tilde{q}_i, \tilde{q}_j) - g(f_k; q_i, q_j)|
\le \frac{1}{m} \sum_{k=1}^{m} |g(f_k; \tilde{q}_i, \tilde{q}_j) - g(f_k; \hat{q}_i, \hat{q}_j)|
+ \sqrt{\frac{\ln(8/\delta)}{2m}} + 2\sqrt{\frac{\ln(8m^2/\delta)}{2n}} \qquad (38)$$

Combining the above results, we have, with probability $1 - \delta$, the following inequality:

$$\begin{aligned}
E_f\big[ |g(f; \tilde{q}_i, \tilde{q}_j) - g(f; q_i, q_j)| \big]
&\le \hat{E}_f\big[ |g(f; \tilde{q}_i, \tilde{q}_j) - g(f; \hat{q}_i, \hat{q}_j)| \big] \\
&\quad + 2\left( \frac{2 B C_\kappa\, E_f\!\left[\int dx\, |f(x)|\right]}{\sqrt{m}} + \frac{1}{\sqrt{m}} \right)
+ \sqrt{\frac{\ln(8/\delta)}{2m}} + 2\sqrt{\frac{\ln(8m^2/\delta)}{2n}} \qquad (39)
\end{aligned}$$
∎

In our empirical study, we will use the RBF kernel function for $\kappa(x, x')$ with $\alpha_i^l = 1/n_i$. The corollary below shows the bound for this choice of kernel density estimation.

Corollary 5  When the kernel function is $\kappa(x, x') = \big(1/(2\pi\sigma^2)\big)^{d/2} \exp\big(-\|x - x'\|_2^2/(2\sigma^2)\big)$ and $\alpha_i^l = 1/n_i$, the bound in Theorem 4 becomes

$$\begin{aligned}
E_f\big[ |g(f; \tilde{q}_i, \tilde{q}_j) - g(f; q_i, q_j)| \big]
&\le \big( 1/(2\pi\sigma^2) \big)^{d/2} \big( 1 - \exp\big(-\rho^2/(2\sigma^2)\big) \big) \\
&\quad + 2\, \frac{2 E_f\!\left[\int dx\, |f(x)|\right]/\sqrt{n_i} + 1}{\sqrt{m}}
+ \sqrt{\frac{\ln(8/\delta)}{2m}} + 2\sqrt{\frac{\ln(8m^2/\delta)}{2n}} \qquad (40)
\end{aligned}$$

Remark  Theorem 4 bounds the true expectation of the difference between the similarity estimated by the kernel density function and the true similarity. Similar to Theorem 3, this bound also consists of a term decreasing at a rate of $O(1/\sqrt{m})$ and a term increasing at a rate of $O(\ln m)$. Moreover, we can see that, in order to minimize the true expectation of this difference, we need to minimize the empirical expectation of the difference between the similarity estimated by the kernel density function and the similarity estimated by the empirical density function. If $\kappa(x, v)$ decreases exponentially as $d(x, v)$ increases, as with the Gaussian kernel, then $h(x, v)$ is close to 1 when $d(x, v) \le \rho$ and close to 0 when $d(x, v) > \rho$. In this circumstance, setting $\alpha_i^l = 1/n_i$ for all $1 \le l \le n_i$ is a good choice for the approximation, and is also very efficient since we do not need to learn $\alpha$.

Note that although the idea of kernel density estimation was already proposed in some studies [e.g., 22], to the best of our knowledge, this is the first work that reveals the statistical consistency of kernel density estimation for the bag-of-words representation.

4 Empirical study

In this empirical study, we aim to verify the proposed framework and the related analysis. To this end, based on the discussion in Sect. 3.2, we present two random algorithms for vector quantization, shown in Algorithm 1. We refer to the algorithm based on the empirical distribution as "Quantization via Empirical Estimation" (QEE for short), and to the algorithm based on kernel density estimation as "Quantization via Kernel Estimation" (QKE for short). Note that since both vector quantization algorithms do not rely on a clustering algorithm to identify the visual words, they are in general computationally more efficient. In addition, both algorithms have error bounds decreasing at the rate of $O(1/\sqrt{m})$ when the number of key points $n$ is large, indicating that they are robust to the number of visual words $m$. We emphasize that although similar random algorithms for vector quantization have been discussed in [5, 14, 17, 22, 24], the purpose of this empirical study is to verify that

• simple random algorithms deliver performance in object recognition similar to the clustering-based algorithm, and
• the random algorithms are robust to the number of visual words, as predicted by the statistical consistency analysis.

Finally, in the implementation of QKE, to efficiently calculate the $h$-function, we approximate it as [1]

$$h \approx \frac{2(\tilde{d} - \tilde{\rho})^2 - 1}{4\sqrt{\pi}\, (\tilde{d} - \tilde{\rho})^3 \exp\big((\tilde{d} - \tilde{\rho})^2\big)} - \frac{2(\tilde{d} + \tilde{\rho})^2 - 1}{4\sqrt{\pi}\, (\tilde{d} + \tilde{\rho})^3 \exp\big((\tilde{d} + \tilde{\rho})^2\big)} \qquad (41)$$

where $\tilde{d} = d/\sigma$, $\tilde{\rho} = \rho/\sigma$, and $\sigma$ is the width of the Gaussian kernel.
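A direct transcription of the approximation in Eq. 41 (a sketch only; the derivation and normalization conventions of [1] are not reproduced here, and the sample values of `d`, `rho`, and `sigma` are made up):

```python
import math

def h_approx(d, rho, sigma):
    """Approximate h-function of Eq. 41, with d~ = d/sigma and rho~ = rho/sigma.

    Note: the formula is singular at d~ = rho~, so d should stay away from rho.
    """
    def term(u):
        return (2.0 * u * u - 1.0) / (4.0 * math.sqrt(math.pi) * u**3 * math.exp(u * u))
    d_t, r_t = d / sigma, rho / sigma
    return term(d_t - r_t) - term(d_t + r_t)

# The approximation decays as the key point moves away from the center.
print(h_approx(d=2.0, rho=0.5, sigma=0.75))
print(h_approx(d=4.0, rho=0.5, sigma=0.75))
```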

Two data sets are used in our study: the PASCAL VOC Challenge 2006 data set [4] and the Graz02 data set [15]. PASCAL06 contains 5,304 images from 10 classes. We randomly select 100 images for training and 500 for testing. The Graz02 data set contains 365 bike images, 420 car images, 311 people images and 380 background images. We randomly select 100 images from each class for training, and use the remaining images for testing. By using a relatively small number of examples for training, we are able to examine the sensitivity of a vector quantization algorithm to the number of visual words. On average, 1,000 key points are extracted from each image, and each key point is represented by the SIFT local descriptor [23]. For the PASCAL06 data set, the binary classification performance for each object class is measured by the area under the ROC curve (AUC). For the Graz02 data set, it is measured by classification accuracy. Results averaged over ten random trials are reported.

We compare three vector quantization methods: K-means, QEE and QKE. Note that we do not include more advanced algorithms for vector quantization in our study because the objective of this study is to validate the proposed statistical framework for the bag-of-words representation and the analysis of statistical consistency. The threshold $q$ used by the quantization functions $f(x)$ is set as $q = 0.5 \times \bar{d}$, where $\bar{d}$ is the average distance between all the key points and the randomly selected centers. An RBF kernel is used in QKE, with the kernel width $\sigma$ set as $0.75\,\bar{d}$ according to our experience. A binary linear SVM is used for each classification problem. To examine the sensitivity to the number of visual words, for both data sets we varied the number of visual words from 10 to 10,000, as shown in Figs. 1 and 2.
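The AUC used to score each PASCAL06 binary problem can be computed directly from classifier scores via the Mann-Whitney statistic, i.e., the fraction of (positive, negative) pairs that the classifier ranks correctly. A minimal sketch (the SVM itself is omitted; `auc` is our own helper, not from the paper):

```python
import numpy as np

def auc(scores, labels):
    # Area under the ROC curve via the Mann-Whitney U statistic.
    # Assumes no tied scores, which keeps the rank computation trivial.
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # rank sum of positives, shifted by the minimum possible rank sum
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

print(auc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))  # 1.0: perfect ranking
print(auc([0.9, 0.2, 0.8, 0.1], [0, 0, 1, 1]))  # 0.25: mostly mis-ranked
```

AUC is a natural choice here because it is insensitive to the class imbalance created by one-vs-rest evaluation on a 10-class data set.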

First, we observe that the proposed algorithms for vector quantization yield comparable, if not better, performance than the K-means clustering algorithm. This confirms that the proposed statistical framework for key point quantization is effective. Second, we observe that the clustering-based approach to vector quantization tends to perform worse, sometimes significantly so, when the number of visual words is large. We attribute this instability to the fact that K-means requires each interest point to belong to exactly one visual word. If the number of clusters is inappropriate, for example too large compared to the number of instances, two related key points may be separated into different clusters even though both lie very near the cluster boundary, leading to a poor estimate of their pairwise similarity. This problem of "hard assignment" was also observed in [17, 22]. In contrast, the proposed algorithms show a rather stable improvement as the number of visual words increases, consistent with our analysis of statistical consistency.
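The hard-assignment failure mode described above is easy to see in a toy example: two key points straddling a cluster boundary receive orthogonal hard codes, while a kernel-based soft assignment keeps their codes nearly identical. This is a constructed 1-D illustration of the argument, not an experiment from the paper.

```python
import numpy as np

centers = np.array([0.0, 1.0])   # two visual words; boundary at 0.5

def hard_code(x):
    # K-means-style hard assignment: one-hot code of the nearest center
    code = np.zeros(len(centers))
    code[np.argmin(np.abs(centers - x))] = 1.0
    return code

def soft_code(x, sigma=0.5):
    # kernel-based soft assignment: Gaussian weight to every center
    w = np.exp(-(centers - x) ** 2 / (2 * sigma ** 2))
    return w / w.sum()

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# two nearby key points on opposite sides of the boundary
print(cos(hard_code(0.49), hard_code(0.51)))  # 0.0: they look unrelated
print(cos(soft_code(0.49), soft_code(0.51)))  # near 1.0: similarity preserved
```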

5 Conclusion


The bag-of-words representation is a popular approach to object categorization. Despite its success, few studies have been devoted to the theoretical analysis of the bag-of-words representation. In this work, we present a statistical framework for key point quantization that generalizes the bag-of-words model by statistical expectation. We present two random algorithms for vector quantization in which the visual words are generated by a statistical process rather than by a clustering algorithm, together with a theoretical analysis of their statistical consistency. We also verify the efficacy and the robustness of the proposed framework by applying it to object recognition. In the future, we plan to examine the dependence of the proposed algorithms on the threshold q, and to extend QKE to weighted kernel density estimation.

Acknowledgments We want to thank the reviewers for helpful comments and suggestions. This research is partially supported by the National Fundamental Research Program of China (2010CB327903), the Jiangsu 333 High-Level Talent Cultivation Program and the National Science Foundation (IIS-0643494). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

References

1. Abramowitz M, Stegun IA (eds) (1972) Handbook of mathematical functions with formulas, graphs, and mathematical tables. Dover, New York

[Fig. 1 Comparison of different quantization methods with varied number of visual words on PASCAL06. Ten panels (bicycle, bus, car, cat, cow, dog, horse, motor, person, sheep); x-axis: number of visual words (10-10,000); y-axis: AUC; curves: K-means, QEE, QKE.]

[Fig. 2 Comparison of different quantization methods with varying number of visual words on Graz02. x-axis: number of visual words (10-10,000); y-axis: accuracy; curves: K-means, QEE, QKE.]


2. Bartlett PL, Mendelson S (2002) Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res 3:463-482

3. Csurka G, Dance C, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: ECCV workshop on statistical learning in computer vision, Prague, Czech Republic

4. Everingham M, Zisserman A, Williams CKI, Van Gool L (2006) The PASCAL visual object classes challenge 2006 (VOC2006) results. http://www.pascal-network.org/challenges/VOC/voc2006/results.pdf

5. Farquhar J, Szedmak S, Meng H, Shawe-Taylor J (2005) Improving "bag-of-keypoints" image categorisation. Technical report, University of Southampton

6. Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. In: Proceedings of the 10th European conference on machine learning, Chemnitz, Germany, pp 137-142

7. Jurie F, Triggs B (2005) Creating efficient codebooks for visual recognition. In: Proceedings of the 10th IEEE international conference on computer vision, Beijing, China, pp 604-610

8. Lazebnik S, Raginsky M (2009) Supervised learning of quantizer codebooks by information loss minimization. IEEE Trans Pattern Anal Mach Intell 31(7):1294-1309

9. Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91-110

10. McCallum A, Nigam K (1998) A comparison of event models for naive Bayes text classification. In: AAAI workshop on learning for text categorization, Madison, WI

11. McDiarmid C (1989) On the method of bounded differences. In: Surveys in combinatorics 1989, pp 148-188

12. Moosmann F, Triggs B, Jurie F (2007) Fast discriminative visual codebooks using randomized clustering forests. In: Schölkopf B, Platt J, Hoffman T (eds) Advances in neural information processing systems, vol 19. MIT Press, Cambridge, pp 985-992

13. Nister D, Stewenius H (2006) Scalable recognition with a vocabulary tree. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, New York, NY, pp 2161-2168

14. Nowak E, Jurie F, Triggs B (2006) Sampling strategies for bag-of-features image classification. In: Proceedings of the 9th European conference on computer vision, Graz, Austria, pp 490-503

15. Opelt A, Pinz A, Fussenegger M, Auer P (2006) Generic object recognition with boosting. IEEE Trans Pattern Anal Mach Intell 28(3):416-431

16. Perronnin F, Dance C, Csurka G, Bressan M (2006) Adapted vocabularies for generic visual categorization. In: Proceedings of the 9th European conference on computer vision, Graz, Austria, pp 464-475

17. Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2008) Lost in quantization: improving particular object retrieval in large scale image databases. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, Anchorage, AK

18. Schölkopf B, Smola AJ (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge

19. Shawe-Taylor J, Dolia A (2007) A framework for probability density estimation. In: Proceedings of the 11th international conference on artificial intelligence and statistics, San Juan, Puerto Rico, pp 468-475

20. Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. In: Proceedings of the 9th IEEE international conference on computer vision, Nice, France, pp 1470-1477

21. Tuytelaars T, Schmid C (2007) Vector quantizing feature space with a regular lattice. In: Proceedings of the 11th IEEE international conference on computer vision, Rio de Janeiro, Brazil, pp 1-8

22. van Gemert JC, Geusebroek J-M, Veenman CJ, Smeulders AWM (2008) Kernel codebooks for scene categorization. In: Proceedings of the 10th European conference on computer vision, Marseille, France, pp 696-709

23. Vedaldi A, Fulkerson B (2008) VLFeat: an open and portable library of computer vision algorithms. http://www.vlfeat.org/

24. Viitaniemi V, Laaksonen J (2008) Experiments on selection of codebooks for local image feature histograms. In: Proceedings of the 10th international conference series on visual information systems, Salerno, Italy, pp 126-137

25. Winn J, Criminisi A, Minka T (2005) Object categorization by learned universal visual dictionary. In: Proceedings of the 10th IEEE international conference on computer vision, Beijing, China, pp 1800-1807
