
Unsupervised Learning of Categories with Local Feature Sets of Image


Razieh Khamseh Ashari
Dept. of Electrical and Computer Engineering
Isfahan University of Technology
Isfahan, Iran
r.khamsehashar@ec.iut.ac.ir

Maziar Palhang
Dept. of Electrical and Computer Engineering
Isfahan University of Technology
Isfahan, Iran
palhang@cc.iut.ac.ir

Abstract— With the increase in the volume of image data in the information age, the need for image categorization systems is strongly felt. Recent work in this area has shown that describing an image by local features often captures very strong similarities among the local parts of an image, but such methods are also challenging, because using a set of vectors for each image does not permit the direct use of most common learning methods and distance functions. Moreover, measuring the similarity of collections of unordered features is problematic, because most of the proposed methods have rather high computational time complexity. In this article, an unsupervised method for learning categories of objects from a collection of unlabeled images is introduced. Each image is described by a set of unordered local features, and clustering is performed on the basis of the partial similarities existing among these feature sets. For this purpose, the pyramid match algorithm maps each feature set into a multi-resolution histogram, and the similarity between two sets of feature vectors is then computed in this new space in linear time. These similarities are employed as a distance criterion among the patterns in hierarchical clustering; therefore, categorization of objects with common learning methods is performed with acceptable accuracy and faster than the existing algorithms.

Keywords— Computer vision; unsupervised learning; categorization of objects; pyramid match algorithm; hierarchical clustering

I. INTRODUCTION

Early research on categorization and recognition of objects described images with global features, i.e. a single feature vector was considered for each image. Although this representation permits the direct use of learning methods and distance functions, it is very sensitive to image conditions such as clutter, occlusion, and lighting.

Recent research has shown that describing an image with local features often captures very strong similarities among the local parts of an image. As a result, a set of local feature vectors is used as the descriptor of an object in each image (as in [5, 6, 7]), although a set of vectors per image does not permit the direct use of most learning methods and distance functions. Whereas most learning methods and similarity measures expect feature vectors of constant length, each vector corresponding to a specific global feature, in a local representation each image generates a different number of features with no meaningful ordering among the features of a set. Measuring the similarity of such sets of unordered features is therefore also challenging.

Some existing methods solve the problem of directly applying learning methods by vector quantization: a codebook of feature descriptors is built and, by counting the occurrences of each codeword, each input feature set is converted into a single vector, which makes it possible to directly use common clustering or latent semantic analysis methods [10, 11]. The performance of these methods in recognition and categorization of objects is promising, but they do not explicitly handle the clutter or occlusion features created by the image background, they require a codebook, and they have high computational time complexity.

To measure the similarity of sets of unordered features, many approaches have been proposed, mostly with high computational time complexity. For example, [8] introduces a kernel which measures similarity by averaging the distances between each point in one set and its nearest point in the other set, with time complexity of order O(dm²) (d is the dimension of the feature vectors and m is the maximum size of a feature set).

Kondor et al. [9] designed a parametric kernel which fits a Gaussian distribution to each set of feature vectors. The time complexity of this method is O(dm³). It also assumes that the features of each set follow a single distribution and that the number of features in each set is large enough to estimate the distribution parameters accurately; satisfying these conditions is not always possible.

Grauman and Darrell [2], however, introduced the pyramid match kernel, which identifies the matchings between two sets of feature vectors, and showed how it can serve as a measure of similarity between images described by variable-length sets of unordered local features.

In this article, by combining the pyramid match kernel with hierarchical clustering, we not only use common clustering methods to categorize the objects, but also perform the task with acceptable accuracy and in linear time.
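The codebook approach described above (vector-quantizing local descriptors and counting codeword occurrences, as in [10, 11]) can be sketched as follows. This is an illustrative Python/NumPy sketch, not code from the cited papers; the function name and the fixed codebook are hypothetical, and in practice the codebook would be learned offline (e.g. by k-means).

```python
import numpy as np

def bag_of_features(features, codebook):
    """Map a variable-size set of descriptors to a fixed-length histogram.

    features: (m, d) array of local descriptors (m varies per image).
    codebook: (k, d) array of cluster centres (learned offline).
    Each descriptor votes for its nearest codeword, so every image
    becomes a length-k vector regardless of how many features it has.
    """
    # squared distances from every descriptor to every codeword, (m, k)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                       # nearest codeword index
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                        # normalise for set size
```

Once every image is a fixed-length vector, any standard clustering or latent semantic analysis method applies directly; the cost noted in the text is the codebook construction itself.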

978-1-4673-6206-1/13/$31.00 ©2013 IEEE
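The matching kernel of [8] mentioned in the introduction can likewise be sketched. The exp(-distance) point similarity below is a hypothetical choice for illustration; the O(dm²) behaviour is visible in the full pairwise distance computation.

```python
import numpy as np

def match_kernel(X, Y):
    """Average nearest-neighbour similarity between two feature sets.

    X: (m1, d) array, Y: (m2, d) array. For every feature in X the
    closest feature in Y is found (and vice versa), which costs
    O(d * m1 * m2): the quadratic complexity noted in the text.
    """
    # pairwise Euclidean distances, shape (m1, m2)
    diff = X[:, None, :] - Y[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    # average similarity of each point to its nearest neighbour,
    # symmetrised over the two directions
    s_xy = np.exp(-dists.min(axis=1)).mean()
    s_yx = np.exp(-dists.min(axis=0)).mean()
    return 0.5 * (s_xy + s_yx)
```

Two identical sets score 1.0; distant sets score near 0.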


II. METHOD

The goal is to categorize objects by combining hierarchical clustering with the pyramid match kernel. For this purpose, a dataset of unlabeled images is considered and, by the use of the proposed method, the categories of objects are found. Each image is described by an unordered set of local features. First, by applying the pyramid match algorithm and forming multi-resolution pyramid histograms, the pairwise similarity of all the dataset images is measured by considering the partial matchings existing between them, and the similarity matrix (the kernel matrix) is formed. Finally, hierarchical clustering is used to categorize the objects.

A. Pyramid Match Algorithm

The basic idea of the pyramid match kernel is to form a multi-resolution pyramid of histograms in the feature space and to count the number of matchings at each level of the pyramid. By using a weighted hierarchical histogram, the similarity contributed by partial matchings can be found. To compute similarities in a d-dimensional space, the sample space is partitioned into several pyramid levels instead of computing the distances between the features directly. As a rule, similar points fall into the same histogram bin, so the similarity of the samples can be worked out at little cost.

The pyramid match algorithm can be considered to include three basic stages [1, 2]:

1) Construction of the feature sets. A feature space F contains a set S of samples of d-dimensional vectors, defined as (1):

S = {X | X = {x1, ..., xm}}   (1)

where xi ∈ F ⊆ R^d. The feature set of each sample is therefore defined as (2):

X = {x | x = {[f1^1, ..., fd^1], ..., [f1^m, ..., fd^m]}}   (2)

The notable properties of these feature sets are that each sample contains an unordered set of feature vectors, and that the number of feature vectors differs from sample to sample. Fig. 1(A) depicts a set of one-dimensional features.

2) Pyramid construction. The pyramid match uses multi-resolution pyramid histograms to segment the feature space into increasingly larger bins, so that at the lowest level of the pyramid each point lies in its own bin and, at the highest level, all points lie in a single bin. Fig. 1(B) depicts a three-level pyramid. The feature extraction function Ψ(X), a vector of histograms Hi(X), is defined as (3):

Ψ(X) = [H0(X), ..., HL−1(X)]   (3)

If the vectors are assumed to be bounded by a sphere of diameter D, the number of pyramid levels can be defined as L = ⌈log2 D⌉ + 1. Hi(X) consists of d-dimensional bins of side length 2^i over the data in X, so the bins of Hi+1(X) are twice as wide as those of Hi(X). It follows that Ψ(X) is a histogram pyramid with d-dimensional components in which the bin size of each component is twice that of the previous component.

3) Similarity measurement between feature sets. The pyramid match function K∆ computes the similarity between two point sets along the multi-resolution histograms as (4):

K∆(Ψ(Y), Ψ(Z)) = Σ_{i=0}^{L−1} Wi Ni   (4)

where Ni is the number of new matchings at level i, a new matching being a pair of features that were not matched at any finer level. Wi is the weight given to each matching formed at level i. The similarity of two points matched at a given level is determined by the bin size, and matchings that occur at finer levels indicate stronger similarity. Therefore Wi is inversely related to the bin size at each level, so we have:

Wi = 1/(d·2^i)

To compute Ni, the histogram intersection function L must first be calculated. This function finds the overlap between the bins of two histograms. Let H(Y) and H(Z) be two histograms with r bins, and let H(Y)j denote the value of the jth bin of H(Y). Fig. 1(C) shows the intersection pyramid between two histogram pyramids. The histogram intersection function L counts the number of matchings taking place at each level as (5):

L(H(Y), H(Z)) = Σ_{j=1}^{r} min(H(Y)j, H(Z)j)   (5)

To obtain the number of new matchings occurring at level i, the difference between the matchings at levels i and i−1 is taken by (6):

Ni = L(Hi(Y), Hi(Z)) − L(Hi−1(Y), Hi−1(Z))   (6)

with L(H−1(Y), H−1(Z)) ≡ 0. The final pyramid match function follows from (4), (5), and (6) as (7):

K∆(Ψ(Y), Ψ(Z)) = Σ_{i=0}^{L−1} (1/(d·2^i)) [L(Hi(Y), Hi(Z)) − L(Hi−1(Y), Hi−1(Z))]   (7)

Given N images, an N×N matrix K∆(Xi, Xj) is created (the pyramid match kernel matrix), each element of which represents the pairwise similarity of the corresponding samples; therefore we have (8):

Kij = K∆(Xi, Xj) = K∆(Ψ(Xi), Ψ(Xj))   (8)
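The three stages above can be sketched in Python/NumPy. This is a simplified illustration of equations (3)-(7) under the assumption that all feature coordinates lie in [0, D); it is not the authors' implementation (for instance, bins are stored sparsely in a dictionary rather than as explicit arrays).

```python
import numpy as np

def histogram(points, level):
    """Sparse histogram with bins of side 2**level (one component of eq. 3)."""
    width = 2 ** level
    bins = {}
    for p in points:
        key = tuple((p // width).astype(int))   # bin index in each dimension
        bins[key] = bins.get(key, 0) + 1
    return bins

def intersection(h1, h2):
    """Histogram intersection: sum_j min(h1_j, h2_j) (eq. 5)."""
    return sum(min(c, h2.get(k, 0)) for k, c in h1.items())

def pyramid_match(Y, Z, D):
    """Weighted sum of new matches over the pyramid levels (eqs. 4-7).

    Y, Z: (m, d) arrays of features with coordinates in [0, D).
    Levels run from single-point bins up to one bin covering the space,
    with L = ceil(log2(D)) + 1 as in the text.
    """
    d = Y.shape[1]
    L = int(np.ceil(np.log2(D))) + 1
    prev, K = 0, 0.0
    for i in range(L):
        inter = intersection(histogram(Y, i), histogram(Z, i))
        # N_i = I_i - I_{i-1}, weighted by w_i = 1/(d * 2^i)
        K += (inter - prev) / (d * 2 ** i)
        prev = inter
    return K
```

Identical sets match completely at level 0 and receive the full weight; points that only fall into a common bin at a coarser level contribute with the smaller weight of that level.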
(A) set of points   (B) histogram pyramids   (C) intersections

Fig. 1. The pyramid match algorithm computes partial similarities by matching the points that fall into the same histogram bin. In this figure, two one-dimensional feature sets are used to generate two histogram pyramids. Each row corresponds to a pyramid level. In (A), the Y set is on the left and the Z set is on the right (the points are distributed along the vertical axis, which is repeated at each level). The light dotted lines show the bin boundaries, and the points connected by bold dashed lines represent the matchings formed at each level, while solid lines show the matches formed at previous levels. In (B), the histogram pyramids are shown by the bin counts along the horizontal axis. In (C), the intersection pyramid between the histograms in (B) is obtained; K∆ counts the new matchings formed at each level. By Li we mean L(Hi(Y), Hi(Z)); at the levels shown, Li = 5, 4, 2 and the number of new matches at each level is Ni = 1, 2, 2. The sum of the Ni weighted by Wi = 1, 1/2, 1/4 gives the similarity found by the pyramid [2].

B. Objects Categorization

This section explains how the objects in a set of unlabeled images are categorized on the basis of the matchings obtained from the pyramid match algorithm. Fig. 2 depicts the matchings with the strongest similarities obtained from the previous stage of the algorithm.

To categorize objects on the basis of these similarities, flat clustering methods such as k-means or Forgy cannot be used, because they require the cluster centers at the start of the algorithm, while the only accessible information is the kernel matrix. Therefore, treating the kernel matrix as a distance criterion, we employed hierarchical clustering; specifically, the Ward method, an exclusive (hard) hierarchical clustering method, was used.

To reduce the cost caused by outlier data in this clustering method, the sum-of-squares criterion is used to compute the dissimilarity between clusters: the sum of the squared differences between each data point of a cluster and the mean vector of that cluster is used as the measure of the cluster.

The Ward method can be stated as the following algorithm [3, 4]:
1- Initially, each data point is taken as a cluster.
2- Over all possible pairs of clusters in the current collection, the two clusters are selected whose merger minimizes the sum of squared differences between the data of the merged cluster and its mean vector; at this stage, the cost of merging the clusters is computed pairwise by (9).
3- The two selected clusters are merged.
4- Stages 2 and 3 are repeated until the number of clusters reaches the desired number.

∆(A, B) = Σ_{i∈A∪B} ||x⃗i − m⃗A∪B||² − Σ_{i∈A} ||x⃗i − m⃗A||² − Σ_{i∈B} ||x⃗i − m⃗B||² = (nA·nB)/(nA + nB) ||m⃗A − m⃗B||²   (9)

∆(A, B) is the cost of merging clusters A and B, m⃗j is the center of the jth cluster, and nj is the number of points in that cluster.

Since hierarchical clustering takes the value between two samples as a distance, its criterion differs from the similarity criterion assumed in the pyramid match kernel matrix. To unify the two criteria, the weights Wi of the matchings occurring at the finer levels are taken smaller, so that the criterion used in the clustering and the one in the kernel matrix express the same concept.

III. RESULTS

In this section, we present the results of evaluating the proposed method for unsupervised learning of object categories. To test the method, we performed experiments in which the visual complexity of the images gradually increases. For this purpose, two datasets, ETH-80 and Caltech-101, with simple and difficult images respectively, were employed. Since the membership of each sample in each category is known, this information is used to measure the efficiency of the proposed method; a fractional confusion matrix is used for the evaluation of each performed test.
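The equality in (9), between the increase in within-cluster sum of squares and its closed form, can be checked numerically. The following NumPy sketch is illustrative only and not part of the paper's method:

```python
import numpy as np

def ward_cost(A, B):
    """Merge cost Δ(A,B) of eq. (9), computed from the left-hand side:
    the increase in within-cluster sum of squares when A and B merge."""
    AB = np.vstack([A, B])
    sse = lambda C: ((C - C.mean(axis=0)) ** 2).sum()
    return sse(AB) - sse(A) - sse(B)

def ward_cost_closed_form(A, B):
    """Right-hand side of eq. (9): (nA*nB)/(nA+nB) * ||mA - mB||^2."""
    nA, nB = len(A), len(B)
    mA, mB = A.mean(axis=0), B.mean(axis=0)
    return nA * nB / (nA + nB) * ((mA - mB) ** 2).sum()
```

The closed form is what makes step 2 of the algorithm cheap: only the cluster sizes and means are needed to score a candidate merge.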
The ETH-80 dataset includes 8 classes (apple, car, cow, cup, tomato, pear, horse, and dog) with 41 views of each object instance. For each class, several object instances photographed from different angles against a uniform blue background are present; therefore, there are about 400 samples in each class. Among the 8 classes, 4 classes (car, cup, apple, and cow) were chosen and 3-class subsets were formed from them; each time, a random subset of 100 samples was selected from each class and the learning algorithm was run on 3 different classes. In this way, four different tests were performed on the dataset and the mean of the results was taken.

Fig. 2. Matchings with the strongest similarities obtained by applying the pyramid match algorithm to four samples of the image set used.
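The overall procedure of these experiments, turning the pyramid match kernel matrix into distances and applying Ward hierarchical clustering, can be sketched as follows. The induced-distance formula and the use of SciPy here are assumptions for illustration, not the authors' exact implementation:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_from_kernel(K, n_clusters):
    """Hierarchical (Ward) clustering driven by a similarity matrix.

    K: (N, N) pyramid-match kernel matrix (symmetric, diagonal maximal).
    The kernel is turned into the induced distance
    d(i, j) = sqrt(K_ii + K_jj - 2*K_ij), so that stronger matches
    mean smaller distances, and then fed to Ward linkage.
    """
    diag = np.diag(K)
    D2 = np.maximum(diag[:, None] + diag[None, :] - 2 * K, 0.0)
    D = np.sqrt(D2)
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method='ward')
    return fcluster(Z, t=n_clusters, criterion='maxclust')
```

Given the cluster labels and the known class memberships, the fractional confusion matrix used in the evaluation follows by counting, for each real class, the fraction of its samples assigned to each learned category.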
The Caltech-101 database includes 101 classes with 40 to 800 samples in each class. The data of this dataset were obtained from Google image search. These images have a considerable level of complexity, because they include clutter and occlusion, and different instances and types belong to one class. Among the existing classes, 4 classes (airplanes, face, motorbikes, and watch) were selected and, as with the previous dataset, the learning algorithm was run each time on 3 different classes and the mean of the results was taken. Fig. 3 shows sample images of the 4 selected classes.

Due to the large number of images in these datasets, it is not possible to select all the classes; on the other hand, a reliable accuracy estimate is needed for the presented algorithm. Therefore, the mean obtained over the tests is reported as the obtained accuracy.

We used SIFT to find the interest points; for each key point found, a 128-dimensional descriptor was built, which we reduced to a 12-dimensional descriptor by principal component analysis.

Table I presents the accuracies obtained from two sample tests on the above datasets, and the final clustering accuracy is presented in Table II.

Fractional confusion matrix for the ETH-80 test (Table I):

                       real categories
  learned categories   apple   cup    car
  apple                0.91    0.06   0.06
  cup                  0.06    0.90   0.07
  car                  0.03    0.04   0.87

TABLE II. For each test, the mean accuracy over the confusion-matrix diagonal is computed, and the overall mean accuracy is then obtained.

  Dataset                    ETH-80   Caltech-101
  Clustering mean accuracy   87%      81%

IV. CONCLUSION

In this paper, we presented a method which can learn categories of objects from unlabeled images automatically. Using the pyramid match algorithm, we identified partial matchings between sets of feature vectors and showed how they can be used as a measure of similarity between images described by variable-length sets of unordered local features. Additionally, using the obtained kernel matrix, we were able to categorize objects with common clustering methods such as hierarchical clustering, in linear time and with acceptable accuracy, without computing any distances between features, whereas other methods, such as the algorithm proposed in [8], rely on computing Euclidean distances.

Fig. 3. Images on the left are samples of the selected Caltech-101 classes, and those on the right are samples of the selected ETH-80 classes.

TABLE I. Fractional confusion matrices for two samples of the tests performed on the Caltech-101 and ETH-80 datasets respectively. In the ideal case, the confusion matrix equals the identity matrix and, as observed, the values of the matrices approach this ideal. Owing to the simpler images of ETH-80 relative to Caltech-101, higher accuracy was obtained for that test.

Fractional confusion matrix for the Caltech-101 test:

                       real categories
  learned categories   motor   watch   airplane
  motor                0.80    0.08    0.08
  watch                0.10    0.82    0.07
  airplane             0.10    0.10    0.85

REFERENCES

[1] K. Grauman, Matching Sets of Features for Efficient Retrieval and Recognition, Ph.D. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 2006.
[2] K. Grauman and T. Darrell, "The pyramid match kernel: discriminative classification with sets of image features", Proc. IEEE International Conference on Computer Vision, Beijing, China, October 2005.
[3] Q. He, A Review of Clustering Algorithms as Applied in IR, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 1999.
[4] C. Shalizi, Distances between Clusterings, Hierarchical Clustering, lecture notes, Carnegie Mellon University, 2009.
[5] D. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60(2):91–110, 2004.
[6] A. Berg, T. Berg, and J. Malik, "Shape matching and object recognition using low distortion correspondences", Proc. IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, June 2005.
[7] K. Mikolajczyk and C. Schmid, "Indexing based on scale invariant interest points", Proc. IEEE International Conference on Computer Vision, Vancouver, Canada, July 2001.
[8] C. Wallraven, B. Caputo, and A. Graf, "Recognition with local features: the kernel recipe", Proc. IEEE International Conference on Computer Vision, Nice, France, October 2003.
[9] R. Kondor and T. Jebara, "A kernel between sets of vectors", Proc. International Conference on Machine Learning, Washington, D.C., August 2003.
[10] J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman, "Discovering object categories in image collections", Proc. IEEE International Conference on Computer Vision, Beijing, China, October 2005.
[11] P. Quelhas, F. Monay, J.-M. Odobez, D. Gatica-Perez, T. Tuytelaars, and L. Van Gool, "Modeling scenes with local descriptors and latent aspects", Proc. IEEE International Conference on Computer Vision, Beijing, China, October 2005.
