Abstract— With the growing volume of image data in the information age, the need for image categorization systems is strongly felt. Recent work in this area has shown that describing images by local features often captures very strong similarities among the local parts of images. These methods are also challenging, however, because representing each image by a set of vectors does not permit the direct use of most common learning methods and distance functions. Measuring the similarity between collections of unordered features is problematic as well, because most of the proposed methods have rather high computational complexity. In this article, an unsupervised method for learning object categories from a collection of unlabeled images is introduced. Each image is described by a set of unordered local features, and clustering is performed on the basis of the partial similarities among these feature sets. To this end, the pyramid match algorithm maps each feature set into a multi-resolution histogram, and the similarity between two sets of feature vectors is computed in linear time in this new representation. These similarities are used as the distance criterion between patterns in hierarchical clustering; as a result, objects are categorized with common learning methods, with acceptable accuracy and faster than existing algorithms.

Keywords— Computer vision; unsupervised learning; object categorization; pyramid match algorithm; hierarchical clustering

I. INTRODUCTION
Early research on object categorization and recognition described images with global features, i.e., one feature vector per image. Although this representation permits the direct use of learning methods and distance functions, it is very sensitive to image conditions such as clutter, occlusion, and lighting.

Recent research has shown that describing an image with local features often captures very strong similarities among the local parts of images. As a result, a set of local feature vectors serves as the descriptor of an object in each image (e.g., [5], [6], [7]). However, using a set of vectors per image does not permit the direct use of most learning methods and distance functions. Most learning methods and similarity measures assume feature vectors of constant length in which each dimension corresponds to a specific global feature, whereas in a local representation each image generates a different number of features, with no meaningful ordering among the features of a set. Measuring the similarity between sets of unordered features is also challenging.

Some existing methods work around this problem with vector quantization: they build a codebook of feature descriptors, count the occurrences of each codeword to convert each input set of features into a single vector, and can then directly apply common clustering methods or latent semantic analysis [10, 11]. The performance of these methods in object recognition and categorization is promising, but they do not explicitly handle the clutter or occlusion features that arise from the image background, they require a codebook, and they are computationally expensive.

Many approaches have been proposed for measuring the similarity between sets of unordered features, and most of them have high computational complexity. For example, [8] introduces a kernel that measures similarity by averaging the distances between each point in one set and its nearest point in the other set, with time complexity $O(dm^2)$, where $d$ is the dimension of the feature vectors and $m$ is the maximum size of a feature set. Kondor and Jebara [9] designed a parametric kernel that fits a Gaussian distribution to each set of feature vectors; its time complexity is $O(dm^3)$. This method also assumes that the features of each set follow a single distribution and that each set contains enough features for an accurate estimate of the distribution parameters, but these conditions cannot always be satisfied.

In contrast, Grauman and Darrell [2] introduced the pyramid match kernel, which identifies matchings between two sets of feature vectors, and showed how it can serve as a measure of similarity between images described by variable-length sets of unordered local features.

In this article, by combining the pyramid match kernel with hierarchical clustering, we not only use common clustering methods to categorize objects but also do so with acceptable accuracy and in linear time.
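To make the pyramid match idea concrete, here is a minimal Python sketch of the kernel of Grauman and Darrell [2]; it is an illustration, not the implementation used in this article. It assumes feature coordinates are non-negative and lie in [0, D) for a chosen range D, uses grid cells of side 2**i at level i, weights the matches that first appear at level i by 1 / (d * 2**i), and omits the random grid shifts of the original kernel. The helper names `_histogram`, `pyramid_match`, and `normalized_pyramid_match` and the parameter `D` are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def _histogram(points: np.ndarray, side: float) -> Counter:
    """Sparse multi-dimensional histogram: number of points in each grid cell."""
    cells = np.floor(points / side).astype(np.int64)
    return Counter(tuple(row) for row in cells.tolist())


def pyramid_match(X: np.ndarray, Y: np.ndarray, D: float) -> float:
    """Unnormalized pyramid match score between two feature sets.

    X, Y: arrays of shape (m_x, d) and (m_y, d), coordinates assumed in [0, D).
    Level i uses cells of side 2**i; matches that first appear at level i
    are weighted by 1 / (d * 2**i), so matches in finer cells count more.
    """
    d = X.shape[1]
    # Levels 0..ceil(log2 D); the coarsest level puts all points in one cell.
    num_levels = int(math.ceil(math.log2(D))) + 1
    score, prev_matches = 0.0, 0
    for i in range(num_levels):
        side = 2.0 ** i
        # Histogram intersection (Counter '&' keeps the minimum count per cell)
        # gives the number of matched pairs at this resolution.
        matches = sum((_histogram(X, side) & _histogram(Y, side)).values())
        score += (matches - prev_matches) / (d * side)
        prev_matches = matches
    return score


def normalized_pyramid_match(X: np.ndarray, Y: np.ndarray, D: float) -> float:
    """Kernel value scaled by self-similarities so that identical sets score 1."""
    return pyramid_match(X, Y, D) / math.sqrt(
        pyramid_match(X, X, D) * pyramid_match(Y, Y, D)
    )


# Two single-point sets that first share a cell at level 1 (cell side 2).
X = np.array([[0.0, 0.0]])
Y = np.array([[1.0, 1.0]])
print(normalized_pyramid_match(X, Y, D=4))  # prints 0.5
```

Each level needs one pass over both sets plus a histogram intersection, so the cost of one comparison grows with the number of features times the number of levels, rather than with all point pairs as in the $O(dm^2)$ kernel of [8].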
obtained from Google image search. These images are of considerable complexity, because they include clutter and occlusion, and samples of varying appearance and type belong to the same class. Among the existing classes, 4 classes (airplanes, faces, motorbikes, and watch) were selected and, as with the previous dataset, the learning algorithm was run each time on 3 different classes and the mean of the results was taken. Fig. 3 shows sample images of the 4 selected classes.

Because of the large number of images in these datasets, it is not possible to select all the classes; on the other hand, we need a reliable accuracy estimate for the presented algorithm. Therefore, the mean over the tests is reported as the obtained accuracy.

TABLE I (ETH-80 sample). Fractional confusion matrix; rows are learned categories and columns are real categories.

Learned \ Real   apple   cup    car
apple            0.91    0.06   0.06
cup              0.06    0.90   0.07
car              0.03    0.04   0.87

TABLE II. Mean clustering accuracy. For each test, the mean of the confusion-matrix diagonal is computed, and the overall mean accuracy is then taken over all tests.

Dataset                    ETH-80   Caltech-101
Clustering mean accuracy   87%      81%
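The accuracy measure used for Table II can be illustrated with a short Python sketch: average the diagonal of each fractional confusion matrix, then average those values over tests. It assumes the learned clusters have already been matched to the real categories, so that each diagonal entry is a per-category accuracy; the function names are hypothetical, and the matrix values are the ETH-80 sample from Table I.

```python
import numpy as np


def mean_diagonal_accuracy(confusion: np.ndarray) -> float:
    """Mean of the diagonal of a fractional confusion matrix.

    Each column is assumed to hold the fractions of one real category's
    images assigned to each learned category, with learned categories
    already aligned to real ones, so the diagonal gives per-category accuracy.
    """
    return float(np.mean(np.diag(confusion)))


def overall_accuracy(confusions) -> float:
    """Average of the per-test mean-diagonal accuracies (one matrix per test)."""
    return float(np.mean([mean_diagonal_accuracy(c) for c in confusions]))


# ETH-80 sample from Table I (rows: learned; columns: real apple, cup, car).
eth80_sample = np.array([
    [0.91, 0.06, 0.06],
    [0.06, 0.90, 0.07],
    [0.03, 0.04, 0.87],
])
print(round(mean_diagonal_accuracy(eth80_sample), 3))  # prints 0.893
```

This single sample gives about 0.893; the 87% in Table II is the mean over all ETH-80 tests, whose individual matrices are not listed here.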
We used SIFT [5] to detect interest points; for each detected keypoint, a 128-dimensional descriptor was computed and then reduced to a 12-dimensional descriptor by principal component analysis (PCA).

Table I shows the confusion matrices obtained for two sample tests on the above datasets, and Table II presents the final clustering accuracy.

IV. CONCLUSION
In this paper, we presented a method that automatically learns object categories from unlabeled images. Using the pyramid match algorithm, we identified partial matchings between two sets of feature vectors and showed how this matching can serve as a measure of similarity between images described by variable-length sets of unordered local features. Additionally, using the resulting kernel matrix, we categorized objects with common clustering methods such as hierarchical clustering, in linear time and with acceptable accuracy, without computing pairwise distances between individual features. By contrast, other methods, such as the algorithm proposed in [8], rely on Euclidean distances.
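The conclusion states that the kernel matrix alone is enough to drive hierarchical clustering. The Python sketch below shows one way this can work, assuming average linkage applied directly to similarities (the linkage criterion is not stated in this article). `agglomerative_from_kernel` is a hypothetical name, and the entries of K could be, for example, normalized pyramid match values between the feature sets of n images.

```python
import numpy as np


def agglomerative_from_kernel(K: np.ndarray, num_clusters: int) -> list[int]:
    """Average-linkage agglomerative clustering driven by a similarity matrix.

    K is an (n, n) symmetric matrix of kernel values (higher = more similar).
    Starting from singletons, repeatedly merge the two clusters whose average
    pairwise similarity is largest, until num_clusters clusters remain.
    Returns a cluster label in 0..num_clusters-1 for each of the n items.
    """
    clusters = [[i] for i in range(K.shape[0])]
    while len(clusters) > num_clusters:
        best, best_pair = -np.inf, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average similarity over all cross-cluster pairs.
                sim = K[np.ix_(clusters[a], clusters[b])].mean()
                if sim > best:
                    best, best_pair = sim, (a, b)
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is still valid
    labels = [0] * K.shape[0]
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy kernel matrix: items 0-1 and items 2-3 are mutually similar.
K = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.15, 0.1],
    [0.1, 0.15, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])
print(agglomerative_from_kernel(K, 2))  # prints [0, 0, 1, 1]
```

Merging by highest average similarity works on kernel values as they are, so no conversion to a distance is needed; the naive search over all cluster pairs is slow for large collections and is meant only to show the procedure.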
Fig. 3. Images on the left are samples of the selected Caltech-101 classes, and those on the right are samples of the selected ETH-80 classes.

TABLE I. Fractional confusion matrices for two sample tests performed on the Caltech-101 and ETH-80 datasets, respectively. In the ideal case the confusion matrix equals the identity matrix, and as observed, the values of the matrices are close to this ideal. Because the ETH-80 images are simpler than those of Caltech-101, higher accuracy was obtained for the ETH-80 test.

Caltech-101 sample (rows: learned categories; columns: real categories):

Learned \ Real   motor   watch   airplane
motor            0.80    0.08    0.08
watch            0.10    0.82    0.07
airplane         0.10    0.10    0.85

REFERENCES
[1] K. Grauman, "Matching sets of features for efficient retrieval and recognition," Ph.D. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 2006.
[2] K. Grauman and T. Darrell, "The pyramid match kernel: discriminative classification with sets of image features," Proc. IEEE International Conference on Computer Vision, Beijing, China, October 2005.
[3] Q. He, "A review of clustering algorithms as applied in IR," Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 1999.
[4] C. Shalizi, "Distance between clustering, hierarchical clustering," lecture, Carnegie Mellon University, 2009.
[5] D. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60(2):91–110, January 2004.
[6] A. Berg, T. Berg, and J. Malik, "Shape matching and object recognition using low distortion correspondences," Proc. IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, June 2005.
[7] K. Mikolajczyk and C. Schmid, "Indexing based on scale invariant interest points," Proc. IEEE International Conference on Computer Vision, Vancouver, Canada, July 2001.
[8] C. Wallraven, B. Caputo, and A. Graf, "Recognition with local features: the kernel recipe," Proc. IEEE International Conference on Computer Vision, Nice, France, October 2003.
[9] R. Kondor and T. Jebara, "A kernel between sets of vectors," Proc. International Conference on Machine Learning, Washington, D.C., August 2003.
[10] J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman, "Discovering object categories in image collections," Proc. IEEE International Conference on Computer Vision, Beijing, China, October 2005.
[11] P. Quelhas, F. Monay, J. M. Odobez, D. G. Perez, T. Tuytelaars, and L. V. Gool, "Modeling scenes with local descriptors and latent aspects," Proc. IEEE International Conference on Computer Vision, Beijing, China, October 2005.