
ISBN 978-952-5726-07-7 (Print), 978-952-5726-08-4 (CD-ROM). Proceedings of the Second Symposium International Computer Science and Computational Technology (ISCSCT '09), Huangshan, P. R. China, 26-28 Dec. 2009, pp. 458-461

Image Semantic Classification Using SVM In Image Retrieval

Xiaohong Yu 1, and Hong Liu 2

1 College of Computer Science & Information Engineering, Zhejiang Gongshuang University, No. 18, Xuezheng Str., Xiasha University Town, Hangzhou, China
2 College of Computer Science & Information Engineering, Zhejiang Gongshuang University, No. 18, Xuezheng Str., Xiasha University Town, Hangzhou, China
Email: LLH

Abstract: In content-based image retrieval, there is a gap between the low-level descriptions of image content and the semantic understanding that users bring when they query image databases. In this paper, we put forward a method that classifies image regions hierarchically according to their semantics, which resembles human perception more closely than low-level features do. The experiments show that the precision of the semantic classification justifies the feasibility of our method. The method can further be applied to image retrieval, where it gives better indexing results.

Index Terms: image classification, semantic classification, image retrieval, Support Vector Machine, keyword-based retrieval

I. INTRODUCTION

The growth of the World Wide Web has led to a huge number of online digital images and videos, so there is a strong demand for efficient image retrieval techniques that can exploit this large amount of digital information. In traditional systems, the keywords of each image in the database are labeled manually, and a text-based retrieval system is then used to index the images. As the number of images increases, this technique becomes very inefficient and is insufficient to describe the details of an image, so content-based image retrieval systems have been a major research subject in recent decades. Many image retrieval systems have been built, such as QBIC, VisualSEEk, Netra and MARS. They index and retrieve images based on low-level image features, such as color, texture and shape.

Content-based image retrieval techniques are based on similarity matching of features. Take color similarity as an example: the color similarity of images can be measured by pixel luminance matching, but pixel matching is highly sensitive to noise and to small distortions such as rotation, and it is also time-consuming. Most of the above systems have the advantage of being automatic, but they are hard for novices to use because of the semantic gap that exists between user perception and system requirements. In fact, novices prefer to retrieve images using semantic elements, such as land, sky, mountain, snow and grass, which are closer to their perception than low-level features. If the system adopts a semantic hierarchy to organize and index the images, the gap between the low-level descriptions of images and the users' semantic needs is reduced. That is, to reduce the semantic gap, we need to classify image regions based on their semantics, so that novices can query the desired images intuitively.

In this paper, we put forward a method for the automatic hierarchical classification of image regions into a detailed classification hierarchy, based on the semantics of the region content, by using SVM. We also give an experiment showing that the method performs well in image retrieval.

We organize this paper as follows. In Sec. 2, we describe the HSV color space, which is more suitable for human perception. In Sec. 3, we address image segmentation and the extraction of region features, and briefly discuss how to build the SVM and classify image regions with it. In Sec. 4, the experimental results are presented, and the conclusions are given in Sec. 5.

II. HSV COLOR

The process followed by the human brain in perceiving and interpreting color is a psychological phenomenon that is not yet fully understood. The purpose of a color model (also called a color space) is to facilitate the specification of colors in some standard way. In essence, a color space is a specification of a coordinate system in which each color is represented by a single point. Most color spaces in use today are oriented either toward hardware or toward applications. The hardware-oriented space most commonly used in practice is the RGB (Red, Green, Blue) space, whereas the HSV (Hue, Saturation, Value) space is more suitable for human perception, so in this paper we use the HSV space. We map each image into the HSV color space as follows.
$$V = \frac{R + G + B}{3}$$

$$H = \arccos\left\{ \frac{\frac{1}{2}\left[(R - G) + (R - B)\right]}{\left[(R - G)^{2} + (R - B)(G - B)\right]^{1/2}} \right\}$$

$$S = 1 - \frac{3}{R + G + B}\,\min(R, G, B)$$
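As a minimal sketch of this conversion, the following Python function applies the three formulas to one RGB pixel whose channels are scaled to [0, 1]. The function name, the treatment of black and gray pixels (where the formulas would divide by zero) and the reflection of the hue when B > G are assumptions made for illustration; the text does not specify them.

```python
import math


def rgb_to_hsv_paper(r, g, b):
    """Return (H, S, V) for one RGB pixel with channels in [0, 1].

    H is in radians. Edge cases are handled by convention (assumption).
    """
    total = r + g + b
    v = total / 3.0
    if total == 0:
        # Black pixel: hue and saturation are undefined, use 0 by convention.
        return 0.0, 0.0, 0.0
    s = 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    # The radicand is non-negative in exact arithmetic; max() guards rounding.
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0:
        # Gray pixel (R = G = B): hue is undefined, use 0 by convention.
        return 0.0, s, v
    # Clamp the ratio so rounding cannot push it outside acos's domain.
    h = math.acos(max(-1.0, min(1.0, num / den)))
    if b > g:
        # Assumption: reflect the angle so that hue covers the full circle.
        h = 2.0 * math.pi - h
    return h, s, v
```

With these conventions, pure red maps to a hue of 0 and pure green to a hue of 2π/3, both with saturation 1.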

For each image, we find the minimum and maximum values of each of the three color components to set up the coordinates of the HSV space. Each axis runs from the minimum to the maximum value. These values are normalized so that the minimum value equals zero and the maximum value equals one. Then, the H, S and V components are quantized to 16, 8 and 8 levels, respectively, within the minimum-maximum range of each component. The H component is quantized into more levels than S and V to reflect the diversity of colors in the image database. These values can be changed by the user if necessary to suit a specific image collection.

III. IMAGE CLASSIFICATION BY REGION-BASED ON

The H component is quantized to 6 values and the S component to 4 values. Hence, the color histogram is represented by a 24-dimensional vector.

2) Edge direction histogram

The edge direction histogram (EDH) is one of the methods for extracting shape features from an image; the feature vectors it extracts tolerate changes in scale, translation and rotation. There are many different edge detector operators; we use the popular Canny edge detector. Experiments show that the method gives better results for images with a simple background and clear shape characteristics.

C. Support Vector Machine (SVM) Classifier

1) Review of support vector classifier theory

A binary classifier can be built by constructing a hyperplane that separates the members of one class from the others. Most real data, however, are hard to separate, because in most cases a hyperplane that successfully separates the members of the two classes does not exist. One way to solve this problem is to map the data into a higher-dimensional space, where the members of the two classes can be separated by a hyperplane. However, traditional classifiers do not handle high-dimensional vectors well: they are extremely expensive in terms of memory and time. Support Vector Machines can solve this problem. An SVM avoids overfitting the data by choosing, from the many hyperplanes that can separate the data, the one that maximizes the minimum distance from the hyperplane to the closest training point. Such a hyperplane is called the maximum margin hyperplane. Another advantage of the SVM is the compact representation of the decision boundary, since the number of support vectors is small compared to the number of points in the training set.

Here we briefly introduce the Support Vector Machine for binary classification. The training data set for a binary classification problem is

$$\{(x_1, y_1), \ldots, (x_i, y_i), \ldots, (x_l, y_l)\} \qquad (3.1)$$

A. Image segmentation

Image segmentation is the process of dividing an image into coherent, non-overlapping and meaningful regions. Generic, complete and pixel-accurate unsupervised segmentation is regarded as virtually impossible, so in this paper we only look for a segmentation method that satisfies the following conditions:

(1) The extracted regions are coherent.
(2) Segmentation gives satisfactory results on general image data without assuming prior knowledge.
(3) The segmentation process is unsupervised.

In our experiments, we use the hill-climbing method to segment the image, which satisfies the above conditions. The hill-climbing algorithm can be summarized as follows:

(1) Compute the HSV color histogram of the image.
(2) Start at a non-zero bin of the color histogram and make uphill moves until reaching a peak.
(3) Choose another unclimbed bin and repeat step 2 to find another peak. Repeat this step until all non-zero bins of the color histogram have been climbed.
(4) The peaks found in this way give the initial number of clusters of the input image, and these peaks are saved.
(5) Finally, neighboring pixels that lead to the same peak are grouped together, that is, every pixel is associated with one of the identified peaks. The segments of the input image are thereby formed.

The segmentation results are shown in Figure 1.
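The hill-climbing steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it climbs a one-dimensional histogram of a single quantized component rather than the full three-dimensional HSV histogram, it treats only the two adjacent bins as neighbors, and it omits the spatial grouping of neighboring pixels in step 5. The helper names `quantize`, `climb` and `hill_climb` are hypothetical.

```python
def quantize(values, levels):
    """Min-max normalize values to [0, 1] and map them to bins 0..levels-1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    bins = []
    for v in values:
        t = 0.0 if span == 0 else (v - lo) / span
        bins.append(min(int(t * levels), levels - 1))  # maximum goes to last bin
    return bins


def climb(hist, start):
    """Step 2: move to the taller adjacent bin until no neighbor is strictly taller."""
    cur = start
    while True:
        neighbors = [i for i in (cur - 1, cur + 1) if 0 <= i < len(hist)]
        best = max(neighbors, key=lambda i: hist[i], default=cur)
        if hist[best] <= hist[cur]:
            return cur  # cur is a peak
        cur = best


def hill_climb(bins, levels):
    """Steps 1-4, then label each pixel with the index of its peak."""
    hist = [0] * levels
    for b in bins:                                   # step 1: build the histogram
        hist[b] += 1
    peak_of_bin = {b: climb(hist, b)                 # steps 2-3: climb every non-zero bin
                   for b in range(levels) if hist[b] > 0}
    peaks = sorted(set(peak_of_bin.values()))        # step 4: save the peaks
    labels = [peaks.index(peak_of_bin[b]) for b in bins]  # pixel -> cluster id
    return peaks, labels
```

For example, `hill_climb(quantize(hues, 16), 16)` would cluster a list of hue values `hues` with the 16 hue levels mentioned in Section II. Bins of equal height on a plateau are reported as separate peaks in this sketch.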

Here, $x_i \in R^d$ and $y_i \in \{-1, +1\}$ are the training pattern vectors and their corresponding labels in Eq. (3.1), and $l$ indicates the number of training pattern vectors. Let us also define a linear decision surface by the equation

$$f(x) = \langle w, x \rangle + b = 0 \qquad (3.2)$$

where $w$ is normal to the hyperplane and $|b| / \|w\|$ is the distance from the origin to the hyperplane. If the following conditions hold,

$$\langle w, x_i \rangle + b \ge +1 \quad \text{for } y_i = +1$$
$$\langle w, x_i \rangle + b \le -1 \quad \text{for } y_i = -1 \qquad (3.3)$$

the training data set is linearly separable. Using a nonlinear transform $\phi$, the pattern vectors in Eq. (3.1) can be mapped from the original input space $R^d$ into a high-dimensional feature space $R^n$.

Figure 1. The result of Image segmentation

B. Extracting region features

We use different methods to extract the features of a region.

1) Color histogram in the HSV space

In our classification experiments, we found that we could achieve better classification accuracy if we represent the color content of each region with only the H and S components. Thus, we eliminate the V component, and the color histogram is built from the H and S components only.

The transform $\phi$ from the input space $R^d$ into the high-dimensional feature space $R^n$ is shown in Figure 2:

$$x \in R^d \;\mapsto\; \phi(x) \in R^n$$
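To illustrate why such a mapping helps, the sketch below uses one textbook choice of $\phi$ for $d = 2$ and $n = 3$. This choice, the sample points and the hyperplane parameters are assumptions for illustration and are not taken from the paper. Points inside and outside the unit circle cannot be separated by a straight line in the input space, but after the mapping they are separated by the hyperplane with $w = (1, 0, 1)$ and $b = -1$.

```python
import math


def phi(x):
    """Map a 2-D point to a 3-D feature vector (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def linear_decision(z, w, b):
    """Return +1 or -1 depending on the side of the hyperplane <w, z> + b = 0."""
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score >= 0 else -1


# Hypothetical hyperplane in the feature space: x1^2 + x2^2 - 1 = 0.
W, B = (1.0, 0.0, 1.0), -1.0

inside = [(0.1, 0.2), (-0.5, 0.3), (0.0, -0.7)]    # points within the unit circle
outside = [(2.0, 0.0), (-1.5, 1.5), (0.0, -1.2)]   # points beyond the unit circle

inside_labels = [linear_decision(phi(p), W, B) for p in inside]
outside_labels = [linear_decision(phi(p), W, B) for p in outside]
```

Every point in `inside` falls on the negative side and every point in `outside` on the positive side, even though the decision function is linear in the mapped coordinates.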


In the feature space $R^n$, the SVM aims at constructing a linear discriminant function of the form

$$f(x) = \operatorname{sign}(\langle w, \phi(x) \rangle + b) \qquad (3.5)$$

where $w$ and $b$ are the weight vector and the threshold, and $\langle \cdot, \cdot \rangle$ denotes the inner product. According to the structural risk minimization principle, the SVM solves the following problem:

$$\min \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \xi_i \qquad (3.6)$$
$$\text{s.t.} \quad y_i (\langle w, \phi(x_i) \rangle + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \ldots, l$$

where $C$ is the regularization parameter, which controls the tradeoff between the number of errors and the complexity of the model, and a slack variable $\xi_i > 1$ corresponds to a misclassified training sample. The corresponding dual problem is

$$\max \; L_D = \sum_{i=1}^{l} a_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} a_i a_j y_i y_j \langle \phi(x_i), \phi(x_j) \rangle = \sum_{i=1}^{l} a_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} a_i a_j y_i y_j K(x_i, x_j)$$
$$\text{s.t.} \quad 0 \le a_i \le C, \qquad \sum_{i=1}^{l} a_i y_i = 0$$

Table 1. Some classical SVM kernels

Linear: $K(x, y) = \langle x, y \rangle$
Polynomial: $K(x, y) = (a \langle x, y \rangle + b)^{d}$
RBF: $K(x, y) = \exp(-\|x - y\|^{2} / \sigma^{2})$
KMOD: $K(x, y) = a \left[ \exp\left( \frac{\gamma}{\|x - y\|^{2} + \sigma^{2}} \right) - 1 \right]$

The hierarchy organization reflects the semantics of Nature images and is based on subjective human judgement, as shown in Figure 3. The hierarchy is not complete by itself, but it is a reasonable organization to simplify image retrieval. The classes were selected as general concepts that give meaningful associations in normal comprehension.

Figure 3. The class organization used by the SVM to distinguish the members of a class from the others

Here, $a_i \ge 0$, $i = 1, \ldots, l$, are the Lagrangian multipliers to be solved for, and $K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$ is the kernel function. Some of the classical SVM kernels are reported in Table 1. The discriminant function and the parameter $b$ are then

$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{l} y_i a_i K(x_i, x) + b \right) \qquad (3.8)$$

$$b = \frac{1}{N_{NSV}} \sum_{x_i \in J_N} \left( y_i - \sum_{x_j \in J} a_j y_j K(x_j, x_i) \right)$$

where $N_{NSV}$ is the number of standard support vectors, $J_N$ is the set of standard support vectors, and $J$ is the set of support vectors.
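The kernels of Table 1 and the discriminant function (3.8) can be sketched in plain Python as follows. The exact scaling of the RBF kernel and the form of the KMOD kernel are a reading of the table under the usual definitions, so they should be treated as assumptions; the default parameter values and the function names are hypothetical. The multipliers and the threshold passed to `decision` would come from solving the dual problem, which this sketch does not do.

```python
import math


def dot(x, y):
    """Inner product <x, y> of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def sq_dist(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, a=1.0, b=1.0, d=2):
    return (a * dot(x, y) + b) ** d


def rbf_kernel(x, y, sigma=1.0):
    return math.exp(-sq_dist(x, y) / sigma ** 2)


def kmod_kernel(x, y, a=1.0, gamma=1.0, sigma=1.0):
    return a * (math.exp(gamma / (sq_dist(x, y) + sigma ** 2)) - 1.0)


def decision(x, support_vectors, labels, alphas, b, kernel):
    """Evaluate sign(sum_i y_i * a_i * K(x_i, x) + b) as in Eq. (3.8)."""
    score = sum(y_i * a_i * kernel(x_i, x)
                for x_i, y_i, a_i in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1
```

Only the support vectors (the training points with non-zero multipliers) need to be passed to `decision`, which is the compact representation of the decision boundary mentioned earlier.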

Figure 2. A linear hyperplane separating the members of two classes. The support vectors are circled, which is a technique used by most radial basis function classifiers.

2) Region Classification

In order to classify image regions into semantic classes that humans can understand easily, we manually defined a class hierarchy, shown in Figure 3.

3) Learning and classification stage

The semantics of each class are learned by training an SVM on the different features of the training sample regions of that class. For the resulting SVM classifiers, we consider P(classified | notMemberOfClass), that is, the probability that a region is assigned to a class it does not belong to. These binary classifiers achieve good class separation under the constraint that each region belongs to only one, or none, of the classes. After training, we obtain binary classifiers that can classify image regions based on their semantics. We then use these binary classifiers on our database of image regions to determine the class of each input region, and the region is mapped into its class in the semantic class hierarchy of Figure 3.

In the semantic class hierarchy, all scenery regions are classified into Nature regions or Artificial regions. Since this paper concentrates on Nature regions, we further classify the Nature regions into three subclasses: Sky, Land and Water. Each of these subclasses is further divided into sub-subclasses. The Sky subclass is divided into Night, Sunset, Clouds and BlueSky. The Water subclass is divided into Waterfall, BlueSea, WhiteWave and River. The Land subclass is divided into Mountain, Sand, Greenground and Snow. The Greenground sub-subclass is further divided into Grass and Forest. Thus, each image can be represented by a set of keywords, which are the names of the classes obtained from the semantic classification of its regions. The keyword-based method allows for a highly intuitive query interface, so novices can retrieve images by semantics according to their own understanding.
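The class hierarchy and the keyword-based lookup described in this section can be sketched as follows. The class names are taken from the text; the root name `Scenery`, the `accepts` callable (a stand-in for the trained binary SVM classifiers), the stopping rule when zero or several classifiers fire, and the image names are hypothetical and serve only as illustration.

```python
# Class hierarchy from the text; "Scenery" is an assumed name for the root.
CHILDREN = {
    "Scenery": ["Nature", "Artificial"],
    "Nature": ["Sky", "Land", "Water"],
    "Sky": ["Night", "Sunset", "Clouds", "BlueSky"],
    "Water": ["Waterfall", "BlueSea", "WhiteWave", "River"],
    "Land": ["Mountain", "Sand", "Greenground", "Snow"],
    "Greenground": ["Grass", "Forest"],
}


def classify_region(region, accepts, root="Scenery"):
    """Walk down the hierarchy and return the list of class keywords.

    accepts(class_name, region) plays the role of one binary classifier.
    The walk stops when no child, or more than one child, accepts the region.
    """
    path = []
    node = root
    while node in CHILDREN:
        hits = [c for c in CHILDREN[node] if accepts(c, region)]
        if len(hits) != 1:
            break
        node = hits[0]
        path.append(node)
    return path


def search(index, keyword):
    """Return the images that have at least one region labeled with keyword."""
    return sorted(name for name, paths in index.items()
                  if any(keyword in path for path in paths))


def stub_accepts(class_name, region):
    """Toy classifier: a region is modeled as the set of classes it belongs to."""
    return class_name in region


waterfall_path = classify_region({"Nature", "Water", "Waterfall"}, stub_accepts)
index = {
    "img_a": [waterfall_path, ["Nature", "Sky", "Clouds"]],
    "img_b": [["Nature", "Land", "Snow"]],
}
waterfall_images = search(index, "Waterfall")
```

Because every region carries the keywords of all its ancestor classes, a query for a general keyword such as `Water` also returns images whose regions were classified as `Waterfall`.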

IV. EXPERIMENTS

We ran experiments on a test set of about two thousand images, covering nature scenery, flowers, flags and winter scenes. Segmentation divides each image into 5 regions on average, which gave us a database of about ten thousand regions. We selected 600 regions from this database as the training set for the SVM, that is, about 50 per class. Features are then extracted with the color histogram and the edge direction histogram (EDH), depending on the class.

To classify the image regions, we tried different groupings of classes and different features before we finally decided on the grouping and features in Figure 3, which gave the best classification precision. In the experiment, we first classified regions into Nature regions and Artificial regions using EDH features. Then we grouped the Nature image regions into 3 classes using the color histogram feature, because it gives higher precision than EDH. Finally, we used the EDH feature to group the Water image regions into 4 classes. Figure 4 shows the result obtained when the user inputs the keyword waterfall.

Figure 4. Result of a query for waterfall

V. CONCLUSIONS

We put forward a method to classify image regions based on their semantics. It can reduce the gap between human perception and the description of image content. Because the pre-defined semantic class hierarchy reflects semantics as judged subjectively by humans, it supports flexible and intuitive queries by novices. The use of binary SVM classifiers that classify image regions using different features at different levels of the hierarchy was the main reason behind the high classification precision that we achieved in our experiments. Currently, we are looking at adding more feature extraction methods to obtain higher precision and at adding more classifiers to include more classes in the system.

ACKNOWLEDGMENT

This work was supported by the Natural Science Foundation of Zhejiang Province (No. Y1080565).