
ANALYSIS AND CLASSIFICATION OF IMAGE SETS BASED ON WEIGHTED FEATURE EXTRACTION

A. Narayanan, S. Dhanasekaran

Department of Information Technology, Kalasalingam University, Shrivilliputhur, India
Info_narain@yahoo.co.in, srividhanas@yahoo.com

Abstract
Image classification is one of the most important parts of image analysis. The main aim of this project is to categorize the images of the COREL image set using a Weighted Feature Support Vector Machine (WFSVM). First, each input image from the COREL image set is partitioned using the fuzzy c-means (FCM) clustering method. Image segmentation is the process of partitioning an image into multiple segments so that it can be analyzed more easily. Features of the segmented images such as shape, color and texture can be extracted at different levels; feature extraction is a special form of dimensionality reduction. After the relevant features of each image segment are extracted, they are assigned weights and irrelevant features are removed. A feature is said to be the dominant feature of a cluster if its distribution is dense, whereas it is said to be irrelevant if its distribution is discrete (scattered). Finally, the images of the COREL image set are categorized using WFSVM based on the weighted features. WFSVM has two advantages over the traditional SVM: higher classification accuracy and less training time.

Keywords
Image Classification, Image Segmentation, SVM, Feature Extraction, COREL Image Dataset

I. INTRODUCTION

Image processing and analysis is a rapidly growing area of computer science. Its growth has been fuelled by technological advances in digital imaging, computer processors and mass storage devices. Fields such as medicine, film and video production, photography, remote sensing and security monitoring, which previously used analog imaging, are switching to digital imaging for its flexibility and affordability. These sources produce huge volumes of digital image data every day. Processing and analyzing such volumes manually is tedious, so techniques that manipulate images with little or no human intervention are required.

Image analysis is defined as the act of examining images for the purpose of identifying objects and judging their significance. An image analyst studies the image data and tries to detect, identify, classify, measure and evaluate the significance of objects, their patterns, and their spatial and temporal relationships. Many different techniques are used in analyzing images automatically: image segmentation, image classification, image understanding and pattern recognition.

To analyse or interpret an image automatically, there must be a way of unambiguously identifying the pixels that correspond to particular features of interest. The process of identifying these pixels is known as segmentation. Segmentation techniques are widely used in applications involving the detection, recognition, classification and measurement of objects in images, and the success or failure of these tasks is a direct consequence of the success or failure of segmentation. The most common segmentation method is clustering. The K-means algorithm is an iterative technique that partitions an image into K clusters. A drawback of the K-means algorithm is that the number of clusters K is an input parameter, and an inappropriate choice of K may yield poor results. This is addressed here with the Fuzzy C-Means (FCM) algorithm, whose output is validated in order to select the number of clusters. FCM employs fuzzy partitioning, so that a pixel can belong to all groups with different membership grades between 0 and 1, and it is mainly used for image segmentation.

As clustering grows in importance, so does the quality of the clustering. Validity functions are therefore required to identify the best clustering, namely the Partition Coefficient (PC), Classification Entropy (CE), Partition Exponent (PE), Compact and Separate Clustering (CSC) and the index S [2][8][13]. The PC, CE and PE validity measures lack a direct connection to geometrical properties, whereas the S validity function includes geometrical properties [4] [5]; it is the ratio of compactness to separation. Here S is used to validate the clustering.

In [1], organizing images into semantically meaningful categories is described as a challenging problem, and the semantic gap is identified as the key hindrance in all applications.
Supervised machine learning techniques such as the support vector machine (SVM) and the Bayesian classifier [12] are often used to reduce the semantic gap in image classification [10] [11]. Their principal advantage is their good generalization capability. However, traditional SVM algorithms for image classification do not distinguish between features for object classification and assign the same weight to all low-level features. High-dimensional image data has many features, but not all of them are relevant to classification. A Weighted Support Vector Machine (WSVM) [14] assigns different weights to samples in different classes using the Kernel-based Possibilistic C-Means (KPCM) algorithm, but neglects the relative importance of each feature with respect to the classification task. For example, consider two images, one with a ball and another with a tiger. For the ball, the shape feature is more important than the color and texture features. For the tiger, on the other hand, the color feature is more important. Thus, the features that are relevant to classification are determined and more weight is assigned to them.

In [6] a Weighted Feature Support Vector Machine (WFSVM) is proposed in which the relevant features are determined using the degree of discreteness. A WFSVM is proposed in this paper in which the weights of the features are estimated using the principle of maximizing deviations between categories. The weighted features are then used to compute the kernel function of the WFSVM, and the WFSVM is used for training and testing.

The paper is organized as follows. Section II describes the system design, Section III discusses the experiments and results, and Section IV presents the conclusion and future work.

II. SYSTEM DESIGN
The overall system architecture is shown in Fig. 1.

Fig. 1. Overall System Architecture (flow: train/test image from COREL dataset; image segmentation using FCM and validation of clustered output; extraction of color, shape and texture features; determining relevant features and assigning weights; classification of images using WFSVM in a training phase and a testing phase; performance evaluation).

The system has five modules:
i. Image Segmentation
ii. Extraction of features
iii. Determining relevant features and assigning weights
iv. Classification of images
v. Performance evaluation

A. Image Segmentation
Image segmentation is carried out using the fuzzy c-means algorithm [9]. The fuzzy c-means algorithm is based on minimization of the objective function $J_m$ with respect to $U$, a fuzzy c-partition of the data set, and to $v$, a set of $c$ centroids:

$$J_m(U, v) = \sum_{x=1}^{N} \sum_{i=1}^{c} (\mu_{x,i})^{m} \, d^2(z_x, v_i)$$

where $\mu_{x,i}$ ($x = 1, 2, \ldots, N$; $i = 1, 2, \ldots, c$) is the membership value, denoting the fuzzy membership of data point $x$ in class $i$; $v_i$ ($i = 1, 2, \ldots, c$) is the centroid of each cluster; $z_x$ ($x = 1, 2, \ldots, N$) is the data set (the pixel values of the image); $m$ is the fuzzification parameter; $d^2(z_x, v_i)$ is the squared Euclidean distance between $z_x$ and $v_i$; $N$ is the number of data points; and $c$ is the number of clusters.

The fuzzy partition is obtained through an iterative optimization of the objective function. The sequence of steps is:
i. Choose the primary centroids $v_i$ (prototypes).
ii. Compute the degree of membership of every data point in every cluster:

$$\mu_{x,i} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{d(z_x, v_i)}{d(z_x, v_k)} \right)^{2/(m-1)}}$$

iii. Compute the new centroids $v_i$:

$$v_i = \frac{\sum_{x=1}^{N} (\mu_{x,i})^{m} \, z_x}{\sum_{x=1}^{N} (\mu_{x,i})^{m}}$$

and update the degree of membership, $\hat{\mu}_{x,i}$, according to step ii.
iv. If $\max_{x,i} |\mu_{x,i} - \hat{\mu}_{x,i}| < \varepsilon$, stop; otherwise go to step iii.

Here $\varepsilon$ is the termination criterion, which takes a value between 0 and 1.

Validation of Clustered Output
The segmented image is validated using the validity function, index S [9]. A smaller S indicates a partition in which all the clusters are compact overall and well separated from each other. S is built from the following quantities.

The compactness of fuzzy cluster $i$ is computed as

$$\pi_i = \frac{\sigma_i}{N}$$

where $N$ is the number of data points. The variation of fuzzy cluster $i$ is defined as

$$\sigma_i = \sum_{x=1}^{N} d_{x,i}^2, \qquad d_{x,i} = \mu_{x,i} \, \lVert z_x - v_i \rVert$$

where $d_{x,i}$ is called the fuzzy deviation of $z_x$ from class $i$. The separation of the fuzzy c-partition is

$$s = (d_{\min})^2, \qquad d_{\min} = \min_{i \neq k} \lVert v_i - v_k \rVert$$

where $d_{\min}$ is the minimum distance between cluster centroids.

The compactness and separation validity function S is defined as the ratio of compactness to separation, and the partition index is obtained by summing this ratio over all clusters:

$$S = \sum_{i=1}^{c} \frac{\pi_i}{s} = \frac{\sum_{i=1}^{c} \sum_{x=1}^{N} \mu_{x,i}^2 \, \lVert z_x - v_i \rVert^2}{N \, (d_{\min})^2}$$
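The segmentation and validation steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this work: the function names `fcm`, `memberships` and `validity_s`, the random choice of primary centroids, the default values of `m`, `eps` and `max_iter`, and the synthetic test points are all assumptions made for the example.

```python
import numpy as np

def memberships(z, v, m):
    """Step ii: membership of every data point in every cluster, shape (N, c)."""
    d2 = ((z[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)  # squared distances
    d2 = np.maximum(d2, 1e-12)            # avoid division by zero at a centroid
    inv = d2 ** (-1.0 / (m - 1.0))        # equals d^(-2/(m-1))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm(z, c, m=2.0, eps=1e-3, max_iter=100, seed=0):
    """Fuzzy c-means on data z of shape (N, d); returns (U, v)."""
    z = np.asarray(z, dtype=float)
    rng = np.random.default_rng(seed)
    # Step i: assumed initialisation, c distinct data points as primary centroids.
    v = z[rng.choice(len(z), size=c, replace=False)]
    u = memberships(z, v, m)
    for _ in range(max_iter):
        um = u ** m
        v = (um.T @ z) / um.sum(axis=0)[:, None]   # step iii: new centroids
        u_new = memberships(z, v, m)               # updated memberships
        converged = np.max(np.abs(u_new - u)) < eps  # step iv
        u = u_new
        if converged:
            break
    return u, v

def validity_s(z, u, v):
    """Index S: compactness divided by separation (smaller is better)."""
    z = np.asarray(z, dtype=float)
    d2 = ((z[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
    compactness = (u ** 2 * d2).sum() / len(z)
    centroid_d2 = ((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
    separation = centroid_d2[~np.eye(len(v), dtype=bool)].min()
    return compactness / separation

# Illustrative data only: three synthetic 2-D blobs standing in for pixel values.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(loc, 0.1, size=(20, 2))
                    for loc in ((0, 0), (5, 0), (0, 5))])
for c in range(2, 6):
    u, v = fcm(points, c)
    print(c, validity_s(points, u, v))
```

Running FCM for several values of c and keeping the one with the smallest S mirrors how the number of clusters is fixed in the experiments.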

B. Extraction of Features
From the segmented image, 30 low-level features are extracted from each image segment [7]. They are:

Color Features (12)
  Average of R, G, B values (3)
  Standard deviation of R, G, B values (3)
  Average of L*, a*, b* values (3)
  Standard deviation of L*, a*, b* values (3)

Shape Features (6)
  Area, the number of pixels occupied by the region.
  x, the number of pixels occupied by the region along the X axis.
  y, the number of pixels occupied by the region along the Y axis.
  Compactness, the ratio of the squared perimeter to the area.
  Convexity, the ratio of the perimeter of the convex hull to that of the original contour.
  Moment of inertia, which measures the distribution of mass relative to axes through the center of gravity.

Texture Features (12)
  The texture feature is represented by the average and variance of 6 filter responses.

Texture features are extracted using the 2D Gabor wavelet function proposed by Naghdy et al. Different choices of the scale and orientation components construct a set of filters. To reduce the computational load, the filter bank should be made as small as possible while still providing adequately distinguishable information for a high-level classifier. Here a set of 6 filters is constructed using three scales and two orientations. Once a series of Gabor filters has been chosen, image features at different locations, frequencies and orientations can be extracted by convolving the image i(x,y) with the filters. The filter bank is applied to the input image and the magnitude of each filtered image is obtained.

C. Determining Relevant Features and Assigning Weights
As mentioned in the introduction, features that are not relevant to the classification task may degrade the performance of the support vector machine [3]. It is therefore necessary to impose different weights on different features. The feature weights $\omega_p$ are calculated using histogram analysis as follows. Let there be $m$ features $\langle f_{j1}, f_{j2}, \ldots, f_{jm} \rangle$ in the $j$-th cluster, with corresponding weights $\langle \omega_{j1}, \omega_{j2}, \ldots, \omega_{jm} \rangle$. A feature with a denser distribution is considered more relevant than a feature with a less dense distribution. To calculate the degree of density, a histogram is created for each feature in each cluster. For a particular feature, the smaller the area of the histogram, the denser the feature and hence the more relevant it is. The density of the $l$-th feature is measured from its histogram, and the weight of the $l$-th feature for the $j$-th cluster is then defined from this density.

D. Classification of Images
The classification of images includes two phases: the training phase and the testing phase.

1) Training WFSVM: Once the low-level features of all training images have been extracted and their corresponding weights have been calculated using (16), they are fed into the WFSVM. The WFSVM finds a hyperplane that separates the training data by a maximal margin. That is, given $l$ training data $T = \{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in R^n$ and $y_i \in \{-1, 1\}$, the WFSVM needs to solve the optimization problem

$$\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad y_i \left( w^T \phi(x_i) + b \right) \geq 1 - \xi_i, \;\; \xi_i \geq 0, \;\; i = 1, \ldots, l$$

where $C$ is the penalty parameter of the error term, $w$ is the coefficient vector, $b$ is a constant, $\xi_i$ is a parameter for handling non-separable data and $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ is the kernel function. The non-linear WFSVM with the Gaussian (Radial Basis Function) kernel is used in this system. The Gaussian kernel is defined as

$$K(x_i, x_j) = \exp\left( -\gamma \, \lVert x_i - x_j \rVert^2 \right), \qquad \gamma > 0$$
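To make the role of the feature weights in the kernel concrete, the following is a minimal sketch of a Gaussian kernel evaluated on weighted features. It assumes that each feature is multiplied by its weight before the kernel is computed, which is one reading of the statement that the weighted features are used to compute the kernel function; the function name `weighted_rbf_kernel`, the parameter `gamma` and the sample vectors are hypothetical.

```python
import numpy as np

def weighted_rbf_kernel(a, b, weights, gamma=1.0):
    """Gram matrix of a Gaussian kernel on feature-weighted inputs.

    a: (n_a, n_features), b: (n_b, n_features), weights: (n_features,) in [0, 1].
    Assumption: weights scale the features, i.e. x is replaced by x * weights.
    """
    weights = np.asarray(weights, dtype=float)
    a = np.asarray(a, dtype=float) * weights
    b = np.asarray(b, dtype=float) * weights
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)  # squared distances
    return np.exp(-gamma * d2)

# Two illustrative 3-feature vectors that differ mostly in the last feature.
x = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 5.0]])

# With all weights equal to 1 this is the ordinary Gaussian kernel.
print(weighted_rbf_kernel(x, x, np.ones(3)))

# A zero weight removes the third feature from the similarity computation.
print(weighted_rbf_kernel(x, x, np.array([1.0, 1.0, 0.0])))
```

A Gram matrix computed this way could be passed to any SVM solver that accepts a precomputed or callable kernel; with unit weights it reduces to the standard Gaussian kernel.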

For classification, a multi-class WFSVM is used, constructed according to the one-against-others strategy. For each category, a WFSVM is trained to separate that category from all other categories to accomplish the classification task.

The algorithm for WFSVM is as follows:
i. Input the sample set $T = \{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in R^n$, $y_i \in \{-1, 1\}$, $i = 1, 2, \ldots, l$.
ii. Construct the diagonal matrix $P = \mathrm{diag}(\omega_1, \omega_2, \ldots, \omega_n)$ using the feature weights $0 \leq \omega_p \leq 1$ ($p = 1, 2, \ldots, n$).
iii. Select an appropriate penalty parameter $C > 0$, solve the optimization model (19) and get the optimal solution $\alpha^* = (\alpha_1^*, \ldots, \alpha_l^*)^T$.
iv. Select a positive component $0 < \alpha_j^* < C$ of $\alpha^*$ and calculate

$$b^* = y_j - \sum_{i=1}^{l} y_i \, \alpha_i^* \, K(P x_i, P x_j)$$

v. Construct the decision function

$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} \alpha_i^* \, y_i \, K(P x_i, P x) + b^* \right)$$

Obviously, when the weights of all features are 1, the WFSVM becomes the standard SVM.

2) Testing Phase: In the testing phase, the trained WFSVM is tested using images of the COREL dataset. From each image of the testing dataset, 30 low-level features are extracted and given as input to the WFSVM. Usually 2/3 of the data is used for training and 1/3 for testing, but here roughly 1/3 is used for training and 2/3 for testing, as the SVM is trained with weighted features.

E. Performance Evaluation
The final module evaluates the performance of the classifier. How well the classifier classifies the data is evaluated using the formula

$$\text{Accuracy} = \frac{\text{number of correctly classified images}}{\text{total number of test images}} \times 100\%$$

As weighted features are used as input to the SVM classifier, better classification accuracy is expected.

III. EXPERIMENTS AND RESULTS
The experiments are carried out using images from the COREL database, which can be obtained from [15]. There are 1000 images in the dataset, divided into 10 thematically diverse image categories. Each category includes 100 images and represents one distinct topic of interest. All images are in JPEG format with size 384×256 or 256×384. A keyword is assigned to describe each image category. The categories are Africa, beach, building, bus, dinosaur, elephant, rose, food, home and mountain. For each category, one WFSVM is trained to separate that category from the others. For training, 300 images are used, i.e. 30 images from each category, and the remaining 700 images are used for testing. Image segmentation and feature extraction have been implemented; the classification of images using weighted features is ongoing work. The training dataset consisting of 300 images is segmented and 30 low-level features are extracted.

1) Image Segmentation: The training images are segmented and stored in a folder called SegmentedImages for further processing. An input image from the category Africa is shown in Fig. 2. The input image is segmented using FCM, and the validity function, index S, is used to fix the best number of clusters. The number of clusters and the corresponding S value are given in Table I. The minimum S value is obtained with 6 clusters, which is therefore considered the best clustering. The segmented image is shown in Fig. 3.

2) Extraction of Features: The segmented images stored in the folder SegmentedImages are given as input to the feature extraction module. The features extracted from the 300 training images are stored in a text file for further processing. The text file containing the extracted features is shown in Fig. 4.

Fig. 2. Input image from folder Africa.

TABLE I
NUMBER OF CLUSTERS AND S VALUE

Number of Clusters    S Value
2                     0.2869
3                     0.3102
4                     0.3011
5                     0.2974
6                     0.2698
7                     0.2705
8                     0.2999
9                     0.3312
10                    0.3507
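The selection of the best number of clusters from Table I amounts to taking the entry with the smallest S value. A short check using the tabulated values (the variable names are illustrative only):

```python
# S values copied from Table I, keyed by the number of clusters.
s_values = {2: 0.2869, 3: 0.3102, 4: 0.3011, 5: 0.2974, 6: 0.2698,
            7: 0.2705, 8: 0.2999, 9: 0.3312, 10: 0.3507}

# The best partition is the one with the minimum validity index S.
best_c = min(s_values, key=s_values.get)
print(best_c, s_values[best_c])  # prints: 6 0.2698
```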

Fig. 3. Segmented image of the input Africa image.

Fig. 4. Extracted Features.

IV. CONCLUSION AND FUTURE WORK
In this paper, automatic image classification using a Weighted Feature Support Vector Machine is proposed, and results are given for two modules, image segmentation and feature extraction. As weighted features are used as input to the SVM, higher classification accuracy and less training time are expected. The classification accuracy will be compared with that of the traditional multi-class SVM. In future, this work can be extended to medical images.

REFERENCES
[1] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta and R. Jain, "Content-based Image Retrieval at the End of the Early Years," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, 2000.
[2] Amine M. Bensaid, Lawrence O. Hall, James C. Bezdek, Laurence P. Clarke, Martin L. Silbiger, John A. Arrington and Reed F. Murtagh, "Validity-Guided (Re)Clustering with Applications to Image Segmentation," IEEE Trans. on Fuzzy Systems, vol. 4, pp. 112-123, 1996.
[3] Lei Wang, Latifur, "Automatic Image Annotation and Retrieval Using Weighted Feature Selection," in Machine Learning and Cybernetics, 2009 International Conference on, vol. 3, pp. 1616-1620, 2009.
[4] I. Gath and A.B. Geva, "Unsupervised Optimal Fuzzy Clustering," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, no. 7, pp. 773-781, 1989.
[5] Il Hong Suh, Jae-Hyun Kim and Frank Chung-Hoon Rhee, "Convex-Set-Based Fuzzy Clustering," IEEE Trans. on Fuzzy Systems, vol. 7, no. 3, pp. 271-285, 1999.
[6] Keping Wang, Xiaojie Wang and Yixin Zhong, "A Weighted Feature Support Vector Machines Method for Semantic Image Classification," in Measuring Technology and Mechatronics Automation (ICMTMA), 2010 International Conference on, vol. 1, pp. 377-380, 2010.
[7] L. Wang, L. Khan, L. Liu and W. Wu, "Automatic Image Annotation and Retrieval using Weighted Feature Selection," in Multimedia Software Engineering, 2004 Proceedings on IEEE Sixth International Symposium, pp. 435-442, 2004.
[8] M. Sugeno and T. Yasukawa, "A Fuzzy-Logic-Based Approach to Qualitative Modeling," IEEE Trans. Fuzzy Syst., vol. 1, pp. 7-31, 1993.
[9] Metin Kaya, "An Algorithm for Image Clustering and Compression," Turk J Elec Engin, vol. 13, 2005.
[10] Simon Tong and Edward Chang, "Support Vector Machine Active Learning for Image Retrieval," in Proceedings of the Ninth ACM International Conference on Multimedia, vol. 1, pp. 107-118, New York, USA, 2001.
[11] Stuart Andrews, Ioannis Tsochantaridis and Thomas Hofmann, "Support Vector Machines for Multiple-Instance Learning," in Advances in Neural Information Processing Systems, vol. 15, pp. 561-568, MIT Press, 2003.
[12] Tianxia Gong, Shimiao Li and Chew Lim Tan, "A Semantic Similarity Language Model to Improve Automatic Image Annotation," in Tools with Artificial Intelligence (ICTAI), 2010 22nd IEEE International Conference on, vol. 1, pp. 197-203, 2010.
[13] Xuanli Lisa Xie and Gerardo Beni, "A Validity Measure for Fuzzy Clustering," IEEE Trans. on Pattern Anal. Machine Intell., vol. 13, no. 8, pp. 841-847, 1991.
[14] Zhang Qilong, Shan Ganlin and Duan Xiusheng, "Weighted Support Vector Machine based Clustering Vector," in Computer Science and Software Engineering, International Conference on, vol. 1, pp. 819-822, 2008.
[15] COREL database, http://corel.digitalriver.com/