Journal of Computing, http://www.journalofcomputing.org, call for papers, Volume 5, Issue 1, January 2013



**K-Means Clustering and Affinity Clustering based on Heterogeneous Transfer Learning**

Shailendra Kumar Shrivastava, Dr. J. L. Rana, and Dr. R.C. Jain

Abstract - Heterogeneous Transfer Learning (HTL) aims to extract knowledge from one or more tasks in one feature space and apply it to a target task in a different feature space. In this paper two clustering algorithms based on HTL, K-Means clustering and Affinity clustering, are proposed. Both algorithms operate on annotated image data sets. K-Means based on HTL first finds the cluster centroids of the text (annotations) by K-Means; these text centroids are then used to initialize the centroids for image clustering by K-Means. The second algorithm, Affinity clustering based on HTL, first finds the exemplars of the annotations, and these exemplars are then used to initialize the similarity matrix of the image data set before clustering. With both algorithms the F-Measure and Purity scores increase and the Entropy scores decrease. The clustering accuracy of Affinity clustering based on HTL is better than that of K-Means based on HTL.

Key words - Heterogeneous transfer learning, clustering, affinity propagation, K-Means, feature space.


1 INTRODUCTION

In the literature [1], Machine Learning is defined as: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." However, many machine learning methods work well only under the assumption that the training data and testing data are drawn from the same feature space. If the feature space differs between training and testing data, most statistical models will not work; one then needs to re-collect training and testing data in the same feature space and rebuild the model, which is expensive and difficult. In such cases transfer learning [3] between task domains is desirable. Transfer learning allows the domains, tasks, and distributions used in training and testing to be different. In heterogeneous transfer learning, knowledge is transferred across domains or tasks that have different feature spaces, e.g. classifying web pages in Chinese using training documents in English [4]. Probabilistic latent semantic analysis (PLSA) [5] has been used to cluster images by exploiting their annotations (text). Transfer learning in machine learning [2] has already achieved significant success in many knowledge engineering areas, including classification, regression and clustering. Clustering is a fundamental task in computerized data analysis: it is concerned with partitioning a collection of data points into groups/categories using unsupervised learning techniques.

• Shailendra Kumar Shrivastava is with the Department of Information Technology, Samrat Ashok Technological Institute, Vidisha, M.P. 464001, India.
• Dr. J. L. Rana, Ex-Head of the Department of Computer Science & Engineering, was with M.A.N.I.T., Bhopal, India.
• Dr. R. C. Jain, Director, is with the Samrat Ashok Technological Institute, Vidisha, M.P. 464001, India.

Data points within a group are similar; such groups are called clusters [6][7][8]. In this paper two algorithms, K-Means [8][9] based on Heterogeneous Transfer Learning and Affinity clustering based on Heterogeneous Transfer Learning, are proposed. Affinity propagation (AP) [6] is a clustering algorithm which, given a set of similarities (also called affinities) between pairs of data points, partitions the data by passing messages among the data points. Each partition is associated with a prototypical point (exemplar) that best describes that cluster, and AP associates each data point with one such prototype; the objective of AP is thus to maximize the overall sum of similarities between data points and their representatives. K-Means starts with a random initial partition and keeps reassigning patterns to clusters, based on the similarity between each pattern and the centroids, until a convergence criterion is met. An annotated image data set has two feature spaces: the first is the text feature space and the second is the image feature space. In K-Means based on HTL, the text data (annotations) are first clustered by K-Means to obtain the text centroids. To transfer knowledge from the text feature space into the image feature space, the image centroids corresponding to the text centroids are taken as initial centroids; the complete image data set is then assigned to these centroids on the basis of minimum Euclidean distance, and finally K-Means is applied to generate the image clusters. In Affinity clustering based on HTL, the text (annotations) of the images is clustered by affinity propagation to find exemplars. To transfer the knowledge from the text feature space to the image feature space, the diagonal of the image similarity matrix is initialized from the exemplars of the text clustering, and the image clusters are then generated from the image similarity matrix by affinity propagation. The remainder of this paper is organized as follows. Section 2 gives a brief overview of Transfer Learning,

© 2013 Journal of Computing Press, NY, USA, ISSN 2151-9617


the original Affinity Propagation algorithm, and the Vector Space Model. Section 3 describes the main idea and details of our proposed algorithms. Section 4 discusses the experimental results and evaluations. Section 5 provides concluding remarks and future directions.


2 RELATED WORKS

Before going into the details of our proposed "K-Means based on Heterogeneous Transfer Learning" and "Affinity Clustering based on Heterogeneous Transfer Learning" algorithms, some work closely related to this paper is briefly reviewed: transfer learning, the K-Means clustering algorithm, the affinity propagation algorithm, and the vector space model.

2.1 Transfer Learning
Machine learning methods work well only under the common assumption that the training and test data come from the same feature space and the same distribution. When the distribution changes, most statistical models need to be rebuilt from scratch using newly collected training data. In many real-world applications it is expensive or impossible to re-collect the needed training data and rebuild the model, so it would be desirable to reduce this need and effort. In such cases knowledge transfer, or transfer learning [3], between task domains is attractive. Transfer learning has three main research issues: (1) what to transfer, (2) how to transfer, and (3) when to transfer. In the inductive transfer learning setting, the target task is different from the source task, whether or not the source and target domains are the same. In transductive transfer learning, the source and target tasks are the same while the source and target domains are different. In unsupervised transfer learning, similarly to the inductive setting, the target task is different from but related to the source task. In heterogeneous transfer learning, knowledge is transferred across domains or tasks that have different feature spaces.

2.2 K-Means Clustering Algorithm
The K-Means [8][9] algorithm is one of the best known and most popular clustering algorithms. K-Means seeks an optimal partition of the data by minimizing the sum-of-squared-error criterion with an iterative optimization procedure:

1. Initialize a K-partition randomly or based on some prior knowledge, and calculate the cluster prototype matrix $M = [m_1, \dots, m_K]$.
2. Assign each object in the data set to the nearest cluster $C_k$.
3. Recalculate the cluster prototype matrix based on the current partition:

   $m_k = \frac{1}{N_k} \sum_{x_j \in C_k} x_j$    (1)

   where $N_k$ is the number of objects in cluster $C_k$.
4. Repeat steps 2 and 3 until there is no change in any cluster.

The major problem with this algorithm is that it is sensitive to the selection of the initial partition.
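The K-Means procedure above can be sketched in Python with NumPy (a minimal illustration, not the authors' implementation; `X` is an n×d data matrix):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal K-Means: steps 1-4 above with random initialization."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct points as the initial prototype matrix M.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(iters):
        # Step 2: assign each object to the nearest cluster prototype.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):   # step 4: no change, stop
            break
        labels = new_labels
        # Step 3 / Eq. (1): recompute each prototype as the member mean.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

On well-separated data the loop typically converges in a handful of iterations, but different random initializations can yield different partitions, which is exactly the sensitivity noted above.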

2.3 Affinity Clustering Algorithm
The affinity clustering algorithm [10][11][12] is based on message passing among data points. Each data point receives availability messages from candidate exemplars and sends responsibility messages to candidate exemplars. The sum of responsibilities and availabilities for each data point identifies the exemplars; once the exemplars are identified, the data points are assigned to them to form the clusters. The steps of the affinity clustering algorithm are:

1. Initialize the availabilities to zero: $a(i,k) = 0$.
2. Update the responsibilities:

   $r(i,k) \leftarrow s(i,k) - \max_{k' \neq k} \{a(i,k') + s(i,k')\}$    (2)

   where $s(i,k)$ is the similarity of data point $i$ and exemplar $k$.
3. Update the availabilities:

   $a(i,k) \leftarrow \min\{0,\ r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\}\}$    (3)

   and update the self-availability:

   $a(k,k) \leftarrow \sum_{i' \neq k} \max\{0, r(i',k)\}$    (4)

4. Compute sum $= a(i,k) + r(i,k)$ for each data point $i$ and find the value of $k$ that maximizes the sum, to identify the exemplars.
5. If the exemplars do not change for a fixed number of iterations go to step 6; otherwise go to step 2.
6. Assign the data points to the exemplars on the basis of maximum similarity, to find the clusters.

2.4 Vector Space Model
The vector space model (VSM) [13] is used to represent text documents. In the VSM each document $d$ is considered a vector in the M-dimensional term (word) space. Here the tf-idf weighting scheme is used, and each document is represented by

   $\vec{d} = \{w(1,d),\ w(2,d),\ \dots,\ w(N,d)\}$    (5)

where $N$ is the number of terms (words) in the document, and

   $w(i,d) = (1 + \log tf(i,d)) \cdot \log(1 + N/df(i))$    (6)

where $tf(i,d)$ is the frequency of the $i$-th term in document $d$ and $df(i)$ is the number of documents containing the $i$-th term. The inverse document frequency (idf) is defined as the logarithm of the ratio of the number of documents $N$ to the number of documents containing the given word, $df(i)$.
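The message updates in equations (2)-(4) can be sketched compactly in NumPy. This is a minimal sketch of the standard vectorized form; the damping factor is a common stabilization for affinity propagation and is not part of the listed steps:

```python
import numpy as np

def affinity_propagation(S, iters=200, damping=0.5):
    """Message passing on a similarity matrix S whose diagonal holds
    the preferences; returns the exemplar index chosen by each point."""
    n = S.shape[0]
    A = np.zeros((n, n))          # availabilities a(i,k), step 1
    R = np.zeros((n, n))          # responsibilities r(i,k)
    I = np.arange(n)
    for _ in range(iters):
        # Eq. (2): r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        top = AS.argmax(axis=1)
        best = AS[I, top]
        AS[I, top] = -np.inf                  # mask the best to find the runner-up
        second = AS.max(axis=1)
        R_new = S - best[:, None]
        R_new[I, top] = S[I, top] - second
        R = damping * R + (1 - damping) * R_new
        # Eqs. (3)-(4): availabilities from the positive responsibilities
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))      # keep r(k,k) itself
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = np.diag(A_new).copy()          # a(k,k), Eq. (4)
        A_new = np.minimum(A_new, 0)          # clip off-diagonal, Eq. (3)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new
    # Step 4: each point's exemplar maximizes a(i,k) + r(i,k)
    return (A + R).argmax(axis=1)
```

A lower preference on the diagonal yields fewer exemplars; setting it to the median similarity is a common default.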

**3 CLUSTERING BASED ON HETEROGENEOUS TRANSFER LEARNING**

In this section, two clustering algorithms based on heterogeneous transfer learning are proposed. The first is


K-Means Clustering based on Heterogeneous Transfer Learning and the second is Affinity Propagation Clustering based on Heterogeneous Transfer Learning.

3.1 K-Means Clustering based on Heterogeneous Transfer Learning
K-Means clustering based on heterogeneous transfer learning extends K-Means. An annotated image data set is used in the simulation studies, from which both annotations (text feature space) and images (image feature space) are computed. K-Means clustering is applied to the text (annotations) of the images to find the text centroids. To transfer knowledge from one task to the other, the first step is to initialize the centroids for image clustering from the centroids obtained in text clustering. For text clustering a phrase-based VSM [13] is used. In the standard vector space model the weight w(i,d), the term frequency and the document frequency are calculated per term, where a term is a word; here phrases are used instead of words, giving a vector space model based on phrases. Phrase (term) frequency and document frequency can be calculated with a suffix tree, the document frequency of a phrase being the number of documents that contain the phrase. The centroids of the annotations are generated by the K-Means algorithm with the phrase-based VSM as input. K-Means clustering is then applied to the image data set, with the centroids initialized from the centroids obtained in text clustering. The proposed K-Means clustering algorithm based on heterogeneous transfer learning can be written as follows.

1. Input the annotations (text) for clustering.
2. Text preprocessing: remove all stop words and perform word stemming.
3. Find the words and assign a unique number to each word.
4. Convert the text into a sequence of numbers.
5. Construct a suffix tree using Ukkonen's algorithm.
6. Calculate the phrase (term) frequencies from the suffix tree.
7. Calculate the document frequency of each phrase from the suffix tree.
8. Construct the phrase-based vector space model of the text.
9. Apply K-Means to the VSM.
10. Initialize the centroids in the image domain with the centroids obtained from text clustering.
11. Apply K-Means to the image data set to find the clusters.
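The transfer step (steps 9-11) can be sketched as follows, assuming the phrase-based vectors from steps 1-8 are already computed. `text_vecs[i]` and `image_vecs[i]` are hypothetical paired feature vectors for the same annotated image; the seeding rule (image centroid = mean of images whose annotations share a text cluster) is one concrete reading of step 10:

```python
import numpy as np

def kmeans_fit(X, init_centroids, iters=100):
    """Plain K-Means started from explicit initial centroids."""
    C = init_centroids.astype(float).copy()
    labels = np.full(len(X), -1)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        new = d.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
        for j in range(len(C)):
            if (labels == j).any():
                C[j] = X[labels == j].mean(axis=0)
    return labels, C

def htl_kmeans(text_vecs, image_vecs, k, seed=0):
    """Step 9: cluster the annotations; step 10: seed each image-domain
    centroid from the images whose annotations fell in one text cluster;
    step 11: run K-Means in the image feature space from those seeds."""
    rng = np.random.default_rng(seed)
    init_t = text_vecs[rng.choice(len(text_vecs), size=k, replace=False)]
    t_labels, _ = kmeans_fit(text_vecs, init_t)
    image_init = np.stack([image_vecs[t_labels == j].mean(axis=0)
                           for j in range(k) if (t_labels == j).any()])
    return kmeans_fit(image_vecs, image_init)
```

The text clustering thus replaces the random initialization that plain K-Means would use in the image space.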

3.2 Affinity Clustering based on Heterogeneous Transfer Learning
Affinity clustering based on heterogeneous transfer learning extends affinity propagation clustering. An annotated image data set is used; the annotations (text feature space) and the images (image feature space) form the starting point. Affinity clustering is applied to the annotations (text) of the images to find the exemplars. To transfer knowledge from one task to the other, the diagonal values of the similarity matrix of the image data set are assigned on the basis of the exemplars of the text clustering. For text clustering the phrase-based VSM is used, as in Section 3.1: term frequency and document frequency are calculated per phrase rather than per word, using a suffix tree, and the document frequency of a phrase is the number of documents containing it. The phrase-based VSM is used to compute the cosine similarity [1]. The similarity of two documents $d_i$ and $d_j$ is calculated by equation (7), with each document represented as in equation (5):

$sim(d_i, d_j) = \frac{\vec{d_i} \cdot \vec{d_j}}{|\vec{d_i}| \times |\vec{d_j}|} = \frac{\sum_{n=1}^{N} d_{in} d_{jn}}{\sqrt{\sum_{n=1}^{N} d_{in}^2}\,\sqrt{\sum_{n=1}^{N} d_{jn}^2}}$    (7)

The self-similarity/preference [9] is found from equation (8):

$sim(d_k, d_k) = \frac{\sum_{i,j=1,\, i \neq j}^{N} sim(d_i, d_j)}{N \times (N-1)}, \qquad 1 \le k \le N$    (8)
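Equations (7) and (8) can be sketched as follows (a minimal illustration; `D` stands for the phrase-based tf-idf document vectors, and Eq. (8) is read here as the mean off-diagonal similarity used as a shared preference):

```python
import numpy as np

def cosine_similarity_matrix(D):
    """Eq. (7): pairwise cosine similarity of the row vectors of D."""
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    Dn = D / np.where(norms == 0, 1.0, norms)   # guard against empty documents
    return Dn @ Dn.T

def set_preferences(S):
    """Eq. (8): place the mean off-diagonal similarity on the diagonal,
    so every document is a priori equally likely to become an exemplar."""
    n = S.shape[0]
    pref = (S.sum() - np.trace(S)) / (n * (n - 1))
    S = S.copy()
    np.fill_diagonal(S, pref)
    return S
```

The resulting matrix is exactly the input the affinity propagation steps below expect: similarities off the diagonal and preferences on it.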

The affinity propagation algorithm is applied to this similarity matrix to generate the exemplars. Next, the features of the image data set are extracted to build the image feature vectors, and the similarity matrix of the image vectors is computed. The diagonal values of the image similarity matrix are assigned on the basis of the exemplars of the text clustering, which transfers the knowledge from one domain to the other. The exemplars/clusters are then generated by the affinity propagation clustering algorithm. The proposed algorithm can be written as follows.

1. Input the annotations (text) for clustering.
2. Text preprocessing: remove all stop words and perform word stemming.
3. Find the words and assign a unique number to each word.
4. Convert the text into a sequence of numbers.
5. Construct a suffix tree using Ukkonen's algorithm.
6. Calculate the phrase (term) frequencies from the suffix tree.
7. Calculate the document frequency of each phrase from the suffix tree.
8. Construct the phrase-based vector space model of the text.
9. Find the phrase-based similarity matrix of the documents from the vector space model by equation (7).
10. Assign the preferences in the similarity matrix by equation (8).
11. Initialize the availabilities to zero: $a(i,k) = 0$.
12. Update the responsibilities by equation (2).
13. Update the availabilities by equation (3).
14. Update the self-availabilities by equation (4).
15. Compute sum $= a(i,k) + r(i,k)$ for each data point $i$ and find the value of $k$ that maximizes the sum, to identify the exemplars.


16. If the exemplars do not change for a fixed number of iterations go to step 17; otherwise go to step 12.
17. Extract the feature vectors from the image data set.
18. Find the similarity matrix from the image feature vectors.
19. Transfer the knowledge from the text feature space to the image feature space by assigning the diagonal of the image similarity matrix from the text exemplars.
20. Initialize the availabilities to zero: $a(i,k) = 0$.
21. Update the responsibilities by equation (2).
22. Update the availabilities by equation (3).
23. Update the self-availabilities by equation (4).
24. Compute sum $= a(i,k) + r(i,k)$ for each data point $i$ and find the value of $k$ that maximizes the sum, to identify the exemplars.
25. If the exemplars do not change for a fixed number of iterations go to step 26; otherwise go to step 21.
26. Assign the data points to the exemplars on the basis of maximum similarity to find the clusters.

4.1.3 Purity
Purity indicates the percentage of dominant class members in a given cluster. For measuring the overall clustering purity, the weighted average purity is used:

$Purity = \sum_{j=1}^{k} \frac{|C_j|}{N} \max_{i=1 \dots l} Precision(i, j)$

4.1.4 Entropy
Entropy measures the homogeneity of a cluster: the higher the homogeneity of a cluster, the lower its entropy should be, and vice versa. Like the weighted F-Measure and weighted Purity, a weighted entropy is used:

$Entropy = -\frac{1}{\log k} \sum_{j=1}^{k} \frac{|C_j|}{N} \sum_{i} p_{ij} \log p_{ij}$

where $p_{ij}$ is the probability that a member of cluster $C_j$ belongs to class $C_i^*$. To sum up, we would like to maximize the F-Measure and Purity scores and minimize the Entropy score of a clustering to achieve high quality clusters.

**4 EXPERIMENTAL RESULTS AND EVALUATION**

In this section, the results and evaluation of a set of experiments are presented to verify the effectiveness and efficiency of our proposed algorithms. The evaluation parameters are F-Measure, Purity and Entropy. Experiments have been performed on data sets constructed from the Caltech-256 corpus [14]. We discuss the evaluation parameters, the data sets and the results.

4.2 Data Set Preparation
Image data sets of 100, 300, 500 and 800 images have been constructed. The images are randomly chosen from Caltech-256, and manually annotated text files are created for each data set.

4.3 Experimental Results and Discussion
Extensive experiments were carried out to show the effectiveness of the proposed algorithms. Annotations and images were combined in the following configurations: no annotations; 100 annotations with 100, 300, 500 and 800 images; 300 annotations with 300, 500 and 800 images; and 500 annotations with 500 and 800 images. The results of the experiments are given in Table 1, Table 2 and Table 3. It can be observed from Fig. 1 to Fig. 6 that in both algorithms the F-Measure, Purity and Entropy scores vary with the number of annotations and the number of images, and that the F-Measure and Purity scores are maximal, and the Entropy score minimal, at the optimum number of annotations. For comparison, K-Means clustering based on HTL and Affinity clustering based on HTL are plotted together in Fig. 7, Fig. 8 and Fig. 9, from which it is observed that the F-Measure and Purity scores are larger, and the Entropy score smaller, for Affinity Clustering based on HTL.

4.1 Evaluation Parameters [15]
For ready reference, the definitions and formulas of F-Measure, Purity and Entropy are given below.

4.1.2 F-Measure
The F-Measure combines Precision and Recall. Let $C = \{C_1, C_2, \dots, C_k\}$ be the clusters of a data set $D$ of $N$ documents, and let $C^* = \{C_1^*, C_2^*, \dots, C_l^*\}$ represent the correct classes of $D$. The Recall of cluster $j$ with respect to class $i$ is defined as

$Recall(i,j) = \frac{|C_j \cap C_i^*|}{|C_i^*|}$

and the Precision of cluster $j$ with respect to class $i$ as

$Precision(i,j) = \frac{|C_j \cap C_i^*|}{|C_j|}$

The F-Measure of cluster $C_j$ and class $C_i^*$ combines Precision and Recall in the following manner:

$F(i,j) = \frac{2 \times Precision(i,j) \times Recall(i,j)}{Precision(i,j) + Recall(i,j)}$

The F-Measure for the overall quality of the cluster set $C$ is defined by

$F = \sum_{i=1}^{l} \frac{|C_i^*|}{N} \max_{j=1 \dots k} F(i,j)$
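The three measures can be sketched together, assuming ground-truth class labels are available (the contingency matrix `M[i, j]` counts class-i documents placed in cluster j; helper names are illustrative):

```python
import numpy as np

def contingency(classes, clusters):
    """M[i, j] = number of documents of class i placed in cluster j."""
    cls = sorted(set(classes))
    clu = sorted(set(clusters))
    M = np.zeros((len(cls), len(clu)))
    for c, k in zip(classes, clusters):
        M[cls.index(c), clu.index(k)] += 1
    return M

def f_measure(M):
    """Weighted overall F: sum_i |C_i*|/N * max_j F(i, j)."""
    N = M.sum()
    recall = M / M.sum(axis=1, keepdims=True)        # |C_j ∩ C_i*| / |C_i*|
    precision = M / M.sum(axis=0, keepdims=True)     # |C_j ∩ C_i*| / |C_j|
    with np.errstate(divide="ignore", invalid="ignore"):
        F = np.where(precision + recall > 0,
                     2 * precision * recall / (precision + recall), 0.0)
    return float((M.sum(axis=1) / N * F.max(axis=1)).sum())

def purity(M):
    """Weighted purity: each cluster's dominant class count, over N."""
    return float(M.max(axis=0).sum() / M.sum())

def entropy(M):
    """Weighted entropy with p_ij = share of class i inside cluster j."""
    N, k = M.sum(), M.shape[1]
    p = M / M.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        h = np.where(p > 0, p * np.log(p), 0.0).sum(axis=0)
    return float(-(M.sum(axis=0) / N / np.log(k) * h).sum())
```

A perfect clustering gives F-Measure and Purity of 1 and Entropy of 0, which matches the direction of the optimization stated above.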


Fig 1: Variation of F-Measure Scores with Annotations(Text) in K-Means based heterogeneous transfer learning clustering

Fig 4: Variation of F-Measure Scores with Annotations(Text) in Affinity clustering based heterogeneous transfer learning

Fig 2: Variation of Purity Scores with Annotations(Text) in K-Means based heterogeneous transfer learning clustering

Fig 5: Variation of Purity Scores with Annotations(Text) in Affinity clustering based heterogeneous transfer learning

Fig 3: Variation of Entropy Scores with Annotations (Text) in K-Means based heterogeneous transfer learning clustering

Fig 6: Variation of Entropy Scores with Annotations(Text) in Affinity clustering based heterogeneous transfer learning


| No. of Annotations | No. of Images | F-Measure, AP based on HTL | F-Measure, K-Means based on HTL |
|---|---|---|---|
| 0 | 100 | 0.30711 | 0.26873 |
| 100 | 100 | 0.43563 | 0.35208 |
| 0 | 300 | 0.25227 | 0.24254 |
| 100 | 300 | 0.42308 | 0.35364 |
| 300 | 300 | 0.24565 | 0.10109 |
| 0 | 500 | 0.18273 | 0.18823 |
| 100 | 500 | 0.41944 | 0.30234 |
| 300 | 500 | 0.28443 | 0.19764 |
| 500 | 500 | 0.19175 | 0.18912 |
| 0 | 800 | 0.18928 | 0.18064 |
| 100 | 800 | 0.40586 | 0.32492 |
| 300 | 800 | 0.35365 | 0.26969 |
| 500 | 800 | 0.16184 | 0.12764 |

Table 1: Comparison of F-Measure Scores

| No. of Annotations | No. of Images | Purity, AP based on HTL | Purity, K-Means based on HTL |
|---|---|---|---|
| 0 | 100 | 0.3700 | 0.2900 |
| 100 | 100 | 0.4800 | 0.3600 |
| 0 | 300 | 0.2800 | 0.2000 |
| 100 | 300 | 0.3907 | 0.2966 |
| 300 | 300 | 0.2700 | 0.2015 |
| 0 | 500 | 0.1980 | 0.1680 |
| 100 | 500 | 0.3362 | 0.2480 |
| 300 | 500 | 0.2000 | 0.1175 |
| 500 | 500 | 0.1287 | 0.1060 |
| 0 | 800 | 0.1900 | 0.1062 |
| 100 | 800 | 0.2875 | 0.2537 |
| 300 | 800 | 0.2025 | 0.1200 |
| 500 | 800 | 0.1912 | 0.1175 |

Table 2: Comparison of Purity Scores

| No. of Annotations | No. of Images | Entropy, AP based on HTL | Entropy, K-Means based on HTL |
|---|---|---|---|
| 0 | 100 | 0.75162 | 0.85679 |
| 100 | 100 | 0.60888 | 0.70140 |
| 0 | 300 | 0.80327 | 0.89764 |
| 100 | 300 | 0.68969 | 0.79095 |
| 300 | 300 | 0.78225 | 0.80882 |
| 0 | 500 | 0.80658 | 0.93903 |
| 100 | 500 | 0.69742 | 0.83917 |
| 300 | 500 | 0.77842 | 0.88506 |
| 500 | 500 | 0.79886 | 0.93907 |
| 0 | 800 | 0.87716 | 0.95362 |
| 100 | 800 | 0.74227 | 0.78226 |
| 300 | 800 | 0.78725 | 0.88942 |
| 500 | 800 | 0.86091 | 0.97506 |

Table 3: Comparison of Entropy Scores

Fig 7: Comparison of F-Measure Scores with Annotations(Text) in K-Means clustering based HTL and AP Based on HTL(Number of Images in Data sets 800)


Fig 8: Comparison of Purity Scores with Annotations(Text) in K-Means clustering based HTL and AP Based on HTL(Number of Images in Data sets 800)


Fig 9: Comparison of Entropy Scores with Annotations(Text) in K-Means clustering based HTL and AP Based on HTL(Number of Images in Data sets 800)


**5 CONCLUDING REMARKS AND FUTURE DIRECTIONS**

In this paper two algorithms for clustering, K-Means Clustering based on HTL and Affinity Clustering based on HTL, have been proposed. The clustering accuracy of K-Means based on HTL is better than that of plain K-Means, while Affinity Clustering based on HTL gives far better clustering accuracy than simple Affinity Propagation clustering. It is also concluded that the clustering accuracy of Affinity clustering based on HTL is much better than that of K-Means based on HTL. Extensive experiments on several data sets show that the proposed Affinity clustering based on HTL produces better clustering accuracy with less computational complexity. There are a number of interesting avenues for future research: Affinity Clustering based on HTL can be made hierarchical, the results of FAPML can be improved by redesigning it on the basis of HTL, and both algorithms can be applied to information retrieval.

Shailendra Kumar Shrivastava, B.E. (C.T.), M.E. (CSE), is an Associate Professor in the Department of Information Technology, Samrat Ashok Technological Institute, Vidisha. He has more than 23 years of teaching experience and has published more than 50 research papers in national/international conferences and journals. His areas of interest are machine learning and data mining. He is a Ph.D. scholar at R.G.P.V. Bhopal.

Dr. J. L. Rana, B.E., M.E. (CSE), Ph.D. (CSE), was formerly Head of the Department of Computer Science and Engineering, M.A.N.I.T., Bhopal, M.P., India. He has more than 40 years of teaching experience. His areas of interest include data mining, image processing and ad-hoc networks. He has many publications in international journals and conferences.

Dr. R. C. Jain, Ph.D., is the Director of Samrat Ashok Technological Institute, Vidisha, M.P., India. He has more than 35 years of teaching experience. His research interests include data mining, computer graphics and image processing. He has published more than 250 research papers in international journals and conferences.

REFERENCES

[1] Tom M. Mitchell, "Machine Learning", McGraw-Hill, 1997, pp. 1-414.
[2] Ethem Alpaydin, "Introduction to Machine Learning", Prentice Hall of India Private Limited, New Delhi, 2006, pp. 133-150.
[3] Sinno Jialin Pan and Qiang Yang, "A Survey on Transfer Learning", IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 10, October 2010, pp. 1345-1359.
[4] X. Ling, G.-R. Xue, W. Dai, Y. Jiang, Q. Yang, and Y. Yu, "Can Chinese Web Pages be Classified with English Data Source?", Proceedings of the 17th International Conference on World Wide Web, Beijing, China, ACM, April 2008, pp. 969-978.
[5] Qiang Yang, Yuqiang Chen, Gui-Rong Xue, Wenyuan Dai, and Yong Yu, "Heterogeneous Transfer Learning for Image Clustering via the Social Web", ACL-IJCNLP 2009, pp. 1-9.
[6] Rui Xu and Donald C. Wunsch, "Clustering", IEEE Press, 2009, pp. 1-282.
[7] A. Jain and R. Dubes, "Algorithms for Clustering Data", Englewood Cliffs, NJ: Prentice Hall, 1988.
[8] A. K. Jain, M. N. Murty and P. J. Flynn, "Data Clustering: A Review", ACM Computing Surveys, Vol. 31, No. 3, September 1999, pp. 264-322.
[9] Rui Xu and Donald Wunsch, "Survey of Clustering Algorithms", IEEE Transactions on Neural Networks, Vol. 16, No. 3, 2005, pp. 645-678.
[10] B. J. Frey and D. Dueck, "Clustering by Passing Messages Between Data Points", Science, 2007, pp. 972-976.
[11] Kaijun Wang, Junying Zhang, Dan Li, Xinna Zhang and Tao Guo, "Adaptive Affinity Propagation Clustering", Acta Automatica Sinica, 2007, pp. 1242-1246.
[12] Inmar E. Givoni and Brendan J. Frey, "A Binary Variable Model for Affinity Propagation", Neural Computation, Vol. 21, Issue 6, June 2009, pp. 1589-1600.
[13] G. Salton, A. Wong, and C. S. Yang, "A Vector Space Model for Automatic Indexing", Communications of the ACM, Vol. 18, No. 11, 1975, pp. 613-620.
[14] Caltech-256 image data set, http://www.vision.caltech.edu/Image_Datasets/Caltech256/
[15] H. Chim and X. Deng, "Efficient Phrase Based Document Similarity for Clustering", IEEE Transactions on Knowledge and Data Engineering, Vol. 20, No. 9, 2008.

