(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 4, July 2010

A Robust Knowledge-Guided Fusion of Clustering Ensembles
Anandhi R J
Research Scholar, Dept of CSE, Dr MGR University, Chennai, India rjanandhi@hotmail.com

Dr Natarajan Subramaniyan
Professor, Dept of ISE, PES Institute of Technology, Bangalore, India snatarajan44@gmail.com

commercial, industrial, administrative and other applications, it is necessary and interesting to examine how to extract knowledge automatically from huge amounts of data. Very large data sets present a challenge for both humans and machine learning algorithms. Machine learning algorithms can be inundated by the flood of data and become very slow in knowledge extraction. Moreover, along with the large amount of data available, there is also a compelling need for producing results accurately and fast. Efficiency and scalability are, indeed, the key issues when designing data mining systems for very large data sets. Through the extraction of knowledge in databases, large databases will serve as a rich, reliable source for knowledge generation and verification, and the discovered knowledge can be applied to information management, query processing, decision-making, process control and many other applications. Therefore, data mining has been considered one of the most important topics in databases by many database researchers.

Spatial data describes information related to the space occupied by objects. It consists of 2D or 3D points, polygons etc., or points in some d-dimensional feature space. It can be either discrete or continuous. Discrete spatial data might be a single point in multi-dimensional space, while continuous spatial data spans a region of space. This data might consist of medical images or map regions, and it can be managed through spatial databases [8].

Clustering [17] groups analogous elements of a data set according to their similarity, so that elements within a cluster are similar while elements from different clusters are dissimilar. It does not require class label information about the data set because it is inherently a data-driven approach.
Clustering is therefore the most interesting and well-developed method for manipulating and cleaning spatial data in preparation for spatial data mining analysis, and it has been recognized as a primary data mining method for knowledge discovery in spatial databases [4-7]. Clustering fusion is the integration of results from various clustering algorithms using a consensus function to yield stable results. Clustering fusion approaches are receiving increasing attention for their capability of improving clustering performance. At present, the usual operational mechanism for

Abstract- Discovering interesting, implicit knowledge and general relationships in geographic information databases is essential for understanding and using spatial data. Spatial clustering has been recognized as a primary data mining method for knowledge discovery in spatial databases. In this paper, we show that a guided approach to combining the outputs of the various clusterers reduces the intensive computation and also yields robust clusters. We describe our proposed layered cluster merging technique for spatial datasets and use it in our three-phase clustering combination technique. At the first level, m heterogeneous ensembles are run against the same spatial data set to generate results B1…Bm. The major challenge in the fusion of ensembles is the generation of the voting matrix or proximity matrix, which is of the order of n², where n is the number of data points. This is very expensive in both time and space for spatial datasets. Instead, in our method, we compute a symmetric clusterer compatibility matrix of order (m x m), where m is the number of clusterers and m << n, using the cumulative similarity between the clusters of the clusterers. This matrix is used to identify which two clusterers, if considered for fusion first, will provide the most information gain. As we travel down the layered merge, for every layer we calculate a factor called the Degree of Agreement (DOA), based on the agreed clusterers. Using the updated DOA at every layer, the movement of unresolved, unsettled data elements is handled at a much reduced computational cost. In addition, we prune the datasets after every (m-1)/2 layers, using the knowledge gained in the previous layer. This helps in faster convergence compared to the existing cluster aggregation techniques. The correctness and efficiency of the proposed cluster ensemble algorithm are demonstrated on real-world datasets available in the UCI data repository.

Keywords- Clustering ensembles, Spatial Data mining, Degree of Agreement, Cluster Compatibility matrix.

I. INTRODUCTION

With a variety of applications, large amounts of spatial and related non-spatial data are collected and stored in Geographic Information Databases. Spatial data mining [1] (i.e., discovering interesting, implicit knowledge and general relationships in large spatial databases) is an important task for understanding the use of these spatial data. With the rapid growth in size and number of available databases in


http://sites.google.com/site/ijcsis/ ISSN 1947-5500


clustering fusion is the combining of clusterer outputs. One tool for such combining or consolidation of results from a portfolio of individual clustering results is a cluster ensemble [13]. It has been shown to be useful in a variety of contexts such as “Quality and Robustness” [3], “Knowledge Reuse” [13,14], and “Distributed Computing” [9]. The rest of the paper is organized as follows. Related work is reviewed in Section II. The proposed knowledge guided fusion ensemble technique is presented in Section III. In Section IV, we present the experimental test platform and results with discussion. Finally, we conclude in Section V with a summary and our planned future work in this area of research.

II. RELATED WORK

measure used in such algorithms is the Euclidean distance. Recently, a new set of spatial clustering algorithms has been proposed, which represent faster methods for finding clusters with overlapping densities. DBSCAN, GDBSCAN and DBRS are density-based spatial clustering algorithms, but each performs best only on particular types of datasets [17]. Moreover, these algorithms ignore the participation of non-spatial attributes and require user-defined parameters. For large-scale spatial databases, the current density-based clustering algorithms can be expensive, as they require a large volume of memory because they operate over the entire database. Another disadvantage is that the input parameters required by these algorithms are based on experimental evaluations. There is considerable interest in automating the general-purpose clustering approach so that it needs no user intervention. However, it is difficult to expect accurate results from any one of these algorithms, as each has its own shortfalls.

B. Literature on Clustering Ensembles

A clustering ensemble is a method for combining several runs of different clustering algorithms to get an optimal partition of the original dataset. Given a dataset X = {x1, x2, ..., xn}, a cluster ensemble is a set of clustering solutions, represented as P = {P1, P2, ..., Pr}, where r is the ensemble size, i.e. the number of clusterings in the ensemble. The clustering-ensemble approach first obtains the results of M clusterers, then sets up a common understanding (consensus) function to fuse the label vectors and produce the final label vector. The goal of a cluster ensemble is to combine the clustering results of multiple clustering algorithms to obtain better-quality and more robust clustering results. Even though many clustering algorithms have been developed, not much work has been done on cluster ensembles in the data mining and machine learning community.
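To make the notation concrete, the following minimal Python sketch (an illustration, not the authors' implementation) represents an ensemble P of r label vectors over n objects and builds the pairwise co-association (voting) matrix that conventional fusion methods rely on. The function name `co_association` and the toy labelings are assumptions introduced for this example. The matrix has n x n entries, which is the cost that the guided approach of this paper aims to avoid.

```python
from typing import List


def co_association(ensemble: List[List[int]]) -> List[List[float]]:
    """Fraction of clusterings in which objects i and j share a cluster."""
    r = len(ensemble)        # ensemble size
    n = len(ensemble[0])     # number of objects
    counts = [[0] * n for _ in range(n)]
    for labels in ensemble:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalise the vote counts by the number of clusterings.
    return [[c / r for c in row] for row in counts]


# Hypothetical ensemble: r = 3 clusterings of n = 5 objects.
P = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
M = co_association(P)
```

Note that label values are arbitrary per clustering (the second vector uses 1 where the first uses 0), which is why the matrix compares co-membership rather than raw labels.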
Strehl and Ghosh [13,14] proposed a hypergraph-partitioning approach to combine different clustering results by treating each cluster in an individual clustering algorithm as a hyperedge. All three of their algorithms approach the problem by first transforming the set of clusterings into a hypergraph representation. The Cluster-based Similarity Partitioning Algorithm (CSPA) uses the relationship between objects in the same cluster to establish a measure of pairwise similarity. In the HyperGraph Partitioning Algorithm (HGPA), the maximum mutual information objective is approximated with a constrained minimum cut objective. In their Meta-CLustering Algorithm (MCLA), the objective of integration is viewed as a cluster correspondence problem. Kai Kang, Hua-Xiang Zhang and Ying Fan [6] formulated the process of cooperation between component clusterers and proposed a novel cluster ensemble learning technique based on dynamic cooperation (DCEA). The approach is mainly concerned with how the component clusterers fully cooperate during their training. Fred and Jain [2] used a co-association matrix to form the final partition, applying hierarchical (single-link) clustering to the co-association matrix. Zeng, Tang, Garcia-Frias and Gao [18] proposed an adaptive meta-clustering approach for

A. Literature on Clustering Algorithms

Many clustering algorithms have been developed, and they can be roughly classified into hierarchical and non-hierarchical approaches. Non-hierarchical approaches can be further divided into four categories: partitioning methods, density-based methods, grid-based methods, and model-based methods. Hierarchical algorithms can be further divided into agglomerative and divisive algorithms, corresponding to bottom-up and top-down strategies for building a hierarchical clustering tree. Spatial data mining, or knowledge discovery in spatial databases, refers to the extraction from spatial databases of implicit knowledge, spatial relations, or other patterns that are not explicitly stored [8, 10]. The large size and high dimensionality of spatial data make the complex patterns that lurk in the data hard to find. It is expected that the coming years will witness a very large number of objects that are location-enabled to varying degrees. Spatial clustering [8] has been used as an important process in areas such as geographic analysis, exploring data from sensor networks, traffic control, and environmental studies. Spatial data clustering has been identified as an important technique for many applications, and several techniques have been proposed over the past decade based on density-based strategies, random walks, grid-based strategies, and brute-force exhaustive search methods [5]. This paper deals with the fusion of spatial cluster ensembles using a guided approach to reduce the space complexity of such fusion algorithms.

Spatial data is about instances located in a physical space. Spatial clustering aims to group similar objects into the same group considering the spatial attributes of the objects. The existing spatial clustering algorithms in the literature focus exclusively either on the spatial distances or on minimizing the distance between object attribute pairs; i.e., the locations are treated as just another attribute, or the non-spatial attribute distances are ignored. Much activity in spatial clustering focuses on clustering objects based on their nearness in location to each other [5]. Finding clusters in spatial data is an active research area, and current non-spatial clustering algorithms are applied to the spatial domain, with recent applications and results reported on the effectiveness and scalability of the algorithms [8, 16]. Partitioning algorithms are best suited to problems where minimization of a distance function is required, and a common


combining different clustering results by using a distance matrix.

C. Fusion Framework

We begin our discussion of the guided ensemble fusion framework by presenting our notation. Consider a set of n data objects, D = {v1, ..., vn}. A clustering C of the dataset D is a partition of D into k disjoint sets C1, ..., Ck. In the sequel we consider m clusterings; we write Bi to denote the ith clustering and ki for the number of clusters of Bi. In the clustering fusion problem, the task is to find a clustering that agrees as much as possible with a number of already-existing clusterings [4].

D. Definitions

• Fusion Joint set, FJij: the set of matching pairs for the jth cluster of the ith clusterer. For instance, FJ12 refers to the probable fusion spots between the first clusterer’s clusters and the second clusterer’s clusters. It is used to decide where fusion is most likely to yield the most precise clusters.
• Clusterer Compatibility Matrix, CCM (m x m): an m x m symmetric matrix, where m is the total number of clusterers considered for fusion.
• CCM[i][j]: an integer value d representing the maximum information gained, computed as the sum of the cardinalities of the intersections of the matching cluster pairs found in the Fusion Joint set FJ[i][j].
• Degree of Agreement factor (DOA): the ratio of the index of the merging level to the total number of clusterers. The DOA value is accumulated until it reaches the threshold level DOATh, a user-assigned value indicating the majority required for decision making. Under normal scenarios, DOATh is set to 50% of the number of clusterers.
• Degree of Shadow factor (DOS): the maximized value of the intersection of the two minimum bounding circles of the k clusters with the ith cluster from a different clustering.

III. KNOWLEDGE GUIDED ENSEMBLE FUSION

vertically into n subgroups and used them as the input to our ensemble algorithm. Either way, the individual partitions in each ensemble are generated sequentially.

A. Selection of clusterings for prime fusion

Any layered approach has the drawback of being dependent on which clusterer is considered for the initial fusion. This sensitivity is a major bottleneck in determining the accuracy of the outputs. In our approach, we compute an m x m symmetric clusterer compatibility matrix, where CCM[i][j] summarizes the information gain when the ith and jth clusterers are merged. In this way we use heuristics to direct the fusion in the right direction.

B. Resolution for Label Correspondence Problem

The other issues in the fusion of cluster ensembles are the label correspondence problem and the merging technique used for fusion. In the second phase, we address the label correspondence problem. The clustering results are combined in layered pairs, called fusion joint sets, FJmk. The merging criterion can be either of the Fusion Joint Identification Techniques, i.e., overshadowing, or use of the highest cardinality in the intersection set along with the add-on knowledge gathered from such association. The first approach uses the degree of shadow that one cluster casts on another. This is computed using the smallest-circle (minimum covering circle) approach, the mathematical problem of computing the smallest circle that contains all of a given set of points in the Euclidean plane. For each cluster of the clusterers in two layers, we first compute the minimum bounding circle and its diameter, from which the Degree of Shadow (DOS) is computed. The aim is to find the clusters in different layers whose shadow overlap is maximal and then assign them to the matching pair set. This method finds the most appropriate clusters belonging to two clusterings for the forthcoming fusion phase.
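The shadow-overlap idea can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the authors' procedure: instead of the true minimum covering circle, each cluster is enclosed by a circle centred at its centroid whose radius reaches the farthest member (an enclosing circle that is generally larger than the minimum one), and the overlap is normalised by the area of the smaller circle. The names `bounding_circle`, `overlap_area`, `degree_of_shadow` and `shadow_joints` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (centre x, centre y, radius)


def bounding_circle(points: List[Point]) -> Circle:
    """Enclosing circle centred at the centroid (ASSUMPTION: not the minimum circle)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    radius = max(math.hypot(x - cx, y - cy) for x, y in points)
    return (cx, cy, radius)


def overlap_area(a: Circle, b: Circle) -> float:
    """Area of the intersection of two circles."""
    d = math.hypot(a[0] - b[0], a[1] - b[1])
    r1, r2 = a[2], b[2]
    if d >= r1 + r2:                      # disjoint or merely touching
        return 0.0
    if d <= abs(r1 - r2):                 # one circle lies inside the other
        return math.pi * min(r1, r2) ** 2
    # Partial overlap: sum of the two circular segments (lens area).
    cos_a = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    cos_b = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    alpha, beta = math.acos(cos_a), math.acos(cos_b)
    return (r1 * r1 * (alpha - math.sin(2 * alpha) / 2)
            + r2 * r2 * (beta - math.sin(2 * beta) / 2))


def degree_of_shadow(cluster_a: List[Point], cluster_b: List[Point]) -> float:
    """Overlap of the two enclosing circles, relative to the smaller circle."""
    ca, cb = bounding_circle(cluster_a), bounding_circle(cluster_b)
    smaller = math.pi * min(ca[2], cb[2]) ** 2
    return overlap_area(ca, cb) / smaller if smaller > 0 else 0.0


def shadow_joints(layer_a: List[List[Point]],
                  layer_b: List[List[Point]]) -> List[Tuple[int, int]]:
    """Pair each cluster of layer_a with the layer_b cluster it overlaps most."""
    joints = []
    for i, cluster in enumerate(layer_a):
        scores = [degree_of_shadow(cluster, other) for other in layer_b]
        joints.append((i, max(range(len(layer_b)), key=scores.__getitem__)))
    return joints
```

A true minimum covering circle (for example via Welzl's algorithm) could replace `bounding_circle` without changing the rest of the sketch.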
The second approach uses a heuristic greedy computation of mutual information to decide on the degree of compatibility. Mutual information is used when we need to decide which among several candidate variables is closest to a particular variable. The higher the mutual information, the 'closer' the two variables are. It is the amount of information 'contained' in Y about X. Let X and Y be the random variables described by the cluster labelings λ(a) and λ(b), with k(a) and k(b) groups respectively. Let I(X; Y) denote the mutual information between X and Y, and H(X) denote the entropy of X, i.e., a measure of the uncertainty associated with X. The chain rule for entropy states that

H(X_{1:n}) = H(X_1) + H(X_2 | X_1) + ... + H(X_n | X_{1:n−1})    (1)
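As an illustration of the quantities involved, the Python sketch below estimates H(X), H(Y) and I(X; Y) for two cluster labelings from their empirical label frequencies, using the standard identity I(X; Y) = H(X) + H(Y) − H(X, Y). The helper names and the toy labelings are assumptions made for this example only; the paper does not specify an estimator.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired labels."""
    joint = list(zip(a, b))  # one (label_a, label_b) pair per object
    return entropy(a) + entropy(b) - entropy(joint)


# Hypothetical labelings of four objects.
lam_a = [0, 0, 1, 1]
lam_b = [1, 1, 0, 0]   # same partition as lam_a with the labels renamed
lam_c = [0, 1, 0, 1]   # a partition that is independent of lam_a

mi_same = mutual_information(lam_a, lam_b)
mi_indep = mutual_information(lam_a, lam_c)
```

Because the measure depends only on co-occurrence counts, renaming the labels of one clustering leaves it unchanged, which is what makes it usable before the label correspondence problem has been resolved.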

In this section we discuss our proposed layered cluster ensemble fusion, guided by the knowledge gained during the merging process. The first phase of the algorithm is the preparation of the B heterogeneous ensembles. This is done by executing different clustering algorithms against the same spatial data set to generate partitioning results. For our experiments, we have also generated homogeneous ensembles by partitioning the spatial data horizontally/

When X_1, ..., X_n are independent and identically distributed (i.i.d.), H(X_{1:n}) = n H(X_1). From Eqn. (1), we have

H(X, Y) = H(X) + H(Y | X) = H(Y) + H(X | Y)    (2)

and therefore H(X) − H(X | Y) = H(Y) − H(Y | X). This difference can be interpreted as the reduction in uncertainty about X after we know Y, or vice versa. It is thus known as the information gain or, more commonly, the mutual information I(X; Y) between X and Y. Finding the maximum MI for the clusters in the clusterings is a combinatorial optimization problem, requiring an exhaustive search through all possible clusterings with k labels. This is formidable, since for n objects and k partitions there are approximately k^n/k! possible clusterings when n >> k. For example, there are 171,798,901 ways to form four groups of 16 objects (the exact count, given by the Stirling number of the second kind S(16, 4); the approximation 4^16/4! gives about 1.79 × 10^8). Hence, instead of the complete exhaustive search, we have incorporated some heuristics so that cluster accuracy improves alongside the savings in space and computation. As we travel the length and breadth of the ensemble space, we try to reduce the k^n/k! combinations by reusing the cumulative information gain. This way, when n >> k, as in most cases of spatial data, the solutions can be reached much faster.

C. Knowledge Guided fusion of clusterings – An Excerpt
Input: D – the input data in 2-dimensional feature space
Layer: group of clusterers B1 to Bm; Levels: list of clusters k1 to kn
CCM[i][j]: clusterer compatibility of the ith clusterer with the jth clusterer
Step 1: Form clusters B1k1 to Bikn from D using clusterers B1 to Bm, each clusterer generating k clusters.
Step 2: Compute the clusterer compatibility matrix, whose entries are the aggregated cardinality values of the intersecting element sets of the clusterers.
Step 3: Identify and select the harmonizing clusterers for fusion from the CCM matrix as TobeMerged_Layers.
Step 4: Set DOA_Increment Factor as 1/m.
Step 5: Find fusion joints for TobeMerged_Layers, (FJ12 1 .. FJ1 k), using the degree of shadow overlap or by maximizing the information gain of the probable merge.
Step 6: For every pair in the fusion joint set FJi k, Do {
    ClustData[i] ← union of the data points of the pair
    Initialize Vector_DOA with DOA_Increment Factor
    Append it to Vector_CData[i]
    For each element in the intersection set between the pair,
        DOA[i] ← DOA[i] + DOA_Increment Factor
    Increment the vector index i by 1 and merge_layer by 2
} until i <= k; // normal merge for m/2 layers
Step 7: Repeat steps 2 to 6 till merge_layer < m/2; // finalize the cluster elements at layer i and at level k
do {
    If (Vector_DOA > DOATh)
        Strong links ← corresponding elements of Vector_CData
    Else
        Weak links k ← corresponding elements of Vector_CData
} until all pairs at layer i are resolved
Step 8: Using the strong links, finalize Final_Kluster k and continue gathering votes for the weak links.
Step 9: From the (m/2 + 1)th layer, perform the pruned merging, where the strong links are pruned for the confirmed data points when they reappear. Data points in the weak links could be noise data points (Noise_Elements k), as their inherent votes were below the threshold value.
Step 10: Return the robust clusters obtained from the m clusterers, Final_Kluster k and Noise_Elements k.
Figure 3.3. Excerpt of the guided fusion of ensembles
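The steps of the excerpt can be sketched in Python as follows. This is a simplified reading under several assumptions, not the authors' implementation: fusion joints are found with the highest-cardinality intersection rule, the compatibility matrix is filled for i < j and mirrored to keep it symmetric, the remaining clusterers are merged one at a time rather than in paired layers, and the pruning of Step 9 is omitted. All function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Clustering = List[Set[int]]  # one set of point indices per cluster


def to_clusters(labels: List[int]) -> Clustering:
    """Group point indices by label, in order of first appearance."""
    groups: Dict[int, Set[int]] = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(idx)
    return list(groups.values())


def fusion_joints(a: Clustering, b: Clustering) -> List[Tuple[int, int, int]]:
    """For each cluster of a: (its index, best-matching cluster of b, overlap size)."""
    joints = []
    for i, ca in enumerate(a):
        j = max(range(len(b)), key=lambda t: len(ca & b[t]))
        joints.append((i, j, len(ca & b[j])))
    return joints


def compatibility_matrix(ens: List[Clustering]) -> List[List[int]]:
    """CCM[i][j]: summed overlap cardinality of the matched cluster pairs."""
    m = len(ens)
    ccm = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            score = sum(size for _, _, size in fusion_joints(ens[i], ens[j]))
            ccm[i][j] = ccm[j][i] = score  # mirrored (ASSUMPTION) for symmetry
    return ccm


def guided_fusion(label_vectors: List[List[int]], doa_threshold: float = 0.5):
    """Return (strong-link clusters, weak-link points, CCM)."""
    ens = [to_clusters(lv) for lv in label_vectors]
    m, n = len(ens), len(label_vectors[0])
    ccm = compatibility_matrix(ens)
    # Start the fusion from the most compatible pair of clusterers.
    first, second = max(((i, j) for i in range(m) for j in range(i + 1, m)),
                        key=lambda p: ccm[p[0]][p[1]])
    base = ens[first]
    step = 1.0 / m                 # DOA increment factor
    doa = [step] * n               # one vote from the base clusterer itself
    for t in [second] + [t for t in range(m) if t not in (first, second)]:
        for i, j, _ in fusion_joints(base, ens[t]):
            for p in base[i] & ens[t][j]:
                doa[p] += step     # clusterer t agrees on point p
    strong: Clustering = [set() for _ in base]
    weak: Set[int] = set()
    for i, cluster in enumerate(base):
        for p in cluster:
            if doa[p] > doa_threshold:
                strong[i].add(p)
            else:
                weak.add(p)
    return strong, weak, ccm
```

With m = 5 clusterers the increment is 0.2, so a point needs agreement from at least three clusterers (the base plus two others) before its DOA exceeds the 0.5 threshold. Only per-point DOA values and the m x m matrix are stored; no n x n voting matrix is built.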

The initial phase of the fusion starts with finding the Bm clusterers, using m different clustering algorithms. The next stage is to find the clusterers amongst Bm with the maximum compatibility matrix index for merging, so that they yield maximum knowledge for further fusions. When the merging happens, based on the Fusion Joint Set, the degree of agreement (DOA) is calculated for each merged data point. For example, if the total number of clusterers is 5, then all the data points that get merged at level 1 will have a DOA of 1/5 = 0.2. This DOA value is treated as the increment factor for every future fusion. The DOA value in the corresponding DOA vector is accumulated until it reaches the threshold level DOATh. Once the DOA of any point in the cluster crosses the threshold, the point can be affirmed to belong to a particular cluster and is treated as a strong link. Thus, the normal voting procedure with a huge voting matrix to confirm the majority does not arise at all in our method. The final layer merge with the earlier combined clusters yields the robust combined result. This approach is not computationally intensive, as we use the first law of geography in merging layer by layer, and the computation of the voting matrix is avoided. The three levels of the technique (fusion joint identification, guided fusion and resolving low-voted data points) are executed sequentially. They do not interfere with each other; each simply receives the results of the previous level. No feedback process happens, and the algorithm terminates after the completion of all procedures.

IV. EXPERIMENTAL PLATFORM & RESULTS

A. The Test Platform

Ensemble Creation: To simplify the analysis, the paper uses five representative clustering methods to produce five heterogeneous ensembles or clustering members, viz. DBSCAN, k-means, PAM, Fuzzy C-Means and Ward’s algorithm. K-means is a very simple yet powerful and efficient iterative technique for partitioning a large data set into k disjoint clusters. The objective criterion used in the algorithm is typically the squared-error function. The DBSCAN method performs well with attribute data and with spatial data. Partitioning around medoids (PAM) is mostly preferred for its scalability and is hence useful for spatial data. A more recent development in clustering is the use of fuzziness, and we have added Fuzzy C-Means (FCM) as one of the clusterers so that we get a robust partition in the end result. These clustering techniques, along with different cluster sizes, form the input for our knowledge guided fusion technique.

Data Source: Most ensemble methods use sampling techniques to select the data for their experimental platform, but this heuristic results in losing some inherent data clusters, thereby reducing the quality of the clusters. We have tried to avoid sampling and involve the whole dataset. For our experiments we have used the datasets available in the data repository of the University of California, Irvine.

Metrics: We used the classification accuracy (CA) to measure the accuracy of an ensemble as the agreement between the


ensemble partition and the "true" partition. The classification accuracy is commonly used for evaluating clustering results. To guarantee the best re-labeling of the clusters, the proportion of correctly labeled objects in the data set is calculated as the CA for the partition. We have also measured the intra-cluster and inter-cluster density before and after applying our cluster ensemble approach, as a metric for the preciseness of the resulting cluster groups.

B. Validation of fusion Results

As clustering is a kind of learning without guidance, i.e., unsupervised classification, it is difficult to evaluate clustering efficiency. However, when class information is available for the data, it can be considered to express, to a certain degree, some inner distribution characteristics of the data. If such class information is not used by the clustering, it can be used to evaluate the clustering result. If the number of objects shared by a given cluster label and a given known category is the largest, that cluster label is taken to correspond to that known category. Thus many cluster labels might correspond to the same category, whereas one cluster label cannot correspond to many categories. The clustering results can thus be evaluated using the class information. The test results with the Iris dataset, Wine dataset, Half-rings and Spiral datasets (courtesy: UCI data repository) are promising and show better cluster accuracy. Two parameters were computed to verify our algorithm: the Cluster Correctness Factor (CCF) and the space complexity of the fusion of ensembles. The benchmark datasets mentioned above were tested with this technique, and the CCF was found to be 100% in all cases. Normally, ensemble algorithms compute a voting matrix, which is of the order of n², where n is the number of data points. With the knowledge guided fusion and its inherent voting scheme, the space complexity is reduced to the order of n. This has a major impact not only on memory requirements but also on the total number of matrix computations.

C. Comparison of the Experimental Results

In our knowledge guided fusion approach, we combine the results of several independent cluster runs by computing an inherent vote over their results. Our phased, knowledge guided voting helps us to avoid the computationally infeasible simultaneous combination of all partitions and also increases the cluster accuracy (Figure 4.3.1). With our scheme, we have shown that the consensus partition indeed converges to the true partition. With our technique, the inter-cluster density (Figure 4.3.3) is reduced by about 40% compared with the other clustering algorithms. For the benchmark Iris dataset it is around 11.47, whereas our cluster miner produces 6.77, implying that the latter has produced better clusters in terms of inter-cluster density. We have observed that the intra-cluster density (Figure 4.3.2) has increased, implying that the cluster quality has improved due to the guided approach used for ensemble fusion. For the standard benchmark Iris dataset, the intra-cluster density achieved using normal clustering methods is 5.1283, whereas our technique generated an intra-cluster density of 5.7125, implying that we have generated more precise clusters.
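The classification accuracy used in this section can be illustrated with a small Python sketch. The paper gives no formula, so the code follows the rule stated in the validation subsection as an assumption: each cluster label is mapped to the known category with which it shares the most objects, several cluster labels may map to the same category, and CA is the proportion of objects that match under that mapping. The function name and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def classification_accuracy(truth: Sequence[Hashable],
                            predicted: Sequence[Hashable]) -> float:
    """Share of objects whose cluster maps to their true category."""
    per_cluster: Dict[Hashable, Counter] = {}
    for category, cluster in zip(truth, predicted):
        per_cluster.setdefault(cluster, Counter())[category] += 1
    # Each cluster contributes the size of its majority category.
    correct = sum(max(counts.values()) for counts in per_cluster.values())
    return correct / len(truth)


# Hypothetical example: six objects, two true categories.
truth = [0, 0, 0, 1, 1, 1]
predicted = [1, 1, 0, 0, 0, 0]
ca = classification_accuracy(truth, predicted)   # 5 of 6 objects match
```

Because the mapping is many-to-one, this score can be inflated by producing many small clusters, so it is most meaningful when the number of clusters is fixed to the number of known categories.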

Figure 4.3.1 Comparison of error rates of the fused ensembles

Figure 4.3.2 Intra cluster density

Figure 4.3.3 Inter cluster density

With Iris dataset

V. CONCLUSION AND FUTURE WORK

In this paper we addressed the relabeling problem found in most cluster ensembles and resolved it without much computation, using a notion from the first law of geography. The cluster ensemble is a very general framework that enables a wide range of applications. We have applied the proposed guided cluster merging technique to spatial databases. The main issues in spatial databases are the cardinality of the data points and the increased number of dimensions. Most existing ensemble algorithms have to generate a voting matrix of at least order n², or an expensive graph representation whose number of vertices equals the number of data points. When n is very large, as is common in spatial datasets, this restriction is a major bottleneck in obtaining robust clusters with high accuracy in reasonable time. Our algorithm resolves the relabeling using layered merging guided by the gained information. Once elements move from strong links to final clusters, they do not participate in further computations. Hence, the computational cost is also greatly reduced. Use of the clusterer compatibility matrix gives us a good head start in the fusion process, which would otherwise be a matter of sheer randomness. The key goal of spatial data mining is to automate the knowledge discovery process. It is important to note that this study assumes that the user has a good knowledge of the data and of the hierarchies used in the mining process. The crucial input of deciding the value of k still affects the quality of the resultant clusters. Domain-specific a priori knowledge can be used as guidance for deciding the value of k. We feel that semi-supervised clustering using domain knowledge could improve the quality of the mined clusters. We have used heterogeneous clusterers for our testing, but the approach can be tested with further combinations of spatial clustering algorithms as base clusterers. This will help in exploring more natural clusters. First, we identified several non-spatial datasets which are normally used as benchmarks for data clustering. Then we tested how our layer-based methodology works with spatial data. This setup must be exercised with larger datasets available in GIS areas and with satellite images. We evaluated our work and can conclude that, for targeting a specific platform and incorporating the spatial feature space, our automated layered merge approach is able to provide the necessary correctness with greater efficiency both in space and in matrix computations. However, more work should be carried out to support more real-life data from satellites and incomplete data. Future work in the short term will focus on how to acquire such datasets and continue with more testing, in spite of current security concerns in distributing such data.

ACKNOWLEDGMENT

This work has been partly done in the labs of The Oxford College of Engineering, Bangalore, where the author is currently working as a Professor in the Department of Computer Science & Engineering. The authors would like to express their sincere gratitude to the Management and Principal of The Oxford College of Engineering for their support rendered during the testing of some of our modules. They also express their thanks to the University of California, Irvine, for the large data repository made available for testing our knowledge guided approach to the fusion of ensembles.

REFERENCES
[1] M. Ester, H. Kriegel, J. Sander, and X. Xu, "Clustering for Mining in Large Spatial Databases," Special Issue on Data Mining, KI-Journal, Tech Publishing, Vol. 1, 1998.
[2] A. L. N. Fred and A. K. Jain, "Data Clustering using Evidence Accumulation," in Proc. of the 16th International Conference on Pattern Recognition (ICPR 2002), Quebec City.
[3] A. L. N. Fred and A. K. Jain, "Robust data clustering," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), USA, 2003.
[4] V. Filkov and S. Skiena, "Integrating microarray data by consensus clustering," in International Conference on Tools with Artificial Intelligence, 2003.
[5] K. Koperski and J. Han, "Discovery of Spatial Rules in Geographic Information Databases," Proc. 4th Intl Symposium on Large Spatial Databases, pp. 47-66, 1995.
[6] Kai Kang, Hua-Xiang Zhang, and Ying Fan, "A Novel Clusterer Ensemble Algorithm Based on Dynamic Cooperation," IEEE 5th International Conf. on Fuzzy Systems and Knowledge Discovery, 2008.
[7] C. J. Matheus, P. K. Chan, and G. Piatetsky-Shapiro, "Systems for Knowledge Discovery in Databases," IEEE Transactions on Knowledge and Data Engineering, 5(6), pp. 903-913, 1993.
[8] R. T. Ng and J. Han, "Efficient and Effective Clustering Methods for Spatial Data Mining," Proc. 20th Int. Conf. on Very Large Data Bases, pp. 144-155, Santiago, Chile, 1994.
[9] B. H. Park and H. Kargupta, "Distributed Data Mining," in The Handbook of Data Mining, Ed. Nong Ye, Lawrence Erlbaum Associates, 2003.
[10] J. Roddick and B. G. Lees, "Paradigms for Spatial and Spatio-Temporal Data Mining," in Geographic Data Mining and Knowledge Discovery, Taylor & Francis, 2001.
[11] Su-lan Zhai, Bin Luo, and Yu-tang Guo, "Fuzzy Clustering Ensemble Based on Dual Boosting," Fourth International Conference on Fuzzy Systems and Knowledge Discovery, 2007.
[12] H. Samet, "Spatial Data Models and Query Processing," in Modern Database Systems: The Object Model, Interoperability, and Beyond, Addison Wesley / ACM Press, Reading, MA, 1994.
[13] A. Strehl and J. Ghosh, "Cluster ensembles - a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, 3: 583-618, 2002.
[14] A. Strehl and J. Ghosh, "Cluster ensembles - a knowledge reuse framework for combining partitionings," in Proc. of 11th National Conference on Artificial Intelligence, NCAI, Edmonton, Alberta, Canada, pp. 93-98, 2002.
[15] Y. Tao, J. Zhang, D. Papadias, and N. Mamoulis, "An Efficient Cost Model for Optimization of Nearest Neighbor Search in Low and Medium Dimensional Spaces," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 10, pp. 1169-1184, 2004.
[16] X. Wang and H. J. Hamilton, "Clustering Spatial Data in the Presence of Obstacles," International Journal on Artificial Intelligence Tools, vol. 14, no. 1-2, pp. 177-198, 2005.
[17] R. Xu and D. Wunsch II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645-678, 2005.
[18] J. Zhang, "Polygon-based Spatial Clustering and its Application in Watershed Study," MS Thesis, University of Nebraska-Lincoln, December 2004.
[19] Y. Zeng, J. Tang, J. Garcia-Frias, and G. R. Gao, "An Adaptive MetaClustering Approach: Combining the Information from Different Clustering Results," CSB2002 IEEE Computer Society Bioinformatics Conference Proceedings.

AUTHORS PROFILE

RJ Anandhi is a PhD student in the Department of Computer Science & Engineering at Dr M G R University. She is currently working as a professor in the Department of Computer Science at Oxford College of Engineering, Bangalore. She completed her BE degree at Bharatiyar University and her MTech degree at Pondicherry Central University. Her research interests are spatial data mining and ANT algorithms.

Dr. Natarajan initially worked at the Defence Research and Development Laboratory (DRDL) for five years in the area of software development for defence missions. He then worked for 28 years at the National Remote Sensing Agency (NRSA) in areas pertaining to DIP and GIS for several remote sensing missions such as IRS-1A, IRS-1B, IRS-1C, IKONOS and LANDSAT. As Project Manager of the Ground Control Point Library (GCPL) Project, he completed the task of computing cm-level accuracy for 3000 locations within India, which is being used for cartographic satellite missions. He was the Deputy Project Director of Large Scale Mapping (LSM) of the Department of Space. Dr. Natarajan has published about fifteen papers in national and international conferences and journals. His research interests are data mining, GIS and spatial databases.


http://sites.google.com/site/ijcsis/ ISSN 1947-5500

