International Journal of Computational Intelligence and Information Security, August 2011 Vol. 2, No. 8
A survey: Performance improving of K-mean by Genetic Algorithm
Amit Dubey (1), Prof. Anurag Jain (2) and Dr. A.K. Sachan (3)
(1) Computer Science Department, Radharaman Institute of Science & Technology, Bhopal
(2) Computer Science Department, Radharaman Institute of Science & Technology, Bhopal
(3) Computer Science Department, Radharaman Institute of Science & Technology, Bhopal
amit_23dubey@yahoo.co.in, Anurag.akjain@gmail.com, sachanak_12@yahoo.com
Abstract
This paper presents an initialization technique for K-means clustering in which the initial centroids are selected by a genetic algorithm. These centroids act as starting points for K-means. The paper is a survey of improved K-means methods that use genetic algorithms. Cluster compactness is measured with a within-cluster scatter criterion.
Keywords: K-Means, GAIK, genetic algorithm, IGA-FKKM, Entropy Weighting
I. Introduction
Clustering is the process of partitioning data into groups whose members have similar properties. It is widely used in many areas, including data mining, statistics, biology, and machine learning. The objects within a cluster are highly similar to one another but dissimilar to the objects in other clusters [1]. Similarity is assessed from the attribute values of the objects.
1.2 Types of clustering
1. Partition based: The partitioning method first creates an initial set of partitions. An iterative relocation technique is then used to improve the partitioning by moving objects from one group to another.
2. Hierarchical: A hierarchical method creates a hierarchical decomposition of the given set of data objects.
3. Density based: The density-based approach continues growing a given cluster as long as the density, i.e. the number of objects or data points in the neighborhood, exceeds some threshold.
4. Grid based: Grid-based methods quantize the object space into a finite number of cells that form a grid structure.
5. Model based: Model-based clustering hypothesizes a model for each of the clusters and finds the best fit of the data to the given model.
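The iterative relocation idea behind partition-based methods can be illustrated with a minimal K-means sketch. This is not the implementation surveyed in this paper: the function names `sq_dist` and `kmeans`, the random choice of initial centroids, and the use of the sum of squared distances to the assigned centroid as the within-cluster scatter are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed, max_iter=100):
    """Plain K-means with random initialization (illustrative sketch).

    Returns (centroids, assignment, scatter), where scatter is assumed
    to be the sum of squared distances of points to their centroid.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    assignment = None
    for _ in range(max_iter):
        # Relocation step: move each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no object changed group, so the partition is stable
        assignment = new_assignment
        # Update step: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    scatter = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))
    return centroids, assignment, scatter

# Hypothetical toy data: two visually separated groups in the plane.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Different seeds give different starting centroids; compare the results.
for seed in range(3):
    _, labels, scatter = kmeans(data, k=2, seed=seed)
    print(seed, labels, round(scatter, 3))
```

Running the loop with several seeds shows how the starting centroids are the only source of variation between runs, and a lower scatter value indicates a more compact partition under the assumed criterion.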
The K-means algorithm is a partition-based clustering method and one of the most popular methods used in data clustering because of its good computational performance [2]. However, it is well known that its result depends on the initialization, which is generally done by random selection, so different runs of K-means on the same input data may produce different results. To improve performance, a new initialization technique has been proposed.
Genetic algorithms (GAs) are based on the ideas of natural evolution. In general, a GA starts with an initial population, and then a new population is created based on the fitness values of the chromosomes. Fitness is the measure for how good is