Cluster analysis is an important human activity, whether in biology for deriving plant and animal taxonomies, in image pattern recognition, or in Earth-observation databases. It helps us group a large number of customers with similar characteristics. Clustering that partitions large data sets into groups is also called data segmentation. Clustering can further be used for outlier detection, for example in detecting credit card fraud and monitoring criminal activity in electronic commerce. Typical requirements for clustering in data mining are scalability, the ability to deal with different types of attributes, discovery of clusters of arbitrary shape, minimal reliance on domain knowledge to determine input parameters, the ability to deal with noise and outliers, insensitivity to the order of input records, and support for high dimensionality. Clustering methods can be classified as follows. Partitioning methods: each object belongs to exactly one group; given the number of partitions, the method creates an initial partitioning and then improves it by moving objects from one group to another. Partition-based algorithms include k-means, PAM, CLARA, CLARANS, clustering with genetic algorithms, and clustering with neural networks. Hierarchical methods: these construct a hierarchical relationship among the data in order to cluster it; they either start with one large cluster and split it into smaller clusters, or merge similar clusters into larger ones. Algorithms of this kind include BIRCH, CURE, ROCK, and Chameleon. Density-based methods: these work on the principle that data lying in a high-density region of the data space belong to the same cluster. Algorithms include DBSCAN, OPTICS, and mean-shift. Grid-based methods: the object space is divided into a grid. Algorithms include STING and CLIQUE. Model-based methods: these find good approximations of the model parameters that best fit the data.
Model-based clustering algorithms can be either partitional or hierarchical, depending on the structure. Finally, constraint-based clustering determines clusters that satisfy preferences or constraints specified by the user. Some of the most commonly used clustering algorithms are the following. k-means: it uses the centroid of a cluster to represent that cluster. The algorithm starts by choosing the number of clusters and generating random points as cluster centers. It then constructs the clusters by assigning each point to the nearest cluster center and recomputes the new cluster centers, repeating these steps until some convergence criterion is met. PAM, on the other hand, represents each cluster by a medoid, an object centrally located in the cluster. The objective is to minimize the average dissimilarity of objects to their closest selected object by exchanging selected objects with unselected ones. It is more costly than k-means, but at the same time it is less sensitive to outliers. CLARA is an extension of PAM for data sets containing a large number of objects; to reduce complexity, it does not find representative objects for the entire data set but instead draws a sample of the data set and applies PAM to it. CLARA draws multiple samples and returns the best clustering as the output. CLARANS presents a tradeoff between the cost and the effectiveness of using samples to obtain a clustering: unlike CLARA, it does not restrict its search to a particular subgraph. Clustering with genetic algorithms is a heuristic search technique inspired by Darwin's theory of evolution ("survival of the fittest") that performs a multi-directional search by maintaining a population of potential solutions and encouraging the formation and exchange of information between these directions.
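The k-means loop described above (assign each point to its nearest center, recompute the centers, repeat until convergence) can be sketched in a few lines of Python. This is an illustrative implementation, not taken from any particular library; for determinism the first k points serve as initial centers, whereas the text describes randomly generated ones.

```python
def kmeans(points, k, iters=100):
    """Minimal k-means sketch: assign points to nearest centers, then
    move each center to the mean of its cluster, until centers stop moving."""
    # deterministic initialization for illustration (the text uses random centers)
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        # assignment step: each point joins its nearest center's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # update step: move each center to the mean of its members
        new = [tuple(sum(x) / len(m) for x in zip(*m)) if m else centers[i]
               for i, m in enumerate(clusters)]
        if new == centers:  # convergence criterion: centers stopped moving
            return centers, clusters
        centers = new
    return centers, clusters

# two well-separated groups of three points each
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(pts, 2)
```

On this toy data the algorithm converges in a few iterations to one center near (1/3, 1/3) and one near (31/3, 31/3), with three points in each cluster.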
Clustering with neural networks (NN): a self-organizing map (SOM) is used, a grid of neurons that adapts to the topological shape of a dataset, allowing us to visualize large datasets and identify potential clusters. Such a network can recognize or characterize inputs it has never encountered before; this property is known as generalization capability. Balanced Iterative Reducing and Clustering Using Hierarchies (BIRCH) is designed for clustering large amounts of numeric data by combining hierarchical clustering with other clustering methods. This helps it overcome difficulties present in agglomerative clustering methods, such as poor scalability and the inability to undo what was done in a previous step. It uses the concept of a clustering feature to summarize a cluster, and a clustering feature tree (CF-tree) to represent a cluster hierarchy. From a clustering feature we can easily derive many useful statistics of a cluster, such as its centroid, radius, and diameter. The algorithm consists of two phases: in the first phase, BIRCH scans the database to build the CF-tree, and in the second phase it applies a clustering algorithm to cluster the leaf nodes of the CF-tree. Clustering Using Representatives (CURE) is similar to agglomerative algorithms: it begins with every single data object as a cluster, with the object itself as the sole representative of that cluster. At any given stage of the algorithm, we have a set of representative points for each cluster. The distance between two subclusters is the smallest pairwise distance between their representative points. Hierarchical clustering proceeds by merging the closest pair of subclusters; once two clusters are merged, a new set of representative points is computed for the merged cluster. The merging process continues until the pre-specified number of clusters is obtained.
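BIRCH's clustering feature is usually the triple CF = (N, LS, SS): the number of points, their linear sum, and their sum of squared values. The sketch below, assuming this standard definition, shows how the centroid and radius mentioned above fall out of these three numbers, and how two subclusters merge by simply adding their features (the property the CF-tree relies on).

```python
import math

def cf(points):
    """Build a clustering feature CF = (N, LS, SS) for a set of points."""
    n = len(points)
    dim = len(points[0])
    ls = [sum(p[d] for p in points) for d in range(dim)]  # linear sum per dimension
    ss = sum(x * x for p in points for x in p)            # sum of squared values
    return n, ls, ss

def merge(cf1, cf2):
    """CF additivity: merging two subclusters just adds their features."""
    n1, ls1, ss1 = cf1
    n2, ls2, ss2 = cf2
    return n1 + n2, [a + b for a, b in zip(ls1, ls2)], ss1 + ss2

def centroid(c):
    n, ls, _ = c
    return [x / n for x in ls]

def radius(c):
    """Root-mean-square distance of the members from the centroid."""
    n, ls, ss = c
    m = [x / n for x in ls]
    return math.sqrt(max(ss / n - sum(x * x for x in m), 0.0))

square = [(0, 0), (2, 0), (0, 2), (2, 2)]
c = cf(square)
```

For the four corners of a square with side 2, the centroid is (1, 1) and the radius is sqrt(2), and building the CF from two halves and merging them gives the same triple as building it from all four points at once.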
Chameleon uses a k-nearest-neighbor graph approach to construct a sparse graph, where each vertex represents a data object and there is an edge between two vertices if one object is among the k most similar objects of the other. The edges are weighted to reflect the similarity between objects. Chameleon uses a graph partitioning algorithm that minimizes the edge cut. It then uses an agglomerative hierarchical clustering algorithm that repeatedly merges subclusters based on their similarity, which is measured by interconnectivity and relative closeness. Chameleon can efficiently produce arbitrarily shaped clusters, but its time complexity is quadratic. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is designed to find non-spherical clusters. It is based on the idea that a cluster in a data space is a contiguous region of high point density, separated from other such clusters by contiguous regions of low point density. It requires two input parameters, Eps and MinPts, based on which core objects are selected; a core object combined with all points density-reachable from it forms a cluster, while the outliers are the points not connected to any core point. Its expected time complexity is near-linear, degrading to quadratic in the worst case. OPTICS (Ordering Points To Identify the Clustering Structure) produces a novel cluster ordering of the database points with respect to their density-based clustering structure, containing information about every clustering level of the data set up to a generating distance Eps. It adds two more terms to the concepts of DBSCAN, namely core distance and reachability distance, and so this technique does not explicitly segment the data into clusters. Instead, it produces a visualization of reachability distances and uses this visualization to cluster the data.
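The DBSCAN idea above (pick a core object using Eps and MinPts, then grow its cluster through density-reachable points, leaving unconnected points as outliers) can be sketched directly. This is a simplified, brute-force version for illustration: it recomputes neighborhoods naively rather than using a spatial index, which is why it exhibits the quadratic worst-case behavior mentioned in the text.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns a cluster id per point, or -1 for noise.
    A point is a core point if its Eps-neighborhood (including itself)
    holds at least min_pts points."""
    def neighbors(i):
        return [j for j in range(len(points))
                if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps ** 2]

    labels = [None] * len(points)  # None = unvisited, -1 = noise
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:    # not a core point: mark as noise for now
            labels[i] = -1
            continue
        labels[i] = cluster        # start a new cluster from this core point
        seeds = list(nbrs)
        while seeds:               # expand the cluster via density-reachability
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
                continue
            if labels[j] is not None:
                continue             # already claimed by a cluster
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:   # j is core too: its neighbors are reachable
                seeds.extend(jn)
        cluster += 1
    return labels

pts = [(0, 0), (0.5, 0), (1, 0), (10, 0), (10.5, 0), (11, 0), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

On this data the two dense runs of three points each become clusters 0 and 1, and the isolated point at (50, 50) is labeled -1 (an outlier).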
Similar to DBSCAN, OPTICS also processes each point once, performing one Eps-neighborhood query during this processing. With a suitable spatial index its runtime is O(n log n) at best. STING (A Statistical Information Grid Approach): here the spatial area is divided into rectangular cells. There are several levels of such rectangular cells corresponding to different resolutions, and these cells form a hierarchical structure. The whole input dataset serves as the root node of the hierarchy, and each cell at a higher level is partitioned into a number of cells at the next lower level, where the size of the leaf-level cells depends on the density of objects. STING goes through the database once to compute the statistical parameters of the cells, giving linear time complexity for generating the clusters. CLIQUE (CLustering In QUEst) is a bottom-up subspace clustering algorithm. It can find valid clusters defined by only a subset of the dimensions, and it takes two parameters, a density threshold and the number of grids. It partitions the data space into non-overlapping rectangular units called grids and then finds the dense regions. Then, using an Apriori-style approach, clusters are generated from all dense subspaces. The Expectation-Maximization (EM) algorithm is an extension of the k-means algorithm that finds a maximum-likelihood solution. Instead of assigning examples to clusters to maximize the differences in means, the EM algorithm computes probabilities of cluster membership based on one or more probability distributions. The goal is to maximize the overall probability or likelihood of the data given the clusters, which makes it robust to noisy data.
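The contrast between k-means' hard assignments and EM's soft membership probabilities is easiest to see in a minimal example. The sketch below, an illustrative implementation assuming a two-component one-dimensional Gaussian mixture, alternates an E-step (compute each component's responsibility for each point) with an M-step (re-estimate weights, means, and variances to maximize the likelihood given those responsibilities).

```python
import math

def em_gmm_1d(data, iters=50):
    """EM sketch for a two-component 1D Gaussian mixture."""
    # crude initialization for illustration: the extreme points as initial means
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: soft membership probability of each point in each component
        resp = []
        for x in data:
            p = [w[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: re-estimate parameters from the weighted (soft) assignments
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return w, mu, var

data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
w, mu, var = em_gmm_1d(data)
```

For this data, with points grouped around 0 and around 5, the estimated means converge near 0.0 and 5.0 with roughly equal mixture weights; each point keeps a probability of belonging to either component rather than a hard label, which is exactly what distinguishes EM from k-means.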