
CLUSTER ANALYSIS

PRESENTED BY: Garima Anand (34), Sarabjeet kour (44), Supriya koul (59), Priyanshu Gupta (60)

CONCEPT
Cluster analysis is a class of statistical techniques and an exploratory data analysis tool. It sorts through the raw data and groups cases or observations into clusters, where a cluster is a group of relatively homogeneous cases or observations. Cluster analysis is an interdependence technique: no variable is singled out as dependent or independent. By grouping similar cases together, it reduces the number of observations or cases to a smaller number of groups.

Example: A group of diners sharing the same table in a restaurant may be regarded as a cluster of people. In food stores, items of a similar nature, such as different types of meat or vegetables, are displayed in the same or nearby locations.

HYPOTHETICAL EXAMPLE
[Figure: scatter plot of vacations by 15 individuals (A to O). Axes: No. of vacation days and Expenditure on vacations (Rs.). The points form three different clusters (I, II and III).]

The aim is to classify individuals or objects on the basis of their similarity to, or distance from, each other. Distance is an inverse measure of similarity: the smaller the distance between two objects, the more similar they are.

BASIC PROCEDURE OF CLUSTER ANALYSIS

1. Formulate the problem.
2. Select a distance measure:
   • Squared Euclidean distance.
   • Manhattan distance.
   • Chebyshev distance.
   • Mahalanobis (or correlation) distance.
3. Select a clustering procedure.
4. Decide on the number of clusters.
5. Map and interpret clusters (draw conclusions).
6. Assess reliability and validity:
   • Repeat analysis but use different distance measure.
   • Repeat analysis but use different clustering technique.
   • Split the data randomly into two halves and analyze each part separately.
   • Repeat analysis several times, deleting one variable each time.
   • Repeat analysis several times, using a different order each time.
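The first three distance measures in step 2 can be sketched in a few lines of Python. This is only an illustration: the function names and the two sample individuals (vacation days, expenditure in thousands of Rs.) are made up for this example and are not data from the slides. Mahalanobis distance is left out because it also needs the covariance matrix of the variables.

```python
def squared_euclidean(p, q):
    # Sum of squared differences across all variables.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def manhattan(p, q):
    # Sum of absolute differences across all variables.
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    # Largest absolute difference on any single variable.
    return max(abs(a - b) for a, b in zip(p, q))


# Hypothetical individuals: (vacation days, expenditure in thousands of Rs.)
person_1 = (5, 20)
person_2 = (8, 24)

print(squared_euclidean(person_1, person_2))  # 3**2 + 4**2 = 25
print(manhattan(person_1, person_2))          # 3 + 4 = 7
print(chebyshev(person_1, person_2))          # max(3, 4) = 4
```

In all three cases a smaller value means the two individuals are more similar. In practice the variables would usually be standardized first so that one variable measured on a large scale does not dominate the distance.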

Clustering Methods

Clustering methods are categorized as:
• Non-Hierarchical clustering
• Hierarchical clustering

Non-Hierarchical clustering: first determine a cluster center, then group all objects that are within a certain distance of it.

Examples:
• Sequential Threshold method - first determine a cluster center, then group all objects that are within a predetermined threshold from the center. Only one cluster is created at a time.
• Parallel Threshold method - several cluster centers are determined simultaneously, then objects that are within a predetermined threshold from the centers are grouped.
• Optimizing Partitioning method - first a non-hierarchical procedure is run, then objects are reassigned so as to optimize an overall criterion.

• Centroid methods - clusters are generated that maximize the distance between the centers of clusters (a centroid is the mean value for all the objects in the cluster).
• Variance methods - clusters are generated that minimize the within-cluster variance.
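The sequential threshold idea can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `sequential_threshold` is hypothetical, the first unassigned object is taken as each new cluster center (the slides do not say how the center is chosen), Euclidean distance is used, and the sample points are made up.

```python
import math


def sequential_threshold(points, threshold):
    """Create one cluster at a time around a chosen center.

    Assumption: the first still-unassigned point serves as the center.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    remaining = list(points)
    clusters = []
    while remaining:
        center = remaining[0]
        # Group every unassigned point within the threshold of the center.
        members = [p for p in remaining if math.dist(p, center) <= threshold]
        clusters.append(members)
        # Only the points outside the threshold stay available.
        remaining = [p for p in remaining if math.dist(p, center) > threshold]
    return clusters


# Hypothetical two-variable observations.
points = [(1, 1), (2, 1), (8, 8), (9, 8), (1, 2)]
clusters = sequential_threshold(points, threshold=2.0)
print(clusters)  # [[(1, 1), (2, 1), (1, 2)], [(8, 8), (9, 8)]]
```

Because clusters are built one after another, the result depends on the order of the objects and on the threshold, which is one reason the procedure above recommends repeating the analysis with a different order.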

Hierarchical clustering: objects are organized into a hierarchical structure as part of the procedure.

Examples:
a) Divisive clustering - start by treating all objects as if they are part of a single large cluster, then divide the cluster into smaller and smaller clusters.
b) Agglomerative clustering - start by treating each object as a separate cluster, then group them into bigger and bigger clusters.
c) Linkage methods - objects are clustered based on the distance between them. Examples:
   • Single Linkage method - cluster objects based on the minimum distance between them (also called the nearest neighbour rule).
   • Complete Linkage method - cluster objects based on the maximum distance between them (also called the furthest neighbour rule).
   • Average Linkage method - cluster objects based on the average distance between all pairs of objects, where the two objects in each pair come from different clusters.
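The three linkage rules differ only in how they summarize the pairwise distances between two clusters. The sketch below shows this, assuming Euclidean distance; the function names and the two small clusters are made up for illustration.

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    # Distances for every pair with one object from each cluster.
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    # Nearest neighbour rule: the minimum pairwise distance.
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    # Furthest neighbour rule: the maximum pairwise distance.
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    # Mean of all between-cluster pairwise distances.
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Hypothetical clusters lying on a line, so the distances are easy to check.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (6, 0)]

print(single_linkage(cluster_a, cluster_b))    # 3.0
print(complete_linkage(cluster_a, cluster_b))  # 6.0
print(average_linkage(cluster_a, cluster_b))   # 4.5
```

In an agglomerative procedure, the pair of clusters with the smallest linkage distance would be merged at each step, and the step repeated until the desired number of clusters remains.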

ADVANTAGES OF CLUSTER ANALYSIS IN MARKETING:
• Market segmentation
• Buyer behavior
• Development of new products
• Reducing the number of test markets

DISADVANTAGES:
• Lack of specificity
• Lack of a specific technique
• Time consuming
