Clustering Taxonomy
Introduction to Cluster Analysis
K-Means Clustering
Hierarchical Clustering
• Cluster analysis is used mainly in:
• Market Segmentation
• Social Network Analysis
• Image Compression and Segmentation
• Document Clustering
• Density Based - This category clusters objects using a local density criterion:
a cluster is a region where objects are densely packed together, and clusters
are separated by regions of low density. Examples are DBSCAN and OPTICS.
• Model Based - The idea is to build a statistical model for each cluster and find
the model that best fits the data. The user specifies the model in the form of
parameters, and these parameters are allowed to change during the learning
phase. Examples are COBWEB and AutoClass.
• Distance Based - These methods group objects according to a distance measure.
They are generally easy to implement and can be applied in numerous scenarios.
Popular distance-based algorithms include the K-means algorithm.
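To make the density criterion from the Density Based bullet concrete, here is a minimal Python sketch of the core-point test used by DBSCAN-style methods: a point counts as lying in a dense region when at least min_pts points (itself included) fall within a radius eps. This is only the density check, not a full DBSCAN implementation, and the function name core_points, the toy data, and the values of eps and min_pts are assumptions chosen for illustration.

```python
import math

def core_points(points, eps, min_pts):
    """Return the points with at least min_pts points (itself included) within eps."""
    cores = []
    for p in points:
        # Count every point q whose Euclidean distance to p is at most eps.
        neighbours = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbours >= min_pts:
            cores.append(p)
    return cores

# Hypothetical toy data: a tight group of four points plus one isolated point.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
dense = core_points(data, eps=1.5, min_pts=3)
print(dense)
```

With these values the four grouped points pass the test and the isolated point does not, which is the separation by low density that the bullet describes.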
For i = 1 to k
    Recalculate the mean for cluster centroid Ci
    Replace Ci with the mean of all the samples assigned to cluster i
End for
Note: the average (mean) of the data points in a cluster is assigned to that cluster's centroid.
BUSI 651: Machine Learning 11
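The update loop above can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the function names assign, update, and k_means, the toy data, the starting centroids, and the choice to leave an empty cluster's centroid unchanged are all assumptions.

```python
import math

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]

def update(points, labels, centroids):
    """For i = 1..k: replace Ci with the mean of the samples assigned to cluster i."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        mean = tuple(sum(m[d] for m in members) / len(members)
                     for d in range(len(old)))
        new_centroids.append(mean)
    return new_centroids

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update until the centroids stop moving."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = update(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical toy data with two well-separated groups and k = 2.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels)
```

The starting centroids are passed in explicitly so the run is repeatable; in practice they are often chosen at random from the data.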
K-Means
The yellow, red, and blue lines (Manhattan distance) all have the same length of 12.
The green line (Euclidean distance) has a length of about 8.49.
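The two figures can be reproduced with a short Python sketch. The endpoints (0, 0) and (6, 6) are an assumption inferred from the numbers on the slide (6 + 6 = 12 and the square root of 72 is about 8.49), and the function names manhattan and euclidean are illustrative.

```python
import math

def manhattan(p, q):
    """Sum of absolute coordinate differences (grid-path length)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    """Straight-line distance between p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Assumed endpoints: 6 units apart horizontally and 6 units apart vertically.
a, b = (0, 0), (6, 6)
print(manhattan(a, b))            # 12
print(round(euclidean(a, b), 2))  # 8.49
```

Every path that only moves along the grid toward the target covers the same 12 units, which is why the yellow, red, and blue lines tie, while the straight green line is shorter.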
• Any questions?