
KI2 - 7

Clustering Algorithms
Johan Everts

Kunstmatige Intelligentie / RuG



What is Clustering?

Find K clusters (or a classification that consists of K clusters) so that the objects of one cluster are similar to each other whereas objects of different clusters are dissimilar. (Bacher 1996)

The Goals of Clustering

Determine the intrinsic grouping in a set of unlabeled data.


What constitutes a good clustering? All clustering algorithms will produce clusters, regardless of whether the data actually contains any. There is no gold standard; it depends on the goal:

data reduction
natural clusters
useful clusters
outlier detection

Stages in Clustering

Taxonomy of Clustering Approaches

Hierarchical Clustering

Agglomerative clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. Divisive clustering works the other way around: it starts with all points in one cluster and successively splits clusters.

Agglomerative Clustering

Single link In single-link hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance.


Complete link In complete-link hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter.
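Both linkage rules can be sketched in a few lines of Python, operating on a precomputed distance matrix (the function and variable names here are illustrative, not from the slides):

```python
# Minimal agglomerative clustering over a precomputed distance matrix.
# A linkage function defines the distance between two clusters of indices.

def single_link(d, a, b):
    # Smallest pairwise distance between members of a and b.
    return min(d[i][j] for i in a for j in b)

def complete_link(d, a, b):
    # Largest pairwise distance: the diameter of the merged cluster.
    return max(d[i][j] for i in a for j in b)

def agglomerate(d, linkage):
    """Repeatedly merge the two closest clusters until one remains.
    Returns the merge history as (cluster, cluster, distance) triples."""
    clusters = [frozenset([i]) for i in range(len(d))]
    history = []
    while len(clusters) > 1:
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda pair: linkage(d, pair[0], pair[1]),
        )
        history.append((set(a), set(b), linkage(d, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history
```

Passing single_link implements the rule above; swapping in complete_link gives the complete-link variant with no other changes.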

Example Single Link AC


      BA    FI    MI    NA    RM    TO
BA     0   662   877   255   412   996
FI   662     0   295   468   268   400
MI   877   295     0   754   564   138
NA   255   468   754     0   219   869
RM   412   268   564   219     0   669
TO   996   400   138   869   669     0
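The first merge can be read off this matrix programmatically; a small check (the dictionary encoding below is just for illustration):

```python
# The smallest off-diagonal entry determines the first single-link merge.
d = {
    ("BA", "FI"): 662, ("BA", "MI"): 877, ("BA", "NA"): 255,
    ("BA", "RM"): 412, ("BA", "TO"): 996, ("FI", "MI"): 295,
    ("FI", "NA"): 468, ("FI", "RM"): 268, ("FI", "TO"): 400,
    ("MI", "NA"): 754, ("MI", "RM"): 564, ("MI", "TO"): 138,
    ("NA", "RM"): 219, ("NA", "TO"): 869, ("RM", "TO"): 669,
}
pair = min(d, key=d.get)
print(pair, d[pair])  # MI and TO are merged first, at distance 138
```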


Example Single Link AC

        BA    FI  MI/TO    NA    RM
BA       0   662    877   255   412
FI     662     0    295   468   268
MI/TO  877   295      0   754   564
NA     255   468    754     0   219
RM     412   268    564   219     0


Example Single Link AC

        BA    FI  MI/TO  NA/RM
BA       0   662    877    255
FI     662     0    295    268
MI/TO  877   295      0    564
NA/RM  255   268    564      0


Example Single Link AC

          BA/NA/RM    FI  MI/TO
BA/NA/RM         0   268    564
FI             268     0    295
MI/TO          564   295      0


Example Single Link AC

             BA/FI/NA/RM  MI/TO
BA/FI/NA/RM            0    295
MI/TO                295      0

Example Single Link AC

Finally, BA/FI/NA/RM and MI/TO are merged at distance 295, which completes the dendrogram.

Taxonomy of Clustering Approaches

Square error

K-Means

Step 0: Start with a random partition into K clusters.
Step 1: Generate a new partition by assigning each pattern to its closest cluster center.
Step 2: Compute new cluster centers as the centroids of the clusters.
Step 3: Repeat steps 1 and 2 until the membership no longer changes (the cluster centers then also remain the same).
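The four steps can be sketched as a minimal pure-Python version for 2-D points (parameter names and the fixed iteration cap are illustrative):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means on 2-D points; a minimal sketch, not production code."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # Step 0: random initial centers
    for _ in range(iters):
        # Step 1: assign each point to its closest cluster center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                            + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Step 2: recompute centers as the centroids of the clusters
        new_centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        # Step 3: stop when membership (hence the centers) no longer changes
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```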


K-Means: How many Ks?


Locating the knee

The knee of a curve is defined as the point of maximum curvature.
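On a sampled error-vs-K curve there is no continuous curvature to maximize; one common discrete proxy (an assumption here, not from the slides) is the point with the largest second difference of the error values:

```python
def knee_index(errors):
    """Approximate the knee of a sampled, decreasing error curve as the
    point of maximum discrete second difference (a curvature proxy)."""
    second = [errors[i - 1] - 2 * errors[i] + errors[i + 1]
              for i in range(1, len(errors) - 1)]
    return 1 + max(range(len(second)), key=second.__getitem__)

# Squared-error values for K = 1..6: the bend sits at the third value.
print(knee_index([100, 60, 20, 18, 17, 16]))  # prints 2, i.e. K = 3
```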

Leader - Follower
An online algorithm: specify a threshold distance, then for each new instance:
Find the closest cluster center
If the distance is above the threshold, create a new cluster
Otherwise, add the instance to that cluster and update the cluster center

Distance < Threshold: the instance is added to the closest cluster and the cluster center is updated.
Distance > Threshold: a new cluster is created around the instance.
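The procedure can be sketched as follows (the learning rate lr, which controls how far the winning center moves toward a new instance, is an assumed parameter not specified in the slides):

```python
def leader_follower(stream, threshold, lr=0.1):
    """Online Leader-Follower sketch for points given as tuples.
    lr is an assumed parameter controlling the center update."""
    centers = []
    for x in stream:
        if not centers:
            centers.append(list(x))
            continue
        # Find the closest cluster center (squared Euclidean distance).
        d2, i = min((sum((a - b) ** 2 for a, b in zip(c, x)), i)
                    for i, c in enumerate(centers))
        if d2 ** 0.5 > threshold:
            centers.append(list(x))  # distance above threshold: new cluster
        else:
            # Add the instance to the cluster: move its center toward x.
            centers[i] = [a + lr * (b - a) for a, b in zip(centers[i], x)]
    return centers
```

Because each instance is processed once as it arrives, the result depends on presentation order, which is one source of the instability noted later.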

Kohonen SOMs

The Self-Organizing Map (SOM) is an unsupervised artificial neural network algorithm. It is a compromise between biological modeling and statistical data processing.


Each weight is representative of a certain input. Input patterns are shown to all neurons simultaneously. Competitive learning: the neuron with the largest response is chosen.


Initialize weights
Repeat until convergence:
Select the next input pattern
Find the Best Matching Unit
Update the weights of the winner and its neighbours
Decrease the learning rate and neighbourhood size
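The training loop can be sketched for a 1-D map of 2-D inputs (map size, initial rates, and the linear decay schedule are all illustrative choices, not from the slides):

```python
import math
import random

def train_som(data, n_units=10, iters=500, seed=0):
    """Minimal 1-D Kohonen SOM on 2-D inputs; a sketch with assumed sizes
    and rates. Each iteration: pick an input, find the Best Matching Unit
    (BMU), pull the BMU and its map neighbours toward the input, and decay
    both the learning rate and the neighbourhood size."""
    rng = random.Random(seed)
    weights = [[rng.random(), rng.random()] for _ in range(n_units)]
    lr0, sigma0 = 0.5, n_units / 2.0
    for t in range(iters):
        x = rng.choice(data)
        # BMU: the unit whose weight vector is closest to the input.
        bmu = min(range(n_units),
                  key=lambda i: (weights[i][0] - x[0]) ** 2
                                + (weights[i][1] - x[1]) ** 2)
        # Decrease the learning rate and neighbourhood size over time.
        frac = 1.0 - t / iters
        lr, sigma = lr0 * frac, max(sigma0 * frac, 0.5)
        for i in range(n_units):
            # Distance-related learning: units closer to the BMU on the
            # map receive larger updates (Gaussian neighbourhood).
            h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
            weights[i][0] += lr * h * (x[0] - weights[i][0])
            weights[i][1] += lr * h * (x[1] - weights[i][1])
    return weights
```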

Learning rate & neighbourhood size


Distance related learning



Kohonen SOM Demo (from ai-junkie.com): mapping a 3D colorspace on a 2D Kohonen map

Performance Analysis

K-Means

Depends strongly on a priori knowledge (the choice of K). Very stable.

Leader Follower

Depends strongly on a priori knowledge (the choice of threshold). Faster, but unstable.


Self Organizing Map


Stability and convergence assured
Principle of self-ordering
Slow: many iterations needed for convergence
Computationally intensive

Conclusion

No Free Lunch theorem

Any elevated performance over one class of problems is exactly paid for in performance over another class.

Ensemble clustering ?

Use a SOM and the basic Leader-Follower algorithm to identify clusters, and then use k-means clustering to refine them.

Any Questions ?
