
KI2 - 7

Clustering Algorithms
Johan Everts

Kunstmatige Intelligentie / RuG



What is Clustering?

Find K clusters (or a classification that consists of K clusters) so that the objects of one cluster are similar to each other whereas objects of different clusters are dissimilar. (Bacher 1996)

The Goals of Clustering

Determine the intrinsic grouping in a set of unlabeled data.


What constitutes a good clustering? All clustering algorithms will produce clusters, regardless of whether the data actually contains any. There is no gold standard; it depends on the goal:

data reduction
natural clusters
useful clusters
outlier detection

Stages in Clustering

Taxonomy of Clustering Approaches

Hierarchical Clustering

Agglomerative clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. Divisive clustering works the other way around: it starts with all points in one cluster and successively splits clusters.

Agglomerative Clustering

Single link In single-link hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance.


Complete link In complete-link hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter.
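Both linkage rules can be sketched in a few lines of Python, operating on a precomputed distance matrix (the function and variable names here are illustrative, not from the slides):

```python
# Minimal agglomerative clustering over a precomputed distance matrix.
# A linkage function defines the distance between two clusters of indices.

def single_link(d, a, b):
    # Smallest pairwise distance between members of a and b.
    return min(d[i][j] for i in a for j in b)

def complete_link(d, a, b):
    # Largest pairwise distance: the diameter of the merged cluster.
    return max(d[i][j] for i in a for j in b)

def agglomerate(d, linkage):
    """Repeatedly merge the two closest clusters until one remains.
    Returns the merge history as (cluster, cluster, distance) triples."""
    clusters = [frozenset([i]) for i in range(len(d))]
    history = []
    while len(clusters) > 1:
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda pair: linkage(d, pair[0], pair[1]),
        )
        history.append((set(a), set(b), linkage(d, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history
```

Passing single_link implements the rule above; swapping in complete_link gives the complete-link variant with no other changes.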

Example Single Link AC


      BA    FI    MI    NA    RM    TO
BA     0   662   877   255   412   996
FI   662     0   295   468   268   400
MI   877   295     0   754   564   138
NA   255   468   754     0   219   869
RM   412   268   564   219     0   669
TO   996   400   138   869   669     0
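The first merge can be read off this matrix programmatically; a small check (the dictionary encoding below is just for illustration):

```python
# The smallest off-diagonal entry determines the first single-link merge.
d = {
    ("BA", "FI"): 662, ("BA", "MI"): 877, ("BA", "NA"): 255,
    ("BA", "RM"): 412, ("BA", "TO"): 996, ("FI", "MI"): 295,
    ("FI", "NA"): 468, ("FI", "RM"): 268, ("FI", "TO"): 400,
    ("MI", "NA"): 754, ("MI", "RM"): 564, ("MI", "TO"): 138,
    ("NA", "RM"): 219, ("NA", "TO"): 869, ("RM", "TO"): 669,
}
pair = min(d, key=d.get)
print(pair, d[pair])  # MI and TO are merged first, at distance 138
```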


Example Single Link AC

        BA    FI  MI/TO    NA    RM
BA       0   662    877   255   412
FI     662     0    295   468   268
MI/TO  877   295      0   754   564
NA     255   468    754     0   219
RM     412   268    564   219     0


Example Single Link AC

        BA    FI  MI/TO  NA/RM
BA       0   662    877    255
FI     662     0    295    268
MI/TO  877   295      0    564
NA/RM  255   268    564      0


Example Single Link AC

          BA/NA/RM    FI  MI/TO
BA/NA/RM         0   268    564
FI             268     0    295
MI/TO          564   295      0


Example Single Link AC

             BA/FI/NA/RM  MI/TO
BA/FI/NA/RM            0    295
MI/TO                295      0

Example Single Link AC

Finally, BA/FI/NA/RM and MI/TO are merged at distance 295, which completes the dendrogram.

Taxonomy of Clustering Approaches

Square error

K-Means

Step 0: Start with a random partition into K clusters.
Step 1: Generate a new partition by assigning each pattern to its closest cluster center.
Step 2: Compute new cluster centers as the centroids of the clusters.
Step 3: Repeat steps 1 and 2 until the membership no longer changes (the cluster centers then also remain the same).
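The four steps can be sketched as a minimal pure-Python version for 2-D points (parameter names and the fixed iteration cap are illustrative):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means on 2-D points; a minimal sketch, not production code."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # Step 0: random initial centers
    for _ in range(iters):
        # Step 1: assign each point to its closest cluster center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                            + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Step 2: recompute centers as the centroids of the clusters
        new_centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        # Step 3: stop when membership (hence the centers) no longer changes
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```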


K-Means: How many Ks?


Locating the knee

The knee of a curve is defined as the point of maximum curvature.
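On a sampled error-vs-K curve there is no continuous curvature to maximize; one common discrete proxy (an assumption here, not from the slides) is the point with the largest second difference of the error values:

```python
def knee_index(errors):
    """Approximate the knee of a sampled, decreasing error curve as the
    point of maximum discrete second difference (a curvature proxy)."""
    second = [errors[i - 1] - 2 * errors[i] + errors[i + 1]
              for i in range(1, len(errors) - 1)]
    return 1 + max(range(len(second)), key=second.__getitem__)

# Squared-error values for K = 1..6: the bend sits at the third value.
print(knee_index([100, 60, 20, 18, 17, 16]))  # prints 2, i.e. K = 3
```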

Leader - Follower
An online algorithm: specify a threshold distance, then for each new instance:
Find the closest cluster center
If the distance is above the threshold, create a new cluster
Otherwise, add the instance to that cluster and update the cluster center

Distance < Threshold: the instance is added to the closest cluster and the cluster center is updated.
Distance > Threshold: a new cluster is created around the instance.
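The procedure can be sketched as follows (the learning rate lr, which controls how far the winning center moves toward a new instance, is an assumed parameter not specified in the slides):

```python
def leader_follower(stream, threshold, lr=0.1):
    """Online Leader-Follower sketch for points given as tuples.
    lr is an assumed parameter controlling the center update."""
    centers = []
    for x in stream:
        if not centers:
            centers.append(list(x))
            continue
        # Find the closest cluster center (squared Euclidean distance).
        d2, i = min((sum((a - b) ** 2 for a, b in zip(c, x)), i)
                    for i, c in enumerate(centers))
        if d2 ** 0.5 > threshold:
            centers.append(list(x))  # distance above threshold: new cluster
        else:
            # Add the instance to the cluster: move its center toward x.
            centers[i] = [a + lr * (b - a) for a, b in zip(centers[i], x)]
    return centers
```

Because each instance is processed once as it arrives, the result depends on presentation order, which is one source of the instability noted later.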

Kohonen SOMs

The Self-Organizing Map (SOM) is an unsupervised artificial neural network algorithm. It is a compromise between biological modeling and statistical data processing.


Each weight is representative of a certain input. Input patterns are shown to all neurons simultaneously. Competitive learning: the neuron with the largest response is chosen.


Initialize weights
Repeat until convergence:
Select the next input pattern
Find the Best Matching Unit
Update the weights of the winner and its neighbours
Decrease the learning rate and neighbourhood size
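The training loop can be sketched for a 1-D map of 2-D inputs (map size, initial rates, and the linear decay schedule are all illustrative choices, not from the slides):

```python
import math
import random

def train_som(data, n_units=10, iters=500, seed=0):
    """Minimal 1-D Kohonen SOM on 2-D inputs; a sketch with assumed sizes
    and rates. Each iteration: pick an input, find the Best Matching Unit
    (BMU), pull the BMU and its map neighbours toward the input, and decay
    both the learning rate and the neighbourhood size."""
    rng = random.Random(seed)
    weights = [[rng.random(), rng.random()] for _ in range(n_units)]
    lr0, sigma0 = 0.5, n_units / 2.0
    for t in range(iters):
        x = rng.choice(data)
        # BMU: the unit whose weight vector is closest to the input.
        bmu = min(range(n_units),
                  key=lambda i: (weights[i][0] - x[0]) ** 2
                                + (weights[i][1] - x[1]) ** 2)
        # Decrease the learning rate and neighbourhood size over time.
        frac = 1.0 - t / iters
        lr, sigma = lr0 * frac, max(sigma0 * frac, 0.5)
        for i in range(n_units):
            # Distance-related learning: units closer to the BMU on the
            # map receive larger updates (Gaussian neighbourhood).
            h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
            weights[i][0] += lr * h * (x[0] - weights[i][0])
            weights[i][1] += lr * h * (x[1] - weights[i][1])
    return weights
```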

Learning rate & neighbourhood size


Distance related learning



Kohonen SOM Demo (from ai-junkie.com): mapping a 3D colorspace on a 2D Kohonen map

Performance Analysis

K-Means

Depends strongly on a priori knowledge (the choice of K). Very stable.

Leader Follower

Depends strongly on a priori knowledge (the choice of threshold). Faster, but unstable.


Self Organizing Map


Stability and convergence assured
Principle of self-ordering
Slow: many iterations needed for convergence
Computationally intensive

Conclusion

No Free Lunch theorem

Any elevated performance over one class of problems is exactly paid for in performance over another class.

Ensemble clustering ?

Use a SOM and the basic Leader-Follower algorithm to identify clusters, and then use k-means clustering to refine them.

Any Questions ?
