
Machine Learning

Tools and Techniques

Week 3: Unsupervised Learning


Unsupervised Learning

Clustering Taxonomy

Introduction to Cluster Analysis
K-Means Clustering

Hierarchical Clustering



Unsupervised Learning

• The goal is to find structure/patterns in data by exploring the relationships between data points in terms of their attributes/features.



Unsupervised Learning

• Clustering is an exploratory data analysis technique that can be used to
• Group data (build a taxonomy of things)
• Find homogeneous subgroups, i.e. data points within each cluster are as similar as possible to one another and as dissimilar as possible from those in other groups.



Unsupervised Learning

• Used mainly in
• Market Segmentation
• Social Network Analysis
• Image Compression and Segmentation
• Document Clustering



Clustering Taxonomy



Partitional Clustering

• Density Based - This category clusters objects based on a local density criterion: objects that are densely packed together form a cluster, and clusters are separated by subspaces of low density. Examples are DBSCAN and OPTICS.
• Model Based - The idea is to build a statistical model for each cluster and find the one that best fits the data. The user specifies the model in the form of parameters, allowing the model to change during the learning phase. Examples are COBWEB and AutoClass.
• Distance Based - These algorithms are generally easy to implement due to their simplicity and can be applied in numerous scenarios. Popular distance-based algorithms include K-means (contrasted with a density-based method in the sketch below).
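As a rough illustration of the difference, here is a minimal sketch contrasting a density-based algorithm (DBSCAN) with a distance-based one (K-means), assuming scikit-learn is available; the toy dataset and parameter values are arbitrary examples, not from the slides.

from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN, KMeans

# Toy data: two interleaved half-moon shapes
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Density based: clusters are dense regions separated by low-density space
density_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# Distance based: each point is assigned to the nearest of k centroids
distance_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print(set(density_labels), set(distance_labels))

On data like this, DBSCAN typically recovers the two moon shapes, whereas K-means splits the points with a roughly linear boundary between its two centroids.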



K-Means

• The most popular and widely used cluster analysis algorithm; it partitions the dataset into k distinct (pre-defined), non-overlapping clusters.
• The K-means optimization objective is to minimize the sum of squared distances between each cluster centroid and the objects assigned to it, as written out below.
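In symbols (a standard formulation of this objective, not taken from the slides), with clusters C_1, …, C_k and centroids mu_1, …, mu_k:

J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2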



K-means

Input: Data points x1, x2, …, xn and the number of clusters k


Output: k clusters
procedure K-means
{
Randomly select k initial centroids, C1, C2, …, Ck
Repeat
Assign each point to the closest centroid to form a cluster
For i = 1 to k
Recalculate the mean of all the samples in cluster i
Replace Ci with this mean
End for
Until the convergence criterion is met (e.g. the centroids no longer change)
}
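A minimal NumPy sketch of this procedure (an illustrative implementation using "centroids stop moving" as the convergence criterion; the function and variable names are my own, not from the course):

import numpy as np

def k_means(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly select k initial centroids from the data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assign each point to the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Replace each centroid with the mean of the samples assigned to it
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Convergence criterion: centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids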
K-Means

Randomly select k initial centroids, C1, C2, …, Ck

Assign each point to the closest centroid to form a cluster
K-Means

For i = 1 to k
Recalculate the mean of all the samples in cluster i
Replace Ci with this mean
End for

The average/mean of the data points is assigned to the cluster centroid.
K-Means

• Convergence of K-means is strongly affected by the random initialization of the k cluster centroids.
• This can be corrected by
• computing the cost/distortion function, i.e. the sum of squared distances between the data points and their assigned centroids, for each run, and
• repeating the random initialization process multiple times and keeping the run with the lowest cost, i.e. the best initial clusters (see the sketch below).
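A minimal sketch of this multiple-restart strategy, reusing the k_means function sketched above (scikit-learn's KMeans offers the same behaviour through its n_init parameter):

import numpy as np

def k_means_best_of(X, k, n_init=10):
    best_labels, best_centroids, best_cost = None, None, np.inf
    for seed in range(n_init):
        labels, centroids = k_means(X, k, seed=seed)
        # Cost/distortion: sum of squared distances to the assigned centroids
        cost = ((X - centroids[labels]) ** 2).sum()
        if cost < best_cost:
            best_labels, best_centroids, best_cost = labels, centroids, cost
    return best_labels, best_centroids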



Initialization of k







Number of clusters k

If we know the context of the problem, e.g. we need to divide the dataset into new, returning, and continuing customers, then we can safely choose k = 3.



Distance Measures

Manhattan → (6 − 0) + (6 − 0) = 12

Euclidean → sqrt(6² + 6²) ≈ 8.49

The yellow, red, and blue lines all have the same Manhattan distance of 12.
The green line has a Euclidean distance of 8.49.
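A quick check of these two numbers in Python (purely illustrative):

import math

p, q = (0, 0), (6, 6)

manhattan = abs(q[0] - p[0]) + abs(q[1] - p[1])   # (6-0) + (6-0) = 12
euclidean = math.hypot(q[0] - p[0], q[1] - p[1])  # sqrt(6^2 + 6^2) ≈ 8.49

print(manhattan, round(euclidean, 2))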



Hierarchical Clustering
• Divisive - This is a top-down approach that begins with one root cluster containing all the data points. The root is then recursively examined to see whether it can be split further based on some dissimilarity distance. This process is repeated until singleton clusters are obtained.
• Agglomerative - This is a bottom-up approach where all data points start as individual clusters at the bottom of a binary tree. Their pairwise distances are recorded in a dissimilarity matrix and the closest pair of clusters is merged. The dissimilarity matrix is then updated and the process is repeated, merging the least dissimilar pairs bottom-up until one cluster remains that contains all the data points (see the sketch below).
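A minimal sketch of the agglomerative (bottom-up) procedure using SciPy, assuming scipy is installed; the toy data and the cut at 3 clusters are arbitrary choices, not from the slides:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)  # toy data

# Bottom-up: repeatedly merge the two closest clusters until one remains
Z = linkage(X, method='average')

# Cut the resulting tree to obtain a flat clustering with 3 clusters
labels = fcluster(Z, t=3, criterion='maxclust')
print(labels)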



Distance Measures

• Single Linkage Clustering (SLC) – the distance between 2 clusters is defined as the shortest distance between a point in one cluster and a point in the other.

• Complete Linkage Clustering (CLC) – the distance between 2 clusters is defined as the furthest distance between a point in one cluster and a point in the other.
Distance Measures

• Average Linkage Clustering (ALC) – the distance between 2 clusters is defined as the average distance between all pairs of points, one from each cluster.
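These three criteria correspond directly to the method argument of SciPy's linkage function (a sketch under the same scipy assumption as above):

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

X = np.random.rand(10, 2)
D = pdist(X)  # condensed matrix of pairwise distances

Z_single   = linkage(D, method='single')    # SLC: shortest distance between clusters
Z_complete = linkage(D, method='complete')  # CLC: furthest distance between clusters
Z_average  = linkage(D, method='average')   # ALC: average distance over all pairs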



Hierarchical Clustering



Challenges of Hierarchical Clustering

• If an object is assigned to the wrong cluster, it is very difficult to reassign it later.
• Merge/split decisions, once made, are difficult to undo.
• These methods tend to have a higher computational complexity and are therefore not suitable for large datasets.



Thank you!

• Any questions?



