
TOPIC 6 – PART A

CLUSTERING
OBJECTIVES

To introduce the basic concepts of clustering.

To discuss how to compute the dissimilarity between objects of different attribute types.

To examine several clustering techniques of the partitioning and hierarchical methods.

https://discuss.cryosparc.com/t/using-particles-from-cluster-mode-in-3d-va-for-refinement-fails/3665/2
INTRODUCTION

Descriptive analytics is sometimes said to provide information about what happened.

Predictive analytics is used to identify future probabilities and trends.
WHAT IS CLUSTER ANALYSIS?

Intra-cluster distances are minimized.
Inter-cluster distances are maximized.

Finding groups of objects such that the objects in a group will be similar (or related) to
one another and different from (or unrelated to) the objects in other groups.
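
The idea of small intra-cluster distances and large inter-cluster distances can be checked numerically. The following is a minimal sketch, assuming Euclidean distance and two hand-made 2-D clusters; the function names and the sample points are illustrative and do not come from the slides.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def mean_intra_cluster_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)


def mean_inter_cluster_distance(clusters):
    """Average distance between pairs of points from different clusters."""
    pairs = [
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    ]
    return sum(pairs) / len(pairs)


# Hypothetical sample data: two well-separated groups of points.
clusters = [
    [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)],
    [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)],
]

intra = mean_intra_cluster_distance(clusters)
inter = mean_inter_cluster_distance(clusters)
print(f"mean intra-cluster distance: {intra:.2f}")
print(f"mean inter-cluster distance: {inter:.2f}")
```

For a good grouping, the first value should be clearly smaller than the second.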
WHAT IS CLUSTER ANALYSIS? CONTINUED…

Cluster: A collection of data objects

• similar (or related) to one another within the same group
• dissimilar (or unrelated) to the objects in other groups
Cluster analysis is one of the unsupervised learning methods, which do not have any predefined classes or any previous group information.

Example data set (NO CLASS LABEL):

Tid   Refund   Marital Status   Taxable Income
1     Yes      Single           125K
2     No       Married          100K
3     No       Single           70K
4     Yes      Married          120K
5     No       Divorced         95K
6     No       Married          60K
7     Yes      Divorced         220K
8     No       Single           85K
9     No       Married          75K
10    No       Single           90K
WHAT IS CLUSTER ANALYSIS? CONTINUED…

• A robust algorithm for finding similarities.

• A data set may consist of different types of attributes: nominal, numeric and also vectors.

• Each type of attribute has a different dissimilarity (distance) function.
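
To make the point about type-specific dissimilarity concrete, here is a minimal sketch. The slides do not say which functions are used, so the choices below are assumptions: a simple 0/1 mismatch for nominal values, a range-normalized absolute difference for numeric values, Euclidean distance for vectors, and a plain average to combine attributes of mixed types. The records are taken from the Refund / Marital Status / Taxable Income table, with income in thousands.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nominal_dissimilarity(a, b):
    """0 if the two categories match, 1 otherwise (simple mismatch)."""
    return 0.0 if a == b else 1.0


def numeric_dissimilarity(a, b, value_range):
    """Absolute difference scaled to [0, 1] by the attribute's range."""
    return abs(a - b) / value_range


def vector_dissimilarity(u, v):
    """Euclidean distance between two numeric vectors."""
    return dist(u, v)


def mixed_dissimilarity(x, y, kinds, ranges):
    """Average of per-attribute dissimilarities (an assumed combination rule)."""
    total = 0.0
    for i, kind in enumerate(kinds):
        if kind == "nominal":
            total += nominal_dissimilarity(x[i], y[i])
        else:
            total += numeric_dissimilarity(x[i], y[i], ranges[i])
    return total / len(kinds)


# Attributes: Refund (nominal), Marital Status (nominal), Taxable Income (numeric).
kinds = ("nominal", "nominal", "numeric")
ranges = (None, None, 220 - 60)  # income range in the table: 60K to 220K

tid1 = ("Yes", "Single", 125)
tid2 = ("No", "Married", 100)
tid3 = ("No", "Single", 70)

d12 = mixed_dissimilarity(tid1, tid2, kinds, ranges)
d23 = mixed_dissimilarity(tid2, tid3, kinds, ranges)
print(f"d(Tid 1, Tid 2) = {d12:.3f}")
print(f"d(Tid 2, Tid 3) = {d23:.3f}")
```

Scaling the numeric attribute by its range keeps income from dominating the two nominal attributes, which only ever contribute 0 or 1.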
APPLICATION OF CLUSTERING
Typical applications
i. As a data visualization tool to get insight into data distribution
ii. As a preprocessing step for other algorithms

Clustering as a Preprocessing Tool for regression, classification, and association analysis
i. Reduce the size of large data sets
ii. Compress images through vector quantization

What is not Cluster Analysis?
i. Supervised classification
• Have class label information
ii. Simple segmentation
• Dividing students into different registration groups alphabetically, by last name
iii. Results of a query
• Groupings are a result of an external specification
iv. Graph partitioning
• Some mutual relevance and synergy, but areas are not identical
METHODS OF CLUSTERING
Partitional Clustering: A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.
• k-means algorithm

Hierarchical Clustering: A set of nested clusters organized as a hierarchical tree.
• Agglomerative with single link or complete link
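
As a rough illustration of both families, the sketch below implements a basic k-means loop (assign each point to its nearest centroid, then recompute centroids) and the two cluster-to-cluster distances that single-link and complete-link agglomerative clustering use when deciding which clusters to merge. It assumes Euclidean distance and, to keep the run deterministic, takes the first k points as initial centroids; in practice the starting centroids are usually chosen at random. All names and sample points are illustrative.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, k, max_iter=100):
    """Basic k-means: returns (labels, centroids) for the given points."""
    # Assumption: the first k points are used as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def single_link(a, b):
    """Cluster distance = distance between the closest pair of points."""
    return min(dist(p, q) for p in a for q in b)


def complete_link(a, b):
    """Cluster distance = distance between the farthest pair of points."""
    return max(dist(p, q) for p in a for q in b)


# Hypothetical sample data: two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
labels, centroids = k_means(points, k=2)
print("labels:", labels)
print("centroids:", centroids)

group_a = [p for p, lab in zip(points, labels) if lab == 0]
group_b = [p for p, lab in zip(points, labels) if lab == 1]
print("single link:", single_link(group_a, group_b))
print("complete link:", complete_link(group_a, group_b))
```

Each point receives exactly one label, which matches the non-overlapping property of partitional clustering. An agglomerative method would instead start with every point as its own cluster and repeatedly merge the pair with the smallest single-link or complete-link distance.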
References

1. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 3rd Edition, Morgan Kaufmann, 2012.

2. Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining, Addison Wesley, 2019.

3. Picture Credit to Pinterest


THANK YOU
Shuzlina Abdul Rahman | Sofianita Mutalib | Siti Nur Kamaliah Kamarudin
