
Clustering

• Clustering is the task of dividing the population or data points into a
number of groups such that data points in the same group are similar
to one another and dissimilar to the data points in other groups.

• Clustering is the task of dividing unlabeled data points into
different clusters such that similar data points fall in the same cluster and
dissimilar data points fall in different clusters.

• In simple words, the aim of clustering is to separate out groups
with similar traits and assign them to clusters.
• Example: Let's understand clustering with a real-world example, a
shopping mall. When we visit a mall, we can observe that things with
similar uses are grouped together: t-shirts are in one section and
trousers in another; similarly, in the fruit and vegetable section,
apples, bananas, mangoes, etc., are kept in separate groups so that we
can easily find what we need. Clustering works in the same way. Another
example of clustering is grouping documents by topic.
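The definition above ("similar within a group, dissimilar across groups") can be checked numerically. This minimal Python sketch assumes Euclidean distance as the dissimilarity measure and uses hypothetical toy points; it compares the average distance between points in the same group with the average distance between points in different groups.

```python
import math
from itertools import combinations

def mean_distance(group_a, group_b=None):
    """Average Euclidean distance between pairs of points.

    With one group: average over all pairs inside that group.
    With two groups: average over all pairs with one point from each group.
    """
    if group_b is None:
        pairs = list(combinations(group_a, 2))
    else:
        pairs = [(p, q) for p in group_a for q in group_b]
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

# Hypothetical toy data: two groups of 2-D points.
group_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
group_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

within_a = mean_distance(group_a)
within_b = mean_distance(group_b)
between = mean_distance(group_a, group_b)

# In a good clustering, points are closer to their own group than to the other.
print(within_a < between and within_b < between)  # True
```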
Types of Clustering Methods:
• Centroid-based Clustering
• Density-based Clustering
• Distribution-based Clustering
• Hierarchical Clustering
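As an illustration of centroid-based clustering, the sketch below implements k-means (Lloyd's algorithm), the most widely used centroid-based method. The choice of k-means, the random initialization from the data points, and the toy data are assumptions for illustration; the slides do not specify an algorithm.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) on tuples of coordinates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]  # leave an empty cluster's centroid in place
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))  # [(1.5, 1.5), (8.5, 8.5)]
```

Each centroid ends up at the mean of its cluster, which is what makes the method "centroid-based"; the number of clusters k must be chosen in advance.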
Applications of Clustering:

• In Identification of Cancer Cells: Clustering algorithms are widely
used to identify cancerous cells. They divide cancerous and
non-cancerous data sets into different groups.
• In Search Engines: Search engines also use clustering. Results are
returned based on the objects closest to the search query, which is done
by grouping similar data objects together, away from dissimilar objects.
• In Biology: Clustering is used in biology to classify different species of
plants and animals using image recognition techniques.
Shifting of data points is done with the help of a kernel density function.
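This line appears to describe mean-shift clustering, in which each point is repeatedly moved to the kernel-weighted mean of the data around it until it settles on a peak of the estimated density. The sketch below assumes mean shift with a Gaussian kernel on 1-D data; the bandwidth, tolerance, and data values are hypothetical choices for illustration.

```python
import math

def gaussian_kernel(distance, bandwidth):
    """Unnormalized Gaussian kernel weight for a given distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def mean_shift_1d(data, bandwidth=0.5, max_iter=100, tol=1e-7):
    """Shift each 1-D point to the kernel-weighted mean of the data around it,
    repeating until no point moves more than tol (it has reached a density peak)."""
    shifted = list(data)
    for _ in range(max_iter):
        largest_move = 0.0
        for i, x in enumerate(shifted):
            # Weights always come from the original data, not the shifted points.
            weights = [gaussian_kernel(abs(x - d), bandwidth) for d in data]
            new_x = sum(w * d for w, d in zip(weights, data)) / sum(weights)
            largest_move = max(largest_move, abs(new_x - x))
            shifted[i] = new_x
        if largest_move < tol:
            break
    return shifted

data = [1.0, 1.2, 1.4, 5.0, 5.2, 5.4]
shifted = mean_shift_1d(data)
# Points that converge to the same density peak form one cluster.
modes = sorted({round(x, 2) for x in shifted})
print(modes)  # [1.2, 5.2]
```

The bandwidth controls how smooth the density estimate is: a much larger bandwidth would merge the two groups into a single peak.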
[Figure: point types in density-based clustering: core point, border point, noise point]
• The functioning of DBSCAN (Density-Based Spatial Clustering of
Applications with Noise) is based on two parameters: the size of the
neighbourhood (ε) and the minimum number of points (m).
• 1. Core point – A point is called a core point if it has at least m
points (counting itself) within its ε-neighbourhood.
• 2. Border point – A point is called a border point if it has fewer
than m points within its ε-neighbourhood but lies in the
ε-neighbourhood of a core point.
• 3. Noise point – A point that is neither a core point nor a border
point.
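The three point types can be computed directly from these definitions. The following sketch assumes Euclidean distance, counts a point as part of its own ε-neighbourhood, and uses the hypothetical parameter names `eps` and `min_pts` for ε and m. It only labels points; full DBSCAN additionally grows clusters outward from the core points.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border' or 'noise' (DBSCAN definitions).

    Assumption: a point's eps-neighbourhood includes the point itself,
    and a core point needs at least min_pts points in it.
    """
    neighbourhoods = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(nbrs) >= min_pts for nbrs in neighbourhoods]
    labels = []
    for i, nbrs in enumerate(neighbourhoods):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in nbrs):
            labels.append("border")  # not core, but within eps of a core point
        else:
            labels.append("noise")
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (5, 5)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)  # ['core', 'core', 'core', 'core', 'border', 'noise']
```

Here (2, 1) has only two points in its neighbourhood (itself and (1, 1)), so it is not a core point, but (1, 1) is a core point, which makes (2, 1) a border point; (5, 5) is isolated and becomes noise.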
The Apriori algorithm is used to compute association rules between
objects, that is, how two or more objects are related to one another.

In other words, Apriori is an association rule learning algorithm that
discovers patterns such as: people who bought product A also bought
product B. Example: bread and jam.
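Association rules such as Bread → Jam are commonly scored with support (how often the items appear together across all transactions) and confidence (how often Jam appears in the transactions that contain Bread). These measures are not defined in the text above; the sketch assumes their standard definitions and uses a hypothetical list of shopping baskets.

```python
def support_count(transactions, itemset):
    """Number of transactions containing every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def support(transactions, itemset):
    """Fraction of all transactions that contain the itemset."""
    return support_count(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Share of transactions containing the antecedent that also contain
    the consequent, i.e. count(A and B) / count(A)."""
    both = set(antecedent) | set(consequent)
    return support_count(transactions, both) / support_count(transactions, antecedent)

# Hypothetical shopping baskets.
transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "jam", "eggs"},
]

print(support(transactions, {"bread", "jam"}))       # 0.6
print(confidence(transactions, {"bread"}, {"jam"}))  # 0.75
```

Bread appears in 4 of the 5 baskets and bread with jam in 3, so the rule Bread → Jam has support 3/5 and confidence 3/4.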

The primary objective of the Apriori algorithm is to create association
rules between different objects.

The Apriori algorithm is used to mine frequent product sets (itemsets)
and the relevant association rules.
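The frequent-itemset step can be sketched as a level-wise search: find the frequent single items, combine frequent sets into larger candidates, and discard any candidate that has an infrequent subset (the Apriori property). This is a minimal, unoptimized sketch; the threshold name `min_count` (a minimum support count rather than a fraction) and the baskets are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise Apriori search for frequent itemsets.

    Returns {frozenset(itemset): support count} for every itemset contained
    in at least min_count transactions.
    """
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if count(frozenset([i])) >= min_count}
    frequent = {s: count(s) for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Keep only candidates that meet the minimum support count.
        current = {c for c in candidates if count(c) >= min_count}
        frequent.update({c: count(c) for c in current})
        k += 1
    return frequent

# Hypothetical shopping baskets.
transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "jam", "eggs"},
]
frequent = apriori(transactions, min_count=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With a minimum count of 2, butter is dropped at the first level, and {bread, jam} is the only frequent pair; association rules such as Bread → Jam are then generated from these frequent itemsets.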
