You are on page 1of 12

Density-Based Clustering Algorithm

Presented by –
Rohit Paul
Disadvantages of Partitioning Method
Choosing k manually
Clustering data of varying sizes and density
Sensitive to outliers

Figure 1 Figure 2
Density-based Method
Cluster in a data space is a contiguous region of high
point density
Separated by lower density of points
Density within the areas of noise is assumed to be
lower
DBSCAN
Density Based Spatial Clustering of Applications
with Noise
How is the density estimated?
Estimates the density by counting the number of points
in a fixed-radius neighborhood
KEY IDEA: For each point of a cluster the
neighborhood of a given radius has to contain at least
a minimum number of points
Two input parameters required:
Epsilon(ε)
MinPts
ε-Neighborhood: Objects within a radius ε from an
object
Three types of points based on ε-Neighborhood:
Core Point
Border Point
Noise Point
Terminology used:
 Directly density-reachable
o A point q is directly density-reachable from a core
point p if q is within the Eps-neighborhood of p.
 Density-reachable
o A point q is density reachable
from p if there are a set of core
points leading from p to q.
 Density-connected
o Two points p and q are density connected if there
are a core point o, such that both p and q are density
reachable from o
A cluster satisfy the following conditions:
If p be a core point and the set of all point which
are density-reachable from p be O. Then, this set
O is a cluster with respect to Eps and Minpts.
 For all p and q in cluster C, p is density-
connected to q with respect to Eps and Minpts.
Noise, a set of points which do not belongs to any
clusters
Pseudo code of DBSCAN Algorithm
Scikit-learn implementation
Reference
https://developers.google.com/machine-
learning/clustering/algorithm/advantages-disadvantages
https://towardsdatascience.com/dbscan-clustering-
explained-97556a2ad556
https://scikit-
learn.org/stable/modules/generated/sklearn.cluster.DBS
CAN.html
Martin Ester, Hans-Peter Kriegel, J¨org Sander, and
Xiaowei Xu. A density-based algorithm for discovering
clusters in large spatial databases with noise.
Thank You

You might also like