Professional Documents
Culture Documents
Density Based
Density Based
Basic Idea:
Clusters are dense regions in the
data space, separated by regions of
lower object density
Results of a k-medoid
algorithm for k=4
ε-Neighborhood of p
ε ε ε-Neighborhood of q
q p
Density of p is “high” (MinPts = 4)
Density of q is “low” (MinPts = 4)
Core, Border & Outlier
Outlier Given e and MinPts,
categorize the objects into
Border three exclusive groups.
Minpts = 3
Eps=radius
of the
circles
Density-Reachability
Directly density-reachable
An object q is directly density-reachable from
object p if p is a core object and q is in p’s e-
neighborhood.
p
p is (indirectly) density-reachable
p2 from q
p1 q is not density- reachable from p?
q
MinPts = 7
Density-Connectivity
Density-reachable is not symmetric
q not good enough to describe clusters
Density-Connected
q A pair of points p and q are density-connected
if they are commonly density-reachable from a
point o.
Density-connectivity is
symmetric
p q
o
Formal Description of Cluster
DBScan Algorithm
DBSCAN: The Algorithm
– Arbitrary select a point p
– Continue the process until all of the points have been processed.
DBSCAN Algorithm: Example
• Parameter
• e = 2 cm
• MinPts = 3
for each o D do
if o is not yet classified then
if o is a core-object then
collect all objects density-reachable from o
and assign them to a new cluster.
else
assign o to NOISE
DBSCAN Algorithm: Example
• Parameter
• e = 2 cm
• MinPts = 3
for each o D do
if o is not yet classified then
if o is a core-object then
collect all objects density-reachable from o
and assign them to a new cluster.
else
assign o to NOISE
DBSCAN Algorithm: Example
• Parameter
• e = 2 cm
• MinPts = 3
for each o D do
if o is not yet classified then
if o is a core-object then
collect all objects density-reachable from o
and assign them to a new cluster.
else
assign o to NOISE
MinPts = 5
e
P1
e
e P
C1 C1
C1 P
e
C1
C1
e
e
e
C1 C1
C1
Example
e = 10, MinPts = 4
When DBSCAN Works Well
• Resistant to Noise
• Can handle clusters of different shapes and sizes
When DBSCAN Does NOT Work Well
(MinPts=4, Eps=9.92).
Original Points
(MinPts=4, Eps=9.75)
DBSCAN: Sensitive to Parameters
Determining the Parameters e and MinPts
p 3-distance(p) :
q
3-distance(q) :
3-distance
first „valley“
Objects
• Problematic example
A C
F A, B, C
E
B, D, E
3-Distance
G
B‘, D‘, F, G
G1
G3 D1, D2,
D G2 G1, G2, G3
B D’
B’ D1
D2 Objects
Density Based Clustering: Discussion
• Advantages
– Clusters can have arbitrary shape and size
– Number of clusters is determined automatically
– Can separate clusters from surrounding noise
• Disadvantages
– Input parameters may be difficult to determine
– In some situations very sensitive to input
parameter setting