Fast Clustering Technique - DBSCAN: Prof. Priyadarshan Dhabe

Seminar On
VI
Fast Clustering Technique - DBSCAN
Vishwakarma Institute of Technology
By
22 Sanika Divekar
25 Vighnesh Gadge
28 Atharva Jadhav
30 Sanket Jadhav
58 Tejas Pacharne
Guided by: Prof. Priyadarshan Dhabe

Department of IT & MCA, VIT, Pune
Content
• How Clustering Came Into Picture?
• What Is Clustering?
• Types Of Clustering
• Density Based Clustering
• Measures Of DBSCAN
• Reachability And Connectivity
• Working Of DBSCAN
• Advantages
• Disadvantages
• Applications
• Conclusion
• References
How Clustering Came Into Picture?
• Unstructured data
• Cluster
[Figure: a clustering algorithm turns unlabeled data into labeled, clustered data. Source: https://medium.com]
What is Clustering?
Clustering refers to the task of identifying groups or clusters in a data set. [1]
[Figure: grouping by color, shape, and texture. Source: www.istockphoto.com]
Why Clustering? [2]
1. Organizing data into clusters reveals the internal structure of the data.
Examples: Clusty, clustering genes
2. Sometimes the partitioning itself is the goal.
Example: market segmentation
3. Clustering prepares data for other AI techniques.
Example: summarizing news
4. Clustering techniques are useful for knowledge discovery in data.
Examples: underlying rules, recurring patterns
Types Of Clustering
• Partitioning Method
• Hierarchical Clustering
• Fuzzy Clustering
• Model Based Clustering
• Density Based Clustering
[3]
Density Based Clustering
Fast Clustering Technique - DBSCAN [4]
A cluster is a dense region of points that is separated from other
high-density regions by regions of low density.
Source: https://www.researchgate.net
Measures of DBSCAN
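• Epsilon (ε):
The radius of the neighborhood around each point. A suitable value can be
read off the k-distance graph, typically where the curve bends most sharply.
• minPoints:
The minimum number of points that must lie within ε of a point for its
neighborhood to count as dense. The value of minPoints should be at least
one greater than the number of dimensions of the dataset, i.e.,
minPoints >= Dimensions + 1 [5]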
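The k-distance idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names `k_distances` and `min_points_lower_bound`, the toy points, and the choice k = 2 are all assumptions made for the example.

```python
import math


def k_distances(points, k):
    """Return each point's distance to its k-th nearest neighbor, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point (p itself is excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def min_points_lower_bound(n_dimensions):
    """Rule of thumb from the slide: minPoints >= Dimensions + 1."""
    return n_dimensions + 1


# Hypothetical toy data: a tight square of four points plus one distant outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]

# Plotting these sorted values gives the k-distance graph; the sharp jump
# between the fourth and fifth value marks the bend used to pick epsilon.
curve = k_distances(points, k=2)
print(curve)
print(min_points_lower_bound(2))
```

In this toy set the four square corners share the same 2-distance while the outlier's value is far larger, so an epsilon slightly above the flat part of the curve would separate the dense group from the outlier.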
[Figure. Source: https://towardsdatascience.com]
Basic Terminologies for DBSCAN
[Figure: basic DBSCAN terminology. Source: https://www.researchgate.net] [5]

Reachability and Connectivity
[Figure: two panels, "Reachable" and "Connected". Source: https://www.researchgate.net] [5]
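Reachability and connectivity can be made concrete with a small Python sketch. It assumes the usual DBSCAN definitions (a core point has at least min_pts points, itself included, within eps); the function names, the toy points, and the values eps = 1.0 and min_pts = 3 are illustrative choices, not taken from the slides.

```python
import math


def neighbors(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def is_core(points, i, eps, min_pts):
    """A core point has at least min_pts points in its eps-neighborhood."""
    return len(neighbors(points, i, eps)) >= min_pts


def directly_reachable(points, q, p, eps, min_pts):
    """q is directly density-reachable from p if p is core and q is within eps of p."""
    return is_core(points, p, eps, min_pts) and q in neighbors(points, p, eps)


def density_reachable(points, q, p, eps, min_pts):
    """q is density-reachable from p if a chain of direct steps leads from p to q."""
    seen, frontier = {p}, [p]
    while frontier:
        current = frontier.pop()
        if not is_core(points, current, eps, min_pts):
            continue  # only core points can extend the chain
        for j in neighbors(points, current, eps):
            if j == q:
                return True
            if j not in seen:
                seen.add(j)
                frontier.append(j)
    return False


def density_connected(points, a, b, eps, min_pts):
    """a and b are density-connected if both are density-reachable from some point o."""
    return any(
        density_reachable(points, a, o, eps, min_pts)
        and density_reachable(points, b, o, eps, min_pts)
        for o in range(len(points))
    )


# Hypothetical toy data: five evenly spaced points on a line.
# With eps = 1.0 and min_pts = 3 the two end points are border points.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
print(directly_reachable(points, 0, 1, 1.0, 3))  # end point from its core neighbor
print(directly_reachable(points, 1, 0, 1.0, 3))  # the reverse direction
print(density_connected(points, 0, 4, 1.0, 3))   # the two end points
```

The example highlights that reachability is not symmetric: a border point can be reached from a core point but cannot itself start a chain, which is why connectivity is defined through a shared point from which both are reachable.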
Working of DBSCAN
[Figure: DBSCAN example with Eps = 0.8 and minPts = 4, showing the radius ε, a cluster, a border point, and noise. Source: Group 15. Source: https://wikipedia.org/]
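The working of DBSCAN can be sketched as a short, unoptimized Python function. This is a minimal sketch of the standard algorithm using a brute-force neighborhood search; the function name `dbscan`, the label convention (-1 for noise), and the toy points are assumptions for illustration. Only the parameter values Eps = 0.8 and minPts = 4 come from the slide.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)

    def region_query(i):
        # Brute-force eps-neighborhood of point i (includes i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already processed
        seeds = region_query(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region_query(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is core, so its neighbors are reachable
    return labels


# Hypothetical toy data: one border point, a dense group of four, one outlier.
points = [(1.2, 0.5), (0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (5.0, 5.0)]
labels = dbscan(points, eps=0.8, min_pts=4)
print(labels)
```

The first point is visited before any core point, so it is provisionally marked as noise and later absorbed into the cluster as a border point; the distant point stays noise.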
Advantages of DBSCAN
• DBSCAN does not require the number of clusters to be specified
beforehand.
• DBSCAN can find clusters of arbitrary shape; a cluster does not have
to be circular.
• DBSCAN has a notion of noise and is robust to outliers. [3]
Disadvantages of DBSCAN
• Struggles with datasets whose clusters have widely varying densities,
because a single eps and min_pts pair cannot fit all of them.
• Sensitive to the clustering hyper-parameters, eps and min_pts.
• Fails if the data is too sparse.
• The density measures (reachability and connectivity) can be affected by
sampling. [3]
Applications of DBSCAN
• Document Network Analysis
• Recommendation Systems
• X-ray Crystallography
• Social Network Analysis
[6]
Conclusion
• Density-based clustering has been applied successfully for cluster
analysis in many different contexts.
• In general, density-based clustering aims to identify clusters as
areas of high point density that are separated by areas of low point
density and that can therefore be arbitrarily shaped in the data space.
References
[1] Amandeep Kaur Mann and Navneet Kaur, “Review Paper on Clustering Techniques”, Global Journals Inc.
USA, Volume 13, Issue 5, pp. 43-47, 2013. Available: https://core.ac.uk/reader/231159370 (Accessed Sept.
29, 2021).
[2] M. A. Deshmukh and R. A. Gulhane, “Importance of Clustering in Data Mining”, International Journal of
Scientific & Engineering Research, Volume 7, Issue 2, pp. 247-251, February 2016. Available:
https://www.ijser.org/researchpaper/Importance-of-Clustering-in-Data-Mining.pdf
[3] Albou Kadel, “Types of Clustering Methods: Overview and Quick Start R Code”. Available:
https://www.datanovia.com/en/blog/types-of-clustering-methods-overview-and-quick-start-r-code/
(Accessed Sept. 29, 2021).
[4] Yewang Chen, Lida Zhou, Nizar Bouguila, Cheng Wang, Yi Chen, and Jixiang Du, “BLOCK-DBSCAN: Fast
clustering for large scale data”, Pattern Recognition, China, Volume 109, January 2021, 107624. Available:
https://www.sciencedirect.com/science/article/abs/pii/S003132032030421
[5] Pradeep Singh and Prateek A. Meshram, “Survey of Density Based Clustering Algorithms and its
Variants”, International Conference on Inventive Computing and Informatics (ICICI 2017), Coimbatore,
India, pp. 920-926, 2017. Available: https://sci-hub.mksa.top/10.1109/ICICI.2017.8365272
[6] Sunit Prasad, “Different Types of Clustering Methods and Applications”. Available:
https://www.analytixlabs.co.in/blog/types-of-clustering-algorithms/ (Accessed Oct. 1, 2021).
Thank You
Source: https://ml-explained.com
