
Seminar On

VI
Fast Clustering Technique - DBSCAN
Vishwakarma Institute of Technology

By

22 Sanika Divekar
25 Vighnesh Gadge
28 Atharva Jadhav
30 Sanket Jadhav
58 Tejas Pacharne

Guided by:- Prof. Priyadarshan Dhabe.


Department Of IT & MCA, VIT, Pune
Content
• How Clustering Came Into Picture?
• What Is Clustering?
• Types Of Clustering
• Density Based Clustering
• Measures Of DBSCAN
• Reachability And Connectivity
• Working Of DBSCAN
• Advantages
• Disadvantages
• Application
• Conclusion
• References
How Clustering Came Into Picture?
• Unstructured data
• Cluster

[Figure: Unlabeled Data, passed through a Clustering Algorithm, becomes Labeled Clustered data]
Source: https://medium.com
What is Clustering?
Clustering refers to the task of identifying groups or clusters in a data set. [1]

[Figure: examples of grouping by Color, Shape, and Texture]
Source: www.istockphoto.com
Why Clustering? [2]
1. Organizing data into clusters shows the internal structure of the data
   Examples: Clusty, clustering genes
2. Sometimes the partitioning itself is the goal
   Example: market segmentation
3. Preparing data for other AI techniques
   Example: summarizing news
4. Clustering techniques are useful for knowledge discovery in data
   Examples: underlying rules, recurring patterns
Types Of Clustering
• Partitioning Method
• Hierarchical Clustering
• Fuzzy Clustering
• Model Based Clustering
• Density Based Clustering
[3]
Density Based Clustering
Fast Clustering Technique - DBSCAN [4]
A cluster is a dense region of points that is separated from other
high-density regions by regions of low density.

Source: https://www.researchgate.net
Measures of DBSCAN
• Epsilon:
The value of epsilon can be chosen from the k-distance graph, typically at the
point where the sorted curve bends sharply (the "knee").
• minPoints:
The value of minPoints should be at least one greater than the
number of dimensions of the dataset, i.e.,
minPoints >= Dimensions + 1 [5]
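
The two rules above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `k_distances`, `min_points_lower_bound`, and `largest_jump` are hypothetical, the sample points are made up, choosing k = minPoints - 1 is one common convention rather than something the slides specify, and the "largest jump" heuristic is a crude stand-in for reading the knee of the k-distance graph by eye.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending.

    Plotting this sorted list gives the k-distance graph.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

def min_points_lower_bound(dimensions):
    """Rule of thumb from the slide: minPoints >= Dimensions + 1."""
    return dimensions + 1

def largest_jump(sorted_values):
    """Index of the value just before the largest increase (crude knee estimate)."""
    jumps = [b - a for a, b in zip(sorted_values, sorted_values[1:])]
    return jumps.index(max(jumps))

# Made-up 2-D data: four points on a unit square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]

min_points = min_points_lower_bound(dimensions=2)
# Assumption: the point itself counts toward minPoints, so look at the
# (minPoints - 1)-th nearest neighbor.
kd = k_distances(points, k=min_points - 1)
knee = largest_jump(kd)

print("minPoints:", min_points)
print("sorted k-distances:", kd)
print("candidate epsilon (value just before the jump):", kd[knee])
```

On real data the knee is usually less clear-cut, so the candidate value is a starting point to be checked against the resulting clusters, not a final answer.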

[Figure]
Source: https://towardsdatascience.com
Basic Terminologies for DBSCAN
[Figure]
Source: https://www.researchgate.net [5]


Reachability and Connectivity
[Figure: two panels, Reachable and Connected] [5]
Source: https://www.researchgate.net
Working of DBSCAN
[Figure: DBSCAN example with Eps = 0.8 and minPts = 4; labels: Cluster, Border Point, Noise, ε]
Source: Group 15
Source: https://wikipedia.org/
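
To make the working of DBSCAN concrete, here is a minimal from-scratch sketch in plain Python using the slide's parameters (Eps = 0.8, minPts = 4). It assumes Euclidean distance and that a point counts as its own neighbor; the function names (`region_query`, `dbscan`, `point_types`) and the sample points are made up for illustration. It is a brute-force version for small inputs, not the fast variant discussed in [4].

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for noise."""
    labels = [None] * len(points)          # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE              # may become a border point later
            continue
        cluster_id += 1                    # i is a core point: start a cluster
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id     # reachable non-core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is core: keep expanding
    return labels

def point_types(points, labels, eps, min_pts):
    """Classify each point as 'core', 'border', or 'noise'."""
    types = []
    for i, label in enumerate(labels):
        if len(region_query(points, i, eps)) >= min_pts:
            types.append("core")
        elif label != NOISE:
            types.append("border")
        else:
            types.append("noise")
    return types

# Made-up data: one tight group, one point on its edge, one isolated point.
points = [(1.2, 0.5), (0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
          (0.5, 0.5), (0.25, 0.25), (5.0, 5.0)]

labels = dbscan(points, eps=0.8, min_pts=4)
types = point_types(points, labels, eps=0.8, min_pts=4)
for p, label, kind in zip(points, labels, types):
    print(p, label, kind)
```

The first point is visited before any core point and is provisionally marked as noise; it is relabeled as a border point once the cluster expands to it, which is why the provisional noise label must stay open to revision.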
Advantages of DBSCAN
• DBSCAN does not require one to specify the number of clusters
beforehand.

• DBSCAN performs well with arbitrarily shaped clusters.

• DBSCAN has a notion of noise, and is robust to outliers.

• DBSCAN can find clusters of any shape. A cluster does not have
to be circular. [3]
Disadvantages of DBSCAN
• Does not work well with datasets of varying densities.

• Sensitive to the clustering hyperparameters (eps and min_pts).

• Fails if the data is too sparse.

• The density measures (Reachability and Connectivity) can be affected by
sampling. [3]
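
The first two points can be illustrated with a small sketch. It only counts core points, under the assumptions that distance is Euclidean and that a point counts as its own neighbor; the data and the helper name `core_flags` are made up for illustration. A group with no core points cannot form a cluster, so DBSCAN would label all of its points as noise.

```python
import math

def core_flags(points, eps, min_pts):
    """True for each point that has at least min_pts points within eps (itself included)."""
    flags = []
    for p in points:
        neighbors = sum(1 for q in points if math.dist(p, q) <= eps)
        flags.append(neighbors >= min_pts)
    return flags

# Made-up data with two densities along a line.
dense = [(0.1 * i, 0.0) for i in range(10)]          # spacing 0.1
sparse = [(10.0 + 1.0 * i, 0.0) for i in range(10)]  # spacing 1.0
points = dense + sparse

results = {}
for eps in (0.25, 1.5):
    flags = core_flags(points, eps=eps, min_pts=3)
    results[eps] = (sum(flags[:10]), sum(flags[10:]))
    print(f"eps={eps}: core points in dense group = {results[eps][0]}, "
          f"in sparse group = {results[eps][1]}")
```

With the smaller eps the sparse group has no core points at all, while the larger eps recovers most of it. A single global eps has to suit every region of the data, which is why results change so sharply with the parameter choice.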

Application of DBSCAN
01 Document Network Analysis
02 Recommendation Systems
03 X-ray Crystallography
04 Social Network Analysis
[6]
Conclusion
• Density-based clustering has been applied successfully for cluster
analysis in many different contexts.

• In general, density-based clustering aims at identifying clusters as
areas of high point density that are separated by areas of low point
density and, thus, can be arbitrarily shaped in the data space.
References
[1] Amandeep Kaur Mann & Navneet Kaur, “Review Paper on Clustering Techniques”, Global Journals Inc. USA, Volume 13, Issue 5, pp. 43-47, 2013, Available: https://core.ac.uk/reader/231159370 (Accessed Sept. 29, 2021).
[2] M. A. Deshmukh, R. A. Gulhane, “Importance of Clustering in Data Mining”, International Journal of Scientific & Engineering Research, Volume 7, Issue 2, pp. 247-251, February 2016, Available: https://www.ijser.org/researchpaper/Importance-of-Clustering-in-Data-Mining.pdf
[3] Albou Kadel, “Types of Clustering Methods: Overview and Quick Start R Code”, https://www.datanovia.com/en/blog/types-of-clustering-methods-overview-and-quick-start-r-code/ (Accessed Sept. 29, 2021).
[4] Yewang Chen, Lida Zhou, Nizar Bouguila, Cheng Wang, Yi Chen, Jixiang Du, “BLOCK-DBSCAN: Fast clustering for large scale data”, Pattern Recognition, China, Volume 109, January 2021, 107624, Available: https://www.sciencedirect.com/science/article/abs/pii/S003132032030421
[5] Pradeep Singh, Prateek A. Meshram, “Survey of Density Based Clustering Algorithms and its Variants”, International Conference on Inventive Computing and Informatics (ICICI 2017), Coimbatore, India, pp. 920-926, 2017, Available: https://sci-hub.mksa.top/10.1109/ICICI.2017.8365272
[6] Sunit Prasad, “Different Types of Clustering Methods and Applications”, https://www.analytixlabs.co.in/blog/types-of-clustering-algorithms/ (Accessed Oct. 1, 2021).
Thank You

Source: https://ml-explained.com