Welcome to Scribd!

Skip carousel

Basics of Hierarchical Clustering: Shaumik Daityari

Uploaded by

NourheneMbarek

0% found this document useful (0 votes)

7 views30 pages

Original Title

chapter2

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

7 views30 pages

Basics of Hierarchical Clustering: Shaumik Daityari

Uploaded by

NourheneMbarek

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 30

Search inside document

Basics of hierarchical

clustering
CLUS TERIN G METH ODS W ITH S CIP Y

Shaumik Daityari
Business Analyst
Creating a distance matrix using linkage
scipy.cluster.hierarchy.linkage(observations,
method='single',
metric='euclidean',
optimal_ordering=False
)

method : how to calculate the proximity of clusters

metric : distance metric

optimal_ordering : order data points

CLUSTERING METHODS WITH SCIPY

Which method should use?
single: based on two closest objects

complete: based on two farthest objects

average: based on the arithmetic mean of all objects

centroid: based on the geometric mean of all objects

median: based on the median of all objects

ward: based on the sum of squares

CLUSTERING METHODS WITH SCIPY

Create cluster labels with fcluster
scipy.cluster.hierarchy.fcluster(distance_matrix,
num_clusters,
criterion
)

distance_matrix : output of linkage() method

num_clusters : number of clusters

criterion : how to decide thresholds to form clusters

CLUSTERING METHODS WITH SCIPY

Hierarchical clustering with ward method

CLUSTERING METHODS WITH SCIPY

Hierarchical clustering with single method

CLUSTERING METHODS WITH SCIPY

Hierarchical clustering with complete method

CLUSTERING METHODS WITH SCIPY

Final thoughts on selecting a method
No one right method for all

Need to carefully understand the distribution of data

CLUSTERING METHODS WITH SCIPY

Let's try some
exercises
CLUS TERIN G METH ODS W ITH S CIP Y
Visualize clusters
CLUS TERIN G METH ODS W ITH S CIP Y

Shaumik Daityari
Business Analyst
Why visualize clusters?
Try to make sense of the clusters formed

An additional step in validation of clusters

Spot trends in data

CLUSTERING METHODS WITH SCIPY

An introduction to seaborn
seaborn : a Python data visualization library based on matplotlib

Has better, easily modi able aesthetics than matplotlib!

Contains functions that make data visualization tasks easy in the context of data analytics

Use case for clustering: hue parameter for plots

CLUSTERING METHODS WITH SCIPY

Visualize clusters with matplotlib
from matplotlib import pyplot as plt

df = pd.DataFrame({'x': [2, 3, 5, 6, 2],

'y': [1, 1, 5, 5, 2],
'labels': ['A', 'A', 'B', 'B', 'A']})

colors = {'A':'red', 'B':'blue'}

df.plot.scatter(x='x',
y='y',
c=df['labels'].apply(lambda x: colors[x]))
plt.show()

CLUSTERING METHODS WITH SCIPY

Visualize clusters with seaborn
from matplotlib import pyplot as plt
import seaborn as sns

df = pd.DataFrame({'x': [2, 3, 5, 6, 2],

'y': [1, 1, 5, 5, 2],
'labels': ['A', 'A', 'B', 'B', 'A']})

sns.scatterplot(x='x',
y='y',
hue='labels',
data=df)
plt.show()

CLUSTERING METHODS WITH SCIPY

Comparison of both methods of visualization
MATPLOTLIB PLOT SEABORN PLOT

CLUSTERING METHODS WITH SCIPY

Next up: Try some
visualizations
CLUS TERIN G METH ODS W ITH S CIP Y
How many clusters?
CLUS TERIN G METH ODS W ITH S CIP Y

Shaumik Daityari
Business Analyst
Introduction to dendrograms
Strategy till now - decide clusters on visual
inspection

Dendrograms help in showing progressions as

clusters are merged

A dendrogram is a branching diagram that

demonstrates how each cluster is composed by
branching out into its child nodes

CLUSTERING METHODS WITH SCIPY

Create a dendrogram in SciPy
from scipy.cluster.hierarchy import dendrogram

Z = linkage(df[['x_whiten', 'y_whiten']],
method='ward',
metric='euclidean')

dn = dendrogram(Z)
plt.show()

CLUSTERING METHODS WITH SCIPY

CLUSTERING METHODS WITH SCIPY
CLUSTERING METHODS WITH SCIPY
CLUSTERING METHODS WITH SCIPY
CLUSTERING METHODS WITH SCIPY
CLUSTERING METHODS WITH SCIPY
Next up - try some
exercises
CLUS TERIN G METH ODS W ITH S CIP Y
Limitations of
hierarchical
clustering
CLUS TERIN G METH ODS W ITH S CIP Y

Shaumik Daityari
Business Analyst
Measuring speed in hierarchical clustering
timeit module

Measure the speed of .linkage() method

Use randomly generated points

Run various iterations to extrapolate

CLUSTERING METHODS WITH SCIPY

Use of timeit module
from scipy.cluster.hierarchy import linkage
import pandas as pd
import random, timeit

points = 100
df = pd.DataFrame({'x': random.sample(range(0, points), points),
'y': random.sample(range(0, points), points)})

%timeit linkage(df[['x', 'y']], method = 'ward', metric = 'euclidean')

1.02 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

CLUSTERING METHODS WITH SCIPY

Comparison of runtime of linkage method
Increasing runtime with data points

Quadratic increase of runtime

Not feasible for large datasets

CLUSTERING METHODS WITH SCIPY

Next up - exercises
CLUS TERIN G METH ODS W ITH S CIP Y

Chapter2 Hierarchical Clustering
Document30 pages
Chapter2 Hierarchical Clustering
Komi David ABOTSITSE
No ratings yet
Cluster Analysis in Python Chapter2 PDF
Document30 pages
Cluster Analysis in Python Chapter2 PDF
Fgpeqw
No ratings yet
Chapter 2
Document30 pages
Chapter 2
Cerveza La Gaceta
No ratings yet
Unsupervised Learning: Basics: Shaumik Daityari
Document31 pages
Unsupervised Learning: Basics: Shaumik Daityari
NourheneMbarek
No ratings yet
Chapter3 PDF
Document25 pages
Chapter3 PDF
NourheneMbarek
No ratings yet
Cluster Analysis in Python Chapter1 PDF
Document31 pages
Cluster Analysis in Python Chapter1 PDF
Fgpeqw
No ratings yet
Chapter 1
Document31 pages
Chapter 1
Cerveza La Gaceta
No ratings yet
Data Mining
Document18 pages
Data Mining
ayushchoudhary20289
No ratings yet
Dominant Colors in Images: Shaumik Daityari
Document30 pages
Dominant Colors in Images: Shaumik Daityari
NourheneMbarek
No ratings yet
AI With Python - Unsupervised Learning - Clustering
Document12 pages
AI With Python - Unsupervised Learning - Clustering
Chandu Chandrakanth
No ratings yet
21BEC505 Exp2
Document7 pages
21BEC505 Exp2
jay
No ratings yet
Supervised Learning With Scikit-Learn: How Good Is Your Model?
Document31 pages
Supervised Learning With Scikit-Learn: How Good Is Your Model?
Victor Ng
No ratings yet
Exp 8
Document5 pages
Exp 8
Soban Maruf
No ratings yet
11 Different Ways For Outlier Detection in Python
Document11 pages
11 Different Ways For Outlier Detection in Python
Neethu Merlin Alan
No ratings yet
Machine Learning
Document56 pages
Machine Learning
Mani Vrs
100% (3)
Lab Experiment No. 4 Name: Dhruv Jain SAP ID: 60004190030 Div/Batch: A/A2 Aim
Document9 pages
Lab Experiment No. 4 Name: Dhruv Jain SAP ID: 60004190030 Div/Batch: A/A2 Aim
Dhruv jain
No ratings yet
ML - Practical File
Document15 pages
ML - Practical File
Jatin Mathur
No ratings yet
How Good Is Your Model?: Andreas Müller
Document54 pages
How Good Is Your Model?: Andreas Müller
rickshark
No ratings yet
Supervised Learning With Scikit-Learn: Introduction To Regression
Document31 pages
Supervised Learning With Scikit-Learn: Introduction To Regression
NourheneMbarek
No ratings yet
Unit-Iv Material
Document24 pages
Unit-Iv Material
Udaya sri
No ratings yet
TD2345
Document3 pages
TD2345
ashitaka667
No ratings yet
Data Mining
Document98 pages
Data Mining
Jijeesh Baburajan
No ratings yet
FMLASS3Q7 - Jupyter Notebook
Document6 pages
FMLASS3Q7 - Jupyter Notebook
ch20b049
No ratings yet
JAVIER KMeans Clustering Jupyter Notebook
Document7 pages
JAVIER KMeans Clustering Jupyter Notebook
Alleah Laylo
No ratings yet
Is Simple Better?: Revisiting Simple Generative Models For Unsupervised Clustering
Document6 pages
Is Simple Better?: Revisiting Simple Generative Models For Unsupervised Clustering
Teerapat Jenrungrot
No ratings yet
25th International Conference On Computer Theory and Applications, ICCTA 2015, 24-26 October 2015, Alexandria, Egypt
Document196 pages
25th International Conference On Computer Theory and Applications, ICCTA 2015, 24-26 October 2015, Alexandria, Egypt
science2222
100% (1)
Data Mining: Clustering
Document46 pages
Data Mining: Clustering
shwetadhatterwal
No ratings yet
DWM Exp8 127 133 137
Document4 pages
DWM Exp8 127 133 137
Manav Purswani
No ratings yet
Fraud Detection in Python Chapter3
Document33 pages
Fraud Detection in Python Chapter3
Fgpeqw
No ratings yet
Techniques of Cluster Analysis: A Seminar On
Document25 pages
Techniques of Cluster Analysis: A Seminar On
VAIBHAV NANAWARE
No ratings yet
Diabetes Case Study - Jupyter Notebook
Document10 pages
Diabetes Case Study - Jupyter Notebook
Abhising
100% (1)
12s MidI - SampleExam Print1
Document8 pages
12s MidI - SampleExam Print1
Divya Gn
No ratings yet
Concepts and Techniques: - Chapter 10
Document97 pages
Concepts and Techniques: - Chapter 10
sebpky
No ratings yet
SUMERA - Kmeans Clustering - Jupyter Notebook
Document7 pages
SUMERA - Kmeans Clustering - Jupyter Notebook
Alleah Laylo
No ratings yet
St. John College of Engineering and Management, Palghar - Maharashtra
Document11 pages
St. John College of Engineering and Management, Palghar - Maharashtra
pranay
No ratings yet
CS583 Clustering
Document40 pages
CS583 Clustering
ruoibmt
No ratings yet
s18 Cu6051np Cw1 17031944 Nirakar Sigdel
Document18 pages
s18 Cu6051np Cw1 17031944 Nirakar Sigdel
Santosh Lamichhane
No ratings yet
Unit Ii DM
Document82 pages
Unit Ii DM
Suganthi D PSGRKCW
No ratings yet
Grouping
Document98 pages
Grouping
Aditya Patel
No ratings yet
Data Mining Algorithms in R - Clustering - Fuzzy Clustering - Fuzzy C-Means - Wikibooks, Open Books For An Open World
Document8 pages
Data Mining Algorithms in R - Clustering - Fuzzy Clustering - Fuzzy C-Means - Wikibooks, Open Books For An Open World
Snr Kofi Agyarko Ababio
No ratings yet
Machine Learning With Scikit-Learn: George Boorman
Document34 pages
Machine Learning With Scikit-Learn: George Boorman
AS
No ratings yet
1 An Introduction To Machine Learning With Scikit Learn
Document2 pages
1 An Introduction To Machine Learning With Scikit Learn
kehinde.ogunleye007
No ratings yet
Jaipur National University: Project Design With Seminar
Document26 pages
Jaipur National University: Project Design With Seminar
Faizan Shaikh
100% (1)
Chapter 2
Document49 pages
Chapter 2
Rajachandra Voodiga
No ratings yet
Designing Machine Learning Workflows in Python Chapter4
Document38 pages
Designing Machine Learning Workflows in Python Chapter4
Fgpeqw
No ratings yet
Subject: ML Name: Priyanshu Gandhi Date: 10/4/21 Expt. No.: 9 Roll No.: C008 Title: Clustering Implementation in Python
Document7 pages
Subject: ML Name: Priyanshu Gandhi Date: 10/4/21 Expt. No.: 9 Roll No.: C008 Title: Clustering Implementation in Python
Kartik Katekar
No ratings yet
Survey of Clustering Algorithms
Document37 pages
Survey of Clustering Algorithms
Aniket Roy
No ratings yet
Data Mining Business Report Set
Document12 pages
Data Mining Business Report Set
priyada16
No ratings yet
Python Machine Learning - Logistic Regression
Document1 page
Python Machine Learning - Logistic Regression
ahmed salem
No ratings yet
SVM K NN MLP With Sklearn Jupyter NoteBo
Document22 pages
SVM K NN MLP With Sklearn Jupyter NoteBo
Ahm Tharwat
No ratings yet
Cui 2014
Document11 pages
Cui 2014
ARI TUBIL
No ratings yet
Kabir Data Preprocessing Python
Document14 pages
Kabir Data Preprocessing Python
El Arbi Abdellaoui Alaoui
No ratings yet
Ex No: Date: K-Means Clustering Using Python: Scatter
Document10 pages
Ex No: Date: K-Means Clustering Using Python: Scatter
Jasmitha B
No ratings yet
Advanced Cluster Analysis: Clustering High-Dimensional Data
Document49 pages
Advanced Cluster Analysis: Clustering High-Dimensional Data
Priyanka Bhardwaj
No ratings yet
Artigo Smallex
Document17 pages
Artigo Smallex
Will Corleone
No ratings yet
Techniques of Cluster Analysis: A Seminar On
Document25 pages
Techniques of Cluster Analysis: A Seminar On
VAIBHAV NANAWARE
No ratings yet
The K-Modes Algorithm With Entropy Based Similarity Coefficient
Document6 pages
The K-Modes Algorithm With Entropy Based Similarity Coefficient
Deden Istiawan
No ratings yet
DATA MINING and MACHINE LEARNING: CLUSTER ANALYSIS and kNN CLASSIFIERS. Examples with MATLAB
From Everand
DATA MINING and MACHINE LEARNING: CLUSTER ANALYSIS and kNN CLASSIFIERS. Examples with MATLAB
César Pérez López
No ratings yet
Julia for Data Science
From Everand
Julia for Data Science
Anshul Joshi
No ratings yet
Data Science Revealed: With Feature Engineering, Data Visualization, Pipeline Development, and Hyperparameter Tuning
From Everand
Data Science Revealed: With Feature Engineering, Data Visualization, Pipeline Development, and Hyperparameter Tuning
Tshepo Chris Nokeri
No ratings yet
Explorer Award Exam: IA Pour L'ingénieur 3DNI ISITCOM 2019-2020
Document12 pages
Explorer Award Exam: IA Pour L'ingénieur 3DNI ISITCOM 2019-2020
NourheneMbarek
No ratings yet
SAAI1-AI Analyst 2019-Course Guide 1
Document166 pages
SAAI1-AI Analyst 2019-Course Guide 1
NourheneMbarek
No ratings yet
Logistic Regression and Regularization: Michael (Mike) Gelbart
Document19 pages
Logistic Regression and Regularization: Michael (Mike) Gelbart
NourheneMbarek
No ratings yet
Supervised Learning With Scikit-Learn: Preprocessing Data
Document32 pages
Supervised Learning With Scikit-Learn: Preprocessing Data
NourheneMbarek
No ratings yet
Welcome To The Course!: Michael (Mike) Gelbart
Document17 pages
Welcome To The Course!: Michael (Mike) Gelbart
NourheneMbarek
No ratings yet
Linear Classi Ers: Prediction Equations: Michael (Mike) Gelbart
Document22 pages
Linear Classi Ers: Prediction Equations: Michael (Mike) Gelbart
NourheneMbarek
No ratings yet
Supervised Learning With Scikit-Learn: Introduction To Regression
Document31 pages
Supervised Learning With Scikit-Learn: Introduction To Regression
NourheneMbarek
No ratings yet
Importing Data in Python I: Introduction To Relational Databases
Document33 pages
Importing Data in Python I: Introduction To Relational Databases
NourheneMbarek
No ratings yet
Introduction To Databases in Python: Calculating Values Ina Query
Document30 pages
Introduction To Databases in Python: Calculating Values Ina Query
NourheneMbarek
No ratings yet
Introduction To Databases in Python: Filtering and Targeting Data
Document32 pages
Introduction To Databases in Python: Filtering and Targeting Data
NourheneMbarek
No ratings yet
Cleaning Data in Python: Pu!ing It All Together
Document14 pages
Cleaning Data in Python: Pu!ing It All Together
NourheneMbarek
No ratings yet
Introduction To Databases in Python: Creating Databases and Tables
Document31 pages
Introduction To Databases in Python: Creating Databases and Tables
NourheneMbarek
No ratings yet
ch4 Slides PDF
Document44 pages
ch4 Slides PDF
NourheneMbarek
No ratings yet
Cleaning Data in Python
Document26 pages
Cleaning Data in Python
NourheneMbarek
No ratings yet
Data Mining Cheat Sheet
Document6 pages
Data Mining Cheat Sheet
NourheneMbarek
No ratings yet
Cleaning Data in Python
Document24 pages
Cleaning Data in Python
NourheneMbarek
No ratings yet
XL Miner User Guide
Document420 pages
XL Miner User Guide
Mary Williams
No ratings yet
Analytics and Decision Science
Document16 pages
Analytics and Decision Science
Duncan Wekesa
No ratings yet
Ebook 042 Tutorial Spss Hierarchical Cluster Analysis
Document17 pages
Ebook 042 Tutorial Spss Hierarchical Cluster Analysis
sneha
No ratings yet
Assignment 5 1
Document13 pages
Assignment 5 1
Unnati Thakkar
No ratings yet
2.2019 Calcium Data Clustering Hirarchial
Document18 pages
2.2019 Calcium Data Clustering Hirarchial
satyadeepika neelapala
No ratings yet
Business Analytics 1 Ca 2
Document26 pages
Business Analytics 1 Ca 2
Bharti
No ratings yet
Chap15 Cluster Analysis
Document55 pages
Chap15 Cluster Analysis
Jakhongir
No ratings yet
Cluto Clusterring Manual
Document71 pages
Cluto Clusterring Manual
vukmil
No ratings yet
The Classification of Style in Fine-Art Painting
Document10 pages
The Classification of Style in Fine-Art Painting
wobblegobble
No ratings yet
Clustering Algorithms
Document61 pages
Clustering Algorithms
Ayesha Khan
No ratings yet
Week 07 Lecture Material
Document49 pages
Week 07 Lecture Material
Meer Hassan
No ratings yet
New Hardness Results For Planar Graph Problems in P and An Algorithm For Sparsest Cut
Document34 pages
New Hardness Results For Planar Graph Problems in P and An Algorithm For Sparsest Cut
71 46
No ratings yet
A Beginner's Guide To Hierarchical Clustering
Document23 pages
A Beginner's Guide To Hierarchical Clustering
Abhinava
No ratings yet
MGNM801 Ca2
Document19 pages
MGNM801 Ca2
Atul Kumar
No ratings yet
10.cluster Analysis
Document68 pages
10.cluster Analysis
ironchefff
No ratings yet
UNIT V DWM Notes
Document18 pages
UNIT V DWM Notes
mjviru
No ratings yet
20bce0630 VL2022230504993 Pe003
Document11 pages
20bce0630 VL2022230504993 Pe003
Dominic Toretto
No ratings yet
Intro Clustering in R
Document346 pages
Intro Clustering in R
Amelia Sánchez Reyes
No ratings yet
Hierarchical Clustering
Document35 pages
Hierarchical Clustering
Wet
No ratings yet
Objective: For One Dimensional Data Set (7,10,20,28,35), Perform Hierarchical Clustering
Document13 pages
Objective: For One Dimensional Data Set (7,10,20,28,35), Perform Hierarchical Clustering
FARHAN KAMAL
No ratings yet
Practical Data Analysis Cookbook - Sample Chapter
Document31 pages
Practical Data Analysis Cookbook - Sample Chapter
Packt Publishing
100% (1)
An Introduction To Clustering and Different Methods of Clustering
Document9 pages
An Introduction To Clustering and Different Methods of Clustering
Leonor Patricia MEDINA SIFUENTES
No ratings yet
IRS Unit-4
Document13 pages
IRS Unit-4
Anonymous iCFJ73OMpD
50% (4)
07 Hierarchical Clustering
Document19 pages
07 Hierarchical Clustering
Oussama D. Otaku
No ratings yet
AMR - Assignment 1-Sample Solutions
Document7 pages
AMR - Assignment 1-Sample Solutions
M.F. Stoffijn
No ratings yet
ROCK Clustering Example
Document4 pages
ROCK Clustering Example
shehzad791
100% (2)
2 ADA Cluster Analysis
Document12 pages
2 ADA Cluster Analysis
Ash
No ratings yet
Python Machine Learning
Document19 pages
Python Machine Learning
duc anh
No ratings yet
Machine Learning Intro Final
Document74 pages
Machine Learning Intro Final
Jelena Nađ
No ratings yet