
CHAPTER 7:

CLUSTERING
DR AZLIN AHMAD
CONTENT

• K-Means
• K-Nearest Neighbor
WHAT IS CLUSTERING?

• Clustering:
  • a process of grouping similar objects into groups called clusters
  • the clusters reveal hidden patterns in the data set
  • widely used in numerous applications: pattern recognition, data analysis, image processing, life sciences, etc.
• Among the most popular traditional clustering algorithms are K-Means, K-Nearest Neighbors, Kohonen Self-Organizing Maps (KSOM), hierarchical clustering, etc.
• Clustering analysis is used to gain valuable insights from data by seeing which groups the data points fall into when a clustering algorithm is applied.
K-MEANS CLUSTERING

• K-Means is probably the most well-known clustering algorithm.
• It is commonly used to solve various clustering problems in many areas.
• K-Means has the advantage of being pretty fast, as all we are really doing is computing the distances between points and group centers; very few computations are needed.
• K-Means has a couple of disadvantages:
  • you have to select in advance how many groups/classes there are
  • it starts with a random choice of cluster centers, so it may yield different clustering results on different runs of the algorithm
HOW?
1. Select the number of classes/groups to use and randomly initialize their respective center points.
2. Classify each data point by computing the distance between that point and each group center, and assign the point to the group whose center is closest to it.
3. Based on these classified points, recompute each group center by taking the mean of all the vectors in the group.
4. Repeat steps 2-3 for a set number of iterations or until the group centers don't change much between iterations. You can also opt to randomly initialize the group centers a few times, and then select the run that looks like it provided the best results.
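
A minimal NumPy sketch of these four steps, for illustration only (the function name kmeans, its parameters, and the convergence test are our own choices for this example, not part of any particular library):

import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """K-Means: assign points to the nearest center, then recompute centers."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick k data points as the initial group centers
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to the group whose center is closest
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each center as the mean of the vectors in its group
        # (keep the old center if a group ends up empty)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop early when the centers barely change between iterations
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Usage: cluster 100 random 2-D points into 3 groups
data = np.random.default_rng(1).random((100, 2))
labels, centers = kmeans(data, k=3)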
K-NEAREST NEIGHBOR

• The KNN algorithm assumes that similar things exist in close proximity.
• In other words, similar things are near to each other.
• In a typical scatter plot of labelled data, you will notice that most of the time, similar data points are close to each other.
• KNN captures the idea of similarity (sometimes called distance, proximity, or closeness) with some mathematics we might have learned in our childhood: calculating the distance between points on a graph.
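
The "distance between points on a graph" is typically the straight-line (Euclidean) distance, although the slides do not fix a particular metric. A minimal sketch (the helper name euclidean_distance is our own):

import math

def euclidean_distance(a, b):
    """Straight-line distance between two points of any dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

print(euclidean_distance((0, 0), (3, 4)))  # -> 5.0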
HOW?

1. Load the data.
2. Initialize K to your chosen number of neighbors.
3. For each example in the data:
   a. Calculate the distance between the query example and the current example from the data.
   b. Add the distance and the index of the example to an ordered collection.
4. Sort the ordered collection of distances and indices from smallest to largest (in ascending order) by the distances.
5. Pick the first K entries from the sorted collection.
6. Get the labels of the selected K entries.
7. If regression, return the mean of the K labels.
8. If classification, return the mode of the K labels.
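
A minimal sketch of these steps in plain Python, for illustration (knn_predict and its signature are our own, not a library API; Euclidean distance is assumed):

import math
from collections import Counter

def knn_predict(data, query, k, mode="classification"):
    """Minimal KNN. `data` is a list of (point, label) pairs."""
    def dist(a, b):
        # Euclidean distance between two points
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    # Steps 3-4: compute the distance from the query to every example,
    # then sort in ascending order by distance
    by_distance = sorted((dist(point, query), label) for point, label in data)
    # Steps 5-6: keep the labels of the first k entries
    k_labels = [label for _, label in by_distance[:k]]
    if mode == "regression":
        return sum(k_labels) / k                   # step 7: mean of the K labels
    return Counter(k_labels).most_common(1)[0][0]  # step 8: mode of the K labels

# Usage: classify a query point against a tiny labelled data set
train = [((1, 1), "A"), ((2, 1), "A"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(train, (1, 2), k=3))  # -> "A"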


ADVANTAGES & DISADVANTAGES

• Advantages
  • The algorithm is simple and easy to implement.
  • There is no need to build a model, tune several parameters, or make additional assumptions.
  • The algorithm is versatile: it can be used for classification, regression, and search.
• Disadvantages
  • The algorithm gets significantly slower as the number of examples and/or predictors/independent variables increases.
