
Clustering

What is Clustering?
• Attach a label to each observation (data point) in a set
• This is a form of "unsupervised learning"
• Clustering is also called "grouping"
• Intuitively, you would want to assign the same label to
data points that are "close" to each other
But, how many clusters do we have?
Example 1 · Example 2
[Figures: e.g., students in our school of engineering]
Application Example
Geochemical study of an impacted fluvial system
• Sediments collected along the Rapel Fluvial System in Central Chile
• Samples are analyzed using chemical and mineralogical analysis methods
• Feature vectors are built
• Clustering is applied
• Examples are shown in the map
Clustering depends on the selected features

• The number of clusters (i.e., the clustering result) depends on
which features are selected.

[Pipeline diagram:
Real World → Sensing —s→ Pre-processing —p→ Feature Extraction
(Computation) —x ∈ ℝⁿ→ Clustering —c→
n: number of features]
But, also on the distance metric

• Thus, after features are selected, clustering
algorithms rely on a distance metric between data points
(feature vectors)
• Sometimes it is said that, for clustering, the distance
metric is more important than the clustering algorithm
Distances: Quantitative Variables

Data point:
$x_i = [x_{i1} \ldots x_{ip}]^T$
Some examples:
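Two standard distance metrics for quantitative feature vectors can be sketched in plain Python (the function names are illustrative, not from the slides):

```python
import math

def euclidean(x, y):
    """Euclidean (L2) distance between two quantitative feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Manhattan (L1) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))

xi = [0.0, 0.0]
xj = [3.0, 4.0]
print(euclidean(xi, xj))  # 5.0
print(manhattan(xi, xj))  # 7.0
```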
Distances: Ordinal and Categorical
Variables

• Ordinal variables can be mapped into (0, 1), and then
a quantitative metric can be applied:

$\frac{k - 1/2}{M}, \quad k = 1, 2, \ldots, M$

• For categorical variables, distances between each pair of
categories must be specified by the user.
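Both ideas can be sketched briefly; the category names and table values below are hypothetical, chosen only to illustrate a user-specified pairwise dissimilarity:

```python
def ordinal_to_unit(k, M):
    """Map ordinal level k (1..M) into (0, 1) via (k - 1/2) / M."""
    return (k - 0.5) / M

# M = 4 ordered levels, e.g. "low" < "medium" < "high" < "very high"
levels = [ordinal_to_unit(k, 4) for k in range(1, 5)]
# -> [0.125, 0.375, 0.625, 0.875]

# Categorical: a user-specified dissimilarity for each pair of categories
# (hypothetical table; a singleton key encodes the zero self-distance)
cat_dist = {
    frozenset(["red"]): 0.0,
    frozenset(["blue"]): 0.0,
    frozenset(["pink"]): 0.0,
    frozenset(["red", "blue"]): 1.0,
    frozenset(["red", "pink"]): 0.3,
    frozenset(["blue", "pink"]): 0.9,
}

def categorical_distance(a, b):
    """Symmetric lookup of the user-specified distance."""
    return cat_dist[frozenset([a, b])]
```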
But, in some cases using distances can be
tricky
K-means Overview

• "K" stands for the number of clusters; it is typically a user
input to the algorithm, although some criteria can be used to
automatically estimate K
• It is an approximation to an NP-hard combinatorial
optimization problem
• The K-means algorithm is iterative in nature
• It converges, but only to a local minimum
• Works only for numerical data
• Easy to implement
K-means: Setup

• x_1, …, x_N are data points or vectors of observations

• Each observation (vector x_i) will be assigned to one and only one cluster

• C(i) denotes the cluster number for the ith observation

• Dissimilarity measure: Euclidean distance metric

• K-means minimizes the within-cluster point scatter:

$W(C) = \frac{1}{2}\sum_{k=1}^{K}\sum_{C(i)=k}\sum_{C(j)=k} \|x_i - x_j\|^2 = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - m_k\|^2$   (Exercise)

where

m_k is the mean vector of the kth cluster

N_k is the number of observations in the kth cluster
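The equality between the pairwise form and the centroid form of the within-cluster scatter (the "Exercise" above) can be checked numerically; a minimal sketch with plain Python lists:

```python
def sq_dist(x, y):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def mean(points):
    """Component-wise mean vector of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def W_pairwise(clusters):
    """W(C) = 1/2 * sum_k sum_{C(i)=k} sum_{C(j)=k} ||x_i - x_j||^2."""
    return 0.5 * sum(sq_dist(xi, xj)
                     for pts in clusters for xi in pts for xj in pts)

def W_centroid(clusters):
    """W(C) = sum_k N_k * sum_{C(i)=k} ||x_i - m_k||^2."""
    total = 0.0
    for pts in clusters:
        mk = mean(pts)
        total += len(pts) * sum(sq_dist(xi, mk) for xi in pts)
    return total

# two toy clusters of 1-D points
clusters = [[[0.0], [2.0]], [[10.0], [12.0], [14.0]]]
assert W_pairwise(clusters) == W_centroid(clusters)  # both equal 28.0
```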


Within and Between Cluster Criteria
Let's consider the total point scatter for a set of N data points:

$T = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} d(x_i, x_j)$

where $d(x_i, x_j)$ is the distance between two points. T can be re-written as:

$T = \frac{1}{2}\sum_{k=1}^{K}\sum_{C(i)=k}\left(\sum_{C(j)=k} d(x_i, x_j) + \sum_{C(j)\neq k} d(x_i, x_j)\right) = W(C) + B(C)$

where the within-cluster scatter is

$W(C) = \frac{1}{2}\sum_{k=1}^{K}\sum_{C(i)=k}\sum_{C(j)=k} d(x_i, x_j)$

and the between-cluster scatter is

$B(C) = \frac{1}{2}\sum_{k=1}^{K}\sum_{C(i)=k}\sum_{C(j)\neq k} d(x_i, x_j)$

If d is the squared Euclidean distance, then

$W(C) = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - m_k\|^2$

and

$B(C) = \sum_{k=1}^{K} N_k \|m_k - \bar{m}\|^2$   (Ex.)

where $\bar{m}$ is the grand mean.

Since T is fixed for a given data set, minimizing W(C) is equivalent to maximizing B(C).
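Because every pair of points lies either within one cluster or between two clusters, T = W(C) + B(C) holds exactly under the pairwise definitions; a quick numerical check on a toy 1-D data set:

```python
def sq_dist(x, y):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

points = [[0.0], [2.0], [10.0], [12.0]]
labels = [0, 0, 1, 1]  # cluster assignment C(i)
n = len(points)

# total, within-cluster, and between-cluster scatter (pairwise forms)
T = 0.5 * sum(sq_dist(points[i], points[j])
              for i in range(n) for j in range(n))
W = 0.5 * sum(sq_dist(points[i], points[j])
              for i in range(n) for j in range(n) if labels[i] == labels[j])
B = 0.5 * sum(sq_dist(points[i], points[j])
              for i in range(n) for j in range(n) if labels[i] != labels[j])

assert T == W + B  # every pair is counted in exactly one of W, B
```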
K-means Algorithm

• For a given cluster assignment C of the data points,
compute the cluster means m_k:

$m_k = \frac{\sum_{i:\,C(i)=k} x_i}{N_k}, \quad k = 1, \ldots, K$

• For the current set of cluster means, assign each
observation as:

$C(i) = \arg\min_{1 \le k \le K} \|x_i - m_k\|^2, \quad i = 1, \ldots, N$

• Iterate the above two steps until convergence
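The two-step iteration can be sketched in plain Python; initializing the means from the first K points is a demo choice here (random initialization is more common in practice):

```python
def sq_dist(x, y):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kmeans(points, K, max_iter=100):
    """Plain K-means: alternate assignment and mean-update steps."""
    means = [list(p) for p in points[:K]]  # deterministic init (demo choice)
    labels = None
    for _ in range(max_iter):
        # Assignment step: C(i) = argmin_k ||x_i - m_k||^2
        new_labels = [min(range(K), key=lambda k: sq_dist(x, means[k]))
                      for x in points]
        if new_labels == labels:
            break  # assignments stable -> converged (a local minimum)
        labels = new_labels
        # Update step: m_k = mean of the points assigned to cluster k
        for k in range(K):
            members = [x for x, c in zip(points, labels) if c == k]
            if members:
                means[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means

# two well-separated blobs
pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, means = kmeans(pts, 2)
```

On this data the algorithm recovers the two blobs in a couple of iterations; with a harder data set, restarting from several initializations and keeping the lowest W(C) is the usual remedy for local minima.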


K-means example 1
K-means example 2
Selecting the Number of Clusters

• For a given data distribution, K-means, and in general any
clustering algorithm, can find k clusters for any k (1 < k < N).
• It is not easy to choose k!
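One common heuristic (the "elbow" method, an assumption here rather than a method prescribed by these slides) runs K-means for several values of k and looks at where the within-cluster scatter stops dropping sharply:

```python
def sq_dist(x, y):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kmeans(points, K, max_iter=100):
    """Plain K-means with deterministic init (demo choice)."""
    means = [list(p) for p in points[:K]]
    labels = None
    for _ in range(max_iter):
        new = [min(range(K), key=lambda k: sq_dist(x, means[k]))
               for x in points]
        if new == labels:
            break
        labels = new
        for k in range(K):
            m = [x for x, c in zip(points, labels) if c == k]
            if m:
                means[k] = [sum(col) / len(m) for col in zip(*m)]
    return labels, means

def inertia(points, labels, means):
    """Sum of squared distances of each point to its assigned mean."""
    return sum(sq_dist(x, means[c]) for x, c in zip(points, labels))

pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
curve = []
for k in (1, 2, 3):
    labels, means = kmeans(pts, k)
    curve.append(inertia(pts, labels, means))
# the drop from k=1 to k=2 is large; from k=2 to k=3 it is tiny,
# so the "elbow" suggests k = 2 for this data
```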
Fuzzy c-means
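As a sketch of the idea behind fuzzy c-means: instead of a hard assignment C(i), each point x_i receives a membership degree u_ik in [0, 1] for every cluster (summing to 1 over clusters), and the algorithm alternates membership and center updates. The fuzzifier m = 2 and the deterministic initialization below are common demo choices, not prescribed by the slides:

```python
def dist(x, y):
    """Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def fcm(points, C, m=2.0, iters=50):
    """Fuzzy c-means: soft memberships instead of hard labels."""
    centers = [list(p) for p in points[:C]]  # deterministic init (demo)
    U = [[0.0] * C for _ in points]
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        for i, x in enumerate(points):
            d = [max(dist(x, c), 1e-12) for c in centers]  # avoid /0
            for k in range(C):
                U[i][k] = 1.0 / sum((d[k] / dj) ** (2.0 / (m - 1.0))
                                    for dj in d)
        # Center update: weighted mean with weights u_ik^m
        for k in range(C):
            w = [U[i][k] ** m for i in range(len(points))]
            tot = sum(w)
            centers[k] = [sum(w[i] * points[i][dim]
                              for i in range(len(points))) / tot
                          for dim in range(len(points[0]))]
    return U, centers

pts = [[0.0], [1.0], [10.0], [11.0]]
U, centers = fcm(pts, 2)
```

On this toy data the two centers settle near the two groups, and each point's largest membership identifies its natural cluster while still quantifying how ambiguous the assignment is.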
