0 views

Uploaded by Saurabh Mishra

K means

- Rationalisation of Freeform Façades
- Temporal data clustering via weighted clustering ensemble with different representations
- Fuzzy Classification Vtu
- hartigan_1979_kmeans.pdf
- GSPL Commercial_Summer Project Report
- Improved Visual Final Version New
- Clustering Algorithms
- UNSUPERVISED ROBOTIC SORTING: TOWARDS AUTONOMOUS DECISION MAKING ROBOTS
- Typical Genomic Framework on Disease Analysis
- (2017) Analyis of Agriculture Data using Data Mining Techniques, Application of Big Data.pdf
- M344 Cluster Analysis_spring 2013
- Ankit Pandey_10BM60012_WEKA Term Paper
- Syllabus
- Cluster Example
- poselets-eccv10
- Angkor Wat Devata Analysis MSU Abstract
- Data Pre Processing
- psyc993_09
- Particle Swarm Optimization for Classification of Breast Cancer
- ecsbip

You are on page 1of 36

Lecture
14

David
Sontag

New
York
University

Carlos Guestrin, Andrew Moore, Dan Klein

Clustering

Clustering:

– Unsupervised learning

– Requires data, but no labels

– Detect patterns e.g. in

• Group emails or search results

• Customer shopping patterns

• Regions of images

– Useful when don’t know what

you’re looking for

– But: can get gibberish

Clustering

• Basic idea: group together similar instances

• Example: 2D point patterns

Clustering

• Basic idea: group together similar instances

• Example: 2D point patterns

Clustering

• Basic idea: group together similar instances

• Example: 2D point patterns

– One option: small Euclidean distance (squared)

dist(�x, �y ) = ||�x − �y ||22

– Clustering results are crucially dependent on the measure of

similarity (or distance) between “points” to be clustered

• /(&'+'0-(0+"+"*,'(%-.$

– 1,%%,.#23 +**",.&'+%(4&

– 5,26,7)3 6(4($(4&

Clustering algorithms

• 8+'%(%(,)+"*,'(%-.$9:"+%;

– <Ͳ.&+)$

– =(>%#'&,?@+#$$(+)

– A2&0%'+"!"#$%&'()*

!"#$%&'()*+"*,'(%-.$

• /(&'+'0-(0+"+"*,'(%-.$

– 1,%%,.#23 +**",.&'+%(4&

– 5,26,7)3 6(4($(4&

• 8+'%(%(,)+"*,'(%-.$9:"+%;

– <Ͳ.&+)$

Clustering examples

Image
segmenta3on

Goal:
Break
up
the
image
into
meaningful
or

perceptually
similar
regions

Clustering examples

Clustering gene

expression data

K-Means

• An iterative clustering

algorithm

points as cluster centers

– Alternate:

1. Assign data points to

closest cluster center

2. Change the cluster

center to the average

of its assigned points

assignments change

K-Means

• An iterative clustering

algorithm

points as cluster centers

– Alternate:

1. Assign data points to

closest cluster center

2. Change the cluster

center to the average

of its assigned points

assignments change

K-‐means clustering: Example

• Pick K random

points as cluster

centers (means)

11

K-‐means
clustering:
Example

Iterative Step 1

• Assign data points to

closest cluster center

12

K-‐means
clustering:
Example

Iterative Step 2

• Change the cluster

center to the average of

the assigned points

13

K-‐means
clustering:
Example

• Repeat
unDl

convergence

14

K-‐means
clustering:
Example

15

K-‐means
clustering:
Example

16

K-‐means
clustering:
Example

17

ProperDes
of
K-‐means
algorithm

• Guaranteed
to
converge
in
a
ﬁnite
number
of

iteraDons

1. Assign data points to closest cluster center

O(KN) time

2. Change the cluster center to the average of its

assigned points

O(N)

What properDes should a distance

!"#$%&'%(&$)(**"'+,-#-)*$#./(

measure have?

0(#*+&("#1(2

• 3400($&)/

– 56789:;56987:

– <$"(&=)*(8=(/#.*#47,''>*,)>(9?+$9-'(*.'$,''>

,)>(7

• @'*)$)1)$48#.-*(,AͲ*)0),#&)$4

– 56789:B8#.-56789:;B)AA 7;9

– <$"(&=)*($"(&(=),,-)AA(&(.$'?C(/$*$"#$=(/#..'$$(,,

#%#&$

• D&)#.E,().(F+#,)$4

– 56789:G5698H: 5678H:

– <$"(&=)*('.(/#.*#4I7)*,)>(989)*,)>(H8?+$7)*.'$

,)>(H#$#,,J

!"#$%& '(%)#*+#%,#

!"#$%&'($

σୀଵ σ௫א ݔെ ߤ ଶ

ఓ

-. /01ߤ2(340"05#!"

!"#$%&'()#*+,

ଶ ଶ

ݔെ ߤ ൌ ݔ െ ߤ௫

ୀଵ ௫א

6. /01!#(340"05#ߤǣ

σୀଵ σ௫א ݔെ ߤ ଶ

ఓ

– 7$8#3$*40$9:#*0)$40)#(;ߤ $%:,(5#*(2<#=$)#

ͳ

ߤ ൌ ݔ !"#$-&'()#*+,

ȁܥ ȁ

௫א

!"#$%& 4$8#&$%$94#*%$40%+(340"05$40(%$33*($,=2#$,=&4#30&+>$*$%4##:4(

:#,*#$=#(?@#,40)#A 4=>&+>$*$%4##:4(,(%)#*+#

[Slide from Alan Fern]

Example: K-Means for Segmentation

K=2 Original

Goal of Segmentation is

K =2 K =3 K = 10 Original image

to partition an image

into regions each of

which has reasonably

homogenous visual

appearance.

Example: K-Means for Segmentation

K=2

K =2 K=3

K =3 K=10

K = 10 Original

Original image

Example: K-Means for Segmentation

K=2

K =2 K=3

K =3 K=10

K = 10 Original

Original image

4% 8% 17%

Example: Vector quantization

514 14. Unsupervised Learning

FIGURE 14.9. Sir Ronald A. Fisher (1890 − 1962) was one of the founders

of modern day statistics, to whom we owe maximum-likelihood, sufficiency, and

many other fundamental concepts. The image on the left is a 1024×1024 grayscale

image at 8 bits per pixel. The center image is the result of 2 × 2 block VQ, using

200 code vectors, with a compression rate of 1.9 bits/pixel. The right image uses

only four code vectors, with a compression rate of 0.50 bits/pixel

We see that the procedure is successful at grouping together samples of

the same cancer. In fact, the two breast cancers in the second cluster were

Initialization

• K-means algorithm is a

heuristic

– Requires initial means

– It does matter what you pick!

this kind of thing: variance-based

split / merge, initialization

heuristics

K-Means Getting Stuck

A local optimum:

one cluster here

K-means not able to properly cluster

Y

X

Changing the features (distance function)

can help

R

θ

Agglomerative Clustering

• Agglomerative clustering:

– First merge very similar instances

– Incrementally build larger clusters out

of smaller clusters

• Algorithm:

– Maintain a set of clusters

– Initially, each instance in its own

cluster

– Repeat:

• Pick the two closest clusters

• Merge them into a new cluster

• Stop when there s only one cluster left

family of clusterings represented

by a dendrogram

Agglomerative Clustering

• How should we define closest for clusters

with multiple elements?

Agglomerative Clustering

• How should we define closest for clusters

with multiple elements?

• Many options:

– Closest pair

(single-link clustering)

– Farthest pair

(complete-link clustering)

– Average of all pairs

different clustering behaviors

Agglomerative Clustering

• How should we define closest for clusters

with multiple elements?

Closest pair Farthest pair

(single-linkSingle Link Example

clustering) (complete-link

Completeclustering)

Link Example

1 2 5 6 1 2 5 6

3 4 7 8 3 4 7 8

Clustering Behavior

Average Farthest Nearest

Agglomerative Clustering Questions

– To a global optimum?

Reconsidering “hard assignments”?

• Some clusters may be

“wider” than others

• Distances can be

deceiving!

Extra

• K-‐means Applets:

– hOp://home.dei.polimi.it/maOeucc/Clustering/

tutorial_html/AppletKM.html

– hOp://www.cs.washington.edu/research/

imagedatabase/demo/kmcluster/

- Rationalisation of Freeform FaçadesUploaded byCarmo
- Temporal data clustering via weighted clustering ensemble with different representationsUploaded byxpiration
- Fuzzy Classification VtuUploaded byVijaya Natarajan
- hartigan_1979_kmeans.pdfUploaded byHacen Spy
- GSPL Commercial_Summer Project ReportUploaded byMohana Priya S
- Improved Visual Final Version NewUploaded byAmanda Carter
- Clustering AlgorithmsUploaded bythegame4626
- UNSUPERVISED ROBOTIC SORTING: TOWARDS AUTONOMOUS DECISION MAKING ROBOTSUploaded byAdam Hansen
- Typical Genomic Framework on Disease AnalysisUploaded byWhite Globe Publications (IJORCS)
- (2017) Analyis of Agriculture Data using Data Mining Techniques, Application of Big Data.pdfUploaded byKaelel Bilang
- M344 Cluster Analysis_spring 2013Uploaded byakshay patri
- Ankit Pandey_10BM60012_WEKA Term PaperUploaded byankit191188
- SyllabusUploaded byaliysa
- Cluster ExampleUploaded byDeden Istiawan
- poselets-eccv10Uploaded byMarius_2010
- Angkor Wat Devata Analysis MSU AbstractUploaded byshivpriye
- Data Pre ProcessingUploaded byjyothibellaryv
- psyc993_09Uploaded bySharvin Balakrishnan
- Particle Swarm Optimization for Classification of Breast CancerUploaded bysusmitha
- ecsbipUploaded byPoonam Kataria
- IJCSE10-02-02-19Uploaded bykalyankrishna2011
- Randomized Clustering Forests for Image ClassificationUploaded byYoungSooKim
- Message TypeUploaded bySridhar Mocherla
- Clustering With Multiviewpoint-BasedUploaded bySuveetha Suvi
- 10bUploaded bynidhin921
- cd5a23cc68483e4205633893959f2dcd038e.pdfUploaded byGreggy Otero
- Predicting Con BUploaded byApurva Nadkarni
- 19Uploaded byArnav Guddu
- Data Warehousing and Data MiningUploaded byRahSam
- 2003 Investment TelecomUploaded byAnkita Kothari

- XRV.pdfUploaded byDjvionico Perez
- BoilerUploaded byrasheedshaikh2003
- Biohacker's handbook workUploaded byJagadish Jee
- Documentation System Cabling for CENTUM VP- CENTUM CS 3000 R3 and STARDOMUploaded byMatthew Daniel
- Vandelay Case(3)Uploaded byBernie Clinton
- FLA 01 - Journal Review - Instructional LeadershipUploaded byAbi Calimbahin
- EOR Exercise 2Uploaded byAnonymous cNlhMg
- A2 Magazine QuestionnaireUploaded byHarrygeopreston
- 3910Uploaded byKaumil Shah
- Tapered_Roller_Bearing_Catalog_C,678-679.pdfUploaded byJuandeDios Huayllani Delgado
- SIUI AdapterUploaded byLibra Man
- CONEX 2017 Reaction Paper.docxUploaded byLeonoralynn Sapungan
- Vershire Company Case PrintUploaded byFahmi A. Mubarok
- Ch 36 Sec 1-4 - Global Interdependence.pdfUploaded byMrEHsieh
- Improve Energy Efficiency of Electrical System by Energy Audit (Data Logging)Uploaded byIAEME Publication
- Test Method Recommendations of RILEM TC 177-MDT 'Masonry Durability and on-site Testing' - D.5- In-situ Stress - Strain Behaviour Tests Based on the Flat JackUploaded byAalil Issam
- art-3A10.1007-2Fs12010-008-8435-5Uploaded byprakush_prakush
- Titanium Metal Matrix CompositeUploaded byPramit Kumar Senapati
- PMO Charter Template with ExampleUploaded byMegan Wale
- Internal Combustion EnginesUploaded bySajith Nawarathne
- 4IRUploaded byArayaman
- Evaporation ProblemUploaded byJahre El Leonard Tañedo
- LectureUploaded byMonyee Sindhu
- Thesis - Development of a Flexible , Programmable Analysis System for Mathematical Operations on FFT SpectraUploaded byJawad Hasan
- RESERVOIR CHARACTERIZATION AND STRUCTURAL INTERPRETATION OF SEISMIC PROFILE A CASE STUDY OF Z FIELD NIGER DELTA NIGERIA (1).pdfUploaded bySanjeev Singh Negi
- Proceedings of the 2007 Antenna Applications Symposium, Volume II, Daniel Schaubert et al., Editor, Air Force Research Laboratory, 12-2007.Uploaded byBob Laughlin, KWØRL
- 425 Lecture Chp12 Footing DesignUploaded byMang Kulas
- Prevent CrackingUploaded byRafeek Shaikh
- Fuel Flow MetersUploaded byAMIYA SHANKAR PANDA
- S2T1_Chap_1_Additional_lecture_notes_on_GlobalisationUploaded bymoni0007