16 views

Uploaded by prabhudeen

data mining

- 3. Project Report
- IRS_Unit-4
- a content based region separation and analysis approach for sar image classification
- salfner08software_reliability-1.pdf
- h 0444146
- Smartcluster: A Metamodel Indicators for Smart and Human Cities (Presentation)
- Predicting the outcome of soccer matches
- A Novel Approach for Georeferenced Data Analysis Using Hard Clustering Algorithm
- Detection and Localisation Af Multiple Spoofing Attack
- Medical Images Retrieval Using Clustering Technique
- Secure Algorithm for File Sharing Using Clustering Technique of K-Means Clustering
- clusteringI_4
- An FP-Growth Approach to Mining Association Rules
- fuzzy c means image segmentation.pdf
- An Improved K-means Clustering Algorithm With Refined Initial Centroids
- 1 Song Etal Fuzzy Clustering Book Chapter
- InTech-Image Segmentation Through Clustering Based on Natural Computing Techniques
- Image segmentation
- Da Wk4 Segtarg Handout
- ML Lecture 10

You are on page 1of 33

3.1.1 Cluster Analysis

3.1.2 Clustering Categories

3.2 Partitioning Methods

3.2.1 The principle

3.2.2 K-Means Method

3.2.3 K-Medoids Method

3.2.4 CLARA

3.2.5 CLARANS

`

houses to find distribution patterns

similarity

Typical Applications

WWW, Social networks, Marketing, Biology, Library, etc.

`

Partitioning Methods

Hierarchical Methods

Hypothesize a model for each cluster and find the best fit of the

data to the given model

Model-based methods

Grid-based Methods

Density-based Methods

Subspace clustering

Constraint-based methods

`

3.1.1 Cluster Analysis

3.1.2 Clustering Categories

3.2 Partitioning Methods

3.2.1 The principle

3.2.2 K-Means Method

3.2.3 K-Medoids Method

3.2.4 CLARA

3.2.5 CLARANS

`

Given

K the number of clusters to form

represents a cluster

criterion

Objects of different clusters are dissimilar

Goal:

create 3 clusters

(partitions)

Choose 3 objects

(cluster centroids)

Assign each object

to the closest centroid

to form Clusters

+

Update cluster

centroids

+

+

K-Means Method

Recompute

Clusters

+

+

+

If Stable centroids,

then stop

+

+

+

K-Means Algorithm

`

Input

D: a data set containing n objects

Method:

(1) Arbitrary choose k objects from D as in initial cluster centers

(2) Repeat

(3) Reassign each object to the most similar cluster based on the

mean value of the objects in the cluster

(4) Update the cluster means

(5) Until no change

K-Means Properties

`

square-error function

i1

pC

( p m i)

E: the sum of the squared error for all objects in the data set

It works well when the clusters are compact clouds that are

rather well separated from one another

K-Means Properties

Advantages

`

data sets

Disadvantage

`

shapes or clusters of very different size

mean value)

`

Dissimilarity calculations

November 2, 2010

11

`

(Medoid) to which is the most similar

each object and its corresponding reference point

i1

pC

| p o

E: the sum of absolute error for all objects in the data set

`

representative objects continues as long as the quality of the

clustering is improved

representative object is replaced by a non-representative object

Data Objects

9

8

A1

A2

7

6

O1

O2

O3

O4

O5

O6

O7

O8

O9

O10

4

10

5

4

7

5

Choose randmly two medoids

O2 = (3,4)

O8 = (7,4)

Data Objects

9

cluster1

A1

O1

A2

7

6

O2

O3

O4

O5

O6

O7

O8

O9

O10

cluster2

3

4

10

5

4

7

5

Using L1 Metric (Manhattan), we form the following

clusters

Cluster1 = {O1, O2, O3, O4}

Cluster2 = {O5, O6, O7, O8, O9, O10}

9

Data Objects

cluster1

A1

O1

A2

O2

O3

O4

O5

O6

O7

O8

O9

O10

cluster2

3

4

10

5

4

7

5

Medoids (O2,O8)]

| p o |

i1 pCi

| o5 o8 | | o6 o8 | | o7 o8 | | o9 o8 | | o10 o8 |

9

Data Objects

cluster1

A1

O1

A2

O2

O3

O4

O5

O6

O7

O8

O9

O10

cluster2

3

4

10

5

4

7

5

(O2,O8)]

(3 4 4) (3 1 1 2 2)

20

9

Data Objects

cluster1

A1

O1

A2

O2

O3

O4

O5

O6

O7

O8

O9

O10

cluster2

3

4

10

7

5

Swap O8 and O7

Compute the absolute error criterion [for the set of Medoids

(O2,O7)]

(3 4 4) (2 2 1 3 3)

22

Data Objects

9

cluster1

A1

O1

A2

7

6

O2

O3

O4

O5

O6

O7

O8

O9

O10

cluster2

3

4

10

5

4

7

5

Absolute error [for O2,O7] Absolute error [O2,O8]

22 20

K-Medoids Method

9

Data Objects

cluster1

A1

O1

A2

O2

O3

O4

O5

O6

O7

O8

O9

O10

cluster2

3

4

10

5

4

7

5

cluster 2 did not change the assignments of

objects to clusters.

replace a medoid by another object?

K-Medoids Method

Cluster 1

Cluster 2

First case

B

B

Representative object

Random Object

Currently P assigned to A

Cluster 1

A

Cluster 2

Second case

p

B

Representative object

Random Object

Currently P assigned to B

P is reassigned to A

K-Medoids Method

Cluster 1

A

Cluster 2

Third case

p

B

B

Representative object

Random Object

Currently P assigned to B

Cluster 1

Cluster 2

Fourth case

A

B

p

Representative object

Random Object

Currently P assigned to A

P is reassigned to B

K-Medoids Algorithm(PAM)

PAM : Partitioning Around Medoids

`

`

`

Input

K: the number of clusters

D: a data set containing n objects

Output: A set of k clusters

Method:

(1) Arbitrary choose k objects from D as representative objects (seeds)

(2) Repeat

(3) Assign each remaining object to the cluster with the nearest

representative object

(4) For each representative object Oj

(5) Randomly select a non representative object Orandom

(6) Compute the total cost S of swapping representative object Oj with

Orandom

(7) if S<0 then replace Oj with Orandom

`

Advantages

noise and outliers

Disadvantages

3.2.4 CLARA

`

method to deal with large data sets

A random sample

original data

`

to what would have been

chosen from the whole

data set

PAM

sample

CLARA

`

Clusters

Clusters

Clusters

PAM

PAM

PAM

sample1

sample2

samplem

CLARA Properties

`

s: the size of the sample

k: number of clusters

n: number of objects

PAM finds the best k medoids among a given data, and CLARA

finds the best k medoids among the selected samples

Problems

The best k medoids may not be selected during the sampling

process, in this case, CLARA will never find the best clustering

If the sampling is biased we cannot have a good clustering

Trade off-of efficiency

3.2.5 CLARANS

`

RANdomized Search ) was proposed to improve the quality and

the scalability of CLARA

search

Clustering view

Current

medoids

Cost=10

Cost=5

Cost=2

Cost=1

Cost=3

medoids

Cost=20

Cost=5

CLARA

` Draws a sample of nodes at the beginning of the search

` Neighbors are from the chosen sample

` Restricts the search to a specific area of the original data

Sample

medoids

First step of the search

Neighbors are from the chosen sample

Neighbors are from the chosen sample

Current

medoids

CLARANS

` Does not confine the search to a localized area

` Stops the search when a local minimum is found

` Finds several local optimums and output the clustering

with the best local optimum

First step of the search

Draw a random sample of neighbors

Original data

medoids

Current

medoids

Draw a random sample of neighbors

The number of neighbors sampled from the original data is specified by the user

CLARANS Properties

`

Advantages

and CLARA

Handles outliers

Disadvantages

number of objects

`

- 3. Project ReportUploaded byM Shahid Khan
- IRS_Unit-4Uploaded byAnonymous iCFJ73OMpD
- a content based region separation and analysis approach for sar image classificationUploaded byrahul sharma
- salfner08software_reliability-1.pdfUploaded byma_purwoadi
- h 0444146Uploaded byAnonymous 7VPPkWS8O
- Smartcluster: A Metamodel Indicators for Smart and Human Cities (Presentation)Uploaded byVinicius Cardoso Garcia
- Predicting the outcome of soccer matchesUploaded byUlises Castro
- A Novel Approach for Georeferenced Data Analysis Using Hard Clustering AlgorithmUploaded byInternational Journal of Research in Engineering and Technology
- Detection and Localisation Af Multiple Spoofing AttackUploaded byShabana Bano
- Medical Images Retrieval Using Clustering TechniqueUploaded byEditor IJRITCC
- Secure Algorithm for File Sharing Using Clustering Technique of K-Means ClusteringUploaded byEditor IJRITCC
- clusteringI_4Uploaded byAmit Sharma
- An FP-Growth Approach to Mining Association RulesUploaded byInternational Journal of Computer Science and Mobile Computing - IJCSMC
- fuzzy c means image segmentation.pdfUploaded byAlbeto Vllagómez
- An Improved K-means Clustering Algorithm With Refined Initial CentroidsUploaded byDeden Istiawan
- 1 Song Etal Fuzzy Clustering Book ChapterUploaded byMag Nae
- InTech-Image Segmentation Through Clustering Based on Natural Computing TechniquesUploaded bykarteekak
- Image segmentationUploaded byrajarajeswari
- Da Wk4 Segtarg HandoutUploaded bySahil Jain
- ML Lecture 10Uploaded byT
- 1-s2.0-S0925443903001972-mainUploaded bycailiii
- Catching the Trenda Framework for Clustering ConceptUploaded byVinayaka Gangadharappa
- Marie Cottrell, Barbara Hammer, Alexander Hasenfuß and Thomas Villmann- Batch and median neural gasUploaded byGrettsz
- Introducation Abt MyselfUploaded byBhanu Partap Sharma
- ijcttjournal-v1i2p5Uploaded bysurendiran123
- SynopsisUploaded byvmurali.info
- 03CS3006 Pawan Singh FaujadarUploaded byGilbert Rozario
- A Measure for Objective Evaluation of Image Segmentation AlgorithmsUploaded byJanarthanam Subramaniam
- 01Uploaded byLokesh
- DIP ReviewUploaded byNaveen Gowdru Channappa

- lect17.pdfUploaded byprabhudeen
- TicketUploaded byprabhudeen
- DBMS--Q1Uploaded byprabhudeen
- How_To_Apply.pdfUploaded byYadav
- The Phong Illumination ModelUploaded byTommyc1024
- Digital Image Processing Using MATLABUploaded byapi-26891612
- Notice for Fee Payment for Spring Semester.Uploaded byprabhudeen
- COMPUTER NETWORKS NEW(gate2016 PRINTOUT.pdfUploaded byprabhudeen
- Doc of ShortcutsUploaded bySohaKhan
- lect06Uploaded byprabhudeen
- lect15.pdfUploaded byprabhudeen
- lect24.pdfUploaded byprabhudeen
- lect07.pdfUploaded byprabhudeen
- lect23.pdfUploaded byprabhudeen
- lect11.pdfUploaded byprabhudeen
- lect06.pdfUploaded byprabhudeen
- lect08.pdfUploaded byprabhudeen
- lect03.pdfUploaded byprabhudeen
- prospectusUploaded byrambori
- COMPUTER NETWORKS NEW(gate2016 PRINTOUT.pdfUploaded byprabhudeen
- Department Database SystemUploaded byprabhudeen

- Cogan, Mordechai-Imperialism and Religion_ Assyria, Judah and Israel in the 8th and 7th Centuries B.C.-Society of Biblical Literature and.pdfUploaded byPaul Haegeman
- NA05 - Cat's Cradle 01 - Time's CrucibleUploaded byFeffe F. K. Feffinson
- hanassuitcase-bookplanUploaded byapi-355590825
- LTE TG Module 2 Genetics Feb. 26PilarUploaded byAnnEvite
- Competitive Profile Matrix[1]Uploaded byBhagyashree Lotlikar
- IntroductionUploaded byAkasha Frostmourne
- Electromagnetic Methods (Theory)Uploaded bytsar_philip2010
- Disruptive Mood Dysregulation Disorder: Case StudyUploaded byMahnoor Shah Bukhari
- Effect UationUploaded byAnonymous RvXrU86Z2Z
- Western Ukraine Within Poland 1920-1939Uploaded byandriy112
- The Critical Period HypothesisUploaded byRosalina Rodríguez
- Indo Pak History Papers 2006 - CSS Forums.pdfUploaded byMansoor Ali Khan
- United States v. William Arthur Twyman, 985 F.2d 554, 4th Cir. (1993)Uploaded byScribd Government Docs
- Dillon's Digital Divide PaperUploaded bydillon_woodrum
- Why Choose a Column Database for Business IntelligenceUploaded byIan Norris
- Julian Ku on Exclusive Commander in Chief PowerUploaded bywesyang
- Mrunal [Aptitude] Probability Made Easy (Extension of Permutation Combination Concept!) - MrunalUploaded byRahul Gaurav
- Deltek, Inc. v. Department of Labor, 4th Cir. (2016)Uploaded byScribd Government Docs
- Reactive Extraction of Lactic Acid Using Alamine 336 in MIBK Equilibria and KineticsUploaded byenriquecast
- Katalog Dcb 2011 WebUploaded byIbnu Lawless
- Taran Swan Case AnalysisUploaded byLuis Angel Quispe Peña
- Journal OrtoUploaded byCesar Augusto Rojas Machuca
- Messages From the Wondrous Canopy of the Heavens-Introduction 130410Uploaded byAnthony Writer
- Michele Cosellu__Western and Indian Theories of Consciousness Confronted. a Comparative Overview of Continentatl and Analytic Philosophy With Advaita Vedanta and Madhyamaka BuddhismUploaded byAni Lupascu
- Judge Paula Hepner's Order - a bone to the dogUploaded bySLAVEFATHER
- Commissioner of Customs v. William Singson and Triton ShippingUploaded byLara Yulo
- Figurative Language PacketUploaded bydcspada
- A Study on Green Packaging Design in TaiwanUploaded byAninditya Wisnuputri
- OverTheOverflow WorkbookUploaded byBlaga MIhai Raul
- Symbolism the Story of an HourUploaded byMsDiamondTraxx