
Dr. Aruna Malapati
Asst Professor, Department of CSIS
BITS Pilani, Hyderabad Campus

Large data clustering


Today’s Learning objective

• Define BIRCH and CURE algorithms for clustering large data

BITS Pilani, Hyderabad Campus


BIRCH (Balanced Iterative Reducing
and Clustering using Hierarchies)

• Most existing algorithms DO NOT consider the case where the dataset is too large to fit in main memory
• They DO NOT concentrate on minimizing the number of scans of the dataset
• As a result, their I/O costs are very high
• The complexity of BIRCH is O(n), where n is the number of objects to be clustered



Clustering Feature



Properties of Clustering Feature
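
As a hedged illustration, assuming the standard definition from the BIRCH literature: a clustering feature is the triple CF = (N, LS, SS), where N is the number of points, LS their linear sum, and SS their sum of squared norms. Two CFs can be merged by adding them component-wise. The class name `CF` and its method names below are hypothetical, chosen only for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CF:
    """Clustering feature (N, LS, SS); names are illustrative only."""
    n: int        # number of points summarized
    ls: tuple     # linear sum of the points, per dimension
    ss: float     # sum of squared norms of the points

    @staticmethod
    def from_point(p):
        # A single point is a CF with N = 1.
        return CF(1, tuple(p), sum(x * x for x in p))

    def __add__(self, other):
        # Additivity: merging two clusters adds their CFs component-wise.
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        # Root-mean-square distance of the points from the centroid.
        c = self.centroid()
        var = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(var, 0.0))  # guard against rounding below zero


merged = CF.from_point((1, 2)) + CF.from_point((3, 4))
print(merged, merged.centroid(), merged.radius())
```

Because the centroid and radius are computed from the triple alone, the raw points do not need to stay in memory once they are summarized.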



Distance Measures



Faculty Salaries

[Figure: scatter plot of salary (vertical axis) against age (horizontal axis); each point is labeled h or e.]



Starting CURE

1. Pick a random sample of points that fit in main memory.

2. Cluster these points hierarchically: group nearest points/clusters.

3. For each cluster, pick a sample of points, as dispersed as possible.

4. From the sample, pick representatives by moving them (say) 20% toward the centroid of the cluster.
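
Steps 3 and 4 above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes one cluster has already been formed, uses a farthest-first heuristic as one possible reading of "as dispersed as possible", and the function names `centroid`, `pick_dispersed`, and `shrink` are hypothetical.

```python
import math


def centroid(points):
    """Mean of a list of equal-length point tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def pick_dispersed(points, k):
    """Step 3 (assumed heuristic): farthest-first selection of up to k points."""
    c = centroid(points)
    # Start with the point farthest from the centroid.
    chosen = [max(range(len(points)), key=lambda i: math.dist(points[i], c))]
    while len(chosen) < min(k, len(points)):
        rest = [i for i in range(len(points)) if i not in chosen]
        # Add the point whose nearest already-chosen point is farthest away.
        nxt = max(rest, key=lambda i: min(math.dist(points[i], points[j])
                                          for j in chosen))
        chosen.append(nxt)
    return [points[i] for i in chosen]


def shrink(reps, c, alpha=0.2):
    """Step 4: move each representative a fraction alpha toward centroid c."""
    return [tuple(x + alpha * (ci - x) for x, ci in zip(p, c)) for p in reps]


cluster = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
reps = pick_dispersed(cluster, 4)
print(shrink(reps, centroid(cluster)))
```

In this toy cluster the four corners are selected and each one moves 20% of the way toward the centroid (5, 5), so (0, 0) becomes (1, 1).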



Initial Clusters

[Figure: salary-versus-age scatter plot of the h and e points, showing the initial clusters.]



Pick Dispersed Points

[Figure: salary-versus-age scatter plot of the h and e points. Annotation: pick (say) 4 remote points for each cluster.]



Pick Dispersed Points

[Figure: salary-versus-age scatter plot of the h and e points. Annotation: move points (say) 20% toward the centroid.]



Finishing CURE

• Now, visit each point p in the data set.

• Place it in the “closest cluster.”

– Normal definition of “closest”: the cluster containing the sample point closest to p, among all the sample points of all the clusters.
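
The assignment pass can be sketched as below, assuming Euclidean distance and assuming each cluster's sample (representative) points are already available in a dictionary. The function name `assign` and the labels "A" and "B" are hypothetical, used only for illustration.

```python
import math


def assign(p, reps_by_cluster):
    """Return the label of the cluster whose nearest sample point is closest to p.

    reps_by_cluster maps a cluster label to its list of sample points.
    """
    return min(reps_by_cluster,
               key=lambda label: min(math.dist(p, r)
                                     for r in reps_by_cluster[label]))


reps = {"A": [(1, 1), (9, 9)], "B": [(20, 20), (30, 30)]}
for p in [(12, 12), (26, 26)]:
    print(p, assign(p, reps))
```

Each point is compared only against the sample points, so this pass needs one scan of the data set and holds only the sample points in memory.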

