Understanding Cluster Analysis Techniques

Cluster analysis is a method for grouping similar data objects into clusters based on their characteristics, commonly used in various fields such as marketing and spatial data analysis. It involves different types of data and major clustering methods, including partitioning and hierarchical approaches, with the k-means algorithm being a notable example. The effectiveness of clustering depends on the quality of clusters formed, which is influenced by the chosen similarity measures and the handling of various data types.


Cluster Analysis

1. What is Cluster Analysis?


2. Types of Data in Cluster Analysis
3. A Categorization of Major Clustering Methods
4. Partitioning Methods
5. Hierarchical Methods



What is Cluster Analysis?
 Cluster: a collection of data objects
 Similar to one another within the same cluster
 Dissimilar to the objects in other clusters
 Cluster analysis
 Finding similarities between data according to the
characteristics found in the data and grouping
similar data objects into clusters
 Unsupervised learning: no predefined classes
 Typical applications
 As a stand-alone tool to get insight into data
distribution
 As a preprocessing step for other algorithms
Clustering: Rich Applications and
Multidisciplinary Efforts
 Pattern Recognition
 Spatial Data Analysis
    Create thematic maps in GIS by clustering feature spaces
    Detect spatial clusters or for other spatial mining tasks
 Image Processing
 Economic Science (especially market research)
 WWW
    Document classification
    Cluster Weblog data to discover groups of similar access patterns
Examples of Clustering
Applications
 Marketing: Help marketers discover distinct groups in their
customer bases, and then use this knowledge to develop
targeted marketing programs
 Land use: Identification of areas of similar land use in an
earth observation database
 Insurance: Identifying groups of motor insurance policy
holders with a high average claim cost
 City-planning: Identifying groups of houses according to their
house type, value, and geographical location
 Earthquake studies: Observed earthquake epicenters
should be clustered along continental faults
Quality: What Is Good
Clustering?
 A good clustering method will produce high quality
clusters with
    high intra-class similarity
    low inter-class similarity
 The quality of a clustering result depends on both
the similarity measure used by the method and its
implementation
 The quality of a clustering method is also measured
by its ability to discover some or all of the hidden
patterns
Measure the Quality of
Clustering
 Dissimilarity/Similarity metric: Similarity is expressed
in terms of a distance function, typically metric: d(i, j)
 There is a separate “quality” function that measures
the “goodness” of a cluster.
 The definitions of distance functions are usually very
different for interval-scaled, boolean, categorical,
ordinal, ratio, and vector variables.
 Weights should be associated with different variables
based on applications and data semantics.
 It is hard to define “similar enough” or “good
enough”
 the answer is typically highly subjective.
Requirements of Clustering in Data
Mining
 Scalability
 Ability to deal with different types of attributes
 Ability to handle dynamic data
 Discovery of clusters with arbitrary shape
 Minimal requirements for domain knowledge to
determine input parameters
 Able to deal with noise and outliers
 Insensitive to order of input records
 High dimensionality
 Incorporation of user-specified constraints
 Interpretability and usability
Cluster Analysis
1. What is Cluster Analysis?
2. Types of Data in Cluster Analysis
3. A Categorization of Major Clustering Methods
4. Partitioning Methods
5. Hierarchical Methods



Data Structures
 Data matrix (n objects × p variables):

   [ x_11  ...  x_1f  ...  x_1p ]
   [  ...  ...   ...  ...   ... ]
   [ x_i1  ...  x_if  ...  x_ip ]
   [  ...  ...   ...  ...   ... ]
   [ x_n1  ...  x_nf  ...  x_np ]

 Dissimilarity matrix (n × n, lower triangular):

   [   0                            ]
   [ d(2,1)    0                    ]
   [ d(3,1)  d(3,2)    0            ]
   [   :       :       :            ]
   [ d(n,1)  d(n,2)   ...  ...   0  ]


Type of data in cluster analysis
 Interval-scaled variables
 e.g., salary, temperature
 Binary variables
 e.g., gender (M/F), has_cancer(T/F)
 Nominal (categorical) variables
 e.g., religion (Christian, Muslim, Buddhist, Hindu, etc.)
 Ordinal variables
 e.g., military rank (soldier, sergeant, lieutenant, captain, etc.)
 Ratio-scaled variables
 population growth (1,10,100,1000,...)
 Variables of mixed types
 multiple attributes with various types
Interval-valued variables
 Standardize data
    Calculate the mean absolute deviation:
      s_f = (1/n) (|x_1f − m_f| + |x_2f − m_f| + ... + |x_nf − m_f|)
      where m_f = (1/n) (x_1f + x_2f + ... + x_nf)
    Calculate the standardized measurement (z-score):
      z_if = (x_if − m_f) / s_f
 Using mean absolute deviation is more robust than
using standard deviation
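As a quick illustration (a minimal NumPy sketch, not part of the original slides; the data values are made up), the standardization above can be written as:

import numpy as np

def standardize(X):
    # z_if = (x_if - m_f) / s_f, where s_f is the mean absolute deviation of variable f
    X = np.asarray(X, dtype=float)
    m = X.mean(axis=0)                  # m_f: per-variable mean
    s = np.abs(X - m).mean(axis=0)      # s_f: mean absolute deviation
    return (X - m) / s

# hypothetical salary and temperature columns on very different scales
X = [[30000, 15.0], [45000, 22.0], [60000, 18.0]]
print(standardize(X))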
Similarity and Dissimilarity
Between Objects

 Distances are normally used to measure the
similarity or dissimilarity between two data
objects
 Some popular ones include the Minkowski distance:
   d(i, j) = (|x_i1 − x_j1|^q + |x_i2 − x_j2|^q + ... + |x_ip − x_jp|^q)^(1/q)
where i = (x_i1, x_i2, ..., x_ip) and j = (x_j1, x_j2, ..., x_jp)
are two p-dimensional data objects, and q is a
positive integer
 If q = 1, d is Manhattan distance:
   d(i, j) = |x_i1 − x_j1| + |x_i2 − x_j2| + ... + |x_ip − x_jp|


Similarity and Dissimilarity
Between Objects (Cont.)

 If q = 2, d is Euclidean distance:
   d(i, j) = sqrt(|x_i1 − x_j1|² + |x_i2 − x_j2|² + ... + |x_ip − x_jp|²)
 Properties
    d(i, j) ≥ 0
    d(i, i) = 0
    d(i, j) = d(j, i)
    d(i, j) ≤ d(i, k) + d(k, j)
 Also, one can use weighted distance, parametric
Pearson product moment correlation, or other
dissimilarity measures
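For concreteness, a small Python sketch (not from the slides) of the Minkowski family; q = 1 gives Manhattan distance and q = 2 gives Euclidean distance:

import numpy as np

def minkowski(i, j, q=2):
    # Minkowski distance between two p-dimensional objects i and j
    i, j = np.asarray(i, dtype=float), np.asarray(j, dtype=float)
    return np.sum(np.abs(i - j) ** q) ** (1.0 / q)

x, y = [1.0, 2.0, 3.0], [4.0, 0.0, 3.0]
print(minkowski(x, y, q=1))   # Manhattan: 5.0
print(minkowski(x, y, q=2))   # Euclidean: ~3.61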



Chapter 7. Cluster Analysis
1. What is Cluster Analysis?
2. Types of Data in Cluster Analysis
3. A Categorization of Major Clustering Methods
4. Partitioning Methods
5. Hierarchical Methods
6. Summary



Major Clustering Approaches

 Partitioning approach:
 Construct various partitions and then evaluate them by some
criterion, e.g., minimizing the sum of square errors
 Typical methods: k-means, k-medoids, CLARANS
 Hierarchical approach:
 Create a hierarchical decomposition of the set of data (or objects)
using some criterion
 Typical methods: DIANA, AGNES, BIRCH, ROCK, CHAMELEON



Chapter 7. Cluster Analysis
1. What is Cluster Analysis?
2. Types of Data in Cluster Analysis
3. A Categorization of Major Clustering Methods
4. Partitioning Methods
5. Hierarchical Methods



Partitioning Algorithms: Basic
Concept
 Partitioning method: Construct a partition of a database D of
n objects into a set of k clusters, such that the sum of squared
distances is minimized:
   E = Σ (m = 1..k) Σ (t_mi ∈ K_m) (C_m − t_mi)²
where C_m is the center of cluster K_m
 Given k, find a partition of k clusters that optimizes the
chosen partitioning criterion
 Global optimal: exhaustively enumerate all partitions
 Heuristic methods: k-means and k-medoids algorithms
 k-means (MacQueen’67): Each cluster is represented by
the center of the cluster
 k-medoids or PAM (Partition around medoids) (Kaufman &
Rousseeuw’87): Each cluster is represented by one of the
objects in the cluster
The K-Means Clustering Method

 Given k, the k-means algorithm is implemented
in four steps:
    1. Partition objects into k nonempty subsets
    2. Compute seed points as the centroids of the
clusters of the current partition (the centroid
is the center, i.e., mean point, of the cluster)
    3. Assign each object to the cluster with the
nearest seed point
    4. Go back to Step 2; stop when no more new
assignments occur
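A minimal NumPy sketch of these four steps (not from the slides; it assumes Euclidean distance and, for simplicity, re-seeds any cluster that becomes empty):

import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = rng.integers(0, k, size=len(X))             # Step 1: random partition into k subsets
    centroids = np.empty((k, X.shape[1]))
    for _ in range(max_iter):
        for c in range(k):                               # Step 2: centroids of the current partition
            members = X[labels == c]
            centroids[c] = members.mean(axis=0) if len(members) else X[rng.integers(len(X))]
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)                # Step 3: assign to the nearest seed point
        if np.array_equal(new_labels, labels):           # Step 4: stop when nothing changes
            break
        labels = new_labels
    return labels, centroids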



The K-Means Clustering Method

 Example (K = 2): arbitrarily choose K objects as the initial
cluster centers; assign each object to the most similar center;
update the cluster means; reassign objects; repeat until no
object changes cluster.
(Figure: three iterations of assignment and mean-update on a 2-D point set.)


Comments on the K-Means Method

 Strength: Relatively efficient: O(tkn), where n is # objects, k is
# clusters, and t is # iterations. Normally, k, t << n.
 Comment: Often terminates at a local optimum. The global
optimum may be found using techniques such as:
deterministic annealing and genetic algorithms
 Weakness
 Applicable only when mean is defined, then what about
categorical data?
 Need to specify k, the number of clusters, in advance
 Unable to handle noisy data and outliers
 Not suitable to discover clusters with non-convex shapes



Challenges
 When the size of clusters is different

Courtesy of https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-k-means-clustering/
Challenges
 When the densities of original points are
different

Courtesy of https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-k-means-clustering/
Variations of the K-Means Method

 A few variants of the k-means exist, which differ in
 Selection of the initial k means
 Dissimilarity calculations
 Strategies to calculate cluster means
 Handling categorical data: k-modes (Huang’98)
 Replacing means of clusters with modes
 Using new dissimilarity measures to deal with categorical objects
 Using a frequency-based method to update modes of clusters
 A mixture of categorical and numerical data: k-prototype method



K-Means++ Clustering
 Aids in choosing initial cluster centroid for K-Means
clustering
 The steps to initialize the centroids using K-Means++
are:
 1. The first cluster center is chosen uniformly at random from the data
points that we want to cluster. This is similar to what we do in
k-means, but instead of randomly picking all the centroids, we
pick only one centroid here

 2. Next, we compute the distance D(x) of each data point x from the
nearest cluster center that has already been chosen

 3. Then, choose the new cluster center from the data points, with the
probability of picking x being proportional to D(x)²

 4. Repeat steps 2 and 3 until k centers have been chosen
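A small NumPy sketch of this seeding procedure (not from the slides):

import numpy as np

def kmeans_pp_init(X, k, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = [X[rng.integers(len(X))]]                   # step 1: one center, uniformly at random
    while len(centers) < k:
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)  # D(x)^2 to nearest chosen center
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])                # sample x with prob ∝ D(x)^2
    return np.array(centers)

scikit-learn's KMeans uses this seeding by default (init='k-means++').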


K-Means++ Clustering

(Figure: original data points; 1. a random point is chosen as the initial
cluster centroid; 2. the distance of each point from that centroid is
calculated; 3. the farthest point is chosen as another cluster centroid.)
Courtesy of https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-k-means-clustering/
K-Means++ Clustering

(Figure: 4. compute the distance between each point and its closest
centroid; 5. the data point with the highest squared distance becomes the
next centroid.)
Courtesy of https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-k-means-clustering/
Choosing right number of
clusters
 Elbow method – plot an elbow curve, where the x-axis
represents the number of clusters and the y-axis
is an evaluation metric.
 Choose a metric such as inertia or the Dunn index
and plot its value against the number of clusters

Courtesy of https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-k-means-clustering/
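One way to produce such a curve with scikit-learn (a sketch with made-up toy data; the "elbow" is read off the plot where inertia stops dropping sharply):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))       # toy data
ks = range(1, 11)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia (within-cluster SSE)")
plt.title("Elbow curve")
plt.show()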
 https://codinginfinite.com/k-means-
clustering-explained-with-numerical-
example/



What Is the Problem of the K-Means
Method?

 The k-means algorithm is sensitive to outliers!
    An object with an extremely large value may
substantially distort the distribution of the data.
 K-Medoids: Instead of taking the mean value of the objects in
a cluster as a reference point, a medoid can be used, which is
the most centrally located object in a cluster.



Partitioning Around Medoids
(PAM)
 Steps to find cluster centers
    1. Choose k data points from the scatter plot as
starting points for cluster centers.
    2. Calculate their distance from all the points in
the scatter plot.
    3. Classify each point into the cluster whose
center it is closest to.
    4. Select a new point in each cluster that
minimizes the sum of distances of all points
in that cluster from itself (swap phase).
    5. Repeat from Step 2 until the centers stop
changing.
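A simple PAM-style sketch in NumPy (not from the slides; Euclidean distances are assumed, and the swap phase here just moves each medoid to the member that minimizes the within-cluster distance sum):

import numpy as np

def pam(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)    # pairwise distances
    medoids = rng.choice(len(X), size=k, replace=False)          # step 1: initial medoids
    for _ in range(max_iter):
        labels = D[:, medoids].argmin(axis=1)                    # steps 2-3: assign to the closest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members):
                costs = D[np.ix_(members, members)].sum(axis=0)  # swap phase: total distance to cluster members
                new_medoids[c] = members[costs.argmin()]
        if np.array_equal(new_medoids, medoids):                 # stop when the centers stop changing
            break
        medoids = new_medoids
    return medoids, labels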
Hierarchical Clustering
 Produces a set of nested clusters
organized as a hierarchical tree
 Can be visualized as a dendrogram
 A tree-like diagram that records the

sequences of merges or splits

(Figure: six 2-D points and the dendrogram that records the sequence in
which they are merged.)
Hierarchical Clustering
 Given a set of N items to be clustered, and an NxN distance
(or similarity) matrix, the basic process of hierarchical
clustering is as follows:

1. Start by assigning each item to its own cluster, so that if you
have N items, you now have N clusters, each containing just
one item.
2. Find the closest (most similar) pair of clusters and merge
them into a single cluster, so that now you have one less
cluster.
3. Compute distances (similarities) between the new cluster
and each of the old clusters.
4. Repeat steps 2 and 3 until all items are clustered into a
single cluster of size N.
Hierarchical Clustering

1. Scan the matrix for the minimum

2. Join items into one node

3. Update matrix and repeat from step 1


Hierarchical Clustering
• Two main types of hierarchical clustering
– Agglomerative:
• Start with the points as individual clusters
• At each step, merge the closest pair of clusters until only one

cluster (or k clusters) left

– Divisive:
• Start with one, all-inclusive cluster
• At each step, split a cluster until each cluster contains a point

(or there are k clusters)

• Traditional hierarchical algorithms use a similarity
or distance matrix
– Merge or split one cluster at a time
Agglomerative clustering algorithm

 Most popular hierarchical clustering technique


 Basic algorithm
1. Compute the distance matrix between the input data
points
2. Let each data point be a cluster
3. Repeat
4. Merge the two closest clusters
5. Update the distance matrix
6. Until only a single cluster remains
 Key operation is the computation of the
distance between two clusters
 Different definitions of the distance between clusters
lead to different algorithms
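A naive sketch of this loop (not from the slides; it recomputes inter-cluster distances from the full pairwise distance matrix, so it is only meant to make the steps concrete):

import numpy as np

def agglomerative(X, linkage="single"):
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # 1. pairwise distance matrix
    clusters = [[i] for i in range(len(X))]                     # 2. each point starts as its own cluster
    agg = min if linkage == "single" else max                   # single-link or complete-link distance
    merges = []
    while len(clusters) > 1:                                    # 3. repeat ...
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = agg(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), d))  # 4. merge the two closest clusters
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                                           # 5. distances are recomputed next pass
    return merges                                                 # 6. ... until a single cluster remains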
Input/ Initial setting
 Start with clusters of individual points and a
distance/proximity matrix
(Figure: points p1, p2, p3, ... each in its own cluster, together with the
initial distance/proximity matrix.)
Intermediate State
 After some merging steps, we have some clusters

(Figure: five clusters C1–C5 and the current distance/proximity matrix.)
Intermediate State
 Merge the two closest clusters (C2 and C5) and update
the distance matrix.
(Figure: clusters C1–C5, with C2 and C5 highlighted as the closest pair to
be merged.)
After Merging
 “How do we update the distance matrix?”

(Figure: the distance matrix after C2 and C5 are merged into C2 ∪ C5; the
distances from C2 ∪ C5 to C1, C3, and C4 must be recomputed.)
Hierarchical Clustering

Distance between two points – easy to compute


Distance between two clusters – harder to compute:

1. Single-Link Method / Nearest Neighbor


2. Complete-Link / Furthest Neighbor
3. Average of all cross-cluster pairs
Typical Alternatives to Calculate the
Distance between Clusters
 Single link: smallest distance between an element in one
cluster and an element in the other, i.e., dis(Ki, Kj) = min(tip, tjq)
 Complete link: largest distance between an element in one
cluster and an element in the other, i.e., dis(Ki, Kj) = max(tip, tjq)
 Average: average distance between an element in one cluster and
an element in the other, i.e., dis(Ki, Kj) = avg(tip, tjq)
 Centroid: distance between the centroids of two clusters, i.e.,
dis(Ki, Kj) = dis(Ci, Cj)
 Medoid: distance between the medoids of two clusters, i.e.,
dis(Ki, Kj) = dis(Mi, Mj)
 Medoid: a chosen, centrally located object in the cluster
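These alternatives can be computed directly from the cross-cluster distances; a small SciPy/NumPy sketch (not from the slides, Euclidean distance assumed):

import numpy as np
from scipy.spatial.distance import cdist

def cluster_distances(Ki, Kj):
    Ki, Kj = np.asarray(Ki, dtype=float), np.asarray(Kj, dtype=float)
    D = cdist(Ki, Kj)                                    # all cross-cluster pairwise distances
    ci, cj = Ki.mean(axis=0), Kj.mean(axis=0)            # centroids
    mi = Ki[cdist(Ki, Ki).sum(axis=1).argmin()]          # medoid: smallest total distance to own cluster
    mj = Kj[cdist(Kj, Kj).sum(axis=1).argmin()]
    return {
        "single":   D.min(),                             # most similar cross pair
        "complete": D.max(),                             # most dissimilar cross pair
        "average":  D.mean(),                            # average over all cross pairs
        "centroid": np.linalg.norm(ci - cj),
        "medoid":   np.linalg.norm(mi - mj),
    }

print(cluster_distances([[0, 0], [1, 0]], [[4, 0], [6, 0]]))
# {'single': 3.0, 'complete': 6.0, 'average': 4.5, 'centroid': 4.5, 'medoid': 4.0}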
Distance between two clusters
 Single-link distance between clusters Ci
and Cj is the minimum distance between
any object in Ci and any object in Cj

 The distance is defined by the two most
similar objects:

   D_sl(Ci, Cj) = min { d(x, y) : x ∈ Ci, y ∈ Cj }
Single-link clustering: example

(Figure: nested clusters and the corresponding dendrogram produced by
single-link clustering on six points.)


Strengths of single-link clustering

(Figure: original points and the two clusters found by single link.)

• Can handle non-elliptical shapes


Limitations of single-link clustering

(Figure: original points and the two clusters found by single link.)

• Sensitive to noise and outliers


• It produces long, elongated clusters
Distance between two clusters
 Complete-link distance between clusters
Ci and Cj is the maximum distance
between any object in Ci and any object in
Cj

 The distance is defined by the two most
dissimilar objects:

   D_cl(Ci, Cj) = max { d(x, y) : x ∈ Ci, y ∈ Cj }
Complete-link clustering: example

(Figure: nested clusters and the corresponding dendrogram produced by
complete-link clustering on six points.)
Strengths of complete-link clustering

(Figure: original points and the two clusters found by complete link.)

• More balanced clusters (with equal diameter)
• Less susceptible to noise
Limitations of complete-link clustering

(Figure: original points and the two clusters found by complete link.)

• Tends to break large clusters
• All clusters tend to have the same diameter – small clusters are merged


Distance between two clusters
 Group average distance between
clusters Ci and Cj is the average distance
between any object in Ci and any object in
Cj

   D_avg(Ci, Cj) = (1 / (|Ci| · |Cj|)) Σ { d(x, y) : x ∈ Ci, y ∈ Cj }
Average-link clustering: example

(Figure: nested clusters and the corresponding dendrogram produced by
average-link clustering on six points.)


Average-link clustering:
discussion
 Compromise between Single and
Complete Link

 Strengths
 Less susceptible to noise and outliers

 Limitations
 Biased towards globular clusters
Distance between two clusters
 Centroid distance between clusters Ci
and Cj is the distance between the centroid
ri of Ci and the centroid rj of Cj

   D_centroids(Ci, Cj) = d(ri, rj)


Distance between two clusters
• Ward’s distance between clusters Ci and Cj is
the difference between the total within
cluster sum of squares for the two clusters
separately, and the within cluster sum of
squares resulting from merging the two
clusters in cluster Cij
Dw Ci , C j    x  ri    x  rj    x  r 
2 2 2
ij
xCi xC j xCij

• ri: centroid of Ci
• rj: centroid of Cj
• rij: centroid of Cij
Ward’s distance for clusters
 Similar to group average and centroid
distance

 Less susceptible to noise and outliers

 Biased towards globular clusters

 Hierarchical analogue of k-means


 Can be used to initialize k-means
Hierarchical Clustering:
Comparison
(Figure: the same six points clustered with MIN (single link), MAX (complete
link), Group Average, and Ward's Method, each producing different nested
clusterings.)
Hierarchical Clustering
In a dendrogram, the length of each tree branch represents
the distance between clusters it joins.

 Different dendrograms may arise when different linkage
methods are used.
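For example, SciPy can build and draw the dendrograms for several linkage methods side by side (a sketch with made-up toy data):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

X = np.random.default_rng(0).normal(size=(12, 2))        # toy 2-D points

fig, axes = plt.subplots(1, 4, figsize=(14, 3))
for ax, method in zip(axes, ["single", "complete", "average", "ward"]):
    Z = linkage(X, method=method)      # (n-1) x 4 merge table; branch heights = cluster distances
    dendrogram(Z, ax=ax)
    ax.set_title(method)
plt.tight_layout()
plt.show()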
Hierarchical Clustering: Time and
Space requirements
• For a dataset X consisting of n points
• O(n²) space; it requires storing the distance matrix
• O(n³) time in most of the cases
  – There are n steps, and at each step the size-n²
    distance matrix must be updated and searched
  – Complexity can be reduced to O(n² log n) time for
    some approaches by using appropriate data structures
DIANA (Divisive Analysis)

 Introduced in Kaufmann and Rousseeuw (1990)


 Implemented in statistical analysis packages, e.g.,
Splus
 Inverse order of AGNES
 Eventually each node forms a cluster on its own
(Figure: a set of 2-D points split top-down into progressively smaller
clusters until each point forms its own cluster.)
Divisive hierarchical clustering
• Start with a single cluster composed of all data points
• Split this into components
• Continue recursively
• Monothetic divisive methods split clusters using one
  variable/dimension at a time
• Polythetic divisive methods make splits on the basis of all
  variables together
• Any intercluster distance measure can be used
• Computationally intensive, less widely used than
  agglomerative methods
Manual workout References

https://people.revoledu.com/kardi/tutorial/Clustering/Numerical%20Example.htm

https://online.stat.psu.edu/stat555/node/86/



Strengths of Hierarchical
Clustering
 No assumptions on the number of clusters
    Any desired number of clusters can be
obtained by 'cutting' the dendrogram at
the proper level
 Hierarchical clusterings may correspond to
meaningful taxonomies
    Examples in the biological sciences (e.g.,
phylogeny reconstruction), the web
(e.g., product catalogs), etc.
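Cutting the tree at a chosen level is what scipy.cluster.hierarchy.fcluster does; a short sketch (toy data made up):

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.random.default_rng(1).normal(size=(20, 2))    # toy data
Z = linkage(X, method="average")                     # build the full merge tree once

labels_3 = fcluster(Z, t=3, criterion="maxclust")    # cut so that at most 3 clusters remain
labels_5 = fcluster(Z, t=5, criterion="maxclust")    # a lower cut of the same tree gives 5 clusters
print(labels_3)
print(labels_5)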
Clustering Algorithms
K-means vs. Hierarchical clustering:
 Computation Time:
– Hierarchical clustering: O( m n2 log(n) )
– K-means clustering: O( k t m n )
 t: number of iterations
 n: number of objects
 m-dimensional vectors
 k: number of clusters
 Memory Requirements:
– Hierarchical clustering: O( mn + n2 )
– K-means clustering: O( mn + kn )
 Other:
 Hierarchical Clustering:
    Need to select a linkage method
    To perform any analysis, it is necessary to partition the dendrogram into k disjoint
clusters, cutting the dendrogram at some point. A limitation is that it is not clear
how to choose this k
 K-means: Need to select K
 In both cases: Need to select distance/similarity measure
More on Hierarchical Clustering
Methods
 Major weakness of agglomerative clustering methods
 do not scale well: time complexity of at least O(n2),

where n is the number of total objects


 can never undo what was done previously

 Integration of hierarchical with distance-based


clustering
 BIRCH (1996): uses CF-tree and incrementally

adjusts the quality of sub-clusters


 CURE (1998): selects well-scattered points from the

cluster and then shrinks them towards the center of


the cluster by a specified fraction
 CHAMELEON (1999): hierarchical clustering using
dynamic modeling
BIRCH (1996)
 Birch: Balanced Iterative Reducing and Clustering using
Hierarchies, by Zhang, Ramakrishnan, Livny
(SIGMOD’96)
 Incrementally construct a CF (Clustering Feature) tree, a
hierarchical data structure for multiphase clustering
 Phase 1: scan DB to build an initial in-memory CF tree
(a multi-level compression of the data that tries to
preserve the inherent clustering structure of the data)
 Phase 2: use an arbitrary clustering algorithm to
cluster the leaf nodes of the CF-tree
 Scales linearly: finds a good clustering with a single scan
and improves the quality with a few additional scans
 Weakness: handles only numeric data, and sensitive to
the order of the data record.
Clustering Feature Vector

Clustering Feature: CF = (N, LS, SS)

N: number of data points
LS: linear sum of the N points, Σ (i = 1..N) X_i
SS: square sum of the N points, Σ (i = 1..N) X_i²

Example: for the five points (3,4), (2,6), (4,5), (4,7), (3,8):
   CF = (5, (16,30), (54,190))
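A minimal sketch (not from the slides) of computing a CF and using its additivity, which is what lets BIRCH build and update the tree incrementally:

import numpy as np

def clustering_feature(points):
    # CF = (N, LS, SS) summarizes a sub-cluster of d-dimensional points
    P = np.asarray(points, dtype=float)
    return len(P), P.sum(axis=0), (P ** 2).sum(axis=0)

def merge_cf(cf1, cf2):
    # CF additivity: merging two disjoint sub-clusters just adds their CFs component-wise
    return cf1[0] + cf2[0], cf1[1] + cf2[1], cf1[2] + cf2[2]

N, LS, SS = clustering_feature([(3, 4), (2, 6), (4, 5), (4, 7), (3, 8)])
print(N, LS, SS)                        # 5 [16. 30.] [ 54. 190.]
print(LS / N)                           # centroid (3.2, 6.0), derived from the CF alone

cf_a = clustering_feature([(3, 4), (2, 6)])
cf_b = clustering_feature([(4, 5), (4, 7), (3, 8)])
print(merge_cf(cf_a, cf_b))             # equals the CF of all five points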

BIRCH (figure)

BIRCH - Example (figure)

BIRCH Algorithm Phases (figures)
CF Tree
(Figure: a CF tree with branching factor B = 7 and leaf capacity L = 6; the
root and non-leaf nodes hold CF entries CF1, CF2, ... with pointers to child
nodes, and leaf nodes hold CF entries linked by prev/next pointers.)
Numeric Example
(Figures: a multi-slide worked numeric example.)
