
71762108005

Ex. No: 4
K-MEANS & HIERARCHICAL CLUSTERING IN WEKA TOOL
Date: 26.9.23

Aim:
To implement the K-Means Clustering and Hierarchical Clustering Algorithm in Weka Tool.

K-Means Clustering Algorithm:

function k_means(dataset, k):
    # Randomly initialize cluster centroids
    centroids = initialize_random_centroids(dataset, k)

    while true:
        # Assign each data point to the nearest centroid
        clusters = assign_to_nearest_centroid(dataset, centroids)

        # Update the centroids based on the assigned data points
        new_centroids = update_centroids(clusters)

        # Check for convergence
        if centroids_converged(centroids, new_centroids):
            break

        # Update centroids for the next iteration
        centroids = new_centroids

    return clusters
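The pseudocode above can be sketched as runnable plain Python (an illustration only, not Weka's SimpleKMeans; the helper names dist2 and mean are my own):

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two points
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    # Component-wise mean of a non-empty list of points
    n = len(points)
    return tuple(sum(xs) / n for xs in zip(*points))

def k_means(dataset, k, max_iter=100, seed=10):
    rng = random.Random(seed)
    # Randomly initialize cluster centroids from the data points
    centroids = rng.sample(dataset, k)
    for _ in range(max_iter):
        # Assign each data point to the nearest centroid
        clusters = [[] for _ in range(k)]
        for point in dataset:
            idx = min(range(k), key=lambda i: dist2(point, centroids[i]))
            clusters[idx].append(point)
        # Update the centroids based on the assigned data points
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # Check for convergence
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids
```

For example, `k_means([(1, 1), (1, 2), (8, 8), (9, 8)], 2)` separates the two groups of points into clusters of two instances each.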

Hierarchical Clustering Algorithm:

function hierarchical_clustering(dataset):
    # Initialize each data point as a separate cluster
    clusters = initialize_clusters(dataset)

    while len(clusters) > 1:
        # Compute pairwise distances between clusters
        distances = compute_pairwise_distances(clusters)

        # Find the closest pair of clusters
        closest_pair = find_closest_clusters(distances)

        # Merge the closest pair of clusters into a new cluster
        new_cluster = merge_clusters(closest_pair, clusters)

        # Remove the merged clusters from the list of clusters
        clusters.remove(closest_pair[0])
        clusters.remove(closest_pair[1])

        # Add the new cluster to the list of clusters
        clusters.append(new_cluster)

    return clusters
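A runnable plain-Python sketch of the agglomerative procedure above, using single linkage to match the -L SINGLE option in the Weka run below (this stops once a target number of clusters remains instead of merging all the way to one; it is an illustration, not Weka's HierarchicalClusterer):

```python
def dist2(a, b):
    # Squared Euclidean distance between two points
    return sum((x - y) ** 2 for x, y in zip(a, b))

def single_link_dist(c1, c2):
    # Single linkage: minimum pairwise distance between two clusters
    return min(dist2(a, b) for a in c1 for b in c2)

def hierarchical_clustering(dataset, num_clusters=1):
    # Initialize each data point as a separate cluster
    clusters = [[p] for p in dataset]
    while len(clusters) > num_clusters:
        # Find the closest pair of clusters
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link_dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair into a new cluster
        merged = clusters[i] + clusters[j]
        # Remove the merged clusters (delete j first so index i stays valid)
        del clusters[j]
        del clusters[i]
        # Add the new cluster to the list of clusters
        clusters.append(merged)
    return clusters
```

For example, `hierarchical_clustering([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` merges each nearby pair, leaving two clusters of two points.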

Dataset:
https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/supermarket.arff
https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/weather.nominal.arff

Output:


=== Run information ===

Scheme: weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation: supermarket
Instances: 4627
Attributes: 217
[list of attributes omitted]
Test mode: evaluate on training data

=== Clustering model (full training set) ===

kMeans
======
Number of iterations: 2
Within cluster sum of squared errors: 0.0

Initial starting points (random):


Cluster 0:
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,high
Cluster 1:
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,low

Missing values globally replaced with mean/mode

Time taken to build model (full training data) : 0.1 seconds

=== Model and evaluation on training set ===

Clustered Instances

0 1679 ( 36%)
1 2948 ( 64%)


=== Run information ===

Scheme: weka.clusterers.HierarchicalClusterer -N 2 -L SINGLE -P -A "weka.core.EuclideanDistance -R first-last"
Relation: weather.symbolic
Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
play
Test mode: evaluate on training data

=== Clustering model (full training set) ===

Cluster 0
((1.0:1,1.0:1):0,1.0:1)

Cluster 1
(((((0.0:1,0.0:1):0.41421,((((0.0:1,0.0:1):0,
(0.0:1,0.0:1):0):0.41421,1.0:1.41421):0,0.0:1.41421):0):0,0.0:1.41421):0,0.0:1.41421):0,1.0:1.41421)

Time taken to build model (full training data) : 0 seconds

=== Model and evaluation on training set ===

Clustered Instances

0 3 ( 21%)
1 11 ( 79%)

K-Means Clustering Inference:
• Number of Clusters: The K-Means algorithm has created two clusters, Cluster 0 and Cluster 1.
• Cluster Sizes: Cluster 0 contains 36% of the total instances (1679 instances), while Cluster 1
contains 64% of the total instances (2948 instances). These percentages represent the distribution of
data points among the clusters.
• Within-Cluster Sum of Squared Errors (WSS): The WSS is reported as 0.0, which is unusual.
Typically, the WSS should be a positive value, and it measures the total distance of data points
within each cluster to their respective cluster centroids. A WSS of 0.0 suggests that the data points
may be perfectly clustered or that there is an issue with the clustering results.
• Initial Starting Points: The randomly chosen starting instances end in "high" (Cluster 0) and "low" (Cluster 1); these are the values of the dataset's final attribute (total) for those instances, suggesting the two clusters are seeded from a high-spending and a low-spending transaction.
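The WSS figure discussed above can be computed directly; a minimal sketch with made-up toy values (not the supermarket data):

```python
def wss(clusters, centroids):
    # Within-cluster sum of squared errors: total squared distance of
    # every point to the centroid of its own cluster
    total = 0.0
    for points, centroid in zip(clusters, centroids):
        for p in points:
            total += sum((x - c) ** 2 for x, c in zip(p, centroid))
    return total

# Toy example: two points, each at distance 1 from their centroid
print(wss([[(0, 0), (0, 2)]], [(0, 1)]))  # 2.0
```

A genuinely well-separated clustering gives a small positive WSS; an exact 0.0 on real data is worth double-checking, as the inference above notes.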


Hierarchical Clustering Inference:
• Cluster Structures:
Cluster 0: This cluster contains 3 instances, which represent 21% of the total instances in the
dataset.
Cluster 1: Cluster 1 contains 11 instances, representing 79% of the total instances.
• Hierarchical Structure: The hierarchical structure of the clusters is represented in a nested format,
with sub-clusters within Cluster 1.
• Interpretation:
Cluster 0 and Cluster 1 are the top-level clusters.
Within Cluster 1, there are sub-clusters with further nested structures.
• Distance Measures:
The distances between data points within clusters are represented using numerical values.
The specific meaning of these distances depends on the distance metric and linkage method used for
hierarchical clustering (in this case, the "EuclideanDistance" and "SINGLE" linkage method).

Result:
The K-Means Clustering and Hierarchical Clustering algorithms were successfully implemented in the Weka tool using the Cluster option.
