
71762108005

Ex. No: 4
K-MEANS & HIERARCHICAL CLUSTERING IN WEKA TOOL
Date: 26.9.23

Aim:
To implement the K-Means Clustering and Hierarchical Clustering Algorithm in Weka Tool.

K-Means Clustering Algorithm:

function k_means(dataset, k):
    # Randomly initialize cluster centroids
    centroids = initialize_random_centroids(dataset, k)

    while true:
        # Assign each data point to the nearest centroid
        clusters = assign_to_nearest_centroid(dataset, centroids)

        # Update the centroids based on the assigned data points
        new_centroids = update_centroids(clusters)

        # Check for convergence
        if centroids_converged(centroids, new_centroids):
            break

        # Update centroids for the next iteration
        centroids = new_centroids

    return clusters
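The pseudocode above can be sketched as runnable plain Python (an illustration only, not Weka's SimpleKMeans; the helper names dist2 and mean are my own):

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two points
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    # Component-wise mean of a non-empty list of points
    n = len(points)
    return tuple(sum(xs) / n for xs in zip(*points))

def k_means(dataset, k, max_iter=100, seed=10):
    rng = random.Random(seed)
    # Randomly initialize cluster centroids from the data points
    centroids = rng.sample(dataset, k)
    for _ in range(max_iter):
        # Assign each data point to the nearest centroid
        clusters = [[] for _ in range(k)]
        for point in dataset:
            idx = min(range(k), key=lambda i: dist2(point, centroids[i]))
            clusters[idx].append(point)
        # Update the centroids based on the assigned data points
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # Check for convergence
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids
```

For example, `k_means([(1, 1), (1, 2), (8, 8), (9, 8)], 2)` separates the two groups of points into clusters of two instances each.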

Hierarchical Clustering Algorithm:

function hierarchical_clustering(dataset):
    # Initialize each data point as a separate cluster
    clusters = initialize_clusters(dataset)

    while len(clusters) > 1:
        # Compute pairwise distances between clusters
        distances = compute_pairwise_distances(clusters)

        # Find the closest pair of clusters
        closest_pair = find_closest_clusters(distances)

        # Merge the closest pair of clusters into a new cluster
        new_cluster = merge_clusters(closest_pair, clusters)

        # Remove the merged clusters from the list of clusters
        clusters.remove(closest_pair[0])
        clusters.remove(closest_pair[1])

        # Add the new cluster to the list of clusters
        clusters.append(new_cluster)

    return clusters
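A runnable plain-Python sketch of the agglomerative procedure above, using single linkage to match the -L SINGLE option in the Weka run below (this stops once a target number of clusters remains instead of merging all the way to one; it is an illustration, not Weka's HierarchicalClusterer):

```python
def dist2(a, b):
    # Squared Euclidean distance between two points
    return sum((x - y) ** 2 for x, y in zip(a, b))

def single_link_dist(c1, c2):
    # Single linkage: minimum pairwise distance between two clusters
    return min(dist2(a, b) for a in c1 for b in c2)

def hierarchical_clustering(dataset, num_clusters=1):
    # Initialize each data point as a separate cluster
    clusters = [[p] for p in dataset]
    while len(clusters) > num_clusters:
        # Find the closest pair of clusters
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link_dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair into a new cluster
        merged = clusters[i] + clusters[j]
        # Remove the merged clusters (delete j first so index i stays valid)
        del clusters[j]
        del clusters[i]
        # Add the new cluster to the list of clusters
        clusters.append(merged)
    return clusters
```

For example, `hierarchical_clustering([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` merges each nearby pair, leaving two clusters of two points.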

Dataset:
https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/supermarket.arff
https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/weather.nominal.arff

Output:


=== Run information ===

Scheme: weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation: supermarket
Instances: 4627
Attributes: 217
[list of attributes omitted]
Test mode: evaluate on training data

=== Clustering model (full training set) ===

kMeans
======
Number of iterations: 2
Within cluster sum of squared errors: 0.0

Initial starting points (random):


Cluster 0:
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,high
Cluster 1:
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,low

Missing values globally replaced with mean/mode

Time taken to build model (full training data) : 0.1 seconds

=== Model and evaluation on training set ===

Clustered Instances

0 1679 ( 36%)
1 2948 ( 64%)


=== Run information ===

Scheme: weka.clusterers.HierarchicalClusterer -N 2 -L SINGLE -P -A "weka.core.EuclideanDistance -R first-last"
Relation: weather.symbolic
Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
play
Test mode: evaluate on training data

=== Clustering model (full training set) ===

Cluster 0
((1.0:1,1.0:1):0,1.0:1)

Cluster 1
(((((0.0:1,0.0:1):0.41421,((((0.0:1,0.0:1):0,
(0.0:1,0.0:1):0):0.41421,1.0:1.41421):0,0.0:1.41421):0):0,0.0:1.41421):0,0.0:1.41421):0,1.0:1.41421)

Time taken to build model (full training data) : 0 seconds

=== Model and evaluation on training set ===

Clustered Instances

0 3 ( 21%)
1 11 ( 79%)

K-Means Clustering Inference:
• Number of Clusters: The K-Means algorithm has created two clusters, Cluster 0 and Cluster 1.
• Cluster Sizes: Cluster 0 contains 36% of the total instances (1679 instances), while Cluster 1
contains 64% of the total instances (2948 instances). These percentages represent the distribution of
data points among the clusters.
• Within-Cluster Sum of Squared Errors (WSS): The WSS is reported as 0.0, which is unusual.
Typically, the WSS should be a positive value, and it measures the total distance of data points
within each cluster to their respective cluster centroids. A WSS of 0.0 suggests that the data points
may be perfectly clustered or that there is an issue with the clustering results.
• Initial Starting Points: The randomly chosen starting instances end in "high" (Cluster 0) and "low" (Cluster 1); these are the values of the dataset's final attribute (total) for those instances, suggesting the two clusters are seeded from a high-spending and a low-spending transaction.
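The WSS figure discussed above can be computed directly; a minimal sketch with made-up toy values (not the supermarket data):

```python
def wss(clusters, centroids):
    # Within-cluster sum of squared errors: total squared distance of
    # every point to the centroid of its own cluster
    total = 0.0
    for points, centroid in zip(clusters, centroids):
        for p in points:
            total += sum((x - c) ** 2 for x, c in zip(p, centroid))
    return total

# Toy example: two points, each at distance 1 from their centroid
print(wss([[(0, 0), (0, 2)]], [(0, 1)]))  # 2.0
```

A genuinely well-separated clustering gives a small positive WSS; an exact 0.0 on real data is worth double-checking, as the inference above notes.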


Hierarchical Clustering Inference:
• Cluster Structures:
Cluster 0: This cluster contains 3 instances, which represent 21% of the total instances in the
dataset.
Cluster 1: Cluster 1 contains 11 instances, representing 79% of the total instances.
• Hierarchical Structure: The hierarchical structure of the clusters is represented in a nested format,
with sub-clusters within Cluster 1.
• Interpretation:
Cluster 0 and Cluster 1 are the top-level clusters.
Within Cluster 1, there are sub-clusters with further nested structures.
• Distance Measures:
The distances between data points within clusters are represented using numerical values.
The specific meaning of these distances depends on the distance metric and linkage method used for
hierarchical clustering (in this case, the "EuclideanDistance" and "SINGLE" linkage method).

Result:
The K-Means Clustering and Hierarchical Clustering algorithms were successfully implemented in the Weka tool using the Cluster option.
