EXPERIMENT 9
Aim: Implementation of K-Means Clustering
COURSE OUTCOMES
CO4 Evaluate machine learning model’s performance and apply learning strategy to
improve the performance of supervised and unsupervised learning model.
CO5 Develop a suitable model for supervised and unsupervised learning algorithm and
optimize the model on the expected accuracy.
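K-Means Clustering
In this model the data is divided into k clusters: each point is assigned to the cluster whose mean (centroid) is nearest to it, and each mean is then recomputed from the points assigned to it.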
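The assign-then-update idea described above can be sketched in plain NumPy. This is a minimal illustration, not scikit-learn's implementation: the helper name simple_kmeans, the random choice of starting centers, and the iteration cap are all assumptions made for this example.

```python
import numpy as np

def simple_kmeans(points, k, n_iters=100, seed=0):
    """Hypothetical helper: basic k-means on an (n, d) array."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random (an assumed init scheme).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: index of the nearest center for every point.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    return labels, centers

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100])
labels, centers = simple_kmeans(data.reshape(-1, 1), k=2)
print(labels)
print(centers.ravel())
```

For these two well-separated groups the centers settle at 5.5 and 95.5, the means of 1..10 and 91..100. On less clearly separated data the result can depend on the starting centers.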
1. Identify 2 groups in a 1D array
from sklearn.cluster import KMeans
import numpy as np
data = np.array([1,2,3,4,5,6,7,8,9,10,91,92,93,94,95,96,97,98,99,100])
# KMeans expects a 2D array, so reshape the 1D data into a single column
kmeans = KMeans(n_clusters=2).fit(data.reshape(-1,1))
kmeans.predict(data.reshape(-1,1))
1. Identify 5 groups in a 1D array
from sklearn.cluster import KMeans
import numpy as np
data = np.array([101, 107, 106, 199, 204, 205, 207, 306, 310, 312, 312, 314, 317, 318, 380, 377,
379, 382, 466, 469, 471, 472, 557, 559, 562, 566, 569])
# Reshape the 1D data into a single column before fitting
kmeans = KMeans(n_clusters=5).fit(data.reshape(-1,1))
kmeans.predict(data.reshape(-1,1))
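Besides the predicted labels, a fitted KMeans object exposes cluster_centers_ and labels_, which make the 1D grouping easier to read. The sketch below reuses the data from this step; the n_init and random_state values are assumptions added only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

data = np.array([101, 107, 106, 199, 204, 205, 207, 306, 310, 312, 312, 314, 317, 318,
                 380, 377, 379, 382, 466, 469, 471, 472, 557, 559, 562, 566, 569])
column = data.reshape(-1, 1)

# n_init and random_state are assumed values, fixed only for repeatability.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(column)

# One center per cluster; sort them so the groups read from low to high.
centers = np.sort(kmeans.cluster_centers_.ravel())
print(centers)

# List the members of each cluster by label.
groups = {int(label): data[kmeans.labels_ == label].tolist()
          for label in np.unique(kmeans.labels_)}
for label, members in groups.items():
    print(label, members)
```

The label numbers themselves carry no meaning; only the grouping of the values matters.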
2. Identify 2 groups in a 2D array
from sklearn.cluster import KMeans
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
# Predict the cluster of new points that were not in the training data
kmeans.predict([[0, 0], [12, 3]])
kmeans.predict([[11,11], [8, 9]])
kmeans.predict([[2,20], [4, 4]])
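Explanation:
The six training points in X (x, y) are:
1 2
1 4
1 0
10 2
10 4
10 0
The three points with x = 1 form one cluster and the three points with x = 10 form the other.
The answer for the first prediction is [1, 0]:
[0,0] is predicted to be in cluster 1
[12,3] is predicted to be in cluster 0
Similarly, check [11,11] and [8,9]: the result must be [0, 0].
And check [2,20] and [4,4]: the result must be [1, 1].
The cluster label numbers are arbitrary. If a run numbers the clusters the other way round, all three answers flip together.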
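The predictions above can be checked by hand, because predict assigns each new point to its nearest cluster center. The sketch below assumes the fitted centers are the means of the two groups and that cluster 0 is the x = 10 group, matching the answers above. The helper nearest_center is hypothetical and not part of scikit-learn.

```python
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Assumed label order: cluster 0 is the x = 10 group, cluster 1 is the x = 1 group.
centers = np.array([X[3:].mean(axis=0), X[:3].mean(axis=0)])

def nearest_center(points, centers):
    """Hypothetical helper: index of the closest center for each point."""
    points = np.asarray(points, dtype=float)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

first = nearest_center([[0, 0], [12, 3]], centers)
second = nearest_center([[11, 11], [8, 9]], centers)
third = nearest_center([[2, 20], [4, 4]], centers)
print(first, second, third)
```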
3. Plotting the k-means clusters for a 2D array with 2 clusters
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as mtp
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, random_state=0)
y_predict = kmeans.fit_predict(X)  # fit the model and label each point in one step
#kmeans.predict([[0, 0], [12, 3]])
mtp.scatter(X[y_predict == 0, 0], X[y_predict == 0, 1], s = 100, c = 'blue', label = 'Cluster 1')
#for first cluster
mtp.scatter(X[y_predict == 1, 0], X[y_predict == 1, 1], s = 100, c = 'green', label = 'Cluster 2')
#for second cluster
mtp.xlim(0,10)
mtp.ylim(0,10)
mtp.legend()  # show the cluster labels
mtp.show()
4. Plot a scatter chart for 300 random points
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set() # for plot styling
import numpy as np
from sklearn.datasets import make_blobs
X, y_true = make_blobs(n_samples=300, centers=4,
                       cluster_std=0.60, random_state=0)
plt.scatter(X[:, 0], X[:, 1], s=50);
# The scatter() function plots one dot for each observation. It needs two arrays of the same
# length, one for the values on the x-axis and one for the values on the y-axis.
# Using : means that we take all elements along the corresponding array dimension.
# s sets the size of the marker.
Looking at this chart, we can pick out 4 distinct clusters by eye.
The k-means algorithm finds them automatically, and in Scikit-Learn it is used through the usual estimator
API:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);
5. Plot a scatter chart for the same 300 points, increasing the number of clusters to 5
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);
Figure 30: 5 Clusters
6. Plot a scatter chart for the same 300 points, increasing the number of clusters to 6
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=6)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);
Figure 31: 6 Clusters
Similarly, repeat the same steps for 7 clusters and 8 clusters.
Figure 32: 7 Clusters
Figure 33: 12 Clusters
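The runs with different cluster counts can also be compared numerically. One possible measure is the fitted model's inertia_ attribute, the sum of squared distances from each point to its assigned center. The sketch below regenerates the same 300 points; the range of k and the n_init and random_state values are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data as in step 4.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

inertias = {}
for k in range(4, 9):
    # n_init and random_state are assumed values, fixed only for repeatability.
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    print(k, round(model.inertia_, 1))
```

Inertia generally keeps falling as k grows, so a lower value alone does not mean a better choice of k.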
Viva Questions
1. What is the main difference between k-Means and k-Nearest Neighbours?
2. How is Entropy used as a Clustering Validation Measure?
3. How to determine k using the Elbow Method?
4. What is the difference between Classical k-Means and Spherical k-Means?
5. What is the difference between k-Means and k-Medians and when would you use one
over another?