
Name : Gauransh verma

Roll no. : 2100290110054


Batch : A2

Q.9 : To perform K-Means clustering and visualize the clusters for the Iris dataset

THEORY:

1. Import Libraries:
from sklearn.cluster import KMeans: This imports the KMeans clustering algorithm from scikit-learn, a popular Python library for machine learning.

import matplotlib.pyplot as plt: This imports the pyplot module of the matplotlib library, which is used for plotting.

2. Initialization:
wcss = []: This initializes an empty list, wcss, to store the Within-Cluster Sum of Squares (WCSS) for each number of clusters.

3. Loop through Different Numbers of Clusters:
for i in range(1, 11): This loop tries each number of clusters from 1 to 10 (range(1, 11) stops before 11). The upper limit of 10 is chosen arbitrarily and can be adjusted based on the specific problem and computational constraints.

4. KMeans Clustering:
kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0): This
initializes a KMeans clustering object with the current number of clusters (i).

init='k-means++': This specifies the initialization method. 'k-means++' is a smart initialization technique that chooses initial cluster centers spread far apart from one another, which usually speeds up convergence.

max_iter=300: This sets the maximum number of iterations for each KMeans run to 300. If
convergence is not achieved within these iterations, the algorithm stops.

n_init=10: This runs the KMeans algorithm 10 times, each time with different initial centroid seeds. The final result is the best of these n_init runs, i.e. the run with the lowest inertia.

random_state=0: This sets the random seed for reproducibility.
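To make init='k-means++' and max_iter concrete, the parameters above can be illustrated with a small from-scratch sketch in NumPy. It is not scikit-learn's implementation (scikit-learn's k-means++ and its convergence test differ in details); the helper names kmeans_pp_init and lloyd, and the simple absolute tolerance tol, are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris


def kmeans_pp_init(x, k, rng):
    """Basic k-means++ seeding (hypothetical helper).

    The first center is a uniformly random sample; each further center is
    drawn with probability proportional to its squared distance from the
    nearest center chosen so far, so far-away samples are favored.
    """
    n = x.shape[0]
    centers = [x[rng.integers(n)]]
    for _ in range(1, k):
        chosen = np.array(centers)
        # Squared distance from every sample to its nearest chosen center
        d2 = ((x[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(x[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)


def lloyd(x, k, max_iter=300, tol=1e-4, seed=0):
    """Lloyd's K-Means iterations from k-means++ centers (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(x, k, rng)
    for _ in range(max_iter):  # stop after max_iter even without convergence
        # Assignment step: nearest center for each sample
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its samples
        # (an empty cluster keeps its old center)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = ((new_centers - centers) ** 2).sum()
        centers = new_centers
        if shift <= tol:  # centers barely moved: treat as converged
            break
    # Final assignment and WCSS (inertia) for the returned centers
    dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = dists[np.arange(len(x)), labels].sum()
    return centers, labels, inertia


x = load_iris().data
centers, labels, inertia = lloyd(x, 3)
print(centers.shape, round(float(inertia), 2))
```

Running several such fits with different seeds and keeping the one with the lowest inertia is what n_init does inside scikit-learn.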

5. Fit the Data:
kmeans.fit(x): This fits the KMeans model to the data x, the feature matrix to be clustered (for the Iris dataset, 150 samples with 4 numeric features each).
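For this experiment, x can be the Iris feature matrix that ships with scikit-learn. A minimal sketch of loading it and fitting a single model follows; n_clusters=3 is used only for illustration (it matches the three Iris species) and is not a result of the elbow method.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Feature matrix: 150 samples x 4 features (sepal/petal length and width)
x = load_iris().data
print(x.shape)  # (150, 4)

kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, n_init=10, random_state=0)
kmeans.fit(x)

# After fitting, every sample has a cluster label and every cluster a center
print(kmeans.labels_[:10])
print(kmeans.cluster_centers_.shape)  # (3, 4)
```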
6. Compute WCSS:
wcss.append(kmeans.inertia_): This appends the computed WCSS (inertia) of the current KMeans
clustering to the wcss list. The inertia is the sum of squared distances of samples to their closest
cluster center.
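In symbols, WCSS = sum over clusters j of sum over samples x_i in cluster C_j of ||x_i - mu_j||^2, where mu_j is the center of C_j. The sketch below recomputes this sum with NumPy from the fitted labels_ and cluster_centers_ and checks it against inertia_; using Iris with 3 clusters is again just an illustrative choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

x = load_iris().data
kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, n_init=10, random_state=0).fit(x)

# Center assigned to each sample, shape (150, 4)
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]
# Sum of squared Euclidean distances of samples to their own cluster center
manual_wcss = np.sum((x - assigned_centers) ** 2)

print(manual_wcss, kmeans.inertia_)
print(np.isclose(manual_wcss, kmeans.inertia_))
```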

7. Plotting the Elbow Method Graph:
After looping through the different numbers of clusters and collecting the WCSS values, the code plots the Elbow Method graph using matplotlib.

The x-axis represents the number of clusters, and the y-axis represents the corresponding WCSS
values.

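By visually inspecting the graph, you can identify the "elbow point": the number of clusters after which adding another cluster reduces the WCSS only slightly. Because WCSS generally keeps decreasing as the number of clusters grows (it reaches 0 when every sample is its own cluster), its minimum cannot be used to choose k; the elbow is used instead as a heuristic for a good number of clusters.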
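Reading the elbow off a plot is a judgement call. As a rough numerical aid (a heuristic added here for illustration, not part of the elbow method itself), one can print how much of the WCSS is removed each time k grows by one; the elbow is where these drops become small. The helper names wcss_curve and relative_drops are hypothetical, and the KMeans settings mirror those in the CODE section.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris


def wcss_curve(x, k_max=10):
    """WCSS (inertia) for k = 1..k_max with the same KMeans settings as the CODE section."""
    values = []
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, init='k-means++', max_iter=300, n_init=10, random_state=0)
        values.append(km.fit(x).inertia_)
    return values


def relative_drops(wcss):
    """Fraction of the WCSS removed when going from k to k + 1 clusters."""
    return [(wcss[i] - wcss[i + 1]) / wcss[i] for i in range(len(wcss) - 1)]


x = load_iris().data
wcss = wcss_curve(x)
for k, drop in enumerate(relative_drops(wcss), start=1):
    print(f"k={k} -> k={k + 1}: WCSS falls by {drop:.1%}")
```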

CODE:

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

# Load the four numeric features of the Iris dataset into x
x = load_iris().data

wcss = []

# Loop through different numbers of clusters
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    kmeans.fit(x)
    # Append WCSS (inertia) to the list
    wcss.append(kmeans.inertia_)

# Plotting the elbow method graph
plt.plot(range(1, 11), wcss)
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
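
This code expects the feature matrix in the variable x. For this experiment, x holds the four numeric features of the Iris dataset (sepal length, sepal width, petal length, petal width), which can be loaded with sklearn.datasets.load_iris().data. K-Means is unsupervised, so the species labels are not needed.

Additionally, customize the preprocessing steps (for example, feature scaling) as needed for your specific data.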
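The task also asks for the clusters to be visualized. A minimal sketch, assuming k = 3 is the value read off the elbow graph (check your own plot), is to scatter the samples on two of the four features, colored by cluster, with the cluster centers marked. Petal length and petal width (columns 2 and 3) are chosen here only because they tend to separate the Iris groups visually; any pair of features could be used.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

x = load_iris().data
kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(x)  # cluster label for each sample

# One scatter call per cluster: petal length (column 2) vs petal width (column 3)
for cluster, colour in zip(range(3), ['red', 'blue', 'green']):
    members = x[y_kmeans == cluster]
    plt.scatter(members[:, 2], members[:, 3], s=40, c=colour, label=f'Cluster {cluster + 1}')

# Cluster centers projected onto the same two features
plt.scatter(kmeans.cluster_centers_[:, 2], kmeans.cluster_centers_[:, 3],
            s=200, c='yellow', edgecolors='black', marker='*', label='Centroids')
plt.title('K-Means clusters of the Iris dataset')
plt.xlabel('Petal length (cm)')
plt.ylabel('Petal width (cm)')
plt.legend()
plt.show()
```

Because the data has four features, this is a 2-D projection; clusters that look overlapping here may still be separated in the other two dimensions.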
