
Let's break down the code step by step:

Import Libraries:

import numpy as np

import pandas as pd

from sklearn.cluster import KMeans

import matplotlib.pyplot as plt

This imports the necessary libraries:

numpy for numerical computations.

pandas for handling data in a DataFrame.

KMeans from sklearn.cluster for performing K-means clustering.

matplotlib.pyplot for visualization.

Sample Data:

data = {
    'Player': ['Player 1', 'Player 2', 'Player 3', 'Player 4', 'Player 5',
               'Player 6', 'Player 7', 'Player 8', 'Player 9', 'Player 10'],
    'Runs Scored': [350, 280, 420, 200, 320, 380, 240, 400, 310, 360],
    'Wickets Taken': [15, 10, 20, 5, 12, 18, 8, 17, 14, 16],
}

This is the sample data representing runs scored and wickets taken by cricket players.

Create DataFrame:

df = pd.DataFrame(data)

This creates a DataFrame df using the sample data.


Select Features:

X = df[['Runs Scored', 'Wickets Taken']]

This selects the features 'Runs Scored' and 'Wickets Taken' for clustering.
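The two selected features sit on very different scales: runs range from 200 to 420, while wickets range only from 5 to 20. K-means measures similarity with Euclidean distance, so in this data the runs column largely decides which cluster a player joins. The original code clusters the raw values. As an optional extra step, the following minimal sketch standardizes each column to zero mean and unit variance with plain NumPy (the same transformation scikit-learn's StandardScaler applies). The variable names are chosen here for illustration.

```python
import numpy as np

# Runs scored and wickets taken for the ten sample players (same values as the data above)
runs = np.array([350, 280, 420, 200, 320, 380, 240, 400, 310, 360], dtype=float)
wickets = np.array([15, 10, 20, 5, 12, 18, 8, 17, 14, 16], dtype=float)
X_raw = np.column_stack([runs, wickets])  # shape (10, 2)

# Spread of each raw column: runs vary far more than wickets
mean = X_raw.mean(axis=0)
std = X_raw.std(axis=0)  # population standard deviation (ddof=0)
print("Raw std per feature:", std)

# Standardize: subtract each column's mean, divide by its standard deviation
X_scaled = (X_raw - mean) / std

print("Scaled mean per feature:", X_scaled.mean(axis=0).round(6))
print("Scaled std per feature:", X_scaled.std(axis=0).round(6))
```

If used, X_scaled would be passed to kmeans.fit in place of X. The resulting centroids are then in standardized units and could be mapped back to runs and wickets with centroids * std + mean.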

Visualize the Data:

plt.scatter(X['Runs Scored'], X['Wickets Taken'], color='blue')

plt.xlabel('Runs Scored')

plt.ylabel('Wickets Taken')

plt.title('Cricket Players - Runs vs Wickets')

plt.show()

This visualizes the data points on a scatter plot.

Perform K-means Clustering:

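k = 4  # Number of clusters

kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)

kmeans.fit(X)

This fits a K-means model with k=4 clusters to the selected features. n_init=10 runs the algorithm from ten different sets of starting centroids and keeps the result with the lowest inertia (the within-cluster sum of squared distances). random_state=42 fixes the random seed, so the clusters come out the same on every run.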
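Under the hood, K-means alternates two steps until the cluster assignments stop changing. First it assigns each point to its nearest centroid. Then it moves each centroid to the mean of the points assigned to it. This is Lloyd's algorithm. The following NumPy sketch illustrates that loop on the same ten players. It is a simplified illustration, not scikit-learn's implementation: lloyd_kmeans is a hypothetical helper, and it starts from k randomly chosen players rather than scikit-learn's default k-means++ seeding.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical minimal K-means (Lloyd's algorithm), for illustration only."""
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster ends up empty
                centroids[j] = members.mean(axis=0)
    # Inertia: sum of squared distances from each point to its own centroid
    inertia = ((X - centroids[labels]) ** 2).sum()
    return centroids, labels, inertia

# Runs scored and wickets taken for Player 1 .. Player 10
X = np.array([[350, 15], [280, 10], [420, 20], [200, 5], [320, 12],
              [380, 18], [240, 8], [400, 17], [310, 14], [360, 16]], dtype=float)

centroids, labels, inertia = lloyd_kmeans(X, k=4)
print("Centroids:\n", centroids)
print("Labels:", labels)
print("Inertia:", inertia)
```

Because the starting centroids are random, different seed values can lead to different clusterings. This is why scikit-learn's KMeans can restart several times (its n_init parameter) and keep the run with the lowest inertia.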

Get Cluster Centers and Labels:

centroids = kmeans.cluster_centers_

labels = kmeans.labels_

This retrieves the cluster centers and labels assigned to each data point.
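cluster_centers_ is a NumPy array of shape (k, n_features), here (4, 2). It has one row per cluster, with the columns in the same order as X (runs, then wickets). labels_ is an integer array with one entry per player. Each entry gives the index of that player's cluster, which is the row of the nearest centroid. The sketch below checks both properties on a NumPy copy of the data. The values n_init=10 and random_state=0 are chosen here for reproducibility and are not taken from the original code.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same runs/wickets values as the sample data, as a (10, 2) array
X = np.array([[350, 15], [280, 10], [420, 20], [200, 5], [320, 12],
              [380, 18], [240, 8], [400, 17], [310, 14], [360, 16]], dtype=float)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
centroids = kmeans.cluster_centers_  # shape (4, 2): one row per cluster
labels = kmeans.labels_              # shape (10,): cluster index for each player

print("Centroid array shape:", centroids.shape)
print("Label array shape:", labels.shape)

# Each label should be the index of the closest centroid (squared Euclidean distance)
dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
print("Labels match nearest centroid:", np.array_equal(labels, dists.argmin(axis=1)))
print("Labels match predict():", np.array_equal(labels, kmeans.predict(X)))
```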
Add Cluster Labels to DataFrame:

df['Cluster'] = labels

This adds the cluster labels to the DataFrame.
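Once the Cluster column is in the DataFrame, ordinary pandas operations can summarize each group. As an illustration that goes beyond the original code, this sketch uses groupby to report each cluster's size and its average runs and wickets, then lists the players in each cluster. It rebuilds the sample data so that it runs on its own. It also fixes n_init=10 and random_state=0 (choices made here) so the cluster numbering is repeatable.

```python
import pandas as pd
from sklearn.cluster import KMeans

data = {
    'Player': [f'Player {i}' for i in range(1, 11)],
    'Runs Scored': [350, 280, 420, 200, 320, 380, 240, 400, 310, 360],
    'Wickets Taken': [15, 10, 20, 5, 12, 18, 8, 17, 14, 16],
}
df = pd.DataFrame(data)
features = ['Runs Scored', 'Wickets Taken']

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(df[features])
df['Cluster'] = kmeans.labels_

# Average runs and wickets per cluster, plus the number of players in each
summary = df.groupby('Cluster')[features].mean()
summary['Players'] = df.groupby('Cluster').size()
print(summary)

# List the players assigned to each cluster
for cluster, group in df.groupby('Cluster'):
    print(cluster, list(group['Player']))
```

The Cluster column holds scikit-learn's 0-based labels. The plot legend and the printed cluster centers in this walkthrough number clusters from 1.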

Visualize the Clusters:

colors = ['r', 'g', 'b', 'orange']

for i in range(k):
    cluster_points = X[df['Cluster'] == i]
    plt.scatter(cluster_points['Runs Scored'], cluster_points['Wickets Taken'],
                c=colors[i], label='Cluster {}'.format(i + 1))

plt.scatter(centroids[:, 0], centroids[:, 1], marker='*', s=300, c='k', label='Centroids')

plt.xlabel('Runs Scored')

plt.ylabel('Wickets Taken')

plt.title('K-means Clustering of Cricket Players')

plt.legend()

plt.show()

This visualizes the clusters along with their centroids on a scatter plot.

Print Cluster Centers:

print("Cluster Centers:")

for i, centroid in enumerate(centroids):
    print("Cluster {}: {}".format(i + 1, centroid))

This prints the coordinates of the cluster centers.

This code performs K-means clustering on the given dataset of cricket player statistics and visualizes
the resulting clusters. Adjustments can be made to the number of clusters (k) and the features
selected for clustering as needed.
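Choosing k is the main judgment call. One common heuristic is the elbow method. You fit K-means for a range of k values and record inertia_, the sum of squared distances from each point to its assigned centroid. Inertia generally decreases as k grows, so the idea is to look for the k after which extra clusters bring only small reductions. The following minimal sketch runs on the same data with k from 1 to 6. The values n_init=10 and random_state=0 are chosen here for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Runs scored and wickets taken for Player 1 .. Player 10
X = np.array([[350, 15], [280, 10], [420, 20], [200, 5], [320, 12],
              [380, 18], [240, 8], [400, 17], [310, 14], [360, 16]], dtype=float)

ks = range(1, 7)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squared distances

for k, inertia in zip(ks, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting the values with plt.plot(list(ks), inertias, marker='o') gives the usual elbow curve. Because the features are unscaled, these distances are dominated by the runs column.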
