
Assignment – 5

Name – Vedant Modi


Reg. no.- 20BCE2126
Course – Machine Learning
Lab – L21+L22
Question: Use the wine dataset from the sklearn library and try to form
clusters of wine behaviour using the Malic Acid and Proline features.
Drop the other features for simplicity.
- Create a scatter plot of the above-mentioned features of the wine dataset.
- Figure out if any pre-processing such as scaling would help here.
- Draw an elbow plot and from that figure out an optimal value of k.
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt

# Load the wine dataset


wine = load_wine()

# Create a DataFrame with only the Malic Acid and Proline features
# (malic_acid is column 1 and proline is column 12 of wine.data)
data = pd.DataFrame(wine.data[:, [1, 12]], columns=['Malic acid', 'Proline'])

# Create a scatter plot of the Malic Acid and Proline features


plt.scatter(data['Malic acid'], data['Proline'])
plt.xlabel('Malic acid')
plt.ylabel('Proline')
plt.show()

# Standardize the data


scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
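Scaling is needed here because Proline is measured in the hundreds to thousands while Malic acid sits roughly between 1 and 6, so the Euclidean distances used by k-means would be dominated almost entirely by Proline. A quick sanity check (a small sketch, not part of the original code) confirms the scale gap and verifies that each standardized column ends up with mean ≈ 0 and standard deviation ≈ 1:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

wine = load_wine()
data = pd.DataFrame(wine.data[:, [1, 12]], columns=['Malic acid', 'Proline'])

# Before scaling: Proline values are hundreds of times larger than
# Malic acid, so it would dominate the k-means distance computation
print(data.describe().loc[['mean', 'std']])

# After StandardScaler, each column has mean ~0 and std ~1
scaled = StandardScaler().fit_transform(data)
print(np.allclose(scaled.mean(axis=0), 0, atol=1e-9))  # True
print(np.allclose(scaled.std(axis=0), 1))              # True
```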

# Compute the sum of squared distances for a range of k values


k_values = range(1, 11)
sse = []
for k in k_values:
    # n_init=10 pins the current default explicitly (and silences the
    # FutureWarning about its default changing); random_state makes
    # the runs reproducible
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(scaled_data)
    sse.append(kmeans.inertia_)

# Plot the elbow method
plt.plot(k_values, sse)
plt.xlabel('Number of clusters (k)')
plt.ylabel('Sum of squared distances')
plt.show()
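The elbow plot can be ambiguous when the bend is gentle. One common complementary check (an addition beyond the assignment brief, not part of the original code) is the silhouette score, which measures how well-separated the clusters are; the k with the highest score is a reasonable candidate and can be compared against the elbow's suggestion:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

wine = load_wine()
scaled_data = StandardScaler().fit_transform(wine.data[:, [1, 12]])

# Silhouette needs at least 2 clusters; higher scores mean
# tighter, better-separated clusters (range is -1 to 1)
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10,
                    random_state=42).fit_predict(scaled_data)
    print(k, round(silhouette_score(scaled_data, labels), 3))
```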

# Determine the optimal number of clusters using the elbow method:
# the SSE curve flattens noticeably after k = 3
optimal_k = 3

# Perform clustering with the optimal number of clusters


kmeans = KMeans(n_clusters=optimal_k, n_init=10, random_state=42)
kmeans.fit(scaled_data)


# Assign cluster labels to each data point


data['cluster'] = kmeans.labels_

# Print the results


print('Cluster labels:')
print(data['cluster'].value_counts())

Cluster labels:
0 65
2 57
1 56
Name: cluster, dtype: int64
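A natural closing step (an addition beyond what the report shows) is to redraw the scatter plot coloured by cluster label, with the centroids mapped back from standardized units to the original feature scale via `inverse_transform` so they land on the same axes as the data:

```python
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

wine = load_wine()
data = pd.DataFrame(wine.data[:, [1, 12]], columns=['Malic acid', 'Proline'])

scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(scaled_data)

# Colour each point by its cluster label and mark the centroids,
# converting the centroids back to the original units for plotting
centroids = scaler.inverse_transform(kmeans.cluster_centers_)
plt.scatter(data['Malic acid'], data['Proline'], c=kmeans.labels_)
plt.scatter(centroids[:, 0], centroids[:, 1], marker='*', s=200, c='red')
plt.xlabel('Malic acid')
plt.ylabel('Proline')
plt.show()
```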
