Click Here To Browse The K-Means Clustering Code in Google Colab
1) Study the code showing K-means clustering using the Iris dataset. The number of
clusters is chosen to be 5.
a) Experiment with different values of the number of clusters (say from 1 to 10) and
store the error for each value in a list.
(Hint: start with Error = [] and, after fitting each model, call
Error.append(model_kmeans.inertia_).)
The K-means algorithm aims to choose centroids that minimise the inertia, or
within-cluster sum-of-squares criterion
(see https://scikit-learn.org/stable/modules/clustering.html).
b) Plot a graph where X axis represents the number of clusters and Y axis represents
the error. What is the optimal value of the number of clusters?
View this video to understand the graph that you have plotted.
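The loop and plot asked for in 1a) and 1b) can be sketched as follows. This is a minimal sketch, not the Colab notebook's own code. It assumes the Iris data is loaded with load_iris, and the settings n_init=10 and random_state=0 as well as the helper name plot_elbow are illustrative choices rather than anything the exercise prescribes.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Assumption: use the four numeric Iris features as-is, without scaling.
X = load_iris().data

cluster_counts = list(range(1, 11))  # k = 1 .. 10
Error = []
for k in cluster_counts:
    # n_init and random_state are illustrative; they only make runs repeatable.
    model_kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    model_kmeans.fit(X)
    # inertia_ is the within-cluster sum of squared distances to the centroids.
    Error.append(model_kmeans.inertia_)


def plot_elbow(ks, errors):
    """Hypothetical helper: plot error (inertia) against number of clusters."""
    import matplotlib.pyplot as plt

    plt.plot(ks, errors, marker="o")
    plt.xlabel("Number of clusters")
    plt.ylabel("Error (inertia)")
    plt.title("K-means on Iris")
    plt.show()
```

Calling plot_elbow(cluster_counts, Error) draws the curve. The error keeps falling as clusters are added, so the value to look for is the point where the curve bends and further clusters bring only small reductions. On Iris that bend is commonly read at about 3 clusters, but check it against your own plot.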
2) Study the code for a simple spam classifier using a Bag of Words representation (each
feature is the frequency of a particular word in the document).
Now enhance the code to use Term Frequency-Inverse Document Frequency (TF-IDF)
as the feature instead of word count.
Explanation of TF-IDF
scikit-learn page for TfidfVectorizer
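The change asked for in 2) usually comes down to swapping the vectorizer. The sketch below is not the original classifier. It assumes the original builds word counts with CountVectorizer, and the six toy messages, their labels, and the choice of MultinomialNB are made up purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; replace with the dataset used in the exercise.
messages = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free prize today",
    "are we meeting for lunch today",
    "please review the project report",
    "see you at the meeting tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = not spam

# TfidfVectorizer replaces the raw word counts of a Bag of Words model with
# TF-IDF weights, so words that occur in many documents count for less.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(messages, labels)

new_messages = ["free prize offer", "project meeting tomorrow"]
predictions = model.predict(new_messages)

# Inspect the learned vocabulary and the TF-IDF matrix shape.
tfidf = model.named_steps["tfidfvectorizer"]
features = tfidf.transform(messages)
print(features.shape)
print(list(predictions))
```

Because TfidfVectorizer accepts raw text just as CountVectorizer does, the rest of a count-based pipeline can normally stay unchanged. Compare accuracy before and after on the same train/test split to see whether TF-IDF helps on your data.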