
1. We choose a default number of clusters, k (e.g. k = 5), to begin with.

2. Each input datapoint goes into a cluster depending on its proximity to that cluster.

3. Now each of the k clusters holds multiple datapoints, and we compute the
centroid/seed point/mean of each cluster.
4. The input datapoints are reassigned to the clusters depending on the
proximity of each point to the seed point of each cluster.
5. After reassignment of the datapoints into their respective clusters, we again
compute the seed point/mean of each cluster.
6. Reassignment of datapoints happens again in each cluster, depending on the
proximity to the seed points calculated in step 5.
7. We again compute the seed point/mean of each cluster.
8. Reassignment of the datapoints happens depending on the proximity to the
cluster means calculated in step 7.
9. This process continues until no further reassignment takes place, i.e. the points
in each cluster remain in the same cluster.
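The steps above can be sketched in plain Python. This is a minimal 1D illustration, not a production implementation; real code would use scikit-learn's KMeans. The initialization (taking the first k points as seeds) and the sample data are assumptions for the demo.

```python
def kmeans(points, k, max_iter=100):
    # Step 1: choose k initial seed points (here, simply the first k datapoints).
    centroids = [float(points[i]) for i in range(k)]
    for _ in range(max_iter):
        # Steps 2/4/6/8: assign each point to the cluster with the nearest seed point.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Steps 3/5/7: recompute each seed point as the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # Step 9: stop when the means stop moving, i.e. no further reassignment.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = kmeans([1, 2, 3, 10, 11, 12], k=2)
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1, 2, 3], [10, 11, 12]]
```

Note that the result depends on the initial seed points; scikit-learn's KMeans runs multiple random initializations (the `n_init` parameter) to reduce that sensitivity.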

NLP
#CountVectorizer helps tokenize the documents; it converts text to vectors by
assigning a numeric count to each token
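What CountVectorizer does can be sketched in pure Python. This is a simplified version: it tokenizes by whitespace (scikit-learn's default uses a regex that drops single-character tokens) and builds one count vector per document. The sample documents are made up for illustration.

```python
def count_vectorize(docs):
    # Build the vocabulary: each unique lowercased token gets a column index
    # (sorted alphabetically, as scikit-learn's CountVectorizer does).
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    # One count vector per document: column j holds how often vocab[j] occurs.
    vectors = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc.lower().split():
            row[index[tok]] += 1
        vectors.append(row)
    return vocab, vectors

vocab, vectors = count_vectorize(["the cat sat", "the cat sat on the mat"])
print(vocab)    # ['cat', 'mat', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 0, 1, 1], [1, 1, 1, 1, 2]]
```

In practice you would call `CountVectorizer().fit_transform(docs)` from `sklearn.feature_extraction.text`, which returns the same counts as a sparse matrix.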
