Visual Example
This example uses a 2D space with K=2, meaning two features and two clusters.
The data we will use is shown below.
Step 1: Two random centroids are created and placed on the graph.
Step 5: Since every point is already assigned to its closest centroid, no assignment
changes, so we do not need to repeat the process and the algorithm has finished.
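The loop described in these steps can be sketched in a few lines of NumPy. This is a minimal illustration of plain K-Means rather than the exact procedure behind the figures: the function name `k_means_2d`, the toy `points` array, and the choice to start from randomly picked data points (instead of arbitrary random positions) are all assumptions made for the example.

```python
import numpy as np

def k_means_2d(points, k=2, max_iter=100, seed=0):
    """Plain K-Means on an (n, 2) array; returns (centroids, labels)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: use k distinct data points, chosen at random, as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Distance from every point to every centroid, shape (n, k).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Stop once no point changes cluster between two iterations.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Hypothetical toy data: two well-separated groups of three points each.
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                   [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])
centroids, labels = k_means_2d(points, k=2)
print(centroids)
print(labels)
```

The stopping test mirrors Step 5: the loop ends as soon as an assignment pass leaves every point in the cluster it already had.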
Importing Modules
Before we can begin we must import the following modules.
import numpy as np
import sklearn
from sklearn.preprocessing import scale
from sklearn.datasets import load_digits
from sklearn.cluster import KMeans
from sklearn import metrics
k = 10
samples, features = data.shape
We also define the number of clusters by creating a variable k, and we
get the number of samples and features from the shape of the data
set.
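The snippet above uses `data` without showing where it comes from. Since `load_digits` and `scale` are both imported, one plausible setup is sketched here; the variable name `digits`, and the assumption that `y` (used later for scoring) holds the digit labels, are not confirmed by the text.

```python
from sklearn.datasets import load_digits
from sklearn.preprocessing import scale

# Assumption: the data set is scikit-learn's bundled handwritten digits.
digits = load_digits()
data = scale(digits.data)  # standardize each feature to zero mean and unit variance
y = digits.target          # true digit labels (0-9), needed only for scoring

k = 10                     # one cluster per digit
samples, features = data.shape
print(samples, features)   # 1797 64
```

Scaling matters here because K-Means relies on Euclidean distance, so features with larger ranges would otherwise dominate the clustering.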
Scoring
To score our model we are going to use a function from the sklearn website.
It computes several different scores for different aspects of our model. If you’d
like to learn more about what these values mean, please see the
scikit-learn documentation on clustering metrics.
def bench_k_means(estimator, name, data):
    # Fit the estimator, then print its inertia and clustering scores.
    # Relies on a global y that holds the true labels.
    estimator.fit(data)
    print('%-9s\t%i\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f\t%.3f'
          % (name, estimator.inertia_,
             metrics.homogeneity_score(y, estimator.labels_),
             metrics.completeness_score(y, estimator.labels_),
             metrics.v_measure_score(y, estimator.labels_),
             metrics.adjusted_rand_score(y, estimator.labels_),
             metrics.adjusted_mutual_info_score(y, estimator.labels_),
             metrics.silhouette_score(data, estimator.labels_,
                                      metric='euclidean')))
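To show how a scoring function like this might be called, here is a self-contained sketch. It uses a hypothetical variant, `score_k_means`, that takes the true labels as an argument and returns the scores in a dict instead of reading a global `y` and printing. The `KMeans` settings (`init="random"`, `n_init=10`, `random_state=0`) are illustrative choices, not values taken from the text.

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.preprocessing import scale

# Assumption: same digits data set and scaling as the rest of the tutorial.
digits = load_digits()
data = scale(digits.data)
y = digits.target

def score_k_means(estimator, data, labels_true):
    """Fit the estimator and return a few clustering scores as a dict."""
    estimator.fit(data)
    pred = estimator.labels_
    return {
        "inertia": estimator.inertia_,
        "homogeneity": metrics.homogeneity_score(labels_true, pred),
        "completeness": metrics.completeness_score(labels_true, pred),
        "v_measure": metrics.v_measure_score(labels_true, pred),
        "adjusted_rand": metrics.adjusted_rand_score(labels_true, pred),
        "silhouette": metrics.silhouette_score(data, pred, metric="euclidean"),
    }

clf = KMeans(n_clusters=10, init="random", n_init=10, random_state=0)
scores = score_k_means(clf, data, y)
for name, value in scores.items():
    print(f"{name:<14}{value:.3f}")
```

Passing the labels explicitly makes the function easier to reuse on other data sets. Homogeneity, completeness, and V-measure lie between 0 and 1, while the silhouette score ranges from -1 to 1, with higher values indicating better-separated clusters.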