
1.3 Basic Algorithms

Towards this end, define prototype vectors µ1 , . . . , µk and indicator variables rij , where rij is 1 if, and only if, xi is assigned to cluster j. To cluster our
dataset we will minimize the following distortion measure, which sums the
squared distance of each point from its prototype vector:
\[
J(r, \mu) := \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{k} r_{ij} \, \|x_i - \mu_j\|^2, \qquad (1.29)
\]

where $r = \{r_{ij}\}$, $\mu = \{\mu_j\}$, and $\|\cdot\|$ denotes the usual Euclidean
norm.
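To make the distortion measure concrete, the following sketch evaluates $J(r, \mu)$ on a small toy dataset; the data points, prototypes, and assignments below are illustrative choices of ours, not from the text:

```python
import numpy as np

# Toy data: m = 4 points in R^2 and k = 2 prototype vectors (illustrative values).
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
mu = np.array([[0.0, 0.5], [5.0, 5.5]])

# Hard assignments r_ij: r[i, j] = 1 iff x_i is assigned to cluster j.
r = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])

# J(r, mu) = (1/2) * sum_{i,j} r_ij * ||x_i - mu_j||^2, as in Eq. (1.29).
sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # shape (m, k)
J = 0.5 * (r * sq_dists).sum()
print(J)  # each point lies 0.5 from its prototype: 4 * 0.5^2 / 2 = 0.5
```

Only the terms with $r_{ij} = 1$ contribute, so $J$ depends on the distance of each point to its own prototype alone.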
Our goal is to find r and µ, but since it is not easy to jointly minimize J
with respect to both r and µ, we will adopt a two-stage strategy:
Stage 1 Keep the µ fixed and determine r. In this case, it is easy to see
that the minimization decomposes into m independent problems.
The solution for the i-th data point xi can be found by setting:
\[
r_{ij} = 1 \ \text{ if } \ j = \operatorname*{argmin}_{j'} \|x_i - \mu_{j'}\|^2, \qquad (1.30)
\]
and 0 otherwise.
Stage 2 Keep the r fixed and determine µ. Since the r’s are fixed, J is a
quadratic function of µ. It can be minimized by setting the derivative
with respect to µj to 0:
\[
\sum_{i=1}^{m} r_{ij} \,(x_i - \mu_j) = 0 \ \text{ for all } j. \qquad (1.31)
\]

Rearranging yields
\[
\mu_j = \frac{\sum_i r_{ij}\, x_i}{\sum_i r_{ij}}. \qquad (1.32)
\]
Since $\sum_i r_{ij}$ counts the number of points assigned to cluster j, we are
essentially setting µj to be the sample mean of the points assigned
to cluster j.
The algorithm stops when the cluster assignments do not change signifi-
cantly. Detailed pseudo-code can be found in Algorithm 1.5.
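The two stages above can be sketched as a short NumPy loop. This is our own minimal rendering of the iteration, not a reproduction of Algorithm 1.5; the function name, the random-initialization heuristic, and the empty-cluster guard are our choices:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Alternate the two stages until the assignments stop changing."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    # Initialize prototypes with k randomly chosen data points (one of the
    # heuristics mentioned in the text).
    mu = X[rng.choice(m, size=k, replace=False)].copy()
    assign = np.full(m, -1)
    for _ in range(max_iter):
        # Stage 1: assign each x_i to its nearest prototype, Eq. (1.30).
        sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        new_assign = sq_dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break  # assignments no longer change
        assign = new_assign
        # Stage 2: set mu_j to the mean of its assigned points, Eq. (1.32).
        for j in range(k):
            pts = X[assign == j]
            if len(pts) > 0:  # guard against an empty cluster
                mu[j] = pts.mean(axis=0)
    return mu, assign

# Usage on a toy dataset with two well-separated groups:
mu, assign = kmeans(np.array([[0.0, 0.0], [0.0, 1.0],
                              [5.0, 5.0], [5.0, 6.0]]), k=2)
# points 0 and 1 end up in one cluster, points 2 and 3 in the other
```

Note that each iteration can only decrease $J$ (Stage 1 minimizes over $r$, Stage 2 over $\mu$), which is why the loop terminates.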
Two issues with K-Means are worth noting. First, it is sensitive to the
choice of the initial cluster centers µ. A number of practical heuristics have
been developed. For instance, one could randomly choose k points from the
given dataset as cluster centers. Other methods try to pick k points from X
which are farthest away from each other. Second, it makes a hard assignment
of every point to a cluster center. Variants which we will encounter later in
