K-Means, Gaussian Mixtures, and an Example
$10^{10}$, and most clustering problems involve much larger data sets than
N = 19. For this reason, practical clustering algorithms are able to examine
only a very small fraction of all possible encoders k = C(i). The goal is to
identify a small subset that is likely to contain the optimal one, or at least
a good suboptimal partition.
Such feasible strategies are based on iterative greedy descent. An initial
partition is specified. At each iterative step, the cluster assignments are
changed in such a way that the value of the criterion is improved from
its previous value. Clustering algorithms of this type differ in their pre-
scriptions for modifying the cluster assignments at each iteration. When
the prescription is unable to provide an improvement, the algorithm ter-
minates with the current assignments as its solution. Since the assignment
of observations to clusters at any iteration is a perturbation of that for the
previous iteration, only a very small fraction of all possible assignments
(14.30) are examined. However, these algorithms converge to local optima
which may be highly suboptimal when compared to the global optimum.
14.3.6 K-means
The K-means algorithm is one of the most popular iterative descent clus-
tering methods. It is intended for situations in which all variables are of
the quantitative type, and squared Euclidean distance

$$d(x_i, x_{i'}) \;=\; \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2 \;=\; \|x_i - x_{i'}\|^2$$

is chosen as the dissimilarity measure. The within-cluster point scatter can then be written as

$$W(C) \;=\; \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} \|x_i - x_{i'}\|^2 \;=\; \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - \bar{x}_k\|^2, \qquad (14.31)$$
where $\bar{x}_k = (\bar{x}_{1k}, \ldots, \bar{x}_{pk})$ is the mean vector associated with the $k$th cluster, and $N_k = \sum_{i=1}^{N} I(C(i) = k)$. Thus, the criterion is minimized by
assigning the N observations to the K clusters in such a way that within
each cluster the average dissimilarity of the observations from the cluster
mean, as defined by the points in that cluster, is minimized.
An iterative descent algorithm for solving

$$C^* \;=\; \min_{C} \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - \bar{x}_k\|^2$$

can be obtained by noting that for any set of observations $S$,

$$\bar{x}_S \;=\; \operatorname*{argmin}_{m} \sum_{i \in S} \|x_i - m\|^2.$$
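This suggests alternating between assigning points to their closest mean and recomputing the means. A minimal Python sketch of that alternating descent (the function names and the toy usage below are mine, not the book's):

```python
import numpy as np

def kmeans(X, K, n_iter=20, seed=0):
    """Plain Lloyd-style K-means: alternate cluster assignment and mean updates.

    X is an (N, p) data matrix; K is the number of clusters.
    Returns the assignments C(i) and the cluster means.
    """
    rng = np.random.default_rng(seed)
    # Initialize the means with K distinct randomly chosen observations.
    means = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):
        # Step 1: assign each observation to its closest current mean.
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        C = d2.argmin(axis=1)
        # Step 2: replace each mean by the average of its assigned points.
        for k in range(K):
            if np.any(C == k):
                means[k] = X[C == k].mean(axis=0)
    return C, means

def within_cluster_scatter(X, C, means):
    """The criterion of (14.31): sum of squared distances to cluster means."""
    return float(sum(((X[C == k] - means[k]) ** 2).sum()
                     for k in range(len(means))))
```

Each of the two steps can only decrease the criterion, so the procedure converges, though, as noted above, possibly only to a local optimum.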
[Figure 14.7: four panels; vertical axis of the right panels is "Responsibilities" (0.0 to 1.0); top row σ = 1.0, bottom row σ = 0.2.]
FIGURE 14.7. (Left panels:) two Gaussian densities g0 (x) and g1 (x) (blue and
orange) on the real line, and a single data point (green dot) at x = 0.5. The colored
squares are plotted at x = −1.0 and x = 1.0, the means of each density. (Right
panels:) the relative densities g0 (x)/(g0 (x) + g1 (x)) and g1 (x)/(g0 (x) + g1 (x)),
called the “responsibilities” of each cluster, for this data point. In the top panels,
the Gaussian standard deviation σ = 1.0; in the bottom panels σ = 0.2. The
EM algorithm uses these responsibilities to make a “soft” assignment of each
data point to each of the two clusters. When σ is fairly large, the responsibilities
can be near 0.5 (they are 0.36 and 0.64 in the top right panel). As σ → 0, the responsibilities → 1 for the cluster center closest to the target point and → 0 for all other clusters. This "hard" assignment is seen in the bottom right panel.
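The responsibility arithmetic behind the right panels is simple to sketch; here the means −1.0 and 1.0 and the evaluation point follow the figure, while the function name is mine:

```python
import math

def responsibilities(x, mu0=-1.0, mu1=1.0, sigma=1.0):
    """Relative densities g0/(g0+g1) and g1/(g0+g1) for two equal-weight
    Gaussians with common standard deviation sigma."""
    g0 = math.exp(-0.5 * ((x - mu0) / sigma) ** 2)
    g1 = math.exp(-0.5 * ((x - mu1) / sigma) ** 2)
    # The common normalizing factor 1/(sigma*sqrt(2*pi)) cancels in the ratio.
    return g0 / (g0 + g1), g1 / (g0 + g1)

# Soft split for a wide sigma; nearly hard split for a narrow one.
soft = responsibilities(0.5, sigma=1.0)
hard = responsibilities(0.5, sigma=0.2)
```

The two responsibilities always sum to 1, and shrinking σ drives the one for the nearer mean toward 1, reproducing the soft-to-hard transition described in the caption.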
[Figure 14.8: vertical axis "Sum of Squares" (roughly 160,000 to 240,000) plotted against the horizontal axis "Number of Clusters K" (2 to 10).]
FIGURE 14.8. Total within-cluster sum of squares for K-means clustering ap-
plied to the human tumor microarray data.
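Such a curve is produced by running K-means for each candidate K and recording the criterion value; the within-cluster sum of squares always decreases with K, and one looks for a kink where the decrease flattens. A minimal sketch on synthetic data (all names and the data are mine, not the microarray data; a deterministic farthest-first seeding is used for reproducibility):

```python
import numpy as np

def total_within_ss(X, K, n_iter=25):
    """Plain K-means with deterministic farthest-first seeding;
    returns the total within-cluster sum of squares."""
    # Farthest-first initialization: start at X[0], then repeatedly add
    # the observation farthest from all current means.
    means = [X[0]]
    for _ in range(K - 1):
        d2 = np.min([((X - m) ** 2).sum(1) for m in means], axis=0)
        means.append(X[d2.argmax()])
    means = np.array(means)
    for _ in range(n_iter):
        C = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1).argmin(1)
        means = np.array([X[C == k].mean(0) if (C == k).any() else means[k]
                          for k in range(K)])
    # Final assignment against the converged means.
    C = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1).argmin(1)
    return float(((X - means[C]) ** 2).sum())

# Three well-separated synthetic groups: the curve drops sharply up to
# K = 3 and flattens afterwards, suggesting three clusters.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.2, (30, 2)) for c in (0.0, 4.0, 8.0)])
curve = {K: total_within_ss(X, K) for K in range(1, 7)}
```

Plotting `curve` against K gives a figure of the same shape as Figure 14.8, with the kink at the true number of groups.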
TABLE 14.2. Human tumor data: number of cancer cases of each type, in each
of the three clusters from K-means clustering.
Cluster  Breast  CNS  Colon  K562  Leukemia  MCF7  Melanoma  NSCLC  Ovarian  Prostate  Renal  Unknown
   1       3      5     0      0       0       0       1       7       6        2        9       1
   2       2      0     0      2       6       2       7       2       0        0        0       0
   3       2      0     7      0       0       0       0       0       0        0        0       0
FIGURE 14.9. Sir Ronald A. Fisher (1890–1962) was one of the founders of modern-day statistics, to whom we owe maximum likelihood, sufficiency, and many other fundamental concepts. The image on the left is a 1024 × 1024 grayscale image at 8 bits per pixel. The center image is the result of 2 × 2 block VQ, using 200 code vectors, with a compression rate of 1.9 bits/pixel. The right image uses only four code vectors, with a compression rate of 0.50 bits/pixel.
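The compression rates in the caption follow from simple arithmetic: each 2 × 2 block of pixels is replaced by the index of its code vector, so the rate is log2(number of code vectors) divided by the pixels per block (ignoring the small cost of storing the codebook itself). A quick check, with a function name of my choosing:

```python
import math

def vq_bits_per_pixel(n_code_vectors, block_pixels=4):
    """Bits per pixel when every block of `block_pixels` pixels is
    encoded as a codebook index costing log2(K) bits."""
    return math.log2(n_code_vectors) / block_pixels

# 200 code vectors on 2x2 blocks -> about 1.9 bits/pixel;
# 4 code vectors -> 0.5 bits/pixel; uncompressed is 8 bits/pixel.
```

This reproduces both figures quoted in the caption: log2(200)/4 ≈ 1.9 and log2(4)/4 = 0.5.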