
Supervised vs Unsupervised Learning

Training

Test

Find a structure


The social dilemma
Netflix

Ads
Test
Count of the word "ciencia" (science)
3
1

Count of the word "fútbol" (football)


x_i
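The word-count axes above suggest that each document is turned into a data point whose coordinates are word counts. A minimal sketch of that idea follows; the helper name `word_count_features` and the sample sentence are made up for illustration and do not come from the slides.

```python
import re
from collections import Counter


def word_count_features(text, vocabulary):
    """Map a text to a list of counts, one entry per vocabulary word."""
    # Lowercase and split into word tokens (\w also matches accented letters).
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never appear.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and document, chosen to mirror the two axes above.
vocabulary = ["ciencia", "fútbol"]
document = "La ciencia avanza: ciencia abierta, ciencia de datos y algo de fútbol."

x_i = word_count_features(document, vocabulary)
print(x_i)
```

Each document then becomes one point x_i in a space with one dimension per counted word, which is the representation a clustering method operates on.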

centroid
data point x_i
k centroids: μ_1, μ_2, μ_3

k = 3 clusters
Voronoi Tessellation
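Assigning every data point to its closest centroid is what carves the space into Voronoi cells. The sketch below illustrates that assignment step with k = 3; the points, the centroid positions, and the function names are illustrative assumptions, and squared Euclidean distance is assumed as the distance measure.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_to_nearest(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


# Hypothetical 2-D data and k = 3 centroids, made up for illustration.
points = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.2, 4.9), (0.0, 5.0)]
centroids = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]

labels = assign_to_nearest(points, centroids)
print(labels)
```

All points that share a label fall inside the same Voronoi cell of the corresponding centroid.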
k=3
k=3
k = ?
Cluster heterogeneity ~ cost function = distortion

Measure of quality of a given clustering:

cost (heterogeneity) = \sum_{j=1}^{k} \sum_{i \,:\, x_i \in \text{cluster } j} \lVert x_i - \mu_j \rVert^2

Lower is better!
N = number of observations = number of samples/examples
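As a rough illustration of the cost, the sketch below sums the squared distance from each of the N points to its assigned centroid and compares two candidate sets of centroids. The data, the labels, and the name `heterogeneity` are assumptions made for this example only.

```python
def heterogeneity(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    total = 0.0
    for point, j in zip(points, labels):
        total += sum((pi - ci) ** 2 for pi, ci in zip(point, centroids[j]))
    return total


# Hypothetical data: N = 4 points, two clusters, fixed assignments.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
labels = [0, 0, 1, 1]

centered = [(1.0, 0.0), (11.0, 0.0)]    # centroids at the cluster means
off_center = [(0.0, 0.0), (12.0, 0.0)]  # centroids placed on extreme points

cost_centered = heterogeneity(points, centered, labels)
cost_off_center = heterogeneity(points, off_center, labels)
print(cost_centered, cost_off_center)
```

The centroids placed at the cluster means give the lower cost, which is the sense in which lower is better.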

Random initialization in practice


For i = 1 to 100 {
    Randomly initialize K-means (e.g., at random or with k-means++)
    Run K-means and get the centroids
    Compute the cost function (heterogeneity)
}
Pick the clustering that gave the lowest cost
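The restart loop above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: it assumes initialization at k randomly chosen data points (not k-means++), a fixed seed for repeatability, and made-up data; the names `kmeans` and `best_of_restarts` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, rng, max_iter=100):
    """One K-means run, initialized at k randomly chosen data points."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                # Move the centroid to the mean of its assigned points.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    cost = sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))
    return centroids, labels, cost


def best_of_restarts(points, k, n_restarts=100, seed=0):
    """Run K-means n_restarts times and keep the lowest-cost clustering."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        result = kmeans(points, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best


# Hypothetical data: three well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 10.0), (5.0, 11.0)]
best_centroids, best_labels, best_cost = best_of_restarts(points, k=3)
print(best_cost)
```

Keeping only the lowest-cost run guards against a single unlucky initialization that converges to a poor local optimum.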
Heuristic: elbow method
[Figure: cost versus number of clusters, with k = 1, k = 2, ..., k = N marked]
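The two extremes of the cost curve can be computed directly, which may help explain why a heuristic is needed instead of simply minimizing the cost over k. The sketch below uses made-up data and hypothetical function names: with k = 1 every point is assigned to the overall mean, and with k = N every point is its own centroid.

```python
def cost_single_cluster(points):
    """Cost for k = 1: every point is assigned to the overall mean."""
    n = len(points)
    mean = tuple(sum(coords) / n for coords in zip(*points))
    return sum(sum((pi - mi) ** 2 for pi, mi in zip(p, mean)) for p in points)


def cost_one_cluster_per_point(points):
    """Cost for k = N: each point is its own centroid, so every distance is zero."""
    return sum(sum((pi - ci) ** 2 for pi, ci in zip(p, p)) for p in points)


# Hypothetical data with N = 4 points.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]

cost_k1 = cost_single_cluster(points)
cost_kN = cost_one_cluster_per_point(points)
print(cost_k1, cost_kN)
```

Because the cost falls to zero at k = N, the lowest cost alone always favors the largest k; the elbow heuristic instead looks for the k after which adding clusters stops reducing the cost by much.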
