
28/11/2021

Plan
Introduction

Regression

Unsupervised learning

Support vector machines

Decision trees

Bayesian learning

Artificial neural networks

Hidden Markov models
© FEZZA S. v21‐22

Reinforcement learning
Machine Learning 1

Objectifs

• Clustering
• Application domains
• Partition-based clustering (K-means)
• Cluster quality

Supervised learning

Training set: {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))}

Unsupervised learning

Training set: {x^(1), x^(2), …, x^(m)} (no labels)
• Clustering aims to find classes without labeled examples.
• An "unsupervised" learning method.
• Place similar items in the same group, and different items in different groups.

Clustering


(Figure: example dataset partitioned into Cluster 1 and Cluster 2)


Applications of clustering

• Marketing: discover distinct groups in customer bases and develop targeted marketing programs (customer segmentation).
• Urban planning: organize regions into areas of similar land use.
• Sociology: find groups of people with similar views.
• Earthquake studies: observed earthquake epicenters should cluster along continental faults.
• Biology: identify similar entities, plant and animal taxonomies, gene functionality.
• Also used for pattern recognition, data analysis, and image processing.

K-means
• The “K” in K‐means stands for the number of clusters you want.
• The “means” in K‐means stands for the cluster centroids (means) we will compute.

(Figures: a sequence of K-means illustrations on an example dataset, not reproduced here)

K-means

K‐means algorithm
Input:
- K (number of clusters)
- Training set {x^(1), x^(2), …, x^(m)}, with x^(i) ∈ R^n (drop the x_0 = 1 convention)

K-means

K‐means algorithm

Randomly initialize K cluster centroids μ_1, μ_2, …, μ_K ∈ R^n
Repeat {
    Cluster assignment step:
    for i = 1 to m
        c^(i) := index (from 1 to K) of the cluster centroid closest to x^(i)
    Move centroid step:
    for k = 1 to K
        μ_k := average (mean) of the points assigned to cluster k
}
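The alternating assignment/update loop above can be sketched in NumPy; this is a minimal illustrative implementation, not the course's code (all names are mine):

```python
import numpy as np

def kmeans(X, K, n_iters=100, rng=None):
    """Minimal K-means: X is an (m, n) data matrix, K the number of clusters."""
    rng = np.random.default_rng(rng)
    # Random initialization: pick K distinct training examples as centroids.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Cluster assignment step: index of the closest centroid for each x^(i).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # Move centroid step: mean of the points assigned to each cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([X[c == k].mean(axis=0) if np.any(c == k)
                                  else centroids[k] for k in range(K)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return c, centroids
```

The early exit when the centroids stop moving is a common convenience; the slide's version simply repeats the two steps.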



K-means
K‐means for non‐separated clusters

T-shirt sizing
(Figure: Weight vs. Height scatter plot of customers)


K-means
K-means optimization objective
c^(i) = index of the cluster (1, 2, …, K) to which example x^(i) is currently assigned
μ_k = cluster centroid k (μ_k ∈ R^n)
μ_c^(i) = cluster centroid of the cluster to which example x^(i) has been assigned

Optimization objective:
J(c^(1), …, c^(m), μ_1, …, μ_K) = (1/m) Σ_{i=1}^{m} ‖x^(i) − μ_c^(i)‖²
min over c^(1), …, c^(m) and μ_1, …, μ_K of J(c^(1), …, c^(m), μ_1, …, μ_K)

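The distortion J defined above is straightforward to compute; a small NumPy sketch (the function name is mine):

```python
import numpy as np

def distortion(X, c, centroids):
    """J = average squared distance between each x^(i) and its assigned centroid."""
    # centroids[c] selects, for each example, the centroid of its cluster (μ_c^(i)).
    return np.mean(np.sum((X - centroids[c]) ** 2, axis=1))
```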

K-means
Random initialization
Should have K < m.

Randomly pick K training examples.

Set μ_1, μ_2, …, μ_K equal to these K examples.

K-means

Local optima

K-means

Suboptimal solutions due to unlucky centroid initializations


K-means
Random initialization
For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get c^(1), …, c^(m), μ_1, …, μ_K.
    Compute cost function (distortion) J(c^(1), …, c^(m), μ_1, …, μ_K).
}
Pick the clustering that gave the lowest cost J.
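The restart loop can be expressed directly in code; a minimal NumPy sketch (helper names are mine, and the inner K-means is deliberately simplified):

```python
import numpy as np

def kmeans_once(X, K, rng, n_iters=50):
    """One K-means run from a random initialization; returns (c, centroids, J)."""
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        c = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        centroids = np.array([X[c == k].mean(axis=0) if np.any(c == k)
                              else centroids[k] for k in range(K)])
    # Final assignment and distortion J under the converged centroids.
    c = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    J = np.mean(np.sum((X - centroids[c]) ** 2, axis=1))
    return c, centroids, J

def kmeans_restarts(X, K, n_restarts=100, seed=0):
    """Repeat K-means from random initializations; keep the lowest-cost run."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, K, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])
```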

K-means++
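The slide's illustrations are not reproduced here. As background: K-means++ replaces purely random initialization with a seeding scheme that spreads the initial centroids out, sampling each new centroid with probability proportional to its squared distance to the nearest centroid already chosen. A minimal sketch (names are mine):

```python
import numpy as np

def kmeans_pp_init(X, K, seed=0):
    """K-means++ seeding: spread initial centroids via D^2-weighted sampling."""
    rng = np.random.default_rng(seed)
    # First centroid: a uniformly random training example.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        # Squared distance from each point to its nearest chosen centroid.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        # Sample the next centroid with probability proportional to d2.
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)
```

The resulting centroids are then refined with the usual K-means iterations; this seeding typically reduces the number of bad local optima compared with fully random initialization.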


K-means
What is the right value of K?

K-means
What is the right value of K?

Bad choices for the number of clusters: when K is too small, separate clusters get merged (left); when K is too large, some clusters get chopped into multiple pieces (right).


K-means

Choosing the value of K
Elbow method:
(Figures: two plots of the cost function J against the number of clusters K, for K = 1 to 8 — the K at the "elbow", where the decrease in cost flattens, is a reasonable choice; in practice the elbow is sometimes ambiguous)
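The elbow curve can be produced by running K-means for each candidate K and recording the final cost; a sketch assuming scikit-learn is available (the data is made up):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy data: three well-separated blobs (for illustration only).
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ([0, 0], [5, 0], [0, 5])])

# inertia_ is the sum of squared distances to the closest centroid (i.e. m * J).
costs = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
         for k in range(1, 9)]
# J always decreases as K grows; pick K at the "elbow" where the curve flattens.
```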

K-means

Choosing the value of K
Sometimes, you’re running K‐means to get clusters to use for some 
later/downstream purpose. Evaluate K‐means based on a metric for 
how well it performs for that later purpose.

E.g. T-shirt sizing
(Figures: two Weight vs. Height scatter plots of customers, clustered with different numbers of T-shirt sizes)



Clustering assessment metrics


In an unsupervised learning setting, it is often hard to assess the performance of a model, since we don't have the ground-truth labels that were available in the supervised learning setting.

• Silhouette coefficient: denoting by a the mean distance between a sample and all other points in the same class, and by b the mean distance between a sample and all other points in the next-nearest cluster, the silhouette coefficient s of a single sample is defined as:

s = (b − a) / max(a, b)

The coefficient can take values in the interval [−1, 1]:
• If it is 0 → the sample is very close to the neighboring clusters.
• If it is 1 → the sample is far away from the neighboring clusters.
• If it is −1 → the sample is assigned to the wrong cluster.
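In practice the silhouette coefficient is rarely computed by hand; scikit-learn exposes it directly. A sketch assuming scikit-learn is available (the data is made up):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, silhouette_samples

rng = np.random.default_rng(0)
# Two tight, well-separated blobs: the silhouette should be close to 1.
X = np.vstack([rng.normal([0, 0], 0.3, (50, 2)),
               rng.normal([5, 5], 0.3, (50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
s_mean = silhouette_score(X, labels)     # mean s over all samples
s_each = silhouette_samples(X, labels)   # per-sample coefficients in [-1, 1]
```

Comparing `s_mean` across several candidate values of K is another common way to choose the number of clusters.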



Drawbacks of K-means

(Figures: examples illustrating the drawbacks of K-means, not reproduced here)
