Chapter3 ML 21-22
Plan
Introduction
Regression
Support vector machines
Decision trees
Bayesian learning
Artificial neural networks
Hidden Markov models
Reinforcement learning
© FEZZA S. v21‐22
Machine Learning 1
Objectives
• Clustering
• Application domains
• Partitional clustering (K-means)
• Cluster quality
28/11/2021
Supervised learning
Training set: {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))}; each example comes with a label y^(i).
Unsupervised learning
Training set: {x^(1), x^(2), ..., x^(m)}; no labels are given.
• Clustering aims to find classes without labeled examples
• An “unsupervised” learning method
• Place similar items in same group, different items in different groups
Clustering
(Figure: a 2-D scatter of points grouped into Cluster 1 and Cluster 2.)
Applications of clustering
• Biology: identifying similar entities, building plant and animal taxonomies, grouping genes by functionality.
• Clustering is also used in pattern recognition, data analysis, and image processing.
K-means
• The “K” in K‐means stands for the number of clusters you want.
• The “means” in K‐means stands for the cluster centroids (means) we will compute.
K-means
(Figures: step-by-step illustration of the K-means algorithm on a 2-D example.)
K-means
K-means algorithm
Input:
- K (number of clusters)
- Training set {x^(1), x^(2), ..., x^(m)}, x^(i) ∈ R^n (drop the x_0 = 1 convention)
K-means
K-means algorithm
Randomly initialize K cluster centroids μ_1, μ_2, ..., μ_K ∈ R^n
Repeat {
    // Cluster-assignment step
    for i = 1 to m
        c^(i) := index (from 1 to K) of the cluster centroid closest to x^(i)
    // Move-centroid step
    for k = 1 to K
        μ_k := average (mean) of the points assigned to cluster k
}
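The two alternating steps above translate almost line-for-line into NumPy. A minimal sketch (the function name and defaults are my own, not from the slides):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain K-means on an (m, n) data matrix X with K clusters."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    # Randomly initialize the centroids mu_1..mu_K to K distinct training examples.
    mu = X[rng.choice(m, size=K, replace=False)].copy()
    for _ in range(n_iters):
        # Cluster-assignment step: c[i] = index of the centroid closest to x^(i).
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (m, K)
        c = dists.argmin(axis=1)
        # Move-centroid step: mu_k = mean of the points assigned to cluster k.
        new_mu = np.array([X[c == k].mean(axis=0) if (c == k).any() else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # centroids stopped moving: converged
            break
        mu = new_mu
    return c, mu
```

Note that an empty cluster keeps its previous centroid in this sketch; production implementations usually re-seed it instead.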
K-means
K-means for non-separated clusters
(Figure: T-shirt sizing; customer weight plotted against height. K-means can still split the population into size groups even though the data show no well-separated clusters.)
K-means
K-means optimization objective
c^(i) = index of the cluster (1, 2, ..., K) to which example x^(i) is currently assigned
μ_k = cluster centroid k (μ_k ∈ R^n)
μ_c^(i) = cluster centroid of the cluster to which example x^(i) has been assigned
Optimization objective:
J(c^(1), ..., c^(m), μ_1, ..., μ_K) = (1/m) Σ_{i=1}^{m} ‖x^(i) − μ_c^(i)‖²
minimize J over c^(1), ..., c^(m) and μ_1, ..., μ_K
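Given the assignments c and centroids μ, the objective J (the distortion) is a one-liner to evaluate; a sketch, with a function name of my choosing:

```python
import numpy as np

def distortion(X, c, mu):
    """J(c, mu) = (1/m) * sum_i ||x^(i) - mu_{c^(i)}||^2."""
    diffs = X - mu[c]  # mu[c] selects the assigned centroid for every example
    return float(np.mean(np.sum(diffs ** 2, axis=1)))
```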
K-means
Random initialization
Should have K < m.
Randomly pick K training examples.
Set μ_1, ..., μ_K equal to these K examples.
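The recommended initialization, picking K distinct training examples as the starting centroids, is essentially one line in NumPy (the helper name is hypothetical):

```python
import numpy as np

def init_centroids(X, K, seed=0):
    """Set mu_1..mu_K to K distinct, randomly chosen training examples (K < m)."""
    rng = np.random.default_rng(seed)
    assert K < X.shape[0], "should have K < m"
    return X[rng.choice(X.shape[0], size=K, replace=False)].copy()
```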
K-means
Local optima
(Figures: suboptimal clusterings caused by unlucky centroid initializations; the algorithm converges to a local rather than global minimum of the distortion J.)
K-means
Random initialization with multiple restarts
For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get c^(1), ..., c^(m), μ_1, ..., μ_K.
    Compute the cost function (distortion) J(c^(1), ..., c^(m), μ_1, ..., μ_K)
}
Pick the clustering that gave the lowest cost J
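Multiple random restarts, keeping the run with the lowest distortion, can be sketched as follows (self-contained; the helper replicates the basic algorithm from earlier slides, and the names are my own):

```python
import numpy as np

def kmeans_once(X, K, rng, n_iters=50):
    """One K-means run from a random initialization; returns (c, mu, J)."""
    mu = X[rng.choice(X.shape[0], size=K, replace=False)].copy()
    for _ in range(n_iters):
        c = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2).argmin(axis=1)
        mu = np.array([X[c == k].mean(axis=0) if (c == k).any() else mu[k]
                       for k in range(K)])
    # Final assignment and distortion J for this run.
    c = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2).argmin(axis=1)
    J = float(np.mean(np.sum((X - mu[c]) ** 2, axis=1)))
    return c, mu, J

def kmeans_restarts(X, K, n_restarts=100, seed=0):
    """Run K-means n_restarts times and keep the clustering with the lowest J."""
    rng = np.random.default_rng(seed)
    return min((kmeans_once(X, K, rng) for _ in range(n_restarts)),
               key=lambda run: run[2])
```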
K-means++
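The slides give no detail on this slide, but K-means++ (Arthur and Vassilvitskii, 2007) is the standard improved seeding scheme: the first centroid is picked uniformly at random, and each subsequent centroid is sampled with probability proportional to D(x)², its squared distance to the nearest centroid already chosen, which spreads the seeds out and makes bad local optima much less likely. A sketch (the function name is my own):

```python
import numpy as np

def kmeanspp_init(X, K, seed=0):
    """K-means++ seeding: sample each new centroid with probability ~ D(x)^2."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    centroids = [X[rng.integers(m)]]  # first seed: uniform at random
    for _ in range(K - 1):
        # D(x)^2: squared distance from each point to its nearest chosen centroid.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(X[rng.choice(m, p=d2 / d2.sum())])
    return np.array(centroids)
```

Points that coincide with a chosen centroid have D(x)² = 0, so they can never be picked twice.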
K-means
What is the right value of K?
Bad choices for the number of clusters: when K is too small, separate clusters get merged (left); when K is too large, some clusters get chopped into multiple pieces (right).
K-means
Choosing the value of K
Elbow method: plot the cost function J against the number of clusters K and pick the K at the "elbow", where the curve's decrease flattens out.
(Figures: two plots of the cost function J versus K = 1, ..., 8; one shows a clear elbow, the other decreases smoothly with no obvious elbow.)
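The elbow method amounts to running K-means over a range of K and comparing the resulting distortions. A self-contained sketch on synthetic data with three clusters, so the drop in J should flatten near K = 3 (helper name and data are my own):

```python
import numpy as np

def kmeans_cost(X, K, rng, n_iters=50):
    """Run plain K-means and return the final distortion J."""
    mu = X[rng.choice(X.shape[0], size=K, replace=False)].copy()
    for _ in range(n_iters):
        c = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2).argmin(axis=1)
        mu = np.array([X[c == k].mean(axis=0) if (c == k).any() else mu[k]
                       for k in range(K)])
    c = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2).argmin(axis=1)
    return float(np.mean(np.sum((X - mu[c]) ** 2, axis=1)))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, (30, 2)) for loc in (0.0, 5.0, 10.0)])
costs = {K: kmeans_cost(X, K, rng) for K in range(1, 9)}
# J shrinks as K grows; look for the K where the drop flattens out.
```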
K-means
Choosing the value of K
Sometimes, you’re running K‐means to get clusters to use for some
later/downstream purpose. Evaluate K‐means based on a metric for
how well it performs for that later purpose.
Clustering assessment metrics
• Silhouette coefficient: denoting by a the mean distance between a sample and all other points in the same cluster, and by b the mean distance between that sample and all points in the next nearest cluster, the silhouette coefficient s of a single sample is defined as
s = (b − a) / max(a, b)
The coefficient takes values in the interval [−1, 1]:
• s ≈ 0: the sample is very close to the neighboring cluster.
• s ≈ 1: the sample is far away from the neighboring clusters.
• s ≈ −1: the sample is probably assigned to the wrong cluster.
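The definition above can be sketched directly for a single sample (the function name is mine; scikit-learn's `silhouette_score` provides a vectorized version over the whole data set):

```python
import numpy as np

def silhouette_sample(X, labels, i):
    """s = (b - a) / max(a, b) for example i.

    a: mean distance to the other points of i's own cluster;
    b: mean distance to the points of the nearest other cluster.
    """
    d = np.linalg.norm(X - X[i], axis=1)
    own = labels == labels[i]
    a = d[own & (np.arange(len(X)) != i)].mean()
    b = min(d[labels == k].mean() for k in np.unique(labels) if k != labels[i])
    return (b - a) / max(a, b)
```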
Drawbacks of K-means
(Figures: examples where K-means performs poorly.)