
CHAPTER 4


Unsupervised Classification

Hervé Gross, PhD - Reservoir Engineer


Advanced Resources and Risk Technology
hgross@ar2tech.com

© ADVANCED RESOURCES AND RISK TECHNOLOGY. This document can only be distributed within UFRGS.
What is unsupervised classification?

o Given a data set with multiple features, unsupervised classification is the action of sorting, classifying, and categorizing this multidimensional data into a number of groups with similar features
o These groups are called clusters
o The goal is to identify the distinguishing features of a large data set in a multidimensional space
o If an unsupervised classification is successful, any member of a given cluster is more similar to the other members of its own cluster than to any member of the other clusters
o Because no output (no classification label) is provided, unsupervised classification is not validated against a “truth” (contrary to supervised classification, where a partial truth is given)

Illustration: social networks use unsupervised classification to group large sets of people into categories (“buckets”) based on gender, age, taste, recent activity, friends, etc. By finding similarities between users, they can offer customized services to each bucket, test new functionalities, analyze similar behaviors, and so on.
(Image: https://pixabay.com/en/human-crowds-collection-people-592738/)



Illustration on 2D data

o 1,500 points
o Our brain is wired to detect patterns: the grouping seems obvious
o We can easily detect clusters that maximize the extra-cluster (between-cluster) distance and minimize the intra-cluster distance (most compact clusters)
o Number of clusters?
o Centroids? Medoids?



Illustration on 2D data

o K-means clustering done here (see the sketch below)
o We provide the number of clusters
o Any misclassification?
o Never use clustering without understanding the reason for the clusters
o VISUALIZATION is very important
o Clustering quality metrics do not offer a true validation; they only indicate how difficult the clustering was to perform

ALWAYS ASK YOURSELF:

Does this clustering make sense?
Plots, statistics, spot-checks

E.g.: what is the point of creating 20 clusters for facies if you cannot find variograms for them?
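As a concrete companion to this slide, here is a minimal sketch (not the deck's original example or data) of K-means on synthetic 2D points with a visual spot-check; scikit-learn and matplotlib are assumed, and make_blobs simply stands in for the 1,500-point illustration.

```python
# Minimal sketch: K-means on synthetic 2D points, followed by a visual sanity check.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the 1,500-point illustration
X, _ = make_blobs(n_samples=1500, centers=4, random_state=42)

# We provide the number of clusters ourselves
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# ALWAYS visualize: does this clustering make sense?
plt.scatter(X[:, 0], X[:, 1], c=labels, s=5)
plt.scatter(*kmeans.cluster_centers_.T, c="red", marker="x", s=100)
plt.title("K-means on 2D data (visual spot-check)")
plt.show()
```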



Generalization to n-dim data
o Geomodeling data sets are multidimensional (see the sketch below):
o Well logs (density, gamma, sonic, etc.), interpreted data (rock type…), secondary data (seismic)
o Spatial information must always be accounted for: cluster by horizon, by region
o Time-dependent information: production, pressure, history-match quality

[Figures: field model, correlated data, well logs, production]
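A minimal sketch of how such multidimensional features might be prepared for clustering; the DataFrame columns (density, gamma_ray, sonic, region) and their values are hypothetical, and pandas/scikit-learn are assumed. Standardization and per-region grouping illustrate the bullets above, not the deck's actual workflow.

```python
# Minimal sketch with hypothetical well-log features: standardize features that
# have different units before clustering, and honor spatial context by clustering
# per region/horizon rather than pooling everything together.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical feature table: one row per sample along the wells
logs = pd.DataFrame({
    "density":   [2.31, 2.45, 2.60, 2.28, 2.52, 2.47],
    "gamma_ray": [85.0, 40.0, 25.0, 95.0, 33.0, 60.0],
    "sonic":     [95.0, 72.0, 60.0, 99.0, 68.0, 80.0],
    "region":    ["A", "A", "A", "B", "B", "B"],   # horizon / region label
})

features = ["density", "gamma_ray", "sonic"]
for region, group in logs.groupby("region"):
    X = StandardScaler().fit_transform(group[features])          # unit-free features
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(region, labels)
```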



Techniques used in Unsupervised Classification

Clustering = automatic classification = numerical taxonomy. Sometimes focused on the resulting groups, sometimes focused on the discriminative power of a set of features.

o Clustering
  o k-means
  o mixture models (such as Gaussian mixture models)
  o hierarchical clustering
o Anomaly detection
o Neural networks
  o Hebbian learning
  o Generative adversarial networks
o Approaches for learning latent variable models, such as
  o Expectation–maximization algorithm (EM)
  o Method of moments
  o Blind signal separation techniques
    o Principal component analysis
    o Independent component analysis
    o Non-negative matrix factorization
    o Singular value decomposition
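For orientation, a minimal sketch applying three of the techniques listed above (k-means, a Gaussian mixture model, and PCA as a dimension-reduction / blind signal separation step) to the same synthetic data; scikit-learn is assumed and the data and parameters are illustrative only.

```python
# Minimal sketch: three of the listed techniques on the same synthetic data set.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=3, n_features=5, random_state=1)

km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)        # k-means
gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)       # mixture model
X_2d = PCA(n_components=2).fit_transform(X)                                        # project onto 2 PCs

print(km_labels[:10], gmm_labels[:10], X_2d.shape)
```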



K-means algorithms

Many variants of the same base algorithm (centroid-based algorithms)
Principle: partition the data into k clusters so as to minimize, within each cluster, the sum of squared distances to the cluster's centroid (a heuristic)

STANDARD ALGORITHM
o Initialization: start with a guess of k candidate centroids (the quality of the answer is strongly initialization-dependent!)
o Assignment: Compute the distance of each point to
each centroid, and assign each point to the cluster
with the nearest centroid (=Voronoi diagram
partition) [expectation]
o Update: compute the new centroid as the mean
position of all points contained within a cluster
[maximization]
o Convergence: stop when centroid position changes are below a predetermined tolerance

Objective minimized over the k clusters $S = \{S_1, \dots, S_k\}$, with $\mu_i$ the centroid of cluster $S_i$ (here with the L2 norm):

$\underset{S}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert_2^2$
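A minimal NumPy sketch of the standard algorithm described above: initialization, assignment (nearest centroid), update (mean position), and a tolerance-based convergence test. It is a bare-bones illustration (random initialization, no safeguard against empty clusters), not a production implementation.

```python
# Minimal NumPy sketch of the standard k-means heuristic (L2 norm).
import numpy as np

def kmeans(X, k, tol=1e-6, max_iter=100, rng=np.random.default_rng(0)):
    # Initialization: guess k candidate centroids (here: k random data points)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment: distance of each point to each centroid, keep the nearest
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update: new centroid = mean position of the points in each cluster
        # (no safeguard against empty clusters in this sketch)
        new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        # Convergence: stop when centroid moves fall below the tolerance
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return labels, centroids

X = np.vstack([np.random.randn(100, 2) + [0, 0], np.random.randn(100, 2) + [5, 5]])
labels, centroids = kmeans(X, k=2)
print(centroids)
```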



Pros and Cons of K-Means

PROS
(Figures from http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_assumptions.html)

o Works with large data sets in large dimensions


o Robust to clusters with different densities (Fig. 4)
o Heuristic: always an answer regardless of the required number
of clusters [good for representative model selection problems]


CONS
o A poor choice of the number of clusters can lead to spurious results (Fig. 1)
o Sensitive to initial guess and prone to local minima
o “Spherical cluster” assumption: because each centroid is the mean of its points and the objective uses Euclidean distances, clustering works best when clusters are isotropic (roughly spherical) around their centroid (Fig. 2)
o Overlapping clusters: when clusters are touching, boundary
assignments can lead to misclassification (Fig. 3)
o Can be CPU-consuming because k·N distances are computed at each iteration, even more so with kernel transformations (Figs. 3 and 4)
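A minimal sketch of the “spherical cluster” limitation, in the spirit of the scikit-learn example linked above (the shearing matrix and random seed follow that example); scikit-learn and matplotlib are assumed.

```python
# Minimal sketch: after an anisotropic (shearing) transform, k-means cuts across
# the elongated clusters because it assumes roughly isotropic clusters.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1500, centers=3, random_state=170)
X_aniso = X @ np.array([[0.6, -0.6], [-0.4, 0.8]])   # stretch the blobs

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_aniso)

plt.scatter(X_aniso[:, 0], X_aniso[:, 1], c=labels, s=5)
plt.title("K-means on anisotropic clusters (misassignments near boundaries)")
plt.show()
```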



Many K-means variations

o Smart initialization strategies (boundary sampling, DoE concepts, best guesses…)


o Reduce dimensions (pre-process with PCA, k-PCA, MDS,…)
o Transform coordinates (including Kernel transformations)
o Apply weights on the dimensions (favor separation in a metric)
o Work with different distance norms (k-median clustering uses L1, taxicab norm)
o Add stochasticity to avoid local minima (random assignments)
o Use medoids instead of centroids (i.e. only existing data points can be cluster centers)
o Add internal cluster evaluation (measures of the density, extent, shape, or “silhouette” of clusters) to update the optimal number of clusters (see the sketch after this list)
o Expectation-Maximization algorithms: centroids are maximum likelihood points
determined by the marginal likelihood of data (continuous variables)
o Hierarchical clustering: split clusters to build a hierarchy and find optimal number of
clusters
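As a sketch of the “internal cluster evaluation” idea mentioned above, here is one common recipe, assuming scikit-learn: sweep the number of clusters and keep the k with the best silhouette score. The data and the range of k are arbitrary.

```python
# Minimal sketch: pick the number of clusters with the silhouette coefficient.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=800, centers=4, random_state=7)

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # higher = better-separated clusters

best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```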



Hierarchical clustering algorithms

o Hierarchical clustering analysis (HCA) is also


called “connectivity-based clustering”
o Cluster data points (“connect” them) based on
a maximum distance required to travel from
one cluster to another. The inter-cluster
similarity (distance) can be defined in several
ways.
o “Hierarchical” because clusters are either
divided into smaller clusters (divisive
clustering approach) as the maximum
distance to connect decreases, or
agglomerated as the maximum distance to
connect increases (agglomerative clustering
approach)
o Dendrograms are used to represent the partitioning of the data as a function of distance

Figure: iterative construction of 10 clusters with Euclidean distance and a “complete” linkage approach



Most efficient implementation:

Agglomerative algorithm

o Initialization: start with each point in its own cluster (N clusters)
o Iteration #1: look for the two closest points according to the selected linkage and distance type, and “link” them together (i.e. add their centroid to the pool of points, and remove the two original points from the pool)
o Iteration #N: look for the next two closest points or two closest clusters of points (based on a measure of inter-cluster distance) and link them (add centroid, remove parents)
o Stop when there are only 2 clusters left and report all distances at which linkages occurred
o Result: a dendrogram (classification tree) where all linkages are shown as a function of distance. The user can then pick either a distance or a desired number of clusters (see the sketch below)

Figure: N = 150 points, 2 dimensions, Euclidean distance. Agglomerative hierarchical clustering iteratively searches for the two “closest” (most similar) points or previously-formed clusters and links them.

https://joernhees.de/blog/2015/08/26/scipy-hierarchical-clustering-and-dendrogram-tutorial/
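A minimal sketch in the spirit of the SciPy tutorial linked above: build the linkage matrix on 150 random 2D points, draw the dendrogram, and cut the tree into a desired number of clusters. The random data and the choice of 10 clusters are illustrative only.

```python
# Minimal sketch: agglomerative linkage, dendrogram, and a cut of the tree.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

X = np.random.default_rng(0).normal(size=(150, 2))       # 150 points, 2 dimensions

Z = linkage(X, method="complete", metric="euclidean")     # agglomerative linkage matrix
dendrogram(Z)
plt.xlabel("sample index")
plt.ylabel("linkage distance")
plt.show()

labels = fcluster(Z, t=10, criterion="maxclust")          # pick a desired number of clusters
print(labels)
```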



How to read a dendrogram

(Greek dendro: tree + gramma: drawing)

Figure: an annotated dendrogram of the 150 original points.
o Horizontal axis: label of the samples; reflects only their adjacency in the tree, nothing to do with their actual position in space
o Vertical axis: value of the linkage distance (progression of agglomerative clustering)
o Vertical lines: origin of the merged clusters
o Horizontal lines: merge of two clusters



How to read a dendrogram

Figure: cutting the dendrogram at this linkage distance (30) forms 2 clusters (labeled 1 and 2). The leaf labels index the 150 original samples and have nothing to do with their actual position in space.





How to read a dendrogram

Figure: cutting the dendrogram at this linkage distance (15) forms 5 clusters (labeled 1 to 5). The leaf labels index the 150 original samples and have nothing to do with their actual position in space.





How to read a dendrogram

o Dendrograms are often “leaf-truncated” to avoid showing unreadable depths (leaves = the ends of the branches of the tree)
o A truncated leaf reports either the singleton index or the number of points contained in the cluster
o Dendrograms are sensitive to the distances between points/clusters (not to their positions)
o The choice of metric and linkage strategy impacts the dendrogram (see the sketch below)
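A minimal sketch of leaf truncation with SciPy's dendrogram: truncate_mode="lastp" shows only the last p merges, and truncated leaves are labeled with the number of points they contain. The data and p=12 are arbitrary.

```python
# Minimal sketch: a leaf-truncated dendrogram (only the last 12 merges shown).
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

X = np.random.default_rng(1).normal(size=(150, 2))
Z = linkage(X, method="ward")

dendrogram(Z, truncate_mode="lastp", p=12, show_contracted=True)
plt.ylabel("linkage distance")
plt.show()
```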



HCA variations

o Two key components of the method can be customized (see the sketch below):
  o The metric (Euclidean L2, Manhattan L1, maximum distance Linf, Mahalanobis (covariance), …)
  o The linkage:
    o Link if the closest two points of the sets are < threshold [single or minimum linkage]
    o Link if the farthest two points of the sets are < threshold [complete or maximum linkage]
    o Link if the average distance between all pairs of points is < threshold [average linkage]
    o Link if the centroid distance is < threshold [centroid linkage]
    o “Minimum energy clustering”: link if the distance between the two point sets' density distributions is < threshold
    o Ward linkage: minimize the distances inside all clusters (similar to k-means), favors compactness
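A minimal sketch, assuming SciPy, of how swapping the metric and linkage strategy changes the resulting partition of the same (random, illustrative) data:

```python
# Minimal sketch: same data, different metric/linkage choices, different clusters.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.random.default_rng(2).normal(size=(100, 3))

for method, metric in [("single", "euclidean"),     # minimum linkage, L2
                       ("complete", "cityblock"),   # maximum linkage, L1 (Manhattan)
                       ("average", "euclidean"),
                       ("ward", "euclidean")]:      # Ward requires Euclidean distance
    Z = linkage(X, method=method, metric=metric)
    labels = fcluster(Z, t=4, criterion="maxclust")
    print(method, metric, np.bincount(labels)[1:])   # cluster sizes
```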



Pros and Cons

PROS
o No need to know the number of clusters a
priori, can choose later
o Only the distances between points matter, not their positions: useful when working in kernel spaces or with MDS coordinates
o CPU consumption is predictable, and the search for the closest clusters speeds up as points get linked (fewer candidates remain)

CONS
o Agglomerative clustering: large clusters tend to grow faster with each distance iteration than small clusters (the rich get richer, singletons are left over). This is because their larger envelope automatically offers more opportunities for contact with nearby points



Unsupervised learning in geomodeling

It is easy and tempting to replace expert knowledge with unsupervised learning: instead of interpreting lithology, obtain measurements and let algorithms identify similarities.
This is dangerous:
o Clusters are purely mathematical objects; they are useful but not explanatory
o They contain no direct physical information, have no extrapolation power, and exercise no judgement with respect to the data (they use all data provided, without distinction)
o Clusters depend on our choice of features, they depend on our subset of data, they depend on our
algorithm parameterization
o It is more interesting to understand how these clusters came to be produced by the algorithm: clustering is only the beginning of our work, and we have to extract some sense from the clusters
o Our assumption: we know that there is an underlying “law” (the physics and chemistry of geological reservoirs) to help us understand why our clusters were formed, and we need to use it
o Forming clusters is a way of acknowledging that similar causes (features) will lead to similar behaviors,
although we do not know which behavior (outcome) yet.
o Supervised learning: identify a behavior and a set of causal features, and infer a modeling relationship



References

A. Hyvärinen and E. Oja (2000). "Independent Component Analysis: Algorithms and Applications". Neural Networks, 13(4-5), pp. 411-430.
J. A. Hartigan (1975). Clustering Algorithms. John Wiley & Sons, Inc.
Hartigan, J. A.; Wong, M. A. (1979). "Algorithm AS 136: A K-Means Clustering Algorithm". Journal of the Royal Statistical Society, Series C, 28(1), pp. 100-108. JSTOR 2346830.
Honarkhah, M.; Caers, J. (2010). "Stochastic Simulation of Patterns Using Distance-Based Pattern Modeling". Mathematical Geosciences, 42, pp. 487-517. doi:10.1007/s11004-010-9276-7.
Rokach, Lior; Maimon, Oded (2005). "Clustering Methods". In Data Mining and Knowledge Discovery Handbook. Springer US, pp. 321-352.
Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). "14.3.12 Hierarchical Clustering". The Elements of Statistical Learning (2nd ed.). New York: Springer, pp. 520-528. ISBN 0-387-84857-6.

