
19/11/2017

Segmentation and
Classification

FOOD, U. Copenhagen, Denmark. www.models.life.ku.dk www.hypertools.org jmar@life.ku.dk

Similarity



Similarity

Similarity:
Is the mathematical counterpart of the concept of analogy. We use analogy constantly in everyday life for pattern recognition, i.e. to recognize, to distinguish, to classify.

Distances:
Are the starting point for evaluating similarity: close samples are considered similar, distant samples are considered dissimilar.


Similarity

Distances between two samples


There are many ways to calculate the distance between two points:

Manhattan/city block, Minkowski, Chebyshev, Canberra, …

[Figure: two points, the distance between them, and the angle θ between their vectors]
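As an illustration (not part of the original slides, which use the MATLAB/hypertools environment), the metrics listed above can be sketched in Python/NumPy; the function name `distances` is my own:

```python
import numpy as np

def distances(a, b, p=3):
    """A few common point-to-point distances (illustrative sketch)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    d = np.abs(a - b)
    return {
        "euclidean": float(np.sqrt(np.sum(d ** 2))),
        "manhattan": float(np.sum(d)),                    # city block
        "chebyshev": float(np.max(d)),
        "minkowski": float(np.sum(d ** p) ** (1.0 / p)),  # p = 1 and p = 2 recover the two above
        "canberra":  float(np.sum(d / (np.abs(a) + np.abs(b)))),
    }
```

For the points (0, 0) and (3, 4), the Euclidean distance is 5, the city-block distance 7 and the Chebyshev distance 4.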



Similarity

Clusters:
Cluster methods search for the presence of groups (clusters) in the data,
based on distances → UNSUPERVISED


Similarity

Clusters

Do not confuse clustering and classification


Clustering methods
search for the presence of groups (clusters) in the data. They are unsupervised
and based on calculation of distances.

Classification methods
use the class information (supervised): they separate classes, and their goal is
to find models able to correctly assign each sample to its proper class.


Similarity

Ad-hoc classification of pattern recognition methods

Unsupervised pattern recognition → PCA, clustering

Classification (supervised):
- Pure classification — Linear: LDA, PLS-DA; Non-linear: QDA, K-NN
- Class-modelling — Linear: SIMCA, variations of PLS-DA; Non-linear: ANN, SVM


Clustering


Clustering

Clustering families

1. Agglomerative or hierarchical methods →
   - Each pixel is initially considered a class
   - The number of classes/clusters is progressively decreased
   - Many methods to solve the problem

2. Partitional methods →
   - The number of clusters is preselected, but this selection is not easy
   - Two main methods: hard clustering → K-means; fuzzy clustering → FCM


Clustering

K-means clustering

K-means assigns each pixel x_mn of the image to the k-th cluster, whose center is
nearest, by minimizing the sum of the squared distances of each pixel to its
corresponding center:

J = Σ_{k=1..K} Σ_{x_mn ∈ C_k} ‖x_mn − μ_k‖²

where x_mn is each pixel and μ_k is the centroid of cluster k.


Clustering

K-means clustering workflow

(1) Randomly select k pixels as the initial centroids

(2) Generate k clusters according to the distances to the centroids:
assign each pixel to the nearest cluster centroid

(3) Re-compute the centroids so as to minimize J

(4) Re-assign the pixels to the clusters and re-calculate J

(5) Repeat steps 2 to 4 until convergence
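The five steps above can be sketched as a minimal NumPy implementation (an illustrative sketch, not the course's MATLAB/hypertools code; `kmeans` and its arguments are my own naming):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means: X is (n_pixels, n_channels); returns labels, centroids, J."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # step 1
    for _ in range(n_iter):
        # step 2: assign each pixel to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 3: recompute each centroid as its cluster mean (minimizes J)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):                        # step 5: convergence
            break
        centroids = new
    J = float(((X - centroids[labels]) ** 2).sum())            # step 4: objective
    return labels, centroids, J
```

For two well-separated groups of pixels, the two recovered clusters coincide with the groups.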


Clustering

K-means. Calculation of number of clusters

The number of clusters could be taken as the rank estimated by PCA.

But the discrimination capacity of K-means is higher than that of PCA.

There are many methods to calculate the number of clusters:

PCA

Silhouette

Distances between - within


Clustering

K-means. Calculation of number of clusters

Silhouette

Calculated for each pixel x_mn; it offers a measure of the similarity between
points in the same cluster compared to points in other clusters:

(S_mn)_k = (b_mn − a_mn) / max(a_mn, b_mn)

a_mn → average distance between the mn-th pixel and all the pixels included in
the same cluster

b_mn → minimum average distance between the mn-th pixel and the pixels
included in the other clusters

(S_mn)_k close to 1 → correct classification

(S_mn)_k with a negative value → misclassification

EXTREMELY SLOW!!!
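A direct NumPy sketch of the per-pixel silhouette, using the a/b definitions above (names are illustrative). The full pairwise distance matrix is what makes this, as the slide warns, extremely slow on large images:

```python
import numpy as np

def silhouette(X, labels):
    """Per-sample silhouette s = (b - a) / max(a, b); O(n^2) in the number of pixels."""
    X = np.asarray(X, float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # all pairwise distances
    s = np.zeros(len(X))
    for i, k in enumerate(labels):
        same = (labels == k)
        a = D[i, same & (np.arange(len(X)) != i)].mean()       # mean intra-cluster distance
        b = min(D[i, labels == other].mean()
                for other in set(labels) if other != k)        # nearest other cluster
        s[i] = (b - a) / max(a, b)
    return s
```

For two tight, well-separated clusters all silhouette values are close to 1.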


Clustering

K-means. Calculation of number of clusters

Silhouette

[Figure: silhouette plots for 2 and 3 clusters; silhouette values per cluster, ranging from 0 to 1]


Clustering

K-means. Calculation of number of clusters

Between – within distances

For each cluster, this measures the influence of the rest of the clusters.

Comparison between:
- The mean distance within one cluster and its centroid
- The distance between the rest of the centroids and the cluster

The influence of cluster A on B is not the same as the influence of cluster B on A.

[Figure: two clusters, A and B]


Clustering

K-means. Benefits and drawbacks

- Extremely simple to program
- Resolving mixtures
- Quantitation with multi-set images
- Extremely sensitive to noise and outliers; tends to converge to local minima
- Each pixel belongs to exactly one cluster
- Only valid for entities (objects), not for mixtures
- Difficult to assess the number of clusters


Clustering

Fuzzy clustering

Each pixel is assigned a fractional degree of membership to all the clusters
simultaneously, rather than belonging completely to one cluster (as in K-means
clustering).

FCM allows pixels at the edge of a cluster to belong to it to a lesser degree
than pixels in the middle of the cluster.


Clustering

Fuzzy clustering
A membership coefficient u_mnk is calculated for each pixel x_mn in such a
way that each coefficient lies between 0 and 1 and the coefficients of each
pixel sum to 1:

u_mnk = 1 / Σ_{j=1..K} ( ‖x_mn − m_k‖ / ‖x_mn − m_j‖ )^{2/(g−1)}

g is the so-called "fuzzifier" constant, which determines the fuzziness of the
clustering result.

A good value of g is 2, for which the coefficients are linearly normalized to
make their sum 1.

FOOD, U. Copenhagen, Denmark. www.models.life.ku.dk www.hypertools.org jmar@life.ku.dk

Clustering

Fuzzy clustering

Now, the objective function J is calculated as

J = Σ_{k=1..K} Σ_{mn} u_mnk^g ‖x_mn − m_k‖²

and the cluster center m_k is calculated as the weighted mean:

m_k = Σ_{mn} u_mnk^g x_mn / Σ_{mn} u_mnk^g
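One FCM iteration (membership update followed by the weighted-mean centroid update described above) can be sketched in NumPy; this is an illustrative sketch with my own naming, not the course's implementation:

```python
import numpy as np

def fcm_step(X, M, g=2.0):
    """One fuzzy c-means iteration.

    X: (n, p) pixels; M: (k, p) current cluster centers; g: fuzzifier constant."""
    d = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2) + 1e-12  # avoid /0
    # membership: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(g-1)); each row sums to 1
    U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (g - 1.0)), axis=2)
    Ug = U ** g
    M_new = (Ug.T @ X) / Ug.sum(axis=0)[:, None]   # weighted-mean centroids
    J = float(np.sum(Ug * d ** 2))                 # fuzzy objective function
    return U, M_new, J
```

Pixels close to one center get a membership near 1 for that cluster and near 0 for the others.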


Clustering

Fuzzy clustering. Number of clusters

In this case, it is more difficult to calculate the number of clusters

Partition entropy

PE = −(1/MN) Σ_{mn} Σ_{k} u_mnk log(u_mnk)

where u_mnk are the membership coefficients and MN denotes the total number of
pixels of the image.

The PE value ranges from 0 to log(K).

Values close to 0 indicate a good estimation of the number of clusters, whereas
PE values close to log(K) indicate that the number of clusters does not reflect
the real structure of the image.
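The partition entropy is simple to compute from the membership matrix; a hedged NumPy sketch (my own naming):

```python
import numpy as np

def partition_entropy(U):
    """PE = -(1/MN) * sum over pixels and clusters of u*log(u); ranges in [0, log K]."""
    U = np.asarray(U, float)
    MN = U.shape[0]                      # total number of pixels
    return float(-np.sum(U * np.log(U + 1e-12)) / MN)
```

A crisp partition (all memberships 0 or 1) gives PE ≈ 0; a maximally fuzzy one (all memberships 1/K) gives PE = log(K).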


Clustering

Fuzzy clustering. Benefits and drawbacks

- Each pixel is assigned a degree of belonging to all the clusters
- Resolving mixtures
- Extremely sensitive to noise and outliers; tends to converge to local minima
- Difficult to assess the number of clusters


Clustering

Clustering. Example with an ibuprofen powder mixture

Data in sample_demo.mat

The sample is a hyperspectral image composed of two compounds, ibuprofen and
starch. They are powders that have been mechanically mixed.

[Figure: mixture image and the pure spectra of starch and ibuprofen, 1200–2000 nm]


Clustering

Clustering. Example with an ibuprofen powder mixture

K-means with 2 clusters


Clustering

Clustering. Example with an ibuprofen powder mixture

K-means with 3 clusters


Clustering

Clustering. Example with an ibuprofen powder mixture

K-means with 4 clusters


Clustering

Clustering. Example with an ibuprofen powder mixture

K-means with 9 clusters


Clustering

Clustering. Example with an ibuprofen powder mixture

FCM with 2 clusters


Clustering

Clustering. Example with the plastics

K-means with 4 clusters


Clustering

Clustering. Example with the plastics

K-means with 5 clusters


Clustering

Clustering. Example with the plastics

FCM with 4 clusters


Classification
Overview


Classification

Classification concept

Classification aims at finding a criterion to assign an object (sample) to one
category (class) based on a set of measurements performed on the object itself.

A category or class is an (ideal) group of objects sharing similar characteristics.

In classification, the categories are defined a priori.

Classification establishes boundaries depending on the criterion selected.

The classification strongly depends on how representative the measured
variables are of the samples.


Classification

Classification concept (inspired by Federico Marini)

Criterion:
Nordic - Italian
Beer - Wine


Classification

Classification methods
Chemometric techniques aimed at finding mathematical models able to
recognize the membership of each sample to its proper class on the basis of a
set of measurements (X).


Classification

Classification methods
Distinctions can be made among classification techniques on the basis of the
mathematical form of the decision boundary, i.e. on their ability to detect
linear or non-linear boundaries.


Classification

Classification methods
Another important distinction can be made between pure classification and
class-modeling methods.


Classification

Classification methods

Classes can be defined in different ways:

- By theoretical knowledge or experimental evidence

- By discretizing a quantitative response, e.g.:
  < 5 → Class A      5.1 – 7 → Class B      ≥ 7.1 → Class C


Classification

Classification methods

Once the classes have been defined, we construct a model:

[Figure: the samples (rows of the data matrix X, I × J, labelled A, B, C) and
their measured variables (columns) are used to build the model Class = f(X)]


Classification

Classification methods

And then we can predict the category of unknown samples:

[Figure: an unknown sample (1 × J) is passed through the model Class = f(X) to
obtain its class, e.g. A]


Classification

Classification methods

Classification uses the class information to find models that associate each
sample to the assigned class → SUPERVISED

Linear Discriminant Analysis

SIMCA

PLS-DA


Classification
K-NN


Classification – K-NN

K-NN
It is the benchmark method for supervised classification based on measuring
distances (analogy – similarity).

Each sample is classified according to the most represented class among its
k nearest samples.

It is a non-linear method that needs class information.
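The rule above (majority vote among the k nearest training samples) can be sketched in a few lines of NumPy; names are illustrative, not the course's code:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    d = np.linalg.norm(X_train - x, axis=1)          # distances to all training samples
    nearest = y_train[np.argsort(d)[:k]]             # labels of the k nearest
    values, counts = np.unique(nearest, return_counts=True)
    return values[counts.argmax()]                   # most represented class
```

A query point near one group of training samples is assigned that group's class.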


Classification
Linear Discriminant Analysis


Classification - LDA

Discriminant Analysis
Separates samples into classes by finding directions which:
 maximize the variance between classes
 minimize the variance within classes

PC1  GOOD
LD1  GOOD


Classification - LDA

Discriminant Analysis
Separates samples into classes by finding directions which:
 maximize the variance between classes
 minimize the variance within classes

PC1  BAD
LD1  GOOD


Classification - LDA

Discriminant Analysis
Separates samples into classes by finding directions which:
 maximize the variance between classes
 minimize the variance within classes


Classification - LDA

Linear discriminant Analysis (LDA)

LDA is a method that separates samples into classes by finding directions which
maximize the variance between classes and minimize the variance within classes.

Two main assumptions about our data:

Each k-class density is a multivariate Gaussian


Classification - LDA

Linear discriminant Analysis (LDA)

LDA is a method that separates samples into classes by finding directions which
maximize the variance between classes and minimize the variance within classes.

Two main assumptions about our data:

All the class covariance matrices are presumed to be identical:

Σ_g = Σ

Classification - LDA

LDA is based on probabilities

It calculates the probability of belonging to each class by applying Bayes'
theorem:

P(g | x) = P(x | g) P(g) / Σ_h P(x | h) P(h)

FOOD, U. Copenhagen, Denmark. www.models.life.ku.dk www.hypertools.org jmar@life.ku.dk

26
19/11/2017

Classification - LDA

LDA is based on probabilities

Once the probability has been calculated, LDA assigns each sample to the
specific class with the minimum discriminant score d_g.

Classification - LDA

LDA benefits and drawbacks

- Easy to use and reliable

- The number of samples must be higher than the number of variables, but this
is not a problem with images if the calibration is made from standard images

- LDA assumes that the data follow a Gaussian distribution

Classification
SIMCA


Classification - SIMCA

SIMCA. Definition

- Originally proposed by Wold in 1976

- SOFT: no assumption about the distribution of the variables is made
(bilinear modeling)

- INDEPENDENT: each category is modeled independently

- MODELING of CLASS ANALOGIES: attention is focused on the similarity between
objects from the same class rather than on differentiating among classes


Classification - SIMCA

SIMCA. Definition

- It is a class-modelling method; thus, it is a supervised method:
  - A training set is needed to construct a model
  - Unknown samples are projected onto the model

- SIMCA is based on building independent PCA models for each class in the
training set, where each class may use a different number of PCs.

- After the independent PCA models are constructed, the unknown samples are
projected onto them.

Classification - SIMCA

SIMCA. Graphical interpretation

1) Individual PCA model for each class. Each class pre-processed independently

[Figure: individual PCA models (PC1, PC2) for each class]

Classification - SIMCA

SIMCA. Graphical interpretation

2) Projection of other classes in the PCA class space of one class

[Figure: samples of other classes projected onto the PCA space (PC1, PC2) of one class]

Classification - SIMCA

SIMCA. Class assignation

There are many implementations to decide whether an unknown sample belongs to
a class or not.

Here we will talk about three variations of the same concept:

Hotelling's T² and the residuals (Q)!

[Figure: PCA class space (PC1, PC2)]


Classification - SIMCA

SIMCA. Class assignation


Strategy 1: Sample competition. One sample, one class

- SIMCA assigns samples to the nearest class.
- Samples are, therefore, always assigned.
- The distance of each sample i from each class g (d_ig) is calculated as

  d_ig = sqrt( (Q_ig / Q_0.95,g)² + (T²_ig / T²_0.95,g)² )

  where:
  - Q_ig and T²_ig are the Q and Hotelling's T² of sample i in the PCA model of class g
  - Q_0.95,g and T²_0.95,g are the 95% confidence limits of class g
FOOD, U. Copenhagen, Denmark. www.models.life.ku.dk www.hypertools.org jmar@life.ku.dk

Classification - SIMCA

SIMCA. Class assignation


Strategy 1: Sample competition. One sample, one class

[Figure: T² vs Q plots for class A and class B, with the 95% limits Q_0.95 and T²_0.95 marked]

Classification - SIMCA

SIMCA. Class assignation


Strategy 2: Conditioning

- SIMCA assigns a sample to the g class if it falls below both class limits,
i.e. Q_ig ≤ Q_0.95,g and T²_ig ≤ T²_0.95,g, which is equivalent to requiring
the sample to lie inside the 95% class space of g.

- Samples can be:
  - unassigned (i.e. outside the class spaces of all classes)
  - classified in more than one class (confused)

FOOD, U. Copenhagen, Denmark. www.models.life.ku.dk www.hypertools.org jmar@life.ku.dk

34
19/11/2017

Classification - SIMCA

SIMCA. Class assignation


Strategy 2: Conditioning

[Figure: T² vs Q plots for class A and class B, with the 95% limits marked]


Classification - SIMCA

SIMCA. Class assignation


Strategy 3: More restricted conditioning

- SIMCA assigns a sample to the g class if the combined distance satisfies d_ig ≤ 1

- Similar approach to the second strategy, but with a stricter acceptance region

- Samples can be:
  - unassigned (i.e. outside the class spaces of all classes)
  - classified in more than one class (confused)


Classification - SIMCA

SIMCA. Class assignation


Strategy 3: More restricted conditioning

[Figure: T² vs Q plots for class A and class B, with the 95% limits marked]


Classification - SIMCA

SIMCA. Class assignation


Example with three samples

[Figure: a sample plotted in the T² vs Q spaces of class A and class B]

Assignment by strategy:  1 → A    2 → A    3 → A


Classification - SIMCA

SIMCA. Class assignation


Example with three samples

[Figure: a sample plotted in the T² vs Q spaces of class A and class B]

Assignment by strategy:  1 → B    2 → A and B    3 → B


Classification - SIMCA

SIMCA. Class assignation


Example with three samples

[Figure: a sample plotted in the T² vs Q spaces of class A and class B]

Assignment by strategy:  1 → A    2 → none    3 → none


Classification - SIMCA

SIMCA. Example
Development of a SIMCA model with the 4 plastics

Calibration set: 655 spectra (between 150 and 200 spectra per class)


Classification - SIMCA

SIMCA. Example
Development of SIMCA model with the 4 plastics

Calibration development:


Classification - SIMCA

SIMCA. Example
Development of SIMCA model with the 4 plastics

Calibration development:


Classification - SIMCA

SIMCA. Example
Development of SIMCA model with the 4 plastics

Prediction:


Classification - SIMCA

SIMCA. Benefits and drawbacks

- Based on PCA
- Single-class modelling
- Each class needs to be perfectly defined in the PCA space
- The number of unassigned samples can be high, depending on the noise
- One sample can belong to more than one class


Classification
PLS-DA


Classification – PLS-DA

PLS-DA. LDA with the covariance of PLS

PLS-DA is based on the same principle as PLS, the covariance between X and Y,
but with discriminant ability.

[Figure: the image cube is unfolded into X, and a PLS-2 model is built against
a dummy matrix D of 0s and 1s]

Classification – PLS-DA

PLS-DA. PLS-2 model

PLS-DA is based on the same principle as PLS, the covariance between X and Y,
but with discriminant ability.

The main difference is that Y is a dummy matrix D containing 0s and 1s.
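Building the dummy matrix D is straightforward; a minimal NumPy sketch (illustrative naming, one 0/1 column per class):

```python
import numpy as np

def dummy_matrix(y):
    """Encode class labels as a 0/1 dummy matrix D for a PLS-2 model."""
    classes = sorted(set(y))
    D = np.zeros((len(y), len(classes)))
    for i, label in enumerate(y):
        D[i, classes.index(label)] = 1.0   # 1 in the column of the sample's class
    return D, classes
```

Each row of D has a single 1, in the column corresponding to that sample's class.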


Classification – PLS-DA

PLS-DA. Responses

The response of PLS-DA is still a number. Therefore, we need to find


rules to convert these numbers into classes


Classification – PLS-DA

PLS-DA. Responses

The response of PLS-DA is still a number. Therefore, we need to find rules to
convert these numbers into classes.

Bayes' theorem (as in LDA):

1) It assumes that the predicted values follow a normal distribution.

2) The threshold is selected where the number of false positives and false
negatives is minimized.
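The threshold-selection idea in point 2 can be sketched by a simple scan over candidate thresholds, keeping the one with the fewest false positives plus false negatives (an illustrative sketch, not the probabilistic Bayes implementation the slides refer to):

```python
import numpy as np

def best_threshold(scores, is_class):
    """Scan candidate thresholds; keep the one minimizing FP + FN."""
    scores, is_class = np.asarray(scores, float), np.asarray(is_class, bool)
    best_t, best_err = None, np.inf
    for t in np.unique(scores):
        pred = scores >= t                                        # predicted "in class"
        err = np.sum(pred & ~is_class) + np.sum(~pred & is_class)  # FP + FN
        if err < best_err:
            best_t, best_err = float(t), int(err)
    return best_t, best_err
```

For well-separated predicted values the scan finds a threshold with zero misclassifications.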


Classification – PLS-DA

PLS-DA. Responses

The response of PLS-DA is still a number. Therefore, we need to find rules to
convert these numbers into classes.

Bayes' theorem (as in LDA).

The rest works as in PLS (and SIMCA, LDA, K-NN, … any training-based method):

- Cross-validation
- Number of LVs
- Etc.


Classification
Assessment of classification
models


Classification – Assessment

Confusion matrix

FOOD, U. Copenhagen, Denmark. www.models.life.ku.dk www.hypertools.org jmar@life.ku.dk

Classification – Assessment

Classifiers

TP  True positive
FP  False positive
FN  False negative
TN  True negative


Classification – Assessment

Classifiers

From these counts, figures of merit such as sensitivity, Sn = TP / (TP + FN),
and specificity, Sp = TN / (TN + FP), are derived.


Classification – Assessment

Classifiers
Receiver Operating Characteristic (ROC) curves

A ROC curve is a graphical plot of Sp and Sn on the X and Y axes, respectively,
for a binary classification system as its discrimination threshold is changed.
ROC curves are used to estimate the best classification score.
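Sweeping the threshold and collecting the operating points can be sketched as follows (a NumPy sketch with my own naming; here I use the common 1 − Sp vs Sn convention for the axes, whereas the slide plots Sp directly):

```python
import numpy as np

def roc_points(scores, is_class):
    """Sweep the decision threshold and collect (1 - Sp, Sn) operating points."""
    scores, is_class = np.asarray(scores, float), np.asarray(is_class, bool)
    pts = []
    for t in np.unique(scores):
        pred = scores >= t
        sn = np.sum(pred & is_class) / np.sum(is_class)        # sensitivity
        sp = np.sum(~pred & ~is_class) / np.sum(~is_class)     # specificity
        pts.append((1 - sp, sn))
    return pts
```

Perfectly separable scores yield an operating point at (0, 1), the ideal corner of the ROC plot.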


Classification – PLS-DA

PLS-DA. Example
Development of PLS-DA model with the 4 plastics

Calibration set: 655 spectra (between 150 and 200 spectra per class)


Classification – PLS-DA

PLS-DA. Example
Development of a PLS-DA model with the 4 plastics

Calibration development:


Classification – PLS-DA

PLS-DA. Example
Development of a PLS-DA model with the 4 plastics

Calibration development:


Classification – PLS-DA

PLS-DA. Example
Development of a PLS-DA model with the 4 plastics

Visualization of the prediction:


Classification – PLS-DA

Comparison SIMCA – PLS-DA

[Figure: prediction maps for SIMCA and PLS-DA side by side]


Classification – PLS-DA

PLS-DA. Benefits and drawbacks

- Based on the covariance between X and Y

- More reliable than SIMCA

- Does not allow single-class modelling

- One sample can belong to more than one class
