Segmentation and Classification: FOOD, U. Copenhagen, Denmark. www.models.life.ku.dk
Segmentation and Classification
Similarity
19/11/2017
Similarity:
The mathematical transposition of the concept of analogy. We use analogy at every moment of our lives for pattern recognition, i.e. to recognize, to distinguish and to classify.
Distances:
The starting point for evaluating similarity: close samples are considered similar, distant samples are considered dissimilar.
[Figure: similarity, distance, angle θ]
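The following is a minimal Python/NumPy sketch of two common ways to turn two spectra into a measure of (dis)similarity: the Euclidean distance, and the angle θ between the two sample vectors. Reading the θ in the figure as this angle is an assumption, and the function names are hypothetical.

```python
import numpy as np

def euclidean_distance(x, y):
    """Euclidean distance between two samples: small distance -> similar samples."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.linalg.norm(x - y))

def vector_angle(x, y):
    """Angle theta (radians) between two samples seen as vectors: small angle -> similar shape."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    # clip guards against rounding pushing the cosine slightly outside [-1, 1]
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

a = np.array([1.0, 2.0, 3.0])
b = 2 * a                          # same spectral shape, twice the intensity
print(euclidean_distance(a, b))    # about 3.742: far apart in distance
print(vector_angle(a, b))          # close to 0: identical shape
```

The example shows why the choice of measure matters. Two spectra that differ only in intensity are far apart by distance, yet the angle between them is (almost) zero.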
Clusters:
Cluster methods search for groups (clusters) in the data on the basis of distances. They are UNSUPERVISED: no class information is used.
Classification:
Classification methods use the class information (supervised): they separate classes, and their goal is to find models able to correctly assign each sample to its proper class.
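[Figure: taxonomy of similarity-based methods. Classification methods are divided into pure classification and class-modelling methods, and into linear and non-linear methods; methods shown: LDA, QDA, K-NN, SIMCA, PLSDA and its variations, ANN, SVM]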
Clustering
Clustering families
2. Partitional methods
K-means clustering
K-means assigns each pixel x_mn of the image to the k-th cluster, the one whose center is nearest. In doing so it minimizes the sum of the squared distances between each pixel and its corresponding cluster center.
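Below is a minimal NumPy sketch of k-means applied to an unfolded image. The cube (rows × columns × bands) is reshaped to pixels × bands and clustered, and the labels are then folded back into a map. The farthest-point initialisation, the iteration limit and the synthetic image are assumptions made to keep the example compact and reproducible. Note that k-means itself only guarantees a local minimum of the sum of squared distances.

```python
import numpy as np

def _nearest(X, centers):
    # squared Euclidean distance of every pixel to every center -> index of the nearest
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means on unfolded pixels X (n_pixels x n_bands)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # farthest-point initialisation: first center random, each next one is the
    # pixel farthest from the centers chosen so far
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = _nearest(X, centers)
        # each center moves to the mean of its pixels (kept in place if the cluster is empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    labels = _nearest(X, centers)
    # the quantity k-means minimises: sum of squared pixel-to-center distances
    sse = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, sse

# usage on a hypothetical image cube of shape (rows, cols, bands)
rng = np.random.default_rng(1)
image = np.zeros((10, 10, 3))
image[:, 5:, :] = 10.0                      # right half has a different "spectrum"
image += rng.normal(0.0, 0.1, image.shape)
labels, centers, sse = kmeans(image.reshape(-1, 3), k=2)   # unfold to pixels x bands
label_map = labels.reshape(10, 10)          # fold back into a cluster map
```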
[Figure panels: PCA; Silhouette]
Silhouette
The silhouette is calculated for each pixel x_mn. It measures how similar a point is to the points in its own cluster compared with the points in the other clusters.
Drawback: it is EXTREMELY SLOW to compute.
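Here is a minimal NumPy sketch that assumes the standard silhouette definition, since the text does not reproduce the formula. For a sample i, a is its mean distance to the other members of its own cluster, and b is its mean distance to the members of the nearest other cluster. The silhouette is then s = (b - a) / max(a, b), which lies between -1 and 1. The function name is hypothetical, and at least two clusters are required.

```python
import numpy as np

def silhouette(X, labels):
    """Silhouette value s(i) = (b - a) / max(a, b) for every sample (pixel)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # full n x n distance matrix: this is what makes the silhouette slow on images
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue                               # singleton cluster: s(i) left at 0
        a = D[i, own].sum() / (own.sum() - 1)      # mean distance to the rest of its cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return s

X = np.array([[0, 0], [0, 1], [10, 0], [10, 1]])
print(silhouette(X, [0, 0, 1, 1]))   # all close to 1: well separated clusters
print(silhouette(X, [0, 1, 0, 1]))   # negative values: poor clustering
```

For an image with 100 000 pixels, D would hold 10^10 distances, about 80 GB in double precision. This is why the silhouette is so slow on images. In practice it can instead be computed on a random subset of pixels.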
[Figure: silhouette plots for 2 clusters and for 3 clusters]
For each cluster, it measures the influence of the rest of the clusters by comparing:
- the mean distance between the members of the cluster and its centroid
- the distance between the centroids of the other clusters and the cluster
The influence of cluster A on B is not the same as the influence of cluster B on A.
[Figure: clusters A and B]
Resolving mixtures
Extremely sensitive to noise and outliers. Tends to converge to local minima.
Fuzzy clustering
Fuzzy c-means (FCM) allows pixels at the edge of a cluster to belong to it to a lesser degree than pixels in the middle of the cluster.
A membership coefficient u_mnk is calculated for each pixel x_mn and each cluster k. Every coefficient lies between 0 and 1, and the coefficients of each pixel sum to 1.
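This is a minimal NumPy sketch of fuzzy c-means. It assumes the usual Bezdek update equations with fuzzifier m = 2, since the text does not give the formulas; the function name and parameters are hypothetical. Each row of U holds the membership coefficients of one pixel.

```python
import numpy as np

def fuzzy_cmeans(X, k, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means. Returns U (n_pixels x k memberships) and the cluster centers."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), k))
    U /= U.sum(axis=1, keepdims=True)            # each pixel's coefficients sum to 1
    for _ in range(n_iter):
        W = U ** m                                # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
        d = np.fmax(d, 1e-12)                     # avoid dividing by zero on a center
        # u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

X = np.array([[-1.0], [0.0], [1.0], [5.0], [9.0], [10.0], [11.0]])
U, centers = fuzzy_cmeans(X, k=2)
print(U.round(2))   # the pixel at 5 (between the groups) gets about 0.5 / 0.5
```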
Partition entropy
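The slides give only the title. Assuming the usual definition of partition entropy for a fuzzy partition, PE = -(1/N) Σ_i Σ_k u_ik log u_ik, it can be computed from the membership matrix U (N pixels × K clusters). PE is 0 for a crisp partition and reaches log K when every pixel belongs equally to all clusters, so lower values indicate a clearer partition. A minimal sketch follows; the function name is hypothetical.

```python
import numpy as np

def partition_entropy(U, eps=1e-12):
    """PE = -(1/N) * sum_i sum_k u_ik * log(u_ik); 0 * log(0) is treated as 0."""
    U = np.asarray(U, dtype=float)
    # clipping only avoids log(0); terms with u_ik = 0 still contribute 0
    return float(-(U * np.log(np.clip(U, eps, 1.0))).sum() / U.shape[0])

crisp = np.eye(3)                      # each pixel fully in one cluster
uniform = np.full((4, 3), 1.0 / 3.0)   # each pixel equally in all 3 clusters
print(partition_entropy(crisp))        # 0: clear partition
print(partition_entropy(uniform))      # log(3): most ambiguous partition
```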
Data in sample_demo.mat
[Figure: pure spectra of starch and ibuprofen, and the spectrum of their mixture; x-axis from 1200 to 2000]
Classification
Overview
Classification concept
Criterion (examples): Nordic vs. Italian; beer vs. wine
Classification methods
Chemometric techniques aimed at finding mathematical models able to
recognize the membership of each sample to its proper class on the basis of a
set of measurements (X).
Classification techniques can be distinguished by the mathematical form of their decision boundary, i.e. by their ability to detect linear or non-linear boundaries.
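Another important distinction is between pure classification methods and class-modeling methods. A pure classification method always assigns a sample to one of the predefined classes. A class-modeling method instead builds a separate model for each class, so a sample can be accepted by one class, by several, or by none.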
[Figure: training set X (I x J), with samples in rows and variables in columns, each sample labelled with its class (A, B or C); the model is Class = f(X)]
[Figure: the model Class = f(X) is applied to an unknown sample (1 x J), which is assigned to class A]
Classification uses the class information to find models that associate each sample with its assigned class: it is SUPERVISED.
SIMCA
PLS-DA
Classification
K-NN
Classification – K-NN
K-NN is the benchmark method for supervised classification based on measuring distances (analogy, similarity).
Each sample is assigned to the class most represented among its k nearest samples.
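Below is a minimal NumPy sketch of K-NN, assuming Euclidean distance and a simple majority vote (other distances and weighted votes are also used). The function name is hypothetical. Ties go to the class met first among the sorted neighbours, because `Counter.most_common` keeps first-encountered order for equal counts.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_new, k=3):
    """Assign each new sample to the most represented class among its k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for x in np.asarray(X_new, dtype=float):
        d = np.linalg.norm(X_train - x, axis=1)      # distance to every training sample
        nearest = np.argsort(d)[:k]                  # indices of the k nearest samples
        votes = Counter(y_train[nearest].tolist())   # class counts among the neighbours
        preds.append(votes.most_common(1)[0][0])     # most represented class
    return np.array(preds)

X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X_train, y_train, [[0.2, 0.2], [5.5, 5.5]]))
```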
Classification
Linear Discriminant Analysis
Classification - LDA
Discriminant Analysis
Separates samples into classes by finding directions which:
maximize the variance between classes
minimize the variance within classes
[Figure: both PC1 and LD1 separate the classes (PC1 GOOD, LD1 GOOD)]
[Figure: PC1 fails to separate the classes while LD1 separates them (PC1 BAD, LD1 GOOD)]
LDA separates samples into classes by finding directions that maximize the variance between classes and minimize the variance within classes.
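The following is a minimal NumPy sketch of the two-class case. It assumes Fisher's formulation, in which the direction is proportional to Sw^-1 (m1 - m2); the multi-class version solves a generalized eigenproblem instead. The example reproduces the "PC1 BAD, LD1 GOOD" situation: two classes elongated along x and separated along y. PC1 follows the large common spread, while the LDA direction follows the separation. Function names are hypothetical.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Two-class LDA (Fisher) direction w, proportional to Sw^-1 (m1 - m2)."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # within-class scatter: the spread LDA wants to keep small
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(Sw, m1 - m2)
    return w / np.linalg.norm(w)

def first_pc(X):
    """Direction of maximum total variance (PC1), ignoring the classes."""
    Xc = np.asarray(X, float) - np.mean(X, axis=0)
    return np.linalg.svd(Xc, full_matrices=False)[2][0]

# two classes elongated along x but separated along y
rng = np.random.default_rng(0)
X1 = np.column_stack([rng.normal(0, 10, 50), rng.normal(0, 0.2, 50)])
X2 = np.column_stack([rng.normal(0, 10, 50), rng.normal(2, 0.2, 50)])
print(first_pc(np.vstack([X1, X2])))   # close to +/-[1, 0]: along the spread
print(fisher_direction(X1, X2))        # close to +/-[0, 1]: along the separation
```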
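S_g = S: the covariance matrix S_g of every class g is assumed equal to a common (pooled) covariance matrix S.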
FOOD, U. Copenhagen, Denmark. www.models.life.ku.dk www.hypertools.org jmar@life.ku.dk
Once the probabilities have been calculated, LDA assigns each sample to the class with the minimum discriminant score d_g.
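The slides do not reproduce the expression for d_g. A common form consistent with the minimum-score rule and with a shared covariance matrix S is d_g(x) = (x - m_g)^T S^-1 (x - m_g) - 2 ln π_g, where m_g is the mean and π_g the prior probability of class g. Under the Gaussian assumption, the posterior probability of class g is then proportional to exp(-d_g / 2). The sketch below implements that assumed form with NumPy; the function names are hypothetical.

```python
import numpy as np

def lda_fit(X, y):
    """Class means, priors and the inverse pooled covariance (S_g = S for all g)."""
    X, y = np.asarray(X, float), np.asarray(y)
    classes = np.unique(y)
    means = np.array([X[y == g].mean(axis=0) for g in classes])
    priors = np.array([(y == g).mean() for g in classes])
    resid = np.vstack([X[y == g] - means[i] for i, g in enumerate(classes)])
    S = resid.T @ resid / (len(X) - len(classes))      # pooled within-class covariance
    return classes, means, priors, np.linalg.inv(S)    # needs more samples than variables

def lda_scores(model, X_new):
    """Discriminant scores d_g and posterior probabilities for each sample."""
    classes, means, priors, S_inv = model
    X_new = np.atleast_2d(np.asarray(X_new, float))
    diff = X_new[:, None, :] - means[None, :, :]              # (n, G, J)
    maha = np.einsum("ngj,jk,ngk->ng", diff, S_inv, diff)     # squared Mahalanobis distances
    d = maha - 2.0 * np.log(priors)[None, :]                  # discriminant scores d_g
    z = -0.5 * (d - d.min(axis=1, keepdims=True))             # shift for numerical stability
    post = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)   # p(g | x) proportional to exp(-d_g / 2)
    return d, post

def lda_predict(model, X_new):
    d, _ = lda_scores(model, X_new)
    return model[0][d.argmin(axis=1)]                         # class with minimum d_g

X = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5], [5, 6], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = lda_fit(X, y)
print(lda_predict(model, [[0.2, 0.3], [5.8, 5.9]]))
```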
The number of samples must be higher than the number of variables; otherwise the pooled covariance matrix cannot be inverted. This is not a problem with images if the calibration is made from standard images, since every pixel is a sample.
LDA assumes that the data follow a Gaussian distribution.
Classification
SIMCA
Classification - SIMCA
SIMCA. Definition
• SIMCA builds an independent PCA model for each class in the training set; each class may have a different number of PCs.
• Once the independent PCA models are built, the unknown samples are projected onto them.
1) Individual PCA model for each class. Each class is pre-processed independently.
[Figure: individual PCA models (PC1, PC2) of each class]
[Figure: an unknown sample (?) projected onto a class model, with its residuals and Hotelling T2]
• where:
• Q_ig and T2_ig are the Q residuals and Hotelling's T2 of sample i, calculated in the PCA model of class g.
• Q_0.95,g and T2_0.95,g are the 95% confidence limits of class g.
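Here is a minimal NumPy sketch of SIMCA under stated assumptions. One PCA model is fitted per class, and T2 and Q are computed for a new sample on each model. A class accepts the sample when both statistics are below its limits. Two simplifications are worth flagging. First, the limits here are empirical 95th percentiles of the training values, not the F or chi-square approximations often used. Second, the combined criterion that follows "where:" above is not reproduced. The class and function names are hypothetical.

```python
import numpy as np

class PCAClassModel:
    """PCA model of one class with Hotelling's T2 and Q limits (hypothetical helper)."""

    def __init__(self, X, n_pcs, alpha=0.95):
        X = np.asarray(X, float)
        self.mean = X.mean(axis=0)                   # each class is centred on its own mean
        _, s, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.P = Vt[:n_pcs].T                        # loadings, variables x PCs
        self.lam = s[:n_pcs] ** 2 / (len(X) - 1)     # variance of each score vector
        T2, Q = self.stats(X)
        # assumption: empirical percentiles of the training values as limits
        self.T2_lim = np.quantile(T2, alpha)
        self.Q_lim = np.quantile(Q, alpha)

    def stats(self, X):
        Xc = np.atleast_2d(np.asarray(X, float)) - self.mean
        T = Xc @ self.P                              # scores on the class model
        T2 = (T ** 2 / self.lam).sum(axis=1)         # Hotelling's T2
        E = Xc - T @ self.P.T                        # residuals (not explained by the PCs)
        return T2, (E ** 2).sum(axis=1)              # T2 and Q

    def accepts(self, X):
        T2, Q = self.stats(X)
        return (T2 <= self.T2_lim) & (Q <= self.Q_lim)

def simca_assign(models, x):
    """Classes whose model accepts x: one, several (e.g. A and B) or none."""
    return [g for g, m in models.items() if m.accepts(x)[0]]

rng = np.random.default_rng(0)
XA = rng.normal(size=(50, 3)) * [3.0, 1.0, 0.5]
XB = rng.normal(size=(50, 3)) * [3.0, 1.0, 0.5] + 10.0
models = {"A": PCAClassModel(XA, n_pcs=1), "B": PCAClassModel(XB, n_pcs=1)}
print(simca_assign(models, XA.mean(axis=0)))        # ['A']
print(simca_assign(models, [50.0, -50.0, 50.0]))    # []: rejected by both classes
```

Because each class is modelled separately, a sample can end up in one class, in several, or in none. This is the class-modeling behaviour behind the "AB" and "None" outcomes.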
[Figure: Q vs. T2 plots of the Class A and Class B models, with the limits Q0.95,A, T20.95,A, Q0.95,B and T20.95,B]
Assignment of the sample shown, by strategy:
Strategy     1    2     3
Example 1    A    A     A
Example 2    B    AB    B
Example 3    A    None  None
SIMCA. Example
Development of a SIMCA model for the 4 plastics
Calibration set: 655 spectra (150 to 200 spectra per class)
Calibration development:
Prediction:
SIMCA is based on PCA.
Classification
PLS-DA
Classification – PLS-DA
Unfolding
[Figure: the unfolded image data and a PLS-2 model relating them to a dummy matrix D of 0s and 1s that codes class membership]
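The following is a minimal NumPy sketch of PLS-DA under stated assumptions. The class labels are coded in a dummy matrix D, with one column per class, 1 for members and 0 otherwise. A PLS-2 model is fitted with the NIPALS algorithm, and each sample is assigned to the class with the largest predicted response. The argmax rule is only one of several assignment rules; thresholds on each response are also common. All names are hypothetical. For an image, the cube is first unfolded to pixels × bands.

```python
import numpy as np

def dummy_matrix(y):
    """Dummy matrix D: one column per class, 1 if the sample belongs to it, else 0."""
    classes = np.unique(y)
    D = (np.asarray(y)[:, None] == classes[None, :]).astype(float)
    return D, classes

def pls2_fit(X, Y, n_lv, n_iter=500, tol=1e-10):
    """NIPALS PLS-2 on mean-centred X and Y; returns means and regression coefficients."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        u = F[:, [int(np.argmax(F.var(axis=0)))]]   # start from the Y column with most variance
        for _ in range(n_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)                  # X weights
            t = E @ w                               # X scores
            q = F.T @ t / (t.T @ t)                 # Y loadings
            u_new = F @ q / (q.T @ q)               # Y scores
            done = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if done:
                break
        p = E.T @ t / (t.T @ t)                     # X loadings
        E = E - t @ p.T                             # deflate X
        F = F - t @ q.T                             # deflate Y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    B = W @ np.linalg.inv(P.T @ W) @ Q.T            # coefficients, variables x classes
    return x_mean, y_mean, B

def plsda_predict(model, classes, X_new):
    x_mean, y_mean, B = model
    Y_hat = (np.asarray(X_new, float) - x_mean) @ B + y_mean   # predicted responses
    return classes[np.argmax(Y_hat, axis=1)]        # class with the largest response

rng = np.random.default_rng(0)
centres = np.array([[5, 0, 0, 0, 0], [0, 5, 0, 0, 0], [0, 0, 5, 0, 0]], float)
labels = np.repeat([0, 1, 2], 20)
X = centres[labels] + rng.normal(0.0, 0.3, (60, 5))
D, classes = dummy_matrix(labels)
model = pls2_fit(X, D, n_lv=2)
pred = plsda_predict(model, classes, X)
```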
PLS-DA. Responses
The rest works as in PLS (and as in SIMCA, LDA, K-NN or any other training-based method):
Cross-validation
Number of LVs
Etc.
Classification
Assessment of classification models
Classification – Assessment
Confusion matrix
Classifiers
TP True positive
FP False positive
FN False negative
TN True negative
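This is a minimal NumPy sketch of a confusion matrix together with the four counts for one class treated as positive. It also computes sensitivity TP/(TP+FN) and specificity TN/(TN+FP), two standard figures derived from those counts. The layout, with rows as true classes and columns as predicted classes, is one common convention and an assumption here. The function names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, classes):
    """Rows: true class, columns: predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    M = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        M[index[t], index[p]] += 1
    return M

def counts_for_class(y_true, y_pred, positive):
    """TP, FP, FN, TN when `positive` is treated as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))
    return tp, fp, fn, tn

def sensitivity_specificity(tp, fp, fn, tn):
    return tp / (tp + fn), tn / (tn + fp)

y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "C", "C"]
print(confusion_matrix(y_true, y_pred, ["A", "B", "C"]))
tp, fp, fn, tn = counts_for_class(y_true, y_pred, "A")   # (2, 0, 1, 3)
print(sensitivity_specificity(tp, fp, fn, tn))
```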
Receiver Operating Characteristic (ROC) curves
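Below is a minimal NumPy sketch of an ROC curve for one class. A continuous score (for instance a predicted PLS-DA response, assumed here for illustration) is thresholded at every observed value. For each threshold, the true positive rate TP/(TP+FN) is plotted against the false positive rate FP/(FP+TN). The area under the curve (AUC) is then obtained with the trapezoidal rule. Function names are hypothetical.

```python
import numpy as np

def roc_curve(scores, is_positive):
    """FPR and TPR for every threshold, from strictest (nothing positive) to loosest."""
    scores = np.asarray(scores, float)
    is_positive = np.asarray(is_positive, bool)
    thresholds = np.r_[np.inf, np.unique(scores)[::-1]]
    n_pos, n_neg = is_positive.sum(), (~is_positive).sum()
    fpr, tpr = [], []
    for thr in thresholds:
        pred = scores >= thr                               # predicted positive at this threshold
        tpr.append((pred & is_positive).sum() / n_pos)     # TP / (TP + FN)
        fpr.append((pred & ~is_positive).sum() / n_neg)    # FP / (FP + TN)
    return np.array(fpr), np.array(tpr), thresholds

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

fpr, tpr, _ = roc_curve([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
print(auc(fpr, tpr))   # 1.0: every positive scores above every negative
```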
PLS-DA. Example
Development of a PLS-DA model for the 4 plastics
Calibration set: 655 spectra (150 to 200 spectra per class)
Calibration development:
SIMCA vs. PLS-DA