
# Segmentation/Classification

Outline

1. Unsupervised segmentation
   1.1. K-NN clustering
   1.2. Fuzzy clustering
2. Linear classification methods
   2.1. LDA
   2.2. SIMCA
   2.3. PLS-DA
3. Assessing the classification models / Validation

## Introduction

Classification concept:

Classification aims at finding a criterion to assign an object (sample) to one category (class) based on a set of selected characteristics (measurements) performed on the object itself.
Example criteria (inspired by Federico Marini): Nordic vs. Italian; Beer vs. Wine.
The classification strongly depends on the representativeness of the measured samples. The data are arranged in a matrix X (I x J): I samples (rows) by J variables (columns).

## Classification methods

Chemometric techniques aimed at finding mathematical models able to recognize the membership of each sample to its proper class on the basis of a set of measurements (X).

Distinctions can be made among classification techniques on the basis of the mathematical form of the decision boundary, i.e. on their ability to detect linear or non-linear boundaries.

Another important distinction can be made between pure classification and class-modelling methods.

First, a model is built from a training set X (I x J) whose samples have known class labels (A, B, C):

MODEL: Class = f(X)

Then we can predict the category of unknown samples: the model Class = f(X) is applied to each unknown sample (1 x J).

Similarity:
The mathematical transposition of the concept of analogy. We use analogy at every moment of our lives for pattern recognition, i.e. to recognize, to distinguish, to classify.

Distances:
The starting point for evaluating similarity: close samples are considered similar, far samples are considered dissimilar.

## Clustering

Cluster methods search for the presence of groups (clusters) in the data, based on distances. UNSUPERVISED.

## Do not confuse clustering and classification

Clustering methods search for the presence of groups (clusters) in the data. They are unsupervised and based on the calculation of distance measures.

Classification methods use the class information (supervised): they separate classes, and their goal is to find models able to correctly assign each sample to its proper class.

## Pattern recognition methods

- Unsupervised pattern recognition: PCA, clustering
- Classification (supervised):
  - Pure classification: linear (LDA) and non-linear (QDA, K-NN)
  - Class-modelling: linear (SIMCA, PLS-DA) and non-linear (variations of PLS-DA, ANN, SVM)
## Segmentation

1. Supervised segmentation techniques (a priori knowledge): classification methods
   - LDA
   - SIMCA, PLS-DA (class-modelling)
2. Unsupervised segmentation techniques (not classification methods)
   - PCA
   - Clustering
## Distances, similarity and clustering

Similarity: it is the mathematical word for analogy.

Distances: the distances are the starting points for evaluating similarity:
- close samples: similar samples
- far samples: dissimilar samples

The distance between any spectrum of the image x_mn and a centroid c, both in d dimensions, can be measured as the Euclidean distance:

$$d(\mathbf{x}_{mn}, \mathbf{c}) = \sqrt{\sum_{j=1}^{d} \left( x_{mn,j} - c_j \right)^2}$$
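As a minimal illustration (not from the slides; array names and sizes are hypothetical), this distance can be computed for every pixel of an unfolded image with NumPy:

```python
import numpy as np

# Hypothetical unfolded hyperspectral image: one row per pixel spectrum,
# one column per wavelength (d = 100 dimensions).
X = np.random.rand(80 * 60, 100)

# Centroid: here simply the mean spectrum of all pixels.
c = X.mean(axis=0)

# Euclidean distance of every pixel spectrum to the centroid.
d = np.sqrt(((X - c) ** 2).sum(axis=1))
print(d.shape)  # (4800,): one distance per pixel
```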

## Clustering

Cluster methods search for the presence of groups (clusters) in the data, based on distances. UNSUPERVISED.

Two main families of methods:

1. Agglomerative or hierarchical methods: each pixel is initially considered as a class, and the number of classes (clusters) is progressively decreased. Many methods exist to solve the problem.

2. Partitional methods: the number of clusters must be selected beforehand, but this selection is not easy. Two main variants:
   - Hard clustering: K-NN (K-means)
   - Fuzzy clustering: FCM
## K-NN

K-NN is the benchmark method for unsupervised classification based on measuring distances (analogy, similarity). Each sample is classified on the basis of the most represented class among its k nearest samples.

## K-means (KM), a variation of K-NN

The KM algorithm assigns each pixel x_mn of the image to the k-th cluster, whose center is nearest, by minimizing the sum of the squared distances of each pixel to its corresponding center:

$$J = \sum_{k=1}^{K} \sum_{\mathbf{x}_{mn} \in C_k} \lVert \mathbf{x}_{mn} - \mathbf{m}_k \rVert^2$$

How does it work?

1. Choose the number of clusters k
2. Initialize the k cluster centers
3. Assign each pixel to the cluster whose center is nearest
4. Recompute each center as the mean of the pixels assigned to it
5. Repeat steps 3 and 4 until a convergence criterion is met

A minimal sketch of this loop is shown below.
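A minimal NumPy sketch of the K-means loop, assuming an unfolded image X (pixels x wavelengths); variable names are illustrative:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means for an unfolded image X of shape (n_pixels, n_channels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # step 2: random init
    for _ in range(n_iter):
        # Step 3: assign each pixel to the nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each center as the mean of its pixels
        # (empty clusters are not handled in this sketch).
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop when the centers no longer move.
        if np.allclose(new_centers, centers):
            return labels, new_centers
        centers = new_centers
    return labels, centers

labels, centers = kmeans(np.random.rand(4800, 100), k=3)  # hypothetical image
```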

Benefits: simplicity.

Drawbacks:
- Risk of converging to a local minimum in the iterations
- The number of clusters must be chosen in advance; the silhouette index can guide this choice

Silhouette index

Calculated for each pixel x_mn, it offers a measure of the similarity between points in the same cluster compared with points in other clusters:

$$(S_{mn})_k = \frac{b_{mn} - a_{mn}}{\max(a_{mn}, b_{mn})}$$

- a_mn: average distance between the mn-th pixel and all the pixels included in the same cluster
- b_mn: minimum average distance between the mn-th pixel and the pixels included in the other clusters
- (S_mn)_k close to 1: correct classification
- (S_mn)_k with a negative value: misclassification

EXTREMELY SLOW to compute!
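As an illustration (assuming scikit-learn is available; the dataset here is a random placeholder), the average silhouette can be used to compare cluster counts:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.rand(2000, 100)  # placeholder for an unfolded image

# Compare cluster counts by their average silhouette value.
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, silhouette_score(X, labels))  # closer to 1 is better
```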

## Exercise 0701: Applying K-NN (K-means)

- Dataset: sample_demo.mat

[Figure: silhouette plots (silhouette value per cluster) for 2, 3 and 4 clusters.]

[Figure: segmentation surfaces for 2, 3 and 4 clusters.]

Centroids

[Figure: cluster centroids (intensity vs. wavelength, 1200-2000 nm) for 2 to 7 clusters.]

Pure spectra

[Figure: pure spectra (1200-2000 nm).]
## Fuzzy Clustering

Each pixel is assigned a fractional degree of membership to all the clusters simultaneously, rather than belonging completely to one cluster (as in KM clustering).

FCM allows pixels at the edge of a cluster to belong to it to a lesser degree than pixels in the middle of the cluster.

A membership coefficient u_mnk is calculated for each pixel x_mn in such a way that each coefficient is comprised between 0 and 1, and the sum of the coefficients over the clusters equals 1:

$$u_{mnk} = \frac{1}{\sum_{j=1}^{K} \left( \frac{\lVert \mathbf{x}_{mn} - \mathbf{m}_k \rVert}{\lVert \mathbf{x}_{mn} - \mathbf{m}_j \rVert} \right)^{2/(g-1)}}, \qquad 0 \le u_{mnk} \le 1, \qquad \sum_{k=1}^{K} u_{mnk} = 1$$

g is the so-called fuzzifier constant, which determines the fuzziness of the clustering result. A good value is g = 2, for which the coefficients are linearly normalized to make their sum 1.
and the cluster center m_k is calculated as the weighted mean:

$$\mathbf{m}_k = \frac{\sum_{m,n} u_{mnk}^{\,g}\, \mathbf{x}_{mn}}{\sum_{m,n} u_{mnk}^{\,g}}$$
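A compact NumPy sketch of one FCM iteration under these definitions (g = 2; array names are illustrative):

```python
import numpy as np

def fcm_step(X, centers, g=2.0):
    """One fuzzy C-means update: memberships, then weighted-mean centers."""
    # Distances of every pixel (rows of X) to every center: (n_pixels, K).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(g-1)).
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (g - 1.0))
    U = 1.0 / ratio.sum(axis=2)                     # each row sums to 1
    # Center update: weighted mean with weights u^g.
    W = U ** g
    centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, centers

X = np.random.rand(1000, 50)                        # hypothetical unfolded image
centers = X[np.random.default_rng(0).choice(1000, 3, replace=False)]
for _ in range(50):
    U, centers = fcm_step(X, centers)
```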

Benefits: each pixel is assigned a degree of belonging to every cluster.

Drawbacks:
- Risk of converging to a local minimum in the iterations (as in KM)
- The number of clusters must be chosen in advance; the partition entropy can guide this choice

Partition Entropy

$$PE = -\frac{1}{N} \sum_{m,n} \sum_{k=1}^{K} u_{mnk} \log(u_{mnk})$$

where N is the number of pixels. Values close to 0 indicate a good estimation of the number of clusters, whereas PE values close to log K indicate that the number of clusters does not reflect the real structure of the image.
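Given a membership matrix U such as the one from the FCM sketch above, PE can be computed directly (illustrative):

```python
import numpy as np

def partition_entropy(U):
    """PE of a membership matrix U (n_pixels, K); near 0 = crisp partition."""
    return -np.mean(np.sum(U * np.log(U + 1e-12), axis=1))
```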

[Figure: segmentation maps obtained with K-means and with FCM.]

## Exercise 0702: Apply K-means and FCM to both datasets

- Dataset: sample_demo.mat
- Dataset: brunel.mat

## Exercise 0703: Apply K-means to the fluorescence and plastics datasets
## Classification

Classification uses the class information to find models that associate each sample with its assigned class. SUPERVISED.

Methods covered:
- Linear Discriminant Analysis (LDA)
- SIMCA
- PLS-DA
## Linear Discriminant Analysis (LDA)

Discriminant Analysis separates samples into classes by finding directions which:
- maximize the variance between classes
- minimize the variance within classes

[Figure: projections of two classes onto PC1 and onto LD1.]
LDA separates samples into classes by finding directions which maximize the variance between classes and minimize the variance within classes.

Assumptions:
- Each k-class density is a multivariate Gaussian
- All the class covariance matrices are presumed to be identical: S_g = S

LDA then calculates the probability of belonging to each class by applying Bayes' theorem:

$$P(g \mid \mathbf{x}_i) = \frac{f(\mathbf{x}_i \mid g)\, \pi_g}{\sum_{h=1}^{G} f(\mathbf{x}_i \mid h)\, \pi_h}$$

where pi_g is the prior probability of class g and f(x | g) is the Gaussian density of class g.

Once the probability has been calculated, LDA assigns each sample to the class with the minimum discriminant score d_g.
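As a sketch (the exact expression may differ from the course's), a common form of the discriminant score under the equal-covariance assumption is

$$d_g(\mathbf{x}_i) = (\mathbf{x}_i - \bar{\mathbf{x}}_g)^{\mathsf{T}} \mathbf{S}^{-1} (\mathbf{x}_i - \bar{\mathbf{x}}_g) - 2 \ln \pi_g$$

where the first term is the Mahalanobis distance of sample i from the centroid of class g; the sample is assigned to the class minimizing d_g.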

Benefits: it is easy to use and robust.

Drawbacks:
1. The number of samples must be higher than the number of variables (not a real problem with images).
2. LDA assumes a Gaussian distribution of the data.
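For illustration (assuming scikit-learn; the data here are random placeholders), LDA can be fitted and used for prediction as follows:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Placeholder training data: 60 samples, 5 variables, 2 classes.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(2, 1, (30, 5))])
y_train = np.array([0] * 30 + [1] * 30)

lda = LinearDiscriminantAnalysis()          # equal-covariance Gaussian model
lda.fit(X_train, y_train)

X_new = rng.normal(1, 1, (5, 5))            # unknown samples (5 x J)
print(lda.predict(X_new))                   # class assignments
print(lda.predict_proba(X_new))             # posterior probabilities (Bayes)
```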

## SIMCA

Soft Independent Modelling of Class Analogy
(also expanded as Standard Isolinear Method of Class Assignment)

SOFT: no assumption about the distribution of the variables is made (bilinear modelling).

MODELLING of CLASS ANALOGIES: attention is focused on the similarity between objects from the same class rather than on differentiating among classes.

SIMCA is a class-modelling method and is therefore supervised:
- a training set is needed to construct the model
- unknown samples are then projected onto the model

SIMCA is based on building an independent PCA model for each class in the training set; each class model may contain a different number of PCs. After the independent PCA models are constructed, the unknown samples are projected onto them.

[Figure: an independent PCA model (PC1, PC2) is built for each class.]

There are many implementations to decide whether an unknown sample belongs to a class or not, based on its residuals (Q) and Hotelling's T2 in each class model.

SIMCA assigns samples to the nearest class; samples are therefore always assigned. The distance of each sample i from each class g (d_ig) is calculated as

$$d_{ig} = \sqrt{\left( \frac{Q_{ig}}{Q_{0.95,g}} \right)^2 + \left( \frac{T^2_{ig}}{T^2_{0.95,g}} \right)^2}$$

where:
- T2_ig and Q_ig are the Hotelling's T2 and Q statistics of sample i in the g-class PCA model
- T2_0.95,g and Q_0.95,g are the 95% confidence limits of class g
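A sketch of the per-class PCA projection (assuming scikit-learn; the Q and T2 limits here are crude percentile estimates, not the course's exact confidence limits):

```python
import numpy as np
from sklearn.decomposition import PCA

class SIMCAClassModel:
    """One-class PCA model with crude 95% Q and T2 limits (illustrative)."""
    def fit(self, Xg, n_pc=2):
        self.pca = PCA(n_components=n_pc).fit(Xg)
        self.t_std = self.pca.transform(Xg).std(axis=0, ddof=1)  # T2 scaling
        Q, T2 = self._stats(Xg)
        self.Q95, self.T295 = np.percentile(Q, 95), np.percentile(T2, 95)
        return self

    def _stats(self, X):
        T = self.pca.transform(X)
        E = X - self.pca.inverse_transform(T)          # residuals
        Q = (E ** 2).sum(axis=1)                       # residual sum of squares
        T2 = ((T / self.t_std) ** 2).sum(axis=1)       # Hotelling's T2
        return Q, T2

    def distance(self, X):
        Q, T2 = self._stats(X)
        return np.sqrt((Q / self.Q95) ** 2 + (T2 / self.T295) ** 2)

# Strategy 1: assign each unknown sample to the class with minimum d_ig.
rng = np.random.default_rng(0)
models = [SIMCAClassModel().fit(rng.normal(m, 1, (50, 10))) for m in (0, 3)]
X_new = rng.normal(0, 1, (5, 10))
d = np.column_stack([m.distance(X_new) for m in models])
print(d.argmin(axis=1))                                # nearest class per sample
```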

## SIMCA. Assignment of class. Strategy 1

[Figure: Q (residuals) vs. Hotelling's T2 plane with the Q_0.95,g and T2_0.95,g limits; the sample is assigned to the nearest class.]

Alternatively, samples can be left unclassified:
- unassigned (i.e. outside the class spaces of all classes)
- classified in more than one class (confused)

## SIMCA. Assignment of class. Strategy 2

[Figure: acceptance condition in the Q (residuals) vs. Hotelling's T2 plane, bounded by Q_0.95,g and T2_0.95,g.]

A similar approach to the second strategy. Samples can be unclassified:
- unassigned (i.e. outside the class spaces of all classes)
- classified in more than one class (confused)

## SIMCA. Assignment of class. Strategy 3

[Figure: acceptance condition in the Q (residuals) vs. Hotelling's T2 plane, bounded by Q_0.95,g and T2_0.95,g.]


SIMCA drawbacks:

## PLS-DA

[Figure: the image cube is unfolded into a matrix X, which is regressed by a PLS-2 model against a dummy matrix D of 0s and 1s.]

Partial Least Squares Discriminant Analysis

PLS-DA is based on the same principles as PLS: maximizing the covariance between X and Y.

The main difference is that Y is a dummy matrix D of 0s and 1s encoding class membership (a PLS-2 model).
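An illustrative PLS-DA sketch (assuming scikit-learn, with PLSRegression used for the PLS-2 step; data and names are placeholders):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 50)), rng.normal(1, 1, (30, 50))])
classes = np.array([0] * 30 + [1] * 30)

# Dummy matrix D: one column per class, 1 for membership, 0 otherwise.
D = np.eye(2)[classes]

pls = PLSRegression(n_components=3)
pls.fit(X, D)

# Predictions are numbers, not classes: rules are needed to convert them.
print(pls.predict(rng.normal(0.5, 1, (3, 50))))
```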

The response of PLS-DA is still a number; therefore we need rules to convert these numbers into classes. A threshold can be set with Bayes' theorem (as in LDA):

1. It assumes that the predicted values follow a normal distribution.
2. The threshold is selected where the number of false positives and false negatives is minimized.
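A simple illustration of the threshold idea (a brute-force search minimizing FP + FN over the training predictions, not the full Bayesian estimate):

```python
import numpy as np

def best_threshold(y_pred, y_true):
    """Scan candidate thresholds; keep the one minimizing FP + FN."""
    candidates = np.linspace(y_pred.min(), y_pred.max(), 200)
    errors = [np.sum((y_pred > t) != y_true) for t in candidates]
    return candidates[int(np.argmin(errors))]
```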

For the rest, PLS-DA works like PLS:
- cross-validation
- choice of the number of latent variables (LVs)
- etc.
## Assessing the models

Confusion matrix:

|                 | Predicted positive  | Predicted negative  |
|-----------------|---------------------|---------------------|
| Actual positive | TP (true positive)  | FN (false negative) |
| Actual negative | FP (false positive) | TN (true negative)  |
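From these four counts, the usual performance statistics follow (standard definitions; Sn and Sp reappear in the ROC curves below):

$$\mathrm{Sn} = \frac{TP}{TP + FN} \qquad \mathrm{Sp} = \frac{TN}{TN + FP}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad \mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$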

## Exercise 0704

Calculate all the statistical parameters of the following confusion matrix.

## Receiver Operating Characteristic (ROC) curves

A ROC curve is a graphical plot of Sp and Sn on the X and Y axes respectively, for a binary classification system as its discrimination threshold is changed. ROC curves are used to estimate the best classification score (threshold).
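An illustrative ROC computation (assuming scikit-learn; specificity is plotted against sensitivity to match the convention above, and the data are random placeholders):

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = np.array([0] * 50 + [1] * 50)
scores = np.concatenate([rng.normal(0, 1, 50), rng.normal(1.5, 1, 50)])

# roc_curve returns false-positive and true-positive rates per threshold.
fpr, tpr, thresholds = roc_curve(y_true, scores)
sp, sn = 1 - fpr, tpr                    # specificity (X) and sensitivity (Y)

# One simple "best score": the threshold maximizing Sn + Sp.
print(thresholds[np.argmax(sn + sp)])
```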

[Figure: classification examples for the ALMONDS and PLASTICS datasets.]