
Segmentation/Classification

Outline

1. Unsupervised Segmentation

1.1. K-means clustering


1.2. Fuzzy clustering

2. Supervised Segmentation. Classification

2.1. LDA
2.2. SIMCA
2.3. PLS-DA

3. Assessing the classification models


Linear Classification Methods
Outline

Introduction. Concept of classification, distances and clustering

K-Nearest Neighbors (K-NN)

Linear Discriminant Analysis (LDA)

SIMCA

PLS-DA

Assessing the classification models / Validation


Introduction

Classification concept:

Classification aims at finding a criterion to assign an object (sample) to one category (class) based on a set of measurements performed on the object itself.

A category or class is an (ideal) group of objects sharing similar characteristics.

In classification, the categories are defined a priori

Classification establishes boundaries depending on the criterion selected.
Introduction

Classification concept: (inspired by Federico Marini)

Criterion:
Nordic - Italian
Beer - Wine
Introduction

Classification concept:

The classification strongly depends on the representativeness of the measured variables with respect to the samples.

[Figure: data matrix X (I x J), samples x variables]
Introduction

Classification methods: Chemometric techniques aimed at finding


mathematical models able to recognize the membership of each sample to
its proper class on the basis of a set of measurements (X).
Introduction

Classification methods: Distinctions can be made among classification techniques on the basis of the mathematical form of the decision boundary, i.e. on the basis of the ability to detect linear or non-linear boundaries.
Introduction

Classification methods: Another important distinction can be made between pure classification and class-modelling methods.
Introduction

Classification methods: Classes can be defined in different ways:

- By theoretical knowledge or experimental evidence

- By discretizing a quantitative response:

  y < 5 → Class A;  5.1 ≤ y ≤ 7 → Class B;  y ≥ 7.1 → Class C


Introduction

Classification methods: Once the classes have been defined, we construct a model.

[Figure: data matrix X (I x J), with samples labelled A, B or C, feeding a MODEL: Class = f(X)]
Introduction

Classification methods: And then we can predict the category of unknown samples.

[Figure: an unknown sample (1 x J) projected onto the MODEL Class = f(X) to predict its class]
Introduction

Similarity:
It is the mathematical transposition of the concept of analogy. Analogy is used at every moment of our life for pattern recognition, i.e. to recognize, to distinguish, to classify.

Distances:
Are the starting point for evaluating similarity: close samples are considered
similar, far samples are considered dissimilar
Introduction

Distances between two samples
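The distance formulas on this slide did not survive extraction. As a minimal sketch (not the slide's original figure), two common choices, Euclidean and Manhattan (city-block) distance, can be computed between two spectra; the spectra `s1` and `s2` are invented illustrative values:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two spectra of equal length
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # City-block distance: sum of absolute differences
    return sum(abs(x - y) for x, y in zip(a, b))

s1 = [0.1, 0.4, 0.9]
s2 = [0.1, 0.0, 0.6]
print(euclidean(s1, s2))  # ≈ 0.5
print(manhattan(s1, s2))  # ≈ 0.7
```

Close samples (small distance) are considered similar; far samples are considered dissimilar.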


Introduction

Clustering: Cluster methods search for the presence of groups (clusters) in the data, based on distances. UNSUPERVISED
Introduction

Do not confuse clustering and classification

Clustering methods search for the presence of groups (clusters) in the data. They are unsupervised and based on the calculation of distances.

Classification methods use the class information (supervised): they separate classes and their goal is to find models able to correctly assign each sample to its proper class.

The easiest classification method (kNN) is also based on distance measures.
Introduction

Do not confuse clustering and classification

Unsupervised pattern recognition: PCA, clustering

Classification:
- Pure classification: linear (LDA) and non-linear (QDA, K-NN)
- Class-modelling: linear (SIMCA, PLS-DA and variations) and non-linear (ANN, SVM)
Segmentation

Main goal: find groups of similar pixels in an image.

1. Supervised segmentation techniques. A priori knowledge. Classification:
- SIMCA and PLS-DA (class-modelling)
- LDA

2. Unsupervised segmentation techniques (not classification methods):
- PCA
- Clustering
Distances, similarity and clustering

Similarity: It is the mathematical word for analogy.

Distances: The distances are the starting point for evaluating similarity:
Close samples → similar samples
Far samples → dissimilar samples

[Figure: centroid and any spectrum of the image represented as points in d-dimensional space]
Distances, similarity and clustering

Clustering: Cluster methods search for the presence of groups (clusters) in the data, based on distances. UNSUPERVISED
Distances, similarity and clustering

Two main methods:

1. Agglomerative or hierarchical methods
- Each pixel starts as its own class
- Clusters are merged, decreasing the number of classes-clusters
- Many methods to solve the problem

2. Partitional methods
- The number of clusters is preselected (but this selection is not easy)
- Hard clustering → K-means (KM)
- Fuzzy clustering → FCM
K-NN

K-NN is the benchmark classification method based on measuring distances (analogy → similarity).

Each sample is classified on the basis of the most represented class among its k nearest samples.
K-means

K-means (KM): despite the similar name, K-means is a clustering method, not a variation of K-NN.

The KM algorithm assigns each pixel x_mn of the image to the kth cluster, whose center m_k is nearest, by minimizing the sum of the squared distances of each pixel to its corresponding center:

J = sum_k sum_{x_mn in cluster k} || x_mn - m_k ||^2

(x_mn: each pixel; m_k: centroid of cluster k)


K-means

How does it work?

(1) Choose the number of clusters k
(2) Generate k clusters and determine the cluster centers
(3) Assign each pixel to the nearest cluster center
(4) Calculate J and recalculate the cluster centers
(5) Repeat 3 and 4 until the convergence criterion is met
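The five steps above can be sketched in plain Python. This is a toy implementation, not the exercise code, and the point coordinates are invented:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    # (2) pick k initial centers at random from the data
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # (3) assign each point to the nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # (4) recompute each center as the mean of its cluster
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
               for i, cl in enumerate(clusters)]
        # (5) stop when the centers no longer move (convergence criterion)
        if new == centers:
            break
        centers = new
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, 2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

With two well-separated groups, the algorithm converges to the obvious 3/3 split regardless of the random initialization.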


K-means

Advantages
Simplicity

Drawbacks
Risk of converging to a local minimum in the iterations
It forces each pixel to belong exclusively to one cluster

Choosing the number of clusters → Silhouette index

K-means

Silhouette index
Calculated for each pixel x_mn, it offers a measure of the similarity between points in the same cluster compared to points in other clusters:

(S_mn)_k = (b_mn - a_mn) / max(a_mn, b_mn)

where a_mn is the average distance between the mnth pixel and all the pixels included in the same cluster, and b_mn is the minimum average distance between the mnth pixel and the pixels included in each of the other clusters.

(S_mn)_k close to 1 → correct classification
(S_mn)_k with negative value → misclassification

EXTREMELY SLOW!!!
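As a sketch, the silhouette of each point under given cluster labels can be computed directly from the definition above; the data are invented. Note the nested loops over all point pairs, which is exactly why the index is so slow on full images:

```python
def silhouette(points, labels):
    # (S)_i = (b_i - a_i) / max(a_i, b_i) for each point i
    def d(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    scores = []
    for i, p in enumerate(points):
        # a: average distance to all other points in the same cluster
        own = [d(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        a = sum(own) / len(own) if own else 0.0
        # b: minimum average distance to the points of each other cluster
        bs = []
        for lab in set(labels) - {labels[i]}:
            other = [d(p, q) for j, q in enumerate(points) if labels[j] == lab]
            bs.append(sum(other) / len(other))
        b = min(bs)
        scores.append((b - a) / max(a, b))
    return scores

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(silhouette(pts, [0, 0, 1, 1]))  # all values close to 1 (good clustering)
```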
K-means

Exercise 0701: Applying K-means
- Dataset: sample_demo.mat

[Figure: silhouette plots for 2, 3 and 4 clusters (silhouette value, 0 to 1, per cluster)]
K-means

[Figure: segmentation surfaces for 2, 3 and 4 clusters (pixel maps) and centroid spectra (intensity vs wavelength, 1200-2000) for 2 to 7 clusters]
K-means
Centroids

[Figure: centroid spectra (intensity vs wavelength, 1200-2000) for 2 to 7 clusters, compared with the pure spectra]
Fuzzy Clustering

Each pixel is assigned a fractional degree of membership to all the clusters simultaneously, rather than belonging completely to one cluster (as in KM clustering).

FCM allows pixels at the edge of a cluster to belong to it to a lesser degree than pixels in the middle of the cluster.
Fuzzy Clustering

A membership coefficient u_mnk is calculated for each pixel x_mn in such a way that each coefficient is comprised between 0 and 1, and the sum of all the coefficients of a pixel is defined to be 1.

g is the so-called fuzzifier constant, which determines the fuzziness of the clustering result.

A good value of g is 2, which indicates that the coefficients are linearly normalized to make this sum 1.
Fuzzy Clustering

Now, the J function is calculated as follows:

J = sum_mn sum_k (u_mnk)^g || x_mn - m_k ||^2

and the cluster center m_k is calculated as the weighted mean:

m_k = sum_mn (u_mnk)^g x_mn / sum_mn (u_mnk)^g
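A minimal sketch of one FCM iteration implementing the two updates above (membership coefficients with fuzzifier g, then weighted-mean centers); the data points and starting centers are invented:

```python
def fcm_step(points, centers, g=2.0):
    # One FCM iteration: membership update, then weighted-mean center update
    def d(p, q):
        return max(sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5, 1e-12)
    K = len(centers)
    # u[i][k] = 1 / sum_j (d_ik / d_ij)^(2/(g-1)); each row sums to 1
    u = []
    for p in points:
        dists = [d(p, c) for c in centers]
        row = [1.0 / sum((dists[k] / dists[j]) ** (2.0 / (g - 1.0)) for j in range(K))
               for k in range(K)]
        u.append(row)
    # m_k = sum_i u_ik^g x_i / sum_i u_ik^g  (weighted mean)
    new_centers = []
    for k in range(K):
        w = [u[i][k] ** g for i in range(len(points))]
        tot = sum(w)
        new_centers.append(tuple(sum(wi * p[dim] for wi, p in zip(w, points)) / tot
                                 for dim in range(len(points[0]))))
    return u, new_centers

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]
for _ in range(20):
    u, centers = fcm_step(pts, centers)
print([round(sum(row), 6) for row in u])  # [1.0, 1.0, 1.0, 1.0]
```

After a few iterations the centers settle near the two groups, and every pixel keeps a (possibly small) membership in both clusters.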


Fuzzy Clustering

Advantages
Each pixel is assigned a degree of belonging

Drawbacks
Risk of converging to a local minimum in the iterations (as in KM)

Choosing the number of clusters → Partition Entropy

Fuzzy Clustering

Partition Entropy

PE = -(1/MN) sum_mn sum_k u_mnk log(u_mnk)

where u_mnk are the membership coefficients and MN denotes the total number of pixels of the image.

The PE value ranges from 0 to log(K).

Values close to 0 indicate a good estimation of the number of clusters, whereas PE values close to log(K) indicate that the number of clusters does not reflect the real structure of the image.
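A direct implementation of the PE formula above, evaluated on hand-made membership matrices for the two extreme cases:

```python
import math

def partition_entropy(u):
    # PE = -(1/N) * sum_i sum_k u_ik * log(u_ik); ranges from 0 to log(K)
    total = sum(v * math.log(v) for row in u for v in row if v > 0)
    return (0.0 - total) / len(u)

crisp = [[1.0, 0.0], [0.0, 1.0]]  # perfectly crisp memberships
fuzzy = [[0.5, 0.5], [0.5, 0.5]]  # maximally fuzzy memberships
print(partition_entropy(crisp))   # 0.0
print(partition_entropy(fuzzy))   # log(2) ≈ 0.6931
```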
Fuzzy Clustering

Partition Entropy and silhouette are dreadfully HOPELESS IN SOME CASES!!!

[Figures: silhouette and Partition Entropy curves, and the corresponding K-means and FCM segmentation maps, for a case where both indices fail]
Fuzzy Clustering

Exercise 0702: Apply K-means and FCM to both datasets
- Dataset: Sample_demo.mat
- Dataset: brunel.mat
Fuzzy Clustering

Exercise 0703: Apply K-means to the fluorescence and plastics datasets
Distances, similarity and clustering

Classification: Classification uses the class information to find models that associate each sample to its assigned class. SUPERVISED

Linear Discriminant Analysis

SIMCA

PLS-DA
Linear Discriminant Analysis (LDA)

Discriminant Analysis:
Separates samples into classes by finding directions which:
- maximize the variance between classes
- minimize the variance within classes

[Figure: example where both PC1 and LD1 separate the classes — PC1 GOOD, LD1 GOOD]
Linear Discriminant Analysis (LDA)

Discriminant Analysis:
Separates samples into classes by finding directions which:
- maximize the variance between classes
- minimize the variance within classes

[Figure: example where PC1 fails to separate the classes but LD1 succeeds — PC1 BAD, LD1 GOOD]
Linear Discriminant Analysis

Linear Discriminant Analysis (LDA)

LDA is a method that separates samples into classes by finding directions which maximize the variance between classes and minimize the variance within classes.

Two main assumptions about our data:

1) Each k-class density is a multivariate Gaussian

Linear Discriminant Analysis

Linear Discriminant Analysis (LDA)

LDA is a method that separates samples into classes by finding directions which maximize the variance between classes and minimize the variance within classes.

Two main assumptions about our data:

2) All the class covariance matrices are presumed to be identical: S_g = S
Linear Discriminant Analysis

LDA then calculates the probability of belonging to each class by applying the Bayes theorem:
Linear Discriminant Analysis

Once the probability has been calculated, LDA assigns each sample to the specific class with minimum discriminant score d_g:
Linear Discriminant Analysis

Benefits and drawbacks of LDA:

Benefits: it is easy to use and robust

Drawbacks:

1) The number of samples must be higher than the number of variables. This is not a real problem with images.

2) LDA makes the assumption of a Gaussian distribution of the data.
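A minimal one-variable sketch of the LDA idea: one Gaussian per class with a shared (pooled) variance, and assignment by minimum discriminant score. Equal priors are assumed and the data values are invented:

```python
def lda_train(samples, labels):
    # Class means plus a pooled within-class variance (LDA assumes identical
    # class covariances; here everything is one-dimensional for clarity)
    classes = sorted(set(labels))
    means = {}
    for c in classes:
        xs = [x for x, l in zip(samples, labels) if l == c]
        means[c] = sum(xs) / len(xs)
    ss = sum((x - means[l]) ** 2 for x, l in zip(samples, labels))
    var = ss / (len(samples) - len(classes))
    return means, var

def lda_predict(x, means, var):
    # Equal priors: assign to the class with minimum (Mahalanobis) score
    return min(means, key=lambda c: (x - means[c]) ** 2 / var)

means, var = lda_train([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], ['A'] * 3 + ['B'] * 3)
print(lda_predict(2.0, means, var))  # A
```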


Soft Independent Modelling of Class Analogy
Standard Isolinear Method of Class Assignment
SIMCA
SIMCA

SIMCA

Originally proposed by Wold in 1976.

SOFT: no assumption about the distribution of the variables is made (bilinear modelling).

INDEPENDENT: each category is modelled independently.

MODELLING of CLASS ANALOGIES: attention is focused on the similarity between objects from the same class rather than on differentiating among classes.
SIMCA

SIMCA

It is a class-modelling method and, thus, a supervised method:
- A training set is needed to construct a model
- Unknown samples are projected onto the model

SIMCA is based on making independent PCA models for each class in the training set; each class may retain a different number of PCs.

After the independent PCA models are constructed, the unknown samples are projected onto them.
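A minimal sketch of the SIMCA idea, assuming a single PC per class (found by power iteration) and using only the residual Q for assignment; a complete SIMCA would also use Hotelling's T2 and per-class confidence limits. The toy classes below lie along the two coordinate axes:

```python
def fit_class_model(X, n_iter=200):
    # Mean-center the class data and find its first PC by power iteration
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    v = [1.0] * d
    for _ in range(n_iter):
        # w = X^T (X v): the dominant covariance direction, up to scale
        s = [sum(r[j] * vj for j, vj in enumerate(v)) for r in Xc]  # scores
        w = [sum(s[i] * Xc[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v

def q_residual(x, model):
    # Squared residual of x after projection onto the class PC space
    mean, v = model
    xc = [xi - mi for xi, mi in zip(x, mean)]
    t = sum(a * b for a, b in zip(xc, v))            # score on PC1
    return sum((a - t * b) ** 2 for a, b in zip(xc, v))

model_A = fit_class_model([(-2, 0), (-1, 0), (1, 0), (2, 0)])
model_B = fit_class_model([(0, -2), (0, -1), (0, 1), (0, 2)])
sample = (3, 0.1)
print('A' if q_residual(sample, model_A) < q_residual(sample, model_B) else 'B')  # A
```

The sample lies almost exactly on class A's PC, so its residual against model A is tiny while its residual against model B is large.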
SIMCA

SIMCA. Graphical interpretation

1) Individual PCA model for each class; each class is pre-processed independently.

[Figure: one class modelled with a single PC (PC1), another with two PCs (PC1, PC2)]
SIMCA

SIMCA. Graphical interpretation

2) Projection of the other classes onto the PCA class space of one class.

[Figure: samples of all classes projected onto the PC1-PC2 space of one class]
SIMCA

SIMCA. How to assign the class belonging?

There are many implementations to decide whether an unknown sample belongs to a class or not.

Here we will talk about three variations of the same concept: Hotelling's T2 and the residuals (Q)!

[Figure: distance of a sample from a class model decomposed into Hotelling T2 (within the PC space) and residuals (orthogonal to it)]
SIMCA

SIMCA. Assignment of class. Strategy 1

SIMCA assigns samples to the nearest class. Samples are, therefore, always assigned.

The distance of each sample i from each class g (d_ig) is calculated as

d_ig = sqrt( (Q_ig / Q_0.95,g)^2 + (T2_ig / T2_0.95,g)^2 )

where:
- Q_ig and T2_ig are the Q and Hotelling's T2 of sample i calculated in the PCA g-class model;
- Q_0.95,g and T2_0.95,g are the 95% confidence limits of the g class.
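The distance above can be computed directly; the Q, T2 and critical values below are invented for illustration:

```python
def simca_distance(q_ig, t2_ig, q_crit, t2_crit):
    # Reduced distance of sample i from class g: Q and T2 are each normalized
    # by their 95% critical value before being combined
    return ((q_ig / q_crit) ** 2 + (t2_ig / t2_crit) ** 2) ** 0.5

# Strategy 1: always assign the sample to the nearest class
dists = {'A': simca_distance(0.5, 1.0, 1.2, 2.0),
         'B': simca_distance(2.0, 3.0, 1.2, 2.0)}
print(min(dists, key=dists.get))  # A
```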
SIMCA

SIMCA. Assignment of class. Strategy 1

[Figure: (T2, Q) plane with the class acceptance region bounded by T2_0.95,g and Q_0.95,g]
SIMCA

SIMCA. Assignment of class. Strategy 2. Condition

SIMCA assigns a sample to the g class only if the sample falls within the class boundary, which is equivalent to requiring that its Q_ig and T2_ig do not exceed the class critical limits.

Samples can therefore be unclassified:
- unassigned (i.e. outside the class spaces of all classes)
- classified in more than one class (confused)
SIMCA

SIMCA. Assignment of class. Strategy 2. Condition

[Figure: (T2, Q) plane with the acceptance region bounded by T2_0.95,g and Q_0.95,g]
SIMCA

SIMCA. Assignment of class. Strategy 3

SIMCA assigns a sample to the g class under a condition similar to the second strategy.

Samples can be unclassified:
- unassigned (i.e. outside the class spaces of all classes)
- classified in more than one class (confused)
SIMCA

SIMCA. Assignment of class. Strategy 3. Condition

[Figure: (T2, Q) plane with the acceptance region bounded by T2_0.95,g and Q_0.95,g]
SIMCA

SIMCA. Benefits and drawbacks:

Benefits: simple and based on PCA

Drawbacks:

1) Each class needs to be perfectly defined by its number of PCs.

2) The number of unassigned samples can be high.

3) One sample can belong to several classes.


PLS-DA

Partial Least Squares Discriminant Analysis

[Figure: the image is unfolded into a matrix X, and a PLS-2 model relates X to a dummy matrix D of 0s and 1s]
PLS-DA

Partial Least Squares Discriminant Analysis

PLS-DA is based on the same principles as PLS: maximizing the covariance between X and Y.

[Figure: PLS-2 model between X and the dummy matrix D of 0s and 1s]
PLS-DA

Partial Least Squares Discriminant Analysis

The main difference is that Y is a dummy matrix D containing only 0s and 1s.

[Figure: PLS-2 model between X and the dummy matrix D]
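Building the dummy matrix D from a list of class labels is a one-liner; the labels here are invented:

```python
def dummy_matrix(labels):
    # One column per class; 1 marks membership, 0 otherwise
    classes = sorted(set(labels))
    return classes, [[1 if l == c else 0 for c in classes] for l in labels]

classes, D = dummy_matrix(['A', 'B', 'B', 'C'])
print(classes)  # ['A', 'B', 'C']
print(D)        # [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
```

D then plays the role of Y in an ordinary PLS-2 regression.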
PLS-DA

Partial Least Squares Discriminant Analysis

The response of PLS-DA when classifying is still a number. Therefore we need to find rules to convert these numbers into classes.
Bayes Theorem (like in LDA):

1) It assumes that the predicted values follow a normal distribution.

2) The threshold is selected where the number of false positives and false negatives is minimized.
PLS-DA

Partial Least Squares Discriminant Analysis

For the rest, it works like PLS:
- Cross-validation
- Number of LVs
- etc.
Assessing the models

Confusion matrix

                  Predicted positive    Predicted negative
Actual positive   TP (true positive)    FN (false negative)
Actual negative   FP (false positive)   TN (true negative)
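The usual figures of merit follow directly from the four counts; the counts below are invented:

```python
def classification_stats(tp, fp, fn, tn):
    # Common figures of merit derived from the confusion matrix
    return {
        'sensitivity': tp / (tp + fn),   # Sn: true positive rate
        'specificity': tn / (tn + fp),   # Sp: true negative rate
        'precision':   tp / (tp + fp),
        'accuracy':    (tp + tn) / (tp + fp + fn + tn),
    }

stats = classification_stats(tp=40, fp=10, fn=5, tn=45)
print(stats['accuracy'])  # 0.85
```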
Assessing the models

Exercise 0704: Calculate all the statistical parameters of the following confusion matrix.
Assessing the models

Receiver Operating Characteristics (ROC curves)

A ROC curve is a graphical plot of Sp and Sn, as X and Y axes respectively, for a binary classification system as its discrimination threshold is changed. It is used to estimate the best classification score.
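As a sketch, the (Sp, Sn) pairs of a ROC curve can be traced by sweeping the threshold over the classification scores; the scores and truth labels below are invented:

```python
def roc_points(scores, truth):
    # Sweep the decision threshold and record (Sp, Sn) at each step
    pts = []
    for thr in sorted(set(scores)):
        pred = [s >= thr for s in scores]
        tp = sum(p and t for p, t in zip(pred, truth))
        fn = sum((not p) and t for p, t in zip(pred, truth))
        tn = sum((not p) and (not t) for p, t in zip(pred, truth))
        fp = sum(p and (not t) for p, t in zip(pred, truth))
        pts.append((tn / (tn + fp) if tn + fp else 1.0,   # Sp
                    tp / (tp + fn) if tp + fn else 1.0))  # Sn
    return pts

pts = roc_points([0.1, 0.3, 0.6, 0.9], [False, False, True, True])
print(pts)  # [(0.0, 1.0), (0.5, 1.0), (1.0, 1.0), (1.0, 0.5)]
```

The best classification score is the threshold whose point lies closest to the ideal corner Sp = Sn = 1; here the threshold 0.6 reaches it exactly.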
Assessing the models

Exercise 0705: PLS-DA model applied to:
- ALMONDS
- PLASTICS