Original Title: Block 07 Segmentation and Classification

Outline

1. Unsupervised Segmentation

1.1. K-means clustering

1.2. Fuzzy clustering

2. Supervised Classification

2.1. LDA

2.2. SIMCA

2.3. PLS-DA

Linear Classification Methods

Outline

SIMCA

PLS-DA

Introduction

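Classification concept:

Classification is the assignment of an object to one category (class) based on a set of measurements performed on the object itself.

The classes are defined according to selected characteristics (the classification criterion).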

Introduction

Criterion:

Nordic - Italian

Beer - Wine

Introduction

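Classification concept:

The measurements are arranged in a data matrix X (I x J), with the I samples in the rows and the J variables in the columns.

The result depends on the representativeness of the measured samples and variables.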

Introduction

Classification methods build mathematical models able to recognize the membership of each sample to its proper class on the basis of a set of measurements (X).


Introduction

Classification methods can be divided into linear and non-linear techniques on the basis of the mathematical form of the decision boundary, i.e. on the basis of their ability to detect linear or non-linear class boundaries.

Introduction

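Classification methods can also be divided into pure classification and class-modelling methods. Pure classification methods always assign each sample to one of the predefined classes, whereas class-modelling methods model each class separately, so a sample can belong to one class, to several classes or to none.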


Introduction

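Training: a model, Class = f(X), is built from a set of samples whose classes (A, B, C) are known. The samples are arranged in the matrix X (I x J), with samples in the rows and variables in the columns.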

Introduction

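Prediction: the model, Class = f(X), is applied to an unknown sample (1 x J) measured on the same J variables, and the sample is assigned to one of the classes (in the figure, class A).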

Introduction

Similarity:

It is the mathematical transposition of the concept of analogy. We use analogy at every moment of our lives for pattern recognition, i.e. to recognize, to distinguish and to classify.

Distances:

They are the starting point for evaluating similarity: close samples are considered similar, far samples are considered dissimilar.
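As a minimal sketch of how distances turn into a similarity judgement, the snippet below computes all pairwise Euclidean distances between the rows (samples) of a data matrix X (I x J). Euclidean distance is an assumption here, since the slides do not fix a particular metric, and the function name `euclidean_distances` is hypothetical.

```python
import numpy as np

def euclidean_distances(X: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between the rows (samples) of X (I x J)."""
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives caused by rounding

# Three samples described by two variables: the first two are close (similar),
# the third is far from both (dissimilar).
X = np.array([[0.0, 0.0],
              [0.1, 0.0],
              [3.0, 4.0]])
D = euclidean_distances(X)
```

Small entries of D mark similar samples and large entries dissimilar ones; the clustering methods that follow are built on exactly this kind of matrix.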


Introduction

Clustering methods

They search for the presence of groups (clusters) in the data. They are unsupervised and based on the calculation of distances.

Classification methods

They use the class information (supervised): they separate classes, and their goal is to find models able to correctly assign each sample to its proper class.

Introduction

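Overview of the methods:

- Segmentation (unsupervised): PCA, clustering
- Classification (supervised):
  - Pure classification: linear LDA; non-linear QDA, K-NN
  - Class-modelling: linear SIMCA, PLS-DA; non-linear variations of PLS-DA, ANN, SVM

Methods covered in this block: segmentation (PCA, clustering), LDA, and class-modelling with SIMCA and PLS-DA.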

Distances, similarity and clustering

Distances: the distances are the starting point for evaluating similarity:

- Close samples: similar samples
- Far samples: dissimilar samples

Figure: the pixels of the image represented as points in a d-dimensional space, together with the cluster centroid.

Distances, similarity and clustering

Clustering: searching for groups (clusters) in the data, based on distances. UNSUPERVISED

Distances, similarity and clustering

1. Agglomerative or hierarchical methods: the number of clusters is progressively decreased by merging clusters.

2. Partitional methods, with two main approaches:

- Hard clustering: K-means (KM)
- Fuzzy clustering: fuzzy c-means (FCM)

K-NN

K-NN is based on measuring distances (analogy, similarity): each sample is assigned according to the k nearest samples.

K-means (KM)

The KM algorithm assigns each pixel x_mn of the image to the k-th cluster whose center is nearest, by minimizing the sum of the squared distances of each pixel to its corresponding center.

K-means (KM)

(1) Choose the number of clusters k

K-means (KM)

Advantages

Simplicity

Drawbacks

Risk of converging to a local minimum in the iterations
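The KM procedure (choose k, assign each pixel to the nearest center, move each center to the mean of its pixels, repeat) can be sketched in NumPy as follows. It minimizes J = sum over clusters k of sum over pixels x_mn in cluster k of ||x_mn - c_k||^2. This is a minimal Lloyd-style implementation, not the code used for the slides; restarting from several random initial centers (`n_init`, a hypothetical parameter name) and keeping the lowest J is one common way to reduce the local-minimum risk listed under Drawbacks.

```python
import numpy as np

def kmeans(X, k, n_iter=100, n_init=5, seed=0):
    """Hard K-means on the rows of X (pixels x wavelengths).

    Returns (labels, centers, sse), where sse is the sum of squared
    distances of each row to its assigned center.
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):  # several random starts: keep the lowest-SSE result
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
        for _ in range(n_iter):
            # Assignment step: squared distance of every row to every center
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Update step: each center moves to the mean of its rows
            # (an empty cluster keeps its previous center)
            new_centers = np.array([
                X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                for c in range(k)
            ])
            converged = np.allclose(new_centers, centers)
            centers = new_centers
            if converged:
                break
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        sse = d2[np.arange(len(X)), labels].sum()
        if best is None or sse < best[2]:
            best = (labels, centers, sse)
    return best

# Toy data: two well-separated groups of "pixels" with 3 variables each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(5.0, 0.1, (20, 3))])
labels, centers, sse = kmeans(X, k=2)
```

For an image, a hypothetical cube of shape (rows, cols, wavelengths) would be unfolded with `cube.reshape(-1, cube.shape[2])` before clustering, and `labels.reshape(rows, cols)` gives the cluster map.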

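K-means (KM)

Silhouette index

It is calculated for each pixel x_mn and measures how similar the pixel is to the points of its own cluster compared with the points of the other clusters:

$$ (S_{mn})_k = \frac{b_{mn} - a_{mn}}{\max(a_{mn}, b_{mn})} $$

where a_mn is the average distance between the mn-th pixel and all the other pixels included in the same cluster k, and b_mn is the average distance between the mn-th pixel and the pixels included in the other clusters (taken for the nearest of those clusters).

A negative (S_mn)_k indicates a misclassified pixel.

Drawback: extremely slow, because it requires the distances between all pairs of pixels.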
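A direct translation of the silhouette definition is sketched below (function name hypothetical). It builds the full N x N matrix of pixel-to-pixel distances, which is why the index is so slow, and memory-hungry, on images with many pixels. Singleton clusters are given a value of 0, a common convention assumed here.

```python
import numpy as np

def silhouette_values(X, labels):
    """Silhouette value s = (b - a) / max(a, b) for every row of X."""
    # All pairwise Euclidean distances: N x N, the slow part for images
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                      # exclude the pixel itself
        if not same.any():                   # singleton cluster -> 0 by convention
            continue
        a = D[i, same].mean()                # mean distance within its own cluster
        b = min(D[i, labels == c].mean()     # mean distance to the nearest other cluster
                for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

X = np.array([[0.0], [0.1], [5.0], [5.1]])
s_good = silhouette_values(X, np.array([0, 0, 1, 1]))   # sensible clustering
s_bad = silhouette_values(X, np.array([0, 1, 0, 1]))    # mixed-up clustering
```

With the sensible clustering every value is close to 1; with the mixed-up one the first pixel gets a negative value, flagging it as misclassified.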

K-means (KM)

- Dataset sample_demo.mat

Figure: silhouette plots (cluster vs. silhouette value, 0 to 1) for the KM solutions with 2, 3 and 4 clusters.

K-means (KM)

Figure: cluster maps of the image and centroid spectra (intensity vs. wavelength, 1200 to 2000) for No of clusters = 5, 6 and 7.

K-means (KM)

Centroids

Figure: centroid spectra (intensity vs. wavelength, 1200 to 2000) for No of clusters = 2, 3, 4, 5, 6 and 7.

Pure spectra

Figure: pure spectra (wavelength 1200 to 2000).

Fuzzy Clustering

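In fuzzy clustering each pixel can belong to more than one cluster simultaneously, rather than belonging completely to one cluster (as in KM clustering). Pixels on the edge of a cluster belong to it to a lower degree than pixels in the middle of the cluster.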

Fuzzy Clustering

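Each pixel has a set of membership coefficients, one for each cluster, normalized in such a way that each coefficient lies between 0 and 1 and the sum of all the coefficients of a pixel is 1.

The fuzzifier m (m > 1) determines the level of fuzziness of the clustering result.

For m = 2, this is equivalent to normalizing the inverse squared distances linearly to make this sum 1.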
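The standard fuzzy c-means (FCM) updates can be sketched as below. Memberships are recomputed from the distances to the current centers as u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)), and each center is the mean of the pixels weighted by u_ik^m. The random initialization of the memberships, the tolerance and the function name are assumptions of this sketch, not details taken from the slides.

```python
import numpy as np

def fuzzy_cmeans(X, k, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means on the rows of X. Returns memberships U (N x k) and centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), k))
    U /= U.sum(axis=1, keepdims=True)          # each row of memberships sums to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # membership-weighted means
        d = np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
        d = np.fmax(d, 1e-12)                  # avoid division by zero at a center
        # u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        delta = np.abs(U_new - U).max()
        U = U_new
        if delta < tol:
            break
    Um = U ** m
    centers = (Um.T @ X) / Um.sum(axis=0)[:, None]       # centers for the final U
    return U, centers

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.1, (20, 3)),
               rng.normal(5.0, 0.1, (20, 3)),
               [[2.5, 2.5, 2.5]]])             # one pixel halfway between the groups
U, centers = fuzzy_cmeans(X, k=2)
```

Pixels inside a group get a membership close to 1 for their cluster, while the halfway pixel is shared roughly 50/50, which is the behaviour KM cannot express.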


Fuzzy Clustering

Advantages

Each pixel is assigned a degree of membership to every cluster

Drawbacks

Risk of converging to a local minimum in the iterations (as in KM)

Fuzzy Clustering

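Partition Entropy

$$ PE = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} u_{ik} \log u_{ik} $$

where u_ik is the membership of pixel i in cluster k, N is the number of pixels and K the number of clusters. PE values close to 0 indicate a clear (almost crisp) partition into clusters, whereas PE values close to log K indicate that the number of clusters does not reflect the real structure of the image.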
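Partition entropy can be computed directly from a membership matrix U (pixels x clusters), such as the one produced by an FCM run. The sketch below (function name hypothetical) uses the natural logarithm, so its upper bound is ln K, and treats 0 log 0 as 0 by clipping the memberships.

```python
import numpy as np

def partition_entropy(U):
    """PE = -(1/N) * sum_i sum_k u_ik * ln(u_ik), between 0 and ln(K)."""
    U = np.clip(np.asarray(U, dtype=float), 1e-12, 1.0)   # 0 * log(0) treated as ~0
    return float(-(U * np.log(U)).sum() / U.shape[0])

crisp = np.eye(3)                  # every pixel fully in one cluster
uniform = np.full((4, 3), 1 / 3)   # every pixel equally in all 3 clusters
pe_crisp = partition_entropy(crisp)        # close to 0
pe_uniform = partition_entropy(uniform)    # close to ln(3)
```

In practice PE is computed for several values of K and compared, looking for partitions whose PE stays well below ln K.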

Fuzzy Clustering

Figure: comparison of the segmentation results obtained with K-means and FCM.

Fuzzy Clustering

- Dataset Sample_demo.mat

- Dataset brunel.mat

Fuzzy Clustering

Example: plastics

Distances, similarity and clustering

Classification: building models that associate each sample with its assigned class. SUPERVISED

SIMCA

PLS-DA

Linear Discriminant Analysis (LDA)

Discriminant Analysis:

Separates samples into classes by finding directions which:

maximize the variance between classes

minimize the variance within classes

Figure, example 1: PC1 GOOD, LD1 GOOD. Both directions separate the classes.

Figure, example 2: PC1 BAD, LD1 GOOD. The direction of maximum total variance (PC1) does not separate the classes, whereas the discriminant direction (LD1) does.

Linear Discriminant Analysis

LDA finds the directions which maximize the variance between classes and minimize the variance within classes.

LDA assumes that all classes share the same variance-covariance matrix, $S_g = S$: the class covariance matrices S_g are replaced by the pooled within-class covariance matrix S.
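The criterion above leads to the classic Fisher formulation: the discriminant directions are the leading eigenvectors of S_W^-1 S_B, where S_W and S_B are the within- and between-class scatter matrices. The NumPy sketch below (names hypothetical) reproduces the "PC1 BAD, LD1 GOOD" situation with two elongated classes whose largest variance lies along the first variable while they differ only along the second. Solving with S_W also shows why LDA needs more samples than variables: S_W must be invertible.

```python
import numpy as np

def lda_directions(X, y):
    """Fisher discriminant directions (columns), at most G - 1 of them."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    J = X.shape[1]
    S_w = np.zeros((J, J))   # within-class scatter
    S_b = np.zeros((J, J))   # between-class scatter
    for g in classes:
        Xg = X[y == g]
        mg = Xg.mean(axis=0)
        S_w += (Xg - mg).T @ (Xg - mg)
        diff = (mg - overall_mean)[:, None]
        S_b += len(Xg) * (diff @ diff.T)
    # Maximize (w' S_b w) / (w' S_w w): eigenvectors of S_w^-1 S_b
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(evals.real)[::-1][: len(classes) - 1]
    W = evecs[:, order].real
    return W / np.linalg.norm(W, axis=0)

rng = np.random.default_rng(3)
n = 100
A = np.column_stack([rng.normal(0, 5, n), rng.normal(0, 0.2, n)])
B = np.column_stack([rng.normal(0, 5, n), rng.normal(1, 0.2, n)])
X = np.vstack([A, B])
y = np.array([0] * n + [1] * n)

ld1 = lda_directions(X, y)[:, 0]
# PC1 for comparison: first right singular vector of the centered data
pc1 = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)[2][0]
```

PC1 points along the first variable (largest variance, no class information), while LD1 points along the second variable, the only one that separates the classes.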

Linear Discriminant Analysis

Samples are classified by applying Bayes' theorem, which gives the posterior probability of each class given the measurements.

Linear Discriminant Analysis

Each sample is assigned to the specific class with the minimum discriminant score d_g.
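Under the LDA assumptions (Gaussian classes sharing the pooled covariance S, prior probabilities pi_g), Bayes' theorem leads to assigning each sample to the class with the smallest score d_g(x) = (x - m_g)' S^-1 (x - m_g) - 2 ln(pi_g), where m_g is the class centroid. This standard textbook form is assumed here because the slide equation did not survive; the function names are hypothetical.

```python
import numpy as np

def lda_fit(X, y):
    """Class centroids, inverse pooled covariance (S_g = S) and priors."""
    classes = np.unique(y)
    means = np.array([X[y == g].mean(axis=0) for g in classes])
    resid = np.vstack([X[y == g] - means[i] for i, g in enumerate(classes)])
    S = resid.T @ resid / (len(X) - len(classes))     # pooled within-class covariance
    priors = np.array([np.mean(y == g) for g in classes])
    return classes, means, np.linalg.inv(S), priors

def lda_predict(model, X_new):
    """Assign each row to the class with the minimum discriminant score d_g."""
    classes, means, S_inv, priors = model
    diff = X_new[:, None, :] - means[None, :, :]           # (samples, classes, variables)
    mahal = np.einsum("ngj,jk,ngk->ng", diff, S_inv, diff)  # squared Mahalanobis distance
    d = mahal - 2.0 * np.log(priors)[None, :]
    return classes[d.argmin(axis=1)], d

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = lda_fit(X, y)
labels, scores = lda_predict(model, np.array([[0.0, 0.0], [3.0, 3.0]]))
```

With equal priors the prior term is the same for every class, and the rule reduces to the nearest centroid in Mahalanobis distance.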

Linear Discriminant Analysis

Drawbacks:

1) The number of samples must be higher than the number of variables, otherwise the pooled covariance matrix S cannot be inverted. This is not a real problem with images, where every pixel is a sample.

Soft Independent Modelling of Class Analogy

Standard Isolinear Method of Class Assignment

SIMCA

SIMCA is a class-modelling method based on PCA (bilinear modelling). It focuses on the similarity between objects from the same class rather than on differentiating among classes.

SIMCA

- A training set is needed to construct a model.
- A PCA model is calculated for each class in the training set, and each class can have a different number of PCs.
- Unknown samples are projected onto the class models.

SIMCA

Figure: each class is modelled independently by its own PCA model (PC1, PC2).

Figure: the class models are used to decide whether a sample belongs to a class or not; an unknown sample (?) is located in the Hotelling T2 vs. Q residuals plot of each class.

SIMCA

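The distance of each sample i from each class g (d_ig) is calculated by combining its Q and T2 values, each scaled by the corresponding 95% limit of class g, where:

- Q_ig and T2_ig are the Q residuals and the Hotelling T2 of sample i calculated in the PCA model of class g;
- Q_0.95,g and T2_0.95,g are the 95% confidence limits of Q and T2 for class g.

Each sample is assigned to the class with the smallest d_ig. Samples are, therefore, always assigned.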
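A minimal sketch of this scheme, under three assumptions: (1) one PCA model per class, obtained by SVD of the mean-centered class data; (2) 95% limits estimated as the empirical 95th percentiles of the training Q and T2 values, a simplification of the usual F, chi-squared or Jackson-Mudholkar limits; (3) the combined distance d_ig = sqrt((Q_ig/Q_0.95,g)^2 + (T2_ig/T2_0.95,g)^2), one common way of combining the two reduced statistics and not necessarily the slide formula. All function names are hypothetical.

```python
import numpy as np

def pca_class_model(Xg, n_pcs):
    """PCA model of one class, with (assumed) empirical 95% limits."""
    mean = Xg.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xg - mean, full_matrices=False)
    model = {
        "mean": mean,
        "P": Vt[:n_pcs].T,                              # loadings (J x A)
        "score_var": s[:n_pcs] ** 2 / (len(Xg) - 1),    # variance of each PC score
    }
    Q, T2 = q_t2(model, Xg)
    model["Q95"] = np.percentile(Q, 95)     # ASSUMPTION: empirical percentile limits
    model["T2_95"] = np.percentile(T2, 95)
    return model

def q_t2(model, X):
    """Q residuals and Hotelling T2 of the rows of X in one class model."""
    Xc = X - model["mean"]
    T = Xc @ model["P"]                     # projection onto the class PCs
    E = Xc - T @ model["P"].T               # part not explained by the class model
    return (E ** 2).sum(axis=1), (T ** 2 / model["score_var"]).sum(axis=1)

def simca_distances(models, X):
    """d_ig for every sample i (rows) and class g (columns)."""
    cols = []
    for m in models:
        Q, T2 = q_t2(m, X)
        # ASSUMPTION: d_ig = sqrt((Q/Q95)^2 + (T2/T2_95)^2)
        cols.append(np.sqrt((Q / m["Q95"]) ** 2 + (T2 / m["T2_95"]) ** 2))
    return np.column_stack(cols)

rng = np.random.default_rng(5)
XA = rng.normal(0.0, 1.0, (50, 5))
XB = rng.normal(4.0, 1.0, (50, 5))
models = [pca_class_model(XA, 2), pca_class_model(XB, 2)]   # one PCA per class
D = simca_distances(models, np.vstack([XA[:5], XB[:5]]))
assigned = D.argmin(axis=1)                 # nearest class: always assigned
```

Because the rule only picks the smallest d_ig, every sample ends up in some class, even one far from all class models.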

SIMCA

Figure: Hotelling T2 vs. Q residuals plot of class g, with the limits T2_0.95,g and Q_0.95,g.


SIMCA

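When a sample is accepted by a class only if it falls inside the T2_0.95,g and Q_0.95,g limits of that class, samples can be unclassified:

- unassigned (i.e. outside the class spaces of all classes)
- classified in more than one class (confused)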
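Under this class-modelling rule, the reduced statistics decide membership class by class. The sketch below assumes that a sample belongs to class g when both Q_ig/Q_0.95,g and T2_ig/T2_0.95,g are at most 1 (other acceptance rules exist); the function name and the toy values are hypothetical.

```python
import numpy as np

def simca_class_membership(Q_red, T2_red):
    """Q_red, T2_red: (samples x classes) arrays of Q/Q_0.95,g and T2/T2_0.95,g."""
    # ASSUMPTION: a class accepts a sample only if it is inside both limits
    inside = (np.asarray(Q_red) <= 1.0) & (np.asarray(T2_red) <= 1.0)
    n_classes = inside.sum(axis=1)            # how many class spaces accept each sample
    status = np.where(n_classes == 0, "unassigned",
                      np.where(n_classes > 1, "confused", "assigned"))
    return inside, status

Q_red = np.array([[0.5, 3.0], [2.0, 4.0], [0.8, 0.9]])
T2_red = np.array([[0.7, 2.0], [0.5, 0.6], [0.4, 0.3]])
inside, status = simca_class_membership(Q_red, T2_red)
```

The first sample falls inside one class space only, the second is outside both (unassigned), and the third is inside both (confused).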


SIMCA

Drawbacks:

PLS-DA

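Figure: the image is unfolded into the matrix X, and a PLS-2 model relates X to a dummy matrix D of 0s and 1s.

PLS-DA is based on the same principles as PLS: it looks for the directions of maximum covariance between X and Y.

The main difference is that Y is a dummy matrix of 0s and 1s: one column per class, with 1 in the column of the class each sample belongs to and 0 elsewhere.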
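To make the dummy-matrix idea concrete, the sketch below builds Y from class labels and fits a PLS-2 model with the NIPALS algorithm in plain NumPy. NIPALS is one standard way of fitting PLS-2, not necessarily the algorithm behind the slides, and the function names are hypothetical. The largest predicted value is used at the end only as a quick check; it is a placeholder, not the assignment rule of the slides.

```python
import numpy as np

def dummy_matrix(y):
    """One column per class: 1 if the sample belongs to that class, else 0."""
    classes = np.unique(y)
    return classes, (y[:, None] == classes[None, :]).astype(float)

def pls2_nipals(X, Y, n_lv, tol=1e-10, max_iter=500):
    """PLS-2 by NIPALS on mean-centered X and Y; returns means and coefficients B."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, C = [], [], []
    for _ in range(n_lv):
        u = F[:, np.argmax(F.var(axis=0))]          # start from the most variable Y column
        t_old = np.zeros(len(X))
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)                  # X weights
            t = E @ w                               # X scores
            c = F.T @ t / (t @ t)                   # Y loadings
            u = F @ c / (c @ c)                     # Y scores
            if np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break
            t_old = t
        p = E.T @ t / (t @ t)                       # X loadings
        E = E - np.outer(t, p)                      # deflate X
        F = F - np.outer(t, c)                      # deflate Y
        W.append(w); P.append(p); C.append(c)
    W, P, C = np.array(W).T, np.array(P).T, np.array(C).T
    B = W @ np.linalg.solve(P.T @ W, C.T)           # Y_hat = (X - x_mean) B + y_mean
    return x_mean, y_mean, B

def pls2_predict(model, X_new):
    x_mean, y_mean, B = model
    return (X_new - x_mean) @ B + y_mean            # continuous values, not classes

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0.0, 1.0, (30, 4)), rng.normal(3.0, 1.0, (30, 4))])
y = np.array(["A"] * 30 + ["B"] * 30)
classes, Y = dummy_matrix(y)
model = pls2_nipals(X, Y, n_lv=2)
Y_hat = pls2_predict(model, X)
predicted = classes[Y_hat.argmax(axis=1)]           # quick check only
```

Note that `Y_hat` contains real numbers around 0 and 1, not class labels, which is why explicit conversion rules are needed.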

PLS-DA

The response of PLS-DA is still a number, not a class. Therefore, rules are needed to convert these numbers into classes:

1) Bayes' theorem (as in LDA)

2) The threshold is selected where the number of false positives and false negatives is minimized
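Rule 2) can be sketched by scanning candidate thresholds on the predicted values for one class and counting errors. The function below (hypothetical name) treats a sample as a predicted member when its prediction is at or above the threshold, and returns the first threshold with the fewest false positives plus false negatives; the Bayes-based rule 1) is not shown.

```python
import numpy as np

def best_threshold(y_pred, is_member):
    """Threshold on one class's PLS-DA predictions minimizing FP + FN."""
    y_pred = np.asarray(y_pred, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    candidates = np.unique(y_pred)                  # try every observed prediction value
    errors = []
    for thr in candidates:
        predicted_member = y_pred >= thr
        fp = np.sum(predicted_member & ~is_member)  # non-members accepted
        fn = np.sum(~predicted_member & is_member)  # members rejected
        errors.append(fp + fn)
    return candidates[int(np.argmin(errors))]

# Predicted values for class A (its dummy column) and the true membership
y_pred = [0.10, 0.20, 0.35, 0.60, 0.80, 0.90]
is_member = [False, False, False, True, True, True]
thr = best_threshold(y_pred, is_member)
```

In a real model the threshold would be chosen on calibration or cross-validation predictions, one class at a time.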

PLS-DA

For the rest, PLS-DA works like PLS:

- Cross-validation
- Selection of the number of LVs
- etc.

Assessing the models

Confusion matrix:

- TP: true positive
- FP: false positive
- FN: false negative
- TN: true negative

Assessing the models

Classification parameters are calculated from the TP, FP, FN and TN entries of the confusion matrix.
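For reference, the most common parameters derived from the confusion matrix for one class are sensitivity Sn = TP/(TP+FN), specificity Sp = TN/(TN+FP) and accuracy (TP+TN)/(TP+FP+FN+TN). The exact list of parameters on the original slide was lost in extraction, so this selection is an assumption, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """TP, FP, FN, TN for one class, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def classification_parameters(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),            # Sn: members correctly recognized
        "specificity": tn / (tn + fp),            # Sp: non-members correctly rejected
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

y_true = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "B", "B", "B", "A"]
tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="A")
params = classification_parameters(tp, fp, fn, tn)
```

For multi-class problems the same counts are computed once per class, treating that class as positive and all others as negative.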

Assessing the models

A ROC (receiver operating characteristic) curve is a graphical plot of sensitivity (Sn, y-axis) against 1 - specificity (1 - Sp, x-axis) for a binary classification system as its discrimination threshold is varied. ROC curves are used to select the best threshold on the classification score.
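A ROC curve can be traced by sweeping the threshold over the classification scores and recording Sn and 1 - Sp at each step. In the sketch below (hypothetical names), the "best" threshold is taken as the one maximizing Sn + Sp - 1 (Youden's index), which is one common choice rather than the only one, and the area under the curve is computed with the trapezoidal rule.

```python
import numpy as np

def roc_curve(scores, is_member):
    """Points (1 - Sp, Sn) for thresholds from strictest to most permissive."""
    scores = np.asarray(scores, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    thresholds = np.unique(scores)[::-1]
    n_pos, n_neg = is_member.sum(), (~is_member).sum()
    fpr, tpr = [0.0], [0.0]                     # start at (0, 0): nothing predicted
    for thr in thresholds:
        predicted = scores >= thr
        tpr.append(np.sum(predicted & is_member) / n_pos)    # Sn
        fpr.append(np.sum(predicted & ~is_member) / n_neg)   # 1 - Sp
    return np.array(fpr), np.array(tpr), thresholds

def area_under_curve(fpr, tpr):
    """Trapezoidal rule over the ROC points."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
is_member = [True, True, True, False, False, False]
fpr, tpr, thresholds = roc_curve(scores, is_member)
best = thresholds[np.argmax(tpr[1:] - fpr[1:])]   # Youden: max of Sn + Sp - 1
auc = area_under_curve(fpr, tpr)
```

Here the classes are perfectly separated, so the curve reaches the top-left corner at threshold 0.7 and the area under it is 1.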

Assessing the models

ALMONDS

PLASTICS

