
journal homepage: www.elsevier.com/locate/chemolab

Short Communication

A MATLAB toolbox for Self Organizing Maps and supervised neural network learning strategies

Davide Ballabio a,*, Mahdi Vasighi b

a Milano Chemometrics and QSAR Research Group, Department of Environmental Sciences, University of Milano Bicocca, Milano, Italy
b Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran

Article info

Article history:
Received 17 April 2012
Received in revised form 5 July 2012
Accepted 14 July 2012
Available online 22 July 2012

Keywords:
Self Organizing Maps
Supervised pattern recognition
Artificial Neural Networks
MATLAB
Kohonen maps

Abstract

Kohonen maps and Counterpropagation Neural Networks are two of the most popular learning strategies based on Artificial Neural Networks. Kohonen maps (or Self Organizing Maps) are self-organizing systems capable of solving unsupervised rather than supervised problems, while Counterpropagation Artificial Neural Networks are very similar to Kohonen maps, with an output layer added to the Kohonen layer in order to handle supervised modelling. Recently, modifications of Counterpropagation Artificial Neural Networks have allowed the introduction of new supervised neural network strategies, such as Supervised Kohonen Networks and XY-fused Networks.

In this paper, the Kohonen and CP-ANN toolbox for MATLAB is described. This is a collection of modules for calculating Kohonen maps and derived methods for supervised classification, such as Counterpropagation Artificial Neural Networks, Supervised Kohonen Networks and XY-fused Networks. The toolbox comprises a graphical user interface (GUI), which allows the calculation to be run in an easy-to-use graphical environment. It aims to be useful for both beginners and advanced users of MATLAB. The use of the toolbox is discussed here with an appropriate practical example.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Kohonen maps (or Self Organizing Maps, SOMs) are one of the most popular learning strategies among the several Artificial Neural Network algorithms proposed in the literature [1]. They are increasingly applied to several different tasks, and nowadays they can be considered an important tool in multivariate statistics [2]. Kohonen maps are self-organizing systems able to solve unsupervised rather than supervised problems. As a consequence, methods based on the Kohonen approach but combining characteristics of both supervised and unsupervised learning have been introduced. Counterpropagation Artificial Neural Networks (CP-ANNs) are very similar to Kohonen maps, since an output layer is added to the Kohonen layer [3]. When dealing with classification issues, CP-ANNs are generally efficacious methods for modelling classes separated by non-linear boundaries. Recently, modifications to CP-ANNs have led to the introduction of new supervised neural network strategies, such as Supervised Kohonen Networks (SKNs) and XY-fused Networks (XY-Fs) [4].

As a consequence of the increasing success of Self Organizing Maps, several toolboxes for calculating supervised and unsupervised SOMs have been proposed in the literature [5-8]. The Kohonen and CP-ANN toolbox allows the calculation of unsupervised Kohonen maps and of supervised classification models by means of CP-ANNs in an easy-to-use graphical user interface (GUI) environment [9]. Recently, several new features and algorithms (SKNs, XY-Fs, batch training, optimization of network settings by means of Genetic Algorithms) were introduced in the toolbox. This work presents the latest version of the Kohonen and CP-ANN toolbox, a collection of MATLAB modules freely available via the Internet (http://www.disat.unimib.it/chm) along with examples and a comprehensive user manual released as HTML files.

* Corresponding author at: Dept. of Environmental Sciences, University of Milano-Bicocca, P.zza della Scienza 1, 20126 Milano, Italy. Tel.: +39 02 6448 2801; fax: +39 02 6448 2839. E-mail address: davide.ballabio@unimib.it (D. Ballabio).

0169-7439/$ - see front matter © 2012 Elsevier B.V. All rights reserved.
doi:10.1016/j.chemolab.2012.07.005

2. Methodological background

2.1. Notation

Scalars are indicated by italic lower-case characters (e.g. x_ij) and vectors by bold lower-case characters (e.g. x). Two-dimensional arrays (matrices) are denoted as X (I × J), where I is the number of samples and J the number of variables. The ij-th element of the data matrix X is denoted as x_ij and represents the value of the j-th variable for the i-th sample.

2.2. Kohonen maps

The toolbox was developed following the algorithm described in the paper from Zupan, Novic and Ruisánchez [10]; only a brief description is given here, while further details can be found in the quoted paper.

D. Ballabio, M. Vasighi / Chemometrics and Intelligent Laboratory Systems 118 (2012) 24-32

The Kohonen map is usually characterized as a squared toroidal space consisting of a grid of N^2 neurons, where N is the number of neurons on each side of the space (Fig. 1a). Given a multivariate dataset composed of I samples described by J experimental variables, each neuron is associated with J weights, that is, it contains as many elements (weights) as the number of variables. The weights of each neuron are initialized between 0 and 1 and updated on the basis of the I samples for a certain number of times (termed training epochs). Kohonen maps can be trained by means of sequential or batch training algorithms [1].

When sequential training is adopted, in each training epoch samples are randomly introduced into the network, one at a time. For each sample (x_i), the most similar neuron (i.e. the winning neuron) is selected on the basis of the minimum Euclidean distance. Then, the weights of the r-th neuron (w_r) are changed as a function of the difference between their values and the values of the sample; this correction (Δw_r) is scaled according to the topological distance from the winning neuron (d_ri):

\Delta w_r = \eta \left( 1 - \frac{d_{ri}}{d_{max} + 1} \right) \left( x_i - w_r^{old} \right) \qquad (1)

where η is the learning rate and d_max the size of the considered neighbourhood, which decreases during the training phase. The topological distance d_ri is defined as the number of neurons between the considered neuron r and the winning neuron. The learning rate changes during the training phase as follows:

\eta(t) = \left( \eta_{start} - \eta_{final} \right) \left( 1 - \frac{t}{t_{tot}} \right) + \eta_{final} \qquad (2)

where t is the number of the current training epoch, t_tot is the total number of training epochs, and η_start and η_final are the learning rates at the beginning and at the end of the training, respectively.
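As an illustration only (the toolbox itself is written in MATLAB), Eqs. (1) and (2) can be sketched in a few lines of Python; the toroidal grid distance shown here is one reasonable reading of the topological distance defined above, not code from the toolbox:

```python
def learning_rate(t, t_tot, eta_start=0.5, eta_final=0.01):
    # Eq. (2): linear decrease from eta_start to eta_final over the epochs
    return (eta_start - eta_final) * (1 - t / t_tot) + eta_final

def topological_distance(r, winner, n):
    # Number of neurons between r and the winner on an n x n toroidal grid
    return max(min(abs(a - b), n - abs(a - b)) for a, b in zip(r, winner))

def sequential_update(w_r, x_i, d_ri, eta, d_max):
    # Eq. (1): move the weights of neuron r towards sample x_i,
    # scaled by the topological distance from the winning neuron
    factor = eta * (1 - d_ri / (d_max + 1))
    return [w + factor * (x - w) for w, x in zip(w_r, x_i)]
```

For the winning neuron itself (d_ri = 0) the correction reduces to η(x_i − w_r), and it shrinks as d_ri approaches the neighbourhood size d_max.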

When batch training is used, the whole set of samples is presented to the network and the winning neurons are found; after this, the weights are updated on the basis of the effect of all the samples at the same time:

w_r = \frac{\sum_{i=1}^{I} u_{ir} x_i}{\sum_{i=1}^{I} u_{ir}} \qquad (3)

where wr are the updated weights of the r-th neuron, xi is the i-th

sample, and uir is the weighting factor of the winning neuron related

to the i-th sample with respect to neuron r:

u_{ir} = \eta \left( 1 - \frac{d_{ri}}{d_{max} + 1} \right) \qquad (4)

where η, d_max and d_ri are defined as before (see Eq. (1)).
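A compact Python sketch of the batch update of Eqs. (3) and (4) (again, illustrative only; clamping negative u_ir to zero for neurons far outside the neighbourhood is an assumption):

```python
def batch_update(samples, winners, r, n, eta, d_max):
    # Eq. (3): weights of neuron r as a weighted average of all samples,
    # with weighting factors u_ir given by Eq. (4)
    def grid_distance(a, b):
        return max(min(abs(p - q), n - abs(p - q)) for p, q in zip(a, b))

    J = len(samples[0])
    num, den = [0.0] * J, 0.0
    for x_i, win in zip(samples, winners):
        u_ir = eta * (1 - grid_distance(r, win) / (d_max + 1))  # Eq. (4)
        u_ir = max(u_ir, 0.0)  # assumed: no influence outside the neighbourhood
        den += u_ir
        num = [s + u_ir * x for s, x in zip(num, x_i)]
    return [s / den for s in num] if den > 0 else None
```

Note that a constant η cancels in the ratio of Eq. (3); it matters only through the neighbourhood cut-off.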

At the end of the network training, samples are placed in the most similar neurons of the Kohonen map; in this way, the data structure can be visualized, and the role of the experimental variables in defining the data structure can be elucidated by looking at the Kohonen weights.

Fig. 1. Structures of Kohonen maps and related methods (CP-ANNs, SKNs, and XY-Fs) for a generic dataset constituted by J variables and G classes. Notation in the figure refers to notation used in the text: x_ij represents the value of the j-th variable for the i-th sample, w_rj represents the value of the j-th Kohonen weight for the r-th neuron, c_ig represents the membership of the i-th sample to the g-th class expressed with a binary code, and y_rg represents the value of the g-th output weight for the r-th neuron.


2.3. Counterpropagation Artificial Neural Networks

Counterpropagation Artificial Neural Networks (CP-ANNs) are modelling methods which combine features from both supervised and unsupervised learning [10]. CP-ANNs consist of two layers, a Kohonen layer and an output layer, whose neurons have as many weights as the number of classes to be modelled (Fig. 1b). The class vector is used to define a matrix C, with I rows and G columns, where I is the number of samples and G the total number of classes; each entry c_ig of C represents the membership of the i-th sample to the g-th class, expressed with a binary code (0 or 1).

When sequential training is adopted, the weights of the r-th neuron in the output layer (y_r) are updated in a supervised manner on the basis of the winning neuron selected in the Kohonen layer. Considering the class of each sample i, the update is calculated as follows:

\Delta y_r = \eta \left( 1 - \frac{d_{ri}}{d_{max} + 1} \right) \left( c_i - y_r^{old} \right) \qquad (5)

where η and d_max are defined as before, and d_ri is the topological distance between the considered neuron r and the winning neuron selected in the Kohonen layer; c_i is the i-th row of the unfolded class matrix C, that is, a G-dimensional binary vector representing the class membership of the i-th sample.

On the other hand, if the batch training is used, the weights of the

output layer are changed following the same algorithm shown in the

previous paragraph (see Eqs. (3) and (4)).

At the end of the network training, each neuron of the Kohonen

layer can be assigned to a class on the basis of the output weights

and all the samples placed in that neuron are automatically assigned

to the corresponding class.
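The class unfolding and the supervised update of Eq. (5) can be sketched as follows (Python used purely for illustration; the winner-takes-all neuron assignment shown here is just one plausible criterion among the several offered by the toolbox):

```python
def class_matrix(labels, G):
    # Unfold integer class labels (1..G) into the binary matrix C
    return [[1 if g == lab else 0 for g in range(1, G + 1)] for lab in labels]

def output_update(y_r, c_i, d_ri, eta, d_max):
    # Eq. (5): move the output weights of neuron r towards the binary
    # class vector of the sample that won in the Kohonen layer
    factor = eta * (1 - d_ri / (d_max + 1))
    return [y + factor * (c - y) for y, c in zip(y_r, c_i)]

def assign_neuron(y_r):
    # Assumed criterion: assign the neuron to the class whose
    # output weight is largest
    return max(range(len(y_r)), key=lambda g: y_r[g]) + 1
```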

2.4. XY-fused Networks

XY-fused Networks (XY-Fs) are supervised neural networks for building classification models, derived from Kohonen Maps (Fig. 1c). In XY-fused Networks, the winning neuron is selected by calculating the Euclidean distances between a) the sample (x_i) and the weights of the Kohonen layer, and b) the class membership vector (c_i) and the weights of the output layer. These two Euclidean distances are then combined into a fused similarity, which is used to find the winning neuron. The influence of the distances calculated on the Kohonen layer decreases linearly during the training epochs, while the influence of the distances calculated on the output layer increases. Details on XY-fused Networks can be found in the paper from Melssen, Wehrens and Buydens [4].
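The winner selection just described can be sketched as a fused distance (Python for illustration; the exact weighting schedule is given in [4], and the linear α(t) below is an assumption consistent with the description above):

```python
import math

def fused_distance(x_i, w_r, c_i, y_r, t, t_tot):
    # Blend of the two Euclidean distances: the Kohonen-layer term
    # dominates early in training, the output-layer term at the end
    alpha = 1 - t / t_tot  # assumed linear decrease over the epochs
    return alpha * math.dist(x_i, w_r) + (1 - alpha) * math.dist(c_i, y_r)
```

The winning neuron is then the one minimizing this fused distance over the whole map.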

2.5. Supervised Kohonen Networks (SKNs)

Like CP-ANNs and XY-Fs, Supervised Kohonen Networks (SKNs) are supervised neural networks derived from Kohonen Maps (Fig. 1d). In Supervised Kohonen Networks, the Kohonen and output layers are glued together to give a combined layer that is updated according to the training scheme of Kohonen maps. Each sample (x_i) and its corresponding class vector (c_i) are combined together and act as input for the network. In order to achieve classification models with good predictive performance, x_i and c_i must be scaled properly. Therefore, a scaling coefficient for c_i is introduced for tuning the influence of the class vector in the model calculation. Details on SKNs can be found in the paper from Melssen, Wehrens and Buydens [4].
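In code, the SKN input is simply a concatenation (illustrative Python sketch; the default scaling coefficient of 1 mirrors the scalar setting listed in Table 1):

```python
def skn_input(x_i, c_i, scalar=1.0):
    # SKN training input: sample and scaled class vector are glued
    # together and fed to the single combined layer
    return list(x_i) + [scalar * c for c in c_i]
```

A scalar below 1 weakens the pull of the class information on the winner selection; a scalar above 1 strengthens it.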

3. Main features of the Kohonen and CP-ANN toolbox

The toolbox was initially developed under MATLAB 6.5 (MathWorks), but it is compatible with the latest releases of MATLAB. The collection of functions and algorithms is provided as MATLAB source files, with no requirement for any third-party utilities beyond the standard MATLAB installation; the files just need to be copied into a folder. Model calculation can be performed both via the MATLAB command window and via a graphical user interface, which enables the user to perform all the analysis steps.

3.1. Input data

Data must be structured as a numerical matrix with dimensions I × J, where I is the number of samples and J the number of variables. When dealing with supervised classification, the class vector must be prepared as a numerical column vector (I × 1), where the i-th element represents the class label of the i-th sample. If G classes are present, class labels must be integer numbers ranging from 1 to G. Note that 0 values are not allowed as class labels.

Data sets with missing values can be handled by the toolbox. Basically, missing values (and the corresponding values of the neuron weights) are not considered when calculating the Euclidean distances to find the closest neuron and when updating the neuron weights.
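The missing-value handling described above can be sketched as a masked Euclidean distance (Python for illustration; using None to mark a missing entry is an assumption about the encoding, not the toolbox's convention):

```python
import math

def masked_distance(x_i, w_r):
    # Euclidean distance that skips variables with missing values,
    # so incomplete samples can still be placed in the map
    terms = [(x - w) ** 2 for x, w in zip(x_i, w_r) if x is not None]
    return math.sqrt(sum(terms))
```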

3.2. Network settings

Kohonen Maps have adaptable parameters that must be chosen prior to calculation. Network settings can be defined in the GUI or via the MATLAB command window by means of the som_setting function, and can be stored in a MATLAB data structure. Each field of this structure defines a specific setting for the network. All the available settings are listed in Table 1.

The network size (nsize) defines the number of neurons on each side of the map. If the number of neurons on each side is set to N, the total number of neurons will be N^2. The number of epochs (epochs) is the number of times each sample is introduced into the network. The boundary condition (bound) defines whether the space of the Kohonen map is toroidal or not.

Table 1
Network settings available in the toolbox.

Setting        | Description                                                                   | Possible values                                         | Default
net_type       | Type of network                                                               | -                                                       | NaN
nsize          | Number of neurons for each side of the map                                    | Any integer number greater than zero                    | NaN
epochs         | Number of training epochs                                                     | Any integer number greater than zero                    | NaN
topol          | Topology condition                                                            | square, hexagonal                                       | square
bound          | Boundary condition                                                            | toroidal, normal                                        | toroidal
training       | Training algorithm                                                            | batch, sequential                                       | batch
init           | Initialization of weights                                                     | random, eigen                                           | random
a_max          | Initial learning rate                                                         | Any real number between 0.9 and 0.1                     | 0.5
a_min          | Final learning rate                                                           | Any real number between 0 and the initial learning rate | 0.01
scaling        | Data scaling (prior to automatic range scaling)                               | none, centering, variance scaling, auto scaling         | none
absolute_range | Type of automatic range scaling                                               | classical, absolute                                     | classical
ass_meth       | Neuron assignment criterion (only for CP-ANNs, SKNs and XY-Fs)                | Four different criteria (1, 2, 3 or 4)                  | 1
scalar         | Scaling coefficient for tuning the effect of the class vector (only for SKNs) | Any real number greater than 0                          | 1


Table 2
MATLAB routines of the toolbox related to the calculation of Kohonen maps and their main outputs. For each routine, outputs are collected as fields of a unique MATLAB structure.

MATLAB routine | Outputs  | Description
model_kohonen  | W        | Kohonen weights stored in a three-way data matrix with dimensions N × N × J, where N is the number of neurons on each side of the map and J is the number of variables
               | settings | Settings used for building the model
               | scal     | Structure with scaling parameters
               | top_map  | Coordinates of the samples in the Kohonen top map
pred_kohonen   | top_map  | Coordinates of the predicted samples in the Kohonen top map

The topology condition (topol) defines the shape of each neuron (square or hexagonal). The training algorithm can be defined by the field training; sequential and batch training algorithms are available.

Learning rates (η_start and η_final) can be modified by changing the values of a_max and a_min, respectively. The values of η_start and η_final are set by default to 0.5 and 0.01, respectively, as suggested in the literature [10]. When dealing with supervised classification, the user can also define a criterion for assigning neurons to the classes on the basis of their output weights (ass_meth) [9]. Initialization of the Kohonen weights can be defined by the field init: weights can be initialized either randomly (between 0.1 and 0.9) or on the basis of the eigenvectors corresponding to the two largest principal components of the dataset [1]. In this second case, the weights are always initialized to the same values. Therefore, when the eigenvector-based initialization is coupled with the batch training algorithm, the final weights are always the same, since both random initialization and random introduction of samples into the Kohonen map are avoided. When dealing with Supervised Kohonen Networks (SKNs), the scaling coefficient for tuning the effect of the class vector can be defined in the scalar field. This scaling coefficient is set by default to 1.

Regarding data scaling, it must be noted that variables are always range scaled between 0 and 1, in order to be comparable with the network weights [10]. The range scaling can be performed separately on each column (variable) of the dataset or by using the maximum and minimum values of the entire dataset (absolute_range). This second option can be used when all the variables are defined on the same scale, such as for profiles and spectral data. Moreover, the user can define an additional data pretreatment (scaling), to be applied prior to the automatic range scaling.

3.3. Optimization of the network architecture

Kohonen Maps require an optimization step in order to choose the most suitable network architecture. When dealing with classification models, CP-ANNs, SKNs, and XY-Fs require the selection of appropriate numbers of neurons and training epochs in order to make accurate predictions. The relationship between architecture and network performance cannot be easily established and depends on many parameters, such as the number of samples and their distribution in the data space. The search for the best architecture is usually performed by heuristic methods; in fact, one of the major disadvantages of these multivariate statistical models is probably related to network optimization, since this procedure suffers from some arbitrariness and can be time-expensive in some cases. Recently, a new strategy for the selection of the optimal number of neurons and training epochs was proposed [11]. This strategy exploits the ability of Genetic Algorithms to optimize network parameters [12-15]; details on this approach can be found in the quoted paper. This strategy for optimizing the network architecture has been introduced in the toolbox and can be run both via the graphical user interface and via the MATLAB command window. Once the optimization has been performed, the optimization results can be easily saved, loaded and analyzed in the graphical user interface. Details on how to perform the optimization are given in the section describing the illustrative example of analysis.

Table 3
MATLAB routines of the toolbox related to the calculation of CP-ANNs, SKNs and XY-Fs and their main outputs. For each routine, outputs are collected as fields of a unique MATLAB structure.

MATLAB routine                    | Description                                    | Outputs       | Description
model_cpann, model_skn, model_xyf | Calculation of CP-ANN, SKN and XY-F models     | W             | Kohonen weights stored in a 3-way data matrix with dimensions N × N × J, where N is the number of neurons on each side of the map and J is the number of variables
                                  |                                                | W_out         | Output weights stored in a 3-way data matrix with dimensions N × N × G, where N is the number of neurons on each side of the map and G is the number of classes
                                  |                                                | neuron_ass    | Vector with neuron assignments
                                  |                                                | settings      | Settings used for building the model
                                  |                                                | scal          | Structure with scaling parameters
                                  |                                                | class_true    | True class vector
                                  |                                                | class_calc    | Calculated class vector
                                  |                                                | class_weights | Output weights associated to samples
                                  |                                                | top_map       | Coordinates of the samples in the Kohonen top map
                                  |                                                | class_param   | Structure containing classification parameters (confusion matrix, error rate, specificity, sensitivity and precision)
cv_cpann, cv_skn, cv_xyf          | Cross validation of CP-ANN, SKN and XY-F models | settings      | Settings used for cross validating the model
                                  |                                                | class_true    | True class vector
                                  |                                                | class_pred    | Class vector calculated in cross validation
                                  |                                                | class_weights | Output weights associated to samples in cross validation
                                  |                                                | class_param   | Structure containing cross validated classification parameters
pred_cpann, pred_skn, pred_xyf    | Prediction with CP-ANN, SKN and XY-F models    | class_pred    | Predicted class vector
                                  |                                                | class_weights | Output weights associated to samples in prediction
                                  |                                                | top_map       | Coordinates of the predicted samples in the Kohonen top map


3.4. Calculating models via the MATLAB command window

Once data have been prepared and the settings have been defined, the user can easily calculate the Kohonen network by using the model_kohonen function via the MATLAB command window. The output of the routine is a structure with several fields containing all the results (Table 2).

Supervised classification models can be calculated by using CP-ANNs, SKNs, or XY-Fs via the MATLAB command window. The MATLAB functions associated with these methods are listed in Table 3. The output of these functions is a structure where the results concerning the output layer and the indices describing the classification performance are stored together with the results concerning the Kohonen layer (Table 3). In particular, the output weights are stored in a three-way data matrix with dimensions N × N × G, where G is the number of modelled classes. The assignment of each neuron is saved, as well as the consequent assignment of each sample placed in the neuron. Finally, the confusion matrix is provided. This is a square matrix with dimensions G × G, where each entry n_gk represents the number of samples belonging to class g and assigned to class k. The best-known classification indices, such as error rate, non-error rate, specificity, sensitivity, precision and the ratio of not-assigned samples, are derived from the confusion matrix [16].
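A Python sketch of how such indices follow from the confusion matrix (illustration only; the class-wise definitions below use the usual TP/FP/TN/FN conventions, and non-error rate is taken simply as the fraction of correctly classified samples):

```python
def classification_indices(cm):
    # cm[g][k]: number of samples of class g assigned to class k
    G = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[g][g] for g in range(G))
    out = {"non_error_rate": correct / total, "error_rate": 1 - correct / total}
    for g in range(G):
        tp = cm[g][g]
        fn = sum(cm[g]) - tp                       # class g sent elsewhere
        fp = sum(cm[k][g] for k in range(G)) - tp  # others assigned to g
        tn = total - tp - fn - fp
        out[f"sensitivity_{g + 1}"] = tp / (tp + fn)
        out[f"specificity_{g + 1}"] = tn / (tn + fp)
        out[f"precision_{g + 1}"] = tp / (tp + fp)
    return out
```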

Cross-validation can be performed by means of the functions listed in Table 3, by choosing the number of cancellation groups and the cross-validation method for separating the samples into cancellation groups (venetian blinds or contiguous blocks). The output of these routines is a MATLAB structure containing the confusion matrix and the derived classification indices calculated in cross-validation.

Unknown or test samples can be predicted by using an existing model: new samples are compared with the trained Kohonen weights, placed in the closest neuron and assigned to the corresponding class. This calculation can be made in the toolbox by means of the prediction functions listed in Table 3, which return the predicted classes of the new samples.

Fig. 3. Kohonen and CP-ANN toolbox: interactive graphical interface for visualizing the Kohonen top map.

Fig. 4. Kohonen and CP-ANN toolbox: interactive graphical interface for visualizing the optimization results.

3.5. Calculating models via the graphical user interface

The following command must be executed at the MATLAB prompt to run the graphical interface (Fig. 2):

>> model_gui

The user can load data, sample and variable labels, and (when dealing with supervised classification) the class vector, both from the MATLAB workspace and from MATLAB files. Then, in addition to basic operations such as looking at the data and plotting variable means and sample profiles, all the calculation steps described in the previous paragraphs can be easily performed in the graphical interface.

Optimization of the network structure, to choose the optimal numbers of epochs and neurons, can be performed directly in a dedicated window. Once the user has decided how to set the network, the settings and parameters for cross-validation can be defined in another window, where basic and advanced settings are separated in order to facilitate practitioners who are not skilled with SOMs.

Once a model has been calculated, the results of the optimization step, models, settings and cross-validation results can be exported to the MATLAB workspace. Saved models can be easily loaded in the toolbox for future analyses, and new samples can be loaded and predictions calculated on the basis of previously calculated models.

When dealing with supervised classification, the user can graphically evaluate the indices for classification diagnostics (confusion matrix, error rate, non-error rate, specificity, sensitivity, purity) and analyze ROC (Receiver Operating Characteristic) curves. These are graphical tools for the analysis of classification results and describe the degree of separation of the classes. ROC curves are plots of 1 − Specificity (also known as the False Positive Rate, FPR) on the x axis versus Sensitivity (also known as the True Positive Rate, TPR) on the y axis for a binary classification system as its discrimination threshold is changed. In this toolbox, ROC curves are calculated separately for each class, by changing the threshold of assignment over the output weights from 0 to 1.
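The threshold sweep can be sketched as follows (Python for illustration; the output weights and class memberships fed to the function are hypothetical inputs, and the 0.1-step grid of thresholds is an assumption):

```python
def roc_points(output_weights, in_class):
    # Sweep the assignment threshold over the output weights from 0 to 1
    # and collect one (FPR, TPR) point per threshold
    pos = sum(in_class)
    neg = len(in_class) - pos
    points = []
    for step in range(11):
        thr = step / 10
        tp = sum(1 for w, c in zip(output_weights, in_class) if w >= thr and c)
        fp = sum(1 for w, c in zip(output_weights, in_class) if w >= thr and not c)
        points.append((fp / neg, tp / pos))
    return points
```

A class that is well separated yields points hugging the top left corner (FPR near 0, TPR near 1) over a wide range of thresholds.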

Once the model has been calculated, the Kohonen top map can be visualized in the toolbox graphical interface (Fig. 3). The Kohonen top map represents the space defined by the neurons where the samples are placed, and allows visual investigation of the data structure by analyzing the sample positions and their relationships. Samples are visualized by randomly scattering their positions within each neuron space; by means of the update button, it is possible to move the sample positions within the neuron. Samples can be labelled with different strings: identification numbers, class labels in the case of supervised classification, or user-defined labels. Moreover, the map can be shifted if the chosen boundary condition is toroidal, in order to optimize the map visualization.

The influence of variables in describing the data can be evaluated by coloring the neurons on the basis of the Kohonen weights, by means of the Display weights list. In this way, neurons are colored from white (weight equal to zero, minimum value) to black (weight equal to 1, maximum value). Therefore, one can evaluate whether the considered variable has a direct relationship with the sample distribution in the space of the top map. Moreover, both the Kohonen and output weights of a selected neuron can be displayed by means of the get neuron weights button.

However, the analysis of the Kohonen top map only allows plotting all the weights for a specific neuron or all the neurons for a specific weight; that is, all the available information cannot be plotted at the same time. When dealing with complex data, high-dimensional spaces

Table 4
Example of analysis: some of the indices calculated by the toolbox and used for classification diagnostics. Error rate, non-error rate, specificity, sensitivity and precision obtained in fitting, cross-validation (10 cancellation groups) and on the external test set of samples are shown.

Classification parameter | Fitting | Cross-validation | External test set
Non-error rate           | 0.97    | 0.97             | 0.96
Error rate               | 0.03    | 0.03             | 0.04
Precision of class 1     | 0.99    | 0.99             | 0.98
Precision of class 2     | 0.94    | 0.94             | 0.91
Sensitivity of class 1   | 0.97    | 0.97             | 0.95
Sensitivity of class 2   | 0.98    | 0.98             | 0.97
Specificity of class 1   | 0.98    | 0.98             | 0.97
Specificity of class 2   | 0.97    | 0.97             | 0.95


Fig. 5. Example of analysis: a) variable profile for each class produced by the toolbox; in this plot, the average of the Kohonen weights of each variable, calculated on the neurons assigned to each class, is shown; b) plot of ROC curves produced by the toolbox.

are common; in these cases, it is not easy to interpret the data with a simple visual approach. For this reason, the toolbox allows the calculation of Principal Component Analysis (PCA) on the Kohonen weights, in order to investigate the relationships between variables and classes in a global way, and not one variable at a time [17]. A GUI for calculating PCA on the Kohonen weights is provided in the toolbox. Details on its use are given in the section describing the illustrative example of analysis.
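The idea can be sketched in Python (illustration only): the N × N × J weights are unfolded into an N² × J matrix, and the leading principal component is then extracted, here with a minimal power iteration as a stand-in for a full PCA routine:

```python
def unfold_weights(W):
    # N x N x J nested lists -> N^2 x J matrix, one row per neuron
    return [neuron for row in W for neuron in row]

def first_component(X, iters=200):
    # Leading PCA loading of the unfolded weights via power iteration
    # on the mean-centred data (a stand-in for a full PCA)
    J = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(J)]
    Xc = [[row[j] - means[j] for j in range(J)] for row in X]
    v = [1.0] * J
    for _ in range(iters):
        scores = [sum(r[j] * v[j] for j in range(J)) for r in Xc]
        v = [sum(Xc[i][j] * scores[i] for i in range(len(Xc))) for j in range(J)]
        norm = sum(a * a for a in v) ** 0.5
        v = [a / norm for a in v]
    return v
```

Projecting the unfolded neuron rows onto the resulting loadings gives the global variable-class picture that the one-weight-at-a-time coloring cannot.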

4. Example of analysis

4.1. Dataset

This example uses the Breast Cancer dataset, a real benchmark dataset for classification [18]. The dataset consists of 699 samples divided into 2 classes, class 1 being Benign (458 samples) and class 2 Malignant (241 samples). Samples are described by 9 variables (Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses), which take on discrete values in the range 1 to 10. Kohonen maps are not directly treated here, since they are implicitly calculated as the Kohonen layer of CP-ANNs.

25% of the samples were randomly extracted and used as an external test set, maintaining the class proportions, that is, the number of test samples of each class was proportional to the number of training samples of that class. The training samples were used to optimize the network architecture and to build and cross-validate the CP-ANN classification model. The external test samples were used only to evaluate the predictive ability of the final CP-ANN model.
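The stratified extraction of the test set can be sketched as follows (Python for illustration; the per-class rounding rule and the fixed seed are assumptions, not the toolbox's procedure):

```python
import random

def stratified_split(labels, test_fraction=0.25, seed=0):
    # Extract a random test set keeping class proportions: each class
    # contributes a share of its samples proportional to test_fraction
    rng = random.Random(seed)
    test = set()
    for g in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == g]
        rng.shuffle(idx)
        test.update(idx[:round(len(idx) * test_fraction)])
    train = [i for i in range(len(labels)) if i not in test]
    return train, sorted(test)
```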

The optimal numbers of neurons and epochs were calculated by means of Genetic Algorithms, as previously explained. The optimization results can be easily analyzed in the graphical user interface (Fig. 4). Each bubble represents a network architecture. The dimension of each bubble is proportional to the network size, that is, the number of neurons. The color of each bubble is proportional to the number of epochs, that is, the darker the bubble, the higher the number of epochs used to train the network. This plot enables a qualitative interpretation of the results: architectures placed in the upper right part of the plot are appropriate, since they are characterized by high relative frequencies of selection by the Genetic Algorithms and high predictive performance [11]. As a consequence, the architectures placed at the top right limit of the plot can be considered the most suitable ones, such as the architecture marked in red in Fig. 4, representing a neural network with 4 × 4 neurons and 250 epochs. The list of all the represented architectures, with their numbers of neurons and epochs, frequency of selection in the GA runs and average fitness function, can be seen by clicking the view results in table button. By clicking the select button, it is possible to select a specific bubble (architecture) in the plot and see its corresponding numbers of epochs and neurons, frequency of selection and value of the fitness function.

4.2. Calculation of the classification model

On the basis of the optimization results obtained by means of Genetic Algorithms, the numbers of neurons and epochs were set to 4 × 4 and 250, respectively. In Table 4, the classification indices

Fig. 6. Example of analysis: a) Kohonen top map produced by the toolbox. In the top map, each sample is labelled on the basis of its class. Each neuron is colored on a gray scale on the basis of the Kohonen weight of variable 2 (Uniformity of Cell Size): white corresponds to a Kohonen weight equal to 0, black to a Kohonen weight equal to 1; b) profile of the Kohonen weights of one of the neurons where samples of class 2 (Malignant) were placed.


refer both to tting and cross-validation, executed with 10 cancellation groups selected by venetian blinds. These classication indices

can be accessed by clicking on the classication results button in

the toolbox main form, as well as the plot of Kohonen weight averages for each class (class prole) and ROC curves. In these plots, it is

possible to see that a) class 2 (Malignant) is characterised by higher

values on all the considered variables (Fig. 5a); b) the degree of separation between the two classes is high in the ROC curves (Fig. 5b). Finally, the model can be saved in the MATLAB workspace and later

loaded in the toolbox to predict new sets of samples. This was made

on the external test samples of the data set in analysis. In Table 4,

the classication indices calculated on the external test set are shown.
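The venetian blinds scheme with 10 cancellation groups assigns samples to groups cyclically by row order, so each group samples the data set evenly. A minimal sketch of the group assignment (illustrative, not the toolbox's MATLAB code):

```python
# Venetian blinds cross-validation: with n_groups cancellation groups,
# sample i is assigned to group i mod n_groups, so the groups interleave
# through the data like the slats of a venetian blind.
def venetian_blinds(n_samples, n_groups=10):
    return [i % n_groups for i in range(n_samples)]

groups = venetian_blinds(25, 5)
# Group 0 holds samples 0, 5, 10, 15, 20; in each cross-validation cycle
# one group is left out, the model is rebuilt on the remaining samples,
# and the excluded samples are predicted.
print(groups)
```

Note that this scheme assumes the samples are not sorted in a way that correlates with class; otherwise a different group selection (e.g. random) may be preferable.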

4.3. Interpreting the results with the graphical interface

The classification indices provided by the toolbox can help the user to evaluate the overall classification performance, but it is important to gain an insight into the model by interpreting the relationships between samples and variables. This can be done by analyzing the Kohonen top map, where samples are projected in order to evaluate the data structure, while variable importance can be analyzed by coloring the neurons on the basis of the neuron weights, which always lie between 0 and 1.
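Because the weights already lie in [0, 1], mapping a weight to a gray level is direct. A sketch of the shading rule used in the top-map plots (illustrative; the toolbox's own plotting code is in MATLAB):

```python
# Map a Kohonen weight in [0, 1] to an 8-bit gray level:
# weight 0 -> white (255, 255, 255), weight 1 -> black (0, 0, 0),
# matching the convention used in the top-map figures.
def weight_to_gray(w):
    w = min(max(w, 0.0), 1.0)        # clip to [0, 1] for safety
    level = round(255 * (1.0 - w))   # invert: high weight = dark
    return (level, level, level)

print(weight_to_gray(0.0))  # (255, 255, 255)
print(weight_to_gray(1.0))  # (0, 0, 0)
```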

As an example, the top map of the calculated model (4 × 4 neurons and 250 epochs) is shown in Fig. 6a. In the top map, each sample is labelled on the basis of its class, while neurons are colored on the basis of the Kohonen weight of variable 2 (Uniformity of Cell Size), going from low values (white) to high values (black). It is reasonably easy to see that variable 2 discriminates samples belonging to class 1 (Benign) from those of class 2 (Malignant), which are placed in neurons with higher weights. On the other hand, the user can plot all the Kohonen and output weights of a selected neuron. The profile of Kohonen weights of one of the neurons where class 2 samples are placed is shown in Fig. 6b. However, in this way it is not possible to have a comprehensive insight into the relationships between variables and samples. For this reason, a tool for calculating PCA on the Kohonen weights is provided in the graphical user interface of the toolbox. In Fig. 7, the score and loading plots of the first two components (explaining together 74% of the total information) are shown. In the score plot (Fig. 7a), each point represents a neuron of the previous CP-ANN model. Each neuron is colored with a gray scale on the basis of the output weight of class 2: the larger the value of the output weight, the higher the probability that the neuron belongs to class 2 and the darker the color. The majority of neurons assigned to class 2 are placed on the left side of the score plot. Thus, by comparing the score and loading plots, one can evaluate how variables characterize the classes. All variables are placed on the left of the loading plot (Fig. 7b); the variables are therefore directly correlated with samples belonging to class 2 (Malignant), that is, samples of class 2 are characterized by higher values of all the considered variables.
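The PCA of the Kohonen weights can also be reproduced outside the GUI. The following numpy sketch, run on a hypothetical weight matrix (rows = neurons, columns = variables), shows the standard computation via singular value decomposition; it is an illustration, not the toolbox's implementation.

```python
import numpy as np

# PCA on a (neurons x variables) Kohonen weight matrix via SVD.
# The scores place each neuron in the component space (cf. Fig. 7a);
# the loadings show how the variables contribute (cf. Fig. 7b).
rng = np.random.default_rng(0)
W = rng.random((16, 9))              # hypothetical 4 x 4 map, 9 variables

Wc = W - W.mean(axis=0)              # mean-center the weights
U, s, Vt = np.linalg.svd(Wc, full_matrices=False)

scores = U * s                       # one row per neuron
loadings = Vt.T                      # one row per variable
explained = s**2 / np.sum(s**2)      # fraction of variance per component

print(scores.shape, loadings.shape)  # (16, 9) (9, 9)
print(float(explained[:2].sum()))    # variance captured by PC1 + PC2
```

Coloring each score point by the corresponding neuron's output weight for a class, as the toolbox does, then links the PCA projection back to the class structure of the map.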

5. Independent testing

Dr. Federico Marini, at the Chemistry Department, Università di Roma "La Sapienza", P.le Aldo Moro 5, I-00185 Rome, Italy, has informed us that he has tested the described software and found that it appears to function as the Authors described.

6. Conclusion

The Kohonen and CP-ANN toolbox for MATLAB is a collection of modules for calculating Self Organizing Maps (Kohonen maps) and derived methods for supervised classification, such as Counterpropagation Artificial Neural Networks (CP-ANNs), Supervised Kohonen Networks (SKNs) and XY-fused Networks (XY-Fs).

Fig. 7. Example of analysis: a) score plot of the first two principal components calculated on the Kohonen weights. Each neuron is colored with a gray scale on the basis of the output weight corresponding to class 2 (malignant): white corresponds to an output weight equal to 0, black to output weights equal to 1; b) loading plot of the first two principal components calculated on the Kohonen weights. Each variable is labelled with its identification number.

The toolbox is available from the Milano Chemometrics and QSAR Research Group website (http://www.disat.unimib.it/chm). It aims to be useful for both beginners and advanced users of MATLAB. For this reason, examples and a comprehensive user manual are provided with the toolbox.

The toolbox comprises a graphical user interface (GUI), which allows the calculations to be carried out in an easy-to-use graphical environment. In the GUI, all the analysis steps (data loading, model settings, optimization, calculation, cross-validation, prediction and results visualization) can be easily performed.

References

[1] T. Kohonen, Self-Organization and Associative Memory, Springer Verlag, Berlin, 1988.
[2] F. Marini, Analytica Chimica Acta 635 (2009) 121–131.
[3] J. Zupan, M. Novič, J. Gasteiger, Chemometrics and Intelligent Laboratory Systems 27 (1995) 175–187.
[4] W. Melssen, R. Wehrens, L. Buydens, Chemometrics and Intelligent Laboratory Systems 83 (2006) 99–113.
[5] J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, SOM Toolbox for Matlab 5, Technical Report A57, Helsinki University of Technology, 2000.
[6] M. Schmuker, F. Schwarte, A. Brück, E. Proschak, E. Tanrikulu, A. Givehchi, K. Scheiffele, G. Schneider, Journal of Molecular Modeling 13 (2007) 225–228.
[7] I. Kuzmanovski, M. Novič, Chemometrics and Intelligent Laboratory Systems 90 (2008) 84–91.


[8] … 167–173.

[9] D. Ballabio, V. Consonni, R. Todeschini, Chemometrics and Intelligent Laboratory Systems 98 (2009) 115–122.
[10] J. Zupan, M. Novič, I. Ruisánchez, Chemometrics and Intelligent Laboratory Systems 38 (1997) 1–23.
[11] D. Ballabio, M. Vasighi, V. Consonni, M. Kompany-Zareh, Chemometrics and Intelligent Laboratory Systems 105 (2011) 56–64.
[12] I. Kuzmanovski, S. Dimitrovska-Lazova, S. Aleksovska, Analytica Chimica Acta 595 (2007) 182–189.
[13] I. Kuzmanovski, M. Trpkovska, B. Šoptrajanov, Journal of Molecular Structure 744–747 (2005) 833–838.
[14] D. Polani, On the optimisation of self-organising maps by genetic algorithms, in: Kohonen Maps, Elsevier, Amsterdam, 1999.
[15] I. Kuzmanovski, M. Novič, M. Trpkovska, Analytica Chimica Acta 642 (2009) 142–147.
[16] D. Ballabio, R. Todeschini, Multivariate classification for qualitative analysis, in: Infrared Spectroscopy for Food Quality Analysis and Control, Elsevier, 2008.
[17] D. Ballabio, R. Kokkinofta, R. Todeschini, C.R. Theocharis, Chemometrics and Intelligent Laboratory Systems 87 (2007) 78–84.
[18] W.H. Wolberg, O.L. Mangasarian, Proceedings of the National Academy of Sciences of the United States of America 87 (1990) 9193–9196.
