

**Chemometrics and Intelligent Laboratory Systems**


**A MATLAB toolbox for Self Organizing Maps and supervised neural network learning strategies**

Davide Ballabio a,⁎, Mahdi Vasighi b

a Milano Chemometrics and QSAR Research Group, Department of Environmental Sciences, University of Milano-Bicocca, Milano, Italy
b Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran

Abstract

Kohonen maps and Counterpropagation Neural Networks are two of the most popular learning strategies based on Artificial Neural Networks. Kohonen maps (or Self Organizing Maps) are self-organizing systems capable of solving unsupervised rather than supervised problems, while Counterpropagation Artificial Neural Networks are very similar to Kohonen maps, but an output layer is added to the Kohonen layer in order to handle supervised modelling. Recently, modifications of Counterpropagation Artificial Neural Networks have led to new supervised neural network strategies, such as Supervised Kohonen Networks and XY-fused Networks. In this paper, the Kohonen and CP-ANN toolbox for MATLAB is described. This is a collection of modules for calculating Kohonen maps and derived methods for supervised classification, such as Counterpropagation Artificial Neural Networks, Supervised Kohonen Networks and XY-fused Networks. The toolbox comprises a graphical user interface (GUI), which allows the calculations to be run in an easy-to-use graphical environment, and it aims to be useful for both beginners and advanced users of MATLAB. The use of the toolbox is discussed here with an appropriate practical example.
© 2012 Elsevier B.V. All rights reserved.

Article history: Received 17 April 2012; Received in revised form 5 July 2012; Accepted 14 July 2012; Available online 22 July 2012.

Keywords: Self Organizing Maps; Supervised pattern recognition; Artificial Neural Networks; MATLAB; Kohonen maps

1. Introduction

Kohonen maps (or Self Organizing Maps, SOMs) are one of the most popular learning strategies among the several Artificial Neural Network algorithms proposed in the literature [1]. Their use for several different tasks is increasing, and nowadays they can be considered an important tool in multivariate statistics [2]. Kohonen maps are self-organizing systems able to solve unsupervised rather than supervised problems. As a consequence, methods based on the Kohonen approach but combining characteristics from both supervised and unsupervised learning have been introduced. Counterpropagation Artificial Neural Networks (CP-ANNs) are very similar to Kohonen maps, since an output layer is added to the Kohonen layer [3]. When dealing with classification issues, CP-ANNs are generally efficacious methods for modelling classes separated by non-linear boundaries. Recently, modifications to CP-ANNs have led to the introduction of new supervised neural network strategies, such as Supervised Kohonen Networks (SKNs) and XY-fused Networks (XY-Fs) [4]. As a consequence of the increasing success of Self Organizing Maps, several toolboxes for calculating supervised and unsupervised SOMs have been proposed in the literature [5–8]. The Kohonen and CP-ANN

⁎ Corresponding author at: Dept. of Environmental Sciences, University of Milano-Bicocca, P.zza della Scienza 1, 20126 Milano, Italy. Tel.: +39 02 6448 2801; fax: +39 02 6448 2839. E-mail address: davide.ballabio@unimib.it (D. Ballabio).

0169-7439/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.chemolab.2012.07.005

toolbox for MATLAB was originally developed in order to calculate unsupervised Kohonen maps and supervised classification models by means of CP-ANNs in an easy-to-use graphical user interface (GUI) environment [9]. Recently, several new features and algorithms (SKNs, XY-Fs, batch training, optimization of network settings by means of Genetic Algorithms) were introduced in the toolbox. This work presents the latest version of the Kohonen and CP-ANN toolbox, a collection of MATLAB modules freely available via Internet (http://www.disat.unimib.it/chm) along with examples and a comprehensive user manual released as HTML files.

2. Methodological background

2.1. Notation

Scalars are indicated by italic lower-case characters (e.g. xij) and vectors by bold lower-case characters (e.g. x). Two-dimensional arrays (matrices) are denoted as X (I × J), where I is the number of samples and J the number of variables. The ij-th element of the data matrix X is denoted as xij and represents the value of the j-th variable for the i-th sample.

2.2. Kohonen maps

The toolbox was developed following the algorithm described in the paper from Zupan, Novic and Ruisánchez [10]. Only a brief description of Kohonen maps is given here, since all the details can be found in the quoted paper.

Given a multivariate dataset composed of I samples described by J experimental variables, the Kohonen map is usually characterized as a squared toroidal space consisting of a grid of N² neurons, where N is the number of neurons on each side of the space (Fig. 1a). Each neuron contains as many elements (weights) as the number of variables, that is, each neuron is associated to J weights. The weights of each neuron are initialized between 0 and 1 and updated on the basis of the I samples for a certain number of times (termed training epochs).

Kohonen maps can be trained by means of sequential or batch training algorithms [1]. When the sequential training is adopted, in each training epoch samples are randomly introduced into the network, one at a time. For each sample (xi), the most similar neuron (i.e. the winning neuron) is selected on the basis of the minimum Euclidean distance; then, the weights of the r-th neuron (wr) are changed as a function of the difference between their values and the values of the sample. This correction (Δwr) is scaled according to the topological distance from the winning neuron (dri):

$$\Delta \mathbf{w}_r = \eta \cdot \left(1 - \frac{d_{ri}}{d_{max}+1}\right) \cdot \left(\mathbf{x}_i - \mathbf{w}_r^{old}\right) \qquad (1)$$

where η is the learning rate and dmax is the size of the considered neighbourhood. The topological distance dri is defined as the number of neurons between the considered neuron r and the winning neuron. The learning rate η decreases during the training phase, as follows:

$$\eta = \left(\eta^{start} - \eta^{final}\right) \cdot \left(1 - \frac{t}{t_{tot}}\right) + \eta^{final} \qquad (2)$$

where t is the number of the current training epoch, ttot is the total number of training epochs, and ηstart and ηfinal are the learning rates at the beginning and at the end of the training, respectively.

When the batch training is used, the whole set of samples is presented to the network and the winner neurons are found; after this, the weights are calculated on the basis of the effect of all the samples at the same time:

$$\bar{\mathbf{w}}_r = \frac{\sum_{i=1}^{I} u_{ir}\,\mathbf{x}_i}{\sum_{i=1}^{I} u_{ir}} \qquad (3)$$

where w̄r are the updated weights of the r-th neuron, xi is the i-th sample, and uir is the weighting factor of the winning neuron related to the i-th sample with respect to neuron r:

$$u_{ir} = \eta \cdot \left(1 - \frac{d_{ri}}{d_{max}+1}\right) \qquad (4)$$

where η, dmax and dri are defined as before (see Eq. (1)).

At the end of the network training, samples are placed in the most similar neurons of the Kohonen map; in this way, the data structure can be visualized and the role of the experimental variables in defining the data structure can be elucidated by looking at the Kohonen weights.

Fig. 1. Structures of Kohonen maps and related methods (CP-ANNs, SKNs, and XY-Fs) for a generic dataset constituted by J variables and G classes. Notation in the figure refers to notation used in the text: xij represents the value of the j-th variable for the i-th sample, wrj represents the value of the j-th Kohonen weight for the r-th neuron, cig represents the membership of the i-th sample to the g-th class expressed with a binary code, and yrg represents the value of the g-th output weight for the r-th neuron.

2.3. Counterpropagation Artificial Neural Networks

Counterpropagation Artificial Neural Networks (CP-ANNs) are modelling methods which combine features from both supervised and unsupervised learning [10] and are used to calculate classification models (Fig. 1b). CP-ANNs consist of two layers: a Kohonen layer and an output layer, whose neurons have as many weights as the number of classes to be modelled. When dealing with supervised classification, the class vector is used to define a matrix C, with I rows and G columns, where I is the number of samples and G the total number of classes; each entry cig of C represents the membership of the i-th sample to the g-th class expressed with a binary code (0 or 1). Considering the class of each sample i, the weights of the r-th neuron in the output layer (yr) are updated in a supervised manner on the basis of the winning neuron selected in the Kohonen layer. When the sequential training is adopted, the update is calculated as follows:

$$\Delta \mathbf{y}_r = \eta \cdot \left(1 - \frac{d_{ri}}{d_{max}+1}\right) \cdot \left(\mathbf{c}_i - \mathbf{y}_r^{old}\right) \qquad (5)$$

where dri is the topological distance between the considered neuron r and the winning neuron selected in the Kohonen layer, and ci is the i-th row of the unfolded class matrix C, that is, a G-dimensional binary vector representing the class membership of the i-th sample. If the batch training is used, the weights of the output layer are changed following the same algorithm shown in the previous paragraph (see Eqs. (3) and (4)). At the end of the network training, each neuron of the Kohonen layer can be assigned to a class on the basis of the output weights, and all the samples placed in that neuron are automatically assigned to the corresponding class.

2.4. XY-fused Networks

XY-fused Networks (XY-Fs) are supervised neural networks for building classification models derived from Kohonen Maps (Fig. 1c). Each sample (xi) and its corresponding class vector (ci) are combined together and act as input for the network. The winning neuron is selected by calculating Euclidean distances between a) the sample (xi) and the weights of the Kohonen layer, and b) the class membership vector (ci) and the weights of the output layer. These two Euclidean distances are then combined together to form a fused similarity, which is used to find the winning neuron. The influence of the distances calculated on the Kohonen layer decreases linearly during the training epochs, while the influence of the distances calculated on the output layer increases. Details on XY-fused Networks can be found in the paper from Melssen, Wehrens and Buydens [4].

2.5. Supervised Kohonen Networks

As well as CP-ANNs and XY-Fs, Supervised Kohonen Networks (SKNs) are supervised neural networks derived from Kohonen Maps and used to calculate classification models (Fig. 1d). In SKNs, the Kohonen and output layers are glued together to give a combined layer that is updated according to the training scheme of Kohonen maps. Since xi and ci act as a combined input, they must be scaled properly; therefore, a scaling coefficient for ci is introduced for tuning the influence of the class vector in the model calculation. Details on SKNs can be found in the paper from Melssen, Wehrens and Buydens [4].

3. Main features of the Kohonen and CP-ANN toolbox

The toolbox was initially developed under MATLAB 6.5 (Mathworks), but it is compatible with the latest releases of MATLAB. The collection of functions and algorithms is provided as MATLAB source files, with no requirements for any other third party's utilities beyond the standard MATLAB installation; the files just need to be copied into a folder. The model calculation can be performed both via the MATLAB command window and via a graphical user interface, which enables the user to perform all the analysis steps.

3.1. Input data

Data must be structured as a numerical matrix with dimensions I × J, where I is the number of samples and J the number of variables. When dealing with supervised classification, the class vector must be prepared as a column numerical vector (I × 1), where the i-th element of this vector represents the class label of the i-th sample. If G classes are present, class labels must be integer numbers ranging from 1 to G; note that 0 values are not allowed as class labels. Data sets with missing values can be handled by the toolbox: missing values (and the corresponding values of the neuron weights) are not considered when calculating the Euclidean distances to find the closest neuron and when updating the neuron weights.

3.2. Network settings

Kohonen Maps have adaptable parameters that must be chosen prior to calculation. Network settings can be defined in the GUI or via the MATLAB command window by means of the som_setting function and can be stored in a MATLAB data structure; each field of this structure defines a specific setting for the network. All the available settings are listed in Table 1. The network size ("nsize") defines the number of neurons for each side of the map: if the number of neurons for each side is set to N, the total number of neurons will be N². The number of epochs ("epochs") is the number of times each sample is introduced in the network. The topology condition ("topol") defines the shape of each neuron (square or hexagonal), while the boundary condition ("bound") defines whether the space of the Kohonen map is normal or toroidal.

Table 1. Network settings available in the toolbox.

| Settings | Description | Possible values | Default |
|---|---|---|---|
| net_type | Type of neural network | kohonen, cpann, skn, xyf | NaN |
| nsize | Number of neurons for each side of the map | Any integer number greater than zero | NaN |
| epochs | Number of training epochs | Any integer number greater than zero | NaN |
| topol | Topology condition | square, hexagonal | square |
| bound | Boundary condition | toroidal, normal | toroidal |
| training | Training algorithm | batch, sequential | batch |
| init | Initialization of weights | random, eigen | random |
| a_max | Initial learning rate | Any real number between 0.9 and 0.1 | 0.5 |
| a_min | Final learning rate | Any real number between 0 and the initial learning rate | 0.01 |
| scaling | Data scaling (prior to automatic range scaling) | none, centering, variance scaling, auto scaling | none |
| absolute_range | Type of automatic range scaling | classical, absolute | classical |
| ass_meth | Neuron assignment criterion (only for CP-ANNs, SKNs and XY-Fs) | Four different criteria (1, 2, 3 or 4) | 1 |
| scalar | Scaling coefficient for tuning the effect of class vector (only for SKNs) | Any real number greater than 0 | 1 |
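To make the supervised ingredients described in Section 2.3 concrete — the unfolded class matrix C, the output-layer update of Eq. (5), and the assignment of neurons to classes — here is a minimal NumPy sketch. The toolbox is MATLAB code, so these function names are hypothetical, the toroidal Chebyshev distance is an assumption, and the argmax assignment is only one simple criterion among the four the toolbox offers via "ass_meth".

```python
import numpy as np

def unfold_class_vector(classes, G):
    """Unfolded class matrix C (I x G): c_ig = 1 if sample i belongs to class g
    (class labels are integers 1..G; 0 is not allowed)."""
    classes = np.asarray(classes)
    C = np.zeros((classes.size, G))
    C[np.arange(classes.size), classes - 1] = 1.0
    return C

def update_output_weights(Y, winner, ci, eta, d_max, n_size):
    """Eq. (5): supervised update of the output layer around the Kohonen winner."""
    wr, wc = winner
    for r in range(n_size):
        for c in range(n_size):
            # toroidal topological distance (Chebyshev rings; an assumption)
            dr = min(abs(r - wr), n_size - abs(r - wr))
            dc = min(abs(c - wc), n_size - abs(c - wc))
            dri = max(dr, dc)
            if dri <= d_max:
                Y[r, c] += eta * (1.0 - dri / (d_max + 1)) * (ci - Y[r, c])
    return Y

def assign_neurons(Y):
    """Assign each neuron to the class with the largest output weight
    (one simple criterion; labels returned as 1..G)."""
    return np.argmax(Y, axis=2) + 1
```

Because ci is binary and the output weights start between 0 and 1, the output weights stay in [0, 1] and can be read directly as class-membership scores, which is what the ROC analysis described later exploits.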

The training algorithm (sequential or batch) can be defined by the field "training". The initialization of the Kohonen weights can be defined by the field "init": weights can be initialized randomly (between 0.1 and 0.9) or on the basis of the eigenvectors corresponding to the two largest principal components of the dataset [1]. In this second case, when the initialization of the Kohonen weights based on the eigenvectors is coupled with the batch training algorithm, the final weights are always the same, since both random initialization and random introduction of samples into the Kohonen map are avoided. The learning rates (ηstart and ηfinal) can be modified by changing the values in "a_max" and "a_min"; they are set by default at 0.5 and 0.01, respectively.

Regarding data scaling, it must be noted that variables are always range scaled between 0 and 1, in order to be comparable with the network weights [10]. The user can also define different methods of data scaling in the setting structure ("scaling"), to be applied prior to the automatic range scaling. The range scaling can be performed separately on each column (variable) of the dataset or by using the maximum and minimum values of the entire dataset ("absolute_range"); this second option can be used when all the variables are defined at the same scale, such as for profiles and spectral data. When dealing with classification models, the user can also define a criterion for assigning neurons to the classes on the basis of their output weights ("ass_meth") [9]. When dealing with Supervised Kohonen Networks (SKNs), the scaling coefficient for tuning the effect of the class vector can be defined in the "scalar" field; this coefficient is set by default at 1, as suggested in literature [10].

3.3. Optimization of the network architecture by means of Genetic Algorithms

Kohonen Maps require an optimization step in order to choose the most suitable network architecture; that is, CP-ANNs, SKNs and XY-Fs require the selection of appropriate numbers of neurons and training epochs in order to make accurate predictions. The relationship between architecture and network performance cannot be easily decided and depends on many parameters, like the number of samples and their distribution in the data space. Searching for the best architecture is usually performed by heuristic methods; in fact, one of the major disadvantages of these multivariate statistical models is probably related to the network optimization, since this procedure suffers from some arbitrariness and can be time-expensive in some cases. Recently, a new strategy for the selection of the optimal numbers of neurons and training epochs was proposed [11]. This strategy exploits the ability of Genetic Algorithms to optimize the network parameters [12–15]. In this toolbox, this strategy for optimizing the network architecture has been introduced and can be run both via the graphical user interface and in the MATLAB command window; details on this approach can be found in the quoted paper. Once the optimization has been performed, the optimization results can be easily saved, loaded and analyzed in the graphical user interface. Details on how to perform the optimization are given in the section describing the illustrative example of analysis.

Table 2. MATLAB routines of the toolbox related to the calculation of Kohonen maps and their main outputs. For each routine, outputs are collected as fields of a unique MATLAB structure.

| MATLAB routine | Description | Outputs | Description |
|---|---|---|---|
| model_kohonen | Fitting of Kohonen maps | W | Kohonen weights stored in a 3-way data matrix with dimensions N × N × J, where N is the number of neurons on each side of the map and J is the number of variables |
| | | settings | Settings used for building the model |
| | | scal | Structure with scaling parameters |
| | | top_map | Coordinates of the samples in the Kohonen top map |
| pred_kohonen | Prediction with Kohonen maps | top_map | Coordinates of the predicted samples in the Kohonen top map |

Table 3. MATLAB routines of the toolbox related to the calculation of CP-ANNs, SKNs and XY-Fs and their main outputs. For each routine, outputs are collected as fields of a unique MATLAB structure.

| MATLAB routine | Description | Outputs | Description |
|---|---|---|---|
| model_cpann, model_skn, model_xyf | Fitting of CP-ANN, SKN and XY-F models | W | Kohonen weights stored in a 3-way data matrix with dimensions N × N × J, where N is the number of neurons on each side of the map and J is the number of variables |
| | | W_out | Output weights stored in a 3-way data matrix with dimensions N × N × G, where N is the number of neurons on each side of the map and G is the number of classes |
| | | neuron_ass | Vector with neuron assignments |
| | | settings | Settings used for building the model |
| | | scal | Structure with scaling parameters |
| | | class_true | True class vector |
| | | class_calc | Calculated class vector |
| | | class_weights | Output weights associated to samples |
| | | top_map | Coordinates of the samples in the Kohonen top map |
| | | class_param | Structure containing classification parameters (confusion matrix, error rate, non-error rate, specificity, sensitivity and precision) |
| cv_cpann, cv_skn, cv_xyf | Cross-validation of CP-ANN, SKN and XY-F models | settings | Settings used for cross validating the model |
| | | class_true | True class vector |
| | | class_pred | Class vector calculated in cross validation |
| | | class_weights | Output weights associated to samples in cross validation |
| | | class_param | Structure containing cross validated classification parameters |
| pred_cpann, pred_skn, pred_xyf | Prediction with CP-ANN, SKN and XY-F models | class_pred | Predicted class vector |
| | | class_weights | Output weights associated to samples in prediction |
| | | top_map | Coordinates of the predicted samples in the Kohonen top map |
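The automatic range scaling described in Section 3.2 ("classical" column-wise versus "absolute" over the entire dataset) can be sketched as follows; this is a NumPy illustration with a hypothetical function name, not the toolbox's MATLAB code.

```python
import numpy as np

def range_scale(X, absolute_range="classical"):
    """Automatic range scaling to [0, 1], so data are comparable with the
    network weights. "classical" scales each column with its own min and max;
    "absolute" uses the min and max of the entire dataset, useful when all
    variables share the same scale (e.g. spectral profiles)."""
    X = np.asarray(X, dtype=float)
    if absolute_range == "classical":
        mn, mx = X.min(axis=0), X.max(axis=0)
    else:
        mn, mx = X.min(), X.max()
    return (X - mn) / (mx - mn)
```

With the "absolute" option the relative intensities between variables are preserved, while the "classical" option gives every variable the same [0, 1] span regardless of its original magnitude.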

3.4. Calculating models

Once data have been prepared and settings have been defined, the user can easily calculate a Kohonen network by using the "model_kohonen" function via the MATLAB command window. The output of the routine is a structure with several fields containing all the results (Table 2). Supervised classification models can be calculated by using CP-ANNs, SKNs or XY-Fs via the MATLAB command window; the MATLAB functions associated to these methods are listed in Table 3. The output of these functions is a structure where the results concerning the output layer and the indices describing classification performance are stored together with the results concerning the Kohonen layer (Table 3). In particular, the output weights are stored in a three-way data matrix with dimensions N × N × G, where G is the number of modelled classes. The assignment of each neuron is saved, as well as the consequent assignment of each sample placed in the neuron.

When dealing with classification models, the confusion matrix is provided. This is a square matrix with dimensions G × G, where each entry ngk represents the number of samples belonging to class g and assigned to class k. The best-known classification indices, such as error rate, non-error rate, specificity, sensitivity, precision and ratio of not assigned samples, are derived from the confusion matrix [16].

Cross-validation can be performed by means of the functions listed in Table 3, by choosing the number of cancellation groups and the cross-validation method for separating the samples into cancellation groups (venetian blinds or contiguous blocks). The output of these routines is a MATLAB structure containing the confusion matrix and the derived classification indices calculated in cross-validation.

Finally, unknown or test samples can be predicted by using an existing model: new samples are compared with the trained Kohonen weights, placed in the closest neuron and assigned to the corresponding class. This calculation can be made in the toolbox by means of the prediction functions listed in Table 3, which return a structure containing the class assignment of the new samples.

Fig. 2. Kohonen and CP-ANN toolbox: main graphical interface.

Fig. 3. Kohonen and CP-ANN toolbox: interactive graphical interface for visualizing the Kohonen top map.
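The confusion matrix and the indices derived from it can be sketched as follows. This is an illustrative NumPy fragment: the function names are hypothetical, and the non-error rate is taken here simply as overall accuracy, while the toolbox's exact definitions follow reference [16].

```python
import numpy as np

def confusion_matrix(class_true, class_calc, G):
    """G x G matrix: entry (g, k) counts samples of class g assigned to class k."""
    cm = np.zeros((G, G), dtype=int)
    for t, c in zip(class_true, class_calc):
        cm[t - 1, c - 1] += 1
    return cm

def classification_indices(cm):
    """Classification indices derived from the confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    ner = np.trace(cm) / total                 # non-error rate (overall accuracy)
    sens = np.diag(cm) / cm.sum(axis=1)        # per class: fraction of class g found
    prec = np.diag(cm) / cm.sum(axis=0)        # per class: fraction of assignments correct
    spec = np.empty(cm.shape[0])
    for g in range(cm.shape[0]):
        fp = cm[:, g].sum() - cm[g, g]         # samples wrongly assigned to class g
        tn = total - cm[g].sum() - fp          # true negatives for class g
        spec[g] = tn / (tn + fp)
    return {"non_error_rate": ner, "error_rate": 1.0 - ner,
            "sensitivity": sens, "specificity": spec, "precision": prec}
```

For example, with true classes [1, 1, 2, 2] and calculated classes [1, 2, 2, 2], the confusion matrix is [[1, 1], [0, 2]] and the non-error rate is 0.75.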

3.5. Calculating models via the graphical user interface

The following command line must be executed in the MATLAB prompt to run the graphical interface (Fig. 2):

>> model_gui

The user can load data, sample and variable labels, and the class vector (when dealing with supervised classification), both from the MATLAB workspace and from MATLAB files. Then, settings and parameters for cross-validation can be defined in a proper window, where basic and advanced settings are divided in order to facilitate practitioners who are not skilled with SOMs. Optimization of the network structure, that is, the choice of the optimal numbers of epochs and neurons, can be performed directly in a dedicated window. Once the user has decided how to set the network, all the calculation steps described in the previous paragraphs can be easily performed in the graphical interface. Models, settings and cross-validation results can be exported to the MATLAB workspace; saved models can be easily loaded in the toolbox for future analyses, and new samples can be loaded and predicted on the basis of previously calculated models.

3.6. Visualizing results via the graphical user interface

Once the model has been calculated, the Kohonen top map can be visualized in the toolbox graphical interface (Fig. 3). The Kohonen top map represents the space defined by the neurons where the samples are placed and allows a visual investigation of the data structure by analyzing the sample positions and their relationships. Samples are visualized by randomly scattering their positions within each neuron space; by means of the "update" button, it is possible to move the sample positions within the neuron, in order to optimize the map visualization. Moreover, the map can be shifted if the chosen boundary condition is toroidal. Samples can be labelled with different strings: identification numbers, class labels (in the case of supervised classification), or user-defined labels.

The influence of variables in describing the data can be evaluated by coloring the neurons on the basis of the Kohonen weights by means of the "Display weights" list: neurons are colored from white (weight equal to 0, minimum value) to black (weight equal to 1, maximum value). In this way, one can evaluate whether the considered variable has a direct relationship with the sample distribution in the space of the top map. Moreover, both the Kohonen and output weights of a selected neuron can be displayed by means of the "get neuron weights" button.

When dealing with supervised classification, the user can graphically evaluate the indices for classification diagnostics (confusion matrix, error rate, non-error rate, specificity, sensitivity, precision, purity) and analyze ROC curves (Receiver Operating Characteristics). ROC curves are graphical plots of 1 − Specificity (also known as False Positive Rate, FPR) and Sensitivity (also known as True Positive Rate, TPR) as x and y axes for a binary classification system, as its discrimination threshold is changed. These are graphical tools for the analysis of classification results and describe the degree of separation of the classes. In this toolbox, ROC curves are separately calculated for each class, by changing the threshold of assignation over the output weights (which are always comprised between 0 and 1) from 0 to 1.

When dealing with complex data, that is, high-dimensional spaces, it is not easy to solve the data interpretation with a simple visual approach: the analysis of the Kohonen top map only allows plotting all the weights for a specific neuron or all the neurons for a specific weight, so all the available information cannot be contemporaneously plotted. For this reason, the toolbox also allows the calculation of Principal Component Analysis (PCA) on the Kohonen weights, in order to investigate the relationships between variables and classes in a global way and not one variable at a time [17]. A GUI for calculating PCA on the Kohonen weights is provided in the toolbox; details on its use are given in the section describing the illustrative example of analysis.

Fig. 4. Kohonen and CP-ANN toolbox: interactive graphical interface for visualizing the optimization results.
Table 4. Example of analysis: some of the indices calculated by the toolbox and used for classification diagnostics. Non-error rate, error rate, precision, sensitivity and specificity obtained in fitting, in cross-validation (10 cancellation groups) and on the external test set of samples are shown.

| Classification parameter | Fitting | Cross-validation | External test set |
|---|---|---|---|
| Non-error rate | 0.97 | 0.97 | 0.97 |
| Error rate | 0.03 | 0.03 | 0.04 |
| Precision of class 1 | 0.97 | 0.97 | 0.98 |
| Precision of class 2 | 0.97 | 0.95 | 0.95 |
| Sensitivity of class 1 | 0.98 | 0.98 | 0.97 |
| Sensitivity of class 2 | 0.96 | 0.94 | 0.95 |
| Specificity of class 1 | 0.96 | 0.94 | 0.95 |
| Specificity of class 2 | 0.98 | 0.97 | 0.95 |
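The per-class ROC calculation described in Section 3.6 — sweeping an assignment threshold over the output weights, which always lie in [0, 1] — can be sketched as follows; the function name and the uniform threshold grid are illustrative assumptions, not the toolbox's MATLAB implementation.

```python
import numpy as np

def roc_curve_for_class(output_weights_g, is_class_g, n_thresholds=101):
    """ROC points for one class: sweep the assignment threshold over the
    output weights from 0 to 1 and collect (FPR, TPR) pairs."""
    w = np.asarray(output_weights_g, dtype=float)
    y = np.asarray(is_class_g, dtype=bool)
    points = []
    for thr in np.linspace(0.0, 1.0, n_thresholds):
        assigned = w >= thr                # samples assigned to class g
        tpr = assigned[y].mean()           # sensitivity
        fpr = assigned[~y].mean()          # 1 - specificity
        points.append((fpr, tpr))
    return np.array(points)
```

For a well-separated class, the curve passes close to the point (0, 1): some threshold assigns all and only the members of the class.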

4. Illustrative example: classification of multivariate data

This example is based on the Breast Cancer dataset, a real benchmark dataset for classification [18]. The dataset is constituted of 699 samples divided in 2 classes: class 1 is defined as Benign (458 samples) and class 2 as Malignant (241 samples). Samples are described by 9 variables (Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses), which take discrete values in the range 1–10. Kohonen maps are not directly treated here, since they are implicitly calculated as the Kohonen layer of CP-ANNs.

25% of the samples were randomly extracted and used as external test samples, maintaining the class proportions; that is, the number of test samples of each class was proportional to the number of training samples of that class. Training samples were used to optimize the network architecture and to build and cross-validate the CP-ANN classification model, while the external test samples were used only to evaluate the predictive ability of the final CP-ANN model.

4.1. Selection of the optimal numbers of neurons and epochs

The optimal numbers of neurons and epochs were calculated by means of Genetic Algorithms, as previously explained. Optimization results can be easily analyzed in the graphical user interface (Fig. 4). Each bubble represents a network architecture, that is, a combination of numbers of neurons and epochs, with its frequency of selection in the GA runs and its average value of the fitness function. The dimension of each bubble is proportional to the network size, that is, the number of neurons, while the color of each bubble is proportional to the number of epochs: the darker the bubble, the higher the number of epochs used to train the network. This plot enables a qualitative interpretation of the results: architectures placed in the upper right part of the plot are appropriate, since they are characterized by high relative frequencies of selection by Genetic Algorithms and high predictive performances [11]; the architectures placed on the top right limit of the plot can be considered the most suitable ones. By clicking the "select" button, it is possible to select a specific bubble (architecture) in the plot and see its corresponding numbers of epochs and neurons; the list of all the represented architectures, with their numbers of neurons and epochs, frequency of selection and value of the fitness function, can be seen by clicking the "view results in table" button. In this case, the architecture marked in red in Fig. 4 was chosen, representing a neural network with 4 × 4 neurons trained for 250 epochs.

4.2. Calculation of the classification model

On the basis of the optimization results obtained by means of Genetic Algorithms, the numbers of neurons and epochs were set to 4 × 4 and 250, respectively.

Fig. 5. Example of analysis: a) Kohonen top map produced by the toolbox; b) plot of ROC curves produced by the toolbox. In the top map, each sample is labelled on the basis of its class; each neuron is colored with a gray scale on the basis of the Kohonen weight of variable 2 (uniformity of cell size): white corresponds to a Kohonen weight equal to 0, black to a Kohonen weight equal to 1.

Fig. 6. Example of analysis: a) variable profile for each class produced by the toolbox, that is, the average of the Kohonen weights of each variable calculated on the neurons assigned to each class; b) profile of the Kohonen weights of one of the neurons where samples of class 2 (malignant) were placed.
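The stratified extraction of the 25% external test set described above can be sketched in a few lines; this NumPy fragment and its function name are illustrative assumptions, since the paper does not detail how the random extraction was implemented.

```python
import numpy as np

def stratified_split(classes, test_fraction=0.25, seed=0):
    """Randomly extract a test set while maintaining the class proportions,
    as done for the Breast Cancer example (25% external test samples)."""
    rng = np.random.default_rng(seed)
    classes = np.asarray(classes)
    test_mask = np.zeros(classes.size, dtype=bool)
    for g in np.unique(classes):
        idx = np.flatnonzero(classes == g)          # samples of class g
        n_test = int(round(test_fraction * idx.size))
        test_mask[rng.choice(idx, size=n_test, replace=False)] = True
    return ~test_mask, test_mask  # boolean masks: training set, external test set
```

Applied to a vector with 458 samples of class 1 and 241 of class 2, each class contributes close to 25% of its samples to the test set, so the Benign/Malignant ratio is preserved in both subsets.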

Interpreting the results with the graphical interface

The classification indices provided by the toolbox can help the user to evaluate the overall classification performance. These indices, which are always comprised between 0 and 1, can be accessed by clicking on the "classification results" button in the toolbox main form. The classification performances refer both to fitting and cross-validation, executed with 10 cancellation groups selected by venetian blinds. In Table 4, the classification indices calculated on the external test set are shown.

However, on the basis of the classification indices alone it is not possible to have a comprehensive insight into the relationships between variables and samples: it is important to gain such an insight into the model by interpreting the relationships between samples and variables. This can be done by analyzing the Kohonen top map, where samples are projected in order to evaluate the data structure, while variable importance can be analyzed by coloring the neurons on the basis of the neuron weights.

As an example, the top map of the calculated model (4 × 4 neurons and 250 epochs) is shown in Fig. 6a, where each sample is labelled on the basis of its class. Each neuron is colored with a gray scale on the basis of the output weight corresponding to class 2 (Malignant): white corresponds to output weights equal to 0, black to output weights equal to 1. In Fig. 6b, the same top map is shown, but neurons are colored on the basis of the Kohonen weight of variable 2 (Uniformity of Cell Size), going from low values (white) to high values (black). It is reasonably easy to see that variable 2 discriminates samples belonging to class 1 (Benign) from samples belonging to class 2 (Malignant).

In the GUI, the user can also plot all the Kohonen and output weights of a selected neuron, as well as the Kohonen weight averages for each class (class profile) and ROC curves. The profile of Kohonen weights of one of the neurons where class 2 samples are placed is shown in Fig. 5: it is possible to see that a) class 2 (Malignant) is characterised by higher values on all the considered variables (Fig. 5a) and b) the degree of separation between the two classes is high in the ROC curves (Fig. 5b).

For this reason, a tool for calculating PCA on the Kohonen weights is provided in the graphical user interface of the toolbox. The score and loading plots of the first two components (explaining together 74% of the total information) are shown in Fig. 7; in these plots, each point represents a neuron of the previous CP-ANN model. In the score plot (Fig. 7a), each neuron is colored with a gray scale on the basis of the output weight of class 2: the larger the value of the output weight, the higher the probability that the neuron belongs to class 2 and the darker the color. The majority of neurons assigned to class 2 are placed on the left side of the score plot. All variables are placed on the left of the loading plot (Fig. 7b), each labelled with its identification number. Thus, variables are directly correlated with samples belonging to class 2 (Malignant), which are placed in neurons with higher weights: by comparing score and loading plots, one can evaluate how variables characterize classes, and here samples of class 2 are characterized by higher values of all the considered variables.

Finally, the model can be saved in the MATLAB workspace and later loaded in the toolbox to predict new sets of samples. This was done on the external test samples of the data set in analysis.

Fig. 7. Example of analysis: a) score plot of the first two principal components calculated on the Kohonen weights; b) loading plot of the first two principal components calculated on the Kohonen weights.

Conclusion

The Kohonen and CP-ANN toolbox for MATLAB is a collection of modules for calculating Self Organizing Maps (Kohonen maps) and derived methods for supervised classification, such as Counterpropagation Artificial Neural Networks (CP-ANNs), Supervised Kohonen Networks (SKNs) and XY-fused Networks (XY-Fs). The toolbox comprises a graphical user interface (GUI), which allows the calculation in an easy-to-use graphical environment, and it aims to be useful for both beginners and advanced users of MATLAB. In the GUI, all the analysis steps (data loading, model settings, optimization, calculation, cross-validation, prediction and results visualization) can be easily performed. Examples and a comprehensive user manual are provided with the toolbox, which is regularly updated and freely available via Internet from the Milano Chemometrics and QSAR Research Group website (http://www.disat.unimib.it/chm).

Independent testing

Dr. Federico Marini, at the Chemistry Department, Università di Roma "La Sapienza", P.le Aldo Moro 5, I-00185 Rome, Italy, informed that he has tested the described software and found that it appears to function as the Authors described.
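The PCA on the Kohonen weights described in the example of analysis can be emulated outside the toolbox by running an SVD on the mean-centered matrix of neuron weights (neurons as rows, variables as columns). A minimal NumPy sketch, in which the 4 × 4 map and its weights are randomly generated purely for illustration:

```python
import numpy as np

# PCA on the Kohonen weights: each neuron is a point described by
# its weight vector; scores and loadings come from the SVD of the
# mean-centered weight matrix (neurons x variables).
def pca_on_weights(W, n_components=2):
    Wc = W - W.mean(axis=0)                 # center each variable
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T          # variables x components
    explained = (s ** 2) / (s ** 2).sum()   # variance fractions
    return scores, loadings, explained

# Hypothetical 4x4 map with 3 variables: 16 neurons, weights in [0, 1]
rng = np.random.default_rng(0)
W = rng.random((16, 3))
scores, loadings, explained = pca_on_weights(W)
```

Plotting the first two score columns reproduces a score plot in which each point is a neuron, and the corresponding loading columns show how the original variables contribute to those components.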
