Professional Documents
Culture Documents
Self-Organizing Map in Matlab: The SOM Toolbox: Juha Vesanto, Johan Himberg, Esa Alhoniemi and Juha Parhankangas
Self-Organizing Map in Matlab: The SOM Toolbox: Juha Vesanto, Johan Himberg, Esa Alhoniemi and Juha Parhankangas
sM = som_make(sD);
som_show(sM);
of what the data is about, and would indicate how the Map: SOM 06−Sep−1999, Size: 14 6
variables should be preprocessed. In this example, the Figure 5. Visualization of the SOM of Iris data. U-
variance normalization is used. After the data set is ready, matrix on top left, then component planes, and map unit
a SOM is trained. Since the data set had labels, the map is labels on bottom right. The six figures are linked by
also labeled using som_autolabel. After this, the position: in each figure, the hexagon in a certain position
SOM is visualized using som_show. The U-matrix is corresponds to the same map unit. In the U-matrix,
shown along with all four component planes. Also the additional hexagons exist between all pairs of neighboring
labels of each map unit are shown on an empty grid using map units. For example, the map unit in top left corner has
som_addlabels. The values of components are low values for sepal length, petal length and width, and
denormalized so that the values shown on the colorbar are relatively high value for sepal width. The label associated
in the original value range. The visualizations are shown with the map unit is 'se' (Setosa) and from the U-matrix it
in Figure 5. can be seen that the unit is very close to its neighbors.
Component planes are very convenient when one has to 5. Conclusions
visualize a lot of information at once. However, when only
a few variables are of interest scatter plots are much more In this paper, the SOM Toolbox has been shortly
efficient. Figures 6 and 7 show two scatter plots made introduced. The SOM is an excellent tool in the
using the som_grid function. Figure 6 shows the PCA- visualization of high dimensional data [6]. As such it is
projection of both data and the map grid, and Figure 7 most suitable for data understanding phase of the
visualizes all four variables of the SOM plus the knowledge discovery process, although it can be used for
subspecies information using three coordinates, marker data preparation, modeling and classification as well.
size and marker color. In future work, our research will concentrate on the
3 quantitative analysis of SOM mappings, especially
analysis of clusters and their properties. New functions
and graphical user interface tools will be added to the
2
Toolbox to increase its usefulness in data mining. Also
se vi outside contributions to the Toolbox are welcome.
se se
1 se vi vi It is our hope that the SOM Toolbox promotes the
sese vi
se ve vi
vi vi vi utilization of SOM algorithm – in research as well as in
sese ve vi vi vi vi
se ve viveve industry – by making its best features more readily
ve vi
0 sese se veve accessible.
se veve vevi
se ve vi vi
sese veve ve vi
se ve ve
se veve vivi
Acknowledgements
−1 ve
se ve ve ve
veve
This work has been partially carried out in ‘Adaptive
ve and Intelligent Systems Applications’ technology program
−2
of Technology Development Center of Finland, and the
EU financed Brite/Euram project ‘Application of Neural
−3 Network Based Models for Optimization of the Rolling
−3 −2 −1 0 1 2 3 4
Process’ (NEUROLL). We would like to thank Mr. Mika
Figure 6. Projection of the IRIS data set to the Pollari for implementing the initialization and training
subspace spanned by its two eigenvectors with greatest GUI.
eigenvalues. The three subspecies have been plotted using
different markers: IRU6HWRVDx for Versicolor and ¸IRU
References
Virginica. The SOM grid has been projected to the same [1] Kohonen T. Self-Organizing Maps. Springer, Berlin, 1995.
subspace. Neighboring map units connected with lines. [2] Vesanto J., Alhoniemi E., Himberg J., Kiviluoto K.,
Labels associated with map units are also shown. Parviainen J. Self-Organizing Map for Data Mining in
MATLAB: the SOM Toolbox. Simulation News Europe
1999;25:54.
[3] Kohonen T., Hynninen J., Kangas J., Laaksonen J.
SOM_PAK: The Self-Organizing Map Program Package,
7 Technical Report A31, Helsinki University of Technology,
6
1996, http://www.cis.hut.fi/nnrc/nnrc-programs.html
[4] Pyle D. Data Preparation for Data Mining. Morgan
5 Kaufman Publishers, San Francisco, 1999.
4
[5] Anderson E. The Irises of the Gaspe Peninsula. Bull.
American Iris Society; 1935;59:2-5.
3 [6] Vesanto J. SOM-Based Visualization Methods. Intelligent
2
Data Analysis 1999;3:111-126.
1
4
Address for correspondence.
7.5
3.5
7
3 6.5 Juha Vesanto
6
2.5 5.5
Helsinki University of Technology
2 4.5
5 P.O.Box 5400, FIN-02015 HUT, Finland
Juha.Vesanto@hut.fi
Figure 7. The four variables and the subspecies http://www.cis.hut.fi/projects/somtoolbox
information from the SOM. Three coordinates and marker
size show the four variables. Marker color gives
subspecies: black for Setosa, dark gray for Versicolor and
light gray for Virginica.