Self-Organizing Map Implementation - CodeProject


Peter Leow, 25 Jul 2014, CPOL
Rated 5.00 (6 votes)

Get real with an implementation of a mini SOM project.

Download source - 122.6 KB

Click on the following image to view a demo video on YouTube.

[Demo video preview: the SOM map of digit prototypes during training]

Introduction

In my previous article, Self-organizing Map Demystified, you learned the concept, architecture, and algorithm of the self-organizing map (SOM). From here on, you will embark on a journey of designing and implementing a mini SOM for clustering handwritten digits. For training the SOM, I have obtained the training dataset from the Machine Learning Repository of the Center for Machine Learning and Intelligent Systems. MATLAB will be used for programming the SOM. Some of the design considerations are generic and can be applied to other types of machine learning projects. Let's get started.

Preparing the Ingredients

The original training dataset file is called "optdigits-orig.tra.Z". It consists of a total of 1934 handwritten digits from 0 to 9, collected from a total of 30 people. Each digit sample has been normalized to a 32x32 binary image that has 0 or 1 for each pixel. The distribution of the training dataset is considered well balanced, as shown in Table 1. It is important to have a balanced training dataset, where the number of samples in every class is more or less equal, so as to prevent bias in training; classes with an overwhelmingly large number of samples tend to get chosen more often than those in the minority, affecting the accuracy and reliability of the training result.
Table 1: Distribution of Classes

    Class   Number of Samples
    0       189
    1       198
    2       195
    3       199
    4       186
    5       187
    6       195
    7       201
    8       180
    9       204

Figure 1 shows some of the binary images contained in the training dataset:

[Figure 1: Some of the Binary Images in the Training Dataset]

In the original data file, each block of 32x32 binary image is followed by a class label that indicates the digit that the sample belongs to. For example, the "8" bitmap block in Figure 1 is followed by its class label of 8, and so on. To facilitate processing by MATLAB, I have further pre-processed the data as follows:

1. Separate the class labels from their bitmaps in the original data file into two different files.
2. Make the class labels, a total of 1934 digits, into a 1x1934 vector called "train_label" and save it as a MATLAB data file called "training_label.mat".
3. Make the bitmap data, originally 61888 (rows) x 32 (columns) bits after removing the class labels, into a 1024x1934 matrix called "train_data", and save it as "training_data.mat". In the train_data matrix, each column represents a training sample, and each sample has 1024 bits, i.e., each training sample has been transformed from its original 32x32 bitmap into a 1024x1 vector.

These two files - "training_label.mat" and "training_data.mat" - are available for download. Unzip and place them in a folder, say "som_experiment". If you are curious to see what these digits look like, fire up MATLAB, set the Current Folder to "som_experiment", enter the following code in the Command Window, and run it.
    % View a digit from train_data
    clear
    clc
    load training_data;
    img = reshape(train_data(:,10), 32, 32)';
    imshow(double(img))

This code will:

* load the "training_data.mat" file, which contains the 1024x1934 train_data matrix, into memory;
* extract the nth 1024x1 column vector from the train_data matrix via train_data(:, n), where n coincides with the position of the digit in the dataset. In this code, n is 10, so it extracts the tenth digit; you can change it to any number up to the total number of digits in the dataset, i.e. 1934;
* reshape(train_data(:,10), 32, 32)' reshapes the 1024x1 column vector into a 32x32 matrix and transposes it, effectively reverting the sample to its original shape like those shown in Figure 1;
* imshow(double(img)) displays the digit as a binary image, where pixels with the value 0 are shown as black and 1 as white, as in Figure 2.

[Figure 2: A Binary Image on Screen]

Setting the Parameters

Let's set the parameters for subsequent development.

* The size of the SOM map is 10x10.
* The total number of iterations is set at 1000. In this experiment, we will only attempt the first sub-phase of the adaptation phase.
* The neighborhood function is

      h_{j,c(x)}(n) = exp( -d^2_{j,c(x)} / (2 * sigma^2(n)) )

  where d_{j,c(x)} is the distance from neuron j to the winner neuron c(x), and sigma(n) is the effective width of the topological neighborhood at the nth iteration:

      sigma(n) = sigma_0 * exp(-n / tau_1),   n = 0, 1, 2, ...,   tau_1 = N / log(sigma_0)

  where sigma_0 is the initial effective width, set to 5, i.e. the radius of the 10x10 map, N is the maximum number of iterations, and tau_1 is the time constant.

* The weight updating equation is

      w_j(n+1) = w_j(n) + eta(n) * h_{j,c(x)}(n) * (x - w_j(n))

  where eta(n) is the time-varying learning rate at the nth iteration, computed as

      eta(n) = eta_0 * exp(-n / tau_2)

  where eta_0 is the initial learning rate, set to 0.1, and tau_2 is a time constant, set to N.

We have designed the parameters for the mini SOM. It's time to make things happen.

Ready to Cook (Code)

The MATLAB script implementing the mini SOM is saved in a file called "training_som.m" and is available for download. I have created this script as a "proof of concept" for the sole purpose of reinforcing the learning of the SOM algorithm. You are free to implement it in any programming language. After all, programming languages are just media for implementing problem-solving techniques on a computer. Unzip and place it in the "som_experiment" folder. Open the "training_som.m" script in MATLAB's Editor, and you will see the recipe shown below. The code has been painstakingly commented and is therefore sufficiently self-explanatory. Nevertheless, I will still round up the parts of the code that correspond to the various phases of the SOM algorithm so that you can understand and relate them better.

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Self-organizing Map
    % Clustering of Handwritten Digits
    % training_som.m
    % Peter Leow
    % 10 July 2014
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    % Clean up the previous act
    close all;
    clear;   % delete all memory
    clc;     % clear command window
    clf;     % clear figure screen
    shg;     % put figure screen on top of all windows

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Ground breaking!
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Load training_data.mat, which contains train_data,
    % a 1024x1934 matrix: a total of 1934 input samples,
    % each possessing 1024 attributes (dimensions)
    load training_data;

    % dataRow = number of attributes (dimensions) of each sample, i.e. 1024
    % dataCol = total number of training samples, i.e. 1934
    [dataRow, dataCol] = size(train_data);

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % SOM Architecture
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Number of rows and columns of the SOM map
    somRow = 10;
    somCol = 10;

    % Initialize the 10x10x1024 som matrix:
    % this is a SOM map of 10x10 neurons, and
    % each neuron carries a weight vector of 1024 elements
    som = zeros(somRow, somCol, dataRow);

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Parameter Settings
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Max number of iterations
    N = 1000;

    % Initial effective width
    sigmaInitial = 5;

    % Time constant for sigma
    t1 = N / log(sigmaInitial);

    % 10x10 matrix to store the Euclidean distance
    % of each neuron on the map
    euclideanD = zeros(somRow, somCol);

    % 10x10 matrix to store the neighborhood function
    % of each neuron on the map
    neighbourhoodF = zeros(somRow, somCol);

    % Initial learning rate
    etaInitial = 0.1;

    % Time constant for eta
    t2 = N;

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Initialization
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Generate random weight vectors [dataRow x 1]
    % and assign them to the third dimension of som
    for r = 1:somRow
        for c = 1:somCol
            som(r, c, :) = rand(dataRow, 1);
        end
    end

    % Initialize iteration count to one
    n = 1;

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Start of Iterative Training
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    while n <= N
        sigma = sigmaInitial * exp(-n/t1);
        variance = sigma^2;
        eta = etaInitial * exp(-n/t2);
        % Prevent eta from falling below 0.01
        if (eta < 0.01)
            eta = 0.01;
        end

        % Randomly generate an integer between 1 and 1934,
        % corresponding to the column index of a training sample
        % in train_data
        i = randi([1, dataCol]);

        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        % Competition Phase
        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        % Find the Euclidean distances from the input vector
        % to all the neurons
        for r = 1:somRow
            for c = 1:somCol
                v = train_data(:, i) - reshape(som(r, c, :), dataRow, 1);
                euclideanD(r, c) = sqrt(v' * v);
            end
        end

        % Determine the winner neuron, i.e. the neuron that is
        % the closest to the input vector;
        % (winnerRow, winnerCol) is the index position of the
        % winner neuron on the SOM map
        [vector, winnerRowVector] = min(euclideanD, [], 1); % 1 = 1st dimension, i.e. row
        [winnerEuclidean, winnerCol] = min(vector, [], 2);  % 2 = 2nd dimension, i.e. column
        winnerRow = winnerRowVector(winnerCol);
        %%%%% End of Competition Phase %%%%%

        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        % Cooperation Phase
        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        % Compute the neighborhood function of every neuron
        for r = 1:somRow
            for c = 1:somCol
                if (r == winnerRow && c == winnerCol)
                    % Is the winner
                    neighbourhoodF(r, c) = 1;
                    continue;
                else
                    % Not the winner
                    distance = (winnerRow - r)^2 + (winnerCol - c)^2;
                    neighbourhoodF(r, c) = exp(-distance/(2*variance));
                end
            end
        end
        %%%%% End of Cooperation Phase %%%%%

        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        % Adaptation Phase - only the first sub-phase is considered
        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        for r = 1:somRow
            for c = 1:somCol
                oldWeightVector = reshape(som(r, c, :), dataRow, 1);
                % Update weight vector of neuron
                som(r, c, :) = oldWeightVector + ...
                    eta * neighbourhoodF(r, c) * (train_data(:, i) - oldWeightVector);
            end
        end
        %%%%% End of Adaptation Phase %%%%%

        %%%%% Draw updated SOM map %%%%%
        f1 = figure(1);
        set(f1, 'name', strcat('Iteration #', num2str(n)), 'numbertitle', 'off');
        for r = 1:somRow
            for c = 1:somCol
                region = 10 * (r - 1) + c;
                subplot(somRow, somCol, region, 'align')
                img = reshape(som(r, c, :), 32, 32)';
                imshow(double(img));
            end
        end
        %%%%% End of draw updated SOM map %%%%%

        % Increase iteration count by one
        n = n + 1;
    end
    %%%%% End of while loop %%%%%

    %%%%% Save the trained SOM %%%%%
    save('trained_som', 'som');

Initialization

    for r = 1:somRow
        for c = 1:somCol
            som(r, c, :) = rand(dataRow, 1);
        end
    end

The block of code above generates arrays of random numbers, whose elements are uniformly distributed in the interval (0,1), for the weight vectors of all neurons on the map. These weight vectors are stored as the third dimension of the som matrix, i.e. som(r, c, :). Note that the som matrix represents the SOM map. The actual iterative training shall begin...

Update Parameters

    sigma = sigmaInitial * exp(-n/t1);
    variance = sigma^2;
    eta = etaInitial * exp(-n/t2);
    % Prevent eta from falling below 0.01
    if (eta < 0.01)
        eta = 0.01;
    end

The block of code above updates the time-varying effective width (sigma) and learning rate (eta) at the start of every iteration.

Sampling

    i = randi([1, dataCol]);

The line of code above randomly selects a training sample (input vector) from the pool of 1934 samples at the start of every iteration.

Competition

    for r = 1:somRow
        for c = 1:somCol
            v = train_data(:, i) - reshape(som(r, c, :), dataRow, 1);
            euclideanD(r, c) = sqrt(v' * v);
        end
    end

The block of code above computes the Euclidean distances from the input vector selected in the preceding code to all the neurons on the map. The resulting Euclidean distances are stored in the matrix euclideanD. The index positions of these Euclidean distances in euclideanD coincide with those of their corresponding neurons in the som matrix.

    [vector, winnerRowVector] = min(euclideanD, [], 1);
    [winnerEuclidean, winnerCol] = min(vector, [], 2);
    winnerRow = winnerRowVector(winnerCol);

The block of code above determines the winner neuron, i.e. the neuron with the minimum Euclidean distance in the euclideanD matrix.

Cooperation

    for r = 1:somRow
        for c = 1:somCol
            if (r == winnerRow && c == winnerCol)
                % Is the winner
                neighbourhoodF(r, c) = 1;
                continue;
            else
                % Not the winner
                distance = (winnerRow - r)^2 + (winnerCol - c)^2;
                neighbourhoodF(r, c) = exp(-distance/(2*variance));
            end
        end
    end

The block of code above computes the neighborhood functions of all the neurons. The resulting neighborhood functions are stored in the matrix neighbourhoodF. The index positions of these neighborhood functions in neighbourhoodF coincide with those of their corresponding neurons in the som matrix.

Adaptation

    for r = 1:somRow
        for c = 1:somCol
            oldWeightVector = reshape(som(r, c, :), dataRow, 1);
            % Update weight vector of neuron
            som(r, c, :) = oldWeightVector + ...
                eta * neighbourhoodF(r, c) * (train_data(:, i) - oldWeightVector);
        end
    end

The block of code above implements the weight updating equation to update the weight vectors of all the neurons in the som matrix.

Seeing is Believing

    f1 = figure(1);
    set(f1, 'name', strcat('Iteration #', num2str(n)), 'numbertitle', 'off');
    for r = 1:somRow
        for c = 1:somCol
            region = 10 * (r - 1) + c;
            subplot(somRow, somCol, region, 'align')
            img = reshape(som(r, c, :), 32, 32)';
            imshow(double(img));
        end
    end

The block of code above displays the som matrix as a two-dimensional map after the weight updates at the end of every iteration. In this way, we are able to visualize the progress of the training. Enjoy!
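For readers following along in another language, the training loop walked through above can be mirrored in a compact NumPy sketch. This is an illustrative re-implementation, not the article's MATLAB code: the toy random dataset stands in for train_data, the shapes and seed are arbitrary, and the per-neuron loops are vectorized, but the competition, cooperation, and adaptation steps and the sigma/eta schedules follow the script.

```python
import numpy as np

rng = np.random.default_rng(0)

som_rows, som_cols, dim = 10, 10, 1024
N = 1000                               # max iterations, as in the script
sigma0, eta0 = 5.0, 0.1                # initial effective width and learning rate
t1 = N / np.log(sigma0)                # time constant for sigma
t2 = N                                 # time constant for eta

# Toy stand-in for the 1024x1934 train_data matrix (here only 200 samples)
train_data = rng.random((dim, 200))

# Random initial weight vectors, uniform in [0, 1)
som = rng.random((som_rows, som_cols, dim))

# Grid coordinates of every neuron, for the neighborhood function
rr, cc = np.meshgrid(np.arange(som_rows), np.arange(som_cols), indexing="ij")

for n in range(1, N + 1):
    sigma = sigma0 * np.exp(-n / t1)
    eta = max(eta0 * np.exp(-n / t2), 0.01)   # floor at 0.01, as in the script

    # Sampling: pick one training sample at random
    x = train_data[:, rng.integers(train_data.shape[1])]

    # Competition: Euclidean distance from x to every weight vector
    dists = np.linalg.norm(som - x, axis=2)
    wr, wc = np.unravel_index(np.argmin(dists), dists.shape)

    # Cooperation: Gaussian neighborhood around the winner on the grid
    grid_d2 = (rr - wr) ** 2 + (cc - wc) ** 2
    h = np.exp(-grid_d2 / (2 * sigma ** 2))
    h[wr, wc] = 1.0   # already 1 at the winner; set explicitly to mirror the script

    # Adaptation: move every weight vector toward x, scaled by eta and h
    som += eta * h[:, :, None] * (x - som)

print(som.shape)   # (10, 10, 1024)
```

Because each update is a convex step toward a sample in [0, 1), the weights stay in that interval; to visualize the map, reshape each som[r, c] back to 32x32 as the MATLAB script does.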
Saving the Fruit of Your Labor

    save('trained_som', 'som');

At the end of the whole training, i.e. 1000 iterations, the line of code above saves the final som matrix to a file called "trained_som.mat".

Labeling the SOM

We will proceed to label the neurons on the SOM using the class labels of their most similar training samples. See the illustrations in Figures 3 and 4.

[Figure 3: Before Labeling]
[Figure 4: After Labeling]

The code to achieve this is contained in a script file called "display_som_with_class_labels.m", as shown below. (Forgive me, I'm really very poor at naming. :p) This file is available for download.

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Self-organizing Map
    % Clustering of Handwritten Digits
    % display_som_with_class_labels.m
    % Peter Leow
    % 10 July 2014
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    % Clean up the previous act
    close all;
    clear;   % delete all memory
    clc;     % clear command window
    clf;     % clear figure screen
    shg;     % put figure screen on top of all windows

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Draw SOM map
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Load trained_som.mat, which contains the trained som (10x10x1024 matrix)
    load trained_som;
    [somRow, somCol, somWeight] = size(som);

    f1 = figure(1);
    set(f1, 'name', 'Trained SOM', 'numbertitle', 'off');
    for r = 1:somRow
        for c = 1:somCol
            region = 10 * (r - 1) + c;
            subplot(somRow, somCol, region, 'align')
            img = reshape(som(r, c, :), 32, 32)';
            imshow(double(img));
        end
    end
    %%%%% End of Draw SOM map %%%%%

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Determine class label for each neuron on the map
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Load training_data.mat, which contains train_data,
    % a 1024x1934 matrix: 1934 input samples, each possessing
    % 1024 attributes (dimensions)
    load training_data;
    [dataRow, dataCol] = size(train_data);

    % Load training_label.mat, which contains train_label, a 1x1934 matrix;
    % each element of train_label carries one of the 10 digits from 0 to 9,
    % the class label of the input sample with the same column index in train_data
    load training_label;

    % Matrix to store the column indices of the training samples that are
    % closest to each neuron on the map
    somLabelIndex = zeros(somRow, somCol);

    for r = 1:somRow
        for c = 1:somCol
            % distance is a 1x1934 vector that stores the Euclidean distance
            % between each training sample and this neuron
            distance = zeros(1, length(train_data(1,:)));
            % Loop over train_data
            for i = 1:length(train_data(1,:))
                v = train_data(:, i) - reshape(som(r, c, :), dataRow, 1);
                distance(i) = sqrt(v' * v);
            end
            % Find the index in the distance vector of the training sample
            % that is the closest to the neuron
            [value, colIndex] = min(distance, [], 2); % 2 = 2nd dimension, i.e. column
            % Assign colIndex to the somLabelIndex matrix at the same
            % index position as that of the neuron on the SOM map
            somLabelIndex(r, c) = colIndex;
        end
    end

    %%%%% Draw SOM map with class labels %%%%%
    f2 = figure(2);
    set(f2, 'name', 'Trained SOM with Class Labels', 'numbertitle', 'off');
    for r = 1:somRow
        for c = 1:somCol
            region = 10 * (r - 1) + c;
            subplot(somRow, somCol, region, 'align')
            set(gca, 'xtick', [], 'xticklabel', {}, 'ytick', [], 'yticklabel', {})
            text(0.5, 0.5, int2str(train_label(1, somLabelIndex(r, c))));
        end
    end
    %%%%% End of Draw SOM map with class labels %%%%%

Is That All?

Of course not. We can use the labeled SOM to classify new handwritten digits. How to implement it?
Feed any new digit into the SOM map, find the neuron that is the most similar to it based on Euclidean distance, and the class label of that neuron will be the class label of the new digit. Got it? How to cook (oops, I mean code) it? Well, it is similar to the code we used to label the SOM. Alright, stop, no more questions; I shall leave the rest to you as homework. :)

It has been a long journey. I hope you have found it fruitful.

Bibliography

* Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).
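As a parting sketch of the classification homework described above, the nearest-neuron lookup can be written in a few lines of NumPy. This is an illustrative sketch, not the article's MATLAB code: `som` would be the trained 10x10x1024 weight array and `neuron_labels` the 10x10 label matrix produced by the labeling script; here a tiny 2x2 toy map stands in for both.

```python
import numpy as np

def classify(som, neuron_labels, x):
    """Return the class label of the neuron whose weight vector is
    closest (in Euclidean distance) to the input vector x."""
    dists = np.linalg.norm(som - x, axis=2)              # distance to every neuron
    wr, wc = np.unravel_index(np.argmin(dists), dists.shape)
    return neuron_labels[wr, wc]

# Toy demo: a 2x2 map of 4-dimensional "neurons" with known labels
som = np.array([[[0, 0, 0, 0], [1, 1, 1, 1]],
                [[0, 0, 1, 1], [1, 1, 0, 0]]], dtype=float)
labels = np.array([[0, 1],
                   [2, 3]])

# This input is nearest to the all-ones neuron at (0, 1), so it gets label 1
print(classify(som, labels, np.array([0.9, 0.9, 1.0, 0.8])))   # prints 1
```

The same function, given the real trained map, a label matrix, and a new 1024x1 digit vector, implements exactly the "find the winner, read off its label" recipe.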
