
Optimizing the architecture of Kohonen map

and classification
M. ETTAOUIL, A. ESSAFI, and F. HARCHLI
Abstract— Clustering is useful in data mining: it speeds up the search for relevant information. Many clustering algorithms exist in the literature; for instance, k-means, which is the oldest of them, and the Kohonen neural network. Because of their efficacy, these algorithms are preferred in this domain, but their performance depends on the initialization phase. This drawback presents a great challenge to researchers. In this vein, we propose in this work an approach able to improve the performance of the Kohonen neural algorithm. The method consists in initializing the neurons of the map with objects determined in a preprocessing phase. The goal of the Kohonen neural network is to group a set of data into an unknown number of clusters. In practice, the task amounts to searching for the prototypes of these classes. In the classical algorithm these prototypes are chosen randomly at the start and updated during the processing. Often this choice is not suitable and degrades the results. To alleviate this problem, we perform a preprocessing step whose goal is to look for the inputs around which many data points are concentrated. We then initialize the neurons of the Kohonen map with these inputs. With this technique we attempt to realize two goals: the first is to improve the quality of the Kohonen algorithm; the second is to optimize the architecture of the Kohonen map. Finally, to justify our approach, some experiments are performed.
Index Terms— Clustering, Kohonen networks, optimization of architectures, unsupervised learning.


1 INTRODUCTION
The automatic analysis of databases has become a daily need. The volume of data is continually increasing because of the huge amount of information constantly stored on web sites. The approaches used in data mining can be classified into two categories: the first is called factorial analysis and the second is named classification [9]. The latter is in turn divided into two groups: categorization, which is based on supervised learning, and clustering, which is based on unsupervised learning. This tool is very useful in information retrieval (IR). Indeed, it allows avoiding the search for relevant information in clusters in which the desired information certainly does not exist. According to the method adopted to define a cluster, the existing clustering algorithms can be broadly classified into several types [14], [10]: partitional clustering, hierarchical clustering, density-based clustering and grid-based clustering [14]. According to the type of variables allowed in the data set, they can also be categorized into: statistical, conceptual, fuzzy clustering, crisp clustering and Kohonen net clustering. The Kohonen algorithm is known for its robustness; it is also used as a means to project the inputs into a low-dimensional space. Contrary to principal component analysis, this algorithm realizes a nonlinear projection. This property is very important because it allows a visualization of the inputs. It has been applied in many domains such as: speech recognition [6], [20], robotics, sensory mapping, vector quantization [3] and text retrieval [4], [8]. We report also that the underlying idea of this algorithm is to simulate the sensory activity of the human brain. According to neuroscience, the brain is divided into areas, each of which is specialized in a specific sensory field. Exploiting this idea, Teuvo Kohonen proposed an unsupervised classification method. It consists in representing the inputs in a grid called the Kohonen map (or competition layer). This layer is formed by a set of nodes of dimension 1, 2 or 3, called neurons. To each of them is associated a vector which is initialized randomly and updated during the process. At the end of the process, each input is assigned to a neuron that represents the center of a class. Thereby the inputs are represented in the map in such a way that their topology is preserved. Despite these qualities, this algorithm has a drawback: the dependency of its performance on the initialization phase. In this paper we propose a method for performing an appropriate initialization. This goal is realized by conducting a pretreatment phase in which the training set is grouped into some clusters and the neuron weight vectors are initialized by the prototypes of these groups. The remainder of this paper is organized as follows: Section 2 presents a description of the Kohonen algorithm. A short view of related work is given in Section 3. We present our approach in more detail, describing the work process and giving the learning algorithm, in Section 4. In Section 5 some indexes are proposed to assess the quality of a classification. Section 6 is devoted to the experiments carried out and comments on them in order to highlight the proposed approach. Finally, Section 7 concludes this work by summarizing the obtained results and proposing some future directions.

- M. ETTAOUIL is with Scientific Computing and Computer Sciences, Engineering Sciences, Faculty of Sciences and Technology of Fez, University Sidi Mohammed Ben Abdellah, Box 2202, Fez, MOROCCO.
- A. ESSAFI is with Scientific Computing and Computer Sciences, Engineering Sciences, Faculty of Sciences and Technology of Fez, University Sidi Mohammed Ben Abdellah, Box 2202, Fez, MOROCCO.
- F. HARCHLI is with Scientific Computing and Computer Sciences, Engineering Sciences, Faculty of Sciences and Technology of Fez, University Sidi Mohammed Ben Abdellah, Box 2202, Fez, MOROCCO.


2 KOHONEN ALGORITHM
In this section, we present the Kohonen neural system, which has been used successfully in many areas: as a means of dimensionality reduction, for visualization and nonlinear projection of inputs, and as a classification tool. It is slow in the learning phase but rapid in the classification phase. Moreover, it has the ability to address problems with missing data. In the early 80s, Teuvo Kohonen, aiming to model the structure of the human brain, introduced this network architecture (Fig. 1). The network is formed by a set of neurons, each of them represented by two vectors: the first indicates the position of the neuron; the other, denoted w, represents its weight. The underlying idea is to project a set of data vectors of any dimension onto a map with one, two or more dimensions. This projection has to preserve the topology of the data set, i.e. input vectors which are similar to each other are placed close to each other on the map. Each neuron of the map has a neighborhood which is reduced during the process. The process is conducted as follows:
1- Initialization phase: The weight and position vectors are initialized randomly, so it is quite likely that the choice is incorrect. A great number of these vectors can lead to some unnecessary classes; on the other hand, a small number produces heterogeneous classes. Moreover, the choice of outlier inputs gives classes with few elements, which creates a serious problem in the labeling phase.
2- Training phase: In this phase a number of iterations is conducted, each consisting of two operations. In the first one, the inputs are presented to the system. At each presentation of an input $x(t)$, a neuron $N_i(t)$ of the network is selected according to the following rule:

$$i(t) = \arg\min_k \|x(t) - w_k(t)\| \qquad (1)$$

In the second one, the weights of the neurons are updated according to the following formula:

$$w_k(t+1) = w_k(t) + \varepsilon(t)\, h(t,k)\,(x(t) - w_k(t)) \qquad (2)$$

where $0 < \varepsilon(t) < 1$ is the learning-rate factor and the scalar $h(i,k)$ is called the neighborhood function. At the beginning, and during a time $\tau$, the values of this function are large; afterwards, the function decreases progressively. The value of the parameter $\tau$ determines the duration of the organization phase and the start of the convergence phase. Typically the neighborhood function has the Gaussian form:

$$h(t,k) = \exp\left(-\frac{\|r_k - r_i\|^2}{\sigma(t)^2}\right) \qquad (3)$$

where $r_k$ and $r_i$ designate respectively the spatial coordinates of the neurons $N_k$ and $N_i$. The process is stopped after $t_{\max}$ iterations or when the weights are stabilized, i.e.

$$\|w(t) - w(t-1)\| < \varepsilon \qquad (4)$$

where $\varepsilon$ is a predefined parameter.
3- Formation of classes: At the end of this process we obtain a number of classes which depends on the initial architecture of the map, defined as follows:

$$A_i = \{x \,/\, \|x - w_i\| = \min_k \|x - w_k\|\} \qquad (5)$$

$A_i$ is the $i$-th class; it is formed by the elements for which $w_i$ is the winning code-vector. The resulting partition satisfies the following property: the similarity between two examples decreases when the distance between those elements increases. This property provides a nice visualization along the Kohonen map. Before closing this section, we note that the Kohonen algorithm can be seen as a stochastic version of a fuzzy clustering algorithm with a neighborhood function.
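To make the training procedure concrete, the following minimal Python sketch implements rules (1), (2) and (3) for a one-dimensional map; the map size and the decreasing schedules chosen here for the learning rate and the neighborhood width are illustrative assumptions, not values prescribed by this paper.

import numpy as np

def som_train(X, n_neurons=10, t_max=1000, seed=0):
    """Minimal 1-D Kohonen map sketch implementing rules (1)-(3).

    X: (n_samples, n_features) data matrix. The schedules for eps(t)
    and sigma(t) are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Random initialization of the weight vectors (the classical scheme
    # that the paper's preprocessing is designed to replace).
    w = X[rng.choice(len(X), n_neurons, replace=False)].astype(float)
    r = np.arange(n_neurons, dtype=float)          # neuron positions on the map

    for t in range(t_max):
        x = X[rng.integers(len(X))]                # present a random input x(t)
        i = np.argmin(np.linalg.norm(x - w, axis=1))      # winner, rule (1)
        eps = 0.5 * (1.0 - t / t_max)              # decreasing learning rate
        sigma = 1.0 + (n_neurons / 2) * (1.0 - t / t_max) # shrinking neighborhood
        h = np.exp(-((r - r[i]) ** 2) / sigma ** 2)       # Gaussian kernel (3)
        w += eps * h[:, None] * (x - w)            # update rule (2)
    return w

def assign(X, w):
    """Assign each input to its winning neuron, forming the classes (5)."""
    return np.argmin(np.linalg.norm(X[:, None, :] - w[None, :, :], axis=2), axis=1)

Calling assign(X, som_train(X)) then yields the partition of formula (5).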
3 RELATED WORK
Cluster analysis suffers from several problems, the main ones being [18]: the identification of a suitable distance measure, the choice of suitable initial clusters, the estimation of the number of these latter, and the lack of knowledge about the data structure. The presence of outliers and the kind of attributes also have great effects on the results. To overcome some of these difficulties, researchers have used different strategies, among which we present the following.

Fig. 1. Topology of the Kohonen map


In order to alleviate these problems, some existing works hybridize supervised and unsupervised classification [1], [11].
Among the methods used to improve the efficiency of the Kohonen algorithm, we find one that uses growing neural gas. Others build the architecture of the map progressively by adding or removing neurons according to some criteria. More recent works transform the problem of optimizing the architecture of the Kohonen map into a linear programming problem [13].
To estimate the number of clusters in a data set, some authors use statistical criteria [5]. Others, relying on prior knowledge about the number of clusters, use a Hidden Markov Model to identify a more appropriate one [17]. To realize the same goal, still others use rough set theory and incremental clustering [16]. Aiming to produce effective results, there are other techniques [12], [2], such as removing the outliers [15].
4 PROPOSED METHOD
4.1 Description of the algorithm
As reported above, the problem of clustering algorithms lies in the dependency of their results on the initial phase. In unsupervised classification the data structure is totally unknown, including the number of clusters, so in the initial phase this number is fixed arbitrarily. The inputs which constitute the initial weights are also randomly chosen. Consequently, a bad choice of these elements (the case when the fixed number is very different from the real one, or when there are several outliers among the chosen inputs) leads to bad results. In this work we propose an approach which overcomes this difficulty, or at least alleviates it. In this method the initial phase is conducted automatically: the user simply has to present the data to the system, and it is the system which looks for the suitable initial weight vectors. The suitability of such objects is evaluated by measuring the quality of the clusters obtained by the algorithm. The envisaged solution consists in investigating the data structure iteratively and choosing the desired inputs from the areas where the density is high. More precisely, the proposed method can be presented as follows: first, using the Euclidean distance, we search for the most dissimilar pair of inputs. Then each input of the remaining data is assigned to the closest object of this pair, so the examples are regrouped into two subsets. Using the barycenters of these subsets as initial neurons, we apply the classical Kohonen algorithm. After a sufficient number of iterations this system provides some number of classes. Then the quality of these clusters is estimated using some useful criteria. This operation concludes the first iteration of our system. If the predefined stopping conditions are not satisfied, the system conducts a new iteration.
The proposed method thus consists of a number of iterations, each one performed as follows: from the subsets obtained by the first operation of the previous iteration, we determine the most heterogeneous one. This subset is divided into two subsets using the technique described above. Adding the barycenters of the new subsets to those used in the previous iteration, we form a new map, to which the same operations are applied. The system is stopped when all the subsets contain a number of elements equal to or less than a given threshold. As output, this algorithm provides a number of classes and the final map formed by a certain number of neurons. We note that the most heterogeneous class C is the one which maximizes the following function:

$$\frac{1}{|C|^2} \sum_{x,y \in C} \|x - y\|^2 \qquad (6)$$
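As an illustration, here is a minimal Python sketch of this selection criterion, under the reconstruction of formula (6) given above (the helper names are ours):

import numpy as np

def heterogeneity(C):
    """Mean squared pairwise distance of a class, as in formula (6)."""
    C = np.asarray(C, dtype=float)
    # Pairwise squared Euclidean distances, summed over all ordered pairs.
    d2 = np.sum((C[:, None, :] - C[None, :, :]) ** 2, axis=2)
    return d2.sum() / (len(C) ** 2)

def most_heterogeneous(classes):
    """Return the index of the class that maximizes formula (6)."""
    return max(range(len(classes)), key=lambda k: heterogeneity(classes[k]))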
We recall that, in each iteration, the quality of these classes is estimated using some clustering indexes. In this section we present one of them, denoted CI and defined as follows. In order to calculate this index we determine the following terms. The average similarity within the cluster $C_k$ is:

$$\delta_k = \frac{2}{|C_k|(|C_k|-1)} \sum_{i>j} sim(d_i, d_j) \qquad (7)$$

with $d_i$ and $d_j$ belonging to $C_k$. The intra-class similarity is measured by the average:

$$\alpha = \frac{1}{m} \sum_{k=1}^{m} \delta_k \qquad (8)$$

The similarity between two classes $C_k$ and $C_l$ is measured by the index:

$$\delta_{kl} = \frac{1}{|C_k||C_l|} \sum_{i=1}^{|C_k|} \sum_{j=1}^{|C_l|} sim(d_i, d_j) \qquad (9)$$

with $d_i \in C_k$ and $d_j \in C_l$. The average inter-class similarity is evaluated as follows:

$$\beta = \frac{2}{m(m-1)} \sum_{k>l} \delta_{kl} \qquad (10)$$
Finally, the overall measure of clustering performance is given by the following formula:

$$CI = \frac{2\alpha}{\alpha + \beta} \qquad (11)$$
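A sketch of the CI computation under the definitions (7)-(11) as reconstructed above; the paper does not fix the similarity function sim, so the cosine similarity used here is only an illustrative assumption:

import numpy as np
from itertools import combinations

def cosine_sim(a, b):
    # Illustrative choice of sim(.,.); the paper leaves it unspecified.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def ci_index(classes, sim=cosine_sim):
    """CI built from (7)-(11); assumes at least two classes, each with
    at least two members."""
    m = len(classes)
    # (7)-(8): average within-class similarity alpha.
    deltas = []
    for C in classes:
        pairs = list(combinations(C, 2))
        deltas.append(sum(sim(a, b) for a, b in pairs) / len(pairs))
    alpha = sum(deltas) / m
    # (9)-(10): average between-class similarity beta.
    delta_kl = []
    for Ck, Cl in combinations(classes, 2):
        delta_kl.append(sum(sim(a, b) for a in Ck for b in Cl) / (len(Ck) * len(Cl)))
    beta = sum(delta_kl) / len(delta_kl)
    # (11): higher alpha relative to beta indicates a better clustering.
    return 2 * alpha / (alpha + beta)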
The other indexes are described in the next section. The main operations of the proposed method can be summarized as follows:
Processing phase: Among the classes of the previous iteration we determine the most heterogeneous one. This class is divided into two subsets which are as homogeneous and as well separated as possible. Then the subsets whose number of elements is greater than a given threshold (D) are added to the previous set of classes. Finally we calculate the barycenters of those subsets.
Traditional phase: Using the barycenters of the subsets obtained in the processing phase as initial weight vectors, the classical Kohonen algorithm is performed.
Evaluation phase: In each iteration, and based on some useful criteria, we estimate the quality of the obtained classes. When the program stops, we choose the best result obtained over all iterations.
We point out that the proposed method can be used as a means of finding a reduced Kohonen map. It can also be used to optimize the architecture of the Kohonen map. To this end, we initialize the Kohonen map randomly and apply the Kohonen algorithm to it. Afterwards, we apply the proposed algorithm, attempting to determine an appropriate map which realizes the same performance or better. In this context the algorithm is stopped when the quality of the classification is better than that obtained by the Kohonen algorithm whose weight vectors are taken randomly.

4.2 Learning algorithm:
The different stages can be translated into the following
algorithm:
Input:
n: number of data
E = {x_1, x_2, ..., x_n}: input set
T_max: number of iterations of the Kohonen algorithm
ε(t): a decreasing function used in the learning phase
D: a positive real which represents the threshold
Initialization:
t = 1, G = E, F = ∅, S = ∅
N = 0: the initial size of the Kohonen map (N is kept equal to |W(t)|)
W(t) = ∅: the set of weights is empty before their construction
M = 0: distance matrix
Phase 1: Preprocessing
1. M = (a_ij)_{i,j}, where a_ij = d(x_i, x_j) is the distance between the inputs x_i ∈ G and x_j ∈ G.
2. Look for the two inputs O_1 and O_2 whose mutual distance is the maximum entry of M.
3. Form two groups:
C_1 = {the objects which are closer to O_1 than to O_2},
C_2 = {the objects which are closer to O_2 than to O_1},
and set S = S ∪ {C_1, C_2}.
4. W(t) = W(t) ∪ {w_1, w_2}, where w_1 and w_2 are respectively the barycenters of the classes C_1 and C_2.
Phase 2: Kohonen algorithm
1. While t ≤ T_max and max_{1≤i≤N} ||w_i(t) − w_i(t−1)|| ≥ ε do:
1.1. Choose randomly an input x from the training set.
1.2. Determine the winner neuron by the following equation: k = argmin_{1≤j≤N} ||x − w_j(t)||.
1.3. Update all the weights of the map by the following rule: w_j(t) = w_j(t−1) + ε(t,k)(x − w_j(t−1)). The concept of neighborhood is introduced through the positive and symmetrical function ε(t,k) = ε(t) h(t,k), where ε(t) and h(t,k) are as defined above.
1.4. F = F ∪ {k}, where F is the set of winner neurons.
1.5. t = t + 1.
2. Classify the objects and evaluate this classification using the evaluation index.
3. Choose the class C_i (if one exists) which is the most heterogeneous and whose size is greater than or equal to D. Then set G = C_i, W(t) = W(t−1)\{w_i}, where w_i is the barycenter of C_i, set S = S\{C_i}, and return to Phase 1.
4. If all the classes have a size less than D, the algorithm is stopped.
In order to evade the outliers, we propose to ignore the subsets whose number of elements is less than a given value D. The best result returned over all the iterations is retained.

Fig. 2. Diagram of the proposed method
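Putting the phases together, the following Python sketch mirrors the algorithm above, reusing the som_train/assign, heterogeneity and ci_index helpers sketched earlier; som_train_from, a variant of som_train accepting initial weights, is an assumed helper, and the threshold D is a free parameter:

import numpy as np

def farthest_pair_split(G):
    """Phase 1: split G into two groups around its two most distant inputs."""
    G = np.asarray(G, dtype=float)
    d = np.linalg.norm(G[:, None, :] - G[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    closer_to_i = d[:, i] <= d[:, j]
    return G[closer_to_i], G[~closer_to_i]

def proposed_method(X, D=10, t_max=1000, quality=ci_index):
    """Sketch of the full procedure: grow the map by splitting the most
    heterogeneous subset and re-running Kohonen training from barycenters."""
    X = np.asarray(X, dtype=float)
    S = list(farthest_pair_split(X))              # Phase 1 on the whole data set
    best = None
    while True:
        W0 = np.array([C.mean(axis=0) for C in S])    # barycenters as weights
        # Phase 2: classical Kohonen training seeded with W0 (som_train_from
        # is an assumed variant of som_train accepting initial weights).
        W = som_train_from(X, W0, t_max=t_max)
        labels = assign(X, W)
        # Evaluation phase; singleton classes are skipped when scoring.
        classes = [X[labels == k] for k in range(len(W)) if (labels == k).sum() > 1]
        score = quality(classes)
        if best is None or score > best[0]:
            best = (score, W, labels)
        # Step 3: pick the most heterogeneous subset of size >= D, if any.
        big = [k for k in range(len(S)) if len(S[k]) >= D]
        if not big:
            return best                           # step 4: all subsets below D
        k = max(big, key=lambda k: heterogeneity(S[k]))
        S.extend(farthest_pair_split(S.pop(k)))   # split it and iterate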
5 EVALUATION OF CLASSIFICATION QUALITY
Evaluating the quality of a classification is an important and delicate task. A good classification is one which realizes both good homogeneity within the classes and good separation between them. These properties are typically measured using the following factors:
- Intra-class inertia, defined by the following formula:

$$I_W = \sum_{j=1}^{m} \sum_{i \in N_j} p_i \|x_i - G_j\|^2 \qquad (12)$$

- Inter-class inertia, defined by the following formula:

$$I_B = \sum_{i=1}^{m} P_i \|G_i - G\|^2 \qquad (13)$$

- Total inertia: $I = I_W + I_B$

where $G$ is the barycenter of the data set, $G_i$ is the barycenter of the $i$-th class, $N_j$ is the set of indexes of the observations of class $j$, $p_i = 1/n$ is the weight of observation $x_i$, and $P_i = n_i/n$ is the weight of class $i$, $n_i$ and $n$ being respectively the cardinality of class $i$ and of the whole data set.
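A minimal Python sketch of these two factors with uniform observation weights 1/n, under which the decomposition I = I_W + I_B holds exactly:

import numpy as np

def inertias(classes):
    """Intra-class inertia I_W (12) and inter-class inertia I_B (13).

    classes: list of (n_k, d) arrays forming a partition of the data.
    """
    n = sum(len(C) for C in classes)
    G = np.vstack(classes).mean(axis=0)              # global barycenter G
    iw = ib = 0.0
    for C in classes:
        Gi = C.mean(axis=0)                          # class barycenter G_i
        iw += np.sum(np.linalg.norm(C - Gi, axis=1) ** 2) / n
        ib += (len(C) / n) * np.linalg.norm(Gi - G) ** 2
    return iw, ib                                    # total inertia: iw + ib

The ratio ib / (iw + ib) then gives the proportion of variance explained, used in formula (14) below.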
In the literature there are several indexes [19], [7] measuring the quality of a classification. Most of them use the factors mentioned above. Among these indexes we present the following (code sketches of some of them are given at the end of this section):
1. Proportion of variance explained by the classes, defined by:

$$R^2 = \frac{I_B}{I} \qquad (14)$$

The value of this index should be as close as possible to 1 without there being too many classes; one stops after the last major leap.
2. Pseudo-F: a measure of separation using all the classes:

$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \qquad (15)$$

where n is the number of observations and k the number of classes.
3. Rand index: determines the rate of data pairs correctly classified by the proposed method with respect to a reference classification (ref). It is defined as follows:

$$R = \frac{a + d}{a + b + c + d} \qquad (16)$$

where a, b, c and d are parameters calculated over all data pairs (i, j) as follows: if $C_{ref}(i)$, $C_{ref}(j)$, $C_{obt}(i)$ and $C_{obt}(j)$ are respectively the classes of i and j in the reference base and those obtained by the proposed method, we have:

$$a = |\{(i,j) \,/\, C_{ref}(i) = C_{ref}(j) \text{ and } C_{obt}(i) = C_{obt}(j)\}|$$
$$b = |\{(i,j) \,/\, C_{ref}(i) = C_{ref}(j) \text{ and } C_{obt}(i) \neq C_{obt}(j)\}|$$
$$c = |\{(i,j) \,/\, C_{ref}(i) \neq C_{ref}(j) \text{ and } C_{obt}(i) = C_{obt}(j)\}|$$
$$d = |\{(i,j) \,/\, C_{ref}(i) \neq C_{ref}(j) \text{ and } C_{obt}(i) \neq C_{obt}(j)\}|$$
4. Silhouette index: for each object $x_i$, the silhouette index $s(i)$ measures how similar the object $x_i$ is to the members of its own cluster compared with those of the other clusters. The average of these indexes evaluates the tightness and separation of the clusters. This index is defined as follows:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \qquad (17)$$

which can be written as:

$$s(i) = \begin{cases} 1 - \dfrac{a(i)}{b(i)}, & \text{if } a(i) < b(i) \\[4pt] 0, & \text{if } a(i) = b(i) \\[4pt] \dfrac{b(i)}{a(i)} - 1, & \text{if } a(i) > b(i) \end{cases}$$

where $a(i)$ is the average dissimilarity of $x_i$ with all other data within the same cluster, and $b(i)$ is the lowest average dissimilarity of $x_i$ to the data of any other cluster. We note that $-1 \le s(i) \le 1$: the closer $s(i)$ is to 1, the better $x_i$ is classified; in contrast, if $s(i)$ is close to -1, $x_i$ has more likely been assigned to the wrong cluster. Finally, if this value is about zero, $x_i$ lies on the border between two natural clusters. The average silhouette can be used to measure the quality of the clusters and the optimality of their number.
5. F-measure: To define this measure we use the following terms: true positives ($t_p$), true negatives ($t_n$), false positives ($f_p$) and false negatives ($f_n$), which refer respectively to the number of correctly classified observations, the number of correctly absent observations, the number of unexpected observations and the number of missing observations. This measure compares the investigated classifier to a reference one. It is defined as follows:

$$F = 2\,\frac{precision \times recall}{precision + recall} \qquad (18)$$

where:

$$precision = \frac{t_p}{t_p + f_p}, \qquad recall = \frac{t_p}{t_p + f_n}$$

This function is a special case of the general $F_\beta$ measure:

$$F_\beta = (1+\beta^2)\,\frac{precision \times recall}{\beta^2\, precision + recall}$$

There are other useful rates, such as the true negative rate and the accuracy:

$$tr = \frac{t_n}{t_n + f_p}, \qquad Ac = \frac{t_p + t_n}{t_p + t_n + f_p + f_n}$$

Several recent works use the recall and precision measures. Some of them use only the ratio of correctly classified inputs.
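As referenced above, here are minimal Python sketches of the pseudo-F (15), the Rand index (16), the silhouette (17) and the F-measure (18); the Euclidean distance used as dissimilarity and the helper names are our choices:

import numpy as np
from itertools import combinations

def rand_index(ref, obt):
    """Rand index (16): fraction of data pairs on which the obtained
    labeling agrees with the reference one."""
    a = b = c = d = 0
    for i, j in combinations(range(len(ref)), 2):
        same_ref, same_obt = ref[i] == ref[j], obt[i] == obt[j]
        if same_ref and same_obt:
            a += 1
        elif same_ref:
            b += 1
        elif same_obt:
            c += 1
        else:
            d += 1
    return (a + d) / (a + b + c + d)

def pseudo_f(r2, n, k):
    """Pseudo-F statistic (15), with r2 = I_B / I from formula (14)."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

def silhouette_values(X, labels):
    """Silhouette s(i) of each object, per formula (17), using Euclidean
    distance as the dissimilarity; assumes at least two clusters."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        # a(i): mean dissimilarity to the other members of x_i's cluster.
        a = d[i, same].sum() / max(same.sum() - 1, 1)
        # b(i): lowest mean dissimilarity to any other cluster.
        b = min(d[i, labels == l].mean() for l in set(labels) if l != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

def f_measure(tp, fp, fn, beta=1.0):
    """General F_beta measure; beta = 1 gives formula (18)."""
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)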
6 EXPERIMENTAL RESULTS
6.1 Presentation of the experiments
To show the advantage of the proposed method, experiments were conducted on the IRIS data set, which is widely used in this area. It consists of three clusters: Iris setosa, Iris virginica and Iris versicolor. Each of them contains 50 samples described by four features: sepal length, sepal width, petal length and petal width. In each of these experiments we attempted to realize a specific goal, and different criteria were used to estimate the quality of the obtained clustering. The results of the experiments are stored in tables, which are then translated into curves giving a rapid overview of the investigated phenomena. The goal of the first experiment is to optimize the architecture of the Kohonen map. In this vein, we perform the traditional Kohonen algorithm using an initial architecture of the map chosen randomly; then we carry out our method, looking for the same performance or better. The results are stored in Table 1 and translated into the curve presented in Figure 3. Some of these experiments were conducted attempting to determine the suitable architecture, which is realized when the value of the Rand index is maximal. The results of these experiments are presented in Figure 4. In order to perform a comparison between the proposed algorithm and the usual ones, we established some other tables and curves, such as: Figure 3, which presents the variation of the Rand index versus the initial number of neurons obtained by the traditional Kohonen algorithm; Figure 4, which gives a comparison between the quality of the clustering obtained by the classical Kohonen algorithm and by the proposed method; Table 2 and Figure 5, which give a comparison between the efficiency of the five indexes used in this work. Finally, Tables 3 and 4 provide a comparison between our method and those commonly used in the literature.



Fig. 3. Variation of the Rand index obtained by the Kohonen algorithm versus the initial number of neurons
TABLE 1
THE RESULTS OBTAINED BY THE CLASSICAL KOHONEN ALGORITHM AND THE PROPOSED ONE






6.2 Comments on the empirical results
Table 1 shows that the proposed method can be used as an efficient tool to optimize the architecture of the Kohonen map: indeed, using a number of neurons smaller than that used by the classical Kohonen algorithm, the proposed method provides better results. Consequently the use of this method can greatly reduce the number of neurons. The curve of Figure 3 shows clearly that beyond a certain number of neurons the value of the Rand index remains constant, so using a number of neurons greater than this value is a bad choice. This proves the importance of the present work.
Figure 4 presents the variation of the classification quality versus the number of neurons for the classical Kohonen algorithm and for the proposed one. On the one hand, it shows that the suitable map contains only 11 neurons (the map obtained by our method). On the other hand, it shows that the results of our method are better than those obtained by the classical one.
Table 2 and Figure 5 show clearly that the Rand index is more efficient than the other criteria.
Table 3 shows the advantage of the proposed method: indeed, only three inputs are misclassified and, moreover, these inputs belong to the same class. These results are satisfactory in comparison with those obtained by the efficient known algorithms presented in Table 4.
TABLE 2
A COMPARISON BETWEEN THE EFFICIENCY OF THE FIVE INDEXES

TABLE 3
THE QUALITY OF THE CLUSTERING OBTAINED BY THE PROPOSED METHOD

Fig. 5. Variation of the five indexes used to evaluate the classification quality versus the number of neurons

Fig. 4. Comparison of the classification quality obtained by the Kohonen algorithm and by the proposed method

TABLE 4
COMPARISON BETWEEN OUR APPROACH AND SOME OTHERS

7 CONCLUSION
The Kohonen network is an efficient tool for clustering unlabeled data. It has important properties and performs interesting tasks: it is used as a means of dimensionality reduction and of projection and visualization of objects. But it suffers from some drawbacks which have a great influence on the results. In this paper we presented a method attempting to overcome some of those problems and hence to improve the performance of the system. The conducted experiments showed that the proposed method gives very satisfactory results, namely:
- The architecture of the map: instead of imposing the architecture of the map (as is done in the traditional algorithm), in our method it is determined automatically by an incremental process. This technique provided a suitable architecture which is much smaller than that of the classical algorithm.
- Quality of clustering: contrary to the traditional algorithm, the quality of the clustering in the proposed method is controlled by the system.
- Initialization phase: this phase is conducted automatically and iteratively by the system. Indeed, the suitable initial vectors and the number of classes are determined by the system.
Finally, the good results obtained in the conducted experiments show the advantage of our approach. Indeed, the results of our experiments outperform those obtained by the classical algorithm. This method can therefore be used as an efficient optimization tool. It also provides solutions to the problems evoked at the beginning of this work.
As a perspective, we will integrate other parameters attempting to further improve the performance of this method, and we will apply this approach in the field of text classification.
REFERENCES
[1] C. Lanquillon, "Partially Supervised Text Categorization: Combining Labeled and Unlabeled Documents Using an EM-like Scheme," Proceedings of the 11th European Conference on Machine Learning (ECML 2000), vol. 1810 of LNCS, Springer-Verlag, pp. 229-237, 2000.
[2] D. Shifei, X. Li, Z. Hong, and Z. Liwen, "Research and Progress of Cluster Algorithms Based on Granular Computing," International Journal of Digital Content Technology and its Applications, vol. 4, no. 5, Aug. 2010.
[3] E. Le Bail and A. Mitiche, "Quantification Vectorielle d'Images par le Réseau Neuronal de Kohonen," Traitement du Signal, vol. 6, no. 6, 1989.
[4] G. Salton, "Developments in Automatic Text Retrieval," Science, vol. 253, no. 5023, pp. 974-980, Aug. 1991.
[5] G.W. Milligan and M.C. Cooper, "An Examination of Procedures for Determining the Number of Clusters in a Data Set," Psychometrika, vol. 50, no. 2, pp. 159-179, 1985.
[6] I. Lapidot, H. Guterman, and A. Cohen, "Unsupervised Speaker Recognition Based on Competition Between Self-Organizing Maps," IEEE Transactions on Neural Networks, vol. 13, no. 4, Jul. 2002.
[7] J.C. Bezdek, C. Coray, R. Gunderson, and J. Watson, "Detection and Characterization of Cluster Substructure," SIAM Journal on Applied Mathematics, vol. 40, pp. 339-372, 1981.
[8] J.H. Youssif and M.A. Fekihal, "Neural Approach for Determining Mental Health Problems," Journal of Computing, vol. 4, no. 1, Jan. 2012.
[9] J.P. Agnelli, M. Cadeiras, E.G. Tabak, C.V. Turner, and E. Vanden-Eijnden, "Clustering and Classification Through Normalizing Flows in Feature Space," Multiscale Modeling & Simulation, vol. 8, no. 5, pp. 1784-1802, 2010.
[10] J. Vesanto and E. Alhoniemi, "Clustering of the Self-Organizing Map," IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 586-600, May 2000.
[11] K. Nigam, A.K. McCallum, S. Thrun, and T. Mitchell, "Text Classification from Labeled and Unlabeled Documents Using EM," Machine Learning, vol. 39, no. 2/3, pp. 103-134, 2000.
[12] M.C. Su, L. Ta-Kang, and C. Hsiao-Te, "Improving the Self-Organizing Feature Map Algorithm Using an Efficient Initialization Scheme," Tamkang Journal of Science and Engineering, vol. 5, no. 1, pp. 35-48, 2002.
[13] M. Ettaouil, Y. Ghanou, K. Elmoutaouakil, and M. Lazaar, "A New Architecture Optimization Model for the Kohonen Networks and Clustering," Journal of Advanced Research in Computer Science, vol. 3, no. 1, pp. 14-32, 2011.
[14] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On Clustering Validation Techniques," Journal of Intelligent Information Systems, vol. 17, no. 2-3, pp. 107-145, 2001.
[15] M.I. Petrovskiy, "Outlier Detection Algorithms in Data Mining Systems," Programming and Computer Software, vol. 29, no. 4, pp. 228-237, 2003.
[16] M.N.M. Sap and E. Mohebi, "Hybrid Self Organizing Map for Overlapping Clusters," International Journal of Signal Processing and Pattern Recognition, vol. 1, 2009.
[17] M. Rafiul Hassan, B. Nath, and M. Kirley, "A Data Clustering Algorithm Based on a Single Hidden Markov Model," Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 57-66.
[18] P.M. Agarwal, Q. Alam, and R. Biswas, "Challenges and Tools of Clustering Algorithms," International Journal of Computer Science Issues, vol. 8, no. 2, May 2011.
[19] S. Dehuri, A. Ghosh, and R. Mall, "Genetic Algorithms for Multi-Criterion Classification and Clustering in Data Mining," International Journal of Computing and Information Sciences, vol. 4, no. 3, pp. 143-154, Dec. 2006.
[20] Y. Bassil and P. Semaan, "ASR Context-Sensitive Error Correction Based on Microsoft N-Gram Dataset," Journal of Computing, vol. 4, no. 1, Jan. 2012.
