Invited Paper
Among the architectures and algorithms suggested for artificial neural networks, the Self-Organizing Map has the special property of effectively creating spatially organized "internal representations" of various features of input signals and their abstractions. One novel result is that the self-organization process can also discover semantic relationships in sentences. In this respect the resulting maps very closely resemble the topographically organized maps found in the cortices of the more developed animal brains. After supervised fine tuning of its weight vectors, the Self-Organizing Map has been particularly successful in various pattern recognition tasks involving very noisy signals. In particular, these maps have been used in practical speech recognition, and work is in progress on their application to robotics, process control, telecommunications, etc. This paper contains a survey of several basic facts and results.

Manuscript received August 18, 1989; revised March 15, 1990. The author is with the Department of Computer Science, Helsinki University of Technology, 02150 Espoo, and the Academy of Finland, 00550 Helsinki, Finland. IEEE Log Number 9038826.

I. INTRODUCTION

A. On the Role of the Self-Organizing Map Among Neural Network Models

The network architectures and signal processes used to model nervous systems can roughly be divided into three categories, each based on a different philosophy. Feedforward networks [94] transform sets of input signals into sets of output signals. The desired input-output transformation is usually determined by external, supervised adjustment of the system parameters. In feedback networks [27], the input information defines the initial activity state of a feedback system, and after state transitions the asymptotic final state is identified as the outcome of the computation. In the third category, neighboring cells in a neural network compete in their activities by means of mutual lateral interactions, and develop adaptively into specific detectors of different signal patterns. In this category learning is called competitive, unsupervised, or self-organizing.

The Self-Organizing Map discussed in this paper belongs to the last category. It is a sheet-like artificial neural network, the cells of which become specifically tuned to various input signal patterns or classes of patterns through an unsupervised learning process. In the basic version, only one cell or local group of cells at a time gives the active response to the current input. The locations of the responses tend to become ordered as if some meaningful coordinate system for different input features were being created over the network. The spatial location or coordinates of a cell in the network then correspond to a particular domain of input signal patterns. Each cell or local cell group acts like a separate decoder for the same input. It is thus the presence or absence of an active response at that location, and not so much the exact input-output signal transformation or magnitude of the response, that provides an interpretation of the input information.

The Self-Organizing Map was intended as a viable alternative to more traditional neural network architectures. It is possible to ask just how "neural" the map is. Its analytical description has already been developed further in the technical than in the biological direction. But the learning results achieved seem very natural, at least indicating that the adaptive processes at work in the map may be similar to those encountered in the brain. There may therefore be sufficient justification for calling these maps "neural networks" in the same sense as their traditional rivals.

Self-Organizing Maps, or systems consisting of several map modules, have been used for tasks similar to those to which other more traditional neural networks have been applied: pattern recognition, robotics, process control, and even processing of semantic information. The spatial segregation of different responses and their organization into topologically related subsets results in a high degree of efficiency in typical neural network operations. Although the largest map we have used in practical applications has only contained about 1000 cells, its learning speed, especially when using computational shortcuts, can be increased to orders of magnitude greater than that of many other neural networks. Thus much larger maps than those used so far are quite feasible, although it also seems that practical applications favor hierarchical systems made up of many smaller maps.

It may be appropriate to observe here that if the maps are used for pattern recognition, their classification accuracy can be multiplied if the cells are fine-tuned using supervised learning principles (cf. Sec. III).

Although the Self-Organizing Map principle was introduced in early 1981, no complete review has appeared in compact form, except perhaps in [44], which does not contain the latest results. I have therefore tried to collect a variety of basic material in the present paper.
B. Brain Maps

As much as a hundred years ago, a quite detailed topographical organization of the brain, and especially of the cerebral cortex, could be deduced from functional deficits and behavioral impairments induced by various kinds of lesion, or by hemorrhages, tumors, or malformations. Different regions in the brain thereby seemed to be dedicated to specific tasks. One modern systematic technique for causing controllable, reversible simulated lesions is to stimulate a particular site with small electric currents, thereby eventually inducing both excitatory and inhibitory effects and disturbing the assumed local function [75]. If such a spatially confined stimulus then disrupts a specific cognitive ability such as naming of objects, it gives at least some indication that this site is essential to that task.

One straightforward method for locating a response is to record the electric potential or train of neural impulses associated with it. Many detailed mappings, especially from the primary sensory and associative areas of the brain, have been made using various electrophysiological recording techniques.

Direct evidence for any localization of brain functions can also be obtained using modern imaging techniques that display the strength and spatial distribution of neural responses simultaneously over a large area, with a spatial resolution of a few millimeters. The two principal methods which use radioactive tracers are positron emission tomography (PET) [80] and autoradiography of the brain through very narrow collimators (gamma camera). PET reveals changes in oxygen uptake and phosphate metabolism. The gamma camera method directly detects changes in cerebral blood flow. Both phenomena correlate with local neural activity, but they are unable to monitor rapid phenomena. In magnetoencephalography (MEG), the low magnetic field caused by electrical neural responses is detected, and by computing its sources, quite rapid neural responses can be directly analyzed, with a spatial resolution of a few millimeters. The main drawback of MEG is that only current dipoles parallel to the surface of the skull are detectable; and since the dipoles are oriented perpendicular to the cortex, only the sulci can be studied with this method. A review of experimental techniques and results relating to these studies can be found in [32].

After a large number of such observations, a fairly detailed organizational view of the brain has evolved [32]. Especially in higher animals, the various cortices in the cell mass seem to contain many kinds of "map" [33], such that a particular location of the neural response in the map often directly corresponds to a specific modality and quality of sensory signal. The field of vision is mapped "quasiconformally" onto the primary visual cortex. Some of the maps, especially those in the primary sensory areas, are ordered according to some feature dimensions of the sensory signals; for instance, in the visual areas, there are line orientation and color maps [11], [116], and in the auditory cortex there are the so-called tonotopic maps [91], [103], [104], which represent pitches of tones in terms of the cortical distance, or other auditory maps [98]. One of the sensory maps is the somatotopic map [29], [30] which contains a representation of the body, i.e., the skin surface. Adjacent to it is a motor map [70] that is topographically almost identically organized. Its cells mediate voluntary control actions on muscles. Similar maps exist in other parts of the brain [97]. On the higher levels the maps are usually unordered, or at most the order is a kind of ultrametric topological order that is not easily interpretable. There are also singular cells that respond to rather complex patterns, such as the human face [9], [93].

Some maps represent quite abstract qualities of sensory and other experiences. For instance, in the word-processing areas, neural responses seem to be organized according to categories and semantic values of words ([6], [15], [65], [67], [106]-[109], and [114]; cf. also Sec. V).

It thus seems as if the internal representations of information in the brain are generally organized spatially. Although there is only partial biological evidence for this, enough data are already available to justify further theoretical studies of this principle. Artificial self-organizing maps and brain maps thus have many features in common, and what is even more intriguing, we now fully understand the processes by which such artificial maps can be formed adaptively and completely automatically.

C. Early Work on Competitive Learning

The basic idea underlying what is called competitive learning is roughly as follows: Assume a sequence of statistical samples of a vectorial observable x = x(t) in R^n, where t is the time coordinate, and a set of variable reference vectors {m_i(t): m_i in R^n, i = 1, 2, ..., k}. Assume that the m_i(0) have been initialized in some proper way; random selection will often suffice. If x(t) can somehow be simultaneously compared with each m_i(t) at each successive instant of time, taken here to be an integer t = 1, 2, 3, ..., then the best-matching m_i(t) is to be updated to match even more closely the current x(t). If the comparison is based on some distance measure d(x, m_i), altering m_i must be such that, if i = c is the index of the best-matching reference vector, then d(x, m_c) is decreased, and all the other reference vectors m_i, with i not equal to c, are left intact. In this way the different reference vectors tend to become specifically "tuned" to different domains of the input variable x. It will be shown below that if p is the probability density function of the samples x, then the m_i tend to be located in the input space R^n in such a way that they approximate to p in the sense of some minimal residual error.

Vector Quantization (VQ) (cf., e.g., [19], [54], [58]) is a classical method that produces an approximation to a continuous probability density function p(x) of the vectorial input variable x using a finite number of codebook vectors m_i, i = 1, 2, ..., k. Once the "codebook" is chosen, the approximation of x involves finding the reference vector m_c closest to x. One kind of optimal placement of the m_i minimizes E, the expected rth power of the reconstruction error:

E = integral of ||x - m_c||^r p(x) dx,   (1)

where dx is the volume differential in the x space, and the index c = c(x) of the best-matching codebook vector ("winner") is a function of the input vector x:

||x - m_c|| = min_i {||x - m_i||}.   (2)

In general, no closed-form solution for the optimal placement of the m_i is possible, and iterative approximation schemes must be used.

It has been pointed out in [14], [64], and [115] that (1)
set of signal values is only possible if the interrelation between the signals is simple. In many practical problems, such as image analysis (cf. Discussion, Sec. VII), it will generally be necessary to use some kind of preprocessing to extract a set of invariant features for the components of x.

B. Adaptation (Updating) of the Weight Vectors

It is crucial to the formation of ordered maps that the cells doing the learning are not affected independently of each other (cf. competitive learning in Sec. I-C), but as topologically related subsets, on each of which a similar kind of correction is imposed. During the process, such selected subsets will then encompass different cells. The net corrections at each cell will thus tend to be smoothed out in the long run. An even more intriguing result of this sort of spatially correlated learning is that the weight vectors tend to attain values that are ordered along the axes of the network.

In biophysically inspired neural network models, correlated learning by spatially neighboring cells can be implemented using various kinds of lateral feedback connection and other lateral interactions. In the present process we want to enforce lateral interaction directly in a general form, for arbitrary underlying network structures, by defining a neighborhood set N_c around cell c. At each learning step, all the cells within N_c are updated, whereas cells outside N_c are left intact. This neighborhood is centered around that cell for which the best match with input x is found:

||x - m_c|| = min_i {||x - m_i||}.   (2')

The width or radius of N_c can be time-variable; in fact, for good global ordering, it has experimentally turned out to be advantageous to let N_c be very wide in the beginning and shrink monotonically with time. The updating process may then be written

m_i(t + 1) = m_i(t) + alpha(t)[x(t) - m_i(t)]   if i belongs to N_c(t),
m_i(t + 1) = m_i(t)                             if i does not belong to N_c(t),   (6)

where alpha(t) is a scalar-valued "adaptation gain," 0 < alpha(t) < 1. It is related to a similar gain used in the stochastic approximation processes [49], [92], and as in these methods, alpha(t) should decrease with time.

An alternative notation is to introduce a scalar "kernel" function h_ci = h_ci(t),

m_i(t + 1) = m_i(t) + h_ci(t)[x(t) - m_i(t)],   (7)

whereby, above, h_ci(t) = alpha(t) within N_c and h_ci(t) = 0 outside N_c. On the other hand, the definition of h_ci can also be more general; a biological lateral interaction often has the shape of a "bell curve." Denoting the coordinates of cells c and i by the vectors r_c and r_i, respectively, a proper form for h_ci might be

h_ci = h_0 exp(-||r_c - r_i||^2 / sigma^2),   (8)

with h_0 = h_0(t) and sigma = sigma(t) as suitable decreasing functions of time.

C. Demonstrations of the Ordering Process

The first computer simulations presented here are intended to illustrate the effect that the weight vectors tend to approximate to the density function of the input vectors in an orderly fashion. In these examples, the input vectors were chosen to be two-dimensional for visual display purposes, and their probability density function was arbitrarily selected to be uniform over the area demarcated by the borderlines (square or triangle). Outside the frame the density was zero. The vectors x(t) were drawn from this density function independently and at random, after which they caused adaptive changes in the weight vectors m_i.

The m_i vectors appear as points in the same coordinate system as that in which the x(t) are represented; in order to indicate to which unit each m_i value belongs, the points
Fig. 5. Representation of three-dimensional (uniform) density functions by two-dimensional maps.
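The ordering demonstrations of Sec. II-C can be reproduced with a short sketch of the algorithm of (6)-(8). The lattice size, the number of steps, and the particular decay schedules assumed below for h_0(t) and sigma(t) are illustrative choices, not values given in the text:

```python
import numpy as np

def train_som(grid=(10, 10), steps=20000, seed=0):
    """Sketch of the map algorithm of Eqs. (6)-(8): winner search plus
    a Gaussian neighborhood kernel h_ci with a shrinking radius."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # lattice coordinates r_i of the cells, and random initial weights m_i(0)
    r = np.array([(i, j) for i in range(rows) for j in range(cols)], dtype=float)
    m = rng.random((rows * cols, 2))
    for t in range(steps):
        x = rng.random(2)                              # x(t) drawn from a uniform 2-D density
        frac = t / steps
        h0 = 0.8 * (1.0 - frac)                        # decreasing gain h_0(t) (assumed schedule)
        sigma = 4.0 * (0.5 / 4.0) ** frac              # decreasing radius sigma(t) (assumed schedule)
        c = int(np.argmin(np.linalg.norm(m - x, axis=1)))                # best-matching cell, Eq. (2')
        h = h0 * np.exp(-np.sum((r - r[c]) ** 2, axis=1) / sigma ** 2)   # kernel of Eq. (8)
        m += h[:, None] * (x - m)                      # update of Eq. (7), applied to all cells
    return m.reshape(rows, cols, 2)

# After training, weights[i, j] approximate the input density in an ordered fashion.
weights = train_som()
```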
ponents, 7) normalization of the pattern vectors. Except for steps 1) and 2), an integrated-circuit signal processor, TMS32010, is used for the computation.

B. Phoneme Map

The simplest type of speech map formed by self-organization is the static phoneme map. There are 21 phonemes in Finnish: /u, o, a, ä, ö, y, e, i, s, m, n, t, l, r, j, v, h, d, k, p, ŋ/. For their representation we used short-time spectra as the input patterns x(t). The spectra were evaluated every 9.83 ms. They were computed by the 256-point FFT, from which a 15-component spectral vector was formed by grouping of the channels. In the present study all the spectral samples, even those from the transitory regions, were employed and presented to the algorithm in the natural order of their utterance. During learning, the spectra were not segmented or labeled in any way: any features present in the speech waveform contributed to the self-organized map. After adaptation, the map was calibrated using known stationary phonemes (Fig. 10). The map resembles the well-known formant maps used in phonetics; the main difference is that in our maps complete spectra, not just two of their resonant frequencies as in formant maps, are used to define the mapping.

Recognition of discrete phonemes is a decision-making process in which the final accuracy only depends on the rate of misclassification errors. It is therefore necessary to try to minimize them using a decision-controlled (supervised) learning scheme, with a training set of speech spectra of known classification.

In practice, for a new speaker, it will be sufficient to dictate 200 to 300 words, which are then analyzed by an automatic segmentation method. The latter picks up the training spectra that are applied in the supervised learning algorithm. The finite set of training spectra (of the order of 2000) must be repeated in the algorithm either cyclically or in a random permutation. LVQ1, LVQ2, or LVQ3 can be used as the fine tuning algorithm. A map created for a typical (standard) speaker can then be modified for a new speaker very quickly, using 100 more dictated words and LVQ fine tuning only.

C. Specific Problems with Transient Phonemes

Generally, the spectral properties of consonants behave more dynamically than those of vowels. Especially in the case of stop consonants, it seems better to pay attention to the plosive burst and the transient region between the consonant and the subsequent vowel in order to identify the consonant. In our system transient information is coded in additional "satellite" maps (called transient maps), and they are trained, using transient spectral samples alone, to describe the dynamic features with higher resolution [48]. Our system was in fact developed in two versions: one for Finnish and one for Japanese. In the Japanese version, four transient maps have been constructed to distinguish the following cases:

1) voiceless stops /k, p, t/ and glottal stop (vowel at the beginning of utterance),
2) voiceless stops /k, p, t/ without comparison with the glottal stop,
3) voiced stops /b, d, g/,
4) nasals /m, n, ŋ/.

Only one transient map has been adopted for the Finnish version, making the distinction between /k, p, t/ and the glottal stop. (/b/ and /g/ do not exist in original Finnish.)

D. Compensation for Coarticulation Effects Using the "Dynamically Expanding Context"

Because of coarticulation effects, i.e., transformation of the speech spectra due to neighboring phonemes, systematic errors appear in phonemic transcriptions. For instance, the Finnish word "hauki" (meaning pike) is almost invariably recognized as the phoneme string /haouki/ by our acoustic processor. It may then be suggested that if a transformation rule /aou/ to /au/ is introduced, this error will be corrected. It might also be imagined that it is possible to list and take into account all such variations. However, there may be hundreds of different frames or contexts of neighboring phonemes in which a particular phoneme may occur, and in many cases such empirical rules are contradictory; they are only statistically correct. The frames may
Fig. 10. An example of a phoneme map. Natural Finnish speech was processed by a model of the inner ear which performs its frequency analysis. The resulting signals were then connected to an artificial neural network, the cells of which are shown in this picture as circles. The cells were tuned automatically, without any supervision or extra information given, to the acoustic units of speech known as phonemes. The cells are labeled by the symbols of those phonemes to which they "learned" to give responses; most cells give a unique answer, whereas the double labels show which cells respond to two phonemes.
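The supervised fine tuning referred to in Sec. IV-B can be illustrated by a minimal LVQ1-style sketch: each labeled training spectrum attracts the nearest codebook vector when the classes agree and repels it otherwise. The array shapes and the constant learning rate are assumptions made for illustration; the text itself leaves the choice among LVQ1, LVQ2, and LVQ3 open.

```python
import numpy as np

def lvq1_epoch(codebook, codebook_labels, X, y, alpha=0.02):
    """One pass of LVQ1-style fine tuning over labeled training vectors (X, y)."""
    for x, cls in zip(X, y):
        c = int(np.argmin(np.linalg.norm(codebook - x, axis=1)))  # nearest codebook vector
        if codebook_labels[c] == cls:
            codebook[c] += alpha * (x - codebook[c])   # same class: pull toward the sample
        else:
            codebook[c] -= alpha * (x - codebook[c])   # different class: push away
    return codebook
```

In the speaker-adaptation scheme described above, the codebook and its labels would come from the calibrated phoneme map, and such passes would be repeated cyclically or in a random permutation over the roughly 2000 training spectra.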
then mainly reflects the metric relationships of the sets of associated encodings. But since the inputs for symbolic signals are also active all the time, memory traces of them are formed in the corresponding inputs of those cells in the map that have been selected (or actually enforced) by the context part. If then, during recognition of input information, the context signals are missing or are weaker, the (same) map units are selected solely on the basis of the symbol part. In this way the symbols become encoded into a spatial order reflecting their logical (or semantic) similarities.

In the following, I shall demonstrate this idea, which was originated by H. Ritter, using a simple language [84]. The simplest definition of the context of a word is to take all those words (together with their serial order) that occur in a certain "window" around the selected word. For simplicity, we shall imagine that the content of each "window" can somehow be presented to the x_c input ports of the neural system. We are not interested here in any particular means for the conversion of, say, temporal signal patterns into parallel ones (this task could be done using paths with different delays, eigenstates that depend on sequences, or any other mechanisms implementable in short-term memory).

The vocabulary used in this experiment is listed in Fig. 11(a) and comprises nouns, verbs, and adverbs. Each word class has further categorial subdivisions, such as names of persons, animals, and inanimate objects. To study semantic relationships in their purest form, it must be stipulated that the semantic meaning be not inferable from any patterns used for the encoding of the individual words, but only from the context in which the words occur (i.e., combinations of words). To this end each word was encoded by a random vector of unit length (here, seven-dimensional).

A sequence of randomly generated meaningful three-word sentences was used as the input data to the self-organizing process. Meaningful sentence patterns had therefore first to be constructed on the basis of word categories (Fig. 11(b)). Each explicit sentence was then constructed by randomly substituting the numbers in a randomly selected sentence pattern from Fig. 11(b) by words with compatible numbering in Fig. 11(a). A total of 498 different three-word sentences are possible, a few of which are exemplified in Fig. 11(c). These sentences were concatenated into a single continuous string, S.

Fig. 11. Outline of the vocabulary used in this experiment. (a) List of used words (nouns, verbs, and adverbs) with their row numbers: Bob/Jim/Mary 1; horse/dog/cat 2; beer/water 3; meat/bread 4; runs/walks 5; works/speaks 6; visits/phones 7; buys/sells 8; likes/hates 9; drinks/eats 10/11; much/little 12; fast/slowly 13; often/seldom 14; well/poorly 15. (b) Sentence patterns, given as triples of row numbers from (a), e.g., 1-5-12, 1-5-13, 1-5-14, 1-6-15, 1-9-1, 1-9-2, 1-10-12, 1-11-14, 2-5-14, 2-9-1, 2-10-3, 2-11-14. (c) Some examples of generated three-word sentences: Jim speaks well; Mary likes Jim; Jim eats often; Mary buys meat; dog drinks fast; horse hates meat; Jim eats seldom; Bob buys meat; cat walks slowly; Jim eats bread; cat hates Jim; Bob sells beer.

The context of a word in this string was restricted to the pair of words formed by its immediate predecessor and successor in S (ignoring any sentence borders; i.e., words from adjacent sentences in S are uncorrelated, and act like random noise in that field). The code vectors of the predecessor/successor pair forming the context to a word were concatenated into a single 14-dimensional code vector x_c. In this simple demonstration we thus only took into account the context provided by the immediately adjacent textual environment of each word occurrence. Even this restricted context already contains interesting semantic relationships.

In our computer experiments it turned out that instead of presenting each phrase separately to the algorithm, a much more efficient learning strategy is first to consider each word in its average context over a set of possible "windows." The (mean) context of a word was thus first defined as the average, over 10 000 sentences, of all code vectors of predecessor/successor pairs surrounding that particular word. The resulting thirty 14-dimensional "average word contexts," normalized to unit length, assumed the role of the "context fields" x_c in (14). Each "context field" was combined with a 7-dimensional "symbol field" x_s, consisting of the code vector for the word itself, but scaled to length a. The parameter a determines the relative influence of the symbol part x_s in comparison to the context part x_c, and was set to 0.2.

For the simulation, a planar, rectangular lattice of 10 by 15 cells was used. The initial weight vectors of the cells were chosen randomly, so that no initial order was present. Updating was based on (7) and (8). The learning step size was h_0 = 0.8, and the radius sigma(t) of the adjustment zone (cf. (8)) was gradually decreased from an initial value sigma_i = 4 to a final value sigma_f = 0.5 according to the law sigma(t) = sigma_i (sigma_f/sigma_i)^(t/t_max). Here t counts the number of adaptation steps.

After t_max = 2000 input presentations the responses of the neurons to presentation of the symbol parts alone were tested. In Fig. 12, the symbolic label is written at that site at which the symbol signal x = [x_s, 0]^T gave the maximum response. We clearly see that the contexts "channel" the word items to memory positions whose arrangement reflects both grammatical and semantic relationships. Words of the same type, i.e., nouns, verbs, and adverbs, are segregated into separate, large domains. Each of these domains is further organized according to similarities on the semantic level. Adverbs with opposite meaning tend to be close to each other, because sentences differing in one word only are regarded as semantically correlated, and the words that are different then usually have the opposite meaning. The groupings of the verbs correspond to differences in the ways they can co-occur with adverbs, persons, animals, and nonanimate objects such as, e.g., food.

It could be argued that the structures resulting in the map were artificially created by a preplanned choice of the sentence patterns allowed as input. This is not the case, however, since it is easy to check that the categorial sentence patterns in Fig. 11(b) almost completely exhaust the possibilities for forming semantically meaningful three-word sentences.
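The construction of the word-plus-context inputs described in this section can be summarized in a short sketch. It assumes the sentences have already been generated from the patterns of Fig. 11(b); the zero vectors used at the two ends of the concatenated string and the helper names are illustrative simplifications:

```python
import numpy as np

rng = np.random.default_rng(1)
WORDS = ["Bob", "Jim", "Mary", "horse", "dog", "cat", "beer", "water",
         "meat", "bread", "runs", "walks", "works", "speaks", "visits",
         "phones", "buys", "sells", "likes", "hates", "drinks", "eats",
         "much", "little", "fast", "slowly", "often", "seldom", "well", "poorly"]

def unit(v):
    return v / np.linalg.norm(v)

# 7-dimensional random code vectors of unit length, one per word
CODE = {w: unit(rng.standard_normal(7)) for w in WORDS}

def word_inputs(sentences, a=0.2):
    """Average 14-D predecessor/successor context per word, concatenated with
    the word's own 7-D code scaled to length a (the symbol part x_s)."""
    ctx = {w: [] for w in WORDS}
    flat = [w for s in sentences for w in s]       # the concatenated string S
    for k, w in enumerate(flat):
        pred = CODE[flat[k - 1]] if k > 0 else np.zeros(7)
        succ = CODE[flat[k + 1]] if k < len(flat) - 1 else np.zeros(7)
        ctx[w].append(np.concatenate([pred, succ]))
    # 21-D inputs x = [a * x_s, x_c], with the mean context normalized to unit length
    return {w: np.concatenate([a * CODE[w], unit(np.mean(cs, axis=0))])
            for w, cs in ctx.items() if cs}
```

A 10 by 15 map trained on these 21-dimensional vectors with the update of (7) and (8) can then be labeled as in Fig. 12 by finding, for each word, the cell that responds best to the symbol part alone, i.e., to x = [x_s, 0]^T.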
Fig. 12. "Semantic map" obtained on a network of 10 x 15 cells after 2000 presentations of word-context pairs derived from 10 000 random sentences of the kind shown in Fig. 11(c). Nouns, verbs, and adverbs are segregated into different domains. Within each domain a further grouping according to aspects of meaning is discernible.

VI. SURVEY OF PRACTICAL APPLICATIONS OF THE MAP

In addition to numerous more abstract simulations, theoretical developments, and "toy examples," the following practical problem areas have been suggested for the Self-Organizing Map or the LVQ algorithms. In some of them concrete work is already in progress:

statistical pattern recognition, especially recognition of speech [42], [48];
control of robot arms, and other problems in robotics [63], [85], [87], [88];
control of industrial processes, especially diffusion processes in the production of semiconductor substrates [62], [102];
automatic synthesis of digital systems [23];
adaptive devices for various telecommunications tasks [3], [4], [47];
image compression [71];
radar classification of sea-ice [76];
optimization problems [2];
sentence understanding [95];
application of expertise in conceptual domains [96]; and even
classification of insect courtship songs [73].

Of these, the application to speech recognition has the longest tradition in demonstrating the power of the map method when dealing with difficult stochastic signals. My personal expectations are, however, that the greatest industrial potential of this method may lie in process control and telecommunications.

On the other hand, it is a little surprising that so few applications of the maps to computer vision are being studied. This does not mean that the problems of vision are not important. It is rather that automatic analysis and extraction of visual features, without a heuristic or analytical approach, has turned out to be an extremely difficult problem. Biological and artificial vision probably require very complicated hierarchical systems using many stages (e.g., several different maps) [55], [56]. One unclarified problem is how

VII. DISCUSSION

It was stated in Secs. I and III that it is not advisable to use the Self-Organizing Map alone for classification problems, because decision accuracy can be significantly increased if fine tuning such as LVQ is used. Another important notion, which not only concerns the maps but most of the other neural network models as well, is that it would often be absurd to use primary signal elements, such as temporal samples of a speech waveform or pixels of an image, for the components of x directly. This is especially true if the input patterns are fine-structured, like line drawings. It is not possible to achieve any invariances in perception unless the primary information is first transformed, using, e.g., various convolutions with, say, Gabor functions [13], or other, possibly nonlinear functionals of the image field [81], as components of x. Which particular choice of functionals should be used for preprocessing in a particular task is a very difficult and delicate question, and cannot be discussed here.

One question concerns the maximum capacity achievable in the maps. Is it possible to increase their size, to eventually use them for data storage in large knowledge data bases? It can at least be stated that the brain's maps are not particularly extensive; they mainly seem to provide for efficient encoding of a particular subset of signals to enhance the operation and capacity of associative memory [44]. If more extensive systems are required, it might be more efficient to develop hierarchical structures of abstract representations.

The hardware used for the maps has so far only consisted of co-processor boards (cf., e.g., [42], [48]). If the simple algorithm is to be directly built into special hardware, one of its essential operations will be the global extremum selector, for which conventional parallel computing hardware is available [39]. Analog "winner-take-all" circuits can also be used [20], [21], [52]. Another question is whether the learning operations ought to be performed on the board, or whether fixed values for weights could be loaded into the cells. Note that in the latter case the function of the cells can be very simple, like that of the conventional formal neurons. One beneficial property of the maps is that their parameters usually stabilize into a narrow dynamic range, and the accuracy requirements are then modest. In this case even integer arithmetic operations can provide sufficient accuracy.

What is the most significant difference between the Self-Organizing Map and other contemporary neural-model approaches? Most of the latter strongly emphasize the aspect of distributed processing, and only consider spatial organization of the processing units as a secondary aspect. The map principle, on the other hand, is in some ways complementary to this idea. The intrinsic potential of this particular self-organizing process for creating a localized, structured arrangement of representations in the basic network module is emphasized.

Actually, we should not talk of the localization of a "function": it is only the response that is localized. I am
thus not opposed to the view that neural networks are distributed systems. The massive interconnects that underlie all neural processing are certainly spread over the network; their effects, on the other hand, may be "focused" on local sites.

It seems inevitable, however, that any complex processing task requires organization of information into separate parts. Distributed processing models in general underrate this issue. Consequently, many models that process features of input data without structuring exhibit slow convergence and poor generalization ability, usually ensuing from ignorance of the localization of the adaptive processes.

On the lower perceptual levels, localization of responses in topographically organized maps was demonstrated long ago, and it is known that such maps need not be prespecified in detail, but can instead organize themselves on the basis of the statistics of the incoming signals. Such maps have already been applied with success in many complex pattern recognition and robot control tasks.

On the higher levels of representation, relationships between items seem to be based on more subtle roles in their occurrence, and are less apparent from their immediate intrinsic properties. Nonetheless it has also been shown recently that even with a simple modeling assumption of semantic roles, topological self-organization of semantic data will take place. To describe the role of an item, it is sufficient that the input data are presented together with a sufficient amount of context. This then controls the adaptation process.

In the practical application that we have studied most carefully, viz. speech recognition, a statistical accuracy of phonemic recognition has been achieved that is clearly equal to or better than the results produced by more conventional methods, even when the latter are based on analysis of signal dynamics [28], [66].

It should be emphasized that the map method is not restricted to the use of any particular form of preprocessing, such as amplitude spectra in speech recognition, or even to phonemes as basic phonological units. For instance, analogous maps may be formed for diphones, syllables, or demisyllables, and other spectral representations such as linear prediction coding (LPC) coefficients or cepstra may be used as the input information to the maps.

Although the basic one-level map, as demonstrated in Sec. II-E, has already been shown to be capable of creating hierarchical (ultrametric) representations of structured data distributions, it might be expected that the real potential of the map lies in a genuine hierarchical or otherwise structured system that consists of several interconnected map modules. In a more natural system, such modules might also correspond to contiguous areas in a single large sheet, where each area receives a different kind of external input, as in the different areas in the cortex. In that case, the borders between the modules might be diffuse. The problem of hierarchical maps, however, has turned out to be very difficult. One of the particular difficulties arises if the inputs to a cell come from very different sources; it then seems inevitable that for the comparison of input patterns, an asymmetrical distance function, in which the signal components are provided with adaptive tensorial weights, must be applied [31]. Another aspect concerns the interfaces of modules in a hierarchical map system: the signals merging from different modules may have to be combined nonlinearly [82]. On the other hand, it has already been demonstrated that the map, or the LVQ algorithms, can be used as a preprocessing stage for other models [26], [69], [101]. In the Counterpropagation Network of Hecht-Nielsen [22], competitive learning is neatly integrated into a hierarchical system as a special layer. Combinations of maps have also been studied in [74], [112], and [113].

It should be noted that slightly different self-organization ideas have recently been suggested [5], [10], [16]. They, however, fall outside the scope of this article.

One of the strongest original motives for starting the development of (artificial) neural networks was their use as learning systems that might effectively be able to utilize the vast capacities of active circuits that can be manufactured using semiconductor or optical technologies. It is therefore a little surprising that most of the theoretical research on and simulations of neural networks have been restricted to relatively small networks containing only a few tens to a few thousands of nodes (let alone parallel networks for preprocessing images). The main problem with most circuits seems to be slow convergence of learning, which again indicates that the best learning mechanisms are yet to be found.

REFERENCES

[1] S.-I. Amari, "Topographic organization of nerve fields," Bull. Math. Biology, vol. 42, pp. 339-364, 1980.
[2] B. Angeniol, C. de la Croix Vaubois, and J.-Y. Le Texier, "Self-organizing feature maps and the travelling salesman problem," Neural Networks, vol. 1, pp. 289-293, 1988.
[3] N. Ansari and Y. Chen, "Dynamic digital satellite communication network management by self-organization," Proc. Int. Joint Conf. on Neural Networks, IJCNN-90-WASH-DC (Washington, DC, 1990), pp. II-567-II-570.
[4] D. S. Bradburn, "Reducing transmission error effects using a self-organizing network," Proc. Int. Joint Conf. on Neural Networks, IJCNN 89 (Washington, DC, 1989), pp. II-531-537.
[5] D. J. Burr, "An improved elastic net method for the traveling salesman problem," Proc. IEEE Int. Conf. on Neural Networks, ICNN-88 (San Diego, CA, 1988), pp. I-69-I-76.
[6] A. Caramazza, "Some aspects of language processing revealed through the analysis of acquired aphasia: The lexical system," Ann. Rev. Neurosci., vol. 11, pp. 395-421, 1988.
[7] M. Cottrell and J.-C. Fort, "A stochastic model of retinotopy: A self-organizing process," Biol. Cybern., vol. 53, pp. 405-411, 1986.
[8] -, "Etude d'un processus d'auto-organisation," Ann. Inst. Henri Poincare, vol. 23, pp. 1-20, 1987.
[9] A. R. Damasio, H. Damasio, and G. W. Van Hoesen, "Prosopagnosia: Anatomic basis and behavioral mechanisms," Neurology, vol. 24, pp. 89-93, 1975.
[10] R. Durbin and D. Willshaw, "An analogue approach to the travelling salesman problem using an elastic net method," Nature, vol. 326, pp. 689-691, 1987.
[11] D. Essen, "Functional organization of primate visual cortex," in Cerebral Cortex, vol. 3, A. Peters and E. G. Jones, Eds. New York: Plenum Press, 1985, pp. 259-329.
[12] J. Fuller and A. Farsaie, "Invariant target recognition using feature extraction," Proc. Int. Joint Conf. on Neural Networks, IJCNN-90-WASH-DC (Washington, DC, 1990), pp. II-595-II-598.
[13] D. Gabor, "Theory of communication," J. IEE, vol. 93, pp. 429-459, 1946.
[14] A. Gersho, "On the structure of vector quantizers," IEEE Trans. Inform. Theory, vol. IT-25, no. 4, pp. 373-380, July 1979.
[15] H. Goodglass, A. Wingfield, M. R. Hyde, and J. C. Theurkauf, "Category specific dissociations in naming and recognition by aphasic patients," Cortex, vol. 22, pp. 87-102, 1986.
[63] T. Martinetz, H. J. Ritter, and K. J. Schulten, "Three-dimensional neural net for learning visuomotor coordination of a robot arm," IEEE Trans. Neural Networks, vol. 1, pp. 131-136, 1990.
[64] J. Max, "Quantizing for minimum distortion," IRE Trans. Inform. Theory, vol. IT-6, no. 2, pp. 7-12, Mar. 1960.
[65] R. A. McCarthy and E. K. Warrington, "Evidence for modality specific meaning systems in the brain," Nature, vol. 334, pp. 428-430, 1988.
[66] E. McDermott and S. Katagiri, "Shift-invariant, multi-category phoneme recognition using Kohonen's LVQ2," Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 89 (Glasgow, Scotland), pp. 81-84.
[67] P. McKenna and E. K. Warrington, "Category-specific naming preservation: A single case study," J. Neurol. Neurosurg. Psychiatry, vol. 41, pp. 571-574, 1978.
[68] R. Miikkulainen and M. G. Dyer, "Forming global representations with extended backpropagation," Proc. IEEE Int. Conf. on Neural Networks, ICNN 88 (San Diego, CA, 1988), pp. 285-292.
[69] P. Morasso, "Neural models of cursive script handwriting," Proc. Int. Joint Conf. on Neural Networks, IJCNN 89 (Washington, DC, 1989), pp. II-539-II-542.
[70] J. T. Murphy, H. C. Kwan, W. A. MacKay, and Y. C. Wong, "Spatial organization of precentral cortex in awake primates. III. Input-output coupling," J. Neurophysiol., vol. 41, pp. 1132-1139, 1977.
[71] N. M. Nasrabadi and Y. Feng, "Vector quantization of images based upon the Kohonen Self-Organizing Feature Maps," Proc. IEEE Int. Conf. on Neural Networks, ICNN-88 (San Diego, CA, 1988), pp. I-101-I-108.
[72] M. M. Nass and L. N. Cooper, "A theory for the development of feature detecting cells in visual cortex," Biol. Cybern., vol. 19, pp. 1-18, 1975.
[73] E. K. Neumann, D. A. Wheeler, J. W. Burnside, A. S. Bernstein, and J. C. Hall, "A technique for the classification and analysis of insect courtship song," Proc. Int. Joint Conf. on Neural Networks, IJCNN-90-WASH-DC (Washington, DC, 1990), pp. II-257-262.
[74] Y. Nishikawa, H. Kita, and A. Kawamura, "A neural network which divides and learns environments," Proc. Int. Joint Conf. on Neural Networks, IJCNN-90-WASH-DC (Washington, DC, 1990), pp. I-684-I-687.
[75] G. A. Ojemann, "Brain organization for language from the perspective of electrical stimulation mapping," Behav. Brain Sci., vol. 2, pp. 189-230, 1983.
[76] J. Orlando, R. Mann, and S. Haykin, "Radar classification of sea-ice using traditional and neural classifiers," Proc. Int. Joint Conf. on Neural Networks, IJCNN-90-WASH-DC (Washington, DC, 1990), pp. II-263-II-266.
[77] K. J. Overton and M. A. Arbib, "The branch arrow model of the formation of retinotectal connections," Biol. Cybern., vol. 45, pp. 157-175, 1982.
[78] K. J. Pearson, L. H. Finkel, and G. M. Edelman, "Plasticity in the organization of adult cerebral maps: a computer simulation based on neuronal group selection," J. Neurosci., vol. 12, pp. 4209-4223, 1987.
[79] R. Perez, L. Glass, and R. J. Shlaer, "Development of specificity in cat visual cortex," J. Math. Biol., vol. 1, pp. 275-288, 1975.
[80] S. E. Petersen, P. T. Fox, M. I. Posner, M. Mintun, and M. E. Raichle, "Positron emission tomographic studies of the cortical anatomy of single-word processing," Nature, vol. 331, pp. 585-589, 1988.
[81] M. Porat and Y. Y. Zeevi, "The generalized Gabor scheme of image representation in biological and machine vision," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-10, pp. 452-468, 1988.
[82] H. Ritter, "Combining self-organizing maps," Proc. Int. Joint Conf. on Neural Networks, IJCNN 89 (Washington, DC, 1989), pp. II-499-II-502.
[83] -, "Asymptotic level density for a class of vector quantization processes," Helsinki University of Technology, Lab. of Computer and Information Science, Report A9, 1989.
[84] H. Ritter and T. Kohonen, "Self-organizing semantic maps," Biol. Cybern., vol. 61, pp. 241-254, 1989.
[85] H. J. Ritter, T. M. Martinetz, and K. J. Schulten, "Topology conserving maps for learning visuo-motor-coordination," Neural Networks, vol. 2, pp. 159-168, 1989.
[86] H. Ritter and K. Schulten, "On the stationary state of Kohonen's self-organizing sensory mapping," Biol. Cybern., vol. 54, pp. 99-106, 1986.
[87] -, "Topology conserving mappings for learning motor tasks," Proc. Neural Networks for Computing, AIP Conference (Snowbird, Utah, 1986), pp. 376-380.
[88] -, "Extending Kohonen's self-organizing mapping algorithm to learn ballistic movements," NATO ASI Series, vol. F41, pp. 393-406, 1988.
[89] -, "Kohonen's self-organizing maps: exploring their computational capabilities," Proc. IEEE Int. Conf. on Neural Networks, ICNN 88 (San Diego, CA, 1988), pp. I-109-I-116.
[90] -, "Convergency properties of Kohonen's topology conserving maps: Fluctuations, stability and dimension selection," Biol. Cybern., vol. 60, pp. 59-71, 1989.
[91] R. A. Reale and T. J. Imig, "Tonotopic organization in auditory cortex of the cat," J. Comp. Neurol., vol. 192, pp. 265-291, 1980.
[92] H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Statist., vol. 22, pp. 400-407, 1951.
[93] E. T. Rolls, "Neurons in the cortex of the temporal lobe and in the amygdala of the monkey with responses selective for faces," Hum. Neurobiol., vol. 3, pp. 209-222, 1984.
[94] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1: Foundations, D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Eds. Cambridge, MA: MIT Press, 1986, pp. 318-362.
[95] J. K. Samarabandu and O. E. Jakubowicz, "Principles of sequential feature maps in multi-level problems," Proc. Int. Joint Conf. on Neural Networks, IJCNN-90-WASH-DC (Washington, DC, 1990), pp. II-683-II-686.
[96] P. G. Schyns, "Expertise acquisition through concepts refinement in a self-organizing architecture," Proc. Int. Joint Conf. on Neural Networks, IJCNN-90-WASH-DC (Washington, DC, 1990), pp. I-236-I-239.
[97] D. L. Sparks and J. S. Nelson, "Sensory and motor maps in the mammalian superior colliculus," TINS, vol. 10, pp. 312-317, 1987.
[98] N. Suga and W. E. O'Neill, "Neural axis representing target range in the auditory cortex of the mustache bat," Science, vol. 206, pp. 351-353, 1979.
[99] N. V. Swindale, "A model for the formation of ocular dominance stripes," Proc. R. Soc. B, vol. 208, pp. 243-264, 1980.
[100] A. Takeuchi and S. Amari, "Formation of topographic maps and columnar microstructures," Biol. Cybern., vol. 35, pp. 63-72, 1979.
[101] T. Tanaka, M. Naka, and K. Yoshida, "Improved back-propagation combined with LVQ," Proc. Int. Joint Conf. on Neural Networks, IJCNN-90-WASH-DC (Washington, DC, 1990), pp. I-731-734.
[102] V. Tryba, K. M. Marks, U. Ruckert, and K. Goser, "Selbstorganisierende Karten als lernende klassifizierende Speicher," ITG Fachbericht, vol. 102, pp. 407-419, 1988.
[103] A. R. Tunturi, "Physiological determination of the arrangement of the afferent connections to the middle ectosylvian auditory area in the dog," Am. J. Physiol., vol. 162, pp. 489-502, 1950.
[104] -, "The auditory cortex of the dog," Am. J. Physiol., vol. 168, pp. 712-717, 1952.
[105] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. J. Lang, "Phoneme recognition using time-delay neural networks," IEEE Trans. Acoust. Speech and Signal Processing, vol. ASSP-37, pp. 328-339, 1989.
[106] E. K. Warrington, "The selective impairment of semantic memory," Q. J. Exp. Psychol., vol. 27, pp. 635-657, 1975.
[107] E. K. Warrington and R. A. McCarthy, "Category specific access dysphasia," Brain, vol. 106, pp. 859-878, 1983.
[108] -, "Categories of knowledge," Brain, vol. 110, pp. 1273-1296, 1987.
[109] E. K. Warrington and T. Shallice, "Category-specific impairments," Brain, vol. 107, pp. 829-854, 1984.