Chapter 12
Modular Networks

12.1 Introduction
The hierarchical levels of organization in artificial neural networks may be classified as follows. At the most fundamental level are synapses, followed by neurons, then layers of neurons in the case of a layered network, and finally the network itself. The design of neural networks that we have pursued up to this point has been of a modular nature at the level of neurons or layers only. It may be argued that the architecture of a neural network should go one step higher in the hierarchical level of organization. Specifically, it should consist of a multiplicity of networks, and learning algorithms should be designed to take full advantage of the resulting modular structure. The present chapter is devoted to a particular class of modular networks that relies on the combined use of supervised and unsupervised learning paradigms.

We may justify the rationale for the use of modular networks by considering the approximation problem. The approximation of a prescribed input-output mapping may be realized using a local method that captures the underlying local structure of the mapping. Such a model is exemplified by radial-basis function (RBF) networks, which were studied in Chapter 7. The use of a local method offers the advantage of fast learning and therefore the ability to operate in real time, since it usually requires relatively few training examples to learn a single task. However, a limitation of local methods is that they tend to be memory intensive. Alternatively, the approximation may be realized using a global method that captures the underlying global structure of the mapping. This second model is exemplified by back-propagation learning applied to multilayer perceptrons, which were studied in Chapter 6. The use of global methods offers the advantages of a smaller storage requirement and better generalization performance. However, they suffer from a slow learning process that limits their range of applications. In light of this dichotomy between local and global methods of approximation, it is natural to ask: How can we combine the advantages of these two methods? The answer appears to lie in the use of a modular architecture that captures the underlying structure of an input-output mapping at an intermediate level of granularity. The idea of using a modular network for realizing a complex mapping function was discussed by Hinton and Jacobs as far back as the mid-1980s (Jacobs et al., 1991a). Mention should also be made of a committee machine consisting of a layer of elementary perceptrons followed by a vote-taking perceptron in the second layer, which was described in Nilsson (1965). However, it appears that the class of modular networks discussed in this chapter was first described in Jacobs and Jordan (1991), and the architecture for it was presented by Jacobs et al. (1991a).
A useful feature of a modular approach is that it also provides a better fit to a discontinuous input-output mapping. Consider, for example, Fig. 12.1, which depicts a one-dimensional function g(x) with a discontinuity, as described by

    g(x) = x,    x > 0
         = -x,   x ≤ 0          (12.1)

FIGURE 12.1 A discontinuous (piecewise-linear) function and its approximation.

If we were to use a single fully connected network to approximate this function, the approximation may exhibit erratic behavior near the discontinuity, as illustrated by the dashed curve in Fig. 12.1. In a situation of this kind, it would be preferable to split the function into two separate pieces, and use a modular network to learn each piece separately (Jacobs et al., 1991b).
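To make the splitting idea concrete, here is a small numerical sketch (assuming NumPy; the choice of linear modules and of x = 0 as the split point are illustrative assumptions, not part of the text): fitting a separate linear module to each piece of Eq. (12.1) recovers the function essentially exactly, since each piece is itself linear.

```python
import numpy as np

def g(x):
    # Piecewise-linear target of Eq. (12.1): x for x > 0, -x for x <= 0.
    return np.where(x > 0, x, -x)

x = np.linspace(-1.0, 1.0, 201)
y = g(x)

# Split the input space at the kink and fit one linear "module" per piece.
left, right = x <= 0, x > 0
m_left = np.polyfit(x[left], y[left], 1)    # degree-1 fit on x <= 0
m_right = np.polyfit(x[right], y[right], 1) # degree-1 fit on x > 0

# Combined modular approximation: use whichever module owns the region.
y_hat = np.where(x > 0, np.polyval(m_right, x), np.polyval(m_left, x))
print(float(np.max(np.abs(y_hat - y))))  # worst-case error is ~0
```

A single smooth approximator fit to all of the data would have to compromise near x = 0, which is the erratic behavior the text attributes to the fully connected network.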
The use of a modular approach may also be justified on neurobiological grounds. Modularity appears to be an important principle in the architecture of vertebrate nervous systems, and there is much that can be gained from the study of learning in modular networks in different parts of the nervous system (Houk, 1992). For example, the existence of hierarchical representations of information is particularly evident in the cortical visual areas (Van Essen et al., 1992; Van Essen, 1985; Fodor, 1983). The highly complex computation performed by the visual system is broken down into pieces, just like any good engineer would do when designing a complex system, as evidenced by the following (Van Essen et al., 1992):

1. Separate modules are created in the visual system for different subtasks, allowing the neural architecture to be optimized for particular types of computation.
2. The same module is replicated many times, as exemplified by the internal structure of area V1 of the visual cortex.
3. Coordinated and efficient routing of information between modules is maintained.

Modularity may therefore be viewed as an additional variable, which would permit the formation of higher-order computational units that can perform complex tasks. Referring back to the hierarchical levels of organization in the brain as described in Chapter 1, and recognizing the highly complex nature of the computational vision and motor control tasks that a human being can perform so efficiently and effortlessly, it is apparent
that modularity as a computational technique is the key to understanding complex tasks performed by artificial neural networks. Unfortunately, the scope of our knowledge of this important subject is rather limited at present. This chapter on a very special kind of modular network should therefore be viewed as a good beginning.
Organization of the Chapter

The main body of this chapter on modular networks is organized as follows. In Section 12.2 we formally define what we mean by a modular network, and discuss the implications of modularity. In Section 12.3 we describe an associative Gaussian mixture model for a specific configuration of modular networks. This is followed by the derivation of a stochastic-gradient learning algorithm (based on maximization of a log-likelihood function) for the network, which we do in Section 12.4. In Section 12.5 we extend the concept of modularity by developing a hierarchical structure of modular representations. In Section 12.6 we describe an application of modular networks in control. The chapter concludes in Section 12.7 with a summary of the properties of modular networks and some final thoughts on the subject.
12.2 Basic Notions of Modularity
A modular network is formally defined as follows.¹

A neural network is said to be modular if the computation performed by the network can be decomposed into two or more modules (subsystems) that operate on distinct inputs without communicating with each other. The outputs of the modules are mediated by an integrating unit that is not permitted to feed information back to the modules. In particular, the integrating unit both (1) decides how the outputs of the modules should be combined to form the final output of the system, and (2) decides which modules should learn which training patterns.
Modularity may therefore be viewed as a manifestation of the principle of divide and conquer, which permits us to solve a complex computational task by dividing it into simpler subtasks and then combining their individual solutions.
A modular network fuses supervised and unsupervised learning paradigms in a seamless fashion. Specifically, we have the following.

Supervised learning, exemplified by an external “teacher” that supplies the desired responses (target patterns) needed to train the different modules of the network. However, the teacher does not specify which module of the network should produce each desired response; rather, it is the function of the unsupervised learning paradigm to do this assignment.

Unsupervised learning, exemplified by the modules “competing” with each other for the right to produce each desired response. The integrating unit has the role of “mediating” among the different modules for this right.

Consequently, the modules of the network tend to specialize by learning different regions of the input space. However, the form of competitive learning described here does not necessarily enforce the specialization; rather, it arises naturally. In the competition, roughly speaking, the winner is the module whose output most closely matches the desired response.
¹ This definition is adapted from Osherson et al. (1990); Jacobs and Jordan (1991); and Jacobs et al. (1991a).