Oxford University Press and The British Society for the Philosophy of Science are collaborating with JSTOR to
digitize, preserve and extend access to The British Journal for the Philosophy of Science.
ABSTRACT
A competence model describes the abstract structure of a solution to some problem,
or class of problems, facing the would-be intelligent system. Competence models
can be quite detailed, specifying far more than merely the function to be computed.
But for all that, they are pitched at some level of abstraction from the details of any
particular algorithm or processing strategy which may be said to realize the
competence. Indeed, it is the point and virtue of such models to specify some
equivalence class of algorithms/processing strategies so that the common properties
highlighted by the chosen class may feature in psychologically interesting
accounts. A question arises concerning the type of relation a theorist might expect
to hold between such a competence model and a psychologically real processing
strategy. Classical work in cognitive science expects the actual processing to
depend on explicit or tacit knowledge of the competence theory. Connectionist
work, for reasons to be explained, represents a departure from this norm. But the
precise way in which a connectionist approach may disturb the satisfying classical
symmetry of competence and processing has yet to be properly specified. A
standard 'Newtonian' connectionist account, due to Paul Smolensky, is discussed
and contrasted with a somewhat different 'rogue' account. A standard
connectionist understanding has it that a classical competence theory describes an
idealized subset of a network's behaviour. But the network's behaviour is not to be
explained by its embodying explicit or tacit knowledge of the information laid out in
the competence theory. A rogue model, by contrast, posits either two systems, or
two aspects of a single system, such that one system does indeed embody the
knowledge laid out in the competence theory.
1 Scene setting
2 Levels of explanation and the idea of an equivalence class
3 The classical cascade
4 Newtonian competence
5 Rogue competence
6 The methodology of connectionist explanation
7 Conclusions: the cascade, the dam and the divided stream
1 SCENE SETTING
In the old days, we all knew what it meant to describe the mind as a syntactic
engine. A syntactic engine was a physical system cleverly designed so that the
way some of its physical states gave way to other physical states was always in
step with the way that good inferences proceeded in some particular domain.
For example, some states might be used to stand for a category such as dog, and
the physical system set up so that those states reliably gave way to others
which could be interpreted as standing for sub- and super-ordinate categories
(such as 'Fido' and 'Mammal').

We understood that such an effect (the mirroring of semantic regularities in
syntactic systems) was made possible by the system's being geared to
manipulate symbols according to rules. Symbols were recurrent physical states
which we could interpret (e.g. as standing for dog) and the system could either
embody the rules explicitly or implicitly (see the text).
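The rule-based picture can be caricatured in a few lines of code. The ISA table and rewriting function below are invented purely for illustration, not drawn from any particular system; the point is just that recurrent symbol tokens are transformed by an explicitly embodied rule, so that physical state transitions track good inference:

```python
# Toy 'syntactic engine': recurrent symbolic states manipulated by an
# explicit rule, so that state transitions mirror sound inference.
# The ISA table is an invented example, not any particular system.
ISA = {"Fido": "dog", "dog": "mammal", "mammal": "animal"}

def superordinate(symbol):
    """Explicit rule: rewrite a category symbol as its superordinate."""
    return ISA.get(symbol)

print(superordinate("dog"))  # -> mammal
```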
Connectionist systems (see Rumelhart and McClelland [1986], Smolensky
[1988], Clark [1989]) appear to offer a somewhat different way of ensuring
that a physical system is semantically well-behaved. In (highly distributed
Smolensky-style) connectionist models, there are often no neat recurrent
physical states which code for the real world entities which the system is
dealing with. Instead of being a syntactic engine in which semantic good
behaviour is ensured by having the system directly implement symbolic
descriptions of the objects and processes which its inferences concern, the
(Smolensky-style) connectionist opts for a statistical engine operating on
computational objects which do not neatly stand for the objects and processes
in the domain. (These objects are often called 'subsymbols'.) Nonetheless, in a
central class of cases, the system behaves as if it were a symbolic/syntactic
engine. (For a particularly clear account of this proposal, see Smolensky
[1987] pp. 137-49.)

In what follows I explore some implications of this novel way of being
semantically well-behaved. In particular, I ask how well a standard model of
explanation in cognitive science (Marr's 3-level model) describes the connec-
tionist's procedure and theory, and whether a failure to fit such a model implies
a lack of explanatory power. I begin, then, with some general comments on
explanation.
[Figure labels: 'Mendelian genetics'; 'DNA'.]
A competence theory, then, leads a double life. It both specifies the function to
be computed and it specifies the body of knowledge or information which is
used by some class of algorithms. In classical cognitive science, these two roles
can easily be simultaneously discharged. For the competence theory is just an
articulated set of rules and principles defined over symbolic data-structures.
Since classical cognitive science relies on symbol processing architecture, it is
natural (at level 2) to represent directly the data structures (e.g. structural
descriptions of sentences) and then carry out the processing by the explicit or
tacit representation of the rules and principles defined (in the competence
theory) to operate on those structures. Thus, given a structural description of
an inflected verb as comprising a stem plus an ending, the classicist can go on
to define a level 2 computational process to take the stem and add -ed to form
the past tense (or whatever). The classicist, then, is (by virtue of using a symbol
processing architecture to implement level 2 algorithms) uniquely well placed
to preserve a very close relation between a competence theory and its level 2
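The stem-plus-ending example can be given a minimal sketch. The (stem, ending) pair used below is a deliberately crude structural description, assumed only for illustration:

```python
# A sketch of a level 2 symbolic process defined over structural
# descriptions: take the stem of an inflected verb and add '-ed'
# to form the past tense. The (stem, ending) representation is a
# toy assumption made for this example.

def past_tense(structural_description):
    stem, _ending = structural_description
    return stem + "ed"

print(past_tense(("walk", None)))  # -> walked
```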
4 NEWTONIAN COMPETENCE
5 ROGUE COMPETENCE
On the Newtonian connectionist model, then, the competence theory
functions as a descriptively adequate guide to the output in a somewhat
idealized range of cases. This, however, is not the only understanding of
competence theories available to a connectionist. And indeed, it is not the
understanding implicit in some other connectionist treatments of high level
problem solving. In this Section I look at a class of alternative treatments
which I shall call rogue models of competence.
The basic difference between Newtonian and rogue models is simply this. In
a Newtonian model, the connectionist network is itself capable, under idealized
conditions, of behaving in all the ways specified by the competence theory. In a
rogue model, by contrast, the basic connectionist network does not itself have
the capacity (even under idealizations of processing time and well-posed
problems) to produce the full range of results required by (i.e. derivable in) the
competence theory. Instead, it will be claimed that insofar as human beings
actually exhibit the full scale classical competence they do so only by deploying
7984 × 5431
We may, they go on to say, even learn to do this in our head by representing the
external symbols to ourselves in some manner. But it is still an essentially
'external' symbolic medium which we are manipulating, and it still constitutes
a resource built on top of the basic connectionist pattern-matching capacity
which we deploy. (Daniel Dennett has recently been saying very similar
things about the cases where sentences seem to run through our heads. In these
cases, we do indeed do classical symbol processing. But such processing may
constitute an extra resource, not implicated in all our daily, non-linguistic
reasoning-see Dennett [1987], pp. 233, 114-15; also Clark [1988].)

The account of complex multiplication is of course highly problematic since
the whole thing seems to involve knowing symbolic rules governing the serial
deployment of the pattern-matching capacities! But we have seen already that
much apparently symbol-reliant behaviour may be sub-symbolically produced.
(But see Clark [1989] for a detailed discussion.) And at any rate, I use the
example merely as a gesture at the kind of account which would constitute a
rogue model.
geometry. The final theory included specifications of fifteen states of liquid and
74 numbered rules or axioms written out in predicate calculus. This amounts
to a detailed competence specification which might eventually be given full
level 2 algorithmic form. Indeed, Hayes ([1985], p. 3) is quite explicit about the
high level of the investigative project, insisting that it is a mistake to seek a
working program too soon. The explanatory strategy of naive physics is thus a
paradigm example of the official classical methodology recommended by
Newell and Simon. First, seek a high level competence theory involving
symbolic representations and a set of state transition rules. Then write level 2
algorithms implementing the competence theory, secure in the knowledge
that we have a precise higher level understanding of the requirements which
the algorithms meet and hence a real grasp of why they are capable of carrying
out the task in question. It is this security which the connectionist lacks, since
she does not (cannot) proceed by formulating a detailed classical competence
theory and then neatly implementing it on a classical symbol processing
architecture.
Hence the problem: how should the connectionist proceed, and what
constitutes the higher level understanding of the processing which we need in
order to claim to have really explained how a task is performed? What is needed,
it seems, is some kind of connectionist analogue to the classical competence
theoretic level of explanation.
I believe that such an analogue exists. But it remains invisible until we
perform a kind of Copernican revolution in our picture of explanation in
Cognitive Science. For the connectionist effectively inverts the usual temporal
and methodological order of explanation, much as Copernicus inverted the
usual astronomical model of the day by having the earth revolve around the
sun instead of the other way round. Likewise, in connectionist theorizing,
the high level understanding will be made to revolve around a working
program which has learnt how to negotiate some cognitive terrain. This
inverts the official Marr-style ordering in which the high level understanding
(i.e. competence theory) comes first and closely guides the search for
algorithms. To make this clear, and to see how the connectionist's high level
theory will depart from the form of a classical competence theory, I propose to
take a look at Sejnowski's NETtalk project.
NETtalk is a large, distributed connectionist model which aims to investigate
part of the process of turning written input (i.e. words) into phonemic output
(i.e. sounds or speech). The network architecture comprises a set of input units
which are stimulated by seven letters of text at a time, a set of hidden units, and
a set of output units which code for phonemes. The output is fed into a voice
synthesizer which produces the actual speech sounds.
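That architecture can be sketched schematically as below. The layer sizes, letter alphabet, and one-hot window encoding are illustrative assumptions rather than the published NETtalk figures, and the random weights stand in for a trained network:

```python
import numpy as np

LETTERS = "abcdefghijklmnopqrstuvwxyz _"   # '_' pads the window edges (assumed)
WINDOW = 7                                  # seven letters of text at a time
N_HIDDEN = 80                               # hidden layer size (assumed)
N_OUT = 26                                  # one unit per phoneme code (assumed)
N_IN = WINDOW * len(LETTERS)

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_IN))   # input -> hidden weights
W2 = rng.normal(0.0, 0.1, (N_OUT, N_HIDDEN))  # hidden -> output weights

def encode(window):
    """One-hot encode a seven-letter window of text onto the input units."""
    v = np.zeros(N_IN)
    for i, ch in enumerate(window):
        v[i * len(LETTERS) + LETTERS.index(ch)] = 1.0
    return v

def forward(window):
    """Propagate a window through the net; the output pattern codes a phoneme."""
    hidden = np.tanh(W1 @ encode(window))
    output = np.tanh(W2 @ hidden)
    return hidden, output

hidden, output = forward("_aboard")
# In a working NETtalk the output vector would drive a voice synthesizer;
# here it is simply a 26-element activation pattern.
```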
The network began with a random distribution of hidden unit weights and
connections (within chosen parameters), i.e. it had no 'idea' of any rules of text
to phoneme conversion. Its task was to learn, by repeated exposure to training
conversion. But, of course, the connectionist does not stop there. From the up
and running level 3 implementation she must now work backwards to a
higher-level understanding of the task. This is Marr-through-the-looking-glass.
How is this higher level understanding to be obtained? There are a
variety of strategies in use and many more to be discovered. I shall mention just
three. First, there is simple watching, but at a microscopic level. Given a
particular input, the connectionist can see the patterns of unit activity (in the
hidden units) which result. (This, at any rate, will be the case if the network is
simulated on a conventional machine which can keep a record of such
activity.) This, as Sejnowski points out, provides a kind of data which
neuroscientists are hard pressed to gather. For neuroscience has excellent
techniques for recording single cell activity. But it is not well placed to record
patterns of simultaneous activity across large numbers of cells. (See also
Churchland [forthcoming-1989].)
Second, there is network pathology. While it is obviously unethical deliberately
to damage human brains to help us see what role sub-assemblies of cells
play in various tasks, it seems far more acceptable to damage artificial neural
networks.
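A minimal sketch of what such network pathology amounts to in practice, using an arbitrary random two-layer network invented for the example: silence a sub-assembly of hidden units and compare the damaged output with the healthy one:

```python
import numpy as np

# A hypothetical two-layer network; random weights stand in for a trained net.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 5))   # input -> hidden
W2 = rng.normal(size=(3, 8))   # hidden -> output

def forward(x, W1, W2):
    return np.tanh(W2 @ np.tanh(W1 @ x))

x = np.ones(5)
healthy = forward(x, W1, W2)

# 'Lesion' the network: silence hidden units 0-3 by zeroing their
# incoming weights, mimicking damage to a sub-assembly of cells.
W1_lesioned = W1.copy()
W1_lesioned[:4, :] = 0.0
damaged = forward(x, W1_lesioned, W2)

# The divergence between the two outputs is evidence about what role
# the silenced sub-assembly played in producing the response.
print(np.abs(healthy - damaged))
```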
Lastly, and perhaps most significantly, the connectionist can generate a
picture of the way in which the system has learnt to divide up the cognitive
space it is trying to negotiate. It is this picture, given by so-called 'hierarchical
cluster analysis', which seems to me to offer the closest connectionist analogue
to a high-level, competence-theoretic understanding.
Cluster analysis is an attempt to answer the question, 'What kinds of
representation have become encoded in the network's hidden units?' This is a
hard question since the representations, as noted earlier, will in general be of
somewhat complex, unobvious, dimension-shifted features. To see how cluster
analysis works, consider the task of the network to be that of setting hidden
unit weights in a way which will enable it to perform a kind of set partitioning.
The goal is for the hidden units to respond in distinctive ways when, and only
when, the input is such as to deserve a distinctive output. Thus in text-to-
phoneme conversion, we want the hidden units to perform very differently
when given 'the' as input than they would if given 'sail' as input. But we want
them to perform identically if given 'sail' and 'sale' as inputs. So the hidden
units' task is to partition a space (defined by the number of such units and their
possible levels of activation) in a way which is geared to the job in hand. A very
simple system, such as the rock/mine network described in Churchland
[forthcoming-1989] may need only to partition the space defined by its
hidden units into two major subvolumes-one distinctive pattern for inputs
signifying mines and one for those signifying rocks. The complexities of text-
phoneme conversion being what they are, NETtalk must partition its hidden
unit space more subtly (in fact, into a distinctive pattern for each of 79 possible
letter to phoneme pairings). Cluster analysis, as carried out by Rosenberg and
[Figure: 'Hierarchy of Partitions on Hidden-Unit Vector Space' — a dendrogram from hierarchical cluster analysis of NETtalk's hidden-unit activations, in which the letter-to-phoneme pairings divide into two broad groups, consonants and vowels.]
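A rough sketch of the sort of hierarchical cluster analysis at issue, with SciPy's agglomerative clustering standing in for Rosenberg and Sejnowski's own procedure; the 'activation vectors' below are toy stand-ins for mean hidden-unit activity per letter-to-phoneme pairing:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy hidden-unit activation vectors: one row per letter-to-phoneme
# pairing. Consonant pairings are drawn near one region of activation
# space, vowel pairings near another, so the partition is recoverable.
labels = ["p-p", "b-b", "t-t", "d-d", "a-a", "e-e", "i-i", "o-o"]
rng = np.random.default_rng(1)
consonants = rng.normal(0.2, 0.05, (4, 10))
vowels = rng.normal(0.8, 0.05, (4, 10))
activations = np.vstack([consonants, vowels])

# Agglomerative clustering on distances between activation vectors
# builds the hierarchy of partitions.
tree = linkage(activations, method="average")

# Cutting the tree into two clusters recovers the broadest partition,
# here the consonant/vowel split.
groups = fcluster(tree, t=2, criterion="maxclust")
print(dict(zip(labels, groups)))
```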
Relation three: the divided stream
Rogue models represent a more complex state of affairs in which actual
performance is dependent on two systems. One, the daily, on-line system,
relates to the competence theory in the way described by the Newtonian
REFERENCES
CHOMSKY, N. [1986]: Knowledge of Language: Its Nature, Origin and Use. Praeger
Publishers, Connecticut.
CHURCHLAND, P. [forthcoming-1989]: 'On the nature of theories: a neurocomputational
perspective'. In P. M. Churchland (ed.), The Neurocomputational Perspective.
MIT Press, Cambridge, Massachusetts.
CLARK, A. [1988]: 'Thoughts, sentences and cognitive science', Philosophical
Psychology, Vol. 1, no. 3, pp. 263-78.
CLARK, A. [1989]: Microcognition: Philosophy, Cognitive Science and Parallel Distributed
Processing. MIT/Bradford Books, Cambridge, Massachusetts.
DAVIES, M. [1987]: 'Tacit knowledge and semantic theory: can a five per cent difference
matter?', Mind, 96, pp. 441-62.
DAVIES, M. [forthcoming]: 'Connectionism, modularity and tacit knowledge', British
Journal for the Philosophy of Science.
DENNETT, D. [1987]: The Intentional Stance. MIT/Bradford Books, Cambridge, Massachusetts.
DENNETT, D. [1988]: 'The evolution of consciousness', Jacobsen Lecture, University of
London, May 1988. Tufts University Current Circulating Manuscript CCM-88-1.