
JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR 1989, 51, 399-416 NUMBER 3 (MAY)

THE INTERPRETATION OF COMPLEX HUMAN BEHAVIOR:
SOME REACTIONS TO PARALLEL DISTRIBUTED PROCESSING,
EDITED BY J. L. McCLELLAND, D. E. RUMELHART,
AND THE PDP RESEARCH GROUP¹

JOHN W. DONAHOE AND DAVID C. PALMER
UNIVERSITY OF MASSACHUSETTS/AMHERST AND SMITH COLLEGE

The publication, in two volumes, of Parallel Distributed Processing (McClelland, Rumelhart, & the PDP Group, 1986; Rumelhart, McClelland, & the PDP Group, 1986) has been heralded as a major potential contribution to the understanding of complex human behavior. For example, the review published in Contemporary Psychology begins: "These two volumes may turn out to be among the most important books yet written for cognitive psychology. They are already among the most controversial.... This new paradigm challenges the fundamental assumption underlying the currently dominant symbolic paradigm; namely, that mental processes can be modeled as programs running on a digital computer" (S. E. Palmer, 1987, p. 925). Even those who have expressed principled reservations concerning certain aspects of the PDP approach "believe this realm of work to be immensely important and rich" (Minsky & Papert, 1988, p. vii). (See Bechtel, 1985, and Miller, 1986, for other reactions to the PDP approach.²)

¹ Rumelhart, D. E., McClelland, J. L., & the PDP Research Group. (1986). Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 1. Foundations. Cambridge, Massachusetts: MIT Press; McClelland, J. L., Rumelhart, D. E., & the PDP Research Group. (1986). Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models. Cambridge, Massachusetts: MIT Press.

Preparation of this manuscript was supported in part by NSF Grant BNS-8409948 to the University of Massachusetts/Amherst. Reprints may be requested from the first author at the Program in Neuroscience and Behavior, Department of Psychology, University of Massachusetts, Amherst, Massachusetts 01003 or from the second author at the Department of Psychology, Smith College, Northampton, Massachusetts 01060.

² See also Reese, H. W. (1986, May). Computer simulation and behavior analysis: Similarities and differences. Paper presented at the meeting of the Association for Behavior Analysis, Milwaukee, Wisconsin.

Parallel distributed processing is a term used to encompass a number of related models of cognition featuring networks of units (loosely analogous to collections of neurons) whose interconnections are modified through a feedback mechanism (loosely analogous to reinforcement). These models are fundamentally different from typical models of cognitive psychology in that they are selectionist rather than essentialist in flavor (cf. Donahoe, 1984). That is, the functionality of connections among the units is the result of selection by the environment rather than design by the theorist. Behavior analysts, as students of behavior (including complex human behavior), should closely examine the approach presented in Parallel Distributed Processing (hereafter PDP) for several reasons: (a) The PDP approach is explicitly critical of many of the same constructs of mainstream cognitive psychology that are regarded as unhelpful in behavior analysis. (b) The general approach represented by PDP falls within the same broad conceptual framework, historical science, that encompasses behavior analysis. (c) Some of the specific accounts of complex behavior proposed in PDP functionally parallel the corresponding accounts advanced by behavior analysis.

To list but a few criticisms of cognitive constructs, PDP questions whether the cumulative effect of experience is usefully described as the storage of memories (e.g., McClelland, Rumelhart, & Hinton, 1986, p. 31), whether regularities in verbal behavior may be taken as unambiguous evidence of control by syntactic rules (e.g., McClelland, Rumelhart, & Hinton, 1986, p. 32), and whether discontinuities in the development of complex behavior require the postulation of critical periods (Munro, 1986, pp. 471-472). (References to material in PDP conform
to the convention of citing the author or authors of the chapter followed by the relevant page numbers.) Thus the PDP approach makes common cause with behavior analysis in questioning the utility of the storage metaphor, syntactic rules, and critical periods in understanding complex behavior (cf. Skinner, 1974). But common objections to such notions are perhaps the least important reasons for behavior analysts to explore the PDP approach.

HISTORICAL SCIENCE AND THE ORIGINS OF COMPLEXITY

More compelling than the sharing of common criticisms is the fact that the PDP approach and behavior analysis are both instances of historical science and, as such, assume a common stance in their efforts to understand complexity. For present purposes, two salient features of historical science are recognized. First, historical science views complex phenomena as the by-products of the action of lower level processes and not as the directed outcome of processes operating at the level at which the complex phenomena are observed. (As the term level is used here, it denotes not only more grossly differing levels such as the physiological and behavioral, but also finer gradations such as the microbehavioral and macrobehavioral. See Campbell, 1974a, for a discussion of reductionism in historical science.)

In evolutionary biology, the best known of the historical sciences, the complex characteristics that are observed in existing species are interpreted as the result of the cumulative action of genetic processes, summarized by principles such as natural selection, and not as the expression of higher order processes such as special creation. Cosmogony is another historical science, one which seeks to understand the origin of the universe as the cumulative product of physical processes summarized by principles such as those of Newtonian mechanics. Within behavior analysis, the lower level processes of which complex behavior is the product are those associated with selection by reinforcement (e.g., Donahoe & Wessells, 1980; Skinner, 1953, 1974). (For extended discussions of historical science in evolutionary epistemology see Campbell, 1974b; in biology see Mayr, 1982, and Sober, 1984; in behavior analysis see Skinner, 1966a, 1981, and Donahoe, Palmer, & Carlson, in press.)

A second characteristic of historical science is that the basic processes, whose cumulative effects yield complex phenomena, act in a particular order upon a given set of initial conditions, but both the initial conditions and the order of action of the processes are imperfectly known. Returning to the example of evolutionary biology, the processes responsible for selection acted on the chemical compounds of the lifeless environment of the early earth and, subsequently, on the products of prior selections. However, neither the environment of the early earth nor the order in which selection processes acted on that environment can be precisely specified. Such uncertainties are central to the nature of historical science because their existence imposes inescapable limitations on the account of complexity that is possible: Even if the principles describing the basic selection processes were known with certainty, uncertainty in their implications would remain because of incomplete knowledge of the initial conditions and of the order in which the various processes operated.

In a mature historical science, the principles that summarize the action of basic processes are sufficient to account for complexity, but they cannot be shown to be necessary for its occurrence. For example, the historical science of evolutionary biology may be competent to describe how a given species might have evolved but be unable to demonstrate that only that particular species could have arisen. With slightly different initial conditions acted upon in a slightly different order, the same selecting processes could have produced a different species (cf. Dawkins, 1986). The test of the adequacy of an historical science is its ability to provide a plausible account whereby a wide variety of complex phenomena could have been produced by the action of a small set of basic processes. In summary, because of limitations in knowledge of the complete history of selection, the power of historical science must often be evaluated by the sufficiency of its account of complexity and not by the necessity of that account. (Although this limitation of historical science has not been generally acknowledged within cognitive psychology, it has been clearly articulated by some workers, most notably Anderson, 1978, 1983.)

The Analysis and Interpretation of Behavior

Behavior analysis has explicitly recognized the inescapable difficulties in understanding complex behavior because of its status as one of the historical sciences. This recognition is most apparent in Skinner's (1957, 1974) distinction between the experimental analysis and the interpretation of behavior. In the experimental analysis of behavior, (primarily) laboratory methods are used to manipulate environmental variables with the goal of identifying the functional relations between those variables and behavior (e.g., Skinner, 1935, 1938, 1966b). The functional relations (especially those expressing the effects of the contingencies of reinforcement) provide, in turn, the basis for establishing "simplifying uniformities" (Skinner, 1966b, p. 216). The endpoint of experimental analysis is the statement of these "uniformities" in the form of principles that summarize the action of the basic processes identified by the science.

Interpretation begins where experimental analysis leaves off. That is, interpretation begins with principles derived from experimental analysis and then explores the implications of those principles for the understanding of complex behavioral phenomena. As with other historical sciences, if the processes described by these principles are sufficient to produce the observed complex behavior (i.e., the occurrence of the behavior is consistent with the principles), then the behavior is said to be understood (interpreted) in terms of the principles. (Of course, there are also important reciprocal relations between interpretation and behavior analysis: "The interpretation of human affairs is a rich source of suggestions for experiments" [Skinner, 1966b, p. 216].)

The distinction between the experimental analysis and the interpretation of behavior is a fundamental one in behavior analysis, and the failure to appreciate that distinction has led to misunderstandings vis-à-vis other approaches to complex behavior. Perhaps the most unfortunate instance of this misunderstanding occurred with Skinner's Verbal Behavior (1957). At the outset of that work, Skinner states: "The present extension to verbal behavior is thus an exercise in interpretation rather than a quantitative extrapolation of rigorous experimental results" (p. 11). This effort to interpret verbal behavior was widely viewed by cognitivists as an inappropriate application of inadequate laboratory principles to complex behavior (e.g., Chomsky, 1959). Attempts to understand other complex behavior have been similarly received, although they too follow the path of interpretation, the means whereby historical science accounts for complex phenomena. For example, in About Behaviorism, Skinner (1974) notes: "Much of the argument goes beyond the established facts. I am concerned with interpretation rather than prediction and control" (p. 21). ". . . Our knowledge . . . is limited by accessibility, not by the nature of the facts. . . . As in other sciences, we often lack the information necessary for prediction and control and must be satisfied with interpretation, but our interpretations will have the support of the prediction and control which have been possible under other conditions" (p. 194). "We cannot predict or control human behavior in daily life with the precision obtained in the laboratory, but we can nevertheless use results from the laboratory to interpret behavior elsewhere" (p. 251).

Whatever one thinks of experimental-analytic principles, the effort to interpret complex behavior in terms of laboratory-based principles is entirely in keeping with the practice of other historical sciences, and not a peculiarity of behavior analysis. It is the practice in evolutionary biology, where the evolution of a peacock's tail feathers is traced to the action of natural selection on primordial hair cells. It is the practice of cosmogony, where the development of the solar system is traced to the action of processes described by Newtonian mechanics on a swirling cloud of interstellar dust particles. The uncertainty in these accounts of complexity is an inherent and inescapable characteristic of historical science; it cannot be circumvented by any alternative formulation because the uncertainty arises from irremediable lapses in our knowledge of the initial conditions and the selection history (Donahoe et al., in press).

PDP as Historical Science

A reading of PDP makes it abundantly clear that behavioral complexity is viewed
as the cumulative product of lower level processes. In this respect, PDP falls squarely within the conceptual framework of historical science. The point is made in many places, often in the context of discussions of such molar constructs as rules or schemata. Both of these constructs are central notions within mainstream cognitive psychology. "The idea of parallel distributed processing [is] that intelligence emerges from the interactions of large numbers of simple processing units . . ." (Rumelhart, McClelland, & the PDP Group, 1986, p. ix). And later, in the same vein, "the apparent application of rules could readily emerge from interactions among simple processing units rather than from application of any higher level rules" (Rumelhart & McClelland, 1986, p. 120). "Many of the constructs of macrolevel descriptions such as schemata, prototypes, rules, productions, etc. can be viewed as emerging out of interactions of the microstructure of distributed models" (Rumelhart & McClelland, 1986, p. 125). "Schemata are not 'things.' There is no representational object which is a schema. Rather, schemata emerge at the moment they are needed from the interaction of large numbers of much simpler elements all working in concert with one another. . . . In the conventional story, schemata are stored in memory. . . . In our case, nothing stored corresponds very closely to a schema. What is stored is a set of connection strengths which, when activated, . . . generate states that correspond to instantiated schemata" (Rumelhart, Smolensky, McClelland, & Hinton, 1986, pp. 20-21). "The fact that our microstructural models can account for many of the facts about the representation of general and specific information . . . makes us ask why we should view constructs like logogens, prototypes, and schemata as anything other than convenient approximate descriptions . . ." (Rumelhart & McClelland, 1986, p. 127). The effort within PDP to account for complexity in terms of simpler processes has not escaped other cognitivists. A reviewer remarked, "The importance of PDP models stems mainly from their unexpected emergent properties. . . . The schemata themselves are not explicit data structures at all, but rather implicit structures distributed over the mass of interconnections among the units. They exist explicitly only theoretically . . ." (S. E. Palmer, 1987, pp. 926-927; cf. Skinner, 1977).

Although both PDP and behavior analysis conform equally to the view of historical science that complexity emerges from the action of lower level processes, the approaches depart fundamentally in the means whereby those processes are identified. In behavior analysis, as in other historical sciences, the processes that lead to complexity are the direct result of independent experimental analyses. That is, the validity of these processes is not dependent on the extent to which they are sufficient to interpret complex behavior, although their competence to do so contributes to their validity. In the PDP approach, unlike historical science generally, the basic processes are typically inferred from the complex behavior that they seek to interpret. Furthermore, the description of these processes is more often constrained by logical and mathematical rather than experimental-analytic considerations (e.g., Rumelhart & McClelland, 1986, p. 133; see also Klopf, 1988).

Thus far, we have described the salient features of historical science and the means whereby complex phenomena are interpreted in such sciences. We have seen that behavior-analytic and PDP approaches are both instances of historical science and, therefore, have a number of characteristics in common, such as their denial of the utility of many molar constructs from normative cognitive psychology. However, we have claimed that PDP fails to conform to the practice in other historical sciences that interpretation is based solely upon principles established by independent experimental analyses. In the remainder of this paper, we present the PDP approach in sufficient detail to permit this claim to be examined, and then make several suggestions toward an integration of the PDP approach with behavior analysis.

ADAPTIVE NETWORKS AND INTERPRETATION

Three complementary strategies of interpretation have been implemented in behavior analysis: verbal interpretation, organismic interpretation, and formal interpretation (Donahoe et al., in press). In the verbal interpretation of complex behavior, the implications of experimental-analytic principles are pursued using the conventions of ordinary language. What distinguishes verbal interpretation from mere speculation is that, like all interpretation in historical science, it appeals only to processes that have been identified in prior experimental analyses. Verbal interpretation is by far the most commonly employed interpretative strategy in behavior analysis, with Science and Human Behavior (Skinner, 1953), Verbal Behavior (Skinner, 1957), and About Behaviorism (Skinner, 1974) providing notable examples.

Although verbal interpretation is a useful method for understanding complex phenomena, it has distinct disadvantages. Very often, especially with more complex phenomena, a number of processes are involved, acting simultaneously and in many different sequences, and a purely verbal account cannot effectively keep track of them all. Although verbal interpretation will always play an important role in behavior analysis and other historical sciences, particularly during their earlier phases of development, more precise methods are desirable.

In organismic interpretation, complex behavior (or some aspect of it) is observed in one organism and simulated in another organism by exposing that organism to a sequence of conditions thought to produce the complex behavior in the first organism. As examples, the sequence of responses described by Köhler (1925) as demonstrating "insight" with chimpanzees has been functionally reproduced with pigeons (e.g., Epstein, Kirshnit, Lanza, & Rubin, 1984), behavioral interchanges between chimpanzees said to reveal "communication" (Savage-Rumbaugh, 1984, 1986) have been simulated with pigeons (Epstein, Lanza, & Skinner, 1980), and some aspects of generalization by children of number inflections in nouns (Berko, 1958) have been simulated with pigeons (Catania & Cerutti, 1985). In each of these organismic interpretations, the test organism has been the focus of a particular sequence of selection processes, all of which had been identified and directly studied in prior experimental analyses. It should be noted that most human experimentation, whether conducted within the behavior-analytic tradition or otherwise, falls into the category of organismic interpretation. This is so because, with very few exceptions, the processes contributing to complex behavior are not all under the control of the experimenter. Although research with human subjects may very carefully control the variables within the experiment, the differing preexperimental selection histories of the subjects outside the study cannot be completely controlled or even described.

In the third interpretative strategy, formal interpretation, logical and/or mathematical techniques are used to explore the implications of experimental-analytic principles for complex behavior. Included here would be efforts to use computer simulations of basic reinforcement processes to interpret choice behavior (e.g., Hinson & Staddon, 1983; Shimp, 1969), and it is this strategy of interpretation that the PDP approach most resembles.

Not all computer simulation is an instance of formal interpretation. Often, the goal of computer simulation is to devise a program whose output simply mimics some aspect of complex behavior, but in which the instructions of the program do not implement experimental-analytic principles. Strictly speaking, simulations of this type fall within the rubric of artificial intelligence. However, it must be noted that unless the processes implemented in PDP or other types of computer simulations have some independent experimental foundation, the distinction between simulations in artificial intelligence and in cognitive psychology is chiefly that the former are constructed by computer scientists and the latter by cognitive psychologists.

In comparison to verbal interpretation, formal interpretation has the advantage of being precisely stated in the instructions of the program and of being able to keep track of many simultaneously interacting processes. In comparison to organismic interpretation, formal interpretation has the advantage of implementing a selection history that might require many months or years in a living organism.

A disadvantage of computer simulation as a means of formal interpretation is that the initial conditions from which the computer program begins are often imperfectly specified. That is, information comparable to the evolutionary history of the organism is unavailable. It is in this respect, a shared evolutionary history, that organismic simulations
have an advantage over computer simulations of complex behavior (Epstein, 1984).

The PDP Approach

PDP describes a subset of a more general approach to interpretation known as adaptive network theory. (For surveys of this field, see Nilsson, 1965, and Minsky & Papert, 1988, pp. 247-287.) Adaptive network theory is described by the authors of PDP as "neurally inspired" (Rumelhart, Hinton, & McClelland, 1986, p. 75). In keeping with this inspiration, a network consists of a number of units, analogous to neurons or groups of neurons, that are interconnected (see Figure 1).

[Figure 1: diagram of a network with columns of INPUT, HIDDEN, and OUTPUT units]

Fig. 1. A network composed of input units in contact with the environment, hidden units that are not in direct contact with the environment, and output units that constitute the behavior of the network. If a unit is activated, it will probabilistically activate all those units to which it is connected. The possible pathways between units are indicated by lines, with activation propagating along a pathway in only one direction (here, from left to right).

It is useful to distinguish among three types of units: input units, output units, and so-called "hidden" units. Input units are in direct contact with the environment of the network; output units constitute the behavior of the network; hidden units lie between the input and output units. Hidden units are analogous to interneurons. The environment activates one or more of the input units to provide an input pattern to the network. The activated input units then probabilistically activate the hidden units to which they are connected and these, in turn, activate some of the output units. The activated output units define the output pattern obtained from the network.

The extent to which a unit activates subsequent units in the network depends upon the strength of the connection, or connection weight, between the units. Connection weights are analogous to the synaptic efficacies with which one neuron activates the neurons with which it is in proximity. That is, the larger the connection weight, the greater the probability that activity in a "presynaptic" unit will activate a "postsynaptic" unit.

In typical simulations using adaptive networks, the initial connection weights are assigned small, randomly determined values. Thus, the network is most often a tabula rasa, although this is not a requirement for such simulations. The competence of the network to simulate some complex environment-behavior relation is demonstrated if the connection weights can be modified such that an input pattern from the environment reliably activates an appropriate output pattern and does not activate inappropriate output patterns. In behavior-analytic terms, complex behavior is simulated when the network has formed a discrimination.

With adaptive networks, the connection weights are modified as a result of the "experience" of the network. That is, if an input pattern activates an appropriate output pattern, the weights tend not to change. However, if the obtained output pattern does not correspond to the appropriate output pattern, then the weights are changed in proportion to the difference between the obtained pattern and the appropriate pattern. A large difference produces a large change in the connection weights; a small difference produces a small change in the weights. It is clear, then, that adaptive networks simulate complex behavior through a selection process (i.e., "learning") and that the selection process is a function of the consequences scheduled for the output of the network. In behavior-analytic terms, complex environment-behavior relations in adaptive networks are the product of selection by reinforcement.

The PDP approach primarily exploits one method, the generalized delta rule (Rumelhart, Hinton, & Williams, 1986, pp. 318-362; Stone, 1986, pp. 444-459), to implement the selection process for changing connection weights. As mentioned previously, changes in the connection weights occur when there is a discrepancy between the output pattern produced by an input pattern and the output pattern that is appropriate for that input pattern. Adjustment of the connection weights is relatively straightforward when there are
direct connections between the input and output units (i.e., no hidden units). (An adaptive network with only input and output units is called a perceptron; Rosenblatt, 1962.) The connection weights are unchanged along paths that are active when appropriate output patterns occur; the weights are decreased along paths that are active when inappropriate patterns occur. As the perceptron convergence theorem shows, there are some classes of environment-behavior relations for which appropriate weights will eventually be found (Minsky & Papert, 1969). However, when there are hidden units (which are required to represent some classes of environment-behavior relations), the method for adjusting the connection weights throughout the network is less clear. Minsky and Papert (1988) have put the matter succinctly: ". . . Until recently, [adaptive network theory] has been paralyzed by the following dilemma: Perceptrons could learn anything that they could represent, but they were too limited in what they could represent. Multilayered networks were less limited in what they could represent, but they had no reliable learning procedure" (p. 256).

The generalized delta rule extends the basic thrust of the perceptron convergence theorem (the reduction of the discrepancy between the obtained and appropriate output) to multilayer networks (i.e., to networks containing hidden units). The goal is to select connection weights throughout the network that minimize the discrepancy between the obtained and the desired output patterns. The generalized delta rule accomplishes this goal by activating the network with an input pattern and then, working backward from the obtained output pattern, adjusting the connection weights of increasingly earlier layers of the network. The adjustments are proportional to the discrepancy between the obtained and the appropriate output pattern. Because the discrepancy is "passed back" to the connection weights of earlier units in the network, the generalized delta rule is said to "back-propagate" the discrepancy. Back-propagation exploits the known mathematical relation between the derivative of a function (here, changes in the discrepancy with changes in the connection weights) and the derivatives of components of that function (here, changes in the discrepancy with changes in the output of the network, and changes in the output of the network with changes in the weights). This is the so-called chain rule of composite functions (see Rumelhart, Hinton, & Williams, 1986, pp. 322-328). Although the generalized delta rule for multilayered networks, unlike the convergence theorem for perceptrons, does not guarantee that a solution will be found for all solvable input-output (environment-behavior) relations, it is reported that "our analyses and results have shown that, as a practical matter, the [back-]propagation scheme leads to solutions in virtually every case" (Rumelhart, Hinton, & Williams, 1986, p. 361).

The PDP volumes contain applications of adaptive networks to a variety of content areas. To give some sense of the approach, consider the following example of the use of an adaptive network to simulate the formation of a discriminative stimulus class (Goldiamond, 1962), what, in the vernacular, is referred to as a concept (McClelland & Rumelhart, 1986, pp. 170-215). Suppose that a child sees a number of dogs that differ somewhat from one another, but that share certain features. Many dogs are brown, but there are occasional black dogs or white dogs; most dogs have a tail, but some do not; and so on. In the adaptive network used to simulate these conditions, there were 16 features, with each of the features represented by an input unit. An input unit was activated if the feature was present and was not activated if the feature was absent. Suppose further that the child's parents say "dog" and reinforce the child's verbal response, "dog," in the presence of canine input patterns of the features of particular dogs, while other responses are reinforced in the presence of noncanine input patterns. In the simulation, there were eight output units, with a particular pattern of activation of the units corresponding to the verbal response, "dog." Other output patterns corresponding to other verbal responses (e.g., "cat") might be reinforced in the presence of different input patterns.

The goal of the simulation was to determine whether the connection weights in the network could be changed so that only canine input patterns evoked the output pattern corresponding to the verbal response, "dog." The effect of reinforcement was simulated by adjusting each connection weight in proportion to the discrepancy between the output pattern
on that trial and the "dog" output pattern. The network was repeatedly exposed to 50 different canine input patterns and 100 input patterns corresponding to other stimuli. After each exposure, the connection weights were adjusted so that when one of the canine input patterns occurred, the discrepancy between the obtained output pattern and the "dog" pattern was reduced. The cumulative effect of these adjustments was that the network "recognized" canine input patterns; that is, canine patterns applied to the input units of the network caused the "dog" pattern to appear on the output units.

Three characteristics of the discriminative stimulus class simulated by the adaptive network are worthy of special comment. First, the stimulus class had "fuzzy boundaries" (Rosch & Mervis, 1975). That is, no single canine feature needed to be present in an input pattern in order for the "dog" output pattern to be activated. For example, suppose that one of the input features corresponded to has four legs. Although the connection weights linking the unit activated by this feature with the output pattern corresponding to "dog" might be relatively strong, input patterns not containing this feature would also be able to evoke the "dog" response. Thus a dog unfortunate enough to have lost a leg through an accident, but possessing other canine features, would still be called a dog.

Second, the values of the connection weights by which input units were linked to output units depended on the particular examples of canine and noncanine inputs that were used to train the network and on the order in which the inputs were applied. Thus, the strengths of the weights in the network were path-dependent, that is, determined by the details of the selection history of the network. Networks, like living organisms, reflect their unique selection histories. To illustrate, the pathways activated by a feature such as has four legs would have large connection weights in a network that had been trained to distinguish dogs from fire hydrants, but would have smaller and more complexly arrayed weights in a network that had been trained to distinguish dogs from cats. Fire hydrants do not have legs, but cats do; hence, the feature, has four legs, would be helpful in distinguishing dogs from hydrants but not from cats. (Note that path dependence is inherent in all historical-science approaches to complexity, and is not unique to the PDP and behavior-analytic approaches. For example, production systems, whatever their other characteristics, also have this property, as persuasively demonstrated in accounts of the acquisition of verbal behavior, e.g., Anderson, 1983.)

Third, by the end of training, the network was capable of responding more strongly with the "dog" output pattern to some new input patterns than to any of the input patterns to which it had been exposed. The new input patterns to which the network responded more vigorously might be described as the more "typical" canine input patterns. Thus, the network responded as if it had formed a prototype of the canine inputs, where a prototype may be thought of as the most common combination of input features (Posner & Keele, 1968). Note that the prototype is not a "thing" that is stored at some place within the network; it is not an "ideal representation of reality" that is waiting to be retrieved by the stimulus. Networks, and the living organisms whose functioning they are intended to simulate, act as if there were prototypes, but what exist are sets of connection weights and synaptic efficacies, respectively. Responding as if there were a prototype is simply how a trained network or an experienced organism functions after training.

The PDP Approach as Interpretation in Historical Science

To qualify as an interpretation in historical science, an account must draw upon only principles derived from findings established through prior experimental analyses. Interpretations are consumers, not producers, of principles. How well does the PDP approach conform to this criterion?

To begin, the PDP approach, as a "neurally inspired" effort at interpretation, must draw upon experimental analyses of the neurosciences as well as of behavior. Experimental analyses of neuroscience and of behavior are complementary undertakings, both of which contribute to understanding the functioning of the organism. As Skinner (1938) has noted: "What is generally not understood by those interested in establishing neurological bases is that a rigorous description at the level of behavior is necessary for the demonstration of a neurological correlate" (p. 422). "...
I am not overlooking the advance that is made in the unification of knowledge when terms at one level of analysis are defined ('explained') at a lower level" (p. 428). Although our focus is upon the PDP approach as a means for exploring the implications of findings from the experimental analysis of behavior, a few comments on the relation of PDP to the neurosciences are in order.

PDP and the neurosciences. Reaching an accommodation between the PDP approach and the neurosciences has been the subject of considerable discussion and controversy both within the PDP volumes (Crick & Asanuma, 1986, pp. 333-371; McClelland & Rumelhart, 1986, pp. 327-331; Norman, 1986, pp. 531-546) and elsewhere (e.g., Smolensky, 1988). A major focus of contention has been the lack of correspondence between the structure of PDP networks and the structure of the nervous system (cf. Segal, 1988). In general, the interconnections postulated within networks have not been closely guided by neuroanatomical findings (Crick & Asanuma, 1986, pp. 370-371; Minsky & Papert, 1988, p. 266). Only one inconsistency is examined here, but it is one that pertains to the central issue in adaptive networks: the means by which reinforcers adjust the connection weights within networks.

In order for the nervous system to implement back-propagation as a means of adjusting the connection weights, large numbers of specific back-connections from the output units to units in all of the earlier layers of the network are required. In fact, there is no neuroanatomical evidence that such rich back-connections exist. On the contrary, the neural systems mediating selection by reinforcement appear to be nonspecific systems that project diffusely within the brain areas they serve (for reviews, see Carlson, 1986, and Olds & Fobes, 1981). How can these diffuse, reinforcer-activated systems alter the specific connections between the input and output units that mediate a particular environment-behavior relation?

An adaptive network in which a diffuse reinforcement system is capable of adjusting the connection weights is shown in Figure 2. We call a network of this architecture a selection network. Suppose that the environment places a given pattern on the input units of the selection network. The activated input units are indicated by filled circles in Figure 2. Dependent on the values of the connection weights prior to the input, an activated input unit will then probabilistically activate some of the hidden units with which it is connected and they, in turn, will activate some of the output units. Paths that might be activated by the input pattern shown in Figure 2 are indicated by the heavier lines. Some activated paths are "dead ends" in the sense that they are not constituents, in this instance at least, of paths that activate output units upon which the reinforcer is dependent. However, other paths do lead to the critical output unit(s), and their activation causes the reinforcer to occur. In Figure 2, the critical output unit is shown as a filled circle. When this output unit is activated, causing the reinforcer to be presented, a diffuse signal is sent throughout the network. The effect of this diffuse signal is to strengthen all of those connections that happen to be active at that moment. Some of the active connections will be part of "dead-end" paths, and the strengthening of these paths may not benefit the appropriate functioning of the network. However, the set of active paths must necessarily also include some paths that activate the crucial output unit(s) because, without their involvement, the reinforcer would not have occurred. Over time, it is the connection weights of these latter pathways that will be strengthened most reliably.

Adaptive networks and behavior analysis. There are a number of striking consistencies between the behavior-analytic account of the acquisition of environment-behavior relations and the interpretation of those relations by means of selection networks. First, an operant is conventionally defined as a class of responses, all of which have a partially common effect on the environment (cf. Reynolds, 1968, p. 17; Skinner, 1935). In an adaptive network with a diffusely projecting reinforcement system, selection produces a number of different output patterns but all of them include activation of the crucial output unit(s) upon which the reinforcer is dependent. The class of output patterns activated by the input pattern is analogous to the class of responses that constitute the operant.

Second, with operant as contrasted to respondent conditioning, the critical response class is said to be emitted rather than elicited.
That is, the environment acting on the organism permits the response to occur. At other times or under other organismic conditions, the same environment might occasion other responses. Similarly, an input pattern to a network does not so much elicit the required output as the network permits that output to occur. Depending upon the initial state of the network and its selection history, the connection weights at that moment determine the range and probability of the specific environment-behavior relations that it can mediate.

[Figure 2 diagram; labels: ENVIRONMENT, INPUT, HIDDEN, OUTPUT]

Fig. 2. A selection network consisting of input, hidden, and output units. The solid circles represent units that are activated as the result of the environmental input at that moment. Because of an environmentally mediated contingency between a particular output of the network and a reinforcer, the occurrence of the designated output pattern causes the reinforcing stimulus to be presented. The reinforcing stimulus activates an input unit having diffuse projections throughout the network. The diffuse projections increase the strength of a connection to the extent that the connection is from an active unit and the connection terminates on an active unit.

Third, by whatever means the designated output (operant) occurs, it is those means that are strengthened. Thus, conditioning with both selection networks and living organisms is essentially superstitious in nature (Skinner, 1948). At heart, operant conditioning is a procedural arrangement whereby the presentation of the reinforcer is dependent on the occurrence of a specified response, whereas respondent conditioning is an arrangement whereby the reinforcer is dependent on a specified stimulus that already evokes a particular response (cf. Donahoe, Crowley, Millard, & Stickney, 1982). Both of these procedures may be simulated with a selection network having a diffuse projection system activated by the reinforcer, and initial work indicates that such networks are competent to learn a variety of environment-behavior relations (e.g., Barto & Anandan, 1985).

Fourth, there are important similarities between the behavior of organisms and the output of adaptive networks after selection has occurred. After an organism's behavior has been subject to differential conditioning in which a response has been reinforced in the presence of one environment and a different response during a second environment, intermediate test stimuli (following intradimensional training) or compound test stimuli (following interdimensional training) occasion a mixture of the two discriminated operants (e.g., Bickel & Etzel, 1985; Donahoe & Wessells, 1980, pp. 176-196). That is, the test stimuli evoke only the responses that have been reinforced in the presence of the training stimuli, and not new responses. After an adaptive network has been differentially conditioned, an analogous phenomenon may occur. When a new input pattern is applied to the network, the output pattern tends toward one of the two trained output patterns. This property of adaptive networks is the result of what are called attractor dynamics (Jordan, 1986; Sejnowski, 1986, p. 389; Smolensky, 1986, pp. 424-429).

ADAPTIVE NETWORKS AND THE EXPERIMENTAL ANALYSIS OF BEHAVIOR

The accounts of differential conditioning provided by adaptive networks and experimental-analytic findings are strikingly and persuasively congenial. Because, from the behavior-analytic perspective, complex behavior is the cumulative product of extensive differential conditioning, adaptive networks provide a potentially powerful means for formal interpretation.

Some of the possible contributions of behavior-analytic findings to adaptive-network interpretations may be illustrated by returning to the central issue of reinforcement. Regardless of whether the connection weights in the network are adjusted by back-propagation or by a nonspecific projection system
as we have proposed, the event that triggers the adjustment procedure must be well specified. According to the back-propagation procedure, the weights are adjusted as a function of the discrepancy between the obtained output pattern and the appropriate output pattern. Thus, in order for back-propagation to occur, the environment must provide detailed information about the appropriate output pattern. This precondition is often denoted by saying that back-propagation requires a "teacher." Quite reasonably, biologically oriented critics of the PDP approach have commented that this assumption is inconsistent with the conditions under which most complex behavior is acquired (e.g., Minsky & Papert, 1988, p. 264; Segal, 1988, p. 1107). More often, complex behavior, such as verbal behavior, is discovered, not instructed. However, because the environment usually does not provide an all-knowing "teacher," some of these same critics have mistakenly concluded that selection by consequences cannot play a major role in modulating synaptic efficacy in the nervous system. (Much of the difficulty arises from a failure on the part of both adaptive-network theorists and their critics to distinguish between contingency-shaped and rule-governed behavior, but that important matter will not be pursued here; see Skinner, 1974.)

The belief that naturally occurring contingencies are insufficient to bring about complexity has been a long-standing, but misconceived, impediment to selectionist accounts of complexity in the historical sciences. This mistaken belief was reflected in the criticisms of Darwin's contemporaries who argued that, although artificial selection by animal husbandrymen could produce progressive changes, natural contingencies could not do so without the intervention of a Designer. The same mistaken belief is reflected in the present-day criticism of linguists that the verbal environment is too impoverished for "language" to be acquired without the intervention of a Language Acquisition Device (Chomsky, 1980; see also Pinker & Mehler, 1988).

Within behavior analysis, the problem of accounting for the emergence of complex environment-behavior relations from the operant level of simpler behavior and in the absence of an all-knowing teacher falls within the extensive literature on shaping, chaining, and fading. In shaping, the response topography necessary for the occurrence of a reinforcer progressively approximates the topography ultimately required for the reinforcer. With adaptive networks, this arrangement may be simulated by progressively changing the criterion output pattern. In chaining, responding alters the stimuli available to the organism with the result that these new stimuli serve as conditioned reinforcers with respect to the responses that produce them and as discriminative stimuli with respect to subsequent responses in the sequence. With adaptive networks, this arrangement may be simulated by having the output pattern affect the subsequent input pattern to the network. In this way, a feedback loop is implemented that is mediated by the environment rather than by recurrent connections within the network itself. We are unaware of any work within the PDP framework that exploits the literature on shaping and chaining.

Fading is also critically important to the acquisition of complex environment-behavior relations within the behavior-analytic framework. In fading, the complex stimuli that ultimately control behavior are progressively approximated as training proceeds. With adaptive networks, fading may be simulated by progressively changing the input pattern to the network in the direction of the final input pattern. There has been only some very preliminary work on fading with adaptive networks. In the one case known to us, the input patterns that were to be discriminated were initially very dissimilar, but were progressively shifted toward the highly similar patterns required by the final discrimination. Using fading, the adaptive network achieved criterion performance with only 25% of the trials that were required when the highly similar input patterns were used throughout training (Jacobs, 1988)!

Some Shared Orienting Attitudes

We have seen that both behavior analysis and adaptive-network theory deny the functionality of the molar constructs of information processing and assert the centrality of the guidance and selection of complex behavior by the environment. Moreover, experimental analysis appears to have much to offer adaptive-network theory in the simulation of selection; namely, findings concerning the behavioral and
neural processes responsible for reinforcement. Likewise, adaptive-network theory appears to have much to offer experimental analysis; namely, a means for integrating these findings so that their implications for the interpretation of complex behavior may be pursued more precisely. These specific mutual benefits are possible because both the behavior-analytic and adaptive-network approaches share a number of general orienting attitudes toward the origins of complex behavior. We conclude this paper by describing three of these shared orientations: the preeminence of the environment, the ubiquity of multiple causation, and the disenthronement of consciousness.

Preeminence of the environment. Perhaps, the characteristic that most clearly distinguishes the behavior-analytic approach from information-processing approaches to complex behavior is the effort to ferret out the environmental antecedents of complex behavior and to resist the temptation to attribute behavior to inferred organismic events that are beyond the reach of the environment. Although the organism is the locus of environmental action, it is the environment, and not the organism, that is the initiator and shaper of behavior (Skinner, 1974). Adaptive-network theory takes a similar stance; complex behavior occurs when the environment activates the input units of the network. Rumelhart, Hinton, and McClelland, in their summary of "eight major aspects of a parallel distributed processing model," list "... an environment (sic) within which the system must operate" (1986, p. 46). The reliance upon the environment (i.e., upon "experience") as the origin of complexity is especially clear in the treatment of verbal behavior from a PDP perspective: "the greater the amount of experience, the more independent the system should be from its start state and the more dependent it should be on the structure of its environment.... To the extent that stored knowledge is assumed to be in the form of explicit, inaccessible rules of the kind often postulated by linguists, ... it is hard to see how it could 'get into the head' of the newborn" (Rumelhart & McClelland, 1986, p. 142). (See D. C. Palmer, 1986, for an extended discussion of this point.) Adaptive-network theory, in agreement with behavior analysis and only a few other contemporary views (e.g., Gibson, 1979; see also Costall, 1984), regards the environment as providing a richer source for shaping complex behavior than is appreciated by the information-processing approach (see Rumelhart, Hinton, & McClelland, 1986, p. 54; Rumelhart, Smolensky, McClelland, & Hinton, 1986, p. 39 ff.).

The emphasis upon the environment as the shaper both of behavior and adaptive networks should not be taken to mean that either behavior analysis or adaptive-network theory denies the crucial contribution of the organism as the locus of the cumulative effects of prior environmental selection. As Skinner has noted, "The environment made its first great contribution during the evolution of the species, but it exerts a different kind of effect during the lifetime of the individual, and the combination of the two effects is the behavior that we observe at any given time" (Skinner, 1974, p. 19; see also Skinner, 1966a, 1984). Similarly, in PDP it is remarked, "Some have argued that since we claim that human cognition can be explained in terms of PDP networks and that the behavior of lower animals can also be described in terms of such networks we have no principled way of explaining why rats are not as smart as people." Does this "criticism" sound familiar? The criticism is answered in much the same way that behaviorists would answer it, except for a few notable differences in technical vocabulary: "We are not claiming, in any way, that people and rats and all other organisms start out with the same prewired hardware. ... But there must be another aspect to the difference between rats and people as well. This is that the human environment includes other people and the cultural devices that have been developed to organize their thinking processes" (Rumelhart & McClelland, 1986, p. 143; cf. Skinner, 1981).

Multiple causation. A corollary of the proposition that behavior is the product of the combined effect of the contemporary environment acting on an organism that has been changed by the cumulative effect of selection by prior environments is the following: Because both prior and present environments are complex, any given behavior (particularly complex behavior, which is the product of a prolonged history of selection) is likely to be under the combined control of many different environmental stimuli. Multiple controlling variables are clearly evident in verbal behavior, where slips of the tongue and the pen often point to the complexity of the antecedents of a given response. For example, the failure of the subject and predicate to agree in number in the sentence, "The assemblage of feathers are beautiful," probably reflects such facts as "feathers" is closer to the verb than is "assemblage" and "feathers are beautiful" has been more commonly uttered (and reinforced) than "the assemblage is beautiful." Indeed, the pervasiveness and importance of multiple controlling variables for the understanding of complex behavior led Skinner (1957) to entitle one of the chapters in Verbal Behavior, "Multiple Causation."

Adaptive networks inherently implement multiple causation. That is, the output pattern produced by the network is dependent upon all of the input units that are activated by the contemporary environment and all of the connection weights within the network, which, in turn, reflect the cumulative action of all of the previous environments. The use of adaptive networks to simulate the effects of multiple controlling variables on complex behavior is in its infancy, as in the treatment of "sentence processing" in PDP (e.g., McClelland & Kawamoto, 1986, pp. 272-325). Nevertheless, certain conclusions have been reached that are congenial to a behavioral analysis.

As two examples of such conclusions, adaptive-network theory questions the status of certain apparently verbal constructions as genuine instances of verbal behavior and the status of response topography by itself as a fruitful means for studying verbal behavior.

First, note the treatment in PDP of center-embedded sentences, one of the constructions thought by linguists to require recursive mechanisms in accounts of sentence processing. McClelland and Kawamoto comment: "... people cannot parse such sentences without the use of very special strategies, and do not even judge them to be acceptable. Consider, for example, the 'sentence': ... The man who the girl who the dog chased liked laughed. ... sentences in natural language are simply not structured in this way. Perhaps, then, the search for a model of natural language processing has gone down the garden path, chasing a recursive white rabbit" (1986, pp. 323-324). Skinner's earlier comments about such constructions are to similar effect: "Perhaps there is no harm in playing with sentences in this way, ... but it is still a waste of time, particularly when the sentences thus generated could not have been emitted as verbal behavior" (Skinner, 1974, p. 109).

Second, behavior analysis has long rejected response topography as a sufficient basis for either an adequate experimental analysis (Skinner, 1935) or interpretation (Skinner, 1957) of behavior. Regarding verbal behavior Skinner has stated, "What is needed, and what the traditional 'word' occasionally approximates, is a unit of behavior composed of a response of identifiable form functionally related to one or more independent variables. ... In this way we may distinguish between the operant fast in which the controlling variable is shared by the operant speedy and the operant fast in which the controlling variable is similar to that in the operant fixed" (Skinner, 1957, pp. 20-21). "Those who have confused behaviorism with structuralism, in its emphasis on form or topography, have complained that it ignores meaning.... Meaning is not properly regarded as a property either of a response or a situation but rather of the contingencies responsible for both the topography of behavior and the control exerted by stimuli" (Skinner, 1974, pp. 100-101). In the treatment of "words" by adaptive networks, a similar point is made. "We will probably all agree that there are different readings of the word bat in the sentences The bat hit the ball and The bat flew round the cave.... Different readings of the same word are just different patterns of activation; really different readings, ones that are totally unrelated ... the two readings of bat simply have very little in common" (McClelland & Kawamoto, 1986, pp. 314-315). They have "very little in common" because the "different patterns of activation" are produced by different input patterns to the network. That is, the controlling stimuli for responses of the same topography are different and, therefore, the responses are members of different operants.

Consciousness. The orienting attitude that behavior analysis and adaptive-network theory share toward consciousness might not appear, at first glance, to be fundamental to their approaches to complex behavior. In fact, an argument can be made that it is this shared
attitude that is central to their common ap- special behavior called knowing. We can never
proach to complex behavior. know through introspection what the phys-
The information-processing approach, either iologist will eventually discover with his spe-
explicitly or implicitly, is prone to the belief cial instruments (pp. 238-239).... What is
that behavior taken as evidence of conscious- felt or seen through introspection is only a
ness occupies a special status with respect small and relatively unimportant part of what
to other behavior. Although a number of the physiologist will eventually discover"
characteristics are often mentioned as giving (Skinner, 1974, p. 274).
consciousness its special status, the one of Some proponents of adaptive-network the-
concern here is the belief that there is a fairly ory have come to essentially the same con-
direct and straightforward relation between clusion on this point. In summarizing the
what we think about what we are doing and PDP approach, Norman says "Introspection
what we are indeed doing. That is, con- ... is based upon observation of the outputs
sciousness is regarded as a valid, or, at a of a subconscious (PDP) system. As a result,
minimum, a useful indicator of the behavioral introspections are only capable of accurate
and physiological processes that intervene descriptions of system states. Because there
between the stimulating environment and the is no information available about how the
occurrence of other behavior. The experi- state was reached, introspection cannot give
menter's consciousness, if not the subject's, reasons for the resulting states" (1986, p. 544).
is commonly accorded this ability. Sometimes He, then, advocates a close future relation
the assumption is explicit, as when computer between cognitive psychology and the neu-
simulations of complex behavior are designed rosciences. Minsky and Papert (1988) concur:
to mimic the introspections of experts at a "What any distributed network learns is likely
complex task requiring expertise (Newell & to be quite opaque to other networks connected
Simon, 1961); more often it is implicit, as to it (p. 274).... It is because our brains
when an experimenter introspects under the primarily exploit [adaptive networks] that we
partial control of measures of the subject's behavior and infers the cognitive processes said to underlie the subject's behavior. Behavior analysis has provided extensive verbal interpretations (Skinner, 1945, 1964, 1974) and some organic simulations (e.g., Lubinski & Thompson, 1987) of consciousness and its origins. Consciousness (i.e., verbal behavior under the control of intraorganismic events) is a defective indicator of those events for two classes of reasons. First, the social community faces uncircumventable impediments to fostering such discriminations (Skinner, 1945, 1964). Second, "There has been no opportunity for the evolution of a nervous system which would bring some very important parts of the body under that control (p. 242).... Introspection has had to use whatever [neural] systems were available, and they have happened to be systems which made contact with those parts of the body that played a role in its internal and external economy.... [The verbal system] does not make contact with that vast nervous system that mediates ... behavior. [It] does not because there are no nerves going to the right places.... The brain plays an extraordinary role in behavior but not as the object of that

possess such small degrees of consciousness, in the sense that we have so little insight into the nature of our own conceptual machinery.... What appear to us to be direct insights into ourselves must be rarely genuine and usually conjectural.... Reflective thought is the lesser part of what our minds do" (p. 280).

Certain formal analyses have also undermined consciousness as bearing a special, privileged relation to other behavior. These analyses have shown that, with the customary nonlinear activation mechanisms of adaptive networks, there can be, in principle, no invariant relation between the output of a network and the events within the network that produced that output (Smolensky, 1986, pp. 422-424). That is, there is no isomorphism between the output pattern of the network and activity within the network (i.e., unit activity). Moreover, "... if two models were started in corresponding [output] states and given corresponding inputs, they would not continue to stay in corresponding states" (Smolensky, 1986, p. 424). This last is a general characteristic of historical science and constitutes a strong formal argument against conventional information-processing models.
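
The Smolensky point can be made concrete with a toy simulation. The sketch below is not Smolensky's own analysis; it assumes a hypothetical network of a single unit whose activity is updated by h' = tanh(w*h + x) and whose output is thresholded (1 if h > 0, else 0). The weight w = 1.0, the two starting activities (0.1 and 0.9), and the input sequence are arbitrary illustrative choices. Both copies begin in the same output state but with different unit activity, and both receive identical inputs.

```python
import math


def update(h, x, w=1.0):
    """Hypothetical recurrent update for a single unit: h' = tanh(w * h + x)."""
    return math.tanh(w * h + x)


def output(h):
    """Nonlinear (threshold) output: 1 if the unit's activity is positive, else 0."""
    return 1 if h > 0 else 0


def run(h0, inputs, w=1.0):
    """Return the output at the start and after each input is applied."""
    h = h0
    outputs = [output(h)]
    for x in inputs:
        h = update(h, x, w)
        outputs.append(output(h))
    return outputs


inputs = [-0.5, 0.0, 0.0]  # identical (illustrative) inputs for both copies
a = run(0.1, inputs)       # weak positive starting activity
b = run(0.9, inputs)       # strong positive starting activity

print(a)  # [1, 0, 0, 0]
print(b)  # [1, 1, 1, 1]
```

Both runs start with output 1, but after the first input one drops to 0 and the other stays at 1. The threshold maps distinct unit activities onto one output value, so the shared starting output did not determine what either copy would do next.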

Knowledge of the behavioral output and environmental input does not sufficiently constrain conjectures about intraorganismic (or intranetwork) events to warrant inferences about such events from these observations alone. In this important conclusion, which strikes at the very heart of the information-processing enterprise, behavior analysis (Skinner, 1974) and adaptive-network theory concur. (For related discussions of difficulties in inferring antecedents from knowledge of only the outputs of a system, see Anderson, 1978, Gleick, 1987, and Churchland & Sejnowski, 1988.)

SOME CLOSING THOUGHTS

A dominant theme of our reactions to PDP has been that behavior-analytic and adaptive-network approaches share much conceptual common ground and have much to gain from one another in the effort to interpret complex behavior. Experimental analyses, both of behavior and neuroscience, provide the findings necessary for the construction of computer simulations that conform to the requirements of historical science. Computer simulations based on experimental analyses provide a powerful means for the formal interpretation of complex behavior, one that is superior to verbal interpretation in its ability to implement the intricacies of environmental selection in a precise and expeditious manner.

Skinner has long criticized a fascination with what he dubbed the "Conceptual Nervous System" (Skinner, 1938, p. 421), arguing instead for a science of behavior studied at its own level. His position, we believe, has been vindicated; the experimental analysis of behavior has advanced with little concern for physiological mechanisms. However, the field finds itself today, like Darwinism before it (Catania, 1987), a science whose accomplishments have outstripped its acceptance by the general public and by much of the scientific community. Skinner's verbal interpretations of complex behavior, however compelling to the prepared reader, have failed to excite much interest outside the field. Indeed, it is commonly held by those who call themselves cognitive scientists that behavioral interpretations are impoverished or even inadequate in principle (Bever, Fodor, & Garrett, 1968; Chomsky, 1959). It is our belief that computer simulations using adaptive networks can serve an important polemical function by demonstrating, through formal interpretation, the power of behavioral principles.

However, we would be remiss if we did not indicate the antipathy toward behavior analysis that is expressed in PDP. Although Skinner's views on many issues predate and parallel those expressed in PDP, none of his work appears among the many citations in the bibliography. Moreover, in the one instance in which behavior analysis is considered, a vigorous effort is made to distance work on adaptive networks from behavior analysis. To wit, in a section entitled "Some objections [emphasis ours] to the PDP approach," the following statements appear: "A related claim that some people have made is that our models appear to share much in common with behaviorist accounts of behavior. While they do involve simple mechanisms of learning, there is a crucial difference between our models and the radical behaviorism of Skinner and his followers. In our models, we are explicitly concerned with the problem of intermodal representation and mental processing, whereas the radical behaviorist explicitly denies the scientific utility and even the validity of the consideration of these constructs.... Our models must be seen as completely antithetical to the radical behaviorist program and strongly committed to the study of representations and process" (Rumelhart & McClelland, 1986, p. 121).

Behavior analysts will recognize such criticisms as ill-informed. As has occurred all too often, the editors of PDP have mistakenly equated the behavior-analytic position that behavioral data do not sufficiently constrain inferences about underlying microbehavioral and physiological processes to permit fruitful conjectures about such processes with the "black-box" position that such processes are of no relevance to a science of behavior (Skinner, 1974, pp. 233-237). To the degree that "intermodal representation" and "mental processing" denote microbehavioral events or events in the real nervous system, behaviorists have no objection to the inclusion of such terms in a complete account of the functioning of the organism. To the contrary, as already noted, Skinner has long acknowledged "the advance that is made in the unification of
knowledge when terms at one level of analysis are defined ('explained') at a lower level" (Skinner, 1938, p. 428). "The physiologist of the future will tell us all that can be known about what is happening inside the behaving organism. His account will be an important advance over a behavioral analysis, because the latter is necessarily 'historical'-that is to say, it is confined to functional relations showing temporal gaps.... What he discovers cannot invalidate the laws of a science of behavior, but it will make the picture of human action more nearly complete" (Skinner, 1974, pp. 236-237).

What Skinner rejected as futile was the attempt to draw strong inferences about physiology from behavior or, more generally, to draw inferences about physical events taking place at a lower level from the observation of physical events occurring at a higher level. If lower level events are to be understood, then those events must be studied directly. (See Donahoe & Palmer, 1988, for a discussion of this point as it relates to the concept of inhibition.) With respect to the relation between behavior and physiology, Skinner was asserting nothing more remarkable than, for example, that physiology is not the way to study chemistry or that chemistry is not the way to study physics. As applied to the present issue, the behavior-analytic position is simply that if one is interested in the architectures of adaptive networks, which are crucial determinants of the environment-behavior relations that a network can mediate (Minsky & Papert, 1988, p. 266), then behavior-analytic findings must be supplemented by the relevant findings from neuroscience. The demands of interpretation in historical science are not met when one attempts to infer underlying processes from phenomena whose explanation is sought in terms of those very same processes. That way lies the chaos of circular reasoning.

We began with the comment that behavior analysts should regard adaptive-network theory as an ally in the effort to interpret complex behavior. A sufficient reason for taking this position, although we have indicated many better reasons, is the Machiavellian dictum, "My enemy's enemy is my friend." We may hope that adaptive-network theorists, faced with the inevitable onslaughts from mainstream cognitive psychology and linguistics, will be more receptive to interactions with behavior analysts on the grounds that "A friend in need is a friend indeed!" The better reason for such cooperation is that adaptive-network theory, if it is to be a means for interpretation in historical science, must be guided by principles and findings from behavior analysis and neuroscience.

REFERENCES

Anderson, J. R. (1978). Arguments concerning representations for mental imagery. Psychological Review, 85, 249-277.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Barto, A. G., & Anandan, P. (1985). Pattern recognizing stochastic automata. IEEE Transactions on Systems, Man and Cybernetics, 15, 360-375.
Bechtel, W. (1985). Contemporary connectionism: Are the new parallel distributed processing models cognitive or associationist? Behaviorism, 13, 53-61.
Berko, J. (1958). The child's learning of English morphology. Word, 14, 150-177.
Bever, T. G., Fodor, J. A., & Garrett, M. (1968). A formal limitation of associationism. In T. R. Dixon & D. L. Horton (Eds.), Verbal behavior and general behavior theory (pp. 582-585). Englewood Cliffs, NJ: Prentice-Hall.
Bickel, W. K., & Etzel, B. C. (1985). The quantal nature of controlling stimulus-response relations as measured in tests of stimulus generalization. Journal of the Experimental Analysis of Behavior, 44, 245-270.
Campbell, D. T. (1974a). 'Downward causation' in hierarchically organised biological systems. In F. J. Ayala & T. Dobzhansky (Eds.), Studies in the philosophy of biology (pp. 179-186). London: Macmillan.
Campbell, D. T. (1974b). Evolutionary epistemology. In P. A. Schilpp (Ed.), The philosophy of Karl Popper (Vol. 1, pp. 413-463). (The library of living philosophers, Vol. 14). LaSalle, IL: Open Court.
Carlson, N. R. (1986). Physiology of behavior (3rd ed.). Boston: Allyn & Bacon.
Catania, A. C. (1987). Some Darwinian lessons for behavior analysis: A review of Bowler's The eclipse of Darwinism. Journal of the Experimental Analysis of Behavior, 47, 249-257.
Catania, A. C., & Cerutti, D. T. (1985). Some nonverbal properties of verbal behavior. In T. Thompson & M. D. Zeiler (Eds.), Analysis and integration of behavioral units (pp. 185-211). Hillsdale, NJ: Erlbaum.
Chomsky, N. (1959). A review of B. F. Skinner's Verbal Behavior. Language, 35, 26-58.
Chomsky, N. (1980). Rules and representations. New York: Columbia University Press.
Churchland, P. S., & Sejnowski, T. J. (1988). Perspectives on cognitive neuroscience. Science, 242, 741-745.
Costall, A. P. (1984). Are theories of perception necessary? A review of Gibson's The ecological approach to visual perception. Journal of the Experimental Analysis of Behavior, 41, 109-115.
Crick, F. H. C., & Asanuma, C. (1986). Certain aspects
of the anatomy and physiology of the cerebral cortex. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 333-371). Cambridge, MA: MIT Press.
Dawkins, R. (1986). The blind watchmaker. Harlow, Essex: Longman.
Donahoe, J. W. (1984). Skinner-The Darwin of ontogeny? Behavioral and Brain Sciences, 7, 487-488. (Commentary)
Donahoe, J. W., Crowley, M. A., Millard, W. J., & Stickney, K. A. (1982). A unified principle of reinforcement: Some implications for matching. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts (pp. 493-521). Cambridge, MA: Ballinger.
Donahoe, J. W., & Palmer, D. C. (1988). Inhibition: A cautionary tale. Journal of the Experimental Analysis of Behavior, 50, 333-341.
Donahoe, J. W., Palmer, D. C., & Carlson, N. C. (in press). Complex human behavior: A biobehavioral approach. Boston: Allyn & Bacon.
Donahoe, J. W., & Wessells, M. G. (1980). Learning, language, and memory. New York: Harper & Row.
Epstein, R. (1984). Simulation research in the analysis of behavior. Behaviorism, 12(2), 41-59.
Epstein, R., Kirshnit, C. E., Lanza, R. P., & Rubin, L. C. (1984). 'Insight' in the pigeon: Antecedents and determinants of an intelligent performance. Nature, 308, 61-62.
Epstein, R., Lanza, R. P., & Skinner, B. F. (1980). Symbolic communication between two pigeons (Columba livia domestica). Science, 207, 543-545.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Gleick, J. (1987). Chaos: Making a new science. New York: Viking.
Goldiamond, I. (1962). Perception. In A. J. Bachrach (Ed.), Experimental foundations of clinical psychology (pp. 280-340). New York: Basic Books.
Hinson, J. M., & Staddon, J. E. R. (1983). Hill-climbing by pigeons. Journal of the Experimental Analysis of Behavior, 39, 25-47.
Jacobs, R. A. (1988). Initial experiments on constructing domains of expertise and hierarchies in connectionist networks. Department of Computer and Information Science Technical Report, University of Massachusetts, Amherst, MA.
Jordan, M. I. (1986). Serial order: A parallel distributed processing approach. Institute for Cognitive Science Technical Report No. 8604, University of California, San Diego, CA.
Klopf, A. H. (1988). A neuronal model of classical conditioning. Psychobiology, 16, 85-125.
Köhler, W. (1925). The mentality of apes (2nd ed., E. Winter, Trans.). New York: Harcourt, Brace, & World.
Lubinski, D., & Thompson, T. (1987). An animal model of the interpersonal communication of interoceptive (private) states. Journal of the Experimental Analysis of Behavior, 48, 1-15.
Mayr, E. (1982). The growth of biological thought: Diversity, evolution, and inheritance. Cambridge, MA: Belknap Press.
McClelland, J. L., & Kawamoto, A. H. (1986). Mechanisms of sentence processing. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 272-325). Cambridge, MA: MIT Press.
McClelland, J. L., & Rumelhart, D. E. (1986). On learning the past tense of English verbs. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 170-215). Cambridge, MA: MIT Press.
McClelland, J. L., Rumelhart, D. E., & Hinton, G. E. (1986). The appeal of parallel distributed processing. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing: Vol. 1 (pp. 3-44). Cambridge, MA: MIT Press.
McClelland, J. L., Rumelhart, D. E., & the PDP Research Group (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models. Cambridge, MA: MIT Press.
Miller, L. (1986). Behaviorism and the new science of cognition. Psychological Record, 38, 3-18.
Minsky, M. L., & Papert, S. A. (1969). Perceptrons. Cambridge, MA: MIT Press.
Minsky, M. L., & Papert, S. A. (1988). Perceptrons (expanded ed.). Cambridge, MA: MIT Press.
Munro, P. W. (1986). State-dependent factors influencing neural plasticity: A partial account of the critical period. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 471-502). Cambridge, MA: MIT Press.
Newell, A., & Simon, H. A. (1961). Computer simulation of human thinking. Science, 134, 2011-2017.
Nilsson, N. J. (1965). Learning machines: Foundations of trainable pattern-classifying systems. New York: McGraw-Hill.
Norman, D. A. (1986). Reflections on cognition and parallel distributed processing. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 531-546). Cambridge, MA: MIT Press.
Olds, M. E., & Fobes, J. L. (1981). The central bases of motivation: Intracranial self-stimulation studies. Annual Review of Psychology, 32, 523-574.
Palmer, D. C. (1986). Chomsky's nativism: A critical review. In P. N. Chase & L. J. Parrott (Eds.), Psychological aspects of language (pp. 44-60). Springfield, IL: Thomas.
Palmer, S. E. (1987). PDP: A new paradigm for cognitive theory. Contemporary Psychology, 32, 925-928.
Pinker, S., & Mehler, J. (Eds.). (1988). Connections and symbols. Cambridge, MA: MIT Press.
Posner, M. I., & Keele, S. W. (1968). On the genesis of abstract ideas. Journal of Experimental Psychology, 77, 353-363.
Reynolds, G. S. (1968). A primer of operant conditioning. Glenview, IL: Scott, Foresman.
Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7, 573-605.
Rosenblatt, F. (1962). Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Washington, DC: Spartan Books.
Rumelhart, D. E., Hinton, G. E., & McClelland, J. L. (1986). A general framework for parallel distributed processing. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 1. Foundations (pp. 45-76). Cambridge, MA: MIT Press.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 1. Foundations (pp. 318-362). Cambridge, MA: MIT Press.
Rumelhart, D. E., & McClelland, J. L. (1986). PDP models and general issues in cognitive science. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 1. Foundations (pp. 110-146). Cambridge, MA: MIT Press.
Rumelhart, D. E., McClelland, J. L., & the PDP Research Group (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 1. Foundations. Cambridge, MA: MIT Press.
Rumelhart, D. E., Smolensky, P., McClelland, J. L., & Hinton, G. E. (1986). Schemata and sequential thought processes in PDP models. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 7-57). Cambridge, MA: MIT Press.
Savage-Rumbaugh, E. S. (1984). Verbal behavior at a procedural level in the chimpanzee. Journal of the Experimental Analysis of Behavior, 41, 223-250.
Savage-Rumbaugh, E. S. (1986). Ape language: From conditioned response to symbol. New York: Columbia University Press.
Segal, M. M. (1988). Review of Explorations in parallel distributed processing: A handbook of models, programs, and exercises, edited by J. L. McClelland & D. E. Rumelhart. Science, 241, 1107-1108.
Sejnowski, T. J. (1986). Open questions about computation in cerebral cortex. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 372-389). Cambridge, MA: MIT Press.
Shimp, C. P. (1969). Optimal behavior in free-operant experiments. Psychological Review, 76, 97-112.
Skinner, B. F. (1935). The generic nature of the concepts of stimulus and response. Journal of General Psychology, 12, 40-65.
Skinner, B. F. (1938). The behavior of organisms. New York: Appleton-Century.
Skinner, B. F. (1945). The operational analysis of psychological terms. Psychological Review, 52, 270-277.
Skinner, B. F. (1948). "Superstition" in the pigeon. Journal of Experimental Psychology, 38, 168-172.
Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.
Skinner, B. F. (1957). Verbal behavior. New York: Appleton-Century-Crofts.
Skinner, B. F. (1964). Behaviorism at fifty. In T. W. Wann (Ed.), Behaviorism and phenomenology (pp. 79-108). Chicago: University of Chicago Press.
Skinner, B. F. (1966a). The phylogeny and ontogeny of behavior. Science, 153, 1203-1213.
Skinner, B. F. (1966b). What is the experimental analysis of behavior? Journal of the Experimental Analysis of Behavior, 9, 213-218.
Skinner, B. F. (1974). About behaviorism. New York: Knopf.
Skinner, B. F. (1977). Why I am not a cognitive psychologist. Behaviorism, 5(2), 1-10.
Skinner, B. F. (1981). Selection by consequences. Science, 213, 501-504.
Skinner, B. F. (1984). The evolution of behavior. Journal of the Experimental Analysis of Behavior, 41, 217-221.
Smolensky, P. (1986). Neural and conceptual interpretations of PDP models. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 390-431). Cambridge, MA: MIT Press.
Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, 11, 1-23.
Sober, E. (1984). The nature of selection: Evolutionary theory in philosophical focus. Cambridge, MA: MIT Press.
Stone, G. O. (1986). An analysis of the delta rule and the learning of statistical associations. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 1. Foundations (pp. 444-459). Cambridge, MA: MIT Press.
