
JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR 1997, 67, 193–211 NUMBER 2 (MARCH)

THE S-R ISSUE: ITS STATUS IN BEHAVIOR ANALYSIS AND IN DONAHOE AND PALMER'S LEARNING AND COMPLEX BEHAVIOR

JOHN W. DONAHOE, DAVID C. PALMER, AND JOSÉ E. BURGOS

UNIVERSITY OF MASSACHUSETTS AT AMHERST, SMITH COLLEGE, AND UNIVERSIDAD CENTRAL DE VENEZUELA AND UNIVERSIDAD CATÓLICA DE VENEZUELA

The central focus of this essay is whether the effect of reinforcement is best viewed as the strength-
ening of responding or the strengthening of the environmental control of responding. We make the
argument that adherence to Skinner’s goal of achieving a moment-to-moment analysis of behavior
compels acceptance of the latter view. Moreover, a thoroughgoing commitment to a moment-to-
moment analysis undermines the fundamental distinction between the conditioning processes in-
stantiated by operant and respondent contingencies while buttressing the crucially important differ-
ences in their cumulative outcomes. Computer simulations informed by experimental analyses of
behavior and neuroscience are used to illustrate these points.
Key words: S-R psychology, contingencies of reinforcement, contiguity, discrimination learning, re-
inforcement, respondent conditioning, computer simulation

The simulation research reported here was supported in part by a faculty research grant from the Graduate School of the University of Massachusetts at Amherst and a grant from the National Science Foundation, BNS-8409948. The authors thank John J. B. Ayres and Vivian Dorsel for commenting on an earlier version of the manuscript. The authors express their special appreciation to two reviewers who have chosen to remain anonymous; they, at least, should know of our appreciation for their many contributions to the essay.

Correspondence and requests for reprints may be addressed to John W. Donahoe, Department of Psychology, Program in Neuroscience and Behavior, University of Massachusetts, Amherst, Massachusetts 01003 (E-mail: jdonahoe@psych.umass.edu); David C. Palmer, Department of Psychology, Clark Science Center, Smith College, Northampton, Massachusetts 01063 (E-mail: dcpalmer@science.smith.edu); or José E. Burgos, Consejo de Estudios de Postgrado, Facultad de Humanidades y Educación, Universidad Central de Venezuela (UCV), Caracas, Venezuela (E-mail: jburgos@zeus.ucab.edu.ve).

Richard Shull's thoughtful review (Shull, 1995) of Donahoe and Palmer's Learning and Complex Behavior (1994) (hereafter, LCB) prompted this essay. The review accurately summarized the general themes that informed our efforts and, more to the point for present purposes, identified an important issue—here called the stimulus–response (S-R) issue—that was not directly addressed in our work. Clarifying the status of the S-R issue is important for the further development of behavior analysis, and we seek to make explicit some of the fundamental concerns that surround the issue, most particularly as they arise in LCB.

To provide a context in which to consider the S-R issue, it is helpful to summarize briefly the central themes of the book: (a) Behavior analysis is an independent selectionist science that has a fundamental conceptual kinship with other historical sciences, notably evolutionary biology. (b) Complex behavior, including human behavior, is best understood as the cumulative product of the action over time of relatively simple biobehavioral processes, especially selection by reinforcement. (c) These fundamental processes are characterized through experimental analyses of behavior and, if subbehavioral processes are to be included, of neuroscience. (This contrasts with normative psychology in which subbehavioral processes are inferred from the very behavior they seek to explain, thereby inviting circular reasoning.) (d) Complex human behavior typically occurs under circumstances that preclude experimental analysis. In such cases, understanding is achieved through scientific interpretations that are constrained by experimental analyses of behavior and neuroscience. The most compelling interpretations promise to be those that trace the cumulative effects of reinforcement through formal techniques, such as adaptive neural networks, as a supplement to purely verbal accounts.

It is in the section of the review entitled "Principle of Selection (Reinforcement)" (Shull, 1995, p. 353) that the S-R issue is


raised. The following statement in LCB is cited:

    The outcome of selection by reinforcement is a change in the environmental guidance of behavior. That is, what is selected is always an environment–behavior relation, never a response alone. (LCB, p. 68)

Of this statement, Shull comments,

    In this respect, then, [LCB's] conception of reinforcement is very much in the tradition of S-R theory . . . [in which] . . . what was selected was the ability of a particular stimulus pattern to evoke a particular response pattern. (Shull, 1995, p. 353)

The question is then considered of whether this view is consistent with the behavior-analytic conception of operant behavior in which "operant behavior occurs in a stimulus context, but there is often no identifiable stimulus change that precedes each occurrence of the response" (Shull, 1995, p. 354). This leads to the related concern of whether adaptive neural networks are suitable to interpret operant behavior because networks are "constructed from elementary connections intended as analogues of stimulus–response relations" (Shull, 1995, p. 354).

In what follows, we seek to demonstrate not only that LCB's view of operant behavior and its interpretation via adaptive neural networks is consistent with behavior-analytic formulations (which we share), but also that this view enriches our understanding of what it means to say that operants are emitted rather than elicited. We agree that the behavior-analytic view of operants should be regarded as "liberating because . . . fundamental relationships could be established in procedures that allowed responses to occur repeatedly over long periods of time without the constraints of trial onset and offset" (Shull, 1995, p. 354). Instead of departing from behavior-analytic thinking, the view that reinforcers select environment–behavior relations fosters more parsimonious treatments of stimulus control and conditioning, and represents a continuation of Skinner's efforts to provide a compelling moment-to-moment account of behavior (Skinner, 1976).

Further, we concentrate on the rationale behind this view of selection by reinforcement as it is interpreted by biobehaviorally constrained neural networks. Because material relevant to the S-R issue is scattered throughout the book and some of the more technical details are not elaborated, the need for clarification is understandable. We consider first the interpretation of responding in a stable stimulus context and then proceed to a more general examination of the core of the S-R issue. No effort is made to discuss all of its ramifications—the phrase connotes a considerable set of interrelated distinctions that vary somewhat among different theorists (cf. Lieberman, 1993, p. 190; B. Williams, 1986; Zuriff, 1985). Also, no effort is made to provide a historical overview of the S-R issue, although such information is clearly required for a complete treatment of the topic (see Coleman, 1981, 1984; Dinsmoor, 1995; Gormezano & Kehoe, 1981).

BEHAVING IN A STABLE CONTEXT

The central distinction between S-R psychology and the view introduced by Skinner is how one accounts for variability in behavior. The defining feature of S-R psychology is that it explains variability in behavior by reference to variability in antecedents: When a response occurs there must have been some discrete antecedent, or complex of antecedents, overt or covert, that evoked the response. If the response varies in frequency, it is because antecedent events have varied in frequency. On this view, there will always be a nonzero correlation between antecedent events and behavior. Further, frequency of response (or frequency per unit time, i.e., rate) cannot serve as a fundamental dependent variable because response rate is, at root, a function of the rate of stimulus presentation.

In contrast, Skinner held that, even when there is no identifiable variability in antecedents, variability in behavior remains lawful: Behavior undergoes orderly change because of its consequences. In fact, at the level of behavioral observations, one can find lawful relationships between the occurrence of a response and the contingencies of reinforcement in a stable context. Skinner did not merely assert the central role of control by consequences; he persuasively demonstrated it experimentally. Once such control is accepted as an empirical fact and not simply as a theoretical preference, the S-R position
becomes untenable. We also accept control by consequences as an empirical fact, and our networks simulate some of its orderly effects without appealing to correlated antecedent changes in the environment.

Consider the neural network simulation of the reacquisition of an extinguished response that is discussed in LCB (pp. 92–95). In the first phase of the simulation a response was followed by a reinforcer, in the second phase extinction was scheduled for the response, and in the third phase the response was again reinforced. The "sensory inputs" to the network were held constant throughout the simulation. (Note that in a simulation the stimulus context may be held strictly constant, unaffected by moment-to-moment variations in stimulation that inevitably occur in actual experiments.) In the simulation, the strength of the response varied widely even though the context remained constant: Responding increased in strength during acquisition, weakened during extinction, and then increased again during reacquisition, and did so more rapidly than during original acquisition (see Figure 1). Moreover, the changes in response strength were not monotonic, but showed irregularities during the transitions in response strength. None of these changes can be understood by reference to the stimulus context; it remained constant throughout the simulation. Instead, the changes can only be interpreted by reference to the effects of the contingencies of reinforcement on the network and to the history of reinforcement in that context.

Fig. 1. The simulation by a neural network of acquisition (ACQ), extinction (EXT), and reacquisition (REACQ) with an operant contingency. The simulated environmental context activated the input units of the neural network at a constant level of 1 throughout all phases of the simulation. In accordance with an operant contingency, the input unit for the reinforcing stimulus was activated during ACQ and REACQ only when the activation level of the output unit simulating the operant (R) was greater than zero. During EXT, the input unit for the reinforcing stimulus was never activated. (Activation levels of units could vary between 0 and 1.) The activation level of the output unit simulating the conditioned response (CR), which also changed during the conditioning process, is also shown.
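The moment-to-moment logic of that simulation can be sketched in a few lines of code. The sketch below is only a schematic stand-in for the network discussed in LCB: it collapses the network into a single modifiable connection between the constant context and the response, and the learning and extinction rates, the probabilistic emission rule, and the phase lengths are illustrative assumptions rather than the parameters of the published simulation.

```python
import random

def run_phase(weight, n_steps, reinforce, gain=0.15, loss=0.02):
    """Advance the simulation n_steps moments; the simulated context is constant."""
    for _ in range(n_steps):
        context = 1.0                                    # constant activation of the context input
        response = random.random() < context * weight    # probabilistic emission of the operant
        if response and reinforce:
            weight += gain * (1.0 - weight)              # response-contingent strengthening
        elif response and not reinforce:
            weight -= loss * weight                      # weakening only when an emitted response goes unreinforced
    return weight

random.seed(1)
w0 = 0.2                                                 # weak initial tendency to respond
w_acq = run_phase(w0, 100, reinforce=True)               # acquisition
w_ext = run_phase(w_acq, 100, reinforce=False)           # extinction
w_reacq = run_phase(w_ext, 100, reinforce=True)          # reacquisition
print(f"after acquisition: {w_acq:.2f}  after extinction: {w_ext:.2f}  after reacquisition: {w_reacq:.2f}")
```

Because the connection is weakened only on those moments when an unreinforced response is actually emitted, extinction need not return the weight to its starting value; under assumptions like these, that residue is one way the more rapid reacquisition shown in Figure 1 can arise.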
BEHAVIORAL AND NEURAL LEVELS OF ANALYSIS

How do we square the foregoing account with the claim that "what is selected is always an environment–behavior relation, never a response alone" (LCB, p. 68)? The apparent incongruity arises from a confusion of levels of analysis. We have attempted to uncover relationships between two independent sciences: behavior analysis and neuroscience. Specifically, we have made use of physiological mechanisms that we believe are consistent with behavioral laws. S-R psychology and radical behaviorism are both paradigms of a science of behavior; neither includes the underlying physiology in its purview. In a stable context, control by consequences (as opposed to antecedents) stands as a behavioral law, but we propose (at another level of analysis) that the effects of those consequences are implemented by changes in synaptic efficacies. This idea is not new, of course; Watson thought as much (Watson, 1924, p. 209).

Consider how the network accomplishes the simulation discussed above: Changes in the strength of a response occur because of changes in the strengths of connections (simulating changes in synaptic efficacies) along pathways from "upstream" elements. That is, there are changing gradients of control by the constant context as a function of the contingencies of reinforcement. From this perspective, variation in behavior is due to varying consequences, but antecedent events are necessary for the behavior to occur. It is this latter feature of our proposal that encourages the misperception that we are endorsing S-R psychology, because the strength with which an operant unit is activated depends (among other things) on the activation of the inputs of the network by the simulated environment. However, the distinction between S-R psychology and behavior analysis is at the level of behavior, not at the level of biological mechanism. Our networks are intended to simulate aspects of free-operant behavior exhibited by an organism in an experimental chamber and the functioning of the network (i.e., its input–output relations) in conformity with behavioral laws. Thus, we argue that the effects of the consequences of a response are influenced by the context. The analogy with a "black box" is exact: We can eliminate the organism as a variable in our functional relationships, not because the organism is unnecessary, but because it can be ignored in laws of behavior; we treat it as given and as a constant, not as a variable. Similarly, when the context is held constant, it, too, can be ignored, but this does not mean that the context is unnecessary any more than the organism is unnecessary.

In discrimination procedures the context reemerges in our behavioral laws, because it is now a variable. There is a difference between claiming that control by context need not be considered in some situations and claiming that control by context does not exist in those situations. Indeed, Skinner took the first position, not the second (Skinner, 1937). Consider the following: In our simulation of reacquisition, the response gained strength after fewer reinforcers than during original learning because some of the effects of prior reinforcers on the strength of connections within the network had not been completely undone by the intervening period of extinction. The constancy of the context during acquisition and reacquisition played a crucial role in this result because the enduring context permitted some of the same pathways to be activated during both acquisition and reacquisition (cf. LCB, p. 94; Kehoe, 1988). With the simulation, as with a living organism, context sets the occasion for responding, although its influence may not be apparent until the context is changed, in which case "generalization decrement" is said to occur. This necessarily implies control by context.

One can interpret observations at the physiological level in ways that more transparently parallel behavioral laws than the accounts we have offered. For example, consider an important finding from Stein's group (e.g., Stein & Belluzzi, 1988, 1989; Stein, Xue, & Belluzzi, 1993, 1994) that is described in LCB (p. 56). It was found that the frequency of firing of a neuron could be increased by introducing a neuromodulator, such as dopamine, into the synapse following a burst of firing. These findings have been interpreted to mean that neuromodulators increase the bursting activity of neurons in a manner analogous to the strengthening of emitted behavior by contingent reinforcers. An alternative
interpretation of these same facts will be given that is consistent with our view that reinforcers affect input–output relations and not output alone. However, the primary point here is that it is a mistake to categorize accounts at the behavioral level by one's view of the underlying biology. Behavior does not fully constrain biology. To hold otherwise is to endorse the conceptual-nervous-system approach decried by Skinner (1938).

Consider the following alternative interpretation of the finding that an increase in the frequency of firing occurred as a result of the burst-contingent application of a neuromodulator. The finding was attributed to the contiguity between the bursting of the postsynaptic neuron and the introduction of the neuromodulator (a two-term cellular contingency). Such an interpretation is consistent with the finding, but it is not the only possible interpretation. Moreover, other observations complicate the picture: The neuromodulator was not effective after a single spike, but only after a burst of several spikes. An alternative interpretation of these facts, proposed in LCB (pp. 66–67), is that the increase in postsynaptic activity may reflect a heightened sensitivity of the postsynaptic neuron to the release of the neurotransmitter glutamate by presynaptic neurons. The experimental work of Frey (Frey, in press; Frey, Huang, & Kandel, 1993) has shown that dopamine acts in conjunction with the effects of glutamate on the N-methyl-D-aspartate (NMDA) receptor to initiate a second-messenger cascade whose ultimate outcome is an enhanced response of non-NMDA receptors to glutamate. On this view, the ineffectiveness of dopamine after single spikes occurs because bursting is necessary to depolarize the postsynaptic membrane sufficiently to engage the voltage-sensitive NMDA receptor. Accordingly, the increased bursting observed after burst-contingent microinjections of dopamine reflects an enhanced response of the postsynaptic neuron to presynaptic activity (a three-term cellular contingency involving the conjunction of presynaptic and postsynaptic activity with dopamine). To conclude that bursting is independent of presynaptic activity when presynaptic activity has not been measured is to risk mistaking absence of evidence for evidence of absence. In short, interpreting these very important neural observations presents the same conceptual challenge as does interpreting control by a stable context at the behavioral level.

The mechanisms proposed both by Stein and by us are consistent with the behavioral phenomena that led Skinner to break from S-R psychology—an increase in responding following the occurrence of a response-contingent reinforcer in the absence of a specified antecedent. However, we prefer our proposals at both the behavioral and neural levels because they can accommodate behavior in discrimination procedures as well as in stable contexts. It appears to us that proposals that do not specify a three-term contingency must be supplemented by something akin to our proposal in order to account for discriminated behavior, in which case the former proposed mechanisms would be redundant. Ultimately, of course, the interpretation of the cellular results is an empirical matter requiring simultaneous measurement of all terms in the three-term contingency: antecedent events (presynaptic activity), subsequent events (postsynaptic activity), and consequences (the neuromodulator). Both proposals have the merit of showing that behavior analysis can be quite smoothly integrated with what is known about the nervous system. This remains but an elusive dream in normative (i.e., inferred-process) psychology.

In brief, principles formulated on the basis of behavioral observations do not tightly constrain the potential physiological mechanisms that implement the functional relations described by behavioral principles, and physiological mechanisms do not dictate the most effective statement of principles at the behavioral level. The two levels of analysis must yield consistent principles but, as Skinner pointed out (1938, p. 432), nothing that is learned about the physiology of behavior can ever undermine valid behavioral laws.

THE MOMENT-TO-MOMENT CHARACTER OF BIOBEHAVIORAL PROCESSES

Basic to the disposition of the S-R issue is an even more fundamental matter: whether functional relations at the behavioral level are best viewed as emergent products of the outcome of moment-to-moment interactions between the organism and its environment or
whether such regularities are sui generis (i.e., understandable only at the level at which they appear). Skinner clearly favored moment-to-moment analyses (e.g., Ferster & Skinner, 1957). Consider the following statements in "Farewell, my lovely!" in which Skinner (1976) poignantly lamented the decline of cumulative records in the pages of this journal.

    What has happened to experiments where rate changed from moment to moment in interesting ways, where a cumulative record told more in a glance than could be described in a page? . . . [Such records] . . . suggested a really extraordinary degree of control over an individual organism as it lived its life from moment to moment. . . . These "molecular" changes in probability of responding are most immediately relevant to our own daily lives. (Skinner, 1976, p. 218)

Skinner's unwavering commitment to a moment-to-moment analysis of behavior (cf. Skinner, 1983, p. 73) has profound—and underappreciated—implications for the resolution of the S-R issue as well as for other central distinctions in behavior analysis, including the distinction between operant and respondent conditioning itself.

Stimulus Control of Behavior

In LCB, an organism is described as "immersed in a continuous succession of environmental stimuli . . . in whose presence a continuous succession of responses . . . is occurring. . . . When a [reinforcing] stimulus is introduced into this stream of events, then . . . selection occurs (cf. Schoenfeld & Farmer, 1970)" (p. 49). At the moment when the reinforcer occurs—what Skinner more casually referred to as "the moment of Truth"—some stimulus necessarily precedes the reinforced response in both differential and nondifferential conditioning. That is, at the "moment of reinforcement" (Ferster & Skinner, 1957, pp. 2–3), there is no environmental basis by which to distinguish between the two contingencies. Therefore, no basis exists by which different processes could be initiated for nondifferential as contrasted with differential conditioning (i.e., response strengthening in the first instance and stimulus control of strengthening in the second). If control by contextual stimuli does not occur in nondifferential conditioning, then discrimination becomes an anomaly and requires ad hoc principles that differ from those that accommodate nondifferential conditioning. In such a formulation, the environment would become empowered to control behavior when there were differential consequences, but not otherwise. But, is it credible that reinforcers should strengthen behavior relative to a stimulus with one procedure and not with the other? And, if so, what events present at the "moment of reinforcement" are available to differentiate a reinforced response in a discrimination procedure from a reinforced response in a nondiscrimination procedure? The conclusion that no such events exist led Dinsmoor (1995, p. 52) to make much the same point in citing Skinner's statement that "it is the nature of [operant] behavior that . . . discriminative stimuli are practically inevitable" (Skinner, 1937, p. 273; see also Catania & Keller, 1981, p. 163).

During differential operant conditioning, stimuli are sensed in whose presence a response is followed by a reinforcer. But environment–behavior–reinforcer sequences necessarily occur in a nondiscrimination procedure as well. The two procedures differ with respect to the reliability with which particular stimuli are present prior to the reinforced response, but that difference cannot be appreciated on a single occasion. The essence of reliability is repeatability. The distinction emerges as a cumulative product of the occurrence of reinforcers over repeated individual occasions. In laboratory procedures that implement nondifferential conditioning, it is not that no stimuli are sensed prior to the response–reinforcer sequence, but that no stimuli specifiable by the experimenter are reliably sensed prior to the sequence.

Conditioning of Behavior

Paradoxically, by strictly parallel reasoning, an acceptance of Skinner's commitment to a moment-to-moment analysis of behavior compels a rejection of a fundamental distinction between the conditioning processes instantiated by respondent and operant procedures. Instead, a moment-to-moment analysis calls for a unified theoretical treatment of the conditioning process, with the environmental control of responding as the cumulative outcome of both procedures.
If an organism is continuously immersed in an environment and is continuously behaving in that environment, then both stimulus and response events necessarily precede and, hence, are potentially affected by the occurrence of a reinforcer regardless of the contingency according to which the reinforcer occurs. In a respondent procedure a specified stimulus, the conditioned stimulus (CS), occurs before the unconditioned stimulus (US). The CS is likely to become a constituent of the selected environment–behavior relation because of its temporal relation to the US. The behavioral constituent of the selected relation includes the response elicited by the US, the unconditioned response (UR). However, because organisms are always behaving, other responses may also precede the US (e.g., orienting responses to the CS; Holland, 1977), although these responses may vary somewhat from moment to moment. As an example of a respondent procedure, if a tone precedes the introduction of food into the mouth, then the tone may continue to guide turning the head toward the source of the tone and come to guide salivating elicited by food. In the operant procedure, the contingency ensures that a specific behavior—the operant—occurs before the reinforcer. Because of its proximity to the reinforcer, the operant is then also likely to become a part of the selected environment–behavior relation. However, because behavior always takes place in an environment, some stimulus must precede the reinforcer although the particular stimulus may vary from moment to moment. For example, a rat may see or touch the lever prior to pressing it and receiving food. From this perspective, respondent and operant conditioning are two different procedural arrangements (i.e., contingencies) that differ with respect to the environmental and behavioral events that are reliably contiguous with the reinforcer. But, this procedural difference need not imply different conditioning processes (LCB, pp. 49–50; cf. Donahoe, Burgos, & Palmer, 1993; Donahoe, Crowley, Millard, & Stickney, 1982, pp. 19–23).

The view that reinforcers select environment–behavior relations whatever the procedure and that various procedures differ among themselves in the stimuli and responses that are likely to be present at the moment of selection is consistent with central aspects of Skinner's thinking. As noted in LCB,

    Although Skinner's treatment of respondent and operant conditioning emphasized the differences between the two procedures and their outcomes, the present treatment is consistent with his emphasis on the ubiquity of what he called the "three-term contingency" (Skinner, 1938, 1953). That is, the reinforcement process always involves three elements—a stimulus, a response, and a reinforcer. There is nothing in a unified treatment of classical and operant conditioning that minimizes the crucially important differences between the outcomes of the two procedures for the interpretation of complex behavior. However, a unified principle does deeply question the view that classical and operant procedures produce two different "kinds" of learning or require fundamentally different theoretical treatments. Both procedures select environment–behavior relations but, because of the differences in the events that reliably occur in the vicinity of the reinforcer, the constituents of the selected relations are different. (LCB, p. 65, emphasis added)

Acknowledging that the organism is always behaving in the presence of some environment refines the conceptual treatment of respondents and operants by grounding the distinction on the reliability with which specific stimulus and response events are affected by the two contingencies (cf. Palmer & Donahoe, 1992). On a single occasion, there is no basis by which to distinguish a respondent from an operant procedure (cf. Hilgard & Marquis, 1940; Hineline, 1986, p. 63). Others, such as Catania, have appreciated this point:

    It is not clear what differential contingencies could be the basis for discrimination of the contingencies themselves. If we argue that some properties of the contingencies must be learned, to what contingencies can we appeal as the basis for that learning? (Catania & Keller, 1981, p. 163)

The difference in procedures produces crucial differences in their ultimate outcomes, but those different outcomes emerge cumulatively over successive iterations of the same reinforcement process acting in accordance with the specific contiguities instantiated by the procedures. A commitment to a moment-to-moment analysis unavoidably commits one
to the view that reinforcers select environment–behavior relations, not behavior alone. At the "moment of Truth"—whether in a respondent or an operant procedure or in a discrimination or nondiscrimination procedure—the reinforcing stimulus accompanies both environmental and behavioral events. Hence, even if fundamentally different conditioning processes existed for the various procedures, there would be no environmental basis by which one or the other could be appropriately invoked (cf. Donahoe et al., 1982, 1993, pp. 21–22).

In short, we have been misled into searching for different processes to account for respondent and operant conditioning and for nondifferential and differential conditioning, as well as for more complex discrimination procedures (cf. Sidman, 1986), by the language of contingency. Contingency, as the term is conventionally used in behavior analysis, refers to relations between events that are defined over repeated instances of the constituent events. We describe our experimental procedures in terms of the manipulation of contingencies, but, by changing the contingencies, we change the contiguities. In our search for the controlling variables, we have confused the experimenter's description of the contingencies with the organism's contact with the contiguities instantiated by those contingencies. And, of course, it is the organism's contact with events, not the experimenter's description of them, that must be the basis for selection by reinforcement.

Contingency is the language of procedure; contiguity is the language of process. We have not thoroughly researched Skinner's use of the term contingency, but he employed it, at least sometimes, in a manner that is synonymous with contiguity. For example, "there appears to be no way of preventing the acquisition of non-advantageous behavior through accident. . . . It is only because organisms have reached the point at which a single contingency makes a substantial change that they are vulnerable to coincidences" (Skinner, 1953, pp. 86–87, emphasis added; cf. Catania & Keller, 1981, p. 128). (The meaning of contingency as a coincidental relation between events is, in fact, the primary meaning in many dictionaries, although in behavior analysis it much more often denotes reliable relations.)

Relation of Momentary Processes to Molar Regularities

Skinner was resolutely committed to a moment-to-moment account at the behavioral level of analysis, although he did not acknowledge that this view would call for a reassessment of the conceptual distinction between operant and respondent conditioning (but not the crucial differences between these procedures and their corresponding outcomes). His early adherence to a moment-to-moment analysis is apparent in the experimental observation that, under properly controlled circumstances, even a single occurrence of a lever press followed by food changes behavior (Skinner, 1938). Skinner's discussions of superstitious conditioning echo the same theme: Momentary temporal relations may promote conditioning (see also Pear, 1985; Skinner, 1953, pp. 86–87; for alternative interpretations, cf. Staddon & Simmelhag, 1971; Timberlake & Lucas, 1985):

    A stimulus present when a response is reinforced may acquire discriminative control over the response even though its presence at reinforcement is adventitious. (Morse & Skinner, 1957, p. 308)

And,

    to say that a reinforcement is contingent upon a response may mean nothing more than that it follows the response. . . . conditioning takes place because of the temporal relation only, expressed in terms of the order and proximity of response and reinforcement. (Skinner, 1948, p. 168)

The centrality of momentary temporal relations has also been affirmed by students of respondent conditioning. Gormezano and Kehoe, speaking within the associationist tradition, state,

    A single instance of contiguity between A and B may establish an association, repeated instances of contiguity were necessary to establish a cause-effect relation. (p. 3)

    Any relationship of "pairing" or "correlation" can be seen to be an abstraction of the record. (Gormezano & Kehoe, 1981, p. 31)

Moment-to-moment accounts of the conditioning process are also consistent with observations at the neural level. For example, Stein's work indicates that the reinforcing effect of the neuromodulator dopamine occurs
only when it is introduced into the synapse within 200 ms of a burst of firing in the postsynaptic neuron (Stein & Belluzzi, 1989). Behavior analysis and neuroscience are independent disciplines, but their principles cannot be inconsistent with one another's findings. The two sciences are dealing with different aspects of the same organism (LCB, pp. 275–277; Skinner, 1938).

Although conditioning processes are instantiated in moment-to-moment relations between events, compelling regularities sometimes appear in the relation between independent and dependent variables defined over more extended periods of time (e.g., between average rate of reinforcement and average rate of responding; Baum, 1973; Herrnstein, 1970). What is the place of molar regularities in a science if its fundamental processes operate on a moment-to-moment basis? Nevin's answer to this question seems very much on the mark: "The possibility that molar relations . . . may prove to be derivative from more local processes does nothing to diminish their value as ways to summarize and integrate data" (Nevin, 1984, p. 431; see also Herrnstein, 1970, p. 253). The conceptual relation between moment-to-moment processes and molar regularities in behavior analysis parallels the distinction between "selection for" and "selection of" in the paradigmatic selectionist science of evolutionary biology (Sober, 1984). Insofar as the notions of cause and effect have meaning in the context of the complex interchange between an organism and its environment: "'Selection for' describes the causes, while 'selection of' describes the effects" (Sober, 1993, p. 82). In evolutionary biology, selection for genes affecting reproductive fitness leads to selection of altruistic behavior (Hamilton, 1964). As the distinction applies in behavior analysis, reinforcers cause certain environment–behavior relations to be strengthened; this has the effect, under some circumstances, of producing molar regularities. Selection by reinforcement for momentary environment–behavior relations produces selection of molar regularities.

One can demonstrate that what reinforcers select are momentary relations between environmental and behavioral events, not the molar regularities that are their cumulative products. This can be done by arranging contingencies of reinforcement that pit moment-to-moment processes against molar regularities. Under these circumstances, the variation in behavior typically tracks moment-to-moment relations, not relations between events defined over more extended periods of time. For example, with positive reinforcers, differential reinforcement of responses that occur at different times following the previous response (i.e., differential reinforcement of interresponse times, or IRTs) changes the overall rate of responding even though the overall rate of reinforcement is unchanged (Platt, 1979). As conjectured by Shimp (1974, p. 498), "there may be no such thing as an asymptotic mean rate of [responding] that is . . . independent of reinforced IRTs" (cf. Anger, 1956). Similarly, in avoidance learning, when the delay between the response and shock is varied but the overall rate of shock is held constant, the rate of avoidance responding is sensitive to the momentary delay between the response and shock, not the overall rate of shock (Hineline, 1970; see also Benedict, 1975; Bolles & Popp, 1964). Research with respondent procedures has led in the same direction: Molar regularities are the cumulative products of moment-to-moment relations. For example, whereas at one time it was held that behavior was sensitive to the overall correlation between conditioned and unconditioned stimuli (Rescorla, 1967), later experiments (Ayres, Benedict, & Witcher, 1975; Benedict & Ayres, 1972; Keller, Ayres, & Mahoney, 1977; cf. Quinsey, 1971) and theoretical work (Rescorla & Wagner, 1972) demonstrated that molar regularities could be understood as the cumulative products of molecular relations between CS and US. In summary, research with both operant and respondent procedures has increasingly shown that molar regularities are the cumulative products of moment-to-moment conditioning processes. (For initial work of this sort, see Neuringer, 1967, and Shimp, 1966, 1969, 1974. For more recent efforts, see Herrnstein, 1982; Herrnstein & Vaughan, 1980; Hinson & Staddon, 1983a, 1983b; Moore, 1984; Silberberg, Hamilton, Ziriax, & Casey, 1978; Silberberg & Ziriax, 1982.)

It must be acknowledged, however, that not all molar regularities can yet be understood as products of molecular processes (e.g., behavior maintained by some schedules or by
long reinforcer delays; Heyman, 1979; Hineline, 1981; Lattal & Gleeson, 1990; Nevin, 1969; B. Williams, 1985). Refractory findings continue to challenge moment-to-moment accounts, and a completely integrated theoretical treatment of molar regularities in terms of molecular processes still eludes us (cf. B. Williams, 1990). Difficulties in providing moment-to-moment accounts of molar regularities in complex situations are not peculiar to behavior analysis. Physics continues to struggle with many-body problems in mechanics, even though all of the relevant fundamental processes are presumably known. Nevertheless, it is now clear that behavior analysis is not forced to choose between molar and moment-to-moment accounts (e.g., Meazzini & Ricci, 1986, p. 37). The two accounts are not inconsistent if the former are regarded as the cumulative product of the latter.

Indeed, the two accounts may be even more intimately intertwined: In the evolutionary history of organisms, natural selection may have favored genes whose expression yielded moment-to-moment processes that implemented certain molar regularities as their cumulative product (LCB, pp. 112–114; Donahoe, in press-b; cf. Skinner, 1983, p. 362; Staddon & Hinson, 1983). Natural selection for some molar regularity (e.g., maximizing, optimizing, matching) may have led to selection of moment-to-moment processes whose product was the molar regularity. In that way, natural selection for the molar regularity could lead to selection of momentary processes. Once those moment-to-moment processes had been naturally selected, selection by reinforcement for momentary environment–behavior relations could, in turn, cause selection of the molar regularity. Note, however, that to formulate the reinforcement process in terms of the molar regularities it produces, rather than the moment-to-moment processes that implement it, is to conflate natural selection with selection by reinforcement. The selecting effect of the temporally extended environment is the province of natural selection; that of the moment-to-moment environment is the province of selection by reinforcement. Of course, many momentary environments make up the temporally extended environment, but selection by reinforcement is for the former environments, whereas natural selection is for the latter.

Additional experimental work is needed to determine how moment-to-moment processes may lead to molar regularities, but the effort will undoubtedly also require interpretation (Donahoe & Palmer, 1989, 1994, pp. 125–129). In the final section of this essay, interpretation by means of adaptive neural networks is used to clarify the contribution of momentary processes to the central issue: the S-R issue.

NEURAL NETWORK INTERPRETATIONS OF CONDITIONING

We turn finally to the question of whether biobehaviorally constrained neural networks can faithfully interpret salient aspects of the stimulus control of operants. The full answer to this question obviously lies in the future; however, preliminary results are encouraging (e.g., Donahoe et al., 1993; Donahoe & Dorsel, in press; Donahoe & Palmer, 1994). Our concern here is whether, in principle, networks "constructed from elementary connections" that are said to be "analogues of stimulus–response relations" can accommodate the view that "operant behavior occurs in a stimulus context, but there is often no identifiable stimulus change that precedes each occurrence of the response" (Shull, 1995, p. 354). This view of operants is rightly regarded as "liberating" because it empowers the study of complex reinforcement contingencies in the laboratory and because it frees applied behavior analysis from the need to identify the precise controlling stimuli for dysfunctional behavior before instituting remedial interventions. Indeed, it can be argued that pragmatic considerations motivated the operant-respondent distinction more than principled distinctions about the role of the environment in emitted and elicited behavior.

The present inquiry into neural network interpretations of operants can be separated into two parts: The first, and narrower, question is: Do neural networks implement "analogues of stimulus–response relations"? The second is: Are neural networks capable of simulating the effects of nondifferential as well as differential operant contingencies?
Interpreting Environment–Behavior Relations

A neural network consists of (a) a layer of input units whose activation levels simulate the occurrence of environmental events, (b) one or more layers of "hidden" or interior units whose activation levels simulate the states of interneurons, and (c) a layer of output units whose activation levels simulate the effectors that produce behavioral events (cf. Donahoe & Palmer, 1989). If a stimulus–response relation denotes a relation that is mediated by direct connections going from input to output units, then such relations are not, in general, characteristic of neural networks. Although a simple network consisting of only such input–output connections (a so-called perceptron architecture; Rosenblatt, 1962) can mediate a surprising range of input–output relations, some relations that are demonstrable in living organisms are beyond the capabilities of these networks (Minsky & Papert, 1969). In contrast, networks with nonlinear interior units, which more closely simulate the networks of neurons in the nervous system, are typical of modern neural network architectures. Such multilayered networks have already demonstrated their ability to mediate a substantial range of complex environment–behavior relations that are observed with living organisms (e.g., Kehoe, 1988, 1989; cf. McClelland, Rumelhart, & the PDP Research Group, 1986; Rumelhart, McClelland, & the PDP Research Group, 1986). Thus, neither neuroscience nor neural network research endorses formulations in which stimuli guide behavior by means of direct connections akin to monosynaptic reflexes. (We would also note that not even traditional S-R learning theorists—e.g., Guthrie, 1933; Hull, 1934, 1937; Osgood, 1953—held such a simple view of the means whereby the environment guided behavior. In many of their proposals, inferred processes, such as the rg-sg mechanism, intervened between the environment and behavior.)

The neural network research of potential interest to behavior analysts is distantly related to what was earlier called S-O-R psychology (where O stood for organism). However, acknowledging a role for the organism in no way endorses an autonomous contribution of the organism: All contributions must be traceable to the environment, that is, to histories of selection by the ancestral environment as understood through natural selection and by the individual environment as understood through selection by reinforcement. Also, to be congenial with behavior analysis, all intraorganismic events must be the product of independent biobehavioral research; they cannot be inferences from behavior alone. For instance, the organismic counterparts of hidden units are not merely inferences from a behavioral level of observation but are observed entities from a neural level.

In the case of our neural network research, when input units are stimulated by the simulated occurrence of environmental stimuli, the interior units to which those input units are connected are probabilistically activated in the following moment. If a reinforcing signal is present at that moment, then connections are strengthened between input units and all recently activated interior units to which they are connected. The process of strengthening the connections between coactive pre- and postsynaptic units is carried out simultaneously throughout the network at each moment until the end of the simulated time period. The activation levels of units decay over time unless they were reactivated during the preceding moment. Simulations in which the strengths of connections are changed from moment to moment are known as "real-time" simulations, and the successive moments at which the strengths of connections are changed (or "updated") are called "time steps." Stated more generally, real-time neural network simulations implement a dynamical systems approach to the interpretation of behavior (cf. Galbicka, 1992).
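One way to make the structure of such a time step concrete is sketched below. This is a schematic illustration, not the algorithm of Donahoe, Burgos, and Palmer (1993): the unit names, the decay constant, the learning rate, and the probabilistic activation rule are assumptions chosen only to show how activations and connection weights can both be updated at every simulated moment.

```python
import random

DECAY = 0.5    # assumed fraction of activation retained when a unit is not restimulated
RATE = 0.10    # assumed increment for connections between recently coactive units

def time_step(weights, activations, inputs, reinforcer_present):
    """Advance all units by one simulated moment and, if a reinforcing signal is
    present, strengthen every connection whose pre- and postsynaptic units were
    recently coactive (updates apply simultaneously throughout the network)."""
    new_activations = {}
    for unit, previous in activations.items():
        drive = inputs.get(unit, 0.0) + sum(
            weights.get((pre, unit), 0.0) * pre_activity
            for pre, pre_activity in activations.items())
        if random.random() < min(drive, 1.0):          # probabilistic activation in the following moment
            new_activations[unit] = 1.0
        else:
            new_activations[unit] = previous * DECAY   # otherwise the activation decays
    if reinforcer_present:
        for (pre, post), w in weights.items():
            coactivity = activations[pre] * new_activations[post]
            weights[(pre, post)] = w + RATE * coactivity * (1.0 - w)
    return new_activations

# Usage: a constant context drives the input unit; the reinforcing signal is
# delivered whenever the simulated operant unit "r" has just been strongly active.
units = {"s1": 0.0, "h1": 0.0, "r": 0.0}
weights = {("s1", "h1"): 0.3, ("h1", "r"): 0.3}        # weak initial connections (assumed)
for t in range(200):
    units = time_step(weights, units, inputs={"s1": 1.0},
                      reinforcer_present=units["r"] > 0.5)
```

The design point the sketch is meant to display is that nothing is computed over trials or sessions: every quantity that changes does so within a single time step, and any molar regularity in the simulated behavior has to emerge from the accumulation of such moments.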
In a fully realized simulation, the simulated processes that change the strengths of connections, or "connection weights," and the durations of time steps are tightly constrained by independent experimental analyses of neuroscience and behavior (e.g., Buonomano & Merzenich, 1995) and, at a minimum, are consistent with what is known about such processes. Skinner's dictum (1931) that behavior should be understood at the level at which orderly relations emerge applies with equal force to the neural level. Although connection weights are updated on a moment-to-moment basis, the functioning of a network
cannot be understood solely by reference to the environment of the moment: Connection weights at any given moment are a function of the entire selection history of the network to that point. Networks, like organisms, are historic systems whose current performance cannot be understood by reference to the environment of the moment alone (Staddon, 1993; cf. Donahoe, 1993).

Interpreting Behavior in Nondiscrimination Procedures

Before we describe computer simulations that illustrate interpretations of the conditioning of operants, the view that some operants may be uninfluenced by antecedent stimuli requires closer examination. Upon inspection, experimental situations that meet the definition of a nondiscrimination procedure typically contain implicit three-term contingencies. For example, consider a situation in which a pigeon is presented with a response key of a constant green color and key pecking is reinforced with food on some schedule of intermittent reinforcement. Because no other conditions are manipulated by the experimenter, the arrangement is appropriately described as a nondiscrimination procedure. Note, however, that pecking is more likely to be reinforced if the pigeon's head is oriented toward the green key than if it is oriented toward some other stimulus in the situation; pigeons tend to look at what they peck (Jenkins & Sainsbury, 1969). Thus, the observing response of orienting toward the green key is reinforced as a component of a behavioral chain whose terminal response is pecking the green key. Stated more generally, observing responses are often implicitly differentially reinforced in nondiscrimination procedures, and the stimuli produced by such responses are therefore more likely to be sensed prior to the reinforced response. As a result, such stimuli come to control the response (Dinsmoor, 1985; cf. Heinemann & Rudolph, 1963). Moreover, a schedule of reinforcement that is implemented in an environment in which the experimenter has not programmed a relation between features of the environment and the response–reinforcer contingency may nonetheless contain stimuli in whose presence the reinforced response differentially occurs. This relation obtains when nonrandom environment–behavior relations arise by virtue of the organism's interaction with the environment. And, such interactions generally occur because all responses are not equally likely in the presence of all stimuli. Rats are more apt to make forelimb movements approximating lever pressing (e.g., climbing movements) in environments that contain protruding horizontal surfaces than in environments that are devoid of such features. Behavior is directed toward objects and features of objects, not thin air. When an external environment includes stimuli that make certain behavior more probable, that environment was said to provide "means-end-readinesses" by Tolman (1932) and "affordances" by Gibson (1979).

In addition to stimuli provided by the environment, the organism's own prior behavior produces stimuli that become available to guide further responding. As an example of behaviorally generated stimuli, on ratio schedules a response is more apt to be reinforced following sensory feedback from a burst of prior responses than following feedback from a single prior response (Morse, 1966; D. Williams, 1968). Ferster and Skinner's seminal work, Schedules of Reinforcement (1957), is replete with proposals for stimuli that could function as discriminative stimuli in nondiscrimination procedures (see also Blough, 1963; Hinson & Staddon, 1983a, 1983b).

Interpreting Context in Simulations of Operant Conditioning

In the simulation of acquisition, extinction, and reacquisition in a stable environment, the role of context could safely be ignored. However, for reasons noted earlier, control by elements of the context may occur, and that control can be simulated by selection networks, the type of adaptive neural network proposed in LCB. Selection networks consist of groups of input units, of interior units simulating neurons in sensory association cortex whose connection strengths are modified by hippocampal efferents, of interior units simulating neurons in motor association cortex whose connection strengths are modified by ventral-tegmental efferents, and of output units.

Figure 2 provides an example of the architecture of a simple selection network (for details, see Donahoe et al., 1993; LCB, pp. 237–239).
Fig. 2. A minimal architecture of a selection network for simulating operant conditioning. Environmental events
stimulate primary sensory input units (S1, S2, and S3) that give rise to connections that activate units in sensory
association areas and, ultimately, units in motor association and primary motor areas. One primary motor output
unit simulates the operant response (R). When the R unit is activated, the response–reinforcer contingency imple-
mented by the simulation stimulates the SR input unit, simulating the reinforcing stimulus. Stimulating the SR unit
activates the subcortical dopaminergic system of the ventral tegmental area (VTA) and the CR/UR output unit
simulating the reinforcer-elicited response (i.e., the unconditioned response; UR). Subsequent to conditioning, en-
vironmental events acting on the input units permit activation of the R and CR/UR units simulating the operant
and conditioned response (CR), respectively. The VTA system modifies connection weights to units in motor asso-
ciation and primary motor areas and modulates the output of the hippocampal system. The output of the hippocam-
pal system modifies connection weights to units in sensory association areas. Connection weights are changed as a
function of moment-to-moment changes in (a) the coactivity of pre- and postsynaptic units and (b) the discrepancies
in diffusely projecting systems from the hippocampus (d1) and the VTA (d2). The arrowheads point toward those
synapses that are affected by activity in the diffusely projecting systems. Finer lines indicate pathways whose connection
weights are modified by the diffusely projecting systems. Heavier lines indicate pathways that are functional from the
outset of the simulation due to natural selection. (For additional information, see Donahoe et al., 1993; Donahoe &
Dorsel, in press; Donahoe & Palmer, 1994.)
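As a reading aid for the caption above, the layers and diffusely projecting systems of Figure 2 can be summarized schematically in code. The layer names follow the caption, but the grouping, the unit counts, and the dictionary layout are assumptions made for illustration, not the configuration used in the reported simulations.

```python
# Schematic summary of the Figure 2 selection network (layer names follow the
# caption; unit counts and the dictionary layout itself are illustrative assumptions).
selection_network = {
    "input_units": ["S1", "S2", "S3", "SR"],          # environmental events and the reinforcing stimulus
    "sensory_association_units": ["sa1", "sa2"],      # connection weights to these units modified via hippocampal output (d1)
    "motor_association_units": ["ma1", "ma2"],        # connection weights to these units modified via VTA output (d2)
    "output_units": ["R", "CR/UR"],                   # operant and reinforcer-elicited (conditioned/unconditioned) responses
    "diffusely_projecting_systems": {
        "hippocampal_d1": "modifies connection weights to units in sensory association areas",
        "VTA_d2": "modifies connection weights to units in motor association and primary motor areas; modulates hippocampal output",
    },
    "pathways_functional_from_the_outset": ["SR -> VTA", "SR -> CR/UR"],  # heavier lines in Figure 2, due to natural selection
}
```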

A stable context may be simulated using a network with three input units (S1, S2, and S3). In the first simulation, S1 was continuously activated with a strength of .75, simulating a salient feature of the environment (e.g., the wavelength on a key for a pigeon). S2 and S3 were continuously activated with strengths of .50, simulating less salient features of the environment (e.g., the masking noise in the chamber, stimuli from the chamber wall adjacent to the key, etc.). (No simulation can fully capture the complexity and richness of even the relatively impoverished environment of a test chamber and the relatively simple contingencies programmed therein; Donahoe, in press-a.) Whenever the output unit simulating the operant became activated, a reinforcing stimulus was presented and all connections between recently coactive units were slightly strengthened. After training in which the full context set the occasion for the operant, probe tests were conducted in which each of the three input units making up the context was activated separately and in various combinations.
when even the most salient stimulus, S1, was


presented alone and out of context, the op-
erant unit was activated only at a level slightly
above .25. As noted in LCB, ‘‘the environ-
ment–behavior relation selected by the rein-
forcer depends on the context in which the
guiding stimulus appears’’ (p. 139). And, ‘‘a
stimulus that has been sensed and discrimi-
nated may fail to guide behavior when it oc-
curs outside the context in which the discrim-
ination was acquired’’ (p. 154). The less
salient components of the context, S2 and S3,
activated the operant unit hardly at all,
whether they occurred by themselves or in
combination. It was only when S1 was pre-
sented in the partial context of either S2 or
S3 that the operant unit was strongly activat-
ed, although still not as strongly as in the full
context.
The lower panel of Figure 3 shows that the
effect of context may be even more subtly ex-
pressed when no aspect of the context is es-
pecially salient. In this simulation, the S1
component of the context was activated at a
level of .60 (instead of .75 as in the first sim-
ulation), and the S2 and S3 components were
activated at a level of .50 as before. Now,
when probe tests were simulated, the operant
output unit was appreciably activated only by
the full context and not by the components,
either singly or in combination.
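The context-conditioning procedure just described (training in the full S1–S2–S3 context followed by probe tests of the components) can be caricatured in a few lines. The sketch below is a deliberately reduced, single-layer stand-in for a selection network; the learning rate, threshold, and spread are illustrative assumptions, and it reproduces only the qualitative pattern discussed in the text (the full training context activates R most strongly, single stimuli much less so), not the published values.

```python
import math

def activation(drive, spread=0.25, threshold=1.2):
    # Logistic activation of the operant unit R; constants are illustrative.
    return 1.0 / (1.0 + math.exp(-(drive - threshold) / spread))

weights = {"S1": 0.05, "S2": 0.05, "S3": 0.05}         # initial connection weights
train_context = {"S1": 0.75, "S2": 0.50, "S3": 0.50}   # activations in the first simulation
RATE = 0.05                                            # illustrative learning rate

# 100 simulated reinforcers: each pass stands for one reinforced occurrence of
# the operant in the full training context; connections from active input
# units to the coactive R unit are slightly strengthened.
for _ in range(100):
    for s, a in train_context.items():
        weights[s] += RATE * a * (1.0 - weights[s])

# Probe tests: stimuli presented separately and in combination.
probes = {
    "S1 alone": {"S1": 0.75},
    "S2 alone": {"S2": 0.50},
    "S1 + S2": {"S1": 0.75, "S2": 0.50},
    "S2 + S3": {"S2": 0.50, "S3": 0.50},
    "full context (TRAIN)": train_context,
}
for label, pattern in probes.items():
    drive = sum(weights[s] * a for s, a in pattern.items())
    print(f"{label:>22}: R activation = {activation(drive):.2f}")
```

The design choice to make the threshold exceed what any single stimulus can supply is what produces context dependence in this toy: only combinations of trained inputs drive R strongly.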
Fig. 3. Simulation results showing the mean activation levels of the operant output unit (R) after conditioning in a stable context consisting of three stimuli (S1, S2, and S3). In the upper panel, S1 was more salient than the other stimuli and activated the S1 input unit at a level of .75 rather than .50 for the S2 and S3 units. In the lower panel, S1 was only slightly more salient than the other stimuli and activated the S1 input unit at a level of .60. The height of each bar represents the mean activation of R by the various stimuli and combinations of stimuli making up the context, including the full context of S1, S2, and S3 used in training (TRAIN).

As simulated by selection networks, the environmental guidance of behavior, whether by a specified discriminative stimulus or by components of a variable context, is described in LCB as follows:

   Since there are generally a number of possible paths between the relevant input and output units, and since the active pathways mediating the selected input-output relation are likely to vary over time, the selected pathways include a number of alternative paths between the input and output units. Within the network—and that portion of the nervous system the network is intended to simulate—an input unit evokes activity in a class of pathways between the input and output units. At the end of selection, the discriminative [or contextual] stimulus that activates the input units does not so much elicit the response as permit the response to be mediated by one or more of the selected pathways in the network. The . . . stimulus does not elicit the response; it permits the response to be emitted by the organism. (LCB, p. 148)

   On the level of the nervous system, this is the counterpart of Skinner’s distinction between elicited responses (respondents) and emitted responses (operants); Skinner, 1937. (LCB, p. 151)

Because, in general, behavior is not the result of the environment activating an invariant and rigidly circumscribed set of pathways, LCB prefers to speak of behavior as being “guided” rather than controlled by the environment. (As an aside, the phrase “environmental guidance of behavior” has also been found to have certain tactical advantages over “stimulus control of behavior” when seeking a fair hearing for behavior-analytic interpretations of human behavior.)

The foregoing simulations illustrate the context dependence of the conditioning process when an operant is acquired in the stable environment of a nondiscrimination procedure. Our previous simulation research has demonstrated that an operant may be brought under more precise stimulus control: When a discrimination procedure was simulated, the controlling stimuli were restricted to those that most reliably preceded the reinforced response (cf. Donahoe et al., 1993; LCB, p. 78). Thus, the same learning algorithm that modifies the strengths of connections in the same selection-network architecture can simulate important conditioning phenomena as its cumulative effect with either a nondiscrimination or a discrimination procedure.

Interpreting the Requirements for Operant Conditioning

Simulation techniques can be applied to the problem of identifying the necessary and sufficient conditions for learning in selection networks. What are the contributions of the stimulus, the two-term response–reinforcer contingency, and the three-term stimulus–response–reinforcer contingency to operant conditioning? And, what role, if any, is played by intranetwork variables that affect the “spontaneous” activity of units?

Consider the question: What is the baseline activation level of the operant unit (i.e., its operant level) when stimuli are applied to input units but without consequences for activity induced in any other units in the network? In living organisms, this condition is imperfectly realized because stimulus presentations by themselves have effects (e.g., habituation, sensitization, or latent inhibition) even when responding has no programmed consequences. However, in a simulation the input units can be stimulated when the algorithms that modify connection weights are disabled. In the present case, when the S1, S2, and S3 input units were stimulated as in the first simulation of context conditioning but with no change in connection weights, the mean activation of the operant output unit during 200 trials was only .09. Thus, stimuli did not evoke activity in the operant unit to any appreciable degree; that is, responding was not elicited.

Turn now to the question: Does conditioning occur if activity of the operant unit is followed by a putative reinforcing stimulus when there is no environmental context (not merely no measured or experimenter-manipulated context)? To answer this question, a simulation was conducted under circumstances that were otherwise identical to the first simulation except that the input units of the network were not activated. Any connection strengths that were modified were between units that were activated as the result of spontaneous coactivity between interior and operant units. Under such circumstances, activation of the operant unit is emitted in the purest sense; that is, its activation is solely the product of endogenous intranetwork variables. Simulation indicated that even after as many as 1,000 operant–reinforcer pairings using identical values for all other parameters, conditioning did not occur. Thus, in the absence of an environment, a two-term response–reinforcer contingency was insufficient to produce conditioning in a selection network.

The ineffectiveness of a two-term contingency between an activated output unit and the occurrence of a putative reinforcer is a consequence of our biologically based learning algorithm (Donahoe et al., 1993, p. 40, Equation 5). The learning algorithm simulates modification of synaptic efficacies between neurons, and is informed by experimental analyses of the conditions that produce long-term potentiation (LTP). Experimental analyses of LTP indicate that synaptic efficacies increase when a neuromodulator (that occurs as a result of the reinforcing stimulus) is introduced into synapses between coactive pre- and postsynaptic neurons (Frey, in press; Frey et al., 1993; see also Beninger, 1983; Hoebel, 1988; Wise, 1989). Under the conditions of the simulation, the presynaptic units and the output unit were very unlikely to be coactive spontaneously. Without stimuli acting on input units to increase the likelihood of coactive units, the simulated reinforcer was ineffective.
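Why the two-term contingency fails can be made concrete with the same kind of coactivity-gated increment. The sketch below is our own schematic, with assumed constants rather than the authors' Equation 5: the operant unit is taken to be active on every pairing, and what differs is whether any presynaptic input unit is active when the reinforcer-driven signal arrives. With only spontaneous presynaptic activity the weight barely moves even after 1,000 pairings, whereas a stimulated input unit supports rapid growth.

```python
RATE = 0.05          # illustrative learning-rate constant
POST = 0.6           # activation of the operant unit on each pairing (it is emitted)
SPONTANEOUS = 0.001  # presynaptic activity with no environmental input
DRIVEN = 0.75        # presynaptic activity when a stimulus acts on the input unit

def run(pre_activation, pairings=1000, w=0.05):
    # Each operant-reinforcer pairing increments the weight only to the extent
    # that pre- and postsynaptic units are coactive when the reinforcer-driven
    # modulatory signal arrives.
    for _ in range(pairings):
        w += RATE * pre_activation * POST * (1.0 - w)
    return w

print(f"no environmental input: w = {run(SPONTANEOUS):.3f}")
print(f"input unit stimulated:  w = {run(DRIVEN):.3f}")
```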
Is, then, a three-term contingency sufficient to simulate conditioning in a selection network? The curve in Figure 4 designated by s = .1 shows the acquisition function for the first context-conditioning example. After some 75 reinforcers, the operant output unit became increasingly strongly activated. The parameter s is the standard deviation of the logistic function (see Donahoe et al., 1993, Equation 4), a nonlinear function relating the activation of a postsynaptic unit to the net excitation from its presynaptic inputs. This parameter determines the level of spontaneous activity of a unit. (Neurons in the central nervous system typically have baseline frequencies of firing that are substantially above zero due to local intracellular and extracellular events.)

Fig. 4. Simulation results showing changes in the activation level of the R unit during conditioning for different levels of “spontaneous” activity of units in the selection network. The level of spontaneous activity was varied by manipulating the standard deviation (s) of the logistic function, which determined the activation of a unit as a function of excitation from inputs to that unit. (See text for additional information.)
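To make the role of s concrete, the sketch below evaluates a logistic activation function at zero net excitation for the three values of s used in Figure 4. The threshold is an illustrative assumption (the authors' Equation 4 has its own constants), but it shows the relevant point: small reductions in s collapse the baseline "spontaneous" activity on which selection depends, and with the constant chosen here the s = .08 baseline lands near the .001 value reported in the text.

```python
import math

def logistic_activation(excitation, s, threshold=0.55):
    # Activation of a unit as a logistic function of its net excitation; s is
    # the spread ("standard deviation") parameter, threshold is illustrative.
    return 1.0 / (1.0 + math.exp(-(excitation - threshold) / s))

for s in (0.10, 0.09, 0.08):
    baseline = logistic_activation(0.0, s)   # activation with no presynaptic input
    print(f"s = {s:.2f}: spontaneous activation ~ {baseline:.4f}")
```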
As shown by the other acquisition functions in Figure 4, reductions in the level of spontaneous activity markedly retarded the simulated acquisition of operant conditioning. With s = .09, acquisition did not begin until after 125 reinforcers. Most strikingly, when s was .08 or less, acquisition failed to occur altogether, even after as many as 200 simulated three-term contingencies. (The level of spontaneous activation of individual units was approximately .001 with s = .08.) Thus, in the absence of spontaneous unit activity, even a three-term contingency was insufficient to produce conditioning. From this perspective, the spontaneous activity of neurons is not an impediment to the efficient functioning of the nervous system or to its scientific interpretation by means of neural networks, but is an essential requirement for its operation and understanding.

In conclusion, the effects of a three-term contingency, together with spontaneous unit activity, are necessary and sufficient for the simulation of operant conditioning in selection networks. The interpretation of the selection process by neural networks leads to a deeper understanding of what it means to describe operants as “emitted.” In the moment-to-moment account provided by neural networks, the statements that “what is selected is always an environment–behavior relation, never a response alone” (LCB, p. 68) and that “operant behavior occurs in a stimulus context, but there is often no identifiable stimulus change that precedes each occurrence of the response” (Shull, 1995, p. 354) are not inconsistent. To the contrary, the statements are complementary: An environment is necessary for reinforcers to select behavior, but without spontaneous intranetwork activity environment–behavior–reinforcer sequences are insufficient. In a moment-to-moment account, as favored by Skinner and implemented by selection networks, environment–behavior relations are neither purely emitted nor purely dependent on particular environment stimuli. Within the range of environment–behavior relations that are conventionally designated as operant, relations are simultaneously guided by the environment and emitted by the organism.
REFERENCES

Anger, D. (1956). The dependence of interresponse times upon the relative reinforcement of different interresponse times. Journal of Experimental Psychology, 52, 145–161.
Ayres, J. J. B., Benedict, J. O., & Witcher, E. S. (1975). Systematic manipulation of individual events in a truly random control with rats. Journal of Comparative and Physiological Psychology, 88, 97–103.
Baum, W. M. (1973). The correlation-based law of effect. Journal of the Experimental Analysis of Behavior, 20, 137–154.
Benedict, J. O. (1975). Response-shock delay as a reinforcer in avoidance behavior. Journal of the Experimental Analysis of Behavior, 24, 323–332.
Benedict, J. O., & Ayres, J. J. B. (1972). Factors affecting conditioning in the truly random control procedure in the rat. Journal of Comparative and Physiological Psychology, 78, 323–330.
Beninger, R. J. (1983). The role of dopamine activity in locomotor activity and learning. Brain Research Reviews, 6, 173–196.
Blough, D. S. (1963). Interresponse time as a function of a continuous variable: A new method and some data. Journal of the Experimental Analysis of Behavior, 6, 237–246.
Bolles, R. C., & Popp, R. J., Jr. (1964). Parameters affecting the acquisition of Sidman avoidance. Journal of the Experimental Analysis of Behavior, 7, 315–321.
Buonomano, D. V., & Merzenich, M. M. (1995). Temporal information transformed into a spatial code by a neural network with realistic properties. Science, 267, 1026–1028.
Catania, A. C., & Keller, K. J. (1981). Contingency, contiguity, correlation, and the concept of causality. In P. Harzem & M. D. Zeiler (Eds.), Predictability, correlation, and contiguity (pp. 125–167). New York: Wiley.
Coleman, S. R. (1981). Historical context and systematic functions of the concept of the operant. Behaviorism, 9, 207–226.
Coleman, S. (1984). Background and change in B. F. Skinner’s metatheory from 1930 to 1938. Journal of Mind and Behavior, 5, 471–500.
Dinsmoor, J. A. (1985). The role of observing and attention in establishing stimulus control. Journal of the Experimental Analysis of Behavior, 43, 365–381.
Dinsmoor, J. A. (1995). Stimulus control: Part I. The Behavior Analyst, 18, 51–68.
Donahoe, J. W. (1993). The unconventional wisdom of B. F. Skinner: The analysis-interpretation distinction. Journal of the Experimental Analysis of Behavior, 60, 453–456.
Donahoe, J. W. (in press-a). The necessity of neural networks. In J. W. Donahoe & V. P. Dorsel (Eds.), Neural-network models of cognition: Biobehavioral foundations. Amsterdam: Elsevier.
Donahoe, J. W. (in press-b). Positive reinforcement: The selection of behavior. In W. O’Donohue (Ed.), Learning and behavior therapy. Boston: Allyn & Bacon.
Donahoe, J. W., Burgos, J. E., & Palmer, D. C. (1993). Selectionist approach to reinforcement. Journal of the Experimental Analysis of Behavior, 60, 17–40.
Donahoe, J. W., Crowley, M. A., Millard, W. J., & Stickney, K. A. (1982). A unified principle of reinforcement. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior (Vol. 2, pp. 493–521). Cambridge, MA: Ballinger.
Donahoe, J. W., & Dorsel, V. P. (Eds.). (in press). Neural-network models of cognition: Biobehavioral foundations. Amsterdam: Elsevier.
Donahoe, J. W., & Palmer, D. C. (1989). The interpretation of complex human behavior: Some reactions to Parallel Distributed Processing. Journal of the Experimental Analysis of Behavior, 51, 399–416.
Donahoe, J. W., & Palmer, D. C. (1994). Learning and complex behavior. Boston: Allyn & Bacon.
Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York: Appleton-Century-Crofts.
Frey, U. (in press). Cellular mechanisms of long-term potentiation: Late maintenance. In J. W. Donahoe & V. P. Dorsel (Eds.), Neural-network models of cognition: Biobehavioral foundations. Amsterdam: Elsevier.
Frey, U., Huang, Y.-Y., & Kandel, E. R. (1993). Effects of cAMP simulate a late stage of LTP in hippocampus CA1 neurons. Science, 260, 1661–1664.
Galbicka, G. (1992). The dynamics of behavior. Journal of the Experimental Analysis of Behavior, 57, 243–248.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton-Mifflin.
Gormezano, I., & Kehoe, E. J. (1981). Classical conditioning and the law of contiguity. In P. Harzem & M. D. Zeiler (Eds.), Predictability, correlation, and contiguity (pp. 1–45). New York: Wiley.
Guthrie, E. R. (1933). Association as a function of time interval. Psychological Review, 40, 355–367.
Hamilton, W. (1964). The genetical theory of social behavior, I. II. Journal of Theoretical Biology, 7, 1–52.
Heinemann, E. G., & Rudolph, R. L. (1963). The effect of discrimination training on the gradient of stimulus generalization. American Journal of Psychology, 76, 653–656.
Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243–266.
Herrnstein, R. J. (1982). Melioration as behavioral dynamism. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts (pp. 433–458). Cambridge, MA: Ballinger.
Herrnstein, R. J., & Vaughan, W., Jr. (1980). Melioration and behavioral allocation. In J. E. R. Staddon (Ed.), Limits to action: The allocation of individual behavior (pp. 143–176). New York: Academic Press.
Heyman, G. N. (1979). A Markov model description of changeover probabilities on concurrent variable-interval schedules. Journal of the Experimental Analysis of Behavior, 31, 41–51.
Hilgard, E. R., & Marquis, D. G. (1940). Conditioning and learning. New York: Appleton-Century-Crofts.
Hineline, P. N. (1970). Negative reinforcement without shock reduction. Journal of the Experimental Analysis of Behavior, 14, 259–268.
Hineline, P. N. (1981). The several roles of stimuli in negative reinforcement. In P. Harzem & M. D. Zeiler (Eds.), Predictability, correlation, and contiguity (pp. 203–246). New York: Wiley.
Hineline, P. N. (1986). Re-tuning the operant-respondent distinction. In T. Thompson & M. D. Zeiler (Eds.), Analysis and integration of behavioral units (pp. 55–79). Hillsdale, NJ: Erlbaum.
Hinson, J. M., & Staddon, J. E. R. (1983a). Hill-climbing by pigeons. Journal of the Experimental Analysis of Behavior, 39, 25–47.
Hinson, J. M., & Staddon, J. E. R. (1983b). Matching, maximizing, and hillclimbing. Journal of the Experimental Analysis of Behavior, 40, 321–331.
Hoebel, B. G. (1988). Neuroscience and motivation: Pathways and peptides that define motivational systems. In R. A. Atkinson (Ed.), Stevens’ handbook of experimental psychology (Vol. 1, pp. 547–625). New York: Wiley.
Holland, P. C. (1977). Conditioned stimulus as a determinant of the form of the Pavlovian conditioned response. Journal of Experimental Psychology: Animal Behavior Processes, 3, 77–104.
Hull, C. L. (1934). The concept of habit-family hierarchy and maze learning. Psychological Review, 41, 33–54.
Hull, C. L. (1937). Mind, mechanism, and adaptive behavior. Psychological Review, 44, 1–32.
Jenkins, H. M., & Sainesbury, R. S. (1969). The development of stimulus control through differential reinforcement. In N. J. Mackintosh & W. K. Honig (Eds.), Fundamental issues in associative learning (pp. 123–161). Halifax, Nova Scotia: Dalhousie University Press.
Kehoe, E. J. (1988). A layered network model of associative learning: Learning to learn and configuration. Psychological Review, 95, 411–433.
Kehoe, E. J. (1989). Connectionist models of conditioning: A tutorial. Journal of the Experimental Analysis of Behavior, 52, 427–440.
Keller, R. J., Ayres, J. J. B., & Mahoney, W. J. (1977). Brief versus extended exposure to truly random control procedures. Journal of Experimental Psychology: Animal Behavior Processes, 3, 53–65.
Lattal, K. A., & Gleeson, S. (1990). Response acquisition with delayed reinforcement. Journal of Experimental Psychology: Animal Behavior Processes, 16, 27–39.
Lieberman, P. A. (1993). Learning: Behavior and cognition. Pacific Grove, CA: Brooks/Cole.
McClelland, J. L., Rumelhart, D. E., & The PDP Research Group. (Eds.). (1986). Parallel distributed processing: Explorations in microstructure of cognition (Vol. 2). Cambridge, MA: MIT Press.
Meazzini, P., & Ricci, C. (1986). Molar vs. molecular units of analysis. In T. Thompson & M. D. Zeiler (Eds.), Analysis and integration of behavioral units (pp. 19–43). Hillsdale, NJ: Erlbaum.
Minsky, M. L., & Papert, S. A. (1969). Perceptrons. Cambridge, MA: MIT Press.
Moore, J. (1984). Choice and transformed interreinforcement intervals. Journal of the Experimental Analysis of Behavior, 42, 321–335.
Morse, W. H. (1966). Intermittent reinforcement. In W. K. Honig (Ed.), Operant behavior: Areas of research and application (pp. 52–108). New York: Appleton-Century-Crofts.
Morse, W. H., & Skinner, B. F. (1957). A second type of superstition in the pigeon. American Journal of Psychology, 70, 308–311.
Neuringer, A. J. (1967). Choice and rate of responding in the pigeon. Unpublished doctoral dissertation, Harvard University.
Nevin, J. A. (1969). Interval reinforcement of choice behavior in discrete trials. Journal of the Experimental Analysis of Behavior, 12, 875–885.
Nevin, J. A. (1984). Quantitative analysis. Journal of the Experimental Analysis of Behavior, 42, 421–434.
Osgood, C. E. (1953). Method and theory in experimental psychology. New York: Oxford University Press.
Palmer, D. C., & Donahoe, J. W. (1992). Essentialism and selectionism in cognitive science and behavior analysis. American Psychologist, 47, 1344–1358.
Pear, J. J. (1985). Spatiotemporal patterns of behavior produced by variable-interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 44, 217–231.
Platt, J. R. (1979). Interresponse-time shaping by variable-interval-like interresponse-time reinforcement contingencies. Journal of the Experimental Analysis of Behavior, 31, 3–14.
Quinsey, V. L. (1971). Conditioned suppression with no CS-US contingency in the rat. Canadian Journal of Psychology, 25, 69–82.
Rescorla, R. A. (1967). Pavlovian conditioning and its proper control procedures. Psychological Review, 74, 71–80.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Rosenblatt, F. (1962). Principles of neurodynamics. Washington, DC: Spartan.
Rumelhart, D. E., McClelland, J. L., & The PDP Research Group. (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1). Cambridge, MA: MIT Press.
Schoenfeld, W. N., & Farmer, J. (1970). Reinforcement schedules and the “behavior stream.” In W. N. Schoenfeld (Ed.), The theory of reinforcement schedules (pp. 215–245). New York: Appleton-Century-Crofts.
Shimp, C. P. (1966). Probabilistically reinforced choice behavior in pigeons. Journal of the Experimental Analysis of Behavior, 9, 443–455.
Shimp, C. P. (1969). Optimal behavior in free-operant experiments. Psychological Review, 76, 97–112.
Shimp, C. P. (1974). Time allocation and response rate. Journal of the Experimental Analysis of Behavior, 21, 491–499.
Shull, R. L. (1995). Interpreting cognitive phenomena: Review of Donahoe and Palmer’s Learning and Complex Behavior. Journal of the Experimental Analysis of Behavior, 63, 347–358.
Sidman, M. (1986). Functional analysis of emergent verbal classes. In T. Thompson & M. D. Zeiler (Eds.), Analysis and integration of behavioral units (pp. 213–245). Hillsdale, NJ: Erlbaum.
Silberberg, A., Hamilton, B., Ziriax, J. M., & Casey, J. (1978). The structure of choice. Journal of Experimental Psychology: Animal Behavior Processes, 4, 368–398.
Silberberg, A., & Ziriax, J. M. (1982). The interchangeover time as a molecular dependent variable in concurrent schedules. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts of behavior (pp. 111–130). Cambridge, MA: Ballinger.
Skinner, B. F. (1931). The concept of the reflex in the study of behavior. Journal of General Psychology, 5, 427–458.
Skinner, B. F. (1937). Two types of conditioned reflex: A reply to Konorski and Miller. Journal of General Psychology, 16, 272–279.
Skinner, B. F. (1938). The behavior of organisms. New York: Appleton-Century-Crofts.
Skinner, B. F. (1948). “Superstition” in the pigeon. Journal of Experimental Psychology, 38, 168–172.
Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.
Skinner, B. F. (1976). Farewell, my lovely! Journal of the Experimental Analysis of Behavior, 25, 218.
Skinner, B. F. (1983). A matter of consequences. New York: Knopf.
Sober, E. (1984). The nature of selection. Cambridge, MA: MIT Press.
Sober, E. (1993). Philosophy of biology. Boulder, CO: Westview Press.
Staddon, J. E. R. (1993). The conventional wisdom of behavior analysis. Journal of the Experimental Analysis of Behavior, 60, 439–447.
Staddon, J. E. R., & Hinson, J. M. (1983). Optimization: A result or a mechanism? Science, 221, 976–977.
Staddon, J. E. R., & Simmelhag, V. L. (1971). The “superstition” experiment: A reexamination of its implications for the principles of adaptive behavior. Psychological Review, 78, 3–43.
Stein, L., & Belluzzi, J. D. (1988). Operant conditioning of individual neurons. In M. L. Commons, R. M. Church, J. R. Stellar, & A. R. Wagner (Eds.), Quantitative analyses of behavior (Vol. 7, pp. 249–264). Hillsdale, NJ: Erlbaum.
Stein, L., & Belluzzi, J. D. (1989). Cellular investigations of behavioral reinforcement. Neuroscience and Biobehavioral Reviews, 13, 69–80.
Stein, L., Xue, B. G., & Belluzzi, J. D. (1993). A cellular analogue of operant conditioning. Journal of the Experimental Analysis of Behavior, 60, 41–53.
Stein, L., Xue, B. G., & Belluzzi, J. D. (1994). In vitro reinforcement of hippocampal bursting: A search for Skinner’s atom of behavior. Journal of the Experimental Analysis of Behavior, 61, 155–168.
Timberlake, W., & Lucas, G. A. (1985). The basis of superstitious behavior: Chance contingency, stimulus substitution, or appetitive behavior? Journal of the Experimental Analysis of Behavior, 44, 279–299.
Tolman, E. C. (1932). Purposive behavior in animals and men. New York: Appleton-Century-Crofts.
Watson, J. B. (1924). Behaviorism. New York: Norton.
Williams, B. A. (1985). Choice behavior in a discrete-trial concurrent VI-VR: A test of maximizing theories of matching. Learning and Motivation, 16, 423–443.
Williams, B. A. (1986). Identifying behaviorism’s prototype: A review of Behaviorism: A Conceptual Reconstruction by G. E. Zuriff. The Behavior Analyst, 9, 117–122.
Williams, B. A. (1990). Enduring problems for molecular accounts of operant behavior. Journal of Experimental Psychology: Animal Behavior Processes, 16, 213–216.
Williams, D. R. (1968). The structure of response rate. Journal of the Experimental Analysis of Behavior, 11, 251–258.
Wise, R. A. (1989). The brain and reward. In J. M. Liebman & S. J. Cooper (Eds.), The neuropharmacological basis of reward (pp. 377–424). New York: Oxford University Press.
Zuriff, G. E. (1985). Behaviorism: A conceptual reconstruction. New York: Columbia University Press.
