The central focus of this essay is whether the effect of reinforcement is best viewed as the strengthening of responding or the strengthening of the environmental control of responding. We make the argument that adherence to Skinner's goal of achieving a moment-to-moment analysis of behavior compels acceptance of the latter view. Moreover, a thoroughgoing commitment to a moment-to-moment analysis undermines the fundamental distinction between the conditioning processes instantiated by operant and respondent contingencies while buttressing the crucially important differences in their cumulative outcomes. Computer simulations informed by experimental analyses of behavior and neuroscience are used to illustrate these points.

Key words: S-R psychology, contingencies of reinforcement, contiguity, discrimination learning, reinforcement, respondent conditioning, computer simulation
194 JOHN W. DONAHOE et al.
raised. The following statement in LCB is cited:

The outcome of selection by reinforcement is a change in the environmental guidance of behavior. That is, what is selected is always an environment–behavior relation, never a response alone. (LCB, p. 68)

Of this statement, Shull comments,

In this respect, then, [LCB's] conception of reinforcement is very much in the tradition of S-R theory . . . [in which] . . . what was selected was the ability of a particular stimulus pattern to evoke a particular response pattern. (Shull, 1995, p. 353)

The question is then considered of whether this view is consistent with the behavior-analytic conception of operant behavior in which "operant behavior occurs in a stimulus context, but there is often no identifiable stimulus change that precedes each occurrence of the response" (Shull, 1995, p. 354). This leads to the related concern of whether adaptive neural networks are suitable to interpret operant behavior because networks are "constructed from elementary connections intended as analogues of stimulus–response relations" (Shull, 1995, p. 354).

In what follows, we seek to demonstrate not only that LCB's view of operant behavior and its interpretation via adaptive neural networks is consistent with behavior-analytic formulations (which we share), but also that this view enriches our understanding of what it means to say that operants are emitted rather than elicited. We agree that the behavior-analytic view of operants should be regarded as "liberating because . . . fundamental relationships could be established in procedures that allowed responses to occur repeatedly over long periods of time without the constraints of trial onset and offset" (Shull, 1995, p. 354). Instead of departing from behavior-analytic thinking, the view that reinforcers select environment–behavior relations fosters more parsimonious treatments of stimulus control and conditioning, and represents a continuation of Skinner's efforts to provide a compelling moment-to-moment account of behavior (Skinner, 1976).

Further, we concentrate on the rationale behind this view of selection by reinforcement as it is interpreted by biobehaviorally constrained neural networks. Because material relevant to the S-R issue is scattered throughout the book and some of the more technical details are not elaborated, the need for clarification is understandable. We consider first the interpretation of responding in a stable stimulus context and then proceed to a more general examination of the core of the S-R issue. No effort is made to discuss all of its ramifications—the phrase connotes a considerable set of interrelated distinctions that vary somewhat among different theorists (cf. Lieberman, 1993, p. 190; B. Williams, 1986; Zuriff, 1985). Also, no effort is made to provide a historical overview of the S-R issue, although such information is clearly required for a complete treatment of the topic (see Coleman, 1981, 1984; Dinsmoor, 1995; Gormezano & Kehoe, 1981).

BEHAVING IN A STABLE CONTEXT

The central distinction between S-R psychology and the view introduced by Skinner is how one accounts for variability in behavior. The defining feature of S-R psychology is that it explains variability in behavior by reference to variability in antecedents: When a response occurs there must have been some discrete antecedent, or complex of antecedents, overt or covert, that evoked the response. If the response varies in frequency, it is because antecedent events have varied in frequency. On this view, there will always be a nonzero correlation between antecedent events and behavior. Further, frequency of response (or frequency per unit time, i.e., rate) cannot serve as a fundamental dependent variable because response rate is, at root, a function of the rate of stimulus presentation. In contrast, Skinner held that, even when there is no identifiable variability in antecedents, variability in behavior remains lawful: Behavior undergoes orderly change because of its consequences. In fact, at the level of behavioral observations, one can find lawful relationships between the occurrence of a response and the contingencies of reinforcement in a stable context. Skinner did not merely assert the central role of control by consequences; he persuasively demonstrated it experimentally. Once such control is accepted as an empirical fact and not simply as a theoretical preference, the S-R position becomes untenable. We also accept control by consequences as an empirical fact, and our networks simulate some of its orderly effects without appealing to correlated antecedent changes in the environment.

Fig. 1. The simulation by a neural network of acquisition (ACQ), extinction (EXT), and reacquisition (REACQ) with an operant contingency. The simulated environmental context activated the input units of the neural network at a constant level of 1 throughout all phases of the simulation. In accordance with an operant contingency, the input unit for the reinforcing stimulus was activated during ACQ and REACQ only when the activation level of the output unit simulating the operant (R) was greater than zero. During EXT, the input unit for the reinforcing stimulus was never activated. (Activation levels of units could vary between 0 and 1.) The activation level of the output unit simulating the conditioned response (CR), which also changed during the conditioning process, is also shown.

Consider the neural network simulation of the reacquisition of an extinguished response that is discussed in LCB (pp. 92–95). In the first phase of the simulation a response was followed by a reinforcer, in the second phase extinction was scheduled for the response, and in the third phase the response was again reinforced. The "sensory inputs" to the network were held constant throughout the simulation. (Note that in a simulation the stimulus context may be held strictly constant, unaffected by moment-to-moment variations in stimulation that inevitably occur in actual experiments.) In the simulation, the strength of the response varied widely even though the context remained constant: Responding increased in strength during acquisition, weakened during extinction, and then increased again during reacquisition, and did so more rapidly than during original acquisition (see Figure 1). Moreover, the changes in response strength were not monotonic, but showed irregularities during the transitions in response strength. None of these changes can be understood by reference to the stimulus context; it remained constant throughout the simulation. Instead, the changes can only be interpreted by reference to the effects of the contingencies of reinforcement on the network and to the history of reinforcement in that context.
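The qualitative pattern of acquisition, extinction, and more rapid reacquisition under a constant context can be sketched with a toy model. The sketch below is not the selection network used in LCB; it assumes a hypothetical two-timescale response strength (a fast and a slow pathway, both strengthened by response-contingent reinforcement, with the fast pathway decaying quickly during extinction) simply to show how all three phases can unfold while the simulated context is held strictly constant. All parameter values are illustrative.

```python
# Toy two-pathway model of ACQ/EXT/REACQ in a constant context.
BASELINE = 0.05              # spontaneous ("emitted") activation of R
A_FAST, A_SLOW = 0.10, 0.07  # reinforcement-driven increments
D_FAST, D_SLOW = 0.85, 0.99  # per-trial decay during extinction

def strength(w_fast, w_slow):
    """Momentary response strength in the unchanging context."""
    return min(1.0, BASELINE + w_fast + w_slow)

def run(n=40):
    w_fast = w_slow = 0.0
    trace = []
    # ACQ: the reinforcer follows any response whose activation exceeds 0.
    for _ in range(n):
        p = strength(w_fast, w_slow)
        trace.append(("ACQ", p))
        if p > 0:  # operant contingency met -> strengthen toward asymptote
            w_fast += A_FAST * (1.0 - p)
            w_slow += A_SLOW * (1.0 - p)
    # EXT: the reinforcer is withheld; the fast pathway decays quickly.
    for _ in range(n):
        w_fast *= D_FAST
        w_slow *= D_SLOW
        trace.append(("EXT", strength(w_fast, w_slow)))
    # REACQ: the operant contingency is restored.
    for _ in range(n):
        p = strength(w_fast, w_slow)
        trace.append(("REACQ", p))
        if p > 0:
            w_fast += A_FAST * (1.0 - p)
            w_slow += A_SLOW * (1.0 - p)
    return trace

def trials_to_criterion(trace, phase, criterion=0.9):
    """Number of trials within a phase before strength reaches criterion."""
    phase_ps = [p for ph, p in trace if ph == phase]
    for i, p in enumerate(phase_ps, start=1):
        if p >= criterion:
            return i
    return None
```

Because the slow pathway survives extinction, reacquisition starts from a higher strength and reaches criterion in fewer trials than original acquisition, even though every input to the model is constant across phases.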
interpretation of these same facts will be given that is consistent with our view that reinforcers affect input–output relations and not output alone. However, the primary point here is that it is a mistake to categorize accounts at the behavioral level by one's view of the underlying biology. Behavior does not fully constrain biology. To hold otherwise is to endorse the conceptual-nervous-system approach decried by Skinner (1938).

Consider the following alternative interpretation of the finding that an increase in the frequency of firing occurred as a result of the burst-contingent application of a neuromodulator. The finding was attributed to the contiguity between the bursting of the postsynaptic neuron and the introduction of the neuromodulator (a two-term cellular contingency). Such an interpretation is consistent with the finding, but it is not the only possible interpretation. Moreover, other observations complicate the picture: The neuromodulator was not effective after a single spike, but only after a burst of several spikes. An alternative interpretation of these facts, proposed in LCB (pp. 66–67), is that the increase in postsynaptic activity may reflect a heightened sensitivity of the postsynaptic neuron to the release of the neurotransmitter glutamate by presynaptic neurons. The experimental work of Frey (Frey, in press; Frey, Huang, & Kandel, 1993) has shown that dopamine acts in conjunction with the effects of glutamate on the N-methyl-D-aspartate (NMDA) receptor to initiate a second-messenger cascade whose ultimate outcome is an enhanced response of non-NMDA receptors to glutamate. On this view, the ineffectiveness of dopamine after single spikes occurs because bursting is necessary to depolarize the postsynaptic membrane sufficiently to engage the voltage-sensitive NMDA receptor. Accordingly, the increased bursting observed after burst-contingent microinjections of dopamine reflects an enhanced response of the postsynaptic neuron to presynaptic activity (a three-term cellular contingency involving the conjunction of presynaptic and postsynaptic activity with dopamine). To conclude that bursting is independent of presynaptic activity when presynaptic activity has not been measured is to risk mistaking absence of evidence for evidence of absence. In short, interpreting these very important neural observations presents the same conceptual challenge as does interpreting control by a stable context at the behavioral level.

The mechanisms proposed both by Stein and by us are consistent with the behavioral phenomena that led Skinner to break from S-R psychology—an increase in responding following the occurrence of a response-contingent reinforcer in the absence of a specified antecedent. However, we prefer our proposals at both the behavioral and neural levels because they can accommodate behavior in discrimination procedures as well as in stable contexts. It appears to us that proposals that do not specify a three-term contingency must be supplemented by something akin to our proposal in order to account for discriminated behavior, in which case the former proposed mechanisms would be redundant. Ultimately, of course, the interpretation of the cellular results is an empirical matter requiring simultaneous measurement of all terms in the three-term contingency: antecedent events (presynaptic activity), subsequent events (postsynaptic activity), and consequences (the neuromodulator). Both proposals have the merit of showing that behavior analysis can be quite smoothly integrated with what is known about the nervous system. This remains but an elusive dream in normative (i.e., inferred-process) psychology.

In brief, principles formulated on the basis of behavioral observations do not tightly constrain the potential physiological mechanisms that implement the functional relations described by behavioral principles, and physiological mechanisms do not dictate the most effective statement of principles at the behavioral level. The two levels of analysis must yield consistent principles but, as Skinner pointed out (1938, p. 432), nothing that is learned about the physiology of behavior can ever undermine valid behavioral laws.

THE MOMENT-TO-MOMENT CHARACTER OF BIOBEHAVIORAL PROCESSES

Basic to the disposition of the S-R issue is an even more fundamental matter: whether functional relations at the behavioral level are best viewed as emergent products of the outcome of moment-to-moment interactions between the organism and its environment or
whether such regularities are sui generis (i.e., understandable only at the level at which they appear). Skinner clearly favored moment-to-moment analyses (e.g., Ferster & Skinner, 1957). Consider the following statements in "Farewell, my lovely!" in which Skinner (1976) poignantly lamented the decline of cumulative records in the pages of this journal.

What has happened to experiments where rate changed from moment to moment in interesting ways, where a cumulative record told more in a glance than could be described in a page? . . . [Such records] . . . suggested a really extraordinary degree of control over an individual organism as it lived its life from moment to moment. . . . These "molecular" changes in probability of responding are most immediately relevant to our own daily lives. (Skinner, 1976, p. 218)

Skinner's unwavering commitment to a moment-to-moment analysis of behavior (cf. Skinner, 1983, p. 73) has profound—and underappreciated—implications for the resolution of the S-R issue as well as for other central distinctions in behavior analysis, including the distinction between operant and respondent conditioning itself.

Stimulus Control of Behavior

In LCB, an organism is described as "immersed in a continuous succession of environmental stimuli . . . in whose presence a continuous succession of responses . . . is occurring. . . . When a [reinforcing] stimulus is introduced into this stream of events, then . . . selection occurs (cf. Schoenfeld & Farmer, 1970)" (p. 49). At the moment when the reinforcer occurs—what Skinner more casually referred to as "the moment of Truth"—some stimulus necessarily precedes the reinforced response in both differential and nondifferential conditioning. That is, at the "moment of reinforcement" (Ferster & Skinner, 1957, pp. 2–3), there is no environmental basis by which to distinguish between the two contingencies. Therefore, no basis exists by which different processes could be initiated for nondifferential as contrasted with differential conditioning (i.e., response strengthening in the first instance and stimulus control of strengthening in the second). If control by contextual stimuli does not occur in nondifferential conditioning, then discrimination becomes an anomaly and requires ad hoc principles that differ from those that accommodate nondifferential conditioning. In such a formulation, the environment would become empowered to control behavior when there were differential consequences, but not otherwise. But, is it credible that reinforcers should strengthen behavior relative to a stimulus with one procedure and not with the other? And, if so, what events present at the "moment of reinforcement" are available to differentiate a reinforced response in a discrimination procedure from a reinforced response in a nondiscrimination procedure? The conclusion that no such events exist led Dinsmoor (1995, p. 52) to make much the same point in citing Skinner's statement that "it is the nature of [operant] behavior that . . . discriminative stimuli are practically inevitable" (Skinner, 1937, p. 273; see also Catania & Keller, 1981, p. 163).

During differential operant conditioning, stimuli are sensed in whose presence a response is followed by a reinforcer. But environment–behavior–reinforcer sequences necessarily occur in a nondiscrimination procedure as well. The two procedures differ with respect to the reliability with which particular stimuli are present prior to the reinforced response, but that difference cannot be appreciated on a single occasion. The essence of reliability is repeatability. The distinction emerges as a cumulative product of the occurrence of reinforcers over repeated individual occasions. In laboratory procedures that implement nondifferential conditioning, it is not that no stimuli are sensed prior to the response–reinforcer sequence, but that no stimuli specifiable by the experimenter are reliably sensed prior to the sequence.

Conditioning of Behavior

Paradoxically, by strictly parallel reasoning, an acceptance of Skinner's commitment to a moment-to-moment analysis of behavior compels a rejection of a fundamental distinction between the conditioning processes instantiated by respondent and operant procedures. Instead, a moment-to-moment analysis calls for a unified theoretical treatment of the conditioning process, with the environmental control of responding as the cumulative outcome of both procedures.
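The point that differential and nondifferential procedures are indistinguishable at the moment of reinforcement, and come apart only across repeated occasions, can be illustrated with a simple tally. This is a sketch of the bookkeeping, not a model of conditioning; the stimulus names are hypothetical.

```python
def reliability(occasions):
    """occasions: iterable of (stimuli_present, reinforced) pairs.
    Returns, for each stimulus, the proportion of its occurrences
    that were followed by a reinforced response."""
    seen, followed = {}, {}
    for stimuli, reinforced in occasions:
        for s in stimuli:
            seen[s] = seen.get(s, 0) + 1
            followed[s] = followed.get(s, 0) + int(reinforced)
    return {s: followed[s] / seen[s] for s in seen}

# Differential procedure: responses reinforced only when SD joins the
# ever-present context; nondifferential: every response is reinforced.
differential = [({"context", "SD"}, True), ({"context"}, False)] * 50
nondifferential = [({"context"}, True)] * 100

# On any single reinforced occasion the two procedures look alike: some
# stimulus precedes the reinforced response in both. Only the cumulative
# tallies separate them.
r_diff = reliability(differential)       # SD perfectly reliable; context not
r_nondiff = reliability(nondifferential)
```

In the differential tally SD precedes every reinforced response while the context precedes reinforced and nonreinforced responses alike; in the nondifferential tally the context is as reliable as any stimulus can be. No single occasion carries that information.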
only when it is introduced into the synapse within 200 ms of a burst of firing in the postsynaptic neuron (Stein & Belluzzi, 1989). Behavior analysis and neuroscience are independent disciplines, but their principles cannot be inconsistent with one another's findings. The two sciences are dealing with different aspects of the same organism (LCB, pp. 275–277; Skinner, 1938).

Although conditioning processes are instantiated in moment-to-moment relations between events, compelling regularities sometimes appear in the relation between independent and dependent variables defined over more extended periods of time (e.g., between average rate of reinforcement and average rate of responding; Baum, 1973; Herrnstein, 1970). What is the place of molar regularities in a science if its fundamental processes operate on a moment-to-moment basis? Nevin's answer to this question seems very much on the mark: "The possibility that molar relations . . . may prove to be derivative from more local processes does nothing to diminish their value as ways to summarize and integrate data" (Nevin, 1984, p. 431; see also Herrnstein, 1970, p. 253). The conceptual relation between moment-to-moment processes and molar regularities in behavior analysis parallels the distinction between "selection for" and "selection of" in the paradigmatic selectionist science of evolutionary biology (Sober, 1984). Insofar as the notions of cause and effect have meaning in the context of the complex interchange between an organism and its environment: "'Selection for' describes the causes, while 'selection of' describes the effects" (Sober, 1993, p. 82). In evolutionary biology, selection for genes affecting reproductive fitness leads to selection of altruistic behavior (Hamilton, 1964). As the distinction applies in behavior analysis, reinforcers cause certain environment–behavior relations to be strengthened; this has the effect, under some circumstances, of producing molar regularities. Selection by reinforcement for momentary environment–behavior relations produces selection of molar regularities.

One can demonstrate that what reinforcers select are momentary relations between environmental and behavioral events, not the molar regularities that are their cumulative products. This can be done by arranging contingencies of reinforcement that pit moment-to-moment processes against molar regularities. Under these circumstances, the variation in behavior typically tracks moment-to-moment relations, not relations between events defined over more extended periods of time. For example, with positive reinforcers, differential reinforcement of responses that occur at different times following the previous response (i.e., differential reinforcement of interresponse times, or IRTs) changes the overall rate of responding even though the overall rate of reinforcement is unchanged (Platt, 1979). As conjectured by Shimp (1974, p. 498), "there may be no such thing as an asymptotic mean rate of [responding] that is . . . independent of reinforced IRTs" (cf. Anger, 1956). Similarly, in avoidance learning, when the delay between the response and shock is varied but the overall rate of shock is held constant, the rate of avoidance responding is sensitive to the momentary delay between the response and shock, not the overall rate of shock (Hineline, 1970; see also Benedict, 1975; Bolles & Popp, 1964). Research with respondent procedures has led in the same direction: Molar regularities are the cumulative products of moment-to-moment relations. For example, whereas at one time it was held that behavior was sensitive to the overall correlation between conditioned and unconditioned stimuli (Rescorla, 1967), later experiments (Ayres, Benedict, & Witcher, 1975; Benedict & Ayres, 1972; Keller, Ayres, & Mahoney, 1977; cf. Quinsey, 1971) and theoretical work (Rescorla & Wagner, 1972) demonstrated that molar regularities could be understood as the cumulative products of molecular relations between CS and US. In summary, research with both operant and respondent procedures has increasingly shown that molar regularities are the cumulative products of moment-to-moment conditioning processes. (For initial work of this sort, see Neuringer, 1967, and Shimp, 1966, 1969, 1974. For more recent efforts, see Herrnstein, 1982; Herrnstein & Vaughan, 1980; Hinson & Staddon, 1983a, 1983b; Moore, 1984; Silberberg, Hamilton, Ziriax, & Casey, 1978; Silberberg & Ziriax, 1982.)

It must be acknowledged, however, that not all molar regularities can yet be understood as products of molecular processes (e.g., behavior maintained by some schedules or by
long reinforcer delays; Heyman, 1979; Hineline, 1981; Lattal & Gleeson, 1990; Nevin, 1969; B. Williams, 1985). Refractory findings continue to challenge moment-to-moment accounts, and a completely integrated theoretical treatment of molar regularities in terms of molecular processes still eludes us (cf. B. Williams, 1990). Difficulties in providing moment-to-moment accounts of molar regularities in complex situations are not peculiar to behavior analysis. Physics continues to struggle with many-body problems in mechanics, even though all of the relevant fundamental processes are presumably known. Nevertheless, it is now clear that behavior analysis is not forced to choose between molar and moment-to-moment accounts (e.g., Meazzini & Ricci, 1986, p. 37). The two accounts are not inconsistent if the former are regarded as the cumulative product of the latter.

Indeed, the two accounts may be even more intimately intertwined: In the evolutionary history of organisms, natural selection may have favored genes whose expression yielded moment-to-moment processes that implemented certain molar regularities as their cumulative product (LCB, pp. 112–114; Donahoe, in press-b; cf. Skinner, 1983, p. 362; Staddon & Hinson, 1983). Natural selection for some molar regularity (e.g., maximizing, optimizing, matching) may have led to selection of moment-to-moment processes whose product was the molar regularity. In that way, natural selection for the molar regularity could lead to selection of momentary processes. Once those moment-to-moment processes had been naturally selected, selection by reinforcement for momentary environment–behavior relations could, in turn, cause selection of the molar regularity. Note, however, that to formulate the reinforcement process in terms of the molar regularities it produces, rather than the moment-to-moment processes that implement it, is to conflate natural selection with selection by reinforcement. The selecting effect of the temporally extended environment is the province of natural selection; that of the moment-to-moment environment is the province of selection by reinforcement. Of course, many momentary environments make up the temporally extended environment, but selection by reinforcement is for the former environments, whereas natural selection is for the latter.

Additional experimental work is needed to determine how moment-to-moment processes may lead to molar regularities, but the effort will undoubtedly also require interpretation (Donahoe & Palmer, 1989, 1994, pp. 125–129). In the final section of this essay, interpretation by means of adaptive neural networks is used to clarify the contribution of momentary processes to the central issue: the S-R issue.

NEURAL NETWORK INTERPRETATIONS OF CONDITIONING

We turn finally to the question of whether biobehaviorally constrained neural networks can faithfully interpret salient aspects of the stimulus control of operants. The full answer to this question obviously lies in the future; however, preliminary results are encouraging (e.g., Donahoe et al., 1993; Donahoe & Dorsel, in press; Donahoe & Palmer, 1994). Our concern here is whether, in principle, networks "constructed from elementary connections" that are said to be "analogues of stimulus–response relations" can accommodate the view that "operant behavior occurs in a stimulus context, but there is often no identifiable stimulus change that precedes each occurrence of the response" (Shull, 1995, p. 354). This view of operants is rightly regarded as "liberating" because it empowers the study of complex reinforcement contingencies in the laboratory and because it frees applied behavior analysis from the need to identify the precise controlling stimuli for dysfunctional behavior before instituting remedial interventions. Indeed, it can be argued that pragmatic considerations motivated the operant-respondent distinction more than principled distinctions about the role of the environment in emitted and elicited behavior.

The present inquiry into neural network interpretations of operants can be separated into two parts: The first, and narrower, question is: Do neural networks implement "analogues of stimulus–response relations"? The second is: Are neural networks capable of simulating the effects of nondifferential as well as differential operant contingencies?
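Bearing on the second question, a minimal sketch can show how one and the same update rule yields selective control by a discriminative stimulus under a differential contingency and control by the context alone under a nondifferential contingency. A plain delta rule stands in here for the learning algorithm of Donahoe et al. (1993), which it is not; the stimulus names and parameter values are hypothetical.

```python
def train(trials, lr=0.2, epochs=200):
    """One update rule for both procedures: a connection changes only
    when its input (presynaptic) activity coincides with a
    reinforcement discrepancy."""
    w = [0.0, 0.0]                      # weights for (context, SD) inputs
    for _ in range(epochs):
        for x, target in trials:
            pred = w[0] * x[0] + w[1] * x[1]
            err = target - pred         # discrepancy signal
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
    return w

def probe(w, x):
    """Activation evoked by a stimulus pattern after training."""
    return w[0] * x[0] + w[1] * x[1]

# Differential contingency: reinforced only when SD is added to the
# ever-present context.
w_diff = train([((1, 1), 1.0), ((1, 0), 0.0)])
# Nondifferential contingency: reinforced in the bare context.
w_nondiff = train([((1, 0), 1.0)])
```

Probing the trained weights shows control restricted to the reliable stimulus (SD) in the first case and accruing to the context itself in the second, which is the cumulative outcome described in the text.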
Fig. 2. A minimal architecture of a selection network for simulating operant conditioning. Environmental events
stimulate primary sensory input units (S1, S2, and S3) that give rise to connections that activate units in sensory
association areas and, ultimately, units in motor association and primary motor areas. One primary motor output
unit simulates the operant response (R). When the R unit is activated, the response–reinforcer contingency imple-
mented by the simulation stimulates the SR input unit, simulating the reinforcing stimulus. Stimulating the SR unit
activates the subcortical dopaminergic system of the ventral tegmental area (VTA) and the CR/UR output unit
simulating the reinforcer-elicited response (i.e., the unconditioned response; UR). Subsequent to conditioning, en-
vironmental events acting on the input units permit activation of the R and CR/UR units simulating the operant
and conditioned response (CR), respectively. The VTA system modifies connection weights to units in motor asso-
ciation and primary motor areas and modulates the output of the hippocampal system. The output of the hippocam-
pal system modifies connection weights to units in sensory association areas. Connection weights are changed as a
function of moment-to-moment changes in (a) the coactivity of pre- and postsynaptic units and (b) the discrepancies
in diffusely projecting systems from the hippocampus (d1) and the VTA (d2). The arrowheads point toward those
synapses that are affected by activity in the diffusely projecting systems. Finer lines indicate pathways whose connection
weights are modified by the diffusely projecting systems. Heavier lines indicate pathways that are functional from the
outset of the simulation due to natural selection. (For additional information, see Donahoe et al., 1993; Donahoe &
Dorsel, in press; Donahoe & Palmer, 1994.)
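As a drastically simplified illustration of the context-conditioning simulation discussed in this section, the sketch below uses a single output unit rather than the layered architecture of Figure 2. Only the input strengths of .75 and .50 come from the text; the single-unit structure, the learning rate, and the initial weights are hypothetical stand-ins for the selection network.

```python
# Constant simulated context with one salient and two less salient inputs.
S = {"S1": 0.75, "S2": 0.50, "S3": 0.50}
w = {k: 0.01 for k in S}          # weak initial connections
ETA = 0.05                        # hypothetical learning-rate parameter

def activation(inputs):
    """Activation of the operant output unit, bounded at 1."""
    return min(1.0, sum(w[k] * v for k, v in inputs.items()))

for _ in range(60):               # nondifferential training trials
    r = activation(S)             # operant unit activity in full context
    if r > 0:                     # a response occurred -> reinforcer follows,
        for k in S:               # and coactive connections are strengthened
            w[k] += ETA * S[k] * r

# Probe tests: each input unit activated separately after training.
probes = {k: activation({k: S[k]}) for k in S}
```

After training, the salient input evokes more activation in probe tests than either less salient input, the two equally salient inputs are equivalent, and the full context evokes more than any component alone: context conditioning without any differential contingency among the stimuli.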
tails, see Donahoe et al., 1993; LCB, pp. 237–239). A stable context may be simulated using a network with three input units (S1, S2, and S3). In the first simulation, S1 was continuously activated with a strength of .75, simulating a salient feature of the environment (e.g., the wavelength on a key for a pigeon). S2 and S3 were continuously activated with strengths of .50, simulating less salient features of the environment (e.g., the masking noise in the chamber, stimuli from the chamber wall adjacent to the key, etc.). (No simulation can fully capture the complexity and richness of even the relatively impoverished environment of a test chamber and the relatively simple contingencies programmed therein; Donahoe, in press-a.) Whenever the output unit simulating the operant became activated, a reinforcing stimulus was presented and all connections between recently coactive units were slightly strengthened. After training in which the full context set the occasion for the operant, probe tests were conducted in which each of the three input units making up the context was activated separately and in various combinations. (Note, again,
On the level of the nervous system, this is the by themselves have effects (e.g., habituation,
counterpart of Skinner’s distinction between sensitization, or latent inhibition) even when
elicited responses (respondents) and emitted responding has no programmed conse-
responses (operants); Skinner, 1937. (LCB, p. quences. However, in a simulation the input
151)
units can be stimulated when the algorithms
Because, in general, behavior is not the result that modify connection weights are disabled.
of the environment activating an invariant In the present case, when the S1, S2, and S3
and rigidly circumscribed set of pathways, LCB prefers to speak of behavior as being "guided" rather than controlled by the environment. (As an aside, the phrase "environmental guidance of behavior" has also been found to have certain tactical advantages over "stimulus control of behavior" when seeking a fair hearing for behavior-analytic interpretations of human behavior.)

The foregoing simulations illustrate the context dependence of the conditioning process when an operant is acquired in the stable environment of a nondiscrimination procedure. Our previous simulation research has demonstrated that an operant may be brought under more precise stimulus control: When a discrimination procedure was simulated, the controlling stimuli were restricted to those that most reliably preceded the reinforced response (cf. Donahoe et al., 1993; LCB, p. 78). Thus, the same learning algorithm that modifies the strengths of connections in the same selection-network architecture can simulate important conditioning phenomena as its cumulative effect with either a nondiscrimination or a discrimination procedure.

Interpreting the Requirements for Operant Conditioning

Simulation techniques can be applied to the problem of identifying the necessary and sufficient conditions for learning in selection networks. What are the contributions of the stimulus, the two-term response–reinforcer contingency, and the three-term stimulus–response–reinforcer contingency to operant conditioning? And what role, if any, is played by intranetwork variables that affect the "spontaneous" activity of units?

Consider the question: What is the baseline activation level of the operant unit (i.e., its operant level) when stimuli are applied to input units but without consequences for activity induced in any other units in the network? In living organisms, this condition is imperfectly realized because stimulus presentations . . . When input units were stimulated as in the first simulation of context conditioning but with no change in connection weights, the mean activation of the operant output unit during 200 trials was only .09. Thus, stimuli did not evoke activity in the operant unit to any appreciable degree; that is, responding was not elicited.

Turn now to the question: Does conditioning occur if activity of the operant unit is followed by a putative reinforcing stimulus when there is no environmental context (not merely no measured or experimenter-manipulated context)? To answer this question, a simulation was conducted under circumstances that were otherwise identical to the first simulation except that the input units of the network were not activated. Any connection strengths that were modified were between units that were activated as the result of spontaneous coactivity between interior and operant units. Under such circumstances, activation of the operant unit is emitted in the purest sense; that is, its activation is solely the product of endogenous intranetwork variables. Simulation indicated that even after as many as 1,000 operant–reinforcer pairings using identical values for all other parameters, conditioning did not occur. Thus, in the absence of an environment, a two-term response–reinforcer contingency was insufficient to produce conditioning in a selection network.

The ineffectiveness of a two-term contingency between an activated output unit and the occurrence of a putative reinforcer is a consequence of our biologically based learning algorithm (Donahoe et al., 1993, p. 40, Equation 5). The learning algorithm simulates modification of synaptic efficacies between neurons, and is informed by experimental analyses of the conditions that produce long-term potentiation (LTP). Experimental analyses of LTP indicate that synaptic efficacies increase when a neuromodulator (that occurs as a result of the reinforcing stimulus) is introduced into synapses . . .
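The failure of the two-term contingency follows directly from any learning rule in which a connection changes only when its presynaptic unit is active. The sketch below is a deliberate simplification of our own, not Equation 5 of Donahoe et al. (1993): a scalar reinforcement signal stands in for the neuromodulator released by the reinforcing stimulus, and each weight change is the product of that signal with both pre- and postsynaptic activity. With the input units silent, 1,000 operant–reinforcer pairings leave every input-to-operant weight unchanged.

```python
import random

random.seed(0)

N_INPUTS = 10
ALPHA = 0.5  # learning rate (illustrative value)

def update(weights, inputs, post, reinforcer):
    """Simplified reinforcement-modulated Hebbian rule: a weight changes only
    when presynaptic activity, postsynaptic activity, AND the reinforcement
    signal (a stand-in for the neuromodulator) occur together."""
    discrepancy = reinforcer - post  # reinforcer-induced discrepancy signal
    return [w + ALPHA * discrepancy * pre * post
            for w, pre in zip(weights, inputs)]

weights = [0.0] * N_INPUTS

# 1,000 operant-reinforcer pairings with NO environmental context:
# the operant unit fires "spontaneously" (endogenous activity) and is
# always followed by the putative reinforcer.
for _ in range(1000):
    inputs = [0.0] * N_INPUTS  # input units not activated
    post = random.random()     # spontaneous activation of the operant unit
    weights = update(weights, inputs, post, reinforcer=1.0)

print(weights)  # all zeros: with no environment, no conditioning occurs
```

Rerunning the same loop with `inputs = [1.0] * N_INPUTS` strengthens the input-to-operant connections on every reinforced trial, which is the three-term, context-dependent outcome that the simulations above report.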
REFERENCES

Anger, D. (1956). The dependence of interresponse times upon the relative reinforcement of different interresponse times. Journal of Experimental Psychology, 52, 145–161.
Ayres, J. J. B., Benedict, J. O., & Witcher, E. S. (1975). Systematic manipulation of individual events in a truly random control with rats. Journal of Comparative and Physiological Psychology, 88, 97–103.
Baum, W. M. (1973). The correlation-based law of effect. Journal of the Experimental Analysis of Behavior, 20, 137–154.
Benedict, J. O. (1975). Response-shock delay as a reinforcer in avoidance behavior. Journal of the Experimental Analysis of Behavior, 24, 323–332.
Benedict, J. O., & Ayres, J. J. B. (1972). Factors affecting conditioning in the truly random control procedure in the rat. Journal of Comparative and Physiological Psychology, 78, 323–330.
Beninger, R. J. (1983). The role of dopamine in locomotor activity and learning. Brain Research Reviews, 6, 173–196.
Blough, D. S. (1963). Interresponse time as a function of a continuous variable: A new method and some data. Journal of the Experimental Analysis of Behavior, 6, 237–246.
Bolles, R. C., & Popp, R. J., Jr. (1964). Parameters affecting the acquisition of Sidman avoidance. Journal of the Experimental Analysis of Behavior, 7, 315–321.
Buonomano, D. V., & Merzenich, M. M. (1995). Temporal information transformed into a spatial code by a neural network with realistic properties. Science, 267, 1026–1028.
Catania, A. C., & Keller, K. J. (1981). Contingency, contiguity, correlation, and the concept of causality. In P. Harzem & M. D. Zeiler (Eds.), Predictability, correlation, and contiguity (pp. 125–167). New York: Wiley.
Coleman, S. R. (1981). Historical context and systematic functions of the concept of the operant. Behaviorism, 9, 207–226.
Coleman, S. R. (1984). Background and change in B. F. Skinner's metatheory from 1930 to 1938. Journal of Mind and Behavior, 5, 471–500.
Dinsmoor, J. A. (1985). The role of observing and attention in establishing stimulus control. Journal of the Experimental Analysis of Behavior, 43, 365–381.
Dinsmoor, J. A. (1995). Stimulus control: Part I. The Behavior Analyst, 18, 51–68.
Donahoe, J. W. (1993). The unconventional wisdom of B. F. Skinner: The analysis-interpretation distinction. Journal of the Experimental Analysis of Behavior, 60, 453–456.
Donahoe, J. W. (in press-a). The necessity of neural networks. In J. W. Donahoe & V. P. Dorsel (Eds.), Neural-network models of cognition: Biobehavioral foundations. Amsterdam: Elsevier.
Donahoe, J. W. (in press-b). Positive reinforcement: The selection of behavior. In W. O'Donohue (Ed.), Learning and behavior therapy. Boston: Allyn & Bacon.
Donahoe, J. W., Burgos, J. E., & Palmer, D. C. (1993). A selectionist approach to reinforcement. Journal of the Experimental Analysis of Behavior, 60, 17–40.
Donahoe, J. W., Crowley, M. A., Millard, W. J., & Stickney, K. A. (1982). A unified principle of reinforcement. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior (Vol. 2, pp. 493–521). Cambridge, MA: Ballinger.
Donahoe, J. W., & Dorsel, V. P. (Eds.). (in press). Neural-network models of cognition: Biobehavioral foundations. Amsterdam: Elsevier.
Donahoe, J. W., & Palmer, D. C. (1989). The interpretation of complex human behavior: Some reactions to Parallel Distributed Processing. Journal of the Experimental Analysis of Behavior, 51, 399–416.
Donahoe, J. W., & Palmer, D. C. (1994). Learning and complex behavior. Boston: Allyn & Bacon.
Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York: Appleton-Century-Crofts.
Frey, U. (in press). Cellular mechanisms of long-term potentiation: Late maintenance. In J. W. Donahoe & V. P. Dorsel (Eds.), Neural-network models of cognition: Biobehavioral foundations. Amsterdam: Elsevier.
Frey, U., Huang, Y.-Y., & Kandel, E. R. (1993). Effects of cAMP simulate a late stage of LTP in hippocampal CA1 neurons. Science, 260, 1661–1664.
Galbicka, G. (1992). The dynamics of behavior. Journal of the Experimental Analysis of Behavior, 57, 243–248.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Gormezano, I., & Kehoe, E. J. (1981). Classical conditioning and the law of contiguity. In P. Harzem & M. D. Zeiler (Eds.), Predictability, correlation, and contiguity (pp. 1–45). New York: Wiley.
Guthrie, E. R. (1933). Association as a function of time interval. Psychological Review, 40, 355–367.
Hamilton, W. D. (1964). The genetical evolution of social behaviour, I, II. Journal of Theoretical Biology, 7, 1–52.
Heinemann, E. G., & Rudolph, R. L. (1963). The effect of discrimination training on the gradient of stimulus generalization. American Journal of Psychology, 76, 653–656.
Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243–266.
Herrnstein, R. J. (1982). Melioration as behavioral dynamism. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts (pp. 433–458). Cambridge, MA: Ballinger.
Herrnstein, R. J., & Vaughan, W., Jr. (1980). Melioration and behavioral allocation. In J. E. R. Staddon (Ed.), Limits to action: The allocation of individual behavior (pp. 143–176). New York: Academic Press.
Heyman, G. M. (1979). A Markov model description of changeover probabilities on concurrent variable-interval schedules. Journal of the Experimental Analysis of Behavior, 31, 41–51.
Hilgard, E. R., & Marquis, D. G. (1940). Conditioning and learning. New York: Appleton-Century-Crofts.
Hineline, P. N. (1970). Negative reinforcement without shock reduction. Journal of the Experimental Analysis of Behavior, 14, 259–268.
Hineline, P. N. (1981). The several roles of stimuli in negative reinforcement. In P. Harzem & M. D. Zeiler (Eds.), Predictability, correlation, and contiguity (pp. 203–246). New York: Wiley.
Hineline, P. N. (1986). Re-tuning the operant-respondent distinction. In T. Thompson & M. D. Zeiler (Eds.), Analysis and integration of behavioral units (pp. 55–79). Hillsdale, NJ: Erlbaum.
Hinson, J. M., & Staddon, J. E. R. (1983a). Hill-climbing by pigeons. Journal of the Experimental Analysis of Behavior, 39, 25–47.
Hinson, J. M., & Staddon, J. E. R. (1983b). Matching, maximizing, and hill-climbing. Journal of the Experimental Analysis of Behavior, 40, 321–331.
Hoebel, B. G. (1988). Neuroscience and motivation: Pathways and peptides that define motivational systems. In R. C. Atkinson (Ed.), Stevens' handbook of experimental psychology (Vol. 1, pp. 547–625). New York: Wiley.
Holland, P. C. (1977). Conditioned stimulus as a determinant of the form of the Pavlovian conditioned response. Journal of Experimental Psychology: Animal Behavior Processes, 3, 77–104.
Hull, C. L. (1934). The concept of the habit-family hierarchy and maze learning. Psychological Review, 41, 33–54.
Hull, C. L. (1937). Mind, mechanism, and adaptive behavior. Psychological Review, 44, 1–32.
Jenkins, H. M., & Sainsbury, R. S. (1969). The development of stimulus control through differential reinforcement. In N. J. Mackintosh & W. K. Honig (Eds.), Fundamental issues in associative learning (pp. 123–161). Halifax, Nova Scotia: Dalhousie University Press.
Kehoe, E. J. (1988). A layered network model of associative learning: Learning to learn and configuration. Psychological Review, 95, 411–433.
Kehoe, E. J. (1989). Connectionist models of conditioning: A tutorial. Journal of the Experimental Analysis of Behavior, 52, 427–440.
Keller, R. J., Ayres, J. J. B., & Mahoney, W. J. (1977). Brief versus extended exposure to truly random control procedures. Journal of Experimental Psychology: Animal Behavior Processes, 3, 53–65.
Lattal, K. A., & Gleeson, S. (1990). Response acquisition with delayed reinforcement. Journal of Experimental Psychology: Animal Behavior Processes, 16, 27–39.
Lieberman, D. A. (1993). Learning: Behavior and cognition. Pacific Grove, CA: Brooks/Cole.
McClelland, J. L., Rumelhart, D. E., & The PDP Research Group. (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 2). Cambridge, MA: MIT Press.
Meazzini, P., & Ricci, C. (1986). Molar vs. molecular units of analysis. In T. Thompson & M. D. Zeiler (Eds.), Analysis and integration of behavioral units (pp. 19–43). Hillsdale, NJ: Erlbaum.
Minsky, M. L., & Papert, S. A. (1969). Perceptrons. Cambridge, MA: MIT Press.
Moore, J. (1984). Choice and transformed interreinforcement intervals. Journal of the Experimental Analysis of Behavior, 42, 321–335.
Morse, W. H. (1966). Intermittent reinforcement. In W. K. Honig (Ed.), Operant behavior: Areas of research and application (pp. 52–108). New York: Appleton-Century-Crofts.
Morse, W. H., & Skinner, B. F. (1957). A second type of superstition in the pigeon. American Journal of Psychology, 70, 308–311.
Neuringer, A. J. (1967). Choice and rate of responding in the pigeon. Unpublished doctoral dissertation, Harvard University.
Nevin, J. A. (1969). Interval reinforcement of choice behavior in discrete trials. Journal of the Experimental Analysis of Behavior, 12, 875–885.
Nevin, J. A. (1984). Quantitative analysis. Journal of the Experimental Analysis of Behavior, 42, 421–434.
Osgood, C. E. (1953). Method and theory in experimental psychology. New York: Oxford University Press.
Palmer, D. C., & Donahoe, J. W. (1992). Essentialism and selectionism in cognitive science and behavior analysis. American Psychologist, 47, 1344–1358.
Pear, J. J. (1985). Spatiotemporal patterns of behavior produced by variable-interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 44, 217–231.
Platt, J. R. (1979). Interresponse-time shaping by variable-interval-like interresponse-time reinforcement contingencies. Journal of the Experimental Analysis of Behavior, 31, 3–14.
Quinsey, V. L. (1971). Conditioned suppression with no CS-US contingency in the rat. Canadian Journal of Psychology, 25, 69–82.
Rescorla, R. A. (1967). Pavlovian conditioning and its proper control procedures. Psychological Review, 74, 71–80.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Rosenblatt, F. (1962). Principles of neurodynamics. Washington, DC: Spartan.
Rumelhart, D. E., McClelland, J. L., & The PDP Research Group. (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1). Cambridge, MA: MIT Press.
Schoenfeld, W. N., & Farmer, J. (1970). Reinforcement schedules and the "behavior stream." In W. N. Schoenfeld (Ed.), The theory of reinforcement schedules (pp. 215–245). New York: Appleton-Century-Crofts.
Shimp, C. P. (1966). Probabilistically reinforced choice behavior in pigeons. Journal of the Experimental Analysis of Behavior, 9, 443–455.
Shimp, C. P. (1969). Optimal behavior in free-operant experiments. Psychological Review, 76, 97–112.
Shimp, C. P. (1974). Time allocation and response rate. Journal of the Experimental Analysis of Behavior, 21, 491–499.
Shull, R. L. (1995). Interpreting cognitive phenomena: Review of Donahoe and Palmer's Learning and Complex Behavior. Journal of the Experimental Analysis of Behavior, 63, 347–358.
Sidman, M. (1986). Functional analysis of emergent verbal classes. In T. Thompson & M. D. Zeiler (Eds.), Analysis and integration of behavioral units (pp. 213–245). Hillsdale, NJ: Erlbaum.
Silberberg, A., Hamilton, B., Ziriax, J. M., & Casey, J. (1978). The structure of choice. Journal of Experimental Psychology: Animal Behavior Processes, 4, 368–398.
Silberberg, A., & Ziriax, J. M. (1982). The interchangeover time as a molecular dependent variable in concurrent schedules. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts of behavior (pp. 111–130). Cambridge, MA: Ballinger.
Skinner, B. F. (1931). The concept of the reflex in the study of behavior. Journal of General Psychology, 5, 427–458.
Skinner, B. F. (1937). Two types of conditioned reflex: A reply to Konorski and Miller. Journal of General Psychology, 16, 272–279.
Skinner, B. F. (1938). The behavior of organisms. New York: Appleton-Century-Crofts.
Skinner, B. F. (1948). "Superstition" in the pigeon. Journal of Experimental Psychology, 38, 168–172.
Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.
Skinner, B. F. (1976). Farewell, my lovely! Journal of the Experimental Analysis of Behavior, 25, 218.
Skinner, B. F. (1983). A matter of consequences. New York: Knopf.
Sober, E. (1984). The nature of selection. Cambridge, MA: MIT Press.
Sober, E. (1993). Philosophy of biology. Boulder, CO: Westview Press.
Staddon, J. E. R. (1993). The conventional wisdom of behavior analysis. Journal of the Experimental Analysis of Behavior, 60, 439–447.
Staddon, J. E. R., & Hinson, J. M. (1983). Optimization: A result or a mechanism? Science, 221, 976–977.
Staddon, J. E. R., & Simmelhag, V. L. (1971). The "superstition" experiment: A reexamination of its implications for the principles of adaptive behavior. Psychological Review, 78, 3–43.
Stein, L., & Belluzzi, J. D. (1988). Operant conditioning of individual neurons. In M. L. Commons, R. M. Church, J. R. Stellar, & A. R. Wagner (Eds.), Quantitative analyses of behavior (Vol. 7, pp. 249–264). Hillsdale, NJ: Erlbaum.
Stein, L., & Belluzzi, J. D. (1989). Cellular investigations of behavioral reinforcement. Neuroscience and Biobehavioral Reviews, 13, 69–80.
Stein, L., Xue, B. G., & Belluzzi, J. D. (1993). A cellular analogue of operant conditioning. Journal of the Experimental Analysis of Behavior, 60, 41–53.
Stein, L., Xue, B. G., & Belluzzi, J. D. (1994). In vitro reinforcement of hippocampal bursting: A search for Skinner's atom of behavior. Journal of the Experimental Analysis of Behavior, 61, 155–168.
Timberlake, W., & Lucas, G. A. (1985). The basis of superstitious behavior: Chance contingency, stimulus substitution, or appetitive behavior? Journal of the Experimental Analysis of Behavior, 44, 279–299.
Tolman, E. C. (1932). Purposive behavior in animals and men. New York: Appleton-Century-Crofts.
Watson, J. B. (1924). Behaviorism. New York: Norton.
Williams, B. A. (1985). Choice behavior in a discrete-trial concurrent VI-VR: A test of maximizing theories of matching. Learning and Motivation, 16, 423–443.
Williams, B. A. (1986). Identifying behaviorism's prototype: A review of Behaviorism: A Conceptual Reconstruction by G. E. Zuriff. The Behavior Analyst, 9, 117–122.
Williams, B. A. (1990). Enduring problems for molecular accounts of operant behavior. Journal of Experimental Psychology: Animal Behavior Processes, 16, 213–216.
Williams, D. R. (1968). The structure of response rate. Journal of the Experimental Analysis of Behavior, 11, 251–258.
Wise, R. A. (1989). The brain and reward. In J. M. Liebman & S. J. Cooper (Eds.), The neuropharmacological basis of reward (pp. 377–424). New York: Oxford University Press.
Zuriff, G. E. (1985). Behaviorism: A conceptual reconstruction. New York: Columbia University Press.