
PSYCHOLOGICAL REVIEW
VOL. 76, No. 2   MARCH 1969

OPTIMAL BEHAVIOR IN FREE-OPERANT EXPERIMENTS 1

CHARLES P. SHIMP
University of Utah

Expected Utility Theory successfully predicts the steady-state relative frequencies of the two alternatives in two-key, discrete-trial probability-learning experiments, in concurrent variable interval schedules, and in concurrent variable interval schedules modified to study magnitude and delay of reinforcement and conditioned reinforcement. It also successfully predicts the relative frequencies of interresponse times in one-key variable interval schedules. Different interresponse times between pecks on a single key and pecks on different keys are treated as different instances of concurrent operants that have basically the same functional relationships with reinforcement variables. The theory states that an S chooses whichever alternative momentarily has the greatest weighted probability of reinforcement.

In recent years the increasingly powerful techniques for behavioral control provided by operant conditioning have enabled experimenters to discover quantitative relationships in the steady-state performance of individual organisms. This paper shows how several of these quantitative relationships, obtained from a wide variety of experimental situations, can be organized under a small set of assumptions that constitute a descriptive theory of operant behavior. The basic assumption holds that an organism behaves optimally in the sense that each of its choices is of the alternative momentarily having the greatest value, where the value attached to an alternative equals its momentary objective reinforcement probability weighted by an associated value. The purpose of this theory is to organize and relate the data from a maximum number of experiments under a minimum number of assumptions. The model treats only those variables having the largest and best-known effects on behavior and is, therefore, only a first-order approximation. Some implications of this approach are discussed below.

It will be shown that each of the experiments discussed below is an instance of the following paradigm. A subject (S) chooses repetitively from among a given set of alternatives. Each alternative is denoted by an A_i, where i ranges over the number of alternatives. Sometimes when A_i occurs it is followed by a reinforcing event E_i. The momentary probability of reinforcement for response A_i, that is, the probability that A_i, if it occurs, is reinforced, is called P(E_i). Each reinforcing event E_i has some quantity associated with it, such as its magnitude, or a delay superimposed between A_i and E_i, which determines for it a value, V(E_i). The specific functional relationships between these quantities and the V(E_i) are best deferred to a later section, but it is assumed that greater magnitudes, or shorter delays, will determine higher values.

1 This work was done at Stanford University and was supported in part by the author's United States Public Health Service postdoctoral fellowship. Access to the computer, a PDP-8, was kindly provided by the Stanford Department of Preventive Medicine. An abbreviated version of this paper was presented at the Symposium on Concurrent Operants at the Washington, D.C., meeting of the American Psychological Association, September 1967.
It is assumed that an S chooses the alternative for which the momentary objective probability of reinforcement times the associated value is greatest, or, in shorter terms, the A_i for which P(E_i) · V(E_i) is greatest. This largest weighted reinforcement probability will be denoted by

    MAX[P(E_i) · V(E_i)].  [1]

Nonreinforcement of a response is assumed to have zero value. The assumption embodied in Equation 1 is the invariant core of the optimal process that will appear throughout this paper in several different forms, each of which is determined by a different experimental procedure. It will be noted in passing that the present theory formally resembles Expected Utility Theory (Edwards, 1954, 1961; Luce & Raiffa, 1957; Luce & Suppes, 1965). That is, each one of an S's choices is assumed to be of that alternative A_i for which

    Σ_{j=1}^{J} P(E_j) · V(E_j)

is greatest. This sum is taken over the J possible outcomes of A_i. Since these outcomes here consist of just reinforcement or nonreinforcement, and since the latter outcome has zero value, the largest sum is precisely Formula 1. This formal relationship may help to place the present theory into proper perspective, among the various possible sorts of quantitative theories, as to aims, limitations, and so forth. In particular, the aim of the present theory is to describe succinctly as much data as possible, and the theory is limited to steady-state behavior. The formal assumptions have broad applicability and intuitive appeal, and their merits and limitations already have been discussed at length (e.g., Edwards, 1961; Luce & Suppes, 1965).
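The reduction of this sum to Formula 1 can be written out in one line (this is only a restatement, in LaTeX notation, of the argument just given). For any alternative A_i the outcomes are E_i, with probability P(E_i) and value V(E_i), and nonreinforcement, with probability 1 − P(E_i) and value zero, so that

    \[
    \sum_{j=1}^{J} P(E_j)\,V(E_j) \;=\; P(E_i)\,V(E_i) + \bigl(1 - P(E_i)\bigr)\cdot 0 \;=\; P(E_i)\,V(E_i),
    \]

and choosing the alternative with the largest expected value is therefore exactly Formula 1.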
Quantitative predictions for a particular experiment can be made as soon as rules of correspondence are made between the empirical variables and the formal terms of the theory. Although the rules of correspondence for a particular experiment are best postponed to the discussion of that experiment, it is appropriate to mention here that the A_i sometimes will be choices of different keys and sometimes will be choices of different interresponse times. In other words, the present theory predicts both relative and absolute response rates. The first two experiments to be described by Formula 1 are the discrete-trials probability-learning experiment and the concurrent variable interval schedule of reinforcement. Previous work (Shimp, 1966) has already indicated that the behavior of pigeons in these experiments is very nearly optimal, and thus to show that this behavior is predicted by a special case of Formula 1 now will be a simple matter.

The reinforcement contingencies in a concurrent variable interval schedule will be discussed in detail because the principles involved are basic to all that follows. Then the next three experiments to be described are modified concurrent variable interval schedules designed to study conditioned reinforcement (Herrnstein, 1964a), delay of reinforcement (Chung & Herrnstein, 1967), and magnitude of reinforcement (Catania, 1963b; Neuringer, 1967). Finally, it will be shown how performance in a one-key variable interval schedule can be described in terms of the same formal model. This variable interval schedule is one of the most common base-line schedules in operant conditioning and there is a decided need for a useful account of the behavior it produces. The data typically recorded from this schedule, and thus the data the present theory must describe, are response rates measured either by cumulative records or by relative frequencies of interresponse times. The other kind of schedule analyzed here, the concurrent variable interval schedule, is one in which two separate variable interval schedules are programmed independently and at the same time on two separate response keys. An alternative method of programming this schedule employs one key on which one or the other of two variable interval schedules is operative at any moment, and a second key, called the change-over key, on which a peck changes the schedule in effect on the first key (e.g., Catania, 1966; Findley, 1958; Shull & Pliskoff, 1967). In the present paper, these two methods are treated the same. For this schedule, programmed either way, as for most discrete-trials probability-learning procedures, the most commonly reported data are choice probabilities.
The present analysis will emphasize, for both one-key and two-key experiments, the need for other and more detailed measures of behavior, such as sequential statistics.

PROBABILITY-LEARNING EXPERIMENTS AND CONCURRENT VARIABLE INTERVAL SCHEDULES

Much of the original interest in probability-learning experiments was due, of course, to the counter-intuitive prediction from Statistical Learning Theory (Estes & Straughan, 1954) that, under certain conditions, an organism would match its choice probabilities to the reinforcement probabilities. This prediction was counter-intuitive precisely because it contradicted the prediction that an organism would respond optimally. As it turns out, a pigeon will, after prolonged training, nearly always choose the alternative having the highest probability of reinforcement in discrete-trial probability-learning experiments (Graf, Bullock, & Bitterman, 1964; Shimp, 1966). This alternative is the same from trial to trial so that its probability of being chosen eventually approximates unity. Such behavior is called maximizing because it maximizes the time rate of reinforcement. An equivalent description is that each choice is of the alternative momentarily having the greatest probability of reinforcement. Obviously, this maximizing is a special case of the behavior predicted by the optimal process described above, with A_1 and A_2 equal to a choice of the left or right key, with E_1 and E_2 equal to a feeder presentation after an A_1 or an A_2, respectively, and with V(E_1) equal to V(E_2) since there is no difference in reinforcement delay or magnitude.

Since the time rate of reinforcement thus appears precisely to control behavior in probability-learning experiments, does it similarly control behavior in any other experiments in which reinforcements are also programmed according to a random schedule? One such experimental procedure is the concurrent variable interval schedule of reinforcement. The question of whether a bird's behavior is optimal here is especially intriguing because in this schedule the relative frequency of one response approximately equals, or matches, the relative frequency of reinforcement of that response (Catania, 1963a; Herrnstein, 1961; Reynolds, 1963). This matching has been described as a consequence of

    a plausible view of response strength: Rate of responding is a linear measure of response strength, which is itself a linear function of frequency of reinforcement. . . . According to this point of view, the animals match relative frequency of responding to relative frequency of reinforcement not because they take into account what is happening on the two keys, but because they respond to the keys independently [Herrnstein, 1961, p. 270].

To differentiate between this account for matching in terms of response strength and an account in terms of Formula 1, it is first necessary to determine the reinforcement contingencies in a concurrent variable interval schedule. Then the variables P(E_i) and V(E_i) in Formula 1 may be computed. Customarily in this schedule, the probability that reinforcement is programmed for one of two alternatives increases almost linearly with the time since the last choice of that alternative. The approximation to linearity is especially close over the range of interresponse times that usually occurs. Thus the probability of reinforcement for a choice of one alternative increases while a bird makes other choices, and nonreinforcement of one choice resets the probability of reinforcement of that choice back to its lowest value. In short, reinforcement probabilities change over time as a function of both the schedule and a bird's behavior. In general, at any moment, P(E_i) = R_i · t_i, where R_i is the probability of reinforcement per unit time for response A_i, i = 1, 2 (determined by the schedule), and t_i is the time since the last A_i (determined by the bird). For all t_i greater than 1/R_i, R_i · t_i is set equal to 1.0 since it is a probability. [In arithmetic variable interval schedules, P(E_i) also depends on the time since the preceding reinforcement. This dependency affects absolute response rates (Catania & Reynolds, 1968) but, as explained below, the absolute response rate does not affect the predictions of relative response rates in two-key schedules.] Since there are no delays and since reinforcement magnitudes are equal, V(E_1) = V(E_2).
Therefore, according to Formula 1, maximizing would consist of a very strong tendency to choose the alternative that momentarily happens to have the highest probability of reinforcement, that is, the alternative which at any moment has a reinforcement probability equal to

    MAX[P(E_i)] = MAX[R_i · t_i].  [2]
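In code, the momentary decision rule is short. The following Python fragment is only an illustration of Equation 2; the function names and the numbers in the example are mine, not the paper's.

    def p_reinf(R_i, t_i):
        # P(E_i) = R_i * t_i, capped at 1.0 since it is a probability.
        return min(R_i * t_i, 1.0)

    def momentary_choice(R, t):
        # Choose the alternative whose momentary probability of
        # reinforcement is currently the greatest (Equation 2).
        probs = [p_reinf(R_i, t_i) for R_i, t_i in zip(R, t)]
        return probs.index(max(probs))

    # Key 0 has the richer schedule, but key 1 has gone unchosen three
    # times as long, so key 1 is momentarily the optimal choice.
    print(momentary_choice(R=[0.02, 0.01], t=[10.0, 30.0]))  # prints 1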
While Equation 2 gives us the optimal response at any moment, it is not immediately obvious what the resulting data, averaged over an entire session, would resemble. Fortunately, quantitative predictions may be obtained by letting a high-speed computer simulate behavior that is at each moment described by Equation 2. But notice that one necessary feature of behavior is so far omitted from the theory. As yet, there is no provision for establishing an absolute response rate. That is, the theory up to this point describes only which one of two keys is to be chosen, not when a choice is to occur. Thus, an additional process is required to generate a base-line response rate. Unfortunately, the current literature is of little help here, since much less is known about the absolute response rate than about the relative response rate in a concurrent variable interval schedule. In fact, there is almost nothing known about the interresponse-time distributions and the interresponse-time sequential statistics. What little is known concerns the effects of the procedural device known as a change-over delay, frequently used to reduce the frequency of adventitiously reinforced switching behavior. A change-over delay appears to generate short bursts of responses that consume somewhat more time than the change-over delay (cf. Catania, 1961; 1963a, Fig. 7, top cumulative record). However, the distribution of numbers of responses in these bursts, and the distributions of interresponse times at different times after such a switch from one key to another, are not known; therefore such local properties of behavior are omitted from consideration here. Thus the theory describes an ideal situation where choice behavior is unaffected by position biases and superstitious switching, yet the effects on the absolute response rate of the change-over delay, usually required to avoid these effects, are also omitted. In short, without quantitative data to guide the selection of a base-line response rate, a geometric process was selected primarily on the grounds of convenience. The probability of a time interval of length k between successive responses, on either key, was set equal to q^k(1 − q), where 1 − q was the probability of an occurrence of either response in some small time interval. Different values of the single parameter q determining the constant response rate were found not to produce different relative frequencies of choices, at least for parameter values corresponding to realistic response rates.
simulate behavior that is at each moment The computer program can be summarized
described by Equation 2. But notice that now. The probability of reinforcement for
one necessary feature of behavior is so far Ai, P(Ei}, equaled at each moment the prod-
omitted from the theory. As yet, there is no uct equal to Ri, a constant determining the
provision for establishing an absolute re- reinforcement rate for Alt multiplied by £«,
sponse rate. That is, the theory up to this the time since the last "occurrence" of A^
point describes only which one of two keys When the geometric process assigned an oc-
is to be chosen, not when a choice is to occur. currence of a response, the computer calcu-
Thus, an additional process is required to lated P(-Ei) and P(E2), determined which
generate a base-line response rate. Unfor- was larger, and then "chose" the alternative
tunately, the current literature is of little help having the larger one. A programmed re-
here since much less is known about the sponse occurrence was canceled if P(£i)
absolute response rate than about the relative equaled F(£ 2 ). A change-over delay was
response rate in a concurrent variable inter- programmed so that the occurrence of EI
val schedule. In fact, there is almost nothing or EZ was never preceded by an AZ or an
known about the interresponse-time distribu- A\, respectively, by less than a certain inter-
tions and the interresponse-time sequential val of time. This interval, the change-over
statistics. What little is known concerns the delay, appeared to have little effect on the
effects of the procedural device known as a resulting stat-data and so it was omitted
change-over delay, frequently used to reduce from the programs for the experiments de-
the frequency of adventitiously reinforced scribed later in this paper. The programmed
switching behavior. A change-over delay change-over delays were equivalent to ap-
appears to generate short bursts of responses proximately 1 second, and were equal for
that consume somewhat more time than the both alternatives. Recent data (Shull &
change-over delay (cf. Catania, 1961 ; 1963a, Pliskoff, 1967) indicate that relative fre-
Fig. 7, top cumulative record). However, quencies of choices depend on change-over
the distribution of numbers of responses in delays if the delays are unequal and unusually
these bursts, and the distributions of inter- long. Unequal change-over delays affect the
response times at different times after such a momentary expected times to reinforcement
switch from one key to another, are not on the two keys. These momentary values
known; therefore such local properties of affect the predictions of the present theory,
behavior are omitted from consideration here. as explained below in the context of delay of
Thus the theory describes an ideal situation reinforcement. The data of Shull and Plis-
where choice behavior is unaffected by posi- koff seem to be in qualitative agreement with
tion biases and superstitious switching, yet the predictions given below for the effects of
the effects on the absolute response rate of delay of reinforcement. In short, the simu-
the change-over delay, usually required to lation program arranged that whenever a
OPTIMAL BEHAVIOR IN FREE OPERANT EXPERIMENTS 101

choice occurred, it was the alternative having pend on times since the last At's, it is very
the greater reinforcement probability, and likely that effects of induction across different
this probability was determined as it is in a interresponse times (cf. Shimp, 1967b) ulti-
concurrent variable interval schedule. mately will have to be considered.
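The original simulation ran on a PDP-8 and is not reprinted in the paper, so the following Python sketch is only a reconstruction from the verbal summary above; the unit time step, the particular parameter values, and the omission of the change-over delay are assumptions of the sketch rather than details of the original program. It combines the geometric base-line process with the momentary maximizing rule of Equation 2.

    import random

    def simulate_conc_vi(R, q, steps=200_000, seed=1):
        # Stat-organism for a two-key concurrent VI schedule: the geometric
        # process decides WHEN a response occurs; Equation 2 decides WHERE.
        rng = random.Random(seed)
        t = [0.0, 0.0]          # time since each key was last pecked
        choices = [0, 0]        # pecks recorded on each key
        reinfs = [0, 0]         # reinforcements earned on each key
        for _ in range(steps):
            t[0] += 1.0         # both clocks advance every small interval
            t[1] += 1.0
            if rng.random() < q:
                continue        # geometric process: no response this step
            p = [min(R[i] * t[i], 1.0) for i in range(2)]  # P(E_i) = R_i * t_i
            if p[0] == p[1]:
                continue        # a programmed response is canceled on a tie
            i = 0 if p[0] > p[1] else 1   # momentary maximizing (Equation 2)
            choices[i] += 1
            if rng.random() < p[i]:       # schedule pays off with P(E_i)
                reinfs[i] += 1
            t[i] = 0.0          # a choice of key i resets its clock
        return choices, reinfs

    # Key 0 is reinforced at three times the rate of key 1.
    choices, reinfs = simulate_conc_vi(R=[0.003, 0.001], q=0.8)
    print(choices[0] / sum(choices), reinfs[0] / sum(reinfs))

Nothing in the sketch mentions relative frequencies; if the two printed proportions come out close to one another, the matching is, as the text argues, an emergent property of momentary maximizing rather than a principle written into the program.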
The computer program just described was used, then, to simulate maximizing in concurrent variable interval schedules (Shimp, 1966). The resulting stat-data revealed that maximizing produces a relative frequency of choices of one alternative that closely approximates the relative frequency of reinforcement of that alternative. In short, maximizing produces an approximate matching. Thus Equation 2 predicts the correct overall choice probability for concurrent variable interval schedules. The same computer program, minus the geometric process, also was used to simulate maximizing in a discrete-trials choice experiment equivalent to a concurrent variable interval schedule, except that the absolute response rate was controlled by the experimenter, not by the bird. The stat-data revealed not only matching but close approximations to two kinds of data unavailable from concurrent variable interval schedules: sequential statistics (the probabilities of a choice given different sequences of preceding choices), and the probability of a choice as a function of the length of time since the preceding reinforcement. These data are badly needed from concurrent variable interval schedules themselves. The fit between predicted and observed values appears all the more striking in these instances since the present model is deterministic and provides for no variability. Close fits in such cases are possible only because of the extraordinarily good control provided by the schedules of reinforcement.

Originally it was implied (Shimp, 1966) that responding was based on preceding sequences of choices. More recent data (Shimp, 1967a) indirectly suggest that preceding choices are unlikely to control choice behavior. These data, plus the demand for parsimony between the analysis for concurrent variable interval schedules here and the analysis for variable interval schedules described below, suggest the present formulation, Equation 2, that is, a formulation stated only in terms of lengths of time since preceding choices. Since predictions here depend on times since the last A_i's, it is very likely that effects of induction across different interresponse times (cf. Shimp, 1967b) ultimately will have to be considered.

The adequacy of a model is perhaps best understood in comparison with that of alternative models. In the present case, the only competing account of matching is provided by the response strength notion, as succinctly described by Herrnstein in the above quotation. The present model would seem to provide the better account on several grounds. First, maximizing provides accurate, quantitative predictions for low-order sequential statistics. It is not clear how to generate these statistics from response strength ideas. Second, as we shall try to show below, the present model can be generalized to account for much additional data. It remains to be seen whether the response strength idea can be so generalized.

In summary, the one most important consequence of the present analysis is the implication that matching in concurrent variable interval schedules results from averaging over many choices, no one of which obeys any matching principle. The data from this schedule are therefore consistent with those from probability-learning experiments. The consistent relationship is between momentary choice probabilities and momentary reinforcement probabilities, rather than between average choice probabilities and average reinforcement probabilities. This relationship is summarized by the optimal process defined by Formula 1. Skinner's well-known warning that a curve averaged over Ss may be unrepresentative of individual Ss is, in generalized form, appropriate here: A curve averaged over responses from a single S may be unrepresentative even of that S's individual responses.

MODIFIED CONCURRENT VARIABLE INTERVAL SCHEDULES OF REINFORCEMENT

Although Formula 1 accounts nicely for the matching between relative rates of responding and of reinforcement in unmodified concurrent variable interval schedules, still other evidence for a more general kind of matching has been obtained from various modified concurrent variable interval schedules.
The existence of this more general matching has made the relationship between maximizing and matching in the unmodified schedules appear coincidental. In a study by Catania (1963b), the relative frequency of a choice approximately equaled the relative magnitude of reinforcement of that choice, when magnitude of reinforcement was measured by the duration of access to food and when the two variable interval schedules were equal. Also, Chung and Herrnstein (1967) showed that the relative frequency of a choice approximately equaled the relative immediacy of reinforcement (defined as the complement of the relative delay of reinforcement); again, the two variable interval schedules were equal. The delay of reinforcement equaled the duration of a black-out programmed between response and reinforcement. In addition, Herrnstein (1964a) performed an experiment on conditioned reinforcement. He programmed a two-link schedule in which the first link was a concurrent schedule with equal variable interval components. The second link was either a variable interval or a variable ratio schedule programmed on just one or the other of the two keys, and the remaining key was nonfunctional. Choices of one key in the first link were reinforced by the presentation of the stimulus associated with primary reinforcement on that key in the second link. The relative frequency of a response to a key in the first link approximately equaled the relative frequency of primary reinforcement on the same key in the second link. Herrnstein concluded that the relative reinforcing strength of a conditioned reinforcing stimulus approximates the relative rate of primary reinforcement in the presence of that stimulus.

According to the following analysis, these experiments are all conceptually identical, and Formula 1 can account for the data from each of them. To show this, we first must provide the rules of correspondence between the terms in Formula 1, that is, between P(E_i) and the V(E_i), and the empirical variables in each of the experiments. The rules are basically the same for each experiment, so we shall discuss in detail only Herrnstein's experiment on conditioned reinforcement. Obviously, an optimal choice in the first link would be guided both by the probability of conditioned reinforcement and by the delay between conditioned reinforcement and primary reinforcement. More precisely, a low probability would be offset by a sufficiently short delay. If a pigeon's behavior here maximized the rate of primary reinforcement, then V(E_i) would equal 1/D_i and the maximized variable would be

    P(E_i) · V(E_i) = R_i · t_i · (1/D_i)  [3]

where P(E_i) is determined exactly as in Equation 2, since again the schedule is a concurrent variable interval, and D_i is the time interval that must elapse between onset of the conditioned reinforcing stimulus and primary reinforcement. The intervals D_i for each alternative were actually variables in Herrnstein's experiment, but for convenience in the computer simulation of this experiment, each D_i was set equal to the reciprocal of the arithmetic average rate of primary reinforcement for alternative A_i. (Herrnstein, 1964b, has shown that the arithmetic average is not exactly right for this purpose. However, there is no general agreement on what should be used. For example, see Fantino, 1967; McDiarmid & Rilling, 1965.) When the computer program described above simulated maximizing, with Equation 2 replaced by Equation 3, the resulting predicted curve was never more than .07 from the main diagonal representing perfect matching. The chief discrepancy between theoretical and obtained curves was that they lay on opposite sides of the main diagonal.

Equation 3, where the value of a delay is assumed to equal the reciprocal of that delay, is a special case of Formula 1. Only in this special case will the predicted behavior optimize the rate of reinforcement. In general, the value of a given delay is assumed not to be so simply related to the length of that delay. Instead, this relationship clearly ought to be as consistent as possible with the vast literature on the effects on behavior of a delay superimposed between a response and the succeeding reinforcement. The present model depends on delay of reinforcement only because such a delay may affect the rate of reinforcement.
Therefore, no distinction is made between a delay that postpones reinforcement and must be terminated by a response, and a delay of equal duration that postpones reinforcement but terminates independently of behavior. "Delay of reinforcement" usually refers, of course, only to the second procedure, but here the term will apply to both unless specifically stated otherwise. Much of the data on delays of either type (for example, see Bower, McLean, & Meacham, 1966; Chung, 1965; Fantino, 1967; Herrnstein, 1964b; Logan, 1960; Perin, 1943; Prokasy, 1956; Pubols, 1962; Wyckoff, 1959) can be summarized, at least qualitatively, by either a power function or an exponential function. For present purposes then, we may say that the literature suggests that the value of a given delay is either a power function or an exponential function of that delay, with increasingly long delays having less and less value.
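In symbols (the parameter names a and C are mine; the functional forms are those just cited), the two candidate value functions for a delay D are

    \[
    V(D) = D^{-a} \qquad \text{or} \qquad V(D) = e^{-CD},
    \]

both decreasing as the delay D grows. Both forms are used below: a power function with a = 2.0 for the two-key experiments, and an exponential with C = 1.5 or 2.0 for the one-key data analyzed later.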
Both the power function and the exponential function were employed at different times to simulate maximizing in Herrnstein's experiment on conditioned reinforcement. Even though the exponential function gave slightly better fits, only the power function will be discussed here since it required but one estimated parameter to give an adequate fit and therefore gave a more parsimonious account. An informal parameter search indicated that a power equal to 2.0 for each bird would provide an acceptable fit to the data. The available computational facilities limited the possible values for this parameter and so there may be values giving even better fits. Figure 1 compares the original data from Herrnstein's experiment with the results of simulating maximizing where each choice was of the alternative having the greatest weighted reinforcement probability, that is,

    MAX[R_i · t_i · (1/D_i^2)].  [4]

The stat-data compare favorably enough with the real data to suggest that the assumptions leading to Equation 4 may account for the major portion of the effect of the independent variable. Therefore, the data may describe not a relation between rate of primary reinforcement and the conditioned reinforcing strength of a conditioned reinforcing stimulus, but perhaps instead, the degree of control by the probability of primary reinforcement weighted by a delay-of-reinforcement gradient.

[FIG. 1. Relative frequencies of response A_1 during the first link as a function of the relative rate of primary reinforcement on that key in the second link. (The real data are from the left panels of Figure 4 in Herrnstein's, 1964a, paper on conditioned reinforcement. The straight line was plotted by Herrnstein. The curved line was obtained by computer simulation of the optimal process defined by Equation 4. The predicted curve closely approximates the data and the best-fitting straight line.)]

The application of the model to the experiment by Chung and Herrnstein (1967) on delay of reinforcement requires only that the appropriate changes be made in the rules of correspondence between formal terms and empirical variables. Thus all the rules remain the same except that D_i is now the length of the black-out between an occurrence of A_i and the following E_i. With this single change, Equation 4 becomes the basis for the simulation of maximizing for this second modified concurrent variable interval schedule. Figure 2 shows the resulting fit between the stat-data and the real data for the group of birds that had a standard delay of 8 seconds. The real data for which the standard delay was 16 seconds were too greatly influenced by position preferences to be suitable for analysis here. The fit revealed in Figure 2 is fairly good, especially since the parameter was held equal to 2.0, its value in the different context of Herrnstein's experiment on conditioned reinforcement.
[FIG. 2. The left panel shows the relative frequency of response A_1 as a function of the delay of reinforcement for A_1. (The delay of reinforcement for response A_2 was always 8 seconds. The right panel shows the same data as a function of the relative delay of reinforcement. The real data are from Figure 2 in the paper by Chung and Herrnstein, 1967. The predicted curve in each panel was obtained by computer simulation of the optimal process defined by Equation 4. The predicted curves roughly approximate the obtained curves.)]

The present analysis suggests that of the experiments by Chung (1965) and by Chung and Herrnstein (1967), only the former directly studied delay of reinforcement, according to the usual definition given above. In Chung's experiment, maximum reinforcement rates on the two keys were equated by nondelay black-outs. Therefore, the present model would predict a relative frequency of .50 for responding on either key. The deviations from this prediction presumably reflect genuine effects of delay of reinforcement. The curvilinear function Chung obtained is consistent with most of the data on delay of reinforcement. However, the nondelay black-outs were omitted from the experiment by Chung and Herrnstein, with the result that the reinforcement rate, rather than delay of reinforcement in the usual sense, was the controlling variable. That is, maximum reinforcement rates were not equated on the two keys, and the present model for optimal behavior predicts matching if a curvilinear delay-of-reinforcement function is assumed. (The present account therefore shows that the matching function obtained by Chung and Herrnstein is consistent with the literature.) In other words, the present analysis suggests that the data from the experiment by Chung and Herrnstein would have been essentially unchanged if a response had been required, at the end of the delay interval, to produce reinforcement.

This prediction from the model recently has been tested by the author. A paper currently in preparation will provide the details of this experiment and only an outline of the procedure and results will be given here. The procedure of the experiment duplicated that of the experiment by Chung and Herrnstein except that reinforcement was not automatically presented at the end of a black-out. Thus, the formal structure of the experiment was identical to the experiment by Chung and Herrnstein in terms of reinforcement probabilities and lengths of black-outs, that is, in terms of the present model; yet reinforcements were always presented immediately after a key peck. In short, there was no delay of reinforcement in the sense presumably intended by Chung and Herrnstein.
Nevertheless, the data of three birds replicated the results of their experiment. That is, a bird matched the relative frequency of a response to the relative immediacy of reinforcement of that response. Thus, the Chung and Herrnstein data would seem not to establish a matching function for delay of reinforcement, as defined traditionally. Instead, the present analysis suggests that their data actually increase the generality of a curvilinear function such as that found by Chung and used in Equation 4. This confirmation of an a priori prediction of the theory reveals that, although the theory is primarily intended as a means of describing data, it nevertheless also has predictive power by virtue of its implicating certain variables and their relationships as the controlling factors in a wide class of experiments.

In Catania's experiment on magnitude of reinforcement, the duration of reinforcement associated with one alternative increased the value of that alternative, and so the value equivalent to Equation 4 is

    MAX[R_i · t_i · D_i^2]  [5]

where R_i and t_i are as in Equation 4, and D_i is the feeder duration for response A_i. Catania used feeder durations of 3, 4.5, and 6 seconds to obtain relative feeder durations of .33, .50, and .67. The relative frequencies of choices predicted from Equation 5 are .27, .50, and .73 and approximately match the relative feeder durations. Thus the same power function that determined the values of different delays also determined the values of different reinforcement magnitudes. However, this identity between functions presumably is coincidental. More generally, when reinforcement magnitude is determined not by feeder duration but instead by, say, sucrose concentration, one would imagine that at least the parameter value in the power function would differ from the one used here.
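Seen side by side, the three decision rules differ only in the weight attached to the momentary reinforcement probability. The following Python fragment is an illustrative paraphrase (the function names are mine, and a = 2.0 is the exponent estimated in the text):

    def value_eq2(R_i, t_i):
        # Unmodified concurrent VI (Equation 2): the unweighted probability.
        return min(R_i * t_i, 1.0)

    def value_eq4(R_i, t_i, D_i, a=2.0):
        # Delay of reinforcement, or conditioned reinforcement (Equation 4):
        # the weight is a decreasing power function of the delay D_i.
        return min(R_i * t_i, 1.0) * D_i ** -a

    def value_eq5(R_i, t_i, D_i, a=2.0):
        # Magnitude of reinforcement (Equation 5): the weight is an
        # increasing power function, D_i now being the feeder duration.
        return min(R_i * t_i, 1.0) * D_i ** a

    # In each case the S is assumed to choose, at every response occasion,
    # the alternative whose value is momentarily the greater.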
The account provided here for the matching behavior in modified concurrent schedules is parsimonious. First, there is but one estimated parameter in the power function. Second, the matching behavior in each of the above two-key experiments is different from the behavior generated in one-key experiments. In fact, these matching relationships for delay of reinforcement, magnitude of reinforcement, and conditioned reinforcement seem restricted to these specific procedures. Therefore, equations that describe behavior in these procedures fail to describe behavior in analogous one-key procedures or in any other two-key procedures (Catania, 1963a; Herrnstein, 1964a). From the present account we can see why this is so. These equations are based on the assumption that matching is the invariant relationship. But it now appears that matching may be only a mathematical consequence of maximizing. The real invariant in these experiments appears to be an optimal process.

Clearly, Equations 4 and 5 are not restricted to the special case for which the variable interval components are equal. That is, the model could predict data for the modified concurrent variable interval schedules discussed above even if R_1 were not equal to R_2. The testing of such predictions is a possible direction for further research in this area. But the rest of this paper travels still another route. In the experiments by Chung and by Chung and Herrnstein, only certain responses initiated delays. But now suppose that each response initiated a delay such that a reinforcement, if programmed, could not occur until the end of that delay. We shall show in the next section that, with appropriate rules of correspondence, the present model describes behavior on a one-key variable interval schedule of reinforcement in terms of this special case in which each response initiates a delay.

VARIABLE INTERVAL SCHEDULES OF REINFORCEMENT

An important assumption behind the description of variable interval behavior by Formula 1 was shown some years ago by Anger (1954) to have empirical validity. This idea is simply that an ordinary, one-key variable interval schedule is a concurrent schedule of reinforcement for different interresponse times. To show this, Anger devised a synthetic variable interval schedule in which reinforcements were programmed separately for different bands of interresponse times. He used a different intermittent schedule, typically a fixed interval schedule, for each of the different bands. Thus a response terminating an interresponse time in a given band was reinforced if a certain length of time had elapsed since the preceding reinforcement of that band of interresponse times. Anger selected these lengths of time, that is, the values of the fixed interval schedules, so that the rate of reinforcement in each band approximated the rate of reinforcement in that band in an ordinary variable interval schedule. To obtain the latter rates, Anger ran his animals on a variable interval schedule until they stabilized. He then shifted the animals to his synthetic schedule and discovered that it maintained the behavior previously established by the variable interval schedule. In short, Anger found that his synthetic variable interval schedule was equivalent to an ordinary variable interval schedule, with respect to the relative frequencies of interresponse times that each schedule generated.

The essential difference between Anger's schedule and an ordinary variable interval schedule is that only the former controls the relative frequencies of reinforcement of the different interresponse times. Therefore he avoided the troublesome interaction normally prevailing in variable interval schedules between relative frequencies of interresponse times and relative frequencies of reinforcement. To show that these relative reinforcement frequencies actually controlled relative frequencies of interresponse times, Anger varied the reinforcement frequencies for certain bands of interresponse times and found appropriate changes in the interresponse-time distribution. He also found that long interresponse times were much less sensitive to a given change in reinforcement frequency than were short interresponse times.

In a recent experiment with a simplified synthetic variable interval schedule, the number of reinforced interresponse-time bands was reduced to two, and the problems resulting from induction across reinforced interresponse times were removed (Shimp, 1968). These controls enabled the results to be more easily interpreted. The simplified schedule showed that the relative frequency of an interresponse time is an orderly function of both the relative frequency and relative magnitude of reinforcement for that interresponse time. Indeed, the approximately linear function appeared similar to the function expected on the basis of the linear functions in the two-key concurrent variable interval schedules discussed above. It should be emphasized that these functions are for steady-state behavior and are from a schedule that controls relative frequencies of reinforcement. Presumably, the functions could be obscured if (a) the data were from acquisition or transition periods, or (b) any other relationship between relative frequencies of interresponse times and the corresponding relative frequencies of reinforcement were proscribed by the schedule (such as in variable interval and variable ratio schedules; see Revusky, 1962; Shimp, 1967b). Thus Malott and Cumming (1964, 1966) may have obtained data at variance with Shimp's for the second reason, and the failure of Blough and Blough (1968) to find any obvious relationship may be attributable to both reasons, as well as to the effects of induction.

In summary, the present theory assumes the existence of functional relationships between various reinforcement variables and relative frequencies of interresponse times. The main support for this assumption is the dependency obtained by Anger and, more importantly, the orderly functions obtained by Shimp (1967b, 1968). At the present level of analysis, the mere existence of the orderly functions obtained from synthetic variable interval schedules is an adequate basis upon which to construct quantitative models. There is no doubt, however, that identification, and control, of the fundamental behavioral events occurring between recorded key pecks would permit analysis on a more satisfactory level. Presently, some readers may wish to imagine that behavioral chains of different lengths "mediate" interresponse times of different lengths. These chains would be differentially reinforced by variable interval schedules.

It is clear, then, that Anger's synthetic schedule is a concurrent schedule of reinforcement for different interresponse times (also see Catania, 1966). More specifically, it is assumed here that Anger's schedule is behaviorally equivalent to a concurrent schedule in which each component is a variable interval schedule for a particular interresponse time.
Therefore the arguments developed above for choice behavior in two-key concurrent variable interval schedules apply to Anger's schedule and, by the equivalence Anger demonstrated, to variable interval schedules. Of course, the choices are now among interresponse times instead of between two response keys. Since the choice behavior itself now generates the interresponse-time distribution, it is no longer necessary to retain the geometric process previously used to determine when to respond in the simulations.

The theoretical sequence of events for a variable interval schedule can now be outlined. Imagine a concurrent variable interval schedule for different interresponse times. The arbitrarily selected bands of interresponse times form the different operants A_i. The probability of reinforcement for one interresponse time may be said to increase linearly as a function of the time since the last response terminating that interresponse time. Thus, P(E_i) is computed as before. The event E_i is still, of course, a presentation of the feeder after an A_i. An S chooses the next interresponse time when he responds and ends the preceding interresponse time. When S chooses, he behaves as if he scans the times since he ended the various interresponse times and then computes values. The interresponse time corresponding to the greatest of these values is chosen. In short, the chosen interresponse time satisfies Equation 4, where now t_i is the time since last ending an interresponse time in the ith band of interresponse times, and R_i is obtained from the data and gives the reinforcement rate for the ith band. (This length is not to be confused with the width of the band.)

Unlike earlier theories for operant behavior (Bush & Mosteller, 1951, 1955; Estes, 1950), the present theory does not assume that S chooses, in every small time interval h, whether or not to respond. Here, S chooses how long to wait before responding. Or, in other terms, S chooses which mediating behavioral chain to initiate. The assumption that a bird chooses the length of an interresponse time at its beginning, not at its termination, is critical for the present model for variable interval behavior. Here, the length of time between a choice initiating a certain interresponse time and a succeeding reinforcement for the response terminating that interresponse time is precisely the length of that interresponse time. Therefore, the choice is reinforced only after a delay equal to the chosen interresponse time. The machinery developed above for two-key concurrent variable interval schedules with delays of reinforcement (D_i) therefore applies to one-key variable interval schedules, with the different delays being the different interresponse times. Notice that this assumption also makes one-key and two-key schedules similar in terms of availability of the operants. In the two-key schedules, a bird may at any moment choose either key. Thus, both operants are equally available. Similarly, in the one-key schedule a bird, whenever he makes a choice, may choose any interresponse time. Thus, all the operants again are equally available. It is emphasized that the interresponse times are not equally available if it is assumed that a choice may occur at any moment. In that case, a long interresponse time is as available as a short one only if the short one never occurs. The reason for making the present assumption is to allow the same formal model developed above for choice to apply also to response rate. Thus, a considerable gain in theoretical parsimony results from an untraditional assumption. No relevant data appear inconsistent with this assumption. In particular, the data providing perhaps the best evidence that an interresponse time is chosen at its end, not at its beginning, come from very early performance in interval schedules (Anger, 1956; Mueller, 1950). That is, a flat interresponse-times-per-opportunity curve is more easily accounted for by the traditional assumption than by the one made here. But early in training, the effects of differential reinforcement of different interresponse times are not obvious, and there is yet no particular reason for a bird to attend to time as a relevant dimension. But in asymptotic behavior, which is the only behavior addressed by the present model, either or both of these circumstances could conceivably necessitate the present assumption.
Notice that if the termination of an interresponse time were to reset its probability of reinforcement to zero, and if decision time were the moment of responding, an interresponse time could never succeed itself. But interresponse times, especially short ones, surely do succeed themselves, and thus the following provision was made for such response perseveration. A choice of one interresponse time selectively predisposed an animal to choose the interresponse time again because the reinforcement probability for the second successive choice of an interresponse time was based on a moment of time in the very near future. That is, when a response terminated an interresponse time in the ith band, t_i in Equation 4 was not reset to zero. Instead, it was reset to a short interval Δt which was estimated from the data. In other words, the times t_i in Equation 4 were always greater, by Δt, than the actual times.
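The one-key machinery can be sketched in the same style as the earlier program. Again this is a reconstruction rather than the original code: the representative duration assigned to each band and the numbers in the demonstration call are illustrative assumptions, while the decision rule (Equation 4 applied to interresponse times) and the Δt reset follow the text.

    def choose_irt_band(R, t, D, a=2.0):
        # Equation 4 applied to interresponse times: choose the band that
        # maximizes R_i * t_i * D_i ** -a, the delay D_i being the chosen
        # interresponse time itself.
        vals = [min(r * ti, 1.0) * d ** -a for r, ti, d in zip(R, t, D)]
        return vals.index(max(vals))

    def simulate_irt_choices(R, D, dt_persev=0.5, a=2.0, n_responses=10_000):
        t = [0.0] * len(R)      # time since an IRT in each band last ended
        counts = [0] * len(R)
        for _ in range(n_responses):
            i = choose_irt_band(R, t, D, a)
            for j in range(len(t)):
                t[j] += D[i]    # every clock runs while the chosen IRT elapses
            t[i] = dt_persev    # reset to the short interval (Δt), not to zero
            counts[i] += 1
        return [c / n_responses for c in counts]

    # Three bands, with delays set at their lower bounds (.3 second for the
    # shortest band, as described below) and made-up reinforcement rates.
    print(simulate_irt_choices(R=[0.010, 0.020, 0.030], D=[0.3, 0.6, 1.2]))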
In the two-key experiments analyzed previously, the experimenter controlled the relative frequencies of reinforcement. But in variable interval schedules, the steady-state reinforcement rate for each interresponse time depends on S and so must be obtained from the data. Consequently, momentary maximizing no longer predicts a priori asymptotic choice probabilities. All the theory can do is predict the relative frequencies of the interresponse times, given the relative frequencies of reinforcement. However, the model can make a priori predictions for schedules that are similar to variable interval schedules but that, unlike variable interval schedules, do not have a deterministic relationship between relative frequencies of interresponse times and of reinforcements. If an estimate of the parameter Δt were available, one could predict steady-state relative frequencies of interresponse times for such schedules, including synthetic schedules, before obtaining the data. Also, when maximizing successfully predicts behavior in a variable interval schedule, it also predicts behavior in the corresponding synthetic schedule.

The model is used below to predict behavior in a variable interval schedule with an average interreinforcement interval of 1 minute. The experimental procedure, data, and other relevant material are reported elsewhere in detail (Shimp, 1967b). Figure 4 in this earlier paper provides the relative numbers of reinforcements per unit time, that is, the R_i. The D_i were set equal to the lower bounds of the interresponse-time intervals shown in Figure 3 of the present paper. However, for the shortest interval, that from 0 to .6 second, the delay was set equal to .3 second. The rationale for using .3 second was as follows. It is often the shortest interresponse time ended by a response with a topography similar to that of responses ending longer interresponse times (Blough, 1963, 1966; Ray & McGill, 1964). Some special precautions, described in the earlier paper, successfully reduced the frequency of shorter interresponse times to virtually zero. Since the minimum delay from one response to the next response was about .3 second, the minimum delay from the supposed decision time to the nearest possible reinforcement was also about .3 second.

The parameter Δt for response perseveration was estimated, purely on the basis of the resulting fit between observed and predicted relative frequencies of interresponse times, to be .5 second for each of the three birds. Ideally, perhaps, this parameter would be selected by comparing predicted and obtained sequential statistics. But as stated earlier, there are unfortunately no sequential statistics from variable interval schedules. As was also said earlier, there is no clear consensus on the appropriate function relating values to delays of reinforcement. Until such a consensus is established, the present model should be flexible enough so that at least power and exponential functions provide reasonable fits. Such actually was the case earlier for two-key data and such also is the case here. One-parameter power functions were found to provide fairly good fits and exponential functions did slightly better. Figure 3 shows the fit between data and theoretical predictions obtained from computer simulation of maximizing defined by

    MAX[R_i · t_i · e^(−C·D_i)],  [6]

where e is the base of the natural logarithms, and C is a parameter estimated by an informal search method to be 1.5 for Bird 1 and 2.0 for Birds 2 and 3.
[FIG. 3. The relative frequencies of interresponse times in a variable interval schedule with an average interreinforcement interval of 1 minute. (Abscissa: interresponse-time bands in seconds, e.g., (0, .6), (.6, .9), (1.2, 1.8); ordinate: relative frequency. The predicted curves were obtained by computer simulation of the optimal process defined by Equation 6, and approximately equal the obtained curves.)]

Figure 3 shows that an optimal process is capable of generating an acceptable first-order description of variable interval behavior. An important consequence of the good fit is that the predicted behavior is stable over a fairly long time, as it is in variable interval schedules. Without a good fit, the predicted distribution of interresponse times would change the distribution of reinforcements. The changed reinforcement distribution would in turn, according to Equation 6 with the appropriately revised reinforcement rates, generate an interresponse-time distribution even further from the original one, and so forth.
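One compact way to state this stability requirement (the fixed-point notation is mine, not the paper's): let Φ denote the mapping that carries an assumed set of reinforcement rates R = (R_1, ..., R_n) through the simulation to the reinforcement rates implied by the resulting interresponse-time distribution. The predicted distribution is then self-consistent only if, approximately,

    \[
    \Phi(R) \approx R,
    \]

that is, only if the assumed rates are close to a fixed point of Φ. The good fit of Figure 3 indicates that the rates read off from the data roughly satisfy this condition; a poor fit would instead set off the drift just described, with repeated applications of Φ carrying the predicted distribution further and further from the data.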
The present model for variable interval behavior provides an account for a phenomenon variously termed a response bias in favor of short interresponse times (Malott & Cumming, 1966), or a greater susceptibility to reinforcement by short interresponse times (Millenson, 1966). Such terms are intended to explain or to describe the tendency for shorter interresponse times to be more frequent than longer ones, even when reinforcement frequencies are equal (e.g., Shimp, 1968). According to the present model, this tendency follows directly from the nonlinearity of delay-of-reinforcement functions such as power and exponential functions.

CONCLUSIONS

The greatest virtue of the present optimal process is its capacity to organize numerous experiments which at first glance seem unrelated. This process, that of optimizing weighted reinforcement probabilities of different concurrent operants, provides an alternative way to look at behavior in many familiar schedules of reinforcement. And in fact, at the present time, no other way appears to be consistent with the steady-state behavior of pigeons in probability-learning experiments, concurrent variable interval schedules, more complex two-key schedules based on concurrent variable interval schedules, and one-key variable interval schedules.
Besides organizing data already in the literature, the model also successfully predicted data from a new experiment and did so without any newly estimated parameters.

An alternative account based on the idea of response strength appears less successful. That account presently seems restricted to the very special case of matching in concurrent variable interval schedules. If response strength is a linear function of reinforcement frequency, and response rate is a linear function of response strength, then response rate is a linear function of reinforcement frequency. But neither maximizing in probability-learning experiments nor behavior in variable interval schedules can be described as a linear function of reinforcement frequency. The relative frequencies of interresponse times in a variable interval schedule cannot possibly be a linear function of the relative frequencies of reinforcement (Revusky, 1962; Shimp, 1967b). This impossibility is due to a unique property of a variable interval schedule. This property, and corresponding properties in other schedules, pose difficulties for other theories as well as for the response strength theory. The difficulty for the present theory is that the function relating value, V(E_i), to reinforcement delay, D_i, will not be the same, for example, in variable ratio and variable interval schedules. At present this dissimilarity cannot be accounted for. But the problem will have to be faced if the present model is to be generalized to the full range of classical free-operant schedules of reinforcement.

While the present model seems quite different from the unitary response strength notion, it is markedly similar in some of its formal aspects to other recently developed models for operant behavior. The assumption that behavior may be described by decision-making processes, and especially by optimal processes, is a common one. For example, Boneau and Cole (1967) successfully applied such notions to the behavior of pigeons in some psychophysical experiments. In addition, recent work by Logan (1965a, 1965b) makes use of this assumption. Many of the differences between his work and the present paper seem attributable to differences in experimental procedures. His discrete-trials procedures allow relatively few measurements per S (measured in the tens instead of in the thousands), tend more to confound individual differences with other experimental effects, and employ responses such as running speeds or choices in T mazes (instead of interresponse times or choices in Skinner boxes). A feature common to these models assuming maximizing is an emphasis on steady-state behavior. One of the most challenging problems is to relate such models to others that attempt to describe behavior in transition (e.g., see Norman, 1966; Suppes & Donio, 1967).

The reinforcement schedules analyzed in this paper, and indeed virtually all free-operant reinforcement schedules, are not designed, of course, with their amenability to formal analyses the foremost consideration. They are quite complicated behaviorally, and a complete description of the behavior they produce will necessarily involve an extremely large number of variables. This complexity, together with the primary purpose of this paper, implies that the fits between predicted and obtained data shown here are not as good as those obtainable from an alternative approach. That is, it would be possible even now to concoct parameters and functions for the effects of several variables omitted from the present model, such as, for example, temporal discrimination, complexity, and absolute reinforcement rate. [Perhaps the most important omission is a treatment of Catania's experiments (Catania, 1962, 1963a) showing that overall response rate may remain constant while local features of responding vary. But according to one of Catania's tentative accounts (Catania, 1962, p. 184), this invariance may only be a special case of the effects of the absolute rate of reinforcement.] However desirable the resulting improvement in fit might be by itself if the model provided for such variables, the approach followed here is to postpone their addition to the model until more precise, quantitative data on their effects exist.

REFERENCES

ANGER, D. The effect upon simple animal behavior of different frequencies of reinforcement. Report PLR-33, Office of the Surgeon General, 1954. (Document #7779, ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress, Washington, D. C.)
ANGER, D. The effect upon simple animal behavior of different frequencies of reinforcement. Report PLR-33, Office of the Surgeon General, 1954. (Document #7779, ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress, Washington, D. C.)

ANGER, D. The dependence of interresponse times upon the relative reinforcement of different interresponse times. Journal of Experimental Psychology, 1956, 52, 145-161.

BLOUGH, D. S. Interresponse time as a function of continuous variables: A new method and some data. Journal of the Experimental Analysis of Behavior, 1963, 6, 237-246.

BLOUGH, D. S. The reinforcement of least-frequent interresponse times. Journal of the Experimental Analysis of Behavior, 1966, 9, 581-591.

BLOUGH, P. M., & BLOUGH, D. S. The distribution of interresponse times in the pigeon during variable-interval reinforcement. Journal of the Experimental Analysis of Behavior, 1968, 11, 23-27.

BONEAU, C. A., & COLE, J. L. Decision theory, the pigeon, and the psychophysical function. Psychological Review, 1967, 74, 123-135.

BOWER, G., McLEAN, J., & MEACHAM, J. Value of knowing when reinforcement is due. Journal of Comparative and Physiological Psychology, 1966, 62, 184-192.

BUSH, R. R., & MOSTELLER, F. A mathematical model for simple learning. Psychological Review, 1951, 58, 313-323.

BUSH, R. R., & MOSTELLER, F. Stochastic models for learning. New York: Wiley, 1955.

CATANIA, A. C. Behavioral contrast in a multiple and concurrent schedule of reinforcement. Journal of the Experimental Analysis of Behavior, 1961, 4, 335-342.

CATANIA, A. C. Independence of concurrent responding maintained by interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 1962, 5, 175-184.

CATANIA, A. C. Concurrent performances: Reinforcement interaction and response independence. Journal of the Experimental Analysis of Behavior, 1963, 6, 253-263. (a)

CATANIA, A. C. Concurrent performances: A baseline for the study of reinforcement magnitude. Journal of the Experimental Analysis of Behavior, 1963, 6, 299-300. (b)

CATANIA, A. C. Concurrent operants. In W. K. Honig (Ed.), Operant behavior: Areas of research and application. New York: Appleton-Century-Crofts, 1966.

CATANIA, A. C., & REYNOLDS, G. S. A quantitative analysis of the responding maintained by interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 1968, 11, 327-383.

CHUNG, S.-H. Effects of delayed reinforcement in a concurrent situation. Journal of the Experimental Analysis of Behavior, 1965, 8, 439-444.

CHUNG, S.-H., & HERRNSTEIN, R. J. Choice and delay of reinforcement. Journal of the Experimental Analysis of Behavior, 1967, 10, 67-74.

EDWARDS, W. The theory of decision making. Psychological Bulletin, 1954, 51, 380-417.

EDWARDS, W. Behavioral decision theory. In P. R. Farnsworth, O. McNemar, & Q. McNemar (Eds.), Annual review of psychology. Palo Alto: Annual Reviews, 1961.

ESTES, W. K. Toward a statistical theory of learning. Psychological Review, 1950, 57, 94-107.

ESTES, W. K., & STRAUGHAN, J. H. Analysis of a verbal conditioning situation in terms of statistical learning theory. Journal of Experimental Psychology, 1954, 47, 225-234.

FANTINO, E. Preference for mixed- versus fixed-ratio schedules. Journal of the Experimental Analysis of Behavior, 1967, 10, 35-43.

FINDLEY, J. D. Preference and switching under concurrent scheduling. Journal of the Experimental Analysis of Behavior, 1958, 1, 123-144.

GRAF, V., BULLOCK, D. H., & BITTERMAN, M. E. Further experiments on probability-matching in the pigeon. Journal of the Experimental Analysis of Behavior, 1964, 7, 151-157.

HERRNSTEIN, R. J. Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 1961, 4, 267-272.

HERRNSTEIN, R. J. Secondary reinforcement and rate of primary reinforcement. Journal of the Experimental Analysis of Behavior, 1964, 7, 27-36. (a)

HERRNSTEIN, R. J. Aperiodicity as a factor in choice. Journal of the Experimental Analysis of Behavior, 1964, 7, 179-182. (b)

LOGAN, F. A. Incentive: How the conditions of reinforcement affect the performance of rats. New Haven: Yale University Press, 1960.

LOGAN, F. A. Decision making by rats: Delay versus amount of reward. Journal of Comparative and Physiological Psychology, 1965, 59, 1-12. (a)

LOGAN, F. A. Decision making by rats: Uncertain outcome choices. Journal of Comparative and Physiological Psychology, 1965, 59, 246-251. (b)

LUCE, R. D., & RAIFFA, H. Games and decisions: Introduction and critical survey. New York: Wiley, 1957.

LUCE, R. D., & SUPPES, P. Preference, utility, and subjective probability. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology. Vol. III. New York: Wiley, 1965.

MALOTT, R. W., & CUMMING, W. W. Schedules of interresponse time reinforcement. Psychological Record, 1964, 14, 211-252.

MALOTT, R. W., & CUMMING, W. W. Concurrent schedule of IRT reinforcement: Probability of reinforcement and the lower bounds of the reinforced IRT intervals. Journal of the Experimental Analysis of Behavior, 1966, 9, 317-325.

McDIARMID, C. G., & RILLING, M. E. Reinforcement delay and reinforcement rate as determinants of schedule preference. Psychonomic Science, 1965, 2, 195-196.

MILLENSON, J. R. Probability of response and probability of reinforcement in a response-defined analogue of an interval schedule. Journal of the Experimental Analysis of Behavior, 1966, 9, 87-94.

MUELLER, C. G. Theoretical relationships among some measures of conditioning. Proceedings of the National Academy of Sciences, 1950, 36, 123-130.

NEURINGER, A. J. Effects of reinforcement magnitude on choice and rate of responding. Journal of the Experimental Analysis of Behavior, 1967, 10, 417-424.

NORMAN, M. F. An approach to free-responding in schedules that prescribe reinforcement probability as a function of interresponse times. Journal of Mathematical Psychology, 1966, 3, 235-268.

PERIN, C. T. The effect of delayed reinforcement upon the differentiation of bar responses in white rats. Journal of Experimental Psychology, 1943, 32, 95-109.

PROKASY, W. F. The acquisition of observing responses in the absence of differential external reinforcement. Journal of Comparative and Physiological Psychology, 1956, 49, 131-134.

PUBOLS, B. H. Constant versus variable delay of reinforcement. Journal of Comparative and Physiological Psychology, 1962, 55, 52-56.

RAY, R. C., & McGILL, W. Effects of class-interval size upon certain frequency distributions of inter-response times. Journal of the Experimental Analysis of Behavior, 1964, 7, 125-127.

REVUSKY, S. H. Mathematical analysis of the duration of reinforced interresponse times during variable-interval reinforcement. Psychometrika, 1962, 27, 307-314.

REYNOLDS, G. S. On some determinants of choice in pigeons. Journal of the Experimental Analysis of Behavior, 1963, 6, 53-59.

SHIMP, C. P. Probabilistically reinforced choice behavior in pigeons. Journal of the Experimental Analysis of Behavior, 1966, 9, 443-455.

SHIMP, C. P. Reinforcement of least-frequent sequences of choices. Journal of the Experimental Analysis of Behavior, 1967, 10, 57-65. (a)

SHIMP, C. P. The reinforcement of short interresponse times. Journal of the Experimental Analysis of Behavior, 1967, 10, 425-434. (b)

SHIMP, C. P. Magnitude and frequency of reinforcement and frequencies of interresponse times. Journal of the Experimental Analysis of Behavior, 1968, 11, 525-535.

SHULL, R. L., & PLISKOFF, S. S. Changeover delay and concurrent schedules: Some effects on relative performance measures. Journal of the Experimental Analysis of Behavior, 1967, 10, 517-527.

SUPPES, P., & DONIO, J. Foundations of stimulus-sampling theory for continuous-time processes. Journal of Mathematical Psychology, 1967, 4, 202-225.

WYCKOFF, L. B. Toward a quantitative theory of secondary reinforcement. Psychological Review, 1959, 66, 68-78.

(Received January 15, 1968)
