The critical dimensions of the response-reinforcer contingency
B.A. Williams *
Department of Psychology, University of California, San Diego, CA 92093-0109, USA

Behavioural Processes 54 (2001) 111–126

Received 28 June 2000; received in revised form 18 September 2000; accepted 5 January 2001
Abstract

Two major dimensions of any contingency of reinforcement are the temporal relation between a response and its reinforcer, and the relative frequency of the reinforcer given the response versus when the response has not occurred. Previous data demonstrate that time, per se, is not sufficient to explain the effects of delay-of-reinforcement procedures; needed in addition is some account of the events occurring in the delay interval. Moreover, the effects of the same absolute time values vary greatly across situations, such that any notion of a standard delay-of-reinforcement gradient is simplistic. The effects of reinforcers occurring in the absence of a response depend critically upon the stimulus conditions paired with those reinforcers, in much the same manner as has been shown with Pavlovian contingency effects. However, it is unclear whether the underlying basis of such effects is response competition or changes in the calculus of causation. © 2001 Elsevier Science B.V. All rights reserved.
Keywords: Delay of reinforcement; Reinforcement contingency; Contingency
1. Introduction

The principle of reinforcement is psychology's most important concept, at least as assessed by the potency and diversity of its real-world applications. Yet at the same time it is not clear that the principle of reinforcement qualifies as a true law of behavior. Its basic formula, the three-term contingency, is only a framework for analyzing behavior, and must be fleshed out in detail before it can make meaningful predictions in specific behavioral situations — which is the function a true law of behavior should serve (see Timberlake, 1988, for a more extended discussion of this issue from a very different perspective from that presented here).

* Corresponding author. Tel.: +1-858-534-3938; fax: +1-858-534-7190. E-mail address: email@example.com (B.A. Williams).
2. Role of the response-reinforcer temporal relation

The simple statement that responding is strengthened when followed by positive events (reward) is clearly too broad. It is unlikely that a food pellet delivered today because a rat pressed a bar last week will have the effect of making the rat press the bar more in the future. There has always been the assumption that the temporal relation between the response and reinforcer is important, although how and why it is important continues to be the subject of debate. Skinner believed that the temporal relation was the defining feature of reinforcement, in that temporal contiguity (or at least an approximation to contiguity) was assumed to be both necessary and sufficient: necessary in the sense that whenever a reinforcer that was not temporally contiguous did result in response strengthening, Skinner then looked for events intervening in the delay-of-reinforcement interval which served as conditioned reinforcers, which themselves were immediately contingent on the response. For example, in Science and Human Behavior (p. 76) he writes: "Although it is characteristic of human behavior that primary reinforcers may be effective after a long delay, this is presumably only because intervening events become conditioned reinforcers."

2.1. Superstitious conditioning

The evidence Skinner provided for the sufficiency of temporal contiguity was his demonstration of superstitious conditioning. The procedure of Skinner (1948) was simply to present food every 15 s regardless of what the bird was doing. Six of eight birds exhibited responses sufficiently clearly defined that two observers could agree perfectly in counting instances. One turned counter-clockwise about the cage. Another repeatedly thrust its head into one of the upper corners of the cage. A third developed a tossing response, as if placing its head beneath an invisible bar and lifting it. Two birds developed a pendulum motion of the head and body, in which the head was extended forward and swung from right to left with a sharp movement followed by a somewhat slower return. Another bird made incomplete pecking or brushing movements directed toward but not touching the floor.

While Skinner assumed that adventitious response-reinforcer pairings were the basis of the idiosyncratic behaviors that he observed, others have offered alternative interpretations. Both Staddon and Simmelhag (1971) and Timberlake and Lucas (1985) have argued that the behaviors seen in the superstition procedure are due to Pavlovian contingencies, whereby the discrimination of the reinforcer's temporal periodicity evokes responses that are part of the bird's natural repertoire of food-related behaviors. The most widely cited rendition of this perspective is the distinction between interim and terminal behaviors proposed by Staddon and Simmelhag (1971), who argued that early in the temporal interval elicited behaviors other than the keypeck were most probable, that the nature of that interim behavior changed systematically as the temporal interval progressed, and that the terminal behavior (in this case keypecking) became most frequent at the time of maximum probability of food delivery. (But see Reid et al., 1993.)

It is important to recognize that neither of these investigations was an exact replication of Skinner's original procedure, in part perhaps because discerning the exact details of the procedure is impossible from Skinner's original report. But it does appear there was one very fundamental difference: Skinner seems to have trained his subjects for a single session, of indeterminant duration, whereas the subsequent two reports used multiple sessions. In neither report is there an informative analysis of the first session of training. Staddon and Simmelhag did report the data for one subject that included separate plotting of the behavior on the first session, but that subject apparently kept its head in the food magazine for most of the session and is not helpful. The only other behavior reported for it was 1/4 turns, which is consistent with Skinner's own account.

There seems to be no doubt that pigeons eventually discriminate the periodicity of the food delivery and that the time of maximum food probability does evoke species-characteristic behavior. This in no way challenges the validity of the superstition concept, but shows only that the effects of adventitious response-reinforcer contiguities can be overridden by the Pavlovian contingency. This does not mean that elicited behaviors are not important, because there is no reason to believe that the adventitious response-reinforcer relation is the only factor operating in the superstition procedure.
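The adventitious-strengthening dynamic at the heart of the superstition procedure can be sketched as a toy simulation: food arrives on a fixed-time schedule regardless of behavior, and whatever behavior happens to precede food is slightly strengthened. Everything here is an illustrative assumption of mine (the behavior names, the 1.1 strengthening factor, the probabilistic selection rule), not part of Skinner's report:

```python
import random

def superstition_sim(n_deliveries=200, strengthen=1.1, seed=1):
    """Toy model of adventitious reinforcement: food is delivered on a
    fixed-time schedule regardless of behavior, yet whichever behavior
    happens to precede each delivery is multiplicatively strengthened."""
    rng = random.Random(seed)
    behaviors = ["turn", "head-thrust", "toss", "pendulum", "brush"]
    weights = {b: 1.0 for b in behaviors}  # equal initial tendencies
    for _ in range(n_deliveries):
        # behavior emitted just before the response-independent food
        emitted = rng.choices(behaviors,
                              weights=[weights[b] for b in behaviors])[0]
        # adventitious pairing: strengthen whatever preceded the food
        weights[emitted] *= strengthen
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}
```

Because strengthening is multiplicative, small early asymmetries compound, and the simulation typically ends with one arbitrary behavior dominant, loosely paralleling the idiosyncratic rituals Skinner observed.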
The results of Neuringer (1970) provide an empirical reason for taking the concept of superstitious conditioning seriously. Neuringer presented two groups of pigeons a variable-time (VT) 30-s schedule for 20 sessions of training. The two groups differed only with respect to the experimental group first receiving three reinforced keypecks. These subjects maintained their responding at low levels of 2–5 responses/min throughout the entire duration of training, whereas subjects without the history of reinforced keypecking emitted only a minimal number of responses. The most plausible interpretation of Neuringer's results is that the group with the initially reinforced keypecking had their level of keypecking raised to the extent that keypecking was the dominant response in the situation, and therefore was the behavior most likely to be picked up adventitiously by the response-independent reinforcement.

2.2. The gradient of delayed reinforcement

If Skinner was correct in his statement that the ability of a reinforcer to strengthen a response depends only on the response-reinforcer temporal relation, it should be possible to establish a quantitative function describing the delay-of-reinforcement gradient. Various people have attempted this over the years (e.g. Hull, 1943; Logan, 1960; Mowrer, 1960). By far the most sophisticated attempt in terms of supplying evidence for a quantitative description of delayed reinforcement effects has been Mazur's work using his titration procedure (see Mazur, 1997, for a review), the result of which has been the well known hyperbolic equation. Note that there are important implications of the equation being hyperbolic rather than exponential, as most early theorists assumed, in that it allows for the phenomenon of preference reversal, which underlies the analysis of Ainslie, Rachlin and others of self-control procedures (see Mazur, 1998).

Unfortunately, it is no longer clear how Mazur's equation should be interpreted. Mazur (1997) argues that it is a description of the value of the conditioned reinforcers intervening in the delay-of-reinforcement interval. What then would the equation be if there were no conditioned reinforcers present in the delay interval? This is an intractable problem experimentally, because the only way the actual time values between a response and reinforcer can be controlled in a precise way is to preclude the response from occurring while the delay-of-reinforcement interval is in operation. There are two ways one might accomplish this. One is to remove the response opportunity, but this necessarily introduces a stimulus change that might assume conditioned reinforcement properties. The second is to use a dro contingency, in which the delay is reset after each response, but this means that the schedule of reinforcement is changing, and also means that one is likely to be reinforcing competing behavior. There is a third alternative — to use an unsignaled delay of reinforcement procedure. This procedure has the obvious difficulty that responses are allowed during the delay interval, so that the actual time values involved are indeterminant, although they of course can be measured. The second problem is the same as the dro procedure, in that competing behaviors may also occur during unsignaled delays. Nevertheless, it now seems that the use of the unsignaled delay procedure can be instructive.

Williams (1976, Expt. 2) trained pigeons on a VI 2-min schedule of immediate reinforcement, and then a 5-s unsignaled delay contingency was added to the end of the 2-min schedule. The delay contingency entailed that the first peck after a VI interval had elapsed started a 5-s timer, with food delivered independently of the pigeon's behavior at the timer's offset. No stimulus change occurred as a function of the timer's onset (technically a tandem VI 2-min FT 5-s schedule). For each subject receiving the delay contingency a second subject was yoked, such that it received response-independent reinforcement whenever a delayed reinforcer occurred for the subject trained with the delay contingency. Midway through training the roles of the two subjects were reversed, so that the yoked subject received the delay contingency and vice versa.

Fig. 1 shows the results in terms of normalized response rate relative to the baseline rate when immediate reinforcement was delivered. [Fig. 1. Response rate maintained by a 5-s unsignaled delay, along with that of yoked subjects receiving response-independent reinforcement. The contingency for individual subjects within a pair was switched midway through training. (Reprinted from Williams, 1976, Journal of the Experimental Analysis of Behavior, Experiment 2.)] A great deal of variability is evident for both the delay and yoked subjects during the first ten or so sessions, so it is difficult to see any consistent difference between the delay and yoked subjects. With continued training, however, the delay subjects, with one exception, responded at a higher rate than did their paired yoked subjects, indicating that the delayed reinforcement contingency did exert some small amount of control over responding. But the most salient aspect of the figure is the level of responding eventually reached under the delay contingency. For four of the eight subjects, it is less than 10% of the baseline level, and these rates were sufficiently low that the subjects lost a significant fraction of the available reinforcers. The remaining four subjects had response rates between 20 and 40% of their baseline rates, and these were highly variable across sessions.

The apparent inference to be drawn from Fig. 1 is that delay of reinforcement is a very powerful variable, and even short delays can wipe out an enormous amount of behavior. This inference concurs with the conclusion of the classic delay of reinforcement experiment of Grice (1948), who used a discrimination procedure without differential cues during the delay interval and reported almost no learning when delays reached approximately 5 s. But there are complications to be resolved before this inference can be accepted.

First is the fact that the delay gradient is nonmonotonic. When very short delays are used in the unsignaled delay procedure, in the range of 0.5–2 s, extremely high response rates are produced. The apparent reason for this is that pigeons respond in bursts, so that the first response of the burst initiates the delay, while the end of the burst corresponds with the end of the delay and thus is immediately reinforced. When delays are gradually increased, these response bursts may increase in length, producing very high response rates. This implies that response rate is a problematic measure of response strength, in that the response unit may change systematically with the delay value (Arbuckle and Lattal, 1988).

A second interpretative problem for the unsignaled delay procedure is that much of the precipitous fall-off in response rate seen in Fig. 1 could be due to the development of competing behavior. The unsignaled delay procedure allows many forms of behavior other than keypecking to be present at the time of reinforcer delivery, so it is possible that these competed with keypecking. If such competing behavior could be eliminated, it seems plausible that the delay gradient might have been not nearly so steep.

The third and most important complication for interpreting the unsignaled delayed reinforcement procedure is that it is now known that responding can be acquired de novo in an unsignaled delay procedure. This was shown in an important paper by Lattal and Gleeson (1990), with rats as subjects and with delays up to 30-s; acquisition occurred even when a dro was used, which ensured that the scheduled delay was the minimum time between the response and reinforcer. Similar experiments have been conducted in my laboratory with rats, without the dro, and have found acquisition with most rats with 30-s unsignaled delays, although with highly variable response rates, but complete failure of acquisition when using 60-s delays.

2.3. Associative competition as a second variable

The question is how to reconcile the great sensitivity to delays seen in Fig. 1 with the acquisition data showing that pigeons and rats can learn with delays that are an order of magnitude greater. One clue to this apparent conflict is a feature of the Lattal and Gleeson paper that differs from most other experiments. Their subjects were placed in the chamber and kept there until they responded; in contrast, other experiments have used fixed session durations which were independent of when the subject started responding. Why should long waits in the chamber be an important variable? One possible answer comes from Dickinson et al. (1992), who used rats with 60-s delays. Like my own results, the rats failed to acquire barpressing when the procedure was begun with the onset of the session. But in a second condition the subjects waited 30 min in the experimental chamber before the bar was inserted into the chamber, at which time the 60-s delay contingency was in effect. Here the subjects did acquire responding. Why might this be? What likely happened is that events other than barpressing competed as predictors of the food delivery. As described by Killeen and Bizo (1998), rats engage in a variety of behavior when first exposed to the experimental chamber, including circling, sniffing the walls, rearing in the corners, etc., and these activities gradually decrease in frequency.
Each of these activities, and their accompanying stimuli, potentially could be associated with the food delivery if a lever press were to occur at the beginning of the session, which appears to have blocked the rat's association between its lever press and the food delivery. But with the 30-min wait in the chamber before the lever was inserted, attention to these potential competitors was essentially extinguished, much in the same way as latent inhibition has been shown to occur to pre-exposed CSs in Pavlovian conditioning procedures. Then, when the first food pellet produced by a lever press occurred, only the memory of the bar press was a salient event, because its potential competitors had lost their salience due to their prior exposure.

The possibility that associative competition occurs with respect to operant conditioning in the same manner as it occurs for overshadowing and blocking effects in Pavlovian conditioning (e.g. Kamin, 1968) was first investigated by Williams (1975). Pigeons were presented a discrete-trial procedure in which a keylight was on for 5 s, and then food was delivered 10 s after the offset of the keylight if at least one peck had occurred during the 5-s keylight. In the 'blocking' condition the last 3–5 s of the delay were filled by a second keylight; in the control condition the stimulus condition during the delay-of-reinforcement interval was the same as during the intertrial interval. The results were that the second stimulus in the latter portion of the delay greatly reduced the level of responding maintained by the delay contingency. I interpreted the result as showing that stimulus-reinforcer associations could block response-reinforcer associations.

The procedure of Williams (1975) was contested as a method of demonstrating associative competition in operant conditioning because of its use of keypecking by pigeons as the putative operant response. Because keypecking has been shown to be controlled by Pavlovian contingencies in autoshaping experiments, the possibility thus arises that the pecking behavior studied by Williams (1975) was also under the control of Pavlovian rather than operant contingencies. This criticism has been addressed in several subsequent studies (Williams, 1978, 1979, 1982), most recently by Williams (1999). This most recent study is also most germane to the present discussion because it studied response acquisition de novo, in a manner similar to Lattal and Gleeson (1990).

In the first experiment naive rats were placed in the chamber with 4–5 food pellets in the magazine, with a 30-s delay of reinforcement contingency in effect. A barpress started a timer, and food was delivered automatically at the end of the 30-s delay. When the timer was running, further responses had no effect, so the maximum rate of reinforcement that could be obtained was 2/min or 120/h. The usual result of this procedure is that 75–80% of the rats learn to barpress on a consistent basis. There were two other conditions in the experiment, both involving presentations of a 5-s houselight. In the 'marking' condition (see Lieberman et al., 1979, for the development of the concept of marking), this houselight occurred immediately after the bar press that started the delay-of-reinforcement interval. For the 'blocking' condition, the houselight occurred during the last 5 s of the delay. Note that for all three conditions, the response-reinforcer relation was identical. Fig. 2 shows that the different placements of the houselight had opposite effects: whereas the marking stimulus substantially facilitated the learning, the blocking stimulus completely prevented acquisition from occurring. [Fig. 2. Acquisition of barpressing with a 30-s unsignaled delay of reinforcement. Different groups of subjects received different signal conditions during the delay intervals: either no signals (No Sig), a houselight at the start of the delay (Marking), or a houselight at the end of the delay (blocking). (Reprinted from Williams, 1999, Psychonomic Bulletin & Review, Experiment 1.)]

In a second experiment published in the same paper, the blocking effect was studied in a choice procedure. Here each of two responses started its own 30-s timer, which ended in food regardless of what occurred during the delay interval. For one response the delay was completely unsignaled, without any feedback that the delay was operating; for the other response, the last 5 s of its delay had a 5-s houselight present. All subjects strongly preferred the response alternative without the signal during its delay.

Both of the effects seen in Fig. 2 are important for understanding delay-of-reinforcement contingencies. Of primary interest for the present discussion is the effect of the houselight signal at the end of the delay interval: this blocking effect demonstrates that 'predictiveness' is a factor in operant conditioning, apparently in the same manner as it is in Pavlovian conditioning. The foregoing analysis implies that it isn't simply the temporal relation between the response and reinforcer that is critical. One must also consider how the response-reinforcer relationship compares to the relation between potential associative competitors and the reinforcer.

The marking effect indicates that one needs to take into consideration the memory of the response upon which the reinforcer is contingent. A general implication of the marking effect is that a critical consideration is the content of the animal's memory buffer at the time the reinforcer is delivered. When that response is highlighted by some distinct event (a houselight or, depending on the experiment, some other kind of stimulus), that response is more strongly affected by the delayed reinforcer. Peter Holland and his collaborators have demonstrated that evoking the memory of a CS can substitute for the CS in several situations, ranging from conditional discriminations to extinction. For example, in Pavlovian conditioning, Holland and Forbes (1982) conditioned a taste aversion to a specific flavor, then paired a tone with that flavor, and then presented the tone repeatedly alone. The result was that the conditioned aversion to the flavor was attenuated. What this suggests for operant reinforcement is that conjuring up the memory of a response just prior to reinforcer delivery might cause that response to be strengthened. This implies that any of a variety of procedures should be potentially valuable for increasing the effectiveness of delayed reinforcement.

At a more general level, the potency of both marking and blocking says that any quantitative model of delay of reinforcement will be problematic: a given response-reinforcer temporal relation can have very different effects depending on whether the reinforcer occurring at the end of the delay-of-reinforcement interval is already predicted by events occurring during the delay.
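The best-known quantitative model of delayed reinforcement is Mazur's hyperbola from Section 2.2, usually written V = A/(1 + kD). A short numeric sketch shows the preference-reversal property that distinguishes it from the exponential form assumed by most early theorists; the amounts, delays, and k values below are arbitrary choices of mine, not fitted data:

```python
import math

def v_hyp(amount, delay, k=1.0):
    """Mazur's hyperbolic value function: V = A / (1 + k*D)."""
    return amount / (1.0 + k * delay)

def v_exp(amount, delay, k=0.1):
    """Exponential alternative assumed by most early theorists."""
    return amount * math.exp(-k * delay)

# Small-soon reward (A=2 at 2 s) vs. large-late reward (A=5 at 10 s).
near = (v_hyp(2, 2), v_hyp(5, 10))   # close to both rewards
far = (v_hyp(2, 12), v_hyp(5, 20))   # a common 10 s added to both delays
# Hyperbolic: small-soon wins up close, large-late wins from a distance.

# Exponential discounting preserves the value ratio under a common
# added delay, so no preference reversal is possible.
ratio_near = v_exp(5, 10) / v_exp(2, 2)
ratio_far = v_exp(5, 20) / v_exp(2, 12)
```

With the hyperbola, the smaller-sooner option is preferred near both rewards but the larger-later option is preferred when a common delay is added, which is the reversal underlying the self-control analyses of Ainslie, Rachlin, and Mazur cited above.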
For example, based on the results of Dickinson et al. (1992), the equation for delayed reinforcement effects should be substantially different if the subject is habituated to the experimental chamber prior to the onset of the experimental session (also see Dickinson et al., 1996). Only when those effects are absent can a general equation be applied, and those restrictions suggest that any such equation will be highly situation specific.

2.4. Self-reinforcement as an example

At a yet more general level, the concept of predictiveness may shed considerable light on various enigmatic concepts in behavior analysis. One concept that generated considerable attention 15–20 years ago was 'self-reinforcement': can one supply one's own reinforcement contingencies, and have those contingencies meaningfully affect behavior? For example, can I better sustain my writing behavior by consuming M&M's on a variable interval schedule, given that I am working at the time the timer's offset is signaled?

In Science and Human Behavior (1953) Skinner appears to be of two minds about whether self-reinforcement should be an effective procedure. On the one hand he recognizes that any notion of reinforcement based solely on response-reinforcer temporal contiguity implies that self-reinforcement would work. But he also recognizes that the concept seems implausible (p. 238: "The ultimate question is whether the consequence has any strengthening effect upon the behavior which precedes it. Is the individual more likely to do a similar piece of work in the future? It would not be surprising if he were not, although one must agree that he has arranged a sequence of events in which certain behavior has been followed by a reinforcing event.").

Once it is recognized that predictiveness is a critical part of the response-reinforcer contingency, along with the complexities posed by the concepts of predictiveness and memory as modulators of the effects of the temporal relation, the failure of self-reinforcement should not be surprising. It is the compliance with the rules of self-reinforcement, rather than the response of writing, that determines when the M&M's will be consumed. Because writing is less predictive than the behavior of compliance, writing therefore should not be strengthened. In order for a reinforcer to be effective, the response must not be redundant with other events in the same situation that predict when a reinforcer will be delivered, or the predictive relations must at least be equal for the different responses being compared.

3. The role of alternative reinforcement

In addition to the temporal relation between the response and reinforcer, there is a second dimension of the contingency of reinforcement that must be considered. It is not only the response-reinforcer pairings that are important, but also the occurrence of the reinforcer when the response has not occurred. The concept of relative rate of reinforcement has played a huge role in theories of operant behavior over the past 40 years. That is to say, the response rate is governed by the relative rate of reinforcement contingent on a response, not simply the reinforcement for that response considered in isolation. Much of this theorizing was inspired by the relative law of effect of Herrnstein (1970), which was intended to apply to all schedule situations, both for multiple and concurrent schedules, and also for behavior maintained by simple schedules when additional reinforcement is presented alongside the response-contingent reinforcement:

B1 = kR1 / (R1 + mR2 + Ro)    (1)

B1 refers to the behavior being counted, R1 to its reinforcement rate, R2 to the reinforcement rate from other identifiable behavior (B2), and Ro to the hypothetical reinforcement rate of other behavior not directly observed.

It is important to recognize that Herrnstein himself came to believe this formula did not actually reflect the conditioning process at its most basic level, which was instead the process of melioration (see Williams, 1988, for a discussion). This change is critically important because Eq. (1) describes behavior in terms of molar rates of reinforcement, whereas melioration appeals to local rates.
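Eq. (1) can be illustrated numerically. The parameter values below (k, m, Ro) are arbitrary choices for the sketch, not estimates from any data set:

```python
def herrnstein_rate(r1, r2=0.0, ro=20.0, k=100.0, m=1.0):
    """Herrnstein's relative law of effect, Eq. (1):
    B1 = k*R1 / (R1 + m*R2 + Ro), with rates in reinforcers/h.
    k is the asymptotic response rate; Ro is the hypothetical
    reinforcement rate for unmeasured background behavior."""
    return k * r1 / (r1 + m * r2 + ro)

# Same contingent rate R1 in both cases; predicted responding falls
# as reinforcement from an alternative source (R2) grows.
alone = herrnstein_rate(r1=60)             # no alternative reinforcement
with_alt = herrnstein_rate(r1=60, r2=60)   # equal alternative reinforcement
```

The drop from `alone` to `with_alt` despite an unchanged R1 is exactly the sense in which response strength depends on relative rather than absolute rate of reinforcement, the point developed in the Rachlin and Baum experiments below.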
it should be impossible for the subject to equate the local rates of reinforcement across the two choice alternatives. In a related paper Rachlin and Baum (1972) reported that the reduction in responding to a constant VI was the same regardless of whether additional reinforcement was available from working on a second key on a VI schedule. whereby there is some comparison of the rate of response-contingent reinforcement to the rates of reinforcement from other sources in the same situation. while that when the signal was present was extremely high. there are data that appear to support the concept of relative rate of reinforcement as a basic determinant of how reinforcement operates to strengthen behavior. Rescorla (1967. Similarities between Pa6lo6ian and operant conditioning The concept of relative rate of reinforcement as a controlling variable in operant conditioning is similar to the concept of contingency as employed in Pavlovian conditioning. Probability of reinforcement is of course the variable that Skinner invoked as the fundamental determinant of the strength of behavior. The duration of reinforcement associated with the signaled VI was then varied.g.A. It is interesting to note that a parallel argument was being advanced at approximately the same time in the operant literature (e. In the operant literature the contingency/correlation view of how reinforcement makes contact with behavior is still seriously maintained. or had their behavior governed by schedule-feedback functions. 3. one with a constant VI schedule. Contrary to melioration theory. very soon after Rescorla demonstrated the effect of contingency at an empirical . An illustrative experiment (also see Catania. Response rate maintained by the unsignaled VI varied inversely as a function of the duration of the signaled reinforcer. 1963) was provided by Rachlin and Baum (1969). which. on the other hand. Baum. 1973). 
1968) reported that the strength of fear conditioning depended not on the probability of shock during the CS. because the local rate in the absence of the signal was zero. For example. In Pavlovian conditioning. by comparison. or whether the same amount of food was presented on a VT schedule of free food. and that the effects of relative rate of reinforcement cannot be reduced to the probabilities of reinforcement for the individual response alternatives. such that the second keylight was illuminated only when a reinforcer had been scheduled by its VI timer. who presented pigeons two response keys. is equivalent to which response alternative has had the higher recent probability of reinforcement. but instead upon the ratio of the CS probability of shock to the probability of shock in the absence of the CS. in the study of Rachlin and Baum (1969). Note that this relation held despite very little time being devoted to the signaled VI key. because a single peck was all that was required to obtained its reinforcer. These aged data are important because melioration theory has no account for why response rate for the unchanged schedule should depend on the rate of reinforcement from alternative sources without concomitant changes in the amount of response competition engendered by those other sources. Melioration theory. the time allocation presumably should not vary with the duration of the food for that key.B. Moreover. at least in the sense that no one has advanced a theoretical account showing how the effects of relative rate of reinforcement can be explained in terms of more basic principles. reduces to the very simple idea that choice is governed by which response alternative has the highest recent local rate of reinforcement. Williams / Beha6ioural Processes 54 (2001) 111–126 119 suggests that a molar deﬁnition of the reinforcement contingency is appropriate. the second with a second VI of similar value but with a signal contingency. 
Why then should response rate maintained by the unchanged VI be a regular function of the amount of reinforcement associated with the signaled VI? Such a ﬁnding suggests strongly that relative rate of reinforcement is the fundamental variable controlling response rate. where various theorists argued that animals are sensitive to response-reinforcer correlations.1. This suggested that contingency was a primitive variable with a fundamental status similar to the CS-US temporal relationship. in the case of discrete responses.
Although Rescorla initially treated contingency as a primitive variable, at a more molecular level he went on to explain the effect of contingency in reductionistic terms (for a review see Papini and Bitterman, 1990). In the random-control procedure, in which the probability of the US was the same during the CS and during its absence, no conditioning emerged because, said Rescorla, of 'context blocking'. That is, US presentations in the absence of the CS were assumed to cause the background cues to become associated with the US; because these cues were also present when the CS was paired with the US, they blocked the CS-US association.

The critical test of the context-blocking account of contingency effects has been to observe whether signaling the reinforcers otherwise paired only with the background cues would allow conditioning to occur in the random control procedure. The rationale is that excitatory conditioning to the background cues should itself be blocked by the signal, because the signal has greater predictive validity than the background: the signal always is paired with the US presentations, while the background only occasionally is paired with them. Because the background cues would then have no excitatory value, blocking of conditioning to the CS in the random control procedure should not occur. The actual results of signaling the background reinforcers have been conflicting. Jenkins et al. (1981) were the first to test the effects of signaling the background reinforcers, and their initial findings were that signaling had no effect: conditioning to the CS failed to occur regardless of whether the background reinforcers were paired with the signal. However, subsequent studies by Durlach and Rescorla (e.g. Durlach, 1983) produced the opposite findings, as signaling the background reinforcers did cause conditioning to the CS to emerge. The probable explanation of the conflict between these results is the duration of the signal for the ITI reinforcers.

Williams (1994) used a sign-tracking procedure with rats as subjects, in which the CS was a localized pilot light at either end of an elongated chamber. The food magazine was in the center of the chamber, which meant that sign tracking to the CS caused the rat to move away from the food magazine; the measure of conditioning was whether the rat went to the side of the chamber on which the pilot light appeared on a given trial. Fig. 3 shows the acquisition of the sign-tracking behavior for four groups of subjects: (1) the controls for which no reinforcers were presented during the ITI (CS-only); (2) subjects that received unsignaled reinforcers during the ITI (No Signal); (3) subjects that received ITI reinforcers signaled by a 5-s noise (Short Signal); and (4) subjects that received ITI reinforcers signaled by a 15-s noise (Long Signal). When the ITI reinforcers were unsignaled, conditioning to the CS did occur, although much more slowly than when no reinforcers were presented during the ITI. Conditioning also failed to occur when a 5-s noise preceded the US presentations in the absence of the CS, in apparent agreement with the earlier results of Jenkins et al. (1981). But when the signal was 15 s in duration, conditioning to the CS did emerge, although obviously much more slowly than when no ITI USs were presented.

These results from a Pavlovian conditioning procedure (Fig. 3) are quite similar to those obtained with a standard operant conditioning procedure (Williams, 1989a) similar in many respects to the procedures used by Rachlin and Baum (1969, 1972). Pigeons were trained to peck on a VI 3-min schedule, and then a VT 30-s schedule was superimposed. The VT reinforcers were either unsignaled or preceded by a blackout signal; in one condition the signal was 4 s in duration, and in a second condition the signal was 12 s in duration. A within-subject design was used such that each subject received each condition for 25 sessions. Results for individual subjects are shown in Fig. 4. It is apparent that the 4-s signal produced only marginally less suppression than the unsignaled condition, while the 12-s signal produced little response suppression relative to the baseline. Contingency effects thus occurred in both the Pavlovian and the operant procedures. As in the case of blocking considered earlier, Pavlovian and operant conditioning seem to be quite similar in the laws of reinforcement, despite the differences in the types of contingencies involved, which suggests that the associative analysis that has been given for Pavlovian conditioning might also be used for understanding operant conditioning.
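Why should a 4-s signal fail where a 12-s or 15-s signal succeeds? A toy calculation can make the ordering intuitive. The exponentially decaying response trace and its time constant below are my own illustrative assumptions, offered in the spirit of the memory-buffer account taken up later in this section, not a model from any of the cited papers.

```python
import math

TAU = 3.0  # assumed decay constant (s) of the response trace; hypothetical

def residual_credit(signal_duration: float, tau: float = TAU) -> float:
    """Credit accruing to the operant response when a free (VT) reinforcer
    arrives, assuming the response trace decays as exp(-t/tau) and that no
    operant response occurs once the signal comes on, so the most recent
    response is at least `signal_duration` seconds old."""
    return math.exp(-signal_duration / tau)

for d in (0.0, 4.0, 12.0):
    print(f"signal {d:4.1f} s -> residual credit {residual_credit(d):.3f}")
# 0 s  -> 1.000 (unsignaled: full spurious strengthening possible)
# 4 s  -> 0.264
# 12 s -> 0.018
```

On this sketch a 4-s signal still leaves the operant response with roughly a quarter of its maximal credit at the moment of free-food delivery, whereas a 12-s signal leaves almost none, mirroring the ordering of suppression effects just described.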
Fig. 3. Acquisition of sign tracking as a function of different reinforcement conditions during the ITI: all groups except CS-only received reinforcers during the ITI, while the groups differed as a function of whether these ITI reinforcers were signaled and the duration of that signal. (Reprinted from Williams, 1994, Psychonomic Bulletin & Review.)

Certainly the predictive validity of the response for the reinforcer seems to be an important determinant of operant conditioning, just as the predictive validity of the CS for the US seems to be important for Pavlovian conditioning. But it is less clear that the kind of analysis that Rescorla has given for contingency effects can be extended to the concept of relative rate of reinforcement in operant conditioning.

3.2. Context value in operant conditioning

The effects of context conditioning on the rate of operant responding were assessed by Pearce and Hall (1979) by various manipulations of context value after responding had been established on a VI schedule of reinforcement. With the levers removed from the chamber, one group of subjects received VT food presentations, a second group received extinction, while a third group had a corresponding period of time in their home cages. The effects of the context manipulation were then assessed during an extinction test phase with the levers re-inserted into the chamber. Response rates were lowest in subjects that received no food during their exposure to the context, while response rates for the remaining two groups were not significantly different. In a second experiment the nonreinforced exposure and VT reinforcement conditions were compared to a third condition in which the response-independent food was preceded by a 30-s signal. Here the nonreinforced exposure and signaled VT conditions produced similar response rates, with both significantly less than the unsignaled VT reinforcement condition. The results of Pearce and Hall thus suggest that context value is positively correlated with response rate, not negatively as appears to be the case with the acquisition of Pavlovian conditioning. Such effects are consistent with 'two-factor' theories of instrumental conditioning (e.g. Rescorla and Solomon, 1967), which postulate a positive interaction between the response-reinforcer operant contingency and the incentive process generated by Pavlovian signal-reinforcer and/or context-reinforcer associations.

But the opposite pattern of results was obtained by Reed and Reilly (1990), who pretrained their subjects on a VI schedule but with a 5-s resetting delay of reinforcement. After this pretraining, subjects received either nonreinforced exposure to the context in the absence of the response lever or an equivalent time in their home cages. Effects of the context manipulation were then assessed during an extinction test for both conditions. Here response rate was higher when the subject received nonreinforced exposure to the context, presumably because competing responses had been extinguished. In a second experiment the pretraining involved a signaled resetting delay of reinforcement. Response rate during baseline was much higher with the signaled delay than with the unsignaled delay, and the effects of the context manipulation were also different: with signaled delays, context extinction decreased the rate of responding on the probe extinction test. In a third experiment Reed and Reilly pretrained with an unsignaled delay of reinforcement, then compared the effects of signaled versus unsignaled VT reinforcers during the context exposure. Here the signaled VT reinforcers increased response rate relative to the home-cage controls, and produced higher response rates than when the context exposure involved unsignaled VT reinforcers; the latter condition produced response rates slightly lower than the home-cage controls.

Dickinson et al. (1996) studied similar manipulations of context value, but prior to acquisition training under extended delays of reinforcement much like those first used by Lattal and Gleeson (1990). Prior to each acquisition session with a 32-s delay-of-reinforcement contingency, context exposure was presented with the response lever withdrawn. Subjects receiving nonreinforced context pre-exposure produced higher response rates than subjects without context pre-exposure, consistent with the prior study of Dickinson et al. (1992). In addition, subjects receiving free food during the pre-exposure initially had a similar elevated response rate during the first several sessions, but these rates decreased over extended training to be similar to the home-cage controls. In their Experiment 2, response rates were also higher than the home-cage controls when pre-exposure included signaled VT reinforcers.

Fig. 4. Response rate maintained by a VI 3-min schedule of reinforcement as a function of the signal condition of reinforcers delivered on a VT 30-s schedule. (Reprinted from Williams, 1989a, Learning and Motivation.)
The experiments just described yield a complex picture of how context value affects operant behavior. When acquisition under unsignaled delays of reinforcement is studied, decreasing the value of the context appears to produce faster acquisition. But when asymptotic response rates maintained by schedules of immediate reinforcement are studied, increases in the value of the context appear to augment response rate. Finally, when the asymptotic response rate is maintained by unsignaled delays of reinforcement, extinguishing the context increases response rate, but it decreases response rate when the operant response is maintained by signaled delays. The foregoing results imply that unsignaled delays of reinforcement have a differential sensitivity to context manipulations, presumably because the contingency allows many different kinds of behavior oriented toward the contextual stimuli to become competitors with the operant response. Unclear from the studies just described is whether there is a systematic difference between acquisition and asymptotic responding in their sensitivity to the context manipulation, and whether such effects are better explained in terms of the response-reinforcer association or in terms of response competition. Whether pretraining the context would help or hinder response acquisition using immediate reinforcement thus remains an important unanswered question, given that the acquisition was always studied in combination with a delay contingency.

Killeen (1994) has summarized the evidence that arousal grows monotonically with the total rate of reinforcement in the situation. Whether this growth in arousal will increase the rate of the operant response then depends on whether the arousal is allocated to the operant response or to response forms that compete with it. Because adding a VT schedule of free reinforcement does not differentially allocate its arousal, behaviors competitive with the operant response should increase in frequency and thus reduce the response rate maintained by the VI schedule. Signaling the VT reinforcers presumably reduces the response competition by restricting the competing behavior to the time the signal is present. The reason short signals 1-4 s in duration are ineffective in preventing the suppression of operant responding due to the VT schedule is that the signal duration must exceed the size of the memory buffer, the contents of which at the time of reinforcer delivery determine what is strengthened. One implication of such an account is that assessment of the size of the memory buffer (see Killeen, 1994, for a method for making such an assessment) should predict the duration of the signal that is needed in order to eliminate the suppression of operant responding caused by superimposed VT reinforcement. The effect of context-reinforcer associations on operant response rate thus apparently depends on the complex dynamics of the translation of arousal into the specific channels specified by the contingency of reinforcement.

But at least one aspect of contingency manipulations on operant responding cannot be explained by response competition due to generalized arousal. Several different studies (Dickinson and Charnock, 1985; Colwill and Rescorla, 1986; Williams, 1989b) have shown that the degree of response suppression caused by the superimposition of schedules of free reinforcement is significantly greater when the response-independent reinforcer is identical to the response-contingent reinforcer than when its identity is different. Given that the value of the different reinforcers is similar, it is not obvious why they should produce differential amounts of competing behavior. The results are further complicated by the finding that this effect of reinforcer identity is decreased when the alternative response-independent reinforcer is preceded by a signal or contingent on a second response (Williams, 1989b).

4. Controlling variables for choice behavior

The experimental results I have discussed so far are based primarily on experiments in which the strength of a single response has been studied, while much of the experimental literature on contingencies of reinforcement during the past 30 years involves choice behavior. These two enterprises often seem entirely separate, with major differences in the conceptual frameworks that have been constructed to explain the two sets of findings.
A major part of the problem is that it is difficult to know which independent variables are fundamental and which are derivative. For example, is rate of reinforcement derivable from probability of reinforcement, or is probability of reinforcement derivable from rate of reinforcement? Skinner believed probability of reinforcement was fundamental, but most subsequent investigators in the operant tradition have favored rate of reinforcement. The reason that rate of reinforcement has been favored is that the concept of probability implies a discrete response occurrence, whereas responses can often be continuous in duration. There is also an important distinction between molar rate of reinforcement versus local rate of reinforcement as explanatory concepts. Local rate of reinforcement is equivalent to probability of reinforcement if the time spent responding is considered to be segmented into time units, each of which is associated with a given reinforcement probability; then, by summing over the time units, a local rate of reinforcement can be calculated. In contrast, the molar rate of reinforcement, which disregards the time actually spent responding, is not commensurate in any obvious way with probability of reinforcement.

An example from my own laboratory illustrates the fundamental nature of the issue. Williams (1993) presented pigeons a multiple schedule of discrete trials in which probability of reinforcement for the different response alternatives was the independent variable. The scheduled probabilities for the choice component of the schedule were 0.20 and 0.05. The schedule in this component was designed to mimic a concurrent VI VI schedule: when a reinforcer had been scheduled by the probability gate, the reinforcer was held until the next response to that alternative. This means that the longer the time since the last response to a given choice alternative, the greater the probability that a reinforcer will have been scheduled. Note that if matching occurs, there should be a 4:1 preference in favor of the 0.20 alternative, and the obtained probabilities of reinforcement for the two alternatives should be equal. In the second component of the multiple schedule a single response alternative was available, with a scheduled probability of 0.10; because subjects responded on almost every trial, the obtained probability was also 0.10.

After behavior had stabilized in each component, choice probe tests were presented in which the 0.20 alternative was paired with the 0.10 alternative, and in which the 0.05 alternative was also paired with the 0.10 alternative. The results were that the 0.20 was preferred over the 0.10, but the 0.10 was preferred over the 0.05. Given that the obtained probabilities of reinforcement for the 0.20 and 0.05 alternatives were approximately equal, somewhere in the range of 0.2-0.25, the apparent inference is that the obtained probability of reinforcement is not the controlling variable, except as a dependent variable, or, in some cases, local rate of reinforcement.

The results become even more challenging when they are considered alongside those of subjects that were yoked to those just described. Here there were never any choice trials; the '0.20' and '0.05' alternatives were presented separately, with a probability of reinforcement equivalent to those actually obtained by the master subjects for each alternative. The yoking procedure also entailed that the distribution of trials for the 0.20 and 0.05 alternatives was similar to those obtained by the master subjects as a result of their choice distribution. This meant there were four times as many trials with the 0.20 alternative as with the 0.05 alternative, while the 0.10 alternative continued to be presented on 1/2 of all of the trial presentations. Here the probe results were entirely different: now both the '0.20' and '0.05' were strongly preferred over the 0.10 alternative. This of course is not surprising given that both had higher obtained probabilities of reinforcement. But given that the obtained probabilities were matched across the two procedures, the fact that preference went in opposite directions seems paradoxical. The issue posed is why obtained probability of reinforcement was the controlling variable in the yoked procedure but not in the choice procedure. This simple question is, I believe, diagnostic of the future of understanding reinforcement at a more fundamental level. For the theoretical analysis of reinforcement to advance, it is imperative to gain a deeper understanding of the basic units of analysis.
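The claim that matching should equalize the obtained probabilities under the held-reinforcer contingency can be illustrated with a small simulation. This is a sketch under my own assumptions (a fixed 4:1 choice allocation and independent arming of each alternative on every trial), not the actual procedure or parameters of Williams (1993):

```python
import random

def simulate(p_a=0.20, p_b=0.05, pref_a=0.80, n_trials=200_000, seed=1):
    """Discrete-trial analogue of a concurrent VI VI schedule: on each trial
    a probability gate may arm each alternative, and an armed reinforcer is
    held until the next response to that alternative. Returns the obtained
    probability of reinforcement per response for each alternative."""
    rng = random.Random(seed)
    armed = {"A": False, "B": False}
    responses = {"A": 0, "B": 0}
    reinforcers = {"A": 0, "B": 0}
    for _ in range(n_trials):
        for alt, p in (("A", p_a), ("B", p_b)):
            if not armed[alt] and rng.random() < p:
                armed[alt] = True  # reinforcer scheduled and held
        choice = "A" if rng.random() < pref_a else "B"  # assumed 4:1 allocation
        responses[choice] += 1
        if armed[choice]:
            reinforcers[choice] += 1
            armed[choice] = False
    return {alt: reinforcers[alt] / responses[alt] for alt in ("A", "B")}

obtained = simulate()
print(obtained)  # both obtained probabilities land near 0.2 for these settings
```

Because the rarely sampled 0.05 alternative accumulates held reinforcers between visits, its obtained probability per response rises well above its scheduled probability, and both alternatives end up with obtained probabilities exceeding the 0.10 of the single-alternative component, which is exactly the pattern the paradox described above turns on.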
Two recent theoretical papers, Dragoi and Staddon (1999) and Gallistel and Gibbon (2000), offer accounts of the results just described, along with related findings (Williams and Royalty, 1995), as part of a more general framework. Both are similar in offering a comprehensive account that encompasses not only choice but response acquisition and asymptotic response rates. The assumptions underlying the two accounts are fundamentally different, and it remains to be seen which, if either, account will prevail. At the minimum, the development of such comprehensive competing theories promises renewed enthusiasm for the experimental analysis of behavior and an increased focus on defining the concept of reinforcement at its most fundamental level.

Uncited reference

Skinner, 1953

References

Arbuckle, J.L., Lattal, K.A., 1988. Changes in the functional response units with briefly delayed reinforcement. J. Exp. Anal. Behav. 49, 249–263.
Baum, W.M., 1973. The correlation-based law of effect. J. Exp. Anal. Behav. 20, 137–153.
Belke, T.W., 1992. Stimulus preference and the transitivity of preference. Anim. Learn. Behav. 20, 401–406.
Belke, T.W., 1998. Dynamics of time matching: arousal makes better seem worse. Psychon. Bull. Rev. 5, 208–215.
Catania, A.C., 1963. Concurrent performances: reinforcement interaction and response independence. J. Exp. Anal. Behav. 6, 253–263.
Colwill, R.M., Rescorla, R.A., 1986. Associative structures in instrumental learning. In: Bower, G.H. (Ed.), The Psychology of Learning and Motivation, vol. 20. Academic Press, New York, pp. 55–104.
Dickinson, A., Charnock, D.J., 1985. Contingency effects with maintained instrumental reinforcement. Q. J. Exp. Psychol. 37B, 397–416.
Dickinson, A., Watt, A., Griffiths, W.J.H., 1992. Free-operant acquisition with delayed reinforcement. Q. J. Exp. Psychol. 45B, 241–258.
Dickinson, A., Watt, A., Varga, Z.I., 1996. Context conditioning and free-operant acquisition under delayed reinforcement. Q. J. Exp. Psychol. 49B, 97–110.
Dragoi, V., Staddon, J.E.R., 1999. The dynamics of operant conditioning. Psychol. Rev. 106, 20–61.
Durlach, P.J., 1983. The effect of signalling intertrial USs in autoshaping. J. Exp. Psychol. Anim. Behav. Process. 9, 374–389.
Gallistel, C.R., Gibbon, J., 2000. Time, rate, and conditioning. Psychol. Rev. 107, 289–344.
Grice, G.R., 1948. The relation of secondary reinforcement to delayed reward in visual discrimination learning. J. Exp. Psychol. 38, 1–16.
Herrnstein, R.J., 1970. On the law of effect. J. Exp. Anal. Behav. 13, 243–266.
Holland, P.C., Forbes, D.T., 1982. Representation-mediated extinction of conditioned flavor aversions. Learn. Motivat. 13, 454–471.
Hull, C.L., 1943. Principles of Behavior. Appleton-Century, New York.
Jenkins, H.M., Barnes, R.A., Barrera, F.J., 1981. Why autoshaping depends on trial spacing. In: Locurto, C.M., Terrace, H.S., Gibbon, J. (Eds.), Autoshaping and Conditioning Theory. Academic Press, New York, pp. 255–284.
Kamin, L.J., 1968. 'Attention-like' processes in classical conditioning. In: Jones, M.R. (Ed.), Miami Symposium on the Prediction of Behavior: Aversive Stimulation. University of Miami Press, Coral Gables, FL, pp. 9–33.
Killeen, P.R., 1994. Mathematical principles of reinforcement. Behav. Brain Sci. 17, 105–135.
Killeen, P.R., Bizo, L.A., 1998. The mechanics of reinforcement. Psychon. Bull. Rev. 5, 221–238.
Lattal, K.A., Gleeson, S., 1990. Response acquisition with delayed reinforcement. J. Exp. Psychol. Anim. Behav. Process. 16, 27–39.
Lieberman, D.A., McIntosh, D.C., Thomas, G.V., 1979. Learning when reward is delayed: a marking hypothesis. J. Exp. Psychol. Anim. Behav. Process. 5, 224–242.
Logan, F.A., 1960. Incentive: How the Conditions of Reinforcement Affect the Performance of Rats. Yale University Press, New Haven, CT.
Mazur, J.E., 1997. Choice, delay, probability, and conditioned reinforcement. Anim. Learn. Behav. 25, 131–147.
Mazur, J.E., 1998. Choice and self-control. In: Lattal, K.A., Perone, M. (Eds.), Handbook of Research Methods in Human Operant Behavior. Plenum Press, New York.
Mowrer, O.H., 1960. Learning Theory and Behavior. Wiley, New York.
Neuringer, A.J., 1970. Superstitious key pecking after three peck-produced reinforcements. J. Exp. Anal. Behav. 13, 127–134.
Papini, M.R., Bitterman, M.E., 1990. The role of contingency in classical conditioning. Psychol. Rev. 97, 396–403.
Pearce, J.M., Hall, G., 1979. The influence of context-reinforcer associations on instrumental performance. Anim. Learn. Behav. 7, 504–508.
Rachlin, H., Baum, W.M., 1969. Response rate as a function of the amount of reinforcement for a signalled concurrent response. J. Exp. Anal. Behav. 12, 11–16.
Rachlin, H., Baum, W.M., 1972. Effects of alternative reinforcement: does the source matter? J. Exp. Anal. Behav. 18, 232–241.
Reed, P., Reilly, S., 1990. Context extinction following conditioning with delayed reward enhances subsequent instrumental responding. J. Exp. Psychol. Anim. Behav. Process. 16, 48–55.
Reid, A.K., Bacha, G., Moran, C., 1993. The temporal organization of behavior on periodic food schedules. J. Exp. Anal. Behav. 59, 1–27.
Rescorla, R.A., 1967. Pavlovian conditioning and its proper control procedures. Psychol. Rev. 74, 71–80.
Rescorla, R.A., 1968. Probability of shock in the presence and absence of CS in fear conditioning. J. Comp. Physiol. Psychol. 66, 1–5.
Rescorla, R.A., Solomon, R.L., 1967. Two-process learning theory: relationships between Pavlovian conditioning and instrumental learning. Psychol. Rev. 74, 151–182.
Skinner, B.F., 1948. Superstition in the pigeon. J. Exp. Psychol. 38, 168–172.
Skinner, B.F., 1953. Science and Human Behavior. Free Press, New York.
Staddon, J.E.R., Simmelhag, V.L., 1971. The 'superstition' experiment: a reexamination of its implications for the principles of adaptive behavior. Psychol. Rev. 78, 3–43.
Timberlake, W., 1988. The behavior of organisms: purposive behavior as a type of reflex. J. Exp. Anal. Behav. 50, 305–317.
Timberlake, W., Lucas, G.A., 1985. The basis of superstitious behavior: chance contingency, stimulus substitution, or appetitive behavior? J. Exp. Anal. Behav. 44, 279–299.
Williams, B.A., 1975. The blocking of reinforcement control. J. Exp. Anal. Behav. 24, 215–225.
Williams, B.A., 1976. The effects of unsignalled delayed reinforcement. J. Exp. Anal. Behav. 26, 441–449.
Williams, B.A., 1978. Information effects on the response-reinforcer association. Anim. Learn. Behav. 6, 371–379.
Williams, B.A., 1982. Blocking the response-reinforcer association: theoretical implications. In: Commons, M.L., Herrnstein, R.J., Wagner, A.R. (Eds.), Quantitative Analyses of Behavior: Acquisition, vol. 3. Ballinger Publishing Company, Cambridge, MA, pp. 427–445.
Williams, B.A., 1988. Reinforcement, choice, and response strength. In: Atkinson, R.C., Herrnstein, R.J., Lindzey, G., Luce, R.D. (Eds.), Stevens' Handbook of Experimental Psychology, 2nd edition, vol. 2. Wiley, New York, pp. 167–244.
Williams, B.A., 1989a. Signal duration and suppression of operant responding by free reinforcement. Learn. Motivat. 20, 335–357.
Williams, B.A., 1989b. The effects of response contingency and reinforcement identity on response suppression by alternative reinforcement. Learn. Motivat. 20, 204–224.
Williams, B.A., 1993. Molar versus local reinforcement probability as determinants of stimulus value. J. Exp. Anal. Behav. 59, 163–172.
Williams, B.A., 1994. Contingency theory and the effect of the duration of signals for noncontingent reinforcement. Psychon. Bull. Rev. 1, 111–114.
Williams, B.A., 1999. Blocking the response-reinforcer association. Psychon. Bull. Rev. 6, 618–623.
Williams, B.A., Royalty, P., 1989. A test of the melioration theory of matching. J. Exp. Psychol. Anim. Behav. Process. 15, 99–113.