
Journal of Experimental Psychology: Animal Behavior Processes
1983, Vol. 9, No. 4, 401-411
Copyright 1983 by the American Psychological Association, Inc.

The Role of Marking When Reward is Delayed


Glyn V. Thomas David A. Lieberman
University of Birmingham, University of Stirling,
Birmingham, England Stirling, Scotland
Donald C. McIntosh Peter Ronaldson
University of Birmingham, University of Stirling,
Birmingham, England Stirling, Scotland

Two-choice spatial discrimination by rats is enhanced if a salient stimulus (marker)
occurs immediately after every choice response and again after a delay interval
(Lieberman, McIntosh, & Thomas, 1979). Three experiments further explore this
effect. Experiment 1 found that the second marker is unnecessary. Experiment 2
found that a marker presented before a response is as effective as one presented
after. Both effects could be explained in terms of markers focusing attention on
subsequent cues. Experiment 3, however, found that markers after choice enhance
learning even when no discriminative cues are present following the marker. Markers
thus appear to initiate both a backward search through memory and attention to
subsequent events; both processes help to identify events that might be related to
the unexpected marking stimulus.

Experiments 1 and 2 were based on a dissertation by D. C. McIntosh, submitted to the University of Birmingham in partial fulfillment of the requirements of a master's degree. Experiment 3 was based on a dissertation by Peter Ronaldson, submitted to Stirling University in partial fulfillment of the requirements for a baccalaureate degree.

Requests for reprints should be sent to Glyn Thomas, Department of Psychology, University of Birmingham, P.O. Box 363, Birmingham B15 2TT, England.

Research into the effects of delayed reinforcement on animal learning has shown that even small delays in the presentation of a reinforcer can have dramatic effects on learning. In a classic study by Grice (1948), for example, delays of .5 sec caused a five-fold increase in the number of trials required to solve a simple visual discrimination, whereas delays of 10 sec prevented learning altogether. In a recent maze-learning experiment by Lieberman, McIntosh, and Thomas (1979), however, presentation of a brief but salient stimulus immediately following the choice response (a flash of light, a burst of noise, or handling by the experimenter) resulted in substantial learning even when reinforcement was delayed for 2 min (see also Lett, 1973, 1975).

Why should presentation of a salient stimulus following a choice response facilitate learning? One possibility is that the stimulus is in some way reinforcing, but this would not explain the differential strengthening of the correct choice response because the stimulus follows incorrect choices as well as correct ones. To account for differential strengthening, the authors put forward a marking hypothesis based on Leon Kamin's "surprise" analysis of classical conditioning (Kamin, 1969). When an unexpected unconditioned stimulus is presented, Kamin suggested, subjects initiate a search through memory to identify possible predictive cues. Lieberman et al. hypothesized that any salient and unexpected stimulus initiates such a search and that during this search subjects examine previous responses as well as stimuli. If a response is identified as the best predictor of the unexpected event, the memory trace of this response would be strengthened (perhaps as a result of the extra rehearsal it receives during the search process) so that it is more likely to be recalled during any subsequent memory searches. When a light flash is presented to a rat in a maze, according to this analysis, the rat will immediately search its memory to identify possible causes or predictors of this unexpected event, and because the light flash immediately follows the rat's choice response it will be especially likely to attend to this response and thereby mark it in memory. When food is subsequently discovered, this event will also trigger
402 THOMAS, LIEBERMAN, McINTOSH, AND RONALDSON

a search through memory for possible causes. On the basis that salient events are particularly memorable (cf. the von Restorff effect), marked choice responses are likely to be identified by such a search. Of course, other events (very salient ones, such as being placed in the apparatus, or very recent ones immediately prior to reward) may also be considered as possible causes of reward. It is assumed, however, that these other events ultimately do not become as strongly associated with reward as a marked correct response does, because only the latter has a good correlation with reward.

If this analysis is correct, it could have important implications for learning in a wide range of situations. Marking could, for example, help us to understand why contingent electric shock sometimes strengthens responding rather than punishes it. In a study by Muenzinger (1934), for example, rats were trained in a T-maze with food as the reinforcer. Muenzinger found that subjects given a mild electric shock immediately after entering the correct arm learned faster than did subjects not shocked at all. Marking readily explains this paradoxical effect: Shock, a highly salient event, marks the choice response in the rat's memory and thus increases that response's likelihood of recall when food is subsequently discovered in the goal box.

In a similar way, marking could help to explain the phenomenon of "quasi-reinforcement," first identified by Neuringer and Chung (1967). They trained pigeons to peck a key, but instead of reinforcing a pigeon every time it satisfied a particular reinforcement schedule (e.g., Fixed Ratio 11), they presented food on only a percentage of these occasions. Responding under these conditions was initially low, but if a brief blackout intervened every time the schedule requirement was satisfied, the average rate of responding more than doubled. Because controls established that the blackout was not a reinforcer, this facilitory effect was difficult to explain, but it would be expected if the blackout served as a marker identifying the correct response sequence (for further discussion, see Lieberman et al., 1979).

Thus, marking could be a fundamental process of learning, with implications in a wide variety of settings (see also D'Amato, Fazzaro, & Etkin, 1968). However, Lieberman et al. presented their subjects with a marker not only after they made their choice response but also just before receiving food. The second marker might have been critical; a marker presented only after a response might be insufficient to produce learning. If so, the paradoxical effects of punishment and of quasi-reinforcement could not be attributed to marking because only a single marker was presented in those situations. As to why this second marker should be so critical, one possibility is that the first marker does not strengthen the memory trace of the preceding response but instead becomes associated with it. Then, when the marker is presented before food, this second presentation reactivates the memory trace of the first marker, which, in turn, reactivates the memory trace of the original response. Cronin (1980), Lett (1979), Roberts (1976), and Spear (1978) all suggested that such memorial reinstatement might underlie learning with procedures that resemble those used by Lieberman et al.

Like the first explanation of marking, this second analysis emphasizes memory and retrieval, but it differs in assuming that two markers are necessary for learning. Support for such an analysis comes from studies of human learning by Endel Tulving and his associates (Tulving & Osler, 1968; Tulving & Pearlstone, 1966). Using a free-recall paradigm, they found that target words were better recalled if they were accompanied by weak associates during training (e.g., leg-MUTTON) but only if these associates were also present during testing. One possible explanation is that the target word and its associate become associated during training but that this association does not itself increase the retrievability of the target word: Only if the associate is again present during testing will it reactivate its earlier trace and thus lead the subject to the target word.

Experiment 1 was designed to test these possibilities by investigating whether two markers are necessary to facilitate learning or only one.

Experiment 1

Method

Subjects

The subjects were 24 male hooded rats, about 140 days old and experimentally naive at the start of the experiment.
MARKING AND DELAYED REWARD

They were housed individually in a different room from the experimental apparatus and were maintained at 80% of their previously determined free-feeding body weights for the duration of the experiment.

Apparatus

The apparatus used was a maze similar to that employed by Lieberman et al. (1979, Experiment 4). A ground plan of this maze is shown in Figure 1, together with details of its dimensions. The walls and floor of the maze were made of wood. The entire maze, except the side arms, was painted gray. The right-hand side arm was painted white, the left-hand side arm black. Vertically sliding guillotine doors made of opaque gray plastic separated the different sections of the maze. The guillotine doors from the choice box to the side arms could be opened simultaneously by means of a pulley system. All the sections of the maze were 15 cm high and were covered with removable clear plastic lids. A 35-ohm miniature loudspeaker was mounted on the center of the lids of each of the two side arms and the goal box. These loudspeakers were connected to a white-noise source via an electronic timer set to deliver a 2-sec burst of noise at approximately 90 dB whenever a hand switch was operated by the experimenter. Reinforcement consisted of 2.5 gm (dry weight) of powdered rat diet mixed with water into a wet mash and presented in a small dish placed in the goal box. A small barrier of clear plastic was placed across half of the goal box and in front of the dish so that the rats had to enter the goal box completely to investigate its contents.

Figure 1. Ground plan of the maze. [Labeled sections: start box, choice box, black side arm, white side arm, delay box, goal box, and doors.]

Procedure

Pretraining. All the subjects were handled for 1 wk. before the start of the experiment. During this period the food deprivation regimen was established, and the rats were introduced to the food dishes and the wet mash to be used as reinforcement in the experiment.

Delayed reward training. In view of the overall black preference initially displayed by the rats studied by Lieberman et al. (1979) in mazes similar to the one used in the present experiment, it was decided to reward all subjects for running to the white (right) side arm. The subjects were randomly assigned to three equal groups of eight subjects each. The first group was designated the "two-marker group"; the second, the "one-marker group"; and the third, the "no-marker group."

On training trials, all subjects were treated in exactly the same way except for the delivery of noise as a marking stimulus. Each subject was placed into the start box of the maze, and after 10 sec the door to the choice box was opened. If the subject had not entered the choice box within a further interval of 10 sec, it was gently guided through the door by hand. When the subject was in the choice box, the door from the start box was lowered behind it, and 10 sec later the doors to the side arms were raised simultaneously and the subject was allowed to enter either side arm. Once the animal's entire body, excluding the tail, was within the side arm, the door was lowered to prevent retracing.

Exactly 2 sec after a choice response, the door at the far end of the side arm was raised and the animal was allowed to walk into the delay box, where it was confined for 120 sec with all doors closed. Following the delay interval, the door to the goal box was raised to allow the subject to enter. As soon as the subject was in the goal box, the door was lowered behind it to prevent retracing. If the previous choice response had been correct, the food dish in the goal box contained wet mash. If the previous choice had been incorrect, then the food dish was empty and the subject was confined in the goal box before being returned to the start box and run again. The food dish (empty or full) was placed in the goal box just before the end of the delay interval to eliminate food odor as a cue during the delay.

Subjects in the two-marker group were presented with a 2-sec burst of noise in the side arms immediately after each choice response and again immediately after they had entered the goal box. Subjects in the one-marker group were presented with a 2-sec burst of noise after each choice response but not on entry into the goal box. After their choice responses and entries into the goal box, subjects in the no-marker group were treated in exactly the same way as the two-marker group and the one-marker group subjects except, of course, that no noise was presented on either occasion.

A correction procedure was employed so that each animal was run repeatedly until a correct choice was made or until it made eight perseverative errors. In the latter event, the subject was given a forced choice on the next run in which only the door of the correct side arm was opened, but the subject was otherwise treated exactly as if the correct choice had been made freely. In either case, after reward was presented and consumed, the trial was terminated, and the animal was returned to its home cage.

If on any run an animal failed to make a choice by entering one of the side arms within 120 sec, it was guided by hand into one of the side arms. After such a guided choice, the animal was treated exactly as if the choice had been made freely. If a subject required more than one guided choice, it was guided alternately to the correct and the incorrect side arms. Guided and forced choices were disregarded in the analysis of the data.

Initially, all subjects were given one trial per day. As progressively fewer incorrect choices were made and latencies shortened, the number of trials for each subject was increased to two per day, but with at least 1 hr. between trials. To help minimize the possibility that odor trails left by preceding subjects could influence subjects' choices, the entire maze was wiped with a solution of soap and disinfectant after each trial (see Lieberman et al., 1979; Roberts, 1976). The order in which subjects were run was varied from day to day.

Following the completion of each day's trials, each subject was given sufficient dry food in its home cage to maintain it at 80% of its free-feeding weight. The experiment was run until each subject had received 50 trials.
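
As an illustration only, the correction procedure and the scoring of correct first responses described in the Procedure can be sketched in Python. The function name, the class, and the record format (a list of the arms entered on successive free runs of one trial) are hypothetical conveniences and not part of the original method. The sketch assumes that guided choices have already been excluded from the record and that the white arm is the correct one, as in Experiment 1.

```python
from dataclasses import dataclass
from typing import List

# Eight perseverative errors trigger a forced choice on the next run (from the text).
MAX_ERRORS = 8


@dataclass
class TrialOutcome:
    first_response_correct: bool  # the measure analyzed in the Results
    errors: int                   # free-choice errors made on this trial
    forced: bool                  # True if the trial ended with a forced choice


def score_trial(free_choices: List[str], correct_arm: str = "white") -> TrialOutcome:
    """Score one trial from its sequence of free choices (hypothetical record format)."""
    errors = 0
    for run_index, choice in enumerate(free_choices):
        if choice == correct_arm:
            # The trial ends at the first correct free choice; only a correct
            # choice on the very first run counts as a correct first response.
            return TrialOutcome(run_index == 0, errors, False)
        errors += 1
        if errors == MAX_ERRORS:
            # The next run would be a forced choice, which is not analyzed.
            return TrialOutcome(False, errors, True)
    raise ValueError("record ends before a correct choice or a forced choice")


# Made-up records, for illustration only.
print(score_trial(["white"]))
print(score_trial(["black", "black", "white"]))
```

Keeping the forced-choice flag separate from the first-response score mirrors the statement that guided and forced choices were disregarded in the analysis.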

Results

The principal data from this experiment are the number of correct first responses over trials for each group. These data are presented in Figure 2 as percentages over blocks of 10 trials. It can be seen that all groups started at approximately the same level: around 45% correct over the first block of 10 trials. It can also be seen in Figure 2 that all groups showed some improvement in correct responding over the 50 trials of the experiment, although the no-marker group showed only slight improvement. Both the marked groups showed considerable improvement, increasing from approximately 45% correct on the first block of 10 trials to approximately 85% correct on the final block of trials. No obvious differences between the data for the two-marker and the one-marker groups are apparent in Figure 2.

Figure 2. Percentage of correct first responses for blocks of 10 trials for each group in Experiment 1. [Horizontal axis: blocks of ten trials, 1 to 5. Curves: one-marker, two-marker, and no-marker groups.]

A two-way analysis of variance (ANOVA) using the factors of experimental treatment and trial blocks revealed that there were significant differences between the groups in overall correct responding, F(2, 21) = 13.36, p < .01. Post hoc t tests showed that there was no significant difference in overall correct responding between the two-marker and the one-marker groups, t(14) = .89, p > .05. However, the overall correct responding for the no-marker group did differ significantly from the combined scores of the other two groups, t(22) = 1.91, p < .05.

The ANOVA also revealed that the improvement in correct first responses over trials was significant, F(4, 84) = 13.34, p < .01. A trend analysis showed that the overwhelming trend of the improvement over trials was linear, F(1, 84) = 52.25, p < .01, and that there were significant differences between the groups in their linear trends over trials, F(2, 6) = 12.63, p < .01.

Post hoc t tests were then carried out to compare the linear trends of each of the marked groups separately with that of the no-marker group, using the procedure devised by Dunnett (1955). In both cases the difference was significant: two marker versus no marker, t(21) = 3.31, p < .01; one marker versus no marker, t(21) = 2.83, p < .05.

As suggested by Figure 2, then, the two-marker and the one-marker groups showed significantly more improvement in correct responding over trials than did the no-marker group. The rationale of Dunnett's test did not permit us to directly compare the linear trends of the marked groups, but these two groups did not differ significantly from each other in terms of overall correct responses.

The mean latencies of choice responses, averaged over trials, were as follows: one-marker group, 4.6 sec; two-marker group, 4.6 sec; and no-marker group, 4.2 sec. An ANOVA indicated that these scores did not differ significantly, F(2, 21) = 1, p > .05.

Discussion

The results of Experiment 1 show clearly that it is not necessary to present a marker twice for it to be effective: Even a single presentation, providing it follows the response closely (see Lieberman et al., 1979), may facilitate learning. It is less clear whether repeating the marker before food enhances learning. On several counts, such enhancement might be expected. One possibility already discussed is that presentation of a second marker would remind the subject of the first, and thus of the choice response. Alternatively, presenting the marker immediately before food might increase its salience by endowing it with secondary reinforcing properties. This increased salience would then make the marker more effective because subjects would be more likely to attend to it and would expend more effort in searching for its cause. There are, therefore, grounds for expecting two markers to be more effective than one, and indeed subjects in the two-marker group did learn faster. The difference between the two groups, however, was not statistically significant. Whether or not the second marker has any effect, it is clearly not necessary for learning.

Experiment 2

On the surface, the phenomenon of marking seems closely analogous with a common experience in human memory: When people experience an event of special importance, they may later be able to recall not only the event itself but also many of its circumstances. In the case of John F. Kennedy's assassination, for example, many people can vividly remember what they were doing when they heard the news, and Brown and Kulik (1977) have labeled these memories "flashbulb memories." Although Brown and Kulik focused on memory for events that preceded the news of Kennedy's assassination, anecdotal evidence suggests that people also have vivid memories for what they did after hearing the news. One of the authors, for example, remembers hearing the news while returning from a physical education class and then spending the rest of the afternoon standing on the mall of Columbia University, talking with other students and waiting in a slightly stunned state for further news.

If the analogy between marking and flashbulb memories is meaningful, we might expect that a salient event such as a bright light or a loud noise would enhance memory not only for preceding events but also for those that follow, and the primary purpose of Experiment 2 was to investigate this hypothesis. Three groups were run, with subjects receiving a marker either before making their choice response, afterward, or not at all. The procedure for subjects in the marked-after group was essentially identical to that used for the one-marker group in Experiment 1, with a single marker presented immediately after they made their choice response and entered one of the side arms. In the marked-before group, on the other hand, the marker was presented just before the doors to the side arms were raised. If markers enhance recall for subsequent responses as well as preceding ones, subjects in the marked-before group should also be more likely to recall their choice response (it is the first response they make after the marker is presented), so that learning in both groups should be roughly comparable.

As a further check on the assumption that markers improve learning only when temporally contiguous with the choice responses, a fourth group was included. Subjects in this group progressed through the maze in exactly the same way as those in the marked-after group, but instead of receiving the marker as soon as they made their choice response, they did not receive it until 30 sec later. If the effectiveness of a marker depends on its temporal proximity to the response, then learning in this group should be significantly weaker than in the marked-before and the marked-after groups.

Method

Subjects

The subjects were 32 male hooded rats. They were approximately 140 days old and experimentally naive at the start of the experiment. Housing and maintenance conditions were the same as in Experiment 1.

Apparatus

The maze used in this experiment was the same as in Experiment 1 with the following modifications. For subjects marked before their choice response, an additional loudspeaker was mounted on the center of the lid of the choice box (see Figure 1). This loudspeaker was connected in parallel with the loudspeakers over the side arms and goal box and allowed a 2-sec burst of noise to be presented to subjects in the choice chamber. All other details of the apparatus were exactly the same as in Experiment 1.

Procedure

Pretraining. Pretraining was the same as in Experiment 1.

Delayed reward training. The subjects were randomly assigned to four groups of eight subjects: marked-after, marked-before, delayed-marker, and no-marker groups. The subjects were taken individually in a small transfer cage to the experimental room for testing.

Other than changes in the scheduling of the marker's presentation, the only change from the procedure of Experiment 1 was that all subjects were confined in the appropriate side arm for 30 sec following both correct and incorrect choices rather than for 2 sec. After this period

had elapsed the door to the delay box was opened and the subjects were allowed to enter and spend the remaining 90 sec of the delay there before they were given access to the goal box. Thus, the total delay was 2 min as in Experiment 1, but the first 30 sec of the delay were spent in the side arm rather than in the delay box.

Subjects in the marked-after group received a 2-sec burst of noise immediately after their choice response. Subjects in the marked-before group received a similar 2-sec burst of noise, but it was presented immediately before the doors to the side arms were raised to allow subjects to make their choice responses. Subjects in the delayed-marker group received a 2-sec burst of noise immediately before they were allowed to enter the delay box, that is, 30 sec after their choice responses. Subjects in the no-marker group did not experience white noise at any point in their training trials.

The correction procedure and the procedures for guided and forced choices, control of odor trails, and reinforcement were exactly the same as in Experiment 1. All subjects were initially given one trial daily, but as errors declined and latencies shortened, the number of daily trials for each subject was increased to two per day, with at least 1 hr. between successive trials for each subject.

Results

Figure 3 shows the percentage of correct first responses averaged over blocks of 10 trials for each of the four treatment groups. It is apparent from this figure that the performance of the marked-before and the marked-after groups improved steadily over blocks of trials from about 45% correct at the first block of trials to about 80% correct at the final block. In contrast, the performance of the delayed-marker and the no-marker groups showed no clear change over the course of the experiment, remaining close to 50% correct throughout.

Figure 3. Percentage of correct first responses for blocks of 10 trials for each group in Experiment 2. [Horizontal axis: blocks of ten trials, 1 to 5. Curves: no-marker, marked-before, marked-after, and delayed-marker groups.]

The data shown in Figure 3 were subjected to a two-way ANOVA, the factors being experimental treatment and blocks of trials. The analysis revealed that there were significant differences in overall correct responding according to experimental treatment, F(3, 28) = 3.59, p < .05, and that there was also a significant effect of trials, F(4, 112) = 9.59, p < .01.

To determine the source of the significant treatment effects, t tests were carried out on the scores of the different groups averaged over trials. No significant difference was found between the marked-after and the marked-before groups, t(14) = .57, p > .05. There was also no significant difference between the delayed-marker and the no-marker groups, t(14) = .29, p > .05, but the combined scores of the marked-before and the marked-after groups did differ significantly from the combined scores of the delayed-marker and the no-marker groups, t(30) = 3.68, p < .01. Thus, the averaged scores for correct responding of the marked-before and the marked-after groups were significantly higher than those of the delayed-marker and the no-marker groups.

As in Experiment 1, a trend analysis was then carried out to examine changes in performance over trials. The trend analysis revealed that the only significant trend over trials was linear, F(1, 112) = 36.57, p < .01, and that there were significant differences between the linear trends of the groups, F(3, 9) = 5.49, p < .05.

Post hoc t tests were then carried out to compare the linear trends of each of the marked groups with that of the no-marker (control) group (Dunnett, 1955). Both the marked-before and the marked-after groups were found to differ significantly from the no-marker group in terms of their linear trends, t(28) = 27.56 and t(28) = 23.08, respectively, both with p < .01. The linear trend of the delayed-marker group did not differ significantly from that of the no-marker group, t(28) = 2.01, p > .05.

The mean latencies of choice responses averaged over trials were as follows: marked-before group,

3.8 sec; marked-after group, 4.0 sec; delayed-marker group, 4.9 sec; and no-marker group, 3.6 sec. The differences between these groups' means were not significant, F(3, 28) = 1.53, p > .05.

Discussion

One purpose of this experiment was to evaluate, under more tightly controlled conditions, Lieberman et al.'s (1979) finding that the effectiveness of a marker depends on its contiguity with the response to be learned. This conclusion was confirmed: Subjects receiving a marker immediately after choice responses showed significant learning, whereas those for whom the marker was delayed 30 sec did not. Because both groups spent identical periods in the black and white side arms, moreover, this result cannot be attributed to differential exposure to the discriminative cues.

The second major purpose of this experiment was to test the intuition that markers may enhance memory for subsequent events as well as preceding ones. This prediction was also confirmed: Subjects that received a marker immediately before making their choice response learned as well as those receiving one immediately after. Although this result confirmed the intuition that generated the experiment, it poses difficulties for the backward search mechanism proposed by Lieberman et al. Because the noise in the marked-before condition preceded the choice response, any memory search it initiated could not have detected the choice response. Indeed, any such search could have interfered with learning, because it would mark whatever the rat was doing just before the marker was presented. Facilitated recall of this irrelevant behavior would presumably interfere with learning of the current choice.

Because learning in the marked-before group cannot be attributed to a backward search mechanism, how can it be explained? One possibility involves the effects of a salient stimulus on attention. One such effect appears to be a focusing of attention (Easterbrook, 1959; Telegdy & Cohen, 1971). Thus rats that receive a noise marker before making their choice response may be more likely to focus their attention on the black and white cues facing them and thus be better able to recall which cues they chose when they later receive food.

If we assume that markers presented after a choice response also focus attention in this manner, then learning in the marked-after group could be explained by attention without invoking any memory search. When the rat enters one of the side arms, presentation of the marker will focus the rat's attention on the dominant cues in its environment. Because the most salient of these is likely to be the chamber's color (black or white), the rat would be especially likely to attend to this brightness cue and thus would be likely to recall it when later fed in the goal box.

The learning that occurs when a marker follows a choice response, then, could be due to a backward memory search for possible causes (Lieberman et al., 1979), but it could equally be explained by increased attention to subsequent events. The purpose of Experiment 3 was to obtain further evidence concerning the relative plausibility of these explanations.

Experiment 3

If the backward search hypothesis is correct, a marking stimulus triggers a search through memory to identify preceding events that might have caused it. If the attention hypothesis is correct, markers focus attention on the dominant cues present immediately following the marker. In Experiment 2, with distinctive side arm cues present both before and after a choice, these hypotheses predict equivalent learning. If the white and black discriminative cues could somehow be removed as soon as the rat made its choice response, however, then the two hypotheses would lead to very different predictions.

To achieve this aim, in Experiment 3 the doors leading to the side arms were painted black or white, but the side arms themselves were an identical shade of gray. To eliminate other possible cues, the two side arms were randomly interchanged over trials, and the position of the maze within the room was rotated. Once a rat entered a side arm, therefore, no cues were available to indicate whether or not it had made a correct response.

Because these changes might make the discrimination problem considerably more difficult, the delay of reinforcement was reduced

from 120 sec to 30 sec. Additional discriminative cues were also made available in the choice box. The doors leading to the side arms and their surrounds were painted black or white, and a small light was mounted immediately above the white door.

A number of discriminative cues were thus available at the moment of choice, but none were still present when the marker occurred. If a marker triggers a memory search, learning should still be possible; whereas if it only increases attention to subsequent events, learning should not occur.

Method

Subjects

The subjects were 16 male rats, experimentally naive and approximately 110 days old at the start of the experiment. Each subject was maintained at 80% of its free-feeding body weight and housed individually in a different room from the one in which the experiment was run.

Apparatus

The layout and dimensions of the maze used in this experiment were similar to those of the maze used in Experiments 1 and 2. The present experiment, however, required a number of modifications to the details of the maze design.

The maze was constructed entirely of clear plastic that was channeled to accept colored plastic sheets on the outer walls so that the color of these walls could be easily changed. The walls and floor of all compartments of the maze were colored gray. The guillotine doors between the various compartments of the maze were made of opaque gray plastic that was left unpainted except for the doors from the choice box to the two side arms. These two doors and their surrounds were visually differentiated but only on the sides that faced the choice box. The left-hand door and surround were painted black; the right-hand door and surround were painted white. A small 6-W lamp was mounted centrally on top of the door surround of the white, right-hand door. The door surfaces facing the side arms were left gray, as were the walls and floor surfaces of the side arms. The two side arms themselves were interchangeable.

The goal box in the present maze was made slightly larger than that used previously. This enlargement ensured that even without a barrier (see above) a rat would have to completely enter the goal box to investigate the contents of a dish placed against its far wall.

All other particulars of the apparatus were the same as in Experiment 1.

Procedure

Pretraining. Pretraining was the same as in Experiment 1 except that each subject was first given a single 10-min session of free exploration in the maze with all doors held open, followed, on the next day, by a single rewarded placement in the goal box.

Delayed reward training. The subjects were randomly assigned to two groups of eight subjects: the marked group and the unmarked group. Subjects were taken individually in a small transfer cage to the experimental room for testing. For all subjects a correct response was defined as an entry into the right-hand side arm (entered via a white door and surround).

The procedure for delayed reward training was the same as in Experiment 1 with the following modifications. The delay interval was shortened to 30 sec. A light above the right-hand, white side arm door was turned on 2 sec before the doors to the side arms were raised to allow the subject to make its choice response. Subjects in the marked group only were presented with a burst of white noise at 90 dB immediately after their choice response. Subjects in the unmarked group were treated in exactly the same way as the marked subjects, except that noise was not presented following their choice responses.

After every trial the maze was rotated on a random basis, and the side arms were interchanged. All other procedures were the same as in Experiment 1. All subjects were given two trials daily until each had received 50 trials.

Results and Discussion

Figure 4 shows the percentage of correct first responses over blocks of five trials for each of the two treatment groups. There was some increase in correct first responses over trials in both groups, but the increase was much greater for the marked group.

A two-way ANOVA was carried out on the data presented in Figure 4 for the factors of experimental treatment and trials. This analysis showed a significant effect of treatments, F(1, 14) = 6.2, p < .05, and a significant effect of trial blocks, F(9, 126) = 2.8, p < .01. In addition, there was a significant interaction between treatments and trials, F(9, 126) = 2.2, p < .05, confirming that the marked group showed significantly more improvement over trials than did the unmarked group.

The mean latency of choice responses for subjects in the marked group was 5.16 sec (SD = 4.6 sec), and for subjects in the unmarked group, it was 3.4 sec (SD = 3.2 sec). The difference between the group means was not significant, t(14) = 1.83, p > .05. Whereas unmarked control subjects showed little learning, the performance of subjects in the marked group improved substantially (and significantly) over trials. Thus despite the fact that no discriminative cues were present following the marker, its presentation still enhanced
learning substantially. A marker, it is thus clear, does not simply focus attention on subsequent events: It also produces learning about events that precede it.

Figure 4. Percentage of correct first responses for blocks of five trials for each group in Experiment 3. (Plot not reproduced; the x-axis shows blocks of five trials, and the legend distinguishes the marked group from the unmarked group.)

General Discussion

The results of the present experiments have implications for our understanding of marking at both empirical and theoretical levels. One important finding is that a marker presented immediately after a response can enhance learning even if it is not repeated when food is presented subsequently. Marking might thus play a role in any delayed-reinforcement situation in which the response produces distinctive feedback. Consider, for example, the standard practice in operant experiments of presenting a click immediately after every bar press. It has long been recognized that this use of feedback stimuli facilitates shaping (Skinner, 1938), and this facilitation has traditionally been attributed to secondary reinforcement because the click is closely followed by food. This analysis may well be correct, but the present results raise the interesting possibility that some or all of this facilitory effect could be the result of a marking process: When the click occurs it may help the rat to remember the bar press as a functional unit, or alternatively, it may direct attention to the critical segment of the response (i.e., the bar's downward movement). Rather than directly strengthening the response, in other words, the click may be helping the rat to remember the critical units in its execution so that when food is presented subsequently the rat will be better able to identify these units and thus reproduce them. One important function of feedback stimuli, in other words, may be to mark the preceding response in memory.

A second important finding was the demonstration in Experiment 2 that a marker may enhance learning about subsequent events as well as preceding ones. The authors are not aware of any precedent in the animal learning literature, but a related finding in human learning may be the effects of noise on paired-associate learning. Hamilton, Hockey, and Quinn (1972), for example, gave two groups of subjects a list of paired associates to memorize. Subjects whose initial exposure to the list occurred in the presence of a continuous 85 dB white noise were found to recall significantly more words during subsequent testing (see also Craik & Blankenstein, 1975).

Together, these findings pose interesting theoretical questions. The most obvious is how a marker can enhance learning about events that follow as well as precede it. Our current belief is that a salient stimulus elicits two complementary coping strategies: (a) a backward search through memory in order to identify events that might have produced the marker and (b) changes in attention to external events, perhaps as part of the same search for causes. The memory search would lead to recall of preceding events, whereas changes in attention would lead to better recall of those which follow.

A second issue concerns the apparent conflict between the marked-after result and Wagner's rehearsal model (Wagner, 1978, 1981; Wagner, Rudy, & Whitlow, 1973). According to this model, an event must be rehearsed in short-term memory before a permanent representation of it can be formed in long-term memory. Because rehearsal capacity is assumed to be limited, presentation of a salient stimulus will normally reduce rehearsal of preceding events and thus reduce the likelihood of their being remembered. Support for this analysis comes from Wagner's own research on classical conditioning and also from operant experiments (e.g., Cook, 1980; Pearce & Hall, 1978; Tranberg & Rilling, 1980). The model predicts that a marker should hamper learning because it will interfere with rehearsal of the preceding response.

Marking is not, however, the only instance in which a salient stimulus enhances rather than disrupts learning about contemporaneous events, and thus appears to contradict the rehearsal model: A parallel finding in taste-aversion conditioning has been termed potentiation. In a study by Palmerino, Rusiniak, and Garcia (1980), for example, an odor was presented either by itself or in compound with a taste and then followed after a delay by illness. Not only did the presence of the taste not interfere with conditioning to the odor, it significantly enhanced it.

A possible theoretical reconciliation of enhancement and interference effects, however, can be achieved with only a slight change in the rehearsal model. Interference occurs, according to the model, because a novel stimulus reduces the amount of processing available for contemporaneous events. Granting this, we can allow for enhancement if we assume that the reduced amount of available processing is not necessarily redistributed equally over the set of events concerned. In particular, if an event with some special relationship to the salient stimulus (e.g., contiguity, similarity) receives a larger share of the reduced amount of processing, then this event could actually receive more processing rather than less following a salient stimulus.

Further theoretical speculation is probably premature. The important point is that although a salient stimulus may interfere with learning about contemporaneous events under some conditions, in others it may enhance it. A pressing problem for future research will be to clarify the conditions under which each of these effects is obtained.

References

Brown, R., & Kulik, J. Flashbulb memories. Cognition, 1977, 5, 73-99.
Cook, R. G. Retroactive interference in pigeon short-term memory by a reduction in ambient illumination. Journal of Experimental Psychology: Animal Behavior Processes, 1980, 6, 326-338.
Craik, F. I. M., & Blankenstein, K. R. Psychophysiology and human memory. In P. H. Venables & M. J. Christie (Eds.), Research in psychophysiology. London: Wiley, 1975.
Cronin, P. B. Reinstatement of postresponse stimuli prior to reward in delayed-reward discrimination learning by pigeons. Animal Learning & Behavior, 1980, 8, 352-358.
D'Amato, M. R., Fazzaro, J., & Etkin, M. Anticipatory responding and avoidance discrimination as factors in avoidance conditioning. Journal of Experimental Psychology, 1968, 77, 41-47.
Dunnett, C. W. A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association, 1955, 50, 1096-1121.
Easterbrook, J. A. The effect of emotion on cue utilization and the organization of behavior. Psychological Review, 1959, 66, 183-201.
Grice, G. R. The relation of secondary reinforcement to delayed reward in visual discrimination learning. Journal of Experimental Psychology, 1948, 38, 1-16.
Hamilton, P., Hockey, G., & Quinn, J. Information selection, arousal and memory. British Journal of Psychology, 1972, 65, 181-189.
Kamin, L. J. Predictability, surprise, attention and conditioning. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior. New York: Appleton-Century-Crofts, 1969.
Lett, B. T. Delayed reward learning: Disproof of the traditional theory. Learning and Motivation, 1973, 4, 237-246.
Lett, B. T. Long delay learning in the T-maze. Learning and Motivation, 1975, 6, 80-90.
Lett, B. T. Long-delay learning: Implications for learning and memory theory. In N. S. Sutherland (Ed.), Tutorial essays in psychology: A guide to recent advances (Vol. 2). New York: Halstead Press, 1979.
Lieberman, D. A., McIntosh, D. C., & Thomas, G. V. Learning when reward is delayed: A marking hypothesis. Journal of Experimental Psychology: Animal Behavior Processes, 1979, 5, 224-242.
Muenzinger, K. F. Motivation in learning: I. Electric shock for correct responses in the visual discrimination habit. Journal of Comparative Psychology, 1934, 17, 267-277.
Neuringer, A. J., & Chung, S. H. Quasi-reinforcement: Control of responding by a percentage-reinforcement schedule. Journal of the Experimental Analysis of Behavior, 1967, 10, 45-54.
Palmerino, C. C., Rusiniak, K. W., & Garcia, J. Flavor-illness aversions: The peculiar roles of odor and taste in memory for poison. Science, 1980, 208, 753-755.
Pearce, J. M., & Hall, G. Overshadowing the instrumental
conditioning of a lever-press response by a more valid predictor of the reinforcer. Journal of Experimental Psychology: Animal Behavior Processes, 1978, 4, 356-367.
Roberts, W. A. Failure to replicate visual discrimination learning with a 1-min delay of reward. Learning and Motivation, 1976, 7, 313-325.
Skinner, B. F. The behavior of organisms. New York: Appleton-Century-Crofts, 1938.
Spear, N. E. The processing of memories: Forgetting and retention. New York: Erlbaum, 1978.
Telegdy, G. A., & Cohen, J. S. Cue utilization and drive level in albino rats. Journal of Comparative and Physiological Psychology, 1971, 75, 248-253.
Tranberg, D. K., & Rilling, M. Delay-interval illumination changes interfere with pigeon short-term memory. Journal of the Experimental Analysis of Behavior, 1980, 33, 39-49.
Tulving, E., & Osler, S. Effectiveness of retrieval cues in memory for words. Journal of Experimental Psychology, 1968, 77, 593-601.
Tulving, E., & Pearlstone, Z. Availability versus accessibility of information in memory for words. Journal of Verbal Learning and Verbal Behavior, 1966, 5, 381-391.
Wagner, A. R. Expectancies and the priming of STM. In S. H. Hulse, H. Fowler, & W. K. Honig (Eds.), Cognitive processes in animal behavior. Hillsdale, N.J.: Erlbaum, 1978.
Wagner, A. R. SOP: A model of automatic memory processing in animal behavior. In N. E. Spear & R. R. Miller (Eds.), Information processing in animals. Hillsdale, N.J.: Erlbaum, 1981.
Wagner, A. R., Rudy, J. W., & Whitlow, J. W. Rehearsal in animal conditioning. Journal of Experimental Psychology, 1973, 97, 407-426. (Monograph)

Received May 17, 1982
Revision received February 2, 1983
