
Song sequence learning with delayed reinforcement in male Bengalese finches

Marina Hovhannisyan1, Lena Veit2, Michael Brainard3

1 University of California, Berkeley, Neurobiology, Department of Molecular and Cell Biology, Life Sciences
Addition, 3200, Berkeley, CA 94720
2, 3 University of California, San Francisco, Department of Physiology, 675 Nelson Rising Lane, San Francisco, CA,
94143-051

Abstract. In male birds, song is learned by listening to adult conspecifics and then proceeding through a period
of sensorimotor learning in which initially generic vocalizations are gradually refined until they closely resemble the
adult ‘tutor’ song models. In adulthood, song, much like accent in speech, is normally stable in ‘closed-ended’ learners
such as the Bengalese finch. However, experiments from the Brainard lab have shown that normally stable adult song
can be modified through a process of differential reinforcement in which real-time monitoring and feedback are provided
to ‘punish’ the birds for singing certain song variants. This study investigates the rules that guide modification of adult
syllable sequencing, specifically the temporal relationship between a sequencing choice and differential reinforcement.
We deploy a computerized training system to provide birds with real-time feedback about performance in this form of
differential reinforcement for producing some sequences in preference to others.

1 Introduction
The song of male birds is a sequence of consecutive syllables, short vocalizations varying in
frequency, intensity and duration. Syllables can be distinguished by their spectral structure, which
is expressed as differences in duration, intensity and frequency. We label each syllable with a letter
of the alphabet (a, b, c, etc.) (Figure 1A). In Bengalese finches, each song consists of 5-15 syllables,
grouped in patterns particular to the bird. Syllables can occur either in a stereotyped sequence,
where syllable b is always followed by syllable c, or in variable sequences, where syllable b
may be followed by c or d; in the latter case, b is termed a branch point (Figure 1B) (Warren
et al. 2012).

Variable transitions at branch points occur probabilistically: for example, c follows b in 40% of
cases and d follows b in 60% of cases. These probabilities are usually stable in adult song. Adult
birds can, however, be trained to change these transition probabilities through negative reinforcement.
In such a study, specific syllables following a branch point were targeted with white noise (WN).
This aversive stimulus leads the bird to avoid the targeted syllable, lowering the transition
probability towards this syllable at the branch point (Warren et al. 2012). With b as a branch
point leading to c 40% of the time and to d 60% of the time, targeting syllable c would lower the
probability of c below 40% and correspondingly raise the probability of d above 60%. Because the
transition probabilities at a branch point sum to one, a drop in the probability of one transition
must be matched by an increase in the others. If more than two syllables can follow the branch
point, the drop in one transition can be absorbed by one or by all of the others.
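
This constraint can be stated compactly. The following is a sketch in the pbc, pbd notation of Figure 1; the symbol S(b), denoting the set of syllables that can follow branch point b, is introduced here only for illustration.

```latex
\sum_{x \in S(b)} p_{bx} = 1
\quad\Longrightarrow\quad
\sum_{x \in S(b)} \Delta p_{bx} = 0,
\qquad \text{so for } S(b) = \{c, d\}: \quad \Delta p_{bd} = -\Delta p_{bc}.
```

In the two-way example, a drop of the b → c probability from 0.40 to, say, 0.30 (a hypothetical value) would necessarily raise the b → d probability from 0.60 to 0.70.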

Figure 1: Branch point and transition probabilities
“A. Spectrogram of Bengalese finch song illustrating the sequencing of the acoustically distinct
syllables a, b, c, and d. Syllable a always transitioned to syllable b. In contrast, syllable b was a
branch point at which a transition could occur to either syllable c or syllable d. B. Schematic (left)
and expanded spectrograms (right) for the branch point at syllable b. For this branch point, the
probability of transitioning to syllable c was 68% (pbc = 0.68) and the probability of transitioning
to syllable d was 32% (pbd = 0.32) over a 4 d period” (Warren et al. 2012).

Certain patterns exhibit high stereotypy, where a particular sequence of syllables is sung invariably
throughout each song (i.e., a sequence of syllables without a branch point). We define these as
motifs. The syllable sequence within a motif does not change; however, particular syllables of the
motif may appear independently elsewhere in the song or be part of other motifs. Take as an example
the song t abcd t ebfg t abcd t ebfg, in which syllable b appears in the motifs abcd and ebfg. As in
previous analyses, a transition diagram based on single syllables would represent b as a branch
point (b → c and b → f) (Figure 2). Taking the history of the sequence leading up to b into account,
however, we see that b is not really a branch point; rather, the transition from b to the next
syllable is entirely determined by the motif (Jin & Kozhevnikov 2011). The ‘true’ branch point in
this example sequence is t, with the transitions t → a and t → e. We therefore asked how this motif
structure influences song sequence learning. Assuming that birds can only modify transition
probabilities at motif branch points, we were interested in whether the distance of WN reinforcement
from the branch point mattered. To test this, we targeted a variable transition between motifs at
different distances from the motif branch point (here, syll. t-d and syll. t-g).

Figure 2: Motif patterns
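
The difference between a single-syllable view and a history-dependent view of the example song can be illustrated with a short Python sketch. This is not the analysis code used in the study (which was written in MATLAB); the function name `transition_probs` and its `context` parameter are hypothetical and introduced only for this illustration.

```python
from collections import Counter, defaultdict


def transition_probs(labels, context=1):
    """Estimate P(next syllable | preceding `context` syllables) from a label string."""
    counts = defaultdict(Counter)
    for i in range(context, len(labels)):
        counts[labels[i - context:i]][labels[i]] += 1
    return {
        ctx: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for ctx, c in counts.items()
    }


# Example song from the text, written without spaces
song = "tabcdtebfgtabcdtebfg"

first_order = transition_probs(song, context=1)
second_order = transition_probs(song, context=2)

print(first_order["b"])    # b alone looks like a branch point (c or f)
print(second_order["ab"])  # with one syllable of history, only c follows "ab"
print(second_order["eb"])  # and only f follows "eb"
print(first_order["t"])    # t is where the choice between motifs is made
```

With one syllable of history, the apparent branch at b disappears, which is the sense in which the transition out of b is determined by the motif.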

Adult song plasticity has been studied extensively in pitch learning, where the bird learns
to shift the pitch of a particular syllable to avoid negative reinforcement immediately following
the target (Tumer & Brainard 2007). In this case, it has been demonstrated that there needs to be
a close temporal relationship between the target (i.e., the point in time where pitch is measured)
and the delivery of reinforcement. No learning was observed when negative reinforcement
was delayed by 100 ms from the target (Tumer & Brainard 2007). In our study, we perform a
series of experiments targeting syllables inside a motif, at a delay from the branch point, to
test whether song plasticity is possible with delayed negative reinforcement. We hypothesize that
reinforcement learning should occur only if the reinforcer occurs close in time to the decision. If the
behavioral unit of song sequencing is the song syllable, then punishing a syllable several syllables
after the branch point should not be an effective reinforcement signal, and the bird should not be
able to change the transition probabilities in response (Figure 3 (a)). Alternatively, if the
representation is motif-based, targeting at any time during the motif should provide a learning signal
that can reduce the probability of that motif (Figure 3 (b)).

Figure 3: Learning Models
These two models of learning represent two extremes.
(a) In Model 1, the bird only learns if WN is administered at the beginning of the motif.
(b) In Model 2, the effect of the WN is uniform throughout the entire motif.

2 Materials and Methods


Subjects For this study, we used one Bengalese Finch (BF) to test behavioral changes under nega-
tive reinforcement administration.

Sound Recording and Auditory Feedback Recording and auditory feedback delivery were performed
using custom LabView software (EvTAF). Song was recorded continuously throughout baseline
and experimentation days (Figure 4). Syllables were targeted based on their spectral structure. For
each syllable, a template was created that captured the spectral structure of the syllable. The
template was then divided into a number of bins, and a particular bin identifying the syllable was
chosen manually. The software continuously compares this template to ongoing song every
8 ms and can play WN with short latency as soon as the target syllable is detected.

Reinforcement at Branch Points White noise (WN) was administered as negative reinforcement
over the punished syllable (Figure 5). Using motif-based targeting, we administered WN on
syllables a (first a), s (second s) and o (first o) of the motifs klmnaaoo and dss. For baseline
recordings, no WN was administered. We conducted a series of learning experiments, each lasting
an average of 10 days. During reinforcement, WN was played for 95% of all song bouts and turned
off in a random sample of 5% of the song bouts to monitor learning. Songs during which WN was
not present are termed “catch” songs and are the data used to detect learning. Certain syllables
of the “catch” files were labeled for each song, using templates to recognize the syllables
followed by manual checking and labeling (Tumer & Brainard 2007).

Figure 4: Experimental Setup
The bird receives auditory feedback, in the form of WN,
upon singing the targeted syllable into a microphone.

It is known that disturbance to auditory feedback can cause acute changes to sequence probability
(Sakata and Brainard, 2006). During certain experiments, we observed that the presence of WN
initially increases the percentage sung of the punished syllables. Therefore, we analyze only the
catch songs. In addition, despite the administration of WN, the bird finishes singing both motifs.

Analysis All analyses were performed in MATLAB.

Statistics Transition probability is the quantity used to analyze the song structure and learning
patterns of the bird. It is measured as the fractional occurrence of a particular transition relative
to all transitions at the branch point. For instance, if for a particular branch point b we have the
transitions b → a, b → c and b → d, then the transition probability of b → a, termed pba, equals
the number of occurrences of b → a divided by the sum of the occurrences of b → a, b → c and
b → d. To test significance, we use Fisher’s exact test. A p-value below 0.05 is taken to indicate
a significant difference between conditions.
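
The definition of transition probability can be written as a few lines of code. The sketch below is in Python rather than the MATLAB used for the actual analysis, and the function name `branch_transition_probs` and the label string are hypothetical examples, not data from this study.

```python
from collections import Counter


def branch_transition_probs(labels, branch_point):
    """Fractional occurrence of each transition out of `branch_point`."""
    following = Counter(
        nxt for cur, nxt in zip(labels, labels[1:]) if cur == branch_point
    )
    total = sum(following.values())
    # Returns an empty dict if the branch point never occurs in `labels`
    return {nxt: n / total for nxt, n in following.items()}


# Hypothetical labels: b is followed by a twice, by c once and by d once
labels = "xbaxbcxbdxba"
probs = branch_transition_probs(labels, "b")
print(probs)  # pba = 2 / (2 + 1 + 1), and so on for pbc and pbd
```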

3 Results
3.1 Overview
The song of this Bengalese Finch consists of 13 distinct syllables, each marked by a letter in our
study. There are 3 main stereotyped motifs (Figure 5). After labeling these syllables, we obtain
the motifs dss, tch and klmnaaoo. Throughout the screening period and the series of experiments,
these motifs show no variability in their syllable composition. We observe that klmnaaoo is al-
ways preceded by tch, whereas dss can occur following other syllables as well (Figure 6). The
tch motif can also be found without the succession of dss or klmnaaoo. We thus set tch as our
transition motif (with the syllable h being the branch point) and define klmnaaoo as motif 1 (CA)
and dss as motif 2 (CS).

Figure 5: An example bout of singing with motifs highlighted

To quantify learning, we measure the transition probabilities of tch → dss (termed pts) and tch
→ klmnaaoo (termed pta). This is done by computing the percentage of the time that CA and CS
follow tch. In addition to the transitions tch → dss and tch → klmnaaoo, we also observe transitions
from tch to other syllables, at an average of 4.36% during screening. Thus, the calculation of pta
and pts also takes into account the number of tch → other transitions. Because CS also occurs
without being preceded by tch, we quantify learning in an additional manner: by analyzing the
ratio of syllable s to syllable a, as a proxy for the ratio of CS to CA throughout each experiment.
By encompassing all occurrences of CS, this provides a more complete picture of the motif
frequencies (Figure 7).
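
Both measures can be computed directly from a string of syllable labels. The following Python sketch uses a made-up label string (with x standing for any other syllable) and a hypothetical function name `motif_stats`; it is meant only to show how the two measures differ, not to reproduce the MATLAB analysis.

```python
def motif_stats(labels):
    """Return (pta, pts, s-to-a ratio) for one string of syllable labels.

    Assumes that tch and syllable a each occur at least once.
    """
    n_tch = labels.count("tch")
    p_ta = labels.count("tchklmnaaoo") / n_tch  # tch followed by CA
    p_ts = labels.count("tchdss") / n_tch       # tch followed by CS
    ratio = labels.count("s") / labels.count("a")
    return p_ta, p_ts, ratio


# Hypothetical bout: CA, CS, a CS not preceded by tch, CA, CS, then tch -> other
labels = "tchklmnaaoo" "tchdss" "dss" "tchklmnaaoo" "tchdss" "tchx"
p_ta, p_ts, ratio = motif_stats(labels)

# pta and pts do not sum to 1 because of the tch -> other transition,
# and the s-to-a ratio also counts the CS that did not follow tch.
print(p_ta, p_ts, ratio)
```

Because dss contains two s syllables and klmnaaoo contains two a syllables, the syllable ratio equals the ratio of motif counts in this example.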

Figure 6: Syllable transition diagram from an example day in the screening period.
Numbers show transition probabilities in %; line thickness is proportional to transition
probability. Arrows show the direction of transition. For simplicity, introductory notes and other
variable syllables are shown as ‘X’. Note that this transition diagram was calculated from a
random subset of songs in the screening period; therefore, transition probabilities do not match
those in Figure 7 (a) exactly.

3.2 Screening period


In the initial week, we conducted a screening recording for 7 days to obtain baseline values for the
CS to CA ratio and for pta and pts (Figure 7 (a)). Here, we observe a ratio close to 1, suggesting
that the two motifs occur with equal frequency (Figure 7 (b)). During screening, there is
variability in the bird's song (e.g., order of motifs, length of song), but the transition
probabilities and the CS to CA ratio remain relatively stable.

Figure 7: Statistics for the screening period
(a) pta (blue) and pts (red) over 7 days in the screening period without WN.
(b) Ratio of motif CS to motif CA over the same 7 days in the screening period.

3.3 Targeting CA (1st experiment)

In the first experiment following the baseline testing, we target the syllable a of motif CA, 5
syllables or approximately 200 ms after the branch point (Figure 8 (c)). Because we did not observe
significant changes from the baseline value during the initial four days of targeting, we increased
the WN amplitude and duration on day 8. The pta values mostly remain below the baseline value
for days 5-11, reaching 0.41 on the final day of experimentation (Figure 8 (a)). This is
complemented by an increase in pts from a baseline value of 0.44 to 0.57 on the final day of
targeting. The CS/CA ratio increases from a baseline value of 1 to 1.7 on the final day of
targeting (Figure 8 (b)). The pta at the end of the training period is significantly lower than
during baseline (Fisher’s exact test, p = 2.1e-05, n=8290 (transitions t → a during screening),
5995 (transitions t → a during targeting CA), 7229 (transitions t → s during screening), 8157
(transitions t → s during targeting CA)). Therefore, we conclude that delayed WN was sufficient
to induce sequence plasticity in this experiment.
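
For readers unfamiliar with the test, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table of transition counts, written in pure Python. It is not the MATLAB routine used for the reported p-values; the function names are hypothetical, the two-sided rule (summing all tables no more probable than the observed one) is one common convention, and the counts in the usage example are invented.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    log_total = log_comb(row1 + row2, col1)

    def log_p(x):
        # Hypergeometric log-probability of x in the top-left cell, margins fixed
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    log_obs = log_p(a)
    # Sum the probabilities of all tables no more probable than the observed one
    p = sum(
        exp(lp) for lp in map(log_p, range(lo, hi + 1)) if lp <= log_obs + 1e-7
    )
    return min(1.0, p)


# Hypothetical counts: rows are conditions, columns are t -> a and t -> s
baseline = (54, 46)
targeting = (41, 59)
print(fisher_exact_two_sided(*baseline, *targeting))
```

Working in log space keeps the binomial coefficients manageable for counts in the thousands, as in the transition counts reported here.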

The transition diagram (Figure 8 (d)) shows that the decrease in the transition probability of one
motif is accompanied by an increase in the other, consistent with tch being a branch point.

Figure 8: Statistics for targeting CA (a)-(b), spectrogram with WN on the targeted syllable (c),
syllable transition diagram after targeting CA (d).

3.4 Targeting CS
In the second experiment, we target the syllable s of motif 2 (CS), approximately 100 ms after the
branch point (Figure 9 (c)). Throughout the 9 days of experimentation, the CS/CA ratio hovers
around 2 (with 1.7 being the final value after targeting the a syllable) (Figure 9 (b)). During the
initial few days, we observe a surprising increase in pts to a value of 0.55 and a subsequent drop
in pta to a value of 0.45 on the final day of experimentation (Figure 9 (a)). Unlike in the first
experiment, learning does not appear to take place.

Figure 9: Statistics for targeting CS (a)-(b), WN targeting of syllable s (c).

3.5 Recovery Period 1


Following three weeks of WN reinforcement, we allow the bird to recover in the absence
of WN so that the song can return to its baseline version. Such a return to baseline is expected
after learning (Warren et al. 2012). During 10 days of recovery, however, we do not observe
a return of the song to its original state (Figure 10 (a)). The CS to CA ratio remains close to
2 (Figure 10 (b)). To continue experimentation, we therefore adopt new baseline values of 0.37
for pta (compared to 0.54 during screening), 0.63 for pts (compared to 0.43 during screening)
and 1.75 for the CS/CA ratio (compared to 1 during screening), and refer to these as the
New Baseline (NB).

Figure 10: Statistics for the first recovery period

3.6 Targeting CA (2nd experiment) and Recovery Period 2


After 10 days of recovery, we target the second syllable a in motif 1 (CA), again approximately
200 ms after the branch point (Figure 11 (c)). In comparison to the new baseline values (NB)
obtained from the recovery period, we see a drop in pta and a subsequent increase in pts
(Figure 11 (a)). The pta values remain below the new baseline during the first 3 days of
experimentation but gradually increase. After the 4th day, they remain either above or at the new
baseline value. Comparing ratios, we see an increase in the CS/CA ratio from the last day of
recovery to the first day of experimentation (1.7 to 2.5) (Figure 11 (b)). This is consistent with
the increase in pts during the initial days. The ratio then drops during the initial 4 days and
remains close to 2, still above that of the last days of the recovery period, without significant
fluctuation during the remaining days of experimentation. Thus, the administration of WN leads the
bird to avoid CA during the initial days, but a gradual recovery, marked by the increase in pta
throughout the following days, is observed. Comparing the last three days of the recovery period
(as the new baseline) with the last three days of the experimental period, we find that the
percentage of t-a transitions decreased significantly between the recovery period and the targeting
of the second a syllable (Fisher’s exact test, p = 2.2e-16, n=7157 (transitions t → a during recovery),
12036 (transitions t → a during targeting CA), 9704 (transitions t → s during recovery), 19149
(transitions t → s during targeting CA)).

Figure 11: Statistics for the second experiment targeting CA (a)-(b), WN administration to syll. a
(c).

Entering a second recovery period, the song exhibits an increase in the CS/CA ratio from 1.7 on
the last day of targeting to 2.5 on the first day of recovery (Figure 12 (a)). This trend is also
seen in the increase in pts from 0.55 on the last day of targeting to 0.67 and the decrease in pta
from 0.4 to 0.32 (Figure 12 (b)).

Figure 12: Statistics for the second recovery period
Note that days 8-14 had to be excluded from the analysis due to malfunctioning sound recording
equipment. Total recovery time was 19 days.

3.7 Targeting CA (3rd experiment) and Recovery Period 3


After the 19 days of recovery, we target a syllable later in CA, syllable o (Figure 13 (c)). Taking
a baseline CS to CA ratio of 2 and baseline values (NB*) of 0.44 for pta and 0.55 for pts,
we once again observe song plasticity in the form of an increase in the CS to CA ratio to 2.4
(Figure 13 (a), (b)). The difference between the preceding recovery period and the period of
targeting syllable o is significant (Fisher’s exact test, p = 2.1e-05, n=8452 (transitions t → a
during recovery), 2287 (transitions t → a during targeting CA), 11187 (transitions t → s during
recovery), 3448 (transitions t → s during targeting CA)).

Figure 13: Statistics for the third experiment targeting CA (a)-(b), WN administration to syll. o
(c).

After 10 days of experimentation, we allow the bird to recover. We observe a drop in the CS/CA
ratio from an initial 2.5 to a final value of 1.7 on the 12th day of recovery (Figure 14 (b)). The
pta rises from 0.44 at the end of the o-targeting period to 0.47 on the final day of recovery,
and the pts drops from 0.56 to 0.46 (Figure 14 (a)). We thus observe a small recovery in
the song, but not back to the values of the initial baseline.

Figure 14: Statistics for the third recovery period

3.8 Summary
We plot the transition probability of the above experiments from the data obtained on days 5, 6
and 7 of the experimentation. In summary, all three experiments where a syllable in motif CA was
targeted led to a slight decrease in pta compared to the immediately preceding baseline. No learning
was observed in the reverse direction for targeting motif CS. Even with prolonged recovery periods
without WN, the bird never returned to its initial baseline song (Figure 15).

Figure 15: Summary plot of days 5, 6 and 7 of each experiment and recovery period.
Error bars show the standard error of the mean over n=3 days. Asterisks indicate the level of
significance in Fisher’s exact test, with three asterisks marking the lowest p-value obtained.
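
As a reminder of how such error bars are obtained, the sketch below computes the mean and standard error of the mean for three daily values in Python. The function name `mean_and_sem` and the three values are hypothetical and do not correspond to data from this study.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical pta values on days 5, 6 and 7 of one experiment
daily_pta = [0.40, 0.42, 0.44]
m, sem = mean_and_sem(daily_pta)
print(m, sem)
```

With only n=3 days per condition, the standard error is sensitive to a single unusual day.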

4 Discussion
Adaptive Modification The series of experiments suggests that adaptive modification of song
sequence is possible with delayed negative reinforcement. In contrast to previous pitch learning
studies, where delayed (100 ms) administration of WN had no impact on syllable pitch
(Tumer & Brainard 2007), we find in this study that delayed negative reinforcement may indeed
alter the sequence structure of the song. We note that for each targeted motif, there is no
alteration in the syllable composition of the motif itself, but rather an avoidance of the motif
altogether during experiments in which the bird exhibits learning. Thus, there is no change in the
stereotyped sequences, as observed in previous sequence learning studies (Warren et al. 2012).
The observed change in transition probabilities is significant but very modest in all three
experiments in which we targeted CA. This could be due to the delay or to the structure of the
song in this bird; more subjects will be needed to distinguish these possibilities. We do, however,
observe an alteration in the motif probabilities, in particular during the first experiment, and can
thus conclude that the bird was able to learn with delayed feedback.

Targeting CA (1st experiment) In targeting syllable a, we observe an overall decline in pta. This
behavior appears contingent on WN administration, since the increase in the duration and amplitude
of WN is followed by a drop in the transition probability. We observe that the avoidance is
motif-based and not syllable-based, since the bird learns to avoid the motif entirely.

Targeting CS For the syllable s, 100 ms away from the transition point, we see no impact
of WN on the song structure. This lack of learning is surprising, given that previous sequence
learning experiments have demonstrated that learning is possible in both directions at branch
points (Warren et al. 2012) and that this bird was able to modify transition probabilities at this
branch point with an even longer delay when CA was targeted. One possible explanation is that in
targeting the CS motif, we targeted the second s syllable. This contrasts with the CA targeting,
where we punished the first a, causing the bird to avoid syllable a altogether. In the CS
targeting, punishment is therefore administered only half the time the target syllable is sung, so
learning may not be as efficient. CS targeting produces no change in the song; the CS/CA ratio as
well as pts remain reasonably stable at the new baseline and return to it after subsequent
experiments.

First Recovery period After the targeting experiments, allowing the bird to recover in the absence
of WN yields the surprising result of no recovery. On the contrary, we observe an increase in pts,
as opposed to a return to baseline. This may be attributable to the length of our experimentation
(a total of 20 days) in contrast to previous studies (6 days) (Figure 16). Another possibility
is that learning with delayed reinforcement is accomplished through different mechanisms than
learning directly at the branch point and might therefore follow different rules, such as being
more permanent.

Targeting CA (2nd experiment) Learning appears to have taken place on a small scale during the
initial three days of WN administration. The CS/CA ratios of the second recovery period show
a small fluctuation but remain close to a value of 2, with no return to the original baseline
value. The results from this experiment are hard to interpret because a pre-amplifier
malfunctioned during the experiment, so that a percentage of songs were not recorded properly.

Targeting CA (3rd experiment) The evidence of learning in the third experiment, reflected in an
increase in pts, demonstrates another instance of adaptive modification.

Future Experimentation With a longer experimentation time, the goal of future projects would
be to establish similar procedures for several birds instead of only one. A further modification to
the experimental procedure would be to fix the duration of each experiment in advance, for
instance by setting the number of baseline, reinforcement and recovery days beforehand and keeping
them constant for each bird. Learning would then be quantified from data obtained on the same day
for each bird. In addition, a longer period of baseline recording would be necessary to ensure
stability of the bird’s song.

Figure 16: WN contingent learning and recovery
“Example trajectory of probabilities for two transitions, ab (blue) and ac (red), over 2 d at
baseline (baseline), 6 d of WN targeting the ab transition (WN on), and 3 d following the
cessation of WN (WN off). Error bars indicate 95% confidence intervals” (Warren et al. 2012).

While these preliminary results suggest that some sequence learning with delayed reinforcement
is possible, and thus argue against the extreme learning model in Figure 3 (a), further experiments
should aim to dissect the exact relationship between the timing of negative reinforcement and the
magnitude and speed of learning. For example, in a future bird with a long stereotyped motif, one
might try to introduce sequence changes in the same direction by targeting syllable 1, 3, 5, and
7 from the branch point, in separate experiments in randomized order, to establish the amount of
learning as a function of distance from the branch point. Results from such an experiment might
refine our hypothesis in Figure 3 (b). The amount of learning could either scale with distance
from the branch point, so that the transition probability is most strongly modulated by immediate
feedback, but the time scale for effective feedback is slower than for pitch learning. Alternatively,
the amount and speed of learning could be independent of the time point at which WN is delivered
within the motif, arguing for a motif-based representation of syllable sequences, where any time
point of WN delivery acts to negatively reinforce the targeted motif as a whole.

5 Acknowledgments
I thank the UCSF Brainard Lab for providing the resources and aid for the experimentation, Lena
Veit and Professor Michael Brainard for mentoring me through the experiments, the UC Berkeley
Department of Molecular and Cell Biology for the chance to write an honors thesis and Professor
Daniel Feldman of UC Berkeley for supervising it.

6 References
1. Tumer EC, Brainard MS (2007). Performance variability enables adaptive plasticity of
crystallized adult birdsong. Nature 450:1240-1245.
2. Warren TL, Charlesworth JD, Tumer EC, Brainard MS (2012). Variable sequencing is actively
maintained in a well learned motor skill. J Neurosci 32(44):15414-15425.
3. Jin DZ, Kozhevnikov AA (2011). A compact statistical model of the song syntax in Bengalese
finch. PLoS Comput Biol 7(3):e1001108.
4. Sakata JT, Brainard MS (2006). Real-time contributions of auditory feedback to avian vocal
motor control. J Neurosci 26(38):9619-9628.
