
Article

Striatal dopamine signals are region specific and temporally stable across action-sequence habit formation

Authors
Wouter van Elzelingen,
Pascal Warnaar, João Matos, ...,
Damiaan Denys, Tara Arbab,
Ingo Willuhn

Correspondence
i.willuhn@nin.knaw.nl

In brief
Van Elzelingen et al. simultaneously track
striatal dopamine signaling and
development of habitual behavior. In the
dorsomedial striatum, previously
associated with non-habitual behavior,
dopamine release increases during action
initiation in habitual rats and decreases
during action completion, whereas non-habitual rats show opposite effects.

Highlights
• Validation of a novel test that monitors habit development individually across time

• Dopamine release during habit development is stable across relevant striatal regions

• Only dopamine release in dorsomedial striatum correlates with habit development

• Optogenetic stimulation of dorsomedial striatal dopamine accelerates habit formation

van Elzelingen et al., 2022, Current Biology 32, 1163–1174
March 14, 2022 © 2022 The Authors. Published by Elsevier Inc.
https://doi.org/10.1016/j.cub.2021.12.027
Wouter van Elzelingen,1,2 Pascal Warnaar,1,2 João Matos,1,2 Wieneke Bastet,1,2 Roos Jonkman,1,2 Dyonne Smulders,1,2
Jessica Goedhoop,1,2 Damiaan Denys,1,2 Tara Arbab,1,2 and Ingo Willuhn1,2,3,4,*
1Netherlands Institute for Neuroscience, Royal Netherlands Academy of Arts and Sciences, Meibergdreef 47, 1105 BA Amsterdam, the Netherlands
2Department of Psychiatry, Amsterdam University Medical Centers, University of Amsterdam, Meibergdreef 5, 1105 AZ Amsterdam, the Netherlands
3Twitter: @_the_ing
4Lead contact

*Correspondence: i.willuhn@nin.knaw.nl

SUMMARY

Habits are automatic, inflexible behaviors that develop slowly with repeated performance. Striatal dopamine signaling instantiates this habit-formation process, presumably region specifically and via ventral-to-dorsal and medial-to-lateral signal shifts. Here, we quantify dopamine release in regions implicated in these presumed shifts (ventromedial striatum [VMS], dorsomedial striatum [DMS], and dorsolateral striatum [DLS]) in rats performing an action-sequence task and characterize habit development throughout a 10-week training. Surprisingly, all regions exhibited stable dopamine dynamics throughout habit development. VMS and DLS signals did not differ between habitual and non-habitual animals, but DMS dopamine release increased during action-sequence initiation and decreased during action-sequence completion in habitual rats, whereas non-habitual rats showed opposite effects. Consistently, optogenetic stimulation of DMS dopamine release accelerated habit formation. Thus, we demonstrate that dopamine signals do not shift regionally during habit formation and that dopamine in DMS, but not VMS or DLS, determines habit bias, attributing “habit functions” to a region previously associated exclusively with non-habitual behavior.

INTRODUCTION

Our daily lives are governed by learned routines, exemplified by many actions we perform automatically without paying attention, such as pulling out our smartphone to seek rewarding input by perpetually scrolling down the screen. Such habits are thought to gradually develop from initially goal-directed, conscious actions (e.g., using our smartphone to contact a friend) that reliably resulted in desired outcomes. Habit formation is often tied to detection of stimuli predictive of these outcomes and is facilitated by both extended high-rate action repetition (referred to as “overtraining”1) and relative uncertainty as to when precisely the desired outcome will be achieved.2,3

The striatum has long been implicated in habit formation,4–6 and its subregions are assumed to fulfill distinct functions in this process. The dorsolateral striatum (DLS) is hypothesized to actively mediate habit learning, supported by ample evidence from rodent experiments which link DLS activity with (overtraining-induced) habit formation and DLS lesion or inactivation with attenuated habitual responding.7–14 The (posterior) dorsomedial striatum (DMS) is hypothesized to control goal-directed behavior and, thus, oppose habitual strategies.15,16 A third functional domain heavily investigated in the context of reward learning is the ventromedial striatum (VMS), which is involved in motivation and pre-programmed appetitive behaviors.17,18 The behavioral impact of these three regions is thought to evolve with the gradual development of a habit, reflecting an inter-regional shift of the locus of behavioral control: a dorsal shift away from VMS after early stages of instrumental learning9,17–20 and a relative transfer from DMS to DLS as behavior becomes more ingrained and habitual.7,9,13,16,21–27

These presumed intra-striatal shifts of the locus of behavioral control are thought to depend on striatal-dopaminergic neurotransmission, which is in line with the anatomy of the dopamine system28 and is corroborated by studies using dopaminergic drugs as reinforcers.29–31 Consistently, ample evidence implicates striatal dopamine in habit formation,6,9,23,24,32 and performance of ingrained action sequences is reflected in action-chunking activity patterns both in striatal and dopamine neurons.33 Furthermore, genetic deletion of dopamine-neuron NMDA receptors impairs habit learning,34 as does lesion of nigrostriatal dopamine projections,35 whereas pharmacological enhancement of dopamine transmission biases behavior toward habit formation.36 In humans, dopamine-depleted Parkinson’s patients exhibit deficits in habitual control.6,37,38 However, although dopamine is indubitably involved in habit formation, its regional specificity and its involvement in the widely assumed regional shift of behavioral control during the transition from goal-directed to habitual

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

behavior is understudied. Conclusions were advanced on the grounds of anatomical studies and sparse functional evidence: one study tracked regionally shifting striatal dopamine signaling across development of a presumed habit reinforced by cocaine,31 whereas another study reported no such shift using a natural reinforcer.39 Thus, the central questions that remain are how region-specific and regionally shifting dopamine signals regulate habit formation.

In experimental animals, habit formation is best achieved by either overtraining or employing variable-interval (VI) reinforcement schedules that lead to high response rates and reward-timing uncertainty.1–3,40–43 Dickinson and collaborators developed a widely applied operational definition that became the gold standard for habit classification:9,40,41,44 an insensitivity to (short-term) changes in value of action outcomes and/or action-outcome contingency.42 However, this definition identifies habits by the failure to satisfy criteria for goal-directed action (negative definition), including the possibility that unforeseen factors compromise integration of changes in action-outcome conditioning.45,46 Besides, traditionally used habit measures can only be applied once in a given individual and are not suited to detect individual differences, both of which are critical features, since habits develop slowly and recruitment of habitual strategies varies significantly between individuals. To address these limitations, we set out to develop a paradigm that “positively” identifies habits, accounts for individual differences, and may be repeated within a subject to track gradual habit development.

We delineate the differential contributions of relevant striatal subregions in habit formation by measuring real-time fluctuations in extracellular dopamine concentrations in VMS, DMS, and DLS of rats performing in an action-sequence reinforcement task (a “seeking-taking” chain task, which combines VI and fixed-ratio (FR) schedules and overtraining) and subsequently validating the role of region-specific dopamine signaling using optogenetic stimulation. Our findings challenge the traditionally assumed shifting of regional dopamine dynamics and surprisingly indicate that DMS, instead of DLS, dopamine determines the transition to habitual behavior, thereby questioning the idea that DMS opposes habit formation.

RESULTS

Habitual behavior is induced by excessive behavioral repetition (overtraining)9,11,24,47 or by reinforcing behavior on a VI schedule.3,48–50 Here, we overtrained animals under a chained VI60-FR1 reinforcement schedule, thereby combining the two most prominent paradigms. Rats learned to press a seeking lever (distal to reward) on a VI60 schedule (followed by a 3-s delay in which no levers were extended) and, subsequently, an FR1 taking lever (proximal to reward) that produced a food pellet (Figure 1A). The number of seeking presses under VI reinforcement increased over 10 weeks of training (week 1 versus week 10; t35 = 12.21, p < 0.0001; Figure 1B, left), whereas the number of food-magazine head entries did not (n = 36, Z = 0.440, p = 0.660; Figure 1B, middle). We assessed the probability with which rats checked the magazine for food during the 3-s delay period between retraction of the seeking lever and extension of the taking lever. This probability decreased dramatically across weeks (week 1 versus week 10: n = 36, Z = 5.185, p < 0.0001; Figure 1B, right), demonstrating that animals learned the task structure quickly.

We tested whether behavior persists despite reward devaluation by sensory-specific satiety via a 1-h pre-feeding of regular training pellets (devalued condition) or alternative pellets (valued condition) after 10 weeks of seeking-taking training (Figure 1C, left). Subsequent presentation of the seeking lever under extinction conditions generated no significant difference in the number of seeking-lever presses between valued and devalued conditions (n = 14, Z = 0.785, p = 0.433; Figure 1C, middle), demonstrating that seeking-taking training induced a habit. A “pellet choice test” verified that satiety was specific to the pre-fed pellet type, i.e., each pellet type was devalued effectively (n = 14, valued condition: Z = 3.296, p < 0.001; devalued condition: Z = 3.297, p < 0.001; Figure 1C, right).

We characterized dopamine release longitudinally across weeks, using fast-scan cyclic voltammetry in VMS, DMS, and DLS. Electrode implantation induced minimal brain damage, and placement was verified (Figure 1E). Dopamine release in response to unpredicted delivery of a food pellet was measured as a proxy for electrode sensitivity, which remained stable across weeks in all regions (VMS: t24 = 0.7162, p = 0.4808; DMS: t10 = 1.355, p = 0.2052; DLS: t15 = 1.046, p = 0.3121; Figures 1F and S1). Besides modestly diminished DMS dopamine release upon seeking-lever extension (distal; t10 = 2.267, p = 0.0468), no longitudinal differences were found (distal VMS: t24 = 1.186, p = 0.2473; distal DLS: t18 = 0.1541, p = 0.8793; proximal VMS: t24 = 1.798, p = 0.0848; proximal DMS: t10 = 1.041, p = 0.3223; proximal DLS: t18 = 0.3994, p = 0.6943; Figures 1G and 1H). Thus, regional dopamine release is relatively stable across training, with the only task-relevant changes occurring in DMS.

To overcome limitations of traditional habit testing, we developed a test during which both seeking and taking levers were extended simultaneously for 1 min under extinction conditions (Figure 2A, left), allowing rats the choice to engage with the lever proximal to reward delivery (taking; goal-directed choice) or the lever distal to reward delivery (seeking; habitual choice). In week 1, rats exhibited a preference for the taking over the seeking lever, but by week 10, this preference had shifted to the seeking lever (n = 36; week 1: Z = 3.540, p = 0.0004; week 10: Z = 4.357, p < 0.0001; Figure 2A, right). This temporal evolution of lever preference was highly reliable between different rat cohorts (Figure S2A). Extinction effects due to repeated testing are unlikely because (1) the time spent in regular training exceeds the time spent in preference testing almost 500-fold and (2) the frequency of preference tests did not influence the habit readout (Figures S2A and S2B).

To capitalize on individual differences, we classified animals (same data as in Figures 1C–1H) into two equal-sized groups based on their relative seeking-taking preference in week 10 (i.e., number of seeking minus number of taking presses, divided by total number of presses, as shown in Figure 2F), which is identical to a median split. There were no significant changes in number of head entries into the food magazine over time in either group (Figures 2B and 2C [top]). Both groups displayed a taking-lever preference in week 1 (habit: n = 18, Z = 1.810, p = 0.0370; no habit: n = 18, Z = 3.312, p = 0.0002; Figure 2C), but only the habitual animals developed a steadily increasing seeking-lever preference (habit group: lever, F1,44 = 58.17, p < 0.0001; lever x week


[Figure 1 graphic not reproduced. Panels: (A) trial timeline from distal to proximal to reward (seeking-lever extension, VI60 seeking, final press and seeking-lever retraction, 3-s delay period with no levers, taking-lever extension, taking-lever press and retraction, pellet delivery, ITI); (B) number of seeking presses per session, number of head entries per session, and head-entry probability before taking-lever extension, each plotted by week; (C) validation of habit induction by the seeking-taking paradigm at week 10 (1-h ad libitum access, 5-min seeking-lever extinction test, 5-min pellet choice test); (D) recording sites; (E) electrode-tip placement; (F) dopamine concentration (nM) to delivery of an unpredicted food pellet (control) in VMS, DMS, and DLS, week 1 versus week 10; (G) distal and (H) intermediate and proximal dopamine concentration (nM), week 1 versus week 10.]

Figure 1. A seeking-taking chain task to monitor dopamine signaling during habit formation
(A) A VI60:FR1 trial begins with seeking-lever extension. Seeking-lever presses (‘‘distal’’) are registered until the final VI60 press (‘‘intermediate’’), which triggers
seeking-lever retraction, followed by taking-lever extension 3 s later. Approximately 1 s later, animals press the taking lever (‘‘proximal’’), it is retracted, and the
food-pellet reward is delivered 3 s later, marking the start of the variable inter-trial interval (ITI).
(B) Number of seeking-lever presses (left) and food-magazine head entries (middle) during the VI60 segment. The probability of a food-magazine head entry
during the 3-s period following seeking-lever retraction and prior to taking-lever extension (right) decreases rapidly with training, demonstrating task acquisition.
Data are mean ± SEM (black) with individual animals (gray).
(C) Rats that were trained for 10 weeks underwent outcome devaluation via sensory-specific satiety (pre-feeding), in which, on two separate days, rats had 1-h ad
libitum access to either regular pellets (bottom) or alternative (grain-based) pellets (top) before they were exposed to the seeking-lever (extinction) test. Seeking
was unaffected by outcome devaluation, demonstrating that a habit was induced. Subsequently, rats were exposed to both pellets (‘‘choice test’’) to demonstrate
that outcome-specific devaluation was successful (non-pre-fed pellets preferred; p < 0.001).
(D) Coronal brain sections with recording sites in VMS (blue), DMS (green), and DLS (red), relative to bregma.51
(E) Representative verification of electrode-tip placement in VMS (blue circle), DMS (green circle), and DLS (red circle) and corresponding electrode tracks
(arrows).
(F) Dopamine release to an unpredicted food pellet demonstrates stable electrode sensitivity. Data are median with range in quartiles and mean (+); ns, not
significant.
(G) DMS dopamine following seeking-lever extension (distal) was modestly diminished after 10 weeks. No significant differences (ns) were observed in VMS or DLS.
(H) During proximal actions, no significant differences (ns) were detected in dopamine release between weeks 1 and 10. *p < 0.05, ***p < 0.001.
See also Figure S1.
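
The trial structure described in (A) can be laid out as a short timeline. The following is a minimal sketch, not the authors' task-control code: the function name, its inputs, and the fixed 1-s taking latency (the legend gives "approximately 1 s") are assumptions for illustration, and the interval for a given trial is passed in rather than sampled, because the sampling distribution of the VI60 schedule is not stated here. It relies only on the standard definition of a variable-interval schedule, in which the first press after the interval has elapsed is the effective one.

```python
def simulate_trial(press_times, interval_s, taking_latency_s=1.0):
    """Return (event, time_s) pairs for one VI60:FR1 trial.

    press_times: sorted seeking-press times, in seconds from lever extension.
    interval_s: the variable interval in force on this trial (hypothetical input).
    taking_latency_s: assumed latency to press the taking lever (about 1 s).
    """
    events = [("seeking_lever_extension", 0.0)]
    # The first press made once the interval has elapsed is the final VI press.
    final = next((t for t in press_times if t >= interval_s), None)
    if final is None:
        return events  # interval not yet satisfied; the trial is still open
    events += [("seeking_press", t) for t in press_times if t < final]
    events.append(("final_press_and_seeking_retraction", final))
    taking_out = final + 3.0  # 3-s delay period with no levers extended
    events.append(("taking_lever_extension", taking_out))
    taking_press = taking_out + taking_latency_s
    events.append(("taking_press_and_retraction", taking_press))
    events.append(("pellet_delivery", taking_press + 3.0))  # pellet 3 s later
    return events


# Hypothetical press times for a trial with a 60-s interval.
events = simulate_trial([5.0, 20.0, 62.0, 70.0], interval_s=60.0)
for name, t in events:
    print(f"{t:6.1f} s  {name}")
```

In this sketch, presses before the final press correspond to the "distal" segment, the final press to the "intermediate" event, and the taking press to the "proximal" event of the legend.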

interaction, F4,98 = 42.82, p < 0.0001; no-habit group: lever, F1,46 = 1.206, p = 0.2779; Figures 2B and 2C [middle]) and exhibited this preference in week 10 (habit: n = 18, Z = 3.724, p < 0.0001; no habit: n = 18, Z = 1.491, p = 0.1454; Figures 2B and 2C [bottom]). Habitual animals showed more seeking presses (U(n_habit = 18, n_no-habit = 18) = 11.500, Z = 4.765, p < 0.0001; Figure 2D) and fewer taking presses (U(n_habit = 18, n_no-habit = 18) = 1.500, Z = 5.098, p < 0.0001; Figure 2E) compared to non-habitual rats in week 10, but not week 1 (seeking: U(n_habit = 18, n_no-habit = 18) = 106.500, Z = 1.758, p = 0.0790; Figure 2D; taking: U(n_habit = 18, n_no-habit = 18) = 153.500, Z = 0.269, p = 1.0000; Figure 2E). Consistently, grouping the rats based on week-10 performance (Figure 2F) separated habit from no-habit animals in week 10 (U(n_habit = 18, n_no-habit = 18) = 0.000, Z = 5.144, p < 0.0001), but not week 1 (U(n_habit = 18, n_no-habit = 18) = 115.000, Z = 1.487, p = 0.1370). Across weeks 1 and 10, seeking-lever presses (n = 18, Z = 3.724, p = 0.0006; Figure 2D) and seeking-preference index (n = 18, Z = 3.724, p = 0.0006; Figure 2F) increased in habitual animals, whereas taking responses decreased (n = 18, Z = 3.725, p = 0.0006; Figure 2E). In non-habitual animals, seeking responses (n = 18, Z = 3.680, p = 0.0005; Figure 2D) and seeking-preference index (n = 18, Z = 3.636, p = 0.0005; Figure 2F) also increased, albeit to a lesser extent than in habitual rats, whereas taking responses did not change (n = 18, Z = 0.104, p = 0.9175; Figure 2E).

During lever-preference tests, habitual and non-habitual rats made more head entries into the food magazine after pressing the taking lever than the seeking lever (Figure 2G) in weeks 1 (habit group: t32 = 7.542, p < 0.0001; non-habit group:


[Figure 2 graphic not reproduced. Panels: (A) lever preference (LP) test, with simultaneous presentation of both levers for one minute under extinction conditions, and number of seeking- and taking-lever presses during the LP test by week; (B) habit group and (C) no-habit group: number of head entries and number of lever presses by week, and taking versus seeking presses in week 10; (D) number of seeking-lever presses and (E) number of taking-lever presses, week 1 versus week 10; (F) group classification by (# seeking − # taking) / # total; (G) number of rats by number of head entries following seeking and taking presses, weeks 1 and 10; (H) seeking-lever stickiness: probability to repeat a seeking press versus preference index (R value: 0.6797, p < 0.0001); (I) head entries into food magazine: probability of head entry versus preference index (R value: −0.4301, p = 0.0088).]

Figure 2. A novel preference test to measure the development of habits


(A) Habit formation was tested by presenting both levers simultaneously for 1 min under extinction conditions (left). In week 1 of VI60:FR1 seeking-taking training,
rats preferentially pressed the taking lever, but by week 10, preference had shifted to the seeking lever (right).
(B) In the habit group, the number of food-magazine head entries remained stable (top) across weeks, but rats switched from a taking- to a strong seeking-lever
preference (middle) and exhibited more seeking than taking presses in week 10 (bottom).
(C) Food-magazine head entries remained stable in the no-habit group (top), but rats switched from a taking-lever preference to no preference (middle) and
exhibited no significant difference between seeking and taking presses in week 10 (bottom).
(D and E) Habitual rats performed more seeking-lever presses (D) and fewer taking-lever presses (E) than non-habitual rats in week 10, but not in week 1.
(F) The seeking-taking preference index demonstrates a higher seeking-lever preference in habitual rats compared to non-habitual rats in week 10, but not in
week 1.
(G) The conditioning ‘‘logic’’ translated from training to preference test: food-magazine head entries were executed more frequently after taking-lever presses
than after seeking-lever presses, both in habitual and non-habitual rats. Histogrammed data with mean ± SEM shown separately.
(H and I) Linear regression analyses demonstrate associations of the seeking-lever index with other habit-like behavior during the preference test, i.e., (H) a
positive correlation with seeking-lever stickiness and (I) a negative correlation with food-magazine head-entry probability. Data are mean ± SEM, in some panels
combined with individual animals (open circles). *p < 0.05, ***p < 0.001.
See also Figures S2 and S3.
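
The preference index in (F) and the group classification it supports can be written out in a few lines. This is a minimal sketch under stated assumptions, not the authors' analysis code: the function names, the tie rule at the median, and the example counts are hypothetical, and the "stickiness" measure of (H) is approximated here as the fraction of seeking presses that are immediately followed by another seeking press, which may differ from the exact computation used in the study.

```python
from statistics import median


def preference_index(seeking, taking):
    """(# seeking - # taking) / # total presses; None when no presses were made."""
    total = seeking + taking
    if total == 0:
        return None  # the index is undefined without any presses
    return (seeking - taking) / total


def median_split(indices):
    """Label each rat 'habit' if its index lies above the group median."""
    cut = median(indices.values())
    # Ties at the median fall into 'no habit' here; that rule is an assumption.
    return {rat: "habit" if idx > cut else "no habit"
            for rat, idx in indices.items()}


def repeat_probability(presses, lever="S"):
    """Fraction of presses on `lever` that are followed by the same lever."""
    followers = [b for a, b in zip(presses, presses[1:]) if a == lever]
    if not followers:
        return None
    return sum(b == lever for b in followers) / len(followers)


# Hypothetical week-10 press counts (seeking, taking) for four rats.
counts = {"rat1": (90, 10), "rat2": (40, 60), "rat3": (55, 45), "rat4": (20, 80)}
indices = {rat: preference_index(s, t) for rat, (s, t) in counts.items()}
groups = median_split(indices)
print(groups)
```

The index runs from −1 (taking presses only) to +1 (seeking presses only), matching the axis range of panels F, H, and I.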

t34 = 8.488, p < 0.0001) and 10 (habit group: t27 = 2.510, p = 0.0184; non-habit group: t34 = 6.202, p < 0.0001). Thus, both groups learned the task structure and translated it from training sessions (levers presented sequentially) to the test situation (levers presented simultaneously). Habit and no-habit rats did not differ in number of head entries that followed seeking- or taking-lever presses in weeks 1 (seeking: t32 = 1.506, p = 0.4257; taking: t34 = 1.430, p = 0.3238; Figure 2G, left) or 10 (seeking: t34 = 0.5440, p = 0.5900; taking: t27 = 2.393, p = 0.0956; Figure 2G, right). Notably, repeated presses on the same lever were much more prevalent than seeking-taking action sequences in both habit and no-habit animals in both weeks 1 and 10 (Figures S3A–S3C). Thus, simultaneous lever presentation during the preference test is marked by rats “getting stuck” on one of the levers (no-habit rats on the taking lever [Figure S3B] and habit rats on the seeking lever [Figure S3C]) rather than performing action sequences involving both levers (Figure S3A).

We validated the preference test by carrying out regression analyses, testing for a relationship between the seeking-taking preference index (shown in Figure 2F) and other “habit-like” behaviors displayed during the test. The preference index correlated positively with the probability to repeat seeking-lever presses (R value = 0.6797, p < 0.0001; Figure 2H) and negatively with the probability to make food-magazine head entries (R value = −0.4301, p = 0.0088; Figure 2I).

Classifying rats as habitual or non-habitual based on week-10 preference-test performance revealed that the two groups performed differently during seeking-taking training (Figures 3A and S4): the frequency of “distal” seeking-lever presses across


[Figure 3 graphic not reproduced. Panels: (A) head-entry frequency outside VI, seeking-lever press frequency (VI60), head-entry frequency (VI60), and head-entry probability before taking-lever extension, per session by week, habit versus no habit; (B) dopamine concentration (nM) to delivery of an unpredicted food pellet (control) in VMS, DMS, and DLS, weeks 1 and 10; (C) distal and (D) intermediate and proximal dopamine concentration (nM) over time (s) in VMS, DMS, and DLS, weeks 1 and 10; (E) histograms of per-animal R values for dopamine and seeking presses, and percentage of rats with significant correlations per region. Panel E annotations, rats with p < 0.05: VMS, 11/25 (week 1) and 13/28 (week 10); DMS, 3/12 (week 1) and 4/13 (week 10); DLS, 2/22 (week 1) and 5/19 (week 10).]

Figure 3. Diametrically opposing DMS dopamine signals in habitual and non-habitual rats during distal and proximal reward seeking
(A) Habitual animals showed a higher frequency of distal seeking-lever presses during the VI60 trial segment compared to non-habitual rats (middle left). Conversely, habitual animals exhibited fewer food-magazine head entries during VI60 (middle right). However, this difference in head entries was not present outside VI60, when no levers were extended (left). Similarly, no group differences were observed for the head-entry probability during the 3-s period following seeking-lever retraction and prior to taking-lever extension (right). Data are mean ± SEM.
(B) Dopamine release to an unpredicted food-pellet delivery (proxy for electrode sensitivity) did not differ between groups. Data are median with range in quartiles and mean (+); ns, not significant.
(C) During distal actions (start of VI60), no significant dopamine group differences were observed in VMS or DLS. However, DMS dopamine was greater in habitual rats in both weeks 1 and 10. Data are mean + SEM; traces are aligned to seeking-lever extension.
(D) During proximal actions, no significant group differences in dopamine were observed in VMS or DLS. However, again, DMS dopamine differed between habitual and non-habitual rats, but in contrast to distal actions, proximal events significantly decreased dopamine in habitual animals in both weeks 1 and 10. Data are mean + SEM; gray-shaded areas indicate approximate final seeking-press timing.
(E) To evaluate the association of dopamine with reward seeking (distal actions: lever presses early in the VI60 segment), trial-by-trial correlations were calculated for week 1 (top) and week 10 (bottom). Gray bars display the animals’ R values distributed over given histogram brackets. Each circle represents a single animal (habitual in orange, non-habitual in brown), and significant correlations are depicted as filled circles and non-significant correlations as empty circles. VMS demonstrated the closest relationship between dopamine and seeking-lever presses. *p < 0.05, **p < 0.01, ***p < 0.001.
See also Figure S4.
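
The per-animal, trial-by-trial correlation summarized in panel E of Figure 3 can be illustrated as follows. This is a minimal sketch under stated assumptions: the type of correlation and the significance test used in the study are not given in this section, so a Pearson correlation is assumed and a permutation test is substituted here purely for illustration. All function names and the example data are hypothetical.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p value for the correlation between x and y."""
    rng = random.Random(seed)  # seeded so that the result is reproducible
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the trial-by-trial pairing
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical single-animal data: one value per trial.
dopamine_nM = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
seeking_presses = [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]

r = pearson_r(dopamine_nM, seeking_presses)
p = permutation_p(dopamine_nM, seeking_presses)
significant = p < 0.05  # would be drawn as a filled circle in panel E
```

Repeating this per animal and per region, and counting the animals with p < 0.05, would give tallies of the same form as the "significant / total rats" annotations in the panel.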


[Figure 4 graphic not reproduced. Panels: (A) optic fiber placement in DMS and virus injection in SNpc of a Th::Cre rat; (B) GFP, TH, and merged immunostaining in striatum and midbrain (scale: 60 μm); (C) real-time place preference: time spent in active quadrant (%), ChR2 versus EYFP; (D) intracranial self-stimulation: active and inactive nose pokes in 120 min by session day, ChR2 versus EYFP; (E) number of rats by number of head entries following seeking and taking presses, weeks 1 and 3; (F–H) head-entry frequency outside VI, seeking-lever press frequency (VI60), head-entry frequency (VI60), and head-entry probability before taking-lever extension, per session by week, ChR2 versus EYFP; (I) seeking- and taking-lever presses during the LP test by week for ChR2 and EYFP rats; (J) seeking presses and taking presses during the LP test by week, ChR2 versus EYFP.]

Figure 4. Optogenetic stimulation of DMS dopamine during seeking accelerates habit formation
(A) AAV injection into substantia-nigra pars-compacta (SNpc) of Th::Cre rats induced ChR2-EYFP or EYFP (control) expression in dopamine neurons. DMS
dopamine terminals were optogenetically stimulated bilaterally.
(B) Immunostaining for tyrosine hydroxylase (TH) and EYFP indicates selective virus expression in midbrain dopamine neurons (bottom) and their striatal terminals
(top). SNR, substantia nigra pars reticulata; VTA, ventral tegmental area.
(C) Percentage of time spent in the DMS-photo-stimulated (active) quadrant of an open-field arena. No significant difference (ns) was observed between ChR2
and EYFP rats.
(D) Nose pokes into the active port triggered DMS dopamine photo stimulation. We observed no significant differences (ns) within or between ChR2 and EYFP
animals in the number of active and inactive nose pokes.
(E) During seeking-taking training, seeking presses triggered DMS dopamine photo stimulation. The conditioning “logic” translated from training to preference test: food-magazine head entries were executed more frequently after taking-lever presses than after seeking-lever presses, both in habitual and non-habitual rats. Histogrammed data with mean ± SEM shown separately.
(F) No significant differences were found in frequency of food-magazine head entries outside VI60, when no levers were extended.
(G) ChR2 animals made more distal VI60 seeking-lever presses (left). No significant difference was found for food-magazine head entries during VI60 (right).
(H) Similarly, no group differences were observed for the head-entry probability during the 3-s period following seeking-lever retraction and prior to taking-lever extension (right).
(I) In week 1, both ChR2 and EYFP rats preferentially pressed the taking lever during the habit test. By week 3, this preference had shifted to the seeking lever for the ChR2 rats, but not the EYFP rats.
(J) The ChR2 group made significantly more seeking-lever responses, but groups did not differ in taking-lever presses. Data in (E) and (F) are mean ± SEM. *p < 0.05, **p < 0.01.

VI training weeks was higher in habitual rats (group, F5,34 = 18.48, p = 0.0001; group x week interaction, F5,170 = 3.154, p = 0.0095). Conversely, the frequency of distal food-magazine head entries during VI was lower in habitual rats (group, F1,32 = 10.66, p = 0.0026). In contrast, we observed no differences in head-entry frequency outside VI when levers were retracted (group, F1,32 = 0.0003168, p = 0.9859) or in “intermediate” head-entry probability during the 3-s period following seeking-lever retraction, prior to taking-lever extension (group, F1,34 = 1.463, p = 0.2348). Together, these data suggest that habit animals exhibit a greater affinity to the seeking lever (habit-like behavior) paired with less outcome checking (goal-directed behavior) and that group differences are specific to task-relevant actions and do not reflect discrepancies in general activity.

Dopamine release to an unpredicted food pellet (proxy for electrode sensitivity) did not differ significantly between habitual and non-habitual animals (Figure 3B) in VMS (week 1: t23 = 0.3175, p = 0.7537; week 10: t26 = 1.146, p = 0.2623), DMS (week 1: t10 = 0.7898, p = 0.4480; week 10: t11 = 0.5837, p = 0.5712), and DLS (week 1: t17 = 1.541, p = 0.1417; week 10: t18 = 0.4060, p = 0.6895). Dopamine measurements throughout the 10-week training (Figures 3C and 3D) did not reveal any differences between habitual and non-habitual rats during “distal,” “intermediate,” or “proximal” actions in VMS (distal week 1: t23 = 0.8278, p = 0.4163; distal week 10: t26 = 1.648, p = 0.2228; proximal week 1: t23 = 0.6261, p = 1.0000; proximal week 10: t26 = 0.3088, p = 0.7600) and DLS (distal week 1: t20 = 0.6727, p = 1.0000; distal week 10: t17 = 0.09775, p = 0.9233; proximal week 1: t20 = 0.6011, p = 0.5545; proximal week 10: t18 = 1.922, p = 0.1412). However, distal DMS dopamine concentration was higher in habitual animals in week 10 (t11 = 5.213, p = 0.0006), a surprising effect that was already detectable in week 1 (t10 = 2.438, p = 0.0350). Remarkably, this effect was reversed during intermediate and proximal actions: habitual animals had lower DMS dopamine in both week 1 (t10 = 5.428, p = 0.0006) and week 10 (t11 = 3.663, p = 0.0037). Within-animal trial-by-trial correlations (Figure 3E) revealed that reward seeking in itself, irrespective of training amount, was strongly associated with VMS, but not DMS or DLS dopamine; seeking-lever pressing in itself, therefore, does not bring about the changes in DMS dopamine we observe during habit development. In summary, habitual behavior was correlated with increased DMS dopamine release during distal behavioral responses, which was decreased during responses proximal to reward delivery. Moreover, DMS dopamine differences were already apparent early in training (before habits developed) and were not involved in (motor) performance of distal seeking.

To determine whether augmented DMS dopamine signaling is sufficient in itself to induce habitual behavior, we bilaterally optogenetically stimulated the terminals of DMS-projecting dopamine neurons during seeking presses (Figures 4A and 4B). We controlled for possible reinforcing effects by exposing rats to an open-field arena where presence of the animal in a specific quadrant triggered photo-stimulation (real-time place preference [RT-PP] assay) and observed no significant difference between ChR2 and control rats (expressing EYFP only) in time spent in the active quadrant (t15 = 1.745, p = 0.1014; Figure 4C). In comparison, other studies applying RT-PP via dopamine photo stimulation have reported animals spending about 80% of their time in the photo-stimulated position,52–56 whereas we observed only 45%. Similarly, when rats were given the opportunity to self-stimulate DMS dopamine terminals optogenetically by poking their noses into a device (over the course of 3 days), there was no significant difference in the number of active or inactive nose pokes between ChR2 and EYFP rats (active-port group, F1,9 = 4.476, p = 0.0635, time x group interaction: F2,18 = 2.566, p = 0.105; inactive-port group, F1,9 = 2.127, p = 0.1787, time x group interaction: F2,18 = 1.046, p = 0.372; Figure 4D), and there was no difference between active and inactive nose pokes within either of these groups (ChR2: poke-port, F1,12 = 4.514, p = 0.0551, time x poke-port interaction: F2,24 = 1.395, p = 0.267; EYFP: poke-port, F1,6 = 0.06194, p = 0.8118, time x poke-port interaction: F2,12 = 0.016, p = 0.984; Figure 4D). Furthermore, for ChR2 animals, no difference between active and inactive pokes was detected on day 3 (t9 = 1.576, p = 0.150). In comparison, overall, other studies applying optogenetic self-stimulation of dopamine have reported about 25–40 responses/min,52,55,57,58 whereas we observed only about 1 response/min on day 1 and 0.15/min on day 3. Taken together, stimulation of dopaminergic DMS terminals was without significant reinforcing effects.

During seeking-taking training, each (distal) seeking press triggered a 1 s DMS photo stimulation. The lever-preference test showed that both groups made more food-magazine head entries after taking compared to seeking responses (ChR2: lever, F1,18 = 22.49, p = 0.0002; EYFP: lever, F1,12 = 20.10, p = 0.0007; Figure 4E). Furthermore, we detected no differences over time (between weeks 1 and 3) between the number of ChR2 and EYFP rats that performed taking-lever-to-head-entry sequences (group, F1,15 = 0.003275, p = 0.9551) or seeking-lever-to-head-entry sequences (group, F1,15 = 0.1307, p = 0.7228), indicating that task-structure representation was stable across training sessions.

During seeking-taking training, ChR2 animals increased their seeking-lever presses during VI60 compared to EYFP controls (group, F1,15 = 18.27, p = 0.0007, group x week interaction, F4,60 = 15.20, p < 0.0001; Figure 4G, left). Post hoc analysis revealed that the number of seeking-lever presses was higher in weeks 1–3 (respectively: t15 = 4.546, p = 0.0012; t15 = 4.521, p = 0.0016; t15 = 4.654, p = 0.0015), indicating that optogenetic
DMS dopamine stimulation promoted habit formation. In contrast, no differences in frequency of magazine head entries were found between ChR2 and EYFP rats, either during VI (group, F1,15 = 0.3029, p = 0.5901; Figure 4G, right) or outside VI (group, F1,15 = 1.088, p = 0.3134; Figure 4F). Therefore, DMS dopamine did not simply invigorate general responding but specifically enhanced habitual responding. No group differences were found for head-entry probability during the 3-s period following seeking-lever retraction and prior to taking-lever extension (group, F1,15 = 0.4022, p = 0.5355; Figure 4H), indicating that photo stimulation did not affect task-structure comprehension. In comparison, optogenetic effects are not robust in the self-stimulation condition (Figure 4D) but are very robust when applied in conjunction with seeking-lever presses during seeking-taking training (Figure 4G, left). Thus, we conclude that optogenetic effects on seeking-lever preference cannot be explained by reinforcing properties of DMS photo stimulation alone.

DMS photo-stimulation effects also translated into behavior during the lever-preference test. ChR2 animals pressed the seeking lever more than the taking lever across 3 weeks of training (lever, F1,15 = 7.840, p = 0.0135; Figure 4I, left), whereas EYFP animals did not (lever, F1,12 = 0.6623, p = 0.4316; Figure 4I, right). When compared to taking-lever presses, post hoc analysis revealed that both groups made fewer seeking-lever presses in week 1 (EYFP: n = 7, Z = 2.366, p = 0.036; ChR2: n = 10, Z = 2.397, p = 0.017), but only ChR2 rats made significantly more seeking- than taking-lever presses in week 3 (EYFP: n = 7, Z = 1.609, p = 0.1077; ChR2: n = 10, Z = 2.805, p = 0.010; Figure 4I). The number of taking-lever presses did not differ between groups (group, F1,15 = 0.2355, p = 0.6345; Figure 4J, right), but ChR2 rats made more seeking-lever responses than EYFP animals (group, F1,15 = 7.840, p = 0.0135; Figure 4J, left), where post hoc analysis revealed that the number of seeking-lever presses was higher in week 3 (U(nChR2 = 10, nEYFP = 7) = 1.500, Z = 3.273, p = 0.001). This suggests that augmented DMS dopamine during training contributes to (1) an increased occurrence of seeking, but not taking, responses and (2) accelerated habit formation, as assessed in the lever-preference test.

DISCUSSION

We set out to delineate the role of the striatal dopamine system in habit formation (i.e., automation of reward seeking) and assessed dopamine release in three striatal regions implicated heavily in acquisition and execution of reward seeking: VMS, DMS, and DLS. A 10-week (over)training in a seeking-taking chain task resulted in habitual seeking, as evidenced by reward devaluation via sensory-specific satiety, traditionally employed to identify habits. The sequential nature of the task enabled us to dissect dopamine signaling during distal (seeking initiation) and proximal (seeking completion and reward taking) epochs. Surprisingly, dopamine release was stable between training weeks 1 and 10, the only exception being a modestly decreased distal DMS signal in week 10. To better interrogate the developing habit, we designed a novel lever-preference test. In week 1, animals displayed a taking-lever preference (goal-directed), but with continued training, rats began either to sample both levers equally (no-habit rats) or to develop a seeking-lever preference (habit rats). Habit rats displayed increased DMS dopamine during distal reward seeking and decreased DMS dopamine during actions more proximal to reward delivery (compared to no-habit rats), whereas VMS and DLS dopamine did not differ between groups. Although this DMS “habit signature” was not related to seeking actions themselves (these were more associated with VMS dopamine), we hypothesized it to be causally related to the gradually developing seeking-lever bias. Thus, we optogenetically stimulated DMS dopamine neuron terminals during seeking (during training), which resulted in a nearly instant selective promotion of seeking behavior during training, and accelerated the shift toward seeking-lever preference (in the test). Together, our findings demonstrate that dopamine release in DMS, but not VMS or DLS, predicts future reward-seeking strategies, where augmented distal DMS dopamine promotes the development of habitual reward-seeking (and more pronounced proximal DMS dopamine reflects the propensity toward a non-habitual strategy).

To study neural mechanisms of a gradually developing habit, we introduced a novel (lever-preference) test that, unlike traditionally employed indicators of habits (invoked via satiety-specific devaluation, contingency degradation, or taste aversion), was designed to allow for repeated testing and analysis of individual differences. We validated that seeking-taking training conditions translated to preference-test conditions by demonstrating that taking-lever presses are followed more often by reward-magazine entries than seeking-lever presses, indicating that rats expect reward after taking-lever, but not seeking-lever, presses, and thus learned and generalized the task structure. That training-induced seeking-lever preference is a valid indicator of habit formation was underlined by the fact that it correlated positively with seeking-lever stickiness (another habit-like feature) and negatively with food-magazine entries (presumably a goal-directed action). Our test enables positive identification of habitual performance by measuring an active-choice preference, as opposed to the “traditional” negative identification via exclusion of a goal-directed strategy. Notably, the test consistently and reliably exposed the gradual nature of the change in lever preference across training weeks in several cohorts of animals (Figure S2A). The slow speed at which the preference shift develops is another validating feature (as reported in other seeking-taking studies59,60), as habits in humans are thought to form slowly via extensive repetition.3,42,61,62 Moreover, the test demonstrated that within the same animals, via overtraining, habit-like performance can arise out of a behavior that was initially under goal-directed control;61,62 overtraining has been shown to effectively produce habits, both in single-action and action-sequence tasks, but aspects of action sequences (e.g., initiation or termination) may remain under goal-directed or mixed control (the latter may explain why no-habit rats do not exhibit a lever preference in week 10).1,63–65 Crucially, by chaining two operant behaviors, we temporally separated initiation (distal) and termination (proximal) of seeking (followed by reward consumption: taking), which allowed clear distinction between underlying neuronal mechanisms. Finally, our approach offers additional measures (reward-magazine head entries and different action sequences) that further qualify behavioral strategy (goal-directed or habitual). Taken together, we propose that “habits” are not a unitary concept and can be detected in different ways (see
STAR Methods for more discussion). Here, we introduce a valuable addition to traditionally employed habit paradigms.

Although dopamine neurons are well known to contribute to habit formation,6,9,10,23,32–36,66–68 little is known about the projection-specific activity of the dopamine system therein. Based on the presumed functions of VMS (motivation/Pavlovian behavior), DMS (goal-directed behavioral control), and DLS (sensorimotor/habit learning), we recorded dopamine transients in these striatal domains, hypothesizing that regional dopamine signals throughout habit development would align to these functions. Additionally, based on the generally presumed dopamine-dependent inter-regional shift in locus of behavioral control during habit development (see introduction), we expected task-relevant dopamine signals to shift from VMS to the dorsal striatum and/or from DMS to DLS. An alternative hypothesis suggests that the influence of dopamine subsides altogether with increasing training/automaticity.23,69,70 However, surprisingly, our results demonstrate that dopamine signaling (1) does not shift inter-regionally (neither across the animal population as a whole nor between individual habitual and non-habitual animals) and (2) does not decrease with increasing behavioral automaticity in any of the three regions. In fact, the stability of dopamine signals across 10 weeks of training is remarkable given dopamine’s role as a “learning/teaching” signal (the task is fully acquired within the first few weeks). Even more surprisingly, DMS, a region traditionally associated with the control of goal-directed reward seeking, exhibited the most critical features, where an increase in dopamine during reward-seeking initiation was predictive of future habit-like behavior. This habitual distal DMS dopamine surge was paralleled by a depression of proximal DMS dopamine. In contrast, DMS dopamine signals in non-habitual animals were not simply absent but opposite to those of habitual animals: they decreased distally and increased proximally. This result also underlined that DMS dopamine signaling was not compromised in non-habitual rats. Furthermore, although “action vigor” is a potential contributor to DMS dopamine signals, the number of seeking presses in training (i.e., a proxy for vigor) increases over time in both habit and no-habit animals (Figure 3A). Furthermore, DMS dopamine signals are opposite between these groups, which is the case from the beginning of training, and the signals do not correlate with seeking-press frequency (Figure 3E). Thus, it seems unlikely that vigor is itself a major contributor. Together, our data indicate that the temporal distribution (distal/proximal) of dopamine signaling in DMS, but not VMS or DLS, governs whether a behavior becomes habitual or not.

Since the DMS is traditionally not associated with habit formation directly,8,13,16,25,27,29 the question is what function dopamine released into the DMS fulfills during habit formation. From the beginning of training, DMS dopamine signals were present during seeking initiation in animals that displayed an affinity to the seeking lever, reflecting a predisposition toward habit formation, which manifested in the preference test only after weeks of training. This suggests that DMS dopamine signals may strengthen the seeking preference incrementally over many repetitions, which then translates into other situations (i.e., the lever-preference test) later. This is corroborated by our optogenetic-stimulation study: optogenetic DMS dopamine stimulation applied during training is reflected in a significant seeking-lever preference only after 2 weeks. Notably, repetition of seeking presses (overtraining) alone was not sufficient to establish a lever preference in itself, since non-habitual rats were also overtrained, suggesting that DMS dopamine is essential to habit formation.

Enhanced engagement with the seeking lever is reminiscent of “sign tracking,” but sign tracking is a Pavlovian trait that does not develop gradually and, thus, does not explain the gradual shift in lever preference across weeks. Furthermore, sign tracking is strongly linked to VMS dopamine. Consistently, VMS dopamine was correlated with seeking-press frequency, whereas DMS dopamine was less so. This suggests that DMS dopamine release does not directly induce seeking behavior (although it does exacerbate it, as evidenced by our optogenetic study), which is supported by the finding that optogenetic manipulation of DMS dopamine (outside of seeking-taking training) did not suffice to robustly reinforce behavior during intracranial self-stimulation and RT-PP.

An intriguing, yet speculative, idea is that in habit animals, the dopamine response to the earliest predictor of reward “propagated back in time” to the extension of the seeking lever (distal), whereas in no-habit animals, this dopamine response “remained” at seeking-lever retraction (a more conservative and precise predictor of reward [proximal]). This way, DMS dopamine stimulates development of the habit-like reward pursuit, since DMS dopamine increased in habit rats and decreased in no-habit rats at the time of seeking-lever presentation and coincided with seeking initiation without causing this behavior directly. In other words, DMS dopamine may be the effector that, during repeated performance, slowly strengthens the association between seeking stimulus/lever and response and thereby may reflect a proclivity to habit formation, consistent with our finding that such DMS dopamine was already enhanced prior to habit development.

Our results challenge the reigning hypotheses of regionally shifting dopamine involvement during habit formation, dopamine independence of habits, and exclusive DLS governance of habits and thereby break with the idea that the DMS simply counterbalances a DLS habit system. Notably, without our in-depth analysis of individual animal behavior across habit development, our dopamine data (averaged across the entire population) would have supported the idea of a weakened goal-directed DMS system that biases animals toward habitual performance and masked the intricate contribution of the DMS dopamine system. Instead, we were able to delineate two distinct rat populations and demonstrate that DMS-dopamine release during seeking initiation critically contributes to the formation of a habit. Intriguingly, DMS dopamine exhibited a double dissociation, where proximal events were accompanied by a signal opposite to distal events (in both habit and no-habit rats). In consideration of the fact that dopamine signaling did not vanish in, or shift between, any of the measured striatal regions, we suggest that habit formation is a decentralized, concerted effort of many collaborating striatal domains including the DMS.

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

• KEY RESOURCES TABLE
• RESOURCE AVAILABILITY
  ◦ Lead contact
  ◦ Materials availability
  ◦ Data and code availability
• EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ◦ Animals
• METHOD DETAILS
  ◦ Stereotactic surgery procedures
  ◦ Viral vectors and optogenetic stimulation
  ◦ Operant chambers
  ◦ Training on seeking-taking chain schedule of reinforcement
  ◦ Lever-preference test
  ◦ Experimental procedures
• QUANTIFICATION AND STATISTICAL ANALYSIS
  ◦ Data analysis
  ◦ Statistical analysis

SUPPLEMENTAL INFORMATION

Supplemental information can be found online at https://doi.org/10.1016/j.cub.2021.12.027.

ACKNOWLEDGMENTS

We thank Ralph Hamelink and Nicole Yee for technical support, Matthijs Feenstra for input on the manuscript, and Lucia Economico for illustrations. This research was funded by the following organizations: H2020 European Research Council (ERC) (ERC-2014-STG 638013 to I.W.), Netherlands Organisation for Scientific Research (NWO) (VIDI 864.14.010, 2015/06367/ALW to I.W.), and Netherlands Organisation for Scientific Research (NWO) (Gravitation program, BRAINSCAPES 024.004.012 to I.W.). The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

AUTHOR CONTRIBUTIONS

Conceptualization, W.v.E., J.G., T.A., and I.W.; resources, I.W.; data curation, W.v.E., P.W., and I.W.; formal analysis, W.v.E. and P.W.; investigation, W.v.E., J.M., W.B., R.J., D.S., and J.G.; visualization, W.v.E., and P.W.; supervision, I.W.; funding acquisition, I.W.; methodology, W.v.E. and I.W.; writing – original draft, W.v.E., T.A., and I.W.; writing – review & editing, W.v.E., D.D., T.A., and I.W.; project administration, I.W.

DECLARATION OF INTERESTS

The authors declare no competing interests.

INCLUSION AND DIVERSITY

One or more of the authors of this paper self-identifies as an underrepresented ethnic minority in science.

Received: July 31, 2021
Revised: November 3, 2021
Accepted: December 9, 2021
Published: February 7, 2022

REFERENCES

1. Adams, C.D. (1982). Variations in the sensitivity of instrumental responding to reinforcer devaluation. Q. J. Exp. Psychol. B. 34, 77–98.
2. Dickinson, A., Nicholas, D.J., and Adams, C.D. (1983). The effect of the instrumental training contingency on susceptibility to reinforcer devaluation. Q. J. Exp. Psychol. B. 35, 35–51.
3. Dickinson, A. (1985). Actions and habits: the development of behavioural autonomy. Philos. Trans. R. Soc. Lond. B. 308, 67–78.
4. Packard, M.G., Hirsh, R., and White, N.M. (1989). Differential effects of fornix and caudate nucleus lesions on two radial maze tasks: evidence for multiple memory systems. J. Neurosci. 9, 1465–1472.
5. Packard, M.G., and McGaugh, J.L. (1992). Double dissociation of fornix and caudate nucleus lesions on acquisition of two water maze tasks: further evidence for multiple memory systems. Behav. Neurosci. 106, 439–446.
6. Knowlton, B.J., Mangels, J.A., and Squire, L.R. (1996). A neostriatal habit learning system in humans. Science 273, 1399–1402.
7. Yin, H.H., Knowlton, B.J., and Balleine, B.W. (2004). Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur. J. Neurosci. 19, 181–189.
8. Yin, H.H., Knowlton, B.J., and Balleine, B.W. (2006). Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning. Behav. Brain Res. 166, 189–196.
9. Graybiel, A.M. (2008). Habits, rituals, and the evaluative brain. Annu. Rev. Neurosci. 31, 359–387.
10. Amaya, K.A., and Smith, K.S. (2018). Neurobiology of habit formation. Curr. Opin. Behav. Sci. 20, 145–152.
11. Jog, M.S., Kubota, Y., Connolly, C.I., Hillegaart, V., and Graybiel, A.M. (1999). Building neural representations of habits. Science 286, 1745–1749.
12. Barnes, T.D., Kubota, Y., Hu, D., Jin, D.Z., and Graybiel, A.M. (2005). Activity of striatal neurons reflects dynamic encoding and recoding of procedural memories. Nature 437, 1158–1161.
13. Thorn, C.A., Atallah, H., Howe, M., and Graybiel, A.M. (2010). Differential dynamics of activity changes in dorsolateral and dorsomedial striatal loops during learning. Neuron 66, 781–795.
14. Smith, K.S., and Graybiel, A.M. (2013). A dual operator view of habitual behavior reflecting cortical and striatal dynamics. Neuron 79, 361–374.
15. Yin, H.H., and Knowlton, B.J. (2006). The role of the basal ganglia in habit formation. Nat. Rev. Neurosci. 7, 464–476.
16. Yin, H.H., Knowlton, B.J., and Balleine, B.W. (2005). Blockade of NMDA receptors in the dorsomedial striatum prevents action-outcome learning in instrumental conditioning. Eur. J. Neurosci. 22, 505–512.
17. Hernandez, P.J., Sadeghian, K., and Kelley, A.E. (2002). Early consolidation of instrumental learning requires protein synthesis in the nucleus accumbens. Nat. Neurosci. 5, 1327–1331.
18. Smith-Roe, S.L., and Kelley, A.E. (2000). Coincident activation of NMDA and dopamine D1 receptors within the nucleus accumbens core is required for appetitive instrumental learning. J. Neurosci. 20, 7737–7742.
19. Atallah, H.E., Lopez-Paniagua, D., Rudy, J.W., and O’Reilly, R.C. (2007). Separate neural substrates for skill learning and performance in the ventral and dorsal striatum. Nat. Neurosci. 10, 126–131.
20. Hernandez, P.J., Schiltz, C.A., and Kelley, A.E. (2006). Dynamic shifts in corticostriatal expression patterns of the immediate early genes Homer 1a and Zif268 during early and late phases of instrumental training. Learn. Mem. 13, 599–608.
21. Everitt, B.J., and Robbins, T.W. (2005). Neural systems of reinforcement for drug addiction: from actions to habits to compulsion. Nat. Neurosci. 8, 1481–1489.
22. Everitt, B.J., and Robbins, T.W. (2013). From the ventral to the dorsal striatum: devolving views of their roles in drug addiction. Neurosci. Biobehav. Rev. 37 (9 Pt A), 1946–1954.
23. Everitt, B.J., and Robbins, T.W. (2016). Drug addiction: updating actions to habits to compulsions ten years on. Annu. Rev. Psychol. 67, 23–50.
24. Lipton, D.M., Gonzales, B.J., and Citri, A. (2019). Dorsal striatal circuits for habits, compulsions and addictions. Front. Syst. Neurosci. 13, 28.
25. Yin, H.H., Ostlund, S.B., Knowlton, B.J., and Balleine, B.W. (2005). The role of the dorsomedial striatum in instrumental conditioning. Eur. J. Neurosci. 22, 513–523.
26. Lerner, T.N. (2020). Interfacing behavioral and neural circuit models for habit formation. J. Neurosci. Res. 98, 1031–1045.
27. Yin, H.H., Mulcare, S.P., Hilário, M.R.F., Clouse, E., Holloway, T., Davis, M.I., Hansson, A.C., Lovinger, D.M., and Costa, R.M. (2009). Dynamic reorganization of striatal circuits during the acquisition and consolidation of a skill. Nat. Neurosci. 12, 333–341.
28. Haber, S.N., Fudge, J.L., and McFarland, N.R. (2000). Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. J. Neurosci. 20, 2369–2382.
29. Murray, J.E., Belin, D., and Everitt, B.J. (2012). Double dissociation of the dorsomedial and dorsolateral striatal control over the acquisition and performance of cocaine seeking. Neuropsychopharmacology 37, 2456–2466.
30. Belin, D., and Everitt, B.J. (2008). Cocaine seeking habits depend upon dopamine-dependent serial connectivity linking the ventral with the dorsal striatum. Neuron 57, 432–441.
31. Willuhn, I., Burgeno, L.M., Everitt, B.J., and Phillips, P.E.M. (2012). Hierarchical recruitment of phasic dopamine signaling in the striatum during the progression of cocaine use. Proc. Natl. Acad. Sci. USA 109, 20703–20708.
32. Wickens, J.R., Horvitz, J.C., Costa, R.M., and Killcross, S. (2007). Dopaminergic mechanisms in actions and habits. J. Neurosci. 27, 8181–8183.
33. Jin, X., and Costa, R.M. (2010). Start/stop signals emerge in nigrostriatal circuits during sequence learning. Nature 466, 457–462.
34. Wang, L.P., Li, F., Wang, D., Xie, K., Wang, D., Shen, X., and Tsien, J.Z. (2011). NMDA receptors in dopaminergic neurons are crucial for habit learning. Neuron 72, 1055–1066.
35. Faure, A., Haberland, U., Condé, F., and El Massioui, N. (2005). Lesion to the nigrostriatal dopamine system disrupts stimulus-response habit formation. J. Neurosci. 25, 2771–2780.
36. Nelson, A., and Killcross, S. (2006). Amphetamine exposure enhances habit formation. J. Neurosci. 26, 3805–3812.
37. Bannard, C., Leriche, M., Bandmann, O., Brown, C.H., Ferracane, E., Sánchez-Ferro, Á., Obeso, J., Redgrave, P., and Stafford, T. (2019). Reduced habit-driven errors in Parkinson’s Disease. Sci. Rep. 9, 3423.
38. Witt, K., Nuhsman, A., and Deuschl, G. (2002). Dissociation of habit-learning in Parkinson’s and cerebellar disease. J. Cogn. Neurosci. 14, 493–499.
49. Rossi, M.A., and Yin, H.H. (2012). Methods for studying habitual behavior in mice. Curr. Protoc. Neurosci. 8, 29.
50. Robbins, T.W., and Costa, R.M. (2017). Habits. Curr. Biol. 27, R1200–R1206.
51. Paxinos, G., and Watson, C. (1998). The Rat Brain in Stereotaxic Coordinates (Academic Press).
52. Nieh, E.H., Vander Weele, C.M., Matthews, G.A., Presbrey, K.N., Wichmann, R., Leppla, C.A., Izadmehr, E.M., and Tye, K.M. (2016). Inhibitory input from the lateral hypothalamus to the ventral tegmental area disinhibits dopamine neurons and promotes behavioral activation. Neuron 90, 1286–1298.
53. Thomas, C.S., Mohammadkhani, A., Rana, M., Qiao, M., Baimel, C., and Borgland, S.L. (2022). Optogenetic stimulation of lateral hypothalamic orexin/dynorphin inputs in the ventral tegmental area potentiates mesolimbic dopamine neurotransmission and promotes reward-seeking behaviours. Neuropsychopharmacol. 47, 728–740.
54. Zhang, Z., Liu, Q., Wen, P., Zhang, J., Rao, X., Zhou, Z., Zhang, H., He, X., Li, J., Zhou, Z., et al. (2017). Activation of the dopaminergic pathway from VTA to the medial olfactory tubercle generates odor-preference and reward. eLife 6, e25423.
55. Galaj, E., Han, X., Shen, H., Jordan, C.J., He, Y., Humburg, B., Bi, G.-H., and Xi, Z.-X. (2020). Dissecting the role of GABA neurons in the VTA versus SNr in opioid reward. J. Neurosci. 40, 8853–8869.
56. Coimbra, B., Domingues, A.V., Soares-Cunha, C., Correia, R., Pinto, L., Sousa, N., and Rodrigues, A.J. (2021). Laterodorsal tegmentum-ventral tegmental area projections encode positive reinforcement signals. J. Neurosci. Res. 99, 3084–3100.
57. Ilango, A., Kesner, A.J., Keller, K.L., Stuber, G.D., Bonci, A., and Ikemoto, S. (2014). Similar roles of substantia nigra and ventral tegmental dopamine neurons in reward and aversion. J. Neurosci. 34, 817–822.
58. Saunders, B.T., Richard, J.M., Margolis, E.B., and Janak, P.H. (2018). Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat. Neurosci. 21, 1072–1083.
59. Giuliano, C., Puaud, M., Cardinal, R.N., Belin, D., and Everitt, B.J. (2021). Individual differences in the engagement of habitual control over alcohol seeking predict the development of compulsive alcohol seeking and drinking. Addict. Biol. 26, e13041.
60. Zapata, A., Minney, V.L., and Shippenberg, T.S. (2010). Shift from goal-
directed to habitual cocaine seeking after prolonged experience in rats.
39. Seiler, J.L., Cosme, C.V., Sherathiya, V.N., Bianco, J.M., and Lerner, T.N. J. Neurosci. 30, 15457–15463.
(2022). Dopamine signaling in the dorsomedial striatum promotes compul-
61. Wood, W., and Neal, D.T. (2007). A new look at habits and the habit-goal
sive behavior. Curr. Biol. Published online February 7, 2022. https://doi.
interface. Psychol. Rev. 114, 843–863.
org/10.1016/j.cub.2022.01.055.
62. Miller, K.J., Shenhav, A., and Ludvig, E.A. (2019). Habits without values.
40. Adams, C.D., and Dickinson, A. (1981). Instrumental responding following
Psychol. Rev. 126, 292–311.
reinforcer devaluation. The Quarterly Journal of Experimental Psychology
Section B 33, 109–121. 63. Dezfouli, A., and Balleine, B.W. (2013). Actions, action sequences and
habits: evidence that goal-directed and habitual action control are hierar-
41. Colwill, R.M., and Rescorla, R.A. (1985). Postconditioning devaluation of a
chically organized. PLoS Comput. Biol. 9, e1003364.
reinforcer affects instrumental responding. J. Exp. Psychol. Anim. Behav.
64. Dezfouli, A., Lingawi, N.W., and Balleine, B.W. (2014). Habits as action se-
Process. 11, 120–132.
quences: hierarchical action control and changes in outcome value.
42. Wood, W., and Rünger, D. (2016). Psychology of habit. Annu. Rev.
Philos. Trans. R. Soc. Lond. B Biol. Sci. 369, 20130482.
Psychol. 67, 289–314.
65. Garr, E., and Delamater, A.R. (2019). Exploring the relationship between
43. Hilário, M.R.F., and Costa, R.M. (2008). High on habits. Front. Neurosci. 2, actions, habits, and automaticity in an action sequence task. Learn.
208–217. Mem. 26, 128–132.
44. Balleine, B.W., and Dickinson, A. (1998). Goal-directed instrumental ac- 66. Schoenbaum, G., and Setlow, B. (2005). Cocaine makes actions insensi-
tion: contingency and incentive learning and their cortical substrates. tive to outcomes but not extinction: implications for altered orbitofrontal-
Neuropharmacology 37, 407–419. amygdalar function. Cereb. Cortex 15, 1162–1169.
45. Balleine, B.W., and Dezfouli, A. (2019). Hierarchical action control: adap- 67. Nordquist, R.E., Voorn, P., de Mooij-van Malsen, J.G., Joosten, R.N.J.M.A.,
tive collaboration between actions and habits. Front. Psychol. 10, 2735. Pennartz, C.M.A., and Vanderschuren, L.J.M.J. (2007). Augmented rein-
46. Buabang, E.K., Boddez, Y., De Houwer, J., and Moors, A. (2021). Don’t forcer value and accelerated habit formation after repeated amphetamine
make a habit out of it: Impaired learning conditions can make goal- treatment. Eur. Neuropsychopharmacol. 17, 532–540.
directed behavior seem habitual. Motiv. Sci. 7, 252–263. 68. Eyny, Y.S., and Horvitz, J.C. (2003). Opposing roles of D1 and D2 recep-
47. Smith, K.S., and Graybiel, A.M. (2014). Investigating habits: strategies, tors in appetitive conditioning. J. Neurosci. 23, 1584–1587.
technologies and models. Front. Behav. Neurosci. 8, 39. 69. Ljungberg, T., Apicella, P., and Schultz, W. (1992). Responses of monkey
48. Hilário, M.R.F. (2007). Endocannabinoid signaling is critical for habit for- dopamine neurons during learning of behavioral reactions.
mation. Front. Integr. Neurosci. 1, 6. J. Neurophysiol. 67, 145–163.

70. Choi, W.Y., Balsam, P.D., and Horvitz, J.C. (2005). Extended habit training reduces dopamine mediation of appetitive response expression. J. Neurosci. 25, 6729–6733.
71. Witten, I.B., Steinberg, E.E., Lee, S.Y., Davidson, T.J., Zalocusky, K.A., Brodsky, M., Yizhar, O., Cho, S.L., Gong, S., Ramakrishnan, C., et al. (2011). Recombinase-driver rat lines: tools, techniques, and optogenetic application to dopamine-mediated reinforcement. Neuron 72, 721–733.
72. Willuhn, I., Burgeno, L.M., Groblewski, P.A., and Phillips, P.E.M. (2014). Excessive cocaine use results from decreased phasic dopamine signaling in the striatum. Nat. Neurosci. 17, 704–709.
73. Clark, J.J., Sandberg, S.G., Wanat, M.J., Gan, J.O., Horne, E.A., Hart, A.S., Akers, C.A., Parker, J.G., Willuhn, I., Martinez, V., et al. (2010). Chronic microsensors for longitudinal, subsecond dopamine detection in behaving animals. Nat. Methods 7, 126–129.
74. Olmstead, M.C., Parkinson, J.A., Miles, F.J., Everitt, B.J., and Dickinson, A. (2000). Cocaine-seeking by rats: regulation, reinforcement and activation. Psychopharmacology (Berl.) 152, 123–131.
75. Balleine, B.W., Garner, C., Gonzalez, F., and Dickinson, A. (1995). Motivational control of heterogeneous instrumental chains. J. Exp. Psychol. Anim. Behav. Process. 21, 203–217.
76. Phillips, P.E.M., and Wightman, R.M. (2003). Critical guidelines for validation of the selectivity of in-vivo chemical microsensors. Trends in Analytical Chemistry 22, 509–514.


STAR+METHODS

KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Antibodies
Rabbit anti-GFP (primary antibodies) | Thermo Fisher Scientific | Cat#A-6455; RRID: AB_221570
Chicken anti-TH (primary antibodies) | Aves Labs | RRID: AB_10013440
Donkey anti-rabbit Alexa 488 | Jackson Immunoresearch | RRID: AB_2340619; 711-546-152
Donkey anti-chicken Alexa 647 | Jackson Immunoresearch | RRID: AB_2340380; 703-606-155

Bacterial and virus strains
AAV5.EF1a-DIO-ChR2-EYFP | Viral Vector Facility, University of Zurich | N/A
AAV5.EF1a-DIO-EYFP | Viral Vector Facility, University of Zurich | N/A

Chemicals, peptides, and recombinant proteins
Phosphate-buffered saline | Thermo Fisher Scientific | Cat#18912014
Normal donkey serum | Jackson ImmunoResearch | RRID: AB_2337258
Bovine serum albumin | Sigma-Aldrich | Cat#A9647-1kg
Sodium azide | Sigma-Aldrich | Cat#S2002-100 g
Triton X-100 | Sigma-Aldrich | Cat#X-100-500ml
4% Paraformaldehyde (37% formalin) | VWR | Cat#97064-606

Deposited data
Code | This paper | https://osf.io/j3zaq/

Experimental models: Organisms/strains
Long-Evans rats | Janvier Labs, France | RjOrl:LE
TH-Cre rats on Long-Evans background | Rat Resource & Research Center | LE-Tg(TH-Cre)3.1Deis / RRID:RRRC_00659; https://www.rrrc.us/

Software and algorithms
MATLAB version R2018b | MathWorks | https://www.mathworks.com/products/new_products/release2018b.html
IBM SPSS statistics 25 | IBM | https://www.ibm.com/nl-en/products/spss-statistics
Prism version 9.2.0 | GraphPad | https://www.graphpad.com/scientific-software/prism/
LabVIEW version 2014 | National Instruments | https://www.ni.com/nl-nl/shop/labview.html
Med-PC version IV | Med-associates | https://www.med-associates.com/med-pc-v/
QuPath version 0.2.3 | Open source | https://qupath.github.io/

Other
45mg Dustless Precision Pellets Rodent, Purified | Bio-Serv | Cat#: F0021
45mg Dustless Precision Pellets Rodent, Grain-Based | Bio-Serv | Cat#: F0165

RESOURCE AVAILABILITY

Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Ingo Willuhn (i.willuhn@nin.knaw.nl).

Materials availability
This study did not generate new unique reagents.

Data and code availability

• All data reported in this paper will be shared by the lead contact upon request.
• The code used for this study is available at https://osf.io/j3zaq/ and, if necessary, more detailed information is available from the corresponding author upon request.
• Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Animals
Adult male Long-Evans rats from Janvier Labs were used for voltammetry and behavior-only studies. For the optogenetic study, adult
male TH-Cre rats (on Long-Evans background71) were used. Rats (320-410 g) were housed individually and kept on a reversed 12 h
light/dark cycle (lights off from 08:00 to 20:00) with controlled temperature and humidity. Rats were food-restricted to 85% of their
free-feeding body weight, and water was provided ad libitum. Regular laboratory chow was available in the home cage two h after the
end of daily behavioral training (executed during the dark phase), supplementing food intake during training. All animal procedures
were conducted in accordance with Dutch and European law and approved by the Animal Experimentation Committee of the Royal
Netherlands Academy of Arts and Sciences. For dopamine recordings, 35 animals had at least one functional and histologically verified recording electrode. Sixteen rats were used to verify, by traditional satiety-specific outcome devaluation, whether reward-driven behavior was habitual after 10 weeks of training; two of these rats were excluded from analysis because outcome devaluation failed. Seventeen rats with histologically verified virus expression in both hemispheres and bilateral optical-fiber placement in DMS were used for optogenetic stimulation of dopaminergic neuron terminals in DMS.

METHOD DETAILS

Stereotactic surgery procedures


Stereotactic surgery was executed as described previously.31,72 Before surgery, all surgical equipment was sterilized and the immediate surroundings were wiped with 70% ethanol. Rats were anesthetized with 1%–3% isoflurane and placed in a stereotactic frame. Heart rate and breathing frequency were monitored to assess surgical anesthetic depth. Body temperature was monitored and maintained with a heating pad. Following subcutaneous injection with the analgesic Metacam (0.2 mg meloxicam/100 g), the scalp was shaved, disinfected with 70% alcohol, and incised to expose the cranium. The site of incision was treated with lidocaine (100 mg/mL). For the voltammetric experiment, holes were drilled for three anchor surgical screws, a Ag/AgCl reference electrode, and two custom-made carbon-fiber micro-electrodes73 unilaterally targeting two of three regions: VMS (1.2 mm rostral, 1.5 mm lateral, and 7.1 mm ventral from Bregma51), DMS (−0.2 mm rostral, 2.5 mm lateral, and 4.5 mm ventral), and DLS (1.2 mm rostral, 3.6 mm lateral, and 4.5 mm ventral). Electrodes were secured to the skull with dental acrylic cement anchored to the surgical screws. For the optogenetic experiment, holes were drilled for three anchor surgical screws, two optic fibers targeting the DMS bilaterally (−0.2 mm rostral, +/−2.7 mm lateral, and 3.7 mm ventral), and four virus injections in the ventral midbrain across both hemispheres, targeting the substantia nigra pars compacta (−5.2 mm rostral, +/−2.4 mm lateral, and 7.8 mm ventral, and −6.0 mm rostral, +/−2.4 mm lateral, and 7.6 mm ventral). After slowly lowering the microinjector into position, virus was infused at a rate of 0.1 µl/min using a microsyringe pump. After infusion, injectors were left in place for 10 min and then removed slowly. Optic fibers (diameter of 230 µm, custom-made) targeted at the DMS (the dopamine-neuron projection target of interest) were secured to the skull with dental acrylic cement anchored to the surgical screws.
Following surgery, rats received 0.5 ml of saline s.c. for rehydration and were placed in an incubator for at least 1 h. To monitor any possibility of discomfort or pain and to make sure that the animals properly recovered, we monitored their overall appearance, weight, behavior, and the state of the incision (wound healing) daily for 3 days post-surgery. Surgery and anesthesia duration was shorter than 2 h.

Viral vectors and optogenetic stimulation


For optogenetic stimulation, AAV5.Ef1a-DIO-ChR2-EYFP (0.9 µl/injection site, for a total of 3.6 µl/rat, titer 3 × 10¹² particles/mL) was injected into the ventral midbrain for Cre-dependent expression of channelrhodopsin in dopamine neurons of TH-Cre rats. AAV5.Ef1a-DIO-EYFP (0.9 µl/injection site, for a total of 3.6 µl/rat, titer 3 × 10¹² particles/mL) was used as a Cre-dependent fluorophore control. For optogenetic stimulation, we used laser light with a wavelength of 470 nm, a pulse width of 10 ms, a frequency of 20 Hz, and an intensity of 20 mW at the tip of the optic-fiber patch cable (constant illumination).
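
As a rough illustration of these stimulation parameters, the following Python sketch lists the pulse onsets and offsets for one train at 20 Hz with a 10 ms pulse width. The 1 s train duration is taken from the description of Experiment 3; the function name and return format are hypothetical and are not part of the authors' acquisition code.

```python
def pulse_train(freq_hz=20.0, pulse_width_s=0.010, train_s=1.0):
    """Return (onset, offset) times in seconds for each light pulse in one train.

    Illustrative only: defaults follow the 20 Hz / 10 ms / 1 s values in the text.
    """
    n_pulses = int(round(freq_hz * train_s))
    period_s = 1.0 / freq_hz
    return [(i * period_s, i * period_s + pulse_width_s) for i in range(n_pulses)]


pulses = pulse_train()
# Fraction of the 1 s train during which the laser is on
duty_cycle = sum(off - on for on, off in pulses) / 1.0
print(len(pulses), round(duty_cycle, 3))
```

With the default values this yields 20 pulses per train, matching the "20x10ms light pulses" stated for the behavioral experiments, and a duty cycle of about 20%.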

Operant chambers
All behavioral experiments were conducted in modified modular operant chambers (32x30x29cm, Med Associates Inc., VT, USA),
equipped with a food magazine with an integrated light (connected to an automated food-pellet dispenser) flanked by two retractable
operant levers, a house light, multiple tone generators, a white-noise generator, and metal grid floors. The wall opposite the food
magazine was outfitted with two nose-poke response devices (port with integrated cue lights) located on adjacent panels. Each operant box was surveilled by a video camera. The boxes were housed in metal Faraday cages that were insulated with sound-absorbing polyurethane foam and ventilated by a fan.


Training on seeking-taking chain schedule of reinforcement


The seeking-taking chain schedule of reinforcement was adapted from Olmstead and colleagues74 and Balleine and colleagues.75
After rats were placed in the operant chamber, the start of each daily behavioral-training session was marked by illumination of the
house light and the sound of the white noise, which both remained on for the duration of the session. During each trial, rats could earn
one “purified” food pellet (Bio-Serv Inc., USA), delivered into the tray of the food magazine, accompanied by a 2-s illumination of the
magazine light. Behavioral training commenced when rats reached 85% of free-feeding body weight. On the first day, animals were
habituated to the operant chamber and 30 food pellets were delivered on a variable-interval (VI) schedule of 90 s. On the second day,
rats learned to press the taking lever, the proximal link of the chain (i.e., proximal to the food reward), on a fixed ratio 1 (FR1) schedule
for an immediate delivery of a pellet, with a maximum of 40 pellets. The following three days, taking-lever presses triggered pellet
delivery (FR1) plus lever retraction, and an inter-trial interval (ITI) of 25 s was imposed, followed by re-extension of the lever into
the operant box. In the next session, rats learned to press the seeking lever, the distal link of the chain (i.e., distal to the food reward),
in order to gain access to the taking lever. Following its extension into the box (marking the start of each trial), one press on the
seeking lever (FR1) triggered retraction of this lever, and after a delay of 3 s, the taking lever was extended. One press on the taking
lever (FR1) triggered retraction of the taking lever and delivery of a food pellet 3 s thereafter (FR1:FR1 schedule). During the variable ITI
of 25 s, both levers remained retracted. In the following sessions, rats were trained on increasing VIs (ranging from 2 s to eventually
60 s) for the seeking lever: VI2:FR1, VI7:FR1, VI15:FR1, VI30:FR1, VI60:FR1. Each VI started with the rat’s first response on the
seeking lever, once extended. Subsequent lever presses went without programmed consequences, until the first press after the
VI ended, which triggered seeking-lever retraction and taking-lever extension 3 s thereafter. Figure 2A depicts a schematic overview
of a VI60:FR1 trial. Sessions ended once rats earned 40 rewards. Rats were trained daily on the VI60:FR1 for 10 weeks. Lever position
(seeking/taking) was counterbalanced between rats.
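
The seeking-link contingency described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the interval length is passed in directly because the text does not specify how individual intervals were drawn, a press exactly at the end of the interval is treated as effective, and the function name and return format are hypothetical.

```python
def seeking_link_outcome(press_times_s, interval_s, delay_s=3.0):
    """Find the seeking-lever press that ends the seeking link of one trial.

    press_times_s: sorted seeking-lever press times (s since lever extension).
    interval_s:    length of this trial's variable interval (assumed given).
    delay_s:       delay between seeking-lever retraction and taking-lever extension.

    Returns (index of the effective press, taking-lever extension time),
    or None if no press occurs once the interval has elapsed.
    """
    if not press_times_s:
        return None
    # The interval timer starts with the first response on the seeking lever
    interval_end = press_times_s[0] + interval_s
    for i, t in enumerate(press_times_s):
        # Presses before the interval ends have no programmed consequence
        if t >= interval_end:
            return i, t + delay_s
    return None


# Example VI60 trial: first press at 2 s, so the interval ends at 62 s
print(seeking_link_outcome([2.0, 10.0, 40.0, 65.0, 70.0], 60.0))  # (3, 68.0)
```

Setting `interval_s` to 0 reduces the same function to the FR1 case, in which the first press itself retracts the seeking lever.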

Lever-preference test
Behavioral automaticity was probed regularly (no more than once per week) immediately preceding the daily training session, using a
novel “lever-preference” test: after a variable interval of 25 s with only house light and white noise turned on, animals were exposed
to both seeking and taking levers simultaneously for one minute under extinction conditions (recording lever presses and head entries
into the food magazine without programmed consequences). During this minute, a preference for the taking lever (i.e., more presses
on the taking than seeking lever) was interpreted as a goal-directed behavioral strategy, whereas a seeking-lever preference was
taken as evidence for habitual responding.
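
The classification rule stated above can be written as a short Python sketch. The seeking-press fraction and the handling of ties or zero presses are assumptions added for illustration; the text only states that more taking presses indicate a goal-directed strategy and more seeking presses indicate habitual responding.

```python
def lever_preference(seeking_presses, taking_presses):
    """Classify one 1-min preference test from raw press counts.

    Returns (fraction of presses on the seeking lever, label).
    The fraction and the "undetermined" label are illustrative additions.
    """
    total = seeking_presses + taking_presses
    if total == 0:
        return None, "undetermined"
    fraction_seeking = seeking_presses / total
    if seeking_presses > taking_presses:
        label = "habitual"
    elif taking_presses > seeking_presses:
        label = "goal-directed"
    else:
        label = "undetermined"
    return fraction_seeking, label


print(lever_preference(12, 4))  # (0.75, 'habitual')
print(lever_preference(3, 9))   # (0.25, 'goal-directed')
```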
We aim to establish the lever-preference test as a new way to assess the development of habitual behavior. Our intention is not to
emulate another version of an outcome-devaluation procedure or to design a proxy for it. Instead, we introduce an alternative way to
track progressive automation of behavior and present a number of points that validate and/or support this test as a tool that identifies
a habit:

1) Across the 10 weeks of training, the animals’ initial test preference to interact with the taking lever (that is associated with
immediate (FR1) food delivery) switches to the seeking lever (that never directly leads to food delivery and never leads to
food quickly, but instead only leads to the extension of the taking lever).
2) We demonstrate that this preference switch occurs within individual animals.
3) The taking preference at the beginning of training is present in animals from both groups (habit and no-habit; Figures 2B and 2C).
4) This switch only occurs after a very high rate of action and session repetition (1000s of repetitions in 10 weeks of daily training
sessions; Figure 1B). It is widely accepted that habit learning materializes through high repetition of responding. Our paradigm
requires many repetitions of the behavior before the habit is detectable (long after the motor skill to lever press had been
acquired).
5) The slow speed of this switch is comparable to the speed of habit development in human everyday life (Figures 2B and 2C),
establishing face validity, in contrast to some other habit tasks that report habits developing as quickly as after 1-4 days of
behavioral training.
6) Induction of sensory-specific satiety after 10 weeks of seeking-taking training demonstrates that our behavioral training
schedule induces a habit as defined and tested by traditional outcome devaluation (and in the traditional way: as a singular
group, as opposed to considering individual differences; Figure 1C), establishing construct validity and fulfilling the traditional
requirement to classify a behavior as habitual.
     7) Habitual animals exhibit a higher probability to repeat seeking-lever presses (lever stickiness) during the preference test (Figure 2H), a measure associated with habitual behavior.
     8) Habitual animals commit fewer head entries into the food magazine during the preference test (Figure 2I), a measure associated with goal-oriented (non-habitual) behavior.
     9) The measures mentioned under points 7 and 8 correlate with seeking-lever preference (Figures 2H and 2I).
    10) The preference test detects a similar preference progression across weeks independent of the frequency with which the test is applied (Figure S2).
    11) Rats do not perform head entries into the food magazine during the 3-s waiting period before extension of the taking lever, indicating that the animals learn the “task logic” (i.e., the taking lever needs to be pressed in order to receive food; Figure 1B, right).
    12) The “task logic” translates well from the training situation (VI60:FR1) to the preference-test situation (both levers out), as animals in both groups (habit and no-habit) perform substantially more food-magazine entries after taking-lever presses (the lever associated with food delivery; Figure 2G).
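
Point 7 refers to lever stickiness as the probability of repeating a seeking-lever press. The Python sketch below shows one possible way to compute such a measure from a chronological press sequence. The exact definition used for Figure 2H is not given in this section, so the conditional-probability formulation, the 'S'/'T' coding, and the function name are all assumptions.

```python
def seeking_stickiness(presses):
    """Probability that a seeking press ('S') is followed by another seeking press.

    presses: chronological sequence of 'S' (seeking) and 'T' (taking) presses
             from one preference test. Returns None if no seeking press is
             followed by any further press.
    """
    # Consecutive press pairs whose first element is a seeking press
    after_seeking = [nxt for cur, nxt in zip(presses, presses[1:]) if cur == "S"]
    if not after_seeking:
        return None
    return sum(1 for nxt in after_seeking if nxt == "S") / len(after_seeking)


# In "SSTSS", three seeking presses are followed by another press; two of those are 'S'
print(seeking_stickiness("SSTSS"))
```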

We do not provide an extensive cross-validation of our preference test with the sensory-specific satiety data presented in Figure 1C
for several reasons: 1) We employed behavioral training under a VI60 reinforcement schedule, which has in fact already been widely
validated and widely accepted to induce habits, for both single-action and action-sequence tasks. 2) Data acquired using the reward-
devaluation approach is relatively “noisy” and provides little clustering of the population data, as exemplified in Figure 1C (center). 3)
We believe that what is commonly referred to as a “habit” is not a unitary concept and has a number of different aspects and that,
therefore, there are different ways to approach its detection. An approach that focuses on devaluation-insensitivity criteria is a widely
accepted approach, originally put forward by Dickinson and colleagues, and has led to invaluable research findings. However, it does
not capture all aspects of a (human) habit. For example, using its underlying logic, a habit can be induced in a single, relatively short behavioral training session in humans, and in no more than 4–5 sessions in rodents. Yet the human behavior that this research aims to model develops slowly across weeks, months, or even years. Our paradigm requires weeks of daily training
to induce a habit. 4) To our knowledge, sensory-specific satiety devaluation has never been reported as a way to assess individual
differences in habit formation. In addition to this type of devaluation being very sensitive to disturbances in general, we think that this
approach is actually not suitable to assess individual differences: Following the ad-lib pre-feeding, rats were exposed to the seeking
lever for five minutes under extinction conditions during which lever presses were recorded. Each animal was subjected to both
valued (pre-feeding with pellets different from training pellets) and devalued (pre-feeding with training pellets) conditions (on separate
days); the sequence of conditions was counterbalanced between animals. An important observation is that during the first pre-
feeding session, animals eat more than after the second. Thus, whichever condition comes first, valued or devalued, the first session
elicits more lever presses relative to the same condition when it occurs second in temporal sequence (in other animals). We have
observed this phenomenon reliably in several cohorts of animals, and believe it to be a general phenomenon, despite the fact
that, to our knowledge, it has not been reported in the literature. This phenomenon complicates looking at individual differences,
as individuals are strongly affected by the sequence of pre-feeding, which restricts the utility of pre-feeding devaluation to group
comparisons, where an equal number of animals have undergone each condition first (due to counterbalancing).

Experimental procedures
Experiment 1: Week 10 validation of habit induction by seeking-taking paradigm
One group of rats (n = 14) was tested for habitual responding using a traditional satiety-specific outcome devaluation, after 10 weeks
of daily training on the VI60:FR1 seeking-taking chain schedule of reinforcement. Outcome devaluation (sensory-specific satiety) was
achieved by giving rats 1 h of ad-libitum access to either grain-based pellets (used to remove hunger as a motivating factor for lever pressing while the pellets used for behavioral training retain their “rewarding” value; the “valued condition”) or regular (purified) pellets (used to decrease both hunger and the value of the behavioral-training pellets; the “devalued condition”) prior to behavioral testing. Food pellets were placed in custom-made metal cups in an empty, otherwise unused home cage (“feeding cage”; water freely available). Following 1 h of this pre-feeding, rats were exposed to the seeking lever in the operant chamber for five minutes under extinction conditions during which lever presses were recorded. This was followed by a “pellet-choice” test to determine the efficacy of the pre-feeding procedure, measured by the preference for either pellet type during
two minutes of simultaneous ad-libitum access to both types in the feeding cage. Animals were excluded from analyses if they
consumed any of the prefed-type pellets during this test (insufficient sensory-specific satiety). Each animal was subjected to both
valued and devalued conditions (on separate days); the sequence of conditions was counterbalanced between animals.
Experiment 2: Dopamine measurement during seeking-taking chain training
In a second group of rats (n = 35), we employed fast-scan cyclic voltammetry (FSCV) using chronically implanted carbon-fiber micro-electrodes73 to record rapid changes in dopamine release in different domains of the striatum during week 1 and week 10 of training on the seeking-taking chain schedule of reinforcement. On recording days, before the start of behavioral training, electrodes were connected to a head-mounted voltammetric amplifier that was interfaced through an electrical swivel above the test chamber (allowing animals free movement; Crist Instrument, MD, USA) with a PC-driven data-acquisition and analysis system (National Instruments, TX, USA).73
Voltammetric scans were repeated every 100 ms (10 Hz sampling rate). During each scan, the alternating potential at the carbon-fiber electrode tip was ramped linearly from −0.4 V versus Ag/AgCl to +1.3 V (anodic sweep) and back to −0.4 V (cathodic sweep) at 400 V/s (total scan time of 8.5 ms) and held at −0.4 V between scans. Waveform generation, data acquisition, and analysis were carried out on a PC-based system using two PCI multifunction data acquisition cards and software written in LabVIEW (National Instruments). Dopamine is oxidized during the anodic sweep, if present at the surface of the electrode, forming dopamine-o-quinone (peak reaction detected around +0.7 V), which is reduced back to dopamine in the cathodic sweep (peak reaction detected around −0.3 V). The ensuing flux of electrons is measured as current and is directly proportional to the number of molecules that undergo electrolysis. The background-subtracted, time-resolved current obtained from each scan provides a chemical signature characteristic of the analyte, allowing resolution of dopamine from other substances.76
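
The scan parameters are internally consistent: a sweep from −0.4 V to +1.3 V and back covers 3.4 V, which at 400 V/s takes 8.5 ms. The following Python sketch builds one 100 ms scan cycle under that reading. It assumes that the return and holding potentials are −0.4 V; the function name and the 10 µs time step are illustrative choices and do not describe the authors' LabVIEW implementation.

```python
def fscv_cycle(v_hold=-0.4, v_peak=1.3, scan_rate_v_per_s=400.0,
               repetition_hz=10.0, dt_s=1e-5):
    """Return (scan duration in s, list of applied potentials for one cycle).

    The waveform is a triangle from v_hold up to v_peak and back,
    followed by a hold at v_hold until the next scan starts.
    """
    sweep_s = (v_peak - v_hold) / scan_rate_v_per_s  # one-way sweep duration
    scan_s = 2.0 * sweep_s                           # anodic + cathodic sweep
    n_samples = int(round((1.0 / repetition_hz) / dt_s))
    wave = []
    for i in range(n_samples):
        t = i * dt_s
        if t < sweep_s:          # anodic sweep (rising potential)
            v = v_hold + scan_rate_v_per_s * t
        elif t < scan_s:         # cathodic sweep (falling potential)
            v = v_peak - scan_rate_v_per_s * (t - sweep_s)
        else:                    # hold between scans
            v = v_hold
        wave.append(v)
    return scan_s, wave


scan_s, wave = fscv_cycle()
print(round(scan_s * 1000, 3), "ms per scan;", len(wave), "samples per 100 ms cycle")
```

Under these assumptions the electrode is actively scanned for 8.5% of each 100 ms cycle and held at the holding potential for the remainder.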
Experiment 3: Optogenetic stimulation of DMS dopamine-neuron terminals
In a third group of rats (n = 17), we optogenetically stimulated DMS dopamine-neuron terminals (bilaterally) in response to seeking-
lever presses for three weeks during daily seeking-taking chain training sessions starting under VI30:FR1 and progressing to VI60:FR1. A seeking-lever press triggered a 1 s laser photo-stimulation (20 mW, 20 Hz, 20 × 10 ms light pulses) followed by a 1 s timeout. Starting 10 s after the first seeking-lever press, only every third seeking-lever press triggered stimulation. Rats were exposed to the lever-preference test (without optogenetic stimulation) regularly to assess whether an approach habit was developing. Laser-stimulation visibility was masked using a custom-made LED mounted on the wall of the operant box emitting blue light pulses (20 Hz, 20 × 10 ms).
Experiment 4: Real-time place preference (RT-PP) and intracranial self-stimulation
Prior to Experiment 3 (after an 8–9-week post-surgery period to allow for sufficient virus expression), rats were exposed to an open-field
arena in which an RT-PP assay was implemented. Following a 5min baseline period without photo-stimulation, the presence of the rat
in one of the quadrants of the open-field (i.e., active quadrant) triggered photo-stimulation (20mW, 20Hz, 20x10ms light pulses) of
DMS dopamine-neuron terminals during a 15min assay. The position of the active quadrant was assigned randomly. Time spent
in the active quadrant was recorded. Laser-stimulation visibility was masked using a custom-made LED mounted on the wall of
the open-field emitting blue light pulses (20Hz, 20x10ms).
After Experiment 3, rats were tested in the same operant box for three days in an operant self-stimulation experiment, where nose pokes into the active nose-poke device triggered a 1 s photo-stimulation (20 mW, 20 Hz, 20 × 10 ms light pulses) of DMS dopamine-neuron terminals, followed by a 1 s timeout. Nose pokes into the inactive nose-poke device were recorded without programmed consequence. The location of active and inactive nose-poke devices was counterbalanced between rats. Laser-stimulation visibility was masked using a custom-made LED mounted on the wall of the operant box emitting blue light pulses (20 Hz, 20 × 10 ms).
Histological verification of recording sites, optic-fiber placement, and virus expression
After completion of experiments, rats were deeply anesthetized using a lethal dose of pentobarbital (14 mg/100 g) and transcardially perfused with saline followed by 4% paraformaldehyde (PFA). In animals with electrode implants, recording sites were marked with an electrolytic lesion before perfusion. Brains were removed and post-fixed in PFA for one day, after which they were placed in 30% sucrose for cryoprotection. The brains were rapidly frozen using an isopentane bath, sliced on a cryostat (50-µm coronal sections, −20 °C), and stained with Cresyl violet to aid visualization of anatomical structures, electrode-induced lesion or virus-infusion sites, and optic-fiber placement.
For the optogenetic experiment, serial coronal sections (50 µm) were stored in phosphate-buffered saline (PBS) with 0.02% sodium azide at 4 °C until further use. For immunohistochemical staining, sections were incubated in blocking buffer (5% bovine serum albumin (BSA), 5% normal donkey serum (NDS), and 0.2% Triton X-100 in PBS) for 1 h, followed by overnight incubation in primary antibodies rabbit anti-GFP (1:1000, Invitrogen A6455, Thermo Fisher) and chicken anti-TH (1:1000, Aves TYH) in blocking buffer. All sections were washed 4x (10 min each) in PBS and then incubated for 1 h in blocking buffer containing fluorescent Alexa-conjugated secondary antibodies (1:1000, donkey anti-rabbit Alexa 488 and donkey anti-chicken Alexa 647; Jackson Immunoresearch 711-546-152 and 703-606-155, respectively). Sections were washed again 4x in PBS, with the second wash containing DAPI (nuclei stain; Sigma Aldrich D9542), then mounted onto slides and coverslipped with Mowiol mounting medium. Slides were imaged using three fluorescent channels of the ZEISS Axio Scan.z1 slide-scanner at 10x magnification for qualitative expression and targeting verification. Sections were subsequently imaged in z stacks with a confocal microscope (Leica TCS SP5) to verify specificity of viral expression.

QUANTIFICATION AND STATISTICAL ANALYSIS

Data analysis
Dopamine-related current was isolated from the voltammetric signal by chemometric analysis using a standard training set, based on
electrically stimulated dopamine release detected with chronically implanted electrodes; resulting dopamine concentration was esti-
mated based on average post-implantation sensitivity of electrodes.73 Resulting dopamine traces were smoothed with a moving 10-
point median filter prior to analysis of average concentration. Dopamine concentration was averaged over 10 s (approximate duration
of the observed phasic signal) following seeking-lever extension (distal link of the chain) and over 7.5 s before and 2.5 s after reward
delivery (proximal link of the chain); resulting averages were compared to the average concentration over the 2 s before each trial
(baseline). Prior to each FSCV recording session, two unpredicted food-pellet deliveries (spaced apart by two minutes) confirmed
electrode viability to detect dopamine, as well as electrode sensitivity over time. Data analysis was performed using MATLAB
R2018b (The Mathworks, Inc., MA, USA).
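The smoothing and window-averaging steps described above can be sketched as follows. This is a minimal illustration and not the authors' MATLAB code: the sampling rate (`RATE_HZ = 10`), the edge handling of the median filter, the synthetic trace, and all function names are assumptions introduced here.

```python
from statistics import mean, median

RATE_HZ = 10  # assumed sampling rate; not stated in this section


def moving_median(trace, width=10):
    """Moving median over `width` samples; the window is truncated at the edges."""
    half = width // 2
    n = len(trace)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + width)
        out.append(median(trace[lo:hi]))
    return out


def window_mean(trace, event_idx, start_s, stop_s, rate_hz=RATE_HZ):
    """Mean of `trace` between start_s and stop_s (seconds) relative to an event."""
    a = max(0, event_idx + round(start_s * rate_hz))
    b = event_idx + round(stop_s * rate_hz)
    return mean(trace[a:b])


# Synthetic trace: flat baseline, a 10-s step "response", and one artifact spike.
trace = [0.0] * 50 + [1.0] * 100 + [0.0] * 50
trace[20] = 50.0
smoothed = moving_median(trace)

lever_idx = 50  # hypothetical sample index of seeking-lever extension
response = window_mean(smoothed, lever_idx, 0.0, 10.0)   # 10 s after the event
baseline = window_mean(smoothed, lever_idx, -2.0, 0.0)   # 2 s before the event
```

For the reward-delivery epoch, the same helper would be called with `start_s=-7.5` and `stop_s=2.5`. Here the baseline is taken relative to the same event for simplicity, whereas the text defines it as the 2 s before each trial.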

Statistical analysis
Individual electrochemical signals were averaged across sessions and then across animals. For comparison of electrochemical data,
individual behavioral data were binned into weeks and then averaged across animals. Corresponding electrochemical and behavioral
data from week 1 were compared to week 10 using a paired Student’s t test. When separated into groups, behavioral data were
analyzed using multivariate ANOVAs with group, lever, nose-poke port, and week as factors. When main effects or interactions
were significant, post hoc analyses were carried out and P values were adjusted according to the Holm-Bonferroni correction method
for multiple comparisons. For the optogenetic self-stimulation experiment (Figure 4D), we used two-way ANOVAs, with nose-poke
port (active versus inactive) or group (ChR2 versus EYFP) as factors, and time (day) as the repeated measure. Non-parametric
testing was conducted when data were not normally distributed. Graphical representations were made using Prism (GraphPad Software,
La Jolla, CA, USA) and MATLAB. Statistical analysis was carried out using Prism and SPSS (version 25; Chicago, IL, USA) and
MATLAB. Data collection and analysis were not performed blind to the conditions of the experiments. Statistical significance was set
to p < 0.05. All data are presented as mean or median ± SEM.
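For readers unfamiliar with the Holm-Bonferroni adjustment mentioned above, the step-down procedure can be sketched as follows. This is a generic illustration of the method, not the authors' Prism or SPSS workflow; the function name and the example P values are hypothetical.

```python
def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted P values in the original order.

    The smallest P value is multiplied by m, the next by m - 1, and so on;
    adjusted values are capped at 1 and forced to be non-decreasing.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw P values from three post hoc comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
```

In this made-up example only the first comparison remains below the 0.05 threshold after adjustment.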
Sample size was 35 rats for dopamine recordings (Figures 1 and 3), 16 for testing habitual behavior by traditional outcome deval-
uation via sensory-specific satiety (Figure 1), and 17 for optogenetic stimulation of dopaminergic neuron-terminals in the DMS (Fig-
ure 4), as described under ‘‘Animals.’’ To demonstrate reliability of the lever-preference test, we used seven cohorts of rats with
different sample sizes (ncohort1 = 12, ncohort2 = 12, ncohort3 = 13, ncohort4 = 16, ncohort5 = 6, ncohort6 = 14, ncohort7 = 6;
Figure S2).
