You are on page 1of 16

Neuron

Review

Dopamine Does Double Duty


in Motivating Cognitive Effort
Andrew Westbrook1,* and Todd S. Braver1,2
1Department of Psychological & Brain Sciences, Washington University in St. Louis, St. Louis, MO 63130, USA
2Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA
*Correspondence: jawestbrook@wustl.edu
http://dx.doi.org/10.1016/j.neuron.2015.12.029

Cognitive control is subjectively costly, suggesting that engagement is modulated in relationship to incentive
state. Dopamine appears to play key roles. In particular, dopamine may mediate cognitive effort by two broad
classes of functions: (1) modulating the functional parameters of working memory circuits subserving effort-
ful cognition, and (2) mediating value-learning and decision-making about effortful cognitive action. Here, we
tie together these two lines of research, proposing how dopamine serves ‘‘double duty’’, translating incentive
information into cognitive motivation.

Why is thinking effortful? Unlike physical exertion, there is no sentations, with higher extrasynaptic tone promoting greater
readily apparent metabolic cost (relative to ‘‘rest’’, which is stability, to a limit (Seamans and Yang, 2004). Phasic DA efflux
already metabolically expensive) (Raichle and Mintun, 2006). may also push beyond the limit and toggle the PFC into a labile
And yet, we avoid engaging in demanding activities even when state such that working memory representations can be flexibly
doing so might further valuable goals. This appears particularly updated (Braver et al., 1999). Additionally, DA may support the
true when goal pursuit requires extended allocation of working learning of more sophisticated (and hierarchical) allocation pol-
memory for cognitive control. One hypothesis is that cognitive icies via synaptic depression and potentiation in corticostriatal
effort avoidance is intended to minimize opportunity costs loops (Frank et al., 2001; O’Reilly and Frank, 2006). Second,
incurred by the allocation of working memory (Kurzban et al., DA is critical for action selection. Specifically, DA trains value
2013). If this is true, it suggests not only that working memory functions for action selection via phasic reward prediction error
is allocated opportunistically, but also that allocation policies dynamics potentiating behaviors that maximize reward with
entail sophisticated cost-benefit decision-making that is sensi- respect to effort in a given context (see Niv, 2009 for a review).
tive to as yet unknown cost and incentive functions. In any DA tone in the striatum and the medial PFC also promotes pre-
case, the phenomenon raises a number of questions: How do paratory and instrumental behaviors in response to conditioned
brains track effort costs? What information is being tracked? stimuli and particularly effortful behavior (Kurniawan et al., 2011;
How can incentives overcome such costs? What mechanisms Salamone and Correa, 2012).
mediate adaptive working memory allocation? Here, we tie together these largely independent lines of
Working memory capacity is sharply limited, especially in the research by proposing how the very same functional properties
domain of cognitive control, involving abstract, flexible, hierar- of DA encoding incentive information translate incentives into
chical rules for behavior selection. Optimizing working memory cognitive motivation by regulating working memory. Specifically,
allocation is thus critical for optimizing behavior. Prevalent we propose that DA dynamics encoding incentive state promote
computational frameworks have proposed reward- or expec- subjectively costly working memory operations experienced as
tancy-maximization algorithms for working memory allocation conscious, phenomenal effort. As we detail below, our proposal
(Botvinick et al., 2001; Donoso et al., 2014; O’Reilly and Frank, makes use of the concept of a ‘‘control episode’’ during goal pur-
2006). Yet, these frameworks largely neglect that working mem- suit (cf. ‘‘attentional episodes’’, see Duncan, 2013), involving sta-
ory allocation itself carries affective valence. High subjective ble maintenance of the goal state at higher-levels of the control
costs drive disengagement, whereas sufficient incentive drives hierarchy, along with selective updating of lower level rules for
engagement. That is, allocation of working memory is a moti- guiding behavior during completion of subgoals, as progress is
vated process. In this review, we argue that modulatory functions made toward the ultimate goal state. We review the ways in
of the midbrain dopamine (DA) system translate cost-benefit in- which DA dynamics encoding a net cost-benefit of goal engage-
formation into adaptive working memory allocation. ment and persistence result in adaptive working memory alloca-
DA has been implicated in numerous processes including, but tion. As such, DA translates incentive motivation into cognitive
not limited to, motivation, learning, working memory, and deci- effort.
sion-making. There are two largely independent literatures that
ascribe disparate functional roles to DA with relevance to moti- Motivated Cognition
vated cognition. First, DA influences the allocation of working Why Cognitive Effort Matters
memory directly by modulating the functional parameters of Cognitive effort is an everyday experience. The subjective
working memory circuits. For example, DA tone in the prefrontal costliness of cognitive effort is consequential, sometimes
cortex (PFC) influences the stability of working memory repre- driving disengagement from otherwise highly valuable goals.

Neuron 89, February 17, 2016 ª2016 Elsevier Inc. 695


Neuron

Review

Yet, surprisingly little is known about this phenomenon. It is 2015; Shackman et al., 2011). For example, individuals respond
neither clear what makes tasks effortful, nor why task engage- faster to affectively negative, and slower to affectively positive
ment is apparently aversive in the first place (Inzlicht et al., stimuli, following priming by conflicting versus non-conflicting
2014; Kurzban et al., 2013). Stroop trials (Dreisbach and Fischer, 2012).
Beyond a quizzical influence over goal-directed behavior, Avoidance learning to minimize loss may partly explain aver-
there are numerous reasons to care about cognitive effort. First, sion to working memory allocation for cognitive control. Yet, it
expenditure is critical for career and educational success, eco- cannot be the full story. On the one hand, individuals avoid
nomic decision-making, and attitude formation (Cacioppo cognitive demand, even controlling for reward likelihood (Kool
et al., 1996; von Stumm et al., 2011). Second, deficient effort et al., 2010; McGuire and Botvinick, 2010; Westbrook et al.,
may be a significant component of neuropsychiatric disorders 2013). On the other, opportunity costs may reflect more than
for which avolition, anhedonia, and inattention feature promi- just the likelihood of failure during the current control episode;
nently, such as attention deficit hyperactivity disorder (ADHD) namely, they may reflect the value of missed opportunities (Kurz-
(Volkow et al., 2011), depression (Hammar et al., 2011), and ban et al., 2013). Finally, an adaptive system must also be judi-
schizophrenia (Strauss et al., 2015). Effort avoidance may also cious, and avoidance of all goals requiring cognitive control is
contribute to declining cognitive performance in healthy aging clearly maladaptive. Decision-making must consider both costs
(Hess and Ennis, 2012; Westbrook et al., 2013). Engagement and benefits. Indeed, there is growing evidence that the ACC is
with certain kinds of cognitive tasks appears negatively va- as important for biasing engagement with effortful, control-
lenced, indicating a subjective cost. Subjectively inflated effort demanding tasks as it is for biasing avoidance (Shenhav et al.,
costs might undermine cognitive engagement and thereby per- 2013).
formance. Incentives Motivate Cognitive Control
Control-Demanding Tasks Are Valenced If control is avoided because of subjective costs, increased in-
Not all tasks are effortful. Tasks requiring allocation of working centives could offset costs, promoting control. Indeed, incen-
memory for cognitive control, however, appear to be (Botvinick tives yield control-mediated performance enhancements (see
et al., 2009; Dixon and Christoff, 2012; Dreisbach and Fischer, Botvinick and Braver, 2015; Pessoa and Engelmann, 2010 for
2012; Kool et al., 2010; Massar et al., 2015; McGuire and Botvi- review). Incentives enhance performance in control-demanding
nick, 2010; Schouppe et al., 2014; Westbrook et al., 2013). Indi- tasks encompassing visuospatial attention (Krebs et al., 2012;
viduals allowed to select freely between tasks differing only in the Small et al., 2005), task-switching (Aarts et al., 2010), working
frequency with which working memory must be reallocated for memory (Jimura et al., 2010), and context maintenance (Chiew
cognitive control express a progressive preference for the option and Braver, 2014; Locke and Braver, 2008), among others.
with lower reallocation demands (Kool et al., 2010; McGuire and Furthermore, incentives predict greater activity in control-
Botvinick, 2010). Critically even when offered larger reward, de- related regions, including medial and lateral PFC. For example,
cision-makers discount reward as a function of effort costs, thus incentives yield increased BOLD signal in the ACC, propagating
selecting smaller reward with lower demands over larger reward to dorsolateral PFC, corresponding well with the canonical
with higher demands (Massar et al., 2015; Westbrook et al., model by which the ACC monitors for control demands and re-
2013). cruits lateral PFC to implement control (Kouneiher et al., 2009).
Under what conditions might cognitively demanding tasks ac- This particular study showed that incentives yielded an additive
quire affective valence? By one account, tasks demanding increase in BOLD signal, on top of demand-driven control sig-
cognitive control involve response conflict (Botvinick et al., nals. However, more recent work has shown that incentive in-
2001) or frequent errors (Brown and Braver, 2005; Holroyd and formation is not merely additive, but interactive: with increasing
Coles, 2002) and as such are less likely to be successful, thus incentive-related activity under high task-demand conditions,
engendering avoidance learning to bias behavior toward tasks thus more directly implicating incentives in the enhancement
with higher chances of success (Botvinick, 2007). Multiple lines of cognitive control (Bahlmann et al., 2015), cf. Krebs et al.
of evidence suggest that conflict is aversive. First, conflict in (2012). Beyond mean activity, incentives also enhance the fidel-
the context of a Stroop task predicts overt avoidance (Schouppe ity of working memory representations. Task set representa-
et al., 2012). Also, trial-wise variation in subjective frustration tions are more distinctive, as revealed by multivariate pattern
with a stop-signal task predicts BOLD signal in the anterior analysis of BOLD data, during incentivized working memory tri-
cingulate cortex (ACC), otherwise implicated in conflict detection als (Etzel et al., 2015). Interestingly, increased distinctiveness
(Spunt et al., 2012). In another study (McGuire and Botvinick, predicts individual differences in incentive-driven behavioral
2010), participant ratings of their desire to avoid a conflict- enhancement.
inducing task correlated positively with individual differences in Incentives not only drive more control-related activity, or
recruitment of ACC and also dorsolateral PFC, putatively higher fidelity task set representations, but they also affect
involved in working memory maintenance of task sets. More- the selection of more costly control strategies. For example,
over, the dorsolateral PFC correlation remained after controlling cognitive control may be recruited proactively, in advance of
for performance differences (reaction time, RTs, and error rates), imperative events, or reactively, concurrent with event onset
indicating that the desire to avoid the task did not simply reflect (Braver, 2012). Proactive control has behavioral advantages,
perceived failure. Finally, interesting interactions between affect but also incurs opportunity costs that bias reliance on reactive
and cognitive control also support the notion that conflict is aver- control. Incentives appear to offset costs, increasing proactive
sive (Dreisbach and Goschke, 2004; Saunders and Inzlicht, relative to reactive control, as reflected in sustained increases

696 Neuron 89, February 17, 2016 ª2016 Elsevier Inc.


Neuron

Review
Figure 1. Incentive State Dynamics during a
Control Episode
State dynamics as exemplified by succession
through mental multiplication task operations. In
the image, points incentivize initial engagement.
The costs (red line) mount with time-on-task and
increasing maintenance and updating demands.
The actors persist while the net incentive value of
engagement (black line) remains positive, which
occurs when costs are offset by incremental
progress (e.g., at subgoal completion) and other
incentives (green line). If the net incentive value
goes negative, the actors are prone to disen-
gagement.

and costs may accumulate in excess of


perceived benefits, any stage may result
in disengagement. We consider the
mental multiplication example for illustra-
tive purposes only; the general notion of a
control episode should apply broadly to
any hierarchically structured, temporally
extended sequence of goal-directed be-
in BOLD signal prior to imperative events, and attenuated haviors that require working memory allocation (e.g., planning,
phasic responses at event onsets, and this shift to proactive problem-solving, and reasoning).
control predicts performance enhancements (see Jimura In the sections that follow, we describe how DA mediates
et al., 2010). Moreover, incentive-driven shifts to proactive con- value-based working memory management during control epi-
trol are larger among highly reward-sensitive individuals (Jimura sodes. Figure 2 provides an overview of critical functions that
et al., 2010). will be reviewed. Tonic DA, for example, influences the stability
In sum, working memory operations are treated as subjec- of working memory contents by direct action in PFC
tively costly. Whether apparent costliness reflects avoidance (Figure 2B), while phasic DA efflux in the striatum trains policies
learning of behaviors with low likelihood of success, or opportu- for value-based updating of working memory contents that
nity costs, incentives can counterbalance costs, promoting reflect both the reward value of the goals to which they
working memory operations. Cost-benefit decision-making correspond and effort (updating and maintenance) costs
thus underlies working memory allocation for cognitive control. (Figure 2C). While cached value-functions reflect past experi-
We propose that during goal pursuit, individuals engage in costly ence, their implementation is subject to instantaneous modula-
control episodes, remaining engaged to the extent that benefits tion by incentive state. Accordingly, we describe how DA and
outweigh costs. Moreover, we propose that DA solves a core its projection targets encode net incentive state, dynamically ac-
computational problem of control episodes: namely, value- counting for goal state revaluation and generalized motivation.
based management of working memory for cognitive control Such information is used to bias policies for working memory
that reflects not only prior reward learning, but also instanta- allocation actions (Figure 2D). Hence, DA does double duty in
neous effects of current incentive state. translating incentive information into cognitive effort both by
To illustrate, we consider an example control episode functional modulation of working memory circuits (Figures 2B
involving the demanding task of finding the product of two and 2C) and by influencing value-learning and decision-making
two-digit numbers, incentivized by points on an examination about effortful action (Figures 2C and 2D). We take up each of
(without calculators; Figure 1). Control episodes may be initiated these key duties in turn.
by incentive-driven (point-value cued) allocation of working
memory to represent the goal state (finding the product). DA and Working Memory Management
Throughout an episode, the actor must maintain high-level Successful control episodes demand stable maintenance and
goal information (e.g., the original numbers), resisting interfer- also targeted, flexible updating of working memory, with DA ap-
ence from distractors, while flexibly updating targeted, lower- pearing to play an important role in both processes. In the PFC,
level representations of subgoals in a hierarchical fashion. DA influences the stability of recurrent networks (Brunel and
Subgoals in our example include: (1) multiplying the ones column Wang, 2001; Seamans and Yang, 2004) and, thereby, the stabil-
digits; (2) carrying the tens-digit value of that product; (3) adding ity of short-term configurations that constitute control-related
that value to the product of the tens-digits, etc. Maintaining each working memory representations (Cools and D’Esposito, 2011;
subgoal is subjectively costly and thus the stability of goal repre- Robbins and Arnsten, 2009). In the striatum, DA trains gating pol-
sentations should reflect the value of those goals. Similarly, up- icies that come to determine the kinds of information that
dating operations, as required when subgoals are completed, become represented in the PFC and the stimulus signals that
are also subjectively costly. As each stage has its own costs, drive updating of specific PFC subregions (Frank et al., 2001;

Neuron 89, February 17, 2016 ª2016 Elsevier Inc. 697


Neuron

Review
Figure 2. Dopamine Does Double Duty
during Control Episodes
Double duty for DA in cognitive effort includes: (1)
modulating the functionality of WM circuits
including maintenance stability and specific flexi-
bility for updating WM contents (yellow), and (2)
developing and biasing value-based policies for
WM allocation (blue). The images clockwise from
upper left: (A) key anatomical loci of DA circuitry
regulating control episodes; (B) tonic DA promotes
stable and robust WM maintenance via PFC
modulation; (C) phasic DA release encoding effort-
discounted reward trains allocation policies in
striatum and ACC; and (D) phasic DA release and
ramping tone in the striatum bias action selection
toward costly WM updating in the lateral PFC, by
potentiating updating generally, and updating in
accordance with PFC-based action policy signals,
in particular. The top-down policy signals reflect
hierarchically higher-level goals and thus favor
gating of contextually appropriate subgoals
into WM. The insets are described in subsequent
figures.

Importantly, PFC DA changes dynami-


cally, precisely when needed, to promote
working memory maintenance. Salient,
cognitive task-relevant events have been
shown to drive mesocortical DA neuron
firing that can increase extrasynaptic DA
concentration in the PFC (Figure 2B) (re-
viewed in Bromberg-Martin et al., 2010;
Phillips et al., 2008). In humans, BOLD dy-
namics in the ventral tegmental area (VTA)
support the hypothesis that DA neurons
respond to cognitive task demands, inde-
pendently of reward (see Boehler et al.,
2011), as well as the interaction of reward
and task complexity (Krebs et al., 2012).
The effect of this VTA activation may be
O’Reilly and Frank, 2006). Thereby, DA plays key roles in initi- to promote maintenance of task sets in lateral PFC regions,
ating and sustaining control episodes by functionally promoting e.g., in those demonstrated (by reversible TMS lesion) to be crit-
both working memory stability and targeted flexibility. ical for supporting rule-guided behavior (D’Ardenne et al., 2012).
Importantly, a positron emission tomography (PET) study has re-
Promoting Stability of Higher-Order Goal vealed increased D2 receptor binding in ventrolateral PFC in hu-
Representations mans performing a verbal working memory task, relative to a
Working memory representations in the PFC (Miller and Cohen, simpler sustained attention task (Aalto et al., 2005) (Figure 4A).
2001) (though see Riggall and Postle, 2012) are instantiated as Incentive cues also drive PFC DA release (reviewed in Brom-
temporarily stable, recurrent cortical pyramidal networks (Brunel berg-Martin et al., 2010; Phillips et al., 2008). To the extent that
and Wang, 2001). Extracellular DA promotes recurrent dynamics incentive-related DA promotes robust maintenance, such effects
by increasing excitatory N-methyl-D-aspartate (NMDA) drive help explain motivational enhancements of memory- and rule-
and also pruning firing external to such networks by exciting guided behavior. It could, for example, explain why incentives pre-
inhibitory gamma-Aminobutyric acid (GABA) interneurons (Ber- dict stronger proactive, maintenance-related BOLD signal in the
ridge and Arnsten, 2013; Cools and D’Esposito, 2011; Seamans lateral PFC during a Sternberg-type working memory task that
and Yang, 2004). The net effect of increasing DA (to a point) mediates better performance (Jimura et al., 2010) (Figure 4B). It
is to increase network-specific recurrent firing rates (Figure 3) could also explain performance enhancements following pharma-
and thus signal-to-noise ratio of working memory representa- cological COMT inhibition (boosting PFC DA tone, in particular) in
tions (Brunel and Wang, 2001). Evidence includes that, DA D1 re- an exploration/exploitation task, which requires the tracking of
ceptor agonism sharpens spatial tuning in task-relevant PFC multiple value signals in working memory (Kayser et al., 2014).
neurons in monkeys performing a spatial working memory task Conversely, while increasing DA promotes maintenance, flex-
(Vijayraghavan et al., 2007). ible shifting may require decreased DA. In one study, set-shifting

698 Neuron 89, February 17, 2016 ª2016 Elsevier Inc.


Neuron

Review

Global dopamine modulation Figure 3. Influence of Dopamine on


80 Neuronal Dynamics in a Model Recurrent
Network
Increased PFC DA tone (upper line) boosts firing in
70 +10%D1 task-selective neurons in recurrent networks
during WM maintenance (e.g., delayed match-to-
60 sample), relative to a baseline (control) dopami-
nergic state (lower line). In this computational
Firing rate (Hz)

50 simulation of neural dynamics, DA-linked increase


in NMDA and GABA currents boosts persistent,
40 recurrent firing, enhancing the stability and dis-
tractor resistance of task-relevant WM represen-
tations (x axis units are arbitrary time; Brunel and
30 Wang, 2001).

20

10 Control
(Kodama et al., 2014). The authors inter-
0 preted this unexpected result as reflecting
0 7000
Sample Match+R stress-driven DA release observed in other
studies (see Butts et al., 2011). Thus incen-
tive can promote PFC DA tone, but stress
performance was modulated in humans dosed with l-dopa. fMRI may be another affective determinant. In any case, there is a
evidence localized these effects to the PFC (Shiner et al., 2015). growing consensus that affective stimuli influence PFC DA tone
Specifically, when participants were dosed with l-dopa, the dif- which, in turn, modulates the stability of recurrent networks and,
ference between better performance on incentivized, and worse thereby, the contents of working memory.
performance on non-incentivized trials, was removed. Critically,
this mirrored the attenuation of BOLD signal deactivation in the Promoting Targeted, Flexible Updating of Task Sets
ventromedial PFC typical on incentivized versus non-incentiv- The need for both stability and flexibility of working memory, dur-
ized trials. The result was interpreted as evidence that set main- ing control episodes, creates opposing demands that DA acting
tenance was under dopaminergic control, and this control must by the PFC alone cannot resolve. Indeed, DA-mediated in-
be transiently removed to shift task sets. creases in stability undermine flexibility, as reflected in higher
Too much extrasynaptic DA, on the other hand, may destabi- task-switch costs (Herd et al., 2014; van Schouwenburg et al.,
lize working memory representations (Berridge and Arnsten, 2010). There is evidence, however, that DA can increase cogni-
2013; Cools and D’Esposito, 2011; Seamans and Yang, 2004). tive flexibility via D2 signaling in the ventral striatum (VS) (Aarts
One potential mechanism of supraoptimal DA effects is et al., 2010; Samanez-Larkin et al., 2013; Shiner et al., 2015;
increasing stimulation of relatively low-affinity DA D2 receptors van Holstein et al., 2011). Incentives can enhance task switching,
(Durstewitz and Seamans, 2008). This D2 stimulation leads to and this effect is stronger among individuals with a variant of the
decreased GABA and NMDA currents, thus counteracting D1 DA transporter gene DAT1 predicting lower transporter density,
activation effects. Blocking D2 action, therefore, could enhance and therefore higher synaptic and extrasynaptic DA tone, partic-
PFC representations. In a recent demonstration, DA D2 receptor ularly in the striatum (Aarts et al., 2010). This result supports the
blockade by amisulpride, relative to placebo, enhanced PFC hypothesis that striatal DA release mediates incentive enhance-
representations as indexed by sharper multivariate pattern ment of cognitive flexibility. Evidence of D2 receptor involvement
discrimination of PFC BOLD data between incentive conditions comes from a study comparing the effects of the DA agonist
during an incentive learning task (Kahnt et al., 2015). bromocriptine and the DA D2-selective antagonist sulpiride on
According to one proposal, task-based DA release yielding task-switching (van Holstein et al., 2011). Critically, those individ-
supraoptimal DA may provide a local task-switching mechanism uals with DAT1 coding for higher transporter density (lower stria-
in the PFC (Braver et al., 1999). Specifically, DA release may tog- tal DA tone) showed reduced switch costs after being dosed with
gle PFC lability, by pushing DA tone from optimal to supraoptimal DA agonist bromocriptine, and this improvement was blocked by
levels, increasing the likelihood of context updating during task the D2-selective antagonist sulpiride.
performance. However, as noted, this kind of updating would Successful control episodes require not simply generalized
have diffuse influence and lacks the temporal and spatial spec- increases in flexibility, but targeted, context-specific updating.
ificity required for targeted updating of, for example, a subcom- In the mental arithmetic example, it is critical to maintain a
ponent of a task-set hierarchy (O’Reilly and Frank, 2006). Even if representation of the full problem (13 3 26), while updating spe-
phasic PFC DA efflux does not support selective updating, it may cific subgoals as they are completed (e.g., shifting to multiply
be useful to serve as a general updating or disengagement 10 3 6, after storing the 3 3 6 result). While DA in the PFC lacks
signal. the temporal or spatial specificity to support targeted updating,
We close this section by noting that while increasing incentive DA can effect specific updating via the basal ganglia (Frank
can drive higher PFC DA tone, a recent study has shown conflict- et al., 2001; O’Reilly and Frank, 2006) (Figure 2C). A well-sup-
ing results. Notably, the investigators found higher PFC DA release ported model holds that phasic DA release in the dorsal striatum
in anticipation of less subjectively valued outcomes in monkeys (DS) trains ‘‘so-called ‘Go’’’ cell (D1-expressing medium spiny

Neuron 89, February 17, 2016 ª2016 Elsevier Inc. 699


Neuron

Review
Figure 4. Control Demand and Incentive
Effects of Dopamine in Lateral PFC
(A) Decreased binding potential of D2 receptors in
ventrolateral PFC indicates increased DA tone
during a verbal WM (2-back) task relative to a less
demanding sustained attention (0-back) task
(Aalto et al., 2005).
(B) In a high-incentive context (orange; R+), sus-
tained BOLD signal is enhanced in right lateral PFC
during a WM (Sternberg) task, relative to a low
incentive context (blue; R ) (Jimura et al., 2010).

external states along with cognitive


and motor actions (Li and Daw, 2011;
Wickens et al., 2007), and temporally
extended action sequences (Holroyd
neurons) synapses, through long-term potentiation (LTP), which and Yeung, 2012; O’Reilly et al., 2014). Incentive salience
increase the likelihood of contextual information being gated to models propose that DA may further bias action selection at
the PFC. DA dips, on the other hand, are proposed to train ‘‘so- the time of choice by modulating value signals, e.g., as a function
called ‘NoGo’’’ cells (D2-expressing medium spiny neurons) syn- of motivational state (McClure et al., 2003; Zhang et al., 2009).
apses, through long-term depression (LTD), decreasing the likeli- So, for example, DA could mediate the decision to engage in a
hood of context gating (Frank et al., 2001). When stimuli evoke temporally extended sequence of cognitive actions required
activity in relatively more Go than NoGo cells, information is gated for multiplication of two-digit numbers, as well as to execute all
(by transient removal of tonic inhibition of the thalamus) for repre- subgoals in sequence, as a function of the point value on an ex-
sentation into the PFC. Thus, by training striatal synapses to amination. In contrast, if the task were insufficiently incentivized,
reflect reward history, phasic DA dynamics generate cached pol- or if a lower-effort strategy was available (e.g., using a calcu-
icies governing context-specific updating of working memory. lator), the decision process may instead resolve against control
The gating model has been extended to support the hierarchi- episode engagement. In the next sections, we review evidence
cal structure of control episodes (Chatham and Badre, 2015). for the role of DA in training value functions and also instanta-
Corticostriatal loops may support DA-mediated hierarchical neously biasing the selection of, and persistence with, costly
reinforcement learning, in which content is selected for updating cognitive actions.
at different levels of a hierarchy (Badre and Frank, 2012; Frank
and Badre, 2012). Reciprocal connections allow the basal DA and Action Policy Learning
ganglia (BG) to not only direct which information gets gated Reward-Prediction Errors
into working memory, but also for higher-level PFC representa- A rich literature implicates the firing of midbrain DA cells in en-
tions of context to direct what lower-level representations get coding the momentary difference between expected and actual
out-gated, when they are no longer useful (Chatham et al., reward (Schultz et al., 1997). The remarkable functional similarity
2014). Thus, higher-level representations may interact in a top- between these reward prediction errors (RPEs) and temporal dif-
down manner with bottom-up gating mechanisms to adaptively ference values in computational reinforcement learning (RL) has
target content at a hierarchically lower level. led to the hypothesis that phasic DA dynamics train the system
Successful control episodes are enabled by both: (1) DA- to bias behaviors that increase context-based reinforcement
trained cached, value-based gating policies in cortico-striatal likelihood (Montague et al., 1996). Mechanistically, DA does so
circuits that bias adaptive updating in hierarchical environments, by potentiating synapses linking representations of the current
and (2) DA-mediated stability (in the PFC) and flexibility (in the state to specific behaviors (Wickens et al., 2007). Synaptic
striatum) of working memory as a function of incentive informa- weights acquired through this process can be thought of as
tion. Thus, DA appears to translate incentives into cognitive value functions, in the sense of stronger weights biasing actions
motivation by direct modulation of the cortico-striatal working that maximize the likelihood of reward (i.e., those actions with
memory network supporting control episodes. greatest expected value). This extends to cognitive actions.
Indeed, it is precisely these phasic DA RPEs that are thought
Cost-Benefit Decision Making to train WM gating policies described in the previous section
Control episodes are treated as subjectively costly. Behavioral (Frank et al., 2001) (Figure 2C).
evidence suggests cost-benefit decision-making, balancing Critically, the functional capacity of RPE signals extends
the value of the desirable goal against an underlying cost func- beyond simple stimulus-response pairings, to action-outcome
tion (Dixon and Christoff, 2012; Kool et al., 2010; Massar et al., association learning in the PFC (Gläscher et al., 2009). From an
2015; Westbrook et al., 2013) and DA likely plays a key role. action selection standpoint, this is enormously powerful. Fore-
Indeed, DA has long been implicated not only in working memory most, action-outcome associations are necessary for calculating
(WM) and motivation, but also in both value-learning and deci- net incentive value: the expected benefits of outcomes less the
sion-making. Specifically, DA trains functions mapping value to cost of actions. Moreover, action-outcome associations can

700 Neuron 89, February 17, 2016 ª2016 Elsevier Inc.


Neuron

Review

not only support selecting the most highly rewarded action in a ments where individually costly actions, like WM allocation, are
given state, they also enable ‘‘looking forward’’: selecting actions justified inasmuch as they incur progress toward a goal that is
based upon an internal model of the environment, its states and more valuable than the sequence is costly. As we elaborate
action-contingent state transitions. An agent acting in a ‘‘model- next, value functions within the ACC in particular, appear critical
based’’ fashion may select actions that also take into account its for biasing engagement and persistence with costly control epi-
state motivation for particular outcomes (Daw et al., 2011; sodes.
Gläscher et al., 2009). Indeed, sensitivity to outcome devaluation
(e.g., devaluation by selective satiation) is used as the bench- DA Cell Firing Trains Action-Outcome Associations in
mark of model-based decision-making (Dolan and Dayan, 2013). the ACC
There is evidence that RPEs can reflect internal models of ac- The ACC and dopaminergic innervation of the ACC are critical for
tions and subsequent states (Hiroyuki, 2014). Hence, value func- selecting effortful behavior (Kurniawan et al., 2011). In particular,
tions may be learned for allocating WM, if doing so implements a RPE signals may train action-outcome associations in the ACC
mental state that increases the probability of reward, given sub- for prediction (Alexander and Brown, 2011; Donoso et al.,
sequent actions (Chatham and Badre, 2013; Dayan, 2012). Thus, 2014; Holroyd and Coles, 2002) and effort-based decision-mak-
RPEs may train value functions governing WM allocation. Evi- ing (Kennerley et al., 2011; Shenhav et al., 2013; Skvortsova
dence of value-based WM allocation comes from an fMRI study et al., 2014). Action-outcome associations are necessary for
of humans selecting among task sets manipulated to have vari- cost-benefit computations. Unit recording studies in monkeys
able utility (Chatham and Badre, 2013). The expected value of engaged in multi-attribute decision-making have uncovered
task sets was varied systematically over trials, and a RL model ACC neurons multiplexing information about benefits and costs
of choice behavior was used to predict trial-wise subjective (including effort) in a unified value-coding scheme (Kennerley
values of task sets. Subjective value estimates predicted et al., 2009). This contrasts with the orbitofrontal cortex (OFC),
BOLD dynamics in a fronto-striatal network, supporting that also implicated in economic decision-making, which contains
task set values are tracked according to a value-updating algo- neurons encoding the value of multi-attribute outcomes, but
rithm that is likely mediated by phasic DA RPE signals. It is worth not the cost of action to obtain such outcomes (Kennerley
noting here that, although PFC DA is thought have slow clear- et al., 2011; Padoa-Schioppa, 2011). Consistent data showing
ance, which would preclude the temporal resolution and speci- multiplexed cost-benefit encoding in the ACC also come from
ficity required for precise DA-based training, per se, there is rodent studies (Cowen et al., 2012; Hillman and Bilkey, 2012).
reason to believe that co-release of glutamate from DA cells There is also considerable evidence supporting cost-benefit
innervating the PFC could provide the mechanism for synaptic encoding in the human ACC during effort anticipation and deci-
learning effects (Seamans and Yang, 2004). DA cells may thus sion-making. In tasks utilizing advance reward and demand
direct learning, whether by the functional consequences of gluta- (effort) cues, the ACC is sensitive to the anticipation of both di-
mate in the PFC or DA in the striatum. As discussed above, how- mensions, in both forced- and free-choice trials (Croxson
ever, DA release in the PFC may have further consequences in et al., 2009; Kroemer et al., 2014; Kurniawan et al., 2010, 2013;
the PFC in terms of promoting WM stability, by modulating the Massar et al., 2015; Prévost et al., 2010; Vassena et al., 2014).
dynamics of recurrently firing networks of pyramidal cells. Moreover, the ACC has been repeatedly linked to the conscious
The functionality of RPE signals may also extend to hierarchi- experience of cognitive effort. In a striking demonstration, elec-
cal RL, whereby value functions describe actions sequences trical stimulation of the human ACC reliably evoked the
rather than individual actions (Frank and Badre, 2012; O’Reilly conscious experience of a forthcoming challenge and also a
et al., 2014; Ribas-Fernandes et al., 2011). Selection across se- ‘‘will to persevere’’ through that challenge (Parvizi et al., 2013).
quences is critical for overcoming individually costly actions that
are only justifiable given the value of desirable outcomes at Tonic PFC DA Strengthens Cortical Action Policy
sequence conclusion (Holroyd and Yeung, 2012). In the mental Signals
arithmetic example, updating WM with a ones-digit multiplica- Multiplexed value information is used by the ACC to set action
tion subgoal is costly, but may be justifiable with regard to the policies which can then be implemented via the BG, in competi-
progress it incurs toward the ultimate, valuable goal of solving tion with habitual biases against effortful engagement. In the
the two-digit multiplication problem. Importantly, knowledge of domain of cognitive effort, the ACC has been proposed to sub-
task hierarchy enables agents to bias such costly actions. serve a specific computational function in selecting the identity
Pseudo-RPEs (based on perceived progress rather than external of, and the intensity with which control signals are represented,
reward) may train value functions regarding action sequences as a function of the expected value of the associated outcome
(Ribas-Fernandes et al., 2011). Thus RPEs may train progress- (Shenhav et al., 2013). In this context, DA in the ACC strengthens
based value functions for sequences of effortful WM updating dynamics supporting representation and integration of action-
and maintenance. outcome associations (as evidenced, e.g., by increasing power
As we have just reviewed, DA-mediated RL appears to train in gamma band oscillations, see Steullet et al., 2014) and may
value functions with numerous properties supporting successful thereby increase the influence of ACC-based policy signals.
control episodes. Namely, RPEs can train value functions based Conversely, blocking DA diminishes the capacity of the ACC to
on action-outcome associations, supporting model-based pro- bias the choice of greater effort for larger reward (Schweimer
spection, and reflecting action sequences. Such value functions and Hauber, 2006; Schweimer et al., 2005). Thus, in the mental
may thus promote action in hierarchically structured environ- arithmetic example, incentive-driven DA release in the ACC

Neuron 89, February 17, 2016 ª2016 Elsevier Inc. 701


Neuron

Review

promotes cortical action policies related to the strategy of tunity costs of WM allocation (Kurzban et al., 2013). Evidence
directly computing the solution to the two-digit multiplication of DA encoding opportunity costs comes from a high-resolution
problem, rather than following a prepotent bias to utilize a fMRI study finding signed RPE-like increases in activity in the
lower-effort strategy (i.e., guessing) or otherwise disengage. VTA/substantia nigra (SN) corresponding with the value of
unchosen options which therefore constituted missed opportu-
DA and the ACC Track Progress to Regulate Persistence nities (D’Ardenne et al., 2013). The ACC, by virtue of its connec-
Following control episode initiation, an actor must decide tivity with lateral PFC WM circuits (see Kouneiher et al., 2009),
whether to persist. Opportunity costs rise with time-on-task and sustained activity through control-demanding tasks (see
and so may the drive to disengage. As we propose, perceived Dosenbach et al., 2006), is well-positioned to track such oppor-
progress implies increasing expected value, and thus may offset tunity costs.
accruing opportunity costs. There is growing evidence that DA Regardless of the nature of control costs, however, the ACC,
and the ACC regulate progress-based persistence with control which has long been implicated in avoidance learning (Shack-
episodes (Holroyd and Yeung, 2012; O’Reilly et al., 2014). In man et al., 2011), has been proposed to mediate avoidance of
fact, in rats engaged in an effort-based decision-making task, control demands by attenuating DA-based value-learning sig-
ACC neurons multiplexing maze path, reward, and effort infor- nals (Botvinick, 2007). The most direct evidence comes from a
mation were most selective after decisions were made, and their recent pharmaco-genetic imaging study (Cavanagh et al.,
dynamics were identical across forced- and free-choice trials, 2014). The paradigm was structured such that reward and pun-
suggesting greater involvement in biasing persistence than in ishment cues were accompanied by either high or low decision
initial selection (Cowen et al., 2012). conflict (control demands), designed to test the prediction that
Control episodes are intrinsically costly, perhaps reflecting op- conflict would attenuate reward and boost punishment learning.
portunity costs incurred by WM allocation (Inzlicht et al., 2014; As expected, an EEG signature of ACC activity—mid-frontal
Kurzban et al., 2013). Adaptive persistence in effortful se- theta power—increased reliably on conflict versus non-conflict
quences of behavior, therefore, requires ongoing computation trials. Critically, conflict strengthened individual difference corre-
of accruing costs and benefits (Meyniel et al., 2013). A useful lations between mid-frontal theta power and the perceived pun-
metric is the rate of progress, if progress is sufficiently fast, ishment value of a given stimulus, while it attenuated individual
engagement is maintained, while slow or blocked progress yield difference correlations with the perceived reward value of a given
frustration and disengagement (O’Reilly et al., 2014). stimulus. This result supports the hypothesis that the ACC both
The ACC, by virtue of its capacity for hierarchical RL, and recip- recruits control resources and also signals the cost of recruit-
rocal interactions with the DA midbrain (Holroyd and Yeung, ment, thereby attenuating reward and amplifying punishment
2012; Ribas-Fernandes et al., 2011), is well-positioned to track learning. Evidence implicating DA in mediating cost signaling
progress and regulate engagement. By this account, the ACC included that dosing with cabergoline—a D2 selective agonist
(perhaps in concert with the OFC, see O’Reilly et al., 2014) uses that acts on presynaptic D2 autoreceptors to inhibit burst firing
representations of hierarchical task structure to track progress to reward and exaggerate burst firing to punishments in the
toward sub and superordinate goals and conveys progress via DS—had the effect of reducing reward responsiveness and
the dopaminergic midbrain. Faster progress generates DA boosting punishment responsiveness during learning.
release, promoting value learning and engagement, while slower Direct midbrain recordings support the hypothesis that DA
progress generates DA dips. Indeed, ACC unit recordings in both neurons encode effort costs. For example, a subset (11%) of
monkeys and rats show ramping dynamics that reflect increasing midbrain VTA neurons in monkeys performing an effortful, incen-
progress through action sequences (Ma et al., 2014). Importantly, tivized reaching task fired in proportion to reward magnitude dis-
this dynamic reflects internal models of task structure: rat ACC counted by effort demands (Pasquereau and Turner, 2013).
neurons track progression through a sequence of lever presses, Similarly, population firing rates of SN neurons in monkeys per-
regardless of physical lever features or of particular sequences forming an effort-based decision-making task increased during
required on a given trial (Ma et al., 2014). higher reward trials and decreased with increasing effort require-
The midbrain, for its part, shows RPE-like firing in response to ments (Varazzani et al., 2015) (Figure 5). Interestingly, a stronger
perceived (progress-like) success in monkeys performing a vi- relationship between net expected value and SN firing rates also
sual WM task, independent of actual success, implicating predicted a stronger relationship between net expected value
model-based criteria (Matsumoto and Takada, 2013). Also, VS and choice behavior. This correlation suggests that midbrain
BOLD signal in humans performing a WM task increases tran- dopaminergic activity has the capacity to directly influence deci-
siently on correct versus incorrect trials, in the absence of perfor- sion-making beyond mediating value learning, a point to which
mance feedback (Satterthwaite et al., 2012). Together, these we will return later.
results suggest not only that pseudo-RPEs report perceived The ACC is thus a strong candidate for regulating persistence
goal progress, but also that one function of these pseudo-RPE with control episodes, via the DA midbrain, by virtue of its capac-
signals is to modulate activity in the VS, a region proposed to ity to track not only incremental progress toward a goal, but also
serve as a key motivational hub (Mogenson et al., 1980). opportunity costs, and thereby signal control costs. We propose
that ACC regulates persistence by conveying the momentary
DA and the ACC Track Costs to Constrain Persistence balance of accruing progress less costs via phasic DA release
Persistence is justifiable only inasmuch as that progress outpa- from the midbrain. As we discuss in the next section, these DA
ces accruing costs. A normative account appeals to the oppor- projections have important effects on not just value learning,

702 Neuron 89, February 17, 2016 ª2016 Elsevier Inc.


Neuron

Review
Figure 5. SN Firing Rates Encode Reward
and Effort
(A) Raster plots of monkey SN cell activity to
incentive cues during effort-based decision-mak-
ing, firing intensifies with higher incentive values
(liquid reward; rightward columns) and lower effort
demands (handgrip squeeze; upper rows).
(B) Location of SN recordings (Varazzani et al.,
2015).

effort study increased to high versus low


reward and was attenuated when it was
preceded by high versus low demands
for handgrip squeezes (Kurniawan et al.,
2013). Similarly, in the cognitive domain,
a transient VS response to reward receipt
was diminished if it was preceded by high
but also on action selection, including instantaneous incentive versus low demands for cognitive control (i.e., task-switching
motivation effects in the striatum. frequency) (Botvinick et al., 2009).
Importantly, the VS evaluates both model-based and model-
DA and Action Selection Biasing free state features (Daw et al., 2011; van der Meer and Redish,
We propose that incentive-linked DA release promotes ACC- 2011). The capacity for model-based evaluation makes the VS
based action policies on engagement and persistence with critical for selection of control episodes, which may involve
control episodes over opposing action biases in the striatum multiple costly actions that are only justifiable with respect to
(Figure 2D). The VS, and particularly the nucleus accumbens ultimate goals. Hence, as we describe later, dopaminergic inner-
(NAcc), are regarded as a core limbic-motor interface (Mogen- vation of the VS is particularly important for selecting model-
son et al., 1980), featuring dense reciprocal connections with based behavior constituting control episodes.
both the dopaminergic midbrain and cortical regions including
the ACC (Haber and Knutson, 2010). The DS, as described Striatal DA Release Mediates Incentive Salience and
above, caches value functions controlling the gating of both State Motivation
motor behavior and WM allocation (O’Reilly and Frank, Adaptive engagement with control episodes should involve not
2006). A reconceptualization of these regions, and their dopa- only rigid implementation of cached action values, but should
minergic inputs, describes the VS as a ‘‘critic’’ evaluating also be sensitive to the current motivational state. In the arith-
states and driving DA RPE-based training of action value func- metic example, it would be adaptive to modulate persistence
tions, while the DS serves as the ‘‘actor’’ that learns value upon realizing that incentive point values were larger/smaller
functions for gating cognitive and motor action (Joel et al., than first thought.
2002; van der Meer and Redish, 2011). Here, we highlight The incentive salience hypothesis holds that action values can
the role of DA in the VS in biasing action policies from cortical be modulated instantaneously (i.e., without prior learning) by
regions like the ACC, and DA in the DS in promoting gating of incentive cued striatal DA release (McClure et al., 2003; Phillips
effortful cognitive actions as a function of incentive state and et al., 2008; Zhang et al., 2009). Hunger, e.g., increases instru-
goal proximity. mental lever pressing for food in rats (Phillips et al., 2008). A long-
standing literature implicates striatal DA in modifying value
DA RPEs Train the VS to Encode Net Incentive Value functions and thereby promoting state willingness to expend
In the VS, phasic DA RPEs train cortico-striatal synapses to effort (Bromberg-Martin et al., 2010; Kurniawan et al., 2011; Sal-
reflect the net incentive value of a given state, i.e., expected amone and Correa, 2012). Alternatively, as we discuss later,
reward less expected effort. Hence, fast-scan cyclic voltamme- incentive-cued DA release in the VS, in particular, may promote
try in rats performing an effort-based decision-making task re- flexible approach, increasing apparent willingness to expend
veals NAcc DA release encoding both reward magnitude and effort (McGinty et al., 2013; Nicola, 2010). In this section, we re-
lever-press ratio requirements of corresponding alternatives view evidence for DA’s role in incentive state modulation of
(Day et al., 2010) or the encoding of ratio requirements when de- cached action values.
mands are atypically low (Gan et al., 2009). Phasic DA RPE sig- DA signaling in the VS appears necessary for the selection of
nals, in turn, train synapses to make VS neurons more excitable physical effort. In a canonical paradigm, rats choose between
to states that signal relatively higher reward and lower effort climbing a high barrier for more reward or a low barrier for less
costs. reward, or alternatively, to select a high-ratio lever press option
Human fMRI studies support the hypothesis that the excit- for more reward or a low ratio lever press option for less reward
ability of VS neurons encode net incentive value with respect (see Bromberg-Martin et al., 2010; Kurniawan et al., 2011; Sala-
to effort (see Croxson et al., 2009; Kurniawan et al., 2013; mone and Correa, 2012 for reviews). The typical result is that DA
Schmidt et al., 2012). Striatal BOLD signal during a physical blockade in the VS (along with antagonism in the ACC or lesions

Neuron 89, February 17, 2016 ª2016 Elsevier Inc. 703


Neuron

Review
Figure 6. Striatal Dopamine
Instantaneously Biases Action Selection
(A) Schematic of cortico-striatal neural network
model modified in OpAL such that DA instanta-
neously promotes firing of D1-expressing, direct
pathway Go cells (green region), and inhibits firing
of D2-expressing, indirect pathway NoGo cells
(red region) in the striatum, thereby promoting WM
gating (Collins and Frank, 2014).
(B) Optogenetic stimulation of striatal D1 and D2
cells in rats, during decision-making in a two-
alternative reward-learning task, produce dose-
dependent shifts in preference toward the
contralateral option in case of D1 cell stimulation
(blue) and the ipsilateral option in case of D2
stimulation (red) mimicking shifts in subjective
value functions (as reviewed in Lee et al., 2015).

of the ACC-VS loop) shifts preferences from high reward, high minergic neurons in the striatum, and coincident phasic DA
effort options toward low reward, low effort options. release boosts signal-to-noise: it enhances the contrast between
Human studies also implicate striatal DA signaling in incentive strongly excited synapses corresponding to policy signals at the
motivation. For example, D2/D3 receptor and dopamine trans- time of choice, relative to weakly excited synapses (Figure 2D)
porter density in the NAcc predicts trait-level achievement moti- (Nicola et al., 2004). Thus, phasic DA efflux could instanta-
vation in the individuals with ADHD (Volkow et al., 2011). In a neously amplify cortical action policies projected to the VS
combined fallypride-PET and d-amphetamine challenge study, (Roesch et al., 2009).
human volunteers with the largest DS binding potential, and A recent computational proposal (‘‘OPponent Actor Learning’’
highest sensitivity to d-amphetamine during an instrumental but- or ‘‘OPAL’’; Figure 6A) unifies value learning and incentive
ton-pressing task were more willing to button press for reward salience aspects of DA. In the DS, where gating policies are
(Treadway et al., 2012). Also, systemic DA agonism by the indi- cached in terms of the relative strengths of cortico-striatal syn-
rect agonist d-amphetamine ameliorates physical effort deficits apses onto D1-expressing Go and D2-expressing NoGo cells,
among individuals with Parkinson’s disease (Chong et al., DA should increase Go cell firing, and inhibit NoGo cells, thus
2015). In the cognitive domain, VS BOLD signal interacted with modulating cached policies in favor of gating actions (Collins
a genetic D2 receptor density marker to predict individual differ- and Frank, 2014). According to this proposal, DA not only influ-
ences in WM performance (Nymberg et al., 2014). ences the learning of cached value functions, but can instanta-
The ability of incentive-cued DA release to energize behavior neously modulate those value functions at the time of choice.
appears to critically depend on D2 receptor signaling in the The most direct evidence comes from an optogenetic study in
NAcc core. In an instrumental lever-pressing task, transient which lateralized populations of DS Go and NoGo cells in rats
GABAergic inactivation of the NAcc core, but not the shell, were stimulated independently (Tai et al., 2012). Stimulation of
shifted preferences from high effort-high reward to low effort- Go cells yielded an apparent shift in preference to a contralateral
low reward alternatives (Ghods-Sharifi and Floresco, 2010). option, while stimulation of NoGo cells yielded an apparent shift
Additionally, rats treated with a viral vector yielding acute over- to an ipsilateral option. This shift mimicked additive effects in
expression of D2 receptors in the NAcc showed enhanced subjective value of one option over the other (Figure 6B). Hence,
instrumental lever-pressing (Trifilieff et al., 2013). We note that DS DA may translate incentive motivation into the selection of
studies using animal models with developmentally over- control episodes by increasing the subjective value of WM
expressed D2 receptors have also shown the reverse effect, allocation.
i.e., decreased incentive motivation (Krabbe et al., 2015; Ward
et al., 2015). However, this reverse effect may be due to comor- DA in the VS Promotes Model-Based Behavior
bid, developmental under-expression of NMDA NR1 and NR2B The capacity of phasic DA in the VS to bias cortical action pol-
receptors on VTA neurons, which reduces both their firing fre- icies related to effortful action is most critical in the early stages
quency and burst firing (Krabbe et al., 2015). of successful control episodes when behavior is necessarily
model-based. For example, in the two-digit mental arithmetic
DA Promotes Effortful Action by Promoting Cortical problem, the actor must consider the point-valued outcome
Action Policy Signals in the VS and Increasing the when deciding whether to engage and persist in the episode,
Likelihood of Gating in the DS since the immediate value of initial subgoals, e.g., computing
How does striatal DA bias selection of effortful action? By one the ones-digit product, is net-negative. Here, VS DA appears
proposal, action policies from the cortex, including canonical critical for promoting model-based behavior, of the kind neces-
economic decision-making regions like the ACC and the OFC, sary for persistence in these early stages. Indeed, higher presyn-
are sent via axons that jointly synapse along with midbrain dopa- aptic striatal DA, as measured by [F]DOPA PET, predicts greater

704 Neuron 89, February 17, 2016 ª2016 Elsevier Inc.


Neuron

Review

chaining, whether that distance is a function of space, time, or


the number of subgoals in a two-digit multiplication.

DA Tone in the Striatum Reflects Goal Progress and


Invigorates Action
As progress is made within a control episode, and percepts nar-
row in on a goal state, action invigoration becomes more impor-
tant. Consider the final stage of a two-digit multiplication, when
cortical representations of products for summation increasingly
suggest the ultimate solution. At this stage, consideration of
outcome incentives becomes less important than quick and
robust execution of a retrieval action to finalize the solution.
Intriguingly, recently discovered DA dynamics appear well-
suited to subserve this functional shift from model-based control
Figure 7. Ramping Dopamine Encodes Goal Progress to invigoration (Figure 7). Namely, striatal DA tone ramps up
VS DA as measured by fast-scan cyclic voltammetry, during a single trial in smoothly, encoding goal progress (Howe et al., 2013). This dy-
which a rat progresses through a t-maze toward a final goal state.
The red vertical lines indicate, from left to right, the timing of an audible click
namic, discovered with fast-scan cyclic voltammetry in rats navi-
cueing trial start, a tone indicating the direction the rat should turn, and finally gating mazes, was found to scale with reward magnitude, and
successful goal attainment (chocolate milk). The ramp slopes reflected relative encode relative, rather than absolute distance to the goal.
path distance, rather than absolute path distance or time-on-trial, and scaled The mechanism of DA ramping is not clear, whether it reflects
with reward magnitude (Howe et al., 2013).
local release or ramping firing of midbrain DA cells. Ramping may
actually result from the progressive accumulation, or ‘‘spill-over’’
reliance on model-based decision-making in a two-stage from phasic DA release (Gershman, 2014), e.g., as progress is
sequential decision-making task and also predicted decreased made. In particular, phasic DA release may reflect the temporal
reliance on habitual associations as encoded in striatal BOLD derivative of a running average rate of progress as tracked by
signal (Deserno et al., 2015). This could also explain why humans the ACC and OFC (O’Reilly et al., 2014) or pseudo-reward in
dosed with systemic DA agonists show more model-based rela- hierarchical RL (Ribas-Fernandes et al., 2011).
tive to model-free decision-making (Wunderlich et al., 2012), An important functional consequence of rising striatal DA tone
especially to the extent that phasic signaling can be boosted is the invigoration of behavior. Specifically, striatal DA tone is
by greater extrasynaptic tone (Dreyer et al., 2010). thought to encode the average rate of experienced reward and
The emphasis on promoting model-based behavior aligns with promote vigor (inverse latency to responding) adaptively, such
a reconceptualization, in which VS DA supports flexible that higher rates of reward imply a richer local resource that an
approach or persisting in goal-directed (and therefore model- actor should act more quickly to obtain (Niv et al., 2007). Hence,
based) behavior (McGinty et al., 2013; Nicola, 2010), rather sufficiently fast progress toward the final goal yields ramping
than overcoming instrumental costs per se. The flexible striatal DA tone, which can also promote action invigoration as
approach hypothesis states that during periods in which reward the goal nears.
are not immediately available, agents are more likely to disen- We close this section on dopaminergic mediation of value-
gage and, because they can assume different positions with learning and effort-based decision-making by noting conflicting
respect to operanda during such pauses, NAcc DA is needed evidence. First, a recent study has shown that DA-based cached
to flexibly reapproach and engage. values do not necessarily map onto preferred actions (Hollon
By this account, much of the extant literature on NAcc DA pro- et al., 2014). Specifically, fast-scan cyclic voltammetry in the
moting instrumental effort can be reinterpreted, wherein subtle rat NAcc revealed that DA tone was higher on trials in which
task features allow more opportunities for disengagement in the rat was forced to choose a dis-preferred high effort high
conditions for which effort demands are higher, placing more de- reward option over a preferred low effort low reward option. Of
mands on NAcc DA to support flexible approach (Nicola, 2010). course, DA may play different roles in forced- and free-choice
This could explain why NAcc DA depletion does not always decision-making, but this result suggests that, at least in some
affect effort-based decision-making about instrumental lever contexts, the rank-ordered relationship between DA and prefer-
pressing in rats, e.g., when instrumental task design permits ence can be violated. Second, we note recent work aimed at
few opportunities for disengagement (Walton et al., 2009). It developing a rodent model of cognitive effort-based decision-
further explains the observation that NAcc DA is only necessary making (see Hosking et al., 2015). This work has provided mixed
for initiating instrumental lever-pressing when there are longer evidence so far regarding the consequences of systemic, phar-
pauses between action opportunities (Nicola, 2010). macological DA manipulation on willingness to expend cognitive
Regardless of whether VS DA is necessary for overcoming (versus physical) effort. It is open for debate whether the new
effort costs or for flexible approach, the selection of effortful ac- rodent model represents the sorts of cognitive effort-based
tion sequences, like those comprising control episodes, requires decision-making that is of focal interest for control episodes
VS DA (Nicola, 2007). Moreover, this may be particularly true and whether the task sufficiently discriminates effort-based
when there is greater ‘‘psychological distance’’ from goals (Sal- from probabilistic decision-making. Nevertheless, a rodent
amone and Correa, 2012), involving deeper action-outcome model obviously holds great promise for more fine-grained

Neuron 89, February 17, 2016 ª2016 Elsevier Inc. 705


Neuron

Review

investigation into the neural circuitry mediating decisions about PFC DA tone is unlikely to yield destabilization. As noted in a
cognitive effort. recent review (Spencer et al., 2015), intra-PFC injections of
methylphenidate in rats, boosting DA tone, do not impair WM
at concentrations that are 16- to 32-fold higher than clinically
DA Translates Incentives into Cognitive Motivation: relevant methylphenidate doses. This stands in contrast to the
Summary Proposal observation that systemic administration of methylphenidate
Here, we recapitulate the proposal we have been building, can impair WM at 4-fold concentrations higher than clinical
whereby DA does double duty during costly control episodes. doses.
We define control episodes as temporally extended sequences An account of such discrepancies (between systemic and
in which WM is allocated to represent the rules needed to guide localized DA pharmacological manipulations) is that they relate
goal-directed behavior. During an episode, DA does double duty to distinctions between DA modulation of PFC versus the stria-
in that it: (1) influences WM contents by functional modulation of tum. Specifically, systemic high-dose DA manipulations may pri-
WM circuits, and (2) supports value-learning and decision-mak- marily act in the striatum where high DA tone can potentiate
ing about effortful cognitive actions (Figure 2). gating indiscriminately (Cools and D’Esposito, 2011). In a recent
Generally, we propose that: PET study demonstrating this effect in humans, the conse-
d Phasic DA RPE signals encode goal benefits and effort quences of incentive motivation on performance of a Stroop
costs for control episodes, caching net values in terms of task were investigated as a function of individual differences in
LTP and LTD of cortico-striatal synapses. baseline striatal DA synthesis capacity (using 6-[18F]fluoro-
d Incentive-linked DA release instantaneously augments l-m-tyrosine uptake) (Aarts et al., 2014). The key finding was
cached values, increasing the likelihood of gating relevant that while incentives enhanced performance for some partici-
task sets into WM in the striatum, thereby initiating control pants, those with highest baseline synthesis capacity saw a
episodes associated with high incentive value. decrement in incentivized performance. This pattern is consis-
d During control episodes, the ACC tracks both accruing op- tent with the interpretation that incentive-cued striatal DA release
portunity costs and incremental progress, the balance of for those with high baseline DA synthesis capacity yielded indis-
these is conveyed to midbrain DA neurons, where it is criminant updating, undermining performance. In our proposal,
then transmitted to the striatum and PFC as phasic, striatal DA tone rises when progress outpaces opportunity
effort-discounted, pseudo-RPE signals. costs; however, indiscriminate updating is typically prevented
d In the PFC, rising DA tone encoding fast goal progress (or (under non-pharmacological conditions) by the imposition of hi-
high incentive state) enhances the robustness of persistent erarchical, targeted updating policies guided by PFC WM repre-
activity, thereby stabilizing active maintenance in recurrent sentations. Thus striatal DA tone interacts with targeted updating
networks representing task goals. policies to maintain engagement with the current control
d In the VS, DA release promotes drive (or flexible approach) episode.
to select extended sequences of goal-directed behavior. In focusing on DA, we have neglected other potentially rele-
This is particularly critical at early stages of a control vant neurotransmitter systems. Norepinephrine, for example,
episode. As the goal state nears, ramping DA tone invigo- has similar effects on the stability of WM representations and
rates (potentiates) action gating, including WM allocation also responds like DA to incentive cues (Sara, 2009). A recent
actions. study showed that while SN neurons encoded net cost-benefit
d In the DS, DA tone encoding sufficiently fast goal progress during effort-based decision-making, locus coeruleus neurons
in a ramping fashion increases the general likelihood of encoded effort demands during task execution, suggesting a
task set updating. However, hierarchically structured task dissociation between DA in action selection and norepinephrine
sets in the PFC interact with DS to target lower-level task in action execution (Varazzani et al., 2015). Adenosine, for its
sets for contextually appropriate out-gating. Thus, spe- part, appears to interact with the midbrain dopaminergic system
cific, lower-level flexibility is promoted while high-level to regulate effort-based decision-making and may account for
goal maintenance is sustained during the control episode. the effects of caffeine on cognitive effort (Salamone et al.,
d Conversely, to the extent that opportunity costs outpace 2012). Serotonin has also been proposed to oppose DA learning
incremental progress, the likelihood of disengagement effects and may subserve effort cost learning (Boureau and
rises. This may result from falling PFC DA tone, reducing Dayan, 2010). Nevertheless, we think that DA in particular has
the stability of WM representations or reduced likelihood a number of useful properties that position it best for mediating
of WM gating in cortico-striatal-thalamic loops. Declining incentive motivation for cognitive effort.
DA release undermines goal-directed flexible approach ef- Our proposal is similar in scope to other recent proposals. As
fects in the VS, further potentiating distraction. described above, the Expected Value of Control proposal (Shen-
hav et al., 2013) considers cognitive control recruitment as driven
Gaps in our account remain. We have described how rising by net expected value computations in ACC. A recent RL model
PFC DA promotes task set stability, yet we have also pointed (Holroyd and McClure, 2015) also considers the role of the ACC in
to evidence that supraoptimal PFC DA tone yields destabiliza- value-based regulation of cognitive control and thus offers spe-
tion, and, indeed, how rapid PFC DA efflux could act as a global cific predictions about the influence of reward dynamics on
updating signal, indiscriminately destabilizing all current repre- effortful action including upregulation when reward rates are
sentations. However, we think that, for most operating regimes, below average and downregulation when reward rates are above

706 Neuron 89, February 17, 2016 ª2016 Elsevier Inc.


Neuron

Review

average. There are numerous points of theoretical overlap among Boehler, C.N., Hopf, J.M., Krebs, R.M., Stoppel, C.M., Schoenfeld, M.A.,
Heinze, H.J., and Noesselt, T. (2011). Task-load-dependent activation of
our proposals. For example, in all three, the ACC biases the selec- dopaminergic midbrain areas in the absence of reward. J. Neurosci. 31,
tion of effortful cognitive control actions via interactions with the 4955–4961.
striatum. Our proposal complements the other two accounts by
Botvinick, M.M. (2007). Conflict monitoring and decision making: reconciling
articulating the varied and precise roles by which DA contributes two perspectives on anterior cingulate function. Cogn. Affect. Behav. Neuro-
to the value-based regulation of WM systems during control ep- sci. 7, 356–366.
isodes. For example, in the computational model of Holroyd and Botvinick, M., and Braver, T. (2015). Motivation and cognitive control: from
McClure (2015), DA interacts with ACC signals such that when behavior to neural mechanism. Annu. Rev. Psychol. 66, 83–113.
average reward, and presumably DA tone (Niv et al., 2007), are Botvinick, M.M., Braver, T.S., Barch, D.M., Carter, C.S., and Cohen, J.D.
high, control signals are boosted because outcomes are more (2001). Conflict monitoring and cognitive control. Psychol. Rev. 108, 624–652.
likely to fall below the average reward rate. Conversely, striatal
Botvinick, M.M., Huffstetler, S., and McGuire, J.T. (2009). Effort discounting in
DA blockade is posited to be computationally equivalent to low human nucleus accumbens. Cogn. Affect. Behav. Neurosci. 36, 74–97.
average reward, thus any reward is effectively above average
Boureau, Y.-L., and Dayan, P. (2010). Opponency revisited: competition and
and control signals dissipate. In our proposal, by contrast, striatal cooperation between dopamine and serotonin. Neuropsychopharmacology
DA blockade also has the effect of diminishing effortful cognitive 9, 16–27.
control, but it has its effects not in terms of a shift in perceived
Braver, T.S. (2012). The variable nature of cognitive control: a dual mecha-
average reward, but in terms of diminished WM stability in the nisms framework. Trends Cogn. Sci. 16, 106–113.
PFC and targeted flexibility via the striatum.
Braver, T.S., Barch, D.M., and Cohen, J.D. (1999). Cognition and control in
We acknowledge the tentative nature of our proposal. Compu- schizophrenia: a computational model of dopamine and prefrontal function.
tational modeling and experimental validation are required to Biol. Psychiatry 46, 312–328.
ensure DA has the functional capacity to subserve adaptive Bromberg-Martin, E.S., Matsumoto, M., and Hikosaka, O. (2010). Dopamine in
engagement and persistence in the ways we hypothesize. motivational control: rewarding, aversive, and alerting. Neuron 68, 815–834.
Nevertheless, we hope this conceptual sketch unifies disparate
Brown, J.W., and Braver, T.S. (2005). Learned predictions of error likelihood in
literatures on DA’s various functional properties and prompts the anterior cingulate cortex. Science 307, 1118–1121.
development of a comprehensive theory of DA in cognitive effort.
Brunel, N., and Wang, X.-J. (2001). Effects of neuromodulation in a cortical
We have highlighted DA’s roles in value-learning and effort- network model of object working memory dominated by recurrent inhibition.
based decision-making and also the direct functional modulation J. Comput. Neurosci. 11, 63–85.
of WM circuits and thereby WM contents by phasic and tonic DA Butts, K.A., Weinberg, J., Young, A.H., and Phillips, A.G. (2011). Glucocorti-
modes. The integration of these two broad literatures together coid receptors in the prefrontal cortex regulate stress-evoked dopamine efflux
indicates double duty for DA in motivating cognitive effort. and aspects of executive function. Proc. Natl. Acad. Sci. USA 108, 18459–
18464.

ACKNOWLEDGMENTS Cacioppo, J., Petty, R., Feinstein, J., and Jarvis, W. (1996). Dispositional differ-
ences in cognitive motivation: The life and times of individuals varying in need
for cognition. Psychol. Bull. 119, 197–253.
This work was supported by grants from the National Institute of Mental Health
(1F31 MH100855-01 to A.W. and 1R21 MH105800 to T.S.B.) and the National Cavanagh, J.F., Masters, S.E., Bath, K., and Frank, M.J. (2014). Conflict acts
Institute of Aging (R01 AG043461 to T.S.B.). as an implicit cost in reinforcement learning. Nat. Commun. 5, 5394.

Chatham, C.H., and Badre, D. (2013). Working memory management and


REFERENCES predicted utility. Front. Behav. Neurosci. 7, 83.

Chatham, C.H., and Badre, D. (2015). Multiple gates on working memory.


Aalto, S., Brück, A., Laine, M., Någren, K., and Rinne, J.O. (2005). Frontal and
Current Opinion in Behavioral Sciences 1, 23–31.
temporal dopamine release during working memory and attention tasks in
healthy humans: a positron emission tomography study using the high-affinity Chatham, C.H., Frank, M.J., and Badre, D. (2014). Corticostriatal output gating
dopamine D2 receptor ligand [11C]FLB 457. J. Neurosci. 25, 2471–2477. during selection from working memory. Neuron 81, 930–942.
Aarts, E., Roelofs, A., Franke, B., Rijpkema, M., Fernández, G., Helmich, R.C., Chiew, K.S., and Braver, T.S. (2014). Dissociable influences of reward motiva-
and Cools, R. (2010). Striatal dopamine mediates the interface between moti- tion and positive emotion on cognitive control. Cogn. Affect. Behav. Neurosci.
vational and cognitive control in humans: evidence from genetic imaging. 14, 509–529.
Neuropsychopharmacology 35, 1943–1951.
Chong, T.T.J., Bonnelle, V., Manohar, S., Veromann, K.-R., Muhammed, K.,
Aarts, E., Wallace, D.L., Dang, L.C., Jagust, W.J., Cools, R., and D’Esposito, Tofaris, G.K., Hu, M., and Husain, M. (2015). Dopamine enhances willingness
M. (2014). Dopamine and the cognitive downside of a promised bonus. to exert effort for reward in Parkinson’s disease. Cortex 69, 40–46.
Psychol. Sci. 25, 1003–1009.
Collins, A.G.E., and Frank, M.J. (2014). Opponent actor learning (OpAL):
Alexander, W.H., and Brown, J.W. (2011). Medial prefrontal cortex as an ac- modeling interactive effects of striatal dopamine on reinforcement learning
tion-outcome predictor. Nat. Neurosci. 14, 1338–1344. and choice incentive. Psychol. Rev. 121, 337–366.

Badre, D., and Frank, M.J. (2012). Mechanisms of hierarchical reinforcement Cools, R., and D’Esposito, M. (2011). Inverted-U-shaped dopamine actions on
learning in cortico-striatal circuits 2: evidence from fMRI. Cereb. Cortex 22, human working memory and cognitive control. Biol. Psychiatry 69, e113–e125.
527–536.
Cowen, S.L., Davis, G.A., and Nitz, D.A. (2012). Anterior cingulate neurons in
Bahlmann, J., Aarts, E., and D’Esposito, M. (2015). Influence of motivation on the rat map anticipated effort and reward to their associated action se-
control hierarchy in the human frontal cortex. J. Neurosci. 35, 3207–3217. quences. J. Neurophysiol. 107, 2393–2407.

Berridge, C.W., and Arnsten, A.F.T. (2013). Psychostimulants and motivated Croxson, P.L., Walton, M.E., O’Reilly, J.X., Behrens, T.E.J., and Rushworth,
behavior: arousal and cognition. Neurosci. Biobehav. Rev. 37 (9 Pt A), 1976– M.F.S. (2009). Effort-based cost-benefit valuation and the human brain.
1984. J. Neurosci. 29, 4531–4541.

Neuron 89, February 17, 2016 ª2016 Elsevier Inc. 707


Neuron

Review
D’Ardenne, K., Eshel, N., Luka, J., Lenartowicz, A., Nystrom, L.E., and Cohen, Haber, S.N., and Knutson, B. (2010). The reward circuit: linking primate anat-
J.D. (2012). Role of prefrontal cortex and the midbrain dopamine system in omy and human imaging. Neuropsychopharmacology 35, 4–26.
working memory updating. Proc. Natl. Acad. Sci. USA 109, 19900–19909.
Hammar, A., Strand, M., Årdal, G., Schmid, M., Lund, A., and Elliott, R. (2011).
D’Ardenne, K., Lohrenz, T., Bartley, K.A., and Montague, P.R. (2013). Compu- Testing the cognitive effort hypothesis of cognitive impairment in major
tational heterogeneity in the human mesencephalic dopamine system. Cogn. depression. Nord. J. Psychiatry 65, 74–80.
Affect. Behav. Neurosci. 13, 747–756.
Herd, S.A., O’Reilly, R.C., Hazy, T.E., Chatham, C.H., Brant, A.M., and Fried-
Daw, N.D., Gershman, S.J., Seymour, B., Dayan, P., and Dolan, R.J. (2011). man, N.P. (2014). A neural network model of individual differences in task
Model-based influences on humans’ choices and striatal prediction errors. switching abilities. Neuropsychologia 62, 375–389.
Neuron 69, 1204–1215.
Hess, T.M., and Ennis, G.E. (2012). Age differences in the effort and costs
Day, J.J., Jones, J.L., Wightman, R.M., and Carelli, R.M. (2010). Phasic nu- associated with cognitive activity. J. Gerontol. B Psychol. Sci. Soc. Sci. 67,
cleus accumbens dopamine release encodes effort- and delay-related costs. 447–455.
Biol. Psychiatry 68, 306–309.
Hillman, K.L., and Bilkey, D.K. (2012). Neural encoding of competitive effort in
Dayan, P. (2012). How to set the switches on this thing. Curr. Opin. Neurobiol. the anterior cingulate cortex. Nat. Neurosci. 15, 1290–1297.
22, 1068–1074.
Hiroyuki, N. (2014). Multiplexing signals in reinforcement learning with internal
Deserno, L., Huys, Q.J.M., Boehme, R., Buchert, R., Heinze, H.-J., Grace, A.A., models and dopamine. Curr. Opin. Neurobiol. 25, 123–129.
Dolan, R.J., Heinz, A., and Schlagenhauf, F. (2015). Ventral striatal dopamine
reflects behavioral and neural signatures of model-based control during Hollon, N.G., Arnold, M.M., Gan, J.O., Walton, M.E., and Phillips, P.E.M.
sequential decision making. Proc. Natl. Acad. Sci. USA 112, 1595–1600. (2014). Dopamine-associated cached values are not sufficient as the basis
for action selection. Proc. Natl. Acad. Sci. USA 111, 18357–18362.
Dixon, M.L., and Christoff, K. (2012). The decision to engage cognitive control
is driven by expected reward-value: neural and behavioral evidence. PLoS Holroyd, C.B., and Coles, M.G.H. (2002). The neural basis of human error pro-
ONE 7, e51637. cessing: reinforcement learning, dopamine, and the error-related negativity.
Psychol. Rev. 109, 679–709.
Dolan, R.J., and Dayan, P. (2013). Goals and habits in the brain. Neuron 80,
312–325. Holroyd, C.B., and Yeung, N. (2012). Motivation of extended behaviors by
anterior cingulate cortex. Trends Cogn. Sci. 16, 122–128.
Donoso, M., Collins, A.G., and Koechlin, E. (2014). Human cognition. Founda-
tions of human reasoning in the prefrontal cortex. Science 344, 1481–1486. Holroyd, C.B., and McClure, S.M. (2015). Hierarchical control over effortful
behavior by rodent medial frontal cortex: A computational model. Psychol.
Dosenbach, N.U.F., Visscher, K.M., Palmer, E.D., Miezin, F.M., Wenger, K.K., Rev. 122, 54–83.
Kang, H.C., Burgund, E.D., Grimes, A.L., Schlaggar, B.L., and Petersen, S.E.
(2006). A core system for the implementation of task sets. Neuron 50, 799–812. Hosking, J.G., Floresco, S.B., and Winstanley, C.A. (2015). Dopamine antago-
nism decreases willingness to expend physical, but not cognitive, effort:
Dreisbach, G., and Goschke, T. (2004). How positive affect modulates cogni- a comparison of two rodent cost/benefit decision-making tasks. Neuropsy-
tive control: reduced perseveration at the cost of increased distractibility. chopharmacology 40, 1005–1015.
J. Exp. Psychol. Learn. Mem. Cogn. 30, 343–353.
Howe, M.W., Tierney, P.L., Sandberg, S.G., Phillips, P.E.M., and Graybiel,
Dreisbach, G., and Fischer, R. (2012). Conflicts as aversive signals. Brain A.M. (2013). Prolonged dopamine signalling in striatum signals proximity and
Cogn. 78, 94–98. value of distant rewards. Nature 500, 575–579.

Dreyer, J.K., Herrik, K.F., Berg, R.W., and Hounsgaard, J.D. (2010). Influence Inzlicht, M., Schmeichel, B.J., and Macrae, C.N. (2014). Why self-control
of phasic and tonic dopamine release on receptor activation. J. Neurosci. 30, seems (but may not be) limited. Trends Cogn. Sci. 18, 127–133.
14273–14283.
Jimura, K., Locke, H.S., and Braver, T.S. (2010). Prefrontal cortex mediation of
Duncan, J. (2013). The structure of cognition: attentional episodes in mind and cognitive enhancement in rewarding motivational contexts. Proc. Natl. Acad.
brain. Neuron 80, 35–50. Sci. USA 107, 8871–8876.

Durstewitz, D., and Seamans, J.K. (2008). The dual-state theory of prefrontal Joel, D., Niv, Y., and Ruppin, E. (2002). Actor-critic models of the basal ganglia:
cortex dopamine function with relevance to catechol-o-methyltransferase ge- new anatomical and computational perspectives. Neural Netw. 15, 535–547.
notypes and schizophrenia. Biol. Psychiatry 64, 739–749.
Kahnt, T., Weber, S.C., Haker, H., Robbins, T.W., and Tobler, P.N. (2015).
Etzel, J., Cole, M.W., Zacks, J.M., Kay, K.N., and Braver, T.S. (2015). Reward Dopamine D2-receptor blockade enhances decoding of prefrontal signals in
motivation enhances task coding in frontoparietal cortex. Cereb. Cortex. Pub- humans. J. Neurosci. 35, 4104–4111.
lished online January 19, 2015. http://dx.doi.org/10.1093/cercor/bhu327.
Kayser, A.S., Mitchell, J.M., Weinstein, D., and Frank, M.J. (2014). Dopamine,
Frank, M.J., and Badre, D. (2012). Mechanisms of hierarchical reinforcement locus of control, and the exploration-exploitation tradeoff. Neuropsychophar-
learning in corticostriatal circuits 1: computational analysis. Cereb. Cortex macology 40, 454–462.
22, 509–526.
Kennerley, S.W., Dahmubed, A.F., Lara, A.H., and Wallis, J.D. (2009). Neurons
Frank, M.J., Loughry, B., and O’Reilly, R.C. (2001). Interactions between fron- in the frontal lobe encode the value of multiple decision variables. J. Cogn.
tal cortex and basal ganglia in working memory: a computational model. Cogn. Neurosci. 21, 1162–1178.
Affect. Behav. Neurosci. 1, 137–160.
Kennerley, S.W., Behrens, T.E.J., and Wallis, J.D. (2011). Double dissociation
Gan, J.O., Walton, M.E., and Phillips, P.E.M. (2009). Dissociable cost and of value computations in orbitofrontal and anterior cingulate neurons. Nat.
benefit encoding of future rewards by mesolimbic dopamine. Nat. Neurosci. Neurosci. 14, 1581–1589.
13, 25–27.
Kodama, T., Hikosaka, K., Honda, Y., Kojima, T., and Watanabe, M. (2014).
Gershman, S.J. (2014). Dopamine ramps are a consequence of reward predic- Higher dopamine release induced by less rather than more preferred reward
tion errors. Neural Comput. 26, 467–471. during a working memory task in the primate prefrontal cortex. Behav. Brain
Res. 266, 104–107.
Ghods-Sharifi, S., and Floresco, S.B. (2010). Differential effects on effort dis-
counting induced by inactivations of the nucleus accumbens core or shell. Kool, W., McGuire, J.T., Rosen, Z.B., and Botvinick, M.M. (2010). Decision
Behav. Neurosci. 124, 179–191. making and the avoidance of cognitive demand. J. Exp. Psychol. Gen. 139,
665–682.
Gläscher, J., Hampton, A.N., and O’Doherty, J.P. (2009). Determining a role for
ventromedial prefrontal cortex in encoding action-based value signals during Kouneiher, F., Charron, S., and Koechlin, E. (2009). Motivation and cognitive
reward-related decision making. Cereb. Cortex 19, 483–495. control in the human prefrontal cortex. Nat. Neurosci. 12, 939–945.

708 Neuron 89, February 17, 2016 ª2016 Elsevier Inc.


Neuron

Review
Krabbe, S., Duda, J., Schiemann, J., Poetschke, C., Schneider, G., Kandel, Nicola, S.M. (2010). The flexible approach hypothesis: unification of effort and
E.R., Liss, B., Roeper, J., and Simpson, E.H. (2015). Increased dopamine D2 cue-responding hypotheses for the role of nucleus accumbens dopamine in
receptor activity in the striatum alters the firing pattern of dopamine neurons the activation of reward-seeking behavior. J. Neurosci. 30, 16585–16600.
in the ventral tegmental area. Proc. Natl. Acad. Sci. USA 112, E1498–E1506.
Nicola, S.M., Hopf, F.W., and Hjelmstad, G.O. (2004). Contrast enhancement:
Krebs, R.M., Boehler, C.N., Roberts, K.C., Song, A.W., and Woldorff, M.G. a physiological effect of striatal dopamine? Cell Tissue Res. 318, 93–106.
(2012). The involvement of the dopaminergic midbrain and cortico-striatal-
thalamic circuits in the integration of reward prospect and attentional task Niv, Y. (2009). Reinforcement learning in the brain. J. Math. Psychol. 53,
demands. Cereb. Cortex 22, 607–615. 139–154.

Kroemer, N.B., Guevara, A., Ciocanea Teodorescu, I., Wuttig, F., Kobiella, A., Niv, Y., Daw, N.D., Joel, D., and Dayan, P. (2007). Tonic dopamine: opportunity
and Smolka, M.N. (2014). Balancing reward and work: anticipatory brain acti- costs and the control of response vigor. Psychopharmacology (Berl.) 191,
vation in NAcc and VTA predict effort differentially. Neuroimage 102, 510–519. 507–520.

Nymberg, C., Banaschewski, T., Bokde, A.L., Büchel, C., Conrod, P., Flor, H.,
Kurniawan, I.T., Seymour, B., Talmi, D., Yoshida, W., Chater, N., and Dolan,
Frouin, V., Garavan, H., Gowland, P., Heinz, A., et al.; IMAGEN consortium
R.J. (2010). Choosing to make an effort: the role of striatum in signaling phys-
(2014). DRD2/ANKK1 polymorphism modulates the effect of ventral striatal
ical effort of a chosen action. J. Neurophysiol. 104, 313–321.
activation on working memory performance. Neuropsychopharmacology 39,
2357–2365.
Kurniawan, I.T., Guitart-Masip, M., and Dolan, R.J. (2011). Dopamine and
effort-based decision making. Front. Neurosci. 5, 81. O’Reilly, R.C., and Frank, M.J. (2006). Making working memory work:
a computational model of learning in the prefrontal cortex and basal ganglia.
Kurniawan, I.T., Guitart-Masip, M., Dayan, P., and Dolan, R.J. (2013). Effort Neural Comput. 18, 283–328.
and valuation in the brain: the effects of anticipation and execution.
J. Neurosci. 33, 6160–6169. O’Reilly, R.C., Hazy, T.E., Mollick, J., Mackie, P., and Herd, S. (2014). Goal-
driven cognition in the brain: a computational framework. arXiv,
Kurzban, R., Duckworth, A., Kable, J.W., and Myers, J. (2013). An opportunity arXiv:1404.7591, http://www.arxiv.org/abs/1404.7591.
cost model of subjective effort and task performance. Behav. Brain Sci. 36,
661–679. Padoa-Schioppa, C. (2011). Neurobiology of economic choice: a good-based
model. Annu. Rev. Neurosci. 34, 333–359.
Lee, A.M., Tai, L.H., Zador, A., and Wilbrecht, L. (2015). Between the primate
and ‘reptilian’ brain: Rodent models demonstrate the role of corticostriatal Parvizi, J., Rangarajan, V., Shirer, W.R., Desai, N., and Greicius, M.D. (2013).
circuits in decision making. Neuroscience 296, 66–74. The will to persevere induced by electrical stimulation of the human cingulate
gyrus. Neuron 80, 1359–1367.
Li, J., and Daw, N.D. (2011). Signals in human striatum are appropriate for pol-
icy update rather than value prediction. J. Neurosci. 31, 5504–5511. Pasquereau, B., and Turner, R.S. (2013). Limited encoding of effort by dopa-
mine neurons in a cost-benefit trade-off task. J. Neurosci. 33, 8288–8300.
Locke, H.S., and Braver, T.S. (2008). Motivational influences on cognitive con-
trol: behavior, brain activation, and individual differences. Cogn. Affect. Behav. Pessoa, L., and Engelmann, J.B. (2010). Embedding reward signals into
Neurosci. 8, 99–112. perception and cognition. Front. Neurosci. 4, 4.

Phillips, A.G., Vacca, G., and Ahn, S. (2008). A top-down perspective on dopa-
Ma, L., Hyman, J.M., Phillips, A.G., and Seamans, J.K. (2014). Tracking prog- mine, motivation and memory. Pharmacol. Biochem. Behav. 90, 236–249.
ress toward a goal in corticostriatal ensembles. J. Neurosci. 34, 2244–2253.
Prévost, C., Pessiglione, M., Météreau, E., Cléry-Melin, M.-L., and Dreher,
Massar, S.A.A., Libedinsky, C., Weiyan, C., Huettel, S.A., and Chee, M.W.L. J.-C. (2010). Separate valuation subsystems for delay and effort decision
(2015). Separate and overlapping brain areas encode subjective value during costs. J. Neurosci. 30, 14080–14090.
delay and effort discounting. Neuroimage 120, 104–113.
Raichle, M.E., and Mintun, M.A. (2006). Brain work and brain imaging. Annu.
Matsumoto, M., and Takada, M. (2013). Distinct representations of cognitive Rev. Neurosci. 29, 449–476.
and motivational signals in midbrain dopamine neurons. Neuron 79, 1011–
1024. Ribas-Fernandes, J.J.F., Solway, A., Diuk, C., McGuire, J.T., Barto, A.G., Niv,
Y., and Botvinick, M.M. (2011). A neural signature of hierarchical reinforcement
McClure, S.M., Daw, N.D., and Montague, P.R. (2003). A computational sub- learning. Neuron 71, 370–379.
strate for incentive salience. Trends Neurosci. 26, 423–428.
Riggall, A.C., and Postle, B.R. (2012). The relationship between working mem-
McGinty, V.B., Lardeux, S., Taha, S.A., Kim, J.J., and Nicola, S.M. (2013). ory storage and elevated activity as measured with functional magnetic reso-
Invigoration of reward seeking by cue and proximity encoding in the nucleus nance imaging. J. Neurosci. 32, 12990–12998.
accumbens. Neuron 78, 910–922.
Robbins, T.W., and Arnsten, A.F. (2009). The neuropsychopharmacology of
McGuire, J.T., and Botvinick, M.M. (2010). Prefrontal cortex, cognitive control, fronto-executive function: monoaminergic modulation. Annu. Rev. Neurosci.
and the registration of decision costs. Proc. Natl. Acad. Sci. USA 107, 7922– 32, 267–287.
7926.
Roesch, M.R., Singh, T., Brown, P.L., Mullins, S.E., and Schoenbaum, G.
Meyniel, F., Sergent, C., Rigoux, L., Daunizeau, J., and Pessiglione, M. (2013). (2009). Ventral striatal neurons encode the value of the chosen action in rats
Neurocomputational account of how the human brain decides when to have a deciding between differently delayed or sized rewards. J. Neurosci. 29,
break. Proc. Natl. Acad. Sci. USA 110, 2641–2646. 13365–13376.

Salamone, J.D., and Correa, M. (2012). The mysterious motivational functions


Miller, E.K., and Cohen, J.D. (2001). An integrative theory of prefrontal cortex of mesolimbic dopamine. Neuron 76, 470–485.
function. Annu. Rev. Neurosci. 24, 167–202.
Salamone, J.D., Correa, M., Nunes, E.J., Randall, P.A., and Pardo, M. (2012).
Mogenson, G.J., Jones, D.L., and Yim, C.Y. (1980). From motivation to action: The behavioral pharmacology of effort-related choice behavior: dopamine,
functional interface between the limbic system and the motor system. Prog. adenosine and beyond. J. Exp. Anal. Behav. 97, 125–146.
Neurobiol. 14, 69–97.
Samanez-Larkin, G.R., Buckholtz, J.W., Cowan, R.L., Woodward, N.D., Li, R.,
Montague, P.R., Dayan, P., and Sejnowski, T.J. (1996). A framework for Ansari, M.S., Arrington, C.M., Baldwin, R.M., Smith, C.E., Treadway, M.T.,
mesencephalic dopamine systems based on predictive Hebbian learning. et al. (2013). A thalamocorticostriatal dopamine network for psychostimu-
J. Neurosci. 16, 1936–1947. lant-enhanced human cognitive flexibility. Biol. Psychiatry 74, 99–105.

Nicola, S.M. (2007). The nucleus accumbens as part of a basal ganglia action Sara, S.J. (2009). The locus coeruleus and noradrenergic modulation of cogni-
selection circuit. Psychopharmacology (Berl.) 191, 521–550. tion. Nat. Rev. Neurosci. 10, 211–223.

Neuron 89, February 17, 2016 ª2016 Elsevier Inc. 709


Neuron

Review
Satterthwaite, T.D., Ruparel, K., Loughead, J., Elliott, M.A., Gerraty, R.T., Cal- Tai, L.-H., Lee, A.M., Benavidez, N., Bonci, A., and Wilbrecht, L. (2012). Tran-
kins, M.E., Hakonarson, H., Gur, R.C., Gur, R.E., and Wolf, D.H. (2012). Being sient stimulation of distinct subpopulations of striatal neurons mimics changes
right is its own reward: load and performance related ventral striatum activa- in action value. Nat. Neurosci. 15, 1281–1289.
tion to correct responses during a working memory task in youth. Neuroimage
61, 723–729. Treadway, M.T., Buckholtz, J.W., Cowan, R.L., Woodward, N.D., Li, R., Ansari,
M.S., Baldwin, R.M., Schwartzman, A.N., Kessler, R.M., and Zald, D.H. (2012).
Saunders, B., and Inzlicht, M. (2015). Vigour and fatigue: How variation in affect Dopaminergic mechanisms of individual differences in human effort-based de-
underlies effective self-control. In Motivation and Cognitive Control, T.S. cision-making. J. Neurosci. 32, 6170–6176.
Braver, ed. (Taylor & Francis/Routledge), pp. 211–234.
Trifilieff, P., Feng, B., Urizar, E., Winiger, V., Ward, R.D., Taylor, K.M., Martinez,
Schmidt, L., Lebreton, M., Cléry-Melin, M.-L., Daunizeau, J., and Pessiglione, D., Moore, H., Balsam, P.D., Simpson, E.H., and Javitch, J.A. (2013).
M. (2012). Neural mechanisms underlying motivation of mental versus physical Increasing dopamine D2 receptor expression in the adult nucleus accumbens
effort. PLoS Biol. 10, e1001266. enhances motivation. Mol. Psychiatry 18, 1025–1033.

Schouppe, N., De Houwer, J., Ridderinkhof, K.R., and Notebaert, W. (2012). van der Meer, M.A., and Redish, A.D. (2011). Ventral striatum: a critical look at
Conflict: run! Reduced Stroop interference with avoidance responses. Q. J. models of learning and evaluation. Curr. Opin. Neurobiol. 21, 387–392.
Exp. Psychol. (Hove) 65, 1052–1058.
van Holstein, M., Aarts, E., van der Schaaf, M.E., Geurts, D.E.M., Verkes, R.J.,
Schouppe, N., Demanet, J., Boehler, C.N., Ridderinkhof, K.R., and Notebaert, Franke, B., van Schouwenburg, M.R., and Cools, R. (2011). Human cognitive
W. (2014). The role of the striatum in effort-based decision-making in the flexibility depends on dopamine D2 receptor signaling. Psychopharmacology
absence of reward. J. Neurosci. 34, 2148–2154. (Berl.) 218, 567–578.
Schultz, W., Dayan, P., and Montague, P.R. (1997). A neural substrate of pre- van Schouwenburg, M., Aarts, E., and Cools, R. (2010). Dopaminergic modu-
diction and reward. Science 275, 1593–1599. lation of cognitive control: distinct roles for the prefrontal cortex and the basal
ganglia. Curr. Pharm. Des. 16, 2026–2032.
Schweimer, J., and Hauber, W. (2006). Dopamine D1 receptors in the anterior
cingulate cortex regulate effort-based decision making. Learn. Mem. 13, Varazzani, C., San-Galli, A., Gilardeau, S., and Bouret, S. (2015). Noradrenaline
777–782. and dopamine neurons in the reward/effort trade-off: a direct electrophysio-
logical comparison in behaving monkeys. J. Neurosci. 35, 7866–7877.
Schweimer, J., Saft, S., and Hauber, W. (2005). Involvement of catecholamine
neurotransmission in the rat anterior cingulate in effort-related decision mak- Vassena, E., Silvetti, M., Boehler, C.N., Achten, E., Fias, W., and Verguts, T.
ing. Behav. Neurosci. 119, 1687–1692. (2014). Overlapping neural systems represent cognitive effort and reward
anticipation. PLoS ONE 9, e91008.
Seamans, J.K., and Yang, C.R. (2004). The principal features and mechanisms
of dopamine modulation in the prefrontal cortex. Prog. Neurobiol. 74, 1–58.
Vijayraghavan, S., Wang, M., Birnbaum, S.G., Williams, G.V., and Arnsten,
A.F.T. (2007). Inverted-U dopamine D1 receptor actions on prefrontal neurons
Shackman, A.J., Salomons, T.V., Slagter, H.A., Fox, A.S., Winter, J.J., and Da-
vidson, R.J. (2011). The integration of negative affect, pain and cognitive con- engaged in working memory. Nat. Neurosci. 10, 376–384.
trol in the cingulate cortex. Nat. Rev. Neurosci. 12, 154–167.
Volkow, N.D., Wang, G.-J., Newcorn, J.H., Kollins, S.H., Wigal, T.L., Telang, F.,
Shenhav, A., Botvinick, M.M., and Cohen, J.D. (2013). The expected value of Fowler, J.S., Goldstein, R.Z., Klein, N., Logan, J., et al. (2011). Motivation
control: an integrative theory of anterior cingulate cortex function. Neuron deficit in ADHD is associated with dysfunction of the dopamine reward
79, 217–240. pathway. Mol. Psychiatry 16, 1147–1154.

Shiner, T., Symmonds, M., Guitart-Masip, M., Fleming, S.M., Friston, K.J., and von Stumm, S., Hell, B., and Chamorro-Premuzic, T. (2011). The hungry mind:
Dolan, R.J. (2015). Dopamine, salience, and response set shifting in prefrontal intellectual curiosity is the third pillar of academic performance. Perspect. Psy-
cortex. Cereb. Cortex 25, 3629–3639. chol. Sci. 6, 574–588.

Skvortsova, V., Palminteri, S., and Pessiglione, M. (2014). Learning to minimize Walton, M.E., Groves, J., Jennings, K.A., Croxson, P.L., Sharp, T., Rushworth,
efforts versus maximizing rewards: computational principles and neural corre- M.F.S., and Bannerman, D.M. (2009). Comparing the role of the anterior cingu-
lates. J. Neurosci. 34, 15621–15630. late cortex and 6-hydroxydopamine nucleus accumbens lesions on operant
effort-based decision making. Eur. J. Neurosci. 29, 1678–1691.
Small, D.M., Gitelman, D., Simmons, K., Bloise, S.M., Parrish, T., and Mesu-
lam, M.M. (2005). Monetary incentives enhance processing in brain regions Ward, R.D., Winiger, V., Higa, K.K., Kahn, J.B., Kandel, E.R., Balsam, P.D., and
mediating top-down control of attention. Cereb. Cortex 15, 1855–1865. Simpson, E.H. (2015). The impact of motivation on cognitive performance in an
animal model of the negative and cognitive symptoms of schizophrenia.
Spencer, R.C., Devilbiss, D.M., and Berridge, C.W. (2015). The cognition- Behav. Neurosci. 129, 292–299.
enhancing effects of psychostimulants involve direct action in the prefrontal
cortex. Biol. Psychiatry 77, 940–950. Westbrook, A., Kester, D., and Braver, T.S. (2013). What is the subjective cost
of cognitive effort? Load, trait, and aging effects revealed by economic prefer-
Spunt, R.P., Lieberman, M.D., Cohen, J.R., and Eisenberger, N.I. (2012). The ence. PLoS ONE 8, e68210.
phenomenology of error processing: the dorsal ACC response to stop-signal
errors tracks reports of negative affect. J. Cogn. Neurosci. 24, 1753–1765. Wickens, J.R., Horvitz, J.C., Costa, R.M., and Killcross, S. (2007). Dopami-
nergic mechanisms in actions and habits. J. Neurosci. 27, 8181–8183.
Steullet, P., Cabungcal, J.-H., Cuénod, M., and Do, K.Q. (2014). Fast oscilla-
tory activity in the anterior cingulate cortex: dopaminergic modulation and ef- Wunderlich, K., Smittenaar, P., and Dolan, R.J. (2012). Dopamine enhances
fect of perineuronal net loss. Front. Cell. Neurosci. 8, 244. model-based over model-free choice behavior. Neuron 75, 418–424.

Strauss, G.P., Morra, L.F., Sullivan, S.K., and Gold, J.M. (2015). The role of low Zhang, J., Berridge, K.C., Tindell, A.J., Smith, K.S., and Aldridge, J.W. (2009).
cognitive effort and negative symptoms in neuropsychological impairment in A neural computational model of incentive salience. PLoS Comput. Biol. 5,
schizophrenia. Neuropsychology 29, 282–291. e1000437.

710 Neuron 89, February 17, 2016 ª2016 Elsevier Inc.

You might also like