
LECTURE 9: REWARD

BRAIN IS A PREDICTION MACHINE

- The way we learn is based on what we predict will happen next
- If the outcome matches the prediction = nothing needs to change
- If we behave & get an outcome that does not match our prediction = error signal – need to update the prediction

THE CONCEPT OF REINFORCEMENT

Positive reinforcers (“rewards”)

- Increase the frequency of behaviour leading to their acquisition

Negative reinforcers (“punishment”)

- Decrease the frequency of behaviour leading to their encounter & increase the frequency of behaviour leading to their avoidance

REWARD “PREDICTION ERROR”

- Reward “prediction error” = the difference between what was predicted & what actually happened
- Predictability of reward
o Reward must be unpredicted or surprising for learning to occur
o If there is no difference = nothing to learn
o Learning only occurs when there is a difference & surprise
- Consequences of prediction error:
o Positive (+) = new learning: learn something that you never knew before
o Zero (0) = no new learning: what was expected happened, so nothing to learn
o Negative (-) = extinction: e.g. losing money – learn the opposite reaction: avoid in future

REWARD CIRCUITS IN THE BRAIN

- Reward system in the brain is a set of complex circuits – involves neurons in frontal cortex & midbrain (nucleus accumbens etc) & amygdala

REWARD SIGNALLING IN THE BRAIN

- From brain imaging & brain recording studies: dopamine neurons tend to show a spike in activity when there is a positive reward prediction error
- In a supermarket, drink a coffee & it turns out to be the best coffee = spike of activity in neuron circuits that use dopamine
- Making coffee myself at home = not the best = not happy = suppression of dopamine concentration in the reward circuit
o Neurons that use dopamine are very sensitive to reward and punishment
- From raw bean to coffee = dopamine activity tends to ramp up in a tonic way over that series of steps
o When there is a sequence of actions involved in getting a reward = dopamine activity tends to ramp up over time
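
The three prediction-error outcomes (positive, zero, negative) can be made concrete with a small numerical sketch. It assumes a simple error-driven update in which the prediction moves a fraction of the way toward the outcome on each trial. The update rule, the `learning_rate` value and the function names are illustrative assumptions, not something given in the lecture.

```python
def prediction_error(reward, predicted):
    """Reward prediction error: actual outcome minus predicted outcome."""
    return reward - predicted


def update_prediction(predicted, reward, learning_rate=0.5):
    """Move the prediction toward the outcome in proportion to the error.

    learning_rate is an assumed illustrative parameter (0 < rate <= 1).
    """
    return predicted + learning_rate * prediction_error(reward, predicted)


def classify(error):
    """Map the sign of the error onto the three consequences in the notes."""
    if error > 0:
        return "positive: new learning"
    if error < 0:
        return "negative: extinction"
    return "zero: no new learning"


if __name__ == "__main__":
    predicted = 0.0
    # Five rewarded trials followed by five unrewarded (extinction) trials.
    for trial, reward in enumerate([1.0] * 5 + [0.0] * 5, start=1):
        error = prediction_error(reward, predicted)
        print(f"trial {trial}: predicted={predicted:.3f} "
              f"error={error:+.3f} ({classify(error)})")
        predicted = update_prediction(predicted, reward)
```

In this sketch the error is large on the first rewarded trial and shrinks as the reward becomes predicted. It turns negative as soon as the expected reward is withheld.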

- Three areas of cerebral cortex are critically involved in different aspects of reward:
o Temporal lobe = amygdala
o Frontal & parietal cortex = goal of a behaviour
o Orbitofrontal cortex = encodes the relative reward value
- Striatum = deep in the middle of the brain (subcortical)
- Dopamine neurons = the reward system is regulated by neurons that use dopamine
EFFECTS OF REWARD ARE REGULATED THROUGH ACTIVITY IN THE SYNAPSE

- Action of dopamine happens largely in the synapse

DRUGS OF ABUSE AFFECT DOPAMINE NEUROTRANSMISSION

- Methamphetamine:
o Increases release of dopamine & blocks dopamine reuptake by presynaptic neuron
o Methamphetamine in the system is taken up into the presynaptic neuron & combines with the vesicles holding existing dopamine
▪ Dopamine is released at the presynaptic terminal into the synaptic cleft
o After the postsynaptic neuron is activated, free dopamine is normally taken back up into the presynaptic neuron by the dopamine transporter (neurons pull neurotransmitter back into themselves to be used next time)
o Methamphetamine = blocks dopamine transporters in presynaptic terminal button
▪ Excess dopamine cannot get back into presynaptic neurons – it hangs around in the synapse & keeps activating the postsynaptic neuron
- Cocaine:
o Similar effect to methamphetamine
o Not taken up directly into terminal button of presynaptic neuron
o Blocks dopamine transporters = dopamine continues to hang around in the synapse
o Produces the rewarding feeling that people get from cocaine

DOPAMINE NEURONS RESPOND TO REWARD AND REWARD-PREDICTING STIMULI

3 sets of experiments in monkeys doing simple behavioural tests involving rewards, with micro-electrodes implanted in the monkey:
- Monkey sits in front of operators & pushes a response key to activate the food box (reward)
o Measure: how neurons fire action potentials in response to reward across lots of trials
o RESULT: when the monkey receives the cue & moves to retrieve the reward = spike of action potentials from neurons in striatum (reward circuit)
- When the hand makes contact with an apple = increase in rate of firing of action potentials
o If no contact with anything (food) = no action potential activity
- Monkey learned that after doing an action, it will get a reward after 1s
o After 1s of lever touch = increase in rate of firing (neuron is interested in stimulus)
o But if the reward is delayed to ~1.5s = neuron is silent (“empty area”) – expecting reward but not getting it
▪ Suppression or attenuation of action potentials in that neuron
o When reinstated back to 1s or earlier = dopaminergic response
▪ Timing of reward is important

ADAPTATION OF REWARD EXPECTATION DURING LEARNING

- Recording with an electrode in a putamen neuron (part of reward network)
- Rewarded for either making a movement or correctly withholding a movement
- REWARDED MOVEMENT: action potentials are fired after the monkey initiates the movement to receive the reward
- REWARDED NON-MOVEMENT: rewarded after withholding the response (?) = neurons are silent after the initial period of the cue to make a movement, no action potentials until the delay is finished & the action has been correctly withheld + reward received
o Action potentials fired by the neuron tend to follow the timing of the critical action that yields the reward
- UNREWARDED MOVEMENT: monkey initially expects the reward & over time the reward is removed = activity in dopamine neurons drops off & disappears

CODING OF RELATIVE REWARD PREFERENCE

- Discriminate between two different visual patterns (similar elements but different shape overall)
o Monkey gets a reward for discriminating these two visual shapes from one another
▪ LEFT top: rewarded with raisin (highest sugar content – loved the best, “reward A”)
▪ RIGHT top: rewarded with apple (less preferred than raisin, “reward B”)
▪ Less dopamine neuron response to reward B compared with raisin (reward A)
o Monkey is given two different stimuli to discriminate – apple vs cereal (reward C – least liked by the monkey)
▪ Bigger dopaminergic neural response when the monkey is rewarded with apple compared to cereal
- ISSUE: Salience & relative rewarding value of stimuli
o Orbitofrontal neurons seem to be sensitive to the relative value of rewarding stimuli

Orbitofrontal cortex: salience & relative reward value are important

VISUAL SEARCH TASKS

- Find vertical red bar = TARGET
- Easiest -------------------------------------------- Difficult
- Conjunction = combining two pieces of information

KRISTJANSSON ET AL (2010): REWARD INFLUENCES “POP-OUT” DURING VISUAL SEARCH

- Human subjects in a lab setting – different amounts of reward
- Two red objects & one green target VS red target & two green objects
o Appear anywhere in triangular arrangement
- Find the single colour “target” & indicate whether the pointy bit of the target is missing at the bottom OR missing at the top
- Tied differential reward values to the different colours
o Green: 75% of the time = 10 points, 25% = 1 point – HIGH REWARD COLOUR
o Red: 75% of the time = 1 point, 25% = 10 points
- Reaction time should be quick, but the discrimination is tricky

RESULT: each time the target colour is repeated, RT gets faster & the high reward colour is fastest overall

- (c) inverse efficiency – person’s RT scaled by the proportion correct, so a person who makes fast responses but also lots of errors = gets penalised / fast responses and few errors = gets the benefit
o Phenomenon of “priming of pop-out” = effect of the # of times that the target colour is repeated
o If you see the same target colour continuously across trials = get faster each time “target colour remains the same” – BOTH RED & GREEN TARGETS
o Steepness of slope is greater for the HIGH REWARD COLOUR “green” than for the low reward colour
o Evidence: pop-out effect can be enhanced by rewarding – colour of target

EFFECT OF REVERSING REWARD CONTINGENCIES WITHOUT WARNING – SAME RESEARCHERS

- Switch the contingencies: RED = high reward colour & GREEN = low reward colour (unbeknownst to participants)

What happens at the crossover when the contingencies are switched without telling the subjects?

- 0 = point of change – the colours of HR & LR are switched without telling the participants
- Faster response under HR initially
- For the stimulus that is initially HIGH REWARD, RT tends to get slower and slower over trials & crosses over completely with the stimulus that is initially LOW REWARD, for which RT tends to get faster and faster
- RT accurately tracks the relative reward value
o This pattern occurred among participants who indicated in the post-test debrief that they were aware of the manipulation & was also present in participants who were not aware of the change in contingencies
- Behaviour changes as a consequence of reward value
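
Two quantities from the Kristjansson et al task can be worked through numerically: the average points each colour is worth under the 75% / 25% schedule, and the inverse efficiency score. The sketch below assumes that "scaled by the proportion correct" means mean RT divided by proportion correct, which is the usual definition. The function names and the RT / accuracy values are hypothetical and only show the direction of the effect.

```python
def expected_points(outcomes):
    """Expected reward per correct trial.

    outcomes: list of (probability, points) pairs whose probabilities sum to 1.
    """
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * points for p, points in outcomes)


def inverse_efficiency(mean_rt_ms, proportion_correct):
    """Inverse efficiency: mean RT divided by proportion correct.

    Lower is better. Fast but error-prone responding is penalised because
    dividing by a small proportion correct inflates the score.
    """
    if not 0 < proportion_correct <= 1:
        raise ValueError("proportion_correct must be in (0, 1]")
    return mean_rt_ms / proportion_correct


# Schedule from the notes, before the unannounced switch.
green_before = expected_points([(0.75, 10), (0.25, 1)])  # high reward colour
red_before = expected_points([(0.75, 1), (0.25, 10)])    # low reward colour

# After the switch the two schedules simply swap colours.
green_after, red_after = red_before, green_before

# Hypothetical RT / accuracy values, chosen only to show the scaling.
fast_sloppy = inverse_efficiency(500, 0.80)
fast_accurate = inverse_efficiency(500, 0.95)

if __name__ == "__main__":
    print(f"before switch: green={green_before} red={red_before}")
    print(f"after switch:  green={green_after} red={red_after}")
    print(f"inverse efficiency: sloppy={fast_sloppy:.1f} accurate={fast_accurate:.1f}")
```

Under this schedule the high reward colour is worth 7.75 points per correct trial on average and the low reward colour 3.25. That difference in relative value is what RT tracks, including after the switch.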

ANDERSON ET AL (2011): REWARD-DRIVEN ATTENTIONAL CAPTURE


- Participants were told that they would be tested in two phases:
o Training Phase: participants see one set of visual-search arrays & GET DIFFERENT REWARDS
▪ Look for a RED or GREEN circle in a brief display (600ms) = report whether the line inside is VERTICAL or HORIZONTAL
▪ Manipulated the REWARD value of getting the target correct when it is red vs green
▪ RED = 80% chance of receiving 5 cents if the target is discriminated correctly & 20% chance of 1 cent
▪ GREEN = 20% chance of 5 cents & 80% chance of 1 cent
▪ Train with hundreds of trials & then go to the test phase
o Test Phase: done on the same day OR days or weeks later at follow-up – critically NO REWARD IS GIVEN
▪ Participants have associated RED (high reward) and GREEN (low reward) with different reward values – learned salience
▪ Ignore colour, find the “diamond” shape & report whether the line inside is VERTICAL or HORIZONTAL
▪ The RED and GREEN circles = distractors
▪ Covert attention is grabbed by the HR colour (red) that was trained = slower RT

RESULT:

- Fastest RT when there is no distractor present, slower RT when there is a LOW VALUE distractor & slowest with a HIGH VALUE distractor
- Using 2 cents & 10 cents = shows the same effect as mentioned previously
o Even with minimal training, people tend to attach to the reward value of colours
- After several days = same pattern – value of reward is long-lasting
o Correlated reward-value distractor effect with working memory capacity
o People who are very distracted by the distractors tend to have lower working memory capacity
o Higher working memory capacity = less distractibility

EVENT-RELATED POTENTIALS (ERPs)

- EEG = changes in electrical potential
- Average together lots of trials (because single trials are noisy) & time-lock them to the stimulus (sound, light)
- Smooth averaged wave = ERP
- ERP components examined in Kiss et al (2009), below:
o N2pc = “second negative component over posterior contralateral electrodes” = negative deflection in the ERP at 200ms
o SPCN = prolonged component toward the end of the 500ms period
o Analyse those to see if there is a difference for HR and LR
o Reward values: red as 5 points & green as 1 point OR red as 1 point & green as 5 points – counterbalanced across participants
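
Averaging many noisy, time-locked trials into a smooth ERP can be illustrated with simulated data. In this sketch the waveform shape, the noise level, the trial count and all function names are invented for illustration. They are not taken from any of the studies in these notes.

```python
import math
import random


def simulate_trial(n_samples, rng, noise_sd=5.0):
    """One simulated EEG epoch: a small negative deflection buried in noise."""
    trial = []
    for t in range(n_samples):
        # Assumed 'true' evoked response: a Gaussian-shaped dip of -2 units
        # centred on sample 20 (purely illustrative).
        signal = -2.0 * math.exp(-((t - 20) ** 2) / (2 * 4.0 ** 2))
        trial.append(signal + rng.gauss(0.0, noise_sd))
    return trial


def average_erp(trials):
    """Average time-locked epochs sample by sample to obtain the ERP."""
    n_trials = len(trials)
    return [sum(samples) / n_trials for samples in zip(*trials)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
epochs = [simulate_trial(50, rng) for _ in range(2000)]
erp = average_erp(epochs)

# Averaging N trials shrinks the noise by roughly sqrt(N), so the dip near
# sample 20 shows up in the average although it is invisible in one trial.
peak_sample = min(range(len(erp)), key=lambda t: erp[t])

if __name__ == "__main__":
    print(f"single trial at sample 20: {epochs[0][20]:+.2f}")
    print(f"averaged ERP at sample 20: {erp[20]:+.2f}")
    print(f"most negative sample in the ERP: {peak_sample}")
```

Because every epoch is aligned to stimulus onset, the evoked dip adds up across trials while the random noise cancels. The same logic lets small components such as the N2pc be compared between HR and LR conditions.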
KISS ET AL (2009): ERP EVIDENCE FOR REWARD MODULATION OF
SINGLETON TARGETS IN VISUAL SEARCH

- Look at pop-out effect in visual search & record brain activity using ERP, with critical manipulation of reward values
- Show sequence of brief visual displays of “diamond” shapes – search for either RED (black) or GREEN (white)
o Brief stimulus to respond to & when you see the element = indicate whether the pointy bit is missing at the TOP or BOTTOM of the stimulus
- Measure ERP to onset of each new visual search display + manipulated reward value
- There is a subtle difference between the dotted line (LR) and solid line (HR)
- There is a difference in amplitude in different time periods
- Size of cueing effect (RT effect), measured with inverse efficiency, was correlated with size of N2pc effect

UNILATERAL SPATIAL NEGLECT

- Occurs after damage to one side of the brain (usually the right hemisphere)
- Patients behave as if the affected side of space (the contralesional side) has ceased to exist:
o Ignore food etc…
- Most common & severe after damage to the parietal lobe (but can arise from cortical damage elsewhere)
- Stroke in brain = stops the patient from being consciously aware of the affected side

CLINICAL TESTS FOR SPATIAL NEGLECT

- Line cancellation, circle cancellation, star cancellation = patient neglects everything on the left & only performs on the right-hand side

MALHOTRA ET AL (2012): REWARD MODULATION OF UNILATERAL SPATIAL NEGLECT

- Modified the standard visual search = targets replaced with coins
o Distracting letters & words
o Copied pound coin symbols into the array & distractor elements = “blurred” coins
o Control condition = metallic buttons – scattered amongst the display
- Group of neglect patients = mark as many of the elements as they could find
- Initial task = told explicitly that only in the POUND COIN condition, if they did very well, they would get a 15 pound reward
o And in the NON-REWARD condition, they would not get any reward
o But due to ethical constraints = all participants were given the reward
- Repeat the task the next day – they either improve or not, due to the reward that they were told they would get & actually got in between sessions

RESULT:

- NR: no-reward group, first session & second session = no improvement in how many targets were found
- R: reward group = poor performance in the first session & significant improvement in the second session
- Regardless of where targets were presented in the display
- For LEFT SIDE OF DISPLAY = same pattern
- Most patients show some degree of improvement from session 1 to session 2
o Patients 2 & 9 = did not improve – lesion map showed that they had damage in the striatum & frontal cortex

LUCAS ET AL (2013): EFFECTS OF REWARDING CONTRALESIONAL TARGETS IN VISUAL SEARCH IN NEGLECT

- Do the same experiment on healthy controls & neglect patients
- In neglect patients – active experiment: rewards were accumulating trial by trial
- Display showed an array of twinkling star lights
o Bright star = high potential reward
o Dim star = distractor
- Indicate by pointing which one of the bright stars you want to turn over to see its value
- At the end, get the amount that you won as reward (0, 5, 10, 50 cents)
- Patients have to learn the likely location of the higher value elements over successive trials
- Reward distributions:
o Symmetric distribution: equal value rewards on average across the two sides of the display
o Asymmetric distribution: higher reward on left side of display

- CONTROL participants:
o Time bins = blocks of trials
o X-axis = x-axis of the visual display, where the reward is likely to be given – across the right side
o Clusters around the middle of the display “blue function” = symmetrical reward condition
o Asymmetrical reward (reward on the left) = drift to left-hand side of display – across trials
▪ Red = aware of the reward manipulation at debriefing
▪ Yellow = unaware of manipulation
▪ BUT, the effect is stronger in aware participants & still very much present in unaware participants
o Reward manipulation impacts the responses of healthy controls
- PATIENTS:
o Symmetrical: patients tend to pick targets to the right of centre
o Asymmetrical: search further to the left-hand side
- Lesion map for patients
o Larger benefits are accrued by patients who don’t have damage in the striatum region

- Healthy controls:
o No change in symmetrical display over trial-bins
o Gradual increase in number of targets detected on left-hand side with time-bins (asymmetrical case)
- Patients: similar pattern
