The Influence of Reward and Punishment on Learning Rates

Alessandra Zito
alessandra.zito@consenso.ch

1 Introduction
Reward and punishment influence human behaviour, but how and to what extent remains largely
unanswered. Thorndike introduced the law of effect, which states that a positive consequence
(reward) increases and a negative consequence (punishment) reduces the probability that a
certain behaviour will be repeated in the future (Thorndike, 1913, 1927). According to this
law, the effects of punishment and reward are symmetric, simply two sides of the same coin.
Nevertheless, it was Thorndike himself who doubted this symmetry some years later. Based on
his research, he concluded that reward and punishment have asymmetric effects and that
rewards have a stronger impact on human behaviour than punishment (Thorndike, 1932). Recent
studies support the assumption of asymmetry and the idea that reward and punishment
influence behaviour in distinct ways but, in contrast to Thorndike, assume that punishment
has a higher impact than reward (Gershman, 2015; Kubanek, Snyder & Abrams, 2015; Rasmussen &
Newland, 2008; Yechiam & Hochman, 2013). According to a study by Freedberg, Glass, Filoteo,
Hazeltine & Maddox (2017), negative feedback appears to be more effective, and positive
feedback is not even needed for learning, at least not for implicit learning. Furthermore,
reward and punishment have different effects on learning and retention (Galea, Mallia,
Rothwell & Diedrichsen, 2015) and may therefore raise learning rates in different ways:
punishment increases learning speed (short-term, immediate learning rate), whereas reward
increases retention (long-term learning rate, after hours or days).

2 Hypotheses
Hypothesis 1. Punishment leads to higher learning rates than reward.

Do reward and punishment differ regarding their effect on learning?

Hypothesis 2a. Reward increases long-term learning rates.
Hypothesis 2b. Punishment increases short-term learning rates.


REWARD AND PUNISHMENT: INFLUENCE ON LEARNING RATES 2

3 Method
The databases PsycARTICLES, Psychological and Behavioral Sciences Collection, PsycINFO and
PSYNDEX were searched in July 2017 in a systematic electronic literature search using
EBSCOhost. While reviewing the relevant studies, the reference lists of selected studies were
also checked for further relevant sources on the topic of reward and punishment. Google
Scholar and Google Books served as additional search sources. For the theoretical foundations
of the present work, further textbooks and studies from the EBSCOhost search were used that
were not included in the systematic literature review itself. The selection was limited to
empirical, peer-reviewed studies with adults, written in English or German. The terms
learning rate, reward and punishment were not used uniformly across studies. Learning rate
was operationalized, for example, by performance speed, number of errors or accuracy. For
reward and punishment, terms such as positive/negative feedback, positive/negative
reinforcement or gain and loss were used. Consequently, the operationalization of learning
rate, reward and punishment was examined more closely in the method section of each study to
ensure suitability. The following keywords were combined using the Boolean operator AND:
operant conditioning, reinforcement, reinforcement learning, learning, learning rate,
punishment, reward, performance and retention.
Keywords                                    Operator  Number of results
reward, punishment                          AND       400
reward, punishment, learning                AND       105
reward, punishment, performance             AND       55
reward, punishment, reinforcement           AND       75
reward, punishment, operant conditioning    AND       11
reward, punishment, reinforcement learning  AND       9
reward, punishment, learning rate           AND       2
punishment, reinforcement                   AND       137
punishment, learning                        AND       145
punishment, reinforcement learning          AND       19
reward, learning                            AND       776
reward, reinforcement                       AND       571
reward, reinforcement learning              AND       122
reward, retention                           AND       32

More than 400 articles were screened for relevance based on their abstracts, yielding a
selection of 89 articles that was further reduced to 26. A more detailed examination led to
the exclusion of another 10 studies because their focus did not match the focus of this
review. In order to keep the current state of research as general as possible, the selection
of primary studies excluded those in which specific conditions such as Parkinson's disease or
addiction were explicitly investigated in affected patients in relation to reward and
punishment. Further studies were excluded because they did not investigate the differential
impact of reward and punishment on learning rate but instead investigated, for example, the
influence of dopamine. In the end, only 16 studies could be included in this review because
they reported results that could be considered for its hypotheses.

4 Results
The results of the 12 studies addressing the first research question, the superiority of
punishment over reward in terms of learning rates (H1), are heterogeneous. Effect sizes were
not available in all studies and could not always be calculated from the data reported. There
is only a slight tendency toward punishment resulting in a higher learning rate than reward,
and thus H1 can neither be accepted nor rejected. Results from six studies confirmed a
significant difference between reward and punishment, suggesting a superiority of punishment.
Five primary studies provided no evidence of a significant difference between reward and
punishment, and only one study reports a higher learning rate for reward than for punishment.
With regard to hypothesis 2a (H2a), the results are once again heterogeneous and do not allow
a clear conclusion, partly because of the small number of studies. The results of two studies
support H2a, one study does not, and one study only provides evidence on the superiority of
monetary reward compared to verbal feedback or point gain. H2a, too, can neither be accepted
nor rejected. The results of three of the four studies used for hypothesis 2b (H2b) suggest
that the impact of punishment is not significantly different from the influence of reward on
short-term learning rates (retention). In one study, however, this applied only to one of the
two tasks used. A study by Steel et al. (2016) suggests that punishment can even lead to lower
learning rates, depending on the type of task. Only Galea et al. (2015) report the superiority
of punishment with regard to short-term learning rates. However, due to the very small number
of studies, the hypothesis can neither be accepted nor rejected.
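
The vote count for H1 reported above can be re-expressed as simple proportions. The following Python sketch is purely illustrative: the counts are taken from the text, while the outcome labels and the helper name vote_shares are hypothetical and not part of the method of this review.

```python
from collections import Counter

# Outcome counts for H1 as reported in the Results text (12 primary studies).
# The labels are illustrative, not terms used by the primary studies.
h1_outcomes = Counter({
    "punishment > reward": 6,
    "no significant difference": 5,
    "reward > punishment": 1,
})


def vote_shares(outcomes):
    """Return each outcome's share of all studies, rounded to two decimals."""
    total = sum(outcomes.values())
    return {label: round(count / total, 2) for label, count in outcomes.items()}


shares = vote_shares(h1_outcomes)
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

Such a tally only counts the direction of findings; it ignores sample sizes and effect sizes, which is one reason why the tendency toward punishment is described as slight.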


Study Design Sample Variables IV/DV Operationalization Instruments Methodology Results


Ashby, F. G., & O'Brien, J. B. (2007). The effects of positive versus negative feedback on
information-integration category learning.
  Design: BSD
  Sample: N = 19; USA
  Variables IV/DV: IV: reward, punishment; DV: learning rate
  Operationalization: Reward: positive feedback (PFB); Punishment: negative feedback (NFB);
    Reward + punishment: full feedback (Full-FB), partial feedback (Partial-FB);
    Learning rate: accuracy
  Instruments: Information-integration category learning; categorization task with Gabor stimuli
  Methodology: ANOVA; sign test
  Results: No difference in learning rate between PFB and NFB; p > .05

Lie, C., & Alsop, B. (2009). Effects of point-loss punishers on human signal-detection
performance.
  Design: BSD, WSD
  Sample: N = 12; Age: 18–21; 8 women; NZL
  Variables IV/DV: IV: reward, punishment; DV: learning rate
  Operationalization: Reward: point gain; Punishment: point loss;
    Learning rate: discrimination rate
  Instruments: Signal-detection task
  Methodology: ANOVA
  Results: Difference in learning rate between reward and punishment only approached
    significance (p = .06). MDR: punishment > reward

Wächter, T., Lungu, O. V., Liu, T., Willingham, D. T., & Ashe, J. (2009). Differential effect
of reward and punishment on procedural learning.
  Design: BSD
  Sample: N = 91; MA = 21.7; SD = 3.5; 69 women; USA
  Variables IV/DV: IV: reward, punishment; DV: learning rate
  Operationalization: Reward: monetary reward; Punishment: monetary punishment;
    Learning rate: reaction time and error rate
  Instruments: SRTT
  Methodology: GLM; Bonferroni correction; custom orthogonal contrast
  Results: Higher learning rate for reward. Reduction in RT in learning phase: reward and
    control > punishment (p < .001). Overall RT gain: reward > punishment and control
    (p = .002, MSE = 0.006)



Abe, M., Schambra, H., Wassermann, E. M., Luckenbaugh, D., Schweighofer, N., & Cohen, L. G.
(2011). Reward improves long-term retention of a motor memory through induction of offline
memory gains.
  Design: BSD
  Sample: N = 41; MA = 24.3; SD = 5.2; 18 women; USA
  Variables IV/DV: IV: reward, punishment; DV: learning rate
  Operationalization: Reward: monetary reward; Punishment: monetary punishment;
    Learning rate: number of errors; Retention: delta error immediately, 6 and 24 hours,
    30 days after training
  Instruments: Isometric pinch force task
  Methodology: Shapiro-Wilk test; Mauchly's sphericity test; Bonferroni correction
  Results: Learning rate immediately after training similar in all conditions (p = .23 to
    .77). Retention (delta 6 hours – immediately) higher for reward than for punishment
    (p = .02). Constant number of errors compared to immediately after training in the reward
    condition and a higher number of errors in the punishment condition (p = .01). Retention
    after 24 hours higher in the reward condition than in the punishment condition (p = .04).
    Number of errors after 30 days stable in the reward condition (p = .31) and higher in the
    punishment condition (p < .001). Mean number of errors after 30 days significantly smaller
    in the reward condition than in the punishment condition (p = .002).

Guitart-Masip, M., Huys, Q. J. M., Fuentemilla, L., Dayan, P., Duzel, E., & Dolan, R. J.
(2012). Go and no-go learning in reward and punishment: Interactions between affect and
effect.
  Design: WSD
  Sample: N = 47; MA = 23.1; SD = 4.1; 28 women; UK
  Variables IV/DV: IV: reward, punishment; DV: learning rate
  Operationalization: Reward: monetary reward; Punishment: monetary punishment;
    Learning rate: choice optimal / not optimal
  Instruments: Target detection task: go and no-go learning
  Methodology: ANOVA; post hoc t-test
  Results: No significant difference in learning rate between reward and punishment
    (p = .21, d = 0.27). Go trials: higher learning rate with reward (p < .001, d = 1.43).
    No-go trials: higher learning rate with punishment (p < .001, d = 1.50).



Weis, T., Puschmann, S., Brechmann, A., & Thiel, C. M. (2013). Positive and negative
reinforcement activate human auditory cortex.
  Design: WSD
  Sample: N = 26; MA = 24; SD = 2; 15 women; D
  Variables IV/DV: IV: reward, punishment; DV: learning rate
  Operationalization: Reward: monetary reward; Punishment: monetary punishment;
    Learning rate: discrimination accuracy (and reaction time)
  Instruments: Auditory discrimination task
  Methodology: t-test
  Results: Learning rate does not differ significantly between punishment and reward
    (p = .20, d = 0.60)

Yildiz, A., Chmielewski, W., & Beste, C. (2013). Dual-task performance is differentially
modulated by rewards and punishments.
  Design: BSD
  Sample: N = 48; MA = 23.63; SD = 3.79; 29 women; D
  Variables IV/DV: IV: reward, punishment; DV: learning rate
  Operationalization: Reward: monetary reward; Punishment: monetary punishment;
    Learning rate: reaction time
  Instruments: Dual task (visual and auditory) in a PRP paradigm
  Methodology: ANOVA; Bonferroni correction
  Results: Effect only significant on first task. Main effect for condition (p = .045,
    d = 0.77). Higher learning rate for punishment (p = .041)

Galea, J. M., Mallia, E., Rothwell, J., & Diedrichsen, J. (2015). The dissociable effects of
punishment and reward on motor learning.
  Design: BSD, WSD
  Sample: N = 100; MA = 22; SD = 6; 58 women; UK
  Variables IV/DV: IV: reward, punishment; DV: learning performance, reaction time,
    movement time
  Operationalization: Reward: (graded) monetary reward (GMP) and random positive feedback
    (RPF); Punishment: (graded) monetary punishment (GMN); Learning rate: accuracy
  Instruments: Visuomotor rotation model (SSM)
  Methodology: ANOVA; post hoc t-test; Tukey post hoc test
  Results: Significantly higher learning rate for punishment than for reward: AMN > AMP.
    Exp. 1: p = .045, d = 1.37; Exp. 2: p = .017

Gershman, S. J. (2015). Do learning rates adapt to the distribution of rewards?
  Design: WSD
  Sample: N = 166; Age: 23–39; USA
  Variables IV/DV: IV: reward, punishment; DV: learning rate
  Operationalization: Reward: positive prediction error; Punishment: negative prediction
    error; Learning rate: correct responses
  Instruments: Two-armed bandit task
  Methodology: ANOVA
  Results: Negative prediction error (punishment) leads to higher learning rate than positive
    prediction error (reward) (η− > η+). Significant effect for prediction error, p < .0001



Kubanek, J., Snyder, L. H., & Abrams, R. A. (2015). Reward and punishment act as distinct
factors in guiding behavior.
  Design: WSD
  Sample: N = 88; Age: 18–23; 61 women; USA
  Variables IV/DV: IV: reward, punishment; DV: learning rate
  Operationalization: Reward: monetary reward; Punishment: monetary punishment;
    Learning rate: choice behaviour
  Instruments: Choice paradigm / discrimination task in an auditory and a visual task
  Methodology: t-test; mean deviation; F-test
  Results: Higher learning rate for punishment. Effect on choice behaviour higher with
    punishment than with reward, regardless of magnitude. Absolute mean deviation:
    punishment > reward, p = .024, d = 0.64. Variance in slope: punishment > reward,
    p = .00055

Moustafa, A. A., Gluck, M. A., Herzallah, M. M., & Myers, C. E. (2015). The influence of trial
order on learning from reward vs. punishment in a probabilistic categorization task:
Experimental and computational analyses.
  Design: WSD
  Sample: N = 72; Age: 18–22; 47 women; USA
  Variables IV/DV: IV: reward, punishment; DV: learning rate
  Operationalization: Reward: point gain; Punishment: point loss;
    Learning rate: optimal and not optimal responses
  Instruments: Probabilistic categorization task
  Methodology: ANOVA; Levene's test; Mauchly's test; Greenhouse-Geisser correction
  Results: Learning rate does not differ significantly between punishment and reward.
    Experiment 1: p > .200; Experiment 2: p = .579

Li, S. Y. W., Cox, A. L., Or, C., & Blandford, A. (2016). Effects of monetary reward and
punishment on information checking behaviour.
  Design: BSD
  Sample: N = 180; Age: 18–28; 140 women; HKG
  Variables IV/DV: IV: reward, punishment; DV: learning rate
  Operationalization: Reward: monetary reward; Punishment: monetary punishment;
    Learning rate: error rate, checking frequency and duration, duration of trial processing
  Instruments: Data input task
  Methodology: ANOVA
  Results: Reward and punishment resulted in significantly fewer errors and more frequent and
    longer checking than control (significant differences, medium to high effect sizes), but
    no significant difference between reward and punishment (p > .05).



Steel, A., Silson, E. H., Stagg, C. J., & Baker, C. I. (2016). The impact of reward and
punishment on skill learning depends on task demands.
  Design: BSD
  Sample: N = 78; MA = 25; SD = 4.25; 47 women; UK
  Variables IV/DV: IV: reward, punishment; DV: learning rate, retention
  Operationalization: Reward: monetary reward; Punishment: monetary punishment;
    Learning rate: RT and accuracy (SRTT); mean square error (FTT);
    Retention: RT (SRTT); mean square error (FTT) after 1 hour, 24–48 hours and after 30 days
  Instruments: SRTT; FTT
  Methodology: ANOVA; Bonferroni correction
  Results: SRTT: No difference between reward and punishment (early phase: p = .487;
    early/late phase: p = n.a.) and no significant difference in RT between reward and
    punishment (p = .093). FTT: Punishment results in a significantly lower learning rate
    than reward (p < .03).

Widmer, M., Ziegler, N., Held, J., Luft, A., & Lutz, K. (2016). Rewarding feedback promotes
motor skill consolidation via striatal activity.
  Design: BSD
  Sample: N = 45; MA = 24.5; SD = 3.2; 22 women; CH
  Variables IV/DV: IV: reward, punishment; DV: learning rate, retention
  Operationalization: Reward: monetary reward, point gain, verbal feedback;
    Learning rate: accuracy; Retention: accuracy after 20 to 24 - 28 hours after training
  Instruments: Arc-pointing task
  Methodology: GLMM; post hoc Dunnett's t-test
  Results: Higher learning rates in all conditions, but no significant difference between the
    different forms of reward (p = .5599).

Freedberg, M., Glass, B., Filoteo, J. V., Hazeltine, E., & Maddox, W. T. (2017). Comparing the
effects of positive and negative feedback in information-integration category learning.
  Design: BSD, WSD
  Sample: N = 47; Age: 18–29; 18 women; USA
  Variables IV/DV: IV: feedback (positive, negative); DV: learning rate
  Operationalization: Reward: positive feedback (PFB), higher positive feedback (PFB-HF);
    Punishment: negative feedback (NFB); Reward + punishment: full feedback (Full-FB),
    partial feedback (Partial-FB); Learning rate: correct responses
  Instruments: Information-integration category learning with Gabor stimuli
  Methodology: Wilcoxon; ANOVA; post hoc
  Results: Punishment results in marginally to significantly higher learning rates than
    reward: NFB > PFB and PFB-HF. Exp. 1: p = .06; Exp. 2: p < .005; Exp. 3: p < .01



Quattrocchi, G., Greenwood, R., Rothwell, J. C., Galea, J. M., & Bestmann, S. (2017). Reward
and punishment enhance motor adaptation in stroke.
  Design: BSD
  Sample: N = 45; Age: 52–62; 19 women; UK
  Variables IV/DV: IV: reward, punishment; DV: learning rate, retention
  Operationalization: Reward: monetary reward; Punishment: monetary punishment;
    Learning rate: angular error at maximal speed; Retention: angular error at maximal speed
    on third day (without reinforcement)
  Instruments: Force-field adaptation reaching task in a two-joint robotic manipulandum
  Methodology: ANOVA; ANCOVA
  Results: Learning rate: No significant difference between reward and punishment.
    Retention day 3: Significant difference in conditions (p = .002, d = 1.20);
    reward > punishment (p = .008)

Abbreviations: BSD = between-subject design, WSD = within-subject design, N = sample size, MA = mean age, IV = independent variable, DV = dependent variable, SD = standard deviation,
SRTT = serial reaction time task, FTT = force tracking task, PRP = psychological refractory period paradigm, SSM = state-space model, GMP = graded monetary reward, RPF = random
positive feedback, GMN = graded monetary punishment, RT = reaction time, MSE = mean squared error, DR = discrimination rate, MDR = mean discrimination rate, p = significance value,
ANOVA = analysis of variance, ANCOVA = analysis of covariance, GLM = generalized linear model, GLMM = generalized linear mixed model
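
The Gershman (2015) entry in the table expresses its result as separate learning rates for negative and positive prediction errors (η− > η+). As a minimal sketch of what such an asymmetry means, assuming a generic delta-rule learner rather than the model actually fitted in that study, the following Python example updates a value estimate with one learning rate for positive and another for negative prediction errors. The function names, the parameter values and the outcome sequence are illustrative assumptions.

```python
def update_value(value, outcome, eta_pos, eta_neg):
    """One delta-rule update with separate learning rates.

    eta_pos is applied when the prediction error is positive (outcome better
    than expected), eta_neg when it is negative (outcome worse than expected).
    """
    prediction_error = outcome - value
    eta = eta_pos if prediction_error >= 0 else eta_neg
    return value + eta * prediction_error


def run_trials(outcomes, eta_pos, eta_neg, initial_value=0.5):
    """Apply the update to a sequence of outcomes and return all estimates."""
    value = initial_value
    history = []
    for outcome in outcomes:
        value = update_value(value, outcome, eta_pos, eta_neg)
        history.append(value)
    return history


# Hypothetical outcome sequence: 1 = rewarded trial, 0 = unrewarded trial.
outcomes = [1, 0, 1, 0]
symmetric = run_trials(outcomes, eta_pos=0.2, eta_neg=0.2)
asymmetric = run_trials(outcomes, eta_pos=0.2, eta_neg=0.4)  # eta_neg > eta_pos

print("symmetric: ", [round(v, 4) for v in symmetric])
print("asymmetric:", [round(v, 4) for v in asymmetric])
```

With the larger learning rate for negative prediction errors, the estimate drops further after each unrewarded trial than it rises after each rewarded one, so the same outcome sequence ends at a lower value than under symmetric learning rates.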


5 Discussion
Whether people learn better through reward or punishment, and what long-term influence reward
and punishment have on learning rate, cannot be conclusively answered based on the current
state of research. It must be assumed that the influence of reward and punishment on learning
rates is subject to various and partly complex mechanisms of action. For example, reward and
punishment appear to be processed in different ways in the brain, and risk or loss aversion,
as well as individual sensitivity to reward and punishment, could also influence their effect.
It is surprising that so few studies have investigated the influence of reward and punishment
on learning rates, although this question has been addressed since the beginnings of
psychological research and is still unresolved in many respects. Further research is required
not only on the long-term effects (retention) of reward and punishment, but also on whether
reward or punishment leads to higher learning rates and whether, and under what conditions,
reward and punishment lead to higher learning rates at all.


6 References
Abe, M., Schambra, H., Wassermann, E. M., Luckenbaugh, D., Schweighofer, N., & Cohen,
L. G. (2011). Reward improves long-term retention of a motor memory through
induction of offline memory gains. Current Biology, 21(7), 557–562.
doi:10.1016/j.cub.2011.02.030
Ashby, F. G., & O'Brien, J. B. (2007). The effects of positive versus negative feedback on
information-integration category learning. Perception & Psychophysics, 69(6), 865–878.
doi:10.3758/BF03193923
Freedberg, M., Glass, B., Filoteo, J. V., Hazeltine, E., & Maddox, W. T. (2017). Comparing
the effects of positive and negative feedback in information-integration category
learning. Memory & Cognition, 45(1), 12–25. doi:10.3758/s13421-016-0638-3
Galea, J. M., Mallia, E., Rothwell, J., & Diedrichsen, J. (2015). The dissociable effects of
punishment and reward on motor learning. Nature Neuroscience, 18(4), 597–602.
doi:10.1038/nn.3956
Gershman, S. J. (2015). Do learning rates adapt to the distribution of rewards? Psychonomic
Bulletin & Review, 22(5), 1320–1327. doi:10.3758/s13423-014-0790-3
Guitart-Masip, M., Huys, Q. J. M., Fuentemilla, L., Dayan, P., Duzel, E., & Dolan, R. J.
(2012). Go and no-go learning in reward and punishment: Interactions between affect
and effect. NeuroImage, 62(1), 154–166. doi:10.1016/j.neuroimage.2012.04.024
Kubanek, J., Snyder, L. H., & Abrams, R. A. (2015). Reward and punishment act as distinct
factors in guiding behavior. Cognition, 139, 154–167.
doi:10.1016/j.cognition.2015.03.005
Li, S. Y. W., Cox, A. L., Or, C., & Blandford, A. (2016). Effects of monetary reward and
punishment on information checking behaviour. Applied Ergonomics, 53(Part A), 258–266.
doi:10.1016/j.apergo.2015.10.012
Lie, C., & Alsop, B. (2009). Effects of point-loss punishers on human signal-detection
performance. Journal of the Experimental Analysis of Behavior, 92(1), 17–39.
doi:10.1901/jeab.2009.92-17
Moustafa, A. A., Gluck, M. A., Herzallah, M. M., & Myers, C. E. (2015). The influence of
trial order on learning from reward vs. punishment in a probabilistic categorization task:
Experimental and computational analyses. Frontiers in Behavioral Neuroscience, 9.
doi:10.3389/fnbeh.2015.00153
Quattrocchi, G., Greenwood, R., Rothwell, J. C., Galea, J. M., & Bestmann, S. (2017). Reward
and punishment enhance motor adaptation in stroke. Journal of Neurology,
Neurosurgery, and Psychiatry. doi:10.1136/jnnp-2016-314728
Steel, A., Silson, E. H., Stagg, C. J., & Baker, C. I. (2016). The impact of reward and
punishment on skill learning depends on task demands. Scientific Reports, 6, 36056.
doi:10.1038/srep36056
Thorndike, E. L. (1913). The laws of learning in animals. In Educational psychology, Vol. 2:
The psychology of learning (pp. 6–16). New York, NY, US: Teachers College.
doi:10.1037/13051-002
Thorndike, E. L. (1927). The law of effect. The American Journal of Psychology, 39(1/4),
212–222. doi:10.2307/1415413
Thorndike, E. L. (1932). Effects of punishment and of reward. In W. V. Bingham (Ed.),
Psychology today: Lectures and study manual (pp. 225–230). Chicago,
IL, US: University of Chicago Press. doi:10.1037/13342-029
Wächter, T., Lungu, O. V., Liu, T., Willingham, D. T., & Ashe, J. (2009). Differential effect
of reward and punishment on procedural learning. The Journal of Neuroscience, 29(2),
436–443. doi:10.1523/JNEUROSCI.4132-08.2009
Weis, T., Puschmann, S., Brechmann, A., & Thiel, C. M. (2013). Positive and negative
reinforcement activate human auditory cortex. Frontiers in Human Neuroscience, 7.
doi:10.1037/t23111-000
Widmer, M., Ziegler, N., Held, J., Luft, A., & Lutz, K. (2016). Rewarding feedback promotes
motor skill consolidation via striatal activity. Progress in Brain Research, 229, 303–323.
doi:10.1016/bs.pbr.2016.05.006
Yechiam, E., & Hochman, G. (2013). Losses as modulators of attention: Review and analysis
of the unique effects of losses over gains. Psychological Bulletin, 139(2), 497–518.
doi:10.1037/a0029383
Yildiz, A., Chmielewski, W., & Beste, C. (2013). Dual-task performance is differentially
modulated by rewards and punishments. Behavioural Brain Research, 250, 304–307.
doi:10.1016/j.bbr.2013.05.010
