See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Received: 4 February 2022 | Accepted: 13 June 2022
DOI: 10.1002/jcpy.1313
How to Overcome Algorithm Aversion

1 Yale School of Management, Yale University, New Haven, Connecticut, USA
2 HEC Montréal, Montréal, Canada
3 University of Toronto Scarborough & Rotman School of Management, Toronto, Canada

Correspondence
Taly Reich, Yale School of Management, Yale University, 165 Whitney Ave., New Haven, Connecticut 06511, USA.
Email: taly.reich@yale.edu

Abstract

When consumers avoid taking algorithmic advice, it can prove costly both to marketers (whose algorithmic product offerings go unused) and to consumers themselves (who fail to reap the benefits that algorithmic predictions often provide). In a departure from previous research focusing on when algorithm aversion proves more or less likely, we sought to identify and remedy one reason why it occurs in the first place. In seven pre-registered studies, we find that consumers tend to avoid algorithmic advice on the often faulty assumption that those algorithms, unlike their human counterparts, cannot learn from mistakes, in turn offering an inroad by which to reduce algorithm aversion: highlighting their ability to learn. Process evidence, through both mediation and moderation, examines why consumers fail to trust algorithms that err across a variety of prediction domains and how different theory-driven interventions can solve the practical problem of enhancing trust and consequential choice in algorithms.

KEYWORDS
learning from mistakes, algorithm appreciation, algorithm aversion, intervention, mistakes
and predicting decisions, this process will only become more powerful, predictive, and pervasive.

As the marketplace moves from more traditional forecasting paradigms in which humans—whether friends, experts, or marketers—provide recommendations to those that rely on non-human algorithmic sources, it becomes increasingly important to understand the differences in how consumers interact with these different prediction sources. Far from an algorithmic takeover, research has documented so-called "algorithm aversion" or the general preference for humans' recommendations or predictions (Dawes, 1979; Dietvorst et al., 2015; Meehl, 1954; Promberger & Baron, 2006; Yeomans et al., 2019). However, this general preference can be overcome, as consumers appear open to the use of and reliance on algorithms under specific conditions (Logg et al., 2019), suggesting that algorithm aversion is by no means set in stone. Given the robust ability of algorithms to provide predictions that exceed the accuracy of human prediction (Dawes, 1979; Meehl, 1954), the present investigation asks not which domains best lend themselves to algorithm appreciation but, instead, how interventions can be leveraged in order to enhance algorithm appreciation (Dietvorst et al., 2018). In order to do so, the present investigation leverages a characteristic inherent to prediction—occasional error—to consider whether these inevitable mistakes might be framed not, in keeping with past research, as evidence of algorithm failure but instead as offering the potential for learning. Integrating the burgeoning literature on the benefits of mistakes (Reich et al., 2018; Reich & Maglio, 2020; Reich & Tormala, 2013), we propose that consumers are reluctant to trust algorithms that err but that those same errors, when seen as opportunities from which the algorithm can learn, enhance trust in and reliance on algorithms.

Algorithmic versus human recommendation

Algorithms—statistical models, decision rules, or other mathematical procedures for forecasting—have been shown to consistently outperform humans in a variety of prediction tasks including academic performance, parole violation, graduate student success, and clinical diagnosis (Dawes, 1979; Dawes et al., 1989; Grove et al., 2000; Meehl, 1954). Despite this, a majority of research has shown that people prefer human predictions to those put forward by algorithms (Dana & Thomas, 2006; Dietvorst et al., 2015; Eastwood et al., 2015; Hastie & Dawes, 2009). This research also shows that people tend to favor human input over algorithmic input in decision-making processes (Dietvorst et al., 2015; Önkal et al., 2009) and that they prefer that others, particularly professionals, seek out other human advice rather than algorithmic advice (Shaffer et al., 2013). Even when told that algorithms provide higher quality and more accurate predictions, people often continue to favor the suboptimal recommendations of humans. Research points to many reasons for this hesitancy toward algorithms, including the inability of algorithms to specify targets and account for individual differences (Grove & Meehl, 1996; Longoni et al., 2019) and concerns that the use of algorithms is unethical or dehumanizing (Dawes, 1979; Grove & Meehl, 1996).

However, as evidence continues to mount in favor of the efficacy and superiority of algorithmic predictions (over human forecasters), and as these predictions spread into new and diverse domains, consumers have become more comfortable in choosing algorithmic recommenders and, ultimately, in accepting and utilizing these predictions. Research in "algorithm appreciation," wherein algorithms are preferred over humans as sources of prediction, has provided evidence of changing preferences in favor of algorithms in different specific domains and scenarios (Dietvorst & Bharti, 2020; Dijkstra, 1999; Dijkstra et al., 1998; Logg et al., 2019). For instance, consumers appear more receptive to advice and forecasts stemming from an algorithm when the domain under consideration is objective rather than subjective (Castelo et al., 2019) and utilitarian rather than hedonic (Longoni & Cian, 2022). These developments can be seen as favorable insofar as prediction machines regularly offer greater accuracy than humans and will likely dominate the future of forecasting (Agrawal et al., 2018). Whereas this past research has identified the specific domains in which consumers are naturally more amenable to algorithms, the present investigation takes a different perspective in offering a theory-driven intervention—with two specific manifestations—designed to enhance consumer reliance on algorithms: learning from mistakes.

Even while supporting the contention that consumers fail to use superior algorithms in favor of inferior human forecasting, the paper that coined the phrase "algorithm aversion" in fact found that algorithmic forecasts were preferred in control conditions in which they were not seen to err (Dietvorst et al., 2015). In other words, consumers seem open to algorithms but with one caveat: that they not make mistakes. This turns out to be a substantial caveat, as the uncertainty inherent in forecasting the future makes it an error-filled enterprise. It is only when algorithms inevitably fall short of perfect accuracy that consumers dismiss them (Dietvorst et al., 2015), deriving from an inference that a single mistake signals that the algorithm is irrevocably flawed or simply broken (Dawes, 1979; Highhouse, 2008). Stubborn though it may be, this inference is often wrong, as algorithms can be capable of
improvement and learning over time. If consumers believe that a mistaken algorithm is a broken algorithm, can this perception be remedied in order to bolster trust in algorithms?

Based on the heretofore separate literature on human mistake-making and learning from mistakes, the present investigation proposes that the answer to this question is yes. Just because they are largely unavoidable does not make mistakes desirable; from watching a bad movie to buying a lemon of a car to undergoing the wrong medical procedure, following mistaken forecasts and taking mistaken advice undermines consumer welfare. As a result, people tend to keep their mistakes to themselves (Edmondson, 1996; Stefaniak & Robertson, 2010; Uribe et al., 2002) for fear of being seen as incompetent and deserving of dismissal (Chesney & Su, 2010; Kunda, 1999; Palmer et al., 2010; Parker & Lawton, 2003) in much the same way that consumers dismiss mistaken algorithms. Still, the focal comparison in that work lies in consumer response to actors making or not making a mistake. In prediction, where errors and mistakes occur regularly and can thus be taken as a given, the more important issue might lie in how to manage or mitigate the overblown inferences of incompetence that often follow from mistakes.

When people make mistakes, observers often care more about how they respond to the mistake than about its occurrence in the first place. If people make an initial mistake and then keep making the same mistake repeatedly, observers rightly doubt the potential for change and growth. However, when people demonstrate that they have learned from their mistakes, making the initial mistake can prove not only not detrimental but, in fact, beneficial (Reich & Maglio, 2020). People who change their minds can be more persuasive than those who hold fast when observers see the change as resulting from new information and learning (Reich & Tormala, 2013). People who falter in pursuing a goal are seen as more likely to attain that goal than others who never falter, provided that they have corrected the original mistake (Kupor et al., 2018). Consumers who admit to past purchase mistakes in writing online reviews are more likely to be trusted than others who never made a mistake when the audience interprets response to the initial mistake as evidence of learning and gained expertise (Reich & Maglio, 2020). In each of these instances, acknowledging mistakes proves beneficial because observers appraise the mistake maker as having learned from the experience, interpreting her/his behavior through a lens of change and growth rather than seeing the mistake as diagnostic of a permanent, unfixable flaw (Dweck, 2011; Hong et al., 1995). Taken together, the findings from the literature on how people think about other humans who make mistakes suggest that, when framed as opportunities from which learning has occurred, mistakes foster trust.

The present investigation

When observers see other people learn from mistakes, those observers have more confidence in mistake-makers and end up more willing to follow their advice. This occurs in large part because learning from mistakes serves as a reliable signal of gained expertise (Reich & Maglio, 2020), an element of credibility that causes people to be more persuasive in their communicative messaging (Berlo et al., 1969; Hovland et al., 1953; McGuire, 1978; Pornpitakpan, 2004). The construct of expertise allows us to build a conceptual bridge from humans who make mistakes to all entities that make mistakes. Research in human versus algorithmic prediction has shown that heightened perceptions of expertise can increase preferences for both algorithmic (Dijkstra, 1999; Dijkstra et al., 1998) and human (Arkes et al., 1986; Promberger & Baron, 2006) recommendations, as people prefer accurate forecasts and expertise is seen as a means by which to achieve it (Binzel & Fehr, 2013; Bolger & Wright, 1992; Hovland & Weiss, 1951; Sternthal et al., 1978). Similarly, Dietvorst et al. (2015) demonstrated that a loss of confidence in an algorithm's ability was the key factor that led to a preference for human predictions after an algorithm erred. However, in an early treatment of algorithm learning, Berger et al. (2021) conducted an experiment in which participants imagined working at a call center and had to forecast the number of incoming calls. For this objective, non-personal task, participants were more likely to rely on an algorithm when they watched (i.e., experienced) it improve over successive trials. Of greatest relevance to the present investigation, this finding provides preliminary evidence that algorithms are capable of bouncing back from prior missteps in the eyes of human observers questioning whether to place confidence and trust in them. Building from this prior work, the present investigation considers both objective and subjective domains of algorithmic prediction, non-personal and personal tasks, and manipulates algorithmic learning by having it described to participants rather than having them experience it.

Expertise predicts trust, including in the choice between reliance on human or algorithmic prediction, and research to date has suggested that both humans and algorithms are capable of being seen as having more of it than the other. However, research to date has not considered how one factor inherent to the forecasting process—the making of mistakes—might be leveraged as a means by which to enhance algorithm appreciation rather than result in unilateral algorithm aversion. This appears to stem from the fact that people see other people as capable of growth following mistakes, so it stands to reason that they would place more stock in human predictions while eschewing predictions from algorithms that are seen as incapable of learning and growth (Dawes, 1979; Highhouse, 2008). However, as the literature on mistake-making suggests, mistakes garner trust when they act as
springboards from which learning can occur. If human mistake-makers can vary in the extent to which observers see them as able to grow and learn or remain trapped in ineptitude, then perhaps algorithmic mistake-makers might also vary in the same manner, and with the same potential benefits.

The present investigation proposes that, just as not all human mistakes are seen as created equal, not all algorithmic mistakes should be seen as equally damning. Instead, algorithms should garner the least trust when they are seen as incapable of learning from their mistakes. This tends to be the standard assumption when consumers witness algorithms err, suggesting that different interventions and framings of algorithmic misprediction might overcome the default inference made by consumers. Study 1 first verifies this assumption, testing whether people perceive algorithms as less capable of learning than humans and whether these perceptions affect trust. To provide evidence of a robust effect across different prediction domains established by prior literature (Castelo et al., 2019), Study 1 documents this tendency for both objective and subjective predictions. Thereafter, Study 2 moves beyond learning in general to learning from mistakes more specifically by providing prior performance statistics (including successes and failures) for both humans and algorithms in a subjective task. By assessing both perceptions of the ability to learn from mistakes as well as trust, Study 2 tests the mediating role of perceived learning from mistakes on which prediction source—human or algorithm—people trust.

The remaining studies examine the efficacy of different interventions designed to frame algorithms as capable of learning from mistakes. Study 3 manipulates not only prediction source type but also the inclusion (or absence) of performance statistics (similar to Study 2) designed to include learning evidence by making prior performance dynamic (i.e., improving over time). By introducing a key moderator (learning evidence), Study 3 provides support for our proposed process using a moderated mediation analysis to predict which prediction source people trust. Moving from trust to actual choice in an incentive-compatible design, Study 4 again uses the same prior performance intervention as Study 3 to provide support for the role of learning evidence in influencing consequential choice of an algorithm over a human. Study 5A introduces a second learning evidence intervention (what the algorithm is called), simultaneously comparing its trust-related effectiveness to our other learning intervention (learning performance), a traditional algorithm, and human prediction. Study 5B isolates its focus onto this more subtle manipulation of language to provide support for the role of learning in a different domain and with a different means by which to create a consequential choice setting. Finally, Study 6 provides a number of robustness checks for the level of performance and rate of improvement in the learning conditions as well as the importance of the decision with a different structure of incentive compatibility. Collectively, the studies test why consumers fail to trust algorithms that err across a variety of prediction domains as well as how different interventions can enhance trust and consequential choice. All of the studies report all manipulations and measures used, and each study was pre-registered.

STUDY 1

As a foundation from which to build our investigation, Study 1 first replicates and extends a fundamental claim: that people perceive algorithms as less capable of learning compared to humans, by which they are as good as they can be out of the box and mistakes flag that they are fundamentally broken (Dawes, 1979; Highhouse, 2008). Providing a more modern update, Study 1 considers this question in the consumer domains examined by Castelo et al. (2019) and, accordingly, Study 1 includes not only a measure of learning ability (Dietvorst et al., 2015) but also the same trust measure as employed by Castelo et al. (2019). Because the latter investigation used two different domains (in the interest of examining its focal research question comparing algorithm aversion for objective and subjective forecasts), Study 1 also includes those same two domains (one objective and one subjective) in the interest of completeness. While we expect to replicate the basic effect (heightened algorithm trust in objective over subjective domains; Castelo et al., 2019), we more importantly hypothesize that participants will appraise algorithms as less capable of learning than humans across the different domains. This study was pre-registered on AsPredicted.org (https://aspredicted.org/blind.php?x=zm2qm8).

Method

We recruited a sample of two hundred participants (M_age = 38.47, SD = 10.50; 51.5% female) from Amazon Mechanical Turk using CloudResearch. We utilized the CloudResearch Approved Participants feature to ensure the participation of only high-quality participants who have passed CloudResearch's attention and engagement measures. Participants participated in exchange for monetary payment. Participants were randomly assigned to one of two domain conditions: objective or subjective. The objective domain consisted of recommending a disease treatment and the subjective domain consisted of predicting personality traits (see Castelo et al., 2019). We employed the same trust measure as Castelo et al. (2019). Specifically, in the subjective condition, we asked participants to indicate who they would trust more to predict personality traits. Participants responded on a 0 to 100 sliding bar (0 = human psychologist, 50 = no preference,
100 = algorithm). In the objective condition, we asked participants who they would trust more to recommend disease treatment (0 = human doctor, 50 = no preference, 100 = algorithm). Participants were then asked, in random order, the perceived learning items adapted from Dietvorst et al. (2015) that measure human and algorithm learning. The two-item human perceived learning index (r = 0.75, p < 0.001) was comprised of how much they thought humans can learn while performing a task like this and how much they thought humans could improve while performing a task like this. The two-item algorithm perceived learning index (r = 0.88, p < 0.001) was comprised of how much they thought algorithms can learn while performing a task like this and how much they thought algorithms could improve while performing a task like this. Participants responded on a scale ranging from 1 (not at all) to 7 (very much). Finally, participants completed demographic measures (gender and age).

Results and discussion

Trust

As predicted, trust in a human was higher than trust in an algorithm for both the subjective domain (M = 25.50, SD = 26.98), t(97) = −8.99, p < 0.001, 95% CI = [−29.91, −19.09] and the objective domain (M = 34.07, SD = 28.64), t(101) = −5.62, p < 0.001, 95% CI = [−21.56, −10.31]. Consistent with the findings of Castelo et al. (2019), we found that trust in the algorithm was higher in the objective domain (M = 34.07, SD = 28.64) than in the subjective domain (M = 25.50, SD = 26.98), t(198) = 2.18, p = 0.031, d = 0.31, 95% CI = [0.80, 16.33].

Perceived learning index

As predicted, a paired sample t-test revealed that participants thought that a human was more capable of learning while performing the subjective task (M = 5.54, SD = 1.02) compared with an algorithm (M = 4.69, SD = 1.63), t(100) = 4.18, p < 0.001, 95% CI = [0.45, 1.25]. The same pattern was observed for the objective task: a paired sample t-test revealed that participants thought that a human was more capable of learning while performing the objective task (M = 5.78, SD = 1.04) compared with an algorithm (M = 4.60, SD = 1.69), t(97) = 6.26, p < 0.001, 95% CI = [0.81, 1.55].

Using modern consumer prediction domains, Study 1 thus updates and extends prior work. Conceptually replicating algorithm aversion (Dietvorst et al., 2015), participants generally placed less trust in algorithms than in humans. Conceptually replicating task-dependent algorithm aversion (Castelo et al., 2019), participants placed less trust in algorithms specifically charged with making a prediction for a subjective (versus an objective) domain. However, foundational to the present investigation, participants believed that humans can learn more than can algorithms, an effect that generalized across the different prediction domains.

STUDY 2

Study 1 echoes and updates prior work suggesting that consumers believe that algorithms are less likely to learn in general. In the interest of providing more targeted evidence for our focal construct, Study 2 considers whether people think that algorithms are incapable in the more specific arena of learning from mistakes. Should participants see algorithms as less capable in this regard as well, it would strengthen the case for our particular goal in the subsequent studies: designing interventions to foster the perception that algorithms can learn from mistakes.

This is an issue of not only conceptual relevance but also applied importance in consideration of where many algorithms are gaining traction in the current marketplace: subjective domains (e.g., film, music, job search, dating apps, and psychological profiles). In subjective domains, we conjecture that mistakes—and learning therefrom—are particularly prevalent, manifested in everything from bad movie recommendations to bad partner pairings. While algorithms seem well positioned to rise to prominence in objective domains (meaning that consumer perceptions of trust should rise in kind), errors of the sort common in subjective domains need not be tantamount to dismissal of algorithms outright. Past research (Castelo et al., 2019) and Study 1 documented that mistrust of algorithms is higher for subjective than objective tasks, suggesting that this is the realm in which means by which to enhance trust of algorithms might be most important and with the most potential to observe improvement. Accordingly, Study 2 (and all subsequent studies) focus on subjective tasks.

Study 1 provided no information about the forecaster other than its type (human versus algorithm). In order to gain traction on the issue of learning from mistakes, Study 2 provides not only the type of forecaster but also performance statistics, which include a history of errors, for both humans and algorithms. We include this not only as a stepping stone from which we will build our intervention in subsequent studies but also because of the prevalence of this practice in the modern marketplace (e.g., online vendors touting the success rate of their past recommendations to customers and Netflix appending Percent Match ratings to titles as evidence-based estimates of successful pairings). By assessing perceptions of learning from those mistakes as well as trust in the forecaster, Study 2 will test the hypothesized mediating role of perceived learning from mistakes on trust of humans and of algorithms. This study was pre-registered
on AsPredicted.org (https://aspredicted.org/blind.php?x=r9hz9y).
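The mediation test described above (perceived learning from mistakes carrying the effect of forecaster type on trust) is commonly evaluated by bootstrapping the indirect effect, the product of the forecaster-to-mediator slope and the mediator-to-trust slope with forecaster type held constant. The sketch below is a generic, pure-Python illustration of that procedure on simulated data; the variable names and effect sizes are invented and are not the study's data or analysis code.

```python
import random
import statistics

# Simulated stand-in for the design: forecaster type (0 = algorithm,
# 1 = human) -> perceived learning from mistakes -> trust.
random.seed(1)
n = 400
agent = [i % 2 for i in range(n)]
learning = [2.0 + 1.0 * a + random.gauss(0, 1) for a in agent]
trust = [1.0 + 0.8 * m + random.gauss(0, 1) for m in learning]

def center(v):
    mv = statistics.fmean(v)
    return [u - mv for u in v]

def slope(x, y):
    """Simple-regression slope of y on x."""
    cx, cy = center(x), center(y)
    return sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)

def partial_slope(m, x, y):
    """Coefficient on m in a regression of y on both m and x."""
    cm, cx, cy = center(m), center(x), center(y)
    smm = sum(a * a for a in cm)
    sxx = sum(a * a for a in cx)
    smx = sum(a * b for a, b in zip(cm, cx))
    smy = sum(a * b for a, b in zip(cm, cy))
    sxy = sum(a * b for a, b in zip(cx, cy))
    return (smy * sxx - smx * sxy) / (smm * sxx - smx * smx)

def indirect(x, m, y):
    """a*b indirect effect: (x -> m) times (m -> y | x)."""
    return slope(x, m) * partial_slope(m, x, y)

point = indirect(agent, learning, trust)

# Percentile bootstrap CI for the indirect effect.
boot = []
for _ in range(2000):
    idx = [random.randrange(n) for _ in range(n)]
    boot.append(indirect([agent[i] for i in idx],
                         [learning[i] for i in idx],
                         [trust[i] for i in idx]))
boot.sort()
ci_low, ci_high = boot[49], boot[1949]
print(f"indirect = {point:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

A percentile interval excluding zero is the conventional criterion for mediation; published analyses of this kind typically use dedicated macros (e.g., bootstrapped indirect-effect tests) rather than hand-rolled code like this.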
Method
dynamic rather than static, they would be able to offer evidence of learning should the performance of the agent improve over time. That is, rather than a summary snapshot of past performance, the history might be presented in a manner that reflects an increasingly positive trend (Kupor & Laurin, 2020; Maglio & Polman, 2014, 2016; Reich et al., 2021b), allowing participants to witness, directly, a retrospective sense of learning by the agent over successive mistakes (growing smaller in number and rate).

Study 3 integrates this logic as one particular application of our broader construct of interest (learning from mistakes) that allows for the design of an intervention to augment perceived algorithm learning. Thus, the design of Study 3 builds from the performance history methodology used in Study 2, but with the inclusion of a dynamic component by which the predictions of the agent reflect learning. Accordingly, this study manipulates not only agent type (human or algorithm) but also whether the performance history statistics include learning evidence or not (i.e., a pattern representative of an improving trajectory). Our hypothesis development (with supporting evidence from Study 2) has suggested that humans are seen as innately capable of learning from mistakes, suggesting that only algorithms should receive a boost in trust as a result of making prior learning salient. Accordingly, the design of Study 3 allows us to examine the effect of providing learning evidence (versus no such evidence) on trust via perceptions of an ability to learn as a function of prediction agent. We predict that, in the absence of evidence of learning, algorithms will be seen as less capable of learning than humans, accounting for lower trust. However, when provided with evidence of learning, we predict that algorithms will be seen as equally capable of learning as humans, resulting in a level of trust commensurate with that of humans as a result of this learning-based intervention. We test these hypothesized relationships using a moderated mediation analysis. This study was pre-registered on AsPredicted.org (https://aspredicted.org/blind.php?x=76cu5n).

Method

We recruited a sample of four hundred participants (M_age = 40.33, SD = 12.42; 51.4% female) from Amazon Mechanical Turk using CloudResearch. We utilized the CloudResearch Approved Participants feature to ensure the participation of only high-quality participants who have passed CloudResearch's attention and engagement measures. Participants participated in exchange for monetary payment. Participants were randomly assigned to one condition in a 2 (agent: human vs. algorithm) x 2 (performance data: with learning evidence vs. without) between-subjects design. Largely resembling the setup of Study 2, participants in the without learning evidence conditions read:

    Below are the performance statistics of an algorithm used in [a psychologist who works for] an online psychological service (% of personality trait evaluations that turned out to be correct or incorrect):

    80% correct, 20% incorrect.

In the learning evidence conditions, participants read:

    Below are the first year performance statistics of an algorithm used in [psychologist who works for] an online psychological service (% of personality trait evaluations that turned out to be correct or incorrect):

    First 3 months: 60% correct, 40% incorrect.
    First 6 months: 70% correct, 30% incorrect.
    First year: 80% correct, 20% incorrect.

Next, participants completed the same one-item trust measure and the same two-item perceived learning from mistakes measure (r = 0.86, p < 0.001) as in Study 2. Finally, participants completed demographic measures (gender and age).

Results and discussion

Trust

We submitted the trust data to a 2 (agent: human vs. algorithm) x 2 (learning evidence: with vs. without) analysis of variance (ANOVA). This analysis revealed a main effect of agent, F(1, 396) = 4.04, p = 0.045, η_p² = 0.01, and a main effect of learning evidence, F(1, 396) = 5.62, p = 0.018, η_p² = 0.01. Importantly, these main effects were qualified by a significant interaction between agent and performance data, F(1, 396) = 5.19, p = 0.023, η_p² = 0.01. As illustrated in Figure 2, participants trusted the algorithm significantly more when performance data included learning evidence (M = 5.00, SD = 1.03) compared with when performance data did not include learning evidence (M = 4.48, SD = 1.34), F(1, 396) = 10.85, p = 0.001, η_p² = 0.03. Conversely, there was no difference in trust of a human when performance data included learning evidence (M = 4.97, SD = 1.03) and when it did not (M = 4.96, SD = 1.04), F(1, 396) = 0.004, p = 0.948.

Perceived learning from mistakes index

We submitted the perceived learning from mistakes index data to a similar 2 (agent: human vs. algorithm) x 2 (learning evidence: with vs. without) analysis of variance
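As a quick arithmetic check on the trust pattern reported above, the learning-evidence effect can be expressed as two simple effects and their difference (the interaction contrast), computed here directly from the reported cell means:

```python
# Reported trust cell means from the 2 (agent) x 2 (learning evidence) design.
means = {
    ("algorithm", "with"): 5.00,
    ("algorithm", "without"): 4.48,
    ("human", "with"): 4.97,
    ("human", "without"): 4.96,
}

# Simple effect of learning evidence within each agent.
algo_effect = means[("algorithm", "with")] - means[("algorithm", "without")]
human_effect = means[("human", "with")] - means[("human", "without")]

# Interaction contrast: the learning-evidence boost is concentrated
# in the algorithm conditions, not the human conditions.
interaction = algo_effect - human_effect

print(round(algo_effect, 2), round(human_effect, 2), round(interaction, 2))
# prints: 0.52 0.01 0.51
```

The contrast of roughly half a scale point for algorithms versus essentially zero for humans is what the significant agent-by-learning-evidence interaction reflects.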
We first constructed three algorithm types: a control algorithm (providing no information about the algorithm other than that it was an algorithm), an algorithm without learning evidence, or an algorithm with learning evidence. Each algorithm type was paired with the same human agent to create three experimental conditions, and participants were tasked with choosing whether to accept the advice provided by the human or the algorithm described as a function of their experimental condition. Thus, Study 4 asks participants to make a forced choice between a human forecaster and an algorithm; we predict that choice share in favor of the algorithm will be greatest among participants for whom the algorithm has learning evidence. Switching from a judgment (i.e., of trust, as in Studies 1–3) to choice also allowed Study 4 to adopt an incentive-compatible design: The choice made by participants to opt for the human or the algorithm carried consequential relevance, as making the better choice came with a financial incentive. Thus, Study 4 provides evidence for the role of learning evidence in influencing actual choice. This study was pre-registered on AsPredicted.org (https://aspredicted.org/blind.php?x=jm77br).

Method

We recruited a sample of three hundred participants (Mage = 39.64, SD = 11.94; 54.7% female) from Amazon Mechanical Turk using CloudResearch. We utilized the CloudResearch Approved Participants feature to ensure the participation of only high-quality participants who have passed CloudResearch's attention and engagement measures. Participants participated in exchange for monetary payment. In all conditions, participants were told that they would be asked to choose between a psychologist's personality evaluation and an algorithm's personality evaluation of another participant. To make choosing correctly feel consequential, all participants were told that at the end of the study, we would score the accuracy of the psychologist and the algorithm against a standardized personality measure to code for accuracy, and those who chose the accurate personality evaluation—psychologist or algorithm—would be entered into a lottery to win an additional $1 bonus. In fact, all participants received the $1 bonus at the conclusion of the experimental session.

Participants were randomly assigned to one of three conditions with varying descriptions of the algorithm: control, algorithm performance data without learning evidence, or algorithm performance data with learning evidence. In the control condition, no information was provided about the algorithm other than that it was an algorithm. The two performance data algorithms were described in a manner identical to Studies 2 and 3: The algorithm without learning evidence was described as having a performance history of 80% success and 20% failure; the algorithm with learning evidence was described similarly but with that performance having improved over the past year. In all conditions, for the human agent option, no information was provided about the human other than that it was a human. Participants then indicated their choice by selecting either "The psychologist" radio button or "The algorithm" radio button. Finally, participants completed demographic measures (gender and age).

Results and discussion

Choice

Two dummy variables were created: one for the algorithm without learning evidence condition (0 = control, 1 = without learning evidence, 0 = with learning evidence) and one for the control condition (1 = control, 0 = without learning evidence, 0 = with learning evidence) to allow for comparison with the algorithm with learning evidence condition. The dependent variable of choice was coded as 1 = human and 2 = algorithm. As predicted, a binary logistic regression revealed that participants in the algorithm with learning evidence condition were more likely to choose the algorithm (66.3%) than participants in the algorithm without learning evidence condition (50.5%; b = −0.66, SE = 0.29, p = 0.024, 95% CI = [0.292, 0.918]) and participants in the control condition (26.7%; b = −1.69, SE = 0.31, p < 0.001, 95% CI = [0.101, 0.340]). In addition, participants in the algorithm without learning evidence condition were more likely to choose the algorithm (50.5%) than participants in the control condition (26.7%; b = 1.03, SE = 0.30, p = 0.001, 95% CI = [1.552, 5.036]).

In the algorithm with learning evidence condition, the choice share of the algorithm (66.3%) was significantly greater than that of the human (33.7%), χ²(1) = 10.45, p = 0.001, d = 0.69. In the without learning evidence condition, the choice share of the algorithm (50.5%) was not significantly different from that of the human (49.5%), χ²(1) = 0.10, p = 0.921, d = 0.02. Finally, in the control condition, the choice share of the algorithm (26.7%) was significantly smaller than that of the human (73.3%), χ²(1) = 21.87, p < 0.001, d = 1.05.

These results offer new insights about how consumers decide whether to choose a human or an algorithm to provide the most accurate forecast. The control condition conceptually replicates algorithm aversion (Dietvorst et al., 2015), with participants actively avoiding the algorithm. However, when merely provided with a performance history designed to include no evidence of learning, the rate at which participants chose the human or the algorithm was roughly at chance—evidence for what might be termed "algorithm indifference." We note that, in Studies 2 and 3, the no-learning conditions were contrasted only against the learning conditions. Study 4, however, includes a control condition, allowing for
a comparison between performance history with no learning and a pure control condition (no information about performance history or learning provided). Here, it appears that any mention of the performance of an algorithm garners more consumer trust in that algorithm than when that information is not provided. We return to the implications of this unanticipated result in the General Discussion. As to our primary prediction, we close by highlighting that Study 4 provided support for the effectiveness of learning evidence on consumer trust. Even in an incentive-compatible context, participants overcame any potential algorithm aversion and behaved in a manner more consistent with algorithm appreciation—indeed, algorithm investment—by choosing it at a higher rate.

STUDY 5A

Having provided evidence for an effect of perceived algorithm learning on judgments of trust (Study 3) and choice (Study 4), the current study seeks to speak to the breadth of this effect in two notable ways. First, Study 5A provides evidence for the benefit of learning evidence in a novel domain (online dating), serving as a noteworthy robustness check. Second, the means by which we manipulated learning in Studies 3 and 4 presented more pieces of performance information (success rates at three different time points) than in the non-learning conditions, raising the possibility that this sheer amount of information made different characteristics more salient to participants. Study 5A therefore examines the effectiveness of a novel learning intervention based on language rather than performance history.

The literature on psycholinguistics documents the many ways in which the names given to products change how consumers think about and behave toward those products (e.g., Maglio et al., 2014; Maglio & Feder, 2017; Rabaglia et al., 2016; Shrum & Lowrey, 2007; Yorkston & Menon, 2004). Accordingly, Study 5A considers whether merely changing what the algorithm is called might impact the extent to which consumers trust it. Toward this goal, the current consumer landscape offered a clear candidate: machine learning. While there are definite points of difference between algorithms and machine learning, all machine learning is underpinned by algorithms; though not all algorithms utilize machine learning, any algorithm that refines and improves its forecasting capability through trial and error can be conceptualized as machine learning (Mitchell, 1997). Because our investigation targets learning from mistakes as applied to algorithms, we believe it is a fair characterization to describe the algorithms under consideration in the current set of studies as machine learning. Accordingly, it becomes a matter of arbitrary semantic choice whether to describe the agent as an algorithm or as a machine learning algorithm. However, we propose that the consequences of this choice will prove far from arbitrary, as the learning signaled by the latter terminology should enhance consumer trust in a manner consistent with our hypotheses. To allow for a full comparison of our different interventions, Study 5A includes both the novel "machine learning" naming intervention and the learning evidence intervention from our earlier studies, as well as a traditional (control) algorithm and a human forecaster. This study was pre-registered on AsPredicted.org (https://aspredicted.org/blind.php?x=a3su25).

Method

We recruited a sample of four hundred participants (Mage = 38.50, SD = 11.64; 48.5% female) from Amazon Mechanical Turk using CloudResearch. We utilized the CloudResearch Approved Participants feature to ensure the participation of only high-quality participants who have passed CloudResearch's attention and engagement measures. Participants participated in exchange for monetary payment. Participants were randomly assigned to one of four agent conditions: human, traditional analytical algorithm, machine learning algorithm, or learning evidence algorithm. In the human condition, participants read:

An online dating company can send you romantic partner recommendations. Below are last year's performance statistics of a psychologist who works for the online dating company: 85% good match, 15% bad match.

In the traditional analytical algorithm condition, participants read:

An online dating company can send you romantic partner recommendations. Below are last year's performance statistics of a traditional analytical algorithm used in the online dating company: 85% good match, 15% bad match.

In the machine learning algorithm condition, participants read:

An online dating company can send you romantic partner recommendations. The company uses a machine learning algorithm, with each mistake, the algorithm learns more, allowing it to provide better recommendations. Below are last year's performance statistics of the machine learning algorithm used in the online dating company: 85% good match, 15% bad match.
HOW TO OVERCOME ALGORITHM AVERSION | 11
First year: 85% good match, 15% bad match.

Next, participants completed the same one-item trust measure as in the previous studies. Finally, participants completed demographic measures (gender and age).

Results and discussion

Trust

As predicted, a one-way ANOVA revealed a significant effect of agent condition on trust, F(3, 396) = 4.80, p = 0.003. Planned contrasts revealed that participants trusted the human agent significantly more (M = 5.00, SD = 1.14) than they trusted the traditional analytical algorithm (M = 4.47, SD = 1.44), Fisher's LSD: p = 0.003, d = 0.41, 95% CI = [0.18, 0.87]. There was no difference in trust between the human agent and the machine learning algorithm (M = 5.05, SD = 1.11), Fisher's LSD: p = 0.774, 95% CI = [−0.39, 0.29], or the learning evidence algorithm (M = 4.99, SD = 1.21), Fisher's LSD: p = 0.954, 95% CI = [−0.33, 0.35]. The latter two did not differ from each other, Fisher's LSD: p = 0.731, 95% CI = [−0.28, 0.40]. Trust in the traditional analytical algorithm was significantly lower than trust in the machine learning algorithm, Fisher's LSD: p = 0.001, 95% CI = [0.23, 0.92], as well as trust in the learning evidence algorithm, Fisher's LSD: p = 0.003, 95% CI = [0.17, 0.86]. Figure 4 summarizes these results.

FIGURE 4 Trust as a function of agent condition, Study 5A

With this study, we simultaneously document the effectiveness of the two interventions put forth by the present investigation. Conceptually replicating Studies 3 and 4 in the novel domain of online matchmaking, the learning evidence algorithm saw a level of trust commensurate with the human agent, while the traditional algorithm again received the lowest trust ratings, consistent with algorithm aversion. Study 5A added a new intervention that required not explicit evidence of learning history in the form of statistics but merely the inclusion of the word learning in the name of the algorithm ("machine learning algorithm") and a definition thereof; yet consumer trust in this prediction agent was just as high as it was in the former two conditions. Next, Study 5B considers whether this new intervention might reliably impact consequential decisions in a manner akin to the more explicit learning evidence impacting a consequential decision in Study 4. Furthermore, Study 5B uses an even subtler language-based intervention in that Study 5A stated not only that the algorithm was a "machine learning algorithm" but also provided a complete definition of what that meant. Would the effect generalize using just the language-based intervention, without including the definition?

STUDY 5B

Study 5A found that changing the name applied to an algorithm shifted trust in it, and Study 5B aims to use this subtle manipulation of perceived algorithm learning to enhance both the ecological validity and the possible application of the current work. In it, participants are asked to make a consequential choice of whether to rely on their own judgment or on that of an algorithm. This methodological change from several of our previous experiments allows Study 5B to pit preference for algorithms against not some undefined other human but against human choosers themselves. Further, Study 5A used the terminology "traditional analytical algorithm" and found decreased trust in it. Though we chose it to match the word length of the condition describing a "machine learning algorithm," that phrasing may have inadvertently made the algorithm seem old or outdated, accounting for the reduced trust we observed. Accordingly, the experimental conditions in Study 5B use the more straightforward terms "algorithm" and "machine learning algorithm." In the interest of robustness testing, Study 5B applies these terms to examine our focal question in a new domain: art. By
pivoting to art, Study 5B tests our predictions in a new incentive-compatible manner in order to complement the incentive-compatible design of Study 4. This study was pre-registered on AsPredicted.org (https://aspredicted.org/5HZ_WGH).

Method

We recruited a sample of two hundred participants (Mage = 40.76, SD = 11.71; 49% female) from Amazon Mechanical Turk using CloudResearch. We utilized the CloudResearch Approved Participants feature to ensure the participation of only high-quality participants who have passed CloudResearch's attention and engagement measures. Participants participated in exchange for monetary payment. Participants were randomly assigned to one of two agent conditions: algorithm or machine learning algorithm. Participants were told that the task involved estimating the quality of a piece of art. They could either use their own estimation or the estimation of either an "algorithm" or a "machine learning algorithm." To make choosing correctly feel consequential, all participants were told that, at the end of the study, we would score the accuracy of the estimation compared with the estimates of an expert art curator. With this benchmark for estimation accuracy, we would conduct a lottery among the participants who provided an accurate estimate, and the winner would receive a 10-cent bonus. In fact, all participants received the 10-cent bonus at the conclusion of the experimental session.

Specifically, participants were told: "The estimation you provide today will be compared to the estimation of an art curator. We will consider an accurate estimation as one that is within 10 points of the art curator's estimation. The estimation will be on a scale from 0 = low quality to 100 = high quality. Please look at the artwork." Participants were shown an art piece and were asked to indicate who they would like to estimate the quality of the art piece by selecting one of the following two radio buttons: "I would like to estimate;" "I would like the [machine learning] algorithm to estimate." They were then thanked for their selection and were notified that they would receive the 10-cent bonus as a token of our appreciation (i.e., all participants received the bonus regardless of their choice). Finally, participants completed demographic measures (gender and age).

…it over themselves to make the art quality estimation (66.7%), whereas when the algorithm was labeled solely as an "algorithm," participants were less likely to choose it over themselves to make the art quality estimation (43.6%; χ²(1) = 10.78, p = 0.001, Cramér's V = 0.232).

Study 5B demonstrated that subtle manipulation of perceived algorithm learning (labeling the algorithm as a "machine learning algorithm") can shift consumer preferences toward the algorithm in a consequential choice setting. That is, even in the absence of explicit information about past performance that demonstrates how an algorithm has learned from mistakes in the past, simply denoting that an algorithm is capable of learning from mistakes proves sufficient to overcome algorithm aversion.

STUDY 6

Our final study, like Study 5B, stays in the domain of art but makes several important changes to test the robustness of our hypotheses. First, after having documented the effectiveness of a linguistic intervention in Studies 5A and 5B, Study 6 returns to the reporting of past performance statistics. We make this change because earlier studies reported rates indicative of high performance (80% success) and quick learning (an improvement of 10 percentage points over successive periods) that, together, might have been seen as approaching near perfection. Accordingly, Study 6 operationalizes both lower overall performance and slower learning (in the learning conditions) while holding performance constant. Second, some of our previous experiments were incentive compatible while others were not. With Study 6, we introduce incentive compatibility as a between-subjects experimental factor in order to consider whether the stakes (i.e., the importance or consequences of an error) for the decision might moderate our core findings. For example, would high (versus low) stakes make algorithm aversion more likely, causing consumers to eschew algorithms regardless of their ability to learn? Third, and toward this objective, Study 6 introduces yet another means by which to operationalize incentive compatibility. This study was pre-registered on AsPredicted.org (https://aspredicted.org/DNW_3YN).
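The test statistics reported for Study 5B above, and for Study 4 earlier, can be cross-checked by hand from the reported percentages and sample sizes. The integer cell counts below are assumptions (the counts consistent with the reported shares and Ns: 66/99 = 66.7% and 44/101 = 43.6% for Study 5B's N = 200; 65/98 = 66.3%, 51/101 = 50.5%, and 27/101 = 26.7% for Study 4's N = 300); this is a sketch of the standard algebra, not the authors' analysis script.

```python
# Hand cross-check of reported statistics using assumed integer cell
# counts consistent with the reported percentages and sample sizes.
from math import log

# --- Study 5B: 2 x 2 chi-square and Cramer's V ---
a, b = 66, 33  # "machine learning algorithm": chose algorithm / chose self
c, d = 44, 57  # plain "algorithm": chose algorithm / chose self
n = a + b + c + d  # 200, as reported
chi2_5b = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
cramers_v = (chi2_5b / n) ** 0.5
print(round(chi2_5b, 2), round(cramers_v, 3))  # 10.78 0.232, as reported

# --- Study 4: dummy-coded logistic coefficients and tests vs. chance ---
counts = {  # (chose algorithm, chose human) per condition -- assumed
    "with_learning": (65, 33),     # 65/98  = 66.3%
    "without_learning": (51, 50),  # 51/101 = 50.5%
    "control": (27, 74),           # 27/101 = 26.7%
}

def log_odds(algo, human):
    # Log odds of choosing the algorithm in one condition.
    return log(algo / human)

# Dummy coefficients are differences in log odds between conditions
# (magnitudes only; the reported signs follow from the coding).
b_without = log_odds(*counts["with_learning"]) - log_odds(*counts["without_learning"])
b_control = log_odds(*counts["with_learning"]) - log_odds(*counts["control"])
print(round(b_without, 2), round(b_control, 2))  # 0.66 1.69, matching |b|

def chi2_vs_chance(algo, human):
    # Goodness-of-fit chi-square of one condition against a 50/50 split.
    expected = (algo + human) / 2
    return (algo - expected) ** 2 / expected + (human - expected) ** 2 / expected

print(round(chi2_vs_chance(65, 33), 2))  # 10.45, as reported for with-learning
print(round(chi2_vs_chance(27, 74), 2))  # 21.87, as reported for control
```

That these assumed counts reproduce the reported chi-square values, Cramér's V, and coefficient magnitudes to two decimals suggests they are the splits underlying the published percentages, though the papers' exact per-condition ns are not stated.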
Method
…art piece was ostensibly selected by a human, it was perceived as significantly more impressive (M = 4.18, SD = 1.85) than when it was selected by an algorithm (M = 3.85, SD = 1.81). There was no main effect of performance data, F(1, 792) = 1.51, p = 0.219, or stakes, F(1, 792) = 0.002, p = 0.961.

Study 6 thus documents the robustness of our effect across several variations to our experimental design. Even with lower overall performance and a slower rate of learning, algorithms capable of learning witness an increase in choice share relative to algorithms that do not. This effect of the mere capacity for learning is consistent with the results from Studies 5A and 5B. Additionally, the importance or consequences of the decision have a negligible impact on this overall effect, as Study 6 supported our hypothesis for both low- and high-stakes decisions between a human and an algorithm. Our high-stakes conditions provide evidence for our effect in a novel realm of incentive compatibility. Rather than providing a financial incentive for correctly choosing between a human and an algorithm, the high stakes of Study 6 entailed participants making a product-related choice under circumstances in which they actually stood to receive that product.

GENERAL DISCUSSION

A popular 2015 book asked in its title "What to Think About Machines That Think" (Brockman, 2015). As improvements in artificial intelligence, machine learning, and algorithmic forecasting continue to spread into new and diverse prediction domains, the need to understand what consumers think about these machines and their outputs—recommendations, predictions, and forecasts—continues to grow in kind. The inevitable rise of prediction machines, coupled with their tendency to outperform human forecasters, will necessitate not only a deeper understanding of the circumstances under which consumers opt for one agent over the other but also a means by which to leverage these insights to design interventions that help consumers help themselves by placing well-earned trust in algorithmic forecasts.

Our research contributes to marketing and consumer behavior by speaking to both issues. Through the lens of mistake-making, seven experiments find that consumers tend to avoid algorithmic advice on the often faulty assumption that those algorithms, unlike their human counterparts, cannot learn from errors, in turn offering an inroad by which to remedy algorithm aversion. Study 1 replicated task-dependent algorithm aversion (Castelo et al., 2019) by showing that participants placed less trust in algorithms in a subjective (versus an objective) domain. Central to the current investigation, participants further believed that humans are more capable of learning than algorithms, an effect that generalized across both objective and subjective domains. Study 2 provided more targeted evidence by examining learning from mistakes. We found that perceived learning from mistakes mediated the effect of agent (human or algorithm) on trust. In Study 3, we introduced a key moderator (learning evidence) to provide evidence for our proposed process using a moderated mediation analysis. In the absence of evidence of learning, algorithms were seen as less capable of learning than humans, accounting for lower trust. However, when provided with evidence of learning, algorithms were seen as equally capable of learning as humans, resulting in a level of trust commensurate with that of humans. Study 4 moved from trust to actual choice in an incentive-compatible design, providing evidence for the role of our learning evidence intervention in influencing consequential choice of an algorithm over a human. Study 5A included a full comparison of our different interventions. We examined both a novel machine learning naming intervention and the learning evidence intervention from our earlier studies, as well as a traditional (control) algorithm and a human forecaster. This study also provided support for the benefit of learning evidence in a novel domain (online dating), serving as a robustness check. In sum, both of our interventions were effective at moving the trust needle. Study 5B examined only that subtle naming manipulation of perceived algorithm learning in a new domain, art, and replicated our core effect (1) as contrasted against an algorithm simply called an algorithm, (2) using a different operationalization of incentive compatibility, and (3) when the human contending with the algorithm was the consumer themself. Finally, Study 6 remained in the realm of art and provided evidence spanning different robustness checks, including lower performance, slower rates of learning, and the stakes of the choice, using a new means by which to make the choice between human and algorithm incentive-compatible (possibly receiving a product resulting from the choice participants made).

Theoretical and managerial implications

This research takes root in a longstanding and growing tradition that considers how consumers weigh their options in choosing between human and algorithmic forecasters (Dawes, 1979; Grove & Meehl, 1996; Meehl, 1954). To date, this literature has largely specified the circumstances under which consumers tend toward algorithm aversion, eschewing the advice of artificial intelligence (Dietvorst et al., 2015) or, instead, toward algorithm appreciation that adopts it as a source of advice over its human counterpart (Logg et al., 2019). Answers to the question of aversion versus appreciation vary as a function of the domain in which the forecast is being made (Longoni et al., 2019; Longoni & Cian, 2022) and, importantly, these descriptive insights have been utilized in the development of selected interventions that prove
capable of tipping the scales one way or another. For instance, Castelo et al. (2019), after finding that consumers especially avoid algorithms for subjective prediction domains, subsequently framed the algorithm in more humanlike ways in order to increase uptake of its advice (see also Schroeder & Epley, 2016, for a successful intervention that incorporates humanness into algorithmic forecasting). Still, when algorithms are portrayed in a manner more like humans, what about this humanlike appearance confers benefits to the algorithm in terms of choice share?

The present investigation answers this question, in part, by incorporating the literature on mistake-making. Prior work had suggested that algorithms suffer because consumers doubt their ability to learn (Dietvorst et al., 2015), meaning that a single mistake reads as diagnostic of an underlying fatal flaw. However, as documented in the heretofore separate literature on human mistake-making, people who err are seen as capable of learning from those mistakes, and those who deliver on this potential are often held in regard just as high—if not higher—than their non-erring counterparts (Kupor et al., 2018; Reich & Maglio, 2020; see also Reich et al., 2021a). Our synthesis of these two literatures opened the door to the possibility that, should algorithms be framed in a manner conducive to being seen as capable of learning, then providing evidence of past learning should dispel any reservations consumers have in following their advice. Study 2 confirmed that consumers tend to fear that a mistaken algorithm is a broken algorithm incapable of learning, and Studies 3–6 followed from this insight to develop two novel interventions that correct this misperception.

From a theoretical perspective, these findings both deepen and broaden the collective understanding of how consumers think about algorithms. Consistent with prior research on algorithm aversion, we replicate the general tendency of consumers to opt for human forecasts. Yet, in a departure from considerations as to when algorithm aversion proves more or less likely, we sought to study why it occurs in the first place. Accordingly, our theoretical development did not target particular domains as a moderator of algorithm aversion on the premise that algorithms simply lack the ability to perform in particular domains. Instead, we considered the fundamental question of what consumers believe it means to be a human versus an algorithm. As such, we contribute to a growing range of differences in the perception and evaluation of humans and algorithms, including agency and experience (Epley et al., 2007), motivation (Epley et al., 2008), human features (e.g., speech or voice; Schroeder & Epley, 2016), mind perception (Gray et al., 2007), and factors fundamental to human nature (e.g., emotion, intuition, spontaneity, or soul; Haslam, 2006; Turkle, 2005). While these source features provide avenues for future research to examine differences in how learning is inferred in humans and non-human (e.g., algorithmic) sources, our present results confirmed that consumers innately see humans as dynamic and algorithms as static. If the latter could be conceptualized more like the former, then, our logic proceeded, algorithms might benefit from a similar perceived ability to learn from mistakes. Our data supported these hypotheses, finding that public perceptions of algorithms are malleable. When that malleability manifests as learning from mistakes, as it did in our two interventions, algorithm aversion wanes. In addition to these contributions to theories of human–algorithm differences, our work extends the benefits of mistake-making beyond mistakes made by humans, the primary focus of research to date on this topic. Algorithms, much like humans, garner greater trust when they use their past mistakes as opportunities for growth, raising new possibilities for the literature on the benefits of making, admitting, and learning from mistakes (Reich et al., 2018; Reich & Maglio, 2020).

Whereas our primary theoretical contributions came from documenting how consumers tend to think about humans versus algorithms, our primary managerial implications lie in the theory-derived interventions that we designed and tested. We did not seek to simply showcase that consumers can update their beliefs about algorithms. Rather, we sought to utilize this ability to update in a particular arena: Algorithms can change, to be sure, but our focal type of change is improvement over time that reflects learning from past mistakes. From this broad conceptual vantage point, we designed and tested two novel interventions on the same theme. Studies 3, 4, and 6 presented a performance history of the algorithm that improved over time, an intervention that we then tested alongside another in Study 5A: a mere change to the name of the algorithmic predicting agent (as well as in Study 5B). Consistently, both of our interventions proved capable of steering consumers toward the belief that algorithms can learn from mistakes and, when they did, they won trust, support, and choice (even in the incentive-compatible designs of Studies 4, 5B, and 6).

Both interventions offer ready-made opportunities for managers to implement in order to increase consumer reliance on the advice provided by their algorithms. While the improving performance histories in Studies 3, 4, 5A, and 6 might only apply to algorithms that have, in fact, improved over time, our data point to two other possibilities of broader relevance. First, let us return to the unexpected finding in Study 4, whereby mention of any performance history led to a nearly even split between preference for a human versus an algorithm. This choice share was substantially higher than that for the algorithm without mention of performance history, suggesting that perhaps mention of any past performance leads consumers to consider that the algorithm has improved over its lifespan in a manner deserving of reasonable trust. However, it is
16 | REICH et al.
important to note that this result may also be idiosyn- that would also be easy to implement. We made this
cratic to the particular performance history used (an choice because, as algorithms continue to improve their
80% success rate), so, we are cautious in speculating forecasting performance and further their dominance
about its broad applicability (e.g., below a 50% success in most prediction domains, understanding which basic
rate). More confidence is warranted in our predicted factors cause non-human forecasters to be preferred will
effect from Study 5A and 5B. Second, just about all be of great value to marketers and consumer behavior
algorithms in the current marketplace are constantly researchers alike and could help to improve the uptake
updating to provide better, more accurate forecasts, of more accurate information (Dawes, 1979; Hastie &
matches, and other forms of advice. Accordingly, all Dawes, 2009). In so doing, however, our research cannot
of these algorithms can justifiably be called machine speak to where, exactly, consumers believe the learning
learning algorithms, and Study 5A and 5B suggest that is taking place. Is all learning from mistakes created
those extra words in the name are well worth includ- equal, or were participants in our studies inferring a par-
ing in the interest of winning greater consumer trust. ticular kind of learning most conducive to winning trust
Finally, the results from Study 6 especially suggest that and choice share? Answers to these questions may likely
any capacity for learning—regardless of levels of per- intersect with another important factor in the landscape
formance or rate of improvement—similarly warrant of algorithm appreciation versus aversion: individual
inclusion in promotional materials, as even lower per- differences. We conjecture that some consumers may
formance and slower improvement still appear suffi- inherently trust algorithms while others may refuse to
cient in order to signal the capacity for learning from trust algorithms at all. For these types of individuals,
mistakes and the increase in choice share that results the theory-derived interventions identified in the pres-
therefrom. This effect, per the null effect of stakes ent investigation may be, respectively, unnecessary or
in Study 6, appears robust regardless of whether the ineffective; it could be only those consumers somewhere
stakes of an error are high or low, making the effect in the middle that prove most responsive to nudges like
documented here relevant to marketing managers for ours that increase trust in algorithms. Though outside
products of both high and low cost and frequent and the scope of the present investigation, determinants of
rare purchase occurrence. learning specific to algorithms (e.g., computational
power, data type, amount of data, or statistical process-
ing method) and individual- level dispositions toward
Limitations and future directions or away from algorithms (e.g., experience, openness to
change, technological adoption, education level) could
For our second intervention, designed around the abil- be explored in future research.
ity of names to change consumer thought and action, As prediction technology continues to improve and
we chose “machine learning algorithm” in keeping with broaden the scope of its influence, firms and consum-
the parlance of the times in which our research was con- ers alike will benefit from better understanding how
ducted. We predicted that, with this common phrase, consumers respond to these new technologies. As
an easy opportunity to reduce algorithm aversion might society moves from primarily human- based recom-
be hiding in plain sight. Our results supported this hy- mendation systems to ones that are predominantly
pothesis, yet it did not probe the necessary and sufficient machine-based, research in consumer behavior must
components in those three words. Specifically, would an continue to work toward an improved understanding
algorithm win just as much consumer favor were it sim- of how we think about machines that think. At the
ply termed a “learning algorithm?” From a theoretical same time, the literature on algorithm aversion (and
perspective, any signal of learning from mistakes should appreciation) will continue to refine the definition of
prove effective. However, it remains possible that the these terms in the first place. Does algorithm aversion
familiarity of that phrase contributed to its efficacy. To always amount to human appreciation, or might al-
the list that includes algorithms and machine learning, gorithm aversion result in simply avoiding making a
we await future consideration of a third phrase of equal choice in which the algorithm provides a recommen-
prominence: artificial intelligence. Should the second dation while advice from a human is unavailable? With
word in that phrase insinuate that the agent under con- this research, we identify that consumers often think
sideration is capable of using mistakes to improve upon that machines cannot think—at least not to the extent
its intelligence, the use of this moniker might offer ad- that thinking is learning and learning is thinking. By
vantages similar to “machine learning.” identifying this tendency, interventions like the two de-
After documenting the consumer tendency to doubt signed and tested here can give rise to new opportuni-
the ability of algorithms to learn (from mistakes) in ties by which to disabuse consumers of the notion that
Studies 1 and 2, Studies 3–6 capitalized on this tendency machines cannot learn from mistakes, in turn fostering
by utilizing novel interventions. These interventions were greater reliance on algorithms and, via their tendency
designed with simplicity as the goal, providing a basic to outperform humans, to foster greater satisfaction
and straightforward test of our theoretical proposition and wellbeing.
McGuire, W. J. (1978). An information-processing model of advertising effectiveness. Behavioral and Management Science in Marketing, 15, 156–180.
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. University of Minnesota Press.
Mitchell, T. M. (1997). Machine learning. McGraw-Hill.
Önkal, D., Goodwin, P., Thomson, M., Gönül, S., & Pollock, A. (2009). The relative influence of advice from human experts and statistical methods on forecast adjustments. Journal of Behavioral Decision Making, 22(4), 390–409.
Palmer, M., Simmons, G., & de Kervenoael, R. (2010). Brilliant mistake! Essays on incidents of management mistakes and mea culpa. International Journal of Retail & Distribution Management, 38(3), 234–257.
Parker, D., & Lawton, R. (2003). Psychological contribution to the understanding of adverse events in health care. Quality and Safety in Health Care, 12(6), 453–457.
Pornpitakpan, C. (2004). The persuasiveness of source credibility: A critical review of five decades' evidence. Journal of Applied Social Psychology, 34(2), 243–281.
Promberger, M., & Baron, J. (2006). Do patients trust computers? Journal of Behavioral Decision Making, 19(5), 455–468.
Rabaglia, C. D., Maglio, S. J., Krehm, M., Seok, J. H., & Trope, Y. (2016). The sound of distance. Cognition, 152, 141–149.
Reich, T., Kupor, D. M., & Smith, R. K. (2018). Made by mistake: When mistakes increase product preference. Journal of Consumer Research, 44(5), 1085–1103.
Reich, T., & Maglio, S. J. (2020). Featuring mistakes: The persuasive impact of purchase mistakes in online reviews. Journal of Marketing, 84(1), 52–65.
Reich, T., Maglio, S. J., & Fulmer, A. G. (2021a). No laughing matter: Why humor mistakes are more damaging for men than women. Journal of Experimental Social Psychology, 96, 104169.
Reich, T., Savary, J., & Kupor, D. (2021b). Evolving choice sets: The effect of dynamic (vs. static) choice sets on preferences. Organizational Behavior and Human Decision Processes, 164, 147–157.
Reich, T., & Tormala, Z. (2013). When contradictions foster persuasion: An attributional perspective. Journal of Experimental Social Psychology, 49(3), 426–439.
Schroeder, J., & Epley, N. (2016). Mistaking minds and machines: How speech affects dehumanization and anthropomorphism. Journal of Experimental Psychology, 145, 1427–1437.
Shaffer, V. A., Adam Probst, C., Merkle, E. C., Arkes, H. R., & Medow, M. A. (2013). Why do patients derogate physicians who use a computer-based diagnostic support system? Medical Decision Making, 33(1), 108–118.
Shrum, L. J., & Lowrey, T. M. (2007). Sounds convey meaning: The implications of phonetic symbolism for brand name construction. In Psycholinguistic phenomena in marketing communications (pp. 39–58). Erlbaum.
Stefaniak, C., & Robertson, J. C. (2010). When auditors err: How mistake significance and superiors' historical reactions influence auditors' likelihood to admit a mistake. International Journal of Auditing, 14(3), 41–55.
Sternthal, B., Phillips, L. W., & Dholakia, R. (1978). The persuasive effect of source credibility: A situational analysis. Public Opinion Quarterly, 42(3), 285–314.
Tjosvold, D., Zi-you, Y., & Hui, C. (2004). Team learning from mistakes: The contribution of cooperative goals and problem-solving. Journal of Management Studies, 41(7), 1223–1245.
Turkle, S. (2005). The second self: Computers and the human spirit. MIT Press.
Uribe, C. L., Schweikhart, S. B., Pathak, D. S., Marsh, G. B., & Reed Fraley, R. (2002). Perceived barriers to medical-error reporting: An exploratory investigation. Journal of Healthcare Management, 47(4), 263–279.
Yeomans, M., Shah, A., Mullainathan, S., & Kleinberg, J. (2019). Making sense of recommendations. Journal of Behavioral Decision Making, 32(4), 403–414.
Yorkston, E., & Menon, G. (2004). A sound idea: Phonetic effects of brand names on consumer judgments. Journal of Consumer Research, 31(1), 43–51.

How to cite this article: Reich, T., Kaju, A., & Maglio, S. J. (2022). How to overcome algorithm aversion: Learning from mistakes. Journal of Consumer Psychology, 00, 1–18. https://doi.org/10.1002/jcpy.1313