
Received: 20 May 2016 | Revised: 1 December 2016 | Accepted: 31 January 2017

DOI: 10.1002/for.2464

RESEARCH ARTICLE

Understanding algorithm aversion: When is advice from automation discounted?

Andrew Prahl | Lyn Van Swol

Department of Communication Arts, University of Wisconsin–Madison, Wisconsin, USA

Correspondence
Andrew Prahl, Department of Communication Arts, University of Wisconsin–Madison, Madison, WI 53706, USA.
Email: aprahl@wisc.edu

Abstract
Forecasting advice from human advisors is often utilized more than advice from automation. There is little understanding of why "algorithm aversion" occurs, or specific conditions that may exaggerate it. This paper first reviews literature from two fields—interpersonal advice and human–automation trust—that can inform our understanding of the underlying causes of the phenomenon. Then, an experiment is conducted to search for these underlying causes. We do not replicate the finding that human advice is generally utilized more than automated advice. However, after receiving bad advice, utilization of automated advice decreased significantly more than advice from humans. We also find that decision makers describe themselves as having much more in common with human than automated advisors despite there being no interpersonal relationship in our study. Results are discussed in relation to other findings from the forecasting and human–automation trust fields and provide a new perspective on what causes and exaggerates algorithm aversion.

KEYWORDS
advice, algorithm aversion, automation, computers, trust

1 | INTRODUCTION

Computers, robots, algorithms, and other forms of automation are quickly becoming a fundamental part of many decision-making processes in both personal and professional contexts. From forecasting product sales (Fildes, Goodwin, Lawrence, & Nikolopoulos, 2009) to informing medical and management decisions (Esmaeilzadeh, Sambasivan, Kumar, & Nezakati, 2015; Inthorn, Tabacchi, & Seising, 2015; Prahl, Dexter, Braun, & Van Swol, 2013), people frequently seek and receive advice from nonhuman (automation) sources when facing important decisions. Yet, despite seeking advice from automation, decision makers frequently discount advice obtained from it, especially when compared to advice from a human advisor (Önkal, Goodwin, Thomson, Gönül, & Pollock, 2009).

The irrational discounting of automation advice has long been known and has been a source of the spirited "clinical versus actuarial" debate in clinical psychology research (Dawes, 1979; Meehl, 1954). Recently, this effect has been noted in forecasting research (Önkal et al., 2009) and has been called algorithm aversion (Dietvorst, Simmons, & Massey, 2015). A developing area of research is trying to identify interventions that increase trust in automation advice, such as providing confidence intervals or allowing human judges to slightly modify automation forecasts (Dietvorst, Simmons, & Massey, 2016; Goodwin, Gönül, & Önkal, 2013). This research is important, but more research is needed on the underlying psychological processes that affect the discounting of automation advice, especially in comparison to human advice. This paper examines trust as a factor that may underlie differences in use of advice. First, we summarize literature in two related fields that can inform forecasting research about algorithm aversion: interpersonal advice and human–automation trust.

In addition to providing a new perspective on automated advice, both the advice field and human factors provide theoretical frameworks that we use to generate hypotheses and conduct a simple experiment to compare human versus automation forecasting advice.


We begin with a review of advice research in order to understand how the process of receiving advice from automation differs from receiving advice from a human, specifically by examining trust in advice and the effects of receiving advice of varying quality from either a human or an automated, computer forecast. In turn, we will relate advice research to findings from the human factors and forecasting fields.

1.1 | Advice: The message and the advisor

The interpersonal process of giving and receiving advice has received considerable attention from communication scholars over the past 20 years (for a review, see Bonaccio & Dalal, 2006; MacGeorge, Guntzviller, Hanasono, & Feng, 2013). Yet, advice from automation sources such as computers and robots has been largely overlooked by communication scholars. Advice response theory (ART) has emerged and been developed as the main theory of interpersonal advice (Feng & Feng, 2013; Feng & MacGeorge, 2010; MacGeorge et al., 2013; Van Swol, MacGeorge, & Prahl, 2015). Key aspects of ART include message-related characteristics (such as the politeness of the message), advisor-related characteristics (such as expertise), and receiver-related characteristics (such as emotional state). Advice research prior to ART was largely focused on aspects of the advisor such as expertise, trustworthiness, and confidence (Sniezek, Schrah, & Dalal, 2004; Sniezek & Van Swol, 2001; Van Swol, 2009).

One of the most important contributions of ART is the assertion that message characteristics are more closely related to the receiver's evaluation of advice than advisor characteristics. While this is an important consideration when studying interpersonal advice, it may not generalize to the domain of automated advice. For example, a sales forecast derived from a computer algorithm may simply take the form of a number presented to a decision maker. Furthermore, forecasts from more than one software program may be presented in this simple format (Goodwin et al., 2013). In this case, source factors (such as the perceived sophistication of the software program) would be more salient for advice utilization because there is essentially no variation in message characteristics (i.e., politeness). Because of this, we focus more on advisor characteristics in determining differences between how people may react to human or computer advice.

1.2 | Advisor characteristics: Credibility

Previous advice research has frequently focused on characteristics of the advisor, such as trust, as a predictor of advice utilization (Sniezek & Van Swol, 2001; Van Swol & Sniezek, 2005). Consistent with research on trust (Mayer, Davis, & Schoorman, 1995), the credibility of an advisor is largely determined by their perceived competence, which is often related to the advisor's expressed confidence. The relationship between competence and advice utilization is so strong that it is common for advice literature to use a two-dimensional model of advice utilization, which differentiates advisor competence from all other factors such as intentions and integrity (Jodlbauer & Jonas, 2011; Schrah, Dalal, & Sniezek, 2006). This is particularly important for the research presented here because, in human–computer interaction research, computers are frequently conceptualized as being evaluated only on competence-related factors and nothing else (Gefen, Karahanna, & Straub, 2003; Paravastu, Gefen, & Creason, 2014). Furthermore, because of the computer-mediated format of advice from algorithms, the emotional and nonverbal cues that make up many message-based characteristics besides competence (such as intention and emotion) are not likely to be salient (Derks, Fischer, & Bos, 2008; Whittaker, 2003). Also, social cues like integrity and benevolence are likely unimportant for forecasting advice because most forecasting tasks are not interpersonal in nature (as opposed to an interpersonal advice situation involving romantic or career advice).

Therefore, this study will primarily examine advisor characteristics, rather than message characteristics, in determining advice utilization, and the study will focus specifically on perceived advisor competence to investigate the process of receiving advice from a computer versus a human. There is a large and developed body of research on interpersonal advice. Therefore, we use research on human-to-human advice as a theoretical background, and then examine how computer-to-human advice differs in advice evaluation, advisor competence, and advice utilization.

1.3 | Human factors: Automation trust

Although there is far less research on automation–human advice than interpersonal advice, there is a large and active research effort to understand human–computer (human–automation) trust. Fortunately, many trust studies, especially those concerning forecasting support systems, use methods analogous to advice studies. For example, studies have examined a sales team incorporating the output from statistical forecasting software in their sales projections or supply chain managers using decision support software to optimize operations (Asimakopoulos & Dix, 2013; Fildes et al., 2009). These studies examine advice utilization as a proxy for trust, similar to interpersonal advice research. Because using another's advice involves making oneself vulnerable to the advisor, advice utilization is a behavioral measure of trust (Mayer et al., 1995). For example, research in interpersonal advice has consistently used advice utilization as a behavioral measure of trust (Mayer et al., 1995; Sniezek & Van Swol, 2001; Van Swol, 2011; Van Swol & Sniezek, 2005).

While the forecasting discipline has conducted research with methods similar to those of interpersonal advice scholars, the largest research effort to understand advice between a human and automation comes from the human factors and ergonomics field (Madhavan & Wiegmann, 2007b). Literature in this area is largely concerned with low levels of automation as opposed to sophisticated predictive or forecasting software. Examples of low-level automation include the autopilot in an aircraft or an automatic reactor monitoring system in a nuclear power plant. These forms of automation are distinctly different from something like a forecasting support system or medical diagnosis aid that provides the human with advice (which we will henceforth call a judgmental system), as low-level automation in ergonomics is simply there to take workload off of the human and not to advise human judgment. A judgmental system, on the other hand, is meant to supplement and inform (and perhaps supersede) human judgment or intuition.

It is still informative to explore the extensive body of research on human–automation trust to inform hypotheses for the current study. One overarching theme in human–automation trust research is that humans generally expect automation to be "perfect" (i.e., with an error rate of zero), whereas a human is expected to be imperfect and to make mistakes. While this perfection may be expected of an automated safety alert system or a plane autopilot, this is a difficult principle to apply to a judgmental forecasting system, as the future outcome is inherently unknowable and neither a system nor a human could achieve perfection. Nevertheless, the idea that advice recipients may generally utilize automation advice more than human advice has found support in previous experimental work (Dijkstra, Liebrand, & Timminga, 1998; Dzindolet, Pierce, Beck, & Dawe, 2002; Madhavan & Wiegmann, 2007b). However, this research primarily deals with lower-level automation systems. Other research on judgmental systems, like forecasting systems, tends to show that humans are more trusted than computers (Dietvorst et al., 2015; Önkal et al., 2009). Despite the word "trust" being used, we are careful only to cite studies that use methods in which trust is measured behaviorally as advice utilization.

The tasks used in the study in this paper are predictive in nature and require the use of a judgmental system. Therefore, we seek to replicate the finding of forecasting research that human advice is utilized more than computer advice:

H1. Advice from humans will be utilized more than advice from computers.

This study compares advice from two different sources (human or algorithm) in a unique way, because the format of the advice contains no content besides a numerical prediction. As reviewed above, scholars studying advice from the "message paradigm" of ART are typically focused on manipulations in the advice message content that change utilization (MacGeorge, 2016). In the study below, there is no variation in message content, so we expect to see differences in advice utilization based on qualities that the participant assumes the advisor has. Furthermore, ART predicts that source characteristics change the perception of message features—but the proposed study has essentially no message features, at least in the traditional categories of message characteristics like supportiveness or politeness. Therefore, interpersonal advice research lacks insight into how participants evaluate an automated advisor, but automation trust literature from the human factors field provides some insight, albeit in regard to nonjudgmental systems, and we review it below.

The perceived competence of an automated aid is strongly predicted by the automation error rate (e.g., false positive alerts when there is no problem). Even if a machine is wrong only a small fraction of the time, there is a tendency for human decision makers to exaggerate the perceived error rate of the automation (Dzindolet et al., 2002; Hoff & Bashir, 2015). In the forecasting field, feedback about computer performance is also a large predictor of future use of such judgmental systems (Alvarado-Valencia & Barrero, 2014; Fildes & Goodwin, 2013). In human factors research, errors cause a more rapid decline in advice utilization from computers compared to human advisors (Madhavan & Wiegmann, 2007b). Evidence is weak for any differing effects of performance feedback between human advisors and computer advisors for judgmental systems, but we would expect to see the same trend as in automation research if advice recipients see that the computer advice was poor. Such a result would be consistent with the "perfection schema" framework proposed by Madhavan and Wiegmann (2007b), which suggests that decision makers expect automation to be perfect, whereas humans are fallible. Thus an error from automation is not only an error—it is an unexpected error that leads to harsher "punishment" of the advisor.

H2. After receiving bad advice, participants will have a larger decline in use of advice from computer advisors than from human advisors.

2 | PSYCHOLOGICAL PROCESSES DRIVING AUTOMATION ADVICE DISCOUNTING

In addition to advisor characteristics, recipient factors like emotional reactions and feelings of similarity to the advisor (MacGeorge et al., 2013) are also important to investigate because, if humans experience different emotions when receiving advice from a nonhuman, this can affect differences in advice utilization. When told that advice is from a human, people may engage in a parasocial relationship with the advisor, where they imbue the advisor with some of the same emotions as an actual social relationship (Horton & Wohl, 1956; Kumar & Benbasat, 2002).

Preliminary research with fMRI imaging has shown differences in the response of reward regions of the brain when advice recipients choose to use the advice of human advisors as opposed to automated advisors, even when there is no actual relationship between advisor and recipient (Parasuraman, de Visser, Wiese, & Madhavan, 2014). In other words, just knowing that advice came from a human, rather than from a computer, may create a parasocial relationship between the advisor and advisee that would not occur with advice from a computer. Thus, we propose:

H3a. Advice recipients will experience greater positive emotions when receiving advice from a human advisor than a computer advisor.

H3b. Advice recipients will experience fewer negative emotions when receiving advice from a human advisor than a computer advisor.

Lastly, even if advice recipients were to engage in a parasocial relationship with a computer advisor, there will still be a greater sense of similarity with human advisors. Research on automated sales agents and other forms of automation such as robots has shown that automated agents are generally perceived as not having human attributes such as emotions or intentions (Kahn, Ishiguro, Friedman, & Kanda, 2006). In recent research regarding interaction with robots, about half of participants considered a robot to be "in-between" a living being and a technology, and no respondents stated that the robot was a living being; further, most respondents reported that the robot could never be an intimate friend (Kahn et al., 2012):

H4. Advice recipients will report they have more in common with human advisors than with computer advisors.

To summarize, we seek to deepen our understanding of why human or automated advice will be utilized more in a forecasting situation. We also seek to extend findings from human factors research to determine if automated forecasting advice is punished more severely than human forecasting advice when the advice is of poor quality. Furthermore, because both human and automated advice will be presented in identical formats, we aim to identify perceived advisor characteristics and recipient characteristics that may explain differences in advice utilization.

3 | METHOD

3.1 | Participants

A total of 160 participants completed the study (computer advice, n = 91; human advice, n = 69). Participants were undergraduates at a large Midwestern university in the USA and received extra credit in communication classes for their participation. Three participants who failed the manipulation check by stating that the advice was from the incorrect source were removed—one from the computer condition and two from the human condition—resulting in a total computer group n = 90 and human group n = 67 (N = 157). The participants were 75% female in the computer advice condition and 73% female in the human advice condition; 98% were between the ages of 18 and 23, with two participants' ages unreported.

3.2 | Task

Participants completed 14 forecasting tasks, each consisting of a graph of past data for a variety of issues related to operating room management in a hospital setting. An example would be to forecast the mean time an orthopedic surgery would take 6 months in the future, given a line chart of the past 2 years' worth of monthly mean orthopedic surgery times; see the Appendix for screenshots of the task. A judgmental forecasting task was used so results could easily be related to the large amount of previous research on forecasting support systems.
advice. Previous studies of stock price forecasting has often
respondents reported that the robot could never be an
used students enrolled in economics classes, which may lead
intimate friend (Kahn et al., 2012):
to personal bias and underweighting of advice as the partici-
H4. Advice recipients will report they have pants consider themselves experts, or the participants have
more in common with human advisors than with specific knowledge about the shortcomings of forecasting
computer advisors. support systems (Goodwin, Fildes, Lawrence, &
Nikolopoulos, 2007; Önkal et al., 2009). This lack of partic-
To summarize, we seek to deepen our understanding of
ipant domain knowledge is a key consideration when contex-
why human or automated advice will be utilized more in a
tualizing the findings due to the different ways recipients
forecasting situation. We also seek to extend findings from
utilize advice when they do/do not have domain knowledge
human factors research to determine if automated forecasting
(Lawrence, Goodwin, O'Connor, & Önkal, 2006).
advice is punished more severely than human forecasting
advice when the advice is of poor quality. Furthermore,
because both human and automated advice will be presented 3.3 | Procedure
in identical formats, we aim to identify perceived advisor
Participants signed up for the study online. After digitally
characteristics and recipient characteristics that may explain
signing a consent form, participants read a short introduction
differences in advice utilization.
about the types of tasks they would be given. The manipula-
tion was simple and similar to past studies (Dzindolet et al.,
2002; Madhavan & Wiegmann, 2007a; Önkal et al., 2009).
3 | METHOD
Participants were told at the opening that the advice would
come from “an advanced computer system,” or “Steven, a
3.1 | Participants
person experienced in operating room management issues.”
A total of 160 participants completed the study (computer We named the human advisor because we felt that making
advice, n = 91; human advice, n = 69). Participants were the advisor a nameless “person” felt unnatural and in any

We settled on the name Steven after informally asking some colleagues their ideas for generic human names. In order to incentivize accuracy, participants were informed that they could win $100 based on their performance if they finished all of the tasks with the lowest mean percentage error among all participants. There was no performance history given or implied for either advisor, because performance history may cause participants to start the experiment with an exaggerated level of trust and expectations (Madhavan & Wiegmann, 2007b).

When completing the forecasting tasks, after viewing the graph of past data, participants first entered an initial forecast (text entry) and reported their confidence (using a 0–100 "slider" widget) in their assessment. Then the screen advanced to an "advice" screen in which the advice of either the human expert or the computer forecasting support system was presented. The advice was presented both as an extended trend line on the supplied graph of past data and as a point forecast. Participants entered a revised forecast on this screen and reported their confidence in the revised estimate.

After each task was complete, participants received feedback on the accuracy of their forecast. A screen appeared that showed the absolute percentage error of the participant's postadvice forecast. Furthermore, the participant's mean absolute percentage error across all previous revised forecasts was computed and displayed on the feedback screen in a very large red font. Participants were reminded on the feedback screen that the participant with the lowest mean absolute percentage error would win $100. Thus it is expected that participants would feel some degree of frustration or gratitude if the advice caused their forecast accuracy to decrease or improve.
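The feedback computation just described is simple enough to state directly. The sketch below computes the two quantities shown on the feedback screen, the absolute percentage error of the current revised forecast and the running mean absolute percentage error across completed trials; the variable names and example values are ours, not output from the experiment software.

```python
def absolute_percentage_error(forecast: float, actual: float) -> float:
    """Absolute percentage error (APE) of a single revised forecast."""
    return abs(forecast - actual) / abs(actual) * 100.0

# Running feedback after each trial: APE for the current trial plus the mean
# APE (MAPE) over all revised forecasts so far, the value displayed in red
# and used to rank participants for the $100 prize.
revised_forecasts = [102.0, 98.5, 104.0]   # illustrative values
actual_outcomes   = [100.0, 101.0, 105.0]  # illustrative values

apes = [absolute_percentage_error(f, a)
        for f, a in zip(revised_forecasts, actual_outcomes)]
running_mape = sum(apes) / len(apes)
print(f"APE this trial: {apes[-1]:.2f}%  |  running MAPE: {running_mape:.2f}%")
```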
Participants were walked through the process of making a forecast and entering their confidence during two "familiarization" forecasts. During this process, directions appeared on the screen reminding participants of the proper format for entering forecasts and also directing their attention to where and when the advice would appear. After the two "familiarization" forecasts, participants were informed that the next 14 forecasts would be the ones that counted towards the competition to win $100. Pilot-tested advice was used on the forecasting tasks such that the advice given over the first five trials was excellent and likely much better than what advice recipients would estimate alone. For example, for the third trial, pilot testing revealed that most participants would forecast a value of about 100, whereas the "correct" forecast was 105. Thus the given advice was 104, meaning the error percentage of the advice was far less than that of most participants' forecasts. On the sixth trial, recipients were given advice that (if they followed it) was very poor and caused their forecast accuracy and overall chance of winning the $100 to decrease. For the very poor advice, we also utilized pilot testing to determine where most participants would place their forecast. We gave advice that was not only worse than the participants' forecast but actually would pull the participants away from the correct value, making it so that any utilization of the advice would make their forecast worse (e.g., in the aforementioned example, the very poor advice would be 95 instead of 104). After this bad advice intervention, advice returned to being excellent, based on pilot testing, for the remaining eight trials.
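For reference, the advice schedule can be written out explicitly. The sketch below encodes the 14-trial quality pattern and reuses the trial-3 numbers from the text (typical initial forecast about 100, correct value 105, good advice 104, bad advice 95); the list and labels are our own illustration rather than the actual pilot-tested advice series.

```python
# Advice quality over the 14 scored trials: excellent on trials 1-5,
# deliberately poor on trial 6, excellent again on trials 7-14.
advice_quality = ["excellent"] * 5 + ["poor"] + ["excellent"] * 8
assert len(advice_quality) == 14

# Worked example from trial 3: the good advice (104) lies between the typical
# initial forecast (100) and the correct value (105), so following it helps;
# the bad advice (95) pulls away from the correct value, so any utilization
# hurts accuracy.
typical_initial, correct, good_advice, bad_advice = 100, 105, 104, 95
print(abs(good_advice - correct))  # 1  -> smaller than the typical forecast error of 5
print(abs(bad_advice - correct))   # 10 -> worse than ignoring the advice entirely
```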
3.4 | Measures

To measure advice utilization, a "SHIFT" variable used in previous forecasting studies was used to directly compare human–human versus human–computer advice utilization (Önkal et al., 2009). We chose this measure first to maximize commonality with previous forecasting research that used it as a measure of trust (Önkal et al., 2009), and also because the shift variable is a variant of the "weight of advice" (WOA) variable commonly used in advice research (Bonaccio & Dalal, 2006). SHIFT is computed via the following equation:

SHIFT = (Judge's revised forecast − Judge's initial forecast) / (Advisor's forecast − Judge's initial forecast)

Although it has been suggested that this variable only be measured in absolute value and only has a theoretical range of 0–1, we did not make any adjustments to SHIFT after it was calculated; this resulted in some negative shift values being used in the analysis. Negative shift values were used because this study includes an intervention in which poor advice is given; it is therefore theoretically reasonable to expect that negative shift values indicate a distrust of the advisor that goes beyond simply not taking the advice into account—participants may be actively shifting away from the advisor.
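A minimal implementation of this measure makes the negative-value case concrete. The function below computes SHIFT exactly as defined above and, as in our analysis, leaves negative values (revisions away from the advisor) unadjusted; the function name and example numbers are illustrative.

```python
def shift(initial_forecast: float, revised_forecast: float,
          advisor_forecast: float) -> float:
    """SHIFT (a variant of weight of advice): the proportion of the distance
    toward the advisor's forecast that the judge moved when revising."""
    if advisor_forecast == initial_forecast:
        raise ValueError("SHIFT is undefined when advice equals the initial forecast.")
    return (revised_forecast - initial_forecast) / (advisor_forecast - initial_forecast)

print(shift(100, 104, 104))  # 1.0  -> fully adopted the advice
print(shift(100, 102, 104))  # 0.5  -> averaged own forecast and advice
print(shift(100, 100, 104))  # 0.0  -> ignored the advice
print(shift(100,  98, 104))  # -0.5 -> shifted away from the advisor
```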
Confidence was measured after each trial. Asking participants for their confidence in the initial and revised forecasts is not entirely consistent with earlier forecasting research, which asked participants to rate their trust in the advice at the time of decision (Goodwin et al., 2007; Önkal et al., 2009). The use of pre- and postadvice confidence ratings is, however, consistent with advice research that finds a strong relationship between confidence and trust in advice (Sniezek & Van Swol, 2001; Van Swol, 2009; Van Swol & Sniezek, 2005). Ratings of trust given at the time of decision and in postquestionnaires have not been consistent with actual advice utilization, nor have they been good predictors of advice utilization in forecasting research (Alvarado-Valencia & Barrero, 2014). Therefore, this study adopts the confidence measure because past research has found it is strongly related to advice utilization (Sniezek & Van Swol, 2001; Van Swol, 2009).

A postquestionnaire was administered after the forecasting tasks. The questionnaire measured recipient impressions of feelings or thoughts they have in common with their advisor and their emotions when receiving advice. The survey was constructed from items from previous advice literature (MacGeorge et al., 2013; Van Swol et al., 2015). Positive emotions were measured with six Likert questions on a 1 (not at all) to 5 (extremely) scale for six positive emotions: Appreciative, Happy, Grateful, Thankful, Satisfied, and Glad. The six questions produced sufficient reliability (α = 0.94), and the mean was used as an index of positive emotion. Negative emotions were measured on the same scale, and four negative emotions were measured: Mad, Frustrated, Annoyed, and Irritated. The four questions produced sufficient reliability (α = 0.93), and the mean was used as an index of negative emotion. Feelings in common with the advisor were measured with a 1–7 Likert question: "I have very little in common with my advisor."
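The reliability coefficients and emotion indices reported above follow the standard construction: Cronbach's α over the item set and a per-respondent mean as the index. A minimal sketch of that computation is shown below; the toy response matrix is invented for illustration and is not the study data.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) response matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Illustrative responses to the six positive-emotion items (1-5 scale).
responses = np.array([
    [4, 4, 5, 4, 4, 5],
    [2, 2, 1, 2, 3, 2],
    [5, 4, 4, 5, 5, 4],
    [3, 3, 3, 2, 3, 3],
])

alpha = cronbach_alpha(responses)
positive_emotion_index = responses.mean(axis=1)  # one index score per respondent
print(round(alpha, 2), positive_emotion_index)
```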
4 | RESULTS

Hypothesis 1 posited that human advice would be utilized more than computer advice. A multivariate analysis of variance (MANOVA) on all 14 shift scores showed a marginally significant main effect of advisor, F(1, 14) = 1.67, p = 0.07, η² = 0.15. Descriptive statistics showed that the human advice group shifted less (i.e., utilized the advice less) than the computer advice group (computer group: M = 0.63, SD = 0.20; human group: M = 0.62, SD = 0.21). Indeed, this difference seems exceedingly small, and an independent samples t-test conducted on the average amount of shift between conditions was nonsignificant, t(155) = 0.02, p = 0.99. Means of trials were calculated to test the amount of advice utilization prior to, during, and after the bad advice intervention. All three were also nonsignificant between conditions: (prior to intervention) t(155) = 0.59, p = 0.56; (during intervention) t(155) = 0.78, p = 0.44; (after intervention) t(155) = 0.26, p = 0.79. Hypothesis 1 was not supported.

Hypothesis 2 stated that, after receiving bad advice, advice from computer advisors would be utilized less than advice from human advisors. Initial evidence for a significant difference when the poor advice intervention occurred was already apparent in the MANOVA on all shifts, as the only individual trial contrast that was significant was for the seventh (intervention/postfeedback on bad advice) trial (p = 0.049). To properly assess this hypothesis, a 14 × 2 repeated-measures ANOVA was conducted on the 14 shift values with advisor as a fixed factor. Mauchly's test of sphericity indicated that the sphericity assumption was violated, Mauchly's W = 0.178, p < 0.001. Using the Huynh–Feldt adjustment, the repeated-measures ANOVA indicated a significant effect of time, F(1, 13) = 9.681, p = 0.002, and a significant interaction between time and advisor, F(1, 13) = 1.892, p = 0.034. "Repeated" contrasts (repeated contrasts test one trial vs. the previous trial) revealed a significant interaction between time 6 versus 7 and advisor, F(1, 149) = 6.693, p = 0.011. This indicates that the change in advice utilization between the two groups was significantly different after receiving bad advice. By visually inspecting the graph of estimated marginal means (see Figure 1), it is clear that between trials 6 and 7 the computer advisor group had a larger decrease in advice utilization than the human advisor group. Hypothesis 2 is supported.
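As a rough, self-contained illustration of the two key comparisons, and not a reproduction of the MANOVA or the Huynh–Feldt-corrected repeated-measures ANOVA reported above (which would require a dedicated mixed-ANOVA routine from a statistics package), the sketch below compares overall mean SHIFT between advisor groups and then compares the trial 6 to trial 7 change in SHIFT between groups using independent-samples t-tests. The array shapes and values are placeholders, not the study data.

```python
import numpy as np
from scipy import stats

# Placeholder SHIFT matrices: one row per participant, one column per trial.
rng = np.random.default_rng(seed=2)
shift_computer = rng.normal(0.63, 0.20, size=(90, 14))  # computer-advice group
shift_human = rng.normal(0.62, 0.21, size=(67, 14))     # human-advice group

# H1-style comparison: overall advice utilization (mean SHIFT per participant).
t_h1, p_h1 = stats.ttest_ind(shift_computer.mean(axis=1), shift_human.mean(axis=1))

# H2-style comparison: change in SHIFT from trial 6 to trial 7
# (columns 5 and 6 with zero-based indexing), compared between groups.
drop_computer = shift_computer[:, 6] - shift_computer[:, 5]
drop_human = shift_human[:, 6] - shift_human[:, 5]
t_h2, p_h2 = stats.ttest_ind(drop_computer, drop_human)

print(f"H1: t = {t_h1:.2f}, p = {p_h1:.3f}")
print(f"H2 (trial 6 -> 7 change): t = {t_h2:.2f}, p = {p_h2:.3f}")
```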
Hypothesis 3a stated that advice recipients would experience greater positive emotions when receiving advice from a human advisor than from a computer advisor. Hypothesis 3b was similar, suggesting the human advice group would experience fewer negative emotions. There was no significant difference between the advisor groups on either positive emotions, t(155) = −0.779, p = 0.436, or negative emotions, t(155) = 0.257, p = 0.743. Hypotheses 3a and 3b were not supported.

Hypothesis 4 stated that participants would rate that they had more in common with human advisors than with computer advisors. Hypothesis 4 was supported: Participants in the human advice group reported they had significantly more in common with their advisor than participants in the computer advice group, t(155) = −1.935, p = 0.053, d = 0.253 (computer group: M = 4.090, SD = 1.640; human group: M = 4.470, SD = 1.420). Post hoc analysis did not reveal any other significant relationships between the "in common with" question and any other survey question; thus we were unable to determine to what exactly this feeling of having something "in common" refers.

5 | DISCUSSION

This study addressed violations of trust when receiving advice from a nonhuman (computer) or human advisor and perceptions of advice from these two different sources. To our knowledge, this is the first study using a judgmental system to replicate the automation research finding that recipients use computer advice much less than human advice immediately after receiving bad advice. Thus the penalty for automation violating high expectations was replicated in a forecasting task with advice from a computer algorithm. Earlier findings from automation or forecasting research indicating that one type of advisor is utilized more were not supported; there were no differences in utilization between types of advisor. Finally, participants perceived that they had more in common with a human advisor, despite all factors, besides the label, being kept constant between the two types of advice.

FIGURE 1 Advice utilization decrease after receiving bad advice from advisor (trials 6 and 7)

5.1 | Reactions to bad advice

It is unclear what aspect of computer advice is driving the larger decrease in computer advisor advice utilization compared to human advice when the recipient is presented with poor advice. Although we urge caution in using automation research to inform research on judgmental systems, the idea of the "perfection schema" may be applicable here (Madhavan & Wiegmann, 2007b). The perfection schema states that advice recipients generally expect automated advisors to be perfect, and when such advisors commit an error it feels especially negative to the advice recipient, who rapidly loses trust in the advisor. On the other hand, errors from human advisors are expected because humans are imperfect; thus advice recipients are not surprised by a human advisor error and are more likely to give the human advisor another chance. While technically there cannot be a "perfection" schema in our study, given that the system forecasted future performance that is unknowable and impossible to perfectly predict, participants may have expected a more consistent amount of accuracy (or error) from the computer advisor, especially if recipients assumed the computer was following a statistical algorithm. When that consistency was broken, participants reacted by "avoiding" the automation with greater decrements in advice utilization. Human advisors, on the other hand, might have been expected to be inconsistent in the accuracy of their advice. An alternative explanation for the decreased utilization of the computer advice may be that participants did not necessarily have different initial expectations, but saw an error by the automation as indicative of a fundamental flaw that would be certain to reoccur, whereas they expect that humans have the ability to notice their error and then correct themselves so as to improve future performance.

This is an important area for future research in both automation and judgmental forecasting research, especially as more judgmental systems and algorithms make their way into our personal and professional lives. Forecasting-support systems are no longer just aiding humans in routine stock price or product inventory forecasting—they are also being used in very high consequence domains, such as forecasting medical outcomes. Even self-driving cars or other forms of technological driver assistance use large amounts of data to predict uncertain future states and make decisions based on those forecasts. Thus understanding how human decision makers react to advice that comes from such automation is critical, especially understanding reactions to bad advice; after all, no system can be perfect all of the time. By understanding some of the causes of inappropriate disuse of automation, we not only advance our understanding of forecasting-support system use, but also gain insights about broader human tendencies when using related technologies. Will a doctor begin to distrust a helpful forecasting system due to one poor forecast? Will drivers ever be able to trust an autonomous car that does something unexpected?

Understanding the psychology of human decision makers when interacting with automation also uncovers potential theoretical explanations for trust in forecasting systems. For example, Ring's (1996) concept of fragile and resilient trust suggests that people's trust in human advice may be more resilient when they are provided with bad advice because people have a long history with human advisors, but trust in automation and algorithms may be more fragile because people have less experience from which to make generalizations, so one bad example is more salient. Decision makers may also interact with automated systems with fundamentally different expectations than when interacting with humans (Madhavan & Wiegmann, 2007b). Moving forward, we encourage researchers to continue to study differing reactions to advice between humans and automation and additionally to examine whether trust restoration differs for automated advice.

Finally, our results in this study are limited in that they may only be generalized to domains in which trust may be measured behaviorally via advice utilization. Although this measure of trust is commonly used in automation and forecasting research (e.g., Hoff & Bashir, 2015; Madhavan & Wiegmann, 2007a; Önkal et al., 2009), it may not be applicable to other conceptualizations of trust such as self-reported "stated" trust (e.g., Goodwin et al., 2013) or situations in which a human decision maker does not receive advice from an automated system (such as fully autonomous systems). Going forward, researchers should carefully choose the operationalization of trust so as to maximize the generalizability of their findings and their contribution to the overall trust literature.

5.2 | Differences in utilization based on advisor

Earlier forecasting research that used a very similar method to ours and almost identical tasks found that human advice generally is utilized more than computer advice (Önkal et al., 2009), but we did not replicate this finding. This study used a sample that had no knowledge of forecasting methods and strategy, and this difference from previous research may explain the failure to replicate. In Önkal et al. (2009), the participants were students in a forecasting- and economics-related field and could have been educated on the pitfalls or weaknesses of statistical computer forecasting. An additional reason may be that our study included performance feedback after each individual trial. This aspect of our design was required so that participants would be aware that the advice they received was good or bad before proceeding to the subsequent trial, but this aspect also limits comparisons to previous studies that did not include performance feedback after each trial.

A possible theoretical explanation for our failure to replicate some previous studies is provided by ART. ART suggests that impressions of the advisor are largely formed through message characteristics (Feng & MacGeorge, 2010), and there was no variation on message characteristics in this study, which could account for the lack of differences in utilization of advice from a human or computer. The aspects highlighted above suggest that, for researchers going forward, sample characteristics, performance feedback, and message factors are important variables that can increase or limit comparisons to previous work.

This study also highlights the need to be careful when using human–automation trust research to inform an understanding of judgmental systems that operate at a high level of automation. We replicated some findings from previous human factors (the perfection schema) research but failed to observe others, such as automation being trusted more during the initial trials. Therefore, we suggest that judgmental systems are not only qualitatively different but also must be studied as a different type of advisor than lower, simpler forms of automation like signal detection software.

5.3 | Perception of advisor

Recipients reported that they had more in common with the human advisor, but there were no significant differences in the emotional reactions to receiving either type of advice. Given that the advice was given over the computer and not face-to-face from a human, it may have lacked enough emotional and social cues (Walther, 1996) to elicit a strong affective response from recipients. Alternatively, the affective elements of trust and advice utilization take time to develop (Schoorman, Mayer, & Davis, 2007), and there may simply not have been enough trials for recipients to develop a relationship with the advisor. Although introducing face-to-face or synchronous mediated communication between a human advisor and advice recipient would introduce many potential confounds, doing so would simulate how advice is received in natural settings and present an opportunity to see how the relationship between advisor and advisee develops differently between human and computer advisors.

Finally, it is important to note that, despite the fairly weak manipulation and mediated format of the advice, recipients were cognizant of whether their advisor was a human or a computer. Recipients reported that they had more in common with the human than the computer advisor. Although it is admittedly vague (what is it recipients feel they have "in common"?), this perception has the potential—if further studied—to explain differences in advice utilization. If it is a shared sense of humanity that recipients perceive, this would have large implications for areas such as medical advice, where perceived advisor morality is of great concern (Berner & Maisiak, 1999; Hoffman, 2003). We hoped that the item measuring feelings in common with the advisor would correlate with other survey measures and therefore give us a more complete answer to Hypothesis 4, but no analyses indicated a significant relationship. Furthermore, this study lacked any form of open-ended questions or exit interview, so we could not determine what the "in common" feeling is. This is a limitation of the current study but represents an important research question that scholars should address in future research.

People today are encountering judgmental advice systems in highly consequential contexts—from the doctor's office to online dating websites, which all have proprietary "matching" algorithms. Two decades ago, it was hardly fathomable that computers would be matching ideal marriage "partners" on dating websites, influencing our reading choices, or suggesting what movie we would enjoy most (Van Swol, 2011).

Advice from computer algorithms and forecasting systems is only likely to increase in the next decade (Tetlock & Gardner, 2015), and understanding how people trust, utilize, and perceive this advice, especially when it disappoints, is imperative to encouraging high-quality decision making.

REFERENCES

Alvarado-Valencia, J. A., & Barrero, L. H. (2014). Reliance, trust and heuristics in judgmental forecasting. Computers in Human Behavior, 36, 102–113. doi:10.1016/j.chb.2014.03.047

Asimakopoulos, S., & Dix, A. (2013). Forecasting support systems technologies-in-practice: A model of adoption and use for product forecasting. International Journal of Forecasting, 29(2), 322–336. doi:10.1016/j.ijforecast.2012.11.004

Berner, E. S., & Maisiak, R. S. (1999). Influence of case and physician characteristics on perceptions of decision support systems. Journal of the American Medical Informatics Association, 6(5), 428–434.

Bonaccio, S., & Dalal, R. S. (2006). Advice taking and decision-making: An integrative literature review, and implications for the organizational sciences. Organizational Behavior and Human Decision Processes, 101(2), 127–151. doi:10.1016/j.obhdp.2006.07.001

Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34(7), 571–582. doi:10.1037/0003-066X.34.7.571

Derks, D., Fischer, A. H., & Bos, A. E. R. (2008). The role of emotion in computer-mediated communication: A review. Computers in Human Behavior, 24(3), 766–785. doi:10.1016/j.chb.2007.04.004

Dietvorst, B. J., Simmons, J. P., & Massey, C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1), 114–126. doi:10.1037/xge0000033

Dietvorst, B. J., Simmons, J. P., & Massey, C. (2016). Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them (SSRN Scholarly Paper No. ID 2616787). Rochester, NY: Social Science Research Network. Retrieved from http://papers.ssrn.com/abstract=2616787

Dijkstra, J. J., Liebrand, W. B. G., & Timminga, E. (1998). Persuasiveness of expert systems. Behaviour & Information Technology, 17(3), 155–163. doi:10.1080/014492998119526

Dzindolet, M. T., Pierce, L. G., Beck, H. P., & Dawe, L. A. (2002). The perceived utility of human and automated aids in a visual detection task. Human Factors, 44(1), 79–94. doi:10.1518/0018720024494856

Esmaeilzadeh, P., Sambasivan, M., Kumar, N., & Nezakati, H. (2015). Adoption of clinical decision support systems in a developing country: Antecedents and outcomes of physician's threat to perceived professional autonomy. International Journal of Medical Informatics, 84(8), 548–560. doi:10.1016/j.ijmedinf.2015.03.007

Feng, B., & Feng, H. (2013). Examining cultural similarities and differences in responses to advice: A comparison of American and Chinese college students. Communication Research, 40(5), 623–644. doi:10.1177/0093650211433826

Feng, B., & MacGeorge, E. L. (2010). The influences of message and source factors on advice outcomes. Communication Research, 37(4), 553–575. doi:10.1177/0093650210368258

Fildes, R., & Goodwin, P. (2013). Forecasting support systems: What we know, what we need to know. International Journal of Forecasting, 29(2), 290–294. doi:10.1016/j.ijforecast.2013.01.001

Fildes, R., Goodwin, P., Lawrence, M., & Nikolopoulos, K. (2009). Effective forecasting and judgmental adjustments: An empirical evaluation and strategies for improvement in supply-chain planning. International Journal of Forecasting, 25(1), 3–23. doi:10.1016/j.ijforecast.2008.11.010

Gefen, D., Karahanna, E., & Straub, D. W. (2003). Trust and TAM in online shopping: An integrated model. MIS Quarterly, 27(1), 51–90.

Goodwin, P., Fildes, R., Lawrence, M., & Nikolopoulos, K. (2007). The process of using a forecasting support system. International Journal of Forecasting, 23(3), 391–404. doi:10.1016/j.ijforecast.2007.05.016

Goodwin, P., Gönül, M., & Önkal, D. (2013). Antecedents and effects of trust in forecasting advice. International Journal of Forecasting, 29(2), 354–366. doi:10.1016/j.ijforecast.2012.08.001

Hoff, K. A., & Bashir, M. (2015). Trust in automation: Integrating empirical evidence on factors that influence trust. Human Factors, 57(3), 407–434. doi:10.1177/0018720814547570

Hoffman, S. (2003). Unmanaged care: Towards moral fairness in health care coverage. Indiana Law Journal, 78(2), 659–721.

Horton, D., & Wohl, R. (1956). Mass communication and para-social interaction: Observations on intimacy at a distance. Psychiatry, 19(3), 215–229.

Inthorn, J., Tabacchi, M. E., & Seising, R. (2015). Having the final say: Machine support of ethical decisions of doctors. In S. P. V. Rysewyk & M. Pontier (Eds.), Machine medical ethics (pp. 181–206). Berlin, Germany: Springer. Retrieved from http://link.springer.com/chapter/10.1007/978-3-319-08108-3_12

Jodlbauer, B., & Jonas, E. (2011). Forecasting clients' reactions: How does the perception of strategic behavior influence the acceptance of advice? International Journal of Forecasting, 27(1), 121–133. doi:10.1016/j.ijforecast.2010.05.008

Kahn, P. H., Ishiguro, H., Friedman, B., & Kanda, T. (2006). What is a human? Toward psychological benchmarks in the field of human–robot interaction. In 15th IEEE International Symposium on Robot and Human Interactive Communication, 2006 (pp. 364–371). doi:10.1109/ROMAN.2006.314461

Kahn, P. H., Kanda, T., Ishiguro, H., Gill, B. T., Ruckert, J. H., Shen, S., … Severson, R. L. (2012). Do people hold a humanoid robot morally accountable for the harm it causes? In 2012 7th ACM/IEEE International Conference on Human–Robot Interaction (HRI) (pp. 33–40).

Kumar, N., & Benbasat, I. (2002). Para-social presence and communication capabilities of a web site: A theoretical perspective. E-Service Journal, 1(3), 5–24.

Lawrence, M., Goodwin, P., O'Connor, M., & Önkal, D. (2006). Judgmental forecasting: A review of progress over the last 25 years. International Journal of Forecasting, 22(3), 493–518. doi:10.1016/j.ijforecast.2006.03.007

MacGeorge, E. L. (2016). Advice: Expanding the communication paradigm. Annals of the International Communication Association, 40, 213–243.

MacGeorge, E. L., Guntzviller, L. M., Hanasono, L. K., & Feng, B. (2013). Testing advice response theory in interactions with friends. Communication Research, 43(2), 211–231. doi:10.1177/0093650213510938

Madhavan, P., & Wiegmann, D. A. (2007a). Effects of information source, pedigree, and reliability on operator interaction with decision support systems. Human Factors, 49(5), 773–785. doi:10.1518/001872007X230154

Madhavan, P., & Wiegmann, D. A. (2007b). Similarities and differences between human–human and human–automation trust: An integrative review. Theoretical Issues in Ergonomics Science, 8(4), 277–301. doi:10.1080/14639220500337708

Mayer, R. C., Davis, J. H., & Schoorman, F. D. (1995). An integrative model of organizational trust. Academy of Management Review, 20(3), 709–734. doi:10.5465/AMR.1995.9508080335

Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis, MN: University of Minnesota Press.

Önkal, D., Goodwin, P., Thomson, M., Gönül, M., & Pollock, A. (2009). The relative influence of advice from human experts and statistical methods on forecast adjustments. Journal of Behavioral Decision Making, 22(4), 390–409. doi:10.1002/bdm.637

Parasuraman, R., de Visser, E., Wiese, E., & Madhavan, P. (2014, October). Human trust in other humans, automation, robots, and cognitive agents: Neural correlates and design implications. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 58(1), Chicago, IL, 340–344. https://doi.org/10.1177/1541931214581070

Paravastu, N., Gefen, D., & Creason, S. B. (2014). Understanding trust in IT artifacts: An evaluation of the impact of trustworthiness and trust on satisfaction with antiviral software. SIGMIS Database, 45(4), 30–50. doi:10.1145/2691517.2691520

Prahl, A., Dexter, F., Braun, M. T., & Van Swol, L. (2013). Review of experimental studies in social psychology of small groups when an optimal choice exists and application to operating room management decision-making. Anesthesia & Analgesia, 117(5), 1221–1229. doi:10.1213/ANE.0b013e3182a0eed1

Ring, P. S. (1996). Fragile and resilient trust and their roles in economic exchange. Business & Society, 35(2), 148–175. https://doi.org/10.1177/000765039603500202

Schoorman, F. D., Mayer, R. C., & Davis, J. H. (2007). An integrative model of organizational trust: Past, present, and future. Academy of Management Review, 32(2), 344–354. doi:10.2307/20159304

Schrah, G. E., Dalal, R. S., & Sniezek, J. A. (2006). No decision-maker is an island: Integrating expert advice with information acquisition. Journal of Behavioral Decision Making, 19(1), 43–60. doi:10.1002/bdm.514

Sniezek, J. A., Schrah, G. E., & Dalal, R. S. (2004). Improving judgement with prepaid expert advice. Journal of Behavioral Decision Making, 17(3), 173–190. doi:10.1002/bdm.468

Sniezek, J. A., & Van Swol, L. (2001). Trust, confidence, and expertise in a judge–advisor system. Organizational Behavior and Human Decision Processes, 84(2), 288–307. doi:10.1006/obhd.2000.2926

Tetlock, P. E., & Gardner, D. (2015). Superforecasting: The art and science of prediction. New York, NY: Crown.

Van Swol, L. (2009). The effects of confidence and advisor motives on advice utilization. Communication Research, 36(6), 857–873. doi:10.1177/0093650209346803

Van Swol, L. (2011). Forecasting another's enjoyment versus giving the right answer: Trust, shared values, task effects, and confidence in improving the acceptance of advice. International Journal of Forecasting, 27(1), 103–120. doi:10.1016/j.ijforecast.2010.03.002

Van Swol, L. M., & Sniezek, J. A. (2005). Factors affecting the acceptance of expert advice. British Journal of Social Psychology, 44(3), 443–461. https://doi.org/10.1348/014466604X17092

Van Swol, L., MacGeorge, E. L., & Prahl, A. (2015). The effects of advice solicitation, confidence, and expertise on advice utilization. Presented at the International Communication Association Conference, San Juan, Puerto Rico.

Walther, J. B. (1996). Computer-mediated communication: Impersonal, interpersonal, and hyperpersonal interaction. Communication Research, 23(1), 3–43. doi:10.1177/009365096023001001

Whittaker, S. (2003). Theories and methods in mediated communication. In A. Graesser, M. Gernsbacher, & S. Goldman (Eds.), The handbook of discourse processes (pp. 253–293). Mahwah, NJ: Erlbaum.

How to cite this article: Prahl A, Van Swol L. Understanding algorithm aversion: When is advice from automation discounted? Journal of Forecasting. 2017;36:691–702. https://doi.org/10.1002/for.2464

APPENDIX: SCREENSHOTS OF FORECASTING TASK

After entering an initial forecast, advice is given.

Finally, error is shown before moving to the next trial.
