

Designing an Automated Agent to Encourage Human Reliance


Christopher J. Garnick, Jason M. Bindewald, Christina F. Rusnock
Air Force Institute of Technology
Wright-Patterson AFB, OH
{christopher.garnick, jason.bindewald, christina.rusnock}@afit.edu

Human reliance on automated agents can be critically important, as exemplified by a pilot relying on an
automated ground collision avoidance system. While it is important that the automated agent perform a task
well, thus promoting reliance on the automation, it is difficult to test human reliance on automated agents in
safety-critical systems. This paper presents an automated agent designed to enable testing of human
reliance on automation in the Space Navigator environment. The automated agent performs collision
detection and avoidance tasks in the environment, aiding the human participant in real-time. We present a
collision detection and avoidance model, comparing three potential methods for collision avoidance.
Analysis shows that the new agent's performance when teamed with another simulated agent improves
upon previous individual human and human-agent team performances in the same environment, thus
making it logical for humans to rely upon it. A human-subjects study confirms that the resulting automated
agent/environment pairing enables human reliance studies in a low-stakes automation environment.

INTRODUCTION

In the realm of human-machine teams, trust directly influences the level to which a human relies on a system (Lee, 2004; Parasuraman, 1997). When humans perceive the machine to be unreliable, they display low levels of trust. If the system performs appropriately, improper distrust may lead to failures from ignoring or turning off automation (disuse). On the opposite end, when humans perceive the machine to be highly reliable, they show high levels of trust. Improper overtrust in a system results in failures from overestimating automation capabilities (misuse). In areas such as semi-autonomous aircraft (Lin, 2008) or cars (Neumann, 2016), improper use of automation may have disastrous results. These failures illustrate the need for human trust to be calibrated to automation's true capabilities (Lee, 2004). Understanding trust calibration is necessary to design effective human-machine systems engendering appropriate trust.

Human Reliance

In order to calibrate for appropriate trust, there must be a means to measure trust. Subjective trust measurements can be retrieved from Likert-type questionnaires that consist of self-reported trust in automation. However, subjective measures cannot be relied upon alone, as outside factors such as self-confidence often influence these results (Lee, 1994). For this reason, the human behaviors of reliance and compliance are often used as objective measures related to trust. The terms stem from alarm-based autonomy, which alerts the operator when automation is working properly or when it fails. Specifically, reliance refers to an operator's state when an automation's alarm is silent, while compliance refers to the operator's state when the alarm sounds (Lee, 2004). Dixon and Wickens (Dixon, 2006) utilized this definition of reliance in their research when analyzing how reliance in alarm-based automation relates to workload.

As technologies improved, autonomous systems gained the capability to diagnose failure or recommend operator actions. Ross et al. (Ross, 2008) performed a human-subject experiment in order to measure reliance on diagnosis-based automation. In this study, participants were tasked to identify critical objects in a video clip. After each video, an automation aid provided a recommendation of critical objects. Since this autonomy was not alarm-based, Ross et al. adapted the definition of reliance to represent the rate at which participants accepted the automation's diagnosis.

The Current Study

In today's world of self-driving cars and semi-autonomous aircraft, automation has the capability to perform actions, rather than simply providing alarms, diagnosing failures, or suggesting actions. In order to analyze trust in action-based automation, Boubin et al. (Boubin, 2017) redefine the reliance and compliance definitions used in the alarms literature. In their work, reliance refers to the acceptance of automation non-action by the human, and compliance refers to the acceptance of automation action by the human.

Table 1 breaks down reliance and compliance into the human actions which exhibit trust or distrust. In reliance-based automation, a human displays trust when they remain passive, letting the automation perform its own task while they focus on other system tasks. Meanwhile, distrust is shown when the human preempts the automation, performing the automation's task

on their own. In compliance-based automation, a human displays trust when they concur with or obey the automation's action. Distrust is shown when the human overrides or ignores the automation's action.

Table 1: The human behaviors of reliance and compliance with their associated actions that exhibit trust or distrust in automation.

  Human Behavior    Human Action (Trust)    Human Action (Distrust)
  Reliance          Passive                 Preempt
  Compliance        Concur/Obey             Override/Ignore

This study presents a reliance-based automated agent within a human-machine environment where it is logical for the human to rely upon its actions and non-actions, meaning that the human should, with high confidence, expect improved performance as a result of relying upon the agent. The results of the human-machine team are then analyzed to show that human-subject experimentation using this agent is suitable for future reliance research to aid in better human-machine designs. The paper is laid out as follows: the Space Navigator environment and proposed reliance-based automated agent are introduced; the collision detection and avoidance portions of the agent are described; the agent is evaluated to show that it is logical for humans to rely upon it; and the results of a human-subject experiment are analyzed to show that the environment provides a means to measure human reliance in automation.

APPLICATION ENVIRONMENT

The Space Navigator environment is a tablet-based air traffic control computer game specifically designed for the evaluation of human-machine teams (Bindewald, 2014). The game provides a suitable framework to build a reliance measurement system, as it allows access to source code to further develop autonomous agents and has the ability to capture in-game human- and agent-driven events. Seen in Figure 1, Space Navigator has four components: ships, planets, no-fly zones, and bonuses. Every two seconds, a ship spawns at a random location off-screen, moving in a random fixed direction across the screen at a constant velocity. Each ship is colored to match one of four possible destination planets. Human players can guide each ship to any location by pressing down on the tablet's touch screen and drawing a trajectory.

Figure 1: A screen capture of Space Navigator with components labeled.

Space Navigator Tasks

Space Navigator tasks the human player to achieve the highest score within 5 minutes. Points are acquired in one of two ways: +100 points for safely guiding a ship to its home planet and +50 points for guiding a ship through a bonus. Points are lost in one of two ways: -100 points per ship for colliding ships and -10 points per second for each ship flying through one of several no-fly zones. Looking at this game from the perspective of human-machine teaming, there exist four distinct score-related tasks for either the human player or the automation to perform: routing ships to planets, routing ships through bonuses, avoiding no-fly zones, and avoiding collisions.

Space Navigator Agents

Previous human-machine studies in the Space Navigator environment (Bindewald, 2016; Boubin, 2017; Goodman, 2016) involved playing with a simple-reflex agent tasked to route ships to planets. This agent, known as the "line agent," performs the compliance-based task of generating a straight-line trajectory, where players show trust in the automation's action if they comply with the generated trajectory. Other variations of the line agent have been designed, such as one that generates trajectories similar to the human player's (Bindewald, 2016).

In order to measure human reliance, Space Navigator requires an agent that executes a reliance-based task. The task of avoiding collisions is a good candidate. If players trust the autonomous "collision avoidance agent," they rely on the agent to re-route collision-imminent ships. On the other hand, if players do not trust the agent, they re-route collision-imminent ships themselves.

AGENT DESIGN

Success of the reliance measurement system depends on the proposed collision avoidance agent's performance. For this reason, importance lies in the implementation of collision detection, collision avoidance, and agent evaluation. This section covers the design of collision detection and avoidance, as well as evaluation of the agent in order to show that it is logical for humans to rely upon it.

Collision Detection

In the Space Navigator environment, collision detection is accomplished via multiple interference detection (Jiménez, 2001). This deterministic strategy involves repeatedly sampling the ships' positions as a function of time to determine if any intersect. Take, for example, two ships S1 and S2 with radii r1 and r2 and trajectories that can be represented as piece-wise functions of position over time, P1(t) and P2(t), respectively. The distance between these two ships at any time t is calculated as D(t) = ||P1(t) - P2(t)||. If at any future time D(t) ≤ r1 + r2, then the ships intersect and there exists a future collision. In order to detect each future collision, the process requires comparing the positions of every pair of ships over a set range of time.
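
To make this sampling scheme concrete, the following Python sketch checks every pair of ships at fixed time steps over a finite look-ahead window and reports the earliest predicted collision. It is a minimal illustration only: the Ship class, the position_at callback, and the horizon and step values are assumptions made for the sketch, not the Space Navigator implementation.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot
from typing import Callable, List, Optional, Tuple

@dataclass
class Ship:
    """Minimal stand-in for a Space Navigator ship (hypothetical fields)."""
    name: str
    radius: float
    position_at: Callable[[float], Tuple[float, float]]  # piece-wise trajectory P(t)

def detect_first_collision(ships: List[Ship],
                           horizon: float = 10.0,
                           step: float = 0.1) -> Optional[Tuple[float, Ship, Ship]]:
    """Sample every pair of ships over [0, horizon] seconds and return the
    earliest sampled time t at which D(t) <= r1 + r2, or None if no collision
    is predicted within the look-ahead window."""
    steps = int(horizon / step)
    for i in range(steps + 1):
        t = i * step
        for s1, s2 in combinations(ships, 2):
            x1, y1 = s1.position_at(t)
            x2, y2 = s2.position_at(t)
            if hypot(x1 - x2, y1 - y2) <= s1.radius + s2.radius:
                return t, s1, s2
    return None

# Illustrative use: two ships on crossing straight-line paths.
a = Ship("A", 0.5, lambda t: (t, 0.0))
b = Ship("B", 0.5, lambda t: (5.0, t - 5.0))
print(detect_first_collision([a, b]))  # collision first sampled near t = 4.3
```

The choice of sampling step trades detection accuracy against computation: too coarse a step can miss brief overlaps between fast-moving ships.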

Collision Avoidance

With the knowledge of the next future collision, the collision avoidance agent has the responsibility to act upon one of the involved ships to try to prevent it. Collision avoidance consists of two stages: ship selection and trajectory re-routing.

The decision of which ship to re-route considers many different factors: environment congestion, ship lifespan, and distance from the screen edge. The method chosen is to re-route the collision-bound ship furthest from its destination. This scheme increases agent predictability, since a ship's distance from its destination is much easier for a human operator to perceive than factors such as ship lifespan or congestion. An increase in predictability will allow players to better anticipate the chosen ship, which may in turn increase their baseline reliance on the agent. Moreover, this scheme decreases airspace congestion. By not operating on the ship closer to its destination, it is more likely that the closer ship will reach its destination sooner, leaving fewer ships on the screen.

With the ship selection scheme determined, all that remains is the methodology for re-routing a ship's trajectory. After considering many possible re-route methods, three potential re-routing techniques were chosen for testing: the step-back route, the least congested route, and the pseudo-random route. In Space Navigator, trajectories are composed of a series of points. In order to preserve a player's intended ship destination and direction, all three re-routing techniques retain the last two points of the ship's original trajectory.

The step-back route. The notion behind the first re-routing scheme is to have the ship "take a step back" before moving forward with its original trajectory. This delays the ship's movement in order to avoid the detected collision.

The least congested route. The next re-routing methodology considers all ship trajectories in order to find a least congested route. It does so by representing Space Navigator's airspace as a grid graph. Each vertex of the graph has a weight that corresponds to the congestion of trajectories passing through it. This routing scheme then performs Dijkstra's shortest path algorithm (Cormen, 2009), where path costs include vertex weights, to generate a least congested trajectory to the original destination (a sketch of this scheme follows the three route descriptions).

The pseudo-random route. The final routing scheme generates a new pseudo-random route. It does not take into account the ship's original trajectory or any environment information. Rather, this methodology utilizes a random number generator to determine the direction of trajectory segments. The random segments are iteratively generated until the trajectory reaches the original destination.
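
To illustrate the least congested scheme, the Python sketch below runs Dijkstra's algorithm over a small grid of congestion weights. It is a sketch under stated assumptions: the grid size, 4-connected movement, and unit step cost are illustrative choices, and a real agent would still convert the returned cell path into trajectory points and append the retained final points of the ship's original trajectory.

```python
import heapq
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, col) index into the congestion grid

def least_congested_route(congestion: List[List[float]],
                          start: Cell, goal: Cell) -> List[Cell]:
    """Dijkstra's algorithm over a 4-connected grid graph whose vertex
    weights encode how many trajectories already pass through each cell."""
    rows, cols = len(congestion), len(congestion[0])
    dist: Dict[Cell, float] = {start: 0.0}
    prev: Dict[Cell, Cell] = {}
    frontier: List[Tuple[float, Cell]] = [(0.0, start)]
    while frontier:
        d, cell = heapq.heappop(frontier)
        if cell == goal:
            break
        if d > dist.get(cell, float("inf")):
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols:
                # Path cost = unit step cost plus the neighbour's congestion weight.
                nd = d + 1.0 + congestion[nr][nc]
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    prev[nxt] = cell
                    heapq.heappush(frontier, (nd, nxt))
    if goal != start and goal not in prev:
        return []  # no route found (cannot happen on a fully connected grid)
    # Walk back from the goal to recover the least congested cell path.
    path, cell = [goal], goal
    while cell != start:
        cell = prev[cell]
        path.append(cell)
    return list(reversed(path))

# Illustrative use on a tiny 3x3 grid with a heavily congested centre cell.
grid = [[0.0, 0.0, 0.0],
        [0.0, 9.0, 0.0],
        [0.0, 0.0, 0.0]]
print(least_congested_route(grid, (0, 0), (2, 2)))  # routes around the centre
```

Because the congestion weights are non-negative, Dijkstra's algorithm returns an optimal path here; negative weights would require a different shortest-path method.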

Agent Evaluation

For the collision avoidance agent to be effective, it must provide sufficient collision avoidance such that it is logical for the player to rely on it. This effectiveness is confirmed by comparing game-play data with and without the collision avoidance agent. A previous Space Navigator human-subject study (Bindewald, 2016) resulted in score and collision measurements from 108 games across 36 subjects, where the subject plays with no autonomous aid and with the line agent. The following experiment repeats the previous human-subject study with the collision avoidance agent in place of the human in order to show why subjects should rely upon it. This experiment consists of gathering score and collision measurements for 108 games, across four scenarios consisting of games with the line agent alone and paired with each collision avoidance method.

Performance Without Collision Avoidance Agent. Since the purpose of this experiment is to determine whether or not it is logical for a human to rely upon the collision avoidance agent, game-play without the collision avoidance agent provides a baseline of how well the line agent and human player perform in Space Navigator. Table 2 shows the mean number of ships involved in a collision, the mean score, and their associated 95% confidence intervals across 108 five-minute games. This baseline confirms that human players perform statistically better than the line agent in both collision avoidance and score metrics. However, once the line agent and human player work together, they significantly outperform both cases of playing alone.

Table 2: Average baseline performance results across 108 five-minute games with the human and/or line agent.

  Player and/or Agent    Collisions Mean   95% CI    Score Mean   95% CI
  Line Agent             41.3              ±1.39     4172         ±274
  Human                  30.5              ±2.32     5027         ±445
  Human & Line           23.2              ±1.65     7117         ±306

Performance With Collision Avoidance Agent. The second set of testing involves the collision avoidance agent aiding the line agent in game-play. Table 3 shows the mean number of ships involved in a collision, the mean score, and their associated 95% confidence intervals across 108 five-minute games. These results show that the step-back and least congested methods significantly outperform the pseudo-random routing scheme. This is expected, as the pseudo-random method does not necessarily try to avoid the detected collision, but instead simply creates a new "random" trajectory. Comparing the step-back and least congested methods, the results show relatively similar performances. However, the two methods' 95% confidence intervals do not overlap in either metric, confirming that, when combined with the line agent, the step-back method performs statistically best in both collision avoidance and score.

Table 3: Average collision avoidance performance results across 108 five-minute games with the line agent and each of the collision avoidance methods.

  Agents                    Collisions Mean   95% CI    Score Mean   95% CI
  Line & Step-Back          5.6               ±0.62     10647        ±145
  Line & Least Congested    9.4               ±0.72     9749         ±182
  Line & Pseudo-Random      27.6              ±1.16     6475         ±249

Performance Overall. Results from Tables 2 and 3 allow for side-by-side comparisons of the collision and score metrics across all scenarios. These comparisons, seen in Figures 2 and 3, display performance metrics for the line agent, the human, the human/line agent team, and the line agent teamed with the step-back, least congested (LC), and pseudo-random (Random) collision avoidance methods. Figures 2 and 3 show that the collision avoidance agent using the step-back method teamed with the line agent performs collision avoidance significantly better and scores higher than the human/line team, resulting in a hypothesis that a human/step-back team will perform as well as any other human/line or human/collision avoidance team. These results confirm that it is logical for the human to rely upon the step-back agent methodology for the task of avoiding collisions. Therefore, Space Navigator using this agent provides a useful tool for measuring human reliance on automation in a human-machine team.

Figure 2: Mean number of collisions with 95% confidence intervals across 108 five-minute games for each examined set of players.

Figure 3: Average score with 95% confidence intervals across 108 five-minute games for each examined set of players.

HUMAN-SUBJECT EXPERIMENT

Development of the collision avoidance agent in Space Navigator enabled a human-subject experiment to test the measurement of human reliance. Participants were tasked to play Space Navigator with the collision avoidance agent at varying levels of reliability in order to see if they rely upon the agent appropriately. This section describes the experiment design, the implementation of reduced reliability, and a means to measure human reliance.

Experiment Details

Twenty-four participants were tasked to achieve the highest possible score within Space Navigator. They were informed of the collision avoidance agent's capabilities and the possibility of reduced reliability.

Design. The experiment consisted of a within-subjects block-based design, playing Space Navigator with two blocks of 100% reliability and four blocks of degraded reliability rates: 95%, 90%, 80%, and 70%. The sequence of blocks began and ended with 100% reliability, while the intermediate blocks employed the degraded reliability rates in a random order. Each of the twenty-four participants experienced a unique ordering. The blocking scheme aimed to reduce variability from extraneous factors such as the order in which a participant experienced the reduced reliability rates.

Reduced reliability. The various reliability levels were defined by a reliability rate (Equation 1), which represents the percentage of detected collisions acted upon by the agent. An agent with 100% reliability never ignores a detected collision.

  reliability rate = (detected collisions acted upon by the agent) / (total detected collisions)    (1)

Reliance rate. An act of reliance in the proposed environment occurs in two scenarios: when the collision avoidance agent re-routes a collision-bound ship and when it ignores an imminent collision. The first scenario captured events where the human relied upon the agent, and the agent performed its job reliably. Meanwhile, the second scenario captured events where the human relied upon the agent, but the agent failed to perform its job. The environment also allows recordings of unreliance. This term refers to occasions when the participant performs collision avoidance before the collision avoidance agent does. Specifically, we define unreliance as every time the participant preemptively draws a trajectory for a collision-bound ship.

Recording acts of reliance and unreliance during a game of Space Navigator provided the data required to calculate a participant's reliance rate. The reliance rate, seen in Equation 2, holds the overall percentage that a participant relied upon the collision avoidance agent and provides an objective measurement related to trust.

  reliance rate = (acts of reliance) / (acts of reliance + acts of unreliance)    (2)
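
As a minimal illustration of how these two rates could be tallied from logged game events, the Python sketch below applies Equations 1 and 2 to hypothetical counts. The function names and example numbers are assumptions made for the sketch, not the study's logging format or data.

```python
def reliability_rate(collisions_acted_upon: int, collisions_detected: int) -> float:
    """Equation 1: the share of detected collisions the agent acted upon."""
    return collisions_acted_upon / collisions_detected if collisions_detected else 1.0

def reliance_rate(acts_of_reliance: int, acts_of_unreliance: int) -> float:
    """Equation 2: the share of collision events where the participant let the agent act."""
    total = acts_of_reliance + acts_of_unreliance
    return acts_of_reliance / total if total else 0.0

# Hypothetical block of play: the agent acted on 18 of 20 detected collisions,
# and the participant preempted the agent on 9 collision-bound ships.
print(f"reliability = {reliability_rate(18, 20):.0%}")  # 90%
print(f"reliance    = {reliance_rate(21, 9):.0%}")      # 70%
```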

RESULTS

Performing the human-subject experiment resulted in 144 reliance rate recordings. Table 4 holds the average reliance rates and their 95% confidence intervals from the twenty-four participants across six blocks of various reliability rates. Figure 4 graphically displays the results from Table 4. The data suggest a correlation between the reliance and reliability rates. In order to evaluate the relationship, a Pearson's correlation was performed, resulting in a correlation coefficient of r = 0.352 and a significance value of p < 0.0001, confirming a significant positive correlation between reliability and reliance. Therefore, the participants rely upon the agent appropriately, supporting the hypothesis that this environment provides a suitable means to measure reliance.

Table 4: Human-subject experiment results of reliance rate across each reliability rate.

  Reliability Rate    Reliance Rate Mean    95% CI
  100%                66.0%                 ±2.36
  95%                 63.6%                 ±3.78
  90%                 61.6%                 ±4.00
  80%                 58.3%                 ±3.26
  70%                 57.4%                 ±3.74

Figure 4: Human-subject experiment results of reliance rate across each reliability rate.

DISCUSSION

The present study covered the design and evaluation of a reliance-based autonomous agent, showing that it is logical for a human to rely upon it. It then summarized the design and results of a human-subject experiment to capture human reliance on automation with varying reliability. Since a significant positive correlation between reliability and reliance exists, the results confirm that participants relied upon the agent appropriately, reinforcing that the collision avoidance agent within Space Navigator provides a human-machine team environment where reliance on automation can be measured.

The development of a reliance-measurement system opens the door for further research on reliance in autonomy. Possibilities for future work include analysis of reliance on automations that perform tasks via different methods, relationships between reliance and subjective trust metrics, the relationship between reliance and compliance, and the confounding factors that contribute to acts of unreliance (Rusnock, Miller, & Bindewald, 2017). Research in these areas and more will aid in the study of trust calibration and the design of automation which engenders appropriate trust to mitigate failures from misuse and disuse.

REFERENCES

Bindewald, J., Miller, M., & Peterson, G. (2014). A function-to-task process model for adaptive automation system design. International Journal of Human-Computer Studies, 822-834.

Bindewald, J., Peterson, G., & Miller, M. (2016). Clustering-based online player modeling. International Joint Conference on Artificial Intelligence (IJCAI) - Computer Games Workshop.

Boubin, J., Rusnock, C., Bindewald, J., & Miller, M. (2017). Quantifying compliance and reliance trust behaviors to influence trust in human-automation teams. Manuscript submitted for publication.

Cormen, T., Leiserson, C., Rivest, R., & Stein, C. (2009). Introduction to Algorithms. The MIT Press.

Dixon, S., & Wickens, C. (2006). Automation reliability in unmanned aerial vehicle control: A reliance-compliance model of automation dependence in high workload. Human Factors: The Journal of the Human Factors and Ergonomics Society, 474-486.

Goodman, T., Miller, M., & Rusnock, C. (2016). Timing within human-agent interaction and its effects on team performance and human behavior. 2016 IEEE International Multi-Disciplinary Conference on Cognitive Methods in Situation Awareness and Decision Support (CogSIMA).

Jiménez, P., Thomas, F., & Torras, C. (2001). 3D collision detection: A survey. Computers & Graphics, 269-285.

Lee, J., & Moray, N. (1994). Trust, self-confidence, and operators' adaptation to automation. International Journal of Human-Computer Studies, 153-184.

Lee, J., & See, K. (2004). Trust in automation: Designing for appropriate reliance. Human Factors: The Journal of the Human Factors and Ergonomics Society, 50-80.

Lin, P., Bekey, G., & Abney, K. (2008). Autonomous Military Robotics: Risk, Ethics, and Design. San Luis Obispo: California Polytechnic State University.

Neumann, P. (2016). Risks of automation: A cautionary total-system perspective of our cyberfuture. Communications of the ACM, 26-30.

Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors: The Journal of the Human Factors and Ergonomics Society, 230-253.

Ross, J., Szalma, J., Hancock, P., Barnett, J., & Taylor, G. (2008). The effect of automation reliability on user automation trust and reliance in a search-and-rescue scenario. Human Factors and Ergonomics Society 52nd Annual Meeting (pp. 1340-1344).

Rusnock, C. F., Miller, M. E., & Bindewald, J. M. (2017). Framework for trust in human-automation teams. Institute of Industrial and Systems Engineers - Annual Conference and Expo (IISE 2017). Pittsburgh.
