
Trust Measurement in Human–Automation Interaction: A Systematic Review

Matthew Brzowski and Dan Nathan-Roberts, Ph.D.


Industrial and Systems Engineering, San José State University

Copyright 2019 by Human Factors and Ergonomics Society. DOI 10.1177/1071181319631462

This systematic review summarizes current measurements of trust in human–automation interaction. A total of 217 articles were found, of which 44 contained relevant information and met the inclusion criteria. The results of the review showed that 75% (n = 33) of articles used subjective measures of trust only, and 41% (n = 18) used researcher-defined methods of measuring trust instead of peer-reviewed and validated scales. Of 10 defined industries, the highest number of articles (n = 14) was assigned to the automotive industry, followed by aviation, military, and security (n = 6 each). The automated systems studied in the relevant articles were decision aids, automated control and navigation systems, and process control systems. This review showed that research on trust in human–automation interaction (1) tends to use subjective measures of trust as the primary or only measure, (2) tends to define trust and its measurement individually per study, and (3) is heavily composed of research on automotive automation. Best practices and future research are discussed.

INTRODUCTION

Trust is difficult to define, yet the body of research on human–automation interaction (HAI) dating back to 1973 and before suggests that trust could play a large role in the successful adoption and subsequent use of automated systems (Lee & Moray, 1994). The current literature on HAI lacks a consensus on a formal definition of trust, and investigators often choose to define it individually.

The difficulty of measuring trust is due in part to its multifaceted nature. Larzelere and Huston (1980) observed that factors such as predictability, reliability, and dependability play important parts in trust, but other studies have shown that elements such as accurate mental models of the system (Muir, 1987) and the system's performance (Lee, 1991) could have an equal role in determining trust (Balfe et al., 2018).

A paper published in Technological Forecasting and Social Change (Frey & Osborne, 2017) suggests that 47% of jobs in the United States are at risk of automation, and other studies show that the workforce is already becoming more automated (Acemoglu & Restrepo, 2017). Increases in workplace automation bring an increased number of interactions between humans and that automation, so it is more important than ever to understand and improve that interaction.

Humans often fail to rely on automation properly (Lee & See, 2004), displaying overconfidence in the system or in themselves, and trust calibration plays an important part in the successful adoption of such automation (Lee & Moray, 1994; Muir, 1987; Sheridan, 1999). Understanding and accurately measuring trust in automated systems is the first step to optimizing it, ensuring successful and helpful adoption:

“…Operators tend to use automation that they trust while rejecting automation that they do not.” (Pop et al., 2015)

Without consideration for trust in these automated systems, America's industries will find that automation adoption is difficult for employees, and high-risk industries such as the military and medicine will find users neglecting to use such systems in favor of their own judgement, even when that judgement may be at a fundamental disadvantage (Alexander et al., 2018; Crowley et al., 2013).

The current body of literature on human–automation interaction fails to consistently define and measure trust. To validate measures of trust in human–system interaction, researchers must first standardize definitions of trust and the methods used to measure it. This review aims to provide a foundation for trust measurement in HAI, stating its importance and reiterating the need to standardize trust definitions and measures.

GOAL OF THIS RESEARCH

The goals of this research were to (1) survey the current literature for all useful measures of trust with respect to human–automation interaction and (2) provide a formal record of the quantity and types of research that utilize each method of trust measurement.

METHOD

The following databases were searched for relevant articles: PsycINFO (the full list of covered journals can be found at https://www.apa.org/pubs/databases/psycinfo/coverage.aspx) and PsycARTICLES (the full list of covered journals can be found at https://www.apa.org/pubs/databases/psycarticles/coverage-list.aspx). The chosen keywords for the search were trust and automation, and the systematic review was conducted in October of 2018. The initial search was followed by a screen of titles and abstracts to identify articles that did not meet inclusion criteria.

After the initial screen, papers that met inclusion criteria were reviewed in detail.

Inclusion and Exclusion Criteria. Articles were included if they met all inclusion criteria and excluded if they failed to meet at least one of them. Inclusion criteria were (1) the article reported research involving participants, (2) the article was available for review in full text, (3) the article measured participant trust in addition to other measurements (i.e., trust was not the only measure in the study), and (4) the article measured human–automation, human–system, or human–computer trust where the primary goal of the relevant system was to automate some function in the environment. Articles were organized by measure of trust, industry, and type of automation.

In addition to the articles found in the search according to the criteria above, one article from external sources was included for review.

RESULTS

The initial search resulted in 217 articles. Following an initial screen, subsequent full-text review, and the addition of the external article, 44 articles were found to meet all inclusion criteria.

Measures

Articles that met the inclusion criteria were screened for the measure of trust employed in the research. All measures were marked as either subjective or objective. Some studies used only subjective measures, while others used only objective measures or both. The distributions are outlined in Table 1.

Table 1. Trust Measures by Type

  Measure            Articles
  subjective only          33
  objective only            5
  both                      6

Subjective Measures. Methods of trust measurement marked as “subjective” were self-reported ratings of the user's trust in the automated system studied in each article. Articles measured trust in the system before, during, or after interaction. All articles using subjective measures asked the participant to respond to one or more questions about their trust in the system, on a Likert scale (n = 34) or as a percentage (n = 4), with a measure either developed by the researchers alone (n = 18) or following a model such as the scales described below (n = 21).

A total of 21 articles used the following scales to collect self-reported trust information from their participants: the Scale of Trust in Automated Systems (n = 8), the scale developed by Muir (1989) (n = 8), the scale developed by Merritt et al. (2011) (n = 2), the Human–Computer Trust questionnaire (n = 2), or the German TiA Scale (n = 1). A total of 18 articles used methods defined individually by the researchers of each article. These methods are discussed in more detail in the discussion section.

Objective Measures. Objective measures varied greatly between studies, especially between industries. They included compliance (n = 3), acceptance rate (n = 4), and adoption rate (n = 1). Other individually defined objective measures included resource sharing, defined as the proportion of times a participant sought help from an automated decision assistant (n = 1); purchasing decisions (n = 1); glance rate in automotive research (n = 1); and credits staked, defined as the number of credits allocated to the automated system compared to credits kept by the user (n = 1).

Industry

Articles were assigned to only one of the industries listed in Table 2. The researchers categorized articles by industry based on the claims made by the research and the contextual information in each article, using a grounded-theory framework. This assignment resulted in the following distribution, organized by number of articles: automotive (n = 14), aviation (n = 6), military (n = 6), security (n = 6), medicine (n = 4), computer science (n = 3), manufacturing (n = 2), augmented reality (n = 1), maritime and nautical (n = 1), and unassigned (n = 1). The single unassigned article was a study of visual screening with an automated decision-assist system that offered no contextual information concerning industry and was therefore not assigned to any of the industries listed above.

Industry Measures. Table 2 outlines the measures used by the reviewed articles as a function of the industry to which each article was assigned. For example, of the 14 studies in the automotive industry, 11 used subjective methods as the only measures of trust, and the remaining 3 used both methods.

Table 2. Trust Measures by Industry (Articles by Measure)

  Industry            subjective only   objective only   both   total
  automotive                 11                –            3      14
  aviation                    2                1            3       6
  military                    6                –            –       6
  security                    5                1            –       6
  medicine                    2                2            –       4
  computer science            1                1            1       3
  manufacturing               2                –            –       2
  augmented reality           1                –            –       1
  nautical                    1                –            –       1
  unassigned                  1                –            –       1
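Mechanically, Table 2 is a cross-tabulation of two hand-assigned labels per article: an industry and a measure type. As a minimal sketch of that bookkeeping, the snippet below tallies (industry, measure) pairs; the article records shown are invented placeholders, not the review's coded dataset.

```python
from collections import Counter

# Each reviewed article is hand-coded with an industry label and a measure
# type ("subjective only", "objective only", or "both"). These records are
# illustrative placeholders, not the review's actual data.
articles = [
    {"id": "A01", "industry": "automotive", "measure": "subjective only"},
    {"id": "A02", "industry": "automotive", "measure": "both"},
    {"id": "A03", "industry": "aviation",   "measure": "objective only"},
    {"id": "A04", "industry": "military",   "measure": "subjective only"},
]

# Cross-tabulate: (industry, measure) pairs -> cell counts (the body of
# Table 2), and industry -> row totals.
cells = Counter((a["industry"], a["measure"]) for a in articles)
totals = Counter(a["industry"] for a in articles)

for industry, total in totals.most_common():
    row = {m: cells[(industry, m)]
           for m in ("subjective only", "objective only", "both")}
    print(industry, row, "total =", total)
```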

Type of Automation

Using a grounded-theory approach, articles that met the inclusion criteria were also grouped by the type of automated system that was studied. They were assigned to the three types of automation shown in Table 3; each type is described in detail below.

Table 3. Types of Automated Systems

  Type                               Articles
  decision aids                            19
  automated control and navigation         17
  process control                           8

Decision Aids. Automated systems whose primary objective was to assist the user in making decisions about the system, the environment, or the next step in a process were assigned to the decision aids type. This type includes systems such as automated navigation systems, automated alarm and warning systems, maze solution systems, and more.

Many of the automated systems studied in the relevant articles focused primarily on image and video processing (e.g., aerial target detection systems and X-ray screening automation) (n = 8). These systems were classified as decision aids because they met the criteria defined above.

Automated Control and Navigation. Automation systems of this type were systems whose primary objective was to automatically control the navigation of a vehicle or system that the user would normally operate. These systems included autonomous cruise control (ACC), automatic ground collision avoidance systems (AGCAS), and other systems whose main goal was navigation and that were directly responsible for providing control to the system or vehicle.

These systems are differentiated from decision aids in that the automated system itself was responsible for controlling the vehicle or navigation path but allowed for intervention. A navigation-focused decision aid simply provided navigation information to the primary human operator.

Process Control. Process control systems were automated systems whose primary objective was automation of the control and management of a given process. These included automated process controls such as medication management systems, simulated-world management agents, and air traffic control automations.

Systems assigned to this type of automation were differentiated from decision aids on the basis that they were directly responsible for controlling the process system. A process-focused decision aid allowed the human operator to hold primary control of the process and consult the automated system for aid in decision making.

After the review of relevant articles by automation type, no relationship was observed between type of automation and measure of trust other than what has already been stated. It is worth noting that a majority of relevant research on automated control and navigation systems belonged to articles assigned to the automotive industry (n = 12).

DISCUSSION

Results showed that the current literature on trust in human–automation interaction has a tendency to (1) use mostly or only subjective measures of trust, (2) define and measure trust in whatever manner the investigators determine to be useful for the system they are studying, and (3) consist largely of research in the automotive industry, with a proportionally high number of automations focused on navigation assistance or control.

Subjective Measures of Trust

A total of 75% of relevant articles (n = 33) used subjective measures as the only method to collect trust information in their research. A further 14% of relevant articles (n = 6) used subjective methods in conjunction with objective measures. Only 11% of relevant articles (n = 5) used no subjective measures of trust whatsoever. In other words, subjective measures are heavily relied on to measure trust. The measures referenced in the results are described below in greater detail.

Scale of Trust in Automated Systems. One of the two most frequently used scales (n = 8) was the Scale of Trust in Automated Systems developed by Jian et al. (2000). This scale draws heavily on the work of Sheridan (1988), Lee and Moray (1992), and Muir and Moray (1996). Jian, Bisantz, and Drury (2000) used three experiments to configure a 12-item scale measuring trust in automated systems, and the scale is clearly regarded as a source of authority in measuring trust in automation. However, it is important to note that:

“…the [Scale of Trust in Automated Systems] was developed with respect to a nondirected feeling of trust in automated systems, rather than trust in a specific system that the participants had experienced.” (Jian et al., 2000)

This fact may have important implications for the validity and accuracy of the scale when it is used to measure trust in a specific automated system.

Muir Scale (1989). The other most frequently used scale (n = 8), referenced by a large body of research on trust in automation, is the trust measure hypothesized by Muir (1989) and used by Lee and Moray (1992). This scale measures a user's trust in an automated system by asking four trust-focused questions answered on a scale of 1 (“not at all”) to 10 (“completely”).

Merritt et al. Scale (2011). The trust measure by Merritt et al. (2011) is a 5-point Likert-type scale that assesses a user's trust in an automated system. It was used by a pair of relevant articles (n = 2).

Human–Computer Trust. The Human–Computer Trust (HCT) Questionnaire is a subjective measure of human–computer trust developed by Madsen and Gregor (2000). It is a 25-item measure of “cognition-based” and “affect-based” trust. Madsen and Gregor showed that the scale has high internal consistency and construct validity, and it was used by two relevant articles (n = 2).
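Most of these instruments share the same scoring mechanics: item responses are collected on a bounded rating scale, negatively worded items (if any) are reverse-coded, and a composite is formed by averaging. The sketch below illustrates this arithmetic for a hypothetical 12-item, 7-point administration in the style of the Jian et al. checklist; the reverse-coded item positions and response format are illustrative assumptions, not a claim about any reviewed study.

```python
def score_trust_scale(responses, scale_max=7, reversed_items=range(0, 5)):
    """Composite score for a 12-item trust checklist.

    Assumes, for illustration only, a 7-point response format in which
    the first five items are negatively worded and must be reverse-coded
    before averaging. This is a common convention for Jian-style
    checklists, not a prescription from the articles reviewed here.
    """
    if len(responses) != 12:
        raise ValueError("expected 12 item responses")
    coded = [
        (scale_max + 1 - r) if i in reversed_items else r
        for i, r in enumerate(responses)
    ]
    return sum(coded) / len(coded)

# Example: low distrust ratings plus high trust ratings yield a composite
# well above the scale midpoint (here, roughly 5.9 out of 7).
print(score_trust_scale([2, 3, 2, 1, 2, 6, 5, 6, 7, 5, 6, 6]))
```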

German TiA Scale. The German TiA scale developed by Körber et al. (2015) is a subjective measure of trust in automated systems. It comprises 19 items on a Likert-type rating scale, with subscales for reliability and competence, familiarity, trust, understanding, and intention of developers. The scale has shown high internal consistency and construct validity. It was used by one article (n = 1).

Other Measures. More than any other single measure, 41% of articles (n = 18) used scales and questionnaires developed by the researchers themselves rather than a scale developed and validated by one of the authors above.

As stated above, 41% of the relevant articles that used subjective methods used individually defined methods instead of scales that have been validated as measures of trust. While individually developing a measure of trust may allow the investigators to capture the intricacies of the system they are studying, their methods would need to be validated externally, and even then, their measurements may not be generalizable to a system other than the one they are studying. The practice of individually defining measures of trust (1) brings into question the findings' reliability, validity, and generalizability and (2) creates a precedent in the research of trust in human–automation interaction that should be rejected.

With such a high reliance on subjective measures of trust, researchers must commit to the use of reviewed and validated scales. Many of the scales above show high internal consistency and high construct validity, and are therefore likely accurate measures of a user's trust in the system, but they only provide value if they are actually used in research.

Industry

A majority of the relevant articles used subjective measures of trust as the only method, but some industries were more likely to use both or objective-only measures. For example, 21% of automotive articles (n = 3) and 50% of aviation articles (n = 3) used both methods. As expressed in the relevant articles, the use of objective measures may be more useful or necessary depending on the industry; for example, a regular practice in automotive research is to track eye movements or the duration that drivers attend to a particular area of their visual field. Applied to testing new automated driving systems, this eye-tracking information can tell researchers how often a user “checks on” the status of the system, an objective measure of their trust in that automation.

Table 2 presents each industry in order of the number of relevant articles. Automotive was by far the most prevalent industry (n = 14), followed by aviation (n = 6), military (n = 6), and security (n = 6). This shows a prevalence of research on trust in human–automation interaction from the automotive industry and may be an indication of the important role of trust in automated systems in that industry.

Automation Type

The type of automation that each study researched plays a large role in defining the type of trust measure used and its generalizability to other areas. Decision aid systems were the most researched type of automation (n = 19), followed by automated control and navigation systems (n = 17).

While decision-aid research was conducted across industries, a majority of automated control and navigation systems came from the automotive industry. This affected the measures of trust collected: automated control and navigation studies used glance rates, monitoring rates, and compliance as their objective measures of trust. These measures may not be important or accurate measures of trust in other industries.

The distinction between measure, industry, and type of automation is not intended to make claims about how prevalent automation is in each industry but simply to provide a foundation from which to present the relevant research in each category. All of the listed industries will be subject to automation in the near future, so it is important for all of them to understand and optimize trust in automation.

CONCLUSIONS

As evidenced by the results of this systematic review, measures of trust vary across industry, type of automated system, and study. Although frequency counts are a useful measure of the current state of the art, they should not be interpreted as an indicator of the best method to use, just of what is being used.

Validated and peer-reviewed subjective measures of trust, such as the scales developed by Muir and Merritt and the HCT questionnaire, provide an accurate picture of a user's trust in the system, but those scales are not used frequently enough. To ensure accurate trust measurement in HAI, the highest priority for researchers may be to use a validated scale for subjective measurement of trust and then combine it with other, more system-specific measures if necessary.

The current literature on trust in HAI suggests that, given the high internal consistency and construct validity of verified subjective measures such as the Muir and Merritt scales, it may not be necessary or even prudent to mandate that trust measurement include objective methods, as long as requisite care is taken in maintaining the validity of the scales; if users' self-reports accurately predict what objective measures would show for a given automated system, it may not be necessary to measure their trust objectively. Conversely, if participants' trust is measured with only objective measures, the investigators may need to prove the validity of those measures, and the results of their research may be less generalizable.

Across industry and system type, the use of peer-reviewed, verified, and effective measures of trust in human–automation interaction is of the utmost importance. It may not be preferable to standardize a single measure for all applications, but maintaining consistency within industries, or at least within scientific research, is critical for building a viable basis for future research and for safe, usable, and satisfying human–automation interaction.
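One lightweight way to act on this recommendation is to check, in a pilot sample, whether a validated subjective scale tracks a system-specific objective measure before deciding whether both are needed. The sketch below computes a Pearson correlation between subjective trust scores and acceptance rates; the data, variable names, and threshold of interpretation are invented for illustration.

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Invented pilot data: per-participant mean scale score (1-7) and the
# proportion of the automation's recommendations the participant accepted.
subjective_trust = [3.2, 5.8, 4.1, 6.3, 2.7, 5.0]
acceptance_rate  = [0.40, 0.85, 0.55, 0.90, 0.35, 0.70]

# A strong positive r suggests the subjective scale tracks the behavioral
# measure; a weak r argues for collecting both in the full study.
print(f"r = {pearson_r(subjective_trust, acceptance_rate):.2f}")
```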

Validating Trust Measures. Measure validation is another issue of importance for standardizing trust measurement across industries. Measures such as the HCT questionnaire and the German TiA scale have been validated, but only in the context of current human–automation interaction. These measures were inherently created with exact applications in mind and are validated as measures of trust in specific ways. The variety of automation in industry makes standardizing the validation of these measures difficult: industries, by definition, have very different applications of automated technology, and with these come the necessity of validating measures differently. For this reason, it may be important to validate these scales across industries and with new systems of automation.

Limitations

The most noteworthy limitation of this review is the limitations of the relevant articles themselves. Participants in most of the relevant research had collegiate-level education and significant exposure to computers and automated systems. This undoubtedly has implications for self-reported trust in automated systems; with training and prolonged exposure to automation, participants may have a more accurate sense of how much they trust a system, or they may be more inclined to trust systems in general, skewing the validity of subjective measures of trust.

The reviewers were also restricted in the following ways: (1) reviewers had limited access to relevant databases, (2) reviewers had limited access to pertinent articles that did not match the defined keywords, (3) many articles could not be accessed, (4) applicable articles that were published outside of the United States or were not in the English language did not receive proportional review, and (5) research being conducted at the time of the review or afterward could not be included.

Implications for Future Research

The discipline of human–robot trust was not studied here but is known to have models of trust that may translate to HAI, such as the 14-point Human–Robot Trust Scale (Schaefer, 2013). Importantly, objective, quantitative methods, such as the confirmation-change measure used by Gaudiello, Zibetti, Lefort, Chetouani, and Ivaldi (2016), are powerful tools that should be examined in the context of HAI as well.

With the widespread, and recommended, use of subjective measures of trust in human–automation interaction comes the necessity to continue to research and validate the efficacy of these measures as automation continues to evolve. Even with high internal consistency and construct validity, measures are only useful if they apply to the tools and systems with which they are being used. As systems change and human–automation interaction becomes more prevalent, these tools will need to be re-validated and improved for future research.

REFERENCES

Acemoglu, D., & Restrepo, P. (2017). Robots and jobs: Evidence from US labor markets. National Bureau of Economic Research (NBER). doi:10.3386/w23285
Alexander, V., Blinder, C., & Zak, P. J. (2018). Why trust an algorithm? Performance, cognition, and neurophysiology. Computers in Human Behavior, 89, 279–288. doi:10.1016/j.chb.2018.07.026
Balfe, N., Sharples, S., & Wilson, J. R. (2018). Understanding is key: An analysis of factors pertaining to trust in a real-world automation system. Human Factors, 60(4), 477–495. doi:10.1177/0018720818761256
Crowley, R. S., Legowski, E., Medvedeva, O., et al. (2013). Automated detection of heuristics and biases among pathologists in a computer-based system. Advances in Health Sciences Education, 18(3), 343–363. doi:10.1007/s10459-012-9374-z
Frey, C. B., & Osborne, M. A. (2017). The future of employment: How susceptible are jobs to computerisation? Technological Forecasting and Social Change, 114, 254–280. doi:10.1016/j.techfore.2016.08.019
Gaudiello, I., Zibetti, E., Lefort, S., Chetouani, M., & Ivaldi, S. (2016). Trust as indicator of robot functional and social acceptance: An experimental study on user conformation to iCub answers. Computers in Human Behavior, 61, 633–655.
Hoff, K. A., & Bashir, M. (2015). Trust in automation: Integrating empirical evidence on factors that influence trust. Human Factors, 57(3), 407–434. doi:10.1177/0018720814547570
Jian, J.-Y., Bisantz, A. M., & Drury, C. G. (2000). Foundations for an empirically determined scale of trust in automated systems. International Journal of Cognitive Ergonomics, 4(1), 53–71. doi:10.1207/S15327566IJCE0401
Körber, M., & Bengler, K. (2014). Potential individual differences regarding automation effects in automated driving. In C. S. González, C. C. Ordóñez, & H. Fardoun (Eds.), Interacción 2014: Proceedings of the XV International Conference on Human Computer Interaction (pp. 152–158). New York, NY: ACM. doi:10.1145/2662253.2662275
Lee, J. D. (1991). The dynamics of trust in a supervisory control simulation. In Proceedings of the Human Factors Society 35th Annual Meeting (pp. 1228–1232). Santa Monica, CA: Human Factors Society.
Lee, J., & Moray, N. (1992). Trust, control strategies and allocation of function in human-machine systems. Ergonomics, 35(10), 1243–1270. doi:10.1080/00140139208967392
Lee, J. D., & Moray, N. (1994). Trust, self-confidence, and operators' adaptation to automation. International Journal of Human-Computer Studies, 40(1), 153–184. doi:10.1006/ijhc.1994.1007
Lee, J. D., & See, K. A. (2004). Trust in automation: Designing for appropriate reliance. Human Factors, 46(1), 50–80. doi:10.1518/hfes.46.1.50.30392
Madsen, M., & Gregor, S. (2000). Measuring human-computer trust. In Proceedings of the 11th Australasian Conference on Information Systems (pp. 6–8). Brisbane: Australasian Association for Information Systems.
Muir, B. M. (1987). Trust between humans and machines, and the design of decision aids. International Journal of Man-Machine Studies, 27(5/6), 527–539.
Muir, B. M. (1989). Operators' trust in and percentage of time spent using the automatic controllers in a supervisory process control task (Doctoral thesis). University of Toronto.
Muir, B. M., & Moray, N. (1996). Trust in automation: Part II. Experimental studies of trust and human intervention in a process control simulation. Ergonomics, 39(3), 429–460.
Schaefer, K. E. (2013). The perception and measurement of human-robot trust (Doctoral dissertation). University of Central Florida, Orlando, FL.
Sheridan, T. B. (1988). Trustworthiness of command and control systems. In Proceedings of the Third IFAC/IFIP/IEA/IFORS Conference on Man-Machine Systems (pp. 427–431). Elmsford, NY: Pergamon.
Sheridan, T. B. (1999). Human supervisory control. In A. P. Sage & W. B. Rouse (Eds.), Handbook of systems engineering and management (pp. 645–690). New York, NY: Wiley & Sons.
