
Trust in AI-Enabled Decision Support Systems:
Preliminary Validation of MAST Criteria
Erin K. Chiou1, Pouria Salehi1, Erik Blasch2, James Sung3, Myke C. Cohen1, Anna Pan1, Michelle Mancenido1, Ahmadreza Mosallanezhad1, Yang Ba1, Shawaiz Bhatti1

2022 IEEE 3rd International Conference on Human-Machine Systems (ICHMS) | 978-1-6654-5238-0/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICHMS56717.2022.9980623


1 Arizona State University, 2 Air Force Office of Scientific Research, 3 Department of Homeland Security

Abstract—This study tests the Multisource AI Scorecard Table for evaluating the trustworthiness of AI-enabled decision support systems. Forty participants reviewed one of four system descriptions, with either a low MAST rating or a high MAST rating, and either a face-matching system or a text summarization system. Participants rated the system and completed trust and credibility questionnaires. Results show that MAST items are correlated with validated trust and credibility items, suggesting that the tool can be useful, but more so for text summarization-based systems.

Keywords—standards, trust, artificial intelligence, security

I. Introduction
Artificial intelligence (AI) and machine learning have enabled advanced information processing capabilities, but their use in high-risk decision environments remains limited, in part due to a lack of trust in these systems. To address this issue, in 2019, the AI Team of the Public-Private Analytic Exchange Program, Office of the Director of National Intelligence and Department of Homeland Security, adapted Intelligence Community Directive (ICD) 203 to create the Multisource AI Scorecard Table (MAST). MAST is an evaluation checklist and methodology that qualifies nine criteria to inform trust in AI-enabled decision support systems (AI-DSS). These criteria include: sourcing, uncertainty, distinguishing, analysis of alternatives, customer relevance, logical argumentation, consistency, accuracy, and visualization. We hypothesized that applying these criteria in the design, operation, or documentation of an AI-DSS would mitigate known issues in establishing trust in AI, such as clarifying the potential for data poisoning, reporting omissions, and communicating uncertainty. MAST, however, had not yet been empirically validated. This study addresses this gap across two AI-DSS to assess the extent to which MAST criteria relate to trust perceptions.
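To make the checklist idea concrete, the following Python sketch represents a MAST evaluation as a simple mapping from each of the nine criteria to an ordinal rating. The criterion names come from the paper; the 0-4 rating scale and the averaging step are assumptions made for illustration and are not part of the published MAST methodology.

# Illustrative sketch only: criterion names are from the paper; the rating
# scale and the averaging are assumptions, not the official MAST procedure.
MAST_CRITERIA = [
    "sourcing",
    "uncertainty",
    "distinguishing",
    "analysis of alternatives",
    "customer relevance",
    "logical argumentation",
    "consistency",
    "accuracy",
    "visualization",
]

def score_system(ratings):
    """Average per-criterion ratings (assumed 0-4) into one scorecard value."""
    missing = set(MAST_CRITERIA) - set(ratings)
    if missing:
        raise ValueError(f"Missing ratings for: {sorted(missing)}")
    return sum(ratings[c] for c in MAST_CRITERIA) / len(MAST_CRITERIA)

# Example: a hypothetical rater's evaluation of a high-MAST system design.
example_ratings = {c: 3 for c in MAST_CRITERIA}
example_ratings["uncertainty"] = 4  # e.g., confidence levels are displayed
print(f"Mean MAST rating: {score_system(example_ratings):.2f}")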

II. Method
The MAST criteria were used to design two AI-DSS: Facewise, a face-matching identity verification system, and READIT, a text summarization system. Two levels of each system were designed: high-MAST, which had a set of rich features that generally ranked high on each of the MAST criteria; and low-MAST, which had a minimal set of features, similar to black-box systems, that generally ranked low on each of the MAST criteria. For instance, for Facewise's uncertainty criterion, the high-MAST version displayed a confidence level along with a decision recommendation, whereas the low-MAST version displayed only the decision recommendation. These system designs were generated by the study team (the listed authors) through iterative testing and redesign.
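A minimal sketch of the kind of interface difference described for the uncertainty criterion is shown below. The function and field names are hypothetical; the paper does not describe the Facewise implementation at this level of detail.

# Hypothetical illustration of the high- vs. low-MAST manipulation for the
# uncertainty criterion; names and output format are assumptions, not the
# actual Facewise interface.
def render_recommendation(match, confidence, high_mast):
    decision = "MATCH" if match else "NO MATCH"
    if high_mast:
        # High-MAST design: communicate uncertainty alongside the decision.
        return f"Recommendation: {decision} (confidence: {confidence:.0%})"
    # Low-MAST design: black-box style, decision only.
    return f"Recommendation: {decision}"

print(render_recommendation(match=True, confidence=0.87, high_mast=True))
print(render_recommendation(match=True, confidence=0.87, high_mast=False))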
Forty participants were recruited from Prolific, ten for each level of each system (2 x 2). After random assignment to one of the four groups, participants reviewed the system features by watching a short video. Participants then evaluated the system using MAST and responded to questionnaire items on perceived trust and message credibility.
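For concreteness, the following sketch shows a balanced assignment to the 2 x 2 between-subjects design described above, with ten participants per cell. The participant IDs, the fixed seed, and the use of Python's random module are illustrative assumptions, not the study's actual procedure.

# Illustrative sketch of balanced random assignment to the 2 x 2 design
# (system: Facewise/READIT x MAST level: high/low), ten participants per cell.
import itertools
import random

conditions = list(itertools.product(["Facewise", "READIT"], ["high-MAST", "low-MAST"]))
participants = [f"P{i:02d}" for i in range(1, 41)]  # hypothetical IDs for N = 40

random.seed(42)  # fixed seed so this example is reproducible
random.shuffle(participants)

# Assign ten shuffled participants to each of the four conditions.
assignment = {
    cond: participants[i * 10:(i + 1) * 10]
    for i, cond in enumerate(conditions)
}

for cond, group in assignment.items():
    print(cond, group)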
III. Results and Discussion
The high-MAST version of both systems resulted in higher MAST ratings than the low-MAST version for 4/9 criteria in Facewise and 9/9 criteria in READIT. This supports our assumption that systems designed according to MAST criteria would be perceived by others as responsive to the criteria, especially for a text summarization system. We also found positive correlations between MAST and trust items (r(38) = 0.55, p = 0.000219, moderate strength), between MAST and credibility (r(38) = 0.54, p = 0.000375, moderate strength), and between trust and credibility (r(38) = 0.85, p = 5.20E-12, high strength).
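The reported coefficients are Pearson correlations with 38 degrees of freedom, consistent with N = 40 participants. As a minimal sketch of how such values can be computed, the snippet below uses SciPy on synthetic scores; the study's actual participant data are not reproduced here.

# Sketch of the correlation analysis reported above, using synthetic data.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 40  # participants, so r is reported with n - 2 = 38 degrees of freedom

# Synthetic per-participant mean scores on the MAST and trust scales.
mast = rng.normal(3.0, 0.8, n)
trust = 0.6 * mast + rng.normal(0.0, 0.7, n)

r, p = pearsonr(mast, trust)
print(f"r(38) = {r:.2f}, p = {p:.3g}")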
IV. Conclusion
MAST criteria can be used to assess the trustworthiness and credibility of AI-enabled decision support systems, especially text-based summarization machine learning systems. Future work should validate these findings against trust-related behavioral measures and compare MAST with other AI evaluation tools and methodologies.
This material is based on work supported by the U.S. Department of Homeland Security under Grant Award Number 17STQAC00001-05-00. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Department of Homeland Security.
