
HOW TO BUILD AI THAT PEOPLE TRUST


Table of contents

Introduction
Why is trust a unique issue for AI?
Why trust matters: examples of good and bad AI
A relative concept: how much trust do I need?
The points in the AI life cycle where trust can be undermined
A framework for building and deploying trusted AI
Introduction
Many assume that AI success hinges on getting the model to accurately interpret data. This is important, but accuracy is one element of a more nuanced challenge: trust.

Take a hypothetical AI that looks at medical scans and identifies whether they contain signs of an early-stage tumor. The model is tested using medical data and shown to be 80% accurate in spotting early-stage tumors, 10% higher than a human specialist (figures for illustration only).

But the AI has learned by making complex correlations between the images in its training data. And while the data shows it is accurate, it is not entirely clear how it reached that decision.

Do we trust it? Is 80% accuracy enough? Should we believe the claim of 80% accuracy? Are we sure the scans were interpreted correctly? Are we sure the test data was right? If a specialist disagrees, should she go with the AI or her own diagnosis, neither of which is perfect? What should she do about that instinctive feeling, which she can't quite put her finger on, that something is wrong?

These are complex questions. For many, the straightforward answer is: if an AI is shown to perform better than a human, we should trust the AI. But relative concepts like "better than" are hard to be sure of in the complex world of AI. This raises very difficult questions when it comes to trust. Many companies have implemented AI but struggle to trust the results.

AI is a technical concept, but trust is a human one

Trust is certainly undermined by low accuracy, but high accuracy alone does not guarantee trust. An AI can be 99% accurate, but so complex and confusing – or so new – that no one trusts it. We need comprehensible evidence that it works, and explainability of how and why it works, before we can trust it.

Even with all that, trust may take time to earn. Would you get on the maiden flight of a plane piloted only by AI? If you were told it had been shown in trials to be safer than human pilots, would that be enough?

Building trust means managing expectations around what AI can and can't do

AI must be planned and designed for human trust and understanding. That means managing expectations around what AI can and can't do, and rolling out a complex technology in a way that aligns with how people learn.

This whitepaper explains the importance of trusted AI and discusses how to develop it.

Why is trust a unique issue for AI?
Most people are familiar with software in their everyday life and workplace, which operates based on rules. They may be complex rules, but they are (bugs aside) predictable. They have been explicitly programmed to follow a set of instructions which turn inputs into outputs.

AI works differently. It ingests data and learns how to interpret it by establishing connections between different datasets. So, an English-Spanish translation AI is not explicitly told word-by-word that perro means dog or gato means cat, etc., alongside fixed grammar rules. It is fed texts that have already been translated and is told to learn the patterns that link one to the other – with guidance from language and data experts.

This allows it to learn complex tasks such as translation or image recognition quickly. Many tasks performed with AI would not be possible with traditional software, or would take decades of programming.

However, this approach brings unpredictability, because the input data is complex and imperfect, not a set of binary options. To learn a language, an AI needs huge amounts of text, and there is not enough time to manually check it all. Some translations may be poor, contain mistakes, or deliberately misuse language. Even correct ones contain nuance, where experts disagree on the precise translation. A phrase can be translated in several ways, depending on the context. Anyone who has used a translation app will know they are good, but not perfect.

Translation is usually low stakes, and we can trust a language translation AI for many applications, even if we can see it makes some mistakes. But for AIs which diagnose disease, spot when a plane engine component needs replacing, or predict drug formulations, we need to be very confident that they have reached the right answer before we can trust them.

Added to this complexity is that AI conclusions may be confusing, but still correct. NASA used AI to design an antenna against a defined set of criteria. The result would never have occurred to a human, but it was better aligned to their needs than anything a human came up with. What does one do when an AI recommends something completely counter-intuitive? It could be a breakthrough (as in NASA's case), or it could be a spectacular oversight in the AI's design or training. How do we know?

All of this raises questions of trust. If we know an AI is not 100% accurate, we need to reach a decision about how much we trust its recommendation. This comes down to multiple factors, including how accurate we are told it is, how much we believe that claim, how much control we had over the inputs, how well we understand its decision-making, what supplementary information it has provided to back up its recommendations, its past record, the consequences of it being wrong, and the user's own knowledge of the problem.
Why trust matters: examples of good and bad AI
High-profile examples of AI failure have undermined trust. But this is not the full story; plenty of AIs are also having a hugely positive impact on organizations. We look at two cases that got trust wrong, and two that got it right.
The good

AI drug design (Link)
The goal: Use AI to identify drug molecules for treating OCD.
What went well: Algorithms were used to sift through potential compounds for an OCD treatment, checking them against a huge database of parameters. There was a dedicated focus on high-quality data acquisition and checking, and on tailoring algorithms to the specific task at hand. This was achieved through close collaboration between AI and drug chemistry experts, who checked inputs and outputs throughout to ensure that results could be trusted.
The outcome: A drug molecule ready to go into clinical trials was developed in 12 months, where the industry average is 4.5 years.

Google's Bolo reading-tutor app (Link)
The goal: Develop a speech recognition app to help children in rural India with reading skills.
What went well: The tutor app encourages, explains, and corrects the child as they read aloud. It applied existing speech recognition and text-to-speech technology to a specific application, which was developed with a clear purpose in mind and carefully tested in 200 Indian villages. The pilot was verified by ongoing research in the field, showing 64% of children with significant improvements in reading proficiency.
The outcome: The app has since been rolled out widely.

The bad

An AI to predict premature births (Link)
The goal: Identify the link between non-invasive electrohysterography readings and premature births. An initial project suggested up to 99% accuracy.
What went wrong: The same datasets were used for both training and validation. So the model was trained to learn a correlation, then tested on the same data to see if that correlation existed. Given that the initial dataset was quite small anyway, this produced a very high apparent accuracy (a minimal sketch of this pitfall follows these tables). When other researchers reproduced the models, accuracy dropped to 50%. Trust in the original accuracy claims was destroyed in one stroke.
The outcome: A model that looked like it should go into clinical practice was shown to be one that definitely should not.

Amazon's AI recruitment tool (Link)
The goal: Look at applicant CVs and predict the best candidates based on similarities with previous successful hires.
What went wrong: Models were trained by observing patterns in résumés submitted over ten years, most of which came from men because of the tech industry's existing bias. Not understanding this, the system taught itself that male candidates were preferable and started rejecting applicants for being female. As a result, it could not be trusted to make unbiased decisions.
The outcome: The model was scrapped. Considerable negative coverage ensued.
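The premature-births failure above comes down to evaluating a model on the same records it was trained on, which rewards memorization rather than generalization. The sketch below illustrates the pitfall and the standard remedy of evaluating on held-out data; the synthetic dataset and model are placeholders, not the original study's.

```python
# Minimal sketch (illustrative only): why validating on training data inflates accuracy.
# Assumes scikit-learn is available; the synthetic data stands in for any real dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A small, noisy dataset, loosely mimicking the "small initial dataset" in the case above.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           flip_y=0.3, random_state=0)

model = RandomForestClassifier(random_state=0)

# Pitfall: train and "validate" on the same records.
model.fit(X, y)
print("Accuracy on training data:", accuracy_score(y, model.predict(X)))  # looks near-perfect

# Remedy: hold out data the model has never seen (or use cross-validation).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model.fit(X_train, y_train)
print("Accuracy on held-out data:", accuracy_score(y_test, model.predict(X_test)))  # much lower
```

The gap between the two printed figures is exactly the gap that appeared when other researchers re-tested the premature-births models on data they had not seen before.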

How much trust do I need?
The level of confidence in an AI output needed before the user will trust it depends on the seriousness of the consequences of failure. Users will trust a useful low-risk AI even if it is far from perfect, but high-risk AI decisions need much greater levels of confidence in order to create trust.

Examples of AI applications and levels of trust, ordered from the highest to the lowest level of confidence needed for trusted use (application – potential negative consequence):

• Disease diagnosis – preventable death
• Drug design – missed opportunities, expensive mistakes
• Oil well drilling – major financial loss
• Predictive maintenance – unnecessary downtime
• Mortgage recommendation – harm to customer, legal challenges
• Targeted adverts – missed sales opportunities
• Translation – miscommunication
• Film recommendation – occasionally frustrated customers
• Photo tagging – wrong photo tagged
• AI-created artwork – probably none

The points in the AI life cycle where trust can be undermined
There are various stages in the AI development and deployment process where trust can be undermined. In this section, we discuss the main risk factors.

Bias in training data

Unconscious gender or racial bias has often hit the headlines, usually created by applying AI to process automation without understanding the limits of the data. Amazon's Rekognition, for example, misidentified women and people of color, likely due to the use of smaller training datasets for these groups. Such stories undermine the credibility of commercially available technology.

The AI doesn't learn incorrectly; it learns to reflect bias in its training data, which reflects bias in the real world. Prejudice is the nasty face of this, but bias in data can also extend to misplaced assumptions by scientists, doctors recording incorrect diagnoses, and even people's choice of written or recorded language.

Badly curated data

Data can also be mislabeled, or poorly curated, such that the AI struggles to make sense of it. If data is not appropriately selected, then the model will not learn how to reach the right conclusions. And if conclusions seem suspect, people won't trust it (or worse, they will trust it and take bad decisions as a result).

User interface and explainability

Trust is not just about how good the model is, but about how easy it is to use and interact with, and how clearly the answers are presented to the user. If users do not feel they can input the information they want, they are likely to be suspicious of the result. If the interface is overly complex, or the results are presented in a confusing way, or with no explanation as to how they were reached (even if they are correct), it will quickly be abandoned. Even something as simple as a film recommendation is much more trustworthy if you can see what aspects of viewing history led to it.

Bias in the real world

Many AIs continue to learn post-deployment, but they are not necessarily well prepared for the complexities of real-world data. Famously, Microsoft's Tay, an artificially intelligent chatbot, was designed to learn from interactions it had with real people on Twitter. Some users decided to feed it offensive information, which it had not been designed to deal with appropriately. Within 24 hours, Tay had to be deactivated and withdrawn for spreading deeply upsetting opinions.

Malicious attacks

AI is susceptible to new kinds of malicious attack that are poorly understood by users. AIs that appear to take human decisions can be fooled in ways that humans cannot.

In a test case, an AI was trained to recognize images. By changing just one pixel in an image, researchers fooled the AI, causing it to wrongly label what it saw – sometimes very wide of the mark (one thought a stealth bomber was a dog). Tesla's self-driving image recognition systems have been tricked by placing stickers on roads and signs, causing them to suddenly accelerate or change lanes.
Lack of transparency

Sitting above all these issues is a fear fed by AI's lack of transparency. Not only do end-users not understand how AIs make their decisions; in many cases nor do their makers.

Apple's credit card, backed by Goldman Sachs, was investigated by regulators after customers complained that the card's lending algorithms discriminated against women. No one from Apple or Goldman was able to justify the output or describe how the algorithm worked. The apparent correlation between gender and credit doesn't necessarily mean one is causing the other, but it creates suspicion that bias has crept in. Without transparency, it's impossible to know, and that makes it hard to trust.
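One practical first step toward this kind of transparency is to check a model's decisions for group-level disparities before anyone else does. The sketch below is a minimal, hypothetical audit: the file name, column names, and the 80% "four-fifths" threshold are illustrative assumptions, not a description of how the Apple Card or any real product was reviewed.

```python
# Minimal sketch (illustrative only): checking model decisions for group-level disparity.
# Assumes pandas; 'decisions.csv', its columns, and the threshold are hypothetical.
import pandas as pd

df = pd.read_csv("decisions.csv")  # hypothetical log: one row per automated decision
# Assumed columns: 'group' (e.g., self-reported gender) and 'approved' (0 or 1).

approval_rates = df.groupby("group")["approved"].mean()
print(approval_rates)

# A common screening heuristic: flag any group whose approval rate falls below
# 80% of the highest group's rate (the "four-fifths rule" used in some fairness audits).
ratio = approval_rates / approval_rates.max()
flagged = ratio[ratio < 0.8]
if not flagged.empty:
    print("Potential disparate impact – investigate further:", dict(flagged))
```

A check like this does not prove or disprove bias, but it gives an organization evidence to examine long before a regulator or customer asks the question.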

A framework for building and deploying trusted AI
Despite the risks discussed in this paper, AI delivers huge value when done well. And aside from the negative headlines, it is often done very well.

The problems usually come when poor process and lack of experience lead to poor choices: wrong algorithms, bad data, inadequate verification, poor interfaces, or a lack of post-deployment support. These errors are often baked in from the outset through fundamental mistakes in initial scoping and design, caused by a lack of understanding of AI and of the real-world problem it solves. All of these undermine trust.

As AI plays an ever more important role in our lives, we need to design it to be trusted. This goes beyond data scientists designing an algorithm that learns about correlations and works on test data. AI must be designed as a whole product, with a set of support services around it that allow the user to trust its outputs. Doing so needs a rigorous approach to AI development.

In this final section, we outline five key parameters for creating trusted AI.

1. Assured

A data-driven decision is only as trusted as the data that underpins it.

The most obvious aspect of trusted AI is ensuring it does what it is supposed to. Because AI learns from data, that data must be reliable. You can train an AI to recognize cats and dogs by feeding it lots of labeled images of each. But if some cats are labeled as dogs, some are not labeled, or some show a completely different animal, the AI will learn incorrectly and make incorrect decisions. If all images of dogs are in the snow, the AI may learn to detect snow rather than dogs.

As soon as it makes mistakes, users will stop trusting it. This may not matter too much when classifying cats and dogs, but it matters a lot when classifying images of healthy vs. precancerous cells.

Trusted AIs must use a well-designed model, and be trained and tested on data that is proven to be accurate, complete, from trusted sources, and free from bias. Capturing that data requires rigorous processes around data collection and curation.

Those designing an assured AI model should examine the AI's inputs and ask:

• Does this data accurately represent the system to be modeled?
• Does the data contain confounding or irrelevant information?
• How will I know that the data is of sufficient quality?
• Is the underlying data biased – and how would I tell?
• Are my assumptions about data collection biased?
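Several of these questions can be turned into routine, automated checks that run whenever training data is refreshed. The sketch below is a minimal example of that idea, assuming a pandas DataFrame with hypothetical 'label' and 'group' columns; a real assurance pipeline would add domain-specific rules and provenance checks.

```python
# Minimal sketch (illustrative only): automated checks on training data before model training.
# Assumes pandas; the DataFrame columns ('label', 'group') and thresholds are hypothetical.
import pandas as pd

def audit_training_data(df: pd.DataFrame) -> list:
    """Return a list of warnings about obvious data-quality and representation issues."""
    warnings = []

    # Completeness: large fractions of missing values undermine "sufficient quality".
    missing = df.isna().mean()
    for column, fraction in missing[missing > 0.05].items():
        warnings.append(f"Column '{column}' is {fraction:.0%} missing")

    # Duplicates: repeated records can leak between training and validation splits.
    duplicates = df.duplicated().sum()
    if duplicates > 0:
        warnings.append(f"{duplicates} duplicate rows found")

    # Label balance: a heavily skewed label makes headline accuracy figures misleading.
    label_share = df["label"].value_counts(normalize=True)
    if label_share.max() > 0.9:
        warnings.append(f"Label imbalance: {label_share.max():.0%} of rows share one label")

    # Representation: groups with very few examples tend to be modeled poorly.
    group_counts = df["group"].value_counts()
    for group, count in group_counts[group_counts < 50].items():
        warnings.append(f"Group '{group}' has only {count} examples")

    return warnings
```

Run before each training cycle, checks like these give a concrete, repeatable answer to "how will I know that the data is of sufficient quality?"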
2. Explainable

A functioning model is not enough; users need to understand how it works. The AI earns trust by backing up its recommendations with transparent explanations and further details.

If a bank turned you down for a mortgage, you'd expect to know why. Was it past or existing debt? Was it an error? Did they confuse you with someone else? Knowing the reason allows you to move forward in the most constructive manner. For the bank, it allows them to spot faults, retain customers, and improve processes.

It's the same for AI. A recommendation is much more useful if you understand how and why it was made. Explainability allows the user to see if the AI supports their own intuition (e.g., about a disease diagnosis or the best way to make a new material), or helps them question it. And it allows developers to spot errors in the AI's learning and remedy them.

A responsibly designed AI will have tools to analyze what data was used, its provenance, and how the model weighted different inputs, and then report on that conclusion in clear language appropriate to the user's expertise.

Explainability may also involve some trade-off between raw predictive power and transparency of interpretation. If an AI cannot fully explain its outcome, trust may still be built in some cases through rigorous validation to show it has a high success rate, and by ensuring the user has the information they need to understand that validation.

Those designing AI to be explainable should ask:

• What could be known in principle about the working of this AI?
• Does the model need to be fully interpretable, or are post-hoc explanations sufficient?
• Can the AI rationalize why it decided to offer the user this piece of clarifying information, rather than another?
• How consistent is the given answer with previous examples?
• Does too much transparency make the AI vulnerable to attack?
• Does the information on offer meet the accountability needs of different human users?
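Where a model is not inherently interpretable, post-hoc explanations can still show which inputs drove its predictions. The sketch below uses permutation importance from scikit-learn as one widely available option; the synthetic data, model choice, and feature names are placeholders, and other techniques (surrogate models, SHAP-style attributions) serve the same purpose.

```python
# Minimal sketch (illustrative only): a post-hoc explanation via permutation importance.
# Assumes scikit-learn; the data, model, and feature names are hypothetical placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)
feature_names = ["debt", "income", "age", "tenure", "savings", "dependents"]  # hypothetical

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# How much does held-out accuracy drop when each input feature is shuffled?
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranked = sorted(zip(feature_names, result.importances_mean), key=lambda pair: -pair[1])
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

An explanation at this level ("your existing debt mattered most") is exactly the kind of supplementary detail the mortgage example above calls for, reported in language matched to the user's expertise.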
3. Human

A trusted AI is intuitive to use. Netflix would be less successful if users had to enter a complex set of parameters to get film recommendations. Instead, it automatically presents films you may like based on your history of search terms, in an easy-to-navigate interface, and sometimes even tells you why it recommended them (the Because You Watched feature).

An intuitive interface, consistently good recommendations, and easy-to-understand decisions all help the user come to trust it over time.

Intuitive doesn't always mean simple. A simple smartphone app may use very intuitive, guided decision-making. A drug property prediction platform can expect advanced chemistry knowledge from its users and display complex information in a manner appropriate for an expert to understand and interact with.

The complexity of these guided decisions must be matched to the user's knowledge. Equally, the time it takes the user to fully trust the AI will be relative to the complexity and the risk of failure.

Making AI usable for humans means understanding the end-user and how they interact and learn over time. Those designing AIs should ask:

• What does each user need to understand about why the AI did what it did?
• How should we communicate with users and collect feedback?
• Why would users not trust this AI?
• What reassurances are they likely to need?
• What training and support is needed for different users?
• Should the system allow users to ask for more details?
• How can we retain confidence if the AI gets it wrong?
• How do we make human users feel the AI is accountable?

4. Legal and ethical

A trusted AI should reach decisions that are fair and impartial, with privacy and ethical concerns given equal weight to accuracy.

An AI may conclude that certain groups are more likely to reoffend or default on loans. While this may be true at a group level (for broader socio-economic reasons), it does not mean an individual from that group is more likely to do so. An AI using this as a decision-making factor creates undue prejudice and opens its user up to legal challenges.

Those designing AI to be ethical and legally compliant should ask:

• Why are we building this AI at all?
• Are we aligned with prevailing ethical standards?
• Is it fair and impartial?
• Is it proportionate in its decisions?
• How is it governed?
• Are we honest about what we claim it can do?
• Are we transparent about what it's doing, or is it doing something else in the background?

5. Performant

Finally, a trusted AI continues to work after deployment.

Too many AIs work well in a controlled environment but fall over once deployed, either because they are not ready for the complexities of real-world data or because they have not been designed to integrate into the users' working lives (either technically or practically). Users will quickly lose trust in an AI they see making less and less reliable decisions.

A truly performant AI is future-proofed for throughput, accuracy, robustness, and security, balancing raw predictive power with transparent interpretation while remaining aligned to genuine business needs.

Those designing AI to perform in the real world should examine the AI's outputs and ask:

• Does this AI actually solve the intended business problem?
• Do we understand the required levels of throughput and robustness?
• Do we understand the required output quality, i.e., accuracy and precision?
• What safety barriers are needed if the AI makes a mistake?
• How robust is my testing, validation, and verification policy?
• Is the in-service AI protected against adversarial attacks?
• Do we have a plan to continuously assess and improve in-service performance?
• Do we know when and how the in-service model could become invalid?
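The last two questions imply ongoing monitoring once the model is in service. A common, lightweight approach is to compare the distribution of live inputs against the data the model was trained on and raise an alert when they diverge. The sketch below uses a two-sample Kolmogorov–Smirnov test as one simple option; the feature arrays and alert threshold are hypothetical.

```python
# Minimal sketch (illustrative only): flagging input drift in a deployed model.
# Assumes numpy and scipy; the arrays and alert threshold are hypothetical.
import numpy as np
from scipy.stats import ks_2samp

def check_drift(training_feature: np.ndarray, live_feature: np.ndarray,
                threshold: float = 0.01) -> bool:
    """Return True if the live data's distribution differs significantly from training."""
    statistic, p_value = ks_2samp(training_feature, live_feature)
    return p_value < threshold

# Example: one input feature as seen in training vs. what the model now sees in service.
rng = np.random.default_rng(0)
training = rng.normal(loc=0.0, scale=1.0, size=5_000)
live = rng.normal(loc=0.6, scale=1.0, size=1_000)  # the real world has shifted

if check_drift(training, live):
    print("Input drift detected – retraining or revalidation may be needed")
```

Checks like this do not replace re-validation, but they give an early, automated signal that the in-service model may be drifting toward invalidity.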
Authors
Matt Jones
Lead analytics strategist
LinkedIn

David Hughes
Analytics solutions lead

Sam Genway
Analytics consultant
LinkedIn

John Godfree
Head of consulting
LinkedIn

About Capgemini Engineering
Capgemini Engineering combines, under one brand, a unique set of strengths from across
the Capgemini Group: the world-leading engineering and R&D services of Altran – acquired
by Capgemini in 2020 – and Capgemini’s digital manufacturing expertise. With broad industry
knowledge and cutting-edge technologies in digital and software, Capgemini Engineering
supports the convergence of the physical and digital worlds. Combined with the capabilities of
the rest of the Group, it helps clients to accelerate their journey towards Intelligent Industry.
Capgemini Engineering has more than 52,000 engineer and scientist team members in over 30
countries across sectors including aeronautics, automotive, railways, communications, energy,
life sciences, semiconductors, software & internet, space & defence, and consumer products.

For more details, contact us:


www.capgemini-engineering.com

Write to us at:
engineering@capgemini.com

© Capgemini Engineering 2021. All rights reserved

