Small Data in AI

REPRINT H05F0Q
PUBLISHED ON HBR.ORG
FEBRUARY 17, 2020
ARTICLE
DATA
Small Data Can Play a Big Role in AI
by H. James Wilson and Paul R. Daugherty
JORG GREUEL/GETTY IMAGES
More than three quarters of large companies today have a “data-hungry” AI initiative under way —
projects involving neural networks or deep-learning systems trained on huge repositories of data.
Yet many of the most valuable data sets in organizations are quite small: Think kilobytes or
megabytes rather than exabytes. Because this data lacks the volume and velocity of big data, it’s
often overlooked, languishing in PCs and functional databases and unconnected to enterprise-wide
IT innovation initiatives.
But as a recent experiment we conducted with medical coders demonstrates, emerging AI tools and
techniques, coupled with careful attention to human factors, are opening new possibilities to train AI
with small data and transform processes.
For every big data set (with one billion columns and rows) fueling an AI or advanced analytics
initiative, a typical large organization may have a thousand small data sets that go unused. Examples
abound: marketing surveys of new customer segments, meeting minutes, spreadsheets with fewer
than 1,000 columns and rows. In our experiment, it was annotations added to medical charts by a
team of medical coders: just tens of annotations on each of several thousand charts.
Medical coders analyze individual patient charts and translate complex information about diagnoses,
treatments, medications, and more into alphanumeric codes. These codes are submitted to billing
systems and health insurers for payment and reimbursement and play a critical role in patient care.
Coders in our experiment, all of whom were registered nurses, were already accustomed to drawing
on an AI system for assistance. The AI scanned charts and identified links between medical
conditions and treatments and suggested the proper code for a given chart.
We wanted to see whether it was possible to transform the coders, responsible for the accurate,
one-at-a-time assessment of charts, into AI trainers capable of enriching the AI with medical knowledge
that would improve the system’s performance at identifying links.
What we learned over the course of the 12-week experiment is that creating and transforming work
processes through a combination of small data and AI requires close attention to human factors. We
believe that three human-centered principles that emerged from the experiment can help
organizations get started on their own small data initiatives:
Balance machine learning with human domain expertise. A number of AI tools have been developed
for training AI with small data. For example, few-shot learning teaches AIs to identify object
categories (faces, cats, motorcycles) based on only one or a few examples instead of hundreds of
thousands of images. In zero-shot learning, the AI is able to accurately predict the label for an image
or object that was not present in the machine’s training data. In other words, it can correctly identify
things it has never seen before. Transfer learning involves transferring knowledge gained from one
task to the learning of new tasks — for example, identifying subtypes of cancer, based on knowledge
of another type — which eliminates the machine’s need for a vast set of new data for performing the
new task.
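To make the few-shot idea concrete, here is a minimal sketch in Python. It assumes each example has already been reduced to a small numeric feature vector; the vectors, labels, and function names are invented for illustration. A new item is assigned to the category whose handful of examples it sits closest to on average. This is one simple way such techniques can work, not a description of any particular product.

```python
from math import dist
from statistics import fmean

def build_prototypes(examples):
    """Average the few labeled vectors in each category into one prototype."""
    prototypes = {}
    for label, vectors in examples.items():
        # zip(*vectors) groups the values of each dimension together.
        prototypes[label] = [fmean(dim) for dim in zip(*vectors)]
    return prototypes

def classify(vector, prototypes):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: dist(vector, prototypes[label]))

# Hypothetical two-dimensional features, only two examples per category.
few_examples = {
    "cat": [[0.9, 0.1], [0.8, 0.2]],
    "motorcycle": [[0.1, 0.9], [0.2, 0.8]],
}
prototypes = build_prototypes(few_examples)
label = classify([0.7, 0.3], prototypes)
print(label)
```

With only two examples per category, the new item is still labeled by whichever prototype it most resembles. In practice the feature vectors would typically come from a model trained on a different, larger task, which is where transfer learning enters.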
In our experiment, we employed a tool commonly called a knowledge graph, which explicitly
represents the various relationships between different types of entities: “Drug A treats condition B,”
“Treatment X alleviates symptom Y,” “Symptom Y is associated with condition B,” etc. It succinctly
captures expert knowledge and makes that knowledge amenable to machine reasoning — for
example, about the likelihood of a specific condition being present given the drugs and treatments
prescribed.
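A knowledge graph of this kind can be pictured as a list of subject-relation-object triples. The Python sketch below is illustrative only: the entity names, relation names, and the simple counting rule are assumptions, not the system used in the experiment. It shows how a machine could tally the evidence for a condition from the drugs and treatments listed on a chart.

```python
from collections import defaultdict

# Hypothetical triples in the spirit of the examples above.
TRIPLES = [
    ("drug_a", "treats", "condition_b"),
    ("treatment_x", "alleviates", "symptom_y"),
    ("symptom_y", "associated_with", "condition_b"),
    ("drug_c", "treats", "condition_d"),
]

def build_index(triples):
    """Map each subject to the set of (relation, object) pairs it appears in."""
    index = defaultdict(set)
    for subject, relation, obj in triples:
        index[subject].add((relation, obj))
    return index

def supporting_evidence(chart_items, index):
    """Count how many chart items point to each condition, directly or via a symptom."""
    support = defaultdict(int)
    for item in chart_items:
        for relation, obj in index.get(item, ()):
            if relation == "treats":
                support[obj] += 1
            elif relation == "alleviates":
                # Follow one more hop: symptom -> associated condition.
                for second_relation, condition in index.get(obj, ()):
                    if second_relation == "associated_with":
                        support[condition] += 1
    return dict(support)

index = build_index(TRIPLES)
evidence = supporting_evidence(["drug_a", "treatment_x"], index)
print(evidence)
```

Here both the drug and the treatment lend support to the same condition, one directly and one through a shared symptom. A production system would weigh such evidence far more carefully; the point is only that explicit links make this kind of reasoning possible.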
To enable the coders to impart their knowledge to the AI, we developed an easy-to-use interface that
allowed them to review contested links in the graph’s database. These were links where their
colleagues, when reviewing individual charts, had disagreed with the AI — either by adding links
unknown to the system, or by removing links it had added. Based on their expertise, the coders could
directly validate, delete, or add links and provide a rationale for their decisions, which would later be
visible to their coding colleagues. In addition, they were encouraged to follow their inclination to use
Google (often with WebMD) to research drug-disease links, going beyond what they regarded as the
existing AI’s slow look-up tool.
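The review step described above can be sketched as a small record plus an update rule. Everything in this Python example is hypothetical: the field names, the sample rationales, and the choice to reject a decision that arrives without a rationale are assumptions made for illustration, not details of the interface we built.

```python
from dataclasses import dataclass, field

@dataclass
class Review:
    """One coder's decision on a contested drug-disease link (hypothetical fields)."""
    drug: str
    disease: str
    action: str  # "validate", "delete", or "add"
    rationale: str  # later visible to coding colleagues
    references: list = field(default_factory=list)

def apply_review(links, rationales, review):
    """Update the set of links and keep the rationale alongside the link."""
    if not review.rationale.strip():
        # Assumption: every decision must carry a rationale.
        raise ValueError("a rationale is required for every decision")
    key = (review.drug, review.disease)
    if review.action in ("validate", "add"):
        links.add(key)
    elif review.action == "delete":
        links.discard(key)
    else:
        raise ValueError(f"unknown action: {review.action}")
    rationales.setdefault(key, []).append(review)

links = {("drug_a", "condition_b")}
rationales = {}
apply_review(links, rationales, Review(
    drug="drug_c", disease="condition_d", action="add",
    rationale="Listed as a standard treatment in the reference consulted."))
apply_review(links, rationales, Review(
    drug="drug_a", disease="condition_b", action="delete",
    rationale="Link was suggested in error for this drug class."))
print(sorted(links))
```

Storing each rationale next to the link it concerns is what lets a later coder see why a colleague validated, added, or removed it.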
This combination of machine learning and human expertise has a significant multiplier effect.
Instead of merely assessing single charts, coders added medical knowledge that affects all future
charts. Further, with the AI taking on the bulk of the routine work, the need for screening of entire
medical charts is greatly reduced, freeing coders to focus on particularly problematic cases.
Meanwhile, data scientists are freed from the tedious, low-value work of cleansing, normalizing, and
wrangling data.
Focus on the quality of human input, not the quantity of machine output. In the existing system,
coders focused on the assessment of individual charts in high quantity. Over time, the AI learned
from the accumulation of links added or rejected by a multitude of coders: Once a drug-disease link
that the AI was not familiar with had been proposed a significant number of times by coders, a data
scientist added it to the graph database. This manual process was undertaken only occasionally, in
part because of the time lag in accumulating link proposals, and it relied on quantitative support for
the link, rather than on medical expertise.
In the new system, coders were encouraged to focus less on volume of individual links and more on
instructing the AI on how to handle a given drug-disease link in general, providing research when
required. Links could now be considered for addition to the knowledge graph with a lighter burden
of quantitative evidence. The AI would learn more regularly and dynamically, especially about rare,
contested, or new drug-disease links.
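The difference between the two approaches can be shown in a few lines of Python. This is a simplified sketch: the threshold of five proposals, the sample links, and the rule that a single validated review with a rationale is enough are all assumptions chosen to make the contrast visible.

```python
from collections import Counter

def admit_by_count(proposals, threshold):
    """Existing process: admit a link only after `threshold` separate proposals."""
    counts = Counter(proposals)
    return {link for link, n in counts.items() if n >= threshold}

def admit_by_review(reviews):
    """New process: admit a link once an expert validates it and gives a rationale."""
    return {link for link, validated, rationale in reviews if validated and rationale}

# A common link is proposed five times; a rare one only once.
proposals = [("drug_a", "condition_b")] * 5 + [("rare_drug", "rare_condition")]
reviews = [
    (("rare_drug", "rare_condition"), True, "Supported by the research consulted."),
]

by_count = admit_by_count(proposals, threshold=5)
by_review = admit_by_review(reviews)
print(by_count, by_review)
```

Under the counting rule the rare link never reaches the graph, however well founded it is. Under expert review it can be admitted on the strength of one documented judgment.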
Recognize the social dynamics in play on teams working with small data. In their new roles, the
coders quickly came to see themselves not just as teachers of the AI, but as teachers of their fellow
coders. Most importantly, they saw that their reputations with other members of the team would rest
on their ability to provide solid rationales for their decisions. They spoke often of the importance of
those rationales to the confidence of a subsequent coder encountering an unfamiliar link.
After only a few experimental sessions, a number of the participants asked that the number of
characters in the tool’s rationale textbox be increased. Later, they asked that the research box be
altered to accommodate more than one reference. Notably, they began not only to devote more time
to each case than they had with the existing system, but also to provide even more comprehensive
rationales for their decisions as the experiment unfolded. Moreover, coders indicated they felt more
satisfied and productive when executing the new tasks, using more of their knowledge, and acquiring
new skills to help build their expertise. They also felt more positive about working with AI on a daily
basis.
As small-data techniques advance, their increased efficiency, accuracy, and transparency will
increasingly be put to work across industries and business functions. Think drug discovery, industrial
image retrieval, the design of new consumer products, the detection of defective factory
machine parts, and much more.
But competitive advantage will come not from automation, but from the human factor. For example,
as AI plays an ever bigger role in employee skills training, its ability to learn from smaller
data sets will enable expert employees to embed their expertise in the training systems, continually
improving them and efficiently transferring their skills to other workers. People who are not data
scientists could be transformed into AI trainers, like our coders, enabling companies to apply and
scale the vast reserves of untapped expertise unique to their organizations. Further, the results that
emerge from small-data applications will come not from a black box, as they do in data-hungry
applications, but from human-machine collaboration that renders those results explainable and
therefore more trustworthy both inside and outside the organization.
Mastering the human dimensions of marrying small data and AI could help make the competitive
difference for many organizations, especially those finding themselves in a big-data arms race they’re
unlikely to win.
Acknowledgement: The authors would like to acknowledge our research team based at The Dock,
Accenture’s innovation hub in Dublin, at Accenture Labs Dublin, and in San Francisco. Our core team
included Diarmuid Cahalane, Medb Corcoran, Andrew Dalton, James Priestas, Patrick Connolly, and
David Lavieri.
H. James Wilson is global managing director of information technology and business research at Accenture Research.
Follow him on Twitter @hjameswilson. He is a coauthor, with Paul Daugherty, of Human + Machine: Reimagining Work in
the Age of AI (Harvard Business Review Press).
Paul R. Daugherty is Accenture’s chief technology and innovation officer. He is a coauthor, with H. James Wilson, of
Human + Machine: Reimagining Work in the Age of AI (Harvard Business Review Press, 2018) and a contributor to
Artificial Intelligence: The Insights You Need from HBR.
Copyright 2020 Harvard Business Publishing. All Rights Reserved. Additional restrictions
may apply including the use of this content as assigned course material. Please consult your
institution's librarian about any restrictions that might apply under the license with your
institution. For more information and teaching resources from Harvard Business Publishing
including Harvard Business School Cases, eLearning products, and business simulations
please visit hbsp.harvard.edu.
