
Best practices for human-in-the-loop:
the business case for active learning

Paco Nathan @pacoid 2018-09-07


Introduction:
definitions and indications
Machine learning
supervised ML:
▪ take a dataset where each element has a label
▪ train models on a portion of the data to predict the labels, then evaluate on the holdout
▪ deep learning is a popular example, when you have lots of labeled training data available
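The supervised workflow above can be sketched in a few lines of Python; this is a minimal illustration (scikit-learn's bundled iris dataset is chosen purely as an example), not a recipe for production:

```python
# Supervised ML sketch: train on a portion of labeled data,
# then evaluate on the holdout.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# hold out 30% of the labeled examples for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# score on the holdout, never on the training portion
print(f"holdout accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```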
Machine learning
unsupervised ML:
▪ run lots of unlabeled data through an algorithm to detect “structure” or embedding
▪ for example, clustering algorithms such as K-means
▪ unsupervised approaches for AI are an open research question
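For the unsupervised case, a minimal K-means sketch (using synthetic blob data purely for illustration): no labels go in, and cluster structure comes out.

```python
# Unsupervised ML sketch: K-means detects cluster "structure"
# in unlabeled data.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# 300 unlabeled points drawn from 3 underlying groups
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# each point gets assigned to one of the 3 discovered clusters
print(np.bincount(km.labels_))  # cluster sizes; well-separated blobs ~100 each
```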
Active learning
a special case of semi-supervised ML:
▪ send difficult decisions/edge cases to experts (feedback); let algorithms handle routine decisions (automate)
▪ works well in use cases which have lots of inexpensive, unlabeled data
▪ e.g., abundance of content to be classified, where cost of labeling is a major expense
▪ can help explore/exploit uncertainty – in the sense of risk vs. uncertainty vs. opportunity
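The feedback/automate split above can be sketched as a simple query loop. This is a schematic illustration only: the held-back true labels stand in for the human expert, and the digits dataset and least-confidence heuristic are example choices, not prescriptions.

```python
# Active learning loop sketch: the model automates confident cases,
# while low-confidence edge cases are sent to an "expert" for labels.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)

labeled = list(rng.choice(len(X), size=50, replace=False))  # small seed set
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=2000)
for _ in range(5):  # five rounds of querying the "expert"
    model.fit(X[labeled], y[labeled])
    # least-confidence: find pool instances the model is least sure about
    conf = model.predict_proba(X[pool]).max(axis=1)
    edge_cases = [pool[i] for i in np.argsort(conf)[:20]]
    labeled += edge_cases                    # expert supplies these labels
    pool = [i for i in pool if i not in edge_cases]

print(f"labeled {len(labeled)} of {len(X)} examples")
```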
HITL is a “mix and match” approach
Human in the loop: Machine learning and AI for the people
George Anadiotis
ZDNet (2018-05-23)

“Rather than just recognize patterns, we can use ML to look at data and identify opportunities, by identifying uncertainty. This is not about risk – there’s no upside to risk, you just buy insurance. But if you can separate risk from uncertainty, you can profit, because that’s where the opportunities are.”

Humans in the loop
Paco Nathan
Domino Data Lab (2017-08-16)
Let’s consider:
• AI encompasses much more than deep learning: there are also knowledge graphs, planning systems, other kinds of learning, mathematical optimization, etc.
• Also think about teams of people + machines
• How can these different parts be integrated?
• Which methodologies need to be followed?
1/ Achieving human parity
Achieving Human Parity in Conversational Speech Recognition
W. Xiong, et al. Microsoft (2016-10-12)

Microsoft researchers reached human parity in conversational speech recognition, reducing speech-to-text error rates below the human threshold of ~5% – seen repeatedly as a metric for “expertise”

AI defined? achieving parity with human experts, in domains where experts solve “hard problems” – perhaps more aptly stated as learning in uncertain domains
2/ Segments for AI adoption in industry

segment: Google, Amazon, Microsoft, IBM, Apple, Baidu, etc.
liabilities:
▪ high capital expenses, long-term R&D as hardware evolves rapidly
▪ potential vulnerabilities by automating too much
▪ potential vulnerabilities by mistaking first-order cybernetics for second-order
assets:
▪ AI + cloud + mobile/embed, leveraging a flywheel effect
▪ had focused business lines well in advance to prepare large-scale labeled data sets
▪ uses AI to explore uncertainty, focusing their core expertise

segment: adopters (MIT SMR 2017)
liabilities:
▪ facing barriers: talent gap, competing investment priorities, security concerns
▪ verticals eroded by horizontal business lines from top incumbents
assets:
▪ HITL provides a vector to compete against top incumbents, with many unexplored areas of opportunity

segment: legacy
liabilities:
▪ struggling to recognize business use cases
▪ buried in tech debt from digital infrastructure
▪ lacks management buy-in
assets: ??
2/ Segments for AI adoption in industry
The State of Machine Learning Adoption in the Enterprise
Ben Lorica, Paco Nathan
O’Reilly Media (2018-08-07)

O’Reilly did a recent study about ML adoption in enterprise, with 8000+ respondents worldwide, which provides relevant insights. Also see a summary article:
https://www.oreilly.com/ideas/5-findings-from-oreilly-machine-learning-adoption-survey-companies-should-know
3/ “Detach from named methods”
Developers Should Abandon Agile
Ron Jeffries (2018-05-10)

“No matter what framework or method your management thinks they are applying, learn to work this way:
• Produce running, tested, working, integrated software every two weeks, every week…
• Keep the design of that software clean. As it grows, the design will tend to become complex and crufty…
• Use the current increment of software as the foundation for all your conversations with your product leadership and management…”
3/ “Detach from named methods”
Developers Should Abandon Agile
Ron Jeffries (2018-05-10)

~20 yrs ago: “Agile” created value by iterating on a code base, while the nature of the data was relatively invariant, e.g., specified by a schema, relegated to unit tests, database, etc.

today: OSS shifted ROI – now someone else maintains the code; but without LOTS of carefully labeled data, that code may not yield much return
4/ The reality of data rates
“If you only have 10 examples of something, it’s going to be hard to make deep learning work. If you have 100,000 things you care about, records or whatever, that’s the kind of scale where you should really start thinking about these kinds of techniques.”
Jeff Dean Google
VB Summit (2017-10-23)
venturebeat.com/2017/10/23/google-brain-chief-says-100000-examples-is-enough-data-for-deep-learning/
4/ The reality of data rates
Transfer learning aside, most DL use cases require large, carefully labeled data sets, while RL requires much more data than that. Active Learning can yield good results with substantially smaller data rates, while leveraging an organization’s expertise to bootstrap toward larger labeled data sets, e.g., as preparation for deep learning, etc.

[chart: data rates (log scale) compared across active, supervised, deep, and reinforcement learning – many enterprise use cases fit in the active learning range]
4/ The reality of data rates
“Reducing the Data Demands of Smart Machines”
https://www.darpa.mil/news-events/2018-07-11
• reduce the data needed to build new models by 10⁶
• reduce the data needed to adapt models from millions to hundreds of labeled examples

“We are encouraging researchers to create novel methods in the areas of meta-learning, transfer learning, active learning, k-shot learning, and supervised/unsupervised adaptation to solve this challenge.”
5/ Human-AI Coexistence?
China: AI superpower
Kai-Fu Lee Sinovation Ventures
AI SF, 2018-09-06

quadrants plotted by “optimization” vs “compassion”
5/ Human-AI Coexistence?
Second-order cybernetics provides decades of learnings which now apply for AI adoption in enterprise

Cybernetics and Design: Conversations for Action
Hugh Dubberly, Paul Pangaro DDO (2015-11-01)

Second-Order Cybernetics
Ranulph Glanville
Systems Science and Cybernetics (2003)
Active Learning:
case studies and patterns

Who’s doing this?
Case Study: Primer
Using Artificial Intelligence to Fix Wikipedia’s Gender Problem
Tom Simonite
Wired (2018-08-03)
Case Study: Stitch Fix
Building a business that combines human experts and data science
Eric Colson Stitch Fix
O’Reilly Data Show (2016-01-28)

“what machines can’t do are things around cognition, things that have to do with ambient information, or appreciation of aesthetics, or even the ability to relate to another human”
Case Study: Figure Eight
Real-World Active Learning: Applications and Strategies for Human-in-the-Loop ML
Ted Cuzzillo
O’Reilly Media (2015-02-05)

Active learning and transfer learning
Lukas Biewald Figure Eight
The AI Conf, SF (2017-09-17)

breakthroughs lag invention of algorithms – must wait for a “killer data set” to emerge, often lagging by a decade or more
Case Study: EY
EY, Deloitte And PwC Embrace Artificial Intelligence For Tax And Accounting
Adelyn Zhou
Forbes (2017-11-14)

compliance use cases in reviewing lease accounting standards
3x more consistent and 2x more efficient than the previous humans-only teams
break-even ROI within less than a year
Case Study: Lola
Paul English on Lola’s Debut for Business Travelers
Elizabeth West
Business Travel News (2017-10-04)

founded 2015 by Paul English and other Kayak execs: on-demand, personal travel service; uses expert travel agents for HITL
initially criticized by travel industry as “competing against Siri”; currently displacing OTAs in a reversal of “AI vs. jobs”
can book on Airbnb, Southwest, etc., which aren’t available via OTA, because of the human delegation
“The first time you use Lola it’s going to be great because it’s a conversation. We’re not making you think like a computer”
“Instead of showing you 300 choices or 1,000 choices, we think we can show you three choices, kind of good, better, best”
Case Study: SAP Concur
When Privacy Scales
Amanda Casari (SAP Concur)
AI SF (2018-09-06)
Case Study: hCaptcha
Growing two-sided markets with blockchain tech
Eli-Shaoul Khedouri Intuition Machines
AI SF (2018-09-05)

Customers: provide datasets which need to be labeled
Web site publishers: serve “captcha” security tests and receive tokenized payments
Consumers: prove they’re “human” by labeling image data

https://twitter.com/pacoid/status/1037413665283010560
https://hcaptcha.com/
https://www.hmt.ai/
Case Study: Crowdbotics
Anand Kulkarni Crowdbotics

HITL for code+test gen, trained from GitHub, StackOverflow, etc., with JIRA tickets as the granular object in the system
parse specs from JIRA history, reuse what’s been done before; generate PRs for popular web stacks: React, Flask, Ruby, etc.
resolve specs into the approach needed and time required, where product managers get cost estimates, then on-demand expert programmers implement for you
meanwhile, in-house engineers handle the “radically novel” projects
results: 1.5x software dev throughput
Case Study: Skymind
Unsupervised fuzzy labeling using deep learning to improve anomaly detection
Adam Gibson Skymind
Strata Data Conf, Singapore (2017-12-07)

large-scale use case for telecom in Asia
method: overfit variational autoencoders, then send outliers to human analysts
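The pattern here – learn the “routine” structure, score reconstruction error, route the worst outliers to analysts – can be sketched with PCA as a much simpler stand-in for the variational autoencoders used in the talk; the synthetic data and thresholds below are illustrative assumptions only:

```python
# Anomaly-triage sketch: fit a low-dimensional model of routine data,
# then send the highest-reconstruction-error points to human review.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
W = rng.normal(size=(2, 10))
# routine traffic lives near a 2-D manifold, plus small noise
normal = rng.normal(size=(500, 2)) @ W + 0.05 * rng.normal(size=(500, 10))
anomalies = rng.normal(size=(5, 10)) * 3.0   # injected points off the manifold
X = np.vstack([normal, anomalies])

pca = PCA(n_components=2).fit(normal)        # learn the "routine" structure
recon = pca.inverse_transform(pca.transform(X))
error = ((X - recon) ** 2).sum(axis=1)

# the top-k reconstruction errors become the analysts' review queue
review_queue = np.argsort(error)[-5:]
print(sorted(review_queue))  # the 5 injected anomalies (indices 500..504)
```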
Case Study: B12
Building human-assisted AI applications
Adam Marcus B12
O’Reilly Data Show (2016-08-25)

“Humans where they’re best, machines for the rest.”
Orchestra: a platform for building human-assisted AI applications, e.g., create/update business websites
https://github.com/b12io/orchestra
example: http://www.coloradopicked.com/
Case Study: Clara Labs
Strategies for integrating people and machine learning in online systems
Jason Laska Clara Labs
The AI Conf, NY (2017-06-29)

establishing a two-sided marketplace where machines and people compete on a spectrum of relative expertise and capabilities
Case Study: Clara Labs
Strategies for integrating people and machine learning in online systems
Jason Laska Clara Labs
The AI Conf, NY (2017-06-29)

“the trick is to design systems from Day 1 which learn implicitly from the intelligence which is already there”
Michael Akilian Clara Labs
Active Learning:
theory, tooling, practices

HITL theory: the business of business
Risk, Uncertainty, and Profit
Frank Knight
Riverside Press (1921)

▪ while ML has mostly been about generalization, we can borrow from Knight…
▪ use ML models to explore uncertainty in relationship to profit vs. risk – “focus the lens” within datasets
▪ also distinguish forms of uncertainty between aleatoric (noise) and epistemic (incomplete model)
▪ see “AI-native blockchain” where models request data – Bharath Ramsundar
HITL theory: choosing what to learn
Active Learning Literature Survey
Burr Settles UW Madison (2010-01-26)

Can machines learn more economically if they ask human “oracles” questions? e.g., task in-house experts with the edge cases?
▪ uncertainty sampling: query about instances which ML is least certain how to label – least confidence / margin / entropy
▪ query-by-committee: ensemble of ML models votes; query the instance about which they disagree most
▪ expected error reduction: maximize the expected information gain of the query
▪ variance reduction: minimize future generalization error of the model (e.g., loss function)
▪ density-weighted methods: instances which are both uncertain and “representative” of the underlying distribution
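The three uncertainty-sampling heuristics above can each be computed directly from a model's predicted class probabilities; a minimal sketch, with a tiny hand-made probability matrix standing in for real model output:

```python
# Query-scoring sketch: least confidence, margin, and entropy, computed
# on predicted class probabilities (one row per unlabeled instance).
# Higher score = more informative to query.
import numpy as np

probs = np.array([[0.9, 0.05, 0.05],   # model is confident here
                  [0.4, 0.35, 0.25],   # model is unsure here
                  [0.5, 0.5, 0.0]])    # two classes tied

# least confidence: 1 minus the top class probability
least_confidence = 1.0 - probs.max(axis=1)

# margin: a small gap between the top two classes means high uncertainty
sorted_p = np.sort(probs, axis=1)
margin_uncertainty = 1.0 - (sorted_p[:, -1] - sorted_p[:, -2])

# entropy: probability spread across all classes
entropy = -np.sum(np.where(probs > 0, probs * np.log(probs), 0.0), axis=1)

# all three heuristics rank the confident instance 0 last
print(least_confidence, margin_uncertainty, entropy, sep="\n")
```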
HITL tooling: active learning
Agnostic Active Learning Without Constraints
Alina Beygelzimer, Daniel Hsu, John Langford, Tong Zhang
NIPS (2010-06-14)

The End of the Beginning of Active Learning
Daniel Hsu, John Langford
Hunch.net (2011-04-20)
https://github.com/JohnLangford/vowpal_wabbit/wiki

focused on cases where labeling is expensive; uses importance-weighted active learning; handles “adversarial label noise”
performs as well as or better than supervised ML, wherever supervised ML works
HITL tooling: machine teaching
Prodigy: a new tool for radically efficient machine teaching
Matthew Honnibal, Ines Montani
Explosion.ai (2017)
HITL practices: model interpretation
A Survey Of Methods For Explaining Black Box Models
Riccardo Guidotti, et al. (2018-02-19)

Understanding Black-box Predictions via Influence Functions
Pang Wei Koh, Percy Liang
ICML (2017-07-10)

The Building Blocks of Interpretability
Chris Olah, et al. Google Brain
Distill (2018-03-06)

Challenges for Transparency
Adrian Weller
WHI (2017-07-29)

The Mythos of Model Interpretability
Zachary Lipton
WHI (2016-03-06)
HITL practices: model interpretation
explainability of ML models becomes essential, and must be intuitive for the human experts involved: Skater, and also Anchors, SHAP, STREAK, LIME, etc.
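The libraries named above build richer (often local, per-prediction) explanations; as a library-light sketch of the underlying idea, here is global permutation importance via scikit-learn – shuffle one feature and measure how much holdout skill drops (the dataset and model are illustrative choices only):

```python
# Model interpretation sketch: permutation importance ranks features
# by how much shuffling each one hurts holdout accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, n_repeats=5, random_state=0)

# show the three features whose shuffling degrades the model most
for i in result.importances_mean.argsort()[::-1][:3]:
    print(f"{data.feature_names[i]}: {result.importances_mean[i]:.3f}")
```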
HITL practices: no-collar workforce
No-collar workforce: Humans and machines in one loop
Anthony Abbatiello, Tim Boehm, Jeff Schwartz, Sharon Chand
Deloitte Insights (2017-12-05)

▪ near-future: human workers and machines complement each other’s efforts in a single loop of productivity
▪ 2018-20: expect firms to embrace a “no-collar workforce” trend by redesigning jobs
▪ yet only ~17% are ready to manage a workforce in which people, robots, and AI work side by side – largely due to cultural, tech fluency, and regulatory issues
▪ e.g., what about onboarding or retiring non-human workers? these are no longer theoretical questions
▪ HR orgs must develop strategies and tools for recruiting, managing, and training a hybrid workforce
Social Systems:
collaboration with machines

First-order cybernetics
Cybernetics: Or Control and Communication in the Animal and the Machine
Norbert Wiener MIT
MIT Press (1948)

early work had been about closed-loop control systems: homeostasis, habituation, adaptation, and other regulatory processes
given a system which has input and output, a controller leveraging a negative feedback loop, and one or more observers outside of the system
related to the early Macy Conferences
First-order cybernetics
a sea-change: “the organism was no longer an input/output machine; rather it was part of a loop from perception to action and back again to perception”
Paul Pangaro describing Jerry Lettvin @ MIT cybernetics
Second-order cybernetics
▪ biology informing computer science
▪ historical context of Project Cybersyn
▪ autopoiesis and cognition
▪ organizational closure: “self-making means stability”
▪ speech acts (e.g., social analysis of open source)
▪ IMO, blueprints for AI systems

Also, the focus on “information as a collection of facts” is yet another form of cognitive bias – instilled through 30+ years of data warehouse practices, where data must fit into dimensions, facts, schema
Second-order cybernetics
Autopoiesis and Cognition: The Realization of the Living
Humberto Maturana, Francisco Varela
Kluwer (1980 / original 1972)

Understanding Computers and Cognition: A New Foundation for Design
Terry Winograd, Fernando Flores
Intellect Books (1986)

Conversations for Action and Collected Essays
Fernando Flores
Createspace (2013)
Second-order cybernetics
“Action emerges from committed interactions of people making requests and promises in networks of commitments; such networks aren’t brought forth by plans.”
– Fernando Flores

[could rewrite quote as “people and machines”]
Summary:
how this matters

What is changing and why?
Second-order cybernetics began partly as a study of how complex systems fail, and also about what social systems and physical systems had in common
It provides foundations for AI systems of people + machines
Feedback loops represent structured conversations for action, from which the participants cannot be detached
The organization is no longer viewed as an input/output machine; rather it’s a pluralistic network of loops from perception to action and back again to perception – e.g., DL augments perception and HITL augments actions
Strategies from Big Data
In general with Big Data, we were considering:
▪ DAG workflow execution – those are typically linear
▪ “data-driven organizations”
▪ ML based on optimizing for objective functions
▪ general considerations about correlation vs. causation
▪ avoid “garbage in, garbage out”

[diagram: Jarvis workflow]
Strategies from Big Data
These were sound approaches at the time, given the IT backdrop of batch processing, BI, etc. – however, much less ROI in context today
Management strategy with HITL
HITL introduces circularities:
▪ deprecate linear input/output systems as the “conventional wisdom”
▪ analogous to an OODA loop which includes automation/augmentation
▪ recognize multiple feedback loops as conversations for action
▪ recognize opportunity: loops from perception (e.g., DL) to action (e.g., RL) and back again to perception
▪ hint: recognize “verbs” being used, rather than over-emphasizing “nouns”

[diagram: Organizational Learning loop linking ML Models, Human Experts, and Customers – Experts decide about edge cases, providing examples/actions; Models focus Experts (e.g., weak supervision); Models explore uncertainty when needed; Models act on decisions when possible; Experts gain insights via Model explanations; Customers request Sales, Marketing, Service, Training; Experts learn through Customer interactions; Customer Use Cases drive the loop]
“design systems which learn implicitly from the intelligence that’s already there”

[diagram repeated: Organizational Learning loop linking ML Models, Human Experts, and Customers]
publications, interviews, conference summaries…
https://derwen.ai/paco

@pacoid
