
Best practices for human-in-the-loop:
the business case for active learning

Paco Nathan @pacoid 2018-09-07


Introduction:
definitions and indications
Machine learning
supervised ML:
▪ take a dataset where each element has a label
▪ train models on a portion of the data to predict the labels, then evaluate on the holdout
▪ deep learning is a popular example, when you have lots of labeled training data available
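The supervised workflow above can be sketched in a few lines of Python; this is a minimal illustration (scikit-learn's bundled iris dataset is chosen purely as an example), not a recipe for production:

```python
# Supervised ML sketch: train on a portion of labeled data,
# then evaluate on the holdout.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# hold out 30% of the labeled examples for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# score on the holdout, never on the training portion
print(f"holdout accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```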
Machine learning
unsupervised ML:
▪ run lots of unlabeled data through an algorithm to detect “structure” or embedding
▪ for example, clustering algorithms such as K-means
▪ unsupervised approaches for AI are an open research question
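For the unsupervised case, a minimal K-means sketch (using synthetic blob data purely for illustration): no labels go in, and cluster structure comes out.

```python
# Unsupervised ML sketch: K-means detects cluster "structure"
# in unlabeled data.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# 300 unlabeled points drawn from 3 underlying groups
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# each point gets assigned to one of the 3 discovered clusters
print(np.bincount(km.labels_))  # cluster sizes; well-separated blobs ~100 each
```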
Active learning
a special case of semi-supervised ML:
▪ send difficult decisions/edge cases to experts (feedback); let algorithms handle routine decisions (automate)
▪ works well in use cases which have lots of inexpensive, unlabeled data
▪ e.g., abundance of content to be classified, where cost of labeling is a major expense
▪ can help explore/exploit uncertainty – in the sense of risk vs. uncertainty vs. opportunity
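The feedback/automate split above can be sketched as a simple query loop. This is a schematic illustration only: the held-back true labels stand in for the human expert, and the digits dataset and least-confidence heuristic are example choices, not prescriptions.

```python
# Active learning loop sketch: the model automates confident cases,
# while low-confidence edge cases are sent to an "expert" for labels.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)

labeled = list(rng.choice(len(X), size=50, replace=False))  # small seed set
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=2000)
for _ in range(5):  # five rounds of querying the "expert"
    model.fit(X[labeled], y[labeled])
    # least-confidence: find pool instances the model is least sure about
    conf = model.predict_proba(X[pool]).max(axis=1)
    edge_cases = [pool[i] for i in np.argsort(conf)[:20]]
    labeled += edge_cases                    # expert supplies these labels
    pool = [i for i in pool if i not in edge_cases]

print(f"labeled {len(labeled)} of {len(X)} examples")
```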
HITL is a “mix and match” approach
Human in the loop: Machine learning and AI for the people
George Anadiotis
ZDNet (2018-05-23)

“Rather than just recognize patterns, we can use ML to look at data and identify opportunities, by identifying uncertainty. This is not about risk – there’s no upside to risk, you just buy insurance. But if you can separate risk from uncertainty, you can profit, because that’s where the opportunities are.”

Humans in the loop
Paco Nathan
Domino Data Lab (2017-08-16)
Let’s consider:
• AI encompasses much more than deep learning: there are also knowledge graphs, planning systems, other kinds of learning, mathematical optimization, etc.
• Also think about teams of people + machines
• How can these different parts be integrated?
• Which methodologies need to be followed?
1/ Achieving human parity
Achieving Human Parity in Conversational Speech Recognition
W. Xiong, et al. Microsoft (2016-10-12)

Microsoft researchers reached human parity in conversational speech recognition, reducing speech-to-text error rates below the human threshold of ~5% – seen repeatedly as a metric for “expertise”

AI defined? achieving parity with human experts, in domains where experts solve “hard problems” – perhaps more aptly stated as learning in uncertain domains
2/ Segments for AI adoption in industry

segment: Google, Amazon, Microsoft, IBM, Apple, Baidu, etc.
liabilities:
▪ high capital expenses, long-term R&D as hardware evolves rapidly
▪ potential vulnerabilities by automating too much
▪ potential vulnerabilities by mistaking first-order cybernetics for second-order
assets:
▪ AI + cloud + mobile/embed, leveraging a flywheel effect
▪ had focused business lines well in advance to prepare large-scale labeled data sets
▪ uses AI to explore uncertainty, focusing their core expertise

segment: adopters (MIT SMR 2017)
liabilities:
▪ facing barriers: talent gap, competing investment priorities, security concerns
▪ verticals eroded by horizontal business lines from top incumbents
assets:
▪ HITL provides a vector to compete against top incumbents, with many unexplored areas of opportunity

segment: legacy
liabilities:
▪ struggling to recognize business use cases
▪ buried in tech debt from digital infrastructure
▪ lacks management buy-in
assets: ??
2/ Segments for AI adoption in industry
The State of Machine Learning Adoption in the Enterprise
Ben Lorica, Paco Nathan
O’Reilly Media (2018-08-07)

O’Reilly did a recent study about ML adoption in enterprise, with 8000+ respondents worldwide, which provides relevant insights. Also see a summary article:
https://www.oreilly.com/ideas/5-findings-from-oreilly-machine-learning-adoption-survey-companies-should-know
3/ “Detach from named methods”
Developers Should Abandon Agile
Ron Jeffries (2018-05-10)

“No matter what framework or method your management thinks they are applying, learn to work this way:
• Produce running, tested, working, integrated software every two weeks, every week…
• Keep the design of that software clean. As it grows, the design will tend to become complex and crufty…
• Use the current increment of software as the foundation for all your conversations with your product leadership and management…”
3/ “Detach from named methods”
Developers Should Abandon Agile
Ron Jeffries (2018-05-10)

~20 yrs ago: “Agile” created value by iterating on a code base, while the nature of the data was relatively invariant, e.g., specified by a schema, relegated to unit tests, database, etc.

today: OSS shifted ROI – now someone else maintains the code; but without LOTS of carefully labeled data, that code may not yield much return
4/ The reality of data rates
“If you only have 10 examples of something, it’s going to be hard to make deep learning work. If you have 100,000 things you care about, records or whatever, that’s the kind of scale where you should really start thinking about these kinds of techniques.”
Jeff Dean Google
VB Summit (2017-10-23)
venturebeat.com/2017/10/23/google-brain-chief-says-100000-examples-is-enough-data-for-deep-learning/
4/ The reality of data rates
Transfer learning aside, most DL use cases require large, carefully labeled data sets, while RL requires much more data than that. Active Learning can yield good results with substantially smaller data rates, while leveraging an organization’s expertise to bootstrap toward larger labeled data sets, e.g., as preparation for deep learning, etc.

[chart: data rates (log scale) compared across active, supervised, deep, and reinforcement learning – many enterprise use cases fit in the active learning range]
4/ The reality of data rates
“Reducing the Data Demands of Smart Machines”
https://www.darpa.mil/news-events/2018-07-11
• reduce the data needed to build new models by 10⁶
• reduce the data needed to adapt models from millions to hundreds of labeled examples

“We are encouraging researchers to create novel methods in the areas of meta-learning, transfer learning, active learning, k-shot learning, and supervised/unsupervised adaptation to solve this challenge.”
5/ Human-AI Coexistence?
China: AI superpower
Kai-Fu Lee Sinovation Ventures
AI SF, 2018-09-06

quadrants plotted by “optimization” vs “compassion”
5/ Human-AI Coexistence?
Second-order cybernetics provides decades of learnings which now apply for AI adoption in enterprise

Cybernetics and Design: Conversations for Action
Hugh Dubberly, Paul Pangaro DDO (2015-11-01)

Second-Order Cybernetics
Ranulph Glanville
Systems Science and Cybernetics (2003)
Active Learning:
case studies and patterns

Who’s doing this?
Case Study: Primer
Using Artificial Intelligence to Fix Wikipedia’s Gender Problem
Tom Simonite
Wired (2018-08-03)
Case Study: Stitch Fix
Building a business that combines human experts and data science
Eric Colson Stitch Fix
O’Reilly Data Show (2016-01-28)

“what machines can’t do are things around cognition, things that have to do with ambient information, or appreciation of aesthetics, or even the ability to relate to another human”
Case Study: Figure Eight
Real-World Active Learning: Applications and Strategies for Human-in-the-Loop ML
Ted Cuzzillo
O’Reilly Media (2015-02-05)

Active learning and transfer learning
Lukas Biewald Figure Eight
The AI Conf, SF (2017-09-17)

breakthroughs lag invention of algorithms – must wait for a “killer data set” to emerge, often lagging by a decade or more
Case Study: EY
EY, Deloitte And PwC Embrace Artificial Intelligence For Tax And Accounting
Adelyn Zhou
Forbes (2017-11-14)

compliance use cases in reviewing lease accounting standards
3x more consistent and 2x more efficient than the previous humans-only teams
break-even ROI within less than a year
Case Study: Lola
Paul English on Lola’s Debut for Business Travelers
Elizabeth West
Business Travel News (2017-10-04)

founded 2015 by Paul English and other Kayak execs: on-demand, personal travel service; uses expert travel agents for HITL
initially criticized by travel industry as “competing against Siri”; currently displacing OTAs in a reversal of “AI vs. jobs”
can book on Airbnb, Southwest, etc., which aren’t available via OTA, because of the human delegation
“The first time you use Lola it’s going to be great because it’s a conversation. We’re not making you think like a computer”
“Instead of showing you 300 choices or 1,000 choices, we think we can show you three choices, kind of good, better, best”
Case Study: SAP Concur
When Privacy Scales
Amanda Casari (SAP Concur)
AI SF (2018-09-06)
Case Study: hCaptcha
Growing two-sided markets with blockchain tech
Eli-Shaoul Khedouri Intuition Machines
AI SF (2018-09-05)

Customers: provide datasets which need to be labeled
Web site publishers: serve “captcha” security tests and receive tokenized payments
Consumers: prove they’re “human” by labeling image data

https://twitter.com/pacoid/status/1037413665283010560
https://hcaptcha.com/
https://www.hmt.ai/
Case Study: Crowdbotics
Anand Kulkarni Crowdbotics

HITL for code+test gen, trained from GitHub, StackOverflow, etc., with JIRA tickets as the granular object in the system
parse specs from JIRA history, reuse what’s been done before; generate PRs for popular web stacks: React, Flask, Ruby, etc.
resolve specs into the approach needed and time required, where product managers get cost estimates, then on-demand expert programmers implement for you
meanwhile, in-house engineers handle the “radically novel” projects
results: 1.5x software dev throughput
Case Study: Skymind
Unsupervised fuzzy labeling using deep learning to improve anomaly detection
Adam Gibson Skymind
Strata Data Conf, Singapore (2017-12-07)

large-scale use case for telecom in Asia
method: overfit variational autoencoders, then send outliers to human analysts
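The pattern here – learn the “routine” structure, score reconstruction error, route the worst outliers to analysts – can be sketched with PCA as a much simpler stand-in for the variational autoencoders used in the talk; the synthetic data and thresholds below are illustrative assumptions only:

```python
# Anomaly-triage sketch: fit a low-dimensional model of routine data,
# then send the highest-reconstruction-error points to human review.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
W = rng.normal(size=(2, 10))
# routine traffic lives near a 2-D manifold, plus small noise
normal = rng.normal(size=(500, 2)) @ W + 0.05 * rng.normal(size=(500, 10))
anomalies = rng.normal(size=(5, 10)) * 3.0   # injected points off the manifold
X = np.vstack([normal, anomalies])

pca = PCA(n_components=2).fit(normal)        # learn the "routine" structure
recon = pca.inverse_transform(pca.transform(X))
error = ((X - recon) ** 2).sum(axis=1)

# the top-k reconstruction errors become the analysts' review queue
review_queue = np.argsort(error)[-5:]
print(sorted(review_queue))  # the 5 injected anomalies (indices 500..504)
```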
Case Study: B12
Building human-assisted AI applications
Adam Marcus B12
O’Reilly Data Show (2016-08-25)

“Humans where they’re best, machines for the rest.”
Orchestra: a platform for building human-assisted AI applications, e.g., create/update business websites
https://github.com/b12io/orchestra
example: http://www.coloradopicked.com/
Case Study: Clara Labs
Strategies for integrating people and machine learning in online systems
Jason Laska Clara Labs
The AI Conf, NY (2017-06-29)

establishing a two-sided marketplace where machines and people compete on a spectrum of relative expertise and capabilities
Case Study: Clara Labs
Strategies for integrating people and machine learning in online systems
Jason Laska Clara Labs
The AI Conf, NY (2017-06-29)

“the trick is to design systems from Day 1 which learn implicitly from the intelligence which is already there”
Michael Akilian Clara Labs
Active Learning:
theory, tooling, practices

HITL theory: the business of business
Risk, Uncertainty, and Profit
Frank Knight
Riverside Press (1921)

▪ while ML has mostly been about generalization, we can borrow from Knight…
▪ use ML models to explore uncertainty in relationship to profit vs. risk – “focus the lens” within datasets
▪ also distinguish forms of uncertainty between aleatoric (noise) and epistemic (incomplete model)
▪ see “AI-native blockchain” where models request data – Bharath Ramsundar
HITL theory: choosing what to learn
Active Learning Literature Survey
Burr Settles UW Madison (2010-01-26)

Can machines learn more economically if they ask human “oracles” questions? e.g., task in-house experts with the edge cases?
▪ uncertainty sampling: query about instances which ML is least certain how to label – least confidence / margin / entropy
▪ query-by-committee: ensemble of ML models votes; query the instance about which they disagree most
▪ expected error reduction: maximize the expected information gain of the query
▪ variance reduction: minimize future generalization error of the model (e.g., loss function)
▪ density-weighted methods: instances which are both uncertain and “representative” of the underlying distribution
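The three uncertainty-sampling heuristics above can each be computed directly from a model's predicted class probabilities; a minimal sketch, with a tiny hand-made probability matrix standing in for real model output:

```python
# Query-scoring sketch: least confidence, margin, and entropy, computed
# on predicted class probabilities (one row per unlabeled instance).
# Higher score = more informative to query.
import numpy as np

probs = np.array([[0.9, 0.05, 0.05],   # model is confident here
                  [0.4, 0.35, 0.25],   # model is unsure here
                  [0.5, 0.5, 0.0]])    # two classes tied

# least confidence: 1 minus the top class probability
least_confidence = 1.0 - probs.max(axis=1)

# margin: a small gap between the top two classes means high uncertainty
sorted_p = np.sort(probs, axis=1)
margin_uncertainty = 1.0 - (sorted_p[:, -1] - sorted_p[:, -2])

# entropy: probability spread across all classes
entropy = -np.sum(np.where(probs > 0, probs * np.log(probs), 0.0), axis=1)

# all three heuristics rank the confident instance 0 last
print(least_confidence, margin_uncertainty, entropy, sep="\n")
```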
HITL tooling: active learning
Agnostic Active Learning Without Constraints
Alina Beygelzimer, Daniel Hsu, John Langford, Tong Zhang
NIPS (2010-06-14)

The End of the Beginning of Active Learning
Daniel Hsu, John Langford
Hunch.net (2011-04-20)
https://github.com/JohnLangford/vowpal_wabbit/wiki

focused on cases where labeling is expensive; uses importance-weighted active learning; handles “adversarial label noise”
performs as well as or better than supervised ML, wherever supervised ML works
HITL tooling: machine teaching
Prodigy: a new tool for radically efficient machine teaching
Matthew Honnibal, Ines Montani
Explosion.ai (2017)
HITL practices: model interpretation
A Survey Of Methods For Explaining Black Box Models
Riccardo Guidotti, et al. (2018-02-19)

Understanding Black-box Predictions via Influence Functions
Pang Wei Koh, Percy Liang
ICML (2017-07-10)

The Building Blocks of Interpretability
Chris Olah, et al. Google Brain
Distill (2018-03-06)

Challenges for Transparency
Adrian Weller
WHI (2017-07-29)

The Mythos of Model Interpretability
Zachary Lipton
WHI (2016-03-06)
HITL practices: model interpretation
explainability of ML models becomes essential, and must be intuitive for the human experts involved: Skater, and also Anchors, SHAP, STREAK, LIME, etc.
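The libraries named above build richer (often local, per-prediction) explanations; as a library-light sketch of the underlying idea, here is global permutation importance via scikit-learn – shuffle one feature and measure how much holdout skill drops (the dataset and model are illustrative choices only):

```python
# Model interpretation sketch: permutation importance ranks features
# by how much shuffling each one hurts holdout accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, n_repeats=5, random_state=0)

# show the three features whose shuffling degrades the model most
for i in result.importances_mean.argsort()[::-1][:3]:
    print(f"{data.feature_names[i]}: {result.importances_mean[i]:.3f}")
```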
HITL practices: no-collar workforce
No-collar workforce: Humans and machines in one loop
Anthony Abbatiello, Tim Boehm, Jeff Schwartz, Sharon Chand
Deloitte Insights (2017-12-05)

▪ near-future: human workers and machines complement each other’s efforts in a single loop of productivity
▪ 2018-20: expect firms to embrace a “no-collar workforce” trend by redesigning jobs
▪ yet only ~17% are ready to manage a workforce in which people, robots, and AI work side by side – largely due to cultural, tech fluency, and regulatory issues
▪ e.g., what about onboarding or retiring non-human workers? these are no longer theoretical questions
▪ HR orgs must develop strategies and tools for recruiting, managing, and training a hybrid workforce
Social Systems:
collaboration with machines

First-order cybernetics
Cybernetics: Or Control and Communication in the Animal and the Machine
Norbert Wiener MIT
MIT Press (1948)

early work had been about closed-loop control systems: homeostasis, habituation, adaptation, and other regulatory processes
given a system which has input and output, a controller leveraging a negative feedback loop, and one or more observers outside of the system
related to the early Macy Conferences
First-order cybernetics
a sea-change: “the organism was no longer an input/output machine; rather it was part of a loop from perception to action and back again to perception”
Paul Pangaro describing Jerry Lettvin @ MIT cybernetics
Second-order cybernetics
▪ biology informing computer science
▪ historical context of Project Cybersyn
▪ autopoiesis and cognition
▪ organizational closure: “self-making means stability”
▪ speech acts (e.g., social analysis of open source)
▪ IMO, blueprints for AI systems

Also, the focus on “information as a collection of facts” is yet another form of cognitive bias – instilled through 30+ years of data warehouse practices, where data must fit into dimensions, facts, schema
Second-order cybernetics
Autopoiesis and Cognition: The Realization of the Living
Humberto Maturana, Francisco Varela
Kluwer (1980 / original 1972)

Understanding Computers and Cognition: A New Foundation for Design
Terry Winograd, Fernando Flores
Intellect Books (1986)

Conversations for Action and Collected Essays
Fernando Flores
Createspace (2013)
Second-order cybernetics
“Action emerges from committed interactions of people making requests and promises in networks of commitments; such networks aren’t brought forth by plans.”
– Fernando Flores

[could rewrite quote as “people and machines”]
Summary:
how this matters

What is changing and why?
Second-order cybernetics began partly as a study of how complex systems fail, and also about what social systems and physical systems had in common
It provides foundations for AI systems of people + machines
Feedback loops represent structured conversations for action, from which the participants cannot be detached
The organization is no longer viewed as an input/output machine; rather it’s a pluralistic network of loops from perception to action and back again to perception – e.g., DL augments perception and HITL augments actions
Strategies from Big Data
In general with Big Data, we were considering:
▪ DAG workflow execution – those are typically linear
▪ “data-driven organizations”
▪ ML based on optimizing for objective functions
▪ general considerations about correlation vs. causation
▪ avoid “garbage in, garbage out”

[diagram: Jarvis workflow]
Strategies from Big Data
These were sound approaches at the time, given the IT backdrop of batch processing, BI, etc. – however, much less ROI in context today
Management strategy with HITL
HITL introduces circularities:
▪ deprecate linear input/output systems as the “conventional wisdom”
▪ analogous to an OODA loop which includes automation/augmentation
▪ recognize multiple feedback loops as conversations for action
▪ recognize opportunity: loops from perception (e.g., DL) to action (e.g., RL) and back again to perception
▪ hint: recognize “verbs” being used, rather than over-emphasizing “nouns”

[diagram: Organizational Learning loop linking ML Models, Human Experts, and Customers – Experts decide about edge cases, providing examples/actions; Models focus Experts (e.g., weak supervision); Models explore uncertainty when needed; Models act on decisions when possible; Experts gain insights via Model explanations; Customers request Sales, Marketing, Service, Training; Experts learn through Customer interactions; Customer Use Cases drive the loop]
“design systems which learn implicitly from the intelligence that’s already there”

[diagram repeated: Organizational Learning loop linking ML Models, Human Experts, and Customers]
publications, interviews, conference summaries…
https://derwen.ai/paco

@pacoid
