CIT Watson PDF

Best Practices for Continuous Improvement with
IBM Watson Assistant

—
Session #4543
Eric Wayne
STSM, IBM Watson AI
Think 2019 / 4543 / Feb 2019 / © 2019 IBM Corporation

Watson Assistant – Core Concepts
Definitions:
• Intents
• Entities
• Dialog flows
• User utterances
• Responses
Assistant Skills
Dialog Agent
Customer Channel Resolution

Session Agenda
1. The Practices
2. The Tools
3. The Story (an example)

The Practices

Best Practices Document
http://ibm.biz/wa-improve-best

Establishing a Baseline – Business KPIs
Cost
Revenue
Engagement
“If your goal is to reduce customer service cost, it should cost less to maintain your
assistant than it does to staff the equivalent customer support team capacity…”
Establishing a Baseline – Process Criteria
Provides an understanding of performance
Allows you to prioritize your improvement effort
Makes improvement as efficient as possible

Watson Assistant Continuous Improvement
Measure
Live System
Analyze
Deploy
Improve
Pre-deploy
Testing

Establishing a Baseline – Two Metrics
Coverage - the percentage of the total conversations or messages your assistant attempts to engage with.
Effectiveness - the quality of the experiences your assistant provided during the conversations or messages it did engage.
Coverage – Defined
Coverage is the percentage of the total conversations or messages your assistant attempts to engage
Coverage is a view of the range and depth of subject matter your assistant is trained on
Coverage can be measured by conversation or by message
The intent confidence thresholds you set directly impact coverage
Coverage can be measured live in production and offline with test sets or historic logs
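
The two ways of measuring coverage can be sketched in a few lines. This is a minimal illustration, not the Measure notebook's implementation: the record layout, the threshold value, and the rule that one uncovered message makes a whole conversation uncovered are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical log records: (conversation_id, top_intent_confidence).
# Real logs carry more fields; only these two are needed here.
messages = [
    ("c1", 0.92),
    ("c1", 0.15),
    ("c2", 0.81),
    ("c3", 0.10),
    ("c3", 0.05),
]


def message_coverage(messages, threshold):
    """Share of messages whose top intent confidence meets the threshold."""
    if not messages:
        return 0.0
    covered = sum(1 for _, conf in messages if conf >= threshold)
    return covered / len(messages)


def conversation_coverage(messages, threshold):
    """Share of conversations in which every message was covered.

    Assumption: a single uncovered message marks the conversation uncovered.
    """
    all_covered = defaultdict(lambda: True)
    for conv_id, conf in messages:
        all_covered[conv_id] = all_covered[conv_id] and conf >= threshold
    if not all_covered:
        return 0.0
    return sum(all_covered.values()) / len(all_covered)


THRESHOLD = 0.2  # assumed value, for illustration only
print(message_coverage(messages, THRESHOLD))
print(conversation_coverage(messages, THRESHOLD))
```

Raising the threshold in this sketch lowers both numbers, which is the direct impact of confidence thresholds on coverage described above.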

Coverage – How it’s Improved
Sample messages that went unanswered or that were unhandled.
Identify top opportunities:
Depth of intents
• If high confidence threshold, start with utterances just below threshold
Range of intents
• If high confidence threshold, start with utterances far below threshold
• Use intent recommendations (beta) or your own clustering algorithms on utterances with lowest confidence
Range of dialog
• Add dialog branches where correct intent was identified
Consider lowering confidence threshold with test set.
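
One way to separate the "just below threshold" and "far below threshold" review queues is sketched here. The `margin` parameter, the tuple layout, and the sample utterances are assumptions for illustration; the deck does not define how wide the "just below" band should be.

```python
def bucket_uncovered(utterances, threshold, margin=0.1):
    """Split uncovered utterances into 'just below' and 'far below' threshold.

    utterances: list of (text, top_intent, confidence) tuples.
    margin: assumed width of the 'just below' band; tune it for your data.
    """
    just_below, far_below = [], []
    for text, intent, conf in utterances:
        if conf >= threshold:
            continue  # covered, so not a coverage opportunity
        if conf >= threshold - margin:
            just_below.append((text, intent, conf))
        else:
            far_below.append((text, intent, conf))
    # Depth review: closest to the threshold first.
    just_below.sort(key=lambda u: u[2], reverse=True)
    # Range review: lowest confidence first (candidates for new intents).
    far_below.sort(key=lambda u: u[2])
    return just_below, far_below


# Hypothetical sample data.
sample_utterances = [
    ("pay my bill", "payment", 0.90),
    ("bill seems wrong this month", "billing", 0.45),
    ("why was I charged twice", "billing", 0.42),
    ("my router keeps blinking", "upgrade", 0.05),
    ("no signal in my area", "billing", 0.20),
]

just_below, far_below = bucket_uncovered(sample_utterances, threshold=0.5)
print([u[0] for u in just_below])
print([u[0] for u in far_below])
```

The first list suggests examples to add to existing intents (depth); the second suggests subject matter the assistant is not yet trained on (range).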

Effectiveness - Defined
Effectiveness is the quality of the experiences your assistant provided
during the conversations and messages it did engage.
Effectiveness can be measured live in production with metrics dashboards or off-line using labeled test sets or training data.
Measurements of effectiveness include:
• Conversation containment
• Conversation success (task completion)
• Intent confidence of messages in the conversation
• Precision of individual messages in the conversation
• Sentiment analysis
• Explicit user feedback (NPS at end of a sample or all conversations)
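
Two of the measurements listed above, containment and task completion, reduce to simple ratios once each conversation carries the right flags. The sketch below assumes hypothetical `escalated` and `completed` flags; how those flags are derived from real logs (for example, which dialog node signals a handoff) is deployment-specific and not shown.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    conversation_id: str
    escalated: bool  # assumed flag: conversation was handed to a human agent
    completed: bool  # assumed flag: user finished the task they came for


def containment_rate(conversations):
    """Share of conversations handled without escalation to an agent."""
    if not conversations:
        return 0.0
    return sum(not c.escalated for c in conversations) / len(conversations)


def success_rate(conversations):
    """Share of conversations in which the task was completed."""
    if not conversations:
        return 0.0
    return sum(c.completed for c in conversations) / len(conversations)


# Hypothetical sample data.
conversations = [
    Conversation("c1", escalated=False, completed=True),
    Conversation("c2", escalated=True, completed=False),
    Conversation("c3", escalated=False, completed=True),
    Conversation("c4", escalated=False, completed=False),
]

print(containment_rate(conversations))
print(success_rate(conversations))
```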
Measure Coverage and Effectiveness

Measure Effectiveness

Effectiveness – Improvement
Improve intents, entities, and dialog based on an assessment of a
sampling of conversations.
(1) Measure
Use automated metrics to decide where to focus
(2) Sample & Label
Sample ineffective conversations or messages, e.g. escalated conversations.
Label responses with correct:
• Intent
• Entity
• Dialog
(3) Analyze
Prioritize by:
• Least precise
• Confused pairs
• Business need
(4) Update workspace
Use assessment to drive:
• Resolve conflicts
• Add confused utterances to training
• Combine intents & add entities
• Add more training through intent recommendations
• Add missed entities
• Add dialog branches
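
Confused pairs can be counted directly from a labeled sample. This is a minimal sketch under the assumption that each labeled row reduces to a (predicted intent, correct intent) pair; the intent names are made up for the example.

```python
from collections import Counter


def confused_pairs(labeled):
    """Count intent pairs that were mistaken for one another.

    labeled: iterable of (predicted_intent, correct_intent) tuples.
    Pairs are unordered, so A predicted as B and B predicted as A
    are counted together.
    """
    counts = Counter()
    for predicted, correct in labeled:
        if predicted != correct:
            counts[tuple(sorted((predicted, correct)))] += 1
    return counts.most_common()


# Hypothetical labeled sample.
labeled_sample = [
    ("billing", "payment"),
    ("payment", "billing"),
    ("billing", "billing"),
    ("upgrade", "billing"),
]

for pair, count in confused_pairs(labeled_sample):
    print(pair, count)
```

The most frequent pairs are candidates for conflict resolution or for combining intents and distinguishing them with entities.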
The Tools

Analytics and user conversation logs in Watson Assistant
See what users are saying, and make updates to intent and entity training in your development
workspace.
Recommenders for conflict resolution and entity expansion in
Watson Assistant

Jupyter Notebooks: complementary to Watson Assistant
Jupyter notebooks that help customers implement the best practices – beginning with
Measure and Analyze Effectiveness. Available on GitHub and in the Watson Studio community.
The Story (an example)

Telco customer care organization plans to use
Watson Assistant to reduce burden on agents
Assistant Skills
Dialog Agent
Customer Channel Resolution

Bootstrap from existing chat system…
Upload available data to learn from logs of
human conversations

After uploading your chat logs…

Scan recommended groups of user messages
Shape into an intent
Add more to the intent via user example recommendations
Now ready to run a pilot with users. Note that some intents are “silent”
– not yet implemented but useful for tracking.
[Diagram: continuous improvement cycle. Labels: Bootstrap, Deploy, Live System, Measure, Analyze, Evaluate, Improve, Pre-deploy Testing]

Prepare for deployment – Integrate with a channel
Prepare for deployment - Preview
Observe the pilot with users…
Make updates to intents and entities in development workspace
Taking the next step…
We’ve been in production for a while…
But we have so many users and conversations, the manual improvement approach isn’t scaling. How do we prioritize what we analyze and improve?

Jupyter Notebooks: Complementary to Watson Assistant Tooling
Jupyter notebooks that help customers implement the best practices – beginning with
Measure and Analyze Effectiveness. Available on GitHub and in the Watson Studio community.
Measure Notebook retrieves logs and computes
automated metrics
Measure – Coverage over time
Measure – Effectiveness detail
Export an assessment spreadsheet. Let’s focus on
conversations that were escalated to an agent
In this scenario, we are focusing just on conversations that escalated
when a message was not covered by the chatbot
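
Selecting that subset and exporting it for annotation might look like the sketch below. It is not the Measure notebook's code: the field names, the `escalated` and `covered` flags, and the CSV column layout are assumptions chosen for illustration.

```python
import csv
import io
import random


def sample_for_assessment(records, sample_size, seed=0):
    """Keep escalated records with an uncovered message, then sample them.

    A fixed seed keeps the sample reproducible between runs.
    """
    candidates = [r for r in records if r["escalated"] and not r["covered"]]
    if len(candidates) <= sample_size:
        return candidates
    return random.Random(seed).sample(candidates, sample_size)


def write_assessment_csv(rows, out):
    """Write rows with an empty 'correct_intent' column for the annotator."""
    fieldnames = ["conversation_id", "utterance", "predicted_intent", "correct_intent"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "correct_intent": ""})


# Hypothetical log records.
records = [
    {"conversation_id": "c1", "utterance": "my internet is down",
     "predicted_intent": "billing", "escalated": True, "covered": False},
    {"conversation_id": "c2", "utterance": "pay my bill",
     "predicted_intent": "payment", "escalated": False, "covered": True},
    {"conversation_id": "c3", "utterance": "router keeps blinking",
     "predicted_intent": "upgrade", "escalated": True, "covered": False},
    {"conversation_id": "c4", "utterance": "talk to a person",
     "predicted_intent": "agent", "escalated": True, "covered": True},
]

sample = sample_for_assessment(records, sample_size=5)
buffer = io.StringIO()  # stands in for a file opened with newline=""
write_assessment_csv(sample, buffer)
print(buffer.getvalue())
```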
Measure – Annotate a sampling of responses
Open the Analyze Effectiveness Notebook and load the
annotation spreadsheet
Analyze Effectiveness - Summary Metrics
• Worst overall performing intents

• Worst recall intents
• Worst precision intents
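
The three rankings above follow from per-intent precision and recall computed over the annotated sample. A minimal sketch, assuming each annotated row reduces to a (predicted intent, correct intent) pair and using F1 as the "overall" score, which is an assumption rather than something the slides state:

```python
from collections import Counter


def per_intent_metrics(labeled):
    """Compute precision, recall and F1 per intent.

    labeled: list of (predicted_intent, correct_intent) tuples.
    An intent with no predictions or no true examples scores 0.0
    for the metric whose denominator is empty.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for predicted, correct in labeled:
        if predicted == correct:
            tp[correct] += 1
        else:
            fp[predicted] += 1  # predicted this intent, but it was wrong
            fn[correct] += 1    # missed the intent that was actually correct
    metrics = {}
    for intent in set(tp) | set(fp) | set(fn):
        p_den = tp[intent] + fp[intent]
        r_den = tp[intent] + fn[intent]
        precision = tp[intent] / p_den if p_den else 0.0
        recall = tp[intent] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        metrics[intent] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def worst(metrics, key, n=3):
    """Return the n intents with the lowest value for the given metric."""
    return sorted(metrics, key=lambda i: (metrics[i][key], i))[:n]


# Hypothetical annotated sample.
annotated = [
    ("billing", "billing"),
    ("billing", "payment"),
    ("payment", "payment"),
    ("upgrade", "billing"),
    ("billing", "billing"),
]

metrics = per_intent_metrics(annotated)
print("worst overall:", worst(metrics, "f1"))
print("worst recall:", worst(metrics, "recall"))
print("worst precision:", worst(metrics, "precision"))
```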

Analyze Effectiveness
Remember this analysis is based on a sampling of escalated conversations where a message was not covered by the chatbot. The overall chatbot performance is much better than this (See Measure Notebook).
This version of the skill has a number of intents that are “placeholders” - they are not yet implemented with dialog, and therefore the messages are not helpful.
We found a coverage problem hiding within “effectiveness”: a missing intent for helping customers troubleshoot. Without this intent, users had to ask for a human agent.
Improvements based on Effectiveness Analysis - Where do we focus?
Intents
• Add a new “Troubleshooting” intent
• Add more examples to intents
Entities
• Use spreadsheet to create new entities, values and synonyms (copy the JSON)
• Use the synonym recommendations in Watson Assistant
Dialog
• Prioritize implementing the missing dialogs based on which cases lead to escalations
New in Watson Assistant – Search Skill (Beta)
• For the Troubleshooting use case, consider implementing a search skill in front of a knowledge base (documents that guide users in resolving problems)

Fallback to Knowledge Base Search for the “Troubleshooting” Intent
Assistant Skills
Customer Channel Resolution
• Dialog: Explicit answers
• Search: Unify existing content
• Agent: The ultimate fallback
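
The fallback order on this slide (dialog first, then search, then a human agent) can be pictured as a chain of handlers. The sketch below is purely conceptual: the three handler functions are hypothetical stand-ins and do not use the Watson Assistant API, where skill routing is configured in the product rather than written by hand.

```python
from typing import Callable, Optional, Sequence

# A handler returns a reply, or None when it cannot help.
Handler = Callable[[str], Optional[str]]


def dialog_skill(message: str) -> Optional[str]:
    """Hypothetical stand-in: explicit answers for known requests."""
    answers = {"what is my balance": "Your balance is shown under My Account."}
    return answers.get(message.lower().strip("?! ."))


def search_skill(message: str) -> Optional[str]:
    """Hypothetical stand-in: keyword lookup in a knowledge base."""
    knowledge_base = {"router": "Knowledge base article: Restarting your router."}
    for keyword, article in knowledge_base.items():
        if keyword in message.lower():
            return article
    return None


def agent_handoff(message: str) -> Optional[str]:
    """The ultimate fallback: always answers by escalating."""
    return "Connecting you to a human agent."


def respond(message: str, handlers: Sequence[Handler]) -> str:
    """Return the reply of the first handler that produces one."""
    for handler in handlers:
        reply = handler(message)
        if reply is not None:
            return reply
    raise RuntimeError("no handler produced a reply")


CHAIN = [dialog_skill, search_skill, agent_handoff]
print(respond("What is my balance?", CHAIN))
print(respond("My router keeps blinking", CHAIN))
print(respond("I want to cancel everything", CHAIN))
```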
Available Resources
1. Best Practices Guide - http://ibm.biz/wa-improve-best
2. Bootstrap – Intent recommendations for new intents (Beta) and user examples (GA)
3. Measure – Measure notebook
4. Analyze – Effectiveness notebook
5. Improve – Conflict resolution, Entity Expansion & Intent recommendations

Visit ibm.biz/AssistantTHINKSignUp to get started with Watson Assistant for free
Rate us on G2 Crowd for $10 donation to Girls Who Code & $10 gift certificate to Starbucks
Thank you
Eric Wayne
STSM and Development Manager, IBM Watson
—
ewayne@us.ibm.com
ibm.com
