
Best Practices for Continuous Improvement with IBM Watson Assistant



Session #4543

Eric Wayne
STSM, IBM Watson AI

Think 2019 / 4543 / Feb 2019 / © 2019 IBM Corporation


Watson Assistant – Core Concepts
Definitions:
• Intents
• Entities
• Dialog flows
• User utterances
• Responses

[Diagram labels: Customer, Channel, Assistant, Skills, Dialog, Agent, Resolution]



Session Agenda

1. The Practices
2. The Tools
3. The Story (an example)



The Practices



Best Practices Document
http://ibm.biz/wa-improve-best



Establishing a Baseline – Business KPIs

Cost
Revenue
Engagement
“If your goal is to reduce customer service cost, it should cost less to maintain your
assistant than it does to staff the equivalent customer support team capacity…”
Establishing a Baseline – Process Criteria

Provides an understanding of performance

Allows you to prioritize your improvement effort

Makes improvement as efficient as possible



Watson Assistant Continuous Improvement

[Cycle diagram labels: Measure, Analyze, Improve, Deploy, Live System, Pre-deploy Testing]



Establishing a Baseline – Two Metrics

Coverage - the percentage of the total conversations or messages your assistant attempts to engage with.

Effectiveness - the quality of the experiences your assistant provided during the conversations or messages it did engage.
Coverage – Defined

Coverage is the percentage of the total conversations or messages your assistant attempts to engage

Coverage is a view of the range and depth of subject matter your assistant is trained on

Coverage can be measured by conversation or by message

The intent confidence thresholds you set directly impact coverage

Coverage can be measured live in production and offline with test sets or historic logs
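
A minimal sketch of measuring coverage by message and by conversation, assuming each logged message carries a conversation ID and the confidence of its top intent. The field names `conversation_id` and `confidence` are hypothetical, and treating a conversation as covered only when every one of its messages is covered is one possible reading, not a definition given in this session.

```python
from collections import defaultdict


def message_coverage(messages, threshold):
    """Fraction of messages whose top intent confidence meets the threshold."""
    if not messages:
        return 0.0
    covered = sum(1 for m in messages if m["confidence"] >= threshold)
    return covered / len(messages)


def conversation_coverage(messages, threshold):
    """Fraction of conversations in which every message was covered (assumed rule)."""
    flags_by_conversation = defaultdict(list)
    for m in messages:
        flags_by_conversation[m["conversation_id"]].append(m["confidence"] >= threshold)
    if not flags_by_conversation:
        return 0.0
    fully_covered = sum(1 for flags in flags_by_conversation.values() if all(flags))
    return fully_covered / len(flags_by_conversation)


# Hypothetical log sample: two conversations, four messages.
sample = [
    {"conversation_id": "c1", "confidence": 0.92},
    {"conversation_id": "c1", "confidence": 0.15},
    {"conversation_id": "c2", "confidence": 0.71},
    {"conversation_id": "c2", "confidence": 0.64},
]

print(message_coverage(sample, threshold=0.2))       # 0.75
print(conversation_coverage(sample, threshold=0.2))  # 0.5
```

Raising the threshold in this sketch lowers both numbers, which is the direct impact of the confidence threshold on coverage noted above.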



Coverage – How it’s Improved

Sample messages that went unanswered or that were unhandled.
Identify top opportunities:
Depth of intents
• If high confidence threshold, start with utterances just below threshold
Range of intents
• If high confidence threshold, start with utterances far below threshold
• Use intent recommendations (beta) or your own clustering algorithms on utterances with lowest confidence
Range of dialog
• Add dialog branches where correct intent was identified
Consider lowering confidence threshold with test set.
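
One way to weigh a lower confidence threshold against a labeled test set is to sweep candidate thresholds and compare coverage with the precision of the covered messages. The sketch below assumes each test example records the top intent's confidence, the predicted intent, and the expected intent; the field names and sample values are illustrative, not taken from the session.

```python
def sweep_thresholds(test_set, thresholds):
    """Return (threshold, coverage, precision) for each candidate threshold."""
    rows = []
    for t in thresholds:
        covered = [ex for ex in test_set if ex["confidence"] >= t]
        coverage = len(covered) / len(test_set) if test_set else 0.0
        correct = sum(1 for ex in covered if ex["predicted"] == ex["expected"])
        # Precision is computed only over messages the assistant would answer.
        precision = correct / len(covered) if covered else 0.0
        rows.append((t, coverage, precision))
    return rows


# Hypothetical labeled test set.
test_set = [
    {"confidence": 0.9, "predicted": "billing", "expected": "billing"},
    {"confidence": 0.6, "predicted": "outage", "expected": "outage"},
    {"confidence": 0.4, "predicted": "billing", "expected": "upgrade"},
    {"confidence": 0.1, "predicted": "outage", "expected": "billing"},
]

rows = sweep_thresholds(test_set, [0.2, 0.5, 0.8])
for t, coverage, precision in rows:
    print(f"threshold={t:.1f} coverage={coverage:.2f} precision={precision:.2f}")
```

In this toy data, dropping the threshold from 0.5 to 0.2 raises coverage from 0.50 to 0.75 but lets in a wrong answer, which is the trade-off to check before changing the live setting.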



Effectiveness - Defined
Effectiveness is the quality of the experiences your assistant provided
during the conversations and messages it did engage.

Effectiveness can be measured live in production with metrics dashboards or off-line using labeled test sets or training data.
Measurements of effectiveness include:

• Conversation containment
• Conversation success (task completion)
• Intent confidence of messages in the conversation
• Precision of individual messages in the conversation
• Sentiment analysis
• Explicit user feedback (NPS at end of a sample or all conversations)
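
Two of the measurements listed above, conversation containment and conversation success, reduce to simple ratios once each conversation has been flagged. A minimal sketch, assuming hypothetical boolean fields `escalated` (handed to a human agent) and `task_completed` on each conversation record:

```python
def effectiveness_summary(conversations):
    """Compute containment and task-completion rates over a list of conversations."""
    total = len(conversations)
    if total == 0:
        return {"containment": 0.0, "success": 0.0}
    # A conversation is contained when it was never handed to a human agent.
    contained = sum(1 for c in conversations if not c["escalated"])
    completed = sum(1 for c in conversations if c["task_completed"])
    return {"containment": contained / total, "success": completed / total}


# Hypothetical conversation records.
conversations = [
    {"escalated": False, "task_completed": True},
    {"escalated": False, "task_completed": True},
    {"escalated": False, "task_completed": False},
    {"escalated": True, "task_completed": False},
]

print(effectiveness_summary(conversations))
# {'containment': 0.75, 'success': 0.5}
```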
Measure Coverage and Effectiveness



Measure Effectiveness



Effectiveness – Improvement
Improve intents, entities, and dialog based on an assessment of a
sampling of conversations.

(1) Measure
Use automated metrics to decide where to focus.

(2) Sample & Label
Sample ineffective conversations or messages, e.g. escalated conversations.
Label responses with correct:
• Intent
• Entity
• Dialog

(3) Analyze
Prioritize by:
• Least precise
• Confused pairs
• Business need

(4) Update workspace
Use assessment to drive:
• Resolve conflicts
• Add confused utterances to training
• Combine intents & add entities
• Add more training through intent recommendations
• Add missed entities
• Add dialog branches
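
As an illustration of prioritizing by confused pairs, the sketch below counts how often one intent was predicted when a labeler marked a different intent as correct. It assumes the labeled sample is available as records with hypothetical `predicted` and `correct` fields; the intent names are made up for the example.

```python
from collections import Counter


def confused_pairs(labeled):
    """Count (correct intent, predicted intent) pairs for misclassified messages."""
    pairs = Counter()
    for row in labeled:
        if row["predicted"] != row["correct"]:
            pairs[(row["correct"], row["predicted"])] += 1
    # Most frequent confusions first.
    return pairs.most_common()


# Hypothetical labeled sample.
labeled = [
    {"predicted": "cancel", "correct": "downgrade"},
    {"predicted": "cancel", "correct": "downgrade"},
    {"predicted": "billing", "correct": "billing"},
    {"predicted": "outage", "correct": "troubleshoot"},
]

for (correct, predicted), count in confused_pairs(labeled):
    print(f"{correct} mistaken for {predicted}: {count}")
```

The pairs at the top of this list are natural candidates for resolving conflicts, adding the confused utterances to training, or combining intents.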
The Tools



Analytics and user conversation logs in Watson Assistant

See what users are saying, and make updates to intent and entity training in your development
workspace.
Recommenders for conflict resolution and entity expansion in
Watson Assistant



Jupyter Notebooks: complementary to Watson Assistant

Jupyter notebooks that help customers implement the best practices – beginning with
Measure and Analyze Effectiveness. Available in Github and the Watson Studio community.
The Story (an example)



Telco customer care organization plans to use
Watson Assistant to reduce burden on agents



Bootstrap from existing chat system…
Upload available data to learn from logs of
human conversations



After uploading your chat logs…



Scan recommended groups of user messages
Shape into an intent
Add more to the intent via user example recommendations
Now ready to run a pilot with users. Note that some intents are “silent”
– not yet implemented but useful for tracking.
Watson Assistant Continuous Improvement

[Cycle diagram labels: Bootstrap, Deploy, Measure, Analyze, Evaluate, Improve, Live System, Pre-deploy Testing]



Prepare for deployment – Integrate with a channel
Prepare for deployment - Preview
Observe the pilot with users…
Make updates to intents and entities in development workspace
Taking the next step…

We’ve been in production for a while…

But we have so many users and conversations, the manual improvement approach isn’t scaling. How do we prioritize what we analyze and improve?



Jupyter Notebooks: Complementary to Watson
Assistant Tooling

Measure Notebook retrieves logs and computes
automated metrics
Measure – Coverage over time
Measure – Effectiveness detail
Export an assessment spreadsheet. Let’s focus on
conversations that were escalated to an agent
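
A sketch of what exporting such an assessment spreadsheet could look like, assuming the logs are already loaded as a list of message records. This is not the Measure notebook's own code: the column names, the `escalated` flag, and the empty labeling columns are all hypothetical choices for illustration.

```python
import csv
import io

# Hypothetical column layout: the last two columns are left blank for human labelers.
FIELDS = ["conversation_id", "utterance", "predicted_intent", "confidence",
          "correct_intent", "response_helpful"]


def write_assessment(messages, out):
    """Write messages from escalated conversations as CSV rows; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    count = 0
    for m in messages:
        if not m["escalated"]:
            continue
        writer.writerow({
            "conversation_id": m["conversation_id"],
            "utterance": m["utterance"],
            "predicted_intent": m["predicted_intent"],
            "confidence": m["confidence"],
            "correct_intent": "",
            "response_helpful": "",
        })
        count += 1
    return count


# Hypothetical log records.
messages = [
    {"conversation_id": "c1", "utterance": "my modem keeps dropping",
     "predicted_intent": "outage", "confidence": 0.31, "escalated": True},
    {"conversation_id": "c2", "utterance": "what is my balance",
     "predicted_intent": "billing", "confidence": 0.95, "escalated": False},
]

buffer = io.StringIO()
written = write_assessment(messages, buffer)
print(written)
print(buffer.getvalue())
```

Passing an open file object instead of the `io.StringIO` buffer would write the same rows to disk for annotation in a spreadsheet tool.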
In this scenario, we are focusing just on conversations that escalated
when a message was not covered by the chatbot
Measure – Annotate a sampling of responses
Open the Analyze Effectiveness Notebook and load the
annotation spreadsheet
Analyze Effectiveness - Summary Metrics

• Worst overall performing intents


• Worst recall intents
• Worst precision intents
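
The per-intent summary metrics can be sketched as follows, assuming the annotated spreadsheet has been loaded into records with hypothetical `predicted` and `correct` intent fields. This is an illustrative computation of standard precision, recall, and F1, not the Analyze Effectiveness notebook's actual code.

```python
from collections import Counter


def intent_metrics(labeled):
    """Compute precision, recall, and F1 per intent from labeled predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for row in labeled:
        if row["predicted"] == row["correct"]:
            tp[row["correct"]] += 1
        else:
            fp[row["predicted"]] += 1  # predicted intent was wrong
            fn[row["correct"]] += 1    # correct intent was missed
    metrics = {}
    for intent in set(tp) | set(fp) | set(fn):
        predicted_total = tp[intent] + fp[intent]
        actual_total = tp[intent] + fn[intent]
        precision = tp[intent] / predicted_total if predicted_total else 0.0
        recall = tp[intent] / actual_total if actual_total else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[intent] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical annotated sample.
labeled = [
    {"predicted": "billing", "correct": "billing"},
    {"predicted": "billing", "correct": "upgrade"},
    {"predicted": "outage", "correct": "outage"},
    {"predicted": "outage", "correct": "troubleshoot"},
    {"predicted": "outage", "correct": "troubleshoot"},
]

metrics = intent_metrics(labeled)
# List intents from worst to best F1, breaking ties by name.
for intent in sorted(metrics, key=lambda name: (metrics[name]["f1"], name)):
    print(intent, metrics[intent])
```

Sorting by `recall` or `precision` instead of `f1` gives the other two "worst" lists.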



Analyze Effectiveness

Remember this analysis is based on a sampling of escalated conversations where a message was not covered by the chatbot. The overall chatbot performance is much better than this (See Measure Notebook).

This version of the skill has a number of intents that are “placeholders” - they are not yet implemented with dialog, and therefore the messages are not helpful.

We found a coverage problem hiding within “effectiveness”: a missing intent for helping customers troubleshoot. Without this intent, users had to ask for a human agent.
Improvements based on Effectiveness Analysis
- Where do we focus?
Intents
• Add a new “Troubleshooting” intent
• Add more examples to intents

Entities
• Use spreadsheet to create new entities, values and synonyms (copy the JSON)
• Use the synonym recommendations in Watson Assistant

Dialog
• Prioritize implementing the missing dialogs based on which cases lead to escalations

New in Watson Assistant – Search Skill (Beta)
• For the Troubleshooting use case, consider implementing a search skill in front of a knowledge base (documents that guide users in resolving problems)
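
For the "copy the JSON" step, the sketch below turns spreadsheet-style rows of (entity, value, synonyms) into JSON. The output shape (`entity`, `values`, `type`, `value`, `synonyms`) is an assumption about the workspace entity format and should be checked against an exported workspace before use; the entity names and synonyms are invented.

```python
import json


def rows_to_entities(rows):
    """Group (entity, value, synonyms) rows into a list of entity definitions."""
    grouped = {}
    for entity, value, synonyms in rows:
        values = grouped.setdefault(entity, {})
        known = values.setdefault(value, [])
        for synonym in synonyms:
            if synonym not in known:  # skip duplicate synonyms
                known.append(synonym)
    return [
        {
            "entity": entity,
            "values": [
                {"type": "synonyms", "value": value, "synonyms": synonyms}
                for value, synonyms in values.items()
            ],
        }
        for entity, values in grouped.items()
    ]


# Hypothetical rows collected during annotation.
rows = [
    ("device", "modem", ["router", "gateway"]),
    ("device", "set-top box", ["cable box"]),
    ("device", "modem", ["router", "wifi box"]),
]

entities = rows_to_entities(rows)
print(json.dumps(entities, indent=2))
```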



Fallback to Knowledge Base Search for the
“Troubleshooting” Intent

[Diagram labels: Customer, Channel, Assistant, Skills, Resolution]
Dialog: explicit answers
Search: unify existing content
Agent: the ultimate fallback
Available Resources
1. Best Practices Guide - http://ibm.biz/wa-improve-best
2. Bootstrap – Intent recommendations for new intents (Beta) and user examples (GA)
3. Measure – Measure notebook
4. Analyze – Effectiveness notebook
5. Improve – Conflict resolution, Entity Expansion & Intent recommendations



Visit ibm.biz/AssistantTHINKSignUp to get started with Watson Assistant for free

Rate us on G2 Crowd for $10 donation to Girls Who Code & $10 gift certificate to Starbucks
Notices and disclaimers
© 2018 International Business Machines Corporation. No part of this document may be reproduced or transmitted in any form without written permission from IBM.

U.S. Government Users Restricted Rights – use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.

Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed “as is” without any warranty, either express or implied. In no event, shall IBM be liable for any damage arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity. IBM products and services are warranted per the terms and conditions of the agreements under which they are provided.

IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.

Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.

It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer follows any law.



Notices and disclaimers
continued
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products about this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a purpose.

IBM, the IBM logo, ibm.com and [names of other referenced IBM products and services used in the presentation] are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at: www.ibm.com/legal/copytrade.shtml.

The provision of the information contained herein is not intended to, and
does not, grant any right or license under any IBM patents, copyrights,
trademarks or other intellectual property right.



Thank you

Eric Wayne
STSM and Development Manager, IBM Watson

ewayne@us.ibm.com
ibm.com



