
How to Build, Deploy, and Operationalize AI Assistants
Key Takeaways

- AI assistants can be classified into five levels of conversational AI maturity, defined by their capabilities.
- Level 3 assistants, those that can handle naturally sounding, multi-turn interactions, are hard to build.
- Rasa is an open source conversational AI framework that allows developers to build level 3 AI assistants.
- Challenging natural language problems like entity disambiguation can be addressed using applied NLP research and Rasa.
- AI assistants should be monitored and iteratively improved using NLU and business metrics, and real user conversations.

Conversational AI has experienced renewed focus in recent years. In the past few years, we’ve seen language models achieve state-of-the-art results on language understanding benchmarks like the General Language Understanding Evaluation (GLUE) and SuperGLUE, and lend themselves to practical applications. Even so, conversational AI is far from being solved. However, we’re moving to an AI-first world, where people expect technology to be naturally conversational, thoughtfully contextual, and intelligent, and so most companies will have to consider adopting an AI assistant sooner or later.

In this article, I’ll first discuss the five levels of AI assistants using a standard
model for conversational AI maturity. Second, I’ll summarize my own recent
experience building a level 3 AI assistant. Finally, I’ll outline various custom tools
I built to continuously iterate upon, improve, and monitor the AI assistant in
production.
The Five Levels of AI Assistants

Most AI assistants today can handle simple questions, and they often reply with prebuilt responses based on rule-based conversation processing. For instance, if a user says X, respond with Y; if a user says Z, call a REST API; and so forth. However, for AI assistants to provide value to business functions like customer service, supply chain management, and healthcare workflow processes, we need to move beyond the limitations of rule-based assistants toward a more standard maturity model for conversational AI. In this article, we’ll talk about how to model and deploy a contextual assistant and discuss real-life examples of contextual assistants in production.

There are five different levels of conversational AI maturity, defined by their capabilities. These defined levels allow us to measure the AI assistant’s progress to see where we are, and where we’d like to go in order to achieve or align with business outcomes.

Level 1

At Level 1, the bot is a traditional notification assistant. It can send you notifications about events or reminders about things in which you’ve explicitly expressed interest. In other words, the assistant sends out preprogrammed notifications or responds to events that are triggered by users. In this case, a help desk assistant might send you a notification about the status change of your help desk ticket.

Level 2
At Level 2, the assistant can answer FAQs and engage in simple dialogues. The
dialogues are pre-built, and the assistant relies heavily on intents, entities, and
rules. In this case, the assistant may answer some FAQs but will get perplexed
should the user engage in interjections or unexpected utterances.

Most assistants today are at level 2; they’re built using rule-based dialogues or state machines. In this setup, the developer uses a combination of intents, entities, and if/else conditions to build dialogues. Observe the code snippet below: the assistant has to rely on conditional statements to gather information and respond to the user.
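A minimal sketch of this pattern, with hypothetical intents and canned responses:

```python
def handle_message(intent: str, slots: dict) -> str:
    """Toy rule-based handler: every conversation path must be scripted by hand."""
    if intent == "greet":
        return "Hi! How can I help?"
    if intent == "reset_password":
        if "employee_id" not in slots:
            return "Sure, what's your employee ID?"
        return f"A reset link is on its way for {slots['employee_id']}."
    # Anything unscripted, including off-topic interjections, falls through here.
    return "Sorry, I didn't understand that."
```

Every new conversation path means another branch, which is why this approach scales poorly.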

Observe the resulting conversation below. When the user asks an off-topic question in the middle of a dialogue, the assistant gets confused and is unable to respond in a relevant manner. This is because the dialogue is built using if/else statements and cannot recognize this new and unexpected conversation path.
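A hypothetical exchange of the kind described (illustrative, not an actual transcript):

```
User: I need to reset my password.
Bot:  Sure, what's your employee ID?
User: Wait, is the VPN down right now?
Bot:  Sorry, that doesn't look like a valid employee ID.
```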

Level 3

Level 3 assistants are typically able to engage in flexible back-and-forth dialogue. In addition, assistants at this stage are capable of handling user corrections, interjections, chitchat, and sub-dialogues. This is the type of contextual assistant most organizations are attempting to build today.

Levels 4 & 5

At Level 4, the assistant is able to remember your preferences and offer a personalized experience. At Level 5 and above, assistants would be able to monitor and manage a host of other assistants and effectively run certain aspects of enterprise operations. Level 4 and 5 assistants do not exist today.
Case Study of Building an AI Help Desk Assistant

I’ve spent a few years building AI assistants and leading teams that shipped
contextual assistants to production. Building contextual assistants is hard.
Building contextual assistants that actually work, and drive measurable results, is
harder still.

One of the contextual assistants I built last year for an enterprise company was an
employee-facing help desk assistant. The goal was to automate a portion of help
desk tickets in order to reduce costs. An AI assistant that answers questions is useful; however, an assistant that also executes tasks on behalf of users and nudges them to make informed decisions drives even more value.

The help desk assistant we were building had to answer questions about routine
technical issues, assist with issue resolution, and perform task execution and
follow-ups on behalf of users, none of which reflected predictable conversation
paths. Human language is messy and unpredictable; building state machines, or
rule-based processing, that attempt to script out possible conversation paths can
be incredibly difficult to scale and maintain.

Therefore, we had to use machine learning-powered dialogue management or risk maintaining a large system with thousands of lines of code. Machine learning-powered dialogue management enables AI assistants to train on real user conversations, learn patterns and context, and predict appropriate and sensible responses to queries.

We started to build and iterate upon a level 3 assistant using Rasa, an open-source
platform that provides ML tools to build and deploy contextual assistants.
The following diagram is a high-level overview of the technology stack we used to
build, model, and deploy the level 3 help desk assistant. Services like Rasa Core,
Rasa NLU, and Rasa Actions were the foundation, or infrastructure layer, for the
help desk assistant. Rasa Core is a machine learning-based dialogue manager,
Rasa NLU is a customizable intent classification and entity extraction service, and
Rasa Actions is an integration point to call external services. We deployed
Duckling, an entity extraction service, and BERT, a language model used here for
named entity recognition, using Azure Kubernetes Service (AKS). Named entity
recognition is the task of extracting named entities from text. Named entities are
things like locations, organizations, or personal names.

It’s important to note that this enterprise company was an Azure customer, and
therefore the architecture was dictated by Azure services. The assistant was
integrated with Azure Active Directory, ServiceNow, and Microsoft Outlook  to
authenticate users, create incident tickets, pull user profiles, and perform other
tasks like meeting scheduling.
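As a rough sketch of how a Rasa custom action can tie into such a service (the action name, endpoint URL, and credentials below are illustrative placeholders, not the project’s actual code):

```python
from typing import Any, Dict, List, Text

import requests
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher


class ActionCreateTicket(Action):
    """Custom action that files a ServiceNow incident on the user's behalf."""

    def name(self) -> Text:
        return "action_create_ticket"

    def run(
        self,
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: Dict[Text, Any],
    ) -> List[Dict[Text, Any]]:
        description = tracker.latest_message.get("text", "")
        # Placeholder endpoint and credentials; real values come from configuration.
        response = requests.post(
            "https://example.service-now.com/api/now/table/incident",
            json={"short_description": description},
            auth=("svc_user", "svc_password"),
            timeout=10,
        )
        number = response.json().get("result", {}).get("number", "unknown")
        dispatcher.utter_message(text=f"I've created ticket {number} for you.")
        return []
```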

Rasa’s tracker store, which maintains conversation history and the current state of a user’s conversation, was backed by an Azure Cosmos database. The assistant was deployed on Slack and Microsoft Teams via a Chrome extension and other front-end channels.

The following diagram is a high-level representation of the help desk assistant’s DevOps process. A custom command-line bootstrapper was created and used to initialize and set up the assistant’s code base. Azure DevOps was used to version control the assistant’s models and source code, and Azure Pipelines was used to build and deploy the assistant to the various Kubernetes environments.

Note that Rasa now ships with an out-of-the-box bootstrapper, the rasa init command, to initialize and set up a contextual assistant.

Challenges with Entity Disambiguation


As the user base grew, so did the help desk assistant’s skills and the content it
could potentially handle. It’s important to add that a lot of the captured data was
noisy. We started noticing issues with entity disambiguation. For instance, if a
friend were to say to you, “I’m on my cell,” you’d know that they most likely
meant that they were on their cell phone and that the word “cell” may not have
been in reference to a biological cell. Similarly, consider this example that
confused the help desk assistant:

“I want to schedule a meeting with my team”

or something slightly different,

“I want to schedule a meeting using teams”

Remember that this company is a Microsoft customer, and Microsoft Teams was one of the communication tools that employees used. When a user asked to schedule a meeting with their team, they wanted the assistant to set up a meeting with a custom team or Active Directory group they had created; when the user asked to schedule a meeting using Teams, they wanted the assistant to schedule a Microsoft Teams meeting.
The assistant would confuse the entities in question, “team” and “teams”, and provide an irrelevant or incorrect response. The two sentences, and the words themselves, may seem similar but mean quite different things in the context of this organization. There were several similar occurrences where the help desk assistant would get confused and need additional training.

In April 2019, the team integrated the assistant with BERT (Bidirectional Encoder Representations from Transformers). BERT had achieved state-of-the-art results on word sense disambiguation and other downstream NLP tasks, owing to its ability to pre-train bidirectional, contextual language representations on a large text corpus. It was therefore better equipped to resolve issues with entity disambiguation.

However, BERT presented its own challenges in that it was somewhat slow. A temporary solution was to create a wrapper around the BERT service that loaded the model into memory once, thereby speeding up request processing. Integration with BERT solved some of the issues with entity disambiguation.
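A minimal sketch of that load-once pattern using the Hugging Face transformers library (the model name is an assumption; the article doesn’t say which BERT variant was used):

```python
from transformers import pipeline

# Load the model into memory once at process startup, not once per request.
# "dslim/bert-base-NER" is an illustrative BERT model fine-tuned for NER.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

def extract_entities(text: str) -> list:
    """Serve a request against the already-loaded model."""
    return ner(text)

print(extract_entities("I want to schedule a meeting using Teams"))
```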

We shipped the conversational assistant to production and continued to collect user conversation data. This data was used to train the AI models and make continuous improvements to the assistant.

Make Continuous and Iterative Improvements


The AI assistant’s results were promising; it reduced calls and tickets and
automated large portions of the help desk team’s processes so they no longer had
to focus on repetitive and mundane tasks. Overall, it had compelling ROI that we
measured using a set of metrics we published at the beginning of the project, and
proactively maintained.
We created a custom testing and analytics tool to automate testing, collect and visualize conversations, and measure important metrics. This tool was
critical in implementing a continuous learning cycle, where real user
conversations were collected, annotated, and used as training data for the
assistant to learn from. Real user conversations provide valuable insight into user
behavior and test the limits of your AI assistant. Therefore, it’s critical to augment
your data set with real user conversations. In addition, the custom testing and
analytics tool looked at mishandled and unhandled user requests and sent them
to the design team for review. The tool also measured fallback, the number of times the assistant defaulted to a generic fallback response, and success, the number of times the assistant responded with a correct answer or successfully resolved an issue. Other metrics included user retention rate, which tracks how many users came back to talk to the assistant, and sentiment, which identifies whether interactions were positive or negative.
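As a sketch of the kind of measurement involved (the event schema here is hypothetical; action_default_fallback is the name of Rasa’s default fallback action):

```python
def fallback_rate(conversations):
    """Share of bot turns that ended in the generic fallback action.

    `conversations` is a list of conversations, each a list of bot-turn
    events with an "action" field (a simplified, hypothetical schema).
    """
    turns = [turn for conv in conversations for turn in conv]
    if not turns:
        return 0.0
    fallbacks = sum(t["action"] == "action_default_fallback" for t in turns)
    return fallbacks / len(turns)


logs = [[{"action": "utter_greet"}, {"action": "action_default_fallback"}]]
print(fallback_rate(logs))  # 0.5
```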

We benchmarked these metrics; set weekly, monthly, and quarterly goals; and
tracked the assistant’s progress and made improvements accordingly.
Issues with Multi-Turn Dialogues

Some of the issues with non-linear conversations, where the user introduces a
new topic in the middle of the conversation or modifies a previous statement,
remained. These types of multi-turn conversations are particularly challenging,
and they also happen to be the way that most users actually talk. In an effort to
resolve some of these issues, the team experimented with Rasa’s TED
(Transformer Embedding Dialogue) policy. Using a transformer architecture, the
TED policy can selectively pick which conversation turns to pay attention to, and
which conversation turns to ignore.

Additionally, and perhaps distinctively in comparison to recurrent neural network architectures, transformers use a self-attention mechanism by which they’re able to choose which elements in a conversation to pay attention to in order to make an accurate prediction. In other words, transformers are uniquely equipped to handle non-linear conversations, where a user might change topics or engage in chitchat in the middle of a conversation, because they’re less likely to become perplexed when a user does something unexpected.
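A toy illustration of the self-attention idea (not Rasa’s actual implementation; the random vectors stand in for embeddings of conversation turns):

```python
import numpy as np

def self_attention(turns: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over turn embeddings (turns x dim).

    The softmax weights determine how strongly each turn attends to every
    other turn, which is what lets the model down-weight irrelevant turns.
    """
    d = turns.shape[-1]
    scores = turns @ turns.T / np.sqrt(d)            # pairwise relevance
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over turns
    return weights @ turns                           # re-mixed representations

conversation = np.random.randn(5, 8)                 # 5 turns, 8-dim embeddings
print(self_attention(conversation).shape)            # (5, 8)
```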

In addition, the TED policy provides hyperparameters, such as max_history and the number of training epochs, that can be used to fine-tune the model. It’s been said more than once that hyperparameter tuning is sometimes more an art than a science, because it relies intensely on experimental results rather than pure theory; one has to keep trying different combinations and evaluate each model’s performance to find the best-suited one.

Continuous monitoring of real user conversations and subsequent fixes, integration with BERT, utilization of the TED policy, and additional tooling to support conversational AI workflows helped deliver a level 3 contextual assistant.

Next Steps
We’ve seen a shift towards open-domain systems. That refers to assistants that
are, in theory, unencumbered by a particular domain or topic, and are capable of
talking about anything. This makes sense because we have massive amounts of
data, and we have systems that are good at collecting and aggregating data. We
also have the technological capabilities to tell a compelling story with this data,
not merely chase the next benchmark or accuracy score.

While it is true that the field of natural language processing (NLP) has seen many recent advancements, today’s contextual assistants still have a long way to go because they don’t truly understand language or its relationship with the world. Statistical mimicry of language is not the same as language understanding.

Therefore, it’s useful to have a healthy amount of skepticism when evaluating language models and conversational AI frameworks. It’s equally important to note that building a contextual assistant requires machine learning, real user conversations, sound software engineering principles, best standards and practices around continuous integration and continuous deployment (CI/CD), and tooling that supports these workflows.

All of that is to say that we’re at an exciting time for conversational AI to be the
next computational platform of choice for companies and enterprises to improve
products, offer personalized and curated customer service, and see real results.
Conversational AI hasn’t been solved yet, but what’s promising is the pace of
innovation and the level of discourse in this field.

About the Author


Mady Mantha is a Senior Technical Evangelist at Rasa. Mady studied
Computer Science, Physics, and International Politics at Georgetown
University. She has years of experience building ML-driven products
and services for think tanks, enterprises, and startups. Mady is a
space enthusiast.
