
White paper

Cisco partner confidential

Conversational Interactive Voice Response (IVR): Cisco Customer Voice Portal with
Google Cloud AI Services
By Frank Kicenko
Cisco Technical Marketing Engineer

Introduction
The contact center is the hub of customer communication. Cisco® partners and customers
have long used Cisco contact center solutions in conjunction with other technologies to
improve their customer care experiences. The proliferation and extension of cloud media
services such as Speech-To-Text (STT) and Text-To-Speech (TTS), along with natural
language processing and other artificial intelligence software, enable more capabilities
that customers are eager to apply. The promise of Artificial Intelligence and Machine
Learning (AI/ML) is that it will reduce costs by automating simpler transactions and improve
customer service by increasing first-contact resolution through better self-service and
support for agents.
Cisco is developing a framework, labeled the Journey Intelligence Hub, that enables customers
to use their existing Cisco contact center platforms (Contact Center Enterprise or Express)
with AI/ML capabilities. The Journey Intelligence Hub applies data-driven intelligence and
learning throughout the customer journey. It encompasses services provided by Google, a
leading AI/ML provider.
As we evolve the framework, we encourage Cisco customers and partners to use the
existing capabilities of Cisco’s contact center platforms to leverage AI and ML now. The
benefits of extending Cisco’s enterprise-class Unified Customer Voice Portal (CVP) with
Dialogflow, a conversational user interface from Google, are to:
• Enable conversational self-service experiences for customers.
• Establish a common dialog engine for text chat and voice interaction.
• Leverage the existing CVP infrastructure.

© 2019 Cisco and/or its affiliates. All rights reserved.



The remainder of this white paper provides implementation instructions for this
integration and links to supplementary materials, including sample code.

Integration requirements
As with any speech application in CVP, we need the following items to build a solution:
• Cisco Unified Customer Voice Portal (CVP) with Cisco Virtualized Voice Browser (VVB)
version 11.6 or later.
• A Media Resource Control Protocol (MRCP) version 2-based speech solution. Most cloud
speech vendors don’t directly support MRCP, so we need a component to do that
translation. In this case, we can use the following components to meet this need:
-- UniMRCP [1]: translates from MRCPv2 to Google Cloud Speech
-- Google Cloud Speech-to-Text [2]
-- Google Cloud Text-to-Speech [3]
• A framework to drive the dialog. Traditionally, this could be provided by the CVP
studio. However, for this integration we will use Google Dialogflow [4] to leverage its
AI-powered conversational functionality.
These components should be acquired by the customer or partner developing the integration.

Integration overview
The conversational IVR use case described in this white paper extends the traditional CVP
IVR scenario by plugging in cloud services. Google Cloud Speech services are used in place
of on-premises Automatic Speech Recognition (ASR) and TTS software. Google Dialogflow is
used in conjunction with CVP to function as the dialog manager. Dialogflow can also be used
to provide a chat-bot user interface.

A new grammar type within the Cisco Virtualized Voice Browser (VVB) server specifies the
dialog pass-through for the new services. VVB must be configured to use the MRCP and Google
ASR/TTS services, as shown in Figure 1.

Figure 1. Conversational IVR configuration
[Diagram not reproduced. Labels in the figure: PSTN/SIP; TDM/SIP/WSS; SBC; Call Termination
and Routing; Conversational/Traditional IVR; VXML; VXML/RTP; SIP/MRCPv2; Protocol Stack (SIP
Session Management, SDP Capability Exchange, MRCPv2 Control Message, RTP/RTCP Media
Streaming); Media Service Resources (TTS Speech Synthesizer, ASR Speech Recognizer, SVI
Speaker Verifier and Identifier, SR Speech Recorder); HTTPS/REST/JSON; Google Speech APIs;
Google Services; Call Routing; Agent Routing; SIP/RTP/SRTP; Agent To BOT Framework. The
diagram carries numbered callouts 1 through 15.]
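To make the dialog manager role more concrete, the following is a minimal sketch of what a text query to Dialogflow can look like. This paper does not specify how CVP calls Dialogflow, so the sketch only builds the URL and JSON body for the public Dialogflow v2 REST `detectIntent` method and reads the matched intent from a response. The helper names, the project ID, and the session ID are assumptions for illustration, and no network call or authentication is performed.

```python
import json
from urllib.parse import quote

# Base URL of the public Dialogflow v2 REST API.
DIALOGFLOW_BASE = "https://dialogflow.googleapis.com/v2"


def build_detect_intent_request(project_id, session_id, text, language_code="en-US"):
    """Return (url, body) for a text query against the v2 detectIntent method."""
    session = "projects/{}/agent/sessions/{}".format(
        quote(project_id, safe=""), quote(session_id, safe="")
    )
    url = "{}/{}:detectIntent".format(DIALOGFLOW_BASE, session)
    body = {"queryInput": {"text": {"text": text, "languageCode": language_code}}}
    return url, body


def summarize_query_result(response):
    """Pull the matched intent name and reply text out of a detectIntent response."""
    result = response.get("queryResult", {})
    return {
        "intent": result.get("intent", {}).get("displayName"),
        "reply": result.get("fulfillmentText", ""),
    }


if __name__ == "__main__":
    # "my-gcp-project" and "call-0001" are placeholder identifiers (assumptions).
    url, body = build_detect_intent_request("my-gcp-project", "call-0001", "I lost my card")
    print(url)
    print(json.dumps(body, indent=2))
```

Using one session ID per call is a design choice assumed here so that Dialogflow can keep conversational context across the turns of a single call.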

Call flow
Telephony functions are managed as usual by CVP. Incoming calls are
answered; then the voice dialog proceeds by fetching a Voice XML (VXML)
page from CVP. VVB controls the underlying speech services.

CVP and VVB integrate with Google Cloud services through several APIs.
VVB interacts with Google’s Speech API by sending the caller’s audio to
Google Cloud STT, which returns transcribed speech in text format. Google
Cloud TTS generates audio prompts, which are played to the caller.
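As a rough illustration of the speech exchange described above, the sketch below builds JSON request bodies for the public Google Cloud Speech-to-Text (`speech:recognize`) and Text-to-Speech (`text:synthesize`) v1 REST methods and reads their responses. In this integration the MRCP adapter, not the application, talks to Google, and it may use a different transport such as streaming; the helper names, the 8 kHz LINEAR16 telephony audio format, and the sample inputs are assumptions for illustration only.

```python
import base64


def build_stt_request(audio_bytes, language_code="en-US", sample_rate_hz=8000):
    """Body for POST https://speech.googleapis.com/v1/speech:recognize."""
    return {
        "config": {
            "encoding": "LINEAR16",          # assumed audio format for this sketch
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        # Inline audio is sent as base64 text in the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def build_tts_request(text, language_code="en-US", sample_rate_hz=8000):
    """Body for POST https://texttospeech.googleapis.com/v1/text:synthesize."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "LINEAR16", "sampleRateHertz": sample_rate_hz},
    }


def first_transcript(stt_response):
    """Return the top transcript from a recognize response, or "" if none."""
    for result in stt_response.get("results", []):
        for alternative in result.get("alternatives", []):
            return alternative.get("transcript", "")
    return ""


def decode_tts_audio(tts_response):
    """Return the synthesized audio bytes from a synthesize response."""
    return base64.b64decode(tts_response.get("audioContent", ""))
```

For example, `first_transcript` applied to a response whose first alternative carries the transcript "check my balance" returns that string, which is the text that VVB would then pass on to CVP.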

Standard VXML provides control of these speech resources through VVB (note that
currently an MRCP-Google adapter such as UniMRCP is required).

Dialog control is via the Google Dialogflow API. Following are the call flow steps:
1. A call arrives and is answered by CVP.
2. Voice dialog continues by fetching a VXML page from CVP, and VVB controls the
underlying speech services.
3. VVB creates an MRCPv2 session with the UniMRCP server.
4. The VVB sends the caller’s audio to the UniMRCP server.
5. UniMRCP forwards the caller’s audio to the Google Cloud Speech-to-Text service.
6. Google returns transcribed speech as text to UniMRCP.
7. UniMRCP sends the transcribed text over MRCP to VVB.
8. The VVB sends the transcribed text to the CVP server.
9. The CVP server sends the text to Google’s Dialogflow.
10. The next action in the dialog is determined by the Dialogflow programming
(provide response, prompt, etc.) and the intent is sent to the CVP server.
11. If an escalation intent is reached (automation is unable to complete), CVP
provides the information to Cisco Contact Center Enterprise or Express and
the call is routed using existing methods.
12. If a speech intent is provided, the CVP server instructs the VVB to play the
given text.
13. The VVB sends the text over MRCP to the UniMRCP server.
14. The UniMRCP server sends the text to Google’s Text-to-Speech service.
15. The Google Text-to-Speech service generates the audio, which is forwarded
through UniMRCP to the VVB and played to the caller.
16. Go to Step 4.

Disclaimer
The code and examples presented within this document are meant for experimental
and evaluation purposes. They are only samples and are not guaranteed to be bug-free
or production-quality.

The sample applications are meant to:
• Illustrate how to use the relevant APIs and SDKs.
• Serve as an example of the step-by-step process of building applications and
architectures to accommodate relevant APIs and SDKs.
• Act as a guide for a developer to see how to use the APIs and SDKs to customize
for your own use.
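The loop formed by steps 4 through 16 of the call flow can be sketched as a small simulation. This is not the sample code that accompanies this paper: the three callables stand in for the UniMRCP and Google Speech-to-Text path, the CVP and Dialogflow path, and the UniMRCP and Google Text-to-Speech path. The `Intent` class, the "speech" and "escalation" labels, and the fake components are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    kind: str       # "speech" or "escalation" (labels assumed for this sketch)
    text: str = ""  # prompt text to play back for a speech intent


def run_dialog(caller_audio_turns, recognize, detect_intent, synthesize):
    """Simulate steps 4-16 and return a log of (event, payload) tuples."""
    events = []
    for audio in caller_audio_turns:
        text = recognize(audio)              # steps 4-7: audio in, transcript out
        intent = detect_intent(text)         # steps 8-10: transcript in, intent out
        if intent.kind == "escalation":      # step 11: hand the call to an agent
            events.append(("route_to_agent", text))
            return events
        prompt = synthesize(intent.text)     # steps 12-15: text in, audio out
        events.append(("play", prompt))
        # step 16: loop back and wait for the caller's next utterance
    events.append(("caller_hung_up", ""))
    return events


# Stand-in components: the "audio" is just UTF-8 text so the sketch stays runnable.
def fake_recognize(audio):
    return audio.decode("utf-8")


def fake_detect_intent(text):
    if "agent" in text.lower():
        return Intent(kind="escalation")
    return Intent(kind="speech", text="You said: " + text)


def fake_synthesize(text):
    return "<audio:" + text + ">"


if __name__ == "__main__":
    turns = [b"check my balance", b"talk to an agent", b"never reached"]
    for event in run_dialog(turns, fake_recognize, fake_detect_intent, fake_synthesize):
        print(event)
```

Passing the three services in as callables keeps the loop independent of any particular SDK, so each stand-in can later be replaced with a real client without changing the control flow.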

Conclusion
Conversational IVR using Google Dialogflow can result in a better, more
natural end-customer experience. Cisco customers can leverage their
existing CVP-based infrastructure while managing the natural dialog
interaction in an ongoing way. Cisco will continue to work with Google to
provide our customers and partners with the tools they need for success.
As we continue this work, feel free to try the integration documented in this
white paper and let us know how it works for you.

Access a code sample for the documentation and a link to GitHub which
contains the code to successfully implement the solution described
in this paper.

© 2019 Cisco and/or its affiliates. All rights reserved. Cisco and the Cisco logo are trademarks or registered trademarks of Cisco
and/or its affiliates in the U.S. and other countries. To view a list of Cisco trademarks, go to this URL:
https://www.cisco.com/go/trademarks. Third-party trademarks mentioned are the property of their respective owners. The use of the word partner does not
imply a partnership relationship between Cisco and any other company. (1110R)  C11-741894-00  02/19
