VoiceXML Overview
What is VoiceXML? It is an XML language for writing Web pages that you
interact with by listening to spoken prompts and jingles, and control by means of
spoken input. VoiceXML brings the Web to telephones. If you want to get a
hands-on feeling for what this is like, there is an increasing number of voice
portals which you can phone into and try out for yourself. Several sites also
offer free hosting for VoiceXML. Some pointers to these sites can be found in
the FAQ on the overview page.
VoiceXML isn't HTML. HTML was designed for visual Web pages and lacks the
control over the user-application interaction that is needed for a speech-based
interface. With speech you can only hear one thing at a time (rather like reading
a newspaper through a 10x magnifying glass). VoiceXML has been carefully
designed to give authors full control over the spoken dialog between the user
and the application. The application and user take it in turns to speak: the
application prompts the user, and the user in turn responds.
Key Concepts
A session begins when the user starts to interact with a VoiceXML interpreter
and continues as VoiceXML documents are loaded and unloaded. The session
ends when requested by the user, VoiceXML document or interpreter context.
The platform defines the default session behavior, although this can be
overridden in part by VoiceXML.
Each dialog state has one or more grammars associated with it, which
describe the expected user input, either spoken input or touch-tone (DTMF) key
presses. In the simplest case, only the dialog's grammars are active in that
dialog. In more complex cases, other grammars can be active as well. The
possible sources of active grammars are:
• grammars defined within the dialog itself
• external grammars referenced by links
• grammars defined at the document level and marked as being globally
active
• grammars defined in the root application document and active throughout
the application
A subdialog is like a function call: it allows you to call out to a new dialog and
then return to the original dialog, retaining the local state information for that
dialog. Subdialogs can be used to handle confirmations and to create a library
of reusable dialogs for common tasks.
VoiceXML allows you to define named variables for holding data. These can be
defined at any level and their scope follows an inheritance model. You can test
the values of variables to determine what dialog state to transition to next.
Variable expressions can also be used to make prompts, grammars and other
elements conditional.
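As a rough sketch of how variables, scoping and conditional prompts fit together, consider the following document. The variable names (visits, vip) and the dialog names (greeting, expert, novice) are invented for illustration:
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-scope variable: visible in every dialog of this document -->
  <var name="visits" expr="0"/>
  <form id="greeting">
    <!-- Dialog-scope variable: visible only inside this form -->
    <var name="vip" expr="false"/>
    <block>
      <assign name="visits" expr="visits + 1"/>
      <!-- Conditional prompts: only the one whose cond is true is played -->
      <prompt cond="vip">Welcome back, valued customer.</prompt>
      <prompt cond="!vip">Welcome.</prompt>
      <!-- Test a variable to decide which dialog to go to next -->
      <if cond="visits &gt; 3">
        <goto next="#expert"/>
      <else/>
        <goto next="#novice"/>
      </if>
    </block>
  </form>
  <form id="expert">
    <block><prompt>Destination?</prompt></block>
  </form>
  <form id="novice">
    <block><prompt>Please tell me where you would like to travel to.</prompt></block>
  </form>
</vxml>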
Events are thrown when the user fails to respond to a prompt, or when the input
can't be understood. VoiceXML allows you to write handlers for catching events.
These follow an inheritance model, and events can be caught at a higher level if
there is no corresponding handler at the dialog level.
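The following sketch illustrates this inheritance under simple assumptions. The field has its own noinput handler, but no nomatch handler, so a nomatch event is caught by the handler defined at the document level. The form and field names are invented for illustration:
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-level handler: used by any dialog that has no nomatch handler of its own -->
  <nomatch>
    Sorry, I did not understand that.
    <reprompt/>
  </nomatch>
  <form id="continue">
    <field name="answer" type="boolean">
      <prompt>Do you want to continue?</prompt>
      <!-- Field-level handler: catches noinput before any higher-level handler is considered -->
      <noinput>
        Please say yes or no.
        <reprompt/>
      </noinput>
    </field>
    <block>
      <prompt>Thank you.</prompt>
    </block>
  </form>
</vxml>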
VoiceXML allows you to use scripting (ECMAScript) when you need additional
control over the application. VoiceXML employs a form-filling metaphor. You
can define a complex grammar for collecting the values of several fields in a
single response. Any unfilled fields can be handled by special subdialogs
defined inline within each dialog.
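As an illustration of the scripting support, the sketch below defines an ECMAScript function in a script element and calls it from a condition. The function name isSmallGroup and the limit of 12 are assumptions made for this example:
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <script>
    <![CDATA[
      // Hypothetical helper: true when the group size is 12 or fewer
      function isSmallGroup(n) {
        return Number(n) <= 12;
      }
    ]]>
  </script>
  <form id="booking">
    <field name="travellers" type="number">
      <prompt>How many people are travelling?</prompt>
      <filled>
        <!-- The script function is called like any other ECMAScript expression -->
        <if cond="isSmallGroup(travellers)">
          <prompt>Thank you.</prompt>
        <else/>
          <prompt>Sorry, we only handle groups of up to 12 people.</prompt>
          <clear namelist="travellers"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>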
VoiceXML Examples
Here is a very simple VoiceXML application. It says "Welcome to Travel
Planner!", plays a short audio advertising jingle and then exits:
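A minimal sketch of such an application might look as follows. The audio file name jingle.wav is a placeholder chosen for this example:
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        Welcome to Travel Planner!
        <!-- jingle.wav is a hypothetical file name -->
        <audio src="jingle.wav"/>
      </prompt>
      <!-- End the session once the prompt has been queued -->
      <exit/>
    </block>
  </form>
</vxml>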
The following example offers a menu of three choices: sports, weather or news.
<?xml version="1.0"?>
<vxml version="2.0">
  <menu>
    <prompt>
      Say one of: <enumerate/>
    </prompt>
    <choice next="http://www.sports.example/start.vxml">
      Sports
    </choice>
    <choice next="http://www.weather.example/intro.vxml">
      Weather
    </choice>
    <choice next="http://www.news.example/news.vxml">
      News
    </choice>
    <noinput>Please say one of <enumerate/></noinput>
  </menu>
</vxml>
This results in a dialog along the following lines:
Computer: Say one of: Sports; Weather; News.
Human: Sports
Computer: (proceeds to http://www.sports.example/start.vxml)
Here is another example, this time using a form to ask the user to choose a city
and the number of travellers. Once this information has been collected, it is
submitted to a web server:
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <field name="city">
      <prompt>Where do you want to travel to?</prompt>
      <option>Edinburgh</option>
      <option>New York</option>
      <option>London</option>
      <option>Paris</option>
      <option>Stockholm</option>
    </field>
    <!-- Second field named in the submit namelist below -->
    <field name="travellers" type="number">
      <prompt>How many are travelling to <value expr="city"/>?</prompt>
    </field>
    <block>
      <submit next="http://localhost/handler" namelist="city travellers"/>
    </block>
  </form>
</vxml>
VoiceXML allows you to give progressively more detailed prompts when the
user is having difficulty answering. This relies on a counter that increments each
time the user is prompted for the field. The following example shows how this
works for a field that collects the number of people travelling. The user is
initially asked: "How many are travelling to Boston?". If this doesn't get a
satisfactory answer, the user is then asked: "Please tell me the number of
people travelling". The nomatch element allows you to provide a reminder if the
user said something other than a number:
<field name="travellers" type="number">
  <prompt count="1">
    How many are travelling to <value expr="city"/>?
  </prompt>
  <prompt count="2">
    Please tell me the number of people travelling.
  </prompt>
  <prompt count="3">
    To book a flight, you must tell me the number
    of people travelling to <value expr="city"/>.
  </prompt>
  <nomatch>
    <prompt>Please say just a number.</prompt>
    <reprompt/>
  </nomatch>
</field>
Here is an example that checks the value of a field after it has been collected.
This is used to issue a warning when the number of travellers in the group is
greater than twelve:
<field name="travellers" type="number">
  ...
  <filled>
    <!-- Adding 0 converts the recognized value to a number -->
    <var name="num_travellers" expr="travellers + 0"/>
    <if cond="num_travellers &gt; 12">
      <prompt>
        Sorry, we only handle groups of up to 12 people.
      </prompt>
      <clear namelist="travellers"/>
    </if>
  </filled>
</field>
VoiceXML allows you to define subdialogs that can be used for common tasks.
Subdialogs are analogous to subroutines in programming languages. Here is an
example of a confirmation subdialog that asks the user whether to accept an
earlier input:
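<form id="ynconfirm">
  <var name="user_input"/>
  <!-- The field name "yn" follows from the cond below; the prompt wording is illustrative -->
  <field name="yn" type="boolean">
    <prompt>Did you say <value expr="user_input"/>?</prompt>
    <filled>
      <var name="result" expr="'false'"/>
      <if cond="yn">
        <assign name="result" expr="'true'"/>
      </if>
      <return namelist="result"/>
    </filled>
  </field>
</form>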
If the speech recognizer indicates that it wasn't quite sure of what the user said,
VoiceXML allows you to tailor the dialog appropriately. In the following example,
the user is asked for a confirmation if the confidence score for the city name is
less than 0.7, but if it is less than 0.3, the user is asked to say the city name
again:
<field name="city">
  <prompt>Which city?</prompt>
  ...
  <filled>
    <if cond="city$.confidence &lt; 0.3">
      <prompt>Sorry, I didn't get that</prompt>
      <clear namelist="city"/>
    <elseif cond="city$.confidence &lt; 0.7"/>
      <assign name="utterance" expr="city$.utterance"/>
      <goto nextitem="confirmcity"/>
    </if>
  </filled>
</field>
If the confidence is less than 0.3, the user is told "Sorry, I didn't get that"
and is then reprompted for the city name. If the confidence is at least 0.3 but
less than 0.7, the generic confirmation subdialog is invoked. The subdialog
element acts like a subroutine call. The param element is used to pass data to
the subdialog.
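The invocation itself might be sketched as follows. The names confirmcity, ynconfirm, user_input, utterance and result come from the examples above; the cond="false" guard and the handling of the result are assumptions made for this sketch:
<!-- Placed in the same form as the "city" field, which is assumed to declare
     <var name="utterance"/>. cond="false" is an assumption: it is meant to keep
     the form from visiting this item unless <goto nextitem="confirmcity"/>
     selects it explicitly. -->
<subdialog name="confirmcity" src="#ynconfirm" cond="false">
  <!-- Pass what the recognizer heard into the subdialog's user_input variable -->
  <param name="user_input" expr="utterance"/>
  <filled>
    <!-- ynconfirm returns result as the string 'true' or 'false' -->
    <if cond="confirmcity.result == 'false'">
      <!-- Clearing both items causes the city to be asked for again -->
      <clear namelist="city confirmcity"/>
    </if>
  </filled>
</subdialog>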
You can also use grammars in separate files. The following example makes use
of grammars in "trade.xml":
<form id="trader">
  <field name="company">
    <prompt>Which company do you want to trade?</prompt>
    <grammar src="trade.xml#company" type="application/grammar+xml"/>
  </field>
  <field name="action">
    <prompt>
      Do you want to buy or sell shares in
      <value expr="company"/>?
    </prompt>
    <grammar src="trade.xml#action" type="application/grammar+xml"/>
  </field>
</form>
You can use the import element to import grammar rules so that you can refer
to them in locally defined grammars. In the following it is assumed that
"politeness.xml" defines rules named "startPolite" (e.g. 'please') and "endPolite"
(e.g. 'thank you'):
<grammar xml:lang="en">
  <import uri="http://please.com/politeness.xml" name="polite"/>
  ...
</grammar>
In the following example for a stock trading application, the user can respond
with a short phrase such as "buy ericsson" that sets both the company and the
trade (buy or sell). The grammar for this is defined in the file "trade.xml". If the
user fails to respond adequately, the application tries a simpler approach,
prompting first for the company and then for the trade. The field elements are
skipped if the corresponding field value has already been filled.
<form id="trader">
  <initial name="start">
    <prompt>What trade do you want to make?</prompt>
    <nomatch count="1">
      <prompt>Please say something like ‘buy ericsson’</prompt>
      <reprompt/>
    </nomatch>
    <nomatch count="2">
      Sorry, I didn't understand your request. Let's try something
      simpler.
      <!-- Setting the initial item's variable stops it from being visited again -->
      <assign name="start" expr="true"/>
    </nomatch>
  </initial>
  ...
</form>
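Putting the pieces together, the whole form might be sketched as below. The rule name "trade" for the form-level grammar is hypothetical, and the sketch assumes that this grammar returns values named company and action so that a single utterance can fill both fields:
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="trader">
    <!-- Form-level grammar; the rule name "trade" is hypothetical -->
    <grammar src="trade.xml#trade"/>
    <initial name="start">
      <prompt>What trade do you want to make?</prompt>
      <!-- After a second nomatch, give up on the initial item and fall back to the fields -->
      <nomatch count="2">
        Let's try something simpler.
        <assign name="start" expr="true"/>
      </nomatch>
    </initial>
    <!-- Each field is skipped if the form-level grammar has already filled it -->
    <field name="company">
      <prompt>Which company do you want to trade?</prompt>
      <grammar src="trade.xml#company"/>
    </field>
    <field name="action">
      <prompt>Do you want to buy or sell shares in <value expr="company"/>?</prompt>
      <grammar src="trade.xml#action"/>
    </field>
    <!-- Runs once both fields have values, however they were filled -->
    <filled mode="all" namelist="company action">
      <prompt>
        You want to <value expr="action"/> shares in <value expr="company"/>.
      </prompt>
    </filled>
  </form>
</vxml>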
The application may give the user the chance to change to a different task by
speaking the appropriate command. The grammar for this can be specified at
the document level or in the application root document. Here is an example of a
document level command menu:
<?xml version="1.0"?>
<vxml version="2.0">
  ...
  <form id="trader">
    ...
  </form>
  ...
</vxml>
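As a sketch of what such a document-level command might look like, the following uses a link element whose grammar is active in every dialog of the document. The command phrases and the mainmenu dialog are invented for illustration:
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en">
  <!-- Document-level link: its grammar is active in every dialog of this document -->
  <link next="#mainmenu">
    <grammar mode="voice" version="1.0" root="command">
      <rule id="command" scope="public">
        <one-of>
          <item>main menu</item>
          <item>start over</item>
        </one-of>
      </rule>
    </grammar>
  </link>
  <form id="mainmenu">
    <block>
      <prompt>Main menu.</prompt>
      <goto next="#trader"/>
    </block>
  </form>
  <form id="trader">
    <!-- Saying "main menu" here leaves this form and jumps to #mainmenu -->
    <field name="company">
      <prompt>Which company do you want to trade?</prompt>
      <grammar src="trade.xml#company"/>
    </field>
  </form>
</vxml>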
To reference the application root document, you use the application
attribute on the vxml element:
<?xml version="1.0" encoding="ISO-8859-1"?>
<vxml version="2.0" xml:lang="en"
      application="http://buster/portal?sessionID=12d4rf65hg4">
  ...
</vxml>
<catch event="noinput">
  Sorry, I didn't hear anything.
</catch>