VXMLRef 007-02542-0025 R4.21 v01

CONVEDIA MEDIA SERVER
VOICEXML
INTERFACE REFERENCE GUIDE
RELEASE 4.21
007-02542-0025 August 2010
Release History
Part Number: 007-02542-0025
Date: August 2010
Description: Version 01. Released with R4.21.0.
Proprietary Information
Copyright 2001-2010 RadiSys Corporation. All rights reserved.
RadiSys and Convedia are registered trademarks of RadiSys Corporation. CMS-3000, CMS-6000, CMS-9000, eXMP, and eXtended Media Processing are trademarks of RadiSys Corporation. Red Hat and Red Hat Linux are registered trademarks of Red Hat, Inc. Linux is a registered trademark of Linus Torvalds. All other trademarks, registered trademarks, service marks, and trade names are the property of their respective owners.
No part of this publication may be reproduced, modified, transmitted, transcribed, stored in any retrieval system, or translated into any language in any form, in whole or in part, by any means without the express prior written permission of RadiSys Corporation.
RadiSys Corporation reserves the right to make changes to software, hardware, and documentation without notice. For the most recent version of documentation, visit the RadiSys web site at: www.radisys.com/service_support/convedia_support.cfm.
This product may include the third-party software detailed in the installation manual for your media server.
Contact Information
RadiSys Corporation
4190 Still Creek Drive, Suite 300
Vancouver, BC V5C 6C6
Canada

RadiSys Technical Assistance Center (TAC)
Phone: +1-800-622-2235 (North America only, toll free)
Phone: +1-604-918-6415
E-mail: tac@radisys.com

To access support for Convedia Media Servers from the RadiSys web site, go to: www.radisys.com/service_support/convedia_support.cfm.
TABLE OF CONTENTS
List of Tables
List of Examples
List of Shadow Variables
Preface
  Intended Audience
  Guide Organization
  Document Conventions
  RadiSys Publications
  Technical Support
  What's New in Release 4.21
    New Features in R4.21.0
      New Features for SIP
      Behavior Changes
      Documentation Changes
      Release Limitations
Chapter 1: VoiceXML Overview

  Introduction
  VoiceXML Structure
    VoiceXML Documents
    Dialogs
    Forms
    Menus
    Mixed-Initiatives
    Elements
    Subdialogs
    Scope
  Protocol Support
    VoiceXML 2.0 Elements
    VoiceXML 2.1 Elements
    SRGS Elements
    SSML Elements
    General XML Handling
  SIP Transport of VoiceXML
    Request-URIs for the dialog Service Context
    Passing Variables to the VoiceXML Interpreter
      Standard Session Variables
      Application Session Variables
    Terminating VoiceXML Dialogs
    Sample VoiceXML Call Flow
  VoiceXML Interaction with HTTP Servers
    HTTP Server-Side Logic
    HTTP Cookies
      Set-Cookie Response Header
      Cookie Request Header
  ASR and TTS
  User Input
    DTMF
    Speech
  System Output
  Control Flow
  Session Termination
  Shadow Variables
  Events
  Errors
  ECMAScript Support
  Escape Characters
  Working with Media Files and TTS Strings
    Media Clip Support
    Clip Delineation in Prompts
    Referring to Media Files
    HTTP Queries
    Relative URIs
    Sets and Variables
Chapter 2: VoiceXML Properties

  Properties Overview
  Generic Speech Recognizer Properties
  Generic DTMF Recognizer Properties
    Timeout vs. Interdigit Timeout Properties
  Prompt Properties
  Fetching Properties
    fetchhint Properties
    maxage Properties
    maxstale Properties
    Other Fetch Properties
    Object Fetch Properties
  Fax Detection Property
Chapter 3: DTMF and Voice Grammars

  Overview
    DTMF Grammars
    Speech Grammars
  Input Mode
  Menu-Choice Grammars
  Option Grammars
  SRGS Grammars
    Inline SRGS Grammars
    External SRGS Grammars
  Arbitrary Grammars
  Built-In Grammars
    Boolean
    Currency
    Date
    Digits
    Number
    Phone
    Time
  Maximum Length of Grammars
  Grammar Evaluation
Chapter 4: VoiceXML 2.0 Elements

  <assign>
  <audio>
    Audio, Video, and Multimedia Clips
    Text to Speech Strings
    Audio Clip Errors
    Alternate Audio
    Audio Clip Name Length
    Encoding
  <block>
  <break>
  <catch>
  <choice>
  <clear>
  <controlcmd>
  <desc>
  <disconnect>
  <else>
  <elseif>
  <emphasis>
  <error>
Radisys Confidential
  <example>
  <exit>
  <field>
  <filled>
  <form>
  <goto>
  <grammar>
  <help>
  <if>
  <initial>
  <item>
  <link>
  <log>
  <mark>
  <menu>
  <meta>
  <metadata>
  <noinput>
  <nomatch>
  <one-of>
  <option>
  <p>
  <param>
  <phoneme>
  <prompt>
    Prompt Controls
    Barging and Prompts
  <promptcontrol>
  <property>
  <prosody>
  <record>
    Storage of Recorded Files
    Size of Streamed Files
    Encoding of Recordings
    Stopping Recordings with DTMF
    Setting a Pre-Speech Timer
    Trimming Post-Speech Silence
    Appending to a Recording
  <reprompt>
  <return>
  <rule>
  <ruleref>
  <s>
  <say-as>
  <script>
  <speak>
  <sub>
  <subdialog>
  <submit>
  <throw>
  <value>
  <var>
  <voice>
  <vxml>
Chapter 5: VoiceXML 2.1 Elements

  <data>
  <foreach>
Chapter 6: ECMAScript Language Binding for the DOM

  Attr Object
  CDATASection Object
  CharacterData Object
  Comment Object
  Document Object
  DOMException Prototype Object
  Element Object
  EntityReference Object
  NamedNodeMap Object
  Node Prototype Object
  NodeList Object
  ProcessingInstruction Object
  Text Object
Appendix A: Best Practices for VoiceXML Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199 Glossary of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
LIST OF TABLES
Table 1-1 VoiceXML 2.0 Supported Elements
Table 1-2 VoiceXML 2.1 Supported Elements
Table 1-3 SRGS Supported Elements
Table 1-4 SSML Supported Elements
Table 1-5 Event Support
Table 1-6 Error Support
Table 1-7 VoiceXML: Supported Media Clips
Table 1-8 VoiceXML: Supported Media Clips
Table 1-9 Referencing Named Media Files in VoiceXML
Table 1-10 Referencing Indexed Audio Files in VoiceXML
Table 2-1 Property Support Summary
Table 2-2 MRCP Speech Recognizer Properties Support
Table 2-3 General Speech Property Elements
Table 2-4 Generic DTMF Recognizer Property Support
Table 2-5 Prompt Property Support
Table 2-6 fetchhint Property Support
Table 2-7 maxage Property Support
Table 2-8 maxstale Property Support
Table 2-9 Support for Other Fetch Properties
Table 2-10 Support for Object Fetch Properties
Table 2-11 Fax Detection Property Support
Table 2-12 Interaction of bargein and Fax Tone Detection
Table 2-13 Interaction of dtmfterm and Fax Tone Detection
Table 3-1 Default Input Modes for VoiceXML Grammars
Table 3-2 Mechanisms for Setting Input Mode Scope
Table 3-3 Interaction of Input Mode and Grammar Mode
Table 3-4 Conversion of Built-In Speech Grammars to XML-SRGS Grammars
Table 4-1 Conversion of <field> type Attribute to <grammar>
Table 4-2 Prompt Completion Shadow Variables
Table 4-3 DTMF Collection Variables
Table 4-4 Effect of Barging Announcements on the Digit Buffer
Table 4-5 Recording Shadow Variables
Table 4-6 Supported Encoding Formats for Recordings
Table 4-7 Summary of append Behavior
LIST OF EXAMPLES
Example 1-1 Request-URI for dialog
Example 1-2 Request-URI for dialog with Query
Example 1-3 SIP INVITE with Query in URI
Example 1-4 SIP Dialog with Query in URI
Example 1-5 Example Server-Side Perl Script
Example 1-6 Relative URI
Example 1-7 Absolute URI
Example 3-1 Inline SRGS DTMF Grammar
Example 3-2 Inline SRGS Voice Grammar
Example 3-3 Boolean Built-In Grammar
Example 3-4 Currency Built-In DTMF Grammar
Example 3-5 Date Built-In DTMF Grammar
Example 3-6 Digits Built-In Grammar
Example 3-7 Number Built-In Grammar
Example 3-8 Phone Built-In Grammar
Example 3-9 Time Built-In Grammar
Example 3-10 Variable Maximum Digit Length in a DTMF Grammar
Example 4-1 Alternate Audio Examples
Example 4-2 <option> Grammar Example
Example 4-3 XML-SRGS Grammar
LIST OF SHADOW VARIABLES
application.cvd_lastprompt$.bargein
application.cvd_lastprompt$.duration
application.cvd_lastprompt$.lasturl
application.cvd_lastprompt$.lasturl_offset
application.cvd_lastresult$.faxtyp
application.cvd_lastresult$.termcond
application.lastresult$.confidence
application.lastresult$.inputmode
application.lastresult$.interpretation
application.lastresult$.utterance
name$.duration
name$.maxtime
name$.size
name$.termchar
PREFACE
This guide describes the Voice Extensible Markup Language (VoiceXML) interface to the Convedia Media Server. It provides a brief overview of VoiceXML, highlighting core concepts. It also documents Convedia Media Server compliance with the VoiceXML specification [13] and [14], describing extensions, deviations, and omissions from the specification.

The VoiceXML 2.0 language is defined by the W3C Recommendation specifying the language [13]. VoiceXML 2.1 is defined by [14]. For a full description of VoiceXML, the reader is referred to that Recommendation, which remains the normative implementation reference. Any features of VoiceXML specified in the Recommendation but not in this guide are not supported by the Convedia Media Server in this release. Any features of VoiceXML specified in this guide but not in the Recommendation are extensions to the specification.

This preface describes this guide, laying out its organization, the assumptions made about the reader, and the conventions used in the guide. It also explains how to get technical support, and describes the features that are new in this release. The following information is presented:

Intended Audience
Guide Organization
Document Conventions
RadiSys Publications
Technical Support
What's New in Release 4.21
Intended Audience
This guide is intended for applications developers and other technical personnel wanting to communicate with a Convedia Media Server from a control agent (that is, from a softswitch or an application server) using SIP and VoiceXML. Readers should be thoroughly conversant with application programming using Session Initiation Protocol (SIP).
Guide Organization
This guide is organized as follows:

Chapter 1: VoiceXML Overview
This chapter provides an overview of the core concepts of the Voice Extensible Markup Language (VoiceXML).

Chapter 2: VoiceXML Properties
This chapter describes the media server's support for VoiceXML properties.

Chapter 3: DTMF and Voice Grammars
This chapter describes the media server's support for DTMF and voice grammars in VoiceXML.

Chapter 4: VoiceXML 2.0 Elements
This chapter describes the VoiceXML 2.0 elements currently supported by the Convedia Media Server, including SRGS and SSML elements.

Chapter 5: VoiceXML 2.1 Elements
This chapter describes the VoiceXML 2.1 elements currently supported by the Convedia Media Server.

Chapter 6: ECMAScript Language Binding for the DOM
This chapter describes the ECMAScript binding for the subset of Level 2 of the DOM.

Appendix A: Best Practices for VoiceXML Development
This appendix describes some development practices that can help you maximize performance and capacity of your VoiceXML applications.

References

Glossary of Acronyms
Document Conventions
This guide uses the following advisory paragraphs:
Warning: Warnings alert you to situations that may pose a threat to personal safety.
Caution: Cautions alert you to situations that might cause harm to your system or damage to equipment, or that may affect service.
Note: Notes provide information you might need to avoid problems or configuration errors.
In addition to advisory paragraphs, the following typographic conventions are used in RadiSys guides:
Monospace
Monospace font is used in special example paragraphs to indicate code samples and console output.

<Monospace>
Angle brackets surrounding Monospace font are used to indicate elements in a markup language, such as VoiceXML and MSML.

boldface Monospace
Boldface Monospace font is used in examples where you must interact with the system. The text in boldface Monospace represents information you must enter.

boldface
Boldface font is used to indicate file names, commands, and any term in a formal language; for example, a signal or parameter in MGCP, an attribute in MSML, a property in VoiceXML, and other methods, classes, and headers.

italics
Italic font is used in command or element syntax, and inline, to indicate arguments and variables, that is, values that you must supply.

CAPS
Upper case is used to indicate protocol requests and messages, for example, a PUT request in HTTP, a SYN packet in TCP, or an INVITE or BYE message in SIP.

<key>
Angle brackets are used to indicate a key on your keyboard. Combinations of keys are joined by plus signs (+), for example <Ctrl>+<Alt>+<Del>.

[]
Square brackets enclose elements that are optional in a syntax.

{}
Curly brackets enclose a set of syntax elements where exactly one element must be chosen.

arg | arg
Vertical bars are used to separate elements that are strict alternatives (exclusive OR). When vertical bars are used, only one alternative can be chosen.

arg [arg...]
This typographic convention indicates a value that can optionally represent a space-separated list of the same kind of element (for example, a space-separated list of IP addresses).

arg[, arg...]
This typographic convention indicates a value that can optionally represent a comma-separated list of the same kind of element (for example, a comma-separated list of IP addresses).

arg[-arg...]
This typographic convention indicates a value that can optionally represent a hyphen-separated range of values (for example, a range of IP addresses).
RadiSys Publications
The following product documentation is available for RadiSys products. Download the correct version of the documents you need from the RadiSys web site at www.radisys.com.
Convedia Media Server System Description
Provides a high-level overview of RadiSys Convedia Media Servers.

IMMS 3G-324M-Integrated Media Server Solutions Guide
Provides an overview of the IMMS 3G-324M-Integrated Media Server and its place in the network.

CMS-9000 Media Server User Guide
Describes the CMS-9000 Media Server, and explains how to perform operations, administration, and management on the CMS-9000 Media Server using the web GUI.

CMS-9000 Media Server Hardware Installation Manual
Provides hardware installation and maintenance procedures for the CMS-9000, up to and including RS-232 console configuration.

CMS-6000 Media Server User Guide
Describes the CMS-6000 Media Server, and explains how to perform operations, administration, and management on the CMS-6000 Media Server using the web GUI.

CMS-6000 Media Server Hardware Installation Manual
Provides hardware installation and maintenance procedures for the CMS-6000, up to and including RS-232 console configuration.

CMS-3000 Media Server User Guide
Describes the CMS-3000 Media Server, and explains how to perform operations, administration, and management on the CMS-3000 Media Server using the web GUI.

CMS-3000 Media Server Hardware Installation Manual
Provides hardware installation and maintenance procedures for the CMS-3000, up to and including RS-232 console configuration.

Convedia Software Media Server User Guide
Describes the Convedia Software Media Server, and explains how to perform operations, administration, and management on the Convedia Software Media Server using the web GUI.

Convedia Software Media Server User Guide (Co-Resident Mode)
Describes the Convedia Software Media Server, and explains how to perform operations, administration, and management using the web GUI on the Convedia Software Media Server when the operational mode is configured to co-resident mode.

Convedia Software Media Server Installation Manual
Provides software installation and maintenance procedures for the Convedia Software Media Server, up to and including initial network configuration.

Convedia Media Server SIP Interface Reference Guide
Describes the media server's support for SIP, and how to use the SIP interface.

Convedia Media Server VoiceXML Interface Reference Guide
Describes the media server's support for VoiceXML 2.0 and 2.1, and how to use the VoiceXML interface.

Convedia Media Server MSML 1.1 Interface Reference Guide
Describes the media server's support for MSML 1.1, and how to use the MSML interface.

Convedia Media Server MGCP Interface Reference Guide
Describes the media server's support for MGCP, and how to use the MGCP interface.

Convedia Media Server H.248 Interface Reference Guide
Describes the media server's support for H.248/MEGACO, and how to use the H.248 interface.

Convedia Media Server SNMP Interface Reference Guide
Describes the media server's support for SNMP, and how to use the SNMP interface.

Convedia Media Server Sets and Variables Interface Reference Guide
Describes the media server's sets and variables feature, and support for each language.

Convedia Media Server Special Interfaces Reference Guide
Explains how to configure and use the media server to interoperate with external devices such as NFS servers, HTTP servers, speech servers, and video terminals.

Convedia Media Server Capacity and Performance Reference Guide
Provides general guidelines for expected performance and capacity for RadiSys Convedia media servers.
Technical Support
Technical support is available from the RadiSys Technical Assistance Center (TAC). Support is governed by the terms of your agreement with RadiSys Corporation.
TAC can be reached using the following contact information:

RadiSys Corporation
4190 Still Creek Drive, Suite 300
Vancouver, BC V5C 6C6
Canada

RadiSys Technical Assistance Center (TAC)
Phone: +1-800-622-2235 (North America only, toll free)
Phone: +1-604-918-6415
E-mail: tac@radisys.com

To access support for Convedia Media Servers from the RadiSys web site, go to: www.radisys.com/service_support/convedia_support.cfm.
What's New in Release 4.21

This section focuses on differences between this release of the product and the previous release. As each new point release is issued, starting with R4.21.0, this section will record the changes so that the history can be reviewed from R4.21.0 to R4.21.n. This section is included as a quick reference for users who are upgrading to this release from the previous release. Customers new to the product are advised to read the entire document.
New Features in R4.21.0

R4.21.0, the first field release of R4.21, is supported on the CMS-9000 and CMS-3000.
New Features for SIP

Media server integrates 3G-324M multimedia gateway

To provide real-time multimedia services to mobile phones over circuit-switched networks, the 3rd Generation Partnership Project (3GPP) adopted the 3G-324M protocol. It comprises several ITU-T standards: H.223 Multiplexing Protocol for Low Bit Rate Multimedia Communication, H.245 Control Protocol for Multimedia Communication, and H.324 Terminal for Low Bit-Rate Multimedia Communication.

This release integrates 3G-324M gateway functionality into the media server. Previously, to interface with a 3G-324M network the media server required a 3G-324M video gateway. The RadiSys 3G-324M-Integrated Media Server connects to the 3G-324M network through voice gateways that support Clearmode (a pseudo-codec defined in RFC 4040 for transparent transportation of 64 kbit/s channel data in RTP packets).

3G-324M calls are initiated by a call agent specifying Clearmode in a SIP INVITE. Clearmode call scenarios are supported for both call-agent offers and media-server offers. The resultant Clearmode port multiplexes and demultiplexes the 3G-324M RTP, receiving requests as H.245 control messages and streaming audio and video. (Data components in received 3G-324M streams are not processed.)

The media server supports H.223 transport Levels 0 - 2 (baseline H.223, Annex A, and Annex B) with Adaptation Layer AL1 for H.245 control and AL2 for media. To establish 3G-324M calls, the media server negotiates H.245 master/slave determination, terminal capabilities, opening audio and video channels, and H.223 multiplexing. For reliable communications the media server supports Numbered Simple Retransmission Protocol (NSRP) and Windowed NSRP (WNSRP), and uses Control Channel Segmentation and Reassembly Layer (CCSRL). Request Mode requests are not processed; Round-trip Delay (RtD) requests are accepted and an RtD response returned. The media server's support for 3G-324M includes Annex A, Annex C (levels 0, 1, and 2), and Annex K (MONA) Class II.

Sessions can be audio-only, video-only, or multimedia. Media server MSML and VoiceXML features are supported for 3G-324M sessions with the following exceptions: DTMF inband detection, generation (inband and out-of-band), long digits, and pass through; fax detection and functionality; video text overlay and continuous presence video conferencing; and MPC redundancy for ongoing sessions. The AMR narrow-band audio codec and H.263 video codec are supported for these sessions.

When enabled, the media server's MSML interface reports significant state changes in the 3G-324M session, such as the establishment of logical channels, to the control agent as events.

For an overview of the RadiSys 3G-324M-Integrated Media Server, please see the Integrated Mobile Media Server (IMMS) 3G-324M-Integrated Media Server Solutions Guide. Complete details of the media server's support for the 3G-324M protocol are given in the Convedia Media Server Special Interfaces Reference Guide. The User Guide for your media server describes how to configure the integrated 3G-324M gateway. Additional usage information is provided in the protocol guides.

3G-324M session statistics

This release introduces new statistics for 3G-324M sessions. For every statistics interval the media server reports the number of sessions created, the maximum number of concurrent sessions, and the number of successful and failed session setups. The media server's existing per-port statistics are supported for 3G-324M sessions (with the exception of those related to the jitter buffer). For more information about new statistics, please see the Convedia Media Server SNMP Interface Reference Guide and the User Guide for your platform.
Behavior Changes
There are no behavior changes in this release.
Documentation Changes
New Integrated Mobile Media Server (IMMS) Solutions Guide

This release introduces a new book, the Integrated Mobile Media Server (IMMS) 3G-324M-Integrated Media Server Solutions Guide, which provides an overview of the first IMMS product, a media server with integrated 3G-324M video gateway functionality.

New 3G-324M Gateway chapter in Convedia Media Server Special Interfaces Reference Guide

This release adds a new chapter, 3G-324M Gateway, to describe the media server's support for 3G-324M sessions.

Changes to the Convedia Media Server MSML 1.1 Interface Reference Guide

The Convedia Media Server MSML 1.1 Interface Reference Guide has been restructured to better reflect the organization in the MSML specification, RFC 5707.
Release Limitations
This release does NOT support the following media server features, available in the previous release (R4.20) of the CMS-9000 and CMS-3000 media servers:

RFC 4117 transcoding: Audio transcoding services as an RFC 4117 Transcoding Server (T), providing transcoding services between two SIP User Agents (UAs) through the use of Third Party Call Control (3pcc).
New hardware, TPC-I: A new Transcoding Processor Card (TPC-I) dedicated to providing RFC 4117 audio transcoding services on the CMS-9000.
EVRC codec: 3G2 C.S0014-0 Enhanced Variable Rate Codec (EVRC-A) codec for EVRC0 media type specified in RFC 3558.
Automatic noise reduction: Automatically activating and inactivating noise reduction based on a configured threshold.
MSML support for CRBT random ring: MSML <play> element's start and end attributes, used to select part of an announcement.
NLD reports change to noise type: Events for changes in the type of noise (background, impulsive, continuous-signal noise, or a low SNR) exceeding configured limits.
MSML configuration of per-port statistics: Configuring the per-port statistics through the MSML interface. Per-port statistics can be configured through the SIP interface.
T.38 fax data is replicated on G.711 ports: Replicating T.38 fax data when the call is negotiated as G.711 in the SIP group context.
Enhancements to SIP Custom Profile 2 for facsimile services: Enhancements to SDP for fax support and changes to case sensitivity.
3G2 file format: Multimedia, audio-only, and video-only announcements in the 3G2 file format as defined in 3GPP2 C.S0050.
SIP message serialization: SIP message serialization prevents out-of-order delivery of SIP messages.

R4.20 also introduced a number of new VQE (voice quality enhancement) statistics and improvements to the echo cancellation algorithms that are not implemented in this release.

Additionally, the following CMS-9000 behavior changes of R4.20 are not implemented in this release:

Default network topology change from Internal Control Subnet to External Control Subnet.
Binding the Apache HTTP daemon service to the management interface on the SCC.

For detailed descriptions of these features, please see the documentation for R4.20.
Chapter 1:
VOICEXML OVERVIEW
This chapter provides an overview of the core concepts of the Voice Extensible Markup Language (VoiceXML). This chapter presents the following information:

Introduction
VoiceXML Structure
Protocol Support
SIP Transport of VoiceXML
VoiceXML Interaction with HTTP Servers
ASR and TTS
User Input
System Output
Control Flow
Session Termination
Shadow Variables
Events
Errors
ECMAScript Support
Escape Characters
Working with Media Files and TTS Strings
RadiSys Confidential
Introduction
VoiceXML is an XML-based markup language for creating user dialogs or Interactive Voice Response (IVR) interactions. VoiceXML provides an extensive mechanism for developing simple or complex IVR applications. The ability to create modular applications from many reusable subcomponents enables VoiceXML developers to create complex IVR applications in a short period of time. The widespread adoption of VoiceXML, together with its inherent similarities to data-centric user dialogs, makes it a powerful language for IVR application development.

The media server supports a rich set of VoiceXML mechanisms for creating simple or elaborate IVR applications:

Playing of streamed audio files, stored inside the media server or on external NFS and HTTP servers
Inband and RFC 2833 DTMF detection, collection, and interpretation
Detection of user speech input
Support for built-in, SRGS, Menu-Choice, and Option grammars for both DTMF and speech
Support for playing Text to Speech media clips
Recording of audio and video to internal memory or external NFS and HTTP servers
Playback of user-recorded audio and video from internal memory or external NFS and HTTP servers
Support for VCR-like controls (skip forward, skip back, pause, resume, append)
CNG and CED fax detection and notification capabilities
Embedding of complex functions (ECMAScript/JavaScript)
Dialog control flow
The ability to transfer the caller to another destination, such as another telephone line or voice application

The basis of all VoiceXML dialogs consists of sending audio prompts to the user and collecting user input in the form of DTMF digits. An example application is a user dialing up a service center and being prompted to select from several spoken options by pressing the corresponding telephone key. Upon receiving the DTMF information, the VoiceXML application determines what action to take.
VoiceXML Structure
This section presents the following topics:

VoiceXML Documents
Dialogs
Forms
Mixed-Initiatives
Menus
Elements
Subdialogs
Scope
VoiceXML Documents
A VoiceXML application consists of one or more VoiceXML documents, or scripts. The IVR session with a user begins at the invocation of the first VoiceXML document associated with the application. This document is called the root document of the application. During the IVR session any number of additional documents (leaf documents) may be fetched and loaded, and then unloaded, until the user ends the IVR session according to the application dialog flow. During the IVR session the root document may reference, or call, other supporting VoiceXML documents, as in the illustration below. During a given session, any number of documents may be loaded and unloaded. While a subdocument is loaded, information from higher-level documents remains available to the session. Although applications always begin by loading the root document, they can terminate from any document, including subdocuments. This ends the user's IVR session. Alternatively, an external control agent (for example, a SIP User Agent) can forcibly terminate the IVR session at any point.
[Figure: Root Document referencing supporting documents D1 Document and D2 Document]
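The relationship between a root document and a supporting document can be sketched as follows. This is a minimal illustration that assumes standard VoiceXML 2.0 behavior for the application attribute of <vxml>; the file names root.vxml and d1.vxml, the variable callerType, and the prompt text are hypothetical and do not come from this guide. The inline prompt text also assumes that a TTS resource is available.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- root.vxml (hypothetical name): the first document invoked for the session -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-level variable declared in the root document -->
  <var name="callerType" expr="'guest'"/>
  <form id="start">
    <block>
      <prompt>Welcome.</prompt>
      <!-- Fetch and load a supporting document -->
      <goto next="d1.vxml"/>
    </block>
  </form>
</vxml>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- d1.vxml (hypothetical name): names root.vxml as its application root -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" application="root.vxml">
  <form id="d1">
    <block>
      <!-- Root-document variables are read through the application scope -->
      <if cond="application.callerType == 'guest'">
        <prompt>You are signed in as a guest.</prompt>
      </if>
      <!-- A session can terminate from any document -->
      <exit/>
    </block>
  </form>
</vxml>
```

Because d1.vxml names root.vxml in its application attribute, the variables declared in the root document stay available while d1.vxml is loaded.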
Dialogs
The foundation of the VoiceXML application is the dialog, which takes place between the application and the user. VoiceXML dialogs define interactions between a user and the network through an IVR session. Once the application has launched, the user interacts with it through VoiceXML dialogs and subdialogs. Dialogs are composed of VoiceXML elements. The dialog is a series of audio or video prompts to the user, streamed over RTP, and subsequent collection of user input in the form of DTMF key presses or speech inputs, which are detected and reported to the VoiceXML session. The control logic defined in the VoiceXML application (that is, the document or script) defines when media is played to the user and when user input is collected, to create a dynamic media-based user dialog or IVR session, similar in nature to a web or HTML-based data-centric user dialog session. The following types of dialogs can be created using VoiceXML:
System-directed. In system-directed dialogs, the system leads the user by asking questions and waiting for user input.
User-directed. In user-directed dialogs, the user input controls the dialog flow.
Mixed-initiative. In mixed-initiative dialogs, either the system or the user can direct dialog flow. These types of dialogs are more complex and are difficult to implement in DTMF-based systems. With speech-based grammars, this type of dialog is more practical to implement.

DTMF input is obtained from users through either forms or menus.
Forms
The form item is the primary mechanism of prompting the user. A form-based dialog plays an audio or multimedia prompt to the user. In response, the user presses some sequence of DTMF digits or responds with speech input, and the input is expected to match the field format or grammar. Collected input that matches the expected grammar is said to fill the field and is assigned to the field variable. The field variable can then be used as a standard variable (as in a standard programming language) within further logic and control flow. Additionally, the collected input contained in the variable can be submitted to an external application using the HTTP protocol.
Form items are of two kinds: user input items and form control items.
The media server supports the following elements as user input items:
<field>: This element allows the user to enter DTMF according to a pre-determined format or grammar.
<record>: This element records audio spoken by the user.
<subdialog>: This element moves the user to another location (a subdialog) in the application. When the subdialog is complete, control returns to the calling dialog.
The media server supports the following elements as form control items:
<block>: This element does not collect input, but rather defines a set of executable statements for prompting.
<initial>: This element defines the initial control for the form when using mixed-initiative dialogs (where either the system or the user can direct the dialog flow).
Menus
A menu-based dialog presents the user with a number of choices. The menu item is a simplified version of a form item, designed to present the user with a fixed set of choices. The choices are presented as a series of audio or multimedia prompts played to the user. In response, the user presses some sequence of DTMF digits or speaks, and the inputs are collected and interpreted by the application. For example, a simple menu item may ask the user to press DTMF digit 1 to hear a weather report, to press DTMF digit 2 to hear a sports report, and to press DTMF digit 3 to hear a traffic report. If the user input matches one of the choices, then the application transitions control to another location within the document, or to another document in the application, as specified for the given choice.
The basic structure of a menu is as in the following example:
Menu
    Play prompt to user requesting choice (1, 2 or 3)
    Wait for user input from user
    User Choice 1: action 1 (jump to location 1)
    User Choice 2: action 2 (jump to location 2)
    User Choice 3: action 3 (jump to location 3)
    No User Input: No-input action
End Menu
Mixed-Initiatives
A mixed initiative is a <form> element containing one or more <form>-level grammars, where both the user and the application can define the direction the dialog will take. A common mechanism for implementing this is to use an <initial> element that prompts the user for general information. The results of the user's input then direct the user to specific fields with specific prompts and possibly other grammars defined. This mechanism is most commonly used in voice-based applications.
Elements
A VoiceXML element invokes an action. For example, the <prompt> element defines the output to be played to a user. The scope of an element is from its opening tag to its closing tag, as in the following example:
<prompt>....</prompt>
An element can have child elements nested within its scope or can itself be a child nested within the scope of a parent element. Elements can have attributes whose values can be set. The media server's support for elements is summarized in the section "Protocol Support" on page 7. Each supported element is described in detail in Chapter 4: VoiceXML 2.0 Elements and Chapter 5: VoiceXML 2.1 Elements.
Subdialogs
A subdialog allows a user to enter into another dialog. Upon returning from the subdialog, the original dialog continues from the place where it left off. Parameters can be passed into the subdialog and the subdialog can return values to the calling control logic. The subdialog mechanism is much like a subroutine in a standard programming language.
Subdialogs are useful for creating and organizing commonly used dialog functions as libraries, which can be reused by many applications.
Scope
Whenever a supporting document is loaded by the VoiceXML interpreter, the root document is also loaded. This provides the interpreter with all the global information necessary to properly apply values to variables, links, and events. However, a value may be redefined within a different scope. The concept of scope applies to grammars, variables, links, and event handling. Scope determines the order of precedence for VoiceXML tags.
Scope allows developers to:
Control the global behavior of an application
Group logically related tasks into documents
Break down large applications into more manageable, faster-loading modules
VoiceXML has a number of scopes, listed here in order of decreasing scope and increasing precedence.
Session. Session variables are declared by the platform on which the voice application is deployed. Session variables apply to an entire user session. They are read-only, which means they cannot be modified within any VoiceXML document, either the root document or a supporting document.
Application. Application values are declared within the <vxml> tag of the root document. Values assigned at the application level are initialized when the root document is loaded, and apply as long as it remains loaded. These values are available to any element within the root document or any supporting document referenced by the root document.
Document. Document values are assigned within the <vxml> tag of a supporting document. Document values are initialized when the supporting document is loaded, and remain available as long as the document is loaded. Document values are available to any dialog within the document, but are not available across documents.
Dialog. Dialog values are declared within the <form> or <menu> tags in a document. They are available only to the elements within the dialog for which they are declared. For executable content, values are initialized when the content is executed and are released when execution terminates. For form/field items, values are initialized when the form item is collected.
Elements. Values declared for an element apply to any of its child elements.
Precedence of values increases as the scope becomes more local. That is, the session scope has the least precedence, and values within a dialog have the greatest precedence. In other words, global scoping behavior can be overridden by declaring parameters at a lower level; locally defined values always override values defined at a higher level. For example, the scope of variables from broadest to narrowest is as follows:
Session > Application > Document > Dialog > Anonymous
On the other hand, the precedence of variables from highest to lowest is as follows:
Anonymous > Dialog > Document > Application > Session
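The precedence order can be pictured as a lookup that tries the most local scope first. The following Python sketch is illustrative only: the resolve function and the sample variable names are hypothetical and are not part of the media server or of VoiceXML.

```python
# Scopes ordered from highest precedence (most local) to lowest (most global).
PRECEDENCE = ["anonymous", "dialog", "document", "application", "session"]


def resolve(name, scopes):
    """Return (scope, value) for the most local scope that defines name.

    scopes maps a scope name to a dict of the variables declared in it.
    """
    for scope in PRECEDENCE:
        variables = scopes.get(scope, {})
        if name in variables:
            return scope, variables[name]
    raise KeyError(name)


# Hypothetical values: "greeting" is declared at two levels.
scopes = {
    "application": {"greeting": "welcome.wav", "language": "en-US"},
    "dialog": {"greeting": "menu_intro.wav"},
}
```

With these sample values, a lookup of greeting resolves to the dialog-level declaration, while language falls through to the application level.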
Protocol Support
This section describes the media server's support for the following protocols:
VoiceXML 2.0 Elements
VoiceXML 2.1 Elements
SRGS Elements
SSML Elements
General XML Handling
Use of an unsupported VoiceXML element results in an error.unsupported event. Use of an unsupported SRGS element results in an error.badfetch or an error.grammar, depending on when it is encountered. An unsupported SSML element (and its content) is ignored in order to maximize compatibility with documents that include SSML elements as alternatives to prerecorded audio files. SSML elements are not supported within SRGS grammars.
VoiceXML 2.0 Elements

This release of the VoiceXML interface is based on the 2.0 specification of VoiceXML, as given in [13]. Table 1-1 shows which elements from that Recommendation are supported.
Table 1-1 VoiceXML 2.0 Supported Elements
Element | Description | Supported
<assign> | Assigns a value to a variable. | Yes
<audio> | Plays an audio clip or multimedia file or renders a text-to-speech clip. | Yes
<block> | Allows execution of code within a form. | Yes
<catch> | Handles (catches) events. | Yes
<choice> | Provides menu choices. | Yes
<clear> | Clears or resets form items (form fields). | Yes
<controlcmd> | RadiSys extension. Specifies the actions associated with DTMF key presses for prompt controls. | Yes
<disconnect> | Terminates the VoiceXML application, sending a SIP BYE. | Yes
<else> | Provides alternative logic for an <if> condition. | Yes
<elseif> | Provides alternative logic for an <if> condition. | Yes
<enumerate> | Not supported. | Rejected
<error> | Handles (catches) all error events. | Yes
<exit> | Terminates the VoiceXML application, while keeping the port open. | Yes
<field> | Collects user input. | Yes
<filled> | Defines the code to be executed when user input is complete. | Yes
<form> | Defines a dialog for collecting user input. | Yes
<goto> | Transfers control to another dialog, abandoning the current dialog. | Yes
<grammar> | Defines user input rules for DTMF or voice. | Yes
<help> | Handles (catches) help events. | Yes
<if> | Defines conditional logic. | Yes
<initial> | Provides the initial prompt in a form. | Yes
<link> | Specifies a destination URL when a grammar activates a match. | Yes
<log> | Generates messages for logging and troubleshooting. | Yes
<menu> | Provides a fixed set of menu selections. | Yes
<meta> | Defines page information. | Yes
<metadata> | Defines information about a document using a metadata schema. | Ignored
<noinput> | Handles (catches) a user input timeout event. | Yes
<nomatch> | Handles (catches) an invalid user input event. | Yes
<object> | Not supported. | Rejected
<option> | Provides a simple method for specifying grammars. | Yes
<param> | Defines a parameter to a subdialog. | Yes
<prompt> | Specifies media output to be played to a user. | Yes
<promptcontrol> | RadiSys extension. Specifies media controls for user prompt manipulation. | Yes
<property> | Sets the value of a property. | Yes
<record> | Records user audio, video, or multimedia to a file. | Yes
<reprompt> | Repeats a prompt for user input. | Yes
<return> | Returns from a subdialog to the calling dialog. | Yes
<script> | Executes ECMAScript (JavaScript) code. | Yes
<subdialog> | Invokes another dialog, from which control will eventually return. | Yes
<submit> | Submits application values and fetches a new document, transitioning to a new dialog. | Yes
<throw> | Generates an event to be handled by <catch>. | Yes
<transfer> | Not supported. | Rejected
<value> | Inserts the value of an expression into a log message or prompt. | Yes
<var> | Declares a variable and assigns it a value. | Yes
<vxml> | The root element for VoiceXML. Defines the set of actions that form a VoiceXML dialog. | Yes

VoiceXML 2.1 Elements

This release of the VoiceXML interface is based on the 2.1 specification of VoiceXML, as given in [14]. Table 1-2 shows which elements from that Recommendation are supported.
Table 1-2 VoiceXML 2.1 Supported Elements
Element | Description | Supported
<data> | Fetches XML data from a document server without transitioning to a new VoiceXML document. | Yes
<foreach> | Allows a VoiceXML application to iterate through an ECMAScript array, executing the content of each array item. | Yes (see note)
Note: The media server does not support and rejects the following child elements of <foreach> in this release: <aws> and <enumerate>. The media server ignores the following child elements of <foreach> in this release: <emphasis>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <value>, and <voice>.
SRGS Elements
The SRGS specification is given in [11]. Table 1-3 shows which elements from that Recommendation are supported. Note that, while the media server supports all SRGS elements for voice grammars, the actual support for voice is a function of the external speech server deployed. Whether the external server supports all the elements supported by the media server depends on the server deployed.
Note also that even if supported or ignored, if used illegally, an element will be rejected with an error. For example, an SRGS element that would be ignored if used correctly will be rejected with an error if enclosed directly within a VoiceXML element.
Table 1-3 SRGS Supported Elements
Element | Description | Supported
<example> | [SRGS] Provides an example phrase that matches the input specification. | DTMF: Ignored; Voice: Yes
<grammar> | Defines user input rules for DTMF or voice. | Yes
<item> | [SRGS] Defines valid user input, as part of a DTMF or voice grammar rule. | Yes
<lexicon> | [SRGS] Defines valid user input, as part of a DTMF or voice grammar rule. | DTMF: Ignored; Voice: Yes
<meta> | Defines page information. | DTMF: Ignored; Voice: Yes
<metadata> | [SRGS] Defines information about a document using a metadata schema. | DTMF: Ignored; Voice: Yes
<one-of> | [SRGS] Allows one selection from a list of alternatives. | Yes
<rule> | [SRGS] Defines a grammar rule for an inline DTMF or voice grammar. | Yes
<ruleref> | [SRGS] Allows another voice grammar rule to be included. | DTMF: Rejected; Voice: Yes
<token> | | DTMF: Rejected; Voice: Yes
SSML Elements
The SSML specification is given in [5]. Table 1-4 shows supported elements from that Working Draft plus supported VoiceXML extensions as per [13]. Please note that, for SSML elements, supported means that the media server passes the request to the external speech server. The behavior for the element depends on the behavior of the speech server and this can vary. That is, from the point of view of the media server, all SSML elements except <speak> may be included in a VoiceXML document; whether the external server supports them is an independent matter.
Note also that even if an element is supported or ignored, if used illegally, it is rejected with an error. For example, an SSML element that would be ignored if used correctly will be rejected with an error if used illegally within an SRGS grammar.
Table 1-4 SSML Supported Elements
Element | Description | Supported
<break> | Inserts a pause or silence into audio. | Yes
<desc> | [SSML] Provides a textual description of audio content. | Yes
<emphasis> | [SSML] Directs the speech server to add emphasis to surrounded text. | Yes
<enumerate> | [VoiceXML extension] This element is defined in [13]. Its behavior is determined by the external speech server, and is not described in this guide. | Yes
<lexicon> | [SSML] This element is defined in [5]. Its behavior is determined by the external speech server, and is not described in this guide. | Yes
<mark> | [SSML] Places a marker into a text or tag sequence. | Yes
<meta> | [SSML] This element is defined in [5]. Its behavior is determined by the external speech server, and is not described in this guide. | Yes
<metadata> | [SSML] This element is defined in [5]. Its behavior is determined by the external speech server, and is not described in this guide. | Yes
<p> | [SSML] Represents a paragraph. | Yes
<phoneme> | [SSML] Provides a phonemic/phonetic pronunciation for the contained text. | Yes
<prosody> | [SSML] Permits control of the pitch, speaking rate and volume of the speech output. | Yes
<s> | [SSML] Represents a sentence. | Yes
<say-as> | [SSML] Defines a text string to be rendered as an audio clip. | Yes
<speak> | [SSML] The root element of SSML. | Yes (see note a)
<sub> | [SSML] Replaces the contained text with a substitute. | Yes
<value> | [VoiceXML extension] | Yes
<voice> | [SSML] Requests a change in speaking voice. | Yes
a. The <speak> element is not supported directly in VoiceXML scripts. All TTS scripts are rendered into <speak> SSML XML scripts which are then passed to an external server for playing if an external server is active. A parse error results if a <speak> element with TTS text is included in a VoiceXML file.
General XML Handling

The following features for general XML handling are supported:
The media server ignores XML namespace declarations if present.
In received data, the media server replaces XML predefined entity references (for example, &lt;, &amp;, &gt;, &quot;, and &apos;) with the appropriate literal character.
In sent data, the media server replaces XML special characters (for example, the less-than operator (<), the ampersand (&), the greater-than operator (>), quotation marks ("), and the apostrophe (')) with predefined entity references.
XML identifiers, including element names, attribute names, and literals such as attribute values, are case-sensitive.
The following media server behavior is an exception to standard XML handling:
The media server does not support using an XML numeric character reference (for example, &#nnnn; and &#xhhhh;) to refer to a character by its Unicode code point, where nnnn is the code point in decimal form and hhhh is the code point in hexadecimal form. The media server does not replace XML numeric character references with the appropriate literal characters; instead, the XML parser simply passes these references as-is to the interpreter.
SIP Transport of VoiceXML

The SIP dialog service context allows a VoiceXML document to be accessed by the media server. A VoiceXML dialog is initiated whenever a SIP INVITE is received for a dialog service context. The behavior for initiating VoiceXML dialogs depends on the SIP Standards Profile setting configured through the media server's management interface.
When the SIP Standards Profile is set to Default, the behavior is as follows:
If the media server does not have the resources to initiate the VoiceXML interpreter, it returns a 486 (Busy here) response.
Otherwise, the media server continues with SIP signalling according to the SDP Offer/Answer model, sending a 200 (OK) response to the INVITE and waiting for the control agent's ACK.
Once media negotiation has completed and the media server is able to send and receive media, it retrieves the VoiceXML document from the server and begins document execution.
If the media server is unable to retrieve the document, it issues a BYE and includes a Reason header identifying the problem with a 404 (Not found) response.
When the SIP Standards Profile is set to Profile 1, the behavior is as follows:
If the media server does not have the resources to initiate the VoiceXML interpreter, it returns a 503 (Service unavailable) response.
Otherwise, the media server retrieves the VoiceXML document from the server and parses the document for correctness, before proceeding with media negotiation and returning a final response.
If the media server is unable to retrieve or successfully parse the document, it returns a 404 (Not found) response.
The SIP Request-URI delay parameter is measured in units of milliseconds instead of in 100-millisecond increments and can be set to up to 99999 msec.
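The difference between the two profiles can be summarized as a small decision function. This Python sketch is an illustration of the rules above, not media server code: the function name invite_outcome and its arguments are hypothetical, and the 200 (OK) returned on success under Profile 1 is an assumption, since the text only refers to "a final response".

```python
def invite_outcome(profile, has_resources, document_ok):
    """Return the SIP messages the media server sends for an INVITE.

    profile is "Default" or "Profile 1"; document_ok is True when the
    VoiceXML document can be retrieved (and, for Profile 1, parsed).
    """
    if profile == "Default":
        if not has_resources:
            return ["486 Busy here"]
        # Media is negotiated first; the document is fetched afterwards.
        if document_ok:
            return ["200 OK"]
        return ["200 OK", "BYE with Reason header (404 Not found)"]
    if profile == "Profile 1":
        if not has_resources:
            return ["503 Service unavailable"]
        # The document is fetched and parsed before the final response.
        if document_ok:
            return ["200 OK"]  # assumed final response on success
        return ["404 Not found"]
    raise ValueError("unknown profile: " + profile)
```

The practical consequence is that, under the Default profile, a missing document is reported only after the call has been answered, whereas under Profile 1 the INVITE itself is rejected.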
Request-URIs for the dialog Service Context

The root VoiceXML document is identified using the voicexml parameter of the Request-URI. The value of the parameter must be a valid HTTP URI. Example 1-1 shows an example of a dialog Request-URI.
Example 1-1 Request-URI for dialog
sip:dialog@ms.company.com;voicexml=http://host.company.com/scripts/ivr.vxml
The URL must not exceed 1024 characters. The HTTP URI can include a query component, to allow the document to be dynamically generated by the server.
Note: The query delimiter character (?) must be escaped as %3f, since ? is a reserved character within a SIP URI. Similarly, when not used in a value equals context, the equals sign (=) must be escaped as %3d. In general, the escaped form of a character is a percent sign (%) followed by the character's ASCII value in hexadecimal; on Linux or Solaris, look up the character in question in an ASCII table to find that value.
Example 1-2 shows an example of a VoiceXML dialog Request-URI containing a query string passing multiple parameters.
Example 1-2 Request-URI for dialog with Query
Original Query:
sip:dialog@ms.company.com;voicexml=http://host.company.com/scripts/ivr?caller=usera&callee=userb
Send to Media Server as:
sip:dialog@ms.company.com;voicexml=http://host.company.com/scripts/ivr%3fcaller%3dusera&callee%3duserb
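The escaping in Example 1-2 can be reproduced with a few lines of Python. This is a minimal sketch that handles only the two characters named in the note (? and =); the function name build_dialog_request_uri is hypothetical, and other reserved characters are left untouched, as they are in the example.

```python
def build_dialog_request_uri(user_host, document_uri):
    """Build a dialog Request-URI, escaping '?' and '=' in document_uri.

    The '=' that follows the voicexml parameter name is in a "value
    equals" context, so it is added after escaping and stays literal.
    """
    escaped = document_uri.replace("?", "%3f").replace("=", "%3d")
    return "sip:" + user_host + ";voicexml=" + escaped


uri = build_dialog_request_uri(
    "dialog@ms.company.com",
    "http://host.company.com/scripts/ivr?caller=usera&callee=userb",
)
print(uri)
```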
When the document is expressed as a stand-alone URI, the voicexml keyword should be omitted. The SIP Request-URI should remain otherwise unchanged from that shown in Example 3-7.
The media server passes only one URI parameter from the Request-URI to the VoiceXML interpreter. If additional URI request parameters are included in the Request-URI, the media server treats the Request-URI as a bad request. Because the HTTP URI can include a query component instead of naming a specific VoiceXML document, the server can dynamically select a VoiceXML script for execution, for instance on the basis of the called number. Example 1-3 shows a SIP INVITE that dynamically selects a script according to the number that was dialled. In this example, the DialledNumber parameter is sent to the HTTP server as a URL-encoded request to fetch the VoiceXML document. The HTTP server dynamically generates the associated script and returns it for the media server to execute.
Example 1-3 SIP INVITE with Query in URI
INVITE sip:dialog@ms.company.com;voicexml=http://10.10.10.53/scripts/cgi-bin/vmail?DialledNumber=6048081234
Example 1-4 shows a SIP dialog that dynamically selects a script based on the callers in the session.
Example 1-4 SIP Dialog with Query in URI
sip:dialog@ms.company.com;voicexml=http://host.company.com/scripts/ivr?caller=usera&callee=userb
Passing Variables to the VoiceXML Interpreter

VoiceXML defines a standard set of read-only variables (session variables) that the media server initializes whenever a VoiceXML session is invoked from SIP. These session variables are passed through the SIP INVITE request to the VoiceXML script and are automatically declared for use within the VoiceXML script. Some of these variables get their values from SIP headers and others get their values from parameters of the Request-URI. A VoiceXML application can use this information to control or customize its dialog flow. VoiceXML scripts can be passed standard session variables or application session variables.
Standard Session Variables

Script variables declared from standard SIP session information are prefixed with session.connection. For example, consider the VoiceXML script, ivr.vxml. When invoked by a SIP INVITE request, this script results in the following VoiceXML session:
sip:dialog@ms.company.com;voicexml=http://host.company.com/scripts/ivr.vxml
This VoiceXML script has access to the following session variables. The values are all derived from header fields within the SIP INVITE request:
session.connection.local.uri     Derived from the To: header field
session.connection.remote.uri    Derived from the From: header field
session.connection.callid        Derived from the Call-ID: header field
Application Session Variables

In addition to the standard session variables, which are populated automatically from the SIP session information, application-specific session variables can be explicitly passed in the URI parameters of the initial SIP INVITE. The media server supports two methods of creating VoiceXML session variables from the Request-URI. The first method defines new variables under the session.user tree. In this method, script variables declared from Request-URI parameters are prefixed with session.user. For example, a Request-URI of the following:
sip:dialog@ms.company.com;voicexml=http://host.company.com/script.vxml;x=y
would create a session variable named session.user.x with a value of y. For example, the following SIP INVITE request invokes the ivr.vxml VoiceXML script:
sip:dialog@ms.company.com;voicexml=http://host.company.com/scripts/ivr.vxml;appvara=786;appvarmsg=hi there;appvarnumber=604-555-1234
In this request, values for application-specific variables appvara, appvarmsg, and appvarnumber are explicitly passed to the ivr.vxml script. Within the context of the script, these variables are defined as follows:
session.user.appvara         Value: 786
session.user.appvarmsg       Value: hi there
session.user.appvarnumber    Value: 604-555-1234
The second method uses two session variable arrays to hold all URI parameters (including those defined for SIP in RFC 3261) and values. The first session variable array:
session.connection.protocol.sip.parameter[N].name
contains the names of all URI parameters. The first array element [0] always contains the string voicexml (regardless of where it appears in the SIP Request-URI) since that is the first and only required URI parameter for the dialog service context. The second array element contains the second URI parameter (if present), and so on. The second session variable array:
session.connection.protocol.sip.parameter[N].value
contains the corresponding values for the URI parameters. For example, a Request-URI of:
sip:dialog@ms.example.com;voicexml=http://server.example.com/script.vxml;x=y
populates the [0] and [1] array elements as follows:

session.connection.protocol.sip.parameter[0].name=voicexml
session.connection.protocol.sip.parameter[1].name=x
session.connection.protocol.sip.parameter[0].value=http://server.example.com/script.vxml
session.connection.protocol.sip.parameter[1].value=y
Any escaped characters in the SIP Request-URI that are used as the name or value of VoiceXML session variables will be replaced with their unescaped representation. For example, a Request-URI of:
sip:dialog@ms.example.com;voicexml=http://server.example.com/script.vxml;%78=%79
creates and populates session variables exactly the same as in the preceding examples.
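The array-based method can be sketched in Python as follows. This is an illustration of the mapping described above, not the media server's parser: the function name sip_parameter_arrays is hypothetical, and the sketch assumes that the HTTP URI itself contains no unescaped semicolons.

```python
from urllib.parse import unquote


def sip_parameter_arrays(request_uri):
    """Return (names, values) lists built from the Request-URI parameters.

    The voicexml parameter is placed at index 0 regardless of where it
    appears; the remaining parameters keep their original order.
    """
    # Everything after the first ';' is the parameter list.
    _, _, param_text = request_uri.partition(";")
    pairs = []
    for item in param_text.split(";"):
        if not item:
            continue
        name, _, value = item.partition("=")
        # Escaped characters are replaced with their unescaped form.
        pairs.append((unquote(name), unquote(value)))
    # Move voicexml to the front (the sort is stable for the rest).
    pairs.sort(key=lambda pair: pair[0] != "voicexml")
    names = [name for name, _ in pairs]
    values = [value for _, value in pairs]
    return names, values


names, values = sip_parameter_arrays(
    "sip:dialog@ms.example.com;voicexml=http://server.example.com/script.vxml;%78=%79"
)
```

Here names[N] and values[N] stand in for session.connection.protocol.sip.parameter[N].name and session.connection.protocol.sip.parameter[N].value respectively.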
Terminating VoiceXML Dialogs

A dialog can complete in either of two ways:
The VoiceXML interpreter encounters a <disconnect> element or an error. In this case, the media server issues a BYE to end the connection and deletes the call state information when the response to the BYE is received (or times out).
Processing by the VoiceXML interpreter finishes for any other reason. For example, the VoiceXML interpreter encounters a <return> element in the root document, or it encounters an <exit> element. In this case, the media server waits for the control agent to issue a BYE.
Sample VoiceXML Call Flow

Figure 1-1 shows the call flow for a VoiceXML session with one user. The call flow is abridged, showing only interactions which impact the media server, not the previous or subsequent interactions between the endpoint and the control agent. Optional flows are shown in grey with dashed lines.
Figure 1-1 VoiceXML Call Flow
[Figure: sequence diagram between User A, Control Agent, Media Server, and Server. Messages shown, in order: INVITE, HTTP GET, 200, 200, ACK, RTP, HTTP GET, 200, BYE. Annotations: the BYE may originate from either the CA or the media server; optional additional server interactions submit results and/or fetch other documents, grammars, or audio clips.]
In this call flow, the following occurs:

1 The control agent initiates the call to the media server with an INVITE request on the first leg. The Request-URI contains a voicexml= value which specifies the URI for the root document on an external server.
Note: The media server does not support VoiceXML 2.0 scripts in INVITE message bodies.
2 The media server sends a GET request with the HTTP URI to the server to retrieve the initial VoiceXML document.
3 The external server responds with a 200 message to the media server and sends the document.
4 The media server responds with a 200 message to the control agent.
5 The control agent then sends an ACK to the media server indicating that the RTP connection is ready.
6 Upon receipt of the ACK, the media server sends the appropriate audio prompts to the user.
7 The user should reply with a DTMF input. This may trigger other dialogs to be acquired and sent.
8 When the IVR dialog ends or the user terminates his or her connection, the session is terminated either by the media server, which sends a BYE to the control agent, or by the control agent, which sends a BYE to the media server.
VoiceXML Interaction with HTTP Servers

The media server communicates with external HTTP servers using HTTP v.1.1. During a typical IVR dialog with a user, the collected information from the user is sent to an HTTP server using the <submit> element.
HTTP Server-Side Logic

When a <submit> element is sent to an HTTP server, the HTTP server-side application is required to generate and return a well-formed VoiceXML document. (This may be as simple as invoking <disconnect> or <exit> if the dialog with the user is to be terminated.) Once the media server receives the fetched VoiceXML document, control transitions to the new document. Example 1-5 shows a sample <submit> request, and the associated invoked server-side logic that returns a new VoiceXML document, to which control will be transitioned. This logic would be enclosed in a Perl script (.pl file). This example does the following:
1 Collects the data submitted by the media server
2 Creates a file called POSTDATA.DAT in the same directory from which the script is executed
3 Sends a VoiceXML document (script) back
4 Sends a <disconnect> element, which hangs up the call, freeing port resources and sending a BYE to the SIP control agent

Example 1-5 Example Server-Side Perl Script
#!/usr/bin/perl
# VoiceXML request handler Perl script
use strict;
use warnings;

# Collect the data submitted by the media server:
# the request body for POST, the query string otherwise.
my $input_st = "";
if (($ENV{'REQUEST_METHOD'} // "") eq "POST") {
    $input_st = <STDIN> // "";
} else {
    $input_st = $ENV{'QUERY_STRING'} // "";
}

# Return a VoiceXML document that disconnects the call.
print "Content-type: text/html\n\n";
print '<?xml version="1.0"?>', "\n";
print '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">', "\n\n";
print '<form>', "\n";
print '<block>', "\n";
print '<disconnect/>', "\n";
print '</block>', "\n";
print '</form>', "\n";
print '</vxml>', "\n";

# Save the submitted data to a file in the current directory.
if ($input_st ne "") {
    open(my $outfile, '>', 'POSTDATA.DAT') or die "Cannot open POSTDATA.DAT: $!";
    print $outfile $input_st;
    close($outfile);
}
HTTP Cookies
The media server supports HTTP cookies in VoiceXML HTTP transport, as defined in the Netscape Persistent Client State HTTP Cookies specification [13]. A server returning a document or other HTTP object to a client can include a cookie, which contains state information plus the range of URIs to which that state information applies. The client stores the information in the cookie, and for any future HTTP requests to the server falling within that URI range, the client will transmit the state information along with the request. This allows the HTTP server to maintain state for a VoiceXML session. The media server permits or denies the use of VoiceXML cookies using a configuration parameter in the VoiceXML configuration file vxml.cfg. The media server generates a VoiceXML log message when it receives a request to set a cookie and the OAMP configuration parameter has been set to deny cookies. By default, cookies are enabled on the media server. Cookies are deleted when the associated SIP session expires, regardless of any expiration time specified. The media server supports a maximum cookie size (that is, NAME=VALUE combination) of 4096 bytes. The media server silently discards any cookies over the maximum. The media server will support up to 10 cookies per session. After the system maximum is reached, the media server deletes the least recently used cookie when a new cookie is created.
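The size and count limits above can be pictured with a small per-session store. This Python sketch is a hypothetical model, not the media server's implementation: the class name SessionCookieJar and its methods are invented for illustration, and "least recently used" is interpreted as least recently stored or read.

```python
from collections import OrderedDict

MAX_COOKIE_BYTES = 4096   # maximum size of a NAME=VALUE combination
MAX_COOKIES = 10          # maximum number of cookies per session


class SessionCookieJar:
    """Hypothetical per-session cookie store with LRU eviction."""

    def __init__(self):
        # Keys are (domain, path, name); the oldest-used entry comes first.
        self._cookies = OrderedDict()

    def set(self, domain, path, name, value):
        """Store a cookie; return False if it is discarded as too large."""
        if len((name + "=" + value).encode("utf-8")) > MAX_COOKIE_BYTES:
            return False  # silently discarded
        key = (domain, path, name)
        if key in self._cookies:
            del self._cookies[key]             # replace the existing cookie
        elif len(self._cookies) >= MAX_COOKIES:
            self._cookies.popitem(last=False)  # evict least recently used
        self._cookies[key] = value
        return True

    def get(self, domain, path, name):
        """Return the cookie value and mark the cookie as recently used."""
        key = (domain, path, name)
        value = self._cookies[key]
        self._cookies.move_to_end(key)
        return value

    def names(self):
        """Return cookie names from least to most recently used."""
        return [name for (_, _, name) in self._cookies]
```

When the SIP session ends, the whole jar would simply be dropped, which matches the rule that cookies are deleted regardless of any expiration time.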
Set-Cookie Response Header

The server introduces the cookie to the client using the Set-Cookie header as part of an HTTP response. The Set-Cookie response header has the following syntax:
Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN_NAME
and the following applies:

NAME=VALUE
Mandatory. The cookie itself: a simple text string excluding semi-colon, comma, and white space. (If these characters are required, they should be encoded in URL style, for example %20 for a space.) The maximum size of the NAME=VALUE combination is 4096 bytes. Cookies larger than the 4 KB maximum are silently discarded.

expires=DATE
Optional. A date string that defines the valid lifetime of the cookie. Format is Wdy, DD-Mon-YYYY HH:MM:SS GMT, for example, Friday, 16-Apr-2004 13:00:00 GMT. By default, the cookie expires when the user session expires. Note that the media server always deletes cookies when the SIP session that received the cookie terminates, regardless of the value of the expires attribute.

path=PATH
Optional. Specifies the subset of URLs to which the cookie applies. The media server considers the path attribute to path match the request-URI if the path attribute matches a prefix of the request-URI. If the path attribute is not a prefix of the request-URI, the media server does not return cookies. The default is the path of the request-URL that generated the Set-Cookie response.

domain=DOMAIN_NAME
Optional. Specifies the domain for which the cookie is valid. The media server considers the domain attribute to domain match the request-host if it matches the tail of the fully qualified domain name of the host. For example, mycorp.com matches both shipping.mycorp.com and service.mycorp.com. If the tail matches, the cookie proceeds to the path match (see path=PATH). Domain names must have at least one embedded dot (that is, domains such as .com are rejected), and they must domain match the request host. In addition, the media server rejects cookies with domain attributes where the request host is a fully qualified domain name of the form HD, where D is the value of the domain attribute, and H contains one or more dots. For example, the media server rejects a cookie from request host x.y.mycorp.com where domain=mycorp.com. The default is the fully qualified domain name of the server that generated the cookie.
Attributes are separated by semi-colons. The media server does not support any other cookie attributes.
The Netscape specification includes the secure attribute, as follows:

secure
If a cookie is marked as secure, it is sent only if the communication channel with the host is secure (that is, the channel is over HTTPS). If secure is not specified, the cookie is sent over the network in clear text. The default is to allow communication over an insecure channel.

However, the media server does not support HTTPS at this time. Any cookies specifying the secure attribute are ignored. The following is an example of a Set-Cookie header in an HTTP response:
Set-Cookie: SESSION_ID=10725; path=/; expires=Friday, 16-Apr-2004 13:00:00 GMT
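The attribute rules above can be pictured with a short Python sketch. It is illustrative only and is not the media server's implementation: the function name parse_set_cookie and the returned dictionary layout are hypothetical, the sketch handles a single cookie (not a comma-separated list), and skipping unrecognized attributes is an assumption.

```python
from typing import Dict, Optional

MAX_PAIR_BYTES = 4096                      # limit stated for NAME=VALUE
SUPPORTED_ATTRIBUTES = {"expires", "path", "domain"}


def parse_set_cookie(header_value: str) -> Optional[Dict[str, str]]:
    """Parse the value of one Set-Cookie header (hypothetical helper).

    Returns None when the cookie would be discarded or ignored.
    """
    parts = [p.strip() for p in header_value.split(";") if p.strip()]
    if not parts or "=" not in parts[0]:
        return None
    pair = parts[0]
    if len(pair.encode("utf-8")) > MAX_PAIR_BYTES:
        return None                        # oversize cookies are discarded
    name, value = pair.split("=", 1)
    cookie = {"name": name, "value": value}
    for attribute in parts[1:]:
        key, _, attr_value = attribute.partition("=")
        key = key.strip().lower()
        if key == "secure":
            return None                    # cookies specifying secure are ignored
        if key in SUPPORTED_ATTRIBUTES:
            cookie[key] = attr_value.strip()
        # Assumption: any other attribute is simply skipped.
    return cookie
```

Because attributes are split on semi-colons only, the comma inside the expires date does not interfere with parsing.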
The media server supports lists of cookies in the Set-Cookie header. Cookies are separated in a list by commas (,). In addition, the media server accepts multiple Set-Cookie headers within a single HTTP response. Cookies are uniquely identified by the combination of domain-path-name. So long as cookies have different path or domain attributes, they can have the same name. If the media server receives a cookie with the same domain-path-name as an existing cookie, it overwrites the old cookie. If the media server receives a cookie with the same domain-path-name as an expired cookie, it deletes the cookie.
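As a rough illustration of the identity rule above, the following Python sketch keys stored cookies on the domain-path-name combination. The class CookieStore and its methods are hypothetical names, and treating "a cookie whose expires date has passed" as the trigger for deletion is an interpretation of the text, not confirmed behavior.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


class CookieStore:
    """Hypothetical per-session store keyed on (domain, path, name)."""

    def __init__(self) -> None:
        self._cookies: Dict[Tuple[str, str, str], str] = {}

    def receive(self, domain: str, path: str, name: str, value: str,
                expires: Optional[datetime] = None,
                now: Optional[datetime] = None) -> None:
        key = (domain.lower(), path, name)
        now = now or datetime.now(timezone.utc)
        if expires is not None and expires <= now:
            # Assumption: an already-expired cookie removes the stored one.
            self._cookies.pop(key, None)
            return
        # Same domain-path-name: the new value overwrites the old one.
        self._cookies[key] = value

    def get(self, domain: str, path: str, name: str) -> Optional[str]:
        return self._cookies.get((domain.lower(), path, name))

    def __len__(self) -> int:
        return len(self._cookies)
```

Two cookies named SESSION_ID with different path values occupy separate entries, while a second cookie with an identical key replaces the first.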
Cookie Request Header

The media server returns cookies to a server when requesting a Request-URI from a request-host. The cookies are sent by including the Cookie header in the HTTP request. If multiple cookies were introduced by the server, the media server returns all cookies where:
• The domain attribute of the cookie domain matches the fully qualified domain name of the host, AND
• The path attribute of the cookie path matches the Request-URI.
Multiple cookies returned in the Cookie header are separated by semi-colons. When multiple cookies are returned, the media server orders cookies with more-specific path mappings before cookies with less-specific path mappings. Cookie headers sent by the media server do not include any cookie attributes.
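The selection and ordering rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the tail match is assumed to start on a label boundary, and "more specific" is approximated by a longer path string.

```python
from typing import Dict, List, Optional


def domain_matches(cookie_domain: str, host: str) -> bool:
    """Tail match: 'mycorp.com' matches 'shipping.mycorp.com'."""
    domain = cookie_domain.lower().lstrip(".")
    host = host.lower()
    # Assumption: the matching tail must begin on a label boundary.
    return host == domain or host.endswith("." + domain)


def path_matches(cookie_path: str, request_path: str) -> bool:
    """Prefix match of the cookie path against the request path."""
    return request_path.startswith(cookie_path)


def cookie_header(cookies: List[Dict[str, str]], host: str,
                  request_path: str) -> Optional[str]:
    """Build the Cookie header line, or return None if nothing matches."""
    selected = [c for c in cookies
                if domain_matches(c["domain"], host)
                and path_matches(c["path"], request_path)]
    if not selected:
        return None
    # Longer (more specific) paths first; name=value only, no attributes.
    selected.sort(key=lambda c: len(c["path"]), reverse=True)
    return "Cookie: " + "; ".join(f'{c["name"]}={c["value"]}' for c in selected)
```

For a request to /annc/clip.wav on shipping.mycorp.com, a cookie with path=/annc is therefore listed before a cookie with path=/.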
ASR and TTS

The media server supports Automatic Speech Recognition (ASR) and Text to Speech (TTS) if there are external servers defined for processing voice input (ASR) or synthesizing speech (TTS). If no ASR servers are defined at the start of a session, all ASR grammars defined in scripts within the session are ignored, regardless of the input mode defined. Similarly, if no TTS servers are defined, TTS strings found within VoiceXML scripts are ignored throughout the entire session. If one or more servers are enabled and brought online, subsequent new sessions will be able to use these servers; however, existing sessions will not.
User Input
In the VoiceXML applications supported by the media server, user input comes in the form of DTMF key presses or speech utterances. The way in which user input is collected, buffered, and validated varies based on whether it is DTMF or speech input. For DTMF, all processing of digits is handled within the media server: digits are matched against active grammars according to the Form Interpretation Algorithm (FIA) defined by [13]. For speech, voice processing is performed by external speech servers, which validate the speech against the grammar and determine whether it matches. The results of the collection are returned to the interpreter as NLSML [8] scripts, which are then processed by the media server.
DTMF
Within the media server, DTMF digits are detected on received RTP streams (inband DTMF). The media server also recognizes out-of-band (RFC 2833) DTMF digits.
Speech
Speech detection processing is performed by the external speech server. Once the media server determines that a grammar defined in a VoiceXML script is a speech grammar, the media server sends the grammar to the speech server and performs no other processing on the grammar: the speech server assumes responsibility for detecting and processing the user input. Voice input is received from the user's RTP stream and routed to the speech server. The speech server makes all determinations of whether more input is required or whether the current input produces a match or no-match event. The result of the collection is returned to the media server as an NLSML script. The media server then interprets the results of the collection and determines the next action based on the FIA.
System Output
The output of VoiceXML applications is either the playback of recorded audio or video files or synthesized text-to-speech (TTS) played to the user by the external speech server. Audio playback to the user is invoked using the <audio> element.
Audio files may be stored internally on the media server or on an external HTTP or NFS server. In either case, the source of the audio file is specified as a URI. The URI can be explicitly specified, or specified as the evaluation of an ECMAScript expression. The latter mechanism allows playing of audio files based on application-defined logic. Different methods of specifying audio files are described in detail in the section Working with Media Files and TTS Strings on page 28.
If desired, the VoiceXML application can allow the user to interrupt (barge) audio playback with a DTMF key press by enabling the bargein attribute of the <audio> element.
TTS clips are specified by embedding strings into VoiceXML scripts. The media server supports plain text strings, Speech Synthesis Markup Language (SSML) strings, or a combination of the two. All strings, however specified, are converted to SSML strings, which are subsequently passed to an external speech server.
Control Flow
Control flow within a VoiceXML application can be manipulated using any of the following mechanisms:
• Application-defined variables: for example, using variables defined using <var>, <assign>, or <clear>
• Predefined system variables: for example, using variables defined using <var>, <assign>, or <clear>
• Event generation and handling: for example, using <throw>, <catch>, <error>, <help>, <noinput>, <nomatch>, or user-defined events
• Conditional execution: that is, using <if>, <else>, and <elseif>
• Control transfer and jumps: for example, using <goto>, <subdialog>, <submit>, <exit>, <return>, and <disconnect>
• Prompts: for example, using <prompt> and <reprompt>
• Scripts: that is, using ECMAScript, either embedded inline or externally fetched
Session Termination
A session terminates because a <disconnect> element is executed within a script, because a user hangs up, or because a fatal error occurs during the execution of a script. Regardless of the cause, when a session terminates, the script enters a state in which the set of operations that can be executed is restricted, so that the script can clean up resources (for example, post the current state of collections, recordings, and so on to the HTTP server) before the session terminates. A script that is terminating cannot queue or play prompts, make recordings, or collect DTMF. There is a limit of two HTTP access operations and a maximum of six iterations; that is, the script can move between forms and other scripts a maximum of six times before the session is terminated. These restrictions are intended to prevent unnecessary processing while in this clean-up state.
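The clean-up limits described above can be pictured as a simple budget. The following Python sketch is illustrative only: the class and method names are hypothetical and do not correspond to any media server interface. It models just the two limits stated in this section (two HTTP access operations and six iterations).

```python
class CleanupBudget:
    """Hypothetical model of the limits that apply while a session terminates."""

    MAX_HTTP_OPERATIONS = 2   # limit stated in this section
    MAX_ITERATIONS = 6        # moves between forms and other scripts

    def __init__(self) -> None:
        self.http_operations = 0
        self.iterations = 0

    def allow_http_operation(self) -> bool:
        """Return True if another HTTP access fits within the budget."""
        if self.http_operations >= self.MAX_HTTP_OPERATIONS:
            return False
        self.http_operations += 1
        return True

    def allow_iteration(self) -> bool:
        """Return True if another form or script transition is permitted."""
        if self.iterations >= self.MAX_ITERATIONS:
            return False
        self.iterations += 1
        return True
```

A clean-up handler that posts a recording and then submits variables uses the whole HTTP budget, so a third post in the same clean-up phase would not be performed.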
Shadow Variables
Some VoiceXML elements have associated shadow variables. Shadow variables are variables that are automatically assigned values when the elements are used. The media server supports shadow variables for the following:
• Announcements. Shadow variables for announcements are provided through the <prompt> element. These provide information about prompt completion and information resulting from DTMF collection. For information about shadow variables for announcements, please see the Shadow Variables section of the <prompt> element. Note that if the session terminates as the result of a SIP BYE, the shadow variables are not updated with information about the prompt, so they will not contain correct values.
• Recordings. Shadow variables for recordings are provided through the <record> element. These provide information about the duration of the recording and the reason for its termination. For information about shadow variables for recordings, please see the Shadow Variables section of the <record> element.
Shadow variables cannot be modified by a user or an application. They are returned from a VoiceXML document. Supported shadow variables are summarized in the List of Shadow Variables on page xiii.
Events
Some events and errors are automatically generated by the media server; others are generated under direct control of the VoiceXML application. For each event type or error type, the VoiceXML application can specify specific handling. Some event and error handling has a predetermined default implementation provided by the media server. In most cases, the system default event or error handlers can be overridden by the VoiceXML application to provide a more tailored mechanism.
Table 1-5 shows media server support for VoiceXML events.

Table 1-5 Event Support

Event: connection.disconnect
Description: Supported. Thrown whenever a disconnect distinct from hangup occurs. This event, if caught within an application, allows the application the opportunity to perform final processing before terminating the session. This may include posting of data (such as a recording), or submitting variables to an HTTP server using the <submit>, <goto>, or <link> element. Processing allowed before session termination is restricted to posting information to an HTTP server, setting variable values, and executing simple if-else-elseif statements. Requests to play audio, perform recording, or define grammars are not honored. In addition, a maximum of two HTTP post events are allowed in the catch handler and subsequent VXML documents. Note that this limitation applies equally to events executed within a particular <catch> handler and to subsequent VXML documents returned to the application. All attempts to perform restricted operations are terminated without incident; that is, even though a request to play an audio clip after a disconnect will fail, a second error will not be generated.

Event: connection.disconnect.hangup
Description: Supported. Thrown whenever a disconnect occurs. This event, if caught within an application, allows the application the opportunity to perform final processing before terminating the session. This may include posting of data (such as a recording), or submitting variables to an HTTP server using the <submit>, <goto>, or <link> element. Processing allowed before session termination is restricted to posting information to an HTTP server, setting variable values, and executing simple if-else-elseif statements. Requests to play audio, perform recording, or define grammars are not honored. In addition, a maximum of two HTTP post events are allowed in the catch handler and subsequent VXML documents. Note that this limitation applies equally to events executed within a particular <catch> handler and to subsequent VXML documents returned to the application. All attempts to perform restricted operations are terminated without incident; that is, even though a request to play an audio clip after a disconnect will fail, a second error will not be generated. This event is thrown regardless of how the disconnect is initiated; that is, whether the <disconnect> element was executed or encountered, or whether the disconnect was on account of a user hang-up.

Event: com.cvd.event.faxdetect
Description: Supported. Thrown if fax tone detection is enabled (by setting the com.cvd.faxdetect property to true) and a fax tone is detected.

Event: exit
Description: Supported. This event will be processed either by the appropriate catch handler, or by the <exit> element. Please see page 88 for details on the <exit> element.

Event: filled
Description: Supported. Indicates a match event for DTMF collection. Allows an application to define specific behavior relative to DTMF match events. This event is not actually thrown in the sense that it can be caught by a catch handler such as the <catch> element.

Event: help
Description: Supported. This event will be processed either by the appropriate catch handler, or by the <help> element. Please see page 100 for details on the <help> element. If an application-specific <help> handler is not defined in the document, the default <help> handler executes 5 times before exiting the session.

Event: noinput
Description: Supported. Used to catch no-input events relative to DTMF collection and recording. This event allows an application to override the default <noinput> handler. Please see page 114 for details on the <noinput> element. If an application-specific <noinput> handler is not defined in the document, the default <noinput> handler executes 5 times before exiting the session.

Event: nomatch
Description: Supported. Used to catch no-match events relative to DTMF collection. Allows an application to override the default <nomatch> handler. Please see page 115 for details on the <nomatch> element. If an application-specific <nomatch> handler is not defined in the document, the default <nomatch> handler executes 5 times before exiting the session.

Event: cancel
Description: Ignored.

Event: connection.disconnect.transfer
Description: Ignored.

Event: maxspeechtimeout
Description: Ignored.

Errors
All VoiceXML errors are fatal to the current session, and the session terminates in all cases. Table 1-6 shows VoiceXML errors supported by the media server. Note that unsupported errors are not listed.
Table 1-6 Error Support

Error: error.badfetch
Description: Thrown when the specified resource could not be fetched, or the resource was specified incorrectly.

Error: error.grammar
Description: Thrown for incorrectly formatted grammars, or unsupported attributes used within a grammar.

Error: error.max_loop_count_exceeded
Description: Thrown if:
1. The maximum number of document fetches set for this session has been exceeded. Note that this includes VXML document fetches for submit, subdialog, goto, link, and the initial application document. Root documents and external SRGS grammars are not counted for this counter. The default is 100 fetches.
2. The number of iterations (loops) exceeds 400 for a session. This includes all document fetches and transitions between forms within a document.

Error: error.noresource
Description: Thrown when a request (for example, to play a clip or to enable fax detection) is rejected because available resources have been exceeded and overload protection is in effect. Like all VoiceXML errors, this error is fatal and the session will be terminated.

Error: error.semantic
Description: Thrown when incorrect or invalid values are assigned to properties. For information about supported properties, please see Chapter 2: Properties Overview. Also thrown for unsupported or undefined ECMAScript objects; for example, when an undefined variable is evaluated. For information about ECMAScript support, please see ECMAScript Support on page 27.

Error: error.unsupported.language
Description: Thrown when an unsupported language has been specified for sets and variables.

Error: error.unsupported.element
Description: Thrown when an unsupported element is specified.

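The two conditions listed for error.max_loop_count_exceeded in Table 1-6 can be modeled with a pair of counters. The Python sketch below is an illustration under stated assumptions: the class and method names are hypothetical, and the caller is assumed to skip root documents and external SRGS grammars, which the table says are not counted as document fetches.

```python
class MaxLoopCountExceeded(Exception):
    """Stands in for the error.max_loop_count_exceeded event."""
    event = "error.max_loop_count_exceeded"


class SessionCounters:
    """Hypothetical per-session counters for fetches and iterations."""

    def __init__(self, max_fetches: int = 100, max_iterations: int = 400) -> None:
        self.max_fetches = max_fetches
        self.max_iterations = max_iterations
        self.fetches = 0
        self.iterations = 0

    def _record_iteration(self) -> None:
        self.iterations += 1
        if self.iterations > self.max_iterations:
            raise MaxLoopCountExceeded("iteration limit exceeded")

    def record_document_fetch(self) -> None:
        """Call for submit, subdialog, goto, link, and the initial document."""
        self.fetches += 1
        if self.fetches > self.max_fetches:
            raise MaxLoopCountExceeded("document fetch limit exceeded")
        # Document fetches also count toward the iteration limit.
        self._record_iteration()

    def record_form_transition(self) -> None:
        """Call for each transition between forms within a document."""
        self._record_iteration()
```

With the default limits, the 101st counted document fetch or the 401st iteration raises the exception, whichever comes first.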
ECMAScript Support
The media server's ECMAScript support is fully compliant with ECMA-262, Edition 3, based on JavaScript 1.5. The length of any variable in VoiceXML, however specified, is limited to 256 characters. In addition to the 256-character maximum enforced by the media server, ECMAScript may apply additional constraints in its own handling of variables. Any string longer than 256 characters, or longer than ECMAScript supports, results in session termination with an error.semantic being thrown, except in the following cases:
• The variable is a URI, Remote Address, or Local Address session variable. These are default session variables available to all applications. The maximum length of a VoiceXML URI that starts a session is 1024 characters. Rather than rejecting the call with an exception, the media server truncates such values to 256 characters and stores them in the shortened form.
• A user-defined session variable is longer than 256 characters. In this case, the session terminates, but an exception is not thrown.
All other cases result in an error.semantic and session termination.
Note that the media server does not throw an error.semantic for division-by-0 errors. Instead, the media server returns a value Inf, INF, or inf, representing infinity.
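The length rules above amount to a small decision table. The following Python sketch restates them for illustration; the function name, the kind labels, and the outcome strings are all hypothetical and are not part of the media server or of ECMAScript.

```python
from typing import Optional, Tuple

MAX_VARIABLE_LENGTH = 256   # character limit stated in this section


def apply_length_rule(value: str, kind: str) -> Tuple[str, Optional[str]]:
    """Return (outcome, stored_value) for a variable of the given kind.

    kind is one of the hypothetical labels:
      "default-session"  URI, Remote Address, or Local Address session variable
      "user-session"     user-defined session variable
      "other"            any other VoiceXML variable
    """
    if len(value) <= MAX_VARIABLE_LENGTH:
        return ("ok", value)
    if kind == "default-session":
        # Truncated and stored in shortened form; the call is not rejected.
        return ("truncated", value[:MAX_VARIABLE_LENGTH])
    if kind == "user-session":
        # Session terminates, but no exception is thrown.
        return ("terminated", None)
    # All other cases: error.semantic and session termination.
    return ("error.semantic", None)
```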
Escape Characters
There are essentially four classifications of data received or sent by the media server in which checking for escape characters (in the form %HEXHEX) may or may not be required, or in which the media server may need to escape characters deemed to be special by the protocol. These four types of data are the following:
1 The Request-URI representing the initial URI required to start a session.
The media server assumes that this URI has been successfully extracted by the SIP layer. The URI may or may not include escape characters, so the media server processes the URI to remove any escape characters from the string.
2 Session variables appended to the end of the Request-URI.
The session variables are removed by the SIP layer based on the rules as defined in SIP RFC 3261. The session variables are presented in a list and are individually processed, with each escape character being converted into its ASCII equivalent.
3 URIs received within a VoiceXML document.
URIs received within a VoiceXML document are processed by the XML parser, which unescapes all characters based on the rules of XML. Subsequent to this operation, no other escape checking is required or performed.
4 URIs including namelist data sent from the media server to an HTTP server.
These are URIs compiled by the media server and sent to the HTTP server. All characters in these URIs must be escaped, according to the HTTP protocol, and the media server processes them accordingly.
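The %HEXHEX handling in cases 2 and 4 can be illustrated with Python's standard urllib.parse functions. This is a sketch only: the helper names and the example URI are hypothetical, and the exact set of characters the media server escapes is not specified here.

```python
from typing import Dict
from urllib.parse import quote, unquote, urlencode


def unescape_session_variable(raw: str) -> str:
    """Convert each %HEXHEX escape into the character it represents."""
    return unquote(raw)


def build_submit_uri(base_uri: str, namelist: Dict[str, str]) -> str:
    """Append escaped namelist data to a URI sent to an HTTP server.

    quote_via=quote encodes a space as %20 rather than '+'.
    """
    return base_uri + "?" + urlencode(namelist, quote_via=quote)
```

For example, a session variable received as hello%20world is stored as "hello world", and a namelist value containing a space is sent with the space encoded as %20.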
Working with Media Files and TTS Strings

This section presents the following topics:
• Media Clip Support
• Clip Delineation in Prompts
• Referring to Media Files
• HTTP Queries
• Relative URIs
• Sets and Variables
Media Clip Support

Table 1-7 shows the clip formats and encodings (in the format file-format:codec) supported for VoiceXML.
Table 1-7 VoiceXML: Supported Media Clips
Storage: Audio Internal Indexed WAV: G.711, G.729 Announcement Video Audio WAV: G.711, G.729 Recording Video Sets & Vars Audio WAV: G.711, G.729

Storage: Audio Internal Named WAV: G.711, G.729 QT: G.711 3GPa: AMR WAV: G.711, G.729 QT: G.711 3GP: AMR WAV: G.711, G.729 Announcement Video QT: H.263 3GP: H.263 Audio WAV: G.711, G.729 QT: G.711 3GP: AMR WAV: G.711, G.729 QT: G.711 3GP: AMR WAV: G.711, G.729 Recording Video QT: H.263 3GP: H.263 Sets & Vars Audio WAV: G.711, G.729
NFS
QT: H.263 3GP: H.263
QT: H.263 3GP: H.263
HTTP
a. The software media server, which does not support AMR, does not use 3GP for audio, only for video-only clips.
Audio clips must have the following characteristics:
• 8 kHz
• Mono (number of channels is 1)
• 8-bit
Clip Delineation in Prompts

Clips, whether audio, multimedia, video-only, or TTS renderings, are grouped together to form a prompt. The way that clips are queued for a prompt depends on the following:
• A bargein attribute associated with the prompt
• A vcrprompt attribute
• The number of clips being queued (a maximum of 50 clips is supported)
• Whether alternate clips are specified
• Whether TTS clips are specified
The media server places the prompt clips into a queue. When the media server reaches a point where the prompt is to be played, it issues the request to play all clips in the group. Only when all clips in the group have been played is the request to play the next group (prompt) issued. The order of clips played always matches the order specified in the VoiceXML document. All the clips within the same request group necessarily have the same bargein and vcrprompt attributes. Note that the barging of one in a set of clips barges all clips, even those that have been queued but have not yet been requested to be played. This is true even if a subsequent set of clips has the bargein attribute set to false.
Alternate clips are sent as separate requests and are not included with primary clips. TTS clips are not included with audio or multimedia clips and are handled separately by the media server. Audio and TTS clips can be grouped together. Clips containing video must be grouped separately: either all the clips must contain video or none of them may. In addition, VCR controls are not supported for video clips. If a clip containing video is discovered within a set of clips of another type, or if VCR controls are applied to video clips, the media server reports an error and terminates the playing of any other queued and requested clips. Clips that have been queued but are not yet requested are not affected.
Referring to Media Files

Audio-only, video-only, and multimedia clips are supported in VoiceXML. Table 1-8 shows the clip formats and encodings (in the format file-format:codec) supported for VoiceXML.
Storage: Audio Internal Indexed Internal Named WAV: G.711, G.729 WAV: G.711, G.729 QT: G.711 3GPa: AMR WAV: G.711, G.729 QT: G.711 3GP: AMR WAV: G.711, G.729 Announcement Video QT: H.263 3GP: H.263 Audio WAV: G.711, G.729 WAV: G.711, G.729 QT: G.711 3GP: AMR WAV: G.711, G.729 QT: G.711 3GP: AMR WAV: G.711, G.729 Recording Video QT: H.263 3GP: H.263 Sets & Vars Audio WAV: G.711, G.729 WAV: G.711, G.729
NFS
QT: H.263 3GP: H.263
QT: H.263 3GP: H.263
HTTP
a. The software media server, which does not support AMR, does not use 3GP for audio, only for video-only clips.
Audio clips must have the following characteristics:
• 8 kHz
• Mono (number of channels is 1)
• 8-bit
The following table shows how to specify named media files in a VoiceXML document.
Table 1-9 Referencing Named Media Files in VoiceXML

Identifier Type: Internal Announcement
Syntax: [file:/]/provisioned/path/filename
Provisioned clips with alphanumeric names can be structured in up to nine levels of hierarchical directories or paths (with the level /provisioned forming a tenth level where applicable). Levels are delimited with the slash character (/). If the file:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Syntax restrictions are as follows:
• Up to 128 characters can be used in total for path/filename.
• File names are case-sensitive. File extensions are not case-sensitive.
• Numbers, letters, and the underscore character are supported.
• Slash (/) is supported only to delimit levels of hierarchy.
• One period (.) is supported to delimit the file name from the file extension.
Examples:
file://provisioned/audioclips/hello.wav
/provisioned/audioclips/hello.wav

Identifier Type: Internal Recording
Syntax: [file:/]/transient/filename
Transient recordings do not support hierarchical paths. If the file:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Syntax restrictions are as follows:
• Up to 128 characters can be used in total for filename.
• Numbers, letters, and the underscore character are supported.
• One period (.) is supported to delimit the file name from the file extension.
Examples:
file://transient/user1_name.wav
/transient/user1_name.wav
file://transient/intro.mov
/transient/intro.QT

Identifier Type: NFS server
Syntax: An absolute URL consisting of the file://mnt header (representing the mount point or exported directory) plus a valid NFS URI as per RFC 2224. The syntax is as follows:
[file://]mnt/nfs_server_ip/path/filename
where nfs_server_ip is the IP address of the external NFS server, path is the path fragment to be appended to the exported directory, and filename is the media file. If the file:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Syntax restrictions are as follows:
• Up to 255 characters can be used in total.
• Numbers, letters, and the underscore character are supported.
• Slash (/) is supported only to delimit levels of hierarchy.
• One period (.) is supported to delimit the file name from the file extension.
Examples: Suppose that:
• The IP address of the NFS server is 10.10.4.102
• The server is known by the DNS server as gecko
• The exported directory is /annc/myclips.
To play an audio clip welcome.wav located in /annc/myclips/audioclips, you can specify any of the following URIs:
file://mnt/10.10.4.102/audioclips/welcome.wav
mnt/10.10.4.102/audioclips/welcome.wav
file://mnt/gecko/audioclips/welcome.wav
mnt/gecko/audioclips/welcome.wav

Identifier Type: HTTP server
Syntax: A valid HTTP URL or URI. The syntax is as follows:
[http://]path/filename
If the http:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Syntax restrictions are as follows:
• Up to 255 characters can be used in total.
• Numbers, letters, and the underscore character are supported.
• Slash (/) is supported only to delimit levels of hierarchy.
• One period (.) is supported to delimit the file name from the file extension.
• File names can include a valid HTTP query.
Examples:
http://10.10.6.213/annc/myclips/audioclips/welcome.wav
10.10.6.213/annc/myclips/audioclips/welcome.wav
http://10.0.0.132/wavs/audio_handler?id=1234&sub=999

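The NFS example in Table 1-9 maps a file under the exported directory to a file://mnt URI by dropping the exported directory from the path. A minimal Python sketch of that mapping follows; the function name nfs_uri is hypothetical, and rejecting files outside the exported directory is an assumption added for safety.

```python
import posixpath


def nfs_uri(server: str, exported_dir: str, file_path: str) -> str:
    """Build a file://mnt URI for a file under an NFS exported directory.

    server may be the IP address or the DNS name of the NFS server.
    """
    relative = posixpath.relpath(file_path, exported_dir)
    if relative == ".." or relative.startswith("../"):
        # Assumption: files outside the exported directory are not addressable.
        raise ValueError("file is not under the exported directory")
    return "file://mnt/" + server + "/" + relative
```

Using the values from the table, nfs_uri("10.10.4.102", "/annc/myclips", "/annc/myclips/audioclips/welcome.wav") yields file://mnt/10.10.4.102/audioclips/welcome.wav.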
The following table shows how to specify indexed audio files in a VoiceXML document.
Table 1-10 Referencing Indexed Audio Files in VoiceXML

Identifier Type: Internal Announcement
Syntax: [file://]index, where index is the numeric index of the clip. The range for indexes is 1 to 50000. Indexed clips do not support hierarchical paths. If the file:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Examples:
file://729
729

Identifier Type: Internal Recording (Index Assigned by CA)
Syntax: [file://]index, where index is the clip index. The range for indexes is 2000001 to 2025000. Transient recordings do not support hierarchical paths. If the file:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Examples:
file://2000992
2000992

Identifier Type: Internal Recording (Index Assigned by MS)
Syntax: [file://]index, where index is the clip index. Transient recordings do not support hierarchical paths.
CMS-9000 or CMS-6000: For an MPC in slot n, the range is 5000001+(n*100000) to 5025000+(n*100000).
Examples:
For an MPC in slot 2, the range is 5200001 to 5225000:
file://5200992
5200992
For an MPC in slot 5, the range is 5500001 to 5525000:
file:///5500729
file://5500729
5500729
CMS-3000 or CMS-1000: The range is 5200001 to 5225000.
Examples:
file://5200992
5200992

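The slot-based range for recordings whose index is assigned by the media server is simple arithmetic. The Python sketch below reproduces the formula from Table 1-10; the function names are hypothetical.

```python
from typing import Tuple


def ms_recording_index_range(slot: int) -> Tuple[int, int]:
    """Return the (first, last) index for an MPC in the given slot.

    Implements 5000001+(n*100000) to 5025000+(n*100000).
    """
    offset = slot * 100000
    return (5000001 + offset, 5025000 + offset)


def index_in_slot(index: int, slot: int) -> bool:
    """Check whether a recording index falls in the slot's range."""
    first, last = ms_recording_index_range(slot)
    return first <= index <= last
```

For slot 2 the function returns (5200001, 5225000), which is also the fixed range given for the CMS-3000 and CMS-1000.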
Note: When using the VoiceXML interface, avoid using spaces in media file names; instead, encode the space as the escape character %20. The media server does accept file names that include spaces, but replaces them with the escape character %20 before passing the file along for further processing.
HTTP Queries
An HTTP URI can include a query component, instead of a straight request for a specific audio or multi-media resource, as in the following example:
<audio src="http://10.0.0.132/wavs/audio_handler?id=1234&amp;sub=999"/>
This example allows the server to dynamically select an audio file to play.
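An application server that generates VoiceXML documents might assemble such an element as in the following Python sketch. The function name audio_element is hypothetical; the sketch relies only on the standard library. Because the URI is placed inside an XML attribute, the ampersand that separates query parameters is written as &amp; in the document, and the XML parser restores it before the request is made.

```python
from typing import Dict
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr


def audio_element(handler_uri: str, params: Dict[str, object]) -> str:
    """Return an <audio> element whose src carries an HTTP query."""
    src = handler_uri + "?" + urlencode(params)
    # quoteattr adds the surrounding quotes and escapes '&', '<', and '>'.
    return "<audio src=" + quoteattr(src) + "/>"
```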
Relative URIs
VoiceXML documents are always stored on HTTP servers. References to VoiceXML documents are similar to those for clips stored on external HTTP servers. VoiceXML documents can be referenced by either an absolute URI or a relative URI. A relative URI is recognized by the absence of the protocol scheme (http:// or file://) before the path fragment and file specification. The media server converts a relative URI to an absolute URI by concatenating it with a base URI, which is either the URI of the fetching document or the value declared by using the xml:base attribute. A declared value takes precedence over the URI of the fetching document. The declaration can be made in multiple documents; the innermost declaration takes precedence. For example, suppose a VoiceXML document has a base URI of http://server2/path1/path2. Then consider the document reference in Example 1-6:
Example 1-6 Relative URI
<goto next="record.vxml"/>
This reference is a URI fragment. Accordingly, the VoiceXML interpreter considers it to be a relative URI. Thus, the base URI is concatenated with record.vxml, resulting in an HTTP GET to http://server2/path1/path2/record.vxml. In accordance with the precedence rules for determining the base URI, within record.vxml (that is, while record.vxml is executing), the following applies:
• If record.vxml itself has xml:base specified, the value of xml:base is used as the base URI while record.vxml is executing. In that case, the base URI of the calling document is ignored.
• If xml:base is not specified, the base URI for record.vxml is the URI that was used to fetch record.vxml. In this case, that is http://server2/path1/path2.
In Example 1-7, suppose a VoiceXML document again has a base URI of http://server2/path1/path2. Then consider the following document reference:
Example 1-7 Absolute URI
<goto next="http://newserver/path1/path2/path3/record.vxml"/>
This reference is to an absolute URI. The base URI is bypassed, resulting in an HTTP GET to http://newserver/path1/path2/path3/record.vxml. As in Example 1-6, within record.vxml, the following applies:
• If record.vxml itself has xml:base specified, the value of xml:base is used as the base URI while record.vxml is executing. In that case, the base URI of the calling document is ignored.
• If xml:base is not specified, the base URI for record.vxml is the URI that was used to fetch record.vxml. In this case, that is http://newserver/path1/path2/path3.
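The resolution rules in Examples 1-6 and 1-7 can be approximated with Python's urllib.parse.urljoin, as sketched below. The function name resolve_reference is hypothetical. Note one assumption: urljoin follows standard URI resolution, which replaces the last path segment of the base, so the base URI is written here with a trailing slash to reproduce the concatenation described above.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_reference(reference: str, fetch_uri: str,
                      xml_base: Optional[str] = None) -> str:
    """Resolve a document reference against the applicable base URI.

    A declared xml:base takes precedence over the URI of the fetching
    document. An absolute reference bypasses the base URI entirely.
    """
    base = xml_base if xml_base is not None else fetch_uri
    return urljoin(base, reference)
```

Resolving record.vxml against http://server2/path1/path2/ gives http://server2/path1/path2/record.vxml, while an absolute reference such as http://newserver/path1/path2/path3/record.vxml is returned unchanged.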
Sets and Variables

The media server supports dynamically rendering announcements from sets of clips, or by using clips referenced by variables. A set is a provisioned collection of audio clips together with an associated selector type and value. The selector is used to identify a specific physical audio clip within the set. A variable represents a semantic concept (such as date or number) which the media server uses to dynamically construct the appropriate audio segment. The media servers sets and variables feature is described in detail in the Convedia Media Server Sets and Variables Interface Reference Guide. That document describes how to create an audio segment configuration file for using sets and variables, and how to install it. It also describes the media servers support for implemented languages. In SIP-controlled media servers, sets and variables are available for MSML and VoiceXML. For more information about using sets and variables on the media server, please see the Convedia Media Server Sets and Variables Interface Reference Guide. The VoiceXML interface supports a subset of the media servers full sets and variables capability. Variables can be included in audio prompts using the <prompt> element. A variable is included by embedding the <say-as> element within the <prompt> element. The variable value itself is specified using a <value> element within the <say-as> element, or by a plain text string within the <say-as> element. The language in which the variable is to be rendered is indicated using the xml:lang attribute, either at the document level (that is, within the <vxml> element) or within the <prompt> element. Currently, only English (en) is supported. For more information on VoiceXML support for sets and variables, please see the <vxml> element, the <prompt> element, the <say-as> element, and the <value> element. For full details on the media servers sets and variables feature, please see the Convedia Media Server Sets and Variables Interface Reference Guide.
Chapter 2:
VOICEXML PROPERTIES
This chapter describes the media server's support for VoiceXML properties. This chapter presents the following information:
- Properties Overview
- Generic Speech Recognizer Properties
- Generic DTMF Recognizer Properties
- Prompt Properties
- Fetching Properties
- Fax Detection Property
Properties Overview
Properties are variable settings that can be used to affect the behavior of the VoiceXML interpreter, such as DTMF recognition, timeout intervals, caching policy, and so on. VoiceXML properties are set using the <property> element. In some cases, global properties can be overridden using an attribute. For example, the bargein property can be overridden by setting the bargein attribute in the <prompt> element. When values are not specifically assigned, properties inherit the platform defaults defined in this chapter. Any malformed property results in an error.semantic exception being thrown and in session termination. Table 2-1 summarizes the media server's support for VoiceXML properties.
Table 2-1 Property Support Summary

Property Class                          Example                                                               Reference
Generic DTMF Recognizer Properties      Interdigit timeout values                                             Page 41
Prompt Properties                       Prompt barge-in (interrupt)                                           Page 43
Fetching Properties                     Fetch timeout value for retrieving documents, grammars, and scripts   Page 43
Fax Detection Property                  CED fax tone                                                          Page 46
Generic Speech Recognizer Properties    inputmodes=voice                                                      Page 38
Object Fetching Properties              N/A                                                                   Ignored
Generic Speech Recognizer Properties

There are a number of properties that are activated as part of support for voice-based grammars. These property values are passed to the specified ASR speech recognizer when a grammar is activated through the equivalent MRCP header field. If a value is not specified, default values are used for these properties.
These values can be configured on the external speech server. Alternatively, the media server can be configured, through the management interface, to set these values through the control protocol. If VoiceXML is the control protocol, these values are set through the properties described in the following table. If MSML 1.1 is the control protocol, the default value listed for the property is set. Table 2-2 shows the mappings between VoiceXML properties and their equivalent MRCP header fields.
Table 2-2 MRCP Speech Recognizer Properties Support

Property: confidencelevel
MRCP Equivalent: confidence-threshold
Description: The required confidence level of the speech recognition. If the confidence level returned by the speech server falls below this value, a nomatch event is returned. The range is 0.0 to 1.0, where 0.0 means minimum confidence is needed and 1.0 means maximum confidence is needed. The default is 0.5.

Property: sensitivity
MRCP Equivalent: sensitivity-level
Description: The sensitivity level of the speech recognizer to voice input. The range is 0.0 to 1.0, where 0.0 is least sensitive and 1.0 is most sensitive. The default is 0.5.

Property: speedvsaccuracy
MRCP Equivalent: speed-vs-accuracy
Description: A hint specifying the desired balance between speed and accuracy in recognition. The range is 0.0 to 1.0, where 0.0 is fastest and 1.0 is best accuracy. The default is 0.5.

Property: completetimeout
MRCP Equivalent: speech-complete-timeout
Description: The amount of time the speech recognizer should wait after a speech input has been recognized and is considered a match before returning a match. Reasonable values are in the range of 0.3s to 1.0s. The default is 1.0s.

Property: incompletetimeout
MRCP Equivalent: speech-incomplete-timeout
Description: The amount of silence after receiving speech for the speech recognizer to wait before it finalizes the result. This timer applies only where the speech received does not so far match any current active grammar. This parameter also applies when the received speech matches an active grammar but where it is possible to speak further and match another grammar. The default is 1.0s.

Property: timeout
MRCP Equivalent: no-input-timeout
Description: The amount of time in seconds or milliseconds the speech recognizer should wait for speech to start. The default is 10s.

Property: (No VoiceXML equivalent)
MRCP Equivalent: recognition-timeout
Description: An MRCP property that has no equivalent in VoiceXML or MSML. Specifies the total time this recognition may take to complete. This is from, in effect, the start of the speech event through to the completion of recognition, as long as there is continuous voice being recognized. If the timer expires before the accumulated speech produces a match, a nomatch event is returned.

Property: fetchtimeout
MRCP Equivalent: fetch-timeout
Description: The amount of time a speech server should wait to retrieve an ASR grammar, as well as the time the media server waits to retrieve a VoiceXML script. The application should be aware that this property value applies to two entities: general document fetches and MRCP document fetches. The value is an interval in seconds in the format <number>s. The default value is 50s.

Property: (No VoiceXML equivalent)
MRCP Equivalent: Start-Input-Timers (MRCP v2), Recognizer-Start-Timers (MRCP v1)
Description: A Boolean value passed in the MRCP Recognize command that specifies whether the no-input-timeout timer should be started or not. For cases where a clip is being played prior to a collection, this value should be set to false. This value is set to false if there are concurrent announcements being played AND the announcement is bargeable; otherwise, it is set to true. For this MRCP header field it is assumed that any time a prompt is to be played prior to a voice collection this value will be set to false. If the prompt completes without being barged, the media server sends a RECOGNITION-START-TIMERS request to the speech server to start the timers running.

Property: maxnbest
MRCP Equivalent: n-best-list-length
Description: In VoiceXML, this property controls the size of the application.lastresult$ array. This array holds the various possible values for a specified collection. This field allows more than one alternative for a speech collection to be returned. The values returned are stored in the associated indices of the application.lastresult$ array.
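The mapping in Table 2-2 can be pictured as a simple lookup. The following Python sketch is illustrative only: the helper name mrcp_headers is hypothetical, values are kept as plain strings exactly as they would appear in a <property> element, and the actual encoding the media server uses on the MRCP connection is not modeled.

```python
# Mapping taken from Table 2-2: VoiceXML property name -> MRCP header field.
VXML_TO_MRCP = {
    "confidencelevel": "confidence-threshold",
    "sensitivity": "sensitivity-level",
    "speedvsaccuracy": "speed-vs-accuracy",
    "completetimeout": "speech-complete-timeout",
    "incompletetimeout": "speech-incomplete-timeout",
    "timeout": "no-input-timeout",
    "fetchtimeout": "fetch-timeout",
    "maxnbest": "n-best-list-length",
}

# Defaults listed in Table 2-2, applied when a property is not set.
DEFAULTS = {
    "confidencelevel": "0.5",
    "sensitivity": "0.5",
    "speedvsaccuracy": "0.5",
    "completetimeout": "1.0s",
    "incompletetimeout": "1.0s",
    "timeout": "10s",
    "fetchtimeout": "50s",
}


def mrcp_headers(properties):
    """Hypothetical helper: merge document properties over the defaults
    and rename them to their MRCP header fields."""
    merged = {**DEFAULTS, **properties}
    return {
        VXML_TO_MRCP[name]: value
        for name, value in merged.items()
        if name in VXML_TO_MRCP  # properties without an MRCP mapping are skipped
    }


headers = mrcp_headers({"confidencelevel": "0.7"})
print(headers["confidence-threshold"])  # value set by the document
print(headers["no-input-timeout"])      # default from Table 2-2
```

Properties that have no default in the table (such as maxnbest) simply do not appear in the result unless the document sets them.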
In addition to properties that have specific mappings to MRCP header fields, the media server implements the following properties to support voice:
Table 2-3 General Speech Property Elements

Property: inputmodes
Description: Specifies the input mode that is currently active within the defined scope. Supported values are as follows:
- dtmf: DTMF only is accepted as input.
- voice: Voice only is accepted as input.
- dtmf voice: Both DTMF and voice are accepted as input.
The default value for this property is configured in the media server's management interface. Note that these values are case-sensitive.

Property: externalserver
Description: A RadiSys extension. An application-defined property that can be used to specify the external ASR server for the current call. Limitations on this value are outlined below. The value is intended to match the External Server name as defined in the management interface and applies only to ASR servers.
Note that the externalserver property is external to the VoiceXML specification. It is defined to support applications that may want to access a specific server or set of servers using a load balancer based on server capabilities or ownership of servers. Limitations on this value are as follows:
- The current value of the property is included in all set-up requests. This includes ASR servers only; there is no equivalent for accessing TTS servers.
- The property applies to only one type of server.
- If set after the ASR connection has been established, this element has no effect.
- The media server does not validate the value specified in the property.
- If the external server is specified and does not map to an existing server, the service request fails.
Generic DTMF Recognizer Properties

Table 2-4 shows media server support for generic DTMF recognizer properties.
Table 2-4 Generic DTMF Recognizer Property Support

Property: timeout
Description: The interval after which, if DTMF or speech user input is not received, a noinput event is thrown. The value of this property can be overridden locally by setting the timeout attribute of the <prompt> element. The timer for this property starts whenever the media server transitions into a digit or speech collection state and no DTMF digits have yet been received (that is, there are no digits in the digit buffer) or no speech has been received. The timer stops on receipt of the first DTMF event or speech utterance, after which the inter-digit timeout timer takes effect. When used in a play-collect operation (within a <prompt> element), this value applies to the interval beginning when the prompt audio clip is queued, and not when the clip is played. The default is 10s (10 seconds).

Property: interdigittimeout
Description: The interval between DTMF digits after which, if exceeded, a nomatch event is thrown. The timer for this property starts after receipt of the first DTMF digit, and is reset each time a DTMF digit is received, until a match is achieved or the media server determines that a match is impossible. If a match is achieved, then the behavior depends on the value set for the termtimeout property. If the media server determines that a match is impossible, then a nomatch event is thrown. The default is 4s (4 seconds).

Property: termchar
Description: Specifies a single DTMF digit which, if detected prior to an inter-digit timeout, terminates DTMF collection. The termination key is not included in the resulting user input. A value of null indicates that no DTMF termination key is defined. By default, the pound sign (#) terminates DTMF input.

Property: termtimeout
Description: The interval the media server will wait when all digits expected according to the active grammar have been collected. For fixed-length grammars, this is when exactly the number of specified digits have been collected. For variable-length grammars, this is when the maximum number of digits for the grammar have been collected. Until the expected number of digits has been received, the media server waits for the next expected event. This may be either:
1. The next digit, or
2. The defined termchar, or
3. An inter-digit timeout.
Once the expected number of digits has been received, the behavior depends on whether the termtimeout property has been set. If termtimeout is NOT set (that is, when it is set to 0s, which is the default), collection immediately terminates. If termtimeout is set, the media server waits for the specified timeout interval before terminating collection. Setting a value for termtimeout allows (for instance) the possibility of matches to more than one grammar, of increasing specificity, so that the most specific grammar possible can be matched. The default is 0s (0 seconds).

Property: longdigitduration
Description: Ignored.
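The interaction of termchar and termtimeout in Table 2-4 can be sketched as a per-digit decision. The Python below is a simplified model, not the media server's implementation: the function name after_digit and the action labels are hypothetical, the grammar is reduced to a maximum digit count, and timer expiry itself is not simulated.

```python
from typing import Optional, Tuple


def after_digit(buffer: str, digit: str, max_digits: int,
                termchar: Optional[str] = "#",
                termtimeout_s: float = 0.0) -> Tuple[str, str]:
    """Return the updated digit buffer and the next action after one DTMF digit.

    max_digits stands in for the number of digits the active grammar expects
    (an assumption made to keep the sketch small).
    """
    # The termination key ends collection and is not added to the result.
    if termchar is not None and digit == termchar:
        return buffer, "terminate"

    buffer += digit

    # More digits are expected: the inter-digit timer governs the wait.
    if len(buffer) < max_digits:
        return buffer, "wait_interdigittimeout"

    # All expected digits collected: termtimeout decides what happens next.
    if termtimeout_s > 0:
        return buffer, "wait_termtimeout"
    return buffer, "terminate"  # termtimeout of 0s (the default)


print(after_digit("123", "4", max_digits=4))                     # default termtimeout
print(after_digit("123", "4", max_digits=4, termtimeout_s=2.0))  # termtimeout set
```

With the default termtimeout of 0s the fourth digit ends collection at once; with a non-zero value the sketch reports a further wait, which is what leaves room for a longer, more specific grammar to match.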
Timeout vs. Interdigit Timeout Properties

The timeout and interdigittimeout values are both associated with DTMF collections and can be set as properties anywhere within a VXML document where properties are allowed to be set. (Note that the timeout property is also associated with pre-speech timeouts for recordings. However, the discussion in this section considers only its use within DTMF collections, including play-collect sequences; its use and mappings for speech collection are not addressed here.)

Both the timeout and interdigittimeout values are associated with a digit collection and are not explicitly associated with the playing of audio clips.

The timeout timer starts whenever the media server transitions into a digit collection state AND there are currently no digits in the digit buffer. It stops when the first DTMF event is received. If it expires, a noinput event is thrown.

The interdigittimeout timer is reset prior to all DTMF collections. Note that resetting the timer both updates its value and ensures that it is started. When the media server transitions into a digit collection state, it resets this timer only after receiving the first digit, since if there are no digits the timeout timer is run.

The inter-digit timer (IDT) is started and stopped as follows:
- Requests to play announcements stop the IDT. It is restarted only if an explicit request to start it is made or a digit is received. For the latter case, the timer value is whatever was previously set. Thus, after an announcement completes, the IDT is not running.
- The IDT is stopped (if running) and restarted whenever a digit is received.
- The IDT is stopped and restarted with the new value whenever a reset timer event is issued by the media server.
- Once the IDT expires, it does not restart until either a digit is received or it is explicitly requested to run, through a reset timer event.
Prompt Properties
Table 2-5 shows media server support for prompt properties.
Table 2-5 Prompt Property Support

Property: bargein
Description: Specifies whether an audio prompt or TTS rendering can be interrupted (barged) by DTMF input. The value of this property can be overridden locally by setting the bargein attribute of the <prompt> element. Supported values are as follows:
- true: The prompt is bargeable, and DTMF input will interrupt play or TTS rendering. If any digits remain in the digit buffer at the time this element is executed, the clip is barged immediately and will not play.
- false: The prompt is not bargeable. Any digits currently in the digit buffer are cleared, and any digits received while clip(s) are playing are discarded.
The bargein property can interact with the cvd:cleardb attribute set in the <prompt> element, and in that case the behavior will vary depending on the contents of the digit buffer. For more information, please see the Usage Guidelines for the <prompt> element. The <prompt> element is documented beginning on page 122.

Property: bargeintype
Description: Ignored.
Fetching Properties
fetchhint Properties
In the VoiceXML specification, the fetchhint property defines when the interpreter context should retrieve the corresponding content from the server. A value of prefetch indicates that a file is to be downloaded when the page is loaded. A value of safe indicates that a file is only to be downloaded when actually needed.
The VoiceXML default value for fetchhint properties is prefetch. However, prefetching is not supported on the media server, and it always behaves as if the value is safe.
Table 2-6 fetchhint Property Support

Property             Description
fetchhint            Not supported. The media server always downloads a document only when actually needed (equivalent to a value of safe).
audiofetchhint       Ignored.
documentfetchhint    Ignored.
grammarfetchhint     Ignored.
scriptfetchhint      Ignored.
maxage Properties
In the VoiceXML specification, maxage properties ensure that the type of document the property governs does not use content whose age is greater than specified. These maxage properties are used in conjunction with corresponding maxstale properties to determine document fetching behavior. The maxage property is not supported. In general, the media server checks the date of the content on the server and fetches the content if it is newer than that on the media server.
Table 2-7 maxage Property Support

Property          Description
maxage            Not supported.
audiomaxage       Ignored.
documentmaxage    Ignored.
grammarmaxage     Ignored.
scriptmaxage      Ignored.
maxstale Properties
In the VoiceXML specification, maxstale properties indicate that the document is willing to use content that has exceeded its expiration time. These maxstale properties are used in conjunction with corresponding maxage properties to determine document fetching behavior.
The maxstale property is not supported, as shown in Table 2-8.

Table 2-8 maxstale Property Support

Property            Description
maxstale            Not supported.
audiomaxstale       Ignored.
documentmaxstale    Ignored.
grammarmaxstale     Ignored.
Other Fetch Properties

With the exception of fetchtimeout, the media server does not support other fetch properties, as shown in Table 2-9.

Table 2-9 Support for Other Fetch Properties

Property             Description
fetchaudio           Ignored.
fetchaudiodelay      Ignored.
fetchaudiominimum    Ignored.
fetchtimeout         Supported for VoiceXML scripts but not for audio.
Object Fetch Properties

The media server does not support object fetch properties, as shown in Table 2-10.
Table 2-10 Support for Object Fetch Properties

Property           Description
objectfetch        Ignored.
objectfetchhint    Ignored.
objectmaxage       Ignored.
objectmaxstale     Ignored.
Fax Detection Property

Table 2-11 shows media server support for the fax detection property. The fax detection property is a RadiSys extension, and is specific to the Convedia Media Server.
Table 2-11 Fax Detection Property Support

Property: com.cvd.faxdetect
Description: Enables or disables fax tone detection. Supported values are as follows:
- true: Fax tones are detected if they occur, and a com.cvd.event.faxdetect event is thrown.
- false: Fax tones are ignored.
If enabled within the scope of an element playing an announcement, collecting digits, or recording audio, the fax tone will interrupt the operation. Any associated shadow variables will be updated prior to the com.cvd.event.faxdetect event being thrown, as follows:
- If the fax tone interrupts a digit collection, the application.cvd_lastresult$.termcond shadow variable is set to FAX.
- If the fax tone interrupts a recording, the name$.termchar shadow variable is set to F.
The value of this property depends on the value set within the element currently being executed; that is, this property adheres to basic VoiceXML scoping rules. In addition, the setting is applied when the particular request is executed. For announcements, this means that fax detection is enabled or disabled when the associated audio clip is played, not when it is queued. By default, fax tone detection is disabled when a session is initiated.
Table 2-12 shows the interaction between the bargein property (or attribute) and the fax detection setting with respect to audio announcements.
Table 2-12 Interaction of bargein and Fax Tone Detection

bargein    Fax Tone Detection    Behavior
True       Disabled              Any DTMF digit will interrupt the announcement, but a fax tone will not. Fax tones are ignored.
True       Enabled               Any DTMF digit will interrupt the announcement, and a fax tone will also interrupt the announcement.
False      Disabled              The announcement is not interruptible: neither DTMF digits nor a fax tone will interrupt the announcement.
False      Enabled               No DTMF digit will interrupt the announcement; DTMF digits are ignored. However, a fax tone will interrupt the announcement.
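Read row by row, Table 2-12 says that the two settings act independently: bargein alone decides whether DTMF interrupts an announcement, and the fax detection setting alone decides whether a fax tone does. A minimal Python sketch of that reading follows; the function name interrupt_sources is hypothetical and the sketch ignores scoping and the digit buffer.

```python
def interrupt_sources(bargein: bool, faxdetect: bool) -> dict:
    """Which inputs can interrupt an announcement, per Table 2-12."""
    return {
        "dtmf": bargein,    # DTMF interrupts only when the prompt is bargeable
        "fax": faxdetect,   # a fax tone interrupts only when detection is enabled
    }


# Print the four rows of the table.
for bargein in (True, False):
    for faxdetect in (False, True):
        print(bargein, faxdetect, interrupt_sources(bargein, faxdetect))
```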
Table 2-13 shows the interaction between the dtmfterm attribute of the <record> element and the fax detection setting with respect to recordings.
Table 2-13 Interaction of dtmfterm and Fax Tone Detection

dtmfterm    Fax Tone Detection    Behavior
True        Disabled              Any DTMF digit will interrupt the recording, but a fax tone will not. Fax tones are ignored.
True        Enabled               Any DTMF digit will interrupt the recording, and a fax tone will also interrupt the recording.
False       Disabled              The recording is not interruptible: neither DTMF digits nor a fax tone will interrupt the recording.
False       Enabled               No DTMF digit will interrupt the recording; DTMF digits are ignored. However, a fax tone will interrupt the recording.
Chapter 3:
DTMF AND VOICE GRAMMARS
This chapter describes the media server's support for DTMF and voice grammars in VoiceXML. The following information is presented:
- Overview
- Input Mode
- Menu-Choice Grammars
- Option Grammars
- SRGS Grammars
- Arbitrary Grammars
- Built-In Grammars
- Maximum Length of Grammars
Overview
This section presents the following topics:
- DTMF Grammars
- Speech Grammars

The grammar definitions of a VoiceXML application provide a standard mechanism of validating user input. The grammar defines a set of rules, which are applied to user input to validate it. User input can take the form of either DTMF key presses or of speech (voice) utterances. Each form of input has its own grammars.

Grammars can be defined according to the XML-based W3C Speech Recognition Grammar Specification (SRGS) language [11]. SRGS supports both grammars for speech recognition and grammars for DTMF user input validation. In addition to the above, the media server has a set of general-purpose grammars built into the VoiceXML interpreter. These built-in grammars ease application development, in that they do not have to be defined using SRGS.

As with VoiceXML dialogs and subdialogs, a set of commonly used grammar rules can be maintained as a library. A grammar definition can be embedded within the application or it can be referenced from an externally located file.

The concept of scope also applies to grammars. Multiple grammars may be active at the same time. For instance, when a grammar is defined with scope that applies to the entire VoiceXML document, then the grammar is active during all input collection phases. This mechanism is useful when defining common or global user input action items, such as "Press *9 at any time to receive help."
DTMF Grammars
DTMF grammars define rules for collecting and validating user input supplied as DTMF key presses. DTMF grammars can be specified in either of the following ways:
- As a built-in grammar (see page 55)
- As an XML-based SRGS grammar (see page 53)
Speech Grammars
Speech grammars define rules for collecting and validating user input supplied as speech utterances. The processing of speech grammars is performed by the external speech server, not by the media server. Speech grammars can be specified in either of the following ways:
- As a built-in grammar (see page 55)
- As an XML-based SRGS grammar (see page 53)

The actual support for speech grammars depends on the external speech server deployed. Provided the input mode (however defined) is voice, all grammars are passed directly to the speech server for evaluation. The determination of support for these grammars is then made by the external speech server.
Input Mode
The media server supports three modes of input:
- DTMF
- Voice
- DTMF and voice

The VoiceXML Specification [13] defines different input defaults for different grammar types. These are shown in Table 3-1.

Table 3-1 Default Input Modes for VoiceXML Grammars

Grammar Type    Default Input Mode
XML-SRGS        Voice
Built-In        DTMF and Voice
Menu-Choice     No explicit default. The way in which the grammar is defined determines the input mode for the grammar.
Option          DTMF and Voice
Table 3-2 shows how the input mode is determined on the media server.
Table 3-2 Mechanisms for Setting Input Mode Scope

Mechanism: Configured input mode
Description: The default input mode as configured using the media server's management interface. The default is DTMF.
Scope/Precedence: Scope: Session. Precedence: Lowest.

Mechanism: inputmodes attribute
Description: An attribute of the <property> element that defines the input mode. If this attribute is not set, the value configured through the management interface is used. The starting (default) value is that set through the management interface.
Scope/Precedence: Scope: Depends on scope of the <property> element. May be as high as Application or as low as Dialog. Precedence: Higher than configured input mode.
Table 3-3 shows how the input mode interacts with the mode of the grammar, as defined by the mode attribute of the <grammar> element.
Table 3-3 Interaction of Input Mode and Grammar Mode

Input Mode: DTMF
Grammar Mode: DTMF
Behavior: The media server detects, collects, and parses DTMF input.

Input Mode: DTMF
Grammar Mode: Voice
Behavior: No grammars are active. The media server behaves as if no grammars were present in the script. Digits cannot barge clips and are not buffered. NOINPUT is reported for all collections.

Input Mode: Voice
Grammar Mode: DTMF
Behavior: No grammars are active. The media server behaves as if no grammars were present in the script. Digits cannot barge clips and are not buffered. NOINPUT is reported for all collections.

Input Mode: Voice
Grammar Mode: Voice
Behavior: The voice grammar is passed to the external speech server. The external speech server detects, collects, and parses the voice collection and passes the results to the media server as an NLSML script. DTMF digits are ignored.

Input Mode: DTMF and Voice
Grammar Mode: DTMF
Behavior: No voice grammar is activated. Only DTMF collection is valid.

Input Mode: DTMF and Voice
Grammar Mode: Voice
Behavior: Only the voice grammar is activated. DTMF digits are ignored. (In some cases two grammars would be required.)

Input Mode: DTMF and Voice
Grammar Mode: DTMF and Voice
Behavior: Both DTMF and voice grammars are active. DTMF input cancels the voice grammar and any voice input received until that point.
Menu-Choice Grammars
Menu-choice grammars are a simple mechanism for allowing the user to make a choice and for transitioning application control to another location based on the user's choice. Using audio prompts, the menu offers the user a set of choices, after which it waits for user input. The dialog transitions based on the user input. A menu-choice grammar can concurrently define both a DTMF and a speech grammar. Menu-choice grammars are implemented using the <menu> element and the <choice> element; please see those elements for details.
Option Grammars
Option grammars are a relatively simple way to specify grammars for collecting and processing user input. Simple DTMF or speech sequences are specified within the <option> element. The value attribute is assigned to the result of the collection, based on the option that was matched. An option grammar can concurrently define both a DTMF and a speech grammar. Option grammars are implemented using the <option> element; please see that element for details.
SRGS Grammars
SRGS grammars are grammars defined according to the XML-based W3C Speech Recognition Grammar Specification (SRGS) language [11]. The SRGS standard supports grammars both for DTMF user input and for speech recognition. SRGS grammars consist of SRGS elements. The section "SRGS Elements" on page 9 shows the SRGS elements supported by the media server.

The scope of a grammar rule can be either private or public. If the rule's scope is private, then the rule can be referenced only from other rules in the local grammar. If the rule's scope is public, and if the rule is activated for recognition, then the rule can also be referenced from other grammars.

XML-SRGS grammars can be defined either inline (that is, internal to the VoiceXML document) or external.
Inline SRGS Grammars

Inline grammars are defined within the VoiceXML document. Inline grammars are defined using the supported set of XML-based SRGS elements described in Table 1-3 on page 10. Inline grammars follow the grammar scoping rules as defined by the W3C VoiceXML 2.0 Specification [13]. Example 3-1 and Example 3-2 each show an inline SRGS grammar.

In Example 3-1, the grammar produces a match if the user enters exactly one of 0, 1, 2, 3, 4, *9, or #9. Any other form of user input generates a nomatch event.
Example 3-1 Inline SRGS DTMF Grammar

<grammar mode="dtmf">
  <one-of>
    <item>
      <one-of>
        <item> 0 </item>
        <item> 1 </item>
        <item> 2 </item>
        <item> 3 </item>
        <item> 4 </item>
      </one-of>
    </item>
    <item>
      <one-of>
        <item> * 9 </item>
        <item> # 9 </item>
      </one-of>
    </item>
  </one-of>
</grammar>
In Example 3-2, the grammar produces a match if the user utters exactly one of zero, one, two, three, four, star nine, or pound nine. Any other form of user input generates a nomatch event.
Example 3-2 Inline SRGS Voice Grammar

<grammar mode="voice">
  <one-of>
    <item>
      <one-of>
        <item> zero </item>
        <item> one </item>
        <item> two </item>
        <item> three </item>
        <item> four </item>
      </one-of>
    </item>
    <item>
      <one-of>
        <item> star nine </item>
        <item> pound nine </item>
      </one-of>
    </item>
  </one-of>
</grammar>
External SRGS Grammars

External SRGS DTMF grammars are specified in exactly the same way as inline SRGS DTMF grammars, but in a separate VoiceXML document. The source of the external DTMF grammar is specified as a URI, and the document may be fetched using HTTP when the grammar is required by the VoiceXML interpreter, depending on the media server's VoiceXML input mode and the mode of the grammar:
- If the mode attribute is set to dtmf, the grammar is fetched and parsed by the media server using HTTP.
- If the mode attribute is set to voice, the external URL is passed unchanged to the external speech server, which assumes responsibility for the grammar.
- If the mode attribute is unspecified, then the behavior of the media server depends on the input mode configured through the media server's management interface:
  - If the configured input mode is dtmf, the grammar is fetched and parsed by the media server unless it determines that the mode is voice. As soon as the media server determines that the mode is voice, it stops parsing and sends the grammar to the speech server for processing.
  - If the configured input mode is voice, the external URL is passed unchanged to the external speech server, which assumes responsibility for the grammar.
  - If the configured input mode is dtmf and voice, the document is fetched by the media server and the decision of how to parse is made by determining whether or not a <grammar> element is present. If it is, the input mode is assumed to be voice; otherwise, the input mode is assumed to be DTMF.
Arbitrary Grammars
The media server has internal support for menu-choice, option, XML-SRGS, and built-in grammars. Some speech servers also support arbitrary grammars, such as ABNF grammars, if specified within the <grammar> element. Provided that the input mode is voice, all grammars are passed directly to the speech server for evaluation. The determination of support for the arbitrary grammar is then made by the speech server.
Built-In Grammars
In addition to SRGS grammars, there is a set of grammars built into the media server's VoiceXML interpreter. These are designed to facilitate development by eliminating the need to use SRGS for simple, general-purpose grammars. No XML definition is required to use these grammars. For speech grammars, the built-in grammars are converted to XML-SRGS grammars before being passed to the speech server.

Built-in grammars are specified by using either the type attribute of the <field> element, or the src attribute of the <grammar> element. Built-in grammars are implicitly active for both DTMF and speech user input; however, some built-in grammar types (for example, Date or Currency grammars) are designed specifically for DTMF. For these built-in grammar types, collection and interpretation by the speech server may yield unpredictable results. If a built-in grammar type does not explicitly specify valid input values for voice, you should assume that the built-in grammar is valid for DTMF only.

For any built-in DTMF grammar, the media server can accumulate at most 30 DTMF digits. If not otherwise constrained, all grammars terminate upon receipt of the 30th digit. The received digits are then evaluated based on the specific grammar associated with the collection. Limitations on the length of speech depend on the external speech servers deployed.

The following built-in grammar types are defined:
- Boolean
- Date
- Digits
- Currency
- Number
- Phone
- Time

All speech grammars defined within a <field> element are converted to the equivalent <grammar> element grammar (that is, all built-in grammars are converted to an SRGS grammar) before the grammar is passed to the speech server. Table 3-4 shows examples of this conversion.
Table 3-4 Conversion of Built-In Speech Grammars to XML-SRGS Grammars

Grammar Type: Boolean
Built-In Representation:
  <field type="boolean">
  <grammar src="builtin:dtmf/boolean"/>
Grammar String Sent to Speech Server:
  <grammar mode="voice" src="builtin:grammar/boolean"/>

Grammar Type: Currency
Built-In Representation:
  <field type="currency">
  <grammar src="builtin:dtmf/currency"/>
Grammar String Sent to Speech Server:
  <grammar mode="voice" src="builtin:grammar/currency"/>

Grammar Type: Date
Built-In Representation:
  <field type="date">
  <grammar src="builtin:dtmf/date"/>
Grammar String Sent to Speech Server:
  <grammar mode="voice" src="builtin:grammar/date"/>

Grammar Type: Digits
Built-In Representation:
  <field type="digits?length=1">
  <grammar src="builtin:dtmf/digits?length=1"/>
Grammar String Sent to Speech Server:
  <grammar mode="voice" src="builtin:grammar/digits?length=1"/>

Grammar Type: Number
Built-In Representation:
  <field type="number">
  <grammar src="builtin:dtmf/number"/>
Grammar String Sent to Speech Server:
  <grammar mode="voice" src="builtin:grammar/number"/>

Grammar Type: Phone
Built-In Representation:
  <field type="phone">
  <grammar src="builtin:dtmf/phone"/>
Grammar String Sent to Speech Server:
  <grammar mode="voice" src="builtin:grammar/phone"/>

Grammar Type: Time
Built-In Representation:
  <field type="time">
  <grammar src="builtin:dtmf/time"/>
Grammar String Sent to Speech Server:
  <grammar mode="voice" src="builtin:grammar/time"/>
Boolean
Boolean grammars accept a string of one or more DTMF digits, and assign a string value of true or false based on the digits entered. By default, the key 1 corresponds to true and 2 corresponds to false for DTMF grammars. For voice grammars, Yes corresponds to true and No corresponds to false. DTMF bindings may be changed by appending an HTTP URI-style keyword=value query syntax to the grammar type. The keywords y and n are accepted as alternatives to true and false, respectively. Bindings may be digit sequences of variable length.
Example 3-3 Boolean Built-In Grammar
boolean?y=4;n=5, boolean?n=31;y=32
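The binding syntax in Example 3-3 can be illustrated with a short Python sketch. It is a model of the rules described above, not the media server's parser. The function name parse_boolean_bindings and the restriction of binding values to the digits 0 through 9 are assumptions made for illustration.

```python
def parse_boolean_bindings(grammar_type):
    """Map each DTMF sequence to "true" or "false" for a boolean type string."""
    # Defaults described in the text: key 1 is true, key 2 is false.
    bindings = {"true": "1", "false": "2"}
    # y and n are accepted as alternatives to true and false.
    aliases = {"y": "true", "n": "false", "true": "true", "false": "false"}
    _, _, query = grammar_type.partition("?")
    for pair in filter(None, query.split(";")):
        key, _, value = pair.partition("=")
        # Assumption: binding values consist of the digits 0-9 only.
        if key not in aliases or not value.isdigit():
            raise ValueError(f"unsupported boolean binding: {pair!r}")
        bindings[aliases[key]] = value
    return {digits: result for result, digits in bindings.items()}


print(parse_boolean_bindings("boolean?y=4;n=5"))
print(parse_boolean_bindings("boolean?n=31;y=32"))
```

With the second type string, the caller would enter 32 for true and 31 for false.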
Currency
Currency grammars accept entry of a variable number of DTMF or voice digits and the asterisk (*) key. Entries are assigned to a string in the format mm.nn, where mm corresponds to zero or more digits in the major currency unit, and nn corresponds to zero or more digits in the minor currency unit. The asterisk key is used as the decimal point to separate the major and minor currencies. With the exception of leading zeros, which are removed from the string, all entered digits are included in the resulting string.
Example 3-4 Currency Built-In DTMF Grammar
builtin:dtmf/currency
Date
Date grammars accept 2-, 4-, 6-, and 8-character DTMF or voice collections. These represent, respectively, day (dd), month and day (mmdd), year and month (yyyymm), and year, month, and day (yyyymmdd). The digits entered must form a valid date. The year component can be any four digits; that is, no validation is performed on it. The month must be between 01 and 12. The day must be between 01 and 31. Days are not checked for validity against the specified month. An error.nomatch event is thrown for invalid dates. Note that, unlike the specification [13], no question mark characters (?) are used to pad the input. Only the digits received are returned.
Example 3-5 Date Built-In DTMF Grammar
builtin:dtmf/date
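The validation rules above can be sketched in Python as follows. This is an illustrative model only. The function name check_date is hypothetical, and the sketch returns False where the media server would throw error.nomatch.

```python
# Field layouts for the four accepted collection lengths.
LAYOUTS = {2: ["dd"], 4: ["mm", "dd"], 6: ["yyyy", "mm"], 8: ["yyyy", "mm", "dd"]}
# Only month and day are range-checked; the year is not validated.
LIMITS = {"mm": (1, 12), "dd": (1, 31)}


def check_date(digits):
    """Return True if the digit string passes the checks described above."""
    if not digits.isdigit() or len(digits) not in LAYOUTS:
        return False
    pos = 0
    for part in LAYOUTS[len(digits)]:
        value = int(digits[pos:pos + len(part)])
        pos += len(part)
        low, high = LIMITS.get(part, (0, 9999))
        if not low <= value <= high:
            return False
    return True


print(check_date("0231"))    # day is not checked against the month
print(check_date("201013"))  # month 13 is out of range
```

Because days are not checked against the month, an entry such as 0231 passes this check.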
Digits
Digits grammars accept entry of a variable number of DTMF or speech digits. The number of digits accepted may be constrained by appending an HTTP URI-style keyword=value query syntax to the grammar type. Keywords accepted are minlength, maxlength, and length, where length specifies the exact number of digits accepted. An error.badfetch is thrown if there is a conflict between the keyword values. All digits are included in the resulting string.
Example 3-6 Digits Built-In Grammar
digits?minlength=2;maxlength=8
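The following Python sketch shows one way to read the length keywords. It is a model for illustration, not the interpreter's implementation. The guide does not define what counts as a conflict, so the two conflict checks here (length combined with minlength or maxlength, and minlength greater than maxlength) are assumptions, as are the default minimum of one digit and the function name. The 30-digit ceiling comes from the built-in DTMF limit described earlier in this chapter.

```python
def parse_digits_constraints(grammar_type, hard_limit=30):
    """Return (min_digits, max_digits) for a digits built-in type string."""
    _, _, query = grammar_type.partition("?")
    params = {}
    for pair in filter(None, query.split(";")):
        key, _, value = pair.partition("=")
        if key not in ("minlength", "maxlength", "length") or not value.isdigit():
            raise ValueError(f"error.badfetch: bad parameter {pair!r}")
        params[key] = int(value)
    if "length" in params:
        # Assumption: an exact length cannot be combined with a range.
        if "minlength" in params or "maxlength" in params:
            raise ValueError("error.badfetch: length conflicts with minlength/maxlength")
        low = high = params["length"]
    else:
        low = params.get("minlength", 1)  # assumed default
        high = params.get("maxlength", hard_limit)
    if low > high:
        raise ValueError("error.badfetch: minlength exceeds maxlength")
    return min(low, hard_limit), min(high, hard_limit)


print(parse_digits_constraints("digits?minlength=2;maxlength=8"))
```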
Number
Number grammars are identical to currency grammars, except that:
- The asterisk (*) key is interpreted as a decimal point, rather than a currency separator.
- None of minlength, maxlength, and length are specified.
Leading zeros are removed from the resulting string. This allows the result to be used in an ECMAScript expression, because ECMAScript would interpret a leading 0 as representing an octal value.
Example 3-7 Number Built-In Grammar
builtin:dtmf/number
Phone
Phone grammars behave identically to number grammars, except that:
- All digits entered are included in the resulting string.
- The asterisk key (*) is interpreted as representing an extension. For example, 8005551212*123 results in a returned string of 8005551212x123.
Example 3-8 Phone Built-In Grammar
builtin:dtmf/phone
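The extension rule can be expressed as a one-line transformation. The Python sketch below is illustrative only; the function name phone_result is hypothetical, and rejecting keys other than digits and the asterisk is an assumption.

```python
def phone_result(keys):
    """Return the result string for a phone collection: all digits kept, * becomes x."""
    if not keys or any(k not in "0123456789*" for k in keys):
        raise ValueError(f"unexpected key sequence: {keys!r}")
    return keys.replace("*", "x")


print(phone_result("8005551212*123"))
```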
Time
Time grammars accept entry of three or four DTMF digits representing a time, and return a five-character string in the format hhmmx, where hh is the hours between 00 and 24, mm is minutes between 00 and 59, and x is either h (for a 24-hour clock) or ? if the entry is ambiguous between a 12- and 24-hour clock. Because morning (AM) cannot be unambiguously expressed in DTMF, ? will be a common termination. If only three digits are entered, the media server adds a leading zero to the string.
Example 3-9 Time Built-In Grammar
builtin:dtmf/time
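As a rough illustration of the hhmmx format, the Python sketch below pads a three-digit entry and appends the suffix. The guide does not state exactly which entries are treated as ambiguous, so the rule used here (hours 01 through 12 are ambiguous, all others are 24-hour) is an assumption, as is the function name. The sketch returns None where the entry does not form a time.

```python
def time_result(digits):
    """Return the five-character hhmmx string, or None if the entry is not a time."""
    if not digits.isdigit() or len(digits) not in (3, 4):
        return None
    padded = digits.zfill(4)  # a three-digit entry gets a leading zero
    hours, minutes = int(padded[:2]), int(padded[2:])
    if hours > 24 or minutes > 59:
        return None
    # Assumption: hours 01-12 could be read on either a 12- or 24-hour clock.
    suffix = "?" if 1 <= hours <= 12 else "h"
    return padded + suffix


print(time_result("930"))
print(time_result("1745"))
```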
Maximum Length of Grammars

All grammars, whether inline or external SRGS grammars or built-in, have a maximum number of digits that can be collected. Regardless of how the grammar is defined, the grammar has at most one maximum length, which is the maximum number of digits that can be collected before the grammar has been satisfied. For simple built-in grammars (for example, digits?length=4) this is explicitly stated. For other grammars, the maximum length is implicitly determined by evaluating the grammar. Example 3-10 shows an SRGS grammar.
Example 3-10 Variable Maximum Digit Length in a DTMF Grammar
<grammar version="1.0" type="application/srgs+xml" mode="dtmf" root="root">
  <rule id="root" scope="public">
    <one-of>
      <item> 1 <item> 3 </item> </item>
      <item repeat="0"> 3 </item>
      <item repeat="3"> 4 </item>
      <item repeat="3-5"> 5 </item>
      <item repeat="4-"> 6 </item>
      <item repeat="0-1"> 8 </item>
      <item> 9 </item>
    </one-of>
  </rule>
</grammar>
In this example, the possible maximum lengths are 1, 2, 3, and 30 digits. Since only one maximum length can be associated with a grammar at any time, the longest maximum length (in this case 30) is used.
This means that, for a given grammar, digit collection may not end immediately at the first input that satisfies one possible maximum length. In Example 3-10, the fourth item has a length of 3; as such, an input string of 444 might be expected to end collection immediately. Instead, the maximum length is the longest possible maximum length, 30. In this case, the inter-digit timer is started and the system waits to see if additional input will be forthcoming. If no additional input is received within the inter-digit timeout interval, collection ends and the input string 444 is accepted as satisfying the fourth grammar item. Thus, for a grammar with variable-length items, collection is terminated only by either the longest possible maximum length or an inter-digit timeout.
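The termination rule can be modeled with a few lines of Python. This is a simplified sketch: it covers only the maximum-length and inter-digit timeout conditions and ignores grammar matching. The function names are hypothetical, and None is used here to stand for an unbounded repeat such as repeat="4-".

```python
MAX_DTMF_DIGITS = 30  # system-wide ceiling on collected DTMF digits


def effective_max_length(alternative_max_lengths):
    """Return the single maximum length used for a grammar with several alternatives."""
    capped = [
        MAX_DTMF_DIGITS if n is None else min(n, MAX_DTMF_DIGITS)
        for n in alternative_max_lengths
    ]
    return max(capped)


def collection_ends(digit_count, alternative_max_lengths, interdigit_timed_out):
    """Collection stops at the longest maximum length or on inter-digit timeout."""
    limit = effective_max_length(alternative_max_lengths)
    return interdigit_timed_out or digit_count >= limit


# Alternatives with one unbounded repeat: 444 alone does not end collection.
lengths = [2, 3, 5, None, 1]
print(effective_max_length(lengths))
print(collection_ends(3, lengths, interdigit_timed_out=False))
print(collection_ends(3, lengths, interdigit_timed_out=True))
```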
Grammar Evaluation
All DTMF collections are evaluated in real time as digits are received. All matches and no-matches (for example, if the current digit results in a match or an impossible match) are recognized and reported as soon as the current digit is evaluated. For voice grammars the evaluation of incoming speech against the currently defined grammar is performed by the external speech server. The specific behavior will depend on the speech server employed.
Chapter 4:
VOICEXML 2.0 ELEMENTS
This chapter describes the VoiceXML 2.0 elements currently supported by the Convedia Media Server, including SRGS and SSML elements. The VoiceXML 2.0 language is defined by the W3C Candidate Recommendation specifying the language [13]. Any features of VoiceXML specified in the Recommendation but not in this guide are not supported in this release of the Convedia Media Server. Any features of VoiceXML specified in this guide but not in the Recommendation are extensions to the specification.
<assign>
Assigns a value to a variable.
Parent element: <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, <nomatch>
Child elements: None.
Attributes
name
Mandatory. The name of the variable being updated.
expr
Mandatory. An ECMAScript expression representing the new value of the variable.
Usage Guidelines
Use this element to assign a value to a variable. Note that the maximum size of a variable name (whether assigned, or newly created using the <var> element) is 256 characters. If the variable name exceeds this length, an error.semantic is thrown.
<audio>
Plays an audio clip or multimedia file or renders a text-to-speech clip.
Parent element: <audio>, <block>, <catch>, <desc>, <emphasis>, <error>, <field>, <filled>, <help>, <if>, <initial>, <mark>, <menu>, <noinput>, <nomatch>, <p>, <phoneme>, <prompt>, <prosody>, <record>, <s>, <say-as>, <sub>, <subdialog>, <voice>
Child elements: <audio>, <break>, <emphasis>, <mark>, <p>, <prosody>, <s>, <say-as>, <value>, <voice>
Note that the SSML elements <break>, <desc>, <emphasis>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, and <voice> do not appear as children or parents of the <audio> element within the XML schema.
Attributes
src
A URI or numeric index representing the media clip or TTS string to be played. A URI must comply with the XML anyURI format. In addition, the URI or numeric index must comply with the constraints described in the section Working with Media Files and TTS Strings on page 28. Exactly one of src and expr must be specified; otherwise, an error.badfetch is thrown.
expr
An ECMAScript expression evaluating to the URI or numeric index of the media clip or TTS string to be played. A URI resulting from the expression must comply with the XML anyURI format. In addition, the URI or numeric index resulting from the expression must comply with the constraints described in the section "Working with Media Files and TTS Strings" on page 28. Exactly one of src and expr must be specified; otherwise, an error.badfetch is thrown.
fetchhint
Optional. Ignored.
fetchtimeout
Optional. Ignored.
maxage
Optional. Ignored.
maxstale
Optional. Ignored.
Usage Guidelines
The <audio> element requests the media server to play an audio clip or multimedia clip, or to render text-to-speech strings.
Audio, Video, and Multimedia Clips

Clips are played to completion, unless user input interrupts (barges) the clip. Barging can be allowed or disallowed by setting the bargein property. For a description of this property, please see the section "Prompt Properties" on page 43. The type of input that can interrupt a multimedia clip is specified using the inputmodes attribute of the <property> element.

The audio to be played is identified by a URI specified in either the src or expr attribute. The referenced audio file can be stored internally (a provisioned clip) or externally on an NFS or HTTP server. Internal clips are specified using the file: scheme. External clips are specified using either the file: scheme (for NFS) or the http: URI scheme (for HTTP) and are fetched using the HTTP protocol or NFS.

The use of a base URI is supported for both the src and expr attributes. The base URI can be defined using attributes in the <vxml> or <prompt> elements, or may be defined relative to the actual document. The base URI is applied if:
- The URI specified in the src attribute does not start with either the file: or http: scheme, OR
- The URI resulting from evaluation of the expression specified in the expr attribute does not start with either the file: or http: scheme AND it is not a numeric value greater than 50,000. (Numeric values greater than 50,000 represent indexes of internally recorded clips and, as such, do not have a base URI prepended to them.)

For complete information on referring to media clips, please see the section "Working with Media Files and TTS Strings" on page 28. The element may have optional audio, such as alternate audio, silence, or text, with the restrictions described in the section "Audio Clip Errors" on page 65.

Audio, video, or multimedia clips can be played individually or specified as a sequence of multiple clips. All clips must be of the same media type (for example, all audio or all multimedia). If media types are mixed, the media server plays up to the first clip containing different media and then fails. Within a single media type, file extensions can vary. For example, you can play a series consisting of WAVE files mixed with audio-only QuickTime files. Each clip in a sequence is considered individually: if a clip cannot be played, it is skipped and the next clip in the sequence is played, if possible. Also, the failure to play a clip does not cause a session to fail.
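The base URI conditions described in this section can be summarized in a short Python sketch. It is an illustrative model, not the media server's resolution logic. The function name base_uri_applies, the from_expr flag, and the example clip paths are hypothetical.

```python
def base_uri_applies(value, from_expr=False):
    """Return True if the base URI would be prepended to this src or expr value."""
    # Values that already carry a file: or http: scheme are used as-is.
    if value.startswith(("file:", "http:")):
        return False
    # For expr results, numeric values above 50,000 are indexes of
    # internally recorded clips and never take a base URI.
    if from_expr and value.isdigit() and int(value) > 50000:
        return False
    return True


print(base_uri_applies("prompts/welcome.wav"))
print(base_uri_applies("http://media.example.com/welcome.wav"))
print(base_uri_applies("50001", from_expr=True))
```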
Text to Speech Strings

Text to Speech (TTS) strings can be specified within the body of an <audio> or <prompt> element. If the TTS string is in the body of an <audio> element, it is assumed to be an alternate audio string. A TTS string may be any of the following:
- Plain text
- Encoded Speech Synthesis Markup Language (SSML)
- Text with embedded SSML elements
- A URI to an external SSML file, expressed using either the src or expr attribute. For external URIs, the file extensions SSML, CSSML, and TXT are supported. Any other file extension is assumed to be a media clip.
An <audio> element defined (that is, embedded) within a TTS string is only valid if there are active external TTS servers. For systems that do not deploy TTS servers, the entire string, including any embedded <audio> elements, is ignored.
Audio Clip Errors

An audio prompt may consist of a single clip or a sequence of clips. As long as the specification of the audio clip is syntactically valid, the media server treats the clip as if it has been successfully played, regardless of any errors that might occur subsequently. For example, the request to play a non-existent clip is considered successful so long as the clip specification is syntactically valid. From the media server's perspective, the clip completes and the session transitions to the next element in the document. In sequences, this means that audio clips that fail to play are skipped, and failure to play a clip does not affect other clips in the sequence. For example, if the second clip of a four-clip sequence fails to play, the third and fourth clips will be played (if possible). Note that failure to play a clip does not result in failure of the session. The only exception to this behavior is if the request to play an audio clip fails because of an overload condition, such as service overload. In this case, the session terminates.
Alternate Audio
The media server supports the use of alternate audio or silence. Alternate audio gives an application the means to specify an audio clip, multimedia clip, or TTS string to be played in case the primary clip fails. The primary clip can be either an audio or multimedia clip; it cannot be a TTS clip embedded within the VoiceXML document for the purpose of playing alternate audio. However, a TTS clip can be specified as an external URI. In that case, if the external speech server fails the playing of the clip (for example, because the file was not found or a parse error occurred), and alternate audio is defined, the alternate audio is queued and requested to be played.

Alternate audio is specified by including a second <audio> element nested within the first. Silence is played by including a <break> element nested within the <audio> element.

Alternate audio is played, when specified, if:
- The requested primary clip(s) are not found. However, if the clip was started but failed prematurely, the alternate audio is not played. If a series of clips is specified as the primary clips and at least one of them plays, the alternate audio is not played.
- The primary clip is specified as an ECMAScript expression using the expr attribute and the ECMAScript variable does not exist. In this case, an ECMAScript error is thrown after evaluating the expr attribute. If this occurs, and an alternate <audio> or <break> element has been defined, then the <audio> or <break> element is queued and played (assuming that it is validly specified). Otherwise, the error is treated as non-fatal and the session transitions to the next element defined in the script.

The <audio> element src and expr attributes support only prerecorded audio clips and not TTS strings. Alternate audio, however, does not have this restriction. Alternate audio can be:
- A <break> element, which will play the specified silence
- An internal pre-recorded audio clip
- An external pre-recorded audio clip
- A TTS string

The <audio> element supports only two levels of nesting. Thus, there is at most one level of alternate audio clip(s) that can be specified. Anything below the second level is ignored.
Example 4-1 Alternate Audio Examples
<audio src="file://ClipstoPlay">
  <audio src="file://AlternateAudioClip"/>
</audio>

<audio src="file://Welcome">
  Welcome to your life
</audio>

<prompt>
  This is a TTS string.
  <audio src="file://nextClip"/>
  This is another string to play
</prompt>
With respect to alternate audio, whether <audio> or <break> elements are used, only one alternate element can be defined. All others are ignored. Note that the <break> element will not appear as a child of <audio> in the XML schema. The <break> element is not defined as a standard element in the schema and as such does not appear in the normal child-parent relationships.
Audio Clip Name Length

The names of all audio clips, whether internal or external, have a maximum length of 256 characters. Any clip whose name exceeds 256 characters in length is ignored. Indexed clip identifiers are limited to 50,000.
Encoding
The media server ignores the length specified in WAV file headers. The media server first uses the HTTP Content-Type header to determine the codec. The Content-Type header is analyzed in this order:
1 If the content type is audio/basic, audio/x-alaw-basic, or audio/x-g729-basic, the media server assumes that the file is a raw file and rejects the request.
2 If the value is either audio/wav or audio/x-wav, the media server interprets the audio as a WAV file, and relies on the WAV header to determine the codec type. If a WAV header is not found, the media server fails the announcement.
3 If the value is audio/vnd.wave; codec=xxx, where xxx is any number, the media server interprets the file as a WAV file but uses the codec=xxx encoding in preference to any defined within the file.
4 If the media server cannot determine the file type from the header (for example, only audio is specified), the media server examines the file extension. If the extension is .wav, the media server uses the encoding specified in the file. If the extension is not .wav, the media server still assumes the file is a WAV file, interprets the contents of the file as containing a WAV header, and accepts or rejects the request accordingly.
Interoperability Notes
For some speech servers:
- No audio output is heard if PCMA is the configured codec. If the configured codec is PCMU, the speech is heard. For PCMA, nothing is heard and there is no error indicating that there was an issue.
- The speech server fails to play a mixture of SSML, plain text, and CSSML text. All scripts are requested as external URIs. All three scripts can be heard being played separately but not as a group.
- The generated TTS speech is choppy and garbled.
- On some speech servers, a clicking sound occurs between the playing of TTS clips and local audio clips. This does not occur with other external servers.
- The speech server generates speech in English when Mandarin is specified in some scripts.
- A date speech grammar generates a mixture of Chinese and English when the xml:lang attribute is set to en-US.
<block>
Allows execution of code within a form.
Parent element: <form>
Child elements: <assign>, <audio>, <clear>, <disconnect>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Attributes
name
Optional. The name of the form item variable used to track whether this block is eligible to be executed. The default is an inaccessible internal variable.
expr
Optional. An ECMAScript expression representing the initial value of the form item variable. If initialized to a value, the form item will not be visited unless the form item variable is cleared. The default is the ECMAScript value undefined.
cond
Optional. A Boolean ECMAScript expression. The form item is visited if and only if this expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
Usage Guidelines
The <block> element is a form item. It contains executable content that is executed if:
- The block's form item variable has a value of undefined, AND
- The block's cond attribute (if any) evaluates to true. If cond is not specified, the behavior is as if cond is set to true.
<break>
Inserts a pause or silence into audio.
Parent element: <audio>, <prompt>
Accepted but ignored as a child of <choice>.
Child elements: None.
Attributes
time
Optional. The length of the interval of silence to be inserted, in seconds or milliseconds. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31-1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31-1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The time attribute takes precedence over both size and strength. If nothing is specified, the default interval is 200 milliseconds.
strength
Optional. The length of the interval of silence to be inserted, in predefined intervals. Supported values are as follows:
- x-weak: 50 milliseconds
- weak: 100 milliseconds
- medium: 200 milliseconds
- strong: 500 milliseconds
- x-strong: 2000 milliseconds
- none: 0 milliseconds
The time attribute takes precedence over both size and strength. The size attribute takes precedence over strength. If nothing is specified, the default interval is 200 milliseconds. The strength attribute is always present in all <break> elements, whether specified or not. However, because of its low precedence, it is only used if it is specified and neither time nor size is specified.
size
Deprecated in favor of strength, which is compliant with [4]; however, this attribute is still accepted for backwards compatibility.
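The time format and the precedence between time and strength can be sketched in Python as follows. This is an illustrative model of the rules stated for these attributes, not the media server's parser. The function names are hypothetical. The size attribute is left out because its values are not listed here, and the 10-millisecond precision is not modeled.

```python
import re

MAX_MS = 2**31 - 1  # upper limit of the supported range, in milliseconds
STRENGTH_MS = {"none": 0, "x-weak": 50, "weak": 100,
               "medium": 200, "strong": 500, "x-strong": 2000}
# Optional plus sign, digits, optional fraction, then the unit with no space.
_TIME = re.compile(r"\+?(\d*)(?:\.(\d+))?(ms|s)")


def parse_time_ms(text):
    """Convert a <number><unit> time value to milliseconds, clamped to the range."""
    match = _TIME.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid time value: {text!r}")
    whole, fraction, unit = match.groups()
    if not whole and fraction is None:
        raise ValueError("<number> may not be empty")
    if unit == "ms":
        value = int(whole or "0")  # the fraction is ignored for milliseconds
    else:
        value = int(whole or "0") * 1000 + int(((fraction or "") + "000")[:3])
    return min(value, MAX_MS)


def break_interval_ms(time=None, strength=None):
    """Apply the precedence rule: time first, then strength, then the default."""
    if time is not None:
        return parse_time_ms(time)
    if strength is not None:
        return STRENGTH_MS[strength]
    return 200


print(parse_time_ms("20.5s"))
print(break_interval_ms(time="1s", strength="x-weak"))
```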
Usage Guidelines
The <break> element allows silence intervals to be played within a VoiceXML script. The element is essentially treated like an <audio> element, where the clip played is silence. Instead of specifying an actual audio clip, the <break> element specifies the interval of silence. Up to one <break> element is supported within an <audio> element; others are ignored. Note that the <break> element will not appear as a child of <audio> in the XML schema. The <break> element is not defined as a standard element in the schema and as such does not appear in the normal child-parent relationships.
<catch>
Handles (catches) events.
Parent element: <field>, <form>, <menu>, <record>, <subdialog>, <vxml>
Child elements: <assign>, <audio>, <clear>, <disconnect>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Attributes
event
Optional. The event or events to be caught by this event handler. The format is a space-separated list of event names, where an event name is one of the supported events listed in the section Events on page 24. If more than one event is specified, a separate event counter (that is, a separate count attribute) is maintained for each event.
count
Optional. The number of occurrences of the event. The count attribute allows an application to handle different occurrences of the same event in different ways. Each <form>, <menu>, and form item maintains a counter for each event that occurs while it is being visited. These counters are reset each time the <menu> or the form item's <form> is re-entered. The form-level counters are used in the selection of an event handler for events thrown in a form-level <filled>. Counters are incremented against the full event name and every prefix-matching event name; for example, the occurrence of the event event.foo.1 increments the counters associated with handlers for event.foo.1, event.foo, and event. The count may not exceed the range of a 32-bit unsigned integer. The default is 1.
cond
Optional. A Boolean ECMAScript expression. The catch handling routine is invoked if and only if this expression evaluates to true. The default is true.
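The prefix-matching counter rule described for the count attribute can be illustrated in Python. This sketch models only the counting; handler selection is not shown. The function name record_event and the use of a plain dictionary are choices made for illustration.

```python
def record_event(counters, event_name):
    """Increment the counter for the event name and for every dotted prefix of it."""
    parts = event_name.split(".")
    for end in range(1, len(parts) + 1):
        prefix = ".".join(parts[:end])
        counters[prefix] = counters.get(prefix, 0) + 1
    return counters


counters = record_event({}, "event.foo.1")
print(counters)
```

A second occurrence of a different event under the same prefix, such as event.foo.2, would raise the counters for event.foo and event to 2 while event.foo.1 stays at 1.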
Usage Guidelines
The <catch> element allows executable content to be defined for a number of events that the interpreter can generate. The cond attribute is used to test for event conditions. The special variable _event is supported to store the name of the event that is thrown. The special variable _message is also supported. This variable holds an optional message string, which may be set within the <throw> element. If a message has not been specified, then the variable will be set to the ECMAScript value undefined.
<choice>
Provides menu choices.
Parent element: <menu>
Child elements: <emphasis>, <grammar>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>. <break> is accepted but ignored.
Attributes
dtmf
Optional. Specifies a simple DTMF sequence which, when matched, will result in this choice. White space is permitted in the DTMF sequence specification; for example, "1234#" and "1 2 3 4 #" are treated as equivalent. There is no default. Generic DTMF recognition properties (that is, interdigittimeout, termtimeout, and termchar) apply. For more information about DTMF properties, please see the section "Generic DTMF Recognizer Properties" on page 41.
accept
Optional in speech grammars; ignored for DTMF grammars. The only valid value is exact. An accept value specified in a <menu> element overrides the value set here.
next
Fetches the document at the specified URI. The URI must comply with the XML anyURI format. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
expr
Fetches the document at the URI resulting from evaluation of the specified ECMAScript expression. The URI must comply with the XML anyURI format. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
event
Throws the specified event when this choice is made. For a list of supported events, please see the section Events on page 24. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
eventexpr
Throws the event resulting from evaluation of the specified ECMAScript expression when this choice is made. For a list of supported events, please see the section Events on page 24. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
message
Optional. Returns the specified message string to the event handler, along with the event name. There is no default. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
messageexpr
Optional. Returns the message string resulting from evaluation of the specified ECMAScript expression to the event handler, along with the event name. There is no default. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
fetchaudio
Ignored.
fetchhint
Ignored.
fetchtimeout
Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31-1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31-1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The applicable property for this attribute is the fetchtimeout property. If the attribute is not set, the value set for the property is applied. If the fetchtimeout property is not explicitly set (using the <property> element), the property default is applied. For the default value of supported properties, please see Chapter 2: VoiceXML Properties.
maxage
Ignored.
maxstale
Ignored.
Usage Guidelines
The <choice> element defines a menu item and allows the application to define a simple DTMF sequence or voice specification to indicate this menu choice. It also allows specification of a destination URI for fetching the next document when the menu choice has been made. Optionally, the element can be set to throw an event when the choice is made. All <choice> elements defined for voice are converted into an XML-SRGS format, which is then passed to an external speech server for processing. Note that although <break> is a valid child of <choice> in the VoiceXML schema, it is accepted but ignored in this implementation, and no action is taken if it is specified.
For some speech servers: Saying a subphrase of the <choice> element in a menu grammar results in a match being returned, even in some cases that should be a nomatch.
<clear>
Clears or resets form items (form fields).
Parent element:
Child elements: None.
Attributes
namelist
Optional. Resets the specified variable(s), including any form item variables. The format is a space-separated list of variable names. By default, all form items for the current form are reset.
Usage Guidelines
The <clear> element resets the specified variable(s), including form item variables. When form items are cleared, the prompt and event counters are reinitialized and the form item variable is set to the ECMAScript value undefined.
<controlcmd>
Specifies the actions associated with DTMF key presses for prompt controls.
Parent element: <promptcontrol>
Ignored if specified as a child of <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, or <nomatch>.
Child elements: None.
Attributes
dtmf
Mandatory. Specifies a single DTMF key with the associated audio control action. Supported digits are 0 through 9, *, #, A, B, C, and D. (Note that a through d are not supported.) Whitespace is not permitted. Any other value, or use of whitespace, will cause an error.badfetch to be thrown. You may specify the same DTMF key for both the pause and resume actions, in order to achieve a toggle action. Also, you may specify the same DTMF key for a single action defined multiple times. Otherwise, you cannot specify the same DTMF key for different actions, and doing so will cause session termination with an error.semantic.
action
Mandatory. The audio control action to be performed when the specified DTMF digit is pressed. Supported values are as follows:
- pause: Pause the stream for an indefinite period of time.
- resume: Resume the paused stream.
- seek: Stream audio beginning at the location specified by the combination of the from and to attributes.
- volume: Adjust the volume by the amount specified by the combination of the from and to attributes. The default volume is 0dB.
from
Optional with seek and volume actions; ignored otherwise. The starting value for the seek and volume actions. Supported values are as follows:
- begin: When used with seek, measure the change of location specified by the to attribute relative to the beginning of the file. When used with volume, interpret the volume specified by the to attribute as an absolute volume.
- current: When used with seek, measure the change in location specified by the to attribute relative to the current position. When used with volume, interpret the volume specified by the to attribute as a change relative to the current volume.
The default is current.
to
Mandatory with seek and volume actions; ignored otherwise.

When used with seek, this attribute represents the offset interval in seconds or milliseconds from the starting point specified by the from attribute. The format is <number><unit>, where <number> is an integer, and may optionally be preceded by a plus sign (+) or a minus sign (-), and where a plus sign moves the location forward (fast-forward) and a minus sign moves the location backward (rewind). <unit> may be one of ms (for milliseconds) or s (for seconds). Spaces between the numeric value and the unit are not permitted. The range is -(2^31-1) milliseconds to +(2^31-1) milliseconds, with a precision of 10 milliseconds. If the specified value exceeds the range in either direction, then the media server automatically applies the offset limit (either positive or negative). Specifying a forward location past the end of the audio file results in audio stream completion. Specifying a rewind amount past the beginning of the file results in play starting at the beginning of the file. Examples of to values are: 100ms, 50s, and +600ms.

When used with volume, this attribute represents a volume change. As an absolute volume specification (from=begin), the range is -96dB to +96dB, where the plus sign (+) is optional. Exceeding the range will cause an error.semantic to be thrown. As a change in volume relative to the current volume (from=current), the range is -192dB to +192dB, where the plus sign (+) is optional. Exceeding the range will cause an error.semantic to be thrown. If you specify a change of volume that is within the valid range, but which results in an absolute volume lower than the negative limit of -96dB or greater than the positive limit of +96dB, then the media server automatically applies the volume limit (either positive or negative).

Note that all units are required. Omitting units will cause an error.semantic to be thrown.
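The volume rules can be modeled with a short Python sketch. It is illustrative only and not the media server's implementation. The function name apply_volume is hypothetical, the sketch assumes whole-number dB values, and a Python ValueError stands in for the error.semantic event.

```python
import re

VOLUME_LIMIT_DB = 96     # absolute volume is kept within -96dB to +96dB
RELATIVE_LIMIT_DB = 192  # a relative change must be within -192dB to +192dB


def apply_volume(current_db, to, from_="current"):
    """Return the resulting absolute volume in dB for a volume action."""
    match = re.fullmatch(r"([+-]?\d+)dB", to)  # units are required, no spaces
    if match is None:
        raise ValueError("error.semantic: invalid volume value")
    value = int(match.group(1))
    if from_ == "begin":
        if abs(value) > VOLUME_LIMIT_DB:
            raise ValueError("error.semantic: absolute volume out of range")
        return value
    if from_ == "current":
        if abs(value) > RELATIVE_LIMIT_DB:
            raise ValueError("error.semantic: volume change out of range")
        # A valid change that overshoots the absolute range is clamped.
        return max(-VOLUME_LIMIT_DB, min(VOLUME_LIMIT_DB, current_db + value))
    raise ValueError("error.semantic: from must be begin or current")


print(apply_volume(0, "+6dB"))
print(apply_volume(90, "+10dB"))
print(apply_volume(0, "-20dB", from_="begin"))
```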
Usage Guidelines
The <controlcmd> element specifies DTMF keys and associates each key with an action for audio prompt control. This element is valid only for pre-recorded audio clips. TTS clips specified within a <controlcmd> element are ignored. Audio controls are limited to single DTMF keys, which are specified by the dtmf attribute. DTMF grammars (inline or external, and built-in or SRGS) are currently not supported for specifying audio controls.
While a prompt control is active, the DTMF keys and associated control actions override any currently active grammars or prompt barge-in. DTMF digits not consumed by <controlcmd> action keys are used by currently active grammars or prompt barge-in.
Actions specified in the <controlcmd> element are active during the play of a prompt only if the <prompt> element's cvd:vcrprompt attribute is set to true.
The same DTMF key can be defined for pause and resume actions, so that the user can toggle between pausing and resuming a clip. These are the only actions that may use the same DTMF key. Also, the same action can be defined multiple times using the same key. In this case, the most recent definition overrides the previous ones. Any other combination of actions that uses the same key results in an error.semantic being thrown and session termination.
Control actions specified in <controlcmd> apply to a single <prompt> element. If several <prompt> elements are played back to back with control commands enabled, each is treated independently.
All errors in specifying media controls result in the session terminating. An error.badfetch is thrown for any errors detected by the parser. These are generally cases in which the value assigned to an attribute does not conform to the regular expression for that attribute; for example, a value for the dtmf attribute that is not a valid DTMF digit. For all other errors, an error.semantic is thrown. Possible error cases include the following:
- Omitting the to attribute for volume or seek actions.
- Specifying a value for from that is neither begin nor current.
- Specifying a time value for the to attribute when the action is volume.
- Specifying a volume-based value for the to attribute when the action is seek.
- Failing to include units (s, ms, or dB) for the to attribute.
- Including a space between the value and the unit for the to attribute; for example, "3 s".
- Specifying a value that is out of range for the to attribute for an absolute volume specification; that is, specifying a value that is less than -96 dB or greater than +96 dB when from=begin.
- Using the same DTMF key for two different actions that are not pause and resume. Pause and resume are the only actions that may use the same key, for the toggle function. Otherwise, the same DTMF key cannot be used for different actions (although the same action can be defined multiple times using the same digit).
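As a rough illustration of the from and to rules, the following fragment sketches four control commands. This section names the dtmf, from, and to attributes and the seek and volume actions; the attribute name action, the omission of any enclosing element, and the key assignments are assumptions made only for this sketch.

```xml
<!-- Sketch only: the "action" attribute name is assumed, and the
     enclosing element is omitted. Key assignments are arbitrary. -->

<!-- Key 1: rewind 10 seconds from the current position. -->
<controlcmd dtmf="1" action="seek" from="current" to="-10s"/>

<!-- Key 2: jump to 30 seconds after the beginning of the file. -->
<controlcmd dtmf="2" action="seek" from="begin" to="+30s"/>

<!-- Key 7: lower the volume by 3 dB relative to the current volume. -->
<controlcmd dtmf="7" action="volume" from="current" to="-3dB"/>

<!-- Key 9: set an absolute volume of +6 dB. -->
<controlcmd dtmf="9" action="volume" from="begin" to="+6dB"/>
```

Under the rules listed above, a value such as to="3 s" or to="3" would cause an error.semantic, because the unit is mandatory and may not be separated from the number by a space.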
<desc>
[SSML] Provides a textual description of audio content.
Parent element: <audio>
Child elements: None.
Attributes
xml:lang
Optional. Indicates that content of this element is in a different language from that surrounding the element.
Usage Guidelines
The <desc> element provides a textual description of an audio source (for example, "door slamming"). The <desc> element can only occur within the content of the <audio> element. If text-only output is being produced by the synthesis processor, the content of the <desc> element(s) should be rendered instead of other alternative content in audio. The optional xml:lang attribute can be used to indicate that the content of the element is in a different language from that of the content surrounding the element. Unlike all other uses of xml:lang in this document, the presence or absence of this attribute has no effect on the output in the normal case of audio (rather than text) output.
For some speech servers: The <desc> element is only supported as content of the <audio> element. The expected behavior of the VoiceXML script and the subsequent SSML TTS body is that the request be rejected; however, the speech is generated and played.
<disconnect>
Terminates the VoiceXML application, sending a SIP BYE. Parent element: Child elements:
None.
This element has no attributes.

Usage Guidelines
The <disconnect> element allows the VoiceXML interpreter context to disconnect the user. Execution of the disconnect element causes the connection.disconnect.hangup event to be thrown, which may optionally specify some clean-up actions. The current session is terminated, a SIP BYE is sent to the control agent, and all associated media port resources are released by the platform. See also the related elements <exit> and <return>.
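The following is a minimal sketch of a document that plays a closing prompt and then disconnects, with a handler for the resulting connection.disconnect.hangup event. The form name and the prompt and log text are illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="goodbye">
    <!-- Clean-up actions run when <disconnect> throws the hangup event. -->
    <catch event="connection.disconnect.hangup">
      <log>Session ended; clean-up complete.</log>
    </catch>
    <block>
      <prompt>Thank you for calling. Goodbye.</prompt>
      <disconnect/>
    </block>
  </form>
</vxml>
```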
<else>
Provides alternative logic for an <if> condition.
Parent element: <if>
Child elements: None.
Attributes

Usage Guidelines
The <else> element is an optional element. It defines the beginning of an else clause specifying the code to be executed if the conditions specified in the associated <if> element are not satisfied.
<elseif>
Provides alternative logic for an <if> condition.
Parent element: <if>
Child elements: None.
Attributes
cond
Mandatory. A Boolean ECMAScript expression. The associated clause is executed if and only if the expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
Usage Guidelines
The <elseif> element is an optional element. It defines a new conditional clause specifying the code to be executed if the conditions specified in the associated <if> element are not satisfied. The new clause is entered only if the conditions specified by the cond attribute are satisfied.
<emphasis>
[SSML] Directs the speech server to add emphasis to surrounded text.
Parent element: <speak>
Child elements: <audio>, <break>, <emphasis>, <mark>, <phoneme>, <prosody>, <say-as>, <sub>, <voice>
Attributes
level
Optional. Indicates the strength of emphasis to be applied. Defined values are as follows:
- strong
- moderate
- none
- reduced
The default level is moderate. The meaning of strong and moderate emphasis is interpreted according to the language being spoken (languages indicate emphasis using a possible combination of pitch change, timing changes, loudness, and other acoustic differences). The reduced level is effectively the opposite of emphasizing a word. For example, when the phrase "going to" is reduced, it may be spoken as "gonna". The none level is used to prevent the synthesis processor from emphasizing words that it might typically emphasize. The values "none", "moderate", and "strong" are monotonically non-decreasing in strength.
Usage Guidelines
The <emphasis> element requests that the contained text be spoken with emphasis (also referred to as prominence or stress). The synthesis processor determines how to render emphasis since the nature of emphasis differs between languages, dialects or even voices. The emphasis element can only contain text to be rendered.
For some speech servers: The <emphasis> element with the level attribute has no effect.
<error>
Handles (catches) all error events. Parent element: Child elements:
Attributes
count
Optional. The number of times an error event may be thrown within its scope (form or menu), after which error handling is invoked. The count may not exceed a 32-bit unsigned integer. The default is 1.
cond
Optional. A Boolean ECMAScript expression. The error handling routine is executed if and only if the expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
Usage Guidelines
The <error> element catches all events of type error. If multiple error handlers are installed or inherited, the handler is selected according to the procedure described for event handling in [13]. This element is equivalent to <catch event="error">. For a list of supported events, please see the section "Events" on page 24.
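A minimal sketch of two error handlers with different count values inside a field is shown below. The field name and prompt text are illustrative, and which handler fires on a given occurrence follows the selection procedure in [13]; the comments describe the usual VoiceXML behavior rather than anything specific to this media server.

```xml
<field name="pin" type="digits?minlength=4;maxlength=4">
  <prompt>Please enter your four digit PIN.</prompt>

  <!-- Default handler (count defaults to 1).
       Equivalent to <catch event="error">. -->
  <error>
    <prompt>Sorry, a problem occurred.</prompt>
  </error>

  <!-- Normally selected once the error event has occurred twice. -->
  <error count="2">
    <prompt>We are unable to continue. Goodbye.</prompt>
    <disconnect/>
  </error>
</field>
```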
<example>
[SRGS] Provides an example phrase that matches the input specification. Parent element: Child elements:
Attributes
None. None.
None.
Usage Guidelines
This SRGS element can be used within a grammar rule definition to illustrate an example of user input complying with the specification. No associated action for this element is performed within the interpreter context or the grammar engine; it is ignored by these components.
<exit>
Terminates the VoiceXML application, while keeping the port open. Parent element: Child elements:
None.
expr
Optional. An ECMAScript expression (such as field1 or 'Finished') to be returned to the interpreter context. By default, no expression is returned. Only one of expr and namelist may be specified; if both are specified, an error.badfetch is thrown. No error is generated if neither is specified.
namelist
Optional. A space-separated list of variables to be returned to the interpreter context. By default, no variables are returned. Only one of expr and namelist may be specified; if both are specified, an error.badfetch is thrown. No error is generated if neither is specified.
Usage Guidelines
The <exit> element allows control to be returned back to the interpreter context. Unlike session termination as a result of a <disconnect>, <exit> allows the media server to retain media port resources. Other resources (documents, variables, and so on) associated with the session are released; however, the media port resources are not released by the platform. A SIP BYE is not sent to the control agent. The port resources are kept on hold pending further direction from the control agent.
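The sketch below returns a collected value to the interpreter context with namelist. The form and field names are illustrative; what the control agent does with the returned value is outside the scope of this fragment.

```xml
<form id="collect">
  <field name="account" type="digits">
    <prompt>Please enter your account number.</prompt>
    <filled>
      <!-- Return the collected value to the interpreter context.
           Use either namelist or expr, never both. -->
      <exit namelist="account"/>
    </filled>
  </field>
</form>
```

To return a single expression instead, the same element could be written as <exit expr="account"/>. Specifying both attributes causes an error.badfetch.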
<field>
Collects user input.
Parent element: <form>
Child elements: <audio>, <catch>, <error>, <filled>, <grammar>, <help>, <link>, <noinput>, <nomatch>, <option>, <prompt>, <promptcontrol>, <property>
Attributes
name
Optional. Defines a variable with the specified name, which will hold the result of the user collection defined by the <field> element. The variable name must be unique among all form items defined within the form; otherwise, an error.badfetch is thrown. The format is an XML restrictedVariableName token, which is composed of alphabetic characters, digits, colon, and hyphen. The name may not begin with underscore (_) or contain a period (.). In addition, the name must follow ECMAScript variable naming conventions and may not include ECMAScript reserved words. There is no default.
expr
Optional. An ECMAScript expression assigning the initial value of the form item variable defined by name. If the initial value is set using this attribute, the form item will not be executed until the variable is cleared (for example, by using the <clear> element). The default is the ECMAScript value undefined.
cond
Optional. A Boolean ECMAScript expression. The field is executed if and only if the expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
type
Optional. Provides the definition of a built-in grammar. Instead of using this attribute, a grammar can be specified using the <grammar> element.
slot
Not supported. If received, it causes an error.unsupported event to be thrown.
modal
Optional. Allows you to disable all other grammars while the field is being executed, so that only the grammar associated with the field is active. Supported values are as follows:
- true: Disable all other grammars, leaving only this one active.
- false: Keep all grammars enabled.
The default is false.
Usage Guidelines
The <field> element prompts the user to provide input based on the specified grammar. The grammar can be DTMF and/or voice. The type attribute takes one of the defined built-in grammars as an argument. Built-in grammars implicitly support DTMF and voice inputs unless the input mode is explicitly specified using the inputmodes attribute of the <property> element. As an alternative to specifying the grammar in the type attribute, the grammar for a <field> element can be specified using the <grammar> element. All voice grammars defined using the type attribute are converted into their <grammar> equivalent before being passed to the external speech server. Table 4-1 shows the conversion that takes place between a type-specified grammar and its <grammar> equivalent, and shows whether or not that representation is supported for DTMF and voice.
Table 4-1 Conversion of <field> type Attribute to <grammar>
<field> Representation                         <grammar> Equivalent                                             Supported Mode
<field type="boolean">                         <grammar src="builtin:grammar/boolean"/>                         DTMF and voice
<field type="boolean?y=5;n=6">                 <grammar src="builtin:dtmf/boolean?y=5;n=6"/>                    DTMF only
<field type="digits">                          <grammar src="builtin:grammar/digits"/>                          DTMF and voice
<field type="digits?minlength=3;maxlength=5">  <grammar src="builtin:grammar/digits?minlength=3;maxlength=5"/>  DTMF and voice
<field type="date"/>                           <grammar src="builtin:grammar/date"/>                            DTMF and voice
For more information about DTMF and voice grammars, please see Chapter 3: DTMF and Voice Grammars.
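To make the type-to-<grammar> conversion concrete, the sketch below shows the same digit collection written both ways. The field name and prompt text are illustrative, and the two fields are alternatives rather than parts of one form.

```xml
<!-- Alternative 1: built-in grammar selected with the type attribute. -->
<field name="code" type="digits?minlength=3;maxlength=5">
  <prompt>Please enter your code.</prompt>
</field>

<!-- Alternative 2: the same collection with an explicit <grammar> element. -->
<field name="code">
  <grammar src="builtin:grammar/digits?minlength=3;maxlength=5"/>
  <prompt>Please enter your code.</prompt>
</field>
```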
For some speech servers:
- The match rate for voice inputs is very low.
- There is an inconsistent match rate across match tests.
- A match is returned when a no-match is expected in some test cases. This occurs with different grammar types.
- A special SRGS rule (which is matched without the user speaking any word) does not work. The expected behavior is that the grammar can be used to match zero or silence. However, currently the rule is not matched.
- A special SRGS rule (which matches any speech up until the next rule match, the next token, or the end of spoken input) does not work. Currently the rule is not matched as expected.
- 0229 is recognized as 0529 for date grammars. When the values "zero, two, two, nine" are entered, the speech server returns 0529.
- Entering an invalid leap date returns a date. The expected behavior is to return an error or a nomatch.
- Some digits are dropped or mismatched for ASR digit grammar. 13456 was entered but 12345 was returned.
- Currency input of 100.798 drops the final digit and returns 100.79.
- The speech server accepts only up to 2 decimal places for number grammar. 98.765 was entered and 98.76 was returned.
- The speech server returns nomatch for a number grammar if the leading digits are zeros.
- The point character (.) is recognized as 1 instead of "dot" for number grammars.
- Phone grammars are incorrectly recognized. An input of 6044202978 returned 6004123457.
- Saying a subphrase of the <choice> element in a menu grammar results in a match being returned, even in some cases that should be a nomatch.
- Noinput was returned when a match was expected for the input: "zero six zero six zero six zero six zero six".
- Speech input in a date grammar is incorrectly interpreted. Entering "june, nineteen seventy eight" results in a returned string of 780619.
- For MRCP v1, saying or generating the speech "twelve o'clock" results in a no match being returned in a time grammar. For MRCP v2 this test succeeded.
- A grammar completion failure occurs setting up an ABNF grammar.
- A speech server running MRCP v1 does not return PCMA as the lead codec when only PCMA is offered. As a result, the external server actually uses the PCMU codec while the media server is streaming PCMU. When running MRCP v2 the speech server works as expected.
<filled>
Defines the code to be executed when user input is complete.
Parent element: <field>, <form>, <record>, <subdialog>
Child elements: <assign>, <audio>, <clear>, <disconnect>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Attributes
mode
Optional. Specifies when execution of this element should take place. Supported values are as follows:
- any: Execute when any of the input items has been filled by the user.
- all: Execute only when all of the input items have been filled by the user.
The default is all.
namelist
Optional. A space-separated list of variable names representing the input items that must be filled in order for this element to be executed. When this element occurs within a form, this list defaults to the names (both implicit and explicit) of the form's input items; otherwise, there is no default.
Usage Guidelines
The <filled> element specifies actions to be executed when the associated <field> has been completed by the user.
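The sketch below shows a form-level <filled> that uses mode and namelist so that its actions run only after two fields have both been collected. The form name, field names, and prompt text are illustrative.

```xml
<form id="transfer">
  <field name="source" type="digits">
    <prompt>Enter the account to transfer from.</prompt>
  </field>
  <field name="target" type="digits">
    <prompt>Enter the account to transfer to.</prompt>
  </field>

  <!-- Runs only after both source and target have been filled. -->
  <filled mode="all" namelist="source target">
    <if cond="source == target">
      <prompt>The two accounts must be different.</prompt>
      <clear namelist="source target"/>
    </if>
  </filled>
</form>
```

With mode="any", the same block would instead run as soon as either field is filled.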
<form>
Defines a dialog for collecting user input.
Parent element: <vxml>
Child elements: <block>, <catch>, <error>, <filled>, <grammar>, <link>, <noinput>, <nomatch>, <promptcontrol>, <property>, <record>, <script>, <subdialog>, <var>
Attributes
id
Optional. A unique identifier for the form. The format is an XML name token without colons (:). The name token may be composed of alphabetic letters, digits, period (.), underscore (_), and hyphen (-). The name must begin with a letter or underscore. If specified, the identifier can be used within the current document or within another document to pass control to the form; for example, this-form in <goto next="#this-form">.
scope
Optional. The default scope of this form's grammar. Supported values are as follows:
- dialog: This grammar applies only to the current form.
- document: This grammar is active over the entire document. If the document is the root document, then the grammar scope applies to all documents referenced from the root document.
The default is dialog.
Usage Guidelines
The <form> element is a key mechanism in VoiceXML for presenting information to the user and collecting user input. A form consists of form items, which can be visited during the execution of the form. Form items can either be input items (which are visited as a result of user input) or control items (which are independent of user input). A form allows variable declarations and an event handler to be associated with the form. Additionally, the child element <filled> allows you to specify procedural logic that can be executed when user input is completed and a particular field item (or field) is filled.
<goto>
Transfers control to another dialog, abandoning the current dialog. Parent element: Child elements:
None.
next
The URI of the document to which to transition. The URI must comply with the XML anyURI format. Exactly one of next, expr, nextitem, and expritem must be specified; otherwise, an error.badfetch is thrown.
expr
An ECMAScript expression evaluating to the URI of the document to which to transition. The URI resulting from the expression must comply with the XML anyURI format. Exactly one of next, expr, nextitem, and expritem must be specified; otherwise, an error.badfetch is thrown.
nextitem
The name of the next item to transition to within the form. Exactly one of next, expr, nextitem, and expritem must be specified; otherwise, an error.badfetch is thrown.
expritem
An ECMAScript expression evaluating to the name of the next item to transition to within the form. Exactly one of next, expr, nextitem, and expritem must be specified; otherwise, an error.badfetch is thrown.
fetchaudio
Ignored.
fetchhint
Ignored.
fetchtimeout
Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31-1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range are reset to 2^31-1 milliseconds. Examples of time values are: 100ms, 50s, 20.5s, and +600ms.
The applicable property for this attribute is the fetchtimeout property. If the attribute is not set, the value set for the property is applied. If the fetchtimeout property is not explicitly set (using the <property> element), the property default is applied. For the default value of supported properties, please see Chapter 2: VoiceXML Properties.
maxage
Ignored.
maxstale
Ignored.
Usage Guidelines
The <goto> element provides the ability to transition control to another dialog, either within the current document, or within another document.
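The following sketch exercises three of the transition forms described above: nextitem within a form, next to a form in the same document, and next to another document with a fetchtimeout. The form and field names are illustrative, and the URL http://example.com/menu.vxml is a hypothetical placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="start">
    <block name="intro">
      <!-- Jump to a named form item within the same form. -->
      <goto nextitem="ask"/>
    </block>
    <field name="ask" type="boolean">
      <prompt>Would you like to hear your balance?</prompt>
      <filled>
        <if cond="ask">
          <!-- Transition to another form in this document. -->
          <goto next="#balance"/>
        <else/>
          <!-- Fetch another document, allowing up to 5 seconds. -->
          <goto next="http://example.com/menu.vxml" fetchtimeout="5s"/>
        </if>
      </filled>
    </field>
  </form>
  <form id="balance">
    <block>
      <prompt>Your balance is not available at this time.</prompt>
      <disconnect/>
    </block>
  </form>
</vxml>
```

Each <goto> carries exactly one of next, expr, nextitem, and expritem; combining them, or omitting all of them, causes an error.badfetch.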
<grammar>
Defines user input rules for DTMF or voice.
Parent element: <choice>, <field>, <form>, <link>, <record>
Child elements: <rule>
Attributes
This element has the following VoiceXML attributes.
src
The URI of the grammar, if the grammar is to be fetched externally. The URI must comply with the XML anyURI format. This attribute can also be used to directly specify a built-in grammar, using the notation builtin:grammar/type?parameters (where grammar=dtmf). Either way, this attribute is mandatory if an inline grammar is not specified, and forbidden if an inline grammar is specified; that is, exactly one of src or an inline grammar must be specified. If both or neither are specified, an error.badfetch is thrown.
scope
Optional. The default scope of this grammar. Supported values are as follows: dialog: This grammar applies only to the current form. document: This grammar is active over the entire document. If the document is the root document, then the grammar scope applies to all documents referenced from the root document. If not specified, the grammar scope is inherited from the parent element.
type
Optional. Identifies the MIME type of the grammar. If specified, this value takes precedence over file types or the HTTP Content-type header. If not specified and the grammar is fetched externally, then the file extension type or the media Content-type is used to determine the grammar type. If not specified and the grammar is inline, the type is assumed to be XML; that is, application/srgs+xml.
weight
Ignored.
fetchhint
Ignored.
fetchtimeout
maxage maxstale
Ignored. Ignored.
This element inherits the following SRGS attributes for inline grammars.
version
Mandatory for an inline XML grammar; forbidden otherwise. Identifies the W3C specification version of the grammar. The only supported value is 1.0.
xml:lang
Optional for voice grammars; ignored for DTMF grammars. The language to be used for the entire grammar. The interpretation of the value associated with xml:lang is managed and verified by the speech server. A value for xml:lang specified at the <item> level overrides a value specified here.
mode
Optional. The type of the current grammar. Supported values are as follows:
- dtmf: The grammar is a DTMF-based grammar.
- voice: The grammar is a voice-based grammar.
This attribute differs from the inputmodes property, which represents the type of input that will be accepted. For a valid grammar (that is, a grammar that will be activated and can receive input), this attribute must align with the value of the inputmodes property. Grammars whose mode attribute does not match the inputmodes property are ignored. The default for this attribute in the specification is voice. For backwards compatibility, the default for the media server is dtmf.
root
Optional for an inline grammar; forbidden otherwise. Identifies the grammar's root rule. If not specified, the grammar's default rule is used.
tag-format
Optional for an inline grammar; forbidden otherwise. A URI identifying the content type and version of the semantic processor to use. Defines the tag content format for all tags within the grammar.
xml:base
Optional. Allows a base URI to be defined. If set, any relative URIs within the inline grammar are resolved using this base URI. Otherwise, any relative URIs are resolved using the base URI specified within the <vxml> element.
Usage Guidelines
The <grammar> element specifies the rules for a valid set of user inputs or utterances. The grammar definition can be inline, external, or built-in, and can be specified for DTMF, voice, or both. The grammar specification must be in the XML form of the notation specified by [11]. Exactly one of src or an inline grammar must be specified. If both or neither are specified, an error.badfetch is thrown.
External grammars that are voice grammars are fetched, parsed, and processed by the external speech server. In this case the URI is passed directly (as-is) to the speech server. For this reason, the media server must determine the grammar type (that is, the input mode) before it can pass the URI. The input mode can be defined in any of the following ways:
- By specifying the default input mode as a VoiceXML parameter using the media server's management interface
- Using the inputmodes attribute of the <property> element
- Using the mode attribute of the <grammar> element
To be valid, a grammar must evaluate to at least one digit sequence. Grammars that evaluate to be empty (that is, no valid collection sequence is specified) are rejected with an error.grammar event.
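A minimal sketch of an inline DTMF grammar is shown below, using the version, mode, and root attributes described in this section. The field name, rule name, and prompt text are illustrative.

```xml
<field name="choice">
  <!-- Inline XML grammar: version is mandatory, src must be omitted. -->
  <grammar version="1.0" mode="dtmf" root="digit">
    <rule id="digit">
      <one-of>
        <item>1</item>
        <item>2</item>
        <item>3</item>
      </one-of>
    </rule>
  </grammar>
  <prompt>Press 1, 2, or 3.</prompt>
</field>
```

Because the grammar is inline, adding a src attribute to the same element would cause an error.badfetch.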
For some speech servers:
- The match rate for voice inputs is very low.
- There is an inconsistent match rate across match tests.
- A match is returned when a no-match is expected in some test cases. This occurs with different grammar types.
- A special SRGS rule (which is matched without the user speaking any word) does not work. The expected behavior is that the grammar can be used to match zero or silence. However, currently the rule is not matched.
- A special SRGS rule (which matches any speech up until the next rule match, the next token, or the end of spoken input) does not work. Currently the rule is not matched as expected.
- 0229 is recognized as 0529 for date grammars. When the values "zero, two, two, nine" are entered, the speech server returns 0529.
- Entering an invalid leap date returns a date. The expected behavior is to return an error or a nomatch.
- Some digits are dropped or mismatched for ASR digit grammar. 13456 was entered but 12345 was returned.
- Currency input of 100.798 drops the final digit and returns 100.79.
- The speech server accepts only up to 2 decimal places for number grammar. 98.765 was entered and 98.76 was returned.
- The speech server returns nomatch for a number grammar if the leading digits are zeros.
- The point character (.) is recognized as 1 instead of "dot" for number grammars.
- Phone grammars are incorrectly recognized. An input of 6044202978 returned 6004123457.
- Saying a subphrase of the <choice> element in a menu grammar results in a match being returned, even in some cases that should be a nomatch.
- Noinput was returned when a match was expected for the input: "zero six zero six zero six zero six zero six".
- Speech input in a date grammar is incorrectly interpreted. Entering "june, nineteen seventy eight" results in a returned string of 780619.
- For MRCP v1, saying or generating the speech "twelve o'clock" results in a no match being returned in a time grammar. For MRCP v2 this test succeeded.
- A grammar completion failure occurs setting up an ABNF grammar.
- A speech server running MRCP v1 does not return PCMA as the lead codec when only PCMA is offered. As a result, the external server actually uses the PCMU codec while the media server is streaming PCMU. When running MRCP v2 the speech server works as expected.
<help>
Handles (catches) help events.
Parent element: <field>, <form>, <menu>, <record>, <subdialog>
Child elements: <assign>, <audio>, <clear>, <disconnect>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Attributes
count
Optional. The number of times a help event may be thrown, after which the help handling routine is invoked. Regardless of the value set for count, after 5 occurrences the session terminates. The default is 5.
cond
Optional. A Boolean ECMAScript expression. The help handling routine is executed if and only if the expression evaluates to true. The default is true.
Usage Guidelines
The <help> element catches all events of type help. If multiple help handlers are installed or inherited, the handler is selected according to the procedure described for event handling in [13]. This element is equivalent to <catch event="help">. For a list of supported events, please see the section "Events" on page 24.
<if>
Defines conditional logic.
Parent element: <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, <nomatch>
Child elements: <assign>, <audio>, <clear>, <disconnect>, <else>, <elseif>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Attributes
cond
Mandatory. A Boolean ECMAScript expression. The associated clause is executed if and only if the expression evaluates to true.
Usage Guidelines
The <if> element defines procedural logic that is to be executed on satisfaction of a condition. The <if> element may have associated <else> and/or <elseif> clauses, which define alternate logical flows.
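The sketch below shows an <if> with one <elseif> clause and an <else> clause. The variable name, its value, and the prompt text are illustrative; note that the less-than operator in a cond expression must be escaped as &lt; in XML.

```xml
<block>
  <var name="attempts" expr="2"/>
  <if cond="attempts == 0">
    <prompt>This is your first attempt.</prompt>
  <elseif cond="attempts &lt; 3"/>
    <!-- Entered only if the <if> condition was false and this one is true. -->
    <prompt>Please try again.</prompt>
  <else/>
    <!-- Entered only if no preceding condition was satisfied. -->
    <prompt>Too many attempts. Goodbye.</prompt>
    <disconnect/>
  </if>
</block>
```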
<initial>
Provides the initial prompt in a form.
Parent element: <form>
Child elements: <audio>, <catch>, <link>, <noinput>, <nomatch>, <prompt>, <property>
Attributes
name
Optional. The name of the form item variable used to track whether the <initial> element is eligible for execution. The default is an inaccessible internal variable.
expr
Optional. An ECMAScript expression representing the initial value of the form item variable. If initialized to a value, the form item will not be visited unless the form item variable is cleared. The default is the ECMAScript value undefined.
cond
Optional. A Boolean ECMAScript expression. The form item is visited if and only if this expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
Usage Guidelines
The <initial> element defines procedural logic that is to be executed on satisfaction of a condition. In a typical mixed initiative form, the <initial> element is visited when the user is initially being prompted for form-wide information, and has not yet entered into the directed mode where each field is visited individually. Like input items, the <initial> element has prompts, catches, and event counters. Unlike input items, the <initial> element has no grammars, and no <filled> action.
<item>
[SRGS] Defines valid user input, as part of a DTMF or voice grammar rule.
Parent element: <item>, <one-of>, <rule>
Child elements: <item>, <one-of>
Attributes
repeat
Optional. Specifies additional user detection repeat rules for a match to be declared. Supported formats are as follows:
- repeat="n": Repeat n times.
- repeat="m-n": Repeat between m and n times, where m is less than or equal to n, and m and n are both greater than or equal to 0.
- repeat="m-": Repeat m or more times, where m is greater than or equal to 0.
- repeat="0-1": Indicates that expansion is optional.
repeat-prob
Optional for voice grammars; ignored for DTMF grammars. Sets the probability that the repeat attribute will succeed. Valid only for speech grammars and only if the repeat attribute is defined. The range is 0.0 to 1.0.
weight
Ignored.
xml:lang
Optional for voice grammars; ignored for DTMF grammars. The language to be used for the entire grammar. The interpretation of the value associated with xml:lang is managed and verified by the speech server. A value for xml:lang set here overrides a value specified at the <grammar> level.
Usage Guidelines
The <item> element is used in XML grammar specification rules to define valid user inputs. For DTMF items, grammars as defined in Appendix E of [11] may be used. These are the digits 0-9, #, *, and the digits A-D. For voice-based grammars, any input acceptable by the external speech server may be used. Tokens not enclosed in <item> elements are ignored. A grammar that has no valid <item> elements defined is rejected with an error.grammar event. (Note that this deviates slightly from [13], which states that empty grammars should be allowed.) For information on how this differs for voice-based grammars, please see Chapter 3: DTMF and Voice Grammars. The <item> element can be nested at most three levels deep.
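As a sketch of the repeat formats, the DTMF grammar below accepts an optional star key followed by three to five digits. The rule name is illustrative, and the <item> elements are nested two levels deep, which is within the three-level limit.

```xml
<grammar version="1.0" mode="dtmf" root="code">
  <rule id="code">
    <!-- repeat="0-1": the leading star key is optional. -->
    <item repeat="0-1">*</item>
    <!-- repeat="3-5": between three and five digits. -->
    <item repeat="3-5">
      <one-of>
        <item>0</item>
        <item>1</item>
        <item>2</item>
        <item>3</item>
        <item>4</item>
        <item>5</item>
        <item>6</item>
        <item>7</item>
        <item>8</item>
        <item>9</item>
      </one-of>
    </item>
  </rule>
</grammar>
```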
For some speech servers: The repeat attribute used in a nested <item> element returns nomatch for input that should generate a match.
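As a minimal sketch of the repeat attribute, the following DTMF grammar accepts a PIN of four to six digits. The rule name pin is hypothetical, and the grammar header simply mirrors the style of the XML-SRGS grammar in Example 4-3; treat it as an illustration under those assumptions rather than a tested grammar.

```xml
<!-- Hypothetical DTMF grammar: matches a PIN of 4 to 6 digits. -->
<grammar mode="dtmf" version="1.0" root="pin">
  <rule id="pin" scope="public">
    <!-- repeat="4-6": the enclosed alternatives must match 4 to 6 times. -->
    <item repeat="4-6">
      <one-of>
        <item>0</item>
        <item>1</item>
        <item>2</item>
        <item>3</item>
        <item>4</item>
        <item>5</item>
        <item>6</item>
        <item>7</item>
        <item>8</item>
        <item>9</item>
      </one-of>
    </item>
  </rule>
</grammar>
```

The sketch nests <item> only two levels deep and <one-of> one level deep, which stays inside the nesting limits stated for both elements.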
<link>
Specifies a destination URL when a grammar activates a match.
Parent element: <field>, <form>, <vxml>
Child elements: <grammar>
Attributes
next
Goes to the specified URI. The URI must comply with the XML anyURI format. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
expr
Goes to the URI resulting from evaluation of the specified ECMAScript expression. The URI must comply with the XML anyURI format. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
event
Throws the specified event when one of the link grammars is matched. For a list of supported events, please see the section Events on page 24. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
eventexpr
Throws the event resulting from evaluation of the specified ECMAScript expression when one of the link grammars is matched. For a list of supported events, please see the section Events on page 24. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
message
Optional. Returns the specified message string to the event handler, along with the event name. There is no default. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
messageexpr
Optional. Returns the message string resulting from evaluation of the specified ECMAScript to the event handler, along with the event name. There is no default. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
dtmf
Optional. Specifies a simple DTMF sequence which, when matched, activates the specified link. White space is permitted in the DTMF sequence specification; for example, "1234#" and "1 2 3 4 #" are treated as equivalent. There is no default. Generic DTMF recognition properties (that is, interdigittimeout, termtimeout, and termchar) apply. For more information about DTMF properties, please see the section Events on page 24.
fetchaudio
Ignored.
fetchhint
Ignored.
fetchtimeout
Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31 - 1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31 - 1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The applicable property for this attribute is the fetchtimeout property. If the attribute is not set, the value set for the property will be applied. If the fetchtimeout property is not explicitly set (using the <property> element), the property default is applied. For the default value of supported properties, please see Chapter 2: VoiceXML Properties.
maxage
Ignored.
maxstale
Ignored.
Usage Guidelines
The <link> element provides a mechanism for transitioning to a new document or dialog. Alternatively, it can be used to throw an event instead of transitioning to a new document. The <link> element is activated when the grammar contained or specified within the element is matched. For this reason, grammars specified within the <link> element are not able to have a scope specified.
Grammars active for a link at the root document level are active throughout all documents referenced from the root document. Grammars active for a link at the <vxml> level are active throughout the document. Grammars active for a link at the <form> level are active while the user is in the form.
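The following sketch illustrates the two uses of <link> described above: transitioning to another document and throwing an event. It assumes document-level links placed directly under <vxml>; the URIs, prompt file names, and key assignments are hypothetical, and the help event is assumed to be among the supported events.

```xml
<vxml version="2.0">
  <!-- Active throughout this document: "* 0" goes to a (hypothetical) operator document. -->
  <link dtmf="* 0" next="http://apps.example.com/operator.vxml" fetchtimeout="5s"/>
  <!-- Active throughout this document: "9" throws the help event instead of fetching a document. -->
  <link dtmf="9" event="help" message="caller asked for help"/>

  <form id="main">
    <field name="dept">
      <prompt><audio src="prompts/choose_department.wav"/></prompt>
      <option dtmf="1" value="sales"> Sales </option>
      <option dtmf="2" value="support"> Support </option>
      <!-- Handles the help event thrown by the second link. -->
      <help>
        <prompt><audio src="prompts/help.wav"/></prompt>
      </help>
    </field>
  </form>
</vxml>
```

Each link specifies exactly one of next, expr, event, and eventexpr, as the attribute descriptions require.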
<log>
Generates messages for logging and troubleshooting.
Parent element: <block>, <catch>, <filled>, <form>, <help>, <if>, <noinput>, <nomatch>
Child elements: None.
Attributes
label
Optional. A string that can be used to label the log; for example, to indicate the purpose of the log.
expr
Optional. An ECMAScript expression evaluating to a string that can be used to label the log; for example, to indicate the purpose of the log.
Usage Guidelines
The <log> element allows an application to generate messages for the purpose of logging and debugging. The messages can include events, text information, and/or results from a VoiceXML script. This facility aids application developers in debugging an application by examining its flow control and variable contents. The element may contain any combination of text and <value> elements. The <value> element is used to de-reference ECMAScript expressions and include them as a string in the message. The generated message consists of the concatenation of the text message and the string form of the value of the expr attribute in the <value> element. All log messages generated by the <log> element are written to syslog at a severity level of INFO.
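A minimal sketch of mixing text and <value> elements in a log message follows. The form, the variable name attempts, and the label text are hypothetical, and the exact layout of the resulting syslog line is not specified here.

```xml
<form id="login">
  <!-- Hypothetical counter used only to show <value> de-referencing. -->
  <var name="attempts" expr="2"/>
  <block>
    <!-- The text and the string form of the <value> expression are concatenated into one message. -->
    <log label="login">attempt <value expr="attempts"/> of 3</log>
  </block>
</form>
```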
<mark>
[SSML] Places a marker into a text or tag sequence.
Parent element: <speak>
Child elements: None.
Attributes
name
Mandatory. A token providing a unique name for the marked location; for example, here.
Usage Guidelines
Use the <mark> element to reference a specific location in the text/tag sequence, or to insert a marker into an output stream for asynchronous notification. When processing a mark element, a synthesis processor does one or both of the following:
Informs the hosting environment with the value of the name attribute and with information allowing the platform to retrieve the corresponding position in the rendered output.
When audio output of the SSML document reaches the mark, issues an event that includes the required name attribute of the element. The hosting environment defines the destination of the event.
The <mark> element does not affect the speech output process.
For some speech servers: The TTS server does not send a MARK event to the SPM when it reaches a <mark> element in spoken text.
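As an illustration only, a marker placed between two sentences of an SSML string might look like the following. The mark name after_status and the spoken text are hypothetical, and whether a MARK event is actually delivered depends on the speech server, as noted above.

```xml
<speak version="1.0" xml:lang="en-US">
  Your order has shipped.
  <!-- Hypothetical marker name; it does not change the spoken output. -->
  <mark name="after_status"/>
  Press 1 to hear tracking details.
</speak>
```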
<menu>
Provides a fixed set of menu selections.
Parent element: <vxml>
Child elements: <audio>, <catch>, <choice>, <error>, <help>, <noinput>, <nomatch>, <prompt>, <promptcontrol>, <property>, <script>
Attributes
id
Optional. A unique identifier for the menu. The format is an XML name token without colons (:). The name token may be composed of alphabetic letters, digits, period (.), underscore (_), and hyphen (-). The name must begin with a letter or underscore. If specified, it can be used within the application to pass control to the menu; for example, from a <goto> or a <submit>.
scope
Optional. The default scope of this menu's grammar. Supported values are as follows:
dialog: This grammar applies only to the current menu.
document: This grammar is active over the entire document. If the document is the root document, then the grammar scope applies to all documents referenced from the root document.
The default is dialog.
dtmf
Optional. Defines whether <choice> elements that have not explicitly assigned DTMF key press attribute values are automatically assigned a corresponding DTMF key press. Supported values are as follows: true: <choice> elements not explicitly set are automatically assigned a DTMF key press. false: <choice> elements not explicitly set are not assigned a DTMF key press. The default is false.
accept
Ignored for DTMF and speech grammars; optional for speech recognition. For speech recognition, specifies whether user input must be exact or may be approximate. Menu grammars that specify speech are converted to XML-SRGS grammars. The supported value is exact; there is currently no mapping for approximate in XML-SRGS grammars. The default is exact.
Usage Guidelines
The <menu> element provides a relatively simple mechanism (as compared to, say, a form) for allowing the user to make a choice and transitioning to another location based on the user's choice. Using audio prompts, the menu offers the user a set of choices, after which it waits for user input. The dialog transitions based on the user input.
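The following sketch shows a small menu that relies on the dtmf attribute for automatic key assignment. The menu id, the target documents, and the prompt file names are hypothetical placeholders; the <choice> attributes used are those of standard VoiceXML and are assumed to apply here.

```xml
<!-- Hypothetical main menu; all URIs and file names are placeholders. -->
<menu id="main_menu" dtmf="true">
  <prompt>
    <audio src="prompts/main_menu.wav"/>
  </prompt>
  <!-- With dtmf="true", choices without an explicit dtmf attribute are assigned a key press automatically. -->
  <choice next="sales.vxml"> Sales </choice>
  <choice next="support.vxml"> Support </choice>
  <!-- An explicit key press can still be set on an individual choice. -->
  <choice dtmf="0" next="operator.vxml"> Operator </choice>
  <noinput>
    <prompt><audio src="prompts/no_input.wav"/></prompt>
  </noinput>
</menu>
```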
<meta>
Defines page information.
Parent element: <vxml>
Child elements: None.
Attributes
name
A name for the metadata property describing page information. Exactly one of name or http-equiv must be specified; otherwise an error.badfetch is thrown.
content
Mandatory. A value for the metadata; that is, the page information to be recorded. This value can supply the value of an HTTP response header. This value can be accessed later by the session variable session.meta.name. If this attribute is omitted, an error.badfetch is thrown.
http-equiv
Ignored. The name of an HTTP header for which the content attribute is supplying the response value. Exactly one of name or http-equiv must be specified; otherwise an error.badfetch is thrown.
Usage Guidelines
The <meta> element allows specification of information about a grammar document. This element is allowed but ignored by the media server.
For some speech servers: Providing both the name and http-equiv attributes within the <meta> element is illegal and an error is expected; however, the speech server accepted the grammar, although it eventually returned a noinput event. In a test to verify that the <meta> element is accepted in an ABNF grammar, the grammar fails when being activated (that is, in the define grammar request). The expected behavior is for the grammar to be accepted and processed.
<metadata>
[SRGS] Defines information about a document using a metadata schema.
Parent element: <speak>
Child elements: Depends on the metadata schema used.
Attributes
None.
Usage Guidelines
Use the <metadata> element to act as a container in which information about the document can be placed using a metadata schema. Although any metadata schema can be used with metadata, it is recommended that the XML syntax of the Resource Description Framework (RDF) [RDF-XMLSYNTAX] be used in conjunction with the general metadata properties defined in the Dublin Core Metadata Initiative [DC]. Document properties declared with the metadata element can use any metadata schema.
<noinput>
Handles (catches) a user input timeout event.
Parent element:
Child elements:
Attributes
count
Optional. The number of times a noinput event may be thrown, after which the no-input handling routine is invoked. Regardless of the value set for count, after 5 occurrences the session terminates. The default is 5.
cond
Optional. A Boolean ECMAScript expression. The no-input handling routine is executed if and only if the expression evaluates to true. The default is true.
Usage Guidelines
The <noinput> element catches all events of type noinput. If multiple no-input handlers are installed or inherited, the handler is selected according to the procedure described for event handling in [13]. This element is equivalent to <catch event="noinput">. For a list of supported events, please see the section Events on page 24.
<nomatch>
Handles (catches) an invalid user input event.
Parent element:
Child elements:
Attributes
count
Optional. The number of times a nomatch event may be thrown, after which the no-match handling routine is invoked. Regardless of the value set for count, after 5 occurrences the session terminates. The default is 5. (Note that [13] sets the termination value to 4. This was changed to match the value for <noinput>, and to provide backward compatibility with a previous release of the software.)
cond
Optional. A Boolean ECMAScript expression. The no-match handling routine is executed if and only if the expression evaluates to true. The default is true.
Usage Guidelines
The <nomatch> element catches all events of type nomatch. If multiple no-match handlers are installed or inherited, the handler is selected according to the procedure described for event handling in [13]. This element is equivalent to <catch event="nomatch">. For a list of supported events, please see the section Events on page 24.
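A minimal sketch of no-input and no-match handlers attached to a field follows. The field name, the built-in digits type, and the prompt file names are assumptions made for illustration; the cond expression uses the application.lastresult$.inputmode shadow variable described in this guide.

```xml
<field name="pin" type="digits">
  <prompt><audio src="prompts/enter_pin.wav"/></prompt>

  <!-- Handler selected for no-input events, using the count attribute. -->
  <noinput count="1">
    <prompt><audio src="prompts/did_not_hear.wav"/></prompt>
  </noinput>

  <!-- Runs only when the cond expression evaluates to true. -->
  <nomatch cond="application.lastresult$.inputmode == 'dtmf'">
    <prompt><audio src="prompts/invalid_pin.wav"/></prompt>
  </nomatch>
</field>
```

Regardless of the handlers installed, the session terminates after 5 occurrences of either event, as described for the count attribute.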
<one-of>
[SRGS] Allows one selection from a list of alternatives.
Parent element: <item>, <rule>
Child elements: <item>
Attributes
xml:lang
Optional for voice grammars; ignored for DTMF grammars. The language to be used for the entire grammar. The interpretation of the value associated with xml:lang is managed and verified by the speech server. A value for xml:lang specified here overrides any value for xml:lang that may have been specified at a higher level and applies to all elements below this element.
Usage Guidelines
The <one-of> element identifies a set of alternative options that are mutually exclusive. The media server supports at most two levels of nested <one-of> elements. Deeper nesting results in the grammar being rejected, in which case an error.badfetch is thrown and the session terminated.
<option>
Provides a simple method for specifying grammars.
Parent element: <field>
Child elements: None.
Attributes
accept
Ignored.
dtmf
Optional. Specifies a simple DTMF sequence for user input collection and handling. White space is permitted in the DTMF sequence specification; for example, "1234#" and "1 2 3 4 #" are treated as equivalent. There is no default. Generic DTMF recognition properties (that is, interdigittimeout, termtimeout, and termchar) apply. For more information about DTMF properties, please see Chapter 2: VoiceXML Properties.
value
Optional. Specifies a string to be assigned to the <field> name variable when this option is selected. By default, the value of the dtmf attribute is used.
Usage Guidelines
The <option> element provides a relatively simple way to specify grammars for collecting and processing user input. Simple DTMF or speech sequences can be specified within this element, rather than specifying a complex grammar. An <option> grammar can concurrently define both a DTMF and a speech grammar in much the same way a <choice> element does. The value attribute is assigned to the result of the collection, based on the option that was matched. Example 4-2 shows a VoiceXML script defining an <option> grammar enabled for both DTMF and speech. For DTMF, the values 1, 2, and 3 will result in the <filled> element being executed. For speech, the words "Vancouver", "New York", or "Paris" will result in the <filled> element being executed.
Example 4-2 <option> Grammar Example
<form>
  <field name="city">
    <prompt>
      Please select a city you would like to visit.
      <enumerate/>
    </prompt>
    <option dtmf="1" value="vancouver"> Vancouver </option>
    <option dtmf="2" value="newyork"> New York </option>
    <option dtmf="3" value="paris"> Paris </option>
    <filled>
      <submit next="/cgi-bin/flyto.cgi" method="post" namelist="city"/>
    </filled>
  </field>
</form>
Example 4-3 shows an XML-SRGS grammar that is equivalent to the one shown in Example 4-2. The grammar shown in Example 4-3 would be passed to the external speech server for evaluation while the grammar shown in Example 4-2 would be parsed and processed within the media server.
Example 4-3 XML-SRGS Grammar
<grammar mode="voice" version="1.0" root="optionRoot">
  <rule id="optionRoot" scope="public">
    <one-of>
      <item> Vancouver </item>
      <item> New York </item>
      <item> Paris </item>
    </one-of>
  </rule>
</grammar>
<p>
[SSML] Represents a paragraph.
Parent element: <speak>
Child elements: <audio>, <break>, <emphasis>, <mark>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>
Attributes
xml:lang
Mandatory. Specifies the language of the paragraph.
Usage Guidelines
The use of the <p> element is optional. Where text occurs without an enclosing <p> or <s> element, the synthesis processor attempts to determine the structure using language-specific knowledge of the format of plain text.
For some speech servers: Some TTS servers running MRCP v1 ignore the xml:lang language attribute. The server always speaks English regardless of the value of attribute xml:lang in <speak>, <p>, <s>, and <voice> elements.
<param>
Defines a parameter to a subdialog.
Parent element: <subdialog>
Child elements: <param>
Attributes
name
Mandatory. Specifies the name of the parameter to be used in the <subdialog> element.
value
The value to be assigned to the parameter within the <subdialog> element. Exactly one of value and expr must be specified.
expr
An ECMAScript expression resulting in the value to be assigned to the parameter within the <subdialog> element. Exactly one of value and expr must be specified.
valuetype
Optional. Specifies, only to an <object> within a <subdialog> element, whether the value is of type data or type ref. Since the media server only supports type data, any other value is ignored.
type
Optional. Specifies the media type, if the valuetype is ref. Since the media server only supports a valuetype of data, the only supported value for type is data; any other value is ignored.
Usage Guidelines
The <param> element allows parameters to be passed to subdialogs. Nesting of <param> elements is not supported.
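The sketch below passes two parameters to a subdialog, one through value and one through expr. The subdialog URI, the parameter names, and the variable acct are hypothetical; as in standard VoiceXML, the called dialog is assumed to declare variables with matching names.

```xml
<form id="caller">
  <!-- Hypothetical account number held in a form-level variable. -->
  <var name="acct" expr="'12345'"/>
  <subdialog name="verify" src="verify.vxml">
    <!-- Literal value: exactly one of value and expr is given. -->
    <param name="retries" value="3"/>
    <!-- ECMAScript expression evaluated in the calling dialog. -->
    <param name="account" expr="acct"/>
  </subdialog>
</form>
```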
<phoneme>
[SSML] Provides a phonemic/phonetic pronunciation for the contained text.
Parent element: <speak>
Child elements: None.
Attributes
ph
Mandatory. Specifies the phoneme/phone string.
alphabet
Optional. Specifies the phonemic/phonetic alphabet, which in this context refers to a collection of symbols to represent the sounds of one or more human languages. Supported values are vendor-specific.
Usage Guidelines
The <phoneme> element provides a phonemic/phonetic pronunciation for the contained text. The phoneme element may be empty. However, it is recommended that the element contain human-readable text that can be used for non-spoken rendering of the document. For example, the content may be displayed visually for users with hearing impairments.
For some speech servers: The ph attribute is specified as a mandatory parameter for the <phoneme> element. However, the speech server accepts and processes the element within an SSML string without the ph attribute.
<prompt>
Specifies media output to be played to a user.
Parent element: <block>, <catch>, <error>, <field>, <filled>, <help>, <if>, <menu>, <noinput>, <nomatch>, <record>, <subdialog>
Child elements: <audio>, <break>, <say-as>
Attributes
bargein
Optional. Specifies whether the audio prompt can be interrupted (barged) by DTMF or speech input. Supported values are as follows:
true: The prompt is bargeable, and DTMF or speech input will interrupt play. If any digits remain in the digit buffer at the time this element is executed, the clip is barged immediately and will not play.
false: The prompt is not bargeable. Any digits currently in the digit buffer are cleared, and any digits received while clip(s) are playing are discarded.
If not set, the value set for the bargein property applies. For information on the bargein property, please see Chapter 2: VoiceXML Properties. The setting of the bargein attribute can interact with the setting of the fax detection property com.cvd.faxdetect. For that information, please see Chapter 2: VoiceXML Properties.
bargeintype
Ignored.
cond
Optional. A Boolean ECMAScript expression. The prompt is played if and only if this expression evaluates to true. The default is true.
count
Optional. The number of times the form item can be visited for the prompt to be played. The default is 1.
timeout
Optional. An interval after which, if initial DTMF user input has not been received, a noinput event is thrown. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31 - 1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31 - 1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The default is 10s.
cvd:vcrprompt
RadiSys extension. Optional for audio clips only; forbidden for TTS or multimedia clips. Specifies whether <promptcontrol> actions are active for this prompt. Supported values are as follows: true: Prompt controls are active for this prompt. false: Prompt controls are not active for this prompt. Any specified TTS prompts are ignored; they are neither queued nor played. Multimedia clips are played but return an error on play. Any other value results in the session terminating with an error.semantic event. The default is false.
cvd:cleardb
RadiSys extension. Optional. Flushes digits from the digit buffer. Supported values are as follows: true: All digits currently in the digit buffer will be cleared prior to playing the requested prompt. Digits are cleared independent of any value set for the bargein attribute. For details about the interaction between the cvd:cleardb attribute and the bargein attribute, please see Table 4-4 in the Usage Guidelines. false: The digit buffer is not cleared before playing the requested prompt. Any other value results in the session terminating with an error.semantic event. The default is false. This parameter does not apply to speech; only DTMF input is buffered.
cvd:varprompt
RadiSys extension. Mandatory if the prompt contains a <say-as> element; ignored otherwise. Specifies whether the variable prompt specified in the <say-as> element is to be played by the external TTS server or using the media server's built-in sets and variables processor. Supported values are as follows:
tts: An external TTS server plays the prompt. If this value is specified but no external TTS server is configured, the varprompt attribute is ignored.
sv: The media server plays the prompt using its internal sets and variables processor.
xml:lang
Optional if the prompt contains a <say-as> child element; ignored otherwise. Specifies the language to be used in rendering the prompt. If not specified, the value specified in the xml:lang attribute within the VoiceXML root document is used. If the media server's sets and variables processor is to be used to render the variable, the only supported value is en (English). In this case, specifying an unsupported language (that is, any language other than en) causes an error.unsupported.language event. If an external TTS server is to be used to render the variable, the language value is not inspected by the media server, but is passed directly to the external TTS server.
xml:base
Optional. Allows a base URI to be defined. If set, any relative URIs within the prompt specification are resolved using this base URI. Otherwise, any relative URIs are resolved using the base URI specified within the <vxml> element. Note that a base URI can only be applied to the relative URI specified within a src attribute. It cannot be applied to a URI resulting from evaluation of an ECMAScript expression (that is, an expr attribute).
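As a rough illustration of how the standard <prompt> attributes combine, the sketch below uses bargein, timeout, and count on two prompts in one field. The field name, the built-in digits type, and the audio file names are hypothetical; the RadiSys cvd: extension attributes are left out because their namespace declaration is not covered in this section.

```xml
<field name="account" type="digits">
  <!-- First visit: bargeable prompt with a 7.5-second no-input timeout. -->
  <prompt count="1" bargein="true" timeout="7.5s">
    <audio src="prompts/enter_account.wav"/>
  </prompt>
  <!-- Later visits: a shorter reminder that cannot be barged. -->
  <prompt count="2" bargein="false" timeout="5s">
    <audio src="prompts/enter_account_short.wav"/>
  </prompt>
</field>
```

Both timeout values follow the <number><unit> format, with no space between the number and the unit.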
Shadow Variables
Whenever a prompt completes (with the exception of a user hang-up), a number of application-level scoped shadow variables are populated. These shadow variables provide the VoiceXML application with information about the last prompt played. Note that if the session terminates as the result of a SIP BYE, the shadow variables are not updated with information about the prompt. In this case, numeric variables report 0 and the lasturl variable reports undefined. For TTS clips, only the bargein variable is populated. All numeric variables report 0, and string variables report undefined.
Table 4-2 shows the shadow variables defined to provide information about prompt completion.
Table 4-2 Prompt Completion Shadow Variables
Shadow Variable: application.cvd_lastprompt$.bargein
Description: RadiSys extension. Indicates whether the prompt was barged or not. Supported values are as follows:
true: The prompt was barged.
false: The prompt was not barged.
Shadow Variable: application.cvd_lastprompt$.duration
Description: RadiSys extension. The amount of time, in milliseconds, consumed by the last prompt played. This is the total amount of time for the last clip, or set of clips played. If the prompt was barged, then this represents the time up to the point of being barged. Although the duration includes all clips specified for the prompt, it does not include pauses that are a result of user-defined pause/resume sequences. It does, however, include any silence included as a result of using the <break> element. For multimedia clips containing both audio and video components, the duration represents the larger of the video or audio components. For example, if audio played for 5400 milliseconds and video played for 4800 milliseconds, the duration parameter reports 5400 milliseconds. If the clip fails to start playing for any reason, the value of this variable is 0. If the prompt terminates because the user hangs up, the value of this variable is 0.
Shadow Variable: application.cvd_lastprompt$.lasturl
Description: RadiSys extension. A string identifying the URL of the last audio or multimedia file played. If the prompt consisted of a set of multiple clips, and the sequence was interrupted as a result of a DTMF digit, then the value of this variable will be the URL of the file that was playing when the digit was received. Note that the value of this variable will be undefined if no clips have been played. This includes the case where a clip is barged before it starts, and the case where a clip is stopped immediately after starting because of type-ahead digits remaining in the digit buffer. If the prompt terminates because the user hung up, the value of this variable will be undefined.
Shadow Variable: application.cvd_lastprompt$.lasturl_offset
Description: RadiSys extension. The position in the last clip being played when clip playing terminated. Unlike the application.cvd_lastprompt$.duration shadow variable, this is not the amount of time for which the last clip played, but rather the position in the file when clip playing completed. For clips that have no associated VCR controls, this value will likely be the same as the duration of play. However, for clips that have used the media control seek action, the actual duration of clip play and position in the file may not be the same. If the prompt terminates because the user hung up, the value of this variable will be 0. This shadow variable is defined only for audio clips. For multimedia clips containing both audio and video components, the offset represents the larger of the video or audio components. For example, if audio played for 5400 milliseconds and video played for 4800 milliseconds, the offset parameter reports 5400 milliseconds.
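A minimal sketch of reading the prompt completion shadow variables follows. It assumes the variables are read in a <filled> handler, after the field's prompt has completed; the field name, the built-in digits type, the audio file, and the log label are hypothetical.

```xml
<field name="pin" type="digits">
  <prompt bargein="true"><audio src="prompts/enter_pin.wav"/></prompt>
  <filled>
    <!-- Logs whether the prompt was barged, how long it played, and which file played last. -->
    <log label="lastprompt">barged=<value expr="application.cvd_lastprompt$.bargein"/> duration_ms=<value expr="application.cvd_lastprompt$.duration"/> url=<value expr="application.cvd_lastprompt$.lasturl"/></log>
  </filled>
</field>
```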
There is also a family of shadow variables (the application.lastresult$.value shadow variables) defined in [13], which can be used to reference the information resulting from DTMF collection. These are shown in Table 4-3.
Table 4-3 DTMF Collection Variables
Shadow Variable: application.lastresult$.interpretation
Description: Contains the last set of collected input with the following exceptions:
For Boolean type, the variable contains true or false if there was a match. Contains the digits otherwise.
For Currency type and DTMF, the asterisk (*) is converted to a period (.) in match cases. For example, input 1*23 is converted to 1.23. For no match cases the literal digits are assigned.
For speech, contains the value returned from the speech server interpreting the input.
Shadow Variable: application.lastresult$.utterance
Description: Contains the raw input that was received. In the example given above for Currency, the variable would report 1*23 and not 1.23. For Boolean variables, the variable would report what was entered and not true or false. For most other cases utterance and interpretation will be the same.
Shadow Variable: application.lastresult$.inputmode
Description: Contains the input mode. This is either dtmf or voice, depending on which grammars were active and which one produced the event. This shadow variable is initialized with a value of undefined and updated later with the actual value (dtmf or voice) only when the <grammar> element is executed.
Shadow Variable: application.lastresult$.confidence
Description: A value in the range of 0.0 to 1.0 representing the confidence that the input was correctly interpreted. For DTMF this is always 1.0. For speech grammars this value is returned by the speech server for a voice collection.
Shadow Variable: application.cvd_lastresult$.termcond
Description: RadiSys extension. Indicates why the last collection terminated. Prior to collection, the value of this variable is undefined. For DTMF collections, there are six termination cases:
Termchar. The user pressed the defined termination character. In this case, the variable contains the defined termchar (pound sign # by default).
Timeout. Collection ended as a result of either an interdigit timeout, a prompt timeout, or a term timeout. In this case, the variable contains the value T.
Fixed-length match. Collection ended because n digits were expected and n digits were received. In this case, the variable contains FLM. Note that FLM does not necessarily indicate a match occurred, just that the expected number of digits were received. For information about maximum length and how it applies to grammars, please see Working with Media Files and TTS Strings on page 28.
Impossible match. Collection ended because the expected input was all digits and a non-digit non-termchar character was received. In this case, collection ends immediately and the variable contains IM.
User hang-up. Collection ended because the caller hung up. In this case, the variable contains Hangup. As with all values for collection shadow variables, this value is set only if the hang-up actually stops collection. If the hang-up occurs during the prompt prior to collection or at some other time, then the value will remain undefined.
Fax. Collection ended because a fax tone was detected. In this case, the variable contains FAX. As with all values for collection shadow variables, this value is set only if the fax tone actually stops collection. If the fax tone occurs during the prompt prior to collection or at some other time, then the value will remain undefined.
For voice collections the terminating condition is always timeout (T) for matches, or impossible match (IM) for all no-match conditions.
Shadow Variable: application.cvd_lastresult$.faxtype
Description: RadiSys extension. Contains the type of fax event that occurred. This is either CED or CNG. The variable reports undefined in the absence of a fax event.
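The sketch below shows one way an application might inspect the collection shadow variables after a field is filled. The field name, the audio file, and the log labels are hypothetical, and the built-in currency field type and the <elseif> and <else> elements are assumed to behave as in standard VoiceXML.

```xml
<field name="amount" type="currency">
  <prompt><audio src="prompts/enter_amount.wav"/></prompt>
  <filled>
    <!-- Branch on the termination condition codes listed in Table 4-3. -->
    <if cond="application.cvd_lastresult$.termcond == 'T'">
      <log label="collect">collection ended on a timeout</log>
    <elseif cond="application.cvd_lastresult$.termcond == 'FLM'"/>
      <log label="collect">collection ended on a fixed-length match</log>
    <else/>
      <log label="collect">termcond=<value expr="application.cvd_lastresult$.termcond"/></log>
    </if>
    <!-- For DTMF currency input such as 1*23, utterance keeps 1*23 while interpretation reports 1.23. -->
    <log label="collect">raw=<value expr="application.lastresult$.utterance"/> interpreted=<value expr="application.lastresult$.interpretation"/></log>
  </filled>
</field>
```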
Usage Guidelines
The <prompt> element queues recorded audio, multimedia, or Text to Speech (TTS) as prompts to be played to the user. Recorded media prompts are played by embedding the <audio> element within the <prompt> element. TTS clips can be specified as SSML, or as plain text strings with embedded SSML elements in the string. The variable prompt specified in the <say-as> element is treated as a TTS string. All TTS strings (except those variable prompts to be played using the media server's built-in sets and variables subsystem) are compiled within the media server into SSML scripts and passed to the TTS speech synthesizer to be played, provided an active speech synthesizer server is configured. If no server is configured, the string is simply ignored. All attributes of the <prompt> element, with the exception of the vcrprompt attribute, apply to TTS clips in the same way that they do to prerecorded audio clips.
Prompt Controls
The vcrprompt attribute and the associated prompt controls are not supported for TTS or multimedia clips. If a TTS clip (including variable prompts to be played using the media server's built-in sets and variables subsystem) is specified within a <prompt> element that has prompt controls enabled (that is, vcrprompt is true), the TTS clip is ignored and is neither queued nor played. If a multimedia clip is specified within a <prompt> element that has prompt controls enabled, an error is returned.
Barging and Prompts

When there are multiple prompts in a sequence, the bargein attribute is honored for each prompt as that prompt is playing. However, if a prompt is barged, no subsequent prompts from the sequence will be played, regardless of their individual bargein settings.
Voice collections are set up and active prior to a prompt being played, regardless of whether or not the prompt is bargeable. Since a sequence can contain a mix of bargeable and non-bargeable prompts (which requires the media server to turn voice grammar recognition on and off), spoken input received during a non-bargeable prompt may be returned and used after the prompt completes, which is not the expected behavior.
Table 4-4 outlines the effect of the bargein and cvd:cleardb attributes on the DTMF digit buffer at the moment an audio or TTS announcement is requested, assuming that DTMF collection is active. If the inputmodes attribute of the <property> element is set such that DTMF collection is not active, all digits in the digit buffer are cleared when the <prompt> clips are played.
Table 4-4 Effect of Barging Announcements on the Digit Buffer
| bargein | cvd:cleardb | Digit Buffer | Behavior |
| --- | --- | --- | --- |
| True | True | Empty | The announcement is started. All digits received before the announcement completes are stored in the digit buffer. |
| True | True | Contains digits | The digit buffer is cleared and the announcement is played. Received digits are stored in the digit buffer until the announcement complete notification is received. Note that in this case cvd:cleardb=true is overriding bargein=true. |
| True | False | Empty | The announcement is started. All digits received before the announcement complete notification are stored in the digit buffer. |
| True | False | Contains digits | The announcement is immediately barged. The media server transitions to a digit collection state to evaluate any digits remaining in the digit buffer. |
| False | True | Empty | The announcement is started. Any digits received before the announcement completes are discarded. |
| False | True | Contains digits | The digit buffer is cleared and the announcement is started. Any digits received before the announcement completes are discarded. |
| False | False | Empty | The announcement is started. Any digits received before the announcement completes are discarded. |
| False | False | Contains digits | Digits are cleared from the buffer and the announcement is started. Any digits received before the announcement completes are discarded. |
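The combinations in Table 4-4 can be pictured with a short prompt sequence. The fragment below is only a sketch: it assumes that cvd:cleardb is written on the <prompt> element with the values true and false, that the cvd: prefix is bound on the enclosing <vxml> element, and that the field name and audio URIs (which are placeholders) are replaced with real ones.

```xml
<field name="account">
  <!-- bargein=false, cvd:cleardb=true: buffered digits are cleared and
       digits pressed while this notice plays are discarded -->
  <prompt bargein="false" cvd:cleardb="true">
    <audio src="http://media.example.com/prompts/notice.wav"/>
  </prompt>
  <!-- bargein=true, cvd:cleardb=false: if digits are already buffered, the
       announcement is barged immediately; otherwise it starts and digits
       received before it completes are stored in the digit buffer -->
  <prompt bargein="true" cvd:cleardb="false">
    <audio src="http://media.example.com/prompts/enter_account.wav"/>
  </prompt>
  <!-- DTMF grammar for the field omitted for brevity -->
</field>
```

Because a barged prompt suppresses the rest of the sequence, the order of the prompts matters as much as their individual bargein settings.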
<promptcontrol>
Specifies media controls for user prompt manipulation.
Parent element: <field>, <form>, <menu>, <vxml>
Child elements: <controlcmd>
Attributes

Usage Guidelines
The <promptcontrol> element allows you to define VCR-like controls for the playback of audio files. Prompt controls are not supported for TTS clips. The <promptcontrol> element encloses the <controlcmd> element, which specifies a set of DTMF inputs and associated actions controlling the play of the specified audio. Voice inputs for prompt controls are not supported. The scope of the <promptcontrol> element and the setting of the vcrprompt attribute of the <prompt> element determine when prompt control actions are in effect.
The media server supports the following prompt controls:
Pause/resume
Skip forward/skip backward
Volume up/volume down
<property>
Sets the value of a property.
Parent element: <field>, <form>, <menu>, <record>, <subdialog>, <vxml>
Child elements: None.
Attributes
name
Mandatory. The name of the property being updated. Unrecognized properties are ignored. There is no default.
value
Mandatory. The new value for the property. The range of values depends on the property. Specifying an invalid value for the property will result in an error.semantic. For information about the valid values for supported VoiceXML properties, please see Chapter 2: VoiceXML Properties.
Usage Guidelines
The <property> element allows an application to modify the value associated with a property. For a description of supported properties, please see Chapter 2: VoiceXML Properties. The scope of the property's value is inherited from the parent element, and applies to all child elements. The lowest-level value assignment for the property overrides all higher-level assignments. If no value is explicitly assigned, the default property value is used whenever required.
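The scoping rule can be sketched as follows. The example uses the timeout and inputmodes properties mentioned elsewhere in this chapter; the form id, field names, and the specific values are illustrative assumptions, not recommended settings.

```xml
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-level value: applies to every dialog below unless overridden -->
  <property name="timeout" value="5s"/>
  <form id="main">
    <field name="choice">
      <!-- Field-level values: override the document-level setting for this field only -->
      <property name="timeout" value="10s"/>
      <property name="inputmodes" value="dtmf"/>
      <!-- prompts and grammar omitted for brevity -->
    </field>
    <field name="confirm">
      <!-- No local setting: this field inherits timeout=5s from the <vxml> level -->
      <!-- prompts and grammar omitted for brevity -->
    </field>
  </form>
</vxml>
```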
<prosody>
[SSML] Permits control of the pitch, speaking rate, and volume of the speech output.
Parent element: <speak>,
Child elements: None.
Attributes
pitch
Optional. The baseline pitch for the contained text.
contour
Optional. Sets the actual pitch contour for the contained text.
range
Optional. The pitch range (variability) for the contained text.
rate
Optional. The change in the speaking rate for the contained text.
duration
Optional. The desired time to take to read the element contents.
volume
Optional. The volume for the contained text in the range 0.0 to 100.0.
Usage Guidelines
The <prosody> element permits control of the pitch, speaking rate, and volume of the speech output. Although each attribute is individually optional, it is an error if no attributes are specified when the <prosody> element is used.
For some speech servers:
All values associated with the pitch attribute are ignored in elements supporting this attribute.
All values associated with the duration attribute are ignored in elements supporting this attribute.
The contour, duration, pitch, and range attributes of the <prosody> element are ignored.
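A minimal sketch of <prosody> in a TTS prompt is shown below. It assumes that a TTS server is configured and that the speech server honors rate and volume; the prompt text and attribute values are illustrative only. Given the restrictions listed above, an application that must behave the same across speech servers may prefer to rely on rate and volume alone.

```xml
<prompt>
  <!-- At least one attribute must be present; pitch, contour, range, and
       duration are left out because some speech servers ignore them -->
  <prosody rate="slow" volume="80.0">
    Your appointment is confirmed.
  </prosody>
</prompt>
```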
<record>
Records user audio, video, or multimedia to a file.
Parent element: <form>
Child elements: <audio>, <catch>, <error>, <filled>, <grammar>, <help>, <noinput>, <nomatch>, <prompt>, <property>
Attributes
name
Mandatory. Specifies the name of a variable that will hold the recording. This name will be used as an internal reference to the file after the recording is complete. To play the recorded file, reference this variable name. The format is an XML restrictedVariableName token, which is composed of alphabetic characters, digits, colon, and hyphen. The name may not begin with underscore (_) or contain a period (.). In addition, the name must follow ECMAScript variable naming conventions and may not include ECMAScript reserved words. The name must be unique across all <record> elements within the same scope. Note that recordings stored internally are transient, and are deleted at the end of the session. To store recorded audio persistently, you must specify an external NFS or HTTP server. Unless you specify otherwise (using the cvd:dest or cvd:destexpr attribute), all recordings are internal and transient.
expr
Optional. An ECMAScript expression representing the initial value of the name variable. If initialized to a value, the recording will not start unless the name variable is cleared. The default is the ECMAScript value undefined.
cond
Optional. A Boolean ECMAScript expression. The recording is started if and only if this expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
modal
Ignored.
beep
Optional. Specifies whether to play a short fixed beep tone just prior to beginning the recording. The location of this beep tone is configurable using the media server's management interface. Supported values are as follows:
true: The beep tone will be played just prior to recordings.
false: No beep tone is played before recordings.
The default is false.
maxtime
Optional. Specifies a maximum recording time. If reached, the recording is terminated. In this case, the shadow variable name$.maxtime is set to true.
finalsilence
Optional. Specifies the duration of post-speech silence time which, if exceeded, will terminate the recording. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31 - 1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31 - 1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The default is 5s. Note that a finalsilence value of 0 specifies that no post-speech trimming should be performed on the recording. This applies to both externally and internally recorded files.
dtmfterm
Optional. Specifies whether a DTMF key press can terminate the recording. Supported values are as follows:
true: The recording will be terminated if any DTMF key is pressed, provided that the inputmodes property is set to dtmf or dtmf voice. If the inputmodes property is set to voice, DTMF key presses are ignored and the recording is not stopped.
false: DTMF key presses will not terminate the recording.
The default is true, so that by default, any DTMF key press will terminate the recording. The setting of the dtmfterm attribute can interact with the setting of the fax detection property com.cvd.faxdetect. For that information, please see Chapter 2: VoiceXML Properties.
format
Optional. Specifies the file type and encoding scheme for the recording. Supported formats are shown in Table 4-6 on page 137. The default is audio/wav.
cvd:dest
Optional. RadiSys extension. Specifies the destination for a recording for either of two cases:
1 External recording. Specifies the URI of an external NFS or HTTP server where the audio recording is to be stored persistently. The URI must conform to the guidelines given for specifying external recordings described in Working with Media Files and TTS Strings on page 28. The recording is made in real time to the specified URI.
2 Appending to an existing recording. If cvd:append is set to true, the media server appends the recording to an existing recording referenced by cvd:dest. The existing recording may be either internal or external. For details on appending recordings, please see the Usage Guidelines, below.
Only one of cvd:dest and cvd:destexpr may be specified.
cvd:destexpr
Optional. Specifies an ECMAScript expression that evaluates to the URI of an external NFS or HTTP server where the audio recording is to be stored persistently. The URI must conform to the guidelines given for specifying external recordings described in Working with Media Files and TTS Strings on page 28. An error.semantic is thrown and the session is terminated if the expression evaluates to the ECMAScript value undefined. Only one of cvd:dest and cvd:destexpr may be specified.
cvd:append
Optional. Directs the media server to append this recording to the existing recording specified by cvd:dest. For details on appending recordings, please see the Usage Guidelines, below. Valid only for audio files; append is not supported for files containing video. If this attribute is specified for a file containing multimedia, an error.badfetch is thrown.
Shadow Variables
A shadow ECMAScript variable is created for each recording. The shadow variable is name$ where name is the name specified by the name attribute. At the end of the recording, information about the recording, such as its total length, is available to the VoiceXML application. Table 4-5 shows the shadow variables defined to provide information about recordings.
Table 4-5 Recording Shadow Variables
| Shadow Variable | Description |
| --- | --- |
| name$.duration | Contains the length of the recording in milliseconds. The length reported includes the length of all announcements played plus any silence played between them. When appending to an existing audio recording, the duration amount indicates the length of just the appended portion. |
| name$.size | Contains the length of the recording in bytes. When appending to an existing recording, the size amount indicates the size of just the appended portion. |
| name$.termchar | Contains the DTMF termination key, if a DTMF termination key was specified at the time of start of recording and if the recording was terminated as a result of detecting the termination key. Detection of a fax tone terminating the record results in the termchar shadow variable being set to F. |
| name$.maxtime | Indicates whether the recording was terminated as a result of reaching maximum recording time. Supported values are as follows: true: The recording terminated as a result of reaching the maximum allowed time. false: Reaching the maximum allowed time was not the reason for termination. |
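The shadow variables in Table 4-5 might be read in a <filled> handler along the following lines. This is a sketch under assumptions: the form id and the endReason and lengthMs variables are hypothetical, and msg$.maxtime is assumed to be an ECMAScript Boolean, as in the VoiceXML 2.0 specification.

```xml
<form id="leave_message">
  <!-- endReason and lengthMs are hypothetical application variables -->
  <var name="endReason" expr="'other'"/>
  <var name="lengthMs" expr="0"/>
  <record name="msg" maxtime="30s" beep="true">
    <filled>
      <!-- Length of the recording in milliseconds -->
      <assign name="lengthMs" expr="msg$.duration"/>
      <if cond="msg$.maxtime">
        <!-- The recording was cut off at the maximum recording time -->
        <assign name="endReason" expr="'maxtime'"/>
      <elseif cond="msg$.termchar == 'F'"/>
        <!-- A fax tone terminated the recording -->
        <assign name="endReason" expr="'fax'"/>
      </if>
    </filled>
  </record>
</form>
```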
Usage Guidelines
The <record> element allows a user audio, video, or multimedia recording to be made. Recorded media is assigned a variable name using the name attribute. This name can be referenced within the <audio> element to play back the recorded media.
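A simple record-and-playback sequence could look like the sketch below. It assumes that the recording variable is referenced through the expr attribute of <audio>, as in standard VoiceXML 2.0; the form id, the variable name greeting, and the attribute values are illustrative.

```xml
<form id="greeting_setup">
  <!-- Internal (transient) recording held in the variable "greeting" -->
  <record name="greeting" maxtime="15s" beep="true" finalsilence="3s"/>
  <block>
    <!-- Play the recording back by referencing the variable named in the name attribute -->
    <prompt>
      <audio expr="greeting"/>
    </prompt>
  </block>
</form>
```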
Storage of Recorded Files

The recorded file may be stored internally on the media server or streamed in real time to an external server. Audio files can be streamed to either an HTTP or an NFS server; multimedia streaming is supported only for NFS servers. Internal recordings are transient and are automatically deleted from the media server when the VoiceXML session terminates. If a recording is to be saved, it can be posted to an HTTP server following completion of an internal recording using the <submit> element, which uses the HTTP POST method. Note that while multimedia streaming is only supported for NFS servers, multimedia files recorded internally can be posted to an HTTP server.
Memory for internal recordings is limited, and it is recommended that longer recordings be streamed to an external server. External recordings use the HTTP PUT method, which permits real-time transfer while the recording session is in progress. The destination is specified using either the cvd:dest or the cvd:destexpr attribute.
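The two storage options described above can be sketched as two separate forms. Everything specific here is an assumption for illustration: the form ids, the variable names, and both URIs are hypothetical, the cvd: prefix is assumed to be bound on the enclosing <vxml> element, and the <submit> attributes shown are the standard VoiceXML 2.0 ones rather than values confirmed for this media server.

```xml
<!-- Option 1: stream the recording in real time to an external server -->
<form id="voicemail">
  <record name="msg" maxtime="120s"
          cvd:dest="http://storage.example.com/recordings/msg0001.wav"/>
</form>

<!-- Option 2: record internally, then POST the transient recording
     before the session ends -->
<form id="feedback">
  <record name="comment" maxtime="30s"/>
  <block>
    <submit next="http://app.example.com/upload" method="post"
            enctype="multipart/form-data" namelist="comment"/>
  </block>
</form>
```

Option 1 suits long recordings because it avoids the limited internal memory; Option 2 keeps post-speech trimming on the media server.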
Size of Streamed Files

Because the recording is made in real time, the media server cannot know in advance the size of the file. Therefore, in the WAV header, the media server sets the file size to the maximum (FFFF). In addition, recordings streamed using this method will not have post-speech silence trimmed once on the external server. It is the responsibility of the server resource handling the request to trim post-speech silence (if desired) and adjust the file size in the WAV header. This can be done using (for example) a CGI script on the HTTP server. Note also that, for the same reason, unless an appropriate CGI script is provisioned on the HTTP server, the media server will not be able to detect failed recordings. For more information on the type of CGI script that should reside on HTTP servers, please see the material for setting up HTTP servers to interoperate with the media server, in the Convedia Media Server Guide to Working with External Servers and Peripherals.
Encoding of Recordings
Recordings are encoded as either G.711 or G.729 for audio files, and as QuickTime or 3GP format for video files. The format of the recording can be specified using the format attribute. If not specified, the format of the recording will be that configured as default, which is set using the media server's management interface. Table 4-6 shows the encoding formats supported for recordings. The format must be entered exactly as shown; in particular, no spaces or other characters are permitted other than those shown. The codecs parameter used by 3GPP MIME types is defined in RFC 4281 [13].
Table 4-6 Supported Encoding Formats for Recordings
| Format | Description |
| --- | --- |
| audio/wav | PCMU-encoded WAV file. (Audio-only.) |
| audio/x-wav | PCMU-encoded WAV file. (Audio-only.) |
| audio/vnd.wave; codec=1 | PCM. (Audio-only.) |
| audio/vnd.wave; codec=6 | G.711 a-law-encoded WAV file. (Audio-only.) |
| audio/vnd.wave; codec=7 | G.711 u-law-encoded WAV file. (Audio-only.) |
| audio/vnd.wave; codec=83 | G.729 Annex A-encoded WAV file. (Audio-only.) |
| video/quicktime; codecs=h263 | QuickTime file with H.263-encoded video. (Video-only.) |
| video/quicktime; codecs=h263, alaw | QuickTime file with H.263-encoded video and G.711 a-law-encoded audio. (Multimedia.) |
| video/quicktime; codecs=h263, ulaw | QuickTime file with H.263-encoded video and G.711 u-law-encoded audio. (Multimedia.) |
| audio/quicktime; codecs=alaw | QuickTime file with G.711 a-law-encoded audio. (Audio-only.) |
| audio/quicktime; codecs=ulaw | QuickTime file with G.711 u-law-encoded audio. (Audio-only.) |
| video/3gpp;codecs=s263,samr | 3GPP file with H.263-encoded video and AMR-encoded audio. (Multimedia.) Note that order matters and extra spaces are not allowed. |
| video/3gpp;codecs=s263 | 3GPP file with H.263-encoded video. (Video-only.) |
| audio/3gpp;codecs=samr | 3GPP file with AMR-encoded audio. (Audio-only.) |
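For illustration, a multimedia recording request using one of the format strings from Table 4-6 might be written as below. The recording name and maxtime value are hypothetical; the format string is copied character for character from the table, since the order of the codecs and the absence of spaces are significant for the 3GPP types.

```xml
<!-- 3GPP multimedia recording: H.263 video with AMR audio -->
<record name="videomsg" maxtime="60s" format="video/3gpp;codecs=s263,samr"/>
```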
Stopping Recordings with DTMF

By default, any DTMF key press will stop the recording. To keep DTMF from terminating a recording, set the dtmfterm attribute to false (where the inputmodes property is set to dtmf or dtmf voice). If the inputmodes property is set to voice, DTMF is ignored and will not terminate the recording.
Currently, recordings cannot be terminated by barging with speech. Also, note that since active grammars are not supported during recording, adding a voice-based grammar does not stop recording.
All record errors are fatal for the particular session. This includes all attribute specification errors, errors that are reported in the process of performing the actual recording, and errors posting the recordings to an external server for the case of internal recordings.
Best practice: The application should allow for the possibility that support for active grammars during recording is added in the future. At that time, if dtmfterm is set to false, DTMF input will still terminate the recording if the DTMF input matches an active grammar. To ensure that DTMF never ends a recording, set dtmfterm to false AND ensure there is no local active grammar. Following this practice ensures that applications are not affected should active grammars become supported.
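A recording that DTMF key presses should never interrupt could be requested roughly as follows. The form id, recording name, and timing values are hypothetical; the sketch simply combines dtmfterm set to false with the absence of any local grammar.

```xml
<form id="dictation">
  <!-- DTMF collection remains active for the form -->
  <property name="inputmodes" value="dtmf"/>
  <!-- dtmfterm=false: key presses do not end the recording -->
  <record name="memo" maxtime="60s" finalsilence="4s" dtmfterm="false">
    <!-- No <grammar> child is defined, so there is no local active grammar
         that DTMF input could match -->
  </record>
</form>
```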
Setting a Pre-Speech Timer

A pre-speech timer is a timer, associated with a recording, that represents the amount of time the media server should wait before assuming the recording will not start. A pre-speech timer cannot be explicitly set through a <record> element attribute; however, it can be set using the timeout property. The value used for a pre-speech timeout is the value of the property in the current scope at the time the recording is made. For audio-only recordings the timeout property represents the time to wait for speech.
For recordings containing videos, including multimedia recordings, the timeout property represents the time to wait for the first video I-frame. A noinput event thrown for a multimedia recording always means that the I-frame was not received in the time specified by the timeout property at the time the recording was made.
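As a sketch of the mechanism, the pre-speech timer can be set by scoping the timeout property to the <record> element and handling the resulting noinput event. The recording name, the 7-second value, and the audio URI are placeholders.

```xml
<record name="greeting" maxtime="20s">
  <!-- Pre-speech timer: wait up to 7 seconds for speech (audio-only)
       or for the first video I-frame (video and multimedia) -->
  <property name="timeout" value="7s"/>
  <prompt>
    <audio src="http://media.example.com/prompts/record_greeting.wav"/>
  </prompt>
  <noinput>
    <!-- Thrown when the pre-speech timer expires; play the prompt again -->
    <reprompt/>
  </noinput>
</record>
```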
Trimming Post-Speech Silence

The CMS automatically removes any post-speech silence for internally recorded files based on the value of the finalsilence attribute. For HTTP externally recorded files there is no automatic trimming of post-speech silence. To remove post-speech silence from streamed recordings, you must define a CGI script on the HTTP server. For details on setting up an HTTP server to interoperate with the media server, please see the User Guide for your media server.
Appending to a Recording
The media server supports appending to an existing recording, for internal files or files stored on NFS servers. This mechanism is not supported for recordings on HTTP servers, and it is not supported for files containing video; if attempted for either of these, an error.badfetch is thrown. The append function is enabled by setting the cvd:append attribute to true. When you append to an existing recording, you essentially make a request to create a new recording, which consists of the original recording plus the appended audio. Because recording names must be unique within a session, the name of the original recording cannot be reused in the request to append; the appended file must be given a new, unique name. For example, suppose the original recording is given the name record1, using the following request to record:
<record name="record1" maxtime="10s"/>
The request to append must use a new identifier for the file that will result after appending: this is record2 in the example. The file to append to (that is, the original recording record1) is specified using the cvd:dest attribute, as follows:
<record name="record2" maxtime="10s" cvd:append="true" cvd:dest="record1"/>
The cvd:dest value in conjunction with the cvd:append=true expression notifies the VXML interpreter to record to an existing file and not to a new file. In this example, the shadow variables associated with record1 will reflect values associated with the original recording, while shadow values associated with the appended recording will be referenced using the name record2.
Also, note that in this example, the code would need to be executed in the same VXML script to ensure that the record1 variable does not go out of scope. If these operations are to span multiple documents, the value of this variable must be assigned to an application scope variable. Table 4-7 shows the recording behavior depending on the various values for cvd:append and cvd:dest (or cvd:destexpr). Note that all these cases assume that the value set for the name attribute is unique (that is, unfilled) for this session. If the name attribute is defined (filled) then the recording does not occur, as specified in [13].
Table 4-7 Summary of append Behavior
| cvd:append | cvd:dest and cvd:destexpr | Behavior |
| --- | --- | --- |
| False | Undefined | The recording is treated as a normal internal recording. |
| False | Internal recording | An error.badfetch is thrown. The cvd:dest attribute must specify an external recording unless cvd:append=true. |
| False | External recording | The recording is treated as a normal external recording. If the external recording already exists, it is overwritten. |
| True | Undefined | Creates a new recording. This is equivalent to cvd:append missing or false. |
| True | Internal recording | File exists: The current recording is appended to the existing file, provided that the internal recording variable evaluates to valid recording content. File does not exist: A new file is created and recording occurs on that new file, under the same condition. In either case, if the variable used to represent the existing internal recording does not evaluate as defined, or is incorrectly formatted, the call is rejected with an error.semantic. |
| True | External recording | File exists: The current recording is appended to the existing file. File does not exist: A new file is created and recording occurs on that new file. |
<reprompt>
Repeats a prompt for user input.
Parent element:
Child elements: None.

Usage Guidelines
The <reprompt> element allows the application to revisit an originating prompt from an event handler, such as the <catch> element. This mechanism, along with incrementing prompt counters, can be used to vary prompts to the user when user input does not match expected results.
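One way this could look in practice is sketched below, with prompt counters selecting a more detailed prompt on the second attempt. It assumes the standard VoiceXML 2.0 count attribute on <prompt>; the field name and audio URIs are placeholders, and the field's grammar is omitted.

```xml
<field name="pin">
  <!-- First attempt: short prompt -->
  <prompt count="1">
    <audio src="http://media.example.com/prompts/enter_pin.wav"/>
  </prompt>
  <!-- Second and later attempts: more detailed prompt -->
  <prompt count="2">
    <audio src="http://media.example.com/prompts/enter_pin_detailed.wav"/>
  </prompt>
  <!-- DTMF grammar omitted for brevity -->
  <nomatch>
    <!-- Revisit the field and play the prompt selected by the prompt counter -->
    <reprompt/>
  </nomatch>
  <noinput>
    <reprompt/>
  </noinput>
</field>
```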
<return>
Returns from a subdialog to the calling dialog.
Parent element:
Child elements: None.
Attributes
event
Optional. Throws the specified event in the calling dialog after the return from the subdialog. For a list of supported events, please see the section Events on page 24. There is no default. Only one of event, eventexpr, and namelist may be specified. Otherwise, an error.badfetch is thrown.
eventexpr
Optional. Throws the event resulting from evaluation of the specified ECMAScript expression in the calling dialog after the return from the subdialog. For a list of supported events, please see the section Events on page 24. There is no default. Only one of event, eventexpr, and namelist may be specified. Otherwise, an error.badfetch is thrown.
namelist
Optional. Returns the specified list of variable names to the calling dialog. Format is a space-separated list of variable names. By default, the calling context receives an empty ECMAScript object back. Note that specifying a namelist does not cause an event to be thrown.
message
Optional. Returns the specified message string, along with the event name, to the calling dialog when an event is thrown. There is no default. The message string can be accessed within the <catch> element of the calling dialog using the _message implicit variable. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
messageexpr
Optional. Returns the message string resulting from evaluation of the specified ECMAScript expression to the event handler when an event is thrown, along with the event name. There is no default. The message string can be accessed within the <catch> element of the calling dialog using the _message implicit variable. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
Usage Guidelines
The <return> element terminates the execution of a subdialog and returns control and, optionally, data to the calling dialog. The <return> element can also be used to throw an event in the calling dialog, such as a nomatch event. For example, <return event="nomatch"/> will trigger the nomatch event handler in the calling dialog. In addition, the <return> element can be used to return results to the calling dialog. For example, suppose the variable cardnumber is defined within a subdialog and populated by user input. Then <return namelist="cardnumber"/> returns the cardnumber to the calling dialog, which can access its value using subdialog-name.cardnumber, where subdialog-name is the name specified for the subdialog.
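The cardnumber case can be sketched as a pair of fragments, one from the calling document and one from the subdialog document. The document name getcard.vxml, the form ids, the subdialog name card, and the savedcard variable are all hypothetical, and the field's grammar and prompts are omitted.

```xml
<!-- Calling document: invokes the subdialog and reads the returned value -->
<form id="payment">
  <var name="savedcard"/>
  <subdialog name="card" src="getcard.vxml">
    <filled>
      <!-- The returned variable is accessed as subdialog-name.cardnumber -->
      <assign name="savedcard" expr="card.cardnumber"/>
    </filled>
  </subdialog>
</form>

<!-- getcard.vxml: collects the number and returns it to the caller -->
<form id="getcard">
  <field name="cardnumber">
    <!-- DTMF grammar and prompts omitted for brevity -->
  </field>
  <block>
    <return namelist="cardnumber"/>
  </block>
</form>
```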
<rule>
[SRGS] Defines a grammar rule for an inline DTMF or voice grammar.
Parent element: <grammar>
Child elements: <item>, <one-of>
Attributes
id
Mandatory. An identifier for the rule. The identifier must be unique within the grammar. The format is an XML name token without colons (:). The name token may be composed of alphabetic letters, digits, period (.), underscore (_), and hyphen (-). The name must begin with a letter or underscore.
scope
The scope of this rule's grammar. The supported value is as follows:
public: This rule may be referenced by other rules within the current grammar, and by rules in other grammars.
private: Not supported.
Strictly speaking, this attribute is optional. However, the default defined in the VoiceXML 2.0 specification is private, which is not supported by the media server. Therefore, the application should explicitly include the scope attribute with a value of public (scope="public"). This will ensure correct interworking with the media server if full grammar scoping capabilities are implemented.
Usage Guidelines
The <rule> element defines an inline XML grammar rule for DTMF or voice. All SRGS grammars must have a valid set of rules or items to be considered a valid grammar. Grammars that evaluate to empty (that is, grammars that have no defined items) are rejected with an error.grammar, and the session is terminated. Grammars that contain tokens not enclosed in <item> elements are ignored. Only one rule may be active at any time. Thus, for inline grammars that could be active concurrently, only one grammar will actually be active. The second grammar that defines its own rule or omits the rule is ignored. To enable concurrent DTMF and voice grammars, two grammars must be defined at the same level of scope within a VoiceXML script.
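A minimal inline DTMF grammar following these guidelines might look like the sketch below. The rule id yesno and the digit choices are illustrative, and the mode, version, and root attributes of <grammar> are the standard SRGS ones, assumed here rather than confirmed for this media server.

```xml
<grammar mode="dtmf" version="1.0" root="yesno">
  <!-- scope is set explicitly to public because private is not supported -->
  <rule id="yesno" scope="public">
    <one-of>
      <!-- Every token is enclosed in an <item> element -->
      <item>1</item>
      <item>2</item>
    </one-of>
  </rule>
</grammar>
```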
<ruleref>
[SRGS] Allows another voice grammar rule to be included.
Parent element: <grammar>
Child elements: <item>, <one-of>
Attributes
id
Mandatory. An identifier for the voice grammar rule. The identifier must be unique within the grammar. The format is an XML name token without colons (:). The name token may be composed of alphabetic letters, digits, period (.), underscore (_), and hyphen (-). The name must begin with a letter or underscore.
scope
The scope of this rule's grammar. The supported value is as follows:
public: This rule may be referenced by other rules within the current grammar, and by rules in other grammars.
private: Not supported.
Strictly speaking, this attribute is optional. However, the default defined in the VoiceXML 2.0 specification is private, which is not supported by the media server. Therefore, the application should explicitly include the scope attribute with a value of public (scope="public"). This will ensure correct interworking with the media server if full grammar scoping capabilities are implemented.
Usage Guidelines
The <ruleref> element defines an inline XML grammar rule. Currently, only voice grammar rules are supported; DTMF grammar rules are not supported. All SRGS grammars must have a valid set of rules or items to be considered a valid grammar. Grammars that evaluate to empty (that is, grammars that have no defined items) are rejected with an error.grammar, and the session is terminated. Grammars that contain tokens not enclosed in <item> elements are ignored. Only one rule may be active at any time. Thus, for inline grammars that could be active concurrently, only one grammar will actually be active. The second grammar that defines its own rule or omits the rule is ignored. To enable concurrent DTMF and voice grammars, two grammars must be defined at the same level of scope within a VoiceXML script.
<s>
[SSML] Represents a sentence.
Parent element: <speak>
Child elements: <audio>, <break>, <emphasis>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>
Attributes
xml:lang
Mandatory. Specifies the language of the sentence.
Usage Guidelines
The use of the <s> element is optional. Where text occurs without an enclosing <p> or <s> element, the synthesis processor attempts to determine the structure using language-specific knowledge of the format of plain text.
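Explicit sentence markup might be used as in the sketch below, which assumes a configured TTS server and that the sentences appear directly in the SSML content of a prompt; the text is illustrative. Marking sentences explicitly removes the need for the synthesis processor to infer the structure.

```xml
<prompt>
  <!-- Each sentence carries the xml:lang attribute -->
  <s xml:lang="en">Welcome to the service.</s>
  <s xml:lang="en">Please hold while we connect your call.</s>
</prompt>
```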
<say-as>
[SSML] Defines a text string to be rendered as an audio clip.
Parent element: <prompt>
Child elements: <value>
Attributes
interpret-as
Mandatory. Used for VoiceXML variables to indicate the type of the variable. Supported values are date, time and digits. Supported variable types for VoiceXML are described in the Convedia Media Server Sets and Variables Interface Reference Guide.
format
Optional or mandatory, depending on the variable type specified by the interpret-as attribute. (Currently, all supported variable types have mandatory subtypes.) Used for VoiceXML variables to indicate the subtype of the variable. Supported variable subtypes for VoiceXML are described in the Convedia Media Server Sets and Variables Interface Reference Guide.
detail
Ignored.
Usage Guidelines
The media server uses the SSML <say-as> element to allow the control agent to use a subset of the media server's sets and variables processing subsystem. For general information on the media server's sets and variables feature, please see the Convedia Media Server Sets and Variables Interface Reference Guide.
The <say-as> element is used as a child of the <prompt> element to specify the variable to be rendered in the prompt. The media server uses its built-in sets and variables processing subsystem; no TTS server is required. However, all the clips to be played must be internally provisioned on the media server, and an audio segment configuration file (the sets and variables configuration file) must also be provisioned on the media server. See the Convedia Media Server Sets and Variables Interface Reference Guide for this information.
The <say-as> element can contain either a child <value> element specifying the variable to be rendered or a plain text string specifying the variable. The variable type is indicated by the interpret-as attribute and the variable subtype is indicated by the format attribute. Supported variable types and subtypes are described in the Convedia Media Server Sets and Variables Interface Reference Guide. If the value of the variable is out of the supported range, the media server terminates the call without throwing an error.semantic event.
In general, the language in which the variable is to be rendered is specified by the xml:lang attribute at either the document level (that is, within the <vxml> element) or within the <prompt> element. Currently, the only supported value is en (English).
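The following is a minimal sketch of a <say-as> element rendering a variable through a child <value> element. The form id, the variable name accountNumber, and its value are hypothetical. The format value SUBTYPE is a placeholder only, not a real subtype name; a supported subtype must be taken from the Convedia Media Server Sets and Variables Interface Reference Guide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en">
  <form id="readback">
    <!-- Hypothetical variable holding the digits to announce -->
    <var name="accountNumber" expr="'40215'"/>
    <block>
      <prompt>
        <!-- SUBTYPE is a placeholder; replace it with a digits subtype
             listed in the Sets and Variables Interface Reference Guide -->
        <say-as interpret-as="digits" format="SUBTYPE">
          <value expr="accountNumber"/>
        </say-as>
      </prompt>
    </block>
  </form>
</vxml>
```

The same prompt could carry the digits as a plain text string inside <say-as> instead of the <value> child.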
<script>
Executes ECMAScript (JavaScript) code.
Parent element: <block>, <catch>, <error>, <filled>, <form>, <help>, <if>, <menu>, <noinput>, <nomatch>, <vxml>
Child elements: None.
Attributes
src
Optional. Specifies the URI to the script, if the script is external. If not specified, the media server expects the script to be defined inline.
charset
Ignored.
fetchhint
Ignored.
fetchtimeout
Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31 - 1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31 - 1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The applicable property for this attribute is the fetchtimeout property. If the attribute is not set, the value set for the property will be applied. If the fetchtimeout property is not explicitly set (using the <property> element) the property default is applied. For the default value of supported properties, please see Chapter 2: VoiceXML Properties.
maxage
Ignored.
maxstale
Ignored.
Usage Guidelines
The <script> element specifies ECMAScript client-side logic. The results of the computation performed by the script can be returned to the caller and stored in a variable. The contents of the variable can be used later by the VoiceXML application for general use, such as conditional logic or dialogs utilizing the variable. The script can be fetched externally or it can be specified in-line.
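As an illustration only, the sketch below shows both forms: an external script fetched with a 5-second fetchtimeout, and an inline script whose result is stored in a variable and used in conditional logic. The URI, the function isFourDigits, and the form and variable names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- External script; the URI is hypothetical. The fetch times out after 5 seconds. -->
  <script src="http://appserver.example.com/scripts/helpers.js" fetchtimeout="5s"/>

  <!-- Inline script defining a helper function -->
  <script>
    <![CDATA[
      function isFourDigits(pin) {
        return pin.length == 4;
      }
    ]]>
  </script>

  <form id="check">
    <!-- The result of the script computation is stored in a variable -->
    <var name="pinOk" expr="isFourDigits('1234')"/>
    <block>
      <if cond="pinOk">
        <log>PIN format accepted</log>
      <else/>
        <log>PIN format rejected</log>
      </if>
    </block>
  </form>
</vxml>
```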
<speak>
[SSML] The root element of SSML.
Parent element: <?xml>
Child elements: <audio>, <break>, <emphasis>, <mark>, <meta>, <metadata>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>
Attributes
xml:lang
Mandatory. Specifies the language of the root document.
xml:base
Optional. Specifies the base URI of the root document.
version
Mandatory. The SSML version. The only supported value is 1.0.
Usage Guidelines
The <speak> element is the root element of the Speech Synthesis Markup Language (SSML), which is an XML application for speech synthesis. The <speak> element is not supported directly in VoiceXML scripts. Rather, all TTS scripts are rendered into <speak> SSML XML scripts, which are then passed to an external server for playing. Including a <speak> element with TTS text in a VoiceXML document will cause a parse error.
<sub>
[SSML] Replaces the contained text with a substitute.
Parent element: <speak>
Child elements: None.
Attributes
alias
Mandatory. Provides the text to be substituted for the enclosed string.
Usage Guidelines
The <sub> element is employed to indicate that the text in the alias attribute value replaces the contained text for pronunciation. This allows a document to contain both a spoken and a written form. The required alias attribute specifies the string to be spoken instead of the enclosed string. The processor should apply text normalization to the alias value. The <sub> element can only contain text to be rendered.
For some speech servers: The specification states that the alias attribute of the <sub> element is mandatory; however, this is not enforced by the speech server.
<subdialog>
Invokes another dialog, from which control will eventually return.
Parent element: <block>, <catch>, <error>, <filled>, <form>, <help>, <noinput>, <nomatch>
Child elements: <audio>, <catch>, <error>, <filled>, <help>, <noinput>, <nomatch>, <param>, <prompt>, <property>
Attributes
name
Optional. Defines a variable with the specified name, which will hold the return values returned by the <subdialog> element. The scope of the returned value is limited to the form. The values are returned from the subdialog in the namelist specified in the <return> element. The return values can be accessed using the shadow variable name$.ReturnedVariableName. The format is an XML restrictedVariableName token, which is composed of alphabetic characters, digits, colon, and hyphen. The name may not begin with underscore (_) or contain a period (.). In addition, the name must follow ECMAScript variable naming conventions and may not include ECMAScript reserved words. There is no default.
expr
Optional. An ECMAScript expression assigning the initial value of the form item variable defined by name. If the initial value is set using this attribute, the form item will not be executed until the variable is cleared (for example, by using the <clear> element). The default is the ECMAScript value undefined.
cond
Optional. A Boolean ECMAScript expression. The subdialog is executed if and only if the expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
namelist
Optional. Specifies a set or list of variables to submit to the subdialog. Any declared VoiceXML or ECMAScript variable, including shadow variables, can be included in the list. By default, no variables are submitted.
src
The URI of the subdialog. The URI must comply with the XML anyURI format. If the subdialog is contained within the current document, the format is #dialog-name; for example #SubdialogX. Exactly one of src and srcexpr must be specified. Otherwise, an error.badfetch is thrown.
srcexpr
An ECMAScript expression evaluating to the URI of the subdialog. The URI resulting from the expression must comply with the XML anyURI format. Exactly one of src and srcexpr must be specified. Otherwise, an error.badfetch is thrown.
method
Optional. Specifies the HTTP method to be used in submitting. Supported values are as follows:
get: An HTTP GET method will be used.
post: An HTTP POST method will be used.
The default is get.
enctype
Optional. The MIME encoding method to be used in submitting. The only supported value is application/x-www-form-urlencoded. This is the default. Ignored. Ignored.
fetchtimeout
maxage
Ignored.
maxstale
Ignored.
Usage Guidelines
The <subdialog> element provides the ability to transition to a new interaction, much like a function call. Subdialogs are useful in creating and organizing commonly used dialog functions as libraries, which can be reused by many applications. When the subdialog is complete, control is returned to the calling dialog. The state of the calling dialog (active grammars, variables, event handlers, and so on) is preserved when the called dialog is invoked, and restored when the called dialog returns control to the calling dialog.
The calling dialog can pass variables to the called dialog using the namelist attribute of the <subdialog> element. The called dialog returns control to the calling dialog by executing the <return> element, and the <return> element can also return variables from the subdialog to the calling dialog.
Unlike a subroutine, the called dialog does not have access to any information from the context of the calling dialog. This is because the calling and the called dialogs execute in two separate and independent execution contexts. Thus, for example, events thrown in the called dialog must be handled in that dialog; they cannot invoke event handlers in the calling dialog. In addition, variables scoped by the calling dialog are not accessible by the called dialog, and any variables scoped by the called dialog are not accessible when control returns to the calling dialog.
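The call-and-return flow can be sketched as follows for a subdialog contained in the current document. This is an illustration under assumptions, not a definitive pattern: the form ids, variable names, and values are hypothetical, and the input is passed through a <param> child in the manner of standard VoiceXML 2.0 rather than through namelist.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Calling dialog -->
  <form id="main">
    <!-- "result" receives the values returned by the subdialog -->
    <subdialog name="result" src="#collectStatus">
      <param name="attempts" expr="3"/>
      <filled>
        <log>Subdialog completed; control is back in the calling dialog</log>
      </filled>
    </subdialog>
  </form>

  <!-- Called dialog: runs in its own execution context -->
  <form id="collectStatus">
    <!-- Declared here so the value passed by the caller can initialize it -->
    <var name="attempts"/>
    <var name="status" expr="'ok'"/>
    <block>
      <!-- Return control, and the status variable, to the calling dialog -->
      <return namelist="status"/>
    </block>
  </form>
</vxml>
```

The returned status value is read in the calling form through the shadow variable described under the name attribute.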
<submit>
Submit application values and fetch a new document, transitioning to a new dialog. Parent element: Child elements:
None.
next
Submits to the specified URI. The URI must comply with the XML anyURI format. Exactly one of next and expr must be specified. Otherwise, an error.badfetch is thrown.
expr
Submits to the URI resulting from evaluation of the specified ECMAScript expression. The URI must comply with the XML anyURI format. Exactly one of next and expr must be specified. Otherwise, an error.badfetch is thrown.
namelist
Optional. The variables to submit as data. Format is a space-separated list of variable names. Both VoiceXML and ECMAScript variables can be included. By default, all named input item variables are submitted.
method
Optional. Specifies the HTTP method to be used in submitting. Supported values are as follows:
get: An HTTP GET method will be used.
post: An HTTP POST method will be used.
The default is get.
enctype
Optional. The MIME encoding method to be used in submitting. The only supported value is application/x-www-form-urlencoded. This is the default. Ignored. Ignored.
fetchtimeout
maxage
Ignored.
maxstale
Ignored.
Usage Guidelines
The <submit> element allows the application to submit variables to an external HTTP server and transition control to a new VoiceXML document. The variables to be sent are listed in the namelist attribute. This data is sent as URI-encoded parameters to the HTTP server. Data can be sent using either the HTTP GET or the HTTP POST method. The values submitted can be fixed strings, internal variables (for example, field items or property variables), or ECMAScript expressions. Expressions are evaluated first and then converted to strings before submitting. The execution of a <submit> element will always result in a document fetch. The document specified by the next or the expr attribute is returned by the HTTP server, and application control transitions to this document.
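A minimal sketch of a <submit> that posts two variables and transitions to the returned document. The server URI and the variable names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <var name="customerId" expr="'C1001'"/>
    <var name="quantity" expr="2"/>
    <block>
      <!-- Sends customerId and quantity as URL-encoded parameters in an
           HTTP POST, then continues with the document the server returns -->
      <submit next="http://appserver.example.com/order.vxml"
              namelist="customerId quantity"
              method="post"/>
    </block>
  </form>
</vxml>
```

With method omitted, the same variables would be sent with HTTP GET, which is the default.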
<throw>
Generates an event to be handled by <catch>. Parent element: Child elements:
None.
event
Throws the specified event. The event may be predefined, or application-specific. For a list of supported events, please see the section Events on page 24. Exactly one of event and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
eventexpr
Throws the event resulting from evaluation of the specified ECMAScript expression. The event may be predefined, or application-specific. For a list of supported events, please see the section Events on page 24. Exactly one of event and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
message
Optional. Returns the specified message string to the event handler, along with the event name. There is no default. The message string can be accessed using the _message implicit variable. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
messageexpr
Optional. Returns the message string resulting from evaluation of the specified ECMAScript expression to the event handler, along with the event name. There is no default. The message string can be accessed using the _message implicit variable. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
Usage Guidelines
The <throw> element throws the specified event to be caught by the <catch> element. The event can be pre-defined (for example, a nomatch event), or it may be application-specific.
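The sketch below throws an application-specific event with a message and handles it in a document-level <catch>. The event name com.example.account.locked and the message text are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Handler for the application-specific event -->
  <catch event="com.example.account.locked">
    <!-- _message holds the string supplied by the message attribute -->
    <log>Account locked: <value expr="_message"/></log>
    <exit/>
  </catch>

  <form id="login">
    <block>
      <throw event="com.example.account.locked"
             message="Too many failed attempts"/>
    </block>
  </form>
</vxml>
```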
<value>
Inserts the value of an expression into a log message or prompt.
Parent element: <log>, <say-as>
Child elements: None.
Attributes
expr
Mandatory. An ECMAScript expression, the value of which will be inserted into the log message.
Usage Guidelines
The <value> element is used in the <log> element to insert the text of the log message into the log. In this context, the <value> element can be used to de-reference ECMAScript expressions and include them in the output of the <log> message. Note that all <log> messages are written to syslog at a severity level of ERROR.
The <value> element is used in the <say-as> element to insert the value of an expression into a prompt.
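For example, the following sketch evaluates an ECMAScript expression inside a <log> message. The form id and the variable name attempts are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="trace">
    <var name="attempts" expr="2"/>
    <block>
      <!-- The expression is evaluated and its value is inserted into the log text -->
      <log>Login attempts so far: <value expr="attempts + 1"/></log>
    </block>
  </form>
</vxml>
```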
<var>
Declares a variable and assigns it a value. Parent element: Child elements:
None.
name
Mandatory. The name of the variable. The format is an XML restrictedVariableName token, which is composed of alphabetic characters, digits, colon, and hyphen. The name may not begin with underscore (_) or contain a period (.). In addition, the name must follow ECMAScript variable naming conventions and may not include ECMAScript reserved words.
expr
Optional. An ECMAScript expression representing the value of the variable. If not specified, then if the variable was previously declared, it retains its original value. Otherwise, the ECMAScript value undefined is assigned to the variable.
Usage Guidelines
The <var> element declares a variable and assigns it a value. Proper scoping rules are observed as defined in [13]. The naming of user-defined variables adheres to the naming convention specified in Section 5.1 of [13]. The maximum length of a variable is 256 characters. In general, naming errors result in an error.semantic being thrown. The exception is a variable name that ends in the dollar sign ($); this error results in an error.badfetch.
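A short sketch of declarations at different scopes, with and without an initial value. All names and values are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Declared as a child of <vxml> -->
  <var name="greeting" expr="'hello'"/>

  <form id="scopes">
    <!-- Declared as a child of <form> -->
    <var name="count" expr="0"/>
    <!-- No expr: a newly declared variable is assigned the value undefined -->
    <var name="pending"/>
    <block>
      <!-- Declared inside the block -->
      <var name="next" expr="count + 1"/>
      <assign name="count" expr="next"/>
      <log>count is now <value expr="count"/></log>
    </block>
  </form>
</vxml>
```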
<voice>
[SSML] Requests a change in speaking voice.
Parent element: <speak>
Child elements: <audio>, <break>, <emphasis>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>
Attributes
xml:lang
Optional. Specifies the language of the paragraph.
gender
Optional. Indicates the preferred gender of the voice to speak the contained text. Supported values are as follows:
male: Use a male voice.
female: Use a female voice.
neutral: Use a neutral voice.
age
Optional. Indicates the preferred age since birth, in years, of the voice to speak the contained text. The range is a non-negative integer.
variant
Optional. Indicates a preferred variant of the other voice characteristics to speak the contained text (for example, the second male child voice). Valid values are of the type positive integer.
name
Optional. Indicates a processor-specific voice name to speak the contained text. The value may be a space-separated list of names ordered from most-preferred to least-preferred. Consequently, a name may not contain any white space.
Usage Guidelines
The <voice> element is a production element that requests a change in speaking voice. Although each attribute individually is optional, it is an error if no attributes are specified when the <voice> element is used. The <voice> element is commonly used to change the language. When there is not a voice available that exactly matches the attributes specified in the document, or there are multiple voices that match the criteria, a voice selection algorithm must be used. Approximately speaking, the xml:lang attribute has the highest priority and all other attributes are equal in priority but below xml:lang.
For some speech servers: Some TTS servers running MRCP v1 ignore the xml:lang language attribute. The server always speaks English regardless of the value of attribute xml:lang in <speak>, <p>, <s>, and <voice> elements. All attributes of the <voice> element are ignored.
<vxml>
The root element for VoiceXML. Defines the set of actions that form a VoiceXML dialog.
Parent element: None. The root element for VoiceXML.
Child elements: <catch>, <error>, <form>, <help>, <link>, <menu>, <noinput>, <nomatch>, <promptcontrol>, <property>, <script>, <var>
Attributes
version
Mandatory. The W3C specification of the enclosed VoiceXML document. Supported values are 2.0 and 2.1.
xmlns
Mandatory. The namespace of the VoiceXML document. The only supported value is http://www.w3.org/2001/vxml.
xml:base
Optional. Allows a base URI to be defined. If set, any relative URIs within the document are resolved using this base URI.
xml:lang
Optional. Specifies the language identifier for this document. If not specified, the default is en (English). If specified, the language identifier is inherited by all elements in the document that use the xml:lang attribute. Note that a value specified for xml:lang within an element overrides that specified at the document level. Specifying an unsupported language results in an error.unsupported.language event.
application
Optional. The URI of this document's root document, if any. If specified, the implication is that this document is a leaf document.
xmlns:cvd
Optional. The namespace of the XML schema defined for the cvd prefix, which indicates a RadiSys extension. This is optional for any VoiceXML script that uses a cvd prefix, such as cvd:append, cvd:dest, or cvd:destexpr.
Usage Guidelines
<vxml> is the root element for VoiceXML. It contains a VoiceXML document, which can be an entire application or a portion of an application.
The media server accepts 2.1 as the value of the version attribute of the <vxml> element. Where VoiceXML version 2.1 differs from version 2.0, the media server complies with version 2.0, with the following exceptions:
The media server supports the elements described in Chapter 4: VoiceXML 2.0 Elements, as defined in [13].
The media server supports the ECMAScript binding Level 2 subset of the Document Object Model (DOM) as described in Chapter 4: VoiceXML 2.0 Elements.
Chapter 5:
VOICEXML 2.1 ELEMENTS
This chapter describes the VoiceXML 2.1 elements currently supported by the Convedia Media Server. The VoiceXML 2.1 language is defined by the W3C Recommendation specifying the language [14]. Any features of VoiceXML specified in the Recommendation but not in this guide are not supported in this release of the Convedia Media Server. Any features of VoiceXML specified in this guide but not in the Recommendation are extensions to the specification.
<data>
Fetches XML data from a document server without transitioning to a new VoiceXML document.
Parent element: <block>, <catch>, <error>, <filled>, <foreach>, <help>, <if>, <noinput>, <nomatch>, <vxml>
Child elements: None.
Attributes
src
The URI representing the location of the XML data to retrieve. Only HTTP URIs are supported. The URI must comply with the XML anyURI format. If a relative URI is specified, it is qualified using the base URI. Exactly one of src and srcexpr must be specified; otherwise, an error.badfetch is thrown.
name
Optional. The name of a variable exposing the Document Object Model (DOM). If this attribute is not specified, the retrieved content is ignored.
srcexpr
An ECMAScript expression representing the new value of the variable. This value dynamically determines the URI at the time that the data needs to be fetched. A URI resulting from the expression must comply with the XML anyURI format. Exactly one of src and srcexpr must be specified; otherwise, an error.badfetch is thrown.
method
Optional. The request method. Supported values are get and post. The default value is get.
namelist
Optional. The list of variables to submit. Supported values are individual variable references, which are submitted with the same qualification used in the namelist. Declared VoiceXML and ECMAScript variables can be referenced.
The media server supports ECMAScript objects in namelist with the following restrictions:
The value of the method attribute must be post. If the value of the method attribute is get, the media server raises an error.badfetch exception.
The maximum nesting level is four if the ECMAScript object contains other objects.
The body of the post request contains the ECMAScript object as an XML file. The XML file contains all nested objects, each contained within an XML element. Properties of all objects are each represented as an XML element, for which the property name is the element name and the property value is the content.
When the enctype is application/x-www-form-urlencoded, the XML is sent in the post body as a single line using standard escaping rules and without whitespace.
When the enctype is text/xml, the XML is sent in the post body in standard XML format.
By default, no variables are submitted.
enctype
Optional. The media encoding type of the submitted document. Supported values are as follows:
application/x-www-form-urlencoded
text/xml (only when the namelist is an ECMAScript object)
The media server returns an error.badfetch if an unsupported value is specified (for example, multipart/form-data) or if text/xml is specified when the namelist is not an ECMAScript object. The default value is application/x-www-form-urlencoded.
fetchaudio
Optional. The maximum length of the URI string is 255 characters. The supported fetchaudio source is internal provisioned clips and external NFS or HTTP. Clip type must be audio-only, video-only, or multimedia. TTS, RTSP media and sets and variables are not supported. The playing of the audio clip is governed by the fetchaudiodelay and fetchaudiominimum properties in effect at the time of the fetch.
fetchhint
Optional. Ignored.
fetchtimeout
Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31 - 1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31 - 1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The applicable property for this attribute is the fetchtimeout property. If the attribute is not set, the value set for the property will be applied. If the fetchtimeout property is not explicitly set (using the <property> element) the property default is applied. For the default value of supported properties, please see Chapter 2: VoiceXML Properties.
maxage
Optional. Ignored.
maxstale
Optional. Ignored.
Usage Guidelines
The <data> element fetches XML data without transitioning to a new XML document. The XML data fetched by the <data> element is bound to ECMAScript through the variable named by the name attribute; this variable exposes a read-only subset of the W3C Document Object Model (DOM). If the content cannot be retrieved, the media server raises an error.badfetch exception. If the retrieved content is not well-formed XML, the media server raises an error.semantic exception.
The media server supports only US-ASCII characters in UTF-8 encoding format in XML documents retrieved with the <data> element. The media server does not support the access-control feature of the <data> element.
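As a sketch of typical use, the document below fetches XML data and reads one value from it through the DOM. The URI, the variable names, and the assumed response shape (a root element containing a <price> element with text content) are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="quote">
    <block>
      <!-- Hypothetical URI, assumed to return something like
           <quote><price>42</price></quote> -->
      <data name="quoteDoc"
            src="http://appserver.example.com/quote.xml"
            fetchtimeout="5s"/>

      <!-- quoteDoc exposes the DOM: take the first <price> element
           and read the data of its text node -->
      <var name="price"
           expr="quoteDoc.documentElement.getElementsByTagName('price').item(0).firstChild.data"/>

      <log>Price is <value expr="price"/></log>
    </block>
  </form>
</vxml>
```

If the response has a different structure, the DOM navigation in the expr attribute must be adapted accordingly.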
<foreach>
Allows a VoiceXML application to iterate through an ECMAScript array, executing the content of each array item.
Parent and child elements for a <foreach> element used within executable content:
Parent element: <block>, <catch>, <error>, <filled>, <foreach>, <help>, <if>, <noinput>, <nomatch>
Child elements: <audio>, <assign>, <clear>, <data>, <disconnect>, <exit>, <foreach>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Parent and child elements for a <foreach> element used within a <prompt> element:
Parent element: <foreach>, <prompt>
Child elements: <audio>, <break>, <foreach>
Attributes
array
Mandatory. An ECMAScript expression that must evaluate to an ECMAScript array.
item
Mandatory. The variable that stores each array item upon each iteration of the loop. If the variable is not already defined within the parent's scope, a new variable is declared.
Usage Guidelines
The <foreach> element allows a VoiceXML element to execute content from within an ECMAScript array. Both the array and item attributes must be specified; otherwise, the media server raises an error.badfetch exception. If the resulting evaluation of the array does not satisfy the instanceof(Array) statement in the ECMAScript, the media server raises an error.semantic exception.
The <foreach> element operates on a shallow copy of the array specified by the array attribute; this means that only the reference is copied. For example, a shallow copy of an array of pointers to strings copies only the pointers, leaving the underlying character strings as the actual data (not copies).
The <foreach> element may appear within executable content and as a child element of the <prompt> element. When the <foreach> element is within executable content, it may itself contain elements of executable content. When the <foreach> element is within a <prompt> element, it can contain only elements that are valid in the <enumerate> element; that is: <audio>, <break>, and <foreach>.
The media server supports up to two levels of nesting with a <foreach> element. If the level of nesting is greater than two, the media server raises an error.semantic exception.
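The two placements can be sketched as follows: one <foreach> in executable content and one inside a <prompt>. The array clips, the clip file names, and the media URI are hypothetical, and the use of the expr attribute on <audio> is an assumption based on standard VoiceXML.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <script>
    var clips = ["welcome.wav", "menu.wav", "goodbye.wav"];
  </script>

  <form id="playlist">
    <block>
      <!-- Within executable content: children are executable elements -->
      <foreach item="clip" array="clips">
        <log>Queuing clip <value expr="clip"/></log>
      </foreach>

      <!-- Within a prompt: only <audio>, <break>, and <foreach> are allowed -->
      <prompt>
        <foreach item="clip" array="clips">
          <audio expr="'http://media.example.com/clips/' + clip"/>
        </foreach>
      </prompt>
    </block>
  </form>
</vxml>
```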
Chapter 6:
ECMASCRIPT LANGUAGE BINDING FOR THE DOM
This chapter describes the ECMAScript binding for the subset of Level 2 of the DOM. The ECMAScript binding for the subset of Level 2 of the Document Object Model (DOM) exposed by the <data> element is specified in Appendix D of the W3C Recommendation Voice Extensible Markup Language (VoiceXML) 2.1 [14]. The media server supports the following objects from this specification; specific support is described in this chapter.
Attr Object
CDATASection Object
CharacterData Object
Comment Object
Document Object
DOMException Prototype Object
Element Object
EntityReference Object
NamedNodeMap Object
Node Prototype Object
NodeList Object
ProcessingInstruction Object
Text Object
Attr Object
For the Attr object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property          Type           Read-Only
name              String         Yes
specified         Boolean        Yes
value             String         No
ownerElement      Element        Yes
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Method            Returns    Parameter Name    Parameter Type
hasChildNodes     Boolean
hasAttributes     Boolean
CDATASection Object
For the CDATASection object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property          Type           Read-Only
data              String         No
length            Number         Yes
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Method            Returns    Parameter Name    Parameter Type
substringData     String     offset,           Number
                             count             Number
hasChildNodes     Boolean
hasAttributes     Boolean
CharacterData Object
For the CharacterData object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Methods
Usage Guidelines
If a DOMException object is raised on retrieval of the CharacterData.data property or the CharacterData.substringData method and not caught by an ECMAScript execution handler, the media server raises an error.semantic exception.
Comment Object
For the Comment object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Methods
Document Object
For the Document object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property          Type           Read-Only
documentElement   Element        Yes
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Method                   Returns    Parameter Name    Parameter Type
getElementsByTagName     NodeList   tagname           String
getElementsByTagNameNS   NodeList   namespaceURI,     String
                                    localName         String
getElementById           Element    elementId         String
hasChildNodes            Boolean
hasAttributes            Boolean
DOMException Prototype Object

For the DOMException Prototype object, the media server supports the following constants, properties, and methods.
Constants
Constant                      Type     Value
INDEX_SIZE_ERR                Number   1
DOMSTRING_SIZE_ERR            Number   2
NO_MODIFICATION_ALLOWED_ERR   Number   7
NOT_FOUND_ERR                 Number   8
NOT_SUPPORTED_ERR             Number   9
INVALID_STATE_ERR             Number   11
Properties
Property   Type     Read-Only
code       Number   No
Methods
None.
Usage Guidelines
If a DOMException object is raised and not caught by an ECMAScript execution handler on retrieval of the Node.nodeValue property, the CharacterData.data property, or the CharacterData.substringData method, the media server raises an error.semantic exception.
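One way to keep such a DOMException from surfacing as error.semantic is to catch it inside the script that reads the property. The sketch below is illustrative only: the URI and the variable names are hypothetical, and it assumes the fetched document's root element has a first child node.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="safeRead">
    <var name="text"/>
    <var name="domErrorCode"/>
    <block>
      <!-- Hypothetical URI -->
      <data name="doc" src="http://appserver.example.com/info.xml"/>
      <script>
        <![CDATA[
          try {
            // Reading nodeValue may raise a DOMException
            text = doc.documentElement.firstChild.nodeValue;
          } catch (e) {
            // Caught here, so the exception is not left unhandled
            domErrorCode = e.code;
            text = "unavailable";
          }
        ]]>
      </script>
      <log>Value read: <value expr="text"/></log>
    </block>
  </form>
</vxml>
```

The captured code can be compared against the constants listed for the DOMException Prototype object.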
Element Object
For the Element object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property          Type           Read-Only
tagName           String         Yes
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Method                   Returns    Parameter Name    Parameter Type
getAttribute             String     name              String
getAttributeNode         Attr       name              String
getElementsByTagName     NodeList   name              String
getAttributeNS           String     namespaceURI,     String
                                    localName         String
getAttributeNodeNS       Attr       namespaceURI,     String
                                    localName         String
getElementsByTagNameNS   NodeList   namespaceURI,     String
                                    localName         String
hasAttribute             Boolean    name              String
hasAttributeNS           Boolean    namespaceURI,     String
                                    localName         String
hasChildNodes            Boolean
hasAttributes            Boolean
EntityReference Object
For the EntityReference Prototype object, the media server supports the following constants, properties, and methods.
Constants
Constant                      Type     Value
ELEMENT_NODE                  Number   1
ATTRIBUTE_NODE                Number   2
TEXT_NODE                     Number   3
CDATA_SECTION_NODE            Number   4
ENTITY_REFERENCE_NODE         Number   5
PROCESSING_INSTRUCTION_NODE   Number   7
COMMENT_NODE                  Number   8
DOCUMENT_NODE                 Number   9
Properties
Property          Type           Read-Only
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Parameter Name
Parameter Type
NamedNodeMap Object
For the NamedNodeMap object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property   Type     Read-Only
length     Number   Yes
Methods
Method           Returns   Parameter Name   Parameter Type
getNamedItem     Node      name             String
item             Node      index            Number
getNamedItemNS   Node      namespaceURI,    String
                           localName        String
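The methods above can be combined as in the following sketch, which looks up an attribute node by name and lists every name in the map. The names makeNamedNodeMap, attributeOrDefault, and listAttributeNames are hypothetical. The stand-in map assumes that getNamedItem and item return null when nothing matches, as in DOM Level 2 Core; on the media server the map would come from a node's attributes property.

```javascript
// Hypothetical stand-in for a NamedNodeMap, so the sketch runs outside the
// media server. attrs is an array of { nodeName, nodeValue } objects.
function makeNamedNodeMap(attrs) {
  return {
    length: attrs.length,
    item: function (index) {
      return index >= 0 && index < attrs.length ? attrs[index] : null;
    },
    getNamedItem: function (name) {
      for (var i = 0; i < attrs.length; i++) {
        if (attrs[i].nodeName === name) {
          return attrs[i];
        }
      }
      return null; // assumption: null when no attribute has this name
    }
  };
}

// Return the attribute's value, or a default when the attribute is absent.
function attributeOrDefault(map, name, defaultValue) {
  var node = map.getNamedItem(name);
  return node === null ? defaultValue : node.nodeValue;
}

// Walk the map by index using length and item().
function listAttributeNames(map) {
  var names = [];
  for (var i = 0; i < map.length; i++) {
    names.push(map.item(i).nodeName);
  }
  return names;
}

var map = makeNamedNodeMap([
  { nodeName: "id", nodeValue: "42" },
  { nodeName: "lang", nodeValue: "en" }
]);
var lang = attributeOrDefault(map, "lang", "und");
var missing = attributeOrDefault(map, "region", "none");
var attributeNames = listAttributeNames(map);
```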
Node Prototype Object

For the Node Prototype object, the media server supports the following constants, properties, and methods.
Constants
Properties
Property          Type           Read-Only
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Parameter Name Parameter Type
Usage Guidelines
If a DOMException object is raised on retrieval of the Node.nodeValue property and not caught by an ECMAScript execution handler, the media server raises an error.semantic exception.
NodeList Object
For the NodeList object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property   Type     Read-Only
length     Number   Yes
Methods
Method   Returns   Parameter Name   Parameter Type
item     Node      index            Number
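A NodeList exposes only length and item, so a typical traversal is an index loop, as in the sketch below. The names makeNodeList and collectNodeNames are hypothetical, and the stand-in list exists only so the sketch runs outside the media server; on the media server a list would typically come from childNodes or getElementsByTagName.

```javascript
// Hypothetical stand-in for a NodeList backed by a plain array.
function makeNodeList(nodes) {
  return {
    length: nodes.length,
    item: function (index) {
      // Assumption: null for an out-of-range index, as in DOM Level 2 Core.
      return index >= 0 && index < nodes.length ? nodes[index] : null;
    }
  };
}

// Collect the nodeName of every node using only length and item().
function collectNodeNames(list) {
  var names = [];
  for (var i = 0; i < list.length; i++) {
    names.push(list.item(i).nodeName);
  }
  return names;
}

var nodeNames = collectNodeNames(
  makeNodeList([{ nodeName: "city" }, { nodeName: "temp" }])
);
```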
ProcessingInstruction Object
For the ProcessingInstruction object, the media server supports the following constants, properties, and methods.
Constants
Properties
Property          Type           Read-Only
target            String         Yes
data              String         No
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Parameter Name
Parameter Type
Usage Guidelines
The media server parses and interprets the xml processing instruction only. It does not support any other processing instruction: an unsupported processing instruction causes the media server to raise an error.semantic exception. The media server does not generate any processing instruction objects.
Text Object
For the Text object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Methods
Appendix A:
BEST PRACTICES FOR VOICEXML DEVELOPMENT
This appendix describes some development practices that can help you maximize performance and capacity of your VoiceXML applications.
Best Practices for VoiceXML Development
The coding practices recommended in this appendix guide developers who write code for the VoiceXML interpreter on RadiSys Convedia Media Servers. They are intended to help development partners achieve optimal performance on the media server's VoiceXML interface.
1 Store permanent audio clips on the media server.
Provisioning permanent audio clips internally on the media server, rather than on an external NFS or HTTP server, allows more efficient clip retrieval. In addition, storing clips internally removes any issues relating to interconnectivity with the NFS or HTTP server that could occur, reducing debugging time.
2 If you store permanent clips externally, use NFS.
If you must store provisioned audio clips on an external server, RadiSys recommends using an NFS server. RadiSys currently does not recommend using HTTP for recording and playing back permanent audio clips. If you must record to an external HTTP server, use the <submit> element. This element records the file internally until it completes, and then uses the HTTP POST method to post the file to the HTTP server.
3 Record temporary audio clips on the media server.
If the application records audio clips for temporary use, it is most efficient to store the temporary clips internally on the media server. Clips that are recorded on the media server are transient: they are deleted when the connection with which they are associated is closed. They are also volatile: they will not survive a reset cycle.
4 Consolidate VoiceXML documents.
Document transitions have a high CPU overhead, and the number of transitions varies from application to application. To achieve higher capacity, consolidate the VoiceXML logic or flow to minimize the number of document transitions. In calculating performance characteristics, RadiSys assumes an average of 2 to 3 transitions in a voicemail-type application.
5 Reduce application root document size.
The application root document size can grow large if several variables and several catch handlers are defined. Since root documents may be called with every document fetch, having a large root document can cause high CPU consumption, impacting performance. Remove any unused or unnecessary variables and catch handlers from the application root document, and define them within the VoiceXML leaf document where they are required. This guideline interacts with the previous guideline. Since root documents are called with every document fetch, a large number of VoiceXML documents calling a large root document can exacerbate CPU consumption.
6 Reduce the number of subdialogs.
GLOSSARY OF ACRONYMS
3G         Third-Generation Wireless
3GP        A file format standardized by the 3GPP.
3GPP       Third-Generation Partnership Project
3PCC       Third Party Call Control
°C         Degrees Centigrade
°F         Degrees Fahrenheit
AEC        Acoustic Echo Cancellation
ARP        Address Resolution Protocol
ASR        Automatic Speech Recognition
BITS       Building Integrated Timing Source
BTU        British Thermal Unit. A measure of heat energy. The amount of heat required to raise 1 pound of water by one degree Fahrenheit.
CA         Control Agent
CALEA      Communications Assistance for Law Enforcement Act
CAN/CSA    Canadian Standards Association
CD         Compact Disk
CE         Conformité Européenne
CED        Fax called station identification tone
CISPR      International Special Committee for Radio Interference
CMS        Convedia Media Server
CNG        Fax calling tone
CO         Central Office
CPA        Call progress analysis
CPAMD      Call progress answering machine detection
CPTD       Call progress call detection
CPVAD      Call progress voice activity detection
DC         Direct Current
DNS        Domain Name System
DSP        Digital Signal Processor
DTMF       Dual Tone Multi Frequency
EMI        Electromagnetic Interference
FCC        Federal Communications Commission
FRU        Field-Replaceable Units
FQDN       Fully qualified domain name
FTP        File Transfer Protocol
GUI        Graphical User Interface
HTTP       HyperText Transfer Protocol
ID         Identifier
IMMS       Integrated Mobile Media Server
IMS        IP Multimedia Subsystem
I/O        Input/Output
IP         Internet Protocol
IPBCP      IP Bearer Control Protocol
IuFP       Iu Framing Protocol
IuUP       Iu Interface User Plane
IPCC       IP Call Center
ITU        International Telecommunication Union
IVR        Interactive Voice Response
kbps       Kilobits per second
kg         Kilogram(s)
LAN        Local Area Network
lb         Pound(s) (weight)
LEA        Law Enforcement Agency
LED        Light Emitting Diode
Mbps       Megabits per second
MIB        Management Information Base
MGCP       Media Gateway Control Protocol
MOML       Media Objects Markup Language
MPC        Media Processor Card
MPI        Minimum Picture Interval. The minimum time that can occur between pictures selected for encoding.
MRFP       Multimedia Resource Function Processor
MRCP       Media Resource Control Protocol
MS         Media Server [except when used in conjunction with a Microsoft product, where it represents Microsoft]
MSC        Mobile Switch Controller
MSML       Media Sessions Markup Language
MTBF       Mean Time Between Failures
MTTR       Mean Time to Restore
NbUP       Nb Interface User Plane
NEBS       Network Equipment-Building System
NFS        Network File System
NR         Noise Reduction
OAMP       Operations, Administration, Maintenance, and Provisioning
OO         Object-Oriented
PDF        Portable Document Format
PLMN       Public Land Mobile Network
POTS       Plain Old Telephone System
PSTN       Public Switched Telephone Network
QoS        Quality of Service
RF         Radio Frequency
RFC        Request for Comments
RFI        Radio Frequency Interference
RJ-45      Registered Jack 45
RPC        Remote Procedure Call
RS-232     Recommended Standard 232
RTCP       Real Time Control Protocol
RTP        Real-time Transport Protocol
RU         Rack Unit. 1.75 in (4.4 cm) in height.
SCC        Shelf Control Card
SDP        Session Description Protocol
SIP        Session Initiation Protocol
SIT        Special Information Tone
SNMP       Simple Network Management Protocol
SRGS       Speech Recognition Grammar Specification
SSRC       Synchronization source
TAC        Technical Assistance Center
TCP        Transmission Control Protocol
TCP/IP     Transmission Control Protocol/Internet Protocol
TFTP       Trivial File Transfer Protocol
ToS        Type of Service
TTS        Text to Speech
UAC        User Agent Client
UDP        User Datagram Protocol
UL         Underwriters Laboratory
URL        Uniform Resource Locator
UTC        Universal Time Coordinated [formerly GMT]
VAC        Volts, Alternating Current
VAD        Voice Activity Detector
VDC        Volts, Direct Current
VoiceXML   Voice eXtensible Markup Language: An XML language designed for defining voice segments and enabling access to the Internet via telephones and other voice-activated devices
VoIP       Voice over Internet Protocol
VRU        Voice Response Unit
W          Watts
XML        eXtensible Markup Language
