
Embedded ViaVoice

A single, fully integrated architecture


The modular Embedded ViaVoice architecture provides fully integrated automatic speech recognition
(ASR), speech synthesis through text-to-speech (TTS) and other technology engines that support an
application's full feature requirements with minimal processor utilization and memory. A single
architecture with consistent application interfaces enables Embedded ViaVoice to support solutions
ranging from low-resource Personal Navigation Devices to high-performance in-vehicle systems and
Java™ technology-based platforms. This single-architecture implementation is a particular advantage
for applications that must span a broad range of platform capacities, as well as for solutions that
require significant growth in capacity.
A broad language base
Embedded ViaVoice is available in a broad set of languages to provide speech-recognition and
speech-synthesis capabilities through the support of a worldwide network of IBM speech research
and development laboratories. High-quality embedded concatenative TTS (eCTTS) provides more
natural-sounding speech synthesis for more advanced applications. To learn more about IBM's ongoing
development of additional ASR language models and TTS voices, and its continuous improvement of
existing languages, contact your IBM representative.
High recognition accuracy
The Embedded ViaVoice recognition engine is based on small units of speech called phonemes.
This phoneme-based model uses finite state grammars to support highly accurate, noise-robust
continuous speech recognition. Through a comprehensive and vigorous research and development
effort, IBM has significantly reduced the word-error rate of Embedded ViaVoice over the past several
years.
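Word-error rate (WER), the accuracy metric cited above, is conventionally the number of word substitutions (S), deletions (D) and insertions (I) needed to turn the recognized text into the reference transcript, divided by the number of words in the reference (N). The following minimal sketch computes it with a word-level edit distance; it illustrates the metric only and is not IBM's evaluation tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum edits that turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# One substitution ("home" -> "phone") out of three reference words.
print(word_error_rate("call home now", "call phone now"))
```

A lower WER means fewer recognition mistakes per spoken word, which is why reductions in WER are the usual way recognition accuracy improvements are reported.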
Large vocabulary recognition
The maximum vocabulary supported by Embedded ViaVoice has grown by a factor of 25 over the
past four years. Embedded ViaVoice can recognize word lists of virtually unlimited size, bounded only
by the platform's processing and memory resources.
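To put the growth figure in perspective, assume the vocabulary grew at a steady compound rate. A 25-fold increase over four years then corresponds to an average annual growth factor g with g to the fourth power equal to 25:

```latex
g^{4} = 25 \quad\Longrightarrow\quad g = 25^{1/4} = \sqrt{5} \approx 2.24
```

In other words, the supported vocabulary a little more than doubled each year on average. The actual year-by-year figures are not given here, so this is only an average under the stated assumption.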
Services and workshops
Porting and integration services include porting to a new operating system, recompiling for a different
processor architecture or modifying the embedded audio layer to use a new driver or codec.
Alternatively, with a device adaptation kit, IBM supplies the tools that enable you to perform and test
the audio adaptations yourself.
IBM can provide on-site classes to application developers about the Embedded ViaVoice Software
Developer Kit (SDK). Customized development workshops are also available to provide skills
transfer and instruction on application development, evaluation methodology and tools, so you can
design and tune your own system to suit your organization's business needs.
Support for multiple programming models
Many small-footprint embedded applications use Embedded ViaVoice through its C/C++ application
interface. Java applications can use the Java Speech API (JSAPI) and extensions.
IBM expertise in voice
IBM's sustained research and development investment in speech recognition and synthesis for more
than 30 years has resulted in multiple advances, including Embedded ViaVoice. IBM Embedded
ViaVoice software enables you to gain a competitive advantage in today's fast-moving marketplace
and offers a clear path to future growth through a single, fully integrated architecture.
Functionality

• Portable, event-driven architecture
• Fully integrated automatic speech recognition (ASR) and text-to-speech (TTS)
• Low processor utilization
• Small static and dynamic footprint
• Scalable, modular architecture
• Single-threading and multithreading support
• Runtime event notification
• Unsupervised adaptation to speakers
• Optional speaker enrollment
• Phoneme-based
• Speaker-independent
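An event-driven architecture with runtime event notification generally means that the application does not poll the engine; it registers handlers, and the engine calls them when something happens (for example, speech detected or a recognition result ready). The SDK's actual C/C++ callback API is not documented here, so this sketch shows only the general publish/subscribe pattern. The event names and the `EventDispatcher` class are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical event names for illustration only; not Embedded ViaVoice identifiers.
EVENT_SPEECH_START = "speech_start"
EVENT_RESULT = "result"

Listener = Callable[[dict], None]


class EventDispatcher:
    """Minimal publish/subscribe dispatcher: callbacks are registered per
    event type and invoked in registration order when that event fires."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def register(self, event: str, callback: Listener) -> None:
        self._listeners[event].append(callback)

    def fire(self, event: str, payload: dict) -> int:
        """Deliver payload to every listener for event; return how many ran."""
        callbacks = self._listeners.get(event, [])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


received = []
dispatcher = EventDispatcher()
dispatcher.register(EVENT_RESULT, lambda p: received.append(p["text"]))
dispatcher.fire(EVENT_SPEECH_START, {})  # no listeners registered, nothing runs
dispatcher.fire(EVENT_RESULT, {"text": "call home"})
print(received)  # ['call home']
```

Because handlers run only when events occur, the same pattern suits both single-threaded designs (events delivered from a main loop) and multithreaded ones (events delivered from an engine thread).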

Accuracy and robustness

• Very large vocabulary recognition, exceeding 200 000 spoken words in real time
• Freeform commands combining statistical language models and semantic interpretation
• Tunable rejection to address nonspeech sounds and out-of-vocabulary words
• Advanced front-end noise suppression
• Support for vendor-supplied noise suppression
• Enhanced speech and silence detection
• Continuous and discrete digit recognition
• Spell-mode capable
• Word and phrase confidence scoring
• Gender detection and adaptation
• Pronunciation confusability reporting
• N-Best and homonym support
• Grammar weights
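Tunable rejection, confidence scoring and N-best support typically work together: the recognizer returns several ranked hypotheses, each with a confidence score, and the application accepts the best one only if its score clears a threshold. The following sketch shows that decision under stated assumptions: the `Hypothesis` type, the 0.0 to 1.0 score scale and the threshold values are hypothetical, not the Embedded ViaVoice API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    text: str
    confidence: float  # assumed to be normalized to the range 0.0 to 1.0


def select_result(n_best: List[Hypothesis], threshold: float) -> Optional[Hypothesis]:
    """Return the highest-confidence hypothesis, or None to reject the
    utterance when even the best candidate falls below the threshold."""
    if not n_best:
        return None
    best = max(n_best, key=lambda h: h.confidence)
    return best if best.confidence >= threshold else None


n_best = [Hypothesis("call home", 0.82), Hypothesis("call Holmes", 0.41)]
print(select_result(n_best, threshold=0.6))  # Hypothesis(text='call home', confidence=0.82)
print(select_result(n_best, threshold=0.9))  # None
```

Raising the threshold rejects more nonspeech sounds and out-of-vocabulary words at the cost of occasionally rejecting valid input; the right value depends on the application.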

Solution-development tools

• Eclipse technology-based IBM Embedded Voice Toolkit, Version 6.0, including a customized
integrated development environment (IDE) for embedded speech developers
• Application-creation wizards
• Grammar editor and templates
• Vocabulary testing and analysis
• Pronunciation compiler and variant generator
• Gain-control tuning tool
• Tracing and debugging interface
• Device adaptation kit

Flexibility

• Broad language coverage
• Additional languages in development
• Java Speech API (JSAPI) and extensions
• Automatic gain adjustment
• Multiple listening modes, including push to talk, push to activate and always listening
• Run-time language switching
• Run-time pronunciation manipulation
• Scalable acoustic models
• 11, 16 and 22 kHz sampling rates
• Signal-to-noise ratio (SNR) feedback
• Voice tags from text or acoustic input
• Embedded baseform generation
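SNR feedback lets an application warn users when background noise is too high for reliable recognition. SNR is commonly expressed in decibels as 10 times the base-10 logarithm of mean signal power divided by mean noise power. How the engine itself separates speech from background noise is not described here, so this sketch assumes the caller already has separate speech and background sample buffers; the `snr_db` function is illustrative only.

```python
import math
from typing import Sequence


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in decibels: 10 * log10(mean signal power / mean noise power)."""
    if not signal or not noise:
        raise ValueError("signal and noise must be non-empty")
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        return math.inf   # no measurable noise
    if p_signal == 0:
        return -math.inf  # no measurable signal
    return 10.0 * math.log10(p_signal / p_noise)


speech = [0.5, -0.5, 0.5, -0.5]          # mean power 0.25
background = [0.05, -0.05, 0.05, -0.05]  # mean power 0.0025
print(round(snr_db(speech, background), 1))  # 20.0
```

A power ratio of 100 corresponds to 20 dB; each additional factor of 10 in the ratio adds another 10 dB.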

Grammar and compiler support

• Scalable vocabulary support
• Built-in grammar compiler
• Finite state grammars
• Multiple grammar formats, including Speech Recognition Grammar Specification (SRGS),
Backus-Naur Form (BNF) and Java Speech Grammar Format (JSGF)
• Annotations
• Statistical language models
• Dynamic and unlimited vocabularies
• Precompiled and runtime grammars
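A finite state grammar restricts the recognizer to word sequences accepted by a network of states and word-labeled transitions; grammar weights can bias some paths over others. The following toy acceptor illustrates the idea under stated assumptions: the grammar, its weights and the `score` function are hypothetical, it is not the built-in grammar compiler or any of the SRGS, BNF or JSGF formats, and a real recognizer searches this network using acoustic scores rather than exact word matches.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical grammar: "call <name>" or "dial <digit>+".
# Each state maps a word to (next_state, weight); weights are illustrative.
DIGITS = ["zero", "one", "two", "three"]
ACCEPTING = {"end", "digits"}
GRAMMAR: Dict[str, Dict[str, Tuple[str, float]]] = {
    "start": {"call": ("name", 1.0), "dial": ("digit", 1.0)},
    "name": {"home": ("end", 0.7), "office": ("end", 0.3)},
    "digit": {d: ("digits", 1.0) for d in DIGITS},   # at least one digit
    "digits": {d: ("digits", 1.0) for d in DIGITS},  # then any number more
}


def score(words: List[str]) -> Optional[float]:
    """Walk the grammar; return the product of arc weights if the word
    sequence is accepted, or None if it falls outside the grammar."""
    state, total = "start", 1.0
    for word in words:
        arc = GRAMMAR.get(state, {}).get(word)
        if arc is None:
            return None  # no transition for this word: out of grammar
        state, weight = arc
        total *= weight
    return total if state in ACCEPTING else None


print(score("call home".split()))     # 0.7
print(score("dial one two".split()))  # 1.0
print(score("call".split()))          # None (incomplete phrase)
print(score("call pizza".split()))    # None (out of grammar)
```

Constraining recognition to a small network like this is one reason grammar-based recognition can stay accurate on low-resource devices: at each point only a few words are possible.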

Speech synthesis (TTS)

• Unlimited pronunciation domain
• Multiple voices
• Customizable voices
• Dictionary support
• Indexing support and pause-and-resume capabilities
• Adjustable performance-tuning parameters
• API for phoneme generation
• Manual override of automatic synthesis
• SSML support
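SSML (Speech Synthesis Markup Language) is a W3C standard for controlling synthesis with markup such as pauses and named markers. Which SSML elements Embedded ViaVoice honors is not stated here, and treating the indexing support above as SSML `<mark>` elements is an assumption. This sketch only builds a standard SSML 1.0 document with Python's `xml.etree.ElementTree`; passing it to the engine is not shown, and `build_ssml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_ssml(sentences: List[str], pause_ms: int = 400) -> str:
    """Wrap sentences in SSML, adding a <mark> before each sentence (so an
    engine could report progress) and a <break> pause between sentences."""
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": "en-US",
    })
    for i, sentence in enumerate(sentences):
        mark = ET.SubElement(speak, "mark", {"name": f"sentence{i}"})
        mark.tail = sentence  # text that follows the marker
        if i < len(sentences) - 1:
            ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    return ET.tostring(speak, encoding="unicode")


print(build_ssml(["Turn left.", "Then go straight."]))
```

Building the markup with an XML library rather than string concatenation ensures that characters such as `&` and `<` in the spoken text are escaped correctly.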

Processors currently supported*

• Hitachi SH-4
• Motorola PowerPC
• IBM PowerPC® processor
• Intel® x86
• Intel StrongARM
• Intel XScale
• Blackfin 539 DSP
• MIPS

*Others can be added based on customer requirements.

Operating systems supported

• Windows XP
• Windows 2000
• Windows CE / Windows Mobile
• QNX
• Linux
• Embedded Linux
• T-Engine
• µITRON
• VxWorks
• RTXC

Languages offered
Automatic speech recognition (ASR)

• US English
• North American Spanish
• Canadian French
• UK English
• French
• Italian
• German
• Spanish
• Dutch
• Japanese
• Mandarin Chinese
• European Portuguese
• Swedish
• Korean

Concatenative text-to-speech (eCTTS)

• US English
• North American Spanish
• Canadian French
• UK English
• German
• French
• Italian
• Spanish
• Japanese
• Dutch

Formant text-to-speech

• US English
• North American Spanish
• Canadian French
• UK English
• German
• French
• Italian
• Spanish
• Japanese
• Dutch
• Simplified Chinese
• Brazilian Portuguese
• Korean
• Traditional Chinese
