Definition and Overview

A voice portal provides speech-enabled access to Web-based information. It gives telephone users a natural-language interface to access and retrieve Web content. It is a Web site or other service that a user can reach by telephone for information such as weather, sports scores, or stock quotes.

For a Mobile User

A mobile user might dial in to a voice portal Web site, request information using voice or Touch-Tone keys, and receive the requested information from a special voice-producing program at the Web site.

For a Smart-Phone User

A user with a smart phone can connect to the Internet, perhaps with a WAP interface, and get information on a small visual display.

Overview

Every Service Provider wants to be the next Yahoo or AOL. A new class of Service Providers needs to set itself apart in a fiercely competitive market. Existing portal and Web site operators have huge databases, and they can support telephone applications with minimal investment.

The telephone is a natural way to cross the digital divide into the new world of information access. Adoption of a standard voice scripting language, like VoiceXML, can be expected to fuel voice portal services. Speech-recognition technology in particular has made dramatic advances, powered by huge increases in processing power.

Architecture

Components

Telephony Network: PSTN, ISDN lines, or a VoIP network; a regular analog line, or lines coming through a PBX system.
Internet-style Network: A TCP/IP-based packet network that connects the application server and the voice server via HTTP.
Web/Application Server: Runs the application logic, and may have a database or an interface to an external database or transaction server.
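As an illustration of the HTTP link between the voice server and the Web/application server, here is a minimal sketch using only the Python standard library. The handler class `ApplicationServer`, the path `/welcome.vxml`, and the document text are hypothetical names chosen for this example; a real deployment would use a production Web server and a real VoiceXML gateway.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical VoiceXML document served by the application server.
VXML_DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block><prompt>Welcome to the voice portal.</prompt></block>
  </form>
</vxml>
"""


class ApplicationServer(BaseHTTPRequestHandler):
    """Stands in for the Web/application server: answers every GET with the document."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/voicexml+xml")
        self.send_header("Content-Length", str(len(VXML_DOCUMENT)))
        self.end_headers()
        self.wfile.write(VXML_DOCUMENT)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_document(host, port, path):
    """Stands in for the voice server: requests a document over HTTP."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Content-Type"), response.read()
    finally:
        conn.close()


def demo():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), ApplicationServer)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_document("127.0.0.1", server.server_port, "/welcome.vxml")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, content_type, body = demo()
    print(status, content_type)
    print(body.decode("utf-8"))
```

Both roles run in one process here only to keep the sketch self-contained; in the architecture described above they sit on separate machines connected by the TCP/IP network.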

VoiceXML Gateway (Voice Server): VoiceXML browser, telephony components, TTS engine, ASR engine.

VoiceXML Browser

It requests VoiceXML documents, interprets them, and controls the dialog flow. It also controls speech and telephony resources.
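The interpret-and-control role of the browser can be sketched roughly as follows. This is a deliberately simplified walk over one form, not the form interpretation algorithm defined by the VoiceXML specification; `run_dialog` and the sample document are hypothetical, and caller input is supplied as a scripted list in place of real speech or keypad input.

```python
import xml.etree.ElementTree as ET

NS = {"v": "http://www.w3.org/2001/vxml"}
FIELD_TAG = "{http://www.w3.org/2001/vxml}field"

# Hypothetical document: one greeting block and one field to fill.
DOCUMENT = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <block><prompt>Welcome.</prompt></block>
    <field name="topic">
      <prompt>Say weather, sports, or stocks.</prompt>
    </field>
  </form>
</vxml>"""


def run_dialog(document, caller_inputs):
    """Visit form items in document order, 'playing' prompts and filling fields."""
    root = ET.fromstring(document)
    inputs = iter(caller_inputs)
    spoken = []   # text that would be handed to the TTS engine
    answers = {}  # field name -> value recognized from the caller
    for form in root.findall("v:form", NS):
        for item in form:
            for prompt in item.findall("v:prompt", NS):
                spoken.append("".join(prompt.itertext()).strip())
            if item.tag == FIELD_TAG:
                # A real browser would wait for the ASR engine or DTMF here.
                answers[item.get("name")] = next(inputs)
    return spoken, answers


if __name__ == "__main__":
    print(run_dialog(DOCUMENT, ["weather"]))
```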

Telephony Resources

It includes the telephony network interface. It can play and record audio. It is responsible for all the telephony features, like DTMF extraction and detection, call placing, call transfer, and call termination.
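DTMF detection is commonly done with the Goertzel algorithm, which measures the signal power at each of the eight DTMF frequencies. The source does not say which method the telephony components use, so the following is only a sketch of that one common approach. It assumes 8 kHz sampling and a clean 205-sample frame, and it omits the silence and twist thresholds a real detector needs. `make_tone` is a hypothetical helper that synthesizes test input.

```python
import math

ROW_FREQS = [697, 770, 852, 941]       # Hz, low-group DTMF tones
COL_FREQS = [1209, 1336, 1477, 1633]   # Hz, high-group DTMF tones
KEYPAD = ["123A", "456B", "789C", "*0#D"]


def goertzel_power(samples, freq, sample_rate):
    """Return the signal power at one frequency using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, sample_rate=8000):
    """Pick the strongest row tone and the strongest column tone."""
    row = max(range(4), key=lambda i: goertzel_power(samples, ROW_FREQS[i], sample_rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, COL_FREQS[i], sample_rate))
    return KEYPAD[row][col]


def make_tone(key, sample_rate=8000, n=205):
    """Synthesize the two-tone signal for a key (test input only)."""
    for r, keys in enumerate(KEYPAD):
        if key in keys:
            c = keys.index(key)
            return [
                math.sin(2 * math.pi * ROW_FREQS[r] * t / sample_rate)
                + math.sin(2 * math.pi * COL_FREQS[c] * t / sample_rate)
                for t in range(n)
            ]
    raise ValueError(f"not a DTMF key: {key!r}")


if __name__ == "__main__":
    print(detect_key(make_tone("5")))
```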

Text-To-Speech Engine

It enables artificial production of human speech. The steps are:

1. Normalization: The text normalizer takes the ASCII input and expands abbreviations, numbers, and monetary amounts to their full word form. It also processes punctuation, non-alphabetic input characters, and special pronunciations.

2. Syntactic Analysis: The sentence is analyzed based on the syntactic role of function words and verbs.

3. Phonemic Translation: The associated phoneme string and stress information are retrieved using a phonetic dictionary and passed directly to the next process.

4. Parameter Generation: Appropriate parameters are generated and supplied to the acoustic synthesizer.
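The normalization step can be sketched as follows. This is a minimal illustration, assuming a tiny hypothetical abbreviation table and whole-dollar amounts; `normalize` and `number_to_words` are names invented here, and a production normalizer handles far more cases (dates, ordinals, context-dependent abbreviations).

```python
import re

# Hypothetical abbreviation table; a real normalizer uses a much larger one.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "approx.": "approximately"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out 0..999; larger numbers are read digit by digit."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    return " ".join(ONES[int(digit)] for digit in str(n))


def _money(match):
    amount = int(match.group(1))
    return number_to_words(amount) + (" dollar" if amount == 1 else " dollars")


def normalize(text):
    """Expand abbreviations, monetary amounts, then bare numbers."""
    for abbreviation, full in ABBREVIATIONS.items():
        text = text.replace(abbreviation, full)
    text = re.sub(r"\$(\d+)", _money, text)
    text = re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)
    return text


if __name__ == "__main__":
    print(normalize("Dr. Smith paid $42 at 7 Main St."))
```

Monetary amounts are expanded before bare numbers so that the digits after a dollar sign are not consumed by the generic number rule.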

Automated Speech Recognition Engine

It allows users of information systems to speak entries rather than punch numbers on a keypad. It is primarily used to provide information and to forward telephone calls.


1. Capture and Digitization: This stage interacts with the telephony hardware to listen for and detect the incoming speech. It must perform echo cancellation to remove the echo. This processing is best done on programmable DSP cards or on line interface cards.

2. Spectral Representation: It converts the signal into the spectral domain, then maps that onto a nonlinear spectral scale. Then the variability caused by noise and channel conditions is reduced.

3. Segmentation: The entire phonetic segment (the basic unit of speech) is taken into account. The static and dynamic properties of the individual speech sounds are considered. This improves accuracy and significantly reduces the amount of computation needed in the modeling and search phases.

4. Phonetic Modeling: Various properties of the speech signal are measured. Then modeling is done according to the probability distributions of each of the phonetic units in this multidimensional feature space. The speech signal, including noise and channel variability that was not normalized out by the signal processing phase, is refined. Speaker differences, speaking rate, and accents are also observed.

5. Search and Match: The job of the search stage is to compare the modeled input with everything the user might have said and find the best match. This requires a huge amount of computation. It allows dynamic word additions, dynamic language and semantic models, N-best output, etc.
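The spectral representation stage (stage 2 above) can be illustrated with a small sketch. The source names only "a nonlinear spectral scale"; the mel scale used here is an assumption, chosen because it is a common choice in speech recognition. The sketch uses a naive DFT and equal-width rectangular mel bands for brevity, where real front ends use an FFT and overlapping triangular filters. The function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(freq_hz):
    """Map a frequency in Hz onto the (nonlinear) mel scale."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_band_energies(frame, sample_rate, n_bands):
    """Sum spectral power into n_bands bands of equal width on the mel scale."""
    n = len(frame)
    max_mel = hz_to_mel(sample_rate / 2.0)
    energies = [0.0] * n_bands
    for k, power in enumerate(power_spectrum(frame)):
        freq_hz = k * sample_rate / n
        band = min(int(hz_to_mel(freq_hz) / max_mel * n_bands), n_bands - 1)
        energies[band] += power
    return energies


if __name__ == "__main__":
    sample_rate = 8000
    # A 1000 Hz test tone, 64 samples long.
    frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
    energies = mel_band_energies(frame, sample_rate, n_bands=8)
    print(energies.index(max(energies)))
```

Because the mel scale compresses high frequencies, the low bands cover narrow frequency ranges and the high bands cover wide ones, which roughly mirrors how human hearing resolves pitch.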

[Figure: Voice Portal Architecture. Labels: PSTN; voice server (telephony components, TTS engine, ASR engine, VoiceXML browser); Internet; Web server (HTML documents, VoiceXML documents, grammar files, audio files).]

VoiceXML Documents: These define the voice user interaction and dialog flow control.
Grammar Files: These files define the valid commands that are allowed during the voice interaction.
Audio Files: These are prerecorded audio files that are played back, or the recordings of the user's input.

VoiceXML

A standard technology to make Internet content and information widely accessible via voice and phone. It provides features for voice dialogs, telephony, and performance.

Input: Speech recognition and DTMF.
Output: Prerecorded audio and text-to-speech synthesis (TTS).
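To make the input and output modes concrete, the sketch below embeds a small VoiceXML 2.0 document in a Python string and lists what it uses. The file names `menu.wav` and `topics.grxml` and the helper `summarize_io` are hypothetical. The document shows prerecorded audio with fallback text for TTS, an external speech grammar, and an inline DTMF grammar.

```python
import xml.etree.ElementTree as ET

VXML = "{http://www.w3.org/2001/vxml}"

# Hypothetical document; menu.wav and topics.grxml are placeholder file names.
DOCUMENT = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="menu">
    <field name="topic">
      <!-- Output: prerecorded audio; the text is read by TTS if the file is unavailable -->
      <prompt><audio src="menu.wav">For weather, say weather or press 1.</audio></prompt>
      <!-- Input: external speech grammar -->
      <grammar src="topics.grxml" type="application/srgs+xml"/>
      <!-- Input: inline DTMF grammar that accepts the key 1 -->
      <grammar mode="dtmf" version="1.0" root="key">
        <rule id="key"><item>1</item></rule>
      </grammar>
    </field>
  </form>
</vxml>"""


def summarize_io(document):
    """Return (input modes, [(audio file, TTS fallback text), ...])."""
    root = ET.fromstring(document)
    # A grammar without a mode attribute is a speech ("voice") grammar.
    inputs = [g.get("mode", "voice") for g in root.iter(VXML + "grammar")]
    outputs = [(a.get("src"), (a.text or "").strip()) for a in root.iter(VXML + "audio")]
    return inputs, outputs


if __name__ == "__main__":
    print(summarize_io(DOCUMENT))
```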

Call Flow

Voice Portal Service Providers

Start-up Voice Portal Companies: HeyAnita, Tellme Networks
Web Portals: AOL, MapQuest
Network Service Providers: Telera, Netbytel, and iBasis

Advantages

Differentiation
Mobile environment
Customer satisfaction
Cost savings
Useful for visually impaired persons
Reduces wastage of bandwidth
Language-independent
More revenue
Improved security

