Voice Recognition

Lawrence Pan
Syen Hassan
Jamme Tan
Overview
 History of voice recognition
 Why voice recognition?
 Technology behind voice recognition
 Five major steps
 Common applications
 Current leaders
 Demonstrations
 Product Evaluation
 Implementation of our own voice recognition system
 Grade retrieval system for EE3414
 Future Challenges
History of Voice Recognition
 Radio Rex (house trained dog), 1922
 U.S. Department of Defense, 1940s
 Speech Understanding Research (SUR) program
 Carnegie Mellon University & MIT
 Automatic interception & translation of Russian
radio transmissions (FAILURE)
 Original message: “the spirit is willing but the flesh is
weak”
 Translated message: “the vodka is strong but the
meat is disgusting.”
History Cont’d
 First major achievements
 Bell Laboratories, 1952
 Successful recognition of numbers 0 to 9, spoken
over telephone
 MIT, 1959
 Successful recognition of vowels with 93% accuracy
 Carnegie Mellon University, 1970s
 HARPY system: capable of recognizing complete
sentences
History Cont’d
 Obstacles
 Computing power: over 50 computers needed
for HARPY system to perform
 Ability to recognize speech from any person
 Taking into account different accents, speech tones, etc.
 Ability to recognize continuous speech
 so…we…do…not…have…to…speak…like…this!
 Commercialization of voice recognition
systems
History Cont’d

[Figure: Computation required and computation available in available processors over time]
[Figure: Accuracy and task complexity progress over time]
Why Voice Recognition?
 Convenience
 Natural user interface: human speech
 Improved services for the disabled

 Wider range of users

 Future possibilities and improvements


 Internet use over phones through voice portals
 Advanced applications implementing voice
control in all areas
Technology behind Voice Recognition

 Five major steps used by speech recognizer


Five major steps in voice recognition
 Capture and Digitization
 System interacts with the telephony device to capture
voice input at 8000 samples/sec
 Spectral Representation
 Voice samples converted to graphical representation
 Segmentation
 Speech signals are broken down into segmented
parts.
 Improves accuracy
 Reduces computation: impossible to process entire
signal in real time
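The capture, segmentation, and spectral steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the 8000 samples/sec rate comes from the slide, but the synthetic sine tone standing in for captured speech, the 20 ms frame length, the 10 ms hop, and all function names are hypothetical choices, and a naive DFT is used in place of whatever transform a real recognizer applies.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples/sec, the capture rate given on the slide
FRAME_LEN = 160     # 20 ms per segment (assumed value)
HOP = 80            # 10 ms step between segment starts (assumed value)


def capture_tone(freq_hz, millis):
    """Stand-in for captured audio: a pure sine tone sampled at SAMPLE_RATE."""
    n = SAMPLE_RATE * millis // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def segment(samples, frame_len=FRAME_LEN, hop=HOP):
    """Split the signal into overlapping fixed-length frames."""
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def peak_frequency(frame):
    """Frequency in Hz of the strongest spectral bin in one frame."""
    spectrum = magnitude_spectrum(frame)
    k = max(range(len(spectrum)), key=spectrum.__getitem__)
    return k * SAMPLE_RATE / len(frame)


frames = segment(capture_tone(1000, 100))
peaks = [peak_frequency(f) for f in frames]
```

Working on short frames rather than the whole signal is what keeps the per-step computation small, which is the point the Segmentation bullet makes about real-time processing.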
Graphical Representations
Acoustic Model
 Phonemes – smallest phonetic unit in a
language
 Creates distinction between other words
 e.g. b in boy and t in toy
 Allophone – different pronunciations of a
phoneme/letter
 E.g. t in tab, t in stab, tt in stutter
 Database (Lexicon) of all words known to the
system for a language
 Should contain several recordings for certain words
 E.g. “the” can be pronounced “duh” or “dee”
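A lexicon with several pronunciations per word can be pictured as a simple lookup table. The sketch below is illustrative only: the phoneme labels are informal spellings invented for this example (not a standard phone set), and `LEXICON` and `words_for` are hypothetical names.

```python
# Hypothetical lexicon: each word maps to one or more pronunciations,
# written as tuples of informal phoneme labels (an assumption of this sketch).
LEXICON = {
    "the": [("dh", "uh"), ("dh", "ee")],  # "duh" and "dee"
    "boy": [("b", "oy")],
    "toy": [("t", "oy")],
}


def words_for(phonemes):
    """Return every word whose lexicon entry matches the phoneme sequence."""
    target = tuple(phonemes)
    return sorted(word for word, prons in LEXICON.items() if target in prons)
```

Here `words_for(["dh", "ee"])` and `words_for(["dh", "uh"])` both resolve to "the", while the single differing phoneme separates "boy" from "toy".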
Acoustic Model Cont’d
 Trellis
 Data structure made up of all possible combinations of allophones
 Training of Acoustic models
 For single-user systems
 Text is read by user and recognized by system
 For multi-user systems
 Utterances spoken by many users compiled into a
database, then inputted into a recognizer
 Weights are put on certain allophones
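One way to picture a weighted trellis is as a list of columns, one per time step, each holding candidate allophones and their weights. The slides do not say how the trellis is searched; the sketch below assumes a Viterbi-style best-path search, which is a common choice, and every label, weight, and name in it is hypothetical.

```python
DEFAULT_TRANSITION = 0.1  # assumed weight for transitions not listed explicitly


def best_path(trellis, transition):
    """Viterbi-style search for the highest-scoring path through the trellis.

    trellis    -- list of dicts, one per time step: {allophone: weight}
    transition -- dict mapping (previous, current) allophone pairs to weights
    """
    # best[label] = (score, path) for the best path ending in that label
    best = {label: (w, [label]) for label, w in trellis[0].items()}
    for column in trellis[1:]:
        new_best = {}
        for label, w in column.items():
            score, path = max(
                (prev_score * transition.get((prev, label), DEFAULT_TRANSITION) * w,
                 prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            new_best[label] = (score, path + [label])
        best = new_best
    return max(best.values())[1]


# Toy trellis for "stab": the acoustics alone cannot tell the two t's apart,
# but the (made-up) transition weights favor the plain t after s.
stab_trellis = [
    {"s": 1.0},
    {"t_aspirated": 0.5, "t_plain": 0.5},
    {"ae": 1.0},
    {"b": 1.0},
]
stab_transitions = {("s", "t_plain"): 0.9, ("s", "t_aspirated"): 0.1}
stab_path = best_path(stab_trellis, stab_transitions)
```

In a trained system the weights would come from the recorded utterances rather than being set by hand as they are here.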
Language Model
 Languages have structures (i.e. grammar)
 Difference between two words can be difficult to understand
 Can be distinguished using context
 E.g. “ours” and “hours” can be determined if previous
word is “two”
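The "ours"/"hours" example can be sketched with a tiny bigram model that prefers the word seen most often after the previous one. The counts below are made up for illustration, and `BIGRAM_COUNTS` and `pick_word` are hypothetical names; a real language model would be estimated from a large text collection.

```python
# Hypothetical bigram counts, as if tallied from a training text.
BIGRAM_COUNTS = {
    ("two", "hours"): 40,
    ("two", "ours"): 1,
    ("is", "ours"): 12,
    ("is", "hours"): 2,
}


def pick_word(previous, candidates):
    """Choose the candidate that most often follows `previous`."""
    return max(candidates, key=lambda w: BIGRAM_COUNTS.get((previous, w), 0))
```

With these counts, `pick_word("two", ["ours", "hours"])` selects "hours", while the same two candidates after "is" resolve to "ours".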
Common Applications
 Call Center Automation
 Widely used in all industries (consumer interface)
 Airline companies: booking flights, general info, etc.
 Banking companies: “pay by phone”, account
balances, etc.
 Delivery Services (FedEx): tracking orders, etc.

 All general customer service systems

 Computer Integration of voice recognition


 Personal Computers
 Speech to Text Dictation
 Accessibility purposes: voice control of computers
Common Applications cont’d
 Integrated into
automobiles:
 Visteon Voice
Technology™ used in
Infiniti Q45
 Controls:
 Climate
 CD player
 Navigation system
Competing Standards
 VoiceXML (extensible markup language)
 Partners: AT&T, IBM, Motorola, Lucent Tech.
 Used in implementation of most voice portals
 Shifting target toward web developers
 SALT (Speech Application Language Tags)
 Partners: Microsoft, Intel, Cisco, SpeechWorks
 Targeted toward web developers
Current Leaders
 Dragon Systems:
 NaturallySpeaking: PC-based, user-side programs for automated speech recognition (ASR)
 Automotive, Telephony, Mobile, Games, Embedded Chips
 SpeechWorks: Connects users to industry voice portals
 AOLByPhone, FedEx, E*Trade, etc.
 BeVocal: provides voice portals for Bell South, etc.
 TellMe: provides voice portals for AT&T, Merrill Lynch,
etc.
 Philips Speech Recognition
 Services automotive, mobile device, and consumer electronic
industries
 IBM ViaVoice, MS Agent
Demonstrations
 SpeechWorks™ product line
 United Airlines' toll free flight information line (demo)
 BankWorks Automated Bill Payment (demo)
 FedEx Rate Finder (demo)
 E*Trade Stock (demo)
 AOLbyPhone service (demo)
 BeVocal solutions
Magical Merlin’s Grade Retrieval System

 Designed in Visual Basic using Microsoft’s MSAgent
 Click on my belly for a short demonstration

Menu              Recognized voice commands
First Exam        First Exam, First Test, First Midterm
Second Exam       Second Exam, Second Test, Second Midterm
Quiz Grades       Quiz Grades, Grade on Quizzes
Homework Grades   Homework Grades, Grade on Homework
Project Grade     Project Grade, Grade on Project
Final Grade       Final Grade, Grade for course
Main Menu         Main menu, Main, Class
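The command table above amounts to a many-to-one mapping from spoken phrases to menu items. The original system was built in Visual Basic with MSAgent; the Python sketch below only illustrates the lookup logic, assuming the recognizer hands back the phrase as text. The names `MENU_COMMANDS`, `PHRASE_TO_MENU`, and `resolve` are hypothetical.

```python
# Menu items and the voice commands listed for each one in the table.
MENU_COMMANDS = {
    "First Exam": ["First Exam", "First Test", "First Midterm"],
    "Second Exam": ["Second Exam", "Second Test", "Second Midterm"],
    "Quiz Grades": ["Quiz Grades", "Grade on Quizzes"],
    "Homework Grades": ["Homework Grades", "Grade on Homework"],
    "Project Grade": ["Project Grade", "Grade on Project"],
    "Final Grade": ["Final Grade", "Grade for course"],
    "Main Menu": ["Main menu", "Main", "Class"],
}

# Invert the table once so each phrase maps directly to its menu item.
PHRASE_TO_MENU = {
    phrase.lower(): menu
    for menu, phrases in MENU_COMMANDS.items()
    for phrase in phrases
}


def resolve(phrase):
    """Map a recognized phrase to its menu item, or None if it is unknown."""
    normalized = " ".join(phrase.lower().split())
    return PHRASE_TO_MENU.get(normalized)
```

Normalizing case and whitespace before the lookup is a design choice of this sketch; it lets "first midterm" and "First Midterm" reach the same menu item.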
Future Challenges
 Speech Technology
 VoiceXML vs. SALT
 Voice enabling web content
 Real time access to source data
 Stock market, traffic, sports, etc.
 Clear connection needed for effective use of
voice portals
 Security Issues involved
 Advertising based revenue
References
 http://www.stanford.edu/~jmaurer/homepage.htm
 http://www.bevocal.com/corporateweb/technology/index.html
 http://www.speechworks.com/demos/index.cfm
 http://www.speechworks.com/learn/index.cfm
 http://www.scansoft.com/realspeak/tts2500/
 http://www.out-loud.com/speechacts.html
 http://www.gignews.com/fdlspeech1.htm
 http://www.gignews.com/fdlspeech2.htm
 http://www.gignews.com/fdlspeech3.htm
 http://www.microsoft.com/msagent/default.asp
