Introduction
You don’t have to be a scientist to know that the computer of the future will
talk, listen and understand. One such computer is today’s Apple Macintosh.
Apple’s Speech Recognition and Speech Synthesis Technologies now give
speech-savvy applications the power to carry out your voice commands and
even speak back to you in plain English.
Apple Speech Recognition lets the system (Macintosh) understand what you
say, giving you a new dimension for interacting with and controlling your
computer by voice. You don’t even have to train it to understand your voice,
because it already understands you, from your very first word. You can
speak naturally, without pausing or stopping. Apple’s leadership in speech
recognition technology makes this possible by bringing a whole new dimension
to the user interface: speech. Combined with VoiceOver, speech synthesis
will help turn the graphical user interface into a vocal user interface.
Speech recognition (in many contexts also known as 'automatic speech
recognition', computer speech recognition or erroneously as Voice
Recognition) is the process of converting a speech signal to a sequence of
words, by means of an algorithm implemented as a computer program.
Speech recognition applications that have emerged in recent years
include voice dialing (e.g., “Call home”), call routing (e.g., “I would like to
make a collect call”), simple data entry (e.g., entering a credit card number),
and preparation of structured documents (e.g., a radiology report).
Voice Verification or speaker recognition is a related process that attempts to
identify the person speaking, as opposed to what is being said.
Speech Technology Development at IBM:
The overall view, with emphasis on Via-Scribe and Accessibility
Recognition
Conversational Biometrics: speaker identification, speaker verification
Text-to-Speech Synthesis: Home Page Reader, ViaVoice
Machine Translation: MASTOR, DARPA projects, WebSphere
Speech recognition is a technology that is constantly evolving. It is a technology
that is experiencing tremendous growth in the commercial market, apart from its
original niche as an assistive technology product. There are presently three major
companies with speech recognition products: Dragon Systems, Lernout & Hauspie
(L&H), and IBM. Stiff competition between these companies, together with growing
demand from consumer and business markets, has led to a tremendous drop in prices over
the last few years. Competition has also fueled the development of a plethora of
new products. Each company has several products available, ranging in price,
features, and the applications that they support. This paper seeks to make sense of
the overwhelming array of products so that persons who are shopping for speech
recognition will have a better understanding of their choices.
What are the Types of Speech Recognition?
*Discrete
• Slower dictation process - better for persons with difficulty in language
processing or in fluid speech
• Word-by-word style, rather than phrases, reflects the way beginning writers
form sentences
*Continuous
• Processes speech by phrase
• Takes context into account
• Is less accurate if phrases are interrupted
• Advantages: Speed and accuracy (for most users)
Who Can Benefit from Speech Recognition?
• Persons with mobility impairments or injuries that prevent keyboard access
• Persons who have or who are seeking to prevent repetitive stress injuries
• Persons with writing difficulties
• Any person who wants hands-free access to the computer
• Any person who wants to increase their typing speed
(reportedly up to 160 wpm)
Speech Analysis
Who? What? How?
The primary goal of the speech analysis is to correctly determine individual words
with probability ≤ 1. A word is recognized only with a certain probability.
Environmental noise, room acoustics and a speaker’s physical and
psychological conditions play an important role.
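For example, let’s assume extremely poor individual word recognition with a
probability of 0.95. This means that 5% of the words are incorrectly
recognized. If we have a sentence with three words, the probability of
recognizing the whole sentence correctly is 0.95 × 0.95 × 0.95 ≈ 0.857
(assuming the word errors are independent).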
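The three-word example can be worked through numerically. The following is a minimal sketch, assuming that word errors are independent of one another (an assumption the text does not state explicitly); the function name `sentence_accuracy` is purely illustrative.

```python
def sentence_accuracy(word_accuracy: float, num_words: int) -> float:
    """Probability that every word of a sentence is recognized correctly.

    Assumes each word is recognized independently with the same
    probability `word_accuracy` (a simplifying assumption).
    """
    if not 0.0 <= word_accuracy <= 1.0:
        raise ValueError("word_accuracy must lie in [0, 1]")
    if num_words < 0:
        raise ValueError("num_words must be non-negative")
    return word_accuracy ** num_words


if __name__ == "__main__":
    # Per-word accuracy of 0.95 for sentences of increasing length.
    for n in (1, 3, 10):
        print(n, round(sentence_accuracy(0.95, n), 3))
```

Under this assumption, sentence-level accuracy falls quickly as sentences get longer, even when the per-word probability looks high.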
Figure: Speech Recognition System (labels: Reference storage: properties of learned material; Recognized speech)
• First, properties are extracted by comparing the characteristics of
individual speech elements with a sequence of speech element
characteristics given in advance.
• Second, the speech elements are compared with existing references to
determine the mapping to one of the existing speech elements. The
identified speech can be stored, transmitted or processed as a
parameterized sequence of speech elements.
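The comparison of speech elements against stored references can be illustrated with a small nearest-reference classifier. This is only a sketch under stated assumptions: the three-component characteristic vectors, the labels in `REFERENCES`, and the use of Euclidean distance are all hypothetical choices made for illustration, not the method of any particular system.

```python
import math

# Hypothetical reference storage: label of a speech element -> its
# characteristic vector (values invented for illustration).
REFERENCES = {
    "a": [1.0, 0.2, 0.1],
    "i": [0.1, 1.0, 0.3],
    "u": [0.2, 0.1, 1.0],
}


def distance(x, y):
    """Euclidean distance between two characteristic vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def classify(features, references=REFERENCES):
    """Map one extracted characteristic vector to the closest reference."""
    return min(references, key=lambda label: distance(features, references[label]))


def recognize(feature_sequence, references=REFERENCES):
    """Turn a sequence of characteristic vectors into a sequence of labels,
    i.e. a parameterized sequence of speech elements."""
    return [classify(features, references) for features in feature_sequence]


if __name__ == "__main__":
    print(recognize([[0.9, 0.3, 0.0], [0.0, 0.8, 0.4]]))
```

The output is a list of labels rather than raw signal data, which is the sense in which the extracted properties reduce the amount of data to be stored or transmitted.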
Usually the comparison and decision are executed by the main
system processor. The computer’s secondary storage contains the
letter-to-phone rules, a dictionary of exceptions and the reference
characteristics. The concrete methods differ in the definition of the
characteristics. The principle of “data reduction through property
extraction” can be applied several times to different characteristics. The
system which provides recognition and understanding of a speech signal
applies this principle several times:
Figure: Speech → Acoustical and Phonetic Analysis → Syntactical Analysis (→ Recognized Speech) → Semantic Analysis → Understood Speech
• In the first step, the principle is applied to a sound pattern and/or word
model. An acoustical and phonetic analysis is performed.
• In the second step, certain speech units go through syntactical analysis;
thereby, the errors of the previous step can be recognized. Very
often during the first step, no unambiguous decision can be made.
In this case, syntactical analysis provides additional decision help,
and the result is recognized speech.
• The third step deals with the semantics of the previously recognized
language. Here the decision errors of the previous step can be
recognized and corrected with other analysis methods. Even today,
this step is non-trivial to implement with the current methods known in
artificial intelligence and neural network research. The result of this
step is understood speech.
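The way syntactical analysis can resolve an ambiguity left by the acoustic step can be sketched with a toy example. Everything here is a hypothetical illustration: the candidate words and scores in `ACOUSTIC_CANDIDATES`, the word-pair "grammar" in `ALLOWED_PAIRS`, and the function names are assumptions, and the semantic step is deliberately left out.

```python
from itertools import product

# Hypothetical output of the first step: for each word position, candidate
# words with acoustic scores. "write" and "right" sound alike, so the
# acoustic analysis alone cannot decide between them.
ACOUSTIC_CANDIDATES = [
    [("I", 0.9), ("eye", 0.8)],
    [("write", 0.5), ("right", 0.5)],
    [("letters", 0.9)],
]

# Toy syntax: the word pairs that are allowed to follow one another.
ALLOWED_PAIRS = {("I", "write"), ("write", "letters"), ("eye", "right")}


def is_syntactically_valid(words):
    """Check that every adjacent word pair is permitted by the toy syntax."""
    return all(pair in ALLOWED_PAIRS for pair in zip(words, words[1:]))


def syntactic_analysis(candidates):
    """Second step: return the best-scoring candidate sequence that passes
    the syntax check, or None if no sequence passes."""
    best_words, best_score = None, -1.0
    for combination in product(*candidates):
        words = [word for word, _ in combination]
        if not is_syntactically_valid(words):
            continue
        score = 1.0
        for _, acoustic_score in combination:
            score *= acoustic_score
        if score > best_score:
            best_words, best_score = words, score
    return best_words


if __name__ == "__main__":
    print(syntactic_analysis(ACOUSTIC_CANDIDATES))
```

Enumerating every combination is only practical for a toy example of this size; it is used here to keep the idea visible, not as a realistic search strategy.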
There are still many problems into which speech recognition research
is being conducted:
• A specific problem is presented by room acoustics with
existing environmental noise. The frequency-dependent
reflections of a sound wave from walls and objects can
overlap with the primary sound wave.
• Word boundaries must be determined.
• During comparison, time normalization is necessary. The
Speech recognition systems are divided into speaker-independent
and speaker-dependent recognition systems. A speaker-independent
system can recognize, with the same reliability, substantially fewer
words than a speaker-dependent system because the latter is trained in
advance. Training in advance means that there is a training phase for
the speech recognition system, which takes about half an hour. A
speaker-dependent recognition system can recognize around 25,000 words; a
speaker-independent recognition system can recognize around 500 words, but with a
worse recognition rate. These figures should be understood as rough guidelines.
Speech Transmission
The area of speech transmission deals with efficient coding that allows
the speech/sound signal to be transmitted correctly and efficiently over networks,
such that the receiver obtains the same quality of speech/sound as the sender
produced. Some principles are:
Signal form coding
Here no speech-specific properties and parameters are needed.
The goal is to achieve the most efficient coding of the audio signal. The
data rate of a PCM-coded stereo audio signal with CD-quality
requirements is 1,411,200 bits/s.
Telephone quality, in comparison to CD quality, needs only 64
kbits/s. Using DPCM, the data rate can be lowered to 56 kbits/s
without loss of quality.
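The two PCM data rates quoted above follow from sampling rate × bits per sample × number of channels. The sketch below assumes the usual parameters, which the text itself does not list: 44,100 samples/s, 16 bits and two channels for CD audio, and 8,000 samples/s, 8 bits and one channel for telephone speech. The function name is illustrative.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Data rate of an uncompressed PCM stream in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Assumed CD-quality parameters: 44.1 kHz, 16 bits, stereo.
CD_RATE = pcm_bit_rate(44_100, 16, 2)

# Assumed telephone-quality parameters: 8 kHz, 8 bits, mono.
TELEPHONE_RATE = pcm_bit_rate(8_000, 8, 1)

if __name__ == "__main__":
    print("CD quality:       ", CD_RATE, "bits/s")
    print("Telephone quality:", TELEPHONE_RATE, "bits/s")
```

With these parameters the CD-quality stream needs roughly 22 times the data rate of the telephone-quality stream.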
Recognition/synthesis Methods
There have been attempts to reduce the transmission rate using pure
recognition/synthesis methods. Speech analysis (recognition)
takes place on the sender side of a speech transmission system and
speech synthesis (generation) takes place on the receiver side.
Conclusion
Dragon’s current continuous speech product line, known as Dragon
NaturallySpeaking, includes a Standard, Preferred, and Professional edition,
listed in order from low end to high end. The Preferred edition includes
dictation playback and text-to-speech, features that distinguish it from the
Standard edition. The Preferred edition also supports input from an external
recording device, although no recording device is provided. A special
version of the Preferred edition, Dragon NaturallySpeaking Mobile, does
include a digital recording device for additional cost. On the high end of
Dragon’s NaturallySpeaking product line, the Professional edition is
distinguished by its expanded macro and scripting capabilities, which allow
users to dictate long sections of text or complex computer operations with
simple commands. The Professional edition also comes in Legal and Medical
versions, which feature custom vocabularies for these disciplines.
major competitor of Dragon Dictate. However, IBM has discontinued this
product and is now focusing all its efforts on developing continuous speech
products. Its current product line, IBM ViaVoice Millennium, includes a
Standard, Web and Professional edition. The Web edition features natural
language commands for Internet Explorer, Netscape Communicator and
America Online. The Web edition also features a specialized vocabulary for
on-line chats. The Professional edition provides most of the features of the
Web edition, but also provides natural language commands for the entire
Microsoft Office suite, and specialized business, finance, and computer
vocabularies.
A person who has a disability or who works with persons with disabilities
will come away from this presentation with a more accurate picture of which
speech recognition products will work best for them. There is a lot of
confusion today about speech recognition products. The main focus of this
presentation is to clarify speech recognition technology.