
28/03/2011 Speech Recognition

Speech Recognition
Speech recognition technologies allow computers equipped with microphones to interpret human
speech, e.g. for transcription or as a control method.

Such systems can be classified by whether or not they require the user to "train" the system to
recognise their own particular speech patterns, whether the system can recognise continuous
speech or requires users to break up their speech into discrete words, and whether the vocabulary
the system recognises is small (on the order of tens or at most hundreds of words) or large
(thousands of words).

Systems requiring a short period of training can (as of 2001) capture continuous speech with a
large vocabulary at normal pace with an accuracy of about 98% (getting two words in one hundred
wrong), while other systems that require no training can recognise a small number of words (for
instance, the ten digits of the decimal system) as spoken by most English speakers. Such
small-vocabulary systems are popular for routing incoming phone calls to their destinations in
large organisations.
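The accuracy figure above is a per-word measure. One common way to compute it (assumed here; the source does not say how its figure was measured) is the word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recogniser's output into a reference transcript, divided by the number of words in the reference. An accuracy of about 98% then corresponds to a WER of about 2%. The sketch below uses hypothetical function names and a made-up 100-word transcript.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = word-level edit distance between the first i reference
    # words and the first j hypothesis words (Levenshtein distance)
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - WER (negative if there are many insertions)."""
    return 1.0 - word_error_rate(reference, hypothesis)


# Hypothetical example: a 100-word reference in which the recogniser
# gets two words wrong, i.e. "two words in one hundred".
reference = " ".join(f"w{i}" for i in range(100))
hypothesis_words = reference.split()
hypothesis_words[10] = "x"
hypothesis_words[50] = "y"
hypothesis = " ".join(hypothesis_words)
print(round(word_accuracy(reference, hypothesis), 2))  # 0.98
```

Counting insertions as well as substitutions and deletions means a recogniser cannot improve its score simply by emitting extra words.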

Commercial systems for speech recognition have been available off-the-shelf since the 1990s.
However, despite the apparent success of the technology, few people use such systems.

It appears that most computer users can create and edit documents more quickly with a
conventional keyboard, even though most people can speak considerably faster than they can
type. In addition, prolonged dictation strains the voice, a problem known as vocal loading.

Some of the key technical problems in speech recognition are the following:

- Inter-speaker differences are often large and difficult to account for. It is not clear which
  characteristics of speech are speaker-independent.
- The interpretation of many phonemes, words and phrases is context-sensitive. For example,
  phonemes are often shorter in long words than in short words. Words have different meanings
  in different sentences; e.g. "Philip lies" could mean either that Philip is a liar or that
  Philip is lying on a bed.
- Intonation and speech timbre can completely change the correct interpretation of a word or
  sentence; e.g. "Go!", "Go?" and "Go." are easily distinguished by a human, but not so easily
  by a computer.
- Words and sentences can have several valid interpretations, with the speaker leaving it to
  the listener to choose the correct one.
- Written language may require punctuation (commas, sentence endings, quotation marks)
  according to strict rules that are only weakly marked in speech and are difficult to infer
  without knowing the meaning.

The "understanding" of the meaning of spoken words is regarded by some as a separate field, that
of natural language understanding. However, there are many examples of sentences that sound the
same but can only be disambiguated by an appeal to context: one famous T-shirt worn by Apple
Computer researchers stated,

I helped Apple wreck a nice beach,

which, when spoken, sounds like "I helped Apple recognize speech".

A general solution to many of the above problems effectively requires human knowledge and
experience, and would therefore require advanced artificial intelligence techniques to be
implemented on a computer. In practice, statistical language models are often used to choose
between competing interpretations and so improve recognition accuracy.
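As an illustration of how a statistical language model can disambiguate, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing. The training corpus, the function names and the candidate transcriptions are all hypothetical, chosen only to mirror the T-shirt example; this is not the method of any particular recogniser. Given two transcriptions that sound alike, the model prefers the word sequence that is more probable under its training text.

```python
import math
from collections import Counter


def train_bigram_model(corpus):
    """Count unigrams and bigrams in a list of sentences (toy training data)."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        # <s> and </s> mark sentence start and end
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_probability(sentence, unigrams, bigrams):
    """Add-one (Laplace) smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Unseen bigrams get a count of 1 instead of 0, so no probability is zero
        count = bigrams[(prev, word)]
        total += math.log((count + 1) / (unigrams[prev] + vocab_size))
    return total


# Hypothetical toy training text
corpus = [
    "computers can recognize speech",
    "apple software can recognize speech",
    "i helped write software that can recognize speech",
    "a storm can wreck a nice beach",
]
unigrams, bigrams = train_bigram_model(corpus)

candidates = [
    "i helped apple recognize speech",
    "i helped apple wreck a nice beach",
]
scores = {c: log_probability(c, unigrams, bigrams) for c in candidates}
best = max(scores, key=scores.get)
print(best)  # i helped apple recognize speech
```

In a real recogniser such a language-model score would be combined with an acoustic score rather than used on its own, and the model would be trained on far more text.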

See also
Robotics

Other resources

Statistical Language Modeling (Natural Language Processing Lab, Northeastern University, China)

This article is licensed under the GNU Free Documentation License. It uses material from
Wikipedia.

http://www.i2osig.org/speech.html 1/2

Copyright © 2004 i2osig. All Rights Reserved. Information provided with no warranties. Use it at your own risk.