
AUTOMATIC SPEECH RECOGNITION

MD SHAKIR ALAM (2K18/CO/194)
KARTIK DAGAR (2K18/CO/168)
DESCRIPTION:
Automatic speech recognition (ASR) is an autonomous, machine-based process of decoding
and transcribing spoken language. A typical ASR system receives acoustic input from a speaker
through a microphone, analyses it using a pattern, model, or algorithm, and produces
an output, usually in the form of text.
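Most ASR systems begin their analysis the same way: they cut the sampled signal into short, overlapping frames and compute a feature vector for each frame. The sketch below illustrates only that front-end step. It assumes a 16 kHz sampling rate with 25 ms windows and a 10 ms hop (400 and 160 samples), which are common conventions. The per-frame log energy it computes is a simplified stand-in for the richer features, such as MFCCs, that practical systems use. All function names are hypothetical.

```python
import math
from typing import List


def frame_signal(samples: List[float], frame_len: int = 400, hop: int = 160) -> List[List[float]]:
    """Split a signal into overlapping frames.

    Defaults assume 16 kHz audio: 400 samples = 25 ms window, 160 samples = 10 ms hop.
    """
    if len(samples) < frame_len:
        return []
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def log_energy(frame: List[float], floor: float = 1e-10) -> float:
    """Log of the frame's total energy; the floor avoids log(0) on silent frames."""
    return math.log(max(sum(x * x for x in frame), floor))


def extract_features(samples: List[float]) -> List[float]:
    """One scalar feature per frame (a simplified stand-in for MFCC-style vectors)."""
    return [log_energy(f) for f in frame_signal(samples)]


if __name__ == "__main__":
    rate = 16000
    tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]  # 1 s of a 440 Hz tone
    print(len(extract_features(tone)))  # 98 frames
```

A real recognizer would pass these per-frame features, not raw samples, to its pattern-matching or statistical model.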
Speech recognition systems can be characterized along three main dimensions: speaker
dependence, speech continuity, and vocabulary size. Depending on the speech data in the
training database, ASR systems can be speaker-dependent (the system must be trained for
each individual speaker), speaker-independent (the training database contains numerous
speech examples from different speakers, so the system can accurately recognize a new
speaker), or adaptive (the system starts out as speaker-independent and then gradually
adapts to a particular user through training). Along the dimension of speech continuity,
there are (a) isolated (or discrete) word recognition systems, which identify words uttered
in isolation; (b) connected word recognition systems, which recognize separately spoken
words even when there is little or no pause between them; (c) continuous speech
recognition systems, which can recognize whole sentences spoken without pauses between
words; and (d) word spotting systems, which extract individual words and phrases from a
continuous stream of speech.
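Classic speaker-dependent, isolated word recognizers often work by template matching. Each vocabulary word is stored as a reference sequence of feature vectors recorded by the user, and an incoming utterance is labelled with the word whose template is closest. Dynamic time warping (DTW) is one well-known way to measure that closeness. It is named here as an example technique, not as this project's method. Because DTW stretches or compresses the time axis, it also tolerates differences in speech rate. The sketch below assumes each utterance has already been converted into a list of per-frame feature vectors. The two-word vocabulary and its 2-D features are hypothetical toy data.

```python
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two frame feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw_distance(query: List[Sequence[float]], template: List[Sequence[float]]) -> float:
    """Minimal cumulative frame distance over all monotonic alignments (classic DTW)."""
    n, m = len(query), len(template)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning query[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(query[i - 1], template[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # several query frames map to one template frame (slower speech)
                cost[i][j - 1],      # one query frame covers several template frames (faster speech)
                cost[i - 1][j - 1],  # one-to-one match
            )
    return cost[n][m]


def recognize(query: List[Sequence[float]], templates: Dict[str, List[Sequence[float]]]) -> str:
    """Return the vocabulary word whose stored template is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(query, templates[word]))


if __name__ == "__main__":
    # Hypothetical templates recorded by one user (speaker-dependent).
    templates = {
        "yes": [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]],
        "no": [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]],
    }
    # The same "yes", spoken twice as slowly: every frame is repeated.
    slow_yes = [[1.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 0.0], [3.0, 0.0], [3.0, 0.0]]
    print(recognize(slow_yes, templates))  # yes
```

The slowed utterance aligns with the "yes" template at zero cost, so the difference in speaking rate alone does not cause a misrecognition. Template matching of this kind scales poorly to large vocabularies and many speakers. This is one reason speaker-independent and continuous systems generally rely on statistical models instead.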
Automatic speech recognition is multidisciplinary. State-of-the-art ASR systems require
knowledge from disciplines such as linguistics, computer science, signal processing,
acoustics, communication theory, statistics, physiology, and psychology. Developing an
effective ASR system poses a number of challenges. They include speech variability (e.g.,
intra- and interspeaker variability such as different voices, accents, styles, contexts, and
speech rates), recognition units (e.g., words and phrases, syllables, phonemes, diphones,
and triphones), language complexity (e.g., vocabulary size and difficulty), ambiguity (e.g.,
homophones, word boundaries, syntactic and semantic ambiguity), and environmental
conditions (e.g., background noise or several people speaking simultaneously).
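Continuous speech contains no reliable pauses between words, so word-boundary ambiguity arises. The same phone sequence can be split into words in more than one way, as in the classic "ice cream" / "I scream" pair. The sketch below enumerates every segmentation of an unsegmented phone string. It assumes a hypothetical four-word pronunciation lexicon written in ARPAbet-style symbols. A real recognizer would score competing candidates with acoustic and language models rather than simply listing them.

```python
from typing import Dict, List, Tuple

# Hypothetical toy pronunciation lexicon (ARPAbet-style phone symbols).
LEXICON: Dict[str, Tuple[str, ...]] = {
    "I": ("AY",),
    "ice": ("AY", "S"),
    "cream": ("K", "R", "IY", "M"),
    "scream": ("S", "K", "R", "IY", "M"),
}


def segmentations(phones: Tuple[str, ...],
                  lexicon: Dict[str, Tuple[str, ...]] = LEXICON) -> List[List[str]]:
    """Enumerate every way to split a phone sequence into lexicon words."""
    if not phones:
        return [[]]  # exactly one way to segment nothing: the empty word list
    results: List[List[str]] = []
    for word, pron in lexicon.items():
        if phones[:len(pron)] == pron:  # word matches at the current position
            for rest in segmentations(phones[len(pron):], lexicon):
                results.append([word] + rest)
    return results


if __name__ == "__main__":
    print(segmentations(("AY", "S", "K", "R", "IY", "M")))
    # [['I', 'scream'], ['ice', 'cream']]
```

Both word sequences are acoustically identical here. Choosing between them requires context, which is why word boundaries are listed among the sources of ambiguity.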
Despite these challenges, there are numerous commercial ASR products, including Dragon
NaturallySpeaking, Embedded ViaVoice, Loquendo, LumenVox, VoCon, and Nuance
Recognizer. Some of these have applications in computer-system interfaces (e.g., voice
control of computers, data entry, dictation), education (e.g., toys, games, language
translators, language learning software), healthcare (e.g., systems for creating various
medical reports, aids for blind and visually impaired patients), telecommunications (e.g.,
phone-based interactive voice response systems for banking services, information services),
manufacturing (e.g., quality control monitoring on an assembly line), military (e.g., voice
control of fighter aircraft), and consumer products and services (e.g., car navigation systems,
household appliances, and mobile devices).
