STANLEY COLLEGE OF ENGINEERING AND TECHNOLOGY FOR WOMEN

Abids, Hyderabad
ECE Department

Speech Recognition
2016-17

Internal Guide: Mrs. T. Prasanna, Associate Professor
Presented By: V. Lavanya, 160613735094
ECE 2
ABSTRACT
Language is man's most important means of communication
and speech its primary medium.
Speech recognition involves capturing and digitizing the
sound waves, converting them to basic language units or
phonemes, constructing words from phonemes, and contextually
analyzing the words to ensure correct spelling for words
that sound alike.
Speech Recognition is the ability of a computer to recognize
general, naturally flowing utterances from a wide variety of
users. It recognizes the caller's answers to move along the
flow of the call.
GENERATIONS
Generation 1 (1930s to 1950s): Use of ad hoc methods to recognize
sounds, or small vocabularies of isolated words.
Generation 2 (1950s to 1960s): Use of acoustic-phonetic approaches to
recognize phonemes, phones, or digit vocabularies.
Generation 3 (1960s to 1980s): Use of pattern recognition approaches to
speech recognition of small to medium-sized vocabularies of isolated
and connected word sequences, including use of linear predictive coding
(LPC) as the basic method of spectral analysis; use of LPC distance
measures for pattern similarity scores; use of dynamic programming
methods for time aligning patterns; use of pattern recognition methods
for clustering multiple patterns into consistent reference patterns;
use of vector quantization (VQ) codebook methods for data reduction
and reduced computation.
Generation 4 (1980s to 2010s): Use of Hidden Markov model (HMM) statistical
methods for modelling speech dynamics and statistics in a continuous speech
recognition system; use of forward-backward and segmental K-means training
methods; use of Viterbi alignment methods; use of maximum likelihood (ML)
and various other performance criteria and methods for optimizing statistical
models; introduction of neural network (NN) methods for estimating conditional
probability densities; use of adaptation methods that modify the parameters
associated with either the speech signal or the statistical model so as to enhance
the compatibility between model and data for increased recognition accuracy.
Generation 5 (2000s to 2010s): Use of parallel processing methods to increase
recognition decision reliability; combinations of HMMs and acoustic-phonetic
approaches to detect and correct linguistic irregularities; increased robustness
for recognition of speech in noise; machine learning of optimal combinations of
models.
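The Viterbi alignment named under Generation 4 can be illustrated on a toy model. The following is a minimal sketch in Python, assuming a hypothetical two-state HMM ("silence" and "speech") that emits "low" or "high" energy symbols. The state names, symbols and all probabilities are invented for illustration and are not taken from any real recognizer.

```python
import math

# Hypothetical toy HMM: every name and number below is an illustrative assumption.
STATES = ("silence", "speech")
START_P = {"silence": 0.8, "speech": 0.2}
TRANS_P = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
EMIT_P = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations.

    Works in log space to avoid underflow; assumes no probability is zero.
    """
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    obs = ["low", "low", "high", "high", "low"]
    print(viterbi(obs, STATES, START_P, TRANS_P, EMIT_P))
```

In a real recognizer the states would correspond to sub-phoneme units and the observations to feature vectors, but the dynamic programming recursion has the same shape.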
INTRODUCTION
Speech recognition (sometimes referred to as Automatic
Speech Recognition, or ASR) is the process by which a
computer identifies spoken words.
Basically, it means talking to a computer and having it
correctly understand what you are saying. By
"understand" we mean that the application either reacts
appropriately or converts the input speech into another
medium that a further application can process to give
the user the required result.
The days when you had to keep staring at the computer
screen and frantically hit a key or click the mouse for the
computer to respond to your commands may soon be a
thing of the past.
Today you can stretch out, relax and tell your computer
to do your bidding. This has been made possible by ASR
(Automatic Speech Recognition) technology.
Speech recognition is an alternative to traditional methods of
interacting with a computer, such as textual input through a
keyboard. An effective system can replace, or reduce the
reliance on, standard keyboard and mouse input.
USED FOR
This can especially assist the following:
People who have limited keyboard skills or experience, who
are slow typists, or who do not have the time or resources
to develop keyboard skills.
Dyslexic people or others who have problems with
character or word use and manipulation in a textual form.
People with physical disabilities that affect either their
data entry or their ability to read (and therefore check)
what they have entered.
PROCESS
Any speech recognition system involves the following five major steps:
Signal Processing: The sound is received through the microphone in the form of analog
electrical signals. These signals consist of the user's voice and noise from the
surroundings. The noise is reduced and the signals are converted into digital signals.
These digital signals are then converted into a sequence of feature vectors. (Feature
vector: if you have a set of numbers representing certain features of an object you want
to describe, it is useful for further processing to construct a vector out of these
numbers by assigning each measured value to one component of the vector.)
Speech Recognition: This is the most important part of the process; here the actual
recognition is done. The sequence of feature vectors is decoded into a sequence of
words. This decoding is based on algorithms such as Hidden Markov Models, Neural
Networks or Dynamic Time Warping. The program has a large dictionary of common words
in the language. Each feature vector is matched against known sounds and converted
into an appropriate character group. The program then compares the formed character
groups with words that sound similar, and collects all such candidate words.
Semantic Interpretation: Here the system checks whether the language
allows a particular syllable to appear after another. A grammar check
follows, which tries to determine whether or not the combination of
words makes any sense.
Dialog Management: The system attempts to correct any errors
encountered. The meaning of the combined words is then extracted and
the required task is performed.
Response Generation: After the task is performed, the response or
result of that task is generated, either as speech or as text. The
wording that maximizes the user's understanding is chosen here. If the
response is to be given as speech, a text-to-speech conversion process
is used.
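Of the decoding algorithms named in the Speech Recognition step, Dynamic Time Warping is the simplest to show in a few lines. The following is a minimal sketch in Python, assuming each utterance has already been reduced to a sequence of one-dimensional feature values. The template words and their numbers are hypothetical and only illustrate matching an input against stored reference patterns.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j], so the
    sequences may be stretched or compressed in time relative to each other.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def recognize(features, templates):
    """Return the template word whose pattern is closest to the input."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))


# Hypothetical reference patterns (illustrative numbers only).
TEMPLATES = {
    "yes": [1.0, 3.0, 4.0, 2.0],
    "no": [4.0, 4.0, 1.0, 1.0],
}

if __name__ == "__main__":
    # A slower rendition of the "yes" pattern: some samples are repeated.
    spoken = [1.0, 1.0, 3.0, 3.0, 4.0, 2.0]
    print(recognize(spoken, TEMPLATES))
```

Because the alignment absorbs differences in speaking rate, a slow and a fast rendition of the same pattern score as close. This is why the technique suits isolated-word recognition with small vocabularies.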
STRUCTURE
[Figure: Signal analysis, converting raw speech to speech frames]
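Converting raw speech into frames, and each frame into a feature vector as described under Signal Processing, can be sketched as follows. This is a minimal Python illustration and not the method of any particular system. The frame length, hop size and the two features (log energy and zero-crossing rate) are assumptions chosen for simplicity; practical recognizers typically use richer spectral features.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def feature_vector(frame):
    """Build a two-component feature vector for one frame (needs >= 2 samples)."""
    # Component 1: log of the mean energy; a small constant avoids log(0).
    energy = sum(x * x for x in frame) / len(frame)
    log_energy = math.log(energy + 1e-10)
    # Component 2: fraction of adjacent sample pairs whose sign differs.
    crossings = sum(1 for x, y in zip(frame, frame[1:]) if (x >= 0) != (y >= 0))
    zero_crossing_rate = crossings / (len(frame) - 1)
    return [log_energy, zero_crossing_rate]


def extract_features(samples, frame_len=400, hop=160):
    """Turn a digitized signal into a sequence of feature vectors.

    The defaults assume 16 kHz audio (25 ms frames every 10 ms); both
    values are illustrative assumptions.
    """
    return [feature_vector(f) for f in frame_signal(samples, frame_len, hop)]


if __name__ == "__main__":
    # One second of a synthetic 440 Hz tone sampled at 16 kHz stands in for speech.
    rate = 16000
    tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    features = extract_features(tone)
    print(len(features), features[0])
```

Each frame yields one vector, so the recognizer downstream sees a sequence of short fixed-size descriptions instead of the raw waveform.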
ADVANTAGES
Increases productivity.
Can help with menial computer tasks, such as
browsing and scrolling.
Can help people with disabilities.
Cost effective.
Diminishes spelling mistakes.
DISADVANTAGES
Inaccuracy and slowness.
Vocal strain.
Adaptability.
Out-of-vocabulary (OOV) words.
Spontaneous speech.
Accent and mixed language.
APPLICATIONS
Data entry.
Document editing.
Speaker Identification.
Automation at Call Centers.
Medical Disabilities.
FUTURE SCOPE
Achieving efficient speaker-independent word recognition
Ability to distinguish nuances of speech and meanings of
words
Stand-alone speech recognition systems
Wearable speech recognition systems
Talking with all devices
CONCLUSION
Speech recognition will revolutionize the way people interact
with smart devices and will, ultimately, differentiate upcoming
technologies. Almost all smart devices coming to market today
are capable of recognizing speech. Many areas can benefit
from this technology. Speech recognition can be used for intuitive
operation of computer-based systems in daily life.
This technology will spawn revolutionary changes in the modern
world and become a pivotal technology. Within five years, speech
recognition technology will become so pervasive in our daily lives
that service environments lacking this technology will be
considered inferior.