Speech Recognition - Specific Task of Speech Recognition: Abstract

SPEECH RECOGNITION – SPECIFIC TASK OF SPEECH RECOGNITION
Er. Sarbjeet Singh, Er. Manjit Thapa, Er. Gurpreet Singh, Er. Sukhvinder Singh
Department of Computer Science, Sri Sai College of Engg. & Tech., Badhani (Pathankot).
Email id: sarbaish@gmail.com, manjit.thapa@yahoo.co.in, chohan87@gmail.com, sukhaish@gmail.com
Abstract: This paper presents an overview of speech recognition technology, software, development
and applications. It begins with a description of how such systems work and the level of accuracy
that can be expected. Applications of speech recognition technology in education and beyond are
then explored. A brief comparison of the most common systems is presented, along with notes on the
main centers of speech recognition research in the educational sector. The paper concludes with
potential uses of speech recognition in education, probable main uses of the technology in the
future, and a selection of key web-based resources. We introduce original visual descriptors
related to the dominant and residual image motions. The different summary types are obtained by
specifying adapted classification criteria, which involve audio features to select the relevant
segments to be included in the video aids. Such systems are now capable of understanding
continuous speech input for vocabularies of several thousand words in operational environments.
Keywords: Introduction, conventional system, audio and video aids, uses of recognition,
application of speech system.
1. INTRODUCTION

We can classify speech recognition tasks and systems along a set of dimensions that produce
various tradeoffs in applicability and robustness. We have made significant progress in automatic
speech recognition (ASR) for well-defined tasks. Some speech systems only need to identify single
words at a time (e.g., speaking a number to route a phone call to the appropriate person in a
company) [1,2,3], while others must recognize sequences of words at a time. Isolated word systems
are, not surprisingly, easier to construct and can be quite robust, as they have a complete set of
patterns for the possible inputs. Continuous word systems cannot have complete representations of
all possible inputs, but must assemble patterns of smaller speech events (e.g., words) into larger
sequences (e.g., sentences).

Speech recognition allows you to provide input to an application with your voice. Just as clicking
with your mouse, typing on your keyboard, or pressing a key on the phone keypad provides input to
an application, speech recognition allows you to provide input by talking. In the desktop world,
you need a microphone to be able to do this. In the Voice world, all you need is a telephone [4,5].

2. Speech recognition engine
The speech recognition process is performed by a software component known as the speech
recognition engine. The primary function of the speech recognition engine is to process spoken
input and translate it into text that an application understands. The application can then do one
of two things, as shown in Figure 0.

[Figure 0 diagram: audio input enters the recognition engine, which draws on grammars and a model
and outputs recognized text.]
Figure 0. Speech recognition engine.

• The application can interpret the result of the recognition as a command. In this case, the
  application is a command and control application. An example of a command and control
  application is one in which the caller says “check balance”, and the application returns the
  current balance of the caller’s account.
• If an application handles the recognized text simply as text, then it is considered a dictation
  application. In a dictation application, if you said “check balance,” the application would not
  interpret the result, but simply return the text “check balance”.

2. Production of speech
While you are producing speech sounds, the air flow from your lungs first passes the glottis and
then your throat and mouth. Depending on which speech sound you articulate, the speech signal can
be excited in three possible ways [6,7]:
• Voiced excitation: the glottis is closed. The air pressure forces the glottis to open and close
  periodically, thus generating a periodic pulse train (triangle-shaped). The frequency of this
  pulse train, the “fundamental frequency”, usually lies in the range from 80 Hz to 350 Hz.
• Unvoiced excitation: the glottis is open and the air passes a narrow passage in the throat or
  mouth. This results in turbulence, which generates a noise signal. The spectral shape of the
  noise is determined by the location of the narrowness [8,9].
• Transient excitation: a closure in the throat or mouth raises the air pressure. Suddenly opening
  the closure makes the air pressure drop immediately (“plosive burst”).
For all three types of excitation given above, the spectral shape of the speech signal is
determined by the shape of the vocal tract (the pipe formed by your throat, tongue, teeth and
lips). By changing the shape of the pipe (and, in addition, opening and closing the air flow
through your nose) you change the spectral shape of the speech signal, thus articulating different
speech sounds, as shown in Figure 1.
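The three excitation types described above can be illustrated with a small synthetic sketch. The Python code below is not taken from the paper: the 16 kHz sample rate, the assumption that the glottis is open for half of each cycle, the uniform white noise and the exponential burst decay are all illustrative choices, and the function names are hypothetical. It generates only the excitation signals and does not model the vocal tract that shapes their spectrum.

```python
import math
import random

SAMPLE_RATE = 16000  # samples per second (assumed for illustration)


def voiced_excitation(f0_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Periodic triangle-shaped pulse train at fundamental frequency f0_hz."""
    if not 80 <= f0_hz <= 350:
        raise ValueError("f0 is outside the 80-350 Hz range quoted in the text")
    period = round(sample_rate / f0_hz)   # samples per glottal cycle
    open_len = max(2, period // 2)        # assumed: glottis open for half a cycle
    half = open_len / 2
    signal = []
    for i in range(round(duration_s * sample_rate)):
        pos = i % period
        if pos < open_len:
            # Triangle: rises from 0 to 1 at mid-opening, then falls again.
            signal.append(1.0 - abs(pos - half) / half)
        else:
            signal.append(0.0)            # glottis closed, no air flow
    return signal


def unvoiced_excitation(duration_s, sample_rate=SAMPLE_RATE, seed=0):
    """Noise signal standing in for turbulence at a narrow passage."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(round(duration_s * sample_rate))]


def transient_excitation(closure_s, burst_s, sample_rate=SAMPLE_RATE, seed=0):
    """Silence while the closure holds, then a short decaying noise burst."""
    rng = random.Random(seed)
    closure_n = round(closure_s * sample_rate)
    burst_n = round(burst_s * sample_rate)
    burst = [rng.uniform(-1.0, 1.0) * math.exp(-5.0 * i / burst_n)
             for i in range(burst_n)]
    return [0.0] * closure_n + burst


def count_pulses(signal):
    """Count pulse onsets (a zero sample followed by a positive sample)."""
    return sum(1 for prev, cur in zip(signal, signal[1:])
               if prev == 0.0 and cur > 0.0)


if __name__ == "__main__":
    duration = 0.1
    voiced = voiced_excitation(100, duration)
    # Pulses per second recovers the fundamental frequency of the train.
    print("estimated f0 (Hz):", count_pulses(voiced) / duration)
```

Counting pulse onsets per second is the simplest way to read the fundamental frequency back from the voiced signal; a real front end would typically use a more robust pitch estimator.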
In this paper, we will explore the core components of modern statistically-based speech
recognition systems. We will view the speech recognition problem in terms of three tasks: signal
modeling, network searching, and language understanding.

[Figure 1 diagram: production of speech, branching into voiced excitation, unvoiced excitation and
transient excitation.]
Figure 1. Phases of speech production.

We will conclude our discussion with an overview of state-of-the-art systems and a review of
available resources to support further research and technology development.

3. Recognition of terms
It is important to have a good understanding of these concepts when developing Voice applications,
as shown in Figure 2.

[Figure 2 diagram: terms of recognition, branching into utterances, pronunciations, grammars, and
speaker dependent & independent systems.]
Figure 2. Basic terms of recognition.

• Utterances: When the user says something, it is known as an utterance. An utterance is any
  stream of speech between two periods of silence [10,11]. Utterances are sent to the speech
  engine to be processed. If the user doesn’t say anything, the engine returns what is known as a
  silence timeout, an indication that there was no speech detected within the expected timeframe,
  and the application takes an appropriate action, such as reprompting the user for input
  (Figure 2).
• Pronunciations: The speech recognition engine uses all sorts of data, statistical models, and
  algorithms to convert spoken input into text [12,13]. One piece of information that the speech
  recognition engine uses to process a word is its pronunciation, which represents what the speech
  engine thinks a word should sound like.
• Recognition grammars: A speech recognition grammar is a set of word patterns that tells a speech
  recognition system what to expect a human to say. For instance, if you call a voice directory
  application, it will prompt you for the name of the person you would like to talk with. It will
  then start up a speech recognizer, giving it a speech recognition grammar. This grammar contains
  the names of the people in the directory and the various sentence patterns callers typically
  respond with (Figure 2).
• A speaker dependent system is developed to operate for a single speaker. These systems are
  usually easier to develop, cheaper to buy and more accurate than speaker adaptive or speaker
  independent systems, but not as flexible (Figure 2).
• A speaker independent system is developed to operate for any speaker of a particular type (e.g.
  American English). These systems are the most difficult to develop and the most expensive, and
  their accuracy is lower than that of speaker dependent systems. However, they are more flexible
  (Figure 2).

4. Conventional speech recognition systems
Speech recognition is an alternative to traditional methods of interacting with a computer, such
as textual input through a keyboard. An effective system can replace, or reduce the reliance on,
standard keyboard and mouse input. The main points are given below:
• People who have little keyboard skills or experience, who are slow typists, or who do not have
  the time or resources to develop keyboard skills.
• Dyslexic people or others who have problems with character or word use and manipulation in a
  textual form.
• People with physical disabilities that affect either their data entry, or their ability to read
  (and therefore check) what they have entered.
• Speech recognition systems used by the general public, e.g. phone-based automated timetable
  information or ticket purchasing, can be used immediately: the user makes contact with the
  system and speaks in response to commands and questions.
• Speech recognition software is configured or designed to be used on a standalone computer.
  However, it is possible to configure some software to be used over a network.
Speech recognition can also be included in a web-based system. Such a system follows the
client/server model, and its interaction with computers depends on factors such as reliability and
the input and output devices.

5. Uses of recognition
Speech recognition can be used in the education system, the business system and industry, as shown
in Figure 3.

[Figure 3 diagram: uses of recognition, branching into the education system and the business
system.]
Figure 3. Uses of recognition.
• Education system: Speech recognition is a computer application that lets people control a
  computer by speaking to it. In other words, rather than using a keyboard to communicate with the
  computer, the user speaks commands into a microphone (usually on a headset) that is connected to
  a computer.
• Business system: In business, speech recognition technology can help automate tasks, increase
  worker productivity, and increase a company’s ability to better serve its customers. For the
  mobile workforce, speech recognition technology can keep workers connected to the office while
  on the road in a manner that is both legal and appropriate, a growing consideration as an
  increasing number of states pass legislation restricting the use of hand-held cell phones and
  text messaging.

6. Application of recognition
Recognition can be applied to audio and visual tasks, as shown in Figure 4. The main points are
given below:
• Audio-visual aids are the first concept of communicating with people. There are a variety of
  audio-visual aids which can be used; it is important to select aids which are appropriate to the
  method. The word "aids" is vital to a correct understanding of their use.
• Music used on its own can be very effective as a scene setter and can help create atmosphere.
• Audio aids communicate ideas through the ears to the mind. They may take the form of music or
  tape recordings, television, records, sound films, etc.
• Visual aids communicate facts and ideas through the eyes to the mind and emotions. Visual aids
  include films, slides, videos, overhead projection, books, photographs, models and charts.

[Figure 4 diagram: application of recognition, branching into audio tasks and video tasks.]
Figure 4. Application of the speech system.

• Video tasks are the second concept of the speech recognition system. A video can be shown first
  on a television set and second on a projector system.
• When showing a video, make sure the picture will be big enough for the whole audience to see.
  The maximum screen size that you are likely to get with a television is about 75 cm (30 inches),
  while a video projector gives a picture size of 2½ meters (8 feet) square or over.
In this way, audio and video systems depend upon the speech recognition system.

7. Conclusion
As outlined in this paper, speech recognition will revolutionize the way people conduct business
over the Web and will, ultimately, differentiate world-class e-businesses. Voice ties speech
recognition and telephony together and provides the technology with which businesses can develop
and deploy voice-enabled Web solutions today. Speech recognition refers to the ability to listen
to spoken words (input in audio format), identify the various sounds present in them, and
recognize them as words of some known language. Speech recognition in the computer system domain
may then be defined as the ability of computer systems to accept spoken words in audio format,
such as wav or raw, and then generate their content in text format. Visual speech in itself does
not contain sufficient information for speech recognition.

8. References:
[1] K. K. Paliwal and L. Alsteris, Usefulness of Phase Spectrum in Human Speech Perception, Proc.
Eurospeech, pp. 2117-2120, 2003.
[2] D. Zhu and K. Paliwal, Product of power spectrum and group delay function for speech
recognition, Proc. ICASSP 2004, pp. I-125 to I-128, 2004.
[3] Y. Gong, Speech recognition in noisy environments: A survey, Speech Communication, vol. 16,
no. 3, pp. 261-291, 1995.
[4] B. Kotnik, Z. Kacic and B. Horvat, The usage of wavelet packet transformation in automatic
noisy speech recognition systems, IEEE Eurocon 2003, Slovenia, vol. 2, no. 2, pp. 131-134, 2003.
[5] L. Birgé, P. Massart, From model selection to adaptive estimation, in D. Pollard (ed),
Festschrift for L. Le Cam, Springer, vol. 7, no. 2, pp. 55-88, 1997.
[6] D. L. Donoho, De-noising by Soft-thresholding, IEEE Trans. Inform. Theory, vol. 41, no. 3,
pp. 613-627, May 1995.
[7] D. L. Donoho, Nonlinear Wavelet Methods for Recovering Signals, Images, and Densities from
Indirect and Noisy Data, Proceedings of Symposia in Applied Mathematics, vol. 47, pp. 173-205,
1993.
[8] M. N. Stuttle, M. J. F. Gales, A Mixture of Gaussians Front End for Speech Recognition,
Eurospeech 2001, pp. 675-678, Scandinavia, 2001.
[9] I. Potamitis, N. Fakotakis, G. Kokkinakis, Improving the robustness of noisy MFCC features
using minimal recurrent neural networks, Neural Networks, IJCNN 2000, Proceedings of the
IEEE-INNS-ENNS International Joint Conference on, vol. 5, pp. 271-276, 2000.
[10] S. Young, The HTK Book, Cambridge University Engineering Department, Cambridge, UK, 2001.
[11] B. Carnero and A. Drygajlo, Perceptual speech coding and enhancement using
frame-synchronized fast wavelet-packet transform algorithms, IEEE Trans. Signal Process. 47 (6),
pp. 1622-1635, 1999.
[12] I. Pinter, Perceptual wavelet-representation of speech signals and its application to speech
enhancement, Comput. Speech Lang., vol. 10, pp. 1-22, 1996.
[13] E. Zwicker, E. Terhardt, Analytical expressions for critical-band rate and critical bandwidth
as a function of frequency, JASA 68, pp. 1523-1525, 1980.
[14] M. Jansen, Noise reduction by wavelet thresholding, Springer-Verlag, New York, 2001.
[15] A. Varga, H. Steeneken, M. Tomlinson, D. Jones, The NOISEX-92 study on the effect of additive
noise on automatic speech recognition, Technical report, DRA Speech Research Unit, Malvern,
England, 1992. Available from:
http://spib.rice.edu/spib/select_noise.
