
SPEECH RECOGNITION – SPECIFIC TASK OF SPEECH RECOGNITION
Er. Sarbjeet Singh, Er. Manjit Thapa, Er. Gurpreet Singh, Er. Sukhvinder Singh

Department of Computer Science, Sri Sai College of Engg. & Tech., Badhani (Pathankot).

Email id – sarbaish@gmail.com, manjit.thapa@yahoo.co.in, chohan87@gmail.com, sukhaish@gmail.com

Abstract: This paper presents an overview of speech recognition technology, software, development
and applications. It begins with a description of how such systems work, and the level of accuracy that can
be expected. Applications of speech recognition technology in education and beyond are then explored. A
brief comparison of the most common systems is presented, as well as notes on the main centers of speech
recognition research in the educational sector. The paper concludes with potential uses of speech
recognition in education, probable main uses of the technology in the future, and a selection of key
web-based resources. We introduce original visual descriptors related to the dominant and residual image
motions. The different summary types are obtained by specifying adapted classification criteria which
involve audio features to select the relevant segments to be included in the video aids. Such systems are
now capable of understanding continuous speech input for vocabularies of several thousand words in
operational environments.

Keywords: Introduction, Conventional system, Audio and video aids, Uses of recognition, Application of speech system.

1. INTRODUCTION

We can classify speech recognition tasks and systems along a set of dimensions that produce various tradeoffs in applicability and robustness. Significant progress has been made in automatic speech recognition (ASR) for well-defined tasks. Some speech systems need only identify single words at a time (e.g., speaking a number to route a phone call to the appropriate person in a company), while others must recognize sequences of words at a time [1,2,3]. Isolated-word systems are, not surprisingly, easier to construct and can be quite robust, as they have a complete set of patterns for the possible inputs. Continuous-word systems cannot have complete representations of all possible inputs, but must assemble patterns of smaller speech events (e.g., words) into larger sequences (e.g., sentences).

Speech recognition allows you to provide input to an application with your voice. Just as clicking with your mouse, typing on your keyboard, or pressing a key on the phone keypad provides input to an application, speech recognition allows you to provide input by talking. In the desktop world, you need a microphone to be able to do this. In the voice world, all you need is a telephone [4,5].

2. Speech recognition engine

The speech recognition process is performed by a software component known as the speech recognition engine. The primary function of the speech recognition engine is to process spoken input and translate it into text that an application understands, as shown in Figure 0. The application can then do one of two things:

• The application can interpret the result of the recognition as a command. In this case, the application is a command and control application. An example of a command and control application is one in which the caller says “check balance”, and the application returns the current balance of the caller’s account.
• If an application handles the recognized text simply as text, then it is considered a dictation application. In a dictation application, if you said “check balance,” the application would not interpret the result, but simply return the text “check balance”.

[Figure 0: diagram with the labels Audio Input, Grammars, Model, Recognition engine and Recognized Text]
Figure 0. Speech recognition engine.

2. Production of speech

While you are producing speech sounds, the air flow from your lungs first passes the glottis and then your throat and mouth. Depending on which speech sound you articulate, the speech signal can be excited in three possible ways [6,7]:

• Voiced excitation: The glottis is closed. The air pressure forces the glottis to open and close periodically, thus generating a periodic pulse train (triangle-shaped). The frequency of this pulse train, the “fundamental frequency”, usually lies in the range from 80 Hz to 350 Hz.
• Unvoiced excitation: The glottis is open and the air passes a narrow passage in the throat or mouth. This results in turbulence, which generates a noise signal. The spectral shape of the noise is determined by the location of the constriction [8,9].
• Transient excitation: A closure in the throat or mouth raises the air pressure. When the closure is suddenly opened, the air pressure drops immediately (a “plosive burst”).

For these three types of excitation, the spectral shape of the speech signal is determined by the shape of the vocal tract (the pipe formed by your throat, tongue, teeth and lips). By changing the shape of the pipe (and in addition opening and closing the air flow through your nose) you change the spectral shape of the speech signal, thus articulating different speech sounds, as shown in Figure 1.
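The voiced and unvoiced excitation types described above can be sketched in code. The following is only a minimal illustration and not a method taken from this paper: the 8 kHz sample rate, the 8-sample triangular pulse and the uniform white noise are assumptions, and the function names (voiced_excitation, unvoiced_excitation, estimate_period) are hypothetical. Transient excitation and the filtering effect of the vocal tract are not modeled.

```python
import random


def voiced_excitation(f0_hz, sample_rate, n_samples, pulse_width=8):
    """Periodic train of triangle-shaped pulses, one pulse per pitch period.

    pulse_width (in samples) is an illustrative assumption, not a measured value.
    """
    period = round(sample_rate / f0_hz)  # samples per glottal open/close cycle
    half = pulse_width / 2
    signal = []
    for n in range(n_samples):
        pos = n % period  # position inside the current period
        # Triangle: 0.0 at pos == 0, peak 1.0 at pos == half, then falling.
        value = 1.0 - abs(pos - half) / half if pos < pulse_width else 0.0
        signal.append(value)
    return signal


def unvoiced_excitation(n_samples, seed=0):
    """Uniform white noise standing in for turbulence at a narrow passage."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def estimate_period(signal, min_lag, max_lag):
    """Return the lag (in samples) with the highest autocorrelation."""
    def autocorr(lag):
        return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
    return max(range(min_lag, max_lag + 1), key=autocorr)


if __name__ == "__main__":
    sample_rate = 8000  # assumed telephone-style sample rate
    voiced = voiced_excitation(100, sample_rate, 800)
    # Search lags 23..100, i.e. roughly 350 Hz down to 80 Hz at 8 kHz.
    lag = estimate_period(voiced, 23, 100)
    print("estimated fundamental frequency:", sample_rate / lag, "Hz")
```

With these assumed values, voiced_excitation(100, 8000, 800) repeats every 80 samples, and the autocorrelation search over lags 23 to 100 covers the 80 Hz to 350 Hz range quoted for the fundamental frequency. The noise signal has no such repeating period, which is the property that separates unvoiced from voiced excitation in this sketch.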
[Figure 1: diagram in which Production of speech branches into Voiced excitation, Unvoiced excitation and Transient excitation]
Figure 1. Phases of speech production.

In this paper, we will explore the core components of modern statistically-based speech recognition systems. We will view the speech recognition problem in terms of three tasks: signal modeling, network searching, and language understanding.

We will conclude our discussion with an overview of state-of-the-art systems and a review of available resources to support further research and technology development.

3. Recognition of terms

It is important to have a good understanding of these concepts when developing voice applications, as shown in Figure 2.

[Figure 2: diagram of the terms of recognition: Utterances, Pronunciations, Grammars, Speaker dependent & independent]
Figure 2. Basic terms of recognition.

• Utterances - When the user says something, it is known as an utterance. An utterance is any stream of speech between two periods of silence [10,11]. Utterances are sent to the speech engine to be processed. If the user doesn’t say anything, the engine returns what is known as a silence timeout, an indication that there was no speech detected within the expected timeframe, and the application takes an appropriate action, such as reprompting the user for input.
• Pronunciations - A speech recognition engine uses all sorts of data, statistical models, and algorithms to convert spoken input into text [12,13]. One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like.

• Recognition grammars - A speech recognition grammar is a set of word patterns that tells a speech recognition system what to expect a human to say. For instance, if you call a voice directory application, it will prompt you for the name of the person you would like to talk with. It will then start up a speech recognizer, giving it a speech recognition grammar. This grammar contains the names of the people in the directory, and the various sentence patterns callers typically respond with.
• A speaker dependent system is developed to operate for a single speaker. These systems are usually easier to develop, cheaper to buy and more accurate than speaker adaptive or speaker independent systems, but not as flexible.
• A speaker independent system is developed to operate for any speaker of a particular type (e.g. American English). These systems are the most difficult to develop and the most expensive, and their accuracy is lower than that of speaker dependent systems. However, they are more flexible (see Figure 2).

4. Conventional speech recognition systems

Speech recognition is an alternative to traditional methods of interacting with a computer, such as textual input through a keyboard. An effective system can replace, or reduce the reliance on, standard keyboard and mouse input. The main cases are given below:

• People who have little keyboard skills or experience, who are slow typists, or who do not have the time or resources to develop keyboard skills.
• Dyslexic people or others who have problems with character or word use and manipulation in a textual form.
• People with physical disabilities that affect either their data entry, or their ability to read (and therefore check) what they have entered.
• Speech recognition systems used by the general public, e.g. phone-based automated timetable information or ticket purchasing, can be used immediately: the user makes contact with the system and speaks in response to commands and questions.
• Speech recognition software is usually configured or designed to be used on a standalone computer. However, it is possible to configure some software to be used over a network.

Speech recognition can also be included in a web design system. Web design can follow the client/server model, with attention to aspects of interacting with computers such as reliability and input and output devices.

5. Uses of recognition

Speech recognition can be used in education systems, business systems and industry, as shown in Figure 3.

[Figure 3 diagram: Uses of recognition, branching into Education system and Business system]
Figure 3. Uses of recognition.

• Education system - Speech recognition is a computer application that lets people control a computer by speaking to it. In other words, rather than using a keyboard to communicate with the computer, the user speaks commands into a microphone (usually on a headset) that is connected to a computer.
• Business system - In business, speech recognition technology can help automate tasks, increase worker productivity, and increase companies’ ability to better serve their customers. For the mobile workforce, speech recognition technology can keep workers connected to the office while on the road in a manner that is both legal and appropriate, a growing consideration as an increasing number of states pass legislation restricting the use of hand-held cell phones and text messaging.

6. Application of recognition

Speech recognition can be applied to audio and visual tasks, as shown in Figure 4. The main points are given below:

[Figure 4 diagram: Application of speech recognition, branching into Audio tasks and Video tasks]
Figure 4. Application of the speech recognition system.

• Audio-visual aids are the first concept of communicating with people. There are a variety of audio-visual aids which can be used; it is important to select aids which are appropriate to the method. The word "aids" is vital to a correct understanding of their use.
• Music used on its own can be very effective as a scene setter and can help create atmosphere.
• Audio aids communicate ideas through the ears to the mind. They may take the form of music or tape recordings, television, records, sound films, etc.
• Visual aids communicate facts and ideas through the eyes to the mind and emotions. Visual aids include films, slides, videos, overhead projection, books, photographs, models and charts.
• Video tasks are the second concept of the speech recognition system. Video can be shown first on a television set and second on a projector system.
• When showing a video, make sure the picture will be big enough for the whole audience to see. The maximum screen that you are likely to get with a television is about 75 cm (30 inches), while a video projector gives a picture size of 2½ meters (8 feet) square or over.

In this way, audio and video systems depend upon the speech recognition system.

7. Conclusion

Speech recognition will revolutionize the way people conduct business over the Web and will, ultimately, differentiate world-class e-businesses. Voice ties speech recognition and telephony together and provides the technology with which businesses can develop and deploy voice-enabled Web solutions today. Speech recognition refers to the ability to listen to spoken words (input in audio format), identify the various sounds present in them, and recognize them as words of some known language. In the computer system domain, speech recognition may then be defined as the ability of computer systems to accept spoken words in audio format, such as WAV or raw, and then generate their content in text format. Visual speech in itself does not contain sufficient information for speech recognition.

8. References:

[1] K.K. Paliwal and L. Alsteris, Usefulness of Phase Spectrum in Human Speech Perception, Proc. Eurospeech, pp. 2117-2120, 2003.
[2] D. Zhu and K. Paliwal, Product of power spectrum and group delay function for speech recognition, Proc. ICASSP 2004, pp. I-125 - I-128, 2004.
[3] Y. Gong, Speech recognition in noisy environments: A survey, Speech Communication, vol. 16, No. 3, pp. 261-291, 1995.
[4] B. Kotnik, Z. Kacic and B. Horvat, The usage of wavelet packet transformation in automatic noisy speech recognition systems, IEEE Eurocon 2003, Slovenia, vol. 2, No. 2, pp. 131-134, 2003.
[5] L. Birgé, P. Massart, From model selection to adaptive estimation, in D. Pollard (ed), Festschrift for L. Le Cam, Springer, vol. 7, No. 2, pp. 55-88, 1997.
[6] D. L. Donoho, De-noising by Soft-thresholding, IEEE Trans. Inform. Theory, Vol. 41, No. 3, pp. 613-627, May 1995.
[7] D. L. Donoho, Nonlinear Wavelet Methods for Recovering Signals, Images, and Densities from Indirect and Noisy Data, Proceedings of Symposia in Applied Mathematics, Vol. 47, pp. 173-205, 1993.
[8] M. N. Stuttle, M.J.F. Gales, A Mixture of Gaussians Front End for Speech Recognition, Eurospeech 2001, pp. 675-678, Scandinavia, 2001.
[9] J. Potamitis, N. Fakotakis, G. Kokkinakis, Improving the robustness of noisy MFCC features using minimal recurrent neural networks, Neural Networks, IJCNN 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on, vol. 5, pp. 271-276, 2000.
[10] S. Young, The HTK Book, Cambridge University Engineering Department, Cambridge, UK, 2001.
[11] B. Carnero and A. Drygajlo, Perceptual speech coding and enhancement using frame-synchronized fast wavelet-packet transform algorithms, IEEE Trans. Signal Process. 47 (6), pp. 1622-1635, 1999.
[12] I. Pinter, Perceptual wavelet-representation of speech signals and its application to speech enhancement, Comput. Speech Lang., vol. 10, pp. 1-22, 1996.
[13] E. Zwicker, E. Terhardt, Analytical expressions for critical-band rate and critical bandwidth as a function of frequency, JASA 68, pp. 1523-1525, 1980.
[14] M. Jansen, Noise reduction by wavelet thresholding, Springer-Verlag, New York, 2001.
[15] A. Varga, H. Steeneken, M. Tomlinson, D. Jones, The NOISEX-92 study on the effect of additive noise on automatic speech recognition, Technical report, DRA Speech Research Unit, Malvern, England, 1992. Available from: http://spib.rice.edu/spib/select_noise.
