ON
SPEECH RECOGNITION SYSTEM
Submitted by,
Satyanarayana Dash
Roll.No : 3 MCA 067
Regd.No : 0305107085
12/08/21 1
CONTENTS
INTRODUCTION
OVERVIEW
BASIC DIAGRAM
SPEECH RECOGNITION MAIN PROBLEMS
HOW SPEECH RECOGNITION WORKS
APPLICATION FIELDS
CONCLUSION
INTRODUCTION
Speech recognition is an active topic in research today.
Speech recognition is the process of converting an audio signal,
captured by a microphone or a telephone, to a set of words.
In its simplest form, a speech recognition machine consists of two
subsystems, namely automatic speech recognition (ASR) and
speech understanding (SU). The goal of ASR is to transcribe
natural speech, while the goal of SU is to understand the meaning
of the transcription.
OVERVIEW
Speech recognition uses several techniques to "recognize"
the human voice. It functions as a pipeline that converts digital
audio signals coming from the sound card into recognized
speech.
The voice input to the microphone goes to the sound card.
The output from the sound card, digital audio, is processed
using the FFT (Fast Fourier Transform) and then refined
using HMMs (Hidden Markov Models) and other
techniques. The built-in database is used to analyze what has
been spoken. At the final stage, feedback goes to the database
for the purpose of adaptation. The final recognized
output then goes back to the CPU.
BASIC DIAGRAM OF THE PROCESS
SPEECH RECOGNITION MAIN PROBLEMS
HOW SPEECH RECOGNITION WORKS
Speech recognition fundamentally functions as a pipeline that
converts PCM (Pulse Code Modulation) digital audio from a
sound card into recognized speech. The elements of the
pipeline are:
1. Transform the PCM digital audio into a better acoustic
representation.
2. Apply a "grammar" so the speech recognizer knows what
phonemes to expect. A grammar could be anything from a
context-free grammar to full-blown English.
3. Figure out which phonemes are spoken.
4. Convert the phonemes into words.
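Step 1 of the pipeline above can be sketched in code. The example below is only an illustration, not the method of any particular recognizer: it assumes 16-bit PCM samples held in a Python list, non-overlapping frames of 64 samples, and a hypothetical 8000 Hz sample rate. The names fft, frame_spectra and SAMPLE_RATE are made up for this sketch. Each frame is passed through a radix-2 FFT, and the magnitude spectrum serves as the "better acoustic representation".

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])  # FFT of even-indexed samples
    odd = fft(samples[1::2])   # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def frame_spectra(pcm, frame_size=64):
    """Split PCM samples into non-overlapping frames; return one magnitude spectrum per frame."""
    spectra = []
    for start in range(0, len(pcm) - frame_size + 1, frame_size):
        bins = fft(pcm[start:start + frame_size])
        # Keep only the first half: for real input the second half mirrors it.
        spectra.append([abs(b) for b in bins[:frame_size // 2]])
    return spectra


# Assumed test signal: a 1000 Hz tone sampled at 8000 Hz (hypothetical values).
SAMPLE_RATE = 8000
pcm = [int(10000 * math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)) for t in range(128)]

spectra = frame_spectra(pcm)
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64  # each bin is SAMPLE_RATE / frame_size wide
print(len(spectra), peak_bin, peak_hz)
```

A practical recognizer would normally use overlapping, windowed frames and reduce each spectrum to a smaller set of features; those steps are left out here to keep the sketch short.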
PRELIMINARY THINGS
SOUND CARD
FAST FOURIER TRANSFORM (FFT)
HIDDEN MARKOV MODEL (HMM)
SOUND CARD
A sound card is a peripheral device that attaches to the PCI
slot on a motherboard to enable the computer to input, process,
and deliver sound.
FAST FOURIER TRANSFORM
HIDDEN MARKOV MODELS
The software has to be able to judge when one phoneme
ends and the next one begins. For this, it uses a technique called
Hidden Markov Models (HMMs), a mathematical model based on
statistics. To figure out when speech starts and stops, a speech
recognizer has silence phonemes, which are also assigned
feature numbers.
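As a rough illustration of how an HMM can place phoneme boundaries, the sketch below runs the Viterbi algorithm over a toy model. Everything in it is assumed for the example: the three states ("sil" for silence, plus "s" and "iy"), the probabilities, and the feature numbers 0 to 2. Real recognizers use far larger models trained from data.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations (log domain)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Scores for the first observation.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for reaching s at this frame.
            prev = max(states, key=lambda p: best[-1][p] + log(trans_p[p][s]))
            scores[s] = best[-1][prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy model (all values are assumptions made for this example).
states = ["sil", "s", "iy"]
start_p = {"sil": 0.8, "s": 0.1, "iy": 0.1}
trans_p = {a: {b: (0.6 if a == b else 0.2) for b in states} for a in states}
emit_p = {
    "sil": {0: 0.8, 1: 0.1, 2: 0.1},
    "s":   {0: 0.1, 1: 0.8, 2: 0.1},
    "iy":  {0: 0.1, 1: 0.1, 2: 0.8},
}

features = [0, 0, 1, 1, 1, 2, 2, 0]  # one quantized feature number per frame
path = viterbi(features, states, start_p, trans_p, emit_p)
# A boundary is any frame where the decoded state changes.
boundaries = [i for i in range(1, len(path)) if path[i] != path[i - 1]]
print(path, boundaries)
```

The frames decoded as "sil" mark where speech starts and stops, and the other state changes mark where one phoneme ends and the next begins.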
STEP BY STEP PROCESS OF SPEECH RECOGNITION SYSTEM
APPLICATION FIELDS
Speech synthesis (voice response systems)
Digital transmission and storage (optimized encryption
of signals)
Speaker verification and identification (control of access,
legal applications)
Aids for people with disabilities
Speech recognition (automatic dictation, command and
control)
Enhancement of speech signal quality (noise removal)
CONCLUSION
Speech technologies are advancing quickly, and as new
technologies arrive, the algorithms and methods are optimized
and improved.
Many software products have been developed and used for
speech recognition, for example Voice pad generator,
Dragon NaturallySpeaking, and ScanSoft OpenSpeech Recognizer.