
SPEECH RECOGNITION TECHNOLOGY
MADE BY - ISHI THAKUR
CONTENTS
• INTRODUCTION
• HOW IT WORKS
• PROCESS DIAGRAM
• MODELLING TECHNIQUES
• TYPES
• FLAWS
• ADVANTAGES
• APPLICATIONS
• SPEECH RECOGNITION VS VOICE RECOGNITION
• PERFORMANCE
• FUTURE
INTRODUCTION
• Speech recognition is the process of converting an acoustic signal,
captured by a microphone or a telephone, to a set of words.
• The recognized words can be an end in themselves, as in
applications such as command and control, data entry, and document
preparation.
• It is also known as Automatic Speech Recognition (ASR), Computer
Speech Recognition, or Speech To Text (STT).
HOW IT WORKS
• The analog-to-digital converter (ADC) translates
the analog wave into digital data that the
computer can understand.
• Speech recognition algorithms then analyze the
digital data using acoustic and language modeling.
• The speech engine then produces feedback
based on what was spoken.
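The analog-to-digital step can be sketched in a few lines. This is only an illustration, assuming an 8 kHz sample rate, 8-bit signed samples, and a 440 Hz sine wave standing in for speech; the function names `sample_and_quantize` and `tone` are made up for this example and do not come from any particular speech engine.

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz, bit_depth):
    """Sample a continuous signal at fixed intervals and round each
    sample to a signed integer code, as an ADC does."""
    max_code = 2 ** (bit_depth - 1) - 1             # 127 for 8 bits
    n_samples = round(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                      # time of the n-th sample
        amplitude = max(-1.0, min(1.0, signal(t)))  # clip to [-1, 1]
        codes.append(round(amplitude * max_code))   # quantize to an integer
    return codes

def tone(t):
    """Assumed stand-in for the analog wave: a 440 Hz sine."""
    return math.sin(2 * math.pi * 440 * t)

samples = sample_and_quantize(tone, duration_s=0.001,
                              sample_rate_hz=8000, bit_depth=8)
print(samples)
```

A higher sample rate or bit depth gives the computer a closer copy of the original wave, at the cost of more data to process.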
PROCESS DIAGRAM
MODELLING

Acoustic model
An acoustic model is created by taking audio recordings of
speech and using software to create statistical
representations of the sounds that make up each word. It is
used by a speech recognition engine to recognize speech.

Language Model
Language modeling is used in many natural language
processing applications, such as speech recognition. It tries
to capture the properties of a language and to predict the
next word in a spoken sentence.
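Next-word prediction can be illustrated with a bigram model, one of the simplest kinds of language model. The sketch below is a toy under stated assumptions: the four-sentence corpus is invented, and the helper names `train_bigram_model` and `predict_next` are hypothetical. Real engines train far larger models on far more text.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the word seen most often after `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Invented training sentences, for illustration only.
corpus = [
    "turn on the light",
    "turn off the light",
    "turn on the radio",
    "switch on the light",
]
model = train_bigram_model(corpus)
print(predict_next(model, "the"))   # light
print(predict_next(model, "turn"))  # on
```

In this corpus "light" follows "the" three times and "radio" only once, so the model prefers "light" when the audio after "the" is ambiguous.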
TYPES
Speaker Dependent
• Speaker-dependent software is found in specialised use cases where there is a
limited number of words that need to be recognized with high accuracy.
• Speaker-dependent software operates by learning the unique, individual
characteristics of a single person’s voice, in a way similar to voice
recognition.

Speaker Independent
• Speaker-independent software is designed to recognise anyone’s voice, so it
requires no training. The downside is that speaker-independent software is
generally less accurate than speaker-dependent software.
FLAWS
• Low signal-to-noise ratio – The program
needs to “hear” the words spoken
distinctly, and any extra noise introduced
into the sound will interfere with this.
• Overlapping speech – Current systems
have difficulty separating simultaneous
speech from users.
• Homonyms – Words that sound alike
when spoken, e.g. “there” and “their”,
are hard to tell apart.
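The first flaw can be made concrete with the standard signal-to-noise ratio formula, SNR = 10 · log10(signal power / noise power), expressed in decibels. The sketch below uses short made-up sample lists rather than real recordings, and the function names `power` and `snr_db` are chosen for this example.

```python
import math

def power(samples):
    """Average power of a sampled signal (mean of squared amplitudes)."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(power(signal) / power(noise))

# Made-up sample values, for illustration only.
speech = [0.5, -0.5, 0.5, -0.5]
quiet_noise = [0.05, -0.05, 0.05, -0.05]   # one tenth of the speech amplitude
loud_noise = [0.5, -0.5, -0.5, 0.5]        # as strong as the speech itself

print(snr_db(speech, quiet_noise))  # about 20 dB
print(snr_db(speech, loud_noise))   # 0 dB
```

The lower the figure, the harder it is for the program to "hear" the words: at 0 dB the noise carries as much power as the speech.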
ADVANTAGES
• One of the most notable advantages of speech recognition
technology is the dictation ability it provides. With the help
of the technology, users can easily control devices and create
documents by speaking.

• Instead of having callers remain idly on hold while agents
are busy, organizations can engage their callers without
live customer representatives.

• This type of technology can help those with dyslexia or
physical disabilities.

• Speech recognition software is now frequently installed in
computers and mobile devices, allowing for easy access.
APPLICATIONS
• Device control
• Car Bluetooth systems
• Voice transcriptions
• Virtual assistant
• Robotics
• Military
• Health care
DIFFERENCE

Speech Recognition
• It is the process of capturing spoken words using a microphone and
converting them into a digitally stored set of words.
• It works by converting the vibrations of words (the analog signal) to a
digital signal and then processing it.
• Such software is used for automatic translation, dictation, hands-free
computing, medical transcription, robotics, automated customer service,
and much more.

Voice Recognition
• It is the process of identifying the person who is speaking.
• It works by analyzing the features of speech that differ between
individuals. Everyone has a unique pattern of speech stemming from their
anatomy.
• Voice recognition technology is used to verify a speaker’s identity or
determine an unknown speaker’s identity.
PERFORMANCE
• Speech recognition performance is measured by accuracy and
speed. Accuracy is measured with word error rate (WER). WER works
at the word level and identifies inaccuracies in transcription,
although it cannot identify how the error occurred.

• A variety of factors can affect computer speech recognition
performance, including pronunciation, accent, pitch, volume and
background noise.
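Word error rate is commonly computed as the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript. The sketch below finds that count with a word-level edit distance; the function name `word_error_rate` and the example sentences are chosen for illustration, and production toolkits may normalize text differently before scoring.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    found with a word-level edit distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

spoken = "put their books over there"
recognized = "put there books over there"
print(word_error_rate(spoken, recognized))  # 0.2
```

One wrong word out of five gives a WER of 0.2, but the score alone does not say that the cause was a homonym, which is the limitation noted above.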
FUTURE OF SPEECH RECOGNITION
• Accuracy will continue to improve.

• Greater use will be made of “intelligent systems” which will
attempt to guess what the speaker intended to say, rather than
what was actually said, as people often misspeak and make
unintentional mistakes.

• Microphones and sound systems will be designed to adapt more
quickly to changing background noise levels and different
environments, with better recognition of extraneous material
to be discarded.
ANY QUESTIONS?
