On
Speech recognition
Submitted By
Seventh Semester
B.E.(Computer Science & Engineering)
Guided by
Prof.A.A. Chaudhari
PRMIT&R
CERTIFICATE
Speech Recognition
is a bonafide work and it is submitted to the Sant Gadge Baba
Amravati University, Amravati
By
Prof. Ram Meghe Institute of Technology & Research,
Badnera- Amravati
2021-2022
ACKNOWLEDGEMENT
I would also like to thank all my professors for their constant guidance and
support throughout the course of the seminar.
Komal Gulhane
ABSTRACT
While there is still much room for improvement, current speech recognition
systems have remarkable performance. We are only human, but as we develop
this technology and bring about remarkable changes, we attain certain achievements.
Rather than asking what is still deficient, we ask instead what should be done to
make it efficient.
TABLE OF CONTENTS
1 Introduction
1.3 Working
4 Applications
References
1. INTRODUCTION
They can also serve as the input to further linguistic processing in order to achieve
speech understanding. It is also known as Automatic Speech Recognition (ASR),
computer speech recognition, or speech to text (STT). Speech recognition allows you
to provide input to a system with your voice. Just like clicking with your mouse,
typing on your keyboard, or pressing a key on the phone keypad provides input to
an application, speech recognition allows you to provide input by talking. In the
desktop world, you need a microphone to be able to do this.
The days when you had to keep staring at the computer screen and
frantically hit a key or click the mouse for the computer to respond to your
commands may soon be a thing of the past. Today you can stretch out, relax, and
tell your computer to do your bidding. This has been made possible by ASR
(Automatic Speech Recognition) technology.
People who have little keyboard skills or experience, who are slow typists,
or do not have the time or resources to develop keyboard skills.
Dyslexic people or others who have problems with character or word use
and manipulation in a textual form.
People with physical disabilities that affect either their data entry, or
ability to read (and therefore check) what they have entered.
In 2011 Apple launched Siri, which was similar to Google’s Voice Search. The
early 2010s saw an explosion of other voice recognition apps. And
with Amazon’s Alexa and Google Home, we’ve seen consumers becoming more and
more comfortable talking to machines.
• Today, some of the largest tech companies are competing to claim the speech
accuracy title. In 2016, IBM achieved a word error rate of 6.9 percent. In 2017
Microsoft usurped IBM with a 5.9 percent claim. Shortly after that IBM
improved their rate to 5.5 percent. However, it is Google that is claiming the
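The word error rate figures quoted above are conventionally computed as the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of words in the reference. The following is a minimal sketch of that calculation; the function name and the sample sentences are illustrative assumptions and do not come from any of the vendors mentioned.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substituted word and one dropped word.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

A rate of 5.5 percent therefore means roughly one word in eighteen is substituted, dropped, or wrongly inserted relative to the reference transcript.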
1.3 Working
The first component of speech recognition is, of course,
speech. Speech must be converted from physical sound to an electrical signal
with a microphone, and then to digital data with an analog-to-digital converter.
Once digitized, several models can be used to transcribe the audio to text.
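The analog-to-digital step can be illustrated in software. The sketch below samples a continuous signal at a fixed rate and quantizes each sample to a signed integer, which is in essence what the converter does. The 440 Hz tone, the 16 kHz sample rate, and the 16-bit depth are assumed values chosen only for illustration.

```python
import math


def sample_and_quantize(signal, duration_s, sample_rate_hz, bit_depth):
    """Sample a continuous signal and quantize it to signed integers.

    `signal` is a function mapping time in seconds to an amplitude
    that is expected to lie in the range [-1.0, 1.0].
    """
    max_level = 2 ** (bit_depth - 1) - 1          # 32767 for 16 bits
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                    # time of the n-th sample
        amplitude = max(-1.0, min(1.0, signal(t)))  # clip out-of-range input
        samples.append(round(amplitude * max_level))
    return samples


def tone(t):
    """Stand-in for the microphone's electrical signal: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440.0 * t)


pcm = sample_and_quantize(tone, duration_s=0.01, sample_rate_hz=16000, bit_depth=16)
```

The resulting list of integers is the digital data that the transcription models then operate on.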
2. TYPES OF SPEECH RECOGNITION SYSTEMS
Isolated Word
Isolated word recognizers usually require each utterance to have quiet (lack
of an audio signal) on both sides of the sample window. This does not
mean that the system accepts only single words, but it does require a single utterance at a
time. Often, these systems have "Listen/Not−Listen" states, where they
require the speaker to wait between utterances (usually doing processing
during the pauses).
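One simple way to find utterances bounded by quiet is to compare the short-term energy of the signal against a threshold. The sketch below is an assumption about how such a "Listen/Not-Listen" detector might be written, not a description of any particular product; the frame size, threshold, and silence length are hypothetical parameters.

```python
def find_utterances(samples, frame_size, energy_threshold, min_silence_frames):
    """Return (start, end) sample indices of utterances bounded by silence."""
    utterances = []
    start = None       # None means the detector is in the "not listening" state
    silent_run = 0     # consecutive quiet frames seen inside an utterance
    n_frames = len(samples) // frame_size

    for f in range(n_frames):
        frame = samples[f * frame_size:(f + 1) * frame_size]
        energy = sum(s * s for s in frame) / frame_size
        if energy >= energy_threshold:
            if start is None:
                start = f * frame_size   # speech begins: start listening
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Enough quiet has passed: close the utterance at the
                # first frame of the silent run.
                end = (f - silent_run + 1) * frame_size
                utterances.append((start, end))
                start = None
                silent_run = 0

    if start is not None:
        # The audio ended while an utterance was still open.
        utterances.append((start, (n_frames - silent_run) * frame_size))
    return utterances


# Synthetic audio: silence, a loud burst, silence, a shorter burst, silence.
audio = [0] * 40 + [100] * 30 + [0] * 40 + [100] * 20 + [0] * 40
segments = find_utterances(audio, frame_size=10, energy_threshold=50,
                           min_silence_frames=2)
```

Each returned segment can then be passed to the recognizer on its own while the speaker waits.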
Connected Word
Continuous Speech
Spontaneous Speech
Voice Verification/Identification
Text-Dependent:
If the text must be the same for enrollment and verification this is
called text-dependent recognition. In a text-dependent system,
prompts can either be common across all speakers (e.g., a common
pass phrase) or unique. In addition, the use of shared secrets (e.g.,
passwords and PINs) or knowledge-based information can be
employed in order to create a multi-factor authentication scenario.
Text-Independent:
Text-independent systems are most often used for speaker
identification as they require very little if any cooperation by the
speaker. In this case the text during enrollment and test is different. In
fact, the enrollment may happen without the user's knowledge, as in
the case for many forensic applications. As text-independent
technologies do not compare what was said at enrollment and
verification, verification applications tend to also employ speech
recognition to determine what the user is saying at the point of
authentication.
In text-independent systems, both acoustic and speech analysis
techniques are used.
3. ADVANTAGES & DISADVANTAGES
Increases productivity
By speaking normally into a speech recognition system (SRS) program, you create documents at the
speed you can compose them in your head. People without strong typing
skills or those who don't wish to be slowed down by manual input can use
voice recognition software to dramatically reduce document creation time.
Can help with menial computer tasks, such as browsing and scrolling
Routine computer work is tedious, and many people would rather avoid
it. Earlier systems took their input from punch cards; then came the
keyboard, trackball, touch screen, mouse, gesture control, joystick, and
so on. All of these input methods require motion of the hand or fingers.
With an SRS, the user can provide input to the system by voice alone
and can complete most menial computer tasks easily.
Cost effective
Diminishes spelling mistakes
3.1 Disadvantages
Most people cannot type as fast as they speak. In theory, this should make
voice recognition software faster than typing for entering text on a computer.
However, this may not always be the case because of the proofreading and
correction required after dictating a document to the computer. Although
voice recognition software may interpret your spoken words correctly the
majority of the time, you might still need to make corrections to
punctuation. Additionally, the software may not recognize words such as
brand names or uncommon surnames until you add them to the program's
library of words. SR systems also struggle to distinguish words that are
phonetically similar, e.g. “there” and “their”.
• Vocal Strain
Using voice recognition software, you may find yourself speaking more
loudly than in normal conversation. In 2000, Linda L. Grubbs of PC World
magazine reported that this habit could lead to vocal cord injury. Although
there is no definite scientific link established between the use of voice
recognition software and damage to the voice, talking loudly for
extended periods always carries the possibility of causing strain and
hoarseness.
• Adaptability
4. APPLICATIONS
Data Entry
the speech recognition engine can process the command and automatically
determine which field to fill in.
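Once the spoken command has been transcribed, deciding which field to fill in can be as simple as matching the text against a pattern. The sketch below assumes commands of the form "set <field> to <value>" or "<field> is <value>"; the field names, aliases, and command grammar are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mapping from spoken field names to form keys.
FIELD_ALIASES = {
    "name": "name",
    "full name": "name",
    "city": "city",
    "town": "city",
    "phone": "phone",
    "phone number": "phone",
}

# Optional verb, optional "the", a field name, a linking word, then the value.
COMMAND_PATTERN = re.compile(
    r"^\s*(?:set|enter|fill)?\s*(?:the\s+)?"
    r"(?P<field>[a-z ]+?)\s+(?:is|to|as)\s+(?P<value>.+?)\s*$",
    re.IGNORECASE,
)


def fill_field(form, transcript):
    """Fill the matching form field; return its key, or None if none matches."""
    match = COMMAND_PATTERN.match(transcript)
    if match is None:
        return None
    field = FIELD_ALIASES.get(match.group("field").strip().lower())
    if field is None:
        return None
    form[field] = match.group("value")
    return field


form = {}
fill_field(form, "set name to Asha")
fill_field(form, "phone number is 555 0100")
```

Commands that name no known field are ignored, so the form is only changed when the transcript can be mapped to a field unambiguously.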
Document Editing
Speaker Identification
Recognizing the speech patterns of different persons can be used to
identify them individually. It can be used as a biometric authentication
system in which users authenticate themselves with the help of their
speech. Various characteristics of speech, such as frequency,
amplitude & other special features, are captured & compared with the
previously stored database.
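The comparison against the stored database can be sketched as a nearest-match search over feature vectors. The example below uses cosine similarity with an acceptance threshold; the speaker names, the three-element feature vectors, and the threshold of 0.95 are assumptions made for illustration, and real systems typically use far richer features.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(features, enrolled, threshold=0.95):
    """Return the best-matching enrolled speaker, or None if no match is close enough."""
    best_name, best_score = None, threshold
    for name, template in enrolled.items():
        score = cosine_similarity(features, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical enrolled templates (features already scaled to 0..1).
enrolled = {
    "alice": [0.90, 0.20, 0.10],
    "bob": [0.20, 0.80, 0.50],
}
known = identify_speaker([0.85, 0.25, 0.12], enrolled)
unknown = identify_speaker([0.10, 0.10, 0.90], enrolled)
```

Raising the threshold makes false acceptance less likely at the cost of rejecting genuine users more often, which is the usual trade-off in biometric authentication.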
Amazon's Alexa
Apple Siri
Google's Google Assistant
Microsoft Cortana
5. CONCLUSION & FUTURE SCOPE
5.1 CONCLUSION
Speech recognition will revolutionize the way people interact with smart
devices & will, ultimately, differentiate the upcoming technologies. Almost all
the smart devices coming today in the market are capable of recognizing
speech. Many areas can benefit from this technology. Speech Recognition can
be used for intuitive operation of computer-based systems in daily life.
This technology will spawn revolutionary changes in the modern world and
become a pivot technology. Within five years, speech recognition technology
will become so pervasive in our daily lives that service environments lacking
this technology will be considered inferior.
All the SR systems will be speaker independent and will produce the
same kind of output for a particular command irrespective of the user. SR
systems will be able to process the voice commands of all the users with
very high accuracy & efficiency.
Wearable Speech Recognition System
REFERENCE