

Voice automation is now widely used across many fields, from homes and
offices to industry and the IT sector. It reduces human effort, time, and
cost. The technology can be applied in many forms, one of which is a
voice-controlled wheelchair for physically challenged people.
Let us look at voice automation systems in more detail.
Speech recognition is a popular topic today. Its applications can be found
everywhere and make our lives more efficient. On a mobile phone, for
example, instead of typing the name of the person they want to call, people
can simply speak the name and the phone will call that person
automatically. To send a text message, people can likewise speak the
message to the phone instead of typing it.
Voice recognition is the ability of a machine or program to receive and
interpret dictation or to understand and carry out spoken commands.
Basically, it is a technology that lets people control a system with their
speech. Using speech to control a system is more convenient than typing on
a keyboard or operating buttons. It can also reduce the cost of industrial
production.
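
As a minimal sketch of carrying out a spoken command, assume a recognizer
has already turned the speech into a text transcript. The command words and
the actions in the table below are hypothetical, chosen only to illustrate
the idea for a wheelchair-style controller; they are not taken from any
real product.

```python
from typing import Dict, Optional

# Hypothetical command table: recognized word -> action description.
COMMANDS: Dict[str, str] = {
    "forward": "drive both motors forward",
    "back": "drive both motors in reverse",
    "left": "turn left",
    "right": "turn right",
    "stop": "stop both motors",
}


def interpret(transcript: str) -> Optional[str]:
    """Return the action for the first known command word, or None."""
    for word in transcript.lower().split():
        word = word.strip(".,!?")  # drop punctuation attached to the word
        if word in COMMANDS:
            return COMMANDS[word]
    return None  # nothing in the transcript matched a known command


if __name__ == "__main__":
    print(interpret("Please go FORWARD!"))
    print(interpret("hello there"))
```

Returning None for unknown input lets the caller decide what to do; a
cautious controller would probably treat it the same as "stop".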
Alternatively known as speech recognition, voice recognition is a
computer software program or hardware device with the ability to decode
the human voice. Voice recognition is commonly used to operate a device,
perform commands, or write without having to use a keyboard, mouse, or
press any buttons. Today, this is done on a computer with ASR (automatic
speech recognition) software programs. Many ASR programs require the
user to "train" the ASR program to recognize their voice so that it can
more accurately convert the speech to text. For example, you could say
"open Internet" and the computer would open the Internet browser.
Voice recognition technology has grown rapidly over the past five decades.
In 1976, computers could understand only slightly more than 1,000 words.
That total jumped to roughly 20,000 in the 1980s as IBM continued to
develop voice recognition technology.
The first speech recognition product for consumers, DragonDictate, was
launched by Dragon in 1990. In 1996, IBM introduced the first voice
recognition product that could recognize continuous speech.
After the launch of smartphones in the second half of the 2000s, Google
launched its Voice Search app for the iPhone. Three years later, Apple
introduced Siri, which is now a prominent voice recognition assistant.
During this past decade, several other technology leaders have also
developed more sophisticated voice recognition software, with Amazon's
Echo featuring Alexa and Microsoft's Cortana -- both of which act as
personal assistants that respond to voice commands.
Voice recognition software on computers requires that analog audio be
converted into digital signals, a step known as analog-to-digital (A/D)
conversion. For a computer to decipher a signal, it must have a digital
database, or vocabulary, of words or syllables, as well as a speedy means
of comparing this data to signals. The speech patterns are stored on the
hard drive and loaded into memory when the program runs. A comparator
checks these stored patterns against the output of the A/D converter -- an
action called pattern recognition.
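
The conversion and comparison steps can be sketched roughly as follows. This
is a toy illustration under simple assumptions: samples lie in the range
-1.0 to 1.0, every stored pattern has the same length as the input, and the
comparator uses plain Euclidean distance. The two-word vocabulary and the
`tone` helper are hypothetical; real recognizers compare extracted features
rather than raw samples.

```python
import math
from typing import Dict, List


def quantize(samples: List[float], bits: int = 8) -> List[int]:
    """A/D step: map samples in [-1.0, 1.0] to signed integer levels."""
    max_level = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * max_level) for s in samples]


def distance(a: List[int], b: List[int]) -> float:
    """Euclidean distance between two equal-length digital patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(signal: List[int], vocabulary: Dict[str, List[int]]) -> str:
    """Comparator: return the stored word whose pattern is closest."""
    return min(vocabulary, key=lambda word: distance(signal, vocabulary[word]))


def tone(cycles: int, n: int = 32) -> List[float]:
    """Hypothetical stand-in for a recorded word: a short sine tone."""
    return [math.sin(2 * math.pi * cycles * i / n) for i in range(n)]


# Hypothetical stored vocabulary, loaded into memory before matching.
vocabulary = {"go": quantize(tone(2)), "stop": quantize(tone(5))}

if __name__ == "__main__":
    quieter_input = [0.9 * s for s in tone(5)]  # same word, spoken more softly
    print(best_match(quantize(quieter_input), vocabulary))
```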
In practice, the size of a voice recognition program's effective vocabulary
is directly related to the random access memory capacity of the computer
in which it is installed. A voice recognition program runs many times
faster if the entire vocabulary can be loaded into RAM, as compared with
searching the hard drive for some of the matches. Processing speed is
critical, as well, because it affects how fast the computer can search the
RAM for matches.
At its core, speech recognition technology is the process of converting
audio into text for the purpose of conversational AI and voice
applications.

 Speech recognition can be broken down into three stages:

1. Automatic speech recognition (ASR): The task of transcribing the audio.
2. Natural language processing (NLP): Deriving meaning from speech
data and the subsequent transcribed text.
3. Text-to-speech (TTS): Converts text to human-like speech.
There are many AI-based home appliances and assistants, viz. Google Home,
Apple Siri, and Alexa, which make our daily life easier and more convenient.

The process begins by digitizing a recorded speech sample with ASR. The
speaker’s unique voice template is broken up into discrete segments made
up of several tones, visualized in the form of spectrograms. The
spectrograms are further divided into timesteps using the short-time
Fourier transform.
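
A rough sketch of the short-time Fourier transform step is given below. It
uses only the Python standard library and a direct (slow) DFT so that it
stays self-contained; the frame size, hop length, and Hann window are
assumptions for illustration, and a practical system would use an FFT
library instead.

```python
import cmath
import math
from typing import List


def stft_magnitudes(signal: List[float], frame_size: int, hop: int) -> List[List[float]]:
    """Return one magnitude spectrum per timestep (one row per frame)."""
    # Hann window to reduce leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    spectrogram = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        spectrum = []
        # Only the non-negative frequency bins are needed for a real signal.
        for k in range(frame_size // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            spectrum.append(abs(coeff))
        spectrogram.append(spectrum)
    return spectrogram


if __name__ == "__main__":
    # Test tone with exactly 4 cycles per 32-sample frame.
    tone = [math.sin(2 * math.pi * 4 * n / 32) for n in range(128)]
    rows = stft_magnitudes(tone, frame_size=32, hop=16)
    strongest = [max(range(len(row)), key=row.__getitem__) for row in rows]
    print(len(rows), "timesteps; strongest bin per timestep:", strongest)
```

Each row of the result is one timestep of the spectrogram, and each column
is a frequency bin.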
Each spectrogram is analyzed and transcribed based on the NLP algorithm
that predicts the probability of all words in a language’s vocabulary. A
contextual layer is added to help correct any potential mistakes. Here the
algorithm considers both what was said, and the likeliest next word based
on its knowledge of the given language.
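
The idea of weighing what was heard against the likeliest next word can be
sketched as below. The acoustic scores and the bigram probabilities are
made-up numbers for illustration only, and adding log-probabilities is one
common way to combine the two, not necessarily the method used by any
particular product.

```python
import math
from typing import Dict, Tuple


def pick_word(previous: str,
              acoustic: Dict[str, float],
              bigram: Dict[Tuple[str, str], float],
              floor: float = 1e-6) -> str:
    """Choose the candidate with the best combined acoustic + context score."""
    def score(word: str) -> float:
        # Unseen word pairs get a small floor probability instead of zero.
        context = bigram.get((previous, word), floor)
        return math.log(acoustic[word]) + math.log(context)
    return max(acoustic, key=score)


if __name__ == "__main__":
    # Hypothetical values: the audio alone slightly favors "write".
    acoustic = {"write": 0.55, "right": 0.45}
    # Hypothetical language knowledge: "turn right" is far more likely.
    bigram = {("turn", "right"): 0.3, ("turn", "write"): 0.001}
    print(pick_word("turn", acoustic, bigram))
```

Here the context outweighs the small acoustic preference, so the word that
fits the sentence is chosen.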

Finally, the device will verbalize the best possible response to what it has
heard and analyzed using TTS.
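
Putting the three stages together, the overall flow might be sketched like
this. The `VoicePipeline` class and the three stand-in functions are
hypothetical: the fake ASR simply decodes bytes as text and the fake TTS
encodes text as bytes, so that the structure can be run without any speech
library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    asr: Callable[[bytes], str]   # audio -> transcribed text
    nlp: Callable[[str], str]     # text -> reply text
    tts: Callable[[str], bytes]   # reply text -> audio

    def respond(self, audio: bytes) -> bytes:
        text = self.asr(audio)
        reply = self.nlp(text)
        return self.tts(reply)


def fake_asr(audio: bytes) -> str:
    # Stand-in: pretend the audio bytes are already the spoken words.
    return audio.decode("utf-8")


def fake_nlp(text: str) -> str:
    # Stand-in: a single keyword rule instead of real language understanding.
    if "time" in text.lower():
        return "Sorry, this sketch has no clock."
    return "Sorry, I did not understand."


def fake_tts(reply: str) -> bytes:
    # Stand-in: return the reply as bytes instead of synthesized speech.
    return reply.encode("utf-8")


pipeline = VoicePipeline(asr=fake_asr, nlp=fake_nlp, tts=fake_tts)

if __name__ == "__main__":
    print(pipeline.respond(b"What time is it"))
```

Keeping each stage behind a plain function makes it easy to swap a stand-in
for a real component later.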


The uses for voice recognition have grown quickly as AI, machine
learning and consumer acceptance have matured. In-home digital
assistants from Google to Amazon to Apple have all implemented voice
recognition software to interact with users. The way consumers use voice
recognition technology varies depending on the product, but it can include
transcribing voice to text, setting up reminders, searching the internet, and
responding to simple questions and requests, such as playing music or
sharing weather or traffic information.

Other uses of voice recognition software:

1. Virtual Assistants

2. Online Banking Using Voice

3. Doctors Can Stop Typing While Talking To Patients

4. Enhanced Security With Voice Biometry

5. Voice Assistants In The Workplace

6. Using Speech Recognition To Transcribe Meetings

7. E-commerce Purchases Using Voice Commands

8. Catching Criminals Using Voice

9. Making Public Transportation Simple And Inclusive

10. Creating Superior Content With Dictation

11. Transcribing Podcasts

12. Journalists Have Their Interviews Transcribed

13. Booking Your Next Vacation

14. Learning Languages

15. Voice-Controlled Wheelchair

Advantages and Disadvantages

 Advantages
o It can help to increase productivity in many businesses, such as in healthcare.
o It can capture speech much faster than you can type.
o Text-to-speech can be used in real time.
o The software can spell as well as any other writing tool.
o It helps people with sight problems and physical disabilities.

 Disadvantages
o Voice data can be recorded, which could impact privacy.
o The software can struggle with vocabulary, particularly if there are specialist terms.
o It can misinterpret words if you don’t speak clearly.

This report gives a rough overview of voice automation systems. We have
discussed their capabilities and limitations in a broad sense. We learnt a
lot while researching the new technologies used in different fields,
which will surely help us in our upcoming project, a voice-controlled
wheelchair for physically challenged people.

BATCH: 2019-2023
2. SHREYASI KHAN_(Roll-32)
3. SHREYA GHOSH _(Roll-33)
5. RAYA BASU_(Roll-44)
