Pankaj Singh Synopsis (Recovoicegnition)

Voice Recognition
PROJECT SYNOPSIS
OF MINOR PROJECT
BACHELOR OF TECHNOLOGY
CSE
SUBMITTED BY
Name: Pankaj Singh
IKGPTU Roll No.: 2002551
Name: Pradeep Kumar
IKGPTU Roll No.: 2102475
GUIDED BY
Ms. Kamalinder Kaur
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Chandigarh Engineering College – Landran
Mohali, Punjab – 140307

August, 2022
INDEX
Sr. no. Topic
1. Introduction
2. Technology used
3. Literature Survey
4. Methodology/Planning of Work
5. Conclusion & Future Scope (output)
6. References
Abstract
This project attempted to design and implement a voice recognition
system that identifies different users based on previously stored voice
samples. Each user provides audio samples of a keyword of his or her
choice. This input was gathered, but it was not processed successfully
to extract meaningful spectral coefficients. These coefficients were to
be stored in a database for later comparison with future audio inputs.
The system then had to capture an input from any user and match its
spectral coefficients against all coefficients previously stored in the
database in order to identify the unknown speaker. Since the spectral
coefficients were not acquired adequately, the system as a whole did not
recognize anything, although we believe that modifying the system's
structure to decrease timing dependencies between the subsystems might
make the implementation more feasible and less complex.

A voice recognition system converts the human voice into a signal that
machines can understand. Once this is achieved, the machine can be made
to work as desired. The machine could be a computer, a typewriter, or
even a robot. Systems are also available in which the machine 'speaks'
the recorded word, but those are outside the scope of this paper; here,
only the human is expected to talk. Further, the voice recognition
systems described here are intended for use in projects only.

Keywords: Speech Recognition System, Acoustic Model, DTMF Decoder,
HM 2007, Voice Recognition Module VR3.

The overall design methodology for this project was to divide the system
into two parts that could be done relatively independently.
Unfortunately, the project turned out to be better suited to a
three-person division, and the remaining part had to be distributed as
best we could. Figure 1 shows the whole system's block diagram. As can be
seen, the architecture used is fairly compact given the nature of the
system. As a result of this desired compactness, system complexity was
underestimated. In particular, the much-desired modularity between our
individual parts ended up being of no use, because the distance subsystem
and the control unit should have been designed by the same person to
facilitate later integration.
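
The matching step described in the abstract (comparing the spectral
coefficients of an unknown input with every stored set) can be sketched as
a nearest-neighbour search. This is only an illustrative sketch, not the
project's implementation: the function names, the use of Euclidean
distance, the optional rejection threshold, and the coefficient values
below are all assumptions made for the example.

```python
import math


def euclidean_distance(a, b):
    """Distance between two coefficient vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("coefficient vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify_speaker(database, unknown, max_distance=None):
    """Return (name, distance) for the stored sample closest to `unknown`.

    `database` maps a user name to a list of stored coefficient vectors.
    If `max_distance` is given and even the best match is farther away,
    the name is reported as None (speaker not recognized).
    """
    best_name, best_distance = None, math.inf
    for name, samples in database.items():
        for coefficients in samples:
            distance = euclidean_distance(coefficients, unknown)
            if distance < best_distance:
                best_name, best_distance = name, distance
    if max_distance is not None and best_distance > max_distance:
        return None, best_distance
    return best_name, best_distance


# Hypothetical stored coefficients for two enrolled users.
database = {
    "user_a": [[1.0, 0.2, 0.1], [0.9, 0.3, 0.1]],
    "user_b": [[0.1, 0.8, 0.7]],
}

name, distance = identify_speaker(database, [0.95, 0.25, 0.1])
print(name, round(distance, 3))
```

A threshold such as `max_distance` lets the system reject a speaker who is
not in the database instead of always returning the closest enrolled user.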
INTRODUCTION
This technical paper aims to explain the various voice recognition systems
available. Various software and hardware devices use different techniques
to decode human speech.

History

The concept of speech recognition dates back to the 1940s. The first
practical speech recognition program appeared in 1952 at Bell Labs [2],
[3]: the "Audrey" system, which recognized digits spoken by a single voice
in a noise-free environment. This first speech recognition system could
understand only digits. The 1940s and 1950s are considered the foundational
period of speech recognition technology. In this period, work was done on
the foundational paradigms of speech recognition, namely automation and
information-theoretic models. Later, the device was improved to recognize
spoken words, numbers, etc., leading to ASR (Automatic Speech Recognition)
systems.

In today's networked world, the need to maintain the security of
information or physical property is becoming both increasingly important
and increasingly difficult. From time to time we hear about credit card
fraud, computer break-ins by hackers, or security breaches in a company or
government building. In the year 1998, sophisticated cyber crooks caused
well over US $100 million in losses (Reuters, 1999). In most of these
crimes, the criminals took advantage of a fundamental flaw in conventional
access control systems: the systems do not grant access by "who we are",
but by "what we have", such as ID cards, keys, passwords, PIN numbers, or
mother's maiden name. None of these really defines us; they are merely
means of authenticating us. It goes without saying that anyone who steals,
duplicates, or acquires these means of identity will be able to access our
data or our personal property any time they want.

Recently, technology became available to allow verification of "true"
individual identity. This technology is based on a field called
"biometrics". Biometric access control systems are automated methods of
verifying or recognizing the identity of a living person on the basis of
some physiological characteristic, such as fingerprints or facial features,
or some aspect of the person's behaviour, like his or her handwriting style
or keystroke patterns. Since biometric systems identify a person by
biological characteristics, they are difficult to forge.

Voice or speaker recognition is the ability of a machine or program to
receive and interpret dictation or to understand and carry out spoken
commands. Voice recognition has gained prominence and use with the rise of
AI and intelligent assistants, such as Amazon's Alexa, Apple's Siri and
Microsoft's Cortana. Voice recognition systems enable consumers to interact
with technology simply by speaking to it, enabling hands-free requests,
reminders and other simple tasks.
How voice recognition works

Voice recognition software on computers requires that analog audio be
converted into digital signals, known as analog-to-digital conversion. For a
computer to decipher a signal, it must have a digital database, or
vocabulary, of words or syllables, as well as a speedy means for comparing
this data to signals. The speech patterns are stored on the hard drive and
loaded into memory when the program is run. A comparator checks these
stored patterns against the output of the A/D converter -- an action called
pattern recognition. In practice, the size of a voice recognition program's
effective vocabulary is directly related to the random access memory
capacity of the computer in which it is installed. A voice recognition
program runs many times faster if the entire vocabulary can be loaded
into RAM, as compared with searching the hard drive for some of the
matches. Processing speed is critical, as well, because it affects how fast
the computer can search the RAM for matches. While voice recognition
technology originated on PCs, it has gained acceptance in both business and
consumer spaces on mobile devices and in home assistant products. The
popularity of smartphones opened up the opportunity to add voice
recognition technology into consumer pockets, while home devices, like
Google Home and Amazon Echo, brought voice recognition technology
into living rooms and kitchens. Voice recognition, combined with the
growing stable of internet of things sensors, has added a technological layer
to many consumer products that previously lacked any smart capabilities.
As uses for voice recognition technology grow and more users interact with
it, the companies implementing voice recognition software will have more
data and information to feed into the neural networks that power voice
recognition systems, thus improving the capabilities and accuracy of the
voice recognition products.
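
The two steps described at the start of this section, analog-to-digital
conversion followed by a comparator that checks the digitized signal
against stored patterns, can be illustrated with a small sketch. It is a
simplified illustration under stated assumptions, not how any particular
product works: the function names, the 8 kHz sample rate, the 8-bit
resolution, the mean-absolute-difference score, and the two pure tones
standing in for vocabulary entries are all hypothetical.

```python
import math


def sample_and_quantize(signal, duration_s, sample_rate_hz, bits=8):
    """Sample signal(t), clipped to [-1, 1], into unsigned integer codes."""
    levels = 2 ** bits
    count = round(duration_s * sample_rate_hz)
    codes = []
    for i in range(count):
        t = i / sample_rate_hz
        value = max(-1.0, min(1.0, signal(t)))
        # Map [-1, 1] onto the integer range 0 .. levels - 1.
        codes.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return codes


def mismatch(a, b):
    """Mean absolute difference between two digitized patterns."""
    pairs = list(zip(a, b))
    if not pairs:
        raise ValueError("patterns must not be empty")
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


def recognize(codes, vocabulary):
    """Comparator: return the vocabulary entry with the smallest mismatch."""
    return min(vocabulary, key=lambda word: mismatch(codes, vocabulary[word]))


def tone(freq_hz, amplitude=1.0):
    """Stand-in for an analog input: a pure sine tone."""
    return lambda t: amplitude * math.sin(2 * math.pi * freq_hz * t)


# Hypothetical vocabulary held in memory as digitized patterns.
vocabulary = {
    "low": sample_and_quantize(tone(200), 0.01, 8000),
    "high": sample_and_quantize(tone(400), 0.01, 8000),
}

# A slightly quieter version of the 400 Hz tone plays the role of new input.
spoken = sample_and_quantize(tone(400, amplitude=0.9), 0.01, 8000)
print(recognize(spoken, vocabulary))
```

Because the comparator visits every stored pattern, the search time grows
with the vocabulary size, which is why keeping the whole vocabulary in RAM
matters for speed.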
Technology used
Facial recognition is a way of identifying or confirming your identity using
your face. Facial recognition systems can be used to identify people in
photos, videos, or in real-time. Facial recognition has historically worked
like other forms of “biometric” identification such as speech recognition,
the irises of your eyes, or fingerprint identification. Fingerprint data, for
example, is gathered and analyzed for identifying markers. A newly found
fingerprint can then be evaluated against this database for matching
markers. Facial recognition works in the same way. A computer
analyzes image data and looks for a very specific set of markers within it,
everything from a person's head shape to the depth of their eye sockets. A
database of facial markers is created, and an image of a face that reaches
a critical threshold of similarity with an entry in the database indicates
a possible match. This is the basic principle behind all types of facial
recognition, from unlocking your iPhone by scanning your face to
intercepting known shoplifters as they enter a store. Such matching worked
well enough for relatively simple jobs, like figuring out where faces were
within a photo, but identifying a particular face as matching a photograph
of the same person turned out to be more difficult. Several methods have been
developed to enable accurate facial recognition, and they all begin with
the collection of images and videos of people’s faces. The computer must
be trained to read the geometry of a face and identify specific facial
landmarks. One system has up to 68 landmark points on a human face,
localizing regions around the eyes, brows, nose, mouth, chin, and jaw. This
requires a set of rules general enough to include a wide range of facial types
but narrow enough to exclude paintings and clothes-store mannequins. The
algorithm compares every image pixel’s brightness to the brightness of the
pixels around it, creating a map of changing pixel intensity. A complex
network of multi-directional brightness gradients is then stored in coded
form on the computer. Another approach has to do with the projection of a
2D photo onto a 3D model, like a cylinder. Wrapping a face around a third
dimension can often reveal forms of symmetry and distinguishing
characteristics that are much harder to find in a flat and static image. Once
the image preparation has been completed, the system “encodes” the face,
or collapses its most distinguishing characteristics and patterns to a smaller,
simplified file that exists solely to do cross-checking with other encoded
faces.
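
The brightness comparison described above, in which each pixel is compared
with its neighbours to build a map of changing intensity, can be sketched
as follows. This is a minimal illustration only: the function name, the use
of simple horizontal and vertical differences, the edge handling, and the
tiny test image are assumptions, and real systems use more elaborate
descriptors.

```python
def brightness_gradients(image):
    """Return a grid of (gx, gy) brightness differences per pixel.

    `image` is a list of rows of grayscale values. gx compares the right
    and left neighbours, gy compares the lower and upper neighbours.
    Pixels on the border reuse their own value for the missing neighbour.
    """
    height, width = len(image), len(image[0])
    gradients = []
    for y in range(height):
        row = []
        for x in range(width):
            left = image[y][max(x - 1, 0)]
            right = image[y][min(x + 1, width - 1)]
            up = image[max(y - 1, 0)][x]
            down = image[min(y + 1, height - 1)][x]
            row.append((right - left, down - up))
        gradients.append(row)
    return gradients


# Hypothetical 3x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

for row in brightness_gradients(image):
    print(row)
```

In this example the non-zero gradients appear only in the two columns next
to the vertical edge, which is the kind of intensity change the stored map
is meant to capture.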
Literature survey
Voice recognition systems can be divided into a number of classes based on
their ability to recognize different words. A few classes of speech
recognition [1], [3] are described below.

Isolated Speech: Isolated words usually involve a pause between two
utterances. This does not mean that the system accepts only a single word,
but that it requires one utterance at a time.

Connected Speech: Connected words or connected speech is similar to
isolated speech, but allows separate utterances with minimal pauses between
them.

Continuous Speech: Continuous speech allows the user to speak almost
naturally, and is also called computer dictation.

Spontaneous Speech: At a basic level, this can be thought of as speech that
is natural sounding and not rehearsed. An ASR system with spontaneous
speech ability should be able to handle a variety of natural speech
features such as words being run together, "ums" and "ahs", and even slight
stutters.

As one of the most
successful applications of image analysis and understanding, face
recognition has recently received significant attention, especially during the
past several years. At least two reasons account for this trend: the first is the
wide range of commercial and law enforcement applications, and the second
is the availability of feasible technologies after 30 years of research. Even
though current machine recognition systems have reached a certain level of
maturity, their success is limited by the conditions imposed by many real
applications. For example, recognition of face images acquired in an
outdoor environment with changes in illumination and/or pose remains a
largely unsolved problem. In other words, current systems are still far away
from the capability of the human perception system. This paper provides an
up-to-date critical survey of still- and video-based face recognition research.
There are two underlying motivations for us to write this survey paper: the
first is to provide an up-to-date review of the existing literature, and the
second is to offer some insights into the studies of machine recognition of
faces. To provide a comprehensive survey, we not only categorize existing
recognition techniques but also present detailed descriptions of
representative methods within each category. In addition, relevant topics
such as psychophysical studies, system evaluation, and issues of
illumination and pose variation are covered.

Reading Assistant software is a guided reading tool to build fluency. By
virtue of its speech verifier, the voice recognition reading software
listens to students reading aloud, monitoring for signs of difficulty such
as hesitations, silence, mispronunciations and other cues.
METHODOLOGY/PLANNING OF WORK
In the simplest approach, a spectrogram represents every word in the
"memory" of the software. The software compares the spectrogram of the
spoken word with the spectrograms in its vocabulary to determine what was
said. In general, this method does an excellent job of recognizing simple
words. Its disadvantage is a limited vocabulary. Theoretically, the
vocabulary could be expanded significantly, but people's vocabularies
differ widely and many speakers also have dialects, which complicates the
analysis process for selecting patterns. For this reason, learning blocks
that recognize individual sounds were introduced; they help the system
understand whole sentences. This is precisely the basis of feature
analysis.

Some more advanced voice recognition systems are based on a language
model. They can listen to and understand the words that people say because
they have mathematical algorithms for analyzing languages. This method is
built on the rule that certain words are likely to follow a given word,
while other words are rarely used in the same sentence. For example, it is
more likely that the word "open" will be followed by the word "door". The
statistical analysis and modeling method has been actively used over the
past 10 years and has reached its development limit. This means that
better voice recognition programming requires more advanced technologies.

Modern systems for speech, text, and photo recognition use neural
networks. A neural network is a mathematical model whose hardware and
software implementation is loosely modeled on the way the human brain
works. Instead of storing specific patterns, it uses vast networks of
neurons that change their connections with each other as new information
flows through them. There are also some difficulties here: for the neural
system to work and develop independently, it has to be trained using
extensive databases. Voice recognition systems are popular throughout the
world.
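
The language-model rule mentioned in this section, that some words are more
likely than others to follow a given word, can be illustrated with a simple
bigram count. This is a minimal sketch under stated assumptions: the
function names and the five training sentences are made up for the example,
and practical systems train on far larger text collections and smooth the
estimates.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for previous, following in zip(words, words[1:]):
            counts[previous][following] += 1
    return counts


def next_word_probability(counts, previous, candidate):
    """Estimate P(candidate | previous) from the raw bigram counts."""
    followers = counts.get(previous, Counter())
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return followers[candidate] / total


def pick_candidate(counts, previous, candidates):
    """Choose the candidate word most likely to follow `previous`."""
    return max(
        candidates,
        key=lambda word: next_word_probability(counts, previous, word),
    )


# Hypothetical training text.
corpus = [
    "open the door",
    "open door now",
    "please open door",
    "open window",
    "close the door",
]
counts = train_bigrams(corpus)

# Two acoustically similar guesses for the word heard after "open".
print(pick_candidate(counts, "open", ["floor", "door"]))
print(next_word_probability(counts, "open", "door"))
```

When the acoustic evidence cannot separate two similar-sounding words, a
recognizer can use such probabilities to prefer the word that fits the
context.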
Conclusion and Future Scope (output)
From the detailed study of the various voice recognition systems discussed
above, it can be concluded that, although speaker-independent systems are
available, they are costly. Thus, the speaker-dependent Voice Recognition
Module VR3 is best suited for use in projects that build automated
systems.

Speech recognition is a thriving domain with many important applications.
It is easy to predict that speech recognition research will continue and
that important practical applications will be created. Accurate speech
recognition is not such a hard problem, so it should be solved in the
foreseeable future. It is less a question of AI than of algorithms: most
speech recognition issues appear to stem not from a lack of understanding
but from a lack of good algorithms. Noise, accents and so on are purely
technical problems which will eventually be solved. Researchers often
treat speech recognition in a noisy environment as a standalone problem
with the practical goal of building an application that works. At the same
time, our knowledge about speech improves fundamentally from day to day,
and the goals are more and more ambitious. The recent BABEL program, for
example, aims to improve support for non-English languages, and a good
step forward is expected in the next few years. Some leading researchers
are working on language-independent speech recognition. The accuracy on
the standard test sets also improves from year to year, and voice
applications are already in every smartphone. Just as computers came to
play chess better than humans, speech recognition will soon be done better
by computers too. Importantly, that will add knowledge about nature as a
whole and the human brain in particular. Speech recognition is therefore
an important step in our exploration of the laws of nature.
References
[1] Jibran Abbasi, Muzamil Hussain, Shoaib Ahmed, "An Implementation of
Speech Recognition for Desktop Application", www.scribd.com.
[2] "Speech Recognition - The Next Revolution", 5th edition.
[3] Sameer Shewalkar, Shoaib Ansari, Masuma Mujawar, Prof. Patil S.S,
"Handling PC through Speech Recognition and Air Gesture", International
Journal of Computer Science and Information Technology Research, Vol. 3,
Issue 1, January - March 2015.
[4] Mark Gales, "Acoustic Modeling for Speech Recognition: Hidden Markov
Models and Beyond?", December 2009.
[5] Charu Joshi, "Speech Recognition", www.slideshare.net.
[6] "Developing an Isolated Word Recognition System in MATLAB",
in.mathworks.com.
[7] Rachna Jain, Dr. S.K Saxena, "Voice Automated Mobile Robot",
International Journal of Computer Applications, Volume 16, No. 2,
February 2011.
[8] Sija Gopinathan, Athira Krishnan R, Renu Tony, Vishnu M,
Yedhukrishnan, "Wireless Voice Controlled Fire Extinguisher Robot",
International Journal of Advanced Research in Electrical, Electronics and
Instrumentation Engineering, Vol. 4, Issue 4, April 2015.
[9] Madhavi Pednekar, Joel Amanna, Jino John, Abhishesh Singh, Suresh
Prajapati, Don Bosco Institute of Technology, Mumbai, India, "Voice
Operated Intelligent Fire Extinguishing Vehicle", 2015 International
Conference on Technologies for Sustainable Development (ICTSD-2015),
Feb. 04 - 06, 2015.
[10] Pratik Chopra, Harshad Dange, "Voice Controlled Robot", Engineering
Degree report, University of Mumbai, under the guidance of Mr. Shirish S.
Halbe (Asst. Professor & Hobby Centre Co-ordinator), Department of
Electronics Engineering, K. J. Somaiya College of Engineering,
Vidyavihar, 2006.
[11] S. Suresh, Y. Sindhuja Rao, "Modelling of Secured 'Voice Recognition
Based Automatic Control System'", International Journal of Emerging
Technology in Computer Science & Electronics (IJETCSE), Volume 13,
Issue 2, March 2015.
Websites:
www.wikipedia.org
www.geeksforgeeks.org
www.tutorialpoint.com
www.javapoint.com
