A PROJECT REPORT
SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
AWARD OF THE DEGREE OF
BACHELOR OF TECHNOLOGY IN
ELECTRONICS AND COMMUNICATION ENGINEERING
by:
ABHIVYAKT SHARMA
2K20/EC/010
ADITYA GUPTA
2K20/EC/013
PROF. MS CHOUDHRY
DEPT OF ELECTRONICS AND COMMUNICATION
CANDIDATE’S DECLARATION
We, Abhivyakt Sharma (2K20/EC/010) and Aditya Gupta (2K20/EC/013), students of
B.Tech., hereby declare that the project Dissertation titled “SPEECH RECOGNITION
USING PYTHON”, which is submitted in partial fulfilment of the
requirement for the award of the degree of Bachelor of Technology, is original and not
copied from any source without proper citation. This work has not previously formed
the basis for the award of any Degree, Diploma, Associateship, Fellowship or other
similar title or recognition.
Place: Delhi
DEPT OF ELECTRONICS AND COMMUNICATION
CERTIFICATE
I hereby certify that the project Dissertation titled “SPEECH RECOGNITION USING
PYTHON”, submitted by the students in partial fulfilment of the
requirement for the award of the degree of Bachelor of Technology, is a record of
the project work carried out by the students under my supervision. To the best of my
knowledge, this work has not been submitted in part or full for any Degree or Diploma.
(Assistant Professor)
SUPERVISOR
DEPT OF ELECTRONICS AND COMMUNICATION
ABSTRACT
ACKNOWLEDGEMENT
In performing our major project, we had to take the help and guidance of some
respected persons, who deserve our greatest gratitude. The completion of this
assignment gives us much pleasure. We would like to show our gratitude to Prof. MS
Choudhry, our mentor for the major project, for giving us good guidance on the report
throughout numerous consultations. We would also like to extend our deepest gratitude
to all those who have directly and indirectly guided us in writing this assignment.
Many people, including our classmates and our team members themselves, have made
valuable comments and suggestions on this assignment. We thank all of them for their
direct and indirect help in completing our assignment.
We are also grateful to Delhi Technological University for giving us the opportunity
to work on this topic.
CONTENTS
1. TITLE PAGE
2. CANDIDATE’S DECLARATION
3. CERTIFICATE
4. ABSTRACT
5. ACKNOWLEDGEMENT
6. CONTENTS
7. INTRODUCTION
8. A BRIEF HISTORY
9. TYPES OF SPEECH RECOGNITION
10. METHODOLOGY
11. SUMMARY
12. CONCLUSION
INTRODUCTION
Ever since the beginning of time, speech has been the most convenient and
conventional means of communication. Today this interaction is no longer
limited to face-to-face conversation; we can in fact communicate with our
fellow humans miles away from us by a variety of means such as the
telephone (both wired and wireless), satellite communication, voice mail
and the internet, to name a few. With the rapid development of
communication technologies, a promising speech communication technique
for human-to-machine interaction has come into being. Automatic speech
recognition (ASR) is the core challenge on the way to natural
human-to-machine communication.
Automatic speech recognition converts a speech waveform into a discrete
sequence of words by means of a machine. A number of models already
employ this technology, but several problems arise in its real-world
applications. In most cases the accuracy of a model is far from that of a
human listener, so its performance can degrade drastically with a small
modification of the speech signal or a change in the speaking environment.
This technology employs complex algorithms, owing to the large variation in
speech signals, and hence must represent this variability in due process.
Achieving this requires large computational strength and memory capacity.
The goal of this project is to study the various optimization techniques for
state-of-the-art ASR techniques for DSP-based embedded applications while
maintaining high recognition accuracy.
A BRIEF HISTORY
The idea of such a technology was first conceived in the 1940s; however, the
first model employing it was developed in 1952 at Bell Labs, and aimed to
detect a digit in a noise-free environment. The period from the 1940s to
the 1950s is considered the foundational period of this technology, during
which work was done on its foundational paradigms: automation and
information-theoretic models. By the 1960s, scientists working with this
technology were able to have machines recognise small vocabularies of
words (10-100) based on simple acoustic-phonetic properties of speech.
It was during this period that time-normalization and filter-bank methods
were developed. Since the 1970s, larger and larger vocabularies have come
into the picture, and the models developed have become more and more robust.
The leading inventions of this era were the hidden Markov model (HMM) and
the stochastic language model, both of which are discussed further in later
sections of this report.
After being researched and developed for nearly five decades, this technology
finally entered the marketplace in the early 2000s and has since found its
way into our homes and become an integral part of our daily lives.
TYPES OF SPEECH RECOGNITION
Speech recognition systems can be divided into a number of classes based
on the words and lists of words they are able to recognise. A few
classes of speech recognition are:
1. ISOLATED SPEECH
Isolated speech recognisers accept a single word or utterance at a
time, usually requiring a pause between each.
2. CONNECTED SPEECH
Connected speech systems are similar, but allow separate utterances
to be run together with only a minimal pause between them.
3. CONTINUOUS SPEECH
This allows the user to speak almost naturally; this method is also
known as computer dictation.
4. SPONTANEOUS SPEECH
Spontaneous speech recognition aims to handle natural, unrehearsed
speech, including hesitations and false starts.
2. DIGITIZATION
The whole procedure of converting an analog signal into a digital form is
known as digitization; it involves both sampling and quantization.
Sampling here refers to the process of converting a continuous signal
into a discrete one, while the process of approximating a continuous
range of values is called quantization.
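The two steps above can be sketched in Python; the tone, sampling rate and bit depth below are illustrative choices, not part of the report's implementation:

```python
import math

def sample(signal, duration_s, fs):
    """Sample a continuous-time signal (given as a function of time)
    at fs samples per second."""
    n = int(duration_s * fs)
    return [signal(k / fs) for k in range(n)]

def quantize(samples, bits):
    """Uniformly quantize samples in [-1.0, 1.0] to 2**bits levels."""
    step = 2.0 / (2 ** bits)
    return [round(s / step) * step for s in samples]

def tone(t):
    # A 5 Hz sine tone as the "continuous" signal
    return math.sin(2 * math.pi * 5 * t)

# Sample at 40 Hz (above the 10 Hz Nyquist rate), then quantize to 3 bits
digital = quantize(sample(tone, 0.5, 40), bits=3)
```

After both steps, `digital` holds 20 discrete-time samples, each snapped to one of the 3-bit quantization levels (multiples of 0.25).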
3. ACOUSTIC MODEL
An acoustic model is developed by taking recordings of speech and their
transcriptions, and by using software to create statistical
representations of the sounds that make up each word. It is used by a
speech recognition engine to recognise speech, and it helps break
words into phonemes.
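As a toy illustration of breaking words into phonemes, consider the lookup below. The ARPAbet-style entries are hypothetical examples, not a real lexicon; an actual engine would combine a full pronunciation dictionary with statistical acoustic models:

```python
# Illustrative ARPAbet-style pronunciation dictionary (hypothetical entries)
PHONES = {
    "speech": ["S", "P", "IY", "CH"],
    "hello":  ["HH", "AH", "L", "OW"],
    "world":  ["W", "ER", "L", "D"],
}

def to_phonemes(sentence):
    """Break each known word of a sentence into its phonemes."""
    return [PHONES[w] for w in sentence.lower().split() if w in PHONES]

print(to_phonemes("Hello world"))
# prints [['HH', 'AH', 'L', 'OW'], ['W', 'ER', 'L', 'D']]
```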
4. LANGUAGE MODEL
Language modelling is used in natural language processing software, as in
speech prediction applications; it also finds use in speech recognition,
as it tries to capture the properties of the language and predict the
next word in a sequence. The model compares phonemes to words in its
built-in dictionary.
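A minimal sketch of next-word prediction, assuming a simple bigram count model (the tiny corpus below is illustrative; real language models are trained on far larger data):

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count word-pair frequencies from a list of sentences."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

corpus = [
    "speech recognition converts speech to text",
    "speech recognition is useful",
]
model = train_bigrams(corpus)
print(predict_next(model, "speech"))  # prints "recognition"
```

In the corpus above, "recognition" follows "speech" more often than any other word, so it is predicted as the next word.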
5. SPEECH ENGINE
The speech engine is employed to convert the input audio into text; to
accomplish this it uses all sorts of data, complex algorithms and
statistics. The first step is to digitize the data so that it is in a
suitable format for further processing. Once the data format is
appropriate, the engine searches for the best match, upon which the
signal is recognized and displayed as a text string.
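The "best match" search can be illustrated with a toy nearest-template comparison. The words and feature vectors below are made up for illustration; a real engine matches sequences of acoustic features (e.g. MFCCs) rather than single vectors:

```python
import math

# Hypothetical stored templates: each word mapped to a small feature vector
TEMPLATES = {
    "yes":  [0.9, 0.1, 0.4],
    "no":   [0.2, 0.8, 0.3],
    "stop": [0.5, 0.5, 0.9],
}

def best_match(features):
    """Return the template word whose feature vector is closest
    (in Euclidean distance) to the input features."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(TEMPLATES, key=lambda w: dist(TEMPLATES[w], features))

print(best_match([0.85, 0.15, 0.35]))  # prints "yes"
```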
A. UTTERANCES
The smallest component of whatever the user says is an
utterance. In other words, speaking a word or a combination of
words that makes sense and means something to the computer
is called an utterance. Utterances, once detected, are sent
to the speech engine for further processing.
B. PRONUNCIATIONS
A speech engine uses a process to understand what a word
sounds like and what it should be; this is called pronunciation.
C. GRAMMAR
Grammar is the set of rules of a language that define the
words and phrases that are going to be recognized by the
speech engine.
D. ACCURACY
The performance of any speech recognition software is
measurable as its accuracy: how often it identifies an
utterance correctly, commonly reported as the word error rate.
E. VOCABULARIES
Vocabulary is the list of words that can be recognised by the
speech recognition engine. Smaller vocabularies are easier
for a speech engine to identify, whereas larger vocabularies
are much more difficult for the engine to understand.
F. TRAINING
Training can be used by users who have difficulty speaking
or pronouncing certain words; it helps the engine better
understand their speech and hence produce the desired output
as one would expect.
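The accuracy mentioned above is commonly measured as the word error rate (WER). A minimal sketch, using word-level Levenshtein (edit) distance; the example phrases are illustrative:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference words,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("turn the light on", "turn light off"))  # prints 0.5
```

Here the engine dropped "the" and substituted "off" for "on": 2 errors over 4 reference words gives a WER of 0.5.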
2. TOOLS USED IN THE MAKING
A. PYTHON
B. MS-PAINT
C. VS-CODE/PYCHARM
D. OFFICE 2019 (DOCUMENTATION PURPOSES)
3. METHODOLOGY
As this technology is still an emerging one, not everyone is familiar with
it, let alone developers working on it for the first time. While the
basic functions involving speech synthesis and speech recognition
take little to no time to understand, there are many more powerful
techniques that a developer would want to understand and utilize from
the get-go.
Despite a lot of investment in the research and development of this
technology over the course of the last 40 years, there are still many
limitations that stand between us and a truly efficient speech
recognition system.
Hence a good understanding of the limitations and strengths of this
technology is required for developers to decide whether a particular
application will benefit from the use of speech input/output or not.
A. SPEECH SYNTHESIS
B. SPEECH RECOGNITION
By speech recognition we mean converting speech into a text string. For
this purpose we use the SpeechRecognition library, which has to be
installed from the terminal using the command "pip install
SpeechRecognition".
This library contains practically all the tools required to recognise and
understand what the user wants, be it the complex algorithms or the list
of words (i.e., the vocabulary), to make our application functional.
We have also used the tkinter library from Python; as this library comes
pre-installed with Python, there is no need to install it separately. The
main use of this library was to develop a Graphical User Interface (GUI)
so that almost anyone (with or without prior programming knowledge)
can use this software, thereby making it more user-friendly.
CODE
"""
Delhi Technological University
3rd semester Signal and System project
Project: Speech recognition using python
Created by:
Abhivyakt sharma (2K20/EC/010)
Aditya Gupta (2K20/EC/013)
Mentor:
Professor MS Chaudhary
"""
'''-----------------------------------------
Program-----------------------------------------------------'''
"""
Importing two Libraries are:
1.speech_recognition for speech recognition and
2.tkinter for Graphical user interface(GUI) Development
"""
import speech_recognition
from tkinter import *
'''
Here we have created a function/method named speech_recognize()
which will work for speech recognition
'''
'''---------------------------------Speech Recognising
function--------------------------------------------'''
recognizer = speech_recognition.Recognizer()
def speech_recognizer():
global recognizer
try:
# Here, we are defining our program that from where it
will take voice input, which is Microphone here
with speech_recognition.Microphone() as mic:
# Here, program is reducing the noise and cleaning
the voice so that it can easily analyse the frequency
recognizer.adjust_for_ambient_noise(mic, 0.2)
audio = recognizer.listen(mic)
'''Here, program is using module to find word,
sentence and letters which make sense
and here it is using google's recognising module'''
text = recognizer.recognize_google(audio)
text = text.lower()
# Here, the label is element to represent then
result in our GUI Application
label = Label(text="Recognized: " + text)
label.place(x=100, y=100)
# Here, except will generate a result that program unable
hear or understand the voice
except:
label = Label(text="Unable to understand, please say
again")
label.place(x=110, y=100)
'''-------------------------------------Creating GUI: Graphical
User Interface---------------------------------'''
# Creating a new window and configurations
window = Tk()
window.title("Speech Recognition Demo")
window.minsize(width=500, height=200)
# Labels
label = Label(text="Click on Start for Speech Recognition Demo")
label.place(x=100, y=20)
# Creating button that will use speech_recognizer function to
generate the result
button = Button(text="Start", height=2, width=10,
command=speech_recognizer)
button.place(x=110, y=50)
window.mainloop()
'''Caution: This program sometimes doesn't response due to the
heavy load'''