
SPEECH RECOGNITION USING PYTHON

A PROJECT REPORT
SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE
AWARD OF THE DEGREE OF

BACHELOR OF TECHNOLOGY IN

ELECTRONICS AND COMMUNICATION


Submitted by:

ABHIVYAKT SHARMA

2K20/EC/010

ADITYA GUPTA

2K20/EC/013

Under the supervision of

PROF. MS CHOUDHRY
DEPT OF ELECTRONICS AND COMMUNICATION

DELHI TECHNOLOGICAL UNIVERSITY


(FORMERLY Delhi College of Engineering)
Bawana Road, Delhi-110042
MAY 2020

CANDIDATE’S DECLARATION

We, Abhivyakt Sharma (2K20/EC/010) and Aditya Gupta (2K20/EC/013), students of B.Tech., hereby declare that the project dissertation titled “SPEECH RECOGNITION USING PYTHON”, which is submitted by us to the Department of Electronics and Communication, Delhi Technological University, Delhi, in partial fulfilment of the requirement for the award of the degree of Bachelor of Technology, is original and not copied from any source without proper citation. This work has not previously formed the basis for the award of any Degree, Diploma, Associateship, Fellowship or other similar title or recognition.

(2K20/EC/010) Abhivyakt Sharma

(2K20/EC/013) Aditya Gupta

Place: Delhi

CERTIFICATE

I hereby certify that the project dissertation titled “SPEECH RECOGNITION USING PYTHON”, which is submitted by Abhivyakt Sharma (2K20/EC/010) and Aditya Gupta (2K20/EC/013), Delhi Technological University, Delhi, in partial fulfilment of the requirement for the award of the degree of Bachelor of Technology, is a record of the project work carried out by the students under my supervision. To the best of my knowledge, this work has not been submitted in part or full for any Degree or Diploma to this University or elsewhere.

Place: Delhi

PROF. MS CHOUDHRY
(Assistant Professor)
SUPERVISOR

ABSTRACT

Speech recognition is an ever-growing field in engineering technology. Its potential and areas of application are almost limitless: from being an integral part of our smartphones to aiding the disabled, there is almost no place where this technology has not already been implemented.
This project aims to bring to light the uses of this technology, its implementations, how it can be improved, and what its future scope can be.

ACKNOWLEDGEMENT

In performing our major project, we had to take the help and guidance of some respected persons, who deserve our greatest gratitude. The completion of this assignment gives us much pleasure. We would like to show our gratitude to PROF. MS CHOUDHRY, mentor for the major project, for giving us good guidelines for this report throughout numerous consultations. We would also like to extend our deepest gratitude to all those who have directly and indirectly guided us in writing this assignment.

Many people, including our classmates and team members themselves, have made valuable comments and suggestions on this proposal, which gave us the inspiration to improve our assignment. We thank all of these people for their direct and indirect help in completing our assignment.

In addition, we would like to thank the Department of Electronics and Communication, Delhi Technological University, for giving us the opportunity to work on this topic.
CONTENTS
1. TITLE PAGE
2. CANDIDATE’S DECLARATION
3. CERTIFICATE
4. ABSTRACT
5. ACKNOWLEDGEMENT
6. CONTENTS
7. INTRODUCTION
8. A BRIEF HISTORY
9. TYPES OF SPEECH RECOGNITION
10. SPEECH RECOGNITION PROCESS
11. SUMMARY
12. CONCLUSION
INTRODUCTION

Ever since the beginning of time, speech has been the most convenient and conventional means of communicating. Today this interaction is no longer limited to face-to-face conversation; we can in fact communicate with our fellow humans miles away from us by a variety of means, such as the telephone (both wired and wireless), satellite communication, voice-mail and the internet, just to name a few. With the rapid development of communication technologies, a promising speech communication technique for human-to-machine interaction has come into being. Automatic speech recognition (ASR) is the core challenge on the way to natural human-to-machine communication.
Automatic speech recognition converts a speech waveform into a discrete sequence of words by means of a machine. In the present world there are already a number of pre-existing models employing this technology; however, a number of problems arise when it comes to real-world applications. In most cases the accuracy of the model is far from that of a human listener, and its performance can degrade drastically with small modifications of the speech signal or, for that matter, a change in the speaking environment.
This technology employs complex algorithms, owing to the large variation in speech signals, and hence must represent this variability in the process. In order to achieve this, large computational strength and memory capacity are required. The goal of this project is to study the various optimization techniques for state-of-the-art ASR techniques for DSP-based embedded applications while maintaining high recognition accuracy.
A BRIEF HISTORY
The idea of such a technology was first conceived in the 1940s; however, the first model employing it was developed in 1952 at Bell Labs, and it aimed to detect a digit in a noise-free environment. The period from the 1940s to the 1950s is considered the foundational period of this technology, during which work was done on the foundational paradigms of its automation and on information-theoretic models. By the 1960s, scientists working with this technology were able to have machines recognise small vocabularies of words (10-100) based on simple acoustic-phonetic properties of speech.
It was during this period that time-normalization and filter-bank methods were developed. Since the 1970s, larger and larger vocabularies came into the picture and the models being developed became more and more robust.
The leading inventions of this era were the Hidden Markov Model (HMM) and the stochastic language model, both of which are discussed further in later sections of this report.
After being researched and developed for nearly five decades, this technology finally entered the marketplace in the early 2000s and has since found its way into our homes and become an integral part of our daily lives.
TYPES OF SPEECH RECOGNITION

Speech recognition systems can be divided into a number of classes based on their ability to recognise words and on the list of words they have. A few classes of speech recognition are:

1. ISOLATED SPEECH

This kind of speech usually involves a pause between two utterances (not to be confused with a single word).

2. CONNECTED SPEECH

This is similar to isolated speech, the only difference being that it allows separate utterances with minimal pausing between each utterance.

3. CONTINUOUS SPEECH

This allows the user to speak almost naturally; this method is also known as computer dictation.

4. SPONTANEOUS SPEECH

This is the closest that comes to natural human speech; a system with such a capability can handle stutters and is hence the most natural-sounding speech method.
SPEECH RECOGNITION PROCESS
Below is a block diagram depicting the steps involved in a speech recognition process.

Shown below is an in-depth analysis of the automatic speech recognition process.

 COMPONENTS OF SPEECH RECOGNITION SYSTEM
1. VOICE INPUT
With the help of an input device such as a microphone, the user inputs audio into the system, upon which the computer’s sound card produces the equivalent digital representation of the received audio.

2. DIGITIZATION
The whole procedure of converting the analog signal into digital form is known as digitization; it involves both sampling and quantization. Sampling refers to the process of converting a continuous-time signal into a discrete-time one, while the process of approximating a continuous range of amplitude values with a finite set of levels is called quantization. (A small numeric sketch of both steps is given at the end of this list of components.)

3. ACOUSTIC MODEL
An acoustic model is developed by taking recordings of speech and their transcriptions and using software to create statistical representations of the sounds that make up each word. It is used by a speech recognition engine to recognise speech and helps break words into phonemes.

4. LANGUAGE MODEL
Language modelling is used in natural language processing software, as in the case of text prediction applications; it also finds use in speech recognition, as it tries to capture the properties of the language and predict the next word in a sequence. The model compares the phonemes to words in its in-built dictionary. (A toy next-word prediction sketch is also given at the end of this list.)

5. SPEECH ENGINE
The speech engine is employed to convert the input audio into text; to accomplish this it uses all sorts of data, complex algorithms and statistics. The first step is to digitize the data so that it is converted into a format suitable for further processing. Once the data format is appropriate, the engine then searches for the best match, upon which the signal is recognized and displayed as a text string.
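The sampling and quantization described under DIGITIZATION, and the next-word prediction described under LANGUAGE MODEL, can be illustrated with two small Python sketches. These sketches are illustrative only and are not part of the project code; the sampling rate, the 440 Hz test tone, the 8-bit level count and the toy corpus used below are all assumptions made purely for the example.

# Sketch 1: sampling and quantization (digitization) of a test tone
import numpy as np

fs = 8000                                    # assumed sampling rate in Hz
t = np.arange(0, 0.01, 1 / fs)               # sampling: 10 ms of discrete time instants
signal = np.sin(2 * np.pi * 440 * t)         # a 440 Hz tone standing in for a speech signal

levels = 256                                 # 8-bit quantization -> 256 amplitude levels
codes = np.round((signal + 1) / 2 * (levels - 1))    # map the range [-1, 1] onto integer codes 0..255
reconstructed = codes / (levels - 1) * 2 - 1         # amplitudes the digital system actually stores

print("first samples    :", np.round(signal[:5], 4))
print("integer codes    :", codes[:5].astype(int))
print("max quant. error :", float(np.max(np.abs(signal - reconstructed))))

In the same spirit, the sketch below mimics what a language model does at its simplest: it counts which word follows which in a tiny corpus and predicts the most likely next word.

# Sketch 2: a toy bigram model predicting the next word
from collections import Counter, defaultdict

corpus = "speech recognition converts speech into text and speech into commands".split()

following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1          # count every word pair seen in the corpus

def predict_next(word):
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("speech"))    # -> 'into', since "speech into" occurs more often than "speech recognition"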

 METHODOLOGY AND TOOLS INVOLVED


1. FUNDAMENTALS OF SPEECH RECOGNITION
Speech recognition, in layman's terms, is the ability of the computer to understand what exactly the user is trying to say. To understand how this works, we need to understand the following terms.

A. UTTERANCES
The smallest component of whatever the user says is an utterance. In other words, speaking a word or a combination of words that makes sense and means something to the computer is called an utterance. Utterances, once detected, are then sent to the speech engine for further processing.

B. PRONUNCIATIONS
A speech engine uses a process to understand what a word sounds like and what it should be; this is called pronunciation.

C. GRAMMAR
Grammar is the set of rules of the language used to define the words and phrases that are going to be recognized by the speech engine.

D. ACCURACY
The performance of any speech recognition software is measurable; accuracy indicates how reliably the engine identifies an utterance.
E. VOCABULARIES
The vocabulary is the list of words that can be recognised by the speech recognition engine. It has been found that smaller vocabularies are easy for a speech engine to identify, whereas larger vocabularies are much more difficult for the engine to understand.

F. TRAINING
Training can be used by users who have difficulty in speaking or pronouncing certain words; this helps the engine better understand their speech and hence produce the desired output as one would expect.
2. TOOLS USED IN THE MAKING
A. PYTHON
B. MS-PAINT
C. VS CODE / PYCHARM
D. OFFICE 2019 (FOR DOCUMENTATION PURPOSES)
3. METHODOLOGY
As this technology is still an emerging one, not everyone is familiar with it, let alone developers working on it for the first time. While the basic functions involving simple speech synthesis and speech recognition take little to no time to understand, there are far more powerful techniques that a developer would want to understand and utilize from the get-go.
Despite considerable investment in the research and development of this technology over the course of the last 40 years, there are still many limitations that stand between us and a truly efficient speech recognition system.
Hence a good understanding of the limitations and strengths of this technology is required for developers in order to decide whether a particular application will benefit from the use of speech input/output or not.
A. SPEECH SYNTHESIS

Shown above are the steps involved in speech synthesis. Speech synthesis, in other words, is also known as text-to-speech (TTS) conversion. (An illustrative Python sketch of TTS is given at the end of this section, just before the code.)

B. SPEECH RECOGNITION
By speech recognition we mean converting speech into a text string. For this purpose we use the SpeechRecognition library, which has to be installed from the terminal using the command “pip install SpeechRecognition”. It contains practically all the tools required to recognise and understand what the user wants, be it the complex algorithms or the list of words, i.e. the vocabulary; this library has it all to make our application functional.
We have also used the tkinter library from Python; as this library comes pre-installed with Python, there is no need to install it separately. The main use of this library was to develop a Graphical User Interface (GUI) so that almost anyone (with or without prior programming knowledge) can use this software, thereby making it more user-friendly in the due course of development.
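
To illustrate the speech synthesis (text-to-speech) mentioned in part A, a minimal sketch is shown below. It is not part of the project code: it assumes the third-party pyttsx3 library (installable with “pip install pyttsx3”) as one possible offline TTS engine, and the speaking-rate value is chosen arbitrarily.

# Illustrative text-to-speech sketch (assumes the pyttsx3 package; not part of the project code)
import pyttsx3

engine = pyttsx3.init()                  # initialise the text-to-speech engine
engine.setProperty('rate', 150)          # speaking rate in words per minute (assumed value)
engine.say("Speech synthesis converts text into spoken audio")
engine.runAndWait()                      # block until the sentence has been spoken aloud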

 CODE
 """
 Delhi Technological University

 3rd semester Signal and System project
 Project: Speech recognition using python

 Created by:
 Abhivyakt sharma (2K20/EC/010)
 Aditya Gupta (2K20/EC/013)

 Mentor:
 Professor MS Chaudhary
 """

 '''-----------------------------------------
Program-----------------------------------------------------'''

 """  
 Importing two Libraries are:
 1.speech_recognition for speech recognition and
 2.tkinter for Graphical user interface(GUI) Development

 """

 import speech_recognition
 from tkinter import *

 '''
 Here we have created a function/method named speech_recognize()
 which will work for speech recognition

 '''

 '''---------------------------------Speech Recognising
function--------------------------------------------'''

 recognizer = speech_recognition.Recognizer()

 def speech_recognizer():
     global recognizer
     try:

         # Here, we are defining our program that from where it
will take voice input, which is Microphone here
         with speech_recognition.Microphone() as mic:

             # Here, program is reducing the noise and cleaning
the voice so that it can easily analyse the frequency
             recognizer.adjust_for_ambient_noise(mic, 0.2)
             audio = recognizer.listen(mic)

             '''Here, program is using module to find word,
sentence and letters which make sense
              and here it is using google's recognising module'''
             text = recognizer.recognize_google(audio)
             text = text.lower()

             # Here, the label is element to represent then
result in our GUI Application
             label = Label(text="Recognized: " + text)
             label.place(x=100, y=100)

     # Here, except will generate a result that program unable
hear or understand the voice
     except:
         label = Label(text="Unable to understand, please say
again")
         label.place(x=110, y=100)

 '''-------------------------------------Creating GUI: Graphical
User Interface---------------------------------'''

 # Creating a new window and configurations
 window = Tk()
 window.title("Speech Recognition Demo")
 window.minsize(width=500, height=200)

 # Labels
 label = Label(text="Click on Start for Speech Recognition Demo")
 label.place(x=100, y=20)

 # Creating button that will use speech_recognizer function to
generate the result
 button = Button(text="Start", height=2, width=10,
command=speech_recognizer)
 button.place(x=110, y=50)

 window.mainloop()

 '''Caution: This program sometimes doesn't response due to the
heavy load'''

Shown above is the output for our program.


 SUMMARY
We have successfully demonstrated how a speech recognition application is developed, covering all the steps involved, from obtaining the voice input to producing the final output. We have discussed the history of the technology's development, its present scenario, and its limitations and strengths.
As far as the future scope of this project is concerned, we intend to develop a voice assistant such as the ones present in our smartphones, like Google Now, Cortana, Alexa and Siri, to name a few.
Not only this, but this can also be the means to developing a strong voice-based security system for our devices.
