
A PROJECT REPORT ON

“DEVID : A Voice Assistant using Python”

SUBMITTED TO MIT SCHOOL OF ENGINEERING


IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE AWARD OF
THE DEGREE

BACHELOR OF TECHNOLOGY
(Computer Science & Engineering)

BY

Candidate Name: Aditya Kulkarni, Enrollment No: MITU21BTCS0036
Candidate Name: Dev Patel, Enrollment No: MITU21BTCS0184
Candidate Name: Keya Karkun, Enrollment No: MITU21BTCS0283
Candidate Name: Vedashree Bhalerao, Enrollment No: MITU21BTCS0710

Under The Guidance of


Dr. Shraddha Phansalkar

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


MIT School of Engineering
MIT Art, Design and Technology University
Rajbaug Campus, Loni-Kalbhor, Pune 412201
2022-23
MIT SCHOOL OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

MIT ART, DESIGN AND TECHNOLOGY UNIVERSITY, RAJBAUG CAMPUS,
LONI-KALBHOR, PUNE 412201

CERTIFICATE

This is to certify that the Project Entitled

“DEVID : A Voice Assistant using Python”


Submitted by

Candidate Name: Aditya Kulkarni, Enrollment No: MITU21BTCS0036
Candidate Name: Dev Patel, Enrollment No: MITU21BTCS0184
Candidate Name: Keya Karkun, Enrollment No: MITU21BTCS0283
Candidate Name: Vedashree Bhalerao, Enrollment No: MITU21BTCS0710

is a bonafide work carried out by them under the supervision of Dr. Shraddha Phansalkar and is submitted towards the partial fulfillment of the requirements of MIT ADT University, Pune for the award of the degree of Bachelor of Technology (Computer Science and Engineering).

Dr. Shraddha Phansalkar, Internal Guide, Department of CSE
Prof. Ganesh Pathak, H.O.D., Department of CSE
Dr. Rajneeshkaur Sachdeo, Director, MIT SoE
CERTIFICATE

This is to certify that the Project Entitled

Devid : A Voice Assistant Using Python

Submitted by

Aditya Kulkarni MITU21BTCS0036
Dev Patel MITU21BTCS0184
Keya Karkun MITU21BTCS0283
Vedashree Bhalerao MITU21BTCS0710

is a bonafide work carried out by them under the supervision of Dr. Shraddha
Phansalkar and has been completed successfully.

Dr. Shraddha Phansalkar

Guide,
Department of CSE
MIT School of Engineering
DECLARATION

We, the team members

Aditya Kulkarni MITU21BTCS0036
Dev Patel MITU21BTCS0184
Keya Karkun MITU21BTCS0283
Vedashree Bhalerao MITU21BTCS0710

hereby declare that the project work incorporated in the present project entitled “DEVID : A Voice Assistant using Python” is original work. This work (in part or in full) has not been submitted to any university for the award of a degree or a diploma. We have properly acknowledged the material collected from secondary sources wherever required. We solely own the responsibility for the originality of the entire content.

Date: 01/12/2022

Aditya Kulkarni MITU21BTCS0036
Dev Patel MITU21BTCS0184
Keya Karkun MITU21BTCS0283
Vedashree Bhalerao MITU21BTCS0710

Dr. Shraddha Phansalkar

Seal/Stamp of the college

Place: Pune
Date: 01/12/2022
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

MIT SCHOOL OF ENGINEERING


RAJBAUG, LONI KALBHOR
PUNE – 412201

EXAMINER’S APPROVAL CERTIFICATE

The project report entitled “DEVID : A Voice Assistant using Python” submitted by Aditya Kulkarni (MITU21BTCS0036), Dev Patel (MITU21BTCS0184), Keya Karkun (MITU21BTCS0283) and Vedashree Bhalerao (MITU21BTCS0710) in partial fulfillment for the award of the degree of Bachelor of Technology (Computer Science & Engineering) during the academic year 2022-23, of MIT-ADT University, MIT School of Engineering, Pune, is hereby approved.

Examiners

Examiner 1 Name and Signature :

Examiner 2 Name and Signature:


Acknowledgments

It gives us great pleasure to present the project report on “DEVID : A Voice Assistant using Python”.

We would like to express our humble gratitude to our mentor Dr. Shraddha Phansalkar as well as our principal Dr. Kishore Ravande, who gave us this golden opportunity to work on this interesting project. This project helped us gain in-depth knowledge of natural language processing, which is the future of this fast-paced world. It also helped us analyze the pros and cons of a particular application and its market viability. Our analysis provides valuable information that will help upcoming app developers plan meticulously how their applications should be designed. All members contributed greatly to the timely completion of this project, which would have been difficult without the assistance of our guide.

Aditya Kulkarni

Dev Patel

Keya Karkun

Vedashree Bhalerao
(B.Tech. Computer Science & Engineering)
Abstract

⮚ Devid, an intelligent voice assistant, is software that can perform tasks or services for an individual based on commands or questions.

⮚ A voice assistant is a user interface that allows hands-free operation of a digital device. Basic voice-controlled operations do not require an internet connection to work.

⮚ It recognizes the speech of the user.

⮚ It displays the recognized command on the screen and performs the assigned task.

⮚ Users can easily interact with the system, which gives instant, computed results.

⮚ Its competitors are Alexa, Siri, Google Assistant and Cortana.


Contents

Certificate

Certificate from company if any

Declaration

Examiner’s Approval Certificate

Acknowledgement

Abstract

List of Figures

List of Tables

1 Introduction
1.1 Relevance
1.2 Motivation of the Project
1.3 Problem statement
1.4 Objectives
1.5 Scope
2 Literature Survey
2.1 Related Work
2.2 Comparison of existing work

3 Software Requirement Specification
3.1 Introduction
3.2 Purpose and Scope of Document
3.3 General Description
3.4 Functional Requirements
3.5 Performance Requirements
3.6 Design Constraints
3.7 Non-Functional Attributes
3.8 Non-Functional Requirements

4 Project Design and Implementation
4.1 Architectural Diagram
4.2 Methodology
4.3 Usage Scenario
4.3.1 User profiles
4.3.2 Use-cases
4.3.3 Use Case View
4.4 Data Model and Description
4.4.1 Data Description
4.4.2 Data objects and Relationships
4.4.3 Data Flow Diagram
4.4.4 Activity Diagram
4.5 Sequence Diagram

5 System Testing

6 Conclusion and Future Scope

7 References

List of Figures

4.1 Architecture diagram
4.2 Use case diagram
4.3 Use case View
4.4 Activity diagram
4.5 Sequence Diagram

List of Tables

4.1 Use Cases
CHAPTER 1

INTRODUCTION
1.1 RELEVANCE

The main relevance of our topic, and the reasons behind it, are that a voice assistant:
• Saves time by automating repetitive tasks
• Aids hands-free operation
• Enables a highly engaging user experience
• Makes the application frustration-free

1.2 MOTIVATION OF THE PROJECT

We need software that carries out everyday tasks via voice command. It brings AI and machine
learning together to recognize our voice and do what we ask it to do.

1.3 PROBLEM STATEMENT

Creating a VOICE ASSISTANT using Python:

The rise of automation, along with increased computational power and improved access to data, has resulted in the birth of the personal digital assistant market. Users find it more convenient to speak than to type what they want, and a voice assistant serves exactly this purpose.

1.4 OBJECTIVES

This project aims to develop a personal assistant for Windows systems. The main purpose of the software is to perform the user's tasks on command, where commands are given either by speech or by text. It eases most of the user's work, since a complete task can be done with a single command. Devid draws its inspiration from voice assistants such as Cortana for Windows and Siri for iOS.

1.5 SCOPE

Presently, Devid is being developed as an automation tool and virtual assistant. In future, it could further improve work processes, customer loyalty and sales, and offer suggestions to customers. Devid can also work in offline mode to carry out some basic operations required by the user. It can fetch requested information from the web and can also display the current date and time when required.
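
Displaying the current date and time needs only the system clock, which is why it is one of the offline operations. A minimal sketch, assuming a hypothetical helper named `describe_now` (not taken from the project's code) and Python's standard `datetime` module:

```python
from datetime import datetime
from typing import Optional


def describe_now(now: Optional[datetime] = None) -> str:
    """Return a date-and-time sentence to display and speak; needs no internet."""
    if now is None:
        now = datetime.now()  # read the local system clock
    date_part = now.strftime("%A, %d %B %Y")          # e.g. "Thursday, 01 December 2022"
    time_part = now.strftime("%I:%M %p").lstrip("0")  # 12-hour clock, e.g. "9:05 AM"
    return f"Today is {date_part} and the time is {time_part}."


if __name__ == "__main__":
    print(describe_now())
```

Passing a fixed `datetime` makes the output predictable for testing; the project itself may word its message differently.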
CHAPTER 2

LITERATURE SURVEY
2.1 RELATED WORK
Bassam A., Raja N. et al. have written about speech as a means of communication between humans and machines. The analog speech signal is converted into a digital waveform that the machine can process. The technology is used massively, has a wide range of uses, and allows machines to respond to users' commands and voices. Speech recognition systems are growing day by day.

B.S. Atal and L.R. Rabiner et al. explain speech analysis, a theory that continues to evolve. Their research describes a pattern recognition technique for deciding whether a segment of the speech input is voiced speech, unvoiced speech or silence. The decision depends entirely on the measurements made on the signal. The system has restrictions, the main one being that the algorithm must be trained on the exact set of measurements chosen and for the particular recording conditions.
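
The pattern recognition approach above labels each short segment of the signal. As a much simpler illustration, the sketch below applies fixed thresholds to two classic measurements, short-time energy and zero-crossing rate, instead of Atal and Rabiner's trained statistical classifier. The function names and threshold values are assumptions chosen for illustration; like the original method, fixed thresholds only suit the recording conditions they were tuned for.

```python
import math
from typing import Sequence


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of neighbouring sample pairs that change sign (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame: Sequence[float],
                   energy_threshold: float = 1e-3,
                   zcr_threshold: float = 0.3) -> str:
    """Label a frame 'silence', 'unvoiced' or 'voiced' using two fixed thresholds.

    The threshold values are illustrative assumptions, not figures from Atal and Rabiner.
    """
    if short_time_energy(frame) < energy_threshold:
        return "silence"   # almost no signal power
    if zero_crossing_rate(frame) > zcr_threshold:
        return "unvoiced"  # noise-like sounds such as "s" or "f" cross zero often
    return "voiced"        # periodic, low-frequency vocal-fold vibration


if __name__ == "__main__":
    rate, n = 8000, 200  # 25 ms frames at 8 kHz
    frames = {
        "silence": [0.0] * n,
        "hiss": [0.3 if i % 2 == 0 else -0.3 for i in range(n)],
        "vowel": [0.5 * math.sin(2 * math.pi * 150 * i / rate) for i in range(n)],
    }
    for name, frame in frames.items():
        print(name, classify_frame(frame))
```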

V. Radha and C. Vimala et al. explain that speech is the most natural way for humans to communicate. Speech recognition makes it easier for machines to identify what humans say, which has made automatic speech recognition a popular research area. Some of the most widely used speech recognition techniques are Dynamic Time Warping (DTW) and Hidden Markov Models (HMM). For feature extraction, Mel Frequency Cepstral Coefficients (MFCC) provide a set of feature vectors that describe the speech waveform. Studies have shown that MFCC is more accurate than other feature extraction approaches in speech recognition. The research was carried out in MATLAB, and the results show that the system identifies words with satisfactory accuracy.
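
Dynamic Time Warping compares an utterance with stored templates even when the two are spoken at different speeds. A minimal sketch of classic DTW with nearest-template recognition; the one-number-per-frame features and the `yes`/`no` templates are made-up stand-ins for MFCC vectors (real MFCC frames would use a vector distance such as Euclidean instead of `abs`):

```python
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two feature sequences (O(len(a) * len(b)))."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])          # distance between two frames
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(utterance: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the word whose stored template is closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


if __name__ == "__main__":
    templates = {"yes": [1, 3, 4, 3, 1], "no": [4, 4, 1, 1, 1]}
    spoken = [1, 1, 3, 4, 4, 3, 1]  # "yes" said more slowly
    print(recognize(spoken, templates))
```

The slowed-down utterance still aligns with the `yes` template at zero cost, because DTW is allowed to repeat frames.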

T. Schultz and A. Waibel et al. discuss the spread of speech technology products around the world. The research addresses how to port large vocabulary continuous speech recognition (LVCSR) systems to new languages quickly and efficiently. This requires estimating acoustic models for new target languages using speech data from different source languages, because only limited data is available in the target language. Recognition results using language-dependent, language-independent and language-adaptive acoustic models are discussed in the framework of the GlobalPhone project, which investigates LVCSR systems in 15 languages.

J. B. Allen et al. describe language as the most significant means of communication, with speech as its major interface. To create an interface between humans and machines, speech signals are converted into analog and digital waveforms that the machine can understand. Speech technologies today allow machines to respond appropriately to human speech and to offer valuable services. The research describes the speech recognition process, its basic model, its applications and techniques, and several other research techniques that speech recognition systems (SRS) require. SRS is an emerging technology whose importance is growing steadily and which has countless applications.

Mugdha Bapat, Pushpak Bhattacharyya et al. described a morphological analyzer for almost all Indian languages. In the initial phase, the plan was to some extent to use a homomorphic, “bootstrappable” encryption technique. The research proved to be a great success for the Marathi language, using Finite State Systems to represent the language in a sophisticated way. Since Marathi has very complex morphotactics, the use of finite state automata (FSA) is of significant help.

G. Muhammad, M.N. Huda et al. presented an ASR model for Bangla digits. For this research, data was collected from the general Bangladeshi public. Mel-frequency cepstral coefficients (MFCCs) and a hidden Markov model (HMM) were used for recognition. The experiments found that digits spoken by female speakers were recognized with higher accuracy than those spoken by male speakers.

Sean R. Eddy et al. researched Hidden Markov Models, a common statistical modeling approach for problems involving sequences or time series. These methods are used extensively in speech recognition. The HMM formalism makes it possible to relate fully probabilistic techniques to profiles and gapped sequence alignments. A consistent theory for insertion and deletion penalties and a consistent framework for combining structural and sequence information are among the popular contributions of HMMs. They also make sequence alignments more refined and give satisfactory alignments for difficult threading (inverse folding) techniques for proteins.
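
HMM-based recognition scores an observation sequence by summing over every possible path through the hidden states. A minimal sketch of the forward algorithm for a discrete HMM; the two states, the `high`/`low` energy symbols and all probabilities are invented for illustration and do not come from the cited papers:

```python
from itertools import product
from typing import Dict, Sequence

Probs = Dict[str, float]


def forward_probability(observations: Sequence[str],
                        start_p: Probs,
                        trans_p: Dict[str, Probs],
                        emit_p: Dict[str, Probs]) -> float:
    """P(observations | model) for a discrete HMM, computed with the forward algorithm."""
    states = list(start_p)
    # alpha[s] = P(o_1 .. o_t, hidden state at time t = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: emit_p[s][obs] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())


# Toy two-state model: states, symbols and numbers are made up for illustration.
START = {"voiced": 0.5, "silence": 0.5}
TRANS = {"voiced": {"voiced": 0.8, "silence": 0.2},
         "silence": {"voiced": 0.3, "silence": 0.7}}
EMIT = {"voiced": {"high": 0.9, "low": 0.1},
        "silence": {"high": 0.2, "low": 0.8}}

if __name__ == "__main__":
    print(forward_probability(["high", "low"], START, TRANS, EMIT))
    # Sanity check: the probabilities of all length-3 sequences sum to 1.
    print(sum(forward_probability(seq, START, TRANS, EMIT)
              for seq in product(["high", "low"], repeat=3)))
```

The work grows linearly with sequence length rather than exponentially with the number of state paths, which is what makes HMMs practical for speech.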
2.2 COMPARISON OF EXISTING WORK

Several virtual assistants based on Artificial Intelligence technology already exist on the market. Many companies have used dialogue-system technology to build various kinds of Virtual Personal Assistants (VPAs) for their own applications and areas, such as Microsoft’s Cortana for Windows, eSpeak for Linux, Siri for Apple devices and Google Assistant for Android. The first digital virtual assistant installed on an Apple smartphone was Siri, introduced as a feature of the iPhone in 2011. It was designed to help with tasks such as sending a text message, making phone calls, checking the weather or setting an alarm. Over time, it has developed to provide restaurant locations, search the internet and give driving directions. In 2014, Microsoft developed the Cortana virtual assistant. Cortana uses the Bing search engine to perform tasks such as answering users’ questions and setting reminders. In 2016, Google developed Google Assistant. It is primarily available on mobile phones and smart home devices, and can be used via chat in Google’s messaging app and via voice on Google’s smart home speaker.

Microsoft developed Cortana as a personal virtual assistant for Windows, iOS, Android, etc. On the Windows operating system, Cortana works only on Windows 10, for which it was released in 2015. In Windows 10, Cortana appears as an icon on the taskbar next to the search bar, and it must be set up and activated before it can be used on a laptop or PC. It is easy to search with, but setting it up is time-consuming. Because it works only on Windows 10, it is not helpful on other Windows versions such as Windows 7 and 8. Therefore, we aim to build a personal virtual assistant that can run on any of these Windows versions, such as Windows 7, 8 and 10. In this project we use Python as the programming language and PyCharm as the platform on which we execute the code for the virtual assistant. We package the personal virtual assistant application as an .exe file that is easy to install on any laptop or PC and use for showing the date and time, managing emails, playing music and videos, opening apps, etc. Users can also train or update our virtual assistant to perform tasks according to their own needs.
CHAPTER 3

SOFTWARE REQUIREMENT
SPECIFICATION
3.1 INTRODUCTION
Python is well suited to scripting and development, and the queries handled by the assistant can be adapted to the user’s needs.
Speech recognition is the process of converting audio into text. It is commonly used in voice assistants such as Alexa and Siri. The SpeechRecognition library for Python provides an interface for converting audio into text for further processing, and it can also be used to convert large or long audio files into text.
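
Long recordings are usually transcribed in pieces, since free web recognizers limit how much audio they accept per request. A sketch assuming the third-party `SpeechRecognition` package (imported as `speech_recognition`), a WAV input file, Google's web recognizer and an `en-IN` language code; the helper names and the 30-second chunk size are assumptions, not part of the project:

```python
import wave
from typing import List


def chunk_durations(total_seconds: float, chunk_seconds: float = 30.0) -> List[float]:
    """Split a recording length into consecutive chunk lengths (in seconds)."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    durations = []
    remaining = total_seconds
    while remaining > 0:
        step = min(chunk_seconds, remaining)
        durations.append(step)
        remaining -= step
    return durations


def transcribe_long_wav(path: str, chunk_seconds: float = 30.0,
                        language: str = "en-IN") -> str:
    """Transcribe a long WAV file chunk by chunk with the SpeechRecognition library."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    with wave.open(path, "rb") as wav:
        total_seconds = wav.getnframes() / wav.getframerate()

    recognizer = sr.Recognizer()
    pieces = []
    with sr.AudioFile(path) as source:
        for duration in chunk_durations(total_seconds, chunk_seconds):
            # Each record() call continues from where the previous one stopped.
            audio = recognizer.record(source, duration=duration)
            try:
                # Google's free web API: needs internet; RequestError propagates on network failure.
                pieces.append(recognizer.recognize_google(audio, language=language))
            except sr.UnknownValueError:
                pieces.append("[unintelligible]")
    return " ".join(pieces)
```

Only `transcribe_long_wav` needs the third-party package, so the chunking helper can be tested on its own.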

3.2 PURPOSE OF DOCUMENT

Presently, Devid is being developed as an automation tool and virtual assistant. The various roles played by Devid include:

• Reading the newspaper
• Searching the web
• Playing a music file
• Running any program or application
• Getting weather updates
• Working offline as well as online

3.3 GENERAL DESCRIPTION

The requirements that specify which services a system provides to the end user are called functional requirements. They define exactly what functions the system performs. The functional requirements are closely related to the user requirement specifications.
This may include calculations, data processing, technical operations and other such
functionality that aim to fulfill the application objectives. These are captured in the form of
use cases, which are the system responses to events by external agents or internal
deadlines. Any tracking operations, legal requirements, interface details, authorization
levels, transaction updates and cancellations, and administrative functions come under
functional requirements. The technical architecture of the system is determined by these
requirements.
3.4 FUNCTIONAL REQUIREMENTS

The functional requirements of this project are:

• It should display a pleasant user interface for the customer to understand the
system processes.

• It should listen to the spoken commands and recognize the words heard.

• It should provide appropriate directions to the user for ease of use.

• It should acknowledge the recognition process through some animation and communication.

• It should be able to perform the tasks that a user requires through Python
automation.

• It should refresh or reload after every command and clear any extra cache memory used.

• It should maintain suitable time limits for execution while recording audio or providing results.

• It should deliver error messages whenever needed.
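
The time-limit and error-message requirements above can be met by wrapping each command handler. A minimal sketch using the standard `concurrent.futures` module; `run_with_time_limit` and the message wording are hypothetical:

```python
import concurrent.futures
import time
from typing import Callable


def run_with_time_limit(task: Callable[[], str], seconds: float) -> str:
    """Run one command handler; return its result or a user-facing error message."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(task)
    try:
        return future.result(timeout=seconds)
    except concurrent.futures.TimeoutError:
        return "Sorry, that took too long. Please try again."
    except Exception as exc:  # any failure inside the handler becomes a message
        return f"Sorry, something went wrong: {exc}"
    finally:
        # Don't wait for a stuck task; Python threads cannot be killed, so it ends on its own.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    print(run_with_time_limit(lambda: "Opening Google", 5))
    print(run_with_time_limit(lambda: time.sleep(2) or "too late", 0.5))
```

For the recording step itself, SpeechRecognition's `Recognizer.listen()` accepts `timeout` and `phrase_time_limit` arguments and raises `speech_recognition.WaitTimeoutError` when no speech starts in time.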

3.5 PERFORMANCE REQUIREMENTS

Performance is assessed using the following specifications:

• Response time:
It is the time taken by the system to accept user input and respond to it by displaying some output. Usually, feedback messages are displayed within 1 second, which is itself a noticeable delay. A maximum of 10 seconds for a dialog window ensures that the user does not lose interest or their train of thought. The response time must also be consistent and must not vary with the number of concurrent sessions.

• Workload:
It is the amount of stress or work that the system can hold at once. This
could be in terms of parallel sessions, number of active users or number of
database transactions. The workload is usually described as the scenarios that users
will most likely encounter. Special cases like error scenarios, backups and
management requests should be taken into consideration while specifying the
workload.

• Throughput:
The number of samples or bytes of data that are processed per second is
referred to as throughput. The data processing rate should be as high as possible to
ensure that the outputs are consistent and the user sustains interest in using the
system.
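
Response time and throughput can be checked with a small timing harness. This sketch rests on assumptions: it measures whole commands per second rather than samples or bytes per second, and `measure` and `PerfReport` are hypothetical names; only the 10-second ceiling comes from the text above.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class PerfReport:
    mean_response_s: float
    max_response_s: float
    commands_per_s: float
    within_limit: bool


def measure(handler: Callable[[str], object], commands: Iterable[str],
            max_response_s: float = 10.0) -> PerfReport:
    """Time a handler over sample commands: response time and throughput."""
    latencies = []
    start = time.perf_counter()
    for command in commands:
        t0 = time.perf_counter()
        handler(command)
        latencies.append(time.perf_counter() - t0)  # response time for this command
    elapsed = time.perf_counter() - start
    if not latencies:
        raise ValueError("need at least one command")
    worst = max(latencies)
    return PerfReport(
        mean_response_s=statistics.mean(latencies),
        max_response_s=worst,
        commands_per_s=len(latencies) / elapsed if elapsed > 0 else float("inf"),
        within_limit=worst <= max_response_s,  # the 10-second ceiling from 3.5
    )
```

Reporting the worst case alongside the mean matters because the text asks for consistent response times, not just a good average.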
3.6 DESIGN CONSTRAINTS

1. The load placed on the database when tasks are executed offline still needs to be addressed.

2. Speech that is spoken quickly is sometimes recognized incorrectly; this needs to be solved.

3. In online mode, some applications linked to the voice assistant occasionally fail to open; this will be investigated.

4. Occasionally the assistant is activated repeatedly even though the user has not requested it.

Considering the vast scope of the project and its purpose, the team is determined to fill these gaps, make the project as accurate and swift as possible, and deliver it to users.
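
For constraint 4, one possible mitigation (not the project's confirmed fix) is a cooldown that ignores an activation arriving too soon after the previous one. A sketch with a hypothetical `ActivationGate` class and an assumed 2-second window; the clock is injectable so the behaviour can be tested:

```python
import time
from typing import Callable, Optional


class ActivationGate:
    """Accept an activation only if the cooldown has passed since the last accepted one."""

    def __init__(self, cooldown_s: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._last: Optional[float] = None

    def allow(self) -> bool:
        now = self.clock()
        if self._last is not None and now - self._last < self.cooldown_s:
            return False  # too soon: treat as an unrequested repeat trigger
        self._last = now
        return True


if __name__ == "__main__":
    gate = ActivationGate()
    print([gate.allow() for _ in range(3)])  # back-to-back triggers
```

`time.monotonic` is used rather than wall-clock time so that system clock changes cannot reopen or block the gate.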

3.7 NON-FUNCTIONAL ATTRIBUTES

EXTENSIBILITY :

The design principle that determines the ability of a system to be extended is called
extensibility. The extension can be an added functionality or modification of existing functionality.
Overall, the system is enhanced while not affecting existing working functions. A light software
framework can easily be extended, as small changes can be added with ease. To add new functionality to
this project, it is sufficient to add statements in existing modules. Since every major aspect of the project
has been implemented in separate files, it is easy to add new features even as a new file. The project is
compact and has lightweight processes, ensuring integration with new components with ease.
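
One way to get the "new feature as a new file" property described above is a registry that each feature file adds itself to. A sketch under that assumption; `skill`, `handle` and the keyword-matching rule are hypothetical, not the project's actual module layout:

```python
from typing import Callable, List, Tuple

Handler = Callable[[str], str]
_SKILLS: List[Tuple[Tuple[str, ...], Handler]] = []  # (keywords, handler) pairs


def skill(*keywords: str) -> Callable[[Handler], Handler]:
    """Decorator: register a handler for commands containing any of the keywords."""
    def register(func: Handler) -> Handler:
        _SKILLS.append((tuple(k.lower() for k in keywords), func))
        return func
    return register


def handle(command: str) -> str:
    """Send the command to the first registered skill whose keyword appears in it."""
    text = command.lower()
    for keywords, func in _SKILLS:
        if any(k in text for k in keywords):
            return func(text)
    return "Sorry, I don't know how to do that yet."


# A new feature can live in its own file (e.g. a hypothetical skills/greeting.py);
# importing that file runs the decorator and registers the skill.
@skill("hello", "good morning")
def greet(_: str) -> str:
    return "Hello! How can I help you?"
```

Existing skills are untouched when a new one registers, which is the extensibility property the paragraph describes.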

MAINTAINABILITY :

Maintainability is the ease with which users of a system can modify the system and correct
any defects. It also determines ways to add new features, maximize efficiency and error recovery.
Usually continuous improvement is required for a system to be maintainable. Maintainability is closely
related to extensibility. Since the system in this project is modular, it can be maintained easily and
updated.

SECURITY :

The resilience to potential harm caused by malicious intent or viruses sums up security. One of the major drawbacks of Google Assistant is that every customer recording is stored in a repository and can be accessed and linked to the exact customer. This creates the possibility of data leaks. Many people are not aware of this but blindly agree to the terms and conditions. To avoid this, this project does not store any user data or recordings anywhere. It works on a single-command basis, making sure there are no threats to the application or other user data.

USABILITY :

Usability describes how easy or difficult it is to learn to operate the system. This is often measured in learning time or similar metrics. It defines the user experience across software and environments. It should not be confused with user-friendliness, which relates to the accessibility of the application. Usability is more closely related to how fast users can learn to use the system, how easily they can regain proficiency after a long period without using it, and how efficiently they can carry out desired tasks once they have learned the software. It also includes how pleasant the UI design looks and the satisfaction the user gets while using it. With the simple UI and prompt instructions of this system, anyone can learn and use it within just a couple of trials.

3.8 NON-FUNCTIONAL REQUIREMENTS

The requirements that describe how the system runs are the non-functional
requirements. These requirements determine the way the system behaves and the
constraints on it. These criteria are used to judge the system in terms of performance,
reliability and security.
CHAPTER 4

PROJECT DESIGN AND IMPLEMENTATION
4.1 ARCHITECTURAL DIAGRAM

Figure 4.1: Architecture diagram

4.2 METHODOLOGY

Knowledge abstraction involves three phases: gathering, manipulation and augmentation. In the first step, data gathering, a knowledge base is generated: a set of questions is prepared, which the voice assistant answers. The second step is maintaining a database, which is done on the system.

Responses are the outputs of intents: each time a user entry corresponds to an intent, that intent's response is triggered. Responses can take several forms, the most common being simple text. Entities correspond to items such as geographical locations, numbers or dates. These are popular in the context of customer service and were considered while working on the project.
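
The intent, response and entity ideas above can be illustrated with keyword matching and regular expressions. This is a hedged sketch: the intent names, keyword lists, response templates and entity patterns are all invented for illustration, and a production assistant would more likely use a trained language-understanding model.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical intents; order matters because "timer" also contains "time".
INTENTS: Dict[str, List[str]] = {
    "weather": ["weather", "temperature"],
    "set_timer": ["timer"],
    "time": ["time", "date"],
}
RESPONSES: Dict[str, str] = {
    "weather": "Fetching the weather for {location}.",
    "set_timer": "Setting a timer for {number} minutes.",
    "time": "Here is the current date and time.",
}


@dataclass
class Parse:
    intent: Optional[str]
    entities: Dict[str, str] = field(default_factory=dict)


def parse(utterance: str) -> Parse:
    """Match an intent by keyword and pull out number, date and location entities."""
    text = utterance.lower()
    intent = next((name for name, words in INTENTS.items()
                   if any(w in text for w in words)), None)
    entities: Dict[str, str] = {}
    if m := re.search(r"\b(\d+)\b", text):
        entities["number"] = m.group(1)
    if m := re.search(r"\b(today|tomorrow|yesterday)\b", text):
        entities["date"] = m.group(1)
    if m := re.search(r"\bin ([A-Z][a-zA-Z]+)", utterance):  # capitalised word after "in"
        entities["location"] = m.group(1)
    return Parse(intent, entities)


def respond(utterance: str) -> str:
    """Trigger the response template of the matched intent."""
    result = parse(utterance)
    if result.intent is None:
        return "Sorry, I did not understand that."
    try:
        return RESPONSES[result.intent].format(**result.entities)
    except KeyError:  # the template needs an entity the user did not give
        return "Could you give me more details?"
```

Extra entities (such as the date in a weather question) are simply ignored by `str.format`, so one parse can serve several templates.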

We found it better to use a mixture of both speech and on-screen display: the agent tells the user what it is showing but does not read out the actual content, because hearing the agent speak for a long time is otherwise annoying.
4.3 USAGE SCENARIO

The voice assistant can be used in various ways. It is to be tested with the voices of both male and female users. The speed at which input is given, i.e. how fast the user speaks, also matters. Operations are to be performed both online and offline, and the voice assistant can be tested in a crowded place as well as in a personal space. Many more usage scenarios are involved in the project.

4.3.1 User profiles

Students: Pursuing education, from different sections of society.

Teachers: University and institutional faculty, especially in research departments.

Elderly: Aged people who prefer not to type much.

Physically challenged: Those who need to speak in order to give commands.

4.3.2 Use-cases

Table 4.1: Use Cases


4.3.3 Use Case View
4.4 DATA MODEL AND DESCRIPTION

4.4.1 Data Description

The files already existing on the system must be maintained in order to carry out the offline operations; for example, we have included the music files and the files that display the date and time. The default browser needs to be carefully selected, and we will use the latest version of Google Chrome for the online operations. Greeting messages and a message at the end of execution have been added for a better user experience.
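
A sketch of the two data dependencies above, locally stored music for offline use and a preferred browser for online use, using only the standard library. The folder layout, extension list and function names are assumptions:

```python
import random
import webbrowser
from pathlib import Path
from typing import List, Optional

AUDIO_EXTENSIONS = {".mp3", ".wav", ".ogg", ".flac", ".m4a"}


def is_audio(name: str) -> bool:
    return Path(name).suffix.lower() in AUDIO_EXTENSIONS


def find_music(folder: str) -> List[Path]:
    """All audio files under a local folder, so music playback works offline."""
    root = Path(folder).expanduser()
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file() and is_audio(p.name))


def pick_song(folder: str) -> Optional[Path]:
    songs = find_music(folder)
    return random.choice(songs) if songs else None


def open_in_chrome(url: str) -> bool:
    """Prefer Chrome for online operations; fall back to the system default browser."""
    try:
        browser = webbrowser.get("chrome")  # raises webbrowser.Error if Chrome cannot be located
    except webbrowser.Error:
        browser = webbrowser.get()
    return browser.open(url)
```

On Windows, `os.startfile(path)` opens the chosen song in the default media player.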

4.4.2 Data objects and Relationships

Data objects, their major attributes and the relationships among data objects are described using an ERD-like form as shown below:
4.4.3 Data Flow Diagram

Level 0 Data Flow Diagram

4.4.4 Level 1 Data Flow Diagram


4.4.5 Activity Diagram:

A description of each software function is presented.


4.5 SEQUENCE DIAGRAM

Sequence Diagram is an interaction diagram that details how operations are carried
out – what messages are sent and when. Following is the diagram for the voice
assistant:

The above sequence diagram shows how the answer to a question asked by the user is fetched from the internet. The audio query is interpreted and sent to the web scraper, which searches for and finds the answer. The answer is then sent back to the speaker, which speaks it to the user.
Figure 4.5: Sequence Diagram
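
The web-scraper and speaker steps in the sequence diagram can be sketched with the standard `html.parser` and `urllib` modules plus the third-party `pyttsx3` text-to-speech package. Taking a page's first non-empty paragraph as "the answer" is a simplifying assumption, as are the function names and the User-Agent string:

```python
from html.parser import HTMLParser
from typing import List
from urllib.request import Request, urlopen


class FirstParagraph(HTMLParser):
    """Collect the text of the first non-empty <p> element of a page."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._buffer: List[str] = []
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.text:
            self._inside, self._buffer = True, []

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self._inside = False
            candidate = " ".join("".join(self._buffer).split())  # collapse whitespace
            if candidate:
                self.text = candidate  # first non-empty paragraph wins

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)


def extract_answer(html: str) -> str:
    parser = FirstParagraph()
    parser.feed(html)
    parser.close()
    return parser.text


def fetch_answer(url: str, timeout: float = 10.0) -> str:
    """Web-scraper step: download a page and return its first paragraph (needs internet)."""
    request = Request(url, headers={"User-Agent": "devid-sketch"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return extract_answer(response.read().decode(charset, errors="replace"))


def speak(text: str) -> None:
    """Speaker step: read the answer aloud with the third-party pyttsx3 package."""
    import pyttsx3  # pip install pyttsx3

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
```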

The user sends commands to the virtual assistant in audio form. The command is passed to the interpreter, which identifies what the user has asked and directs it to the task executor. If the task is missing some information, the virtual assistant asks the user for it. The received information is passed back to the task, which is then accomplished. After execution, feedback is sent back to the user.
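
The interpreter-to-executor flow above, including asking the user for missing information, can be sketched as a loop over the details a task requires. `REQUIRED_DETAILS`, the task names and the `ask` callback are hypothetical; in the assistant, `ask` would speak the question and return the recognized reply, and the final message stands in for actually performing the task:

```python
from typing import Callable, Dict, List

# Hypothetical task table: the details each task needs before it can run.
REQUIRED_DETAILS: Dict[str, List[str]] = {
    "send_email": ["recipient", "message"],
    "open_website": ["site"],
}


def execute(task: str, details: Dict[str, str], ask: Callable[[str], str],
            max_attempts: int = 2) -> str:
    """Task executor: ask the user for any missing detail, then run and report back."""
    if task not in REQUIRED_DETAILS:
        return f"Sorry, I can't {task.replace('_', ' ')} yet."
    filled = dict(details)
    for name in REQUIRED_DETAILS[task]:
        attempts = 0
        while not filled.get(name):
            if attempts == max_attempts:
                return f"Sorry, I still need the {name}."
            # In the assistant, ask() would speak the question and listen for the reply.
            filled[name] = ask(f"What is the {name}?").strip()
            attempts += 1
    # Placeholder for the real action; only the feedback message is produced here.
    summary = ", ".join(f"{name}={filled[name]}" for name in REQUIRED_DETAILS[task])
    return f"Done: {task} ({summary})"
```

Capping the number of follow-up questions keeps the assistant from looping forever when recognition keeps failing.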
CHAPTER 5

SYSTEM TESTING
1. Queries from the web:

Making queries is an essential part of daily life, and this is no different for a developer working on Linux. We have addressed this essential need of every netizen by enabling our voice assistant to search the web.

2. Opening any website

3. Display time

4. Using Wikipedia
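
The four test cases above map naturally to URL builders plus a time readout. A minimal sketch with hypothetical helper names, assuming Google search, ".com" for bare site names, and English Wikipedia:

```python
import webbrowser
from datetime import datetime
from urllib.parse import quote, quote_plus


def search_url(query: str) -> str:
    """1. Queries from the web."""
    return "https://www.google.com/search?q=" + quote_plus(query.strip())


def website_url(site: str) -> str:
    """2. Opening a website: 'YouTube' -> https://www.youtube.com (assumes a .com site)."""
    site = site.strip().lower().replace(" ", "")
    if site.startswith(("http://", "https://")):
        return site
    return f"https://{site}" if "." in site else f"https://www.{site}.com"


def wikipedia_url(topic: str) -> str:
    """4. Using Wikipedia: link to the English article for a topic."""
    return "https://en.wikipedia.org/wiki/" + quote(topic.strip().replace(" ", "_"))


def route(command: str) -> str:
    """Map a test command to a URL, or to a time sentence for 3. Display time."""
    text = command.strip()
    lower = text.lower()
    if lower.startswith("open "):
        return website_url(text[len("open "):])
    if lower.startswith("wikipedia "):
        return wikipedia_url(text[len("wikipedia "):])
    if "time" in lower:
        return datetime.now().strftime("The time is %I:%M %p")
    return search_url(text)


if __name__ == "__main__":
    result = route(input("Command: "))
    if result.startswith("http"):
        webbrowser.open(result)  # opens in the default browser
    print(result)
```

If the third-party `wikipedia` package is installed, `wikipedia.summary(topic, sentences=2)` can return a short text summary to speak instead of opening the page.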
CHAPTER 6

CONCLUSION AND FUTURE SCOPE


Future work :

Devid will soon be ready to hear, identify and act on slang, whether you are pouring out anger or just having fun!

An active noise cancellation feature will be added to make the assistant easier to use and the recognition process smoother and more accurate.

In a future update, the assistant will make suggestions according to the user’s overall situation.

Conclusion :

In this report we have discussed a voice assistant developed using Python. The assistant currently works as a standalone application and performs basic tasks such as giving weather updates, streaming music, searching Wikipedia and opening applications. The functionality of the current system is limited to this application. Upcoming updates will incorporate machine learning, which will result in better suggestions, along with IoT support to control nearby devices, similar to what Amazon’s Alexa does.
CHAPTER 7

REFERENCES
Websites referred:

1. www.stackoverflow.com

2. www.pythonprogramming.net

3. www.codecademy.com

4. www.tutorialspoint.com

Books referred:

1. Python Programming - Kiran Gurbani

2. Learning Python - Mark Lutz
