A Project Report
by Priyam Choksi and Neel Ghodke
MUMBAI, 400067
MAHARASHTRA
2020-21
Abstract
Virtual assistants are software programs that ease day-to-day tasks such as showing
weather reports, creating reminders and making shopping lists. This system is
designed to run efficiently on desktops. Personal assistant software improves user
productivity by managing the user’s routine tasks and by providing information from
online sources.
Acknowledgement
We take this opportunity to record our sincere thanks to all the faculty
members of the Department of BSc-IT for their help and encouragement.
Chapter 1 : Introduction
1.1 Background
1.2 Objectives
1.3 Purpose, Scope and Applicability
1.3.1 Purpose
1.3.2 Scope
1.3.3 Applicability
1.4 Achievements
1.5 Organisation of Report
Chapter 5 : Conclusion
CHAPTER 1
Introduction
In today’s era almost all tasks are digitized. A smartphone in our hands is nothing less
than having the world at our fingertips, and these days we are not even using our
fingers: we just speak the task and it is done. There are systems where we can say,
“Text Dad: I’ll be late today,” and the text is sent. That is the task of a virtual
assistant. It also supports specialized tasks such as booking a flight, or finding the
cheapest copy of a book across various e-commerce sites and then providing an
interface to place the order, helping to automate search, discovery and online ordering.
Virtual assistants are software programs that ease day-to-day tasks such as showing
weather reports, creating reminders and making shopping lists. They can take
commands via text (online chatbots) or by voice. Voice-based intelligent assistants
need an invoking word, or wake word, to activate the listener, followed by the
command. Well-known virtual assistants include Apple’s Siri, Amazon’s Alexa and
Microsoft’s Cortana. For this project, the wake word chosen is NOVA.
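The wake-word step can be sketched in Python as below, assuming the speech recogniser has already turned the audio into a text transcript. The function name extract_command is hypothetical, and matching on whole words is an illustrative choice, not the project's confirmed method: the assistant ignores any transcript that does not contain "nova" and treats the words after it as the command.

```python
import re
from typing import Optional

WAKE_WORD = "nova"  # the wake word named in this report


def extract_command(transcript: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the command that follows the wake word, or None if it is absent."""
    # Lower-case the transcript and keep only word characters (drops commas, "?", etc.).
    words = re.findall(r"[a-z']+", transcript.lower())
    if wake_word not in words:
        return None  # not addressed to the assistant, so keep listening
    # Everything after the first occurrence of the wake word is the command.
    index = words.index(wake_word)
    return " ".join(words[index + 1:])


if __name__ == "__main__":
    print(extract_command("Nova, what is the weather in Mumbai?"))
```

A None result lets the main loop go back to listening, while an empty string means the user said only the wake word and the assistant can prompt for a command.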
This project is based on AI module development and provides a personal assistant
that operates through voice recognition or text mode.
Because it integrates most of the services people use daily, it can make life more
convenient, and it will be helpful for people with disabilities that make manual
operation difficult. This is also part of the reason it was chosen as our degree
project.
In the long run, we aim to develop a complete server assistant that automates the
entire server management process (deployment, backups, auto-scaling, logging and
monitoring) and is smart enough to act as a replacement for a general server
administrator.
1.2 Objectives
5. News
6. Wikipedia Result
7. Smart Dictionary
8. Weather Report
9. COVID Tracker
17. Translator
1.3.2 Scope
Voice assistants will continue to offer more individualized experiences as they get
better at differentiating between voices. However, developers are not the only ones
who need to address the complexity of developing for voice: brands also need to
understand the capabilities of each device and integration, and whether it makes
sense for their specific brand. They will also need to focus on keeping the user
experience consistent in the coming years as complexity grows, because voice
assistants lack a visual interface. Users simply cannot see or touch a voice interface.
1.3.3 Applicability
The mass adoption of artificial intelligence in users’ everyday lives is also fueling
the shift towards voice. The growing number of IoT devices, such as smart thermostats
and speakers, is giving voice assistants more utility in a connected user’s life. Smart
speakers are currently the most common way voice is being used, and many industry
experts predict that nearly every application will integrate voice technology in
some way within the next 5 years.
The use of virtual assistants can also enhance IoT (Internet of Things) systems. It
has been predicted that, twenty years from now, Microsoft and its competitors will
offer personal digital assistants providing the services of a full-time employee,
which are usually reserved for the rich and famous.
1.4 Achievements
Chapter 2 discusses the technologies used in the project and gives a brief overview
of all the technologies required for the NOVA project.
1.5 Organisation of Report
CHAPTER 2
Survey of Technology
Python
Python offers many benefits, and its use is not limited to a single activity. Its
growing popularity has taken it into some of the most popular and complex fields,
such as Artificial Intelligence (AI), Machine Learning (ML), natural language
processing and data science. Python has libraries for every need of this project. The
libraries used here include SpeechRecognition to recognize voice, Pyttsx for
text-to-speech and Selenium for web automation.
DBpedia
The DBpedia knowledge base allows you to ask surprisingly sophisticated queries
against Wikipedia data, for instance “Give me all cities in New Jersey with more than
10,000 inhabitants” or “Give me all Italian musicians from the 18th century”.
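Queries like these are written in SPARQL and sent to DBpedia's public endpoint. The sketch below is an illustration, not part of the project's code: the endpoint URL, the `format` parameter value, and the ontology names (dbo:City, dbo:isPartOf, dbo:populationTotal) are assumptions to verify against DBpedia before use. It builds the request URL and reads values out of the standard SPARQL JSON results format, so the parsing can be checked without a network connection.

```python
import json
from urllib.parse import urlencode

# Assumed public DBpedia SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query for the first example above; class and property names are assumptions.
NJ_CITIES = """
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:isPartOf dbr:New_Jersey ;
        dbo:populationTotal ?population .
  FILTER (?population > 10000)
}
"""


def build_query_url(query: str) -> str:
    """Encode a SPARQL query as a GET URL that asks for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{ENDPOINT}?{urlencode(params)}"


def parse_bindings(response_text: str, variable: str) -> list:
    """Pull one variable's values out of a SPARQL JSON results document."""
    data = json.loads(response_text)
    return [row[variable]["value"]
            for row in data["results"]["bindings"]
            if variable in row]
```

In use, the response text would come from something like `requests.get(build_query_url(NJ_CITIES), timeout=30).text`, followed by `parse_bindings(text, "city")`.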
Quepy
Pyttsx
Pyttsx stands for Python Text to Speech. It is a cross-platform Python wrapper for
text-to-speech synthesis, supporting common text-to-speech engines on Mac OS X,
Windows and Linux. It works with both Python 2.x and 3.x. Its main advantage is that
it works offline.
Speech Recognition
This is a library for performing speech recognition, with support for several engines
and APIs, online and offline. It supports APIs such as Google Cloud Speech API, IBM
Speech to Text and Microsoft Bing Voice Recognition.
Information Retrieval
The program has two modes for accessing its services and functions. It starts in
voice mode, its primary mode, but the user can switch to text mode if voice mode is
not working well for them or if the surroundings interfere with voice recognition.
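The two-mode behaviour can be sketched as a single input function that tries voice first and falls back to typed text. The names get_command, listen and read_text are hypothetical. In a real program, `listen` could wrap SpeechRecognition's `Recognizer` and turn its `UnknownValueError` into None, and `read_text` could simply be `input`. Passing them in as parameters keeps the sketch testable without a microphone.

```python
from typing import Callable, Optional


def get_command(listen: Callable[[], Optional[str]],
                read_text: Callable[[], str],
                text_mode: bool = False) -> str:
    """Try voice input first; fall back to typed input if requested or if voice fails."""
    if not text_mode:
        heard = listen()  # returns None when nothing was understood
        if heard:
            return heard.strip().lower()
    # Text mode: the user switched modes, or the surroundings defeated recognition.
    return read_text().strip().lower()
```

Normalising both paths to lower-case text means the rest of the program handles spoken and typed commands identically.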
Libraries
gTTS
The main library used to output the assistant’s voice, using the text-to-speech
feature of Google Translate.
SpeechRecognition
The library used to recognize voice input from users.
playsound
The library used to play sound files without opening them in third-party apps.
PySide2
The library used to make the graphical user interface. PySide2 is the Qt Company’s
official Python binding for Qt and a counterpart to PyQt5. The advantage of using
PySide2 instead of PyQt5 is that the designer app used to design the app’s GUI can
generate a Python script for it.
tkinter
Used to open a dialog box; the project takes this function from tkinter rather than PySide2.
opencv
This library is used to process images, resizing them as needed.
Pillow
Another library used to process images; it works with the tkinter library when
choosing images.
requests
This library is used to request the HTML of a webpage.
beautifulsoup4
The library used to scrape webpages for information, such as explanations of
topics.
re
This library is used to find matching string(s) in a string variable. Besides
‘beautifulsoup4’ and ‘requests’, it is the most important library for getting
information about something from the web; the three are used in sequence
(requests > beautifulsoup4 > regex).
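In that chain, requests fetches the HTML, BeautifulSoup (for example `BeautifulSoup(html, "html.parser").get_text()`) turns it into plain text, and re picks out the value. The sketch below shows only the regex stage so it runs offline. The function name and the temperature pattern are hypothetical examples, not taken from the project.

```python
import re
from typing import Optional


def extract_temperature(page_text: str) -> Optional[float]:
    """Find the first temperature written like '31°C' or '-2.5 °C' in scraped text."""
    # Optional minus sign, digits, optional decimals, then the degree sign and 'C'.
    match = re.search(r"(-?\d+(?:\.\d+)?)\s*°\s*C", page_text)
    return float(match.group(1)) if match else None
```

Returning None when nothing matches lets the caller tell the user that no answer was found instead of crashing on a missing match.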
wolframalpha
The library used as the calculator module.
Pyautogui
The library used to locate an image on the screen. It is used to find the
microphone button when the user calls the assistant with its wake word, and to take
screenshots.
os and shutil
Both of these libraries are used for file management: ‘os’ mainly to make new
directories and rename files, and ‘shutil’ to copy and paste files.
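The division of labour between the two libraries can be sketched as one helper that creates a folder, copies a file into it and renames the copy. The function name organise_file is hypothetical. The calls `os.makedirs`, `shutil.copy` and `os.rename` are the standard-library functions for these steps.

```python
import os
import shutil


def organise_file(src: str, dest_dir: str, new_name: str) -> str:
    """Copy src into dest_dir (created if missing), rename the copy, return its path."""
    os.makedirs(dest_dir, exist_ok=True)      # os: make a new directory
    copied = shutil.copy(src, dest_dir)       # shutil: copy the file across
    renamed = os.path.join(dest_dir, new_name)
    os.rename(copied, renamed)                # os: rename the copy
    return renamed
```

The original file is left in place because shutil.copy copies rather than moves. `shutil.move` would be the choice for a cut-and-paste.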
time
Used to display the current time.
webbrowser
To open a link in a new tab of the PC’s default browser.
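A typical use is opening search results for a spoken query. In the sketch below, the search URL templates and the names search_url and open_search are assumptions for illustration. `webbrowser.open_new_tab` and `urllib.parse.quote_plus` are standard-library calls. The query is encoded so that spaces and symbols such as "+" survive in the URL.

```python
import webbrowser
from urllib.parse import quote_plus

# Assumed search URL templates; not taken from the project.
SEARCH_ENGINES = {
    "google": "https://www.google.com/search?q={}",
    "bing": "https://www.bing.com/search?q={}",
}


def search_url(query: str, engine: str = "google") -> str:
    """Build a search-results URL with the query safely encoded."""
    return SEARCH_ENGINES[engine].format(quote_plus(query))


def open_search(query: str, engine: str = "google") -> None:
    """Open the search results in a new tab of the default browser."""
    webbrowser.open_new_tab(search_url(query, engine))
```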
Introduction
This program is made without using machine learning, relying instead on a series of
if and else statements. The program splits the user input and then checks it against
every function that has been made, using a counter to predict whether the user wants
that function (see the examples below). If not, it passes; if so, it executes that
function. For example, the user inputs ‘tell me the weather of Mumbai now’. The
program splits the input on blank spaces, then checks every single function. If a
function’s count reaches its threshold (2 or 3, depending on the function), the
program executes that function; if the count is lower, it moves on to the next check.
Here, the first thing the program checks is the basic tasks input (a special case). If
the return value is empty, the program moves on to the next check. The second check
is the weather input: the weather library has a check_userinput() function that
checks the input.
Every library that I made (apps.py, day.py, notes.py, screenshot.py and weather.py)
has its own copy of this function to check the user input. Instead of defining it once
in a shared module, I defined it in each library to simplify the process and to avoid
circular imports.
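The counter-and-threshold check described above can be sketched for the weather module as follows. check_userinput is the name the report uses, but this body is an illustration, not the project's code. The keyword set and the threshold of 2 are hypothetical, because the report does not list the keywords each module actually uses.

```python
# Hypothetical keyword set; Nova's real keyword lists are not given in this report.
WEATHER_KEYWORDS = {"weather", "temperature", "forecast", "rain", "now", "today"}


def check_userinput(user_input: str, keywords=WEATHER_KEYWORDS) -> int:
    """Count how many words of the input are weather-related keywords."""
    words = user_input.lower().split()  # split on blank space
    return sum(1 for word in words if word in keywords)


def wants_weather(user_input: str, threshold: int = 2) -> bool:
    """Run the weather function only if enough keywords matched."""
    return check_userinput(user_input) >= threshold
```

For ‘tell me the weather of Mumbai now’, the words "weather" and "now" match, giving a count of 2, which meets the threshold, so the weather function would run.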
Basic Tasks
The first function is the basic_tasks(input) function. As its name states, this
function is executed when the user asks the assistant to do simple tasks, such as
telling its name or listing the chores it can do. Below is the code for the function.
Again, because this program does not use machine learning, a series of if and else
statements must be implemented for the program to work as intended.
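A minimal sketch of such an if/else chain is shown below, assuming phrase matching on lower-cased input. The trigger phrases and replies are hypothetical, not the project's actual ones. The parameter is called user_input rather than input so that it does not shadow Python's built-in `input`. Returning None corresponds to the "empty" result that sends the program on to the next check.

```python
from typing import Optional

ASSISTANT_NAME = "Nova"


def basic_tasks(user_input: str) -> Optional[str]:
    """Answer simple questions with if/elif checks; None means 'not a basic task'."""
    text = user_input.lower()
    if "your name" in text:
        return f"My name is {ASSISTANT_NAME}."
    elif "what can you do" in text:
        return "I can tell you the weather, the time, the news and a joke."
    elif text.strip() in {"hello", "hi", "hey"}:
        return "Hello! How can I help you?"
    return None  # empty result: the caller moves on to the next check
```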
Weather
This is the weather function. It calls the OpenWeather API to get the data, which is
received as a raw string in JSON format. The string is then converted into a JSON
object that behaves similarly to a Python dictionary, and the remaining lines store the
converted data in variables. If the request fails and receives nothing, the function
returns None; otherwise it returns a list containing the weather data. The main
window then picks the values from the list and displays them in the main app. An
example of the result is shown in the working app proof section of this report.
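The conversion step can be sketched as below. The function name parse_weather and the choice of fields are assumptions. The field names (`name`, `main.temp`, `main.humidity`, `weather[0].description`) follow OpenWeather's current-weather response format. The raw text would come from a call such as `requests.get("https://api.openweathermap.org/data/2.5/weather", params={"q": city, "appid": API_KEY, "units": "metric"}, timeout=10).text`, where API_KEY is a placeholder for your own key.

```python
import json
from typing import Optional


def parse_weather(raw: str) -> Optional[list]:
    """Turn raw OpenWeather response text into [city, temp, humidity, description]."""
    if not raw:
        return None  # the request failed and returned nothing
    data = json.loads(raw)  # JSON text -> Python dictionary
    try:
        return [
            data["name"],
            data["main"]["temp"],
            data["main"]["humidity"],
            data["weather"][0]["description"],
        ]
    except (KeyError, IndexError):
        return None  # e.g. an error payload such as "city not found"
```

Treating an error payload the same way as an empty response gives the main window a single None case to handle.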
Time
The next one is the time function, a simple function to display the time. It returns
two values: the first is a string shown in the GUI, and the second is a string passed
to the voice_output(output) function, which is discussed later. To produce these, I
made a variable that stores a dictionary whose keys are the current time values
received from the ‘time’ library and whose values contain the 12-hour format of the
clock.
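The two return values can be sketched as follows. Unlike the lookup dictionary described above, this version computes the 12-hour form arithmetically. The function name tell_time and the exact wording of both strings are hypothetical. Accepting an optional `struct_time` makes the function testable for fixed times.

```python
import time
from typing import Optional, Tuple


def tell_time(now: Optional[time.struct_time] = None) -> Tuple[str, str]:
    """Return (gui_text, spoken_text) for the given time, or the current local time."""
    if now is None:
        now = time.localtime()
    hour12 = now.tm_hour % 12 or 12             # 0 -> 12, 13 -> 1, ...
    suffix = "AM" if now.tm_hour < 12 else "PM"
    gui_text = f"{hour12:02d}:{now.tm_min:02d} {suffix}"     # shown in the GUI
    spoken_text = f"It is {hour12}:{now.tm_min:02d} {suffix}"  # passed to the voice output
    return gui_text, spoken_text
```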
Jokes
This is the jokes function, which tells jokes to users using an API called
official-joke-API. It requests a random joke from the API, which returns a string in
the form of a dictionary containing the joke. To ease the process, I used the eval()
method to turn the dictionary-like string into a real dictionary. Its key-value pairs
can then be accessed, appended to a new string and returned so that the joke can be
displayed to the user. Each request returns a different, random joke.
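Since the API's response is JSON text, `json.loads` can build the same dictionary that eval() produces, without executing the string as Python code. It also handles JSON literals such as `true` and `null`, which eval() would reject. The sketch below swaps in that approach. The `setup` and `punchline` field names are an assumption about the Official Joke API's response format, and format_joke is a hypothetical name.

```python
import json


def format_joke(response_text: str) -> str:
    """Turn the API's JSON text into one string for display and speech."""
    joke = json.loads(response_text)  # safe parse: unlike eval(), runs no code
    return f"{joke['setup']} {joke['punchline']}"
```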
CHAPTER 3
Functional Requirements
Non-Functional Requirements
Reliability – The system should be reliable, i.e. its crash rate should be low, and
it should work for any length of time.
Extensibility – The system should be capable of adding new modules and
new ideas.
Reusability – The system can be reused to make new, similar software; its code
and flow can be applied in other software development.
3.1. Problem Definition
Usually, the user needs to manually manage multiple applications to complete one
task. For example, a user making a travel plan needs to look up the codes of nearby
airports and then check travel sites for tickets between combinations of airports to
reach the destination. There is a need for a system that can manage such tasks
effortlessly.
We already have multiple virtual assistants, but we hardly use them. Many people
have trouble with voice recognition: these systems understand English phrases but
fail to recognize our accent, since our pronunciation is quite distinct from theirs.
They are also easier to use on mobile devices than on desktop systems. There is a
need for a virtual assistant that can understand English spoken with an Indian accent
and work on a desktop system.
3.2 Requirements
Specifications
When a virtual assistant is not able to answer questions accurately, it is because it
lacks the proper context or does not understand the intent of the question. It can
answer questions relevantly only after rigorous optimization involving both humans
and machine learning. Continuously applying solid quality-control strategies also
helps manage the risk of the virtual assistant learning undesired behaviors. Virtual
assistants also require a large amount of information to be fed to them in order to
work efficiently.
Virtual assistants should be able to model complex task dependencies and use these
models to recommend optimized plans for the user. They need to be tested on finding
optimum paths when a task has multiple sub-tasks, each of which can have its own
sub-tasks. In such a case there can be multiple possible paths, and the assistant
should consider user preferences, other active tasks and priorities in order to
recommend a particular plan.
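The dependency-modelling part of this requirement can be illustrated with a topological sort, which orders sub-tasks so that each one comes after its prerequisites. The sketch below is not part of Nova. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical version of the travel-planning example. Weighing user preferences and priorities to choose between alternative plans would need more than this.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical travel-planning task: each key depends on the tasks in its set.
travel_plan = {
    "search_tickets": {"find_nearby_airports"},
    "compare_prices": {"search_tickets"},
    "book_ticket": {"compare_prices"},
}


def plan_order(dependencies: dict) -> list:
    """Return an execution order in which every sub-task follows its prerequisites."""
    # static_order() raises graphlib.CycleError if the dependencies form a loop.
    return list(TopologicalSorter(dependencies).static_order())
```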
Personal assistant software is required to act as an interface to the digital world by
understanding user requests or commands and then translating them into actions or
recommendations based on the agent's understanding of the world.
Nova focuses on relieving the user of entering text input by using voice as the
primary means of input. The agent applies voice recognition algorithms to this input
and records it. It then uses this input to call one of the personal information
management applications, such as a task list or calendar, to record a new entry, or to
search for it on search engines such as Google, Bing or Yahoo. The focus is on
capturing the user input through voice, recognizing it and then executing the task if
the agent understands it. The software takes this input in natural language, which
makes it easier for users to say what they want done.
Voice recognition software enables hands-free use of the applications and lets users
query or command the agent through a voice interface. This gives users access to the
agent while they perform other tasks and thus enhances the value of the system itself.
Nova also has ubiquitous connectivity through a Wi-Fi or LAN connection, enabling
distributed applications that leverage other APIs exposed on the web without needing
to store them locally.
Feasibility Study
A feasibility study can help you determine whether or not you should proceed with
your project. It is essential to evaluate the cost and benefit of the proposed system.
Five types of feasibility study are taken into consideration.
1. Technical feasibility:
It involves identifying the technologies, both hardware and software, needed for the
project. For virtual assistants, users must have a microphone to convey their message
and a speaker to listen when the system speaks. These are very cheap nowadays and
most people already own them. Besides, the system needs an internet connection, so
make sure you have a steady connection while using Nova. This is also not an issue
in an era when almost every home or office has Wi-Fi.
2. Operational feasibility:
It is the ease and simplicity of operating the proposed system. The system does not
require any special skills to operate; in fact, it is designed to be used by almost
everyone. Even children who cannot yet write can read out problems to the system
and get answers.
3. Economical feasibility:
Here, we find the total cost and benefit of the proposed system compared with the
current system. For this project, the main cost is documentation. Users would also
have to pay for a microphone and speakers, but again, these are cheap and widely
available. As far as maintenance is concerned, Nova will not cost much.
4. Organizational feasibility:
This shows the management and organizational structure of the project. This project
is not built by a team; all the management tasks are carried out by a single person.
This avoids management issues and increases the feasibility of the project.
5. Cultural feasibility:
It deals with the compatibility of the project with the cultural environment. Virtual
assistants are built in accordance with the general culture.
Hardware:
• Pentium Pro processor or later.
• 512 MB RAM or more.
Software:
• Windows 7 (32-bit) or above.
• Python 2.7 or later.
• Anaconda.
3.4 Conceptual Models
Some models are physical objects; for example, a toy model which may be assembled,
and may be made to work like the object it represents. The term conceptual model may
be used to refer to models which are formed after a conceptualization or generalization
process.
Conceptual models are often abstractions of things in the real world whether physical or
social. Semantic studies are relevant to various stages of concept formation. Semantics is
basically about concepts, the meaning that thinking beings give to various elements of
their experience.
Data flow diagrams are used to graphically represent the flow of data in a business
information system. A DFD describes the processes involved in transferring data
from the input to file storage and report generation.
Level 0:
Level 1:
3.4.1 Event Table
The event table is a table of data that is typically written to the logfile for each scenario
and also appears in the Analysis window. The event table contains timing information
about specific events that occur during the scenario. Only stimulus events with event
codes will appear in the event table.
3.4.2 Use Case Diagram
A use case diagram is the primary form of system/software requirements for a new
software program under development. Use cases specify the expected behavior
(what), not the exact method of making it happen (how). Once specified, use cases
can be denoted in both textual and visual representations. A key concept of use case
modeling is that it helps us design a system from the end user's perspective.
3.4.3 State Chart Diagram
A State Chart Diagram describes the different states of a component in a system. The
states are specific to a component/object of a system. A Statechart diagram describes
a state machine, which can be defined as a machine that defines the different states of
an object, where these states are controlled by external or internal events.
3.4.4 Sequence Diagram
3.4.5 Entity Relationship Diagram
CHAPTER 4
System Design
Systems design is the process of designing the elements of a system, such as the
architecture, modules, interfaces and flow of data in the system. These elements are
defined, developed and designed to satisfy the specific needs and requirements of the client.
User interface design (UI) or user interface engineering is the design of user interfaces for
machines and software, such as computers, home appliances, mobile devices, and other
electronic devices, with the focus on maximizing usability and the user experience. Good
user interface design facilitates finishing the task at hand without drawing unnecessary
attention to itself.
4.2 Test Case Design
Test Case 1
Test Case 2
CHAPTER 5
Conclusion
No program has a perfect design without any flaws, and this program is no exception.
Even though it is complete, with all the primary functions implemented and working
properly, much more can still be done with it. Potential future improvements range
from adding more functions, to offer the user a more comprehensive and convenient
program, to refining the logic to make the program more humanized and easier to
use, increasing the database capacity with more possible keywords, responses and
data, and optimizing the interface.
Additional Functions
Add more functions: although there are already 15 functions of the kind used very
often on mobile phones, more functions could simplify daily life and make the
program more convenient. Functions such as playing movies, checking stocks and
exchange rates, downloading and uploading, and installing apps are potential
additions that would make the program more comprehensive and let people enjoy
more services.
Database Capacity
Add database capacity and a more humanized logical design: the program has
predefined logic that makes it work with the corresponding commands. Thus, the user
needs to follow the structure of the commands, include the dedicated keywords and
phrase the commands properly for each function to work. In other words, the
program is limited by its database capacity, and no solution will be found if the user
gives commands the program cannot read. Even when two commands have the same
meaning and should produce exactly the same result, one may work while the other
fails. Hence, the program is to some extent limited by its vocabulary and can be
further optimized.
Humanized Voice Recognition
The more humanized the program is, the easier it is to use. We should accept that
even if developers constantly add more predefined commands and responses, and
analyze and respond to commands more intelligently, the program will never be
completely comprehensive or cover every circumstance users encounter.
Nevertheless, the program will certainly become more user-friendly with more
readable commands, a more humanized structure and more intelligent responses.
Improved Interface
Interface optimization: the interface can be further improved to make it more
pleasant for users. Currently the interface design meets the basic requirement of
presenting everything in this program, and users are able to interact with the
program through it, but the interface can always be optimized and better
constructed.
Final Words
We learned a valuable lesson while making this program. We found new syntax and
modules, of course, but the main thing we learned during this process was not to
have high expectations of the outcome.
We tried implementing an alarm function and voice recognition, and it failed
miserably: it could not run simultaneously with the main assistant program. We tried
literally everything: subprocesses, multithreading and multiprocessing. Nothing
worked.
This taught us a valuable lesson: not to expect too much of anything, even the
littlest things.
Finally, we thank Jyoti Ma’am, who guided us throughout this semester.
We learned a lot of new things in the programming world and improved our
knowledge of Python.
Our profound apologies if we have ever done anything that was not pleasing to you.