
Voice Based Virtual Assistant

(Dr. Subrata Sahana, Assistant Professor, Department of Computer Science Engineering,
School of Technology, Sharda University)

Bhawana Sati
Department of Computer Science Engineering
Sharda University
Greater Noida, India
Email Id: 2018012988.bhawana@ug.sharda.ac.in

Karan Rana
Department of Computer Science Engineering
Sharda University
Greater Noida, India
Email Id: 2018008328.karan@ug.sharda.ac.in

Kuhil Saikia
Department of Computer Science Engineering
Sharda University
Greater Noida, India
Email Id: 2018003132.kuhil@ug.aharda.ac.in

Sameer Kumar
Department of Computer Science Engineering
Sharda University
Greater Noida, India
Email Id: 2018011552.sameer@ug.sharda.ac.in

Abstract— This paper presents a voice-based virtual assistant, an AI tool that lets us accomplish different tasks just by giving voice commands. The voice assistant we have developed is a desktop-based application built using Python modules and libraries. It is a basic version that can perform only basic tasks; although current technology is good, it has yet to be merged with machine learning for better classification. Hence, we developed a system that recognizes the voice from the dataset and then produces the corresponding response. The process works in three stages: preprocessing, feature extraction and classification. Voice assistants are a great innovation in the field of AI that can change people's way of living. They were first introduced on smartphones and, after gaining popularity, were widely accepted. Initially, voice assistants were mostly used in smartphones and laptops, but now they also appear in home automation and smart speakers. Many devices are becoming smarter in their own way so that they can interact with humans in easy language. Desktop-based voice assistants are programs that can recognize human voices and respond via an integrated voice system. This paper describes how a voice assistant works, along with its main problems and limitations. It also describes a method of creating a voice assistant without using cloud services, which will allow the expansion of such devices in the future.

Keywords— voice assistant, natural language processing, Python, speech recognition.

Acknowledgement
We would like to express our sincere gratitude to our project guide, Dr. Subrata Sahana, for his extensive support and expertise on the project's research, development and execution.

Introduction
The field of virtual assistants with speech recognition has seen major advancements and innovations. This is mainly because of demand in devices such as smart watches, fitness bands, speakers, Bluetooth earphones, mobile phones, laptops, desktops and televisions. Almost all digital devices released nowadays come with voice assistants that help control the device through speech recognition alone. New techniques are constantly being developed to improve the performance of voice-automated search. As the amount of data, now known as Big Data, is increasing exponentially, the best way to improve the results of virtual assistants is to combine them with machine learning and train our devices according to their uses. Other major techniques that are equally important are Artificial Intelligence, Internet of Things, and Big Data access and management. Artificial intelligence systems that can recognize a human voice as a command, analyse it and provide services to humans are developing day by day. Voice assistants are gaining a lot of popularity today. This is the era in which a human no longer learns how to communicate with machines; instead, a machine learns to communicate with a human, exploring his actions, asking about his hobbies and habits, and trying to become his personal assistant. A voice assistant is a software program that can perform tasks or provide services for an individual based on verbal commands, i.e. the human voice.



The voice assistant responds via synthesized voice. Users can ask their assistants questions, control home automation devices and media playback via voice, and manage other basic tasks such as email, to-do lists, and opening or closing applications.

The first virtual assistant was not Siri, which Apple introduced in 2011. An early precursor was IBM's Shoebox, demonstrated at the Seattle World's Fair in 1962. It could recognize 16 spoken words, including the digits 0 to 9, and perform some mathematical operations. In the 1970s, Carnegie Mellon's Harpy system could recognize 1,011 words, roughly the vocabulary of a three-year-old child. The virtual assistant methodology became popular along with the development of AI and ML. An average person can speak about 150 words per minute but type only about 40. These are the reasons why IT companies developed their own personal virtual assistants to make their workflow easy and smooth.

Literature Survey
The proposed system is based on voice recognition of commands, converting speech into text. Speech gives the user easier access than typing on a keyboard. The user speaks commands aloud, so a good-quality microphone is needed. The user must pronounce each word correctly so that the voice recognition system can understand it. Even so, some ambiguity remains in the system: for example, the words "sun" and "son" have the same pronunciation, so the system finds it difficult to choose between them. A powerful microphone helps the system hear the speaker clearly. Voice is a simple and effective way to communicate, and many people prefer speech to text. In a basic overview, the proposed system receives input in the form of a voice signal, which passes through feature extraction and enters the decoder. The decoder contains two main models, the acoustic model and the language model. After processing the input, the decoder provides the output. Speech recognition systems are categorized by the types of utterances, the types of platform models, and the types of words.

[Figure: Speech to Text, Command Execution]

• Speech to Text:- The user gives the input as speech and asks the system to run a command.
• Command Execution:- Based on the command given by the user, the system performs the necessary action, e.g. opening Paint, Notepad or Google Chrome.
• Text to Speech:- This is the most interesting part, where the machine processes the input given by the user and speaks it back. This makes the user experience more interactive.

The accuracy, speed, and contextual abilities of Alexa, Google Assistant, and Siri are all thanks to machine learning algorithms and servers owned by their developing companies. All of them work in the same manner; the only difference lies in their protocols and data privacy intricacies. When a user makes a request, it is instantly packaged and sent to the servers of the respective company for a response, which is why internet connectivity is one of the fundamental requirements for virtual assistants to function properly. After the package is sent to the server, the words and tone of the request are analyzed by a collection of algorithms and then matched with the command the system thinks the user asked for.

The complexity lies in the speed of task fulfilment and in understanding what the user wants on the first try. Once the assistant knows what it has to do, carrying it out is a basic process of tapping into a server, a third-party computer, or another device. Here are two of the most successful voice assistants:

i) Alexa:-
Amazon designed the Alexa Voice Service (AVS), its cloud-based voice service, to mimic real conversations, using intuitive voice commands to get the service to perform specific tasks. "Alexa" acts as the wake word for Alexa-supported devices, alerting the service to start listening to the user's commands. AVS has intelligent voice recognition and language understanding services. It is able to play music, provide information, deliver news, sports scores and weather reports, control smart home services, and even allow Prime members to order products from Amazon.

ii) Siri:-
Siri is Apple's virtual/digital assistant for iOS, macOS, tvOS, and watchOS devices that uses voice recognition. It is based on the fields of AI, machine learning, and natural language processing, and on three components: a conversational interface, personal context awareness, and service delegation. It enables the user to operate the mobile device and its apps using natural voice commands. Siri can help when the user is out and about, with sports and entertainment information, phone calls and messages, getting organized, tips and tricks, reading the last email, texting friends and family, shuffling a playlist, even finding a table for three in a restaurant, and much more.

ML Models
These are the ML models which are used in it:-

i) KNN Classifier:-
K-Nearest Neighbors (KNN) is one of the simplest algorithms used in machine learning for regression and classification problems. KNN stores all available cases and classifies new data points based on a similarity measure (e.g. a distance function). A new case is assigned to the category most common among its K nearest neighbors, that is, by a majority vote of its neighbors. As you increase the number of nearest neighbors, the value of K, accuracy might increase. KNN can be used for both classification and regression, but in industry it is more widely used for classification.
The distance function can be Euclidean, Manhattan, Minkowski, or Hamming distance. The first three are used for continuous variables and the fourth (Hamming) for categorical variables. If K = 1, the case is simply assigned to the category of its nearest neighbor. Choosing K can be a challenge when performing KNN modeling. KNN also has disadvantages: it is computationally expensive; variables should be normalized so that higher-range variables do not bias it; and more work is needed at the pre-processing stage, such as outlier and noise removal, before applying KNN.
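
The majority vote, the choice of distance function and the normalization step described for KNN can be illustrated with a minimal sketch. It is not the implementation used in this project: the two-feature "voice descriptors", the command labels and all function names below are hypothetical, chosen only to make the idea concrete.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance, suited to continuous features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute differences, also for continuous features."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions that differ, suited to categorical features."""
    return sum(x != y for x, y in zip(a, b))


def min_max_normalize(rows):
    """Rescale every column to [0, 1] so wide-range features do not dominate."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    spans = [(hi - lo) or 1.0 for lo, hi in zip(lows, highs)]
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]


def knn_predict(train_x, train_y, query, k=3, distance=euclidean):
    """Return the majority label among the k training points nearest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature descriptors with made-up command labels.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["open", "open", "open", "close", "close", "close"]

print(knn_predict(train_x, train_y, (1.1, 1.0), k=3))                      # open
print(knn_predict(train_x, train_y, (4.9, 5.0), k=1, distance=manhattan))  # close
```

Every prediction sorts the whole training set, which is why KNN becomes computationally expensive as the dataset grows. With K = 1 the call reduces to picking the single nearest neighbor.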

ii) CNN Architecture:-
Convolutional neural networks (or ConvNets) are biologically inspired variants of MLPs. They have different types of layers, and each layer works differently from the usual MLP layers. Readers interested in learning more about ConvNets may find the course CS231n: Convolutional Neural Networks for Visual Recognition useful. In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks most commonly applied to analyzing visual imagery. CNNs are regularized versions of multilayer perceptrons. Multilayer perceptrons usually refer to fully connected networks, that is, each neuron in one layer is connected to all neurons in the next layer. The "fully-connectedness" of these networks makes them prone to overfitting data. Typical ways of regularization include adding some form of magnitude measurement of weights to the loss function. However, CNNs take a different approach towards regularization: they take advantage of the hierarchical pattern in data and assemble more complex patterns using smaller and simpler patterns. Therefore, on the scale of connectedness and complexity, CNNs are on the lower extreme.

iii) SVM (Support Vector Machine):-
SVM is a classification method. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. For example, if we only had two features, such as the height and hair length of an individual, we would first plot these two variables in two-dimensional space, where each point has two coordinates. The algorithm then finds the boundary that best separates the classes; the data points closest to this boundary are called support vectors.

Interaction of AI voice assistants with people
As technology advances, the way people interact with devices changes drastically. Internet searches have become easier since the development of virtual assistants. Search engines such as Google and Bing now recognize what you are trying to search for on the internet and give you appropriate results. They easily understand the context and content of the search.
Virtual assistants have seen major growth in the world of technology. In the early days, the only way to search in a search engine was through text, but now voice has taken over.

Virtual assistants are always listening for a command from the user. Phrases like "Hey Siri" or "OK Google" are their default wake words. Virtual assistants are connected to a server where the responses are stored in a database, so they need a constant internet connection to work efficiently. When you say the wake word, the assistant immediately connects to the server. The wake word acts as the trigger for the virtual assistant to connect to the internet.
They do not really understand what you are saying; they just hear the command, immediately communicate with the server, and look for possible answers so they can reply as soon as possible. NLP is the technology behind this: it helps the virtual assistant hear the command and convert the voice to text that the system can understand, in order to search for relevant answers in the server database.
They allow us to do various tasks, which is the primary reason people like using them so much in their day-to-day life, especially on their phones.
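
The wake-word behaviour described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transcript is assumed to have already been produced by a speech-to-text step, the `RESPONSES` dictionary is a hypothetical stand-in for the server database, and `speak` is a placeholder for a text-to-speech call (here it simply prints).

```python
def extract_command(transcript, wake_words=("hey siri", "ok google")):
    """Return the text after a wake word, or None if no wake word was heard."""
    text = transcript.lower().strip()
    for wake in wake_words:
        if text.startswith(wake):
            return text[len(wake):].strip(" ,.")
    return None


# Hypothetical stand-in for the server-side database of stored responses.
RESPONSES = {
    "what time is it": "Looking up the current time.",
    "open notepad": "Opening Notepad.",
    "play music": "Playing your playlist.",
}


def handle_utterance(transcript, speak=print):
    """Ignore speech without a wake word; otherwise look up and speak a reply."""
    command = extract_command(transcript)
    if command is None:
        return None  # no wake word heard, keep listening
    reply = RESPONSES.get(command, "Sorry, I could not find an answer.")
    speak(reply)
    return reply


handle_utterance("OK Google, open notepad")    # prints: Opening Notepad.
handle_utterance("open notepad")               # no wake word, nothing is spoken
handle_utterance("Hey Siri what time is it")   # prints: Looking up the current time.
```

The lookup is an exact string match, which mirrors the point made above: the assistant does not understand the request, it only matches the recognized text against answers it can retrieve.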

In the upper part, training data are converted into CNN descriptors and incorporated into a voice dataset. The voice templates in the dataset are indexed by a KNN structure to speed up the voice matching procedure.
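
As a rough illustration of this descriptor-and-matching idea, the sketch below turns each signal into a short descriptor and then finds the closest stored template. It is a toy stand-in, not the system's actual model: the two fixed filters replace a trained CNN, the signals are made-up numbers, and the lookup is a brute-force nearest-neighbour search (K = 1) rather than an optimized index structure. All names are hypothetical.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation, as is usual in CNN layers)."""
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel))) for i in range(n)]


def descriptor(signal, kernels):
    """One number per kernel: convolve, apply ReLU, then global max pooling."""
    return tuple(max(max(v, 0.0) for v in conv1d(signal, k)) for k in kernels)


def nearest_template(query, index):
    """Return the label whose stored descriptor is closest (squared Euclidean)."""
    def dist(item):
        return sum((a - b) ** 2 for a, b in zip(item[1], query))
    return min(index.items(), key=dist)[0]


# Hypothetical fixed filters; a real CNN would learn these from training data.
KERNELS = [(1.0, -1.0), (1.0, 1.0)]

# Hypothetical "voice templates": tiny made-up signals, one per command.
templates = {
    "yes": [0.0, 1.0, 0.0, 1.0, 0.0],
    "no": [1.0, 1.0, 1.0, 1.0, 1.0],
}
index = {label: descriptor(sig, KERNELS) for label, sig in templates.items()}

print(nearest_template(descriptor([0.0, 0.9, 0.1, 1.0, 0.0], KERNELS), index))  # yes
```

Matching is done on the short descriptors rather than on the raw signals, which is what keeps the lookup cheap as the number of stored templates grows.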

Future Scope
In the future, the proposed system can be used to build a software application for various settings such as educational institutes, healthcare services and many others. We encourage readers to work more on Natural Language Processing projects for the good and development of society.

The advantages of Virtual Assistants
• It converts text to speech.
• It helps you find applications easily.
• It helps you set reminders and makes your daily life easier.

Limitations of Virtual Assistants
• The input must be accurate; otherwise the assistant will not give accurate answers.
• Users who are deaf or unable to speak will not be able to use it.
• The microphone should be of high quality for accurate voice input. For example, "sun" can be misrecognized as "son".

Applications of Virtual Assistants
• It can be used by various people to get counselling sessions online.
• It makes operations hands-free.
• About 50% of the people surveyed said that it is a better interaction medium and found it more helpful and easier than other means of communication.

Proposed System
The system proposes a new framework to solve the aforementioned problems.
This processes based on with substantial extensions.

Conclusion
The voice-based virtual assistant system in this paper is very fundamental, with few features; however, additional and advanced features may be introduced as future work on this project. This paper describes the design and implementation. Through developing the project, we were able to learn and gain more knowledge of NLP, AI and ML models, which are the foundations for future artificial intelligence models.

References
• Ankit Pandey and Vaibhav Vashist, research paper on Smart Voice Based Virtual Personal Assistant.
• International Journal of Advanced Science and Technology, Vol. 29, No. 7s (2020), pp. 1651-1654.
• Google.co.in
