
e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering Technology and Science

( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:05/Issue:05/May-2023 Impact Factor- 7.868 www.irjmets.com
ARTIFICIAL INTELLIGENCE-BASED VOICE ASSISTANT
Gowhar Ahmad Dar*1, Jeby Tom Kurian*2, Abin K Shaji*3, Chrisil T Jose*4,
Dr. Anju ratap*5
*1,2,3,4Students, Dr. A.P.J. Abdul Kalam Technical University Kerala, Computer Science & Engineering, Saintgits
College of Engineering (Autonomous), Kottayam, Kerala, India.
*5Head of Department, Department of Computer Science & Engineering, Saintgits College of Engineering
(Autonomous), Kottayam, Kerala, India.
DOI : https://www.doi.org/10.56726/IRJMETS40794
ABSTRACT
Artificial intelligence technology is starting to be actively used in everyday life, making many tasks simpler.
Independent devices are increasingly intelligent in the ways they communicate with one another. One of the most
useful forms of artificial intelligence is the ability to understand human natural language. The ideas in this paper
may also lead to new ways of working with machines, in which the system learns to understand, adapt to, and
interact with the user. We therefore aim to develop a personal assistant with strong powers of deduction and the
ability to interact with its environment through one of the most natural forms of human interaction: the voice.
Desktop-based voice assistants are applications that can recognize human voices and respond through an
integrated voice system. We use APIs to convert text to audio. The project applies artificial intelligence
techniques and uses Python as its programming language because of its large library ecosystem. The software
uses a microphone as an input device to receive voice requests from the user and a speaker as an output device
to deliver the spoken response.
Keywords: Python, Assistant, Integrated voice system, Microphone, Speaker, APIs.
I. INTRODUCTION
This project is based on web application development and provides a personal assistant controlled by voice
recognition or by text mode. The program includes the following features and services: call services, text message
transformation, mail exchange, alarm, event handler, location services, music player service, weather control,
Google search engine, Wikipedia search engine, chat robot, camera, Bing translator, Bluetooth headset support,
help menu and Windows Azure cloud computing. Virtual assistants are very useful for the elderly, people with
disabilities or special needs, and young children who do not know how to operate machines or smart devices. They
make interacting with machines less difficult and also enable users to multitask. Emerging technology trends such
as virtual reality, augmented reality, voice interaction and the Internet of Things (IoT) are changing the way people
engage with the world and transforming digital experiences. Voice control is one of the major advances in
human-machine interaction made possible by progress in artificial intelligence. Today, technologies such as
artificial intelligence and machine learning let us train machines to perform tasks on their own or to reason in
human-like ways. Voice assistants such as Apple Siri, Google Assistant, Microsoft Cortana and Amazon Alexa have
recently become prominent owing to the heavy use of smartphones. Voice assistants combine technologies such as
speech recognition, speech synthesis and natural language processing (NLP) to provide a variety of services that
help users perform tasks on their devices simply by giving voice commands. With a voice assistant, there is no need
to type commands again and again to perform a specific task. Voice search is rapidly gaining on text search: mobile
web searches have only recently overtaken desktop searches, and analysts were already predicting that by 2022,
50% of searches would be conducted by voice. Virtual assistants are becoming smarter than ever; an intelligent
assistant can, for example, take over email work by discovering intent, selecting important information,
automating processes and delivering personalized responses. This project was started on the premise that enough
data and information are publicly available on the web to build a virtual assistant capable of intelligent decision
making for common user activities.
www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science

[8177]
II. METHODOLOGY
A voice assistant is a program that listens to verbal commands and responds according to the user's requests. In
this project we used the Python programming language to build the AI-based voice assistant. A user can say,
"Play me a song" or "Open facebook.com", and the voice assistant will respond by playing that particular song or
by opening the Facebook website. The voice assistant waits for a pause to know that the user has finished the
request, and then sends the request to its database to look it up.
• The user's request is split into separate commands so that the voice assistant can understand it.
• Within the command list, the request is looked up and compared with other known requests.
• The command list then sends these commands back to the voice assistant.
• Once the voice assistant receives those commands, it knows what to do next.
• If the request is not clear enough to process, the voice assistant may ask a follow-up question to make sure it
understands what the user wants.
• If it understands the request well enough, the voice assistant performs the task the user asked for.
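The split, look-up and clarify steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the keyword table COMMANDS, the action names, and the rule of splitting on "and"/"then" are hypothetical assumptions chosen only to show the flow.

```python
# Hypothetical command table: keyword -> action name (not the authors' actual list)
COMMANDS = {
    "play": "play_music",
    "open": "open_website",
    "search": "google_search",
    "weather": "get_weather",
    "pdf": "generate_pdf",
}


def split_request(request):
    """Split one spoken request into separate commands on 'and' / 'then'."""
    text = request.lower().replace(" then ", " and ")
    return [part.strip() for part in text.split(" and ") if part.strip()]


def match_command(command):
    """Compare the command's words with the known keywords; None if nothing matches."""
    for word in command.split():
        if word in COMMANDS:
            return COMMANDS[word]
    return None


def handle_request(request):
    """Return (action, command) pairs; unclear commands become a follow-up question."""
    results = []
    for command in split_request(request):
        action = match_command(command)
        if action is None:
            # Not clear enough to process: ask the user instead of guessing
            results.append(("ask", "Sorry, what do you mean by '" + command + "'?"))
        else:
            results.append((action, command))
    return results
```

For example, `handle_request("Open facebook.com and play me a song")` yields one `open_website` and one `play_music` action. A real assistant would likely use fuzzier matching than exact keywords.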
Working of ASR:
As shown in Figure 1, Automatic Speech Recognition (ASR) is the main principle behind the working of an AI-based
voice assistant [4]. An ASR system first records the speech, and the device creates a wave file containing the words
it hears. The wave file is then cleaned so that background noise is removed and the volume is normalized. Next, the
signal is broken down into elements that are analysed as sequences; the ASR software examines these sequences
and applies statistical probability to identify whole words, which are finally converted into text [5]. Recognizing
these individual elements gives better results than decoding whole words directly.
Figure 1: Process of ASR.
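The cleaning and splitting stages described above can be illustrated with a small sketch on a list of audio samples. It assumes samples are floats in [-1, 1] and uses a crude amplitude "noise gate" as a stand-in for real noise suppression; the thresholds and frame sizes are illustrative only (practical systems typically use frames of around 25 ms with a 10 ms hop).

```python
def normalize_volume(samples, target_peak=1.0):
    """Scale the signal so its loudest sample reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def noise_gate(samples, threshold=0.1):
    """Crude background-noise removal: zero out low-amplitude samples."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def frame_signal(samples, frame_size, hop):
    """Break the cleaned signal into overlapping frames analysed in sequence."""
    return [samples[start:start + frame_size]
            for start in range(0, len(samples) - frame_size + 1, hop)]


def preprocess(samples, frame_size=4, hop=2):
    # Normalizing first makes the gate threshold relative to the loudest sample
    cleaned = noise_gate(normalize_volume(samples))
    return frame_signal(cleaned, frame_size, hop)
```

Each frame would then be turned into acoustic features and passed to the recognizer; that step is outside this sketch.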

Whatever speech recognition software is used, the core work happens in its ASR component. In a nutshell, the
device first gathers audio from the source, which is the microphone. The recorded speech waveforms are then sent
for acoustic analysis, which is performed on three different levels, as shown in Figure 2.
Acoustic Analysis
• Acoustic Modelling: Determines which elements were pronounced and which words could be formed from
these elements.
• Pronunciation Modelling: Analyses how these elements are pronounced, checking for accents or other
peculiarities.
• Language Modelling: Estimates contextual probabilities based on the elements that were captured.
All the recorded data are processed by artificial intelligence without any human interaction. The speech waveform
data are then passed to the decoder, where they are finally transformed into text for further use, for example as a
command.
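To make "contextual probabilities" concrete, here is a minimal bigram language model that picks the more plausible of two acoustically similar transcripts. The training corpus is a hypothetical handful of commands, and add-one (Laplace) smoothing is an assumed choice; real decoders combine such a language-model score with acoustic and pronunciation scores.

```python
import math
from collections import Counter

# Hypothetical tiny corpus of past commands (illustration only)
CORPUS = [
    "play me a song",
    "play a song",
    "open facebook",
    "search for a song",
]


def train_bigrams(sentences):
    """Count unigrams and bigrams, with <s> marking the start of each sentence."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a candidate transcript."""
    vocab = len(unigrams)
    words = ["<s>"] + sentence.split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab))
    return total


def pick_transcript(candidates, unigrams, bigrams):
    """Choose the candidate the language model finds most probable in context."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))
```

With this corpus, "play me a song" wins over "play me a son" because the bigram ("a", "song") has been seen before while ("a", "son") has not.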

Figure 2: Acoustic Analysis.

III. MODELING AND ANALYSIS
This project demonstrates an intelligent assistant capable of understanding the commands given by the user. Our
assistant understands commands given through vocal media and responds as required. It handles the requests users
make most frequently and so makes their tasks easier. The voice assistant listens to the user's command through the
microphone; after listening, it says "done listening", displays what the user said and acts accordingly.
In our project we installed the gTTS (Google Text-to-Speech) package so that the voice assistant can speak like a
normal human being. We defined a function called 'voice assistant speak', built around the call shown in (1). The
assistant takes the command given by the user through the microphone, finds the required response (for example
by searching in the browser), and passes that response text to gTTS, which converts it into speech.
tts = gTTS(text=audio_string, lang='en')   (1)
gTTS converts the text in audio_string into speech. This string is the response that the voice assistant is supposed
to give the user. The language of the text is English, whose code is 'en'. The resulting gTTS object is stored in 'tts'.
The speech is saved as an audio file with the '.mp3' extension, and each file is given a random number from 1 to
20000000, generated with random.randint(). The resulting .mp3 file name is stored in 'audio_file'. Finally, to save
this audio file we use the command shown in (2).
tts.save(audio_file)   (2)
This command (2) saves the audio file in the system (e.g., 'audio24854.mp3').
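Putting (1) and (2) together, the speak function might look like the sketch below. It assumes the third-party gTTS package (`pip install gTTS`, which needs internet access); playback of the saved file is not described in the text and is left out. The helper name make_audio_filename is hypothetical, and the code targets Python 3.

```python
import random


def make_audio_filename(rng=random):
    """Build a name like 'audio24854.mp3' from a random integer in [1, 20000000]."""
    return "audio" + str(rng.randint(1, 20000000)) + ".mp3"


def voice_assistant_speak(audio_string):
    """Convert the assistant's response text to speech and save it as an .mp3 file."""
    # Imported here so the filename helper works even without gTTS installed
    from gtts import gTTS

    tts = gTTS(text=audio_string, lang="en")   # (1): text -> speech, English
    audio_file = make_audio_filename()
    tts.save(audio_file)                       # (2): write the mp3 to disk
    return audio_file
```

Passing a seeded `random.Random` instance as `rng` makes the file names reproducible during testing.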
Text-To-Speech:
Text-to-Speech (TTS) refers to computers' ability to read text aloud. A TTS Engine converts written text into a
phonemic representation, which is then converted into waveforms that can be output as sound. Third-party
publishers offer TTS engines with various languages, dialects, and specialized vocabularies.
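The two stages named above (text to phonemes, phonemes to waveform) can be shown with a deliberately toy sketch. The three-word ARPAbet-style lexicon is hypothetical, and "synthesis" here is just one sine tone per phoneme; real engines use large pronunciation dictionaries plus letter-to-sound rules and far more sophisticated waveform generation.

```python
import math
import struct
import wave

# Hypothetical mini pronunciation lexicon (ARPAbet-style symbols)
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "done": ["D", "AH", "N"],
    "listening": ["L", "IH", "S", "AH", "N", "IH", "NG"],
}


def text_to_phonemes(text):
    """Stage 1: map each word to its phonemes; unknown words become '<unk>'."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON.get(word.strip(".,!?"), ["<unk>"]))
    return phonemes


def phonemes_to_waveform(phonemes, rate=16000, seconds_per_phoneme=0.08):
    """Stage 2 (toy): one sine tone per phoneme, pitch derived from the symbol."""
    n = int(round(rate * seconds_per_phoneme))
    samples = []
    for ph in phonemes:
        freq = 200 + (sum(ord(c) for c in ph) % 300)
        samples.extend(0.3 * math.sin(2 * math.pi * freq * i / rate) for i in range(n))
    return samples


def save_wav(samples, path, rate=16000):
    """Write the samples as 16-bit mono PCM so they can be played as sound."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
```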
Speech Recognition:
The system converts speech input to text using Google's online speech recognition service. The voice input is
temporarily stored in the system before being sent to the Google cloud for speech recognition, where it is matched
against speech corpora held on Google's servers. The equivalent text is then returned and fed into the central
processor of the assistant.
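The paper does not name the Python package used to reach Google's recognizer. A common choice is the third-party SpeechRecognition package (imported as `speech_recognition`, with PyAudio for microphone access); the sketch below assumes it, and the function name listen_for_command is hypothetical. It also prints "done listening", matching the behaviour described in Section III.

```python
def listen_for_command(language="en-US"):
    """Record one spoken command and return it as lower-case text ('' on failure)."""
    # Assumed third-party package: pip install SpeechRecognition (plus PyAudio for the mic)
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise briefly so the energy threshold adapts to the room
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        # listen() returns once it detects a pause after the speech
        audio = recognizer.listen(source)
    print("done listening")
    try:
        # Sends the recorded audio to Google's web speech API and returns the transcript
        text = recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ""  # speech was unintelligible
    except sr.RequestError:
        return ""  # network problem or service unavailable
    print("You said:", text)
    return text.lower()
```

The returned text is what would be handed on to command matching.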
API Calls:
API is an abbreviation for Application Programming Interface. An API is a software interface that allows two
applications to communicate with one another. In other words, an API is the messenger that sends your request
to the provider you're requesting from and then returns the response to you.

Context Extraction:
Context extraction (CE) is the process of automatically extracting structured information from unstructured
and/or semi-structured machine-readable documents. Most of the time, this involves processing human-language
texts with natural language processing (NLP). Recent developments in multimedia document processing, such as
automatic annotation and content extraction from images, audio and video, can also be viewed as context
extraction.
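For a voice assistant, context extraction often means turning a free-form utterance into an intent plus named slots. The regular-expression patterns below are hypothetical and only cover a few phrasings; they stand in for whatever NLP technique a production system would use.

```python
import re

# Hypothetical patterns: each maps a phrasing to an intent and named slots
PATTERNS = [
    ("weather", re.compile(r"weather (?:in|at|for) (?P<location>[a-z ]+)")),
    ("google_search", re.compile(r"search (?:for )?(?P<query>.+)")),
    ("open_website", re.compile(r"open (?P<site>\S+)")),
]


def extract_context(utterance):
    """Return structured data such as {'intent': 'weather', 'location': 'kottayam'}."""
    text = utterance.lower().strip()
    for intent, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            slots = {key: value.strip() for key, value in match.groupdict().items()}
            return {"intent": intent, **slots}
    return {"intent": "unknown"}
```

The unstructured sentence "What's the weather in Kottayam" becomes a small dictionary that the rest of the program can act on directly.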
IV. RESULTS AND DISCUSSION
The required Python packages were installed and the code was implemented using the PyCharm integrated
development environment (IDE). The Python code we developed runs in both Python 2.7 and Python 3.x. Below are a
few outputs from our AI-based voice assistant.
1. Google Search Output
As shown in Figure 3, when we ask the voice assistant to search for 'Akshay Kumar', it receives the request and
performs the action by searching Google.
Figure 3: Output screen of performing Google Search
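A search like the one in Figure 3 can be performed by opening a Google results URL in the default browser. This is a minimal sketch using Python 3's standard `webbrowser` and `urllib.parse` modules; the function names are hypothetical and the paper does not show its own implementation.

```python
import webbrowser
from urllib.parse import quote_plus


def google_search_url(query):
    """Build a Google search URL with the query safely encoded."""
    return "https://www.google.com/search?q=" + quote_plus(query)


def google_search(query, open_browser=True):
    """Open the search results in the default browser and return the URL used."""
    url = google_search_url(query)
    if open_browser:
        webbrowser.open(url)
    return url
```

For instance, `google_search("Akshay Kumar")` opens `https://www.google.com/search?q=Akshay+Kumar`.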

2. Weather
As shown in Figure 4, when we ask the voice assistant about the weather, it receives the request and responds by
printing the weather at that location at that time.
Figure 4: Output screen of displaying Weather Forecast

3. Generate PDF
As shown in Figure 5, if the user wants to do two things at once, such as generating a PDF alongside other tasks,
the system adds the user's command data to a PDF and generates it. Information the user wants to keep in PDF
format is thus converted from speech into a PDF and stored at a specific location.
Figure 5: Output Screen of Generating Pdf
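The paper does not name the library used to generate the PDF. As a dependency-free sketch, the code below writes the transcribed lines into a minimal single-page PDF using only the standard library; in practice a dedicated library (for example reportlab or fpdf) would be the more usual choice. It assumes Latin-1 text, replacing other characters with '?'.

```python
def _escape(text):
    """Escape characters that are special inside PDF literal strings."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_pdf(lines):
    """Return the bytes of a one-page PDF showing each line in 12 pt Helvetica."""
    ops = ["BT", "/F1 12 Tf", "72 770 Td", "14 TL"]  # font, start position, line spacing
    for line in lines:
        ops.append("(" + _escape(line) + ") Tj T*")  # draw text, move to next line
    ops.append("ET")
    stream = "\n".join(ops).encode("latin-1", "replace")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length " + str(len(stream)).encode() + b" >>\nstream\n" + stream + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += str(number).encode() + b" 0 obj\n" + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 " + str(len(objects) + 1).encode() + b"\n"
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += ("%010d 00000 n \n" % off).encode()  # fixed 20-byte xref entries
    out += b"trailer\n<< /Size " + str(len(objects) + 1).encode() + b" /Root 1 0 R >>\n"
    out += b"startxref\n" + str(xref_pos).encode() + b"\n%%EOF\n"
    return bytes(out)


def save_pdf(lines, path):
    """Store the generated PDF at the given location."""
    with open(path, "wb") as f:
        f.write(build_pdf(lines))
```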

V. CONCLUSION
The project is very useful and has great potential for use in various industries. Although the program primarily
focuses on using a personal assistant on websites, voice and the concept of voice recognition can be applied in many
other industries. It is more convenient, saves a lot of time, and is especially useful for those who have difficulty
with manual operations. More applications and products may be developed using voice technology in the future,
controlling and, in a certain sense, changing forms of work that differ considerably from the traditional ones. A
voice-driven program is useful both for those who prefer voice control and for those who have difficulties or
disabilities with manual operations. The primary objective of the program is to provide voice services, allowing
more people to benefit from it. Beyond the program itself, we as developers learned a lot from the project: it was
quite different from anything we had experienced before in its working model, the volume of tasks, and the
challenges we encountered. In conclusion, the project improved our development experience and programming
skills, and it showed us that teamwork is essential for long-term, demanding development.
VI. REFERENCES
[1] Agrawal, Nivedita Singh, Gaurav Kumar, Dr. Diwakar Yagyasen, Mr. Surya Vikram Singh, "Voice Assistant
Using Python", An International Open Access, Reviewed, Refereed Journal, Unique Paper ID: 152099,
Volume 8, Issue 2, pp. 419-423.
[2] George Terzopoulos, Maya Satratzemi, "Voice Assistants and Smart Speakers in Everyday Life and in
Education", Department of Applied Informatics, University of Macedonia, Thessaloniki, Greece.
[3] Deepak Shende, Ria Umabiya, Monika Raghorte, Aishwarya Bhisikar, Anup Bhange, "AI Based Voice
Assistant Using Python", International Journal of Emerging Technologies and Innovative Research
(www.jetir.org), ISSN 2349-5162, Vol. 6, Issue 2, pp. 506-509, February 2019.
[4] Saadman Shahid Chowdury, Atiar Talukdar, Ashik Mahmud, Tanzilur Rahman, "Domain Specific
Intelligent Personal Assistant with Bilingual Voice Command Processing", IEEE, 2018.
[5] Amrita Tulshan, Sudhir Dhage, "Survey on Virtual Assistant: Google Assistant, Siri, Cortana,
Alexa", 4th International Symposium SIRS 2018, Bangalore, India, September 19–22, 2018, Revised
Selected Papers, 2019, doi:10.1007/978-981-13-5758-9_17.
[6] Polyakov EV, Mazhanov MS, AY Voskov, LS Kachalova MV, Polyakov SV, "Investigation and Development
of the Intelligent Voice Assistant for the IoT Using Machine Learning", Moscow Workshop on Electronic
Technologies, 2018.
[7] Dr. Kshama V. Kulhalli, Dr. Kotrappa Sirbi, Mr. Abhijit J. Patankar, "Personal Assistant with Voice
Recognition Intelligence", International Journal of Engineering Research and Technology, ISSN 0974-3154,
Volume 10, Number 1 (2017).
[8] K. Noda, H. Arie, Y. Suga, T. Ogata, "Multimodal Integration Learning of Robot Behavior Using Deep Neural
Networks", Elsevier: Robotics and Autonomous Systems, 2014.
[9] N. Thakur, A. Hiwrale, S. Selote, A. Shinde, N. Mahakalkar, "Artificially Intelligent Chatbot".
[10] J. Huang, M. Zhou, D. Yang, "Extracting Chatbot Knowledge from Online Discussion Forums", in IJCAI,
Vol. 7, pp. 423-428, January 2007.
[11] Khawir Mahmood, Tausfer Rana, Abdur Rehman Raza, "Singular Adaptive Multi-Role Intelligent Personal
Assistant (SAM-IPA) for Human Computer Interaction", International Conference on Open Source Systems
and Technologies, 2018.
[12] Piyush Vashishta, Juginder Pal Singh, Pranav Jain, Jitendra Kumar, "Raspberry PI Based Voice-Operated
Personal Assistant", International Conference on Electronics and Communication and Aerospace
Technology (ICECA), 2019.
[13] Veton Kepuska, Gamal Bohota, "Next Generation of Virtual Assistant (Microsoft Cortana, Apple Siri,
Amazon Alexa and Google Home)", IEEE Conference, 2018.
