
A Project Report on

Product to support Elderly and Disabled for Recognizing and Responding to


Daily Sounds through Colour Depiction and Gestures

By

D. Vaishnavi 17251A05C8

M. Sreevidya 17251A05E6

C. Shruthi 18255A0525

Department of Computer Science & Engineering


G. Narayanamma Institute of Technology & Science (For Women)
(Autonomous)
Approved by AICTE, New Delhi & Affiliated to JNTUH, Hyderabad
Accredited by NBA & NAAC, an ISO 9001:2015 Certified Institution
Shaikpet, Hyderabad-500104

June 2021
A Project Report on

Product to support Elderly and Disabled for Recognizing and Responding to


Daily Sounds through Colour Depiction and Gestures

Submitted to the Department of Computer Science & Engineering, GNITS, in partial fulfillment of the
academic requirements for the award of B.Tech (CSE) under JNTUH

By

D. Vaishnavi 17251A05C8

M. Sreevidya 17251A05E6

C. Shruthi 18255A0525

Under the guidance of


Dr. A. Sharada
Professor, CSE Dept.

Department of Computer Science & Engineering


G. Narayanamma Institute of Technology & Science (For Women)
(Autonomous)
Shaikpet, Hyderabad- 500 104.

Affiliated to
Jawaharlal Nehru Technological University Hyderabad
Hyderabad – 500 085

June 2021
G. Narayanamma Institute of Technology & Science
(Autonomous) (For Women)
Shaikpet, Hyderabad – 500 104.
Department of Computer Science & Engineering

Certificate
This is to certify that the project report on “Product to support Elderly and Disabled for Recognizing
and Responding to Daily Sounds through Colour Depiction and Gestures” is a bonafide work carried out by
D. Vaishnavi (17251A05C8), M. Sreevidya (17251A05E6) and C. Shruthi (18255A0525) in partial
fulfillment of the requirements for the award of the B.Tech degree in Computer Science & Engineering, G. Narayanamma
Institute of Technology & Science, Shaikpet, Hyderabad, affiliated to Jawaharlal Nehru Technological University,
Hyderabad, under our guidance and supervision.

The results embodied in the project work have not been submitted to any other University or Institute
for the award of any degree or diploma.

Internal Guide                                              Head of the Department


Dr. A. Sharada                                              Dr. M. Seetha
Professor                                                   Professor & Head, Department of CSE

External Examiner
Acknowledgements

We would like to express our sincere thanks to Dr. K. Ramesh Reddy, Principal, GNITS, for providing
the working facilities in the college.

Our sincere thanks and gratitude to Dr. M. Seetha, Professor and HOD, Dept. of CSE, GNITS for all the
timely support and valuable suggestions during the period of our major project.

We would like to extend our sincere thanks and gratitude to Dr. K. Venugopala Rao, Professor and overall
project coordinator, Dept. of CSE, GNITS, for all the valuable suggestions and guidance during the period of
our project.

We are extremely thankful to Dr. K. Venugopala Rao, Professor, Dr. D. V. L. Parameswari, Sr. Assistant
Professor, Mrs. P. Sunitha Devi, Asst. Professor, and Mr. B. Vamshi, Asst. Professor, Dept. of
CSE, GNITS, project coordinators, for their encouragement and support throughout the project.

We are extremely thankful and indebted to our internal guide, Dr. A. Sharada, Professor, Dept. of CSE,
GNITS, for her constant guidance, continuous advice, encouragement and moral support throughout the
major project.

Finally, we would also like to thank all the faculty and staff of the CSE Department who helped us directly or
indirectly, and our parents and friends for their cooperation in completing the major project work.

D. Vaishnavi 17251A05C8

M. Sreevidya 17251A05E6

C. Shruthi 18255A0525
ABSTRACT

Rapidly evolving technologies have, to an extent, been assisting the elderly and people with disabilities through
voice assistants. These devices aim to provide additional access to individuals with cognitive difficulties,
impairments and disabilities. Furthermore, the rising elderly and disabled population increasingly needs to cope
with slowly debilitating conditions on its own. These are the people who need assistive devices to perform their
day-to-day activities and maintain a healthy, independent and dignified lifestyle. But if voice is the future of
everything, what about the people who cannot hear or speak? This question is the main idea behind the project.
To tackle this problem, a web interface is built that includes a Gesture Recognition System and a Colour
Depiction System integrated with a voice assistant.

The Gesture Recognition System helps the elderly and the disabled by offering them a perceptual
computing interface that captures and interprets hand gestures. The gestures recognized by the device enable
users to communicate and act, by converting their gestures into text (displayed on the web interface) and
speech. This use of gesture recognition helps to bridge the gap between the abled and the disabled.

The Colour Depiction System aids the elderly and disabled community in identifying and reacting to
various common events in their surroundings through light emitted from Smart Bulbs, which are lit on
activation by surrounding sounds. Common events include opening the door when someone rings the
doorbell, attending to a phone call or responding to a person.

This project, a potential technological solution, helps the elderly and disabled overcome the problems
faced with existing human assistance and improves their living standards through the application of science
and technology. The Voice Recognition System predicted and executed all the commands correctly, the
Gesture Recognition System built on the KNN model gave the right predictions for the gestures posed by the
users, and the Colour Depiction System correctly recognized the surrounding environmental sounds and
accurately reflected the corresponding colours on the Smart Bulb. Hence, the integration of these three modules
gave the best output, making this device well suited for the elderly and for deaf and dumb people.

Contents
S.No.                                                                                          Page No.

ABSTRACT iv
1 INTRODUCTION 1
1.1 Existing Systems 3

1.1.1 Voice Recognition Systems 3

1.1.2 Gesture Recognition Systems 5

1.1.3 Smart Bulb Technology 6

1.2 Advantages and Disadvantages 7

1.3 Proposed System 8

1.4 Objectives 8
1.5 Methodology 9
1.6 Organization of the Project 10
2 ASSISTIVE DEVICE THROUGH GESTURES AND COLOUR DEPICTION 11
2.1 Architecture 11
2.1.1 Voice Recognition System Architecture 12

2.1.2 Gesture Recognition System Architecture 12

2.1.3 Colour Depiction System Architecture 12

2.2 Module Description 13


2.2.1 Voice Recognition Module 13

2.2.2 Gesture Recognition Module 13

2.2.3 Colour Depiction Module 14

2.3 Algorithms Used 15


3 IMPLEMENTATION/ CODING 16
3.1 Description of the Technologies Used 16

3.2 Description of the IoT Components Used 18

3.3 Implementation of the Integrated System 19


3.3.1 Implementation of Voice Recognition System 19

3.3.2 Implementation of Gesture Recognition System 20

3.3.3 Implementation of Colour Depiction System 21

3.4 UML Diagram 22


3.4.1 Use Case Diagram 23
3.4.2 Sequence Diagram 27
3.4.3 Activity Diagram 30
3.5 Dataset Used 33
4 RESULTS AND DISCUSSIONS 35
4.1 Discussion of Results 35
4.1.1 Results of Voice Recognition System 35
4.1.2 Results of Gesture Recognition System 35
4.1.3 Results of Colour Depiction System 36
4.2 Screenshots of the Voice Recognition Module 37

4.3 Screenshots of the Gesture Recognition Module 39

4.4 Screenshots of the Colour Depiction Module 47

4.5 Graphical Representation 49


5 CONCLUSIONS AND FUTURE SCOPE 50
REFERENCES 51
List of Figures
Fig. No. Name of the figure Page No.

Fig 1.1 Gesture Recognition 2


Fig 1.2 Voice Recognition 2
Fig 1.3 Smart Bulb 2
Fig 1.4 Amazon Alexa 3
Fig 1.5 Siri 4
Fig 1.6 Google Assistant 4
Fig 1.7 Wired Gloves 5
Fig 1.8 Depth-aware Camera 5
Fig 1.9 Gesture based Video Controller 6
Fig 1.10 Smart Bulb (Smart Lighting) 6
Fig 2.1 Architecture of the proposed system 11
Fig 3.1 Use Case Diagram of Voice Recognition System 24
Fig 3.2 Use Case Diagram of Gesture Recognition System 25
Fig 3.3 Use Case Diagram of Colour Depiction System 26
Fig 3.4 Sequence Diagram of Voice Recognition System 27
Fig 3.5 Sequence Diagram of Gesture Recognition System 28
Fig.3.6 Sequence Diagram of Colour Depiction System 29
Fig 3.7 Activity Diagram of Voice Recognition System 30
Fig 3.8 Activity Diagram of Gesture Recognition System 31
Fig 3.9 Activity Diagram of Colour Depiction System 32
Fig 4.1 Voice Assistant 37
Fig 4.2 Depicts the Home Page of the Web Application 39
Fig 4.3 Depicts the training of Start as a wake word. 40
Fig 4.4 Depicts the training of Stop as a wake word. 40
Fig 4.5 To capture Custom Gestures 41
Fig 4.6 To clear or change a sign for a specific gesture. 41
Fig 4.7 Depicts the Retraining option of a gesture 42
Fig 4.8 Predicting the start gesture. 42
Fig 4.9 Predicting the custom gestures 43
Fig 4.10 Predicting the Stop Gesture 43
Fig 4.11 Predicting Alexa, What is your name? 44
Fig 4.12 Predicting Alexa, open Twitter 44
Fig 4.13 Predicting Alexa, where am I? 45
Fig 4.14 Predicting Alexa, how are you? 45
Fig 4.15 Predicting Alexa, is it raining today? 46
Fig 4.16 Predicting Alexa, good afternoon. 46
Fig 4.17 Listening and recognizing the surrounding daily life sounds 47
Fig 4.18 Color change after predicting the Class 48
Fig 4.19 Graphical Representation 49
1. INTRODUCTION

Present-day assistive devices are not playing their fullest role in easing the lives of particular sections
of society such as the elderly and the disabled. There are still plenty of people who feel intimidated by
technology and the services it provides, and plenty more remains to be done to help boost these numbers
further. Elderly people are highly vulnerable to exploitation by institutions and individuals owing to
worsening health, declining cognitive abilities and the reduction of social support over time. Other
sections of society, such as the disabled, face an equal or arguably greater risk of exploitation because of
their limitations.

According to a 2016 report by the Ministry of Statistics and Programme Implementation, India has
103.9 million elderly people above the age of 60, who constitute about 8.5 per cent of the population. The elderly
population has grown at about 3.5 per cent per year, double the rate for the population as a whole; a 2014
report shows that while India will be the youngest country in the world by 2050, 20 per cent of the
population will be 'elderly'[5]. According to the 2011 census, in India, 20% of disabled persons have a
disability in movement, 19% have a disability in seeing, another 19% have a disability in hearing, and 8%
have multiple disabilities.

The common problems that are faced by the elderly and the disabled population are:

1) Chronic health conditions, Cognitive health conditions, Mental health conditions, Physical injuries,
Sensory impairments and Loneliness[8].

2) They cannot use the voice recognition and voice search features in smartphones.

3) They are unable to use assistants like Google Assistant or Apple's Siri, because all those apps are based
on voice control.

4) There are no existing devices that can help them get assistance through the recognition of gestures
when voice-based assistance is not possible.

5) There is a pressing need for technological advancements that increase the ability of the elderly
and disabled to communicate socially by identifying daily life sounds through a system of Colour
Depiction using Smart Bulbs, and to act correspondingly.

The current state of the art in assistant systems such as Voice Recognition devices and Smart Bulbs, as shown
in Figures 1.1, 1.2 and 1.3, is purely voice based, which the elderly and disabled people may not feel
comfortable using. These devices cannot read gestures and reciprocate [1].

Assistive devices and technologies are those whose primary purpose is to maintain or improve an
individual’s functioning and independence to facilitate participation and to enhance overall well-being.
They can also help prevent impairments and secondary health conditions. Examples of assistive devices
and technologies include wheelchairs, prostheses, hearing aids, visual aids, and specialized computer
software and hardware that increase mobility, hearing, vision, or communication capacities.

The proposed system is a human assistive application capable of performing voice recognition, gesture
recognition and Colour Depiction. This device is developed mainly with the intention of resolving exhaustive
problems of paramount importance related to the elderly and the disabled (deaf and dumb). The product helps
the target audience perform their routine activities and also helps them tackle their loneliness. The system can
become an interface for the disabled to socially communicate with others easily, either through gestures or
colour identification. This assistive technology enables people to live healthy, productive, independent and
dignified lives, and enables them to participate in societal interactions. Finally, the proposed system is a smart
assistant that eases the lives of humans with a blend of gestures and colours.

This report highlights the issues faced by the target group (the elderly and the disabled population) and
presents a literature survey of the existing devices along with a description of their disadvantages.
It then details the proposed system with its objectives, architecture, implementation and results when the
entire system is integrated.

1.1 Existing Systems

This section highlights the existing systems with regard to voice recognition, gesture recognition and smart
bulb technology. There are various voice recognition applications such as Google Assistant, Alexa, Siri etc.;
these Voice Recognition Systems are capable of executing a few commands given through voice[7].
Moving on to Gesture Recognition Systems, their applicability has still not reached its potential and is still under
research and innovation. Integrated services of Voice Recognition and Gesture Recognition Systems
are not yet realised to their full potential. A few of the existing Voice Recognition Systems, Gesture Recognition
Systems and Smart Bulb technologies are detailed below:

1.1.1 Voice Recognition Systems:

a) Alexa:
Amazon Alexa, shown in Fig. 1.4 and also known simply as Alexa, is a virtual assistant AI
technology developed by Amazon, first used in the Amazon Echo smart speakers developed by Amazon
Lab126[2]. It is capable of voice interaction, music playback, making to-do lists, setting alarms, streaming
podcasts, playing audiobooks, and providing weather, traffic, sports, and other real-time information, such
as news. Alexa can also control several smart devices using itself as a home automation system.

Fig 1.4: Amazon Alexa

b) Siri:
Siri is the voice-controlled digital assistant built into iOS[6]. One can ask Siri to perform a variety
of tasks, such as setting reminders or getting travel directions. But it is yet to incorporate home automation
tasks.

Fig 1.5: Siri

c) Google Assistant:
Google Assistant is an artificial intelligence–powered virtual assistant developed by Google
that is primarily available on mobile and smart home devices. Users primarily interact with the Google
Assistant through natural voice, though keyboard input is also supported [4]. In the same nature and manner
as Google Now, the Assistant is able to search the Internet, schedule events and alarms, adjust hardware
settings on the user's device, and show information from the user's Google account. The Google Assistant is
depicted in Fig. 1.6 below.

Fig 1.6: Google Assistant

1.1.2 Gesture Recognition Systems:

a) Wired gloves

These can provide input to the computer about the position and rotation of the hands using magnetic
or inertial tracking devices[3]. Furthermore, some gloves can detect finger bending with a high degree of
accuracy (5-10 degrees), or even provide haptic feedback to the user, which is a simulation of the sense of
touch. The Wired Gloves system during its usage is shown in Fig. 1.7.

Fig 1.7: Wired Gloves


b) Depth-aware cameras

As shown in Fig 1.8, specialized cameras such as structured-light or time-of-flight cameras can generate a
depth map of what is being seen at short range, and this data can be used to approximate a 3D representation
of the scene[10]. These cameras can be effective for the detection of hand gestures due to their short-range
capabilities.

Fig 1.8: Depth-aware camera

c) Gesture-based controllers

These controllers act as an extension of the body, so that when gestures are performed some of their
motion can be conveniently captured by software. An example of emerging gesture-based motion capture
is skeletal hand tracking, which is being developed for virtual reality and augmented reality
applications, as in Fig 1.9.

Fig 1.9: Gesture based Video Controller

1.1.3 Smart Bulb Technology

A smart bulb is an internet-capable LED light bulb that allows lighting to be customized, scheduled and
controlled remotely[9]. Smart bulbs are among the most immediately successful offerings in the growing
category of home automation and Internet of Things (IoT) products. The following Fig 1.10 depicts the
Smart bulb Technology.

Fig 1.10: Smart Bulb (Smart Lighting)

1.2 Advantages and Disadvantages:
The advantages and disadvantages of the Existing Systems are as follows:

i) Advantages of Voice Recognition Systems:

● Makes services more personalized

● Saves time, since talking is faster than typing

● Boosts the productivity levels of users

● Makes technology accessible to more people

ii) Disadvantages of Voice Recognition Systems:

● Configuration Issues

● Customization Issues

● Training Issues

iii) Advantages of Gesture Recognition Systems:

● Immediate and powerful interaction

● Intuitiveness and enjoyability

● Reduces the number of touch points

● Makes routine tasks easy

iv) Disadvantages of Gesture Recognition Systems:

● Difficulties with public installations

● Irrelevant objects might overlap with the hand gestures

● Ambient lighting affects gesture detection

1.3 Proposed System:

The proposed system is a human assistive application capable of performing voice recognition, gesture
recognition and Colour Depiction. This device is developed mainly with the intention of resolving exhaustive
problems of paramount importance related to the elderly and the disabled (deaf and dumb). The product helps
the target audience perform their routine activities and also helps them tackle their loneliness. The system can
become an interface for the disabled to socially communicate with others easily, either through gestures or
colour identification. This assistive technology enables people to live healthy, productive, independent and
dignified lives, and enables them to participate in societal interactions. Finally, the proposed system is a smart
assistant that eases the lives of humans with a blend of gestures and colours.

1.4 Objectives:

The objectives of the Proposed System are as follows:

● To embody an assistance system to support elderly and disabled people (specifically the deaf and dumb) in
doing their daily activities.
● To create an uplifting system for the elderly that tackles their problem of effective interaction and
communication.
● To provide a user-friendly environment where the user can be served better.

● To develop a Gesture Recognition System that recognizes the gestures of disabled/elderly people and
responds accordingly.

● To embed a system of Colour Depiction for the elderly and disabled to respond to various daily life
sounds through the identification of colours depicted by the Smart Bulbs.

1.5 Methodology:

The system is built on the following steps/methods:

STEP 1: BUILDING A VOICE ASSISTANT

In the initial step the voice assistant is built. The voice assistant uses Amazon Voice Cloud Services
to decipher what has been said by the user, and then either answers the question or executes the command.

● Installing and setting up Raspberry Pi.

● Configuring the sound system of the Pi.
● Creating an Amazon Developer Account.
● Installing and configuring Alexa Voice Service on the Pi.
● Testing and running the voice assistant.

STEP 2: DEVELOPING GESTURE RECOGNITION SYSTEM

The objective here is to develop the gesture recognition system through the following steps:

● Installing Deep Learning and TensorFlow modules.

● Creating the KNN model.
● Training and classifying actions.
● Deploying and testing the application.

STEP 3: EMBEDDING COLOUR DEPICTION SYSTEM (with the help of the Voice Assistant)

● Capture the sound.

● Train the model and predict the category of the daily life sound.
● Output a voice command to activate the Voice Assistant.
● Depict the specific colour specified by the voice command through the Smart Bulb.
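As a rough, high-level illustration of how these three steps hand data to one another, a minimal Python sketch is given below. Every helper in it is a placeholder stub introduced only for this illustration; it is not the project's actual code.

    # Illustrative sketch of the integrated flow; every helper below is a
    # placeholder stub, not the project's real implementation.

    COLOUR_FOR_CLASS = {"Siren": "red", "Children_Playing": "dark green", "Dog_Bark": "brown"}

    def predict_gesture_text(frame):
        return "Alexa, how are you?"          # stand-in for the KNN gesture classifier

    def classify_sound(clip):
        return "Siren"                        # stand-in for the sequential neural network

    def speak(text):
        print("VOICE OUT:", text)             # stand-in for text-to-speech output

    def gesture_path(frame):
        """Gesture -> mapped query text -> voice output for the voice assistant."""
        speak(predict_gesture_text(frame))

    def sound_path(clip):
        """Daily-life sound -> class label -> colour command for the smart bulb."""
        label = classify_sound(clip)
        speak("set the light to " + COLOUR_FOR_CLASS[label])

    if __name__ == "__main__":
        gesture_path(frame=None)
        sound_path(clip=None)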

1.6 Organization of the Project

In this project, Chapter 1 deals with the Introduction to the problem under consideration, existing
systems, their advantages and disadvantages, Proposed system, Objectives of the Project, Methodology and
Organization of the Project. Chapter 2 contains the Architecture of the proposed model, Module
Description and the Algorithms Used in building the Project. Chapter 3 deals with the Description of the
Technologies and the IoT components used to develop the project. It depicts the Implementation of the entire System
and also picturizes the UML Diagram followed by the detailing of the Dataset Used. Chapter 4 deals with
the Results of the working system and discusses these results with suitable screenshots serving as proof for
the working of the system. It also displays the Graphical Representation of the Training Vs Validation
accuracy. The final chapter, Chapter 5, contains the Conclusion and Future Scope followed by various
References from different sources like Journals and links to websites that contributed to the work.

2. ASSISTIVE DEVICE THROUGH GESTURES AND COLOR DEPICTION

The proposed system is a human assistive application capable of performing voice recognition, gesture
recognition and Colour Depiction. This device is developed mainly with the intention of resolving exhaustive
problems of paramount importance related to the elderly and the disabled (deaf and dumb). The product helps
the target audience perform their routine activities and also helps them tackle their loneliness. The system can
become an interface for the disabled to socially communicate with others easily, either through gestures or
colour identification. This assistive technology enables people to live healthy, productive, independent and
dignified lives, and enables them to participate in societal interactions. Finally, the proposed system is a smart
assistant that eases the lives of humans with a blend of gestures and colours.

2.1 Architecture

Fig 2.1: Architecture of the proposed system

The architecture of the proposed system is laid out above in Fig 2.1, where the interaction between
the subsystems is depicted.

2.1.1 Voice Recognition System Architecture:

1. The user inputs the query/message through voice.


2. The query is captured through the microphone.
3. The query is sent to the Amazon Voice Services cloud through the code present on the SD card of
the Pi.
4. The response is obtained from the cloud.
5. This output is converted to speech and is given to the speakers.
6. The required output is given back to the user through the speakers.

2.1.2 Gesture Recognition System Architecture:

1. Gestures are input through the hand(s).


2. The gesture(s) is/are captured through the desktop webcam.
3. The gesture(s) is/are trained in the TensorFlow cloud and stored in the classifier.
4. Gesture-to-text conversion is performed and the text is output to the screen.
5. The same text is sent to the Voice Recognition System and is output as voice.
6. The required command is executed.

When the above modules are integrated, the input is first taken from the user in the form of
gestures. These gestures are processed in the background and converted to voice output. These voice
outputs are then captured by the Voice Assistant System, and the respective answers are given to the
user or the required action is executed.

2.1.3 Colour Depiction System Architecture:

1. Capture the surrounding daily sounds, classify the sounds and output the command to awaken the
voice recognition system.

2. The voice recognition system triggers the colour change in the smart bulb corresponding to the
instruction received, which signals the user to react accordingly.

2.2 Module Description
The system has three modules, namely:
1. Voice Recognition Module
2. Gesture Recognition Module
3. Colour Depiction Module

2.2.1 Voice Recognition Module

The first step in building the Voice Assistant is installing and setting up Raspberry Pi by
downloading Raspbian (Noobs), which is the Operating System of the Raspberry Pi. Then, set up the initial
configuration modules (language used etc.) along with the network connectivity. Attach all the other
components like the Microphone, Speakers, Screen, Mouse, Keyboard etc. and proceed to configuring the
sound system of the Pi. In the further steps, create an Amazon Developer Account from where the Alexa
Voice Services will be installed and imported on the Pi.

Execute certain BASH commands in order to configure the Voice services on the Pi. Change the
settings so that the Voice Assistant will auto start on receiving the Wake Word. Test the voice assistant to
check its functioning.

2.2.2 Gesture Recognition System Module

Build a user interface application for the gesture recognition module through which the user can interact
with the system using gestures. Create the KNN model to train and classify Indian Sign Language gestures,
including the wake gestures called START and STOP. As part of the internal functionality, every image in the
training set is mapped to its corresponding query and stored internally inside the model.

After the application is deployed, the user is prompted to pose gestures and begin communicating with
the interface. The posed gesture is converted both to voice and text; this voice is taken as input by the voice
assistant, which performs the task at hand or responds to the given pose, and the response is converted back
to text and displayed on the screen, thus completing the functionality of the module.

The gestures, along with their symbolic meanings, are listed below:

● What is my name?

● Open Twitter.

● Where am I?

● How are you?

● Is it raining today?

● Good afternoon!

● A START Gesture and a STOP Gesture, which are used as WAKE words to start and pause/end

the Prediction Phase respectively.

These are the various gesture questions that were trained for this system. All these gestures are
trained with a minimum of 30 image samples each to ensure high accuracy in the prediction phase.
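As a rough illustration of how such a set of 30 samples per gesture could be collected, the Python/OpenCV sketch below saves webcam frames for one named gesture. The actual project captures samples through the JavaScript web interface, so this script, its folder layout and key bindings are only assumptions.

    import os
    import cv2  # pip install opencv-python

    def capture_gesture_samples(gesture_name, num_samples=30, out_dir="gesture_dataset"):
        """Save `num_samples` webcam frames for one gesture (illustrative helper)."""
        save_path = os.path.join(out_dir, gesture_name)
        os.makedirs(save_path, exist_ok=True)
        cam = cv2.VideoCapture(0)                          # default webcam
        saved = 0
        while saved < num_samples:
            ok, frame = cam.read()
            if not ok:
                break
            cv2.imshow("Pose the gesture; SPACE saves, q quits", frame)
            key = cv2.waitKey(1) & 0xFF
            if key == ord(" "):                            # space bar saves the current frame
                cv2.imwrite(os.path.join(save_path, f"{saved:03d}.jpg"), frame)
                saved += 1
            elif key == ord("q"):
                break
        cam.release()
        cv2.destroyAllWindows()
        return saved

    if __name__ == "__main__":
        capture_gesture_samples("START")                   # e.g. the START wake gesture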

2.2.3 Color Depiction System Module

Capture and create a dataset of various daily life sounds, and create a sequential neural network model
with 4 dense layers and 2 activation functions. Convert the sound clips to frequency arrays, train
the model with these frequency arrays and categorize the sounds accordingly. Convert the predicted output
to voice and send commands/instructions to the voice assistant.

This module includes algorithms such as neural networks and also uses a package called
Librosa, which effectively analyzes the audio clips present in the surroundings of the user.

Based on the instructions received, the colour of the smart bulb is changed. Establish this connectivity
between the daily sounds and the smart bulb light using the voice recognition system and observe the
corresponding colour changes in the smart bulb.

The various sounds and the corresponding colours that were used in the dataset are as follows:

● Car_Horn - Yellow

● Children_Playing - Dark Green

● Dog_Bark - Brown

● Drilling - Grey

● Engine_Idling - Orange

● Air_Conditioner - Blue

● Gun_Shot - Gold

● Jack_Hammer - Dark Violet

● Siren - Red

● Street_Music - Web Green

2.3 Description of the Algorithms


For the implementation of the Proposed System, the following Algorithms were used:

a) KNN Image Classifier Algorithm:

K-Nearest Neighbour is a supervised learning technique. The algorithm assumes similarity
between the new case/data and the available cases and puts the new case into the category that is most similar
to the available categories. It performs the gesture recognition for the proposed system based on the following
steps:

o Step 1: The user provides and labels the 3 sets of input images for the training. The webcam
and the buttons available in the browser are used for this purpose.

o Step 2: Once all input images are provided with their labels, the activation tensors are obtained
by feeding the input images into the MobileNet model. Those activation tensors are then
used as input to the KNN classifier to create a dataset with a label assigned to each activation
tensor. The training process is completed with this step.

o Step 3: For prediction, images captured from the webcam are fed in real time into the
MobileNet model to get the activation tensors. These are then fed into the trained KNN model
to recognize the class of the images.
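The system described in this report implements these steps in the browser with TensorFlow.js; the Python sketch below reproduces the same idea (MobileNet activations fed to a k-nearest-neighbour classifier) with Keras and scikit-learn, purely as an illustration. The image file names, the number of neighbours and the use of three samples per gesture are assumptions.

    import numpy as np
    import tensorflow as tf
    from sklearn.neighbors import KNeighborsClassifier

    # MobileNet without its classification head acts as a fixed feature extractor.
    base = tf.keras.applications.MobileNet(weights="imagenet", include_top=False,
                                           pooling="avg", input_shape=(224, 224, 3))

    def activation(image_path):
        """Return the MobileNet activation vector for one gesture image."""
        img = tf.keras.utils.load_img(image_path, target_size=(224, 224))
        x = tf.keras.utils.img_to_array(img)[np.newaxis, ...]
        x = tf.keras.applications.mobilenet.preprocess_input(x)
        return base.predict(x, verbose=0)[0]

    # Hypothetical labelled training images (the project uses ~30 per gesture).
    train_images = {
        "START": ["start_01.jpg", "start_02.jpg", "start_03.jpg"],
        "STOP":  ["stop_01.jpg", "stop_02.jpg", "stop_03.jpg"],
    }
    X, y = [], []
    for label, paths in train_images.items():
        for path in paths:
            X.append(activation(path))
            y.append(label)

    # Step 2: KNN over the activation tensors completes the training process.
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X, y)

    # Step 3: a new webcam frame is classified through the same pipeline.
    # print(knn.predict([activation("new_frame.jpg")]))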

b) Sequential Neural Networks

A neural network is a network or circuit of neurons or, in a modern sense, an artificial neural
network composed of artificial neurons or nodes. Thus a neural network is either a biological neural
network, made up of real biological neurons, or an artificial neural network used for solving artificial
intelligence (AI) problems.

The ability to work with inadequate knowledge: after training, an ANN may produce output
even with incomplete information. The loss of performance in such cases depends on the importance of
the missing information.

A Sequential model is appropriate for a plain stack of layers where each layer has exactly one
input tensor and one output tensor.

● Step 1: A dataset is built by taking various daily sounds, which are present in the .wav format.
In total it consists of 5434 audio samples.

● Step 2: In the training phase, a neural network model is developed using 4434 audio samples
of the dataset. The rest are reserved for the testing phase.

● Step 3: A sequential neural network model is built with 4 dense layers and 2 activation
functions (the Rectified Linear Unit activation function and the Softmax activation function).

● Step 4: This neural network model is fed with the set of training samples for 30 epochs,
which prepares the model for the testing phase.

● Step 5: The output is predicted for the test set with a validation accuracy of 92%.

● Step 6: A graph is then plotted between the training and validation accuracies of the neural
network model.
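A minimal Keras sketch of such a model is shown below. Only its overall shape follows the steps above (four dense layers, ReLU and Softmax activations, 30 epochs and an accuracy plot); the layer widths, the 193-value feature vectors and the placeholder random data are assumptions made for the sketch.

    import numpy as np
    import tensorflow as tf
    import matplotlib.pyplot as plt

    NUM_FEATURES = 193   # assumed length of each clip's extracted feature vector
    NUM_CLASSES = 10     # ten daily-life sound classes

    # Placeholder arrays standing in for the Librosa features and their labels.
    x_train = np.random.rand(4434, NUM_FEATURES).astype("float32")
    y_train = np.random.randint(0, NUM_CLASSES, size=4434)
    x_val = np.random.rand(1000, NUM_FEATURES).astype("float32")
    y_val = np.random.randint(0, NUM_CLASSES, size=1000)

    # Four dense layers: ReLU in the hidden layers, Softmax on the output.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(NUM_FEATURES,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    history = model.fit(x_train, y_train, epochs=30,
                        validation_data=(x_val, y_val), verbose=0)

    # Step 6: plot training vs. validation accuracy across the 30 epochs.
    plt.plot(history.history["accuracy"], label="training accuracy")
    plt.plot(history.history["val_accuracy"], label="validation accuracy")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()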

3. IMPLEMENTATION/ CODING

This project is built on various IoT components and recent technologies that are embedded into the
components to make an integrated device. Different algorithms, such as KNN and neural networks, are used
for building the Gesture Recognition and Colour Depiction systems respectively. The technologies
consist of front-end technologies such as JavaScript, HTML and CSS. The components
include the Raspberry Pi, microphone, speaker, SD card, screen etc., upon which the entire Voice Recognition
System is built.

3.1 Description of Technologies used:


The following Technologies were used:

a) HTML (Hypertext Mark-up Language):

Hypertext Mark-up Language (HTML) is the standard mark-up language for documents designed to
be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and
scripting languages such as JavaScript. Web browsers receive HTML documents from a web server or from
local storage and render the documents into multimedia web pages. HTML describes the structure of a web
page semantically and originally included cues for the appearance of the document.

b) CSS (Cascading Style Sheets):

Cascading Style Sheets (CSS) is a style sheet language used for describing the presentation of a
document written in a mark-up language such as HTML. CSS is a cornerstone technology of the World
Wide Web, alongside HTML and JavaScript. CSS is designed to enable the separation of presentation and
content, including layout, colours, and fonts. This separation can improve content accessibility and provide
more flexibility and control in the specification of presentation characteristics. It enables multiple web pages
to share formatting by specifying the relevant CSS in a separate .css file, which reduces complexity and
repetition in the structural content, and it allows the .css file to be cached to improve the page load
speed across the pages that share the file and its formatting.

c) JavaScript:

JavaScript is a programming language that conforms to the ECMAScript specification. JavaScript is
high-level, often just-in-time compiled, and multi-paradigm. It has curly-bracket syntax, dynamic typing,
prototype-based object-orientation, and first-class functions. Alongside HTML and CSS, JavaScript is one
of the core technologies of the World Wide Web. JavaScript enables interactive web pages and is an
essential part of web applications. The vast majority of websites use it for client-side page behavior, and all
major web browsers have a dedicated JavaScript engine to execute it.

3.2 Description of IoT Components

a) Raspberry Pi:

The Raspberry Pi is a low cost, small sized computer that plugs into a computer monitor or TV, and
uses a standard keyboard and mouse. It is a capable little device that enables people to explore computing.
It is capable of doing everything the user would expect a desktop computer to do, from browsing the
internet and playing high-definition video, to making spreadsheets, word-processing, and playing games.

b) Microphone:

USB Microphones are the easiest way of getting a microphone working with your Raspberry Pi. One
of the most significant advantages of using a USB microphone is that it is plug and play. The Raspbian
operating system will automatically detect the microphone when it is plugged in.

c) Speaker:

A Raspberry Pi speaker is an output hardware device that connects to the Raspberry Pi to generate sound.
When the speaker receives an electrical input from the device, the input drives the speaker cone back and
forth, and this vibration generates sound waves.

d) 7’’ Touch Screen:

This touch screen display module is a 7-inch display compatible with Raspberry Pi boards. The viewable
screen size of the 7-inch display is 155 mm x 86 mm. This touch screen display module for the Raspberry Pi has
a screen resolution of 800 x 480 pixels. The type of touch used in the Raspberry Pi display is 10-finger
capacitive touch. The user can connect the 7-inch display module to the Raspberry Pi by attaching the
ribbon cable to the DSI port on the Raspberry Pi board.

e) SD-Card:

The Raspberry Pi should work with any compatible SD card of sufficient capacity.
For installation of Raspberry Pi OS with desktop and recommended software (Full) via NOOBS, the
minimum card size is 32GB.

f) Smart Bulb:

A smart bulb is an internet-capable LED light bulb that allows lighting to be customized, scheduled
and controlled remotely. Smart bulbs are among the most immediately successful offerings in the growing
category of home automation and Internet of Things (IoT) products. It can depict 123 colours, ranging from
green to dark magenta and many more.

3.3 Implementation of the Integrated System

3.3.1 Implementation of Voice Recognition System

The implementation of this module using the above-mentioned technologies/tools/algorithms is as follows:

The speakers, USB microphone, SD card and touch screen are connected to the Raspberry Pi 3B+ at
their indicated ports. Then, the Raspbian (NOOBS) operating system is loaded onto the SD
card. Later, the Raspberry Pi has to be connected to a network in order to import the voice recognition
modules onto the device. After this network connectivity is established, the Amazon Voice Services packages
are imported onto the SD card.

Once the setup is installed and the initial settings related to input and output are configured, the system
can be tested using sample voice commands. During testing, the USB microphone records the
user's speech. This recording is sent to Amazon's servers to be analyzed. The server breaks down the user's
speech into individual sounds, and finds which words most closely correspond to the combination of
individual sounds.

It then identifies the important words to make sense of the task and carries out the corresponding functions.
The server sends the information back to the device, and the voice recognition system executes the
respective action/task. The motive behind building this device is to develop a cost-effective system so that
it can reach even those who cannot afford costlier alternatives.
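Before wiring the microphone to the Alexa Voice Service, a quick recording test such as the Python sketch below can confirm that the USB microphone is being captured on the Pi. It is only a diagnostic aid, not part of the AVS setup, and the choice of the sounddevice/soundfile libraries and a 16 kHz sample rate is an assumption.

    import sounddevice as sd   # pip install sounddevice
    import soundfile as sf     # pip install soundfile

    SAMPLE_RATE = 16000        # 16 kHz mono is a common choice for speech capture
    DURATION = 5               # seconds to record

    print(f"Recording {DURATION} seconds from the USB microphone...")
    audio = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
    sd.wait()                  # block until the recording is finished

    sf.write("mic_test.wav", audio, SAMPLE_RATE)
    print("Saved mic_test.wav - play it back to confirm the microphone works.")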

3.3.2 Implementation of Gesture Recognition System

The implementation of this module using the above-mentioned technologies/tools/algorithms is as follows:

The web interface that was developed using JavaScript has the ability to capture hand gestures
from the users using the webcam of the user's device. Firstly, the user is requested to grant permission for
the application to utilize the webcam of the user's device. The user initially has to provide a
minimum of 30 samples of each gesture, starting with the START and STOP gestures. This threshold
ensures more efficient prediction of the user's gestures. Every gesture input takes lighting and
positional changes into consideration. These gestures are converted to gesture cards by mapping each
gesture to the message given by the user. These cards are then loaded into the KNN classifier, which trains
on the captured set of images by the end of the training phase.

In case a gesture that has already been trained has to be changed due to various factors (like a change of
background), it can be retrained by deleting the existing gesture samples and adding a new set of gestures. The
system cannot be trained with customized gestures without first training the START and STOP gestures, which
act as WAKE gestures to invoke the functionality of the system.

During the Translation and Prediction Phase, the gestures are continuously recorded through
the user's webcam. Any gesture after the START gesture is classified by the trained model to predict
the output text corresponding to the input gesture, until the STOP gesture is encountered. If the current
gesture and the previous gesture are classified to be the same, there is no change in the prediction. The
text that was mapped to each predicted gesture is then displayed on the screen along with its corresponding
voice output and confidence measure.
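The wake-gesture behaviour described above amounts to a small state machine. The Python sketch below illustrates that logic with a hypothetical classify_frame() helper standing in for the trained KNN model; the project's actual implementation is the JavaScript web interface, so this is only a sketch.

    import cv2  # pip install opencv-python

    def classify_frame(frame):
        """Hypothetical stand-in for the trained KNN gesture classifier."""
        return "START", 0.99                  # (predicted gesture, confidence)

    def prediction_loop():
        cam = cv2.VideoCapture(0)
        translating = False                   # becomes True after the START wake gesture
        previous = None
        while cam.isOpened():
            ok, frame = cam.read()
            if not ok:
                break
            cv2.imshow("gesture prediction (press q to quit)", frame)
            gesture, confidence = classify_frame(frame)
            if gesture == "START":
                translating = True
            elif gesture == "STOP":
                translating = False           # pause/end the prediction phase
                previous = None
            elif translating and gesture != previous:
                # Only act when the prediction changes, as described above.
                print(f"{gesture}  (confidence {confidence:.0%})")
                previous = gesture
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
        cam.release()
        cv2.destroyAllWindows()

    if __name__ == "__main__":
        prediction_loop()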

Once the above two individual modules are functioning as expected, they can be made to interact
with each other: input is first given through gestures, and the respective text is output as voice so that the
voice recognition system is enabled. The captured voice is processed in the background (as detailed in the
Implementation of the Voice Recognition System) and the speakers are driven by electrical signals to give
the specified response or execute a given command as per the user's expectation.

3.3.3 Implementation of Colour Depiction System

The following section highlights the implementation of Colour Depiction module using the above-
mentioned algorithms:

The project classifies daily life sounds by building a sequential neural network model. The dataset
utilized in the project has been taken from UrbanSound8K. It consists of 5434 labelled sound excerpts from
10 different classes.

The Python Librosa library is used for music and audio analysis. Librosa is used when working with
audio data, as in music generation or Automatic Speech Recognition. It provides the building blocks
necessary to create music information retrieval systems. Librosa helps to visualize audio signals
and also performs feature extraction using different signal processing techniques. It loads an audio
file as a floating-point time series array representation.

This Librosa library is thus used to extract numerical features such as Mel-frequency cepstral
coefficients (MFCCs), tonnetz, the mel-scaled spectrogram and the chromagram from the daily life sounds, and
these features are then used to train a neural network model for 30 epochs.
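A compact sketch of this feature-extraction step with Librosa is given below. The exact feature set and dimensions used in the project may differ, so the choices here (40 MFCCs and mean-pooling each feature over time) are assumptions.

    import numpy as np
    import librosa  # pip install librosa

    def extract_features(wav_path):
        """Turn one daily-life sound clip into a fixed-length feature vector."""
        y, sr = librosa.load(wav_path)   # floating-point time series and sample rate
        mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)
        chroma = np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1)
        mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)
        tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr), axis=1)
        return np.concatenate([mfcc, chroma, mel, tonnetz])

    # Example: one row of the training matrix for the neural network.
    # features = extract_features("siren_example.wav")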

The model has thus been trained with 4434 daily life sounds, and the remaining sounds from the dataset
are used for testing. The model predicts the classes for the daily life sounds present in the test set. The
predicted class is then converted to audio or voice and is sent as a command to the voice assistant. Based on
the instructions received, the voice assistant internally processes the command and changes the colour of the
smart bulb to indicate the change in the user's surroundings.

For example, if the predicted label is the siren sound, then with the aid of the voice assistant the colour of the
smart bulb is changed to red to indicate an emergency situation.
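The hand-off from a predicted class to a smart-bulb colour can be sketched as follows. The colour table mirrors the one in Section 2.2.3, while the use of pyttsx3 for text-to-speech and the exact wording of the spoken command are assumptions about how the instruction reaches the voice assistant.

    import pyttsx3  # pip install pyttsx3 (offline text-to-speech)

    # Colour table from Section 2.2.3; the command wording below is an assumption.
    COLOUR_FOR_CLASS = {
        "Car_Horn": "yellow",       "Children_Playing": "dark green",
        "Dog_Bark": "brown",        "Drilling": "grey",
        "Engine_Idling": "orange",  "Air_Conditioner": "blue",
        "Gun_Shot": "gold",         "Jack_Hammer": "dark violet",
        "Siren": "red",             "Street_Music": "web green",
    }

    def announce_colour(predicted_class):
        """Speak a command so the voice assistant changes the smart bulb colour."""
        colour = COLOUR_FOR_CLASS[predicted_class]
        engine = pyttsx3.init()
        engine.say(f"Alexa, set the light to {colour}")    # hypothetical phrasing
        engine.runAndWait()

    announce_colour("Siren")   # e.g. a siren turns the bulb red to signal an emergency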

3.4 UML Diagram

The Unified Modeling Language is a general-purpose, developmental modeling language intended to provide a standard way to visualize the design of a system.
The diagrams below (Fig 3.1 to 3.9) depict the UML diagrams (Use Case Diagrams, Sequence Diagrams and
Activity Diagrams), which show the varied representational process flows of the entire system. They consist of
actors and actions, interactions between components, and flows detailing the activity sequences between parts of
the proposed system.
The following points provide a brief description of the process flow of the entire system. These are
followed by various UML Diagrams that describe the functioning of the system from various perspectives in
great detail.

Voice Recognition System:

● The user inputs the query/message through voice.

● The query is captured through the microphone.

● The query is sent to the Amazon Voice Services cloud through the code present on the SD card of
the Pi.
● The response is obtained from the cloud.

● This output is converted to speech and is given to the speakers.

● The required output is given back to the user through the speakers.

Gesture Recognition System:

● Gestures are input through the hand(s).

● The gesture(s) is/are captured through the desktop webcam.

● The gesture(s) is/are trained in the TensorFlow cloud and stored in the classifier.

● Gesture-to-text conversion is performed and the text is output to the screen.

● The same text is sent to the Voice Recognition System and is output as voice.

● The required command is executed.


When the above modules are integrated, the input is first taken from the user in the form of gestures.
These gestures are processed in the background and converted to voice output. These voice outputs are then
captured by the Voice Assistant System, and the respective answers are given to the user or the required
action is executed.

Colour Depiction System:

● Capture the surrounding daily sounds, classify the sounds and output the command to awaken the
voice recognition system.
● The voice recognition system triggers the colour change in the smart bulb corresponding to the
instruction received, which signals the user to react accordingly.

3.4.1 Use Case Diagram

A UML use case diagram is the primary form of system/software requirements for a new software
program under development. Use cases specify the expected behavior (what), and not the exact method of
making it happen (how). Once specified, use cases can be denoted in both textual and visual representations (i.e., a
use case diagram). A key concept of use case modeling is that it helps us design a system from the end user's
perspective.
The actors of the system are the elderly or disabled people (the target audience). The actions are
grouped under three systems (the Voice, Gesture and Colour Depiction Systems). These actions correspond to each
segment in the functioning of the proposed system.

Fig 3.1: Use Case Diagram of Voice Recognition System

In the above Fig 3.1, the Use Case Diagram of the Voice Recognition System is described. The figure
consists of an actor (here, the Voice Recognition System) along with the various actions performed by this
actor. These various actions represent a flow of how this system works.

This is described as follows:

Initially, the actor captures the query sent by an elderly person and processes it at the server's side
by breaking the query down into smaller action verbs. It then sends back a response, which is either spoken
out to the elderly person through the speaker attached to the Raspberry Pi, or the respective action specified
by the user is performed.

Fig 3.2: Use Case Diagram of Gesture Recognition System

In the above Fig 3.2, the Use Case Diagram of the Gesture Recognition System is described. The figure
consists of an actor (here, the Gesture Recognition System) along with the various actions performed by this
actor. These various actions represent a flow of how this system works.
This is described as follows:

Initially, the actor captures various gestures pertaining to the Indian Sign Language and loads these
captured images into Tensorflow Cloud, where a KNN Training Model is built. Hence, we have a trained model
which is ready to perform predictions. The prediction phase starts when the START wake gesture is posed to the
webcam; the system then continuously records the gestures and simultaneously keeps predicting the outcomes. These
outcomes are displayed in the form of a matched gesture image, along with its confidence measure, and the text
or message that was previously mapped to the gesture in the training phase is also displayed. A voiced version
of the same text is given out through the speakers of the Gesture Recognition System.

Fig 3.3: Use Case Diagram of Colour Depiction System.

In Fig 3.3 above, the Use Case Diagram of the Colour Depiction System is described. The figure
consists of an actor (here, the Colour Depiction System) along with the various actions performed by this actor.
These various actions represent a flow of how this system works.

This is described as follows:


Initially, the Colour Depiction Model is loaded with a dataset consisting of most commonly prevailing
Urban Sounds. The audio files in the dataset fall under different sound classes like SIREN,
CHILDREN_PLAYING etc. Now, a Neural Network Training Model is built using these audio clips. The
Neural Network consists of sequential layers such as a convolutional layer, a dropout layer etc. This Neural Network
also uses 2 activation functions, which are the Softmax function and the ReLU (Rectified Linear Unit)
function. After the training step, during the testing phase, the external sounds in the surrounding environment of
the user are captured and categorized into various sound classes. According to the sound class, a suitable
command (consisting of the colour to be lit) is given out by the Colour Depiction system to awaken the Voice
Assistant for command execution. This Voice Assistant, on receiving the command, executes it, i.e., changes the
colour of the Smart Bulb according to the colour specified in the command (e.g. the colour of the Smart Bulb
changes to GREEN on hearing CHILDREN_PLAYING sounds in the surroundings).

3.4.2 Sequence Diagram

A sequence diagram shows object interactions arranged in time sequence. It depicts the objects
involved in the scenario and the sequence of messages exchanged between the objects needed to carry out
the functionality of the scenario.

Fig 3.4: Sequence Diagram of the Voice Recognition System

In Fig 3.4 above, the Sequence Diagram of the Voice Recognition System is described. The figure
consists of various interactions between different objects present in the system. The description is as follows:

Initially, the elderly person sends a query or a message. The system records the query, processes it at the
server's side by breaking it down into smaller action verbs, and then sends back a response, which is either
spoken out to the elderly person through the speaker attached to the Raspberry Pi, or the respective action
specified by the user is performed.

Fig 3.5: Sequence Diagram of the Integrated Gesture Recognition System

In Fig 3.5 above, the Sequence Diagram of the Integrated Gesture Recognition System is described.
The figure consists of various interactions between different objects present in the system. The description is as
follows:
Initially, the Elderly/ Disabled person poses various gestures pertaining to the Indian Sign Language.
There is continuous capturing of the posed gestures using a webcam. These posed gestures are matched with
trained gestures in the KNN Model. The Training Model present in the Tensorflow Cloud now predicts the
matched gesture and returns the respective outcomes. These outcomes are displayed by the system to the user
in the form of a matched gesture image along with its confidence measure, and the text or message that was
previously mapped to the gesture in the training phase is also displayed. A voiced version of the same text is
given out through the speakers of the Gesture Recognition System. This voice output is then taken by the Voice
Recognition System, that gives the corresponding responses.
Fig 3.6: Sequence Diagram of the Integrated Colour Depiction System

In Fig 3.6 above, the Sequence Diagram of the Integrated Colour Depiction System is described. The
figure consists of various interactions between different objects present in the system. The description is as
follows:
Initially there are several sounds present in the Elderly/ Disabled person’s surroundings that are
continuously captured by the Colour Depiction System. These sounds are simultaneously categorized into 10
sound_classes. According to the sound class, a suitable command (consisting of the colour to be lit) is given out
by the Colour Depiction system to awaken the Voice Assistant for command execution. This Voice Assistant,
on receiving the command, executes it, i.e., changes the colour of the Smart Bulb according to the colour
specified in the command (e.g. the colour of the Smart Bulb changes to GREEN on hearing
CHILDREN_PLAYING sounds in the surroundings).
3.4.3 Activity Diagram

The activity diagram is another important behavioural diagram in UML, used to describe the dynamic aspects
of a system. An activity diagram is essentially an advanced version of a flow chart that models the flow from
one activity to another.
Activity diagrams are graphical representations of workflows of stepwise activities and actions with support for
choice, iteration and concurrency.

Fig 3.7: Activity Diagram of the Voice Recognition System

In Fig 3.7 above, the Activity Diagram of the Voice Recognition System is described. The figure
consists of a flowchart to represent the flow from one activity to another activity. The description is as follows:
Initially, the elderly person sends a query or a message. The system records the query, processes it at the
server's side by breaking it down into smaller action verbs, and then sends back a response, which is either
spoken out to the elderly person through the speaker attached to the Raspberry Pi, or the respective action
specified by the user is performed.
Fig 3.8: Activity Diagram of the Integrated Gesture Recognition System

In Fig 3.8 above, the Activity Diagram of the Integrated Gesture Recognition System is described.
The figure consists of a flowchart to represent the flow from one activity to another activity. The description is
as follows:
Initially, the Elderly/ Disabled person poses various gestures pertaining to the Indian Sign Language.
There is continuous capturing of the posed gestures using a webcam. These posed gestures are matched with
trained gestures in the KNN Model. The Training Model present in the Tensorflow Cloud now predicts the
matched gesture and returns the respective outcomes. These outcomes are displayed by the system to the user
in the form of a matched gesture image along with its confidence measure, and the text or message that was
previously mapped to the gesture in the training phase is also displayed. A voiced version of the same text is
given out through the speakers of the Gesture Recognition System. This voice output is then taken by the Voice
Recognition System, that gives the corresponding responses.
Fig 3.9: Activity Diagram of the Integrated Colour Depiction System

In Fig 3.9 above, the Activity Diagram of the Integrated Colour Depiction System is described. The
figure consists of a flowchart to represent the flow from one activity to another activity. The description is as
follows:
Initially there are several sounds present in the Elderly/ Disabled person’s surroundings that are
continuously captured by the Colour Depiction System. These sounds are simultaneously categorized into 10
sound_classes. According to the sound class, a suitable command (consisting of the colour to be lit) is given out
by the Colour Depiction system to awaken the Voice Assistant for command execution. This Voice Assistant,
on receiving the command, executes it, i.e., changes the colour of the Smart Bulb according to the colour
specified in the command (e.g. the colour of the Smart Bulb changes to GREEN on hearing
CHILDREN_PLAYING sounds in the surroundings).

3.5 Data Set

For developing the Gesture Recognition System, the Indian Sign Language Dataset was used,
which contains various gestures including the START and STOP Gestures (Wake Gestures) which sum
up to a total of 240 images. These images are captured from different positions and lighting backgrounds
to ensure a more efficient training model as output.

The gestures, along with their symbolic meanings, are listed below:

● What is my name?

● Open Twitter.

● Where am I?

● How are you?

● Is it raining today?

● Good afternoon!

● A START Gesture and a STOP Gesture, which are used as WAKE words to Start and Pause/End

the Prediction Phase respectively.

All these gestures are trained with a minimum of 30 image samples each to ensure high
accuracy in the prediction phase.

The Colour Depiction Dataset contains 5434 labeled sound excerpts of urban sounds from 10
classes. A numeric identifier of the sound class and their corresponding colours are as follows:

0 = Car_Horn - Yellow

1 = Children_Playing - Dark Green

2 = Dog_Barking - Brown

3 = Drilling - Grey

4 = Engine_Idling - Orange

5 = Air_Conditioner - Blue

6 = Gun_Shot - Gold

7 = Jack_Hammer - Dark Violet

8 = Siren - Red

9 = Street_Music - Web Green
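At prediction time, the network's softmax output has to be mapped back to one of these numeric identifiers and its colour. A small Python sketch of that lookup, following the document's own numbering above, is shown below; the example softmax vector is made up.

    import numpy as np

    # Numeric class identifiers and colours exactly as listed above.
    ID_TO_CLASS_COLOUR = {
        0: ("Car_Horn", "Yellow"),        1: ("Children_Playing", "Dark Green"),
        2: ("Dog_Barking", "Brown"),      3: ("Drilling", "Grey"),
        4: ("Engine_Idling", "Orange"),   5: ("Air_Conditioner", "Blue"),
        6: ("Gun_Shot", "Gold"),          7: ("Jack_Hammer", "Dark Violet"),
        8: ("Siren", "Red"),              9: ("Street_Music", "Web Green"),
    }

    def decode_prediction(softmax_output):
        """Map the network's softmax output to a class name and bulb colour."""
        class_id = int(np.argmax(softmax_output))
        return ID_TO_CLASS_COLOUR[class_id]

    # Made-up softmax vector peaking at class 8, i.e. Siren -> Red.
    print(decode_prediction([0.0, 0.01, 0.0, 0.0, 0.02, 0.0, 0.0, 0.0, 0.95, 0.02]))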

4. RESULTS AND DISCUSSIONS

4.1 Discussion of Results

4.1.1 Results of Voice Recognition System

The Voice Recognition System is built by importing the Amazon Alexa packages into the SD card of
the Raspberry Pi. These packages have the potential to respond to any query asked by the user,
from casual questions to booking groceries on Big Basket. The following is the set of
11 sample queries posed to test the functioning of the Voice Recognition System. The system was also
tested with certain custom queries that are unique to our system. These 11 queries were answered with
an accuracy of 80%.

1. “Alexa, open my Houseguest Guide.”

2. “How do I turn on the TV?”

3. “Alexa, where are my spectacles?”

4. “Alexa, read me a bedtime story”

5. “Alexa, how’s the traffic in Barkatpura?”

6. “Alexa, announce that dinner is ready”

7. “Alexa, how do you say HELLO in Spanish?”

8. “Alexa, what’s the date?”

9. “Alexa, show me the recipe for Lasagna”

10. “Alexa, book a ride from Nampally to Shaikpet on Uber”

11. “Alexa, open Facebook on my phone”

4.1.2 Results of Gesture Recognition System

For developing the Gesture Recognition System, the Indian Sign Language Dataset was used,
which contains various gestures including the START and STOP Gestures (Wake Gestures) which sum
up to a total of 240 images. These images are captured from different positions and lighting backgrounds
to ensure a more efficient training model as output.

All these gestures are trained with a minimum of 30 image samples each to ensure high
accuracy in the prediction phase.

240 images (containing the images pertaining to all 8 gestures) were used for both training and
testing, by the method of “using the training set for testing”. These images were predicted with an
accuracy range of 98% to 100%. Every predicted gesture card has this confidence measure of 98% to
100% displayed below it, which indicates the percentage match of the posed gesture with the images in
the training set.

The query mapped to the gesture card during the initial training phase is also retrieved on the screen;
this text is converted to speech and played out by the system as input to the Voice Recognition System.
Once the voice assistant captures this spoken query, its response is converted back to text and handed to
the Gesture Recognition System for display on the screen.
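The report does not name the text-to-speech library used for this hand-off; the sketch below illustrates the idea with pyttsx3, which is only an assumed choice, and the gesture-to-query dictionary is an illustrative subset.

```python
# Sketch: speak the query mapped to a predicted gesture so that the voice
# assistant can capture it as audio input. pyttsx3 is an assumed library choice.
import pyttsx3

GESTURE_TO_QUERY = {  # illustrative subset of the trained gesture cards
    "open_twitter": "Alexa, open Twitter",
    "where_am_i": "Alexa, where am I?",
}


def speak_query(gesture_label: str) -> None:
    query = GESTURE_TO_QUERY.get(gesture_label)
    if query is None:
        return
    engine = pyttsx3.init()
    engine.say(query)      # queue the mapped query for speech output
    engine.runAndWait()    # block until playback finishes


speak_query("open_twitter")
```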

4.1.3 Results of Colour Depiction System

The Colour Depiction System uses a dataset of 5434 sound excerpts taken from UrbanSound8K. Of these,
80% (4347 excerpts) were used for training and the remaining 20% (1087 excerpts) for testing, and the
model achieved an accuracy of 92% on the test samples (a minimal split-and-evaluate sketch appears after
the list below). All the test audio samples were classified into 4 sound classes, of which 3 were correctly
predicted and the corresponding colours were emitted by the smart bulb.

These sound classes were:

• Children_Playing : Emitted Dark Green Colour
• Siren : Emitted Red Colour
• Dog_Barking : Emitted Brown Colour
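A minimal sketch of the 80:20 split and accuracy measurement is shown below, assuming the excerpts have already been converted into fixed-length feature vectors; the random placeholder data and the RandomForest classifier are stand-ins for the report's actual features and model, used only to keep the sketch short.

```python
# Sketch: 80:20 train/test split and accuracy measurement for the
# colour-depiction classifier. X and y are random placeholders standing in
# for pre-extracted audio features and their class labels (0-9).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5434, 40))        # placeholder feature vectors (e.g. 40 MFCCs)
y = rng.integers(0, 10, size=5434)     # placeholder labels for the 10 classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)  # 4347 training excerpts, 1087 test excerpts

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```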

4.2 Screenshots of the Voice Recognition Module

Fig 4.1: Voice assistant

All the components shown in Fig. 4.1 are connected to the Raspberry Pi and are detailed below:

a) Raspberry Pi:

The Raspberry Pi is a low-cost, small-sized computer that plugs into a monitor or TV and uses a standard
keyboard and mouse. It is a capable little device that enables people to explore computing, and it can do
everything a desktop computer is expected to do, from browsing the internet and playing high-definition
video to creating spreadsheets, word processing, and playing games.

b) USB Keyboard and Mouse:

The USB keyboard and mouse are external components connected to the Raspberry Pi in order to work
with the Raspbian OS. They were used to perform all operations and execute the commands related to the
voice assistant's functioning.

c) Microphone:

A USB microphone is the easiest way of getting a microphone working with the Raspberry Pi. One of its
most significant advantages is that it is plug and play: the Raspbian operating system automatically
detects the microphone when it is plugged in.
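A short sketch of capturing a clip from the USB microphone is shown below; the sounddevice and scipy libraries, the 16 kHz sample rate, and the 4-second duration are assumptions made for illustration.

```python
# Sketch: record a short clip from the USB microphone on the Raspberry Pi.
# sounddevice/scipy and the recording parameters are assumed choices.
import sounddevice as sd
from scipy.io.wavfile import write

SAMPLE_RATE = 16000   # samples per second
DURATION = 4          # seconds

recording = sd.rec(int(DURATION * SAMPLE_RATE),
                   samplerate=SAMPLE_RATE, channels=1, dtype="int16")
sd.wait()                                  # block until recording finishes
write("sample.wav", SAMPLE_RATE, recording)
```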

d) Speaker:

The speaker is an output hardware device connected to the Raspberry Pi to produce sound. When it
receives an electrical input, the signal drives its diaphragm back and forth; this motion vibrates the outer
cone, generating sound waves.

e) HDMI to Monitor:

HDMI stands for High-Definition Multimedia Interface, a standard for simultaneously transmitting digital
video and audio from the Raspberry Pi to any device connected with this cable, such as a monitor, a TV,
or a projector.

f) 7’’ Touch Screen:

This touch-screen display module is a 7-inch display compatible with Raspberry Pi boards. The viewable
screen size is 155 mm x 86 mm and the resolution is 800 x 480 pixels. The display supports 10-point
capacitive touch. The module is connected to the Raspberry Pi by attaching its ribbon cable to the DSI
port on the board.

g) 2.5A Micro USB Power:


The micro USB power cable connects the Raspberry Pi to a power adapter that supplies the 5 V DC
required by the USB standard, providing the board with a constant 2.5 A supply for its operation.

h) SD-Card:

The Raspberry Pi works with any compatible SD card of adequate capacity. For installing the Raspberry
Pi Operating System with desktop and recommended software (Full) via NOOBS, the minimum card size
is 32 GB.

All the components described above are connected to the main component, the Raspberry Pi, in clockwise
order from the USB keyboard to the SD card. The SD card is then loaded with the Raspbian Operating
System and the system is activated. It can now be operated like a normal PC, with the touch screen acting
as the monitor.

4.3 Screenshots of the Gesture Recognition Module

1) Home Page:

Fig 4.2: Depicts the Home Page of the Web Application

The webpage in Fig. 4.2 prompts an alert requesting permission to access the device's camera.

2) Initial Training Page:

Fig 4.3: Depicts the training of Start as a wake word.

Fig 4.4: Depicts the training of Stop as a wake word.

The wake gestures (START and STOP) for the device are trained as shown in Figures 4.3 and 4.4. Each
gesture is captured using the webcam with a minimum threshold of 30 images. A green check mark is
displayed once the number of captured images reaches the threshold. The Next button on the page does
not function until the wake gestures are trained.
3) Custom Gestures Training Page: (Add Gesture feature)

Fig 4.5: To capture Custom Gestures

As shown in Fig. 4.5, this page allows the user to train any number of custom gestures using the Add
Gesture feature. At least one custom gesture is required before navigating to the gesture translation
(prediction) page.

4) Clear Gestures Feature:

Fig 4.6: To clear or change a sign for a specific gesture.

Fig 4.6 depicts the functionality of the clear button. All the images or signs captured for a particular
gesture are removed for retraining.

5) Retraining the gestures:

Fig 4.7: Depicts the Retraining option of a gesture

This feature in Fig. 4.7 allows recapturing the images for a gesture. This flexibility allows the user to
change any gesture conveniently.

6) Translating/ Predicting the gestures. (Start Gesture)

Fig 4.8: Predicting the start gesture.

7) Predicting the custom gestures:

Fig 4.9: Predicting the custom gestures

The text appearing below the predicted image can be copied to the clipboard.

8) Predicting the Stop Gesture

Fig 4.10: Predicting the Stop Gesture

All the images in Figures 4.8 to 4.16 depict the confidence percentage. If the posed gesture matches the
trained gesture at the 100% level, the confidence percentage is shown along with the image of the
predicted gesture.

The trained gestures are:
a)

Fig 4.11: Predicting – Alexa, What is your name?

b)

Fig 4.12: Predicting – Alexa, open Twitter

c)

Fig 4.13: Predicting – Alexa, where am I?

d)

Fig 4.14: Predicting – Alexa, how are you?

e)

Fig 4.15: Predicting – Alexa, is it raining today?

f)

Fig 4.16: Predicting – Alexa, Good Afternoon.

4.4 Screenshots of the Colour Depiction System

1) Listening and Recognizing Various Surrounding Daily Life Sounds

Fig 4.17: Listening and recognizing the surrounding daily life sounds

Initially, the surrounding daily-life sounds are recorded and stored as a dataset. They are then converted
into a form suitable for processing. Next, the model is trained on these sounds and begins recognizing
them. As shown in Fig. 4.17, the smart bulb glows plain white before a sound is processed.
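The exact preprocessing is not detailed in this section; a common approach for UrbanSound8K-style excerpts, sketched below, is to convert each clip into MFCC features before training. The use of librosa and the 40-coefficient setting are assumptions for illustration.

```python
# Sketch: convert a recorded sound excerpt into an MFCC feature vector
# before classification. librosa and n_mfcc=40 are assumed choices.
import numpy as np
import librosa


def extract_features(wav_path: str, n_mfcc: int = 40) -> np.ndarray:
    audio, sample_rate = librosa.load(wav_path, sr=None)   # keep the native rate
    mfcc = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=n_mfcc)
    return np.mean(mfcc, axis=1)                           # one vector per clip


features = extract_features("sample.wav")
print(features.shape)   # (40,)
```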

2) Predicting and Changing the Colour of the Smart Bulb

Fig 4.18: Color change after Predicting the class

Once a daily-life sound is classified, the corresponding command changes the bulb to the respective
colour. Fig. 4.18 depicts the bulb changing from white to dark green because the predicted class is
Children_Playing.

4.5 Graphical Representation

Fig 4.19: Graphical Representation

The graph in Fig. 4.19 depicts the training and validation accuracies plotted against the number of epochs
for the 80:20 split of the training and testing datasets.
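A minimal sketch of producing such a plot from a Keras training run is given below, assuming a History object (here called history) is available from model.fit(); the variable and file names are illustrative only.

```python
# Sketch: plot training vs. validation accuracy per epoch from a Keras
# History object. The `history` variable and file name are illustrative.
import matplotlib.pyplot as plt


def plot_accuracy(history) -> None:
    plt.plot(history.history["accuracy"], label="Training accuracy")
    plt.plot(history.history["val_accuracy"], label="Validation accuracy")
    plt.xlabel("Epochs")
    plt.ylabel("Accuracy")
    plt.legend()
    plt.savefig("accuracy_plot.png")


# plot_accuracy(history)  # `history` is the object returned by model.fit(...)
```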

5. CONCLUSIONS AND FUTURE SCOPE

This project is an experiment that builds a voice assistant from the basic integration of various IoT
components, with the intention of providing an inexpensive product to users. The voice system, integrated
with gestures using Indian Sign Language and TensorFlow.js, recognizes and responds to the various
gestures posed by users. The system also offers the added advantage of observing and reacting to a
multitude of surrounding sounds through smart bulb technology, helping users understand their
surroundings better.

The project began with a simple question: "If voice is the future of computing, what about those who
cannot hear or speak?" The device caters to the elderly and the disabled, supporting not only their routine
activities but also easing their loneliness. It simplifies their lives through gestures, virtual voice
communication, and colour depiction on smart bulbs, letting everyday technology serve those who need it
most.

The future scope of the project is to extend the features of both the Gesture Recognition and Colour
Depiction Systems: for the Gesture Recognition System, the voice assistant's response should be converted
into the image of the gesture corresponding to the response text, and for the Colour Depiction System,
daily-life sounds from the surroundings should be processed in real time with their corresponding colours
depicted on the smart bulb.

REFERENCES

[1] Anupam Chouchary and Ravi Kshirsagar, “Process Speech Recognition System using Artificial
Intelligence Technique”, International Journal of Soft Computing and Engineering (IJSCE), 2012
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.479.5771&rep=rep1&type=pdf]

[2] David J. White, Andrew P. King & Shan D. Duncan, “Voice recognition technology as a tool for
behavioral research”, Springer, 2002 [https://link.springer.com/article/10.3758/BF03195418]

[3] Ji-Hwan Kim and Nguyen Duc Thang, “3-D Hand Motion Tracking and Gesture Recognition using a
data glove”, IEEE International Symposium on Industrial Electronics (ISIE), 2009 [https://ieeexplore.ieee.org/document/5221998]

[4] Jianliang Meng; Junwei Zhang; Haoquan Zhao, “Overview of the Speech Recognition Technology”, IEEE,
Fourth International Conference on Computational and Information Sciences, 2012
[https://ieeexplore.ieee.org/document/6300437]

[5] John Beard, Debbie Tolson, “International Association of Gerontology and Geriatrics: A Global Agenda for
Clinical Research and Quality of Care in Nursing Homes”, Journal of the American Medical Directors Association, 2011
[https://www.sciencedirect.com/science/article/abs/pii/S1525861010004287]

[6] Kevin Bostic, AppleInsider, “Nuance confirms its voice technology is behind Apple's Siri”, 2013
[https://appleinsider.com/articles/13/05/30/nuance-confirms-its-technology-is-behind-applessiri]

[7] Ms Vrinda, Mr Chander Shekhar, “Speech recognition system for English Language”, International
Journal of Advanced Research in Computer and Communication Engineering IJARCCE, 2013.
[https://www.ijarcce.com/upload/january/8-speech%20recognition.pdf]

[8] Martha B Holstein, Meredith Minkler, “Self, Society, and the New Gerontology”, The Gerontologist, 2003.

[https://academic.oup.com/gerontologist/article/43/6/787/863093?login=true]

[9] Nobert A Steirtz, “Artificial Intelligence: Application to Lighting Products”, Academia, IAEME
Publication, 2020.
[https://www.academia.edu/45187720/ARTIFICIAL_INTELLIGENCE_APPLICATION_TO_LIGHTING_
PRODUCTS]

[10]Tong Du, Huichao Li, “Gesture Recognition Method Based on Deep Learning”, IEEE, 3rd Youth
Academic Annual Conference of Chinese Association of Automation (YAC), 2018
[https://ieeexplore.ieee.org/document/8406477]
