
2015 IEEE 10th International Conference on Industrial and Information Systems, ICIIS 2015, Dec. 18-20, 2015, Sri Lanka

Project Nethra - An Intelligent Assistant for the Visually Disabled to Interact with Internet Services
A.M. Weeratunga, S.A.U. Jayawardana, Hasindu P.M.A.K, W.P.M. Prashan and S. Thelijjagoda

Abstract - In the modern era, where computers play a prominent role in day to day life, visually disabled users have found it challenging to keep up with the modern advancements in computers and internet usage, forming a 'digital divide'. Project Nethra is an intelligent virtual assistant for the visually disabled, which provides a voice based medium for visually disabled users. It allows the target users to interact with computers and internet based services with a wide array of functionality based on various internet services and social media. With Nethra, tasks will be done for the user instead of just returning search results. Project Nethra listens to and detects what the user says and responds to the user's requests in a friendly, effective, conversational manner via voice. There are four main components of the system: the voice recognition module, the natural language processing module, the conversational agent and the content extraction module. This is a much faster and more interactive solution than regular assistive software for the visually disabled. Existing screen reader software is not suitable for accessing the World Wide Web because of the minimal support it provides for web content and the lack of voice recognition. The virtual assistant software available in the market is not specifically catered to the visually disabled. All the mainstream solutions were analyzed and compared with Project Nethra in order to substantiate that the proposed solution is the most optimal and would bridge the 'digital divide' that exists between sighted and visually disabled users.

Keywords - Conversational Agent; Natural Language Processing; Virtual Assistant; Visually Disabled Users; Voice Recognition

I. INTRODUCTION

At present, computers and the internet are major influences on society. Access to information at the press of a keystroke has enabled people to acquire knowledge at a pace that has never been possible for past generations. Yet, visually disabled users have found it challenging to keep up with the modern advancements made in the field of Information Technology, resulting in a 'digital divide' between the general population and the 285 million people who are estimated to be visually disabled worldwide, of which 39 million are blind and 246 million have low vision.

The visually disabled face the following problems when interacting with computers [2].

• Nonlinear hypertext web documents. This may cause confusion for those who cannot easily follow visual cues.
• Most modern web pages rely on multimedia to convey the message to the user in an informative manner.
• Most computer applications in the past were based on text, hence visually disabled users could use magnifiers and read the text, but since modern applications rely on GUIs and icons for buttons, this has become a problem for the visually disabled.

Several solutions have been proposed for the above-mentioned problems; they are listed below, but each has drawbacks.

Screen magnifiers are a straightforward solution and work well with other applications such as word processors, spreadsheets and web browsers. Though they may serve users with mild visual disability, screen magnifiers provide little benefit for those with severe visual disability.

According to recent statistics from the American Foundation for the Blind, there were 55,200 legally blind children in the United States in 1998-1999, of whom only 5,500 used Braille as their primary reading medium. Braille readers are typically connected to the keyboard and provide line by line displays of text output, but because they are text based, they are less helpful when used in isolation for GUI based interfaces.

Screen readers translate text and graphical displays into auditory output. These can work with word processors, spreadsheets, Web browsers, and other commercial software packages. Most commercially-available screen readers will automatically announce menu bars and pop-up windows, and will use standard protocols and voices to identify icons, radio buttons, text boxes, and other common graphical user interface widgets. The disadvantages of a screen reader include reading text in a very monotonous manner, so the user might feel bored after continuous use [3].

The focus of this paper is on the implementation of an effective tool which will address the above drawbacks and eventually enable visually disabled users to interact with computers in a friendly and efficient manner.

II. LITERATURE REVIEW

Computers and the Visually Impaired - In the year 2002, at the seventeenth Annual International Conference of California State University Northridge, several findings about the challenges faced by visually disabled computer users were presented [4]. These are listed below.

Time spent on tasks - Sighted users in their study were six times more successful than users of screen readers at accomplishing given tasks, and three times more successful than users of screen magnification.

Need for clarity and consistency - Visually disabled users also had problems working with the interactivity of sites, particularly when sites deviated too much from standard Internet conventions.

978-1-4799-1876-8/15/$31.00 ©2015 IEEE



Issue of separate screen reader (SR) or "text-only" versions vs. accessibility of default site - Most users told us they do not appreciate having a separate SR or "text only" version of a site. They were concerned that they might not be getting the same thing as the default site, and that it might not be updated as regularly.

According to the American Foundation for the Blind (AFB), the majority of websites are built without regard to accessible web design standards, resulting in inaccessible websites for access-technology users with vision loss [5]. "Web Content Accessibility Guidelines (WCAG) 2.0" by the World Wide Web Consortium [6] and Web Accessibility Initiative - Accessible Rich Internet Applications (WAI-ARIA) [7] state a wide range of recommendations for making web content more accessible to a wider range of people, especially people with disabilities. Although these standards and guidelines exist, the leading web sites deviate from them. This makes it impractical and inefficient for the visually disabled to access web content through existing solutions.

From these findings, we can observe that although visually impaired users deserve the right to access web based content available for regular users, it is not possible with existing solutions, due to service providers not adhering to the accessibility standards and guidelines.

To improve accessibility for visually disabled users, the addition of a conversational agent for the user to converse with via voice is necessary. There have been several technologies in the past for the creation of conversational agents, many built on top of existing solutions at the time, and a summary is given as follows [8].

1995 - AIML
Uses - Pattern Matching.
Pros - Simple and easy to learn hence good for beginners, can program using natural language.
Cons - XML is not reader friendly, maintenance is hard.
Popular bots - Eliza

2005 - Façade
Uses - Pattern Matching based on discourse acts.
Pros - Patterns were more powerful than AIML, matches patterns closer to actual meaning.
Cons - Hard to author when application complexity increases.
Popular bots - Jess

2010 - ChatScript
Uses - Pattern Matching based on discourse acts.
Pros - Simple word pattern-matcher, easy to author, uses simple visual syntax.
Cons - Less support available since the language is relatively new.
Popular bots - Suzette

2014 - RiveScript
Uses - Pattern Matching based on availability of variables, availability not limited to a single programming language.
Pros - Simple word pattern-matcher, uses simple visual syntax, easy to maintain, no need to memorize lots of random symbols.
Cons - Less support available since the language is relatively new.
Popular bots - Aires Bot

For the proposed solution, RiveScript will be utilized since it supports execution of scripts in the form of 'objects', which will prove to be useful when interacting with internet services.

Natural Language Processing Techniques - An area of interest in the research is the application of Natural Language Processing (NLP). NLP techniques apply knowledge about the structure of language to extract the names of entities, such as companies, products, and locations, as well as relations between entities and characteristics of those entities.

• Named entity recognition is the use of gazetteers or statistical techniques to identify named text features: people, organizations, place names, stock ticker symbols, certain abbreviations, and so on.
• Recognition of Pattern Identified Entities: Features such as telephone numbers, e-mail addresses, and quantities (with units) can be discerned via regular expression or other pattern matches.
• Relationship, fact, and event Extraction: identification of associations among entities and other information in text.

In the sentence "Like Saman De Silva's comment", the key pieces of information are the name of the entity to interact with (Saman De Silva), the product (comment), and the property of the product (like). Extracting entities and their relations requires natural language techniques that include identifying parts of speech, grouping words into phrases and assigning roles to phrases.

The team members identified that there are certain limitations in this area. Sentiment analysis is a common text analysis task. The objective is to categorize the overall attitude of a comment. In its most basic form, sentiment analysis categorizes comments as positive or negative, but it does not have the capability to extract various forms of attitudinal information. Simple text analytic techniques, such as counting positive and negative words, can give reasonable results in many cases. For example, "The hotel is in a great location with fantastic view of the city." has the positive words "great" and "fantastic." Now consider a similar sentence: "Fantastic location and great view." The same positive words appear but the commentator is using sarcasm to make a negative point. Simple text analytic techniques may not catch this.

Comparison of Similar Products - Several applications are available which visually disabled users may employ to interact with the computer. Table I summarizes them.
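As a minimal sketch of the pattern-identified entity recognition and command parsing discussed under Natural Language Processing Techniques, the Python fragment below uses regular expressions only. The patterns, the function names `extract_entities` and `parse_command`, and the assumed command shape "<action> <person>'s <object>" are hypothetical illustrations and are not taken from the Nethra implementation; real phone number, e-mail and unit formats vary far more widely.

```python
import re

# Hypothetical patterns for illustration only; real-world formats vary widely.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")
QUANTITY = re.compile(r"\b(\d+(?:\.\d+)?)\s?(km|kg|mb|gb|s)\b", re.IGNORECASE)

# Assumed command shape: "<action> <person>'s <object>".
COMMAND = re.compile(r"^(?P<action>\w+)\s+(?P<person>.+?)'s\s+(?P<target>\w+)$")


def extract_entities(text):
    """Find e-mail addresses, phone numbers and quantities with units."""
    return {
        "emails": EMAIL.findall(text),
        "phones": PHONE.findall(text),
        "quantities": [(float(v), u.lower()) for v, u in QUANTITY.findall(text)],
    }


def parse_command(utterance):
    """Split a command into action, person and target, or return None."""
    match = COMMAND.match(utterance.strip())
    if match is None:
        return None
    return {
        "action": match.group("action").lower(),
        "person": match.group("person"),
        "target": match.group("target").lower(),
    }


if __name__ == "__main__":
    print(parse_command("Like Saman De Silva's comment"))
    print(extract_entities("Call 011-2345678 or mail nethra@example.com about the 15 MB file"))
```

A pattern-based parser of this kind only handles the entity types it has an explicit pattern for; named entities such as people and places would still need a gazetteer or a statistical model, as noted above.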


TABLE I. COMPARISON OF SIMILAR PRODUCTS

Name                Type of Software     VR    Voice Output   Focus on Visually Disabled   Interaction With Internet   Cost
Jaws                Screen Reader        No    Yes            High                         Poor                        $895
Orca                Screen Reader        No    Yes            High                         Poor                        Free
Dolphin Guide       Screen Magnifier     No    Yes            High                         No                          $795
Level Star Icon     Braille Tablet       No    Yes            High                         No                          $1395
Microsoft Cortana   Personal assistant   Yes   Yes            Low                          No                          Free
Google Now          Personal assistant   Yes   Yes            Low                          No                          Free
Project Nethra      Personal Assistant   Yes   Yes            High                         Yes                         -----

*VR - VOICE RECOGNITION

III. METHODOLOGY

The proposed solution consists of the following components.

Speech recognition module - User input is given as voice commands and is detected via a speech recognition module. Speech is a continuous audio stream where stable states mix with dynamically changed states [9]. This is a challenge in itself because an audio recording is a waveform with no clearly distinguishable parts. Words are understood to be built of phones (the smallest unit of a waveform), but this is not entirely true. Various factors such as phone context, speaker and style of speech influence a waveform. Therefore the same word can be represented by two different waveforms.

The common way to recognize speech is the following: we take a waveform, split it into utterances at silences, then try to recognize what is being said in each utterance [10]. Three models are needed for this task: an acoustic model (waveform to phoneme mapping), a phonetic dictionary (word to phoneme mapping) and a language model (words). The most challenging aspect of speech recognition is accuracy, which becomes even more difficult when the acoustic model is unlimited. Therefore two different modules will be used in the project for speech recognition. The online speech recognition module will record voice and send the samples as .flac files to the Google voice API for processing. The offline module will consist of the three models mentioned above (acoustic model, phonetic dictionary and language model). This will be more accurate and faster than the online module but is intended for a limited vocabulary.

Fig. 1. Speech Recognition Module

Conversational Handling and Text-To-Speech modules - Appropriate responses to user inputs are generated by this module. A conversation can be thought of as a series of volleys, each of which is an input-response sequence. Several technologies have been used to implement conversational agents, starting with AIML, which is an XML based language.

The conversational agent of the current system is implemented using RiveScript, which is an advancement from AIML. Conversations are categorized into topics and each topic has a sequence of input/response patterns. Some inputs require the invocation of a system call, e.g. opening an application. This is done by invoking a Python script, which is also handled by the conversational agent.

Voice output is generated via the Text To Speech module and Pyttsx is utilized for this purpose. A high level diagram is shown in Fig. 3.

Content Extraction - The content extraction module is responsible for retrieving information from websites (HTML pages). This will enable the system to provide up-to-date information such as weather information and the latest news available on the internet. Web scraping techniques and libraries such as 'Beautiful Soup' are utilized for this purpose.

This module is also responsible for interaction with various APIs and SDKs such as Facebook, Gmail and Twitter.
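The content extraction step can be illustrated with a short Python sketch. The paper names Beautiful Soup for this purpose; to stay free of third-party dependencies, the sketch below swaps in the standard library's `html.parser` instead. The class `HeadlineParser`, the function `extract_headlines`, the assumed `<h2 class="headline">` markup and the sample page are all hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text of every <h2 class="headline"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self._in_headline = False
        self._buffer = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "h2" and ("class", "headline") in attrs:
            self._in_headline = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_headline:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_headline:
            # Join text fragments and collapse runs of whitespace
            self.headlines.append(" ".join("".join(self._buffer).split()))
            self._in_headline = False


def extract_headlines(html):
    """Return the headline strings found in an HTML document."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


# Hypothetical page standing in for a downloaded news site.
SAMPLE_PAGE = """
<html><body>
  <h2 class="headline">Heavy rain expected <em>tomorrow</em></h2>
  <p>Unrelated paragraph.</p>
  <h2 class="headline">Local team wins final</h2>
</body></html>
"""

if __name__ == "__main__":
    for headline in extract_headlines(SAMPLE_PAGE):
        print(headline)
```

The extracted strings could then be handed to the text-to-speech module so that the headlines are read aloud rather than displayed.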
Fig. 2. Content Extraction Module
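The utterance-splitting step described for the speech recognition module (take a waveform and split it into utterances at silences) can be sketched with a simple amplitude threshold. This is only an illustration under stated assumptions: the function name `split_on_silence`, the threshold and the minimum silence length are hypothetical, and production recognizers use considerably more robust voice activity detection.

```python
def split_on_silence(samples, threshold=0.1, min_silence=3):
    """Return (start, end) index pairs of utterances; end is exclusive.

    A sample counts as quiet when its absolute amplitude is below
    `threshold`. An utterance ends once `min_silence` consecutive quiet
    samples have been seen.
    """
    utterances = []
    start = None      # index where the current utterance began
    quiet_run = 0     # consecutive quiet samples inside the utterance
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence:
                # Exclude the trailing quiet samples from the utterance
                utterances.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # The stream ended while an utterance was still open
        utterances.append((start, len(samples) - quiet_run))
    return utterances


if __name__ == "__main__":
    waveform = [0, 0, 0.5, 0.6, 0, 0.4, 0, 0, 0, 0.7, 0.8, 0]
    print(split_on_silence(waveform))
```

Each returned slice would then be passed to the recognizer on its own, which is what allows the acoustic model, phonetic dictionary and language model to work on one utterance at a time.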

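The conversational handling described in the methodology (topics that each hold input/response patterns, with some inputs triggering a system call) can be sketched in plain Python. This is not RiveScript syntax and does not use the RiveScript library; it is a stand-in under stated assumptions, and the names `TOPICS`, `respond` and `open_application` are hypothetical. The handler only returns a string where a real system would launch a program.

```python
import re


def open_application(name):
    """Hypothetical stand-in for a system call that launches an application."""
    return f"Opening {name}"


# Each topic maps trigger patterns to either a canned reply or a callable.
TOPICS = {
    "general": [
        (re.compile(r"^hello$"), "Hello, how can I help you?"),
        (re.compile(r"^open (?P<name>.+)$"), open_application),
    ],
    "weather": [
        (re.compile(r"^(what is|tell me) the weather$"),
         "Fetching the weather for your location."),
    ],
}


def respond(utterance, topic="general"):
    """Return the reply for one volley (a single input-response exchange)."""
    # Normalize: lower-case and strip punctuation before matching
    text = re.sub(r"[^\w\s]", "", utterance.lower()).strip()
    for pattern, action in TOPICS.get(topic, []):
        match = pattern.match(text)
        if match:
            if callable(action):
                # Named groups become keyword arguments of the handler
                return action(**match.groupdict())
            return action
    return "Sorry, I did not understand that."


if __name__ == "__main__":
    print(respond("Hello!"))
    print(respond("Open calendar"))
    print(respond("Tell me the weather", topic="weather"))
```

Keeping handlers as ordinary callables mirrors the idea of attaching executable 'objects' to patterns, so a reply can come either from static text or from code that talks to an internet service.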

Users can interact with various web services and APIs using the proposed solution, and a summary of the functionality is given below.

Facebook - Users can create an account on Facebook, check and send messages, read notifications, request the software to read their newsfeed and post status updates via voice commands.

Weather - Users can get weather updates based on their location via voice commands.

Gmail - Users can check their inbox and ask the software to read emails and send emails via voice commands.

News - Users can request the software to read news, which can be further divided into local news, world news, sports news etc.

Music - Users can configure a separate folder on their computer for music downloads and play music files via voice commands.

Wikipedia - Users can search for content on Wikipedia and request the software to read the content via voice commands.

Google Search - Users can search Google and retrieve the top results, which will be read via voice.

Blogger - Users can start a new blog, read blog posts, and add comments via voice commands. This will enable visually disabled users to share their life experiences with others.

YouTube - Users can search videos on YouTube, and a special option is provided for users to download audio files and store them in a preconfigured music folder from which music files can be played via voice commands.

Web Search - Users can retrieve information by searching the web via voice commands, and the output will be read using speech synthesis.

[Figure: Voice Input, Speech Recognition Module, Chatbot, Python Backend, API Calls, Content Extraction, System Calls, Text To Speech Module, Voice Output]

Fig. 3. High level diagram

IV. RESULTS AND DISCUSSION

A questionnaire was prepared and presented to 100 visually disabled individuals; 25 of them were university undergraduates and the others were selected randomly. All the questions in this questionnaire were closed-ended, and by using this type of question we examined the expectations and requirements of visually disabled users. Additionally, they provided comprehensive feedback and suggestions.

TABLE II. QUESTIONNAIRE FEEDBACK

Question                                                                  Yes   No
Do you use a computer or laptop at work?                                  91%   9%
Have you used screen reader software before?                              68%   32%
Do you know about social network websites?                                89%   11%
Do you like to interact with social network and email services websites?  90%   10%

An experiment was performed to determine the speech recognition accuracy of the proposed system; the percentage of user commands recognized is given in TABLE III. The experiment was repeated ten times for each of the commands below.

The overall recognition accuracy obtained for the voice recognition system is 80% with the Google speech recognition API.

TABLE III. SPEECH RECOGNITION ACCURACY

Command              Male User   Female User
Open Calendar        80%         80%
Post event           80%         80%
Read notifications   80%         80%

This system provides support to interact with some web services, and many of the services are called through the 'http' protocol. Therefore the user needs to have an internet connection with the system-recommended bandwidth. An experiment was performed to find out the delay between requests and the response.

TABLE IV. RESPONSE TIME (S)

Type of the command             Response Time (s)
OS related                      5
Get information from API        15
Get information from RSS feed   10
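The two measurements reported above, recognition accuracy over repeated trials and the delay between a request and its response, can be computed as in the sketch below. The helper names `recognition_accuracy` and `timed_call`, the trial list and the stub handler are hypothetical; the paper does not describe its measurement harness, so this only shows the arithmetic under those assumptions.

```python
import time


def recognition_accuracy(outcomes):
    """Percentage of trials in which the command was recognized correctly."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    return 100.0 * sum(1 for ok in outcomes if ok) / len(outcomes)


def timed_call(handler, *args):
    """Run handler(*args) and return (result, elapsed_seconds)."""
    started = time.perf_counter()
    result = handler(*args)
    return result, time.perf_counter() - started


if __name__ == "__main__":
    # Ten repetitions of one command, eight recognized (hypothetical trial data)
    trials = [True] * 8 + [False] * 2
    print(recognition_accuracy(trials))

    # Stub standing in for an OS, API or RSS request handler
    reply, seconds = timed_call(lambda command: command.upper(), "open calendar")
    print(reply, round(seconds, 6))
```

With ten repetitions per command, eight correct recognitions correspond to the 80% figure, so each additional recognized trial shifts the reported accuracy by ten percentage points.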


V. CONCLUSION AND FUTURE WORK

The test results show that improving the offline speech recognition module could increase the accuracy and context-awareness of Nethra up to the level of online Google speech recognition. We will be focusing on training a more personalized and rich language model to increase the offline speech recognition accuracy. Since online speech recognition delay depends on the user's internet speed, most of the time we have to rely on the offline module.

The idea of an artificially intelligent virtual assistant that supports engaging conversations with the user, instead of working as a data service provider, is a key research area at present. So we will be focusing on increasing the contextual awareness of Nethra and on increasing the number of appropriate and meaningful responses, which allows trained instances of Nethra to function as the user intended.

Nethra is mainly focused on helping the visually disabled to access social media and other internet based services, because understanding digital content is an extremely important and hard problem for this user group. As the first step to solve this problem, future versions of Nethra will be capable of describing the context of digital photos posted on social media. Having an emotional engagement with conversations plays a big role in motivating users to actively engage with the system. Therefore understanding the user's emotional state would help produce more personalized and relevant conversations.

Furthermore, users can find out any information about the project and download the software by visiting the URL http://hellonethra.com

REFERENCES

[1] "WHO Visual impairment and blindness" [Online]. Available: www.who.int/mediacentre/factsheets/fs282/en/ [Accessed 25/06/2015]
[2] Michael F. Chiang, Roy G. Cole, Suhit Gupta and Gail E. Kaiser, "Computer and World Wide Web Accessibility by Visually Disabled Patients: Problems and Solutions", Department of Ophthalmology, Columbia University College of Physicians and Surgeons. 2011
[3] "Advantages and Disadvantages of Using a Screen Reader Instead of Braille" [Online]. Available: https://www.blindstreet.com/advantages-disadvantages-using-screen-reader-instead-braile [Accessed 25/06/2015]
[4] Elaine Gerber, "Conducting Usability Research With Computer Users Who Are Blind or Visually Impaired" [Online]. Available: http://www.afb.org/info/accessibility/creating-accessible-websites/usability-research/235 [Accessed 23/06/2015]
[5] "Designing an Accessible Website - Creating Accessible Websites" [Online]. Available: http://www.afb.org/info/accessibility/creating-accessible-websites/23 [Accessed 27/06/2015]
[6] "Web Content Accessibility Guidelines (WCAG) 2.0" [Online]. Available: www.w3.org/TR/WCAG20/ [Accessed 24/06/2015]
[7] "Web Accessibility Initiative - ARIA Overview" [Online]. Available: http://www.w3.org/WAI/intro/aria.php [Accessed 24/06/2015]
[8] "Beyond Façade: Pattern Matching for Natural Language Applications" [Online]. Available: http://www.gamasutra.com/view/feature/134675/beyond_fa%C3%A7ade_pattern_matching_.php?page=1 [Accessed 24/06/2015]
[9] "CMUSphinx Basic concepts of speech - Structure of speech" [Online]. Available: http://cmusphinx.sourceforge.net/wiki/tutorialconcepts [Accessed 27/06/2015]
[10] "CMUSphinx Basic concepts of speech - Speech Recognition process" [Online]. Available: http://cmusphinx.sourceforge.net/wiki/tutorialconcepts [Accessed 27/06/2015]
[11] Lawrence R. Rabiner and B.H. Juang, "Statistical Methods for the Recognition and Understanding of Speech", Rutgers University and the University of California, Santa Barbara; Georgia Institute of Technology, Atlanta, 20 September 2004, pp 1-6.
