

Smart Reader for Visually Impaired People
Based on Optical Character Recognition

Muhammad Farid Zamir¹, Khan Bahadar Khan¹, Shafquat Ahmmad Khan¹, and Eid Rehman²

¹ Department of Telecommunication Engineering, UCET, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
  mfzamir67@gmail.com, ahmadatlive@gmail.com
² Department of Computer Science, International Islamic University Islamabad, Islamabad, Pakistan
  eidrehmanktk@gmail.com

Abstract. There are millions of visually impaired people in the world. According to the World Health Organization (WHO) data on visual impairment, 1.3 billion people live with some form of visual impairment, while 36 million people are completely blind. Reading is one of the major necessities of visually impaired people. Numerous researchers have worked on mechanisms that allow blind people to detect obstacles, read labels or specific currencies, and read written, typed or printed text. We propose a Raspberry Pi based system that assists visually impaired people by converting text into a voice signal. An Optical Character Recognition (OCR) scheme is employed to detect printed text captured by a camera, and typed or handwritten characters are converted into machine-encoded text. In the proposed method, OCR runs on the Raspberry Pi as the core of an image-to-audio converter whose audio is the output of the system. The result is a smart real-time reading device based on OCR.

Keywords: Optical character recognition · Raspberry Pi · Tesseract · Open source · eSpeak · Python programming · Real-time · Voice output · Voice signal

1 Introduction

The World Health Organization (WHO) revealed some alarming statistics in its report on visual impairment data for 2010 [1]. Table 1 below classifies blindness and low vision and shows the number of affected people by age group. It can be seen from the table that almost 285 million people are visually impaired. These figures establish blindness as a massive and growing health problem in the world. Reading is one of the major necessities of blind people. The earliest method used by blind people to read was the Braille system of dots, which requires complete learning and understanding of that system. Even in the 21st century of scientific and technological revolution, a person still has to learn this old-fashioned Braille system just to be able to read, and the conversion of text into Braille alphabets is itself a very time-consuming task.


Table 1. WHO data on visual impairment 2010 [1]

Ages               | Population (millions) | Blind (millions) | Low vision (millions) | Visually impaired (millions)
Up to 14 years     | 1,848.50              | 1.421            | 17.518                | 18.939
Between 15 and 49  | 3,548.20              | 5.784            | 74.463                | 80.248
Older than 50      | 1,340.80              | 32.16            | 154.043               | 186.203
Overall            | 6,737.50              | 39.365 (0.58%)   | 246.024 (3.65%)       | 285.389 (4.24%)

(Percentages in parentheses are relative to the total population.)

To be read, books and documents must first be converted into this system of dots, which is a major demerit of the approach. Figure 1 shows the mapping of basic alphabets into Braille.

Fig. 1. Braille system of basic character mapping

In this era of technological development, many efforts have been made to assist blind people in reading. Many prototypes have been proposed and many devices have been built for this purpose, but none of them has been completely successful in fulfilling this need of the less privileged members of our society. Some of these systems lack versatility, some are not real-time applications, some have limitations on the text they can handle, and some suffer from long processing times. Clarity of the output voice and extraction of the complete desired text have also been problems faced by many researchers in the past.
The basic idea behind our research is to propose a system that assists the millions of visually impaired people reflected in the statistics of Table 1 in reading typed, handwritten and printed text without resorting to the old-fashioned and difficult system of Braille mapping. As discussed in the previous paragraph, a number of advances have been made for the same purpose, but none of them has completely overcome the technical challenges and hurdles. We therefore aim to overcome those challenges and propose a versatile and complete system.
In this paper we propose and develop a device that provides reading assistance to blind people in real time by converting printed or typed text into voice. A Raspberry Pi is the brain of the device; it is connected to a camera that captures the image containing the text. OCR is used to detect the text, and software converts it into voice signals. Python programming, together with the Tesseract library and the eSpeak open-source synthesizer, handles image capture, text detection, conversion into digital form, and conversion into voice.

2 Literature Review

Several systems and methodologies have been proposed in the literature that employ Raspberry Pi boards, microcontrollers and sensors together with different software tools, including Python and MATLAB, to build reading aids for visually impaired people. Many of these systems use OCR to extract text from captured images and text-to-speech synthesis to convert it into voice signals.
Goel et al. [2] proposed an assistive reading system for visually impaired people that implements OCR through the Tesseract library and OpenCV for text detection, with a text-to-speech module for conversion into voice signals. Mandar et al. [3] proposed a system built from two modules, an image-processing module and a voice-processing module, likewise employing Tesseract in the image-processing module together with a text-to-speech synthesizer; the major limitation of this methodology is that it recognizes only a font size of 18. Subbiah et al. [4] proposed a Raspberry Pi based reader for blind people that uses the AdaBoost algorithm for the conversion of text to audio; a major disadvantage of AdaBoost is its sensitivity to noise and outliers, which can affect the detection and the subsequent conversion. Velmurugan et al. [5] proposed a reader for visually impaired people based on OCR and a text-to-speech engine, with image processing and conversion implemented in MATLAB; the major limitation of using MATLAB for such a device is its processing time, which is comparatively slow for recursive systems. Bhargava et al. [6] proposed a Raspberry Pi based scheme that employs OCR and TTS to help blind people read text, using the ImageMagick open-source software to display the edited images; one drawback of this scheme is that it does not provide precision and clarity in the output voice signal. Shah et al. [7] proposed a Raspberry Pi based mechanism for blind people to identify labels and product packages, implementing OCR only for the identification of labels containing text; a major limitation is that the scheme was not tested on normal printed, typed or handwritten text.
Ezhilarasi et al. [8] proposed a mechanism for identifying currency and text using the SURF and MNS algorithms, a comparison-based method in which the captured image is compared with previously stored images; it was built for the identification of currency notes and colours. Abinaya et al. [9] proposed a Raspberry Pi based assistive device for blind people in which OCR is implemented; the major demerit of this mechanism is the use of MATLAB for image processing and OCR, whose comparatively long processing time makes it unsuitable for a recursive real-time device. Pooja et al. [10] proposed the design of a virtual eye for blind people using microcontrollers, ultrasonic sensors and a Raspberry Pi; it is a feedback system designed for object detection, it does not involve text reading, and the use of sensors and microcontrollers makes it less cost-effective. Saurabh et al. [11] proposed a mechanism that converts normal typed or printed text into Braille, employing a Raspberry Pi and microcontrollers with OCR; the major demerit is that conversion of text into Braille is not a feasible solution to the reading problem of blind people, since converting massive amounts of text such as books and documents into Braille is an extremely difficult task, and the design is not cost-effective, although the same scheme could be employed to obtain voice output.

3 Proposed Method

The proposed smart reader is built around a Raspberry Pi, a single-board minicomputer that, in this project, extracts text from the captured image and converts it into audio, which is the output of the system (Fig. 2).

Fig. 2. Block diagram of the proposed model, illustrating the hardware implementation.

As shown in the block diagram, a 5 V supply feeds the Raspberry Pi through a switched-mode power supply (SMPS), which converts the 230 V mains supply to 5 V. The web camera is connected to one of the four USB ports of the Raspberry Pi. The operating system of the Raspberry Pi is Raspbian, which hosts the conversion process. The audio jack of the Pi provides the voice output, which is amplified by an audio amplifier, and the Ethernet port of the Raspberry Pi is used for the internet connection. The typed or printed text to be read is placed on the base wall, and the camera mounted in front of the base wall is focused to capture the image. The installed OCR software processes the captured image and performs the conversion. A text-to-speech (TTS) engine then converts the recognized text into vocalization. Connected speakers, driven by the audio amplifier, deliver the final output; for ease and comfort, headphones may be used in place of the speakers.

3.1 Architecture of the Proposed Model


The architecture of the proposed model is described in two main phases:
1. Image acquisition and its conversion into text
2. Text-to-speech conversion

Fig. 3. The architecture of the proposed model, which explains the process from text to audible output.

Image acquisition, its conversion into text, and text-to-speech conversion are carried out in the steps described in Sect. 3.2 below.

3.2 Flow of Process

Image Capturing: As shown in Fig. 3 Part A, the first step of the proposed model is capturing an image of the typed or printed page. The page is placed in front of the stationed camera, whose focus is set properly to obtain a good-quality image of the text. A high-resolution camera gives a high-quality image, which enables fast and clear recognition of the text.
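A minimal capture sketch, in the spirit of this step, is given below. It assumes OpenCV (cv2, version 3 or later) is available on the Pi, that the web camera enumerates as video device 0, and that the file name capture.png is used for illustration; none of these details are specified by the authors.

```python
import cv2

# Open the USB web camera (device index 0 is an assumption; it may
# differ depending on how the camera enumerates on the Pi).
camera = cv2.VideoCapture(0)

# Request a higher resolution, since a sharper image of the text
# helps the later OCR stage.
camera.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

ok, frame = camera.read()
camera.release()

if not ok:
    raise RuntimeError("Could not capture an image from the camera")

# Save the captured frame for the next processing steps.
cv2.imwrite("capture.png", frame)
```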
Image Processing: As can be seen in Fig. 3, once the image is captured the goal is to extract the letters of the text, convert them into digital form and then speak them. Image processing is the framework applied to the image to collect information from it. The image fed into the system is first converted into a greyscale image, whose pixel values lie within a specified range; the letters are located using this range. In the greyscale image, the white content usually corresponds to the spacing between words, i.e. blank space, while the words themselves are identified by the black content.
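A sketch of this preprocessing step is shown below, assuming OpenCV and the capture.png file from the previous sketch; the adaptive-threshold block size and constant are illustrative values, not parameters reported by the authors.

```python
import cv2

# Load the captured image and convert it to greyscale.
image = cv2.imread("capture.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Adaptive thresholding produces a binary image in which the dark
# letters stand out against the white page background, even under
# uneven lighting. Block size 31 and constant 11 are example values.
binary = cv2.adaptiveThreshold(
    gray, 255,
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    cv2.THRESH_BINARY,
    31, 11)

cv2.imwrite("binary.png", binary)
```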
Optical Character Recognition: Optical character recognition, commonly referred to as OCR, is a mechanism for recognizing text that allows handwritten, typed or printed text to be converted into modifiable soft copies or text files. OCR plays the primary role of extracting text from images and transforming it into a modifiable text format. It is a widely used method for digitizing printed or typed text so that it can be edited, searched and used for many purposes such as machine learning, text-to-speech or cognitive computing.
Image to Text Conversion: As described in the architecture of the proposed model in Sect. 3.1, the captured input image is processed by the Tesseract OCR engine to perform the image-to-text conversion.
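The paper does not name the Python binding used to drive Tesseract; one common option, assumed here for illustration, is the pytesseract wrapper, which hands an image to the installed Tesseract engine and returns the recognized text as a string.

```python
import pytesseract
from PIL import Image

# Run the installed Tesseract OCR engine on the binarized image and
# collect the recognized text as a Python string.
text = pytesseract.image_to_string(Image.open("binary.png"))

# Store the text so that the text-to-speech stage can read it later.
with open("output.txt", "w") as handle:
    handle.write(text)
```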
Tesseract OCR Engine and Its Working: Figure 3 Part A shows the working flow of image-to-text conversion through Tesseract OCR. The input image is passed through adaptive thresholding, which produces a binary image; character recognition (OCR) is then applied to it, giving the desired output of a modifiable text file. The output file produced by Tesseract has the .txt extension, and it provides results with 100% accuracy (Fig. 4).
Tesseract OCR was developed between 1985 and 1994 on HP-UX at HP to run on desktop scanners. It was open-sourced in 2005 and is available as an OCR engine for Windows, Linux and macOS.
Fig. 4. Block diagram of the optical character recognition process and its stepwise detail

It is a command-line tool to which an image containing text is fed as input. The tesseract command takes two arguments:
1. The first argument is the filename of the image that contains the text.
2. The second argument is the base name of the text file in which the extracted text is stored.
A minimal invocation from Python is sketched below.
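The following sketch, under the assumption that the Tesseract command-line tool is installed on the Pi, drives it from Python through subprocess; the file names capture.png and result are illustrative, and Tesseract itself appends the .txt extension to the output base name.

```python
import subprocess

# First argument: the image containing the text.
# Second argument: the output base name; Tesseract writes result.txt.
subprocess.call(["tesseract", "capture.png", "result"])

# Read back the recognized text from the file Tesseract produced.
with open("result.txt") as handle:
    recognized_text = handle.read()

print(recognized_text)
```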

Text-to-Speech: The second major step described in Sect. 3.1 is the conversion of the text into speech. A text-to-speech synthesizer, commonly referred to as TTS, is a system capable of reciting text automatically. It is a computer-based system that reads text aloud regardless of whether the text was scanned and extracted through optical character recognition (OCR) or fed in through a computer input stream (Fig. 5).

Fig. 5. Process of Text-to-Speech
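In the proposed implementation the eSpeak synthesizer plays this role. A minimal sketch of invoking it from Python is shown below; the speed and voice flags illustrate how the output can be adjusted, and the values used here are examples rather than the authors' settings.

```python
import subprocess

# Speak the recognized text aloud with eSpeak:
#   -f  read the text from a file
#   -s  speaking speed in words per minute
#   -v  voice/language to use
subprocess.call(["espeak", "-f", "result.txt", "-s", "140", "-v", "en"])

# Alternatively, write the speech to a WAV file for later playback.
subprocess.call(["espeak", "-f", "result.txt", "-w", "speech.wav"])
```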

TTS Architecture: The TTS architecture is explained in the block diagram of the proposed model in Fig. 3 Part B, which describes the process as follows. When the input image containing text, or the resulting text file, is fed into the system, it passes through several phases before emerging as voice output.
• In the text analysis phase, the text is arranged into a manageable list of words.
• Text normalization converts the input text into a pronounceable format; identifying pauses and punctuation marks is the key aim of this step.
• The transformation of orthographic symbols into phonological ones, taking the phonetic alphabet into account, is commonly referred to as grapheme-to-phoneme conversion.
• The combination of stress patterns, the rise and fall of the speech, and rhythm is known as prosody, while the emotion of the speaker is captured by modelling; this phase contributes to generating natural-sounding synthesized speech.
• Acoustic processing is the stage in which the type of speech synthesis is decided; the synthesis may use a pre-recorded human voice or generated intelligible speech. Articulatory synthesis, a computational technique that produces speech from models of the human vocal tract, falls within the domain of acoustic processing.
• After processing through all these phases, the intended voice output is produced.

4 Results and Discussion

A Raspberry Pi Model 3B with the 8-megapixel Camera Module V2 and speakers/headsets are the main hardware used to implement the proposed methodology. The software implementation is a major part of the proposed work. The following software is used on the Raspbian operating system: Python 2.7.13 with the Tesseract OCR 4.0.0 library, eSpeak 1.48.15 as the TTS synthesizer, and Thonny 3.1.2 as the development environment for the whole process. The major advantages of this software stack are versatility, speed, precision and efficiency. The proposed system has been designed to provide reading assistance to blind people in real time, with swift and precise processing of an image and its conversion into clear voice output. Figures 6, 7, 8 and 9 show the implementation of the proposed methodology on the hardware and the results obtained. Figure 6 shows the proposed methodology implemented on the hardware, with the Raspberry Pi camera module capturing the image that contains the text.

Fig. 6. The text used for testing on the hardware.


Figure 7 shows the captured image after it has been fed into the system; the quality of the image containing the text can be seen.

Fig. 7. The captured image by the Raspberry Pi Model 3B with camera module

Once the captured image containing the text has been fed into the system, the text is extracted through OCR and converted by Tesseract, as shown in Fig. 8.

Fig. 8. Conversion of the captured image to a text file by using Tesseract OCR.

We have implemented the proposed model and evaluated its versatility and success against its objective by comparing it with relevant past methodologies. The applied method is very cost-effective and economical, as past methods required many additional components to obtain the same output.

Figure 9 shows the Python code that converts the modifiable text file into the voice signal, which is the output of the system.

Fig. 9. Python code used to convert the text file into an audio file for blind people.
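The authors' script appears only as a screenshot in Fig. 9; a hedged end-to-end reconstruction of the pipeline described in this paper, tying together the capture, OCR and eSpeak steps sketched in Sect. 3.2, might look roughly as follows. All file names, the camera index and the eSpeak options are illustrative assumptions, not values taken from the paper.

```python
import subprocess
import cv2

def read_page_aloud(camera_index=0, speed_wpm=140, voice="en"):
    """Capture a page, OCR it with Tesseract, and speak it with eSpeak."""
    # 1. Capture the page with the attached camera.
    camera = cv2.VideoCapture(camera_index)
    ok, frame = camera.read()
    camera.release()
    if not ok:
        raise RuntimeError("Camera capture failed")

    # 2. Greyscale and adaptive threshold to isolate the letters.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    binary = cv2.adaptiveThreshold(gray, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 31, 11)
    cv2.imwrite("page.png", binary)

    # 3. Run the Tesseract CLI; it writes the recognized text to page.txt.
    subprocess.call(["tesseract", "page.png", "page"])

    # 4. Speak the recognized text with eSpeak at the chosen speed and voice.
    subprocess.call(["espeak", "-f", "page.txt",
                     "-s", str(speed_wpm), "-v", voice])

if __name__ == "__main__":
    read_page_aloud()
```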

The method has been implemented and results obtained with minimal hardware. The precision and accuracy of the method can be seen in the results. The processing time was compared with that of other methods, and this approach gave the best results; it was tested with multiple font styles and many font sizes, and several samples were used to verify the results. The clarity of the output voice was one of the main challenges in previous methodologies; the text-to-speech approach applied in our work is efficient, as its speed can be varied as desired and it provides multiple voice formats.

5 Conclusion and Future Work

The proposed method was implemented on the hardware and tested repeatedly with different samples. Our methodology successfully processes an image containing text and transforms it into audible speech. The experimental results show that the proposed method gives excellent results in processing the image, extracting the text and converting it into speech; in our tests, text extraction and its conversion into audible speech succeeded in 100% of cases. In the proposed method the speed of the voice output can be altered for the user's comfort, and several audio voices are available, which resolves the issue of voice clarity. The proposed method is feasible and versatile, as it addresses some major limitations and issues at a very economical cost, and it will prove to be a major step towards serving visually impaired people.
As we are living in an era of technological development, there is always room for improvement. We plan to extend our proposed method to Urdu text, which would serve millions of people in the Subcontinent. Moreover, further enhancements can be achieved by using high-resolution, well-focused cameras, which would improve the extraction of text, and by developing more efficient text-to-speech engines for swifter conversion into speech.

References
1. World Health Organization: Global data on visual impairment 2010. https://www.who.int/blindness/GLOBALDATAFINALforweb.pdf?ua=1
2. Goel, A., Sehrawat, A., Patil, A., Chougule, P., Khatavkar, S.: Raspberry Pi based reader for blind people (2018)
3. Jadhav, M.S., et al.: Raspberry Pi based reader for blind. Int. J. Innov. Emerg. Res. Eng. 5(1), 1639–1642 (2018)
4. Subbiah, A.: Camera based label reader for blind people. Int. J. Chem. Sci. 14(S3), 840–844 (2016). ISSN 0972-768X
5. Velmurugan, D., et al.: A smart reader for visually impaired people using Raspberry Pi. IJESC (2016). https://doi.org/10.4010/2016.699. ISSN 2321-3361
6. Bhargava, A., Nath, K.V.: Reading assistant for the visually impaired. 5(2) (2015)
7. Shah, P.H., et al.: A portable prototype label reading system for blind. 4(9) (2015)
8. Ezhilarasi, C., et al.: A Raspberry Pi based assistive aid for visually impaired users. 3(2) (2017)
9. Abinaya, R.I., et al.: Compact camera based assistive text product label reading and image identification for hand-held objects for visually challenged people. 3(1), 87–92 (2015)
10. Sharma, P., et al.: Design of microcontroller based virtual eye for the blind. 3(8) (2014)
11. Bisht, S., et al.: Refreshable Braille display using Raspberry Pi and Arduino. 6(3), June 2016
