PROJECT REPORT
ON
“TEXT RECOGNITION AND FACE DETECTION AID FOR VISUALLY IMPAIRED PERSON USING RASPBERRY PI”
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
BY
NAVYA K (1NH17CS418)
SWAPNA K M (1NH17CS426)
VIDHYA M (1NH17CS428)
CERTIFICATE
It is hereby certified that the project work entitled “TEXT RECOGNITION AND FACE
DETECTION AID FOR VISUALLY IMPAIRED PERSON USING RASPBERRY PI” is a bonafide
work carried out by NAVYA K (1NH17CS418), SWAPNA K M (1NH17CS426), VIDHYA
M (1NH17CS428) in partial fulfilment for the award of Bachelor of Engineering in
COMPUTER SCIENCE AND ENGINEERING of the New Horizon College of Engineering
during the year 2019-2020. It is certified that all corrections/suggestions indicated for
Internal Assessment have been incorporated in the Report deposited in the
departmental library. The project report has been approved as it satisfies the academic
requirements in respect of project work prescribed for the said Degree.
External Viva
1. ………………………………………….. ………………………………….
2. …………………………………………… …………………………………..
ABSTRACT
Speech and text are the main media of human communication. A person needs vision
to access the information in a text; those who have poor vision can instead gather
information from voice. For blind or visually impaired persons, the increasing
availability of cost-efficient, high-performance and portable digital imaging devices has
created a tremendous opportunity for supplementing traditional scanning for document
image acquisition. This project proposes a camera-based assistive text reader to help
visually impaired persons read the text present in a captured image. Faces can also be
detected when a person enters the frame, selected by the mode control. The proposed
idea involves text extraction from the scanned image using the Tesseract Optical
Character Recognition (OCR) engine and conversion of the text to speech by the e-Speak
tool, a process that enables visually impaired persons to read the text. This is a
prototype for blind people to recognize products in the real world by extracting the text
in an image and converting it into speech. The proposed method is implemented on a
Raspberry Pi, and portability is achieved with a battery backup, so the user can carry the
device anywhere and use it at any time. Identifying and announcing previously stored
faces as they enter the camera view can be implemented as a future extension. This
technology can help the millions of people in the world who experience a significant
loss of vision.
I
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task
would be impossible without the mention of the people who made it possible, whose
constant guidance and encouragement crowned our efforts with success.
We have great pleasure in expressing our deep sense of gratitude to Dr. Mohan
Manghnani, Chairman of New Horizon Educational Institutions for providing necessary
infrastructure and creating good environment.
We are grateful to Dr. Prashanth C.S.R, Dean Academics, for his unfailing
encouragement and suggestions, given to us in the course of our project work.
We would also like to thank Dr. B. Rajalakshmi, Professor and Head, Department of
Computer Science and Engineering, for her constant support.
We express our gratitude to Ms. Asha Rani Borah, Senior Assistant Professor,
Department of Computer Science and Engineering, our project guide, for constantly
monitoring the development of the project and setting up precise deadlines. Her
valuable suggestions were the motivating factors in completing the work.
Finally a note of thanks to the teaching and non-teaching staff of Dept. of Computer
Science and Engineering, for their cooperation extended to us, and our friends, who
helped us directly or indirectly in the course of the project work.
NAVYA K (1NH17CS418)
SWAPNA K M (1NH17CS426)
VIDHYA M (1NH17CS428)
II
CONTENTS
ABSTRACT I
ACKNOWLEDGEMENT II
LIST OF FIGURES V
LIST OF TABLES VI
1. INTRODUCTION
1.1. INFORMATION CONSISTENCY 1
1.2. PROBLEM DEFINITION 2
1.3. PROJECT PURPOSE 2
1.4. PROJECT FEATURES 2-3
1.5. MACHINE LEARNING 3-4
2. LITERATURE SURVEY
2.1. LITERATURE SURVEY 5-7
2.2. EXISTING SYSTEM 8
2.3. PROPOSED SYSTEM 9-13
2.4. SOFTWARE DESCRIPTION 14-15
3. REQUIREMENT ANALYSIS
3.1. FUNCTIONAL REQUIREMENTS 16
3.2. NON FUNCTIONAL REQUIREMENTS 16
3.3. HARDWARE AND SOFTWARE REQUIREMENTS 18
4. DESIGN
4.1. DESIGN GOALS 19
4.2. SYSTEM ARCHITECTURE 25-26
4.3. DATA FLOW DIAGRAM/ACTIVITY DIAGRAM 27-28
4.4. SEQUENCE DIAGRAM 29-30
III
5. IMPLEMENTATION
5.1. IMAGE CAPTURING 31
5.2. IMAGE PROCESSING MODULE 32-33
5.3. FACE DETECTION 33-34
5.4. VOICE PROCESSING MODULE 35-36
6. TESTING
6.1. UNIT TESTING 37
6.2. INTEGRATION TESTING 37
6.3. VALIDATION TESTING 38
6.4. SYSTEM TESTING 39
6.5. TESTING OF INITIALIZATION AND UI COMPONENTS 39-43
7. SNAPSHOTS/OUTPUT 44-48
8. CONCLUSION 49
REFERENCES 50
IV
LIST OF FIGURES
4.1 RASPBERRY PI 17
4.2 PI CAMERA 18
7.4 AUDIO 45
7.7 UNKNOWN 47
7.10 AUDIO 48
V
LIST OF TABLES
VI
Text recognition and face detection aid for visually impaired person using raspberry pi
CHAPTER 1
INTRODUCTION
To improve the ability of people who are blind or have significant visual impairments to
independently access, understand, and explore unfamiliar indoor and outdoor
environments, this project proposes a new framework that uses a single camera to
detect and recognize signs, text and obstacles and to give audio as output. The text
information associated with the detected objects is extracted and recognized. First,
text regions are extracted from indoor signs with multiple colors.
Text character localization and layout analysis of the text strings are applied to filter out
background interference. The object type, orientation, and location can be announced as
speech for blind travelers. In order to discriminate between similar objects in indoor
environments, the text information associated with the detected objects is extracted.
Optical Character Recognition (OCR) is a process that converts scanned or printed text
images, including handwritten text, into editable text for further processing. Speech
synthesis is the artificial synthesis of human speech. A Text-To-Speech (TTS) synthesizer
is a computer-based system that should be able to read any text aloud, whether it was
directly entered into the computer by an operator or scanned and submitted to an
Optical Character Recognition (OCR) system. Testing of the device was done on the
Raspberry Pi platform. The Raspberry Pi is a basic embedded system: a low-cost single-
board computer used to reduce the complexity of systems in real-time applications.
According to the fact sheet of 2016, 39 million people are blind and 246 million have low
vision. It is very difficult for them to do tasks like reading, writing or walking without
help. This project is an effort to minimize the dependence of the user on the people
around them for doing basic tasks. The device builds on the general human tendency of
pointing at objects to interact with the environment. The system captures an image of
any kind of text and converts it to speech. OCR stands for Optical Character
Recognition; this algorithm is used for converting the captured image into machine-
readable text. The system captures the image, localizes the text region, crops the text
from the image, and recognizes the text using OCR.
Speech and text are the main media of human communication. A person needs vision
to access the information in a text; those who have poor vision can instead gather
information from voice. This system proposes a camera-based assistive text reader to
help visually impaired persons read the text present in a captured image. It converts the
captured text image into audio to help visually impaired people. The main purpose of
this project is to present progressive work on developing an assistive aid for the visually
impaired. It will help them in object identification, face recognition and obstacle
detection as well as in reading newspapers and books. In this approach the object
identification, face recognition, text extraction and obstacle detection modules are
integrated into a single device. The camera module replaces the use of a finger for
tracing the text being read.
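The capture, OCR and speech stages described above can be glued together as a small pipeline. This is a minimal sketch, not the report's actual code: the three stage functions are hypothetical placeholders that, on the Raspberry Pi, would be replaced by the Pi camera grab, a Tesseract call and a TTS call.

```python
def reading_pipeline(capture, extract_text, speak):
    """Run one capture -> OCR -> speech cycle.

    Each stage is passed in as a function, so on the Raspberry Pi the
    stages would be the camera grab, a Tesseract OCR call and an
    eSpeak call, while on a desktop they can be simple stubs.
    """
    image = capture()              # grab one frame from the camera
    text = extract_text(image)     # OCR: image -> string
    if text.strip():               # speak only if any text was found
        speak(text)
    return text

# Stub usage (no hardware needed):
spoken = []
result = reading_pipeline(
    capture=lambda: "dummy-frame",
    extract_text=lambda img: "HELLO WORLD",
    speak=spoken.append,
)
```

Injecting the stages like this also makes the mode control easy to add later: a face-detection mode would simply swap in a different `extract_text` stage.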
The main intentions of the project are to help people who travel around the globe and
to help the blind by assisting them in reading text.
High rate of translation: the speed of translation is much higher than human
translation; it takes more than an hour for a human to translate 10,000 words,
whereas a few seconds are enough for the device.
Cost efficient.
A literature survey is the most important step in the software development process.
Before developing the tool, it is necessary to determine the time factor, economy and
company strength. Once these things are satisfied, the next step is to determine which
operating system and language can be used for developing the tool. Once the
programmers start building the tool, they need a lot of external support, which can be
obtained from senior programmers, from books or from websites. The above
considerations are taken into account before building the proposed system.
Machine Learning
Machine learning is a subfield of artificial intelligence (AI). The goal of machine learning
generally is to understand the structure of data and to fit that data into models that can
be understood and utilized by people.
Although machine learning is a field within computer science, it differs from traditional
computational approaches. In traditional computing, algorithms are sets of explicitly
programmed instructions used by computers to calculate or problem solve. Machine
learning algorithms instead allow for computers to train on data inputs and use
statistical analysis in order to output values that fall within a specific range. Because of
this, machine learning facilitates computers in building models from sample data in
order to automate decision-making processes based on data inputs.
Dept. of CSE, NHCE 3
Any technology user today has benefitted from machine learning. Facial recognition
technology allows social media platforms to help users tag and share photos of friends.
Optical character recognition (OCR) technology converts images of text into movable
type. Recommendation engines, powered by machine learning, suggest what movies or
television shows to watch next based on user preferences. Self-driving cars that rely on
machine learning to navigate may soon be available to consumers.
Machine learning is a continuously developing field. Because of this, there are some
considerations to keep in mind when working with machine learning methodologies or
analyzing the impact of machine learning processes.
Easily identifies trends and patterns: machine learning can review large volumes of data
and discover specific trends and patterns that would not be apparent to humans.
No human intervention needed (automation): with ML there is no need to babysit the
project at every step of the way. Since it means giving machines the ability to learn, it
lets them make predictions and also improve the algorithms on their own.
CHAPTER 2
LITERATURE SURVEY
A number of experimental investigations have been carried out on camera-based
reading aids and OCR systems for the visually impaired. The relevant literature is
reviewed below.
Asha G. Hagargund et al. carried out a work in which the basic framework is
an embedded system that captures an image, extracts only the region of interest (i.e.
the region of the image that contains text) and converts that text to speech. It is
implemented using a Raspberry Pi and a Raspberry Pi camera. The captured image
undergoes a series of image pre-processing steps to locate only that part of the image
that contains the text and to remove the background. Two tools are used to convert the
new image (which contains only the text) to speech: OCR (Optical Character
Recognition) software and a TTS (Text-to-Speech) engine. The audio output is heard
through the Raspberry Pi's audio jack using speakers or earphones.
OCR based automatic book reader for the visually impaired using Raspberry PI
(International Journal of Innovative Research in Computer and Communication
Engineering - 2016) [8]
Aaron James S et al. describe optical character recognition (OCR) as the identification of
printed characters using photoelectric devices and computer software. It converts
images of typed, handwritten or printed text into machine-encoded text, from a scanned
document or from subtitle text superimposed on an image. In this research these
images are converted into audio output. OCR is used in machine processes such as
cognitive computing, machine translation, text to speech, key data entry and text
mining, and it is a major field of research in character recognition, artificial intelligence
and computer vision. In this research the recognition process is image-processing based:
the character codes in text files are processed on a Raspberry Pi device, which
recognizes characters using the Tesseract algorithm and Python programming, and the
audio output is played back. OCR is also used for pattern recognition to perform
document image analysis (DIA), using information in grid format in the design and
construction of virtual digital libraries. This research mainly focuses on an OCR-based
automatic book reader for the visually impaired using the Raspberry Pi. The Raspberry
Pi features a Broadcom system on a chip (SoC) which includes an ARM-compatible CPU
and an on-chip graphics processing unit (GPU). It promotes Python as the main
programming language, with support for BBC BASIC.
A Smart Reader for Visually Impaired People Using Raspberry PI (International Journal
of Engineering Science and Computing – 2016) [11]
D. Velmurugan et al. carried out a work proposing a smart reader for visually challenged
people using Raspberry Pi. This report addresses the integration of a complete text
read-out system designed for the visually challenged. The system consists of a webcam
interfaced with a Raspberry Pi which accepts a page of printed text. The OCR (Optical
Character Recognition) package installed on the Raspberry Pi scans it into a digital
document which is then subjected to skew correction and segmentation before feature
extraction and classification. Once classified, the text is read out by a text-to-speech
conversion unit (TTS engine) installed on the Raspberry Pi. The output is fed to an audio
amplifier before it is read out. The simulation for the proposed project can be done in
MATLAB; the simulation is just an initiation of the image processing, with the image-to-
text and text-to-speech conversion done by the OCR software installed on the Raspberry
Pi. The system finds interesting applications in libraries, auditoriums and offices where
instructions and notices are to be read, and also in assisted filling of application forms.
Though there are many existing solutions to the problem of
assisting individuals who are blind to read, none of them provides a reading
experience that in any way parallels that of the sighted population. In particular, there is
a need for a portable text reader that is affordable and readily available to the blind
community.
Camera based Text to Speech Conversion, Obstacle and Currency Detection for Blind
Persons (Indian Journal of Science and Technology – 2016) [13]
J. K. R. Sastry carried out a work whose main object is to present an innovative
system that can help the blind handle currency. Methods/statistical analysis: many
image processing techniques have been used to scan the currency, remove the noise,
mark the region of interest and convert the image into text and then to sound which can
be heard by the blind. The entire system is implemented using a Raspberry Pi
microcontroller-based system. In the prototype model an IPR sensor is used instead of a
camera for sensing the object. Findings: in this report a novel method has been
presented by which one can recognize the object, mark the interesting region within the
object, scan the text and convert the scanned text into binary characters through optical
recognition. A second method has been presented by which the noise present in the
scanned image is eliminated before characters are recognized. A third method that can
be used to convert the recognized characters into e-speech through pattern matching
has also been presented. Applications: an embedded system has been developed based
on ARM technology which helps blind persons read currency notes. All the methods
presented in this project have been implemented within an embedded application. The
embedded board has been tested with different currency notes, and speech in English
has been generated that identifies the value of the currency.
Disadvantages:
In the existing system, the text extractor block, face recognition module and obstacle
detection module are available in the market only as separate units, so it is difficult for
users to carry the different modules with them.
FingerReader and OrCam devices are used by visually impaired people for reading
newspapers and text. Both devices add to the inconvenience and increase the market
cost.
This project presents a prototype system for recognition of the text present in an image
using Raspberry Pi. As illustrated in the block diagram, the system framework consists of
five functional components: image acquisition, image pre-processing, text extraction,
text-to-speech conversion and speech output.
The image of the text is captured using the Raspberry Pi camera or an HD webcam with
high resolution. The acquired image is then passed to the image pre-processing step for
reduction of unwanted noise. In image processing, image acquisition is defined as the
action of retrieving an image from some source, usually a hardware-based source, for
processing; it is the first step in the workflow sequence because without an image no
processing is possible. The image that is acquired is completely unprocessed. The
incoming energy is transformed into a voltage by the combination of input electrical
power and a sensor material that is responsive to the particular type of energy being
detected. The output voltage waveform is the response of the sensor(s), and a digital
quantity is obtained from each sensor by digitizing its response.
Advantages:
Low cost.
The main advantage of the Raspberry Pi module over other processors is that the
Raspberry Pi is a fully functional Linux computer that is also compact in size.
The main intention of the project is to help people who travel around the globe
and to help the blind by assisting them in reading text.
High rate of translation: the speed of translation is much higher than human
translation; it takes more than an hour for a human to translate 10,000 words,
whereas a few seconds are enough for the device.
In this step the extracted text is converted into speech using a speech synthesizer, the
TTS engine, which is capable of converting text to speech using predefined libraries.
The text-to-speech device consists of two main modules, the image processing module
and the voice processing module. The image processing module captures an image
using the camera and converts the image into text. The voice processing module
changes the text into sound and processes it with specific physical characteristics so
that the sound can be understood. The figure shows the block diagram of the text-to-
speech device: the first block is the image processing module, where OCR converts .jpg
to .txt form; the second is the voice processing module, which converts .txt to speech.
OCR is the most important element in this module. OCR, or Optical Character
Recognition, is a technology that automatically recognizes characters through an
optical mechanism. This technology imitates the ability of the human sense of sight:
the camera becomes a replacement for the eye, and image processing in the computer
engine is a substitute for the human brain. Tesseract OCR is a type of OCR engine with
matrix matching. The Tesseract engine was selected because of its flexibility and
extensibility, because many communities actively research and develop this OCR
engine, and because Tesseract OCR supports 149 languages. In this project we are
identifying English alphabets. Before the image is fed to the OCR engine, it is converted
to a binary image to increase the recognition accuracy. Binary conversion is done using
open source image-manipulation software. The output of OCR is the text, which is
stored in a file (speech.txt). Machines still have defects such as distortion at the edges
and dim-light effects, so it is still difficult for most OCR engines to achieve high-accuracy
text; some supporting conditions are needed to minimize these defects.
Tesseract OCR Implementation
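The binarization step described above can be sketched in a few lines. This is a simplified pure-Python illustration operating on a grid of pixel values rather than an image file, and the threshold value 128 is an assumed midpoint, not a value taken from the report; the resulting binary grid is what would be handed to the Tesseract engine.

```python
def binarize(pixels, threshold=128):
    """Map a grayscale pixel grid (rows of 0-255 values) to a binary
    grid: 255 for pixels brighter than `threshold`, else 0.
    Tesseract generally reads such clean binary input more accurately
    than raw camera frames."""
    return [[255 if p > threshold else 0 for p in row] for row in pixels]

# A tiny 2x3 "image": bright strokes on a dim background.
img = [[30, 200, 30],
       [210, 40, 220]]
bw = binarize(img)
```

In the actual system this operation is performed on whole image files by the image-manipulation software before Tesseract writes its output to speech.txt.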
For an image processing tool with powerful, highly accurate OCR and a wide range of
other capabilities, there is Trapeze from Softworks AI. Trapeze can recognize both
printed and handwritten text, and it even has features to process scanned documents
that are imperfect in quality.
In this module the text is converted to speech. The output of OCR is the text, which is
stored in a file (speech.txt). Here, the Festival software is used to convert the text to
speech. Festival is an open source Text-To-Speech (TTS) system, which is available in
many languages. In this project, the English TTS system is used for reading the text.
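The text-to-speech step can be sketched as a thin wrapper around a command-line synthesizer. The report mentions both Festival and e-Speak; the sketch below assumes the eSpeak CLI is installed (as it can be on Raspbian) and uses its standard -s (speed in words per minute) and -v (voice) flags. The default rate of 140 wpm is an assumption, not a value from the report.

```python
import subprocess

def build_espeak_command(text, wpm=140, voice="en"):
    """Assemble the argv list for the eSpeak command-line tool:
    -s sets the speaking rate in words per minute, -v the voice."""
    return ["espeak", "-s", str(wpm), "-v", voice, text]

def speak(text):
    """Read `text` aloud; on the Pi, audio goes to the 3.5 mm jack."""
    subprocess.run(build_espeak_command(text), check=True)

cmd = build_espeak_command("hello world", wpm=120)
# speak("hello world")   # uncomment on a Pi with eSpeak installed
```

Separating command construction from execution keeps the wrapper testable on a machine without a speech synthesizer.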
Data extraction tools are handy for accountants, who want to avoid wasting their
time on manual data entry. To make that possible, it helps to understand what OCR is
and how it works to read and understand expense-related documents. Put simply,
OCR (Optical Character Recognition) is a process used to turn an image file into a
text file. The process can be treated as a type of compression, since text documents
require significantly less space than picture files such as JPEG, PDF, etc. OCR techniques
are already used in many different fields. Some examples where OCR is thriving are
devices to help the visually impaired, algorithms that translate handwriting into text
documents, and automatic number plate recognition.
How does OCR work?
Step 1: A unique API key is generated when the optical character recognition software
is first integrated on mobile or desktop. Even for trial use, this key is issued so that sets
of scanned documents can be submitted.
Step 2: Upload any document in PDF, JPEG, TIFF, PNG etc. format to the software. The
software can be used as a white-labeled mobile app, on the desktop or as a cloud
solution to start scanning.
Step 3: The software starts extracting line items and other key fields such as logo,
expense type, merchant name, date of the transaction, amount, currency, VAT/GST,
business name etc. It can even be customized to extract any other information that
might be needed. The OCR software provides character-level and word-level confidence
scores; these scores indicate whether the OCR software believes the extracted
information to be accurate.
Step 4: The extracted data is made available in formats like XML, CSV, JSON etc.
as per the requirement.
2.4.1 Raspbian
Raspbian is a Debian-based computer operating system for the Raspberry Pi. Since 2015
it has been officially provided by the Raspberry Pi Foundation as the primary
operating system for the family of Raspberry Pi single-board computers.
The operating system is still under active development. Raspbian is highly optimized
for the Raspberry Pi line's low-performance ARM CPUs.
Basic features
OS family: Unix-like (Linux)
Platforms: ARM
Python was developed by Guido van Rossum, and it is free software. Free as in "free
beer", in that you can obtain Python without spending any money. But Python is also
free in other important ways: for example, you are free to copy it as many times as you
like, free to study the source code, and free to make changes to it. There is a worldwide
movement behind the idea of free software, initiated in 1983 by Richard Stallman.
Python is a good choice for mathematical calculations, since we can write code quickly,
test it easily, and its syntax is similar to the way mathematical ideas are expressed in the
mathematical literature. By learning Python you will also be learning a major tool used
by many web developers.
Python Features
Easy to Learn and Use. Python is easy to learn and use.
Interpreted Language.
Cross-platform Language.
Object-Oriented Language.
Extensible.
CHAPTER 3
REQUIREMENT ANALYSIS
The OpenCV library supports applications such as factory product inspection, medical
imaging, security, user interfaces, camera calibration, stereo vision and robotics.
The OCR and TTS tasks require OCR and TTS engines with predefined libraries.
OCR is also useful for visually challenged individuals who are not able to read a text
document but need to access its contents. It is utilized to digitize and reproduce
text.
3.2.1 ACCESSIBILITY
Accessibility is a general term used to describe the degree to which a product, device,
service, or environment is accessible by as many people as possible.
Our project enables people who are blind or have significant visual impairments to
independently access, understand, and explore unfamiliar indoor and outdoor
environments. Faces can also be detected when a person enters the frame, selected by
the mode control.
The proposed idea involves text extraction from a scanned image using the Tesseract
Optical Character Recognition (OCR) engine and conversion of the text to speech by the
e-Speak tool, a process that enables visually impaired persons to read the text.
3.2.2 MAINTAINABILITY
In software engineering, maintainability is the ease with which a software product can
be modified in order to:
Correct defects
3.2.3 SCALABILITY
The system is capable of handling increased total throughput under an increased load
when resources (typically hardware) are added.
The system can work normally under situations such as low bandwidth and a large
number of users.
3.2.4 PORTABILITY
CHAPTER 4
DESIGN
RASPBERRY PI:- The Raspberry Pi 3 Model B is the third generation Raspberry Pi. This
powerful credit-card-sized single-board computer can be used for many applications and
supersedes the original Raspberry Pi Model B+ and Raspberry Pi 2 Model B. Whilst
maintaining the popular board format, the Raspberry Pi 3 Model B brings a more
powerful processor, 10x faster than the first generation Raspberry Pi. Additionally it
adds wireless LAN and Bluetooth connectivity, making it the ideal solution for powerful
connected designs.
Key Benefits:
Low cost
Pi camera:- The camera module used in this project is the Raspberry Pi camera module
shown in fig. 4.2. The camera module plugs into the CSI connector on the Raspberry
Pi. It is able to deliver a clear 5 MP resolution image, or 1080p HD video recording at
30 fps. The camera module attaches to the Raspberry Pi by a 15-pin ribbon cable, to the
dedicated 15-pin MIPI Camera Serial Interface (CSI), which was designed especially for
interfacing to cameras. The CSI bus is capable of extremely high data rates, and it
exclusively carries pixel data to the BCM2835 processor.
Features
5MPixel sensor
Integral IR filter
Size: 36 x 36 mm
HEADSET:- The headset converts the result into voice output to help blind people. Its
audio input is a 3.5 mm jack. It receives the result from the Raspberry Pi module and
conveys it to the listener.
Image Acquisition
In this step the image of the text is captured using the Raspberry Pi camera or an HD
webcam with high resolution. The acquired image is then passed to the image pre-
processing step for reduction of unwanted noise.
In image processing, image acquisition is defined as the action of retrieving an image
from some source, usually a hardware-based source, for processing; it is the first step in
the workflow sequence because without an image, no processing is possible. The image
that is acquired is completely unprocessed. The incoming energy is transformed into a
voltage by the combination of input electrical power and a sensor material that is
responsive to the particular type of energy being detected. The output voltage
waveform is the response of the sensor(s), and a digital quantity is obtained from each
sensor by digitizing its response.
An example of a single sensor is a photodiode. To obtain a two-dimensional image
using a single sensor, there must be motion in both the X and Y directions: rotation
provides motion in one direction, and linear motion provides motion in the
perpendicular direction.
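The digitization of the sensor's voltage response described above amounts to quantizing a continuous value into an integer code. A minimal sketch follows, assuming an 8-bit converter and a 3.3 V full-scale reference; both are illustrative values, not specifications taken from the report's hardware.

```python
def digitize(voltage, v_max=3.3, bits=8):
    """Quantize a sensor voltage in [0, v_max] to an integer ADC code.

    The voltage is clamped to the valid range and mapped linearly
    onto 2**bits - 1 discrete levels, as a real ADC would do.
    """
    levels = 2 ** bits - 1
    v = min(max(voltage, 0.0), v_max)   # clamp out-of-range inputs
    return round(v / v_max * levels)

digitize(0.0)    # no response -> code 0
digitize(3.3)    # full-scale response -> code 255
```

Repeating this conversion for every sensor position (or every photosite of the camera) yields the unprocessed digital image handed to pre-processing.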
Image Pre-processing
In image pre-processing the unwanted noise in the image is removed by applying an
appropriate threshold (Otsu), morphological transformations like dilation and black-hat
transformation, and discrete cosine transformations, then generating the required
contours and drawing bounding boxes around the required text content in the image.
Initially the captured image is rescaled to an appropriate size and converted into a
grayscale image so that it is more useful for further processing.
Then the discrete cosine transformation is applied to the grey image to compress the
image which helps to improve processing rate. Then by setting the vertical and
horizontal ratio unwanted high frequency components present in the image are
eliminated.
Then the inverse discrete cosine transform is applied for decompression, and the image
undergoes morphological operations like black top-hat transformation and dilation. The
black top-hat transformation is applied to the image by generating appropriate
structuring elements; it extracts the objects or elements which are smaller than the
defined structuring elements and darker than their surroundings.
Then the dilation operation is performed, which adds pixels to the boundaries of the
objects present in the image. The number of pixels added to the objects depends on the
size and shape of the structuring element defined to process the image. After the
morphological operations, thresholding is applied to the morphologically transformed
image. Here Otsu's thresholding algorithm, an adaptive thresholding algorithm, is
applied to the image. After thresholding, the contours for the image are generated
using special functions in OpenCV.
These contours are used to draw the bounding boxes for the objects or elements
present in the image. Using these bounding boxes, every character present in the image
is extracted and then passed to the OCR engine to recognize the entire text present in
the image.
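The Otsu thresholding step used above picks the threshold that maximizes the between-class variance of the grayscale histogram. A minimal NumPy sketch of the idea follows (in the actual pipeline OpenCV's cv2.threshold with the THRESH_OTSU flag performs this search internally):

```python
import numpy as np

def otsu_threshold(gray):
    """Return Otsu's optimal threshold for an 8-bit grayscale array.

    The threshold t maximizing the between-class variance
    (mu_T * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t)))
    is found by an exhaustive scan over all 256 candidate levels.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # class-0 probability up to t
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean up to t
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.nanargmax(sigma_b))

# Bimodal test image: dark background (30) and bright text (200).
img = np.array([[30] * 10 + [200] * 10] * 8, dtype=np.uint8)
t = otsu_threshold(img)                  # lands between the two modes
binary = (img > t).astype(np.uint8) * 255
```

Because the threshold adapts to the histogram of each frame, the same code handles both well-lit and dim captures without manual tuning.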
Text Extraction
In this step the recognized text present in the image is extracted using OCR engines; in
this project the Tesseract OCR engine is used to extract the recognized text.
The aim of Optical Character Recognition (OCR) is to classify optical patterns (often
contained in a digital image) corresponding to alphanumeric or other characters. The
process of OCR involves several steps including segmentation, feature extraction, and
classification. In principle, any standard OCR software can be used to recognize the
text in the segmented frames. However, a hard look at the properties of the candidate
character regions in the segmented frames or image reveals that most OCR software
packages have significant difficulty recognizing the text.
Document images are different from natural images because they contain mainly text
with a few graphics and images. Due to the very low resolution of images captured
using handheld devices, it is hard to extract the complete layout structure (logical or
physical) of the documents, and even harder to apply standard OCR systems. For this
reason, a shallow representation of the low-resolution captured document images is
proposed. In the case of original electronic documents in the repository, the extraction
of the same signature is straightforward: the PDF or PowerPoint form of the original
electronic document is converted into a relatively high-resolution image (TIFF or JPEG)
on which the signature is computed. Finally, the captured document's signature is
compared with all the original electronic document signatures in order to find a
match.
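The matching step above can be sketched as follows. The actual signature used by such systems is not specified here, so this toy version (our own assumption) average-pools each page to an 8x8 grid and matches by Euclidean distance:

```python
import numpy as np

def signature(image, size=(8, 8)):
    # Toy "shallow signature": average-pool the grayscale page to an 8x8
    # grid, so a low-resolution capture lands near its original.
    h, w = image.shape
    ys = np.linspace(0, h, size[0] + 1, dtype=int)
    xs = np.linspace(0, w, size[1] + 1, dtype=int)
    sig = np.empty(size)
    for i in range(size[0]):
        for j in range(size[1]):
            sig[i, j] = image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
    return sig / 255.0

def best_match(captured, repository):
    # Compare the captured page's signature against every original
    cap = signature(captured)
    return min(repository,
               key=lambda name: np.linalg.norm(signature(repository[name]) - cap))

# Two "original" pages and a noisy, low-quality capture of page A
rng = np.random.default_rng(1)
page_a = np.zeros((64, 64)); page_a[:32, :] = 255   # text block in top half
page_b = np.zeros((64, 64)); page_b[:, :32] = 255   # text block in left half
repo = {"A": page_a, "B": page_b}
captured = np.clip(page_a + rng.normal(0, 30, page_a.shape), 0, 255)
```

Because pooling averages out the capture noise, `best_match(captured, repo)` recovers `"A"` here.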
4.1.2 Algorithms
Start
Convert color image into gray image and then binary image.
A. Text Detection
This phase takes an image or video frame as input and decides whether it contains text
or not. It also identifies the text regions in the image.
B. Text Localization
Text localization merges the text regions to formulate the text objects and defines the
tight bounds around the text objects.
C. Text Tracking
This phase is applied to video data only. For readability, text embedded in a video
appears in more than thirty consecutive frames.
Text tracking exploits these temporal occurrences of the same text object in multiple
consecutive frames. It can be used to rectify the results of the text detection and
localization stages. It is also used to speed up the text extraction process by not
applying the binarization and recognition steps to every detected object.
D. Text Binarization
This step is used to segment the text object from the background within the bounded
text objects. The output of text binarization is a binary image, where text pixels and
background pixels appear in two different binary levels.
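Binarization can be illustrated with a from-scratch version of Otsu's method (the adaptive thresholding algorithm mentioned in section 4.1.1); this is a minimal sketch for explanation, not the OpenCV implementation the project actually uses:

```python
import numpy as np

def otsu_threshold(gray):
    # Pick the gray level t that maximizes the between-class variance
    # between background (<= t) and foreground (> t) pixels.
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    mu_total = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0 = 0.0    # cumulative background pixel count
    mu0 = 0.0   # cumulative background intensity sum
    for t in range(256):
        w0 += hist[t]
        mu0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = mu0 / w0
        mean1 = (mu_total - mu0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A bimodal test image: dark text pixels (40) on a light page (200)
gray = np.concatenate([np.full(100, 40), np.full(100, 200)]).astype(np.uint8)
t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0).astype(np.uint8)  # two-level output
```

The resulting `binary` image contains exactly the two levels (text and background) that the recognition module expects.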
E. Character Recognition
The last module of text extraction process is the character recognition. This module
converts the binary text object into the ASCII text. Text detection, localization and
tracking modules are closely related to each other and constitute the most challenging
and difficult part of extraction process.
Process: A process receives input data and produces output with a different content or
form. A process can be as simple as collecting input data and saving it in the database,
or as complex as producing a report containing the monthly sales of all retail stores in
the northwest region.
The text character localization and layout analysis of text strings are applied to filter out
background interference. The object type, orientation, and location can be displayed as
speech for blind travelers.
To improve the ability of people who are blind or have significant visual impairments to
independently access, understand, and explore unfamiliar indoor environments, we
propose a new framework using a single camera to detect and recognize faces,
obstacles, and signs, incorporating the text information associated with the detected
objects. In
order to discriminate similar objects in indoor environments, the text information
associated with the detected objects is extracted.
Optical Character Recognition (OCR) is a process that converts scanned or printed text
images, as well as handwritten text, into editable text for further processing. Speech
synthesis is
the artificial synthesis of human speech. A Text-To-Speech (TTS) synthesizer is a
computer-based system that should be able to read any text aloud, whether it was
directly introduced in the computer by an operator or scanned and submitted to an
Optical Character Recognition (OCR) system.
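The project's abstract names the eSpeak tool for this TTS step. A minimal wrapper might shell out to the `espeak` command line; this is a sketch (the function name and defaults are our own), guarded so it degrades gracefully when eSpeak is not installed:

```python
import shutil
import subprocess

def speak(text, voice="en", speed=140):
    # Build the eSpeak command line; -v selects the voice,
    # -s the speaking rate in words per minute.
    cmd = ["espeak", "-v", voice, "-s", str(speed), text]
    if shutil.which("espeak"):   # only invoke eSpeak when it is installed
        subprocess.run(cmd, check=True)
    return cmd                   # returned for inspection/testing

cmd = speak("hello world")
```

On the Raspberry Pi, the OCR output file's contents would be passed to `speak` to read the recognized text aloud.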
A sequence diagram shows a set of objects and the messages sent and received by
those objects. The objects are typically named or anonymous instances of classes, but
may also represent instances of other things, such as collaborations, components, and
nodes. These diagrams are used to illustrate the dynamic view of a system.
For an image processing tool with powerful, highly accurate OCR and a wide range of
other capabilities, check out Trapeze from Softworks AI. Trapeze can recognize both
printed and hand-written text, and it even has features to process scanned documents
that are imperfect in quality.
CHAPTER 5
IMPLEMENTATION
Code
import os
import cv2
import numpy as np
import pygame
Enable the camera settings on the board to capture the image and save it in the folder.
Run the Python code to check the enhancement algorithms and remove the noise
present in the image.
Code
while True:
    # ... menu options 1 and 2 are printed here ...
    print("3. Exit")
    opt = input("Enter option: ")
    if opt == "1":
        #image_name = "Book-"+datetime.datetime.now().strftime("%H-%M-%S")+".jpg"
        image_name = "Books.jpg"
        self.camera.start_preview()
        time.sleep(2)
        self.camera.capture(image_name)
        # Read the captured image back and send it to the Vision client
        with io.open(image_name, 'rb') as image_file:
            content = image_file.read()
        image = types.Image(content=content)
        response = self.imgClient.document_text_detection(image=image)
        labels = response.full_text_annotation
3. Matching the features of an input image to the features in the saved XML files to
predict identity.
Code
# Collect the recognizer's predictions for the detected face region
result = cv2.face.StandardCollector_create()
self.rec.predict_collect(hog[y:y+h, x:x+w], result)
id = result.getMinLabel()
conf = result.getMinDist()
if conf < 100:
    if id == 1:
        id = "Bill Gates"
    elif id == 2:
        id = "Ratan Tata"
    else:
        id = "Modi"
else:
    id = "Unknown"
showImage = None
# Equalize the histogram of the grayscale image
equ = cv2.equalizeHist(gray)
In this module the text is converted to speech. The output of OCR is the text, which is
stored in a file (speech.txt). This speech.txt file is then converted to speech.
Once a face is captured, it is checked against the stored IDs; if an ID matches, the
corresponding name is converted to speech.
Code
def generateAudio(self, text, audioname):
    input_text = texttospeech.types.SynthesisInput(text=text)
    voice = texttospeech.types.VoiceSelectionParams(
        language_code='en-US',
        ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
    # audio_config: configuration for the output audio file. Other
    # encodings such as LINEAR16 (WAV) and OGG_OPUS are also supported.
    audio_config = texttospeech.types.AudioConfig(
        audio_encoding=texttospeech.enums.AudioEncoding.MP3)
    # self.ttsClient: a texttospeech.TextToSpeechClient created elsewhere
    response = self.ttsClient.synthesize_speech(input_text, voice, audio_config)
    # The response's audio_content is in binary format, so it is
    # written to the output file in binary mode.
    with open(audioname, 'wb') as out:
        out.write(response.audio_content)
CHAPTER 6
TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies, assemblies, and/or a finished product. It is
the process of exercising software with the intent of ensuring that the software system
meets its requirements and user expectations and does not fail in an unacceptable
manner. There are various types of tests, and each test type addresses a specific testing
requirement.
TYPES OF TESTS
6.1 UNIT TESTING
Unit testing involves the design of test cases that validate that the internal program logic
is functioning properly, and that program inputs produce valid outputs. All decision
branches and internal code flow should be validated. It is the testing of individual
software units of the application, and it is done after the completion of an individual
unit before integration. This is structural testing that relies on knowledge of the unit's
construction and is invasive. Unit tests perform basic tests at the component level and
test a specific business process, application, and/or system configuration. Unit tests ensure
that each unique path of a business process performs accurately to the documented
specifications and contains clearly defined inputs and expected results.
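As a concrete illustration, the id-to-name mapping from the face recognition snippet in Chapter 5 can be factored into a small function and unit-tested branch by branch. The function below mirrors that snippet's logic; the test harness itself is our own sketch:

```python
KNOWN_FACES = {1: "Bill Gates", 2: "Ratan Tata", 3: "Modi"}

def label_from_prediction(pred_id, distance, threshold=100):
    """Map a recognizer prediction to a name, mirroring the Chapter 5 logic:
    distances at or above the threshold are treated as an unknown face."""
    if distance >= threshold:
        return "Unknown"
    return KNOWN_FACES.get(pred_id, "Modi")

# One test per decision branch, with clearly defined inputs and expected results
assert label_from_prediction(1, 50) == "Bill Gates"
assert label_from_prediction(2, 99) == "Ratan Tata"
assert label_from_prediction(3, 10) == "Modi"
assert label_from_prediction(1, 150) == "Unknown"
```

Each assertion exercises one unique path through the mapping, which is exactly the per-branch coverage the paragraph above describes.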
As a rule, system testing takes as its input all of the "integrated" software components
that have successfully passed integration testing, together with the software system
itself integrated with any applicable hardware system(s).
System testing seeks to detect defects both within the "inter-assemblages" and within
the system as a whole.
Table 6.1: Test case for Capture Image from Raspberry Pi Camera
Table 6.2: Test Case for Conversion of image to txt using OCR tool
Table 6.3: Test Case for Conversion of text to voice using TTS system
Table 6.4: Test Case for Capture human face using Raspberry Pi Camera
CHAPTER 7
SNAPSHOT
Fig 7.4 Audio (AUD-20200427-WA0005.opus)
Fig 7.10 Audio (AUD-20200427-WA0005.opus)
CHAPTER 8
8.1 CONCLUSION
This project presents a design for face and text recognition based on Raspberry Pi,
mainly intended for blind navigation. Our future work will focus on detecting the
emotions of persons and recognizing more types of indoor objects and icons on
signage, in addition to text, as an indoor wayfinding aid to help blind people travel
independently. We will also study the significant human interface issues, including
auditory output and spatial updating of object location, orientation, and distance. With
real-time updates, blind users will be able to better use spatial memory to understand
the surrounding environment, obstacles and signs.