PROJECT REPORT
ON
“TEXT RECOGNITION AND FACE DETECTION AID FOR VISUALLY IMPAIRED PERSON USING RASPBERRY PI”
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
BY
NAVYA K (1NH17CS418)
SWAPNA K M (1NH17CS426)
VIDHYA M (1NH17CS428)
CERTIFICATE
It is hereby certified that the project work entitled “TEXT RECOGNITION AND FACE
DETECTION AID FOR VISUALLY IMPAIRED PERSON USING RASPBERRY PI” is a bonafide
work carried out by NAVYA K (1NH17CS418), SWAPNA K M (1NH17CS426), VIDHYA
M (1NH17CS428) in partial fulfilment for the award of Bachelor of Engineering in
COMPUTER SCIENCE AND ENGINEERING of the New Horizon College of Engineering
during the year 2019-2020. It is certified that all corrections/suggestions indicated for
Internal Assessment have been incorporated in the Report deposited in the
departmental library. The project report has been approved as it satisfies the academic
requirements in respect of project work prescribed for the said Degree.
External Viva
1. ………………………………………….. ………………………………….
2. …………………………………………… …………………………………..
ABSTRACT
Speech and text are the main media of human communication. A person needs vision
to access the information in a text; those who have poor vision can instead gather
information from voice. For blind or visually impaired persons, the increasing
availability of cost-efficient, high-performance and portable digital imaging devices has
created a tremendous opportunity for supplementing traditional scanning for document
image acquisition. This project proposes a camera-based assistive text reader to help
visually impaired persons read the text present in a captured image. Faces can also be
detected when a person enters the frame, selected by the mode control. The proposed
idea involves text extraction from the scanned image using the Tesseract Optical
Character Recognition (OCR) engine and conversion of the text to speech by the e-Speak
tool, a process that enables visually impaired persons to read the text. This is a
prototype for blind people to recognize products in the real world by extracting the text
in an image and converting it into speech. The proposed method is implemented on a
Raspberry Pi, and portability is achieved with a battery backup, so the user can carry the
device anywhere and use it at any time. Identifying and announcing previously stored
faces as they enter the camera view can be implemented as a future extension. This
technology can help the millions of people in the world who experience a significant
loss of vision.
I
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task
would be impossible without the mention of the people who made it possible, whose
constant guidance and encouragement crowned our efforts with success.
We have great pleasure in expressing our deep sense of gratitude to Dr. Mohan
Manghnani, Chairman of New Horizon Educational Institutions for providing necessary
infrastructure and creating good environment.
We are grateful to Dr. Prashanth C.S.R, Dean Academics, for his unfailing
encouragement and suggestions, given to us in the course of our project work.
We would also like to thank Dr. B. Rajalakshmi, Professor and Head, Department of
Computer Science and Engineering, for her constant support.
We express our gratitude to Ms. Asha Rani Borah, Senior Assistant Professor,
Department of Computer Science and Engineering, our project guide, for constantly
monitoring the development of the project and setting up precise deadlines. Her
valuable suggestions were the motivating factors in completing the work.
Finally a note of thanks to the teaching and non-teaching staff of Dept. of Computer
Science and Engineering, for their cooperation extended to us, and our friends, who
helped us directly or indirectly in the course of the project work.
NAVYA K (1NH17CS418)
SWAPNA K M (1NH17CS426)
VIDHYA M (1NH17CS428)
II
CONTENTS
ABSTRACT I
ACKNOWLEDGEMENT II
LIST OF FIGURES V
LIST OF TABLES VI
1. INTRODUCTION
1.1. INFORMATION CONSISTENCY 1
1.2. PROBLEM DEFINITION 2
1.3. PROJECT PURPOSE 2
1.4. PROJECT FEATURES 2-3
1.5. MACHINE LEARNING 3-4
2. LITERATURE SURVEY
2.1. LITERATURE SURVEY 5-7
2.2. EXISTING SYSTEM 8
2.3. PROPOSED SYSTEM 9-13
2.4. SOFTWARE DESCRIPTION 14-15
3. REQUIREMENT ANALYSIS
3.1. FUNCTIONAL REQUIREMENTS 16
3.2. NON FUNCTIONAL REQUIREMENTS 16
3.3. HARDWARE AND SOFTWARE REQUIREMENTS 18
4. DESIGN
4.1. DESIGN GOALS 19
4.2. SYSTEM ARCHITECTURE 25-26
4.3. DATA FLOW DIAGRAM/ACTIVITY DIAGRAM 27-28
4.4. SEQUENCE DIAGRAM 29-30
III
5. IMPLEMENTATION
5.1. IMAGE CAPTURING 31
5.2. IMAGE PROCESSING MODULE 32-33
5.3. FACE DETECTION 33-34
5.4. VOICE PROCESSING MODULE 35-36
6. TESTING
6.1. UNIT TESTING 37
6.2. INTEGRATION TESTING 37
6.3. VALIDATION TESTING 38
6.4. SYSTEM TESTING 39
6.5. TESTING OF INITIALIZATION AND UI COMPONENTS 39-43
7. SNAPSHOTS/OUTPUT 44-48
8. CONCLUSION 49
REFERENCES 50
IV
LIST OF FIGURES
4.1 RASPBERRY PI 17
4.2 PI CAMERA 18
7.4 AUDIO 45
7.7 UNKNOWN 47
7.10 AUDIO 48
V
LIST OF TABLES
VI
Text recognition and face detection aid for visually impaired person using raspberry pi
CHAPTER 1
INTRODUCTION
To improve the ability of people who are blind or have significant visual impairments to
independently access, understand, and explore unfamiliar indoor and outdoor
environments, this project proposes a new framework that uses a single camera to
detect and recognize signs, text and obstacles and to give audio as output. The text
information associated with the detected objects is extracted and recognized. First,
text regions are extracted from indoor signs with multiple colors.
Text character localization and layout analysis of the text strings are applied to filter out
background interference. The object type, orientation, and location can be announced as
speech for blind travelers. In order to discriminate between similar objects in indoor
environments, the text information associated with the detected objects is extracted.
Optical Character Recognition (OCR) is a process that converts scanned or printed text
images, including handwritten text, into editable text for further processing. Speech
synthesis is the artificial synthesis of human speech. A Text-To-Speech (TTS) synthesizer
is a computer-based system that should be able to read any text aloud, whether it was
directly entered into the computer by an operator or scanned and submitted to an
Optical Character Recognition (OCR) system. Testing of the device was done on the
Raspberry Pi platform. The Raspberry Pi is a basic embedded system: a low-cost single-
board computer used to reduce the complexity of systems in real-time applications.
According to the fact sheet of 2016, 39 million people are blind and 246 million have low
vision. It is very difficult for them to do tasks like reading, writing or walking without
help. This project is an effort to minimize the dependence of the user on the people
around them for doing basic tasks. The device builds on the general human tendency of
pointing at objects to interact with the environment. The system captures an image of
any kind of text and converts it to speech. OCR stands for Optical Character
Recognition; this algorithm is used for converting the captured image into machine-
readable text. The system captures the image, localizes the text region, crops the text
from the image, and recognizes the text using OCR.
Speech and text are the main media of human communication. A person needs vision
to access the information in a text; those who have poor vision can instead gather
information from voice. This system proposes a camera-based assistive text reader to
help visually impaired persons read the text present in a captured image. It converts the
captured text image into audio to help visually impaired people. The main purpose of
this project is to present progressive work on developing an assistive aid for the visually
impaired. It will help them in object identification, face recognition and obstacle
detection as well as in reading newspapers and books. In this approach the object
identification, face recognition, text extraction and obstacle detection modules are
integrated into a single device. The camera module replaces the use of a finger for
tracing the text being read.
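The capture, OCR and speech stages described above can be glued together as a small pipeline. This is a minimal sketch, not the report's actual code: the three stage functions are hypothetical placeholders that, on the Raspberry Pi, would be replaced by the Pi camera grab, a Tesseract call and a TTS call.

```python
def reading_pipeline(capture, extract_text, speak):
    """Run one capture -> OCR -> speech cycle.

    Each stage is passed in as a function, so on the Raspberry Pi the
    stages would be the camera grab, a Tesseract OCR call and an
    eSpeak call, while on a desktop they can be simple stubs.
    """
    image = capture()              # grab one frame from the camera
    text = extract_text(image)     # OCR: image -> string
    if text.strip():               # speak only if any text was found
        speak(text)
    return text

# Stub usage (no hardware needed):
spoken = []
result = reading_pipeline(
    capture=lambda: "dummy-frame",
    extract_text=lambda img: "HELLO WORLD",
    speak=spoken.append,
)
```

Injecting the stages like this also makes the mode control easy to add later: a face-detection mode would simply swap in a different `extract_text` stage.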
The main intentions of the project are to help people who travel around the globe and
to help the blind by assisting them in reading text.
High rate of translation: the speed of translation is much higher than human
translation; it takes more than an hour for a human to translate 10,000 words,
whereas a few seconds are enough for the device.
Cost efficient.
A literature survey is the most important step in the software development process.
Before developing the tool, it is necessary to determine the time factor, economy and
company strength. Once these things are satisfied, the next step is to determine which
operating system and language can be used for developing the tool. Once the
programmers start building the tool, they need a lot of external support, which can be
obtained from senior programmers, from books or from websites. The above
considerations are taken into account before building the proposed system.
Machine Learning
Machine learning is a subfield of artificial intelligence (AI). The goal of machine learning
generally is to understand the structure of data and to fit that data into models that can
be understood and utilized by people.
Although machine learning is a field within computer science, it differs from traditional
computational approaches. In traditional computing, algorithms are sets of explicitly
programmed instructions used by computers to calculate or problem solve. Machine
learning algorithms instead allow for computers to train on data inputs and use
statistical analysis in order to output values that fall within a specific range. Because of
this, machine learning facilitates computers in building models from sample data in
order to automate decision-making processes based on data inputs.
Dept. of CSE, NHCE 3
Any technology user today has benefitted from machine learning. Facial recognition
technology allows social media platforms to help users tag and share photos of friends.
Optical character recognition (OCR) technology converts images of text into movable
type. Recommendation engines, powered by machine learning, suggest what movies or
television shows to watch next based on user preferences. Self-driving cars that rely on
machine learning to navigate may soon be available to consumers.
Machine learning is a continuously developing field. Because of this, there are some
considerations to keep in mind when working with machine learning methodologies or
analyzing the impact of machine learning processes.
Easily identifies trends and patterns: machine learning can review large volumes of data
and discover specific trends and patterns that would not be apparent to humans.
No human intervention needed (automation): with ML there is no need to babysit the
project at every step of the way. Since it means giving machines the ability to learn, it
lets them make predictions and also improve the algorithms on their own.
CHAPTER 2
LITERATURE SURVEY
A number of experimental investigations have been carried out on camera-based
reading aids and OCR systems for the visually impaired. The relevant literature is
reviewed below.
Asha G. Hagargund et al. carried out a work in which the basic framework is
an embedded system that captures an image, extracts only the region of interest (i.e.
the region of the image that contains text) and converts that text to speech. It is
implemented using a Raspberry Pi and a Raspberry Pi camera. The captured image
undergoes a series of image pre-processing steps to locate only that part of the image
that contains the text and to remove the background. Two tools are used to convert the
new image (which contains only the text) to speech: OCR (Optical Character
Recognition) software and a TTS (Text-to-Speech) engine. The audio output is heard
through the Raspberry Pi's audio jack using speakers or earphones.
OCR based automatic book reader for the visually impaired using Raspberry PI
(International Journal of Innovative Research in Computer and Communication
Engineering - 2016) [8]
Aaron James S et al. describe optical character recognition (OCR) as the identification of
printed characters using photoelectric devices and computer software. It converts
images of typed, handwritten or printed text into machine-encoded text, from a scanned
document or from subtitle text superimposed on an image. In this research these
images are converted into audio output. OCR is used in machine processes such as
cognitive computing, machine translation, text to speech, key data entry and text
mining, and it is a major field of research in character recognition, artificial intelligence
and computer vision. In this research the recognition process is image-processing based:
the character codes in text files are processed on a Raspberry Pi device, which
recognizes characters using the Tesseract algorithm and Python programming, and the
audio output is played back. OCR is also used for pattern recognition to perform
document image analysis (DIA), using information in grid format in the design and
construction of virtual digital libraries. This research mainly focuses on an OCR-based
automatic book reader for the visually impaired using the Raspberry Pi. The Raspberry
Pi features a Broadcom system on a chip (SoC) which includes an ARM-compatible CPU
and an on-chip graphics processing unit (GPU). It promotes Python as the main
programming language, with support for BBC BASIC.
A Smart Reader for Visually Impaired People Using Raspberry PI (International Journal
of Engineering Science and Computing – 2016) [11]
D. Velmurugan et al. carried out a work proposing a smart reader for visually challenged
people using Raspberry Pi. This report addresses the integration of a complete text
read-out system designed for the visually challenged. The system consists of a webcam
interfaced with a Raspberry Pi which accepts a page of printed text. The OCR (Optical
Character Recognition) package installed on the Raspberry Pi scans it into a digital
document which is then subjected to skew correction and segmentation before feature
extraction and classification. Once classified, the text is read out by a text-to-speech
conversion unit (TTS engine) installed on the Raspberry Pi. The output is fed to an audio
amplifier before it is read out. The simulation for the proposed project can be done in
MATLAB; the simulation is just an initiation of the image processing, with the image-to-
text and text-to-speech conversion done by the OCR software installed on the Raspberry
Pi. The system finds interesting applications in libraries, auditoriums and offices where
instructions and notices are to be read, and also in assisted filling of application forms.
Though there are many existing solutions to the problem of
assisting individuals who are blind to read, none of them provides a reading
experience that in any way parallels that of the sighted population. In particular, there is
a need for a portable text reader that is affordable and readily available to the blind
community.
Camera based Text to Speech Conversion, Obstacle and Currency Detection for Blind
Persons (Indian Journal of Science and Technology – 2016) [13]
J. K. R. Sastry carried out a work whose main object is to present an innovative
system that can help the blind handle currency. Methods/statistical analysis: many
image processing techniques have been used to scan the currency, remove the noise,
mark the region of interest and convert the image into text and then to sound which can
be heard by the blind. The entire system is implemented using a Raspberry Pi
microcontroller-based system. In the prototype model an IPR sensor is used instead of a
camera for sensing the object. Findings: in this report a novel method has been
presented by which one can recognize the object, mark the interesting region within the
object, scan the text and convert the scanned text into binary characters through optical
recognition. A second method has been presented by which the noise present in the
scanned image is eliminated before characters are recognized. A third method that can
be used to convert the recognized characters into e-speech through pattern matching
has also been presented. Applications: an embedded system has been developed based
on ARM technology which helps blind persons read currency notes. All the methods
presented in this project have been implemented within an embedded application. The
embedded board has been tested with different currency notes, and speech in English
has been generated that identifies the value of the currency.
Disadvantages:
In the existing system, the text extractor block, face recognition module and obstacle
detection module are available in the market only as separate units, so it is difficult for
users to carry the different modules with them.
FingerReader and OrCam devices are used by visually impaired people for reading
newspapers and text. Both devices add to the inconvenience and increase the market
cost.
This project presents a prototype system for recognition of the text present in an image
using Raspberry Pi. As illustrated in the block diagram, the system framework consists of
five functional components: image acquisition, image pre-processing, text extraction,
text-to-speech conversion and speech output.
The image of the text is captured using the Raspberry Pi camera or an HD webcam with
high resolution. The acquired image is then passed to the image pre-processing step for
reduction of unwanted noise. In image processing, image acquisition is defined as the
action of retrieving an image from some source, usually a hardware-based source, for
processing; it is the first step in the workflow sequence because without an image no
processing is possible. The image that is acquired is completely unprocessed. The
incoming energy is transformed into a voltage by the combination of input electrical
power and a sensor material that is responsive to the particular type of energy being
detected. The output voltage waveform is the response of the sensor(s), and a digital
quantity is obtained from each sensor by digitizing its response.
Advantages:
Low cost.
The main advantage of the Raspberry Pi module over other processors is that the
Raspberry Pi is a fully functional Linux computer that is also compact in size.
The main intention of the project is to help people who travel around the globe
and to help the blind by assisting them in reading text.
High rate of translation: the speed of translation is much higher than human
translation; it takes more than an hour for a human to translate 10,000 words,
whereas a few seconds are enough for the device.
In this step the extracted text is converted into speech using a speech synthesizer, the
TTS engine, which is capable of converting text to speech using predefined libraries.
The text-to-speech device consists of two main modules, the image processing module
and the voice processing module. The image processing module captures an image
using the camera and converts the image into text. The voice processing module
changes the text into sound and processes it with specific physical characteristics so
that the sound can be understood. The figure shows the block diagram of the text-to-
speech device: the first block is the image processing module, where OCR converts .jpg
to .txt form; the second is the voice processing module, which converts .txt to speech.
OCR is the most important element in this module. OCR, or Optical Character
Recognition, is a technology that automatically recognizes characters through an
optical mechanism. This technology imitates the ability of the human sense of sight:
the camera becomes a replacement for the eye, and image processing in the computer
engine is a substitute for the human brain. Tesseract OCR is a type of OCR engine with
matrix matching. The Tesseract engine was selected because of its flexibility and
extensibility, because many communities actively research and develop this OCR
engine, and because Tesseract OCR supports 149 languages. In this project we are
identifying English alphabets. Before the image is fed to the OCR engine, it is converted
to a binary image to increase the recognition accuracy. Binary conversion is done using
open source image-manipulation software. The output of OCR is the text, which is
stored in a file (speech.txt). Machines still have defects such as distortion at the edges
and dim-light effects, so it is still difficult for most OCR engines to achieve high-accuracy
text; some supporting conditions are needed to minimize these defects.
Tesseract OCR Implementation
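The binarization step described above can be sketched in a few lines. This is a simplified pure-Python illustration operating on a grid of pixel values rather than an image file, and the threshold value 128 is an assumed midpoint, not a value taken from the report; the resulting binary grid is what would be handed to the Tesseract engine.

```python
def binarize(pixels, threshold=128):
    """Map a grayscale pixel grid (rows of 0-255 values) to a binary
    grid: 255 for pixels brighter than `threshold`, else 0.
    Tesseract generally reads such clean binary input more accurately
    than raw camera frames."""
    return [[255 if p > threshold else 0 for p in row] for row in pixels]

# A tiny 2x3 "image": bright strokes on a dim background.
img = [[30, 200, 30],
       [210, 40, 220]]
bw = binarize(img)
```

In the actual system this operation is performed on whole image files by the image-manipulation software before Tesseract writes its output to speech.txt.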
For an image processing tool with powerful, highly accurate OCR and a wide range of
other capabilities, there is Trapeze from Softworks AI. Trapeze can recognize both
printed and handwritten text, and it even has features to process scanned documents
that are imperfect in quality.
In this module the text is converted to speech. The output of OCR is the text, which is
stored in a file (speech.txt). Here, the Festival software is used to convert the text to
speech. Festival is an open source Text-To-Speech (TTS) system, which is available in
many languages. In this project, the English TTS system is used for reading the text.
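The text-to-speech step can be sketched as a thin wrapper around a command-line synthesizer. The report mentions both Festival and e-Speak; the sketch below assumes the eSpeak CLI is installed (as it can be on Raspbian) and uses its standard -s (speed in words per minute) and -v (voice) flags. The default rate of 140 wpm is an assumption, not a value from the report.

```python
import subprocess

def build_espeak_command(text, wpm=140, voice="en"):
    """Assemble the argv list for the eSpeak command-line tool:
    -s sets the speaking rate in words per minute, -v the voice."""
    return ["espeak", "-s", str(wpm), "-v", voice, text]

def speak(text):
    """Read `text` aloud; on the Pi, audio goes to the 3.5 mm jack."""
    subprocess.run(build_espeak_command(text), check=True)

cmd = build_espeak_command("hello world", wpm=120)
# speak("hello world")   # uncomment on a Pi with eSpeak installed
```

Separating command construction from execution keeps the wrapper testable on a machine without a speech synthesizer.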
Data extraction tools are handy for accountants, who want to avoid wasting their
time on manual data entry. To make that possible, it helps to understand what OCR is
and how it works to read and understand expense-related documents. Put simply,
OCR (Optical Character Recognition) is a process used to turn an image file into a
text file. The process can be treated as a type of compression, since text documents
require significantly less space than picture files such as JPEG, PDF, etc. OCR techniques
are already used in many different fields. Some examples where OCR is thriving are
devices to help the visually impaired, algorithms that translate handwriting into text
documents, and automatic number plate recognition.
How does OCR work?
Step 1: A unique API key is generated when the optical character recognition software
is first integrated on mobile or desktop. Even for trial use, this key is issued so that sets
of scanned documents can be submitted.
Step 2: Upload any document in PDF, JPEG, TIFF, PNG etc. format to the software. The
software can be used as a white-labeled mobile app, on the desktop or as a cloud
solution to start scanning.
Step 3: The software starts extracting line items and other key fields such as logo,
expense type, merchant name, date of the transaction, amount, currency, VAT/GST,
business name etc. It can even be customized to extract any other information that
might be needed. The OCR software provides character-level and word-level confidence
scores; these scores indicate whether the OCR software believes the extracted
information to be accurate.
Step 4: The extracted data is made available in formats like XML, CSV, JSON etc.
as per the requirement.
2.4.1 Raspbian
Raspbian is a Debian-based computer operating system for the Raspberry Pi. Since 2015
it has been officially provided by the Raspberry Pi Foundation as the primary
operating system for the family of Raspberry Pi single-board computers.
The operating system is still under active development. Raspbian is highly optimized
for the Raspberry Pi line's low-performance ARM CPUs.
Basic features
OS family: Unix-like (Linux)
Platforms: ARM
Python was developed by Guido van Rossum, and it is free software. Free as in "free
beer", in that you can obtain Python without spending any money. But Python is also
free in other important ways: for example, you are free to copy it as many times as you
like, free to study the source code, and free to make changes to it. There is a worldwide
movement behind the idea of free software, initiated in 1983 by Richard Stallman.
Python is a good choice for mathematical calculations, since we can write code quickly,
test it easily, and its syntax is similar to the way mathematical ideas are expressed in the
mathematical literature. By learning Python you will also be learning a major tool used
by many web developers.
Python Features
Easy to Learn and Use. Python is easy to learn and use.
Interpreted Language.
Cross-platform Language.
Object-Oriented Language.
Extensible.
CHAPTER 3
REQUIREMENT ANALYSIS
The OpenCV library supports applications such as factory product inspection, medical
imaging, security, user interfaces, camera calibration, stereo vision and robotics.
The OCR and TTS tasks require OCR and TTS engines with predefined libraries.
OCR is also useful for visually challenged individuals who are not able to read a text
document but need to access its contents. It is utilized to digitize and reproduce
text.
3.2.1 ACCESSIBILITY
Accessibility is a general term used to describe the degree to which a product, device,
service, or environment is accessible by as many people as possible.
Our project enables people who are blind or have significant visual impairments to
independently access, understand, and explore unfamiliar indoor and outdoor
environments. Faces can also be detected when a person enters the frame, selected by
the mode control.
The proposed idea involves text extraction from a scanned image using the Tesseract
Optical Character Recognition (OCR) engine and conversion of the text to speech by the
e-Speak tool, a process that enables visually impaired persons to read the text.
3.2.2 MAINTAINABILITY
In software engineering, maintainability is the ease with which a software product can
be modified in order to:
Correct defects
3.2.3 SCALABILITY
The system is capable of handling increased total throughput under an increased load
when resources (typically hardware) are added.
The system can work normally under situations such as low bandwidth and a large
number of users.
3.2.4 PORTABILITY
CHAPTER 4
DESIGN
RASPBERRY PI:- The Raspberry Pi 3 Model B is the third generation Raspberry Pi. This
powerful credit-card-sized single-board computer can be used for many applications and
supersedes the original Raspberry Pi Model B+ and Raspberry Pi 2 Model B. Whilst
maintaining the popular board format, the Raspberry Pi 3 Model B brings a more
powerful processor, 10x faster than the first generation Raspberry Pi. Additionally it
adds wireless LAN and Bluetooth connectivity, making it the ideal solution for powerful
connected designs.
Key Benefits:
Low cost
Pi camera:- The camera module used in this project is the Raspberry Pi camera module
shown in fig. 4.2. The camera module plugs into the CSI connector on the Raspberry
Pi. It is able to deliver a clear 5 MP resolution image, or 1080p HD video recording at
30 fps. The camera module attaches to the Raspberry Pi by a 15-pin ribbon cable, to the
dedicated 15-pin MIPI Camera Serial Interface (CSI), which was designed especially for
interfacing to cameras. The CSI bus is capable of extremely high data rates, and it
exclusively carries pixel data to the BCM2835 processor.
Features
5MPixel sensor
Integral IR filter
Size: 36 x 36 mm
HEADSET:- The headset converts the result into voice output to help blind people. Its
audio input is a 3.5 mm jack. It receives the result from the Raspberry Pi module and
conveys it to the listener.
Image Acquisition
In this step the image of the text is captured using the Raspberry Pi camera or an HD
webcam with high resolution. The acquired image is then passed to the image pre-
processing step for reduction of unwanted noise.
In image processing, image acquisition is defined as the action of retrieving an image
from some source, usually a hardware-based source, for processing; it is the first step in
the workflow sequence because without an image, no processing is possible. The image
that is acquired is completely unprocessed. The incoming energy is transformed into a
voltage by the combination of input electrical power and a sensor material that is
responsive to the particular type of energy being detected. The output voltage
waveform is the response of the sensor(s), and a digital quantity is obtained from each
sensor by digitizing its response.
An example of a single sensor is a photodiode. To obtain a two-dimensional image
using a single sensor, there must be motion in both the X and Y directions: rotation
provides motion in one direction, and linear motion provides motion in the
perpendicular direction.
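The digitization of the sensor's voltage response described above amounts to quantizing a continuous value into an integer code. A minimal sketch follows, assuming an 8-bit converter and a 3.3 V full-scale reference; both are illustrative values, not specifications taken from the report's hardware.

```python
def digitize(voltage, v_max=3.3, bits=8):
    """Quantize a sensor voltage in [0, v_max] to an integer ADC code.

    The voltage is clamped to the valid range and mapped linearly
    onto 2**bits - 1 discrete levels, as a real ADC would do.
    """
    levels = 2 ** bits - 1
    v = min(max(voltage, 0.0), v_max)   # clamp out-of-range inputs
    return round(v / v_max * levels)

digitize(0.0)    # no response -> code 0
digitize(3.3)    # full-scale response -> code 255
```

Repeating this conversion for every sensor position (or every photosite of the camera) yields the unprocessed digital image handed to pre-processing.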
Image Pre-processing
In image pre-processing the unwanted noise in the image is removed by applying an
appropriate threshold (Otsu), morphological transformations like dilation and black-hat
transformation, and discrete cosine transformations, then generating the required
contours and drawing bounding boxes around the required text content in the image.
Initially the captured image is rescaled to an appropriate size and converted into a
grayscale image so that it is more useful for further processing.
Then the discrete cosine transformation is applied to the grey image to compress the
image which helps to improve processing rate. Then by setting the vertical and
horizontal ratio unwanted high frequency components present in the image are
eliminated.
Then the inverse discrete cosine transform is applied for decompression, and the image
undergoes morphological operations like black top-hat transformation and dilation. The
black top-hat transformation is applied to the image by generating appropriate
structuring elements; it extracts the objects or elements which are smaller than the
defined structuring elements and darker than their surroundings.
Then the dilation operation is performed, which adds pixels to the boundaries of the
objects present in the image. The number of pixels added to the objects depends on the
size and shape of the structuring element defined to process the image. After the
morphological operations, thresholding is applied to the morphologically transformed
image. Here Otsu's thresholding algorithm, an adaptive thresholding algorithm, is
applied to the image. After thresholding, the contours for the image are generated
using special functions in OpenCV.
These contours are used to draw the bounding boxes for the objects or elements
present in the image. Using these bounding boxes, every character present in the image
is extracted and then passed to the OCR engine to recognize the entire text present in
the image.
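The Otsu thresholding step used above picks the threshold that maximizes the between-class variance of the grayscale histogram. A minimal NumPy sketch of the idea follows (in the actual pipeline OpenCV's cv2.threshold with the THRESH_OTSU flag performs this search internally):

```python
import numpy as np

def otsu_threshold(gray):
    """Return Otsu's optimal threshold for an 8-bit grayscale array.

    The threshold t maximizing the between-class variance
    (mu_T * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t)))
    is found by an exhaustive scan over all 256 candidate levels.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # class-0 probability up to t
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean up to t
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.nanargmax(sigma_b))

# Bimodal test image: dark background (30) and bright text (200).
img = np.array([[30] * 10 + [200] * 10] * 8, dtype=np.uint8)
t = otsu_threshold(img)                  # lands between the two modes
binary = (img > t).astype(np.uint8) * 255
```

Because the threshold adapts to the histogram of each frame, the same code handles both well-lit and dim captures without manual tuning.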
Text Extraction
In this step the recognized text present in the image is extracted using OCR engines; in
this project the Tesseract OCR engine is used to extract the recognized text.
The aim of Optical Character Recognition (OCR) is to classify optical patterns (often
contained in a digital image) corresponding to alphanumeric or other characters. The
process of OCR involves several steps including segmentation, feature extraction, and
classification. In principle, any standard OCR software can be used to recognize the
text in the segmented frames. However, a hard look at the properties of the candidate
character regions in the segmented frames or image reveals that most OCR software
packages have significant difficulty recognizing the text.
Document images are different from natural images because they contain mainly text
with a few graphics and images. Due to the very low resolution of images captured
using handheld devices, it is hard to extract the complete layout structure (logical or
physical) of the documents, and even harder to apply standard OCR systems. For this
reason, a shallow representation of the low-resolution captured document images is
proposed. In the case of original electronic documents in the repository, the extraction
of the same signature is straightforward: the PDF or PowerPoint form of the original
electronic document is converted into a relatively high-resolution image (TIFF or JPEG)
on which the signature is computed. Finally, the captured document's signature is
compared with all the original electronic document signatures in order to find a
match.
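The matching step above can be sketched as follows. The actual signature used by such systems is not specified here, so this toy version (our own assumption) average-pools each page to an 8x8 grid and matches by Euclidean distance:

```python
import numpy as np

def signature(image, size=(8, 8)):
    # Toy "shallow signature": average-pool the grayscale page to an 8x8
    # grid, so a low-resolution capture lands near its original.
    h, w = image.shape
    ys = np.linspace(0, h, size[0] + 1, dtype=int)
    xs = np.linspace(0, w, size[1] + 1, dtype=int)
    sig = np.empty(size)
    for i in range(size[0]):
        for j in range(size[1]):
            sig[i, j] = image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
    return sig / 255.0

def best_match(captured, repository):
    # Compare the captured page's signature against every original
    cap = signature(captured)
    return min(repository,
               key=lambda name: np.linalg.norm(signature(repository[name]) - cap))

# Two "original" pages and a noisy, low-quality capture of page A
rng = np.random.default_rng(1)
page_a = np.zeros((64, 64)); page_a[:32, :] = 255   # text block in top half
page_b = np.zeros((64, 64)); page_b[:, :32] = 255   # text block in left half
repo = {"A": page_a, "B": page_b}
captured = np.clip(page_a + rng.normal(0, 30, page_a.shape), 0, 255)
```

Because pooling averages out the capture noise, `best_match(captured, repo)` recovers `"A"` here.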
4.1.2 Algorithms
Start
Convert color image into gray image and then binary image.
A. Text Detection
This phase takes an image or video frame as input and decides whether it contains text
or not. It also identifies the text regions in the image.
B. Text Localization
Text localization merges the text regions to formulate the text objects and defines the
tight bounds around the text objects.
C. Text Tracking
This phase is applied to video data only. For readability, text embedded in a video
appears in more than thirty consecutive frames.
Text tracking exploits these temporal occurrences of the same text object in multiple
consecutive frames. It can be used to rectify the results of the text detection and
localization stages. It is also used to speed up the text extraction process by not
applying the binarization and recognition steps to every detected object.
D. Text Binarization
This step is used to segment the text object from the background within the bounded
text objects. The output of text binarization is a binary image, where text pixels and
background pixels appear in two different binary levels.
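Binarization can be illustrated with a from-scratch version of Otsu's method (the adaptive thresholding algorithm mentioned in section 4.1.1); this is a minimal sketch for explanation, not the OpenCV implementation the project actually uses:

```python
import numpy as np

def otsu_threshold(gray):
    # Pick the gray level t that maximizes the between-class variance
    # between background (<= t) and foreground (> t) pixels.
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    mu_total = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0 = 0.0    # cumulative background pixel count
    mu0 = 0.0   # cumulative background intensity sum
    for t in range(256):
        w0 += hist[t]
        mu0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = mu0 / w0
        mean1 = (mu_total - mu0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A bimodal test image: dark text pixels (40) on a light page (200)
gray = np.concatenate([np.full(100, 40), np.full(100, 200)]).astype(np.uint8)
t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0).astype(np.uint8)  # two-level output
```

The resulting `binary` image contains exactly the two levels (text and background) that the recognition module expects.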
E. Character Recognition
The last module of text extraction process is the character recognition. This module
converts the binary text object into the ASCII text. Text detection, localization and
tracking modules are closely related to each other and constitute the most challenging
and difficult part of extraction process.
Process: A process receives input data and produces output with a different content or
form. A process can be as simple as collecting input data and saving it in the database,
or as complex as producing a report containing the monthly sales of all retail stores in
the northwest region.
The text character localization and layout analysis of text strings are applied to filter out
background interference. The object type, orientation, and location can be displayed as
speech for blind travelers.
To improve the ability of people who are blind or have significant visual impairments to
independently access, understand, and explore unfamiliar indoor environments, we
propose a new framework using a single camera to detect and recognize faces,
obstacles, and signs, incorporating the text information associated with the detected
objects. In
order to discriminate similar objects in indoor environments, the text information
associated with the detected objects is extracted.
Optical Character Recognition (OCR) is a process that converts scanned or printed text
images, as well as handwritten text, into editable text for further processing. Speech
synthesis is
the artificial synthesis of human speech. A Text-To-Speech (TTS) synthesizer is a
computer-based system that should be able to read any text aloud, whether it was
directly introduced in the computer by an operator or scanned and submitted to an
Optical Character Recognition (OCR) system.
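The project's abstract names the eSpeak tool for this TTS step. A minimal wrapper might shell out to the `espeak` command line; this is a sketch (the function name and defaults are our own), guarded so it degrades gracefully when eSpeak is not installed:

```python
import shutil
import subprocess

def speak(text, voice="en", speed=140):
    # Build the eSpeak command line; -v selects the voice,
    # -s the speaking rate in words per minute.
    cmd = ["espeak", "-v", voice, "-s", str(speed), text]
    if shutil.which("espeak"):   # only invoke eSpeak when it is installed
        subprocess.run(cmd, check=True)
    return cmd                   # returned for inspection/testing

cmd = speak("hello world")
```

On the Raspberry Pi, the OCR output file's contents would be passed to `speak` to read the recognized text aloud.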
A sequence diagram shows a set of objects and the messages sent and received by
those objects. The objects are typically named or anonymous instances of classes, but
may also represent instances of other things, such as collaborations, components, and
nodes. These diagrams are used to illustrate the dynamic view of a system.
For an image processing tool with powerful, highly accurate OCR and a wide range of
other capabilities, check out Trapeze from Softworks AI. Trapeze can recognize both
printed and hand-written text, and it even has features to process scanned documents
that are imperfect in quality.
CHAPTER 5
IMPLEMENTATION
Code
import os
import cv2
import numpy as np
import pygame
Enable the camera settings on the board to capture the image and save it in the folder.
Run the Python code to check the enhancement algorithms and remove the noise
present in the image.
Code
while True:
    # ... menu options 1 and 2 are printed here ...
    print("3. Exit")
    opt = input("Enter option: ")
    if opt == "1":
        #image_name = "Book-"+datetime.datetime.now().strftime("%H-%M-%S")+".jpg"
        image_name = "Books.jpg"
        self.camera.start_preview()
        time.sleep(2)
        self.camera.capture(image_name)
        # Read the captured image back and send it to the Vision client
        with io.open(image_name, 'rb') as image_file:
            content = image_file.read()
        image = types.Image(content=content)
        response = self.imgClient.document_text_detection(image=image)
        labels = response.full_text_annotation
3. Matching the features of an input image to the features in the saved XML files to
predict identity.
Code
# Collect the recognizer's predictions for the detected face region
result = cv2.face.StandardCollector_create()
self.rec.predict_collect(hog[y:y+h, x:x+w], result)
id = result.getMinLabel()
conf = result.getMinDist()
if conf < 100:
    if id == 1:
        id = "Bill Gates"
    elif id == 2:
        id = "Ratan Tata"
    else:
        id = "Modi"
else:
    id = "Unknown"
showImage = None
# Equalize the histogram of the grayscale image
equ = cv2.equalizeHist(gray)
In this module the text is converted to speech. The output of OCR is the text, which is
stored in a file (speech.txt). This speech.txt file is then converted to speech.
Once a face is captured, it is checked against the stored IDs; if an ID matches, the
corresponding name is converted to speech.
Code
def generateAudio(self, text, audioname):
    input_text = texttospeech.types.SynthesisInput(text=text)
    voice = texttospeech.types.VoiceSelectionParams(
        language_code='en-US',
        ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
    # audio_config: configuration for the output audio file. Other
    # encodings such as LINEAR16 (WAV) and OGG_OPUS are also supported.
    audio_config = texttospeech.types.AudioConfig(
        audio_encoding=texttospeech.enums.AudioEncoding.MP3)
    # self.ttsClient: a texttospeech.TextToSpeechClient created elsewhere
    response = self.ttsClient.synthesize_speech(input_text, voice, audio_config)
    # The response's audio_content is in binary format, so it is
    # written to the output file in binary mode.
    with open(audioname, 'wb') as out:
        out.write(response.audio_content)
CHAPTER 6
TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies, assemblies, and/or a finished product. It is
the process of exercising software with the intent of ensuring that the software system
meets its requirements and user expectations and does not fail in an unacceptable
manner. There are various types of tests, and each test type addresses a specific testing
requirement.
TYPES OF TESTS
6.1 UNIT TESTING
Unit testing involves the design of test cases that validate that the internal program logic
is functioning properly, and that program inputs produce valid outputs. All decision
branches and internal code flow should be validated. It is the testing of individual
software units of the application, and it is done after the completion of an individual
unit before integration. This is structural testing that relies on knowledge of the unit's
construction and is invasive. Unit tests perform basic tests at the component level and
test a specific business process, application, and/or system configuration. Unit tests ensure
that each unique path of a business process performs accurately to the documented
specifications and contains clearly defined inputs and expected results.
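As a concrete illustration, the id-to-name mapping from the face recognition snippet in Chapter 5 can be factored into a small function and unit-tested branch by branch. The function below mirrors that snippet's logic; the test harness itself is our own sketch:

```python
KNOWN_FACES = {1: "Bill Gates", 2: "Ratan Tata", 3: "Modi"}

def label_from_prediction(pred_id, distance, threshold=100):
    """Map a recognizer prediction to a name, mirroring the Chapter 5 logic:
    distances at or above the threshold are treated as an unknown face."""
    if distance >= threshold:
        return "Unknown"
    return KNOWN_FACES.get(pred_id, "Modi")

# One test per decision branch, with clearly defined inputs and expected results
assert label_from_prediction(1, 50) == "Bill Gates"
assert label_from_prediction(2, 99) == "Ratan Tata"
assert label_from_prediction(3, 10) == "Modi"
assert label_from_prediction(1, 150) == "Unknown"
```

Each assertion exercises one unique path through the mapping, which is exactly the per-branch coverage the paragraph above describes.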
As a rule, system testing takes as its input all of the "integrated" software components
that have successfully passed integration testing, together with the software system
itself integrated with any applicable hardware system(s).
System testing seeks to detect defects both within the "inter-assemblages" and within
the system as a whole.
Table 6.1: Test case for Capture Image from Raspberry Pi Camera
Table 6.2: Test Case for Conversion of image to txt using OCR tool
Table 6.3: Test Case for Conversion of text to voice using TTS system
Table 6.4: Test Case for Capture human face using Raspberry Pi Camera
CHAPTER 7
SNAPSHOT
Fig 7.4 Audio (AUD-20200427-WA0005.opus)
Fig 7.10 Audio (AUD-20200427-WA0005.opus)
CHAPTER 8
8.1 CONCLUSION
This project presents a design for face and text recognition based on Raspberry Pi,
mainly intended for blind navigation. Our future work will focus on detecting the
emotions of persons and recognizing more types of indoor objects and icons on
signage, in addition to text, as an indoor wayfinding aid to help blind people travel
independently. We will also study the significant human interface issues, including
auditory output and spatial updating of object location, orientation, and distance. With
real-time updates, blind users will be able to better use spatial memory to understand
the surrounding environment, obstacles and signs.