Guide:
Mr. VSVS Murthy
Team
Team members:
• Abstract
• Introduction
• Literature review
• Existing system
• Problem identification
• Objectives of proposed system
• Procedure
• Outcomes
• Technologies
• UML diagrams
• Conclusion
• References
Abstract:
The project aims to convert an image to text and the text to speech. It also
aims to translate this text into another language. This helps users listen
to any English document in their native language and helps the visually
impaired listen to any text by capturing an image of it.
Introduction
The main intention of the project is to help people with visual disabilities overcome everyday
struggles, which include reading newspapers, recognizing signboards, and recognizing text or
numbers on advertisements and notice boards.
According to a survey by a national organisation of ophthalmologists, India accounts for 20% of the
world's blind population, with 7.8 million visually impaired out of the 39 million across the world.
According to the survey report, India has the largest number of blind people in the world.
About 260 million people worldwide suffer from blindness or other eye problems:
40 million people cannot see at all, and 210 million people have partial vision.
The project aims to convert a picture into text, and for this process we need image processing.
Literature review
Smart guiding glasses for visually impaired people in indoor environment -2017
• It is made with the help of ultrasonic sensors, which help in detecting even small obstacles.
• The efficiency of the system was tested by a number of users, and the results show that it can help the visually challenged.
• A reading device is a compact hardware setup, with the necessary programs coded into it, that reads out printed
documents like a human reader. People with eyesight problems cannot read books, papers or any other kind of
printed reading material.
• This problem can be solved by taking an image of the reading material, extracting the words from the image and
converting those words to sound; by hearing the sound of the converted text, users can understand what is written
on the paper.
• For character recognition, Tesseract OCR is used as the optical character recognition (OCR) engine. The Python
gTTS module, a text-to-speech engine, is used to convert the words extracted by Tesseract OCR to sound. The whole
process is implemented on a compact Raspberry Pi based hardware design.
• This paper presents research carried out to develop a wearable device that identifies the texts in the
surroundings of the wearer and converts them to a voice message given as feedback.
• It reduces the dependence of blind people on others, especially during commutation. The system consists of a
small camera, a push button and an SoC (System on Chip) attached to a cap.
• The SoC is powered using a portable power bank. The camera captures the image, which is processed to extract
the text present and convert it into a voice signal that is transferred to the user.
• In today's advanced hi-tech world, the need for independent living is recognized for visually impaired people,
who face the major problem of social restrictiveness.
• Visual information is the basis for most tasks, so visually impaired people are at a disadvantage because
necessary information about the surrounding environment is not available to them.
• With the recent advances in inclusive technology, it is possible to extend the support given to people with visual
impairment. This project is proposed to help those people who are blind or visually impaired using Artificial
Intelligence, Machine Learning, Image and Text Recognition.
• This system takes a picture of any reading material, such as books or papers, and then extracts the words from
the picture.
• It then converts them into a voice output, and the text-born sound can be heard by the user, usually a blind person.
• For identifying the characters, Tesseract OCR is employed. Using the gTTS module, the text can be converted into
sound or speech, which is helpful to the blind.
We have proposed a system with the following capabilities, in addition to those of the existing system:
Take an Image
The preprocessing of the image is essential to get a better outcome from it.
It includes the following steps:
• converting to grayscale
• deskewing
• thresholding
1. Converting to a grayscale image:
First, the image is taken as input using OpenCV modules and stored in the program.
It must be converted into grayscale to overcome the problem of different colours of text and background.
2. Thresholding:
We need to identify the letters and words in the processed image, so we apply thresholding.
If a pixel value is greater than the calculated threshold value, it is assigned one value
(for example 255); otherwise it is assigned another value (for example 0).
The OpenCV function used is cv2.threshold.
Converting an image to text
The conversion of an image to text is done using pytesseract, an open-source OCR wrapper.
Pattern recognition: the OCR engine is trained on datasets of millions of words and characters to identify
any type of word in multiple languages.
Feature extraction: the image is given as a parameter to the OCR function, which performs feature extraction
and identifies the words and characters.
Translating into Multiple languages
• One of the objectives of the project is to provide translation of the extracted text so that a user
of any native language can use the system.
• The googletrans module provides an API to Google Translate to translate text from one language to another.
• Using this feature we translate the text into Telugu, Hindi or French, which can be extended to
any language as per user needs.
TEXT TO SPEECH:
• It identifies the language and speaks the words at a specified speed.
• This helps the visually challenged hear and understand the text in any language.
Outcome:
1. Outcome of rotated images:
The model should work for images rotated at any angle. It calculates the angle of the text and converts
it to a straight image.
2. Outcome of images with background:
The developed model should take PAN card images, process the extracted text and
give the details of the user, for example:
Name : D Manikandan
D.O.B : 16/07/1986
PAN : BNZPM2501F
Technologies
googletrans
• Bulk translations
gTTS
• Writes spoken MP3 data to a file, a file-like object (bytestring) for further audio
manipulation, or stdout.
Use Case Diagram
Sequence Diagram
Conclusion
It all begins with taking a picture as input. An image that is noise-free gives better results for the model;
if the image is not noise-free, some preprocessing techniques are applied to improve its quality.
English words are detected by pytesseract.
The developed model takes a text image which, after various image processing techniques, is passed to the
pytesseract OCR model; the text is extracted with high accuracy for images with clear text and no background.
Text is extracted with somewhat lower accuracy from identity cards or images with background colours
or designs. The output is stored in a file, which contains the sentences in the image.
The file is then converted to sound using the Python gTTS module, which is based on Google's API.
The sound file is saved in MP3 format and played at the end.
References
• Md. Mahade Sarkar, “Implementation of a reading device for Bengali-speaking visually handicapped people”.
• Medium: “Improve accuracy of OCR using image preprocessing”.
• https://www.ijcseonline.org/pub_paper/57-IJCSE-03906.pdf
• PyImageSearch: “OpenCV OCR and text recognition with Tesseract”.
• “Smart guiding glasses for visually impaired people in indoor environment”, 2017.
Thank You