
IMAGE TO SPEECH CONVERSION IN MULTI LANGUAGES

Guide:
Mr. VSVS Murthy
Team members:

M Siva Badarinath 1215316025


G Manish Reddy 1215316015
M Sumanth 1215316027
Y Raviteja 1215316059
Table of Contents

• Abstract
• Introduction
• Literature Review
• Existing System
• Problem Identification
• Objectives of Proposed System
• Procedure
• Outcomes
• Technologies
• UML Diagrams
• Conclusion
• References
Abstract:

The project aims to convert an image to text and the text to speech, and also to translate the extracted text into another language. This helps users listen to any English document in their native language, and helps the visually handicapped listen to any text by capturing an image of it.
Introduction

The main intention of the project is to help people with visual disabilities overcome everyday struggles in their lives, which include reading newspapers, recognizing signboards, and reading the text on advertisements, notice boards, numbers, etc.

According to a survey by a national organisation of ophthalmologists, India accounts for 20% of the total blind population of the world, with 7.8 million visually impaired out of the 39 million across the globe. The same report notes that India has the largest number of blind people in the world.

About 260 million people worldwide suffer from blindness or other eye problems: 40 million people cannot see at all, and 210 million have only partial vision.

The project aims to convert a picture into text, and for this process we need image processing.
Literature Review
Smart guiding glasses for visually impaired people in indoor environment -2017

• A device in the shape of eyeglasses is built.

• The device helps visually challenged people overcome their difficulties.

• It is built with the help of ultrasonic sensors, which can detect even small obstacles.

• The efficiency of the system was tested by a number of users, and the results show that it can help the visually challenged.

By: Jinqiang Bai, Shiguo Lian, Zhaoxiang Liu


Beihang University, Beijing, China
Implementation of a reading device for visually handicapped people - 2017

• A reading device is a compact hardware setup with the necessary programs coded into it, which reads out printed documents like a human reader. People with eyesight problems cannot read books, papers, or any other kind of printed reading material.
• This problem can be solved simply by taking an image of the reading material, extracting the words from the image, and converting those words to sound; by hearing the spoken text, the user can understand what is written on the paper.
• For character recognition, tesseract-ocr is used as the optical character recognition (OCR) engine. The Python gTTS module, a text-to-speech engine, is used to convert the words extracted by tesseract-ocr into sound. The whole process is implemented on a compact Raspberry Pi based hardware design.

By: Md. Mahade Sarkar, Shuvasis Datta, Md. Mahedi Hassan


Department of Electrical and Electronic Engineering, Chittagong University of Engineering and Technology,
Chittagong - 4349, Bangladesh
Pseudo Eye - Mobility assistance for visually impaired using image recognition - 2018

• This paper presents the research carried out by the authors to develop a wearable device that identifies the text in the surroundings of the wearer and converts it to a voice message, which is given as feedback.

• It reduces the dependence of blind people on others, especially while commuting. The system consists of a small camera, a push button, and a SoC (System on Chip) attached to a cap.

• The SoC is powered by a portable power bank. The camera captures the image, which is processed to extract the text present and convert it into a voice signal that is delivered to the user.

By: A. G. Sareeka, K. Kirthika, M. R. Gowthame, V. Sucharitha


Dept. of Electronics and Communication Engineering CEG, Anna University Chennai, India
A Smart Personal AI Assistant for Visually Impaired People - 2018

• In today's advanced hi-tech world, the need for independent living is recognized in the case of visually impaired people, who face the major problem of social restrictiveness.

• Visual information is the basis for most tasks, so visually impaired people are at a disadvantage because necessary information about the surrounding environment is not available to them.

• With the recent advances in inclusive technology, it is possible to extend the support given to people with visual impairment. This project proposes to help people who are blind or visually impaired using Artificial Intelligence, Machine Learning, and Image and Text Recognition.

By: Shubham Melvin Felix, Sumer Kumar, A. Veeramuthu


Department of Information Technology, Sathyabama Institute of Science and Technology, Chennai
Existing System

Implementation of a reading device for bengali speaking visually handicapped people

• This system takes a picture of any reading material, such as books or papers, and extracts the words from the picture.

• It then converts them into voice output, so the resulting speech can be heard by the user, usually a blind person.

• For identifying the characters, tesseract-ocr is employed. The gTTS module converts the text into sound or speech, which is helpful to the blind.

• The device is developed using Raspberry Pi based hardware.


Problem Identification:

Visually handicapped people cannot read any non-braille text, which makes their day-to-day lives difficult. Some people may have poor eyesight, and others may not be able to read a particular language.
Objectives of Proposed System:

We have proposed a system with the following capabilities, which are additional to the existing system:

• Translation of the extracted text into multiple languages.

• Handling images containing more than plain text.

• Increased processing speed.

• Utilisation of efficient pattern recognition.


Procedure

• Pre-processing the images.

• Converting an image of English text into plain text.

• Translating the obtained text into a specific language.

• Converting the translated text into speech, while also providing a text document of the translated speech (see the end-to-end sketch below).
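
The whole procedure can be expressed as a minimal end-to-end sketch in Python. This assumes OpenCV, pytesseract, googletrans, and gTTS are installed; the file names and the choice of Telugu ('te') are placeholders:

import cv2
import pytesseract
from googletrans import Translator
from gtts import gTTS

# 1. Pre-process: load the captured image, convert to grayscale, binarise
gray = cv2.cvtColor(cv2.imread("capture.jpg"), cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# 2. Extract the English text from the image
english = pytesseract.image_to_string(binary, lang="eng")

# 3. Translate into the desired language
translated = Translator().translate(english, src="en", dest="te").text
with open("translated.txt", "w", encoding="utf-8") as f:
    f.write(translated)  # the text document of the translated speech

# 4. Convert the translated text into speech and save it as mp3
gTTS(text=translated, lang="te").save("speech.mp3")

Each step is expanded in the sections that follow.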
Flow Chart

Take an image

Preprocess the image

Convert it into text

Translate the text into the desired language

Convert the text into speech

Provide the desired output


Preprocessing:

Preprocessing the image is essential to get a better outcome from it.
It includes the following steps:

• Converting to grayscale

• Deskewing

• Thresholding
1. Converting to a grayscale image:

First, the image is taken as input using OpenCV modules and stored in the program.

It must then be converted to grayscale to overcome the problem of differently coloured text and backgrounds.
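
A minimal sketch of this step, assuming the input file name capture.jpg:

import cv2

# Read the captured image (OpenCV loads it in BGR colour order)
image = cv2.imread("capture.jpg")

# Convert to grayscale so colour differences between text and background disappear
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imwrite("gray.jpg", gray)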

2. Deskewing:

• Detect the text within the captured image.

• Calculate the angle of the text in the picture.

• Finally, rotate the picture by the required angle (see the sketch below).
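
One common way to implement these three steps with OpenCV is sketched below. It assumes the grayscale image from the previous step, and that the installed OpenCV version reports the minAreaRect angle in the range [-90, 0), as older versions do:

import cv2
import numpy as np

gray = cv2.imread("gray.jpg", cv2.IMREAD_GRAYSCALE)

# Make the text pixels white on black, letting Otsu's method pick the threshold
thresh = cv2.threshold(cv2.bitwise_not(gray), 0, 255,
                       cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

# Angle of the minimum-area rectangle enclosing all text pixels
coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]
angle = -(90 + angle) if angle < -45 else -angle

# Rotate the image about its centre by that angle
h, w = gray.shape[:2]
M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
deskewed = cv2.warpAffine(gray, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)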


3. Thresholding:

We need to identify the letters and words in the processed image, so we apply thresholding.

If a pixel value is greater than a calculated threshold value, it is assigned one value (for example, white); otherwise it is assigned a different value (for example, black). The function used is cv2.threshold.
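
A minimal sketch, using Otsu's method so that the threshold value is calculated automatically:

import cv2

gray = cv2.imread("gray.jpg", cv2.IMREAD_GRAYSCALE)

# Pixels above the computed threshold become 255 (white), the rest 0 (black)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
cv2.imwrite("binary.jpg", binary)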
Converting an image to text

The conversion of an image to text is done using pytesseract-ocr, which is open source.

It performs pattern recognition and feature extraction:

Pattern recognition - the OCR engine is trained on datasets of millions of words and characters, so it can identify any type of word in multiple languages.

Feature extraction - the image is given as a parameter to the OCR function, which performs feature extraction and identifies the words and characters.
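
A minimal sketch of this step, assuming the Tesseract engine and its pytesseract wrapper are installed and using the binarised image from pre-processing:

import cv2
import pytesseract

# Pass the pre-processed image to the OCR engine and get back plain text
image = cv2.imread("binary.jpg")
text = pytesseract.image_to_string(image, lang="eng")
print(text)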
Translating into Multiple Languages

• One of the objectives of the project is to translate the extracted text, so that a user of any native language can utilise the system.

• The googletrans module provides an API to Google Translate for translating text from one language to any other.

• Using this feature, we translate the text into Telugu, Hindi, or French, which can be extended to any language as per user needs (see the sketch below).
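
A minimal sketch using googletrans ('te' is the language code for Telugu; the sample sentence is illustrative):

from googletrans import Translator

translator = Translator()
result = translator.translate("Hello, how are you?", src="en", dest="te")
print(result.text)  # the sentence rendered in Telugu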
TEXT TO SPEECH:

• The model passes the translated document to a speech engine.

• The engine identifies the language and speaks the words at a specified speed.

• This helps the visually challenged hear and understand the text in any language.
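
A minimal sketch with gTTS (the Telugu text and the output file name are placeholders):

from gtts import gTTS

# Generate speech for the translated text; lang must match the text's language
tts = gTTS(text="నమస్కారం, మీరు ఎలా ఉన్నారు?", lang="te")
tts.save("speech.mp3")  # the saved file can be played with any mp3 player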
Outcome:

Converts an image into text, translates it into a specific language, and then converts it to speech.
1. Outcome for inverted or rotated images:

The model should work for images rotated at any angle. It calculates the angle of the text and straightens the image.
2. Outcome for images with a background:

The model should work for images with any background. It converts them into black-and-white images and then performs the pre-processing.
3. PAN card images:

The developed model should take PAN card images, process the extracted text, and give the details of the user, for example:

Name : D Manikandan

D.O.B : 16/07/1986

PAN : BNZPM2501F
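
A sketch of how such details could be pulled out of the OCR output with regular expressions: PAN numbers follow a fixed pattern of five letters, four digits, and one letter, and the date of birth is written dd/mm/yyyy. The ocr_text value is assumed to hold the text extracted by pytesseract:

import re

ocr_text = "..."  # text extracted from the PAN card image by the OCR step

pan = re.search(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b", ocr_text)
dob = re.search(r"\b\d{2}/\d{2}/\d{4}\b", ocr_text)
print("PAN  :", pan.group() if pan else "not found")
print("D.O.B:", dob.group() if dob else "not found")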
googletrans

• Auto language detection

• Bulk translations

• Customizable service URL

• Connection pooling (the advantage of using requests.Session)
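
A short sketch of the first two features (the sample strings are illustrative):

from googletrans import Translator

translator = Translator()

# Auto language detection
detected = translator.detect("Bonjour tout le monde")
print(detected.lang)  # e.g. 'fr'

# Bulk translation: pass a list, get a list of results back
results = translator.translate(["Hello", "Good morning"], dest="hi")
for r in results:
    print(r.origin, "->", r.text)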


Technologies used in our proposed system:

gTTS: Google Text-to-Speech

• A Python library and CLI tool to interface with Google Translate's text-to-speech API.

• Writes spoken mp3 data to a file, a file-like object (bytestring) for further audio manipulation, or stdout.

• Features flexible pre-processing and tokenizing, as well as automatic retrieval of supported languages.
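
For example, the mp3 data can be written to an in-memory file-like object instead of a file on disk (a minimal sketch):

import io
from gtts import gTTS

buffer = io.BytesIO()
gTTS(text="Hello world", lang="en").write_to_fp(buffer)  # mp3 bytes in memory
buffer.seek(0)
mp3_bytes = buffer.read()  # ready for further audio manipulation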
OCR (Optical Character Recognition):

• Optical Character Recognition, or OCR, is a technology that enables one to convert different types of documents, or images captured by a digital camera, into editable and searchable data.

• ABBYY FineReader is an optical character recognition (OCR) software that performs text conversion and creates editable, searchable files and e-books from scans of paper documents, PDFs, and digital photographs.
UML Diagrams

Use Case Diagram
Sequence Diagram
Conclusion

It all begins with taking a picture as input; a noise-free image gives better results for the model. If the image is not noise free, techniques are applied to improve its quality. English words are detected by pytesseract. The developed model takes a text image which, after various image processing techniques, is passed to the pytesseract-ocr model; the text is extracted with high accuracy for clear images without any background. For identity cards or images with background colours or designs, the text is extracted with somewhat lower accuracy. The output is stored in a file containing the sentences from the image. This file is then converted to sound using the Python gTTS module, which is based on Google's API. The sound file is saved in mp3 format and played at the end.
References
• Md. Mahade Sarkar, Shuvasis Datta, Md. Mahedi Hassan, "Implementation of a reading device for Bengali speaking visually handicapped people," 2017.

• Medium - "Improve accuracy of OCR using image preprocessing."

• https://www.ijcseonline.org/pub_paper/57-IJCSE-03906.pdf

• PyImageSearch - "OpenCV OCR and text recognition with Tesseract."

• Jinqiang Bai, Shiguo Lian, Zhaoxiang Liu, "Smart guiding glasses for visually impaired people in indoor environment," 2017.
Thank You
