Abstract—This paper explores the conversion of Devanagari Hindi Braille, first to text, and subsequently to speech. The first part of the implementation is the conversion of Hindi Braille to text, in which two approaches are used for Braille character recognition: a conventional sequence-mapping approach and a deep learning-based method. The second part of the paper deals with the conversion of Hindi text to speech, in which text is converted to speech by concatenating speech samples corresponding to Hindi vowels and consonants. Successful conversion of Hindi Braille to text and, consequently, speech, yielded two forms of output. Generated samples of Hindi Braille letters, as well as extracts from a Hindi Braille textbook, were used to create an image dataset. A Hindi speech corpus was created using speech coefficients extracted from a recorded audio sample. The authors achieved an accuracy of 100 percent using the conventional method of Hindi Braille to text conversion and an accuracy of 96 percent using the deep learning approach. Experts also validated the quality of Hindi speech generated from the text-to-speech model, based on factors such as clarity of speech, pronunciation, sound quality, and speed of speech.

Index Terms—Braille to text, deep learning, AlexNet, text-to-speech (TTS) system

I. INTRODUCTION

Braille is the most popular system used by visually impaired people for reading and writing using tactile means. Developed by Louis Braille in 1824 for the French alphabet, Braille now exists for several languages used by sighted people. Bharati Braille, a unified script based on the English Braille script, is used for communication in various Indian languages; it uses a 6-dot cell system, arranged in 3x2 form as shown in Fig. 1.

Fig. 1. Hindi Braille Characters [1]

Optical Braille recognition (OBR) helps capture Braille characters from documents, convert them to images, and process those images to get their natural language equivalents. This technique is used to preserve documents and reproduce them when required. One challenge faced during OBR is that no ink is used while producing the documents to help differentiate between raised dots and the flat surface. Different image enhancement techniques are required to improve the clarity of the images and the dots, which are more intricate when the Braille is double-sided.

Speech synthesis is the production of human voice or speech by a machine. It is mostly used to convert written information into spoken information for convenience. A text-to-speech (TTS) system performs this function. One form of speech synthesis is concatenative, which involves rearranging voice samples spoken by humans into words and sentences.

The motivation behind this paper was to bridge the gap between sighted people and visually impaired people. This project could help preserve Braille books written by the visually impaired. Many sighted people have begun to prefer audiobooks over physical books, due to their portability and ease of use. A similar system for Braille books could help visually impaired people enjoy books on the go. This system could also help sighted people understand the Braille script without any prior knowledge of Braille.

In this paper, the authors describe a novel methodology to convert the obtained Hindi Braille images to new forms of images using certain image processing techniques for dot enhancement and noise reduction. These images are then converted into Hindi text using deep learning, and the obtained text is later converted to speech using a text-to-speech (TTS) system built on the concepts of concatenative speech synthesis.

Authorized licensed use limited to: BOURNEMOUTH UNIVERSITY. Downloaded on June 19,2021 at 15:33:41 UTC from IEEE Xplore. Restrictions apply.

II. LITERATURE SURVEY

In [2], the authors discuss a Braille to Text conversion system using images from a flatbed scanner. The paper elucidates the image processing techniques to differentiate between a recto dot (protrusion) and a verso dot (depression) based on the illumination of light. In [3], the authors talk about an algorithmic approach to differentiate recto dots and verso dots, albeit from recto-dot-only documents and verso-dot-only documents. In [4], the authors discuss the steps involved in building a character recognition system that translates a standard character to its corresponding alphanumeric character from a single-sided page using a conventional method. In [5], the authors talk about a system devised to convert Cyrillic Braille characters to text using artificial neural networks. The multi-layer perceptron was implemented using modified back-propagation algorithms, reducing convergence time. In [6],
the authors discuss a famous Convolutional Neural Network
(CNN) popularly used for image classification, AlexNet. In
[7], the authors give a brief introduction to Support Vector
Machine (SVM) and how it can be used as a binary classifier
in the case of Optical Braille Recognition (OBR), where the
two classes are presence and absence of dots.
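The dot/no-dot binary classification in [7] can be illustrated with a small sketch (not the authors' code): a linear SVM trained by Pegasos-style subgradient descent on synthetic dot/no-dot patches, which stand in for real Braille cell regions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_patch(has_dot):
    """Synthetic 16x16 grayscale patch: dark dot on a light background."""
    patch = rng.normal(0.9, 0.05, (16, 16))              # light background
    if has_dot:
        yy, xx = np.mgrid[:16, :16]
        patch[(yy - 8) ** 2 + (xx - 8) ** 2 < 16] = 0.1  # dark central dot
    return patch.ravel()

def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent on the hinge loss; y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (w @ X[i]) < 1:                    # margin violated
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w
    return w

X = np.array([make_patch(i % 2 == 0) for i in range(100)])
y = np.where(np.arange(100) % 2 == 0, 1, -1)
w = train_linear_svm(X[:80], y[:80])
accuracy = np.mean(np.sign(X[80:] @ w) == y[80:])        # held-out accuracy
print(accuracy)
```

Because the two patch classes are linearly separable with a wide margin, a linear kernel suffices here; [7] pairs the SVM with Haar wavelet features instead of raw pixels.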
In [8], the authors talk about the methodology used for
designing and creating a Hindi speech corpus consisting of
sentences and phrases, and their respective annotations. In [9],
the authors discuss a concatenative technique of speech syn-
thesis for the Kannada language, creating a database of only
phonemes extracted from MP3 audio files, and concatenating
certain phonemes to form any word or phrase.
The authors of [10] devised a methodology to convert
Kannada Braille to text or speech using Field Programmable
Gate Arrays (FPGAs). Classifications were made on the basis
of number of dots in each Braille cell, and case statements
were used to determine the character based on the presence
or absence of a dot in each of the six positions. In [11], the authors present the development of a system that speaks from Braille writing, using dynamic thresholding, an adaptive Braille grid, and text-to-speech software.
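The case-statement classification in [10] amounts to a lookup from the six dot positions of a cell to a character. A minimal sketch of that idea follows (not the authors' FPGA code, and with placeholder patterns rather than the real Bharati Braille assignments):

```python
# Each Braille cell is read as a 6-bit string over the 3x2 grid
# (row-major): '1' = dot present, '0' = dot absent.
# NOTE: the pattern-to-letter pairs below are illustrative placeholders,
# not the actual Bharati Braille code.
BRAILLE_MAP = {
    "100000": "अ",
    "110000": "ब",
    "100100": "क",
    "111000": "ल",
}

def decode_cell(dots):
    """dots: iterable of six 0/1 flags, one per cell position."""
    key = "".join(str(int(d)) for d in dots)
    return BRAILLE_MAP.get(key, "?")   # '?' for unknown patterns

print(decode_cell([1, 0, 0, 0, 0, 0]))
```

The same table-lookup structure underlies the "conventional approach" of this paper, where the six flags come from detecting dots in a segmented cell image.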
The authors of this paper found that many papers focused on one of the two processes, Braille to Text or Text to Speech. They also found that the languages worked on were mainly foreign and Indian regional languages. Only one paper used neural networks, another used support vector machines, and the rest used manual methods of classification, mainly character-to-binary mapping methods.

Fig. 2. Flowchart of proposed methodology
III. PROPOSED METHODOLOGY

As stated in Section I, the main objective of this work is to convert Hindi Braille samples to text and later into speech. The algorithm to perform the same is shown in Fig. 2. The entire work was coded in Python 3, using a Windows 10 system.

For conversion of Hindi Braille to text, the Braille images are first preprocessed and segmented to obtain individual Braille characters. Once this is done, Braille character recognition is performed using two approaches, i.e., the Conventional Approach and the Deep Learning Approach. In the first approach, the Braille character is converted to a binary sequence using either the Contouring Approach or the SVM Approach and then mapped to its corresponding Hindi letter. In the second approach, concepts of Deep Learning (namely Convolutional Neural Networks) are used to train a Convolutional Neural Network called AlexNet using a dataset containing images of Braille letters.

For conversion of text to speech, concatenative speech synthesis is implemented, wherein the vowel and consonant combinations in every word are mapped to their corresponding audio files from a speech corpus.

A. Dataset Creation and Preprocessing

The Devanagari Hindi Braille script consists of 57 characters. To create the Braille dataset, images were captured from a Hindi Braille book [12] using a mobile camera or a flatbed scanner, as shown in Fig. 3. The dataset consisted of 34,800 images, 600 images for each of the 58 Hindi Braille characters. The images were created by slightly changing the position and size of dots, while also adding some amount of skewness to them.

Fig. 3. Images captured by various devices: (a) Mobile Camera, (b) Flatbed Scanner

Images captured using a mobile camera or a flatbed scanner tend to have noise content, and some dots in the images might not be visible due to various lighting factors. Therefore, to enhance Braille dots for easy Braille character recognition, various preprocessing methods were employed to obtain noise-free images, as explained below.
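The dataset-generation step described above (jittering dot position and size, then adding slight skew) could be sketched as follows; the cell geometry, jitter ranges, and image size are assumed illustrative values, not the authors' parameters.

```python
import numpy as np

rng = np.random.default_rng(42)

def render_cell(pattern, size=60, radius=6):
    """Render one 3x2 Braille cell (dark dots on white) with random jitter."""
    img = np.full((size, size), 255, dtype=np.uint8)
    # Nominal (row, col) dot centres for the six positions, row-major.
    anchors = [(15, 20), (30, 20), (45, 20), (15, 40), (30, 40), (45, 40)]
    for present, (cy, cx) in zip(pattern, anchors):
        if not present:
            continue
        cy += rng.integers(-2, 3)            # jitter dot position
        cx += rng.integers(-2, 3)
        r = radius + rng.integers(-1, 2)     # jitter dot size
        yy, xx = np.mgrid[:size, :size]
        img[(yy - cy) ** 2 + (xx - cx) ** 2 <= r * r] = 0
    return img

def skew(img, shear=0.1):
    """Apply a small horizontal shear by shifting each row to the right."""
    out = np.full_like(img, 255)
    for row in range(img.shape[0]):
        shift = int(shear * row)
        out[row, shift:] = img[row, :img.shape[1] - shift]
    return out

sample = skew(render_cell([1, 0, 1, 0, 1, 0]))
print(sample.shape)
```

Repeating this with random patterns and shears yields arbitrarily many labelled variants per character, which is how a 600-images-per-character dataset can be populated.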
1) Gaussian Blur

Gaussian Blur is a method where an image is convolved with a Gaussian filter (a low-pass filter) in order to remove the high-frequency noise components, as shown in Fig. 4. In the case of Braille character recognition, the sharpness of Braille dots is essential for edge detection. Therefore, an optimal kernel size had to be found by trial and error to remove noise while retaining the sharpness of the dots.

2) Thresholding

Thresholding is the process of changing pixel values based on a predefined threshold value to convert an image to a binary image. In the case of single-sided Braille, adaptive thresholding was used. In the case of double-sided Braille images, the Threshold to Zero technique was used to differentiate between Recto dots (protrusions), Verso dots (depressions), and the background, as shown in Fig. 5.

Fig. 5. Thresholding (a) Single sided, (b) Double sided

3) Erosion and Dilation

Morphological operations are image processing techniques that depend on the shapes present in an image. Since the shape of the dots has to be maintained, the two main morphological operations used are Erosion and Dilation. Fig. 6 shows the image obtained after applying a few rounds of Erosion and Dilation.

single-sided image. For double-sided images, various methods were employed to differentiate between Recto and Verso dots for the final Braille character segmentation process.

1) Canny Edge Detection

Edge detection is the process of finding boundaries of objects within an image, which is done by finding sudden changes in the color of pixel values. As shown in Fig. 7, an edge is detected when there is a sudden change from the white background to a black dot. Therefore, Canny Edge Detection, one of the most popular algorithms, was used for edge detection.

Fig. 7. Image after application of Canny edge detection

2) Find Contour Method

Contouring is a method where boundaries are drawn around continuous points having the same color or intensity. Since this method detects shapes and objects effectively, it works well in finding Braille dots as well. Contouring is applied to binary images to improve accuracy.

An image moment, defined as the weighted average of image pixel intensities, is used to calculate the centroids of all the contours in the image. These centroid values are then used to draw uniform dots on the image by replacing the existing irregular Braille dots.

3) Differentiation of Recto and Verso Dots

As explained in Section III, techniques like Gaussian Blur and Thresholding are applied on the double-sided image, and later centroid detection is done using Canny Edge Detection and the Find Contour method. These centroid values are then used to differentiate between Recto and Verso dots.

In a flatbed scanner, the reflection of light is captured differently for different types of dots. When a Braille page is scanned, Recto dots have a light region followed by a dark region, and vice-versa for Verso dots. This image can be enhanced using Threshold to Zero.
threshold value is considered. If the distance between the consecutive x-coordinates is greater than the threshold, then that distance is the distance between two horizontally consecutive Braille cells. A line is then drawn using the average of the two consecutive x-coordinates and the y-coordinate span for that x-coordinate. The characters are then cropped out and saved as separate image files. Fig. 10 shows the output after vertical segmentation.
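The gap rule described above can be sketched as follows, operating on sorted dot x-coordinates; the threshold value here is an assumed placeholder, not the one used by the authors.

```python
def split_columns(x_coords, gap_threshold=12):
    """Group dot x-coordinates into cells: a gap wider than the threshold
    marks the boundary between two horizontally adjacent Braille cells, and
    the cut line is placed at the average of the two coordinates."""
    xs = sorted(x_coords)
    cuts = []
    for a, b in zip(xs, xs[1:]):
        if b - a > gap_threshold:
            cuts.append((a + b) / 2)          # boundary between two cells
    # Partition the coordinates into cells using the cut positions.
    cells, current, remaining = [], [], cuts[:]
    for x in xs:
        if remaining and x > remaining[0]:
            cells.append(current)
            current = []
            remaining.pop(0)
        current.append(x)
    cells.append(current)
    return cuts, cells

cuts, cells = split_columns([5, 9, 30, 34, 60])
print(cuts, cells)
```

With the sample input, the two wide gaps (9 to 30 and 34 to 60) produce cut lines at 19.5 and 47.0, splitting the dots into three cells; each cell's span is what gets cropped to a separate image file.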
Fig. 8. Drawing uniform dots using centroid positions (a) Single sided, (b) Double sided
Fig. 14. Text after matra correction
IV. COMPARISON OF RESULTS

The two approaches used by the authors for the conversion of Hindi Braille to text are comparable. The conventional method is a simple approach yielding optimal results, but excessive processing has to be performed on the images to correctly determine the number of contours for making the binary sequence. SVM was found to be an excellent binary classifier for classifying the images as dots or no-dots. Even though the accuracy of SVM is 100%, in comparison to the overall conventional approach, the accuracy of Braille to text conversion is not 100%, since SVM is not the only factor affecting that accuracy. The deep learning approach, which uses AlexNet, on the other hand, is robust but slightly less accurate than the conventional method (Table III). The drop in accuracy could be addressed with a larger dataset. For both approaches, the placement of dots must be correct and close to the original Braille characters for better recognition, since some characters have only a slight change in position that differentiates them from the others.

Comparing the results of the Braille to Text conversion, the authors found that the output of the SVM approach was more accurate compared to that of the CNN AlexNet approach for both kinds of splits. Although the conventional approach's output cannot be quantified, on inspection, the resulting text was very similar.

TABLE III
Comparison of results (macro averaged)

Metric     SVM approach   AlexNet (random split)   AlexNet (stratified split)
Accuracy   100%           96.07%                   96.47%
PPV        100%           93.29%                   93.45%
TPR        100%           94.34%                   94.75%
F1-Score   100%           93.25%                   93.71%

The authors conducted a survey for 50 samples produced by the Hindi Text to Speech converter. The average scores (out of 5) on the basis of clarity (4.142), speed (4.013), and pronunciation (4.081) were determined.

In [4], contouring is employed for Braille character recognition. They get an accuracy of 100% with images that have 0-0.5 degrees of tilt, and accuracy drops to 1% for 1.5 degrees of tilt. However, skewness correction on the images results in high accuracy for images with tilt greater than 1.5 degrees. In [5], an ANN is used for Braille character recognition. For 33 training and 8 testing images per character in Cyrillic, an accuracy of 95% is obtained. In [16], an MLP model is trained to achieve an accuracy of 98%, but skewness and non-uniform dots are not considered. In this proposed work, in contrast, skewness correction results in high accuracy for tilt greater than 1.5 degrees. Furthermore, a dataset of 34,800 images has been considered to obtain an accuracy of 96.47%.

V. CONCLUSION

The authors successfully implemented the conversion of Braille to speech for Hindi. Two approaches were used: a conventional approach, which was implemented using two methods, contouring and SVM, and a deep learning approach. The accuracy was found to be better in the case of the conventional approach, provided the manually set thresholds don't have to be changed for every image. Furthermore, contouring was a simpler and more reliable method for the conventional approach implementation. In terms of robustness, deep learning gave good results as there was no need for manually setting up thresholds. As future work, a real-time application can be implemented.

ACKNOWLEDGMENT

The authors would like to thank PES University for supporting this work.

REFERENCES

[1] M. Clute, "Elephants, and mysore, and hindi braille, oh my!," [Online]. Available: https://istep2013.wordpress.com/2013/07/15/technical-challeges-of-hindi-braille
[2] A. Antonacopoulos and D. Bridson, "A Robust Braille Recognition System," in S. Marinai and A. R. Dengel (eds), Document Analysis Systems VI, Lecture Notes in Computer Science, vol. 3163, Springer, Berlin, Heidelberg, pp. 533-545, 2004.
[3] T. Shreekanth and V. Udayashakara, "An Algorithmic Approach For Double Sided Braille," Int. J. Image. Process. Vis. Commun., vol. 2, no. 4, 2014.
[4] J. Subur, T. A. Sardjono, and R. Mardiyanto, "Braille Character Recognition Using Find Contour Method," International Conference on Electrical Engineering and Informatics (ICEEI), pp. 699-703, 2015.
[5] K. Smelyakov, A. Chupryna, D. Yeremenko, A. Sakhon, and V. Polezhai, "Braille Character Recognition Based On Neural Networks," IEEE Second International Conference on Data Stream Mining and Processing (DSMP), pp. 509-513, 2018.
[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
[7] J. Li, X. Yan, and D. Zhang, "Optical braille recognition with haar wavelet features and support-vector machine," IEEE International Conference on Computer, Mechatronics, Control and Electronic Engineering, vol. 5, pp. 64-67, 2010.
[8] D. Magdum, M. S. Dubey, T. Patil, R. Shah, S. Belhe, and M. Kulkarni, "Methodology for designing and creating hindi speech corpus," IEEE International Conference on Signal Processing and Communication Engineering Systems, pp. 336-339, 2015.
[9] M. Dhananjaya, B. N. Krupa, and R. Sushma, "Kannada text to speech conversion: a novel approach," IEEE International Conference on Electrical, Electronics, Communication, Computer and Optimization Techniques (ICEECCOT), pp. 168-172, 2016.
[10] S. R. Rupanagudi, S. Huddar, V. G. Bhat, S. S. Patil, and M. Bhaskar, "Novel methodology for kannada braille to speech translation using image processing on FPGA," IEEE International Conference on Advances in Electrical Engineering (ICAEE), pp. 1-6, 2014.
[11] N. Falcon, C. M. Travieso, J. B. Alonso, and M. A. Ferrer, "Image processing techniques for braille writing recognition," International Conference on Computer Aided Systems Theory, Springer, pp. 379-385, 2005.
[12] S. Barahat, "Aethihaasik Kathaaen," All India Confederation for the Blind (AICB) Printing Press.
[13] J. P. Vert, K. Tsuda, and B. Scholkopf, "A primer on kernel methods," Kernel Methods in Computational Biology, vol. 47, pp. 35-70, 2004.
[14] M. ul Hassan, "AlexNet – ImageNet classification with deep convolutional neural networks," [Online]. Available: https://neurohive.io/en/popular-networks/alexnet-imagenetclassification-with-deep-convolutional-neural-networks/
[15] Vasant, Part 3, Textbook for Class 8, National Council of Educational Research and Training.
[16] B.-M. Hsu, "Braille Recognition for Reducing Asymmetric Communication between the Blind and Non-Blind," Symmetry, vol. 12, no. 7, p. 1069, Jun. 2020.