Recognition of Printed Bangla Document from Textual Image Using Multi-Layer Perceptron (MLP) Neural Network
Md. Musfique Anwar, Nasrin Sultana Shume, P. K. M. Moniruzzaman and Md. Al-Amin Bhuiyan
Dept. of Computer Science & Engineering, Jahangirnagar University, Bangladesh
Email: email@example.com, firstname.lastname@example.org, email@example.com
This paper focuses on the segmentation of printed Bangla characters for efficient character recognition. Character segmentation is an important step in the recognition process because it allows the system to classify characters more accurately and quickly. The system takes the scanned image file of a printed document as its input. A structural feature extraction method is used to extract the features; each individual Bangla character is converted to a feature matrix. A Multi-Layer Perceptron (MLP) neural network trained with the backpropagation algorithm takes the feature matrix as input, learns from the set of input patterns, and classifies the characters. The effectiveness of the system has been tested with several printed documents, and the success rate in all cases is over 90%.
Character segmentation, Character recognition, Feature extraction, Multi-Layer Perceptron (MLP), etc.
Optical character recognition is one of the attractive fields of image processing. A character recognition technique associates a symbolic identity with the image of a character. A lot of research on Bangla character recognition has been done over the last few years. In the modern approach, adaptive tools have been applied to pattern recognition systems. The Artificial Neural Network (ANN) is the most popular adaptive tool used for character recognition. Most applications use a feed-forward ANN with one of the numerous variants of the classical backpropagation algorithm or other training algorithms. The scope of this research is not only individual character recognition; it also attempts to retrieve a complete paragraph from its optical image created by a scanner. In this paper we propose a way to recognize a printed Bangla document from a textual image, using a multilayer perceptron with the backpropagation algorithm for individual character recognition.
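To make the idea of a feed-forward network trained by backpropagation concrete, the following is a minimal sketch in plain Python. It is not the network used in this paper: the class name `MLP`, the `train` helper, the 3x3 toy patterns, the hidden-layer size, the learning rate and the epoch count are all assumptions chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class MLP:
    """One-hidden-layer perceptron (hypothetical toy, not the paper's network)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One weight row per unit; the last entry of each row is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        out = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0])))
               for row in self.w_out]
        return hidden, out

    def train_step(self, x, target, lr):
        hidden, out = self.forward(x)
        # Output deltas: error times the sigmoid derivative o * (1 - o).
        d_out = [(t - o) * o * (1 - o) for t, o in zip(target, out)]
        # Hidden deltas: output deltas propagated back through w_out
        # (computed before w_out is updated).
        d_hidden = [h * (1 - h) * sum(d_out[k] * self.w_out[k][j]
                                      for k in range(len(d_out)))
                    for j, h in enumerate(hidden)]
        for k, row in enumerate(self.w_out):
            for j, v in enumerate(hidden + [1.0]):
                row[j] += lr * d_out[k] * v
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                row[i] += lr * d_hidden[j] * v
        return sum((t - o) ** 2 for t, o in zip(target, out))

    def predict(self, x):
        _, out = self.forward(x)
        return out.index(max(out))


def train(net, samples, epochs=2000, lr=0.5):
    """Online backpropagation; returns the summed squared error per epoch."""
    n_out = len(net.w_out)
    errors = []
    for _ in range(epochs):
        total = 0.0
        for x, label in samples:
            target = [1.0 if k == label else 0.0 for k in range(n_out)]
            total += net.train_step(x, target, lr)
        errors.append(total)
    return errors


# Toy 3x3 "glyphs" flattened row by row (assumed data, not Bangla characters).
SAMPLES = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], 0),  # vertical bar
    ([0, 0, 0, 1, 1, 1, 0, 0, 0], 1),  # horizontal bar
    ([1, 0, 0, 0, 1, 0, 0, 0, 1], 2),  # diagonal
]

net = MLP(n_in=9, n_hidden=5, n_out=3, seed=1)
errors = train(net, SAMPLES)
predictions = [net.predict(x) for x, _ in SAMPLES]
```

In a real recognizer the input vector would be the flattened feature matrix of a segmented character and the output layer would have one unit per character class; the toy patterns above only show the mechanics of the weight updates.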
2. Bangla Character Set
Characters are the fundamental units for writing and reading a language. Character recognition is the process of classifying an input character according to a predefined character class. Every language has its own character set; the Bangla character set has 49 characters, 10 digits, punctuation marks and other symbols. Bangla letters are formed in two-dimensional space, mostly from horizontal, vertical and arc strokes. The Bangla characters are classified into two categories as follows:
i) ‘Shorborno’, like the vowels of the English language. There are eleven ‘Shorborno’ characters. The first six characters have a full matra, the seventh has a half matra and the last four have no matra.
ii) ‘Banjonborno’, like the consonants. There are 39 ‘Banjonborno’ in the Bangla alphabet. Here we are concerned only with the characters.
Bangla scripts are moderately complex patterns. Each word in Bangla script is composed of characters joined by a horizontal line (called the matra or head-line) at the top. The concept of upper- and lower-case characters (as in English) is absent. There are many composite characters, called “Jukto barna”, as shown in panel (d) of the character-set figure. There are roughly 253 or more compound characters composed of 2, 3, or 4 consonants (i.e. Banjonborno). There are some other types of characters used in the Bangla dictionary, called suffix-prefix characters, as shown in the figure of suffix-prefix determiner characters.
Figure: Some Bangla mainstream characters used for image recognition: (a) Shorborno, (b) Banjonborno, (c) Bangla numerals, (d) a few Bangla composite characters.
Figure: Suffix-prefix determiner characters.
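Because the characters of a Bangla word hang from a shared head-line (matra), a common way to separate them is to locate the head-line with a horizontal projection and then split the region below it with a vertical projection. The sketch below illustrates that general idea on a tiny binary image. It is an assumption made for illustration, not the segmentation procedure of this paper, and the function names `find_matra_row` and `split_below_matra` and the toy image are hypothetical.

```python
def find_matra_row(image):
    """Return the index of the row with the most ink pixels (candidate head-line)."""
    row_sums = [sum(row) for row in image]
    return row_sums.index(max(row_sums))


def split_below_matra(image, matra_row):
    """Return (start, end) column spans that contain ink below the head-line.

    Columns without ink below the head-line are treated as gaps between
    characters; `end` is exclusive.
    """
    body = image[matra_row + 1:]
    width = len(image[0])
    col_has_ink = [any(row[c] for row in body) for c in range(width)]

    spans, start = [], None
    for c, ink in enumerate(col_has_ink):
        if ink and start is None:
            start = c                    # a character span begins
        elif not ink and start is not None:
            spans.append((start, c))     # the span ends at the first blank column
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


# Toy word image (assumed data): 1 = ink, 0 = background.
WORD = [
    "0000000000",
    "1111111111",  # head-line joining the characters
    "0100001010",
    "0110001110",
    "0100000010",
]
IMAGE = [[int(ch) for ch in line] for line in WORD]

matra_row = find_matra_row(IMAGE)
spans = split_below_matra(IMAGE, matra_row)
print(matra_row, spans)  # prints: 1 [(1, 3), (6, 9)]
```

Real scanned text is less tidy than this toy image: some characters have strokes above the head-line, and touching or composite characters may not be separated by a blank column, so a practical system would need additional rules on top of the two projections.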
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 1, April 2010
http://sites.google.com/site/ijcsis/
ISSN 1947-5500