
JOURNAL OF INFORMATION, KNOWLEDGE AND RESEARCH IN

COMPUTER ENGINEERING

GUJARATI CHARACTER RECOGNITION: THE STATE OF THE ART COMPREHENSIVE SURVEY

1 AVANI R. VASANT, 2 SANDEEP R. VASANT, 3 DR. G.R. KULKARNI

1 Research Scholar, Singhania University, Rajasthan, and Assistant Professor and Head,
Department of Information Technology, V.V.P. Engineering College, Rajkot
2 Research Scholar, Saurashtra University, Rajkot, and Lecturer, AES Institute of
Computer Studies, Ahmedabad University
3 Principal, C.U. Shah College of Engg. & Tech., Wadhwan

1 avanivasant@yahoo.com, 2 sandeep.vasant@gmail.com, 3 grkulkarni29264@rediffmail.com

ABSTRACT: - Character recognition is a very interesting area of pattern recognition, and it deals with offline
handwriting recognition. Handwriting has persisted for centuries as a means of communicating, collecting,
recording and transmitting information in day-to-day life, even with the advent of new technologies.
Machine recognition has many practical applications: reading handwritten postal envelopes, amounts written
on bank checks, bill processing, government records, commercial forms, signature verification, offline
document recognition, etc. This paper presents a state-of-the-art survey of the work done on Gujarati
character recognition.
Keywords: Gujarati Character Recognition, online, offline, pre-processing, classifiers

I. INTRODUCTION
India is a linguistically diverse country with more than 20
official languages, including Bengali, Malayalam, Hindi,
English, Gujarati, Tamil, Kannada and Urdu [1]. Gujarati is
a widely spoken language and is the official language of
the state of Gujarat. More than 50 million people speak
Gujarati [2]. Considerable work has been done on
recognition for languages such as Chinese, Tamil, Telugu
and Kannada, but very little has been done on Gujarati
language recognition.
Pattern recognition has become a very active topic for
researchers during the last few decades. Typed characters
can be recognized easily by a computer, but handwritten
characters are not yet recognized efficiently and
accurately. Many researchers have worked on recognizing
these characters and many algorithms have been proposed.
Researchers have been working on handwriting recognition
for more than 30 years, and over the past few years the
number of companies involved in research on handwriting
recognition has increased continually. The challenge in
handwritten character recognition lies in the variation
and distortion of offline handwritten characters, since
different people have different styles of handwriting.

II. Gujarati Script
The Gujarati script is used to write the Gujarati language.
The Gujarati alphabet uses 75 distinct, legitimate and
recognized shapes in all, comprising 59 characters and 16
diacritics. The fifty-nine characters are divided into 36
consonants (34 singular and 2 compound, though not
lexically so), which are ornamented sounds, 13 vowels (pure
sounds), and 10 numerical digits [4][5]. The sixteen
diacritics are divided into 13 vowel diacritics and 3 other
characters. The alphabet is ordered by logically grouping
the vowels and the consonants based on their pronunciation
[3].
There are many applications of this form of recognition,
such as postal code verification, vehicle number plate
recognition, bank cheque processing, assigning ZIP codes to
letter mail, reading data entered in forms (e.g., tax
forms), automatic accounting procedures used in processing
utility bills, verification of account numbers, automatic
accounting of airline passenger tickets, and automatic
validation of passports [6].
In particular, machines that can read symbols are very cost
effective. A machine that reads bank checks can process
many more checks than a human being in the same time. This
kind of application saves time and money, and eliminates
the need for a human to perform such a repetitive task [1].
Gujarati digits have distinctive characteristics. They have
varied shapes that are difficult to recognize, and because
of this variety some characters are easily confused, so the
possibility of misclassification is very high [5]. Figure 1
shows the Gujarati alphabet and digits.

ISSN: 0975 6760| NOV 11 TO OCT 12 | VOLUME 02, ISSUE - 01

Figure 1 Gujarati Characters and Gujarati Digits

III. Application of Character Recognition System
There are a number of applications of character
recognition systems [1].
Task-specific Readers
These are used for high-volume data processing. They focus
on specific applications such as:
Assigning ZIP codes to letter mail
Reading data entered in forms, e.g. tax forms
Verification of account numbers and courtesy amounts on bank checks
Automatic accounting procedures used in processing utility bills
Automatic accounting of airline passenger tickets
Automatic validation of passports
Address Readers
The address reader in a postal service reads the
destination address block on the envelope, including the
PIN code in that block. Using the PIN code, it can then
sort the envelopes by area.
Form Reader
A form reader automatically reads the data filled in on a
form. It locates the printed and handwritten text on the
form and recognizes both.
Check Reader
An automatic check reader reads the amount and the account
information from the check image and recognizes both.
Signature Verifier
Like the check reader, a signature verifier reads the
signature from the check image and recognizes it.
Bill Processing System
This system is used to read payment slips, bills or any
value specified on a bill.
Passport Readers
An automatic passport reader can be used for inspection
purposes. It can verify the traveler's information, such as
name, age and passport number, as well as the photograph,
which saves time at airports.

IV. Types of Character Recognition System
A character recognition system basically deals with
recognizing handwritten characters. It can typically be
classified into the following two types [8]:
Online recognition
Offline recognition
Online Character Recognition
In online character recognition, characters are recognized
in real time [6]. Online systems have better information
for recognition, since they have timing information and
since they avoid the initial search step of locating the
character that their offline counterparts require. Online
systems obtain the position of the pen as a function of
time directly from the interface. Offline recognition of
characters is known to be a challenging problem because of
the complex character shapes and the great variation of
character symbols written in different modes.
Offline Character Recognition
In offline character recognition, the typewritten or
handwritten character is typically scanned from a paper
document and made available to the recognition algorithm as
a binary or gray-scale image. Offline character recognition
is a more challenging and difficult task, as there is no
control over the medium and instrument used [7]. The
artifacts of the complex interaction between the instrument
and the medium, and of subsequent operations such as
scanning and binarization, present additional challenges to
the offline recognition algorithm. Offline character
recognition is therefore considered a more challenging task
than its online counterpart.

V. Different Phases of the Character Recognition System

Figure 2 Components of the character recognition system
The pre-processing step aims to improve the image data, or
the image features, required for further processing.
Pre-processing is a series of operations performed on the
scanned input image that essentially enhance it. It
involves converting the input image to a binary image,
noise removal, dilation, line segmentation, digit
segmentation and normalization [9].
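Several of these operations can be sketched in a few lines. The following is a minimal illustration in plain Python, assuming the scanned image is given as a list of rows of gray levels in the range 0-255 with dark ink on a light background. The function names, the fixed threshold of 128 and the nearest-neighbour resampling are assumptions made for this example and are not taken from any of the surveyed papers.

```python
def binarize(gray, threshold=128):
    """Convert a gray-scale image to binary: ink (dark) becomes 1, paper 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def neighbourhood(img, r, c):
    """Yield the values of the 3x3 window around (r, c), clipped at the border."""
    h, w = len(img), len(img[0])
    for rr in range(max(0, r - 1), min(h, r + 2)):
        for cc in range(max(0, c - 1), min(w, c + 2)):
            yield img[rr][cc]


def remove_isolated_pixels(img):
    """Noise removal: clear ink pixels that have no ink neighbour."""
    out = [row[:] for row in img]
    for r, row in enumerate(img):
        for c, px in enumerate(row):
            # The window sum includes the pixel itself, so a sum of 1 means isolated.
            if px == 1 and sum(neighbourhood(img, r, c)) == 1:
                out[r][c] = 0
    return out


def dilate(img):
    """3x3 dilation: a pixel becomes ink if any pixel in its window is ink."""
    return [[1 if any(neighbourhood(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


def crop_to_ink(img):
    """Segment the character by cropping to the bounding box of its ink."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:
        return img  # blank image: nothing to crop
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def normalize(img, size=8):
    """Resample to size x size pixels using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]
```

A typical call chain would be `normalize(crop_to_ink(remove_isolated_pixels(binarize(gray))))`, which yields a fixed-size binary image suitable for the feature extraction stage.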
Feature extraction is a very important step in any
character recognition system. This step derives
information such as shape or style, which is very useful
for classifying the pattern. The feature extraction stage
analyses a text segment and selects a set of features that
can be used to uniquely identify it [10].
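As an illustration of what such a feature set may look like, the sketch below computes two simple shape descriptors from a binary character image: row and column projection profiles, and zone densities. It is a minimal sketch in plain Python. The function names and the 2 x 2 zone grid are assumptions made for the example, and these are generic features that are not necessarily identical to those used in the surveyed work.

```python
def projection_profiles(img):
    """Return (row_profile, column_profile): ink counts per row and per column."""
    rows = [sum(row) for row in img]
    cols = [sum(row[c] for row in img) for c in range(len(img[0]))]
    return rows, cols


def zone_densities(img, zones=2):
    """Split the image into zones x zones blocks and return the ink fraction
    of each block (assumes the image is at least zones pixels in each direction)."""
    h, w = len(img), len(img[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * h // zones, (zr + 1) * h // zones
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            block = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(block) / len(block))
    return feats


# A made-up 4x4 binary glyph, used only to show the shape of the output.
glyph = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
rows, cols = projection_profiles(glyph)
features = rows + cols + zone_densities(glyph)  # one feature vector per character
```

Concatenating several descriptors into one vector, as in the last line, is a common way to give the classifier complementary views of the same shape.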
The classification stage uses the extracted features to
identify the text segment according to the chosen
algorithm. The task is to compare the test patterns against
the trained classes so that patterns are classified
correctly and the error rate is minimized.
Post-processing involves various approaches, such as
dictionary lookup, statistical approaches or neural network
recognition [11], to correct the recognition result.
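Dictionary lookup, the first of these approaches, can be sketched with Python's standard `difflib` module: a recognized word that is not in the lexicon is replaced by its closest lexicon entry. The tiny lexicon, the `correct_word` helper and the 0.6 similarity cutoff are hypothetical choices made for illustration only.

```python
import difflib

# Hypothetical lexicon; a real system would load a full word list.
LEXICON = ["gujarat", "rajkot", "ahmedabad", "character", "recognition"]


def correct_word(word, lexicon=LEXICON, cutoff=0.6):
    """Return the word itself if it is in the lexicon, otherwise the closest
    lexicon entry, or the unchanged word when nothing is similar enough."""
    if word in lexicon:
        return word
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word
```

For example, a recognizer output such as `"rajk0t"`, where the letter o was misread as a zero, would be mapped back to `"rajkot"`, while a string with no similar lexicon entry is left unchanged.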
VI. Various classifiers used for Gujarati character
recognition
K-nearest Neighbor classifier
In [12] a k-nearest neighbor classifier has been used for
Gujarati character recognition. The k-nearest neighbor
classifier has given very good results for English
characters. It finds the k training samples nearest to the
test sample and assigns the test sample to the class that
has the largest number of votes among them. The nearest
neighbors are found using the Euclidean distance measure.
For the 1-NN classifier the best recognition rate achieved
was 67% in the binary feature space, and in the regular
moment space the rate was 48%.
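The voting rule described above can be written compactly. The following is a minimal sketch in plain Python, assuming each character has already been reduced to a numeric feature vector. The function names and the toy training set are made up for the example and do not reproduce the data or features of [12].

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(sample, training_set, k=1):
    """Label the sample by majority vote among its k nearest training samples.

    training_set is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training_set,
                     key=lambda item: euclidean(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy binary feature vectors for two hypothetical classes "A" and "B".
training = [
    ([0, 0, 1, 1], "A"),
    ([0, 1, 1, 1], "A"),
    ([1, 1, 0, 0], "B"),
    ([1, 0, 0, 0], "B"),
    ([1, 1, 1, 0], "B"),
]
```

With k=1 this reduces to the 1-NN classifier mentioned above. For 0/1 feature vectors the squared Euclidean distance equals the number of differing positions, so in a binary feature space the ranking of neighbors is the same as under the Hamming distance.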

The Minimum Hamming Distance Classifier
In [12] this approach has also been used for Gujarati
character recognition. The minimum Hamming distance
classifier uses the Hamming distance between the sample and
the class centroids built from the training sets to
classify characters. It is assumed that the image pixels
have a Bernoulli distribution. The Hamming distance is then
the sum of the absolute pixel differences (in binary space)
between the class centroid and the image of the character
being classified. Using this approach the recognition rate
was only 39%.
Feed forward back propagation neural network classifier
Neural network based character recognition can be found in
[5][13][14].
In [5] a feed forward back propagation neural network is
proposed for the classification of Gujarati numerals.
Various techniques are used in the pre-processing phase
before the numerals are classified. Gujarati numerals are
based on very sharp, irregular curves; to handle this,
various profiles of the digits are used as templates to
identify them. This very simple but effective feature
extraction technique uses four different profiles:
horizontal, vertical and two diagonals. The overall
performance of the proposed network is as high as 81.66%.
A handwritten character recognition system using a
multilayer feed forward neural network is proposed in [13].
Three different orientations, namely horizontal, vertical
and diagonal, are used for extracting 54 features from each
character. In addition, 9 and 6 features are obtained by
averaging the values placed in zones row wise and column
wise, respectively. As a result, every character is
represented by 69 features, that is, 54 + 15.
The test results show that the diagonal method of feature
extraction yields the highest recognition accuracy: 98% for
54 features and 99% for 69 features.
KNN and PCA classifier
In [15] a KNN classifier and PCA (to reduce the dimensions
of the feature space) are used with the Euclidean
similarity measure to classify the numerals. The KNN
classifier yielded a recognition rate of 90%, whereas PCA
scored a recognition rate of 84%. The comparison shows that
the KNN classifier gives better results than the PCA
classifier.

SVM Classifier

In [16] the authors propose a Support Vector Machine (SVM)
based scheme for the recognition of handwritten Gujarati
numerals. A technique based on affine invariant moments is
applied for feature extraction, and a recognition rate of
approximately 91% is reported.
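To make the idea of a maximum-margin classifier concrete, the sketch below trains a binary linear SVM by subgradient descent on the regularized hinge loss. This is a simplified stand-in written for illustration: the survey does not state the kernel, the training procedure or the multi-class strategy used in [16], and the two-dimensional points below are made-up values standing in for moment features. The names `train_linear_svm` and `predict` and all hyperparameters are assumptions.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=100):
    """Minimise lam/2 * ||w||^2 + hinge loss with per-sample subgradient steps.

    labels must be +1 or -1. Returns the weight vector w and the bias b.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample is inside the margin or misclassified: hinge term is active.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the regulariser contributes to the subgradient.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Made-up, linearly separable 2-D features for two classes.
samples = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(samples, labels)
```

A ten-class numeral recognizer could be built from such binary classifiers with a one-versus-rest scheme, training one classifier per digit and choosing the digit whose classifier gives the largest score. That is a common construction, not a detail reported in the survey.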
VII. CONCLUSION

This paper describes the various steps involved in a
character recognition system. It reviews the two types of
character recognition system, online and offline, and
describes the various applications of character
recognition. The last section reviews the classifiers that
have been used for Gujarati character recognition.
REFERENCES
[1] Rejean Plamondon and Sargur Srihari, Online and Off-line Handwriting Recognition: A
Comprehensive Survey, IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 22, No. 1, January 2000.
[2] U. Pal and B.B. Chaudhuri, Indian Script Character Recognition: A Survey, Pattern
Recognition, 37 (2004), pp. 1887-1899.
[3] Manish Kayasth and Dr. Bankim Patel, Offline Typed Gujarati Character Recognition,
ISSN: 0974-3308, Vol. 2, No. 1, June 2009.
[4] Babu Suthar, Gujarati-English Learner's Dictionary.
[5] Apurva A. Desai, Gujarati Handwritten Numeral Optical Character Reorganization
through Neural Network, Pattern Recognition (Elsevier), 43 (2010), pp. 2582-2589.
[6] CIA/DOE Partnership Program Proposal for FY99 (Sandia National Laboratories
Proposal), 1998.
[7] S. N. Srihari, Recognition of Handwritten and Machine-printed Text for Postal Address
Interpretation, Pattern Recognition Letters, 14, 1993, pp. 291-302.
[8] Faten T. Hussein, Genetic Algorithm for Feature Selection and Weighting for Off-line
Character Recognition, Thesis, Egypt, 1995.
[9] Rafael C. Gonzalez, Richard E. Woods and Steven L. Eddins, Digital Image Processing
using MATLAB, Pearson Education, Dorling Kindersley, South Asia, 2004.
[10] N. Kharma and R. Ward, Character Recognition Systems for the Non-expert, IEEE
Canadian Review, 33, 1999, pp. 5-8.
[11] K. Y. Rajput and Sangeeta Mishra, Recognition and Editing of Devnagari Handwriting
Using Neural Network, Proceedings of SPIT-IEEE Colloquium and International
Conference, Mumbai, India, Vol. 1, 66.
[12] S. Antani and L. Agnihotri, Gujarati Character Recognition, Proceedings of the Fifth
International Conference on Document Analysis and Recognition, 1999, pp. 418-421.
[13] J. Pradeep, E. Srinivasan and S. Himavathi, Diagonal Feature Extraction Based
Handwritten Character System Using Neural Network, International Journal of Computer
Applications (0975-8887), Volume 8, No. 9, October 2010.
[14] M. Hanmandlu, K.R.M. Mohan and H. Kumar, Neural-based Handwritten Character
Recognition, in Proceedings of the Fifth IEEE International Conference on Document
Analysis and Recognition, ICDAR'99, pp. 241-244, Bangalore, India, 1999.
[15] M. J. Baheti, K.V. Kale and M.E. Jadhav, Comparison of Classifiers for Gujarati
Numeral Recognition, International Journal of Machine Intelligence, ISSN: 0975-2927 &
E-ISSN: 0975-9166, Volume 3, Issue 3, 2011, pp. 160-163.
[16] Mamta Maloo and K.V. Kale, Support Vector Machine Based Gujarati Numeral
Recognition, International Journal on Computer Science and Engineering (IJCSE),
ISSN: 0975-3397, Vol. 3, No. 7, July 2011, pp. 2595-2600.
