Professional Documents
Culture Documents
discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/45898220
CITATIONS READS
3 73
2 authors, including:
S. Sethu Selvi
M.S. Ramaiah Institute of Technology
10 PUBLICATIONS 57 CITATIONS
SEE PROFILE
All in-text references underlined in blue are linked to publications on ResearchGate, Available from: S. Sethu Selvi
letting you access and read them immediately. Retrieved on: 10 May 2016
Kannada Character Recognition System: A Review
1
K. Indira, 2S. Sethu Selvi
1
Assistant Professor, indm_1@yahoo.com
2
Professor and Head, selvi_selvan@yahoo.com
1,2
Department of Electronics and Communication Engineering
M. S. Ramaiah Institute of Technology, Bangalore – 560 054
Abstract
Intensive research has been done on optical character recognition ocr and a large number of articles have
been published on this topic during the last few decades. Many commercial OCR systems are now available
in the market, but most of these systems work for Roman, Chinese, Japanese and Arabic characters.
There are no sufficient number of works on Indian language character recognition especially Kannada
script among 12 major scripts in India. This paper presents a review of existing work on printed Kannada
script and their results. The characteristics of Kannada script and Kannada Character Recognition
System kcr are discussed in detail. Finally fusion at the classifier level is proposed to increase the
recognition accuracy.
Keywords: Kannada Script, Skew detection, Segmentation, Feature Extraction, Nearest neighbour
classifier, Bayesian classifier .
1. INTRODUCTION
Kannada, the official language of the south 16 = 19600. If each akshara is considered as a
Indian state of Karnataka, is spoken by about 48 separate category to be recognized, building a
million people. The Kannada alphabets were classifier to handle these classes is difficult. Most of
developed from the Kadamba and Chalaukya scripts, the aksharas are similar and differ only in additional
descendents of Brahmi which were used between the strokes, it is feasible to break the aksharas into their
5th and 7th centuries A.D. The basic structure of constituents and recognize these constituents
Kannada script is distinctly different from the Roman independently. The block diagram of a KCR System is
script. Unlike many north Indian languages, Kannada shown in Figure 1.
characters do not have shirorekha (a line that connects
all the characters of any word) and hence all the 2. PRE PROCESSING
characters in a word are isolated. This creates a In a KCR system, the sequences of preprocessing
difficulty in word segmentation. Kannada script is steps are as follows:
more complicated than English due to the presence of
compound characters. However, the concept of
upper/lower case characters is absent in this script. 2.1 Binarization
Modern Kannada has 51 base characters, called The page of text is scanned through a flat
as Varnamale. There are 16 vowels and 35 bed scanner at 300dpi resolution and binarized using
consonants. (Table 1). Consonants take modified a global threshold computed automatically based on a
shapes when added with vowels. When a consonant specific image [1]. A binary image is obtained by
character is used alone, it results in a dead consonant considering the character as ON pixels and the
(mula vyanjana). Vowel modifiers can appear to the background as OFF pixels. The binarized image is
right, top or at the bottom of the base consonant. processed to remove any skew so that text lines are
Table II shows a consonant modified by all the 16 aligned horizontally in the image.
vowels. Such consonant-vowel combinations are called
live consonant (gunithakshra). When two or more 2.2 Skew Detection
consonant conjuncts appear in the input they make a
consonant conjunct (Table III). The first consonant Methods to detect and correct the skew for the
takes the full form and the following consonant document containing Kannada script has not been
becomes half consonant. In addition, two, three or reported in any of the recognition methods for
four characters can generate a new complex shape Kannada script. However, the methods discussed here
called a compound character. briefly work on the document containing Kannada
The number of possible Consonant–Vowel script.
combinations is 35 x 16 = 560. The number of
consonant -consonant vowel combinations is 35 x 35 x 2.2.1 Projection profiles
32 Kannada Character Recognition System …
A projection profile is a histogram of the number Skewed document images are decomposed by the
of ON pixels accumulated along parallel sample lines wavelet transform [5]. The matrix containing the
taken through the document. The profile may be at any absolute values of horizontal subband coefficients
angle, but often it is taken horizontally along rows or which preserves the text horizontal structure, is then
vertically along columns, and these are called the rotated through a range of angles. A projection profile
horizontal and vertical projection profiles is computed at each angle, and the angle that
respectively. For a document whose text lines span maximizes a criterion function is regarded as the
horizontally, the horizontal projection profile will skew angle. This algorithm performs well on
have peaks whose widths are equal to the character document images of various layouts and is robust to
height and minimum height valleys whose widths different languages and hence can be used for
are equal to the between line spacing. The most documents containing Kannada script.
straight forward use of the projection profile for skew
detection is to compute it at a number of angles close 2.2.4 Wavelet decomposition and Hough Transform
to the expected orientations [2]. For each angle, a [3, 5]
measure is made of the variation in the bin heights The document image is decomposed by wavelet
along the profile, and the one with the maximum transform and the LL subband which preserves the
variation gives the skew angle. At the correct skew original image is rotated through a range of angles
angle, since scan lines are aligned to text lines the and Hough Transform is calculated at each angle. The
projection profile has maximum height peaks for text angle that maximizes the highest number of counts
and valleys between line spacing. Some of the corresponds to the skew angle of the text. This method
Indian scripts, such as Devanagiri, have a dark top is suitable for Kannada text scanned at 300dpi and is
line linking all the characters in a word. This strong faster compared to other methods.
linear feature can be used for projection profile based
method. In Kannada, such a line linking all 2.3 Skew Correction
characters of the word is not present. However, a
short horizontal line can usually be seen on top of Skew correction i s performed by rotating the
most of the characters. This feature can be exploited document through an angle –θ with respect to the
for skew estimation. horizontal line, where the detected angle of skew is
θ. In order to prevent the image being rotated off the
image plane, the skewed image is first translated to the
2.2.2 Hough Transform center and the new image dimensions are computed.
This transform [3] maps points from (x, y)
domain to curves in (ρ, θ) domain, where ρ is the 3. SEGMENTATION
perpendicular distance of the line from the origin and
θ is the angle between the perpendicular line and Segmentation is the process of extracting
horizontal axis in the (x, y) plane. Crossing curves in objects of interest from an image. The f i r s t
(ρ, θ) domain is the result of collinear pixels in (x, y) s t e p i n segmentation is detecting lines. The
plane. θ is varied between 0 and 1800 for each black subsequent steps are detecting the words in each line
pixel in a document image and ρ is calculated in and the individual characters in each word.
Hough space. The maximum value in (ρ, θ) space
is considered as t h e skew angle of t h e document 3.1 Line Segmentation
image. This method is time consuming due to the
mapping operation from (x, y) plane to (ρ, θ) plane for Kunte and Samuel describe horizontal and
all black pixels, especially for images containing non vertical projection profiles (HPP and VPP) [7] for line
text dominant area (i.e. pictures, graphs etc.). Le et and word detection respectively. The horizontal
al [4] improved the computational time by applying projection profile is the histogram of the number of
Hough transform only on t h e bottom pixels of ON pixels along every row of the image. White space
connected components belonging to the dominant text between text lines is used to segment the text lines.
Figure 2 shows a sample Kannada document along
area. This method gives an accuracy of 0.50 to the with its HPP. The projection profile has valleys of
original skew angle of the image. Hough transform zero height between the text lines. Line segmentation is
based methods are robust enough to estimate skew done at these points.
angles between -150 to 150. But they are Kumar and Ramakrishnan have reported
computationally expensive and sensitive to noise. segmentation techniques for Kannada script. The
bottom conjuncts of a line overlap with top matras
2.2.3 Wavelet decomposition and projection [8] of the following text lines in the projection
profiles profile. This results in nonzero valleys in the HPP.
InterJRI Science and Technology, Vol. 1, Issue 2, July 2009 33
These lines are called Kerned Text lines. To segment width of the gaps in a line. Hence, the threshold is
such lines, the statistics of the height of the lines are not fixed and i t has to be changed after performing
found from the HPP. Then the threshold is fixed at the histogram of a line. This method may not work for
1.6 times the average line height. This threshold is all types of document.
chosen based on experimentation of segmentation on a
large number of Kannada documents. Nonzero valleys 3.3 Character Segmentation
below the threshold indicate the locations of the text
line and those above the threshold correspond to the 3.3.1 Three Stage Character Segmentation
location of kerned text lines. The midpoint of a Kumar and Ramakrishnan [8] described a three-
nonzero valley of a kerned text line is the separator of stage character segmentation for separating Kannada
the line. Ashwin and Shastry have solved the problem characters from the segmented word. Three line
of overlapping of consonant conjunct of one line with segmentation of character involves the division of
the vowel modifier of the next line by extracting the each into three segments: Top zone consists of top
minima of the horizontal projection profile smoothed matras, middle zone consists of base and compound
by a simple moving average filter. The line breaks characters and bottom zone consists of consonant
obtained sometimes segment inside a line. Such false conjuncts. Head line and base line information is
breaks are removed by using statistics of line widths extracted from the Horizontal Projection Profile. Head
and the separation between lines. line refers to the index corresponding to maximum in
the top half of the profile base line refers to the index
3.2 Word Segmentation corresponding to maximum in the bottom half of the
profile. Using the baseline information, text region in
Kunte and Samuel have proposed word the middle and top zones of a word is extracted and its
segmentation by taking the vertical projection profile VPP is obtained. Zero valued valleys of this profile
of an input text line. For Kannada script, spacing are the separators for the characters as in Figure 5.
between the words is greater than the spacing between Sometimes, the part of a consonant conjunct in the
characters in a word. The spacing between the words middle zone is segmented as a separate symbol then,
is found by taking the vertical projection profile of an split the segmented character into a base character and
input text line. The width of zero valued valleys is a vowel modifier (top or right matra). The consonant
more between the words in line as compared to the conjuncts are segmented separately based the on
width of zero valued valleys that exist between connected component analysis (CCA).
characters in a word. This information is used to
separate words from the input text lines. Figure 3
shows a text line with its VPP. 3.3.2.1 Consonant Conjunct Segmentation
Kumar and Ramakrishnan have described that Knowledge based approach is used to separate
Kannada words do not have shirorekha and all the the consonant conjuncts. The spacing to the next
characters in a word are isolated. Further, the character in the middle zone is more for characters
character spacing is non-uniform due to the having consonant conjuncts than for t h e others. To
presence of consonant conjuncts. Thus, spacing detect the presence of conjuncts, a block of partial
between the base characters in the middle zone image in the bottom zone corresponding to the gap
becomes comparable to the word spacing. This could between adjacent characters in the middle zone is
affect the accuracy of word segmentation. Hence, considered. If the number of ON pixels in the
morphological dilation is used to connect all the partial image exceeds a threshold (for examples 15
characters in a word before performing word pixels), a consonant conjunct is detected. Sometimes,
segmentation. Each ON pixel in the original image is a part of the conjunct enters the middle zone between
dilated with a structuring element then VPP of the the adjacent characters such parts will be lost if the
dilated image is determined. The zero valued valleys conjunct is segmented only in the bottom zone. Thus,
in the profile of the dilated image separates the words in order to extract the entire conjunct CCA is used.
in the original image. Figure 4 illustrates the dilated However, in some cases the conjunct is connected to
image and VPP of a line. The accuracy of word the character in the middle zone causing difficulty in
segmentation depends upon the structuring element using CCA for segmenting the conjunct alone. This
and the type of structuring element has to be decided problem is solved by removing the character in the
by performing experiments on different text with middle zone before applying CCA. But the consonant
various fonts. Ashwin and Shastry have suggested to conjunct segmentation described in this paper will
adapt a threshold for each line of text to separate have discrepancies for some of the words as
interword gaps from inter-character gaps. Threshold depicted in Figure 6. The normal segmentation
has to be obtained by analyzing the histogram of the technique described in this paper would falsely
34 Kannada Character Recognition System …
the performance of nearest neighbour classifier is presegmented characters and hence a complete system
better than BPN network. RBF [8] with Zernike for recognition of Kannada text is not designed.
moments, the recognition rate is 96.8%. With DWT In [6] and [8], to increase the recognition accuracy
and with structural features the recognition rate is structural features are included for disambiguating
99.1% for the base character. RBF is used for spatial confused characters. Results presented in [11] and [14]
domain and frequency domain features. Thus are based only on base characters and consonant
performance of RBF network is higher than NN and conjuncts. A complete KCR system is not proposed.
BPN classifier and is true for vowel modifier and Curvelets give very good representation of edges
consonant conjuncts. SVM [1] with Zernike features, in an image, it has high directional sensitivity and are
the recognition rate is 92.6%. With modified structural highly anisotropic. Hence, can be used to represent
features, the recognition rate is 93.3% for the base Kannada characters for classification. Different types
character. Performance of SVM with spatial domain of features can be used to represent a character.
features is higher than NN classifier for the base Each of the feature vector is analysed using a
character. Comparing the work proposed in [1], [8] classifier. Multiple categories [17] of the features
and [14] RBF gives highest recognition rate of 99% are combined into a single feature vector and a
using Haar features. final classifier provides the decision of the class of the
test character.
6. CONCLUSION AND FUTURE DIRECTION Table 1. Vowels and Consonant
In this paper, a complete character recognition
system for printed Kannada documents is
d i s c u s s e d in detail. Different segmentation
techniques and various classifiers with different
features are also discussed. Different segmentation
methods lead to different classifier design, as it Table 2. Consonant – Vowel combination
depends on the number of classes at the output stage of
a classifier.
Results presented in [1] are based on scanning
Kannada texts from magazines and textbooks at 300
dpi. The training and test patterns were generated
from the same text and was ensured that these
patterns are disjoint. SVM classifier is used in the
recognition stage, and this type of two class
classifier becomes complicated for a multi-class
problem. This multi-class classification problem can
be solved [16] by using pair wise based SVM
classifier, fuzzy pair wise SVM and directed acyclic
graph (DAG).
To improve recognition performance, a
recognition based segmentation method uses dynamic
wide length to provide segmentation points which
are confirmed by the recognition stage. In [8] and
[14] DWT, DCT, KLT, Hu’s invariant moments
and Zernike moments are used to represent each
character and is classified using NN, BPN and RBF
network. Results presented in [8] are based on
38 Kannada Character Recognition System …
KANNADA
TEXT
DOCUMENT Recognized o/p
Figure 5 (a) Text line (b) VPP (c) Middle and top zone (d) VPP of (c)
Sl
Publication Feature Extraction
No
Kumar and DCT, DWT
1.
Ramakrishnan [8] ( Haar, Db2), KLT.
Ashwin and Division of segments into
2. tracks and sectors.
Sastry [1]
Kunte and Hu’s invariant moments,
3. Zernike moments.
Samuel [11]
Kunte and Fourier and
4. Wavelet Descriptors
Samuel [11]
Kunte and Contour extraction and
5. wavelet transform.
Samuel [11]
Kumar and Subspace
6.
Ramakrishnan [6] projection.
40 Kannada Character Recognition System …
%
Sl. Publi Feature Recognition
Symbol Features Recognition
No cation dimension time(min)
Rate
16 (4 x 4) 93.80 0.60
DCT 64 (8 x 8) 98.70 1.23
144 (12 x 12) 98.27 2.11
Base 40 98.70 0.92
1 [8]
character KLT 50 98.55 1.13
60 98.77 1.32
DWT (Haar) 64 (8 x 8) 98.83 2.65
DWT (db2) 100 (10 x 10) 98.55 3.53
Zernike 7 92.66 -
Structural 48 92.76 -
(equally spaced
radial
Tracks & sectors)
Base Modified 48 93.28
2 [1]
character structural
(adaptively
spaced for
Equal ON
Pixels in each
annular region)
Haar (db1) 64 94.20 0.56
Top and
right matra DCT 64 93.04 0.43
3 [8]
Consonant Haar (db1) 64 96.61 0.37
conjuncts DCT 64 96.7 0.21
Zern like 7 88.13 -
Top matra Structural 48 86.91 -
Modified 48 88.28
4 [1] Consonant structural
Zernike 7 92.22
conjuncts Structural 48 89.27
Modified 48 92.76
5 [6] Base structural
Subspace 60 94.5 1.69
character Projection
InterJRI Science and Technology, Vol. 1, Issue 2, July 2009 41
Equal to
number
Hu’s moments 7 of RBF 82
training
Base samples
6 [11]
Character
Zernike 7 RBF 91.8
moments
10 96.8
Base
DWT 120 60 BPN 92
Character
7 [14] --- ----
Consonant
94
conjuncts
42 Kannada Character Recognition System …