
Offline Handwritten Character Recognition

A Thesis submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology

by
ANSHUL GUPTA
(07010206)
MANISHA SRIVASTAVA
(07010226)

Department Of Electronics And Communication Engineering

INDIAN INSTITUTE OF TECHNOLOGY GUWAHATI


ASSAM, INDIA - 781039
April, 2011

Certificate
This is to certify that the work reported in this thesis entitled Offline Handwritten Character
Recognition, in partial fulfilment of the requirements for the award of the Degree of Bachelor
of Technology, is submitted by Anshul Gupta (07010206) and Manisha Srivastava (07010226) in
the Department of Electronics and Communication Engineering, Indian Institute of Technology
Guwahati, under the supervision of Dr. Chitralekha Mahanta, Department of ECE, IIT Guwahati. The matter embodied in this thesis has not been submitted elsewhere for the award of any
other degree.
Place :
Date :
(Supervisor Signature)
Dr. Chitralekha Mahanta
Associate Professor
Dept. of Electronics and Communication Engineering
Indian Institute of Technology, Guwahati

Acknowledgment

First and foremost, we would like to take this opportunity to express our deepest and
sincere gratitude to our thesis supervisor. We value the freedom she gave us to carry out
research in the field of our interest, and we sincerely thank her for that. Her stimulating
suggestions and encouragement helped us throughout the research and the writing of this thesis.
We are very thankful for her continuous help and support during the entire semester.
Finally, we would like to thank our parents, siblings and friends for their immense
love and support during our entire student life.

Abstract

Character Recognition (CR) has been an active area of research and, owing to its diverse
application environments, it continues to be a challenging research topic. In this project, we
focus specifically on off-line recognition of handwritten English words. The main approaches
for off-line cursive word recognition can be divided into segmentation-based and holistic
ones. The holistic approach is used for recognition of limited-size vocabularies, where global
features extracted from the entire word image are considered. As the size of the vocabulary increases, the complexity of the algorithms also increases linearly due to the need for a
larger search space and a more complex pattern representation. Additionally, the recognition rates decrease rapidly due to the decrease in interclass variances in the feature space.
The segmentation-based strategies, on the other hand, employ bottom-up approaches, starting from the stroke or character level and working towards producing a meaningful text. With
the help of the segmentation stage, the problem is reduced to the recognition of simple
isolated characters or strokes, which can be handled for an unlimited vocabulary. Here we
adopt segmentation-based character recognition using neural networks. A number of techniques
are available for feature extraction and training of CR systems, each with its own strengths and weaknesses. We explore these techniques in order to obtain a good
recognition rate.

Introduction

It is challenging to develop a practical cursive handwritten CR system that maintains
high recognition accuracy independent of the quality of the input documents. Very often
adjacent characters touch or overlap.
Therefore, in the segmentation-based strategy, it is essential to segment a given string correctly into
its character components. The complexity of character segmentation stems from the wide variety of
fonts, rapidly expanding text styles and poor image characteristics. Touched, overlapped, separated,
and broken characters are major factors for causing segmentation errors. In most of the existing
segmentation algorithms, human writing is evaluated empirically to deduce rules. Sometimes the rules
derived are satisfactory but there is no guarantee for their optimum results in all styles of writing.
Moreover human writing varies from person to person and even for the same person depending on
mood, speed, environment etc. On the other hand researchers have employed techniques like artificial
neural networks, hidden Markov models and statistical classifiers to extract rules based on numerical
data.
Another crucial module is a cursive character classifier for scoring individual characters. It has to
cope with the high variability of cursive letters and their intrinsic ambiguity (letters like e and l,
or u and n, can have the same shape). The features that are used for training the neural net classifier
also play a very important role. The choice of a good feature vector can significantly enhance the
performance of a character classifier, whereas a poor one can degrade its performance considerably.
A generic character recognition system is shown in Figure 1. Its different stages are as given
below:
Input: Samples are read to the system through a scanner.
Preprocessing: Preprocessing converts the image into a form suitable for subsequent processing
and feature extraction.
Segmentation: The most basic step in CR is to segment the input image into individual glyphs.
This step separates out sentences from text and subsequently words and letters from sentences.
Feature extraction: Extraction of features of a character forms a vital part of the recognition
process. Feature extraction captures the vital details of a character.
Classification: During classification, a character is placed in the appropriate class to which it
belongs.
Post Processing: Combining the CR techniques either in parallel or series.

Figure 1: System Block Diagram: Off-line Handwritten character Recognition

History of character recognition system

The very first effort in the direction of CR was made by Tyuring, who attempted to develop an aid
for the visually handicapped [1]. The first character recognizers appeared around the 1940s. The early
works were concentrated either upon machine-printed text or upon a small set of well-separated handwritten text or symbols. Machine-printed CR generally used template matching and for handwritten
text, low-level image processing techniques were used on the binary image to extract feature vectors,
which were then fed to statistical classifiers [2],[3],[4]. A good survey of the CR techniques used until
1980s can be found in [5]. The period from 1980 - 1990 witnessed a growth in CR system development due to rapid growth in information technology [6],[7],[8]. Structural approaches were initiated in
many systems in addition to the statistical methods [9],[10]. The syntactic and structural approaches
require efficient extraction of primitives [11]. Chan et al. [12] discussed a structural approach for
recognizing on-line handwriting. The recognition process starts with a sequence of points from the
user and then uses these points to extract the structural primitives. These primitives include different
types of line segments and curves. But there existed an upper limit in the recognition rate, because
CR research focused mainly on shape recognition techniques without using any semantic information. A historical review of CR research and development during 1980-1990 can be found in
[13] and [14] for the off-line and on-line cases, respectively.
After 1990, image processing techniques and pattern recognition were combined using artificial intelligence. Along with powerful computers and more accurate electronic equipment such as scanners,
cameras and electronic tablets, came the efficient use of modern methodologies such as neural networks (NNs), hidden Markov models (HMMs), fuzzy set reasoning, and natural language processing.
The 1990s systems for the machine-printed off-line [15],[16] and limited vocabulary, user-dependent
on-line handwritten characters [17],[18] were satisfactory only for restricted applications.
Although research on recognizing isolated handwritten characters has been quite successful, recognizing off-line cursive handwriting has been found to be a challenging problem. There is a large corpus
of research on the application of character recognition in different domains, but no system to date
has achieved the goal of system acceptability.

Applications

One application of CR systems is handwritten word recognition. Current research aims at developing
constrained systems for limited domain applications such as postal address reading, check sorting, tax
reading, and office automation for text entry. Since we can make use of the entire word at once, it is
possible to exploit correlations between adjacent characters. One way to do this is through contextual
knowledge of syntax and a dictionary of possible words, which has been shown to be successful for
reading handwritten address information of postmarked mail. Another potential application of CR
systems is in script recognition. CR systems also find applications in newly emerging areas, such
as development of electronic libraries, multimedia database, and systems which require handwriting
data entry.

4 Methodology Used

4.1 Segmentation

Most of the existing CR systems threshold the gray-level image and normalize the slant angle and
baseline skew in the preprocessing stage. Then, they employ the normalized binary image in the
segmentation and recognition stages [19, 20, 21]. However, in some cases, normalization may severely
deform the writing generating improper character shapes. Furthermore, through the binarization of
the gray scale document image, useful information is lost. In order to avoid the limitation of binary
image, some recent methods use gray-level image [22]. There, however, the insignificant details
suppress important shape information. The method used in this project for segmentation is similar
to that in [23] which employs an analytic approach on gray-level image supported by binary image
and a set of global features.
4.1.1 Heuristic Based Segmentation

4.1.1.1 Global Feature Estimation: In this stage, the input image is first binarized using a global
threshold. The following operations are then performed on the binarized image.
4.1.1.1.1 Stroke Width and Height Estimation: Stroke width estimation is a two-scan procedure. The first scan over each row of the binary image builds the stroke width histogram by
counting the black pixel runs in the horizontal direction. The mean width, estimated over all of the
rows, is then taken as the upper bound (maximum width) for the run length of the strokes. The second
scan over the stroke width histogram discards those strokes whose run length is greater than the maximum
width. Finally, the stroke width of the input word image is estimated as the average width of the
strokes retained in the second scan. To estimate the stroke height, which is assumed to be the average height of the vertical strokes in the writing, a similar algorithm is used with the scanning procedure
applied in the vertical direction. A minimum height is estimated instead of a maximum width, and in the second
scan those runs whose lengths are smaller than the minimum height are discarded.
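The two-scan stroke width estimate can be sketched as follows. This is a minimal illustration and not the thesis implementation: it assumes the binary image is a list of rows with 1 for black and 0 for white, and the function names are hypothetical.

```python
def horizontal_runs(image):
    """Collect the lengths of all horizontal runs of black (1) pixels."""
    runs = []
    for row in image:
        length = 0
        for pixel in row:
            if pixel == 1:
                length += 1
            elif length:
                runs.append(length)
                length = 0
        if length:                      # run that touches the right border
            runs.append(length)
    return runs


def estimate_stroke_width(image):
    """Two-scan estimate: mean run length as upper bound, then re-average."""
    runs = horizontal_runs(image)       # first scan: run-length histogram
    if not runs:
        return 0.0
    max_width = sum(runs) / len(runs)   # mean width used as the upper bound
    kept = [r for r in runs if r <= max_width]  # second scan: drop long runs
    return sum(kept) / len(kept)


image = [
    [1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
]
print(estimate_stroke_width(image))
```

In this toy image the long run of eight pixels (a horizontal stroke) exceeds the mean run length and is discarded, so the estimate reflects only the two thin strokes.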
4.1.1.1.2 Slant Angle Detection: Slant is the deviation of the strokes from the vertical direction, depending on writing style. In many handwriting recognition studies, slant correction is applied
before the segmentation and recognition stages. However, this correction produces serious deformation in
characters. In [24] no slant correction was applied, but the slant angle was used later in the segmentation
stage. For slant angle estimation, we have used the method of [25]. The method involves rotating the image from
-45° to 45°. The horizontal projection is taken at each rotation to calculate the Wigner-Ville distribution (WVD, a joint function of time and frequency). The angle that gives the maximum
intensity after applying the WVD is taken as the estimated slant angle.
4.1.1.1.3 Baseline extraction : Locations of upper and lower baselines determine the existence
of ascending and descending characters in a given word image. Baseline information is used in
segmentation in order to avoid problems introduced by ascending and descending portions of the
characters. In [24], a new baseline extraction algorithm has been proposed. First, a preliminary
centerline for each word image is determined by finding the horizontal line with the highest number
of black pixel runs. Then, the local minima below the preliminary centerline are identified, eliminating
the ones on the ascending part. The goal is to find the best fit to the local minima with a high
contribution from the normal characters and low contribution from descending characters. A weight
is computed for each minimum by considering the average angle between that minimum and the
rest of the minima. This approach assumes relatively small average angles among the minima of
normal characters compared to the average angle between a descending minimum and normal minima,
independent of the writing style. Finally, a line-fitting algorithm is performed over the weighted local
minima. To locate the upper baseline, the local maxima above the lower baseline are identified and
their distances from the lower baseline are calculated. The ones whose distance is less than the estimated
stroke height are pruned. Next the remaining distances are clustered in two classes and a line parallel
to the lower baseline is drawn, which passes from the mean value of the class, which includes the local
maxima with smaller distances. The center baseline is a parallel line with equal distance from the
upper and lower baseline.
4.1.1.2 Determination of Segmentation Regions : The segmentation regions carry the potential
segmentation boundaries between the connected characters. The first step is to partition each word
image into stripes along the slant angle direction, each of which contains a potential segmentation
boundary. The rules applied on the binary word image for identifying the segmentation regions are
based on the fact that a single maximum above the center baseline indicates a single character or a
portion of a character whereas the region between the two adjacent local maxima carries a potential
segmentation boundary.
Determination of the segmentation regions in each word image is accomplished in three steps:
Step 1: A straight line is drawn in the slant angle direction from each local maximum to the top
of the word image. However, there may be an ascender character on this path, which should
be avoided. While going upward in the slant direction, if any contour pixel is hit, this contour is
followed until the slope of the contour changes to the opposite direction, which marks the end of
the character. The direction of contour following is selected as the opposite of the relative
position of the local maximum with respect to the first contour pixel hit by the slanted straight
line. After this, a line is drawn from the maximum to the top of the word image in the slant
direction.
Step 2: In this step, a path in the slant direction from each maximum to the lower baseline is drawn.
However, the algorithm avoids passing through white pixels by selecting a black pixel, as long
as there is one in either the left or right neighborhood of the white pixel.
Step 3: A process similar to the one in the first step is performed in order to determine the path from
the lower baseline to the bottom of the word image. In this case, the aim is to find a path which
does not cut any part of a descending character.
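4.1.1.3 Segmentation Path: The problem of segmentation can be represented as finding the shortest path from the top row to the bottom row, which minimizes the cumulative cost function given
in [24]:

\begin{align}
\mathrm{Cost} &= \sum_{1<i<H} \sigma_{(i,j)(i+1,k)}, \tag{1} \\
& \qquad j-1 \le k \le j+1, \tag{2}
\end{align}

where

\begin{align}
\sigma_{(i,j)(i+1,k)} &= \left( \frac{H + (H-i)}{H} \right) I_{i+1,k} + SW \cdot C_{i+1,k}, \tag{3} \\
& \qquad 1 \le i \le H, \quad j-1 \le k \le j+1, \tag{4}
\end{align}
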
where $I_{i,j}$ and $C_{i,j}$ are the values of the pixel (i, j) in the gray-level and boundary images respectively,
with zero corresponding to white and one corresponding to black. SW is the estimated stroke width,
i is the y-coordinate and j is the x-coordinate of a pixel. The above cost function forces the shortest
path to pass through white pixels without cutting any character boundary, if possible.
The intensity values in the gray-level image are weighted by the y-coordinate of the pixel: between two pixels
with the same intensity value, the pixel with the lower y-coordinate is given the higher
weight. If the path has to cut a stroke in the segmentation region, it therefore cuts the stroke
closest to the bottom. The character contours in the boundary image are represented by black
pixels and weighted by the estimated stroke width. The weight given to each boundary pixel forces
the path to cut the minimum number of strokes. Therefore, the segmentation path is optimal when it
goes through the common stroke, thus separating two joined characters. The algorithm constrains the
vertices that can be reached from $v_{i,j}$ to $v_{i+1,j-1}$, $v_{i+1,j}$, and $v_{i+1,j+1}$. This avoids cuts in
the horizontal direction and moves in the opposite direction. A dynamic programming algorithm
then searches for the shortest path (segmentation path) from the top row to the bottom row.
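The dynamic programming search can be sketched as below. This is a hedged illustration under stated assumptions, not the implementation of [24]: images are lists of rows with values in [0, 1] (0 = white, 1 = black), rows are 0-based in the code while the text uses 1-based rows, a path may start at any column of the top row at zero cost, and the function name is hypothetical. The step cost follows the description above: the gray value of the next pixel weighted by (H + (H - i))/H, plus the stroke width times the boundary value.

```python
def segmentation_path(gray, boundary, stroke_width):
    """Return (cost, columns) of the cheapest top-to-bottom path.

    columns[i] is the column visited in row i; consecutive columns differ
    by at most one, so the path never moves horizontally or upward.
    """
    H, W = len(gray), len(gray[0])
    INF = float("inf")
    cost = [[INF] * W for _ in range(H)]
    back = [[-1] * W for _ in range(H)]
    cost[0] = [0.0] * W                      # assumption: free start anywhere
    for i in range(H - 1):
        weight = (H + (H - (i + 1))) / H     # i + 1 is the 1-based row index
        for j in range(W):
            for k in (j - 1, j, j + 1):      # only the three lower neighbours
                if 0 <= k < W:
                    step = (weight * gray[i + 1][k]
                            + stroke_width * boundary[i + 1][k])
                    if cost[i][j] + step < cost[i + 1][k]:
                        cost[i + 1][k] = cost[i][j] + step
                        back[i + 1][k] = j
    end = min(range(W), key=lambda k: cost[H - 1][k])
    path = [end]
    for i in range(H - 1, 0, -1):            # follow back-pointers upward
        path.append(back[i][path[-1]])
    path.reverse()
    return cost[H - 1][end], path


gray = [[1, 0, 1],
        [1, 0, 1],
        [1, 0, 1]]
boundary = [[0, 0, 0]] * 3
total, path = segmentation_path(gray, boundary, stroke_width=2)
print(total, path)
```

Because each row depends only on the row above it, the search costs O(H x W) operations per segmentation region.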

4.1.2 Neural Network Based Segmentation

In the word recognition system of [26], heuristic and intelligent methods are used for the segmentation of
real-world handwritten words.
The gray-level image is converted to a binary image. Slant detection similar to that used in heuristic-based
segmentation is employed, and slant correction is then performed. For both training and testing
phases, a heuristic feature detection algorithm is used to locate prospective segmentation points in
handwritten words. Each word is inspected in an attempt to locate characteristics representative of
segmentation points.

4.1.2.1 Segmentation using a heuristic algorithm : A simple heuristic segmentation algorithm was
implemented which scanned handwritten words for important features to identify valid segmentation
points between characters. The algorithm first scanned the word looking for minima or arcs between
letters, which are common in handwritten cursive script. For this, a histogram of vertical pixel densities is
calculated for each word. The histogram is obtained by calculating the total runs of vertical pixels for
each column of the word image where black pixels exist. The histogram is examined for minima (low
vertical pixel density), which may indicate the location of possible segmentation points in the word. In
many cases these arcs are the ideal segmentation points; however, in letters such as a and o
an erroneous segmentation point could be identified. Therefore the algorithm incorporated a
hole-seeking component which attempted to prevent invalid segmentation points from being found.
If an arc was found, the algorithm checked to see whether it had not segmented a letter in half,
by checking for a hole. Finally, the algorithm performed a final check to see if one segmentation
point was not too close to another. This was done by ascertaining if the distance between the last
segmentation point and the position being checked was equal to or greater than the average character
width of a particular word. If the segmentation point in question was too close to the previous one,
segmentation was aborted. Conversely, if the distance between the position being checked and the last
segmentation point was greater than the average character width, a segmentation point was forced.
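The histogram part of this heuristic can be sketched as follows. It is a simplified illustration under stated assumptions: the word image is a list of rows with 1 for black, the vertical density of a column is taken as its count of black pixels, and the hole check and the forced segmentation points are omitted. The function names and the `avg_char_width` parameter are hypothetical.

```python
def column_histogram(image):
    """Number of black (1) pixels in each column of the word image."""
    return [sum(column) for column in zip(*image)]


def candidate_points(image, avg_char_width):
    """Columns at local minima of the histogram, kept only when they are at
    least avg_char_width away from the previously accepted point."""
    hist = column_histogram(image)
    points = []
    last = 0                                  # left edge counts as a boundary
    for x in range(1, len(hist) - 1):
        is_minimum = hist[x] <= hist[x - 1] and hist[x] < hist[x + 1]
        if is_minimum and x - last >= avg_char_width:
            points.append(x)
            last = x
    return points


word = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1, 0, 1, 1],
]
print(candidate_points(word, avg_char_width=2))
```

Raising `avg_char_width` suppresses minima that fall too close to the previous segmentation point, which mirrors the final distance check described above.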
4.1.2.2 Manual Segmentation of the database: Since we did not have any database for handwritten
words we created our own database for the training of neural network segmentation. 26 words were
chosen that contained all the upper and lower case alphabets and then 10 different samples of each
word were taken on paper from different writers. The images were then scanned and preprocessed to
create a list of 260 words. Prior to ANN training, the heuristic feature detector was used to segment
all words. The segmentation points output by the heuristic feature detector were manually analyzed
so that the x coordinates can be categorized into correct and incorrect segmentation point classes.
For each segmentation point, a matrix of pixels representing the segmentation area was extracted and
stored in an ANN training file. The feature extractor breaks the segmentation point matrix down into
small windows of equal size (5x5) and analyses the density of black and white pixels. Therefore, instead
of presenting the raw pixel values of the segmentation points to the ANN, only the densities of each
window are presented. As an example, if a window contains 6 black pixels, then a
single value of 0.24 (number of black pixels/25) is written to the training file to represent the value of
the window. Accompanying each matrix, the desired output was also stored in the training file (0.1
for an incorrect segmentation point and 0.9 for a correct point), ready for ANN training.
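A minimal sketch of the window density features is given below. It assumes the segmentation area is a list of rows with 1 for black pixels and that its dimensions are multiples of the window size; the function names are hypothetical, and only the 0.9/0.1 target values come from the text.

```python
def window_densities(matrix, size=5):
    """Fraction of black pixels in each non-overlapping size x size window,
    scanned left to right, top to bottom."""
    rows, cols = len(matrix), len(matrix[0])
    features = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            black = sum(matrix[r + dr][c + dc]
                        for dr in range(size) for dc in range(size))
            features.append(black / (size * size))
    return features


def training_pair(matrix, is_correct):
    """Feature vector plus the desired output (0.9 correct, 0.1 incorrect)."""
    return window_densities(matrix), 0.9 if is_correct else 0.1


# A 5x10 area: the left window holds 6 black pixels, the right one none.
area = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(training_pair(area, is_correct=True))
```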
4.1.2.3 Training of the ANN : For this step, a multi-layer feed-forward Neural Network trained
with the backpropagation algorithm was used. The ANN was presented with the training pairs found
in the previous step.
4.1.2.4 Testing phase of the segmentation technique : Following ANN training, the words used for
testing are also segmented using the heuristic, feature-based algorithm. This time there is no manual
processing. The segmentation points are automatically extracted and are fed into the trained ANN.
The ANN then verifies which segmentation points are correct and which are incorrect. Finally, upon
ANN verification, each word used for testing should only contain valid segmentation points.
4.1.3 Results of segmentation

Figure 2: A: Neural network based segmentation (neural network used: MLP; configuration: [90 (single hidden layer)]; training algorithm: traingdx (Matlab)). B: Heuristic segmentation.

4.2 Feature Extraction

A compact and characteristic representation of the image is required in the CR systems. For this
purpose, a set of features is extracted for each class that helps distinguish it from other classes, while
remaining invariant to intra class differences [27]. A good survey on feature extraction methods for
CR can be found in [22].
The different representation methods can be categorized into three major classes:
1. Global Transformation and Series Expansion: includes Fourier Transform, Gabor Transforms,
wavelets, moments and Karhunen-Loève Expansion.
2. Statistical Representation: Zoning, Crossing and Distances, Projections.
3. Geometrical and Topological Representation: Extracting and Counting Topological Structures,
Geometrical Properties, Coding, Graphs and Trees etc.
We have used the following three features.
4.2.1 Gradient Features

The method is similar to the one presented in [28].

4.2.1.1 Skeletonisation: The skeletonisation process is applied to the binary pixel image. The
extra pixels which do not belong to the backbone of the character are deleted and the broad strokes
are reduced to lines one pixel thin. This creates uniformity across all the testing and training data.
4.2.1.2 Normalization and Compression: Since handwriting varies considerably between
persons, after the skeletonisation process we apply a normalization step, which
normalizes the character to a 32 x 32 pixel image that is used as the input of the neural network.
4.2.1.3 Gradient Feature Extraction: Each character is normalized to 32 x 32 size. The
Sobel operator is used to calculate the gradient. It uses two
templates to compute the gradient components in the horizontal and vertical directions, respectively.
The templates are shown below:

Figure 3: Horizontal and vertical templates for the Sobel operator.


The two gradient components at location (i, j) are calculated by:

\begin{align}
g_v(i,j) &= f(i-1,j+1) + 2f(i,j+1) + f(i+1,j+1) - f(i-1,j-1) - 2f(i,j-1) - f(i+1,j-1) \tag{5} \\
g_h(i,j) &= f(i-1,j-1) + 2f(i-1,j) + f(i-1,j+1) - f(i+1,j-1) - 2f(i+1,j) - f(i+1,j+1) \tag{6}
\end{align}

The gradient strength and the direction are calculated as:

\begin{align}
G(i,j) &= \sqrt{g_v^2(i,j) + g_h^2(i,j)} \tag{7} \\
\theta(i,j) &= \arctan\left( \frac{g_v(i,j)}{g_h(i,j)} \right) \tag{8}
\end{align}

In this way, using Eqs. (7) and (8), we calculate the gradient at every pixel of each character; the gradient direction lies between 0 and $2\pi$.
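The Sobel computation at a single interior pixel can be sketched as follows. This is an illustration only: `f` is assumed to be a list of rows of pixel values, the function name is hypothetical, and `atan2` reduced modulo 2π is used as one way (an assumption, not stated in the text) to obtain a direction in the range 0 to 2π from the two components.

```python
import math


def sobel_gradient(f, i, j):
    """Gradient strength and direction at interior pixel (i, j)."""
    gv = (f[i - 1][j + 1] + 2 * f[i][j + 1] + f[i + 1][j + 1]
          - f[i - 1][j - 1] - 2 * f[i][j - 1] - f[i + 1][j - 1])
    gh = (f[i - 1][j - 1] + 2 * f[i - 1][j] + f[i - 1][j + 1]
          - f[i + 1][j - 1] - 2 * f[i + 1][j] - f[i + 1][j + 1])
    strength = math.hypot(gv, gh)                  # sqrt(gv^2 + gh^2)
    direction = math.atan2(gv, gh) % (2 * math.pi)  # quadrant-aware arctan(gv/gh)
    return strength, direction


# A vertical edge: dark column on the right of the 3x3 neighbourhood.
patch = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
]
print(sobel_gradient(patch, 1, 1))
```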
4.2.2 Fourier Descriptor

The method adopted is similar to [29], where boundary detection is done first. Once a boundary image
is obtained, the Fourier descriptors are found. This involves finding the discrete Fourier coefficients
a[k] and b[k] for $0 \le k \le L-1$, where L is the total number of boundary points found, by applying
the equations:

\begin{align}
a[k] &= \frac{1}{L} \sum_{m=1}^{L} x[m]\, e^{-jk(2\pi/L)m} \tag{9} \\
b[k] &= \frac{1}{L} \sum_{m=1}^{L} y[m]\, e^{-jk(2\pi/L)m} \tag{10}
\end{align}

where x[m] and y[m] are the x and y coordinates, respectively, of the mth boundary point. As found
in the study [29], the descriptor produced using $r[k] = \sqrt{|a[k]|^2 + |b[k]|^2}$ is less effective than using the
moduli of the complex coefficients, |a[k]| and |b[k]|. The values for k = 0 are discarded as they only
contain information about the position of the image. The coefficients for high values of k describe
high-frequency features in the image but do not contain much information about the overall shape of
the character, so these high-frequency components are also discarded. Only the first five coefficients,
from k = 1 to k = 5, are considered.
Once the moduli of the coefficients have been found, the input vector is normalized to 1 to compensate
for image scaling. To spread the input data more evenly over the input space, the mean and standard
deviation vectors are found over the whole set of test and training data. The jth component of input
vector $i_p$ is calculated as:

\[
i_{pj} = \left( i^{o}_{pj} - \bar{i}^{o}_{j} \right) \left[ \alpha \left( \frac{1}{n^{o}_{j}} - 1 \right) + 1 \right] \tag{11}
\]

where $i^{o}_{pj}$ is the jth component of the original vector or pattern p, $\bar{i}^{o}_{j}$ is the mean of the jth components
of the original vectors and $n^{o}_{j}$ is the corresponding standard deviation. The coefficient $\alpha$ linearly controls
the degree of standard deviation compensation. If $\alpha = 0$, there is no compensation for variations of
standard deviation between dimensions; if $\alpha = 1$, the standard deviation of all dimensions is forced
to equal 1, giving full standard deviation compensation.
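The standard deviation compensation can be illustrated with the short sketch below. It assumes the compensation takes the form (value - mean) x (alpha x (1/std - 1) + 1), which gives the two limiting cases described above; the name `alpha` for the coefficient and the function name are hypothetical, and every dimension is assumed to have a nonzero standard deviation.

```python
import statistics


def compensate(vectors, alpha):
    """Centre each dimension and scale it by alpha * (1/std - 1) + 1."""
    dims = list(zip(*vectors))                        # one tuple per dimension
    means = [statistics.mean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]       # population std deviation
    return [[(v - m) * (alpha * (1.0 / s - 1.0) + 1.0)
             for v, m, s in zip(vec, means, stds)]
            for vec in vectors]


data = [[0.0, 10.0], [2.0, 30.0]]
print(compensate(data, alpha=0.0))   # centred only, spreads differ per dimension
print(compensate(data, alpha=1.0))   # every dimension scaled to unit deviation
```

Intermediate values of `alpha` interpolate linearly between the two behaviours.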
4.2.2.1 Fourier angle: It was also mentioned in [29] that if the moduli alone are not successful
in discriminating all the classes, experiments can be done to incorporate the angles in the training set as well.

4.2.2.2 Fourier magnitude [30]: The use of the FFT is not feasible if one seeks rotation- and shift-invariant
descriptors for the characters. Further, it has been observed that only the first few (say
10-15) Fourier coefficients are needed to adequately describe the various characters. Under these
conditions there is no computational advantage in using the FFT to evaluate the Fourier coefficients.
The Fourier coefficients derived from Eqs. (9) and (10) are not rotation or shift invariant (a shift
occurs if the starting point of boundary following is arbitrary). To
derive a set of Fourier descriptors that are invariant with respect to rotation and shift,
the following operations are defined. For each n, compute a set of invariant descriptors r[n] as:

\[ r[n] = \sqrt{|a[n]|^2 + |b[n]|^2} \tag{12} \]

It is easy to show that the r[n] are invariant to rotation or shift. A further refinement in the derivation of
the descriptors is realized if the dependence of r[n] on the size of the character is eliminated by computing
a new set of descriptors:

\[ s[n] = \frac{r[n]}{r[1]} \tag{13} \]

The Fourier coefficients |a[n]|, |b[n]| and the invariant descriptors s[n], n = 2, 3, ..., were derived for
all the character specimens and stored in files for application to reconstruction and recognition.
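The descriptors can be sketched with a direct (non-FFT) evaluation of the coefficients. This is an illustration under assumptions: the boundary is given as two lists of coordinates in traversal order, the standard DFT sign convention exp(-j k (2π/L) m) is used, and the function names are hypothetical.

```python
import cmath
import math


def fourier_coefficients(xs, ys, n_max):
    """Coefficients a[k], b[k] for k = 0..n_max by direct summation."""
    L = len(xs)
    a, b = [], []
    for k in range(n_max + 1):
        basis = [cmath.exp(-1j * k * (2 * math.pi / L) * m)
                 for m in range(1, L + 1)]
        a.append(sum(x * e for x, e in zip(xs, basis)) / L)
        b.append(sum(y * e for y, e in zip(ys, basis)) / L)
    return a, b


def invariant_descriptors(xs, ys, n_max):
    """s[n] = r[n] / r[1] for n = 2..n_max, with r[n] = sqrt(|a|^2 + |b|^2)."""
    a, b = fourier_coefficients(xs, ys, n_max)
    r = [math.sqrt(abs(ak) ** 2 + abs(bk) ** 2) for ak, bk in zip(a, b)]
    return [r[n] / r[1] for n in range(2, n_max + 1)]


# Boundary of a small square, and a copy that is rotated by 90 degrees,
# scaled by 3, translated, and traversed from a different starting point.
xs = [0, 1, 2, 2, 2, 1, 0, 0]
ys = [0, 0, 0, 1, 2, 2, 2, 1]
xs2 = [-3 * y + 5 for y in ys]
ys2 = [3 * x - 2 for x in xs]
xs2, ys2 = xs2[3:] + xs2[:3], ys2[3:] + ys2[:3]
print(invariant_descriptors(xs, ys, 4))
print(invariant_descriptors(xs2, ys2, 4))
```

The two printed lists agree up to rounding error, which is the invariance the descriptors are designed to provide.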

4.3 Training of classifiers

4.3.1 Neural Network (NN) based classifiers [31]

A neural network is a massively parallel distributed processor that has a natural propensity for storing
experiential knowledge and making it available for use. It resembles the brain in two respects:
1. It adapts through a learning process.
2. Knowledge is stored in the interconnections between neurons, known as synaptic weights.
Basically, learning is a process by which the free parameters (i.e., synaptic weights and bias levels) of a
neural network are adapted through a continuing process of stimulation by the environment in which
the network is embedded. The type of learning is determined by the manner in which the parameter
changes take place. Broadly, learning can be classified into two types:
1. Supervised Learning: This form of learning assumes the availability of a labeled (i.e., ground-truthed) set of training data made up of N input-output pairs.
2. Unsupervised Learning: This form of learning does not assume the availability of a set of training
data made up of N input-output pairs. The network learns to classify input vectors according to how they
are grouped spatially and tunes itself by considering a neighborhood.
In this project we consider MLP and RBF networks as classifiers based on supervised learning. We have used
the Matlab neural network toolbox for the implementation of these networks.
4.3.1.1 Multilayer Perceptron (MLP): This network is a feed-forward network because its structure does not contain any loops. As shown in Figure 4, a multilayer perceptron has an input layer of source
nodes and an output layer of neurons (i.e., computation nodes); these two layers connect the network
to the outside world. In addition to these two layers, the multilayer perceptron usually has one or
more layers of hidden neurons, which are so called because these neurons are not directly accessible.
The hidden neurons extract important features contained in the input data. Each input node is
connected to each node of the hidden layer by a synaptic weight. The input to a hidden node is the sum
of the input node values, each weighted by the synaptic weight of its connection to that hidden
neuron.

Figure 4: MLP structure.


There are many activation functions, out of which we selected tan-sigmoid, log-sigmoid and pure
linear:

1. Tan sigmoid: $\mathrm{tansig}(n) = \dfrac{2}{1 + \exp(-2n)} - 1$

2. Log sigmoid: $\mathrm{logsig}(n) = \dfrac{1}{1 + \exp(-n)}$

3. Pure linear: $\mathrm{purelin}(n) = n$
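For reference, the three activation functions can be written directly from their standard definitions, tansig(n) = 2/(1 + exp(-2n)) - 1, logsig(n) = 1/(1 + exp(-n)) and purelin(n) = n. The Python names below simply mirror the Matlab toolbox names; this is a plain re-statement, not the toolbox code.

```python
import math


def tansig(n):
    """Tan-sigmoid: output in (-1, 1); mathematically equal to tanh(n)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def logsig(n):
    """Log-sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))


def purelin(n):
    """Pure linear: output equals the input."""
    return n


for value in (-1.0, 0.0, 1.0):
    print(value, tansig(value), logsig(value), purelin(value))
```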
4.3.1.2 Radial Basis Function (RBF NN): A radial basis function NN (RBF NN) is a two-layer
network. It falls under the category of feed-forward networks, whose graphs have no loops. The basic
structure of an RBF network is given below:

Figure 5: RBF network.


A radial basis function is a real-valued function whose output depends only on the distance between
the input and a fixed point, either the origin or a center c:

\[ \phi(x, c) = \phi(\|x - c\|) \]

where $\|x - c\|$ is the norm, or distance, between the vector c (defined as the center of the radial basis
function) and the input x.
Different types of radial basis function:
1. Gaussian RBF:

\[ z(x) = \exp\left( -\frac{\|x - \mu\|^2}{\sigma^2} \right) \]

where $\mu$ is a vector defined as the center of the Gaussian function, $\sigma$ is the standard deviation of
the function, and $\|x - \mu\|$ is the distance between the input and the center of the function.

2. Logistic basis function :


(r) =

1
(1 + exp( r ))

where $\sigma$ is the standard deviation and r is the distance between the input and the origin.
The RBF network is built up as a linear combination of N radial basis functions with N distinct
centers. Given an input vector x, the output of the RBF network is the activity vector y given by

\[ y = \sum_{j=1}^{N} \lambda_j z_j \tag{14} \]

where $\lambda_j$ is the weight associated with the jth radial basis function, centered at $\mu_j$, and
$z_j = \phi(\|x - \mu_j\|)$. The output y approximates the corresponding set of target values.
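The output of such a network can be sketched as a weighted sum of Gaussian basis functions. This is an illustration under assumptions: a single scalar output, one shared width `sigma` for all basis functions, the Gaussian form exp(-||x - center||^2 / sigma^2), and hypothetical function names.

```python
import math


def gaussian_rbf(x, center, sigma):
    """exp(-||x - center||^2 / sigma^2) for vectors given as sequences."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / sigma ** 2)


def rbf_output(x, centers, weights, sigma):
    """Linear combination of the basis function activations."""
    return sum(w * gaussian_rbf(x, c, sigma)
               for w, c in zip(weights, centers))


centers = [(0.0, 0.0), (3.0, 4.0)]
weights = [2.0, 5.0]
print(rbf_output((0.0, 0.0), centers, weights, sigma=5.0))
```

An input that coincides with a center activates that basis function fully (value 1), while more distant centers contribute less.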

Figure 6: Complete RBF structure.

4.3.1.3 Training of neural network: For training the neural network we divide the training dataset
into two parts:

1. Estimation subset, used for training the model.
2. Validation subset, used for evaluating the model performance.
The network is finally tuned by using the entire set of training examples and then tested on test
data.Training of these networks is usually done by back-propagation algorithm. This algorithm consists of two phases:
1. Forward Phase: In this phase the free parameters of the network are fixed, and the input signal is propagated through the network, layer by layer. At the end of this phase an error signal is calculated between the predicted output of the network and the actual (target) output corresponding to the input sample presented.
2. Backward Phase: During this phase, the error signal $e_i$ is propagated through the network in the backward direction. It is during this phase that adjustments are applied to the network weights so as to minimize the error $e_i$ in a statistical sense; generally the mean squared error (MSE) criterion is used.
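
The two phases can be sketched in Python for a very small network. This is only an illustration under stated assumptions, not the Matlab training used in this work: one hidden layer of two log-sigmoid units, one linear output unit, no bias terms, plain gradient descent on the squared error without momentum, and hypothetical toy data and initial weights.

```python
import math

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

def forward(x, w_hidden, w_out):
    """Forward phase: weights stay fixed while the input is propagated layer by layer."""
    h = [logsig(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = sum(w * hj for w, hj in zip(w_out, h))  # linear output unit
    return h, y

def backward(x, h, y, target, w_hidden, w_out, lr):
    """Backward phase: propagate the error back and adjust the weights in place."""
    e = target - y
    for j, hj in enumerate(h):
        # Error reaching hidden unit j, computed with the old output weight.
        delta_j = e * w_out[j] * hj * (1.0 - hj)
        w_out[j] += lr * e * hj
        for i, xi in enumerate(x):
            w_hidden[j][i] += lr * delta_j * xi
    return e

def mse(samples, w_hidden, w_out):
    """Mean squared error of the network over a list of (input, target) pairs."""
    return sum((t - forward(x, w_hidden, w_out)[1]) ** 2 for x, t in samples) / len(samples)

# Hypothetical toy data, split into estimation and validation subsets.
estimation = [([0.0, 0.0], 0.5), ([0.0, 1.0], 0.9), ([1.0, 0.0], 0.2), ([1.0, 1.0], 0.6)]
validation = [([0.5, 0.5], 0.55)]

w_hidden = [[0.1, -0.2], [0.3, 0.4]]  # fixed (non-random) initial weights
w_out = [0.5, -0.5]

initial_mse = mse(estimation, w_hidden, w_out)
for epoch in range(200):
    for x, target in estimation:
        h, y = forward(x, w_hidden, w_out)
        backward(x, h, y, target, w_hidden, w_out, lr=0.1)
final_mse = mse(estimation, w_hidden, w_out)
validation_mse = mse(validation, w_hidden, w_out)
print(initial_mse, final_mse, validation_mse)
```

The estimation subset drives the weight updates, while the validation subset is only evaluated, which is what allows over-learning to be detected.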


Figure 7: Back propagation Network.

4.3.1.4 Classification using neural networks: In classification problems, the purpose of the network is to assign each input to one of the classes. Each of the output units has a continuous activation value between 0.0 and 1.0. In order to definitely assign a class from the outputs, the network must decide whether the outputs are reasonably close to 0.0 or 1.0; if they are not, the class is regarded as undecided. Confidence levels (the accept and reject thresholds) decide how to interpret the network outputs.
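
One possible reading of this decision rule is sketched below in Python. The function name decide_class and the threshold values 0.95 and 0.05 are hypothetical, chosen for illustration; they are not the values used in our system.

```python
def decide_class(outputs, accept=0.95, reject=0.05):
    """Return the index of the winning class, or None if the outputs are undecided.

    A class is assigned only when exactly one output is at or above the accept
    threshold and every other output is at or below the reject threshold.
    """
    high = [i for i, o in enumerate(outputs) if o >= accept]
    low = [i for i, o in enumerate(outputs) if o <= reject]
    if len(high) == 1 and len(low) == len(outputs) - 1:
        return high[0]
    return None

print(decide_class([0.02, 0.97, 0.01]))  # one clear winner
print(decide_class([0.40, 0.60, 0.10]))  # no output is close enough to 0.0 or 1.0
```

Loosening the thresholds lets more samples be classified, while tightening them leaves more samples undecided.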

4.3.2 Support vector machine (SVM) [32]

Support Vector Machines are based on the concept of decision planes that define decision boundaries. A decision plane is one that separates sets of objects having different class memberships. Most classification tasks, however, cannot be solved with a simple straight line or flat plane, and more complex structures are often needed in order to make an optimal separation, i.e., to correctly classify new objects (test cases) on the basis of the examples that are available (train cases). For example, in the figure below, separating the GREEN and RED objects would require a curve (which is more complex than a line).

Figure 8: Complex classification problem.


Support Vector Machines are particularly suited to handle such tasks.
The illustration below shows the basic idea behind Support Vector Machines. Here we see the original objects (left side of the schematic) mapped, i.e., rearranged, using a set of mathematical functions known as kernels. The process of rearranging the objects is known as mapping (transformation). Note that in this new setting the mapped objects (right side of the schematic) are linearly separable and, thus, instead of constructing the complex curve (left schematic), all we have to do is find an optimal line that can separate the GREEN and the RED objects.
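
The effect of such a mapping can be illustrated with a small Python sketch. It does not train an SVM: the feature map (x1, x2) -> (x1, x2, x1^2 + x2^2), the sample points and the separating plane are all hand-picked assumptions, chosen only to show that points which need a curve in the original space become separable by a plane after the mapping.

```python
def feature_map(p):
    """Map a 2-D point to 3-D by appending its squared distance from the origin."""
    x1, x2 = p
    return (x1, x2, x1 * x1 + x2 * x2)

def linear_decision(z, w=(0.0, 0.0, 1.0), b=-1.0):
    """Signed value of the hand-chosen plane w.z + b = 0 in the mapped space."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b

# Hypothetical data: GREEN points lie inside the unit circle, RED points outside,
# so no straight line separates them in the original 2-D space.
green = [(0.1, 0.2), (-0.3, 0.1), (0.0, -0.4)]
red = [(1.5, 0.0), (-1.2, 1.0), (0.0, -2.0)]

for label, points in (("GREEN", green), ("RED", red)):
    for p in points:
        side = "negative" if linear_decision(feature_map(p)) < 0 else "positive"
        print(label, p, side)
```

In an actual SVM the plane is not chosen by hand but found by maximizing the margin between the classes, and the mapping is applied implicitly through the kernel function.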


Figure 9: Classification using SVM.


The Support Vector Machine (SVM) is primarily a classifier that performs classification tasks by constructing hyperplanes in a multidimensional space that separate cases of different class labels.

4.4 Testing results of MLP, RBF and SVM on the extracted features

The Fourier features with phase, ||a(k)|| and ||b(k)|| are used for the comparison of classifiers.

4.4.1 Performance of Neural Network classifiers:

Figure 10: A: MLP with structure [80 (first hidden) 50 (second hidden) 50 (third hidden)]; algorithm used: gradient descent with momentum (traingdx of Matlab); learning rate: adaptive with initial value 0.2; momentum: 0.9. Results are very bad on the training set. B: RBF. Results are good on the training data, but over-learning is high, hence bad results on the test data.

4.4.2 Performance of Support vector machine classifiers:

In the case of SVM, the recognition rate on the training data is 98.86%, i.e. the training set is learned almost perfectly. The recognition rate on the testing data is 62.93%.
On the test data SVM outperforms the other two networks.
4.4.3 Comparison between all four feature vectors with SVM:

Now we have to pick the best feature extraction technique for our system. For that we tested SVM with different feature vectors. The table below shows the recognition rate (%) for all four feature vectors.

Feature vector                                              Recognition rate
Fourier with magnitude (s(k)), |a(k)| and |b(k)|            86.66%
Fourier with phase, |a(k)| and |b(k)|                       98.74%
Fourier with magnitude (s(k)), |a(k)|, |b(k)| and phase     98.04%
Gradient features                                           40.50%

4.5 Post Processing: Combining the CR techniques

Fusion is one of the powerful methods for improving the recognition rates produced by various techniques. It takes advantage of the different errors produced by different techniques, emphasizing the strengths and avoiding the weaknesses of the individual techniques. Researchers have found that in many real-world applications it is better to fuse multiple techniques to improve the results. Fusion can be done in the following two ways:
Serial Architecture: In this method the output of one classifier is fed into the next classifier. There are four basic methodologies used, viz. sequential, selective [33], boosting [34] and cascade [35] methodologies.
Parallel Architecture: This method combines the results of more than one independent algorithm by using one of the following methodologies: voting, Bayesian [36], Dempster-Shafer theory [37], behavior-knowledge space [38], mixture of experts [39] and stacked generalization.
Here we use a method based on the Borda count, inspired by [40], to combine the following results:
Technique 1: SVM on moduli of Fourier coefficients ||a(k)|| and ||b(k)|| and magnitude s(k).
Technique 2: SVM on moduli of Fourier coefficients ||a(k)|| and ||b(k)|| and phase.
Technique 3: SVM on moduli of Fourier coefficients ||a(k)|| and ||b(k)||, phase and magnitude s(k).
4.5.1 Conventional Borda Count

The conventional Borda count for a string in the lexicon is defined as the sum, taken over the ranked lexicons produced by the various techniques, of the number of strings that are ranked below that string [40].
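
A minimal Python sketch of this definition is given below. The rankings are hypothetical (best candidate first), and a word that is missing from a ranking is assumed to contribute nothing for that technique.

```python
def borda_count(word, rankings):
    """Sum, over all techniques, of the number of strings ranked below `word`."""
    total = 0
    for ranking in rankings:
        if word in ranking:
            # Strings below `word` = list length minus its 1-based position.
            total += len(ranking) - 1 - ranking.index(word)
    return total

# Hypothetical ranked lexicons from three techniques, best candidate first.
rankings = [
    ["puzzle", "climate", "rolled"],
    ["puzzle", "rolled", "climate"],
    ["climate", "puzzle", "rolled"],
]
for word in ("puzzle", "climate", "rolled"):
    print(word, borda_count(word, rankings))
```

The word with the highest count is taken as the combined decision.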
4.5.2 Modified Borda Count

A rank is assigned and used in the calculation of the Borda count, instead of counting the number of strings below the string to be recognized. The rank for a particular string can be calculated using the following formula:

$\text{Rank} = 1 - \dfrac{\text{position of the string in the top } N \text{ strings}}{N}$

The rank is 0 if the string is not in the top N choices.


We have taken N = 3. Therefore only the top three words from each technique are considered when calculating the rank. Secondly, the confidence values produced by the different techniques are considered. The confidence value for all three predicted words of any given technique is the confidence that the classifier has in its predicted string, even if that string is not a valid lexicon word. This can be estimated by summing up the scores of the predicted characters. This is reasonable because the top three strings are chosen based on their similarity with the predicted string. The similarity between the predicted string and the lexicon words is found from the number of matching characters and their relative positions.
$\text{Final Borda count of a lexicon word} = (\text{rank} \times \text{confidence})_{\text{tech1}} + (\text{rank} \times \text{confidence})_{\text{tech2}} + (\text{rank} \times \text{confidence})_{\text{tech3}}$
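
The modified count can be sketched in Python as follows. Two points are assumptions made only for this sketch: the position of a string in the top-N list is counted from 0, so that the best string gets rank 1, and each technique carries a single confidence value for its whole top-N list. The word lists and confidence values are hypothetical.

```python
def rank(word, top_words, n=3):
    """Rank = 1 - position / N for a word in the top-N list, and 0 otherwise."""
    top = top_words[:n]
    if word not in top:
        return 0.0
    return 1.0 - top.index(word) / n  # position counted from 0 (assumption)

def modified_borda(word, techniques, n=3):
    """Sum of rank * confidence over all techniques."""
    return sum(rank(word, top, n) * confidence for top, confidence in techniques)

# Hypothetical output of three techniques: (top-3 lexicon words, confidence).
techniques = [
    (["puzzle", "purple", "pulled"], 1.2),
    (["puzzle", "pulled", "purple"], 1.2),
    (["climate", "puzzle", "purple"], 2.17),
]
lexicon = ["climate", "pulled", "purple", "puzzle"]
scores = {word: modified_borda(word, techniques) for word in lexicon}
best = max(scores, key=scores.get)
print(scores)
print(best)
```

In this example one technique prefers a different word with a higher confidence, yet the combined count still selects the word that is supported by all three techniques.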


Final Results

Figure 11: Result on Moderated.


Figure 12: Result on Puzzle.


Figure 13: Result on Rolled.

Discussions

In the case of "Moderated", the neural network segmentor failed to segment "te". This is understandable, because it treated the pair as a hole owing to the way in which this pair of characters was written. The outputs of the three different techniques are "MOrerlmd", "MOGeraED" and "MOrerlmd", which have very little similarity with the word "Moderated". This error is due to the low discriminative ability of Fourier features and their combinations in our case, where they have to distinguish 52 different classes. The error is corrected in the post-processing step, where the Borda count over all three parallel techniques is highest for the lexicon word "Moderated". Hence, the system outputs the correct word "Moderated".
In the case of "Puzzle", "u" is incorrectly segmented into two. The outputs of the three different techniques are "PzZzfe", "PCZZfc" and "PsZzme", which have very little similarity with the word "Puzzle". Again, this is due to the low discriminative ability of Fourier features. Here the output of two techniques is "Puzzle" with confidence 1.2 each, while the third technique predicted "Climate" with confidence 2.17. This error is corrected when the Borda counts of all three techniques are combined, which gives the highest score to the word "Puzzle". Hence, the system outputs the correct word "Puzzle".
In the case of "Rolled", segmentation is perfect but the outputs of the three techniques are quite different from the word "Rolled". However, on combining the results of the three parallel techniques, the score for the word "Rolled" is highest; hence the system outputs the correct word.

Conclusions

We thus conclude that the proposed system gives fairly good results on the test samples that were presented to it. We could not list the recognition accuracy as a percentage because we did not have enough test samples. We tested both heuristic and neural network based segmentation and found that the latter gave better results. This is reasonable because the heuristic algorithm is based on rules that are deduced empirically, and there is no guarantee that they give optimum results for different styles of writing. So their validation using a neural network becomes essential. Moreover, our character recognition network has 52 output classes, whereas most of the literature uses separate classifiers for upper and lower case characters. We tested different neural networks that have been used in the past for character recognition. We tried different configurations of MLP with up to 3 hidden layers, and the best results were obtained with the [80 50 50] configuration, with a validation performance of 0.01 in 640 epochs. The training algorithm used in this case was gradient descent with momentum (traingdx of Matlab). Also, we tested an RBF neural network and got a performance (MSE) of 0.0010155 in 1800 epochs. This network suffered from over-learning and gave poor results on test data. Apart from neural networks, we tried a Support Vector Machine classifier on the same feature set and achieved 98.86% classification accuracy on the training data set and 62.93% on the test data set. Finally, we selected SVM as it outperformed MLP and RBF. For feature extraction we started with gradient features, which in our case produced very poor results. We tried Fourier features like moduli of Fourier coefficients, magnitude, phase and their various combinations as feature vectors. We got the best results with moduli of Fourier coefficients and phase, with a recognition accuracy of 98.74% on the training data set. We have used three combinations of Fourier descriptors in our final system. Post-processing which uses a lexicon becomes imperative, as there is no other way to find the errors that have crept in at any of the previous stages. The only way to do that is to verify whether the predicted word is a valid lexicon word or not. Thus, incorporating this in our final system using the Borda count improved the overall efficiency of the system.

Future work

The performance of neural network based segmentation can be improved by using a larger database. More research can be done to come up with a better feature vector that incorporates transform-based, statistical and directional features for character recognition. SVM outperformed the other classifiers in the classification of characters because it performs classification tasks by constructing hyperplanes in a multidimensional space that separate samples of different class labels. Other recently developed techniques like Dempster-Shafer theory could be used for combining different CR techniques. Even in the case of the Borda count, other techniques can be explored which give a different confidence to each predicted lexicon word for a given classifier. Also, experiments can be done to give different weights to each of the parallel CR techniques according to their performance on the validation data.

References
[1] J. Mantas, "An overview of character recognition methodologies," Pattern Recognition, vol. 19, no. 6, pp. 425-430, 1986.
[2] T. S. El-Sheikh and R. M. Guindi, "Computer recognition of arabic cursive scripts," Pattern Recognition, vol. 21, no. 4, pp. 293-302, 1988.
[3] S. Mori, K. Yamamoto, and M. Yasuda, "Research on machine recognition of handprinted characters," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-6, no. 4, pp. 386-405, 1984.
[4] C. Suen, M. Berthod, and S. Mori, "Automatic recognition of handprinted characters - the state of the art," Proceedings of the IEEE, vol. 68, no. 4, pp. 469-487, 1980.
[5] C. Tappert, C. Suen, and T. Wakahara, "The state of the art in online handwriting recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 12, pp. 787-808, Aug. 1990.
[6] R. Bozinovic and S. Srihari, "Off-line cursive script word recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 11, pp. 68-83, Jan. 1989.
[7] V. Govindan and A. Shivaprasad, "Character recognition - a review," Pattern Recognition, vol. 23, no. 7, pp. 671-683, 1990.
[8] Q. Tian, P. Zhang, T. Alexander, and Y. Kim, "Survey: omnifont-printed character recognition," in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series (K.-H. Tzou and T. Koga, eds.), vol. 1606, pp. 260-268, Nov. 1991.
[9] A. Belaid and J.-P. Haton, "A syntactic approach for handwritten mathematical formula recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-6, no. 1, pp. 105-111, 1984.
[10] Y. Ding, F. Kimura, Y. Miyake, and M. Shridhar, "Evaluation and improvement of slant estimation for handwritten words," in Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on, pp. 753-756, Sept. 1999.
[11] S. M. Lucas, E. Vidal, A. Amiri, S. Hanlon, and J.-C. Amengual, "A comparison of syntactic and statistical techniques for off-line ocr," in Proceedings of the Second International Colloquium on Grammatical Inference and Applications, (London, UK), pp. 168-179, Springer-Verlag, 1994.
[12] K.-F. Chan and D.-Y. Yeung, "Recognizing on-line handwritten alphanumeric characters through flexible structural matching," 1999.
[13] S. Mori, C. Suen, and K. Yamamoto, "Historical review of ocr research and development," Proceedings of the IEEE, vol. 80, pp. 1029-1058, July 1992.
[14] C. Tappert, C. Suen, and T. Wakahara, "The state of the art in online handwriting recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 12, pp. 787-808, Aug. 1990.
[15] H. Avi-Itzhak, T. Diep, and H. Garland, "High accuracy optical character recognition using neural networks with centroid dithering," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 17, pp. 218-224, Feb. 1995.
[16] I. Bazzi, R. Schwartz, and J. Makhoul, "An omnifont open-vocabulary ocr system for english and arabic," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, pp. 495-504, June 1999.
[17] J. Hu, S. G. Lim, and M. K. Brown, "Writer independent on-line handwriting recognition using an hmm approach," Pattern Recognition, vol. 33, no. 1, pp. 133-147, 2000.
[18] A. Meyer, "Pen computing: a technology overview and a vision," SIGCHI Bull., vol. 27, pp. 46-90, July 1995.
[19] G. Kim and V. Govindaraju, "A lexicon driven approach to handwritten word recognition for real-time applications," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 19, pp. 366-379, Apr. 1997.
[20] M. Mohamed and P. Gader, "Handwritten word recognition using segmentation-free hidden markov modeling and segmentation-based dynamic programming techniques," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 18, pp. 548-554, May 1996.


[21] A. A. Atici and F. T. Yarman-Vural, "A heuristic algorithm for optical character recognition of arabic script," Signal Processing, vol. 62, no. 1, pp. 87-99, 1997.
[22] Ø. D. Trier, A. K. Jain, and T. Taxt, "Feature extraction methods for character recognition - a survey," Pattern Recognition, vol. 29, no. 4, pp. 641-662, 1996.
[23] S.-W. Lee, D.-J. Lee, and H.-S. Park, "A new methodology for gray-scale character segmentation and recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 18, pp. 1045-1050, Oct. 1996.
[24] N. Arica and F. Yarman-Vural, "Optical character recognition for cursive handwriting," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, pp. 801-813, June 2002.
[25] E. Kavallieratou, N. Fakotakis, and G. Kokkinakis, "Skew angle estimation for printed and handwritten documents using the Wigner-Ville distribution," Image and Vision Computing, vol. 20, no. 11, pp. 813-824, 2002.
[26] M. Blumenstein and B. Verma, "Neural-based solutions for the segmentation and recognition of difficult handwritten words from a benchmark database," in Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on, pp. 281-284, Sept. 1999.
[27] I.-S. Oh, J.-S. Lee, and C. Suen, "Analysis of class separation and combination of class-dependent features for handwriting recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, pp. 1089-1094, Oct. 1999.
[28] D. Singh, M. Dutta, and S. H. Singh, "Neural network based handwritten hindi character recognition system," in Proceedings of the 2nd Bangalore Annual Computer Conference, COMPUTE '09, (New York, NY, USA), pp. 15:1-15:4, ACM, 2009.
[29] I. P. Morns and S. S. Dlay, "Character recognition using fourier descriptors and a new form of dynamic semisupervised neural network," Microelectronics Journal, vol. 28, no. 1, pp. 73-84, 1997.
[30] M. Shridhar and A. Badreldin, "High accuracy character recognition algorithm using fourier and topological descriptors," Pattern Recognition, vol. 17, no. 5, pp. 515-524, 1984.
[31] S. Haykin, Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ, USA: Prentice Hall PTR, 1st ed., 1994.
[32] C. Wei, "Statsoft, inc., tulsa, ok.: Statistica, version 8," AStA Advances in Statistical Analysis, vol. 91, pp. 339-341, 2007. doi:10.1007/s10182-007-0038-x.
[33] S. Gopisetty, R. Lorie, J. Mao, M. Mohiuddin, A. Sorin, and E. Yair, "Automated forms-processing software and services," IBM J. Res. Dev., vol. 40, pp. 211-230, March 1996.
[34] H. Drucker, R. E. Schapire, and P. Simard, "Improving performance in neural networks using a boosting algorithm," in Advances in Neural Information Processing Systems 5, [NIPS Conference], (San Francisco, CA, USA), pp. 42-49, Morgan Kaufmann Publishers Inc., 1993.
[35] J. Park, V. Govindaraju, and S. Srihari, "Ocr in a hierarchical feature space," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 22, pp. 400-407, Apr. 2000.
[36] H.-J. Kang and S.-W. Lee, "Combining classifiers based on minimization of a bayes error rate," in Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on, pp. 398-401, Sept. 1999.
[37] L. Xu, A. Krzyzak, and C. Suen, "Methods of combining multiple classifiers and their applications to handwriting recognition," Systems, Man and Cybernetics, IEEE Transactions on, vol. 22, no. 3, pp. 418-435, 1992.
[38] Y. Huang and C. Suen, "A method of combining multiple experts for the recognition of unconstrained handwritten numerals," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 17, pp. 90-94, Jan. 1995.
[39] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, "Adaptive mixtures of local experts," Neural Comput., vol. 3, pp. 79-87, March 1991.
[40] B. Verma, P. Gader, and W. Chen, "Fusion of multiple handwritten word recognition techniques," Pattern Recognition Letters, vol. 22, no. 9, pp. 991-998, 2001.
