
Recognition of handwritten characters: a review

R H Davis and J Lyall

Department of Computer Science, Heriot-Watt University, 79 Grassmarket, Edinburgh EH1 1HJ, UK

In the quest to study human perception, the field of character recognition remains prominent. Research has progressed from early, cumbersome optical character readers used for the recognition of a limited set of individual characters to the present-day automatic recognition of handwritten cursive script. This paper attempts to clarify the fundamentals of character recognition, highlighting the processes involved in using a standard database for 'learning' character sets and also the standards and constraints imposed by researchers concerning the constitution of a valid character. A number of feature extraction techniques which enable individual characters to be recognized are discussed and compared.

Keywords: pattern recognition, handwritten characters, feature extraction

While much work in character recognition (CR) has been concentrated on multi-font and fixed-font reading, the most interesting field, scientifically, is that of recognition of handwritten characters. Machines which read fixed fonts, such as those on cheques and credit cards, are very reliable due to the constraints of using magnetic ink and the stylization of the character set. These readers perform a one-dimensional scan combined with waveform correlation, and recognition is achieved by correlating the resulting signal with stored waveforms.

Operating on a less idealized set of characters, a multi-font reader has to cope with noise factors such as paper quality and reflectivity, ribbon wear and type element alignment. Template matching is the most widely used approach to multi-font reading and, because ideally the number of acceptable fonts is finite, a prototype of each character can be stored for the purpose of matching. In the recognition process the character is optically scanned and converted into digital patterns before comparison with several prototypes and eventual recognition or rejection.

Much more difficult, and hence more interesting to researchers, is the ability to recognize handwritten characters automatically. The complexity is greatly increased by the noise problem and the almost infinite variability of handwriting. Noise arises from the writing instrument used, the quality of the paper or the pressure applied when writing, for example. Variability, on the other hand, is virtually certain, as even one person's handwriting will change depending upon surroundings, the mood of the person and the nature of the writing.

The recognition of handwritten characters can be subdivided into that of cursive script (joined writing) and printed script. Even humans, with the best optical readers available, have difficulty in recognizing some cursive script, and in this area of CR the greatest problem is that of segmentation of letters within the word and detection of features.

The operation of character recognition can be carried out online or offline. In the former case, characters are recognized as they are written. This approach necessitates the use of a graphics tablet, or a light pen, for the drawing of characters. Information, usually in the form of coordinates, is collected at regular time intervals as the tablet is moved. The tasks of preprocessing, feature extraction and classification follow, enabling either successful recognition, misrecognition or rejection of the character.

In an offline system, characters are recognized on completion of drawing. An optical scanner is required to obtain a digitized representation of the input character. The stages involved in optical character recognition (OCR) are shown in Figure 1.

In all recognition methods the fundamental task is that of extracting useful information from the image and then performing classification, usually by comparison with known results. Although the character may be represented in matrix form, or by line segments, the basic approach is the same - that of processing the character(s) and extracting features to compile what is known as a feature vector.




Figure 1. Stages in recognition of handprinted characters: input, digitization to a character matrix, preprocessing (smoothing, noise elimination and size normalization), feature extraction and identification of the character

The feature vector is the means by which the character is represented in the system, and is the basis of the recognition stage. Before any classification takes place it is usual to build up some prior knowledge of the character set to be tested, by supplying a learning set of characters and establishing feature vectors which can be compared with those of the characters of the testing set.

The stages in recognition by an OCR are:

• Input characters are read and digitized by an optical scanner.
• Each character is located and segmented from the data sheet under software control.
• The resulting matrix is fed into a preprocessor for smoothing (i.e. filling holes and breaks in line segments), noise elimination (removal of isolated dots) and size normalization.
• Distinctive features are then extracted, which are seen to be 'descriptive' of the character.
• Identification is performed on the basis of comparison between extracted features and statistics of features from a learning set of examples.
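The smoothing, noise elimination and size normalization steps listed above can be pictured with a minimal sketch. The Python/NumPy fragment below is illustrative only - the neighbour rules, the 24 x 24 target size and the function name are assumptions made for the example, not the procedure of any particular reader described here.

    import numpy as np

    def preprocess(char, out_size=24):
        """Minimal sketch: smooth, remove isolated dots and size-normalize
        a binary character matrix (1 = ink, 0 = background)."""
        img = np.asarray(char, dtype=np.uint8)

        # Smoothing: fill background pixels with three or more 4-connected ink
        # neighbours, a crude way of bridging small breaks in line segments.
        padded = np.pad(img, 1)
        neighbours = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                      padded[1:-1, :-2] + padded[1:-1, 2:])
        img = np.where((img == 0) & (neighbours >= 3), 1, img)

        # Noise elimination: delete isolated dots (ink pixels with no ink neighbour).
        padded = np.pad(img, 1)
        neighbours = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                      padded[1:-1, :-2] + padded[1:-1, 2:])
        img = np.where((img == 1) & (neighbours == 0), 0, img)

        # Size normalization: crop to the bounding box of the ink and rescale
        # to a fixed out_size x out_size matrix by nearest-neighbour sampling.
        rows, cols = np.nonzero(img)
        if len(rows) == 0:
            return np.zeros((out_size, out_size), dtype=np.uint8)
        box = img[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
        r_idx = np.linspace(0, box.shape[0] - 1, out_size).round().astype(int)
        c_idx = np.linspace(0, box.shape[1] - 1, out_size).round().astype(int)
        return box[np.ix_(r_idx, c_idx)]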
Early character recognition was hindered by three aspects¹:

• The adequate representation of a handprinted character requires at least 500 bits of memory (to enable a 20 x 25 matrix size).
• The cost in terms of computing power and time was high.
• There was a lack of scanning equipment.

A combination of the above forced many researchers to work on limited sets of characters, and consequently their results were very inconclusive.

With the limited set of characters proving to be a great drawback, some researchers established 'standard' databases, containing many variations of different characters, which could be used as both learning and testing sets. The two most commonly used are those created by Munson² and Highleyman³.

Munson's data was represented in 24 x 24 matrix form, and consisted of 46 characters of the FORTRAN II language. He obtained 12 760 samples, 6 762 of these from 49 authors, each of whom printed three sets of FORTRAN characters on standard coding sheets. The remaining characters were collected from individual authors, or from fragments of coding sheets. Munson instructed authors to slash the characters '0' and 'Z', to print '1' without serifs and to place horizontal bars on the letter 'I'.

Highleyman's data consisted of the 36 alphanumeric characters, represented by a 12 x 12 matrix. He obtained 1 800 letters and 500 numerals from 50 authors who were instructed to print neatly on 1/4 in quadruled paper, approximately filling these boxes.

Since Highleyman's data is of low resolution, poor recognition rates were achieved using his database. Although Munson's data varies widely in quality, it is still considered to be the most difficult set of almost unconstrained handprinted characters for machine recognition.

While researchers have been developing and designing recognition techniques, OCR committees in some countries have developed handprint standards in order to reduce the number of shape variations, resulting in less confusion over identification of characters and higher recognition rates.

A large proportion of optical readers in OCR industries can only recognise handprinted characters within the scope of numerals and a few letters, entered in boxes with or without guidelines. The writers are usually instructed to keep to the standards and models provided as closely as possible, avoiding gaps and flourishes in the characters.

Another common constraint is that characters should be printed as large as possible, within specified boundaries, to retain good resolution, and it is usual to have characters written on forms meeting OCR specifications in weight, dirt count, colour and reflectance.

As well as specifying exact areas in which to enter characters, many researchers place rules on what constitutes a valid character: commonly these are '0' and 'Z' with strokes, 'I' with bars and '4' with vertical and horizontal lines only.

Comparison of recognition techniques

Feature extraction is the central issue in handprint recognition, and is followed by classification. Recognition, or classification, is fairly straightforward and totally dependent on the merit of the extraction process. Table 1 shows a comparison of various techniques used in feature extraction based on:

• sensitivity to the deformation of the image of a character, caused by
  (D1) noise - disconnected line segments, bumps, gaps in lines, filled loops etc.;
  (D2) distortion - local variations, rounding of corners, improper protrusions, dilation and shrinkage;
  (D3) style variation - use of different shapes to represent the same character, serifs, slants etc.;
  (D4) translation - movement of the whole character or its components;
  (D5) rotation - change in orientation;
• practical implementation of the recognition technique, judged by
  (I1) automatic mask making - whether or not it is easy to build masks automatically corresponding to each class of characters to match the different features used in such a technique;
  (I2) speed of recognition;
  (I3) complexity of implementation; and
  (I4) independence - whether supplementary techniques are needed.



Table 1. Comparison of recognition techniques

                                    Tolerance to image deformation    Ease of implementation
                                    D1    D2    D3    D4    D5        I1    I2    I3    I4

Global features
  Template matching                 +     0     -     -     -         +     -     0     -
  Transformation                    -     +     +     +     +         0     -     -     0

Distribution of points
  Zoning                            -     0     -     -     0         -     +     +     -
  Moments                           0     0     -     +     +         0     -     0     -
  n-tuple                           0     -     0     -     0         -     +     +     0
  Characteristic loci               -     +     +     +     + or 0    +     +     -
  Crossings and distances           -     +     +     +     0         -     +     +     -

Geometrical and topological         -     +     +     +     0         -     +     0     +

+ high or easy     0 medium     - low or difficult

The above criteria are applied to compare recognition techniques, which can be classified into three categories according to the way in which features are extracted:

• global features
• distribution of points
• topological features

Global features

This technique extracts features from every point inside a frame surrounding the character, and the features reflect no local, geometrical or topological properties.

Template matching and correlations. Features are determined by the state (white or black) of all points within the frame. A measure of the similarity between the input character and stored references is obtained by the matching and correlation of points or groups of points in the frame.
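As an illustration of the template matching idea just described, the sketch below compares a binary input matrix against one stored prototype per class and returns the closest one. The similarity measure (the fraction of agreeing frame points) and the data layout are assumptions made for the example; practical readers use more elaborate correlation measures.

    import numpy as np

    def match_template(char, prototypes):
        """Sketch of template matching: compare a binary character matrix
        against one stored prototype per class and return the best class.
        prototypes maps class labels to matrices of the same shape."""
        char = np.asarray(char, dtype=np.uint8)
        best_class, best_score = None, -1.0
        for label, proto in prototypes.items():
            proto = np.asarray(proto, dtype=np.uint8)
            score = np.mean(char == proto)   # agreement over all frame points
            if score > best_score:
                best_class, best_score = label, score
        return best_class, best_score

    # Hypothetical usage: prototypes = {'A': a_matrix, 'B': b_matrix, ...}
    # label, score = match_template(input_matrix, prototypes)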
Transformations and series expansions. Using the points in the frame as features, as in template matching, results in a very large feature vector (the dimension will be high because all points in the frame will be held in the vector). It is possible to reduce the dimensionality and extract features invariant to some global deformation, such as a translation or rotation. Rotational transformations include those of Fourier, Walsh, Haar and Hadamard, whilst Karhunen-Loeve uses series expansions.

Distribution of points

The dimensionality of the feature vector can also be reduced by considering as features the statistical distribution of points, rather than every individual point of the character.

Zoning. This involves splitting the character frame into several overlapping, or non-overlapping, regions, and the densities of points in these regions are used as feature vectors.

Moments. The moments of black points about a chosen centre, such as the centre of gravity, are used as features.

n-tuples. The occurrence of black or white elements, or joint occurrences of these elements, are used as features.

Characteristic loci. For each white (i.e. empty) point in the background of the character, vertical and horizontal vectors are generated, and the number of times the line segments are intersected by these vectors are used as features.

Crossings and distances. For crossings, features are measured by the number of times line segments are crossed by vectors in specified directions. In the case of distances, the distances of elements or line segments from a given boundary, such as the frame, are used as features.
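A minimal sketch of distribution-of-points features of the 'crossings and distances' kind is given below; the choice of scan lines is an illustrative assumption, and a zoning or characteristic-loci extractor would be built along similar lines.

    import numpy as np

    def crossings_and_distances(char, rows=(6, 12, 18), cols=(6, 12, 18)):
        """Sketch of 'crossings and distances' features on a binary matrix:
        count how often selected horizontal and vertical scan lines cross ink,
        and measure the distance from the left frame edge to the first ink
        point on each selected row. Scan-line positions are illustrative."""
        img = np.asarray(char, dtype=np.uint8)
        features = []
        for r in rows:                       # crossings along horizontal lines
            line = img[r, :].astype(int)
            features.append(int(np.sum(np.diff(line) == 1)))  # 0 -> 1 transitions
        for c in cols:                       # crossings along vertical lines
            line = img[:, c].astype(int)
            features.append(int(np.sum(np.diff(line) == 1)))
        for r in rows:                       # distance from left frame to ink
            ink = np.nonzero(img[r, :])[0]
            features.append(int(ink[0]) if len(ink) else img.shape[1])
        return features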
Geometrical and topological features

This approach is based on extracting the geometrical and topological features which describe the physical makeup of the character. They may represent both global and local properties. This is by far the most popular method used by researchers, and common features to extract include:

• strokes and curves in various directions (e.g. horizontal, vertical and diagonal strokes);
• end points, intersections of line segments and loops;
• stroke relations, angular properties, sharp protrusions and intrusions in connection with contour analysis.

An approach to geometrical and topological features, and zoning, is explained later.



USING TOPOLOGICAL FEATURES FOR RECOGNITION

Printed script recognition has been researched by many people. Particularly useful is the manipulation of the geometrical and topological features of the character⁴. This approach to recognition is concerned with the physical makeup of a letter or digit, and the features of interest may involve:

• strokes and bays (curves);
• end points of characters, intersections of line segments and loops;
• contour analysis: stroke relations, angular properties, sharp protrusions and intrusions.

The popularity of such an approach can be attributed to its high tolerance to distortion and style variations when compared with other recognition techniques, as well as its ability to tolerate a certain amount of rotation and translation.

It is possible to involve decision making at various stages during the recognition process, and thus avoid the waste of time and resources if a character can be rejected at an early stage in processing⁴. Such a recognition model would be split into two phases, as shown in Figure 2. It starts with global detection of features, which is then followed by local analysis when the first stage provides a non-unique solution, i.e. the character cannot be uniquely identified. Within these stages:

• The physical image is converted to a digital pattern through the use of an input device such as a high speed optical scanner.
• The preprocessor centres and thins the pattern, and locates the start and end of each stroke.
• The first stage feature extractor determines the basic features, which are collected as a feature vector.
• The first stage classifier operates on feature vectors and assigns characters to a primary category or rejects the character, involving no detailed analysis.
• The first level decision maker uses previous experience, from the learning set, to see if the character can be uniquely identified without involving more detailed local analysis.
• If a non-unique classification has been reached, the second stage classifier is used to detect local features to remove any of the ambiguities realised at the first stage.

Figure 2. Recognition system using two-level decision making

First stage analysis

The first stage of the recognition process involves grouping the input character into one of several subgroups. The pattern is centred in an octagonal grid (see Figure 3a), and features are extracted according to a set of rules. By using a symmetrical grid the detection software can operate independently of the character size and avoid the necessity of size normalization. The features are represented by:

a0 - no part of a line in an octant;
a1 - a stroke crossing octants 1, 2, 5 or 6 is approximated by a horizontal stroke;
a2 - a stroke crossing octants 3, 4, 7 or 8 is approximated by a vertical stroke;
a3 - a stroke terminating in an octant in the clockwise direction;
a4 - a stroke terminating in an octant in the anticlockwise direction;
a5 - a loop in an octant, with opening to the right, considered only in octants 1, 6, 7, 8;
a6 - a loop in an octant, with opening to the left, considered only in octants 2, 3, 4, 5;
a7 - a loop in an octant, facing upwards, considered only in octants 4, 5, 6, 7;
a8 - a loop in an octant, facing downwards, considered only in octants 1, 2, 3, 8;
a9 - a stroke contained entirely in an octant.

In the feature list, a loop is signified by a continuous stroke, starting in one octant, proceeding to the adjacent octant, and returning to the original octant. In Figure 3a the octagonal grid is shown, with the octant numbers used. Figures 3b to 3h show examples of characters to illustrate the selected features.

There are ten possible symbols, a0, a1, ..., a9, which are assigned to each octant depending on the nature of the line segment within that octant. Each octant may have up to four features, indicating four different stroke segments, and for any octant having fewer than four features (i.e. parts of strokes) the entries are zero rather than one of the symbols a0 ... a9.

The feature vector for a character consists of these four symbols for each octant, or zero, and a boolean value to indicate whether or not two strokes cross in an octant; m = 2 if two strokes do cross, and m = 1 if none cross. The vector can be represented thus

X = (x11, x12, x13, x14, m; x21, x22, x23, x24, m; ...; xi1, xi2, xi3, xi4, m; ...; x81, x82, x83, x84, m)

where

xij ∈ {ak}    i = 1, 2, 3, ..., 8;  j = 1, 2, 3, 4;  k = 0, 1, 2, ..., 9

or

xij = 0

Although these features a0 ... a9 help categorize a character with a minimum of feature extraction, it is not possible to identify uniquely all characters, as some will have identical vectors. In such cases the second stage classifier will have to remove any ambiguities.
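The vector X defined above can be held directly as a flat tuple, which also makes it usable as a key in the first stage lookup table discussed later. The sketch below is illustrative only; the input data structures are assumptions and not necessarily those of the original work⁴.

    def build_feature_vector(octant_features, crossings):
        """Sketch of the vector X defined above: for each of the 8 octants,
        up to four symbols a0..a9 (remaining entries zero, as in the text)
        plus the crossing flag m (2 if two strokes cross, 1 otherwise).
        octant_features: dict mapping octant number 1..8 to a list of symbol
        indices 0..9; crossings: set of octants where two strokes cross."""
        X = []
        for octant in range(1, 9):
            symbols = [f"a{k}" for k in octant_features.get(octant, [])][:4]
            symbols += [0] * (4 - len(symbols))        # pad to four entries
            m = 2 if octant in crossings else 1
            X.extend(symbols + [m])
        return tuple(X)                                # hashable, for table lookup

    # Hypothetical usage: a horizontal stroke (a1) in octant 1 and a loop
    # opening to the right (a5) in octant 7, with no crossings:
    # X = build_feature_vector({1: [1], 7: [5]}, crossings=set())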



Figure 3. The octagonal grid (a) and examples of feature types (b)-(h)

Extraction of features. Once the start, intermediate and end points of each stroke have been found (by the preprocessor), yielding individual strokes, feature extraction is performed. Each stroke is tracked and features are obtained by:

• determining when a stroke crosses a boundary;
• calculating features a0 ... a9 as each octant is crossed;
• ordering the features into feature vectors.
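One way to picture the tracking step is to assign each stroke point to an octant by its angle about the character centre and note where the octant changes. This is only a sketch: the angle-based numbering below is an assumption made for the example, and the actual octant layout is the one fixed by Figure 3a.

    import math

    def octant_of(point, centre):
        """Assign a stroke point to one of eight octants by its angle about
        the character centre (45 degree sectors). The mapping of angles to
        octant numbers 1..8 is illustrative."""
        dx, dy = point[0] - centre[0], point[1] - centre[1]
        angle = math.atan2(dy, dx) % (2 * math.pi)
        return int(angle // (math.pi / 4)) + 1

    def octant_crossings(stroke, centre):
        """Track a stroke (list of (x, y) points) and report the sequence of
        octants it visits, i.e. where boundary crossings occur."""
        visited = []
        for p in stroke:
            o = octant_of(p, centre)
            if not visited or visited[-1] != o:
                visited.append(o)
        return visited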



Reduction of the feature set. This is used to determine which features can be classified as identical, while making sure the loss of information does not lose the class uniqueness for any pattern in the learning set. Before reduction, a regrouping process is necessary as several classes contain identical feature vectors. At this stage it is impossible to remove any of the eight measurements from the vector, but the feature set can be simplified by:

• ignoring the orientation of strokes;
• not assigning different codes to the same stroke occurring in different octants;
• ignoring the relative position of loops; in fact, the only significance is whether loops are facing left, right, up or down, hence the basic topological features shown.

Recognition at the first stage. The main advantage of using the topological features as shown is the reduction of variability of characters among the same class, thus allowing the use of a lookup table for recognition at the first stage. A lookup table can be easily updated, and allows for rejection. Updating the table merely entails adding entries, while rejection is necessary in many applications of CR, such as in mail sorting.

Classification

First, classify the feature vector by searching for an entry in the table. If no entry is found the character is rejected, i.e. it is not recognised, owing to the absence of a corresponding vector from the learning set.

If an entry is found, then determine whether the feature vector represents a unique pattern using the first level decision maker, which has an auxiliary storage for the previous conflicts between patterns of different classes. The auxiliary storage is identical in structure to the lookup table, and its size is determined by how well features at the first stage are able to distinguish between classes. If the feature vector is not present in the storage then the character is recognized, but if it has an entry then the character cannot be uniquely recognized. This necessitates second stage classification, which performs more local analysis depending on which subgroup the character belongs to (one of i possible characters).
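A sketch of the first-level table lookup and auxiliary conflict storage might look as follows; the dictionary-based structures are illustrative assumptions rather than the storage scheme used in the original work.

    def learn(samples):
        """Build the lookup table from (feature_vector, label) pairs; vectors
        seen with more than one label are recorded in the auxiliary conflict
        storage."""
        lookup_table, conflicts = {}, {}
        for x, label in samples:
            if x in lookup_table and lookup_table[x] != label:
                conflicts.setdefault(x, {lookup_table[x]}).add(label)
            else:
                lookup_table[x] = label
        return lookup_table, conflicts

    def first_stage_classify(x, lookup_table, conflicts):
        """First-level decision making: reject if the vector is unknown,
        report an ambiguity (for second stage analysis) if it is a known
        conflict, otherwise recognize the character."""
        if x not in lookup_table:
            return 'reject', None
        if x in conflicts:
            return 'ambiguous', conflicts[x]
        return 'recognized', lookup_table[x]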
Second stage analysis

The second stage is used when the pattern has been classified into a subgroup but cannot be uniquely identified. In this event specialized features on a local level are analysed, according to the subgroup to which the character has been restricted.

At the first stage, for example, differentiation between a 'U' and a 'V' would not be possible because of their identical feature vectors, this fact being held in the auxiliary storage. On entering the second stage, the feature extractor would look for a means of classifying the character. In the case of 'U' and 'V', the presence or absence of a sharp inflection at the bottom of the character would enable discrimination.

The procedures required for the extraction of features at this stage are governed by experimentation and knowledge. They can be described as general purpose, for the detection of sharp inflections, or special purpose, for resolving specific ambiguities between characters.

General purpose tests

One approach for detecting sharp inflections is by way of a curve follower, which establishes three points on a curve and follows the curve, keeping the distances between points similar. The smallest angle between the two line segments connecting the three points is stored, and the maximum inflection along a line can be determined (enabling discrimination between 'U' and 'V', among others).

A discriminant-function detector is useful for distinguishing a straight stroke from an inflection. This involves locating an elliptical boundary that has its major (horizontal) axis containing the two end points of a stroke. For a straight line most of the stroke will lie within the ellipse, while an inflection will exist mainly outwith the boundary.
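The curve-follower test described above can be sketched as follows: three points a fixed spacing apart are slid along the stroke and the smallest angle at the middle point is kept. The spacing and the use of radians are assumptions made for the example.

    import math

    def sharpest_angle(stroke, spacing=3):
        """Slide three points, 'spacing' samples apart, along the stroke and
        return the smallest angle (radians) formed at the middle point.
        A small angle indicates a sharp inflection ('V'); an angle near pi
        indicates a smooth bottom ('U')."""
        smallest = math.pi
        for i in range(len(stroke) - 2 * spacing):
            a, b, c = stroke[i], stroke[i + spacing], stroke[i + 2 * spacing]
            v1 = (a[0] - b[0], a[1] - b[1])
            v2 = (c[0] - b[0], c[1] - b[1])
            n1, n2 = math.hypot(*v1), math.hypot(*v2)
            if n1 == 0 or n2 == 0:
                continue
            cosang = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
            smallest = min(smallest, math.acos(max(-1.0, min(1.0, cosang))))
        return smallest

    # Typically sharpest_angle(u_stroke) > sharpest_angle(v_stroke).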
Special purpose tests

These are concerned with resolving specific ambiguities between characters. As the first stage in recognition has greatly narrowed the range of possible outcomes, the extraction is much easier than had recognition been performed in one step. The specific conflicts encountered are listed below, where * denotes the use of curve-follower inflection detection and ** denotes the use of discriminant-function detection.

• D,O/U,V,W** - examine the degree of opening in the upper region to yield (D, O) or (U, V, W);
• D/O* - check the lower section for a sharp inflection (O);
• U/V/W* - W has multiple inflections in the lower half; U vs V: single inflection;
• X/Y/4* - 4: the stroke starting in the upper left has a sharp inflection; X vs Y: consideration of the relationship between the two strokes;
• C/L** - check the straightness of the upper 2/3 of the pattern;
• D/P - check the height of the lower horizontal stroke in comparison with the height of the character;
• F/I - check the height of the lower horizontal for being more than 15% up the height of the vertical;
• J/T/Y** - Y has a sharp inflection in the top stroke; T vs J: check for a tail in the bottom of the character;
• J/5 - check the relationship between the top stroke and the rest of the character;
• 5/S,9,G,6* - 5 has a sharp inflection in the upper left;
• S,9/G,6 - examine the region below and left of the lower endpoint: G and 6 have a section of character present, S and 9 do not;
• G/6* - examine the lower right for an inflection;
• S/9* - check whether the lower section is curved or straight;
• Z/2* - examine the upper section of the character for a sharp inflection.

DISTRIBUTION OF POINTS AS FEATURES



Another feature of a character which can be used for recognition is the distribution of points, i.e. filled points within a matrix, to assess the probability of an input pattern being a specific character.

One such technique is zoning⁵. This method works by splitting the matrix up into several non-overlapping regions, and building a feature vector which contains as elements the count of black points in each of these regions. Probabilities are required for the value of region k (i.e. the count of black points) conditioned on class Cj, which could, for example, be one of the 26 upper case letters of the English alphabet. Also required is the probability of an input pattern being Cj. This may be such that there is an equal probability of each character occurring, so P(Cj) = 1/26 for all Cj, or it may be related to the known probabilistic occurrences of each letter. On obtaining these values, a formula is applied for each Cj and the Cj yielding the highest result is deemed the recognition answer.

In any recognition process the task is to extract features from the input pattern, and use stored data from the learning (or training) set in conjunction with these features to enable classification. As there is no optimum way of extracting features it is usual to select some well known ones rather than make a simple random selection. Obviously, the result of having too large a feature set is to involve a vast amount of storage and data processing, as well as to necessitate many training samples to build the initial learning set. It is for this reason that such importance is placed on preprocessing of the pattern in character recognition. Preprocessing the pattern before feature extraction allows the complexity of the feature extraction process to be reduced, and is vitally important in an approach such as zoning.

A specific form of preprocessing, size normalization, is performed by Hussain et al.⁵ using the 3 822 upper case alphabetic letters from Munson's database (49 different persons each supplied three sets of the alphabet). The need for size normalization when a zoning technique is involved is apparent because of the use of point densities within a region as features, so a scaling up or scaling down is required to obtain a normalized pattern involving most of the matrix area.

The use of zoning for character recognition is a statistically based approach once the feature vectors have been compiled. For each of the 26 upper case letters (the character set in this case)

    R = Σi ln[P(xi/Cj)] + ln[P(Cj)]     (1)

is calculated, where:

Cj is letter 'j'
P(Cj) is the a priori probability of the character being Cj
P(xi/Cj) is the conditional probability of the value of xi given Cj

and the sum is taken over the components xi of the feature vector. The Cj yielding the maximum value of R is that which is recognized. The probability of a recognition error is reduced if the feature vectors' components are statistically independent and when the probabilities in (1) are obtained using Bayes estimation.

To obtain some quantified value of the merit of a recognition system, Toussaint and Donaldson suggest splitting the alphabets into M disjoint sets⁶. Bayes' estimates are used for the probabilities for M - 1 of these sets (used as training sets for the system) and formula (1) for the other set (testing set). The procedure is repeated, with each set as the test set, and the average probability of error for each set is calculated.
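A compact sketch of zoning followed by decision rule (1) is given below. The zone grid, the shape of the probability tables and the small floor used for unseen counts are all illustrative assumptions; in practice the conditional probabilities would be estimated from the learning set as described.

    import numpy as np

    def zone_counts(char, zones=3):
        """Split the character matrix into zones x zones regions and return
        the count of black points in each region (the zoning feature vector)."""
        img = np.asarray(char, dtype=np.uint8)
        h, w = img.shape
        counts = []
        for i in range(zones):
            for j in range(zones):
                block = img[i * h // zones:(i + 1) * h // zones,
                            j * w // zones:(j + 1) * w // zones]
                counts.append(int(block.sum()))
        return counts

    def classify(x, cond_prob, priors):
        """Decision rule (1): R = sum_i ln P(xi|Cj) + ln P(Cj); return the Cj
        giving the largest R. cond_prob[c][i] is a dict mapping a region count
        to its estimated probability for class c; the 1e-6 floor guards
        against counts never seen in the learning set."""
        best, best_r = None, -np.inf
        for c, prior in priors.items():
            r = np.log(prior)
            for i, xi in enumerate(x):
                r += np.log(cond_prob[c][i].get(xi, 1e-6))
            if r > best_r:
                best, best_r = c, r
        return best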
ONLINE RECOGNITION OF CHARACTERS

In online recognition the processing occurs as the characters are drawn rather than on completion⁷. Commonly, a graphics tablet is used which samples points either at a constant rate or in spatial increments. The creation of noise is a problem in such systems and is usually due to the distance between the tip of the light pen and the point whose coordinates are taken.

Characteristics of online recognition

As patterns are represented by line drawings, the need for costly processes such as skeletonization or contour extraction is removed, and it is more usual to gather information on the input stroke sequence in preference to the actual shape of the character. The time allowed for recognition may be longer than in an optical character reader, as it is unnecessary to recognize characters faster than they can be drawn. There are fewer constraints on characters, allowing size and orientation variation.

Problems of online recognition compared with an OCR

If the online recognizer is being used for computer aided design, the set of symbols encountered may be very large, usually more than 85 as opposed to, say, the 36 alphanumeric characters. It may be necessary to enlarge the vocabulary as the system expands, thus a learning capability is required. Sloppy writing or ineffectual writing devices will reduce the recognition rate. Noise is a problem, particularly if the character is small, due to the precision of the graphics tablet; the limit for size is usually 0.1 to 0.5 mm.

Preprocessing

As with all recognition systems, a preprocessor is used to extract the relevant information that will be used for classification. In an online system it is generally the points that make up the character that are selected. Methods of obtaining these points include:

• smoothing, i.e. averaging a point with its neighbours to reduce the effects of noise;
• filtering, by forcing a minimum distance between points; this eliminates perturbations while the pen is not moving and reduces data redundancy;
• angular segmentation, by sampling a point as soon as the tangent to the drawing changes by a certain angle;
• normalization of the size and position of the character.
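The first two of these operations can be sketched as follows on a list of (x, y) pen samples; the averaging window and the minimum distance threshold are illustrative assumptions.

    import math

    def smooth(points):
        """Average each sampled pen point with its two neighbours to reduce noise."""
        if len(points) < 3:
            return list(points)
        out = [points[0]]
        for i in range(1, len(points) - 1):
            x = (points[i - 1][0] + points[i][0] + points[i + 1][0]) / 3.0
            y = (points[i - 1][1] + points[i][1] + points[i + 1][1]) / 3.0
            out.append((x, y))
        out.append(points[-1])
        return out

    def filter_min_distance(points, min_dist=2.0):
        """Drop points closer than min_dist to the last kept point; this removes
        perturbations while the pen is stationary and reduces redundancy.
        min_dist is in tablet units and is illustrative."""
        if not points:
            return []
        kept = [points[0]]
        for p in points[1:]:
            if math.hypot(p[0] - kept[-1][0], p[1] - kept[-1][1]) >= min_dist:
                kept.append(p)
        return kept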



Extraction of features and classification

Most online systems use as features the sequence of subareas traversed by the pen or the sequence of directions of the drawing. Typically, feature vectors are created by means of angles, structure (i.e. clockwise or anticlockwise curves etc.), zoning or Fourier descriptors. Classification is achieved by a table lookup scheme, i.e. checking whether that vector has been previously encountered and is thus known. In an attempt to reduce the amount of data processing, a hierarchical scheme can be adopted, operating on only part of the feature vector. This would be particularly advantageous when the vector has many dimensions.
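As an illustration of a sequence-of-directions feature, the sketch below quantizes the pen movement into eight directions and collapses repeats; the resulting tuple could serve as (part of) a key for the table lookup described above. The eight-way quantization is an assumption made for the example.

    import math

    def direction_sequence(points, directions=8):
        """Quantize the movement between successive pen points into one of
        'directions' equal angular sectors and collapse repeated codes,
        giving a compact sequence-of-directions feature."""
        seq = []
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 == x1 and y0 == y1:
                continue
            angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
            code = int(angle // (2 * math.pi / directions))
            if not seq or seq[-1] != code:
                seq.append(code)
        return tuple(seq)

    # Hypothetical usage:
    # lookup = {direction_sequence(pts): label for pts, label in training_samples}
    # label = lookup.get(direction_sequence(new_points), 'reject')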
Capabilities and performances

Separation of characters

In an OCR system, separation is implied by a space between characters or by considering a pattern within a restricted area, but in the online case it is more difficult to distinguish the completion of one character from the initiation of another. It is possible to find this separation by having the user signal by interaction that he has finished one character, by operating a time-out of about 0.5 s between characters, or by using forced physical separation.

Learning

While nearly all offline recognition systems (i.e. those in which processing is performed on completion, not during the drawing of the character) operate on trainable algorithms which involve the use of a learning set, online systems lack the standard databases, e.g. those of Munson and Highleyman, because taking data one item at a time would prove to be ineffective in terms of building the lookup table, or dictionary.

Although this appears to be a major drawback, learning is feasible through user interaction and feedback. In the case of correct recognition there would be no feedback, and different features or parameters could be weighted accordingly to indicate success. On the other hand, if the character is rejected then interaction would be used to indicate to the system the character the user had drawn, and an entry in the table with the name and description (i.e. feature vector) would be established. A substitution, i.e. misrecognition of the intended character, would result in a new test or the addition of a feature to remove the ambiguity as seen by the recognition system.

Size of recognizable characters

Due to the cost of increasing the precision of the graphics tablet, it is usually difficult to recognize small characters (3-5 mm), and as a result most researchers' systems are restricted to characters of the order of 10 mm high, which is unsatisfactory for typical applications such as billing machines.

Recognizable character set

Online systems have been designed to recognize a varying set of alphabets, from numerals and alphanumerics to Japanese Katakana characters, as well as for specific purposes such as the input of computer programs, requiring FORTRAN symbols, or computer aided design, requiring special symbols for CAD applications.

Recognition rates

In terms of recognition rates it is not normal for an online system to exceed 95% accuracy, but this figure may be seen as more than acceptable because:

• online systems are sensitive to experimental conditions, as there are no standard databases;
• recognition varies considerably as the user acclimatizes to the system, and training of the user is an important part of online recognition;
• the possibility exists to correct errors in drawing almost as soon as they occur.

Time requirements and memory

All online systems perform recognition in a time negligible in comparison with the time taken to draw a character, and a figure of under 50 ms is usually possible. With the task of recognition being so involved in data processing, the time can be optimized and memory requirements reduced by adopting techniques such as hash coding or binary search of a dictionary containing condensed codes. Most programs run on large computers would require drastic alterations if they were to be run on mini or microcomputers, although in the case of CAD this is not necessary as these systems tend to operate on large and powerful computers.

Use of Fourier coefficients

In an online recognition system a useful device for the construction of feature vectors is that of Fourier expansion coefficients. The stage of feature description is concerned with analysing the preprocessed (i.e. smoothed, normalized and sampled) information, and pinpointing the features of importance in a character that will be used for classification.

Classification is performed at two levels, involving a preliminary grouping according to the number of strokes in the character. In the experimentation by Arakawa⁷, using the 26 lower case letters and digits, the characters i, j, t, x, 4, 5 and 7 are found to comprise two strokes, with the remainder being drawn in one complete stroke. These two subgroups are further categorized according to the shape of the strokes as represented by Fourier coefficients. The character feature vector is then compared with reference pattern vectors, which consist of average values and a covariance matrix of Fourier coefficients for learning samples, with the classification being based on Bayes' decision rule.


CURSIVE SCRIPT RECOGNITION

In the recognition methods described so far, the task has been to identify an individual character, but it would undoubtedly be of more interest to humans if it were possible to recognize entire words consisting of cursively joined characters.

Work in the area of cursive script recognition (CSR) has evolved over some 25 years⁸, and began with Frishkopf and Harmon's work at Bell Telephone Laboratories. Frishkopf aimed to recognize the word as a whole, i.e. as a single complex symbol, while Harmon decided to segment the word and perform recognition on the individual characters. Much work was carried out in CSR until about 1965, when many of the researchers turned to speech recognition instead, and it was a further 15 years before the research intensified again. Although CSR is a major area of character recognition, very little of the work turns up in the literature, as a great deal of it is being promoted by proprietary concerns. It is the capability to alleviate problems associated with man-machine communication more efficiently, with the development of relatively cheap processing power, that has made the area all the more attractive.

As even humans find it difficult to recognize some cursive script, it is clear that in automatic recognition the task of preprocessing is even more important than for single character script. With such a variety of styles possible, it is necessary to eliminate much of this variability, which proves to be problematical in recognition.

Brown and Ganapathy⁹ implemented a CSR system using an Amdahl 470, linked to a Tektronix graphics tablet and a Tektronix 4010 graphics terminal. As the word is written the computer is supplied with X and Y coordinates from the tablet at the rate of 96 coordinates/s, and at a realistic drawing rate this yields about 24 sets of coordinate points per character. The script data is stored in the Amdahl 470 in the form of X and Y integer vectors, with a typical four to six letter word being represented by 200-300 coordinates; this information is known as the pattern vector. In an unnormalized form this vector represents the complete input script, including the variability of the script caused by a particular writer, and it is this variability that is required to be removed by preprocessing. The result is a normalized pattern vector which is author independent; this is followed by feature extraction and classification.

Shridhar and Badreldin¹⁰,¹¹ use a structural classification scheme in which the recognition algorithm is derived as a tree classifier. The first phase describes topological features to uniquely characterize isolated numerals. Features are derived from the left to right profiles of thresholded images of the numerals. Using a structural scheme, features are defined and combined to yield a logical description. The second phase deals with the segmentation of connected strings using a hierarchical approach.

Huang and Chuang¹² similarly pursue a means to recognize handwritten numerals. Here a heuristic approach is adopted whereby structural differences between numerals are extracted and a learning table built to classify numerals of large variability with high accuracy.

As with all recognition techniques, the main features of the pattern are extracted before an attempt is made to classify. Brown and Ganapathy's model consists of individual routines for each of 183 features, 15 of these being globally extracted from all parts of the word, while the others are extracted from isolated parts of the word. Before these methods of extraction can work, the height of the script, the script location and its orientation must be known. It is these tasks which preprocessing performs through normalization.

Preprocessing techniques

Typically a preprocessor for cursive script is concerned with coordinate translation, rotation and scaling, curve smoothing (and noise elimination) and character deskewing. An open loop preprocessor would handle these stages one by one and involve a single pass of the pattern, so there is no verification that the data has been satisfactorily modified by the preprocessor. In contrast to this, a closed loop preprocessor can reduce error by applying verification and feedback.

Open loop preprocessing

Brown and Ganapathy's implementation uses ten bits per coordinate, with an average space of 100 units between successive characters within the script. The pattern vector which represents the original input pattern is stored as a 3 x N array, with N being the number of sample points taken and the three dimensions representing X coordinate, Y coordinate and visibility. This information enables the script to be redrawn in a sequential process from the first to the last coordinates, with the visibility factor distinguishing between a move and a draw; this indicates, for example, a move to place a dot on an 'i' rather than a continuously drawn stroke.

Preprocessing is achieved by performing two-dimensional linear transformations using homogeneous coordinate transformation matrices. A fourth dimension, containing as elements the value 1, is appended to the 3 x N matrix to enable post-multiplication by 3 x 3 matrices (the visibility information is not used in the calculations). By using this representation, several transformations may be accumulated into one 3 x 3 transformation matrix. A measure of the current orientation, size and shape is obtained by the extraction of various features from the pattern vector, and is a necessary requirement before performing linear processes such as translation and rotation.
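The accumulation of transformations into a single 3 x 3 homogeneous matrix can be sketched as follows; the row-vector, post-multiplication convention (suggested by the text) and the function names are assumptions made for the example.

    import numpy as np

    def translation(tx, ty):
        return np.array([[1, 0, 0], [0, 1, 0], [tx, ty, 1]], dtype=float)

    def rotation(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]], dtype=float)

    def scaling(sx, sy):
        return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

    def normalize(points, transforms):
        """Append a homogeneous coordinate of 1 to each (x, y) point, accumulate
        the listed 3 x 3 transforms into one matrix, post-multiply the point
        rows by it and return the transformed (x, y) coordinates."""
        pts = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
        m = np.eye(3)
        for t in transforms:
            m = m @ t                          # accumulate into one matrix
        return (pts @ m)[:, :2]

    # Hypothetical usage:
    # xy = normalize(xy, [translation(-cx, -cy), rotation(-angle), scaling(s, s)])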
Before it is possible to scale the script, the height of the central body is found, ignoring upper and lower loops, dots and crossbars on 't's. The relative height of the script can be determined by looking at the probability density of the Y coordinates, as in most cases the majority of Y coordinates will lie in the central body of the script. Ideally, the density function will have low, flat tails representing coordinates of upper and lower loops, with a plateau in the centre representing the central body.



To determine this density function, horizontal thresholds are placed over the script at predetermined levels, and a count of the number of times the script crosses each is kept in a histogram.
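A sketch of this histogram, and of reading off the central body as its plateau, is given below; the number of levels and the plateau criterion (a fraction of the maximum crossing count) are illustrative assumptions.

    import numpy as np

    def crossing_histogram(points, n_levels=20):
        """Place n_levels horizontal thresholds across the script's Y range and
        count how many times the pen trace crosses each one."""
        pts = np.asarray(points, dtype=float)
        y = pts[:, 1]
        levels = np.linspace(y.min(), y.max(), n_levels)
        counts = []
        for level in levels:
            above = y > level
            counts.append(int(np.sum(above[1:] != above[:-1])))   # crossings
        return levels, np.array(counts)

    def central_body(points, plateau_frac=0.5):
        """Estimate the central body as the Y range where the crossing count
        stays above plateau_frac of its maximum (the plateau of the density)."""
        levels, counts = crossing_histogram(points)
        if counts.max() == 0:
            return levels[0], levels[-1]
        dense = levels[counts >= plateau_frac * counts.max()]
        return dense.min(), dense.max()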
Deskewing is the last of the preprocessing tasks to be performed, and is concerned with removing the slant variation that is typical of cursive script. The slant of the script can be measured by placing two horizontal thresholds through the centre of the script and finding the reciprocal gradients at the points at which the script crosses these lines. Determining the slant of the complete script is then merely a task of averaging all the slopes taken.
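The slant measurement and deskewing just described might be sketched as follows; the threshold placement, the crossing detection and the shear used to remove the slant are illustrative assumptions.

    import numpy as np

    def estimate_slant(points, lower_frac=0.35, upper_frac=0.65):
        """Take two horizontal thresholds through the centre of the script,
        collect the reciprocal gradients (dx/dy) of the trace segments that
        cross them, and average. Threshold placement is illustrative."""
        pts = np.asarray(points, dtype=float)
        y0, y1 = pts[:, 1].min(), pts[:, 1].max()
        slopes = []
        for frac in (lower_frac, upper_frac):
            level = y0 + frac * (y1 - y0)
            above = pts[:, 1] > level
            for i in np.nonzero(above[1:] != above[:-1])[0]:   # crossing segments
                dx = pts[i + 1, 0] - pts[i, 0]
                dy = pts[i + 1, 1] - pts[i, 1]
                if dy != 0:
                    slopes.append(dx / dy)                     # reciprocal gradient
        return float(np.mean(slopes)) if slopes else 0.0

    def deskew(points, slant):
        """Remove the measured slant with a horizontal shear about the baseline."""
        pts = np.asarray(points, dtype=float).copy()
        pts[:, 0] -= slant * (pts[:, 1] - pts[:, 1].min())
        return pts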

In most cases (more than 80%) Brown and Ganapathy found that open loop preprocessing is able to properly translate, rotate and scale the script, but problems do arise when the critical baseline points that are selected do not all lie on the true baseline of the script, particularly if the number of critical points is low.

While most classifiers that have been proposed assume handwritten characters to be continuous, isolated and completely described by their boundaries, Shridhar and Badreldin attend not only to the more commonly encountered broken form of handwritten numeral but also to connected adjacent numerals. Their recognition algorithm uses a tree classifier to determine individual numerals with a reported accuracy of 99%, and connected numerals with an accuracy of 93%.
Closed loop preprocessing

In terms of the operations performed, closed loop and open loop preprocessing are very similar, but the main advantage of the former is that the error rate can be reduced by applying verification and feedback before repeating the process, while an open loop system operates on only one pass.

Verification of the location and orientation of the word is usually easy, so the measurement algorithms can be simplified to give more consistent results. This verification gives an approximate measurement of the positional or rotational error, and although this value is not exact the measurement error will diminish as the word becomes properly normalized. By placing the verification stage before the first application of a transformation, to obtain a first transformation estimate, it is possible to reduce the amount of processing if the word is already partially normalized in terms of orientation and size. The diagram of the closed loop preprocessor (Figure 4) shows the verification and transformation processes in expanded form.

Figure 4. Closed loop preprocessing

A set of thresholds is placed over the script, as was done in open loop scaling, and the histogram is computed for the non-orientated pattern. In this closed loop method a trial rotation is performed and the new histogram is obtained. These processes are repeated until there is no further improvement, at which point the angular change is halved before repeating the whole process again. Closed loop scaling and deskewing are similar to their respective open loop counterparts, except that in the closed loop case the histogram is calculated after each transformation, rather than just once, to determine the result of that transformation. Verification is performed before the first transformation, thus avoiding these processes (scaling and deskewing) if they are not necessary.

The usefulness of a closed loop preprocessor can be seen from the reduction in recognition errors shown in Table 2.

Table 2. Computer selected words for recognition

                         Recognition error
Word                     Open loop      Closed loop

rxxi                     10.0           0
xxxi                     9.1            9.1
rxxi                     33.3           0
exxi                     20.0           10.0
xrxi                     30.0           40.0
xxxr                     0              0
rxxe                     22.2           30.0
rxxx                     0              0
cxxi                     20.0           20.0
ixxi                     20.0           0
rxri                     22.2           0
exxr                     40.0           50.0
xxri                     40.0           10.0
rixx                     20.0           20.0
xrxx                     50.0           30.0
rexx                     60.0           10.0
rxxs                     0              0
rxre                     0              0
exxe                     20.0           20.0
exri                     30.0           10.0
xxxe                     20.0           0

Average error            22.1%          12.4%

Change in recognition error: -43.9%



Processing times for the various algorithms were found by Brown and Ganapathy on the Amdahl 470 using operating systems software, and they established that preprocessing could be performed at a rate exceeding 300 words/min, which is typical of human reading rates. With the use of special purpose hardware much of the preprocessing could be speeded up and pipelined with the recognition stage to maximize throughput.

A recognition rate of about 97% was established by Huang and Chuang using a digital TV camera input with 128 x 128 resolution. The closed loop was effectively achieved through the use of a dictionary set up during input of the first 200 of 400 randomly written numerals.

CONCLUSIONS

Character recognition has been of great interest to scientists for over 25 years. It is the quest to study human perception, in combination with the advantages of computer processing, that has made this field one of such significance. Research has progressed from early, cumbersome optical character readers, involving massive hardware facilities, for recognition of a limited set of individual characters, to the automatic recognition of handwritten cursive script.

In the early systems, the problem of storing the vast amounts of information involved in an unprocessed character proved to be particularly acute. The advent of relatively cheap processing power, along with advances in technology, brought CR more within the reach of automation. This resulted in much research work from the mid to late 1960s onwards, initially dealing with recognition of individual characters before progressing to the more complex feat of cursive script recognition.

As the automation of CR can be achieved offline or online, attention has been given to each of these to understand the similarities and differences between such systems. It was found that although online systems tend to use graphics facilities for the writing of characters, with offline systems adopting an optical scanning technique, the processes necessary were basically the same. These are, namely, segmentation of the characters (if needed), preprocessing, feature extraction and classification.

For the recognition of individual characters three differing techniques were studied. These are:

• recognition of characters as they are drawn using a graphics tablet;
• recognition using the geometrical features of a character;
• recognition of matrix-represented characters by considering the distribution of points as descriptive of the character.

Cursive script recognition will prove most advantageous to industry in terms of the efficiency gained, and it is already possible to recognize words at a speed exceeding that of human reading.

REFERENCES

1 Suen, C Y, Berthod, M and Mori, S 'Automatic recognition of handprinted characters - the state of the art' Proc. IEEE Vol 68 No 4 (April 1980) pp 472-487
2 Munson, J H 'Experiments in the recognition of handprinted text: part I - character recognition' Proc. Fall Jt. Comp. Conf. Vol 33 (Dec. 1968) pp 1125-1138
3 Highleyman, W H 'An analog method for character recognition' IRE Trans. Electron. Comput. Vol 10 (Sept. 1961) pp 502-512
4 Tou, J T and Gonzalez, R C 'Recognition of handwritten characters by topological feature extraction and multilevel categorization' IEEE Trans. Comput. Vol 21 (Feb. 1972) pp 321-331
5 Hussain, A B S, Toussaint, G T and Donaldson, R W 'Results obtained using a simple character recognition procedure on Munson's handprinted data' IEEE Trans. Comput. Vol 21 (Feb. 1972) pp 201-205
6 Toussaint, G T and Donaldson, R W 'Algorithms for recognizing contour-traced handprinted characters' IEEE Trans. Comput. Vol 19 (June 1970) pp 541-546
7 Arakawa, H 'On-line recognition of handwritten characters - alphanumeric, Hiragana, Katakana, Kanji' Pattern Recognition Vol 16 No 1 (1983) pp 25-34
8 Harmon, L D 'Automatic recognition of print and script' Proc. IEEE Vol 60 (Oct. 1972) pp 1165-1176
9 Brown, M K and Ganapathy, S 'Preprocessing techniques for cursive script word recognition' Pattern Recognition Vol 16 No 1 (1983) pp 35-42
10 Shridhar, M and Badreldin, A 'Recognition of isolated and simply connected handwritten numerals' Pattern Recognition Vol 19 No 1 (1986) pp 1-12
11 Shridhar, M and Badreldin, A 'Handwritten numeral recognition by tree classification methods' Image Vision Comput. Vol 2 No 3 (August 1984) pp 143-149
12 Huang, J S and Chuang, K 'Heuristic approach to handwritten numeral recognition' Pattern Recognition Vol 19 No 1 (1986) pp 15-20

