

A study on method of feature extraction for Handwritten Character Recognition

Article in Indian Journal of Science and Technology · March 2013



A study on structural method of feature extraction for
Handwritten Character Recognition
Vijay Prasad, Y. Jayanta Singh
Dept. of Computer Science and IT, Don Bosco College of Engineering and Technology,
Assam Don Bosco University, Airport Road, Azara, Guwahati, Assam – 781 017
vijay@dbuniversity.ac.in, jayanta@dbuniversity.ac.in

Abstract— This paper presents a study of the major processes involved in a handwritten character recognition system. We focus on the various feature extraction techniques, as recognition depends mainly on the extracted features. After studying the various features, we modified an existing feature extraction technique by introducing two more feature vectors. After the introduction of these two new vectors, we found a considerable increase in the percentage of recognition.

Keywords— Handwritten, character, recognition, feature extraction, optical scanner, classification

I. INTRODUCTION
Handwritten character recognition is a technique by which a computer recognizes handwritten characters or sentences from sources such as paper documents, photographs, touchscreens and similar devices. The image of the handwritten sentence/word/character can be gathered either offline or online. In the offline technique it comes from a scanned image of a paper. In the online technique it is obtained by detecting the motion of the pen tip, for example on a pen-based computer screen surface. Certain processing steps are needed before an image can be recognised. This paper first discusses those steps and then explains the main part, the feature extraction techniques, at length.

Fig. 1 An image of a handwritten sheet (grayscale).

II. STEPS INVOLVED
An offline handwritten character recognition system in general involves the following major steps:
A. Pre-processing
B. Segmentation
C. Feature Extraction
D. Classification
E. Post-processing
Fig. 2 gives a block diagram of the whole recognition system.

Fig. 2 Block diagram of the recognition system

A. Pre-processing
When a document is scanned, the raw data may require some preliminary processing. Pre-processing produces the cleaned document that is finally processed by the handwritten character recognition system. The main objectives of pre-processing are:
• Noise reduction
• Binarization
• Skew correction
• Stroke width normalization
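A minimal Python sketch of two of these objectives, noise reduction and binarization, is given below. It is illustrative only: the paper does not specify which filter or threshold rule is used. The 3×3 median filter, Otsu's method for the global threshold and the local-mean rule for the adaptive threshold (with its `radius` and `offset` parameters) are therefore assumptions. The sketch also assumes a grayscale image stored as a list of rows of integers in 0..255, with dark ink on a light background.

```python
# Pre-processing sketch: grayscale image = list of rows of ints in 0..255,
# dark ink on a light background. Filter and threshold rules are assumptions.

def median_filter3(img):
    """3x3 median filter; at the borders only existing neighbours are used."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            )
            out[y][x] = window[len(window) // 2]  # upper median if even-sized
    return out


def otsu_threshold(img, levels=256):
    """Global threshold t maximising between-class variance (Otsu's method).
    Pixels with value <= t form the dark class."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]            # pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0          # pixels with value > t
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize_global(img):
    """One threshold for the whole image; ink (value <= t) becomes 1."""
    t = otsu_threshold(img)
    return [[1 if v <= t else 0 for v in row] for row in img]


def binarize_adaptive(img, radius=1, offset=10):
    """Per-pixel threshold: ink (1) if darker than the local mean minus offset."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            local_mean = sum(vals) / len(vals)
            out[y][x] = 1 if img[y][x] < local_mean - offset else 0
    return out
```

Consider a light 3×3 patch with a single dark pixel. `median_filter3` removes that pixel as noise, whereas `binarize_adaptive` keeps it as ink. Which behaviour is wanted depends on whether such a pixel is a speck or part of a stroke.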
Noise Reduction
Application of noise reduction techniques improves the quality of the document. The major methods are:
• Filtering (masks)
• Morphological operations (erosion, dilation, etc.)

Fig. 3 Filter applied to a scanned image to get a noise-reduced image.

Normalization
Normalization generally reduces the amount of data to be processed. For example, thinning preserves the shape information of a character while reducing the data.

Fig. 4 Stroke width normalization.

Binarization
In binarization, a grayscale image is transformed into a binary image. The two categories of binarization are:
1. Global: a single threshold value is picked for the entire document image. This value is based on an estimate of the intensity of the image.
2. Adaptive: a different threshold value is used for each pixel [1].

Skew Correction and Slant Removal
Skew correction methods are used to align the coordinate system of the scanner with that of the document. The main approaches include correlation, projection profiles and the Hough transform.

Fig. 5 Hough transformation applied to a raw scanned image.

The slant of handwritten text varies from one writer to another. Characters are normalized using slant removal methods, which are robust and quite straightforward. Some of the most popular deslanting techniques are:
a. Calculation of the average angle of near-vertical elements.
b. Bozinovic – Shrihari Method (BSM).

Fig. 6 Calculation of the average angle of near-vertical elements.

Fig. 7 Bozinovic – Shrihari Method (BSM).

The dominant slope of the character is found by examining slope-corrected versions of the character, as in the method used in [2]. The vertical histogram projection is calculated for a range of angles $\pm R$. The slope of the character, $a_m$, is the angle whose projection has the minimum entropy:

$$a_m = \arg\min_{a \in [-R,\,R]} H_a, \qquad H = -\sum_{i=1}^{N} p_i \log p_i$$

Here $H_a$ is the entropy $H$ of the projection computed at angle $a$, and $p_i$ is the normalised value of the $i$-th of its $N$ bins. Character correction is then done by using:

$$x' = x - y \tan(a_m), \qquad y' = y$$

B. Segmentation
Segmentation involves two major steps, in the following sequence:
a. Text line detection, for which we may use the Hough transform, projections, smearing, etc.

Fig. 8 Text line segmentation

b. Word extraction, for which we can use vertical projections, connected component analysis, etc.

Fig. 9 Word segmentation

For the segmentation of words we may use either an explicit or an implicit segmentation technique.

The explicit approaches try to identify the smallest possible word segments. These may be smaller than letters, but they cannot be segmented further. During the recognition process these segments are assembled into letters.

Fig. 10 Explicit segmentation

The implicit approaches recognize whole words without segmenting them into letters. This is effective only when the set of possible words is small and known in advance [3].
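Projections are listed above as an option for both text line detection and word extraction. The following is a minimal sketch of that idea, assuming a binary image stored as a list of rows with ink = 1. Lines are taken as runs of non-empty rows in the horizontal projection. Words are taken as runs of non-empty columns in the vertical projection of one line, and gaps narrower than a hypothetical `min_gap` are treated as spacing inside a word. Handwriting with touching or overlapping lines usually needs the more robust methods also named above, such as the Hough transform or smearing.

```python
# Projection-based segmentation sketch on a binary image (list of rows,
# ink = 1). The min_gap defaults are illustrative assumptions.

def runs(profile, min_gap=1):
    """(start, end) pairs of non-zero stretches in a projection profile.

    Stretches separated by fewer than min_gap zero bins are merged;
    end is exclusive.
    """
    spans, start, end, gap = [], None, None, 0
    for i, v in enumerate(profile):
        if v > 0:
            if start is None:
                start = i
            end, gap = i + 1, 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, end))
                start = None
    if start is not None:
        spans.append((start, end))
    return spans


def segment_lines(img, min_gap=1):
    """Text lines: runs in the horizontal projection (ink pixels per row)."""
    return runs([sum(row) for row in img], min_gap)


def segment_words(line_img, min_gap=3):
    """Words: runs in the vertical projection (ink pixels per column)
    of a single text line."""
    return runs([sum(col) for col in zip(*line_img)], min_gap)
```

Each returned pair can be used to slice the image, e.g. `img[start:end]` for a line, and then each line is passed to `segment_words`.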

C. Feature Extraction
In this phase each character is represented as a feature vector: a set of features is extracted from it. Later these features are used to optimise the recognition percentage. General feature extraction methods consider three types of features:
i. Statistical
ii. Structural
iii. Global transformations and moments

i. Statistical Method
Some of the major statistical features used for character representation are (a) zoning, (b) projections and profiles, and (c) crossings and distances.

Zoning: The character image is divided into N×M zones, and features are extracted from each zone to capture local characteristics.

Fig. 11 Zoning technique for feature extraction

Zoning-Density Features: The number of foreground pixels in each cell is considered a feature. Darker squares indicate a higher density of zone pixels, as shown in Fig. 12.

Fig. 12 Zoning – Density features

Zoning-Direction Features: These are based on the contour of the character image. For each zone the contour is followed, and a directional histogram is obtained by analysing the adjacent pixels in a 3×3 neighbourhood.

Fig. 13 Zoning – Direction features

Based on the skeleton of the character image, one can distinguish individual line segments.

Fig. 14 Line type normalisation

Fig. 15 Formation of feature vector

Projection Histograms: The basic idea behind using projections is that character images, which are 2-D signals, can be represented as 1-D signals. These features, although independent of noise and deformation, depend on rotation. Projection histograms count the number of pixels in each column and row of a character image. Projection histograms can separate characters such as "m" and "n".

Fig. 16 Statistical Method: Projection Histogram

Profiles: The profile counts the number of pixels (distance) between the bounding box of the character image and the edge of the character. Profiles describe the external shapes of characters well and allow a great number of letters, such as "p" and "q", to be distinguished.

Fig. 17 Statistical Method: Profiles

Fig. 18 Statistical Method: Profiles-character contour
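The zoning-density, projection-histogram and profile features described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code. It assumes the character is a binary image (list of rows, ink = 1) already cropped to its bounding box. Only the left and right profiles (the ones that separate "p" from "q") are computed, and an empty row is given the full row width by convention.

```python
# Statistical features of a binary character image (list of rows, ink = 1),
# assumed already cropped to its bounding box.

def zoning_density(img, n, m):
    """Ink-pixel count in each of n x m zones, read zone-row by zone-row.

    Zone borders are rounded down, so zones may differ in size by one pixel.
    """
    h, w = len(img), len(img[0])
    feats = []
    for zr in range(n):
        r0, r1 = zr * h // n, (zr + 1) * h // n
        for zc in range(m):
            c0, c1 = zc * w // m, (zc + 1) * w // m
            feats.append(sum(img[y][x] for y in range(r0, r1) for x in range(c0, c1)))
    return feats


def projection_histograms(img):
    """Ink pixels per row and per column."""
    return [sum(row) for row in img], [sum(col) for col in zip(*img)]


def side_profiles(img):
    """Left and right profiles: background pixels between the bounding-box
    edge and the first ink pixel in each row (row width if the row is empty)."""
    w = len(img[0])
    left = [row.index(1) if 1 in row else w for row in img]
    right = [row[::-1].index(1) if 1 in row else w for row in img]
    return left, right
```

For example, the 4×4 image `[[0,1,1,0],[1,0,0,1],[1,1,1,1],[1,0,0,0]]` gives zone densities `[2, 2, 3, 2]` for `n = m = 2`.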

Crossings and Distances:
Crossings count the number of transitions from background to foreground pixels along vertical and horizontal lines through the character image. Distances measure how far the first image pixel detected lies from the upper and lower boundaries of the image.
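A minimal sketch of crossings and distances follows, again assuming a binary image stored as a list of rows with ink = 1. It counts crossings along every row and every column. For every column, it measures the distance of the first ink pixel from the top and from the bottom boundary. Scanning all lines rather than a few selected ones, treating the image border as background, and returning the image height for an empty column are assumptions; a real system may choose differently.

```python
# Crossings and distances for a binary character image (list of rows, ink = 1).
# Scanning every row/column is an assumption; a system may sample fewer lines.

def crossings(line):
    """Background-to-foreground transitions along one line; the image border
    counts as background, so ink in the first position is one crossing."""
    count, prev = 0, 0
    for v in line:
        if v == 1 and prev == 0:
            count += 1
        prev = v
    return count


def crossing_features(img):
    """Crossings along every horizontal line (row) and vertical line (column)."""
    cols = [list(c) for c in zip(*img)]
    return [crossings(r) for r in img], [crossings(c) for c in cols]


def distance_features(img):
    """Per column: distance of the first ink pixel from the upper and from the
    lower boundary of the image (image height if the column has no ink)."""
    h = len(img)
    cols = [list(c) for c in zip(*img)]
    top = [c.index(1) if 1 in c else h for c in cols]
    bottom = [c[::-1].index(1) if 1 in c else h for c in cols]
    return top, bottom
```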

Fig. 19 Statistical Method: Crossings and Distances

ii. Structural Method
Characters can be represented by structural features with high tolerance to distortions and style variations. These features are based on topological and geometrical properties of the character, such as aspect ratio, cross points, loops, branch points, strokes and their directions, inflection between two points, horizontal curves at top or bottom, etc.

Fig. 20 Structural Method

Three types of features:
• Horizontal and vertical projection histograms
• Radial histogram
• Radial out-in and radial in-out profiles

Fig. 21 Structural Method: Horizontal, vertical, radial, radial out-in and radial in-out.

D. Classification
Classification is the problem of identifying to which of a set of categories a new observation belongs, on the basis of a training set of data containing observations whose category membership is known. The individual observations are analysed into a set of quantifiable properties [4].
There is no single best classifier. The choice of classifier depends on many factors, such as the available training set and the number of free parameters. Some of the important methods that can be used for classification are k-Nearest Neighbour (k-NN), Bayes Classifier, Neural Networks (NN), Hidden Markov Models (HMM), Support Vector Machines (SVM), etc.

E. Post-processing
Post-processing refers to the processing done on the image after the classification process is over, to refine the result. It also incorporates context and shape information in all the stages of HCR systems for meaningful improvements in recognition rates.

Fig. 22 A post processing approach
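The horizontal and vertical projection histograms listed under the structural method are the features the authors extend in Section III. There, each histogram is divided into spans, and every bin of a span is replaced by the span mean to give the Mean H and Mean V Histograms. The following is a minimal Python sketch of that procedure, not the authors' implementation. It assumes that H counts ink pixels per row and V counts them per column, and that the image is binary and size-normalised (45 × 45 with span 9 in Section III's example). Because the paper does not say how the four histograms are combined, it also assumes they are simply concatenated into one feature vector.

```python
# Projection histograms plus span-averaged "Mean H" / "Mean V" histograms
# (steps a-c of Section III). Assumptions: H = ink per row, V = ink per
# column, and the four vectors are concatenated.

def span_mean(hist, span):
    """Replace every bin in each span of `span` bins by that span's mean.

    A final shorter span (if len(hist) is not a multiple of span) is averaged
    over its own length.
    """
    out = []
    for s in range(0, len(hist), span):
        chunk = hist[s:s + span]
        mean = sum(chunk) / len(chunk)
        out.extend([mean] * len(chunk))
    return out


def feature_vector(img, span=9):
    """[H, V, Mean H, Mean V] for a size-normalised binary image (ink = 1)."""
    h_hist = [sum(row) for row in img]
    v_hist = [sum(col) for col in zip(*img)]
    return h_hist + v_hist + span_mean(h_hist, span) + span_mean(v_hist, span)
```

As an example, take a 45 × 45 image containing one vertical stroke in column 20. The V histogram has a single bin of 45, while the Mean V Histogram spreads it as 5.0 over columns 18 to 26. This smoothing over fixed spans is one way to read the remark in Section IV that the new vectors partially incorporate zoning.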


III. OUR STUDY
We studied one of the structural feature extraction methods, namely the horizontal and vertical projection histograms. Normally this method of feature extraction uses only these two feature vectors. From these feature vectors we generated two more projections, which we named the Mean H Histogram and the Mean V Histogram. By introducing these two extra feature vectors, we found an increase in the accuracy of recognition. The new feature vectors were generated as follows:
a. An optimal span is chosen. For example, if the normalised image is 45 × 45 pixels, the image may be divided into spans of 9.
b. The mean of each span is calculated.
c. Each member of the span is set to the mean calculated in the previous step.

Fig. 23 A screen shot of the proposed feature vectors test engine

IV. RESULTS
To test our implementation of these feature vectors, we created an engine with which results can be compared with and without the new feature vectors. When we ran the study using the new feature vectors, we found a considerable increase in recognition accuracy compared with the study performed without the extra feature vectors. The new vectors partially incorporate the zoning method, which is a statistical method of feature extraction.

V. CONCLUSIONS
This paper reports the basic concepts of handwritten character recognition. In our study of the HCR process we developed feature vectors that enhanced recognition. We used a basic feature extraction technique. By adding two new feature vectors to the feature formation process, which we called the Mean H Histogram and the Mean V Histogram, we found an increase in the accuracy of recognition.

ACKNOWLEDGMENT
We want to acknowledge all the faculty members of our department, with special thanks to our Principal and Director of Research.

VI. REFERENCES
[1] G. Vamvakas, B. Gatos, I. Pratikakis, N. Stamatopoulos, A. Roniotis and S. J. Perantonis, "Hybrid Off-Line OCR for Isolated Handwritten Greek Characters," The Fourth IASTED International Conference on Signal Processing, Pattern Recognition, and Applications (SPPRA 2007), Innsbruck, Austria, Feb. 2007, pp. 197–202, ISBN 978-0-88986-646-1.
[2] G. Vamvakas, N. Stamatopoulos, B. Gatos, I. Pratikakis and S. J. Perantonis, "Standard Database and Methods for Handwritten Greek Character Recognition," accepted for publication in Proc. 11th Panhellenic Conference on Informatics (PCI 2007), Patras, May 2007.
[3] J. Canny, "A computational approach to edge detection," IEEE Transactions on PAMI, vol. 8, no. 6, pp. 679–698, 1986.
[4] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, pp. 179–188, 1936.
[5] O. D. Trier, A. K. Jain and T. Taxt, "Feature Extraction Methods for Character Recognition - A Survey," 1995.
[6] A. Pal and D. Singh, "Handwritten English Character Recognition Using Neural Network," International Journal of Computer Science & Communication, 2010.
[7] K. Fukushima and N. Wake, "Handwritten Alphanumeric Character Recognition by the Neocognitron," IEEE Transactions on Neural Networks, vol. 2, no. 3, May 1991.
[8] H.-C. Fu and Y. Y. Xu, "Multilinguistic Handwritten Character Recognition by Bayesian Decision-Based Neural Networks," IEEE Transactions on Signal Processing, vol. 46, no. 10, Oct. 1998.
[9] J. Hu, M. K. Brown and W. Turin, "HMM Based On-Line Handwriting Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 10, Oct. 1996.
[10] E. Krevat and E. Cuzzillo, "Improving Off-line Handwritten Character Recognition with Hidden Markov Models," 2005.
[11] S. Arora, D. Bhattacharjee, M. Nasipuri, D. K. Basu and M. Kundu, "Combining Multiple Feature Extraction Techniques for Handwritten Devnagari Character Recognition," 2008 IEEE Region 10 Colloquium and the Third ICIIS, Kharagpur, India, Dec. 8–10.
[12] D. Singh, S. K. Singh and M. Dutta, "Hand Written Character Recognition Using Twelve Directional Feature Input and Neural Network," International Journal of Computer Applications, 2010.
[13] R. Plamondon and S. N. Srihari, "On-line and off-line handwriting recognition: a comprehensive survey."
[14] R. Bajaj and S. Chaudhury, "Signature Verification Using Multiple Neural Classifiers," Pattern Recognition, vol. 30, no. 1, pp. 1–7, 1997.
[15] G. Dimauro, S. Impedovo, G. Pirlo and A. Salzo, "A Multi-Expert Signature Verification System for Bankcheck Processing," Int'l J. Pattern Recognition and Artificial Intelligence, vol. 11, no. 5, pp. 827–844, 1997.
[16] T. Fujisaki, T. E. Chefalas, J. Kim, C. C. Tappert and C. G. Wolf, "On-Line Run-On Character Recognizer: Design and Performance," in Character and Handwriting Recognition: Expanding Frontiers, P. S. P. Wang, ed., Singapore: World Scientific, 1991, pp. 123–137.
[17] N. Arica and F. T. Yarman-Vural, "An overview of character recognition focused on off-line handwriting," vol. 31, no. 2, pp. 216–233.
[18] I. Abulhaiba and P. Ahmed, "A fuzzy graph theoretic approach to recognize the totally unconstrained handwritten numerals," Pattern Recognit., vol. 26, no. 9, pp. 1335–1350.
[19] Y. S. Huang and C. Y. Suen, "A method of combining multiple experts for the recognition of unconstrained handwritten numerals," IEEE Trans. Pattern Anal. Machine Intell., vol. 17, pp. 90–94.
[20] X. Li, W. Oh, J. Hong and W. Gao, "Recognizing components of handwritten characters by attributed relational graphs with stable features," in Proc. 4th Int. Conf. Document Anal. Recognit., 1997, pp. 616–620.
[21] S. Madhvanath, E. Kleinberg, V. Govindaraju and S. N. Srihari, "The HOVER system for rapid holistic verification of off-line handwritten phrases," in Proc. 4th Int. Conf. Document Anal. Recognit., Ulm, Germany, 1997, pp. 855–890.
[22] R. Plamondon and S. N. Srihari, "On-line and off-line handwriting recognition: A comprehensive survey," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp. 63–85.
