An Algorithm for Handwritten Digit Recognition Using Projection Histograms and SVM Classifier

Eva Tuba and Nebojsa Bacanin

Abstract — A higher level of image processing usually contains some kind of recognition. Digit recognition is common in applications, and handwritten digit recognition is an important subfield. Handwritten digits are characterized by large variations, so template matching is, in general, not very efficient. In this paper we describe an algorithm for handwritten digit recognition based on projection histograms. Classification is performed by 45 carefully tuned support vector machines (SVM) using the One Against One strategy. Our proposed algorithm was tested on standard benchmark images from the MNIST database and achieved a remarkable global accuracy of 99.05%, with possibilities for further improvement.

Keywords — Image processing, Handwritten digit recognition, SVM, Projection histogram.

This research is supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia, Grant No. III-44006.
Eva Tuba (corresponding author), Faculty of Computer Science, John Naisbitt University, Bulevar umetnosti 29, 11070 Belgrade, Serbia (e-mail: eva.tuba@gmail.com).
Nebojsa Bacanin, Faculty of Computer Science, John Naisbitt University, Bulevar umetnosti 29, 11070 Belgrade, Serbia (e-mail: nebojsabacanin@gmail.com).

I. INTRODUCTION

Pattern recognition is the scientific field concerned with classifying data based on patterns detected in them. Character recognition and its subfield, digit recognition, are very popular research topics. One of the most common applications of digit recognition is the recognition of license plate numbers. Numbers on license plates are written in clear and uniform fonts, and several papers propose template matching techniques for license plate digit recognition [1], [2], [3]. Another common method for license plate digit recognition uses projection histograms [4], [5]. Handwritten digit recognition is harder due to the large number of different writing styles, angles, pencil thicknesses, etc. It is used in post offices for mail sorting, in banks for check processing, for form data entry, etc. These applications require accuracy and speed. Many algorithms have been developed for solving this problem, but it remains an interesting topic for researchers aiming at faster algorithms that provide better results. These algorithms are more complex than algorithms for typed digit recognition.

For handwritten digit recognition, different types of feature extraction techniques are used. The main issue with handwritten digits is their diversity in size and angle due to different writing styles. In order to make the training set invariant, in [6] an affine transformation is applied to each training image and the training set is expanded with the resulting images. In [7] and [8] this problem is overcome by using invariant moments (Hu's invariants and Zernike moments, respectively) as features.

The accuracy of recognition strongly depends on the classification method. An artificial neural network is a flexible system that changes its structure based on external or internal information that flows through the network during the learning phase. It is extensively used for classification and recognition tasks [9], [10]. A number of studies relied on the k-nearest neighbors (KNN) algorithm for classification [11]. It is also common practice to combine two or more classification methods. KNN is combined with a support vector machine (SVM) in [12]: KNN is used for classification first, and if the class is not uniquely determined, the KNN output (a distance matrix) is used as input for an SVM. A three-stage classifier based on a neural network (NN) and SVM is proposed in [13]. LIRA, a novel neural classifier adjusted for the classification of binary and gray-scale digit images, is presented in [14]. SVM and multiclass SVM have gained prominence in the field of pattern classification [15], [16].

Due to its relatively poor results, the projection histogram is rarely used for handwritten digits; it is usually applied to recognition of more regularly shaped digits. In this paper we show that with this feature as the input vector and a proper classifier, very good accuracy can be achieved. An SVM classifier is used.

The rest of the paper is organized as follows. Section II describes our proposed algorithm, the feature extraction and the classifier. Section III gives empirical results along with a comparative analysis.

II. PROPOSED ALGORITHM FOR HANDWRITTEN DIGIT RECOGNITION

Algorithms for handwritten digit recognition (and character recognition) usually consist of three steps. The first step is pre-processing; this step should remove irrelevant data and data that could have a negative influence on recognition. Usual steps in this phase are binarization, normalization, smoothing and denoising. The second step is feature extraction, and the third step is classification. We used benchmark images that did not require pre-processing.

A. Feature extraction

Historically, the first ideas for object recognition algorithms were to simulate the process humans use for this task: the template matching technique. Template matching for digit recognition can be successful if digits have the same shape and differ only in angle or size. As mentioned above, this technique is rather popular in typed number recognition, such as numbers on license plates or house numbers [17], [18]. For handwritten digit recognition this method is not successful because of the variability the templates would have to cover. In Fig. 1 the digit eight is shown as written by different people.


Every digit from Fig. 1 needs a matching template. Similarly, all variations in size and orientation require their own templates. The conclusion is that for just one digit we would need a very large number of templates (for variations in size, orientation, form, style, etc.), and that would certainly make them indistinguishable from the templates for some other digit.
Fig. 1. Example of handwritten digit eight

Feature extraction is used in order to accommodate the variability of patterns. Each type of pattern is stored as a set of its features; as a result, images are represented as sets of features. Feature extraction is the second step in the algorithm for handwritten digit recognition. A number of feature sets have been presented by researchers in the past. This step is very important for the performance of the algorithm: the accuracy of classification strongly depends on the selection and extraction of features. In the Introduction we already mentioned one set of features, projection histograms. These are usually used for typed number recognition; additionally, this set is commonly combined with other feature sets and is rarely used alone. Another frequently used method for digit recognition is zoning: after pre-processing, the image is divided into several convenient zones and feature extraction is done for every zone. One promising technique combines zoning and projection histograms [19].

In our proposed algorithm every digit is represented by projection histograms. We use different axes for projection and combine these features into the input vector. Examples of projection histograms on the x-axis are shown in Fig. 2.

Fig. 2. Projection histogram on the x-axis (y = 0)

On the examples of digits zero, three and eight we can see the diversity of projections. Fig. 3 presents histograms for those numbers. Zero has fewer pixels in the middle than at the sides. For the number eight the histogram is reversed: it has more pixels at the sides than in the middle. The number three is like the number eight with a part cut off on the left side; the histograms for these two numbers are similar, but the number three has fewer pixels on the left side.

Fig. 3. Projection histograms: left on the x-axis, right on the y-axis

Diversity in writing styles can cause digits to be rotated to one side, so a projection on only one axis may not capture the difference between numbers: one number could have very different histograms that overlap with the histograms of other numbers. Making projections on several axes creates more complete information about the number. If the number is rotated slightly to the left, its projection on the y = −x axis should be similar to the x-axis projection histogram of the same digit written upright. This leads to the conclusion that combining projections on different axes leads to better results. For our proposed algorithm we used four axes: the x-axis, the y-axis, y = x and y = −x. Fig. 4 shows projection histograms on the x and y axes for a few samples of digit 3, while Fig. 5 shows projection histograms for the same digits on the diagonal axes y = x and y = −x.

Fig. 4. Histograms of the number 3 on (a) the x-axis and (b) the y-axis

Fig. 5. Histograms of the number 3 on (a) y = x and (b) y = −x
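To make the feature construction concrete, here is a minimal Python/NumPy sketch (our illustration, not code from the paper) that computes the four projection histograms for a single 28x28 digit image. The function name is ours, and which of the two image diagonals corresponds to y = x depends on the coordinate convention assumed.

import numpy as np

def projection_features(img):
    """Projection histograms of a 28x28 digit image on four axes.

    Returns 28 (x-axis) + 28 (y-axis) + 55 (y = x) + 55 (y = -x)
    = 166 features, matching the input vector size used in the paper.
    """
    assert img.shape == (28, 28)
    x_hist = img.sum(axis=0)  # column sums: projection on the x-axis
    y_hist = img.sum(axis=1)  # row sums: projection on the y-axis
    # A 28x28 image has 2*28 - 1 = 55 diagonals in each direction.
    d1 = np.array([img.trace(offset=k) for k in range(-27, 28)])
    d2 = np.array([np.fliplr(img).trace(offset=k) for k in range(-27, 28)])
    return np.concatenate([x_hist, y_hist, d1, d2])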

B. Classification

Support vector machine is a binary classifier which works by finding a hyperplane that separates members of the two classes. SVM finds the hyperplane that maximizes the margin between the closest representatives of the two classes. SVM has been rapidly replacing neural networks in the field of pattern recognition and is used in many applications, such as handwritten digit recognition.

In order to design a learning algorithm, we need a class of functions whose capacity can be calculated. SVM classifiers are based on the set of hyperplanes

w · x + b = 0,  w ∈ R^n, b ∈ R    (1)

where w is the normal vector of the hyperplane and |b|/||w|| is the perpendicular distance from the hyperplane to the origin. Decision functions have the form f(x) = sgn(w · x + b).

The optimal hyperplane maximizes the distance to the closest training examples from each class.


Finding such a hyperplane boils down to selecting b and a minimal ||w|| such that Eq. (2) holds for all training examples:

yi(w · xi + b) − 1 ≥ 0    (2)

where yi is the class label (+1 or −1). Minimizing ||w|| is equivalent to minimizing (1/2)||w||^2, so the task can be posed as a quadratic optimization problem. After solving it, the solution has the form w = Σi αi yi xi, where αi > 0 are Lagrange multipliers. Support vectors are the training examples closest to the hyperplane. An important property of the algorithm is that both the quadratic optimization problem and the decision function depend only on dot products between samples. An SVM classifier defined in this way gives poor results if the training set contains mislabeled or extremely unusual data; in other words, it is highly sensitive to noise in the input data. To overcome this problem the idea of a soft margin was proposed: some training examples are allowed to be misclassified. Introducing slack variables ξi into the quadratic problem leads to minimization of

(1/2)||w||^2 + C Σi ξi,  subject to yi(w · xi + b) ≥ 1 − ξi, ξi ≥ 0    (3)

The slack variable ξi measures how far the i-th training example violates the margin. Parameter C is the cost parameter of the soft margin: smaller values of C allow a larger number of training errors. Choosing the optimal value of C is crucial for creating a successful SVM classifier [20].

Another problem with this definition of SVM is that the algorithm cannot find a separating hyperplane if the training data are not linearly separable. In that case a kernel function is needed: the dot product is replaced by a kernel function, which implicitly projects the input data into a higher dimensional space with the goal of making them separable. In theory, every function that satisfies Mercer's condition can serve as a kernel function. In practice, a few kernel functions are most commonly used: linear, polynomial, (Gaussian) radial basis function (RBF) and sigmoid. The most used kernel function is the RBF, mainly because of its localized and finite responses across the entire range of the real x-axis. The RBF kernel is defined as

K(xi, xj) = exp(−γ ||xi − xj||^2)    (4)

where γ is a free parameter which determines the influence of every single data point on the entire learning process. Selecting a better value for γ can drastically increase the quality of the classification.
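As a brief sketch of how such a classifier is built in practice, the following Python snippet uses scikit-learn (our choice of library; the paper does not name one), with placeholder data and placeholder values for C and γ:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((200, 166))    # placeholder 166-dimensional feature vectors
y_train = rng.integers(0, 2, 200)   # placeholder binary class labels

# Soft-margin SVM with the RBF kernel of Eq. (4): C is the cost parameter
# from Eq. (3), gamma is the free kernel parameter.
clf = SVC(kernel='rbf', C=1.0, gamma=0.05)
clf.fit(X_train, y_train)
print(len(clf.support_vectors_))    # the training examples closest to the hyperplane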
1) Scaling: The quality of classification using SVM can be significantly improved if the data are scaled. The main reason for scaling is to prevent features in a greater numerical range from dominating those in a smaller range. Before using SVM, the input data should be linearly scaled to the range [−1, 1] or [0, 1]. If the training set and the test set are not scaled with the same scaling factors, accuracy can decrease drastically.
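For example, with scikit-learn's MinMaxScaler (our choice of tool), the key point is that the scaling factors are learned from the training set only and then reused unchanged on the test set:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.random.default_rng(0).random((200, 166)) * 255  # unscaled histograms (placeholder)
X_test = np.random.default_rng(1).random((50, 166)) * 255

scaler = MinMaxScaler(feature_range=(0, 1))
X_train_scaled = scaler.fit_transform(X_train)  # factors fitted on the training set
X_test_scaled = scaler.transform(X_test)        # the same factors applied to the test set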
2) Grid Search and Cross-Validation: Grid search and cross-validation are used to find good values for the parameters C and γ, with the aim of achieving the best accuracy when classifying unknown data. The idea of v-fold cross-validation is to divide the training set into v subsets with the same number of elements. Each subset is tested using the classifier obtained from the remaining v − 1 subsets. As a result, each element of the training set is classified exactly once, and the cross-validation accuracy is the percentage of correctly classified data. Cross-validation can also prevent the over-fitting problem.

Grid search is a method where various pairs of C and γ are tested in order to find the best values; the cross-validation accuracy is used to estimate parameter quality. Optimal parameter values are not known beforehand, so values from a wide range are tested first. A good method for the parameter search is to use an exponentially growing sequence of values. After a better range has been determined, a finer search is done within that range. This refinement can be repeated a few times, but not too many times, because of the possibility of over-fitting.
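A minimal sketch of this procedure with scikit-learn (our illustration; the exponent ranges below are a common convention, not the ones reported in the paper):

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X = np.random.default_rng(0).random((200, 166))   # placeholder features
y = np.random.default_rng(0).integers(0, 2, 200)  # placeholder labels

# Exponentially growing grids for C and gamma; a finer pass around the
# best pair can follow, as described above.
param_grid = {'C': (2.0 ** np.arange(-5, 16, 2)).tolist(),
              'gamma': (2.0 ** np.arange(-15, 4, 2)).tolist()}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)  # 5-fold cross-validation
search.fit(X, y)
print(search.best_params_, search.best_score_)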
3) Multiclass Problem with SVM: Ten classes are needed for digit classification, while SVM separates data into only two classes. There are two strategies for using SVM on multiclass problems. The first is One Against All: for each class, an SVM is built with data from the chosen class as positive examples and all the rest as negative. For handwritten digit recognition it would be necessary to build ten SVMs, one for each digit; the SVM for one digit would classify data for that digit into the first class and data for all other digits into the second class. The second strategy is One Against One: it builds n(n − 1)/2 SVMs, where data from each class are trained individually against each other class.

In this paper we use the One Against One method for classifying digits. In [21] it is shown that for practical use One Against One is better than the other methods. We created 45 SVMs and found optimal parameters for each one.
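The pairwise construction can be sketched as follows (scikit-learn's OneVsOneClassifier is one way to obtain it; SVC alone also uses One Against One internally):

import numpy as np
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X = np.random.default_rng(0).random((500, 166))    # placeholder features
y = np.random.default_rng(0).integers(0, 10, 500)  # ten digit classes

ovo = OneVsOneClassifier(SVC(kernel='rbf'))
ovo.fit(X, y)
print(len(ovo.estimators_))  # 10 * 9 / 2 = 45 pairwise SVMs, as in the paper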
III. EXPERIMENTAL RESULTS

For testing the proposed algorithm we used the MNIST (modified NIST) database. Images in this database are already pre-processed, so the algorithm does not include the first step. The MNIST database contains 60,000 images for training and 10,000 images for testing. The images were centered in a 28x28 field by computing the center of mass of the pixels and translating the image so as to position this point at the center of the field.

Fig. 6. Examples from the MNIST database

Our algorithm was tested on a limited set of digits. For every SVM we found optimal values for γ and C by grid search. We experimented with projections on four different axes, x, y, y = x and y = −x, and their combinations. Histograms for projections on the x and y axes contain 28 elements, because the dimension of the digit images is 28 by 28, while histograms for projections on y = x and y = −x have 55 elements. The combination of all four histograms therefore creates an input vector with 166 elements.
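Putting the pieces together, a hypothetical end-to-end reconstruction of the experiment could look as follows (fetch_openml is one way to obtain MNIST; projection_features is the sketch from Section II-A; the per-SVM parameter tuning is omitted for brevity, and training on all 60,000 images is slow):

import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

mnist = fetch_openml('mnist_784', version=1, as_frame=False)
images = mnist.data.reshape(-1, 28, 28)
X = np.array([projection_features(im) for im in images])  # 166 features per digit
X_train, X_test = X[:60000], X[60000:]                    # the usual MNIST split
y_train, y_test = mnist.target[:60000], mnist.target[60000:]

scaler = MinMaxScaler()
clf = SVC(kernel='rbf')  # handles the ten classes One Against One internally
clf.fit(scaler.fit_transform(X_train), y_train)
print(clf.score(scaler.transform(X_test), y_test))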


TABLE I
ACCURACY OF RECOGNITION ON THE MNIST DATASET

Digit      X        Y        Y = X    Y = −X
0          94.67%   98.56%   96.89%   99.11%
1          92.22%   99.00%   98.56%   97.89%
2          85.67%   93.78%   91.67%   87.56%
3          90.56%   86.56%   91.11%   75.44%
4          67.33%   97.56%   91.78%   81.22%
5          78.00%   80.44%   94.78%   87.00%
6          91.89%   99.78%   94.44%   91.44%
7          78.00%   96.44%   96.67%   87.67%
8          91.22%   96.22%   91.89%   92.11%
9          78.89%   98.44%   94.67%   83.22%
Gl. acc.   84.84%   94.68%   94.24%   92.21%

Table I shows the recognition accuracy for each digit as well as the global accuracy. From these results we can see that the best accuracy is obtained by using the histogram of the projection on the y-axis, and the results achieved with the projection on y = x are almost the same. By far the worst result is produced by the SVM with the x-axis projection as the input vector; this result is not comparable with the others. Based on this, we combined the two projections with the best results, expecting improvements. Generally, classification should improve with more projections because they provide information about the digit from different angles. Merging the horizontal and vertical histograms (the worst and the best single projections) shows how the classification efficiency rises. Next we tested whether we would get a better result by taking all histograms together as the input vector. Table II shows the accuracy of the SVM classifiers with combinations of histograms as the input vector. As expected, with more information we obtained better results. In [19] projection histograms are used along with a zoning technique for character recognition, and a recognition rate of 99.03% is reported for classifying digits from the MNIST dataset. With our proposed method, using only four projection histograms without any pre-processing or further feature extraction, a 99.05% recognition rate is obtained.

TABLE II
ACCURACY OF RECOGNITION ON THE MNIST DATASET

Digit      X, Y     Y, Y=X   Y, Y=X, Y=−X   Four axes
0          99.56%   99.44%   99.89%         99.89%
1          99.11%   99.89%   99.89%         100%
2          98.11%   99.22%   99.33%         99.78%
3          95.67%   95.00%   96.00%         96.11%
4          99.44%   99.00%   99.00%         99.44%
5          94.00%   98.22%   98.00%         98.33%
6          99.89%   99.89%   100%           100%
7          96.78%   98.00%   98.22%         98.00%
8          99.22%   99.67%   99.00%         99.78%
9          98.67%   99.22%   99.11%         99.22%
Gl. acc.   98.04%   98.76%   98.84%         99.05%

IV. CONCLUSION

In this paper we presented a system of 45 SVM classifiers for handwritten digit recognition. The results show that simple feature extraction combined with this system of SVM classifiers provides very good results. Using projections on only four axes as the input vector for the classifiers, the algorithm achieves a global accuracy of 99.05%. This result could be improved by introducing additional projection histograms. The size of the input vector could be reduced by summing two or more neighboring elements into one input element.
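As a sketch of the dimensionality reduction suggested above (our illustration), summing pairs of neighboring histogram elements would halve the 166-element input vector:

import numpy as np

features = np.arange(166.0)                    # a placeholder 166-element input vector
reduced = features.reshape(-1, 2).sum(axis=1)  # sum neighboring pairs: 166 -> 83
print(reduced.shape)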
REFERENCES

[1] R. Juntanasub and N. Sureerattanan, "Car license plate recognition through Hausdorff distance technique," in 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '05), pp. 647–651, Nov. 2005.
[2] S.-L. Chang, L.-S. Chen, Y.-C. Chung, and S.-W. Chen, "Automatic license plate recognition," IEEE Transactions on Intelligent Transportation Systems, vol. 5, pp. 42–53, Mar. 2004.
[3] N. F. Gazcón, C. I. Chesñevar, and S. M. Castro, "Automatic vehicle identification for Argentinean license plates using intelligent template matching," Pattern Recognition Letters, vol. 33, no. 9, pp. 1066–1074, 2012.
[4] J. Jagannathan, A. Sherajdheen, R. Deepak, and N. Krishnan, "License plate character segmentation using horizontal and vertical projection with dynamic thresholding," in International Conference on Emerging Trends in Computing, Communication and Nanotechnology (ICE-CCN), pp. 700–705, Mar. 2013.
[5] F. Weijian and X. Zhou, "The research on image extraction and segmentation algorithm in license plate recognition," in International Conference on Information Technology and Management Science (ICITMS 2012) Proceedings, pp. 487–494, Springer Berlin Heidelberg, 2012.
[6] F. Lauer, C. Y. Suen, and G. Bloch, "A trainable feature extractor for handwritten digit recognition," Pattern Recognition, vol. 40, no. 6, pp. 1816–1824, 2007.
[7] S. Zekovich and M. Tuba, "Hu moments based handwritten digits recognition algorithm," in Proceedings of the 12th International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases (AIKED '13), pp. 98–104, 2013.
[8] C. Kan and M. D. Srinath, "Invariant character recognition with Zernike and orthogonal Fourier–Mellin moments," Pattern Recognition, vol. 35, no. 1, pp. 143–154, 2002.
[9] Z. Man, K. Lee, D. Wang, Z. Cao, and S. Khoo, "An optimal weight learning machine for handwritten digit image recognition," Signal Processing, vol. 93, no. 6, pp. 1624–1638, 2013.
[10] M. Kang and D. Palmer-Brown, "A modal learning adaptive function neural network applied to handwritten digit recognition," Information Sciences, vol. 178, no. 20, pp. 3802–3812, 2008.
[11] D. Keysers, R. Paredes, H. Ney, and E. Vidal, "Combination of tangent vectors and local representations for handwritten digit recognition," in Structural, Syntactic, and Statistical Pattern Recognition, vol. 2396 of Lecture Notes in Computer Science, pp. 538–547, Springer Berlin Heidelberg, 2002.
[12] H. Zhang, A. Berg, M. Maire, and J. Malik, "SVM-KNN: discriminative nearest neighbor classification for visual category recognition," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2126–2136, 2006.
[13] D. Gorgevik and D. Cakmakov, "An efficient three-stage classifier for handwritten digit recognition," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), vol. 4, pp. 507–510, Aug. 2004.
[14] E. Kussul and T. Baidyk, "Improved method of handwritten digit recognition tested on MNIST database," Image and Vision Computing, vol. 22, no. 12, pp. 971–981, 2004.
[15] R. P. Duin and E. Pekalska, "The dissimilarity space: bridging structural and statistical pattern recognition," Pattern Recognition Letters, vol. 33, pp. 826–832, May 2012.
[16] L.-N. Teow and K.-F. Loe, "Robust vision-based features and classification schemes for off-line handwritten digit recognition," Pattern Recognition, vol. 35, no. 11, pp. 2355–2364, 2002.
[17] S. Chakraborty and R. Parekh, "An improved template matching algorithm for car license plate recognition," International Journal of Computer Applications, vol. 118, pp. 16–22, May 2015.
[18] P. Sermanet, S. Chintala, and Y. LeCun, "Convolutional neural networks applied to house numbers digit classification," in 21st International Conference on Pattern Recognition (ICPR), pp. 3288–3291, Nov. 2012.
[19] G. Vamvakas, B. Gatos, and S. J. Perantonis, "Handwritten character recognition through two-stage foreground sub-sampling," Pattern Recognition, vol. 43, pp. 2807–2816, Aug. 2010.
[20] L. Wang, Support Vector Machines: Theory and Applications. Springer-Verlag Berlin Heidelberg, 2005.
[21] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, pp. 415–425, Mar. 2002.
