Professional Documents
Culture Documents
ISBN 978-83-60810-14-9
ISSN 1896-7094
I. INTRODUCTION
HE today's systems of automatic sorting of the post
mails use the OCR (Optical Character Recognition)
mechanisms. In the present recognizing of addresses
(particularly written by hand) the OCR is insufficient.
The typical system of sorting consists of the image
acquisition unit, video coding unit and OCR unit. The image
acquisition unit sends the mail piece image to the OCR for
interpretation. If the OCR unit is able to provide the sort of
information required (this technology has 50 percent
effectiveness for all mails), it sends this data to the sorting
system, otherwise the image of the mail pieces is sent to the
video coding unit, where the operator writes down the
information about mail pieces.
Training Set
Filtration
Image
Binaryzation
Normalization
Radon
Transformation
Feature
PCA
vector
Database
Accumulator
Learning
Analysis
Recognition
Testing Set
Filtration
Image
Binaryzation
Normalization
Accumulator
Preliminary
Analysis
Classification
Radon
PCA
Transformation
495
Recognition
Character
496
Output Image
91 155 155
91 155 155
91
82
91
82
91
155
91 155 164
91
155
91 155 164
91 164 164
91 155
91 155
91
91
91 164 164
91
91
91
91
91
91
91
91
91
155
91
82
91
82
91
91
91
91
91
91
91
91
91
91
155 155
91
82
91
155 155
91
82
91
91
164 155
91
91
164
91
155
164
91
82
164 155
91 164
91
155 164
91
82
164
sorting
82
91
91
164 155
91
91
82
1
[ x , y ,1]=[i , j ,1] 0
I
0
0
1
][
mi 0 0
cos sin 0
0 m j 0 sin cos 0
0
0
1
0 0 1
x'
(1)
x
f(x,y)
if i , j
I=
f i , j
Rotation angle
So
ur
c
0
1
J
Imput
Image
Se
ns
or
s
jf i , j
J=
f i , j
i
(2)
(4)
x' is the perpendicular distance of the beam from the origin
and is the angle of incidence of the beams. The Fig. 5 illustrates the geometry of the Radon Transformation. The
MIROSAW MICIAK: CHARACTER RECOGNITION USING RADON TRANSFORMATION AND PRINCIPAL COMPONENT ANALYSIS
497
f(x,y)
x1 '
x'
f(x,y)
1
X
r k=1 k
(5)
R k = X k M k
(6)
M k=
= r R k R tk
k=1
R(x1')
R(x')
R(x')
x'
Fig 5. Geometry of the Radon Transform
x'
(7)
where:
[]
[]
X1
X
X= 2
.
Xr
(8)
M1
M
M= 2
.
Mr
(9)
498
[]
R1
R
R= 2
.
Rr
( 10 )
Principal components are calculating from the eigenvectors l and eigenvalues l of the covariance matrix . The
eigenvectors l are normalized, sorted in order eigenvalue,
highest to lowest and transponed, to obtain transformation
matrix W, where K is the number of dimensions in the dimensionally reduced subspace calculated by:
K
i
i=1
l
( 11 )
i
i=1
1 ... 1
W = . ... .
1
K
l ... l
65 115 69 63 69 119
12
37
36
72
62 54 58
68
65
43
43
62
57 54 61
63
74
36
55
82
58 62 77 113 123 52
24 30
( 12 )
where:
[]
The next step of vector features preparing is concatenate operation and Principal Component Analysis of resized matrix
from Radon Transformation, see on Fig.8.
As a result of PCA is 8 element vector L1-L8 of main values from input data, which will be used to feature vector.
The second set of data are: code of known character ZN as a
Unicode [13] and number of local maximum LP from the
Accumulator Analysis stage. The feature vector consists a 10
values Fig.9.
( 14 )
UNICODE
The final data set will have less dimensions than the original [8], after all we have r column-vector for each input image with K values :
t
Y k = y 1, y2, ... , y K
( 15 )
The PCA module in proposal system generate a set of
data, which can be used as a features in building feature
vector section. For instance when we use input matrix 8x8
from Radon transformation stage, as a result we obtained
K=8 values vector, using Cattell's criterion [9].
IV. FEATURE VECTOR
Two sets of data received from the PCA module and accumulator analysis stage are used to create vector of features of
character. The amount of data from Radon transformation
depends from the image of character size and numbers of
projections. For example, we use image with size 128x128
PCA data
ZN
LP
L1
L2
L3
L4
L5
L6
L7
L8
48
432558
-121290
13117
-276860
166130
22197
-59384
-38101
Number of
maximum
V. PRELIMINARY CLASSIFICATION
The aim of the preliminary classification is to reduce the
number of possible candidates for an unknown character, to
a subset of the total character set. For this purpose, the selected domain is categorized into six groups with number of
local maximum as in Fig.10.
MIROSAW MICIAK: CHARACTER RECOGNITION USING RADON TRANSFORMATION AND PRINCIPAL COMPONENT ANALYSIS
Database
Group 1
5
Group 6
Group 2 Group 3
432550
4 34 23 52 55 05 0
4 3 2- 15 25 10 2 9 0
4 34- 23-1152-22155112052211099230091 01 7
13117
- 1- 21 12 2111923- 30291170116778 6 0
-2 7 6 86 0
1 - 31-22137711661718867666001 3 0
166130
- 2- 72 67118666686601- 130 38001 0 1
-3 8 10 1
1 61 66- 16-33318801120002111 9 7
- 3- 83 182201222101- 125199197793 78 4
2 22- 12-55919- 95793397883448 4
- 5- 95 39 83 48 4
432550
4 34 23 52 55 05 0
4 3 - 21-521 512 021 92 09 0
4 34- 231-522155120521 0921 093 01 1 7
13117
- 1- 21 12 211 9231 -091320117 716 78 6 0
-2 7 6 86 0
1 - 312-13721167718617686 066 01 3 0
- 2- 72 671 8661 6866106166-03163 0318 031 00 1
1 61 66- 163-3183-03183 00182 1012 101 19 7
- 3- 83 182 0122 10122-19125 7919 793 78 4
2 22- 125-9195-79395 7839 483 48 4
- 5- 95 39 83 48 4
Amount of local
maximum
432550
4 34 23 52 55 05 0
4 3 2 -51 52 01 2 9 0
4 3 -4213-5212-5121502125921100923 091 01 7
13117
- 1 -2 11 22119312-013291170716 78 6 0
-2 7 6 86 0
1 - 321-17231671786116867066 01 3 0
166130
- 2 -7 261786166668016 -3130038 01 0 1
-3 8 10 1
1 6 16 -6136-3831018 0120102 11 9 7
- 3 8- 3128022111220-912517919 793 78 4
2 2 -2152-9951-739598397483 48 4
- 5 9- 53 98 34 8 4
4 34 23 52 55 05 0
432550
4 3 - 21-521512021 92 09 0
4 34- 231-5221 5512 0521 0921 093 01 1 7
13117
- 1- 21 12 211 9231 -091320117716 78 6 0
276860
1 - 312-1372 1167 718617686 066 01 3 0
166130
- 2- 72 671 8661 6866-06163-031830318 001 10 1
1 61 66- 163-3183 0318200122 1012 191 79 7
- 3- 83 182 0122-10125-191957939 783 48 4
2 22- 125-9195 7939 783 48 4
- 5- 95 39 83 48 4
Group 2
4 34 23 52 55 05 0
4 34- 231- 5221 5512 0521 092 09 0
4 342- 315- 22155120521 0921 093 01 1 7
13117
- 1 -2 11 22 119 2310- 9312 0117 716 78 6 0
-2 7 6 8 6 0
1 3- 12-1372116771861 768 066 01 3 0
166130
- 2 -7 26 718 6616 8660 616- 031 038 01 0 1
-3 8 1 0 1
1 61661- 633- 1830381 0021 120 11 9 7
22197
- 3 -8 31820 2121 021- 1915 799 73 8 4
-5 9 3 8 4
2 22-125-91957993 783 48 4
- 5 -9 5398 34 8 4
The preliminary classification is based on the amount of local maximum calculating in the Accumulator Analysis stage
Fig.11.
Accumulator
Analysis
RT
PCA
Recognition
D C i , C r = [R j A j ]
datasets contain the digit patterns of above 130 writers. Collected 920 different digits patterns for training set and 300
digits for test set. Each pattern is represented as a feature
vector of 10 elements.
Comparing results for handwritten character with other researches is a difficult task because are differences in experimental methodology, experimental settings and handwriting
database. Liu and Sako [15] presented a handwritten character recognition system with modified quadratic discriminant
function, they recorded recognition rate of above 98 %.
Kaufman and Bunke [16] employed Hidden Markov Models
for digits recognition. They obtained a recognition rate of 87
%. Aissaoui and Haouari [14] using Normalized Fourier Descriptors for character recognition, obtained a recognition
rate above 96 %. Bellili using the MLP-SVM recognize
achieves a recognition rate 98 % for real mail zip code digits
recognition task [17]. In this experiment recognition rate 94
% was obtained. The detailed results for individual testing
sets was presented in Table I.
TABLEI.
RECOGNITIONRATEFORTESTINGSETS
Testing Set
Database
( 16 )
94.7 %
Set 2
94.2 %
Set 3
93.3 %
VIII. CONCLUSIONS
The selecting of the features for character recognition can
be problematic. Moreover fact that the mail pieces have different sizes, shapes, layouts etc. this process is more complicated. The paper describes often used the character image
processing such as image filtration, binaryzation, normalization and the Radon Transformation calculating.
The character recognition algorithms were proposed. In
connection with this work the application included the algorithms is in progress. So far the application reached recognition speed 30 characters/sec without any optimization.
In the future work is planning to use another statistical
methodology such as JDA/LDA. Moreover the will be upgraded to remaining all alphanumerical signs and special
signs often placed on regular post mails.
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
Recognition rate
Set 1
j=1
where:
Ci - is the predefined character,
Cr - is the character to be recognized,
R - is the feature vector of the character to be recognized,
A - is the feature vector of the predefined character,
N - is the number of features.
The minimum distance D between unknown character feature and predefined class of the characters is the criterion
choice of the character [14].
499
[7]
500
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]