
Pattern Recognition Letters 15 (1994) 403-408
April 1994

A Bayesian neural network for separating similar complex handwritten Chinese characters

Hong-De Chang a, Jhing-Fa Wang b,*, Shye-Chorng Kuo a

a Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan 701, ROC
b Institute of Information Engineering, National Cheng Kung University, 1 University Road, Tainan, Taiwan 701, ROC
Received 3 September 1992; revised 17 September 1993

Abstract

A Bayesian neural network for separating characters with the same number of linear-like strokes and cross-points is proposed. It is trained by an incremental learning vector quantization algorithm, which endows the system with incremental learning ability.

Key words: Handwritten Chinese character; Neural network; Shape coding; Recognition

1. Introduction

Major research activities in character recognition are now centered on the recognition of handwritten Chinese characters, which was once considered a very difficult problem and regarded as one of the ultimate goals of character recognition research. Since the number of Chinese characters is very large (at least 5401 daily-used characters), a hierarchical recognition approach is usually needed for this work; i.e., a rough classification stage first assigns the input to a group of similar characters, and a recognition scheme then resolves individual identification within the group. Here we try to use a neural network to distinguish a group of characters which contain the same number of linear-like strokes and cross-points.

Interest in neural networks is rapidly growing and several neural network models have been proposed for shape classification (Lippmann, 1987; Yong, 1988; Fukushima et al., 1983). Unlike traditional classifiers, which tend to test competing hypotheses sequentially, neural networks test the competing hypotheses in parallel, thus providing high computation rates. Additional advantages include robustness and the ability to adapt or learn. Although neural networks have the above merits, they also have the defects of lacking incremental learning ability and taking too much training time. For example, the experimental results of Wang et al. (1991) show that the training time of a three-layer backpropagation neural net model, implemented on the ANZA coprocessor board working with an IBM PC/AT compatible computer system, is 428.6 hours for 92 Chinese radicals. This is impractical in actual conditions. Here we propose a Bayesian neural network for separating a group of similar complex characters. It is trained by an incremental learning vector quantization (ILVQ) algorithm, which endows the system with incremental learning ability and fast training speed. Experimental results show the proposed Bayesian net is a fast and effective method for separating similar complex handwritten Chinese characters.

* Corresponding author.

0167-8655/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved
SSDI 0167-8655(93)E0052-P

2. Bayesian neural network

2.1. Feature vector and a subsidiary feature

The numbers and the distributions of the four primitive strokes (horizontal, vertical, left-slash and right-slash) extracted from the observed character are used to construct a 12-dimensional feature vector. The distributions represent the average position and average length of the primitive strokes. In order to increase the recognition accuracy, a subsidiary feature, the four-corner-code, is introduced to assist the network in cancelling the candidates whose four-corner shapes do not match the input one. We define eight types of stroke patterns, as shown in Fig. 1, to encode each corner of an observed character in the sequence of upper left corner, upper right corner, lower left corner and lower right corner. According to the outermost stroke pattern of each corner, a character is encoded into four codes which form a four-corner-code. The code of each corner is represented by one byte and each bit corresponds to a stroke pattern. A character may have different four-corner-codes due to handwriting variations. For example, the codes of the upper left corner and upper right corner of the characters shown in Fig. 2 are (00000001), (10000000) and (10000000), (01000000) respectively. Therefore, we can use the following formula

$$b(i) = b(i) \mid b_j(i), \qquad i = 0, 1, \ldots, 7, \quad j = 1, 2, \ldots, N, \qquad (1)$$

where N is the number of training patterns and "|" denotes the logical OR operation, to record all possible variations. By the above formula, the four-corner-code of the character in Fig. 2 is represented by (10000001), (11000001), (01000000), (01000000).
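To make the bookkeeping concrete, the following Python sketch (our illustration, not code from the paper) merges the corner codes of successive training samples with a bitwise OR exactly as Eq. (1) prescribes. The upper-corner bytes follow the Fig. 2 example; the lower-corner bytes are invented for illustration.

```python
# Sketch of the four-corner-code bookkeeping in Eq. (1): each corner
# code is one byte, bit i marks stroke pattern i, and the codes of all
# N training patterns are OR-merged to record every observed variation.

def accumulate_four_corner_code(samples):
    """samples: list of 4-tuples of corner bytes, one tuple per training
    pattern, in the order upper left, upper right, lower left, lower right."""
    code = [0, 0, 0, 0]
    for corners in samples:          # j = 1, 2, ..., N
        for k in range(4):           # the four corners
            code[k] |= corners[k]    # b(i) = b(i) | b_j(i) for every bit i
    return tuple(code)

# Two handwriting variations of one character (lower-corner bytes invented).
variants = [
    (0b00000001, 0b10000000, 0b01000000, 0b01000000),
    (0b10000000, 0b01000000, 0b01000000, 0b01000000),
]
print([format(b, "08b") for b in accumulate_four_corner_code(variants)])
# -> ['10000001', '11000000', '01000000', '01000000']
```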
2.2. Fundamental architecture

The fundamental architecture of the Bayesian neural network used in our approach contains three layers: the input layer, the Gaussian layer and the mixture layer, as shown in Fig. 3. The input layer is broadcast to all processing elements (PEs) in the Gaussian layer, and the weights between the input layer and the Gaussian layer are all set to 1. In the Gaussian layer there are initially M processing elements and each PE represents one sub-cluster of a Chinese character. We use the K-means algorithm to classify the training patterns of a Chinese character into M sub-clusters according to their feature vectors. Then, we calculate the distribution of the feature vectors which belong to the same sub-cluster to form a multivariate normal distribution. Consequently, there are M individual multivariate normal distributions for the M sub-clusters in the Gaussian layer. The mixture layer contains one PE per cluster, which connects to each Gaussian PE by weighted connections.
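The initialization just described might be sketched as follows. This is our reading, not the authors' code: we assume diagonal covariances (consistent with the per-dimension standard deviations used in Eq. (2) in Section 2.3), and the function and field names are ours.

```python
import numpy as np

def init_subclusters(patterns, M=3, iters=20, seed=0):
    """K-means over the 12-dimensional feature vectors of one character,
    then per-dimension mean/variance estimates for each of the M
    Gaussian-layer PEs (sub-clusters). Assumes every sub-cluster ends
    up non-empty."""
    rng = np.random.default_rng(seed)
    centers = patterns[rng.choice(len(patterns), size=M, replace=False)]
    for _ in range(iters):
        # assign each pattern to its nearest center
        labels = ((patterns[:, None, :] - centers) ** 2).sum(-1).argmin(1)
        # recompute centers, keeping the old one if a cluster goes empty
        centers = np.array([patterns[labels == m].mean(0)
                            if (labels == m).any() else centers[m]
                            for m in range(M)])
    return [{"mean": patterns[labels == m].mean(0),
             "var": patterns[labels == m].var(0),
             "count": int((labels == m).sum())}
            for m in range(M)]
```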

Fig. 1. Bits and their corresponding stroke patterns: vertical, horizontal, right-slash, left-slash, dot, cross, corner and empty.

Fig. 2. Two different handwriting variations of the same character and their two upper corner codes.

Fig. 3. The fundamental architecture of a Bayesian neural network: an input slab broadcast to a Gaussian slab, whose PEs feed a mixture slab.

The operation of the Bayesian network, serving as a Bayesian classifier, can be separated into two phases: the ILVQ algorithm training phase and the recognition phase. During the training phase, each PE calculates the Gaussian probability density between the input vector and its weight vector. These Gaussian probability densities are then used to select the PE with the closest weight vector if its density is above a threshold; otherwise, a new PE is created based on the input vector. In the recognition phase, the Gaussian probability densities are calculated between the test vector and each PE's weight vector in the Gaussian layer. The cluster conditional probability is approximated by a mixture of the Gaussian probability densities that are assigned to the same cluster. The cluster conditional probability is then weighted by the a priori probability of the corresponding cluster to produce the a posteriori probability. Finally, the a posteriori probability is transformed into a log distance.
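The recognition phase can be sketched as below. The paper does not spell out the mixture weights or the exact form of the log distance, so equal sub-cluster weights and a negative-log posterior score are our assumptions; gaussian_density implements the diagonal normal density of Eq. (2) in Section 2.3.

```python
import numpy as np

def gaussian_density(x, mean, var):
    """Diagonal multivariate normal density, cf. Eq. (2)."""
    return (np.exp(-0.5 * ((x - mean) ** 2 / var).sum())
            / ((2 * np.pi) ** (len(x) / 2) * np.sqrt(var.prod())))

def log_distances(x, characters, priors):
    """characters: one list of sub-cluster dicts (mean/var) per character;
    priors: a priori probability of each character (cluster).
    Returns one log distance per character; the smallest wins."""
    out = []
    for subs, prior in zip(characters, priors):
        # cluster conditional probability: mixture of the sub-cluster
        # densities assigned to this cluster (equal weights assumed)
        mixture = sum(gaussian_density(x, s["mean"], s["var"])
                      for s in subs) / len(subs)
        # weight by the a priori probability, then map to a log distance
        out.append(-np.log(prior * mixture + 1e-300))
    return out
```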
2.3. ILVQ training algorithm

Initially, we use the K-means algorithm to classify the training patterns of a Chinese character into M sub-clusters (M = 3 in our system) according to their feature vectors. Each sub-cluster forms a multivariate normal distribution. For each successive training pattern $X_k$, we calculate the Gaussian probability density $P_{m;k}(X_k \mid W_i(m))$ between each sub-cluster $W_i(m)$ and the input vector $X_k$ according to

$$P_{m;k}(X_k \mid W_i(m)) = \frac{1}{(2\pi)^{N/2} \prod_{n=1}^{N} \sigma_{i;m;n}} \exp\left[ -\sum_{n=1}^{N} \frac{(x_{k;n} - \mu_{i;m;n})^2}{2\sigma_{i;m;n}^2} \right], \qquad m = 1, 2, \ldots, M, \qquad (2)$$

where N is the dimension of the sub-cluster center and the input vector, $\mu_{i;m;n}$ is the mean of sub-cluster $W_i(m)$ and $\sigma_{i;m;n}$ represents the standard deviation in $W_i(m)$. Then we select $P_{j;k} = \max(P_{m;k})$, $m = 1, 2, \ldots, M$, and check whether or not $P_{j;k}$ is larger than a threshold TH. If it is larger than TH, then $X_k$ is assigned to the jth sub-cluster and the mean and variance of the jth sub-cluster are adapted according to

$$\mu'_{i;j;n} = \frac{\mu_{i;j;n} \times NUM_j + x_{k;n}}{NUM_j + 1}, \qquad n = 1, 2, \ldots, N, \qquad (3)$$

$$(\sigma'_{i;j;n})^2 = \frac{(\sigma_{i;j;n}^2 + \mu_{i;j;n}^2) \times NUM_j + x_{k;n}^2}{NUM_j + 1} - (\mu'_{i;j;n})^2, \qquad n = 1, 2, \ldots, N, \qquad (4)$$

$$NUM'_j = NUM_j + 1, \qquad (5)$$

where $NUM_j$ is the total number of the training patterns which belong to the jth sub-cluster. Otherwise, a new sub-cluster is constructed and $X_k$ stands for the sub-cluster center. The variance of the new sub-cluster is then defined as

$$\sigma_{i;M+1;n}^2 = \frac{1}{M} \sum_{m=1}^{M} \sigma_{i;m;n}^2, \qquad n = 1, 2, \ldots, N. \qquad (6)$$
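Under the same assumptions as the earlier sketches, one ILVQ training step might look like this. The threshold value TH is a placeholder (the paper does not report its setting), the dictionary layout matches the previous sketches, and gaussian_density is the Eq. (2) helper repeated for self-containment.

```python
import numpy as np

TH = 1e-9  # acceptance threshold; the actual value is not given in the paper

def gaussian_density(x, mean, var):
    """Diagonal multivariate normal density, cf. Eq. (2)."""
    return (np.exp(-0.5 * ((x - mean) ** 2 / var).sum())
            / ((2 * np.pi) ** (len(x) / 2) * np.sqrt(var.prod())))

def ilvq_step(x, subclusters):
    """One ILVQ step for one character's sub-clusters, after Eqs. (2)-(6):
    match x to the densest sub-cluster, then either adapt its running
    mean/variance or spawn a new sub-cluster centered at x."""
    dens = [gaussian_density(x, s["mean"], s["var"]) for s in subclusters]
    j = int(np.argmax(dens))
    if dens[j] > TH:
        s, n = subclusters[j], subclusters[j]["count"]
        new_mean = (s["mean"] * n + x) / (n + 1)                        # Eq. (3)
        s["var"] = ((s["var"] + s["mean"] ** 2) * n + x ** 2) / (n + 1) \
                   - new_mean ** 2                                      # Eq. (4)
        s["mean"], s["count"] = new_mean, n + 1                         # Eq. (5)
    else:
        # new sub-cluster: x is the center, variance is the average of
        # the existing sub-cluster variances, Eq. (6)
        avg_var = np.mean([s["var"] for s in subclusters], axis=0)
        subclusters.append({"mean": x.copy(), "var": avg_var, "count": 1})
```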
3. Experimental results

In order to demonstrate the performance of the proposed Bayesian neural network, we chose five groups of similar complex handwritten Chinese characters. The characters of each group contain the same number of linear-like strokes and cross-points. Each group corresponds to a recognition architecture which is constructed from the fundamental Bayesian neural network described previously. This recognition architecture contains three parts, Bayesian neural network, arbitration comparator and minimum selector, as shown in Fig. 4. The Bayesian neural network part consists of L fundamental Bayesian neural networks and each fundamental Bayesian neural network represents a Chinese character. For example, the chosen group S5F2 contains 18 characters, all of which have 5 linear-like strokes and 2 cross-points, as shown in Fig. 5; the number L is then equal to 18. The output of each fundamental Bayesian neural network is the distance between the input unknown pattern $X_k$ and the reference character. Namely, if $X_k$ is closest to the ith Chinese character in the network, then the output of the ith fundamental Bayesian neural network will be the minimum one.

Fig. 4. The recognition architecture for each class: L fundamental Bayesian neural networks driven by the feature vector of $X_k$, and L arbitration comparators driven by the four-corner-codes of word 1 to word L and of $X_k$, followed by a minimum selector.

The arbitration comparator uses the following formula

$$f_{SR} = \sum_{k=1}^{4} \sum_{i=0}^{7} \left[ b_k^S(i) - b_k^R(i)\, b_k^S(i) \right] \qquad (7)$$

to check whether the four-corner shape of the input character S is one of the variations of the reference character R. If the value of $f_{SR}$ is zero, which means the input character S and the reference character R have the same four-corner-code, the output of the fundamental Bayesian neural network is passed through; otherwise, the output of the arbitration comparator is set to a large value. The minimum selector selects the output with the minimum value from the arbitration comparators and identifies the unknown character.
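In code, the comparator and the minimum selector might look as follows. This is our sketch: LARGE stands in for the unspecified "large value", and we read Eq. (7) as summing over all four corner bytes.

```python
LARGE = 1e9  # stands in for the paper's unspecified "large value"

def arbitration(code_S, code_R, distance):
    """Eq. (7): f_SR counts the bits set in the input's four-corner-code
    that are absent from the reference's OR-accumulated code; zero means
    the input shape is a recorded variation of the reference."""
    f_SR = sum(bin(s & ~r).count("1") for s, r in zip(code_S, code_R))
    return distance if f_SR == 0 else LARGE

def minimum_selector(code_S, references):
    """references: one (accumulated four-corner-code, Bayesian-net
    distance) pair per character; returns the index of the winner."""
    outputs = [arbitration(code_S, code_R, d) for code_R, d in references]
    return min(range(len(outputs)), key=outputs.__getitem__)
```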
Two different files are used as the test patterns. Each file contains the five chosen groups of HCC which are listed in Table 1. The characters of the first file come from the CCL/HCCR1 (Computer and Communication Laboratories Handwritten Chinese Character Recognition) database and each character has 100 samples which are divided into two equal parts, one for training and the other for testing. The second file contains 135 Chinese characters, where each character contains 30 samples written by the authors, twenty samples for training and the others for testing. The recognition results of these two files are shown in Table 1. The average recognition rates of 95.1% and 84.7% have been obtained for these two files respectively. The training time of each Bayesian neural network depends on the number of training patterns and the number of classes. For instance, the training time of the network corresponding to the group S5F2 of the second file, containing 360 (18 × 20) training patterns, is about 45 seconds on a 486/33 MHz IBM PC/AT compatible microcomputer. These training patterns were also used to train a three-layer backpropagation neural network; its training time is about 73 minutes under the tolerance 0.3 (|desired o/p − actual o/p| < 0.3). The comparison of training times indicates the fast training speed of the Bayesian neural network.

Fig. 5. The characters of the class S5F2.

Table 1
The recognition results of the two test files

Class a    Specified person            CCL/HCCR1 database
           Learning set   Test set     Learning set   Test set

S3F1       100.0%         98.7%        96.6%          89.1%
S4F0        98.6%         94.3%        89.4%          82.7%
S5F2        99.8%         92.3%        87.3%          83.7%
S5F3       100.0%         99.1%        93.6%          88.3%
S6F0        97.2%         91.3%        86.5%          79.6%

a SmFn represents the class which contains the characters with m linear-like strokes and n cross-points.
4. Conclusions

In this paper, we have applied the Bayesian neural network to recognize similar complex HCCs and found that its training speed is faster than that of the backpropagation neural network. It can also be incrementally trained when a new character is added to the network. The experimental results also show the proposed Bayesian net is effective: in the recognition of handwritten Chinese characters written by two specified persons and selected from the CCL/HCCR1 database, the average recognition rates are 95.1% and 84.7% respectively.

References

Fukushima, K., S. Miyake and T. Ito (1983). Neocognitron: a neural network model for a mechanism of visual pattern recognition. IEEE Trans. Systems Man Cybernet. 13, 826-834.
Lippmann, R.P. (1987). An introduction to computing with neural nets. IEEE Acoust. Speech Signal Process. Mag. 4, 4-22.
Wang, J.F., H.D. Chang and J.H. Tseng (1991). Handwritten Chinese radical recognition via neural networks. In: Proc. 1991 Internat. Conf. Computer Processing of Chinese and Oriental Languages, 92-97.
Yong, Y. (1988). Handprinted Chinese character recognition via neural networks. Pattern Recognition Lett. 7, 19-25.
