Edited by K. S. Fu
School of Electrical Engineering
Purdue University
Lafayette, Indiana
DOI: 10.1007/978-1-4615-7566-5
© 1971 Plenum Press, New York
Softcover reprint of the hardcover 1st edition 1971
A Division of Plenum Publishing Corporation
227 West 17th Street, New York, N.Y. 10011
United Kingdom edition published by Plenum Press, London
A Division of Plenum Publishing Company, Ltd.
Davis House (4th Floor), 8 Scrubs Lane, Harlesden, NW10 6SE, England
All rights reserved
No part of this publication may be reproduced in any form without
written permission from the publisher
PREFACE
†Two papers, which were not received in time to meet the publication deadline, have not been included.
K. S. Fu
Lafayette, Indiana
April 1971
CONTENTS
Kokichi Tanaka
University of Osaka
Osaka, Japan
1. INTRODUCTION
θ̂_n = 1 if Q_n > δ_n   (input contains a pattern)
θ̂_n = 0 if Q_n ≤ δ_n   (input does not contain a pattern)   (1)

where

Q_n = y_n' Σ^{-1} y_n − (y_n − H_n)' (Ξ_n + Σ)^{-1} (y_n − H_n)   (2)

δ_n = log[(1−p)²/p²] + log[|Ξ_n + Σ| / |Σ|]   (3)

P(θ_n = 1 | y^n, θ̂^{n−1}) = [1 + ((1−p)/p) (|Ξ_n + Σ|/|Σ|)^{1/2} exp(−Q_n/2)]^{-1}   (4)
where p = a priori probability that the input contains a pattern; y^n = (y_1, y_2, …, y_n) is the input sequence; and θ̂^n = (θ̂_1, θ̂_2, …, θ̂_n) is the teaching sequence. In addition, H_n and Ξ_n are the iteratively obtained estimates of the unknown pattern X and its covariance, respectively, given by the following rule:
H_{n+1} = H_n + θ̂_n Ξ_{n+1} Σ^{-1} (y_n − H_n) ;  a posteriori mean of pattern X   (5)

Ξ_{n+1} = (Ξ_n^{-1} + θ̂_n Σ^{-1})^{-1} ;  a posteriori covariance of X.   (6)
In terms of the transformed vector Z_n = y_n + Σ Ξ_n^{-1} H_n, Q_n can be put into the quadratic form

Q_n = Σ_{i=1}^{ℓ} c_ni u_ni²,   (8)

where U_n = (u_n1, u_n2, …, u_nℓ)' is a normal random vector with a unit covariance matrix, obtained from the diagonalization U_n C_n U_n' with C_n = diag(c_n1, c_n2, …, c_nℓ). Writing the vectors componentwise,

H_n = (h_n1, h_n2, …, h_nℓ)',  y_n = (y_n1, y_n2, …, y_nℓ)',  N_n = (n_n1, n_n2, …, n_nℓ)',  X = (x_1, x_2, …, x_ℓ)',

and letting a_i and ξ_ni denote the diagonal elements of Σ and Ξ_n, n'_ni = n_ni/√a_i is a standard normal random variable, and the weights are

c_ni = a_i ξ_ni / (a_i + ξ_ni),   (12)

while the means ν_ni = E(u_ni) are determined by θ_n ξ_ni x_i + a_i h_ni (eq. (16)).
SOME STUDIES ON PATTERN RECOGNITION 5
G_m(·) = chi-squared cumulative distribution function of m degrees of freedom,
ρ = arbitrary positive constant satisfying the condition ρ ≤ min_{i=1,2,…,ℓ} c_ni,
E = Σ_{i=1}^{ℓ} {E(u_ni)}² = noncentrality parameter.
Thus the probability of error at the nth time becomes

W_en = (1−p) P_e(θ̂_n=1 | θ_n=0) + p P_e(θ̂_n=0 | θ_n=1)   (17)

where

P_e(θ̂_n=1 | θ_n=0) = 1 − P_{ℓ,E₀}(δ_n^{(1)})   (18)

P_e(θ̂_n=0 | θ_n=1) = P_{ℓ,E₁}(δ_n^{(1)}).
For the modified machine the decision rule becomes

θ̂_n = 1 if Q_n^{(2)} > δ_n^{(2)},  θ̂_n = 0 otherwise,   (1'), (2')

with the diagonal matrix

D = diag(σ_d1², σ_d2², …, σ_dℓ²).   (22)
Then define H̄ = E(y_n | θ̂_n = 1). Conditioning on the true state θ_n, H̄ is a mixture that weights the pattern X by P(θ_n=1 | θ̂_n=1) and adds correction terms involving the normal distribution function Φ and the normal density φ, with P(θ_n=1, θ̂_n=1) = p{1 − Φ(d₁)} and φ'(x) = dφ(x)/dx.   (27)
Eliminating H̄ (= X) on both sides of eq. (27), we can obtain the gain α; the learning rule then becomes

H_{n+1} = H_n + θ̂_n Ξ_{n+1} Σ^{-1} (α y_n − H_n),  H_1 = M,
Ξ_{n+1} = (Ξ_n^{-1} + θ̂_n Σ^{-1})^{-1},  Ξ_1 = Φ.   (30)

This rule differs from that of the DDM only in the point that y_n is modified as α y_n.
Table 1

SN ratio | Equivalent SN ratio obtained when the signal used was uniformly distributed over L=20 dimensions | k | α | Error prob. attained at terminal phase of DDM | Error prob. for template machines
0.5  | -16 dB | 2.050  | 0.5889 | 0.3675 | 0.3619
1    | -13 dB | 1.589  | 0.7165 | 0.3155 | 0.3085
2    | -10 dB | 1.291  | 0.8336 | 0.2444 | 0.2398
5    |  -6 dB | 1.081  | 0.9441 | 0.1328 | 0.1318
10   |  -3 dB | 1.019  | 0.9849 | 0.0569 | 0.0569
20   |   0 dB | 1.0021 | 0.9980 | 0.0127 | 0.0127
Covar[η_{n+1}] = E[(η_{n+1} − X)(η_{n+1} − X)'] = K̄/n,

where

K̄ = (1/p²)[(p − p²)(M M' + Φ) + Σ].

Thus, after the example of the decision-directed machine, the learning rule equations for the non-decision-directed machine may be given by

H_{n+1} = H_n + K̄^{-1} Ξ_{n+1} (y_n/p − H_n),  H_1 = M,
Ξ_{n+1} = (Ξ_n^{-1} + K̄^{-1})^{-1},  Ξ_1 = Φ.   (35)
This means that in the non-decision-directed machine H_{n+1} and Ξ_{n+1} are always updated in an iterative manner by using the input vector y_n as an aid in making the next estimate, no matter whether the pattern was actually contained or not contained in y_n. On the other hand, the decision-directed machine applies the input vector y_n to the update only when the pattern was estimated to be present.
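The contrast between the two rules can be sketched in a scalar simulation. Everything here is illustrative, not from the paper: the decision test follows the reconstructed eqs. (1)-(3), the DD update follows (5)-(6), the NDD update follows (35), and the constant K̄ follows the reconstructed covariance expression; the numbers p = 1/2, σ² = 0.5, X = 1 are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
p, sigma2 = 0.5, 0.5          # prior prob. of a pattern, noise variance (assumed)
M, phi = 0.0, 1.0             # prior mean and variance of the pattern (assumed)
X = 1.0                       # the unknown pattern, scalar case

H_dd, xi_dd = M, phi          # decision-directed (DD) estimate
H_nd, xi_nd = M, phi          # non-decision-directed (NDD) estimate
Kbar = ((p - p*p) * (M*M + phi) + sigma2) / p**2   # effective variance of y/p

for n in range(4000):
    theta = rng.random() < p                       # is the pattern present?
    y = theta * X + rng.normal(0.0, np.sqrt(sigma2))

    # decision rule, eqs. (1)-(3)
    Q = y*y/sigma2 - (y - H_dd)**2 / (xi_dd + sigma2)
    delta = np.log((1-p)**2/p**2) + np.log((xi_dd + sigma2)/sigma2)
    if Q > delta:                                  # DD: update only when "present"
        xi_dd = 1.0 / (1.0/xi_dd + 1.0/sigma2)     # eq. (6)
        H_dd += (xi_dd/sigma2) * (y - H_dd)        # eq. (5)

    # NDD: update on every input, using y/p, eq. (35)
    xi_nd = 1.0 / (1.0/xi_nd + 1.0/Kbar)
    H_nd += (xi_nd/Kbar) * (y/p - H_nd)

# H_nd is unbiased; H_dd may carry a selection bias (and, because learning is
# unsupervised, can in rare runs lock onto a spurious pattern of the wrong sign)
print(H_dd, H_nd)
```

The NDD estimate converges to the pattern at the cost of a larger effective noise K̄; the DD estimate converges faster but, as the paper notes, is biased because misclassified inputs enter the average.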
d. Experimental Results
[Figure 1-(a). Normalized Squared Error E_n as a Function of Time in Learning Processes for Four Machines Used; the legend distinguishes the NDD, DDM, and supervised machines, with time running from 10 to 500.]
X_n = −a X_{n−1} + Z_n   (39)

and the measurement is given by

y_n − θ_n M = θ_n X_n + N_n.   (40)

The optimal estimation of this system may be given by [15]

X̂_{n/n} = −a X̂_{n−1/n−1} + θ_n P_{n/n} Σ^{-1} (y_n + a X̂_{n−1/n−1})   (41)

P_{n/n−1} = a P_{n−1/n−1} a' + Ψ   (42)

P_{n/n} = [P_{n/n−1}^{-1} + θ_n Σ^{-1}]^{-1}   (43)
where X̂_{i/j} is the best estimate of X_i when the measurements in the 1,2,…,jth intervals are given, and P_{i/j} is its covariance. In our pattern recognition problem, H_n corresponds to X̂_{n/n−1}, and Ξ_n corresponds to P_{n/n−1}. Further, the single-stage optimal predicted estimate is given by

h_n = X̂_{n/n−1} = −a X̂_{n−1/n−1}.   (44)
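Equations (39)-(43) form a Kalman filter whose measurement is informative only when the pattern is present (θ_n = 1). A minimal scalar sketch, with a, Ψ, Σ, and p chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
a, psi, sigma2, p = 0.9, 0.2, 0.5, 0.5   # all values assumed for illustration
x, xhat, P = 0.0, 0.0, 1.0               # state, estimate X_hat_{n/n}, covariance

sq_err = []
for n in range(2000):
    x = -a * x + rng.normal(0.0, np.sqrt(psi))        # state transition, eq. (39)
    theta = int(rng.random() < p)                     # pattern present or absent
    y = theta * x + rng.normal(0.0, np.sqrt(sigma2))  # measurement, eq. (40), M = 0

    P_pred = a * P * a + psi                          # eq. (42)
    P = 1.0 / (1.0 / P_pred + theta / sigma2)         # eq. (43): no shrink if theta=0
    xhat = -a * xhat + theta * (P / sigma2) * (y + a * xhat)   # eq. (41)
    sq_err.append((x - xhat) ** 2)

print(np.mean(sq_err))   # below the stationary state variance psi/(1 - a**2)
```

On steps with θ_n = 0 the covariance simply propagates through eq. (42), exactly as Ξ_n does in the learning rule when no pattern is detected.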
Substituting (43) into (42), and replacing P_{n/n−1} with Ξ_n, we obtain

Ξ_n = Ψ + a (Ξ_{n−1}^{-1} + θ_{n−1} Σ^{-1})^{-1} a'.

This is the same as (38-3). Multiplying (41) by −a from the left side, and using (44), we obtain

h_{n+1} = −a {h_n + θ_n P_{n/n} Σ^{-1} (y_n − h_n)}.

Then

h_{n+1} = −a {h_n + θ_n (Ξ_n^{-1} + Σ^{-1})^{-1} Σ^{-1} (y_n − h_n)}.

This is the same as (38-2). The behavior of H_n and Ξ_n as n → ∞ is well known in optimal estimation theory, though few explicit statements of it have been made.
H_1 = M, Ξ_1 = Φ, and for n = 2,3,… the pair (H_n, Ξ_n) is obtained by the recursions (45) and (46), which propagate auxiliary quantities formed from a, Ψ^{-1}, θ_k Σ^{-1}, and the innovations y_k − M. [Equations (45)-(46) are not legible in this copy.]
[Figure 2. Autocorrelation coefficients ρ_i of the simulated autoregressive processes, plotted against the time difference n; process ① has (α₁, α₂) = (−1.899, 0.901) with ρ₁ ≈ 0.9950, ρ₂ ≈ 0.9902, and the 2nd-order process ② has (α₁, α₂) ≈ (−0.949, 0.951) with ρ₂ ≈ 0.9901.]
Under the conditions of L=20, p=1/2, M=0, Σ=σ²I, Φ=φI, Ξ_n=ξ_n I, Ψ=ψI, m=5, and A_i=a_i I, we have simultaneously made computer simulations of the three types of machines discussed above, i.e., the DD machine, the MDD machine, and the supervised machine, for some of the processes having the autocorrelations shown in Fig. 2. The averaged results of 100 runs are shown in Figures 3-(a) and 3-(b). PNR is a nominal energy ratio of pattern to noise defined by PNR = 10 log(φ/σ²) [dB]. Therefore, the actual pattern-to-noise ratio fluctuates momentarily around the (nominal) PNR according to the fluctuation of the pattern. Symbols used in Figures 3-(a) and 3-(b) are as follows:
E_n = E[(H_n − X_n)'(H_n − X_n)] / E[X_n' X_n] :  normalized squared error,

P_en = P(θ_n=1) P(θ̂_n=0 | θ_n=1) + P(θ_n=0) P(θ̂_n=1 | θ_n=0) :  probability of dichotomization error,

E = E_n in the stationary region of the 1st-order autoregressive process.

Since all three machines have the function of discriminating the measurements by the normalized distance between the estimate H_n of the pattern and the measurement y_n, we used the squared error of H_n in addition to the probability of error as the performance index of the machines.
[Figure 3-(a). Normalized Squared Error of Estimates for 2nd-Order Autoregressive Process ② in Figure 2 (E_n plotted against time n; the dashed curve is the supervised machine).]
[Figure 3-(b). Probability of Error for the DD, MDD, and Supervised Machines as a Function of Time n (PNR = 0 dB).]
From these results, we can see that the effect of the learning appears within the time span having a comparatively large autocorrelation coefficient ρ_i; as ρ_i becomes smaller, the process reaches the stationary state, where the variation of the pattern balances the quantity of the learning. For example, in Figure 2, the autocorrelation of ② becomes very small after the time difference 50. Correspondingly, in Figures 3-(a) and 3-(b), the performances of the machines improve little after time 50. We can also see that the performance of the MDD machine is better than that of the DD machine, especially when the noise is large. This is for two reasons. First, the structure of the MDD machine is closer to that of the optimum nonsupervised machine than that of the DD machine. Second, the DD machine is compelled to use biased measurements to update H_n. The bias accompanies the DD approach because misclassified samples are included in the estimator H_n.
4. CONCLUSION
REFERENCES
Y. T. Chien
I. INTRODUCTION
LINEAR AND NONLINEAR STOCHASTIC APPROXIMATION 19
Although the treatment here will be in the context of parameter estimation and pattern recognition, the concepts and algorithms developed are applicable to other learning systems involving identification or recognition.
Decide X ∈ C_j if p(X/C_j) = max_i p(X/C_i).   (1)

Since the choice is determined by the maximum of the m values p(X/C_j), one can just as well compute an arbitrary monotone function of p(X/C_j), the likelihood function, to select C_j. This is the well-known maximum likelihood procedure. What is implied in this procedure is that one has been given a complete statistical description of the m classes of patterns in terms of the m probability densities. This is of course an unrealistic assumption in practical applications. A more realistic case in learning to recognize patterns is to assume that a complete statistical description is not given. In fact, through a learning process, the conditional probability densities should be allowed to change on the basis of pattern vectors that have been previously classified. These vectors are commonly called learning samples, denoted by X₁, X₂, …, X_n, with the subscript indicating the sequence of classification. Since these
vectors are classified samples, we can assume a sequence of learning samples X₁, X₂, …, X_n to be the n learning samples for each and every conditional density under consideration.
X_i = M + N_i.

It has been shown by Chien and Fu [2] that the successive estimate M_n for M can be obtained by using a stochastic approximation algorithm of the following type:
M_n = M_{n−1} + γ_n (X_n − M_{n−1})   (4)

where γ_n is a sequence of numbers satisfying certain conditions. The recursive estimate M_n defined in (4) will converge to the true mean M in mean square and with probability one. In addition, if γ_n is chosen according to the initial guess M₀ as in [1], the mean square error for M_n computed for each n is minimized. In terms of our recognition problem, we see that the recursive algorithm defined in (4) enables the recognition device to lock onto the unknown patterns by learning the mean vector for each and every class. The optimal recognition performance (with minimum probability of misclassification) is achieved in the long run as the number of learning samples increases indefinitely.
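With γ_n = 1/n the recursion (4) is exactly the running sample mean; a minimal sketch (the two-dimensional mean vector is an assumed example):

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean = np.array([1.0, -2.0])          # unknown class mean M (assumed)
M = np.zeros(2)                            # initial guess M_0

samples = rng.normal(true_mean, 1.0, size=(5000, 2))
for n, X in enumerate(samples, start=1):
    M = M + (X - M) / n                    # eq. (4) with gamma_n = 1/n

# the recursion reproduces the batch sample mean, which converges to true_mean
print(M)   # close to [1, -2]
```

Other gain sequences γ_n satisfying Σγ_n = ∞ and Σγ_n² < ∞ also converge; γ_n = 1/n is the choice that makes M_n the exact sample average.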
where

D_n = X_n − M_{n−1}.
Assumption (iii). For each N_i, type I noise can occur with a probability (1 − α) and type II noise can occur with a probability α, where α « 1. Thus if P₁(N) and P₂(N) denote the probability densities of type I and type II noise, respectively, then the noise component of X_i can be described by a mixture distribution, with probability density P(N):

P(N) = (1 − α) P₁(N) + α P₂(N).   (6)
Let

d(D_n) = D_n' K₁^{-1} D_n.

Note that d(D_n) can be considered as a measure of the distance between the current learning sample X_n and the previous estimate for the mean, M_{n−1}.
We can discuss the behavior of F_n(D_n) in (8) by finding its asymptotes in terms of the measure d(D_n).
Combining case (I) and case (II), we can characterize the asymptotic behavior of F_n(D_n) in the following way: If the current learning sample X_n is sufficiently close to the previous estimate M_{n−1}, F_n(D_n) takes the form of D_n, weighted by a factor K_{n−1} K₁^{-1}. If X_n is far away from M_{n−1} (which most likely signifies an unreliable sample), F_n(D_n) is essentially D_n but weighted by a factor K_{n−1} K₂^{-1}. Since K_{n−1} K₂^{-1} is negligible (σ₂ » 1), a good approximation to F_n(D_n) can be mathematically described by selecting a threshold, say θ, so that
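In effect, the nonlinear rule refuses to average in samples whose deviation exceeds the threshold θ. A sketch with contaminated scalar noise (all constants are assumed, and the hard rejection here stands in for the K-matrix weighting of the actual algorithm):

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, s1, s2 = 0.1, 1.0, 20.0     # mixture: type I noise sd s1, type II sd s2
M_true, theta = 5.0, 9.0           # unknown mean, rejection threshold (assumed)

M_lin, M_nl, n_eff = 0.0, 0.0, 0
for n in range(1, 4001):
    sd = s2 if rng.random() < alpha else s1
    X = M_true + rng.normal(0.0, sd)

    M_lin += (X - M_lin) / n       # linear algorithm: every sample counts

    if abs(X - M_nl) <= theta:     # nonlinear algorithm: reject far-away samples
        n_eff += 1
        M_nl += (X - M_nl) / n_eff

# both land near 5; the thresholded estimator fluctuates far less
print(M_lin, M_nl)
```

Counting only accepted samples (n_eff) mirrors the "number of effective learning samples" on the abscissa of the figure that follows.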
[Figure: Percentage of correct classification (30 to 90) versus the number of effective learning samples (10 to 70, with a logarithmic scale extending to 1000) for the nonlinear algorithm (dots) and the linear algorithm (crosses); normal distribution with mean value m = 10, standard deviation σ₁ = 5, and α = 0.1.]
REFERENCES
ACKNOWLEDGMENT
University of Tokyo
Tokyo, Japan
I. INTRODUCTION
30 MORISHITA AND TAKANUKI
[Figure 1. Block diagram of the classifier: the input pattern X(t) is weighted by the weight vectors w_i(t), and a maximum selector picks the largest product w_i(t)'X(t).]
Assume that patterns in the input sequence X(t) are those obtained from a distribution by random sampling. Since during the interval h a large number of patterns are presented, we can put
MULTI-CATEGORY PATTERN CLASSIFICATION 33
P =
[ Σ_{j≠1} P_1j   −P_12          ⋯   −P_1i          ⋯   −P_1M        ]
[ −P_21          Σ_{j≠2} P_2j   ⋯   −P_2i          ⋯   −P_2M        ]
[   ⋮ ]
[ −P_i1          −P_i2          ⋯   Σ_{j≠i} P_ij   ⋯   −P_iM        ]
[   ⋮ ]
[ −P_M1          −P_M2          ⋯   −P_Mi          ⋯   Σ_{j≠M} P_Mj ]   (19)
let us consider

(21)

or

J(w₁, w₂, …, w_M) = E{ Σ_{i=1}^{M} ( f_i(X; w₁, w₂, …, w_M) − λ_i )² }.   (22)

Putting

2T dw_j(t)/dt = −∂J/∂w_j

yields

T dw_j(t)/dt + w_j(t) = ∫ χ_j(X; w₁(t), w₂(t), …, w_M(t)) X p(X) dX.   (26)

It has been shown that the algorithm minimizes the criterion function J. Substituting a steady-state solution into (22) yields a minimum value of J:
V. AN EXAMPLE
Let us assume that none of the four subspaces is null. Then there exist four boundaries B₁₂, B₂₃, B₃₄, B₄₁. We may describe them by the equations

B_i: [r cos θ_i, r sin θ_i]',  i = 1,2,3,4.   (32)

The weight vectors w_i* and the angles θ_i must satisfy the equations

w_i* = ∫_{X_i*} X q(X) dX = a r² ∫_{θ*_{i−1}}^{θ*_i} [cos θ, sin θ]' sin² 2θ dθ,  i = 1,2,3,4,   (33)

which yield

w₁* = a r² [8/15, 8/15]',  w₂* = a r² [−8/15, 8/15]',  w₃* = a r² [−8/15, −8/15]',  w₄* = a r² [8/15, −8/15]',   (36a)

θ_i* = (i/2) π,  i = 1,2,3,4.   (36b)
[Figure 3. The weight vectors w_i' and boundaries for the example: panels (a)-(g) show the patterns x_i' together with the solutions given in eqs. (39)-(44).]
w₁* = a r² [1.037, 0]',  w₂* = a r² [−0.513, 0.589]',  w₃* = a r² [−0.513, −0.589]',   (39a)

θ₁* = 1.209,  θ₂* = π,  θ₃* = −1.209,   (39b)
or

w₁* = a r² [0.500, 0]',  w₂* = a r² [−0.250, 0.898]',  w₃* = a r² [−0.250, −0.898]',   (40a)

θ₁* = 0.676,  θ₂* = π,  θ₃* = −0.676,   (40b)
or

w₁* = a r² [0.537, 0.537]',  w₂* = a r² [−0.863, 0.326]',  w₃* = a r² [0.326, −0.863]',   (41a)

θ₁* = 1.721,  θ₂* = (3/4) π,  θ₃* = 0.150.   (41b)
The three solutions are shown in Figure 3(c), (d) and (e), respectively. Clearly, rotating each solution by the angle π/2, π or 3π/2 yields another solution.
Now let

w₃* = w₄* = 0.   (42)

Then we obtain

w₁* = a r² [16/15, 0]',  w₂* = a r² [−16/15, 0]',   (43a)

θ₁* = (1/2) π,  θ₂* = −(1/2) π,   (43b)
or

w₁* = a r² [7√2/15, 7√2/15]',  w₂* = a r² [−7√2/15, −7√2/15]',   (44a)

θ₁* = (3/4) π,  θ₂* = −(1/4) π.   (44b)
The two solutions are shown in Figure 3(f) and (g), respectively. Clearly, rotating each solution by the angle π/2 yields another solution.
[Figure 4. Sample input patterns (dots) in the (x₁, x₂) plane and the learned weight vectors w_i: panels (a) and (b) show the scatter of the patterns together with the weight vectors to which the algorithm converged.]
VIII. CONCLUSIONS
REFERENCES
ACKNOWLEDGMENT
WITHOUT A TEACHER
Osaka University
Osaka, Japan
I. INTRODUCTION
A MIXED-TYPE NON-PARAMETRIC LEARNING MACHINE 43
instead of the teacher. Therefore, it is very difficult for the
machine to learn by itself.
II. THE MODEL
[Figure 1. The model: a linear classifier whose output z is fed to a detector.]
and

X_k = (x(t₀), x(t₀ + τ), …, x(t₀ + (m−1)τ)),  X_k = X_k¹ or X_k⁰,
46 SHIMURA
where X_k¹ is the k-th "signal" pattern, that is, the input pattern in which an actual signal exists with noise, and X_k⁰ is the k-th "no signal" pattern.

(12)

where α₁, α₂ are the numbers of "no signal" patterns Y_k⁰ when η_j = −1, +1, respectively, and β₁, β₂ are the numbers of "signal" patterns Y_k¹ when η_j = +1, −1, respectively.
U = (2p−1)(Σ_i w_i s_i − θ) + (2q−1) θ − m s²/2 − m σ².   (15)

Thus

lim p(Q > 0) = 1.   (20)
q = prob.(η = −1 | X_k⁰) = (2/√(2π)) ∫₀^{√m θ/σ} exp(−t²/2) dt,   (22)

and p = prob.(η = +1 | X_k¹) is given by the corresponding integral (23), whose limit involves s̄₀ = (1/m) Σ_i s_i. Thus as θ increases, q also increases, but p becomes smaller. In the case considered below, q is approximately 0.68.
[Figure 3. Relations Between the Probability of Correct Answers p and the Number of Dimensions m in a Mean Value Detector (p from 0.4 to 1.0, m from 10 to 1000 on a logarithmic scale).]

[A second figure with the same axes follows; its caption is lost.]

V. TWO SIGNALS

Consider the case in which there exist two signals S₁ and S₂. When the input patterns are classified into the two classes "signal S₁"
and "signal S₂" patterns, the decision rule of the linear classifier shown in Figure 1 is

d(X_k) = H_k¹ (X_k = S₁ + N_k)  if Σ_i w_i x_ki ≥ θ
d(X_k) = H_k² (X_k = S₂ + N_k)  if Σ_i w_i x_ki < θ.   (26)
U becomes positive if p > 0.5 and q > 0.5. The probability that Q is positive can be written in the form

p(Q > 0) = (1/√(2π)) ∫_{−μ}^{∞} exp(−t²/2) dt,

where μ = U/σ_U. Thus we have

lim p(Q > 0) = 1.
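A toy version of this teacher-less learning: the classifier labels each input with its own decision (26) and then reinforces its weights by a fixed increment toward that decision. The antipodal signals, the threshold θ = 0, and the increment c are all assumed values.

```python
import numpy as np

rng = np.random.default_rng(3)
m, c = 20, 0.1                             # dimensions, fixed increment (assumed)
S1, S2 = np.ones(m), -np.ones(m)           # two antipodal signals (assumed shapes)
w = rng.normal(0.0, 0.1, m)                # initial weights

correct = 0
for n in range(2000):
    is_S1 = rng.random() < 0.5
    X = (S1 if is_S1 else S2) + rng.normal(0.0, 2.0, m)
    decided_S1 = (w @ X >= 0.0)            # decision rule (26) with theta = 0
    w += c * X if decided_S1 else -c * X   # fixed increment toward own decision
    if n >= 1500:                          # score the last 500 decisions
        correct += (decided_S1 == is_S1)

acc = correct / 500
print(max(acc, 1 - acc))   # high: the machine separates the two classes
```

Without a teacher the labeling of the two clusters is arbitrary, hence the max(acc, 1 − acc): the machine may converge to either sign of the weight vector.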
[Learning-process curves (caption lost) comparing the unsupervised and supervised machines at S/N = 3, 0, −3, and −6 dB.]

[Figure 7. Learning Process when θ = 1.21σ² and 1.10σ² (p against n, 0 to 250).]

[Figure 8. Learning Process when m = 10, 20 and 40 (S/N from 3 dB to −6 dB; p against n, 0 to 250).]
[Figure 9. Learning Process by a Fixed Increment Rule and an Absolute Correction Rule (p against n, 0 to 150).]
[Figure 10. Learning Process in the Case There Exist Two Signals S₁ and S₂ (S/N = 3, 0, −3, −6 dB; p against n, 0 to 200).]
VII. CONCLUSIONS
ACKNOWLEDGEMENT
REFERENCES
Osaka University
Osaka, Japan
1. INTRODUCTION
RECOGNITION SYSTEM FOR HANDWRITTEN LETTERS 57
[Figure 1. Schematic Illustration of the Visual System in a Higher Animal: receptors, bipolar cells, and a section of the retina.]
Though the mechanism of the human brain has not been revealed, its excellent recognition faculties are possibly achieved by an information-processing mechanism like the one mentioned above.
the output q(x, y) is obtained from the input p through the weighting function w(r),   (4)

[Figure 2. Models of Lateral Inhibition Structure: Forward Inhibition (a) and Backward Inhibition (b).]

w(r) = K₁ exp(−r²/σ₁²) − K₂ exp(−r²/σ₂²),   (5)
where K₁ and K₂ are coefficients of the excitatory and inhibitory connections, respectively. Also, σ₁ and σ₂ are parameters representing the spread of these neural connections.
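The display above is not fully legible, but an excitatory-inhibitory pair of Gaussian spreads of this kind is the classic difference-of-Gaussians model. A sketch with assumed constants, showing the edge-enhancing effect of such a weighting on a one-dimensional step:

```python
import numpy as np

def w(r, K1=2.0, K2=0.8, s1=1.0, s2=2.5):
    # excitatory centre (K1, s1) minus wider inhibitory surround (K2, s2)
    return K1 * np.exp(-(r / s1) ** 2) - K2 * np.exp(-(r / s2) ** 2)

kernel = w(np.abs(np.arange(-5.0, 6.0)))              # sampled weighting function
signal = np.concatenate([np.zeros(20), np.ones(20)])  # a step edge
out = np.convolve(signal, kernel, mode="same")

# lateral inhibition produces undershoot and overshoot around the edge
print(out.min() < -0.3, out.max() > 0.5)   # → True True
```

The same centre-surround kernel, tiled over a 2-D cell plane, is what turns the retina-like layer into the property filters described next.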
[Figure 3. Property A and property B (details lost).]

[Figure 4. Property Filter Using Lateral Inhibition Structure (a) and its Elemental Unit (b).]
When the nerve connection has symmetry, Eq. (5) can be expressed as

q_K = Σ_{i=0}^{n} √(N_i) p_{K,i} w_i = Σ_{i=0}^{n} K_i W_i,   (6)

K_i = √(N_i) p_{K,i},  W_i = √(N_i) w_i,

Σ_{i=−n}^{n} w_i² = Σ_{i=0}^{n} W_i² = 1.   (7)
Equation (8) expresses the filter output q_{M,M_j,…} for an (n+1)-cell neighborhood in terms of the m−1 surrounding cells M_u, …, M_v. Therefore, it is possible to obtain the maximum value of q_A from this equation.
(11)

[Diagram: the properties to be extracted, including +- and X-type intersecting portions and end portions.]

Figure 7 gives the
coefficients of the property filters designed by the method described in Section 4. The properties for +- and X-type intersecting portions and for end portions are extracted isotropically, and the remaining properties are filtered anisotropically. The property extraction is not affected by the property's position, but it is restricted with respect to the length and thickness of the line. For example, a horizontal-segment filter can detect a horizontal line having a thickness of 1 or 2 cells (and partially 3 cells), but not one of greater thickness. Figure 8 shows the property positions extracted by each filter.
[Figure 7 residue: the coefficient masks of the property filters, with thresholds θ = 5 for the right-downward oblique line, θ = 38 for the +-type intersecting point, θ = 26 for the X-type intersecting point, θ = 10 for one curve type, and θ = −8 for the end point; masks for the remaining curve types and end points follow. The numeric mask entries are too garbled to reproduce.]
The cell outputs of each filter are scanned, and when a cell emitting output is found, the counter corresponding to the property is incremented by 1. The scanning procedure is then resumed, skipping over the vicinity of the discovered property position. The size of the vicinity is chosen suitably for each property; in the case of a horizontal segment, about 3 mesh lines is suitable. Each extracted property is treated as follows. The straight-line group (horizontal, vertical, and oblique lines) is classified according to whether each count is 0, 1, 2, or 3 or more. Curves are classified according to 0, 1, or 2 or more, and +- and X-type intersecting points are classified according to their presence or absence. In the case of end points, the center of gravity of the letter is computed and the image plane is divided into 4 quadrants. The properties about ends are classified as to whether the number appearing in each quadrant is 0, 1, or 2 or more. Thus, the property vector, which consists of 44 bits of information, is obtained.
and the letter is identified based on the maximum among the functions Z_i. The coefficients a_ij are adjusted as

a_ij = log P(C_i/x_j)   (13)

on the basis of the conditional probability P(C_i/x_j) that the input character is C_i when a property x_j is filtered. However, assuming that the characters are given with uniform probability,

a_ij = log P(x_j/C_i)   (14)

may be used instead of Eq. (13) according to Bayes' rule. Here, P(x_j/C_i) is the conditional probability that the property x_j will be filtered when the character C_i is given. If the property vector X is a linearly separable pattern, it will ultimately be possible to construct a classifier which makes the correct identification.
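Equation (14) makes the identifier a naive-Bayes-style linear machine: each class is scored by summing log-likelihood weights over the filtered properties. A toy sketch with made-up probabilities for two classes and three binary properties; the extra log(1 − p) term for absent properties is our addition for a proper likelihood, not stated in the text.

```python
import math

# assumed numbers: P(property j is filtered | class C_i)
p_given = {"C1": [0.9, 0.2, 0.7], "C2": [0.3, 0.8, 0.4]}

def score(cls, x):
    # Z_i = sum_j a_ij, with a_ij = log P(x_j | C_i) as in eq. (14)
    return sum(math.log(pj if xj else 1.0 - pj)
               for pj, xj in zip(p_given[cls], x))

x = [1, 0, 1]                               # observed 3-bit property vector
best = max(p_given, key=lambda c: score(c, x))
print(best)   # → C1
```

With the 44-bit property vector of the previous section, the same maximum-of-Z_i rule applies, one score per character class.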
6. IDENTIFICATION EXPERIMENT
[Figure 9. Learning Process of the Recognition System for Handwritten Numerals: the numbers of correct answers, rejections, and incorrect answers per writer are plotted against the writer number (10 to 100).]
[Table (entries lost): number of characters, correct answers, incorrect answers, and rejects.]
7. CONCLUSION
J. M. Mendel
I. INTRODUCTION
SEQUENTIAL IDENTIFICATION 71
[Block diagram: the black box produces y(k) from the input u_k, with measurement noise v(k) giving q(k); the model output z(k) is compared with q(k) to form the error e(k), which drives the algorithm that adjusts the model gains R(k) = diag{h₁, …, h_n}.]
V. COMPUTATIONAL REQUIREMENTS
[The a posteriori algorithm's update of b̂_{k+1} from b̂_k, P(k+1), and R(k) is garbled in this copy.] These figures assume a 4 μsec add time, 24 μsec multiply and divide times, a 4,096 word memory capacity, and a 20 bit word length.
VI. CONCLUSIONS
Table III
COMPUTATIONAL REQUIREMENTS (NOISY MEASUREMENTS)

                            A Posteriori Algorithm    A Priori Algorithm
Computational Time (μsec)       131 + 297 n               47 + 303 n
Program Size (Cells)             46 + 4 n                 44 + 4 n
A. Problem 1
[Figure: identification loop for Problem 1 — real-world dynamics and black box with additive noise v(k); the model output z(k) is compared with q(k) to form e(k), which drives the algorithm producing R(k).]
B. Problem 2
C. Problem 3
[Figure: the corresponding identification loop for Problem 3 (real-world dynamics, model, and algorithm R(k)).]
REFERENCES
Nagoya University
Nagoya, Japan
1. INTRODUCTION
80 FUNAHASHI AND NAKAMURA
2. IDENTIFICATION ALGORITHM
STOCHASTIC APPROXIMATION ALGORITHMS 81
[Figure: the identifier observes the input u(k) and the noise-corrupted output w(k).]

U_k = (u(k), u(k−1), …, u(k−N+1))',  φ = (φ₁, φ₂, …, φ_N)',  φ^k = (φ₁^k, φ₂^k, …, φ_N^k)'.   (6)
Define the estimate φ̂ of the parameter vector φ as the vector which minimizes the mean square error criterion E{(Z(k+1) − (φ, U_k))²}. This estimate φ̂ is computed recursively by means of the Albert and Gardner type of stochastic approximation procedure [4]; i.e.,

φ^{k+1} = φ^k + ρ_k (U_k/‖U_k‖²) [Z(k+1) − m(k+1)],  k = N, N+1, N+2, …   (7)

where m(k+1) = (φ^k, U_k) is the model output.
Theorem 1. If the conditions (A), (B) and (C) are met, the estimated parameter sequence {φ^k} defined by (7) converges to the true parameter value φ in the mean square sense.

(A) The smoothing coefficients are a sequence of positive numbers such that

Σ_{k=N}^{∞} ρ_k = +∞,  Σ_{k=N}^{∞} ρ_k² < +∞.

(B) The input sequence {u(k)} is such that

lim inf_{n→∞} λ_min( (1/n) Σ_{j=N}^{n} U_j U_j'/‖U_j‖² ) = a > 0   (8)

where λ_min(A) denotes the minimum eigenvalue of the symmetric matrix A.

(C) The noise sequence {w(k)} is white noise and is independent of the input sequence {u(k)}.
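A minimal simulation of algorithm (7) identifying a 4-tap linear system; the decay ρ_k = k^{−0.7} satisfies condition (A), and a white Gaussian input satisfies (B) and (C). All numbers are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 4
phi_true = np.array([0.5, -0.3, 0.2, 0.1])   # unknown parameter vector phi
phi = np.zeros(N)                            # initial estimate

u = rng.normal(0.0, 1.0, 6001)
for k in range(N - 1, 6000):
    U = u[k - N + 1 : k + 1][::-1]           # U_k = (u(k), ..., u(k-N+1))', eq. (6)
    Z = phi_true @ U + rng.normal(0.0, 0.1)  # Z(k+1), white measurement noise
    m = phi @ U                              # model output m(k+1)
    rho = 1.0 / (k + 1) ** 0.7               # (A): sum rho = inf, sum rho^2 < inf
    phi = phi + rho * U / (U @ U) * (Z - m)  # eq. (7)

print(phi)   # close to [0.5, -0.3, 0.2, 0.1]
```

The normalization by ‖U_k‖² is what distinguishes this Albert-Gardner form from the plain LMS gradient step; it is also what makes condition (B) a statement about the normalized regressors.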
Subtracting φ from both sides of (7) and writing φ̃^k = φ^k − φ, we obtain

φ̃^{k+1} = [I − ρ_k U_k U_k'/‖U_k‖²] φ̃^k + ρ_k (U_k/‖U_k‖²) w(k+1),   (10)

and hence

φ̃^{n+1} = Π_{j=N}^{n} [I − ρ_j U_j U_j'/‖U_j‖²] φ̃^N + Σ_{j=N}^{n} Π_{l=j+1}^{n} [I − ρ_l U_l U_l'/‖U_l‖²] ρ_j (U_j/‖U_j‖²) w(j+1),   (11)

where Π_{j=m}^{n} means the matrix backward product, i.e., A_n A_{n−1} ⋯ A_m. Putting

P_m^n = Π_{j=m}^{n} [I − ρ_j U_j U_j'/‖U_j‖²],   (12)

we must bound

E ‖ Σ_{j=N}^{n} P_{j+1}^n ρ_j (U_j/‖U_j‖²) w(j+1) ‖².   (13)

As {w(k)} is a sequence of mutually independent, zero-mean variables independent of {U_k}, the cross terms are zero, so (13) becomes a sum of squared terms, each bounded using

‖P_{k+1}^n‖² ≤ exp( −2 λ_min( Σ_{j=k+1}^{n} ρ_j U_j U_j'/‖U_j‖² ) );

the noise contribution is therefore at most

Σ_{k=N}^{n} σ_w² ρ_k² exp( −2 λ_min( Σ_{j=k+1}^{n} ρ_j U_j U_j'/‖U_j‖² ) ) = Σ_{k=N}^{n} a_{n,k}.

The first term of (11) tends to 0 as n → ∞ by condition (B). For any fixed k, a_{n,k} → 0 as n → ∞, and

Σ_{k=N}^{n} a_{n,k} ≤ Σ_{k=N}^{∞} ρ_k² < +∞,

so the Toeplitz lemma [5] implies Σ_{k=N}^{n} a_{n,k} → 0. Thus the second term on the right-hand side of (11) tends to zero in mean square; i.e., E‖φ^n − φ‖² → 0. Q.E.D.
For the case of correlated noise, the quantity to be bounded is

E ‖ Σ_{j=N}^{n} P_{j+1}^n ρ_j (U_j/‖U_j‖²) ŵ(j+1) ‖²,   (18)

where

ŵ(j+1) = w(j+1) + θ₁ v(j) + θ₂ v(j−1) + ⋯ + θ_N v(j−N+1),

Q(k) = E{ŵ(j+1) ŵ(j+1+k)}
     = σ_w² + (θ₁² + θ₂² + ⋯ + θ_N²) σ_v²              when k = 0,
     = (θ₁ θ_{1+k} + θ₂ θ_{2+k} + ⋯ + θ_{N−k} θ_N) σ_v²  when 1 ≤ k ≤ N−1,
     = 0                                                 when k ≥ N.

The second term of eq. (18) is zero, and the third term satisfies

Σ_{j=N}^{n} Σ_{k=1}^{N−1} P_{j+k+1}^n P_{j+1}^n ρ_j ρ_{j+k} Q(k)
  ≤ Q(0) Σ_{j=N}^{n} ρ_j² ‖P_{j+1}^n‖² + Σ_{k=1}^{N−1} Q(k) [ Σ_{j=N}^{n} ρ_j² ‖P_{j+1}^n‖² + Σ_{j=N}^{n} ρ_{j+k}² ‖P_{j+1+k}^n‖² ]
  → 0.

This proves lim_{n→∞} E‖φ^n − φ‖² = 0. Q.E.D.
where J_k = {v_k, v_{k+1}, v_{k+2}, …, v_{k+ℓ−1}}, and a² is given by S² for this class; the lim inf of λ_min in condition (B) is then positive.
4. CONCLUSION
REFERENCES
University of California
INTRODUCTION
88 AOKI AND YUE
complexity may be reduced, sometimes dramatically, and since better estimates may be obtained by making full use of available information. The identification of dynamic system matrices utilizing a priori structural information was first considered by Aoki using the method of least squares in 1967 [5].
BASIC ALGORITHM
[−B, −A, I] [v'(t), z'(t), z'(t+1)]',

where the n×(2n+r) coefficient block [−B, −A, I] multiplies the stacked (2n+r)-vector.
ON UTILIZATION OF STRUCTURAL INFORMATION
In ref. [3] and [4] it has been established that the estimate con-
verges almost surely to the true A and B matrices.
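In the noise-free case the construction reduces to ordinary least squares on the stacked regressor [x(t); u(t)]; a sketch with a small random system (no observation noise — the paper's estimator additionally handles noisy z(t) and exploits the known zero blocks):

```python
import numpy as np

rng = np.random.default_rng(9)
n, r, T = 3, 2, 400
A = np.array([[0.5, 0.1, 0.0],
              [0.0, 0.4, 0.2],
              [0.1, 0.0, 0.3]])
B = rng.normal(0.0, 1.0, (n, r))

X = np.zeros((T + 1, n))
U = rng.normal(0.0, 1.0, (T, r))
for t in range(T):                       # simulate x(t+1) = A x(t) + B u(t)
    X[t + 1] = A @ X[t] + B @ U[t]

R = np.hstack([X[:-1], U])               # regressors [x(t), u(t)], T x (n + r)
Theta, *_ = np.linalg.lstsq(R, X[1:], rcond=None)
A_hat, B_hat = Theta[:n].T, Theta[n:].T

print(np.allclose(A_hat, A), np.allclose(B_hat, B))   # → True True
```

Structural information would be used here by deleting the columns of R that multiply blocks known to be zero, shrinking each subsystem's regression — the source of the accuracy gain reported below.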
EXAMPLE PROBLEM
We now give computational results of the problem mentioned in
Introduction. See Figure 1. The system state and control vectors
are composed of four subvectors
x'(t) = (x₁'(t), x₂'(t), x₃'(t), x₄'(t)) and u'(t) = (u₁'(t), u₂'(t), u₃'(t), u₄'(t)).

[Figure 1. A three-level hierarchy of subsystems; solid lines show dynamic interaction and dashed lines control intervention.]
with

A = | A₁₁  A₁₂  0    0   |      B = | B₁₁  0    0    0   |
    | A₂₁  A₂₂  A₂₃  A₂₄ |          | B₂₁  B₂₂  0    0   |
    | 0    A₃₂  A₃₃  0   |          | 0    B₃₂  B₃₃  0   |
    | 0    A₄₂  0    A₄₄ |          | 0    B₄₂  0    B₄₄ |
The vectors x_i(t), i=1,…,4, and u_i(t), i=1,…,4, denote the states of the individual subsystems and their controls, respectively. It is assumed that the parameter matrices (A₁₁, B₁₁), …, (A₄₄, B₄₄) are known to the individual subsystems, but the interaction characteristics (i.e., A₁₂, A₂₁, A₂₃, A₂₄, A₃₂, A₄₂) as well as the control intervention characteristics (i.e., B₂₁, B₃₂, B₄₂) are unknown.
Noisy observations of the entire organization's state are available as

z(t) = x(t) + n(t),

where

E n(t) = 0,  E n(t) n'(s) = σ² δ_{t,s} I.
u'(t) ≜ (u₁'(t), u₂'(t), …, u₄'(t)),
x'(t) ≜ (x₁'(t), x₂'(t), …, x₄'(t)),
z'(t) ≜ (z₁'(t), z₂'(t), …, z₄'(t)).
The estimates Ĝ₁ and Ĝ₂ are defined similarly to those associated with A₁₂, except for the fact that W_k is now defined as W_k = W₁₁ − W₁₂ W₂₂^{-1} W₁₂', where

W₁₁ = Σ_{t=0}^{N−1} z^t (z^t)',  W₁₂ = Σ_{t=0}^{N−1} z^t u₂'(t),  W₂₂ = Σ_{t=0}^{N−1} u₂(t) u₂'(t),

with

(z^t)' = [ (R_j^{-1}(z_j(t+1) − A_jj z_j(t) − B_jj u_j(t)))',  z₂'(t) ].

The remaining estimates, such as Â₂₁, are built analogously from sums such as Σ_{t=0}^{N−1} u(t) u'(t), where the symbols have similar meanings as above.
NUMERICAL STUDIES

[Tables (entries garbled): e_A = interaction estimation error and e_B = intervention estimation error, averaged over 14 samples and over 8 samples, respectively.]
DISCUSSIONS
[Figure 2. (Only axis values 80, 100, 400 survive.)]
[Figure 3. Estimation error versus the number of observations (logarithmic scales, 100 to 1000): the curve with no structural information (8 runs) lies above the curve using structural information (14 runs).]
REFERENCES
ACKNOWLEDGMENT
Atsuhiko Noda
Tokyo, Japan
1. INTRODUCTION
NODA
†This means that the unknown parameter varies perpetually and slowly enough to be identified with the admissible tracking error, which may be determined at need.
††' indicates the transposition of a vector or a matrix.
AN INCONSISTENCY 99
v_{j+1} = v_j + α (y_j − x_j' v_j) x_j / |x_j|²,  j = 1,2,…   (3)

where

α: error-correcting coefficient, 0 < α « 1,
x_j = (x₁^(j), x₂^(j), …, x_N^(j))': input vector,
|x_j|² ≜ Σ_{i=1}^{N} (x_i^(j))²,
y_j = x_j' w + n_j: output of the unknown system with additive noise,
n_j: a sequence of random variables with E[n_j] = 0 and V[n_j] < ∞,
x_j' v_j: output of the approximating linear system, v_j, at the jth step† of iteration.
Eq. (5) implies that the expectation of the mean-square error e²_{j+1} tends to ℓ̄ (> 0) under the iterative approximation (3) if the unknown parameter w is assumed to be stationary, and in this sense ℓ̄ will be referred to as the limit-accuracy of the fundamental method of learning identification. The limit-accuracy depends on the error-correcting coefficient α, the dimension N of the unknown parameter w, and on n*², which is determined by the ensemble average of the noise magnitude. As n*² and N are given constants, ℓ̄ can be made as small as desired by reducing α.
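A quick simulation of rule (3) shows the limit-accuracy floor: the squared error |v_j − w|² falls and then hovers at a level set by α, N, and the noise, rather than going to zero. All numbers are assumed.

```python
import numpy as np

rng = np.random.default_rng(11)
N, alpha, noise_sd = 10, 0.1, 0.5
w = rng.normal(0.0, 1.0, N)            # unknown stationary parameter
v = np.zeros(N)                        # initial estimate v_1

errs = []
for j in range(4000):
    x = rng.normal(0.0, 1.0, N)        # input vector x_j
    y = x @ w + rng.normal(0.0, noise_sd)
    v = v + alpha * (y - x @ v) * x / (x @ x)   # eq. (3)
    errs.append(np.sum((v - w) ** 2))

floor = np.mean(errs[-1000:])
print(floor)   # small but clearly nonzero: the limit accuracy
```

Halving α roughly halves this floor but lengthens the transient — the rate-accuracy trade-off quantified below.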
†The jth step is started at the jth sampling, and should be carried out within the jth sampling interval.
††This state is, in other words, a situation in which |v₁ − w|² has been nearly extinguished, and the influence of the noise alone has remained to cause the error.
†††This state is, in short, a period which is required for the influence of the initial error |v₁ − w|² to disappear.
[Figure 2. Limit-Accuracy State of the Learning Identification: log e_j² decreases with j and levels off at log ℓ̄.]
††Δ_j ≜ v_j − w.
T_ε = [ ln ε / (2 ln(1 − α/N)) ] + 1   (7)

where [x] implies the integer part of x. For large N (e.g., N > 10), T_ε is given approximately as follows:

T_ε ≈ −N ln ε / (2α),   (8)

and T_ε is obviously monotonically decreasing with respect to both ε and α.
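A numeric check of the reconstructed forms of (7) and (8); the values of N, α, and ε are assumed:

```python
import math

N, alpha, eps = 20, 0.05, 0.01
exact = int(math.log(eps) / (2 * math.log(1 - alpha / N))) + 1   # eq. (7)
approx = -N * math.log(eps) / (2 * alpha)                        # eq. (8)
print(exact, round(approx))   # → 920 921
```

For N = 20 the large-N approximation (8) is already within one step of (7).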
Expanding the one-step error ratio r_j as a quadratic in α (eq. (10)), the minimizing coefficient is

α_j^op = e_j² / ( e_j² + N n_j*² ).   (11)

Consequently, eq. (10) shows that the error-correcting coefficient α of procedure (3) should be α_j^op at the jth step of iteration to minimize r_j, and therefore α_j^op may be referred to as the optimal correcting coefficient of the learning identification method. Nevertheless, α_j^op cannot be estimated at each step of iterative identification, because α_j^op depends on the unknown identification error e_j. However, (11) expresses qualitatively that α should be nearly equal to unity when the error e_j² is much greater than the noise magnitude N n_j*².

†c₁ may occasionally become non-positive. In such cases, the initial value v₁ has been unexpectedly chosen very close to the unknown parameter w.
ε̄ T_ε ≈ −(N² n*² ln ε) / 4    (12)
Eq. (12) expresses, practically, an inverse proportion between ε̄ and T_ε; that is, an inconsistency between the rate and the accuracy (limit-accuracy).

Thus, eq. (12) will be made use of as a guiding principle in determining the error-correcting coefficient, α, of the iterative approximation by the learning method. In other words, the error-correcting coefficient should be chosen by giving priority to the more important demand, because the degrees of freedom are limited to one†† in the choice between rate and accuracy.
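The rate-accuracy inconsistency can be illustrated with a small numeric check. The expressions below for the settling time and the limit-accuracy are stand-in forms consistent with (8) and the discussion above, not the author's exact constants.

```python
import numpy as np

# Settling time T ~ -N*ln(eps)/(2*alpha) falls as alpha grows, while the
# limit-accuracy eps_bar ~ alpha*N*n2/(2 - alpha) grows with alpha (assumed
# stand-in forms), so their product is nearly independent of alpha.
N, n2, eps = 10, 0.01, 1e-3
prods = []
for alpha in (0.05, 0.1, 0.2, 0.4):
    T = -N * np.log(eps) / (2 * alpha)      # faster for larger alpha
    eps_bar = alpha * N * n2 / (2 - alpha)  # less accurate for larger alpha
    prods.append(T * eps_bar)
assert max(prods) / min(prods) < 1.3        # product is nearly constant
```

Since the product barely moves while α varies by a factor of eight, improving the rate necessarily costs accuracy, which is the single degree of freedom the text refers to.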
Figure 3. Limit-Accuracy State in the Case of Estimating a Non-
Stationary Unknown Parameter by the Learning Method.
ε̃ = ε̄ + r̃    (17)

where r̃ is given by (18). The effect of the parameter variation r̃ does not disappear and thus influences the limit-accuracy ε̃, which is distinguishable from the limit-accuracy of stationary parameter estimation, as shown in Figure 3.
E[r̃] ≈ (N²/α²) Σ_{i=1}^N c_i² W_i²    (22)
106 NODA
This is the quantity of influence of the components of the periodic variation of the unknown vector parameter on the limit-accuracy, the expectation being taken with respect to T.

Now, let v̄_j denote the expectation of v_j with respect to T and n_k (k = 1, 2, ..., j−1) in the limit-accuracy state; then,
where

A ≜ diag(1 − N²W₁²/(2α²), 1 − N²W₂²/(2α²), ..., 1 − N²W_N²/(2α²))

Φ ≜ (−NW₁/α, −NW₂/α, ..., −NW_N/α)
Therefore, from the expectation v̄_j of the approximation v_j, the period of each component of the unknown parameter variation can be estimated by the rate of amplitude attenuation, A, and the phase lag, Φ, in the limit-accuracy state.
e_{j,j}² ≜ |v_j − w_j|² ,  e_{j+1,j}² ≜ |v_{j+1} − w_j|²
The jth correction by (3) is performed proportionally to the jth error vector (v_j − w_j); that is, not in a predictive manner considering the future variation δw_j = w_{j+1} − w_j. Therefore, the performance evaluation of a single correction at the jth iterative step may be defined by an expectation R̃_j, i.e.,
R̃_j = E[e_{j+1,j}²] / e_{j,j}² = {1 − α(2−α)/N} + α² n_j*² / e_{j,j}²
and R̃_j should be minimized to make an optimal correction. The α which minimizes the expectation R̃_j in the non-stationary parameter approximation will be referred to as the optimal correcting coefficient in the non-stationary parameter approximation, and will be denoted by α̃_j^op, i.e.,

α̃_j^op = e_{j,j}² / (e_{j,j}² + N n_j*²)

which is exactly similar to (11).
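The optimal correcting coefficient is a one-line computation once the error and noise levels are known; the sketch below (names ours) exercises its two limiting regimes discussed in the text.

```python
def alpha_opt(e2, N, n2):
    """Optimal correcting coefficient e_j^2/(e_j^2 + N*n_j*^2), cf. eq. (11)."""
    return e2 / (e2 + N * n2)

# alpha -> 1 when the identification error dominates the noise magnitude,
# alpha -> 0 in the limit-accuracy state where only noise remains.
assert abs(alpha_opt(100.0, 10, 0.01) - 1.0) < 0.01
assert alpha_opt(0.001, 10, 0.01) < 0.01
```

In practice e_j² is unknown, which is exactly why the paper falls back on the qualitative rule of taking α near unity early and small later.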
where

y_j = x_j' w_j + n_j ,
γ: constant, 0 < γ « 1.
5. CONCLUSIONS
REFERENCES
ACKNOWLEDGMENT
ABSTRACT
INTRODUCTION
111
112 LEE AND SHEN
g_x(x) = Σ_{j=1}^{Mx} c_{xj} P_j(x)    (4)

Likewise for g_t(t).
From equations (3), (4), and (5), we see that g(x,t) can be approximated by:

g(x,t) = Σ_{j=1}^{Mx} c_{xj} P_j(x) Σ_{k=1}^{Mt} c_{tk} P_k(t) = Σ_{i=1}^N c_i b_i(x,t)    (6)

where N = Mx·Mt and

b_i(x,t) = P_j(x) P_k(t) ,  j = 1, 2, ..., Mx;  k = 1, 2, ..., Mt    (7)
Using matrix notation, g(x,t) can be expressed by

g(x,t) = c^T b(x,t)    (8)

where c and b(x,t) are both N × 1 vectors. Superscript T denotes the transpose of a vector or a matrix.
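The tensor-product construction (6)-(8) can be sketched as follows. Legendre polynomials are our choice for the orthogonal set {P_j}; the paper does not fix a particular family, so this is an illustrative assumption.

```python
import numpy as np
from numpy.polynomial import legendre

def basis_vector(x, t, Mx, Mt):
    """b_i(x,t) = P_j(x)*P_k(t), i ranging over all Mx*Mt pairs (j,k)."""
    px = [legendre.Legendre.basis(j)(x) for j in range(Mx)]
    pt = [legendre.Legendre.basis(k)(t) for k in range(Mt)]
    return np.array([pj * pk for pj in px for pk in pt])  # N = Mx*Mt entries

b = basis_vector(0.3, -0.5, Mx=3, Mt=2)
assert b.shape == (6,)
# g(x,t) = c^T b(x,t) for any coefficient vector c of length N
c = np.arange(6, dtype=float)
g = c @ b
```

Any separable orthogonal family would do; the only structural requirement from (6)-(7) is that each b_i is a product of one x-factor and one t-factor.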
The step size t(m+1) and direction vector s(m+1) can be determined by setting the derivative of I(m+1) with respect to c(m+1) equal to the zero vector.
dI(m+1)/dc(m+1) = Σ_{i=1}^{m+1} [g_m(i) − c^T(m+1) b(i)] b(i) = 0    (14)

So we obtain

[g_m(1) g_m(2) ... g_m(m+1)] − c^T(m+1)·[b(1) b(2) ... b(m+1)] = 0.    (15)
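Condition (14) is the normal equation of a linear least-squares fit, so c(m+1) can be obtained with any least-squares solver. The data below are random stand-ins for the samples g_m(i) and basis vectors b(i).

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((40, 6))   # rows are b(i)^T, i = 1..m+1
g = rng.standard_normal(40)        # observed g_m(i)
c, *_ = np.linalg.lstsq(B, g, rcond=None)
# The residual is orthogonal to every basis vector, which is exactly (14).
assert np.allclose(B.T @ (g - B @ c), 0, atol=1e-8)
```

The orthogonality assertion is a direct restatement of the zero-derivative condition (14) evaluated at the solution.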
y(x,t) = ∫₀^t ∫₀^ℓ g_d(x,x',t−τ)[u_d(x',τ) + q(x',0)δ(τ)] dx' dτ    (27)

where q(x,0) is the initial state function; u_d(x,t) is the distributed control function, and g_d(x,x',t) the corresponding weighting function. In the absence of distributed control, eq. (27) can be reduced to eq. (1).
U(t) =

where u_{1i}(t) is the control function u_i(t) from t=0 to t=T, and u_{2i}(t) is u_i(t) from t=T to t=2T. The need for two sets of control inputs
and
dJ[g_i(x,t)] = ∂J[g_i(x,t) + a_i z_i(x,t)] / ∂a_i |_{a_i=0}
After z_i(x,t) is chosen, take the derivative of J[g_i(x,t) + a_i z_i(x,t)] with respect to a_i, and set it equal to 0. We obtain

a_i = −(z_i(x,t), r_i(x,t))_{H(T)} / (z_i(x,t), (L*L z_i)(x,t))_{H(T)}    (44)
Equations (43) and (44) together with (41) form the functional
steepest descent algorithm.
WEIGHTING FUNCTION ESTIMATION 119
The proof of convergence and an estimate of the rate of con-
vergence of steepest descent method of approximation can be found
in Kantarovich's original work [8].
z_i(x,t) = −r_i(x,t) + b_{i−1} z_{i−1}(x,t)    (46)
CONCLUSION
REFERENCES
Kyushu University
Fukuoka, Japan
1. INTRODUCTION
121
122 TSUJI AND KUMAMARU
x_{k+1} = f(x_k) + w_k    (1)

y_k = h(x_k) + v_k    (2)

Definitions.
(1) Y^k = {y₀ y₁ ... y_k}: observed information up to the kth stage
(2) x̂_{k/k}: estimate of x_k based on Y^k
(3) x̂_{k+1/k}: prediction of x_{k+1} based on Y^k
(4) x̃_{k/k} = x_k − x̂_{k/k}: estimation error of x_k
(5) x̃_{k+1/k} = x_{k+1} − x̂_{k+1/k}: prediction error of x_{k+1}
(6) C_{k/k} = E(x̃_{k/k} x̃_{k/k}'/Y^k), C_{k+1/k} = E(x̃_{k+1/k} x̃_{k+1/k}'/Y^k)
E(·/Y^k) is the expected value conditioned on Y^k.
Assumptions.
(1) The initial state X₀ is Gaussian distributed with mean X̂_{0/−1} and covariance C_{0/−1}, which are given a priori.
(2) f(X_k) and h(X_k) are continuous functions which have at least second order derivatives.
(3) x̃_{k/k} and x̃_{k+1/k} are approximately Gaussian distributed.
(4) x̂_{k/k} and x̂_{k+1/k} are unbiased estimates of x_k and unbiased predictions of x_{k+1}, i.e., E(x̃_{k/k}/Y^k) = 0, E(x̃_{k+1/k}/Y^k) = 0.†
Suppose that f(X_k) and h(X_{k+1}) are approximated by second order Taylor expansions about x̂_{k/k} and x̂_{k+1/k}, respectively, as follows:

y_{k+1} = h(x̂_{k+1/k}) + h_x(x̂_{k+1/k})(x_{k+1} − x̂_{k+1/k}) + (1/2) Σ_{i=1}^m ψ_i (x_{k+1} − x̂_{k+1/k})' h_xx^i(x̂_{k+1/k})(x_{k+1} − x̂_{k+1/k}) + v_{k+1}    (4)

where

X_k = (X_k¹ X_k² ... X_kⁿ)' ,  f(X_k) = (f₁(X_k) f₂(X_k) ... f_n(X_k))' ,
h(X_k) = (h₁(X_k) h₂(X_k) ... h_m(X_k))' ,
φ_i = n×1 natural basis vector,
ψ_i = m×1 natural basis vector.
E{(w_k − b(x̂_{k/k}))/Y^k} = 0    (6)

Using the assumption (4) and the definition (6) in eq. (6) yields

b(x̂_{k/k}) = (1/2) Σ_{i=1}^n φ_i tr(f_xx^i(x̂_{k/k}) C_{k/k})    (7)

Therefore

x̂_{k+1/k} = f(x̂_{k/k}) + b(x̂_{k/k})    (8)
x̂_{k+1/k+1} = x̂_{k+1/k} + K_{k+1} ỹ_{k+1/k}    (9)

where K_{k+1} is the n×m filter gain matrix and π(x̂_{k+1/k}) is another bias correction term. π(x̂_{k+1/k}) is determined as below. Define

ŷ_{k+1/k} = h(x̂_{k+1/k}) + π(x̂_{k+1/k})    (10)

ỹ_{k+1/k} = y_{k+1} − ŷ_{k+1/k}    (11)

Then from eq. (4) we obtain

π(x̂_{k+1/k}) = (1/2) Σ_{i=1}^m ψ_i tr(h_xx^i(x̂_{k+1/k}) C_{k+1/k})    (14)
C_{k+1/k} = E(x̃_{k+1/k} x̃_{k+1/k}'/Y^k) = E{(f_x(x̂_{k/k}) x̃_{k/k} + (1/2) Σ_{i=1}^n φ_i tr(f_xx^i(x̂_{k/k})(x̃_{k/k} x̃_{k/k}' − C_{k/k})) + w_k)(f_x(x̂_{k/k}) x̃_{k/k} + (1/2) Σ_{i=1}^n φ_i tr(f_xx^i(x̂_{k/k})(x̃_{k/k} x̃_{k/k}' − C_{k/k})) + w_k)'/Y^k}    (17)

For a Gaussian random vector X with zero mean and covariance C,

E{tr(A X X') tr(B X X')} = 2 tr(ACBC) + tr(AC) tr(BC)    (18)

Using eqs. (17), (18) and definition (6) yields

C_{k+1/k} = f_x(x̂_{k/k}) C_{k/k} f_x(x̂_{k/k})' + D_k + Q_k    (19)

where D_k is given by

(D_k)^{ij} = (1/2) tr(f_xx^i(x̂_{k/k}) C_{k/k} f_xx^j(x̂_{k/k}) C_{k/k})    (20)

E(x̃_{k+1/k+1} x̃_{k+1/k+1}'/Y^k) = E{((I − K_{k+1} h_x(x̂_{k+1/k})) x̃_{k+1/k} − K_{k+1}((1/2) Σ_{i=1}^m ψ_i tr(h_xx^i(x̂_{k+1/k})(x̃_{k+1/k} x̃_{k+1/k}' − C_{k+1/k})) + v_{k+1}))((I − K_{k+1} h_x(x̂_{k+1/k})) x̃_{k+1/k} − K_{k+1}((1/2) Σ_{i=1}^m ψ_i tr(h_xx^i(x̂_{k+1/k})(x̃_{k+1/k} x̃_{k+1/k}' − C_{k+1/k})) + v_{k+1}))'/Y^k}    (21)

Using eqs. (17) and (18) and the definition (6) in eq. (21) yields

E(x̃_{k+1/k+1} x̃_{k+1/k+1}'/Y^k) = (I − K_{k+1} h_x(x̂_{k+1/k})) C_{k+1/k} (I − K_{k+1} h_x(x̂_{k+1/k}))' + K_{k+1}(R_{k+1} + Λ_{k+1}) K_{k+1}'    (22)
We determine the optimal filter gain K_{k+1} as the value that minimizes the weighted mean square error of x̃_{k+1/k+1} conditioned on Y^k, i.e.,

E(x̃_{k+1/k+1}' W_{k+1} x̃_{k+1/k+1}/Y^k) = tr(W_{k+1} E(x̃_{k+1/k+1} x̃_{k+1/k+1}'/Y^k)) ,

where W_{k+1} is an n×n positive symmetric weighting matrix. Therefore, K_{k+1} is given by

∂ tr(W_{k+1} E(x̃_{k+1/k+1} x̃_{k+1/k+1}'/Y^k)) / ∂K_{k+1} = 0    (24)

In the calculation of eq. (24), the following gradient matrix formulas are used.
∂/∂K tr(BK') = B    (26)

where A, B and C are arbitrary matrices for which the above matrix calculations are possible.
K_{k+1} = C_{k+1/k} h_x'(x̂_{k+1/k})(h_x(x̂_{k+1/k}) C_{k+1/k} h_x'(x̂_{k+1/k}) + R_{k+1} + Λ_{k+1})^{−1}    (28)
Substituting eq. (28) into eq. (22) yields

E(x̃_{k+1/k+1} x̃_{k+1/k+1}'/Y^k) = C_{k+1/k} − C_{k+1/k} h_x'(x̂_{k+1/k})(h_x(x̂_{k+1/k}) C_{k+1/k} h_x'(x̂_{k+1/k}) + R_{k+1} + Λ_{k+1})^{−1} h_x(x̂_{k+1/k}) C_{k+1/k}    (29)

Theorem 2.† Under the optimal filter gain given by eq. (29), the following properties are satisfied by the estimation error x̃_{k+1/k+1}:

E(x̃_{k+1/k+1}/Y^{k+1}) = E(x̃_{k+1/k+1}/Y^k)    (30)

E(x̃_{k+1/k+1} x̃_{k+1/k+1}'/Y^{k+1}) = E(x̃_{k+1/k+1} x̃_{k+1/k+1}'/Y^k)    (31)

C_{k+1/k+1} = C_{k+1/k} − C_{k+1/k} h_x'(x̂_{k+1/k})(h_x(x̂_{k+1/k}) C_{k+1/k} h_x'(x̂_{k+1/k}) + R_{k+1} + Λ_{k+1})^{−1} h_x(x̂_{k+1/k}) C_{k+1/k}    (32)

Moreover, from eqs. (15) and (30) the following unbiased estimate condition is obtained:

E(x̃_{k+1/k+1}/Y^{k+1}) = E(x̃_{k+1/k+1}/Y^k) = 0    (34)
x̂_{k+1/k+1} = x̂_{k+1/k} + K_{k+1} ỹ_{k+1/k}    (36)

(Λ_{k+1})^{ij} = (1/2) tr(h_xx^i(x̂_{k+1/k}) C_{k+1/k} h_xx^j(x̂_{k+1/k}) C_{k+1/k})    (38)

C_{k+1/k+1} = C_{k+1/k} − C_{k+1/k} h_x'(x̂_{k+1/k})(h_x(x̂_{k+1/k}) C_{k+1/k} h_x'(x̂_{k+1/k}) + R_{k+1} + Λ_{k+1})^{−1} h_x(x̂_{k+1/k}) C_{k+1/k}    (39)

C_{k+1/k} = f_x(x̂_{k/k}) C_{k/k} f_x(x̂_{k/k})' + D_k + Q_k    (40)

(41)

With the a priori information X̂_{0/−1}, C_{0/−1}, the nonlinear state estimation can be sequentially processed through eqs. (35)-(41).
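A scalar sketch of one cycle of the recursion may clarify how the second-derivative bias corrections enter; the system, its derivatives, and all names below are illustrative stand-ins, not the paper's example.

```python
import numpy as np

def second_order_filter_step(xh, C, y, f, fd, fdd, h, hd, hdd, Q, R):
    """One scalar cycle: prediction and update with second-order bias terms."""
    # prediction with bias correction b = 0.5*f''*C  (cf. eq. (7))
    xp = f(xh) + 0.5 * fdd(xh) * C
    Cp = fd(xh) ** 2 * C + 0.5 * (fdd(xh) * C) ** 2 + Q   # (19)-(20), scalar
    # measurement update: Lambda = 0.5*(h''*C)^2 inflates the innovation covariance
    Lam = 0.5 * (hdd(xp) * Cp) ** 2
    K = Cp * hd(xp) / (hd(xp) ** 2 * Cp + R + Lam)        # (28), scalar
    yp = h(xp) + 0.5 * hdd(xp) * Cp                       # (10), bias term pi
    xn = xp + K * (y - yp)
    Cn = Cp - K * hd(xp) * Cp                             # (39), scalar
    return xn, Cn

# one step on a hypothetical system x -> sin(x), observed through y = x^2 + noise
xn, Cn = second_order_filter_step(
    1.0, 0.5, 1.2,
    f=np.sin, fd=np.cos, fdd=lambda x: -np.sin(x),
    h=lambda x: x * x, hd=lambda x: 2 * x, hdd=lambda x: 2.0,
    Q=0.01, R=0.1)
assert Cn > 0 and Cn < 2.0
```

Setting the second derivatives to zero recovers the ordinary extended Kalman filter step, which locates exactly where the second-order terms contribute.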
3. APPLICATIONS
y_k = M x_k + v_k    (45)

where

Φ^d = (Φ₁₁ Φ₂₂ ... Φ_nn)' ,
Φ = e^{AT} = |Φ_ij| ,  T: sampling period,
G = (g₁ g₂ ... g_n)' ,  g_i = (Φ_ii − 1) T / log Φ_ii
Figure 1. Estimation of Transition Matrix.
The true value of the augmented state is

(2 2 ... 2 0.30 0.37 0.45 0.61 0.74 0.82 −6.2 26.8 −34.1 24.3 −17.5 6.8)'

and the initial estimate is

X̂_{0/−1} = (1.5 1.5 ... 1.5 1.0 1.0 ... 1.0 −10.0 20.0 −40.0 30.5 −20.0 10.0)'
(46)
This approximation is only a trial one, but this function form seems adequate to represent the multimodal function. If the quadratic form is used, we cannot construct any multimodal form even if we try to represent it by any linear combination. In eq. (46), x_k is an n-dimensional system state and a_i, r_i, P_i are unknown parameters. a, r, P are unknown parameter vectors composed of each component of a_i, r_i, P_i, and the modal points are x_k = r_i. The system is governed by the dynamic equation and observed by the nonlinear scheme;
x_{k+1} = g(x_k) + w_k    (47)

z_k = h(x_k) + v_k    (48)

where w_k and v_k are white Gaussian noises. And the gain output corresponding to the actual state x_k is observed with additive Gaussian noise ξ_k;

z_k^L = L(x_k, a, r, P) + ξ_k    (49)

Define the augmented state vector X_k by (x_k' a' r' P')'; then this searching problem can be reduced to the state estimation problem in the nonlinear augmented system, which is described by the following dynamic and observation equations;
i = n+1, n+2, ..., N
N = order of X_k.

X₀ = (x₀¹ x₀² r₁¹ r₁² a₁ p₁¹¹ p₁¹² p₁²² r₂¹ r₂² a₂ p₂¹¹ p₂¹² p₂²²)'
   = (7 7 4 10 40 20 50 60 18 18 50 20 50 60)'

X̂_{0/−1} = (0 0 0 10 10 20 30 0 05256)'
Figure 4. Estimation of Unknown Parameters.
where w_k and v_k are white Gaussian noises with zero means and variances σ_{wk}² and σ_{vk}² + δ². δ² is considered as a relative increment of the variance in the measurement noise due to the approximation error of F_k(θ_k). Therefore the second order filter is applicable to the estimation of the augmented state, i.e., the learning of the nonstationary unknown function. Moreover, in this case the observation matrix M(θ_k) is a known function of the input sample θ_k, and so the optimal sample sequence can be determined through the minimization of the trace of the parameter estimation error covariance matrix. As an example, a linear incremental type of nonstationary function is taken, and the learning results using the optimal input sample are shown in Figure 5.
4. CONCLUSIONS
REFERENCES
Washington University
ABSTRACT
I. INTRODUCTION
138
A LINEAR FILTER FOR DISCRETE SYSTEMS 139
*At least one unsuccessful effort to solve the problem by this type
of approach has been reported in the literature [11, Appendix A].
140 TARN AND ZABORSZKY
Find:

x̂_n = E(x_n | Y_n)    (5)

P(n|n) = E(x_n − x̂_n)(x_n − x̂_n)'    (6)

that is, to find the expected value of the state x_n conditioned on the set of all past measurements, Y_n, and on the knowledge of

E x₁ = a₁ ,  E(x₁ − a₁)(x₁ − a₁)' = P(1|1) ,
w̄₁ = 0 ,
III. THEOREM
x̂_{n+1} = Φ(t_{n+1},t_n) x̂_n + K_{n+1}[y_{n+1} − H_{n+1} Φ(t_{n+1},t_n) x̂_n − S(t_{n+1},t_n)(y_n − H_n x̂_n)]    (8)

P(n+1|n+1) = P(n+1|n) − K_{n+1}[H_{n+1} P(n+1|n) − S(t_{n+1},t_n) H_n P(n|n) Φ'(t_{n+1},t_n)]    (9)

K_{n+1} = [H_{n+1} P(n+1|n) − S(t_{n+1},t_n) H_n P(n|n) Φ'(t_{n+1},t_n)]' [R_{n+1} + H_{n+1} P(n+1|n) H_{n+1}' − H_{n+1} Φ(t_{n+1},t_n) P(n|n) H_n' S'(t_{n+1},t_n) − S(t_{n+1},t_n) H_n P(n|n) Φ'(t_{n+1},t_n) H_{n+1}' + S(t_{n+1},t_n) H_n P(n|n) H_n' S'(t_{n+1},t_n)]^{−1}    (10)

(11)
IV. PROOF
For simplicity the proof will be first given for the simplified
case.
y_n = H_n x_n + w_n    (13)

z_n = B_n x_n + v_n    (15)

where

z_n = (y₁' y₂' ... y_n')' ,  v_n = (w₁' w₂' ... w_n')' ,  B_n = (Φ'(t₁,t_n) H₁'  Φ'(t₂,t_n) H₂'  ...  H_n')'    (16)
The inverse covariance Σ_n^{−1} of v_n is block tridiagonal, with diagonal blocks R_n^{−1} and off-diagonal blocks −R_n^{−1} S(t_n,t_{n−1}) and −S'(t_n,t_{n−1}) R_n^{−1}.    (17), (19)
and
A
(21)
~n+1
where
r
A LINEAR FILTER FOR DISCRETE SYSTEMS 143
~n+l
=
H
O(t n ,tn+
-n+l
ll] (22)
0 0 0
= [Φ'(t_n,t_{n+1}) B_n' Σ_n^{−1} B_n Φ(t_n,t_{n+1}) + M_{n+1}' R_{n+1}^{−1} M_{n+1}]^{−1}

= [P^{−1}(n+1|n) + M_{n+1}' R_{n+1}^{−1} M_{n+1}]^{−1}

P(n+1|n+1) = P(n+1|n) − P(n+1|n) M_{n+1}'(R_{n+1} + M_{n+1} P(n+1|n) M_{n+1}')^{−1} M_{n+1} P(n+1|n)    (30)

P(n+1|n+1) P^{−1}(n+1|n+1) = I .    (31)
x̂_{n+1} = Φ(t_{n+1},t_n) x̂_n + K_{n+1}[y_{n+1} − H_{n+1} Φ(t_{n+1},t_n) x̂_n − S(t_{n+1},t_n)(y_n − H_n x̂_n)]
V. CONCLUSION
APPENDIX 1
E x₁ = 0 ,  E x₁ x₁' = B₁    (A2)

E r_n r_m' = 0 if n ≠ m    (A4)
For the first n measurement points, the set of equations (Al) can
be condensed into
c
-n
where
I a a ~l lil
a ~2 I2
a v-n = c
-n= (A6)
a w r
-n -n
(Al2)
o 0 I 0
I
( -1 I -1
+ 0 -S' t n +l,t n)R- n
+lS(t
- n +l,t)
n l -S'(t
- n +l,t n-n
)R +1
---------------+---------
-1 ( ) I
o -R
-n +lS
- t n +l,t n 0
APPENDIX 2
y_n = H_n x_n + w_n    (B2)

x̂_n = (B_n' Σ_n^{−1} B_n)^{−1} B_n' Σ_n^{−1} z_n

Hence,
Hence,
x̄_n = E[x̃_n] = E[(B_n' Σ_n^{−1} B_n)^{−1} B_n' Σ_n^{−1} v_n] = 0 .

Now the covariance matrix of x̃_n given all the information up to time t_n will be

E[x̃_n x̃_n' | y_k, 1 ≤ k ≤ n] = E[(B_n' Σ_n^{−1} B_n)^{−1} B_n' Σ_n^{−1} v_n v_n' Σ_n^{−1} B_n (B_n' Σ_n^{−1} B_n)^{−1}']
= (B_n' Σ_n^{−1} B_n)^{−1} (B_n' Σ_n^{−1} B_n)(B_n' Σ_n^{−1} B_n)^{−1}'
= (B_n' Σ_n^{−1} B_n)^{−1}
= P(n|n) .    (B8)
which is equation (27). Similarly it can be shown that
REFERENCES
STOCHASTIC PROCESSES
Seigo Kano
Kyushu University
Fukuoka, Japan
1. INTRODUCTION
150
STOCHASTIC LEARNING 151
real vector which is called the target vector and is given in advance. We consider a set of transformations by which U is successively transformed so as to make E{U} come near A. If U₁ = u₁ is observed, then U_i is transformed to

U_i^(2) = U_i + c_{i1}(A_i − u₁) ,  i = 2, 3, ... ,    (2.1)
I
l
(~2
0 0 0 0
D
1/2 0 0 0
1/2 1/2 0 0
0 1/2 1/2 0
so that E{U_i^(m)} = ξ_i, and the matrix is given by

⎛ 1    0    0    0    0  ⋯ ⎞
⎜ 1/2  1/2  0    0    0  ⋯ ⎟
⎜ 1/2  1/3  1/3  0    0  ⋯ ⎟
⎝ 0    1/3  1/3  1/3  0  ⋯ ⎠
152 KANO
Then e_{m,m−k} are given by

e_{m,m−k} = αx^k + βy^k + γz^k ,  k = 0, 1, ..., m−1,

where

x = A + B + 2/9 ,  y = ωA + ω²B + 2/9 ,  z = ȳ ,
A, B = (1/16²)(−31 ± 12√(341/9))^{1/3} ,  ω³ = 1,

α = [(16/27) − (5/9)(y+z) + yz/3] / [(x−z)(x−y)]

β = [(16/27) − (5/9)(x+z) + xz/3] / [(y−x)(y−z)]

γ = [(16/27) − (5/9)(x+y) + xy/3] / [(z−x)(z−y)]
Therefore, we obtain lim E U(m) =
m
IJt)(lO
J
1
t~2
0
0 0
0 1/3
Now we assume that Y_n are independent of X₁, ..., X_n, n = 1, 2, .... We note that X_n depends upon Y_{n−1}, n = 1, 2, .... Hence we have the generalized form of L.C.S.P. with target stochastic process {Y_n} given by

U = (S₁, S₁, ...) ,  U^(n) = (S_n, S_n, ...) ,

C = ⎛ 1−a   0  ⎞
    ⎝  0   1−a ⎠ ,

and A = (Y₁, Y₂, ...), where a and C are random variables assuming the values a_ij, i,j = 1,2, and δ_jl, j = 0,1, with some probabilities respectively. Equation (3.6) is reduced to S_{n+1} = S_n − a(S_n − c). Again we have a generalized form of L.C.S.P. in which C and A are a matrix and a vector of random variables.
Now we shall give the convergence of the generalized L.C.S.P. Here we consider the stochastic processes U and V instead of U−A and V−A respectively, and we rewrite the expression (2.3) as

V_{m+1} = Σ_{k=1}^{m+1} d_{m+1,k}(U_k − U_{k−1}) .    (3.7)
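Before the convergence argument, the basic linear correcting scheme itself can be simulated. The snippet below is a minimal ensemble simulation of a rule of the form U^(n+1) = U^(n) + θ(A − u_n); the scalar target, noise model, and constants are our stand-ins, chosen only to show E{U^(n)} approaching the target A.

```python
import numpy as np

rng = np.random.default_rng(2)
A, theta = 3.0, 0.2
u = np.zeros(5000)                           # an ensemble of learners
for _ in range(100):
    obs = u + rng.standard_normal(u.size)    # noisy observation of U
    u = u + theta * (A - obs)                # linear correcting transformation
assert abs(u.mean() - A) < 0.1               # E{U^(n)} has come near A
```

The ensemble mean converges geometrically (with ratio 1−θ) while the noise leaves a stationary spread around the target, which is the elementary picture the Martingale argument below makes rigorous for the generalized scheme.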
STOCHASTIC LEARNING 155
V̂_n = −min(U_n, c) ,  n = 1, 2, .... Obviously, the U_n are uniformly bounded.

|V̂_n| = min(U_n, c) ≤ c .

E{V̂_n | V₁, ..., V_{n−1}} = −c P{U_n ≥ c | U₁, ..., U_{n−1}} − Σ_{U_n<c} U_n P{U_n | U₁, ..., U_{n−1}}    (3.8)
156 KANO
If sup_n |U_n| < c then V̂ = V. From the fact that U is L₁ bounded, the sequence {U_n} converges almost everywhere and, for some positive constant k, is a Martingale. Accordingly,

E{U_n²} = E{(Σ_{k=1}^n d_k)²} ≤ E{Σ_{k=1}^n d_k²} ,  n ≥ 1.
If θ_n, n = 1, 2, ..., are defined accordingly, then Û is a Martingale. Furthermore, if we put d̂_n = Û_n − Û_{n−1}, then

E{d̂_n²} = E{[d_n − E{d_n | U₁, ..., U_{n−1}}]²}
         = E{d_n²} − E{[E{d_n | U₁, ..., U_{n−1}}]²} ≤ E{d_n²}    (3.10)

The relations (3.9) and (3.10) make Û an L₂ bounded Martingale, since
Let the transform of Û by D be V̂, that is, V̂_n = Σ_k d_{nk} Û_k. Then we can prove that V̂ is an L₂ bounded Martingale. In fact,

E{V̂_{m+1} | V̂₁, ..., V̂_m} = E{Û_{m+1} − Û_m | Û₁, ..., Û_m} + ...

and

E{V̂_n²} = Σ_{k=2}^n E{[Σ_{i=1}^{k−1} (d_{ki} − d_{k−1,i}) d̂_i]²} ,    (3.11)

where we used the property E{(V̂_i − V̂_{i−1})(V̂_j − V̂_{j−1})} = 0, i > j. In the last expression of (3.11) we put

E{V̂_n²} = I₁ + I₂ + I₃ .

|I₂| ≤ 4M Σ_{k=2}^n Σ_{i=1}^{k−1} O(k^{−(2+δ/2)}) < M_δ .

Similarly, I₃ is evaluated as follows,

|I₃| ≤ Σ_{k=2}^n Σ_{i,j=1}^{k−1} |E{(d_{ki} − d_{k−1,i})(d_{kj} − d_{k−1,j}) d̂_i d̂_j}|
     ≤ 2M Σ_{k=2}^n O(k^{−(2+δ)}) < ∞ .

V̂ on the left hand side and the last term on the right hand side converge; hence V̂_n converges with probability one. This completes the proof of Theorem 3.1.
REFERENCES
l. INTRODUCTION
160
LEARNING PROCESSES 161
Some features of the random pulse machine are: (1) The random machine is scarcely affected by noise and does not make any gross errors. (2) The machine configuration is very simple, and the multiplier and the analog memory are extremely accurate, since they are essentially digital. (3) The link between the stochastic and
OHTERU, KATO, NISHIHARA, KINOUCHI
Figure 3. Stochastic Adder. (a) Simple Adder, (b) Weighted Sum.

Figure 4. Stochastic Inverter.
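The claim that the multiplier is extremely accurate because it is essentially digital can be illustrated directly: with values encoded as pulse probabilities, a single AND gate multiplies two independent values. The sketch below simulates this with random bit streams; the lengths and values are our stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
a, b = 0.6, 0.3
pa = rng.random(n) < a          # pulse stream whose probability encodes a
pb = rng.random(n) < b          # independent stream encoding b
prod = float(np.mean(pa & pb))  # AND gate followed by a pulse counter
assert abs(prod - a * b) < 0.01
```

The only error source is the counting statistics of the pulse streams, which shrinks as the observation window grows; no analog component accuracy is involved.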
3. STOCHASTIC ADALINE
P{D_j} = Π_{i=1}^m f(P(x_ij), a_i)    (6)

ΔW_k = C Σ_j P_j (D_j − Σ_i x_ij W_i) x_kj    (11)

where
LEARNING PROCESSES 165
(12)
(13)
<ΔW_k> = 2Lα Σ_j P_j (D_j − Σ_i <W_i> x_ij) x_kj    (14)

<W_k>_{t+1} = <W_k>_t + <ΔW_k>    (15)
f(·) = Σ_i a_i φ_i(·)

K(t,·) = Σ_i b_i(t) φ_i(·)    (18)

using the orthogonal functions φ_i(·). Then we can obtain from eq. (16)

F(t) = Σ_i a_i b_i(t)    (19)
q_{t+1}(Z) = q_t(Z+1) P(Z/Z+1) + q_t(Z) P(Z/Z) + q_t(Z−1) P(Z/Z−1)    (21)

q(Z) = C(p) (2n)! / [(n − np + Z)! (n + np − Z)!]    (22)
in which the expected value is np and the variance n/2 and C(P) is
constant value for Z. Figure 9 shows an experimental result. It
is possible to control easily the variance and expected value at
the desired values. Furthermore, it is also possible to get the
other type distributions. Figure 8(b) shows the circuit for a
Poisson distribution.
168 OHTERU, KATO,NISHIHARA, KINOUCHI
5. CONCLUSION
(A-l)
(A-2)
where
(A-6)
2
for k#
~=
{ ~kt/VX (A-7)
1 for k=t,
W → W̄    (A-8)

and W̄ is the weight vector which minimizes the mean-square error. The fluctuation of the weight in the learning process is calculated from eq. (A-5).
LEARNING PROCESSES 171
The covariance σ_kℓ in the limit of t→∞ is given by

Σ_i σ_ik ξ_iℓ + Σ_i σ_iℓ ξ_ik = δ_kℓ    (A-9)
REFERENCES
Kaoru NaKano
University of Tokyo
Tokyo, Japan
SUMMARY
INTRODUCTION
172
LEARNING PROCESS IN A MODEL 173
to construct the structure artificially; the work presented by Post [1] concerns rather the biological associative memory. His model is a distributed memory device which memorizes triplets of entities and recalls one of these entities from the two other entities in the specific triplet. The model will be helpful for realizing life-like information processing.
(2 )
To recall entities, first we define the quantizing function

φ(t) = −1 if t < 0 ,  0 if t = 0 ,  1 if t > 0    (3)

Suppose that it can also be applied to a vector u = (u_j) and a matrix A = (a_ij), as

φ(u) = (φ(u_j)) ,  φ(A) = (φ(a_ij)) .    (4)
From an input y = (y₁, y₂, ..., y_n), the same kind of vector as x, the memory device recalls a row vector z, where

z = φ(y φ(M))

If y is composed of a neutral area and a few patterns which are also components of a stored entity x, then it is expected that z is nearly equal to x, even when a lot of entities are memorized. This means the realization of recalling the whole of a stored entity from a part of it. Consequently, the few patterns previously mentioned can be associated with the rest of the patterns of entity x.
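The recall rule z = φ(y φ(M)) can be sketched directly. The storing rule below sums autocorrelation matrices of the stored entities, which is one common reading of such correlation-matrix memories; entities without a neutral area, and all sizes and names, are our illustrative assumptions.

```python
import numpy as np

phi = np.sign  # the quantizing function (3): maps to -1, 0, +1

rng = np.random.default_rng(4)
n, m = 64, 3
X = rng.choice([-1.0, 1.0], size=(m, n))   # m stored entities of n elements
M = sum(np.outer(x, x) for x in X)         # correlation-matrix storage
y = X[0].copy()
y[n // 2:] = 0.0                           # present only part of the entity
z = phi(y @ phi(M))                        # recall: z = phi(y phi(M))
# most of the whole stored entity is recalled from the part
assert np.mean(z == X[0]) > 0.9
```

With few stored entities relative to n, recall is essentially exact; as entities overlap more, errors appear, which is what the accuracy formula (22) below quantifies.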
Figure 2. Association Mode
M = (A,B,0)ᵗ(A,B,0)    (13)

In the case of Figure 2(c), two associations, A−B and C−B', are stored in such a way that patterns B and B' are overlapping. Stored vectors are

x = (A,B,0,0) ,  y = (0,B',C,0) .    (15)

The matrix is

M = ⎛ AᵗA   AᵗB           0      0 ⎞
    ⎜ BᵗA   BᵗB + B'ᵗB'   B'ᵗC   0 ⎟
    ⎜ 0     CᵗB'          CᵗC    0 ⎟
    ⎝ 0     0             0      0 ⎠    (16)

The recalling process is as follows,

φ((A,0,0,0) φ(M)) = φ((0, A φ(AᵗB), 0, 0)) = (0,B,0,0)
φ((0,B,0,0) φ(M)) = φ((B φ(BᵗA), 0, 0, 0)) = (A,0,0,0)
(21)
That is,

P = (1/2k)^s Σ_{i=0}^{(s−1)/2} sC_i (k+1)^{s−i} (k−1)^i ,    (22)

where sC_i is the number of combinations. As this summation cannot be written in a simple form, to determine the behavior of this equation a graphical expression is taken by calculating the values of P for various values of k and s. This is shown in Figure 3. From the graph it is found that completely accurate recalling can be done when the number of stored entities is very small or when the number of elements constructing the input pattern is very large. But it is important that the ability of an associative memory is not determined only by the accuracy of memory.
LEARNING PROCESS IN A MODEL 179
Figure 3. Accuracy (%) of recalling versus the number of overlapping patterns.
(Winning percentage versus number of games: Best player; Player with associative memory; Random player.)
Figure 4. Graphical Expression of Learning.
recalls "red" and "spherical" from "apple" and vice versa. That is,
although the triplet apple-red-spherical is not a stored entity, it
is formed in the memory device.
Memorizing                          Recalling
Thing−Shape−Color−Word              Word → Concept        Concept → Word
 1. A₁ − B₁ − C₁ − C'₁              C'₁ → C₁              C₁ → C'₁
 2. A₂ − B₂ − C₁ − C'₁              C'₂ → C₂              C₂ → C'₂
 3. A₃ − B₃ − C₁ − C'₁              B'₁ → B₁              B₁ → B'₁
 4. A₄ − B₂ − C₂ − C'₂              B'₂ → B₂              B₂ → B'₂
 5. A₅ − B₁ − C₂ − C'₂
 6. A₆ − B₃ − C₂ − C'₂              Thing → Shape, Its word, Color, Its word
 7. A₁ − B₁ − C₁ − B'₁              A₁ → B₁, B'₁, C₁, C'₁
 8. A₂ − B₂ − C₁ − B'₂              A₂ → B₂, B'₂, C₁, C'₁
 9. A₄ − B₂ − C₂ − B'₂
10. A₅ − B₁ − C₂ − B'₁              Every recalling is completely accurate.
There are n chips on the board initially. Two players take any
number of one to three chips alternately from these chips. The
player who is forced to take the last chip loses the game. The
restriction is that each player must not take the number which the
opponent took at the previous move. For simplicity, presume that
there are only six chips on the initial board. To play the game
using the Associatron, first the components of vector x are assigned
as shown in Table 4 to four patterns of the board position, the move,
the mutual effect of the board position and the move, and the image
of the result. Learning is performed in such a way that all sets of
patterns of board position, strategy, mutual effect and the result,
which have appeared in the game, are memorized one by one. The
player with the associative memory (Player A), at every move, recalls
the image of the result from the board position, a possible strategy
Board position
1, 2, 3, 4, 5 , 6
Strategy 1, 2, 3
Player PI
Winner
Winner
and the mutual effect. The strategy, from which the image of "win"
is recalled, is taken. If the images are the same through possible
strategies, one strategy is chosen at random.
HARDWARE REALIZATION
In this way, a memory device with 25 neurons, that is, 325 memory
units, has been constructed for trial. Each memory unit is composed
of 20 integrated circuit elements. For convenience, input and output
184 NAKANO
Figure 6. Hardware Realization.
parts are clearly separated in this device, and any part of the output pattern can be transferred to the input register manually or automatically. Thus, the recalled pattern can be immediately modified and used as the next input. Although this device is very small in the number of neurons, it is effective for some experiments concerning sequential recalling. The importance of this kind of experiment is shown in the next section.
CONCLUSIONS
ACKNOWLEDGMENT
This work was done under the supervision of Dr. Jin-ichi Nagumo,
Professor of the University of Tokyo, and was supported by a grant
from the Science and Technology Agency of Japan.
REFERENCES
George J. McMurtry
INTRODUCTION
187
188 MCMURTRY
where y = (y₁, y₂, ..., y_n)' is the vector of parameter states by which the system or process may be governed. The region Y is the subset of the y-space in which the function I(y) is defined. It is assumed that the expression for I(y) is not known; the gradient components, ∂I/∂y_i, may be measured at each trial by making a small test step in each direction, ∇y_i, and observing the resultant change in I(y), ∇I. (Then ∂I/∂y_i is estimated by ∇I/∇y_i.) The next trial is conducted at

(2)

(4)

where

∇I_j = [I(y_k + Δ_j) − I(y_k − Δ_j)] / 2Δ ,  j = 1, ..., n    (6)

and

(8)
190 MCMURTRY
Gradient techniques have the advantages of using information obtained at each trial of the search and of converging to the optimum under rather general unimodal conditions. They are easily applied and conceptually appealing. They are, of course, not convergent on multimodal surfaces. In addition, they have the disadvantage of requiring the measurement of the gradient at each trial, and each such measurement normally requires a minimum of n test steps. Each test step presumably requires as much time as each gradient search step. Rastrigin (1963) and Gurin and Rastrigin (1965) have shown that random search methods are faster than gradient techniques as the dimension (n) of the surface increases. Saridis from Purdue has indicated at this meeting that he has experimental results which favor the random search techniques for high dimensional spaces (p. 204).
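The cost structure just described — 2n function evaluations per gradient estimate via test steps like (6) — can be sketched as a minimal descent variant; the quadratic test surface and all names are our stand-ins.

```python
import numpy as np

def gradient_search(I, y0, step, delta, trials):
    """Each trial spends 2n test steps on central-difference gradient estimates."""
    y = np.asarray(y0, dtype=float)
    for _ in range(trials):
        grad = np.array([(I(y + delta * e) - I(y - delta * e)) / (2 * delta)
                         for e in np.eye(y.size)])   # estimate (6), 2n evaluations
        y = y - step * grad                           # move against the gradient
    return y

# quadratic test surface with its optimum at (1, -2)
I = lambda y: (y[0] - 1) ** 2 + 2 * (y[1] + 2) ** 2
y = gradient_search(I, [5.0, 5.0], step=0.2, delta=1e-4, trials=100)
assert np.allclose(y, [1.0, -2.0], atol=1e-3)
```

The inner loop's 2n evaluations per trial are precisely the overhead that makes random search competitive as the dimension n grows.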
(9)

x_{k+1} = x_k − ρ_k ∇I_k + ξ_{k+1}    (12)

(13)

b_{k+1} ~ k^{−(ε_B + 1/2)} ,  0 < ε_B < 1/2    (15)

E{ξ_{k+1}} = 0 for all k
E{ξ_{k+1} ξ_{k+1}ᵀ} ≤ σ̄² < ∞ for all k
REFERENCES
INTRODUCTION
195
196 ASAI AND KITAJIMA
Figure 1. Convergence Behaviors of Membership Function f_ij(n+1).
(Block diagram: modifier for membership function f_ij, evaluator for the objective function, and modifier for membership function g_jh.)
(iii) For the purpose of finding a path that loops from a state s_i to the same state s_i via k branches, a kth-order transition matrix is calculated, and the maximum value [f_ii,max] of the membership function on the diagonal entries of the matrix and the number [M] of these maximum membership functions are found.

(iv) Comparing M with k, if M < k, an Mth-order transition matrix is calculated in order to remove overlapping branches, and then the same procedure as (ii) is executed. This procedure is repeated until all overlapping branches are removed from the loop path. Let the maximum value of the membership function and the number of the maximum membership functions thus obtained be f̄_ii,max and m, respectively.

(v) Since the k branches on the single loop leading from s_i to s_i should have membership functions that are equal to or larger than f̄_ii,max, the loop path may be found by comparing f̄_ii,max with the membership functions for the k branches leading from s_i to s_i. In this case, if the single loop leading from s_i to s_i cannot be found, the procedures (ii)-(iv) should be repeated for the (k−1)th-order transition matrix or a transition matrix of lower order than the (k−1)th until the single loop is obtained.
(vi) When the transition is executed on the path obtained in (iv), k control variables are sent from the automaton to the controlled system, from which k outputs will be sent out; then k objective functions I_k(n) and the mean value Ī(n) of all objective functions thus far obtained are calculated.

(vii) Comparing the k objective functions I_k(n) with the mean value Ī(n), the (n+1)th membership functions f_ij,k(n+1) and g_jh,k(n+1) are respectively determined from the nth membership functions f_ij,k(n) and g_jh,k(n) by using the algorithm shown in the following:
SIMULATION RESULTS
. : Trial points at starting time (7 points),
x : Trial points at final stage (1 point),
@ : Optimum point,  * : Local optimum point,
A : Saddle point,  ( ) : Number of trials,
O : State number.
Figure 3. An Example of Results of Simulation Study.
CONCLUSIONS
(Figure: convergence of f₄,₄(n) (solid) and g₄,₄(n) (dashed) over trial number n up to 130; x: a is variable (0.0 < a ≤ 0.99), ~: a is constant (0.6), O: a is constant (0.4).)
REFERENCES
George N. Saridis
Purdue University
I. INTRODUCTION
204
ON A CLASS OF PERFORMANCE-ADAPTIVE 205
u_c(t) = F{cᵀφ(z(t),t)} ∈ Ω_u    (4)

such that I(u_c) is bounded for all controls u_c(t) ∈ Ω_u, the admissible set; i.e.,
where the length of the interval T_i = t_i − t_{i−1}, i = 1, 2, ..., ∞, satisfies the conditions

c_i → c* ,  i = 1, 2, ... ,    (11)

the "cost" is the accumulative performance index over the interval [t₀,t_n]:

V(n) = (1/(t_n − t₀)) E( Σ_k ∫_{t_k} L(z(t), F{c*ᵀφ(z,t)}, t) dt )    (12)

and in the limit using eq. (10):

lim_{n→∞} V(n) = I(u*) = Min_c I(u_c) = I_min .
Then the following theorem and corollary establish the asymptotic
optimality of the algorithm in probability, and in the rth mean.
Theorem 1. Given an arbitrary number δ > 0 and the random search (8) with the following properties: then there exists a number ñ > 0 such that as the number of iterations n goes to infinity

lim_{n→∞} P(n) ≜ lim_{n→∞} Prob{V(n) − I_min < 3δ} = 1 .    (14)
V(n) = [ Σ_{i=1}^n E_{ξ,n}{T_i J(c_i) + T̂_i J(p_i)} ] / [ Σ_{i=1}^n (T_i + T̂_i) ] ,  where T̂_i = T_i if c_{i+1} = p_i, and T̂_i = 0 if c_{i+1} = c_i .    (16)
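The accept/reject bookkeeping in (16) — keep a random perturbation p_i only when its per-interval cost beats the incumbent — can be sketched in a few lines. The quadratic surrogate cost, step size, and iteration count are our stand-ins, and the per-interval evaluation is taken as noiseless for the sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
J = lambda c: float(np.sum((c - np.array([1.0, -0.5])) ** 2))  # surrogate cost
c = np.zeros(2)
best = J(c)
for _ in range(400):
    p = c + 0.1 * rng.standard_normal(2)   # random perturbation p_i
    if J(p) < best:                        # success: c_{i+1} = p_i
        c, best = p, J(p)                  # (failure leaves c_{i+1} = c_i)
assert J(c) < 0.05
```

With noisy per-interval estimates, acceptance decisions can be wrong with some probability, which is exactly what the probabilistic convergence statement of Theorem 1 accounts for.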
(Figure: self-organizing control loop with the controller nonlinearity F(·) and a per-interval performance evaluator.)
a₁ = −2 ,  a₂ = −4 ,

z = (1,0) x + v

where

w ~ N(0,1) ,  v ~ N(0,1) ,  E{v(t) w(t−τ)} = 0 ,  −∞ < t < ∞ ,

x(0) ~ N(0, ·) .

I = E{ lim_{T→∞} (1/T) ∫₀ᵀ (z² + u²) dt } .

The self-organizing controller is designed to be

u(t) = F[c₁ φ₁(z) + c₂ φ₂(z)]

where

φ₁(z) = z ,  F = (d²/dt² + f₂ d/dt + f₁)^{−1}

and the adjustable parameters are c₁, c₂, f₁ and f₂.
212 SARIDIS
This set has been selected to satisfy the Routh criterion for the
augmented system and under the assumption that the plant parameters
belong to the set
~a = {al < -1, a 2 < -1, b l = 0, 0 < b 2 < 3} .
(Figure: linear plant ẋ = A x(t) + b u(t) + d w(t) with output z(t), a per-interval performance evaluator J(c_N) = (1/T_i) ∫ (z² + u²) dt, and the performance-adaptive self-organizing controller.)
Figure 3. Per-Interval Performance Index J(C_N) vs. Iterations N. Random Search Technique.

Figure 4. Accumulated Performance Index V(N) vs. Iterations N. Random Search Technique.
V. DISCUSSION
However, the method seems very promising and has given mathe-
matical rigor to heuristic "behavioral" ideas used in learning
control problems.
APPENDIX A
    P_x(n_1/n) ≥ P_y(n_1/n) .

Then V(n) is decomposed as

    V(n) = \frac{ \sum_1 T_i E\{J(c_i)\} + \sum_2 T_i E\{J(c_i)\} + \sum_3 T_i E\{J(c_i)\} + \sum_{i=1}^{n} τ_i E\{J(p_i)\} }{ \sum_{i=1}^{n} (T_i + τ_i) }     (A3)

where \sum_1 is the sum over the ν_1 segments of absolutely successful
iterations of length greater than η as defined in Lemma 3, \sum_2 is
the sum over the ν_2 = k − ν_1 remaining segments, and \sum_3 is the
sum over the n' interruptions.
    \frac{1}{n} \sum_{i=1}^{n'} T_i → 0                              (A5)
    k ≤ ηT_B ,   n' ≤ 2ηT_B ,   mT_B(η+2) ≤ 1 ,   for some m > 1     (A9)

It is easily shown that under (A9) at least one segment of length
n_i is greater than η, because otherwise there is a contradiction
in the following relation:

    \sum_{i=1}^{k} n_i = n − n' ≥ nmT_B(η+2) − 2ηT_B ≥ nmT_Bη .      (A10)
From (A9) it is shown that the quantities V2/n and n'/n can be made
arbitrarily small
    \frac{ \max\left[ \sum_2^{ν_2} T_i + \sum_3^{n'} T_i \right] }{ \sum_{i=1}^{n} τ_i } < δ     (A12)
Now define

    Δ_i ≜ E\{J(c_i)\} − I_min .

Then

    \frac{ \sum_i τ_i [ E\{J(c_i)\} − I_min ] }{ \sum_{i=1}^{n} τ_i }
        = \frac{ \sum_i θ_i ζ_i }{ \sum_{i=1}^{n} τ_i }              (A13)
From Lemma 3, 0 ≤ ζ_i < δ with probability P(n*) ≥ P(η), n* < n.
Define θ_i and the independent identically distributed random
variables ξ_i satisfying the conditions of Lemma 2:

    ξ_i = ζ_i   with probability P(n*) .
Then

    \sum_1^{ν_1} θ_i ξ_i ≤ \sum_1^{ν_1} θ_i ζ_i .                    (A15)
    Φ(δ, P(n*)) ≜ Prob\left\{ \left| \sum_i θ_i ξ_i \right| ≤ 2δ \right\} ,
    Prob\left\{ \left| \sum_i θ_i ξ_i \right| > 2δ \right\} ≤ \tfrac{1}{2}\,(1 − P(n*)) .     (A17)
Using the results of Lemma 3 and its definition, P(n*) can be made
arbitrarily close to one for a sufficiently large n*. This implies
that (Al7) can be made arbitrarily close to zero and therefore
~(o,P(n*)) can be made arbitrarily close to one.
Using now (A5), (A12) and (A16), one establishes that for a
fixed number of segments and interruptions
    P(δ/k, n', η) ≜ Prob{ V(n) − I_min < 3δ | k segments, n' interruptions }
    \bar V(n) − I_min = \frac{ \sum_1^{ν_1} T_i E\{J(c_i) − I_min\} + \sum_2^{ν_2} T_i E\{J(c_i)\} + \sum_3^{n'} T_i E\{J(c_i)\} + \sum_{i=1}^{n} τ_i E\{J(p_i)\} }{ \sum_{i=1}^{n} (T_i + τ_i) }     (A5')
and the proof carries through with V(n) replaced by \bar V(n) and 3δ
replaced by 4δ. The proof of convergence in the rth mean follows
immediately from Corollary 1.
Q.E.D.
REFERENCES
ACKNOWLEDGMENT
A CONTROL SYSTEM IMPROVING ITS DYNAMICS BY LEARNING
Kahei Nakamura
Nagoya University
Nagoya, Japan
1. INTRODUCTION
221
222 NAKAMURA
2. PROBLEM STATEMENT
Figure 1. Vector Space Description of Learning Process (X-space and
U-space).
A CONTROL SYSTEM IMPROVING ITS 223
height to cause the desired steady state of plant output) causes
the plant output x^0. Call the association (u^0, x^0) the initial
situation of the iterative improving approach.

    x_d^1 = x^0 + a_1 (x_d − x^0)                                    (1)
    Δx^1 = H^1(x^0) Δu^1                                             (2)
where H^1(x^0) is designated the Sequential Transfer Matrix, or
S.T.M. [1], in the first stage, and is evaluated by the procedure
described in Section 4. Since the evaluated H^1 is available in the
neighborhood of x^0, the control variation required to attain the
subgoal x_d^1 is derived by relation (3).
    Δu^1 = [H^1(x^0)]^{-1} Δx_d^1                                    (3)

By applying the new control u^1 = u^0 + Δu^1, the system reaches x^1,
which is located near x_d^1.
That is,                                                             (6)

By applying this evaluated H^2(x^1) to

    Δu^2 = [H^2(x^1)]^{-1} Δx_d^2                                    (7)

the control variation Δu^2 to cause the desired variation of output
can be calculated. By applying to the plant the control
u^2 = u^1 + Δu^2, the system reaches x^2, located near x_d^2.
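The iterative rule Δu^i = [H^i]^{-1} Δx_d^i is a Newton-like
correction. The sketch below uses a hypothetical static plant map,
with a finite-difference Jacobian standing in for the measured
S.T.M., and shows the loop converging to a desired output:

```python
def plant(u):
    # Hypothetical nonlinear plant map: 2 controls -> 2 outputs.
    return [u[0] + 0.1 * u[1] ** 2, u[1] + 0.1 * u[0] ** 2]

def jacobian(u, eps=1e-6):
    # Finite-difference estimate standing in for the S.T.M. H(x).
    x0 = plant(u)
    H = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        up = list(u)
        up[j] += eps
        xp = plant(up)
        for i in range(2):
            H[i][j] = (xp[i] - x0[i]) / eps
    return H

def solve2(H, b):
    # Solve the 2x2 system H * du = b by Cramer's rule.
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(H[1][1] * b[0] - H[0][1] * b[1]) / det,
            (H[0][0] * b[1] - H[1][0] * b[0]) / det]

x_d = [1.0, 2.0]                       # desired output
u = [0.0, 0.0]
for _ in range(10):                    # u^{i+1} = u^i + H^{-1}(x_d - x^i)
    x = plant(u)
    du = solve2(jacobian(u), [x_d[0] - x[0], x_d[1] - x[1]])
    u = [u[0] + du[0], u[1] + du[1]]

x = plant(u)
print(x)   # close to the desired [1.0, 2.0]
```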
[Diagram: the controlled plant H with input u and output x.]
    f(x, \dot x, \ldots, x^{(n)}) = g(u)                             (8)

The variational equation of (8) becomes eq. (9):

    δx^{(n)} + a_1(x, \dot x, \ldots, x^{(n)})\, δx^{(n−1)} + \cdots
      + a_{n−1}(x, \dot x, \ldots, x^{(n)})\, δ\dot x
      + a_n(x, \dot x, \ldots, x^{(n)})\, δx = k(u)\, δu             (9)
Now, assume that the coefficients a_j are linear functions of
x, \dot x, \ldots, x^{(n)},

    a_j(x, \dot x, \ldots, x^{(n)}) = c_j + \sum_{\ell=0} γ_{jℓ}\, x^{(\ell)}(t)     (10)

and that k(u) is a power-expanded polynomial of u,

    k(u) = \sum_{\ell=0}^{r} k_\ell\, u^{\ell} .                     (11)
    S = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\
                        0 & 0 & 1 & \cdots & 0 \\
                        \vdots & & & \ddots & 1 \\
                        0 & 0 & 0 & \cdots & 0 \end{bmatrix}    (N × N matrix)    (13)
                                                                     (20)
where

    M_j = c_j I + \sum_{\ell=1}^{n} γ_{jℓ} [S^{\ell} x] .            (21)
Since all components of Δ are usually smaller than 1, the following
relation is derived for large i:

    x^i − x_d = Δ^i [(1−a)Δ^{-1} + a] (x^0 − x_d)                    (29)
Then,

    lim_{i→∞} x^i = x_d .                                            (30)

For general a_j < 1, the convergence of x^i is guaranteed in the
noise-free case.
6. DISCUSSIONS
7. CONCLUSIONS
REFERENCES
Hiroshi Tamura
Osaka University
Osaka, Japan
Much work has recently been done in the field of optimal con-
trol. The structure of the optimal controller can be determined
mathematically on the basis of Pontryagin's maximum principle, if
the dynamic characteristics of the plant are known.
230
SELF-LEARNING METHOD 231
process under operation. Thus it has been an important and long-
pending problem for control engineers to find the control system
which adapts its control action to the varying dynamic plant.
This paper will show that, by making use of the above mentioned
pattern identification techniques, self-learning control can be
accomplished for time-optimal control of unknown plant dynamics.
2. DESCRIPTION OF SYSTEM
The optimal control theory states that the normal time optimal
control can be implemented by switching u from 1 to −1 or vice versa.
A control of this kind is usually called bang-bang control, and such
switchings of u can be made by a switching surface set up in the
state space.
3. SELF-ADAPTATION LOGIC
When plant dynamics are unknown, the weights W_i cannot be
determined analytically, and some adaptive method is necessary.
The author shows that the pattern of state transition or trajectory
in relation to temporal switching surface could be sufficient
information to decide the direction of adaptive modification of Wi's.
Figure 3. The Adaptation Logic and Effective (®) and Non-effective
(0) Correction Points.
an effective correction takes place at a point nearer to the origin
always after such a non-effective correction, as far as the optimal
switching curve for a given plant is convex upward in the second
quadrant. This holds true in most systems of practical importance.
Speed and convergence of this logic are subject to the shape of the
optimal switching curve, the type of f_i(e), the input patterns and
their distribution. Besides, the stability of the feedback loop in
the process of adaptation is an interesting theme left for further
studies. But computer experiments applying this self-adaptation
logic to various control situations have produced satisfactory
results with regard to convergence and stability.
4. ADAPTATION PROCESS
Figure 4. Adaptation Process to a Second Order Plant.
236 TAMURA
Here each step begins with the setting of an initial state and
ends with the arrival of the terminal state, the origin.
15 initial states are randomly chosen from the state space and
applied cyclically. The control quality ~ at the m-th step is
defined as
    q_m = \frac{100}{m − k + 1} \sum_{j=k}^{m} \frac{t_j − t_{0j}}{t_{0j}}     (10)

    k = 1 for m ≤ 14 ,   k = m − 14 for m ≥ 15
where tj is the actual control time at j-th step and toj is the
possible minimum time for the same initial state. The time course
(Figure 4b) of qm shows good improvement of control quality. The
resultant switching curve at the 100th step is
e l + 0.217e2 + 0.085e~ = 0 (11)
(12)
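The control quality of eq. (10) can be transcribed directly; the
control times below are hypothetical:

```python
def q(m, t, t0):
    """Control quality q_m of eq. (10).
    t, t0: actual and minimum control times, 1-indexed via t[j-1];
    the window covers at most the last 15 steps."""
    k = 1 if m <= 14 else m - 14
    total = sum((t[j - 1] - t0[j - 1]) / t0[j - 1] for j in range(k, m + 1))
    return 100.0 * total / (m - k + 1)

t  = [1.2, 1.1, 1.0]     # actual control times (hypothetical)
t0 = [1.0, 1.0, 1.0]     # minimum times for the same initial states
print(q(3, t, t0))       # 100 * (0.2 + 0.1 + 0.0) / 3 = 10 percent excess
```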
The self-learning process can be read from the time course of the
control quality. Since adaptation to a certain environment causes
the controller to forget the experience established for other
environments at an earlier stage of learning, the controller cannot
follow sudden changes of the environment. It is, however, proved
that the controller becomes able, as the learning process proceeds,
to utilize the result of adaptation for the input patterns
experienced earlier.
This scheme has been applied to the case where the environmen-
tal pattern is simply represented by a scalar z which influences
linearly the plant dynamics. The results are shown in Figure 6.
The plant is described by

    \dot x_1 = x_2 ,   \dot x_2 = −z x_2 + u .                       (14)
Figure 8. A Pattern-selective Learning Control System.
REFERENCES
Julius T. Tou
University of Florida
INTRODUCTION
243
244 TOU
[Diagram: environment and input acting on the plant; the control
situation determines the strategy parameters.]
(1)
where y_i denotes the ith feature parameter, which may be a meter
reading describing the performance measure, the input signal, or
the environmental parameter. The control feature vector may be con-
verted to a binary vector, if we quantize the feature parameters
and express each measurement in a binary number. A group of Yk's
will be assigned to a feature parameter. Determination of the con-
trol feature vector from observed data is a problem of encoding.
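The encoding step described above can be sketched as follows; the
quantization range and word length are assumptions:

```python
def encode(features, levels=8):
    """Quantize each feature in [0, 1) to `levels` steps and
    concatenate the fixed-width binary expansions (groups of y_k)."""
    bits_per = levels.bit_length() - 1          # 8 levels -> 3 bits
    out = []
    for y in features:
        q = min(int(y * levels), levels - 1)    # quantization level
        out.extend(int(b) for b in format(q, "0{}b".format(bits_per)))
    return out

print(encode([0.0, 0.5, 0.99]))  # [0,0,0, 1,0,0, 1,1,1]
```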
c ml c m2 c
mn
which reduces to

    z = (I − λ\, q q')^{-1} ξ                                        (10)

Upon expanding the inverse matrix term, we have

    z = [ I + (λ q q') + (λ q q')^2 + \cdots ] ξ                     (11)
    a_{ij} = \frac{N_{ij}}{N_i + N_j − N_{ij}}                       (12)
    A = \begin{bmatrix} 1 & a_{12} & a_{13} & \cdots & a_{1n} \\
                        a_{21} & 1 & a_{23} & \cdots & a_{2n} \\
                        \vdots \\
                        a_{n1} & a_{n2} & a_{n3} & \cdots & 1 \end{bmatrix}    (13)
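Eq. (12) can be computed as below, assuming — this excerpt does not
define it — that N_i counts the control situations in which strategy
i applies and N_ij those in which both i and j apply; the sets are
made up:

```python
def similarity_matrix(applies):
    """applies[i]: set of control situations where strategy i applies.
    Returns A with a_ij = N_ij / (N_i + N_j - N_ij); unit diagonal."""
    n = len(applies)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            Nij = len(applies[i] & applies[j])
            A[i][j] = Nij / (len(applies[i]) + len(applies[j]) - Nij)
    return A

A = similarity_matrix([{1, 2, 3}, {2, 3, 4}, {5}])
print(A[0][0], A[0][1], A[0][2])  # 1.0 0.5 0.0
```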
    w(y_i) = …                                                       (18)
The control strategy with the largest relevance number will be
selected.
CONCLUSIONS
REFERENCES
Kyoto University
Kyoto, Japan
INTRODUCTION
252
STATISTICAL DECISION METHOD 253
[Diagram: the observation mechanism acting on the disturbance {z}.]
    L(z_t) = \int f(x, d(z_t))\, p(x/z_t)\, dx                       (1)
where f(x,d(zt)) represents the loss associated with decision d(zt)
when the optimal input xo=x, and p(x/z) is the conditional proba-
bility density function. However, p(x/z) in many cases is not
known beforehand. We can use the past control experiences as the
pair (xo,Zt) of optimal input and observed disturbance.
There are two ways to solve this problem: (a) calculate eq.
(2) by estimating p(x) (section 2); and (b) take the minimax
strategy (sections 3-5).
254 EIHO AND KONDO
i.e.:

    a_0 + a_1 x_i + a_2 x_i^2 + \cdots + a_m x_i^m + W_i ≥ y_i
    a_0 + a_1 x_i + a_2 x_i^2 + \cdots + a_m x_i^m − W_i ≤ y_i       (4')

    a_1 + 2 a_2 x_j + \cdots + m a_m x_j^{m−1} ≥ 0
[Plots of the estimated distribution F(x), with the least-square
fitted curve y = −0.02 + 4.9x − 11.1x^2 + … .]
Figure 2. Probability Distribution Estimated by Linear Programming
Method.  Figure 3. Probability Distribution Estimated by Least-
Square Method.
Assume that U can take only discrete values. Eq. (2) can then
be expressed as follows:
    L_i = \int f_i(x)\, p(x)\, dx = \int f_i(x)\, dF(x) .            (7)
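With a discrete estimate of the distribution, eq. (7) reduces to a
probability-weighted sum per decision; the losses and probabilities
below are hypothetical:

```python
def expected_losses(loss, p):
    """loss[i][k] = f_i(x_k); p[k] = estimated probability of x_k."""
    return [sum(f_ik * pk for f_ik, pk in zip(row, p)) for row in loss]

loss = [[0.0, 1.0, 4.0],    # decision u_1
        [1.0, 0.0, 1.0],    # decision u_2
        [4.0, 1.0, 0.0]]    # decision u_3
p = [0.2, 0.5, 0.3]         # estimated distribution of the optimal input
L = expected_losses(loss, p)
print(L.index(min(L)))      # 1: decision u_2 minimizes the expected loss
```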
[Plot: the estimated distribution with decision regions U_1, ..., U_5
marked as intervals.]
m_i: number of past observations for which X = x_i, i = 1, ..., k;
n: sample size, n = m_1 + m_2 + ... + m_k;
m = (m_1, ..., m_k)^T: vector expression of the sample (data);
M: set of m, whose element is expressed by m_p;
L_ij: loss associated with the decision d_j when X = x_i;
L = [L_ij]: loss matrix;
P(x_i) = θ_i: probability of X = x_i, i = 1, 2, ..., k;
θ = (θ_1, θ_2, ..., θ_k)^T: vector expression of the probability of X.
    R(D, θ) = \sum_{p=1}^{s} \sum_{j=1}^{q} r(j, θ)\, π_j^{m_p}\, P(m_p/θ) ,    (13)

where θ* and D* = (P_1*, P_2*, ..., P_q*) are the maximin and minimax
solutions of the game Γ, and

    \sum_{j=1}^{q} π_j^{m_p} = 1 ,   p = 1, 2, \ldots, s .           (18)
Conjecture

    \sum_j π_j^{m_p} = 1 ,   m_p ∈ M                                 (23)

    S(D) = \sum_{j=1}^{q} \sum_{p=1}^{s} K(m_p, j)                   (24)
(25 )
CONCLUSIONS
This paper has dealt with the question of how to use the
statistical decision method where the statistical properties of the
system are not known. The learning property is included in the
mechanism of estimating the probability function from the data
(section 2) or of constructing the confidence interval (section 3).
The third method (section 4) has the learning property included in
the mechanism of minimizing S(D). By using a relaxed minimax
decision function, we can make an optimal decision for every loss
matrix when we have gained enough control experience.
    \left. \frac{dR}{dθ_i} \right|_{θ = θ*} = 0 ,   i = 2, \ldots, k    (A-3)
Calculating eq. (A-3) by substituting eq. (13), we get:

    \frac{dR}{dθ_i} = \sum_{j=1}^{q} r(j, θ) \frac{∂}{∂θ_i} P(j, θ, D)
        + \sum_{j=1}^{q} P(j, θ, D) \frac{∂}{∂θ_i} r(j, θ)           (A-4)

where

    P(j, θ, D) = \sum_{p=1}^{s} π_j^{m_p}\, P(m_p/θ) .               (A-5)
If the game has only one maximin solution and one minimax solu-
tion, the theory of games leads to the following equation:

    r(j, θ*) = v   for all j                                         (A-6)

where v is the value of the game Γ. Considering the fact that:

    P(1, θ, D) + P(2, θ, D) + \cdots + P(q, θ, D) = 1 ,              (A-7)
Q.E.D.
The proof of general cases is similar to the above special
case, though it is omitted here.
REFERENCES
R. W. McLaren
University of Missouri
INTRODUCTION
[Diagram: stochastic plant with noise input, state X(k), and control
U(k−1).]
where i = 1, 2, ..., s(k), i ≠ j, and 0 < a < 1; (b) if U(k) = U_{s+1}, a
new control policy, then eq. (4) above applies with j = s(k)+1, while
s(k) is replaced with s(k)+1. This particular algorithm is discussed
further in [8], while a discrete-valued one is described in [9].
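The flavor of such a reinforcement rule can be sketched with a
simple linear update standing in for eq. (4), whose exact form is
given in [8]; all constants and the reward normalization are
assumptions:

```python
import random

random.seed(1)

def reinforce(p, j, reward, a=0.1):
    """Raise the selection probability of the applied policy j in
    proportion to reward in [0, 1]; scale the others to keep sum = 1."""
    return [pj + a * reward * (1.0 - pj) if i == j
            else (1.0 - a * reward) * pj
            for i, pj in enumerate(p)]

# Two policies; policy 0 yields the lower (better) performance index.
mean_Z = [0.2, 0.8]
p = [0.5, 0.5]
for _ in range(500):
    j = 0 if random.random() < p[0] else 1
    Z = mean_Z[j] + random.gauss(0.0, 0.05)      # instantaneous index
    p = reinforce(p, j, reward=max(0.0, 1.0 - Z))

print(p)   # p[0] ends well above p[1], matching the ordering property
```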
Remembering that Z(k) represents an instantaneous performance index,
define the conditional average value of Z(k) as m(U) = E_z(Z|U). Let
m(U) possess bounded partial derivatives and assume that the condi-
tional probability density function p(Z|U) is stationary for a
given U. As k→∞, the expected values of the probabilities of the
generated or existing values of U tend to approach limit values
which fall in reverse magnitude order as the corresponding condi-
tional expected performance values. That is, if m_i = E[Z|U_i] and
E_v(k) = E[p_v(k)], v = 1, ..., s = s(k), then,
Suppose now that there are only noisy state measurements avail-
able. Let V_{ii}(k) be the maximum of V_{ij}(k) over j for any i. If U_t*
is the optimal U for state q_t using Y(k), then using eq. (10) with
a_t(k−1) = 1,

    E[Z(k)|U_t*] = V_{tt}(k) > E[Z(k)|U_t]                           (16)
for a given Q(O) or X(O), where N=l, ... ,M < 00. Per-state optimiza-
tion to achieve long-term low average performance as in Model II is
not sufficient to determine a finite optimal control sequence as
considered here. Rather, the learning controller must repeat con-
trol sequences, reinitializing at X(O) in an off-line mode. It is
desired that the controller seek a control sequence SN to achieve
the following behavior. Letting p[S_N|X(0)] = P(S_N) for a given
X(0), it is desired to determine S_N so that,

    (a) p[S_N*|X(0)] = minimum over S_N , and

    (b) \frac{1}{kN} \sum_{v=1}^{kN} E[Z(v)] → minimum               (18)
These two conditions are related in the sense that for large k, the
optimal sequence sought will result in the term of eq. (18a) domin-
ating the time average expressed in eq. (18b) as S_N* is chosen more
frequently.
frequently. In general, the N-step performance measure could be a
function of all U(k-l) and X(k). If the X(k) are not accessible,
then the desired result in eq. (18) is over the unknown states. In
this model, the learning controller selects and applies a sequence
or vector of N values of U(k-l), k=l, ... ,N. Based on E Z(k) over
N steps (for the N controls), SN is generated from a uniform dis-
tribution defined over a continuum of sets of N controls. As the
number of repetitions increases (with reinitialization), sequences
of controls closer to the optimal one will acquire higher expected
probabilities while the average expected performance decreases.
This is similar to using an Nth order stochastic automaton model
for the controller. Let Q(O) be specified. If Q(l) is known, the
learning controller can then seek an optimal (N-l) control sequence,
S_{N-1} for every known Q(1). The length of the desired control
sequence decreases as the Q(k), k < N become known. In this case,
more control sequences, such as U(N-l), ... ,U(k) given X(O), X(l),
..., X(k), k < N, would need to be stored (with the corresponding
performances), but performance will be improved because of this
added information.
COMPUTER SIMULATION
[Plot: time average of Z(k) (ordinate, about 0.3 to 0.7) vs. k in
hundreds of iterations (abscissa, 1 to 9).]
Figure 2. Learning Curve of the Time Average of Z(k) Versus k.
A CONTINUOUS-VALUED LEARNING CONTROLLER 275
REFERENCES
Yale University
I. INTRODUCTION
277
278 VISWANATHAN AND NARENDRA
Figure 1. Automaton-Environment Feedback Configuration
or as a sufficient condition if
or equivalently if
ON VARIABLE-STRUCTURE STOCHASTIC AUTOMATA 279
    lim_{n→∞} p_k(n) = 1 ,   k ∈ {1, 2, \ldots, r}
The closer the value of M [satisfying (1.3)] to c_k, the more exped-
ient is the automaton. The degree of expediency can be defined
similarly so that it is maximum when M = c_k and zero when M = M_0.
Expediency assures a performance better than that obtained by
choosing the automaton actions with equal probabilities. However,
expediency allows, in the limit, finite and perhaps significant
probabilities of occurrence of poorer actions (those having higher
penalty probabilities). Furthermore, as the probabilities p_i(n)
(i = 1, 2, ..., r) are bounded below and above by 0 and 1 respectively,
the optimal convergence defined in (1.7) results in zero variance
for the limiting state probability vector. Hence optimality is
more desirable than expedient behavior.
    \sum_{i=1}^{r} f_i^α[p_1(n), p_2(n), \ldots, p_r(n)] = 0          (1.9)

Let α(n) = α_i. Clearly, the action α_i is penalized if f_i^α[·] < 0
and rewarded if f_i^α[·] > 0. The updating scheme by which the auto-
maton changes its structure is said to be a reward-penalty scheme
if it rewards an action that causes the environment to produce a
non-penalty output and penalizes an action that makes the environ-
ment respond with a penalty. Similarly reward-reward, reward-
inaction, inaction-penalty and penalty-penalty updating schemes can
be defined.
Linear and Nonlinear Schemes. An updating scheme is
(i) linear, if for each i ∈ {1, 2, ..., r} the updating term
    f_i^α[·] depends linearly on p_k(n) (k = 1, 2, ..., r),
and
(ii) nonlinear, if it is not linear.
(1.11)
and (iii) the step size factors in the updating functions are
such that the updated probabilities lie within [0,1]
at each stage 'n'.
A necessary condition for optimal convergence of an updating scheme
belonging to S is stated in the following theorem.
    p_k(n) = 1 ,   k = 1, 2, \ldots, r
In view of (1.9) it is necessary that

    \sum_{k=1}^{r} λ_k(n) = 1                                        (3.2)
    (i) reward:    λ_j(n) = 0 ,  j ≠ i ;   β(n) = (1−a) ,  0 < a < 1
    (ii) penalty:  λ_i(n) = 0 ,  λ_j(n) = \frac{1}{r−1} ,  j ≠ i ;    (3.3)
                   β(n) = (1−b) ,  0 < b < 1

    M = \frac{r}{\sum_{i=1}^{r} 1/c_i}
or of the form

    p_i(n+1) = p_i(n) − β p_i(n) ,   0 < β < 1
    (ii) penalty:

    f_+(p) = \frac{(1−a)\, p (1−p)}{a p + (1−p)} ,   0 < a < 1        (4.2)

    f_−(p) = − \frac{(1−b)\, p (1−p)}{b p + (1−p)} ,   0 < b < 1      (4.4)

with b = a^k, k ≥ 1. Using another result from [11] (p. 240,
Theorem 5), it is found that the S-model scheme as given in (4.1)
and (4.4) is optimal provided that

    c_1 < \frac{1}{k+1} < c_2                                         (4.5)
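The linear reward-penalty scheme of (3.3) can be simulated as below
for r actions; the penalty probabilities c_i and step sizes are
illustrative:

```python
import random

random.seed(3)

def update(p, i, penalized, a=0.05, b=0.05):
    """One step of the linear reward-penalty scheme: on reward the
    chosen action i moves toward 1 with step a; on penalty it is
    scaled down and the deficit is shared equally among the others."""
    r = len(p)
    q = list(p)
    if not penalized:                       # reward
        for j in range(r):
            q[j] = (1 - a) * p[j]
        q[i] = p[i] + a * (1 - p[i])
    else:                                   # penalty
        for j in range(r):
            q[j] = b / (r - 1) + (1 - b) * p[j]
        q[i] = (1 - b) * p[i]
    return q

c = [0.2, 0.5, 0.7]                         # penalty probabilities c_i
p = [1 / 3] * 3
for _ in range(3000):
    i = random.choices(range(3), weights=p)[0]
    p = update(p, i, penalized=(random.random() < c[i]))

print(p)   # the low-penalty action (index 0) acquires the largest probability
```

Both branches preserve the probability simplex exactly, so no
renormalization is needed.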
while for

    f_+(p) = λ_1 p^a (1−p)^{a+1} ,
    f_−(p) = −λ_2 p^a (1−p)^{a+1} ,   a ≥ 2 ,

the nonlinear schemes with (λ_1 = 0, λ_2 > 0) and (λ_1 < 0, λ_2 > 0)
are optimally convergent.
    ± β \sum_{a=2}^{∞} p^a (1−p)^{a+1}
        = ± β\, \frac{p^2 (1−p)^3}{1 − p(1−p)} ,   0 < β < 3          (4.17)

                                                                      (4.18)
V. FURTHER COMMENTS
For optimal schemes, the expected step size Δp_1(n) may be used
as a criterion for comparison. For example, a scheme A may be de-
fined to be better than a scheme B if Δp_1(n) for A is larger than
Δp_1(n) for B whenever p_1^A(n) = p_1^B(n). Using such a criterion
it can be shown that the N_{R−P}^{(2)} scheme is better than the
N_{R−I}^{(2)}, N_{R−R}^{(2)}, N_{I−P}^{(2)}, and N_{P−P}^{(2)} schemes.
REFERENCES
ACKNOWLEDGMENT
K. S. Fu
Purdue University
I. INTRODUCTION
288
A CRITICAL REVIEW OF LEARNING CONTROL RESEARCH 289
(1) Improvement of learning rate and the stopping rule. The
rate of learning of existing learning algorithms is considered
rather slow, particularly for fast-response systems. It may be
improved either by appropriate utilization of a priori knowledge
or by developing new and faster learning algorithms. Furthermore,
most of the existing algorithms have been proved to be convergent
asymptotically. In practice, with a finite time of operation, the
time available for learning may be rather limited. Consequently,
the quality of a learning algorithm at a finite number of itera-
tions becomes important. If the tolerable or satisfactory perfor-
mance of the system can be specified, an appropriate stopping rule
is required so that the learning time will not be longer than
necessary.
[Block diagram: the human controller, comprising data acquisition
and calculations, a force program, and pattern recognition, issues
commands to the arm-and-hand dynamics and the plant dynamics, with
sensors closing the loop.]
Figure 1. Pattern Recognizing Model for a Manual Control System
A CRITICAL REVIEW OF LEARNING CONTROL RESEARCH 291
[Block diagram: problem solving and planning, models of the
environment, and reflexive responses of the robot vehicle.]
REFERENCES
ACKNOWLEDGMENT
Nagoya University
Nagoya, Japan
1. INTRODUCTION
297
298 NAKAMURA AND ODA
Figure 1. Heuristic Element and Heuristic Algorithm.
HEURISTICS AND LEARNING CONTROL 299
randomly, then selects the highest trial point among them. Although
there are several varieties of the scheme (2), the basic frame of
search is as follows. First a point x_i, i = 1, 2, ..., is selected
randomly and adopted as a base point for a successive local search,
which will reach an extreme point x_i^m with a value f_i. Next another
point x_{i+1} is selected, and the same procedure is repeated until a
given number of failures (f_{i+1} ≤ max_i {f_i}) are counted. The
highest (optimum) point f^0 is obtained as max_i {f_i}. There are
also several varieties of (3). An example of them is as follows.
First, a global model of a polynomial approximate equation is made
for a criterion function from data of local search; then an optimum
point of the model is conjectured. After jumping over to the
conjectured optimum point, some correcting trials of search are
added to settle on a true optimum point (Polynomial Conjecturing
Method). In another example, first the local search points of four
adjacent trials make a local approximate criterion function of a
third-order polynomial equation; then a global criterion function is
constructed by a piecewise connection of such local functions. The
next trial is made at the maximum point of the global function.
After the trial, the global function is reconstructed by the
addition of the new trial result, and another trial point is
selected; in this way, the search process is repeated (Method of
Piecewise Cubic Approximation). There is also a method of
probabilistic model which approximates by probability the intervals
between two adjacent points, and so on.
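Scheme (2) — random multistart with a failure-count stopping rule —
can be sketched as follows; the criterion function and all constants
are made up:

```python
import random

random.seed(7)

def f(x):
    # Multi-modal criterion function with peaks near 0.2 and 0.8.
    return -(x - 0.2) ** 2 * (x - 0.8) ** 2 + 0.05 * x

def local_search(x, step=0.01, lo=0.0, hi=1.0):
    """Simple hill climb to a local peak from base point x."""
    while True:
        nbrs = [max(lo, x - step), min(hi, x + step)]
        best = max(nbrs, key=f)
        if f(best) <= f(x):
            return x
        x = best

best_x, fails = None, 0
while fails < 10:               # stop after 10 consecutive failures
    xm = local_search(random.random())   # random base point
    if best_x is None or f(xm) > f(best_x):
        best_x, fails = xm, 0
    else:
        fails += 1

print(best_x, f(best_x))        # the highest local peak found
```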
a screen, which prevents the subject from seeing the test hill
(criterion function) in front of the experimenter. When the subject
selects a trial point x = (x_1, x_2), the experimenter tells him the
height of the point on the test hill. The subject writes down the
height on a recording paper, and selects the next trial point.
Repeating the above procedure, the subject finds an optimum point —
drawn on the experimenter's test-hill paper but hidden from the
searcher by the screen — by using the experience of search recorded
successively on his searching paper. The main goal is to search for
the optimum point f(x^m). An incidental goal is to keep up the value
of

    r(x_i) = \frac{1}{N} \sum_{i=1}^{N} f(x_i) .
5.4. Discussions
[Search-grid records omitted.]

Figure 4. An Example of Searching Experiment (In Case of YF-16).

Figure 5. Example of Mode Transition Diagram: (a) YF-16, (b) Kn-13,
(c) EH-4.
306 NAKAMURA AND aDA
6. CONCLUSION
This paper emphasized that the courageous work for the intro-
duction of more sophisticated intelligence to control engineering
should be started just now, demonstrated the authors' opinion on the
definition and meaning of Heuristics, Heuristic Learning, Learned
Heuristic and Intuition, and furthermore introduced the authors'
research on heuristics evolved by a human searcher in the peak-
searching process on a two-dimensional and multi-modal criterion
function. There remain various problems worthy of future work:
What kind of difference will appear between heuristics evolved
by subjects with and without any sense organ such as the visual organ,
REFERENCES
* in Japanese
308 NAKAMURA AND ODA
Bernard Widrow
Stanford University
ABSTRACT
310
ADAPTIVE MODEL CONTROL
[Diagrams: the unknown plant with an adjustable compensator and a
transducer; the experimental link between the IBM 1130 at the Palo
Alto Medical Research Foundation and the Durand Lab, Stanford
University.]
Figure 2. The Experimental Set-up.
[Recordings: arterial blood pressure (mm Hg) of the sick and healthy
dogs; diagram labels: reference input (blood-pressure command
signal), sampler.]
The adaptive model would be linear if the weights were fixed
or if their values were not functions of the input-signal character-
istics. The adaptive process automatically adjusts the weights so
that, for the given input-signal statistics, the model provides a
best minimum-mean-square-error fit to a sampled version of the
combination of the zero-order hold and the plant. The adaptive
process utilized is the LMS algorithm, presented first in refs. [4]
and [5] and presented more completely in the context of applications
to pattern recognition [6] and applications to spatial and temporal
filtering (adaptive antenna arrays) [7].
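A minimal sketch of LMS identifying an unknown transversal (FIR)
response — the 3-tap plant, step size, and input statistics are
assumptions, not the paper's experiment:

```python
import random

random.seed(5)

plant = [0.5, -0.3, 0.2]                 # unknown weights to identify
n = len(plant)
w = [0.0] * n                            # adaptive model weights
mu = 0.05                                # LMS step size

x_hist = [0.0] * n                       # tapped delay line
for _ in range(2000):
    x_hist = [random.gauss(0.0, 1.0)] + x_hist[:-1]   # shift register
    d = sum(pi * xi for pi, xi in zip(plant, x_hist)) # plant output
    y = sum(wi * xi for wi, xi in zip(w, x_hist))     # model output
    e = d - y                                         # error signal
    w = [wi + 2 * mu * e * xi for wi, xi in zip(w, x_hist)]  # LMS update

print(w)   # close to the plant weights [0.5, -0.3, 0.2]
```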
[Diagram: sampler, error signal used in adapting the weights, and
the bias weight w_0.]
where

    R ≜ E[X_j X_j^T]   and   P ≜ E[d_j X_j]                          (3)

    x_j = \frac{1}{w_3} \left[ r_j − w_0 − \sum_{i=4}^{n} w_i x_{j−i+3} \right] .
ADAPTIVE MODEL CONTROL 317
[Diagrams: forward-time calculation of {x_j} from the reference
input (command signal): (a) first two weights zero; (b) first two
weights small, finite.]
    w_5 x_j + w_4 x_{j+1} + w_3 x_{j+2} + w_2 x_{j+3} + w_1 x_{j+4}
        = r_{j+2} − w_0 − \sum_{i=6}^{n} w_i x_{j−i+5}               (7)
These equations cannot be solved yet, since there are too many
"unknowns" for the number of equations. Since w_1 and w_2 are rela-
tively small, a solution can be obtained by letting two adjacent
distant-future values of x take arbitrary values, such as zero.
When only three values of the reference signal r_j, r_{j+1}, r_{j+2} are
known, we have three simultaneous equations to solve. We let
x_{j+4} = x_{j+3} = 0. It is then possible to solve for x_j, x_{j+1}, and x_{j+2}.
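The three simultaneous equations can be set up and solved as below;
the model weights and reference values are hypothetical, with the
Toeplitz rows following the pattern of eq. (7) and past x-values
taken as zero for simplicity:

```python
def solve3(A, b):
    """Gaussian elimination for a 3x3 system (A and b are copied)."""
    A = [row[:] for row in A]
    b = b[:]
    for k in range(3):
        for r in range(k + 1, 3):
            m = A[r][k] / A[k][k]
            for c in range(k, 3):
                A[r][c] -= m * A[k][c]
            b[r] -= m * b[k]
    x = [0.0, 0.0, 0.0]
    for k in range(2, -1, -1):
        x[k] = (b[k] - sum(A[k][c] * x[c] for c in range(k + 1, 3))) / A[k][k]
    return x

w = [0.0, 0.02, 0.05, 1.0, 0.4, 0.1]     # w_0 .. w_5; w_1, w_2 small
r = [1.0, 1.0, 1.0]                      # r_j, r_{j+1}, r_{j+2}
A = [[w[3], w[2], w[1]],                 # equation for r_j
     [w[4], w[3], w[2]],                 # equation for r_{j+1}
     [w[5], w[4], w[3]]]                 # equation for r_{j+2}
b = [ri - w[0] for ri in r]
x = solve3(A, b)
print(x)                                 # tentative x_j, x_{j+1}, x_{j+2}
```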
Although we only need the decided value x_j at the time of the jth
cycle for direct control purposes, the future tentative values are
interesting to have also.
Note that when the transport delay mechanism is such that the
first k weights of the model are relatively small, solving the
equations determining Xj and future x-values requires assuming
that a sequence of k distant-future x-values are zero.
The total memory time of the adaptive model was 100 seconds.
The model contained 20 taps with 5 second delays between taps of
the transversal filter. Once automatic control was established,
the system took about 5 minutes to settle the blood pressure close
to the set point. Thus the system settling time was about 3 times
as long as the memory time of the model. This represents rather
fast settling for an adaptive control system.
[Recordings: arterial blood pressure (mm Hg) vs. time (minutes) for
the healthy dog and the sick dog, under manual and then automatic
control, with the pressure set point marked; and the adapted weight
values plotted against weight number after several minutes of
adaptation.]
REFERENCES
ACKNOWLEDGEMENTS
Jin-ichi Nagumo
University of Tokyo
Tokyo, Japan
INTRODUCTION
325
326 NAGUMO
    \sum_{i=1}^{N} w_i x_{j−i}                                       (4)

    \sum_{i=1}^{N} v_i(j)\, x_{j−i}                                  (6)
[Diagram: the input x_j passes through the weights w_1, ..., w_N and
is summed to form z_j; the error y_j − z_j drives the adjustment.]
    W = (w_1, w_2, \ldots, w_N)^T ,
    V(j) = (v_1(j), v_2(j), \ldots, v_N(j))^T ,
    X_j = (x_{j−1}, x_{j−2}, \ldots, x_{j−N})^T

    z_j = (V(j), X_j)                                                (9)

                                                                     (10)
Figure 2. Geometrical Interpretation of the Adjustment Procedure
When N=3.
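The geometrical interpretation suggests a projection-type
adjustment: move the weight estimate toward the hyperplane
(V, X_j) = y_j along its normal X_j. A sketch of this idea follows;
the full-projection gain is an assumption (the paper's exact rule is
its eq. (10), not shown in this excerpt), and the 3-tap system is
made up:

```python
import random

random.seed(9)

W = [1.0, -0.5, 0.25]                    # true weighting function
V = [0.0, 0.0, 0.0]                      # current estimate V(j)
mu = 1.0                                 # full projection step (assumed)

for _ in range(200):
    X = [random.gauss(0.0, 1.0) for _ in range(3)]        # input X_j
    y = sum(wi * xi for wi, xi in zip(W, X))              # system output
    e = y - sum(vi * xi for vi, xi in zip(V, X))          # output error
    g = mu * e / sum(xi * xi for xi in X)                 # normalized gain
    V = [vi + g * xi for vi, xi in zip(V, X)]             # project onto plane

print(V)   # converges to the true weights [1.0, -0.5, 0.25]
```

Each step lands the estimate exactly on the current hyperplane, so
with noise-free data the error can only shrink.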
[Diagrams: the man-machine pursuit tracking system (display,
subject, tape-recorder) and the console flow of program IDSYS —
sample and display, set of a weighting function, identification of
the system, the tracking experiment, return.]
Figure 5. Two Typical Shapes of the Weighting Functions of the
Pursuit Tracking Systems in the Normal Mode Operation of the Sub-
ject. A: trapezoidal, B: bell-shaped.
334 NAGUMO
SUMMARY
REFERENCES
"An Inconsistency Between the Rate and the Accuracy of the Learning
Method for System Identification and Its Tracking Character-
istics" by Atsuhiko Noda
Discussers: D. W. C. Shen, G. N. Saridis, Y. Funahashi
337
338 LIST OF DISCUSSERS