
and decision analysis," Civ. Eng. Syst., vol. 2, pp. 201-208, 1985. rule and the corresponding fuzzy model identification algorithmr
[24] G. Hobson and K. Luetkemeyer, "EO/IR automatic feature recogni-
tion," Emerson IRAD Tech. Rep., fiscal year 1986, Oct. 1986. were also proposed by Pedrycz in [10]. Li et al. studied the
[25] L. T. Minor and J. Sklansky, "The detection and segmentation of blobs self-learning of fuzzy models [9] and proposed a self-learning
in infrared images," IEEE Trans. Syst. Man, Cybern., vol. SMC-11, no. algorithm for the simple SISO (single-input/single-output) fuzzy
3, pp. 194-201, 1981. models.
[26] J. K, McWiliams and M. D. Srinath, "Performance analysis of target This correspondence proposes a general fuzzy model identifica-
detection systems using infrared imagery," IEEE Trans. Aerospace Elec-
tron. Syst., vol. AES-20, no. 1, pp. 38-48, 1984. tion algorithm for MISO (multi-input/single-output) systems
[27] H. Dreyfus, What Computers Can't Do. New York: Harper and Row, based on Pedrycz's work [10] and a related self-learning al-
1972. gorithm. Two numerical examples show that the proposed identi-
[28] G. Waldman, J. R. Wootton, and K. Lanburg, "An IR detection pro- fication algorithm can provide a fuzzy model with a fairly high
gram incorporating EOSAEL," presented at the 4th Ann. EOSAL
Workshop, Nov. 1983. accuracy and that the proposed self-learning algorithm might
[29] J. R. Wootton, E. Carney, and H. Nelgner, "Sensor stabilization require- make the model more accurate. Note that the self-learning al-
ment investigation," in Proc., Optical Platforms, Nat. Symp. and gorithm can also readily be converted into a real-time form for
Workshop, SPIE, vol. 493, June 1984, pp. 426-428. on-line applications purposes.
[30] H. Nelgner and J. Wootton, "Computer based electro-optical detection
model," Emerson Electric, St. Louis, MO, Tech. Rep., vol. 17342, Apr.
1980.
II. Fuzzy MODEL IDENTIFICATION
A. Problem Statement
A discrete-time fuzzy relational model for a MISO system with
p inputs may be written as
y(t)=y(t-T) y(t T-)o o (t-r1-ny)oul(t-
Fuzzy Model Identification and Self-Learning for
Dynamic Systems
CHEN-WEI XU AND YONG-ZAI LU
o . ou(t- )

Abstract -The algorithms of fuzzy model identification and self-learn-


u(t-Tr-np)oR (1)
... o

ing for multi-input/multi-output dynamic systems are proposed. The re- where output y(.) and inputsu'(.), .,up(.) are all fuzzy
quired computer capacity and time for implementing the proposed variables, and R is the fuzzy relation between the inputs and the
output. The symbol "c" denotes the fuzzy composition operator.
algorithms and related resulting models are- significantly reduced by
T, 1,* *, '1,T are time delays and ny, n1,* *, np present the system
introducing the concept of the "referential fuzzy sets." Two numerical
examples are given to show that the proposed algorithms can provide the
orders. Equation (1) is one of the general forms of the MISO
fuzzy models with satisfactory accuracy. discrete-time fuzzy models. As is well-known, the identification
problem usually involves both the structure identification and the
I. INTRODUCTION parameter estimation. Obviously, the structure identification for
A number of approaches to the identification of system dy- the problem under study is to determine the delays (T, T1.*, Tp)
namics have been proposed during the last two decades [1]. and the orders (ny , tn, ,np). The approach to the structure
However, many difficulties still exist in applying the existing identification could be simnilar to the regular methods.
methods to many real complex systems with nonlinear time-vary- For convenience, let
ing characteristics. One of the possible approaches to overcome
these difficulties is to use a fuzzy model [5]-[7] to describe the x1(t) =y(t-'T)
static and/or dynamic behavior of these systems. The identifica- x2(t) = (t - T-1
tion of such fuzzy models may be done in two ways: the linguistic
approach [7]-[10], [12] and the approach based on resolving fuzzy
relational equations [5], [6], [11].
The fuzzy relational model based identification was proposed a
xny+l(t) =uy(t-r Y) (2)
few years ago. Tong [7] proposed a "logical examination" method Xn +2( t) =ui( t 1)T
to solve the linguistic identification problem [17]. Li et al. mod-
ified Tong's method and got a better result [8]. They also pro-
posed an adaptive model modification algorithm based on the
"decision table." However, the proposed algorithmns could not be
Ix"(t) = -T,- u,(t n,)
used for multivariable systems with high dimensions due to the where
large amounts of computer memory and time required. In ad- p
dition, the correlation analysis method determining the structure n=n +1 +, (T,+1).
of the rule set, as proposed in [7] and [8], is also difficult to i-=1
extend to multivariable systems.
Pedrycz emphasized that the "referential fuzzy set" is an Substituting (2) into (1) yields
important concept in linguistic modeling [10]. A new composition
A(t) 7xI(t) 0 x2(t) 0.* *,* x,,t) o R. (3)
It should be emphasized that here the fuzzy relational model (3)
is based on the concept of " referential fuzzy sets" used by several
Manuscript received April 19, 1986; revised December 12, 1986. This work investigators [7J-[10], [21]. A new composition rule has been
was supported in part by the National Science Foundation from the National adopted such that the fuzzy relation R in (3) does not directly
Educational Committee of China. connect the elements of all universes but connects the prespeci-
The authors are with the Laboratory for Industrial Process Modelling and
Control, Zhejiang University, Hangzhou, China. fied linguistic constants (referential sets) on these universes as
IEEE Log Number 8714515. proposed in [10].
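To make the change of variables in (2) concrete, here is a minimal Python sketch of the regressor construction; the function name `build_regressor` and the list-based signal representation are illustrative assumptions, not part of the correspondence.

```python
# Sketch: assemble the regressor vector of (2) from delayed signals.
# y and u hold histories of (fuzzy) variables; tau, ny, taus, ns follow
# the notation of (1)-(2). All names here are hypothetical.

def build_regressor(y, u, t, tau, ny, taus, ns):
    """Return [x_1(t), ..., x_n(t)] as defined in (2).

    y    : list, output history y[0..t]
    u    : list of p lists, input histories u[i][0..t]
    tau  : output delay tau;   ny : output order n_y
    taus : list of p input delays tau_1..tau_p
    ns   : list of p input orders n_1..n_p
    """
    x = [y[t - tau - j] for j in range(ny + 1)]       # y(t-tau)..y(t-tau-n_y)
    for ui, ti, ni in zip(u, taus, ns):
        x += [ui[t - ti - j] for j in range(ni + 1)]  # u_i(t-tau_i)..u_i(t-tau_i-n_i)
    return x  # length n = n_y + 1 + sum_i (n_i + 1)
```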



The fuzzy model identification problem can then be described as follows:

$$\min_{\substack{R \\ \text{model structure} \\ \text{referential sets}}} J = \frac{1}{N - T_{\max} + 1} \sum_{k=T_{\max}}^{N} \big(y(k) - \hat{y}(k)\big)^2 \qquad (4)$$

subject to the I/O data sequence $\{y(k), x(k),\ k = 1, N\}$, where

$$T_{\max} = \max(\tau + n_y,\ \tau_1 + n_1,\ \cdots,\ \tau_p + n_p).$$

B. Determination of Referential Fuzzy Sets

Let $Y, X_1,\cdots,X_n$ be $(n+1)$ universes of discourse and $y, x_1,\cdots,x_n$ be their generic elements, respectively. Assume that each universe contains the same number ($= r$) of referential sets, which are

$$A_{11},\cdots,A_{1r} \in F(X_1)$$
$$\vdots \qquad (5)$$
$$A_{n1},\cdots,A_{nr} \in F(X_n)$$
$$B_1,\cdots,B_r \in F(Y)$$

where $F(Y)$ stands for the set of all fuzzy sets on $Y$, and so on. The referential fuzzy sets have linguistic meanings. For instance, if $X_1$ is for temperature, then one may define $A_{11}$ as "low temperature," $A_{1i}$ ($1 < i < r$) as "medium temperature," and $A_{1r}$ as "high temperature," etc. These referential fuzzy sets are characterized by their membership functions $A_{ij}(x_i)\colon X_i \to [0,1]$, $i = 1, n$, $j = 1, r$, and $B_j(y)\colon Y \to [0,1]$, $j = 1, r$.

To determine the memberships of the referential fuzzy sets, statistical [8], clustering [18], or subjective methods [8], [21] can be used. To ensure the performance of the fuzzy model and to provide a uniform basis for further study, it is required that all referential sets be normal and convex and satisfy the following completeness conditions:

$$\text{for all } x_i \in X_i,\ \exists j \in \bar{r}\colon\ A_{ij}(x_i) > 0,\qquad i = 1, n \qquad (6)$$

and

$$\text{for all } y \in Y,\ \exists j \in \bar{r}\colon\ B_j(y) > 0 \qquad (7)$$

where $\bar{r} \triangleq \{1, 2,\cdots,r\}$.

The number of referential fuzzy sets in each universe, $r$, should be selected according to experience and tests. In general, the model accuracy may be improved by increasing $r$ [10]. On the other hand, a larger $r$ will certainly require more computer memory and CPU time.

No evidence exists to show that any other method is better than the subjective one in determining the referential fuzzy sets [8], [10]. Furthermore, one can readily ensure the convexity, normality, and completeness of the referential sets through the subjective method, which is used in this correspondence.
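As an illustration of Section II-B, the following sketch builds $r = 5$ triangular referential fuzzy sets on a universe and checks normality and the completeness condition (6). The triangular shape and evenly spaced peaks are plausible choices consistent with Fig. 3, not a specification taken from the text.

```python
import numpy as np

def referential_sets(lo, hi, r=5):
    """r triangular referential fuzzy sets with peaks evenly spaced on [lo, hi].

    Each set is normal (reaches 1 at its peak) and convex (triangular);
    adjacent sets overlap, so every point has nonzero membership somewhere.
    """
    peaks = np.linspace(lo, hi, r)
    width = peaks[1] - peaks[0]

    def make(c):
        def mu(x):
            return np.clip(1.0 - abs(np.asarray(x, float) - c) / width, 0.0, 1.0)
        return mu

    return [make(c) for c in peaks]

# Completeness check (6): every grid point has membership > 0 in some set.
sets = referential_sets(0.0, 1.0, r=5)
grid = np.linspace(0.0, 1.0, 101)
assert all(max(mu(x) for mu in sets) > 0 for x in grid)
```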
C. Identification Algorithm A1 - Determining R

The main task in identification is to determine the fuzzy relation $R$ from the input/output data sequence of the system. A relation $R$ based on referential sets is different from one based on the elements of the universes, both in form and in meaning. Let the memberships of $R$ be $R(s_1,\cdots,s_n,s)$, which express the connections among the referential sets $A_{1s_1},\cdots,A_{ns_n},B_s$ ($s_1,\cdots,s_n,s \in \bar{r}$).

Algorithm A1:

a) Calculate the possibility distribution [20] of the $k$th ($k = 1, N$) data pair $[x_1(k),\cdots,x_n(k),y(k)]$ on the corresponding referential sets:

$$p_{1j}(k) \triangleq \mathrm{poss}(A_{1j}\,|\,x_1(k)) = \sup_{x_1} \min\big[A_{1j}(x_1),\ X_1(k, x_1)\big]$$
$$\vdots$$
$$p_{nj}(k) \triangleq \mathrm{poss}(A_{nj}\,|\,x_n(k)) = \sup_{x_n} \min\big[A_{nj}(x_n),\ X_n(k, x_n)\big] \qquad (8)$$
$$p_j(k) \triangleq \mathrm{poss}(B_j\,|\,y(k)) = \sup_{y} \min\big[B_j(y),\ Y(k, y)\big],\qquad j = 1, r$$

where $X_i(k, x_i)$ is the membership of $x_i(k)$, $i = 1, n$.

b) Construct the vectors

$$p_{x_1}(k) \triangleq [p_{11}(k),\cdots,p_{1r}(k)]$$
$$\vdots \qquad (9)$$
$$p_{x_n}(k) \triangleq [p_{n1}(k),\cdots,p_{nr}(k)]$$
$$p_y(k) \triangleq [p_1(k),\cdots,p_r(k)],\qquad k = 1, N.$$

The subrelation $R_k$ can then be constructed from $(p_{x_i}(k),\ i = 1, n)$ and $p_y(k)$:

$$R_k = p_{x_1}(k) \times \cdots \times p_{x_n}(k) \times p_y(k) \qquad (10)$$

where $\times$ denotes the Cartesian product, i.e.,

$$R_k(s_1,\cdots,s_n,s) = \min\big[p_{1s_1}(k),\cdots,p_{ns_n}(k),\ p_s(k)\big],\qquad \text{for all } s_1,\cdots,s_n,s \in \bar{r}. \qquad (11)$$

In fact, this corresponds to max-min composition. If max-product composition is applied, (10) should be defined as follows:

$$R_k(s_1,\cdots,s_n,s) = p_{1s_1}(k) \times p_{2s_2}(k) \times \cdots \times p_{ns_n}(k) \times p_s(k),\qquad \text{for all } s_1,\cdots,s_n,s \in \bar{r}. \qquad (12)$$

c) Calculate $R$:

$$R = \bigcup_{k=1}^{N} R_k \qquad (13)$$

where $\cup$ denotes the union operation, i.e.,

$$R(s_1,\cdots,s_n,s) = \bigvee_{k=1}^{N} R_k(s_1,\cdots,s_n,s) \qquad (14)$$

where $\vee = \max$.

D. Using the Fuzzy Model

If $R$ and $x_1(k),\cdots,x_n(k)$ in (3) are given, $y(k)$ can then be calculated as follows. First, find the referential fuzzy sets "closest" to $x_1(k),\cdots,x_n(k)$, denoted by $A_{1\lambda_1},\cdots,A_{n\lambda_n}$, respectively, where

$$\lambda_1 = \{j\,|\,p_{1j}(k) > q,\ j \in \bar{r}\}$$
$$\vdots \qquad (15)$$
$$\lambda_n = \{j\,|\,p_{nj}(k) > q,\ j \in \bar{r}\},\qquad 0 < q < 1$$

and $q$ is a preselected threshold. If $\lambda_1,\cdots,\lambda_n$ are unique, $y(k)$ can then be calculated in terms of its membership $Y(k, y)$ as follows:

$$Y(k, y) = \max_{s} \min\big[R(\lambda_1,\cdots,\lambda_n,s),\ B_s(y)\big]. \qquad (16)$$

If $\lambda_1,\cdots,\lambda_n$ are not unique and are denoted by $\lambda_1^{(k_1)}, \lambda_2^{(k_2)},\cdots,\lambda_n^{(k_n)}$, respectively, then $Y(k, y)$ can be obtained from

$$Y(k, y) = \max_{i_1 \in \bar{k}_1}\ \max_{i_2 \in \bar{k}_2} \cdots \max_{i_n \in \bar{k}_n}\ \max_{s} \min\Big[R\big(\lambda_1^{(i_1)},\cdots,\lambda_n^{(i_n)},s\big),\ B_s(y)\Big]. \qquad (17)$$

Note that (16) and (17) correspond to max-min composition. If max-product composition is used, the min operation in these equations should be replaced by the product operation.
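The following sketch is one illustrative reading of Algorithm A1 and the prediction step (16), under max-min composition with crisp (singleton) data; all function names are hypothetical, and the threshold $q$ of (15) is replaced by an argmax so that the $\lambda_i$ are unique.

```python
import numpy as np
from itertools import product

def possibilities(value, sets):
    # (8) with singleton data: poss(A_j | x) reduces to A_j(x).
    return np.array([mu(value) for mu in sets])

def identify_R(X, Y, in_sets, out_sets):
    """Algorithm A1: build R(s_1,...,s_n,s) from data via (10)-(14).

    X        : array (N, n) of crisp regressor vectors
    Y        : array (N,)   of crisp outputs
    in_sets  : list of n lists of membership functions (r per universe)
    out_sets : list of r output membership functions
    """
    n, r = X.shape[1], len(out_sets)
    R = np.zeros((r,) * (n + 1))
    for xk, yk in zip(X, Y):
        p = [possibilities(xk[i], in_sets[i]) for i in range(n)]
        py = possibilities(yk, out_sets)
        for idx in product(range(r), repeat=n + 1):
            *s, sy = idx
            rk = min(min(p[i][s[i]] for i in range(n)), py[sy])  # (11)
            R[idx] = max(R[idx], rk)                             # (13)-(14)
    return R

def predict(R, x, in_sets, out_sets, ygrid):
    """(15)-(16), then defuzzify by the fuzzy mean (deF3, cf. (18) below)."""
    lam = tuple(int(np.argmax(possibilities(x[i], s)))
                for i, s in enumerate(in_sets))       # closest referential sets
    strengths = R[lam]                                # rule strengths over s = 1..r
    Yk = np.array([max(min(strengths[s], out_sets[s](yv))
                       for s in range(len(out_sets))) for yv in ygrid])  # (16)
    return float((ygrid * Yk).sum() / Yk.sum())
```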
The fuzzification and defuzzification processes should be taken into account both in constructing and in using the fuzzy model. Fuzzification means transforming a scalar into a fuzzy set, namely, fuzzifying the scalar; the result is usually a singleton, i.e., a fuzzy set whose only nonzero membership is at the element "closest" to the scalar [10]. Defuzzification is the inverse process: a fuzzy set is aggregated into a scalar. For defuzzification we usually have three choices: the maximum membership method [10], denoted by deF1, which chooses as the scalar the element with the greatest membership, or the mean of the elements sharing the same greatest membership in case no unique peak exists; the mean-area method [19], denoted by deF2, which chooses as the scalar an element that divides the area under the membership curve into two equal parts; and the fuzzy mean method [14], denoted by deF3, which determines the scalar, say $\bar{x}$, for a fuzzy set $A$ on $X$ by the formula

$$\bar{x} = \frac{\sum_{x \in X} x \cdot A(x)}{\sum_{x \in X} A(x)} \qquad (18)$$

where $X$ is a finite set with generic element $x$ and $A(x)$ is the membership of $A$.
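The three defuzzification operators can be sketched on a discretized universe as follows (a straightforward reading of the deF1/deF2/deF3 definitions; names are hypothetical):

```python
import numpy as np

def deF1(xs, mu):
    """Maximum membership: mean of the elements attaining the peak."""
    return float(xs[mu == mu.max()].mean())

def deF2(xs, mu):
    """Mean-area: element splitting the area under mu into two equal parts."""
    c = np.cumsum(mu)
    return float(xs[np.searchsorted(c, c[-1] / 2.0)])

def deF3(xs, mu):
    """Fuzzy mean, (18)."""
    return float((xs * mu).sum() / mu.sum())

xs = np.linspace(0.0, 1.0, 101)               # discretized universe
mu = np.clip(1 - abs(xs - 0.3) / 0.2, 0, 1)   # an example fuzzy set
print(deF1(xs, mu), deF2(xs, mu), deF3(xs, mu))
```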
Through the development of Algorithm A1, it can be seen that the fuzzy model is governed by a fuzzy relation $R$ connecting referential fuzzy sets. The input/output data used in the identification are viewed as linguistic data and thus make their contributions to the fuzzy relation $R$ according to their conditional possibility distributions on the corresponding referential fuzzy sets. The model is thus equivalent to a rule set, and each entry in $R$ corresponds to a specific rule:

if $\big(x_1(t) = A_{1s_1}$ and $\cdots$ and $x_n(t) = A_{ns_n}\big)$, then $\mathrm{poss}(B_s\,|\,y(t)) = R(s_1,\cdots,s_n,s)$, $\qquad s_1,\cdots,s_n,s \in \bar{r}$. (19)

III. SELF-LEARNING OF THE FUZZY MODEL

Fig. 1. Fuzzy model with self-learning mechanism.

It is obvious that Algorithm A1 cannot guarantee the optimality of the resulting model. Fortunately, a self-learning algorithm can improve the performance of the resulting fuzzy model and can be applied to time-varying systems. The schematic representation of the self-learning algorithm is shown in Fig. 1. It can be seen that the performance objective is to minimize the error $e(t) \triangleq y(t) - \hat{y}(t)$, where $\hat{y}(t)$ is the model's prediction of $y(t)$.

A fuzzy model usually consists of many rules. Referential fuzzy sets on different universes form linguistic rules through the connection of $R$; each element of $R$ corresponds to a specific rule. The model gives the prediction $\hat{y}(t)$ based on $x_1(t),\cdots,x_n(t)$ using the rule set. Obviously, only a small part of the rule set needs to be used to produce $\hat{y}(t)$. For instance, only $r$ rules are needed when $\lambda_1,\cdots,\lambda_n$ are unique. Thus only these rules should be modified through self-learning if $e(t) \neq 0$.

In fact, the self-learning algorithm is a rule-modification algorithm. To modify the rules, one can either modify the corresponding elements of $R$ or modify the memberships of the referential fuzzy sets. However, modification of the referential sets might cause confusion, because the referential fuzzy sets are used not only by the rules to be modified but also by all other rules. For this reason, only the fuzzy relation $R$ is modified in Algorithm A2. What follows is the self-learning algorithm A2 in its off-line form.

Algorithm A2: Suppose that the initial relation $R^{(0)}$ is given; it will be modified using the measurements $\{x_1(t),\cdots,x_n(t), y(t),\ t = 1, N\}$. The procedure is as follows:

a) Set $k = 0$, with $R^{(0)}$ given.

b) Set $k \leftarrow k + 1$.

c) Use the model $R^{(k-1)}$ and the data $x_1(k),\cdots,x_n(k)$ to produce the prediction $\hat{y}(k)$. Let $e(k) = y(k) - \hat{y}(k)$. If $|e(k)| < \varepsilon$ ($\varepsilon$ preselected), then $R^{(k)} = R^{(k-1)}$ and return to b); else continue.

d) Calculate $p_{1j}(k),\cdots,p_{nj}(k), p_j(k)$ ($j = 1, r$) according to (8). Then calculate $\lambda_1,\cdots,\lambda_n$. It is obvious that the following $r$ rules concerned with $A_{1\lambda_1},\cdots,A_{n\lambda_n}$ are responsible for the prediction $\hat{y}(k)$:

if $(A_{1\lambda_1},\cdots,A_{n\lambda_n})$, then $\mathrm{poss}(B_1\,|\,y(k)) = R^{(k-1)}(\lambda_1,\cdots,\lambda_n,1)$
$\vdots$ (20)
if $(A_{1\lambda_1},\cdots,A_{n\lambda_n})$, then $\mathrm{poss}(B_r\,|\,y(k)) = R^{(k-1)}(\lambda_1,\cdots,\lambda_n,r)$.

Thus only $r$ rules are required to be modified. In other words, the $r$ elements of $R$ concerned with $A_{1\lambda_1},\cdots,A_{n\lambda_n}$ should be modified.

e) Define the quantity $d_s(k)$:

$$d_s(k) = p_{1\lambda_1}(k) * \cdots * p_{n\lambda_n}(k) * p_s(k),\qquad s = 1, r \qquad (21)$$

where $* = \min$ for max-min composition and $* = $ product for the max-product case. Note that in (21),

$$p_s(k) = \mathrm{poss}(B_s\,|\,y(k)),\qquad s = 1, r. \qquad (22)$$

f) Modifying $R^{(k-1)}$ yields $R^{(k)}$:

$$R^{(k)}(s_1,\cdots,s_n,s) = \begin{cases} \alpha_s d_s(k) + (1-\alpha_s)\,R^{(k-1)}(s_1,\cdots,s_n,s), & (s_1,\cdots,s_n) = (\lambda_1,\cdots,\lambda_n) \\ R^{(k-1)}(s_1,\cdots,s_n,s), & \text{otherwise} \end{cases} \qquad \alpha_s \in [0,1],\quad s = 1, r. \qquad (23)$$

g) End if $k = N$; otherwise, return to b).
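The core of Algorithm A2, steps d)-f), is a convex blend of the old rule strengths with the data-derived strengths $d_s(k)$. A minimal sketch under max-min composition, with hypothetical names (the weights alpha are taken as given here; their construction per (32) appears below):

```python
import numpy as np

def a2_step(R, lam, p_lam, p_y, alpha):
    """One modification step (21)-(23) of Algorithm A2, max-min composition.

    R     : ndarray of shape (r,)*(n+1), current relation R^(k-1)
    lam   : tuple of n indices (lambda_1, ..., lambda_n)
    p_lam : array (n,), possibilities p_{i,lambda_i}(k) of the inputs
    p_y   : array (r,), possibilities p_s(k) of y(k) on B_1..B_r
    alpha : array (r,), learning weights alpha_s in [0, 1]
    """
    d = np.minimum(p_lam.min(), p_y)             # (21) with * = min
    R = R.copy()
    R[lam] = alpha * d + (1.0 - alpha) * R[lam]  # (23): only the r touched rules
    return R
```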

It can be seen that if the rules in (20) are replaced by

if $(A_{1\lambda_1},\cdots,A_{n\lambda_n})$, then $\mathrm{poss}(B_1\,|\,y(k)) = d_1(k)$
$\vdots$ (24)
if $(A_{1\lambda_1},\cdots,A_{n\lambda_n})$, then $\mathrm{poss}(B_r\,|\,y(k)) = d_r(k)$

then, since $d_1(k),\cdots,d_r(k)$ are calculated from the measurement $y(k)$, (24) will attempt to make the following inference:

if $(x_1(k),\cdots,x_n(k))$, then $y(k)$. (25)

Modification is realized in just this way. In Algorithm A2, if $\alpha_s = 0$ for all $s$, then no modification takes place; if $\alpha_s = 1$ for all $s$, rule set (24) completely replaces rule set (20) in $R$.

Since $y(k)$ usually contains noise, it is reasonable to keep $\alpha_s < 1$ to reduce the effect of measurement noise on the model. In this case a compromise is made between the old rule set, which results in $\hat{y}(k)$, and the new rule set, which attempts to infer $y(k)$ from $x_1(k),\cdots,x_n(k)$. It can then be seen that $\alpha_s$ is somewhat like the "step length" in rule updating. A larger $\alpha_s$ results in a faster self-learning rate, but the model is more easily affected by noise; a smaller $\alpha_s$ reduces the effect of noise on the model but also results in a slower self-learning rate.

The $\alpha_s$ ($s = 1, r$) are determined by two factors: 1) the amplitude of $|e(k)|$ (obviously, a larger $|e(k)|$ should correspond to a larger $\alpha_s$, and $\alpha_s = 0$ if $e(k) = 0$); and 2) the relative contribution of each of the $r$ rules to $\hat{y}(k)$: a rule which contributes more to $\hat{y}(k)$ should undergo more modification (a larger $\alpha_s$). To examine the contribution of each rule in (20) to $\hat{y}(k)$, we write

$$Y(k, y) = \max_{s} \min\big[R^{(k-1)}(\lambda_1,\cdots,\lambda_n,s),\ B_s(y)\big] \qquad (26)$$

where $Y(k, y)$ is the membership of $\hat{y}(k)$. If we define $t_s$ as

$$t_s = R^{(k-1)}(\lambda_1,\cdots,\lambda_n,s),\qquad s = 1, r \qquad (27)$$

and define $C_1,\cdots,C_r \in F(Y)$ with the membership of $C_s$ being

$$C_s(y) = t_s\qquad \text{for all } y \qquad (28)$$

and further define $\hat{y}_s(k) \in F(Y)$,

$$\hat{y}_s(k) = B_s \cap C_s,\qquad s = 1, r, \qquad (29)$$

then (16) can be written as

$$\hat{y}(k) = \bigcup_{s=1}^{r} \hat{y}_s(k). \qquad (30)$$

It can then be found that $\hat{y}_s(k)$ is the contribution to $\hat{y}(k)$ of the $s$th rule in rule set (20):

if $(A_{1\lambda_1},\cdots,A_{n\lambda_n})$, then $\mathrm{poss}(B_s\,|\,y(k)) = R^{(k-1)}(\lambda_1,\cdots,\lambda_n,s)$.

Now let us define $\beta_s$, $s = 1, r$, describing the relative contributions of $\hat{y}_1(k),\cdots,\hat{y}_r(k)$ to $\hat{y}(k)$:

$$\beta_s = \sum_{y \in Y} Y_s(k, y),\qquad s = 1, r \qquad (31)$$

where $Y_s(k, y)$ is the membership of $\hat{y}_s(k)$. It is obvious that the larger $\beta_s$ is, the greater the contribution of $\hat{y}_s(k)$ to $\hat{y}(k)$. Thus $\alpha_s$ is defined as

$$\alpha_s = h\,\beta_s\,|e(k)|,\qquad s = 1, r \qquad (32)$$

where $h$ is a constant used to control the range of $\alpha_s$. It should be emphasized that even if $\lambda_1,\cdots,\lambda_n$ are not unique, the self-learning algorithm and the discussion of $\alpha_s$ still apply. The case in which $\lambda_1,\cdots,\lambda_n$ are not unique is considered in the numerical examples to follow.
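Equations (26)-(32) condense to a few lines. The sketch below (illustrative only, continuing the hypothetical names used above) computes the contributions $\beta_s$ and the learning weights $\alpha_s$ on a sampled output universe:

```python
import numpy as np

def alphas(R, lam, out_mu, e, h):
    """Learning weights alpha_s of (32) from the rule contributions (27)-(31).

    R      : ndarray (r,)*(n+1), relation R^(k-1)
    out_mu : array (r, m), memberships B_s(y) sampled on an m-point y-grid
    e      : scalar prediction error e(k);  h : step-length constant
    """
    t = R[lam]                                   # (27): rule strengths t_s
    ys = np.minimum(out_mu, t[:, None])          # (28)-(29): Y_s(k,y) = min(B_s, t_s)
    beta = ys.sum(axis=1)                        # (31): contribution of each rule
    return np.clip(h * beta * abs(e), 0.0, 1.0)  # (32), kept inside [0,1] per (23)
```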
IV. THE NUMERICAL EXAMPLES

The following two numerical examples are presented to illustrate the proposed algorithms A1 and A2.

Example 1: Box's Gas Furnace Data [13]

In this example, 296 I/O data pairs are gathered from a gas furnace. The input $u(k)$ is the inlet methane rate, and the output $y(k)$ is the CO$_2$ concentration in the outlet gas. The data are written as $\{y(k), u(k),\ k = 1, 296\}$.

Identification Procedures:

a) Determine the universes of discourse $U$ and $Y$; $u(k) \in F(U)$ and $y(k) \in F(Y)$. This is done by checking the range of the I/O data.

b) Determine all referential fuzzy sets. Let $r = 5$. The referential fuzzy sets $A_1,\cdots,A_5, B_1,\cdots,B_5$ have memberships similar to those in Fig. 3. It is easy to check that these sets are all convex, normal, and complete.

c) Assume that the structure of the rule is

$$\big(y(k - T_1),\ u(k - T_2)\big) \to y(k) \qquad (33)$$

where the delays $T_1$ and $T_2$ will be determined later.

d) The performance index of the model is

$$\min_{T_1, T_2} J = \frac{1}{286} \sum_{k=11}^{296} \big[y(k) - \hat{y}(k)\big]^2. \qquad (34)$$

$R$ is constructed using Algorithm A1 with different $T_1$ and $T_2$, and the corresponding $J$ is calculated. The results are shown in Table I; $J$ reaches its minimum when $T_1 = 1$ and $T_2 = 4$.

TABLE I
J FOR DIFFERENT DELAYS (MAX-MIN COMPOSITION AND deF3)

T1\T2      2         3         4         5         6
1        1.5221    1.3598    1.0689    1.4727    1.7702
2        2.1368    1.486     1.5277
3        3.0435    1.9839    1.886
4        2.5997
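The delay determination in step d) is a plain grid search over $(T_1, T_2)$. A sketch, reusing the hypothetical `identify_R` and `predict` helpers from the earlier sketch, with $J$ evaluated on the training data as in (34):

```python
import numpy as np
from itertools import product

def delay_search(y, u, T1s, T2s, in_sets, out_sets, ygrid):
    """Evaluate J of (34) for each (T1, T2) pair and return the best pair."""
    best = (None, np.inf)
    for T1, T2 in product(T1s, T2s):
        k0 = max(T1, T2) + 1
        X = np.column_stack([y[k0 - T1:len(y) - T1],   # y(k - T1)
                             u[k0 - T2:len(u) - T2]])  # u(k - T2)
        target = np.asarray(y[k0:])
        R = identify_R(X, target, in_sets, out_sets)   # Algorithm A1
        pred = np.array([predict(R, x, in_sets, out_sets, ygrid) for x in X])
        J = np.mean((target - pred) ** 2)              # (34)
        if J < best[1]:
            best = ((T1, T2), J)
    return best
```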

A comparison between the different composition and defuzzification methods is shown in Table II. The minimal $J$ corresponds to max-product composition and deF2.

TABLE II

Composition:      Max-min   Max-min   Max-product   Max-product
Defuzzification:  deF3      deF2      deF3          deF2
J:                1.0689    0.6018    0.8501        0.4555

Self-Learning of the Model: The relation $R$ with $J = 0.4555$, taken as the initial relation, is modified by Algorithm A2. Let $J_i$ denote $J$ after the $i$th modification, $i = 0, 1, 2,\cdots$. The effect of modification with different "step lengths" $h$ is shown in Fig. 2. Self-learning significantly reduces $J$. It is noted that in self-learning $h$ plays an important role, somewhat like that of the step length in gradient methods. An adequate $h$ results in good convergence of the self-learning.

Fig. 2. Self-learning reduces J.

When Algorithm A2 is converted into an on-line form, the requirement on $h$ may be different: in the off-line case $J$ may be reduced by carrying out self-learning step by step, whereas in the on-line case a larger $h$ is usually desired so as to follow the time-varying characteristics of the system as quickly as possible.

Table III shows a comparison between the proposed algorithms and the published literature. The results clearly demonstrate the advantages of the proposed methods.

TABLE III
COMPARISON WITH PUBLISHED RESULTS

Literature             Note                     J
[7]  (fuzzy)           modified model           0.469
[8]  (fuzzy)           unmodified model         0.899
                       modified model           0.44
[10] (fuzzy)           r = 5                    0.776
                       r = 7                    0.478
                       r = 9                    0.320
[13] (nonfuzzy)                                 0.71
Our results (fuzzy)    r = 5, unmodified        0.4555
                       r = 5, modified          0.328

Example 2: A Simulation Dynamic Model

A two-input/single-output bilinear model

$$y(k) = 0.8\,y(k-1)\,u_1(k) + 0.5\,u_1(k-1)\,y(k-2) + u_2(k-4) + a\,e(k) \qquad (35)$$

is used to provide the input-output data sequence, expressed as follows:

$$\text{data 1 } (a = 0)\colon\ \{y(k), u_1(k), u_2(k),\ k = 1, 400\} \qquad (36)$$
$$\text{data 2 } (a = 1)\colon\ \{y(k), u_1(k), u_2(k),\ k = 1, 400\}. \qquad (37)$$

In model (35), $e(k)$ is an uncorrelated random noise uniformly distributed on $(-0.08, 0.08)$. Therefore, data 1 are noise-free and data 2 are noisy. The inputs $u_1(k)$ and $u_2(k)$ are both uncorrelated random sequences uniformly distributed on $(0.1, 0.9)$.
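Generating data 1 and data 2 from (35)-(37) is a direct simulation; in the sketch below the random seed and the zero initialization of the first lags are assumptions, not details given in the correspondence.

```python
import numpy as np

def simulate(a, N=400, seed=0):
    """Generate {y(k), u1(k), u2(k)} from the bilinear model (35)."""
    rng = np.random.default_rng(seed)
    u1 = rng.uniform(0.1, 0.9, N)
    u2 = rng.uniform(0.1, 0.9, N)
    e = rng.uniform(-0.08, 0.08, N)
    y = np.zeros(N)
    for k in range(4, N):  # start once all lags y(k-2), u2(k-4) exist
        y[k] = (0.8 * y[k - 1] * u1[k] + 0.5 * u1[k - 1] * y[k - 2]
                + u2[k - 4] + a * e[k])
    return y, u1, u2

y1, u11, u21 = simulate(a=0)   # data 1, noise-free (36)
y2, u12, u22 = simulate(a=1)   # data 2, noisy (37)
```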
Identification Procedures:

a) Determine the universes $Y, U_1, U_2$.

b) Determine the referential sets. Again $r = 5$. The memberships of the referential sets $A_{11},\cdots,A_{15}, A_{21},\cdots,A_{25}, B_1,\cdots,B_5$ are shown in Fig. 3.

Fig. 3. Membership of referential fuzzy sets in Example 2.

c) Suppose that the rule has the following structure:

$$\big(y(t - T),\ u_1(t - T_1),\ u_2(t - T_2)\big) \to y(t) \qquad (38)$$

where the delays $T$, $T_1$, and $T_2$ will be determined later.

d) Define the performance index

$$J_1, J_2 = \frac{1}{390} \sum_{k=11}^{400} \big[y(k) - \hat{y}(k)\big]^2$$

where $J_1$ corresponds to data 1 and $J_2$ to data 2.

The delays $T, T_1, T_2$ and the composition and defuzzification methods will be determined by minimizing $J_1/J_2$ (i.e., $J_1$ and $J_2$).

It may be intuitively seen that the effects of the delays $T, T_1, T_2$ on $J_1/J_2$ are not influenced by the composition and defuzzification methods. To verify this, let us look for the $T, T_1, T_2$ minimizing $J_1/J_2$ under two different combinations of composition and defuzzification methods. The results are shown in Tables IV and V: when $T = 1$, $T_1 = 0$, and $T_2 = 4$, $J_1/J_2$ are minimal. This shows that in identification we can first choose appropriate $T, T_1, T_2$ by minimizing $J_1/J_2$ using any composition and defuzzification methods, then fix the delays at their best values and minimize $J_1/J_2$ with respect to the composition and defuzzification methods.

TABLE IV
(MAX-PRODUCT COMPOSITION AND deF2)

T    1      1      1      1      1      1      1      1      1      2      3      4
T1   0      0      0      0      0      1      2      3      4      0      0      0
T2   2      3      4      5      6      4      4      4      4      4      4      4
J1   0.0853 0.0798 0.0331 0.0843 0.0827 0.0927 0.1129 0.1286 0.0855 0.0932 0.0858 0.0875
J2   0.0892 0.0833 0.0369 0.0858 0.0818 0.0989 0.1249 0.1313 0.0893 0.0970 0.0922 0.0926

TABLE V
(MAX-MIN COMPOSITION AND deF2)

T    1      1      1      1      1      1      1      1      1      2      3      4
T1   0      0      0      0      0      1      2      3      4      0      0      0
T2   2      3      4      5      6      4      4      4      4      4      4      4
J1   0.0932 0.0925 0.0393 0.0918 0.0866 0.0863 0.0988 0.0986 0.1017 0.0947 0.1213 0.1271
J2   0.0934 0.0938 0.0429 0.0933 0.0900 0.0902 0.1071 0.1045 0.1022 0.1037 0.1236 0.1308

The effect of the composition and defuzzification methods on $J_1/J_2$ is shown in Table VI.
Here the best choices are max-product composition and deF3. This is different from Example 1. In addition, deF1 causes very poor performance.

TABLE VI
J1 AND J2 FOR T = 1, T1 = 0, T2 = 4

Composition     deF1            deF2            deF3
Max-min         J1 = 0.0663     J1 = 0.0393     J1 = 0.0442
                J2 = 0.0652     J2 = 0.0429     J2 = 0.0468
Max-product     J1 = 0.0821     J1 = 0.0331     J1 = 0.0364
                J2 = 0.0780     J2 = 0.0369     J2 = 0.0328

Fig. 4. Self-learning reduces J (I).

Fig. 5. Self-learning reduces J (II).

Fig. 6. Self-learning reduces J (III).

Fig. 7. Self-learning reduces J.

In this example we examine not only the effect of $h$ on $J_1/J_2$ but also the effects of the different composition and defuzzification methods on $J_1$. The results are shown in Figs. 4-6. The effectiveness of the self-learning algorithm is again proven. It is interesting to note that in Figs. 4 and 5 the least values $J$ can reach are the same ($J_1 = 0.0230$) for the different composition and defuzzification methods used. Fig. 6 again shows the poor performance of deF1.

Fig. 7 shows the effect of self-learning on $J_2$. Due to the noise in the data, $J_2$ is usually greater than $J_1$ both before and after self-learning.

V. CONCLUSION

In this correspondence a linguistic identification method (Algorithm A1) has been proposed. The concepts of linguistic variables and conditional possibility are used for constructing a fuzzy model. Two numerical examples have shown that the proposed identification method can produce fuzzy models with fairly high accuracy, although the models are inherently imprecise.

To improve the fuzzy model accuracy, a self-learning algorithm (A2) associated with Algorithm A1 has further been developed in this correspondence. In fact, Algorithm A2 is a linguistic rule-modification algorithm. The numerical examples show that the self-learning algorithm A2 can considerably improve the fuzzy model accuracy.

The methods proposed in this correspondence might also be used for deriving human control strategies. They can also be used in fuzzy self-organizing control algorithms and other decision-making processes.

REFERENCES

[1] P. Eykhoff, Ed., Trends and Progress in System Identification. Oxford, England: Pergamon, 1981.
[2] L. A. Zadeh, "Outline of a new approach to the analysis of complex systems and decision processes," IEEE Trans. Syst., Man, Cybern., vol. SMC-3, pp. 28-44, 1973.

[3] R. R. Yager, "Fuzzy prediction based on regression model," Inform. Sci., vol. 26, pp. 45-63, 1982.
[4] A. Kandel, "Fuzzy dynamic systems," in Fuzzy Sets: Theory and Applications to Policy Analysis and Information Systems, P. P. Wang and S. K. Chang, Eds. New York: Plenum, 1980.
[5] W. Pedrycz, "Numerical and application aspects of fuzzy relational equations," Fuzzy Sets Syst., vol. 11, pp. 1-18, 1983.
[6] E. Czogala and W. Pedrycz, "On identification in fuzzy systems and its applications in control problems," Fuzzy Sets Syst., vol. 6, pp. 73-83, 1981.
[7] R. M. Tong, "Synthesis of fuzzy models for industrial processes," Int. J. Gen. Syst., vol. 4, pp. 143-162, 1978.
[8] B. S. Li and Z. J. Liu, "Application of fuzzy set theory to identification of system models," Inform. Contr. (China), vol. 9, no. 3, 1980 (in Chinese).
[9] T. H. Li et al., "Self-learning algorithm for fuzzy semantic inference," Acta Automatica Sinica (China), vol. 10, no. 4, 1984 (in Chinese).
[10] W. Pedrycz, "An identification algorithm in fuzzy relational systems," Fuzzy Sets Syst., vol. 13, pp. 153-167, 1984.
[11] M. Higashi and G. J. Klir, "Identification of fuzzy relation systems," IEEE Trans. Syst., Man, Cybern., vol. SMC-14, no. 2, pp. 349-355, 1984.
[12] J. B. Kiszka, M. E. Kochanska, and D. S. Sliwinska, "The influence of some fuzzy implication operators on the accuracy of a fuzzy model," Fuzzy Sets Syst., vol. 15, part I, pp. 111-128; part II, pp. 223-240, 1985.
[13] G. E. P. Box and G. M. Jenkins, Time Series Analysis, Forecasting and Control. San Francisco, CA: Holden-Day, 1970.
[14] D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications. New York: Academic, 1980.
[15] E. Sanchez, "Resolution of composite fuzzy relation equations," Inform. Contr., vol. 30, pp. 38-48, 1976.
[16] L. A. Zadeh, "The concept of linguistic variable and its application to approximate reasoning," Inform. Sci., part III, vol. 9, pp. 43-80, 1976.
[17] H. Zhao et al., "Multifactor fuzzy weighting graph: Theory of element net graph and system model identification," Acta Automatica Sinica (China), vol. 9, no. 6, 1983 (in Chinese).
[18] R. R. Yager, "Some relationships between possibility, truth and certainty," Fuzzy Sets Syst., vol. 11, pp. 151-156, 1983.
[19] M. M. Gupta, G. N. Saridis, and B. R. Gaines, Eds., Fuzzy Automata and Decision Processes. New York: North-Holland, 1977.
[20] L. A. Zadeh, "Fuzzy sets as a basis for a theory of possibility," Fuzzy Sets Syst., vol. 1, no. 1, pp. 3-28, 1977.
[21] R. M. Tong, "The construction and evaluation of fuzzy models," in Advances in Fuzzy Set Theory and Applications, M. M. Gupta, R. K. Ragade, and R. R. Yager, Eds. New York: North-Holland, 1979.
[22] L. Bainbridge, "Verbal reports as evidence of the process operator's knowledge," in Fuzzy Reasoning and Its Applications, E. H. Mamdani and B. R. Gaines, Eds. London: Academic, 1981.
[23] T. J. Procyk and E. H. Mamdani, "A linguistic self-organizing process controller," Automatica, vol. 15, no. 1, pp. 15-30, 1979.

A Re-Examination of the Distance-Weighted k-Nearest Neighbor Classification Rule

JAMES E. S. MACLEOD, ANDREW LUK, AND D. MICHAEL TITTERINGTON

Abstract - It was previously proved by Bailey and Jain that the asymptotic classification error rate of the (unweighted) k-nearest neighbor (k-NN) rule is lower than that of any weighted k-NN rule. Equations are developed for the classification error rate of a test sample when the number of training samples is finite, and it is argued intuitively that a weighted rule may then in some cases achieve a lower error rate than the unweighted rule. This conclusion is confirmed by analytically solving a particular simple problem, and as an illustration, experimental results are presented that were obtained using a generalized form of a weighting function proposed by Dudani.

Manuscript received May 18, 1986; revised October 15, 1986. This work was supported in part by the Croucher Foundation, Hong Kong.
J. E. S. Macleod and A. Luk are with the Department of Electronics and Electrical Engineering, University of Glasgow, Glasgow G12 8QQ, Scotland, United Kingdom.
D. M. Titterington is with the Department of Statistics, University of Glasgow, Glasgow G12 8QQ, Scotland, United Kingdom.
IEEE Log Number 8714518.

I. INTRODUCTION

In many pattern classification problems a set of classified training samples (not necessarily completely correct) and an additional set of test samples are available. Many classical statistical pattern recognition techniques can be applied. One of these techniques is the k-nearest neighbor (NN) classification rule [5]. Many possible derivatives of this rule exist (see, for example, the survey given by Dasarathy and Sheela [4]). An intuitively appealing idea, due to Dudani [6], is that a training sample closest to an unclassified test sample should be weighted most heavily. Dudani proposed the use of a weight which increases as the distance between the test sample and its nearest neighbors decreases.

It has, however, been shown by Bailey and Jain [1] that the asymptotic error rate of the traditional unweighted k-NN rule (i.e., its performance assuming an infinite set of training samples) is better than that of any weighted k-NN rule. We do not dispute this conclusion. In the same paper, Bailey and Jain also present the results of an experiment in which a k-NN rule gives a lower relative frequency of misclassification than a distance-weighted k-NN rule using Dudani's weighting function. Similar results were obtained by Morin and Raeside [9]. These results would tend to imply that the foregoing conclusion for the asymptotic error rate may also apply when the number of training samples is finite. However, from experimental results in three recent papers, one can gather evidence suggesting that it does not apply. Brown and Koplowitz [2] used an NN rule weighted according to the numbers of samples in the respective pattern classes and obtained better performance than from the unweighted rule on a finite training set. Keller et al. [8] proposed a fuzzy k-NN rule which can be considered as another weighted rule (the weighting in this case being based on fuzzy logic); for a finite number of training samples these workers' rule also performed better than the unweighted rule. Most interestingly of all from our present viewpoint, Fukunaga and Flick [7] (in a paper on NN methods of Bayes risk estimation) used both distance-weighted and unweighted distance measures and obtained lower classification error rates when using the weighted measures. The first aim of this correspondence is to show that the following hypothesis is not generally applicable.

Hypothesis 1: The error rate of the unweighted k-NN rule is lower than that of any weighted k-NN rule even when the number of training samples is finite.

In Section II the basic differences between the cases of the finite and the infinite training set are discussed, expressions for the classification error rate of a test sample are developed, and it is argued intuitively that under certain conditions a weighted rule may achieve a lower error rate than an unweighted rule. In Section III, a particular example (2-NN, one dimension, two classes, two training samples per class, particular class-conditional probability density functions) is solved. It is shown analytically for this particular case that a suitably weighted NN rule gives a lower overall error rate than the corresponding unweighted rule for any training set (subject to the aforementioned restrictions) generated from the specified pdf's. This example may be regarded as confirming analytically what was shown or suggested experimentally for the problems studied in the three previously cited references. It may also be regarded as a counterexample to Hypothesis 1.

Our second aim is to investigate Hypothesis 1 experimentally. We suggest that the higher error rates observed by Bailey and Jain and by Morin and Raeside in the case of distance-weighted k-NN classification are due not to any inherent general property of weighted k-NN rules, but rather to the particular weighting function used, that of Dudani [6]. In Section IV we propose a generalized version of Dudani's rule and present experimental results showing that in some cases a lower error rate can be achieved when using the weighted measure.
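For reference, Dudani's weighting function, cited as [6] above, gives the $i$th nearest neighbor the weight $w_i = (d_k - d_i)/(d_k - d_1)$ when $d_k \neq d_1$ (and $w_i = 1$ otherwise), where $d_1 \leq \cdots \leq d_k$ are the sorted neighbor distances. A minimal sketch of the resulting distance-weighted k-NN vote (hypothetical names; Euclidean distance assumed):

```python
import numpy as np

def dudani_knn(train_X, train_y, x, k):
    """Distance-weighted k-NN vote with Dudani's weights.

    train_X : array (N, d) of training samples
    train_y : array (N,) of class labels
    x       : array (d,) test sample
    """
    d = np.linalg.norm(train_X - x, axis=1)
    idx = np.argsort(d)[:k]                    # k nearest neighbors
    d1, dk = d[idx[0]], d[idx[-1]]
    w = np.ones(k) if dk == d1 else (dk - d[idx]) / (dk - d1)
    votes = {}
    for label, wi in zip(train_y[idx], w):
        votes[label] = votes.get(label, 0.0) + wi
    return max(votes, key=votes.get)           # class with largest weighted vote
```

Note that under this weighting the $k$th neighbor always receives weight zero, which is one of the properties the generalized rule of Section IV is designed to relax.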

