An Extension of the BP-Algorithm to Interval Input Vectors
— Learning from Numerical Data and Expert's Knowledge —

Hisao ISHIBUCHI and Hideo TANAKA
Department of Industrial Engineering, University of Osaka Prefecture
Mozu-Umemachi, Sakai, Osaka 591, JAPAN

ABSTRACT: In this paper, we extend the back-propagation algorithm to the case of interval input vectors. First, for two-group classification problems of interval vectors, we propose an architecture of the neural network which can deal with interval input vectors. Since the proposed architecture maps an interval input vector into an interval, the output from the neural network is an interval. We define a cost function using the target output and the interval output from the neural network. The learning algorithm derived from the cost function can be viewed as an extension of the back-propagation algorithm to the case of interval input vectors. Our algorithm can deal with both real vectors and interval vectors as input vectors of the neural network. Therefore, in learning of the neural network, we can use the expert's knowledge represented by means of intervals.

1. INTRODUCTION

In this paper, for two-group classification problems, we extend the BP algorithm [1,2] (the back-propagation algorithm) to the case of interval input vectors in order to utilize expert's knowledge in learning of neural networks. The main contribution of this paper is to propose a method for learning from expert's knowledge represented by means of intervals. The expert's knowledge considered in this paper is as follows:

If x_{k1} is in the interval X_{k1} and ... and x_{kn} is in the interval X_{kn}, then x_k belongs to G_k, k = 1, 2, ..., m, (1)

where x_k = (x_{k1}, ..., x_{kn}) is a pattern vector and X_{k1}, ..., X_{kn} are intervals. In two-group classification problems, G_k is G1 (Group 1) or G2 (Group 2). Not only the expert's knowledge but also numerical data are used in the proposed method. The numerical data used in this paper are as follows:

x_k belongs to G_k, k = m+1, m+2, ..., m+s, (2)

where x_k = (x_{k1}, ..., x_{kn}) is an n-dimensional real vector. Therefore our classification problem is to train a neural network using the expert's knowledge (1) and the numerical data (2).

In the proposed method, the expert's knowledge (1) is treated as the following interval data:

X_k belongs to G_k, k = 1, 2, ..., m, (3)

where X_k = (X_{k1}, ..., X_{kn}) is an n-dimensional interval vector. Since the BP algorithm cannot deal with the interval vector X_k in (3) as an input vector, we first propose an architecture of the neural network to cope with interval input vectors. The neural network with the proposed architecture maps an interval vector into an interval. This means that the output from the neural network is an interval. Next we define a cost function using the interval output from the neural network and the corresponding target output. We derive a learning algorithm from the cost function, which can be viewed as an extension of the BP algorithm to the case of interval input vectors. Using the derived algorithm, learning from the numerical data (2) and the expert's knowledge (1) or (3) can be performed. Last, two examples are shown to illustrate the effectiveness of the proposed method.

2. INTERVAL DATA AND INTERVAL ARITHMETIC

2.1 Interval Data

Since a real number can be considered as a degenerated interval whose two limits are the same, the numerical data (2) and the expert's knowledge (3) can be represented in the following form:

X_p belongs to G_p, p = 1, 2, ..., m+s, (4)

where the X_p's are interval vectors for p = 1, 2, ..., m and degenerated interval vectors for p = m+1, ..., m+s. Therefore our problem can be viewed as a two-group classification problem of the interval vectors X_p in (4). As a classification method of interval vectors, Ishibuchi, Tanaka & Fukuoka [3] have proposed an LP-based method which can be applied to linearly separable interval vectors. Since the classification method proposed in this paper is based on a neural network, it can be applied to non-linear classification of interval vectors.

In this paper, we denote intervals by upper case letters A, B, ..., Z. An interval is represented by its lower (left) and upper (right) limits, for example, X_{pi} = [x_{pi}^L, x_{pi}^U] and X_p = (X_{p1}, ..., X_{pn}) = ([x_{p1}^L, x_{p1}^U], ..., [x_{pn}^L, x_{pn}^U]).
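To make the unified representation (4) concrete, the following is a minimal Python sketch of how a numerical datum of form (2) and an expert rule of form (3) can be lifted into the same interval form; the function and variable names (as_interval_vector, patterns, and so on) are illustrative choices of ours, not definitions from the paper.

```python
# A minimal sketch of the unified interval representation (4).
# Intervals are (lower, upper) tuples; all names below are illustrative.

def as_interval_vector(x):
    """Lift a real vector into a degenerated interval vector ([x_i, x_i])."""
    return [(xi, xi) for xi in x]

# A numerical datum of form (2): a real vector with its group label.
x_numerical, g_numerical = [4.0, 11.0], "G1"

# An expert rule of form (3): an interval vector with its group label.
x_rule, g_rule = [(0.0, 10.0), (0.0, 10.0)], "G1"

# Both become interval patterns of form (4) and can be treated uniformly.
patterns = [(as_interval_vector(x_numerical), g_numerical), (x_rule, g_rule)]
```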
2.2 Interval Arithmetic

The generalization of ordinary arithmetic to closed intervals is known as interval arithmetic. For complete details, see Alefeld & Herzberger [4]. The operations on intervals used in this paper are as follows:

A + B = [a^L, a^U] + [b^L, b^U] = [a^L + b^L, a^U + b^U], (5)

m·A = m·[a^L, a^U] = [m·a^L, m·a^U] for m ≥ 0 and [m·a^U, m·a^L] for m < 0, (6)

exp(A) = exp([a^L, a^U]) = [exp(a^L), exp(a^U)], (7)

where m is a real number.

3. EXTENSION OF THE BP ALGORITHM

3.1 Architecture of the Neural Network

In order to deal with the interval vector X_p as an input vector of a neural network, we first extend the activation function of each unit to the following interval input-output relation (see Fig. 1):

f(Net) = f([net^L, net^U]) = [f(net^L), f(net^U)], (8)

where f(x) = 1/(1 + exp(-x)).

Fig. 1 Interval input-output relation

Using the interval activation function (8), we propose an architecture of the three-layer neural network which can deal with the interval vector X_p = (X_{p1}, ..., X_{pn}) as an input vector. The input-output relation of each unit in the proposed neural network is as follows.

Input units:
O_{pi} = [o_{pi}^L, o_{pi}^U] = X_{pi}, i = 1, 2, ..., n, (9)

Hidden units:
O_{pj} = [o_{pj}^L, o_{pj}^U] = f(Net_{pj}), j = 1, 2, ..., n', (10)
Net_{pj} = Σ_i w_{ji}·O_{pi} + θ_j, (11)

Output unit:
O_p = [o_p^L, o_p^U] = f(Net_p), (12)
Net_p = Σ_j w_j·O_{pj} + θ, (13)

where the O_{pi}'s, O_{pj}'s and O_p are intervals and n' is the number of hidden units. The interval operations in (9)-(13) can be explicitly calculated from (5)-(7). For example, (13) is calculated as follows:

net_p^L = Σ_{j: w_j ≥ 0} w_j·o_{pj}^L + Σ_{j: w_j < 0} w_j·o_{pj}^U + θ, (14)
net_p^U = Σ_{j: w_j ≥ 0} w_j·o_{pj}^U + Σ_{j: w_j < 0} w_j·o_{pj}^L + θ. (15)

3.2 Learning Algorithm for Interval Input Vectors

Let us define the target output t_p corresponding to the interval vector X_p as

t_p = 1 if the pattern p belongs to G1,
t_p = 0 if the pattern p belongs to G2, (16)

where the membership of the pattern p is specified by (4). Using the target output t_p and the interval output O_p from the neural network specified by (8)-(13), we define a cost function as follows:

e_p = β·max{(t_p − o_p)²/2 : o_p ∈ O_p} + (1−β)·min{(t_p − o_p)²/2 : o_p ∈ O_p}
    = β·(t_p − o_p^L)²/2 + (1−β)·(t_p − o_p^U)²/2 if t_p = 1,
    = β·(t_p − o_p^U)²/2 + (1−β)·(t_p − o_p^L)²/2 if t_p = 0, (17)

where β is a constant in (0, 1]. The cost function e_p is the weighted sum of the maximum and minimum squared errors between the target output t_p and the interval output O_p.

The learning of the neural network is to minimize the cost function e_p. In a similar manner as Rumelhart et al. [1,2], the weights w_j and w_{ji} are changed according to the following rules:

Δw_j = η·(−∂e_p/∂w_j), j = 1, 2, ..., n', (18)
Δw_{ji} = η·(−∂e_p/∂w_{ji}), j = 1, 2, ..., n', i = 1, 2, ..., n, (19)

where ∂e_p/∂w_j and ∂e_p/∂w_{ji} can be calculated from (9)-(13). The explicit expressions are shown in the Appendix (see also Omae, Fujioka, Ishibuchi & Tanaka [5]). The following momentum terms can be introduced in the same manner as Rumelhart et al. [2]:

Δw_j(t+1) = η·(−∂e_p/∂w_j) + α·Δw_j(t), (20)
Δw_{ji}(t+1) = η·(−∂e_p/∂w_{ji}) + α·Δw_{ji}(t). (21)

The biases θ and θ_j are changed in the same manner as w_j and w_{ji}.
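The following is a minimal NumPy sketch of the interval forward pass (9)-(15) and the cost function (17), written under the paper's definitions; the function names (interval_net, forward, cost), the array shapes, and the random initialization are our own choices, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    """The activation f(x) = 1/(1 + exp(-x)), applied limit-wise as in (8)."""
    return 1.0 / (1.0 + np.exp(-x))

def interval_net(w, theta, o_lo, o_hi):
    """Interval-valued net input as in (14)-(15): a non-negative weight keeps
    each limit on its own side, while a negative weight swaps the limits."""
    w_pos, w_neg = np.maximum(w, 0.0), np.minimum(w, 0.0)
    net_lo = w_pos @ o_lo + w_neg @ o_hi + theta
    net_hi = w_pos @ o_hi + w_neg @ o_lo + theta
    return net_lo, net_hi

def forward(x_lo, x_hi, W, theta_h, w, theta_o):
    """Interval forward pass (9)-(13) of the three-layer network.
    W: (n', n) hidden-layer weights; w: (n',) output-unit weights."""
    h_lo, h_hi = interval_net(W, theta_h, x_lo, x_hi)      # hidden nets (11)
    o_lo, o_hi = sigmoid(h_lo), sigmoid(h_hi)              # hidden outputs (10)
    net_lo, net_hi = interval_net(w, theta_o, o_lo, o_hi)  # output net (13)
    return sigmoid(net_lo), sigmoid(net_hi)                # interval output (12)

def cost(t_p, o_lo, o_hi, beta):
    """Cost (17): weighted maximum/minimum squared error over [o_lo, o_hi]."""
    e_min, e_max = sorted(((t_p - o_lo) ** 2 / 2.0, (t_p - o_hi) ** 2 / 2.0))
    return beta * e_max + (1.0 - beta) * e_min

# Example: the G1 rule pattern ([0,10],[0,10]) of Example 1 in Section 4,
# evaluated with randomly initialized weights for a network with five
# hidden units (n = 2, n' = 5).
rng = np.random.default_rng(0)
W, theta_h = rng.normal(size=(5, 2)), rng.normal(size=5)
w, theta_o = rng.normal(size=5), rng.normal()
o_lo, o_hi = forward(np.array([0.0, 0.0]), np.array([10.0, 10.0]),
                     W, theta_h, w, theta_o)
print(cost(1.0, o_lo, o_hi, beta=1.0))
```

Note that, because f is monotone and the weighted sums (14)-(15) pair the limits according to the sign of each weight, the sketch always returns a well-formed interval with o_lo ≤ o_hi.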
4. NUMERICAL EXAMPLES

4.1 Example 1

Let us assume that the following ten patterns are given in the two-dimensional pattern space [0, 20]²:

G1 = { (4,11), (6,11), (11,3), (13,6), (13,10) }, (22)
G2 = { (2,14), (6,15), (14,2), (15,4), (15,15) }. (23)

We also assume that the expert's knowledge is independently given as the following if-then rules:

If x_{p1} ≤ 10 and x_{p2} ≤ 10, then x_p belongs to G1. (24)
If x_{p1} ≥ 16 or x_{p2} ≥ 16, then x_p belongs to G2. (25)

Since the pattern space is [0, 20]², these rules can be represented by means of intervals as follows:

If x_{p1} is in [0, 10] and x_{p2} is in [0, 10], then x_p belongs to G1. (26)
If x_{p1} is in [16, 20] and x_{p2} is in [0, 20], then x_p belongs to G2. (27)
If x_{p1} is in [0, 20] and x_{p2} is in [16, 20], then x_p belongs to G2. (28)

Therefore the following thirteen patterns can be used in learning:

G1 = { (4,11), (6,11), (11,3), (13,6), (13,10), ([0,10],[0,10]) }, (29)
G2 = { (2,14), (6,15), (14,2), (15,4), (15,15), ([16,20],[0,20]), ([0,20],[16,20]) }. (30)

We first perform the learning of the neural network with five hidden units using only the numerical data (22) and (23). We set η = 0.9 and α = 0.5 in the BP algorithm. In Fig. 2, we show the simulation result after 1,000 iterations of the BP algorithm for each of the ten patterns in (22) and (23). The curved line in Fig. 2 is drawn by plotting the points where the output values from the neural network are nearly equal to 0.5. Therefore the curved line can be viewed as the boundary line between the two groups. Since we do not utilize the expert's knowledge in the learning, the boundary curve in Fig. 2 invades the area of x_{p1} ≥ 16 which is assigned to G2 by the expert's knowledge (25).

Next we train the neural network with five hidden units using (29) and (30), that is, using both the numerical data and the expert's knowledge. We set η = 0.9 and α = 0.5 in learning and β = 1.0 in the cost function (17). In Fig. 3, we show the simulation result after 1,000 iterations of the proposed method for each of the thirteen interval patterns in (29) and (30). We can see that the boundary curve in Fig. 3 follows both the numerical data and the expert's knowledge.

Fig. 2 Result of learning from numerical data (22) and (23)
Fig. 3 Result of learning from numerical data and expert's knowledge

We also train the neural network using the proposed method with different values of β in the cost function (17) until the sum of (17) over the thirteen patterns becomes less than 0.01. The required iteration numbers are as follows:

310 (β = 1.0), 347 (β = 0.75), 380 (β = 0.5), 2577 (β = 0.25), 370 (β = 0.0). (31)

The simulation results for β = 0.75, 0.5, 0.25 are similar to Fig. 3, but the simulation result for β = 0.0 is similar to Fig. 2. From these simulation results and other simulations for various data, we can conclude that the value of β should be positive.

4.2 Example 2

Let us suppose that the following interval patterns are given:

G1 = { ([1,5],[8,14]), ([4,10],[1,5]), ([6,12],[9,11]) }, (32)
G2 = { ([8,14],[16,20]), ([16,19],[1,3]), ([14,18],[5,15]) }. (33)

In Fig. 4, we show the simulation result using the proposed method with the same parameters as in Fig. 3. Fig. 4 is the result of learning after 2,000 iterations of our method. For the comparison of the proposed method with the BP algorithm, we apply the BP algorithm to the given data. Since the BP algorithm cannot deal with interval vectors, we use the four vertexes of each interval vector as training patterns, as in the sketch below.
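A minimal Python sketch of this vertex construction, assuming the two-dimensional interval patterns above; the function name vertexes is ours.

```python
from itertools import product

def vertexes(interval_vector):
    """Enumerate the 2^n corner points of an interval vector so that they
    can be fed to the ordinary BP algorithm as real training patterns."""
    return list(product(*interval_vector))

# The first G1 pattern ([1,5],[8,14]) of (32) yields four corner points:
print(vertexes([(1, 5), (8, 14)]))
# [(1, 8), (1, 14), (5, 8), (5, 14)]
# The six interval patterns in (32)-(33) give 6 x 4 = 24 points in total.
```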
In Fig. 5, we show the simulation result with the same parameters as in Fig. 2. Fig. 5 is the result after 2,000 iterations of the BP algorithm for each of the 24 points. We can see from this figure that the boundary curve correctly classifies all the vertexes but invades one of the interval patterns. Of course, if we use many points included in each of the interval patterns, the BP algorithm may cope with interval patterns (see Fig. 6). One problem of such a strategy is that an enormous number of points is required for the correct classification of interval patterns, especially when the dimension of the pattern space is large.

Fig. 4 Simulation result of learning of interval patterns
Fig. 5 Result of learning of 24 points
Fig. 6 Result of learning of 54 points

5. CONCLUSION

In this paper, we proposed a learning method using the expert's knowledge represented by means of intervals. Since a real number can be viewed as a degenerated interval whose two limits are the same, the proposed method can cope with both real vectors and interval vectors. This means that we can use both the numerical data and the expert's knowledge in learning of neural networks. The proposed method is viewed as an extension of the BP algorithm to the case of interval input vectors.

APPENDIX

(1) ∂e_p/∂w_j is calculated as follows.

① If t_p = 1 and w_j ≥ 0, then ∂e_p/∂w_j = −β·Θ1 − (1−β)·Θ2, (A.1)
② If t_p = 1 and w_j < 0, then ∂e_p/∂w_j = −β·Θ3 − (1−β)·Θ4, (A.2)
③ If t_p = 0 and w_j ≥ 0, then ∂e_p/∂w_j = −β·Θ2 − (1−β)·Θ1, (A.3)
④ If t_p = 0 and w_j < 0, then ∂e_p/∂w_j = −β·Θ4 − (1−β)·Θ3, (A.4)

where

Θ1 = (t_p − o_p^L)·o_p^L·(1 − o_p^L)·o_{pj}^L, (A.5)
Θ2 = (t_p − o_p^U)·o_p^U·(1 − o_p^U)·o_{pj}^U, (A.6)
Θ3 = (t_p − o_p^L)·o_p^L·(1 − o_p^L)·o_{pj}^U, (A.7)
Θ4 = (t_p − o_p^U)·o_p^U·(1 − o_p^U)·o_{pj}^L. (A.8)

(2) ∂e_p/∂w_{ji} is calculated as follows.

① If t_p = 1, w_j ≥ 0 and w_{ji} ≥ 0, then ∂e_p/∂w_{ji} = −β·Ψ1 − (1−β)·Ψ2, (A.9)
② If t_p = 1, w_j ≥ 0 and w_{ji} < 0, then ∂e_p/∂w_{ji} = −β·Ψ3 − (1−β)·Ψ4, (A.10)
③ If t_p = 1, w_j < 0 and w_{ji} ≥ 0, then ∂e_p/∂w_{ji} = −β·Ψ5 − (1−β)·Ψ6, (A.11)
④ If t_p = 1, w_j < 0 and w_{ji} < 0, then ∂e_p/∂w_{ji} = −β·Ψ7 − (1−β)·Ψ8, (A.12)
⑤ If t_p = 0, w_j ≥ 0 and w_{ji} ≥ 0, then ∂e_p/∂w_{ji} = −β·Ψ2 − (1−β)·Ψ1, (A.13)
⑥ If t_p = 0, w_j ≥ 0 and w_{ji} < 0, then ∂e_p/∂w_{ji} = −β·Ψ4 − (1−β)·Ψ3, (A.14)
⑦ If t_p = 0, w_j < 0 and w_{ji} ≥ 0, then ∂e_p/∂w_{ji} = −β·Ψ6 − (1−β)·Ψ5, (A.15)
⑧ If t_p = 0, w_j < 0 and w_{ji} < 0, then ∂e_p/∂w_{ji} = −β·Ψ8 − (1−β)·Ψ7, (A.16)

where

Ψ1 = (t_p − o_p^L)·o_p^L·(1 − o_p^L)·w_j·o_{pj}^L·(1 − o_{pj}^L)·o_{pi}^L, (A.17)
Ψ2 = (t_p − o_p^U)·o_p^U·(1 − o_p^U)·w_j·o_{pj}^U·(1 − o_{pj}^U)·o_{pi}^U, (A.18)
Ψ3 = (t_p − o_p^L)·o_p^L·(1 − o_p^L)·w_j·o_{pj}^L·(1 − o_{pj}^L)·o_{pi}^U, (A.19)
Ψ4 = (t_p − o_p^U)·o_p^U·(1 − o_p^U)·w_j·o_{pj}^U·(1 − o_{pj}^U)·o_{pi}^L, (A.20)
Ψ5 = (t_p − o_p^L)·o_p^L·(1 − o_p^L)·w_j·o_{pj}^U·(1 − o_{pj}^U)·o_{pi}^U, (A.21)
Ψ6 = (t_p − o_p^U)·o_p^U·(1 − o_p^U)·w_j·o_{pj}^L·(1 − o_{pj}^L)·o_{pi}^L, (A.22)
Ψ7 = (t_p − o_p^L)·o_p^L·(1 − o_p^L)·w_j·o_{pj}^U·(1 − o_{pj}^U)·o_{pi}^L, (A.23)
Ψ8 = (t_p − o_p^U)·o_p^U·(1 − o_p^U)·w_j·o_{pj}^L·(1 − o_{pj}^L)·o_{pi}^U. (A.24)

REFERENCES

[1] D.E. Rumelhart, G.E. Hinton and R.J. Williams: Learning Representations by Back-Propagating Errors, Nature 323, pp.533-536 (1986)
[2] D.E. Rumelhart, J.L. McClelland and the PDP Research Group: Parallel Distributed Processing Vol.1, MIT Press, Cambridge (1986)
[3] H. Ishibuchi, H. Tanaka and N. Fukuoka: Discriminant Analysis of Multi-Dimensional Interval Data and Its Application to Chemical Sensing, International Journal of General Systems 16, pp.311-329 (1990)
[4] G. Alefeld and J. Herzberger: Introduction to Interval Computations, Academic Press, New York (1983)
[5] M. Omae, R. Fujioka, H. Ishibuchi and H. Tanaka: Learning Algorithm of Neural Networks for Interval-Valued Data, Bulletin of University of Osaka Prefecture, Series A (to appear)
